Kendall's Tau Rank Correlation Statistic

From GM-RKB

A Kendall's Tau (τ) Rank Correlation Statistic is a non-parametric rank correlation statistic that measures the association between the rankings of two variables. It is typically applied when the measurements are ordinal rather than equidistant.



References

2017a

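For each pair of observations [math]\displaystyle{ (X_i, Y_i) }[/math] and [math]\displaystyle{ (X_j, Y_j) }[/math] with [math]\displaystyle{ i \lt j }[/math]:
[math]\displaystyle{ \frac{Y_j-Y_i}{X_j-X_i} \gt 0 }[/math] - pair is concordant
[math]\displaystyle{ \frac{Y_j-Y_i}{X_j-X_i} \lt 0 }[/math] - pair is discordant
[math]\displaystyle{ \frac{Y_j-Y_i}{X_j-X_i} = 0 }[/math] - pair is considered a tie
[math]\displaystyle{ X_i = X_j }[/math] - pair is not compared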
Kendall's tau is computed as
[math]\displaystyle{ \tau=\frac{N_c-N_d}{N_c+N_d} }[/math]
with [math]\displaystyle{ N_c }[/math] and [math]\displaystyle{ N_d }[/math] denoting the number of concordant pairs and the number of discordant pairs, respectively, in the sample. Ties add 0.5 to both the concordant and discordant counts. There are [math]\displaystyle{ \binom n 2 }[/math] possible pairs in the bivariate sample.
A value of +1 indicates that all pairs are concordant, a value of -1 indicates that all pairs are discordant, and a value of 0 indicates no ordinal association (as expected under independence, although a value of 0 does not by itself establish independence).
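The pair rules and tie convention above can be sketched in Python as follows. This is a minimal illustration, not Dataplot's own implementation, and the function name `kendall_tau_pairs` is hypothetical. It visits every pair once, skips pairs with equal X, and splits pairs with equal Y (but distinct X) evenly between the two counts.

```python
from itertools import combinations


def kendall_tau_pairs(x, y):
    """Kendall's tau using the slope-sign pair rules (illustrative sketch).

    - slope > 0: concordant; slope < 0: discordant
    - slope == 0 (Y tied, X distinct): adds 0.5 to both counts
    - X tied: pair is not compared
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_c = n_d = 0.0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # pair is not compared
        slope = (yj - yi) / (xj - xi)
        if slope > 0:
            n_c += 1
        elif slope < 0:
            n_d += 1
        else:
            n_c += 0.5  # tie: split evenly between both counts
            n_d += 0.5
    if n_c + n_d == 0:
        raise ValueError("no comparable pairs")
    return (n_c - n_d) / (n_c + n_d)
```

Under this convention a tie adds nothing to the numerator but still enlarges the denominator, so ties pull tau toward 0.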
The Kendall tau independence test is a test of whether the Kendall tau coefficient is equal to zero.
For large n (e.g., n > 60), or when there are many ties, the p-th upper quantile of the Kendall tau statistic under the null hypothesis can be approximated by
[math]\displaystyle{ w_p = z_p \frac{\sqrt{2(2n+5)}}{3\sqrt{n(n-1)}} }[/math]
with [math]\displaystyle{ z_p }[/math] and [math]\displaystyle{ n }[/math] denoting the [math]\displaystyle{ p }[/math]-th quantile of the standard normal distribution and the sample size, respectively. The lower quantile is the negative of the upper quantile.
For a two-sided test, the p-value is computed as twice the minimum of the lower-tailed and upper-tailed p-values.
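A sketch of the large-sample approximation, assuming the formula for w_p given above and using the standard-library `statistics.NormalDist` for the normal quantile and CDF. The helper names are hypothetical, and the small-sample table lookup (n ≤ 60) is not reproduced here.

```python
import math
from statistics import NormalDist


def _null_sd(n):
    """Approximate standard deviation of Kendall's tau under independence."""
    return math.sqrt(2 * (2 * n + 5)) / (3 * math.sqrt(n * (n - 1)))


def upper_quantile(p, n):
    """w_p = z_p * sqrt(2(2n+5)) / (3 * sqrt(n(n-1)))."""
    z_p = NormalDist().inv_cdf(p)  # p-th quantile of the standard normal
    return z_p * _null_sd(n)


def lower_quantile(p, n):
    """The lower quantile is the negative of the upper quantile."""
    return -upper_quantile(p, n)


def two_sided_p_value(tau, n):
    """Twice the smaller of the lower- and upper-tailed p-values."""
    z = tau / _null_sd(n)
    lower_tail = NormalDist().cdf(z)  # approx. P(tau_null <= tau)
    upper_tail = 1.0 - lower_tail     # approx. P(tau_null >= tau)
    return 2 * min(lower_tail, upper_tail)
```

For example, a two-sided test at the 0.05 level would reject independence when the observed tau exceeds `upper_quantile(0.975, n)` or falls below `lower_quantile(0.975, n)`; equivalently, when `two_sided_p_value(tau, n) < 0.05`.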
For [math]\displaystyle{ n \leq 60 }[/math], tabulated quantiles (from Table A11 on pp. 543-544 of Conover) are used. These quantiles are exact when there are no ties in the data.

2017b

The Kendall coefficient is denoted with the Greek letter tau (τ).
[math]\displaystyle{ \tau = \frac{4P}{n(n-1)} - 1 }[/math]
where P is the number of concordant pairs, calculated by summing, over all items, the number of items ranked after the given item by both rankings. This form assumes that there are no tied ranks.
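The counting procedure described above can be sketched as follows, assuming two untied rankings of the same n items, given as equal-length sequences of ranks. The function name is hypothetical.

```python
def kendall_tau_from_ranks(rank_a, rank_b):
    """tau = 4P / (n(n-1)) - 1 for two untied rankings of the same n items.

    rank_a[k] and rank_b[k] are the ranks the two rankings give item k.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of the same n >= 2 items")
    p = 0
    for i in range(n):
        # Count the items ranked after item i by both rankings.
        p += sum(1 for j in range(n)
                 if rank_a[j] > rank_a[i] and rank_b[j] > rank_b[i])
    return 4 * p / (n * (n - 1)) - 1
```

Each concordant pair is counted exactly once, from the item that both rankings place first, so P ranges from 0 (tau = -1) to n(n-1)/2 (tau = +1).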
(...) Kendall's tau is used with two ordinal variables, or with one ordinal and one interval variable.
Before computers were commonly available, the Spearman correlation was often used as a substitute because it was easier to calculate. Kendall's tau is now often viewed as a superior metric.
The measure is sometimes referred to simply as 'Kendall's tau'.

2017c

Two observations [math]\displaystyle{ (X_i, Y_i) }[/math] and [math]\displaystyle{ (X_j, Y_j) }[/math] are concordant if they are in the same order with respect to each variable. That is, if
(1) [math]\displaystyle{ X_i \lt X_j }[/math] and [math]\displaystyle{ Y_i \lt Y_j }[/math] , or if
(2) [math]\displaystyle{ X_i \gt X_j }[/math] and [math]\displaystyle{ Y_i \gt Y_j }[/math]
They are discordant if their values are arranged in opposite directions for X and Y. That is, if
(1) [math]\displaystyle{ X_i \lt X_j }[/math] and [math]\displaystyle{ Y_i \gt Y_j }[/math] , or if
(2) [math]\displaystyle{ X_i \gt X_j }[/math] and [math]\displaystyle{ Y_i \lt Y_j }[/math]
The two observations are tied if [math]\displaystyle{ X_i = X_j }[/math] and/or [math]\displaystyle{ Y_i = Y_j }[/math] .
The total number of pairs that can be constructed for a sample size of [math]\displaystyle{ n }[/math] is
[math]\displaystyle{ N=\binom{n}{2} = \frac{1}{2}n(n-1) }[/math]
[math]\displaystyle{ N }[/math] can be decomposed into these five quantities:
[math]\displaystyle{ N=P+Q+X_0+Y_0+(XY)_0 }[/math]
where [math]\displaystyle{ P }[/math] is the number of concordant pairs, [math]\displaystyle{ Q }[/math] is the number of discordant pairs, [math]\displaystyle{ X_0 }[/math] is the number of pairs tied only on the [math]\displaystyle{ X }[/math] variable, [math]\displaystyle{ Y_0 }[/math] is the number of pairs tied only on the [math]\displaystyle{ Y }[/math] variable, and [math]\displaystyle{ (XY)_0 }[/math] is the number of pairs tied on both [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math].
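The five-way decomposition can be checked with a short Python sketch. The helper `pair_counts` is hypothetical; it classifies every pair using the definitions above and returns the tuple (P, Q, X_0, Y_0, (XY)_0).

```python
from itertools import combinations


def pair_counts(x, y):
    """Return (P, Q, X0, Y0, XY0) for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    p = q = x0 = y0 = xy0 = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj and yi == yj:
            xy0 += 1      # tied on both X and Y
        elif xi == xj:
            x0 += 1       # tied only on X
        elif yi == yj:
            y0 += 1       # tied only on Y
        elif (xi < xj) == (yi < yj):
            p += 1        # same order on both variables: concordant
        else:
            q += 1        # opposite order: discordant
    return p, q, x0, y0, xy0
```

For the made-up sample x = (1, 1, 2, 2, 3), y = (1, 1, 2, 3, 2), it returns (6, 1, 1, 1, 1), whose sum is 10 = 5*4/2, as the decomposition requires.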
The Kendall tau-b for measuring order association between variables X and Y is given by the following formula:
[math]\displaystyle{ t_b=\frac{P-Q}{\sqrt{(P+Q+X_0)(P+Q+Y_0)}} }[/math]
This value is scaled so that it ranges between -1 and +1. Unlike the Spearman [math]\displaystyle{ r_s }[/math], it estimates a population parameter:
[math]\displaystyle{ t_b }[/math] is the sample estimate of [math]\displaystyle{ \tau_b = \Pr[\text{concordance}] - \Pr[\text{discordance}] }[/math]
The Kendall tau-b has properties similar to those of the Spearman [math]\displaystyle{ r_s }[/math]. Because the sample estimate, [math]\displaystyle{ t_b }[/math], estimates a population parameter, [math]\displaystyle{ \tau_b }[/math], many statisticians prefer the Kendall tau-b to the Spearman rank correlation coefficient.
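Tau-b, t_b = (P - Q) / sqrt((P + Q + X_0)(P + Q + Y_0)), can be sketched without tallying the five counts separately. Over all pairs, the sum of sign(ΔX)·sign(ΔY) equals P - Q, the number of pairs not tied on X equals P + Q + Y_0, and the number not tied on Y equals P + Q + X_0. This is an illustrative O(n²) version with a hypothetical function name, not a library routine. It computes the same quantity that the 2015b excerpt writes with T and U.

```python
import math
from itertools import combinations


def _sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_b(x, y):
    """t_b = (P - Q) / sqrt((P + Q + X0) * (P + Q + Y0)), sketch only."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    num = 0        # accumulates P - Q
    untied_x = 0   # pairs not tied on X: P + Q + Y0
    untied_y = 0   # pairs not tied on Y: P + Q + X0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        sx, sy = _sign(xj - xi), _sign(yj - yi)
        num += sx * sy            # +1 concordant, -1 discordant, 0 tie
        untied_x += sx != 0
        untied_y += sy != 0
    denom = math.sqrt(untied_x * untied_y)
    if denom == 0:
        raise ValueError("tau-b is undefined when x or y is constant")
    return num / denom
```

For x = (1, 1, 2, 2, 3), y = (1, 1, 2, 3, 2) this gives 5/8 = 0.625. Without ties it reduces to (P - Q) / N.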

2015a

2015b

(...) The definition of Kendall’s tau that is used is:
[math]\displaystyle{ \tau = \frac{P - Q}{\sqrt{(P + Q + T)(P + Q + U)}} }[/math]
where P is the number of concordant pairs, Q the number of discordant pairs, T the number of ties only in x, and U the number of ties only in y. If a tie occurs for the same pair in both x and y, it is not added to either T or U.
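As a worked illustration of this definition, take a small made-up sample containing one pair tied only in x, one pair tied only in y, and one pair tied in both. Observations are numbered 1 to 5.

```latex
% Hypothetical sample: x = (1, 1, 2, 2, 3), y = (1, 1, 2, 3, 2); n = 5, so 10 pairs.
% Pair (1,2): tied in both x and y, so it is added to neither T nor U.
% Pair (3,4): tied only in x, so T = 1.  Pair (3,5): tied only in y, so U = 1.
% Remaining 7 pairs: P = 6 concordant, Q = 1 discordant (pair (4,5)).
\tau = \frac{P - Q}{\sqrt{(P + Q + T)(P + Q + U)}}
     = \frac{6 - 1}{\sqrt{(6 + 1 + 1)(6 + 1 + 1)}}
     = \frac{5}{8} = 0.625
```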

2011