Kendall's Tau Rank Correlation Statistic

A Kendall's Tau (τ) Rank Correlation Statistic is a non-parametric rank correlation statistic between the rankings of two variables, applicable when the measures are not equidistant.

Context
- It can be defined as [math]\displaystyle{ \tau = \frac{P-Q}{P+Q} }[/math] where [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math] are the number of concordant pairs and the number of discordant pairs, respectively.
- …
Counter-Example(s):
See: Kendall Tau Correlation Test, Time Series, Statistics, τ, Statistic, Association (Statistics), Non-Parametric Statistics, Hypothesis Test, Rank Correlation.

References

2017a

(ITL-SED, 2017) ⇒ Retrieved 2017-01-08 from NIST (National Institute of Standards and Technology, US) website. http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/kend_tau.htm
- Kendall's tau coefficient is a measure of concordance between two paired variables. Given the pairs [math]\displaystyle{ (X_i,Y_i) }[/math] and [math]\displaystyle{ (X_j,Y_j) }[/math], then

[math]\displaystyle{ \frac{Y_j - Y_i}{X_j - X_i} \gt 0 }[/math] - pair is concordant

[math]\displaystyle{ \frac{Y_j - Y_i}{X_j - X_i} \lt 0 }[/math] - pair is discordant

[math]\displaystyle{ \frac{Y_j - Y_i}{X_j - X_i} = 0 }[/math] - pair is considered a tie

[math]\displaystyle{ X_i = X_j }[/math] - pair is not compared

Kendall's tau is computed as

[math]\displaystyle{ \tau=\frac{N_c - N_d}{N_c + N_d} }[/math]

with [math]\displaystyle{ N_c }[/math] and [math]\displaystyle{ N_d }[/math] denoting the number of concordant pairs and the number of discordant pairs, respectively, in the sample. Ties add 0.5 to both the concordant and discordant counts. There are [math]\displaystyle{ \binom n 2 }[/math] possible pairs in the bivariate sample.
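The pair rules quoted above can be sketched in Python as follows. This is an illustrative sketch rather than Dataplot's implementation: the function name `kendall_tau_nist` is hypothetical, and the inputs are assumed to be equal-length numeric sequences.

```python
from itertools import combinations


def kendall_tau_nist(x, y):
    """Kendall's tau following the quoted pair rules (hypothetical helper)."""
    n_c = 0.0  # concordant count
    n_d = 0.0  # discordant count
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # X_i = X_j: pair is not compared
        slope = (yj - yi) / (xj - xi)
        if slope > 0:
            n_c += 1  # concordant pair
        elif slope < 0:
            n_d += 1  # discordant pair
        else:
            # tie in Y: add 0.5 to both counts
            n_c += 0.5
            n_d += 0.5
    if n_c + n_d == 0:
        raise ValueError("no comparable pairs")
    return (n_c - n_d) / (n_c + n_d)
```

The loop visits each of the [math]\displaystyle{ \binom n 2 }[/math] pairs once, so the cost grows quadratically with the sample size.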

A value of +1 indicates that all pairs are concordant, a value of -1 indicates that all pairs are discordant, and a value of 0 indicates no relation (i.e., independence).

The Kendall tau independence test is a test of whether the Kendall tau coefficient is equal to zero.

For larger n (e.g., n > 60) or the case where there are many ties, the p-th upper quantile of the Kendall tau statistic can be approximated by

[math]\displaystyle{ w_p = z_p\frac{\sqrt{2(2n+5)}}{3\sqrt{n(n-1)}} }[/math]

with [math]\displaystyle{ z_p }[/math] and [math]\displaystyle{ n }[/math] denoting the [math]\displaystyle{ p }[/math]-th quantile of the standard normal distribution and the sample size, respectively. The lower quantile is the negative of the upper quantile.
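As a minimal sketch of the large-sample approximation above, the quantile can be evaluated with the Python standard library. The function name `kendall_tau_upper_quantile` is an assumption made for illustration, and `p` is taken to be a probability strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist


def kendall_tau_upper_quantile(p, n):
    """Approximate p-th quantile w_p of Kendall's tau (hypothetical helper)."""
    if n < 2:
        raise ValueError("need at least two observations")
    z_p = NormalDist().inv_cdf(p)  # p-th quantile of the standard normal
    return z_p * sqrt(2 * (2 * n + 5)) / (3 * sqrt(n * (n - 1)))
```

Because the standard normal is symmetric, calling the function with `1 - p` returns the negative of the value for `p`, which matches the statement that the lower quantile is the negative of the upper quantile.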

For a two-sided test, the p-value is computed as twice the minimum of the lower-tailed and upper-tailed p-values.

For [math]\displaystyle{ n \leq 60 }[/math], tabulated quantiles (from Table A11 on pp. 543-544 of Conover) are used. These quantiles are exact when there are no ties in the data.

2017b

(CM, 2017) ⇒ http://changingminds.org/explanations/research/analysis/kendall.htm
- The Kendall Tau Rank Correlation Coefficient is used to measure the degree of correspondence between sets of rankings where the measures are not equidistant. It is used with non-parametric data.

The Kendall coefficient is denoted with the Greek letter tau (τ).

[math]\displaystyle{ \tau = (4P / (n * (n - 1))) - 1 }[/math]

Where P is the number of concordant pairs and is calculated as the sum over all the items, of items ranked after the given item by both rankings.
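The counting rule for P can be illustrated with a short Python sketch. It assumes two rankings of the same items with no ties (the formula above does not account for ties), and the function name `kendall_tau_from_rankings` is hypothetical.

```python
def kendall_tau_from_rankings(rank_a, rank_b):
    """tau = 4P / (n(n - 1)) - 1 for two tie-free rankings (hypothetical helper)."""
    n = len(rank_a)
    if n < 2 or len(rank_b) != n:
        raise ValueError("need two rankings of the same length >= 2")
    p = 0
    for i in range(n):
        # count the items ranked after item i by both rankings
        p += sum(
            1
            for j in range(n)
            if rank_a[j] > rank_a[i] and rank_b[j] > rank_b[i]
        )
    return 4 * p / (n * (n - 1)) - 1
```

With no ties, P equals the number of concordant pairs, so this agrees with the (P − Q)/(P + Q) form because P + Q = n(n − 1)/2.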

(...) Kendall is used with two ordinal variables or with one ordinal and one interval variable.

Before computers were commonly available, Spearman correlation was often used as a substitute as it was easier to calculate. Kendall is now often viewed as being a superior metric.

The measure is sometimes just referred to as 'Kendall's tau'.

2017c

(Stat 509, 2017) ⇒ Design and Analysis of Clinical Trials, The Pennsylvania State University, 18.3 - Kendall Tau-b Correlation Coefficient https://onlinecourses.science.psu.edu/stat509/node/158
- The Kendall tau-b correlation coefficient, [math]\displaystyle{ \tau_b }[/math], is a nonparametric measure of association based on the number of concordances and discordances in paired observations.

Two observations [math]\displaystyle{ (X_i , Y_i ) }[/math] and [math]\displaystyle{ (X_j , Y_j) }[/math] are concordant if they are in the same order with respect to each variable. That is, if

(1) [math]\displaystyle{ X_i \lt X_j }[/math] and [math]\displaystyle{ Y_i \lt Y_j }[/math] , or if

(2) [math]\displaystyle{ X_i \gt X_j }[/math] and [math]\displaystyle{ Y_i \gt Y_j }[/math]

They are discordant if they are in the reverse ordering for X and Y, or the values are arranged in opposite directions. That is, if

(1) [math]\displaystyle{ X_i \lt X_j }[/math] and [math]\displaystyle{ Y_i \gt Y_j }[/math] , or if

(2) [math]\displaystyle{ X_i \gt X_j }[/math] and [math]\displaystyle{ Y_i \lt Y_j }[/math]

The two observations are tied if [math]\displaystyle{ X_i = X_j }[/math] and/or [math]\displaystyle{ Y_i = Y_j }[/math] .

The total number of pairs that can be constructed for a sample size of [math]\displaystyle{ n }[/math] is

[math]\displaystyle{ N=\binom n 2 = \frac{1}{2}n(n-1) }[/math]

[math]\displaystyle{ N }[/math] can be decomposed into these five quantities:

[math]\displaystyle{ N=P+Q+X_0+Y_0+(XY)_0 }[/math]

where [math]\displaystyle{ P }[/math] is the number of concordant pairs, [math]\displaystyle{ Q }[/math] is the number of discordant pairs, [math]\displaystyle{ X_0 }[/math] is the number of pairs tied only on the [math]\displaystyle{ X }[/math] variable, [math]\displaystyle{ Y_0 }[/math] is the number of pairs tied only on the [math]\displaystyle{ Y }[/math] variable, and [math]\displaystyle{ (XY)_0 }[/math] is the number of pairs tied on both [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math].

The Kendall tau-b for measuring order association between variables X and Y is given by the following formula:

[math]\displaystyle{ t_b=\frac{P-Q}{\sqrt{(P+Q+X_0)(P+Q+Y_0)}} }[/math]

This value is scaled to range between -1 and +1. Unlike the Spearman coefficient, it estimates a population parameter:

[math]\displaystyle{ t_b }[/math] is the sample estimate of [math]\displaystyle{ \tau_b=\Pr[\text{concordance}]-\Pr[\text{discordance}] }[/math]

The Kendall tau-b has properties similar to those of the Spearman [math]\displaystyle{ r_s }[/math]. Because the sample estimate, [math]\displaystyle{ t_b }[/math], does estimate a population parameter, [math]\displaystyle{ \tau_b }[/math], many statisticians prefer the Kendall tau-b to the Spearman rank correlation coefficient.
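The five-way decomposition of the pairs and the resulting tau-b can be sketched in Python as below. This is an illustrative sketch, not the course's code: the function name `kendall_tau_b` is hypothetical, inputs are assumed to be equal-length numeric sequences, and the denominator uses the square-root normalization of the standard tau-b definition (the same form as the SciPy formula quoted on this page).

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b from pair counts P, Q, X_0, Y_0, (XY)_0 (hypothetical helper)."""
    p = q = x0 = y0 = xy0 = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xj - xi, yj - yi
        if dx == 0 and dy == 0:
            xy0 += 1  # tied on both X and Y
        elif dx == 0:
            x0 += 1   # tied only on X
        elif dy == 0:
            y0 += 1   # tied only on Y
        elif dx * dy > 0:
            p += 1    # concordant
        else:
            q += 1    # discordant
    denom = sqrt((p + q + x0) * (p + q + y0))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (p - q) / denom
```

Pairs tied on both variables are counted in `xy0` only to make the decomposition of N explicit; they do not enter the formula.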

2015a

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient Retrieved:2015-8-20.
- In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's tau coefficient (after the Greek letter τ), is a statistic used to measure the association between two measured quantities. A tau test is a non-parametric hypothesis test for statistical dependence based on the tau coefficient.
  It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities. It is named after Maurice Kendall, who developed it in 1938, though Gustav Fechner had proposed a similar measure in the context of time series in 1897.

2015b

(Scipy.org, 2015) ⇒ The Scipy community, Reference Guide https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.kendalltau.html
- Kendall’s tau is a measure of the correspondence between two rankings. Values close to 1 indicate strong agreement, values close to -1 indicate strong disagreement. This is the tau-b version of Kendall’s tau which accounts for ties.

(...) The definition of Kendall’s tau that is used is:

[math]\displaystyle{ \tau = (P - Q) / \sqrt{((P + Q + T) * (P + Q + U))} }[/math]

where P is the number of concordant pairs, Q the number of discordant pairs, T the number of ties only in x, and U the number of ties only in y. If a tie occurs for the same pair in both x and y, it is not added to either T or U.

2011

(Sammut & Webb, 2011) ⇒ Claude Sammut, and Geoffrey I. Webb. (2011). “Rank Correlation.” In: (Sammut & Webb, 2011) p.828
- QUOTE: Kendall’s tau is the number of pairwise rank inversions between τ and τ′, again normalized to the range [−1, +1]: [math]\displaystyle{ (\tau,\tau') \mapsto 1 - \frac{4\,|\{(i,j) \mid i \lt j,\ \tau(i) \lt \tau(j) \wedge \tau'(i) \gt \tau'(j)\}|}{m(m-1)} \quad (2) }[/math]
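The inversion-based form can be sketched in Python as follows. The sketch assumes that the inversion count covers every unordered pair of items that the two rankings order oppositely, and that neither ranking contains ties; the function name `kendall_tau_from_inversions` is hypothetical.

```python
from itertools import combinations


def kendall_tau_from_inversions(tau, tau_prime):
    """1 - 4 * inversions / (m(m - 1)) for two tie-free rankings (hypothetical helper)."""
    m = len(tau)
    if m < 2 or len(tau_prime) != m:
        raise ValueError("need two rankings of the same length >= 2")
    # a pair is an inversion when the two rankings order it oppositely
    inversions = sum(
        1
        for i, j in combinations(range(m), 2)
        if (tau[i] - tau[j]) * (tau_prime[i] - tau_prime[j]) < 0
    )
    return 1 - 4 * inversions / (m * (m - 1))
```

Identical rankings have no inversions and give +1, while fully reversed rankings have m(m − 1)/2 inversions and give −1.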