Sample Correlation Coefficient Value


A Sample Correlation Coefficient Value is a sample statistic that estimates the correlation coefficient between two paired datasets.
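As an informal illustration (not from the referenced sources), such a value can be computed directly from two paired datasets; the sketch below uses NumPy's corrcoef on made-up data:

```python
import numpy as np

# Two paired samples (made-up values for illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the sample Pearson correlation coefficient r.
r = np.corrcoef(x, y)[0, 1]
print(r)  # close to +1: x and y are strongly positively correlated
```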



References

2017a

[math]\displaystyle{ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) }[/math]
where [math]\displaystyle{ n }[/math] is the number of observations in the sample, [math]\displaystyle{ x_i }[/math] and [math]\displaystyle{ y_i }[/math] are the x and y values for observation i, [math]\displaystyle{ \bar{x} }[/math] and [math]\displaystyle{ \bar{y} }[/math] are the sample means of x and y, and [math]\displaystyle{ s_x }[/math] and [math]\displaystyle{ s_y }[/math] are the sample standard deviations of x and y.
This is the sample form of the correlation formula: it uses sample means and sample standard deviations, and it is the version to use when you only have sample data but want to estimate the correlation in the population.
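As a reading aid (not part of the quoted source), here is a minimal Python sketch of this formula, assuming plain Python lists as input; the helper name pearson_r is hypothetical:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation via standardized deviations:
    r = 1/(n-1) * sum( (x_i - xbar)/s_x * (y_i - ybar)/s_y )."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return sum(((x - xbar) / s_x) * ((y - ybar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))  # roughly 0.96
```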

2017b

  • (Stat 509, 2017) ⇒ "18.1 - Pearson Correlation Coefficient." In: Design and Analysis of Clinical Trials (STAT 509), The Pennsylvania State University. https://onlinecourses.science.psu.edu/stat509/node/156
    • Suppose that we have two variables of interest, denoted as [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math], and suppose that we have a bivariate sample of size [math]\displaystyle{ n }[/math]:
[math]\displaystyle{ (X_1 , Y_1 ), (X_2 , Y_2 ), ... , (X_n , Y_n ) }[/math]
and we define the following statistics:
[math]\displaystyle{ \bar{X}=\frac{1}{n}\sum_{i=1}^n X_i,\quad S_{XX}=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2 }[/math]
[math]\displaystyle{ \bar{Y}=\frac{1}{n}\sum_{i=1}^n Y_i,\quad S_{YY}=\frac{1}{n-1}\sum_{i=1}^n(Y_i-\bar{Y})^2 }[/math]
[math]\displaystyle{ S_{XY}=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y}) }[/math]
These statistics above represent the sample mean for X, the sample variance for X, the sample mean for Y, the sample variance for Y, and the sample covariance between X and Y, respectively. These should be very familiar to you.
The sample Pearson correlation coefficient (also called the sample product-moment correlation coefficient) for measuring the association between variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] is given by the following formula:
[math]\displaystyle{ r_p=\frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}} }[/math]
The sample Pearson correlation coefficient, [math]\displaystyle{ r_p }[/math] , is the point estimate of the population Pearson correlation coefficient
[math]\displaystyle{ \rho_p=\frac{\sigma_{XY}}{\sqrt{\sigma_{XX}\sigma_{YY}}} }[/math]
The Pearson correlation coefficient measures the degree of linear relationship between [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] and [math]\displaystyle{ -1 \leq r_p \leq +1 }[/math], so that [math]\displaystyle{ r_p }[/math] is a "unitless" quantity, i.e., when you construct the correlation coefficient the units of measurement that are used cancel out. A value of +1 reflects perfect positive correlation and a value of -1 reflects perfect negative correlation.
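Following the definitions quoted above, a minimal Python sketch of [math]\displaystyle{ r_p }[/math] (the data and variable names are illustrative, not code from the Stat 509 source):

```python
import math

X = [2.0, 4.0, 6.0, 8.0]   # made-up bivariate sample
Y = [1.0, 3.0, 2.0, 5.0]
n = len(X)

X_bar = sum(X) / n
Y_bar = sum(Y) / n

# Sample variances S_XX, S_YY and sample covariance S_XY (divide by n - 1).
S_XX = sum((x - X_bar) ** 2 for x in X) / (n - 1)
S_YY = sum((y - Y_bar) ** 2 for y in Y) / (n - 1)
S_XY = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y)) / (n - 1)

# Sample Pearson correlation coefficient r_p = S_XY / sqrt(S_XX * S_YY).
r_p = S_XY / math.sqrt(S_XX * S_YY)
print(r_p)  # about 0.83 for this made-up sample; always in [-1, +1]
```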

2015a

The sample covariance of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] is defined to be
[math]\displaystyle{ s(x,y)=\frac{1}{n-1}\sum_{i=1}^n[x_i-m(x)][y_i-m(y)] }[/math]
Assuming that the data vectors are not constant, so that the standard deviations are positive, the sample correlation is defined to be
[math]\displaystyle{ r(x,y)=\frac{s(x,y)}{s(x)s(y)} }[/math]
Note that the sample covariance is an average of the product of the deviations of the [math]\displaystyle{ x }[/math] and y data from their means. Thus, the physical unit of the sample covariance is the product of the units of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math]. Correlation is a standardized version of covariance. In particular, correlation is dimensionless (has no physical units), since the covariance in the numerator and the product of the standard deviations in the denominator have the same units (the product of the units of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math]). Note also that covariance and correlation have the same sign: positive, negative, or zero. In the first case, the data x and y are said to be positively correlated; in the second case x and y are said to be negatively correlated; and in the third case x and y are said to be uncorrelated.
To see that the sample covariance is a measure of association, recall first that the point [math]\displaystyle{ (m(x),m(y)) }[/math] is a measure of the center of the bivariate data. Indeed, if each point is the location of a unit mass, then [math]\displaystyle{ (m(x),m(y)) }[/math] is the center of mass as defined in physics.
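To make the "dimensionless" point concrete, the sketch below (not from the quoted source; the data and the meters-to-centimeters rescaling are made up) shows that rescaling x changes the sample covariance but leaves the sample correlation unchanged:

```python
import numpy as np

x = np.array([1.2, 2.4, 3.1, 4.8, 5.0])        # e.g. lengths in meters (made up)
y = np.array([10.0, 19.5, 31.0, 42.2, 48.9])

# Sample covariance (ddof=1 divides by n - 1) and sample correlation.
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r_xy = np.corrcoef(x, y)[0, 1]

# Re-express x in centimeters: the covariance is multiplied by 100,
# but the correlation coefficient is unchanged because it has no units.
x_cm = 100 * x
print(cov_xy, np.cov(x_cm, y, ddof=1)[0, 1])   # second value is 100x the first
print(r_xy, np.corrcoef(x_cm, y)[0, 1])        # identical values
```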

2015b

The Pearson product-moment correlation coefficient is a measure of the linear correlation (dependence) between two variables X and Y, giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation. It is widely used in the sciences as a measure of the degree of linear dependence between two variables. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. [1] [2]

  1. See: (a) As early as 1877, Galton was using the term "reversion" and the symbol "r" for what would become "regression": F. Galton (5, 12, 19 April 1877). "Typical laws of heredity." Nature, 15 (388, 389, 390): 492–495; 512–514; 532–533. In the "Appendix" on page 532, Galton uses the term "reversion" and the symbol r. (b) (F. Galton) (September 24, 1885). "The British Association: Section II, Anthropology: Opening address by Francis Galton, F.R.S., etc., President of the Anthropological Institute, President of the Section." Nature, 32 (830): 507–510. (c) Galton, F. (1886). "Regression towards mediocrity in hereditary stature." Journal of the Anthropological Institute of Great Britain and Ireland, 15: 246–263.
  2. Karl Pearson (June 20, 1895) "Notes on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, 58 : 240–242.