Pearson's (r) Product-Moment Correlation Coefficient

From GM-RKB

A Pearson's (r) Product-Moment Correlation Coefficient is a correlation coefficient defined as [math]\displaystyle{ r =\frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} }[/math] where [math]\displaystyle{ x=x_i-\bar{x} }[/math] and [math]\displaystyle{ y=y_i-\bar{y} }[/math] are deviations from the mean, [math]\displaystyle{ x_i }[/math] and [math]\displaystyle{ y_i }[/math] are the values for observation i, and [math]\displaystyle{ \bar{x} }[/math] and [math]\displaystyle{ \bar{y} }[/math] are the respective mean values.
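The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the sample data are assumptions made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for paired observations, using deviations from the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Deviations x = x_i - x_bar and y = y_i - y_bar, as in the definition.
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    denom = math.sqrt(sum_x2 * sum_y2)
    if denom == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sum_xy / denom

# Hypothetical paired data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(xs, ys)  # sum xy = 6, sum x^2 = 10, sum y^2 = 6, so r = 6 / sqrt(60)
```

The guard on a zero denominator reflects that the coefficient is undefined when one of the variables does not vary.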



References

2017a

It is calculated by dividing the covariance of the two variables by the product of their standard deviations.
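[math]\displaystyle{ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1) \, s_x \, s_y} }[/math]
Where x and y are the variables, [math]\displaystyle{ x_i }[/math] is a single value of x, [math]\displaystyle{ \bar{x} }[/math] is the mean of all x's, n is the number of paired observations, and [math]\displaystyle{ s_x }[/math] is the standard deviation of all x's; [math]\displaystyle{ y_i }[/math], [math]\displaystyle{ \bar{y} }[/math] and [math]\displaystyle{ s_y }[/math] are defined in the same way for y.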
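As a rough sketch of the covariance form, the following Python computes the sample covariance and divides it by the product of the sample standard deviations. The function name and the data are illustrative assumptions; `statistics.mean` and `statistics.stdev` come from the Python standard library, and `stdev` uses the n - 1 divisor.

```python
import statistics

def pearson_r_from_covariance(xs, ys):
    """r = cov(x, y) / (s_x * s_y), with sample (n - 1) normalization."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # Sample covariance: sum of cross-deviations divided by n - 1.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = statistics.stdev(xs)  # sample standard deviation of x
    s_y = statistics.stdev(ys)  # sample standard deviation of y
    return cov_xy / (s_x * s_y)

# Hypothetical paired data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r_cov = pearson_r_from_covariance(xs, ys)  # about 0.77 for this data
```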
(...) Pearson is a parametric statistic and assumes:
  • A normal distribution.
  • Interval or ratio data.
  • A linear relationship between X and Y
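The linearity assumption in the list above can be illustrated with a small Python sketch: a perfectly deterministic but non-linear relationship can still give r = 0. The helper name and the data are assumptions made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via sums of deviations from the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                    * sum((y - y_bar) ** 2 for y in ys))
    return num / den

# y is fully determined by x (y = x^2), but the relationship is not linear.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]
r_quadratic = pearson_r(xs, ys)  # the positive and negative cross-products cancel
```

Here the cross-products of the deviations sum to zero, so r is zero even though y depends entirely on x.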
The coefficient of determination, [math]\displaystyle{ r^2 }[/math], represents the proportion of the variance in the dependent variable explained by the independent variable.
Correlation explains a certain amount of variance, but not all. This works on a square law, so a correlation of 0.5 indicates that the independent variable explains 25% of the variance of the dependent variable, and a correlation of 0.9 accounts for 81% of the variance.
This means that the unexplained variance is indicated by [math]\displaystyle{ 1 - r^2 }[/math]. This is typically due to random factors.
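The square-law arithmetic can be checked with a short Python sketch. The function name `variance_split` is a hypothetical helper introduced only for this illustration.

```python
def variance_split(r):
    """Return (explained, unexplained) variance shares for a correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    explained = r * r            # coefficient of determination, r^2
    unexplained = 1.0 - explained
    return explained, unexplained

half = variance_split(0.5)   # 25% explained, 75% unexplained
high = variance_split(0.9)   # roughly 81% explained, 19% unexplained
```

Because r is squared, a negative correlation of the same magnitude gives the same explained share.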
Pearson's Correlation is also known as the Pearson Product-Moment Correlation or Sample Correlation Coefficient. 'r' is also known as 'Pearson's r'.

2017b

Product-moment correlation coefficient. The correlation r between two variables is:
[math]\displaystyle{ r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}} }[/math]
where Σ is the summation symbol, [math]\displaystyle{ x = x_i - \bar{x} }[/math], [math]\displaystyle{ x_i }[/math] is the x value for observation i, [math]\displaystyle{ \bar{x} }[/math] is the mean x value, [math]\displaystyle{ y = y_i - \bar{y} }[/math], [math]\displaystyle{ y_i }[/math] is the y value for observation i, and [math]\displaystyle{ \bar{y} }[/math] is the mean y value.
The formula below uses population means and population standard deviations to compute a population correlation coefficient (ρ) from population data.
Population correlation coefficient. The correlation ρ between two variables is:
[math]\displaystyle{ \rho = \frac{1}{N} \sum \left[ \left( \frac{X_i - \mu_X}{\sigma_x} \right) \left( \frac{Y_i - \mu_Y}{\sigma_y} \right) \right] }[/math]
where N is the number of observations in the population, Σ is the summation symbol, [math]\displaystyle{ X_i }[/math] is the X value for observation i, [math]\displaystyle{ \mu_X }[/math] is the population mean for variable X, [math]\displaystyle{ Y_i }[/math] is the Y value for observation i, [math]\displaystyle{ \mu_Y }[/math] is the population mean for variable Y, [math]\displaystyle{ \sigma_x }[/math] is the population standard deviation of X, and [math]\displaystyle{ \sigma_y }[/math] is the population standard deviation of Y.
The formula below uses sample means and sample standard deviations to compute a correlation coefficient (r) from sample data.
Sample correlation coefficient. The correlation r between two variables is:
[math]\displaystyle{ r = \frac{1}{n - 1} \sum \left[ \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \right] }[/math]
where n is the number of observations in the sample, Σ is the summation symbol, [math]\displaystyle{ x_i }[/math] is the x value for observation i, [math]\displaystyle{ \bar{x} }[/math] is the sample mean of x, [math]\displaystyle{ y_i }[/math] is the y value for observation i, [math]\displaystyle{ \bar{y} }[/math] is the sample mean of y, [math]\displaystyle{ s_x }[/math] is the sample standard deviation of x, and [math]\displaystyle{ s_y }[/math] is the sample standard deviation of y.
Each of the latter two formulas can be derived from the first formula. Use the first or second formula when you have data from the entire population. Use the third formula when you only have sample data, but want to estimate the correlation in the population. When in doubt, use the first formula.
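The relationship between the three formulas can be sketched in Python. When all three are applied to the same set of paired values they return the same number, because the 1/N or 1/(n - 1) factor cancels against the matching divisor inside the standard deviations; the practical difference lies in whether the result is read as a population value or a sample estimate. All function names and data below are illustrative assumptions.

```python
import math

def _mean(vals):
    return sum(vals) / len(vals)

def _sd(vals, ddof):
    # ddof=0 gives the population SD (divide by N);
    # ddof=1 gives the sample SD (divide by n - 1).
    m = _mean(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / (len(vals) - ddof))

def r_product_moment(xs, ys):
    """First formula: sum(xy) / sqrt(sum(x^2) * sum(y^2)) on deviations."""
    mx, my = _mean(xs), _mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den

def r_standardized(xs, ys, ddof):
    """Second (ddof=0) or third (ddof=1) formula: mean product of z-scores."""
    mx, my = _mean(xs), _mean(ys)
    sx, sy = _sd(xs, ddof), _sd(ys, ddof)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - ddof)

# Hypothetical paired data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
first = r_product_moment(xs, ys)
second = r_standardized(xs, ys, ddof=0)  # population form
third = r_standardized(xs, ys, ddof=1)   # sample form
```

Up to floating-point rounding, `first`, `second` and `third` agree for this data.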

2015

  1. See:
     • As early as 1877, Galton was using the term "reversion" and the symbol "r" for what would become "regression". F. Galton (5, 12, 19 April 1877) "Typical laws of heredity," Nature, 15 (388, 389, 390) : 492–495 ; 512–514 ; 532–533. In the "Appendix" on page 532, Galton uses the term "reversion" and the symbol r.
     • (F. Galton) (September 24, 1885), "The British Association: Section II, Anthropology: Opening address by Francis Galton, F.R.S., etc., President of the Anthropological Institute, President of the Section," Nature, 32 (830) : 507–510.
     • Galton, F. (1886) "Regression towards mediocrity in hereditary stature," Journal of the Anthropological Institute of Great Britain and Ireland, 15 : 246–263.
  2. Karl Pearson (June 20, 1895) "Notes on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, 58 : 240–242.