Sample Covariance Value

From GM-RKB

A Sample Covariance Value is a sample statistic that measures the covariance between two datasets.

  • Context:
    • It is a point estimate of the population covariance.
    • It can be defined, for a bivariate sample [math]\displaystyle{ ((x_1, \cdots, x_n), (y_1, \cdots, y_n)) }[/math] of the random variables [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math], as:
[math]\displaystyle{ s_{xy} = s(x,y) = \frac{1}{n-1} \sum_{i=1}^n [x_i - E(x)][y_i - E(y)] }[/math]
where [math]\displaystyle{ E(x) }[/math] and [math]\displaystyle{ E(y) }[/math] are central tendency measures (e.g. sample mean values) and [math]\displaystyle{ n }[/math] is the sample size.
    • It obeys the following properties:
      • If [math]\displaystyle{ c }[/math] is a constant: [math]\displaystyle{ s(x,c)=0\;, \quad s(x+c,y+c)=s(x,y) }[/math] and [math]\displaystyle{ s(c\cdot x,y)=c\cdot s(x,y) }[/math].
      • It generalizes sample variance: [math]\displaystyle{ s(x,x)=s^2(x) }[/math],
      • It is symmetric: [math]\displaystyle{ s(x,y)=s(y,x) }[/math],
      • It is bilinear: [math]\displaystyle{ s(x+y,z)=s(x,z)+s(y,z) }[/math], so the sample variance of a sum expands as [math]\displaystyle{ s^2(x+y)= s^2(x)+2s(x,y)+s^2(y) }[/math].
  • Example(s):
    • Let's consider the following sample datasets [math]\displaystyle{ x=\{ 1,3, 10, 6 \} }[/math] and [math]\displaystyle{ y =\{ 4,5,7,10 \} }[/math].
      • Since [math]\displaystyle{ E(x)=5 }[/math] and [math]\displaystyle{ E(y)=6.5 }[/math] are the mean values of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math], then
[math]\displaystyle{ s_{xy}=\frac{1}{3}\left[(1-5)(4-6.5)+(3-5)(5-6.5)+(10-5)(7-6.5)+(6-5)(10-6.5)\right]=\frac{19}{3}\approx 6.33 }[/math]
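The definition, the listed properties, and the worked example can be checked numerically. The following is a minimal Python sketch, assuming the central tendency measure is the sample mean; the function name sample_covariance is illustrative and does not come from the source.

```python
import math


def sample_covariance(x, y):
    """Sample covariance with an n - 1 denominator, centered at the sample means."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


# Datasets from the example above.
x = [1, 3, 10, 6]
y = [4, 5, 7, 10]
s_xy = sample_covariance(x, y)
print(round(s_xy, 2))  # 6.33

# Spot checks of the listed properties (c is an arbitrary constant).
c = 2.5
assert math.isclose(sample_covariance(y, x), s_xy)                      # symmetry
assert math.isclose(sample_covariance(x, [c] * len(x)), 0.0, abs_tol=1e-12)
assert math.isclose(
    sample_covariance([xi + c for xi in x], [yi + c for yi in y]), s_xy
)                                                                       # shift invariance
assert math.isclose(sample_covariance([c * xi for xi in x], y), c * s_xy)

# Variance of a sum: s^2(x+y) = s^2(x) + 2 s(x,y) + s^2(y).
xy_sum = [xi + yi for xi, yi in zip(x, y)]
assert math.isclose(
    sample_covariance(xy_sum, xy_sum),
    sample_covariance(x, x) + 2 * s_xy + sample_covariance(y, y),
)
```

These checks confirm the properties only for this one dataset; they are not a proof.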


References

2017a

The sample mean and sample covariance are estimators of the population mean and population covariance, where the term population refers to the set from which the sample was taken.
The sample mean is a vector each of whose elements is the sample mean of one of the random variables, that is, each of whose elements is the arithmetic average of the observed values of one of the variables. The sample covariance matrix is a square matrix whose i, j element is the sample covariance (an estimate of the population covariance) between the sets of observed values of two of the variables and whose i, i element is the sample variance of the observed values of one of the variables. If only one variable has had values observed, then the sample mean is a single number (the arithmetic average of the observed values of that variable) and the sample covariance matrix is also simply a single value (a 1x1 matrix containing a single number, the sample variance of the observed values of that variable).
Due to their ease of calculation and other desirable characteristics, the sample mean and sample covariance are widely used in statistics and applications to numerically represent the location and dispersion, respectively, of a distribution.

2017b

[math]\displaystyle{ q_{jk}=\frac{1}{N-1}\sum_{i=1}^{N}\left( X_{ij}-\bar{X}_j \right) \left( X_{ik}-\bar{X}_k \right), }[/math]
which is an estimate of the covariance between variable j and variable k.
The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random vector [math]\displaystyle{ \textstyle \mathbf{X} }[/math], a row vector whose jth element (j = 1, ..., K) is one of the random variables. The reason the sample covariance matrix has [math]\displaystyle{ \textstyle N-1 }[/math] in the denominator rather than [math]\displaystyle{ \textstyle N }[/math] is essentially that the population mean [math]\displaystyle{ \operatorname{E}(X) }[/math] is not known and is replaced by the sample mean [math]\displaystyle{ \mathbf{\bar{X}} }[/math]. If the population mean [math]\displaystyle{ \operatorname{E}(X) }[/math] is known, the analogous unbiased estimate is given by
[math]\displaystyle{ q_{jk}=\frac 1 N \sum_{i=1}^N \left( X_{ij} - \operatorname{E}(X_j)\right) \left( X_{ik} - \operatorname{E}(X_k)\right). }[/math]
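The two estimators above differ only in the centering vector and the denominator. A minimal pure-Python sketch follows, assuming the observations are stored as N rows of K values; the function name sample_covariance_matrix and its known_means parameter are hypothetical, and treating (5, 6.5) as known population means is an assumption made only for illustration.

```python
def sample_covariance_matrix(rows, known_means=None):
    """Return the K x K matrix q_jk for N observations of K variables.

    With known_means=None, center at the sample means and divide by N - 1.
    With known_means given, center at those means and divide by N.
    """
    n = len(rows)
    k = len(rows[0])
    if known_means is None:
        if n < 2:
            raise ValueError("at least two observations are required")
        means = [sum(row[j] for row in rows) / n for j in range(k)]
        denom = n - 1
    else:
        means = list(known_means)
        denom = n
    return [
        [
            sum((row[j] - means[j]) * (row[m] - means[m]) for row in rows) / denom
            for m in range(k)
        ]
        for j in range(k)
    ]


# Each row is one observation (x_i, y_i) from the example datasets.
rows = [[1, 4], [3, 5], [10, 7], [6, 10]]

Q = sample_covariance_matrix(rows)                                 # N - 1 denominator
Q_known = sample_covariance_matrix(rows, known_means=[5.0, 6.5])   # N denominator

print(Q)
print(Q_known)
```

The diagonal entries of Q are the sample variances and the off-diagonal entries are the sample covariances, so the matrix is symmetric.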

2015

[math]\displaystyle{ s(x,y)=\frac{1}{n-1}\sum_{i=1}^n[x_i-m(x)][y_i-m(y)] }[/math]
Assuming that the data vectors are not constant, so that the standard deviations are positive, the sample correlation is defined to be
[math]\displaystyle{ r(x,y)=\frac{s(x,y)}{s(x)s(y)} }[/math]
Note that the sample covariance is an average of the product of the deviations of the [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] data from their means. Thus, the physical unit of the sample covariance is the product of the units of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math]. Correlation is a standardized version of covariance. In particular, correlation is dimensionless (has no physical units), since the covariance in the numerator and the product of the standard deviations in the denominator have the same units (the product of the units of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math]). Note also that covariance and correlation have the same sign: positive, negative, or zero. In the first case, the data [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] are said to be positively correlated; in the second case [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] are said to be negatively correlated; and in the third case [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] are said to be uncorrelated.
To see that the sample covariance is a measure of association, recall first that the point [math]\displaystyle{ (m(x),m(y)) }[/math] is a measure of the center of the bivariate data. Indeed, if each point is the location of a unit mass, then [math]\displaystyle{ (m(x),m(y)) }[/math] is the center of mass as defined in physics.
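The relationship between sample covariance and sample correlation described above can be illustrated with a short Python sketch. It assumes m(x) and m(y) are the sample means; the function names are illustrative and the data reuse the example datasets from this page.

```python
import math


def sample_covariance(x, y):
    """Sample covariance with an n - 1 denominator, centered at the sample means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Sample correlation r(x, y) = s(x, y) / (s(x) s(y))."""
    s_x = math.sqrt(sample_covariance(x, x))
    s_y = math.sqrt(sample_covariance(y, y))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return sample_covariance(x, y) / (s_x * s_y)


x = [1, 3, 10, 6]
y = [4, 5, 7, 10]
r = sample_correlation(x, y)
print(round(r, 2))  # 0.61

# Changing the unit of x (e.g. metres to centimetres) rescales the covariance
# but leaves the correlation unchanged, because correlation is dimensionless.
x_cm = [100 * xi for xi in x]
cov_ratio = sample_covariance(x_cm, y) / sample_covariance(x, y)
r_cm = sample_correlation(x_cm, y)
print(round(cov_ratio, 6), round(r_cm, 2))
```

Here the covariance and the correlation are both positive, so these data are positively correlated.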

2013