Sample Covariance Value

A Sample Covariance Value is a sample statistic that measures the covariance between two datasets.

Context:
- It is a point estimate of the population covariance.
- It can be defined, for a bivariate sample [math]\displaystyle{ ((x_1, \cdots, x_n), (y_1, \cdots, y_n)) }[/math] of the random variables [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math], as:

[math]\displaystyle{ s_{xy} =s(x,y)=\frac{1}{n-1} \sum_{i=1}^n [x_i - E(x)][y_i - E(y)] }[/math]

where [math]\displaystyle{ E(x) }[/math] and [math]\displaystyle{ E(y) }[/math] are central tendency measures (e.g. sample mean values), and [math]\displaystyle{ n }[/math] is the sample size.

- It obeys the following properties:
  - If [math]\displaystyle{ c }[/math] is a constant: [math]\displaystyle{ s(x,c)=0\;, \quad s(x+c,y+c)=s(x,y) }[/math] and [math]\displaystyle{ s(c\cdot x,y)=c\cdot s(x,y) }[/math].
  - It generalizes sample variance: [math]\displaystyle{ s(x,x)=s^2(x) }[/math],
  - It is symmetric: [math]\displaystyle{ s(x,y)=s(y,x) }[/math],
  - It is bilinear: [math]\displaystyle{ s(x+y,z)=s(x,z)+s(y,z) }[/math], and consequently [math]\displaystyle{ s^2(x+y)= s^2(x)+2s(x,y)+s^2(y) }[/math].
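
The properties listed above can be checked numerically. The following is a minimal sketch, assuming the sample mean is used as the central tendency measure; the helper name `sample_cov`, the third variable `z` and the constant `c` are hypothetical and chosen only for illustration. The last check uses the variance-of-a-sum identity [math]\displaystyle{ s^2(x+y)=s^2(x)+2s(x,y)+s^2(y) }[/math].

```python
import math


def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator, centred on the sample means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


x = [1.0, 3.0, 10.0, 6.0]
y = [4.0, 5.0, 7.0, 10.0]
z = [2.0, 8.0, 3.0, 5.0]  # hypothetical third variable
c = 2.5                   # hypothetical constant
n = len(x)

x_plus_y = [a + b for a, b in zip(x, y)]

checks = {
    # s(x, c) = 0
    "constant": math.isclose(sample_cov(x, [c] * n), 0.0, abs_tol=1e-12),
    # s(x + c, y + c) = s(x, y)
    "shift": math.isclose(
        sample_cov([a + c for a in x], [b + c for b in y]), sample_cov(x, y)
    ),
    # s(c * x, y) = c * s(x, y)
    "scale": math.isclose(sample_cov([c * a for a in x], y), c * sample_cov(x, y)),
    # s(x, y) = s(y, x)
    "symmetric": math.isclose(sample_cov(x, y), sample_cov(y, x)),
    # s(x + y, z) = s(x, z) + s(y, z)
    "additive": math.isclose(
        sample_cov(x_plus_y, z), sample_cov(x, z) + sample_cov(y, z)
    ),
    # s^2(x + y) = s^2(x) + 2 s(x, y) + s^2(y), with s^2(x) = s(x, x)
    "variance_of_sum": math.isclose(
        sample_cov(x_plus_y, x_plus_y),
        sample_cov(x, x) + 2 * sample_cov(x, y) + sample_cov(y, y),
    ),
}
print(checks)
```

A check on one dataset is an illustration, not a proof; the identities themselves follow from expanding the defining sum.
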
Example(s):
- Let's consider the following sample datasets [math]\displaystyle{ x=\{ 1,3, 10, 6 \} }[/math] and [math]\displaystyle{ y =\{ 4,5,7,10 \} }[/math].
  - Assuming [math]\displaystyle{ E(x)=5 }[/math] and [math]\displaystyle{ E(y)=6.5 }[/math] are the mean values of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math], then

[math]\displaystyle{ s_{xy}=\frac{1}{3}\left[(1-5)(4-6.5)+(3-5)(5-6.5)+(10-5)(7-6.5)+(6-5)(10-6.5)\right]=\frac{19}{3}\approx 6.33 }[/math]
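
The arithmetic of this example can be reproduced step by step. The sketch below assumes the sample means are the central tendency measures; the variable names are illustrative only.

```python
x = [1, 3, 10, 6]
y = [4, 5, 7, 10]
n = len(x)

mean_x = sum(x) / n  # 5.0
mean_y = sum(y) / n  # 6.5

# One product of deviations per observation pair.
products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]

# Divide the sum of products by n - 1 (here 3).
s_xy = sum(products) / (n - 1)

print(products)        # [10.0, 3.0, 2.5, 3.5]
print(round(s_xy, 2))  # 6.33
```

The four products sum to 19, so the sample covariance is 19/3.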

Counter-Example(s):
- Population Covariance.
- Sample Variance Value.
See: Sample Mean, Covariance Matrix Estimate.

References

2017a

(Wikipedia, 2017) ⇒ https://www.wikiwand.com/en/Sample_mean_and_covariance
- The sample mean or empirical mean and the sample covariance are statistics computed from a collection (the sample) of data on one or more random variables.

The sample mean and sample covariance are estimators of the population mean and population covariance, where the term population refers to the set from which the sample was taken.

The sample mean is a vector each of whose elements is the sample mean of one of the random variables, that is, each of whose elements is the arithmetic average of the observed values of one of the variables. The sample covariance matrix is a square matrix whose i, j element is the sample covariance (an estimate of the population covariance) between the sets of observed values of two of the variables and whose i, i element is the sample variance of the observed values of one of the variables. If only one variable has had values observed, then the sample mean is a single number (the arithmetic average of the observed values of that variable) and the sample covariance matrix is also simply a single value (a 1x1 matrix containing a single number, the sample variance of the observed values of that variable).

Due to their ease of calculation and other desirable characteristics, the sample mean and sample covariance are widely used in statistics and applications to numerically represent the location and dispersion, respectively, of a distribution.

2017b

(Wikipedia, 2017) ⇒ https://www.wikiwand.com/en/Covariance#Calculating_the_sample_covariance
- The sample covariance of N observations of K variables is the K-by-K matrix [math]\displaystyle{ \textstyle \overline{ \overline q }=\left[[q_{jk}]\right] }[/math] with the entries

[math]\displaystyle{ q_{jk}=\frac{1}{N-1}\sum_{i=1}^{N}\left( X_{ij}-\bar{X}_j \right) \left( X_{ik}-\bar{X}_k \right), }[/math]

which is an estimate of the covariance between variable j and variable k.
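
The entry-wise definition of the K-by-K matrix can be sketched in plain Python. This is an illustrative implementation under the assumption that the data are given as N rows of K values; the function name `sample_cov_matrix` is hypothetical, and the data reuse the example datasets from this page as two columns.

```python
def sample_cov_matrix(rows):
    """K-by-K sample covariance matrix for N observations (rows) of K variables."""
    n = len(rows)
    num_vars = len(rows[0])
    # Column-wise sample means, one per variable.
    means = [sum(row[j] for row in rows) / n for j in range(num_vars)]
    return [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in rows) / (n - 1)
            for k in range(num_vars)
        ]
        for j in range(num_vars)
    ]


# Each tuple is one observation of (x, y).
data = [(1, 4), (3, 5), (10, 7), (6, 10)]
q = sample_cov_matrix(data)
for row in q:
    print(row)
```

The diagonal entries are the sample variances of each variable, and the off-diagonal entries are equal because the sample covariance is symmetric.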

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random vector [math]\displaystyle{ \textstyle \mathbf{X} }[/math], a row vector whose jth element (j = 1, ..., K) is one of the random variables. The reason the sample covariance matrix has [math]\displaystyle{ \textstyle N-1 }[/math] in the denominator rather than [math]\displaystyle{ \textstyle N }[/math] is essentially that the population mean [math]\displaystyle{ \operatorname{E}(X) }[/math] is not known and is replaced by the sample mean [math]\displaystyle{ \mathbf{\bar{X}} }[/math]. If the population mean [math]\displaystyle{ \operatorname{E}(X) }[/math] is known, the analogous unbiased estimate is given by

[math]\displaystyle{ q_{jk}=\frac 1 N \sum_{i=1}^N \left( X_{ij} - \operatorname{E}(X_j)\right) \left( X_{ik} - \operatorname{E}(X_k)\right). }[/math]
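
The role of the two denominators can be illustrated by exhaustive enumeration rather than simulation. The sketch below makes several assumptions for illustration: the four pairs are treated as an entire population, samples of size 2 are drawn with replacement (so all 16 ordered samples are equally likely), and the helper names `cov_about`, `sample_means` and `average` are hypothetical.

```python
from itertools import product

# Treated here as the whole population (an assumption for illustration).
population = [(1, 4), (3, 5), (10, 7), (6, 10)]
N = len(population)
mu_x = sum(p[0] for p in population) / N
mu_y = sum(p[1] for p in population) / N
sigma_xy = sum((px - mu_x) * (py - mu_y) for px, py in population) / N


def cov_about(sample, cx, cy, denom):
    """Sum of products of deviations about (cx, cy), divided by denom."""
    return sum((sx - cx) * (sy - cy) for sx, sy in sample) / denom


def sample_means(sample):
    m = len(sample)
    return sum(s[0] for s in sample) / m, sum(s[1] for s in sample) / m


def average(values):
    values = list(values)
    return sum(values) / len(values)


n = 2
samples = list(product(population, repeat=n))  # all ordered samples with replacement

# Mean unknown: centre on the sample means and divide by n - 1.
avg_unknown_mean = average(
    cov_about(s, sample_means(s)[0], sample_means(s)[1], n - 1) for s in samples
)
# Mean known: centre on the population means and divide by n.
avg_known_mean = average(cov_about(s, mu_x, mu_y, n) for s in samples)
# Centre on the sample means but divide by n: biased towards zero.
avg_biased = average(
    cov_about(s, sample_means(s)[0], sample_means(s)[1], n) for s in samples
)

print(sigma_xy, avg_unknown_mean, avg_known_mean, avg_biased)
```

Under these assumptions the first two averages coincide with the population covariance, while the third is smaller by the factor (n - 1)/n.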

2015

(Siegrist et al., 2015) ⇒ Retrieved from http://www.math.uah.edu/stat/sample/Covariance.html Copyright 1997-2015 Kyle Siegrist
- The sample covariance is defined to be

[math]\displaystyle{ s(x,y)=\frac{1}{n-1}\sum_{i=1}^n[x_i-m(x)][y_i-m(y)] }[/math]

Assuming that the data vectors are not constant, so that the standard deviations are positive, the sample correlation is defined to be

[math]\displaystyle{ r(x,y)=\frac{s(x,y)}{s(x)s(y)} }[/math]

Note that the sample covariance is an average of the product of the deviations of the [math]\displaystyle{ x }[/math] and y data from their means. Thus, the physical unit of the sample covariance is the product of the units of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math]. Correlation is a standardized version of covariance. In particular, correlation is dimensionless (has no physical units), since the covariance in the numerator and the product of the standard deviations in the denominator have the same units (the product of the units of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math]). Note also that covariance and correlation have the same sign: positive, negative, or zero. In the first case, the data x and y are said to be positively correlated; in the second case x and y are said to be negatively correlated; and in the third case x and y are said to be uncorrelated.
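
The relationship between covariance and correlation described above can be sketched as follows. The helper names `sample_cov` and `sample_corr` are hypothetical, and the factor of 100 stands in for an assumed change of units (for example metres to centimetres) on the x data.

```python
import math


def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator, centred on the sample means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """Sample correlation r(x, y) = s(x, y) / (s(x) s(y)); requires non-constant data."""
    sd_x = math.sqrt(sample_cov(x, x))
    sd_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (sd_x * sd_y)


x = [1, 3, 10, 6]
y = [4, 5, 7, 10]

r = sample_corr(x, y)

# Rescaling x (an assumed unit change) rescales the covariance but not the correlation.
x_scaled = [100 * v for v in x]
cov_scaled = sample_cov(x_scaled, y)
r_scaled = sample_corr(x_scaled, y)

# Negating y flips the sign of both covariance and correlation.
y_neg = [-v for v in y]
r_neg = sample_corr(x, y_neg)

print(sample_cov(x, y), cov_scaled)
print(r, r_scaled, r_neg)
```

For these data the covariance and the correlation are both positive, so x and y are positively correlated in the sense of the quoted passage.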

To see that the sample covariance is a measure of association, recall first that the point [math]\displaystyle{ (m(x),m(y)) }[/math] is a measure of the center of the bivariate data. Indeed, if each point is the location of a unit mass, then [math]\displaystyle{ (m(x),m(y)) }[/math] is the center of mass as defined in physics.

2013

http://www.r-tutor.com/elementary-statistics/numerical-measures/covariance
- QUOTE: The covariance of two variables x and y in a data sample measures how the two are linearly related. A positive covariance would indicate a positive linear relationship between the variables, and a negative covariance would indicate the opposite.
  The sample covariance is defined in terms of the sample means as: [math]\displaystyle{ s_{xy} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) }[/math] Similarly, the population covariance is defined in terms of the population means μx, μy as: [math]\displaystyle{ \sigma_{xy} = \frac{1}{N} \sum_{i=1}^N (x_i - \mu_{x})(y_i - \mu_{y}) }[/math]