# Covariance Function


## References

### 2014

• (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Gaussian_process#Covariance_Functions Retrieved:2014-01-30.
• A key fact of Gaussian processes is that they can be completely defined by their second-order statistics.[1] Thus, if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour. The covariance matrix K between all pairs of points x and x' specifies a distribution on functions and is known as the Gram matrix. Importantly, because every valid covariance function is a scalar product of vectors, by construction the matrix K is non-negative definite. Equivalently, the covariance function K is a non-negative definite function in the sense that for every finite set of points $x_1, \dots, x_n$ and real coefficients $a_1, \dots, a_n$, $\sum_{i,j} a_i a_j K(x_i, x_j) \geq 0$; if the inequality is strict for every non-zero choice of coefficients, K is called positive definite. Importantly, the non-negative definiteness of K enables its spectral decomposition using the Karhunen–Loève expansion. Basic aspects that can be defined through the covariance function are the process' stationarity, isotropy, smoothness and periodicity.[2] [3]
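As a numerical illustration of this non-negative definiteness, the sketch below (a minimal example assuming NumPy; the `se_kernel` and `gram_matrix` helpers are illustrative, not from the text) builds the Gram matrix of the squared-exponential covariance function on a small grid and checks that its eigenvalues are non-negative:

```python
import numpy as np

def se_kernel(x, xp, l=1.0):
    """Squared-exponential covariance k(x, x') = exp(-|x - x'|^2 / (2 l^2))."""
    return np.exp(-np.abs(x - xp) ** 2 / (2 * l ** 2))

def gram_matrix(xs, kernel):
    """Gram matrix K with K[i, j] = kernel(xs[i], xs[j])."""
    return np.array([[kernel(xi, xj) for xj in xs] for xi in xs])

xs = np.linspace(0.0, 4.0, 10)
K = gram_matrix(xs, se_kernel)

# A valid covariance function yields a non-negative definite Gram matrix:
# all eigenvalues are >= 0 (up to floating-point tolerance).
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-10
```

The same check applies to any of the covariance functions listed later in the article; a negative eigenvalue (beyond round-off) would signal that the candidate function is not a valid covariance.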

Stationarity refers to the process' behaviour regarding the separation of any two points x and x'. If the process is stationary, its covariance depends only on their separation, x − x', while if it is non-stationary it depends on the actual positions of the points x and x'. An example of a stationary process is the Ornstein–Uhlenbeck process. By contrast, Brownian motion, a limiting case of the Ornstein–Uhlenbeck process, is non-stationary.

If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be homogeneous;[4] in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer.

Ultimately Gaussian processes translate as taking priors on functions, and the smoothness of these priors can be induced by the covariance function.[2] If we expect that "near-by" input points x and x' should have "near-by" output points y and y', then the assumption of smoothness is present. If we wish to allow for significant displacement then we might choose a rougher covariance function. Extreme examples of this behaviour are the Ornstein–Uhlenbeck covariance function, which is never differentiable, and the squared exponential, which is infinitely differentiable.
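The roughness contrast between the two extremes can be seen by drawing sample paths from each prior. The sketch below (an illustrative example assuming NumPy; kernel choices and the `rough` proxy are mine, not from the text) samples one zero-mean path per kernel via a Cholesky factor and compares mean squared increments between neighbouring grid points:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 5.0, 101)
d = np.abs(xs[:, None] - xs[None, :])

K_se = np.exp(-d ** 2 / 2.0)  # squared exponential, l = 1 (smooth prior)
K_ou = np.exp(-d)             # Ornstein-Uhlenbeck, l = 1 (rough prior)

def sample(K, jitter=1e-6):
    """Draw one zero-mean sample path via a Cholesky factor of K.

    The small jitter on the diagonal keeps the factorization numerically
    stable, since dense squared-exponential Gram matrices are near-singular.
    """
    L = np.linalg.cholesky(K + jitter * np.eye(len(K)))
    return L @ rng.standard_normal(len(K))

f_se = sample(K_se)
f_ou = sample(K_ou)

# Roughness proxy: mean squared increment between neighbouring grid points.
rough = lambda f: np.mean(np.diff(f) ** 2)
```

On a fine grid the Ornstein–Uhlenbeck path accumulates much larger increments than the squared-exponential path, reflecting its lack of differentiability.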

Periodicity refers to inducing periodic patterns within the behaviour of the process. Formally, this is achieved by mapping the input x to a two-dimensional vector u(x) = (cos(x), sin(x)).
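This warping can be checked numerically: applying the squared-exponential covariance to u(x) and u(x') reproduces the periodic covariance of the list below, because |u(x) − u(x')|² = 4 sin²((x − x')/2). A small sketch (assuming NumPy; function names are illustrative):

```python
import numpy as np

def u(x):
    """Warp a scalar input onto the unit circle."""
    return np.array([np.cos(x), np.sin(x)])

def se(a, b, l=1.0):
    """Squared exponential on vectors: exp(-|a - b|^2 / (2 l^2))."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * l ** 2))

def periodic(x, xp, l=1.0):
    """Periodic covariance exp(-2 sin^2((x - x') / 2) / l^2)."""
    return np.exp(-2 * np.sin((x - xp) / 2) ** 2 / l ** 2)

x, xp = 0.7, 2.9
# The warped squared exponential and the periodic kernel agree.
assert np.isclose(se(u(x), u(xp)), periodic(x, xp))
```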

There are a number of common covariance functions:[3]

• Constant : $K_\text{C}(x,x') = C$
• Linear: $K_\text{L}(x,x') = x^T x'$
• Gaussian Noise: $K_\text{GN}(x,x') = \sigma^2 \delta_{x,x'}$
• Squared Exponential: $K_\text{SE}(x,x') = \exp \Big(-\frac{|d|^2}{2l^2} \Big)$
• Ornstein–Uhlenbeck: $K_\text{OU}(x,x') = \exp \Big(-\frac{|d| }{l} \Big)$
• Matérn: $K_\text{Matern}(x,x') = \frac{2^{1-\nu}}{\Gamma(\nu)} \Big(\frac{\sqrt{2\nu}|d|}{l} \Big)^\nu K_{\nu}\Big(\frac{\sqrt{2\nu}|d|}{l} \Big)$
• Periodic: $K_\text{P}(x,x') = \exp\Big(-\frac{ 2\sin^2(\frac{d}{2})}{ l^2} \Big)$
• Rational Quadratic: $K_\text{RQ}(x,x') = (1+|d|^2)^{-\alpha}, \quad \alpha \geq 0$
Here $d = x - x'$. The parameter $l$ is the characteristic length-scale of the process (practically, "how far apart" two points $x$ and $x'$ have to be for $X$ to change significantly), δ is the Kronecker delta, and σ is the standard deviation of the noise fluctuations. $K_\nu$ is the modified Bessel function of order $\nu$ and $\Gamma$ is the gamma function. Importantly, a complicated covariance function can be defined as a linear combination of simpler covariance functions in order to incorporate different insights about the data-set at hand.
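The common covariance functions above can be sketched directly for scalar inputs (a minimal NumPy illustration; to stay dependency-free, the Matérn kernel is given only in its well-known closed forms for ν ∈ {1/2, 3/2, 5/2} rather than via the Bessel function):

```python
import numpy as np

def k_constant(x, xp, C=1.0):
    return C

def k_linear(x, xp):
    return x * xp  # x^T x' reduces to a product for scalar inputs

def k_noise(x, xp, sigma=1.0):
    return sigma ** 2 * float(x == xp)  # Kronecker delta

def k_se(x, xp, l=1.0):
    return np.exp(-np.abs(x - xp) ** 2 / (2 * l ** 2))

def k_ou(x, xp, l=1.0):
    return np.exp(-np.abs(x - xp) / l)

def k_matern(x, xp, l=1.0, nu=1.5):
    d = np.abs(x - xp)
    if nu == 0.5:  # coincides with the Ornstein-Uhlenbeck kernel
        return np.exp(-d / l)
    if nu == 1.5:
        z = np.sqrt(3.0) * d / l
        return (1 + z) * np.exp(-z)
    if nu == 2.5:
        z = np.sqrt(5.0) * d / l
        return (1 + z + z ** 2 / 3.0) * np.exp(-z)
    raise ValueError("closed form implemented only for nu in {0.5, 1.5, 2.5}")

def k_rq(x, xp, alpha=1.0):
    return (1 + np.abs(x - xp) ** 2) ** (-alpha)

# Linear combinations of valid kernels are again valid kernels, e.g. a
# smooth signal plus independent observation noise:
k_combined = lambda x, xp: k_se(x, xp) + k_noise(x, xp, sigma=0.1)
```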

Clearly, the inferential results are dependent on the values of the hyperparameters θ (e.g. $l$ and σ) defining the model's behaviour. A popular choice for θ is to provide maximum a posteriori (MAP) estimates by maximizing the marginal likelihood of the observations $y$; the marginalization is done over the latent function values of the process.[3] This approach is also known as maximum likelihood II, evidence maximization, or Empirical Bayes.[5]
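A minimal sketch of this hyperparameter selection (assuming NumPy and a zero-mean GP with squared-exponential kernel plus observation noise; the synthetic data and the crude grid search are illustrative, where a gradient-based optimizer would normally be used):

```python
import numpy as np

def neg_log_marginal_likelihood(theta, x, y):
    """Negative log marginal likelihood of a zero-mean GP with an SE kernel
    plus i.i.d. observation noise; theta = (log l, log sigma)."""
    l, sigma = np.exp(theta)
    d = x[:, None] - x[None, :]
    K = np.exp(-d ** 2 / (2 * l ** 2)) + sigma ** 2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 y^T K^{-1} y + 0.5 log|K| + (n/2) log(2 pi)
    return (0.5 * y @ alpha + np.sum(np.log(np.diag(L)))
            + 0.5 * len(x) * np.log(2 * np.pi))

# Synthetic observations of a smooth function with small noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 30)
y = np.sin(x) + 0.1 * rng.standard_normal(len(x))

# Crude grid search over (l, sigma).
grid = [(np.log(l), np.log(s))
        for l in (0.1, 1.0, 3.0) for s in (0.05, 0.1, 0.5)]
best = min(grid, key=lambda th: neg_log_marginal_likelihood(np.array(th), x, y))
```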

1. Cite error: Invalid <ref> tag; no text was provided for refs named prml
2. Rasmussen, C.E.; Williams, C.K.I (2006). Gaussian Processes for Machine Learning. MIT Press. ISBN 0-262-18253-X.
3. Grimmett, Geoffrey; David Stirzaker (2001). Probability and Random Processes. Oxford University Press. ISBN 0198572220.
4. Cite error: Invalid <ref> tag; no text was provided for refs named seegerGPML

### 2012

• http://en.wikipedia.org/wiki/Covariance
• QUOTE: In probability theory and statistics, covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e. the variables tend to show similar behavior, the covariance is a positive number. In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e. the variables tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not that easy to interpret. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.

A distinction must be made between (1) the covariance of two random variables, which is a population parameter that can be seen as a property of the joint probability distribution, and (2) the sample covariance, which serves as an estimated value of the parameter.
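The sample covariance described in point (2) can be computed directly (a small NumPy illustration with made-up data; the helper name is mine):

```python
import numpy as np

def sample_covariance(x, y):
    """Unbiased sample covariance: sum((x_i - mean x)(y_i - mean y)) / (n - 1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # y grows with x, so the covariance is positive
cov_xy = sample_covariance(x, y)
```

Since this toy data is perfectly linear, the correlation coefficient (the normalized covariance mentioned above) equals 1, while the covariance itself carries the less interpretable magnitude.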