Gaussian Process-based Regression (GPR) Algorithm


A Gaussian Process-based Regression (GPR) Algorithm is a Gaussian Process-based algorithm that is a model-based supervised regression algorithm.



References

2016

  • (Barkan et al., 2016) ⇒ Barkan, O., Weill, J., & Averbuch, A. (2016). "Gaussian Process Regression for Out-of-Sample Extension". arXiv preprint arXiv:1603.02194.

  • (Wikipedia, 2016) ⇒ http://wikipedia.org/wiki/Gaussian_process#Gaussian_process_prediction Retrieved:2016-4-9.
    • When concerned with a general Gaussian process regression problem, it is assumed that for a Gaussian process f observed at coordinates x, the vector of values [math]\displaystyle{ f(x) }[/math] is just one sample from a multivariate Gaussian distribution of dimension equal to the number of observed coordinates |x|. Therefore, under the assumption of a zero-mean distribution, [math]\displaystyle{ f(x) \sim N(0, K(\theta,x,x')) }[/math], where [math]\displaystyle{ K(\theta,x,x') }[/math] is the covariance matrix between all possible pairs [math]\displaystyle{ (x,x') }[/math] for a given set of hyperparameters θ.

      As such the log marginal likelihood is: [math]\displaystyle{ \log p(f(x)|\theta,x) = -\frac{1}{2}f(x)^T K(\theta,x,x')^{-1} f(x) -\frac{1}{2} \log \det(K(\theta,x,x')) - \frac{|x|}{2} \log 2\pi }[/math] and maximizing this marginal likelihood towards θ provides the complete specification of the Gaussian process f. One can briefly note at this point that the first term corresponds to a penalty term for a model's failure to fit observed values and the second term to a penalty term that increases proportionally to a model's complexity. Having specified θ, making predictions about unobserved values [math]\displaystyle{ f(x^*) }[/math] at coordinates x* is then only a matter of drawing samples from the predictive distribution [math]\displaystyle{ p(y^*|x^*,f(x),x) = N(y^*|A,B) }[/math] where the posterior mean estimate A is defined as: [math]\displaystyle{ A = K(\theta,x^*,x) K(\theta,x,x')^{-1} f(x) }[/math] and the posterior variance estimate B is defined as: [math]\displaystyle{ B = K(\theta,x^*,x^*) - K(\theta,x^*,x) K(\theta,x,x')^{-1} K(\theta,x^*,x)^T }[/math] where [math]\displaystyle{ K(\theta,x^*,x) }[/math] is the covariance between the new coordinate of estimation x* and all other observed coordinates x for a given hyperparameter vector θ, [math]\displaystyle{ K(\theta,x,x') }[/math] and [math]\displaystyle{ f(x) }[/math] are defined as before, and [math]\displaystyle{ K(\theta,x^*,x^*) }[/math] is the variance at point x* as dictated by θ.

      It is important to note that practically the posterior mean estimate of [math]\displaystyle{ f(x^*) }[/math] (the "point estimate") is just a linear combination of the observations [math]\displaystyle{ f(x) }[/math]; in a similar manner, the variance of [math]\displaystyle{ f(x^*) }[/math] is actually independent of the observations [math]\displaystyle{ f(x) }[/math].

      A known bottleneck in Gaussian process prediction is that the computational complexity of prediction is cubic in the number of points |x| and as such can become unfeasible for larger data sets. Works on sparse Gaussian processes, which are usually based on the idea of building a representative set for the given process f, try to circumvent this issue.
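
The posterior expressions quoted above translate directly into a few lines of linear algebra. Below is a minimal NumPy sketch of zero-mean Gaussian process regression, under assumptions not taken from the sources cited here: a squared-exponential (RBF) covariance standing in for K(θ,·,·), a small diagonal jitter term for numerical stability, and illustrative function and variable names (rbf_kernel, gp_predict, x_obs, and so on).

    import numpy as np

    def rbf_kernel(x_a, x_b, length_scale=1.0, signal_var=1.0):
        # Squared-exponential covariance playing the role of K(theta, ., .) above;
        # theta here is the assumed pair (length_scale, signal_var).
        sq_dists = (x_a[:, None] - x_b[None, :]) ** 2
        return signal_var * np.exp(-0.5 * sq_dists / length_scale ** 2)

    def log_marginal_likelihood(f_x, K, jitter=1e-8):
        # log p(f(x) | theta, x) for a zero-mean GP, matching the expression above.
        n = len(f_x)
        L = np.linalg.cholesky(K + jitter * np.eye(n))   # jitter for numerical stability
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_x))
        return (-0.5 * f_x @ alpha                        # data-fit term
                - np.sum(np.log(np.diag(L)))              # 0.5 * log det(K), complexity term
                - 0.5 * n * np.log(2.0 * np.pi))

    def gp_predict(x_obs, f_obs, x_new, length_scale=1.0, signal_var=1.0, jitter=1e-8):
        # Posterior mean A and pointwise variance diag(B) at x_new, following the formulas above.
        K = rbf_kernel(x_obs, x_obs, length_scale, signal_var) + jitter * np.eye(len(x_obs))
        K_s = rbf_kernel(x_new, x_obs, length_scale, signal_var)    # K(theta, x*, x)
        K_ss = rbf_kernel(x_new, x_new, length_scale, signal_var)   # K(theta, x*, x*)
        A = K_s @ np.linalg.solve(K, f_obs)                         # linear combination of f(x)
        B = K_ss - K_s @ np.linalg.solve(K, K_s.T)                  # independent of the values f(x)
        return A, np.diag(B)

    # Illustrative usage on synthetic 1-D data.
    x_obs = np.array([-2.0, -1.0, 0.0, 1.5, 2.5])
    f_obs = np.sin(x_obs)
    x_new = np.linspace(-3.0, 3.0, 7)
    mean, var = gp_predict(x_obs, f_obs, x_new)
    lml = log_marginal_likelihood(f_obs, rbf_kernel(x_obs, x_obs))

In practice the hyperparameters θ (here, length_scale and signal_var) would be chosen by numerically maximizing log_marginal_likelihood, and the linear solves against K are the source of the cubic cost in |x| noted above; sparse approximations replace x with a smaller representative set to reduce this cost.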


2005

  1. For notational simplicity we exclusively use zero-mean priors.