Gaussian Process Model

A Gaussian process model is a stochastic process model in which every finite collection of its random variables has a (consistent) joint Gaussian distribution, so that every finite linear combination of them is Gaussian.

AKA: Stationary Gaussian Time Series.
Context:
- It can be an input to a Gaussian Process Learning Task (solved by a Gaussian Process Algorithm).
- It can be fully specified by its Mean Function and Covariance Function.
- It can be associated with a Multivariate Gaussian Function.
- It can range from being a Zero-Mean Gaussian Process to being a Non Zero-Mean Gaussian Process.
- It can range from being a Simple Gaussian Process to being a Gaussian Process Mixture.
- It can be thought of as a probability distribution for generating functions.
- …
Example(s):
- Radial Basis Function?
- linear damping, parametric gain, and linear coupling.
- …
Counter-Example(s):
- a Poisson Process.
- a Variance Gamma Process.
- a Dirichlet Process?
See: Kernel-based Method, Gaussian Process Regression, Markov Process, Multivariate Normal Distribution, Kriging.

References

2017a

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Gaussian_process Retrieved:2017-12-4.
- In probability theory and statistics, a Gaussian process is a particular kind of statistical model where observations occur in a continuous domain, e.g. time or space. In a Gaussian process, every point in some continuous input space is associated with a normally distributed random variable. Moreover, every finite collection of those random variables has a multivariate normal distribution, i.e. every finite linear combination of them is normally distributed. The distribution of a Gaussian process is the joint distribution of all those (infinitely many) random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.
  Viewed as a machine-learning algorithm, a Gaussian process uses lazy learning and a measure of the similarity between points (the kernel function) to predict the value for an unseen point from training data. The prediction is not just an estimate for that point, but also has uncertainty information — it is a one-dimensional Gaussian distribution (which is the marginal distribution at that point).
  For some kernel functions, matrix algebra can be used to calculate the predictions using the technique of kriging. When a parameterised kernel is used, optimisation software is typically used to fit a Gaussian process model.
  The concept of Gaussian processes is named after Carl Friedrich Gauss because it is based on the notion of the Gaussian distribution (normal distribution). Gaussian processes can be seen as an infinite-dimensional generalization of multivariate normal distributions.
  Gaussian processes are useful in statistical modelling, benefiting from properties inherited from the normal. For example, if a random process is modelled as a Gaussian process, the distributions of various derived quantities can be obtained explicitly. Such quantities include the average value of the process over a range of times and the error in estimating the average using sample values at a small set of times.
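
The prediction step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the method of any particular library: the squared-exponential kernel, the zero mean function, the small `noise_var` term, and the names `rbf_kernel` and `gp_predict` are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential similarity between 1-D input arrays a and b (assumed kernel)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-6):
    """Posterior mean and marginal variance of a zero-mean GP at each test point."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    k_ss = rbf_kernel(x_test, x_test)
    # Cholesky factorisation avoids forming an explicit matrix inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_st.T)
    mean = k_st @ alpha
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, var

x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.array([0.0, 0.5, 5.0])  # a training point, a nearby point, a distant point
mean, var = gp_predict(x_train, y_train, x_test)
```

Each pair `(mean[i], var[i])` describes the one-dimensional Gaussian marginal at `x_test[i]`. The variance is close to zero at a training input and approaches the prior variance far away from the data.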

2017b

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Gaussian_process#Definition Retrieved:2017-12-4.
- A time continuous stochastic process is Gaussian if and only if for every finite set of indices [math]\displaystyle{ t_1,\ldots,t_k }[/math] in the index set [math]\displaystyle{ T }[/math] : [math]\displaystyle{ \mathbf{X}_{t_1, \ldots, t_k} = (\mathbf{X}_{t_1}, \ldots, \mathbf{X}_{t_k}) }[/math] is a multivariate Gaussian random variable.^[1] That is the same as saying every linear combination of [math]\displaystyle{ (\mathbf{X}_{t_1}, \ldots, \mathbf{X}_{t_k}) }[/math] has a univariate normal (or Gaussian) distribution. Using characteristic functions of random variables, the Gaussian property can be formulated as follows: [math]\displaystyle{ \left\{X_t ; t\in T\right\} }[/math] is Gaussian if and only if, for every finite set of indices [math]\displaystyle{ t_1,\ldots,t_k }[/math], there are real-valued [math]\displaystyle{ \sigma_{\ell j} }[/math] , [math]\displaystyle{ \mu_\ell }[/math] with [math]\displaystyle{ \sigma_{jj} \gt 0 }[/math] such that the following equality holds for all [math]\displaystyle{ s_1,s_2,\ldots,s_k\in\mathbb{R} }[/math] : [math]\displaystyle{ \operatorname{E}\left(\exp\left(i \ \sum_{\ell=1}^k s_\ell \ \mathbf{X}_{t_\ell}\right)\right) = \exp \left(-\frac{1}{2} \, \sum_{\ell, j} \sigma_{\ell j} s_\ell s_j + i \sum_\ell \mu_\ell s_\ell\right). }[/math] where [math]\displaystyle{ i }[/math] denotes the imaginary number [math]\displaystyle{ \sqrt{-1} }[/math] .
  The numbers [math]\displaystyle{ \sigma_{\ell j} }[/math] and [math]\displaystyle{ \mu_\ell }[/math] can be shown to be the covariances and means of the variables in the process.
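
  A sketch of why this identification holds, under the assumptions that [math]\displaystyle{ \sigma_{\ell j} = \sigma_{j \ell} }[/math] and that differentiation may be exchanged with the expectation: write [math]\displaystyle{ \varphi(s_1,\ldots,s_k) }[/math] for either side of the equality above. Differentiating once and evaluating at [math]\displaystyle{ s = 0 }[/math] gives
  [math]\displaystyle{ \left.\frac{\partial \varphi}{\partial s_\ell}\right|_{s=0} = i \operatorname{E}\left(\mathbf{X}_{t_\ell}\right) = i \mu_\ell , }[/math]
  and differentiating twice gives
  [math]\displaystyle{ \left.\frac{\partial^2 \varphi}{\partial s_\ell \, \partial s_j}\right|_{s=0} = -\operatorname{E}\left(\mathbf{X}_{t_\ell} \mathbf{X}_{t_j}\right) = -\sigma_{\ell j} - \mu_\ell \mu_j , }[/math]
  so that [math]\displaystyle{ \operatorname{E}\left(\mathbf{X}_{t_\ell}\right) = \mu_\ell }[/math] and [math]\displaystyle{ \operatorname{Cov}\left(\mathbf{X}_{t_\ell}, \mathbf{X}_{t_j}\right) = \operatorname{E}\left(\mathbf{X}_{t_\ell} \mathbf{X}_{t_j}\right) - \mu_\ell \mu_j = \sigma_{\ell j} }[/math].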

2017c

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Gaussian_process#Applications Retrieved:2017-12-4.
- A Gaussian process can be used as a prior probability distribution over functions in Bayesian inference. Given any set of N points in the desired domain of your functions, take a multivariate Gaussian whose covariance matrix parameter is the Gram matrix of your N points with some desired kernel, and sample from that Gaussian. Inference of continuous values with a Gaussian process prior is known as Gaussian process regression, or kriging; extending Gaussian process regression to multiple target variables is known as cokriging. Gaussian processes are thus useful as a powerful non-linear multivariate interpolation tool. Gaussian process regression can be further extended to address learning tasks in both supervised (e.g. probabilistic classification) and unsupervised (e.g. manifold learning) learning frameworks. Gaussian processes can also be used in the context of mixture of experts models.^[2] ^[3] The underlying rationale of such a learning framework consists in the fundamental assumption that the mapping of independent to dependent variables cannot be sufficiently captured by a single Gaussian process model. On the contrary, it is considered that the observations space is naturally divided into subspaces, each of which is characterized by a significantly different mapping function; each of these is learned via a different Gaussian process component in the postulated mixture.
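
The sampling recipe in this passage (build the Gram matrix of N points, then draw from the corresponding multivariate Gaussian) might look as follows. The squared-exponential kernel, the zero mean, the diagonal jitter, and the name `rbf_gram` are assumptions chosen for this sketch; any valid kernel could be substituted.

```python
import numpy as np

def rbf_gram(x, length_scale=1.0):
    """Gram matrix of the 1-D points x under a squared-exponential kernel (assumed)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)                # the N points in the domain
gram = rbf_gram(x) + 1e-8 * np.eye(x.size)   # small jitter keeps the factorisation stable

# If z ~ N(0, I) and gram = L L^T, then L z ~ N(0, gram).
chol = np.linalg.cholesky(gram)
samples = (chol @ rng.standard_normal((x.size, 3))).T  # three draws, one per row
```

Each row of `samples` is one function drawn from the prior, evaluated at the 50 points in `x`. A shorter `length_scale` would produce more rapidly varying draws.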

2006

(Rasmussen & Williams, 2006) ⇒ Carl E. Rasmussen, and Christopher K. I. Williams. (2006). “Gaussian Processes for Machine Learning.” MIT Press. ISBN:026218253X
(Rasmussen, 2006) ⇒ Carl Edward Rasmussen. (2006). “Advances in Gaussian Processes.” Tutorial at Advances in Neural Information Processing Systems, 19 (NIPS 2006).

2005

(Quiñonero-Candela & Rasmussen, 2005) ⇒ Joaquin Quiñonero-Candela, and Carl Edward Rasmussen. (2005). “A Unifying View of Sparse Approximate Gaussian Process Regression.” In: The Journal of Machine Learning Research, 6.
- QUOTE: A Gaussian process (GP) is a collection of random variables, any finite number of which have consistent joint Gaussian distributions.

2004

(Rasmussen, 2004) ⇒ Carl Edward Rasmussen. (2004). “Gaussian Processes in Machine Learning.” In: Advanced Lectures on Machine Learning, 2004. ISBN:978-3-540-23122-6
- ABSTRACT: We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters using the marginal likelihood. We explain the practical advantages of Gaussian Process and end with conclusions and a look at the current trends in GP work.
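
Learning a hyperparameter from the marginal likelihood, as mentioned in the abstract, can be illustrated with a small NumPy sketch. It is not taken from the cited chapter: the squared-exponential kernel, the fixed `noise_var`, the toy data, and the grid search over candidate length-scales (in place of gradient-based optimisation) are all assumptions of this example.

```python
import numpy as np

def log_marginal_likelihood(x, y, length_scale, noise_var=0.01):
    """log p(y | x) for a zero-mean GP with a squared-exponential kernel (assumed)."""
    diff = x[:, None] - x[None, :]
    k = np.exp(-0.5 * (diff / length_scale) ** 2) + noise_var * np.eye(x.size)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    data_fit = -0.5 * y @ alpha                     # penalises poor fit to the targets
    complexity = -np.sum(np.log(np.diag(chol)))     # equals -0.5 * log det(k)
    constant = -0.5 * x.size * np.log(2.0 * np.pi)
    return data_fit + complexity + constant

x = np.linspace(0.0, 6.0, 25)
y = np.sin(x)  # toy targets, for illustration only

candidates = np.array([0.05, 0.2, 1.0, 2.0, 5.0])
scores = np.array([log_marginal_likelihood(x, y, ls) for ls in candidates])
best = candidates[np.argmax(scores)]  # length-scale with the highest marginal likelihood
```

The score balances a data-fit term against a complexity term, so a very short length-scale, which treats the observations as nearly independent, scores worse here than a smoother one.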

[1] MacKay, David J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. p. 540. ISBN 9780521642989. http://www.inference.phy.cam.ac.uk/itprnn/book.pdf. “The probability distribution of a function [math]\displaystyle{ y(\mathbf{x}) }[/math] is a Gaussian process if for any finite selection of points [math]\displaystyle{ \mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots,\mathbf{x}^{(N)} }[/math], the density [math]\displaystyle{ P(y(\mathbf{x}^{(1)}),y(\mathbf{x}^{(2)}),\ldots,y(\mathbf{x}^{(N)})) }[/math] is a Gaussian”
[2] Emmanouil A. Platanios and Sotirios P. Chatzis, “Gaussian Process-Mixture Conditional Heteroscedasticity,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 5, pp. 888–900, May 2014.
[3] Sotirios P. Chatzis, “A Latent Variable Gaussian Process Model with Pitman-Yor Process Priors for Multiclass Classification,” Neurocomputing, vol. 120, pp. 482–489, Nov. 2013.
