2009 NonLinearMatrixFactorizationwit

From GM-RKB

Subject Headings: Non-Linear Probabilistic Matrix Factorization, Collaborative Filtering Algorithm.

Notes

Cited By

Quotes

Abstract

A popular approach to collaborative filtering is matrix factorization. In this paper we develop a non-linear probabilistic matrix factorization using Gaussian process latent variable models. We use stochastic gradient descent (SGD) to optimize the model. SGD allows us to apply Gaussian processes to data sets with millions of observations without approximate methods. We apply our approach to benchmark movie recommender data sets. The results show better than previous state-of-the-art performance.
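The SGD optimization mentioned in the abstract can be illustrated on the underlying linear factorization Y ≈ XW⊤ (a toy sketch with synthetic data and made-up hyperparameters, not the paper's implementation or benchmarks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (ours, not a benchmark set): N users, D items, latent dimension q.
N, D, q = 50, 40, 3
X_true = rng.normal(size=(N, q))
W_true = rng.normal(size=(D, q))
R = X_true @ W_true.T + 0.1 * rng.normal(size=(N, D))
mask = rng.random((N, D)) < 0.3               # ~30% of ratings observed

# Plain SGD over observed entries of the linear factorization R ~ X W^T.
X = 0.1 * rng.normal(size=(N, q))
W = 0.1 * rng.normal(size=(D, q))
lr, reg = 0.01, 0.01                          # made-up hyperparameters
obs = np.argwhere(mask)
for epoch in range(150):
    rng.shuffle(obs)
    for i, j in obs:
        err = R[i, j] - X[i] @ W[j]
        X[i] += lr * (err * W[j] - reg * X[i])
        W[j] += lr * (err * X[i] - reg * W[j])

rmse = np.sqrt(np.mean((X @ W.T - R)[mask] ** 2))
```

Because each update touches only one observed rating, the cost per step is independent of the number of users and items, which is what lets SGD scale to millions of observations.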

3 Non-Linear PMF via GP-LVMs

We have already highlighted the fact that probabilistic matrix factorization, with the parameters W marginalized, is a Bayesian multi-output regression model in which we optimize with respect to the inputs to the regression. This type of model is equivalent to probabilistic PCA. However, it also belongs to a larger class of models called Gaussian process latent variable models (GP-LVM). Lawrence (2005) showed how the matrix C has an interpretation as a Gaussian process (GP) covariance matrix. The GP associated with the covariance function [math]\displaystyle{ \mathbf{C} = \alpha_w^{-1}\mathbf{X}\mathbf{X}^\top + \sigma^2\mathbf{I} }[/math] is a linear model. However, by replacing the inner product matrix, [math]\displaystyle{ \mathbf{X}\mathbf{X}^\top }[/math], by a Mercer kernel the model becomes a non-linear GP model. Maximization of the log likelihood can no longer be done through an eigenvalue problem, but it is straightforward to apply stochastic gradient descent in the manner described above.
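This marginalization can be checked empirically (our own sketch, not from the paper; we write α_w for the precision of the Gaussian prior over the rows of W):

```python
import numpy as np

rng = np.random.default_rng(3)

# Empirical check: marginalizing w_j ~ N(0, alpha_w^{-1} I) in the linear
# model y_:,j = X w_j + noise should give
# cov(y_:,j) = alpha_w^{-1} X X^T + sigma^2 I.
N, q = 5, 2
alpha_w, sigma2 = 2.0, 0.1
X = rng.normal(size=(N, q))

S = 500_000                                   # Monte Carlo samples of w_j
W = rng.normal(scale=alpha_w ** -0.5, size=(S, q))
Y = W @ X.T + rng.normal(scale=sigma2 ** 0.5, size=(S, N))

C_empirical = (Y.T @ Y) / S                   # sample covariance (zero mean)
C_theory = X @ X.T / alpha_w + sigma2 * np.eye(N)
err = np.max(np.abs(C_empirical - C_theory))
```

The sample covariance of the marginalized outputs converges to the GP covariance matrix C, which is the identity underlying the GP-LVM view.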

The regression model from (1) can be written as a product of univariate Gaussian distributions,

[math]\displaystyle{ p\left(\mathbf{Y}|\mathbf{W},\mathbf{X},\sigma^2\right) = \prod_{j=1}^{D}\prod_{i=1}^{N} N\left(y_{i,j}|f_j\left(\mathbf{x}_{i,:}\right),\sigma^2\right), }[/math]

where the mean of each Gaussian is given by the inner product [math]\displaystyle{ f_j\left(\mathbf{x}_{i,:}\right) = \mathbf{w}_{j,:}^\top\mathbf{x}_{i,:} }[/math]. Probabilistic PCA can be recovered by marginalizing either W or X. The GP-LVM is recovered by recognizing that we can place the prior distribution directly over the function [math]\displaystyle{ f(\cdot) }[/math] through a Gaussian process (Rasmussen & Williams, 2006).

A Gaussian process (GP) can be thought of as a probability distribution for generating functions. The GP is specified by a mean and a covariance function. For any given set of observations of the function, f, the joint distribution over those observations is Gaussian. Restricting ourselves to GPs with a zero mean function, they are distributed as [math]\displaystyle{ p\left(\mathbf{f}|\mathbf{X}\right) = N\left(\mathbf{f}|\mathbf{0},\mathbf{K}\right), }[/math] where K represents the covariance function. The covariance function is made up of elements [math]\displaystyle{ k\left(\mathbf{x}_{i,:},\mathbf{x}_{j,:}\right) }[/math] that encode the degree of correlation between two samples, [math]\displaystyle{ f_i, f_j }[/math], from f as a function of the inputs associated with those samples, [math]\displaystyle{ \mathbf{x}_{i,:} }[/math] and [math]\displaystyle{ \mathbf{x}_{j,:} }[/math]. For a covariance function to be valid, it has to lead to a positive semi-definite matrix K for all valid inputs to the function. In practice this means that valid covariance functions have to be positive definite functions, i.e. the class of valid covariance functions is the same as the class of Mercer kernels (Schölkopf & Smola, 2001). A linear regression model is a GP in which the covariance function is taken to be [math]\displaystyle{ k\left(\mathbf{x}_{i,:},\mathbf{x}_{j,:}\right) = \mathbf{x}_{i,:}^\top\mathbf{x}_{j,:} }[/math].
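These properties of the linear covariance can be illustrated numerically (a sketch with our own variable names):

```python
import numpy as np

rng = np.random.default_rng(1)

# N latent inputs in a q-dimensional space.
N, q = 8, 2
X = rng.normal(size=(N, q))

# Linear covariance: k(x_i, x_j) = x_i^T x_j, so the full matrix is K = X X^T.
K = X @ X.T

# Validity check: K must be positive semi-definite
# (all eigenvalues >= 0, up to floating-point round-off).
eigvals = np.linalg.eigvalsh(K)

# Draw one function f ~ N(0, K) at the N inputs; a small jitter keeps the
# Cholesky factorization numerically stable for the rank-q matrix K.
L = np.linalg.cholesky(K + 1e-6 * np.eye(N))
f = L @ rng.normal(size=N)
```

Because K has rank at most q here, the sampled "functions" are exactly the linear maps f_i = w⊤x_{i,:} for some Gaussian-distributed w, which is the sense in which linear regression is a GP.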

A widely used covariance function that gives a prior over non-linear functions is known as the RBF covariance,

[math]\displaystyle{ k\left(\mathbf{x}_{\ell,:},\mathbf{x}_{i,:}\right) = \alpha \exp\left(-\frac{\gamma}{2}\left\|\mathbf{x}_{\ell,:}-\mathbf{x}_{i,:}\right\|^2\right). }[/math]

This covariance can be substituted directly for the linear covariance function in (2) giving the following probabilistic model,

[math]\displaystyle{ p\left(\mathbf{Y}|\mathbf{X},\sigma^2,\theta\right) = \prod_{j=1}^{D} N\left(\mathbf{y}_{:,j}|\mathbf{0},\mathbf{K}+\sigma^2\mathbf{I}\right), }[/math]

where [math]\displaystyle{ \theta }[/math] are the parameters of the covariance function. Alternative covariance functions can also be considered, but in this paper we focus only on the RBF and linear covariance functions.
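This marginal likelihood can be evaluated directly per output column (an illustrative sketch on synthetic data; `alpha` and `gamma` are our stand-ins for the covariance parameters θ):

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_kernel(X, alpha=1.0, gamma=1.0):
    """RBF covariance: k(x_l, x_i) = alpha * exp(-gamma/2 * ||x_l - x_i||^2)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2.0 * X @ X.T
    return alpha * np.exp(-0.5 * gamma * np.maximum(sq, 0.0))

def log_marginal(Y, X, sigma2=0.1, alpha=1.0, gamma=1.0):
    """log p(Y|X) = sum_j log N(y_:,j | 0, K + sigma^2 I)."""
    N, D = Y.shape
    C = rbf_kernel(X, alpha, gamma) + sigma2 * np.eye(N)
    L = np.linalg.cholesky(C)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))        # log |C| via Cholesky
    Cinv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # C^{-1} Y
    return -0.5 * (D * (N * np.log(2.0 * np.pi) + log_det) + np.sum(Y * Cinv_Y))

# Synthetic latent inputs and data drawn from the model itself.
N, q, D = 20, 2, 5
X = rng.normal(size=(N, q))
K = rbf_kernel(X)
Y = np.linalg.cholesky(K + 0.1 * np.eye(N)) @ rng.normal(size=(N, D))
ll = log_marginal(Y, X)
```

Each output column shares the same covariance K + σ²I, so one Cholesky factorization serves all D columns; gradients of this quantity with respect to X and θ are what the stochastic optimization acts on.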



References

* (Lawrence & Urtasun, 2009) ⇒ Neil D. Lawrence, and Raquel Urtasun. (2009). "Non-linear Matrix Factorization with Gaussian Processes." doi:10.1145/1553374.1553452