2007 ProbabilisticMatrixFactorization


Subject Headings: Probabilistic Matrix Factorization.


Cited By



Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machine models, we achieve an error rate of 0.8861, which is nearly 7% better than the score of Netflix’s own system.

1 Introduction

One of the most popular approaches to collaborative filtering is based on low-dimensional factor models. The idea behind such models is that attitudes or preferences of a user are determined by a small number of unobserved factors. In a linear factor model, a user’s preferences are modeled by linearly combining item factor vectors using user-specific coefficients. For example, for N users and M movies, the N×M preference matrix R is given by the product of an N×D user coefficient matrix [math]\displaystyle{ U^T }[/math] and a D×M factor matrix V [7]. Training such a model amounts to finding the best rank-D approximation to the observed N×M target matrix R under the given loss function.
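As a toy illustration of this setup (the sizes and random values below are invented for illustration, not taken from the paper), the predicted preference matrix is simply the product of the user and movie factor matrices:

```python
import numpy as np

# Sketch of a rank-D linear factor model: the N x M preference matrix R
# is approximated by U^T V, where U holds user coefficients and V holds
# item (movie) factor vectors. Toy sizes only.
rng = np.random.default_rng(0)
N, M, D = 4, 5, 2            # users, movies, latent dimensions

U = rng.normal(size=(D, N))  # user coefficient matrix (D x N)
V = rng.normal(size=(D, M))  # movie factor matrix (D x M)

R_hat = U.T @ V              # predicted preference matrix (N x M)
print(R_hat.shape)           # (4, 5)
```

Each predicted rating R_hat[i, j] is the inner product of user i's coefficient vector with movie j's factor vector.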

A variety of probabilistic factor-based models has been proposed recently [2, 3, 4]. All these models can be viewed as graphical models in which hidden factor variables have directed connections to variables that represent user ratings. The major drawback of such models is that exact inference is intractable [12], which means that potentially slow or inaccurate approximations are required for computing the posterior distribution over hidden factors in such models.

Low-rank approximations based on minimizing the sum-squared distance can be found using Singular Value Decomposition (SVD). SVD finds the matrix [math]\displaystyle{ \hat{R} = U^T V }[/math] of the given rank which minimizes the sum-squared distance to the target matrix R. Since most real-world datasets are sparse, most entries in R will be missing. In those cases, the sum-squared distance is computed only for the observed entries of the target matrix R. As shown by [9], this seemingly minor modification results in a difficult non-convex optimization problem which cannot be solved using standard SVD implementations.
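A minimal sketch of this observed-entries-only objective (the mask, matrix sizes, and random data are illustrative assumptions, not the paper's setup): the loss sums squared errors only where an indicator mask marks a rating as observed, and it is this masking that makes the problem non-convex in U and V jointly.

```python
import numpy as np

# Sum-squared error restricted to observed entries of a sparse target
# matrix R, as described in the text. mask[i, j] = True where user i
# actually rated movie j.
rng = np.random.default_rng(1)
N, M, D = 4, 5, 2
R = rng.integers(1, 6, size=(N, M)).astype(float)  # toy rating matrix
mask = rng.random((N, M)) < 0.5                    # toy observation mask

U = rng.normal(size=(D, N))
V = rng.normal(size=(D, M))

def observed_sse(U, V, R, mask):
    """Sum-squared error over observed entries only."""
    E = R - U.T @ V
    return float(np.sum(mask * E ** 2))

print(observed_sse(U, V, R, mask))
```

With a fully observed mask this reduces to the ordinary sum-squared distance that standard SVD minimizes; with missing entries it does not.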

Instead of constraining the rank of the approximation matrix [math]\displaystyle{ \hat{R} = U^T V }[/math], i.e. the number of factors, [10] proposed penalizing the norms of U and V. Learning in this model, however, requires solving a sparse semi-definite program (SDP), making this approach infeasible for datasets containing millions of observations.

Figure 1: The left panel shows the graphical model for Probabilistic Matrix Factorization (PMF). The right panel shows the graphical model for constrained PMF.

Many of the collaborative filtering algorithms mentioned above have been applied to modelling user ratings on the Netflix Prize dataset that contains 480,189 users, 17,770 movies, and over 100 million observations (user / movie / rating triples). However, none of these methods have proved to be particularly successful for two reasons. First, none of the above-mentioned approaches, except for the matrix-factorization-based ones, scale well to large datasets. Second, most of the existing algorithms have trouble making accurate predictions for users who have very few ratings. A common practice in the collaborative filtering community is to remove all users with fewer than some minimal number of ratings. Consequently, the results reported on the standard datasets, such as MovieLens and EachMovie, then seem impressive because the most difficult cases have been removed. For example, the Netflix dataset is very imbalanced, with “infrequent” users rating fewer than 5 movies, while “frequent” users rate over 10,000 movies. However, since the standardized test set includes the complete range of users, the Netflix dataset provides a much more realistic and useful benchmark for collaborative filtering algorithms.

The goal of this paper is to present probabilistic algorithms that scale linearly with the number of observations and perform well on very sparse and imbalanced datasets, such as the Netflix dataset. In Section 2 we present the Probabilistic Matrix Factorization (PMF) model that models the user preference matrix as a product of two lower-rank user and movie matrices. In Section 3, we extend the PMF model to include adaptive priors over the movie and user feature vectors and show how these priors can be used to control model complexity automatically. In Section 4 we introduce a constrained version of the PMF model that is based on the assumption that users who rate similar sets of movies have similar preferences. In Section 5 we report the experimental results that show that PMF considerably outperforms standard SVD models. We also show that constrained PMF and PMF with learnable priors improve model performance significantly. Our results demonstrate that constrained PMF is especially effective at making better predictions for users with few ratings.

2 Probabilistic Matrix Factorization (PMF)

Suppose we have M movies, N users, and integer rating values from 1 to K. Let Rij represent the rating of user i for movie j, and let [math]\displaystyle{ U \in \mathbb{R}^{D \times N} }[/math] and [math]\displaystyle{ V \in \mathbb{R}^{D \times M} }[/math] be latent user and movie feature matrices, with column vectors Ui and Vj representing user-specific and movie-specific latent feature vectors respectively. Since model performance is measured by computing the root mean squared error (RMSE) on the test set, we first adopt a probabilistic linear model with Gaussian observation noise.
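Under this Gaussian observation model, MAP estimation of U and V reduces to minimizing the regularized sum-squared error over the observed ratings. A minimal gradient-descent sketch follows; the learning rate, regularization weight, matrix sizes, and iteration count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# MAP training sketch for PMF: gradient descent on the regularized
# sum-squared error over observed entries. Synthetic low-rank data plus
# noise stands in for a real rating matrix.
rng = np.random.default_rng(2)
N, M, D = 30, 40, 5
R = rng.normal(size=(D, N)).T @ rng.normal(size=(D, M)) \
    + 0.1 * rng.normal(size=(N, M))
mask = rng.random((N, M)) < 0.3     # ~30% of entries observed

U = 0.1 * rng.normal(size=(D, N))   # small random initialization
V = 0.1 * rng.normal(size=(D, M))
lam, lr = 0.02, 0.01                # illustrative hyperparameters

def rmse(U, V):
    E = (R - U.T @ V) * mask
    return float(np.sqrt((E ** 2).sum() / mask.sum()))

before = rmse(U, V)
for _ in range(200):
    E = (R - U.T @ V) * mask        # error on observed entries only
    U += lr * (V @ E.T - lam * U)   # descend the regularized loss in U
    V += lr * (U @ E - lam * V)     # ... and in V
after = rmse(U, V)
print(before, after)
```

The two update lines are the gradients of the masked squared error plus quadratic penalties on U and V, which is exactly what maximizing the log-posterior of a Gaussian-noise, Gaussian-prior factor model amounts to.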



Author: Andriy Mnih, Ruslan Salakhutdinov
Title: Probabilistic Matrix Factorization
Year: 2007