2002 MethodsAndMetricsForColdStartRecommendations


Subject Headings: Recommender System, Evaluation, Movie Recommendation Task.




We have developed a method for recommending items that combines content and collaborative data under a single probabilistic framework. We benchmark our algorithm against a naïve Bayes classifier on the cold-start problem, where we wish to recommend items that no one in the community has yet rated. We systematically explore three testing methodologies using a publicly available data set, and explain how these methods apply to specific real-world applications. We advocate heuristic recommenders when benchmarking to give competent baseline performance. We introduce a new performance metric, the CROC curve, and demonstrate empirically that the various components of our testing strategy combine to obtain deeper understanding of the performance characteristics of recommender systems. Though the emphasis of our testing is on cold-start recommending, our methods for recommending and evaluation are general.


Recommender systems suggest items of interest to users based on their explicit and implicit preferences, the preferences of other users, and user and item attributes. For example, a movie recommender might combine explicit ratings data (e.g., Bob rates Shrek a 7 out of 10), implicit data (e.g., Mary purchased The Natural), user demographic information (e.g., Mary is female), and movie content information (e.g., Scream is marketed as a horror movie) to make recommendations to specific users.

Pure collaborative filtering methods [3, 12, 15, 23, 30] base their recommendations on community preferences (e.g., user ratings and purchase histories), ignoring user and item attributes (e.g., demographics and product descriptions). On the other hand, pure content-based filtering or information filtering methods [17, 24] typically match query words or other user data with item attribute information, ignoring data from other users. Several hybrid algorithms combine both techniques [1, 4, 6, 8, 21, 29]. Though "content" usually refers to descriptive words associated with an item, we use the term more generally to refer to any form of item attribute information including, for example, the list of actors in a movie.

One difficult, though common, problem for a recommender system is the cold-start problem, where recommendations are required for items that no one (in our data set) has yet rated. Pure collaborative filtering cannot help in a cold-start setting, since no user preference information is available to form any basis for recommendations. However, [[content information]] can help bridge the gap from existing items to new items, by inferring similarities among them. Thus we can make recommendations for new items that appear similar to other recommended items. In this paper, we evaluate the performance of two machine learning algorithms on cold-start prediction. We present our own probabilistic model that combines content and collaborative information by using expectation maximization (EM) learning to fit the model to the data. We perform benchmarking on movie ratings data and compare against a naive Bayes method that has also been proposed for this task [16].
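The cold-start setting described above can be made concrete with a small sketch of how a test split might be built: whole items are held out, so that no rating for a test item is ever visible during training. This is only an illustration under stated assumptions; the function name `cold_start_split`, the `test_fraction` parameter, and the toy ratings are hypothetical and are not taken from the paper.

```python
import random

def cold_start_split(ratings, test_fraction=0.2, seed=0):
    """Split (user, item, rating) triples by item.

    Every rating of a held-out item goes to the test set, so test items
    have no ratings at all in the training set (the cold-start condition).
    """
    items = sorted({item for _, item, _ in ratings})
    rng = random.Random(seed)
    rng.shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    test_items = set(items[:n_test])
    train = [r for r in ratings if r[1] not in test_items]
    test = [r for r in ratings if r[1] in test_items]
    return train, test

# Toy data (hypothetical ratings on a 1-10 scale).
ratings = [
    ("bob", "Shrek", 7),
    ("mary", "Shrek", 8),
    ("mary", "The Natural", 9),
    ("bob", "Scream", 4),
    ("ann", "Scream", 6),
    ("ann", "The Natural", 5),
]
train, test = cold_start_split(ratings, test_fraction=0.34)
train_items = {item for _, item, _ in train}
test_items = {item for _, item, _ in test}
```

Splitting by item rather than by individual rating is the design choice that matters here: a random split over ratings would usually leave some ratings of every item in the training set, which is not a cold-start test.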

Some key questions in evaluating recommender systems on testbed data are: what to predict, how to grade performance and what baseline to compare with. We identify three useful components to predict on our data set, and show where past work has focussed. In deciding what metric to use in evaluating performance, we have borrowed heavily from the literature in addition to developing our own tool: the CROC curve. For baseline measures of performance we advocate the use of heuristic recommenders: algorithms that are trivial to implement yet give performance that is well above random. We find that heuristic recommenders do surprisingly well: in some cases outperforming more sophisticated methods. Our testing goal is to uncover the most informative characterization of performance for our method and the naive Bayes algorithm.


Early recommender systems were pure collaborative filters that computed pairwise similarities among users and recommended items according to a similarity-weighted average [22, 30]. Breese et al. [3] refer to this class of algorithms as memory-based algorithms. Subsequent authors employed a variety of techniques for collaborative filtering, including hard-clustering users into classes [3], simultaneously hard-clustering users and items [31], soft-clustering users and items [14, 21], singular value decomposition [26], inferring item-item similarities [27], probabilistic modeling [3, 6, 10, 20, 21, 29], machine learning [1, 2, 18], and list ranking [5, 7, 19]. More recently, authors have turned toward designing hybrid recommender systems that combine both collaborative and content information in various ways [1, 4, 6, 8, 21, 29]. To date, most comparisons among algorithms have been empirical or qualitative in nature [11, 25], though some worst-case performance bounds have been derived [7, 18], some general principles advocated [7], and some fundamental limitations explicated [19]. Techniques suggested in evaluating recommender system performance include mean average error, receiver operator characteristic (ROC) curves, ranked list metrics [3, 11] and variants of precision/recall statistics [25].
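As an illustration of the memory-based approach mentioned above, the following is a minimal sketch of a similarity-weighted average predictor. It is a simplification under stated assumptions, not the exact formulation of any cited system: cosine similarity over co-rated items is one possible similarity measure, raw ratings are averaged without mean-centering, and the names `cosine_similarity` and `predict` and the toy data are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item, which is
    exactly the situation for a cold-start item.
    """
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        numerator += weight * other_ratings[item]
        denominator += abs(weight)
    return numerator / denominator if denominator > 0 else None

# Toy data (hypothetical ratings on a 1-10 scale).
ratings = {
    "bob": {"Shrek": 7, "Scream": 4},
    "mary": {"Shrek": 8, "Scream": 3, "The Natural": 9},
    "ann": {"Shrek": 2, "Scream": 9, "The Natural": 4},
}
warm_prediction = predict(ratings, "bob", "The Natural")
cold_prediction = predict(ratings, "bob", "A Brand New Movie")
```

In this toy data Bob's tastes are closer to Mary's than to Ann's, so the prediction for The Natural is pulled toward Mary's rating. For an item with no ratings the predictor has nothing to average, which is why pure collaborative filtering cannot handle cold-start items.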

In this work we extend the hybrid recommender system of Popescul et al. [21] to average content data in a model-based fashion. In evaluating our method we introduce novel testing strategies and metrics that can discover a fine-grained characterization of performance, leading to actionable conclusions.

3. The Two-Way Aspect Model

In predicting an association between person [math]p[/math] and movie [math]m[/math], we employ a latent class variable framework called the aspect model that has been designed for contingency table smoothing [13]. Figure 1(a) shows a graphical model description of the aspect model for a person/movie contingency table and Table 1 explains our notation used in the graphical model as well as in other descriptions of the movie recommendation task.
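For orientation, the aspect model is commonly written as a mixture over a latent class variable. The form below is a sketch of that standard statement, assuming the symbol [math]z[/math] for the latent class (Figure 1 and Table 1 are not reproduced here, so the notation may differ from the paper's):

```latex
P(p, m) = \sum_{z} P(z) \, P(p \mid z) \, P(m \mid z)
```

Under this form, person [math]p[/math] and movie [math]m[/math] are conditionally independent given the latent class, and the parameters [math]P(z)[/math], [math]P(p \mid z)[/math] and [math]P(m \mid z)[/math] are the quantities that a fitting procedure such as EM would estimate.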

2002 MethodsAndMetricsForColdStartRecommendations
Author: Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, David M. Pennock
Title: Methods and Metrics for Cold-Start Recommendations
Journal: Proceedings of the 25th ACM SIGIR Conference
Title URL: http://dpennock.com/papers/schein-sigir-2002-cold-start.pdf
DOI: 10.1145/564376.564421
Year: 2002