2016 PersonalizedRankingwithPairwise

From GM-RKB

Subject Headings: Pairwise Learning, Factorization Machines.

Notes

Cited By

Quotes

Keywords: Personalized Ranking; Adaptive Sampling; Pairwise Learning

Abstract

Pairwise learning is a vital technique for personalized ranking with implicit feedback. Under the assumption that each user is more interested in items he or she has previously selected than in the remaining ones, pairwise learning algorithms can learn users' preferences well, not only from the observed user feedbacks but also from the underlying interactions between users and items. However, a mass of training instances is randomly derived according to this assumption, which makes the learning procedure converge slowly and can even result in poor predictive models. In addition, the cold start problem often perplexes pairwise learning methods, since most traditional methods in personalized ranking only take explicit ratings or implicit feedbacks into consideration. To deal with these issues, this work proposes a novel personalized ranking model which incorporates implicit feedback with content information by making use of Factorization Machines. To efficiently estimate the parameters of the proposed model, we develop an adaptive sampler that draws informative training instances based on the content information of users and items. The experimental results show that our adaptive item sampler indeed speeds up our model, and that our model outperforms advanced methods in personalized ranking.

1. Introduction

Due to information overload on the Web, both users and e-commerce sellers, such as Taobao and Amazon, expect to direct appropriate items exactly to specific users. Consequently, recommender systems with personalized ranking have become an important part of modern applications. For example, Taobao and Amazon embed personalized ranking techniques into their recommender systems, which navigate potential customers to products according to customers' specific preferences. Most studies [1][2][3] in personalized ranking are based on explicit ratings. However, as users are required to actively interact with systems, explicit ratings are hard to obtain in practical applications. For instance, MovieLens invites a mass of users to rate movies on its website, and personalized recommendation is then provided based on these explicit ratings. In contrast, implicit feedbacks, such as whether a user has browsed a web page or whether a customer has purchased a product, pervade the Web and are easier to collect. We thereby extend our research to consider implicit feedbacks. (Preprint submitted to Elsevier, June 7, 2016.)

A characteristic of implicit feedbacks is that they are one-class, i.e., only positive instances are observed. Traditional Collaborative Filtering (CF) approaches [4][5] often suffer from the over-fitting problem when dealing with this kind of data, because the number of observed positive instances is far less than that of negative instances. Pairwise learning algorithms, such as Bayesian Personalized Ranking (BPR) [6] and its extensions [3][7][8], are tailored to personalized ranking with implicit feedbacks. They usually assume that users are more interested in items that they have selected than in the remaining items, and randomly draw item pairs with corresponding users to make up training instances. However, in practice, most users have only seen a handful of items and have provided feedbacks on only some of the viewed items. Consequently, when the number of items is very large, the random sampler derives massive meaningless and ineffective training instances which cannot provide much helpful information for tuning the personalized ranking lists. For example, by common sense, we cannot say that a user dislikes a smart phone because he likes a toothbrush, so this pair, i.e., toothbrush and smart phone, is meaningless in real-world scenarios. Moreover, users cannot view too many items at the same time, but the random sampler often draws item pairs whose items have not been viewed by the corresponding users. For instance, when a user searches for smart phones on an e-commerce website, the smart phones of popular brands may be viewed simultaneously by the user, while the smart phones of non-mainstream brands are usually ranked at the tail of the recalled list, and the user tends to miss such non-mainstream brands. Due to the lack of user feedbacks, predictive models can hardly assess accurately the degree of the user's interest in non-mainstream smart phones, and the corresponding training instances contribute little to the learning of parameters. The slow convergence of the training procedure thus seems inevitable, since a lot of time is wasted on massive meaningless and ineffective training instances.

To the best of our knowledge, there are two directions for designing efficient samplers to speed up pairwise learning: directly utilizing characteristics of the distributions appearing in datasets, e.g., the long-tailed distribution, to enlighten the samplers [9], or leveraging adaptive strategies [9][10] to adaptively guide the samplers. Since different datasets may follow different distributions, methods in the first direction are hard to scale to general applications. Therefore, we turn to the second direction to speed up pairwise learning. We design an adaptive item sampler which follows a self-evident rule, i.e., "apples to apples", for drawing informative training instances. For example, suppose that a user has bought an iPhone 6; our sampler tends to draw other kinds of smart phones which can be compared with the iPhone 6 in terms of product properties, rather than other things, e.g., toothbrushes. Moreover, within the category of smart phones, we further select those products which have probably been viewed by the user to make up item pairs for the user.
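The "apples to apples" rule can be sketched as a category-constrained negative sampler. The following Python sketch is illustrative only: the item names, categories and purchase log are hypothetical, and the paper's actual sampler additionally favors candidates the user has probably viewed.

```python
import random

# Hypothetical purchase log and item categories (illustrative names,
# not data from the paper).
purchases = {"u1": {"iphone6"}}
category = {"iphone6": "phone", "galaxy_s6": "phone",
            "nexus5": "phone", "toothbrush": "hygiene"}

def sample_negative(user, positive_item, items):
    """Draw a negative item from the same category as the positive item,
    following the "apples to apples" rule; fall back to any unseen item
    if the category has no candidates."""
    candidates = [i for i in items
                  if category[i] == category[positive_item]
                  and i not in purchases[user]]
    if not candidates:
        candidates = [i for i in items if i not in purchases[user]]
    return random.choice(candidates)

neg = sample_negative("u1", "iphone6", list(category))
assert category[neg] == "phone" and neg not in purchases["u1"]
```

With this rule, a toothbrush is never paired against an iPhone 6, so the drawn triples carry comparable items and more training signal.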

On the other hand, new users and items may be added into recommender systems continually. Conventional pairwise learning algorithms for personalized ranking, which only take account of collaborative information, e.g., implicit feedbacks or explicit ratings, can only deal with entities observed in the dataset. For new users or items which do not have any collaborative information, such methods are not capable of producing meaningful personalized rankings. In real-world recommender systems, such cold start problems are often addressed by switching to content-based approaches.

In this work, to obtain robust personalized ranking results, we associate the content information of entities with implicit feedbacks to develop Pairwise Ranking Factorization Machines (PRFM), which alleviates the cold start problem and enhances the performance of personalized ranking by incorporating BPR learning [6] with Factorization Machines [2]. Furthermore, for a comprehensive solution, we embed an adaptive and efficient sampler into the learning procedure of PRFM by making use of the parameters of PRFM and the content information of users and items.

In a nutshell, our contributions in this paper are listed as follows:

  • We associate content information of entities with implicit feedbacks to develop a pairwise learning framework, PRFM, which can not only alleviate the cold start problem, but also improve personalized ranking by combining content information and implicit feedbacks.

  • To speed up the learning of PRFM, we take account of the content information and develop an adaptive sampling strategy to draw informative training data. Our adaptive sampler provides a better balance between efficiency and performance than previous strategies.

  • We conduct a series of experiments to validate the proposed methods. The results show that PRFM can improve the performance of personalized ranking.

2. Related Work

Factorization methods are popular in personalized recommender systems. They are utilized to deal with various information collected by recommender systems, such as user feedbacks [6][11], attributes of items [2][7], user profiles [12] and social information [13]. According to the studied data types, these methods can be roughly arranged into two branches, i.e., collaborative methods and content-based methods.

2.1. Collaborative Methods

Collaborative Filtering (CF) has been one of the most successful approaches in recommender systems [4][5][14]. It collects users' historical data and generates predictions for users based on similarity measurements of users and items. Matrix factorization (MF) techniques, such as SVD [15] and SVD++ [16], are very popular for collaborative filtering based on explicit ratings. They factorize the rating matrix into two low-rank matrices, which indicate users' preferences and the properties of items respectively. Although some extensions [17][18][19] of MF can deal with implicit feedbacks, they easily suffer from the over-fitting problem because of the data skewness that commonly exists in implicit feedback datasets (the number of positive feedbacks is usually less than one percent of the total). To alleviate the data skewness and provide personalized ranking lists based on implicit feedbacks, Bayesian Personalized Ranking (BPR) [6] and its extensions [3][8][9] have been proposed, which make the pairwise assumption that users are more interested in items that they have previously selected than in the remaining items.

Since the pairwise assumption of BPR often derives a large-scale training set, the random sampling strategy [6] is popular for drawing training instances. However, the random sampler often leads to slow convergence. To speed up BPR learning, Rendle et al. utilize the properties of the long-tail effect to develop non-uniform item samplers [9]. Zhong et al. [10] draw informative training pairs according to the preference of a given user on two unselected items. However, since the number of items in real-world datasets is very large, previous works [9][10] are perplexed by the dilemma of how to balance the efficiency of the sampler against the predictive quality of the model. To alleviate this dilemma, Guo et al. [20] designed an adaptive sampler which uses feature learning methods to learn low-dimensional representations for items, and utilizes such representations to initialize the ranking scores of items under categories. However, this strategy may depend too much on the quality of the initialization: if the feature learning method cannot obtain high-quality representations for items, this kind of adaptive sampler may suffer from bad performance.

2.2. Content-based Methods

Content-based methods [2][21][22] utilize the contents of entities, such as attributes of items, texts of users and reviews of items, to learn factors of contents and predict specific users' preferences. Factorization Machines (FM) [2] is a generic predictor, since it can mimic most factorization models with feature engineering [2]. In FM, all kinds of contents are concatenated into a design matrix, and then factors associated with the contents are learned through a regression or classification process. Recently, to provide easy access to different solvers for regression, classification and ranking tasks, fastFM [23] embeds the various losses into the FM framework. For instance, to deal with the ranking task, fastFM uses a random sampler to draw item pairs for the corresponding users, and learns the parameters of FM by making use of the pairwise loss of BPR. However, fastFM with the pairwise loss of BPR cannot learn well some parameters of FM which are relevant to user features, and this limitation hampers its ability to improve the quality of personalized ranking.

On the other hand, to alleviate the cold start problem, Map-BPR [7] and Maa-BPR [24] extend the BPR framework to learn content-aware mappings from the content space to the latent space. However, Map-BPR separates matrix factorization and the learning of content-aware mappings into two independent phases. As a result, the low-rank matrices produced by Map-BPR ignore the content information of users and items. Maa-BPR improves on Map-BPR, and incorporates multiple attributes with implicit feedbacks to learn latent vectors for entities and attribute-aware mappings for multiple attributes.

3. Background

In this section, we introduce Bayesian Personalized Ranking (BPR) [6] and Factorization Machines (FM) [2], inspired by which we construct our model.

3.1. Bayesian Personalized Ranking

Bayesian Personalized Ranking (BPR) [6] makes the pairwise assumption that users are more interested in items that they have selected than in the remaining items, and learns low-rank matrices for users and items respectively. Specifically, if user $u_m$ has selected item $v_p$ but not item $v_q$, then BPR assumes that $u_m$ prefers $v_p$ over $v_q$, and defines this pairwise preference of $u_m$ as

$$P(p \succ_m q) := \sigma(x_{mpq}), \quad (1)$$

where $\sigma(x) = 1/(1 + \exp(-x))$ and $x_{mpq} := f(u_m, v_p) - f(u_m, v_q)$. Here $f(\cdot, \cdot)$ is a scoring function which indicates the relevance between a user and an item. Based on the pairwise preference assumption, the set of all pairwise preferences $D_S := \{(m, p, q) \mid v_p \in I^+_{u_m} \wedge v_q \in I \setminus I^+_{u_m}\}$ can be created from the corresponding implicit feedback dataset, where $I^+_{u_m}$ denotes the set of items which have been selected by user $u_m$, and a triple $t = (m, p, q)$ represents that user $u_m$ is relevant to item $v_p$ but irrelevant to item $v_q$. For simplicity, we call $v_p$ a positive item of $u_m$ and $v_q$ a negative one; accordingly, $(u_m, v_p)$ is a positive feedback and $(u_m, v_q)$ a negative one. Given the set of pairwise preferences $D_S$, the optimization goal of BPR learning is to maximize the BPR-OPT criterion:

$$\mathrm{BPR\text{-}OPT} := \ln \prod_{(m,p,q) \in D_S} P(p \succ_m q), \quad (2)$$

which is equivalent to minimizing the negative log-likelihood:

$$L = - \sum_{(m,p,q) \in D_S} \ln \sigma(x_{mpq}) + \frac{\lambda}{2} \|\Theta\|^2, \quad (3)$$

where $\Theta$ denotes the set of parameters and $\lambda$ is a hyper-parameter. Since the size of $D_S$ is very large, BPR learning algorithms are generally based on stochastic gradient descent (SGD) with training data randomly drawn from $D_S$.
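As a concrete illustration of BPR learning, the following sketch runs SGD on the per-triple loss $-\ln \sigma(x_{mpq}) + \frac{\lambda}{2}\|\Theta\|^2$ with a simple matrix-factorization scoring function $f(u_m, v_p) = \langle U_m, V_p \rangle$. The dimensions, learning rate and regularization constant are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, K = 5, 8, 4                  # illustrative sizes
U = 0.01 * rng.standard_normal((n_users, K))   # user factors
V = 0.01 * rng.standard_normal((n_items, K))   # item factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_sgd_step(m, p, q, lr=0.05, lam=0.01):
    """One SGD step on -ln sigma(x_mpq) + (lam/2)||Theta||^2 for a
    single triple (m, p, q), with f(u, v) = <U[m], V[v]>."""
    um, vp, vq = U[m].copy(), V[p].copy(), V[q].copy()
    x_mpq = um @ vp - um @ vq
    g = sigmoid(-x_mpq)   # gradient of -ln sigma(x) w.r.t. x is -sigma(-x)
    U[m] += lr * (g * (vp - vq) - lam * um)
    V[p] += lr * (g * um - lam * vp)
    V[q] += lr * (-g * um - lam * vq)

# Repeated updates on one triple should push f(u0, v1) above f(u0, v2).
for _ in range(200):
    bpr_sgd_step(0, 1, 2)
assert U[0] @ V[1] > U[0] @ V[2]
```

In practice the triple $(m, p, q)$ is redrawn from $D_S$ at every step, which is exactly where the sampling strategies discussed in the introduction come into play.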

3.2. Factorization Machines

Factorization Machines (FM) [2] is a generic method which can mimic most factorization models just by feature engineering [2]. In FM, all kinds of contents of a sample are concatenated into a feature vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, where $x_i$ is the value of the $i$-th feature and $n$ is the number of features; stacking the feature vectors of all samples yields the design matrix $X$. Then, the FM model of order 2 is defined as

$$\hat{y} := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_i x_j \sum_{k=1}^{K} v_{ik} v_{jk}, \quad (4)$$

where $\hat{y}$ is the predictive value, $w_0$ is the global bias, $w_i$ is the strength of the $i$-th feature, $v_{ik}$ is the $k$-th interaction factor of the $i$-th feature, and $K$ is the number of interaction factors per feature.
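Eq. (4) can be evaluated in $O(nK)$ time using Rendle's reformulation of the pairwise term, $\frac{1}{2}\sum_k \big[(\sum_i v_{ik} x_i)^2 - \sum_i v_{ik}^2 x_i^2\big]$. The sketch below checks this against the naive double sum; all parameter values are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 6, 3                      # n features, K factors per feature
w0 = 0.1
w = rng.standard_normal(n)
V = rng.standard_normal((n, K))  # V[i, k] corresponds to v_ik in Eq. (4)

def fm_predict(x):
    """Order-2 FM prediction using the O(nK) reformulation:
    pairwise term = 0.5 * sum_k ((sum_i v_ik x_i)^2 - sum_i v_ik^2 x_i^2)."""
    s = V.T @ x                  # s[k] = sum_i v_ik * x_i
    pair = 0.5 * (s @ s - np.sum((V ** 2).T @ (x ** 2)))
    return w0 + w @ x + pair

# Cross-check against the naive double sum in Eq. (4).
x = rng.standard_normal(n)
naive = w0 + w @ x + sum(x[i] * x[j] * (V[i] @ V[j])
                         for i in range(n) for j in range(i + 1, n))
assert abs(fm_predict(x) - naive) < 1e-9
```

This linear-time evaluation is what makes FM practical when the concatenated content features of users and items are high-dimensional.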

4. Proposed Model

In this section, we construct a personalized ranking model, i.e., Pairwise Ranking FM (PRFM), which incorporates implicit feedbacks with the contents of users and items for personalized ranking.
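The core idea of PRFM, as described above, is to combine the BPR pairwise loss with FM scoring over concatenated user and item content features. A minimal sketch of such a pairwise FM loss follows; the function names and the exact regularization form are our illustrative assumptions, not the paper's precise objective.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Order-2 FM score for a concatenated (user content, item content)
    feature vector x, using the O(nK) pairwise-term reformulation."""
    s = V.T @ x
    return w0 + w @ x + 0.5 * (s @ s - np.sum((V ** 2).T @ (x ** 2)))

def prfm_pairwise_loss(x_pos, x_neg, w0, w, V, lam=0.01):
    """BPR-style pairwise loss over FM scores:
    -ln sigma(f(x_pos) - f(x_neg)) plus an (assumed) L2 penalty."""
    diff = fm_score(x_pos, w0, w, V) - fm_score(x_neg, w0, w, V)
    reg = 0.5 * lam * (w0 ** 2 + w @ w + np.sum(V ** 2))
    return -np.log(1.0 / (1.0 + np.exp(-diff))) + reg

# With all-zero parameters the two scores tie, so the unregularized
# loss equals -ln(1/2) = ln 2.
d, K = 4, 2
loss = prfm_pairwise_loss(np.ones(d), np.ones(d),
                          0.0, np.zeros(d), np.zeros((d, K)), lam=0.0)
assert abs(loss - np.log(2.0)) < 1e-12
```

Because the score is an FM over content features, the loss remains well-defined for cold-start users and items whose feature vectors are built purely from content information.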

4.1. Problem Statement

5. Adaptive Sampling
5.1. Overview

5.2. Category Inference



References


Weiyu Guo, Shu Wu, Liang Wang, and Tieniu Tan (2016). "Personalized Ranking with Pairwise Factorization Machines." In: Neurocomputing. doi:10.1016/j.neucom.2016.05.074