# 2007 LearningtoRankFromPairwiseAppro

- (Cao et al., 2007) ⇒ Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. (2007). “Learning to Rank: From Pairwise Approach to Listwise Approach.” In: Proceedings of the 24th International Conference on Machine Learning. ISBN:978-1-59593-793-3. doi:10.1145/1273496.1273513

**Subject Headings:** Listwise Learning to Rank Algorithm, Pairwise Learning to Rank Algorithm, ListNet Algorithm.

## Notes

## Cited By

- http://scholar.google.com/scholar?q=%222007%22+Learning+to+Rank%3A+From+Pairwise+Approach+to+Listwise+Approach
- http://dl.acm.org/citation.cfm?id=1273496.1273513&preflayout=flat#citedby

## Quotes

### Abstract

The paper is concerned with learning to rank, which is to construct a model or a function for ranking objects.
Learning to rank is useful for document retrieval, collaborative filtering, and many other applications.
Several methods for learning to rank have been proposed, which take object pairs as 'instances' in learning.
We refer to them as the pairwise approach in this paper.
Although the pairwise approach offers advantages, it ignores the fact that ranking is a prediction task on lists of objects.
The paper postulates that learning to rank should adopt the listwise approach in which lists of objects are used as 'instances' in learning.
The paper proposes a new probabilistic method for the approach.
Specifically it introduces two probability models, respectively referred to as permutation probability and top *k* probability, to define a listwise loss function for learning.
Neural Network and Gradient Descent are then employed as model and algorithm in the learning method.
Experimental results on information retrieval show that the proposed listwise approach performs better than the pairwise approach.

### 1. Introduction

The central issues of many applications are ranking. These include document retrieval, collaborative filtering, expert finding, anti web spam, sentiment analysis, and product rating. In this paper, we address learning to rank, and without loss of generality we take document retrieval as an example. Learning to rank, when applied to document retrieval, is a task as follows. Assume that there is a collection of documents. In retrieval (i.e., ranking), given a query, the ranking function assigns a score to each document, and ranks the documents in descending order of the scores. The ranking order represents the relevance of documents with respect to the query. In learning, a number of queries are provided; each query is associated with a perfect ranking list of documents; a ranking function is then created using the training data, such that the model can precisely predict the ranking lists in the training data.
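The retrieval-time procedure described above (score each document for the query, then sort in descending order of score) can be sketched in a few lines. The scoring function below is a toy term-overlap heuristic chosen only for illustration; it is not from the paper.

```python
def rank(query, documents, score):
    """Sort documents by descending score for the given query."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)

# Toy scoring function: count query terms that also appear in the document.
def overlap_score(query, doc):
    return len(set(query.split()) & set(doc.split()))

docs = ["deep learning to rank", "grocery list", "rank aggregation methods"]
ranking = rank("learning to rank", docs, overlap_score)
# The document with the highest term overlap comes first.
```

A learned ranking function simply replaces `overlap_score` with a model trained on judged query-document lists.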

Due to its importance, learning to rank has been drawing broad attention in the machine learning community recently. Several methods based on what we call the pairwise approach have been developed and successfully applied to document retrieval. This approach takes document pairs as instances in learning, and formalizes the problem of learning to rank as that of classification. Specifically, in learning it collects document pairs from the ranking lists, and for each document pair it assigns a label representing the relative relevance of the two documents. It then trains a classification model with the labeled data and makes use of the classification model in ranking. The uses of Support Vector Machines (SVM), Boosting, and Neural Network as the classification model lead to the methods of Ranking SVM (Herbrich et al., 1999), RankBoost (Freund et al., 1998), and RankNet (Burges et al., 2005).
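The pair-collection step described above can be sketched as follows: from a list of documents with graded relevance judgments, every pair with differing grades becomes a labeled classification instance. The function and label convention here are illustrative, not the paper's notation.

```python
def make_pairs(docs_with_labels):
    """Yield (doc_a, doc_b, +1/-1) instances from (doc, relevance) tuples."""
    pairs = []
    for i, (da, ra) in enumerate(docs_with_labels):
        for db, rb in docs_with_labels[i + 1:]:
            if ra != rb:  # only pairs with different relevance are informative
                pairs.append((da, db, 1 if ra > rb else -1))
    return pairs

# Three judged documents for one query: d1 highly relevant, d2 irrelevant,
# d3 partially relevant.
judged = [("d1", 2), ("d2", 0), ("d3", 1)]
pairs = make_pairs(judged)
# → [('d1', 'd2', 1), ('d1', 'd3', 1), ('d2', 'd3', -1)]
```

Note how a query with n judged documents can yield up to n(n-1)/2 pairs, which is the source of the per-query imbalance the paper criticizes.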

There are advantages with taking the pairwise approach. First, existing methodologies on classification can be directly applied. Second, the training instances of document pairs can be easily obtained in certain scenarios (Joachims, 2002). However, there are also problems with the approach. First, the objective of learning is formalized as minimizing errors in classification of document pairs, rather than minimizing errors in ranking of documents. Second, the assumption that the document pairs are generated i.i.d. is too strong. Third, the number of generated document pairs varies largely from query to query, which will result in training a model biased toward queries with more document pairs (Cao et al., 2006).

In this paper, we propose employing what we call the listwise approach, in which document lists instead of document pairs are used as instances in learning. The major question then is how to define a listwise loss function, representing the difference between the ranking list output by a ranking model and the ranking list given as ground truth. We propose a probabilistic method to calculate the listwise loss function. Specifically, we transform both the scores of the documents assigned by a ranking function and the explicit or implicit judgments of the documents given by humans into probability distributions. We can then utilize any metric between the probability distributions as the loss function. We consider the uses of two models for the transformation; one is referred to as permutation probability and the other top k probability.
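The score-to-distribution idea above can be sketched with the top one probability, which (using the exponential transformation, as ListNet does) is a softmax over the scores: P(j) = exp(s_j) / Σ_k exp(s_k). Any divergence between two such distributions can then serve as a listwise loss; cross entropy is shown here. Function names are illustrative.

```python
import math

def top_one_probabilities(scores):
    """Softmax over a list of real-valued scores."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_loss(truth_scores, model_scores):
    """Cross entropy between the two induced top-one distributions."""
    p = top_one_probabilities(truth_scores)   # from human judgments
    q = top_one_probabilities(model_scores)   # from the ranking model
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The loss compares whole score lists at once, rather than one document pair at a time, which is the essential difference from the pairwise approach.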

We then propose a learning to rank method using the listwise loss function, with Neural Network as model and Gradient Descent as algorithm. We refer to it as ListNet. We applied ListNet to document retrieval and compared the results of it with those of existing pairwise methods including Ranking SVM, RankBoost, and RankNet. The results on three data sets show that our method outperforms the existing methods, suggesting that it is better to employ the listwise approach than the pairwise approach in learning to rank.
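A single training step in the spirit of ListNet can be sketched as follows. This is a simplification, not the authors' code: the "network" is linear, the loss is the cross entropy between top-one probability distributions of target scores and model scores, and one gradient descent step updates the weights, using the standard softmax-cross-entropy gradient dLoss/ds_j = q_j - p_j.

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def listnet_step(w, features, target_scores, lr=0.1):
    """One gradient descent step on a linear model for one query's list."""
    # Model scores for each document's feature vector.
    s = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    p, q = softmax(target_scores), softmax(s)
    # Gradient of the cross-entropy loss with respect to each weight w_k:
    # sum over documents j of (q_j - p_j) * x_{j,k}.
    grad = [sum((q[j] - p[j]) * features[j][k] for j in range(len(features)))
            for k in range(len(w))]
    return [wk - lr * gk for wk, gk in zip(w, grad)]
```

Iterating this step over all training queries, with a real multi-layer network in place of the linear model, gives the flavor of the ListNet training loop.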

The major contributions of this paper include (1) proposal of the listwise approach, (2) formulation of the listwise loss function on the basis of probability models, (3) development of the ListNet method, (4) empirical verification of the effectiveness of the approach.

The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 gives a general description on the listwise approach to learning to rank. Probability models for defining a listwise loss function are introduced in Section 4 and the learning method ListNet is explained in Section 5. Section 6 reports our experimental results. Finally, Section 7 makes conclusions.

### 2. Related Work

#### 2.1. Learning to Rank

Learning to rank is a new and popular topic in machine learning. There is one major approach to learning to rank, referred to as the pairwise approach in this paper. For other approaches, see (Shashua & Levin, 2002; Crammer & Singer, 2001; Lebanon & Lafferty, 2002), for example. In the pairwise approach, the learning task is formalized as classification of object pairs into two categories (correctly ranked and incorrectly ranked). Herbrich et al. (1999) proposed employing the approach and using SVM techniques to build the classification model. The method is referred to as Ranking SVM. Freund et al. (1998) proposed performing the task in the same way but by means of Boosting. Burges et al. (2005) also adopted the approach and developed a method called RankNet. They employed Cross Entropy as the loss function and Gradient Descent as the algorithm to train a Neural Network model. Learning to rank, particularly the pairwise approach, has been successfully applied to information retrieval. For instance, Joachims (2002) applied Ranking SVM to document retrieval. He developed a method of deriving document pairs for training from users’ click-through data. Burges et al. (2005) applied RankNet to large scale web search. Cao et al. (2006) adapted Ranking SVM to document retrieval by modifying the loss function. See also (Qin et al., 2007; Tsai et al., 2007).

#### 2.2. Probability Models on Ranking

In statistics, probability models for representing ranking lists of objects and methods for estimating the models have been proposed. For example, following the work by Luce (1959), Plackett (1975) defined probability models on ranking lists of objects. He further proposed a method for estimating the models. In this paper, we make use of similar probability distributions. However, the underlying structures (i.e., parameters) and the fundamental usages (i.e., transformation of scores to probability distributions) of our models differ from those of Plackett’s.

### 3. Listwise Approach

In this section, we give a general description of learning to rank, with document retrieval as an example. In particular, we describe the listwise approach in detail. In this paper, we use a superscript to denote the id of a query and a subscript to denote the id of a document. In training, a set of queries Q = {q^(1), q^(2), ..., q^(m)} is given. Each query q^(i) is associated with a list of documents d^(i) = (d^(i)_1, d^(i)_2, ..., d^(i)_n(i)), where d^(i) …
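The Luce–Plackett style permutation probability mentioned in the related work can be sketched as follows: given per-object scores, the probability of a permutation is a product over positions, at each position a softmax over the objects not yet placed. The code and variable names are illustrative, not the paper's notation.

```python
import math
from itertools import permutations

def permutation_probability(perm, scores):
    """Plackett-Luce style probability of one permutation of the objects.

    At each position, the chosen object takes probability
    exp(score) / sum of exp(score) over the objects still remaining.
    """
    prob = 1.0
    remaining = list(perm)
    for obj in perm:
        z = sum(math.exp(scores[o]) for o in remaining)
        prob *= math.exp(scores[obj]) / z
        remaining.remove(obj)
    return prob

scores = {"a": 3.0, "b": 1.0, "c": 0.5}
# The probabilities over all n! permutations form a valid distribution,
# and the permutation sorted by descending score is the most probable.
total = sum(permutation_probability(p, scores) for p in permutations(scores))
```

Since enumerating n! permutations is intractable for real document lists, the paper's top k probability (marginalizing over all permutations sharing the same first k objects) reduces the cost; with k = 1 it collapses to a plain softmax over the scores.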

## References

;

Author | Date Value | title | doi | year
---|---|---|---|---
Hang Li; Tie-Yan Liu; Zhe Cao; Tao Qin; Ming-Feng Tsai | 2007 | Learning to Rank: From Pairwise Approach to Listwise Approach | 10.1145/1273496.1273513 | 2007