2009 ContextAwareQueryClassification

From GM-RKB

Subject Headings: Web Query Classification.

Notes

Cited By

Quotes

Author Keywords

Abstract

1. INTRODUCTION

  • Search engines have become one of the most popular tools for Web users to find their desired information. As a result, understanding the search intent behind the queries issued by Web users has become an important research problem. Query classification (or query categorization), denoted as QC, has been studied for this purpose by classifying user queries into a ranked list of predefined target categories. Such category information can be used to trigger the most appropriate vertical searches corresponding to a query, improve Web page ranking [18], and help find the relevant on-line advertisements.
  • Query classification is dramatically different from traditional text classification because of two issues. First, Web queries are usually very short. As reported in [5], most queries contain only 2-3 terms. Second, many queries are ambiguous [11], and it is common that a query belongs to multiple categories. For example, [27] manually labels 800 randomly sampled queries from the public data set from ACM KDD Cup'05, and 682 queries have multiple category labels.
  • To address the above challenges, a variety of query classification approaches have been proposed in the literature. In general, these approaches can be divided into three categories. The first category tries to augment the queries with extra data, including the search results returned for a certain query, the information from an existing corpus, or an intermediate taxonomy [8, 27]. The second category leverages unlabeled data to help improve the accuracy of supervised learning [5, 6]. Finally, the third category of approaches expands the training data by automatically labeling some queries in some click-through data via a self-training-like approach [21]. Although the existing methods may be successful in some cases, most of them are not context-aware; that is, they treat each query individually without considering the user behavior history.
  • A MOTIVATING EXAMPLE. Suppose that a user issues a query "Michael Jordan". It is not clear whether the user is interested in the famous basketball player or the machine learning researcher at UC Berkeley. Without understanding the user's search intent, many existing methods may classify the query into both categories "Sports" and "Computer Science". However, if we find that the user has issued a query "NBA" before "Michael Jordan", it is likely that the user is interested in the category of "Sports". Conversely, if the user issues some queries related to machine learning before the query "Michael Jordan", it may suggest the user is interested in the topics related to "Computer Science".
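
The intuition in the motivating example can be illustrated with a toy sketch. It is not the method of the paper: it assumes a hypothetical lookup table `QUERY_PRIORS` of per-query category scores (the values are invented for illustration) and a hypothetical function `classify` that boosts each candidate category by how strongly the earlier queries in the same session point to it.

```python
from collections import Counter

# Hypothetical per-query category scores; illustrative values only,
# not taken from the paper or from any real data set.
QUERY_PRIORS = {
    "michael jordan": {"Sports": 0.5, "Computer Science": 0.5},
    "nba": {"Sports": 1.0},
    "machine learning": {"Computer Science": 1.0},
}


def classify(query, history=(), context_weight=1.0):
    """Return (category, score) pairs for `query`, ranked by score.

    Categories that also appear for earlier queries in `history`
    are boosted; scores are normalized to sum to 1.
    """
    scores = dict(QUERY_PRIORS.get(query.lower(), {}))

    # Accumulate how much category mass the preceding queries carry.
    context = Counter()
    for past in history:
        for category, score in QUERY_PRIORS.get(past.lower(), {}).items():
            context[category] += score

    # Boost each candidate category by its support in the context.
    for category in scores:
        scores[category] *= 1.0 + context_weight * context[category]

    total = sum(scores.values())
    # Sort by descending score, breaking ties alphabetically.
    return sorted(
        ((category, score / total) for category, score in scores.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


if __name__ == "__main__":
    print(classify("Michael Jordan"))                        # ambiguous
    print(classify("Michael Jordan", ["NBA"]))               # leans to Sports
    print(classify("Michael Jordan", ["machine learning"]))  # leans to Computer Science
```

With no history the two categories tie; with "NBA" in the history "Sports" ranks first, and with "machine learning" in the history "Computer Science" ranks first. A multiplicative boost is only one simple way to combine the query's own evidence with its context, chosen here for brevity.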


2009 ContextAwareQueryClassification
  Author: Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang
  Title: Context-Aware Query Classification
  URL: http://research.microsoft.com/pubs/81350/sigir09.pdf
  DOI: 10.1145/1571941.1571945