2009 ContextAwareQueryClassification

(Cao et al., 2009) ⇒ Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang. (2009). “Context-Aware Query Classification.” In: Proceedings of the 32nd international ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009) doi:10.1145/1571941.1571945

Subject Headings: Web Query Classification.

Notes

Cited By

~8 http://scholar.google.com/scholar?cites=4105655774184241115

Quotes

Author Keywords

Search context, Query classification

Abstract

Understanding users' search intent expressed through their search queries is crucial to Web search and online advertisement. Web query classification (QC) has been widely studied for this purpose. Most previous QC algorithms classify individual queries without considering their context information. However, as exemplified by the well-known example on query “jaguar”, many Web queries are short and ambiguous, and their real meanings are uncertain without the context information. In this paper, we incorporate context information into the problem of query classification by using conditional random field (CRF) models. In our approach, we use neighboring queries and their corresponding clicked URLs (Web pages) in search sessions as the context information. We perform extensive experiments on real-world search logs and validate the effectiveness and efficiency of our approach. We show that we can improve the F1 score by 52% as compared to other state-of-the-art baselines.
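The abstract says context is modeled with a CRF over the queries in a session. One common form is a linear-chain CRF, in which each query gets a per-category score and consecutive queries get a category-to-category transition score. The most likely category sequence is then found by Viterbi decoding. The sketch below illustrates that idea only. It assumes a linear chain, uses hand-set, hypothetical scores instead of weights learned from query text and clicked URLs, and the names `viterbi`, `LABELS` and `TRANS` are ours. It is not claimed to be the paper's exact model.

```python
from typing import Dict, List, Sequence, Tuple

Label = str


def viterbi(
    emissions: Sequence[Dict[Label, float]],
    transitions: Dict[Tuple[Label, Label], float],
    labels: Sequence[Label],
) -> Tuple[List[Label], float]:
    """Return the highest-scoring category sequence for a session and its score.

    emissions[t][y]     -- log-potential of category y for the t-th query
                           (missing entries score 0.0)
    transitions[(a, b)] -- log-potential of category a followed by category b
                           on consecutive queries (missing pairs score 0.0)
    """
    if not emissions:
        return [], 0.0
    # best[t][y]: score of the best category prefix that ends with y at query t.
    best: List[Dict[Label, float]] = [{y: emissions[0].get(y, 0.0) for y in labels}]
    back: List[Dict[Label, Label]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            # Choose the previous category maximizing prefix score + transition.
            prev = max(labels, key=lambda p: best[t - 1][p] + transitions.get((p, y), 0.0))
            best[t][y] = (best[t - 1][prev] + transitions.get((prev, y), 0.0)
                          + emissions[t].get(y, 0.0))
            back[t][y] = prev
    # Follow the back-pointers from the best final category.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical, hand-set scores: staying in the same category is rewarded.
LABELS = ["Sports", "Computer Science"]
TRANS = {
    ("Sports", "Sports"): 1.0,
    ("Computer Science", "Computer Science"): 1.0,
    ("Sports", "Computer Science"): -1.0,
    ("Computer Science", "Sports"): -1.0,
}
session = [
    {"Sports": 2.0, "Computer Science": -1.0},  # "NBA"
    {"Sports": 0.5, "Computer Science": 0.5},   # "Michael Jordan" (ambiguous alone)
]
print(viterbi(session, TRANS, LABELS))  # (['Sports', 'Sports'], 3.5)
```

The ambiguous query scores the same for both categories on its own, so the transition from the preceding "NBA" query decides its label.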

1. INTRODUCTION

Search engines have become one of the most popular tools for Web users to find their desired information. As a result, understanding the search intent behind the queries issued by Web users has become an important research problem. Query classification (or query categorization), denoted as QC, has been studied for this purpose by classifying user queries into a ranked list of predefined target categories. Such category information can be used to trigger the most appropriate vertical searches corresponding to a query, improve Web page ranking [18], and help find the relevant on-line advertisements.
Query classification is dramatically different from traditional text classification because of two issues. First, Web queries are usually very short. As reported in [5], most queries contain only 2-3 terms. Second, many queries are ambiguous [11], and it is common for a query to belong to multiple categories. For example, the authors of [27] manually labeled 800 randomly sampled queries from the public ACM KDD Cup'05 data set and found that 682 of them have multiple category labels.
To address the above challenges, a variety of query classification approaches have been proposed in the literature. In general, these approaches can be divided into three categories. The first category tries to augment the queries with extra data, including the search results returned for a certain query, the information from an existing corpus, or an intermediate taxonomy [8, 27]. The second category leverages unlabeled data to help improve the accuracy of supervised learning [5, 6]. Finally, the third category of approaches expands the training data by automatically labeling queries found in click-through data via a self-training-like approach [21]. Although the existing methods may be successful in some cases, most of them are not context-aware; that is, they treat each query individually without considering the user's behavior history.
A MOTIVATING EXAMPLE. Suppose that a user issues the query "Michael Jordan". It is not clear whether the user is interested in the famous basketball player or the machine learning researcher at UC Berkeley. Without understanding the user's search intent, many existing methods may classify the query into both categories "Sports" and "Computer Science". However, if we find that the user issued the query "NBA" before "Michael Jordan", it is likely that the user is interested in the category "Sports". Conversely, if the user issued queries related to machine learning before "Michael Jordan", this may suggest that the user is interested in topics related to "Computer Science".
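Since QC returns a ranked list of categories rather than a single label, one way to turn a sequence model into such a ranking is to compute, for the current query, the marginal probability of each category given the whole session. For a linear-chain CRF this is done with the forward-backward algorithm. The sketch below applies it to the "Michael Jordan" example. It assumes a linear chain and uses hand-set, hypothetical scores; the names `marginals`, `rank_categories`, `LABELS` and `TRANS` are ours, and this is not claimed to be the paper's inference procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginals(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> List[Dict[str, float]]:
    """Forward-backward: P(category of query t = y | whole session), per query."""
    n = len(emissions)
    if n == 0:
        return []
    emit = [{y: e.get(y, 0.0) for y in labels} for e in emissions]

    def trans(a: str, b: str) -> float:
        return transitions.get((a, b), 0.0)

    # alpha[t][y]: log-sum of scores of all category prefixes ending with y at t.
    alpha = [dict(emit[0])]
    for t in range(1, n):
        alpha.append({y: emit[t][y] + logsumexp([alpha[t - 1][p] + trans(p, y) for p in labels])
                      for y in labels})
    # beta[t][y]: log-sum of scores of all category suffixes following y at t.
    beta = [{y: 0.0 for y in labels} for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = {y: logsumexp([trans(y, q) + emit[t + 1][q] + beta[t + 1][q] for q in labels])
                   for y in labels}
    log_z = logsumexp([alpha[-1][y] for y in labels])  # log partition function
    return [{y: math.exp(alpha[t][y] + beta[t][y] - log_z) for y in labels} for t in range(n)]


def rank_categories(marginal: Dict[str, float]) -> List[str]:
    """Order categories by decreasing marginal probability."""
    return sorted(marginal, key=marginal.get, reverse=True)


# Hypothetical, hand-set scores: staying in the same category is rewarded.
LABELS = ["Sports", "Computer Science"]
TRANS = {
    ("Sports", "Sports"): 1.0,
    ("Computer Science", "Computer Science"): 1.0,
    ("Sports", "Computer Science"): -1.0,
    ("Computer Science", "Sports"): -1.0,
}
mj = {"Sports": 0.5, "Computer Science": 0.5}  # "Michael Jordan" on its own
alone = marginals([mj], TRANS, LABELS)[-1]
with_nba = marginals([{"Sports": 2.0, "Computer Science": -1.0}, mj], TRANS, LABELS)[-1]
print(alone)                       # both categories about 0.5
print(rank_categories(with_nba))   # ['Sports', 'Computer Science']
```

With these scores, the query alone splits evenly between the two categories, while a preceding "NBA" query raises P("Sports") to roughly 0.84.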


	Author	title	journal	titleUrl	doi	year
2009 ContextAwareQueryClassification	Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang	Context-Aware Query Classification	Proceedings of the 32nd international ACM SIGIR Conference on Research and Development in Information Retrieval	http://research.microsoft.com/pubs/81350/sigir09.pdf	10.1145/1571941.1571945	2009
