Tao Cheng
References
- Professional Home Page: http://research.microsoft.com/en-us/people/taocheng/
- DBLP Author Page: http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/c/Cheng:Tao.html
2011
- (Cheng, 2011) ⇒ Tao Cheng. (2011). “Toward Entity-Aware Search.” PhD Thesis, University of Illinois at Urbana-Champaign.
- ABSTRACT: As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that seamlessly integrates both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we distill and abstract the essential computation requirements of entity search. From the dual views of reasoning (entity as input and entity as output), we propose a dual-inversion framework, with two indexing and partition schemes, for efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery by mining query log data. The results obtained so far show clear promise for entity-aware search in terms of usefulness, effectiveness, efficiency, and scalability.
- SUBJECT(S): Entity Search; Entity-aware Search; Entity Indexing; Entity Synonym; Content Query Language.
- QUOTE: We use a prefix # sign (e.g., #phone for phone entity) throughout the thesis to distinguish entities from keywords. Further, each entity type [math]\displaystyle{ E_i }[/math] is a set of entity instances that are extracted from the corpus, i.e., literal values of entity type [math]\displaystyle{ E_i }[/math] that occur somewhere in some document [math]\displaystyle{ d \in D }[/math]. We use [math]\displaystyle{ e_i }[/math] to denote an entity instance of entity type [math]\displaystyle{ E_i }[/math]. In the example of phone-number patterns, we may extract #phone = {“800-2017575”, “244-2919”, ...}
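The definition of an entity type as a set of extracted instances can be made concrete with a small sketch. It assumes a toy corpus D held as a dict of document texts, and a hypothetical regular expression (`PHONE_PATTERN`) standing in for the phone-number patterns; the thesis's actual extraction patterns are not reproduced here. The hypothetical helper `extract_entities` builds the instance set for #phone and also records where each instance occurs, which is the kind of positional information an entity-aware index would need.

```python
import re
from collections import defaultdict

# Hypothetical pattern for illustration only: matches "800-2017575"
# and "244-2919" style strings. Not the thesis's extraction rules.
PHONE_PATTERN = re.compile(r"\b(?:\d{3}-)?\d{3}-?\d{4}\b")


def extract_entities(corpus, pattern):
    """Return (instances, postings) for one entity type E_i.

    instances: set of literal values that occur somewhere in some d in D.
    postings:  doc_id -> list of (char_offset, value) occurrences.
    """
    instances = set()
    postings = defaultdict(list)
    for doc_id, text in corpus.items():
        for m in pattern.finditer(text):
            instances.add(m.group())
            postings[doc_id].append((m.start(), m.group()))
    return instances, dict(postings)


# Made-up toy corpus D.
corpus = {
    "d1": "Customer service: call 800-2017575 toll free.",
    "d2": "Office phone 244-2919, fax 244-2920.",
}

# Entity types keyed with the '#' prefix convention from the quote.
entity_types = {}
phone_instances, phone_postings = extract_entities(corpus, PHONE_PATTERN)
entity_types["#phone"] = phone_instances
print(sorted(entity_types["#phone"]))  # ['244-2919', '244-2920', '800-2017575']
```

Keeping the offsets alongside the values lets a later ranking step reason about how close an entity instance sits to the query keywords within a document.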
2010
- (Cheng, Lauw, & Paparizos, 2010) ⇒ Tao Cheng, Hady Lauw, and Stelios Paparizos. (2010). “Fuzzy Matching of Web Queries to Structured Data.” In: Proceedings of the 2010 IEEE International Conference on Data Engineering (ICDE 2010). doi:10.1109/ICDE.2010.5447817
- ABSTRACT: Recognizing the alternative ways people use to reference an entity is important for many Web applications that query structured data. In such applications, there is often a mismatch between how content creators describe entities and how different users try to retrieve them. In this paper, we consider the problem of determining whether a candidate query approximately matches an entity. We propose an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages. This way, given a set of strings that reference entities, we generate an expanded set of equivalent strings for each entity. The proposed method is verified with experiments on real-life data sets showing that we can dramatically increase the number of queries that can be matched.
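The query-log idea in this abstract (different strings that lead users to the same Web pages are candidate equivalents of an entity string) can be sketched as follows. This is a simplified illustration, not the paper's scoring: it assumes a click log of `(query, url)` pairs, a hypothetical mapping from each entity to the URLs that describe it, and made-up thresholds `min_share` and `min_clicks`. A query is kept as an equivalent string when enough of its clicks land on the entity's pages.

```python
from collections import Counter, defaultdict


def expand_entity_strings(click_log, entity_pages, min_share=0.5, min_clicks=2):
    """Map each entity to query strings whose clicks mostly hit its pages.

    click_log:    iterable of (query, url) pairs (hypothetical log format).
    entity_pages: entity string -> set of URLs describing that entity.
    min_share, min_clicks: illustrative thresholds, not from the paper.
    """
    # Count clicked URLs per normalized query string.
    clicks_per_query = defaultdict(Counter)
    for query, url in click_log:
        clicks_per_query[query.lower().strip()][url] += 1

    expanded = {}
    for entity, pages in entity_pages.items():
        synonyms = {entity.lower()}
        for query, url_counts in clicks_per_query.items():
            total = sum(url_counts.values())
            hits = sum(c for url, c in url_counts.items() if url in pages)
            # Keep queries with enough absolute and relative click support.
            if hits >= min_clicks and hits / total >= min_share:
                synonyms.add(query)
        expanded[entity] = synonyms
    return expanded


# Made-up log: two strings lead to the same product pages u1/u2.
log = [
    ("canon eos 350d", "u1"), ("canon eos 350d", "u1"),
    ("digital rebel xt", "u1"), ("digital rebel xt", "u2"),
    ("digital rebel xt", "u1"),
    ("canon printer", "u3"), ("canon printer", "u1"),
]
expanded = expand_entity_strings(log, {"Canon EOS 350D": {"u1", "u2"}})
print(sorted(expanded["Canon EOS 350D"]))  # ['canon eos 350d', 'digital rebel xt']
```

Requiring both an absolute click count and a click share is one simple way to keep a broad query such as "canon printer", which only occasionally reaches the entity's pages, out of the expanded set.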
2007
- (Cheng et al., 2007) ⇒ Tao Cheng, Xifeng Yan, and Kevin Chen-Chuan Chang. (2007). “EntityRank: Searching Entities Directly and Holistically.” In: Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB 2007).
- Note: It describes the EntityRank System for the Entity Mention Search Task.
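The 2011 abstract says EntityRank integrates local and global information when ranking entities. The sketch below is a toy scorer in that spirit, not EntityRank's actual probabilistic model. Local evidence is a hypothetical proximity score between the query keywords and one entity occurrence inside a document. Global evidence sums that local score over documents, each weighted by an assumed document prior standing in for a page-quality weight such as PageRank. The window size, exponential decay, and data layout are all illustrative assumptions.

```python
import math
from collections import defaultdict


def local_score(tokens, keywords, entity_pos, window=10):
    """Hypothetical local evidence for one entity occurrence in one document.

    Returns 0 unless every keyword occurs within `window` tokens of the
    entity; otherwise decays with the farthest keyword's nearest distance.
    """
    span = 0
    for kw in keywords:
        dists = [abs(i - entity_pos) for i, t in enumerate(tokens) if t == kw]
        if not dists or min(dists) > window:
            return 0.0
        span = max(span, min(dists))
    return math.exp(-span / window)


def rank_entities(docs, keywords, entity_type):
    """Aggregate local evidence across documents (the 'global' view).

    docs: list of (doc_prior, tokens, entities), where entities is a list of
    (token_position, entity_type, value). All of this is an assumption.
    """
    scores = defaultdict(float)
    for prior, tokens, entities in docs:
        best = {}  # best local score per entity value within this document
        for pos, etype, value in entities:
            if etype != entity_type:
                continue
            s = local_score(tokens, keywords, pos)
            best[value] = max(best.get(value, 0.0), s)
        for value, s in best.items():
            scores[value] += prior * s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up documents; phone values reuse the examples quoted above.
docs = [
    (0.6, "acme customer service phone 800-2017575".split(),
     [(4, "#phone", "800-2017575")]),
    (0.3, "acme service line 800-2017575 or 244-2919".split(),
     [(3, "#phone", "800-2017575"), (5, "#phone", "244-2919")]),
]
ranking = rank_entities(docs, ["acme", "service"], "#phone")
for value, score in ranking:
    print(value, round(score, 4))
```

Taking the best local score per document before summing keeps one document that repeats an entity many times from dominating the global score; that choice is part of this sketch, not a claim about the paper.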