2011 EntityDisambiguationwithHierarc
- (Kataria et al., 2011) ⇒ Saurabh S. Kataria, Krishnan S. Kumar, Rajeev R. Rastogi, Prithviraj Sen, and Srinivasan H. Sengamedu. (2011). “Entity Disambiguation with Hierarchical Topic Models.” In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2011). ISBN:978-1-4503-0813-7 doi:10.1145/2020408.2020574
Subject Headings: Entity Reference Disambiguation.
Notes
Cited By
- http://scholar.google.com/scholar?q=%222011%22+Entity+Disambiguation+with+Hierarchical+Topic+Models
- http://dl.acm.org/citation.cfm?id=2020408.2020574&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
Disambiguating entity references by annotating them with unique ids from a catalog is a critical step in the enrichment of unstructured content. In this paper, we show that topic models, such as Latent Dirichlet Allocation (LDA) and its hierarchical variants, form a natural class of models for learning accurate entity disambiguation models from crowd-sourced knowledge bases such as Wikipedia. Our main contribution is a semi-supervised hierarchical model called Wikipedia-based Pachinko Allocation Model (WPAM) that exploits: (1) All words in the Wikipedia corpus to learn word-entity associations (unlike existing approaches that only use words in a small fixed window around annotated entity references in Wikipedia pages), (2) Wikipedia annotations to appropriately bias the assignment of entity labels to annotated (and co-occurring unannotated) words during model learning, and (3) Wikipedia's category hierarchy to capture co-occurrence patterns among entities. We also propose a scheme for pruning spurious nodes from Wikipedia's crowd-sourced category hierarchy. In our experiments with multiple real-life datasets, we show that WPAM outperforms state-of-the-art baselines by as much as 16% in terms of disambiguation accuracy.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2011 EntityDisambiguationwithHierarc | Rajeev Rastogi, Prithviraj Sen, Saurabh S. Kataria, Krishnan S. Kumar, Srinivasan H. Sengamedu | | | Entity Disambiguation with Hierarchical Topic Models | | | | 10.1145/2020408.2020574 | | 2011 |