2011 EntityDisambiguationwithHierarc


(Kataria et al., 2011) ⇒ Saurabh S. Kataria, Krishnan S. Kumar, Rajeev R. Rastogi, Prithviraj Sen, and Srinivasan H. Sengamedu. (2011). “Entity Disambiguation with Hierarchical Topic Models.” In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2011). ISBN: 978-1-4503-0813-7. doi:10.1145/2020408.2020574

Subject Headings: Entity Reference Disambiguation.

Notes

Cited By

Quotes

Author Keywords

Algorithms; data mining; disambiguation; entity resolution; performance; theory; topic models

Abstract

Disambiguating entity references by annotating them with unique ids from a catalog is a critical step in the enrichment of unstructured content. In this paper, we show that topic models, such as Latent Dirichlet Allocation (LDA) and its hierarchical variants, form a natural class of models for learning accurate entity disambiguation models from crowd-sourced knowledge bases such as Wikipedia. Our main contribution is a semi-supervised hierarchical model called Wikipedia-based Pachinko Allocation Model (WPAM) that exploits: (1) All words in the Wikipedia corpus to learn word-entity associations (unlike existing approaches that only use words in a small fixed window around annotated entity references in Wikipedia pages), (2) Wikipedia annotations to appropriately bias the assignment of entity labels to annotated (and co-occurring unannotated) words during model learning, and (3) Wikipedia's category hierarchy to capture co-occurrence patterns among entities. We also propose a scheme for pruning spurious nodes from Wikipedia's crowd-sourced category hierarchy. In our experiments with multiple real-life datasets, we show that WPAM outperforms state-of-the-art baselines by as much as 16% in terms of disambiguation accuracy.
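
The idea of learning word-entity associations from all words of annotated text, as mentioned in the abstract, can be illustrated with a deliberately simple sketch. This is not WPAM and not a topic model: it is a smoothed, naive-Bayes-style scorer over a tiny made-up corpus, and it ignores the category hierarchy entirely. The entity ids, the training sentences, and the names `TRAIN`, `train`, `score`, and `disambiguate` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus: each text is annotated with one catalog entity id.
TRAIN = [
    ("Jaguar_(animal)", "the jaguar is a large cat that hunts prey in the rainforest"),
    ("Jaguar_(animal)", "a jaguar stalks its prey near the river at night"),
    ("Jaguar_Cars", "jaguar unveiled a new luxury car with a powerful engine"),
    ("Jaguar_Cars", "the jaguar sedan has a quiet engine and leather seats"),
]


def train(docs):
    """Count every word of every annotated text under its entity id."""
    word_counts = defaultdict(Counter)
    for entity, text in docs:
        word_counts[entity].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, vocab


def score(entity, context, word_counts, vocab, alpha=1.0):
    """Log-likelihood of the known context words under one entity,
    with additive (Dirichlet-style) smoothing controlled by alpha."""
    counts = word_counts[entity]
    denom = sum(counts.values()) + alpha * len(vocab)
    return sum(
        math.log((counts[w] + alpha) / denom)
        for w in context.split()
        if w in vocab  # words never seen in training are skipped
    )


def disambiguate(context, word_counts, vocab):
    """Return the entity id whose word distribution best explains the context."""
    return max(
        sorted(word_counts),
        key=lambda e: score(e, context, word_counts, vocab),
    )


word_counts, vocab = train(TRAIN)
print(disambiguate("the jaguar engine was loud", word_counts, vocab))
print(disambiguate("jaguar hunts prey at night", word_counts, vocab))
```

On this toy data the first context resolves to the car maker and the second to the animal, because "engine" and "hunts", "prey", "night" occur only under one entity each. A model such as the one the abstract describes would additionally share statistical strength across related entities through the category hierarchy, which this sketch does not attempt.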

