2011 EntityDisambiguationwithHierarc

Subject Headings: Entity Reference Disambiguation.

Notes

Cited By

Quotes

Author Keywords

Abstract

Disambiguating entity references by annotating them with unique ids from a catalog is a critical step in the enrichment of unstructured content. In this paper, we show that topic models, such as Latent Dirichlet Allocation (LDA) and its hierarchical variants, form a natural class of models for learning accurate entity disambiguation models from crowd-sourced knowledge bases such as Wikipedia. Our main contribution is a semi-supervised hierarchical model called Wikipedia-based Pachinko Allocation Model (WPAM) that exploits: (1) All words in the Wikipedia corpus to learn word-entity associations (unlike existing approaches that only use words in a small fixed window around annotated entity references in Wikipedia pages), (2) Wikipedia annotations to appropriately bias the assignment of entity labels to annotated (and co-occurring unannotated) words during model learning, and (3) Wikipedia's category hierarchy to capture co-occurrence patterns among entities. We also propose a scheme for pruning spurious nodes from Wikipedia's crowd-sourced category hierarchy. In our experiments with multiple real-life datasets, we show that WPAM outperforms state-of-the-art baselines by as much as 16% in terms of disambiguation accuracy.
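
As a rough illustration of the word-entity associations the abstract mentions, the sketch below picks a catalog id for an ambiguous mention by scoring each candidate entity against the surrounding context words. It is not WPAM and is not taken from the paper: it is a plain smoothed unigram scorer with no topic model, annotation bias, or category hierarchy. The entity ids, the word counts, the function name `disambiguate`, and the smoothing parameter `alpha` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy word counts per catalog entity (illustrative only,
# not data from the paper).
ENTITY_WORD_COUNTS = {
    "Jaguar_(animal)": Counter({"cat": 5, "jungle": 4, "prey": 3, "speed": 1}),
    "Jaguar_Cars": Counter({"car": 6, "engine": 4, "speed": 3, "luxury": 2}),
}


def disambiguate(context_words, entity_word_counts, alpha=1.0):
    """Return (best_entity, scores) using additively smoothed log-likelihoods.

    Each candidate entity is scored by the sum of log P(word | entity)
    over the context words, with add-alpha smoothing over the vocabulary.
    """
    # Vocabulary: every word seen for any entity, plus the context words.
    vocab = {w for counts in entity_word_counts.values() for w in counts}
    vocab.update(context_words)

    scores = {}
    for entity, counts in entity_word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        scores[entity] = sum(
            math.log((counts[w] + alpha) / total) for w in context_words
        )

    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, _ = disambiguate(["engine", "speed", "luxury"], ENTITY_WORD_COUNTS)
    print(best)
```

A scorer like this uses only word counts tied to each entity. According to the abstract, WPAM additionally draws on all words in the Wikipedia corpus, on Wikipedia annotations, and on the category hierarchy.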

References

2011 EntityDisambiguationwithHierarc
Author: Rajeev Rastogi, Prithviraj Sen, Saurabh S. Kataria, Krishnan S. Kumar, Srinivasan H. Sengamedu
Title: Entity Disambiguation with Hierarchical Topic Models
DOI: 10.1145/2020408.2020574
Year: 2011