2006 UsingEncyclKnowForNEDisambig

(Bunescu & Paşca, 2006) ⇒ Razvan C. Bunescu, Marius Paşca. (2006). “Using Encyclopedic Knowledge for Named Entity Disambiguation.” In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006).

Subject Headings: Entity Mention Resolution Algorithm

Notes

Its Presentation Deck can be found at http://www.cs.utexas.edu/~razvan/papers/eacl2006.ppt
It performs disambiguation into one of the known possible classes for an NE (determined from Wikipedia disambiguation pages).
Its contexts for training and testing are acquired from Wikipedia pages (as opposed to general text).
It uses vectors of co-occurring terms for disambiguation.
It uses a taxonomy-based kernel that integrates word-category correlations.
It evaluates prediction of the correct class, from among the known classes, for a given NE in a Wikipedia page context.
- It includes one experiment with 10% out-of-Wikipedia entities.
Its category space is restricted to Person Occupation, with 8,202 subclasses.
Its experiments consider:
- 110 broad classes
- 540 highly populated classes (without out-of-Wikipedia entities)
- 2,847 classes including less populated ones.
Classification is performed in context.
It does not evaluate recognition.

Cited By

2014

(Roth et al., 2014) ⇒ Dan Roth, Heng Ji, Ming‐Wei Chang, and Taylor Cassidy. (2014). “Wikification and Beyond: The Challenges of Entity and Concept Grounding.” Tutorial at ACL 2014.
- QUOTE: … Contextual disambiguation and grounding of concepts and entities in natural language text are essential to moving forward in many natural language understanding related tasks and are fundamental to many applications. The Wikification task (Bunescu and Pasca, 2006; Mihalcea and Csomai, 2007; Ratinov et al., 2011) aims at automatically identifying concept mentions appearing in a text document and link it to (or “ground it in”) a concept referent in a knowledge base (KB) (e.g., Wikipedia). …

2009

(Kulkarni et al., 2009) ⇒ Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, Soumen Chakrabarti. (2009). “Collective Annotation of Wikipedia Entities in Web Text.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557073.
- QUOTE: … Bunescu and Paşca [3] further improved the compatibility function using SVMs with tree kernels. However, none of these systems attempt collective disambiguation across spots.

Quotes

Abstract

We present a new method for detecting and disambiguating named entities in open domain text. A disambiguation SVM kernel is trained to exploit the high coverage and rich structure of the knowledge encoded in an online encyclopedia. The resulting model significantly outperforms a less informed baseline.

Introduction

Whenever the queries search for pinpointed, factual information, the burden of filling the gap between the output granularity (whole documents) and the targeted information (a set of sentences or relevant phrases) stays with the users, by browsing the returned documents in order to find the actually relevant bits of information.

We organize all named entities from Wikipedia into a dictionary structure D, where each string entry d in D is mapped to the set of entities d.E that can be denoted by d in Wikipedia.
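The dictionary structure described in the quote can be illustrated with a minimal Python sketch. The function name `build_name_dictionary`, the (entity, name) pair format, and the toy entries are assumptions made for illustration; how the pairs are collected from Wikipedia is outside this sketch.

```python
from collections import defaultdict


def build_name_dictionary(entity_names):
    """Map each name string d to the set of entities d.E it can denote.

    `entity_names` is an iterable of (entity_id, name_string) pairs.
    This input format is a hypothetical choice for the sketch.
    """
    dictionary = defaultdict(set)
    for entity_id, name in entity_names:
        dictionary[name].add(entity_id)
    return dict(dictionary)


# Toy, made-up pairs for illustration only.
pairs = [
    ("John Williams (composer)", "John Williams"),
    ("John Williams (wrestler)", "John Williams"),
    ("John Williams (composer)", "John Towner Williams"),
]
D = build_name_dictionary(pairs)
```

Here `D["John Williams"]` plays the role of d.E for the ambiguous name, while `D["John Towner Williams"]` holds a single entity.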

The first step is to identify named entities, i.e. entities with a proper name title. Because every title in Wikipedia must begin with a capital letter, the decision whether a title is a proper name relies on the following sequence of heuristic steps:

1. If the title is a multiword title, check the capitalization of all content words, i.e. words other than prepositions, determiners, conjunctions, relative pronouns or negations. Consider the entity a named entity if and only if all content words are capitalized.
2. If the title is a one-word title that contains at least two capital letters, then the entity is a named entity. Otherwise, go to step 3.
3. Count how many times the title occurs in the text of the article, in positions other than at the beginning of sentences. If at least … of these occurrences are capitalized, then the entity is a named entity.
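The three heuristic steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the `FUNCTION_WORDS` list is a small made-up sample, sentence boundaries are approximated by the preceding punctuation mark, and the capitalization threshold (elided in the text above) is left as a required parameter, `min_capitalized_ratio`, that the caller must supply.

```python
import re

# Words excluded from "content words". This short list is an illustrative
# assumption, not the list used in the paper.
FUNCTION_WORDS = {
    "of", "in", "on", "at", "for", "to", "by", "with", "from",  # prepositions
    "a", "an", "the",                                           # determiners
    "and", "or", "but", "nor",                                  # conjunctions
    "who", "whom", "whose", "which", "that",                    # relative pronouns
    "not", "no",                                                # negations
}


def is_named_entity(title, article_text, min_capitalized_ratio):
    """Apply the three title heuristics; the threshold is caller-supplied."""
    words = title.split()
    # Step 1: multiword title, all content words must be capitalized.
    if len(words) > 1:
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return all(w[0].isupper() for w in content)
    # Step 2: one-word title with at least two capital letters.
    if sum(ch.isupper() for ch in title) >= 2:
        return True
    # Step 3: share of capitalized occurrences that are not sentence-initial.
    pattern = r"\b" + re.escape(title) + r"\b"
    capitalized = total = 0
    for match in re.finditer(pattern, article_text, flags=re.IGNORECASE):
        before = article_text[:match.start()].rstrip()
        if not before or before[-1] in ".!?":
            continue  # crude check: skip occurrences that start a sentence
        total += 1
        capitalized += match.group(0)[0].isupper()
    return total > 0 and capitalized / total >= min_capitalized_ratio


text = "We visited Paris in May. Paris was warm, and paris was busy."
```

With this toy `text`, the title "Paris" has two occurrences that are not sentence-initial, one of which is capitalized, so the outcome depends entirely on the threshold the caller passes in.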

Named Entity Disambiguation

We use the term query to denote the occurrence of a proper name inside a Wikipedia article. If there is a dictionary entry matching the proper name in the query such that the set of …

Presentation

1) Classification:

Train a classifier for each proper name in the dictionary D.
Not feasible: 500K proper names would need 500K classifiers!

2) Ranking:

Design a scoring function score(q, e_k) that computes the compatibility between the context of the proper name occurring in a query q and any of the entities e_k in q.E that may be referred to by that proper name.
For a given named entity query q, select the highest-ranking entity, i.e. the e_k in q.E that maximizes score(q, e_k).
Use cosine similarity between the query context and the entity's article, based on the tf x idf formulation.
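The ranking approach can be illustrated with a small Python sketch that scores each candidate entity by the cosine similarity between tf x idf vectors of the query context and of the entity's article, then picks the top-scoring candidate. The weighting used here (raw term frequency times log(N/df)), the function names, and the toy articles are assumptions for illustration; the slides do not specify these details in the text above.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Sparse tf x idf vector; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {t: f * math.log(n_docs / doc_freq[t])
            for t, f in tf.items() if doc_freq[t] > 0}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_entities(query_tokens, candidates, articles):
    """Return (best_entity, scores) over the candidate set q.E.

    `articles` maps every entity in the collection to its token list and
    is used to compute document frequencies (an assumption of this sketch).
    """
    n_docs = len(articles)
    doc_freq = Counter(t for toks in articles.values() for t in set(toks))
    q_vec = tfidf_vector(query_tokens, doc_freq, n_docs)
    scores = {e: cosine(q_vec, tfidf_vector(articles[e], doc_freq, n_docs))
              for e in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Toy, made-up articles and query context for illustration only.
articles = {
    "John Williams (composer)": "composer film score conductor orchestra music".split(),
    "John Williams (wrestler)": "wrestler ring champion match wrestling".split(),
    "Boston Pops Orchestra": "orchestra boston concert music".split(),
}
candidates = ["John Williams (composer)", "John Williams (wrestler)"]
query = "conducted the orchestra and composed the film score".split()
best, scores = rank_entities(query, candidates, articles)
```

In this toy setup the query context shares terms only with the composer's article, so that candidate ranks first. A single scoring function of this kind is shared by all names, which avoids training one classifier per proper name.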

References

Ricardo Baeza-Yates and Berthier Ribeiro-Neto. (1999). Modern Information Retrieval. ACM Press, New York.
Massimiliano Ciaramita, Thomas Hofmann, and Mark Johnson. (2003). Hierarchical semantic classification: Word sense disambiguation with world knowledge. In The 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico.
Robert Dale. (2003). Computational linguistics. Special Issue on the Web as a Corpus, 29(3), September.
Gottlob Frege. (1999). On sense and reference. In Maria Baghramian, editor, Modern Philosophy of Language, pages 3–25. Counterpoint Press.
Chung Heong Gooi and James Allan. (2004). Cross-document coreference on a large scale corpus. In: Proceedings of Human Language Technology Conference / North American Association for Computational Linguistics Annual Meeting, Boston, MA.
Thorsten Joachims. (1999). Making large-scale SVM learning practical. In Bernhard Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 169–184. MIT Press.
Thorsten Joachims. (2002). Optimizing search engines using clickthrough data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133–142.
Andrew McCallum, R. Rosenfeld, Tom M. Mitchell, and A. Y. Ng. (1998). Improving text classification by shrinkage in a hierarchy of classes. In: Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98), pages 359–367, Madison, WI.
M. Remy. (2002). Wikipedia: The free encyclopedia. Online Information Review, 26(6):434. www.wikipedia.org.
Vladimir N. Vapnik. (1998). Statistical Learning Theory. John Wiley & Sons.


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2006 UsingEncyclKnowForNEDisambig	Razvan C. Bunescu Marius Paşca			Using Encyclopedic Knowledge for Named Entity Disambiguation		Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics	http://www.cs.utexas.edu/~razvan/papers/eacl2006.pdf			2006