Entity Mention Normalization Algorithm

An entity mention normalization algorithm is a word mention reference resolution algorithm that can be implemented by an entity mention normalization system (to solve an Entity Mention Normalization Task).
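
As an illustration only, one of the simplest forms such an algorithm can take is a dictionary lookup that maps each known surface form of an entity to a single canonical identifier. The sketch below assumes a tiny hand-made lexicon; the names `SURFACE_FORMS` and `normalize_mention` and all lexicon entries are hypothetical and are not taken from any of the cited systems.

```python
from typing import Dict, Optional

# Hypothetical toy lexicon: lowercased surface form -> canonical entity identifier.
# A real system would derive this from a resource such as Wikipedia or a gene database.
SURFACE_FORMS: Dict[str, str] = {
    "nyc": "New_York_City",
    "new york city": "New_York_City",
    "big apple": "New_York_City",
    "ibm": "IBM",
    "international business machines": "IBM",
}


def normalize_mention(mention: str,
                      lexicon: Dict[str, str] = SURFACE_FORMS) -> Optional[str]:
    """Return the canonical identifier for a mention, or None if it is unknown."""
    # Lowercase and collapse runs of whitespace so trivial variants match.
    key = " ".join(mention.lower().split())
    return lexicon.get(key)


if __name__ == "__main__":
    for text in ["NYC", "  Big   Apple ", "I.B.M."]:
        print(repr(text), "->", normalize_mention(text))
```

This lookup ignores the surrounding context, so it cannot choose between several entities that share a surface form; the context-aware disambiguation described in the quotes below addresses that case.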

  • (Vercoustre et al., 2008) ⇒ Anne-Marie Vercoustre, James A. Thom, and Jovan Pehcevski. (2008). “Entity Ranking in Wikipedia.” In: Proceedings of the 2008 ACM Symposium on Applied Computing. doi:10.1145/1363686.1363943
    • QUOTE: Cucerzan [8] uses Wikipedia data for named entity disambiguation. He first pre-processed a version of the Wikipedia collection (September 2006), and extracted more than 1.4 millions entities with an average of 2.4 surface forms by entities. He also extracted more than one million (entities, category) pairs that were further filtered down to 540 thousand pairs. Lexico-syntactic patterns, such as titles, links, paragraphs and lists, are used to build coreferences of entities in limited contexts. The knowledge extracted from Wikipedia is then used for improving entity disambiguation in the context of web and news search.
  • (Jijkoun et al., 2008) ⇒ Valentin Jijkoun, Mahboob Alam Khalid, Maarten Marx, and Maarten de Rijke. (2008). “Named Entity Normalization in User Generated Content.” In: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data (AND 2008), pp. 23-30.
    • QUOTE: Cucerzan [4] considers the entity normalization task for news and encyclopedia articles; they use information extracted from Wikipedia combined with machine learning for context-aware name disambiguation; the baseline that we use in this paper (taken from [11]) is a modification (and improved version) of Cucerzan [4]’s baseline. Cucerzan [4] also presents an extensive literature overview on the problem.
  • (Farkas, 2008) ⇒ Richárd Farkas. (2008). “The strength of co-authorship in gene name disambiguation.” In: BMC Bioinformatics 2008, 9:69. doi:10.1186/1471-2105-9-69
    • QUOTE: Taken one step further, the goal of Gene Name Normalisation (GN) [2] is to assign a unique identifier to each gene name found in a text.