Entity Mention Normalization Algorithm
An entity mention normalization algorithm is a word mention reference resolution algorithm that can be implemented by an entity mention normalization system (to solve an Entity Mention Normalization Task).
- AKA: Entity Mention to Entity Record Resolution/Disambiguation Method.
- It can range from being a Heuristic Entity Mention Normalization Algorithm to being a Data-Driven Entity Mention Normalization Algorithm (which can in turn range from being a Supervised Entity Mention Normalization Algorithm, to being a Semi-Supervised Entity Mention Normalization Algorithm, to being an Unsupervised Entity Mention Normalization Algorithm).
- It can range from being a Non-Collective Entity Mention Normalization Algorithm to being a Collective Entity Mention Normalization Algorithm.
- It can be supported by a Similarity Function, such as a String Similarity Function.
- It can range from being a Generic Entity Mention Normalization Algorithm to being a Domain-Specific Entity Mention Normalization Algorithm (e.g. a Toponym Normalization Algorithm) that makes use of Domain-Specific Features, such as the distance between points calculated from their Latitude and Longitude.
- It can detect the absence of the reference Concept in the Knowledge Base (the nil concept).
- See: Lesk Algorithm.
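A minimal sketch of the similarity-supported, non-collective case with nil detection follows. It is illustrative only: the toy knowledge base, the record identifiers, the function names, and the 0.8 threshold are all assumptions, and Python's standard `difflib.SequenceMatcher` stands in for whatever String Similarity Function a real system would use.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical toy knowledge base: record id -> known surface forms.
KNOWLEDGE_BASE: Dict[str, List[str]] = {
    "Q1": ["International Business Machines", "IBM"],
    "Q2": ["Apple Inc.", "Apple"],
}


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def normalize_mention(
    mention: str,
    kb: Dict[str, List[str]] = KNOWLEDGE_BASE,
    nil_threshold: float = 0.8,  # assumed cut-off, not taken from any cited paper
) -> Optional[str]:
    """Return the id of the best-matching record, or None for the nil concept."""
    best_id, best_score = None, 0.0
    for record_id, surface_forms in kb.items():
        # Score a record by its most similar surface form.
        score = max(string_similarity(mention, form) for form in surface_forms)
        if score > best_score:
            best_id, best_score = record_id, score
    # A weak best match is treated as "not in the knowledge base".
    return best_id if best_score >= nil_threshold else None
```

Each mention is resolved independently here; a Collective Entity Mention Normalization Algorithm would instead score the candidate assignments of all mentions in a document jointly.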
- (Han et al., 2011) ⇒ Xianpei Han, Le Sun, and Jun Zhao. (2011). “Collective Entity Linking in Web Text: A Graph-based Method.” In: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. doi:10.1145/2009916.2010019
- (Melli & Ester, 2010) ⇒ Gabor Melli, and Martin Ester. (2010). “Supervised Identification and Linking of Concept Mentions to a Domain-Specific Ontology.” In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010). doi:10.1145/1871437.1871712
- (Dalvi et al., 2009) ⇒ Nilesh Dalvi, Ravi Kumar, Bo Pang, and Andrew Tomkins. (2009). “Matching Reviews to Objects Using a Language Model.” In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009).
- (Gentile et al., 2009) ⇒ Anna L. Gentile, Pierpaolo Basile, and Giovanni Semeraro. (2009). “WibNED Wikipedia Based Named Entity Disambiguation.” In: Proceedings of the 5th Italian Research Conference on Digital Libraries (IRCDL 2009).
- (Vercoustre et al., 2008) ⇒ Anne-Marie Vercoustre, James A. Thom, and Jovan Pehcevski. (2008). “Entity Ranking in Wikipedia.” In: Proceedings of the 2008 ACM Symposium on Applied Computing. doi:10.1145/1363686.1363943
- QUOTE: Cucerzan uses Wikipedia data for named entity disambiguation. He first pre-processed a version of the Wikipedia collection (September 2006), and extracted more than 1.4 millions entities with an average of 2.4 surface forms by entities. He also extracted more than one million (entities, category) pairs that were further filtered down to 540 thousand pairs. Lexico-syntactic patterns, such as titles, links, paragraphs and lists, are used to build coreferences of entities in limited contexts. The knowledge extracted from Wikipedia is then used for improving entity disambiguation in the context of web and news search.
- (Jijkoun et al., 2008) ⇒ Valentin Jijkoun, Mahboob Alam Khalid, Maarten Marx, and Maarten de Rijke. (2008). “Named entity normalization in user generated content.” In: Proceedings of the second workshop on Analytics for Noisy Unstructured Text Data (AND 2008:23-30).
- QUOTE: Cucerzan considers the entity normalization task for news and encyclopedia articles; they use information extracted from Wikipedia combined with machine learning for context-aware name disambiguation; the baseline that we use in this paper (taken from ) is a modification (and improved version) of Cucerzan’s baseline. Cucerzan also presents an extensive literature overview on the problem.
- (Farkas, 2008) ⇒ Richárd Farkas. (2008). “The strength of co-authorship in gene name disambiguation.” In: BMC Bioinformatics 2008, 9:69. doi:10.1186/1471-2105-9-69
- QUOTE: Taken one step further, the goal of Gene Name Normalisation (GN) is to assign a unique identifier to each gene name found in a text.
- (Cucerzan, 2007) ⇒ Silviu Cucerzan. (2007). “Large-Scale Named Entity Disambiguation Based on Wikipedia Data.” In: Proceedings of EMNLP-CoNLL-2007.
- (Mihalcea & Csomai, 2007) ⇒ Rada Mihalcea, and Andras Csomai. (2007). “Wikify!: Linking documents to encyclopedic knowledge.” In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM 2007). doi:10.1145/1321440.1321475
- (Chakaravarthy et al., 2006) ⇒ Venkatesan T. Chakaravarthy, Himanshu Gupta, Prasan Roy, and Mukesh Mohania. (2006). “Efficiently Linking Text Documents with Relevant Structured Information.” In: Proceedings of VLDB 2006.
- (Bunescu & Paşca, 2006) ⇒ Razvan C. Bunescu, and Marius Paşca. (2006). “Using Encyclopedic Knowledge for Named Entity Disambiguation.” In: Proceedings of EACL-2006.
- (Mansuri & Sarawagi, 2006) ⇒ I. Mansuri, and Sunita Sarawagi. (2006). “A system for integrating unstructured data into relational databases.” In: Proceedings of the 22nd IEEE International Conference on Data Engineering (ICDE 2006).
- (Hassell et al., 2006) ⇒ Joseph Hassell, Boanerges Aleman-Meza, and I. Budak Arpinar. (2006). “Ontology-driven automatic entity disambiguation in unstructured text.” In: Proceedings of the 5th International Semantic Web Conference (ISWC).
- (Cohen, 2005) ⇒ Aaron M. Cohen. (2005). “Unsupervised gene/protein named entity normalization using automatically extracted dictionaries.” In: Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases.
- QUOTE: Gene and protein named-entity recognition (NER) and normalization is often treated as a two-step process. While the first step, NER, has received considerable attention over the last few years, normalization has received much less attention. We have built a dictionary based gene and protein NER and normalization system that requires no supervised training and no human intervention to build the dictionaries from online genomics resources.
- (Crim et al., 2005) ⇒ Jeremiah Crim, Ryan McDonald, and Fernando Pereira. (2005). “Automatically Annotating Documents with Normalized Gene Lists.” In: BMC Bioinformatics 2005, 6(Suppl 1):S13.
- (Borgman & Siegfried, 1992) ⇒ C. L. Borgman, and Susan L. Siegfried. (1992). “Getty's Synoname and Its Cousins: A survey of applications of personal name-matching algorithms.” In: Journal of the American Society for Information Science.
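
For the domain-specific case noted earlier (a Toponym Normalization Algorithm that uses distance computed from Latitude and Longitude), one possible sketch is shown below. It assumes that a reference location is already known for the document, which is a simplification, and the gazetteer entries, identifiers, and function names are hypothetical. The distance is the standard haversine great-circle formula.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical gazetteer: place name -> candidate (record id, latitude, longitude).
GAZETTEER: Dict[str, List[Tuple[str, float, float]]] = {
    "Paris": [("paris_fr", 48.8566, 2.3522), ("paris_tx", 33.6609, -95.5555)],
}


def normalize_toponym(
    name: str,
    context_lat: float,
    context_lon: float,
    gazetteer: Dict[str, List[Tuple[str, float, float]]] = GAZETTEER,
) -> Optional[str]:
    """Pick the candidate closest to the assumed document location; None if unknown."""
    candidates = gazetteer.get(name)
    if not candidates:
        return None  # nil: the name is not in the gazetteer
    closest = min(
        candidates,
        key=lambda c: haversine_km(context_lat, context_lon, c[1], c[2]),
    )
    return closest[0]
```

In practice such a distance would more likely be one feature among several (alongside string similarity and context features) rather than the sole decision rule.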