Named Entity Disambiguation (NED) Task

From GM-RKB

A Named Entity Disambiguation (NED) Task is a text processing task that requires linking each named entity mention in a text document to its correct referent entity in a knowledge base.
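
As a rough illustration of the task, the following minimal sketch links a mention to a knowledge base entry by counting the words shared between the document and each candidate's description. The knowledge base `TOY_KB`, its entity identifiers, and the helper names `tokenize` and `disambiguate` are all hypothetical and chosen only for this example; word overlap is one simple heuristic, not a standard or recommended NED method.

```python
import re

# Hypothetical toy knowledge base (an assumption for illustration only):
# entity id -> surface names it can be mentioned by, plus descriptive context words.
TOY_KB = {
    "Paris_(city)": {
        "names": {"paris"},
        "context": {"capital", "france", "city", "seine"},
    },
    "Paris_Hilton": {
        "names": {"paris", "paris hilton"},
        "context": {"hilton", "celebrity", "hotel", "heiress"},
    },
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(mention, document, kb=TOY_KB):
    """Return the id of the candidate entity that shares the most context
    words with the document, or None if no entity carries this name."""
    doc_words = tokenize(document)
    candidates = [eid for eid, entry in kb.items()
                  if mention.lower() in entry["names"]]
    if not candidates:
        return None
    # Ties are broken by knowledge base order (max keeps the first maximum).
    return max(candidates, key=lambda eid: len(kb[eid]["context"] & doc_words))


print(disambiguate("Paris", "Paris is the capital of France"))
```

Both entries are candidates for the mention "Paris"; the words "capital" and "France" in the document are what select the city. A mention with no matching name in the knowledge base yields `None`, which corresponds to leaving the mention unlinked.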



References

2019

  • (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Entity_linking Retrieved:2019-6-16.
    • In natural language processing, entity linking, named entity linking (NEL), named entity disambiguation (NED), named entity recognition and disambiguation (NERD) or named entity normalization (NEN) [1] is the task of determining the identity of entities mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is to determine that "Paris" refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as "Paris". NED is different from named entity recognition (NER) in that NER identifies the occurrence or mention of a named entity in text but it does not identify which specific entity it is.

      Entity linking requires a knowledge base containing the entities to which entity mentions can be linked. A popular choice for entity linking on open-domain text is a knowledge base derived from Wikipedia, [2] in which each page is regarded as a named entity. NED using Wikipedia entities has also been called wikification (see Wikify!, an early entity linking system [3]). A knowledge base may also be induced automatically from training text [4] or manually built.

      Named entity mentions can be highly ambiguous; any entity linking method must address this inherent ambiguity. Various approaches to tackle this problem have been tried to date. In the seminal approach of Milne and Witten, supervised learning is employed using the anchor texts of Wikipedia entities as training data. [5] Other approaches also collected training data based on unambiguous synonyms. Kulkarni et al. exploited the common property that topically coherent documents refer to entities belonging to strongly related types.

      Entity linking has been used to improve the performance of information retrieval systems [1] and to improve search performance on digital libraries. [6] [7] NED is also a key input for Semantic Search. [8]

  1. M. A. Khalid, V. Jijkoun and M. de Rijke (2008). The impact of named entity normalization on information retrieval for question answering. Proc. ECIR.
  2. Xianpei Han, Le Sun and Jun Zhao (2011). Collective entity linking in web text: a graph-based method. Proc. SIGIR.
  3. Rada Mihalcea and Andras Csomai (2007). Wikify! Linking Documents to Encyclopedic Knowledge. Proc. CIKM.
  4. Aaron M. Cohen (2005). Unsupervised gene/protein named entity normalization using automatically extracted dictionaries. Proc. ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, pp. 17–24.
  5. David Milne and Ian H. Witten (2008). Learning to link with Wikipedia. Proc. CIKM.
  6. Hui Han, Hongyuan Zha and C. Lee Giles (2005). Name disambiguation in author citations using a K-way spectral clustering method. Proc. ACM/IEEE Joint Conference on Digital Libraries (JCDL 2005), pp. 334–343.
  7. [1]
  8. STICS
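
The use of Wikipedia anchor texts as evidence, mentioned in the passage quoted in this entry, can be illustrated with a much simpler statistic than a trained model: for each anchor text, count how often it links to each target page and pick the most frequent target. This is only a sketch of a frequency prior, not the supervised method of Milne and Witten; the `TOY_ANCHORS` pairs are invented for illustration rather than harvested from Wikipedia, and `build_prior` and `link_by_prior` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Hypothetical (anchor text, target page) pairs. In practice these would be
# extracted from Wikipedia links; the values here are invented.
TOY_ANCHORS = [
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris_Hilton"),
    ("Paris", "Paris,_Texas"),
    ("Jaguar", "Jaguar_Cars"),
]


def build_prior(anchor_pairs):
    """Count, for each lowercased anchor text, how often it links to each target."""
    counts = defaultdict(Counter)
    for text, target in anchor_pairs:
        counts[text.lower()][target] += 1
    return counts


def link_by_prior(mention, counts):
    """Return (most frequent target, its relative frequency) for the mention,
    or None if the mention never occurs as an anchor text."""
    targets = counts.get(mention.lower())
    if not targets:
        return None
    target, n = targets.most_common(1)[0]
    return target, n / sum(targets.values())


prior = build_prior(TOY_ANCHORS)
print(link_by_prior("Paris", prior))
```

A prior of this kind ignores the surrounding document entirely, so it always resolves "Paris" to the same page; context-based evidence is what lets a linker override it.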

2007

  • (Morgan et al., 2007) ⇒ Alexander A. Morgan, Benjamin Wellner, Jeffrey B. Colombe, Robert Arens, Marc E. Colosimo, Lynette Hirschman. (2007). “Evaluating the Automatic Mapping of Human Gene and Protein Mentions to Unique Identifiers.” In: Pacific Symposium on Biocomputing, 12.
    • QUOTE: As Vlachos et al. observed [19], in biomedical text there is a high occurrence of families of genes and proteins being mentioned by a single term such as: "Mxi1 belongs to the Mad (Mxi1) family of proteins, which function as potent antagonists of Myc oncoproteins". In future work in biomedical entity normalization, we suggest that normalizing entity mentions to family mentions may be an effective way to support other biomedical text mining tasks. Possibly the protein families in InterPro [6] could be used as normalization targets for mentions of families. For example, the mention of "Myc oncoproteins" could link to InterPro:IPR002418. This would enable information extraction systems that extract facts (relations, attributes) on gene families to attach those properties to all family members.

1992

  • (Borgman & Siegfried, 1992) ⇒ C. L. Borgman, and S. L. Siegfried. (1992). “Getty's Synoname and Its Cousins: A survey of applications of personal name-matching algorithms.” In: Journal of the American Society for Information Science.