A Gazetteer-based Term Annotation Task is a term recognition task that annotates term mentions in text by matching them against a term gazetteer.
- (Smith et al., 2008) ⇒ Larry Smith, Lorraine K. Tanabe, Rie J. Ando, Cheng-Ju Kuo, I-Fang Chung, Chun-Nan Hsu, Yu-Shi Lin, Roman Klinger, Christoph M. Friedrich, Kuzman Ganchev, Manabu Torii, Hongfang Liu, Barry Haddow, Craig A. Struble, Richard J. Povinelli, Andreas Vlachos, William A. Baumgartner, Lawrence Hunter, Bob Carpenter, Richard TH Tsai, Hong-Jie Dai, Feng Liu, Yifei Chen, Chengjie Sun, Sophia Katrenko, Pieter Adriaans, Christian Blaschke, Rafael Torres, Mariana Neves, Preslav Nakov, Anna Divoli, Manuel Maña-López, Jacinto Mata, and W. John Wilbur. (2008). “Overview of BioCreative II Gene Mention Recognition.” In: Genome Biology, 9(Suppl 2). doi:10.1186/gb-2008-9-s2-s2
- QUOTE: NER seeks to identify the words and phrases in text that reference entities in a given category, such as people, places, or companies, or in this application genes and proteins. NER is frequently accomplished with B-I-O tagging, which classifies each token as being at the beginning of the named entity (B), continuing the entity (I), or outside of any entity to be tagged (O). There are several lexical resources (sources of information about words) commonly used in solving the NER problem. A gazetteer is a list of names belonging to a particular category, such as places, persons, companies, genes, and so on. A lexicon is a source of information about different forms or grammatical properties of words. A thesaurus is a source of information indicating words with similar and/or related meanings. Systems in the BioCreative I challenge were classified as open if they used lexical resources, particularly gazetteers, and otherwise closed. A commonly used lexical resource is the Unified Medical Language System (UMLS), a controlled vocabulary of biomedical terminology maintained by the US National Library of Medicine.
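The combination of a gazetteer and B-I-O tagging described in the quote can be illustrated with a minimal sketch. It is not the method of any BioCreative II system. It assumes whitespace-tokenized input, case-insensitive exact matching, and a greedy longest-match strategy. The function name `bio_tag` and the toy list `GENE_GAZETTEER` are hypothetical names introduced only for this illustration.

```python
from typing import Iterable, List, Sequence, Tuple


def bio_tag(
    tokens: Sequence[str], gazetteer: Iterable[Sequence[str]]
) -> List[Tuple[str, str]]:
    """Label each token B, I, or O using greedy longest-match gazetteer lookup."""
    # Assumption: matching is case-insensitive, so entries are stored lowercased.
    entries = {tuple(tok.lower() for tok in entry) for entry in gazetteer}
    entries.discard(())  # ignore empty entries
    max_len = max((len(entry) for entry in entries), default=0)

    lowered = [tok.lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B"  # first token of the matched entry
            for j in range(i + 1, i + matched):
                tags[j] = "I"  # remaining tokens of the matched entry
            i += matched
        else:
            i += 1  # token stays "O"
    return list(zip(tokens, tags))


# Hypothetical toy gazetteer; each entry is a sequence of tokens.
GENE_GAZETTEER = [("BRCA1",), ("tumor", "necrosis", "factor"), ("p53",)]

tokens = "Mutations in BRCA1 and tumor necrosis factor were studied".split()
tagged = bio_tag(tokens, GENE_GAZETTEER)
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

Exact lookup of this kind only finds names that are listed verbatim in the gazetteer. It misses spelling variants and unlisted names, and it cannot decide from context whether an ambiguous string is really a mention of the listed entity.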