Named Entity Recognition Algorithm
- It can be applied by a Named Entity Recognition System.
- It can range from:
- It can be supported by:
- It can range from being a Language-Independent Named Entity Recognition Algorithm to being a Language-Dependent Named Entity Recognition Algorithm that takes advantage of a Language's Constraints.
- See: Named Entity Classifier
- (Liu et al., 2011) ⇒ Xiaohua Liu, Shaodian Zhang, Furu Wei, and Ming Zhou. (2011). “Recognizing Named Entities in Tweets.” In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics.
- QUOTE: The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. … We propose a novel NER system to address these challenges. Firstly, a K-Nearest Neighbors (KNN) based classifier is adopted to conduct word level classification, leveraging the similar and recently labeled tweets. Following the two-stage prediction aggregation methods (Krishnan and Manning, 2006), such pre-labeled results, together with other conventional features used by the state-of-the-art NER systems, are fed into a linear Conditional Random Fields (CRF) (Lafferty et al., 2001) model, which conducts fine-grained tweet level NER. Furthermore, the KNN and CRF model are repeatedly retrained with an incrementally augmented training set, into which high confidently labeled tweets are added. Indeed, it is the combination of KNN and CRF under a semi-supervised learning framework that differentiates ours from the existing. Finally, following Lev Ratinov and Dan Roth (2009), 30 gazetteers are used, which cover common names, countries, locations, temporal expressions, etc. These gazetteers represent general knowledge across domains. The underlying idea of our method is to combine global evidence from KNN and the gazetteers with local contextual information, and to use common knowledge and unlabeled tweets to make up for the lack of training data.
- (Kazama & Torisawa, 2007) ⇒ J. Kazama and K. Torisawa. (2007). “Exploiting Wikipedia as External Knowledge for Named Entity Recognition.” In: Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 698–707.
- (Nadeau & Sekine, 2007) ⇒ David Nadeau, and Satoshi Sekine. (2007). “A Survey of Named Entity Recognition and Classification.” In: Lingvisticae Investigationes, Volume 30, Issue 1.
- (Kozareva, 2006) ⇒ Zornitsa Kozareva. (2006). “Bootstrapping Named Entity Recognition with Automatically Generated Gazetteer Lists.” In: Proceedings of EACL 2006.
- (Cimiano & Völker, 2005) ⇒ Philipp Cimiano, and Johanna Völker. (2005). “Towards Large-scale, Open-domain and Ontology-based Named Entity Classification.” In: Proceedings of RANLP-2005.
- (McDonald et al., 2004) ⇒ Ryan T. McDonald, R. Scott Winters, Mark Mandel, Yang Jin, Peter S. White, and Fernando Pereira. (2004). “An entity tagger for recognizing acquired genomic variations in cancer literature.” In: Bioinformatics, 20(17):3249–3251. doi:10.1093/bioinformatics/bth350
- (Franzén et al., 2002) ⇒ K. Franzén, G. Eriksson, F. Olsson, L. Asker, P. Lidén, and J. Coster. (2002). “Protein names and how to find them.” In: International Journal of Medical Informatics, Volume 67, Issues 1–3, Pages 49–61. Elsevier.
- NOTE: It investigates named entity recognition of protein names.
- (Bikel et al., 1997) ⇒ Daniel Bikel, Scott Miller, Richard Schwartz, and Ralph Weischedel. (1997). “Nymble: a High-performance Learning Name-finder.” In: Proceedings of the Fifth Applied Natural Language Processing Conference (ANLC 1997). doi:10.3115/974557.974586
- NOTE: One of the earlier examples where learning was competitive with manually coded systems.
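
Several of the references above (e.g. Liu et al., 2011; Kozareva, 2006) rely on gazetteers as a source of evidence for entity mentions. The following is a minimal sketch of gazetteer lookup only, assuming a greedy longest-match strategy and BIO output tags; it is not the method of any cited paper. The gazetteer entries, the `build_index` and `tag_bio` names, and the `LOC`/`ORG` label set are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteers; real systems use far larger lists.
GAZETTEERS: Dict[str, List[str]] = {
    "LOC": ["New York", "Paris"],
    "ORG": ["New York Times"],
}


def build_index(gazetteers: Dict[str, List[str]]) -> Dict[Tuple[str, ...], str]:
    """Map each lower-cased token tuple of a gazetteer entry to its entity type."""
    index: Dict[Tuple[str, ...], str] = {}
    for label, names in gazetteers.items():
        for name in names:
            index[tuple(name.lower().split())] = label
    return index


def tag_bio(tokens: List[str], index: Dict[Tuple[str, ...], str]) -> List[str]:
    """Greedy longest-match gazetteer lookup that emits one BIO tag per token."""
    max_len = max((len(key) for key in index), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = index.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


INDEX = build_index(GAZETTEERS)
tokens = "She reads the New York Times in Paris".split()
tags = tag_bio(tokens, INDEX)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Longest-match lookup prefers "New York Times" over the shorter entry "New York" when both match at the same position. In a learned system such as the one quoted from Liu et al. (2011), gazetteer matches would more plausibly serve as input features to a statistical model rather than as final labels.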