2007 ASurveyOfNER

(Nadeau & Sekine, 2007) ⇒ David Nadeau, and Satoshi Sekine. (2007). “A Survey of Named Entity Recognition and Classification.” In: Lingvisticae Investigationes, 30(1). doi:10.1075/li.30.1.03nad

Subject Headings: Named Entity, Named Entity Recognition Algorithm, Supervised NER, MUC Performance Metric, ACE Performance Metric

Notes

It is a Survey Paper on the Named Entity Recognition Task.

Cited By

~122 http://scholar.google.com/scholar?q=2007+%22A+Survey+of+Named+Entity+Recognition+and+Classification.%22

2008

(Whitelaw et al., 2008) ⇒ Casey Whitelaw, Alex Kehlenbeck, Nemanja Petrovic, and Lyle Ungar. (2008). “Web-scale named entity recognition.” In: Proceeding of the 17th ACM conference on Information and knowledge management (CIKM 2008). doi:10.1145/1458082.1458102

Quotes

Keywords

NAMED ENTITY; SURVEY; LEARNING METHOD; FEATURE SPACE; EVALUATION

Abstract

This survey covers fifteen years of research in the Named Entity Recognition and Classification (NERC) field, from 1991 to 2006. We report observations about languages, named entity types, domains and textual genres studied in the literature. From the start, NERC systems have been developed using hand-made rules, but now machine learning techniques are widely used. These techniques are surveyed along with other critical aspects of NERC such as features and evaluation methods. Features are word-level, dictionary-level and corpus-level representations of words in a document. Evaluation techniques, ranging from intuitive exact match to very complex matching techniques with adjustable cost of errors, are an indisputable key to progress.

{{#ifanon:|

Introduction

The term “Named Entity”, now widely used in Natural Language Processing, was coined for the Sixth Message Understanding Conference (MUC-6) (R. Grishman & Sundheim 1996). At that time, MUC was focusing on Information Extraction (IE) tasks where structured information of company activities and defense related activities is extracted from unstructured text, such as newspaper articles. In defining the task, people noticed that it is essential to recognize information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions. Identifying references to these entities in text was recognized as one of the important sub-tasks of IE and was called “Named Entity Recognition and Classification (NERC)”.

We present here a survey of fifteen years of research in the NERC field, from 1991 to 2006. While early systems were making use of handcrafted rule-based algorithms, modern systems most often resort to machine learning techniques. We survey these techniques as well as other critical aspects of NERC such as features and evaluation methods. It was indeed concluded in a recent conference that the choice of features is at least as important as the choice of technique for obtaining a good NERC system (E. Tjong Kim Sang & De Meulder 2003). Moreover, the way NERC systems are evaluated and compared is essential to progress in the field. To the best of our knowledge, NERC features, techniques, and evaluation methods have not been surveyed extensively yet.

The first section of this survey presents some observations on published work from the point of view of activity per year, supported languages, preferred textual genre and domain, and supported entity types. It was collected from the review of a hundred English language papers sampled from the major conferences and journals. We do not claim this review to be exhaustive or representative of all the research in all languages, but we believe it gives a good feel for the breadth and depth of previous work. Section 2 covers the algorithmic techniques that were proposed for addressing the NERC task. Most techniques are borrowed from the Machine Learning (ML) field. Instead of elaborating on techniques themselves, the third section lists and classifies the proposed features, i.e., descriptions and characteristics of words for algorithmic consumption. Section 4 presents some of the evaluation paradigms that were proposed throughout the major forums. Finally, we present our conclusions.

1 Observations: 1991 to 2006

The computational research aiming at automatically identifying named entities in texts forms a vast and heterogeneous pool of strategies, methods and representations. One of the first research papers in the field was presented by Lisa F. Rau (1991) at the Seventh IEEE Conference on Artificial Intelligence Applications. Rau’s paper describes a system to “extract and recognize [company] names”. It relies on heuristics and handcrafted rules.

1.3 Entity type factor

In the expression “Named Entity”, the word “Named” aims to restrict the task to only those entities for which one or many rigid designators, as defined by S. Kripke (1982), stand for the referent. For instance, the automotive company created by Henry Ford in 1903 is referred to as Ford or Ford Motor Company. Rigid designators include proper names as well as certain natural kind terms like biological species and substances. There is a general agreement in the NERC community about the inclusion of temporal expressions and some numerical expressions such as amounts of money and other types of units.

Early work formulates the NERC problem as recognizing “proper names” in general (e.g., S. Coates-Stephens 1992, C. Thielen 1995). Overall, the most studied types are three specializations of “proper names”: names of “persons”, “locations” and “organizations”. These types are collectively known as “enamex” since the MUC-6 competition.

2.0 Learning methods

The ability to recognize previously unknown entities is an essential part of NERC systems. Such ability hinges upon recognition and classification rules triggered by distinctive features associated with positive and negative examples. While early studies were mostly based on handcrafted rules, most recent ones use supervised machine learning (SL) as a way to automatically induce rule-based systems or sequence labeling algorithms starting from a collection of training examples. This is evidenced, in the research community, by the fact that five systems out of eight were rule-based in the MUC-7 competition while sixteen systems were presented at CONLL-2003, a forum devoted to learning techniques. When training examples are not available, handcrafted rules remain the preferred technique, as shown in S. Sekine and Nobata (2004) who developed a NERC system for 200 entity types.

The idea of supervised learning is to study the features of positive and negative examples of NE over a large collection of annotated documents and design rules that capture instances of a given type. Section 2.1 explains SL approaches in more detail. The main shortcoming of SL is the requirement of a large annotated corpus. The unavailability of such resources and the prohibitive cost of creating them lead to two alternative learning methods: semi-supervised learning (SSL) and unsupervised learning (UL). These techniques are presented in sections 2.2 and 2.3 respectively.

2.1 Supervised learning

The current dominant technique for addressing the NERC problem is supervised learning. SL techniques include Hidden Markov Models (HMM) (D. Bikel et al. 1997), Decision Trees (S. Sekine 1998), Maximum Entropy Models (ME) (A. Borthwick 1998), Support Vector Machines (SVM) (M. Asahara & Matsumoto 2003), and Conditional Random Fields (CRF) (A. McCallum & Li 2003). These are all variants of the SL approach that typically consist of a system that reads a large annotated corpus, memorizes lists of entities, and creates disambiguation rules based on discriminative features. A baseline SL method that is often proposed consists of tagging words of a test corpus when they are annotated as entities in the training corpus. The performance of the baseline system depends on the vocabulary transfer, which is the proportion of words, without repetitions, appearing in both training and testing corpus. D. Palmer and Day (1997) calculated the vocabulary transfer on the MUC-6 training data. They report a transfer of 21%, with as much as 42% of location names being repeated but only 17% of organizations and 13% of person names. Vocabulary transfer is a good indicator of the recall (number of entities identified over the total number of entities) of the baseline system but is a pessimistic measure since some entities are frequently repeated in documents. A. Mikheev et al. (1999) precisely calculated the recall of the baseline system on the MUC-7 corpus. They report a recall of 76% for locations, 49% for organizations and 26% for persons with precision ranging from 70% to 90%. Whitelaw and Patrick (2003) report consistent results on MUC-7 for the aggregated enamex class. For the three enamex types together, the precision of recognition is 76% and the recall is 48%.
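The baseline and the vocabulary transfer measure described above can be sketched in Python. This is a minimal illustration, not the procedure of any cited system: the corpus format (lists of (word, label) pairs with "O" for non-entities), the function names and the toy data are all assumptions. Transfer is computed here over distinct entity words only, which is one possible reading of the definition.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, label); the label "O" marks a non-entity word


def memorize_entities(train: List[Token]) -> Dict[str, str]:
    """Map each word annotated as an entity in training to its first seen label."""
    memory: Dict[str, str] = {}
    for word, label in train:
        if label != "O":
            memory.setdefault(word, label)
    return memory


def baseline_tag(words: List[str], memory: Dict[str, str]) -> List[str]:
    """Tag a test word only if it was annotated as an entity in training."""
    return [memory.get(word, "O") for word in words]


def vocabulary_transfer(train: List[Token], test: List[Token]) -> float:
    """Share of distinct test entity words that also occur as training entity words."""
    train_vocab = {w for w, label in train if label != "O"}
    test_vocab = {w for w, label in test if label != "O"}
    if not test_vocab:
        return 0.0
    return len(train_vocab & test_vocab) / len(test_vocab)


# Hypothetical toy corpora
train = [("Paris", "LOC"), ("is", "O"), ("home", "O"), ("to", "O"), ("Renault", "ORG")]
test = [("Renault", "ORG"), ("left", "O"), ("Paris", "LOC"), ("for", "O"), ("Lyon", "LOC")]

memory = memorize_entities(train)
predicted = baseline_tag([w for w, _ in test], memory)
transfer = vocabulary_transfer(train, test)
```

In this toy case two of the three distinct test entity words were seen in training, so the baseline can recover at most those two; "Lyon" is missed because it never appears in the training data.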

2.2 Semi-supervised learning

The term “semi-supervised” (or “weakly supervised”) is relatively recent. The main technique for SSL is called “bootstrapping” and involves a small degree of supervision, such as a set of seeds, for starting the learning process. For example, a system aimed at “disease names” might ask the user to provide a small number of example names. Then the system searches for sentences that contain these names and tries to identify some contextual clues common to these examples. Then, the system tries to find other instances of disease names that appear in similar contexts. The learning process is then reapplied to the newly found examples, so as to discover new relevant contexts. By repeating this process, a large number of disease names and a large number of contexts will eventually be gathered. Recent experiments in semi-supervised NERC (Nadeau et al. 2006) report performances that rival baseline supervised approaches.
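The bootstrapping loop described above can be sketched as follows. This is a deliberately simplified illustration under several assumptions: names are single words, a "context" is just the pair of neighbouring words, and no filtering of noisy contexts is done, which real systems would need. All function names and the sample sentences are hypothetical.

```python
from typing import List, Set, Tuple

Context = Tuple[str, str]  # (word before, word after) a candidate name


def contexts_of(names: Set[str], sentences: List[List[str]]) -> Set[Context]:
    """Collect the left/right neighbour pairs around known names."""
    found: Set[Context] = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for i in range(1, len(padded) - 1):
            if padded[i] in names:
                found.add((padded[i - 1], padded[i + 1]))
    return found


def names_in(contexts: Set[Context], sentences: List[List[str]]) -> Set[str]:
    """Collect words that occur inside any of the known contexts."""
    found: Set[str] = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for i in range(1, len(padded) - 1):
            if (padded[i - 1], padded[i + 1]) in contexts:
                found.add(padded[i])
    return found


def bootstrap(seeds: Set[str], sentences: List[List[str]], rounds: int = 3) -> Set[str]:
    """Alternate between learning contexts from names and names from contexts."""
    names = set(seeds)
    for _ in range(rounds):
        grown = names | names_in(contexts_of(names, sentences), sentences)
        if grown == names:  # nothing new was found, so stop early
            break
        names = grown
    return names


sentences = [
    "patients with malaria were treated".split(),
    "patients with cholera were treated".split(),
    "cases of cholera increased".split(),
    "cases of typhoid increased".split(),
]
diseases = bootstrap({"malaria"}, sentences)
```

Starting from the single seed "malaria", the first round learns the context (with, were) and finds "cholera"; the second round learns (of, increased) from "cholera" and finds "typhoid".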

3.0 Feature space for NERC

Features are descriptors or characteristic attributes of words designed for algorithmic consumption. An example of a feature is a Boolean variable with the value true if a word is capitalized and false otherwise. Feature vector representation is an abstraction over text where typically each word is represented by one or many Boolean, numeric and nominal values. For example, a hypothetical NERC system may represent each word of a text with 3 attributes:

1) a Boolean attribute with the value true if the word is capitalized and false otherwise; 2) a numeric attribute corresponding to the length, in characters, of the word; 3) a nominal attribute corresponding to the lowercased version of the word.

In this scenario, the sentence “The president of Apple eats an apple.”, excluding the punctuation, would be represented by the following feature vectors:

 <true, 3, “the”>, <false, 9, “president”>, <false, 2, “of”>, <true, 5, “apple”>, <false, 4, “eats”>, <false, 2, “an”>, <false, 5, “apple”>

Usually, the NERC problem is resolved by applying a rule system over the features. For instance, a system might have two rules, a recognition rule: “capitalized words are candidate entities” and a classification rule: “the type of candidate entities of length greater than 3 characters is organization”. These rules work well for the exemplar sentence above. However, real systems tend to be much more complex and their rules are often created by automatic learning techniques.
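The three-attribute representation and the two rules can be written out as a short Python sketch. The names `WordFeatures`, `featurize` and `classify` are hypothetical, and the length threshold is read here as a number of characters, matching the numeric attribute defined above.

```python
from typing import List, NamedTuple, Optional


class WordFeatures(NamedTuple):
    capitalized: bool  # Boolean attribute
    length: int        # numeric attribute, in characters
    lowercased: str    # nominal attribute


def featurize(sentence: str) -> List[WordFeatures]:
    """Build one feature vector per whitespace token, dropping edge punctuation."""
    vectors = []
    for raw in sentence.split():
        word = raw.strip(".,;:!?")
        if word:
            vectors.append(WordFeatures(word[0].isupper(), len(word), word.lower()))
    return vectors


def classify(vector: WordFeatures) -> Optional[str]:
    """Apply the recognition rule, then the classification rule."""
    if not vector.capitalized:  # recognition: only capitalized words are candidates
        return None
    if vector.length > 3:       # classification: longer candidates are organizations
        return "organization"
    return "candidate"          # recognized, but left without a type


vectors = featurize("The president of Apple eats an apple.")
labels = [classify(v) for v in vectors]
```

On the example sentence, only "Apple" is typed as an organization; "The" is a candidate that the classification rule leaves untyped, and the lowercase "apple" is never considered.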

In this section, we present the features most often used for the recognition and classification of named entities. We organize them along three different axes: Word-level features, List lookup features and Document and corpus features.

3.1 Word-level features

Word-level features are related to the character makeup of words. They specifically describe word case, punctuation, numerical value and special characters. Table 1 lists subcategories of word-level features.

Table 1: Word-level features

Features: Examples

Case:
- Starts with a capital letter
- Word is all uppercased
- The word is mixed case (e.g., ProSys, eBay)

Punctuation:
- Ends with period, has internal period (e.g., St., I.B.M.)
- Internal apostrophe, hyphen or ampersand (e.g., O’Connor)

Digit:
- Digit pattern (see section 3.1.1)
- Cardinal and Ordinal
- Roman number
- Word with digits (e.g., W3C, 3M)

Character:
- Possessive mark, first person pronoun
- Greek letters

Morphology:
- Prefix, suffix, singular version, stem
- Common ending (see section 3.1.2)

Part-of-speech:
- proper name, verb, noun, foreign word

Function:
- Alpha, non-alpha, n-gram (see section 3.1.3)
- lowercase, uppercase version
- pattern, summarized pattern (see section 3.1.4)
- token length, phrase length
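A few of the Case, Punctuation and Digit entries of Table 1 can be computed directly from the characters of a word. The sketch below is illustrative only: the feature names, the exact definitions (for instance, what counts as "mixed case") and the Roman numeral pattern are assumptions, not definitions from the survey.

```python
import re
from typing import Dict

# Assumed pattern for well-formed Roman numerals up to 3999
ROMAN = re.compile(
    r"^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)


def word_level_features(word: str) -> Dict[str, bool]:
    """Compute a few Boolean features from the Case, Punctuation and Digit rows."""
    letters = [c for c in word if c.isalpha()]
    return {
        "starts_capital": word[:1].isupper(),
        "all_upper": bool(letters) and all(c.isupper() for c in letters),
        # an uppercase letter after the first position, plus some lowercase letter
        "mixed_case": any(c.isupper() for c in word[1:])
        and any(c.islower() for c in word),
        "ends_with_period": word.endswith("."),
        "internal_period": "." in word[:-1],
        "internal_mark": any(c in "'’-&" for c in word[1:-1]),
        "has_digit": any(c.isdigit() for c in word),
        "roman_number": bool(ROMAN.match(word)),
    }


features = word_level_features("I.B.M.")
```

Each feature is independent, so a learner can combine them freely; note that a word such as "I" is both a plausible Roman numeral and a pronoun, which is why such features are usually treated as evidence rather than as decisions.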

3.1.1 Digit pattern

Digits can express a wide range of useful information such as dates, percentages, intervals, identifiers, etc. Special attention must be given to some particular patterns of digits. For example, two-digit and four-digit numbers can stand for years (D. Bikel et al. 1997) and when followed by an “s”, they can stand for a decade; one and two digits may stand for a day or a month (S. Yu et al. 1998).
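As a small illustration of these digit patterns, the following Python sketch returns every reading a token may have rather than choosing one, since a two-digit number is ambiguous between a year, a day and a month. The function name and reading labels are hypothetical.

```python
import re
from typing import Set


def digit_readings(token: str) -> Set[str]:
    """Return the possible readings of a digit token according to its shape."""
    readings: Set[str] = set()
    if re.fullmatch(r"(\d{2}|\d{4})s", token):  # e.g. 90s, 1990s
        readings.add("decade")
    if re.fullmatch(r"\d{2}|\d{4}", token):     # e.g. 96, 1996
        readings.add("year")
    if re.fullmatch(r"\d{1,2}", token):         # e.g. 7, 31
        readings.add("day_or_month")
    return readings


ambiguous = digit_readings("96")
```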

3.1.2 Common word ending

Morphological features are essentially related to word affixes and roots. For instance, a system may learn that a human profession often ends in “ist” (journalist, cyclist) or that nationalities and languages often end in “ish” and “an” (Spanish, Danish, Romanian). Another example of common word ending is organization names that often end in “ex”, “tech”, and “soft” (E. Bick 2004).

3.1.3 Functions over words

Features can be extracted by applying functions over words. An example is given by M. Collins and Singer (1999), who create a feature by isolating the non-alphabetic characters of a word (e.g., nonalpha(A.T.&T.) = ..&.). Another example is given by J. Patrick et al. (2002), who use character n-grams as features.
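Both functions are short enough to sketch in Python. The names `nonalpha` and `char_ngrams` are chosen for illustration; the cited papers may define their features differently in detail.

```python
from typing import List


def nonalpha(word: str) -> str:
    """Keep only the non-alphabetic characters of a word, in order."""
    return "".join(c for c in word if not c.isalpha())


def char_ngrams(word: str, n: int) -> List[str]:
    """Return all contiguous character n-grams of a word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


marks = nonalpha("A.T.&T.")
trigrams = char_ngrams("Ford", 3)
```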

3.1.4 Patterns and summarized patterns

Pattern features were introduced by M. Collins (2002) and then used by others (W. Cohen & Sarawagi 2004 and B. Settles 2004). Their role is to map words onto a small set of patterns over character types. For instance, a pattern feature might map all uppercase letters to “A”, all lowercase letters to “a”, all digits to “0” and all punctuation to “-”:

x = "G.M.": GetPattern(x) = "A-A-"
x = "Machine-223": GetPattern(x) = "Aaaaaaa-000"

The summarized pattern feature is a condensed form of the above in which consecutive character types are not repeated in the mapped string. For instance, the preceding examples become:

x = "G.M.": GetSummarizedPattern(x) = "A-A-"
x = "Machine-223": GetSummarizedPattern(x) = "Aa-0"
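The two mappings can be reproduced with a few lines of Python. This is a sketch of the idea rather than the implementation used in the cited work; the function names are adapted from the examples above, and every character that is neither a letter nor a digit is mapped to "-", which is an assumption that goes slightly beyond punctuation.

```python
from itertools import groupby


def char_type(c: str) -> str:
    """Map one character to its character-type symbol."""
    if c.isupper():
        return "A"
    if c.islower():
        return "a"
    if c.isdigit():
        return "0"
    return "-"  # assumption: anything else is treated like punctuation


def get_pattern(word: str) -> str:
    """Replace every character by its type symbol."""
    return "".join(char_type(c) for c in word)


def get_summarized_pattern(word: str) -> str:
    """Collapse runs of identical type symbols into a single symbol."""
    return "".join(symbol for symbol, _ in groupby(get_pattern(word)))


pattern = get_pattern("Machine-223")
summary = get_summarized_pattern("Machine-223")
```

The summarized form is much coarser, so many more words share the same value; that makes it a more general feature, but a less discriminative one.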

3.2 List lookup features

Lists are the privileged features in NERC. The terms “gazetteer”, “lexicon” and “dictionary” are often used interchangeably with the term “list”. List inclusion is a way to express the relation “is a” (e.g., Paris is a city). It may appear obvious that if a word (Paris) is an element of a list of cities, then the probability of this word being a city, in a given text, is high. However, because of word polysemy, the probability is almost never 1 (e.g., the probability of “Fast” to represent a company is low because of the common adjective “fast” that is much more frequent).

Table 2: List lookup features.

Features: Examples

General list:
- General dictionary (see section 3.2.1)
- Stop words (function words)
- Capitalized nouns (e.g., January, Monday)
- Common abbreviations

List of entities:
- Organization, government, airline, educational
- First name, last name, celebrity
- Astral body, continent, country, state, city

List of entity cues:
- Typical words in organization (see 3.2.2)
- Person title, name prefix, post-nominal letters
- Location typical word, cardinal point

In Table 2, we present three significant categories of lists used in literature. We could enumerate many more list examples but we decided to concentrate on those aimed at recognizing enamex types.

3.2.1 General dictionary

Common nouns listed in a dictionary are useful, for instance, in the disambiguation of capitalized words in ambiguous positions (e.g., sentence beginning). A. Mikheev (1999) reports that from 2677 words in ambiguous position in a given corpus, a general dictionary lookup allows identifying 1841 common nouns out of 1851 (99.4%) while only discarding 171 named entities out of 826 (20.7%). In other words, 20.7% of named entities are ambiguous with common nouns, in that corpus.

3.2.2 Words that are typical of organization names

Many authors propose to recognize organizations by identifying words that are frequently used in their names. For instance, knowing that “associates” is frequently used in organization names could lead to the recognition of “Computer Associates” and “BioMedia Associates” (D. McDonald 1993, R. Gaizauskas et al. 1995). The same rule applies to frequent first words (“American”, “General”) of an organization (L. Rau 1991). Some authors also exploit the fact that organizations often include the name of a person (F. Wolinski et al. 1995, Y. Ravin & Wacholder 1996) as in “Alfred P. Sloan Foundation”. Similarly, geographic names can be good indicators of an organization name (F. Wolinski et al. 1995) as in “France Telecom”. Organization designators such as “inc” and “corp” (L. Rau 1991) are also useful features.

3.2.3 On the list lookup techniques

Most approaches implicitly require candidate words to exactly match at least one element of a pre-existing list. However, we may want to allow some flexibility in the match conditions. At least three alternate lookup strategies are used in the NERC field. First, words can be stemmed (stripping off both inflectional and derivational suffixes) or lemmatized (normalizing for inflections only) before they are matched (S. Coates-Stephens 1992). For instance, if a list of cue words contains “technology”, the inflected form “technologies” will be considered as a successful match. For some languages (M. Jansche 2002), diacritics can be replaced by their canonical equivalent (e.g., ‘é’ replaced by ‘e’).
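The first strategy amounts to normalizing both the candidate word and the list entries before comparing them. In the sketch below, diacritics are removed with the standard `unicodedata` module, while `crude_lemma` is a toy stand-in for a real stemmer or lemmatizer and handles only two English plural endings. All function names are hypothetical.

```python
import unicodedata
from typing import Iterable


def strip_diacritics(word: str) -> str:
    """Decompose characters and drop the combining marks (é becomes e)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def crude_lemma(word: str) -> str:
    """Toy normalization of plural endings; a real system would use a stemmer."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def normalize(word: str) -> str:
    return crude_lemma(strip_diacritics(word))


def in_list(word: str, reference: Iterable[str]) -> bool:
    """Match a word against a list after normalizing both sides."""
    return normalize(word) in {normalize(entry) for entry in reference}


matched = in_list("technologies", ["technology"])
```

Normalizing both sides with the same function keeps the comparison consistent even where the toy rule over-strips a word.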

Second, candidate words can be “fuzzy-matched” against the reference list using some kind of thresholded edit-distance (Y. Tsuruoka & Tsujii 2003) or Jaro-Winkler (W. Cohen & Sarawagi 2004). This allows capturing small lexical variations in words that are not necessarily derivational or inflectional. For instance, Frederick could match Frederik because the edit-distance between the two words is very small (suppression of just one character, the ‘c’). Jaro-Winkler’s metric was specifically designed to match proper names following the observation that the first letters tend to be correct while name ending often varies.
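A thresholded edit-distance lookup can be sketched with the standard Levenshtein dynamic program, shown below. This illustrates the general idea only; it is not the specific method of the cited papers, Jaro-Winkler is not implemented here, and the threshold of 1 is an arbitrary choice.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(word: str, reference: Iterable[str], max_distance: int = 1) -> List[str]:
    """Return the list entries within max_distance edits of the word."""
    return [
        entry for entry in reference
        if edit_distance(word.lower(), entry.lower()) <= max_distance
    ]


distance = edit_distance("Frederick", "Frederik")
matches = fuzzy_lookup("Frederick", ["Frederik", "Francis"])
```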

Third, the reference list can be accessed using the Soundex algorithm (H. Raghavan & Allan 2004) which normalizes candidate words to their respective Soundex codes. This code is a combination of the first letter of a word plus a three-digit code that represents its phonetic sound. Hence, similar sounding names like Lewinskey (soundex = l520) and Lewinsky (soundex = l520) are equivalent with respect to their Soundex code.
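For illustration, here is a compact Python version of the commonly described American Soundex rules: keep the first letter, encode the following consonants as digits, skip vowels, collapse adjacent letters with the same digit, and let "h" and "w" be transparent. It is a sketch rather than the implementation used by the cited authors, and it writes the initial in uppercase (L520 rather than l520).

```python
CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in group:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the first letter plus three digits describing the consonants."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    last = CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        if digit and digit != last:
            code += digit
        if c not in "hw":  # h and w do not separate letters with the same digit
            last = digit
    return (code + "000")[:4]  # pad with zeros or truncate to four characters


same_code = soundex("Lewinskey") == soundex("Lewinsky")
```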

3.3 Document and corpus features

Document features are defined over both document content and document structure. Large collections of documents (corpora) are also excellent sources of features. We list in this section features that go beyond the single word and multi-word expression and include meta-information about documents and corpus statistics.

Table 3: Features from documents.

Features: Examples

Multiple occurrences:
- Other entities in the context
- Uppercased and lowercased occurrences (see 3.3.1)
- Anaphora, coreference (see 3.3.2)

Local syntax:
- Enumeration, apposition
- Position in sentence, in paragraph, and in document

Meta information:
- Uri, Email header, XML section, (see section 3.3.3)
- Bulleted/numbered lists, tables, figures

Corpus frequency:
- Word and phrase frequency
- Co-occurrences
- Multiword unit permanency (see 3.3.4)

3.3.1 Multiple occurrences and multiple casing

C. Thielen (1995), Y. Ravin and Wacholder (1996) and A. Mikheev (1999) identify words that appear both in uppercased and lowercased form in a single document. Those words are hypothesized to be common nouns that appear both in ambiguous (e.g., sentence beginning) and unambiguous position.
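A minimal sketch of this check, assuming the document is already tokenized: a word is flagged when it occurs both with an initial capital and fully lowercased. The function name and the sample tokens are hypothetical.

```python
from typing import List, Set


def mixed_casing_words(tokens: List[str]) -> Set[str]:
    """Words seen both capitalized and fully lowercased in one document."""
    capitalized = {t.lower() for t in tokens if t[:1].isupper()}
    lowercased = {t for t in tokens if t.islower()}
    return capitalized & lowercased


tokens = "Life is short . He said life goes on . Paris is calm .".split()
likely_common = mixed_casing_words(tokens)
```

Here "Life" at the start of a sentence is matched with "life" elsewhere, so it is hypothesized to be a common noun, while "Paris" never appears in lowercase and stays a candidate entity.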

3.3.2 Entity coreference and alias

The task of recognizing the multiple occurrences of a unique entity in a document dates back to the earliest research in the field (D. McDonald 1993, L. Rau 1991). Coreferences are the occurrences of a given word or word sequence referring to a given entity within a document. Deriving features from coreferences is mainly done by exploiting the context of every occurrence (e.g., Macdonald was the first, Macdonald said, was signed by Macdonald, …). Aliases of an entity are the various ways the entity is written in a document. For instance, we may have the following aliases for a given entity: Sir John A. Macdonald, John A. Macdonald, John Alexander Macdonald, and Macdonald. Deriving features from aliases is mainly done by leveraging the union of alias words (Sir, John, A, Alexander, Macdonald).

Finding coreferences and aliases in a text can be reduced to the same problem of finding all occurrences of an entity in a document. This problem is of great complexity. R. Gaizauskas et al. (1995) use 31 heuristic rules to match multiple occurrences of company names. For instance, two multi-word expressions match if one is the initial subsequence of the other. An even more complex task is the recognition of entity mentions across documents. X. Li et al. (2004) propose and compare a supervised and an unsupervised model for this task. They propose the use of word-level features engineered to handle equivalences (e.g., prof. is equivalent to professor) and relational features to encode the relative order of tokens between two occurrences. Word-level features are often insufficient for complex problems. A metonymy, for instance, denotes a different concept than the literal denotation of a word (e.g., “New York” that stands for “New York Yankees”, “Hexagon” that stands for “France”). T. Poibeau (2006) shows that semantic tagging is a key issue in such cases.
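The initial-subsequence heuristic mentioned above is easy to state in code. The sketch below covers only that single rule, with hypothetical function names and whitespace tokenization; it is not a reconstruction of the full rule set of the cited system.

```python
def is_initial_subsequence(shorter: str, longer: str) -> bool:
    """True if the words of `shorter` are the first words of `longer`."""
    a, b = shorter.split(), longer.split()
    return 0 < len(a) <= len(b) and b[:len(a)] == a


def names_match(x: str, y: str) -> bool:
    """Two multi-word expressions match if either one begins the other."""
    return is_initial_subsequence(x, y) or is_initial_subsequence(y, x)


linked = names_match("Ford Motor Company", "Ford Motor")
```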

3.3.3 Document meta-information

Most meta-information about documents can be used directly: email headers are good indicators of person names, news often start with a location name, etc. Some authors make original use of meta-information. J. Zhu et al. (2005) use document URL to bias probabilities of entities. For instance, many names (e.g., bird names) have high probability to be a “project name” if the URL is from a computer science department domain.

3.3.4 Statistics for Multiword units

J. Da Silva et al. (2004) propose some interesting feature functions for multi-word units that can be thresholded using corpus statistics. For example, they establish a threshold on the presence of rare and long lowercased words in entities. Only multiword units that do not contain rare lowercased words (rarity calculated as relative frequency in the corpus) of a relatively long size (mean size calculated from the corpus) are considered as candidate named entities. They also present a feature called permanency that consists of calculating the frequency of a word (e.g., Life) in a corpus divided by its frequency in case insensitive form (e.g., life, Life, LIFE, etc.).
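The permanency ratio can be computed from simple token counts. The sketch below assumes a corpus given as a flat list of tokens; the function name and the toy corpus are hypothetical.

```python
from collections import Counter
from typing import List


def permanency(word: str, tokens: List[str]) -> float:
    """Frequency of the exact form divided by its case-insensitive frequency."""
    exact = Counter(tokens)[word]
    folded = Counter(t.lower() for t in tokens)[word.lower()]
    return exact / folded if folded else 0.0


corpus = ["Life", "life", "LIFE", "life", "Paris", "Paris"]
life_score = permanency("Life", corpus)
paris_score = permanency("Paris", corpus)
```

A word that always appears in the same capitalized form scores 1.0, while a word that mostly appears in other casings scores low.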

4.1 MUC evaluations

In MUC events (R. Grishman & Sundheim 1996, N. Chinchor 1999), a system is scored on two axes: its ability to find the correct type (TYPE) and its ability to find exact text (TEXT). A correct TYPE is credited if an entity is assigned the correct type, regardless of boundaries as long as there is an overlap. A correct TEXT is credited if entity boundaries are correct, regardless of the type. For both TYPE and TEXT, three measures are kept: the number of correct answers (COR), the number of actual system guesses (ACT) and the number of possible entities in the solution (POS).
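The counting scheme can be sketched as follows. The text above names only the three counts; the ratios used below (precision as COR/ACT, recall as COR/POS, pooled over both axes before taking the harmonic mean) follow the usual MUC convention and are stated here as an assumption. The `Entity` type, the token-offset representation and the toy data are hypothetical, and the sketch does not prevent one solution entity from being credited to several guesses.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    start: int  # index of the first token
    end: int    # index one past the last token
    label: str


def overlaps(a: Entity, b: Entity) -> bool:
    return a.start < b.end and b.start < a.end


def muc_counts(gold: List[Entity], guesses: List[Entity]) -> Tuple[int, int, int]:
    """Return (COR, ACT, POS) summed over the TYPE and TEXT axes."""
    cor = 0
    for g in guesses:
        # TYPE: correct type on an overlapping solution entity, boundaries ignored
        if any(overlaps(g, s) and g.label == s.label for s in gold):
            cor += 1
        # TEXT: exact boundaries, type ignored
        if any((g.start, g.end) == (s.start, s.end) for s in gold):
            cor += 1
    return cor, 2 * len(guesses), 2 * len(gold)


def f_measure(cor: int, act: int, pos: int) -> float:
    """Harmonic mean of precision (COR/ACT) and recall (COR/POS)."""
    precision = cor / act if act else 0.0
    recall = cor / pos if pos else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


gold = [Entity(0, 2, "PER"), Entity(5, 6, "LOC")]
guesses = [Entity(0, 2, "ORG"), Entity(4, 6, "LOC")]
cor, act, pos = muc_counts(gold, guesses)
score = f_measure(cor, act, pos)
```

In this toy case the first guess earns TEXT credit only (right boundaries, wrong type) and the second earns TYPE credit only (right type, overlapping but inexact boundaries), which shows how the two axes give partial credit.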

4.3 ACE evaluation

ACE has a complex evaluation procedure. It includes mechanisms for dealing with various evaluation issues (partial match, wrong type, etc.). The ACE task definition is also more elaborate than previous tasks at the level of named entity “subtypes”, “class” as well as entity mentions (coreferences), and more, but these supplemental elements will be ignored here.

}}

References

Alfonseca, Enrique; Manandhar, S. (2002). An Unsupervised Method for General Named Entity Recognition and Automated Concept Discovery. In: Proceedings of International Conference on General WordNet.
Asahara, Masayuki; Matsumoto, Y. (2003). Japanese Named Entity Extraction with Redundant Morphological Analysis. In: Proceedings of Human Language Technology conference - North American chapter of the Association for Computational Linguistics.
Roberto Basili; Cammisa, M.; Donati, E. (2005). RitroveRAI: A Web Application for Semantic Indexing and Hyperlinking of Multimedia News. In: Proceedings of International Semantic Web Conference.
Bick, Eckhard (2004). A Named Entity Recognizer for Danish. In: Proceedings of Conference on Language Resources and Evaluation.
Bikel, Daniel M.; Miller, S.; Schwartz, R.; Weischedel, R. (1997). Nymble: a High-Performance Learning Name-finder. In: Proceedings of Conference on Applied Natural Language Processing.
Black, William J.; Rinaldi, F.; Mowatt, D. (1998). Facile: Description of the NE System used for Muc-7. In: Proceedings of Message Understanding Conference.
Bodenreider, Olivier; Zweigenbaum, P. (2000). Identifying Proper Names in Parallel Medical Terminologies. Stud Health Technol Inform 77.443-447, Amsterdam: IOS Press.
Boutsis, Sotiris; Demiros, I.; Giouli, V.; Liakata, M.; Papageorgiou, H.; Piperidis, S. (2000). A System for Recognition of Named Entities in Greek. In: Proceedings of International Conference on Natural Language Processing.
Borthwick, Andrew; Sterling, J.; Eugene Agichtein; Grishman, R. (1998). NYU: Description of the MENE Named Entity System as used in MUC-7. In: Proceedings of Seventh Message Understanding Conference.
Brin, Sergey. (1998). Extracting Patterns and Relations from the World Wide Web. In: Proceedings of Conference of Extending Database Technology. Workshop on the Web and Databases.
Carreras, Xavier; Màrquez, L.; Padró, L. (2003). Named Entity Recognition for Catalan Using Spanish Resources. In: Proceedings of Conference of the European Chapter of Association for Computational Linguistics.
Chen, H. H.; Lee, J. C. (1996). Identification and Classification of Proper Nouns in Chinese Texts. In: Proceedings of International Conference on Computational Linguistics.
Chinchor, Nancy. (1999). Overview of MUC-7/MET-2. In: Proceedings of Message Understanding Conference MUC-7.
Chinchor, Nancy; Robinson, P.; Brown, E. (1998). Hub-4 Named Entity Task Definition. In: Proceedings of DARPA Broadcast News Workshop.
Cimiano, Philipp; Völker, J. (2005). Towards Large-Scale, Open-Domain and Ontology-based Named Entity Classification. In: Proceedings of Conference on Recent Advances in Natural Language Processing.
Coates-Stephens, Sam. (1992). The Analysis and Acquisition of Proper Names for the Understanding of Free Text. Computers and the Humanities 26.441-456, San Francisco: Morgan Kaufmann Publishers.
Cohen, William W.; Sunita Sarawagi (2004). Exploiting Dictionaries in Named Entity Extraction: Combining Semi-Markov Extraction Processes and Data Integration Methods. In: Proceedings of Conference on Knowledge Discovery in Data.
Collins, Michael. (2002). Ranking Algorithms for Named–Entity Extraction: Boosting and the Voted Perceptron. In: Proceedings of Association for Computational Linguistics.
Collins, Michael; Singer, Y. (1999). Unsupervised Models for Named Entity Classification. In: Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora.
Cucchiarelli, Alessandro; Paola Velardi (2001). Unsupervised Named Entity Recognition Using Syntactic and Semantic Contextual Evidence. Computational Linguistics 27:1.123-131, Cambridge: MIT Press.
Cucerzan, Silviu; Yarowsky, D. (1999). Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence. In: Proceedings of Joint Sigdat Conference on Empirical Methods in Natural Language Processing and Very Large Corpora.
Da Silva, Joaquim Ferreira; Kozareva, Z.; Lopes, G. P. (2004). Cluster Analysis and Classification of Named Entities. In: Proceedings of Conference on Language Resources and Evaluation.
(Doddington et al., 2004) ⇒ George Doddington, A. Mitchell, M. Przybocki, L. Ramshaw, S. Strassel, R. Weischedel. (2004). “The Automatic Content Extraction (ACE) Program – Tasks, Data, and Evaluation.” In: Proceedings of Conference on Language Resources and Evaluation (LREC 2004).
Etzioni, Oren; Cafarella, M.; Downey, D.; Popescu, A.-M.; Shaked, T.; Soderland, S.; Weld, D. S.; Yates, A. (2005). Unsupervised Named-Entity Extraction from the Web: An Experimental Study. Artificial Intelligence 165.91-134, Essex: Elsevier Science Publishers.
Evans, Richard. (2003). A Framework for Named Entity Recognition in the Open Domain. In: Proceedings of Recent Advances in Natural Language Processing.
Ferro, Lisa; Gerber, L.; Mani, I.; Sundheim, B.; Wilson, G. (2005). TIDES 2005 Standard for the Annotation of Temporal Expressions. The MITRE Corporation.
Fleischman, Michael. (2001). Automated Subcategorization of Named Entities. In: Proceedings of Conference of the European Chapter of Association for Computational Linguistics.
Fleischman, Michael; Hovy, E. (2002). Fine Grained Classification of Named Entities. In: Proceedings of Conference on Computational Linguistics.
Gaizauskas, Robert.; Wakao, T.; Humphreys, K.; Hamish Cunningham; Wilks, Y. (1995). University of Sheffield: Description of the LaSIE System as Used for MUC-6. In: Proceedings of Message Understanding Conference.
Grishman, Ralph; Sundheim, B. (1996). Message Understanding Conference - 6: A Brief History. In: Proceedings of International Conference on Computational Linguistics.
Hearst, Marti. (1992). Automatic Acquisition of Hyponyms from Large Text Corpora. In: Proceedings of International Conference on Computational Linguistics.
Heng, Ji; Grishman, R. (2006). Data Selection in Semi-supervised Learning for Name Tagging. In: Proceedings of Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics. Information Extraction beyond the Document.
Huang, Fei. (2005). Multilingual Named Entity Extraction and Translation from Text and Speech. Ph.D. Thesis. Pittsburgh: Carnegie Mellon University.
Jansche, Martin. (2002). Named Entity Extraction with Conditional Markov Models and Classifiers. In: Proceedings of Conference on Computational Natural Language Learning.
Kokkinakis, Dimitri. (1998). AVENTINUS, GATE and Swedish Lingware. In: Proceedings of Nordic Computational Linguistics Conference.
(Kripke, 1980) ⇒ Saul Kripke. (1980). “Naming and Necessity.” Harvard University Press.
Lee, Seungwoo; Geunbae Lee, G. (2005). Heuristic Methods for Reducing Errors of Geographic Named Entities Learned by Bootstrapping. In: Proceedings of International Joint Conference on Natural Language Processing.
Li, Xin.; Morie, P.; Dan Roth. (2004). Identification and Tracing of Ambiguous Names: Discriminative and Generative Approaches. In: Proceedings of National Conference on Artificial Intelligence.
Dekang Lin (1998). Automatic retrieval and clustering of similar words. In: Proceedings of International Conference on Computational Linguistics and the Annual Meeting of the Association for Computational Linguistics.
McDonald, David D. (1993). Internal and External Evidence in the Identification and Semantic Categorization of Proper Names. In: Proceedings of Corpus Processing for Lexical Acquisition.
May, Jonathan; Brunstein, A.; Natarajan, P.; Weischedel, R. M. (2003). Surprise! What’s in a Cebuano or Hindi Name? ACM Transactions on Asian Language Information Processing 2:3.169-180, New York: ACM Press.
Maynard, Diana; Tablan, V.; Ursu, C.; Hamish Cunningham; Wilks, Y. (2001). Named Entity Recognition from Diverse Text Types. In: Proceedings of Recent Advances in Natural Language Processing.
McCallum, Andrew; Li, W. (2003). Early Results for Named Entity Recognition with Conditional Random Fields, Features Induction and Web-Enhanced Lexicons. In: Proceedings of Conference on Computational Natural Language Learning.
Mikheev, Andrei. (1999). A Knowledge-free Method for Capitalized Word Disambiguation. In: Proceedings of Conference of Association for Computational Linguistics.
Mikheev, A.; Moens, M.; Grover, C. (1999). Named Entity Recognition without Gazetteers. In: Proceedings of Conference of European Chapter of the Association for Computational Linguistics.
(Minkov et al., 2005) ⇒ Einat Minkov, Richard C. Wang, and William W. Cohen. (2005). “Extracting personal names from email: applying named entity recognition to informal text.” In: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. doi:10.3115/1220575.1220631
Nadeau, David; Turney, P.; Stan Matwin (2006). Unsupervised Named Entity Recognition: Generating Gazetteers and Resolving Ambiguity. In: Proceedings of Canadian Conference on Artificial Intelligence.
Narayanaswamy, Meenakshi; Ravikumar, K. E.; Vijay-Shanker, K. (2003). A Biological Named Entity Recognizer. In: Proceedings of Pacific Symposium on Biocomputing.
Ohta, Tomoko; Tateisi, Y.; Kim, J.; Mima, H.; Jun'ichi Tsujii (2002). The GENIA Corpus: An Annotated Research Abstract Corpus in Molecular Biology Domain. In: Proceedings of Human Language Technology Conference.
Paşca, Marius; Dekang Lin; Bigham, J.; Lifchits, A.; Jain, A. (2006). Organizing and Searching the World Wide Web of Facts — Step One: The One-Million Fact Extraction Challenge. In: Proceedings of National Conference on Artificial Intelligence.
Patrick, Jon; Whitelaw, C.; Munro, R. (2002). SLINERC: The Sydney Language-Independent Named Entity Recogniser and Classifier. In: Proceedings of Conference on Natural Language Learning.
Palmer, David D.; Day, D. S. (1997). A Statistical Profile of the Named Entity Task. In: Proceedings of ACL Conference for Applied Natural Language Processing.
Petasis, Georgios; Vichot, F.; Wolinski, F.; Paliouras, G.; Karkaletsis, V.; Spyropoulos, C. D. (2001). Using Machine Learning to Maintain Rule-based Named-Entity Recognition and Classification Systems. In: Proceedings of Conference of Association for Computational Linguistics.
Piskorski, Jakub. (2004). Extraction of Polish Named-Entities. In: Proceedings of Conference on Language Resources and Evaluation.
Poibeau, Thierry. (2003). The Multilingual Named Entity Recognition Framework. In: Proceedings of Conference of European Chapter of the Association for Computational Linguistics.
Poibeau, Thierry. (2006). Dealing with Metonymic Readings of Named Entities. In: Proceedings of Annual Conference of the Cognitive Science Society.
Poibeau, Thierry; Kosseim, L. (2001). Proper Name Extraction from Non-Journalistic Texts. In: Proceedings of Computational Linguistics in the Netherlands.
Popov, Borislav; Kirilov, A.; Diana Maynard; Manov, D. (2004). Creation of reusable components and language resources for Named Entity Recognition in Russian. In: Proceedings of Conference on Language Resources and Evaluation.
Raghavan, Hema; Allan, J. (2004). Using Soundex Codes for Indexing Names in ASR documents. In: Proceedings of Human Language Technology conference - North American chapter of the Association for Computational Linguistics. Interdisciplinary Approaches to Speech Indexing and Retrieval.
Rau, Lisa F. (1991). Extracting Company Names from Text. In: Proceedings of Conference on Artificial Intelligence Applications of IEEE.
Ravin, Yael; Wacholder, N. (1996). Extracting Names from Natural-Language Text. IBM Research Report RC 2033.
Ellen Riloff; Jones, R (1999). Learning Dictionaries for Information Extraction using Multi-level Bootstrapping. In: Proceedings of National Conference on Artificial Intelligence.
Rindfleisch, Thomas C.; Tanabe, L.; Weinstein, J. N. (2000). EDGAR: Extraction of Drugs, Genes and Relations from the Biomedical Literature. In: Proceedings of Pacific Symposium on Biocomputing.
Santos, Diana; Seco, N.; Cardoso, N.; Vilela, R. (2006). HAREM: An Advanced NER Evaluation Contest for Portuguese. In: Proceedings of International Conference on Language Resources and Evaluation.
Satoshi Sekine. (1998). NYU: Description of the Japanese NE System Used for MET-2. In: Proceedings of Message Understanding Conference.
Satoshi Sekine; Isahara, H. (2000). IREX: IR and IE Evaluation project in Japanese. In: Proceedings of Conference on Language Resources and Evaluation.
Satoshi Sekine; Nobata, C. (2004). Definition, Dictionaries and Tagger for Extended Named Entity Hierarchy. In: Proceedings of Conference on Language Resources and Evaluation.
Settles, Burr. (2004). Biomedical Named Entity Recognition Using Conditional Random Fields and Rich Feature Sets. In: Proceedings of Conference on Computational Linguistics. Joint Workshop on Natural Language Processing in Biomedicine and its Applications.
Shen, Dan; Zhang, J.; Zhou, G.; Su, J.; Tan, C. L. (2003). Effective Adaptation of a Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain. In: Proceedings of Conference of Association for Computational Linguistics. Natural Language Processing in Biomedicine.
Shinyama, Yusuke; Satoshi Sekine (2004). Named Entity Discovery Using Comparable News Articles. In: Proceedings of International Conference on Computational Linguistics.
Thielen, Christine. (1995). An Approach to Proper Name Tagging for German. In: Proceedings of Conference of European Chapter of the Association for Computational Linguistics. SIGDAT.
Tjong Kim Sang, Erik F. (2002). Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition. In: Proceedings of Conference on Natural Language Learning.
Tjong Kim Sang, Erik F.; De Meulder, F. (2003). Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In: Proceedings of Conference on Natural Language Learning.
Yoshimasa Tsuruoka; Jun'ichi Tsujii (2003). Boosting Precision and Recall of Dictionary-based Protein Name Recognition. In: Proceedings of Conference of Association for Computational Linguistics. Natural Language Processing in Biomedicine.
Peter D. Turney. (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In: Proceedings of European Conference on Machine Learning.
Tzong-Han Tsai, Richard; Wu, S.-H.; Chou, W.-C.; Lin, Y.-C.; He, D.; Hsiang, J.; Sung, T.-Y.; Hsu, W.-L. (2006). Various Criteria in the Evaluation of Biomedical Named Entity Recognition. BMC Bioinformatics 7:92, BioMed Central.
Wang, Liang-Jyh; Li, W.-C.; Chang, C.-H. (1992). Recognizing Unregistered Names for Mandarin Word Identification. In: Proceedings of International Conference on Computational Linguistics.
Whitelaw, Casey; Patrick, J. (2003). Evaluating Corpora for Named Entity Recognition Using Character-level Features. In: Proceedings of Australian Conference on Artificial Intelligence.
(Witten et al., 1999b) ⇒ Ian H. Witten, Z. Bray, M. Mahoui, and W. J. Teahan. (1999). “Using Language Models for Generic Entity Extraction.” In: Proceedings of ICML 1999 Workshop on Machine Learning in Text Data Analysis.
Wolinski, Francis; Vichot, F.; Dillet, B. (1995). Automatic Processing of Proper Names in Texts. In: Proceedings of Conference of European Chapter of the Association for Computational Linguistics.
Yangarber, Roman; Lin, W.; Grishman, R. (2002). Unsupervised Learning of Generalized Names. In: Proceedings of International Conference on Computational Linguistics.
Yu, Shihong; Bai, S.; Wu, P. (1998). Description of the Kent Ridge Digital Labs System Used for MUC-7. In: Proceedings of Message Understanding Conference.
Zhu, Jianhan; Uren, V.; Motta, E. (2005). ESpotter: Adaptive Named Entity Recognition for Web Browsing. In: Proceedings of Conference Professional Knowledge Management. Intelligent IT Tools for Knowledge Management Systems.


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2007 ASurveyOfNER	Satoshi Sekine David Nadeau			A Survey of Named Entity Recognition and Classification		Lingvisticae Investigationes	http://nlp.cs.nyu.edu/sekine/papers/li07.pdf			2007