=== Abstract ===
It is often claimed that Named Entity recognition systems need extensive gazetteers --- lists of names of people, organisations, locations, and other named entities. Indeed, the compilation of such gazetteers is sometimes mentioned as a bottleneck in the design of Named Entity recognition systems. We report on a Named Entity recognition system which combines rule-based grammars with statistical (maximum entropy) models. We report on the system's performance with gazetteers of different types and different sizes, using test material from the MUC-7 competition. [[We]] show that, for the text type and task of this competition, it is sufficient to use relatively small gazetteers of well-known names, rather than large gazetteers of low-frequency names. [[We]] conclude with observations about the domain independence of the competition and of our experiments.
== References ==