1999 AnAlgoThatLearnsWhInAName


Subject Headings: Supervised Named Entity Recognition, Hidden Markov Model, IdentiFinder™.

Notes

Cited By

Quotes

Abstract

In this paper, we present IdentiFinder™, a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities. We have evaluated the model in English (based on data from the Sixth and Seventh Message Understanding Conferences [MUC-6, MUC-7] and broadcast news) and in Spanish (based on data distributed through the First Multilingual Entity Task [MET-1]), and on speech input (based on broadcast news). We report results here on standard materials only to quantify performance on data available to the community, namely, MUC-6 and MET-1. Results have been consistently better than reported by any other learning algorithm. IdentiFinder's performance is competitive with approaches based on handcrafted rules on mixed case text and superior on text where case information is not available. We also present a controlled experiment showing the effect of training set size on performance, demonstrating that as little as 100,000 words of training data is adequate to get performance around 90% on newswire. Although we present our understanding of why this algorithm performs so well on this class of problems, we believe that significant improvement in performance may still be possible.
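To make the idea of a hidden Markov model that tags names concrete, here is a minimal sketch of a first-order HMM tagger with Viterbi decoding. It is not IdentiFinder's actual model, whose details the abstract does not give. The tag names, the toy training sentences, the add-alpha smoothing, and the helper names `train` and `viterbi` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train(tagged_sentences, alpha=1.0):
    """Estimate HMM log-probabilities from counts (hypothetical helper).

    Uses add-alpha smoothing, an assumption chosen for brevity.
    """
    tags = sorted({t for sent in tagged_sentences for _, t in sent})
    vocab = {w for sent in tagged_sentences for w, _ in sent} | {"<unk>"}
    trans = defaultdict(lambda: defaultdict(float))  # previous tag -> tag -> count
    emit = defaultdict(lambda: defaultdict(float))   # tag -> word -> count
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag

    def log_trans(prev, tag):
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + alpha) / (total + alpha * len(tags)))

    def log_emit(tag, word):
        w = word if word in vocab else "<unk>"  # unseen words share one bucket
        total = sum(emit[tag].values())
        return math.log((emit[tag][w] + alpha) / (total + alpha * len(vocab)))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    # Each column maps tag -> (best log-score of a path ending here, backpointer).
    best = [{t: (log_trans("<s>", t) + log_emit(t, words[0]), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, word), prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


# Toy training data, invented for this sketch.
TRAIN = [
    [("Mr.", "O"), ("Smith", "PERSON"), ("visited", "O"), ("Boston", "LOCATION")],
    [("Ms.", "O"), ("Jones", "PERSON"), ("left", "O"), ("Madrid", "LOCATION")],
    [("Smith", "PERSON"), ("visited", "O"), ("Madrid", "LOCATION")],
]

tags, log_trans, log_emit = train(TRAIN)
print(viterbi(["Ms.", "Smith", "left", "Boston"], tags, log_trans, log_emit))
```

The sketch conditions each tag only on the previous tag and each word only on its tag. A system of the kind the abstract describes would need much more training data and richer handling of unseen words and of capitalization than this toy does.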


References


1999 AnAlgoThatLearnsWhInAName
Author: Richard Schwartz, Ralph Weischedel, Daniel M. Bikel
Title: An Algorithm that Learns What's in a Name
URL: http://www.cis.upenn.edu/~dbikel/papers/algthatlearns.doc.pdf
DOI: 10.1023/A:1007558221122
Year: 1999