- (Ratnaparkhi, 1996) ⇒ Adwait Ratnaparkhi. (1996). “A Maximum Entropy Model for Part-of-Speech Tagging.” In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 1996).
- (Collins, 2003) ⇒ Michael Collins. (2003). “Head-Driven Statistical Models for Natural Language Parsing.” In: Computational Linguistics, 29(4). doi:10.1162/089120103322753356.
- (Lafferty et al., 2001) ⇒ John D. Lafferty, Andrew McCallum, and Fernando Pereira. (2001). “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.” In: Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001).
This paper presents a statistical model which trains from a corpus annotated with Part-Of-Speech tags and assigns them to previously unseen text with state-of-the-art accuracy (96.6%). The model can be classified as a Maximum Entropy model and simultaneously uses many contextual "features" to predict the POS tag. Furthermore, this paper demonstrates the use of specialized features to model difficult tagging decisions, discusses the corpus consistency problems discovered during the implementation of these features, and proposes a training strategy that mitigates these problems.
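As a rough illustration of how a Maximum Entropy tagger can combine many contextual features into one prediction, the sketch below computes a conditional distribution over tags as a normalized exponential of summed feature weights. The feature templates, the tag set, and the weight values are hypothetical and chosen only for illustration; they are not the paper's feature set, and no training procedure is shown.

```python
import math

def features(history, tag):
    """Return the names of the binary features active for (history, tag).

    The templates here (current word, previous tag, 3-letter suffix) are
    hypothetical examples, not the feature set used in the paper.
    """
    word, prev_tag = history
    return [
        f"word={word.lower()}&tag={tag}",
        f"prev_tag={prev_tag}&tag={tag}",
        f"suffix={word[-3:].lower()}&tag={tag}",
    ]

def tag_distribution(history, tags, weights):
    """Compute p(tag | history) proportional to exp(sum of active feature weights)."""
    scores = {
        t: sum(weights.get(f, 0.0) for f in features(history, t))
        for t in tags
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {t: math.exp(s - max_score) for t, s in scores.items()}
    z = sum(exp_scores.values())  # normalization constant for this history
    return {t: v / z for t, v in exp_scores.items()}

# Assumed toy tag set and hand-picked weights (not learned from any corpus).
TAGS = ["DT", "NN", "VB"]
WEIGHTS = {
    "word=the&tag=DT": 3.0,
    "prev_tag=DT&tag=NN": 2.0,
    "suffix=ing&tag=VB": 1.0,
}

# History: the current word "running", preceded by a word tagged DT.
dist = tag_distribution(("running", "DT"), TAGS, WEIGHTS)
best_tag = max(dist, key=dist.get)
```

In this toy setting the previous-tag feature outweighs the suffix feature, so the two pieces of evidence are traded off within a single distribution rather than applied as separate rules.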
- (Darroch & Ratcliff, 1972) ⇒ John N. Darroch and Douglas Ratcliff. (1972). “Generalized Iterative Scaling for Log-Linear Models.” In: The Annals of Mathematical Statistics, 43(5).