2002 DiscriminativeTrainingMethodsForHMM
- (Collins, 2002b) ⇒ Michael Collins. (2002). “Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with the Perceptron Algorithm.” In: Proceedings of the ACL Conference on Empirical Methods in Natural Language Processing, (EMNLP 2002). doi:10.3115/1118693.1118694
Subject Headings: Voted Perceptron Model, Part-of-Speech Tagging Task, Base Noun Phrase Chunking Task, Discriminative Training Algorithm.
Notes
- This is a companion paper to (Collins, 2002a).
Cited By
- ~558 …
2006
- (Richardson & Domingos, 2006) ⇒ Matthew Richardson, and Pedro Domingos. (2006). “Markov Logic Networks.” In: Machine Learning, 62. doi:10.1007/s10994-006-5833-1.
2005
- (Collins & Koo, 2005) ⇒ Michael Collins, and Terry Koo. (2005). “Discriminative Reranking for Natural Language Parsing.” In: Computational Linguistics, 31(1). doi:10.1162/0891201053630273
2003
- (Sha & Pereira, 2003a) ⇒ Fei Sha, and Fernando Pereira. (2003). “Shallow Parsing with Conditional Random Fields.” In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL 2003). doi:10.3115/1073445.1073473
Quotes
Abstract
We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We give experimental results on part-of-speech tagging and base noun phrase chunking, in both cases showing improvements over results for a maximum-entropy tagger.
…
2 Parameter Estimation
2.1 HMM Taggers
... As an alternative to maximum-likelihood parameter estimates, this paper will propose the following estimation algorithm. Say the training set consists of [math]\displaystyle{ n }[/math] tagged sentences, the [math]\displaystyle{ i }[/math]th sentence being of length [math]\displaystyle{ n_i }[/math].
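A minimal, self-contained sketch of this estimation scheme, per the abstract: Viterbi-decode each training sentence under the current weights, then apply a simple additive (perceptron-style) update when the decoded tag sequence differs from the gold one. The toy tag set, feature templates, and function names below are illustrative assumptions, not the paper's notation:

```python
from collections import defaultdict

TAGS = ["DT", "NN"]  # toy tag set for illustration

def features(words, tags):
    """Global feature vector for a tagged sentence: counts of
    emission and transition indicator features."""
    f = defaultdict(int)
    prev = "<s>"
    for w_i, t in zip(words, tags):
        f[("emit", t, w_i)] += 1
        f[("trans", prev, t)] += 1
        prev = t
    return f

def viterbi(words, weights):
    """Highest-scoring tag sequence under the current weights."""
    # delta[t] = best score of any tag prefix ending in tag t
    delta = {t: weights[("emit", t, words[0])] + weights[("trans", "<s>", t)]
             for t in TAGS}
    back = []
    for w_i in words[1:]:
        new, ptr = {}, {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda p: delta[p] + weights[("trans", p, t)])
            new[t] = (delta[best_prev] + weights[("trans", best_prev, t)]
                      + weights[("emit", t, w_i)])
            ptr[t] = best_prev
        delta = new
        back.append(ptr)
    # follow back-pointers from the best final tag
    t = max(TAGS, key=lambda x: delta[x])
    tags = [t]
    for ptr in reversed(back):
        t = ptr[t]
        tags.append(t)
    return list(reversed(tags))

def train(examples, epochs=5):
    """examples: list of (words, gold_tags) pairs."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in examples:
            pred = viterbi(words, w)
            if pred != gold:  # additive update on mistakes only
                for k, v in features(words, gold).items():
                    w[k] += v
                for k, v in features(words, pred).items():
                    w[k] -= v
    return w
```

Note the contrast with maximum-likelihood estimation: no probabilities are normalized anywhere; the weights are adjusted only when the decoder makes a mistake on a training sentence.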
... Maximum-entropy models represent the tagging task through a feature-vector representation of history–tag pairs. A feature-vector representation [math]\displaystyle{ \Phi : \mathcal{H} \times \mathcal{T} \rightarrow \mathbb{R}^d }[/math] is a function that maps a history–tag pair to a [math]\displaystyle{ d }[/math]-dimensional feature vector. Each component [math]\displaystyle{ \Phi_s(h, t) }[/math] for [math]\displaystyle{ s = 1 \ldots d }[/math] could be an arbitrary function of [math]\displaystyle{ (h, t) }[/math]. It is common (e.g., see (Ratnaparkhi 96)) for each feature [math]\displaystyle{ \Phi_s }[/math] to be an indicator function. For example, one such feature might be

[math]\displaystyle{ \Phi_{1000}(h, t) = \begin{cases} 1 & \text{if current word } w_i \text{ is “the” and } t = \text{DT} \\ 0 & \text{otherwise} \end{cases} }[/math]
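The indicator feature just described can be sketched as a plain function; the index 1000 and the dict-based history are assumptions for illustration, not the paper's data structures:

```python
# Hypothetical indicator feature in the style of (Ratnaparkhi 96):
# fires exactly when the current word is "the" and the proposed tag is DT.

def phi_1000(history, tag):
    """history is assumed to carry the current word under key 'w_i'."""
    return 1 if history["w_i"] == "the" and tag == "DT" else 0
```

In practice a tagger defines many thousands of such features from templates (word identity, suffixes, surrounding tags), and the model score is a weighted sum over the features that fire.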
2.2 Local and Global Feature Vectors
We now describe how to generalize the algorithm to more general representations of tagged sequences. In this section we describe the feature-vector representations which are commonly used in maximum-entropy models for tagging, and which are also used in this paper.
…
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---
Michael Collins | | 2002 | Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with the Perceptron Algorithm | | Proceedings of the ACL Conference on Empirical Methods in Natural Language Processing | http://www.ai.mit.edu/people/mcollins/papers/tagperc.pdf | 10.3115/1118693.1118694 | | 2002