2002 DiscriminativeTrainingMethodsForHMM


Subject Headings: Voted Perceptron Model, Part-of-Speech Tagging Task, Base Noun Phrase Chunking Task, Discriminative Training Algorithm.


Cited By

  • ~558 …

We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We give experimental results on part-of-speech tagging and base noun phrase chunking, in both cases showing improvements over results for a maximum-entropy tagger.

2 Parameter Estimation

2.1 HMM Taggers

... As an alternative to maximum-likelihood parameter estimates, this paper will propose the following estimation algorithm. Say the training set consists of [math]\displaystyle{ n }[/math] tagged sentences, the [math]\displaystyle{ i }[/math]th sentence being of length [math]\displaystyle{ n_i }[/math].
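The estimation algorithm the paper proposes is perceptron-style: decode each training sentence with the current weights, and when the predicted tag sequence differs from the gold one, add the gold sentence's feature counts and subtract the predicted sentence's. A minimal sketch, with `decode` and `features` passed in as generic callables (the function and argument names here are illustrative, not from the paper):

```python
from collections import defaultdict

def train_perceptron(sentences, tag_seqs, decode, features, epochs=5):
    """Perceptron training for a tagger, in the style of Collins (2002).

    sentences : list of word sequences.
    tag_seqs  : gold tag sequences, aligned with sentences.
    decode    : decode(words, weights) -> highest-scoring tag sequence
                under the current weights (e.g. Viterbi decoding).
    features  : features(words, tags) -> sparse dict mapping each
                feature to its count in the tagged sentence.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in zip(sentences, tag_seqs):
            pred = decode(words, weights)
            if pred != gold:
                # Simple additive update: reward features of the gold
                # tagging, penalize features of the predicted tagging.
                for f, c in features(words, gold).items():
                    weights[f] += c
                for f, c in features(words, pred).items():
                    weights[f] -= c
    return dict(weights)
```

With word–tag indicator features and a per-word argmax decoder, a couple of passes over a toy training set already separates determiners from nouns.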

... Maximum-entropy models represent the tagging task through a feature-vector representation of history–tag pairs. A feature-vector representation [math]\displaystyle{ \phi : \mathcal{H} \times \mathcal{T} \rightarrow \mathbb{R}^d }[/math] is a function that maps a history–tag pair to a [math]\displaystyle{ d }[/math]-dimensional feature vector. Each component [math]\displaystyle{ \phi_s(h, t) }[/math] for [math]\displaystyle{ s = 1 \ldots d }[/math] could be an arbitrary function of [math]\displaystyle{ (h, t) }[/math]. It is common (e.g., see (Ratnaparkhi 96)) for each feature [math]\displaystyle{ \phi_s }[/math] to be an indicator function. For example, one such feature might be [math]\displaystyle{ \phi_{1000}(h, t) = 1 }[/math] if the current word [math]\displaystyle{ w_i }[/math] is “the” and [math]\displaystyle{ t = }[/math] DT, and 0 otherwise.
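An indicator feature of this kind can be written directly as a function of the history–tag pair. A minimal sketch, assuming the history is encoded as a `(previous_tags, words, i)` triple; that encoding and the function name are illustrative choices, not fixed by the paper:

```python
def phi_1000(history, tag):
    """Indicator feature: fires (returns 1) exactly when the current
    word is "the" and the proposed tag is DT (determiner)."""
    # history is assumed to be a (previous_tags, words, i) triple,
    # where i indexes the word currently being tagged.
    _prev_tags, words, i = history
    return 1 if words[i] == "the" and tag == "DT" else 0
```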

2.2 Local and Global Feature Vectors

We now describe how to generalize the algorithm to more general representations of tagged sequences. In this section we describe the feature-vector representations which are commonly used in maximum-entropy models for tagging, and which are also used in this paper.
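In this representation, the global feature vector of a tagged sentence is obtained by summing the local feature vectors computed at each position. A small sketch of that summation, assuming local features come back as sparse dictionaries and the history encoding from above (both are illustrative assumptions):

```python
from collections import Counter

def global_features(words, tags, local_features):
    """Sum the local feature vectors phi(h_i, t_i) over every position
    i of a tagged sentence to obtain the global feature vector."""
    total = Counter()
    for i in range(len(words)):
        # History at position i: the tags assigned so far, the words,
        # and the index of the word currently being tagged.
        history = (tuple(tags[:i]), words, i)
        total.update(local_features(history, tags[i]))
    return total
```

Representing feature vectors as `Counter` objects keeps the sum sparse: only features that actually fire in the sentence appear in the result.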


 Author: Michael Collins
 Title: Discriminative Training Methods for Hidden Markov Models: Theory and experiments with the perceptron algorithm
 Venue: Proceedings of the ACL Conference on Empirical Methods in Natural Language Processing
 URL: http://www.ai.mit.edu/people/mcollins/papers/tagperc.pdf
 DOI: 10.3115/1118693.1118694
 Year: 2002