# Maximum Entropy Markov Model (MEMM)


A Maximum Entropy Markov Model (MEMM) is a discriminative Markov sequence model that …
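In the standard formulation (following McCallum et al., 2000, referenced below), each source state $s'$ has its own maximum-entropy (exponential) model over successor states, conditioned on the current observation $o$:

```latex
P(s \mid s', o) \;=\; \frac{1}{Z(o, s')}\,
  \exp\!\Big( \sum_{a} \lambda_a\, f_a(o, s) \Big)
```

where each $f_a(o, s)$ is a (typically binary) feature of the observation and the candidate state, $\lambda_a$ is its learned weight, and $Z(o, s')$ normalizes over the successor states of $s'$. Because normalization is local to each transition, states with few successors concentrate probability mass on them, which is the source of the label-bias problem noted in Lafferty et al. (2001).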

**AKA:** Conditional Markov Model (CMM).

**Context:**
- It can be instantiated as a … (Finite-State Sequence Tagging Model).
- It can be trained by a MEMM Training System (that implements a MEMM training algorithm).
- …

**Counter-Example(s):**

**See:** Markov Random Field, Logistic Regression Algorithm, Label-Bias Problem.

## References

### 2005

- (Jie Tang, 2005) ⇒ Jie Tang. (2005). “An Introduction for Conditional Random Fields.” Literature Survey – 2, Dec. 2005, Tsinghua University.

### 2003

- (Zelenko et al., 2003) ⇒ Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. (2003). “Kernel Methods for Relation Extraction.” In: Journal of Machine Learning Research, 3.
- QUOTE: MEMMs are able to model more complex transition and emission probability distributions and take into account various text features.

### 2001

- (Lafferty et al., 2001) ⇒ John D. Lafferty, Andrew McCallum, and Fernando Pereira. (2001). “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.” In: Proceedings of ICML 2001.
- QUOTE: … avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states.

### 2000

- (McCallum et al., 2000a) ⇒ Andrew McCallum, Dayne Freitag, and Fernando Pereira. (2000). “Maximum Entropy Markov Models for Information Extraction and Segmentation.” In: Proceedings of ICML-2000.
- QUOTE: This paper presents a new Markovian sequence model, closely related to HMMs, that allows observations to be represented as arbitrary overlapping features (such as word, capitalization, formatting, part-of-speech), and defines the conditional probability of state sequences given observation sequences. It does this by using the maximum entropy framework to fit a set of exponential models that represent the probability of a state given an observation and the previous state. We present positive experimental results on the segmentation of FAQ’s.
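The per-state exponential models described in this abstract can be sketched as follows. This is a minimal illustration, not an implementation from the paper: the state set, feature templates, and weights below are invented for the example, and parameter estimation (e.g. GIS training) is omitted.

```python
import math

# Hypothetical tag set for the example (B/I/O chunk tags).
STATES = ["B", "I", "O"]

def features(obs, state):
    """Binary features of (observation, candidate next state).

    These overlapping feature templates (word identity, capitalization)
    are invented for the sketch; an MEMM permits arbitrary such features.
    """
    return {
        f"word={obs['word']},state={state}": 1.0,
        f"is_cap={obs['word'][0].isupper()},state={state}": 1.0,
    }

def transition_probs(weights, prev_state, obs):
    """P(next state | prev_state, obs) from prev_state's exponential model.

    Each source state owns its own weight vector, and normalization is
    local: the scores are renormalized only over the successor states.
    """
    lam = weights[prev_state]
    scores = {
        s: math.exp(sum(lam.get(f, 0.0) * v
                        for f, v in features(obs, s).items()))
        for s in STATES
    }
    z = sum(scores.values())  # local partition function Z(obs, prev_state)
    return {s: score / z for s, score in scores.items()}

# Toy weights: from state "O", a capitalized word favors moving to "B".
weights = {s: {} for s in STATES}
weights["O"]["is_cap=True,state=B"] = 2.0

probs = transition_probs(weights, "O", {"word": "Paris"})
```

Decoding a full sequence would chain these local distributions with Viterbi search, exactly as with an HMM, but conditioning each transition on the observation.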