# Hidden Markov Model Family

A Hidden Markov Model Family is a temporal Bayesian metamodel (directed conditional graphical family that is based on a Markov chain) with observable states and hidden states (partial observability).

**AKA:**HMMs, Hidden Markov Models.**Context:**- It can (typically) assumes the Markov Property of strict Independence Assumption.
- It can (typically) not use information from Future States.
- It can (typically) have a Probabilistic Output.
- It can (typically) be a Generative Statistical Model.
- It can (typically) be Computational Tractable. (?)
- It can be instantiated with a Hidden Markov Network Instance.
- It can be an input to a Hidden Markov Model Training System (that applies an HMM training algorithm).

**Example(s):**- a Hierarchical Hidden Markov Model (metamodel).
- a HMM with Gaussian Output.
- a HMM with Mixture of Gaussians Output.
- an Auto Regressive HMM.
- an Input-Output HMM.
- a Coupled HMM.
- a Factorial HMM

**Counter-Example(s):****See:**Markov Model; Bayesian Network; Generative Learning Algorithm; Hidden Markov Logic; Baum-Welch Algorithm; Bayesian Methods; Expectation-Maximization Algorithm; Markov Process; Viterbi Algorithm

## References

### 2011

- (van den Bosch, 2011) ⇒ Antal van den Bosch. (2011). “Hidden Markov Models.” In: (Sammut & Webb, 2011) p.493
- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Hidden_Markov_model
- A
**hidden Markov model**(HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (*hidden*) states. An HMM can be considered as the simplest dynamic Bayesian network.In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a

*hidden*Markov model, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective 'hidden' refers to the state sequence through which the model passes, not to the parameters of the model; even if the model parameters are known exactly, the model is still 'hidden'.Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics.

A hidden Markov model can be considered a generalization of a mixture model where the hidden variables (or latent variables), which control the mixture component to be selected for each observation, are related through a Markov process rather than independent of each other.

- A

### 2010

- http://www.lwebzem.com/cgi-bin/courses/course_view.cgi?m=c5_m2_s1.html&user_id=&c=hmm_course
- QUOTE: Hidden Markov Model (HMM) is a statistical model of the finite set of states. Each state is associated with transition probabilities to another states. However the state itself is not visible from outside of the system. The observer only can see events that have different emission probabilities based on the state. …

### 2005

- (Jie Tang, 2005) ⇒ Jie Tang. (2005). “An Introduction for Conditional Random Fields." Literature Survey ¨C 2, Dec, 2005, at Tsinghua.

Cannot represent multiple interacting features or long range dependencies between observed elements.

### 2004

- http://www.cassandra.org/pomdp/pomdp-faq.shtml
- Michael Littman's nifty explanatory grid:

Markov Models |
Do we have control over the state transitons? |
||
---|---|---|---|

NO | YES | ||

Are the states completely observable? |
YES | ## Markov Chain |
## MDPMarkov Decision Process |

NO | ## HMMHidden Markov Model |
## POMDPPartially ObservableMarkov Decision Process |

### 2003

- (Zelenko et al., 2003) ⇒ Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. (2003). “Kernel Methods for Relation Extraction.” In: Journal of Machine Learning Research, 3.
- QUOTE: Hidden Markov Models (HMM) (Rabiner, 1990) have been perhaps the most popular approach for adaptive information extraction. HMMs exhibited excellent performance for name extraction (Bikel et al., 1999). Recently, HMM (with various extensions) have been applied to extraction of slots (“speaker”, “time”, etc.) in seminar announcements (Freitag & McCallum, 2000). HMMs are mostly appropriate for modeling local and flat problems. Relation extraction often involves modeling long range dependencies, for which HMM methodology is not directly applicable. Several probabilistic frameworks for modeling sequential data have recently been introduced to alleviate for HMM restrictions. We note Maximum Entropy Markov Models (MEMM) (McCallum et al., 2000) and Conditional Random Fields (CRF) (Lafferty et al., 2001). MEMMs are able to model more complex transition and emission probability distributions and take into account various text features. CRFs are an example of exponential models (Berger et al., 1996); as such, they enjoy a number of attractive properties (e.g., global likelihood maximum) and are better suited for modeling sequential data, as contrasted with other conditional models (Lafferty et al., 2001). They are yet to be experimentally validated for information extraction problems.

### 2000

- (McCallum et al., 2000a) ⇒ Andrew McCallum, Dayne Freitag, and Fernando Pereira. (2000). “Maximum Entropy Markov Models for Information Extraction and Segmentation.” In: Proceedings of ICML-2000.
- QUOTE: Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text segmentation and information extraction. In these cases, the observations are usually modeled as multinomial distributions over a discrete vocabulary, and the HMM parameters are set to maximize the likelihood of the observations.

### 1999

- (Bikel et al., 1999) ⇒ Daniel M. Bikel, Richard Schwartz, and Ralph M. Weischedel. (1999). “An Algorithm that Learns What‘s in a Name.” In: Machine Learning, 34. doi:10.1023/A:1007558221122

### 1998

- (Murphy, 1998) ⇒ Kevin P. Murphy. (1998). “A Brief Introduction to Graphical Models and Bayesian Networks." Web tutorial.
- QUOTE: The simplest kind of DBN is a Hidden Markov Model (HMM), which has one discrete hidden node and one discrete or continuous observed node per slice. We illustrate this below. As before, circles denote continuous nodes, squares denote discrete nodes, clear means hidden, shaded means observed.
:File:1998 ABriefIntroToBayesianNetworks hmm4.gif

We have "unrolled" the model for 4 "time slices" -- the structure and parameters are assumed to repeat as the model is unrolled further. Hence to specify a DBN, we need to define the intra-slice topology (within a slice), the inter-slice topology (between two slices), as well as the parameters for the first two slices. (Such a two-slice temporal Bayes net is often called a 2TBN.)

Some common variants on HMMs are shown below:

:File:1998 ABriefIntroToBayesianNetworks hmm zoo.gif

HMM with Gaussian output HMM with mixture of Gaussians output Auto Regressive HMM Input-output HMM Coupled HMM Factorial HMM

- QUOTE: The simplest kind of DBN is a Hidden Markov Model (HMM), which has one discrete hidden node and one discrete or continuous observed node per slice. We illustrate this below. As before, circles denote continuous nodes, squares denote discrete nodes, clear means hidden, shaded means observed.

### 1989

- (Rabiner, 1989) ⇒ Lawrence R. Rabiner. (1989). “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” In: Proceedings of the IEEE, 77(2).
- ABSTRACT: Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech.

### 1986

- (Rabiner & Juang, 1986) ⇒ Lawrence R. Rabiner, and B. Juang. (1986). “An Introduction to Hidden Markov Models.” In: IEEE ASSP Magazine, 3(1).