Hidden Markov Model Family

A Hidden Markov Model Family is a temporal Bayesian metamodel (a directed conditional graphical model family based on a Markov chain) with hidden states and observable outputs (partial observability).

AKA: HMMs, Hidden Markov Models.
Context:
- It can (typically) assume the Markov Property (a strict Independence Assumption).
- It can (typically) not use information from Future States.
- It can (typically) have a Probabilistic Output.
- It can (typically) be a Generative Statistical Model.
- It can (typically) be Computationally Tractable. (?)
- It can be instantiated with a Hidden Markov Network Instance.
- It can be an input to a Hidden Markov Model Training System (that applies an HMM training algorithm).
Example(s):
- a Hierarchical Hidden Markov Model (metamodel).
- an HMM with Gaussian Output.
- an HMM with Mixture of Gaussians Output.
- an Auto Regressive HMM.
- an Input-Output HMM.
- a Coupled HMM.
- a Factorial HMM.
Counter-Example(s):
- a Maximum Entropy Markov Model.
- a Conditional Probabilistic Graphical Model, such as a Conditional Random Fields Model.
- a Markov Decision Process.
See: Markov Model; Bayesian Network; Generative Learning Algorithm; Hidden Markov Logic; Baum-Welch Algorithm; Bayesian Methods; Expectation-Maximization Algorithm; Markov Process; Viterbi Algorithm.
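
The independence assumptions listed under Context can be written as a factorization of the joint distribution over a sequence. The following is the standard textbook form, not a formula taken from the cited sources; the symbols are assumptions chosen for illustration: z_t is the hidden state at step t, x_t is the observation at step t, and T is the sequence length.

```latex
p(x_{1:T}, z_{1:T}) \;=\; p(z_1)\, p(x_1 \mid z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1})\, p(x_t \mid z_t)
```

Each hidden state depends only on its predecessor (the Markov Property), and each observation depends only on the hidden state at the same step.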

References

2011

(van den Bosch, 2011) ⇒ Antal van den Bosch. (2011). “Hidden Markov Models.” In: (Sammut & Webb, 2011) p.493
(Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Hidden_Markov_model
- A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered as the simplest dynamic Bayesian network.
  In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective 'hidden' refers to the state sequence through which the model passes, not to the parameters of the model; even if the model parameters are known exactly, the model is still 'hidden'.
  Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics.
  A hidden Markov model can be considered a generalization of a mixture model where the hidden variables (or latent variables), which control the mixture component to be selected for each observation, are related through a Markov process rather than independent of each other.
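
The generative reading of the passage above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from any cited source: the function name `sample_hmm`, the two-state weather model, and all probability values are hypothetical. At each step a hidden state emits one output token from its own distribution, and the next state is then drawn from the current state's transition distribution.

```python
import random


def sample_hmm(initial, transition, emission, length, seed=None):
    """Draw (hidden_states, observations) of the given length from a discrete HMM.

    initial:    dict state -> P(first state)
    transition: dict state -> dict next_state -> P(next_state | state)
    emission:   dict state -> dict token -> P(token | state)
    """
    rng = random.Random(seed)

    def draw(dist):
        # Sample one key of `dist` in proportion to its probability.
        outcomes = list(dist)
        weights = [dist[o] for o in outcomes]
        return rng.choices(outcomes, weights=weights, k=1)[0]

    states, observations = [], []
    state = draw(initial)
    for _ in range(length):
        states.append(state)
        observations.append(draw(emission[state]))  # visible output token
        state = draw(transition[state])             # hidden Markov step
    return states, observations


# Hypothetical two-state model; the numbers are illustrative only.
initial = {"Rainy": 0.6, "Sunny": 0.4}
transition = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emission = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

states, observations = sample_hmm(initial, transition, emission, length=5, seed=0)
print("observed:", observations)  # what the observer sees
print("hidden:  ", states)        # what the observer does not see
```

An observer receives only `observations`; `states` stays hidden even though every parameter of the model is known, which is the sense of 'hidden' described in the quote.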

Markov Models | Do we have control over the state transitions? NO | Do we have control over the state transitions? YES
Are the states completely observable? YES | Markov Chain | Markov Decision Process (MDP)
Are the states completely observable? NO | Hidden Markov Model (HMM) | Partially Observable Markov Decision Process (POMDP)

2003

(Zelenko et al., 2003) ⇒ Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. (2003). “Kernel Methods for Relation Extraction.” In: Journal of Machine Learning Research, 3.
- QUOTE: Hidden Markov Models (HMM) (Rabiner, 1990) have been perhaps the most popular approach for adaptive information extraction. HMMs exhibited excellent performance for name extraction (Bikel et al., 1999). Recently, HMM (with various extensions) have been applied to extraction of slots (“speaker”, “time”, etc.) in seminar announcements (Freitag & McCallum, 2000). HMMs are mostly appropriate for modeling local and flat problems. Relation extraction often involves modeling long range dependencies, for which HMM methodology is not directly applicable. Several probabilistic frameworks for modeling sequential data have recently been introduced to alleviate for HMM restrictions. We note Maximum Entropy Markov Models (MEMM) (McCallum et al., 2000) and Conditional Random Fields (CRF) (Lafferty et al., 2001). MEMMs are able to model more complex transition and emission probability distributions and take into account various text features. CRFs are an example of exponential models (Berger et al., 1996); as such, they enjoy a number of attractive properties (e.g., global likelihood maximum) and are better suited for modeling sequential data, as contrasted with other conditional models (Lafferty et al., 2001). They are yet to be experimentally validated for information extraction problems.

2000

(McCallum et al., 2000a) ⇒ Andrew McCallum, Dayne Freitag, and Fernando Pereira. (2000). “Maximum Entropy Markov Models for Information Extraction and Segmentation.” In: Proceedings of ICML-2000.
- QUOTE: Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text segmentation and information extraction. In these cases, the observations are usually modeled as multinomial distributions over a discrete vocabulary, and the HMM parameters are set to maximize the likelihood of the observations.
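
The quantity being maximized in the quote, the likelihood of an observation sequence, can be computed with the forward recursion. The sketch below assumes discrete (multinomial) emissions over a small vocabulary; the function name `forward_likelihood`, the toy model, and its probabilities are hypothetical and are not taken from the cited paper. It only evaluates the likelihood for fixed parameters and does not perform training.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations) under a discrete HMM using the forward recursion.

    initial:    dict state -> P(first state)
    transition: dict state -> dict next_state -> P(next_state | state)
    emission:   dict state -> dict token -> P(token | state)
    """
    if not observations:
        return 1.0  # the empty sequence has probability 1
    states = list(initial)
    # alpha[s] = P(o_1..o_t, state_t = s), initialised for t = 1.
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        # Sum over every predecessor state, then weight by the emission.
        alpha = {
            s: emission[s][obs] * sum(alpha[r] * transition[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model with a three-token vocabulary.
initial = {"Rainy": 0.6, "Sunny": 0.4}
transition = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emission = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

likelihood = forward_likelihood(initial, transition, emission, ["walk", "shop", "clean"])
print(likelihood)
```

The recursion costs time proportional to the sequence length times the square of the number of states, instead of enumerating every hidden state sequence. Training procedures such as the Baum-Welch Algorithm reuse these forward quantities when re-estimating parameters.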

1999

(Bikel et al., 1999) ⇒ Daniel M. Bikel, Richard Schwartz, and Ralph M. Weischedel. (1999). “An Algorithm that Learns What’s in a Name.” In: Machine Learning, 34. doi:10.1023/A:1007558221122

1998

(Murphy, 1998) ⇒ Kevin P. Murphy. (1998). “A Brief Introduction to Graphical Models and Bayesian Networks.” Web tutorial.
- QUOTE: The simplest kind of DBN is a Hidden Markov Model (HMM), which has one discrete hidden node and one discrete or continuous observed node per slice. We illustrate this below. As before, circles denote continuous nodes, squares denote discrete nodes, clear means hidden, shaded means observed.
  [Figure: 1998 ABriefIntroToBayesianNetworks hmm4.gif]
  We have "unrolled" the model for 4 "time slices" -- the structure and parameters are assumed to repeat as the model is unrolled further. Hence to specify a DBN, we need to define the intra-slice topology (within a slice), the inter-slice topology (between two slices), as well as the parameters for the first two slices. (Such a two-slice temporal Bayes net is often called a 2TBN.)
  Some common variants on HMMs are shown below:
  [Figure: 1998 ABriefIntroToBayesianNetworks hmm zoo.gif]
  Figure panels: HMM with Gaussian output; HMM with mixture of Gaussians output; Auto Regressive HMM; Input-output HMM; Coupled HMM; Factorial HMM.

1989

(Rabiner, 1989) ⇒ Lawrence R. Rabiner. (1989). “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” In: Proceedings of the IEEE, 77(2).
- ABSTRACT: Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech.

1986

(Rabiner & Juang, 1986) ⇒ Lawrence R. Rabiner, and B. Juang. (1986). “An Introduction to Hidden Markov Models.” In: IEEE ASSP Magazine, 3(1).
