# Statistical Mixture Model

A Statistical Mixture Model is a probabilistic generative model that expresses a mixture probability function composed of mixture model components.

**AKA:** Generative Mixture Model, Latent Class Model.

**Context:**
- It can range from (typically) being a Finite Mixture Model to being an Infinite Mixture Model (Neal, 1992; Rasmussen, 2000).
- It can range from (typically) being a Mixture Density Model (for a mixture density function) to being a Mixture Distribution Model (for a mixture mass function).
- It can be an input to a Mixture Model Fitting Task (and solved by a mixture model fitting system that applies a mixture model fitting algorithm).
- It can be used to represent the presence of sub-populations without requiring that an observed data-set identify the sub-population to which an individual observation belongs.
- It can be an instance of a Mixture Model Family.
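As a generative model, a finite mixture can be sampled by first drawing a latent component index from the mixture weights and then drawing an observation from that component's density. A minimal NumPy sketch, assuming an illustrative two-component univariate Gaussian mixture (the weights and parameters below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mixture: weights pi_k, component means mu_k, std devs sigma_k.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([0.5, 1.0])

def sample_mixture(n):
    """Draw n observations: pick a latent component per point, then sample from it."""
    z = rng.choice(len(weights), size=n, p=weights)   # latent component labels
    x = rng.normal(means[z], stds[z])                 # observations given labels
    return z, x

z, x = sample_mixture(10_000)
# Empirical component frequencies approximate the mixture weights.
print(np.bincount(z) / len(z))
```

In a real inference problem only `x` would be observed; the labels `z` are the latent sub-population identities the model does not require the data-set to supply.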

**Example(s):**

**Counter-Example(s):**

**See:** EM Algorithm; Unsupervised Learning Algorithm; Gaussian Cluster; Singular Value Decomposition; Density-Based Clustering; Gaussian Distribution; Graphical Models; Learning Graphical Models; Markov Chain Monte Carlo; Model-Based Clustering.

## References

### 2013

- (Wikipedia, 2013) ⇒ http://en.wikipedia.org/wiki/Mixture_model
- QUOTE: In statistics, a **mixture model** is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data-set should identify the sub-population to which an individual observation belongs. Formally, a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. However, while problems associated with "mixture distributions" relate to deriving the properties of the overall population from those of the sub-populations, "mixture models" are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population-identity information. Some ways of implementing mixture models involve steps that attribute postulated sub-population-identities to individual observations (or weights towards such sub-populations), in which case these can be regarded as types of unsupervised learning or clustering procedures. However, not all inference procedures involve such steps.

Mixture models should not be confused with models for compositional data, i.e., data whose components are constrained to sum to a constant value (1, 100%, etc.).
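The "weights towards such sub-populations" mentioned in the quote are posterior responsibilities, computed by Bayes' rule from the component densities and mixture weights. A small sketch, assuming the same illustrative two-component Gaussian mixture used for the hypothetical parameters below:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Illustrative (hypothetical) two-component mixture parameters.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([0.5, 1.0])

def responsibilities(x):
    """P(component k | x) = pi_k f_k(x) / sum_j pi_j f_j(x)."""
    lik = weights * gauss_pdf(x[:, None], means, stds)  # shape (n, K)
    return lik / lik.sum(axis=1, keepdims=True)

r = responsibilities(np.array([-2.0, 3.0, 0.5]))
print(r.round(3))  # each row sums to 1; hard cluster labels are r.argmax(axis=1)
```

Taking `argmax` over each row turns these soft weights into a hard clustering, which is the sense in which mixture-model inference can be regarded as an unsupervised learning procedure.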


### 2011

- (Baxter, 2011b) ⇒ Rohan A. Baxter. (2011). “Mixture Model.” In: (Sammut & Webb, 2011) p.680
- (Marin et al., 2011) ⇒ Jean-Michel Marin, Kerrie Mengersen, and Christian P. Robert. (2011). “Bayesian Modelling and Inference on Mixtures of Distributions.” In: Essential Bayesian Models. ISBN:0444537325.

### 2006

- (Bishop, 2006) ⇒ Christopher M. Bishop. (2006). “Pattern Recognition and Machine Learning.” Springer, Information Science and Statistics. ISBN:0387310738

- (Teh et al., 2006) ⇒ Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. (2006). “Hierarchical Dirichlet Processes.” In: Journal of the American Statistical Association.
- QUOTE: We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. ... The discrete nature of the DP makes it unsuitable for general applications in Bayesian nonparametrics, but it is well suited for the problem of placing priors on mixture components in mixture modeling. The idea is basically to associate a mixture component with each atom in [math]G[/math].

### 2001

- (Luo & Hancock, 2001) ⇒ Bin Luo and Edwin R. Hancock. (2001). “Structural Graph Matching Using the EM Algorithm and Singular Value Decomposition.” In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10).

### 1994

- (Bailey & Elkan, 1994) ⇒ Timothy L. Bailey, and Charles Elkan. (1994). “Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers.” Technical Report: UCSD CS94-351
- QUOTE: The MM algorithm is an extension of the expectation maximization technique for fitting finite mixture models developed by Aitkin and Rubin (1985). ... The MM algorithm searches for maximum likelihood estimates of the parameters of a finite mixture model which could have generated a given dataset of biopolymer sequences.

### 1993

- (Utans, 1993) ⇒ J. Utans. (1993). “Mixture Models and EM Algorithms for Object Recognition within Compositional Hierarchies.” ICSI Berkeley Technical Report TR-93-004.
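The maximum-likelihood fitting of a finite mixture by expectation maximization, as in the references above, can be sketched with a minimal EM loop for a univariate two-component Gaussian mixture. This is a generic EM sketch (not the MM algorithm itself); the synthetic data and initial values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data from a known two-component mixture (weights 0.3 / 0.7).
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])

# Illustrative initial parameter guesses.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: responsibilities r[n, k] = P(component k | x_n).
    lik = pi * pdf(x[:, None], mu, sigma)
    r = lik / lik.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted maximum-likelihood updates.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(pi.round(2), mu.round(2), sigma.round(2))
```

Each iteration is guaranteed not to decrease the data log-likelihood, which is why EM is the standard fitting procedure for finite mixture models.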

### 1988

- (McLachlan et al., 1988) ⇒ Geoffrey J. McLachlan, and Kaye E. Basford. (1988). “Mixture Models: Inference and Applications to Clustering.” Marcel Dekker. ISBN:0824776917