# Dirichlet Process

A Dirichlet Process is a stochastic process whose sample path is a probability distribution.

**AKA:** DP.

**Context:**
- It is a Probability Distribution over probability distributions.
- A sample drawn from a DP is a Random Discrete Distribution.

**Example(s):**

**Counter-Example(s):**

**See:** Dirichlet Process Prior, Dirichlet Distribution Function, Dirichlet Process Mixture Model, Nonparametric Bayesian Algorithm, Bayesian Methods, Bayesian Nonparametrics, Clustering, Density Estimation.

## References

### 2011

- (Teh, 2011) ⇒ Yee Whye Teh. (2011). “Dirichlet Process.” In: (Sammut & Webb, 2011) p.280

- http://en.wikipedia.org/wiki/Dirichlet_process#Related_distributions
- The Pitman–Yor distribution (also known as the 'two-parameter Poisson-Dirichlet process') is a generalisation of the Dirichlet process.
- The hierarchical Dirichlet process extends the ordinary Dirichlet process for modelling grouped data.

### 2009

- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Dirichlet_process
- QUOTE: In probability theory, a **Dirichlet process** over a set [math]S[/math] is a stochastic process whose sample path is a probability distribution on [math]S[/math]. The finite-dimensional distributions are from the Dirichlet distribution: if [math]M[/math] is a finite measure on [math]S[/math] and [math]X[/math] is a random distribution drawn from a Dirichlet process, written as [math]X \sim \mathrm{DP}\left(M\right)[/math], then for any partition of [math]S[/math], say [math]\left\{B_i\right\}_{i=1}^{n}[/math], we have that [math]\left(X\left(B_1\right),\dots,X\left(B_n\right)\right) \sim \mathrm{Dirichlet}\left(M\left(B_1\right),\dots,M\left(B_n\right)\right)[/math].

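The partition property quoted above can be exercised directly: once a partition of [math]S[/math] is fixed, the vector of masses the random distribution assigns to its cells is an ordinary finite-dimensional Dirichlet draw. A minimal sketch, assuming an illustrative setup (not from the source) with [math]S = [0, 1)[/math], base measure [math]M = \alpha \cdot \mathrm{Uniform}[0,1)[/math], and four equal bins:

```python
import numpy as np

# Illustrative assumptions: S = [0, 1), M = alpha * Uniform[0, 1),
# partitioned into n_bins equal cells, so M(B_i) = alpha / n_bins.
alpha, n_bins = 5.0, 4
M_of_B = np.full(n_bins, alpha / n_bins)  # (M(B_1), ..., M(B_n))

rng = np.random.default_rng(0)
# By the quoted property, (X(B_1), ..., X(B_n)) ~ Dirichlet(M(B_1), ..., M(B_n)).
masses = rng.dirichlet(M_of_B)

# The cell masses of any partition sum to 1: X is a probability distribution.
print(masses, masses.sum())
```

Note that this only samples the finite-dimensional marginal for one fixed partition; it does not construct the full random measure (the stick-breaking construction below does that).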

### 2006

- (Teh et al., 2006) ⇒ Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. (2006). “Hierarchical Dirichlet Processes.” In: Journal of the American Statistical Association, 101(476). doi:10.1198/016214506000000302
- QUOTE: Our approach to the problem of sharing clusters among multiple, related groups is a nonparametric Bayesian approach, reposing on the Dirichlet process (Ferguson 1973). The Dirichlet process [math]\operatorname{DP}(\alpha_0,G_0)[/math] is a measure on measures. It has two parameters, a *scaling parameter* [math]\alpha_0 \gt 0[/math] and a base probability measure [math]G_0[/math]. An explicit representation of a draw from a Dirichlet process (DP) was given by Sethuraman (1994), who showed that if [math]G \sim \operatorname{DP}(\alpha_0,G_0)[/math], then with probability one: [math]G = \sum_{k=1}^{\infty} \beta_k \delta_{\phi_k}, \quad \text{(1)}[/math]

where the [math]\phi_k[/math] are independent random variables distributed according to [math]G_0[/math], where [math]\delta_{\phi_k}[/math] is an atom at [math]\phi_k[/math], and where the “stick-breaking weights” [math]\beta_k[/math] are also random and depend on the parameter [math]\alpha_0[/math] (the definition of the [math]\beta_k[/math] is provided in Section 3.1).
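Sethuraman's representation can be sampled approximately by truncating the infinite sum. A minimal sketch, assuming the standard stick-breaking recipe ([math]\beta'_k \sim \mathrm{Beta}(1, \alpha_0)[/math], [math]\beta_k = \beta'_k \prod_{l \lt k}(1-\beta'_l)[/math]) and an illustrative choice of [math]G_0[/math] as a standard normal (not from the paper):

```python
import numpy as np

def stick_breaking_draw(alpha0, base_sampler, n_atoms=1000, rng=None):
    """Truncated stick-breaking draw G ~ DP(alpha0, G0), following Eq. (1).

    beta'_k ~ Beta(1, alpha0) i.i.d.; beta_k = beta'_k * prod_{l<k}(1 - beta'_l);
    phi_k ~ G0 i.i.d. Truncating at n_atoms leaves a small unassigned mass.
    """
    rng = rng or np.random.default_rng()
    beta_prime = rng.beta(1.0, alpha0, size=n_atoms)
    # Stick length remaining before each break: 1, (1-b'_1), (1-b'_1)(1-b'_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - beta_prime)[:-1]))
    weights = beta_prime * remaining          # stick-breaking weights beta_k
    atoms = base_sampler(rng, n_atoms)        # atom locations phi_k ~ G0
    return weights, atoms

# Illustrative: G0 = N(0, 1), alpha0 = 2.
w, phi = stick_breaking_draw(2.0, lambda rng, n: rng.normal(size=n),
                             rng=np.random.default_rng(1))
print(w.sum())  # approximately 1; the truncation remainder is tiny
```

The returned pair `(weights, atoms)` is an explicitly discrete distribution, which illustrates why a DP draw is discrete with probability one.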


### 2005

- (Blei & Jordan, 2005) ⇒ David M. Blei, and Michael I. Jordan. (2005). “Variational Methods for Dirichlet Process Mixtures.” In: Bayesian Analysis, 1.
- QUOTE: Dirichlet process (DP) mixture models are the cornerstone of non-parametric Bayesian statistics, and the development of Monte-Carlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of non-parametric Bayesian methods to a variety of practical data analysis problems.
The Dirichlet process (DP), introduced in Ferguson (1973), is a measure on measures. The DP is parameterized by a base distribution [math]G_0[/math] and a positive scaling parameter ...
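The generative side of a DP mixture can be sketched via the Chinese restaurant process, an equivalent sequential view of the DP prior. All distributional choices below are illustrative assumptions (a 1-D Gaussian likelihood with [math]G_0 = N(0, 3^2)[/math]), not the setup of the paper:

```python
import numpy as np

def sample_dp_mixture(n, alpha0, rng=None):
    """Draw n observations from a DP mixture of 1-D Gaussians.

    Cluster assignments follow the Chinese restaurant process:
    an existing cluster k is chosen with probability proportional to its
    size, a new cluster with probability proportional to alpha0.
    Cluster means phi_k ~ G0 = N(0, 3^2); observations x_i ~ N(phi_k, 1).
    """
    rng = rng or np.random.default_rng()
    means, counts = [], []
    data, labels = [], []
    for _ in range(n):
        # P(existing cluster k) ∝ counts[k]; P(new cluster) ∝ alpha0.
        probs = np.array(counts + [alpha0], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(means):                     # open a new cluster
            means.append(rng.normal(0.0, 3.0))  # phi_k ~ G0
            counts.append(0)
        counts[k] += 1
        labels.append(k)
        data.append(rng.normal(means[k], 1.0))
    return np.array(data), np.array(labels)

x, z = sample_dp_mixture(200, alpha0=1.0, rng=np.random.default_rng(0))
```

The number of occupied clusters is random and grows slowly with [math]n[/math]; inference for such models is what the MCMC and variational methods discussed in the quote address.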


### 2000

- (Rasmussen, 2000) ⇒ Carl Rasmussen. (2000). “The Infinite Gaussian Mixture Model.” In: Advances in Neural Information Processing Systems (NIPS) 12.

### 1994

- (Sethuraman, 1994) ⇒ J. Sethuraman. (1994). “A Constructive Definition of Dirichlet Priors.” Statistica Sinica, 4.

### 1973

- (Ferguson, 1973) ⇒ Thomas Ferguson. (1973). “Bayesian Analysis of Some Nonparametric Problems.” In: Annals of Statistics, 1(2). doi:10.1214/aos/1176342360
- ABSTRACT: The Bayesian approach to statistical problems, though fruitful in many ways, has been rather unsuccessful in treating nonparametric problems. This is due primarily to the difficulty in finding workable prior distributions on the parameter space, which in nonparametric problems is taken to be a set of probability distributions on a given sample space. There are two desirable properties of a prior distribution for nonparametric problems. (I) The support of the prior distribution should be large--with respect to some suitable topology on the space of probability distributions on the sample space. (II) Posterior distributions given a sample of observations from the true probability distribution should be manageable analytically. These properties are antagonistic in the sense that one may be obtained at the expense of the other. This paper presents a class of prior distributions, called **Dirichlet process priors**, broad in the sense of (I), for which (II) is realized, and for which treatment of many nonparametric statistical problems may be carried out, yielding results that are comparable to the classical theory.
