# Maximum a Posteriori Estimation Algorithm

A Maximum a Posteriori Estimation Algorithm is a point estimation algorithm that can solve a maximum a posteriori estimation task (by producing a maximum a posteriori estimate).

## References

### 2015

• (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation#Description Retrieved:2015-6-15.
• Assume that we want to estimate an unobserved population parameter $\displaystyle{ \theta }$ on the basis of observations $\displaystyle{ x }$. Let $\displaystyle{ f }$ be the sampling distribution of $\displaystyle{ x }$, so that $\displaystyle{ f(x|\theta) }$ is the probability of $\displaystyle{ x }$ when the underlying population parameter is $\displaystyle{ \theta }$. Then the function $\displaystyle{ \theta \mapsto f(x | \theta) }$ is known as the likelihood function, and the estimate $\displaystyle{ \hat{\theta}_{\mathrm{ML}}(x) = \underset{\theta}{\operatorname{arg\,max}} \ f(x | \theta) }$ is the maximum likelihood estimate of $\displaystyle{ \theta }$.
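The arg-max in the ML estimate above can be sketched numerically. The following is a minimal illustration (not part of the quoted source), assuming hypothetical Bernoulli observations and a simple grid search over candidate values of $\displaystyle{ \theta }$:

```python
import numpy as np

# Hypothetical data: 7 successes in 10 Bernoulli trials.
x = np.array([1, 1, 0, 1, 1, 1, 0, 1, 0, 1])

# Candidate parameter values on a grid over (0, 1).
thetas = np.linspace(0.001, 0.999, 999)

# Log-likelihood log f(x | theta) of the observations under each candidate theta.
log_lik = np.array([np.sum(x * np.log(t) + (1 - x) * np.log(1 - t)) for t in thetas])

# The ML estimate is the grid point maximizing the (log-)likelihood.
theta_ml = thetas[np.argmax(log_lik)]
print(round(theta_ml, 2))  # 0.7, the sample mean, as expected for Bernoulli data
```

Working in log space is the usual numerical choice: the arg max is unchanged, and products of many small probabilities become sums that do not underflow.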

Now assume that a prior distribution $\displaystyle{ g }$ over $\displaystyle{ \theta }$ exists. This allows us to treat $\displaystyle{ \theta }$ as a random variable, as in Bayesian statistics. The posterior distribution of $\displaystyle{ \theta }$ is then $\displaystyle{ \theta \mapsto f(\theta | x) = \frac{f(x | \theta) \, g(\theta)}{\displaystyle\int_{\vartheta \in \Theta} f(x | \vartheta) \, g(\vartheta) \, d\vartheta} }$ where $\displaystyle{ g }$ is the density function of $\displaystyle{ \theta }$ and $\displaystyle{ \Theta }$ is the domain of $\displaystyle{ g }$. This is a straightforward application of Bayes' theorem.
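The posterior formula can be approximated on a grid, with the integral in the denominator replaced by a Riemann sum. This is an illustrative sketch (the Bernoulli data and the Beta(2, 2) prior are assumptions, not from the quoted source):

```python
import numpy as np

# Hypothetical data and a grid over Theta = (0, 1).
x = np.array([1, 1, 0, 1, 1, 1, 0, 1, 0, 1])
thetas = np.linspace(0.001, 0.999, 999)
dtheta = thetas[1] - thetas[0]

# Likelihood f(x | theta) and an assumed Beta(2, 2) prior density g(theta).
likelihood = thetas ** x.sum() * (1 - thetas) ** (len(x) - x.sum())
prior = 6.0 * thetas * (1 - thetas)  # Beta(2, 2): g(t) = 6 t (1 - t)

# Bayes' theorem: numerator f(x | theta) g(theta); denominator is the
# integral over Theta, approximated by a Riemann sum on the grid.
numerator = likelihood * prior
posterior = numerator / (numerator.sum() * dtheta)

print(posterior.sum() * dtheta)  # close to 1.0: the posterior is normalized
```

The normalization step is exactly the denominator in the displayed formula; it makes the posterior a proper density but, as the next paragraph notes, it is irrelevant for locating the mode.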

The method of maximum a posteriori estimation then estimates $\displaystyle{ \theta }$ as the mode of the posterior distribution of this random variable: $\displaystyle{ \hat{\theta}_{\mathrm{MAP}}(x) = \underset{\theta}{\operatorname{arg\,max}} \ \frac{f(x | \theta) \, g(\theta)} {\displaystyle\int_{\vartheta} f(x | \vartheta) \, g(\vartheta) \, d\vartheta} = \underset{\theta}{\operatorname{arg\,max}} \ f(x | \theta) \, g(\theta). }$ The denominator of the posterior distribution (the so-called partition function) does not depend on $\displaystyle{ \theta }$ and therefore plays no role in the optimization. Observe that the MAP estimate of $\displaystyle{ \theta }$ coincides with the ML estimate when the prior $\displaystyle{ g }$ is uniform (that is, a constant function). When the loss function is of the form $\displaystyle{ L(\theta, a) = \begin{cases} 0, & \mbox{if } |a-\theta| < c \\ 1, & \mbox{otherwise} \end{cases} }$ then as $\displaystyle{ c }$ goes to 0, the sequence of Bayes estimators approaches the MAP estimator, provided that the distribution of $\displaystyle{ \theta }$ is unimodal. In general, however, a MAP estimator is not a Bayes estimator unless $\displaystyle{ \theta }$ is discrete.
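The two facts above — that the denominator can be dropped and that a uniform prior makes MAP coincide with ML — can be checked on a grid. A minimal sketch, again assuming hypothetical Bernoulli data and a Beta(2, 2) prior (neither is from the quoted source):

```python
import numpy as np

# Hypothetical data: 7 successes in 10 Bernoulli trials; grid over (0, 1).
x = np.array([1, 1, 0, 1, 1, 1, 0, 1, 0, 1])
thetas = np.linspace(0.001, 0.999, 999)

# Log-likelihood and log of an assumed Beta(2, 2) prior density.
log_lik = x.sum() * np.log(thetas) + (len(x) - x.sum()) * np.log(1 - thetas)
log_prior = np.log(6.0 * thetas * (1 - thetas))

# MAP maximizes f(x | theta) g(theta): the normalizing denominator is
# constant in theta, so it is simply omitted from the objective.
theta_map = thetas[np.argmax(log_lik + log_prior)]

# With a uniform prior, log g(theta) is constant, so MAP reduces to ML.
theta_ml = thetas[np.argmax(log_lik)]
theta_map_uniform = thetas[np.argmax(log_lik + np.log(1.0))]

print(theta_map, theta_ml, theta_map_uniform)
```

Here the Beta(2, 2) prior pulls the estimate toward 1/2: the posterior is Beta(9, 5), whose mode 2/3 is slightly below the ML estimate 0.7, while the uniform-prior MAP estimate equals the ML estimate exactly.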