Maximum a Posteriori Estimation Algorithm

A Maximum a Posteriori Estimation Algorithm is a point estimation algorithm that can solve a maximum a posteriori estimation task (by producing a maximum a posteriori estimate).



  • (Wikipedia, 2015) ⇒ Retrieved:2015-6-15.
    • Assume that we want to estimate an unobserved population parameter [math]\displaystyle{ \theta }[/math] on the basis of observations [math]\displaystyle{ x }[/math]. Let [math]\displaystyle{ f }[/math] be the sampling distribution of [math]\displaystyle{ x }[/math], so that [math]\displaystyle{ f(x|\theta) }[/math] is the probability of [math]\displaystyle{ x }[/math] when the underlying population parameter is [math]\displaystyle{ \theta }[/math]. Then the function [math]\displaystyle{ \theta \mapsto f(x | \theta) }[/math] is known as the likelihood function, and the estimate [math]\displaystyle{ \hat{\theta}_{\mathrm{ML}}(x) = \underset{\theta}{\operatorname{arg\,max}} \ f(x | \theta) }[/math] is the maximum likelihood estimate of [math]\displaystyle{ \theta }[/math].

      Now assume that a prior distribution [math]\displaystyle{ g }[/math] over [math]\displaystyle{ \theta }[/math] exists. This allows us to treat [math]\displaystyle{ \theta }[/math] as a random variable, as in Bayesian statistics. The posterior distribution of [math]\displaystyle{ \theta }[/math] is then [math]\displaystyle{ \theta \mapsto f(\theta | x) = \frac{f(x | \theta) \, g(\theta)}{\displaystyle\int_{\vartheta \in \Theta} f(x | \vartheta) \, g(\vartheta) \, d\vartheta} }[/math] where [math]\displaystyle{ g }[/math] is the density function of [math]\displaystyle{ \theta }[/math] and [math]\displaystyle{ \Theta }[/math] is the domain of [math]\displaystyle{ g }[/math]. This is a straightforward application of Bayes' theorem.

      The method of maximum a posteriori estimation then estimates [math]\displaystyle{ \theta }[/math] as the mode of the posterior distribution of this random variable: [math]\displaystyle{ \hat{\theta}_{\mathrm{MAP}}(x) = \underset{\theta}{\operatorname{arg\,max}} \ \frac{f(x | \theta) \, g(\theta)} {\displaystyle\int_{\vartheta} f(x | \vartheta) \, g(\vartheta) \, d\vartheta} = \underset{\theta}{\operatorname{arg\,max}} \ f(x | \theta) \, g(\theta). }[/math] The denominator of the posterior distribution (the so-called partition function) does not depend on [math]\displaystyle{ \theta }[/math] and therefore plays no role in the optimization. Observe that the MAP estimate of [math]\displaystyle{ \theta }[/math] coincides with the ML estimate when the prior [math]\displaystyle{ g }[/math] is uniform (that is, a constant function). Moreover, when the loss function is of the form [math]\displaystyle{ L(\theta, a) = \begin{cases} 0, & \mbox{if } |a-\theta| \lt c \\ 1, & \mbox{otherwise} \end{cases} }[/math] then, as [math]\displaystyle{ c }[/math] goes to 0, the sequence of Bayes estimators approaches the MAP estimator, provided that the distribution of [math]\displaystyle{ \theta }[/math] is unimodal. In general, however, a MAP estimator is not a Bayes estimator unless [math]\displaystyle{ \theta }[/math] is discrete.
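
The relationship between the ML and MAP estimates described above can be illustrated with a minimal sketch. The model here is an assumption chosen for illustration and is not part of the quoted source: Bernoulli observations with a Beta(a, b) prior on the success probability. All function names are hypothetical. The sketch compares the closed-form posterior mode with a brute-force grid search over the unnormalized log-posterior, log f(x|θ) + log g(θ), which is valid because the denominator does not depend on θ.

```python
import math


def bernoulli_mle(successes, n):
    """Maximum likelihood estimate of the Bernoulli success probability."""
    return successes / n


def beta_bernoulli_map(successes, n, a, b):
    """Closed-form MAP estimate under an assumed Beta(a, b) prior.

    The posterior is Beta(a + successes, b + n - successes); its mode is
    given by the formula below when both posterior parameters exceed 1.
    """
    return (successes + a - 1) / (n + a + b - 2)


def log_likelihood(theta, successes, n):
    """Log of f(x | theta) for Bernoulli data, up to an additive constant."""
    return successes * math.log(theta) + (n - successes) * math.log(1 - theta)


def log_beta_prior_unnormalized(theta, a, b):
    """Log of g(theta) for a Beta(a, b) prior, without the normalizing constant."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)


def grid_argmax(log_objective, num=10000):
    """Brute-force arg max over an evenly spaced grid on the open interval (0, 1)."""
    best_theta, best_value = None, -math.inf
    for i in range(1, num):
        theta = i / num
        value = log_objective(theta)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta


if __name__ == "__main__":
    successes, n = 7, 10  # illustrative data, not from the source
    a, b = 2, 2           # illustrative prior parameters

    mle = bernoulli_mle(successes, n)
    map_closed = beta_bernoulli_map(successes, n, a, b)
    map_grid = grid_argmax(
        lambda t: log_likelihood(t, successes, n)
        + log_beta_prior_unnormalized(t, a, b)
    )
    map_uniform = beta_bernoulli_map(successes, n, 1, 1)  # Beta(1, 1) is uniform

    print("ML estimate:", mle)
    print("MAP estimate (closed form):", map_closed)
    print("MAP estimate (grid search):", map_grid)
    print("MAP estimate under a uniform prior:", map_uniform)
```

With 7 successes in 10 trials, the ML estimate is 7/10, while the Beta(2, 2) prior pulls the MAP estimate toward 1/2, giving 8/12. Setting a = b = 1 makes the prior uniform, and the MAP estimate then equals the ML estimate, as stated above. The grid search is included only as a check on the closed form; a real application would more likely use a numerical optimizer.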