# Maximum Likelihood Estimate

A Maximum Likelihood Estimate is a parameter estimation task based on the maximization of the likelihood function.

**AKA:** MLE Value.

**Context:**
- It can be expressed as [math]\displaystyle{ \hat{\theta}_{MLE} = \underset{\theta\in\Theta}{\operatorname{arg\,max}}\ \mathcal{L}(x_1,x_2,\cdots,x_n|\theta) }[/math], where [math]\displaystyle{ \mathcal{L}(x_1,x_2,\cdots,x_n|\theta) }[/math] is the likelihood function of the parameters [math]\displaystyle{ \theta }[/math] given the observations [math]\displaystyle{ x_{1},\cdots ,x_{n} }[/math], and [math]\displaystyle{ \Theta }[/math] is the set of possible parameter values.
- It can be attained by a Maximum Likelihood Estimation System (one that solves an MLE Task).
- It can (often) be an Optimality Criterion for selecting the Parameters of a Statistical Model that maximize the Likelihood Function.
- …
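The argmax in the definition above can be illustrated with a minimal sketch (the data, grid, and function names below are hypothetical, not from the source): for i.i.d. Bernoulli observations, searching a grid of candidate parameter values for the one that maximizes the log-likelihood recovers the closed-form MLE, the sample mean.

```python
# Hypothetical sketch: the MLE for a Bernoulli parameter theta, found by
# maximizing the log-likelihood over a finite grid of candidates Theta.
import math

def log_likelihood(theta, xs):
    # log L(x_1, ..., x_n | theta) for i.i.d. Bernoulli(theta) observations
    return sum(math.log(theta if x == 1 else 1 - theta) for x in xs)

xs = [1, 0, 1, 1, 0, 1, 1, 1]              # 6 successes out of 8 trials
grid = [i / 1000 for i in range(1, 1000)]  # candidate parameter set Theta
theta_hat = max(grid, key=lambda t: log_likelihood(t, xs))
# The grid argmax matches the closed-form MLE, the sample mean 6/8 = 0.75.
```

For richer models with no closed form, the same argmax is typically computed with a numerical optimizer rather than a grid.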

**Example(s):**
- a Language Model MLE, as produced by a Statistical Language Model.
- ...

**Counter-Example(s):**

**See:** Expected Value, Point Estimate, Joint Probability Distribution, Optimization Task.

## References

### 2017

- (Wikipedia, 2017) ⇒ http://en.wikipedia.org/wiki/maximum_likelihood
- In statistics, **maximum likelihood estimation** (**MLE**) is a method of estimating the parameters of a statistical model given observations, by finding the parameter values that maximize the likelihood of making the observations given the parameters. MLE can be seen as a special case of the maximum a posteriori estimation (MAP) that assumes a uniform prior distribution of the parameters, or as a variant of the MAP that ignores the prior and which therefore is unregularized. The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, one may be interested in the heights of adult female penguins, but be unable to measure the height of every single penguin in a population due to cost or time constraints. Assuming that the heights are normally distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while only knowing the heights of some sample of the overall population. MLE would accomplish this by taking the mean and variance as parameters and finding particular parametric values that make the observed results the most probable given the model.

In general, for a fixed set of data and underlying statistical model, the method of maximum likelihood selects the set of values of the model parameters that maximizes the likelihood function. Intuitively, this maximizes the "agreement" of the selected model with the observed data, and for discrete random variables it indeed maximizes the probability of the observed data under the resulting distribution. Maximum likelihood estimation gives a unified approach to estimation, which is well-defined in the case of the normal distribution and many other problems.
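The penguin-heights example above has a well-known closed form, sketched below (the sample values are hypothetical, for illustration only): under a normal model the MLE of the mean is the sample mean, and the MLE of the variance is the 1/n (biased) sample variance.

```python
# Minimal sketch of the normal-model MLE from the quoted example:
# the MLE of mu is the sample mean; the MLE of sigma^2 is the
# 1/n sample variance (not the 1/(n-1) unbiased estimator).
heights = [55.2, 57.1, 54.8, 56.3, 55.9, 56.7]  # hypothetical sample (cm)

n = len(heights)
mu_hat = sum(heights) / n                              # MLE of the mean
var_hat = sum((x - mu_hat) ** 2 for x in heights) / n  # MLE of the variance
```

The 1/n divisor is what maximizes the likelihood; it differs from the unbiased 1/(n-1) estimator, which is a common point of confusion.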


### 2015

- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/maximum_likelihood Retrieved:2015-6-12.
- … Assuming that the heights are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while only knowing the heights of some sample of the overall population. ...

- (Wikipedia, 2015) ⇒ http://wikipedia.org/wiki/Maximum_likelihood#Asymptotic_normality Retrieved:2015-6-12.
    - **Estimate on boundary.** Sometimes the maximum likelihood estimate lies on the boundary of the set of possible parameters, or (if the boundary is not, strictly speaking, allowed) the likelihood gets larger and larger as the parameter approaches the boundary. Standard asymptotic theory needs the assumption that the true parameter value lies away from the boundary. If we have enough data, the maximum likelihood estimate will keep away from the boundary too. But with smaller samples, the estimate can lie on the boundary. In such cases, the asymptotic theory clearly does not give a practically useful approximation. Examples here would be variance-component models, where each component of variance, σ^{2}, must satisfy the constraint σ^{2} ≥ 0.
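The boundary issue quoted above can be sketched in a couple of lines (the numeric value is hypothetical): when an unconstrained estimate of a variance component comes out negative, the constrained MLE sits exactly on the boundary σ² = 0, which is where the standard asymptotic normal approximation fails.

```python
# Hedged sketch of the boundary case: a moment-based estimate of a
# between-group variance component can be negative in a small sample,
# and the constrained MLE clamps it to the boundary sigma^2 >= 0.
raw_between_variance = -0.4  # hypothetical unconstrained estimate
sigma2_hat = max(0.0, raw_between_variance)  # enforce sigma^2 >= 0
# sigma2_hat lands exactly on the boundary (0.0), so the usual asymptotic
# normality of the MLE no longer gives a useful approximation here.
```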

### 2009

- http://clopinet.com/isabelle/Projects/ETH/Exam_Questions.html
- The maximum likelihood method of inference chooses the set of parameters of the model that maximize the likelihood.

- http://www.mindspring.com/~scarlson/tc/glossary.htm
- Clad. An optimality criterion that chooses the optimal phylogeny as having the most probability given a statistical model of the changes in characteristics. ...

- http://www.nhc.ed.ac.uk/index.php
- A method of determining which of two or more competing hypotheses (such as alternative phylogenetic trees) yields best fits to the data.

- http://www.bcu.ubc.ca/~otto/EvolDisc/Glossary.html
- A criterion for estimating a parameter from observed data under an explicit model. In phylogenetic analysis, the optimal tree under the maximum ...

### 2006

- (Cox, 2006) ⇒ David R. Cox. (2006). “Principles of Statistical Inference." Cambridge University Press. ISBN:9780521685672

### 2005

- (Kuhn et al., 2005) ⇒ Estelle Kuhn, and Marc Lavielle. (2005). “Maximum Likelihood Estimation in Nonlinear Mixed Effects Models.” In: Computational Statistics & Data Analysis, 49(4).
    - QUOTE: … Our purpose is to propose a method for computing the maximum likelihood estimate of the unknown parameter vector [math]\displaystyle{ \theta=(\beta,\mu,\Gamma,\sigma^2) }[/math] and to compare this method with other existing methods, particularly those based on the maximum likelihood approach. …

### 2003

- (Davison, 2003) ⇒ Anthony C. Davison. (2003). “Statistical Models." Cambridge University Press. ISBN:0521773393

### 1991

- (Efron & Tibshirani, 1991) ⇒ Bradley Efron, and Robert Tibshirani. (1991). “Statistical Data Analysis in the Computer Age.” In: Science, 253(5018). doi:10.1126/science.253.5018.390
    - Most of our familiar statistical methods, such as hypothesis testing, linear regression, analysis of variance, and **maximum likelihood estimation**, were designed to be implemented on mechanical calculators. ...
