# Maximum Likelihood Estimation Task

A Maximum Likelihood Estimation Task is a point estimation task that requires a maximum-likelihood estimate (a parameter value that maximizes the (log-)likelihood of the observed data).

**AKA:** MLE Fitting.

**Context:**

- **output:** MLE Value.
- It can (typically) be performed when the Error Term Distribution is known to belong to a certain Parametric Family of Probability Distributions.
- It can range from being Regularized MLE to being Non-Regularized MLE.
- It can range from being a Simple MLE Task to being a Constrained MLE Task.
- It can be solved by an MLE-based System (that implements an MLE algorithm).
- It can (often) support a Parameter Optimization Task (as an optimality criterion).

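**Example(s):**

- Given a binomial stochastic process [math]B(n,p)[/math], such as a simple [[slot machine]] with [math]n=5[/math] Bernoulli events of which [math]k=1[/math] is a success event, what is the maximum likelihood estimate for the success probability [math]p[/math]?
[math]\hat{p} = \frac{k}{n} = \frac{1}{5}[/math]. (The value [math]\frac{k+1}{n+2} = \frac{1+1}{5+2} = \frac{2}{7}[/math] is Laplace's rule-of-succession estimate, i.e. the posterior mean under a uniform prior, not the maximum likelihood estimate.)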

**Counter-Example(s):**

**See:** Prior Probability, Maximum Expected Value.
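The binomial example can be checked numerically. The following is a minimal sketch, assuming the standard binomial likelihood; the function names and the grid resolution are illustrative choices, not part of the original task definition. It compares the closed-form maximizer [math]k/n[/math] with a brute-force search, and evaluates Laplace's rule-of-succession value [math](k+1)/(n+2)[/math] only for contrast.

```python
import math


def binomial_log_likelihood(p: float, k: int, n: int) -> float:
    """Log-likelihood of k successes in n Bernoulli trials with success probability p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1.0 - p)


def binomial_mle(k: int, n: int) -> float:
    """Closed-form maximizer of the binomial likelihood: p_hat = k / n."""
    return k / n


n, k = 5, 1
p_hat = binomial_mle(k, n)

# Brute-force check on a grid over the open interval (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: binomial_log_likelihood(p, k, n))

# Laplace's rule of succession, shown only for contrast (it is not an MLE).
p_laplace = (k + 1) / (n + 2)

print("closed-form MLE:", p_hat)
print("grid-search MLE:", p_grid)
print("log-likelihood at MLE:    ", binomial_log_likelihood(p_hat, k, n))
print("log-likelihood at Laplace:", binomial_log_likelihood(p_laplace, k, n))
```

The grid search and the closed form should agree, and the log-likelihood at [math]k/n[/math] should exceed the one at [math](k+1)/(n+2)[/math].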

## References

### 2015

- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Linear_regression#Maximum-likelihood_estimation_and_related_techniques Retrieved:2015-5-11.
- Maximum likelihood estimation can be performed when the distribution of the error terms is known to belong to a certain parametric family *f*_{θ} of probability distributions.^{[1]} When *f*_{θ} is a normal distribution with zero mean and variance θ, the resulting estimate is identical to the OLS estimate. GLS estimates are maximum likelihood estimates when ε follows a multivariate normal distribution with a known covariance matrix.
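The claim that the normal-error MLE coincides with the OLS estimate can be illustrated with a small sketch. It assumes a one-predictor model y = a + b·x with independent zero-mean normal errors; the data values, the fixed variance, and the function names are hypothetical and chosen only for illustration. Because the Gaussian log-likelihood depends on the coefficients only through the residual sum of squares, the OLS coefficients should score higher than any nearby perturbed coefficients.

```python
import math


def ols_fit(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def gaussian_log_likelihood(a, b, sigma2, xs, ys):
    """Log-likelihood of the data under y = a + b * x + e, with e ~ Normal(0, sigma2)."""
    n = len(xs)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - rss / (2.0 * sigma2)


# Hypothetical data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

a_hat, b_hat = ols_fit(xs, ys)
sigma2 = 1.0  # assumed fixed error variance; the maximizing (a, b) does not depend on it

ll_ols = gaussian_log_likelihood(a_hat, b_hat, sigma2, xs, ys)

# Log-likelihood at coefficients nudged away from the OLS solution.
perturbed = [
    gaussian_log_likelihood(a_hat + da, b_hat + db, sigma2, xs, ys)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.1, 0.0, 0.1)
    if (da, db) != (0.0, 0.0)
]

print("OLS coefficients:", a_hat, b_hat)
print("OLS point beats all perturbations:", all(ll_ols > ll for ll in perturbed))
```

This is a local check rather than a proof; the general argument is that maximizing the Gaussian log-likelihood over the coefficients is the same as minimizing the residual sum of squares.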

- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/maximum_likelihood Retrieved:2015-5-11.
- Maximum likelihood estimation is used for a wide range of statistical models, including:
- linear models and generalized linear models;
- exploratory and confirmatory factor analysis;
- structural equation modeling;
- many situations in the context of hypothesis testing and confidence interval formation;
- discrete choice models;

- These uses arise across applications in a widespread set of fields, including:
- communication systems;
- psychometrics;
- econometrics;
- time-delay of arrival (TDOA) in acoustic or electromagnetic detection;
- data modeling in nuclear and particle physics;
- magnetic resonance imaging;^{[2]}^{[3]}
- computational phylogenetics;
- origin/destination and path-choice modeling in transport networks;
- geographical satellite-image classification.


- ↑ Lange, Kenneth L.; Little, Roderick J. A.; Taylor, Jeremy M. G. (1989). "Robust Statistical Modeling Using the t Distribution". *Journal of the American Statistical Association* **84** (408): 881–896. doi:10.2307/2290063. JSTOR 2290063.
- ↑ Sijbers, Jan; den Dekker, A.J. (2004). "Maximum Likelihood estimation of signal amplitude and noise variance from MR data". *Magnetic Resonance in Medicine* **51** (3): 586–594. doi:10.1002/mrm.10728. PMID 15004801.
- ↑ Sijbers, Jan; den Dekker, A.J.; Scheunders, P.; Van Dyck, D. (1998). "Maximum Likelihood estimation of Rician distribution parameters". *IEEE Transactions on Medical Imaging* **17** (3): 357–361. doi:10.1109/42.712125. PMID 9735899.

### 2013

- http://stats.stackexchange.com/questions/65212/example-of-maximum-a-posteriori-estimation
- QUOTE: both ML and MAP are point estimators (they return an optimal set of weights, rather than a distribution of optimal weights).

### 2011

- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Maximum_likelihood
- In statistics, **maximum-likelihood estimation** (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters. The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, one may be interested in the heights of adult female giraffes, but be unable due to cost or time constraints, to measure the height of every single giraffe in a population. Assuming that the heights are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while only knowing the heights of some sample of the overall population. MLE would accomplish this by taking the mean and variance as parameters and finding particular parametric values that make the observed results the most probable (given the model).

In general, for a fixed set of data and underlying statistical model, the method of maximum likelihood selects values of the model parameters that produce a distribution that gives the observed data the greatest probability (i.e., parameters that maximize the likelihood function). Maximum-likelihood estimation gives a unified approach to estimation, which is well-defined in the case of the normal distribution and many other problems. However, in some complicated problems, difficulties do occur: in such problems, maximum-likelihood estimators are unsuitable or do not exist.
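The giraffe-height scenario can be sketched in a few lines. The sample values and function names below are hypothetical, used only for illustration. Under a normal model, the maximum-likelihood estimates are the sample mean and the variance with divisor n (not the unbiased divisor n − 1), which is why the sketch compares against both `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics


def normal_mle(sample):
    """Maximum-likelihood estimates (mean, variance) under a normal model."""
    n = len(sample)
    mu_hat = sum(sample) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n  # divisor n, not n - 1
    return mu_hat, sigma2_hat


def normal_log_likelihood(mu, sigma2, sample):
    """Log-likelihood of an i.i.d. sample under Normal(mu, sigma2)."""
    n = len(sample)
    ss = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - ss / (2.0 * sigma2)


# Hypothetical sample of heights in metres, for illustration only.
heights_m = [4.3, 4.6, 4.1, 4.5, 4.4, 4.7]

mu_hat, sigma2_hat = normal_mle(heights_m)
sigma2_unbiased = statistics.variance(heights_m)  # divisor n - 1

print("MLE mean:", mu_hat)
print("MLE variance:", sigma2_hat)
print("unbiased variance:", sigma2_unbiased)
print("log-likelihood at MLE variance:     ", normal_log_likelihood(mu_hat, sigma2_hat, heights_m))
print("log-likelihood at unbiased variance:", normal_log_likelihood(mu_hat, sigma2_unbiased, heights_m))
```

The MLE variance matches `statistics.pvariance`, and the likelihood at the unbiased variance is slightly lower, which shows that the maximum-likelihood estimate need not be unbiased.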


### 2009

- http://clopinet.com/isabelle/Projects/ETH/Exam_Questions.html
- QUOTE: The maximum likelihood method of inference chooses the set of parameters of the model that maximize the likelihood.

- http://www.bcu.ubc.ca/~otto/EvolDisc/Glossary.html
- QUOTE: A criterion for estimating a parameter from observed data under an explicit model. In phylogenetic analysis, the optimal tree under the maximum ...

### 2006

- (Cox, 2006) ⇒ David R. Cox. (2006). “Principles of Statistical Inference.” Cambridge University Press. ISBN:9780521685672

### 2003

- (Myung, 2003) ⇒ In Jae Myung. (2003). “Tutorial on Maximum Likelihood Estimation.” In: Journal of Mathematical Psychology, 47. doi:10.1016/S0022-2496(02)00028-7
- QUOTE: There are two general methods of parameter estimation. They are least-squares estimation (LSE) and maximum likelihood estimation (MLE). The former has been a popular choice of model fitting in psychology (e.g., Rubin, Hinton, & Wenzel, 1999; Lamberts, 2000 but see Usher & McClelland, 2001) and is tied to many familiar statistical concepts such as linear regression, sum of squares error, proportion variance accounted for (i.e. [math]r^2[/math]), and root mean squared deviation. LSE, which unlike MLE requires no or minimal distributional assumptions, is useful for obtaining a descriptive measure for the purpose of summarizing observed data, but it has no basis for testing hypotheses or constructing confidence intervals.
On the other hand, MLE is not as widely recognized among modelers in psychology, but it is a standard approach to parameter estimation and inference in statistics. MLE has many optimal properties in estimation: sufficiency (complete information about the parameter of interest contained in its MLE estimator); consistency (true parameter value that generated the data recovered asymptotically, i.e. for data of sufficiently large samples); efficiency (lowest-possible variance of parameter estimates achieved asymptotically); and parameterization invariance (same MLE solution obtained independent of the parametrization used).
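The parameterization-invariance property mentioned in the quote can be illustrated with a small sketch. It assumes a binomial model with hypothetical counts (n = 5 trials, k = 1 success) and re-expresses the likelihood in the log-odds parameter eta = log(p / (1 − p)); the helper names and the ternary-search routine are illustrative choices. Maximizing over eta and mapping back to a probability should recover the same estimate k/n that direct maximization over p gives.

```python
import math


def log_lik_logodds(eta: float, k: int, n: int) -> float:
    """Binomial log-likelihood, up to an additive constant, in the log-odds parameter eta."""
    return k * eta - n * math.log1p(math.exp(eta))


def maximize_concave(f, lo: float, hi: float, iters: int = 200) -> float:
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


n, k = 5, 1  # hypothetical counts, for illustration only

# MLE in the probability parameterization (closed form).
p_hat = k / n

# MLE in the log-odds parameterization (numerical), mapped back to a probability.
eta_hat = maximize_concave(lambda e: log_lik_logodds(e, k, n), -10.0, 10.0)
p_from_eta = 1.0 / (1.0 + math.exp(-eta_hat))

print("p_hat from p-parameterization:  ", p_hat)
print("p_hat from eta-parameterization:", p_from_eta)
print("eta_hat vs log-odds of p_hat:   ", eta_hat, math.log(p_hat / (1.0 - p_hat)))
```

The two routes agree up to numerical tolerance, so the estimate does not depend on whether the model is written in terms of p or eta.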


### 1991

- (Efron & Tibshirani, 1991) ⇒ Bradley Efron, and Robert Tibshirani. (1991). “Statistical Data Analysis in the Computer Age.” In: Science, 253(5018). doi:10.1126/science.253.5018.390
- QUOTE: Most of our familiar statistical methods, such as hypothesis testing, linear regression, analysis of variance, and maximum likelihood estimation, were designed to be implemented on mechanical calculators. ...