# Mean Squared Error (MSE) Measure

A Mean Squared Error (MSE) Measure is a point estimator evaluation metric that is based on the average of the estimator's squared errors.

**AKA:** Squared Error Loss.

**Context:**

- It has a Gaussian Distribution.
- It can be an input to an MSE Estimation System (that solves an MSE estimation task).
- …

**Counter-Example(s):**

**See:** Squared Error, Maximum Likelihood Estimate, Expected Value, Omitted-Variable Bias, Bias of an Estimator, Unbiased Estimator, Standard Deviation.

## References

### 2017a

- (Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb. (2017). "Squared Error Loss". In: (Sammut & Webb, 2011) p.912

### 2017b

- (Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb. (2017). "Mean Squared Error". In: (Sammut & Webb, 2017) p.653
- QUOTE: Mean Squared Error is a model evaluation metric often used with regression models. The mean squared error of a model with respect to a test set is the mean of the squared prediction errors over all instances in the test set. The prediction error is the difference between the true value and the predicted value for an instance. [math]mse=\frac{\sum^n_{i=1}(y_i-\lambda(x_i))^2}{n}[/math] where [math]y_i[/math] is the true target value for test instance [math]x_i[/math], [math]\lambda(x_i)[/math] is the predicted target value for test instance [math]x_i[/math], and [math]n[/math] is the number of test instances.
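The test-set formula quoted above can be illustrated with a minimal Python sketch. The function name `mean_squared_error` and the four-instance test set are hypothetical choices made for illustration; they do not come from the cited source.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of the squared prediction errors over all test instances."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one test instance is required")
    # Sum of (y_i - lambda(x_i))^2, divided by the number of instances n.
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test set: true target values and model predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
print(mse)  # squared errors are 0.25, 0.25, 0.0, 1.0, so this prints 0.375
```

Because the errors are squared, the single prediction that is off by 1.0 contributes two thirds of the total, which is why MSE is often described as sensitive to large errors.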

### 2015

- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/mean_squared_error Retrieved:2015-1-16.
- In statistics, the **mean squared error** (MSE) of an estimator measures the average of the squares of the "errors", that is, the difference between the estimator and what is estimated. MSE is a risk function, corresponding to the expected value of the squared error loss or quadratic loss. The difference occurs because of randomness or because the estimator doesn't account for information that could produce a more accurate estimate.^{[1]} The MSE is the second moment (about the origin) of the error, and thus incorporates both the variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance of the estimator. Like the variance, MSE has the same units of measurement as the square of the quantity being estimated. In an analogy to standard deviation, taking the square root of MSE yields the root-mean-square error or root-mean-square deviation (RMSE or RMSD), which has the same units as the quantity being estimated; for an unbiased estimator, the RMSE is the square root of the variance, known as the standard deviation.
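The relationship between MSE and RMSE described in this passage can be sketched as follows. The helper names and the error values (imagined here as measurement errors in metres) are assumptions made for illustration only.

```python
import math


def mse_from_errors(errors):
    """Average of the squared errors (units: square of the original unit)."""
    return sum(e * e for e in errors) / len(errors)


def rmse_from_errors(errors):
    """Square root of the MSE (units: same as the original quantity)."""
    return math.sqrt(mse_from_errors(errors))


# Hypothetical errors in metres (estimate minus true value).
errors_m = [1.0, -1.0, 7.0, -7.0]

mse_value = mse_from_errors(errors_m)    # 25.0, in square metres
rmse_value = rmse_from_errors(errors_m)  # 5.0, in metres
print(mse_value, rmse_value)
```

Reporting the RMSE rather than the MSE is a presentation choice: it carries the same information but can be compared directly with the scale of the estimated quantity.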

- ↑ Lehmann, E. L.; Casella, George (1998). Theory of Point Estimation (2nd ed.). New York: Springer. ISBN 978-0-387-98502-2. MR 1639875

### 2013

- (Wikipedia, 2013) ⇒ http://en.wikipedia.org/wiki/Mean_square_error#Definition_and_basic_properties
- If [math]\hat{Y}[/math] is a vector of [math]n[/math] predictions, and [math]Y[/math] is the vector of the true values, then the MSE of the predictor is: [math]\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^n(\hat{Y}_i - Y_i)^2.[/math] This is a known, computed quantity given a particular sample (and hence is sample-dependent).

The MSE of an estimator [math]\hat{\theta}[/math] with respect to the unknown parameter [math]\theta[/math] is defined as [math]\operatorname{MSE}(\hat{\theta})=\operatorname{E}\big[(\hat{\theta}-\theta)^2\big].[/math] This definition depends on the unknown parameter, and the MSE in this sense is a property of an estimator (of a method of obtaining an estimate).

The MSE is equal to the sum of the variance and the squared bias of the estimator or of the predictions. In the case of the MSE of an estimator,^{[1]} [math]\operatorname{MSE}(\hat{\theta})=\operatorname{Var}(\hat{\theta})+ \left(\operatorname{Bias}(\hat{\theta},\theta)\right)^2.[/math] The MSE thus assesses the quality of an estimator or set of predictions in terms of its variation and degree of bias.

Since MSE is an expectation, it is not a random variable. It may be a function of the unknown parameter [math]\theta[/math], but it does not depend on any random quantities. However, when MSE is computed for a particular estimator of [math]\theta[/math] the true value of which is not known, it will be subject to estimation error. In a Bayesian sense, this means that there are cases in which it may be treated as a random variable.
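The decomposition of MSE into variance plus squared bias can be checked numerically with a small simulation. This is only a sketch under stated assumptions: the estimator (a sample mean shrunk by a factor of 0.9), the Gaussian data, the parameter values, and the function name `simulate_estimates` are all hypothetical and chosen so that the estimator is visibly biased.

```python
import random
import statistics


def simulate_estimates(theta, sigma, n, shrink, trials, seed=0):
    """Draw `trials` samples of size n from N(theta, sigma^2) and return
    the shrunken sample mean (shrink * mean) computed on each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        estimates.append(shrink * statistics.fmean(sample))
    return estimates


theta = 2.0  # true parameter, known here only because the data are simulated
estimates = simulate_estimates(theta=theta, sigma=1.0, n=10, shrink=0.9, trials=5000)

# Empirical counterparts of MSE, bias, and variance over the simulated estimates.
mse = statistics.fmean((e - theta) ** 2 for e in estimates)
bias = statistics.fmean(estimates) - theta
variance = statistics.pvariance(estimates)  # population variance (divide by trials)

# The two printed values agree up to floating-point rounding.
print(mse, variance + bias ** 2)
```

Under these assumptions the theoretical values are a bias of 0.9 × 2.0 − 2.0 = −0.2 and a variance of 0.81 × 1.0 / 10 = 0.081, giving an MSE of 0.121; the simulated quantities should land close to these. The population variance (`pvariance`) is used so that the empirical identity holds exactly rather than only approximately.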

- ↑ Wackerly, Dennis; Scheaffer, William (2008). *Mathematical Statistics with Applications* (7 ed.). Belmont, CA, USA: Thomson Higher Education. ISBN 0-495-38508-5.

### 2009

- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Mean_squared_error
- QUOTE: In Statistics, the **mean squared error** or **MSE** of an Estimator is one of many ways to quantify the amount by which an Estimator differs from the true value of the quantity being estimated. As a loss function, MSE is called **squared error loss**. MSE measures the average of the square of the "error." The error is the amount by which the estimator differs from the quantity to be estimated. The difference occurs because of randomness or because the estimator doesn't account for information that could produce a more accurate estimate.^{[1]}

The MSE is the second moment (about the origin) of the error, and thus incorporates both the variance of the estimator and its bias. For an Unbiased Estimator, the MSE is the variance. Like the variance, MSE has the same unit of measurement as the square of the quantity being estimated. In an analogy to Standard Deviation, taking the square root of MSE yields the **Root Mean Squared Error** or **RMSE**, which has the same units as the quantity being estimated; for an unbiased estimator, the RMSE is the square root of the variance, known as the standard error.

- ↑ George Casella & E.L. Lehmann, "Theory of Point Estimation". Springer, (1999)

### 2001

- (Hotho et al., 2001) ⇒ Andreas Hotho, Alexander Maedche, and Steffen Staab. (2001). “Ontology-based Text Clustering.” In: Proceedings of the IJCAI-2001 Workshop on Text Learning: Beyond Supervision.
- QUOTE: All clustering approaches based on frequencies of terms/concepts and similarities of data points suffer from the same mathematical properties of the underlying spaces (cf. [2; 5]). These properties imply that even when “good” clusters with relatively small mean squared errors can be built, these clusters do not exhibit significant structural information as their data points are not really more similar to each other than to many other data points.

### 1993

- (Girod, 1993) ⇒ Bernd Girod. (1993). “What's Wrong with Mean-Squared Error?.” In: Digital images and human vision, MIT press.

### 1981

- (Sheiner & Beal, 1981) ⇒ Lewis B. Sheiner, and Stuart L. Beal. (1981). “Some Suggestions for Measuring Predictive Performance.” In: Journal of Pharmacokinetics and Pharmacodynamics, 9(4). doi:10.1007/BF01060893.
- ABSTRACT: The performance of a prediction or measurement method is often evaluated by computing the correlation coefficient and/or the regression of predictions on true (reference) values. These provide, however, only a poor description of predictive performance. The **mean squared prediction error** (precision) and the mean prediction error (bias) provide better descriptions of predictive performance. These quantities are easily computed, and can be used to compare prediction methods to absolute standards or to one another. The measures, however, are unreliable when the reference method is imprecise. The use of these measures is discussed and illustrated.
