Coefficient of Determination

A Coefficient of Determination is a statistic for the amount of variability of a continuous dependent variable that can be accounted for by a regression model (typically a linear regression model) on one or more regressors

AKA: R-Squared.
Context:
- It can be calculated by squaring the r-value (Pearson Product-Moment Correlation Coefficient).
- It can be interpreted as the proportion of a response variable's variation that is explained (or not explained) by a regressed model and its regressor variables.
- …
Example(s)
- [math]\displaystyle{ R^2 = 1 }[/math] can indicate that the fitted model explains all variability (not that there is a cause-and-effect relationship).
- [math]\displaystyle{ R^2 = 0 }[/math] can indicate that no 'linear' relationship (for straight line regression, this means that the straight line model is a constant line (slope=0, intercept=) between the response variable and regressors).
- [math]\displaystyle{ R^2 = 0.7 }[/math] can indicate that approximately seventy percent of the variation in the response variable can be explained by the explanatory variable. The remaining thirty percent can be explained by unknown, lurking variables or inherent variability.
- …
Counter-Example(s):
- Coefficient of Correlation/Pearson Product-Moment Correlation Coefficient.
- Root Mean Squared Error (RMSE).
- Standard Deviation: Measures the dispersion or spread of a set of values, but does not directly indicate the accuracy or reliability of a single estimate.
- Coefficient of Variation: A normalized measure of the dispersion of a probability distribution. It does not offer a range within which a population parameter is likely to lie.
- P-Value: Used in hypothesis testing to indicate the probability of observing a test statistic as extreme as the one computed from the sample data. It is not a measure of uncertainty around a specific estimate.
- Margin of Error.
- …
See: Shrinkage.

References

2022

(Wikipedia, 2022) ⇒ https://en.wikipedia.org/wiki/Coefficient_of_determination Retrieved:2022-6-28.
- In statistics, the coefficient of determination, denoted R² or r² and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
  It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model. There are several definitions of R² that are only sometimes equivalent. One class of such cases includes that of simple linear regression where r² is used instead of R². When only an intercept is included, then r² is simply the square of the sample correlation coefficient (i.e., r) between the observed outcomes and the observed predictor values. If additional regressors are included, R² is the square of the coefficient of multiple correlation. In both such cases, the coefficient of determination normally ranges from 0 to 1. There are cases where the computational definition of R² can yield negative values, depending on the definition used. This can arise when the predictions that are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data. Even if a model-fitting procedure has been used, R² may still be negative, for example when linear regression is conducted without including an intercept, or when a non-linear function is used to fit the data. In cases where negative values arise, the mean of the data provides a better fit to the outcomes than do the fitted function values, according to this particular criterion. The coefficient of determination can be more (intuitively) informative than MAE, MAPE, MSE, and RMSE in regression analysis evaluation, as the former can be expressed as a percentage, whereas the latter measures have arbitrary ranges. It also proved more robust for poor fits compared to SMAPE on the test datasets in the article. When evaluating the goodness-of-fit of simulated (Y_pred) vs. measured (Y_obs) values, it is not appropriate to base this on the R² of the linear regression (i.e., Y_obs= m·Y_pred + b).The R² quantifies the degree of any linear correlation between Y_obs and Y_pred, while for the goodness-of-fit evaluation only one specific linear correlation should be taken into consideration: Y_obs = 1·Y_pred + 0 (i.e., the 1:1 line).

2012

http://en.wikipedia.org/wiki/Overfitting
- QUOTE:... Even when the fitted model does not have an excessive number of parameters, it is to be expected that the fitted relationship will appear to perform less well on a new data set than on the data set used for fitting. In particular, the value of the coefficient of determination will shrink relative to the original training data.

2012

http://en.wikipedia.org/wiki/Coefficient_of_determination#Definitions
- QUOTE:The better the linear regression (on the right) fits the data in comparison to the simple average (on the left graph), the closer the value of [math]\displaystyle{ R^2 }[/math] is to one. The areas of the blue squares represent the squared residuals with respect to the linear regression. The areas of the red squares represent the squared residuals with respect to the average value.]] A data set has values y_i, each of which has an associated modelled value f_i (also sometimes referred to as ŷ_i). Here, the values y_i are called the observed values and the modelled values f_i are sometimes called the predicted values.
  The "variability" of the data set is measured through different sums of squares: :[math]\displaystyle{ SS_\text{tot}=\sum_i (y_i-\bar{y})^2, }[/math] the total sum of squares (proportional to the sample variance); :[math]\displaystyle{ SS_\text{reg}=\sum_i (f_i -\bar{y})^2, }[/math] the regression sum of squares, also called the explained sum of squares. :[math]\displaystyle{ SS_\text{err}=\sum_i (y_i - f_i)^2\, }[/math], the sum of squares of residuals, also called the residual sum of squares. In the above [math]\displaystyle{ \bar{y} }[/math] is the mean of the observed data: [math]\displaystyle{ \bar{y}=\frac{1}{n}\sum_i^n y_i }[/math] where n is the number of observations.
  The notations [math]\displaystyle{ SS_{R} }[/math] and [math]\displaystyle{ SS_{E} }[/math] should be avoided, since in some texts their meaning is reversed to Residual sum of squares and Explained sum of squares, respectively. The most general definition of the coefficient of determination is :[math]\displaystyle{ R^2 \equiv 1 - {SS_{\rm err}\over SS_{\rm tot}}.\, }[/math]

2010

(McAfee, 2010) ⇒ Gerry McAfee. (2010). “Master Math: AP Statistics, 1st Edition.” Course Technology Ptr. ISBN:1435456270
- QUOTE: ... The [math]\displaystyle{ {r^2} }[/math] value (coefficient of determination) is the amount of variability of y that can be explained or accounted for by the linear relationship of y on x. To find [math]\displaystyle{ {r^2} }[/math], we simply square the r-value. Remember, even an [math]\displaystyle{ {r^2} }[/math] value of 1 does not necessarily imply any cause-and-effect relationship!

2008

(Upton & Cook, 2008) ⇒ Graham Upton, and Ian Cook. (2008). “A Dictionary of Statistics, 2nd edition revised." Oxford University Press. ISBN:0199541450
- QUOTE: ANOVA ... After the contributions of all the specified sources of variation have been determined, the remainder, often called the residual sum of squares (RSS) or error sum of squares, is attributed to random variation. The mean square corresponding to RSS is often used as the yardstick for assessing the importance of the specified sources of variation. One method involves comparing ratios of mean squares with the critical values of an F-distribution.
  The proportion of variation explained by the model is [math]\displaystyle{ R^2 = 1 - {RSS \over TSS'}, }[/math] which is sometimes called the coefficient of determination. ...

Coefficient of Determination

References

2022

2012

2012

2010

2008

Navigation menu

Search