Ordinary Least Squares Regression Algorithm

From GM-RKB

An Ordinary Least Squares Regression Algorithm is a least squares regression algorithm that can be implemented by an ordinary least-squares regression system to solve an ordinary least-squares regression task: fitting an un-regularized regression model by minimizing the sum of squared differences between the observed response values and the model's fitted response values on some regression dataset.



References

2011

  • (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Linear_regression#Estimation_methods
    • QUOTE: … Ordinary least squares (OLS) is the simplest and thus most common estimator. It is conceptually simple and computationally straightforward. OLS estimates are commonly used to analyze both experimental and observational data. The OLS method minimizes the sum of squared residuals, and leads to a closed-form expression for the estimated value of the unknown parameter β: [math]\displaystyle{ \hat\beta = (X'X)^{-1} X'y = \big(\, \tfrac{1}{n}{\textstyle\sum} x_i x'_i \,\big)^{-1} \big(\, \tfrac{1}{n}{\textstyle\sum} x_i y_i \,\big) }[/math] The estimator is unbiased and consistent if the errors have finite variance and are uncorrelated with the regressors[1]: [math]\displaystyle{ \operatorname{E}[\,x_i\varepsilon_i\,] = 0. }[/math] It is also efficient under the assumption that the errors have finite variance and are homoscedastic, meaning that [math]\displaystyle{ \operatorname{E}[\,\varepsilon_i^2|x_i\,] }[/math] does not depend on i. The condition that the errors are uncorrelated with the regressors will generally be satisfied in an experiment, but in the case of observational data, it is difficult to exclude the possibility of an omitted covariate z that is related to both the observed covariates and the response variable. The existence of such a covariate will generally lead to a correlation between the regressors and the response variable, and hence to an inconsistent estimator of β. The condition of homoscedasticity can fail with either experimental or observational data. If the goal is either inference or predictive modeling, the performance of OLS estimates can be poor if multicollinearity is present, unless the sample size is large.

      In simple linear regression, where there is only one regressor (with a constant), the OLS coefficient estimates have a simple form that is closely related to the correlation coefficient between the covariate and the response.

  1. Lai, T.L.; Robbins, H.; Wei, C.Z. (1978). "Strong consistency of least squares estimates in multiple regression". Proceedings of the National Academy of Sciences USA 75 (7).
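The closed-form estimator quoted above, and the simple-regression relation to the correlation coefficient, can be sketched numerically. The following is a minimal NumPy illustration on hypothetical synthetic data (the data, sizes, and noise level are assumptions for the example, not from the source):

```python
import numpy as np

# Hypothetical synthetic data: n observations, p regressors plus an intercept.
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Closed-form OLS estimate: beta_hat = (X'X)^{-1} X'y.
# Solving the normal equations is preferred over forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The same estimate via a numerically stabler least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Simple linear regression (one regressor with a constant): the OLS slope
# equals r * (s_y / s_x), where r is the sample correlation coefficient.
x1 = X[:, 1]
slope_ols = np.polyfit(x1, y, 1)[0]
r = np.corrcoef(x1, y)[0, 1]
```

Here `np.linalg.solve` applied to the normal equations reproduces the quoted formula directly, while `np.linalg.lstsq` (QR/SVD-based) is what one would typically use in practice because it is better conditioned.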

1996

  • (Tibshirani, 1996) ⇒ Robert Tibshirani. (1996). “Regression Shrinkage and Selection via the Lasso.” In: Journal of the Royal Statistical Society, Series B, 58(1).
    • Consider the usual regression situation: we have data [math]\displaystyle{ (\mathbf{x}^i, y_i), i=1,2,...,N \ , }[/math] where [math]\displaystyle{ \mathbf{x}^i=(x_{i1},..., x_{ip})^T }[/math] and [math]\displaystyle{ y_i }[/math] are the regressors and response for the ith observation. The ordinary least squares (OLS) estimates are obtained by minimizing the residual squared error. There are two reasons why the data analyst is often not satisfied with the OLS estimates. The first is prediction accuracy: the OLS estimates often have low bias but large variance; prediction accuracy can sometimes be improved by shrinking or setting to 0 some coefficients. By doing so we sacrifice a little bias to reduce the variance of the predicted values and hence may improve the overall prediction accuracy. The second reason is interpretation. With a large number of predictors, we often would like to determine a smaller subset that exhibits the strongest effects.

      The two standard techniques for improving the OLS estimates, subset selection and ridge regression, both have drawbacks. Subset selection provides interpretable models but can be extremely variable because it is a discrete process: regressors are either retained or dropped from the model. Small changes in the data can result in very different models being selected and this can reduce its prediction accuracy. Ridge regression is a continuous process that shrinks coefficients and hence is more stable; however, it does not set any coefficients to 0 and hence does not give an easily interpretable model.
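The contrast drawn above between high-variance OLS and ridge shrinkage can be illustrated numerically. The following is a minimal NumPy sketch on hypothetical data with correlated regressors (the data and penalty value are assumptions for the example; the lasso itself is not shown):

```python
import numpy as np

# Hypothetical data with a shared component across columns, inducing
# multicollinearity, which inflates the variance of OLS estimates.
rng = np.random.default_rng(1)
n, p = 50, 5
Z = rng.normal(size=(n, p))
X = Z + 0.95 * rng.normal(size=(n, 1))  # common factor -> correlated regressors
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

# OLS: minimizes the residual sum of squares.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: adds an L2 penalty, shrinking coefficients continuously
# toward zero, but never setting any of them exactly to zero.
lam = 10.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

The ridge solution has strictly smaller Euclidean norm than the OLS solution, yet every coefficient remains nonzero, which is exactly the interpretability drawback Tibshirani notes and which the lasso's L1 penalty addresses.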