# Ordinary Least Squares Regression Algorithm

An Ordinary Least Squares Regression Algorithm is a least squares regression algorithm that can be implemented by an ordinary least-squares regression system to solve an ordinary least-squares regression task (that minimizes the sum of squared distances between the observed response variables and the regressed model's fitted response variables against some regression dataset) whose input is an un-regularized regression model.

**AKA:** Un-Regularized OLS.

**Context:**
- It can range from being an Ordinary Least Squares Linear Regression Algorithm to being an Ordinary Least Squares Non-Linear Regression Algorithm (for nonlinear least-squares).
- It can be implemented by an Ordinary Least Squares System (to solve an ordinary least-squares task).
- It can produce an Ordinary Least Squares Estimate.
- It can (typically) be a Brittle Regression Algorithm (that is not robust to outliers).
- …
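
The brittleness noted above can be illustrated with a minimal sketch, assuming a single regressor with an intercept. The helper name `fit_line` and the data are hypothetical and chosen only for illustration; the fit uses the standard closed-form slope (centered cross-product divided by centered sum of squares of the regressor).

```python
def fit_line(x, y):
    """Return (intercept, slope) of the OLS line y = a + b*x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Illustrative data: points lying exactly on y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_clean = [2.0 * xi for xi in x]
# Same data with the last response replaced by one gross outlier.
y_outlier = y_clean[:-1] + [50.0]

clean_fit = fit_line(x, y_clean)
outlier_fit = fit_line(x, y_outlier)
print("clean fit (intercept, slope):  ", clean_fit)
print("outlier fit (intercept, slope):", outlier_fit)
```

In this toy data set, changing one of five responses moves the fitted slope from 2 to 10, because squared residuals give large deviations a disproportionate weight.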

**Counter-Example(s):**

**See:** Function Fitting Algorithm, Optimization Algorithm, Residual Squared Error.

## References

### 2014

- http://statsmodels.sourceforge.net/stable/regression.html
- QUOTE: Linear models with independently and identically distributed errors, and for errors with heteroscedasticity or autocorrelation. This module allows estimation by ordinary least squares (OLS), weighted least squares (WLS), generalized least squares (GLS), and feasible generalized least squares with autocorrelated AR(p) errors.

### 2011

- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Linear_regression#Estimation_methods
- QUOTE: … **Ordinary least squares** (OLS) is the simplest and thus most common estimator. It is conceptually simple and computationally straightforward. OLS estimates are commonly used to analyze both experimental and observational data. The OLS method minimizes the sum of squared residuals, and leads to a closed-form expression for the estimated value of the unknown parameter *β*: [math]\displaystyle{ \hat\beta = (X'X)^{-1} X'y = \big(\, \tfrac{1}{n}{\textstyle\sum} x_i x'_i \,\big)^{-1} \big(\, \tfrac{1}{n}{\textstyle\sum} x_i y_i \,\big) }[/math] The estimator is unbiased and consistent if the errors have finite variance and are uncorrelated with the regressors^{[1]}: [math]\displaystyle{ \operatorname{E}[\,x_i\varepsilon_i\,] = 0. }[/math] It is also efficient under the assumption that the errors have finite variance and are homoscedastic, meaning that [math]\displaystyle{ \operatorname{E}[\,\varepsilon_i^2 \mid x_i\,] }[/math] does not depend on *i*. The condition that the errors are uncorrelated with the regressors will generally be satisfied in an experiment, but in the case of observational data, it is difficult to exclude the possibility of an omitted covariate *z* that is related to both the observed covariates and the response variable. The existence of such a covariate will generally lead to a correlation between the regressors and the response variable, and hence to an inconsistent estimator of *β*. The condition of homoscedasticity can fail with either experimental or observational data. If the goal is either inference or predictive modeling, the performance of OLS estimates can be poor if multicollinearity is present, unless the sample size is large. In simple linear regression, where there is only one regressor (with a constant), the OLS coefficient estimates have a simple form that is closely related to the correlation coefficient between the covariate and the response.


- ↑ Lai, T.L.; Robbins, H.; Wei, C.Z. (1978). "Strong consistency of least squares estimates in multiple regression". *Proceedings of the National Academy of Sciences USA* **75** (7).
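
The closed-form expression quoted above, [math]\displaystyle{ \hat\beta = (X'X)^{-1} X'y }[/math], can be sketched in plain Python by solving the normal equations [math]\displaystyle{ (X'X)\beta = X'y }[/math] directly. This is a minimal illustration, not a production implementation: the function names `ols_normal_equations` and `sum_squared_residuals` and the data are assumptions made for this example, and numerical libraries typically prefer QR or SVD based solvers over forming [math]\displaystyle{ X'X }[/math] explicitly.

```python
def ols_normal_equations(X, y):
    """Solve (X'X) beta = X'y; X is a list of rows, y a list of responses.

    Hypothetical helper for illustration only.
    """
    n, p = len(X), len(X[0])
    # Build X'X (p x p) and X'y (length p).
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfectly collinear regressors)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[a][p] / aug[a][a] for a in range(p)]


def sum_squared_residuals(X, y, beta):
    """Sum of squared differences between observed and fitted responses."""
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Illustrative data: a constant column plus one regressor.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[1.0, x] for x in xs]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta_hat = ols_normal_equations(X, y)
print("estimated [intercept, slope]:", beta_hat)
print("sum of squared residuals:", sum_squared_residuals(X, y, beta_hat))
```

For this data the estimates come out at roughly 0.05 for the intercept and 1.99 for the slope. Any other coefficient vector gives a larger sum of squared residuals, which is the sense in which the estimator is "least squares".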

### 2009

- http://en.wikipedia.org/wiki/Ordinary_least_squares
- In Statistics and Econometrics, **ordinary least squares** (OLS) is a technique for estimating the unknown parameters in a linear regression model. This method minimizes the sum of squared distances between the observed responses in a set of data, and the fitted responses from the regression model. The linear least squares computational technique provides simple expressions for the estimated parameters in an OLS analysis, and hence for associated statistical values such as the standard errors of the parameters. OLS can mathematically be shown to be an optimal estimator in certain situations, and is closely related to the generalized least squares (GLS) estimation approach that is optimal in a broader set of situations. OLS can be derived as a maximum likelihood estimator under the assumption that the data are normally distributed, however the method has good statistical properties for a much broader class of distributions.


### 1996

- (Tibshirani, 1996) ⇒ Robert Tibshirani. (1996). “Regression Shrinkage and Selection via the Lasso.” In: Journal of the Royal Statistical Society, Series B, 58(1).
- Consider the usual regression situation: we have data [math]\displaystyle{ (\mathbf{x}^i, y_i), i=1,2,...,N \ , }[/math] where [math]\displaystyle{ \mathbf{x}^i=(x_{i1},..., x_{ip})^T }[/math] and [math]\displaystyle{ y_i }[/math] are the regressors and response for the *i*th observation. The ordinary least squares (OLS) estimates are obtained by minimizing the residual squared error. There are two reasons why the data analyst is often not satisfied with the OLS estimates. The first is *prediction accuracy*: the OLS estimates often have low bias but large variance; prediction accuracy can sometimes be improved by shrinking or setting to 0 some coefficients. By doing so we sacrifice a little bias to reduce the variance of the predicted values and hence may improve the overall prediction accuracy. The second reason is *interpretation*. With a large number of predictors, we often would like to determine a smaller subset that exhibits the strongest effects. The two standard techniques for improving the OLS estimates, subset selection and ridge regression, both have drawbacks. Subset selection provides interpretable models but can be extremely variable because it is a discrete process - regressors are either retained or dropped from the model. Small changes in the data can result in very different models being selected and this can reduce its prediction accuracy. Ridge regression is a continuous process that shrinks coefficients and hence is more stable: however, it does not set any coefficients to 0 and hence does not give an easily interpretable model.
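
The contrast between un-regularized OLS and ridge shrinkage described in the passage above can be sketched for the simplest case. The sketch assumes one centered regressor and an unpenalized intercept, where the ridge slope has the closed form [math]\displaystyle{ S_{xy} / (S_{xx} + \lambda) }[/math]. The function names, the penalty values, and the data are hypothetical and are not taken from the cited paper.

```python
def centered_sums(x, y):
    """Return (Sxx, Sxy): centered sum of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, sxy


def ols_slope(x, y):
    """Un-regularized OLS slope for one regressor plus intercept."""
    sxx, sxy = centered_sums(x, y)
    return sxy / sxx


def ridge_slope(x, y, lam):
    """Ridge slope with penalty lam >= 0 on the slope only (intercept unpenalized)."""
    sxx, sxy = centered_sums(x, y)
    return sxy / (sxx + lam)


# Illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print("OLS slope:", ols_slope(x, y))
for lam in (1.0, 10.0, 100.0):
    # The slope shrinks toward 0 as lam grows, but never reaches exactly 0.
    print("ridge slope, lam =", lam, "->", ridge_slope(x, y, lam))
```

With a penalty of zero the ridge slope coincides with the OLS slope. Larger penalties shrink it continuously toward zero without ever setting it exactly to zero, which matches the stability and interpretability trade-off described in the quote.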
