# Linear Function Fitting Algorithm

A Linear Function Fitting Algorithm is a parametric regression algorithm that accepts a linear probabilistic distribution and can be implemented into a linear regression system (to solve a linear regression task).

**AKA:**Linear Regression Algorithm, Rectilinear Regression Method, Supervised Linear Function Learning Technique.**Context:**- It can be used by a Linear Model Regression System (to solve a linear model regression task.
- It can range from being a Dense Linear Model Regression Algorithm to being a Sparse Linear Model Regression Algorithm.
- It consists of a minimization algorithm applied to the objective function or quality criterion of the form:
[math]E(f)=\sum _{i=1}^{n}R(y_{i},f(x_{i},{\boldsymbol \beta }))[/math] to solve the Linear Regression Task: [math]y_i=f(x_i,\beta)+\varepsilon_i[/math]

where [math]f(x_i,\beta)=\sum _{j=0}^{m}\beta _{j}\phi _{j}(x_i)[/math] is linear predictor function, [math]y_i[/math] is the response variable observed dataset, and [math]R()[/math] is a error function that may be derived as a loss function or a likelihood function.

- It can range from being a Univariate Linear Regression Algorithm to being a Multivariate Linear Regression Algorithm.
- It can range from being a Simple Linear Regression Algorithm to being a Robust Linear Regression Algorithm.
- It can range from being a Continuous Linear Regression Algorithm to being a Segmented Linear Regression Algorithm.
- It can range from being a Univariate Linear Regression Algorithm to being a Multivariate Linear Regression Algorithm.
- It can range from being a Closed-Form Linear Regression Algorithm to being an Approximate Linear Regression Algorithm.
- It can be a Stepwise Linear Regression Algorithm.
- …

**Example(s):**- Logistic Regression.
- Cox Model Regression.
- A Linear Least Squares Regression Algorithm which is based on the objective function: [math]E(f)=\sum _{i=1}^{n}|y_{i} - f(x_i,\boldsymbol\beta )|^2[/math]
- An Instrumental Variables Regression Algorithm.
- A Least Absolute Deviation Regression Algorithm.
- A Quantile Regression Algorithm.
- A Principal Component Regression Algorithm.
- A Ridge Regression Algorithm.
- A Least Angle Regression Algorithm.
- A Bayesian Linear Regression Algorithm.
- …

**Counter-Example(s):****See:**Linear Approximation, Linear Combination; Generalized Linear Model; Ridge Regression; Linear Correlation; Correlation Matrix; Gaussian Processes; Linear Predictor Function, Lasso Regression, Forward Stepwise Regression, Best-Subset Regression, Ridge Regression.

## References

### 2011

- (Quadrianto & Buntine, 2011b) ⇒ Novi Quadrianto, Wray L. Buntine. (2011). “Linear Regression.” In: (Sammut & Webb, 2011) p.603
- http://onlineregression.sdsu.edu/onlineregression11.php
- http://onlineregression.sdsu.edu/onlineregression13.php

### 2006

- (Tibshirani, 1996) ⇒ Robert Tibshirani. (1996). “Regression Shrinkage and Selection via the Lasso.” In: Journal of the Royal Statistical Society, Series B, 58(1).
- QUOTE: We propose a new method for estimation in linear models. The “lasso” minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. …
… Consider the usual regression situation: we have data [math](\mathbf{x}^i, y^i), i=1,2,...,N \ ,[/math] where [math]\mathbf{x}^i=(x_{i1},..., x_{ip})^T[/math] and [math]y_i[/math] are the regressors and response for the

*i*th observation. The ordinary least squares (OLS) estimates are obtained by minimizing the residual squared error.

- QUOTE: We propose a new method for estimation in linear models. The “lasso” minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. …

### 2009

- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Linear_regression#Estimation_methods
- QUOTE: Numerous procedures have been developed for parameter estimation and inference in linear regression. These methods differ in computational simplicity of algorithms, presence of a closed-form solution, robustness with respect to heavy-tailed distributions, and theoretical assumptions needed to validate desirable statistical properties such as consistency and asymptotic efficiency. Some of the more common estimation techniques for linear regression are summarized below.
**Ordinary least squares**(OLS) is the simplest and thus most common estimator. It is conceptually simple and computationally straightforward. OLS estimates are commonly used to analyze both experimental and observational data. The OLS method minimizes the sum of squared residuals, and leads to a closed-form expression for the estimated value of the unknown parameter*β*: [math]\hat\beta = (X'X)^{-1} X'y = \big(\, \tfrac{1}{n}{\textstyle\sum} x_i x'_i \,\big)^{-1} \big(\, \tfrac{1}{n}{\textstyle\sum} x_i y_i \,\big)[/math] The estimator is unbiased and consistent if the errors have finite variance and are uncorrelated with the regressors^{[1]}: [math] \operatorname{E}[\,x_i\varepsilon_i\,] = 0. [/math] It is also efficient under the assumption that the errors have finite variance and are homoscedastic, meaning that E[*ε*|_{i}^{2}*x*] does not depend on_{i}*i*. The condition that the errors are uncorrelated with the regressors will generally be satisfied in an experiment, but in the case of observational data, it is difficult to exclude the possibility of an omitted covariate*z*that is related to both the observed covariates and the response variable. The existence of such a covariate will generally lead to a correlation between the regressors and the response variable, and hence to an inconsistent estimator of*β*. The condition of homoscedasticity can fail with either experimental or observational data. If the goal is either inference or predictive modeling, the performance of OLS estimates can be poor if multicollinearity is present, unless the sample size is large.In simple linear regression, where there is only one regressor (with a constant), the OLS coefficient estimates have a simple form that is closely related to the correlation coefficient between the covariate and the response.

**Generalized least squares**(GLS) is an extension of the OLS method, that allows efficient estimation of*β*when either heteroscedasticity, or correlations, or both are present among the error terms of the model, as long as the form of heteroscedasticity and correlation is known independently of the data. To handle heteroscedasticity when the error terms are uncorrelated with each other, GLS minimizes a weighted analogue to the sum of squared residuals from OLS regression, where the weight for the*i*^{th}case is inversely proportional to var(*ε*). This special case of GLS is called “weighted least squares”. The GLS solution to estimation problem is [math] \hat\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y, [/math] where Ω is the covariance matrix of the errors. GLS can be viewed as applying a linear transformation to the data so that the assumptions of OLS are met for the transformed data. For GLS to be applied, the covariance structure of the errors must be known up to a multiplicative constant._{i}**Iteratively reweighted least squares**(IRLS) is used when heteroscedasticity, or correlations, or both are present among the error terms of the model, but where little is known about the covariance structure of the errors independently of the data.^{[2]}In the first iteration, OLS, or GLS with a provisional covariance structure is carried out, and the residuals are obtained from the fit. Based on the residuals, an improved estimate of the covariance structure of the errors can usually be obtained. A subsequent GLS iteration is then performed using this estimate of the error structure to define the weights. The process can be iterated to convergence, but in many cases, only one iteration is sufficient to achieve an efficient estimate of*β*.^{[3]}^{[4]}**Instrumental variables**regression (IV) can be performed when the regressors are correlated with the errors. In this case, we need the existence of some auxiliary*instrumental variables***z**_{i}such that E[*z*] = 0. If_{i}ε_{i}*Z*is the matrix of instruments, then the estimator can be given in closed form as: [math] \hat\beta = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'y [/math]**Optimal instruments**regression is an extension of classical IV regression to the situation where E[*ε*|_{i}*z*] = 0._{i}**Least absolute deviation**(LAD) regression is a robust estimation technique in that it is less sensitive to the presence of outliers than OLS (but is less efficient than OLS when no outliers are present). It is equivalent to maximum likelihood estimation under a Laplace distribution model for*ε*.^{[5]}**Quantile regression**focuses on the conditional quantiles of*y*given*X*rather than the conditional mean of y*given*X*. Linear quantile regression models a particular conditional quantile, often the conditional median, as a linear function β′*x*of the predictors.***Maximum likelihood estimation**can be performed when the distribution of the error terms is known to belong to a certain parametric family*ƒ*of probability distributions._{θ}^{[6]}When f_{θ}is a normal distribution with mean zero and variance θ, the resulting estimate is identical to the OLS estimate. GLS estimates are maximum likelihood estimates when ε follows a multivariate normal distribution with a known covariance matrix.**Adaptive estimation**. If we assume that error terms are independent from the regressors [math]\varepsilon_i \perp \mathbf{x}_i[/math], the optimal estimator is the 2-step MLE, where the first step is used to non-parametrically estimate the distribution of the error term.^{[7]}- Mixed models are widely used to analyze linear regression relationships involving dependent data when the dependencies have a known structure. Common applications of mixed models include analysis of data involving repeated measurements, such as longitudinal data, or data obtained from cluster sampling. They are generally fit as parametric models, using maximum likelihood or Bayesian estimation. In the case where the errors are modeled as normal random variables, there is a close connection between mixed models and generalized least squares.
^{[8]}Fixed effects estimation is an alternative approach to analyzing this type of data. - Principal component regression (PCR)
^{[9]}^{[10]}is used when the number of predictor variables is large, or when strong correlations exist among the predictor variables. This two-stage procedure first reduces the predictor variables using principal component analysis then uses the reduced variables in an OLS regression fit. While it often works well in practice, there is no general theoretical reason that the most informative linear function of the predictor variables should lie among the dominant principal components of the multivariate distribution of the predictor variables. The partial least squares regression is the extension of the PCR method which does not suffer from the mentioned deficiency. - Total least squares (TLS)
^{[11]}is an approach to least squares estimation of the linear regression model that treats the covariates and response variable in a more geometrically symmetric manner than OLS. It is one approach to handling the "errors in variables" problem, and is sometimes used when the covariates are assumed to be error-free. - Ridge regression,
^{[12]}^{[13]}^{[14]}and other forms of penalized estimation such as the Lasso,^{[15]}deliberately introduce bias into the estimation of*β*in order to reduce the variability of the estimate. The resulting estimators generally have lower mean squared error than the OLS estimates, particularly when multicollinearity is present. They are generally used when the goal is to predict the value of the response variable*y*for values of the predictors*x*that have not yet been observed. These methods are not as commonly used when the goal is inference, since it is difficult to account for the bias. *Least angle regression*^{[16]}is an estimation procedure for linear regression models that was developed to handle high-dimensional covariate vectors, potentially with more covariates than observations.- The Theil–Sen estimator is a simple robust estimation technique that choose the slope of the fit line to be the median of the slopes of the lines through pairs of sample points. It has similar statistical efficiency properties to simple linear regression but is much less sensitive to outliers.

- Other robust estimation techniques, including the α-trimmed mean approach, and L-, M-, S-, and R-estimators have been introduced.

- QUOTE: Numerous procedures have been developed for parameter estimation and inference in linear regression. These methods differ in computational simplicity of algorithms, presence of a closed-form solution, robustness with respect to heavy-tailed distributions, and theoretical assumptions needed to validate desirable statistical properties such as consistency and asymptotic efficiency. Some of the more common estimation techniques for linear regression are summarized below.

- ↑ Lai, T.L.; Robbins,H; Wei, C.Z. (1978). "Strong consistency of least squares estimates in multiple regression".
*Proceedings of the National Academy of Sciences USA***75**(7). - ↑ del Pino, Guido (1989). "The Unifying Role of Iterative Generalized Least Squares in Statistical Algorithms".
*Statistical Science***4**(4): 394–403. doi:10.1214/ss/1177012408. JSTOR 2245853. - ↑ Carroll, Raymond J. (1982). "Adapting for Heteroscedasticity in Linear Models".
*The Annals of Statistics***10**(4): 1224–1233. doi:10.1214/aos/1176345987. JSTOR 2240725. - ↑ Cohen, Michael; Dalal, Siddhartha R.; Tukey,John W. (1993). "Robust, Smoothly Heterogeneous Variance Regression".
*Journal of the Royal Statistical Society. Series C (Applied Statistics)***42**(2): 339–353. JSTOR 2986237. - ↑ Narula, Subhash C.; Wellington, John F. (1982). "The Minimum Sum of Absolute Errors Regression: A State of the Art Survey".
*International Statistical Review***50**(3): 317–326. doi:10.2307/1402501. JSTOR 1402501. - ↑ Lange, Kenneth L.; Little, Roderick J. A.; Taylor,Jeremy M. G. (1989). "Robust Statistical Modeling Using the t Distribution".
*Journal of the American Statistical Association***84**(408): 881–896. doi:10.2307/2290063. JSTOR 2290063. - ↑ Stone, C. J. (1975). "Adaptive maximum likelihood estimators of a location parameter".
*The Annals of Statistics***3**(2): 267–284. doi:10.1214/aos/1176343056. JSTOR 2958945. - ↑ Goldstein, H. (1986). "Multilevel Mixed Linear Model Analysis Using Iterative Generalized Least Squares".
*Biometrika***73**(1): 43–56. doi:10.1093/biomet/73.1.43. JSTOR 2336270. - ↑ Hawkins, Douglas M. (1973). "On the Investigation of Alternative Regressions by Principal Component Analysis".
*Journal of the Royal Statistical Society. Series C (Applied Statistics)***22**(3): 275–286. JSTOR 2346776. - ↑ Jolliffe, Ian T. (1982). "A Note on the Use of Principal Components in Regression".
*Journal of the Royal Statistical Society. Series C (Applied Statistics)***31**(3): 300–303. JSTOR 2348005. - ↑ Nievergelt, Yves (1994). "Total Least Squares: State-of-the-Art Regression in Numerical Analysis".
*SIAM Review***36**(2): 258–264. doi:10.1137/1036055. JSTOR 2132463. - ↑ Swindel, Benee F. (1981). "Geometry of Ridge Regression Illustrated".
*The American Statistician***35**(1): 12–15. doi:10.2307/2683577. JSTOR 2683577. - ↑ Draper, Norman R.; van Nostrand,R. Craig (1979). "Ridge Regression and James-Stein Estimation: Review and Comments".
*Technometrics***21**(4): 451–466. doi:10.2307/1268284. JSTOR 1268284. - ↑ Hoerl, Arthur E.; Kennard,Robert W.; Hoerl,Roger W. (1985). "Practical Use of Ridge Regression: A Challenge Met".
*Journal of the Royal Statistical Society. Series C (Applied Statistics)***34**(2): 114–120. JSTOR 2347363. - ↑ Tibshirani, Robert (1996). "Regression Shrinkage and Selection via the Lasso".
*Journal of the Royal Statistical Society. Series B (Methodological)***58**(1): 267–288. JSTOR 2346178. - ↑ Efron, Bradley; Hastie,Trevor; Johnstone,Iain Johnstone;Tibshirani,Robert (2004). "Least Angle Regression".
*The Annals of Statistics***32**(2): 407–451. doi:10.1214/009053604000000067. JSTOR 3448465.

### 2009

- (WordNet, 2009) ⇒ http://wordnetweb.princeton.edu/perl/webwn?s=linear%20regression
- S: (n) linear regression, rectilinear regression (the relation between variables when the regression equation is linear: e.g., y = ax + b)

### 2004

- (Hastie et al., 2004) ⇒ Trevor Hastie, Saharon Rosset, Robert Tibshirani, and Ji Zhu. (2004). “The Entire Regularization Path for the Support Vector Machine.” In: The Journal of Machine Learning Research, 5.
- QUOTE: There are many ways to fit such a linear classifier, including
**linear regression**, Fisher’s linear discriminant analysis, and logistic regression

- QUOTE: There are many ways to fit such a linear classifier, including

### 1991

- (Efron & Tibshirani, 1991) ⇒ Bradley Efron, and Robert Tibshirani. (1991). “Statistical Data Analysis in the Computer Age.” In: Science, 253(5018). 10.1126/science.253.5018.390
- QUOTE: Most of our familiar statistical methods, such as hypothesis testing,
**linear regression**, analysis of variance, and maximum likelihood estimation, were designed to be implemented on mechanical calculators. …

- QUOTE: Most of our familiar statistical methods, such as hypothesis testing,