# Linear Regression Task

A Linear Regression Task is a regression analysis task that is based on linear predictor functions.

**AKA:** Supervised Linear Function Fitting, Generalized Linear Regression Task.

**Context:**

- Task Input: an N-observed Numerically-Labeled Training Dataset, [math]D=\{(x_1,y_1),(x_2,y_2),\cdots,(x_n,y_n)\}[/math], that can be represented by

- [math]\mathbf{Y}[/math], a response variable continuous dataset.
- [math]\mathbf{X}[/math], a predictor variables continuous dataset.

- Task Output:

- [math]\boldsymbol{\beta}[/math], the estimated linear model parameters vector, a continuous dataset.
- [math]\mathbf{Y^*}[/math], the Fitted Linear Function values, a continuous dataset of the predicted values.
- [math]\sum_{i=1}^n||\hat{y}_i - y_i||^2[/math], the sum of squared errors, a continuous (scalar) value.
- [math]\sigma_x,\sigma_y,\rho_{X,Y},\ldots[/math], standard deviations, correlation coefficient, standard error of estimate and other statistical information about the fitted parameters.

**Task Requirements:**

- It requires solving the linear model equation:

- [math]y_i=f(x_i,\boldsymbol\beta)+\varepsilon_i\quad[/math] with [math]f(x_i,\boldsymbol\beta)=\sum_{j=0}^{p}\beta_{j}\phi_{j}(x_i)[/math] for [math]\quad i=1,\cdots,n \;[/math] and [math]j=0,\cdots,p[/math].
- In this regression function, [math]f(X)=f(x_i,\boldsymbol\beta)[/math] is a parametric regression function that is a linear combination of the [math]p+1[/math] regression coefficients [math]\beta_j[/math] (parameters) and the basis functions [math]\phi_j(x_i)[/math]. Usually, the basis functions are polynomial, i.e. [math]\phi_j(x_i)=x_i^{j}[/math].

- or, in matrix form,
- [math]\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U},[/math] where [math]\mathbf{Y}=y_i[/math] is the measurement matrix, [math]\mathbf{X}=\phi_{j}(x_i)[/math] is the design matrix, [math]\mathbf{B}=\boldsymbol{\beta}=\beta_j[/math] is the parameters matrix and [math]\mathbf{U}=\varepsilon_i[/math] is the errors matrix. That is,
[math]\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_p(x_1) \\ \phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_p(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(x_n) & \phi_1(x_n) & \cdots & \phi_p(x_n) \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}+\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}[/math]

- by estimating the best-fitting [math]\boldsymbol\beta[/math] parameters that optimize the following objective function:
[math]E(f)=\sum _{i=1}^{n}L(y_{i},f(x_{i},{\boldsymbol \beta }))[/math]

where [math]L(\cdot)[/math] is an error function that may be derived as a loss function or as the negative of a likelihood function.
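The matrix formulation and objective function above can be sketched in code. The following is a minimal illustration, assuming polynomial basis functions [math]\phi_j(x)=x^j[/math] and a squared-error loss for [math]L[/math]; the function names and the sample data are hypothetical and chosen only for this example.

```python
import numpy as np


def polynomial_design_matrix(x, degree):
    """Return the n x (degree + 1) design matrix with entries phi_j(x_i) = x_i**j."""
    x = np.asarray(x, dtype=float)
    # increasing=True orders the columns as x**0, x**1, ..., x**degree.
    return np.vander(x, N=degree + 1, increasing=True)


def fit_least_squares(X, y):
    """Estimate beta by minimising the squared-error objective ||y - X beta||^2."""
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def sum_squared_errors(X, y, beta):
    """Evaluate the objective E(f) for the squared-error loss."""
    residuals = np.asarray(y, dtype=float) - X @ beta
    return float(residuals @ residuals)


# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 6.0, 17.0, 34.0, 57.0]

X = polynomial_design_matrix(x, degree=2)
beta = fit_least_squares(X, y)
sse = sum_squared_errors(X, y, beta)
```

Because the sample data contain no noise, the recovered coefficients should match the generating polynomial up to floating-point error. Other loss functions would need a different solver than `np.linalg.lstsq`.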

- (optional) It may require a Measurement Error Model.
- It may require a regression diagnostic test to determine the goodness of fit of the regression model and the statistical significance of the estimated parameters.

- It can be solved by a Linear Regression System that implements a linear regression algorithm.

- It can also be defined as a function fitting task that requires the production of a (best-fitting) fitted linear function.
- It can range from being a Manual Linear Regression Task to being an Automated Linear Regression Task.

**Example(s):**

- A numerical experiment resulted in the four [math](x, y)[/math] data points [math]\{(1, 6), (2, 5), (3, 7), (4, 10)\}[/math]; find a line [math]y=\beta_1+\beta_2 x[/math] that best fits these four points, e.g. ⇒ [math]y=3.5+1.4x[/math].
- A Simple Linear Regression Task: [math]y_i=\beta_0+\beta_1 x_i+\varepsilon_i,\quad i=1,\cdots ,n\;[/math],
[math]\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots\\ 1 & x_n \\ \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \end{pmatrix}+\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}[/math]

- The regression problem: [math]y_i=\beta_0+\beta_1 x_i+\beta_2x_i^2+\varepsilon_i,\quad i=1,\cdots ,n\;[/math],
[math]\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1&x_1^2\\ 1 & x_2&x_2^2 \\ \vdots & \vdots&\vdots\\ 1 & x_n&x_n^2 \\ \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2\\ \end{pmatrix}+\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}[/math]

Although it includes a quadratic term in [math]x_i[/math], this model is still linear in the regression parameters.

- A Multivariate Linear Regression Task.
- A Regularized Linear Regression Task.
- A Linear Least-Squares Regression Task.
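As a worked check of the four-point example, assume that "best fits" means least squares (the example itself does not name the criterion). The best-fitting line [math]y=\beta_1+\beta_2 x[/math] then satisfies the normal equations [math]\mathbf{X}^{\top}\mathbf{X}\boldsymbol\beta=\mathbf{X}^{\top}\mathbf{Y}[/math], built from [math]n=4[/math], [math]\sum x_i=10[/math], [math]\sum x_i^2=30[/math], [math]\sum y_i=28[/math] and [math]\sum x_i y_i=77[/math]:

[math]\begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} 28 \\ 77 \end{pmatrix}[/math]

[math]\beta_1=\frac{28\cdot 30-10\cdot 77}{4\cdot 30-10^2}=\frac{70}{20}=3.5,\qquad \beta_2=\frac{4\cdot 77-10\cdot 28}{4\cdot 30-10^2}=\frac{28}{20}=1.4[/math]

This reproduces the line [math]y=3.5+1.4x[/math], with a sum of squared errors of [math]1.1^2+1.3^2+0.7^2+0.9^2=4.2[/math].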

**Counter-Example(s):**

**See:** Curve Fitting, System of Linear Equations, Linear Model.

## References

### 2017

- (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/linear_model.html Retrieved: 2017-07-30.
- QUOTE: The following are a set of methods intended for regression in which the target value is expected to be a linear combination of the input variables. In mathematical notion, if [math]\hat{y}[/math] is the predicted value.
[math]\hat{y}(w, x) = w_0 + w_1 x_1 + \cdots + w_p x_p[/math]

Across the module, we designate the vector [math]w = (w_1,\cdots, w_p)[/math] as `coef_` and [math]w_0[/math] as `intercept_`.

### 2014

- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/linear_regression Retrieved: 2014-11-23.
- In statistics, **linear regression** is an approach for modeling the relationship between a scalar dependent variable *y* and one or more explanatory variables denoted *X*. The case of one explanatory variable is called *simple linear regression*. For more than one explanatory variable, the process is called *multiple linear regression*. (This term should be distinguished from *multivariate linear regression*, where multiple correlated dependent variables are predicted, rather than a single scalar variable.)

In linear regression, data are modeled using linear predictor functions, and unknown model parameters are estimated from the data. Such models are called *linear models*. Most commonly, linear regression refers to a model in which the conditional mean of *y* given the value of *X* is an affine function of *X*. Less commonly, linear regression could refer to a model in which the median, or some other quantile of the conditional distribution of *y* given *X* is expressed as a linear function of *X*. Like all forms of regression analysis, *linear regression* focuses on the conditional probability distribution of *y* given *X*, rather than on the joint probability distribution of *y* and *X*, which is the domain of multivariate analysis.

Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine.

Linear regression has many practical uses. Most applications fall into one of the following two broad categories:

- If the goal is prediction, or forecasting, or reduction, linear regression can be used to fit a predictive model to an observed data set of *y* and *X* values. After developing such a model, if an additional value of *X* is then given without its accompanying value of *y*, the fitted model can be used to make a prediction of the value of *y*.
- Given a variable *y* and a number of variables [math]X_1, \ldots, X_p[/math] that may be related to *y*, linear regression analysis can be applied to quantify the strength of the relationship between *y* and the [math]X_j[/math], to assess which [math]X_j[/math] may have no relationship with *y* at all, and to identify which subsets of the [math]X_j[/math] contain redundant information about *y*.

Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares loss function as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous.
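The penalized least squares idea mentioned in the quote can be illustrated with ridge regression, which has a closed-form solution. The sketch below assumes the common formulation [math]\hat\beta=(\mathbf{X}^{\top}\mathbf{X}+\lambda \mathbf{I})^{-1}\mathbf{X}^{\top}y[/math] and, for simplicity, penalizes every coefficient including the intercept; the function name `ridge_fit` and the data are illustrative only.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam * I) beta = X^T y.

    Assumption: every coefficient is penalised, including any intercept
    column in X; many libraries leave the intercept unpenalised.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    gram = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)


# Four illustrative (x, y) points, with a column of ones for the intercept.
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
y = np.array([6.0, 5.0, 7.0, 10.0])

beta_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, lam=10.0)  # a larger lam shrinks the norm of beta
```

With `lam=0.0` the result is the ordinary least squares fit. Increasing `lam` shrinks the Euclidean norm of the coefficient vector, although an individual coefficient may still grow.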

### 2013

- http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Formulae-for-statistical-models
- The template for a statistical model is a linear regression model with independent, homoscedastic errors: [math]y_i = \sum_{j=0}^p \beta_j x_{ij} + e_i, \, i = 1, \ldots, n,[/math] where the [math]e_i[/math] are [math]NID(0, \sigma^2)[/math]. In matrix terms this would be written: [math]y = \mathbf{X} \beta + e[/math] where the [math]y[/math] is the response vector, [math]\mathbf{X}[/math] is the model matrix or design matrix and has columns [math]x_0, x_1, \ldots, x_p[/math], the determining variables. Very often [math]x_0[/math] will be a column of ones defining an intercept term.
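As a sketch of how this template is usually fitted, assume the ordinary least squares criterion (which the passage above does not itself name) and that [math]\mathbf{X}[/math] has full column rank. Minimizing the residual sum of squares then leads to the normal equations:

[math]S(\beta) = (y - \mathbf{X}\beta)^{\top}(y - \mathbf{X}\beta)[/math]

[math]\frac{\partial S}{\partial \beta} = -2\,\mathbf{X}^{\top}(y - \mathbf{X}\beta) = 0 \;\Rightarrow\; \mathbf{X}^{\top}\mathbf{X}\,\hat{\beta} = \mathbf{X}^{\top}y \;\Rightarrow\; \hat{\beta} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}y[/math]

Under the stated [math]NID(0, \sigma^2)[/math] errors, this estimate coincides with the maximum-likelihood estimate of [math]\beta[/math].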

### 2011a

- (Allain, 2011) ⇒ Rhett Allain. (2011). “Linear Regression by Hand.” In: Wired, 2011-01-16.
- QUOTE: It only makes sense. I did linear regression in google docs and I did it for python. But what if you neither of those? Can you do it by hand? Why yes. Suppose I take the same data from the pylab example and I imagine trying to add a linear function to represent that data. …
… you have to make up some criteria for choosing the best line. Commonly, it is chosen to pick the line such that the value of the sum of [math]d^2[/math] is minimized. … typically, the horizontal variable is your independent variable – so these might be some set values. The vertical data is typically the one with the most error (but not always). …

There. That is the basic form of linear regression by hand. Note that there ARE other ways to do this – more complicated ways (assuming different types of distributions for the data). Also, the same basic idea is followed if you want to fit some higher order polynomial. Warning, it gets complicated (algebraically) real quick.

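The "by hand" procedure described in the quote, choosing the line that minimizes the sum of squared vertical distances, reduces to two closed-form sums for a straight line. The sketch below uses only the Python standard library and assumes the standard simple linear regression formulas; it is not taken from the quoted article, and the function name and sample points are hypothetical.

```python
def fit_line_by_hand(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Uses the textbook sums: slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2) and
    intercept = (Sy - slope*Sx) / n.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1.
slope, intercept = fit_line_by_hand([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

For points that lie exactly on a line, the function recovers that line's slope and intercept. With noisy data it returns the line with the smallest sum of squared vertical distances.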

### 2011b

- (Quadrianto & Buntine, 2011) ⇒ Novi Quadrianto and Wray L. Buntine (2011). "Linear Regression" In: (Sammut & Webb, 2011) pp 747-750.
- QUOTE: (1): Linear regression is an instance of the Regression problem which is an approach to modeling a functional relationship between input variables [math]x[/math] and an output/response variable [math]y[/math]. In linear regression, a linear function of the input variables is used, and more generally a linear function of some vector function of the input variables [math]\phi(x)[/math] can also be used. The linear function estimates the mean of [math]y[/math] (or more generally the median or a quantile).
- QUOTE: (2): Formally, in a regression problem, we are interested in recovering a functional dependency [math]y_i = f(x_i) +\epsilon_i[/math] from [math]N[/math] observed training data points [math]\{(x_i , y_i )\}_{i = 1}^N[/math], where [math]y_i\,\in \mathbb{R}[/math] is the noisy observed output at input location [math]x_i\,\in\, \mathbb{R}^d[/math]. For the linear parametric technique, we tackle this regression problem by parameterizing the latent regression function [math]f(\cdot)[/math] by a parameter [math]w\,\in\,\mathbb{R}^H[/math], that is, [math]f(x_i) := \langle\phi(x_i), w\rangle[/math] for [math]H[/math] fixed basis functions [math]\{\phi_h (x_i )\}_{h = 1}^H[/math]. Note that the function is a linear function of the weight vector [math]w[/math]. The simplest form of the linear parametric model is when [math]\phi(x_i)=x_i\,\in \mathbb{R}^d[/math], that is, the model is also linear with respect to the input variables, [math]f(x_i) := w_0 + w_1x_{i1} + \cdots + w_d x_{id}[/math]. Here the weight [math]w_0[/math] allows for any constant offset in the data. With general basis functions such as polynomials, exponentials, sigmoids, or even more sophisticated Fourier or wavelets bases, we can obtain a regression function which is nonlinear with respect to the input variables although still linear with respect to the parameters.
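The last point, a model that is nonlinear in the inputs yet linear in the weights, can be sketched with a small trigonometric basis. The basis choice [math]\phi(x)=(1, \sin x, \cos x)[/math], the helper name `design_matrix` and the synthetic weights below are assumptions made only for illustration.

```python
import numpy as np


def design_matrix(x, basis_functions):
    """Stack phi_h(x) column by column to form the N x H matrix Phi."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([phi(x) for phi in basis_functions])


# Hypothetical fixed basis: a constant offset plus one sine and one cosine term.
basis = [np.ones_like, np.sin, np.cos]

# Synthetic, noise-free targets generated from known weights.
x = np.linspace(0.0, 6.0, 20)
true_w = np.array([0.5, 2.0, -1.0])
Phi = design_matrix(x, basis)
y = Phi @ true_w

# f(x) = <phi(x), w> is linear in w, so ordinary least squares still applies.
w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

Because the targets are a noise-free linear combination of the basis functions, `w_hat` should match `true_w` up to floating-point error, even though the fitted curve is not a straight line in `x`.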