# Linear Least-Squares L1-Regularized Regression Task

## References

### 2017a

• (Zhang, 2017) ⇒ Xinhua Zhang (2017). “Regularization” in “Encyclopedia of Machine Learning and Data Mining” (Sammut & Webb, 2017) pp. 1083–1088 ISBN: 978-1-4899-7687-1, DOI: 10.1007/978-1-4899-7687-1_718
• QUOTE: A common approach to regularization is to penalize a model by its complexity measured by some real-valued function, e.g., a certain “norm” of $\mathbf{w}$. We list some examples below.

L1 regularization

L1 regularizer, $\left \|\mathbf{w}\right \|_{1} :=\sum _{i}\left \vert w_{i}\right \vert$, is a popular approach to finding sparse models, i.e., only a few components of $\mathbf{w}$ are nonzero, and only a corresponding small number of features are relevant to the prediction. A well-known example is the LASSO algorithm (Tibshirani, 1996), which uses an L1-regularized least-squares objective:

$\displaystyle \min_{\mathbf{w}\in \mathbb{R}^{p}} \left\|X^{\top}\mathbf{w} - \mathbf{y}\right\|^{2} + \lambda \left\|\mathbf{w}\right\|_{1}.$
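To make the objective concrete, here is a minimal sketch (not Zhang's or Tibshirani's own code) that minimizes it by cyclic coordinate descent, one common lasso solver. Assumptions: the matrix `A` stores one row per sample, so it plays the role of $X^{\top}$ above; no column of `A` is all zeros; and the names `lasso_cd`, `soft_threshold`, and `lasso_objective` are hypothetical. Each coordinate update minimizes $\|\mathbf{r}_j - \mathbf{a}_j w_j\|^2 + \lambda |w_j|$, where $\mathbf{r}_j$ is the residual with feature $j$'s contribution added back, which yields the soft-thresholding rule $w_j = S(\mathbf{a}_j^{\top}\mathbf{r}_j,\ \lambda/2)/\|\mathbf{a}_j\|^2$.

```python
from typing import List

Matrix = List[List[float]]


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t; values inside [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(A: Matrix, y: List[float], lam: float, n_iter: int = 200) -> List[float]:
    """Cyclic coordinate descent for min_w ||A w - y||^2 + lam * ||w||_1.

    A has one row per sample and one column per feature (A = X^T in the
    formula above). Assumes no column of A is all zeros.
    """
    n, p = len(A), len(A[0])
    w = [0.0] * p
    r = list(y)  # residual y - A w; equals y because w starts at 0
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # rho = a_j . (r + a_j * w_j): correlation of feature j with the partial residual
            rho = sum(A[i][j] * (r[i] + A[i][j] * w[j]) for i in range(n))
            # Setting the subgradient of ||r_j - a_j w_j||^2 + lam|w_j| to zero gives threshold lam / 2
            new_wj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= A[i][j] * delta  # keep the residual in sync with the new w_j
                w[j] = new_wj
    return w


def lasso_objective(A: Matrix, y: List[float], w: List[float], lam: float) -> float:
    """Value of ||A w - y||^2 + lam * ||w||_1."""
    resid = [sum(a * wj for a, wj in zip(row, w)) - yi for row, yi in zip(A, y)]
    return sum(e * e for e in resid) + lam * sum(abs(wj) for wj in w)


if __name__ == "__main__":
    # Illustrative data: two orthogonal features, the second only weakly related to y.
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    y = [3.0, 0.25, 3.0, 0.25]
    print(lasso_cd(A, y, lam=2.0))  # [2.5, 0.0]
```

In the usage lines, the second feature's correlation with the residual (0.5) falls below the threshold $\lambda/2 = 1$, so its weight is exactly zero, which is the sparsity effect the quote describes. A fixed iteration count keeps the sketch short; a practical solver would instead stop once the weights change by less than a tolerance.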

### 2017d

• (Scikit-Learn, 2017) ⇒ "1.1.3. Lasso" http://scikit-learn.org/stable/modules/linear_model.html#lasso
• QUOTE: The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts due to its tendency to prefer solutions with fewer parameter values, effectively reducing the number of variables upon which the given solution is dependent. For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero weights (see Compressive sensing: tomography reconstruction with L1 prior (Lasso)).
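Why the Lasso yields exactly zero coefficients, rather than merely small ones, is easiest to see in a special case. As a minimal sketch, assume an orthonormal design ($X^{\top}X = I$) and the objective $\tfrac{1}{2}\|Xw - y\|_2^2 + \alpha\|w\|_1$; expanding the square reduces it to $\tfrac{1}{2}\|w - z\|_2^2 + \alpha\|w\|_1$ plus a constant, with $z = X^{\top}y$ the ordinary least-squares estimate, so each coefficient is the soft-thresholded $z_j$. The function names and the coefficient values below are hypothetical.

```python
from typing import List


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |.|: move z toward 0 by t, returning exactly 0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_orthonormal(z: List[float], alpha: float) -> List[float]:
    """Lasso solution when X^T X = I, where z = X^T y is the least-squares estimate."""
    return [soft_threshold(zj, alpha) for zj in z]


# Hypothetical least-squares coefficients for five features.
z = [2.5, -0.25, 0.125, -1.75, 0.375]
print(lasso_orthonormal(z, alpha=0.5))  # [2.0, 0.0, 0.0, -1.25, 0.0]
```

Every coefficient whose magnitude is at most $\alpha$ is set to exactly zero, and the rest are shrunk toward zero by $\alpha$, so only two of the five features remain in the model.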

Mathematically, it consists of a linear model trained with $\ell_1$ prior as regularizer. The objective function to minimize is:

$\min_{w} \frac{1}{2n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \|w\|_1$

The lasso estimate thus solves the minimization of the least-squares penalty with $\alpha ||w||_1$ added, where $\alpha$ is a constant and $||w||_1$ is the $\ell_1$-norm of the parameter vector.
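The scikit-learn objective and Zhang's formulation differ only in how the loss term is scaled. A short derivation, assuming $X$ here is the samples-by-features matrix (the transpose of the $X$ in Zhang's $X^{\top}\mathbf{w}$) and $n_{\text{samples}} > 0$: multiplying by the positive constant $2n_{\text{samples}}$ leaves the minimizer unchanged, and

$$2n_{\text{samples}}\left(\frac{1}{2n_{\text{samples}}}\|Xw - y\|_2^2 + \alpha\|w\|_1\right) = \|Xw - y\|_2^2 + 2n_{\text{samples}}\,\alpha\,\|w\|_1,$$

so the two problems share the same solution when $\lambda = 2\,n_{\text{samples}}\,\alpha$. For example, with 100 samples, `alpha=0.01` in scikit-learn corresponds to $\lambda = 2$ in Zhang's formulation. The $1/n_{\text{samples}}$ factor keeps a given $\alpha$ comparable across data sets of different sizes.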

1. Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the lasso”. Journal of the Royal Statistical Society. Series B (methodological) 58 (1). Wiley: 267–88. http://www.jstor.org/stable/2346178.
2. Breiman, Leo. 1995. “Better Subset Regression Using the Nonnegative Garrote”. Technometrics 37 (4). Taylor & Francis, Ltd.: 373–84. doi:10.2307/1269730.
3. Tibshirani, Robert. 1997. “The lasso Method for Variable Selection in the Cox Model”. Statistics in Medicine 16: 385–95.