Linear Least-Squares L1-Regularized Regression System


A Linear Least-Squares L1-Regularized Regression System is a regularized linear regression system (and, more specifically, a least-squares regression system) that implements a LASSO algorithm to solve a LASSO regression task.



References

2017b

  • (Scikit-Learn, 2017) ⇒ "1.1.3. Lasso" http://scikit-learn.org/stable/modules/linear_model.html#lasso
    • QUOTE: The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts due to its tendency to prefer solutions with fewer parameter values, effectively reducing the number of variables upon which the given solution is dependent. For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero weights (see Compressive sensing: tomography reconstruction with L1 prior (Lasso)).

      Mathematically, it consists of a linear model trained with [math]\displaystyle{ \ell_1 }[/math] prior as regularizer. The objective function to minimize is:

      [math]\displaystyle{ \underset{w}{\min\,} { \frac{1}{2n_{samples}} ||X w - y||_2 ^ 2 + \alpha ||w||_1} }[/math]

      The lasso estimate thus solves the minimization of the least-squares penalty with [math]\displaystyle{ \alpha ||w||_1 }[/math] added, where [math]\displaystyle{ \alpha }[/math] is a constant and [math]\displaystyle{ ||w||_1 }[/math] is the [math]\displaystyle{ \ell_1 }[/math]-norm of the parameter vector.
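
      For illustration, a minimal usage sketch of this objective with scikit-learn's Lasso estimator (the toy data and the alpha value are illustrative assumptions, not from the documentation):

        from sklearn.linear_model import Lasso
        import numpy as np

        # Toy data: 20 samples, 5 features; only the first two carry signal.
        rng = np.random.RandomState(0)
        X = rng.randn(20, 5)
        y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(20)

        # alpha is the constant multiplying the l1-norm in the objective above.
        lasso = Lasso(alpha=0.1)
        lasso.fit(X, y)
        print(lasso.coef_)  # typically sparse: noise coefficients driven to exactly zero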

2016

  • (Jain, 2016) ⇒ Aarshay Jain. (2016). "A Complete Tutorial on Ridge and Lasso Regression in Python." https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/#four
    • QUOTE:

      LASSO stands for Least Absolute Shrinkage and Selection Operator. I know it doesn't give much of an idea, but there are two key words here: 'absolute' and 'selection'.

      Lasso regression performs L1 regularization, i.e., it adds a penalty equal to the sum of the absolute values of the coefficients to the optimization objective. Thus, lasso regression optimizes the following:

      Objective = RSS + α * (sum of absolute value of coefficients)

      Here, α (alpha) works similarly to that of ridge regression and trades off minimizing the RSS against the magnitude of the coefficients. Like that of ridge, α can take various values; let's iterate over them briefly:

      1. α = 0: Same coefficients as simple linear regression
      2. α = ∞: All coefficients zero (same logic as before)
      3. 0 < α < ∞: coefficient magnitudes between 0 and those of simple linear regression
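
      These three regimes can be checked empirically. A minimal sketch, assuming scikit-learn (α = 0 is run through LinearRegression, since passing alpha=0 to Lasso is discouraged; the data are made up):

        import numpy as np
        from sklearn.linear_model import Lasso, LinearRegression

        rng = np.random.RandomState(1)
        X = rng.randn(50, 3)
        y = X @ np.array([1.5, -2.0, 0.0]) + 0.05 * rng.randn(50)

        print(LinearRegression().fit(X, y).coef_)  # α = 0: plain least squares
        print(Lasso(alpha=0.05).fit(X, y).coef_)   # small α: coefficients shrunk toward zero
        print(Lasso(alpha=100.0).fit(X, y).coef_)  # very large α: all coefficients exactly zero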

      Yes, it appears very similar to ridge so far. But just hang on with me and you'll know the difference by the time we finish. Like before, let's run lasso regression on the same problem as above. First we'll define a generic function: (...).
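
      The generic function itself is elided in the quote above. A hypothetical reconstruction, assuming NumPy arrays and scikit-learn (the name lasso_regression and the returned layout are illustrative, not taken from the tutorial):

        import numpy as np
        from sklearn.linear_model import Lasso

        def lasso_regression(X, y, alpha):
            # Fit a lasso model at the given alpha (illustrative helper).
            model = Lasso(alpha=alpha, max_iter=100000)
            model.fit(X, y)
            # Report the RSS (the data-fit term of the quoted objective)
            # followed by the fitted intercept and coefficients.
            rss = np.sum((model.predict(X) - y) ** 2)
            return [rss, model.intercept_] + list(model.coef_)

      Calling such a helper over a grid of α values reproduces the three regimes listed above. Note that scikit-learn's Lasso scales the RSS term by 1/(2·n_samples), so its alpha is not on exactly the same scale as the α in the "Objective = RSS + α * (sum of absolute value of coefficients)" formulation.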