Lasso Regression Algorithm


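A Lasso Regression Algorithm is a regularized linear regression algorithm that penalizes the L1-norm of the coefficient vector, which both shrinks the coefficients and performs feature selection by driving some of them to exactly zero.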



References

2015

  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Least_squares#Lasso_method Retrieved:2015-1-14.
    • An alternative regularized version of least squares is the lasso (least absolute shrinkage and selection operator), which uses the constraint that [math]\displaystyle{ \|\beta\|_1 }[/math], the L1-norm of the parameter vector, is no greater than a given value. (This is equivalent to an unconstrained minimization of the least-squares penalty with [math]\displaystyle{ \alpha\|\beta\|_1 }[/math] added.) In a Bayesian context, this is equivalent to placing a zero-mean Laplace prior distribution on the parameter vector. The optimization problem may be solved using quadratic programming or more general convex optimization methods, as well as by specific algorithms such as the least angle regression algorithm. One of the prime differences between the lasso and ridge regression is that in ridge regression, as the penalty is increased, all parameters are reduced while still remaining non-zero, whereas in the lasso, increasing the penalty drives more and more of the parameters to exactly zero. This is an advantage of the lasso over ridge regression, as driving parameters to zero deselects the corresponding features from the regression. Thus, the lasso automatically selects the more relevant features and discards the others, whereas ridge regression never fully discards any features. Several feature selection techniques build on the lasso, including Bolasso, which bootstraps samples, and FeaLect, which analyzes the regression coefficients corresponding to different values of [math]\displaystyle{ \alpha }[/math] to score all the features.

      The L1-regularized formulation is useful in some contexts due to its tendency to prefer solutions with fewer nonzero parameter values, effectively reducing the number of variables upon which the given solution is dependent. For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. An extension of this approach is elastic net regularization.
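The contrast between the lasso and ridge regression described above can be illustrated with a minimal sketch. It assumes the penalized objectives (1/2)‖y − Xβ‖² + α‖β‖₁ for the lasso and (1/2)‖y − Xβ‖² + (α/2)‖β‖² for ridge, and it uses cyclic coordinate descent, which is one possible solver and not one named in the quoted text. The function names, the toy data, and the choice α = 1 are all illustrative assumptions.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values inside [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def coordinate_descent(X, y, alpha, penalty="l1", n_sweeps=100):
    """Cyclic coordinate descent for L1- or L2-penalized least squares.

    X is a list of rows, y a list of responses. Assumes no column of X
    is entirely zero (otherwise the L1 update would divide by zero).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                z += X[i][j] ** 2
            if penalty == "l1":
                beta[j] = soft_threshold(rho, alpha) / z  # lasso update
            else:
                beta[j] = rho / (z + alpha)               # ridge update
    return beta


# Toy data (hypothetical): the second feature has only a weak effect on y.
X = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [3.1, 2.9, 3.1, 2.9]

lasso_beta = coordinate_descent(X, y, alpha=1.0, penalty="l1")
ridge_beta = coordinate_descent(X, y, alpha=1.0, penalty="l2")
print("lasso:", lasso_beta)  # second coefficient is exactly zero
print("ridge:", ridge_beta)  # second coefficient is small but non-zero
```

On this toy problem the lasso sets the weak coefficient to exactly zero, while ridge only shrinks it, which is the feature-deselection behavior the passage describes.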


2011

  • http://www-stat.stanford.edu/~tibs/lasso.html
    • The Lasso is a shrinkage and selection method for linear regression. It minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients. It has connections to soft-thresholding of wavelet coefficients, forward stagewise regression, and boosting methods.


  • http://en.wikipedia.org/wiki/Least_squares#LASSO_method
    • In some contexts a regularized version of the least squares solution may be preferable. The LASSO (least absolute shrinkage and selection operator) algorithm, for example, finds a least-squares solution with the constraint that [math]\displaystyle{ \|\beta\|_1 }[/math], the L1-norm of the parameter vector, is no greater than a given value. Equivalently, it may solve an unconstrained minimization of the least-squares penalty with [math]\displaystyle{ \alpha\|\beta\|_1 }[/math] added, where [math]\displaystyle{ \alpha }[/math] is a constant (this is the Lagrangian form of the constrained problem). This problem may be solved using quadratic programming or more general convex optimization methods, as well as by specific algorithms such as the least angle regression algorithm. The L1-regularized formulation is useful in some contexts due to its tendency to prefer solutions with fewer nonzero parameter values, effectively reducing the number of variables upon which the given solution is dependent. For this reason, the LASSO and its variants are fundamental to the field of compressed sensing.
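The two formulations mentioned in the quoted passages can be written out explicitly. The symbols β and α come from the text; the response vector y, the design matrix X, the bound t, and the number of coefficients p are notation assumed here for illustration.

```latex
% Constrained form: bound t >= 0 on the L1-norm of the coefficients
\hat{\beta} = \arg\min_{\beta} \; \|y - X\beta\|_2^2
\quad \text{subject to} \quad \|\beta\|_1 \le t

% Lagrangian (penalized) form with constant \alpha >= 0
\hat{\beta} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \alpha \|\beta\|_1,
\qquad \|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|
```

For each bound t there is a corresponding α ≥ 0 that yields the same solution; a tighter bound (smaller t) corresponds to a larger α.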

2009

2008

2007

2005

  • (Tibshirani et al., 2005) ⇒ Robert Tibshirani, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. (2005). “Sparsity and Smoothness via the Fused Lasso.” In: Journal of the Royal Statistical Society (Series B), 67(1).
    • ABSTRACT: The lasso penalizes a least squares regression by the sum of the absolute values ($L_1$-norm) of the coefficients. The form of this penalty encourages sparse solutions (with many coefficients equal to 0). We propose the 'fused lasso', a generalization that is designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes the $L_1$-norm of both the coefficients and their successive differences. Thus it encourages sparsity of the coefficients and also sparsity of their differences - i.e. local constancy of the coefficient profile. The fused lasso is especially useful when the number of features p is much greater than N, the sample size. The technique is also extended to the 'hinge' loss function that underlies the support vector classifier. We illustrate the methods on examples from protein mass spectroscopy and gene expression data.
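The fused lasso penalty described in the abstract can be sketched as a small function. This is a sketch in penalized (Lagrangian) form; the weights `lam1` and `lam2`, the function name, and the example coefficient vectors are illustrative assumptions rather than notation taken from the abstract.

```python
def fused_lasso_penalty(beta, lam1, lam2):
    """Return lam1 * sum|beta_j| + lam2 * sum|beta_j - beta_{j-1}|.

    The first term encourages sparse coefficients; the second encourages
    sparse successive differences, i.e. a locally constant profile.
    Assumes the features (and hence beta) have a meaningful ordering.
    """
    l1_term = sum(abs(b) for b in beta)
    diff_term = sum(abs(beta[j] - beta[j - 1]) for j in range(1, len(beta)))
    return lam1 * l1_term + lam2 * diff_term


# Two hypothetical profiles with the same L1-norm (6.0):
piecewise_constant = [0.0, 0.0, 2.0, 2.0, 2.0, 0.0]
oscillating = [2.0, 0.0, 2.0, 0.0, 2.0, 0.0]

print(fused_lasso_penalty(piecewise_constant, lam1=1.0, lam2=1.0))  # 10.0
print(fused_lasso_penalty(oscillating, lam1=1.0, lam2=1.0))         # 16.0
```

Both profiles have the same sum of absolute values, but the oscillating one pays a larger difference penalty, so the fused lasso favors the locally constant profile.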

1996