1996 RegressionShrinkageAndSelViaLasso

Subject Headings: Lasso Algorithm.

Notes

Cited By

2004

Quotes

Author Keywords

Abstract

We propose a new method for estimation in linear models. The “lasso” minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
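In symbols (a restatement of the definition given in the paper, with the predictors standardized and the intercept omitted), the lasso estimate solves the constrained least squares problem

[math]\displaystyle{ \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t, }[/math]

where [math]\displaystyle{ t \ge 0 }[/math] is a tuning constant: small values of [math]\displaystyle{ t }[/math] force some coefficients to be exactly 0.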

1. Introduction

Consider the usual regression situation: we have data [math]\displaystyle{ (\mathbf{x}^i, y_i), i=1,2,...,N \ , }[/math] where [math]\displaystyle{ \mathbf{x}^i=(x_{i1},..., x_{ip})^T }[/math] and [math]\displaystyle{ y_i }[/math] are the regressors and response for the ith observation. The ordinary least squares (OLS) estimates are obtained by minimizing the residual squared error. There are two reasons why the data analyst is often not satisfied with the OLS estimates. The first is prediction accuracy: the OLS estimates often have low bias but large variance; prediction accuracy can sometimes be improved by shrinking or setting to 0 some coefficients. By doing so we sacrifice a little bias to reduce the variance of the predicted values and hence may improve the overall prediction accuracy. The second reason is interpretation. With a large number of predictors, we often would like to determine a smaller subset that exhibits the strongest effects.
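For reference (standard definition, not quoted from the paper; intercept omitted for brevity), the OLS estimates mentioned above are the unconstrained minimizers of the residual sum of squares:

[math]\displaystyle{ \hat{\beta}^{OLS} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 . }[/math]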

The two standard techniques for improving the OLS estimates, subset selection and ridge regression, both have drawbacks. Subset selection provides interpretable models but can be extremely variable because it is a discrete process: regressors are either retained or dropped from the model. Small changes in the data can result in very different models being selected, and this can reduce its prediction accuracy. Ridge regression is a continuous process that shrinks coefficients and hence is more stable; however, it does not set any coefficients to 0 and hence does not give an easily interpretable model.
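For comparison (standard definition, not quoted from the paper), ridge regression penalizes the squared size of the coefficients,

[math]\displaystyle{ \hat{\beta}^{ridge} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 , }[/math]

which shrinks all coefficients towards 0 as [math]\displaystyle{ \lambda }[/math] grows but never sets any of them exactly to 0.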

We propose a new technique, called the lasso, for 'least absolute shrinkage and selection operator'. It shrinks some coefficients and sets others to 0, and hence tries to retain the good features of both subset selection and ridge regression.
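To make this behaviour concrete, the following is a minimal sketch (not from the paper) of computing lasso estimates in the equivalent penalized form by cyclic coordinate descent with soft-thresholding; this algorithm post-dates the paper, which solves the constrained problem by quadratic programming. The synthetic data and all names below are illustrative assumptions.

import numpy as np

def soft_threshold(z, gamma):
    # Soft-thresholding operator: sign(z) * max(|z| - gamma, 0).
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    # Minimize (1/(2N)) * ||y - X beta||^2 + lam * sum_j |beta_j|
    # by cyclic coordinate descent. Assumes the columns of X are
    # standardized and y is centred, as in the paper's setup.
    N, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / N  # per-coordinate scale
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes coordinate j.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / N
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Illustrative synthetic example (values are arbitrary, not from the paper).
rng = np.random.default_rng(0)
N, p = 100, 8
X = rng.standard_normal((N, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.standard_normal(N)
y = y - y.mean()

print(np.round(lasso_coordinate_descent(X, y, lam=0.5), 2))

With a large enough penalty weight lam, several entries of the returned coefficient vector are exactly 0 while the rest are shrunken, which is exactly the mixture of selection and shrinkage described above.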



References

Tibshirani, Robert (1996). "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society: Series B (Methodological), 58(1): 267-288. http://www.cse.iitb.ac.in/~avinava/machine/papers/Regression Shrinkage and Selection via the Lasso.pdf