Least Angle Regression Cross-Validation System


A Least Angle Regression Cross-Validation System is a Least Angle Regression System that implements a Cross-Validation Algorithm to solve a Linear Regression Task.



References

2017A

  • (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/linear_model.html#least-angle-regression
    • QUOTE: Least-angle regression (LARS) is a regression algorithm for high-dimensional data, developed by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani. LARS is similar to forward stepwise regression. At each step, it finds the predictor most correlated with the response. When there are multiple predictors having equal correlation, instead of continuing along the same predictor, it proceeds in a direction equiangular between the predictors.
The advantages of LARS are:
  • It is numerically efficient in contexts where p >> n (i.e., when the number of dimensions is significantly greater than the number of points)
  • It is computationally just as fast as forward selection and has the same order of complexity as ordinary least squares.
  • It produces a full piecewise linear solution path, which is useful in cross-validation or similar attempts to tune the model.
  • If two variables are almost equally correlated with the response, then their coefficients should increase at approximately the same rate. The algorithm thus behaves as intuition would expect, and also is more stable.
  • It is easily modified to produce solutions for other estimators, like the Lasso.
The disadvantages of the LARS method include:
  • Because LARS is based upon an iterative refitting of the residuals, it would appear to be especially sensitive to the effects of noise. This problem is discussed in detail by Weisberg in the discussion section of the Efron et al. (2004) Annals of Statistics article.
The LARS model can be used via the estimator Lars, or its low-level implementation lars_path.
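
A minimal sketch (illustrative only, not from the quoted source) of how these two entry points can be used on synthetic data; the dataset shape and parameter values below are assumptions chosen to show the API, not recommendations.

    # Sketch: the Lars estimator and the low-level lars_path function.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lars, lars_path

    # Synthetic high-dimensional data (p > n), the regime where LARS is efficient.
    X, y = make_regression(n_samples=50, n_features=200, n_informative=5,
                           noise=1.0, random_state=0)

    # High-level estimator: fits along the LARS path and keeps the final coefficients.
    lars = Lars(n_nonzero_coefs=5).fit(X, y)
    print("active features:", np.flatnonzero(lars.coef_))

    # Low-level path: returns every knot of the piecewise linear coefficient path.
    alphas, active, coefs = lars_path(X, y, method="lar")
    print("number of path knots:", len(alphas))

Because the path is piecewise linear, coefs holds the coefficients at every knot, which is what makes it cheap to evaluate a validation score along the whole path from a single fit.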

2017B

  • (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/grid_search.html
    • QUOTE: 3.2.4.1. Model specific cross-validation

      Some models can fit data for a range of values of some parameter almost as efficiently as fitting the estimator for a single value of the parameter. This feature can be leveraged to perform a more efficient cross-validation used for model selection of this parameter.

      The most common parameter amenable to this strategy is the parameter encoding the strength of the regularizer. In this case we say that we compute the regularization path of the estimator.

Here is the list of such models:

  • linear_model.ElasticNetCV([l1_ratio, eps, ...]): Elastic Net model with iterative fitting along a regularization path.
  • linear_model.LarsCV([fit_intercept, ...]): Cross-validated Least Angle Regression model.
  • linear_model.LassoCV([eps, n_alphas, ...]): Lasso linear model with iterative fitting along a regularization path.
  • linear_model.LassoLarsCV([fit_intercept, ...]): Cross-validated Lasso, using the LARS algorithm.
  • linear_model.LogisticRegressionCV([Cs, ...]): Logistic Regression CV (aka logit, MaxEnt) classifier.
  • linear_model.MultiTaskElasticNetCV([...]): Multi-task L1/L2 ElasticNet with built-in cross-validation.
  • linear_model.MultiTaskLassoCV([eps, ...]): Multi-task L1/L2 Lasso with built-in cross-validation.
  • linear_model.OrthogonalMatchingPursuitCV([...]): Cross-validated Orthogonal Matching Pursuit model (OMP).
  • linear_model.RidgeCV([alphas, ...]): Ridge regression with built-in cross-validation.
  • linear_model.RidgeClassifierCV([alphas, ...]): Ridge classifier with built-in cross-validation.
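
As a concrete illustration of this model-specific cross-validation, the sketch below (illustrative only; the synthetic data and fold count are assumptions, not part of the quoted source) uses LarsCV and LassoLarsCV, which select the regularization strength by cross-validating along the LARS path rather than refitting the estimator once per candidate value.

    # Sketch: cross-validated LARS estimators on synthetic data.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LarsCV, LassoLarsCV

    X, y = make_regression(n_samples=100, n_features=50, n_informative=10,
                           noise=2.0, random_state=0)

    # Cross-validated Least Angle Regression: the candidate models come from the
    # regularization path itself, so only the CV splitting needs to be specified.
    lars_cv = LarsCV(cv=5).fit(X, y)
    print("LarsCV chosen alpha:", lars_cv.alpha_)

    # Cross-validated Lasso using the LARS algorithm, reusing the same path machinery.
    lasso_lars_cv = LassoLarsCV(cv=5).fit(X, y)
    print("LassoLarsCV chosen alpha:", lasso_lars_cv.alpha_)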