2008 OneStepSparseEstInNonconcPenLikMods

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Rejoinder

  • Boosting minimizes an empirical loss function via iterative functional gradient descent.
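As a reading aid, here is a minimal Python sketch of that idea for the squared loss, assuming regression stumps as base learners and an illustrative shrinkage step `nu`; it is only an instance of functional gradient descent, not the boosting procedure discussed in the rejoinder.

```python
# Minimal sketch of boosting as functional gradient descent (L2 loss); the
# stump base learner and step size nu are illustrative assumptions.
import numpy as np

def fit_stump(x, r):
    """Fit a one-split regression stump to pseudo-residuals r on a 1-D feature x."""
    best = None
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= s, left.mean(), right.mean())
        sse = np.sum((r - pred) ** 2)
        if best is None or sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    _, s, cl, cr = best
    return lambda xnew: np.where(xnew <= s, cl, cr)

def l2_boost(x, y, n_steps=100, nu=0.1):
    """Each step fits a stump to the negative gradient of the squared loss
    (the current residuals) and takes a small step nu in that direction."""
    F = np.zeros_like(y, dtype=float)
    learners = []
    for _ in range(n_steps):
        residual = y - F              # negative gradient of 0.5*(y - F)^2
        h = fit_stump(x, residual)    # approximate the gradient direction
        F += nu * h(x)
        learners.append(h)
    return lambda xnew: nu * sum(h(xnew) for h in learners)

# toy usage
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)
f_hat = l2_boost(x, y)
```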

Abstract

Fan and Li propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging, because the objective function is nondifferentiable and nonconcave. In this article, we propose a new unified algorithm based on the local linear approximation (LLA) for maximizing the penalized likelihood for a broad class of concave penalty functions. Convergence and other theoretical properties of the LLA algorithm are established. A distinguished feature of the LLA algorithm is that at each LLA step, the LLA estimator can naturally adopt a sparse representation. Thus, we suggest using the one-step LLA estimator from the LLA algorithm as the final estimates. Statistically, we show that if the regularization parameter is appropriately chosen, the one-step LLA estimates enjoy the oracle properties with good initial estimators. Computationally, the one-step LLA estimation methods dramatically reduce the computational cost in maximizing the nonconcave penalized likelihood. We conduct some Monte Carlo simulation to assess the finite sample performance of the one-step sparse estimation methods. The results are very encouraging.
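As a reading note (the quoted abstract does not state the formula), the local linear approximation linearizes the penalty at an initial value β_j^{(0)}:

```latex
p_{\lambda}\bigl(|\beta_j|\bigr) \;\approx\;
p_{\lambda}\bigl(|\beta_j^{(0)}|\bigr)
+ p'_{\lambda}\bigl(|\beta_j^{(0)}|\bigr)\,\bigl(|\beta_j| - |\beta_j^{(0)}|\bigr),
\qquad \text{for } \beta_j \approx \beta_j^{(0)}.
```

Up to an additive constant, the penalty term then reduces to the weight p'_λ(|β_j^{(0)}|) times |β_j|, so each LLA step is an L1-penalized (weighted LASSO) problem; this is where the sparse representation mentioned in the abstract comes from.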


1. Introduction.

Variable selection and feature extraction are fundamental for knowledge discovery and predictive modeling with high-dimensionality (Fan and Li [13]). The best subset selection procedure, along with traditional model selection criteria such as AIC and BIC, becomes infeasible for feature selection from high-dimensional data due to its prohibitive computational cost. Furthermore, best subset selection suffers from several drawbacks, the most severe of which is its lack of stability, as analyzed in Breiman [4]. The LASSO (Tibshirani [32]) method utilizes the L1 penalty to automatically select significant variables via continuous shrinkage, thus retaining the good features of both best subset selection and ridge regression. In the same spirit as the LASSO, the penalized likelihood with nonconcave penalty functions has been proposed to select significant variables for various parametric models, including generalized linear regression models and robust linear regression models (Fan and Li [10] and Fan and Peng [15]), and for some semiparametric models, such as the Cox model and partially linear models (Fan and Li [11, 12] and Cai, Fan, Li and Zhou [5]). Fan and Li [10] provide deep insights into how to select a penalty function. They further advocate the use of penalty functions satisfying certain mathematical conditions such that the resulting penalized likelihood estimate possesses the properties of sparsity, continuity and unbiasedness. These mathematical conditions imply that the penalty function has to be singular at the origin and nonconvex over (0,∞). In the aforementioned work, it has been shown that when the regularization parameter is appropriately chosen, the nonconcave penalized likelihood estimates perform as well as the oracle procedure in terms of selecting the correct subset model and estimating the true nonzero coefficients.
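A minimal sketch of one such penalty, the SCAD of Fan and Li [10] (with their suggested default a = 3.7), may help make the sparsity and unbiasedness conditions concrete; the code below is an illustrative rendering of the standard formulas, not the authors' own software.

```python
# Minimal sketch of the SCAD penalty and its derivative (Fan and Li [10]).
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lambda(|t|): L1-like near zero, constant for large |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(
        t <= lam,
        lam * t,
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
            lam ** 2 * (a + 1) / 2,
        ),
    )

def scad_derivative(t, lam, a=3.7):
    """p'_lambda(t) = lam*[ I(t<=lam) + (a*lam - t)_+ / ((a-1)*lam) * I(t>lam) ]."""
    t = np.abs(np.asarray(t, dtype=float))
    return lam * (
        (t <= lam).astype(float)
        + np.maximum(a * lam - t, 0) / ((a - 1) * lam) * (t > lam)
    )

# The derivative stays at lam > 0 as t -> 0+ (singularity at the origin, hence
# sparsity) and is 0 for t > a*lam (large coefficients are left nearly unbiased).
print(scad_derivative(np.array([0.01, 5.0]), lam=0.5))
```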

Although nonconcave penalized likelihood approaches have promising theoretical properties, the singularity and nonconvexity of the penalty function challenge us to invent numerical algorithms that are capable of maximizing a nondifferentiable nonconcave function. Fan and Li [10] suggested iteratively and locally approximating the penalty function by a quadratic function, and referred to this approximation as the local quadratic approximation (LQA). With the aid of the LQA, the optimization of the penalized likelihood function can be carried out using a modified Newton–Raphson algorithm. However, as pointed out in Fan and Li [10] and Hunter and Li [20], the LQA algorithm shares a drawback of backward stepwise variable selection: if a covariate is deleted at any step in the LQA algorithm, it will necessarily be excluded from the final selected model (see Section 2.2 for more details). Hunter and Li [20] addressed this issue by optimizing a slightly perturbed version of the LQA, which alleviates the aforementioned drawback, but it is difficult to choose the size of the perturbation. Another strategy to overcome the computational difficulty is to use the one-step (or k-step) estimates from the iterative LQA algorithm with good starting estimators, as suggested by Fan and Li [10]. This is similar to the well-known one-step estimation argument in the maximum likelihood estimation (MLE) setting (Bickel [2], Lehmann and Casella [24], Robinson [30] and Cai, Fan, Zhou and Zhou [6]). See also Fan and Chen [9], Fan, Lin and Zhou [14] and Cai et al. [6] for some recent work on one-step estimators in local and marginal likelihood models. However, the problem with the one-step LQA estimator is that it cannot have a sparse representation, thus losing the most attractive and important property of the nonconcave penalized likelihood estimator.
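A minimal sketch of an LQA-style iteration for penalized least squares, assuming the SCAD derivative from the sketch above and an illustrative deletion threshold `eps`, shows where the backward-deletion drawback comes from; it is a simplified illustration, not the authors' algorithm.

```python
# Minimal sketch of the LQA iteration for penalized least squares with SCAD;
# the initial estimate, iteration count and threshold eps are illustrative.
import numpy as np

def scad_deriv(t, lam, a=3.7):
    t = np.abs(t)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0) / ((a - 1) * lam) * (t > lam))

def lqa_iterate(X, y, lam, n_iter=20, eps=1e-6):
    """Each step replaces the penalty by a quadratic at the current estimate and
    solves the resulting ridge-type problem; coefficients that fall below eps
    are deleted and, as noted above, can never re-enter the model."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # unpenalized initial estimate
    active = np.ones(p, dtype=bool)
    for _ in range(n_iter):
        active &= np.abs(beta) > eps              # backward-deletion drawback
        beta[~active] = 0.0
        Xa = X[:, active]
        w = scad_deriv(beta[active], lam) / np.abs(beta[active])
        # ridge-type update: (Xa'Xa + n*diag(w))^{-1} Xa'y
        beta[active] = np.linalg.solve(Xa.T @ Xa + n * np.diag(w), Xa.T @ y)
    return beta
```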

In this article we develop a methodology and theory for constructing an efficient one-step sparse estimation procedure in nonconcave penalized likelihood models. For that purpose, we first propose a new iterative algorithm based on the local linear approximation (LLA) for maximizing the nonconcave penalized likelihood. The LLA enjoys three significant advantages over the LQA and the perturbed LQA. First, in the LLA we do not have to delete any small coefficient or choose the size of perturbation in order to avoid numerical instability. Second, we demonstrate that the LLA is the best convex minorization–maximization (MM) algorithm, thus proving the convergence of the LLA algorithm by the ascent property of MM algorithms (Lange, Hunter and Yang [23]). Third, the LLA naturally produces sparse estimates via continuous penalization. We then propose using the one-step LLA estimator from the LLA algorithm as the final estimate. Computationally, the one-step LLA estimates alleviate the computational burden of the iterative algorithm and overcome the potential local maxima problem in maximizing the nonconcave penalized likelihood. In addition, we can take advantage of the efficient algorithms for solving the LASSO to compute the one-step LLA estimator. Statistically, we show that if the regularization parameter is appropriately chosen, the one-step LLA estimates enjoy the oracle properties, provided that the initial estimates are good enough. Therefore, the one-step LLA estimator can dramatically reduce the computational cost without losing statistical efficiency.
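A minimal sketch of the one-step LLA estimator for penalized least squares, assuming an ordinary least squares initial estimate and a generic coordinate-descent weighted-LASSO solver (not the LARS-type solver the paper's efficient computation refers to), might look like this:

```python
# Minimal sketch of the one-step LLA estimator with SCAD weights; the OLS
# initialization and the coordinate-descent solver are illustrative choices.
import numpy as np

def scad_deriv(t, lam, a=3.7):
    t = np.abs(t)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0) / ((a - 1) * lam) * (t > lam))

def weighted_lasso(X, y, w, n_iter=200):
    """Coordinate descent for 0.5*||y - X b||^2 + sum_j w_j |b_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_norm2 = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]        # partial residual
            z = X[:, j] @ r_j
            beta[j] = np.sign(z) * max(abs(z) - w[j], 0) / col_norm2[j]
    return beta

def one_step_lla(X, y, lam):
    """One LLA step: linearize the penalty at an initial estimate, then solve the
    resulting weighted LASSO; coefficients with zero weight stay unpenalized."""
    n = X.shape[0]
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]          # initial estimate (OLS)
    w = n * scad_deriv(beta0, lam)                        # LLA weights
    return weighted_lasso(X, y, w)

# toy usage: only the first two coefficients are truly nonzero
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 8))
y = X @ np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0]) + rng.standard_normal(100)
print(one_step_lla(X, y, lam=0.3))
```

Because the weights w_j vanish for initial estimates larger than a·λ under SCAD, large coefficients are left essentially unshrunk, while small ones are soft-thresholded to exactly zero, which is the sparse representation the paragraph above refers to.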



Hui Zou, and Runze Li. (2008). "One-Step Sparse Estimates in Nonconcave Penalized Likelihood Models (with discussion)." In: Annals of Statistics. http://www.stat.psu.edu/~rli/research/AOS0316.pdf