sklearn.ensemble.GradientBoostingRegressor

From GM-RKB

A sklearn.ensemble.GradientBoostingRegressor is a Gradient Tree Boosting Regression System within the sklearn.ensemble module.

1) Import the Gradient Tree Boosting Regression System from scikit-learn: from sklearn.ensemble import GradientBoostingRegressor
2) Create design matrix X and response vector Y
3) Create a Gradient Tree Boosting Regressor object: BR = GradientBoostingRegressor([loss='ls', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, ...])
4) Choose method(s):
  • apply(X), applies trees in the ensemble to X and returns leaf indices.
  • fit(X, y[, sample_weight, monitor]), fits the gradient boosting model.
  • get_params([deep]), gets parameters for this estimator.
  • predict(X), predicts regression target for X.
  • score(X, y[, sample_weight]), returns the coefficient of determination R^2 of the prediction.
  • set_params(**params), sets the parameters of this estimator.
  • staged_predict(X), predicts regression target at each stage for X.
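The steps above can be sketched as follows; the synthetic data from make_regression and the train/test split are illustrative additions, and the loss parameter is left at its default so the sketch is not tied to one scikit-learn version:

```python
# Minimal usage sketch of GradientBoostingRegressor (synthetic data).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# 2) Create design matrix X and response vector y (synthetic here).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 3) Create the Gradient Tree Boosting Regressor object.
BR = GradientBoostingRegressor(learning_rate=0.1, n_estimators=100,
                               subsample=1.0, min_samples_split=2,
                               random_state=0)

# 4) Fit the model, predict on held-out data, and score the predictions.
BR.fit(X_train, y_train)
y_pred = BR.predict(X_test)
r2 = BR.score(X_test, y_test)  # coefficient of determination R^2
print(r2)
```

fit, predict, and score correspond directly to the methods listed above; the other methods (apply, staged_predict, get_params, set_params) operate on the same fitted object.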


References

2017a

  • (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html Retrieved:2017-10-22.
    • QUOTE: class sklearn.ensemble.GradientBoostingRegressor(loss='ls', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, alpha=0.9, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto')

       Gradient Boosting for regression.

       GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.

      Read more in the User Guide.
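The forward stage-wise fitting described in the quote can be observed with staged_predict, which yields the ensemble's predictions after each boosting stage; a minimal sketch (the synthetic data is an illustrative assumption):

```python
# Sketch: training error should shrink as boosting stages are added,
# since each stage fits a tree to the negative gradient of the loss.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
model = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

# Training error after each of the 50 stages of the additive model.
errors = [mean_squared_error(y, y_stage)
          for y_stage in model.staged_predict(X)]
print(errors[0] > errors[-1])  # later stages fit the training data better
```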

2017b

The disadvantages of GBRT are:
  • Scalability: due to the sequential nature of boosting, it can hardly be parallelized.
The sklearn.ensemble module provides methods for both classification and regression via gradient boosted regression trees.