sklearn.ensemble.GradientBoostingClassifier

From GM-RKB

A sklearn.ensemble.GradientBoostingClassifier is a Gradient Boosting Classification System within the sklearn.ensemble module.

1) Import the Gradient Tree Boosting Classification System from scikit-learn: from sklearn.ensemble import GradientBoostingClassifier
2) Create design matrix X and response vector Y
3) Create a Gradient Tree Boosting Classifier object: BC = GradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, ...)
4) Choose method(s) (a worked sketch follows this list):
  • apply(X), applies trees in the ensemble to X, returning leaf indices.
  • decision_function(X), computes the decision function of X.
  • fit(X, y[, sample_weight, monitor]), fits the gradient boosting model.
  • get_params([deep]), gets parameters for this estimator.
  • predict(X), predicts class for X.
  • predict_log_proba(X), predicts class log-probabilities for X.
  • predict_proba(X), predicts class probabilities for X.
  • score(X, y[, sample_weight]), returns the mean accuracy on the given test data and labels.
  • set_params(**params), sets the parameters of this estimator.
  • staged_decision_function(X), computes decision function of X for each iteration.
  • staged_predict(X), predicts class at each stage for X.
  • staged_predict_proba(X), predicts class probabilities at each stage for X.
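
A minimal end-to-end sketch of the recipe above. The synthetic dataset (via make_classification) and the train/test split are illustrative assumptions, not part of the original recipe:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import GradientBoostingClassifier

    # 2) Design matrix X and response vector Y (synthetic data for illustration)
    X, Y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

    # 3) Create the Gradient Tree Boosting Classifier object
    BC = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                    n_estimators=100, max_depth=3)

    # 4) Fit the model, then apply some of the methods listed above
    BC.fit(X_train, Y_train)
    print(BC.predict(X_test[:5]))        # predicted classes
    print(BC.predict_proba(X_test[:5]))  # class probabilities
    print(BC.score(X_test, Y_test))      # mean accuracy on held-out data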


References

2017a

  • (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html Retrieved:2017-10-22.
    • QUOTE: class sklearn.ensemble.GradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto')

       Gradient Boosting for classification.

       GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced. Read more in the User Guide.
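The stage-wise construction described above can be observed with the staged_* methods, which replay the additive model one boosting iteration at a time. A short sketch, reusing BC, X_test, and Y_test from the example above (accuracy_score is an assumed choice for scoring each stage):

    from sklearn.metrics import accuracy_score

    # staged_predict yields predictions after 1, 2, ..., n_estimators stages
    staged_acc = [accuracy_score(Y_test, y_pred)
                  for y_pred in BC.staged_predict(X_test)]
    print(len(staged_acc))                # n_estimators entries (100)
    print(staged_acc[0], staged_acc[-1])  # first-stage vs. final-stage accuracy

    # Binary classification: only a single regression tree is induced per stage
    print(BC.estimators_.shape)           # (100, 1)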

2017b

  • (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/ensemble.html Retrieved:2017-10-22.
    • QUOTE: The module sklearn.ensemble provides methods for both classification and regression via gradient boosted regression trees. (…) The disadvantages of GBRT are:
      • Scalability, due to the sequential nature of boosting it can hardly be parallelized.
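Because each stage is fit on the negative gradient of the loss given all previous stages, the n_estimators boosting iterations cannot run in parallel. One way to see (and work with) this sequential structure is warm_start=True, which keeps the already-fitted trees and appends further stages instead of retraining from scratch. A hedged sketch, reusing X_train and Y_train from the first example:

    from sklearn.ensemble import GradientBoostingClassifier

    # Fit 50 stages first
    clf = GradientBoostingClassifier(n_estimators=50, warm_start=True,
                                     random_state=0)
    clf.fit(X_train, Y_train)
    print(len(clf.estimators_))  # 50 fitted stages

    # Request 50 more stages; fit() continues from stage 51 rather than restarting
    clf.n_estimators = 100
    clf.fit(X_train, Y_train)
    print(len(clf.estimators_))  # 100 fitted stages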