sklearn.ensemble.ExtraTreesClassifier

From GM-RKB

A sklearn.ensemble.ExtraTreesClassifier is an Extremely Randomized Trees Classification System within the sklearn.ensemble module.

1) Import the Extremely Randomized Trees Classification System from scikit-learn: from sklearn.ensemble import ExtraTreesClassifier
2) Create a design matrix X and a response vector y
3) Create an Extremely Randomized Trees Classifier object: clf = ExtraTreesClassifier([n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1,...])
4) Choose method(s):
  • apply(X), applies the trees in the forest to X and returns leaf indices.
  • decision_path(X), returns the decision path in the forest.
  • fit(X, y[, sample_weight]), builds a forest of trees from the training set (X, y).
  • get_params([deep]), retrieves parameters for this estimator.
  • predict(X), predicts class for X.
  • predict_log_proba(X), predicts class log-probabilities for X.
  • predict_proba(X), predicts class probabilities for X.
  • score(X, y[, sample_weight]), returns the mean accuracy on the given test data and labels.
  • set_params(**params), sets the parameters of this estimator.
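
The four steps above can be sketched end to end as follows. This is a minimal illustration, not a recommended configuration: the synthetic data from make_classification, the train/test split, and the random_state values are assumptions added for the example, and the constructor arguments simply repeat the values listed in step 3.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

# Step 2: synthetic data stands in for a real design matrix X and response vector y.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 3: create the classifier, passing the values from step 3 explicitly.
clf = ExtraTreesClassifier(
    n_estimators=10,
    criterion="gini",
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=0,  # assumption: fixed seed so repeated runs match
)

# Step 4: fit the forest, then call the prediction and inspection methods.
clf.fit(X_train, y_train)
labels = clf.predict(X_test)            # one predicted class per test row
proba = clf.predict_proba(X_test)       # one probability per class per test row
leaves = clf.apply(X_test)              # one leaf index per tree per test row
accuracy = clf.score(X_test, y_test)    # mean accuracy on the held-out rows

print(labels.shape, proba.shape, leaves.shape, accuracy)
```

Passing n_estimators and the other arguments explicitly keeps the example independent of the library's defaults, which have changed between scikit-learn releases.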


References

2017a

  • (Scikit Learn, 2017A) ⇒ http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
    • QUOTE: class sklearn.ensemble.ExtraTreesClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)

      An extra-trees classifier.

      This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

      Read more in the User Guide

      (...)

      The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
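
The effect of the size-controlling parameters mentioned in the quote can be checked by counting tree nodes. The sketch below is illustrative only: the synthetic data, the helper total_nodes (a hypothetical name introduced here), and the particular limits max_depth=4 and min_samples_leaf=5 are assumptions chosen for the example, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Assumption: synthetic data stands in for a real training set.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)


def total_nodes(forest):
    """Hypothetical helper: sum the node counts of all fitted trees."""
    return sum(tree.tree_.node_count for tree in forest.estimators_)


# Size parameters left at their defaults: trees are fully grown and unpruned.
full = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, y)

# Same forest with tree growth limited by depth and minimum leaf size.
limited = ExtraTreesClassifier(
    n_estimators=10, max_depth=4, min_samples_leaf=5, random_state=0
).fit(X, y)

full_nodes = total_nodes(full)
limited_nodes = total_nodes(limited)
print("unrestricted forest nodes:", full_nodes)
print("size-limited forest nodes:", limited_nodes)
```

Fewer nodes mean a smaller model in memory, though limiting tree size can also change predictive accuracy, so the two should be weighed together on the data at hand.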
