sklearn.ensemble.BaggingClassifier

From GM-RKB

A sklearn.ensemble.BaggingClassifier is a Bagging Classification System within the sklearn.ensemble module.

1) Import Bagging Classification System from scikit-learn: from sklearn.ensemble import BaggingClassifier
2) Create design matrix X and response vector Y
3) Create Bagging Classifier object: BC=BaggingClassifier(base_estimator=None, n_estimators=10[, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, ...])
4) Choose method(s):
  • decision_function(X), calculates the average of the decision functions of the base classifiers.
  • fit(X, y[, sample_weight]), builds a Bagging ensemble of estimators from the training set (X, y).
  • get_params([deep]), gets parameters for this estimator.
  • predict(X), predicts class for X.
  • predict_log_proba(X), predicts class log-probabilities for X.
  • predict_proba(X), predicts class probabilities for X.
  • score(X, y[, sample_weight]), returns the mean accuracy on the given test data and labels.
  • set_params(**params), sets the parameters of this estimator.
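
The four steps above can be sketched end to end as follows. This is a minimal illustration, not a reference implementation: the synthetic dataset from make_classification, the train/test split, and random_state=0 are assumptions chosen only to make the example reproducible. No base estimator is passed, so the default (a decision tree) is used; note that newer scikit-learn releases name this keyword estimator rather than base_estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Step 2: synthetic design matrix X and response vector y (illustrative data only).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 3: create the classifier; with no base estimator given,
# a decision tree is used for each ensemble member.
BC = BaggingClassifier(n_estimators=10, random_state=0)

# Step 4: fit the ensemble, then call the prediction and scoring methods.
BC.fit(X_train, y_train)
labels = BC.predict(X_test)               # predicted class per test row
probabilities = BC.predict_proba(X_test)  # one column per class
accuracy = BC.score(X_test, y_test)       # mean accuracy on the test split

print(labels[:5])
print(probabilities[:2])
print(accuracy)
```

decision_function(X) is left out of this sketch because it is only available when the base estimator itself implements decision_function, which a decision tree does not.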


References

2017a

  • (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html Retrieved: 2017-10-22.
    • QUOTE: class sklearn.ensemble.BaggingClassifier(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=1, random_state=None, verbose=0)

      A Bagging classifier.

      A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregate their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

      This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [R154]. If samples are drawn with replacement, then the method is known as Bagging [R155]. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [R156]. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [R157].

      Read more in the User Guide.
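
The four sampling strategies named in the quote (Pasting, Bagging, Random Subspaces, Random Patches) can be approximated through the constructor parameters. The sketch below shows one plausible mapping; the 50% subset sizes, the synthetic data, and the choice to draw without replacement for Random Patches are assumptions made for illustration, not settings prescribed by the quoted documentation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative data only: 200 samples, 10 features.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# One possible parameter setting per strategy (subset sizes are assumptions).
variants = {
    # Subsets of samples, drawn without replacement.
    "Pasting": dict(max_samples=0.5, bootstrap=False),
    # Samples drawn with replacement (the defaults).
    "Bagging": dict(max_samples=1.0, bootstrap=True),
    # All samples, but a random subset of features per estimator.
    "Random Subspaces": dict(max_features=0.5, bootstrap=False),
    # Subsets of both samples and features.
    "Random Patches": dict(max_samples=0.5, max_features=0.5, bootstrap=False),
}

fitted = {}
for name, params in variants.items():
    model = BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=10, random_state=0, **params
    )
    fitted[name] = model.fit(X, y)
    # Training accuracy, printed only to show that each variant runs.
    print(name, model.score(X, y))
```

After fitting, the estimators_features_ attribute lists the feature indices each ensemble member was trained on, which is a convenient way to confirm that feature subsampling took effect.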

2017b

  • (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/ensemble.html#bagging-meta-estimator Retrieved: 2017-10-22.
    • QUOTE: In ensemble algorithms, bagging methods form a class of algorithms which build several instances of a black-box estimator on random subsets of the original training set and then aggregate their individual predictions to form a final prediction. These methods are used as a way to reduce the variance of a base estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it. In many cases, bagging methods constitute a very simple way to improve with respect to a single model, without making it necessary to adapt the underlying base algorithm. As they provide a way to reduce overfitting, bagging methods work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods which usually work best with weak models (e.g., shallow decision trees).

      Bagging methods come in many flavours but mostly differ from each other by the way they draw random subsets of the training set:

      • When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [B1999].
      • When samples are drawn with replacement, then the method is known as Bagging [B1996].
      • When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [H1998].
      • Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [LG2012].

      In scikit-learn, bagging methods are offered as a unified BaggingClassifier meta-estimator (resp. BaggingRegressor), taking as input a user-specified base estimator along with parameters specifying the strategy to draw random subsets. In particular, max_samples and max_features control the size of the subsets (in terms of samples and features), while bootstrap and bootstrap_features control whether samples and features are drawn with or without replacement. When using a subset of the available samples the generalization accuracy can be estimated with the out-of-bag samples by setting oob_score=True. As an example, the snippet below illustrates how to instantiate a bagging ensemble of KNeighborsClassifier base estimators, each built on random subsets of 50% of the samples and 50% of the features.
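
The snippet mentioned at the end of the quoted passage is not reproduced on this page. A minimal sketch consistent with that description might look as follows. The KNeighborsClassifier base estimator and the two 0.5 fractions come from the quoted text; the iris dataset, random_state=0, and oob_score=True are assumptions added here so that the out-of-bag estimate mentioned in the same passage can be shown.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data only: 150 samples, 4 features.
X, y = load_iris(return_X_y=True)

# Each k-NN member is trained on 50% of the samples and 50% of the features.
bagging = BaggingClassifier(
    KNeighborsClassifier(),
    max_samples=0.5,
    max_features=0.5,
    oob_score=True,   # assumption: added to expose the out-of-bag estimate
    random_state=0,
)
bagging.fit(X, y)

# Accuracy estimated from samples each member did not see during training.
print(bagging.oob_score_)
```

The base estimator is passed positionally because the keyword is base_estimator in the release quoted here and estimator in newer releases. The out-of-bag estimate requires bootstrap=True, which is the default.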

2017c

2017d

  • (Sammut & Webb, 2017) ⇒ "Ensemble Learning". In: "Encyclopedia of Machine Learning and Data Mining"(Editors: Claude Sammut, Geoffrey I. Webb) pp 393-402.
    • QUOTE: In the Bagging algorithm (Breiman 1996) each member of the ensemble is constructed from a different training dataset, and the predictions combined either by uniform averaging or voting over class labels. Each dataset is generated by sampling from the total N data examples, choosing N items uniformly at random with replacement. Each sample is known as a bootstrap; the name Bagging is an acronym derived from Bootstrap AGGregatING. Since a bootstrap samples N items uniformly at random with replacement, the probability of any individual data item not being selected is [math]\displaystyle{ p = (1 - 1/N)^N }[/math]. Therefore with large N, a single bootstrap is expected to contain approximately 63.2% of the original set, while 36.8% of the originals are not selected.

      Like many ensemble methods, Bagging works best with unstable models, that is those that produce differing generalization behavior with small changes to the training data. These are also known as high variance models, examples of which are decision trees and neural networks. Bagging therefore tends not to work well with very simple models. In effect, Bagging samples randomly from the space of possible models to make up the ensemble – with very simple models the sampling produces almost identical (low diversity) predictions.

      Despite its apparent capability for variance reduction, situations have been demonstrated where Bagging can converge without affecting variance (see Brown et al. 2005). Several other explanations have been proposed for Bagging’s success, including links to Bayesian model averaging. In summary, it seems that several years from its introduction, despite its apparent simplicity, Bagging is still not fully understood.
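
The bootstrap figures in the Sammut & Webb quote follow from the fact that (1 - 1/N)^N approaches 1/e (about 0.368) as N grows. The sketch below checks this numerically and by simulation, using only the Python standard library; the function names, the chosen values of N, and the seed are illustrative assumptions.

```python
import math
import random


def prob_not_selected(n):
    """Probability that a given item is absent from a bootstrap of size n."""
    return (1.0 - 1.0 / n) ** n


def simulated_unique_fraction(n, seed=0):
    """Draw n indices with replacement; return the fraction of distinct items drawn."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n


for n in (10, 100, 10_000):
    # Closed-form miss probability next to the simulated coverage of one bootstrap.
    print(n, prob_not_selected(n), simulated_unique_fraction(n))

# Limiting value of the miss probability as n grows.
print(math.exp(-1))
```

For large N the simulated coverage settles near 1 - 1/e, which matches the roughly 63.2% quoted above.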

2017e