Bootstrapped Resampling Algorithm
A Bootstrap Algorithm is a Resampling Algorithm that estimates the sampling distribution of a statistic (or the properties of an estimator) by repeatedly drawing samples with replacement from the observed dataset.
- AKA: Bootstrapping, Bootstrap.
- Context:
- It can be used to improve the Accuracy of a Supervised Learning Algorithm by Resampling with replacement.
- It can be used to solve an Estimation Task.
- It produces a bootstrap distribution (the distribution of the statistic across the resamples).
- It is a generalization of the Jackknife Algorithm.
- Counter-Example(s):
- See: Accuracy, Cross-Validation.
References
2009
- http://en.wikipedia.org/wiki/Bootstrapping_(statistics)
- QUOTE: In Statistics, bootstrapping is a modern, computer-intensive, general-purpose approach to Statistical Inference, falling within a broader class of resampling methods. Bootstrapping is the practice of estimating properties of an Estimator (such as its variance) by measuring those properties when sampling from an approximating distribution. One standard choice for an approximating distribution is the empirical distribution of the observed data. In the case where a set of observations can be assumed to be from an Independent and Identically Distributed population, this can be implemented by constructing a number of resamples of the observed dataset (and of equal size to the observed dataset), each of which is obtained by Random Sampling with Replacement from the original dataset. It may also be used for constructing hypothesis tests. It is often used as an alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible or requires very complicated formulas for the calculation of standard errors.
- The advantage of bootstrapping over analytical methods is its great simplicity: it is straightforward to apply the bootstrap to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients. The disadvantage of bootstrapping is that while (under some conditions) it is asymptotically consistent, it does not provide general finite-sample guarantees, and has a tendency to be overly optimistic. The apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bootstrap analysis (e.g. independence of samples), whereas these would be more formally stated in other approaches.
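A minimal sketch of this idea in Python, assuming only NumPy: draw resamples of the same size as the observed data with replacement, recompute the statistic on each resample, and read a standard error and a percentile confidence interval off the resulting bootstrap distribution. The function and variable names here are illustrative, not taken from the quoted source.

```python
import numpy as np

def bootstrap_se_ci(data, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Bootstrap standard error and percentile confidence interval for `statistic`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Each resample has the same size as the observed data and is drawn with
    # replacement from the empirical distribution of the observations.
    stats = np.array([statistic(data[rng.integers(0, n, size=n)])
                      for _ in range(n_resamples)])
    se = stats.std(ddof=1)
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return se, (lower, upper)

# Example: standard error and 95% interval for the sample mean of skewed data.
sample = np.random.default_rng(1).exponential(scale=2.0, size=50)
se, ci = bootstrap_se_ci(sample, np.mean)
print(f"bootstrap SE: {se:.3f}, 95% percentile CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Increasing the number of resamples mainly reduces Monte Carlo noise in the estimate; it does not remove the finite-sample limitations noted above.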
2006
- (Xia, 2006a) ⇒ Fei Xia. (2006). "Bootstrapping." Course Lecture. LING 572 - Advanced Statistical Methods in Natural Language Processing
2002
- (Gabor Melli, 2002) ⇒ Gabor Melli. (2002). "PredictionWorks' Data Mining Glossary." PredictionWorks.
- QUOTE: Bootstrap: A technique used to estimate a model's accuracy. Bootstrap performs [math]\displaystyle{ b }[/math] experiments with a training set that is randomly sampled from the data set. Finally, the technique reports the average and standard deviation of the accuracy achieved on each of the [math]\displaystyle{ b }[/math] runs. Bootstrap differs from cross-validation in that test sets across experiments will likely share some rows, while cross-validation is guaranteed to test each row in the data set once and only once. See also accuracy, resampling techniques and cross-validation.
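The glossary's procedure can be sketched as follows (an illustrative example assuming NumPy and scikit-learn are available; scoring each run on the out-of-bag rows, i.e. the rows not drawn into that run's training set, is one common convention that the glossary entry does not itself specify):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def bootstrap_accuracy(model, X, y, b=30, seed=0):
    """Run b bootstrap experiments and return the mean and std of the accuracy."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accuracies = []
    for _ in range(b):
        train_idx = rng.integers(0, n, size=n)            # training rows, sampled with replacement
        oob_idx = np.setdiff1d(np.arange(n), train_idx)   # rows never drawn in this run
        if oob_idx.size == 0:                             # extremely unlikely, but keeps the run valid
            continue
        model.fit(X[train_idx], y[train_idx])
        accuracies.append(float((model.predict(X[oob_idx]) == y[oob_idx]).mean()))
    return float(np.mean(accuracies)), float(np.std(accuracies, ddof=1))

X, y = load_iris(return_X_y=True)
mean_acc, std_acc = bootstrap_accuracy(DecisionTreeClassifier(random_state=0), X, y, b=30)
print(f"bootstrap accuracy estimate: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Kohavi (1995), cited below, compares variants of this accuracy estimator with cross-validation.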
1995
- (Kohavi, 1995) ⇒ Ron Kohavi. (1995). "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection." In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI 1995).
- QUOTE: We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap.
1993
- (Efron & Tibshirani, 1993) ⇒ Bradley Efron, and Robert Tibshirani. (1993). "An Ïntroduction to the Bootstrap." Chapman and Hall. ISBN:0412042312
- Keywords: standard error, confidence intervals, jackknife estimate, cross-validation, delta method, permutation test, Fisher information, histogram, exponential family, random variable, bootstrap computations, empirical distribution function, nonparametric, null hypothesis, bioequivalence, bootstrap samples, estimate bias, LSAT, standard deviation, importance sampling
1979
- (Efron, 1979) ⇒ Bradley Efron. (1979). "Bootstrap Methods: Another Look at the Jackknife." In: The Annals of Statistics, 7(1). http://www.jstor.org/stable/2958830
- QUOTE: ... given a random sample [math]\displaystyle{ \mathbf{X} = (X_1, X_2, ..., X_n) }[/math] from an unknown probability distribution [math]\displaystyle{ F }[/math], estimate the sampling distribution of some prespecified random variable [math]\displaystyle{ R(\mathbf{X}, F) }[/math], on the basis of the observed data [math]\displaystyle{ \mathbf{x} }[/math]. (Standard jackknife theory gives an approximate mean and variance in the case [math]\displaystyle{ R(\mathbf{X},F)=\theta(\hat{F})-\theta(F), \theta }[/math] some parameter of interest.) A general method, called the "bootstrap," is introduced, and shown to work satisfactorily on a variety of estimation problems. The jackknife is shown to be a linear approximation method for the bootstrap. The exposition proceeds by a series of examples: variance of the sample median, error rates in a linear discriminant analysis, ratio estimation, estimating regression parameters, etc.
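Efron's opening example, the variance of the sample median under the empirical distribution [math]\displaystyle{ \hat{F} }[/math], can be sketched as follows (an illustrative reconstruction assuming NumPy, not code from the paper):

```python
import numpy as np

def bootstrap_median_variance(x, n_resamples=5000, seed=0):
    """Bootstrap estimate of the sampling variance of the sample median."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    # The empirical distribution of x stands in for the unknown F:
    # resample with replacement and recompute the median on each resample.
    medians = np.array([np.median(x[rng.integers(0, n, size=n)])
                        for _ in range(n_resamples)])
    return medians.var(ddof=1)

x = np.random.default_rng(2).normal(loc=0.0, scale=1.0, size=25)
print("bootstrap variance of the sample median:", bootstrap_median_variance(x))
```

The same loop, with the median replaced by any other statistic of interest, approximates the sampling distribution of the random variable [math]\displaystyle{ R(\mathbf{X}, F) }[/math] described above.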