1999 EfficientProgressiveSampling

(Provost et al., 1999) ⇒ Foster Provost, David Jensen, and Tim Oates. (1999). “Efficient Progressive Sampling.” In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-1999). doi:10.1145/312129.312188

Subject Headings: Progressive Sampling Algorithm, Large Training Set Learning Algorithm.

Notes

Cited By

Quotes

Abstract

Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size rarely is obvious. We analyze methods for progressive sampling - using progressively larger samples as long as model accuracy improves. We explore several notions of efficient progressive sampling. We analyze efficiency relative to induction with all instances; we show that a simple, geometric sampling schedule is asymptotically optimal, and we describe how best to take into account prior expectations of accuracy convergence. We then describe the issues involved in instantiating an efficient progressive sampler, including how to detect convergence. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling can be remarkably efficient.
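The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `geometric_schedule`, `progressive_sample`, `evaluate`, and `min_gain` are hypothetical, and the fixed-threshold stopping rule is a simplifying assumption standing in for the convergence detection the abstract mentions. The sketch assumes a geometric schedule of the form n0, a·n0, a²·n0, ... capped at the full data set size.

```python
import math
import random


def geometric_schedule(n0, a, n_max):
    """Return sample sizes n0, a*n0, a^2*n0, ... ending with n_max.

    n0 (starting size) and a (growth factor) are illustrative parameters.
    """
    if n0 < 1 or a <= 1 or n_max < n0:
        raise ValueError("need n0 >= 1, a > 1, and n_max >= n0")
    sizes = []
    n = n0
    while n < n_max:
        sizes.append(n)
        n = int(math.ceil(n * a))
    sizes.append(n_max)  # the last step always uses all instances
    return sizes


def progressive_sample(instances, evaluate, schedule, min_gain=0.001, seed=0):
    """Evaluate growing samples until accuracy stops improving.

    evaluate(sample) is a caller-supplied function (hypothetical) that
    trains a model on `sample` and returns its accuracy.
    Assumption: convergence is declared when the gain between two
    consecutive sample sizes drops below min_gain.
    Returns (sample_size, accuracy) for the last sample evaluated.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # each prefix is a random sample
    prev_acc = None
    for n in schedule:
        acc = evaluate(shuffled[:n])
        if prev_acc is not None and acc - prev_acc < min_gain:
            return n, acc  # gain too small: treat as converged
        prev_acc = acc
    return schedule[-1], prev_acc
```

With a growth factor of 2 the total number of instances processed across all steps is a geometric series, so it stays within a constant factor of the final sample size. In practice `evaluate` would wrap an induction algorithm and a held-out test set.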

References

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
1999 EfficientProgressiveSampling	Foster Provost, David Jensen, Tim Oates			Efficient Progressive Sampling			http://dx.doi.org/10.1145/312129.312188	10.1145/312129.312188