1999 EfficientProgressiveSampling

From GM-RKB

Subject Headings: Progressive Sampling Algorithm, Large Training Set Learning Algorithm.

Notes

Cited By

Quotes

Abstract

Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size rarely is obvious. We analyze methods for progressive sampling - using progressively larger samples as long as model accuracy improves. We explore several notions of efficient progressive sampling. We analyze efficiency relative to induction with all instances; we show that a simple, geometric sampling schedule is asymptotically optimal, and we describe how best to take into account prior expectations of accuracy convergence. We then describe the issues involved in instantiating an efficient progressive sampler, including how to detect convergence. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling can be remarkably efficient.
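The geometric sampling schedule the abstract describes can be sketched in a few lines: train on samples of size n0, a·n0, a²·n0, … and stop when accuracy stops improving. The function names and parameters below (`train_and_score`, `n0`, `a`, `eps`) are illustrative placeholders, not the paper's notation; convergence detection here is a simple accuracy-gain threshold, one of several options the paper discusses.

```python
import random

def progressive_sample(data, train_and_score, n0=100, a=2, eps=0.001):
    """Sketch of geometric progressive sampling.

    Trains on progressively larger samples (n0, a*n0, a^2*n0, ...)
    and stops once accuracy improves by no more than eps, or once
    the full dataset has been used.
    """
    prev_acc = float("-inf")
    n = n0
    while True:
        n = min(n, len(data))
        sample = random.sample(data, n)
        acc = train_and_score(sample)
        # Convergence detection: stop when the accuracy gain is small
        # or there is no more data to add.
        if acc - prev_acc <= eps or n == len(data):
            return n, acc
        prev_acc = acc
        n *= a
```

With a learning curve that plateaus, the sampler stops well short of the full dataset, which is the source of the computational savings the abstract claims.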

References

BibTeX

@inproceedings{1999_EfficientProgressiveSampling,
  author    = {Foster Provost and David Jensen and Tim Oates},
  title     = {Efficient Progressive Sampling},
  booktitle = {Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year      = {1999},
  url       = {http://dx.doi.org/10.1145/312129.312188},
  doi       = {10.1145/312129.312188},
}