Large Training Set Learning Algorithm

Context:
- It can focus on "one read per record".
- It can make use of Training Record Sampling.
See: Small Training Set Learning Algorithm.
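
As an illustration of how "one read per record" and Training Record Sampling can be combined, the sketch below uses reservoir sampling (Algorithm R) to draw a fixed-size uniform sample in a single pass over the records. This is one common technique and is not necessarily the method used in the works cited under References; the function name `reservoir_sample` and its parameters `records`, `k`, and `seed` are hypothetical names chosen for this example.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw up to k records uniformly at random, reading each record once.

    Hypothetical helper for illustration; only k records are held in memory.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1) by
            # overwriting a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir
```

For example, `reservoir_sample(open_record_stream(), k=10000)` would yield a training sample of at most 10,000 records without a second pass over the data, where `open_record_stream` stands in for whatever record iterator is available. A learner that needs more than one pass can then be trained on the in-memory sample instead of the full data set.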

References

(Provost et al., 1999) ⇒ Foster Provost, David Jensen, and Tim Oates. “[http://dx.doi.org/10.1145/312129.312188 Efficient Progressive Sampling].” In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-1999).
Foster Provost, and Venkateswarlu Kolluri. (1999). “A Survey of Methods for Scaling Up Inductive Algorithms.” In: Data Mining and Knowledge Discovery Journal, 3(2).
- ABSTRACT: One of the defining challenges for the KDD research community is to enable inductive learning algorithms to mine very large databases. This paper summarizes, categorizes, and compares existing work on scaling up inductive algorithms. We concentrate on algorithms that build decision trees and rule sets, in order to provide focus and specific details; the issues and techniques generalize to other types of data mining. We begin with a discussion of important issues related to scaling up. We highlight similarities among scaling techniques by categorizing them into three main approaches. For each approach, we then describe, compare, and contrast the different constituent techniques, drawing on specific examples from published papers. Finally, we use the preceding analysis to suggest how to proceed when dealing with a large problem, and where to focus future research.

George H. John, and Brian Lent. (1997). “SIPping from the Data Firehose.” In: Proceedings of KDD 1997.
- http://robotics.stanford.edu/~gjohn/ftp/papers/sipping.ps