2000 LessIsMore

Subject Headings: Active Learning Task, Support Vector Machines, Document Classification Task.

Notes

It reports the surprising result that more training data, especially when filled with outliers, can reduce performance.
It bases this on learning curves for an active learning task.
It proposes that better performance can sometimes be achieved by using fewer training examples.

We describe a simple active learning heuristic which greatly enhances the generalization behavior of support vector machines (SVMs) on several practical document classification tasks. We observe a number of benefits, the most surprising of which is that a SVM trained on a well-chosen subset of the available corpus frequently performs better than one trained on all available data. The heuristic for choosing this subset is simple to compute, and makes no use of information about the test set. Given that the training time of SVMs depends heavily on the training set size, our heuristic not only offers better performance with fewer data, it frequently does so in less time than the naive approach of training on all available data.

There are many uses for a good document classifier — sorting mail into mailboxes, filtering spam or routing news articles. The problem is that learning to classify documents requires manually labelling more documents than a typical user can tolerate. This makes it an obvious target for active learning, where we can let the system ask for labels only on the documents which will most help the classifier learn. (See Tong and Koller (2000) in this volume for parallel research on this topic.)
In this paper, we describe the application of active learning to a support vector machine (SVM) document classifier. Although one can define an “optimal” (but greedy) active learner for SVMs, it is computationally impractical to implement. Instead, we use the simple, computationally efficient heuristic of labeling examples that lie closest to the SVM’s dividing hyperplane. Testing this heuristic on several domains, we observe a number of results, some of which are quite surprising. Compared with a SVM trained on randomly selected examples, the active learning heuristic provides significantly better generalization performance for a given number of training examples.
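The selection rule described above (query the unlabeled example closest to the dividing hyperplane) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear SVM whose weight vector `w` and bias `b` are available explicitly, and the names `distance_to_hyperplane`, `select_query`, `active_learn`, `train`, and `oracle` are hypothetical helpers introduced only for this example.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Fall back to 1.0 so an all-zero weight vector does not divide by zero.
    norm = math.sqrt(sum(wi * wi for wi in w)) or 1.0
    return abs(score) / norm


def select_query(w, b, pool, candidates):
    """Pick the candidate index whose example lies closest to the hyperplane."""
    return min(candidates, key=lambda i: distance_to_hyperplane(w, b, pool[i]))


def active_learn(train, oracle, pool, seed, n_queries):
    """Hypothetical active learning loop.

    train(X, y) -> (w, b) : assumed to fit a linear SVM (not shown here)
    oracle(i)   -> label  : assumed to return the true label of pool[i]
    """
    labeled = {i: oracle(i) for i in seed}
    for _ in range(n_queries):
        candidates = [i for i in range(len(pool)) if i not in labeled]
        if not candidates:
            break
        indices = list(labeled)
        w, b = train([pool[i] for i in indices], [labeled[i] for i in indices])
        query = select_query(w, b, pool, candidates)
        labeled[query] = oracle(query)  # ask for one label, then retrain
    return labeled
```

In this sketch the classifier is retrained after every query. Any routine that returns a separating hyperplane could be passed as `train`; a kernel SVM would need its decision function value in place of the explicit `w` and `b`.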


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2000 LessIsMore	Greg Schohn, David Cohn			Less is More: Active Learning with Support Vector Machines		Proceedings of the Seventeenth International Conference on Machine Learning	http://www.cs.cmu.edu/~cohn/papers/alsvm.ps.gz			2000