2010 WhyLabelWhenYouCanSearchAlterna


Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Active learning, machine learning, class imbalance, human resources, on-line advertising, micro-outsourcing.

Abstract

This paper analyses alternative techniques for deploying low-cost human resources for data acquisition for classifier induction in domains exhibiting extreme class imbalance -- where traditional labeling strategies, such as active learning, can be ineffective. Consider the problem of building classifiers to help brands control the content adjacent to their on-line advertisements. Although frequent enough to worry advertisers, objectionable categories are rare in the distribution of impressions encountered by most on-line advertisers -- so rare that traditional sampling techniques do not find enough positive examples to train effective models. An alternative way to deploy human resources for training-data acquisition is to have them “guide” the learning by searching explicitly for training examples of each class. We show that under extreme skew, even basic techniques for guided learning completely dominate smart (active) strategies for applying human resources to select cases for labeling. Therefore, it is critical to consider the relative cost of search versus labeling, and we demonstrate the tradeoffs for different relative costs. We show that in cost/skew settings where the choice between search and active labeling is equivocal, a hybrid strategy can combine the benefits.
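The cost tradeoff mentioned in the abstract can be illustrated with a back-of-envelope calculation. The sketch below is not the paper's experimental setup: the function names, the fixed per-example costs, and the assumption that a searcher returns a chosen share of positive examples are all hypothetical simplifications. It compares how many positive examples a fixed budget is expected to yield under uniform random labeling versus explicit search.

```python
# Illustrative model only: all names and cost assumptions are hypothetical,
# not taken from the paper.

def expected_positives_random(budget: float, label_cost: float, base_rate: float) -> float:
    """Expected positives when labeling examples drawn uniformly at random.

    Each label costs `label_cost`; a drawn example is positive with
    probability `base_rate` (the class skew).
    """
    return (budget / label_cost) * base_rate


def positives_guided(budget: float, search_cost: float, positive_share: float = 0.5) -> float:
    """Positives obtained when humans search for examples of a requested class.

    Assumes each searched example costs `search_cost` and that a fraction
    `positive_share` of the requested examples are positives.
    """
    return (budget / search_cost) * positive_share


def break_even_cost_ratio(base_rate: float, positive_share: float = 0.5) -> float:
    """Ratio search_cost / label_cost at which both strategies are expected
    to yield the same number of positives under this simple model."""
    return positive_share / base_rate


if __name__ == "__main__":
    budget, label_cost, base_rate = 1000.0, 1.0, 0.001  # 1-in-1000 skew (assumed)
    for search_cost in (1.0, 10.0, 100.0):
        rnd = expected_positives_random(budget, label_cost, base_rate)
        gui = positives_guided(budget, search_cost)
        print(f"search_cost={search_cost}: random={rnd:.2f}, guided={gui:.2f}")
    print("break-even cost ratio:", break_even_cost_ratio(base_rate))
```

Under these assumed numbers, search would have to cost several hundred times more per example than labeling before random sampling yields as many positives. The model ignores how informative each example is for training, which is what active learning and the hybrid strategy in the abstract are concerned with.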


References

Author: Foster Provost, Josh Attenberg
Title: Why Label When You Can Search?: Alternatives to Active Learning for Applying Human Resources to Build Classification Models under Extreme Class Imbalance
Published in: KDD-2010 Proceedings
URL: http://pages.stern.nyu.edu/~fprovost/Papers/guidedlearning-kdd2010.pdf
DOI: 10.1145/1835804.1835859
Year: 2010