2008 LearningClassifiersfromOnlyPosi

(Elkan & Noto, 2008) ⇒ Charles Elkan and Keith Noto. (2008). “Learning Classifiers from Only Positive and Unlabeled Data.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). doi:10.1145/1401890.1401920

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

The input to an algorithm that learns a binary classifier normally consists of two sets of examples, where one set consists of positive examples of the concept to be learned, and the other set consists of negative examples. However, it is often the case that the available training data are an incomplete set of positive examples, and a set of unlabeled examples, some of which are positive and some of which are negative. The problem solved in this paper is how to learn a standard binary classifier given a nontraditional training set of this nature.

Under the assumption that the labeled examples are selected randomly from the positive examples, we show that a classifier trained on positive and unlabeled examples predicts probabilities that differ by only a constant factor from the true conditional probabilities of being positive. We show how to use this result in two different ways to learn a classifier from a nontraditional training set. We then apply these two new methods to solve a real-world problem: identifying protein records that should be included in an incomplete specialized molecular biology database. Our experiments in this domain show that models trained using the new methods perform better than the current state-of-the-art biased SVM method for learning from positive and unlabeled examples.
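The constant-factor result can be illustrated with a minimal sketch. It is not the authors' code. It assumes a single discrete feature, a simple per-value frequency table standing in for the trained "nontraditional" classifier g(x) ≈ p(s=1 | x) (s = 1 means "labeled"), and toy data in which positives and negatives do not overlap. Under those assumptions, averaging g over held-out labeled examples estimates the constant c = p(s=1 | y=1), and g(x) / c approximates p(y=1 | x). All function names, the data generator, and the value TRUE_C = 0.3 are hypothetical choices made for illustration.

```python
import random


def fit_frequency_classifier(xs, ss):
    """Stand-in nontraditional classifier: g(x) ~ p(s=1 | x) for a discrete x."""
    counts, labeled = {}, {}
    for x, s in zip(xs, ss):
        counts[x] = counts.get(x, 0) + 1
        labeled[x] = labeled.get(x, 0) + s
    return {x: labeled[x] / counts[x] for x in counts}


def estimate_c(g, xs_val, ss_val):
    """Average g(x) over held-out labeled examples to estimate c = p(s=1 | y=1)."""
    scores = [g.get(x, 0.0) for x, s in zip(xs_val, ss_val) if s == 1]
    return sum(scores) / len(scores)


def corrected_probability(g, x, c):
    """Rescale g(x) by 1/c to approximate p(y=1 | x); clip at 1."""
    return min(1.0, g.get(x, 0.0) / c)


def make_data(n, true_c, rng):
    """Toy data (assumption): y = 1 iff x >= 2; positives are labeled at random."""
    xs, ys, ss = [], [], []
    for _ in range(n):
        x = rng.randrange(4)
        y = 1 if x >= 2 else 0
        # Labeled examples are a random sample of the positives only.
        s = 1 if (y == 1 and rng.random() < true_c) else 0
        xs.append(x)
        ys.append(y)
        ss.append(s)
    return xs, ys, ss


rng = random.Random(0)
TRUE_C = 0.3  # hypothetical labeling frequency, unknown to the learner

x_train, _, s_train = make_data(20000, TRUE_C, rng)
x_val, _, s_val = make_data(20000, TRUE_C, rng)

g = fit_frequency_classifier(x_train, s_train)
c_hat = estimate_c(g, x_val, s_val)

print("estimated c:", round(c_hat, 3))
for x in sorted(g):
    print(x, "g(x) =", round(g[x], 3),
          "corrected =", round(corrected_probability(g, x, c_hat), 3))
```

In this sketch the raw scores g(x) for truly positive feature values sit near TRUE_C rather than near 1, and dividing by the estimated constant moves them back toward 1. If the classes overlap, the average of g over labeled examples tends to fall below c, so the estimate should be treated with care outside this toy setting.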

References

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2008 LearningClassifiersfromOnlyPosi	Charles P. Elkan; Keith Noto			Learning Classifiers from Only Positive and Unlabeled Data				10.1145/1401890.1401920