1999 BootstrappingForTextLearningTasks


Subject Headings: Text Classification Algorithm, Bootstrap Algorithm, NLP Task


Cited By



When applying text learning algorithms to complex tasks, it is tedious and expensive to hand-label the large amounts of training data necessary for good performance. This paper presents bootstrapping as an alternative approach to learning from large sets of labeled data. Instead of a large quantity of labeled data, this paper advocates using a small amount of seed information and a large collection of easily-obtained unlabeled data. Bootstrapping initializes a learner with the seed information; it then iterates, applying the learner to calculate labels for the unlabeled data, and incorporating some of these labels into the training input for the learner. Two case studies of this approach are presented. Bootstrapping for information extraction provides 76% precision for a 250-word dictionary for extracting locations from web pages, when starting with just a few seed locations. Bootstrapping a text classifier from a few keywords per class and a class hierarchy provides accuracy of 66%, a level close to human agreement, when placing computer science research papers into a topic hierarchy. The success of these two examples argues for the strength of the general bootstrapping approach for text learning tasks.
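
The loop the abstract describes (initialize from seeds, label the unlabeled data, fold some of those labels back into the training input, repeat) can be sketched as follows. This is a minimal illustration only, not the learners used in the paper's two case studies: the word-count scorer, the margin-based confidence rule, the function name `bootstrap`, the parameters `per_round` and `max_rounds`, and the toy documents are all assumptions made for the example. It also assumes at least two classes.

```python
from collections import Counter


def bootstrap(seeds, unlabeled, per_round=2, max_rounds=10):
    """Toy self-training loop: seed the learner, label, absorb, repeat."""
    # Initialize the learner (per-class word counts) from the seed keywords.
    model = {label: Counter(words) for label, words in seeds.items()}
    labels = {}                           # document index -> assigned label
    pending = set(range(len(unlabeled)))  # documents still unlabeled

    for _ in range(max_rounds):
        candidates = []
        for i in pending:
            tokens = set(unlabeled[i].lower().split())
            # Score each class by how often it has seen the document's words.
            scores = sorted(
                ((sum(model[c][t] for t in tokens), c) for c in model),
                reverse=True,
            )
            # Confidence is the gap between the best and second-best class.
            margin = scores[0][0] - scores[1][0]
            if margin > 0:
                candidates.append((margin, i, scores[0][1]))
        if not candidates:
            break  # nothing can be labeled with any confidence
        # Incorporate only the most confident labels into the training input.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        for _, i, label in candidates[:per_round]:
            labels[i] = label
            model[label].update(unlabeled[i].lower().split())
            pending.discard(i)
    return labels


# Hypothetical seed keywords and unlabeled documents.
seeds = {"sports": ["football", "match"], "cooking": ["recipe", "oven"]}
docs = [
    "football match draw striker",
    "recipe oven dough bake",
    "striker goal penalty",
    "dough flour yeast",
    "penalty referee whistle",
]
labels = bootstrap(seeds, docs)
```

In this toy run, document 4 shares no word with any seed keyword. It is still assigned to "sports" in the third round, because "penalty" entered the sports vocabulary when document 2 was absorbed in the second round. Adding only a few high-margin documents per round is one simple way to limit the spread of early labeling mistakes.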



Authors: Rosie Jones, Andrew McCallum, Kamal Nigam, Ellen Riloff
Title: Bootstrapping for Text Learning Tasks
Published in: Proceedings of the IJCAI 1999 Workshop on Text Mining: Foundations, Techniques, and Applications
URL: http://www.kamalnigam.com/papers/bootstrap-ijcaiws99.pdf
Year: 1999