2006 EffectiveSelfTrainingforParsing

Subject Headings:

Notes

Cited By

Quotes

Abstract

We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
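
For concreteness, the 12% figure is the relative error reduction implied by these numbers. With a previous best f-score of 92.1 − 1.1 = 91.0, the error rate falls from 9.0 to 7.9:

    \frac{(100 - 91.0) - (100 - 92.1)}{100 - 91.0} = \frac{1.1}{9.0} \approx 12\%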

2 Previous work

A simple method of incorporating unlabeled data into a new model is self-training. In self-training, the existing model first labels unlabeled data. The newly labeled data is then treated as truth and combined with the actual labeled data to train a new model. This process can be iterated over different sets of unlabeled data if desired. It is not surprising that self-training is not normally effective: Charniak (1997) and Steedman et al. (2003) report either minor improvements or significant damage from using self-training for parsing, and Clark et al. (2003) apply self-training to POS-tagging and report the same outcomes. One would expect errors in the original model to be amplified in the new model.
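
As a concrete illustration, the following is a minimal sketch of the generic self-training loop just described. The train and predict callables and the data variables are hypothetical placeholders, not the paper's actual parser-reranker components:

    def self_train(labeled, unlabeled_batches, train, predict):
        # Train the initial model on the actual labeled data.
        model = train(labeled)
        # Optionally iterate over different sets of unlabeled data.
        for batch in unlabeled_batches:
            # The existing model labels the unlabeled data ...
            pseudo_labeled = [(x, predict(model, x)) for x in batch]
            # ... and the newly labeled data is treated as truth,
            # combined with the labeled data, and used to retrain.
            labeled = labeled + pseudo_labeled
            model = train(labeled)
        return model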

References

David McClosky, Eugene Charniak, and Mark Johnson. (2006). "Effective Self-training for Parsing." In: Proceedings of the Human Language Technology Conference of the NAACL (HLT-NAACL 2006). doi:10.3115/1220835.1220855