2014 FastHighAccuracyPartofSpeechTag


Subject Headings: Tag Dictionary; POS Tagging Task; Moore's Tag Dictionary.

Notes

Cited By

Quotes

Abstract

Part-of-speech (POS) taggers can be quite accurate, but for practical use, accuracy often has to be sacrificed for speed. For example, the maintainers of the Stanford tagger (Toutanova et al., 2003; Manning, 2011) recommend tagging with a model whose per tag error rate is 17% higher, relatively, than their most accurate model, to gain a factor of 10 or more in speed. In this paper, we treat POS tagging as a single-token independent multiclass classification task. We show that by using a rich feature set we can obtain high tagging accuracy within this framework, and by employing some novel feature-weight-combination and hypothesis-pruning techniques we can also get very fast tagging with this model. A prototype tagger implemented in Perl is tested and found to be at least 8 times faster than any publicly available tagger reported to have comparable accuracy on the standard Penn Treebank Wall Street Journal test set.
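To make the framing in the abstract concrete, the following is a minimal sketch of tagging as single-token independent multiclass classification: each token is scored against every tag using only features of the surrounding words, and no decision depends on the tag chosen for a neighbouring token. Everything in it is an illustrative assumption and not taken from the paper: the `features` function, the `IndependentTagger` class, the plain perceptron update, and the toy training data are all hypothetical. It does not reproduce the paper's rich feature set, its feature-weight-combination technique, or its hypothesis pruning.

```python
def features(words, i):
    """Features for token i, drawn only from words in a small window.

    No feature refers to a neighbouring token's predicted tag, which is
    what makes each token's classification independent of the others.
    """
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [
        "bias",
        "w=" + w.lower(),
        "prev=" + prev_w.lower(),
        "next=" + next_w.lower(),
        "suf2=" + w[-2:].lower(),
        "cap=" + str(w[0].isupper()),
    ]


class IndependentTagger:
    """Hypothetical per-token multiclass perceptron (not the paper's model)."""

    def __init__(self, tags):
        self.tags = sorted(tags)
        self.weights = {}  # (feature, tag) -> weight

    def score(self, feats, tag):
        return sum(self.weights.get((f, tag), 0.0) for f in feats)

    def predict_token(self, words, i):
        feats = features(words, i)
        return max(self.tags, key=lambda t: self.score(feats, t))

    def tag(self, words):
        # Each position is classified on its own; there is no sequence search.
        return [self.predict_token(words, i) for i in range(len(words))]

    def train(self, sentences, epochs=5):
        for _ in range(epochs):
            for words, gold in sentences:
                for i, gold_tag in enumerate(gold):
                    feats = features(words, i)
                    guess = max(self.tags, key=lambda t: self.score(feats, t))
                    if guess != gold_tag:
                        # Standard perceptron update: reward gold, penalise guess.
                        for f in feats:
                            self.weights[(f, gold_tag)] = self.weights.get((f, gold_tag), 0.0) + 1.0
                            self.weights[(f, guess)] = self.weights.get((f, guess), 0.0) - 1.0


# Toy data, invented purely for illustration.
train_data = [
    (["The", "dog", "barks"], ["DT", "NN", "VBZ"]),
    (["A", "cat", "sleeps"], ["DT", "NN", "VBZ"]),
]
tagger = IndependentTagger({"DT", "NN", "VBZ"})
tagger.train(train_data)
print(tagger.tag(["A", "dog", "sleeps"]))
```

Because every position is scored separately, the cost of tagging a sentence grows with the number of tokens times the number of candidate tags, with no Viterbi or beam search over tag sequences. Any loss of tag-context information has to be recovered through word-based features, which is why the abstract stresses a rich feature set.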

References


Page: 2014 FastHighAccuracyPartofSpeechTag
Author: Robert Moore
Title: Fast High-Accuracy Part-of-Speech Tagging by Independent Classifiers