1999 UsingMaximumEntropyforTextClass


Subject Headings:

Notes

Cited By

Quotes

Abstract

This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, part-of-speech tagging, and text segmentation. The underlying principle of maximum entropy is that without external knowledge, one should prefer distributions that are uniform. Constraints on the distribution, derived from labeled training data, inform the technique where to be minimally non-uniform. The maximum entropy formulation has a unique solution which can be found by the improved iterative scaling algorithm. In this paper, maximum entropy is used for text classification by estimating the conditional distribution of the class variable given the document. In experiments on several text datasets we compare accuracy to naive Bayes and show that maximum entropy is sometimes significantly better, but also sometimes worse. Much future work remains, but the results indicate that maximum entropy is a promising technique for text classification.
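
The conditional maximum entropy model the abstract describes has the form P(c|d) = exp(Σᵢ λᵢ fᵢ(d, c)) / Z(d), which is equivalent to multinomial logistic regression over document features. Below is a minimal sketch of that formulation, assuming scikit-learn; it is not the authors' code, and it fits the λ weights with scikit-learn's generic solver rather than the improved iterative scaling algorithm used in the paper. The toy documents and labels are hypothetical, for illustration only.

<pre>
# Minimal sketch: maximum entropy text classification as multinomial
# logistic regression over word-count features (assumes scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled training documents (illustration only).
docs = [
    "the team won the game",
    "stocks fell sharply today",
    "the striker scored a goal",
    "the market rallied on earnings",
]
labels = ["sports", "finance", "sports", "finance"]

# CountVectorizer derives per-word features from the labeled data;
# LogisticRegression then estimates the conditional distribution
# P(class | document), the maximum entropy solution under those
# feature-expectation constraints.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(docs, labels)

print(model.predict(["the goal decided the game"]))         # e.g. ['sports']
print(model.predict_proba(["traders watched the market"]))  # P(c | d) estimates
</pre>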

References


Kamal Nigam, John D. Lafferty, and Andrew McCallum. (1999). "Using Maximum Entropy for Text Classification."