1996 AMaxEntApproachToNLP


Subject Headings: Maximum Entropy Models, Exponential Statistical Models, Maximum Entropy Modeling, Maximum Entropy-based Predictive Classifier, Feature Selection, Statistical Natural Language Processing.

Notes

Cited By

2001

Quotes

Abstract

The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the wide-scale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper, we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.
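As a rough illustration of the maximum-likelihood fitting the abstract mentions, the sketch below trains a tiny conditional exponential (maximum entropy) model. Everything in it is an assumption made for illustration: the toy data, the two binary features, the label set, and the helper names `maxent_probs` and `fit` are all hypothetical. Plain gradient ascent on the log-likelihood is swapped in as a simple stand-in and is not presented as the paper's own training procedure.

```python
import math

def maxent_probs(weights, features, x, labels):
    # p(y | x) is proportional to exp(sum_i weights[i] * features[i](x, y)).
    scores = [sum(w * f(x, y) for w, f in zip(weights, features)) for y in labels]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fit(data, features, labels, steps=2000, lr=1.0):
    # Gradient ascent on the average conditional log-likelihood.
    # The gradient for feature i is (empirical count - model-expected count).
    weights = [0.0] * len(features)
    for _ in range(steps):
        grad = [0.0] * len(features)
        for x, y in data:
            probs = maxent_probs(weights, features, x, labels)
            for i, f in enumerate(features):
                expected = sum(p * f(x, cand) for p, cand in zip(probs, labels))
                grad[i] += (f(x, y) - expected) / len(data)
        weights = [w + lr * g for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy task: pick a French rendering of English "in" given the next word.
LABELS = ["dans", "en"]
FEATURES = [
    lambda x, y: 1.0 if (x == "April" and y == "en") else 0.0,
    lambda x, y: 1.0 if y == "dans" else 0.0,
]
DATA = ([("April", "en")] * 3 + [("April", "dans")] * 1
        + [("house", "dans")] * 4 + [("house", "en")] * 2)

weights = fit(DATA, FEATURES, LABELS)
p_april = dict(zip(LABELS, maxent_probs(weights, FEATURES, "April", LABELS)))
print(p_april)
```

At the maximum-likelihood solution, the model's expected value of each feature equals its empirical average in the training data, which is the constraint that a maximum entropy model is required to satisfy.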



References

  • 1. Bahl, L.; Brown, P.; de Souza, P.; and Mercer, R. (1989). A tree-based statistical language model for natural language speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(7).
  • 2. Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, Luboš Ureš, The Candide system for machine translation, Proceedings of the workshop on Human Language Technology, March 08-11, 1994, Plainsboro, NJ doi:10.3115/1075812.1075844
  • 3. Black, E.; Jelinek, F.; Lafferty, J.; Magerman, D.; Mercer, R.; and Roukos, S. (1992). Towards History-based Grammars: Using Richer Models for Probabilistic Parsing. In: Proceedings, DARPA Speech and Natural Language Workshop, Arden House, New York.
  • 4. Brown, D. (1959). A Note on Approximations to Discrete Probability Distributions. Information and Control, 2:386--392.
  • 5. Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, Robert L. Mercer, The mathematics of statistical machine translation: parameter estimation, Computational Linguistics, v.19 n.2, June 1993
  • 6. Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, Paul S. Roossin, A statistical approach to machine translation, Computational Linguistics, v.16 n.2, p.79-85, June 1990
  • 7. Brown, P.; Della Pietra, V.; de Souza, P.; and Mercer, R. (1990). Class-based N-Gram Models of Natural Language. Proceedings, IBM Natural Language ITL, 283--298.
  • 8. Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer, A statistical approach to sense disambiguation in machine translation, Proceedings of the workshop on Speech and Natural Language, p.146-151, February 19-22, 1991, Pacific Grove, California doi:10.3115/112405.112427
  • 9. Thomas M. Cover, Joy A. Thomas, Elements of information theory, Wiley-Interscience, New York, NY, 1991
  • 10. Csiszár, I. (1975). I-Divergence Geometry of Probability Distributions and Minimization Problems, The Annals of Probability, 3(1):146--158.
  • 11. Csiszár, I. (1989). A Geometric Interpretation of Darroch and Ratcliff's Generalized Iterative Scaling. The Annals of Statistics, 17(3):1409--1413.
  • 12. Csiszár, I. and Tusnády, G. (1984). Information Geometry and Alternating Minimization Procedures. Statistics & Decisions, Supplemental Issue, no. 1, 205--237.
  • 13. Darroch, J. N. and Ratcliff, D. (1972). Generalized Iterative Scaling for Log-linear Models. Annals of Mathematical Statistics, no. 43, 1470--1480.
  • 14. Stephen Della Pietra, Vincent J. Della Pietra, J. Gillett, John D. Lafferty, H. Printz, L. Ures, Inference and Estimation of a Long-Range Trigram Model, Proceedings of the Second International Colloquium on Grammatical Inference and Applications, p.78-92, September 21-23, 1994
  • 15. Stephen Della Pietra, Vincent Della Pietra, John Lafferty, Inducing Features of Random Fields, Carnegie Mellon University, Pittsburgh, PA, 1995
  • 16. Arthur P. Dempster; Laird, N. M.; and Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 39(B):1--38.
  • 17. Guiasu, S. and Shenitzer, A. (1985). The Principle of Maximum Entropy. The Mathematical Intelligencer, 7(1).
  • 18. Jaynes, E. T. (1990). "Notes on Present Status and Future Prospects." In: Maximum Entropy and Bayesian Methods, edited by W. T. Grandy and L. H. Schick. Kluwer, 1--13.
  • 19. Jelinek, F. and Mercer, R. L. (1980). Interpolated Estimation of Markov Source Parameters from Sparse Data. In: Proceedings, Workshop on Pattern Recognition in Practice, Amsterdam, The Netherlands.
  • 20. Lucassen, J. and Mercer, R. (1984). An Information Theoretic Approach to Automatic Determination of Phonemic Baseforms. In: Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, San Diego, CA, 42.5.1--42.5.4.
  • 21. Merialdo, B. (1990). Tagging Text with a Probabilistic Model. In: Proceedings, IBM Natural Language ITL, Paris, France, 161--172.
  • 22. Nádas, A.; Mercer, R.; Bahl, L.; Bakis, R.; Cohen, P.; Cole, A.; Jelinek, F.; and Lewis, B. (1981). Continuous Speech Recognition with Automatically Selected Acoustic Prototypes Obtained by either Bootstrapping or Clustering. In: Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, Atlanta, GA, 1153--1155.
  • 23. Sokolnikoff, I. S. and Redheffer, R. M. (1966). Mathematics of Physics and Modern Engineering, Second Edition, McGraw-Hill Book Company.


Author: Vincent J. Della Pietra; Stephen A. Della Pietra; Adam L. Berger
Title: A Maximum Entropy Approach to Natural Language Processing
Journal: Computational Linguistics (CL) Research Area
Title URL: http://acl.ldc.upenn.edu/J/J96/J96-1002.pdf
Year: 1996