Maximum Entropy-based Learning Algorithm

From GM-RKB
Revision as of 04:07, 29 April 2012 by Gmelli (talk | contribs)

A Maximum Entropy-based Learning Algorithm is a Supervised Discriminative Classification Algorithm that selects, among all models consistent with the observed training data, the one whose predicted class distribution has Maximum Entropy (the least biased estimate).
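As an illustrative sketch (not part of the original entry): a conditional maximum-entropy classifier over linear features is equivalent to multinomial logistic regression, trained by gradient ascent on the log-likelihood. The data, learning rate, and epoch count below are hypothetical choices for the demonstration.

```python
import numpy as np

def maxent_train(X, y, n_classes, lr=0.1, epochs=500):
    """Fit a conditional maximum-entropy model p(c|x) proportional to
    exp(w_c . x) by gradient ascent on the training log-likelihood."""
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for _ in range(epochs):
        scores = X @ W.T                              # (n, n_classes)
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)             # p(c|x) per example
        Y = np.eye(n_classes)[y]                      # one-hot labels
        # Gradient = observed feature counts minus model-expected counts.
        W += lr * (Y - P).T @ X / n
    return W

def maxent_predict(W, X):
    return np.argmax(X @ W.T, axis=1)

# Toy data (hypothetical): two features plus a constant bias feature.
X = np.array([[1.0, 0.0, 1.0],
              [0.9, 0.1, 1.0],
              [0.1, 0.9, 1.0],
              [0.0, 1.0, 1.0]])
y = np.array([0, 0, 1, 1])
W = maxent_train(X, y, n_classes=2)
preds = maxent_predict(W, X)
```

The gradient step reflects the defining property of maximum-entropy models: at the optimum, the model's expected feature counts match the counts observed in the training data.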



References

2009

  • http://www.cs.cmu.edu/afs/cs/user/aberger/www/html/tutorial/node2.html
    • The maximum entropy method answers both these questions. Intuitively, the principle is simple: model all that is known and assume nothing about that which is unknown. In other words, given a collection of facts, choose a model which is consistent with all the facts, but otherwise as uniform as possible. This is precisely the approach we took in selecting our model at each step in the above example. ... In its most general formulation, maximum entropy can be used to estimate any probability distribution. In this paper we are interested in classification; thus we limit our further discussion to learning conditional distributions from labeled training data. Specifically, we learn the conditional distribution of the class label given a document.
  • http://homepages.inf.ed.ac.uk/s0450736/maxent.html
  • MEGA Optimization Package
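The principle quoted above — choose the distribution consistent with the known facts but otherwise as uniform as possible — can be sketched numerically. In this hypothetical example, the only known fact about a 5-outcome distribution is that the first two outcomes together carry probability 0.6; maximum entropy splits that mass evenly and spreads the remainder uniformly.

```python
import numpy as np
from scipy.optimize import minimize

def neg_entropy(p):
    """Negative Shannon entropy (minimized to maximize entropy)."""
    return np.sum(p * np.log(p + 1e-12))

# Known facts (hypothetical constraints): probabilities sum to 1,
# and p[0] + p[1] = 0.6.
cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0},
        {"type": "eq", "fun": lambda p: p[0] + p[1] - 0.6}]
x0 = np.full(5, 0.2)
res = minimize(neg_entropy, x0, constraints=cons, bounds=[(0, 1)] * 5)
# Maximum-entropy solution: p[0] = p[1] = 0.3, the rest 0.4/3 each.
```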

1957

  • (Jaynes, 1957) ⇒ E. T. Jaynes. (1957). "Information Theory and Statistical Mechanics."
    • "Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum entropy estimate. It is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information."