# Log-Linear Model

A Log-Linear Model is a mathematical model that takes the form of a function whose logarithm is a first-degree polynomial function of the parameters of the model.

**Example(s):**

**Counter-Example(s):**

- a Linear Model,
- a Non-Linear Model.

**See:** Log-Linear Analysis, Markov Logic Network, Logistic Regression, Maxent Model, Statistical Natural Language Processing, Maximum Entropy Models for Natural Language Processing.

## References

### 2017a

- (Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb. (2011). "Log Linear Models". In: (Sammut & Webb, 2017).

### 2017b

- (Ratnaparkhi, 2017) ⇒ Adwait Ratnaparkhi (2017). "Maximum Entropy Models for Natural Language Processing". In: (Sammut & Webb, 2017).
- QUOTE: The term maximum entropy refers to an optimization framework in which the goal is to find the probability model that maximizes entropy over the set of models that are consistent with the observed evidence.
The information-theoretic notion of entropy is a way to quantify the uncertainty of a probability model; higher entropy corresponds to more uncertainty in the probability distribution. The rationale for choosing the maximum entropy model – from the set of models that meet the evidence – is that any other model assumes evidence that has not been observed (Jaynes 1957).

In most natural language processing problems, observed evidence takes the form of co-occurrence counts between some prediction of interest and some linguistic context of interest. These counts are derived from a large number of linguistically annotated examples, known as a corpus. For example, the frequency in a large corpus with which the word *that* co-occurs with the tag corresponding to determiner, or DET, is a piece of observed evidence. A probability model is consistent with the observed evidence if its calculated estimates of the co-occurrence counts agree with the observed counts in the corpus.

The goal of the maximum entropy framework is to find a model that is consistent with the co-occurrence counts, but is otherwise maximally uncertain. It provides a way to combine many pieces of evidence into a single probability model. An iterative parameter estimation procedure is usually necessary in order to find the maximum entropy probability model.
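The consistency condition described in the quote above can be illustrated with a small sketch. It is not the estimation procedure from the cited entry: it uses plain gradient ascent on the conditional log-likelihood, whose gradient is the difference between observed and model-expected feature counts, so the fitted model ends up matching the observed counts. The toy corpus, the two indicator features, the step size, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np

labels = ["DET", "OTHER"]

# Hypothetical toy corpus of (word, tag) pairs; the counts are invented.
corpus = (
    [("that", "DET")] * 6 + [("that", "OTHER")] * 4
    + [("run", "DET")] * 1 + [("run", "OTHER")] * 9
)

def f(x, y):
    """Hypothetical indicator features on a (word, tag) pair."""
    return np.array([
        1.0 if x == "that" and y == "DET" else 0.0,  # word "that" tagged DET
        1.0 if y == "DET" else 0.0,                  # any word tagged DET
    ])

def probs(x, v):
    """Conditional distribution over labels for word x under weights v."""
    s = np.array([v @ f(x, y) for y in labels])
    e = np.exp(s - s.max())  # subtract the max for numerical stability
    return e / e.sum()

def expected_counts(v):
    """Feature counts the model expects over the corpus contexts."""
    total = np.zeros(2)
    for x, _ in corpus:
        p = probs(x, v)
        total += sum(p_y * f(x, y) for p_y, y in zip(p, labels))
    return total

# Feature counts actually observed in the corpus.
observed = sum(f(x, y) for x, y in corpus)

# Iterative estimation: move the weights until expected counts match observed.
v = np.zeros(2)
for _ in range(2000):
    v += 1.0 * (observed - expected_counts(v)) / len(corpus)
```

After fitting, `expected_counts(v)` agrees with `observed` to within numerical tolerance, which is the sense in which the model is consistent with the evidence.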


### 2015

- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Log-linear_model Retrieved:2015-3-22.
- A **log-linear model** is a mathematical model that takes the form of a function whose logarithm is a first-degree polynomial function of the parameters of the model, which makes it possible to apply (possibly multivariate) linear regression. That is, it has the general form: [math] \exp \left(c + \sum_{i} w_i f_i(X) \right)\,, [/math] in which the [math]f_i(X)[/math] are quantities that are functions of the variables [math]X[/math], in general a vector of values, while [math]c[/math] and the [math]w_i[/math] stand for the model parameters. The term may specifically be used for:

- A log-linear plot or graph, which is a type of semi-log plot.
- Poisson regression for contingency tables, a type of generalized linear model.

- The specific applications of log-linear models are where the output quantity lies in the range 0 to ∞, for values of the independent variables [math]X[/math], or more immediately, the transformed quantities [math]f_i(X)[/math] in the range −∞ to +∞. This may be contrasted to logistic models, similar to the logistic function, for which the output quantity lies in the range 0 to 1. Thus the contexts where these models are useful or realistic often depends on the range of the values being modelled.
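As a minimal sketch of the general form [math]\exp \left(c + \sum_{i} w_i f_i(X) \right)[/math], the following evaluates a log-linear function and then recovers its parameters by ordinary least squares on the logarithm of the output. The feature functions, parameter values, and noise-free synthetic data are assumptions made for illustration only.

```python
import numpy as np

def log_linear(X, c, w, features):
    """Evaluate exp(c + sum_i w_i * f_i(X)) for each row of X."""
    F = np.column_stack([f(X) for f in features])  # one column per feature
    return np.exp(c + F @ w)

# Hypothetical feature functions f_i(X) on a two-column input.
features = [lambda X: X[:, 0], lambda X: X[:, 1] ** 2]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_c, true_w = 0.5, np.array([1.2, -0.7])  # assumed parameter values
y = log_linear(X, true_c, true_w, features)  # strictly positive outputs

# log(y) is linear in (c, w), so linear least squares recovers the parameters.
design = np.column_stack([np.ones(len(X))] + [f(X) for f in features])
coef, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
c_hat, w_hat = coef[0], coef[1:]
```

The output `y` is positive for every input, matching the 0 to ∞ range noted above, and the fit works because the features may be non-linear in `X` while the logarithm stays linear in the parameters.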


### 2013

- (Collins, 2013b) ⇒ Michael Collins. (2013). "Log-Linear Models". Course notes for NLP by Michael Collins, Columbia University.
- QUOTE: The abstract problem is as follows. We have some set of possible inputs, [math]\mathcal{X}[/math], and a set of possible labels, [math]\mathcal{Y}[/math]. Our task is to model the conditional probability [math]p(y \mid x)[/math] for any pair [math](x, y)[/math] such that [math]x \in \mathcal{X}[/math] and [math]y \in \mathcal{Y}[/math].

…Definition 1 (Log-linear Models) A log-linear model consists of the following components:

- A set [math]\mathcal{X}[/math] of possible inputs.
- A set [math]\mathcal{Y}[/math] of possible labels. The set [math]\mathcal{Y}[/math] is assumed to be finite.
- A positive integer d specifying the number of features and parameters in the model.
- A function [math]f : \mathcal{X} \times \mathcal{Y} \rightarrow \mathbb{R}^d[/math] that maps any [math](x, y)[/math] pair to a feature-vector [math]f(x, y)[/math].
- A parameter vector [math]v \in \mathbb{R}^d[/math].

For any [math]x \in \mathcal{X}[/math], [math]y \in \mathcal{Y}[/math], the model defines a conditional probability: [math] p(y \mid x; v) = \frac{\exp (v \cdot f(x,y))}{\sum_{y' \in \mathcal{Y}} \exp (v \cdot f(x, y'))} [/math] Here [math] \exp(x) = e^x[/math], and [math] v \cdot f(x,y) = \sum^d_{k=1} v_k f_k(x,y)[/math] is the inner product between [math]v[/math] and [math]f(x,y)[/math]. The term [math]p(y \mid x; v)[/math] is intended to be read as "the probability of y conditioned on x, under parameter values v".
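The conditional probability in the definition above can be computed directly from its components. The sketch below is one possible rendering, assuming a hypothetical three-tag label set, a hypothetical feature function with [math]d = 3[/math], and an arbitrary parameter vector; none of these come from the course notes.

```python
import numpy as np

def conditional_probs(x, labels, f, v):
    """Return p(y | x; v) for every y in labels, in the order given."""
    scores = np.array([v @ f(x, y) for y in labels])  # inner products v . f(x, y)
    scores -= scores.max()  # shift for numerical stability; ratios are unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # normalise over all labels y'

# Hypothetical finite label set Y.
labels = ["DET", "NOUN", "VERB"]

def f(x, y):
    """Hypothetical feature function mapping (x, y) to a vector in R^3."""
    return np.array([
        1.0 if x == "the" and y == "DET" else 0.0,
        1.0 if x.endswith("s") and y == "NOUN" else 0.0,
        1.0 if y == "VERB" else 0.0,
    ])

v = np.array([2.0, 1.0, -0.5])  # assumed parameter vector
p = conditional_probs("the", labels, f, v)
```

Here `p` is a distribution over `labels` for the input `"the"`. Because only the first feature fires for the pair (`"the"`, `"DET"`), that label receives the largest share of the probability mass.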


### 2003

- (Davison, 2003) ⇒ Anthony C. Davison. (2003). "Statistical Models". Cambridge University Press. ISBN:0521773393

### 1997

- (Christensen, 1997) ⇒ Ronald Christensen. (1997). "Log-Linear Models and Logistic Regression, 2nd edition". Springer. ISBN:0387982477

### 1980

- (Knoke & Burke, 1980) ⇒ David Knoke, and Peter J. Burke. (1980). "Log-linear models" (Vol. 20). Sage.
- QUOTE: Since multiple regression — in which one variable is taken as the linear function of the values of several independent variables—is a more widely known method, we shall draw explicit parallels between it and log-linear modelling. Regression procedures are normally used to predict numerical values only on an interval or ratio scale dependent variable. However, when the dependent variable is a dichotomy, coded "1" if, for example, respondents agree and "0" if respondents disagree with a survey item, then an ordinary regression upon predictor variables can be interpreted as showing how the probability of a favorable response is affected. In one major version of log-linear models, a dichotomous dependent variable can be treated analogously to a regression, with the essential difference that the independent variables affect not the probability but the odds on the dependent variable (e.g., the ratio of favorable to unfavorable responses). Other similarities between regression and log-linear models will be pointed out as we go along. Some similarities to probit analysis also may be seen, although we shall not develop them in this paper.
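The contrast drawn in the quote, between effects on the probability and effects on the odds, can be made concrete with a short sketch. It assumes the logit form for a dichotomous dependent variable with a single predictor; the coefficient values `b0` and `b1` are hypothetical.

```python
import math

def probability(b0, b1, x):
    """Probability of a favorable response when the log-odds are b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Ratio of favorable to unfavorable responses."""
    return p / (1.0 - p)

b0, b1 = -1.0, 0.8  # hypothetical coefficients

# Effect of raising x by one unit, measured on the odds and on the probability.
odds_ratios = [
    odds(probability(b0, b1, x + 1)) / odds(probability(b0, b1, x))
    for x in range(4)
]
prob_diffs = [
    probability(b0, b1, x + 1) - probability(b0, b1, x)
    for x in range(4)
]
```

Every entry of `odds_ratios` equals `math.exp(b1)`, so a unit change in the predictor multiplies the odds by a constant factor, while the entries of `prob_diffs` vary with the starting value of `x`.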

### 1957

- (Jaynes, 1957) ⇒ E. T. Jaynes (1957). "Information theory and statistical mechanics". Phys Rev 106(4):620–630