Log-Linear Model

A Log-Linear Model is a mathematical model that takes the form of a function whose logarithm is a first-degree polynomial function of the parameters of the model.

Example(s):
- a Markov Logic Networks,
- a Maximum Entropy Model.
- …
Counter-Example(s)
- a Linear Model,
- a Non-Linear Model.
See: Log-Linear Analysis, Markov Logic Network, Logistic Regression, Maxent Model, Statistical Natural Language Processing, Maximum Entropy Models for Natural Language Processing.

References

2017a

(Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb. (2011). "Log Linear Models". In: (Sammut & Webb, 2017).
- QUOTE: Maximum Entropy Models for Natural Language Processing

2017b

(Ratnaparkhi, 2017) ⇒ Adwait Ratnaparkhi (2017). "Maximum Entropy Models for Natural Language Processing". In: (Sammut & Webb, 2017).
- QUOTE: The term maximum entropy refers to an optimization framework in which the goal is to find the probability model that maximizes entropy over the set of models that are consistent with the observed evidence.
  The information-theoretic notion of entropy is a way to quantify the uncertainty of a probability model; higher entropy corresponds to more uncertainty in the probability distribution. The rationale for choosing the maximum entropy model – from the set of models that meet the evidence – is that any other model assumes evidence that has not been observed (Jaynes 1957).
  In most natural language processing problems, observed evidence takes the form of co-occurrence counts between some prediction of interest and some linguistic context of interest. These counts are derived from a large number of linguistically annotated examples, known as a corpus. For example, the frequency in a large corpus with which the word that co-occurs with the tag corresponding to determiner, or DET, is a piece of observed evidence. A probability model is consistent with the observed evidence if its calculated estimates of the co-occurrence counts agree with the observed counts in the corpus.
  The goal of the maximum entropy framework is to find a model that is consistent with the co-occurrence counts, but is otherwise maximally uncertain. It provides a way to combine many pieces of evidence into a single probability model. An iterative parameter estimation procedure is usually necessary in order to find the maximum entropy probability model.

2015

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Log-linear_model Retrieved:2015-3-22.
- A log-linear model is a mathematical model that takes the form of a function whose logarithm is a first-degree polynomial function of the parameters of the model, which makes it possible to apply (possibly multivariate) linear regression. That is, it has the general form : [math]\displaystyle{ \exp \left(c + \sum_{i} w_i f_i(X) \right)\,, }[/math] in which the f_i(X) are quantities that are functions of the variables X, in general a vector of values, while c and the w_i stand for the model parameters.
  The term may specifically be used for:
  - A log-linear plot or graph, which is a type of semi-log plot.
  - Poisson regression for contingency tables, a type of generalized linear model.
- The specific applications of log-linear models are where the output quantity lies in the range 0 to ∞, for values of the independent variables $X$ , or more immediately, the transformed quantities $f i (X)$ in the range −∞ to +∞. This may be contrasted to logistic models, similar to the logistic function, for which the output quantity lies in the range 0 to 1. Thus the contexts where these models are useful or realistic often depends on the range of the values being modelled.

2013

(Collins, 2013b) ⇒ Michael Collins. (2013). “Log-Linear Models." Course notes for NLP by Michael Collins, Columbia University.
- QUOTE: The abstract problem is as follows. We have some set of possible inputs, [math]\displaystyle{ \mathcal{X} }[/math], and a set of possible labels, [math]\displaystyle{ \mathcal{Y} }[/math]. Our task is to model the conditional probability [math]\displaystyle{ p(y \mid x) }[/math] for any pair (x; y) such that [math]\displaystyle{ x \in \mathcal{X} }[/math] and [math]\displaystyle{ y \in \mathcal{Y} }[/math].
  …
  Definition 1 (Log-linear Models) A log-linear model consists of the following components:
  - A set [math]\displaystyle{ \mathcal{X} }[/math] of possible inputs.
  - A set [math]\displaystyle{ \mathcal{Y} }[/math] of possible labels. The set [math]\displaystyle{ \mathcal{Y} }[/math] is assumed to be finite.
  - A positive integer d specifying the number of features and parameters in the model.
  - A function [math]\displaystyle{ f : \mathcal{X} \times \mathcal{Y} \rightarrow \mathbb{R}^d }[/math] that maps any [math]\displaystyle{ (x, y) }[/math] pair to a feature-vector [math]\displaystyle{ f(x, y) }[/math].
  - A parameter vector[math]\displaystyle{ v \in \mathbb{R}^d }[/math].
- For any [math]\displaystyle{ x \in \mathcal{X} }[/math], [math]\displaystyle{ y \in \mathcal{Y} }[/math], the model defines a conditional probability :[math]\displaystyle{ p(y \mid x; v) = \frac{\exp (v \cdot f(x,y))}{\Sigma_{y' \in \mathcal{Y}} \exp (v \cdot f(x, y'))} }[/math] Here [math]\displaystyle{ \exp(x) = e^x }[/math], and [math]\displaystyle{ v \cdot f(x,y) = \Sigma^d_{k=1} v_k f_k(x,y) }[/math] is the inner product between v and f(x,y). The term [math]\displaystyle{ p(y \mid x; v) }[/math] is intended to be read as “the probability of y conditioned on x, under parameter values v”.

2003

(Davison, 2003) ⇒ Anthony C. Davison. (2003). “Statistical Models." Cambridge University Press. ISBN:0521773393

1997

(Christensen, 1997) ⇒ Ronald Christensen. (1997). “Log-Linear Models and Logistic Regression, 2nd edition." Springer. ISBN:0387982477

1980

(Knoke et al., 1980) ⇒ David Knoke, Peter J. Burke, and Peter Burke (1980). "Log-linear models" (Vol. 20). Sage.
- QUOTE: Since multiple regression — in which one variable is taken as the linear function of the values of several independent variables — is a more widely known method, we shall draw explicit parallels between it and log-linear modelling. Regression procedures are normally used to predict numerical values only on an interval or ratio scale dependent variable. However, when the dependent variable is a dichotomy, coded "1" if, for example, respondents agree and "0" if respondents disagree with a survey item, then an ordinary regression upon predictor variables can be interpreted as showing how the probability of a favorable response is affected. In one major version of log-linear models, a dichotomous dependent variable can be treated analogously to a regression, with the essential difference that the independent variables affect not the probability but the odds on the dependent variable (e.g., the ratio of favorable to unfavorable responses). Other similarities between regression and log-linear models will be pointed out as we go along. Some similarities to probit analysis also may be seen, although we shall not develop them in this paper.

1957

(Jaynes, 1957) ⇒ E. T. Jaynes (1957). "Information theory and statistical mechanics". Phys Rev 106(4):620–630

Log-Linear Model

References

2017a

2017b

2015

2013

2003

1997

1980

1957

Navigation menu

Search