A Link Function is a function that specifies the relationship between the linear predictor and the mean of the distribution function in a generalized linear model.

## References

### 2017

• (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Generalized_linear_model#Link_function Retrieved:2017-4-20.
• … The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

• The link function provides the relationship between the linear predictor and the mean of the distribution function. There are many commonly used link functions, and their choice is informed by several considerations. There is always a well-defined canonical link function which is derived from the exponential of the response's density function. However in some cases it makes sense to try to match the domain of the link function to the range of the distribution function's mean, or use a non-canonical link function for algorithmic purposes, for example Bayesian probit regression.

When using a distribution function with a canonical parameter $\theta$ , the canonical link function is the function that expresses $\theta$ in terms of $\mu$ , i.e. $\theta = b(\mu)$ . For the most common distributions, the mean $\mu$ is one of the parameters in the standard form of the distribution's density function, and then $b(\mu)$ is the function as defined above that maps the density function into its canonical form. When using the canonical link function, $b(\mu) = \theta = \mathbf{X}\boldsymbol{\beta}$ , which allows $\mathbf{X}^{\rm T} \mathbf{Y}$ to be a sufficient statistic for $\boldsymbol{\beta}$ .
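As a worked illustration (standard exponential-family algebra, not quoted from the source), writing the Bernoulli density in exponential-family form recovers its canonical link:

```latex
f(y \mid \mu) = \mu^{y}(1-\mu)^{1-y}
              = \exp\!\left( y \ln\frac{\mu}{1-\mu} + \ln(1-\mu) \right),
\qquad y \in \{0,1\},
```

so the canonical parameter is $\theta = b(\mu) = \ln\frac{\mu}{1-\mu}$, i.e. the logit link that appears in the table below.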

Following is a table of several exponential-family distributions in common use and the data they are typically used for, along with the canonical link functions and their inverses (sometimes referred to as the mean function, as done here).

**Common distributions with typical uses and canonical link functions**

| Distribution | Support of distribution | Typical uses | Link name | Link function | Mean function |
|---|---|---|---|---|---|
| Normal | real: $(-\infty,+\infty)$ | Linear-response data | Identity | $\mathbf{X}\boldsymbol{\beta}=\mu$ | $\mu=\mathbf{X}\boldsymbol{\beta}$ |
| Exponential | real: $(0,+\infty)$ | Exponential-response data, scale parameters | Inverse | $\mathbf{X}\boldsymbol{\beta}=\mu^{-1}$ | $\mu=(\mathbf{X}\boldsymbol{\beta})^{-1}$ |
| Gamma | real: $(0,+\infty)$ | Exponential-response data, scale parameters | Inverse | $\mathbf{X}\boldsymbol{\beta}=\mu^{-1}$ | $\mu=(\mathbf{X}\boldsymbol{\beta})^{-1}$ |
| Inverse Gaussian | real: $(0,+\infty)$ | | Inverse squared | $\mathbf{X}\boldsymbol{\beta}=\mu^{-2}$ | $\mu=(\mathbf{X}\boldsymbol{\beta})^{-1/2}$ |
| Poisson | integer: $0,1,2,\ldots$ | Count of occurrences in fixed amount of time/space | Log | $\mathbf{X}\boldsymbol{\beta}=\ln(\mu)$ | $\mu=\exp(\mathbf{X}\boldsymbol{\beta})$ |
| Bernoulli | integer: $\{0,1\}$ | Outcome of single yes/no occurrence | Logit | $\mathbf{X}\boldsymbol{\beta}=\ln\left(\frac{\mu}{1-\mu}\right)$ | $\mu=\frac{\exp(\mathbf{X}\boldsymbol{\beta})}{1+\exp(\mathbf{X}\boldsymbol{\beta})}=\frac{1}{1+\exp(-\mathbf{X}\boldsymbol{\beta})}$ |
| Binomial | integer: $0,1,\ldots,N$ | Count of "yes" occurrences out of $N$ yes/no occurrences | Logit | $\mathbf{X}\boldsymbol{\beta}=\ln\left(\frac{\mu}{1-\mu}\right)$ | $\mu=\frac{1}{1+\exp(-\mathbf{X}\boldsymbol{\beta})}$ |
| Categorical | integer: $[0,K)$, or $K$-vector of integer: $[0,1]$, where exactly one element in the vector has the value 1 | Outcome of single $K$-way occurrence | Logit | $\mathbf{X}\boldsymbol{\beta}=\ln\left(\frac{\mu}{1-\mu}\right)$ | $\mu=\frac{1}{1+\exp(-\mathbf{X}\boldsymbol{\beta})}$ |
| Multinomial | $K$-vector of integer: $[0,N]$ | Count of occurrences of different types ($1,\ldots,K$) out of $N$ total $K$-way occurrences | Logit | $\mathbf{X}\boldsymbol{\beta}=\ln\left(\frac{\mu}{1-\mu}\right)$ | $\mu=\frac{1}{1+\exp(-\mathbf{X}\boldsymbol{\beta})}$ |
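The canonical link/mean pairs in the table can be sketched in Python as a quick sanity check that each mean function inverts its link (the function names here are illustrative, not from any particular library):

```python
import math

# Canonical (link, mean) pairs from the table above.
# Each link maps the mean mu to the linear predictor eta = X @ beta;
# the mean function maps eta back to mu.
LINKS = {
    "identity":        (lambda mu: mu,                      lambda eta: eta),
    "inverse":         (lambda mu: 1.0 / mu,                lambda eta: 1.0 / eta),
    "inverse_squared": (lambda mu: mu ** -2,                lambda eta: eta ** -0.5),
    "log":             (lambda mu: math.log(mu),            lambda eta: math.exp(eta)),
    "logit":           (lambda mu: math.log(mu / (1 - mu)), lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def check_inverse(name: str, mu: float) -> float:
    """Round-trip mu -> eta -> mu through a link/mean pair."""
    link, mean = LINKS[name]
    return mean(link(mu))
```

For example, `check_inverse("logit", 0.3)` maps 0.3 through the logit and back through the logistic mean function, returning 0.3 up to floating-point error.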
• In the cases of the exponential and gamma distributions, the domain of the canonical link function is not the same as the permitted range of the mean. In particular, the linear predictor may be negative, which would give an impossible negative mean. When maximizing the likelihood, precautions must be taken to avoid this. An alternative is to use a noncanonical link function.
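A minimal numeric sketch of that caveat (values chosen purely for illustration): with the canonical inverse link, a negative linear predictor yields an impossible negative mean, while a noncanonical log link keeps the mean positive for every real predictor.

```python
import math

eta = -0.5  # a linear predictor value X @ beta that happens to be negative

# Canonical inverse link for the exponential/gamma: mu = 1 / eta.
mu_canonical = 1.0 / eta   # negative, i.e. an impossible mean for these distributions
# Noncanonical log link: mu = exp(eta) is strictly positive for any real eta.
mu_log = math.exp(eta)
```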

Note also that in the case of the Bernoulli, binomial, categorical and multinomial distributions, the support of the distributions is not the same type of data as the parameter being predicted. In all of these cases, the predicted parameter is one or more probabilities, i.e. real numbers in the range $[0,1]$ . The resulting model is known as logistic regression (or multinomial logistic regression in the case that K-way rather than binary values are being predicted).

For the Bernoulli and binomial distributions, the parameter is a single probability, indicating the likelihood of occurrence of a single event. The Bernoulli still satisfies the basic condition of the generalized linear model in that, even though a single outcome will always be either 0 or 1, the expected value will nonetheless be a real-valued probability, i.e. the probability of occurrence of a "yes" (or 1) outcome. Similarly, in a binomial distribution, the expected value is $Np$, so the expected proportion of "yes" outcomes is the probability $p$ to be predicted.

For categorical and multinomial distributions, the parameter to be predicted is a K-vector of probabilities, with the further restriction that all probabilities must add up to 1. Each probability indicates the likelihood of occurrence of one of the K possible values. For the multinomial distribution, and for the vector form of the categorical distribution, the expected values of the elements of the vector can be related to the predicted probabilities similarly to the binomial and Bernoulli distributions.
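For the $K$-vector case, the mapping from linear predictors to probabilities that sum to 1 is commonly realized with a softmax (the multinomial-logit mean function); a minimal sketch, with the function name chosen for illustration:

```python
import math

def softmax_means(eta):
    """Map a K-vector of linear predictors to K probabilities summing to 1,
    as used for categorical/multinomial responses (multinomial logistic
    regression). Subtracting the max is a standard overflow guard; it does
    not change the result."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `softmax_means([1.0, 2.0, 3.0])` returns three probabilities that sum to 1, with the largest predictor receiving the largest probability.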