# Logistic Regression Algorithm

A Logistic Regression Algorithm is a discriminative, maximum entropy-based, generalized linear classification algorithm that assumes the log-odds of an observed outcome $y$ can be expressed as a linear function of the input variables.

## References

### 2015

• (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/logistic_regression Retrieved:2015-5-13.
• In statistics, logistic regression, or logit regression, or logit model[1] is a direct probability model that was developed by statistician D. R. Cox in 1958[2] [3] although much work was done in the single independent variable case almost two decades earlier. The binary logistic model is used to predict a binary response based on one or more predictor variables (features). That is, it is used in estimating the parameters of a qualitative response model. The probabilities describing the possible outcomes of a single trial are modeled, as a function of the explanatory (predictor) variables, using a logistic function. Frequently (and hereafter in this article) "logistic regression" is used to refer specifically to the problem in which the dependent variable is binary—that is, the number of available categories is two—while problems with more than two categories are referred to as multinomial logistic regression or polytomous logistic regression, or, if the multiple categories are ordered, as ordinal logistic regression.[3]

Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables, which are usually (but not necessarily) continuous, by estimating probabilities. Thus, it treats the same set of problems as does probit regression using similar techniques; the first assumes a logistic function and the second a standard normal distribution function.
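The contrast between the two link functions named above can be illustrated with a small sketch. The function names `logistic_cdf` and `standard_normal_cdf` are illustrative choices, not taken from the quoted source; only the Python standard library is assumed.

```python
import math

def logistic_cdf(z: float) -> float:
    """Logistic distribution function: the curve assumed by logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

def standard_normal_cdf(z: float) -> float:
    """Standard normal distribution function: the curve assumed by probit regression."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Evaluate both curves on the same linear-predictor values.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z={z:+.1f}  logistic={logistic_cdf(z):.4f}  probit={standard_normal_cdf(z):.4f}")
```

Both curves are S-shaped, pass through 0.5 at `z = 0`, and are symmetric about that point. On the same scale of `z`, the normal curve approaches 0 and 1 more quickly than the logistic curve.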

Logistic regression can be seen as a special case of generalized linear model and thus analogous to linear regression. The model of logistic regression, however, is based on quite different assumptions (about the relationship between dependent and independent variables) from those of linear regression. In particular the key differences of these two models can be seen in the following two features of logistic regression. First, the conditional distribution $p(y \mid x)$ is a Bernoulli distribution rather than a Gaussian distribution, because the dependent variable is binary. Second, the estimated probabilities are restricted to [0,1] through the logistic distribution function because logistic regression predicts the probability of the instance being positive.
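The two features described above can be sketched as follows. This is a minimal illustration, not an implementation from the quoted source: the names `logistic`, `predict_proba`, and `bernoulli_log_likelihood`, and the coefficient values in the usage lines, are assumptions made for the example.

```python
import math

def logistic(z: float) -> float:
    """Map a real-valued linear predictor z to a probability."""
    # Branch on the sign of z so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, beta0, beta):
    """P(y = 1 | x) under a logistic model with intercept beta0 and weights beta."""
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return logistic(z)

def bernoulli_log_likelihood(y: int, p: float) -> float:
    """Log-probability of a binary outcome y (0 or 1) under Bernoulli(p), for 0 < p < 1."""
    return y * math.log(p) + (1 - y) * math.log(1.0 - p)

# Hypothetical coefficients and inputs, chosen only for illustration.
p = predict_proba([1.0, 2.0], beta0=0.5, beta=[1.0, -0.75])
print(p, bernoulli_log_likelihood(1, p))
```

Unlike a linear regression prediction, the output of `predict_proba` cannot fall outside the unit interval, however large the linear predictor becomes. The Bernoulli log-likelihood takes the place of the Gaussian squared-error criterion.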

Logistic regression is an alternative to Fisher's 1936 classification method, linear discriminant analysis. If the assumptions of linear discriminant analysis hold, application of Bayes' rule to reverse the conditioning results in the logistic model, so if linear discriminant assumptions are true, logistic regression assumptions must hold. The converse is not true, so the logistic model has fewer assumptions than discriminant analysis and makes no assumption on the distribution of the independent variables.
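The Bayes' rule step mentioned above can be written out for the two-class case. The symbols used here are assumptions introduced for illustration and do not appear in the quoted source: class priors $\pi_0, \pi_1$, class means $\mu_0, \mu_1$, and a shared covariance matrix $\Sigma$, so that $x \mid y = c \sim \mathcal{N}(\mu_c, \Sigma)$.

$$
\ln\frac{P(y=1 \mid x)}{P(y=0 \mid x)}
= \ln\frac{\pi_1}{\pi_0}
- \tfrac{1}{2}(x-\mu_1)^{\top}\Sigma^{-1}(x-\mu_1)
+ \tfrac{1}{2}(x-\mu_0)^{\top}\Sigma^{-1}(x-\mu_0)
$$

$$
= \underbrace{(\mu_1-\mu_0)^{\top}\Sigma^{-1}}_{\beta^{\top}}\, x
+ \underbrace{\ln\frac{\pi_1}{\pi_0} - \tfrac{1}{2}\left(\mu_1^{\top}\Sigma^{-1}\mu_1 - \mu_0^{\top}\Sigma^{-1}\mu_0\right)}_{\beta_0}
$$

The quadratic terms $x^{\top}\Sigma^{-1}x$ cancel because the covariance is shared, so the log-odds are linear in $x$, which is the logistic model. The reverse direction fails because a linear log-odds does not force the inputs to be Gaussian.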

### 2011

• (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Logistic_regression
• … Logistic regression analyzes binomially distributed data of the form $Y_i \sim B(n_i, p_i),\text{ for }i = 1, \dots, m,$ where the numbers of Bernoulli trials $n_i$ are known and the probabilities of success $p_i$ are unknown. An example of this distribution is the fraction of seeds ($p_i$) that germinate after $n_i$ are planted.

The model proposes for each trial $i$ there is a set of explanatory variables that might inform the final probability. These explanatory variables can be thought of as being in a $k$-dimensional vector $X_i$ and the model then takes the form $p_i = \operatorname{E}\left(\left.\frac{Y_i}{n_i}\right|X_i \right).$

The logits, natural logs of the odds, of the unknown binomial probabilities are modeled as a linear function of the $X_i$. $\operatorname{logit}(p_i)=\ln\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}.$ ...
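The binomial logit model above can be sketched for a single explanatory variable ($k = 1$). This is a minimal sketch under stated assumptions: the fitting method (Newton-Raphson on the binomial log-likelihood) is one common choice and is not specified by the quoted source. The germination counts are hypothetical, and the names `logit`, `inv_logit`, and `fit_binomial_logistic` are illustrative.

```python
import math

def logit(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def inv_logit(z: float) -> float:
    """Inverse of logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_binomial_logistic(x, n, y, iterations=25):
    """Estimate (b0, b1) in logit(p_i) = b0 + b1 * x_i by Newton-Raphson,
    where y_i successes were observed out of n_i trials."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [inv_logit(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the binomial log-likelihood.
        r = [yi - ni * pi for yi, ni, pi in zip(y, n, p)]
        g0 = sum(r)
        g1 = sum(ri * xi for ri, xi in zip(r, x))
        # Information matrix (negative Hessian) with weights n_i * p_i * (1 - p_i).
        w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system and update both coefficients.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x is a treatment level, n seeds planted, y seeds germinated.
x = [0.0, 1.0, 2.0, 3.0]
n = [20, 20, 20, 20]
y = [3, 8, 14, 18]
b0, b1 = fit_binomial_logistic(x, n, y)
for xi, ni, yi in zip(x, n, y):
    print(f"x={xi}: observed {yi / ni:.2f}, fitted {inv_logit(b0 + b1 * xi):.2f}")
```

The sketch runs a fixed number of iterations and has no safeguard for perfectly separated data, where the maximum likelihood estimate does not exist. A production fit would normally rely on an established statistics library instead.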

### 1970

• (Cox, 1970) ⇒ D. R. Cox. (1970). “The Analysis of Binary Data.” Methuen & Co.
