Multinomial Logistic Regression Task
A Multinomial Logistic Regression Task is a Logistic Regression Task that is a multinomial classification task.
- AKA: Softmax Task.
- Context:
- It can be solved by a Multinomial Logistic Regression System (that implements a multinomial logistic regression algorithm).
- See: Error Propagation, Logistic Regression, Statistical Classification, Linear Predictor Function, Linear Combination, Dot Product, Regression Coefficient, Discrete Choice, Utility, Perceptron, Support Vector Machine, Linear Discriminant Analysis.
References
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Multinomial_logistic_regression#Setup Retrieved:2017-7-24.
- The basic setup is the same as in logistic regression, the only difference being that the dependent variables are categorical rather than binary, i.e. there are K possible outcomes rather than just two. The following description is somewhat shortened; for more details, consult the logistic regression article.
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Multinomial_logistic_regression#Introduction Retrieved:2017-7-24.
- There are multiple equivalent ways to describe the mathematical model underlying multinomial logistic regression. This can make it difficult to compare different treatments of the subject in different texts. The article on logistic regression presents a number of equivalent formulations of simple logistic regression, and many of these have analogues in the multinomial logit model .
The idea behind all of them, as in many other statistical classification techniques, is to construct a linear predictor function that constructs a score from a set of weights that are linearly combined with the explanatory variables (features) of a given observation using a dot product: : [math]\displaystyle{ \operatorname{score}(\mathbf{X}_i,k) = \boldsymbol\beta_k \cdot \mathbf{X}_i, }[/math] where Xi is the vector of explanatory variables describing observation i, βk is a vector of weights (or regression coefficients) corresponding to outcome k, and score(Xi, k) is the score associated with assigning observation i to category k. In discrete choice theory, where observations represent people and outcomes represent choices, the score is considered the utility associated with person i choosing outcome k. The predicted outcome is the one with the highest score.
The difference between the multinomial logit model and numerous other methods, models, algorithms, etc. with the same basic setup (the perceptron algorithm, support vector machines, linear discriminant analysis, etc.) is the procedure for determining (training) the optimal weights/coefficients and the way that the score is interpreted. In particular, in the multinomial logit model, the score can directly be converted to a probability value, indicating the probability of observation i choosing outcome k given the measured characteristics of the observation. This provides a principled way of incorporating the prediction of a particular multinomial logit model into a larger procedure that may involve multiple such predictions, each with a possibility of error. Without such means of combining predictions, errors tend to multiply. For example, imagine a large predictive model that is broken down into a series of submodels where the prediction of a given submodel is used as the input of another submodel, and that prediction is in turn used as the input into a third submodel, etc. If each submodel has 90% accuracy in its predictions, and there are five submodels in series, then the overall model has only .95 = 59% accuracy. If each submodel has 80% accuracy, then overall accuracy drops to .85 = 33% accuracy. This issue is known as error propagation and is a serious problem in real-world predictive models, which are usually composed of numerous parts. Predicting probabilities of each possible outcome, rather than simply making a single optimal prediction, is one means of alleviating this issue.
- There are multiple equivalent ways to describe the mathematical model underlying multinomial logistic regression. This can make it difficult to compare different treatments of the subject in different texts. The article on logistic regression presents a number of equivalent formulations of simple logistic regression, and many of these have analogues in the multinomial logit model .