Confusion Matrix

A confusion matrix is a contingency table that counts a classifier's class predictions against the actual classes on some labeled data set.

Context:
- It can (typically) have table size [math]\displaystyle{ L \times L }[/math], where [math]\displaystyle{ L }[/math] is the number of target labels.
- It can range from being a 2x2 Confusion Matrix (for binary classification) to being a 3x3 Confusion Matrix to being ...
- It can be used to support a Classifier Performance Metric.
- It can be used to illustrate a predictive algorithm's:
  - likelihood of predicting each one of [math]\displaystyle{ L }[/math] classes.
  - prediction areas where the classifier encounters difficulties.
Example:
- A confusion matrix for a classification task with three (c=3) output classes: A, B and C. The test set used to evaluate the algorithm contained 100 cases with a distribution of 33 As, 28 Bs and 38 Cs (the row sums below). A perfect classifier would have made predictions only along the diagonal, but the results below show that the algorithm was correct on only (20+25+24)/100 = 69% of the cases. The matrix can be used to infer that the classifier often predicts C for actual A cases (11 errors) and A for actual C cases (9 errors). This matrix also includes summations of the rows and columns.

ACTUAL/PREDICTED	A	B	C	SUM
A	20	2	11	33
B	2	25	1	28
C	9	5	24	38
SUM	31	32	36	100

Counter-Example(s):
- a Cost-Benefit Matrix.
- a Joint Probability Table.
- a Learning Curve.
- a ROC Graph.
- a Contingency Table.
See: Classification Task.

References

2018

(Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/confusion_matrix Retrieved:2018-7-19.
- In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa). The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
  It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table).

2011

(Ting, 2011a) ⇒ Kai Ming Ting. (2011). “Confusion Matrix.” In: (Sammut & Webb, 2011) p.209

2007

(Griffin et al., 2007) ⇒ Gregory Griffin, Alex Holub, and Pietro Perona. (2007). “Caltech-256 Object Category Dataset.” California Institute of Technology - Technical Report. (Unpublished) http://resolver.caltech.edu/CaltechAUTHORS:CNS-TR-2007-001
- QUOTE: Figure 13: Selected rows and columns of the 256 x 256 confusion matrix [math]\displaystyle{ M }[/math] for spatial pyramid matching [2] and [math]\displaystyle{ N_{train}=30 }[/math]. Matrix elements containing 0.0 have been left blank. The first 6 categories are chosen because they are likely to be confounded with the last 6 categories. The main diagonal shows the performance on just these 12 categories. The diagonals of the other 2 quadrants show whether the algorithm can detect categories which are similar but not exact. … Confusion between car-tire and car-side is entirely absent.

2006

(Fawcett, 2006a) ⇒ Tom Fawcett. (2006). “An Introduction to ROC Analysis.” In: Pattern Recognition Letters, 27(8). doi:10.1016/j.patrec.2005.10.010
- QUOTE: Given a classifier and an instance, there are four possible outcomes. If the instance is positive and it is classified as positive, it is counted as a true positive; if it is classified as negative, it is counted as a false negative. If the instance is negative and it is classified as negative, it is counted as a true negative; if it is classified as positive, it is counted as a false positive. Given a classifier and a set of instances (the test set), a two-by-two confusion matrix (also called a contingency table) can be constructed representing the dispositions of the set of instances. This matrix forms the basis for many common metrics. Fig. 1 shows a confusion matrix and equations of several common metrics that can be calculated from it. The numbers along the major diagonal represent the correct decisions made, and the numbers off this diagonal represent the errors (the confusion) between the various classes.
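
The four outcomes described in this quote can be tallied directly from paired labels. The following is a minimal sketch, not Fawcett's own code: the function name `binary_confusion_counts` and the sample label lists are hypothetical, and the three metrics shown are the standard definitions of true positive rate, false positive rate and accuracy.

```python
def binary_confusion_counts(actual, predicted, positive=True):
    """Tally the four outcomes of a binary classifier (hypothetical helper)."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # positive instance classified as positive
            else:
                fn += 1  # positive instance classified as negative
        else:
            if p == positive:
                fp += 1  # negative instance classified as positive
            else:
                tn += 1  # negative instance classified as negative
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


# Made-up labels, for illustration only.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]

counts = binary_confusion_counts(actual, predicted)
tp_rate = counts["tp"] / (counts["tp"] + counts["fn"])  # positives correctly classified
fp_rate = counts["fp"] / (counts["fp"] + counts["tn"])  # negatives incorrectly classified
accuracy = (counts["tp"] + counts["tn"]) / len(actual)

print(counts, tp_rate, fp_rate, accuracy)
```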

2002

(Hamilton, 2002) ⇒ Howard Hamilton. (2002). “Confusion Matrix.” Class Notes for Computer Science 831: Knowledge Discovery in Databases. University of Regina
- QUOTE: A confusion matrix (Kohavi and Provost, 1998) contains information about actual and predicted classifications done by a classification system. Performance of such systems is commonly evaluated using the data in the matrix.

1998

(Kohavi & Provost, 1998) ⇒ Ron Kohavi, and Foster Provost. (1998). “Glossary of Terms.” In: Machine Learning 30(2-3).
- Confusion matrix: A matrix showing the predicted and actual classifications. A confusion matrix is of size LxL, where L is the number of different label values. The following confusion matrix is for L=2:

actual \ predicted	negative	positive
Negative	a	b
Positive	c	d
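
An LxL matrix in this layout (rows for actual labels, columns for predicted labels) can be built by counting label pairs. The sketch below is illustrative only: the `confusion_matrix` function and the sample data are assumptions introduced here, not part of the glossary.

```python
def confusion_matrix(actual, predicted, labels):
    """Build an LxL count matrix: rows = actual label, columns = predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Made-up labels for the L=2 case, for illustration only.
actual = ["negative", "negative", "positive", "positive", "positive", "negative"]
predicted = ["negative", "positive", "positive", "negative", "positive", "negative"]

m = confusion_matrix(actual, predicted, ["negative", "positive"])
(a, b), (c, d) = m  # same cell names as in the L=2 table above

# Accuracy is the share of cases on the diagonal.
accuracy = (a + d) / (a + b + c + d)
print(m, accuracy)
```

The same function handles any number of labels, since the matrix size is taken from the `labels` list.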

1971

(Townsend, 1971) ⇒ J. T. Townsend. (1971). “Theoretical Analysis of an Alphabetic Confusion Matrix.” In: Attention, Perception, & Psychophysics, 9(1).
- ABSTRACT: Attempted to acquire a confusion matrix of the entire upper-case English alphabet with a simple nonserified font under tachistoscopic conditions. This was accomplished with 2 experimental conditions, 1 with blank poststimulus field and 1 with noisy poststimulus field, for 6 Ss in 650 trials each. Results were: (a) the finite-state model that assumed stimulus similarity (the overlap activation model) and the choice model predicted the confusion-matrix entries about equally well in terms of a sum-of-squared deviations criterion and better than the all-or-none activation model, which assumed only a perfect perception or random-guessing state following a stimulus presentation; (b) the parts of the confusion matrix that fit best varied with the particular model, and this finding was related to the models; (c) the best scaling result in terms of a goodness-of-fit measure was obtained with the blank poststimulus field condition, with a technique allowing different distances for tied similarity values, and with the Euclidean as opposed to the city-block metric; and (d) there was agreement among the models in terms of the way in which the models reflected sensory and response bias structure in the data, and in the way in which a single model measured these attributes across experimental conditions, as well as agreement among similarity and distance measures with physical similarity. (24 ref.)
