Confusion Matrix
From GM-RKB
A confusion matrix is a square matrix that tabulates the counts of a classifier's predicted classes against the actual classes on some labeled data set.
- Context:
- It has size \(L \times L\), where \(L\) is the number of target labels.
- It can be used to calculate a classifier performance metric.
- It can illustrate a predictive algorithm's:
- likelihood of predicting each one of the \(L\) classes.
- prediction areas where the function encounters difficulties.
- It can be a 2x2 table when it describes a binary classification model.
- Example:
- A confusion matrix for a classification task with three (c=3) output classes: A, B and C. The test set used to evaluate the algorithm contained 100 cases with a distribution of 30 As, 35 Bs and 35 Cs. A perfect classifier would have made predictions only along the diagonal, but the results below show that the algorithm was correct on only (20+25+24)/100 = 69% of the cases. The matrix can be used to infer that the classifier often confuses A for C (11 incorrect) and C for A (9 incorrect). This matrix also includes summations of the rows and columns.
    | A | B | C | SUM
A | 20 | 2 | 11 | 34
B | 2 | 25 | 1 | 28
C | 9 | 5 | 24 | 38
SUM | 31 | 32 | 36 | 100
- Counter-Example(s):
- See: Classification Task.
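As a minimal sketch of the definition above, the following Python builds an \(L \times L\) confusion matrix from paired actual and predicted labels and reads accuracy off the diagonal. The function names, the tiny label lists, and the choice of rows for actual classes and columns for predicted classes are assumptions made for illustration only.

```python
def confusion_matrix(actual, predicted, labels):
    """Return an L x L list of lists.

    Assumed orientation: rows = actual class, columns = predicted class.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of cases on the main diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels, not the 100-case example from the text.
labels = ["A", "B", "C"]
actual = ["A", "A", "B", "B", "C", "C", "C"]
predicted = ["A", "C", "B", "B", "C", "A", "C"]

cm = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, cm):
    print(label, row)
print("accuracy:", accuracy(cm))
```

Off-diagonal cells show which pairs of classes are confused; here one A is predicted as C and one C is predicted as A.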
References
2011
- (Ting, 2011a) ⇒ Kai Ming Ting. (2011). "Confusion Matrix." In: (Sammut & Webb, 2011) p.209
- (Wikipedia, 2011) http://en.wikipedia.org/wiki/Confusion_matrix
- In the field of artificial intelligence, a confusion matrix is a visualization tool typically used in supervised learning (in unsupervised learning it is typically called a matching matrix). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. One benefit of a confusion matrix is that it is easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another). When a data set is unbalanced (when the number of samples in different classes vary greatly) the error rate of a classifier is not representative of the true performance of the classifier. This can easily be understood by an example: If there are 990 samples from class A and only 10 samples from class B, the classifier can easily be biased towards class A. If the classifier classifies all the samples as class A, the accuracy will be 99%. This is not a good indication of the classifier's true performance. The classifier has a 100% recognition rate for class A but a 0% recognition rate for class B.
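The unbalanced example in the passage above can be checked with a short Python sketch. The matrix layout follows the passage (rows are actual classes, columns are predicted classes); the variable names are illustrative assumptions.

```python
# 990 actual A samples and 10 actual B samples; the classifier labels
# every sample as A.
matrix = [
    [990, 0],  # actual A: predicted A, predicted B
    [10, 0],   # actual B: predicted A, predicted B
]

total = sum(sum(row) for row in matrix)
accuracy = (matrix[0][0] + matrix[1][1]) / total

# Per-class recognition rate: correct predictions divided by the row total.
recognition_rate = [row[i] / sum(row) for i, row in enumerate(matrix)]

print("accuracy:", accuracy)
print("recognition rate for A, B:", recognition_rate)
```

The overall accuracy is 0.99 even though no class B sample is ever recognized, which is why the per-class rates are the more informative reading of the matrix.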
2007
- (Griffin & al, 2007) ⇒ Gregory Griffin, Alex Holub, and Pietro Perona. (2007). "Caltech-256 Object Category Dataset." California Institute of Technology - Technical Report. (Unpublished) http://resolver.caltech.edu/CaltechAUTHORS:CNS-TR-2007-001
- QUOTE: Figure 13: Selected rows and columns of the 256 x 256 confusion matrix \(M\) for spatial pyramid matching [2] and \(N_{train}=30\). Matrix elements containing 0.0 have been left blank. The first 6 categories are chosen because they are likely to be confounded with the last 6 categories. The main diagonal shows the performance on just these 12 categories. The diagonals of the other 2 quadrants show whether the algorithm can detect categories which are similar but not exact. ... Confusion between car-tire and car-side is entirely absent.
2006
- (Fawcett, 2006a) ⇒ Tom Fawcett. (2006). "An Introduction to ROC Analysis." In: Pattern Recognition Letters, 27(8). doi:10.1016/j.patrec.2005.10.010
- QUOTE: Given a classifier and an instance, there are four possible outcomes. If the instance is positive and it is classified as positive, it is counted as a true positive; if it is classified as negative, it is counted as a false negative. If the instance is negative and it is classified as negative, it is counted as a true negative; if it is classified as positive, it is counted as a false positive. Given a classifier and a set of instances (the test set), a two-by-two confusion matrix (also called a contingency table) can be constructed representing the dispositions of the set of instances. This matrix forms the basis for many common metrics. Fig. 1 shows a confusion matrix and equations of several common metrics that can be calculated from it. The numbers along the major diagonal represent the correct decisions made, and the numbers off this diagonal represent the errors — the confusion — between the various classes.
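The four outcomes described in the quote can be tallied as in the following Python sketch. The function name `tally_outcomes` and the boolean test data are hypothetical; the true positive rate and false positive rate are shown as two examples of metrics derivable from the counts, not as a reproduction of the paper's Fig. 1.

```python
def tally_outcomes(actual, predicted):
    """Count the four outcomes for a binary classifier.

    Each element is True for the positive class, False for the negative class.
    """
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for is_positive, classified_positive in zip(actual, predicted):
        if is_positive:
            counts["TP" if classified_positive else "FN"] += 1
        else:
            counts["FP" if classified_positive else "TN"] += 1
    return counts


# Hypothetical test set of eight instances.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]

counts = tally_outcomes(actual, predicted)
tp_rate = counts["TP"] / (counts["TP"] + counts["FN"])
fp_rate = counts["FP"] / (counts["FP"] + counts["TN"])

print(counts)
print("tp rate:", tp_rate, "fp rate:", fp_rate)
```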
2002
- (Hamilton, 2002) ⇒ Howard Hamilton. (2002). "Confusion Matrix." Class Notes for Computer Science 831: Knowledge Discovery in Databases. University of Regina
- QUOTE: A confusion matrix (Kohavi and Provost, 1998) contains information about actual and predicted classifications done by a classification system. Performance of such systems is commonly evaluated using the data in the matrix.
1998
- (Kohavi & Provost, 1998) ⇒ Ron Kohavi, and Foster Provost. (1998). "Glossary of Terms." In: Machine Learning 30(2-3).
- Confusion matrix: A matrix showing the predicted and actual classifications. A confusion matrix is of size LxL, where L is the number of different label values. The following confusion matrix is for L=2:
actual \ predicted | negative | positive
Negative | a | b
Positive | c | d
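Reading the cells of the \(L=2\) layout above (rows are actual classes, columns are predicted classes), several common rates can be written in terms of \(a\), \(b\), \(c\) and \(d\). The expressions below are derived from that layout under the usual definitions and are offered as a sketch rather than as a quotation from the glossary:

\[
\text{accuracy} = \frac{a + d}{a + b + c + d}, \qquad
\text{true positive rate} = \frac{d}{c + d}, \qquad
\text{true negative rate} = \frac{a}{a + b}, \qquad
\text{precision} = \frac{d}{b + d}
\]

Here \(a\) and \(d\) lie on the main diagonal (correct predictions), while \(b\) counts actual negatives predicted as positive and \(c\) counts actual positives predicted as negative.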
1971
- (Townsend, 1971) ⇒ J. T. Townsend. (1971). "Theoretical analysis of an alphabetic confusion matrix." In: Attention, Perception, & Psychophysics, 9(1).
- ABSTRACT: Attempted to acquire a confusion matrix of the entire upper-case English alphabet with a simple nonserified font under tachistoscopic conditions. This was accomplished with 2 experimental conditions, 1 with blank poststimulus field and 1 with noisy poststimulus field, for 6 Ss in 650 trials each. Results were: (a) the finite-state model that assumed stimulus similarity (the overlap activation model) and the choice model predicted the confusion-matrix entries about equally well in terms of a sum-of-squared deviations criterion and better than the all-or-none activation model, which assumed only a perfect perception or random-guessing state following a stimulus presentation; (b) the parts of the confusion matrix that fit best varied with the particular model, and this finding was related to the models; (c) the best scaling result in terms of a goodness-of-fit measure was obtained with the blank poststimulus field condition, with a technique allowing different distances for tied similarity values, and with the Euclidean as opposed to the city-block metric; and (d) there was agreement among the models in terms of the way in which the models reflected sensory and response bias structure in the data, and in the way in which a single model measured these attributes across experimental conditions, as well as agreement among similarity and distance measures with physical similarity. (24 ref.)