Area Under the Receiver-Operator Curve (AUC) Metric


An Area Under the Receiver-Operator Curve (AUC) Metric is a classifier performance measure based on the area under an ROC curve.



  • (Wikipedia, 2015) ⇒ Retrieved:2015-7-18.
    • When using normalized units, the area under the curve (often referred to simply as the AUC, or AUROC) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').[1] This can be seen as follows: the area under the curve is given by (the integral boundaries are reversed because a large T has a lower value on the x-axis):
      [math] A = \int_{\infty}^{-\infty} y(T) \, x'(T) \, dT = \int_{\infty}^{-\infty} \mbox{TPR}(T) \, \mbox{FPR}'(T) \, dT = \int_{-\infty}^{\infty} \mbox{TPR}(T) \, P_0(T) \, dT = \langle \mbox{TPR} \rangle [/math]
      The third equality uses [math] \mbox{FPR}'(T) = -P_0(T) [/math], where [math] P_0 [/math] is the probability density of the scores of negative samples, and the angular brackets denote an average over the distribution of negative samples.
    • It can further be shown that the AUC is closely related to the Mann–Whitney U,[2][3] which tests whether positives are ranked higher than negatives. It is also equivalent to the Wilcoxon test of ranks.[3]
    • The AUC is related to the Gini coefficient ([math] G_1 [/math]) by the formula [math] G_1 = 2 \mbox{AUC} - 1 [/math], where:
      [math] G_1 = 1 - \sum_{k=1}^n (X_{k} - X_{k-1}) (Y_k + Y_{k-1}) [/math] [4]
      In this way, it is possible to calculate the AUC by using an average of a number of trapezoidal approximations.
    • It is also common to calculate the Area Under the ROC Convex Hull (ROC AUCH = ROCH AUC), as any point on the line segment between two prediction results can be achieved by randomly using one or the other system with probabilities proportional to the relative length of the opposite component of the segment. It is also possible to invert concavities: just as in the figure the worse solution can be reflected to become a better solution, concavities can be reflected in any line segment, but this more extreme form of fusion is much more likely to overfit the data.[5]
    • The machine learning community most often uses the ROC AUC statistic for model comparison.
      However, this practice has recently been questioned on the basis of machine learning research showing that the AUC is quite noisy as a classification measure[6] and has some other significant problems in model comparison.[7][8] A reliable and valid AUC estimate can be interpreted as the probability that the classifier will assign a higher score to a randomly chosen positive example than to a randomly chosen negative example. However, the critical research[6][7] suggests frequent failures in obtaining reliable and valid AUC estimates. Thus, the practical value of the AUC measure has been called into question,[8] raising the possibility that the AUC may actually introduce more uncertainty into machine learning classification accuracy comparisons than resolution.
    • Nonetheless, the coherence of AUC as a measure of aggregated classification performance has been vindicated, in terms of a uniform rate distribution,[9] and AUC has been linked to a number of other performance metrics such as the Brier score.[10]
    • One recent explanation of the problem with ROC AUC is that reducing the ROC curve to a single number ignores the fact that the curve is about the tradeoffs between the different systems or performance points plotted, not the performance of an individual system. It also ignores the possibility of concavity repair. Related alternative measures such as Informedness[11] or DeltaP are therefore recommended. These measures are essentially equivalent to the Gini for a single prediction point, with DeltaP' = Informedness = 2AUC-1, whilst DeltaP = Markedness represents the dual (viz. predicting the prediction from the real class), and their geometric mean is the Matthews correlation coefficient.[11]
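The equivalences quoted above (ranking probability, trapezoidal area, and the Gini relation) can be checked numerically on a toy example. The following is a minimal sketch, not a reference implementation: the function names and the toy data are hypothetical, and ties are assumed to count as half a correctly ranked pair. Because the quoted text does not define X_k and Y_k, the sketch assumes X is the true positive rate and Y the false positive rate, which is the convention under which the quoted sum equals 2AUC - 1.

```python
def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly (ties count 1/2).

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """ROC points (FPR, TPR) obtained by lowering the threshold T step by step."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume every instance tied at this score before emitting a point.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve (sum of trapezoids)."""
    return sum((x1 - x0) * (y1 + y0) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def gini_from_sum(points):
    """G_1 = 1 - sum (X_k - X_{k-1})(Y_k + Y_{k-1}), ASSUMING X = TPR, Y = FPR."""
    return 1.0 - sum((tpr1 - tpr0) * (fpr1 + fpr0)
                     for (fpr0, tpr0), (fpr1, tpr1) in zip(points, points[1:]))


# Hypothetical toy data: 1 = positive, 0 = negative.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
auc_rank = pairwise_auc(scores, labels)
auc_trap = trapezoid_auc(points)
gini = gini_from_sum(points)
print(auc_rank, auc_trap, gini, 2 * auc_trap - 1)
```

On this data, 8 of the 9 positive-negative pairs are ranked correctly, so both AUC computations should agree at 8/9 and the Gini value at 7/9, up to floating-point rounding.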

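The single-point relations mentioned above (Informedness = 2AUC - 1, and the Matthews correlation coefficient as the geometric mean of Informedness and Markedness) can be illustrated from one confusion matrix. This is a sketch under stated assumptions: the function name and the counts are hypothetical, the "single-point AUC" is taken to be the area under the ROC curve through (0,0), (FPR,TPR), and (1,1), and the geometric-mean identity is checked only for a classifier that is better than chance, so both factors are positive.

```python
import math


def single_point_measures(tp, fn, fp, tn):
    """Informedness, Markedness, MCC and single-point AUC from a 2x2 table."""
    tpr = tp / (tp + fn)   # true positive rate (recall)
    tnr = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    fpr = 1.0 - tnr

    informedness = tpr + tnr - 1.0   # DeltaP'
    markedness = ppv + npv - 1.0     # DeltaP

    # Area under the two-segment ROC curve (0,0) -> (fpr,tpr) -> (1,1).
    auc = fpr * tpr / 2.0 + (1.0 - fpr) * (tpr + 1.0) / 2.0

    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))

    return {"informedness": informedness, "markedness": markedness,
            "auc": auc, "mcc": mcc}


# Hypothetical counts for illustration only.
m = single_point_measures(tp=40, fn=10, fp=20, tn=30)
print(m)
print(2 * m["auc"] - 1)                               # compare with informedness
print(math.sqrt(m["informedness"] * m["markedness"])) # compare with mcc
```

For a classifier that is worse than chance, Informedness and Markedness are both negative, and the geometric mean of their magnitudes would need the sign restored to match the coefficient.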








  1. Fawcett, Tom (2006). "An introduction to ROC analysis". Pattern Recognition Letters. 27: 861–874.
  2. Hanley, James A.; McNeil, Barbara J. (1982). "The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve". Radiology. 143 (1): 29–36. doi:10.1148/radiology.143.1.7063747. PMID 7063747.
  3. Mason, Simon J.; Graham, Nicholas E. (2002). "Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation". Quarterly Journal of the Royal Meteorological Society. 128: 2145–2166. doi:10.1256/003590002320603584.
  4. Hand, David J.; Till, Robert J. (2001). "A simple generalization of the area under the ROC curve for multiple class classification problems". Machine Learning. 45: 171–186.
  5. Flach, P.A.; Wu, S. (2005). "Repairing concavities in ROC curves". 19th International Joint Conference on Artificial Intelligence (IJCAI'05). pp. 702–707.
  6. Hanczar, Blaise; Hua, Jianping; Sima, Chao; Weinstein, John; Bittner, Michael; Dougherty, Edward R. (2010). "Small-sample precision of ROC-related estimates". Bioinformatics. 26 (6): 822–830.
  7. Lobo, Jorge M.; Jiménez-Valverde, Alberto; Real, Raimundo (2008). "AUC: a misleading measure of the performance of predictive distribution models". Global Ecology and Biogeography. 17: 145–151.
  8. Hand, David J. (2009). "Measuring classifier performance: A coherent alternative to the area under the ROC curve". Machine Learning. 77: 103–123.
  9. Flach, P.A.; Hernandez-Orallo, J.; Ferri, C. (2011). "A coherent interpretation of AUC as a measure of aggregated classification performance". Proceedings of the 28th International Conference on Machine Learning (ICML-11). pp. 657–664.
  10. Hernandez-Orallo, J.; Flach, P.A.; Ferri, C. (2012). "A unified view of performance metrics: translating threshold choice into expected classification loss". Journal of Machine Learning Research. 13: 2813–2869.
  11. [1]