2009 Mismatched Models, Wrong Results, and Dreadful Decisions


Subject Headings:


Cited By


Author Keywords


Data mining techniques use 'score functions' to quantify how well a model fits a given data set. Parameters are estimated by optimising the fit, as measured by the chosen score function, and model choice is guided by the size of the scores for the different models. Since different score functions summarise the fit in different ways, it is important to choose a function which matches the objectives of the data mining exercise. For predictive classification problems, a wide variety of score functions exist, including measures such as precision and recall, the F measure, misclassification rate, the area under the ROC curve (the AUC), and others. The first four of these require a 'classification threshold' to be chosen, a choice which may not be easy, or may even be impossible, especially when the classification rule is to be applied in the future. In contrast, the AUC does not require the specification of a classification threshold, but summarises performance over the range of possible threshold choices. However, unfortunately, and despite the widespread use of the AUC, it has a previously unrecognised fundamental incoherence lying at the core of its definition. This means that using the AUC can lead to poor model choice and unnecessary misclassifications. The AUC is set in context, its deficiency explained and the implications illustrated, with the bottom line being that the AUC should not be used. A family of coherent alternative scores is described. The ideas are illustrated with examples from bank loans, fraud, face recognition, and health screening.
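The contrast the abstract draws can be made concrete. A minimal sketch (not from the paper, names are illustrative): the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Mann-Whitney U form), so it needs no threshold, whereas misclassification rate only exists once a specific classification threshold has been fixed.

```python
def auc(scores, labels):
    """AUC via pairwise positive-vs-negative comparisons; ties count half.
    No classification threshold is needed."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def misclassification_rate(scores, labels, threshold):
    """Error rate at one fixed threshold -- the choice that AUC avoids
    but that precision, recall, F, and error rate all require."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)

scores = [0.1, 0.4, 0.35, 0.8]  # toy classifier scores
labels = [0, 0, 1, 1]           # true classes
print(auc(scores, labels))                          # 0.75
print(misclassification_rate(scores, labels, 0.5))  # 0.25
```

Hand's objection, per the abstract, is that the AUC's implicit averaging over thresholds is incoherent across classifiers; the sketch only shows why it is nonetheless attractive, since no threshold must be committed to in advance.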



 Author: David J. Hand
 Title: Mismatched Models, Wrong Results, and Dreadful Decisions: On Choosing Appropriate Data Mining Tools
 Venue: KDD-2009 Proceedings
 DOI: 10.1145/1557019.1557021
 Year: 2009