2009 Mismatched Models, Wrong Results, and Dreadful Decisions


Subject Headings:


Cited By


Author Keywords


Data mining techniques use 'score functions' to quantify how well a model fits a given data set. Parameters are estimated by optimising the fit, as measured by the chosen score function, and model choice is guided by the size of the scores for the different models. Since different score functions summarise the fit in different ways, it is important to choose a function which matches the objectives of the data mining exercise. For predictive classification problems, a wide variety of score functions exist, including measures such as precision and recall, the F measure, misclassification rate, the area under the ROC curve (the AUC), and others. The first four of these require a 'classification threshold' to be chosen, a choice which may not be easy, or may even be impossible, especially when the classification rule is to be applied in the future. In contrast, the AUC does not require the specification of a classification threshold, but summarises performance over the range of possible threshold choices. However, unfortunately, and despite the widespread use of the AUC, it has a previously unrecognised fundamental incoherence lying at the core of its definition. This means that using the AUC can lead to poor model choice and unnecessary misclassifications. The AUC is set in context, its deficiency explained and the implications illustrated, with the bottom line being that the AUC should not be used. A family of coherent alternative scores is described. The ideas are illustrated with examples from bank loans, fraud, face recognition, and health screening.
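The contrast the abstract draws can be made concrete. A minimal sketch (not from the paper, names are illustrative): the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Mann-Whitney U form), so it needs no threshold, whereas misclassification rate only exists once a specific classification threshold has been fixed.

```python
def auc(scores, labels):
    """AUC via pairwise positive-vs-negative comparisons; ties count half.
    No classification threshold is needed."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def misclassification_rate(scores, labels, threshold):
    """Error rate at one fixed threshold -- the choice that AUC avoids
    but that precision, recall, F, and error rate all require."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)

scores = [0.1, 0.4, 0.35, 0.8]  # toy classifier scores
labels = [0, 0, 1, 1]           # true classes
print(auc(scores, labels))                          # 0.75
print(misclassification_rate(scores, labels, 0.5))  # 0.25
```

Hand's objection, per the abstract, is that the AUC's implicit averaging over thresholds is incoherent across classifiers; the sketch only shows why it is nonetheless attractive, since no threshold must be committed to in advance.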



 Author: David J. Hand
 Title: Mismatched Models, Wrong Results, and Dreadful Decisions: On Choosing Appropriate Data Mining Tools
 Venue: KDD-2009 Proceedings
 DOI: 10.1145/1557019.1557021
 Year: 2009