2012 DesignPrinciplesofMassiveRobust

Subject Headings:

Notes

Most data mining research is concerned with building high-quality classification models in isolation. In massive production systems, however, the ability to monitor and maintain performance over time while growing in size and scope is equally important. Many external factors may degrade classification performance, including changes in data distribution, noise or bias in the source data, and the evolution of the system itself. A well-functioning system must gracefully handle all of these. This paper lays out a set of design principles for large-scale autonomous data mining systems and then demonstrates our application of these principles within the m6d automated ad targeting system. We demonstrate a comprehensive set of quality control processes that allow us to monitor and maintain thousands of distinct classification models automatically, and to add new models, take on new data, and correct poorly-performing models without manual intervention or system disruption.

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2012 DesignPrinciplesofMassiveRobust	Foster Provost, Brian Dalessandro, Claudia Perlich, Ori Stitelman, Troy Raeder			Design Principles of Massive, Robust Prediction Systems				10.1145/2339530.2339740		2012