2010 EstimatingRatesofRareEventswith

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

We consider the problem of estimating rates of rare events for high dimensional, multivariate categorical data where several dimensions are hierarchical. Such problems are routine in several data mining applications including computational advertising, our main focus in this paper. We propose LMMH, a novel log-linear modeling method that scales to massive data applications with billions of training records and several million potential predictors in a map-reduce framework. Our method exploits correlations in aggregates observed at multiple resolutions when working with multiple hierarchies; stable estimates at coarser resolution provide informative prior information to improve estimates at finer resolutions. Other than prediction accuracy and scalability, our method has an inbuilt variable screening procedure based on a “spike and slab prior” that provides parsimony by removing non-informative predictors without hurting predictive accuracy. We perform large scale experiments on data from real computational advertising applications and illustrate our approach on datasets with several billion records and hundreds of millions of predictors. Extensive comparisons with other benchmark methods show significant improvements in prediction accuracy.
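
The abstract's point that stable coarse-resolution estimates can act as prior information for sparse fine-resolution cells can be illustrated with a small sketch. This is not LMMH and is not taken from the paper; it assumes a simple Gamma-Poisson shrinkage in which each fine-level rate is pulled toward its parent's pooled rate. The cell names, counts, and `PRIOR_STRENGTH` value are hypothetical.

```python
def shrunken_rate(events, trials, parent_rate, prior_strength):
    """Posterior-mean rate under a Gamma prior centered on the parent rate.

    prior_strength is the number of pseudo-trials contributed by the prior.
    """
    return (events + prior_strength * parent_rate) / (trials + prior_strength)


# Hypothetical data: (events, trials) per fine-level cell under one coarse parent.
cells = {"ad_1": (0, 50), "ad_2": (3, 20_000), "ad_3": (40, 100_000)}

# Coarse-level rate pooled over all children; stable because it has many trials.
total_events = sum(e for e, _ in cells.values())
total_trials = sum(n for _, n in cells.values())
parent_rate = total_events / total_trials

PRIOR_STRENGTH = 1_000  # assumed pseudo-trials; not a value from the paper

# Sparse cells move strongly toward the parent rate; data-rich cells barely move.
smoothed = {
    name: shrunken_rate(e, n, parent_rate, PRIOR_STRENGTH)
    for name, (e, n) in cells.items()
}
```

With these made-up counts, `ad_1` (no events in 50 trials) receives a small positive estimate close to the parent rate instead of zero, while `ad_3` stays near its raw rate because its own data dominate the prior.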

References

Citation record for 2010 EstimatingRatesofRareEventswith:
Author: Deepak Agarwal, Rahul Agrawal, Rajiv Khanna, Nagaraj Kota
Title: Estimating Rates of Rare Events with Multiple Hierarchies through Scalable Log-linear Models
Journal: KDD-2010 Proceedings
URL: http://users.cis.fiu.edu/~lzhen001/activities/KDD USB key 2010/docs/p213.pdf
DOI: 10.1145/1835804.1835834
Year: 2010