- (Badanidiyuru et al., 2014) ⇒ Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. (2014). “Streaming Submodular Maximization: Massive Data Summarization on the Fly.” In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2014) Journal. ISBN:978-1-4503-2956-9 doi:10.1145/2623330.2623637
Subject Headings: Nonparametric Learning in Massive Datasets Algorithm, Nonparametric Regression.
How can one summarize a massive data set “on the fly", i.e., without even having seen it in its entirety? In this paper, we address the problem of extracting representative elements from a large stream of data. I.e., we would like to select a subset of say k data points from the stream that are most representative according to some objective function. Many natural notions of “representativeness " satisfy submodularity, an intuitive notion of diminishing returns. Thus, such problems can be reduced to maximizing a submodular set function subject to a cardinality constraint. Classical approaches to submodular maximization require full access to the data set. We develop the first efficient streaming algorithm with constant factor [math]1/2-ε[/math] approximation guarantee to the optimum solution, requiring only a single pass through the data, and memory independent of data size. In our experiments, we extensively evaluate the effectiveness of our approach on several applications, including training large-scale kernel methods and exemplar-based clustering, on millions of data points. We observe that our streaming method, while achieving practically the same utility value, runs about 100 times faster than previous work.
|2014 StreamingSubmodularMaximization||Ashwinkumar Badanidiyuru|
|Streaming Submodular Maximization: Massive Data Summarization on the Fly||10.1145/2623330.2623637||2014|
|Author||Ashwinkumar Badanidiyuru +, Baharan Mirzasoleiman +, Amin Karbasi + and Andreas Krause +|
|proceedings||Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining +|
|title||Streaming Submodular Maximization: Massive Data Summarization on the Fly +|