2014 GraphSampleandHoldAFrameworkfor

From GM-RKB


(Ahmed et al., 2014) ⇒ Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, and Ramana Kompella. (2014). “Graph Sample and Hold: A Framework for Big-graph Analytics.” In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2014). ISBN:978-1-4503-2956-9 doi:10.1145/2623330.2623757

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Graph algorithms; graph streams; network sampling; statistical estimation

Abstract

Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g. web graphs, social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy.

While previous work focused particularly on sampling schemes to estimate certain graph properties (e.g. triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, called Graph Sample and Hold (gSH), which samples from massive graphs sequentially in a single pass, one edge at a time, while maintaining a small state in memory. We use a Horvitz-Thompson construction in conjunction with a scheme that samples arriving edges without adjacencies to previously sampled edges with probability p and holds edges with adjacencies with probability q. Our sample and hold framework facilitates the accurate estimation of subgraph patterns by enabling the dependence of the sampling process to vary based on previous history. Within our framework, we show how to produce statistically unbiased estimators for various graph properties from the sample. Given that the graph analytics will run on a sample instead of the whole population, the runtime complexity is kept under control. Moreover, given that the estimators are unbiased, the approximation error is also kept under control. Finally, we test the performance of the proposed framework (gSH) on various types of graphs, showing that from a sample with ≤ 40K edges, it produces estimates with relative errors < 1%.
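
The sampling rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `graph_sample_and_hold` and `estimate_edge_count` are hypothetical, the stream is assumed to contain each undirected edge exactly once, and only the simplest Horvitz-Thompson estimate (the total edge count) is shown. Each retained edge is stored together with the probability (p or q) with which it was selected, and the estimate sums the inverses of those probabilities.

```python
import random


def graph_sample_and_hold(edge_stream, p, q, rng=None):
    """Single-pass sketch of the sample-and-hold rule from the abstract.

    Assumption: edge_stream yields each edge (u, v) exactly once.
    Returns a dict mapping each retained edge to its selection probability.
    """
    rng = rng or random.Random()
    sampled = {}      # retained edge -> probability it was selected with
    touched = set()   # nodes incident to at least one retained edge
    for u, v in edge_stream:
        # Edges adjacent to a previously sampled edge are held with
        # probability q; all other edges are sampled with probability p.
        prob = q if (u in touched or v in touched) else p
        if rng.random() < prob:
            sampled[(u, v)] = prob
            touched.update((u, v))
    return sampled


def estimate_edge_count(sampled):
    """Horvitz-Thompson style estimate of the number of edges in the stream:
    each retained edge contributes the inverse of its selection probability."""
    return sum(1.0 / prob for prob in sampled.values())


if __name__ == "__main__":
    # Toy stream: one triangle plus one disconnected edge.
    stream = [(1, 2), (2, 3), (3, 1), (4, 5)]
    sample = graph_sample_and_hold(stream, p=0.5, q=0.9, rng=random.Random(0))
    print(sample, estimate_edge_count(sample))
```

Setting q larger than p biases the sample toward edges that extend already-sampled structure, which is what makes connected subgraphs such as triangles more likely to be captured; storing the per-edge probability is what lets the inverse-probability weighting correct for that bias.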

References


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2014 GraphSampleandHoldAFrameworkfor	Jennifer Neville Nesreen K. Ahmed Nick Duffield Ramana Kompella			Graph Sample and Hold: A Framework for Big-graph Analytics				10.1145/2623330.2623757		2014

