2007 CostEffectiveOutbreakDetectioni

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Network Analysis

Notes

Cited By

2009

Quotes

Abstract

Given a water distribution network, where should we place sensors toquickly detect contaminants? Or, which blogs should we read to avoid missing important stories?.

These seemingly different problems share common structure: Outbreak detection can be modeled as selecting nodes (sensor locations, blogs) in a network, in order to detect the spreading of a virus or information asquickly as possible. We present a general methodology for near optimal sensor placement in these and related problems. We demonstrate that many realistic outbreak detection objectives (e.g., detection likelihood, population affected) exhibit the property of " submodularity ". We exploit submodularity to develop an efficient algorithm that scales to large problems, achieving near optimal placements, while being 700 times faster than a simple greedy algorithm. We also derive online bounds on the quality of the placements obtained by any algorithm. Our algorithms and bounds also handle cases where nodes (sensor locations, blogs) have different costs.

We evaluate our approach on several large real-world problems, including a model of a water distribution network from the EPA, andreal blog data. The obtained sensor placements are provably near optimal, providing a constant fraction of the optimal solution. We show that the approach scales, achieving speedups and savings in storage of several orders of magnitude. We also show how the approach leads to deeper insights in both applications, answering multicriteria trade-off, cost-sensitivity and generalization questions.

Abstract

Given a water distribution network, where should we place sensors to quickly detect contaminants? Or, which blogs should we read to avoid missing important stories?.

These seemingly different problems share common structure: Outbreak detection can be modeled as selecting nodes (sensor locations, blogs) in a network, in order to detect the spreading of a virus or information asquickly as possible.

We present a general methodology for near optimal sensor placement in these and related problems. We demonstrate that many realistic outbreak detection objectives (e.g., detection likelihood, population affected) exhibit the property of "submodularity". We exploit submodularity to develop an efficient algorithm that scales to large problems, achieving near optimal placements, while being 700 times faster than a simple greedy algorithm. We also derive online bounds on the quality of the placements obtained by any algorithm. Our algorithms and bounds also handle cases where nodes (sensor locations, blogs) have different costs.

We evaluate our approach on several large real-world problems,including a model of a water distribution network from the EPA, andreal blog data. The obtained sensor placements are provably near optimal, providing a constant fraction of the optimal solution. We show that the approach scales, achieving speedups and savings in storage of several orders of magnitude. We also show how the approach leads to deeper insights in both applications, answering multicriteria trade-off, cost-sensitivity and generalization questions.

7.2 Related work

Optimizing submodular functions. The fundamental result about the greedy algorithm for maximizing submodular functions in the unit-cost case goes back to [17]. The first approximation results about maximizing submodular functions in the non-constant cost case were proved by [25]. They developed an algorithm with approximation guarantee of (1 − 1/e), which however requires a number of function evaluations Ω(B \mid V \mid 4) in the size of the ground set V (if the lowest cost is constant). In contrast, the number of evaluations required by CELF is O(B \mid V \mid ), while still providing a constant factor approximation guarantee.

Virus propagation and outbreak detection
Work on spread of diseases in networks and immunization mostly focuses on determining the value of the epidemic threshold [1], a critical value of the virus transmission probability above which the virus creates an epidemic. Several strategies for immunization have also been proposed: uniform node immunization, targeted immunization of high degree nodes [20] and acquaintance immunization, which focuses on highly connected nodes [5]. In the context of our work, uniform immunization corresponds to randomly placing sensors in the network. Similarly, targeted immunization corresponds to selecting nodes based on their in- or out-degree. As we have seen in Figures 5 and 11, both strategies perform worse than direct optimization of the population affected criterion.
Information cascades and blog networks
Cascades have been studied for many years by sociologists concerned with the diffusion of innovation [23]; more recently, cascades we used for studying viral marketing [8, 14], selecting trendsetters in social networks [21], and explaining trends in blogspace [9, 13]. Studies of blogspace either spend effort mining topics from posts [9] or consider only the properties of blogspace as a graph of unlabeled URLs [13]. Recently, [16] studied the properties and models of information cascades in blogs. While previous work either focused on empirical analyses of information propagation and/or provided models for it, we develop a general methodology for node selection in networks while optimizing a given criterion.
Water distribution network monitoring
A number of approaches have been proposed for optimizing water sensor networks (c.f., [2] for an overview of the literature). Most of these approaches are only applicable to small networks up to approximately 500 nodes. Many approaches are based on heuristics (such as genetic algorithms [18], cross-entropy selection [6], etc.) that cannot provide provable performance guarantees about the solutions. Closest to ours is an approach by [2], who equate the placement problem with a p-median problem, and make use of a large toolset of existing algorithms for this problem. The problem instances solved by [2] are a factor 72 smaller than the instances considered in this paper. In order to obtain bounds for the quality of the generated placements, the approach in [2] needs to solve a complex (NP-hard) mixed-integer program. Our approach is the first algorithm for the water network placement problem, which is guaranteed to provide solutions that achieve at least a constant fraction of the optimal solution within polynomial time. Additionally, it handles orders of magnitude larger problems than previously considered.

References

  • 1. N. Bailey. The Mathematical Theory of Infectious Diseases and Its Applications. Griffin, London, 1975.
  • 2. J. Berry, W. E. Hart, C. E. Phillips, J. G. Uber, and J. Watson. Sensor Placement in Municipal Water Networks with Temporal Integer Programming Models. J. Water Resources Planning and Management, 2006.
  • 3. S. Bikhchandani, D. Hirshleifer, and I. Welch. A Theory of Fads, Fashion, Custom, and Cultural Change As Informational Cascades. J. of Polit. Econ., (5), 1992.
  • 4. Stephen Boyd, Lieven Vandenberghe, Convex Optimization, Cambridge University Press, New York, NY, 2004
  • 5. R. Cohen, S. Havlin, and D. Ben Avraham. Efficient Immunization Strategies for Computer Networks and Populations. Physical Review Letters, 91:247901, 2003.
  • 6. G. Dorini, P. Jonkergouw, and Et. Al. An Efficient Algorithm for Sensor Placement in Water Distribution Systems. In at. Dist. Syst. An. Conf., 2006.
  • 7. Natalie Glance, Matthew Hurst, Kamal Nigam, Matthew Siegler, Robert Stockton, Takashi Tomokiyo, Deriving Marketing Intelligence from Online Discussion, Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, August 21-24, 2005, Chicago, Illinois, USA doi:10.1145/1081870.1081919
  • 8. J. Goldenberg, B. Libai, and E. Muller. Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-mouth. Marketing Letters, 12, 2001.
  • 9. Daniel Gruhl, R. Guha, David Liben-Nowell, Andrew Tomkins, Information Diffusion through Blogspace, Proceedings of the 13th International Conference on World Wide Web, May 17-20, 2004, New York, NY, USA doi:10.1145/988672.988739
  • 10. David Kempe, Jon Kleinberg, Éva Tardos, Maximizing the Spread of Influence through a Social Network, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, 2003, Washington, D.C. doi:10.1145/956750.956769
  • 11. Samir Khuller, Anna Moss, Joseph (Seffi) Naor, The Budgeted Maximum Coverage Problem, Information Processing Letters, v.70 n.1, p.39-45, April 01, 1999 doi:10.1016/S0020-0190(99)00031-9
  • 12. A. Krause, J. Leskovec, C. Guestrin, J. VanBriesen, and C. Faloutsos. Efficient Sensor Placement Optimization for Securing Large Water Distribution Networks. Submitted to the J. of Water Resources Planning An Management, 2007.
  • 13. Ravi Kumar, Jasmine Novak, Prabhakar Raghavan, Andrew Tomkins, On the Bursty Evolution of Blogspace, Proceedings of the 12th International Conference on World Wide Web, May 20-24, 2003, Budapest, Hungary doi:10.1145/775152.775233
  • 14. Jure Leskovec, Lada A. Adamic, Bernardo A. Huberman, The Dynamics of Viral Marketing, Proceedings of the 7th ACM Conference on Electronic Commerce, p.228-237, June 11-15, 2006, Ann Arbor, Michigan, USA doi:10.1145/1134707.1134732
  • 15. J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective Outbreak Detection in Networks. TR, CMU-ML-07-111, 2007.
  • 16. J. Leskovec, M. McGlohon, C. Faloutsos, N. S. Glance, and M. Hurst. Cascading Behavior in Large Blog Graphs. In SDM, 2007.
  • 17. G. Nemhauser, L. Wolsey, and M. Fisher. An Analysis of the Approximations for Maximizing Submodular Set Functions. Mathematical Programming, 14, 1978.
  • 18. A. Ostfeld and E. Salomons. Optimal Layout of Early Warning Detection Stations for Water Distribution Systems Security. J. Water Resources Planning and Management, 130(5):377--385, 2004.
  • 19. A. Ostfeld, J. G. Uber, and E. Salomons. Battle of Water Sensor Networks: A Design Challenge for Engineers and Algorithms. In WDSA, 2006.
  • 20. R. Pastor-Satorras and A. Vespignani. Immunization of Complex Networks. Physical Review E, 65, 2002.
  • 21. Matthew Richardson, Pedro Domingos, Mining Knowledge-sharing Sites for Viral Marketing, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada doi:10.1145/775047.775057
  • 22. T. G. Robertazzi, S. C. Schwartz, An Accelerated Sequential Algorithm for Producing D-optimal Designs, SIAM Journal on Scientific and Statistical Computing, v.10 n.2, p.341-358, March 1989 doi:10.1137/0910022
  • 23. E. Rogers. Diffusion of Innovations. Free Press, 1995.
  • 24. L. A. Rossman. The Epanet Programmer's Toolkit for Analysis of Water Distribution Systems. In Ann. Wat. Res. Plan. Mgmt. Conference, 1999.
  • 25. Maxim Sviridenko, A Note on Maximizing a Submodular Set Function Subject to a Knapsack Constraint, Operations Research Letters, v.32 n.1, p.41-43, January, 2004 doi:10.1016/S0167-6377(03)00062-2;


 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2007 CostEffectiveOutbreakDetectioniNatalie Glance
Christos Faloutsos
Carlos Guestrin
Jure Leskovec
Andreas Krause
Jeanne VanBriesen
Cost-effective Outbreak Detection in Networkshttp://www.cs.cmu.edu/~jure/pubs/detect-kdd07.pdf10.1145/1281192.12812392007