2010 GraphDataManagementAndMining

(Aggarwal & Wang, 2010) ⇒ Charu C. Aggarwal, Haixun Wang. (2010b). “Graph Data Management and Mining: A Survey of Algorithms and Applications.” In: (Aggarwal & Wang, 2010a) doi:10.1007/978-1-4419-6045-0_2

Subject Headings: Graph Management Task, Graph Management Algorithm, Graph Mining Task, Graph Mining Algorithm.

Notes

Quotes

Author Keywords

Graph Mining, Graph Management

Abstract

Graph mining and management has become a popular area of research in recent years because of its numerous applications in a wide variety of practical fields, including computational biology, software bug localization and computer networking. Different applications result in graphs of different sizes and complexities. Correspondingly, the applications have different requirements for the underlying mining algorithms. In this chapter, we will provide a survey of different kinds of graph mining and management algorithms. We will also discuss a number of applications, which are dependent upon graph representations. We will discuss how the different graph mining algorithms can be adapted for different applications. Finally, we will discuss important avenues of future research in the area.

1. Introduction 13

2. Graph Data Management Algorithms 16

2.1 Indexing and Query Processing Techniques 16

2.2 Reachability Queries 19

2.3 Graph Matching 21

2.4 Keyword Search 24

2.5 Synopsis Construction of Massive Graphs 27

3. Graph Mining Algorithms 29

3.1 Pattern Mining in Graphs 29

3.2 Clustering Algorithms for Graph Data 32

3.3 Classification Algorithms for Graph Data 37

Classification is a central task in data mining and machine learning. As graphs are used to represent entities and their relationships in an increasing variety of applications, the topic of graph classification has attracted much attention in both academia and industry. For example, in pharmaceutics and drug design, we are interested to know the relationship between the activity of a chemical compound and the structure of the compound, which is represented by a graph. In social network analysis, we study the relationship between the health of a community (e.g., whether it is expanding or shrinking) and its structure, which again is represented by graphs.
Graph classification is concerned with two different but related learning tasks.
- Label Propagation. A subset of nodes in a graph are labeled. The task is to learn a model from the labeled nodes and use the model to classify the unlabeled nodes.
- Graph classification. A subset of graphs in a graph dataset are labeled. The task is to learn a model from the labeled graphs and use the model to classify the unlabeled graphs.
Label Propagation. The concept of label or belief propagation [174, 209, 210] is a fundamental technique which is used in order to leverage graph structure in the context of classification in a number of relational domains. The scenario of label propagation [44] occurs in many applications. As an example, social network analysis is being used as a mean for targeted marketing.

3.4 The Dynamics of Time-Evolving Graphs 40

4. Graph Applications 43

4.1 Chemical and Biological Applications 43

4.2 Web Applications 45

4.3 Software Bug Localization 51

5. Conclusions and Future Research 55

References

[1] Chemaxon. Screen, Chemaxon Inc., 2005.
[2] Daylight. Daylight Toolkit, Daylight Inc, Mission Viejo, CA, USA, 2008.
[3] Oracle Spatial Topology and Network Data Models 10g Release 1 (10.1) URL: http://www.oracle.com/technology/products/spatial /pdf/10g network model twp.pdf
[4] Semantic Web Challenge. URL: http://challenge.semanticweb.org/
[5] J. Abello, M. G. Resende, S. Sudarsky, Massive quasi-clique detection. Proceedings of the 5th Latin American Symposium on Theoretical Informatics (LATIN) (Cancun, Mexico). 598-612, 2002.
[6] S. Abiteboul, P. Buneman, D. Suciu. Data on the web: from relations to semistructured data and XML. Morgan Kaufmann Publishers, Los Altos, CA 94022, USA, 1999.
[7] C. Aggarwal, Y. Xie, P. Yu. GConnect: A Connectivity Index for Massive Disk-Resident Graphs, VLDB Conference, 2009.
[8] C. Aggarwal, N. Ta, J. Feng, J. Wang, M. J. Zaki. XProj: A Framework for Projected Structural Clustering of XML Documents, KDD Conference, 2007.
[9] C. Aggarwal, P. Yu. Online Analysis of Community Evolution in Data Streams. SIAM Conference on Data Mining, 2005. 56 MANAGING AND MINING GRAPH DATA
[10] Rakesh Agrawal, A. Borgida, H.V. Jagadish. Efficient Maintenance of Tran- sitive Relationships in Large Data and Knowledge Bases, ACM SIGMOD Conference, 1989.
[11] Rakesh Agrawal, R. Srikant. Fast algorithms for mining association rules in large databases, VLDB Conference, 1994.
[12] S. Agrawal, S. Chaudhuri, G. Das. DBXplorer: A system for keyword- based search over relational databases. ICDE Conference, 2002.
[13] R. Ahuja, J. Orlin, T.Magnanti. Network Flows: Theory, Algorithms, and Applications, Prentice Hall, Englewood Cliffs, NJ, 1992.
[14] S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis. On Storing Voluminous RDF Description Bases. In WebDB, 2001.
[15] S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. In SemWeb, 2001.
[16] S. Asur, S. Parthasarathy, and D. Ucar. An event-based framework for characterizing the evolutionary behavior of interaction graphs. ACM KDD Conference, 2007.
[17] Ricardo Baeza-Yates, A Tiberi. Extracting semantic relations from query logs. ACM KDD Conference, 2007.
[18] Z. Bar-Yossef, R. Kumar, D. Sivakumar. Reductions in streaming algo- rithms, with an application to counting triangles in graphs. ACM SODA Conference, 2002.
[19] D. Beckett. The Design and Implementation of the Redland RDF Appli- cation Framework. WWW Conference, 2001.
[20] P. Berkhin. A survey on pagerank computing. Internet Mathematics, 2(1), 2005.
[21] P. Berkhin. Bookmark-coloring approach to personalized pagerank com- puting. Internet Mathematics, 3(1), 2006.
[22] M. Berlingerio, F. Bonchi, B. Bringmann, A. Gionis. Mining Graph- Evolution Rules, PKDD Conference, 2009.
[23] S. Bhagat, G. Cormode, I. Rozenbaum. Applying link-based classifica- tion to label blogs. WebKDD/SNA-KDD, pages 97–117, 2007.
[24] G. Bhalotia, C. Nakhe, A. Hulgeri, Soumen Chakrabarti, S. Sudarshan. Key- word searching and browsing in databases using BANKS. ICDE Conference, 2002.
[25] M. Bilgic, L. Getoor. Effective label acquisition for collective classifica- tion. ACM KDD Conference, pages 43–51, 2008. Graph Data Management and Mining: A Survey of Algorithms and Applications 57
[26] S. Boag, D. Chamberlin, M. F. Fern«andez, D. Florescu, J. Robie, J. Sim«eon. XQuery 1.0: An XML query language. URL: W3C, http://www.w3.org/TR/xquery/, 2007.
[27] I. Bordino, D. Donato, A. Gionis, S. Leonardi. Mining Large Networks with Subgraph Counting. IEEE ICDM Conference, 2008.
[28] C. Borgelt, M. R. Berthold. Mining molecular fragments: Finding Relevant Substructures of Molecules. ICDM Conference, 2002.
[29] S. Brin, L. Page. The Anatomy of a Large Scale Hypertextual Search Engine, WWW Conference, 1998.
[30] H.J. Bohm, G. Schneider. Virtual Screening for Bioactive Molecules. Wiley-VCH, 2000.
[31] B. Bringmann, S. Nijssen. What is frequent in a single graph? PAKDD Conference, 2008.
[32] A. Z. Broder, M. Charikar, A. Frieze, M. Mitzenmacher. Syntactic clustering of the web, WWW Conference, Computer Networks, 29(8– 13):1157–1166, 1997.
[33] J. Broekstra, A. Kampman, F. V. Harmelen. Sesame: A Generic Archi- tecture for Storing and Querying RDF and RDF Schema. In ISWC Conference, 2002.
[34] H. Bunke. On a relation between graph edit distance and maximum com- mon subgraph. Pattern Recognition Letters, 18: pp. 689–694, 1997.
[35] H. Bunke, G. Allermann. Inexact graph matching for structural pattern recognition. Pattern Recognition Letters, 1: pp. 245–253, 1983.
[36] H. Bunke, X. Jiang, A. Kandel. On the minimum common supergraph of two graphs. Computing, 65(1): pp. 13–25, 2000.
[37] H. Bunke, K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(3): pp. 255–259, 1998.
[38] J. J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, K. Wilkinson. Jena: implementing the Semantic Web recommendations. In WWW Conference, 2004.
[39] V. R. de Carvalho, W.W. Cohen. On the collective classification of email "speech acts". ACM SIGIR Conference, pages 345–352, 2005.
[40] D. Chakrabarti, Y. Wang, C. Wang, Jure Leskovec, C. Faloutsos. Epidemic thresholds in real networks. ACM Transactions on Information Systems and Security, 10(4), 2008.
[41] D. Chakrabarti, Y. Zhan, C. Faloutsos R-MAT: A Recursive Model for Graph Mining. SDM Conference, 2004.
[42] Soumen Chakrabarti. Dynamic Personalized Pagerank in Entity-Relation Graphs, WWW Conference, 2007. 58 MANAGING AND MINING GRAPH DATA
[43] R.-Y. Chang, A. Podgurski, J. Yang. Discovering Neglected Conditions in Software by Mining Dependence Graphs. IEEE Transactions on Software Engineering, 34(5):579–596, 2008.
[44] O. Chapelle, A. Zien, B. Sch-olkopf, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.
[45] S. S. Chawathe. Comparing Hierachical data in external memory. Very Large Data Bases Conference, 1999.
[46] C. Chen, C. Lin, M. Fredrikson, M. Christodorescu, X. Yan, Jiawei Han, Mining Graph Patterns Efficiently via Randomized Summaries, VLDB Conference, 2009.
[47] L. Chen, A. Gupta, M. E. Kurul. Stack-based algorithms for pattern matching on dags. VLDB Conference, 2005.
[48] J. Cheng, J. Xu Yu, X. Lin, H.Wang, P. S. Yu. Fast Computing of Reach- ability Labelings for Large Graphs with High Compression Rate, EDBT Conference, 2008.
[49] J. Cheng, J. Xu Yu, X. Lin, H. Wang, P. S. Yu. Fast Computation of Reachability Labelings in Large Graphs, EDBT Conference, 2006.
[50] Y. Chi, X. Song, D. Zhou, K. Hino, B. L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. KDD Conference, 2007.
[51] C. Chung, J. Min, K. Shim. APEX: An adaptive path index for XML data. In SIGMOD Conference, 2002.
[52] J. Clark, S. DeRose. XML Path Language (XPath). URL: W3C, http://www.w3.org/TR/xpath/, 1999.
[53] E. Cohen. Size-estimation Framework with Applications to Transitive Closure and Reachability, Journal of Computer and System Sciences, v.55 n.3, p.441-453, Dec. 1997.
[54] E. Cohen, E. Halperin, H. Kaplan, U. Zwick. Reachability and Distance Queries via 2-hop Labels, ACM Symposium on Discrete Algorithms, 2002.
[55] S. Cohen, J. Mamou, Y. Kanza, Y. Sagiv. XSEarch: A semantic search engine for XML. VLDB Conference, 2003.
[56] M. P. Consens, A. O. Mendelzon. GraphLog: a visual formalism for real life recursion. In PODS Conference, 1990.
[57] D. Conte, P. Foggia, C. Sansone, M. Vento. Thirty Years of Graph Matching in Pattern Recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(3): pp. 265–298, 2004.
[58] D. Cook, L. Holder. Mining Graph Data, John Wiley & Sons Inc, 2007.
[59] B. F. Cooper, N. Sample, M. Franklin, G. Hjaltason, M. Shadmon. A fast index for semistructured data. In VLDB Conference, pages 341–350, 2001. Graph Data Management and Mining: A Survey of Algorithms and Applications 59
[60] L.P. Cordella, P. Foggia, C. Sansone, M. Vento. A (Sub)graph Isomor- phism Algorithm for Matching Large Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(20): pp. 1367–1372, 2004.
[61] G. Cormode, S. Muthukrishnan. Space efficient mining of multigraph streams. ACM PODS Conference, 2005.
[62] K. Crammer Y. Singer. A new family of online algorithms for category ranking. Journal of Machine Learning Research., 3:1025–1058, 2003.
[63] T. Dalamagas, T. Cheng, K. Winkel, T. Sellis. Clustering XML Docu- ments Using Structural Summaries. Information Systems, Elsevier, January 2005.
[64] V. Dallmeier, C. Lindig, A. Zeller. Lightweight Defect Localization for Java. In: Proceedings of the 19th European Conference on Object-Oriented Programming (ECOOP), 2005.
[65] M. Deshpande, M. Kuramochi, N. Wale, G. Karypis. Frequent Substructure-based Approaches for Classifying Chemical Compounds. IEEE Transactions on Knowledge and Data Engineering, 17: pp. 1036– 1050, 2005.
[66] E. W. Dijkstra. A note on two problems in connection with graphs. Numerische Mathematik, 1 (1959), S. 269-271.
[67] F. Eichinger, K. B-ohm, M. Huber. Improved Software Fault Detection with Graph Mining. Workshop on Mining and Learning with Graphs, 2008.
[68] F. Eichinger, K. B-ohm, M. Huber. Mining Edge-Weighted Call Graphs to Localize Software Bugs. PKDD Conference, 2008.
[69] T. Falkowski, J. Bartelheimer, M. Spilopoulou. Mining and Visualizing the Evolution of Subgroups in Social Networks, ACM International Conference on Web Intelligence, 2006.
[70] M. Faloutsos, P. Faloutsos, C. Faloutsos. On Power Law Relationships of the Internet Topology. SIGCOMM Conference, 1999.
[71] W. Fan, K. Zhang, H. Cheng, J. Gao. X. Yan, Jiawei Han, P. S. Yu O. Ver- scheure. Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree. ACM KDD Conference, 2008.
[72] G. Di Fatta, S. Leue, E. Stegantova. Discriminative Pattern Mining in Software Fault Detection. Workshop on Software Quality Assurance, 2006.
[73] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, J. Zhang. Graph Dis- tances in the Data-Stream Model. SIAM Journal on Computing, 38(5): pp. 1709–1727, 2008.
[74] J. Ferlez, C. Faloutsos, Jure Leskovec, D. Mladenić, Marko Grobelnik. Moni- toring Network Evolution using MDL. IEEE ICDE Conference, 2008. 60 MANAGING AND MINING GRAPH DATA
[75] M. Fiedler, C. Borgelt. Support computation for mining frequent sub- graphs in a single graph. Workshop on Mining and Learning with Graphs (MLG’07), 2007.
[76] M.A. Fischler, R.A. Elschlager. The representation and matching of pic- torial structures. IEEE Transactions on Computers, 22(1): pp 67–92, 1973.
[77] P.-O. Fjallstrom. Algorithms for Graph Partitioning: A Survey, Linkoping Electronic Articles in Computer and Information Science, Vol 3, no 10, 1998.
[78] G. Flake, R. Tarjan, M. Tsioutsiouliklis. Graph Clustering and Minimum Cut Trees, Internet Mathematics, 1(4), 385–408, 2003.
[79] D. Fogaras, B. R«acz, K. Csalog«any, T. Sarl«os. Towards scaling fully per- sonalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3), 2005.
[80] M. S. Garey, D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-completeness,W. H. Freeman, 1979.
[81] T. Gartner, P. Flach, S. Wrobel. On graph kernels: Hardness results and efficient alternatives. 16th Annual Conference on Learning Theory, pp. 129–143, 2003.
[82] D. Gibson, R. Kumar, A. Tomkins, Discovering Large Dense Subgraphs in Massive Graphs, VLDB Conference, 2005.
[83] R. Giugno, D. Shasha, GraphGrep: A Fast and Universal Method for Querying Graphs. International Conference in Pattern recognition (ICPR), 2002.
[84] S. Godbole, S. Sarawagi. Discriminative methods for multi-labeled clas- sification. PAKDD Conference, pages 22–30, 2004.
[85] R. Goldman, J. Widom. DataGuides: Enable query formulation and opti- mization in semistructured databases. VLDB Conference, pages 436–445, 1997.
[86] L. Guo, F. Shao, C. Botev, J. Shanmugasundaram. XRANK: ranked key- word search over XML documents. ACMSIGMOD Conference, pages 16– 27, 2003.
[87] M. S. Gupta, A. Pathak, Soumen Chakrabarti. Fast algorithms for top-k personalized pagerank queries. WWW Conference, 2008.
[88] R. H. Guting. GraphDB: Modeling and querying graphs in databases. In VLDB Conference, pages 297–308, 1994.
[89] M. Gyssens, J. Paredaens, D. van Gucht. A graph-oriented object database model. In PODS Conference, pages 417–424, 1990.
[90] Jiawei Han, J. Pei, Y. Yin. Mining Frequent Patterns without Candidate Generation. SIGMOD Conference, 2000. Graph Data Management and Mining: A Survey of Algorithms and Applications 61
[91] S. Harris, N. Gibbins. 3store: Efficient bulk RDF storage. In PSSS Conference, 2003.
[92] S. Harris, N. Shadbolt. SPARQL query processing with conventional re- lational database systems. In SSWS Conference, 2005.
[93] M. Al Hasan, V. Chaoji, S. Salem, J. Besson, M. J. Zaki. ORIGAMI:Mining Representative Orthogonal Graph Patterns. ICDM Conference, 2007.
[94] D. Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California, Santa Cruz, 1999.
[95] T. Haveliwala. Topic-Sensitive Page Rank, World Wide Web Conference, 2002.
[96] H. He, A. K. Singh. Query Language and Access Methods for Graph Databases, appears as a chapter in Managing and Mining Graph Data, ed. Charu Aggarwal, Springer, 2010.
[97] H. He, Querying and mining graph databases. Ph.D. Thesis, UCSB, 2007.
[98] H. He, A. K. Singh. Efficient Algorithms for Mining Significant Sub- structures from Graphs with Quality Guarantees. ICDM Conference, 2007.
[99] H. He, H. Wang, J. Yang, P. S. Yu. BLINKS: Ranked keyword searches on graphs. SIGMOD Conference, 2007.
[100] J. Huan, W. Wang, J. Prins, J. Yang. Spin: Mining Maximal Frequent Subgraphs from Graph Databases. KDD Conference, 2004.
[101] J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, A. Trop- sha. Mining Spatial Motifs from Protein Structure Graphs. Research in Computational Molecular Biology (RECOMB), pp. 308–315, 2004.
[102] V. Hristidis, N. Koudas, Y. Papakonstantinou, D. Srivastava. Keyword proximity search in XML trees. IEEE Transactions on Knowledge and Data Engineering, 18(4):525–539, 2006.
[103] V. Hristidis, Y. Papakonstantinou. Discover: Keyword search in rela- tional databases. VLDB Conference, 2002.
[104] A. Inokuchi, T. Washio, H. Motoda. An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data. PKDD Conference, pages 13–23, 2000.
[105] H. V. Jagadish. A compression technique to materialize transitive clo- sure. ACM Trans. Database Syst., 15(4):558–598, 1990.
[106] H. V. Jagadish, S. Al-Khalifa, A. Chapman, L. V. S. Lakshmanan, A. Nierman, S. Paparizos, J. M. Patel, D. Srivastava, N. Wiwatwattana, Y. Wu, C. Yu. TIMBER: A native XML database. In VLDB Journal, 11(4):274–291, 2002.
[107] H. V. Jagadish, L. V. S. Lakshmanan, D. Srivastava, K. Thompson. TAX: A tree algebra for XML. DBPL Conference, 2001. 62 MANAGING AND MINING GRAPH DATA
[108] G. Jeh, J. Widom. Scaling personalized web search. In WWW, pages 271–279, 2003.
[109] J. L. Jenkins, A. Bender, J. W. Davies. In silico target fishing: Pre- dicting biological targets from chemical structure. Drug Discovery Today, 3(4):413–421, 2006.
[110] R. Jin, C. Wang, D. Polshakov, S. Parthasarathy, G. Agrawal. Discovering Frequent Topological Structures from Graph Datasets. ACM KDD Conference, 2005.
[111] R. Jin, H. Hong, H. Wang, Y. Xiang, N. Ruan. Computing Label- Constraint Reachability in Graph Databases. Under submission, 2009.
[112] R. Jin, Y. Xiang, N. Ruan, D. Fuhry. 3-HOP: A high-compression in- dexing scheme for reachability query. SIGMOD Conference, 2009.
[113] V. Kacholia, S. Pandit, Soumen Chakrabarti, S. Sudarshan, R. Desai, H. Karambelkar. Bidirectional expansion for keyword search on graph databases. VLDB Conference, 2005.
[114] H. Kashima, K. Tsuda, A. Inokuchi. Marginalized Kernels between La- beled Graphs, ICML, 2003.
[115] R. Kaushik, P. Bohannon, J. Naughton, H. Korth. Covering indexes for branching path queries. In SIGMOD Conference, June 2002.
[116] B.W. Kernighan, S. Lin. An efficient heuristic procedure for partitioning graphs, Bell System Tech. Journal, vol. 49, Feb. 1970, pp. 291-307.
[117] M.-S. Kim, Jiawei Han. A Particle-and-Density Based Evolutionary Clustering Method for Dynamic Networks, VLDB Conference, 2009.
[118] J. M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Journal of the ACM, 46(5):pp. 604–632, 1999.
[119] R.I. Kondor, J. Lafferty. Diffusion kernels on graphs and other discrete input spaces. ICML Conference, pp. 315–322, 2002.
[120] M. Koyuturk, A. Grama, W. Szpankowski. An Efficient Algorithm for Detecting Frequent Subgraphs in Biological Networks. Bioinformatics, 20:I200–207, 2004.
[121] T. Kudo, E.Maeda, Y.Matsumoto. An Application of Boosting to Graph Classification, NIPS Conference 2004.
[122] R. Kumar, P Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, E. Upfal. The Web as a Graph. ACM PODS Conference, 2000.
[123] M. Kuramochi, G. Karypis. Frequent subgraph discovery. ICDM Conference, pp. 313–320, Nov. 2001.
[124] M. Kuramochi, G. Karypis. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3): pp. 243–271, 2005. Graph Data Management and Mining: A Survey of Algorithms and Applications 63
[125] J. Larrosa, G. Valiente. Constraint satisfaction algorithms for graph pat- tern matching. Mathematical Structures in Computer Science, 12(4): pp. 403–422, 2002.
[126] M. Lee, W. Hsu, L. Yang, X. Yang. XClust: Clustering XML Schemas for Effective Integration. CIKM Conference, 2002.
[127] Jure Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. S. Glance. Cost-effective outbreak detection in networks. KDD Conference, pp. 420–429, 2007.
[128] Jure Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs, SDM Conference, 2007.
[129] Jure Leskovec, J. Kleinberg, C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. ACM KDD Conference, 2005.
[130] Jure Leskovec, Eric Horvitz. Planetary-Scale Views on a Large Instant- Messaging Network, WWW Conference, 2008.
[131] Jure Leskovec, L. Backstrom, R. Kumar, A. Tomkins. Microscopic Evolu- tion of Social Networks, ACM KDD Conference, 2008.
[132] Q. Li, B. Moon. Indexing and querying XML data for regular path expressions. In VLDB Conference, pages 361–370, September 2001.
[133] W. Lian, D.W. Cheung, N. Mamoulis, S. Yiu. An Efficient and Scalable Algorithm for Clustering XML Documents by Structure, IEEE Transactions on Knowledge and Data Engineering, Vol 16, No. 1, 2004.
[134] L. Lim, H. Wang, M. Wang. Semantic Queries in Databases: Problems and Challenges. CIKM Conference, 2009.
[135] Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, B. L. Tseng. FacetNet: A frame- work for analyzing communities and their evolutions in dynamic networks. WWW Conference, 2008.
[136] C. Liu, X. Yan, H. Yu, Jiawei Han, P. S. Yu. Mining Behavior Graphs for “Backtrace” of Noncrashing Bugs. SDM Conference, 2005.
[137] C. Liu, X. Yan, L. Fei, Jiawei Han, S. P. Midkiff. SOBER: Statistical Model-based Bug Localization. SIGSOFT Software Engineering Notes, 30(5):286–295, 2005.
[138] Q. Lu, L. Getoor. Link-based classification. ICML Conference, pages 496–503, 2003.
[139] F. Manola, E. Miller. RDF Primer. W3C, http://www.w3.org/TR/rdf- primer/, 2004.
[140] A. McGregor. Finding Graph Matchings in Data Streams. APPROXRANDOM, pp. 170–181, 2005. 64 MANAGING AND MINING GRAPH DATA
[141] T. Milo and D. Suciu. Index structures for path expression. In ICDT Conference, pages 277–295, 1999.
[142] S. Navlakha, R. Rastogi, N. Shrivastava. Graph Summarization with Bounded Error. ACMSIGMOD Conference, pp. 419–432, 2008.
[143] M. Neuhaus, H. Bunke. Self-organizing maps for learning the edit costs in graph matching. IEEE Transactions on Systems, Man, and Cybernetics, 35(3) pp. 503–514, 2005.
[144] M. Neuhaus, H. Bunke. Automatic learning of cost functions for graph edit distance. Information Sciences, 177(1), pp 239–247, 2007.
[145] M. Neuhaus, H. Bunke. Bridging the Gap Between Graph Edit Distance and Kernel Machines. World Scientific, 2007.
[146] M. Newman. Finding community structure in networks using the eigen- vectors of matrices. Physical Review E, 2006.
[147] M. E. J. Newman. The spread of epidemic disease on networks, Phys. Rev. E 66, 016128, 2002.
[148] J. Pei, D. Jiang, A. Zhang. OnMining Cross-Graph Quasi-Cliques, ACM KDD Conference, 2005.
[149] Nidhi,M. Glick, J. Davies, J. Jenkins. Prediction of biological targets for compounds using multiple-category bayesian models trained on chemoge- nomics databases. J Chem Inf Model, 46:1124–1133, 2006.
[150] S. Nijssen, J. Kok. A quickstart in frequent structure mining can make a difference. Proceedings of SIGKDD, pages 647–652, 2004.
[151] L. Page, S. Brin, Rajeev Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998.
[152] Z. Pan, J. Heflin. DLDB: Extending relational databases to support Se- mantic Web queries. In PSSS Conference, 2003.
[153] J. Pei, D. Jiang, A. Zhang. Mining Cross-Graph Quasi-Cliques in Gene Expression and Protein Interaction Data, ICDE Conference, 2005.
[154] E. Prud'hommeaux and A. Seaborne. SPARQL query language for RDF. W3C, URL: http://www.w3.org/TR/rdf-sparql-query/, 2007.
[155] L. Qin, J.-X. Yu, L. Chang. Keyword search in databases: The power of RDBMS. SIGMOD Conference, 2009.
[156] S. Raghavan, H. Garcia-Molina. Representing web graphs. ICDE Conference, pages 405-416, 2003.
[157] S. Ranu, A. K. Singh. GraphSig: A scalable approach to mining signifi- cant subgraphs in large graph databases. ICDE Conference, 2009.
[158] M. Rattigan, M.Maier, D. Jensen. Graph Clustering with Network Sructure Indices. ICML, 2007. Graph Data Management and Mining: A Survey of Algorithms and Applications 65
[159] P. R. Raw, B. Moon. PRIX: Indexing and querying XML using pr-ufer sequences. ICDE Conference, 2004.
[160] J. W. Raymond, P. Willett. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. J. Comp. Aided Mol. Des., 16(7):521–533, 2002.
[161] K. Riesen, X. Jiang, H. Bunke. Exact and Inexact Graph Matching: Methodology and Applications, appears as a chapter in Managing and Mining Graph Data, ed. Charu Aggarwal, Springer, 2010.
[162] H. Saigo, S. Nowozin, T. Kadowaki, T. Kudo, and K. Tsuda. GBoost: A mathematical programming approach to graph classification and regres- sion. Machine Learning, 2008.
[163] F. Sams-Dodd. Target-based drug discovery: is something wrong? Drug Discov Today, 10(2):139–147, Jan 2005.
[164] P. Sarkar, A. Moore, A. Prakash. Fast Incremental Proximity Search in Large Graphs, ICML Conference, 2008.
[165] P. Sarkar, A. Moore. Fast Dynamic Re-ranking of Large Graphs, WWW Conference, 2009.
[166] A. D. Sarma, S. Gollapudi, R. Panigrahy. Estimating PageRank in Graph Streams, ACM PODS Conference, 2008.
[167] V. Satuluri, S. Parthasarathy. Scalable Graph Clustering Using Stochas- tic Flows: Applications to Community Discovery, ACM KDD Conference, 2009.
[168] R. Schenkel, A. Theobald, G. Weikum. Hopi: An efficient connection index for complex XML document collections. EDBT Conference, 2004.
[169] J. Shanmugasundaram, K. Tufte, C. Zhang, G. He, D. J. DeWitt, J. F. Naughton. Relational databases for querying XML documents: Limita- tions and opportunities. VLDB Conference, 1999.
[170] N. Stiefl, I. A.Watson, K. Baumann, A. Zaliani. Erg: 2d pharmacophore descriptor for scaffold hopping. J. Chem. Info. Model., 46:208–220, 2006.
[171] J. Sun, S. Papadimitriou, C. Faloutsos, P. Yu. GraphScope: Parameter Free Mining of Large Time-Evolving Graphs, ACM KDD Conference, 2007.
[172] S. J. Swamidass, J. Chen, J. Bruand, P. Phung, L. Ralaivola, P. Baldi. Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics, 21(1):359–368, 2005.
[173] L. Tang, H. Liu, J. Zhang, Z. Nazeri. Community evolution in dynamic multi-mode networks. ACM KDD Conference, 2008.
[174] B. Taskar, P. Abbeel, Daphne Koller. Discriminative probabilistic models for relational data. In UAI, pages 485–492, 2002. 66 MANAGING AND MINING GRAPH DATA
[175] H. Tong, C. Faloutsos, J.-Y. Pan. Fast random walk with restart and its applications. In ICDM, pages 613–622, 2006.
[176] S. TrißI, U. Leser. Fast and practical indexing and querying of very large graphs. SIGMOD Conference, 2007.
[177] A. A. Tsay, W. S. Lovejoy, D. R. Karger. Random Sampling in Cut, Flow, and Network Design Problems, Mathematics of Operations Research, 24(2):383-413, 1999.
[178] K. Tsuda, W. S. Noble. Learning kernels from biological networks by maximizing entropy. Bioinformatics, 20(Suppl. 1):i326–i333, 2004.
[179] K. Tsuda, H. Saigo. Graph Classification, appears as a chapter in Managing and Mining Graph Data, Springer, 2010.
[180] J.R. Ullmann. An Algorithm for Subgraph Isomorphism. Journal of the Association for Computing Machinery, 23(1): pp. 31–42, 1976.
[181] N. Vanetik, E. Gudes, S. E. Shimony. Computing Frequent Graph Pat- terns from Semi-structured Data. IEEE ICDM Conference, 2002.
[182] R. Volz, D. Oberle, S. Staab, and B. Motik. KAON SERVER : A Se- mantic Web Management System. In WWW Conference, 2003.
[183] H.Wang, C. Aggarwal. A Survey of Algorithms for Keyword Search on Graph Data. appears as a chapter in Managing and Mining Graph Data, Springer, 2010.
[184] H. Wang, H. He, J. Yang, J. Xu-Yu, P. Yu. Dual Labeling: Answering Graph Reachability Queries in Constant Time. ICDE Conference, 2006.
[185] H.Wang, S. Park,W. Fan, P. S. Yu. ViST: A Dynamic Index Method for Querying XML Data by Tree Structures. In SIGMOD Conference, 2003.
[186] H. Wang, X. Meng. On the Sequencing of Tree Structures for XML Indexing. In ICDE Conference, 2005.
[187] Y. Wang, D. Chakrabarti, C. Wang, C. Faloutsos. Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint, SRDS, pp. 25-34, 2003.
[188] N. Wale, G. Karypis. Target identification for chemical compounds using target-ligand activity data and ranking based methods. Technical Re- port TR-08-035, University of Minnesota, 2008.
[189] N. Wale, G. Karypis, I. A. Watson. Method for effective virtual screening and scaffold-hopping in chemical compounds. Comput Syst Bioinformatics Conf, 6:403–414, 2007.
[190] N. Wale, X. Ning, G. Karypis. Trends in Chemical Graph Data Mining, appears as a chapter inManaging andMining Graph Data, Springer, 2010.
[191] N.Wale, I. A.Watson, G. Karypis. Indirect similarity based methods for effective scaffold-hopping in chemical compounds. J. Chem. Info. Model., 48(4):730–741, 2008. Graph Data Management and Mining: A Survey of Algorithms and Applications 67
[192] N.Wale, I. A.Watson, G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. Journal of Knowledge and Information Systems, 14:347–375, 2008.
[193] C. Weiss, P. Karras, A. Bernstein. Hexastore: Sextuple Indexing for Se- mantic Web Data Management. In VLDB Conference, 2008.
[194] K.Wilkinson. Jena property table implementation. In SSWS Conference, 2006.
[195] K. Wilkinson, C. Sayers, H. A. Kuno, and D. Reynolds. Efficient RDF storage and retrieval in Jena2. In SWDB Conference, 2003.
[196] Y. Xu, Y. Papakonstantinou. Efficient LCA based keyword search in XML data. EDBT Conference, 2008.
[197] Y. Xu, Y.Papakonstantinou. Efficient keyword search for smallest LCAs in XML databases. ACM SIGMOD Conference, 2005.
[198] X. Yan, Jiawei Han. CloseGraph: Mining Closed Frequent Graph Patterns, ACM KDD Conference, 2003.
[199] X. Yan, H. Cheng, Jiawei Han, P. S. Yu. Mining Significant Graph Patterns by Scalable Leap Search, SIGMOD Conference, 2008.
[200] X. Yan, Jiawei Han. Gspan: Graph-based Substructure Pattern Mining. ICDM Conference, 2002.
[201] X. Yan, P. S. Yu, Jiawei Han. Graph indexing: A frequent structure-based approach. SIGMOD Conference, 2004.
[202] X. Yan, P. S. Yu, Jiawei Han. Substructure similarity search in graph databases. SIGMOD Conference, 2005.
[203] X. Yan, B. He, F. Zhu, Jiawei Han. Top-K Aggregation Queries Over Large Networks, IEEE ICDE Conference, 2010.
[204] J. X. Yu, J. Cheng. Graph Reachability Queries: A Survey, appears as a chapter in Managing and Mining Graph Data, Springer, 2010.
[205] M. J. Zaki, Charu C. Aggarwal. XRules: An Effective Structural Classifier for XML Data, KDD Conference, 2003.
[206] T. Zhang, Alexandrin Popescul, B. Dom. Linear prediction models with graph regularization for web-page categorization. ACM KDD Conference, pages 821–826, 2006.
[207] Q. Zhang, I. Muegge. Scaffold hopping through virtual screening using 2d and 3d similarity descriptors: Ranking, voting and consensus scoring. J. Chem. Info. Model., 49:1536–1548, 2006.
[208] P. Zhao, J. Yu, P. Yu. Graph indexing: tree + delta >= graph. VLDB Conference, 2007.
[209] D. Zhou, J. Huang, B. Sch-olkopf. Learning from labeled and unlabeled data on a directed graph. ICML Conference, pages 1036–1043, 2005. 68 MANAGING AND MINING GRAPH DATA
[210] D. Zhou, O. Bousquet, J.Weston, B. Sch-olkopf. Learning with local and global consistency. Advances in Neural Information Processing Systems (NIPS) 16, pages 321–328. MIT Press, 2004.
[211] X. Zhu, Zoubin Ghahramani, J. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. ICML Conference, pages 912– 919, 2003.

,

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2010 GraphDataManagementAndMining	Charu C. Aggarwal Haixun Wang			Graph Data Management and Mining: A Survey of Algorithms and Applications			http://www.springer.com/cda/content/document/cda downloaddocument/9781441960443-c1.pdf	10.1007/978-1-4419-6045-0_2		2010