2005 LinkMiningASurvey

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Link Mining Algorithm, Graph Mining Algorithm, Survey.

Notes

Cited By

Quotes

Abstract

Many datasets of interest today are best described as a linked collection of interrelated objects. These may represent homogeneous networks, in which there is a single-object type and link type, or richer, heterogeneous networks, in which there may be multiple object and link types (and possibly other semantic information). Examples of homogeneous networks include single mode social networks, such as people connected by friendship links, or the WWW, a collection of linked web pages. Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments and contacts, or in bibliographic domains describing publications, authors, and venues. Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection, collective classification, link prediction and subgraph discovery. While network analysis has been studied in depth in particular areas such as social network analysis, hypertext mining, and web analysis, only recently has there been a cross-fertilization of ideas among these different communities. This is an exciting, rapidly expanding area. In this article, we review some of the common emerging themes.


References

  • 1. J. Adibi, H. Chalupsky, Marko Grobelnik, N. Milic-Frayling, and D. Mladenic. KDD Workshop on Link Analysis and Group Detection. 2004.
  • 2. J. Adibi, H. Chalupsky, E. Melz, and A. Valente. The KOJAK group finder: Connecting the dots via integrated knowledge-based and statistical reasoning. In Innovative Applications of Artificial Intelligence Conference, 2004.
  • 3. J. Adibi, Marko Grobelnik, D. Mladenić, and Patrick Pantel. KDD Workshop on Link Discovery: Issues, Approaches and Applications. 2005.
  • 4. Rakesh Agrawal, Ramakrishnan Srikant, Fast Algorithms for Mining Association Rules in Large Databases, Proceedings of the 20th International Conference on Very Large Data Bases, p.487-499, September 12-15, 1994
  • 5. Edoardo M. Airoldi, Kathleen M. Carley, Sampling algorithms for pure network topologies: a study on the stability and the separability of metric embeddings, ACM SIGKDD Explorations Newsletter, v.7 n.2, p.13-22, December 2005  doi:10.1145/1117454.1117457
  • 6. R. Ananthakrishna, S. Chaudhuri, and Venkatesh Ganti. Eliminating fuzzy duplicates in data warehouses. In: Proceedings of The International Conference on Very Large Databases (VLDB), Hong Kong, China, 2002.
  • 7. A. L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509--512, 1999.
  • 8. Krishna Bharat, Monika R. Henzinger, Improved algorithms for topic distillation in a hyperlinked environment, Proceedings of the 21st ACM SIGIR Conference retrieval, p.104-111, August 24-28, 1998, Melbourne, Australia  doi:10.1145/290941.290972
  • 9. Indrajit Bhattacharya, Lise Getoor, Iterative record linkage for cleaning and integration, Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, June 13, 2004, Paris, France  doi:10.1145/1008694.1008697
  • 10. I. Bhattacharya and Lise Getoor. Entity resolution in graphs. Technical Report 4758, Computer Science Department, University of Maryland, 2005.
  • 11. I. Bhattacharya and Lise Getoor. A Latent dirichlet model for unsupervised entity resolution. In SIAM International Conference on Data Mining, (2006). To Appear.
  • 12. P. Bonacich. Power and centrality: A family of measures. American Journal of Sociology, 92(5):1170--1182, 1987.
  • 13. S. P. Borgatti and M. G. Everett. Models of core / periphery structures. Social Networks, 21:375--395, 1999.
  • 14. P. J. Carrington, J. Scott, and Stanley Wasserman. Models and Methods in Social Network Analysis. Cambridge University Press, Cambridge, 2005.
  • 15. Deepayan Chakrabarti, Christos Faloutsos, Tools for large graph mining, Carnegie Mellon University, Pittsburgh, PA, 2005
  • 16. Soumen Chakrabarti. (2002). “Mining the Web.” Morgan Kaufman.
  • 17. Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David Gibson, and Jon Kleinberg. (1998). “Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text.” In: Proceedings of the seventh International Conference on World Wide Web 7.
  • 18. Soumen Chakrabarti, Byron Dom, Piotr Indyk, Enhanced hypertext categorization using hyperlinks, Proceedings of the 1998 ACM SIGMOD Conference, p.307-318, June 01-04, 1998, Seattle, Washington, United States
  • 19. R. Chellappa and A. Jain. Markov random fields: theory and applications. Academic Press, Boston, 1993.
  • 20. David Cohn, Huan Chang, Learning to Probabilistically Identify Authoritative Documents, Proceedings of the Seventeenth International Conference on Machine Learning, p.167-174, June 29-July 02, 2000
  • 21. D. Cohn and T. Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In Neural Information Processing Systems 13, 2001.
  • 22. D. J. Cook and L. B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1:231--255, 1994.
  • 23. Diane J. Cook, Lawrence B. Holder, Graph-based Data Mining, IEEE Intelligent Systems, v.15 n.2, p.32-41, March 2000  doi:10.1109/5254.850825
  • 24. Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew McCallum, Tom M. Mitchell, Kamal Nigam, Seán Slattery, Learning to construct knowledge bases from the World Wide Web, Artificial Intelligence, v.118 n.1-2, p.69-113, April 2000  doi:10.1016/S0004-3702(00)00004-7
  • 25. Aron Culotta, Andrew McCallum, Joint deduplication of multiple record types in relational data, Proceedings of the 14th ACM International Conference on Information and knowledge management, October 31-November 05, 2005, Bremen, Germany  doi:10.1145/1099554.1099615
  • 26. G. V. Cybenko and J. Srivastava. SIAM Workshop on Link Analysis, Counterterrorism and Privacy. 2004.
  • 27. L. Dehaspe, H. Toivonen, and R. King. Finding frequent substructures in chemical compounds. In: Proceedings of The International Conference on Knowledge Discovery and Data Mining, pages 30--36, 1998.
  • 28. Thomas G. Dietterich, Lise Getoor, and K. Murphy. ICML Workshop on Statistical Relational Learning and its Connections to Other Fields. 2004.
  • 29. Chris Ding, Xiaofeng He, Parry Husbands, Hongyuan Zha, Horst D. Simon, PageRank, HITS and a unified framework for link analysis, Proceedings of the 25th ACM SIGIR Conference retrieval, August 11-15, 2002, Tampere, Finland  doi:10.1145/564376.564440
  • 30. C. H. Q. Ding. Spectral clustering, (2004). http://crd.lbl.gov/cding/Spectral/.
  • 31. AnHai Doan, Pedro Domingos, Alon Halevy, Learning to Match the Schemas of Data Sources: A Multistrategy Approach, Machine Learning, v.50 n.3, p.279-301, March 2003  doi:10.1023/A:1021765902788
  • 32. AnHai Doan, Jayant Madhavan, Pedro Domingos, Alon Halevy, Learning to map between ontologies on the semantic web, Proceedings of the 11th International Conference on World Wide Web, May 07-11, 2002, Honolulu, Hawaii, USA  doi:10.1145/511446.511532
  • 33. Pedro Domingos and Matthew Richardson. Markov logic: A unifying framework for statistical relational learning. In ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields, pages 49--54, 2004.
  • 34. Xin Dong, Alon Halevy, Jayant Madhavan, Reference reconciliation in complex information spaces, Proceedings of the 2005 ACM SIGMOD Conference, June 14-16, 2005, Baltimore, Maryland  doi:10.1145/1066157.1066168
  • 35. S. Donoho, T. Dybala, Marko Grobelnik, N. Milic-Frayling, and D. Mladenic. KDD Workshop on Link Analysis for Detecting Complex Behavior. 2003.
  • 36. S. Dzeroski and H. Blockeel. KDD Workshop on Multi-Relational Data Mining. 2004.
  • 37. S. Dzeroski and H. Blockeel. KDD Workshop on Multi-Relational Data Mining. 2005.
  • 38. Saso (Ed.) Dzeroski, Nada Lavrac, Saso Dzeroski, Relational Data Mining, Springer-Verlag New York, Inc., Secaucus, NJ, 2001
  • 39. S. Dzeroski, L. D. Raedt, and S. Wrobel. KDD Workshop on Multi-Relational Data Mining. 2003.
  • 40. Ronen Feldman. Link analysis: Current state of the art, 2002.
  • 41. O. Frank and K. Nowicki. Exploratory statistical analysis of networks. Annals of Discrete Mathematics, 55:349--366, 1993.
  • 42. T. Frantz and K. M. Carley. A formal characterization of cellular networks. Technical Report CMU-ISRI-05-109, Carnegie Mellon University, 2005.
  • 43. L. Freeman. Centrality in social networks: Conceptual clarifications. Social Networks, 1:215--239, 1979.
  • 44. T. Gärtner. Exponential and geometric kernels for graphs. In NIPS Workshop on Unreal Data: Principles of Modeling Nonvectorial Data, 2002.
  • 45. Thomas Gärtner, A survey of kernels for structured data, ACM SIGKDD Explorations Newsletter, v.5 n.1, July 2003  doi:10.1145/959242.959248
  • 46. Lise Getoor, Link mining: a new data mining challenge, ACM SIGKDD Explorations Newsletter, v.5 n.1, July 2003  doi:10.1145/959242.959253
  • 47. Lisa Getoor, Nir Friedman, Daphne Koller, Benjamin Taskar, Learning probabilistic models of link structure, The Journal of Machine Learning Research, 3, 3/1/2003
  • 48. Lise Getoor and D. Jensen. AAAI Workshop on Learning Statistical Models from Relational Data. AAAI Press, 2000.
  • 49. Lise Getoor and D. Jensen. IJCAI Workshop on Learning Statistical Models from Relational Data. 2003.
  • 50. David Gibson, Jon Kleinberg, Prabhakar Raghavan, Inferring Web communities from link topology, Proceedings of the ninth ACM conference on Hypertext and hypermedia : links, objects, time and space---structure in hypermedia systems: links, objects, time and space---structure in hypermedia systems, p.225-234, June 20-24, 1998, Pittsburgh, Pennsylvania, United States  doi:10.1145/276627.276652
  • 51. Taher H. Haveliwala, Topic-sensitive PageRank, Proceedings of the 11th International Conference on World Wide Web, May 07-11, 2002, Honolulu, Hawaii, USA  doi:10.1145/511446.511513
  • 52. M. Huisman and T. A. B. Snijders. Statistical analysis of longitudinal network data with changing composition. Sociological Methods and Research, 32:253--287, 2003.
  • 53. R. Hummel and S. Zucker. On the foundations of relaxation labeling processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 267--287, 1983.
  • 54. Akihiro Inokuchi, Takashi Washio, Hiroshi Motoda, An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data, Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, p.13-23, September 13-16, 2000
  • 55. Glen Jeh, Jennifer Widom, SimRank: a measure of structural-context similarity, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada  doi:10.1145/775047.775126
  • 56. Glen Jeh, Jennifer Widom, Scaling personalized web search, Proceedings of the 12th International Conference on World Wide Web, May 20-24, 2003, Budapest, Hungary  doi:10.1145/775152.775191
  • 57. D. Jensen. Statistical challenges to inductive inference in linked data. In Seventh International Workshop on Artificial Intelligence and Statistics, 1999.
  • 58. D. Jensen and H. Goldberg. AAAI Fall Symposium on AI and Link Analysis. AAAI Press, 1998.
  • 59. D. V. Kalashnikov, S. Mehrotra, and Z. Chen. Exploiting relationships for domain-independent data cleaning. In SIAM International Conference on Data Mining, April 21--23 2005.
  • 60. H. Kashima and A. Inokuchi. Kernels for graph classification. In ICDM Workshop on Active Mining, 2002.
  • 61. C. Kemp, T. L. Griffiths, and Joshua B. Tenenbaum. Discovering latent classes in relational data. Technical Report AI Memo 2004-019, Massachusetts Institute of Technology, September 2004.
  • 62. Nikhil S. Ketkar, Lawrence B. Holder, Diane J. Cook, Comparison of graph-based and logic-based multi-relational data mining, ACM SIGKDD Explorations Newsletter, v.7 n.2, p.64-71, December 2005  doi:10.1145/1117454.1117463
  • 63. R. D. King, S. H. Muggleton, A. Srinivasan, and M. J. E. Sternberg. Structure-activity relationships derived by machine learning: The use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. National Academy of Sciences, 93(1):438--442, January 1996.
  • 64. Jon Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM (JACM), v.46 n.5, p.604-632, Sept. 1999  doi:10.1145/324133.324140
  • 65. A. Knobbe and D. van der Wallen. ECML/PKDD Workshop on Multi-Relational Data Mining. 2001.
  • 66. J. N. Kok and T. Washio. ECML/PKDD Workshop on Mining Graphs, Trees and Sequences. 2004.
  • 67. J. Kubica, A. Moore, D. Cohn, and J. Schneider. eGraph: A fast graph-based method for link analysis and queries. In IJCAI 2003 Text-Mining and Link-Analysis Workshop, August 2003.
  • 68. Jeremy Kubica, Andrew Moore, Jeff Schneider, Tractable Group Detection on Large Link Data Sets, Proceedings of the Third IEEE International Conference on Data Mining, p.573, November 19-22, 2003
  • 69. Jeremy Kubica, Andrew Moore, Jeff Schneider, Yiming Yang, Stochastic link and group detection, Eighteenth national conference on Artificial intelligence, p.798-804, July 28-August 01, 2002, Edmonton, Alberta, Canada
  • 70. Michihiro Kuramochi, George Karypis, Frequent Subgraph Discovery, Proceedings of the 2001 IEEE International Conference on Data Mining, p.313-320, November 29-December 02, 2001
  • 71. John D. Lafferty, Andrew McCallum, Fernando C. N. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, Proceedings of the Eighteenth International Conference on Machine Learning, p.282-289, June 28-July 01, 2001
  • 72. Nada Lavrac, Saso Dzeroski, Inductive Logic Programming: Techniques and Applications, Routledge, New York, NY, 1993
  • 73. R. Lempel, S. Moran, The stochastic approach for link-structure analysis (SALSA) and the TKC effect, Computer Networks: The International Journal of Computer and Telecommunications Networking, v.33 n.1-6, p.387-401, June 2000
  • 74. Xin Li, Paul Morie, Dan Roth, Semantic integration in text: from ambiguous names to identifiable entities, AI Magazine, v.26 n.1, p.45-58, March 2005
  • 75. David Liben-Nowell, Jon Kleinberg, The link prediction problem for social networks, Proceedings of the twelfth International Conference on Information and knowledge management, November 03-08, 2003, New Orleans, LA, USA  doi:10.1145/956863.956972
  • 76. Q. Lu and Lise Getoor. Link-based classification. In: Proceedings of The International Conference on Machine Learning, 2003.
  • 77. Alexander Maedche, Steffen Staab, Ontology Learning for the Semantic Web, IEEE Intelligent Systems, v.16 n.2, p.72-79, March 2001  doi:10.1109/5254.920602
  • 78. Takashi Matsuda, Tadashi Horiuchi, Hiroshi Motoda, Takashi Washio, Extension of Graph-based Induction for General Graph Structured Data, Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications, p.420-431, April 18-20, 2000
  • 79. S. Muggleton, editor. Inductive Logic Programming. Academic Press, 1992.
  • 80. (Neville & Jensen, 2000) ⇒ Jennifer Neville, and David Jensen. (2000). “Iterative Classification in Relational Data.” In: Proceedings of AAAI-2000 Workshop on Statistical Relational Learning.
  • 81. Jennifer Neville, David Jensen, Leveraging Relational Autocorrelation with Latent Group Models, Proceedings of the Fifth IEEE International Conference on Data Mining, p.322-329, November 27-30, 2005  doi:10.1109/ICDM.2005.89
  • 82. M. E. J. Newman. Detecting community structure in networks. European Physical Journal B, 38:321--330, 2004.
  • 83. A. Y. Ng, A. X. Zheng, and Michael I. Jordan. Link analysis, eigenvectors and stability. In International Joint Conference on Artificial Intelligence (IJCAI), pages 903--910, 2001.
  • 84. Andrew Y. Ng , Alice X. Zheng, Michael I. Jordan, Stable algorithms for link analysis, Proceedings of the 24th ACM SIGIR Conference retrieval, p.258-266, September 2001, New Orleans, Louisiana, United States  doi:10.1145/383952.384003
  • 85. S. Nijssen, T. Meinl, and G. Karypis. ECML/PKDD Workshop on Mining Graphs, Trees and Sequences. 2005.
  • 86. K. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077--1087, 2001.
  • 87. Hyo-Jung Oh, Sung Hyon Myaeng, Mann-Ho Lee, A practical hypertext catergorization method using links and incrementally available class information, Proceedings of the 23rd ACM SIGIR Conference retrieval, p.264-271, July 24-28, 2000, Athens, Greece  doi:10.1145/345508.345594
  • 88. Joshua O'Madadhain, Jon Hutchins, Padhraic Smyth, Prediction and ranking algorithms for event-based network data, ACM SIGKDD Explorations Newsletter, v.7 n.2, p.23-30, December 2005  doi:10.1145/1117454.1117458
  • 89. J. O'Madadhain and Padhraic Smyth. EventRank: A framework for ranking time-varying networks. In KDD Workshop on Link Discovery (LinkKDD): Issues, Approaches and Applications, 2005.
  • 90. J. O'Madadhain, Padhraic Smyth, and L. Adamic. Learning predictive models for link formation. Presented at the International Sunbelt Social Network Conference, February, 2005.
  • 91. L. Page, S. Brin, Rajeev Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University, 1998.
  • 92. H. Pasula, B. Marthi, B. Milch, S. Russell, and I. Shpitser. Identity uncertainty and citation matching. In Advances in Neural Information Processing Systems 15. MIT Press, 2003.
  • 93. Alexandrin Popescul and L. H. Ungar. Statistical relational learning for link prediction. In IJCAI Workshop on Learning Statistical Models from Relational Data, 2003.
  • 94. L. D. Raedt and T. Washio. ECML/PKDD Workshop on Mining Graphs, Trees and Sequences. 2003.
  • 95. Davood Rafiei, Alberto O. Mendelzon, What is this page known for? Computing Web page reputations, Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking, p.823-835, June 2000, Amsterdam, The Netherlands
  • 96. Cartic Ramakrishnan, William H. Milnor, Matthew Perry, Amit P. Sheth, Discovering informative connection subgraphs in multi-relational graphs, ACM SIGKDD Explorations Newsletter, v.7 n.2, p.56-63, December 2005  doi:10.1145/1117454.1117462
  • 97. Matthew J. Rattigan, David Jensen, The case for anomalous link discovery, ACM SIGKDD Explorations Newsletter, v.7 n.2, p.41-47, December 2005  doi:10.1145/1117454.1117460
  • 98. Matthew Richardson and Pedro Domingos. The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank. In Advances in Neural Information Processing Systems 14. MIT Press, 2002.
  • 99. A. Rosenfeld, R. Hummel, and S. Zucker. Scene labeling by relaxation operations. IEEE Transactions on Systems, Man and Cybernetics, 6(6), 1976.
  • 100. Ted E. Senator, Link mining applications: progress and challenges, ACM SIGKDD Explorations Newsletter, v.7 n.2, p.76-83, December 2005  doi:10.1145/1117454.1117465
  • 101. Lisa Singh, Lise Getoor, Louis Licamele, Pruning Social Networks Using Structural Properties and Descriptive Attributes, Proceedings of the Fifth IEEE International Conference on Data Mining, p.773-776, November 27-30, 2005  doi:10.1109/ICDM.2005.125
  • 102. P. Singla and Pedro Domingos. Multi-relational record linkage. In KDD Workshop on Multi-Relational Data Mining, Seattle, WA, August 2004.
  • 103. D. Skillicorn and K. Carley. SIAM Workshop on Link Analysis, Counterterrorism and Security. 2005.
  • 104. A. Srivastava, D. Barbara, H. Kargupta, and Vipin Kumar. SIAM Workshop on Data Mining for Counterterrorism and Security. 2003.
  • 105. Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, Christos Faloutsos, Relevance search and anomaly detection in bipartite graphs, ACM SIGKDD Explorations Newsletter, v.7 n.2, p.48-55, December 2005  doi:10.1145/1117454.1117461
  • 106. Latanya Sweeney, Privacy-enhanced linking, ACM SIGKDD Explorations Newsletter, v.7 n.2, p.72-75, December 2005  doi:10.1145/1117454.1117464
  • 107. Ben Taskar, P. Abbeel, and Daphne Koller. Discriminative probabilistic models for relational data. In: Proceedings of UAI, pages 485--492, Edmonton, Canada, 2002.
  • 108. Ben Taskar, M.-F. Wong, P. Abbeel, and Daphne Koller. Link prediction in relational data. In Neural Information Processing Systems Conference, Vancouver, Canada, December 2003.
  • 109. J. R. Tyler, D. M. Wilkinson, and B. A. Huberman. Email as Spectroscopy: Automated Discovery of Community Structure within Organizations. Kluwer, B. V., Deventer, The Netherlands, The Netherlands, 2003.
  • 110. X. Wang, N. Mohanty, and Andrew McCallum. Group and topic discovery from relations and text. In KDD Workshop on Link Discovery, August 2005.
  • 111. Stanley Wasserman. Conformity of two sociometric relations. Psychometrika, 52:3--18, 1987.
  • 112. Stanley Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, 1994.
  • 113. D. J. Watts and S. H. Strogatz. Collective dynamics of "small-world" networks. Nature, 393:440--442, 1998.
  • 114. Scott White, Padhraic Smyth, Algorithms for estimating relative importance in networks, Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, 2003, Washington, D.C.  doi:10.1145/956750.956782
  • 115. A. P. Wolfe and D. Jensen. Playing multiple roles: Discovering overlapping roles in social networks. In ICML-04 Workshop on Statistical Relational Learning and its Connections to Other Fields, 2004.
  • 116. Xifeng Yan, Jiawei Han, gSpan: Graph-based Substructure Pattern Mining, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), p.721, December 09-12, 2002
  • 117. K. Yoshida, H. Motoda, and N. Indurkhya. Graph-based induction as a unified learning framework. Journal of Applied Intelligence, 4(3):297--316, July 1994.

,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2005 LinkMiningASurveyChristopher P. Diehl
Lise Getoor
Link Mining: A surveySIGKDD Explorations Newsletterhttp://www.kdd.org/explorations/issues/7-2-2005-12/1-Getoor.pdf10.1145/1117454.11174562005