2006 LinkPredictionUsignSupervisedLearning

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Supervised Link Prediction Algorithm, Supervised Multiclass Classification Algorithm.

Notes

Cited By

Quotes

Abstract

Social network analysis has attracted much attention in recent years. Link prediction is a key research direction within this area. In this paper, we study link prediction as a supervised learning task. Along the way, we identify a set of features that are key to the performance under the supervised learning setup. The identified features are very easy to compute, and at the same time surprisingly effective in solving the link prediction problem. We also explain the effectiveness of the features from their class density distribution. Then we compare different classes of supervised learning algorithms in terms of their prediction performance using various performance metrics, such as accuracy, precision-recall, F-values, squared error etc. with a 5-fold cross validation. Our results on two practical social network datasets shows that most of the well-known classification algorithms (decision tree, k-NN, multilayer perceptron, SVM, RBF network) can predict links with comparable performances, but SVM outperforms all of them with narrow margin in all performance measures. Again, ranking of features with popular feature ranking algorithms shows that a small subset of features always plays a significant role in link prediction.

=== 1 Introduction and Background

Social networks are a popular way to model the interaction among the people in a group or community. They can be visualized as graphs, where a vertex corresponds to a person in some group and an edge represents some form of association between the corresponding persons. The associations are usually driven by mutual interests that are intrinsic to a group. However, social networks are very dynamic objects, since new edges and vertices are added to the graph over the time. Understanding the dynamics that drives the evolution of social network is a complex problem due to a large number of variable parameters. But, a comparatively easier problem is to understand the association between two specific nodes. Several variations of the above problems make interesting research topics. For instance, some of the interesting questions that can be posed are { how does the association pattern change over time, what are the factors that drive the associations, how is the association between two nodes affected by other nodes. The specific problem instance that we address in this research is to predict the likelihood of a future association between two nodes, knowing that there is no association between the nodes in the current state of the graph. This problem is commonly known as the Link Prediction problem.

2 Related Work

3 Data and Experimental Setup

Consider a social network [math]\displaystyle{ G=\lt V,E\gt }[/math] in which each edge [math]\displaystyle{ e = }[/math]

4 Results and Discussions

Table 2 and 3 show the performance comparison for different classifi ers on the BIOBASE and DBLP datasets respectively. In both the datasets, counts of positive class and the negative class were almost the same. So, a baseline classifier would have an accuracy around 50% by classifying all the testing data points to be equal to 1 or 0, whereas all the models that we tried reached an accuracy above 80%.

5 Issues regarding Real-life Dataset

6 Future Work

Our research currently considers link prediction only in the co-authorship domain. In future, we would like to consider a number of datasets from different domains to better understand the link prediction problem. We would also like to define a degree of confidence for link prediction instead of providing a hard binary classification. Moreover, our current attribute set does not have any attributes that capture causal relationships. It is possible that some of the attribute values that we consider are time dependent, i.e. their values should be evaluated by using different weights for different years. In future, we like to consider these kind of attributes. There are online social networks, such as LinkedIn (http://linkedin.com) and Friendster (http://www.friendster.com), where they will be very useful. These online networks would like to predict which users would share common interests. These interests are likely to change over time which would affect the likelihood of a link between two users. This is similar to keeping track of dynamic user groups.

7 Conclusion

Link prediction in a social network is an important problem and it is very helpful in analyzing and understanding social groups. Such understanding can lead to efficient implementation of tools to identify hidden groups or to find missing members of groups, etc. which are the most common problems in security and criminal investigation research. In this research we suggest categories of features that should be considered for link prediction in any social network application. Of course, the exact value of a feature would depend on the application at hand. For example, in a terrorist network, two terrorists could have strong proximity either if they have the same skills or if they have complementary skills. Through this work, we have shown that the link prediction problem can be handled effectively by modeling it as a classification problem. We have shown that most of the popular classification models can solve the problem with an acceptable accuracy, but the state of the art classification algorithm, SVM, beats all of them in many performance metrics. Moreover, we provided a comparison of the features and ranked them according to their prediction ability using different feature analysis algorithms. We believe that these ranks are meaningful and can help other researchers to choose attributes for link prediction problem in a similar domain.

References

  • [1] Lada A. Adamic and E. Adar, Friend and Neighbors on the Web, Social Networks, 25(3), 2003, pp. 211-300.
  • [2] A. L. Barabasi, H. Jeong, Z. Neda, A Schubert and T Vicsek, Evolution of the Social Network for Scientific Collaboration, PHYSICA A, 311 (2002), pp. 3.
  • [3] J. Baumes and M. M. Ismail, Finding Communities by Clustering a Graph into Overlapping Subgraph, Intl. Conference on Applied Computing,(2005).
  • [4] I. Bhattacharya, and L. Getoor, Deduplication and Group Detection using Links., LinkKDD, 2004.
  • [5] D. Cai, Z. Shao, X. He, X. Yan and J. Han, Mining Hidden Communities in Heterogeneous Social Networks, LinkKDD 2005.
  • [6] R. Caruana and A. Niculescu-Mizil, Data Mining in Metric Space: An Empirical Analysis of Supervised Learning Performance Criteria, KDD 2004.
  • [7] R. Caruana, A. Niculescu-Mizil, G. Crew and A. Ksikes, Ensemble Selection from Libraries of Models, Intl. Conference of Machine Learning, 2004.
  • [8] R. Cross, A. Parket, L. Prusak and S. Borgatti, Knowing What We Know: Supporting Knowledge Creation and Sharing in Social Networks, Organizational Dynamics, 30 (2001), pp. 100-120.
  • [9] S. N. Dorogovtsev, J. F. Mendes, Evolution of Networks, Advan. Physics., 51 (2002), pp. 1079.
  • [10] C. Faloutsos, K. McCurley and A. Tomkins, Fast Discovery of Connection Subgraphs, Intl. Conference Knowledge. Discovery and Data Mining, 2004.
  • [11] A. Goldenberg, and A. W. Moore, Bayes Net Graphs to Understand Co-authorship Networks?, LinkKDD 2005.
  • [12] G. Gu and E. Chang, Aligning Boundary in Kernel Space for Learning Imbalanced Dataset, ICDM, 2004.
  • [13] Q. Hu, X. Zhang and D .Saha, Modeling Virus and Anti-Virus Dynamics in Topology-Aware Networks, GLOBECOM, 2004, IEEE.
  • [14] Z. Huang, X. Li and H. Chen, Link Prediction Approach to Collaborative Filtering, Join Conference on Digital Libraries, Denver, CO, 2005
  • [15] Z. Huang and D. Zeng, Why Does Collaborative Filtering Work? - Recommendation Model Validation and Selection by Analyzing Bipartite Random Graphs, * Workshop of Information Technologies and Systems, Las Vegas, NV, 2005
  • [16] J. M. Kleinberg, Navigation in a Small World, Nature, 406 (2000), pp 845.
  • [17] D. Liben-Nowell and J. Kleinberg, The Link Prediction Problem for Social Networks, LinkKDD, 2004.
  • [18] B. Malin, Unsupervised Name Disambiguation via Social Network Similarity. Workshop on Link Analysis, Counter-terrorism, and Security, 2005.
  • [19] S. Milgram, The Small World Problem, Psychology Today, 2 (1967), pp 60-67.
  • [20] M. J. Newman, The Structure of Scientific Collaboration Networks, Proceedings of National Acad. of Sciences, 98 (2001), pp 404-409.
  • [21] M. J. Newman, Fast Algorithm for Detecting Community Structure in Networks, LinkKDD 2004.
  • [22] M. Ohira, N. Ohsugi, T. Ohoka and K. Matsumoto, Accelerating Cross-Project Knowledge Collaboration using Collaborative Filtering and Social Networks, Intl. Conference of Software Engineering, 2005.
  • [23] J. Myers, K. Laskey, and K. DeJong, Learning Bayesian Networks from Incomplete Data using Evolutionary Algorithms, Fifteen Conference on Uncertainty in Artificial Intelligence, Toronto, Canada, 1999
  • [24] I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, 2005.
  • [25] Z. Zheng, X. Wu, R. Srihari, Feature Selection for Text Categorization on Imbalanced Data, SIGKDD Explorations, 2002, pp. 80-89.

,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2006 LinkPredictionUsignSupervisedLearningMohammed J. Zaki
Mohammad Al Hasan
Vineet Chaoji
Saeed Salem
Link Prediction Using Supervised LearningProceedings of the SDM 2006 Workshop on Link Analysishttp://www.cs.rpiscrews.us/~zaki/PS/LINK06.pdf2006