2005 NetKitSRL


Subject Headings:

Notes

Cited By

~7 http://scholar.google.com/scholar?cites=6821997904229989497

Quotes

Abstract

  • This paper describes NetKit-SRL, or NetKit for short, a toolkit for learning from and classifying networked data. The toolkit is open-source and publicly available. It is modular and built for ease of plug-and-play — such that it is easy to add new modules and have them interact with other existing modules. Currently available NetKit modules are focused on “batch” within-network learning and classification: given a partially labeled network, where all nodes and edges are already known to exist, estimate the class membership probability of the unlabeled nodes in the network. NetKit has been used in various network domains such as websites, citation graphs, movies and social networks.
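
The within-network task described in the abstract can be illustrated with a minimal sketch. This is not NetKit's code or API: the function name `estimate_class_probabilities`, its parameters, and the update rule (each unlabeled node repeatedly takes the average of its neighbors' current class distributions) are assumptions chosen only to show what "estimate the class membership probability of the unlabeled nodes" can look like on a partially labeled graph.

```python
from collections import defaultdict


def estimate_class_probabilities(edges, known_labels, classes, iterations=50):
    """Hypothetical helper: estimate class probabilities for unlabeled nodes.

    edges        -- iterable of (u, v) pairs, treated as undirected
    known_labels -- dict mapping some nodes to their known class
    classes      -- list of all class names
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    uniform = {c: 1.0 / len(classes) for c in classes}
    probs = {}
    for node in neighbors:
        if node in known_labels:
            # Known nodes are clamped to their observed label.
            probs[node] = {c: 1.0 if c == known_labels[node] else 0.0
                           for c in classes}
        else:
            # Unlabeled nodes start from a uniform estimate (an assumption).
            probs[node] = dict(uniform)

    for _ in range(iterations):
        updated = {}
        for node in neighbors:
            if node in known_labels:
                updated[node] = probs[node]
                continue
            # Average the neighbors' current class distributions.
            totals = {c: sum(probs[n][c] for n in neighbors[node])
                      for c in classes}
            norm = sum(totals.values())
            updated[node] = ({c: totals[c] / norm for c in classes}
                             if norm > 0 else dict(uniform))
        probs = updated
    return probs


if __name__ == "__main__":
    # A four-node chain a - b - c - d with only the endpoints labeled.
    chain = [("a", "b"), ("b", "c"), ("c", "d")]
    result = estimate_class_probabilities(chain, {"a": "x", "d": "y"}, ["x", "y"])
    for node in sorted(result):
        print(node, result[node])
```

On the chain in the usage example, the unlabeled node next to the "x" endpoint ends up leaning toward "x" and the one next to the "y" endpoint toward "y", which is the kind of probability estimate the batch setting asks for.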

Introduction

2.2 Prior Work

  • For machine learning research on networked data, the watershed paper of Chakrabarti et al. (1998) studied classifying webpages based on the text and (possibly inferred) class labels of neighboring pages, using relaxation labeling paired with naive Bayes local and relational classifiers. In their experiments, using the link structure substantially improved classification over using the local (text) information alone. Further, considering the text of the neighbors generally hurt performance (based on the methods they used), whereas using only the (inferred) class labels improved performance. More recently, Lu & Getoor (2003) investigated network classification applied to linked documents (webpages and published manuscripts with an accompanying citation graph). They used the text of the document as well as a relational classifier.
  • Univariate within-network classification has been considered previously (Bernstein et al., 2002; Macskassy & Provost, 2003). Using business news, Bernstein et al. (2003) linked companies if they co-occurred in a news story. They demonstrated the effectiveness of various vector-space techniques for network classification of companies into industry sectors. Other domains such as webpages, movies and citation graphs have also been considered for univariate within-network classification; Macskassy & Provost (2003) investigated how well the univariate classification performed as varying amounts of data initially were labeled.
  • Markov Random Fields (MRFs) have been used extensively for univariate network classification for vision and image restoration. Nodes in the network are pixels in an image and the labels are image-related such as whether a pixel is part of a vertical or horizontal border (Geman & Geman, 1984; Winkler, 2003). One popular method to compute the MRF joint probability is Gibbs sampling (Geman & Geman, 1984). The most common use of Gibbs sampling in vision is not to compute the final posteriors as we do in NetKit, but rather to get final classifications. Graph-cut techniques recently have been used in vision research as an alternative to using Gibbs sampling (Boykov et al., 2001), iteratively changing the labelings of many nodes at once by solving a min-cut/max-flow problem based on the current labelings.
  • Several recent methods apply to learning in networked data, beyond the homogeneous, univariate case treated in this paper. Conditional Random Fields (CRFs) (Lafferty et al., 2001) are an extension of MRFs where labels are conditioned not only on the labels of neighbors, but also on the attributes of the node itself and the attributes of the neighborhood nodes. There has been a considerable amount of work studying Probabilistic Relational Models, such as Relational Bayesian Networks (RBNs) (Koller & Pfeffer, 1998; Taskar et al., 2001), Relational Dependency Networks (RDNs) (Neville & Jensen, 2004), and Relational Markov Networks (RMNs) (Taskar et al., 2002). The above systems use only a few of the many relational learning techniques proposed in the literature. There are many more, for example from the rich literature of inductive logic programming (ILP) (e.g. (Flach & Lachiche, 1999; Dzeroski & Lavrac, 2001; Kramer et al., 2001; Domingos & Richardson, 2004)), or based on using relational database joins to generate relational features (e.g. (Perlich & Provost, 2003; Popescul & Ungar, 2003)). These techniques could be the basis for additional relational model components in NetKit.
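
The distinction drawn above between using Gibbs sampling to compute final posteriors and using it only to get final classifications can be made concrete with a small sketch. The model here is an assumption, not something taken from the paper: a simple homophily (Potts-style) MRF in which a node's conditional probability of taking class c is proportional to exp(beta × number of neighbors currently labeled c). The function name `gibbs_posteriors` and all of its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def gibbs_posteriors(edges, known_labels, classes, beta=1.0,
                     burn_in=200, samples=2000, seed=0):
    """Hypothetical helper: estimate label posteriors by Gibbs sampling.

    Assumes P(label = c | neighbors) is proportional to
    exp(beta * count of neighbors currently labeled c).
    """
    rng = random.Random(seed)
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    unlabeled = sorted(n for n in neighbors if n not in known_labels)
    state = dict(known_labels)          # known labels stay fixed
    for node in unlabeled:
        state[node] = rng.choice(classes)   # random initial labeling

    counts = {node: {c: 0 for c in classes} for node in unlabeled}
    for sweep in range(burn_in + samples):
        for node in unlabeled:
            # Resample this node's label given its neighbors' current labels.
            weights = [
                math.exp(beta * sum(1 for n in neighbors[node] if state[n] == c))
                for c in classes
            ]
            state[node] = rng.choices(classes, weights=weights, k=1)[0]
        if sweep >= burn_in:
            # After burn-in, record how often each label is drawn.
            for node in unlabeled:
                counts[node][state[node]] += 1

    return {node: {c: counts[node][c] / samples for c in classes}
            for node in unlabeled}


if __name__ == "__main__":
    chain = [("a", "b"), ("b", "c"), ("c", "d")]
    posteriors = gibbs_posteriors(chain, {"a": "x", "d": "y"}, ["x", "y"])
    for node, dist in posteriors.items():
        # The posterior is the full distribution; a final classification
        # keeps only its most probable label.
        print(node, dist, "->", max(dist, key=dist.get))
```

Returning the sample frequencies gives the posterior estimates; taking only the most frequent label per node, as the last line of the usage example does, discards that information and yields just a classification.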

References


  • Foster Provost, and Sofus A. Macskassy. (2005). "NetKit-SRL: A Toolkit for Network Learning and Inference and its use for classification of networked data." http://www.casos.cs.cmu.edu/events/conferences/2005/2005 proceedings/Macskassy2.pdf