2003 OnKernelMethodsForRelLearning

From GM-RKB

Subject Headings: Parameterized Kernel, Relational Kernel, Graph Kernel.

Notes

Cited By

  • ~110+


Quotes

Abstract

Kernel methods have gained a great deal of popularity in the machine learning community as a method to learn indirectly in high-dimensional feature spaces. Those interested in relational learning have recently begun to cast learning from structured and relational data in terms of kernel operations.

We describe a general family of kernel functions built up from a description language of limited expressivity and use it to study the benefits and drawbacks of kernel learning in relational domains. Learning with kernels in this family directly models learning over an expanded feature space constructed using the same description language. This allows us to examine issues of time complexity in terms of learning with these and other relational kernels, and how these relate to generalization ability. The tradeoffs between using kernels in a very high-dimensional implicit space and in a restricted feature space are highlighted through two experiments, one in bioinformatics and one in natural language processing.

Introduction

Haussler's work on convolution kernels (Haussler, 1999) introduced the idea that kernels for discrete data structures could be built up iteratively from kernels for their smaller composite parts. These kernels followed the form of a generalized sum over products – a generalized convolution. Kernels were shown for several discrete datatypes including strings and rooted trees, and more recently Collins and Duffy (2002) developed kernels for datatypes useful in many NLP tasks, demonstrating their usefulness with the Voted Perceptron algorithm (Freund & Schapire, 1998).
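The "sum over products" form can be illustrated with a toy p-spectrum string kernel, which counts the length-p substrings two strings share. This is a minimal sketch in plain Python, not the paper's construction; the function names are ours:

```python
from collections import Counter

def spectrum_features(s, p=2):
    """Explicit feature map: counts of each length-p substring of s."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(s, t, p=2):
    """Convolution-style kernel: a sum over products of matching
    part counts, equal to the dot product of the explicit maps."""
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    return sum(fs[u] * ft[u] for u in fs if u in ft)
```

Because the kernel equals the dot product of the explicit substring-count vectors, it can be computed either implicitly (as above) or over the expanded feature space directly – the tradeoff the paper studies.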

While these past examples of relational kernels are formulated separately to meet each problem at hand, we seek to develop a flexible mechanism for building kernel functions for many structured learning problems based on a unified knowledge representation. At the heart of our approach is a definition of a relational kernel that is specified in a "syntax-driven" manner through the use of a description language. (Cumby & Roth, 2002) introduced a feature description language and showed how to use propositional classifiers to successfully learn over structured data and produce relational representations, in the sense that different data instantiations yield the same features and have the same weights in the linear classifier learned. There, as in (Roth & Yih, 2001), this was done by significantly blowing up the relational feature-space.
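To make the "blown-up" feature space concrete, the hypothetical sketch below propositionalizes a token sequence into word and conjunction features and computes a kernel as a dot product over that explicit space. The feature templates here are our illustration, not the paper's feature description language:

```python
def propositionalize(tokens, window=1):
    """Map a structured instance (a token sequence) to a sparse set of
    propositional features: one per word, plus conjunctions of nearby words."""
    feats = set()
    for i, w in enumerate(tokens):
        feats.add(f"word={w}")
        for d in range(1, window + 1):
            if i + d < len(tokens):
                feats.add(f"conj={w}&{tokens[i + d]}")
    return feats

def linear_kernel(x, y):
    """Kernel as the dot product over the explicit (blown-up) feature space;
    for binary features this is the size of the intersection."""
    return len(propositionalize(x) & propositionalize(y))
```

Even with one conjunction template the feature space grows quickly with vocabulary size, which is the "blow-up" the text refers to.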

References

  • Ben-David, S., Eiron, N., & Simon, H. U. (2002). Limitations of learning via embeddings in Euclidean half-spaces. JMLR.
  • Blum, A. (1992). Learning boolean functions in an infinite attribute space. Machine Learning, 9, 373–386.
  • Carlson, A., Cumby, C., Rosen, J., & Dan Roth. (1999). The SNoW learning architecture (Technical Report UIUCDCS-R-99-2101). UIUC Computer Science Dep.
  • Michael Collins, & Duffy, N. (2002). New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. Proceedings of ACL 2002.
  • Cristianini, N., & John Shawe-Taylor (2000). An introduction to support vector machines. Cambridge Press.
  • Cumby, C., & Dan Roth. (2002). Learning with feature description logics. Proceedings of the 12th International Conference on Inductive Logic Programming.
  • Cumby, C., & Dan Roth. (2003a). Feature extraction languages for propositionalized relational learning. IJCAI'03 Workshop on Learning Statistical Models from Relational Data.
  • Cumby, C., & Dan Roth. (2003b). Kernel methods for relational learning (Technical Report UIUCDCS-R-2003-2345). UIUC Computer Science Department.
  • Yoav Freund, & Robert E. Schapire (1998). Large margin classification using the perceptron algorithm. Computational Learning Theory (pp. 209–217).
  • D. Haussler. (1999). Convolution kernels on discrete structures (Technical Report UCSC-CRL-99-10). University of California - Santa Cruz.
  • Khardon, R., Dan Roth, & Servedio, R. (2001). Efficiency versus convergence of boolean kernels for on-line learning algorithms. NIPS-14.
  • Khardon, R., Dan Roth, & Valiant, L. G. (1999). Relational learning for NLP using linear threshold elements. Proceedings of the International Joint Conference on Artificial Intelligence (pp. 911–917).
  • Kramer, S., Lavrac, N., & Flach, P. (2001). Propositionalization approaches to relational data mining. In S. Dzeroski and N. Lavrac (Eds.), Relational data mining. Springer Verlag.
  • Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285–318.
  • Novikoff, A. (1963). On convergence proofs for perceptrons. Proceedings of the Symposium on the Mathematical Theory of Automata (pp. 615–622).
  • Dan Roth. (1998). Learning to resolve natural language ambiguities: A unified approach. National Conference on Artificial Intelligence (pp. 806–813).
  • Dan Roth. (1999). Learning in natural language. Proceedings of the International Joint Conference on Artificial Intelligence (pp. 898–904).
  • Dan Roth, & Yih, W. (2001). Relational learning via propositional algorithms: An information extraction case study. Proceedings of the International Joint Conference on Artificial Intelligence (pp. 1257–1263).
  • Srinivasan, A., Muggleton, S., King, R. D., & Sternberg, M. (1996). Theories for mutagenicity: A study of first order and feature based induction. Artificial Intelligence, 85(1-2), 277–299.
  • Weston, J., Sayan Mukherjee, Chapelle, O., Pontil, M., Poggio, T., & Vladimir N. Vapnik (2000). Feature selection for SVMs. Neural Information Processing Systems (NIPS 2000).

Bibtex

@inproceedings{CumbyRo03,
  author    = {C. Cumby and Dan Roth},
  title     = {On Kernel Methods for Relational Learning},
  booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
  year      = {2003},
  pages     = {107--114},
  url       = {http://l2r.cs.uiuc.edu/~danr/Papers/CumbyRo03.pdf},
}



Author: Chad Cumby, Dan Roth
Title: On Kernel Methods for Relational Learning
Venue: Proceedings of the Twentieth International Conference on Machine Learning
URL: http://l2r.cs.uiuc.edu/~danr/Papers/CumbyRo03.pdf
Year: 2003