2003 KernelMethodsForRelationExtraction


Subject Headings: Relation Mention Recognition Algorithm, Relational Data Kernel Function






We present an application of kernel methods to extracting relations from unstructured natural language sources. We introduce kernels defined over shallow parse representations of text, and design efficient algorithms for computing the kernels. We use the devised kernels in conjunction with Support Vector Machine and Voted Perceptron learning algorithms for the task of extracting person-affiliation and organization-location relations from text. We experimentally evaluate the proposed methods and compare them with feature-based learning algorithms, with promising results.
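The combination of a kernel with the Voted Perceptron mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-tokens kernel below is a deliberately simple stand-in for the shallow-parse kernels the paper defines, and the function names (`token_kernel`, `train_voted_perceptron`, `predict`) and the toy sentences are assumptions made for illustration. The training loop follows the kernelized voted perceptron of Freund and Schapire (1999), which only ever touches the data through kernel evaluations.

```python
from collections import Counter


def token_kernel(a, b):
    """Toy kernel: dot product of token-count vectors.

    Assumption: a stand-in for the paper's shallow-parse kernels,
    chosen only because it is a valid (positive semi-definite) kernel.
    """
    ca, cb = Counter(a), Counter(b)
    return sum(ca[t] * cb[t] for t in ca if t in cb)


def train_voted_perceptron(examples, labels, kernel, epochs=5):
    """Kernelized voted perceptron.

    Returns (mistakes, counts): mistakes[k-1] is the (label, example)
    pair added at the k-th mistake, and counts[k] is the number of
    examples the vector built from the first k mistakes survived.
    """
    mistakes = []
    counts = [0]  # counts[0] belongs to the initial zero vector
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            score = sum(yi * kernel(xi, x) for yi, xi in mistakes)
            if y * score <= 0:
                # Mistake: start a new prediction vector.
                mistakes.append((y, x))
                counts.append(1)
            else:
                # Correct: the current vector survives one more example.
                counts[-1] += 1
    return mistakes, counts


def predict(model, x, kernel):
    """Weighted vote over all intermediate prediction vectors."""
    mistakes, counts = model
    vote, score = 0, 0
    for k, c in enumerate(counts):
        if k > 0:
            yi, xi = mistakes[k - 1]
            score += yi * kernel(xi, x)  # extend to the k-th vector
        vote += c * (1 if score > 0 else -1)
    return 1 if vote > 0 else -1


# Hypothetical toy data: +1 = person-affiliation mention, -1 = other.
train_x = [
    ["john", "works", "at", "acme"],
    ["mary", "works", "for", "ibm"],
    ["john", "visited", "paris"],
    ["mary", "likes", "paris"],
]
train_y = [1, 1, -1, -1]

model = train_voted_perceptron(train_x, train_y, token_kernel)
print(predict(model, ["sue", "works", "at", "ibm"], token_kernel))
print(predict(model, ["sue", "visited", "paris"], token_kernel))
```

Because the learner sees examples only through `kernel(xi, x)`, swapping in a structured kernel over parse representations would require no change to the training or prediction code, which is the design point the abstract relies on.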


  • Steven P. Abney. Parsing by chunks. In Robert Berwick, Steven P. Abney, and Carol Tenny, editors, Principle-Based Parsing. Kluwer Academic Publishers, 1990.
  • C. Aone, L. Halverson, T. Hampton, and M. Ramos-Santacruz. SRA: Description of the IE2 system used for MUC-7. In: Proceedings of MUC-7, 1998.
  • C. Aone and M. Ramos-Santacruz. REES: A large-scale relation and event extraction system. In: Proceedings of the 6th Applied Natural Language Processing Conference, 2000.
  • A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71, 1996.
  • D. M. Bikel, R. Schwartz, and R. M. Weischedel. An algorithm that learns what’s in a name. Machine Learning, 34(1-3):211–231, 1999.
  • Michael Collins. New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In: Proceedings of 40th Conference of the Association for Computational Linguistics, 2002.
  • Michael Collins and N. Duffy. Convolution kernels for natural language. In: Proceedings of NIPS-2001, 2001.
  • C. Cortes and Vladimir N. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
  • N. Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines (and Other Kernel-based Learning Methods). Cambridge University Press, 2000.
  • R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley, New York, 1973.
  • R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press, 1998.
  • Dayne Freitag and Andrew McCallum. Information extraction with HMM structures learned by stochastic optimization. In: Proceedings of the 7th Conference on Artificial Intelligence (AAAI-00) and of the 12th Conference on Innovative Applications of Artificial Intelligence (IAAI-00), pages 584–589, Menlo Park, CA, July 30– 3 (2000). AAAI Press.
  • Yoav Freund and Robert E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
  • T. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer, and D. Haussler. Support vector machine classification and validation of cancer tissue samples using microarray expression. Bioinformatics, 16, 2000.
  • L. Goldfarb. A new approach to pattern recognition. In Progress in pattern recognition 2. North Holland, 1985.
  • T. Graepel, R. Herbrich, and K. Obermayer. Classification on pairwise proximity data. In Advances in Neural Information Processing Systems 11, 1999.
  • (Haussler, 1999) ⇒ D. Haussler. (1999). “Convolution Kernels on Discrete Structures”. Technical Report UCSC-CLR-99-10, University of California at Santa Cruz.
  • R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.
  • F. Jelinek. Statistical Methods for Speech Recognition. The MIT Press, Cambridge, Massachusetts, 1997.
  • Thorsten Joachims. Text categorization with support vector machines: learning with many relevant features. European Conference Mach. Learning, ECML98, April 1998.
  • Thorsten Joachims. Learning Text Classifiers with Support Vector Machines. Kluwer Academic Publishers, Dordrecht, NL, 2002.
  • John D. Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of 18th International Conference on Machine Learning, pages 282–289. Morgan Kaufmann, San Francisco, CA, 2001.
  • N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285, 1987.
  • (Lodhi et al., 2002) ⇒ H. Lodhi, C. Saunders, John Shawe-Taylor, N. Cristianini, and C. Watkins. (2002). “Text classification using string kernels.” The Journal of Machine Learning Research, vol:2.
  • Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of 17th International Conference on Machine Learning, pages 591–598. Morgan Kaufmann, San Francisco, CA, 2000.
  • (MillerCFRSSW, 1998) ⇒ S. Miller, M. Crystal, H. Fox, L. Ramshaw, R. Schwartz, R. Stone, R. Weischedel, and the Annotation Group. (1998). “Algorithms that learn to extract information BBN: Description of the SIFT system as used for MUC-7.” In: Proceedings of MUC-7.
  • M. Munoz, V. Punyakanok, Dan Roth, and D. Zimak. A learning approach to shallow parsing. Technical Report 2087, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1999.
  • National Institute of Standards and Technology. Proceedings of the 7th Message Understanding Conference (MUC-7), 1998.
  • E. Pekalska, P. Paclik, and R. Duin. A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research, 2, 2001.
  • Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1990.
  • Frank Rosenblatt. Principles of Neurodynamics: Perceptrons and the theory of brain mechanisms. Spartan Books, Washington D.C., 1962.
  • Dan Roth. Learning in natural language. In Dean Thomas, editor, Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99-Vol2), pages 898–904, S.F., July 31–August 6 (1999). Morgan Kaufmann Publishers.
  • Dan Roth and W. Yih. Relational learning via propositional algorithms: An information extraction case study. In Bernhard Nebel, editor, Proceedings of the seventeenth International Conference on Artificial Intelligence (IJCAI-01), pages 1257–1263, San Francisco, CA, August 4–10 (2001). Morgan Kaufmann Publishers, Inc.
  • D. Sankoff and J. Kruskal, editors. Time Warps, String Edits, and Macromolecules. CSLI Publications, 1999.
  • C. J. van Rijsbergen. Information Retrieval. Butterworths, 1979.
  • Vladimir N. Vapnik. Statistical Learning Theory. John Wiley, 1998.
  • (Watkins, 2000) ⇒ C. Watkins. (2000). “Dynamic alignment kernels.” In: A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers.


2003 KernelMethodsForRelationExtraction

  • Author: Dmitry Zelenko, Chinatsu Aone, Anthony Richardella
  • Title: Kernel Methods for Relation Extraction
  • Journal: The Journal of Machine Learning Research
  • URL: http://www.jmlr.org/papers/volume3/zelenko03a/zelenko03a.pdf
  • Year: 2003