2008 AGraphKernelForPPIExtraction

Subject Headings: Protein-Protein Interaction Extraction Task, Supervised Graph Kernel Algorithm.


Cited By



In this paper, we propose a graph kernel based approach for the automated extraction of protein-protein interactions (PPI) from scientific literature. In contrast to earlier approaches to PPI extraction, the introduced all dependency-paths kernel has the capability to consider full, general dependency graphs. We evaluate the proposed method across five publicly available PPI corpora, providing the most comprehensive evaluation done for a machine learning based PPI-extraction system.

Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, achieving 56.4 F-score and 84.8 AUC on the AImed corpus. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources.
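The two measures reported above, and the cross-validation concern, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names are hypothetical, the AUC is computed with the standard rank-based (Wilcoxon-Mann-Whitney) formulation, and the fold assignment assumes that the cross-validation pitfall in question is examples from one document ending up on both the training and the test side, which the abstract itself does not spell out.

```python
def f_score(gold, predicted):
    """F1 of the positive (interaction) class; inputs are lists of 0/1 labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(gold, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Unlike the F-score, this needs real-valued
    classifier outputs and no decision threshold.
    """
    pos = [s for g, s in zip(gold, scores) if g == 1]
    neg = [s for g, s in zip(gold, scores) if g == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def document_level_folds(doc_ids, n_folds):
    """Give every example a fold index so that one document never spans two folds.

    Assumption: doc_ids[i] names the source document of candidate pair i.
    Documents are dealt to folds round-robin in order of first appearance.
    """
    fold_of_doc = {}
    for doc in doc_ids:
        if doc not in fold_of_doc:
            fold_of_doc[doc] = len(fold_of_doc) % n_folds
    return [fold_of_doc[doc] for doc in doc_ids]


if __name__ == "__main__":
    gold = [1, 1, 0, 0]
    print(f_score(gold, [1, 0, 1, 0]))
    print(auc(gold, [0.9, 0.4, 0.5, 0.1]))
    print(document_level_folds(["a", "a", "b", "c", "b"], 2))
```

Both functions return fractions in [0, 1]; the figures quoted in the abstract are the same quantities expressed as percentages.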


  • Andrew P. Bradley. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145–1159.
  • Razvan C. Bunescu and Raymond Mooney. (2005). A shortest path dependency kernel for relation extraction. In: Proceedings of HLT/EMNLP’05, pages 724–731.
  • Razvan C. Bunescu and Raymond Mooney. (2006). Subsequence kernels for relation extraction. In: Proceedings of NIPS’05, pages 171–178. MIT Press.
  • Razvan C. Bunescu, Ruifang Ge, Rohit J. Kate, Edward M. Marcotte, Raymond Mooney, Arun Kumar Ramani, and Yuk Wah Wong. (2005). Comparative experiments on learning information extractors for proteins and their interactions. Artif Intell Med, 33(2):139–155.
  • Eugene Charniak and Matthew Lease. (2005). Parsing biomedical literature. In: Proceedings of IJCNLP’05, pages 58–69.
  • Andrew Brian Clegg and Adrian Shepherd. (2007). Benchmarking natural-language parsers for biological applications using dependency graphs. BMC Bioinformatics, 8(1):24.
  • Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. (2006). Generating typed dependency parses from phrase structure parses. In: Proceedings of LREC’06, pages 449–454.
  • J. Ding, D. Berleant, D. Nettleton, and E. Wurtele. (2002). Mining MEDLINE: abstracts, sentences, or phrases? In: Proceedings of PSB’02, pages 326–337.
  • (Fundel et al., 2007) ⇒ Katrin Fundel, Robert Küffner, and Ralf Zimmer. (2007). “RelEx - Relation Extraction Using Dependency Parse Trees.” In: Bioinformatics 23(3) (Bioinformatics).
  • Thomas Gärtner, Peter A. Flach, and Stefan Wrobel. (2003). On graph kernels: Hardness results and efficient alternatives. In: COLT’03, pages 129–143. Springer.
  • (Giuliano et al., 2006) ⇒ Claudio Giuliano, Alberto Lavelli, and Lorenza Romano. (2006). “Exploiting Shallow Linguistic Information for Relation Extraction from Biomedical Literature.” In: Proceedings of EACL-2006 (EACL 2006).
  • James A. Hanley and B. J. McNeil. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36.
  • Lawrence Hunter, Zhiyong Lu, James Firby, William A. Baumgartner, Helen L Johnson, Philip V. Ogren, and K. Bretonnel Cohen. (2008). OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-specific gene expression. BMC Bioinformatics, 9(78).
  • Jin-Dong Kim, Tomoko Ohta, and Jun'ichi Tsujii. (2008). Corpus annotation for mining biomedical events from literature. BMC Bioinformatics, 9(10).
  • Carl D. Meyer. (2000). Matrix analysis and applied linear algebra. Society for Industrial and Applied Mathematics.
  • Tomohiro Mitsumori, Masaki Murata, Yasushi Fukuda, Kouichi Doi, and Hirohumi Doi. (2006). Extracting protein-protein interaction information from biomedical text with SVM. IEICE - Trans. Inf. Syst., E89-D(8):2464–2466.
  • Claire Nédellec. (2005). Learning language in logic - genic interaction extraction challenge. In: Proceedings of LLL’05.
  • Tapio Pahikkala, Jorma Boberg, and Tapio Salakoski. (2006a). Fast n-fold cross-validation for regularized least-squares. In: Proceedings of SCAI’06, pages 83–90.
  • Tapio Pahikkala, Evgeni Tsivtsivadze, Jorma Boberg, and Tapio Salakoski. (2006b). Graph kernels versus graph representations: a case study in parse ranking. In: Proceedings of the ECML/PKDD’06 workshop on Mining and Learning with Graphs.
  • Sampo Pyysalo, Filip Ginter, Juho Heimonen, Jari Björne, Jorma Boberg, Jouni Järvinen, and Tapio Salakoski. (2007a). BioInfer: A corpus for information extraction in the biomedical domain. BMC Bioinformatics, 8(50).
  • Sampo Pyysalo, Filip Ginter, Veronika Laippala, Katri Haverinen, Juho Heimonen, and Tapio Salakoski. (2007b). On the unification of syntactic annotations under the Stanford dependency scheme: A case study on BioInfer and GENIA. In: Proceedings of BioNLP’07, pages 25–32.
  • Sampo Pyysalo, Antti Airola, Juho Heimonen, Jari Björne, Filip Ginter, and Tapio Salakoski. (2008). Comparative analysis of five protein-protein interaction corpora. BMC Bioinformatics, special issue, 9(Suppl 3):S6.
  • Ryan Rifkin, Gene Yeo, and Tomaso Poggio. (2003). Regularized Least-squares Classification, volume 190 of NATO Science Series III: Computer and System Sciences, chapter 7, pages 131–154. IOS Press.
  • Rune Sætre, Kenji Sagae, and Jun'ichi Tsujii. (2008). Syntactic features for protein-protein interaction extraction. In: Proceedings of LBM’07, volume 319, pages 6.1–6.14.
  • Akane Yakushiji, Yusuke Miyao, Yuka Tateisi, and Jun'ichi Tsujii. (2005). Biomedical information extraction with predicate-argument structure patterns. In: Proceedings of SMBM’05, pages 60–69.
  • Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. (2003). Kernel methods for relation extraction. J. Mach. Learn. Res., 3:1083–1106.

  • Author: Antti Airola, Sampo Pyysalo, Jari Björne, Tapio Pahikkala, Filip Ginter, Tapio Salakoski
  • Title: A Graph Kernel for Protein-Protein Interaction Extraction
  • URL: http://aclweb.org/anthology-new/W/W08/W08-0601.pdf
  • Year: 2008