Biomedical Information Extraction (IE) Task


A Biomedical Information Extraction (IE) Task is a medical IE task that extracts biomedical information from biomedical literature.




  • (Yang et al., 2009) ⇒ Zhihao Yang, Hongfei Lin, Baodong Wu. (2009). “BioPPIExtractor: A protein–protein interaction extraction system for biomedical literature.” In: Expert Systems with Applications, 36(2):1. doi:10.1016/j.eswa.2007.12.014
    • ABSTRACT: Automatic extracting protein–protein interaction information from biomedical literature can help to build protein relation network, predict protein function and design new drugs. This paper presents a protein–protein interaction extraction system BioPPIExtractor for biomedical literature. This system applies Conditional Random Fields model to tag protein names in biomedical text, then uses a link grammar parser to identify the syntactic roles in sentences and at last extracts complete interactions by analyzing the matching contents of syntactic roles and their linguistically significant combinations. Experimental evaluations with two other state of the art extraction systems indicate that BioPPIExtractor system achieves better performance.



  • Ramin Homayouni, Kevin Heinrich, Lai Wei, and Michael W. Berry. Gene clustering by latent semantic indexing of medline abstracts. Bioinformatics, 21:104–115, 2005.
  • G. Leroy and H. Chen. Genescene: An ontology-enhanced integration of linguistic and co-occurrence based relations in biomedical texts. In JASIST 2005 Special Issue on Bioinformatics, 2005.
  • Ryan McDonald, [[Fernando Pereira]], and [[Seth Kulick]]. Simple algorithms for complex relation extraction with applications to biomedical IE. In ACL-05, pages 491–498, 2005.
  • Conrad Plake, [[Jörg Hakenberg]], and [[Ulf Leser]]. Optimizing syntax-patterns for discovering protein-protein-interactions. In Proc ACM Symposium on Applied Computing, SAC, Bioinformatics Track, Santa Fe, USA, March 2005.
  • Jeffrey T. Chang and [[Russ B. Altman]]. Extracting and characterizing gene-drug relationships from the literature. Pharmacogenetics, 14(9):577–586, Sept. 2004.
  • D. P. A. Corney, B. F. Buxton, W. B. Langdon, and D. T. Jones. BioRAT: Extracting biological information from full-length papers. Bioinformatics, 20(17), 2004.
  • Takaaki Hasegawa, [[Satoshi Sekine]], and [[Ralph Grishman]]. Discovering relations among named entities from large corpora. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL'04), pages 415–422, Barcelona, Spain, July 2004.
  • R. Mack, S. Mukherjea, A. Soffer, N. Uramoto, A. Coden, E. Brown, J. Cooper, A. Inokuchi, B. Iyer, Y. Mass, H. Matsuzawa, and L. V. Subramaniam. Text analytics for life science using the unstructured information management architecture. IBM Systems Journal, 43(3):490–515, 2004.
  • Barbara Rosario and [[Marti Hearst]]. Classifying semantic relations in bioscience texts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL'04), Barcelona, Spain, July 2004.
  • Razvan Bunescu, Ruifang Ge, Rohit J. Kate, [[Raymond Mooney]], Yuk Wah Wong, Edward M. Marcotte, and Arun Kumar Ramani. Learning to extract proteins and their interactions from medline abstracts. In Proceedings of the ICML-2003 Workshop on Machine Learning in Bioinformatics, pages 46–53, Washington DC, August 2003.
  • Nikolai Daraselia, Sergei Egorov, Andrey Yazhuk, and Svetlana Novichkova. Extracting human protein interactions from medline using a full-sentence parser. Bioinformatics, 19(0):1–8, 2003.
  • J. A. Dickerson, D. Berleant, Z. Cox, W. Qi, and E. Wurtele. Creating metabolic network models using text mining and expert knowledge. Computational Biology and Genome Informatics, pages 207–238, 2003.
  • I. Donaldson, J. Martin, B. Bruijn, and C. Wolting. Prebind and textomy - mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics, 4(11), 2003.
  • R. Gaizauskas, G. Demetriou, P. J. Artymiuk, and P. Willett. Protein structures and information extraction from biological texts: The pasta system. Bioinformatics, 19(1):135–143, 2003.
  • J. Hosaka, J. Koh, and A. Konagaya. Effect of utilizing terminology on extraction of protein-protein interaction information from biomedical literature. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL'03), pages 107–110, Budapest, Hungary, ([[2003]]). ACL, East Stroudsburg, PA.
  • D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey. Improving literature based discovery support by genetic knowledge integration. Stud. Health Technol. Inform., 95:68–73, 2003.
  • Gondy Leroy, Hsinchun Chen, Jesse D. Martinez, Shauna Eggers, Ryan Falsey, Kerri Kislin, Zan Huang, Jiexun Li, Jie Xu, Daniel McDonald, and Gavin Ng. Genescene: Biomedical text and data mining. In Proceedings of the third ACM/IEEE-CS joint conference on Digital libraries, pages 116–118, Houston, Texas, 2003.
  • S. Novichkova, S. Egorov, and N. Daraselia. Medscan: a natural language processing engine for medline abstracts. Bioinformatics, 19(13):1699–1706, 2003.
  • Cheng Niu, [[Rohini K Srihari]], Wei Li, and Thomas Cornell. Infoxtract: A customizable intermediate level information extraction engine. In HLT-NAACL 2003 Workshop: Software Engineering and Architecture of Language Technology Systems, pages 51–58, Edmonton, Canada, May-June 2003.
  • R. K. Srihari, M. E. Ruiz, and M. Srikanth. Concept chain graphs: A hybrid IR framework for biomedical text mining. In Workshop on Text Analysis and Search for Bioinformatics, SIGIR'03, Toronto, Canada, 2003.
  • Takayuki Takahata, Yasuhiro Kouchi, Kaoru Asano, and Toshihisa Takagi. Disease-associated genes extraction from literature database. Genome Informatics, 14:703–704, 2003.
  • C. Blaschke and A. Valencia. The frame-based module of the suiseki information extraction system. IEEE Intelligent Systems, 17(2):14–20, 2002.
  • Ronen Feldman, Yizhar Regev, Michal Finkelstein-Landau, Eyal Hurvitz, and Boris Kogan. Mining biomedical literature using information extraction. Current Drug Discovery, pages 19–23, October 2002.
  • Y. Fu, T. Bauer, J. Mostafa, M. Palakal, and S. Mukhopadhyay. Concept extraction and association from cancer literature. In Proceedings of the 4th international workshop on Web information and data management, pages 100–103, McLean, Virginia, USA, 2002.
  • U. Hahn and M. Romacker. Creating knowledge repositories from biomedical reports: The medsyndikate text mining system. In Proceedings PSB 2002, pages 338–349, 2002.
  • G. Leroy and H. Chen. Filling preposition-based templates to capture information from medical abstracts. In Proceedings of the Pacific Symposium on Biocomputing '02 (PSB'02), pages 350–361, Lihue, HI, 2002.
  • Carolina Perez-Iratxeta, Peer Bork, and Miguel A. Andrade. Association of genes to genetically inherited diseases using data mining. Nature Genetics, 31:316–319, July 2002.
  • [[James Pustejovsky]], J. Castano, J. Zhang, M. Kotecki, and B. Cochran. Robust relational parsing over biomedical literature: Extracting inhibit relations. In Proceedings of the Pacific Symposium on Biocomputing, pages 362–373, 2002.
  • H. Shatkay, S. Edwards, and M. Boguski. Information retrieval meets gene analysis. IEEE Intelligent Systems, 17(2):45–53, 2002.
  • C. Friedman, P. Kra, M. Krauthammer, H. Yu, and A. Rzhetsky. Genies: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics, 17(1):74–82, 2001.
  • T. K. Jenssen, A. Laegreid, J. Komorowski, and E. Hovig. A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics, 28:21–28, May 2001.
  • E. Marcotte, I. Xenarios, and D. Eisenberg. Mining literature for protein-protein interactions. Bioinformatics, 17(4):359–363, 2001.
  • Jong C. Park, Hyun Sook Kim, and Jung Jae Kim. Bidirectional incremental parsing for automatic pathway identification with combinatory categorial grammar. Pac Symp Biocomput, 6:396–407, 2001.
  • Limsoon Wong. Pies, a protein interaction extraction system. In Proceedings of the sixth Pacific Symposium on Biocomputing (PSB 2001), pages 520–531, 2001.
  • A. Yakushiji, Y. Tateisi, Y. Miyao, and [[Jun'ichi Tsujii]]. Event extraction from biomedical papers using a full parser. In Proceedings of the sixth Pacific Symposium on Biocomputing (PSB 2001), pages 408–419, ([[2001]]).
  • K. Humphreys, G. Demetriou, and R. Gaizauskas. Two applications of information extraction to biological science journal articles. In Pacific Symposium on Biocomputing 5, pages 502–513, 2000.
  • D. Proux, F. Rechenmann, and L. Julliard. A pragmatic information extraction strategy for gathering data on genetic interactions. In Proceedings of International Conference on Intelligent Systems for Molecular Biology, pages 279–285, 2000.
  • T. Rindflesch, J. Rajah, and L. Hunter. Extracting molecular binding relationships from biomedical text. In Proceedings of the 6th Applied Natural Language Processing Conference (ANLP-NAACL 2000), pages 188–195, Seattle, WA, ACL, East Stroudsburg, PA, 2000.
  • T. C. Rindflesch, L. Tanabe, J. N. Weinstein, and L. Hunter. Edgar: Extraction of drugs, genes and relations from the biomedical literature. In Proceedings of 5th Pacific Symposium on Biocomputing, pages 514–525, 2000.
  • B. J. Stapley and G. Benoit. Biobibliometrics: Information retrieval and visualization from co-occurrences of gene names in medline abstracts. In Proceedings of the fifth Pacific Symposium on Biocomputing (PSB 2000), pages 529–40, 2000.
  • James Thomas, David Milward, Christos Ouzounis, Stephen Pulman, and Mark Carroll. Automatic extraction of protein interactions from scientific abstracts. In Pac Symp Biocomput., pages 541–552, 2000.
  • C. Blaschke, M. Andrade, C. Ouzounis, and A. Valencia. Automatic extraction of biological information from scientific text: Protein-protein interactions. In Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '99), pages 60–77, Heidelberg, Germany, AAAI, Menlo Park, CA, 1999.
  • M. Craven and J. Kumlien. Constructing biological knowledge-bases by extracting information from text sources. Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, Heidelberg, Germany ([[1999]]).
  • S. Ng and M. Wong. Toward routine automatic pathway discovery from on-line scientific text abstracts. Genome Informatics, 10:104–112, 1999.
  • T. Ono, H. Hishigaki, A. Tanigami, and T. Takagi. Automatic extraction of information on protein-protein interaction from scientific literature. Genome Informatics, Universal Academy Press, pages 296–297, 1999.
  • L. Tanabe, U. Scherf, L. H. Smith, J. K. Lee, L. Hunter, and J. N. Weinstein. Medminer: an internet text-mining tool for biomedical information, with application to gene expression profiling. BioTechniques, 27:1210–1217, 1999.
  • T. Sekimizu, H. S. Park, and [[Jun'ichi Tsujii]]. Identifying the interaction between genes and gene products based on frequently seen verbs in medline abstracts. In Genome Informatics, pages 62–71, 1998. Notes: examples of sentences; shallow parser; noun phrase recognizer; algorithm to identify the arguments of verbs.
  • Susan Jones and Janet M. Thornton. Principles of protein–protein interactions. In Proc. Natl. Acad. Sci, pages 13–20, January 1996.