Keywords: Relation Recognition from Text Algorithm, Protein-Protein Interaction
Quotes
Abstract
- MOTIVATION: The discovery of regulatory pathways, signal cascades, metabolic processes or disease models requires knowledge on individual relations like e.g. physical or regulatory interactions between genes and proteins. Most interactions mentioned in the free text of biomedical publications are not yet contained in structured databases.
- RESULTS: We developed RelEx, an approach for relation extraction from free text. It expands on natural language preprocessing by applying a small number of simple rules to achieve competitive recall and precision. We applied RelEx on a comprehensive set of one million MEDLINE abstracts dealing with relations of proteins and extracted approx. 150.000 relations.
- AVAILABILITY: The used natural language preprocessing tools are free for use for academic research. Test sets and relation term lists are available from our web-site (http://www.bio.ifi.lmu.de/publications/RelEx/).
1. Introduction
- "The simplest approach is the detection of co-occurrences of entities from within sentences or abstracts (Ding et al., 2002; Jelier et al., 2005; Jenssen et al., 2001). It relies on the hypothesis that entities which are repeatedly mentioned together are somehow related. Extracted relations exhibit high sensitivity but very low specificity. Generally, the type and direction of the relation cannot be determined."
- "Pattern based extraction approaches (Blaschke et al., 1999; Blaschke and Valencia, 2001; Leroy and Chen, 2002; Ono et al., 2001) were set up to increase specificity, yet they achieve significantly lower recall."
- "As an extension to standard relation extraction pipelines, we propose the use of dependency parse trees (Klein and Manning, 2002, 2003; Mel’cuk, 1988) as a means for biomedical relation extraction. Dependency parse trees reveal non-local dependencies within sentences, i.e. between words that are far apart in a sentence. Sentences of biomedical texts tend to be long and complicated and frequently mention a number of possible effectors and effectees. Dependency parse trees provide a useful structure for the sentences by annotating edges with dependency types, e.g. subject, auxiliary, modifier."
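The contrast drawn in the quotes above, between sentence-level co-occurrence and dependency parse trees, can be illustrated with a minimal sketch. It is not the RelEx implementation: the sentence, the hand-written parse (`EDGES`), and the helper names `cooccurrence_pairs` and `dependency_path` are all assumptions made for illustration. The dependency types reuse the examples from the quote (subject, auxiliary, modifier), plus an assumed "object" type.

```python
from collections import deque
from itertools import combinations

# Hand-written toy dependency parse (an assumption, not real parser output) for:
#   "ProtA, which is found in the nucleus, activates ProtB."
# Each edge is (head, dependency_type, dependent).
EDGES = [
    ("activates", "subject", "ProtA"),
    ("activates", "object", "ProtB"),
    ("ProtA", "modifier", "found"),
    ("found", "auxiliary", "is"),
    ("found", "modifier", "nucleus"),
]
ENTITIES = ["ProtA", "ProtB"]


def cooccurrence_pairs(entities):
    """Co-occurrence baseline: every unordered entity pair is a candidate."""
    return list(combinations(sorted(entities), 2))


def dependency_path(edges, source, target):
    """Breadth-first search over the undirected dependency graph.

    Returns the (word, dependency_type, word) steps from source to target,
    or None if the two words are not connected.
    """
    graph = {}
    for head, dep_type, dependent in edges:
        graph.setdefault(head, []).append((dependent, dep_type))
        graph.setdefault(dependent, []).append((head, dep_type))

    queue = deque([(source, [])])
    seen = {source}
    while queue:
        word, path = queue.popleft()
        if word == target:
            return path
        for neighbour, dep_type in graph.get(word, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(word, dep_type, neighbour)]))
    return None


if __name__ == "__main__":
    print(cooccurrence_pairs(ENTITIES))
    print(dependency_path(EDGES, "ProtA", "ProtB"))
```

Co-occurrence only reports that ProtA and ProtB appear together. The dependency path additionally exposes the connecting verb and the subject/object roles, which is the kind of information needed to determine the type and direction of a relation, even though the two names are separated by a relative clause.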
Notes
- Gene and protein names are identified by ProMiner (Hanisch et al., 2005).
- ProMiner ranked #1 in two of three NER tests in BioCreative.
- ProMiner is based on matching against a synonym dictionary (Fundel and Zimmer, 2006).
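Dictionary-based name recognition of the kind mentioned in these notes can be sketched as follows. This is a simplified illustration, not ProMiner's actual algorithm: the toy `SYNONYMS` table and the helpers `build_matcher` and `find_entities` are assumptions, and a real system would also need to handle spelling variants and ambiguous names.

```python
import re

# Toy synonym dictionary (illustrative entries, not taken from ProMiner).
# Keys are lowercase surface forms; values are canonical gene identifiers.
SYNONYMS = {
    "p53": "TP53",
    "tumor protein p53": "TP53",
    "mdm2": "MDM2",
    "hdm2": "MDM2",
}


def build_matcher(synonyms):
    """Compile one case-insensitive regex; longer synonyms are tried first."""
    ordered = sorted(synonyms, key=len, reverse=True)
    pattern = "|".join(re.escape(s) for s in ordered)
    return re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)


def find_entities(text, synonyms):
    """Return (start, end, matched_text, canonical_id) for each dictionary hit."""
    matcher = build_matcher(synonyms)
    hits = []
    for m in matcher.finditer(text):
        canonical = synonyms[m.group(0).lower()]
        hits.append((m.start(), m.end(), m.group(0), canonical))
    return hits


if __name__ == "__main__":
    sentence = "HDM2 binds tumor protein p53 and inhibits p53 activity."
    for hit in find_entities(sentence, SYNONYMS):
        print(hit)
```

Sorting the synonyms by length before building the pattern makes the longer form "tumor protein p53" win over the embedded short form "p53", so overlapping names yield one hit rather than two.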
References
- Blaschke,C. et al. (1999) Automatic extraction of biological information from scientific text: protein-protein interactions. Proc. Int. Conf. Intell. Syst. Mol. Biol., 60–67.
- Blaschke,C. and Valencia,A. (2001) The potential use of suiseki as a protein interaction discovery tool. Genome Inform. Ser. Workshop Genome Inform., 12, 123–134.
- Ding,J., Berleant,D., Nettleton,D. and Wurtele,E. (2002) Mining MEDLINE: abstracts, sentences, or phrases? Pac. Symp. Biocomput., 7, 326–337.
- Jelier,R., Jenster,G., Dorssers,L.C.J., van der Eijk,C.C., van Mulligen,E.M., Mons,B. and Kors,J.A. (2005) Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes. Bioinformatics, 21, 2049–2058.
- Jenssen,T.K. et al. (2001) A literature network of human genes for high-throughput analysis of gene expression. Nat. Genet., 28, 21–28.
- Fundel,K. and Zimmer,R. (2006) Gene and protein nomenclature in public databases. BMC Bioinformatics, 7, 372.
- Hanisch,D., Fundel,K., Mevissen,H.T., Zimmer,R. and Fluck,J. (2005) ProMiner: rule-based protein and gene entity recognition. BMC Bioinformatics, 6 (Suppl. 1), S14. http://www.biomedcentral.com/1471-2105/6/S1/S14
- Klein,D. and Manning,C.D. (2002) Fast exact inference with a factored model for natural language parsing. Adv. Neural Inform. Proc. Syst., 15, 3–10.
- Klein,D. and Manning,C.D. (2003) Accurate unlexicalized parsing. In Proceedings of the 41st Meeting of the Association for Computational Linguistics.
- Leroy,G. and Chen,H. (2002) Filling preposition-based templates to capture information from medical abstracts. Pac. Symp. Biocomput., 7, 350–361.
- Mel’cuk,I. (1988) Dependency Syntax: Theory and Practice. State University of New York Press, Albany, NY.
- Ono,T. et al. (2001) Automated extraction of information on protein–protein interactions from the biological literature. Bioinformatics, 17, 155–161.
BibTeX
@article{,
author = "Katrin Fundel and Robert Küffner and Ralf Zimmer",
title = "{RelEx}--relation extraction using dependency parse trees",
journal = "Bioinformatics",
volume = "23",
number = "3",
pages = "365--371",
month = feb,
year = 2007}