2008 ConceptRecogForExtractingProteinInterRels

(Baumgartner Jr et al., 2008) ⇒ William A. Baumgartner Jr, Zhiyong Lu, Helen L Johnson, J Gregory Caporaso, Jesse Paquette, Anna Lindemann, Elizabeth K White, Olga Medvedeva, K Bretonnel Cohen, Lawrence Hunter. (2008). “Concept Recognition for Extracting Protein Interaction Relations from Biomedical Text.” In: Genome Biology supplement on The BioCreative II - Critical Assessment for Information Extraction in Biology Challenge.

Subject Headings: Gene Mention Annotation Task, OpenDMAP, Protein-Protein Interaction Recognition Algorithm.

Notes

Cited By

Quote

Abstract

Background: Reliable information extraction applications have been a long sought goal of the biomedical text mining community, a goal that if reached would provide valuable tools to benchside biologists in their increasingly difficult task of assimilating the knowledge contained in the biomedical literature. We present an integrated approach to concept recognition in biomedical text. Concept recognition provides key information that has been largely missing from previous biomedical information extraction efforts, namely direct links to well defined knowledge resources that explicitly cement the concept's semantics. The BioCreative II tasks discussed in this special issue have provided a unique opportunity to demonstrate the effectiveness of concept recognition in the field of biomedical language processing.

Results: Through the modular construction of a protein interaction relation extraction system, we present several use cases of concept recognition in biomedical text, and relate these use cases to potential uses by the benchside biologist.

Conclusion: Current information extraction technologies are approaching performance standards at which concept recognition can begin to deliver high quality data to the benchside biologist. Our system is available as part of the BioCreative Meta-Server project and on the internet.

Background

Early efforts in information extraction have focused primarily on identification of character strings and, for the most part, have not been adopted for use by biologists. We posit that a prominent factor in the biologist's reluctance to rely on current information extraction technologies is the ambiguity that remains in these extracted strings of text. For example, there are a multitude of tools that can extract gene names from text. This is a classic problem in biomedical natural language processing (BioNLP), and one that has been extensively studied [1,2]. Determining that a particular string of text in a larger document corresponds to a gene name is a challenging problem, and by no means one that should be discounted. However, from a biologist's perspective, knowing that a string of characters is a gene name leaves much to be desired. Among other things, it would be helpful to know exactly which gene, and from which species, the identified character string refers to. This phenomenon is not limited to the identification of gene names in text, but applies also to many of the common targets of biomedical information extraction, such as cell types, diseases, tissues, and so on.

Recently, however, efforts have shifted toward the identification of concepts as opposed to character strings [3]. Concepts differ from character strings in that they are grounded in well defined knowledge resources. Concept recognition provides the key piece of information missing from a string of text - an unambiguous semantic representation of what the characters denote. The BioCreative II tasks have provided a platform to evaluate concept recognition systems in the field of biomedical language processing. As a demonstration of the potential effectiveness of integrating concept recognition, we have constructed a protein interaction relation extraction system, the components of which were generated through participation in several of the BioCreative II tasks.

We took a modular approach to the BioCreative II tasks, building on system components from other tasks whenever possible. To facilitate component integration, we made extensive use of the Unstructured Information Management Architecture (UIMA) [4,5] framework. Four benefits accrued from this strategy. First, the complete integration of all processing steps allowed us to experiment quickly and easily with different approaches to the many subtasks involved. Second, it made it easy for us to evaluate quickly the results of these experiments against the official datasets. Third, it provided us with a clean interface for incorporating tools from other groups, including LingPipe [6], A Biomedical Named Entity Recognizer (ABNER) [7], and Schwartz and Hearst's abbreviation detection algorithm [8]. Finally, it allowed for distribution of workload over the construction of the various system components that were created.

A key focus in our work, and for the protein-protein interaction extraction task (interaction pair subtask [IPS]) in particular, was the use of a concept recognition system being developed by our group. Called Open source Direct Memory Access Parser (OpenDMAP), it is a modern implementation of the DMAP paradigm first developed by Riesbeck [9], Martin [10], and Fitzgerald [11]. The earliest descriptions of the paradigm assumed that a DMAP system would approach all levels of linguistic analysis through a single optimization procedure. In this work we show that analysis can be modularized, and even externalized, without losing the essential semantic flavor of the DMAP paradigm. Hunter and coworkers [3] have described OpenDMAP in detail.

It should be noted that the benefits of concept recognition are not limited to information extraction tasks. Concept recognition has the potential to also contribute to such areas of BioNLP as document retrieval and summarization. We will touch briefly on these during our discussion of the interaction article subtask (IAS) and the interaction sentence subtask (ISS), respectively.

Grounding the concepts: the gene normalization task

Our experiments for the GM task have demonstrated the ability to identify gene mentions in text at a relatively high level of accuracy. Identification of mentions in text, as we noted above, is only part of the concept recognition process. In order to truly recognize a concept, it is necessary to normalize (or ground) the mention to a unique entity in a well defined knowledge source. This knowledge source typically takes the form of a genomic database (for example, grounding a gene mention to a particular Entrez Gene [15] identifier) or a biological ontology (for example, associating a mention describing a molecular function with a particular Gene Ontology [16] concept). Attempts to address this problem have been met with limited success in the past [17,18] for a variety of reasons, species ambiguity being a prominent issue. The 2006 gene normalization (GN) task took steps to isolate the normalization problem by removing from the equation the often confounding question of species identification. By limiting the normalization procedure to human genes only, development efforts were able to focus solely on the task of mapping a gene mention to a lexicon of genes, namely the Entrez Gene database. It is important to note, however, that the ability to identify the species under discussion is just as important in the normalization procedure as mapping the mention to a particular gene. Elimination of this part of the GN task increases the feasibility of the task considerably. See the report by Morgan and coworkers [19] for further details on the BioCreative II GN task.

Our approach to the GN task builds upon work completed for the GM task. Briefly, gene mentions are identified and then processed into a regularized form. We then attempt to map each mention to a unique Entrez Gene entry; if multiple entries are found, a disambiguation procedure is invoked. The primary novelty of our approach lies in the steps that we take to resolve conjunctive structures.

Gene mention regularization

A set of heuristics was used to regularize all gene names and symbols in the dictionary and all gene mentions outputted by the GM system. These heuristics are based on earlier work [26,27] and on previous dictionary-based systems [28]. Table 4 shows the effects of the individual rules on performance. Use of all seven rules in sequential order resulted in a noticeable increase in F measure from 0.586 to 0.774.

Mapping mentions to Entrez Gene identifiers

After the extracted gene mentions have been regularized and conjunctions have been addressed, the processed mentions are compared with all entries in the dictionary using exact string matching. If multiple matches are found, then a disambiguation procedure (discussed below) is invoked.

In addition to exact string matching, we also investigated some approximate string matching techniques. Like Fang and coworkers [28], we found that approximate matching noticeably increased search time but did not markedly improve performance.
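The regularize-then-match step can be illustrated with a minimal Python sketch. The seven regularization rules themselves are not listed in this excerpt, so the `regularize` function below uses generic stand-in heuristics (lowercasing, removing punctuation and whitespace) that are assumptions, not the authors' rules. The toy lexicon reuses the CHED example from the disambiguation discussion, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def regularize(name: str) -> str:
    """Stand-in regularization (an assumption, not the paper's seven rules)."""
    lowered = name.lower()
    spaced = re.sub(r"[^a-z0-9]+", " ", lowered)  # punctuation and hyphens -> spaces
    return "".join(spaced.split())                # drop all whitespace


def build_index(lexicon: dict) -> dict:
    """Map each regularized name or symbol to the set of gene identifiers using it."""
    index = defaultdict(set)
    for gene_id, names in lexicon.items():
        for name in names:
            index[regularize(name)].add(gene_id)
    return index


def lookup(mention: str, index: dict) -> set:
    """Exact string match on the regularized form; may return 0, 1, or many IDs."""
    return index.get(regularize(mention), set())


# Toy lexicon: identifiers from the text, synonym lists abbreviated for illustration.
LEXICON = {
    "8197": ["CHED1", "CHED"],
    "8621": ["CDC2L5", "CHED"],
}
INDEX = build_index(LEXICON)

unique = lookup("CDC2-L5", INDEX)    # one match: no disambiguation needed
ambiguous = lookup("CHED", INDEX)    # two matches: disambiguation would be invoked
```

Because both the dictionary and the mentions pass through the same function, a hash lookup is sufficient, which is consistent with the observation that approximate matching cost more search time without a marked gain.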

Gene name disambiguation

Gene names and symbols can be ambiguous across species when identical names and/or symbols are used to refer to orthologous genes, or within a species when a gene name or symbol is used to represent more than one distinct gene. For example, CHED is used as a synonym for two separate Entrez Gene entries: CHED1 (EntrezGene: 8197) and CDC2L5 (EntrezGene: 8621). Because the species question has been essentially removed from the equation for the task described here, we are concerned with only the latter. It has been estimated that more than 5% of terms for a single organism are ambiguous and that approximately 85% of terms are ambiguous across species [29,30]. For the (single-species) GN task, we developed two approaches to gene name disambiguation. The first method attempts to identify 'definitions' of gene symbols, using the Schwartz and Hearst algorithm [8], which identifies abbreviations and their long forms in text. Our second approach, similar to that of Lesk [31], examines the five tokens that appear before and after the ambiguous gene. We then calculate the Dice coefficient between both the abbreviation definitions and flanking tokens and the full name of each gene candidate, as given in Entrez Gene. The gene with the highest nonzero Dice coefficient is returned. If the Dice coefficients are all zero, we return nothing.
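As a rough illustration of the flanking-token approach, the following Python sketch scores each candidate by the Dice coefficient between the context tokens and the tokens of the candidate's full name, and returns nothing when every score is zero. The tokenization, the tie-breaking (first candidate wins), and the candidate full names are assumptions made for the example and have not been checked against Entrez Gene.

```python
import re


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens (tokenization is an assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A n B| / (|A| + |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def flanking(words: list, i: int, window: int = 5) -> list:
    """Up to `window` words before and after position i, excluding words[i]."""
    return words[max(0, i - window):i] + words[i + 1:i + 1 + window]


def disambiguate(context_words: list, candidates: dict):
    """Return the candidate ID with the highest nonzero Dice score, else None."""
    context = tokens(" ".join(context_words))
    best_id, best_score = None, 0.0
    for gene_id, full_name in candidates.items():
        score = dice(context, tokens(full_name))
        if score > best_score:  # strict '>' keeps the first candidate on ties
            best_id, best_score = gene_id, score
    return best_id


# Illustrative full names only; not verified against Entrez Gene.
CANDIDATES = {
    "8197": "congenital hereditary endothelial dystrophy 1",
    "8621": "cell division cycle 2-like 5 (cholinesterase-related cell division controller)",
}

sentence = "mutations causing corneal endothelial dystrophy map to the CHED locus on chromosome 20".split()
chosen = disambiguate(flanking(sentence, sentence.index("CHED")), CANDIDATES)
```

The same `disambiguate` function could be fed the long form found by an abbreviation detector instead of the flanking words; only the source of the context tokens changes.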

Our results indicate that finding unabbreviated gene names or flanking words plays an important role in resolving ambiguous terms (Table 5). Moreover, this gene name disambiguation procedure can provide evidence for a term being a false gene mention. For example, STS (PMID: 11210186) is recognized as a gene mention, but its surrounding words, content mapping, and RH (Radiation Hybrid) analysis indicate that it is an experimental method. We assembled a list of words suggesting nonprotein terms, such as 'sequence' or 'analysis'. When they were matched to a gene's unabbreviated name or its flanking words, the gene was considered a false mention.
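The false-mention check can be sketched in Python as a set intersection between a cue-word list and the evidence gathered for a mention. Only the two cue words named in the text are used here; the full list is not given in this excerpt, and the function name and example inputs are hypothetical.

```python
import re

# Only these two cue words appear in the text; the complete list is not reproduced here.
NONPROTEIN_CUES = {"sequence", "analysis"}


def is_false_mention(long_form, flanking_words, cues=NONPROTEIN_CUES) -> bool:
    """True if any cue word occurs in the unabbreviated name or the flanking words."""
    evidence = " ".join(flanking_words)
    if long_form:
        evidence += " " + long_form
    return bool(set(re.findall(r"[a-z]+", evidence.lower())) & cues)


# Hypothetical inputs for a mention of 'STS'.
flagged_by_context = is_false_mention(None, ["content", "mapping", "and", "RH", "analysis"])
flagged_by_long_form = is_false_mention("sequence-tagged site", [])
kept = is_false_mention("steroid sulfatase", ["expression", "in", "placenta"])
```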

Even with the improvement yielded by this disambiguation procedure, gene name ambiguity remains a key contributor to system error. On the development data, our precision for mentions that only matched a single Entrez entry was 0.85, whereas for ambiguous entries it was only 0.63. (Recall is difficult to compute for the two cases, because we do not know how many mentions in the gold standard are ambiguous.)

Other techniques applied

To further enhance system performance, especially with regard to false-positive gene mention identification, we assembled stop word lists consisting of common English words, protein family terms, nonprotein molecules, and experimental methods. The common English stop word list included 5,000 words derived by word frequency in the Brown corpus [32]. The protein family terms were derived from an in-house manual annotation project, which annotated protein families. A list of small molecules (for example, Ca) was also added. Words found in these lists were never recognized as gene names, even if they appear in the gene dictionary.
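A minimal Python sketch of this stop-list filter follows. The four list categories come from the text, but apart from 'Ca' every entry below is a placeholder chosen for illustration; the actual lists (including the 5,000 Brown corpus words) are not reproduced here.

```python
# Category names follow the text; entries other than "ca" are illustrative placeholders.
STOP_LISTS = {
    "common_english": {"the", "was", "can"},
    "protein_family": {"kinase"},
    "nonprotein_molecule": {"ca"},
    "experimental_method": {"pcr"},
}


def filter_mentions(mentions, stop_lists=STOP_LISTS):
    """Drop any candidate mention found in a stop list, even if the gene dictionary contains it."""
    blocked = set().union(*stop_lists.values())
    return [m for m in mentions if m.lower() not in blocked]


surviving = filter_mentions(["Ca", "BRCA1", "was", "PCR"])
```

Applying the filter before dictionary lookup means a stop-listed string never reaches the mapping step, which matches the statement that such words were never recognized as gene names.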

Discussion

One goal of this work was to extend the OpenDMAP concept recognition system. We were able to do so, incorporating a number of third-party linguistic and semantic analysis tools without surrendering an essential characteristic of the DMAP paradigm: complete integration of semantic and linguistic knowledge, without segregating lexical and domain knowledge into separate components.

Our use of UIMA [4,5] as a framework for integrating the various software components used throughout our BioCreative II submissions was integral to the performances we were able to achieve. For each major component, a UIMA wrapper was created so that it could be plugged into the system. By using a standardized framework, we were not only able to distribute the tasks of development with the assurance that the pieces would work in concert once combined, but we were also able to design our systems in such a way that as they became successively more complicated, evaluation remained quick, easy, and modular. Not only was it possible to incorporate infrastructure constructed expressly for the BioCreative tasks, but it was just as easy to utilize external tools developed before the BioCreative tasks and/or by third-parties. This allowed us to benefit from LingPipe, Schwartz and Hearst's abbreviation-defining algorithm, ABNER, KeX, ABGene, and the GENIA POS tagger (op cit). Utilizing this framework provided not only a robust development architecture and production-ready execution environment, but also tremendous time savings.
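The wrapper idea can be illustrated with a plain-Python analogue. This is not the UIMA API (UIMA is a separate framework with its own interfaces); it is only a sketch of the pattern in which each tool is wrapped so that it reads from and writes to one shared document structure. All names below are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class Document:
    """Shared structure passed through every component."""
    text: str
    annotations: List[Tuple[str, int, int]] = field(default_factory=list)  # (type, start, end)


Component = Callable[[Document], None]


def wrap_tagger(label: str, find_spans: Callable[[str], Iterable[Tuple[int, int]]]) -> Component:
    """Adapt any span-finding tool to the shared component interface."""
    def component(doc: Document) -> None:
        for start, end in find_spans(doc.text):
            doc.annotations.append((label, start, end))
    return component


def run_pipeline(doc: Document, components: List[Component]) -> Document:
    """Run components in order; later ones can read earlier annotations."""
    for component in components:
        component(doc)
    return doc


def toy_gene_spans(text: str):
    """Toy stand-in for an external gene tagger: runs of capitals and digits."""
    return [m.span() for m in re.finditer(r"\b[A-Z][A-Z0-9]{2,}\b", text)]


result = run_pipeline(Document("BRCA1 binds RAD51."), [wrap_tagger("gene", toy_gene_spans)])
```

Swapping one tagger for another then only requires a different `find_spans` function, which is the kind of quick experimentation the modular design is credited with.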

The major goal of our work on this shared task, however, was to explore the integration of concept recognition in biomedical information extraction systems. The potential for information extraction is undeniable. As the breadth of knowledge in the biomedical literature continues to expand, it has become increasingly difficult for a single person to keep up with even a single specific research topic. Concept recognition techniques provide a potential remedy for this situation. As we discussed in the IAS section, the use of conceptual features could greatly benefit information retrieval as well as document classification systems. For the case of classifying protein interaction documents, defining a concept for 'experimental protein interaction detection methods' could potentially resolve some of the bias we encountered due to differences in the publication years between the training and test sets. It should be noted that there have been some contradictory reports on the benefits of using concepts, in particular in the domain of information retrieval. Results from the TREC Genomics ad hoc retrieval task in 2003 [52] pointed to the use of multiple concepts - MeSH headings, substance name fields in Medline, and species - as accounting for elevated performance. In contrast, results from TREC Genomics 2004 [53] indicated that "[retrieval] approaches that attempted to map to controlled vocabulary terms did not fare as well." For proponents of concept recognition, this may at first appear mildly disconcerting, but a closer examination of the TREC Genomics 2004 findings shows a number of factors that may be responsible for the poor performance. In particular, the systems classified as 'conceptual' simply were not very good at concept recognition, or made a poor choice of concepts by relying solely on a single conceptual type. In short, the role of concepts in these systems is somewhat overstated; thus, the conclusions regarding the influence of concept use should be tempered.

Integrating concept recognition into tasks other than information retrieval or document classification also has direct implications for the benchside biologist, among others. The merging of the many genomic databases by creating new links among their respective entities has the potential to uncover previously unknown information, or make known information more accessible to a wider population of scientists. The ultimate goal of extracting different relation types from text and generating links among concepts could be the potential for hypothesis generation and testing over the known 'facts' of biomedicine. This is certainly a lofty goal, but concept recognition is a key component to achieving automatic hypothesis generation and testing, and we, as a community, have taken the first steps down this path.

As detection of currently untapped conceptual types improves, so will the benefits of integrating conceptual recognition into current information gathering technologies. The BioCreative II tasks have provided a snapshot of the state of conceptual recognition in BioNLP, and all indications are that progress is being made. However, the potential of conceptually based systems will not be fully realized until concepts can be accurately, reliably, and unambiguously extracted from text.

References

1. Yeh A, Morgan A, Marc E. Colosimo, Lynette Hirschman: BioCreAtIvE task 1A: gene mention finding evaluation. BMC Bioinformatics 2005, 6(suppl 1):1471-2105.
2. Smith L, Tanabe LK, nee Ando RJ, Kuo CJ, Chung IF, Hsu CN, Lin YS, Klinger R, Friedrich CM, Ganchev K, Torii M, Liu H, Haddow B, Struble CA, Povinelli RJ, Vlachos A, Baumgartner WA Jr, Hunter L, Carpenter B, Tsai RTH, Dai HJ, Liu F, Chen Y, Sun C, Sophia Katrenko, Adriaans P, Blaschke C, Torres R, Neves M, Nakov P, Divoli A, Mana-Lopez M, Mata-Vazquez J, Wilbur WJ: Overview of BioCreative II gene mention recognition. Genome Biol 2008, 9(Suppl 2):S2.
3. Hunter L, Lu Z, Firby JR, Baumgartner WA Jr, Johnson HL, Ogren PV, Cohen KB: OpenDMAP: An open-source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression. BMC Bioinformatics 2008, 9:78.
4. Ferrucci D, Lally A: UIMA: an architectural approach to unstructured information processing in the corporate research environment. Nat Lang Eng 2004, 10:327-348.
5. Mack R, Mukherjea S, Soffer A, Uramoto N, Brown E, Coden A, Cooper J, Inokuchi A, Iyer B, Mass Y, Matsuzawa H, Subramaniam L: Text analytics for life science using the Unstructured Information Management Architecture. IBM Syst J 2004, 43:490-515.
6. (Carpenter, 2004) ⇒ Bob Carpenter. 2004. “Phrasal Queries with LingPipe and Lucene: Ad Hoc Genomics Text Retrieval.” In: Proceedings of the 13th Meeting of the Text Retrieval Conference (TREC).
7. Settles B: ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics 2005, 21:3191-3192.
8. Schwartz AS, Hearst MA: A simple algorithm for identifying abbreviation definitions in biomedical text. Pac Symp Biocomput 2003, 451-462.
9. Riesbeck C: From conceptual analyzer to direct memory access parsing: an overview. In Advances in Cognitive Sciences. Edited by: Sharkey N. Ellis Horwood Limited, Chichester, UK; 1986:236-258.
10. Martin CE: Direct memory access parsing. PhD thesis. Yale University; (1991).
11. Fitzgerald W: Building embedded conceptual parsers. PhD thesis. Northwestern University; (1994).
12. Hatzivassiloglou V, Duboue P, Rzhetsky A: Disambiguating proteins, genes, and RNA in text: a machine learning approach. Bioinformatics 2001, 17:97-106.
13. Baumgartner WA Jr, Lu Z, Johnson HL, Caporaso JG, Paquette J, Lindemann A, White EK, Medvedeva O, Cohen KB, Hunter L: An integrated approach to concept recognition in biomedical text. Proceedings of the Second BioCreative Challenge Evaluation Workshop; 23 to 25 April 2007; Madrid, Spain.
14. Kinoshita S, Cohen KB, Ogren PV, Hunter L: BioCreAtIvE task1A: entity identification with a stochastic tagger. BMC Bioinformatics 2005, 6(suppl 1):1471-2105.
15. Maglott D, Ostell J, Pruitt K, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2007, (35 Database):D26.
16. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25:25-9.
17. Lynette Hirschman, Marc E. Colosimo, A. Morgan, and A. Yeh. (2005). “Overview of BioCreAtIvE task 1B: normalized gene lists.” In: BMC Bioinformatics 2005, 6(suppl 1).
18. Blaschke C, Leon EA, Krallinger M, Valencia A: Evaluation of BioCreAtIvE assessment of task 2. BMC Bioinformatics 2005, 6(suppl 1):S16.
19. Morgan AA, Lu Z, Wang X, Cohen AM, Fluck J, Ruch P, Divoli A, Fundel K, Leaman R, Hakenberg J, Sun C, Liu H-h, Torres R, Krauthammer M, Lau WW, Liu H, Hsu C-N, Schuemie M, Cohen KB, Lynette Hirschman: Overview of BioCreative II gene normalization. Genome Biol 2008, 9(Suppl 2):S3.
20. Fukuda K, Tamura A, Tsunoda T, Takagi T: Toward information extraction: identifying protein names from biological papers. Pac Symp Biocomput 1998, 707-718.
21. Tanabe L, Wilbur WJ: Tagging gene and protein names in biomedical text. Bioinformatics 2002, 18:1124-1132.
22. Buyko E, Tomanek K, Hahn U: Resolution of coordination ellipses in complex biological named entity mentions using conditional random fields. [http://mandrake.csse.unimelb.edu.au/pacling2007/files/final/23/23_Paper_meta.pdf] Proceedings of the ISMB BioLINK Workshop (2007).
23. Lu Z: Text mining on GeneRIFs. PhD thesis. University of Colorado School of Medicine; (2007).
24. Gene FTP site
25. UniProt
26. Cohen AM: Unsupervised gene/protein named entity normalization using automatically extracted dictionaries. Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics (2005).
27. Cohen KB, Dolbey AE, Acquaah-Mensah GK, Hunter L: Contrast and variability in gene names. In: Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain; Philadelphia, PA. Morristown, NJ: Association for Computational Linguistics; 2002:14-20.
28. Fang H, Kevin Murphy, Jin Y, Kim JS, White PS: Human gene name normalization using text matching with automatically extracted synonym dictionaries. Proceedings of the BioNLP Workshop on Linking Natural Language Processing and Biology (2006).
29. Tuason O, Chen L, Liu H, Blake JA, Friedman C: Biological nomenclatures: a source of lexical knowledge and ambiguity. Pac Symp Biocomput 2004, 238-249.
30. Chen L, Liu H, Friedman C: Gene name ambiguity of eukaryotic nomenclatures. Bioinformatics 2005, 21:248-256.
31. Michael E. Lesk: Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. [http://zeno.ling.gu.se/kurshemsidor/komputationell-syntax-och-semantik/artiklar/Lesk-1986a.pdf] Proceedings of the 1986 SIGDOC Conference (1987).
32. Francis W, Kucera H: Brown Corpus Manual. Providence, Rhode Island: Brown University; 1964.
33. Ian H. Witten, E. Frank. (2005). “Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition.” Morgan Kaufmann.
34. Caporaso GJ, Baumgartner WA Jr, Cohen BK, Johnson HL, Paquette J, Hunter L: Concept recognition and the TREC genomics tasks. The Fourteenth Text REtrieval Conference (TREC 2005) Proceedings (2005).
35. Cohen A, Bhupatiraju R, Hersh W: Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. Proceedings of The Thirteenth Text REtrieval Conference (TREC 2004) (2004).
36. Caporaso J, Baumgartner W Jr, Kim H, Lu Z, Johnson H, Medvedeva O, Lindemann A, Fox L, White E, Cohen K, Hunter L: Concept recognition, information retrieval, and machine learning in genomics question-answering. Proceedings of The Fifteenth Text REtrieval Conference (TREC 2006) (2006).
37. Edmundson HP: New methods in automatic extracting. J Assoc Comput Machinery 1969, 16:264-285.
38. Lu Z, Cohen KB, Hunter L: Finding GeneRIFs via Gene Ontology annotations. Pac Symp Biocomput 2006, 52-63.
39. IntAct
40. MINT
41. Chatr-aryamontri A, Kerrien S, Khadake J, Orchard S, Ceol A, Licata L, Castagnoli L, Costa S, Derow C, Huntley R, Aranda B, Leroy C, Thorneycroft D, Apweiler R, Cesareni G, Hermjakob H: MINT and IntAct contribute to the Second BioCreative challenge: serving the text-mining community with high quality molecular interaction data. Genome Biology 2008, 9(Suppl 2):S5.
42. Krallinger M, Florian Leitner, Rodriguez-Penagos C, Valencia A: Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biology 2008, 9(Suppl 2):S4.
43. Yoshimasa Tsuruoka, Tateishi Y, Kim JD, Ohta T, McNaught J, Sophia Ananiadou, Tsuji J: Developing a robust part-of-speech tagger for biomedical text. Advances in informatics - 10th Panhellenic Conference on Informatics (2005).
44. OpenDMAP: Open source Direct Memory Access Parser
45. Noy NF, Sintek M, Decker S, Crubezy M, Fergerson RW, Musen MA: Creating semantic web contents with Protege-2000. IEEE Intelligent Systems 2001, 2:60-71.
46. Ravichandran D, Hovy E: Learning surface text patterns for a question answering system. [http://www.isi.edu/natural-language/projects/webclopedia/pubs/02ACL-patterns.pdf] Proceedings of the ACL Conference (2002).
47. BioNLP Corpora
48. Blaschke C, Andrade MA, Ouzounis C, Valencia A: Automatic extraction of biological information from scientific text: protein-protein interactions. [http://citeseer.ist.psu.edu/cache/papers/cs/12608/http:zSzzSzgredos.cnb.uam.eszSzmedline_interactionszSzCBlaschke99.pdf/blaschke99automatic.pdf] Intelligent Systems for Molecular Biology (1999).
49. Johnson HL, Baumgartner WA Jr, Krallinger M, Cohen KB, Hunter L: Corpus refactoring: a feasibility study. J Biomed Discovery Collab 2007, 2:4.
50. Plake C, Hakenberg J, Leser U: Optimizing syntax patterns for discovering protein-protein interactions. In SAC '05: Proceedings of the 2005 ACM symposium on Applied computing. New York, NY. ACM Press; 2005:195-201.
51. Prodisen
52. Hersh W, Bhupatiraju RT: TREC genomics track overview. Proceedings of The Twelfth Text REtrieval Conference (TREC 2003) (2003).
53. Hersh W, Bhupatiraju R, Ross L, Roberts P, Cohen A, Kraemer D: Enhancing access to the Bibliome: the TREC 2004 Genomics Track. J Biomed Discovery Collab 2006, 1:3.

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2008 ConceptRecogForExtractingProteinInterRels	William A. Baumgartner Jr Zhiyong Lu Helen L Johnson J Gregory Caporaso Jesse Paquette Anna Lindemann Elizabeth K White Olga Medvedeva K Bretonnel Cohen Lawrence Hunter			Concept Recognition for Extracting Protein Interaction Relations from Biomedical Text		Genome Biology supplement on The BioCreative II - Critical Assessment for Information Extraction in Biology Challenge	http://genomebiology.com/2008/9/S2/S9			2008