2007 OntologyDesignForBiomedicalTextMining

From GM-RKB

Subject Headings: Text Mining, Ontology Structure Learning Algorithm, Ontology Population Task, Named Entity Mention Annotation Task

Notes

  • References few, if any, of the Relation Detection from Text Algorithms from the NLP and Data Mining communities.
  • Does not provide details of the algorithms. E.g., Section "6.5 Relation Detection" does not describe the 'rule' used to connect organisms and proteins. Must they be neighbors? If so, this would result in low Recall.
  • Does not report how well the approach performs (no Performance Metrics).

Cited By

Quotes

Author Keywords

Text Mining; NLP; Ontology Design; Ontology Population; Ontological NLP

Abstract

Text Mining in biology and biomedicine requires a large amount of domain-specific knowledge. Publicly accessible resources hold much of the information needed, yet their practical integration into natural language processing (NLP) systems is fraught with manifold hurdles, especially the problem of semantic disconnectedness throughout the various resources and components. Ontologies can provide the necessary framework for a consistent semantic integration, while additionally delivering formal reasoning capabilities to NLP.

In this chapter, we address four important aspects relating to the integration of ontology and NLP: (i) An analysis of the different integration alternatives and their respective advantages; (ii) The design requirements for an ontology supporting NLP tasks; (iii) Creation and initialization of an ontology using publicly available tools and databases; and (iv) The connection of common NLP tasks with an ontology, including technical aspects of ontology deployment in a text mining framework. A concrete application example — text mining of enzyme mutations — is provided to motivate and illustrate these points.

4.1.1 Named Entity Recognition

Finding Named Entities (NEs) is one of the most basic tasks in text mining. In biological texts, typical examples for NEs are Proteins, Organisms, or Chemicals. Named entity recognition, often also called semantic tagging, is a well-understood NLP task. Basic approaches to finding named entities include rule-based techniques using finite-state transducers [17, 42] and statistical taggers, e.g., using Support Vector Machines (SVMs) [32] or Hidden Markov Models (HMMs) [33].
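The rule-based end of this spectrum can be illustrated with a minimal Java sketch (Java is also the language of the Jena example later in this chapter) of a dictionary, or gazetteer, lookup tagger with longest-match semantics. It is a simplified stand-in, not the finite-state transducers of [17, 42] and not the Mutation Miner implementation; the class GazetteerTagger, the whitespace tokenization, and the maxLen parameter are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical gazetteer tagger: a simplified stand-in for rule-based NE recognition.
class GazetteerTagger {
    // A recognized entity: the matched surface text and its semantic type.
    record Entity(String text, String type) {}

    // Scans whitespace-separated tokens left to right; at each position it tries
    // the longest token sequence (up to maxLen tokens) found in the gazetteer.
    static List<Entity> tag(String sentence, Map<String, String> gazetteer, int maxLen) {
        String[] tokens = sentence.trim().split("\\s+");
        List<Entity> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            for (int len = Math.min(maxLen, tokens.length - i); len > 0; len--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                String type = gazetteer.get(candidate);
                if (type != null) {
                    result.add(new Entity(candidate, type));
                    matched = len;
                    break;
                }
            }
            i += Math.max(matched, 1); // skip past the match, or advance one token
        }
        return result;
    }
}
```

Preferring the longest match is what keeps "Bacillus subtilis" a single Organism annotation; a real tagger would also need proper tokenization and case-insensitive matching, and would still face the term-variation problems described next.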

Scientific publications and other knowledge resources containing natural language text in the biomedical domain show certain characteristics that make term recognition unusually difficult [37]. There is a high degree of term variation, partly caused by the lack of a common naming scheme for the above-mentioned entities, like proteins or organisms. Often, identical names are used for a gene and the protein it encodes, further complicating the automatic identification of genes and proteins. Moreover, abbreviations are used abundantly in the field; expanding them into their full form is easy for expert human readers, but difficult for text mining systems.

While NE recognition is a well analysed task for the domain of newspaper and newswire articles, biomedical text mining requires further processing of detected entities, especially normalization and grounding.

4.1.2 Entity Normalization

Entities in natural language texts that occur in multiple places are often written differently: person names, for example, might omit (or abbreviate) the first name, and include or omit titles and middle initials. Similarly, in biological documents, entities are often abbreviated in subsequent mentions; e.g., the same organism can be referred to by two different textual descriptors, Trichoderma reesei and T. reesei. Likewise, the same protein mutation can be encoded using single-letter or three-letter amino acid codes. It is important for downstream processing components that these entities are normalized to a single descriptor, e.g., the non-abbreviated form. For a thorough discussion of abbreviations in the biomedical domain, we refer the reader to [13].
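One simple way to normalize abbreviated organism names within a single document is to resolve each abbreviated binomial against the full names already seen earlier in the text. The following Java sketch assumes exactly that strategy; the class AbbreviationExpander and its methods are hypothetical, and the chapter itself relies on the ontology for this step (see Section 6.3).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: expands an abbreviated binomial such as "T. reesei"
// to a full form ("Trichoderma reesei") seen earlier in the same document.
class AbbreviationExpander {
    private final List<String> fullNamesSeen = new ArrayList<>();

    // Record a full "Genus species" mention encountered while reading the text.
    void addFullName(String fullName) {
        fullNamesSeen.add(fullName);
    }

    // Returns the unique earlier full name whose genus starts with the abbreviated
    // initial and whose species matches; empty if none or several match.
    Optional<String> expand(String abbreviated) {
        String[] parts = abbreviated.trim().split("\\s+");
        if (parts.length != 2 || !parts[0].endsWith(".")) return Optional.empty();
        String initial = parts[0].substring(0, parts[0].length() - 1);
        String species = parts[1];
        List<String> matches = new ArrayList<>();
        for (String full : fullNamesSeen) {
            String[] f = full.split("\\s+");
            if (f.length == 2 && f[0].startsWith(initial) && f[1].equals(species)
                    && !matches.contains(full)) {
                matches.add(full);
            }
        }
        return matches.size() == 1 ? Optional.of(matches.get(0)) : Optional.empty();
    }
}
```

If no earlier full name or more than one candidate matches, the sketch returns an empty result and leaves the decision to later steps such as coreference resolution.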

4.1.3 Coreference Resolution

A task related to normalization is coreference resolution. In addition to abbreviations, other variations in names often exist. Within a biological text, for example, the same protein might be referred to as Xylanase II and endo-1,4-β-Xylanase II. In addition, pronominal references like it or this can also refer to a particular entity [12]. Consider the following sentence:

Interestingly, the Brønsted constants for the hydrolysis of aryl β-glucosides by Abg, a β-glucosidase from Agrobacterium faecalis, and its catalytic nucleophile mutant, E358D, [. . .] are also identical, as also are βlg values for wild-type and E78D Bacillus subtilis xylanase (Lawson et al., 1996). (from: A.M. MacLeod, D. Tull, K. Rupitz, R.A.J. Warren, and S.G. Withers: “Mechanistic Consequences of Mutation of Active Site Carboxylates in a Retaining beta-1,4-Glycanase from Cellulomonas fimi,” Biochemistry 1996, 35(40), PMID 8855954.)

In the part “hydrolysis of aryl β-glucosides by Abg, a β-glucosidase from Agrobacterium faecalis, and its catalytic nucleophile mutant, E358D,” the pronoun its refers to the β-glucosidase protein Abg; however, this is not obvious to an NLP system.

Finding all the different descriptors referring to the same entity (both nominal and pronominal) is the task of coreference resolution. The resulting set of mentions is collected in a coreference chain. Note that even after successful resolution, a normalized name still needs to be picked from the coreference chain.
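A coreference chain can be represented as a simple list of mentions. The Java sketch below adds one possible rule for picking the normalized name: take the longest non-pronominal mention. That rule, the small pronoun list, and the class CorefChain are assumptions for illustration; the chapter does not specify how the name is chosen.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical coreference chain: all mentions (nominal and pronominal)
// that refer to one entity, plus a heuristic for the normalized name.
class CorefChain {
    // Assumed, deliberately small list of pronouns to skip.
    private static final Set<String> PRONOUNS = Set.of("it", "its", "this", "they", "their");
    private final List<String> mentions = new ArrayList<>();

    void add(String mention) {
        mentions.add(mention);
    }

    List<String> mentions() {
        return List.copyOf(mentions);
    }

    // Assumed heuristic: the longest non-pronominal mention becomes the normalized name.
    Optional<String> normalizedName() {
        return mentions.stream()
                .filter(m -> !PRONOUNS.contains(m.toLowerCase()))
                .max(Comparator.comparingInt(String::length));
    }
}
```

For the chain Xylanase II, endo-1,4-beta-Xylanase II, it, this rule picks endo-1,4-beta-Xylanase II; a system backed by an ontology could instead prefer whichever mention matches a canonical name stored there.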

4.1.4 Grounding

As a final step in NE detection, many entities need to be grounded with respect to an external resource, like a database. This is especially important for biological entities, most of which have corresponding entries in various databases, e.g., Swiss-Prot for proteins. When further information is needed for downstream analysis tasks, like the automatic processing of amino acid sequences, grounding the textual entity to a unique database entry (e.g., assigning a Swiss-Prot ID to a protein entity) is a mandatory prerequisite. Even if an entity is correctly detected from an NLP perspective, it might still be ambiguous with respect to such an external resource (or not exist in it at all), which makes it useless for further automated processing until it has been grounded.
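The three possible outcomes described above (grounded, ambiguous, not found) can be made explicit in code. In this hypothetical Java sketch, the class Grounder and its in-memory index from normalized names to database IDs are assumptions; in Mutation Miner the IDs are stored in the ontology instances (Section 6.3).

```java
import java.util.List;
import java.util.Map;

// Hypothetical grounding step: maps a normalized entity name to database IDs
// and reports whether the entity is usable for further automated processing.
class Grounder {
    enum Status { GROUNDED, AMBIGUOUS, NOT_FOUND }

    record Result(Status status, List<String> ids) {}

    // Assumed index: normalized name -> all database IDs carrying that name.
    private final Map<String, List<String>> index;

    Grounder(Map<String, List<String>> index) {
        this.index = index;
    }

    Result ground(String normalizedName) {
        List<String> ids = index.getOrDefault(normalizedName, List.of());
        if (ids.isEmpty()) return new Result(Status.NOT_FOUND, ids);  // not in the resource
        if (ids.size() > 1) return new Result(Status.AMBIGUOUS, ids); // several candidates
        return new Result(Status.GROUNDED, ids);                      // exactly one entry
    }
}
```

Only a GROUNDED result, with exactly one ID, would be passed on to components such as sequence retrieval.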

4.1.5 Relation Detection

Finding entities alone is not sufficient for a text mining system: most of the important information is contained within the relations between entities. For example, the Mutation Miner system described above needs to determine which organism produces a particular protein (protein <-> organism relation) and which protein is modified by a mutation (mutation <-> protein relation).

Relation detection can be very complex. Typical approaches employ either predefined patterns or templates, which can be expressed as grammar rules, or a deep syntactic analysis using a full or partial parser to extract predicate-argument structures [34]. The performance of a relation detection component can be improved by supplying information about which relations are semantically possible, thereby restricting the space of candidate combinations.
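The effect of such semantic restrictions can be sketched as a filter over candidate entity pairs. In this hypothetical Java sketch, every ordered pair of entities in a sentence is a candidate, and only pairs whose types match a relation declared as possible are kept; the string encoding of the allowed relations is an assumption mirroring the two relation types named above (protein/organism and protein/mutation).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: pair up entities found in one sentence and keep only
// the pairs whose types form a relation declared as semantically possible.
class RelationCandidates {
    record Entity(String text, String type) {}
    record Pair(Entity first, Entity second) {}

    // Assumed encoding of the allowed relations as "DomainType->RangeType".
    static final Set<String> ALLOWED = Set.of("Organism->Protein", "Protein->Mutation");

    static List<Pair> candidates(List<Entity> entities) {
        List<Pair> result = new ArrayList<>();
        for (Entity a : entities) {
            for (Entity b : entities) {
                // Skip self-pairs; keep only type combinations the model allows.
                if (a != b && ALLOWED.contains(a.type() + "->" + b.type())) {
                    result.add(new Pair(a, b));
                }
            }
        }
        return result;
    }
}
```

For a sentence containing one organism, one protein, and one mutation, six ordered pairs reduce to two candidates; a pattern- or parser-based component would then only have to verify those.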

5.3.1 The Swiss-Prot Protein Database

The UniProt Knowledge Base [3] is a set of two protein databases, Swiss-Prot and TrEMBL. Both hold entries about proteins appearing in published works, including information about protein functions, their domain structure, associated organisms, post-translational modifications, and variants, among others. Swiss-Prot, which consisted of 228,670 entries as of 2006-07-02, contains “manually-annotated records with information extracted from literature and curator-evaluated computational analysis,” while TrEMBL is populated by automatic analysis tools. In the Mutation Miner system, we use the manually curated Swiss-Prot database to gain reliable grounding (see Section 4.2) of proteins found in biological documents (Req. #4).

Figure 13-6 shows the Swiss-Prot entry for a variant of the xylanase 2 protein. The fields most important for NLP analysis are the various “Synonyms,” as any of them can appear in a given biomedical document (Req. #3); the canonical name (“Protein name”), which can depend on the host organism; and a unique ID (“Primary accession number”) that allows unambiguous linking to the protein’s entry.

A further essential feature of Swiss-Prot is that its entries are linked to other databases, notably to the NCBI Taxonomy database described in the previous section. This can be seen in the “From” line where the ID of the host organism (“TaxID”) is recorded. Thus, proteins found in documents can easily be linked to their hosting organisms (Req. #5).

The Swiss-Prot data can be downloaded from the Swiss-Prot website in XML, FASTA [38], and plain text format. To add the Swiss-Prot data to the same SQL database, we adapted our tool for writing NCBI data by exchanging its parser component; this enables queries spanning the two datasets, joined on the NCBI ID recorded in both. The database entry corresponding to Figure 13-6 contains the fields ID for the unique identifier, DE for the possible names, GN for the corresponding gene’s name, and OX for the identifier linking to the Taxonomy database:

ID   XYN2_TRIRE STANDARD; PRT; 222 AA.
DE   Endo-1,4-beta-xylanase 2 precursor (EC 3.2.1.8) (Xylanase 2) (1,4-
DE   beta-D-xylan xylanohydrolase 2).
GN   Name=xyn2;
OS   Trichoderma reesei (Hypocrea jecorina).
OX   NCBI_TaxID=51453;
RX   MEDLINE=93103679; PubMed=1369024;
[...]
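Reading the fields named above from such a record can be sketched as follows. This hypothetical Java parser (not the authors' tool) assumes two-letter line codes followed by the data, joins DE continuation lines (without a space after a trailing hyphen), and extracts the TaxID from the NCBI_TaxID= token; that TaxID is the value that would serve as the join key with the NCBI data in the SQL database.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the flat-file lines discussed above:
// ID (entry name), DE (names, possibly continued), GN (gene), OX (taxonomy link).
class SwissProtRecord {
    String entryName;
    String description = "";
    String geneLine;
    int ncbiTaxId = -1; // -1 means no OX line was found

    private static final Pattern TAX_ID = Pattern.compile("NCBI_TaxID=(\\d+)");

    static SwissProtRecord parse(String flatFile) {
        SwissProtRecord r = new SwissProtRecord();
        for (String line : flatFile.split("\\R")) {
            if (line.length() < 2) continue;
            String code = line.substring(0, 2);
            String data = line.substring(2).trim();
            switch (code) {
                case "ID" -> r.entryName = data.split("\\s+")[0];
                case "DE" -> {
                    // Assumed rule: a line ending in "-" continues the word on the next line.
                    if (r.description.isEmpty() || r.description.endsWith("-")) {
                        r.description += data;
                    } else {
                        r.description += " " + data;
                    }
                }
                case "GN" -> r.geneLine = data;
                case "OX" -> {
                    Matcher m = TAX_ID.matcher(data);
                    if (m.find()) r.ncbiTaxId = Integer.parseInt(m.group(1));
                }
                default -> { } // other line types are ignored in this sketch
            }
        }
        return r;
    }
}
```

Applied to the listing above, it yields the entry name XYN2_TRIRE, the full DE text, and the TaxID 51453.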

The protein data is then encoded in the ontology, similar to the information concerning organisms. The ontology thus has all the information required for detecting protein named entities, assigning normalized names, and grounding them to Swiss-Prot IDs (note that protein analysis requires some additional processing, including abbreviation detection [13]; these steps, however, are beyond the scope of this chapter).

Of particular interest are the relations between proteins and organisms inferred from the NCBI TaxID value, which are also transferred into our ontology according to Req. #5 (note the organismProteinRel relation in Figure 13-4). We can now create relation instances, again using Jena (cf. Figure 13-5):

ObjectProperty organismProteinRel = m.getObjectProperty( mmNS + "organismProteinRel" );
for( Iterator protIt = proteinClass.listInstances(); protIt.hasNext(); ) {
    OntResource prot = (OntResource) protIt.next();
    // [. . .] Find the ncbiId stored in the protein's record.
    // Query for the organism with this id (rdfLiteralQuery is a helper not shown here)
    Resource org = (Resource) rdfLiteralQuery( ox, ncbiId, organismClass, m );
    prot.addProperty( organismProteinRel, org );
}

How we exploit the relation information from the ontology for the NLP analysis of entity relations is covered in Section 6.5.

Further potentially interesting information in Swiss-Prot records could also be transferred to the ontology, for instance the MEDLINE and PubMed IDs of the publications containing primary information about the protein (shown in the RX line of the listing), as well as the protein sequence (see Figure 13-9) needed for further automatic processing of text mining results.

6.3 Normalization and Grounding

Normalization needs to decide on a canonical name for each entity, such as a protein or an organism. Since the ontology encodes information about, e.g., scientific names for organisms, a corresponding normalized entry can often be uniquely determined with a simple lookup. In the case of abbreviations, however, finding the canonical name usually involves an additional disambiguation step.

For example, if we encounter E. coli in a text, it is first recognised as an organism from the pattern “species preceded by abbreviation.” The NLP component can now query the ontology for a genus instance with a name matching E* and a species named coli, and filter the results for valid genus/species combinations denoting an existing organism. Ideally, this would yield the single combination of genus Escherichia and species coli, forming the correct organism name. However, the above query in fact returns four entries. Two can be discarded because NCBI classifies their names as misspellings of Escherichia coli, as shown by the identical tax ID (cf. Table 13-2). Yet the two remaining combinations, Escherichia coli and Entamoeba coli, are both classified as “scientific name.” A disambiguation step now has to determine which one is the correct normalized form for E. coli: this is the task of coreference resolution covered in Section 6.4 below.
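The lookup-and-filter step can be sketched in Java over a flat list of taxonomy name records. The record type TaxName, its fields, and the exact-string name class "scientific name" are assumptions for illustration, not the structure of the ontology or of Table 13-2.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the abbreviation lookup: find genus/species name
// records matching e.g. "E. coli", drop non-scientific names (such as
// misspellings), and return the distinct scientific names that remain.
class AbbreviationLookup {
    record TaxName(String genus, String species, String nameClass) {}

    static Set<String> scientificCandidates(String abbreviation, List<TaxName> names) {
        String[] parts = abbreviation.trim().split("\\s+"); // e.g. ["E.", "coli"]
        if (parts.length != 2) return Set.of();
        String initial = parts[0].replace(".", "");         // "E"
        String species = parts[1];                          // "coli"
        Set<String> result = new LinkedHashSet<>();         // keeps input order, no duplicates
        for (TaxName n : names) {
            if (n.genus().startsWith(initial) && n.species().equals(species)
                    && n.nameClass().equals("scientific name")) {
                result.add(n.genus() + " " + n.species());
            }
        }
        return result;
    }
}
```

Given records for Escherichia coli, two misspelled variants of it, and Entamoeba coli, the method returns both scientific names, so the ambiguity described above remains to be resolved.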

Once the normalized name (and thus the represented ontology instance) has been determined, in the case of organisms and proteins the corresponding database ID can be trivially retrieved from the instance, where it was stored as an OWL datatype property as described in Section 5.1. Since the database record can now be unambiguously looked up, the entity is grounded with respect to an external source. For our examples, these IDs are P36217 for the xylanase variant shown in Figure 13-6, and 562 for E. coli, whose database entries are shown in Figure 13-2.

The end result of this step is a semantic annotation of the named entities as they appear in a text, which includes the detected information from normalization and grounding, as shown in Figure 13-8.

Mutation Normalization and Grounding.

Mutation normalization and grounding exhibit some interesting additional properties. As mentioned in Section 5.4, protein mutations are first normalized from their textual description to a single-letter format, which can easily be achieved using the amino acid information stored in the ontology.
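Normalizing a three-letter mutation mention such as Glu78Asp to the single-letter form E78D can be sketched as follows. In Mutation Miner the amino acid information comes from the ontology; this hypothetical Java class hard-codes the 20 standard amino acids instead and handles only the compact forms shown.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical normalizer: rewrites "Glu78Asp" to "E78D". The amino acid
// table is hard-coded here (assumption); Mutation Miner reads it from the ontology.
class MutationNormalizer {
    private static final Map<String, Character> ONE_LETTER = Map.ofEntries(
        Map.entry("Ala", 'A'), Map.entry("Arg", 'R'), Map.entry("Asn", 'N'), Map.entry("Asp", 'D'),
        Map.entry("Cys", 'C'), Map.entry("Gln", 'Q'), Map.entry("Glu", 'E'), Map.entry("Gly", 'G'),
        Map.entry("His", 'H'), Map.entry("Ile", 'I'), Map.entry("Leu", 'L'), Map.entry("Lys", 'K'),
        Map.entry("Met", 'M'), Map.entry("Phe", 'F'), Map.entry("Pro", 'P'), Map.entry("Ser", 'S'),
        Map.entry("Thr", 'T'), Map.entry("Trp", 'W'), Map.entry("Tyr", 'Y'), Map.entry("Val", 'V'));

    private static final Pattern THREE = Pattern.compile("([A-Z][a-z]{2})(\\d+)([A-Z][a-z]{2})");
    private static final Pattern ONE =
        Pattern.compile("[ACDEFGHIKLMNPQRSTVWY]\\d+[ACDEFGHIKLMNPQRSTVWY]");

    static Optional<String> normalize(String mention) {
        if (ONE.matcher(mention).matches()) return Optional.of(mention); // already single-letter
        Matcher m = THREE.matcher(mention);
        if (!m.matches()) return Optional.empty();
        Character wt = ONE_LETTER.get(m.group(1));
        Character mut = ONE_LETTER.get(m.group(3));
        if (wt == null || mut == null) return Optional.empty();           // not an amino acid code
        return Optional.of(wt + m.group(2) + mut);                         // e.g. "E" + "78" + "D"
    }
}
```

Mentions that are already in single-letter form, such as E358D, are returned unchanged; anything else yields an empty result.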

More involved is the grounding of a mutation with respect to its protein sequence. Using the already grounded protein information, an amino acid sequence is retrieved from Entrez using eFetch (see Figure 13-9). Mutated residues can then be located on the retrieved sequences, and only those mutation/sequence combinations that bear the declared wild-type residues at the specified coordinates, with the correct offset between multiple mutations, are eligible for subsequent processing. Single point mutations must match the amino acid at the designated coordinate exactly. Mutations detected in a text that cannot be grounded to their designated protein are discarded [53].
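One reading of these constraints can be sketched in Java as follows. Positions are 1-based; a single point mutation must match its wild-type residue at exactly the stated position, while a set of mutations from one text may be shifted by a common offset as long as all wild-type residues match with their relative spacing intact. Searching outward from offset 0 and accepting the smallest shift is an assumption of this sketch; the chapter states the constraints but not the search procedure, and MutationGrounder is a hypothetical name.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical mutation grounding against a retrieved protein sequence.
class MutationGrounder {
    record Mutation(char wildType, int position, char mutant) {}

    // Returns the offset at which all wild-type residues match, or empty if none does.
    static OptionalInt findOffset(String sequence, List<Mutation> mutations) {
        if (mutations.isEmpty()) return OptionalInt.empty();
        if (mutations.size() == 1) {
            // Single point mutation: exact coordinate only.
            return matchesAll(sequence, mutations, 0) ? OptionalInt.of(0) : OptionalInt.empty();
        }
        // Multiple mutations: try offsets 0, +1, -1, +2, -2, ... (assumed search order).
        for (int d = 0; d <= sequence.length(); d++) {
            if (matchesAll(sequence, mutations, d)) return OptionalInt.of(d);
            if (d > 0 && matchesAll(sequence, mutations, -d)) return OptionalInt.of(-d);
        }
        return OptionalInt.empty();
    }

    private static boolean matchesAll(String sequence, List<Mutation> mutations, int offset) {
        for (Mutation m : mutations) {
            int index = m.position() - 1 + offset; // 1-based position to 0-based index
            if (index < 0 || index >= sequence.length() || sequence.charAt(index) != m.wildType()) {
                return false;
            }
        }
        return true;
    }
}
```

For a multiple mutant whose numbering is shifted relative to the retrieved sequence (for example, by a different numbering convention), the returned offset records the shift; an empty result means the mutations cannot be grounded and would be discarded.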

6.5 Relation Detection

Relation detection, for example between organisms and proteins, requires more involved NLP analysis, like full or partial parsing for predicate-argument extraction [23, 31, 51].

A common problem in relation extraction is the high degree of ambiguity, especially when using full parsers [55]. Employing an ontology that encodes semantically valid relations (Req. #5) makes it possible to restrict the detected relation candidates to the semantically valid ones, which ideally yields a unique relation and otherwise boosts precision [30].

We give an example for detecting and disambiguating protein-organism relations, which is illustrated in Figure 13-11. Information from Swiss-Prot, including protein synonyms and taxonomic origin, is encoded in our ontology as detailed in Section 5.3. We can use this information to resolve ambiguous entities in a relation by discarding possible combinations that are not supported by the ontology, as each protein in Swiss-Prot is linked to its hosting organism via the latter’s NCBI Taxonomy ID.

In the given example sentence, the phrase “Bacillus subtilis xylanase” refers to a protein of the Xylanase family. This can be determined automatically by the named entity detection (see Section 6.2), which semantically annotates “xylanase” as Protein and “Bacillus subtilis” as Organism. But it is not yet clear which protein precisely is meant. As can be seen in Figure 13-6, canonical protein names can change according to the organism that produces them: Xylanase 2 from Trichoderma reesei has the normalized name Endo-1,4-beta-xylanase 2 [Precursor] and the Swiss-Prot ID P36217. Querying the ontology for proteins with “xylanase” in their name yields 72 different proteins. However, in this example, Bacillus subtilis, which was tagged as an organism by the NE component, can be unambiguously grounded, because it is a name occurring in the NCBI Taxonomy database, with the ID 1423 (see Figure 13-8).

So, the ontology query can be refined by including the organism’s NCBI ID, which Swiss-Prot uses to record the organism producing a protein. The resulting query for a protein named “xylanase” that is linked to the NCBI entry 1423 yields exactly one result, the correct protein “Endo-1,4-beta-xylanase A precursor (EC 3.2.1.8) (Xylanase A) (1,4-beta-D-xylan xylanohydrolase A).”
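The refinement can be sketched as a two-condition filter over protein records. In this hypothetical Java sketch, each record holds a protein name and the NCBI TaxID taken from its Swiss-Prot OX line; passing null as the TaxID gives the unrefined keyword query.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch of the refined query: proteins whose name contains a
// keyword AND whose record links to the given NCBI TaxID (null = no restriction).
class ProteinOrganismQuery {
    record Protein(String name, int ncbiTaxId) {}

    static List<Protein> find(List<Protein> proteins, String keyword, Integer taxId) {
        String kw = keyword.toLowerCase(Locale.ROOT);
        return proteins.stream()
                .filter(p -> p.name().toLowerCase(Locale.ROOT).contains(kw))
                .filter(p -> taxId == null || p.ncbiTaxId() == taxId)
                .collect(Collectors.toList());
    }
}
```

With records for the Trichoderma reesei xylanase 2 (TaxID 51453) and the Bacillus subtilis xylanase A (TaxID 1423), the keyword query alone returns both, while adding TaxID 1423 leaves only the xylanase A precursor.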

References

  • Ananiadou S. and McNaught J., editors. Text Mining for Biology and Biomedicine. Artech House, 2006.
  • Baader F., Calvanese D., McGuinness D.L., Nardi D., and Patel-Schneider P.F., editors. The Description Logic Handbook: Theory, Implementation and Application. Cambridge University Press, 2002.
  • Bairoch A., Apweiler R., Wu C.H., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Natale D.A., O’Donovan C., Redaschi N., and Yeh L.S.L. The Universal Protein Resource (UniProt). Nucleic Acids Research, (2005).
  • Baker C.J.O., Shaban-Nejad A., Su X., Haarslev V., and Butler G. Semantic Web Infrastructure for Fungal Enzyme Biotechnologists. Journal of Web Semantics, vol. 4(3), 2006. Special issue on Semantic Web for the Life Sciences.
  • Baker C.J.O., Su X., Butler G., and Haarslev V. Ontoligent Interactive Query Tool. In M.T. Koné and D. Lemire, editors, Canadian Semantic Web Series, vol. 2 of Semantic Web and Beyond. Springer, 2006.
  • Baker C.J.O. and Witte R. Mutation Mining — A Prospector’s Tale. Information Systems Frontiers (ISF), vol. 8(1):47–57, February 2006.
  • Baker C.J.O., Witte R., Shaban-Nejad A., Butler G., and Haarslev V. The Fungal-Web Ontology: Application Scenarios. In Eighth Annual Bio-Ontologies Meeting, pages 1–2. Detroit, Michigan, USA, June 24 2005.
  • Bodenreider O. Lexical, Terminological, and Ontological Resources for Biological Text Mining. In Ananiadou and McNaught [1], chapter 3.
  • Bontcheva K., Tablan V., Maynard D., and Cunningham H. Evolving GATE to Meet New Challenges in Language Engineering. Natural Language Engineering, 2004.
  • Buitelaar P., Cimiano P., and Magnini B., editors. Ontology Learning from Text: Methods, Evaluation and Applications, vol. 123 of Frontiers in Artificial Intelligence and Applications. IOS Press, 2005.
  • Camon E.B., Barrell D.G., Dimmer E.C., Lee V., Magrane M., Maslen J., Binns D., and Apweiler R. An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics, vol. 6(Suppl 1), 2005.
  • Castaño J., Zhang J., and Pustejovsky J. Anaphora Resolution in Biomedical Literature. In International Symposium on Reference Resolution. (2002).
  • Chang J. and Schütze H. Abbreviations in Biomedical Text. In Ananiadou and McNaught [1], chapter 5.
  • Cohen A.M. and Hersh W.R. A survey of current work in biomedical text mining. Briefings in Bioinformatics, vol. 6:57–71, 2005.
  • Couto F.M., Silva M.J., and Coutinho P. ProFAL: PROtein Functional Annotation through Literature. In VII Conference on Software Engineering and Databases (JISBD), pages 747–756. (2003).
  • Cunningham H., Maynard D., Bontcheva K., and Tablan V. GATE: A framework and graphical development environment for robust NLP tools and applications. In: Proceedings of the 40th Anniversary Meeting of the ACL. (2002). http://gate.ac.uk. D R A F T Page 31 August 30, 2006, 10:41am D R A F T 32 Revolutionizing Knowledge Discovery in the Life Sciences
  • Cunningham H., Maynard D., and Tablan V. JAPE: a Java Annotation Patterns Engine (Second Edition). Technical report, University of Sheffield, Department of Computer Science, 2000.
  • Doms A. and Schroeder M. GoPubMed: exploring PubMed with the Gene Ontology. Nucleic Acids Research, vol. 33(suppl 2), 2005. doi:10.1093/nar/gki470
  • Federhen S. The Taxonomy Project. In J. McEntyre and J. Ostell, editors, The NCBI Handbook, chapter 4. National Library of Medicine (US), National Center for Biotechnology Information, 2003.
  • Gabdoulline R.R., Hoffmann R., Leitner F., and Wade R.C. ProSAT: functional annotation of protein 3D structures. Bioinformatics, vol. 19(13):1723–1725, 2003.
  • Gasperin C. Semi-supervised anaphora resolution in biomedical texts. In: Proceedings of the HLT-NAACL Workshop on Linking Natural Language Processing and Biology (BioNLP). New York City, NY, USA, 2006.
  • Haarslev V. and Möller R. RACER System Description. In: Proceedings of International Joint Conference on Automated Reasoning (IJCAR), pages 701–705. Springer-Verlag Berlin, Siena, Italy, June 18–23 2001.
  • Hahn U. and Wermter J. Levels of Natural Language Processing for Text Mining. In Ananiadou and McNaught [1], chapter 2.
  • Hirschman L. and Blaschke C. Evaluation of Text Mining in Biology. In Ananiadou and McNaught [1], chapter 9.
  • Hirschman L., Yeh A., Blaschke C., and Valencia A. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics, vol. 6(Suppl 1), 2005.
  • Horn F., Lau A.L., and Cohen F.E. Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors. Bioinformatics, vol. 20(4):557–568, 2004.
  • Kawabata T., Ota M., and Nishikawa K. The protein mutant database. Nucleic Acids Research, vol. 27(1), 1999.
  • Kim J.J. and Park J.C. BioAR: Anaphora Resolution for Relating Protein Names to Proteome Database Entries. In Sanda M. Harabagiu and D. Farwell, editors, ACL 2004: Workshop on Reference Resolution and its Applications, pages 79–86. Association for Computational Linguistics, Barcelona, Spain, 2004.
  • Kiryakov A., Popov B., Terziev I., Manov D., and Ognyanoff D. Semantic Annotation, Indexing, and Retrieval. Journal of Web Semantics, vol. 2(1), 2005.
  • Leroy G. and Chen H. Genescene: An Ontology-enhanced Integration of Linguistic and Co-occurrence based Relations in Biomedical Texts. Journal of the American Society for Information Systems and Technology (JASIST), vol. 56(5):457–468, March 2005.
  • Leroy G., Chen H., and Martinez J.D. A shallow parser based on closed-class words to capture relations in biomedical text. J. of Biomedical Informatics, vol. 36:145–158, 2003.
  • Li Y., Bontcheva K., and Cunningham H. Using Uneven Margins SVM and Perceptron for Information Extraction. In: Proceedings of Ninth Conference on Computational Natural Language Learning (CoNLL). (2005).
  • Manning C.D. and Schütze H. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
  • McNaught J. and Black W.J. Information Extraction. In Ananiadou and McNaught [1], chapter 7.
  • Müller H.M., Kenny E.E., and Sternberg P.W. Textpresso: An Ontology-based Information Retrieval and Extraction System for Biological Literature. PLoS Biology, vol. 2(11):1984–1998, November 2004.
  • Niles I. and Pease A. Towards a Standard Upper Ontology. In C. Welty and B. Smith, editors, Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS). Ogunquit, Maine, 2001.
  • Park J.C. and Kim J.J. Named Entity Recognition. In Ananiadou and McNaught [1], chapter 6.
  • Pearson W.R. and Lipman D.J. Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the USA, vol. 85(8):2444–2448, April 1988.
  • Popov B., Kiryakov A., Ognyanoff D., Manov D., Kirilov A., and Goranov M. Towards Semantic Web Information Extraction. In Human Language Technologies Workshop at the 2nd International Semantic Web Conference (ISWC). Sanibel Island, Florida, USA, October 20 2003.
  • Rebholz-Schuhmann D., Kirsch H., and Couto F. Facts from Text — Is Text Mining Ready to Deliver? PLoS Biology, vol. 3:188–191, 2005.
  • Rebholz-Schuhmann D., Marcel S., Albert S., Tolle R., Casari G., and Kirsch H. Automatic extraction of mutations from Medline and cross-validation with OMIM. Nucleic Acids Research, vol. 32(1):135–142, 2004.
  • Roche E. and Schabes Y., editors. Finite-State Language Processing. MIT Press, 1997.
  • Schuman J. and Bergler S. Postnominal prepositional attachment in proteomics. In: Proceedings of the HLT-NAACL Workshop on Linking Natural Language Processing and Biology (BioNLP). New York City, NY, USA, 2006.
  • Shaban-Nejad A., Baker C.J.O., Haarslev V., and Butler G. The FungalWeb Ontology: Semantic Web Challenges in Bioinformatics and Genomics. In Springer LNCS 3729, pages 1063–1066. (2005).
  • Smith M.K., Welty C., and McGuinness D.L., editors. OWL Web Ontology Language Guide. World Wide Web Consortium, (2004). http://www.w3.org/TR/owl-guide/.
  • Spasic I., Ananiadou S., McNaught J., and Kumar A. Text mining and ontologies in biomedicine: making sense of raw text. Briefings in Bioinformatics, vol. 6, 2005.
  • Staab S. and Studer R., editors. Handbook on Ontologies. Springer, 2004.
  • Stoica E. and Hearst M. Predicting Gene Functions from Text Using a Cross-Species Approach. In Pacific Symposium on Biocomputing (PSB), pages 88–99. (2006).
  • Tsujii J. and Ananiadou S. Thesaurus or logical ontology, which one do we need for text mining? Language Resources and Evaluation, vol. 39(1):77–90, 2005.
  • Vlachos A., Gasperin C., Lewin I., and Briscoe T. Bootstrapping the Recognition and Anaphoric Linking of Named Entities in Drosophila Articles. In Pacific Symposium on Biocomputing, pages 100–111. (2006).
  • Wattarujeekrit T., Shah P.K., and Collier N. PASBio: predicate-argument structures for event extraction in molecular biology. BioMed Central Bioinformatics, vol. 5(155), 2004.
  • Wessel M. and Möller R. High Performance Semantic Web Query Answering Engine. In International Workshop on Description Logics (DL). Edinburgh, Scotland, UK, 2005.
  • Witte R. and Baker C.J.O. Combining Biological Databases and Text Mining to support New Bioinformatics Applications. In 10th International Conference on Applications of Natural Language to Information Systems (NLDB), vol. 3513 of LNCS, pages 310–321. Springer, Alicante, Spain, June 15–17 2005.
  • Wood M.M., Lydon S.J., Tablan V., Maynard D., and Cunningham H. Populating a Database from Parallel Texts Using Ontology-based Information Extraction. In 9th International Conference on Applications of Natural Language to Information Systems (NLDB), vol. 3136 of LNCS. Springer, 2004.
  • Yakushiji A., Tateisi Y., Miyao Y., and Tsujii J. Event extraction from biomedical papers using a full parser. In: Proceedings of the 6th Pacific Symposium on BioComputing (PSB), pages 408–419. Hawaii, USA, January 2001.


2007 OntologyDesignForBiomedicalTextMining
Author: René Witte, Thomas Kappler, Christopher J. O. Baker
Title: Ontology Design for Biomedical Text Mining
Title URL: http://www.rene-witte.net/system/files/ontology design preprint.pdf
DOI: 10.1007/978-0-387-48438-9
Year: 2007