2003 ExtractingBioInteractionsWithALinkParser

From GM-RKB

Subject Headings: Information Extraction, Network Analysis

Notes

Cited By

Quotes

Abstract

Many natural language processing approaches at various complexity levels have been reported for extracting biochemical interactions from MEDLINE. While some algorithms using simple template matching are unable to deal with the complex syntactic structures, others exploiting sophisticated parsing techniques are hindered by greater computational cost. This study investigates link grammar parsing for extracting biochemical interactions. Link grammar parsing can handle many syntactic structures and is computationally relatively efficient. We experimented on a sample MEDLINE corpus. Although the parser was originally developed for conversational English and made many mistakes in parsing sentences from the biochemical domain, it nevertheless achieved better overall performance than a co-occurrence-only method. Customizing the parser for the biomedical domain is expected to improve its performance further.

2. Related Work

Coordination occurs in a sentence when it contains a shared structure. The sharing avoids duplication, so that the sentence is more compact than if sharing had not occurred. That is the main reason why this syntactic structure is so widely used in MEDLINE abstracts and elsewhere. Coordination can be applied to various sentence components.

For example,

  • Protein A activates proteins B and C.
  • Protein A activates protein B and protein C.
  • Protein A activates protein B, and inhibits protein C.

All of these examples use coordination to avoid saying, for example, “Protein A activates protein B. Protein A activates protein C.”
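The effect of coordination can be illustrated with a small sketch that expands a shared structure back into the separate statements it stands for. The function name `expand_coordination` and the tuple representation are assumptions made for this illustration. They are not part of the paper, and the sketch assumes the sentence has already been analysed into subjects, verbs, and objects.

```python
def expand_coordination(subjects, verbs_with_objects):
    """Expand a coordinated structure into (subject, verb, object) triples.

    `verbs_with_objects` is a list of (verb, [objects]) pairs that all
    share the same subjects. Hypothetical representation for illustration.
    """
    triples = []
    for subject in subjects:
        for verb, objects in verbs_with_objects:
            for obj in objects:
                triples.append((subject, verb, obj))
    return triples


# "Protein A activates proteins B and C." (shared subject and verb)
shared_verb = expand_coordination(
    ["protein A"], [("activates", ["protein B", "protein C"])]
)

# "Protein A activates protein B, and inhibits protein C." (shared subject)
shared_subject = expand_coordination(
    ["protein A"],
    [("activates", ["protein B"]), ("inhibits", ["protein C"])],
)

print(shared_verb)
print(shared_subject)
```

Each coordinated sentence yields two triples, which is the duplication that coordination avoids in the written text.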

3. Link grammar and the link grammar parser

Link grammar was first introduced by Sleator and Temperley to simplify English grammar with a context-free grammar [8]. The basic idea of link grammar is to connect pairs of words in a sentence with various links. Each word is viewed as a block with connectors coming out. There are various types of connectors, and connectors may point to the right or to the left. A valid sentence may have more than one complete linkage, just as a sentence may have several meanings.
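The idea of words as blocks with left- and right-pointing connectors can be sketched in a few lines of Python. This is a toy illustration under stated assumptions. The three-word lexicon, the `Connector` class, and the functions `can_link` and `find_links` are invented for this example. The sketch also ignores the other conditions a real linkage must satisfy, such as links not crossing, every word being connected, and each connector being used only once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    label: str      # link type, e.g. "S" (subject-verb) or "O" (verb-object)
    direction: str  # "+" points to the right, "-" points to the left


def can_link(left: Connector, right: Connector) -> bool:
    """A right-pointing connector on the left word meets a left-pointing
    connector of the same type on the right word."""
    return (
        left.label == right.label
        and left.direction == "+"
        and right.direction == "-"
    )


# Hypothetical toy lexicon; a real dictionary is far larger and richer.
LEXICON = {
    "proteinA": [Connector("S", "+")],
    "activates": [Connector("S", "-"), Connector("O", "+")],
    "proteinB": [Connector("O", "-")],
}


def find_links(words):
    """Return every (left word, link type, right word) whose connectors match."""
    links = []
    for i, left_word in enumerate(words):
        for right_word in words[i + 1:]:
            for lc in LEXICON[left_word]:
                for rc in LEXICON[right_word]:
                    if can_link(lc, rc):
                        links.append((left_word, lc.label, right_word))
    return links


links = find_links(["proteinA", "activates", "proteinB"])
print(links)
```

In this toy sentence the subject link and the object link together connect all three words, which is what a complete linkage requires.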

Grinberg et al. developed a robust parser to implement the link grammar. It has a dictionary of about 60,000 words, and can recognize a wide range of English syntactic phenomena: noun-verb agreement, questions, imperatives, complex and irregular verbs, many types of nouns, past- or present-participles in noun phrases, commas, a variety of adjective types, prepositions, adverbs, relative clauses, possessives, coordinating conjunctions, and others. The parser was tested on a corpus of English telephone conversations. Its robustness was demonstrated by its ability to handle many “ungrammatical” sentences and sentence fragments. If a complete linkage cannot be found, the parser will try to form a “partial linkage” by ignoring one or more of the words in the sentence. The parser has an internal timer. If the timer runs down before a complete or partial linkage has been found, the parser will output whatever it has found so far (termed a fragmented linkage).
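The fallback behaviour described above (complete linkage, then partial linkage with some words ignored, then a fragmented linkage when time runs out) can be sketched as a control loop. This is only an illustration of the strategy, not the way the Grinberg et al. parser is implemented. The function `parse_with_fallback`, its parameters, and the `try_linkage` callback, which returns a pair `(connected, links)`, are all hypothetical. Unlike the parser described above, the sketch also reports a fragmented linkage when its skip budget is exhausted, not only when the timer expires.

```python
import time
from itertools import combinations


def parse_with_fallback(words, try_linkage, time_limit_s=1.0, max_skipped=2):
    """Try a complete linkage, then partial linkages, within a time budget.

    `try_linkage(kept_words)` is a hypothetical callback returning
    (connected, links): whether all kept words were linked, and the links found.
    """
    deadline = time.monotonic() + time_limit_s
    best_fragment = []  # largest set of links seen so far
    for n_skipped in range(max_skipped + 1):
        for skipped in combinations(range(len(words)), n_skipped):
            if time.monotonic() >= deadline:
                # Timer ran down: output whatever has been found so far.
                return "fragmented", best_fragment
            kept = [w for i, w in enumerate(words) if i not in skipped]
            connected, links = try_linkage(kept)
            if connected:
                status = "complete" if n_skipped == 0 else "partial"
                return status, links
            if len(links) > len(best_fragment):
                best_fragment = links
    # Skip budget exhausted without a connected linkage (sketch-only case).
    return "fragmented", best_fragment


def toy_linker(kept):
    """Toy stand-in for a parser: links neighbours, fails on the token '???'."""
    links = list(zip(kept, kept[1:]))
    return "???" not in kept, links


complete = parse_with_fallback(["A", "activates", "B"], toy_linker)
partial = parse_with_fallback(["A", "???", "activates", "B"], toy_linker)
timed_out = parse_with_fallback(["A", "???", "B"], toy_linker, time_limit_s=0.0)
print(complete[0], partial[0], timed_out[0])
```

The toy linker stands in for a real parser only so that the three outcomes can be exercised.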

References

  • Blaschke, C., M. Andrade, C. Ouzounis, and A. Valencia (1999) Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions. AAAI Conference on Intelligent Systems in Molecular Biology 60-67.
  • Ding, J. (2003). PathBinder: A Sentence Repository of Biochemical Interactions Extracted from MEDLINE (MS thesis). Iowa State University, Ames, IA.
  • Ding, J., D. Berleant, D. Nettleton, and E. Wurtele (2002) Mining MEDLINE: Abstracts, Sentences, or Phrases? Pacific Symposium on Biocomputing 7: 326-337.
  • Grinberg, D., John D. Lafferty, and D. Sleator (1995) A Robust Parsing Algorithm for Link Grammars. Proceedings of the Fourth International Workshop on Parsing Technologies.
  • Leroy, G. and H. Chen (2002) Filling Preposition-based Templates to Capture Information from Medical Abstracts. Pacific Symposium on Biocomputing 7: 350-361.
  • Ng, S.-K. and M. Wong (1999) Toward Routine Automatic Pathway Discovery from On-Line Scientific Text Abstracts. Genome Informatics 10: 104-112.
  • Park, J.C., H.S. Kim, and J.J. Kim (2001) Bidirectional Incremental Parsing for Automatic Pathway Identification with Combinatory Categorial Grammar. Pacific Symposium on Biocomputing 6: 396-407.
  • Sleator, D. and D. Temperley (1993) Parsing English with a Link Grammar. Third International Workshop on Parsing Technologies.
  • Temperley, D., D. Sleator, and John D. Lafferty. Link Grammar Parser Online Demo. http://www.link.cs.cmu.edu/link/submit-sentence-4.html
  • Wong, L. (2001) PIES, a Protein Interaction Extraction System. Pacific Symposium on Biocomputing 6.
  • Yakushiji, A., Y. Tateisi, Y. Miyao, and Jun'ichi Tsujii (2001) Event Extraction from Biomedical Papers Using a Full Parser. Pacific Symposium on Biocomputing 6: 408-419.


2003 ExtractingBioInteractionsWithALinkParser
Author: Jun Xu, Jing Ding, Daniel Berleant, Andy W. Fulmer
Title: Extracting Biochemical Interactions from MEDLINE Using a Link Grammar Parser
URL: http://class.ee.iastate.edu/berleant/home/me/cv/papers/LGPmanuscript8-8-03a.pdf