2006 ExtrOfGenDisRelsFromMedlineUsingDomDictsAndML

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Abstract

We describe a system that extracts disease-gene relations from MedLine. We constructed a dictionary of disease and gene names from six public databases and extracted relation candidates by dictionary matching. Since dictionary matching produces a large number of false positives, we developed a machine learning-based named entity recognition (NER) method to filter out false recognitions of disease/gene names. We found that the performance of relation extraction depends heavily on the performance of NER filtering, and that the filtering improves the precision of relation extraction by 26.7% at the cost of a small reduction in recall.
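Below is a minimal sketch of the candidate-extraction step. It assumes greedy longest-match dictionary lookup and treats any gene and disease that co-occur in the same sentence as a candidate pair. The two dictionaries (`GENE_DICT`, `DISEASE_DICT`), their identifiers, and the naive regex tokenizer are hypothetical stand-ins. The paper compiles its dictionary from six public databases, and its matching details may differ.

```python
import re
from itertools import product

# Hypothetical toy dictionaries (the paper's dictionary comes from six public
# databases). Keys are lowercased surface forms; values are made-up entry IDs.
GENE_DICT = {"brca1": "GENE:BRCA1", "p53": "GENE:TP53", "tp53": "GENE:TP53"}
DISEASE_DICT = {"breast cancer": "DISEASE:breast_cancer", "cancer": "DISEASE:cancer"}


def tokenize(sentence):
    """Split a sentence into alphanumeric tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", sentence)


def match_terms(tokens, dictionary, max_len=4):
    """Greedy longest-match lookup of token n-grams in a dictionary.

    Returns non-overlapping (start, end, entry_id) spans over the token list.
    """
    spans = []
    i = 0
    while i < len(tokens):
        hit = None
        # Try the longest n-gram first so "breast cancer" wins over "cancer".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n]).lower()
            if key in dictionary:
                hit = (i, i + n, dictionary[key])
                break
        if hit:
            spans.append(hit)
            i = hit[1]  # continue after the matched span
        else:
            i += 1
    return spans


def relation_candidates(sentence):
    """Pair every gene match with every disease match in the same sentence."""
    tokens = tokenize(sentence)
    genes = match_terms(tokens, GENE_DICT)
    diseases = match_terms(tokens, DISEASE_DICT)
    return [(g[2], d[2]) for g, d in product(genes, diseases)]


print(relation_candidates("Mutations in BRCA1 are associated with breast cancer."))
# → [('GENE:BRCA1', 'DISEASE:breast_cancer')]
```

Because every gene-disease pair in a sentence becomes a candidate, each false dictionary match produces false relation candidates as well. That is why the abstract ties relation-extraction precision so closely to NER filtering.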

Conclusion and Future Work

The aim of this research was to build a system that automatically extracts useful information from publicly available biomedical data sources, with a focus on relations between diseases and genes. We found that filtering dictionary matches with maximum entropy (ME)-based named entity recognition (NER) significantly improves the precision of relation extraction at the cost of a small reduction in recall.
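The filtering step can be illustrated with a binary maximum entropy classifier, which is equivalent to logistic regression over indicator features. This is a hedged sketch, not the authors' model. The feature set (`features`), the labelled examples in `TRAIN`, the training schedule, and the 0.5 threshold in the hypothetical `keep` helper are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Hypothetical labelled dictionary matches: ((mention, previous word, next word), label).
# Label 1 = true gene name; 0 = false match such as a common English word.
TRAIN = [
    (("BRCA1", "of", "mutations"), 1),
    (("TP53", "the", "gene"), 1),
    (("p53", "mutant", "protein"), 1),
    (("MLH1", "in", "expression"), 1),
    (("can", "cells", "be"), 0),
    (("was", "it", "found"), 0),
    (("large", "a", "number"), 0),
    (("for", "used", "the"), 0),
]


def features(mention, prev_word, next_word):
    """Hypothetical indicator features for one dictionary match and its context."""
    feats = ["bias", "word=" + mention.lower(),
             "prev=" + prev_word.lower(), "next=" + next_word.lower()]
    if any(ch.isdigit() for ch in mention):
        feats.append("shape=has_digit")
    if mention.isupper():
        feats.append("shape=all_caps")
    if mention.islower():
        feats.append("shape=lower")
    return feats


def prob(weights, feats):
    """P(true entity | features) under a binary log-linear (maxent) model."""
    z = sum(weights[f] for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, lr=0.1, l2=0.01):
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    weights = defaultdict(float)
    data = [(features(*x), y) for x, y in examples]
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, y in data:
            err = y - prob(weights, feats)  # observed minus expected feature count
            for f in feats:
                grad[f] += err
        for f in set(grad) | set(weights):
            weights[f] += lr * (grad[f] - l2 * weights[f])
    return weights


def keep(weights, mention, prev_word, next_word, threshold=0.5):
    """Filter step: keep a dictionary match only if the classifier accepts it."""
    return prob(weights, features(mention, prev_word, next_word)) >= threshold


weights = train(TRAIN)
print(keep(weights, "BRCA2", "of", "mutations"))  # held-out gene-like match → True
print(keep(weights, "was", "this", "observed"))   # held-out common word → False
```

Plain batch gradient ascent keeps the sketch dependency-free. A practical system would use a dedicated optimizer (for example L-BFGS) and a much richer feature set.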

We conducted experiments to measure the performance of our relation extraction system and to show how it depends on the performance of the NER step. Co-occurrences could safely be regarded as containing correct relations whenever the candidate disease and gene names were judged to be correct.
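Relation-level evaluation reduces to comparing sets of (gene, disease) pairs. The sketch below uses hypothetical toy sets to show how a filter that drops many false pairs, along with one true pair, raises precision while lowering recall. The abstract does not state whether its 26.7% precision gain is absolute or relative, so the hypothetical `change` helper reports both.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted (gene, disease) pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


def change(before, after):
    """Report a metric change in absolute terms and relative to the baseline."""
    absolute = after - before
    relative = absolute / before if before else float("inf")
    return absolute, relative


# Hypothetical toy data, not results from the paper.
GOLD = {("BRCA1", "breast cancer"), ("TP53", "cancer"),
        ("MLH1", "colorectal cancer"), ("APC", "colorectal cancer")}
# Pairs triggered by false gene-name matches (hypothetical).
FALSE_HITS = {("CAN", "cancer"), ("LARGE", "cancer"),
              ("WAS", "breast cancer"), ("SET", "cancer")}

UNFILTERED = GOLD | FALSE_HITS
# The filter removes three false pairs but also one true pair.
FILTERED = (GOLD - {("APC", "colorectal cancer")}) | {("SET", "cancer")}

p0, r0 = precision_recall(UNFILTERED, GOLD)  # (0.5, 1.0)
p1, r1 = precision_recall(FILTERED, GOLD)    # (0.75, 0.75)
print(change(p0, p1))  # (0.25, 0.5): +25 points absolute, +50% relative
```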

In this work, we did not address the problem of polysemous terms, which make it difficult to link such terms to database entries. One solution would be to incorporate ambiguity-resolution techniques into our system. For example, S. Gaudan et al. (2005) proposed using SVMs for abbreviation resolution and achieved 98.9% precision and 98.2% recall.

The number of co-occurrences in the training and test sets was rather small for evaluating our system. Future work should include enlarging the annotated corpus and enriching its annotation.

References

  • D.R. Swanson, Fish oil, Raynaud’s syndrome, and undiscovered public knowledge, Perspect. Biol. Med., 30(1), pp.7–18 (1986).
  • D. Hristovski, B. Peterlin, J.A. Mitchell, and S.M. Humphrey, Improving literature based discovery support by genetic knowledge integration, Stud. Health Technol. Inform., 95, pp.68–73 (2003).
  • C. Perez-Iratxeta, P. Bork, and M.A. Andrade, Association of genes to genetically inherited diseases using data mining, Nat. Genet., 31(3), pp.316–319 (2002).
  • D. Proux et al., A pragmatic information extraction strategy for gathering data on genetic interactions, ISMB, 8, pp.279–285 (2000).
  • J. Pustejovsky et al., Medstract: Creating large-scale information servers for biomedical libraries, Proceedings of the Workshop on Natural Language Processing in the Biomedical Domain, pp.85–92 (2002).
  • Y. Tsuruoka and J. Tsujii, Boosting precision and recall of dictionary-based protein name recognition, Proceedings of the ACL-03 Workshop on Natural Language Processing in Biomedicine, pp.41–48 (2003).
  • Enju v1.0: http://www-tsujii.is.s.u-tokyo.ac.jp/enju/index.html (2004).
  • T. Ninomiya, Y. Tsuruoka, Y. Miyao, and J. Tsujii, Efficacy of beam thresholding, unification filtering and hybrid parsing in probabilistic HPSG parsing, Proceedings of the 9th International Workshop on Parsing Technologies (2005).
  • GENIA Part-of-Speech Tagger v0.3: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/postagger/ (2004).
  • GENIA Corpus 3.0p: http://www-tsujii.is.s.u-tokyo.ac.jp/genia/topics/Corpus/3.0/GENIA3.0p.intro.html (2003).
  • S. Gaudan et al., Resolving abbreviations to their senses in Medline, Bioinformatics, 21(18), pp.3658–3664 (2005).


2006 ExtrOfGenDisRelsFromMedlineUsingDomDictsAndML
  • Author: Jun'ichi Tsujii, Yoshimasa Tsuruoka, Jin-Dong Kim, Hong-woo Chun, Rie Shiba, Naoki Nagata, Teruyoshi Hishiki
  • Title: Extraction of Gene-Disease Relations from Medline Using Domain Dictionaries and Machine Learning
  • Title URL: http://helix-web.stanford.edu/psb06/chun.pdf