1999 ConstrBioKBsByIE


Subject Headings: Relation Detection from Text Algorithm, PPLRE Project, Distant-Supervision Learning Algorithm.


Cited By



Recently, there has been much effort in making databases for molecular biology more accessible and interoperable. However, information in text form, such as MEDLINE records, remains a greatly underutilized source of biological information. We have begun a research effort aimed at automatically mapping information from text sources into structured representations, such as knowledge bases. Our approach to this task is to use machine-learning methods to induce routines for extracting facts from text. We describe two learning methods that we have applied to this task (a statistical text classification method and a relational learning method), as well as our initial experiments in learning such information-extraction routines. We also present an approach to decreasing the cost of learning information-extraction routines by learning from "weakly" labeled training data.
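The abstract combines two ideas: a statistical text classifier that decides whether a sentence expresses a fact, and "weakly" labeled training data that avoids hand annotation. A minimal Python sketch of how these fit together follows, under stated assumptions: the statistical text classification method is assumed here to be a multinomial naive Bayes classifier over sentence words, and weak labels are assumed to come from matching sentences against a database of known facts (a sentence counts as positive when it mentions both members of a stored protein/location pair). `KNOWN_FACTS`, `weak_label`, `NaiveBayes`, and the example sentences are hypothetical illustrations, not the authors' code or data; the relational learning method is not sketched.

```python
import math
import re
from collections import Counter

# Toy, hypothetical "database" of known (protein, subcellular location) facts.
# It stands in for a curated structured source; entries are illustrative only.
KNOWN_FACTS = {
    ("ubc6", "endoplasmic reticulum"),
    ("tom70", "mitochondrial outer membrane"),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weak_label(sentence, facts):
    """Weak label: True if the sentence mentions both parts of some known fact.

    Co-occurrence is only a noisy proxy; the sentence may not actually
    assert the relation.
    """
    s = sentence.lower()
    return any(protein in s and location in s for protein, location in facts)


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words features, add-one smoothing."""

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        vocab = set()
        for doc, y in zip(docs, labels):
            tokens = tokenize(doc)
            self.counts[y].update(tokens)
            vocab.update(tokens)
        self.vocab_size = len(vocab)
        self.log_priors = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        """Unnormalized log P(c | doc) = log P(c) + sum over tokens of log P(t | c)."""
        score = self.log_priors[c]
        denom = self.totals[c] + self.vocab_size
        for t in tokenize(doc):
            score += math.log((self.counts[c][t] + 1) / denom)
        return score

    def predict(self, doc):
        """Return the class with the highest unnormalized log posterior."""
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# No sentence below is hand-labeled: the toy database supplies the labels.
sentences = [
    "Ubc6 is localized to the endoplasmic reticulum membrane.",
    "Tom70 resides in the mitochondrial outer membrane.",
    "Ubc6 expression was measured under heat shock.",
    "Cells were grown in rich medium at 30 degrees.",
]
labels = [weak_label(s, KNOWN_FACTS) for s in sentences]
model = NaiveBayes().fit(sentences, labels)

print(labels)  # [True, True, False, False]
print(model.predict("Sec61 is found in the endoplasmic reticulum membrane."))  # True
```

The point of the weak-labeling step is cost: the database, not a human annotator, provides the training labels. The price is label noise, since a sentence can mention a protein and a location without asserting that the protein is found there (the third sentence above would be mislabeled if it happened to name the organelle).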



1999 ConstrBioKBsByIE
Author: Mark Craven, Johan Kumlien
Title: Constructing Biological Knowledge-bases by Extracting Information from Text Sources
Journal: Proceedings of the International Conference on Intelligent Systems for Molecular Biology
URL: http://www.biostat.wisc.edu/~craven/papers/ismb99.pdf
Year: 1999