2007 FrontiersInBiomedicalTextMining

Subject Headings: Biomedicine, Text Mining, Information Extraction, Text Summarization, Image Mining, Question Answering, Literature-based Discovery, Evaluation, User Orientation

Notes

Cited By

~49 http://scholar.google.com/scholar?cites=13638850713200521247

2008

Quotes

Abstract

It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or ‘BioNLP’ in general, focusing primarily on papers published within the past year.


References

  • Aronson AR, Bodenreider O, Demner-Fushman D, et al. From indexing the biomedical literature to coding clinical text: experience with MTI and machine learning approaches. In: Biological, Translational, and Clinical Language Processing (2007) Prague, Czech Republic: Association for Computational Linguistics. 105–12.
  • Hunter L, Cohen KB. Biomedical language processing: what's beyond PubMed? Mol Cell (2006) 21:589–94. [CrossRef][ISI][Medline]
  • Ng SK. Integrating text mining with data mining. In: Text Mining for Biology and Biomedicine, Sophia Ananiadou, McNaught J, eds. (2006) Norwood, Massachusetts: Artech House Publishers.
  • Baumgartner WA Jr, Cohen KB, Fox L, et al. Manual curation is not sufficient for annotation of genomic databases. Bioinformatics (ISMB proceedings issue) (2007) 23(13):i41–i48.
  • (Cohen & Hersh, 2005) ⇒ Aaron Michael Cohen, and William R. Hersh. (2005). “A Survey of Current Work in Biomedical Text Mining.” In: Briefings in Bioinformatics 2005 6(1). doi:10.1093/bib/6.1.57
  • Spasic I, Sophia Ananiadou, McNaught J, et al. Text mining and ontologies in biomedicine: making sense of raw text. Brief Bioinform (2005) 6(3):239–51. [Abstract/Free Full Text]
  • Sophia Ananiadou, Kell DB, Jun'ichi Tsujii. Text mining and its potential applications in systems biology. Trends Biotechnol (2006) 24(12):571–579. [CrossRef][ISI][Medline]
  • de Bruijn B, Martin J. Getting to the (c)ore of knowledge: mining biomedical literature. Int J Med Inform (2002) 67:7–18. [CrossRef][ISI][Medline]
  • Cohen KB, Hunter L. Natural language processing and systems biology. In: Artificial Intelligence Methods and Tools for Systems Biology, Dubitzky W, Azuaje F, eds. (2004). Heidelberg: Springer. 147–74.
  • Weeber M, Kors JA, Mons B. Online tools to support literature-based discovery in the life sciences. Brief Bioinform (2005) 6(3):277–86. [Abstract/Free Full Text]
  • Shatkay H. Hairpins in bookstacks: information retrieval from biomedical text. Brief Bioinform (2005) 6(3):222–38. [Medline]
  • Krallinger M, Valencia A. Text-mining and information-retrieval services for molecular biology. Genome Biol (2005) 6:224. [CrossRef][Medline]
  • Jensen LJ, Saric J, Bork P. Literature mining for the biologist: from information retrieval to biological discovery. Nat Rev Genet (2006) 7:119–29. [CrossRef][ISI][Medline]
  • Sophia Ananiadou, McNaught J. Text Mining for Biology and Biomedicine (2006) Norwood, Massachusetts: Artech House Publishers.
  • Shatkay H, Craven M. Biomedical Text Mining (2007) Cambridge, Massachusetts: MIT Press.
  • Jackson P, Moulinier I. Natural Language Processing for Online Applications: Text Retrieval, Extraction, and Categorization (2002) Amsterdam: John Benjamins Publishing Company.
  • Hearst MA. What is text mining? http://www.ischool.berkeley.edu/~hearst/text-mining.html, (October 2003 date last accessed).
  • Fukuda K, Tamura A, Tsunoda T, et al. Toward information extraction: identifying protein names from biological papers. In: Pac Symp Biocomput, Maui, Hawaii, 1998:707–18.
  • McDonald R, Pereira F. Identifying gene and protein mentions in text using conditional random fields. BMC Bioinform (2005) 6(Suppl 1):S6.
  • Jin Y, McDonald R, Lerman K, et al. Automated recognition of malignancy mentions in biomedical literature. BMC Bioinform (2006) 7:492. [CrossRef][Medline]
  • Yeh A, Morgan A, Marc E. Colosimo, et al. BioCreAtIvE task 1A: gene mention finding evaluation. BMC Bioinform (2005) 6(Suppl 1). *.
  • Olsson F, Eriksson G, Franzén K, et al. Notions of correctness when evaluating protein name taggers. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan (2002). 765–71.
  • Dingare S, Nissim M, Finkel J, et al. A system for identifying named entities in biomedical text: how results from two evaluations reflect on both the system and the evaluations: Conference papers. Comp Funct Genomics (2005) 6(1–2):77–85. [CrossRef]
  • Sandler T, Schein A, Ungar L. Automatic term list generation for entity tagging. Bioinformatics (2006) 22(6):651–7. [Abstract/Free Full Text]
  • Tanabe L, Thom L, Matten W, et al. SemCat: semantically categorized entities for genomics. In: AMIA Annu Symp Proc (2006) Washington, DC. 754–8.
  • Tanabe L, Wilbur W. A priority model for named entities. In: BioNLP (2006).
  • Maguitman AG, Rechtsteiner A, Verspoor K, et al. Large-scale testing of bibliome informatics using Pfam protein families. In: Pac Symp Biocomput 11 (2006) Maui, Hawaii. 76–87. http://psb.stanford.edu/psb-online/proceedings/psb06/maguitman.pdf.
  • Bunescu R, Mooney R, Ramani A, et al. Integrating co-occurrence statistics with information extraction for robust retrieval of protein interactions from MEDLINE. In: BioNLP (2006).
  • Curran JR, Moens M. Scaling context space. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA. ACL, 2002:231–8.
  • Fundel K, Küffner R, Zimmer R. RelEx — relation extraction using dependency parse trees. In: Bioinformatics (2007) 23:365–71. http://bioinformatics.oxfordjournals.org/cgi/content/full/23/3/365. *. [Abstract/Free Full Text]
  • Hanisch D, Fundel K, Mevissen HT, et al. ProMiner: rule-based protein and gene entity recognition. BMC Bioinform (2005) 6(Suppl 1):S14.
  • Nédellec C. Learning language in logic — genic interaction extraction challenge. In: Proceedings of the ICML05 workshop: Learning Language in Logic (LLL05) (2005) Bonn, Germany.
  • Rinaldi F, Schneider G, Kaljurand K, et al. Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach. Artif Intell Med (2007) 39(2):127–36. [CrossRef][ISI][Medline]
  • Rinaldi F, Schneider G, Kaljurand K, et al. An environment for relation mining over richly annotated corpora: the case of GENIA. BMC Bioinform (2006) 7(Suppl 3):S3. http://www.biomedcentral.com/content/pdf/1471-2105-7-S3-S3.pdf.
  • Miyao Y, Ohta T, Masuda K, et al. Semantic retrieval for the accurate identification of relational concepts in massive textbases. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL 2006), Sydney, Australia. 2006:1017–24.
  • Ninomiya T, Yoshimasa Tsuruoka, Miyao Y, et al. Fast and scalable HPSG parsing. TAL 2005 (2007) 46(2).
  • Ohta T, Miyao Y, Ninomiya T, et al. An intelligent search engine and GUI-based efficient MEDLINE search tool based on deep syntactic parsing. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL 2006), Interactive Presentation Sessions, Sydney, Australia. 2006:17–20.
  • Tsai TH, Chou WC, Lin YC, et al. BIOSMILE: Adapting semantic role labeling for biomedical verbs: an exponential model coupled with automatically generated template features. In: BioNLP (2006).
  • Kim JD, Ohta T, Tateisi Y, et al. Genia corpus — a semantically annotated corpus for bio-textmining. Bioinformatics (2003) 19(Suppl 1):180–2. [CrossRef]
  • Masseroli M, Kilicoglu H, Lang FM, et al. Argument-predicate distance as a filter for enhancing precision in extracting predications on the genetic etiology of disease. BMC Bioinform (2006) 7:291. http://www.biomedcentral.com/content/pdf/1471-2105-7-291.pdf.
  • Rodriguez-Esteban R, Iossifov I, Rzhetsky A. Imitating manual curation of text-mined facts in biomedicine. PLoS Comput Biol (2006) 2(9).
  • Lee LC, Horn F, Cohen FE. Automatic extraction of protein point mutations using a graph bigram association. PLoS Comput Biol (2007) 3(2).
  • Chang DTH, Weng YZ, Lin JH, et al. Protemot: prediction of protein binding sites with automatically extracted geometrical templates. In: Nucleic Acids Res (2006) 34:W303–9. http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1538868&blobtype=pdf. [Abstract/Free Full Text]
  • Chun HW, Yoshimasa Tsuruoka, Kim JD, et al. Extraction of gene-disease relations from Medline using domain dictionaries and machine learning. In: Pac Symp Biocomput 11 (2006) Maui, Hawaii. 4–15. http://psb.stanford.edu/psb-online/proceedings/psb06/chun.pdf.
  • Lussier Y, Borlawsky T, Rappaport D, et al. PhenoGO: assigning phenotypic context to Gene Ontology annotations with natural language processing. In: Pac Symp Biocomput 11 (2006) Maui, Hawaii. 64–75. http://psb.stanford.edu/psb-online/proceedings/psb06/lussier.pdf.
  • Ahlers CB, Fiszman M, Demner-Fushman D, et al. Extracting semantic predications from MEDLINE citations for pharmacogenomics. In: Pac Symp Biocomput 12 (2007) Maui, Hawaii. 209–20.
  • Baker CJO, Witte R. Mutation mining: a prospector's tale. J Inform Syst Front (2006) 8(1):47–57. [CrossRef]
  • Kim JJ, Zhang Z, Park JC, et al. BioContrasts: extracting and exploiting protein-protein contrastive relations from biomedical literature. In: Bioinformatics (2006) 22:597–605. http://bioinformatics.oxfordjournals.org/cgi/reprint/22/5/597.pdf. *. [Abstract/Free Full Text]
  • DUC 2006: task, documents, and measures. (September 2007, date last accessed). http://duc.nist.gov/duc2006/tasks.html.
  • Demner-Fushman D, Lin J. Answer extraction, semantic clustering and extractive summarization for clinical question answering. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL 2006), Sydney, Australia. 2006:841–8.
  • Ling X, Jiang J, He X, et al. Automatically generating gene summaries from biomedical literature. In: Pac Symp Biocomput 11 (2006) Maui, Hawaii. 40–51. http://psb.stanford.edu/psb-online/proceedings/psb06/ling.pdf. *.
  • Hersh W, Bhupatiraju RT. TREC Genomics track overview. In: The twelfth Text Retrieval Conference, TREC (2003). National Institute of Standards and Technology (2003) Gaithersburg, Maryland. 14–23.
  • Lu Z, Cohen KB, Hunter L. Finding GeneRIFs via Gene Ontology annotations. In: Pac Symp Biocomput 11 (2006) Maui, Hawaii. 52–63. http://psb.stanford.edu/psb-online/proceedings/psb06/lu.pdf.
  • Rebholz-Schuhmann D, Kirsch H, Arregui M, et al. EBIMed-text crunching to gather facts for proteins from Medline. Bioinformatics (2007) 23:e237–44. [Abstract/Free Full Text]
  • Chiang JH, Shin JW, Liu HH, et al. GeneLibrarian: an effective gene-information summarization and visualization system. BMC Bioinform (2006) 7:392.
  • Fernández J, Hoffmann R, Valencia A. iHOP Web services. Nucleic Acids Res (2007) 35(Web Server issue):W21–6.
  • Tanabe L, Scherf U, Smith LH, et al. Medminer: an Internet text-mining tool for biomedical information, with application to gene expression profiling. BioTechniques (1999) 27(6):1210–4, 1216–7. [ISI][Medline]
  • Chen H, Sharp BM. Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinform (2004) 5(147).
  • Névéol A, Shooshan SE, Humphrey SM, et al. Multiple approaches to fine-grained indexing of the biomedical literature. In: Pac Symp Biocomput 12 (2007) Maui, Hawaii. 292–303.
  • Stoica E, Hearst M. Predicting gene functions from text using a cross-species approach. In: Pac Symp Biocomput 11 (2006) Maui, Hawaii. 88–99. http://psb.stanford.edu/psb-online/proceedings/psb06/stoica.pdf.
  • Fyshe A, Szafron D. Term generalization and synonym resolution for biological abstracts: using the gene ontology as a source of expert knowledge. In: BioNLP (2006).
  • Höglund A, Blum T, Brady S, et al. Significantly improved prediction of subcellular localization by integrating text and protein sequence data. In: Pac Symp Biocomput 11 (2006) Maui, Hawaii. 16–27. http://psb.stanford.edu/psb-online/proceedings/psb06/hoglund.pdf.
  • Murphy RF, Velliste M, Yao J, et al. Searching online journals for fluorescence microscope images depicting protein subcellular location patterns. In: IEEE International Symposium on BioInformatics and Biomedical Engineering, Rockville, Maryland, USA. 2001:119–28.
  • Kou Z, Cohen W, Murphy R. Extracting information from text and images for location proteomics. In: ACM SIGKDD Workshop on Data Mining in Bioinformatics (BIOKDD) (2003) Washington, DC. 2–9.
  • Shatkay H, Chen N, Blostein D. Integrating image data into biomedical text categorization. Bioinformatics (2006) 22:e446–53. [Abstract/Free Full Text]
  • Yu H, Lee M. Accessing bioscience images from abstract sentences. Bioinformatics (2006) 22:e547–56. *. [Abstract/Free Full Text]
  • Rafkind B, Lee M, Chang S, et al. Exploring text and image features to classify images in bioscience literature. In: BioNLP (2006) New York, USA. 73–80.
  • Kou Z, Cohen W, Murphy R. A stacked graphical model for associating sub-images with sub-captions. In: Pac Symp Biocomput 12 (2007) Maui, Hawaii. 257–68.
  • Rhodes J, Boyer S, Kreulen J, et al. Mining patents using molecular similarity search. In: Pac Symp Biocomput 12 (2007) Maui, Hawaii. 304–15.
  • Claudinot S, Nicolas M, Oshima H, et al. Long-term renewal of hair follicles from clonogenic multipotent stem cells. PNAS (2005) 102(41):14677–82. Image on cover page. [Abstract/Free Full Text]
  • Tan SL, Nakao H, He Y, et al. NS5A, a nonstructural protein of hepatitis C virus, binds growth factor receptor-bound protein 2 adaptor protein in a Src homology 3 domain/ligand-dependent manner and perturbs mitogenic signaling. PNAS (1999) 96(10):5533–8. [Abstract/Free Full Text]
  • Schuler B, Lipman E, Steinbach P, et al. Polyproline and the "spectroscopic ruler" revisited with single-molecule fluorescence. PNAS (2005) 102(8):2754–9. [Abstract/Free Full Text]
  • Kihara D, Lu H, Kolinski A, et al. TOUCHSTONE: an ab initio protein structure prediction method that uses threading-based tertiary restraints. PNAS (2001) 98(18):10125–30. [Abstract/Free Full Text]
  • Hu H, Li M, Labrador J, et al. Cross GTPase-activating protein (CrossGAP)/Vilse links the Roundabout receptor to Rac to regulate midline repulsion. PNAS (2005) 102(12):4613–8. [Abstract/Free Full Text]
  • Yu H. Towards answering biological questions with experimental evidence: Automatically identifying text that summarize image content in full-text articles. In: AMIA Annu Symp Proc (2006) Washington, DC. 834–8.
  • Muller H, Deselaers T, Lehmann T, et al. Overview of the ImageCLEFmed 2006 medical retrieval and annotation tasks. In: CLEF 2006 Working Notes, Alicante, Spain. 2006.
  • Lacoste C, Chevallet JP, Lim J, et al. IPAL knowledge-based medical image retrieval in ImageCLEFmed 2006. In: CLEF 2006 Working Notes (2006) Alicante, Spain. *.
  • Voorhees EM, Tice DM. The TREC-8 question answering track evaluation. In: Voorhees EM, Harman D, eds. Proceedings of the Eighth Text REtrieval Conference (TREC-8) (2000) Gaithersburg, Maryland: NIST.
  • Mollá D, Vicedo J. Question answering in restricted domains: An overview. Comput Linguist (2007) 33(1):41–61. [CrossRef]
  • Hersh W, Cohen AM, Roberts P. TREC 2006 genomics track overview. In: The Fifteenth Text Retrieval Conference, TREC 2006 (2006) Gaithersburg, Maryland: NIST.
  • Zweigenbaum P. Question answering in biomedicine. In: Proc Workshop on Natural Language Processing for Question Answering, EACL 2003, de Rijke M, Webber B, eds. (2003) Budapest: ACL. 1–4.
  • Ely J, Osheroff J, Chambliss M, et al. Answering physicians’ clinical questions: obstacles and potential solutions. J Am Med Inform Assoc (2005) 12:217–24. [Abstract/Free Full Text]
  • Jacquemart P, Zweigenbaum P. Towards a medical question-answering system: a feasibility study. In: Stud Health Technol Inform (2003) Vol. 95. Amsterdam: IOS Press. 463–8. [Medline]
  • Huang X, Lin J, Demner-Fushman D. Evaluation of PICO as a knowledge representation for clinical questions. In: AMIA Annu Symp Proc (2006) Washington, DC. 359–63.
  • Demner-Fushman D, Lin J. Answering clinical questions with knowledge-based and statistical techniques. Comput Linguist (2007) 33:63–103. *. [CrossRef]
  • Lin J, Demner-Fushman D. The role of knowledge in conceptual retrieval: A study in the domain of clinical medicine. In: 29th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) (2006) Seattle, Washington. 469–76.
  • (Yu, Lee et al., 2007) ⇒ Yu H, Lee M, Kaufman D, et al. Development, implementation, and a cognitive evaluation of a definitional question answering system for physicians. J Biomed Inform (2007) 40(3):236–51. [CrossRef][ISI][Medline]
  • Zhou W, Yu C, Torvik V, et al. A concept-based framework for passage retrieval in Genomics. In: Proceedings of the Fifteenth Text REtrieval Conference, Gaithersburg. 2006.
  • Demner-Fushman D, Humphrey S, Ide N, et al. Finding relevant passages in scientific articles: fusion of automatic approaches vs. an interactive team effort. In: Proceedings of the Fifteenth Text REtrieval Conference, Gaithersburg. 2006. *.
  • Jiang J, He X, Zhai C. Robust pseudo feedback estimation and HMM passage extraction: UIUC at TREC 2006 Genomics Track. In: Proceedings of the Fifteenth Text REtrieval Conference, Gaithersburg. 2006.
  • Caporaso J, Baumgartner W, Kim H, et al. Concept recognition, information retrieval, and machine learning in genomics question-answering. In: Proceedings of the Fifteenth Text REtrieval Conference, Gaithersburg. 2006.
  • Divoli A, Hearst M, Nakov P, et al. BioText team report for the TREC 2006 Genomics Track. In: Proceedings of the Fifteenth Text REtrieval Conference, Gaithersburg. 2006.
  • Zheng H, Lin C, Huang L, et al. Using profile matching and text categorization for answer extraction in TREC Genomics. In: Proceedings of the Fifteenth Text REtrieval Conference, Gaithersburg. 2006.
  • Swanson DR. Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspect Biol Med (1986) 30:7–18. [ISI][Medline]
  • Yetisgen-Yildiz M, Pratt W. Using statistical and knowledge-based approaches for literature-based discovery. J Biomed Inform (2006) 39(6):600–11. *. [CrossRef][ISI][Medline]
  • Jelier R, Jenster G, Dorssers LC, et al. Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation. BMC Bioinform (2007) 8:14. [CrossRef][Medline]
  • Seki K, Mostafa J. Discovering implicit associations between genes and hereditary diseases. In: Pac Symp Biocomput 12 (2007) Maui, Hawaii. 316–27. *.
  • Pospisil P, Iyer LK, Adelstein SJ, et al. A combined approach to data mining of textual and structured data to identify cancer-related targets. BMC Bioinform (2006) 7(354).
  • Palakal M, Bright J, Sebastian T, et al. A comparative study of cells in inflammation, EAE and MS using biomedical literature data mining. J Biomed Sci (2007) 14(1):67–85. *. [CrossRef][ISI][Medline]
  • Rzhetsky A, Iossifov I, Loh JM, et al. Microparadigms: chains of collective reasoning in publications about molecular interactions. PNAS (2006) 103:4940–5. [Abstract/Free Full Text]
  • Hristovski D, Friedman C, Rindflesch T, et al. Exploiting semantic relations for literature-based discovery. In: AMIA Annu Symp Proc (2006) Washington, DC. 349–53.
  • Demaine J, Martin J, Wei L, et al. LitMiner: integration of library services within a bio-informatics application. In: Biomed Digit Libr (2006) 3(11). doi:10.1186/1742-5581-3-11, http://www.bio-diglib.com/content/3/1/11, http://www.litminer.com/.
  • Xiang Z, Zheng W, He Y. BBP: Brucella genome annotation with literature mining and curation. BMC Bioinform (2006) 7(347). *.
  • Smalheiser NR, Torvik VI, Bischoff-Grethe A, et al. Collaborative development of the Arrowsmith two node search interface designed for laboratory investigators. In: J Biomed Discov Collab (2006) 1(8). http://www.j-biomed-discovery.com/content/1/1/8.
  • Aubry M, Monnier A, Chicault C, et al. Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets. BMC Bioinform (2006) 7:241. [CrossRef][Medline]
  • Firth JR. Papers in Linguistics 1934–1951 (1957) London: Oxford University Press.
  • Habert B, Zweigenbaum P. Contextual acquisition of information categories: what has been done and what can be done automatically? In: The Legacy of Zellig Harris: Language and information into the 21st Century, Mathematics and computability of language — Nevin BE, Johnson SM, eds. (2002). Vol. 2. Amsterdam: John Benjamins. 203–31.
  • Hristovski D, Peterlin B, Mitchell J, et al. Using literature-based discovery to identify disease candidate genes. Int J Med Inform (2005) 74(2–4):289–98. [CrossRef][ISI][Medline]
  • Rindflesch TC, Fiszman M. The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text. J Biomed Inform (2003) 36:462–77. [CrossRef][ISI][Medline]
  • Turtle HR, Croft WB. Evaluation of an inference network-based retrieval model. ACM T Inform Syst (1991) 9(3):187–222. [CrossRef]
  • Torvik VI, Smalheiser N. A quantitative model for linking two disparate sets of articles in MEDLINE. Bioinformatics (2007) *.
  • Yeh AS, Lynette Hirschman, Morgan AA. Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup. Bioinformatics (2003) 19(Suppl 1):i331–9. [Abstract]
  • Voorhees E. TREC: Improving information access through evaluation. In: Bulletin of the American Society for Information Science and Technology (2005) 32(1). http://www.asis.org/Bulletin/Oct-05/voorhees.html.
  • Wilbur WJ, Rzhetsky A, Shatkay H. New directions in biomedical text annotation: definitions, guidelines and corpus construction. BMC Bioinform (2006) 7:356.
  • Pyysalo S, Ginter F, Heimonen J, et al. BioInfer: a corpus for information extraction in the biomedical domain. BMC Bioinform (2007) 8(50).
  • Morgan AA, Ben Wellner, Colombe JB, et al. Evaluating the automatic mapping of human gene and protein mentions to unique identifiers. In: Pac Symp Biocomput 12 (2007) Maui, Hawaii. 281–91. *.
  • Karamanis N, Lewin I, Sealy R, et al. Integrating natural language processing with Flybase curation. In: Pac Symp Biocomput 12 (2007) Maui, Hawaii. 245–56. *.
  • Muller HM, Kenny EE, Sternberg PW. Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol (2004) 2(11):e309. [CrossRef][Medline]
  • Hersh W, Cohen A, Yang J, et al. TREC 2005 genomics track overview. In: The Fourteenth Text Retrieval Conference, TREC 2005 (2005) Gaithersburg, Maryland.
  • Lynette Hirschman, Yeh A, Blaschke C, et al. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinform (2005) 6.
  • Camon EB, Barrell DG, Dimmer EC, et al. An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinform (2005) 6(Suppl 1):S17.
  • Wattarujeekrit T, Shah PK, Collier N. PASBio: predicate-argument structures for event extraction in molecular biology. BMC Bioinform (2004) 5:155. [CrossRef][Medline]
  • Chou WC, Tsai RTH, Su YS, et al. A semi-automatic method for annotating a biomedical proposition bank. In: Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006 (2006) Sydney, Australia: Association for Computational Linguistics. 5–12.
  • Cohen KB, Hunter L. A critical review of PASBio's argument structures for biomedical verbs. BMC Bioinform (2006) 7(Suppl 3):S5.
  • Rosario B, Hearst MA. Classifying semantic relations in bioscience texts. In: Proceedings of ACL 2004 (2004) 430–7.
  • Lynette Hirschman, Marc E. Colosimo, Morgan A, et al. Overview of BioCreative Task 1B: normalized gene lists. BMC Bioinform (2005) 6(Suppl 1):S11. *.
  • Cohen AM. Unsupervised gene/protein named entity normalization using automatically extracted dictionaries. In: Linking biological literature, ontologies and databases: mining biological semantics (2005) Detroit, Michigan: Association for Computational Linguistics. 17–24.
  • Fang HR, Kevin Murphy, Jin Y, et al. Human gene name normalization using text matching with automatically extracted synonym dictionaries. In: Linking natural language processing and biology: towards deeper biological literature analysis (2006) Brooklyn, New York: Association for Computational Linguistics. 41–8.
  • Shah PK, Jensen LJ, Boue S, et al. Extraction of transcript diversity from scientific literature. PLoS Comput Biol (2005) 1(1):67–73. [CrossRef][ISI]
  • Horn F, Lau AL, Cohen FE. Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors. Bioinformatics (2004) 20(4):557–68. [Abstract/Free Full Text]
  • Zweigenbaum P, Demner-Fushman D, Yu H, et al. New frontiers in biomedical text mining. In: Proc Pac Symp Biocomput 12 (2007) Maui, Hawaii. 205–8.
  • Lynette Hirschman, Bourne P, Cohen KB, et al. Translating Biology: text mining tools that work, (2007). (September 2007, date last accessed). http://psb.stanford.edu/cfp-nlp.html.
  • Verspoor K, Cohen KB, Mani I, et al. Introduction to BioNLP’06. In: Linking natural language processing and biology: towards deeper biological literature analysis (2006) Brooklyn, New York: Association for Computational Linguistics. 3–5.

Page: 2007 FrontiersInBiomedicalTextMining
Author: Hong Yu, Pierre Zweigenbaum, Dina Demner-Fushman, Kevin B. Cohen
Title: Frontiers of Biomedical Text Mining: current progress
Url: https://pantherfile.uwm.edu/hongyu/www/files/articles/briefings.bioinformatics.pdf
Doi: 10.1093/bib/bbm045