2010 BuildingASemAnnCorpusOfClinicRecs

From GM-RKB

Subject Headings: Semantically Annotated Corpus, Clinical Record, Annotation Methodology.

Notes

Quotes

Author Keywords

Abstract

1. Introduction

  • We describe the creation of a semantically annotated corpus of clinical texts. The documents of this corpus are drawn from the free text component of patient records, and the annotations capture clinically significant information communicated by these texts. The corpus is intended for use in developing and evaluating systems that can automatically extract this kind of clinically significant information from the textual component of patient records. The corpus has been created within the context of the CLinical E-Science Framework (CLEF) project [1]: a multi-site research project that has been developing the technology and techniques required for a high quality repository of electronic patient records. Such a repository must meet high standards of security and interoperability, and should enable ethical and user-friendly access to patient information, so as to facilitate both clinical care and biomedical research. CLEF has chosen to work in the area of cancer informatics, as one of the project partners

References

  • [1] Rector A, Rogers J, Taweel A, Ingram D, Kalra D, Milan J, et al. CLEF — joining up healthcare with clinical and post-genomic research. In: Proceedings of UK e-Science All Hands Meeting 2003. Nottingham, UK; 2003. p. 264–267.
  • [2] Grishman R. Information Extraction. In: Mitkov R, editor. The Oxford Handbook of Computational Linguistics; 2003. Chapter 30.
  • [3] Harkema H, Roberts I, Gaizauskas R, Hepple M. Information Extraction from Clinical Records. In: Cox SJ, Walker DW, editors. Proceedings of the UK e-Science All Hands Meeting 2005. Nottingham, UK; 2005. p. 254–258.
  • [4] Riloff E. Automatically Generating Extraction Patterns from Untagged Text. In: AAAI/IAAI, Vol. 2; 1996. p. 1044–1049.
  • [5] Roberts A, Gaizauskas R, Hepple M, Davis N, Demetriou G, Guo Y, et al. The CLEF Corpus: Semantic Annotation of Clinical Text. In: Proc AMIA Symp. Chicago, IL, USA; 2007. p. 625–629.
  • [6] Roberts A, Gaizauskas R, Hepple M, Demetriou G, Guo Y, Setzer A, et al. Semantic Annotation of Clinical Text: The CLEF Corpus. In: Proceedings of Building and evaluating resources for biomedical text mining: workshop at Sixth International Conference on Language Resources and Evaluation, LREC 2008. Marrakech, Morocco: ELRA; 2008.
  • [7] Kim JD, Ohta T, Tateisi Y, Tsujii J. GENIA corpus — a semantically annotated corpus for bio-textmining. Bioinformatics. 2003;19(1):i180–i182.
  • [8] Kim JD, Ohta T, Tsujii J. Corpus annotation for mining biomedical events from literature. BMC Bioinformatics. 2008;9(1).
  • [9] Kulick S, Bies A, Liberman M, Mandel M, McDonald R, Palmer M, et al. Integrated Annotation for Biomedical Information Extraction. In: Hirschman L, Pustejovsky J, editors. HLT-NAACL 2004 Workshop: BioLINK 2004, Linking Biological Literature, Ontologies and Databases. Boston, Massachusetts, USA: Association for Computational Linguistics; 2004. p. 61–68.
  • [10] Franzén K, Eriksson G, Olsson F, Asker L, Lidén P, et al. Protein names and how to find them. Int J Med Inform. 2002;67(1–3):49–61.
  • [11] Rosario B, Hearst MA. Classifying semantic relations in bioscience texts. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Morristown, NJ, USA: Association for Computational Linguistics; 2004. p. 430.
  • [12] Rosario B, Hearst MA. Multi-way relation classification: application to protein-protein interactions. In: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. Morristown, NJ, USA: Association for Computational Linguistics; 2005. p. 732–739.
  • [13] Alex B, Grover C, Haddow B, Kabadjov M, Klein E, Matthews M, et al. The ITI TXM Corpora: Tissue Expressions and Protein-Protein Interactions. In: Proceedings of Building and evaluating resources for biomedical text mining: Workshop at Sixth International Conference on Language Resources and Evaluation, LREC 2008. Marrakech, Morocco; 2008. p. 11–18. In press.
  • [14] Tanabe L, Xie N, Thom LH, Matten W, Wilbur WJ. GENETAG: a tagged corpus for gene/protein named entity recognition. BMC Bioinformatics. 2005;6(Suppl 1):S3.
  • [15] Nédellec C. Learning Language in Logic - Genic Interaction Extraction Challenge. In: Proceedings of the ICML05 Workshop on Learning Language in Logic. Bonn, Germany; 2005. p. 31–37.
  • [16] TREC Genomics Track. [cited 6 June 2008]; Available from http://ir.ohsu.edu/genomics.
  • [17] Pestian JP, Brew C, Matykiewicz P, Hovermale D, Johnson N, Cohen KB, et al. A shared task involving multi-label classification of clinical free text. In: Biological, translational, and clinical language processing. Prague, Czech Republic: Association for Computational Linguistics; 2007. p. 97–104.
  • [18] Hersh WR, Muller H, Jensen JR, Yang J, Gorman PN, Ruch P. Advancing Biomedical Image Retrieval: Development and Analysis of a Test Collection. J Am Med Inform Assoc. 2006;13(5):488–496.
  • [19] Müller H, Deselaers T, Lehmann TM, Clough PD, Hersh W. Overview of the ImageCLEFmed 2006 medical retrieval and annotation tasks. In: Cross Language Evaluation Forum (CLEF) Workshop 2006. vol. 4730. Alicante, Spain: Springer; 2007. p. 595–608.
  • [20] i2b2 NLP shared task. [cited 6 June 2008]; Available from http://ir.ohsu.edu/genomics/.
  • [21] Ogren PV, Savova G, Buntrock JD, Chute CG. Building and Evaluating Annotated Corpora for Medical NLP Systems. In: Proc AMIA Symp; 2006. p. 1050.
  • [22] Meystre S, Haug PJ. Natural language processing to extract medical problems from electronic clinical documents: Performance evaluation. Journal of Biomedical Informatics. 2006;39(6):589–599.
  • [23] Denny JC, Smithers JD, Miller RA, Spickard A. “Understanding” Medical School Curriculum Content Using KnowledgeMap. Journal of the American Medical Informatics Association. 2003;10(4):351–362.
  • [24] Elkin PL, Brown SH, Bauer BA, Husser CS, Carruth W, Bergstrom LR, et al. A controlled trial of automated classification of negation from clinical notes. BMC Medical Informatics and Decision Making. 2005;5(13).
  • [25] Friedman C, Hripcsak G. Evaluating natural language processors in the clinical domain. Methods of Information in Medicine. 1998;37(4-5):334–44.
  • [26] International Classification of Diseases (ICD). [cited 6 June 2008]; Available from http://www.who.int/classifications/icd.
  • [27] Rogers J, Puleston C, Rector A. The CLEF Chronicle: Patient Histories Derived from Electronic Health Records. Data Engineering Workshops, 2006 Proceedings 22nd International Conference on. 2006;p. x109–x109.
  • [28] Hallett C, Power R, Scott D. Summarisation and Visualisation of e-Health Data Repositories. In: Proceedings of the UK e-Science All Hands Meeting. Nottingham, UK; 2006. p. 69–77.
  • [29] Gennari JH, Musen MA, Fergerson RW, Grosso WE, Crubézy M, Eriksson H, et al. The evolution of Protégé: an environment for knowledge-based systems development. International Journal of Human-Computer Studies. 2003;58(1):89–123.
  • [30] Ogren PV. Knowtator: a Protégé plug-in for annotated corpus construction. In: Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. Morristown, NJ, USA: Association for Computational Linguistics; 2006. p. 273–275.
  • [31] Defense Advanced Research Projects Agency. Proceedings of the Seventh Message Understanding Conference (MUC-7); 1998. Available at http://www.itl.nist.gov/iaui/894.02/related_projects/muc/.
  • [32] Boisen S, Crystal MR, Schwartz R, Stone R, Weischedel R. Annotating resources for information extraction. In: Proceedings of the Second Language Resources and Evaluation, LREC 2000; 2000. p. 1211–1214.
  • [33] Demetriou G, Gaizauskas R, Sun H, Roberts A. ANNALIST – ANNotation ALIgnment and Scoring Tool. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation, LREC 2008. Marrakech, Morocco: ELRA; 2008. In press.
  • [34] Hripcsak G, Rothschild A. Agreement, F-measure and reliability in information retrieval. J Am Med Inform Assoc. 2005 May-June;12(3):296–298.
  • [35] Cunningham H, Maynard D, Bontcheva K, Tablan V. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In: Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics. Philadelphia, PA, USA; 2002. p. 168–175.
  • [36] GATE – General Architecture for Text Engineering. [cited 6 June 2008]; Available from http://gate.ac.uk.
  • [37] UMLS Knowledge Sources, 2007AB; 2007.
  • [38] Pustejovsky J, Castaño J, Ingria R, Saurí R, Gaizauskas R, Setzer A, et al. TimeML: Robust Specification of Event and Temporal Expressions in Text. In: Proceedings of the Fifth International Workshop on Computational Semantics (IWCS-5). Tilburg; 2003.
  • [39] Verhagen M, Gaizauskas R, Schilder F, Hepple M, Katz G, Pustejovsky J. SemEval-2007 Task 15: TempEval Temporal Relation Identification. In: Proceedings of the 4th International Workshop on Semantic Evaluations. Prague; 2007. p. 75–80.
  • [40] Mani I, Wilson G. Robust temporal processing of news. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000). New Brunswick, New Jersey; 2000. p. 69–76.
  • [41] Harkema H, Gaizauskas R, Hepple M, Davis N, Guo Y, Roberts A, et al. A Large-Scale Resource for Storing and Recognizing Technical Terminology. In: Proceedings of 4th International Conference on Language Resources and Evaluation. Lisbon, Portugal; 2004. p. 83–86.
  • [42] Lindberg D, Humphreys B, McCray A. The Unified Medical Language System. Methods Inf Med. 1993;32(4):281–291.
  • [43] Li Y, Bontcheva K, Cunningham H. SVM Based Learning System for Information Extraction. In: Deterministic and statistical methods in machine learning: first international workshop. No. 3635 in Lecture Notes in Computer Science. Springer; 2005. p. 319–339.
  • [44] Roberts A, Gaizauskas R, Hepple M, Guo Y. Combining terminology resources and statistical methods for entity recognition: an evaluation. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation, LREC 2008. Marrakech, Morocco; 2008.
  • [45] Roberts A, Gaizauskas R, Hepple M. Extracting Clinical Relationships from Patient Narratives. In: Proceedings of the Workshop on BioNLP 2008. Columbus, OH, USA: Association for Computational Linguistics; 2008.
  • [46] Thompson CA, Califf ME, Mooney RJ. Active learning for natural language parsing and information extraction. In: Proceedings of the 16th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA; 1999. p. 406–414.
  • [47] Ghani R, Jones R, Mitchell T, Riloff E. Active Learning For Information Extraction With Multiple View Feature Sets. In: Proceedings of the 20th International Conference on Machine Learning (ICML 2003) Workshop on Adaptive Text Extraction and Mining; 2003.
  • [48] SAFE, the Semantic Annotation Factory Environment. [cited 2 October 2008]; Available from http://gate.ac.uk/safe/.
  • [49] BioNotate. [cited 2 October 2008]; Available from http://sourceforge.net/projects/bionotate/.
  • [50] Clinical E-Science Framework: Sheffield NLP. [cited 2 October 2008]; Available from http://nlp.shef.ac.uk/clef/.


2010 BuildingASemAnnCorpusOfClinicRecs

  • Author: Angus Roberts, Robert Gaizauskas, Mark Hepple, George Demetriou, Yikun Guo, Ian Roberts, Andrea Setzer
  • Title: Building a Semantically Annotated Corpus of Clinical Texts
  • Title URL: http://eprints.whiterose.ac.uk/10186/
  • DOI: 10.1016/j.jbi.2008.12.013