2002 TheGENIAcorpus

Subject Headings: GENIA Corpus.

Notes

With the information overload in genome-related fields, there is an increasing need for natural language processing (NLP) technology to extract information from the literature, and various attempts at information extraction using NLP have been made. We are developing the necessary resources, including a domain ontology and an annotated corpus built from research abstracts in the MEDLINE database (the GENIA corpus). We are building the ontology and the corpus simultaneously, using each to inform the other. In this paper we report on our new corpus, its ontological basis, annotation scheme, and statistics of annotated objects. We also describe the tools used for corpus annotation and management.

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2002 TheGENIAcorpus	Tomoko Ohta, Jin-Dong Kim, Yuka Tateisi			The GENIA corpus: an annotated research abstract corpus in molecular biology domain		Proceedings of the 2nd International Conference on Human Language Technology Research	http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/paper/hlt2002GENIA.pdf			2002