2007 EfficientAnnotationwiththeJenaA


Subject Headings: Jena ANnotation System.

Notes

Cited By

Quotes

Abstract

With ever-increasing demands on the diversity of annotations of language data, the need arises to reduce the amount of efforts involved in generating such value-added language resources. We introduce here the Jena ANnotation Environment (JANE), a platform that supports the complete annotation life-cycle and allows for 'focused' annotation based on active learning. The focus we provide yields significant savings in annotation efforts by presenting only informative items to the annotator. We report on our experience with this approach through simulated and real-world annotations in the domain of immunogenetics for NE annotations.
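
The abstract does not say how "informative" items are chosen. One common way to do it in active learning is committee-based selection (compare the Query by Committee entry in the reference list): several classifiers label each unlabeled item, and the items they disagree on most go to the annotator first. The sketch below is only an illustration of that general idea, not JANE's implementation. The function names, the toy committee rules, and the sample tokens are all hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Committee disagreement on one item: entropy of the label votes."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_informative(pool, committee, batch_size):
    """Return the batch_size pool items with the highest vote entropy."""
    scored = [
        (vote_entropy([member(item) for member in committee]), item)
        for item in pool
    ]
    # Stable sort on the score only, so ties keep their pool order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:batch_size]]


# Hypothetical committee: three toy rules standing in for trained classifiers.
committee = [
    lambda tok: "ENTITY" if tok[0].isupper() else "O",
    lambda tok: "ENTITY" if any(ch.isdigit() for ch in tok) else "O",
    lambda tok: "ENTITY" if len(tok) > 6 else "O",
]

# Hypothetical pool of unlabeled tokens.
pool = ["the", "IL2", "HLA-DRB1", "cell", "antigen"]

selected = select_informative(pool, committee, batch_size=2)
print(selected)
```

In this toy run the committee agrees on "the", "cell", and "HLA-DRB1", so only the two tokens with split votes would be shown to the annotator. In a real setting the committee members would be classifiers retrained after each annotation round.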

References

  • 1. Lynn Carlson, Daniel Marcu, Mary Ellen Okurowski, Building a Discourse-tagged Corpus in the Framework of Rhetorical Structure Theory, Proceedings of the Second SIGdial Workshop on Discourse and Dialogue, p.1-10, September 01-02, 2001, Aalborg, Denmark doi:10.3115/1118078.1118083
  • 2. Sean P. Engelson, Ido Dagan, Minimizing Manual Annotation Cost in Supervised Training from Corpora, Proceedings of the 34th Annual Meeting on Association for Computational Linguistics, p.319-326, June 24-27, 1996, Santa Cruz, California doi:10.3115/981863.981905
  • 3. Ben Hachey, Beatrice Alex, Markus Becker, Investigating the Effects of Selective Sampling on the Annotation Task, Proceedings of the Ninth Conference on Computational Natural Language Learning, June 29-30, 2005, Ann Arbor, Michigan
  • 4. Rebecca Hwa, Sample Selection for Statistical Grammar Induction, Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora: Held in Conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, p.45-52, October 07-08, 2000, Hong Kong doi:10.3115/1117794.1117800
  • 5. Jin-Dong Kim and Jun'ichi Tsujii. 2006. Corpora and their Annotation. In: S. Ananiadou and J. McNaught, editors, Text Mining for Biology and Biomedicine, pp. 179-211. Artech.
  • 6. Mitchell P. Marcus, Mary Ann Marcinkiewicz, Beatrice Santorini, Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics, v.19 n.2, June 1993
  • 7. C. Müller and M. Strube. 2003. Multi-level Annotation in MMax. In: Proc. of the 4th SIGdial Workshop on Discourse and Dialogue, pp. 198-207.
  • 8. Grace Ngai, David Yarowsky, Rule Writing Or Annotation: Cost-efficient Resource Usage for Base Noun Phrase Chunking, Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, p.117-125, October 03-06, 2000, Hong Kong doi:10.3115/1075218.1075234
  • 9. Tomoko Ohta, Yuka Tateisi, Jin-Dong Kim, The GENIA Corpus: An Annotated Research Abstract Corpus in Molecular Biology Domain, Proceedings of the Second International Conference on Human Language Technology Research, March 24-27, 2002, San Diego, California
  • 10. Martha Palmer, Daniel Gildea, Paul Kingsbury, The Proposition Bank: An Annotated Corpus of Semantic Roles, Computational Linguistics, v.31 n.1, p.71-106, March 2005 doi:10.1162/0891201053630264
  • 11. David Pierce and Claire Cardie. 2001. Limitations of Co-training for Natural Language Learning from Large Datasets. In: Proc. of EMNLP 2001, pp. 1-9.
  • 12. James Pustejovsky, Patrick Hanks, Roser Saurí, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir Radev, Beth Sundheim, David Day, Lisa Ferro, and Marcia Lazo. 2003. The TimeBank Corpus. In: Proc. of the Corpus Linguistics 2003 Conference, pp. 647-656.
  • 13. Burr Settles, Biomedical Named Entity Recognition Using Conditional Random Fields and Rich Feature Sets, Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, August 28-29, 2004, Geneva, Switzerland
  • 14. H. S. Seung, M. Opper, H. Sompolinsky, Query by Committee, Proceedings of the Fifth Annual Workshop on Computational Learning Theory, p.287-294, July 27-29, 1992, Pittsburgh, Pennsylvania, USA doi:10.1145/130385.130417
  • 15. Erik F. Tjong Kim Sang, Fien De Meulder, Introduction to the CoNLL-2003 Shared Task: Language-independent Named Entity Recognition, Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, p.142-147, May 31, 2003, Edmonton, Canada doi:10.3115/1119176.1119195
  • 16. Katrin Tomanek, Joachim Wermter, and Udo Hahn. 2007. An Approach to Downsizing Annotation Costs and Maintaining Corpus Reusability. In: Proc. of EMNLP-CoNLL 2007.
  • 17. Kees Van Deemter, Rodger Kibble, On Coreferring: Coreference in MUC and Related Annotation Schemes, Computational Linguistics, v.26 n.4, December 2000
  • 18. Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation, 39(2/3):165-210.


| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
| 2007 EfficientAnnotationwiththeJenaA | Katrin Tomanek, Udo Hahn, Joachim Wermter | | | Efficient Annotation with the Jena ANnotation Environment (JANE) | | | | | | 2007 |