2006 NPsForEvents

Subject Headings: Entity Mention Coreference Resolution Benchmark Task, Reuters Corpus.

Notes

Cited By

Quotes

Abstract

This paper describes a pilot project which developed a methodology for NP and event coreference annotation consisting of detailed annotation schemes and guidelines. In order to develop this, a small sample annotated corpus in the domain of terrorism/security was built. The methodology developed can be used as a basis for large-scale annotation to produce much-needed resources. In contrast to related projects, ours focused almost exclusively on the development of annotation guidelines and schemes, to ensure that future annotations based on this methodology capture the phenomena both reliably and in detail. The project also involved extensive discussions in order to redraft the guidelines, as well as major extensions to PALinkA, our existing annotation tool, to accommodate event as well as NP coreference annotation.

1. Introduction

  • The computational treatment of coreference has recently become an important topic in Natural Language Processing (NLP). A wide range of applications including question answering, information extraction and multidocument summarisation benefit from coreference information. Progress in the interpretation of coreference of noun phrases (NPs) and events depends on the availability of suitable annotated corpora. In order to build such corpora, appropriate guidelines and schemes which allow the annotation of data need to be formulated.
  • To date, there exist several resources related to noun phrase and event coreference, but these are still relatively few. There are a number of small corpora annotated for within-document NP coreference (e.g. Ge, 1998; Mitkov et al., 2000). Other resources related to coreference and event annotation do not concentrate solely on these annotations, and include the TimeBank corpus (Pustejovsky et al., 2003b) and the corpus developed in the ACE program. Several annotation schemes have been developed to annotate existing resources, but an investigation showed that none of these were completely appropriate for our task.
  • This paper reports the efforts of a pilot project which investigated NP and event coreference. The main objective of this project was to develop a methodology, consisting of detailed annotation schemes and guidelines, for the marking of NP and event coreference within documents. In order to develop the guidelines and schemes, a sample annotated corpus in the domain of terrorism/security was built. This methodology can be used as a basis for large-scale annotation to produce much-needed resources in the future. In contrast to other annotation projects, this project focused almost exclusively on the development of guidelines and schemes for the annotation of NP and event coreference, which should ensure that future annotations based on this methodology capture the phenomena both reliably and in detail.
  • The project also involved extensive discussions to redraft and improve the guidelines, as well as major changes to our existing annotation tool PALinkA (Orasan, 2003) to accommodate events as well as NPs. All the resources developed in the project can be found on the project web page at http://clg.wlv.ac.uk/projects/NP4E

References

ACE. http://www.itl.nist.gov/iaui/894.01/tests/ace/
Bagga, A., Baldwin, B. (1999). Cross-document event coreference: annotations, experiments and observations. In: Proceedings of the ACL’99 Workshop on Coreference and its Applications. pp. 1-8.
Botley, S. (1999). Corpora and discourse anaphora: using corpus evidence to test theoretical claims. PhD Thesis. University of Lancaster.
Bruneseaux, F., Romary, L. (1997). Codage des références et coréférences dans les dialogues homme-machine. In: Proceedings of ACH-ALLC '97. pp. 15-17.
Davies, S., Poesio, M., Bruneseaux, F., Romary, L. (1998). Annotating coreference in dialogues: proposal for a scheme for MATE. First draft. Available at http://www.hcrc.ed.ac.uk/~poesio/MATE/anno_manual.html
de Rocha, M. (1997). Supporting anaphor resolution with a corpus-based probabilistic model. In: Proceedings of the ACL'97/EACL'97 Workshop on Operational Factors in Practical, Robust Anaphora Resolution. pp. 54-61.
Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Fligelstone, S. (1992). Developing a scheme for annotating text to show anaphoric relations. In: G. Leitner (Ed.) New Directions in English Language Corpora: Methodology, Results, Software Developments. Berlin: Mouton de Gruyter. pp. 153-170.
Garside, R., Fligelstone, S., Botley, S. (1997). Discourse annotation: anaphoric relations in corpora. In: R. Garside, G. Leech, A. McEnery (Eds.) Corpus Annotation: Linguistic Information from Computer Text Corpora. London: Longman. pp. 66-84.
Ge, N. (1998). Annotating the Penn Treebank with coreference information. Internal report, Department of Computer Science, Brown University.
Hirschman, L. (1997). MUC-7 coreference task definition. Version 3.0.
Mitkov, R., Evans, R., Orasan, C., Barbu, C., Jones, L., Sotirova, V. (2000). Coreference and anaphora: developing annotating tools, annotated resources and annotation strategies. In: Proceedings of DAARC2000. pp. 49-58.
Orasan, C. (2003). PALinkA: A highly customisable tool for discourse annotation. In: Proceedings of the 4th SIGdial Workshop on Discourse and Dialogue, ACL’03. pp. 39-43.
Passonneau, R. J., Litman, D. L. (1997). Discourse segmentation by human and automated means. Computational Linguistics 23(1), pp. 103-139.
Poesio, M., Vieira, R. (1998). A corpus-based investigation of definite description use. Computational Linguistics 24(2), pp. 183-216.
Pustejovsky, J., Castaño, J., Ingria, R., Saurí, R., Gaizauskas, R., Setzer, A., Katz, G. (2003a). TimeML: Robust Specification of Event and Temporal Expressions in Text. In: Proceedings of IWCS-5, Fifth International Workshop on Computational Semantics.
Pustejovsky, J., Hanks, P., Saurí, R., See, A., Gaizauskas, R., Setzer, A., Radev, D., Sundheim, B., Day, D., Ferro, L., Lazo, M. (2003b). The TIMEBANK Corpus. In: Proceedings of Corpus Linguistics 2003. pp. 647-656.
Rose, T.G., Stevenson, M., Whitehead, M. (2002). The Reuters Corpus Volume 1 - from Yesterday's News to Tomorrow's Language Resources. In: Proceedings of LREC2002. pp. 827-833.
Setzer, A., Gaizauskas, R. (2000). Annotating Events and Temporal Information in Newswire Texts. In: Proceedings of LREC2000. pp. 1287-1293.
Setzer, A., Gaizauskas, R. (2002). On the Importance of Annotating Event-Event Temporal Relations in Text. In: Proceedings of the Workshop on Annotation Standards for Temporal Information in Natural Language, LREC2002. pp. 52-60.
van Deemter, K., Kibble, R. (1999). What is coreference and what should coreference annotation be? In: Proceedings of the ACL’99 Workshop on Coreference and its Applications. pp. 90-96.


2006 NPsForEvents
Author: Constantin Orasan, Laura Hasler, Karin Naumann
Title: NPs for Events: Experiments in Coreference Annotation
Title URL: http://clg.wlv.ac.uk/projects/NP4E/539 pdf.pdf