2010 SupervisedNounPhraseCoreference

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Noun-Phrase Coreference Resolution, Supervised Learning Algorithm.

Notes

Cited By

Quotes

Author Keywords

Abstract

The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.

1. Introduction

Noun phrase (NP) coreference resolution, the task of determining which NPs in a text or dialogue refer to the same real-world entity, has been at the core of natural language processing (NLP) since the 1960s. NP coreference is related to the task of anaphora resolution, whose goal is to identify an antecedent for an anaphoric NP (i.e., an NP that depends on another NP, specifically its antecedent, for its interpretation) [see van Deemter and Kibble (2000) for a detailed discussion of the difference between the two tasks]. Despite its simple task definition, coreference is generally considered a difficult NLP task, typically involving the use of sophisticated knowledge sources and inference procedures (Charniak, 1972). Computational theories of discourse, in particular focusing (see Grosz (1977) and Sidner (1979)) and centering (Grosz et al. (1983; 1995)), have heavily influenced coreference research in the 1970s and 1980s, leading to the development of numerous centering algorithms (see Walker et al. (1998)).

The focus of coreference research underwent a gradual shift from heuristic approaches to machine learning approaches in the 1990s. This shift can be attributed in part to the advent of the statistical NLP era, and in part to the public availability of annotated coreference corpora produced as part of the MUC-6 (1995) and MUC-7 (1998) conferences. Learning-based coreference research has remained vibrant since then, with results regularly published not only in general NLP conferences, but also in specialized conferences (e.g., the biennial Discourse Anaphora and Anaphor Resolution Colloquium (DAARC)) and workshops (e.g., the series of Bergen Workshop on Anaphora Resolution (WAR)). Being inherently a clustering task, coreference has also received a lot of attention in the machine learning community.

Fifteen years have passed since the first paper on learning-based coreference resolution was published (Connolly et al., 1994). Our goal in this paper is to provide NLP researchers with a survey of the major milestones in supervised coreference research, focusing on the computational models, the linguistic features, the annotated corpora, and the evaluation metrics that were developed in the past fifteen years. Note that several leading coreference researchers have published books (e.g., Mitkov (2002)), written survey articles (e.g., Mitkov (1999), Strube (2009)), and delivered tutorials (e.g., Strube (2002), Ponzetto and Poesio (2009)) that provide a broad overview of coreference research. This survey paper aims to complement, rather than supersede, these previously published materials. In particular, while existing survey papers discuss learning-based coreference research primarily in the context of the influential mention-pair model, we additionally survey recently proposed learning-based coreference models, which attempt to address the weaknesses of the mention-pair model. Due to space limitations, however, we will restrict our discussion to the most commonly investigated kind of coreference relation: the identity relation for NPs, excluding coreference among clauses and bridging references (e.g., part/whole and set/subset relations).



References

,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2010 SupervisedNounPhraseCoreferenceVincent NgSupervised Noun Phrase Coreference Research: The First Fifteen Years2010