Coreference Clustering Task
- AKA: Coreference Resolution, Duplicate Referencer Detection.
- input: A Referencer Set (with some reference information).
- output: A set of Coreference Clusters.
- It can range from (typically) being an Entity Reference Clustering Task to being a Relation Reference Clustering Task.
- It can be solved by a Coreference Resolution System (that applies a Coreference Resolution algorithm).
- It can range from being a Heuristic Coreference Resolution Task to being a Data-Driven Coreference Resolution Task.
- It can range from being a General Coreference Resolution Task to being a Domain-Specific Coreference Resolution Task.
- It can support a Reference Grounding Task.
- See: Coreference Chain, Coreferential Expression Set, Ontology, Word Mention Clustering Task, Coreference Relation, Markable, Natural Language Processing Task, Tokenization, Sentence Segmentation, Part-Of-Speech Tagging, Named Entity Recognition.
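At the Heuristic end of the range, the input/output contract above (a set of referencers in, a set of coreference clusters out) can be illustrated with a deliberately naive sketch: mentions are clustered whenever their normalized strings match exactly. The `(id, text)` mention representation, the `normalize` rule, and the function names are hypothetical choices for illustration, not an algorithm taken from the references below.

```python
from collections import defaultdict

# Hypothetical normalization rule: lowercase and drop one leading determiner.
DETERMINERS = {"the", "a", "an"}


def normalize(text: str) -> str:
    tokens = text.lower().split()
    if tokens and tokens[0] in DETERMINERS:
        tokens = tokens[1:]
    return " ".join(tokens)


def exact_match_clusters(mentions: list[tuple[int, str]]) -> list[set[int]]:
    """Heuristic baseline: mentions with identical normalized strings corefer."""
    groups: dict[str, set[int]] = defaultdict(set)
    for mention_id, text in mentions:
        groups[normalize(text)].add(mention_id)
    # Order clusters by their earliest mention id for readability.
    return sorted(groups.values(), key=min)


mentions = [(0, "Barack Obama"), (1, "the president"), (2, "barack obama"),
            (3, "The President"), (4, "Hawaii")]
print(exact_match_clusters(mentions))  # [{0, 2}, {1, 3}, {4}]
```

Exact string matching links "the president" to "The President" even when they refer to different people, and it never links a pronoun to its antecedent; gaps like these are one motivation for Data-Driven approaches.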
- (Sawhney & Wang, 2015) ⇒ Kartik Sawhney and Rebecca Wang. (2015). “Coreference Resolution.” https://stanford.edu/~kartiks2/coref.pdf
- QUOTE: Overview - Coreference resolution refers to the task of clustering different mentions referring to the same entity. This is particularly useful in other NLP tasks, including retrieving information about specific named entities, machine translation, among others. In this report, we discuss our approach, implementation and observations for a few baseline systems, a rule-based system, and a classifier-based system. To quantify the effectiveness of our implementation, we use the MUC and B^3 measures (precision, recall and F1) for coreference evaluation. The difference in the two scoring metrics in how they define a coreference set within a text (in terms of links or in terms of classes or clusters) results in interesting observations as we discuss in the report.
- QUOTE: A given entity - representing a person, a location, or an organization - may be mentioned in text in multiple, ambiguous ways. Understanding natural language and supporting intelligent access to textual information requires identifying whether different entity mentions are actually referencing the same entity. The Coreference Resolution Demo processes unannotated text, detecting mentions of entities and showing which mentions are coreferential.
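The MUC and B^3 measures named in the Sawhney & Wang overview can be sketched as follows, as an illustration rather than a reference scorer. The sketch assumes key (gold) and response (system) clusterings given as lists of disjoint Python sets over the same mention ids; a mention missing from the other clustering is treated as a singleton. MUC is link-based: recall is Σ(|K| − |p(K)|) / Σ(|K| − 1) over key clusters K, where p(K) is the partition of K induced by the response, and precision swaps the roles of key and response. B^3 is mention-based: each mention's precision and recall are computed from the overlap of its key and response clusters, then averaged with uniform weights. The function names are hypothetical.

```python
def _parts(cluster, other_clusters):
    """Count the pieces `cluster` is split into by `other_clusters`.
    Mentions not covered by any cluster in `other_clusters` count as singletons."""
    covered = set()
    pieces = 0
    for other in other_clusters:
        overlap = cluster & other
        if overlap:
            pieces += 1
            covered |= overlap
    return pieces + len(cluster - covered)


def _f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0


def muc(key, response):
    """Link-based MUC score: returns (precision, recall, f1)."""
    def link_score(gold, pred):
        num = sum(len(c) - _parts(c, pred) for c in gold)
        den = sum(len(c) - 1 for c in gold)
        return num / den if den else 0.0

    recall = link_score(key, response)
    precision = link_score(response, key)  # same formula, roles swapped
    return precision, recall, _f1(precision, recall)


def b_cubed(key, response):
    """Mention-based B^3 score with uniform mention weights: (precision, recall, f1)."""
    key_of = {m: c for c in key for m in c}
    resp_of = {m: c for c in response for m in c}
    mentions = list(key_of)
    precision = recall = 0.0
    for m in mentions:
        k, r = key_of[m], resp_of.get(m, {m})  # missing mention -> singleton
        overlap = len(k & r)
        precision += overlap / len(r)
        recall += overlap / len(k)
    precision /= len(mentions)
    recall /= len(mentions)
    return precision, recall, _f1(precision, recall)


key = [{1, 2, 3}, {4, 5}]
response = [{1, 2}, {3, 4, 5}]
print([round(x, 3) for x in muc(key, response)])      # [0.667, 0.667, 0.667]
print([round(x, 3) for x in b_cubed(key, response)])  # [0.733, 0.733, 0.733]
```

On this example the two metrics already disagree: MUC only sees that one of three key links is missing and one spurious link is added, while B^3 weighs how much each individual mention's cluster is damaged, which is the links-versus-clusters difference the overview mentions.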
- (Wick et al., 2009) ⇒ Michael Wick, Aron Culotta, Khashayar Rohanimanesh, and Andrew McCallum. (2009). “An Entity Based Model for Coreference Resolution.” In: Proceedings of the SIAM International Conference on Data Mining (SDM 2009).
- QUOTE: Coreference resolution is the problem of clustering mentions (or records) into sets referring to the same underlying entity (e.g., person, places, organizations). Over the past several years, increasingly powerful supervised machine learning techniques have been developed to solve this problem. Initial solutions treated it as a set of independent binary classifications, one for each pair of mentions [1, 2]. Next, relational probability models were developed to capture the dependency between each of these classifications [3, 4]; however the parameterization of these methods still consists of features over pairs of mentions. Finally, methods have been developed to enable arbitrary features over entire clusters of mentions [5, 6, 7].
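Treating coreference as "a set of independent binary classifications, one for each pair of mentions" still leaves one step: the pairwise decisions must be turned into clusters. A common way to do this is to take the transitive closure of the positive links, sketched here with union-find. `toy_pair_score` (token Jaccard overlap) and the 0.5 threshold are hypothetical stand-ins for a trained pairwise classifier, not the models cited in the quote.

```python
from itertools import combinations


def toy_pair_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a trained pairwise classifier: token Jaccard overlap."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster_by_transitive_closure(mentions, score=toy_pair_score, threshold=0.5):
    """Make one independent decision per mention pair, then merge linked mentions."""
    parent = list(range(len(mentions)))  # union-find forest over mention indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if score(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)  # union the two clusters

    clusters = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), set()).add(i)
    return sorted(clusters.values(), key=min)


mentions = ["Hillary Clinton", "Clinton", "Bill Clinton", "Bill"]
print(cluster_by_transitive_closure(mentions))  # [{0, 1, 2, 3}]
```

In the example, "Hillary Clinton" and "Bill" end up in one cluster although that pair was never linked directly: each individual link looks plausible, but the closure chains them through "Clinton". Features over entire clusters, as in the entity-based methods the quote describes, can penalize such a cluster as a whole, which pairwise features alone cannot.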
- (Pasula, 2006) ⇒ Hanna Pasula. (2006). “Approximate Inference Techniques for Identity Uncertainty.” Lecture
- QUOTE: Many interesting tasks, such as vehicle tracking, data association, and mapping, involve reasoning about the objects present in a domain. However, the observations on which this reasoning is to be based frequently fail to explicitly describe these objects' identities, properties, or even their number, and may in addition be noisy or nondeterministic. When this is the case, identifying the set of objects present becomes an important aspect of the whole task.
- (Soon et al., 2001) ⇒ Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. (2001). “A Machine Learning Approach to Coreference Resolution of Noun Phrases.” In: Computational Linguistics, Vol. 27, No. 4.
- QUOTE: A prerequisite for coreference resolution is to obtain most, if not all, of the possible markables in a raw input text. To determine the markables, a pipeline of natural language processing (NLP) modules is used, as shown in Figure 1. They consist of tokenization, sentence segmentation, morphological processing, part-of-speech tagging, noun phrase identification, named entity recognition, nested noun phrase extraction, and semantic class determination. As far as coreference resolution is concerned, the goal of these NLP modules is to determine the boundary of the markables, and to provide the necessary information about each markable for subsequent generation of features in the training examples.
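Once the markables are determined, Soon et al. (2001) generate their training examples from markable pairs. This sketch follows that scheme as described in the paper (the quoted passage does not itself show it): for each markable j that has a preceding coreferent markable, the closest such antecedent i yields a positive pair (i, j), and every markable strictly between i and j yields a negative pair with j. Markables are represented only by hypothetical gold entity ids in textual order, and feature generation is omitted.

```python
def soon_training_pairs(entity_ids: list[int]) -> list[tuple[int, int, bool]]:
    """entity_ids[k] is the gold entity of the k-th markable in textual order.
    Returns (antecedent_index, anaphor_index, is_coreferent) triples."""
    instances = []
    for j in range(len(entity_ids)):
        # Scan backwards for the closest preceding coreferent markable.
        closest = next((i for i in range(j - 1, -1, -1)
                        if entity_ids[i] == entity_ids[j]), None)
        if closest is None:
            continue  # no antecedent: this markable contributes no instances
        instances.append((closest, j, True))
        # Every markable strictly between the antecedent and j is a negative.
        instances.extend((k, j, False) for k in range(closest + 1, j))
    return instances


# e.g. "Mary", "John", "she", "Paris", "he" with gold entities 0, 1, 0, 2, 1
print(soon_training_pairs([0, 1, 0, 2, 1]))
# [(0, 2, True), (1, 2, False), (1, 4, True), (2, 4, False), (3, 4, False)]
```

Restricting negatives to the span between an anaphor and its closest antecedent keeps the number of training pairs far below the full quadratic set of markable pairs.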