# Unsupervised Coreference Resolution System

An Unsupervised Coreference Resolution System is a Coreference Resolution System that is based on an Unsupervised Machine Learning System.

**Context:**
- It can solve an Unsupervised Coreference Resolution Task by implementing an Unsupervised Coreference Resolution Algorithm.

**Example(s):**
- a Multi-Pass Sieve Coreference Resolution System (Raghunathan et al., 2010),
- a Generative Unsupervised Coreference Resolution System via EM Clustering (Ng, 2008),
- a Bayesian Unsupervised Coreference Resolution System (Haghighi & Klein, 2007),
- a Joint Unsupervised Coreference Resolution System with Markov Logic (Poon & Domingos, 2008).

**Counter-Example(s):**
- a Supervised Coreference Resolution System.

**See:** Coreference Resolution System, Unsupervised Machine Learning System, Clustering Task, Entity Mention Normalization System, Natural Language Processing System, Information Extraction System.

## References

### 2011

- (Zheng et al., 2011) ⇒ Jiaping Zheng, Wendy W. Chapman, Rebecca S. Crowley, and Guergana K. Savova. (2011). “Coreference Resolution: A Review of General Methodologies and Applications in the Clinical Domain.” In: Journal of Biomedical Informatics, 44(6). doi:10.1016/j.jbi.2011.08.006
- QUOTE: The first substantial effort to tackle the coreference resolution task in an unsupervised manner is described in Haghighi and Klein. They adopted a fully generative, nonparametric Bayesian model, based on hierarchical Dirichlet processes. For each document, the goal was to find the assignment of the entity indices Z for all the mentions X that maximizes the posterior probability P(Z|X). Documents are represented as mixture models, with infinite number of components, which correspond to the number of entities. An entity is drawn from a nonparametric Dirichlet process, and then the head of the mention is generated from a symmetric Dirichlet distribution. Furthermore, a pronoun head model and a salience model are designed to improve performance on pronouns by modeling additional grammatical and semantic features (gender, number, and semantic type) and recency. They achieved F-scores ranging from 62.3% to 70.3%.

 Ng presented a generative unsupervised model that views coreference as an Expectation-Maximization (EM) clustering process. The model operates at the document level to induce a partition (a valid clustering) of the mentions.
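The nonparametric Bayesian idea quoted above can be caricatured in a few lines. The sketch below is not the Haghighi and Klein model itself: it replaces full posterior inference over P(Z|X) with a greedy, sequential MAP assignment of mention heads under a Chinese-restaurant-process prior and a symmetric Dirichlet head model, and all hyperparameter values (`alpha`, `beta`, `vocab_size`) are illustrative assumptions.

```python
# Minimal sketch (not the Haghighi & Klein 2007 model): assign an entity
# index Z to each mention head under a CRP prior with a symmetric
# Dirichlet head model, using greedy sequential MAP assignment in place
# of full posterior inference. Hyperparameters are illustrative only.

from collections import Counter, defaultdict

def assign_entities(heads, alpha=2.0, beta=0.5, vocab_size=100):
    entities = []                  # entity index per mention (the Z variables)
    counts = defaultdict(Counter)  # per-entity head-word counts
    sizes = []                     # mentions per entity (CRP "table" sizes)
    for head in heads:
        scores = []
        # Score joining each existing entity: CRP prior (proportional to
        # entity size) times the Dirichlet posterior-predictive
        # probability of this head word under that entity's head model.
        for k, size in enumerate(sizes):
            lik = (counts[k][head] + beta) / (sum(counts[k].values())
                                              + beta * vocab_size)
            scores.append(size * lik)
        # Score opening a new entity (CRP concentration * prior predictive).
        scores.append(alpha * (beta / (beta * vocab_size)))
        z = max(range(len(scores)), key=scores.__getitem__)
        if z == len(sizes):
            sizes.append(0)
        sizes[z] += 1
        counts[z][head] += 1
        entities.append(z)
    return entities

print(assign_entities(["Obama", "Obama", "president", "Obama"]))
```

Repeated heads accumulate in one entity, while a sufficiently different head opens a new one; in the real model the number of entities is likewise unbounded and determined by inference.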

### 2010

- (Raghunathan et al., 2010) ⇒ Karthik Raghunathan, Heeyoung Lee, Sudarshan Rangarajan, Nathanael Chambers, Mihai Surdeanu, Dan Jurafsky, and Christopher Manning. (2010). “A Multi-pass Sieve for Coreference Resolution.” In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.
- QUOTE: We propose an unsupervised sieve-like approach to coreference resolution that addresses these issues. The approach applies tiers of coreference models one at a time from highest to lowest precision. Each tier builds on the entity clusters constructed by previous models in the sieve, guaranteeing that stronger features are given precedence over weaker ones. Furthermore, each model’s decisions are richly informed by sharing attributes across the mentions clustered in earlier tiers. This ensures that each decision uses all of the information available at the time. We implemented all components in our approach using only deterministic models. All our components are unsupervised, in the sense that they do not require training on gold coreference links.
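The tiered, deterministic design described in the quote can be sketched as follows. This is a toy reduction, not the Raghunathan et al. (2010) system: it uses only two hand-written tiers (exact string match, then head-word match) over hypothetical mention records, whereas the actual sieve has many more tiers and richer attribute sharing.

```python
# Toy multi-pass sieve sketch (assumed tiers, not the full 2010 system):
# tiers run from highest to lowest precision, and each tier only merges
# clusters built by earlier tiers, so stronger evidence takes precedence.

class Mention:
    def __init__(self, mid, text, head):
        self.mid, self.text, self.head = mid, text, head

def exact_match(m1, m2):
    # Highest-precision tier: full string identity.
    return m1.text.lower() == m2.text.lower()

def head_match(m1, m2):
    # Lower-precision tier: shared head word.
    return m1.head.lower() == m2.head.lower()

def sieve(mentions, tiers):
    # Each mention starts in its own singleton cluster.
    clusters = [{m.mid} for m in mentions]
    by_id = {m.mid: m for m in mentions}
    for tier in tiers:
        merged = True
        while merged:
            merged = False
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    # Merge two clusters if any cross-cluster mention
                    # pair satisfies the current tier's test.
                    if any(tier(by_id[a], by_id[b])
                           for a in clusters[i] for b in clusters[j]):
                        clusters[i] |= clusters[j]
                        del clusters[j]
                        merged = True
                        break
                if merged:
                    break
    return clusters

mentions = [
    Mention(0, "Barack Obama", "Obama"),
    Mention(1, "Obama", "Obama"),
    Mention(2, "the president", "president"),
    Mention(3, "Barack Obama", "Obama"),
]
print(sieve(mentions, [exact_match, head_match]))
```

Because each tier operates on whole clusters rather than isolated pairs, later, weaker tiers see the attributes accumulated by earlier, stronger ones.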

### 2009

- (Wick et al., 2009) ⇒ Michael Wick, Aron Culotta, Khashayar Rohanimanesh, and Andrew McCallum. (2009). “An Entity Based Model for Coreference Resolution.” In: Proceedings of the SIAM International Conference on Data Mining (SDM 2009).
- QUOTE: Statistical approaches to coreference resolution can be broadly placed into two categories: generative models, which model the joint probability, and discriminative models that model the conditional probability. These models can be either supervised (uses labeled coreference data for learning) or unsupervised (no labeled data is used).

### 2008a

- (Ng, 2008) ⇒ Vincent Ng. (2008). “Unsupervised Models for Coreference Resolution.” In: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
- QUOTE: ... we present a generative, unsupervised model for probabilistically inducing coreference partitions on unlabeled documents, rather than classifying mention pairs, via EM clustering (Section 2). In fact, our model combines the best of two worlds: it operates at the document level, while exploiting essential linguistic constraints on coreferent mentions (e.g., gender and number agreement) provided by traditional pairwise classification models.

### 2008b

- (Clark and González-Brenes, 2008) ⇒ Jonathan H. Clark and José P. González-Brenes. (2008). “Coreference: Current Trends and Future Directions.” CMU course on Language and Statistics II Literature Review.
- QUOTE: Haghighi and Klein (2007) proposed a hierarchical Dirichlet Process to find the referents of mentions within a document. They extend their solution to find coreferents across documents with the entities being shared across the corpus. The number of clusters are determined by the inference (...). Their work was the first unsupervised approach to report performance “in the same range” of fully supervised approaches for coreference resolution (...)
Ng (2008) conceptualizes the coreference resolution problem as inducing coreference partitions on unlabeled documents, rather than classifying whether mention pairs are coreferent. For this they modify the Expectation-Maximization (EM) algorithm, so that the number of clusters does not have to be predetermined. Instead of initializing the model with a uniform distribution over clusters, the model is initialized with a small amount of labeled data for the first iteration of EM (...)

Poon and Domingos (2008) present an unsupervised model using Markov Logic Network (MLN). MLN is a first-order knowledge base with a weight attached to each formula; if the weight is infinite, then the MLN behaves exactly as first-order logic does. With finite weights when a world violates a formula in a MLN, the world becomes less probable, but not impossible (Richardson and Domingos, 2006). The basic idea in a MLN is to soften the constraints imposed by a set of first-order logic formulas. Under the hood, MLNs use first-order logic as a language to define a template that will be extended as a Markov network. The Markov network is created with one node per ground atom and one feature per ground clause. This combines first-order logic and probabilistic graphical models into a single representation.
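The key MLN property described above, that finite weights make formula-violating worlds less probable rather than impossible, can be shown with a toy computation of P(world) ∝ exp(Σᵢ wᵢ nᵢ(world)), where nᵢ counts the satisfied groundings of formula i. This is not an MLN implementation; the single rule ("apposition implies coreference"), its weight, and the mention names are all hypothetical.

```python
# Toy illustration of soft first-order constraints in a Markov logic
# network: P(world) ∝ exp(w * n(world)), where n counts satisfied
# groundings. The rule, weight, and atoms below are hypothetical.

import math
from itertools import product

# Ground coref atoms for two mention pairs; one assumed observed
# apposition relation and one soft rule: apposition(x, y) => coref(x, y).
pairs = [("m1", "m2"), ("m2", "m3")]
apposition = {("m1", "m2")}
w = 1.5  # finite weight: a soft, violable constraint

def n_satisfied(world):
    # A grounding is satisfied unless its antecedent (apposition) holds
    # and its consequent (coref) is false.
    return sum(1 for p in pairs if p not in apposition or world[p])

# Enumerate every possible world (truth assignment to the coref atoms).
worlds = [dict(zip(pairs, vals))
          for vals in product([False, True], repeat=len(pairs))]
weights = [math.exp(w * n_satisfied(wld)) for wld in worlds]
Z = sum(weights)
for wld, wt in zip(worlds, weights):
    print({p: wld[p] for p in pairs}, round(wt / Z, 3))
```

Worlds violating the rule (apposition without coreference) still receive nonzero probability, just exponentially less of it; with an infinite weight they would be ruled out entirely, recovering pure first-order logic.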


### 2008c

- (Poon & Domingos, 2008) ⇒ Hoifung Poon and Pedro Domingos. (2008). “Joint Unsupervised Coreference Resolution with Markov Logic.” In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2008).
- QUOTE: The lack of label information in unsupervised coreference resolution can potentially be overcome by performing joint inference, which leverages the “easy” decisions to help make related “hard” ones. Relations that have been exploited in supervised coreference resolution include transitivity (McCallum & Wellner, 2005) and anaphoricity (Denis & Baldridge, 2007). However, there is little work to date on joint inference for unsupervised resolution.
We address this problem using Markov logic, a powerful and flexible language that combines probabilistic graphical models and first-order logic (Richardson & Domingos, 2006). Markov logic allows us to easily build models involving relations among mentions, like apposition and predicate nominals. By extending the state-of-the-art algorithms for inference and learning, we developed the first general-purpose unsupervised learning algorithm for Markov logic, and applied it to unsupervised coreference resolution.


### 2007

- (Haghighi & Klein, 2007) ⇒ Aria Haghighi and Dan Klein. (2007). “Unsupervised Coreference Resolution in a Nonparametric Bayesian Model.” In: Proceedings of ACL 2007.
- QUOTE: We present an unsupervised, nonparametric Bayesian approach to coreference resolution which models both global entity identity across a corpus as well as the sequential anaphoric structure within each document.