- (Culotta et al., 2007) ⇒ Aron Culotta, Michael Wick, Robert Hall, and Andrew McCallum. (2007). “First-Order Probabilistic Models for Coreference Resolution.” In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2007).
- Proposes a Conditional Random Field with First-Order Logic for expressing Features. This enables features over sets of Entity Mentions.
- Proposes a Parameter Estimation method for these "weighted logic" models based on learning rankings and error-driven training.
- Achieves State-Of-The-Art results on the ACE 2004 coreference task: a score of 79, a 45% reduction in error from the previous 69.
- Cited by ~74: http://scholar.google.com/scholar?q=%22First-Order+Probabilistic+Models+for+Coreference+Resolution%22+2007
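The error-driven, ranking-based parameter estimation noted above can be illustrated with a minimal sketch (not the authors' code; the feature names and scoring function here are hypothetical): when an incorrect coreference decision outranks the correct one, a perceptron-style update shifts weight toward the correct decision's features.

```python
# Illustrative sketch of error-driven ranking training for a
# "weighted logic" coreference model. All names are hypothetical.

def score(weights, features):
    """Dot product of a sparse feature dict with the weight vector."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def rank_update(weights, good_features, bad_features, lr=1.0):
    """If an incorrect decision scores at least as high as the
    correct one, promote the correct decision's features and
    demote the incorrect decision's features."""
    if score(weights, bad_features) >= score(weights, good_features):
        for f, v in good_features.items():
            weights[f] = weights.get(f, 0.0) + lr * v
        for f, v in bad_features.items():
            weights[f] = weights.get(f, 0.0) - lr * v
    return weights

w = {}
good = {"all-same-head": 1.0}       # first-order feature of the correct merge
bad = {"any-gender-mismatch": 1.0}  # first-order feature of the wrong merge
w = rank_update(w, good, bad)       # after the update, good outscores bad
```

The update only fires on ranking errors, which is what makes the training error-driven rather than likelihood-based.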
- (Yang et al., 2008) ⇒ Xiaofeng Yang, Jian Su, Jun Lang, Chew Lim Tan, Ting Liu, and Sheng Li. (2008). “An Entity-Mention Model for Coreference Resolution with Inductive Logic Programming.” In: Proceedings of ACL Conference (ACL 2008).
- One problem that arises with the entity-mention model is how to represent the knowledge related to an entity. In a document, an entity may have more than one mention. It is impractical to enumerate all the mentions in an entity and record their information in a single feature vector, as it would make the feature space too large. Even worse, the number of mentions in an entity is not fixed, which would result in variant-length feature vectors and make trouble for normal machine learning algorithms. A solution seen in previous work (Luo et al., 2004; Culotta et al., 2007) is to design a set of first-order features summarizing the information of the mentions in an entity, for example, “whether the entity has any mention that is a name alias of the active mention?” or “whether most of the mentions in the entity have the same head word as the active mention?” These features, nevertheless, are designed in an ad-hoc manner and lack the capability of describing each individual mention in an entity. Culotta et al. (2007) present a system which uses an online learning approach to train a classifier to judge whether two entities are coreferential or not. The features describing the relationships between two entities are obtained based on the information of every possible pair of mentions from the two entities. Different from (Luo et al., 2004), the entity-level features are computed using a “Most-X” strategy, that is, two given entities would have a feature X, if most of the mention pairs from the two entities have the feature X.
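The “Most-X” strategy quoted above can be sketched as follows (an illustrative toy, not the authors' implementation; the helper names and the head-word representation of mentions are assumptions): two entities receive entity-level feature X when a majority of their cross-product mention pairs have pairwise feature X.

```python
# Illustrative sketch of the "Most-X" entity-level feature strategy.
from itertools import product

def most_x(entity_a, entity_b, pairwise_feature):
    """True iff pairwise_feature holds for a strict majority of the
    mention pairs drawn one from each entity."""
    pairs = list(product(entity_a, entity_b))
    hits = sum(1 for m1, m2 in pairs if pairwise_feature(m1, m2))
    return hits * 2 > len(pairs)

# Toy mentions represented by their head word only (an assumption).
same_head = lambda m1, m2: m1 == m2
e1 = ["Clinton", "Clinton", "president"]
e2 = ["Clinton", "Clinton"]
print(most_x(e1, e2, same_head))  # True: 4 of 6 pairs share a head
```

Aggregating over all mention pairs this way keeps the feature vector fixed-length regardless of how many mentions each entity contains, which is exactly the problem the quoted passage raises.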
- (Poon & Domingos, 2008) ⇒ Hoifung Poon, and Pedro Domingos. (2008). “Joint Unsupervised Coreference Resolution with Markov Logic.” In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2008).
Traditional noun phrase coreference resolution systems represent features only of pairs of noun phrases. In this paper, we propose a machine learning method that enables features over sets of noun phrases, resulting in a first-order probabilistic model for coreference. We outline a set of approximations that make this approach practical, and apply our method to the ACE coreference dataset, achieving a 45% error reduction over a comparable method that only considers features of pairs of noun phrases. This result demonstrates an example of how a first-order logic representation can be incorporated into a probabilistic model and scaled efficiently.
- Amit Bagga and Breck Baldwin. (1998). Algorithms for scoring coreference chains. In Proceedings of the Seventh Message Understanding Conference (MUC-7).
- Razvan C. Bunescu and Raymond Mooney. (2004). Collective information extraction with relational Markov networks. In ACL.
- Y. Censor and S.A. Zenios. (1997). Parallel optimization : theory, algorithms, and applications. Oxford University Press.
- Michael Collins and Brian Roark. (2004). Incremental parsing with the perceptron algorithm. In ACL.
- Koby Crammer and Yoram Singer. (2003). Ultraconservative online algorithms for multiclass problems. JMLR, 3:951–991.
- Aron Culotta and Andrew McCallum. (2006). Tractable learning and inference with high-order representations. In ICML Workshop on Open Problems in Statistical Relational Learning, Pittsburgh, PA.
- Hal Daumé III and Daniel Marcu. (2005a). A large-scale exploration of effective global features for a joint entity detection and tracking model. In HLT/EMNLP, Vancouver, Canada.
- Hal Daumé III and Daniel Marcu. (2005b). Learning as search optimization: Approximate large margin methods for structured prediction. In ICML, Bonn, Germany.
- Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. (2005). Lifted first-order probabilistic inference. In IJCAI, pages 1319–1325.
- Pascal Denis and Jason Baldridge. (2007). A ranking approach to pronoun resolution. In IJCAI.
- Jenny Rose Finkel, Trond Grenager, and Christopher D. Manning. (2005). Incorporating non-local information into information extraction systems by gibbs sampling. In ACL, pages 363–370.
- H. Gaifman. (1964). Concerning measures in first order calculi. Israel J. Math, 2:1–18.
- J. Y. Halpern. (1990). An analysis of first-order logics of probability. Artificial Intelligence, 46:311–350.
- Xiaoqiang Luo, Abe Ittycheriah, Hongyan Jing, Nanda Kambhatla, and Salim Roukos. (2004). A mention-synchronous coreference resolution algorithm based on the Bell tree. In ACL, page 135.
- Andrew McCallum and Ben Wellner. (2003). Toward conditional models of identity uncertainty with application to proper noun coreference. In IJCAI Workshop on Information Integration on the Web.
- Andrew McCallum and Ben Wellner. (2005). Conditional models of identity uncertainty with application to noun coreference. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, NIPS17. MIT Press, Cambridge, MA.
- Brian Milch, Bhaskara Marthi, and Stuart Russell. (2004). BLOG: Relational modeling with unknown objects. In ICML 2004 Workshop on Statistical Relational Learning and Its Connections to Other Fields.
- Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. (2005). BLOG: Probabilistic models with unknown objects. In IJCAI.
- Vincent Ng and Claire Cardie. (2002). Improving machine learning approaches to coreference resolution. In ACL.
- Vincent Ng. (2005). Machine learning for coreference resolution: From local classification to global ranking. In ACL.
- Cristina Nicolae and Gabriel Nicolae. (2006). Bestcut: A graph algorithm for coreference resolution. In EMNLP, pages 275–283, Sydney, Australia, July. Association for Computational Linguistics.
- Mark A. Paskin. (2002). Maximum entropy probabilistic logic. Technical Report UCB/CSD-01-1161, University of California, Berkeley.
- D. Poole. (2003). First-order probabilistic inference. In IJCAI, pages 985–991, Acapulco, Mexico. Morgan Kaufman.
- Matthew Richardson and Pedro Domingos. (2006). Markov logic networks. Machine Learning, 62:107–136.
- Dan Roth and W. Yih. (2004). A linear programming formulation for global inference in natural language tasks. In The 8th Conference on Computational Natural Language Learning, May.
- Parag Singla and Pedro Domingos. (2005). Discriminative training of Markov logic networks. In AAAI, Pittsburgh, PA.
- Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. (2001). A machine learning approach to coreference resolution of noun phrases. Comput. Linguist., 27(4):521–544.
- Charles Sutton and Andrew McCallum. (2004). Collective segmentation and labeling of distant entities in information extraction. Technical Report TR # 04-49, University of Massachusetts, July.
- Charles Sutton and Andrew McCallum. (2005). Piecewise training of undirected models. In 21st Conference on Uncertainty in Artificial Intelligence.
|Author||Aron Culotta +, Michael Wick +, Robert Hall + and Andrew McCallum +|
|journal||Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics +|
|title||First-Order Probabilistic Models for Coreference Resolution +|