Subject Headings: Record Canonicalization Algorithm, Record Deduplication Algorithm, CORA Citation Matching Benchmark Task, Coreference Resolution System.




Recently, many advanced machine learning approaches have been proposed for coreference resolution; however, all of the discriminatively trained models reason over mentions rather than entities. That is, they do not explicitly contain variables indicating the "canonical" value of each attribute of an entity (e.g., name, venue, title). This canonicalization step is typically implemented as a post-processing routine that runs after coreference resolution and before the extracted entity is added to a database. In this paper, we propose a discriminatively trained model that jointly performs coreference resolution and canonicalization, enabling features over hypothesized entities. We validate our approach on two different coreference problems, newswire anaphora resolution and research paper citation matching, demonstrating improvements on both tasks and achieving an error reduction of up to 62% compared with a method that reasons about mentions only.
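
To make "canonicalization" and "features over hypothesized entities" concrete, here is a minimal sketch. It assumes each mention is a dictionary of attribute strings and that the canonical value of an attribute is the medoid of its observed values under Levenshtein edit distance. This is an illustrative heuristic of the post-processing kind the abstract describes, not the jointly trained model proposed in the paper. The names `canonicalize` and `title_agreement` and the example records are hypothetical.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def canonicalize(mentions: list[dict[str, str]]) -> dict[str, str]:
    """Hypothetical canonicalizer: for each attribute, pick the observed value
    with the smallest total edit distance to all other observed values
    (ties broken lexicographically so the result is deterministic)."""
    values = defaultdict(list)
    for m in mentions:
        for attr, val in m.items():
            if val:
                values[attr].append(val)
    return {
        attr: min(vals, key=lambda v: (sum(edit_distance(v, w) for w in vals), v))
        for attr, vals in values.items()
    }


def title_agreement(mentions: list[dict[str, str]], canonical: dict[str, str]) -> float:
    """Hypothetical entity-level feature: fraction of mentions whose title
    exactly matches the entity's canonical title."""
    titles = [m["title"] for m in mentions if m.get("title")]
    if not titles:
        return 0.0
    return sum(t == canonical.get("title") for t in titles) / len(titles)


# Hypothetical cluster of citation mentions hypothesized to be one entity.
cluster = [
    {"author": "M. Wick", "title": "An Entity Based Model for Coreference Resolution"},
    {"author": "Michael Wick", "title": "An entity based model for coreference resolution"},
    {"author": "Michael Wick", "title": "An Entity Based Model for Coreference Resolution"},
]
canonical = canonicalize(cluster)
print(canonical)
print(title_agreement(cluster, canonical))
```

Because `title_agreement` is computed over the whole cluster and its canonical record, it cannot be expressed as a feature on a single pair of mentions. This is the kind of entity-level evidence that a mention-pair model lacks and that a joint model can score while deciding coreference.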


2009 AnEntityBasedModelForCorefResolution

Author: Michael Wick, Aron Culotta, Khashayar Rohanimanesh, Andrew McCallum
Title: An Entity Based Model for Coreference Resolution
Journal title: Proceedings of the SIAM International Conference on Data Mining
URL: http://maroo.cs.umass.edu/pub/web/getpdf.php?id=862
Year: 2009