2007 UnsupervisedResOfObjectsAndRelsOnTheWeb

From GM-RKB

Subject Headings: Unsupervised Entity Mention Coreference Resolution Algorithm.

Notes

Cited By

Quotes

Abstract

  • The task of identifying synonymous relations and objects, or Synonym Resolution (SR), is critical for high-quality information extraction. The bulk of previous SR work assumed strong domain knowledge or hand-tagged training examples. This paper investigates SR in the context of unsupervised information extraction, where neither is available. The paper presents a scalable, fully-implemented system for SR that runs in O(KN log N) time in the number of extractions N and the maximum number of synonyms per word, K. The system, called RESOLVER, introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them. Given two million assertions extracted from the Web, RESOLVER resolves objects with 78% precision and an estimated 68% recall and resolves relations with 90% precision and 35% recall.
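The idea of judging two strings by "the similarity of the assertions containing them" can be illustrated with a small sketch. This is not RESOLVER's probabilistic relational model: a plain Jaccard overlap is swapped in as a stand-in, and the names `object_properties` and `shared_property_score`, the sample assertions, and the (relation, position, other argument) encoding of a "property" are all assumptions made for illustration.

```python
from collections import defaultdict


def object_properties(assertions):
    """Map each object string to the contexts (properties) it appears in.

    A property is the assertion with the object itself removed:
    (relation, argument position of the object, the other argument).
    """
    props = defaultdict(set)
    for relation, arg1, arg2 in assertions:
        props[arg1].add((relation, 1, arg2))
        props[arg2].add((relation, 2, arg1))
    return props


def shared_property_score(props, s1, s2):
    """Jaccard overlap of two strings' property sets (illustrative stand-in)."""
    a = props.get(s1, set())
    b = props.get(s2, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical toy assertions in (relation, arg1, arg2) form.
assertions = [
    ("is capital of", "D.C.", "United States"),
    ("is capital of", "Washington", "United States"),
    ("is home to", "D.C.", "the White House"),
    ("is home to", "Washington", "the White House"),
    ("is capital of", "Paris", "France"),
]

props = object_properties(assertions)
score_same = shared_property_score(props, "D.C.", "Washington")
score_diff = shared_property_score(props, "D.C.", "Paris")
```

On this toy data the two names for the same city share every property, while "D.C." and "Paris" share none. A real system would need a model that accounts for how informative each shared property is, which a raw overlap ratio does not.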

1. Introduction

  • Web Information Extraction (WIE) systems extract assertions that describe a relation and its arguments from Web text (e.g., (is capital of, D.C., United States)). WIE systems can extract hundreds of millions of assertions containing millions of different strings from the Web (e.g., the TEXTRUNNER system (Banko et al., 2007)). WIE systems often extract assertions that describe the same real-world object or relation using different names. For example, a WIE system might extract (is capital city of, Washington, U.S.), which describes the same relationship as above but contains a different name for the relation and each argument.
  • Synonyms are prevalent in text, and the Web corpus is no exception. Our data set of two million assertions extracted from a Web crawl contained over a half-dozen different names each for the United States and Washington, D.C., and three for the “is capital of” relation. The top 80 most commonly extracted objects had an average of 2.9 extracted names per entity, and several had as many as 10 names. The top 100 most commonly extracted relations had an average of 4.9 synonyms per relation.
  • We refer to the problem of identifying synonymous object and relation names as Synonym Resolution (SR). An SR system for WIE takes a set of assertions as input and returns a set of clusters, with each cluster containing coreferential object strings or relation strings. Previous techniques for SR have focused on one particular aspect of the problem, either objects or relations. In addition, the techniques either depend on a large set of training examples, or are tailored to a specific domain by assuming knowledge of the domain’s schema. …
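The input/output contract described above (a set of assertions in, clusters of object strings and relation strings out) might be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the function names `resolve` and `_cluster`, the `min_shared` threshold, and the sample data are hypothetical, and a fixed shared-context count replaces RESOLVER's probabilistic scoring. The sketch also makes no attempt to reproduce the O(KN log N) bound; comparing all pairs inside a large bucket is quadratic in that bucket's size.

```python
from collections import defaultdict
from itertools import combinations


def _cluster(index, min_shared):
    """Union-find merge of strings that co-occur in >= min_shared buckets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    shared = defaultdict(int)
    for strings in index.values():
        for s in strings:
            find(s)  # register every string, even singletons
        for s, t in combinations(sorted(strings), 2):
            shared[(s, t)] += 1

    for (s, t), count in shared.items():
        if count >= min_shared:
            parent[find(s)] = find(t)

    clusters = defaultdict(set)
    for s in list(parent):
        clusters[find(s)].add(s)
    return sorted(sorted(c) for c in clusters.values())


def resolve(assertions, min_shared=2):
    """Return (object_clusters, relation_clusters) for (relation, arg1, arg2) triples."""
    obj_index = defaultdict(set)  # shared context -> object strings seen in it
    rel_index = defaultdict(set)  # argument pair  -> relation strings seen with it
    for relation, arg1, arg2 in assertions:
        obj_index[(relation, 1, arg2)].add(arg1)
        obj_index[(relation, 2, arg1)].add(arg2)
        rel_index[(arg1, arg2)].add(relation)
    return _cluster(obj_index, min_shared), _cluster(rel_index, min_shared)


# Hypothetical toy assertions.
assertions = [
    ("is capital of", "D.C.", "United States"),
    ("is capital of", "Washington", "United States"),
    ("is capital city of", "D.C.", "United States"),
    ("is capital city of", "Washington", "United States"),
    ("is home to", "D.C.", "the White House"),
    ("is home to", "Washington", "the White House"),
    ("is capital of", "Paris", "France"),
]

object_clusters, relation_clusters = resolve(assertions)
```

Indexing strings by shared context means only strings that co-occur in at least one bucket are ever compared, which avoids an all-pairs pass over the vocabulary. Objects and relations are clustered independently here; the sketch does not feed merges from one side back into the other.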

References



2007 UnsupervisedResOfObjectsAndRelsOnTheWeb

  • Author: Alexander Yates, Oren Etzioni
  • Title: Unsupervised Resolution of Objects and Relations on the Web
  • Title URL: http://acl.ldc.upenn.edu/N/N07/N07-1016.pdf