2006 PreemptiveIEUsingUnresRelDiscov

Subject Headings: Unrestricted Relation Discovery Task, Open Information Extraction.


Cited By


  • (Banko et al., 2007) ⇒ Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. (2007). “Open Information Extraction from the Web.” In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2007).
    • Also this year, (Shinyama & Sekine, 2006) described an approach to “unrestricted relation discovery” that was developed independently of our work, and tested on a collection of 28,000 newswire articles. This work contains the important idea of avoiding relation-specificity, but does not scale to the Web as explained below. Given a collection of documents, their system first performs clustering of the entire set of articles, partitioning the corpus into sets of articles believed to discuss similar topics. Within each cluster, named-entity recognition, co-reference resolution and deep linguistic parse structures are computed and then used to automatically identify relations between sets of entities. This use of “heavy” linguistic machinery would be problematic if applied to the Web. Shinyama and Sekine’s system, which uses pairwise vector-space clustering, initially requires an O(D^2) effort where D is the number of documents. Each document assigned to a cluster is then subject to linguistic processing, potentially resulting in another pass through the set of input documents. This is far more expensive for large document collections than TEXTRUNNER’s O(D + T log T) runtime as presented earlier. From a collection of 28,000 newswire articles, Shinyama and Sekine were able to discover 101 relations. While it is difficult to measure the exact number of relations found by TEXTRUNNER on its 9,000,000 Web page corpus, it is at least two or three orders of magnitude greater than 101.
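
To make the quadratic cost mentioned in the quotation concrete, the following is a minimal sketch of pairwise vector-space comparison. It is not Shinyama and Sekine's implementation: the whitespace tokenizer, the raw term-count vectors, and the function names (`bag_of_words`, `cosine`, `pairwise_similarities`, `comparison_count`) are all assumptions made for illustration. It only shows why comparing every document with every other document requires D(D - 1)/2 similarity computations.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def bag_of_words(text):
    """Map lowercased whitespace tokens to their counts (illustrative tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pairwise_similarities(docs):
    """Compare every unordered pair of documents: D * (D - 1) / 2 cosine calls."""
    vectors = [bag_of_words(d) for d in docs]
    return {
        (i, j): cosine(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


def comparison_count(num_docs):
    """Number of unordered document pairs for a corpus of num_docs documents."""
    return num_docs * (num_docs - 1) // 2
```

Under these assumptions, `comparison_count(28000)` gives roughly 3.9 × 10^8 pairs for the newswire collection, while `comparison_count(9000000)` gives roughly 4 × 10^13 pairs for a corpus the size of the one TEXTRUNNER processed, which is the scaling gap the quotation points to.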




References

  • Eugene Agichtein and L. Gravano. (2000). Snowball: Extracting Relations from Large Plaintext Collections. In: Proceedings of the 5th ACM International Conference on Digital Libraries (DL-00).
  • Sergey Brin. (1998). Extracting Patterns and Relations from the World Wide Web. In: WebDB Workshop at EDBT ’98.
  • Eugene Charniak. (2000). A maximum-entropy-inspired parser. In: Proceedings of NAACL-2000.
  • Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman. (2004). Discovering relations among named entities from large corpora. In: Proceedings of the Annual Meeting of Association of Computational Linguistics (ACL-04).
  • Adam Meyers, Ralph Grishman, Michiko Kosaka, and Shubin Zhao. (2001a). Covering Treebanks with GLARF. In: ACL/EACL Workshop on Sharing Tools and Resources for Research and Education.
  • Adam Meyers, Michiko Kosaka, Satoshi Sekine, Ralph Grishman, and Shubin Zhao. (2001b). Parsing and GLARFing. In: Proceedings of RANLP-2001, Tzigov Chark, Bulgaria.
  • Deepak Ravichandran and Eduard Hovy. (2002). Learning surface text patterns for a question answering system. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL).
  • Ellen Riloff. (1996). Automatically Generating Extraction Patterns from Untagged Text. In: Proceedings of the 13th National Conference on Artificial Intelligence (AAAI-96).
  • Kiyoshi Sudo, Satoshi Sekine, and Ralph Grishman. (2003). An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition. In: Proceedings of the Annual Meeting of Association of Computational Linguistics (ACL-03).
  • Roman Yangarber, Ralph Grishman, Pasi Tapanainen, and Silja Huttunen. (2000). Unsupervised Discovery of Scenario-level Patterns for Information Extraction. In: Proceedings of the 18th International Conference on Computational Linguistics (COLING-00).


2006 PreemptiveIEUsingUnresRelDiscov
Author: Satoshi Sekine, Yusuke Shinyama
Title: Preemptive Information Extraction Using Unrestricted Relation Discovery
URL: http://cs.nyu.edu/yusuke/research/hlt-naacl-2006-preemptive-information-extraction-using-unrestricted-relation-discovery-paper.pdf
Year: 2006