- (Shinyama & Sekine, 2006) ⇒ Yusuke Shinyama, Satoshi Sekine. (2006). “Preemptive Information Extraction Using Unrestricted Relation Discovery.” In: Proceedings of the HLT-NAACL Conference (HLT-NAACL 2006).
- (Banko et al., 2007) ⇒ Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. (2007). “Open Information Extraction from the Web.” In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2007).
- Also this year, (Shinyama & Sekine, 2006) described an approach to “unrestricted relation discovery” that was developed independently of our work, and tested on a collection of 28,000 newswire articles. This work contains the important idea of avoiding relation-specificity, but does not scale to the Web as explained below. Given a collection of documents, their system first performs clustering of the entire set of articles, partitioning the corpus into sets of articles believed to discuss similar topics. Within each cluster, named-entity recognition, co-reference resolution and deep linguistic parse structures are computed and then used to automatically identify relations between sets of entities. This use of “heavy” linguistic machinery would be problematic if applied to the Web. Shinyama and Sekine’s system, which uses pairwise vector-space clustering, initially requires an O(D²) effort where D is the number of documents. Each document assigned to a cluster is then subject to linguistic processing, potentially resulting in another pass through the set of input documents. This is far more expensive for large document collections than TEXTRUNNER’s O(D + T log T) runtime as presented earlier. From a collection of 28,000 newswire articles, Shinyama and Sekine were able to discover 101 relations. While it is difficult to measure the exact number of relations found by TEXTRUNNER on its 9,000,000 Web page corpus, it is at least two or three orders of magnitude greater than 101.
- We are trying to extend the boundary of Information Extraction (IE) systems. Existing IE systems require a lot of time and human effort to tune for a new scenario. Preemptive Information Extraction is an attempt to automatically create all feasible IE systems in advance without human intervention. We propose a technique called Unrestricted Relation Discovery that discovers all possible relations from texts and presents them as tables. We present a preliminary system that obtains reasonably good results.
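The quadratic cost that the (Banko et al., 2007) quote attributes to pairwise vector-space clustering can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words vectors, the cosine measure, and the helper names `bag_of_words`, `cosine`, `pairwise_similarities`, and `comparison_count` are all assumptions chosen for illustration. The point is only that comparing every document with every other document needs D(D-1)/2 similarity computations, which is O(D²).

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Hypothetical vectorizer: lowercase whitespace tokens with counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(docs):
    """Compare every unordered pair of documents: D(D-1)/2 comparisons."""
    vectors = [bag_of_words(d) for d in docs]
    return {
        (i, j): cosine(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


def comparison_count(num_docs):
    """Number of unordered document pairs for a corpus of num_docs documents."""
    return num_docs * (num_docs - 1) // 2


if __name__ == "__main__":
    docs = ["a b", "b c", "c d"]  # toy corpus, assumed for illustration
    print(len(pairwise_similarities(docs)))
    print(comparison_count(28_000))
    print(comparison_count(9_000_000))
```

For the 28,000-article newswire collection, this amounts to about 392 million document pairs; for a 9,000,000-page corpus it is roughly 4 × 10^13 pairs, which is the scaling concern raised in the quote.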
- Eugene Agichtein and L. Gravano. (2000). Snowball: Extracting Relations from Large Plaintext Collections. In: Proceedings of the 5th ACM International Conference on Digital Libraries (DL-00).
- Sergey Brin. (1998). Extracting Patterns and Relations from the World Wide Web. In: WebDB Workshop at EDBT ’98.
- Eugene Charniak. (2000). A maximum-entropy-inspired parser. In: Proceedings of NAACL-2000.
- Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman. (2004). Discovering relations among named entities from large corpora. In: Proceedings of the Annual Meeting of Association of Computational Linguistics (ACL-04).
- Adam Meyers, Ralph Grishman, Michiko Kosaka, and Shubin Zhao. (2001a). Covering Treebanks with GLARF. In: ACL/EACL Workshop on Sharing Tools and Resources for Research and Education.
- Adam Meyers, Michiko Kosaka, Satoshi Sekine, Ralph Grishman, and Shubin Zhao. (2001b). Parsing and GLARFing. In: Proceedings of RANLP-2001, Tzigov Chark, Bulgaria.
- Deepak Ravichandran and Eduard Hovy. (2002). Learning surface text patterns for a question answering system. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL).
- Ellen Riloff. (1996). Automatically Generating Extraction Patterns from Untagged Text. In: Proceedings of the 13th National Conference on Artificial Intelligence (AAAI-96).
- Kiyoshi Sudo, Satoshi Sekine, and Ralph Grishman. (2003). An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition. In: Proceedings of the Annual Meeting of Association of Computational Linguistics (ACL-03).
- Roman Yangarber, Ralph Grishman, Pasi Tapanainen, and Silja Huttunen. (2000). Unsupervised Discovery of Scenario-level Patterns for Information Extraction. In: Proceedings of the 18th International Conference on Computational Linguistics (COLING-00).
|2006 PreemptiveIEUsingUnresRelDiscov||Satoshi Sekine|
|Preemptive Information Extraction Using Unrestricted Relation Discovery||http://cs.nyu.edu/yusuke/research/hlt-naacl-2006-preemptive-information-extraction-using-unrestricted-relation-discovery-paper.pdf|