2007 LearningToExtrRelsFromTheWeb

Subject Headings: Relation Detection from Text Algorithm, Semi-supervised Learning


  • Surprised that they do not reference Snowball.



We present a new approach to relation extraction that requires only a handful of training examples. Given a few pairs of named entities known to exhibit or not exhibit a particular relation, bags of sentences containing the pairs are extracted from the web. We extend an existing relation extraction method to handle this weaker form of supervision, and present experimental results demonstrating that our approach can reliably extract relations from web documents.


A growing body of recent work in information extraction has addressed the problem of relation extraction (RE), identifying relationships between entities stated in text, such as LivesIn(Person, Location) or EmployedBy(Person, Company). Supervised learning has been shown to be effective for RE (Zelenko et al., 2003; Culotta and Sorensen, 2004; Bunescu and Mooney, 2006); however, annotating large corpora with examples of the relations to be extracted is expensive and tedious.

[...] a search engine is used to find sentences on the web that mention both of the entities in each of the pairs. Although not all of the sentences for positive pairs will state the desired relationship, many of them will. Presumably, none of the sentences for negative pairs state the targeted relation.

Multiple instance learning (MIL) is a machine learning framework that exploits this sort of weak supervision, in which a positive bag is a set of instances which is guaranteed to contain at least one positive example, and a negative bag is a set of instances all of which are negative.

2 Problem Description

We address the task of learning a relation extraction system targeted to a fixed binary relationship R. The only supervision given to the learning algorithm is a small set of pairs of named entities that are known to belong (positive) or not belong (negative) to the given relationship. Table 1 shows four positive and two negative example pairs for the corporate acquisition relationship. For each pair, a bag of sentences containing the two arguments can be extracted from a corpus of text documents. The corpus is assumed to be sufficiently large and diverse such that, if the pair is positive, it is highly likely that the corresponding bag contains at least one sentence that explicitly asserts the relationship R between the two arguments. In Section 6 we describe a method for extracting bags of relevant sentences from the web.

Table 1:

+/−   Arg a1          Arg a2
+     Google          YouTube
+     Adobe Systems   Macromedia
+     Viacom          DreamWorks
+     Novartis        Eon Labs
−     Yahoo           Microsoft
−     Pfizer          Teva
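The bag construction described in this section can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy `CORPUS`, the `build_bags` helper, and the substring matching are all hypothetical, whereas the paper retrieves sentences from the web with a search engine.

```python
# Hypothetical toy corpus; the paper collects sentences from the web instead.
CORPUS = [
    "Google has bought video-sharing website YouTube.",
    "Google and YouTube will merge their expertise.",
    "Yahoo and Microsoft held talks last year.",
    "Pfizer reported quarterly earnings.",
]

# (label, arg1, arg2): +1 for a positive pair, -1 for a negative pair.
PAIRS = [(+1, "Google", "YouTube"), (-1, "Yahoo", "Microsoft")]


def build_bags(pairs, corpus):
    """Map each entity pair to (label, sentences mentioning both arguments)."""
    bags = {}
    for label, a1, a2 in pairs:
        # Naive substring match; a real system would use named-entity matching.
        sentences = [s for s in corpus if a1 in s and a2 in s]
        bags[(a1, a2)] = (label, sentences)
    return bags


bags = build_bags(PAIRS, CORPUS)
for pair, (label, sentences) in bags.items():
    print(pair, label, len(sentences))
```

Note that the positive bag for (Google, YouTube) contains one sentence that asserts the acquisition and one that does not, which is exactly the weak supervision the learner has to cope with.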

Using a limited set of entity pairs (e.g. those in Table 1) and their associated bags as training data, the aim is to induce a relation extraction system that can reliably decide whether two entities mentioned in the same sentence exhibit the target relationship or not. In particular, when tested on the example sentences from Figure 1, the system should classify S1, S3, and S4 as positive, and S2 and S5 as negative.

+/S1: Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.
−/S2: The companies will merge Google’s search expertise with YouTube’s video expertise, pushing what executives believe is a hot emerging market of video offered over the Internet.
+/S3: Google has acquired social media company, YouTube for 1.65 billion in a stock-for-stock transaction as announced by Google Inc. on October 9, 2006.
+/S4: Drug giant Pfizer Inc. has reached an agreement to buy the private biotechnology firm Rinat Neuroscience Corp., the companies announced Thursday.
−/S5: He has also received consulting fees from Alpharma, Eli Lilly and Company, Pfizer, Wyeth Pharmaceuticals, Rinat Neuroscience, Elan Pharmaceuticals, and Forest Laboratories.

Figure 1: Sentence examples

  • As formulated above, the learning task can be seen as an instance of multiple instance learning. However, there are important properties that set it apart from problems previously considered in MIL. The most distinguishing characteristic is that the number of bags is very small, while the average size of the bags is very large.

3 Multiple Instance Learning

Since its introduction by Dietterich et al. (1997), an extensive and quite diverse set of methods has been proposed for solving the MIL problem. For the task of relation extraction, we consider only MIL methods where the decision function can be expressed in terms of kernels computed between bag instances. This choice was motivated by the comparatively high accuracy obtained by kernel-based SVMs when applied to various natural language tasks, and in particular to relation extraction. Through the use of kernels, SVMs (Vapnik, 1998; Schölkopf and Smola, 2002) can work efficiently with instances that implicitly belong to a high dimensional feature space. When used for classification, the decision function computed by the learning algorithm is equivalent to a hyperplane in this feature space. Overfitting is avoided in the SVM formulation by requiring that positive and negative training instances be maximally separated by the decision hyperplane.

Gartner et al. (2002) adapted SVMs to the MIL setting using various multi-instance kernels. Two of these – the normalized set kernel, and the statistic kernel – have been experimentally compared to other methods by Ray and Craven (2005), with competitive results. Alternatively, a simple approach to MIL is to transform it into a standard supervised learning problem by labeling all instances from positive bags as positive. An interesting outcome of the study conducted by Ray and Craven (2005) was that, despite the class noise in the resulting positive examples, such a simple approach often obtains competitive results when compared against other more sophisticated MIL methods.
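The simple transformation mentioned above, in which every instance inherits the label of its bag, might look like the sketch below. The `flatten_bags` helper and the toy sentences are hypothetical and serve only to show where the class noise comes from.

```python
def flatten_bags(bags):
    """Turn labeled bags into instance-level training data.

    Every instance receives its bag's label, so instances from positive
    bags that do not state the relation become noisy positive examples.
    """
    instances, labels = [], []
    for bag_label, bag_instances in bags:
        for instance in bag_instances:
            instances.append(instance)
            labels.append(bag_label)
    return instances, labels


# Toy bags (made-up sentences): one positive bag and one negative bag.
toy_bags = [
    (+1, ["Google has bought YouTube.",
          "Google and YouTube will merge their expertise."]),
    (-1, ["Yahoo and Microsoft held talks."]),
]

X, y = flatten_bags(toy_bags)
print(y)
```

The second sentence of the positive bag is labeled positive even though it does not assert an acquisition; this is the class noise that any standard supervised learner trained on `X, y` must tolerate.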

We believe that an MIL method based on multi-instance kernels is not appropriate for training datasets that contain just a few, very large bags. In a multi-instance kernel approach, only bags (and not instances) are considered as training examples.
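To illustrate why bags rather than instances act as training examples in a multi-instance kernel approach, here is a minimal sketch of a normalized set kernel. Several choices are assumptions made for illustration: the word-overlap `instance_kernel`, the feature-space normalization (one of several possible normalizations), and the toy bags. None of this is the kernel used in the paper.

```python
import math


def instance_kernel(x, y):
    """Toy instance kernel (assumption): number of shared word types."""
    return float(len(set(x.lower().split()) & set(y.lower().split())))


def set_kernel(bag_x, bag_y, k=instance_kernel):
    """Sum the instance kernel over all cross-bag instance pairs."""
    return sum(k(x, y) for x in bag_x for y in bag_y)


def normalized_set_kernel(bag_x, bag_y, k=instance_kernel):
    """Set kernel divided by the geometric mean of the self-similarities."""
    denom = math.sqrt(set_kernel(bag_x, bag_x, k) * set_kernel(bag_y, bag_y, k))
    return set_kernel(bag_x, bag_y, k) / denom if denom > 0 else 0.0


# Made-up bags of sentences.
toy_bags = [
    ["google has bought youtube", "google acquired youtube"],
    ["yahoo and microsoft held talks"],
]

# The Gram matrix has one row and one column per bag, however large the bags.
gram = [[normalized_set_kernel(a, b) for b in toy_bags] for a in toy_bags]
print(gram)
```

With only a handful of bags, an SVM trained on such a Gram matrix sees only a handful of training examples, no matter how many sentences each bag holds.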

10 Conclusion

We have presented a new approach to relation extraction that leverages the vast amount of information available on the web. The new RE system is trained using only a handful of entity pairs known to exhibit and not exhibit the target relationship. We have extended an existing relation extraction kernel to learn in this setting and to resolve problems caused by the minimal supervision provided. Experimental results demonstrate that the new approach can reliably extract relations from web documents.


  • Stuart Andrews, Ioannis Tsochantaridis, and Thomas Hofmann. (2003). Support vector machines for multiple-instance learning. In NIPS 15, pages 561–568, Vancouver, BC. MIT Press.
  • Collin F. Baker, Charles J. Fillmore, and John B. Lowe. (1998). The Berkeley FrameNet project. In: Proceedings of COLING–ACL ’98, pages 86–90, San Francisco, CA. Morgan Kaufmann Publishers.
  • Razvan C. Bunescu and Raymond Mooney. (2006). Subsequence kernels for relation extraction. In Yair Weiss, Bernhard Schölkopf, and J. Platt, editors, NIPS 18.
  • M. Craven and J. Kumlien. (1999). Constructing biological knowledge bases by extracting information from text sources. In: Proceedings of ISMB’99, pages 77–86, Heidelberg, Germany.
  • Aron Culotta and Jeffrey Sorensen. (2004). Dependency tree kernels for relation extraction. In: Proceedings of ACL’04, pages 423–429, Barcelona, Spain, July.
  • Thomas G. Dietterich, Richard H. Lathrop, and Tomas Lozano-Perez. (1997). Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71.
  • Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. (2005). Unsupervised named-entity extraction from the web: an experimental study. Artificial Intelligence, 165(1):91–134.
  • T. Gartner, P.A. Flach, A. Kowalczyk, and A.J. Smola. (2002). Multi-instance kernels. In: Proceedings of ICML’02, pages 179–186, Sydney, Australia, July. Morgan Kaufmann.
  • Marti Hearst. (1992). Automatic acquisition of hyponyms from large text corpora. In: Proceedings of ACL’92, Nantes, France.
  • Judea Pearl. (1986). Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3):241–288.
  • Soumya Ray and Mark Craven. (2005). Supervised versus multiple instance learning: An empirical comparison. In: Proceedings of ICML’05, pages 697–704, Bonn, Germany.
  • Bernhard Schölkopf and Alexander J. Smola. (2002). Learning with kernels - support vector machines, regularization, optimization and beyond. MIT Press, Cambridge, MA.
  • N. A. Smith and J. Eisner. (2005). Contrastive estimation: Training Log-Linear Models on Unlabeled Data. In: Proceedings of ACL’05, pages 354–362, Ann Arbor, Michigan.
  • Vladimir N. Vapnik. (1998). Statistical Learning Theory. John Wiley & Sons.
  • D. Zelenko, C. Aone, and A. Richardella. (2003). Kernel methods for relation extraction. Journal of Machine Learning Research, 3:1083–1106.
  • Q. Zhang, S. A. Goldman, W. Yu, and J. Fritts. (2002). Content-based image retrieval using multiple-instance learning. In: Proceedings of ICML’02, pages 682–689.



@InProceedings{2007_LearningToExtrRelsFromTheWeb,
 author    = {Bunescu, Razvan  and Mooney, Raymond},
 title     = {Learning to Extract Relations from the Web using Minimal Supervision},
 booktitle = {Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
 month     = {June},
 year      = {2007},
 address   = {Prague, Czech Republic},
 publisher = {Association for Computational Linguistics},
 pages     = {576--583},
 url       = {http://www.aclweb.org/anthology/P/P07/P07-1073}
}

  • Author: Razvan C. Bunescu, Raymond J. Mooney
  • Title: Learning to Extract Relations from the Web using Minimal Supervision
  • Venue: Proceedings of 2007 ACL Conference
  • URL: http://acl.ldc.upenn.edu/P/P07/P07-1073.pdf
  • Year: 2007