2008 The Tradeoffs Between Open and Traditional Relation Extraction


Subject Headings: Self-Supervised Information Extraction, Open Information Extraction.



Traditional Information Extraction (IE) takes a relation name and hand-tagged examples of that relation as input. Open IE is a relation-independent extraction paradigm that is tailored to massive and heterogeneous corpora such as the Web. An Open IE system extracts a diverse set of relational tuples from text without any relation-specific input. How is Open IE possible? We analyze a sample of English sentences to demonstrate that numerous relationships are expressed using a compact set of relation-independent lexico-syntactic patterns, which can be learned by an Open IE system.
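To make the idea of a relation-independent lexico-syntactic pattern concrete, the following sketch (not the paper's implementation; the tiny matcher and the pre-tagged input are assumptions) extracts (arg1, relation, arg2) tuples wherever a noun phrase is followed by a verb group and another noun phrase, with no relation-specific input:

```python
# Sketch of one relation-independent pattern -- "Entity Verb Entity" --
# matched over a POS-tagged sentence (Penn Treebank-style tags assumed).

def extract_verb_tuples(tagged):
    """Return (arg1, relation, arg2) tuples where a noun phrase is
    followed by a verb group and another noun phrase."""
    tuples = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1].startswith("NN"):          # first noun phrase
            j = i
            while j < n and tagged[j][1].startswith("NN"):
                j += 1
            arg1 = " ".join(tok for tok, _ in tagged[i:j])
            k = j
            while k < n and tagged[k][1].startswith("VB"):  # verb group
                k += 1
            if k > j:
                rel = " ".join(tok for tok, _ in tagged[j:k])
                m = k
                if m < n and tagged[m][1] == "DT":  # optional determiner
                    m += 1
                p = m
                while p < n and tagged[p][1].startswith("NN"):  # second NP
                    p += 1
                if p > m:
                    arg2 = " ".join(tok for tok, _ in tagged[m:p])
                    tuples.append((arg1, rel, arg2))
                    i = p
                    continue
        i += 1
    return tuples

sentence = [("Einstein", "NNP"), ("received", "VBD"),
            ("the", "DT"), ("Nobel", "NNP"), ("Prize", "NNP"),
            ("in", "IN"), ("1921", "CD")]
print(extract_verb_tuples(sentence))
# -> [('Einstein', 'received', 'Nobel Prize')]
```

The same matcher applies to any relation expressed this way, which is what makes the pattern relation-independent.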

What are the tradeoffs between Open IE and traditional IE? We consider this question in the context of two tasks. First, when the number of relations is massive, and the relations themselves are not pre-specified, we argue that Open IE is necessary. We then present a new model for Open IE called O-CRF and show that it achieves higher precision and nearly double the recall of the model employed by TEXTRUNNER, the previous state-of-the-art Open IE system. Second, when the number of target relations is small, and their names are known in advance, we show that O-CRF is able to match the precision of a traditional extraction system, though at substantially lower recall. Finally, we show how to combine the two types of systems into a hybrid that achieves higher precision than a traditional extractor, with comparable recall.


3.1.1 Training

As with O-NB, O-CRF’s training process is self-supervised. O-CRF applies a handful of relation-independent heuristics to the Penn Treebank and obtains a set of labeled examples in the form of relational tuples. The heuristics were designed to capture dependencies typically obtained via syntactic parsing and semantic role labeling. For example, a heuristic used to identify positive examples is the extraction of noun phrases participating in a subject-verb-object relationship, e.g., “<Einstein> received <the Nobel Prize> in 1921.” An example of a heuristic that locates negative examples is the extraction of objects that cross the boundary of an adverbial clause, e.g., “He studied <Einstein’s work> when visiting <Germany>.”
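The self-supervised labeling step can be rendered as a toy sketch. The field names and the flat candidate representation below are assumptions for illustration; in the actual system these signals would come from a syntactic parse of the Penn Treebank:

```python
# Toy rendering of self-supervised labeling: relation-independent
# heuristics decide whether a candidate tuple is a positive or
# negative training example. Candidate fields are hypothetical.

def label_candidate(cand):
    """cand: dict with 'arg1_role', 'arg2_role', and a
    'crosses_clause' flag, assumed to come from a parse."""
    # Negative: the object lies across an adverbial-clause boundary,
    # as in "He studied <Einstein's work> when visiting <Germany>."
    if cand["crosses_clause"]:
        return "negative"
    # Positive: noun phrases in a subject-verb-object configuration,
    # as in "<Einstein> received <the Nobel Prize> in 1921."
    if cand["arg1_role"] == "subject" and cand["arg2_role"] == "object":
        return "positive"
    return "skip"   # candidate contributes no training example

good = {"arg1_role": "subject", "arg2_role": "object", "crosses_clause": False}
bad = {"arg1_role": "subject", "arg2_role": "object", "crosses_clause": True}
print(label_candidate(good), label_candidate(bad))  # -> positive negative
```

Because the heuristics mention no specific relation, the resulting labeled tuples train a relation-independent extractor.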

4 Hybrid Relation Extraction

Since O-CRF and R1-CRF have complementary views of the extraction process, it is natural to wonder whether they can be combined to produce a more powerful extractor. In a variety of machine learning settings, the use of an ensemble of diverse classifiers during prediction has been observed to yield higher levels of performance compared to individual algorithms. We now describe an ensemble-based or hybrid approach to RE that leverages the different views offered by open, self-supervised extraction in O-CRF, and lexicalized, supervised extraction in R1-CRF.

4.1 Stacking

Stacked generalization, or stacking (Wolpert, 1992), is an ensemble-based framework in which the goal is to learn a meta-classifier from the output of several base-level classifiers. The training set used to train the meta-classifier is generated using a leave-one-out procedure: for each base-level algorithm, a classifier is trained from all but one training example and then used to generate a prediction for the left-out example. The meta-classifier is trained using the predictions of the base-level classifiers as features, and the true label as given by the training data. Previous studies (Ting and Witten, 1999; Zenko and Dzeroski, 2002; Sigletos et al., 2005) have shown that the probabilities of each class value as estimated by each base-level algorithm are effective features when training meta-learners. Stacking was shown to be consistently more effective than voting, another popular ensemble-based method in which the outputs of the base-classifiers are combined either through majority vote or by taking the class value with the highest average probability.
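The leave-one-out procedure can be sketched end to end. This is a generic illustration, not the paper's setup: the base learners here are toy nearest-centroid classifiers over different feature "views", and the meta-classifier is another nearest-centroid model trained on their held-out predictions:

```python
# Minimal sketch of stacked generalization with leave-one-out
# meta-training. Any base learners could be plugged in.

def train_centroid(X, y):
    """Per-class feature means; classify by nearest centroid."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict_centroid(cents, x):
    return min(cents, key=lambda lab: sum((a - b) ** 2
                                          for a, b in zip(x, cents[lab])))

def train_view(cols):
    """Base learner restricted to one view (a subset of features)."""
    def train(X, y):
        return (cols, train_centroid([[x[c] for c in cols] for x in X], y))
    return train

def predict_view(model, x):
    cols, cents = model
    return predict_centroid(cents, [x[c] for c in cols])

def leave_one_out_meta(X, y, trainers):
    """Meta-training set: each base learner's held-out prediction."""
    meta_X = []
    for i in range(len(X)):
        Xi, yi = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        models = [train(Xi, yi) for train in trainers]
        meta_X.append([float(predict_view(m, X[i])) for m in models])
    return meta_X

# Toy data: two well-separated classes, two feature views.
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1], [0.1, 0.2], [1.1, 1.0]]
y = [0, 0, 1, 1, 0, 1]
trainers = [train_view([0]), train_view([1])]

meta = train_centroid(leave_one_out_meta(X, y, trainers), y)
bases = [train(X, y) for train in trainers]   # retrain bases on all data

def predict_stacked(x):
    return predict_centroid(meta, [float(predict_view(m, x)) for m in bases])

print(predict_stacked([0.05, 0.05]), predict_stacked([1.0, 1.0]))  # -> 0 1
```

The leave-one-out step matters: training the meta-classifier on in-sample base predictions would let it overfit to the base learners' training-set accuracy rather than their generalization behavior.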

4.2 Stacked Relation Extraction

We used the stacking methodology to build an ensemble-based extractor, referred to as H-CRF. Treating the output of O-CRF and R1-CRF as black boxes, H-CRF learns to predict which, if any, tokens found between a pair of entities [math]\displaystyle{ (e_1,e_2) }[/math] indicate a relationship. Due to the sequential nature of our RE task, H-CRF employs a CRF as the meta-learner, as opposed to a decision tree or regression-based classifier. H-CRF uses the probability distribution over the set of possible labels according to each of O-CRF and R1-CRF as features. To obtain the probability at each position of a linear-chain CRF, the constrained forward-backward technique described in (Culotta and McCallum, 2004) is used. H-CRF also computes the Monge-Elkan distance (Monge and Elkan, 1996) between the relations predicted by O-CRF and R1-CRF and includes the result in the feature set. An additional meta-feature utilized by H-CRF indicates whether either or both base extractors return “no relation” for a given pair of entities. In addition to these numeric features, H-CRF uses a subset of the base features used by O-CRF and R1-CRF: at each given position [math]\displaystyle{ i }[/math] between [math]\displaystyle{ e_1 }[/math] and [math]\displaystyle{ e_2 }[/math], it uses the presence of the word observed at [math]\displaystyle{ i }[/math] as a feature, as well as the presence of the part-of-speech tag at [math]\displaystyle{ i }[/math].
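A sketch of assembling these meta-features for one token position follows. Everything here is an assumption for illustration: the function signature is invented, and difflib's string-similarity ratio stands in for whatever secondary token similarity the actual Monge-Elkan implementation used:

```python
# Sketch of H-CRF-style meta-features for one position: each base
# extractor's label distribution, a Monge-Elkan similarity between the
# two predicted relation strings, "no relation" flags, and the base
# lexical features (word and POS tag) at that position.

import difflib

def monge_elkan(a, b):
    """Average, over tokens of a, of the best per-token similarity to
    any token of b (difflib ratio as the secondary similarity)."""
    ta, tb = a.split(), b.split()
    if not ta or not tb:
        return 0.0
    best = [max(difflib.SequenceMatcher(None, x, y).ratio() for y in tb)
            for x in ta]
    return sum(best) / len(best)

def meta_features(o_probs, r_probs, o_rel, r_rel, word, tag):
    feats = list(o_probs) + list(r_probs)               # label distributions
    feats.append(monge_elkan(o_rel or "", r_rel or "")) # relation agreement
    feats.append(1.0 if o_rel is None else 0.0)         # O-CRF: no relation
    feats.append(1.0 if r_rel is None else 0.0)         # R1-CRF: no relation
    # Base lexical features at this position (a real CRF would encode
    # these as binary indicator features rather than raw values).
    feats.extend([word, tag])
    return feats

print(monge_elkan("was born in", "born in"))
```

Note the asymmetry of Monge-Elkan: it averages over the tokens of the first argument, so similar relation phrases of different lengths still score highly when one is contained in the other.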

Category      O-CRF (P / R / F1)    O-NB (P / R / F1)
Verb          93.9 / 65.1 / 76.9    100  / 38.6 / 55.7
Noun+Prep     89.1 / 36.0 / 51.3    100  / 9.7  / 17.7
Verb+Prep     95.2 / 50.0 / 65.6    95.2 / 25.3 / 40.0
Infinitive    95.7 / 46.8 / 62.9    100  / 25.5 / 40.6
Other         0    / 0    / 0       0    / 0    / 0
All           88.3 / 45.2 / 59.8    86.6 / 23.2 / 36.6

Table 2: Open Extraction by Relation Category. O-CRF outperforms O-NB, obtaining nearly double its recall and increased precision. O-CRF’s gains are partly due to its lower false positive rate for relationships categorized as “Other.”

Author(s): Michele Banko; Oren Etzioni
Title: The Tradeoffs Between Open and Traditional Relation Extraction
Year: 2008
URL: http://turing.cs.washington.edu/papers/acl08.pdf