2005 AShortestPathDependencyRelationExtraction

Jump to navigation Jump to search

Subject Headings: Dependency Grammar-based Relation Recognition Algorithm, Dependency Grammar-based Relation Recognition Classifier, ACE Benchmark Task, Relation Mention Recognition Algorithm.


Cited By

  • ~176 …



  • (Zhang et al., 2006b) ⇒ Min Zhang, Jie Zhang, and Jian Su. (2006). “Exploring Syntactic Features for Relation Extraction using a Convolution Tree Kernel.” In: Proceedings of HLT Conference (HLT 2006).
    • Bunescu and Mooney (2005) propose a shortest path dependency kernel for relation extraction. They argue that the information to model a relationship between entities is typically captured by the shortest path between the two entities in the dependency graph. Their kernel is very straightforward. It just sums up the number of common word classes at each position in the two paths. We notice that one issue of this kernel is that they limit the two paths must have the same length, otherwise the kernel similarity score is zero. Therefore, although this kernel shows non-trivial performance improvement than that of Culotta and Sorensen (2004), the constraint makes the two dependency kernels share the similar behavior: good precision but much lower recall on the ACE corpus.



We present a novel approach to relation extraction, based on the observation that the information required to assert a relationship between two named entities in the same sentence is typically captured by the shortest path between the two entities in the dependency graph. Experiments on extracting top-level relations from the ACE (Automated Content Extraction) newspaper corpus show that the new shortest path dependency kernel outperforms a recent approach based on dependency tree kernels.

Learning with Dependency Paths

This is a simple kernel, whose computation takes O(n) time. If the two paths have different lengths, they correspond to different ways of expressing a relationship – for instance, they may pass through a different number of predicate argument structures. Consequently, the kernel is defined to be 0 in this case. Otherwise, it is the product of the number of common word classes at each position in the two paths.

Experimental Evaluation

We applied the shortest path dependency kernel to the problem of extracting top-level relations from the ACE corpus (NIST, 2000), the version used for the September 2002 evaluation.

Extracting dependencies using a CFG parser

Local dependencies can be extracted from a CFG parse tree using simple heuristic rules for finding the head child for each type of constituent. Alternatively, head-modifier dependencies can be directly output by a parser whose model is based on lexical dependencies. In our experiments, we used the full parse output from Collins’ parser (Collins, 1997), in which every non-terminal node is already annotated with head information. Because local dependencies assemble into a tree for each sentence, there is only one (shortest) path between any two entities in a dependency tree.

Experimental Results

A recent approach to extracting relations is described in (Culotta and Sorensen, 2004). The authors use a generalized version of the tree kernel from (Zelenko et al., 2003) to compute a kernel over relation examples, where a relation example consists of the smallest dependency tree containing the two entities of the relation. Precision and recall values are reported for the task of extracting the 5 top-level relations in the ACE corpus under two different scenarios:

  • [S1] This is the classic setting: one multi-class SVM is learned to discriminate among the 5 toplevel classes, plus one more class for the no-relation cases.
  • [S2] Because of the highly skewed data distribution, the recall of the SVM approach in the first scenario is very low. In (Culotta and Sorensen, 2004) the authors propose doing relation extraction in two steps: first, one binary SVM is trained for relation detection, which means that all positive relation instances are combined into one class. Then the thresholded output of this binary classifier is used as training data for a second multi-class SVM, which is trained for relation classification. The same kernel is used in both stages.

The shortest-path dependency kernels outperform the dependency kernel from (Culotta and Sorensen, 2004) in both scenarios, with a more significant difference for SP-CFG. An error analysis revealed that Collins’ parser was better at capturing local dependencies, hence the increased accuracy of SP-CFG. Another advantage of our shortest-path dependency kernels is that their training and testing are very fast – this is due to representing the sentence as a chain of dependencies on which a fast kernel can be computed. All the four SP kernels from Table 5 take between 2 and 3 hours to train and test on a 2.6GHz Pentium IV machine.

To avoid numerical problems, we constrained the dependency paths to pass through at most 10 words (as observed in the training data) by setting the kernel to 0 for longer paths. We also tried the alternative solution of normalizing the kernel, however this led to a slight decrease in accuracy. Having longer paths give larger kernel scores in the unnormalized version does not pose a problem because, by definition, paths of different lengths correspond to disjoint sets of features. Consequently, the SVM algorithm will induce lower weights for features occurring in longer paths, resulting in a linear separator that works irrespective of the size of the dependency paths.

Related Work

In (Zelenko et al., 2003), the authors do relation extraction using a tree kernel defined over shallow parse tree representations of sentences. The same tree kernel is slightly generalized in (Culotta and Sorensen, 2004) and used in conjunction with dependency trees. In both approaches, a relation instance is defined to be the smallest subtree in the parse or dependency tree that includes both entities. In this paper we argued that the information relevant to relation extraction is almost entirely concentrated in the shortest path in the dependency tree, leading to an even smaller representation. Another difference between the tree kernels above and our new kernel is that the tree kernels used for relation extraction are opaque i.e. the semantics of the dimensions in the corresponding Hilbert space is not obvious. For the shortest-path kernels, the semantics is known by definition: each path feature corresponds to a dimension in the Hilbert space. This transparency allows us to easily restrict the types of patterns counted by the kernel to types that we deem relevant for relation extraction. The tree kernels are also more time consuming, especially in the sparse setting, where they count sparse subsequences of children common to nodes in the two trees. In (Zelenko et al., 2003), the tree kernel is computed in O(mn) time, where [math]\displaystyle{ m }[/math] and [math]\displaystyle{ n }[/math] are the number of nodes in the two trees. This changes to O(mn^3) in the sparse setting.

Our shortest-path intuition bears some similarity with the underlying assumption of the relational pathfinding algorithm from (Richards and Mooney, 1992) : “in most relational domains, important concepts will be represented by a small number of fixed paths among the constants defining a positive instance – for example, the grandparent relation is defined by a single fixed path consisting of two parent relations.” We can see this happening also in the task of relation extraction from ACE, where “important concepts” are the 5 types of relations, and the “constants” defining a positive instance are the 5 types of entities.


Local and non-local (deep) dependencies are equally important for finding relations. In this paper we tried extracting both types of dependencies using a CCG parser, however another approach is to recover deep dependencies from syntactic parses, as in (Campbell, 2004; Levy and Manning, 2004). This may have the advantage of preserving the quality of local dependencies while completing the representation with non-local dependencies.

Currently, the method assumes that the named entities are known. A natural extension is to automatically extract both the entities and their relationships. Recent research (Roth and Yih, 2004) indicates that integrating entity recognition with relation extraction in a global model that captures the mutual influences between the two tasks can lead to significant improvements in accuracy.


  • Richard Campbell. (2004). Using linguistic principles to recover empty categories. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), pages 645–652, Barcelona, Spain, July.
  • Michael J. Collins. (1997). Three generative, lexicalised models for statistical parsing. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL-97), pages 16–23.
  • Nello Cristianini and John Shawe-Taylor. (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.
  • (Culotta and Sorensen, 2004) ⇒ Aron Culottaand J. S. Sorensen. (2004). “Dependency Tree Kernels for Relation Extraction.” In: Proceedings ofACL 2004.
  • Ralph Grishman. (1995). Message Understanding Conference 6. http://cs.nyu.edu/cs/faculty/grishman/muc6.html.
  • Julia Hockenmaier and Mark Steedman. (2002). Generative models for statistical parsing with combinatory categorial grammar. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002), pages 335–342, Philadelphia, PA.
  • Roger Levy and Christopher D. Manning. (2004). Deep dependencies from context-free statistical parsers: Correcting the surface dependency approximation. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), pages 327–334, Barcelona, Spain, July. NIST. 2000.
  • ACE – Automatic Content Extraction. http://www.nist.gov/speech/tests/ace.
  • Soumya Ray and Mark Craven. (2001). Representing sentence structure in hidden Markov models for information extraction. In: Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-2001), pages 1273–1279, Seattle, WA.
  • Bradley L. Richards and Raymond Mooney. (1992). Learning relations by pathfinding. In: Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), pages 50–55, San Jose, CA, July.
  • Dan Roth and W. Yih. (2004). A linear programming formulation for global inference in natural language tasks. In: Proceedings of the Annual Conference on Computational Natural Language Learning (CoNLL), pages 1–8, Boston, MA. Mark Steedman. (2000). The Syntactic Process. The MIT Press, Cambridge, MA.
  • Vladimir N. Vapnik. (1998). Statistical Learning Theory. John Wiley & Sons.
  • (ZelenkoAR, 2003) ⇒ D. Zelenko, C. Aone, and A. Richardella. (2003). “Kernel Methods for Relation Extraction”. Journal of Machine Learning Research.



 author    = {Razvan C. Bunescu and
             Raymond Mooney},
 title     = {A Shortest Path Dependency Kernel for Relation Extraction.},
 booktitle = {HLT/EMNLP},
 year      = {2005},
 ee        = {http://acl.ldc.upenn.edu/H/H05/H05-1091.pdf},
 crossref  = {DBLP:conf/naacl/2005},
 bibsource = {DBLP, http://dblp.uni-trier.de}



 title     = {HLT/EMNLP 2005, Human Language Technology Conference and
              Conference on Empirical Methods in Natural Language Processing,
              Proceedings of the Conference, 6-8 October 2005, Vancouver,
              British Columbia, Canada},
 booktitle = {HLT/EMNLP},
 publisher = {The Association for Computational Linguistics},
 year      = {2005},
 bibsource = {DBLP, http://dblp.uni-trier.de}

} ,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2005 AShortestPathDependencyRelationExtractionRazvan C. Bunescu
Raymond J. Mooney
A Shortest Path Dependency Kernel for Relation ExtractionProceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processinghttp://delivery.acm.org/10.1145/1230000/1220666/p724-bunescu.pdf?key1=1220666&key2=4641327921&coll=DL&dl=ACM&CFID=9293060&CFTOKEN=6075728810.3115/1220575.12206662005