2006 ExploringSyntacticFeaturesforRe

Jump to: navigation, search

Subject Headings:


Cited By



This paper proposes to use a convolution kernel over parse trees to model syntactic structure information for relation extraction. Our study reveals that the syntactic structure features embedded in a parse tree are very effective for relation extraction and these features can be well captured by the convolution tree kernel. Evaluation on the ACE 2003 corpus shows that the convolution kernel over parse trees can achieve comparable performance with the previous best-reported feature-based methods on the 24 ACE relation subtypes. It also shows that our method significantly outperforms the previous two dependency tree kernels on the 5 ACE relation major types.

Related Work

Miller et al. (2000) address the task of relation extraction from the statistical parsing viewpoint. They integrate various tasks such as POS tagging, NE tagging, template extraction and relation extraction into a generative model. Their results essentially depend on the entire full parse tree."

Kambhatla (2004) employs Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text for relation extraction. … Kambhatla (2004) use the path of non-terminals connecting two mentions in a parse tree as the parse tree features.

Zhou et al. (2005) explore various features in relation extraction using SVM. They conduct exhaustive experiments to investigate the incorporation and the individual contribution of diverse features. They report that chunking information contributes to most of the performance improvement from the syntactic aspect. … Zhou et al. (2005) introduce additional chunking features to enhance the parse tree features.

However, the hierarchical structured information in the parse trees is not well preserved in their parse treerelated features.

As an alternative to the feature-based methods, kernel methods (Haussler, 1999) have been proposed to implicitly explore features in a high dimensional space by employing a kernel function to calculate the similarity between two objects directly. In particular, the kernel methods could be very effective at reducing the burden of feature engineering for structured objects in NLP research (Culotta and Sorensen, 2004). This is because a kernel can measure the similarity between two discrete structured objects directly using the original representation of the objects instead of explicitly enumerating their features.

Zelenko et al. (2003). develop a tree kernel for relation extraction. Their tree kernel is recursively defined in a top-down manner, matching nodes from roots to leaf nodes. For each pair of matching nodes, a subsequence kernel on their child nodes is invoked, which matches either contiguous or sparse subsequences of node.

Culotta and Sorensen (2004) generalize this kernel to estimate similarity between dependency trees. One may note that their tree kernel requires the matchable nodes must be at the same depth counting from the root node. This is a strong constraint on the matching of syntax so it is not surprising that the model has good precision but very low recall on the ACE corpus (Zhao and Grishman, 2005). In addition, according to the top-down node matching mechanism of the kernel, once a node is not matchable with any node in the same layer in another tree, all the sub-trees below this node are discarded even if some of them are matchable to their counterparts in another tree.

Bunescu and Mooney (2005) propose a shortest path dependency kernel for relation extraction. They argue that the information to model a relationship between entities is typically captured by the shortest path between the two entities in the dependency graph. Their kernel is very straightforward. It just sums up the number of common word classes at each position in the two paths. We notice that one issue of this kernel is that they limit the two paths must have the same length, otherwise the kernel similarity score is zero. Therefore, although this kernel shows non-trivial performance improvement than that of Culotta and Sorensen (2004), the constraint makes the two dependency kernels share the similar behavior: good precision but much lower recall on the ACE corpus.

Zhao and Grishman (2005) define a feature based composite kernel to integrate diverse features. Their kernel displays very good performance on the 2004 version of ACE corpus. Since this is a feature-based kernel, all the features used in the kernel have to be explicitly enumerated. Similar with the feature-based method, they also represent the tree feature as a link path between two entities. Therefore, we wonder whether their performance improvement is mainly due to the explicitly incorporation of diverse linguistic features instead of the kernel method itself.

The above discussion suggests that the syntactic features in a parse tree may not be fully utilized in the previous work, whether feature-based or kernel-based. We believe that the syntactic tree features could play a more important role than that reported in the previous work. Since convolution kernels aim to capture structural information in terms of sub-structures, which providing a viable alternative to flat features, in this paper, we propose to use a convolution tree kernel to explore syntactic features for relation extraction. To our knowledge, convolution kernels have not been explored for relation extraction

Convolution kernels were proposed as a concept of kernels for a discrete structure by Haussler (1999) in machine learning study. This framework defines a kernel between input objects by applying convolution “sub-kernels” that are the kernels for the decompositions (parts) of the objects. Convolution kernels are abstract concepts, and the instances of them are determined by the definition of “sub-kernels”. The Tree Kernel (Collins and Duffy, 2001), String Subsequence Kernel (SSK) (Lodhi et al., 2002) and Graph Kernel (HDAG Kernel) (Suzuki et al., 2003) are examples of convolution kernels instances in the NLP field.


  1. ACE. 2004. The Automatic Content Extraction (ACE) Projects. http://www.ldc.upenn.edu/Projects/ACE/.
  2. (Bunescu and Mooney, 2005) ⇒ Razvan C. Bunescu and Raymond Mooney. (2005). “A Shortest Path Dependency Kernel for Relation Extraction.” In: Proceedings of HLT/EMNLP-2005.
  3. Charniak E. (2001). Immediate-head Parsing for Language Models. ACL-2001
  4. (Collins and Duffy, 2001) ⇒ Michael Collins and N. Duffy. (2001). “Convolution Kernels for Natural Language.” In: Proceedings of NIPS-2001.
  5. (Culotta and Sorensen, 2004) ⇒ Aron Culottaand J. S. Sorensen. (2004). “Dependency Tree Kernels for Relation Extraction.” In: Proceedings ofACL 2004.
  6. (Haussler, 1999) ⇒ D. Haussler. (1999). “Convolution Kernels on Discrete Structures. Technical Report UCSC-CLR-99-10, University of California at Santa Cruz.
  7. (Joachims, 1998) ⇒ Thorsten Joachims. (1998). “Text Categorization with Support Vector Machines: Learning with Many Relevant Features.” In: Proceedings of the European Conference on Machine Learning ECML-1998.
  8. (Kambhatla, 2004) ⇒ Nanda Kambhatla. (2004). Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. Poster. In: Proceedings of [[ACL 2004]
  9. (Lodhi & al., 2002) ⇒ H. Lodhi, C. Saunders, John Shawe-Taylor, N. Cristianini, and C. Watkins. (2002). “Text classification using string kernels. The Journal of Machine Learning Research, vol:2.
  10. (Miller et al., 2000) ⇒ Scott Miller, Heidi Fox, Lance Ramshaw, and Ralph Weischedel. (2000). “A Novel Use of Statistical Parsing to Extract Information from Text.” In: Proceedings of NAACL Conference (NAACL 2000).
  11. (Moschitti, 2004) ⇒ Alessandro Moschitti. (2004). “A study on Convolution Kernels for Shallow Semantic Parsing.” In: Proceedings of the 42-th Conference on Association for Computational Linguistic (ACL-2004), Barcelona, Spain, (2004). (website)
  12. MUC. 1987-1998. The nist MUC website: http: www.itl.nist.gov/iaui/894.02/related_projects/muc/
  13. Suzuki J., Hirao T., Sasaki Y. and Maeda E. (2003). Hierarchical Directed Acyclic Graph Kernel: Methods for Structured Natural Language Data. ACL-2003
  14. (Vishwanathan and Smola, 2002) ⇒ S. V. N. Vishwanathan and A. J. Smola. (2001). “Fast Kernels for String and Tree Matching.” In: Proceedings of NIPS-2002.
  15. (ZelenkoAR, 2003) ⇒ D. Zelenko, C. Aone, and A. Richardella. (2003). “Kernel Methods for Relation Extraction. Journal of Machine Learning Research.
  16. (Zhao and Grishman, 2005) ⇒ S. Zhao and Ralph Grishman. (2005). “Extracting Relations with Integrated Information Using Kernel Methods.” In: Proceedings of or ACL-2005.
  17. (Zhou et al., 2005) ⇒ GuoDong Zhou, Jian Su, Jie Zhang, and Min Zhang (2005). “Exploring Various Knowledge in Relation Extraction.” In: Proceedings of ACL Conference (ACL 2005).;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2006 ExploringSyntacticFeaturesforReMin Zhang
Jie Zhang
Jian Su
Exploring Syntactic Features for Relation Extraction Using a Convolution Tree Kernel10.3115/1220835.12208722006
AuthorMin Zhang +, Jie Zhang + and Jian Su +
doi10.3115/1220835.1220872 +
titleExploring Syntactic Features for Relation Extraction Using a Convolution Tree Kernel +
year2006 +