2006 MakingTreeKernelsPracticalForNLP

Subject Headings: Relational Data Kernel Function, Semantic Role Labeling

Notes

Cited By

Quotes

Abstract

“In recent years, tree kernels have been proposed for the automatic learning of natural language applications. Unfortunately, they show (a) an inherent super-linear complexity and (b) a lower accuracy than traditional attribute/value methods. In this paper, we show that tree kernels are very helpful in the processing of natural language as (a) we provide a simple algorithm to compute tree kernels in linear average running time, and (b) our study on the classification properties of diverse tree kernels shows that kernel combinations always improve the traditional methods. Experiments with Support Vector Machines on the predicate-argument classification task provide empirical support for our thesis.”

Introduction

  • Our fast algorithm computes the kernel between two syntactic parse trees in O(m + n) average time, where m and n are the numbers of nodes in the two trees. This low complexity allows SVMs to carry out experiments on hundreds of thousands of training instances, since it is no higher than the complexity of the polynomial kernel, which is widely used in large-scale experiments, e.g. (Pradhan et al., 2004). (A sketch of the idea behind the fast computation is given below.)
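
The following is a minimal Python sketch of the idea behind this fast computation (FTK), as the paper describes it: sort the nodes of each tree by their production rule, select via a merge-style scan only the node pairs whose productions are identical, and evaluate the SST Delta function of (Collins and Duffy, 2002) on those pairs alone. The nested-tuple tree encoding and all function names below are illustrative assumptions, not the paper's code.

    # Sketch of the fast tree kernel (FTK) idea: evaluate Delta only on
    # node pairs that share the same production, found by a merge scan
    # over production-sorted node lists. Trees are nested tuples
    # (label, child, ...); a bare string is a word. Assumed encoding.

    def label(t):
        return t if isinstance(t, str) else t[0]

    def children(t):
        return () if isinstance(t, str) else t[1:]

    def production(t):
        # e.g. "NP -> D N"; words (bare strings) have no production
        if isinstance(t, str):
            return None
        return label(t) + " -> " + " ".join(label(c) for c in children(t))

    def internal_nodes(t):
        if not isinstance(t, str):
            yield t
            for c in children(t):
                yield from internal_nodes(c)

    def matching_pairs(t1, t2):
        # Merge scan over two production-sorted node lists: linear on
        # average when few productions repeat (the FTK speed-up);
        # the sorting itself can be done once when the trees are loaded.
        l1 = sorted(internal_nodes(t1), key=production)
        l2 = sorted(internal_nodes(t2), key=production)
        i = j = 0
        while i < len(l1) and j < len(l2):
            p1, p2 = production(l1[i]), production(l2[j])
            if p1 < p2:
                i += 1
            elif p1 > p2:
                j += 1
            else:
                # collect the block of equal productions on both sides
                i2, j2 = i, j
                while i2 < len(l1) and production(l1[i2]) == p1:
                    i2 += 1
                while j2 < len(l2) and production(l2[j2]) == p1:
                    j2 += 1
                for a in l1[i:i2]:
                    for b in l2[j:j2]:
                        yield a, b
                i, j = i2, j2

    def sst_kernel(t1, t2, lam=1.0):
        # SST kernel of (Collins and Duffy, 2002), evaluated only on the
        # matching pairs; lam is the usual decay factor.
        memo = {}

        def delta(n1, n2):
            key = (id(n1), id(n2))
            if key not in memo:
                if production(n1) is None or production(n1) != production(n2):
                    memo[key] = 0.0
                else:
                    p = lam
                    for c1, c2 in zip(children(n1), children(n2)):
                        p *= 1.0 + delta(c1, c2)
                    memo[key] = p
            return memo[key]

        return sum(delta(a, b) for a, b in matching_pairs(t1, t2))

Sorting can be carried out once per tree at data-preparation time, so each kernel evaluation reduces to the merge scan plus Delta on the (typically few) matching pairs, which is what yields the linear average running time claimed above.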

Subtrees and Subset Trees

  • We define as a subtree (ST) any node of a tree together with all of its descendants. For example, the line in Figure 1 circles the subtree rooted in the NP node. A subset tree (SST) is a more general structure: the difference with subtrees is that its leaves can be non-terminal symbols. SSTs satisfy the constraint that they are generated by applying the same grammatical rule set that generated the original tree. For example, [S [N VP]] is an SST of the tree in Figure 1; it has two non-terminal symbols, N and VP, as leaves. (A small enumeration sketch follows below.)
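
To make the ST/SST distinction concrete, here is a short sketch, using the same assumed nested-tuple encoding as above, that enumerates the STs of a toy parse tree (the sentence and its structure are assumptions for illustration only):

    # One ST per internal node: the node together with ALL its descendants.

    def bracket(t):
        if isinstance(t, str):
            return t
        return "[" + t[0] + " " + " ".join(bracket(c) for c in t[1:]) + "]"

    def subtrees(t):
        if not isinstance(t, str):
            yield bracket(t)
            for c in t[1:]:
                yield from subtrees(c)

    # Toy parse consistent with the fragment [S [N VP]] quoted above.
    s = ("S", ("N", "Mary"),
              ("VP", ("V", "brought"),
                     ("NP", ("D", "a"), ("N", "cat"))))
    for st in subtrees(s):
        print(st)
    # Among the output is [NP [D a] [N cat]], the ST rooted in the NP node.

SSTs cannot be enumerated this simply, because a node may also be left unexpanded as a non-terminal leaf; the constraint is that every expanded node must keep its whole production. Thus [S [N VP]] is a valid SST of the toy tree, while [S [N]] is not, since it would violate the rule S -> N VP.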

Related Work

  • In (Collins and Duffy, 2002), the SST tree kernel was tested with the Voted Perceptron on the parse-tree reranking task. The combination with the original PCFG model improved syntactic parsing. Additionally, it was noted that the average execution time depends on the number of repeated productions.
  • In (Vishwanathan and Smola, 2002), an algorithm that computes the ST kernel in linear time, even in the worst case, is provided. The main idea is to use suffix trees to store partial matches for the evaluation of the string kernel (Lodhi et al., 2000), which can then be used to compute the ST fragments once each tree is converted into a string. To our knowledge, ours is the first application of the ST kernel to a natural language task.
  • In (Kazama and Torisawa, 2005), an interesting algorithm that speeds up the average running time is presented. This algorithm looks for node pairs that share a large number of tree fragments (malicious nodes) and applies a transformation to the trees rooted in such nodes to make the kernel computation faster. The results show a speed-up similar to the one produced by our method.
  • In (Zelenko et al., 2003), two kernels over syntactic shallow-parser structures were devised for the extraction of linguistic relations, e.g. person-affiliation. To measure the similarity between two nodes, the contiguous string kernel and the sparse string kernel (Lodhi et al., 2000) were used. In (Culotta and Sorensen, 2004) such kernels were slightly generalized by providing a matching function for the node pairs. The time complexity of their computation limited the experiments to a data set of just 200 news items. Moreover, we note that the above tree kernels are not convolution kernels like those proposed in this article.
  • In (Shen et al., 2003), a tree kernel based on Lexicalized Tree Adjoining Grammar (LTAG) was proposed for the parse-reranking task. Since the quadratic-time QTK algorithm was used for the kernel computation, the high learning complexity forced the authors to train different SVMs on different slices of the training data. Our FTK, adapted to the LTAG tree kernel, would have allowed the SVMs to be trained on the whole data set.
  • In (Cumby and Roth, 2003), a feature description language was used to extract structural features from the shallow syntactic parse trees associated with named entities. The experiments on named entity categorization showed that, when the description language selects an adequate set of tree fragments, the Voted Perceptron algorithm increases its classification accuracy. The explanation was that the complete set of tree fragments contains many irrelevant features and may cause overfitting.

References


  • (Moschitti, 2006) ⇒ Alessandro Moschitti. (2006). “Making Tree Kernels Practical for Natural Language Learning.” In: Proceedings of EACL 2006. http://ai-nlp.info.uniroma2.it/moschitti/articles/EACL2006.pdf