2001 ConvolutionKernelsForNaturalLanguage


Subject Headings: Tree Kernel Function, Reranking Algorithm

Notes

Cited By

~492 citations: http://scholar.google.com/scholar?cites=16472213777115409804

    • A kernel for labelled ordered trees was introduced by Collins and Duffy (2001, 2002) for a Natural Language Processing (NLP) task in which the goal is to rerank the candidate parse trees of a sentence generated by a probabilistic context-free grammar (PCFG). The parse tree kernel evaluates tree similarity by counting the number of common proper subtrees between two trees, weighting larger subtree fragments by an exponentially decaying factor. Counting proper subtrees (rather than arbitrary subtrees) of a parse tree ensures that the production rules which constitute the tree are never split apart; see the sketch after this list.
  • Paper Presentation by Ruinan (Renie) Lu
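
The recursion behind this kernel is compact: C(n1, n2) is 0 when the productions at n1 and n2 differ, λ when they match at a pair of preterminals, and otherwise λ times the product over aligned children of (1 + C(child_i(n1), child_i(n2))); the kernel K(T1, T2) then sums C over all node pairs. The Python below is a minimal illustrative sketch of that recursion, not the authors' implementation; the Node class, the helper names, and the default λ = 0.4 are assumptions made for this example.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        label: str
        children: List["Node"] = field(default_factory=list)

        def production(self):
            # The CFG rule rooted at this node, e.g. ("NP", ("DT", "NN")).
            return (self.label, tuple(c.label for c in self.children))

        def is_preterminal(self):
            # A preterminal dominates exactly one leaf (a word).
            return len(self.children) == 1 and not self.children[0].children

    def internal_nodes(tree):
        # All non-leaf nodes; bare words (leaves) carry no production.
        found = [tree] if tree.children else []
        for child in tree.children:
            found.extend(internal_nodes(child))
        return found

    def common_subtrees(n1, n2, lam, memo):
        # C(n1, n2): weighted count of common proper subtrees rooted at n1 and n2.
        key = (id(n1), id(n2))
        if key not in memo:
            if n1.production() != n2.production():
                memo[key] = 0.0
            elif n1.is_preterminal():
                memo[key] = lam
            else:
                value = lam
                for c1, c2 in zip(n1.children, n2.children):
                    value *= 1.0 + common_subtrees(c1, c2, lam, memo)
                memo[key] = value
        return memo[key]

    def tree_kernel(t1, t2, lam=0.4):
        # K(T1, T2): sum C(n1, n2) over all pairs of internal nodes.
        memo = {}
        return sum(common_subtrees(n1, n2, lam, memo)
                   for n1 in internal_nodes(t1) for n2 in internal_nodes(t2))

    # Two parses sharing the subject NP but differing in the verb:
    t1 = Node("S", [Node("NP", [Node("N", [Node("dogs")])]),
                    Node("VP", [Node("V", [Node("bark")])])])
    t2 = Node("S", [Node("NP", [Node("N", [Node("dogs")])]),
                    Node("VP", [Node("V", [Node("sleep")])])])
    print(tree_kernel(t1, t2))  # shared S, NP, N, VP fragments; the V pair contributes 0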

Quotes

Abstract

We describe the application of kernel methods to Natural Language Processing (NLP) problems. In many NLP tasks the objects being modeled are strings, trees, graphs or other discrete structures which require some mechanism to convert them into feature vectors. We describe kernels for various natural language structures, allowing rich, high dimensional representations of these structures. We show how a kernel over trees can be applied to parsing using the voted perceptron algorithm, and we give experimental results on the ATIS corpus of parse trees.
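
The reranking setup named in the abstract can be sketched in dual form: the perceptron's weight vector is a sum of feature differences Φ(gold) - Φ(bad) accumulated over past mistakes, so candidates can be scored purely through kernel evaluations. The Python below is a hedged sketch of the plain (unvoted) variant, assuming the convention that each training example is a list of candidate parses with the gold parse first, and updating on score ties as a design choice of this sketch; kernel stands in for any Mercer kernel over trees, such as the one sketched in the notes above.

    def train_reranker(train_examples, kernel, epochs=5):
        # Dual-form perceptron for reranking. Each mistake appends a
        # support pair (gold, bad), implicitly adding phi(gold) - phi(bad)
        # to the weight vector, so scoring needs only kernel calls.
        support = []

        def score(x):
            return sum(kernel(g, x) - kernel(b, x) for g, b in support)

        for _ in range(epochs):
            for candidates in train_examples:
                gold, rest = candidates[0], candidates[1:]
                if not rest:
                    continue
                rival = max(rest, key=score)  # best-scoring wrong candidate
                if score(rival) >= score(gold):
                    support.append((gold, rival))
        return score

    # Usage: scorer = train_reranker(data, tree_kernel)
    #        best_parse = max(candidates, key=scorer)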

A Tree Kernel

  • This recursive kernel structure, where a kernel between two objects is defined in terms of kernels between its parts, is quite a general idea. Haussler [10] goes into some detail describing which construction operations are valid in this context, i.e., which operations maintain the essential Mercer conditions. This paper and previous work by Lodhi et al. [12] examining the application of convolution kernels to strings provide some evidence that convolution kernels may provide an extremely useful tool for applying modern machine learning techniques to highly structured objects. The key idea here is that one may take a structured object and split it up into parts. If one can construct kernels over the parts then one can combine these into a kernel over the whole object. Clearly, this idea can be extended recursively so that one only needs to construct kernels over the “atomic” parts of a structured object. The recursive combination of the kernels over parts of an object retains information regarding the structure of that object.
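
To make the part-composition idea concrete for strings, the sketch below counts common contiguous substrings of two strings, weighting a shared substring of length k by λ^k in the same exponentially decaying fashion as the tree kernel. It is an illustrative simplification in that spirit, not Lodhi et al.'s gap-weighted subsequence kernel; the function name and the default λ = 0.5 are assumptions.

    def substring_kernel(s, t, lam=0.5):
        # C[i][j]: weighted count of common substrings starting at s[i], t[j].
        n, m = len(s), len(t)
        C = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(n - 1, -1, -1):
            for j in range(m - 1, -1, -1):
                if s[i] == t[j]:
                    # Extend every shared substring through this matching pair.
                    C[i][j] = lam * (1.0 + C[i + 1][j + 1])
        return sum(C[i][j] for i in range(n) for j in range(m))

    # substring_kernel("ab", "ab") == 2*lam + lam**2, from "a", "b", and "ab"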

References

  • Aizerman, M., Braverman, E., and Rozonoer, L. (1964). Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning. Automation and Remote Control, 25:821–837.
  • Bod, R. (1998). Beyond Grammar: An Experience-based Theory of Language. Cambridge University Press.
  • Eugene Charniak (1997). Statistical techniques for natural language parsing. In AI Magazine, Vol. 18, No. 4.
  • Michael Collins (2000). Discriminative Reranking for Natural Language Parsing. Proceedings of the Seventeenth International Conference on Machine Learning. San Francisco: Morgan Kaufmann.
  • Michael Collins and Duffy, N. (2001). Parsing with a Single Neuron: Convolution Kernels for Natural Language Problems. Technical report UCSC-CRL-01-01, University of California at Santa Cruz.
  • Cortes, C. and Vladimir N. Vapnik (1995). Support–Vector Networks. Machine Learning, 20(3):273–297.
  • Yoav Freund and Robert E. Schapire (1999). Large Margin Classification using the Perceptron Algorithm. In Machine Learning, 37(3):277–296.
  • Yoav Freund, Iyer, R., Robert E. Schapire, & Singer, Y. (1998). “An efficient boosting algorithm for combining preferences.” In: Machine Learning: Proceedings of the Fifteenth International Conference.
  • Goodman, J. (1996). Efficient algorithms for parsing the DOP model. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 96), pages 143-152.
  • (Haussler, 1999) ⇒ D. Haussler. (1999). “Convolution Kernels on Discrete Structures”. Technical Report UCSC-CRL-99-10, University of California at Santa Cruz.
  • Johnson, M. The DOP estimation method is biased and inconsistent. To appear in Computational Linguistics.
  • (Lodhi et al., 2001) ⇒ H. Lodhi, C. Saunders, John Shawe-Taylor, N. Cristianini, and C. Watkins. (2001). “Text classification using string kernels.” In: NIPS 2001.
  • Johnson, M., Geman, S., Canon, S., Chi, Z., & Riezler, S. (1999). Estimators for stochastic “unification-based” grammars. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. San Francisco: Morgan Kaufmann.
  • Marcus, M., Santorini, B., & Marcinkiewicz, M. (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19, 313-330.
  • Bernhard Schölkopf, Smola, A., and Müller, K.-R. (1999). Kernel principal component analysis. In Bernhard Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods – Support Vector Learning, pages 327-352. MIT Press, Cambridge, MA.
  • (Watkins, 2000) ⇒ C. Watkins. (2000). “Dynamic alignment kernels.” In: A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers.

Michael Collins, and Nigel Duffy. (2001). “Convolution Kernels for Natural Language.” In: Advances in Neural Information Processing Systems 14 (NIPS 2001). http://books.nips.cc/papers/files/nips14/AA58.pdf