2002 TextClassificationUsingStringKernels


Subject Headings: String Kernel Function, Text Classification Algorithm

Notes

Cited By

2003

  • (ZelenkoAR, 2003)
    • "An excellent example is that of subsequence kernels (Lodhi et al., 2002). In this case, the objects are strings of characters, and the kernel function computes the number of common subsequences of characters in two strings, where each subsequence match is additionally decreased by the factor reflecting how spread out the matched subsequence in the original sequences (Lodhi et al., 2002). Despite the exponential number of features (subsequences), it is possible to compute the subsequence kernel in polytime.

Quotes

Abstract

We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of [math]\displaystyle{ k }[/math], since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. Experimental comparisons of the performance of the kernel compared with a standard word feature space kernel (Joachims, 1998) show positive results on modestly sized datasets. The case of contiguous subsequences is also considered for comparison with the subsequences kernel with different decay factors. For larger documents and datasets the paper introduces an approximation technique that is shown to deliver good approximations efficiently for large datasets.
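
The efficient evaluation mentioned above can be sketched with the recursive formulation from Lodhi et al. (2002), in which an auxiliary kernel [math]\displaystyle{ K'_i }[/math] accumulates the decay for partially matched subsequences. The following Python sketch is an illustrative reimplementation of that recursion, not the authors' code: the names ssk and k_prime and the default decay value are ours, and memoised recursion is used in place of the paper's explicit dynamic-programming tables.

from functools import lru_cache

def ssk(s: str, t: str, n: int, lam: float = 0.5) -> float:
    """Unnormalised subsequence kernel K_n(s, t) with decay factor lam (n >= 1)."""

    @lru_cache(maxsize=None)
    def k_prime(i: int, m: int, l: int) -> float:
        # K'_i evaluated on the prefixes s[:m] and t[:l].
        if i == 0:
            return 1.0
        if min(m, l) < i:
            return 0.0
        x = s[m - 1]
        # Either drop the last character of s[:m] (decay by lam), or match it
        # against an occurrence of x in t[:l]; the occurrence is weighted by a
        # power of lam that grows with how far it sits from the end of the prefix.
        total = lam * k_prime(i, m - 1, l)
        for j in range(l):
            if t[j] == x:
                total += k_prime(i - 1, m - 1, j) * lam ** (l - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k(i: int, m: int, l: int) -> float:
        # K_i evaluated on the prefixes s[:m] and t[:l].
        if min(m, l) < i:
            return 0.0
        x = s[m - 1]
        total = k(i, m - 1, l)
        for j in range(l):
            if t[j] == x:
                total += k_prime(i - 1, m - 1, j) * lam ** 2
        return total

    return k(n, len(s), len(t))

# "cat" and "car" share only the length-2 subsequence "ca", which is
# contiguous in both, so the unnormalised kernel value is lam**4.
print(ssk("cat", "car", n=2, lam=0.5))  # 0.0625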


References

  • Bernhard E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, Pittsburgh, PA, July (1992). ACM Press.
  • W. B. Cavnar. Using an n-gram based document representation with a vector processing retrieval model. In D. K. Harman, editor, Proceedings of TREC-3, 3rd Text Retrieval Conference, pages 269–278, Gaithersburg, Maryland, US, (1994). National Institute of Standards and Technology, Gaithersburg, US. http://trec.nist.gov/pubs/trec3/t3proceedings.html.
  • N. Cristianini, A. Elisseeff, and John Shawe-Taylor. On optimizing kernel alignment. Technical Report NC-TR-01-087, Royal Holloway, University of London, UK, 2001.
  • N. Cristianini, A. Elisseeff, and John Shawe-Taylor. On kernel-target alignment. In Neural Information Processing Systems (NIPS '01), to appear.
  • N. Cristianini and John Shawe-Taylor. An introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000.
  • T. Friess, N. Cristianini, and C. Campbell. The kernel-adatron: a fast and simple training procedure for support vector machines. In J. Shavlik, editor, Proceedings of the 15th International Conference on Machine Learning. Morgan Kaufmann, 1998.
  • D. Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California at Santa Cruz, 1999.
  • S. Huffman. Acquaintance: Language-independent document categorization by n-grams. In D. K. Harman and E. M. Voorhees, editors, Proceedings of TREC-4, 4th Text Retrieval Conference, pages 359–371, Gaithersburg, Maryland, US, (1995). National Institute of Standards and Technology, Gaithersburg, US. http://trec.nist.gov/pubs/trec4/t3proceedings.html.
  • Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. In Claire Nédellec and Céline Rouveirol, editors, Proceedings of the European Conference on Machine Learning, pages 137–142, Berlin, (1998). Springer.
  • Thorsten Joachims. Making large–scale SVM learning practical. In Bernhard Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 169–184, Cambridge, MA, (1999). MIT Press.
  • H. Lodhi, John Shawe-Taylor, N. Cristianini, and C. Watkins. Text classification using string kernels. In T. K. Leen, Thomas G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, pages 563–569, Cambridge, MA, (2001). MIT Press.
  • Christopher K. I. Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In T. K. Leen, Thomas G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, pages 682–688, Cambridge, MA, (2001). MIT Press.



  • Author: Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins
  • Title: Text Classification Using String Kernels
  • Journal: Journal of Machine Learning Research, volume 2
  • Year: 2002
  • Title URL: http://www.jmlr.org/papers/volume2/lodhi02a/lodhi02a.pdf