2002 TextClassificationUsingStringKernels


Subject Headings: String Kernel Function, Text Classification Algorithm

Notes

Cited By

2003

  • (ZelenkoAR, 2003)
    • "An excellent example is that of subsequence kernels (Lodhi et al., 2002). In this case, the objects are strings of characters, and the kernel function computes the number of common subsequences of characters in two strings, where each subsequence match is additionally decreased by the factor reflecting how spread out the matched subsequence in the original sequences (Lodhi et al., 2002). Despite the exponential number of features (subsequences), it is possible to compute the subsequence kernel in polytime.

Quotes

Abstract

We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of [math]\displaystyle{ k }[/math], since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. Experimental comparisons of the performance of the kernel compared with a standard word feature space kernel (Joachims, 1998) show positive results on modestly sized datasets. The case of contiguous subsequences is also considered for comparison with the subsequences kernel with different decay factors. For larger documents and datasets the paper introduces an approximation technique that is shown to deliver good approximations efficiently for large datasets.
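
The efficient evaluation mentioned above can be sketched with the recursive formulation from Lodhi et al. (2002), in which an auxiliary kernel [math]\displaystyle{ K'_i }[/math] accumulates the decay for partially matched subsequences. The following Python sketch is an illustrative reimplementation of that recursion, not the authors' code: the names ssk and k_prime and the default decay value are ours, and memoised recursion is used in place of the paper's explicit dynamic-programming tables.

from functools import lru_cache

def ssk(s: str, t: str, n: int, lam: float = 0.5) -> float:
    """Unnormalised subsequence kernel K_n(s, t) with decay factor lam (n >= 1)."""

    @lru_cache(maxsize=None)
    def k_prime(i: int, m: int, l: int) -> float:
        # K'_i evaluated on the prefixes s[:m] and t[:l].
        if i == 0:
            return 1.0
        if min(m, l) < i:
            return 0.0
        x = s[m - 1]
        # Either drop the last character of s[:m] (decay by lam), or match it
        # against an occurrence of x in t[:l]; the occurrence is weighted by a
        # power of lam that grows with how far it sits from the end of the prefix.
        total = lam * k_prime(i, m - 1, l)
        for j in range(l):
            if t[j] == x:
                total += k_prime(i - 1, m - 1, j) * lam ** (l - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k(i: int, m: int, l: int) -> float:
        # K_i evaluated on the prefixes s[:m] and t[:l].
        if min(m, l) < i:
            return 0.0
        x = s[m - 1]
        total = k(i, m - 1, l)
        for j in range(l):
            if t[j] == x:
                total += k_prime(i - 1, m - 1, j) * lam ** 2
        return total

    return k(n, len(s), len(t))

# "cat" and "car" share only the length-2 subsequence "ca", which is
# contiguous in both, so the unnormalised kernel value is lam**4.
print(ssk("cat", "car", n=2, lam=0.5))  # 0.0625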


References

  • Bernhard E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, Pittsburgh, PA, July (1992). ACM Press.
  • W. B. Cavnar. Using an n-gram based document representation with a vector processing retrieval model. In D. K. Harman, editor, Proceedings of TREC-3, 3rd Text Retrieval Conference, pages 269–278, Gaithersburg, Maryland, US, (1994). National Institute of Standards and Technology, Gaithersburg, US. http://trec.nist.gov/pubs/trec3/t3proceedings.html.
  • N. Cristianini, A. Elisseeff, and John Shawe-Taylor. On optimizing kernel alignment. Technical Report NC-TR-01-087, Royal Holloway, University of London, UK, 2001.
  • N. Cristianini, A. Elisseeff, and John Shawe-Taylor. On kernel-target alignment. In Neural Information Processing Systems (NIPS '01), to appear.
  • N. Cristianini and John Shawe-Taylor. An introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000.
  • T. Friess, N. Cristianini, and C. Campbell. The kernel-adatron: a fast and simple training procedure for support vector machines. In J. Shavlik, editor, Proceedings of the 15th International Conference on Machine Learning. Morgan Kaufmann, 1998.
  • D. Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California at Santa Cruz, 1999.
  • S. Huffman. Acquaintance: Language-independent document categorization by n-grams. In D. K. Harman and E. M. Voorhees, editors, Proceedings of TREC-4, 4th Text Retrieval Conference, pages 359–371, Gaithersburg, Maryland, US, (1995). National Institute of Standards and Technology, Gaithersburg, US. http://trec.nist.gov/pubs/trec4/t3proceedings.html.
  • Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. In Claire Nédellec and Céline Rouveirol, editors, Proceedings of the European Conference on Machine Learning, pages 137–142, Berlin, (1998). Springer.
  • Thorsten Joachims. Making large–scale SVM learning practical. In Bernhard Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 169–184, Cambridge, MA, (1999). MIT Press.
  • H. Lodhi, John Shawe-Taylor, N. Cristianini, and C. Watkins. Text classification using string kernels. In T. K. Leen, Thomas G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, pages 563–569, Cambridge, MA, (2001). MIT Press.
  • Christopher K. I. Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In T. K. Leen, Thomas G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, pages 682–688, Cambridge, MA, (2001). MIT Press.



  • Author: Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins
  • Title: Text Classification Using String Kernels
  • Journal: Journal of Machine Learning Research, volume 2
  • Year: 2002
  • Title URL: http://www.jmlr.org/papers/volume2/lodhi02a/lodhi02a.pdf