1998 UsingAGenerlizedInstSetForTextCat

From GM-RKB

Subject Headings: k-Nearest Neighbor Algorithm, Text Classification Algorithm.

Notes

Quotes

Abstract

  • We investigate several recent approaches for text categorization under the framework of similarity-based learning. They include two families of text categorization techniques, namely the k-nearest neighbor (k-NN) algorithm and linear classifiers. After identifying the weakness and strength of each technique, we propose a new technique known as the generalized instance set (GIS) algorithm by unifying the strengths of k-NN and linear classifiers and adapting to characteristics of text categorization problems. We also explore some variants of our GIS approach. We have implemented our GIS algorithm, the ExpNet algorithm, and some linear classifiers. Extensive experiments have been conducted on two common document corpora, namely the OHSUMED collection and the Reuters-21578 collection. The results show that our new approach outperforms the latest k-NN approach and linear classifiers in all experiments.
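The two families named in the abstract can be illustrated with a minimal similarity-based sketch. This is not the GIS algorithm or ExpNet, whose details the abstract does not give. It shows only a plain k-NN scorer and a Rocchio-style linear prototype over bag-of-words vectors with cosine similarity. The function names, the whitespace tokenizer, the `beta`/`gamma` weights, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    # Assumed, simplistic representation: raw term counts over whitespace tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-weight mappings.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_score(doc, train, category, k=3):
    # k-NN: rank training documents by similarity to the query, then sum
    # the similarities of those among the top k that carry the category.
    q = vectorize(doc)
    ranked = sorted(((cosine(q, vectorize(t)), c) for t, c in train), reverse=True)
    return sum(s for s, c in ranked[:k] if c == category)

def prototype(train, category, beta=1.0, gamma=0.5):
    # Rocchio-style linear classifier: one weight vector per category,
    # built as beta * mean(positives) - gamma * mean(negatives).
    # beta and gamma are illustrative values, not taken from the paper.
    pos = [vectorize(t) for t, c in train if c == category]
    neg = [vectorize(t) for t, c in train if c != category]
    w = {}
    for v in pos:
        for t, x in v.items():
            w[t] = w.get(t, 0.0) + beta * x / len(pos)
    for v in neg:
        for t, x in v.items():
            w[t] = w.get(t, 0.0) - gamma * x / len(neg)
    # Keep only positive weights, a common simplification.
    return {t: x for t, x in w.items() if x > 0}

def linear_score(doc, train, category):
    # Score a document by its similarity to the single category prototype.
    return cosine(vectorize(doc), prototype(train, category))

# Hypothetical toy data; the category labels merely echo Reuters-style topics.
TRAIN = [
    ("wheat corn harvest grain", "grain"),
    ("grain export wheat tonnes", "grain"),
    ("oil crude barrel price", "crude"),
    ("crude oil opec output", "crude"),
]
DOC = "wheat grain price"

for cat in ("grain", "crude"):
    print(cat, knn_score(DOC, TRAIN, cat), linear_score(DOC, TRAIN, cat))
```

In this sketch k-NN keeps every training instance and decides locally, while the linear classifier compresses each category into one global vector. That is the contrast between local and global generalization that the abstract says GIS is meant to unify.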


Author: Wai Lam, Chao Yang Ho
Title: Using a Generalized Instance Set for Automatic Text Categorization
Title URL: http://cui.unige.ch/~ehrler/Project/Gambone/UsingAGeneralizedInstanceSetForAutomaticTextCategorization.pdf
DOI: 10.1145/290941.290961