1998 UsingAGenerlizedInstSetForTextCat
- (Lam & Ho, 1998) ⇒ Wai Lam, Chao Yang Ho. (1998). “Using a Generalized Instance Set for Automatic Text Categorization.” In: Proceedings of the 21st ACM SIGIR Conference (SIGIR 1998). doi:10.1145/290941.290961
Subject Headings: k-Nearest Neighbor Algorithm, Text Classification Algorithm.
Notes
Quotes
Abstract
- We investigate several recent approaches for text categorization under the framework of similarity-based learning. They include two families of text categorization techniques, namely the k-nearest neighbor (k-NN) algorithm and linear classifiers. After identifying the weakness and strength of each technique, we propose a new technique known as the generalized instance set (GIS) algorithm by unifying the strengths of k-NN and linear classifiers and adapting to characteristics of text categorization problems. We also explore some variants of our GIS approach. We have implemented our GIS algorithm, the ExpNet algorithm, and some linear classifiers. Extensive experiments have been conducted on two common document corpora, namely the OHSUMED collection and the Reuters-21578 collection. The results show that our new approach outperforms the latest k-NN approach and linear classifiers in all experiments.
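The two families named in the abstract, k-NN and linear classifiers, can be compared on a toy example. The following is a minimal sketch under stated assumptions, not the paper's GIS or ExpNet algorithm: it assumes plain term-frequency vectors, cosine similarity, and a per-category centroid as a simple stand-in for a linear classifier. The function names and the tiny `TRAINING` set are hypothetical and invented for illustration only.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Lower-cased bag-of-words term-frequency vector (assumed preprocessing)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_classify(doc, training, k=3):
    """k-NN: sum the similarities of the k nearest training documents per category."""
    query = vectorize(doc)
    neighbours = sorted(
        ((cosine(query, vectorize(text)), label) for text, label in training),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, label in neighbours:
        scores[label] += sim
    return max(scores, key=scores.get)


def train_centroids(training):
    """Build one summed term vector per category (a prototype for that category)."""
    centroids = defaultdict(Counter)
    for text, label in training:
        centroids[label].update(vectorize(text))
    return centroids


def centroid_classify(doc, centroids):
    """Assign the category whose prototype vector is most similar to the document."""
    query = vectorize(doc)
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Hypothetical toy data, not taken from OHSUMED or Reuters-21578.
TRAINING = [
    ("wheat harvest grain exports rise", "grain"),
    ("grain prices fall as wheat supply grows", "grain"),
    ("corn and wheat crop forecast", "grain"),
    ("central bank raises interest rates", "money"),
    ("interest rates and bank lending slow", "money"),
    ("bank cuts lending rates", "money"),
]

if __name__ == "__main__":
    centroids = train_centroids(TRAINING)
    print(knn_classify("wheat grain shipment", TRAINING))
    print(centroid_classify("bank lending", centroids))
```

In this sketch k-NN keeps every training document and compares the query against each of them, while the centroid classifier collapses a category into a single vector. The abstract describes GIS as unifying the strengths of both families; how it does so is not specified on this page and is not reproduced here.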
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 1998 UsingAGenerlizedInstSetForTextCat | Wai Lam, Chao Yang Ho | | | Using a Generalized Instance Set for Automatic Text Categorization | | | http://cui.unige.ch/~ehrler/Project/Gambone/UsingAGeneralizedInstanceSetForAutomaticTextCategorization.pdf | 10.1145/290941.290961 | | 1998 |