Text-Item Classification Task

(Redirected from Text Categorization)

A text-item classification task is a linguistic classification task whose input is a text item and whose class set is a document category set.
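To make the definition concrete, the sketch below treats the task as a function from a text item to one member of a category set. It is only an illustration: the multinomial naive Bayes method with add-one smoothing is a choice made here and is not taken from any of the works cited on this page, and the whitespace tokenizer, the two categories ("medicine", "sports"), the training items, and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercased whitespace tokenization; a deliberately crude assumption.
    return text.lower().split()


def train(labeled_items):
    """Build count tables from an iterable of (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of training items
    vocab = set()
    for text, category in labeled_items:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the category with the highest naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = tokenize(text)
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Log prior: share of training items in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokens:
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical training data: text items paired with document categories.
TRAINING_ITEMS = [
    ("the patient was treated for heart disease", "medicine"),
    ("clinical trial of a new drug for patients", "medicine"),
    ("the team won the football match", "sports"),
    ("the player scored a goal in the final match", "sports"),
]

model = train(TRAINING_ITEMS)
predicted = classify("a new drug for heart patients", model)
```

Here the class set has two members; the studies listed below work with category sets of roughly 12,000 to 18,000 entries, where both training data and scoring cost become much larger concerns than in this toy setting.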







  • (Yang, 1999) ⇒ Y. Yang. (1999). “An Evaluation of Statistical Approaches to Text Categorization.” In: Journal of Information Retrieval, 1.
    • NOTE: Its experiments use a search space of ~18,000 Medical Subject Headings (MeSH).



  • (Wilbur & Yang, 1996) ⇒ J. Wilbur, and Y. Yang. (1996). “Analysis of Statistical Term Strength and its Use in the Indexing and Retrieval of Molecular Biology Texts.” In: Comput. Biol. Med., 26(3), 209–222.
    • NOTE: Its experiments use a search space of fewer than 18,000 Medical Subject Headings (MeSH).


  • (Yang & Chute, 1992) ⇒ Y. Yang, and C. Chute. (1992). “A Linear Least Squares Fit Mapping Method for Information Retrieval from Natural Language Texts.” In: COLING 1992.
    • NOTE: It works with the International Classification of Diseases (about 12,000 concepts).


  • (Field, 1975) ⇒ B. J. Field. (1975). “Towards Automatic Indexing: Automatic assignment of controlled-language indexing and classification from free indexing.” In: Journal of Documentation, 31(4). doi:10.1108/eb026605


  • (Borko & Bernick, 1963) ⇒ Harold Borko, and Myrna Bernick. (1963). “Automatic Document Classification.” In: Journal of the ACM (JACM).
    • The problem of automatic document classification is a part of the larger problem of automatic content analysis. Classification means the determination of subject content. For a document to be classified under a given heading, it must be ascertained that its subject matter relates to that area of discourse. In most cases this is a relatively easy decision for a human being to make. The question being raised is whether a computer can be programmed to determine the subject content of a document and the category (categories) into which it should be classified.