Supervised Text-Item Classification Algorithm


A Supervised Text-Item Classification Algorithm is a data-driven text-item classification algorithm that is a supervised classification algorithm.
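As a minimal illustration of such an algorithm, the sketch below trains a multinomial naive Bayes text classifier (the event model compared in McCallum & Nigam, 1998, cited under References) on a small hypothetical labeled training set. scikit-learn is an assumed library choice, and the texts and labels are invented for illustration; nothing here is prescribed by this page.

```python
# Minimal sketch of a supervised text-item classification pipeline,
# assuming scikit-learn; the multinomial naive Bayes event model follows
# McCallum & Nigam (1998), cited in the References below.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled training data: (text item, class label) pairs.
train_texts = ["cheap pills online", "meeting at noon",
               "win a free prize", "lunch tomorrow?"]
train_labels = ["spam", "ham", "spam", "ham"]

# Supervised step: fit word-count features and class-conditional
# word probabilities from the labeled examples.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

# Classification step: assign the most probable class to unseen items.
print(classifier.predict(["free pills", "see you at lunch"]))
```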



References

2001

  • (Slonim & Tishby, 2001) ⇒ N. Slonim, and N. Tishby. (2001). “The Power of Word Clusters for Text Classification.” In: Proceedings of the 23rd European Colloquium on Information Retrieval Research (ECIR 2001).

1999

  • (Yang & Liu, 1999) ⇒ Yiming Yang, and Xin Liu. (1999). “A Re-examination of Text Categorization Methods.” In: Proceedings of the 22nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1999). http://dx.doi.org/10.1145/312624.312647
  • (Nigam et al., 1999) ⇒ Kamal Nigam, John Lafferty, and Andrew McCallum. (1999). “Using Maximum Entropy for Text Classification.” In: Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering. http://www.kamalnigam.com/papers/maxent-ijcaiws99.pdf
    ◦ QUOTE: Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, part-of-speech tagging, and text segmentation. The underlying principle of maximum entropy is that without external knowledge, one should prefer distributions that are uniform. Constraints on the distribution, derived from labeled training data, inform the technique where to be minimally non-uniform. The maximum entropy formulation has a unique solution, which can be found by the improved iterative scaling algorithm. In this paper, maximum entropy is used for text classification by estimating the conditional distribution of the class variable given the document. In experiments on several text datasets we compare accuracy to naive Bayes and show that maximum entropy is sometimes significantly better, but also sometimes worse. Much future work remains, but the results indicate that maximum entropy is a promising technique for text classification.
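To make the quoted description concrete, here is a minimal sketch, assuming scikit-learn and hypothetical documents and labels. Multinomial logistic regression is mathematically the same maximum-entropy model, with the lbfgs solver standing in for the improved iterative scaling procedure mentioned in the quote, and predict_proba returning the estimated conditional distribution of the class variable given the document.

```python
# Maximum-entropy text classification sketch (an illustration, not the
# paper's original implementation). Multinomial logistic regression is
# equivalent to the maximum-entropy model: among all distributions
# P(class | document) satisfying feature-expectation constraints from the
# labeled training data, it yields the one with maximal entropy.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled documents.
docs = ["stock prices fell", "the team won the match",
        "markets rally today", "coach praises players"]
labels = ["finance", "sports", "finance", "sports"]

# The lbfgs solver optimizes the same convex objective that improved
# iterative scaling solves; both reach the unique maximum-entropy solution.
maxent = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
maxent.fit(docs, labels)

# Estimated conditional class distribution P(class | document).
print(maxent.predict_proba(["players rally after the match"]))
```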

1998

  • (Apte et al., 1998) ⇒ C. Apte, F. Damerau, and Sholom M. Weiss. (1998). “Text Mining with Decision Rules and Decision Trees.” In: Proceedings of the Conference on Automated Learning and Discovery, Workshop 6: Learning from Text and the Web.
  • (McCallum & Nigam, 1998) ⇒ Andrew McCallum, and Kamal Nigam. (1998). “A Comparison of Event Models for Naive Bayes Text Classification.” In: Proceedings of the AAAI-98 Workshop on Learning for Text Categorization.


1995

  • (Wiener et al., 1995) ⇒ E. Wiener, J. O. Pedersen, and A. S. Weigend. (1995). “A Neural Network Approach to Topic Spotting.” In: Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval (SDAIR 1995).
  • (Cohen, 1995) ⇒ William W. Cohen. (1995). “Text Categorization and Relational Learning.” In: Proceedings of the Twelfth International Conference on Machine Learning (ICML 1995).

1991

  • (Fuhr et al., 1991) ⇒ N. Fuhr, S. Hartmann, G. Lustig, M. Schwantner, and K. Tzeras. (1991). “AIR/X - a Rule-based Multistage Indexing System for Large Subject Fields.” In: Proceedings of RIAO 1991.