Categorical Prediction Task Performance Measure
		A Categorical Prediction Task Performance Measure is a prediction task performance measure for categorical prediction tasks that evaluates classification accuracy and classification errors.
- AKA: Classification Performance Measure, Classification Task Performance Measure, Classifier Performance Metric, Classification Evaluation Measure, Categorical Classification Metric, Classification Model Performance Measure.
 - Context:
- It can (typically) evaluate True Positives, True Negatives, False Positives, and False Negatives from confusion matrices (see the sketch after this list).
 - It can (typically) assess Class Imbalance effects on classification performance.
 - It can (typically) incorporate Misclassification Costs when errors have different consequences.
 - It can (often) require Threshold Selection for probabilistic classifiers.
 - It can (often) be computed using Cross-Validation or Hold-Out Validation for robust estimation.
 - It can (often) guide Model Selection and Hyperparameter Tuning in machine learning pipelines.
 - It can (often) be aggregated across classes using Macro-Averaging, Micro-Averaging, or Weighted Averaging.
 - It can range from being a Binary Classification Performance Measure to being a Multi-Class Classification Performance Measure, depending on its class count.
 - It can range from being a Threshold-Dependent Classification Measure to being a Threshold-Independent Classification Measure, depending on its decision boundary sensitivity.
 - It can range from being a Class-Specific Performance Measure to being an Overall Performance Measure, depending on its aggregation scope.
 - It can range from being a Point Estimate Classification Measure to being a Probabilistic Classification Measure, depending on its prediction type.
 - It can integrate with Machine Learning Frameworks for automated evaluation.
 - It can integrate with Model Monitoring Systems for production deployment.
 - ...
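
As a concrete illustration of the confusion-matrix counts and averaging schemes mentioned in the context above, the following is a minimal sketch in Python with NumPy; the 3-class confusion matrix and its values are assumed for illustration only.

    import numpy as np

    # assumed 3-class confusion matrix: rows = actual class, columns = predicted class
    cm = np.array([[50,  3,  2],
                   [ 4, 40,  6],
                   [ 1,  5, 44]])

    tp = np.diag(cm)                  # true positives per class
    fp = cm.sum(axis=0) - tp          # false positives per class
    fn = cm.sum(axis=1) - tp          # false negatives per class
    tn = cm.sum() - (tp + fp + fn)    # true negatives per class

    precision_per_class = tp / (tp + fp)
    macro_precision = precision_per_class.mean()    # Macro-Averaging: unweighted mean over classes
    micro_precision = tp.sum() / (tp + fp).sum()    # Micro-Averaging: pool counts over all classes
    print(macro_precision, micro_precision)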
 
 - Example(s):
- Accuracy-Based Measures, such as:
- Classification Accuracy Measure: Proportion of correct predictions overall.
 - Balanced Accuracy: Average of recall obtained on each class.
 - Top-K Accuracy: Whether true class is in top K predictions.
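
A minimal sketch of the accuracy-based measures above, assuming scikit-learn is available; the labels and class scores are illustrative only.

    from sklearn.metrics import accuracy_score, balanced_accuracy_score, top_k_accuracy_score

    y_true = [0, 1, 2, 2, 1, 0]
    y_pred = [0, 2, 2, 2, 1, 0]
    # class-membership scores used for top-k accuracy (one column per class)
    y_score = [[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7],
               [0.1, 0.3, 0.6], [0.2, 0.6, 0.2], [0.7, 0.2, 0.1]]

    print(accuracy_score(y_true, y_pred))              # proportion of correct predictions
    print(balanced_accuracy_score(y_true, y_pred))     # mean recall over classes
    print(top_k_accuracy_score(y_true, y_score, k=2))  # true class within the top 2 scores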
 
 - Error-Based Measures, such as:
- Classification Error Rate: Proportion of incorrect predictions.
 - Mean Per-Class Error: Average error rate across classes.
 - Confusion Matrix: Complete error breakdown by class.
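
A minimal sketch of the error-based measures above, using scikit-learn's confusion matrix; the labels are illustrative only.

    import numpy as np
    from sklearn.metrics import confusion_matrix, accuracy_score

    y_true = [0, 1, 2, 2, 1, 0]
    y_pred = [0, 2, 2, 2, 1, 0]

    error_rate = 1.0 - accuracy_score(y_true, y_pred)      # proportion of incorrect predictions
    cm = confusion_matrix(y_true, y_pred)                  # complete error breakdown by class
    per_class_error = 1.0 - np.diag(cm) / cm.sum(axis=1)   # error rate within each actual class
    mean_per_class_error = per_class_error.mean()
    print(error_rate, mean_per_class_error)
    print(cm)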
 
 - Precision-Recall Measures, such as:
- Precision Measure: Proportion of positive predictions that are correct.
 - Recall Measure (Sensitivity): Proportion of actual positives correctly identified.
 - F1-Score: Harmonic mean of precision and recall.
 - F-Beta Score: Weighted harmonic mean of precision and recall, with the beta parameter controlling the relative weight given to recall.
 - Macro-F1 Measure: F1 averaged across all classes.
 - Micro-F1 Measure: F1 computed globally across all instances.
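
A minimal sketch of the precision-recall family above with scikit-learn; the binary labels are illustrative only.

    from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

    print(precision_score(y_true, y_pred))             # correct share of positive predictions
    print(recall_score(y_true, y_pred))                 # share of actual positives identified
    print(f1_score(y_true, y_pred))                     # harmonic mean of precision and recall
    print(fbeta_score(y_true, y_pred, beta=2))          # beta > 1 weights recall more heavily
    print(f1_score(y_true, y_pred, average="macro"))    # Macro-F1: per-class F1, then averaged
    print(f1_score(y_true, y_pred, average="micro"))    # Micro-F1: counts pooled over all instances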
 
 - Threshold-Independent Measures, such as:
- Area Under ROC Curve (AUC-ROC): Ranking performance summarized across all thresholds.
 - Area Under Precision-Recall Curve (AUC-PR): Precision-recall trade-off summarized across all thresholds; often preferred for imbalanced data.
 - Average Precision (AP): Mean of precisions at each threshold, weighted by the corresponding increase in recall.
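
A minimal sketch of the threshold-independent measures above, assuming a binary problem with predicted scores; the labels and scores are illustrative only.

    from sklearn.metrics import roc_auc_score, average_precision_score, precision_recall_curve, auc

    y_true  = [0, 0, 1, 1, 1, 0, 1, 0]
    y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.3]

    print(roc_auc_score(y_true, y_score))              # AUC-ROC over all thresholds
    print(average_precision_score(y_true, y_score))    # AP: recall-weighted mean of precisions
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    print(auc(recall, precision))                      # AUC-PR via trapezoidal integration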
 
 - Correlation-Based Measures, such as:
- Matthews Correlation Coefficient (MCC): Correlation between predicted and actual labels.
 - Cohen's Kappa: Agreement corrected for chance.
 - Phi Coefficient: Association measure for binary classification; equivalent to MCC in the binary case.
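
A minimal sketch of the correlation-based measures above with scikit-learn; since the phi coefficient coincides with MCC for binary labels, it is not computed separately here.

    from sklearn.metrics import matthews_corrcoef, cohen_kappa_score

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

    print(matthews_corrcoef(y_true, y_pred))    # correlation between predicted and actual labels
    print(cohen_kappa_score(y_true, y_pred))    # agreement corrected for chance agreement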
 
 - Probabilistic Measures, such as:
- Log Loss (Cross-Entropy): Logarithmic penalty on the probability assigned to the true class.
 - Brier Score: Mean squared difference between predicted probabilities and outcomes.
 - Calibration Error: Difference between predicted probabilities and observed outcome frequencies.
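
A minimal sketch of the probabilistic measures above; log loss and the Brier score come from scikit-learn, while the calibration-error lines are a crude hand-rolled binned estimate under an assumed two-bin split, for illustration only.

    import numpy as np
    from sklearn.metrics import log_loss, brier_score_loss

    y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
    y_prob = np.array([0.2, 0.4, 0.7, 0.9, 0.6, 0.1, 0.8, 0.3])   # predicted P(class = 1)

    print(log_loss(y_true, y_prob))            # log penalty on the probability of the true class
    print(brier_score_loss(y_true, y_prob))    # mean squared difference from the outcomes

    # crude expected calibration error with two probability bins (illustrative only)
    bins = np.digitize(y_prob, [0.5])
    ece = sum(abs(y_prob[bins == b].mean() - y_true[bins == b].mean()) * (bins == b).mean()
              for b in np.unique(bins))
    print(ece)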
 
 - Cost-Sensitive Measures, such as:
- Expected Cost: Weighted sum of misclassification costs.
 - Profit Curve: Profit as function of classification threshold.
 - Cost Curve: Expected cost across class distributions.
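
A minimal sketch of an expected-cost calculation under an assumed, purely illustrative misclassification-cost matrix; profit curves and cost curves would sweep the threshold or class distribution in the same spirit.

    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

    # assumed cost matrix: rows = actual class, columns = predicted class
    cost = np.array([[0.0, 1.0],     # false positive costs 1
                     [5.0, 0.0]])    # false negative costs 5
    cm = confusion_matrix(y_true, y_pred)
    expected_cost = (cm * cost).sum() / cm.sum()   # average misclassification cost per prediction
    print(expected_cost)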
 
 - Multi-Class Specific Measures, such as:
- Macro-Precision: Average precision across all classes.
 - Micro-Recall: Global recall across all instances.
 - Weighted F-Score: F-score weighted by class frequency.
 - One-vs-All AUC: AUC for each class against all others.
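
A minimal sketch of the multi-class aggregations above with scikit-learn; the three-class labels and probability matrix are illustrative only.

    from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

    y_true = [0, 1, 2, 2, 1, 0]
    y_pred = [0, 2, 2, 2, 1, 0]
    y_proba = [[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7],
               [0.1, 0.3, 0.6], [0.2, 0.6, 0.2], [0.7, 0.2, 0.1]]

    print(precision_score(y_true, y_pred, average="macro"))    # Macro-Precision
    print(recall_score(y_true, y_pred, average="micro"))       # Micro-Recall
    print(f1_score(y_true, y_pred, average="weighted"))        # F-score weighted by class frequency
    print(roc_auc_score(y_true, y_proba, multi_class="ovr"))   # one-vs-rest AUC, averaged over classes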
 
 - Imbalanced Data Measures, such as:
- G-Mean: Geometric mean of sensitivity and specificity.
 - Informedness (Youden's J): Sensitivity + Specificity - 1.
 - Markedness: Precision + Negative Predictive Value - 1.
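
A minimal sketch computing G-mean, informedness, and markedness directly from binary confusion-matrix counts; the imbalanced labels are illustrative only.

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 0, 1, 0, 1, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)     # recall on the positive class
    specificity = tn / (tn + fp)     # recall on the negative class
    precision   = tp / (tp + fp)
    npv         = tn / (tn + fn)     # negative predictive value

    g_mean       = (sensitivity * specificity) ** 0.5
    informedness = sensitivity + specificity - 1     # Youden's J
    markedness   = precision + npv - 1
    print(g_mean, informedness, markedness)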
 
 - Information-Theoretic Measures, such as:
- Mutual Information: Information shared between predictions and labels.
 - Normalized Mutual Information: Mutual information scaled to [0, 1] for comparison across datasets.
 - Adjusted Rand Index: Rand index corrected for chance.
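
A minimal sketch of the information-theoretic measures above using scikit-learn's implementations (these live in sklearn.metrics and are also used for comparing clusterings); the labels are illustrative only.

    from sklearn.metrics import mutual_info_score, normalized_mutual_info_score, adjusted_rand_score

    y_true = [0, 1, 2, 2, 1, 0]
    y_pred = [0, 2, 2, 2, 1, 0]

    print(mutual_info_score(y_true, y_pred))              # information shared, in nats
    print(normalized_mutual_info_score(y_true, y_pred))   # scaled to [0, 1] for comparability
    print(adjusted_rand_score(y_true, y_pred))            # pair-counting agreement, chance-corrected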
 
 - ...
 
 - Counter-Example(s):
- Regression Performance Measure, which evaluates continuous predictions rather than categories.
 - Ranking Performance Measure, which evaluates ordering rather than classification.
 - Clustering Performance Measure, which evaluates unsupervised grouping.
 - Ordinal Prediction Task Performance Measure, which considers ordering of categories.
 - Sequence Labeling Performance Measure, which evaluates sequential predictions.
 - Object Detection Performance Measure, which combines localization and classification.
 
 - See: Classification Task, Confusion Matrix, ROC Curve, Precision-Recall Curve, Class Imbalance, Cost-Sensitive Learning, Threshold Selection, Model Calibration, Binary Classification, Multi-Class Classification, Multi-Label Classification, Imbalanced Learning, Performance Evaluation, Cross-Validation, Model Selection.