Categorical Prediction Task Performance Measure
A Categorical Prediction Task Performance Measure is a prediction task performance measure for categorical prediction tasks that evaluates classification accuracy and classification errors.
- AKA: Classification Performance Measure, Classification Task Performance Measure, Classifier Performance Metric, Classification Evaluation Measure, Categorical Classification Metric, Classification Model Performance Measure.
- Context:
- It can (typically) evaluate True Positives, True Negatives, False Positives, and False Negatives from confusion matrices (see the code sketch after this list).
- It can (typically) assess Class Imbalance effects on classification performance.
- It can (typically) incorporate Misclassification Costs when errors have different consequences.
- It can (often) require Threshold Selection for probabilistic classifiers.
- It can (often) be computed using Cross-Validation or Hold-Out Validation for robust estimation.
- It can (often) guide Model Selection and Hyperparameter Tuning in machine learning pipelines.
- It can (often) be aggregated across classes using Macro-Averaging, Micro-Averaging, or Weighted Averaging.
- It can range from being a Binary Classification Performance Measure to being a Multi-Class Classification Performance Measure, depending on its class count.
- It can range from being a Threshold-Dependent Classification Measure to being a Threshold-Independent Classification Measure, depending on its decision boundary sensitivity.
- It can range from being a Class-Specific Performance Measure to being an Overall Performance Measure, depending on its aggregation scope.
- It can range from being a Point Estimate Classification Measure to being a Probabilistic Classification Measure, depending on its prediction type.
- It can integrate with Machine Learning Frameworks for automated evaluation.
- It can integrate with Model Monitoring Systems for production deployment.
- ...
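The following is a minimal sketch, assuming scikit-learn and NumPy are available, of the context items above: per-class True Positive, False Positive, False Negative, and True Negative counts are derived from a confusion matrix, and the same predictions are aggregated with Macro-Averaging, Micro-Averaging, and Weighted Averaging. The label vectors are hypothetical and purely illustrative.

```python
# Sketch: confusion-matrix counts and class-level aggregation (hypothetical labels).
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])   # hypothetical gold labels
y_pred = np.array([0, 1, 1, 1, 0, 2, 2, 1, 2, 0])   # hypothetical predictions

cm = confusion_matrix(y_true, y_pred)   # rows = actual class, columns = predicted class
tp = np.diag(cm)                        # true positives per class
fp = cm.sum(axis=0) - tp                # false positives per class
fn = cm.sum(axis=1) - tp                # false negatives per class
tn = cm.sum() - (tp + fp + fn)          # true negatives per class

# Macro-averaging weights every class equally; micro-averaging pools all instances,
# so it is dominated by the majority class under class imbalance.
for avg in ("macro", "micro", "weighted"):
    p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg:>8}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```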
- Example(s):
- Accuracy-Based Measures, such as:
- Classification Accuracy Measure: Proportion of correct predictions overall.
- Balanced Accuracy: Average of recall obtained on each class.
- Top-K Accuracy: Whether the true class appears among the top K predictions.
- Error-Based Measures, such as:
- Classification Error Rate: Proportion of incorrect predictions.
- Mean Per-Class Error: Average error rate across classes.
- Confusion Matrix: Complete error breakdown by class.
- Precision-Recall Measures (see the first code sketch after these examples), such as:
- Precision Measure: Proportion of positive predictions that are correct.
- Recall Measure (Sensitivity): Proportion of actual positives correctly identified.
- F1-Score: Harmonic mean of precision and recall.
- F-Beta Score: Weighted harmonic mean of precision and recall, where beta sets the weight of recall relative to precision.
- Macro-F1 Measure: F1 averaged across all classes.
- Micro-F1 Measure: F1 computed globally across all instances.
- Threshold-Independent Measures (see the second code sketch after these examples), such as:
- Area Under ROC Curve (AUC-ROC): Performance across all thresholds.
- Area Under Precision-Recall Curve (AUC-PR): Performance across all thresholds; more informative than AUC-ROC under class imbalance.
- Average Precision (AP): Mean of precisions at each threshold, weighted by the increase in recall.
- Correlation-Based Measures, such as:
- Matthews Correlation Coefficient (MCC): Correlation coefficient between predicted and actual class labels.
- Cohen's Kappa: Agreement corrected for chance.
- Phi Coefficient: Association for binary classification.
- Probabilistic Measures, such as:
- Log Loss (Cross-Entropy): Logarithmic penalty on the probability assigned to the true class.
- Brier Score: Mean squared difference between predicted probabilities and outcomes.
- Calibration Error: Gap between predicted probabilities and observed outcome frequencies.
- Cost-Sensitive Measures, such as:
- Expected Cost: Weighted sum of misclassification costs.
- Profit Curve: Profit as function of classification threshold.
- Cost Curve: Expected cost across class distributions.
- Multi-Class Specific Measures, such as:
- Macro-Precision: Average precision across all classes.
- Micro-Recall: Global recall across all instances.
- Weighted F-Score: F-score weighted by class frequency.
- One-vs-All AUC: AUC for each class against all others.
- Imbalanced Data Measures, such as:
- G-Mean: Geometric mean of sensitivity and specificity.
- Informedness (Youden's J): Sensitivity + Specificity - 1.
- Markedness: Precision + Negative Predictive Value - 1.
- Information-Theoretic Measures, such as:
- Mutual Information: Information shared between predictions and labels.
- Normalized Mutual Information: Normalized version for comparison.
- Adjusted Rand Index: Rand index corrected for chance.
- ...
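The first sketch below is a minimal, hypothetical example (assuming scikit-learn and NumPy) of threshold-dependent measures computed from hard binary predictions, covering the Accuracy-Based, Precision-Recall, Correlation-Based, and Imbalanced Data groups above; the label vectors are made up for illustration.

```python
# Sketch: threshold-dependent classification measures from hard binary predictions.
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             matthews_corrcoef, cohen_kappa_score, confusion_matrix)

y_true = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 0])   # hypothetical gold labels
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 0])   # hypothetical predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)          # recall on the positive class
specificity = tn / (tn + fp)          # recall on the negative class

print("accuracy         ", accuracy_score(y_true, y_pred))
print("balanced accuracy", balanced_accuracy_score(y_true, y_pred))
print("F1 (binary)      ", f1_score(y_true, y_pred))
print("MCC              ", matthews_corrcoef(y_true, y_pred))
print("Cohen's kappa    ", cohen_kappa_score(y_true, y_pred))
print("G-mean           ", np.sqrt(sensitivity * specificity))
print("informedness (J) ", sensitivity + specificity - 1)
```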
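The second sketch is a minimal, hypothetical example (assuming scikit-learn and NumPy) of the Threshold-Independent and Probabilistic groups, computed from predicted positive-class probabilities rather than hard labels; the score vector is illustrative only.

```python
# Sketch: threshold-independent and probabilistic measures from predicted probabilities.
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             log_loss, brier_score_loss)

y_true  = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 0])                    # hypothetical gold labels
y_score = np.array([0.9, 0.2, 0.6, 0.8, 0.4, 0.1, 0.3, 0.2, 0.7, 0.5])  # hypothetical probabilities

print("AUC-ROC          ", roc_auc_score(y_true, y_score))            # ranking quality across all thresholds
print("average precision", average_precision_score(y_true, y_score))  # summary of the precision-recall curve
print("log loss         ", log_loss(y_true, y_score))                 # penalizes confident wrong probabilities
print("Brier score      ", brier_score_loss(y_true, y_score))         # mean squared probability error
```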
- Counter-Example(s):
- Regression Performance Measure, which evaluates continuous predictions rather than categories.
- Ranking Performance Measure, which evaluates ordering rather than classification.
- Clustering Performance Measure, which evaluates unsupervised grouping.
- Ordinal Prediction Task Performance Measure, which considers ordering of categories.
- Sequence Labeling Performance Measure, which evaluates sequential predictions.
- Object Detection Performance Measure, which combines localization and classification.
- See: Classification Task, Confusion Matrix, ROC Curve, Precision-Recall Curve, Class Imbalance, Cost-Sensitive Learning, Threshold Selection, Model Calibration, Binary Classification, Multi-Class Classification, Multi-Label Classification, Imbalanced Learning, Performance Evaluation, Cross-Validation, Model Selection.