Binary Classification Performance Measure
A Binary Classification Performance Measure is a categorical prediction task performance measure that evaluates the quality of predictions on a binary classification task (i.e., a task with exactly two classes).
- AKA: Boolean Classification Performance Measure, Two-Class Classification Metric, Binary Classifier Performance Metric, Predictive Relation Performance Metric.
- Context:
- It can (typically) be derived from a 2x2 Confusion Matrix containing True Positives, True Negatives, False Positives, and False Negatives (see the sketch following this context list).
- It can (typically) evaluate Type I Errors (false positives) and Type II Errors (false negatives) separately.
- It can (typically) be affected by Class Imbalance between positive and negative classes.
- It can (often) require Threshold Selection for probabilistic binary classifiers.
- It can (often) be optimized for specific Cost-Sensitive Learning scenarios where error types have different impacts.
- It can (often) guide Binary Classifier Selection based on domain-specific requirements.
- It can (often) be visualized using ROC Curves, Precision-Recall Curves, or Lift Charts.
- It can range from being a Threshold-Dependent Binary Measure to being a Threshold-Independent Binary Measure, depending on its decision boundary requirement.
- It can range from being a Class-Specific Binary Measure to being a Combined Binary Measure, depending on its class focus.
- It can range from being a Point Estimate Binary Measure to being a Probabilistic Binary Measure, depending on its prediction type.
- It can integrate with Binary Classification Systems for model evaluation.
- It can integrate with Hyperparameter Optimization Frameworks for model tuning.
- ...
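Most of the measures below are functions of the 2x2 confusion matrix mentioned in the context above. The following is a minimal illustrative sketch, not a reference implementation; the function name `confusion_counts` and the example data are chosen for this page and are not drawn from any particular library.

```python
# Minimal sketch: tally the four cells of a 2x2 confusion matrix
# from parallel sequences of actual and predicted binary labels (1 = positive).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Type I errors
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Type II errors
    return tp, tn, fp, fn


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
    print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```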
- Example(s):
- Basic Rate Measures (computed in the first sketch following this examples list), such as:
- True Positive Rate (Sensitivity/Recall): TP / (TP + FN).
- True Negative Rate (Specificity): TN / (TN + FP).
- False Positive Rate: FP / (TN + FP).
- False Negative Rate: FN / (TP + FN).
- Precision-Based Measures, such as:
- Precision (Positive Predictive Value): TP / (TP + FP).
- Negative Predictive Value: TN / (TN + FN).
- False Discovery Rate: FP / (TP + FP).
- Combined Measures, such as:
- F1-Score: Harmonic mean of precision and recall.
- F-Beta Score: Weighted harmonic mean of precision and recall, with the beta parameter controlling the relative weight given to recall.
- G-Mean: Geometric mean of sensitivity and specificity.
- Balanced Accuracy: Average of sensitivity and specificity.
- Correlation Measures, such as:
- Matthews Correlation Coefficient: Correlation between predicted and actual binary labels.
- Phi Coefficient: Pearson correlation for binary variables.
- Cohen's Kappa: Agreement corrected for chance.
- Youden's J Statistic: Sensitivity + Specificity - 1.
- Probabilistic Measures (see the second sketch following this examples list), such as:
- Log Loss (Binary Cross-Entropy): Logarithmic penalty for probabilistic predictions.
- Brier Score: Mean squared difference between predicted probabilities and outcomes.
- Expected Calibration Error: Average gap between predicted confidence and observed accuracy, taken over probability bins.
- Threshold-Independent Measures, such as:
- Area Under ROC Curve (AUC-ROC): Performance across all thresholds.
- Area Under Precision-Recall Curve (AUC-PR): Performance for imbalanced datasets.
- Average Precision: Mean of the precision values at each threshold, weighted by the corresponding increase in recall.
- Information-Theoretic Measures, such as:
- Mutual Information: Information shared between predictions and labels.
- Information Gain: Reduction in entropy from predictions.
- Kullback-Leibler Divergence: Divergence between predicted and actual distributions.
- Cost-Sensitive Measures, such as:
- Expected Cost: Weighted sum of error costs.
- Profit Score: Economic value of predictions.
- Utility Score: Domain-specific utility function.
- Statistical Test Measures, such as:
- McNemar's Test: Statistical significance of classifier differences.
- Binomial Test: Significance of classification accuracy.
- DeLong's Test: Significance of AUC differences.
- ...
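The count-based measures listed above follow directly from the four confusion-matrix cells. The sketch below is illustrative only and assumes the `confusion_counts` helper from the earlier sketch; `count_based_measures` is a hypothetical name, and degenerate cases (e.g., an empty predicted-positive column) are not handled.

```python
import math

# Illustrative sketch only: count-based binary classification measures
# computed from the four confusion-matrix cells. Zero denominators
# (degenerate rows or columns) are not guarded against here.
def count_based_measures(tp, tn, fp, fn):
    tpr = tp / (tp + fn)                          # True Positive Rate (Sensitivity / Recall)
    tnr = tn / (tn + fp)                          # True Negative Rate (Specificity)
    fpr = fp / (tn + fp)                          # False Positive Rate
    fnr = fn / (tp + fn)                          # False Negative Rate
    precision = tp / (tp + fp)                    # Positive Predictive Value
    f1 = 2 * precision * tpr / (precision + tpr)  # Harmonic mean of precision and recall
    g_mean = math.sqrt(tpr * tnr)                 # Geometric mean of sensitivity and specificity
    balanced_accuracy = (tpr + tnr) / 2           # Average of sensitivity and specificity
    youden_j = tpr + tnr - 1                      # Youden's J statistic
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )                                             # Matthews Correlation Coefficient
    return {
        "tpr": tpr, "tnr": tnr, "fpr": fpr, "fnr": fnr,
        "precision": precision, "f1": f1, "g_mean": g_mean,
        "balanced_accuracy": balanced_accuracy,
        "youden_j": youden_j, "mcc": mcc,
    }
```

With the example counts from the earlier sketch (tp=3, tn=3, fp=1, fn=1), this gives a sensitivity, specificity, precision, F1, and balanced accuracy of 0.75 each, and an MCC of 0.5.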
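A similarly minimal sketch for some of the probabilistic and threshold-independent measures follows; `log_loss`, `brier_score`, and `auc_roc` are hypothetical helper names chosen for this page. The AUC-ROC is computed via its equivalence to the Mann-Whitney U rank statistic: the probability that a randomly chosen positive instance is scored above a randomly chosen negative one.

```python
import math

# Illustrative sketch only: probabilistic and threshold-independent measures
# computed from true binary labels and predicted positive-class probabilities.
def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and outcomes."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def auc_roc(y_true, y_prob):
    """AUC via the Mann-Whitney U statistic: the fraction of positive-negative
    pairs in which the positive instance is scored higher (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For a classifier that ranks every positive above every negative, `auc_roc` returns 1.0; for uninformative scores it approaches 0.5.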
- Counter-Example(s):
- Multi-Class Classification Performance Measure, which evaluates more than two classes.
- Regression Performance Measure, which evaluates continuous predictions.
- Ranking Performance Measure, which evaluates ordering quality.
- Clustering Performance Measure, which evaluates unsupervised grouping.
- Multi-Label Classification Measure, which allows multiple labels per instance.
- See: Binary Classification Task, Confusion Matrix, ROC Analysis, Precision-Recall Analysis, Type I and Type II Errors, Class Imbalance Problem, Threshold Selection, Cost-Sensitive Learning, Statistical Hypothesis Testing, Classifier Comparison, Model Calibration, Binary Classifier.
References
2009
- Eric W. Weisstein. "Statistical Test." From MathWorld -- A Wolfram Web Resource. http://mathworld.wolfram.com/StatisticalTest.html
- QUOTE: A test used to determine the statistical significance of an observation. Two main types of error can occur:
- 1. A type I error occurs when a false negative result is obtained in terms of the null hypothesis by obtaining a false positive measurement.
- 2. A type II error occurs when a false positive result is obtained in terms of the null hypothesis by obtaining a false negative measurement.
- The probability that a statistical test will be positive for a true statistic is sometimes called the test's sensitivity, and the probability that a test will be negative for a negative statistic is sometimes called the specificity. The following table summarizes the names given to the various combinations of the actual state of affairs and observed test results.
| result | name |
|---|---|
| true positive result | sensitivity |
| false negative result | 1 - sensitivity |
| true negative result | specificity |
| false positive result | 1 - specificity |