Binary Classification Performance Measure
A Binary Classification Performance Measure is a categorical prediction task performance measure for binary classification tasks, i.e., one that evaluates the quality of two-class predictions.
- AKA: Boolean Classification Performance Measure, Two-Class Classification Metric, Binary Classifier Performance Metric, Predictive Relation Performance Metric.
 - Context:
- It can (typically) be derived from a 2x2 Confusion Matrix containing True Positives, True Negatives, False Positives, and False Negatives.
 - It can (typically) evaluate Type I Errors (false positives) and Type II Errors (false negatives) separately.
 - It can (typically) be affected by Class Imbalance between positive and negative classes.
 - It can (often) require Threshold Selection for probabilistic binary classifiers.
 - It can (often) be optimized for specific Cost-Sensitive Learning scenarios where error types have different impacts.
 - It can (often) guide Binary Classifier Selection based on domain-specific requirements.
 - It can (often) be visualized using ROC Curves, Precision-Recall Curves, or Lift Charts.
 - It can range from being a Threshold-Dependent Binary Measure to being a Threshold-Independent Binary Measure, depending on its decision boundary requirement.
 - It can range from being a Class-Specific Binary Measure to being a Combined Binary Measure, depending on its class focus.
 - It can range from being a Point Estimate Binary Measure to being a Probabilistic Binary Measure, depending on its prediction type.
 - It can integrate with Binary Classification Systems for model evaluation.
 - It can integrate with Hyperparameter Optimization Frameworks for model tuning.
 - ...
 
 - Example(s):
- Basic Rate Measures, such as:
- True Positive Rate (Sensitivity/Recall): TP / (TP + FN).
 - True Negative Rate (Specificity): TN / (TN + FP).
 - False Positive Rate: FP / (TN + FP).
 - False Negative Rate: FN / (TP + FN).
 
- Precision-Based Measures, such as:
- Precision (Positive Predictive Value): TP / (TP + FP).
 - Negative Predictive Value: TN / (TN + FN).

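Both the basic rates and the precision-based measures above are simple ratios of the four confusion-matrix counts. A minimal sketch, using small illustrative counts rather than output from any real classifier:

```python
# Minimal sketch: basic rates and precision-based measures from 2x2 confusion-matrix counts.
# The counts below are illustrative placeholders.
tp, fp, tn, fn = 40, 10, 45, 5

tpr = tp / (tp + fn)        # True Positive Rate (sensitivity / recall)
tnr = tn / (tn + fp)        # True Negative Rate (specificity)
fpr = fp / (fp + tn)        # False Positive Rate
fnr = fn / (fn + tp)        # False Negative Rate
precision = tp / (tp + fp)  # Positive Predictive Value
npv = tn / (tn + fn)        # Negative Predictive Value

print(f"TPR={tpr:.3f} TNR={tnr:.3f} FPR={fpr:.3f} FNR={fnr:.3f} "
      f"precision={precision:.3f} NPV={npv:.3f}")
```
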
 - Combined Measures, such as:
- F1-Score: Harmonic mean of precision and recall.
 - F-Beta Score: Weighted harmonic mean with beta parameter.
 - G-Mean: Geometric mean of sensitivity and specificity.
 - Balanced Accuracy: Average of sensitivity and specificity.
 
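A minimal sketch of the combined measures above, reusing the same illustrative counts; the beta value is an arbitrary example weight, not a recommended setting:

```python
# Minimal sketch: combined measures built from precision, recall, and specificity.
import math

tp, fp, tn, fn = 40, 10, 45, 5
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # sensitivity
specificity = tn / (tn + fp)

f1 = 2 * precision * recall / (precision + recall)
beta = 2.0                       # illustrative weight favoring recall
f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
g_mean = math.sqrt(recall * specificity)
balanced_accuracy = (recall + specificity) / 2

print(f"F1={f1:.3f} F{beta:g}={f_beta:.3f} G-mean={g_mean:.3f} BA={balanced_accuracy:.3f}")
```
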
 - Correlation Measures, such as:
- Matthews Correlation Coefficient: Correlation between predicted and actual binary labels.
 - Phi Coefficient: Pearson correlation for binary variables.
 - Cohen's Kappa: Agreement corrected for chance.
 - Youden's J Statistic: Sensitivity + Specificity - 1.
 
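The correlation-style measures above can likewise be computed directly from the 2x2 counts; for a 2x2 table the Matthews Correlation Coefficient coincides with the phi coefficient. A minimal sketch with the same illustrative counts:

```python
# Minimal sketch: correlation-style measures from 2x2 confusion-matrix counts.
import math

tp, fp, tn, fn = 40, 10, 45, 5
n = tp + fp + tn + fn

# Matthews Correlation Coefficient (equals the phi coefficient for a 2x2 table)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Cohen's kappa: observed agreement corrected for chance agreement
p_observed = (tp + tn) / n
p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
kappa = (p_observed - p_chance) / (1 - p_chance)

# Youden's J statistic: sensitivity + specificity - 1
youden_j = tp / (tp + fn) + tn / (tn + fp) - 1

print(f"MCC={mcc:.3f} kappa={kappa:.3f} J={youden_j:.3f}")
```
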
 - Probabilistic Measures, such as:
- Log Loss (Binary Cross-Entropy): Logarithmic penalty for probabilistic predictions.
 - Brier Score: Mean squared difference between predicted probabilities and outcomes.
 - Expected Calibration Error: Average difference between confidence and accuracy.
 
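The probabilistic measures above operate on predicted probabilities rather than hard labels. A minimal sketch of log loss and the Brier score on a hand-made toy sample (the labels and probabilities below are illustrative only):

```python
# Minimal sketch: probabilistic measures on toy predicted probabilities.
# y holds true labels in {0, 1}; p holds predicted probabilities of the positive class.
import math

y = [1, 0, 1, 1, 0]
p = [0.9, 0.2, 0.6, 0.8, 0.4]

eps = 1e-15  # clip probabilities to avoid log(0)
log_loss = -sum(
    yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
    for yi, pi in zip(y, p)) / len(y)

brier = sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

print(f"log loss={log_loss:.3f} Brier={brier:.3f}")
```
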
 - Threshold-Independent Measures, such as:
- Area Under ROC Curve (AUC-ROC): Performance across all thresholds.
 - Area Under Precision-Recall Curve (AUC-PR): Performance for imbalanced datasets.
 - Average Precision: Weighted mean of precisions at each threshold.
 
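AUC-ROC can be computed without sweeping explicit thresholds by using its rank interpretation: the probability that a randomly chosen positive instance is scored above a randomly chosen negative one, with ties counted as one half. A minimal sketch on the same toy sample:

```python
# Minimal sketch: AUC-ROC as the probability that a random positive outranks a random negative.
y = [1, 0, 1, 1, 0]
p = [0.9, 0.2, 0.6, 0.8, 0.4]

pos = [pi for yi, pi in zip(y, p) if yi == 1]
neg = [pi for yi, pi in zip(y, p) if yi == 0]

pairs = [(a, b) for a in pos for b in neg]
auc = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in pairs) / len(pairs)

print(f"AUC-ROC={auc:.3f}")  # 1.0 here: every positive is scored above every negative
```
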
 - Information-Theoretic Measures, such as:
- Mutual Information: Information shared between predictions and labels.
 - Information Gain: Reduction in entropy from predictions.
 - Kullback-Leibler Divergence: Divergence between predicted and actual distributions.
 
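A minimal sketch of estimating the mutual information between predicted and actual labels from the same illustrative confusion-matrix counts (reported in bits):

```python
# Minimal sketch: mutual information between predicted and true binary labels.
import math

tp, fp, tn, fn = 40, 10, 45, 5
n = tp + fp + tn + fn

# Joint distribution over (predicted, actual) and its marginals
joint = {(1, 1): tp / n, (1, 0): fp / n, (0, 0): tn / n, (0, 1): fn / n}
pred = {1: (tp + fp) / n, 0: (tn + fn) / n}
actual = {1: (tp + fn) / n, 0: (tn + fp) / n}

mi = sum(
    pxy * math.log2(pxy / (pred[x] * actual[y]))
    for (x, y), pxy in joint.items() if pxy > 0)

print(f"mutual information = {mi:.3f} bits")
```
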
 - Cost-Sensitive Measures, such as:
- Expected Cost: Weighted sum of error costs.
 - Profit Score: Economic value of predictions.
 - Utility Score: Domain-specific utility function.
 
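A minimal sketch of an expected-cost measure, assuming hypothetical per-error costs in which a false negative is five times as costly as a false positive (the costs are illustrative, not domain-derived):

```python
# Minimal sketch: expected cost per instance with asymmetric, illustrative error costs.
tp, fp, tn, fn = 40, 10, 45, 5
n = tp + fp + tn + fn

cost_fp, cost_fn = 1.0, 5.0          # hypothetical per-error costs
expected_cost = (cost_fp * fp + cost_fn * fn) / n

print(f"expected cost per instance = {expected_cost:.3f}")
```
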
 - Statistical Test Measures, such as:
- McNemar's Test: Statistical significance of classifier differences.
 - Binomial Test: Significance of classification accuracy.
 - DeLong's Test: Significance of AUC differences.
 
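A minimal sketch of McNemar's test for comparing two classifiers evaluated on the same instances; the disagreement counts b and c below are illustrative placeholders:

```python
# Minimal sketch: McNemar's test on classifier disagreements.
# b = instances classifier A got right and B got wrong; c = the reverse.
import math

b, c = 12, 4
n_disagree = b + c

# Exact two-sided p-value under H0: disagreements split 50/50, i.e. Binomial(b+c, 0.5)
k = min(b, c)
p_value = min(1.0, 2 * sum(
    math.comb(n_disagree, i) for i in range(k + 1)) / 2 ** n_disagree)

# Chi-squared statistic with continuity correction (large-sample version)
chi2_stat = (abs(b - c) - 1) ** 2 / n_disagree

print(f"exact p={p_value:.3f}  chi2={chi2_stat:.2f}")
```
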
 - ...
 
 - Counter-Example(s):
- Multi-Class Classification Performance Measure, which evaluates more than two classes.
 - Regression Performance Measure, which evaluates continuous predictions.
 - Ranking Performance Measure, which evaluates ordering quality.
 - Clustering Performance Measure, which evaluates unsupervised grouping.
 - Multi-Label Classification Measure, which allows multiple labels per instance.
 
 - See: Binary Classification Task, Confusion Matrix, ROC Analysis, Precision-Recall Analysis, Type I and Type II Errors, Class Imbalance Problem, Threshold Selection, Cost-Sensitive Learning, Statistical Hypothesis Testing, Classifier Comparison, Model Calibration, Binary Classifier.
 
References
2009
- Eric W. Weisstein. "Statistical Test." From MathWorld -- A Wolfram Web Resource. http://mathworld.wolfram.com/StatisticalTest.html
- QUOTE: A test used to determine the statistical significance of an observation. Two main types of error can occur:
- 1. A type I error occurs when a false negative result is obtained in terms of the null hypothesis by obtaining a false positive measurement.
 - 2. A type II error occurs when a false positive result is obtained in terms of the null hypothesis by obtaining a false negative measurement.
 
 - The probability that a statistical test will be positive for a true statistic is sometimes called the test's sensitivity, and the probability that a test will be negative for a negative statistic is sometimes called the specificity. The following table summarizes the names given to the various combinations of the actual state of affairs and observed test results.
- result → name
 - true positive result → sensitivity
 - false negative result → 1-sensitivity
 - true negative result → specificity
 - false positive result → 1-specificity
 
 