Macro-F1 Difference P-Value Method

From GM-RKB



A Macro-F1 Difference P-Value Method is a p-value calculation method that compares two macro-F1 scores by summing their estimated variances and computing a Z-score for their difference under independence assumptions.
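Written out, a minimal form of the statistic under the independence assumption looks like the block below. The symbols are illustrative notation rather than notation taken from this page: F_A and F_B are the two macro-F1 scores, Var-hat denotes their estimated variances, and Phi is the standard normal CDF.

```latex
\Delta = F_A - F_B, \qquad
\widehat{\mathrm{Var}}(\Delta) = \widehat{\mathrm{Var}}(F_A) + \widehat{\mathrm{Var}}(F_B), \qquad
Z = \frac{\Delta}{\sqrt{\widehat{\mathrm{Var}}(\Delta)}}, \qquad
p_{\text{two-sided}} = 2\,\bigl(1 - \Phi(|Z|)\bigr)
```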

AKA: Macro-F1 Comparison Test, Two-Model Macro-F1 Test, Macro F1 Difference Significance Test, Paired Macro-F1 Hypothesis Test.
Context:
- It can typically compute the variance of the difference by summing the two models' variances.
- It can typically rely on the Independent Groups Assumption in Variance Estimation Method, both within and across models.
- It can typically apply a two-sided alternative hypothesis test when comparing models.
- It can often assess whether an observed model improvement is statistically significant.
- It can often support multiple comparison corrections when testing several model pairs.
- It can often handle paired evaluation on the same test set, in which case twice the covariance between the two models' scores is subtracted from the summed variances.
- It can range from being an Independent Macro-F1 Difference P-Value Method to being a Paired Macro-F1 Difference P-Value Method, depending on its sample relationship.
- It can range from being an Equal-Variance Macro-F1 Difference P-Value Method to being an Unequal-Variance Macro-F1 Difference P-Value Method, depending on its variance assumption.
- It can range from being a Two-Tailed Macro-F1 Difference P-Value Method to being a One-Tailed Macro-F1 Difference P-Value Method, depending on its alternative hypothesis.
- It can range from being a Small-Sample Macro-F1 Difference P-Value Method to being a Large-Sample Macro-F1 Difference P-Value Method, depending on its sample size handling.
- ...
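The variance-summation, sidedness, paired-evaluation, and multiple-comparison points above can be combined into a small sketch. It assumes the per-model macro-F1 variances (and, for a paired design, their covariance) have already been estimated elsewhere. The function names are hypothetical, and Holm-Bonferroni is used only as one example of a multiple comparison correction.

```python
import math
from typing import List, Sequence, Tuple


def macro_f1_difference_p_value(
    f1_a: float,
    var_a: float,
    f1_b: float,
    var_b: float,
    cov_ab: float = 0.0,
    alternative: str = "two-sided",
) -> Tuple[float, float, float]:
    """Return (difference, z, p_value) for macro-F1(A) - macro-F1(B).

    var_a and var_b are assumed to be previously estimated variances of
    each model's macro-F1. cov_ab = 0 encodes the independence assumption;
    a paired evaluation on one test set would pass an estimated covariance.
    """
    diff = f1_a - f1_b
    # Variance summation rule: Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B).
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    z = diff / math.sqrt(var_diff)
    if alternative == "two-sided":
        p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    elif alternative == "greater":  # H1: A's macro-F1 exceeds B's
        p = 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)
    elif alternative == "less":  # H1: A's macro-F1 is below B's
        p = 0.5 * math.erfc(-z / math.sqrt(2.0))  # equals Phi(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return diff, z, p


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted
```

For instance, `macro_f1_difference_p_value(0.838, 0.002, 0.781, 0.002)` runs the default two-sided test with made-up variances of 0.002 per model; in practice those inputs come from whichever variance estimation method is in use. `holm_adjust` takes the p-values from several model pairs and returns adjusted values that can be compared directly against the chosen significance level.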
Example(s):
- Two-Model Comparison Tests, such as:
  - Model A: macro-F1=0.838, Model B: macro-F1=0.781.
  - Difference=0.057, p-value=0.37 (not significant).
- Baseline vs Enhanced Model Tests, such as:
  - Testing improvement over baseline classifier.
  - One-sided test for positive difference.
- Algorithm Comparison Tests, such as:
  - Comparing neural-network and tree-based models.
  - Accounting for group-level differences.
- ...
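The Two-Model Comparison numbers above can be sanity-checked in a few lines. The page does not report the per-model variances, so this sketch instead back-computes the standard error of the difference that a two-sided Z-test would need in order to return p = 0.37, and then shows the one-sided p-value that the same difference would get in a baseline-vs-enhanced setting. The helper names are hypothetical; only the standard library's `statistics.NormalDist` (Python 3.8+) is used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def implied_standard_error(diff: float, two_sided_p: float) -> float:
    """SE of the difference that makes a two-sided Z-test return two_sided_p."""
    z = STD_NORMAL.inv_cdf(1.0 - two_sided_p / 2.0)
    return abs(diff) / z


def one_sided_p(diff: float, se: float) -> float:
    """p-value for H1: diff > 0 (e.g. an enhanced model beats its baseline)."""
    return 1.0 - STD_NORMAL.cdf(diff / se)


diff = 0.838 - 0.781  # 0.057, up to floating-point rounding
se = implied_standard_error(diff, 0.37)
print(f"difference={diff:.3f}, implied SE={se:.4f}")
print(f"one-sided p={one_sided_p(diff, se):.3f}")
```

This prints an implied standard error of about 0.0636 and a one-sided p-value of 0.185, half the two-sided value because the difference is positive. Both remain above a conventional 0.05 threshold.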
Counter-Example(s):
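- Single-Model F1 Test, which evaluates one model rather than comparing two.
- Paired t-Test Method, which tests the mean of paired per-item or per-fold score differences against a t-distribution instead of summing independent variances into a Z-score.
- DeLong Test, which compares correlated ROC AUC values rather than macro-F1 scores.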
See: Macro-F1 P-Value Calculation Method, P-Value Calculation Method, Model Comparison Method, Independent Groups Assumption in Variance Estimation Method, Two-Sided Alternative Hypothesis Test, Variance Summation Rule, Statistical Hypothesis Testing, Macro-F1 Measure from Group Counts Method, Difference Test, Z-Score for Performance Metric Test Method.

Retrieved from "http://www.gabormelli.com/RKB/index.php?title=Macro-F1_Difference_P-Value_Method&oldid=972927"