ML Evaluation Pitfall
An ML Evaluation Pitfall is an evaluation error caused by a methodological flaw that leads to misleading machine learning metrics and invalid model assessments.
- AKA: Machine Learning Evaluation Error, ML Assessment Mistake, Model Evaluation Flaw, ML Testing Pitfall, Evaluation Antipattern.
- Context:
- It can typically involve Data Leakage Issues through train-test contamination, temporal leakage, and feature leakage.
- It can typically manifest Selection Bias Problems via sampling bias, survivorship bias, and cherry-picking results.
- It can typically create Overfitting Indications through test set reuse, hyperparameter overfitting, and validation set contamination.
- It can typically cause Metric Misalignment via inappropriate metric choice, class imbalance neglect, and business objective mismatch.
- It can typically produce Statistical Invalidity through multiple testing problems, p-hacking practices, and significance misinterpretation.
- ...
- It can often result from Preprocessing Leakage via normalization leakage, imputation leakage, and encoding leakage (see the pipeline sketch following the Context list).
- It can often stem from Cross-Validation Errors through improper stratification, grouped data splitting, and nested cv mistakes.
- It can often arise from Distribution Shifts via covariate shift, concept drift, and domain mismatch.
- It can often emerge from Evaluation Protocol Flaws through inconsistent evaluation, unfair comparisons, and baseline omission.
- It can often occur from Human Evaluation Biases via confirmation bias, anchoring effects, and label noise.
- ...
- It can range from being a Subtle ML Evaluation Pitfall to being an Obvious ML Evaluation Pitfall, depending on its ml pitfall detectability.
- It can range from being a Common ML Evaluation Pitfall to being a Rare ML Evaluation Pitfall, depending on its ml pitfall frequency.
- It can range from being a Minor ML Evaluation Pitfall to being a Critical ML Evaluation Pitfall, depending on its ml pitfall impact severity.
- It can range from being a Data-Related ML Evaluation Pitfall to being a Method-Related ML Evaluation Pitfall, depending on its ml pitfall source.
- It can range from being a Preventable ML Evaluation Pitfall to being an Inherent ML Evaluation Pitfall, depending on its ml pitfall avoidability.
- ...
- It can be detected by Evaluation Audit Tools through ml sanity checks, ml diagnostic tests, and ml validation procedures (see the contamination-check sketch following the Context list).
- It can be prevented by Best Practice Frameworks via ml evaluation guidelines, ml testing protocols, and ml review checklists.
- It can be mitigated by Correction Techniques using ml resampling methods, ml debiasing approaches, and ml robust evaluation.
- It can be documented in Evaluation Reports through ml limitation sections, ml threat disclosures, and ml validity analysis.
- ...
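The preprocessing-leakage pattern noted above can be made concrete with a short sketch. The Python example below is a minimal illustration (assuming scikit-learn's StandardScaler, make_pipeline, and cross_val_score; the synthetic dataset and logistic regression model are arbitrary choices) that contrasts fitting a scaler on the full dataset before cross-validation with refitting it inside each training fold via a pipeline; only the second procedure keeps test-fold statistics out of the training transform.

```python
# Minimal sketch of normalization leakage vs. a leak-free pipeline.
# Dataset and model are illustrative; the point is where the scaler is fit.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Leaky: the scaler is fit on the full dataset, so statistics from the
# test folds influence how the training folds are transformed.
X_leaky = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

# Leak-free: the scaler is refit on each training fold inside a pipeline.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clean_scores = cross_val_score(pipeline, X, y, cv=5)

# On this synthetic data the numeric gap may be small; the contrast is
# in the procedure, not the particular scores.
print("leaky CV accuracy:    ", leaky_scores.mean())
print("leak-free CV accuracy:", clean_scores.mean())
```

The same pipeline pattern applies to imputation and encoding steps, which leak in exactly the same way when fit before the split.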
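As one example of the sanity checks an evaluation audit might run, the sketch below uses pandas to flag exact rows shared between the training and test splits, a direct sign of train-test contamination. The column names and toy data are hypothetical.

```python
# Minimal sketch of a train-test contamination check: report any rows
# that appear verbatim in both splits. Column names are hypothetical.
import pandas as pd

train_df = pd.DataFrame({"feature_a": [1, 2, 3], "feature_b": ["x", "y", "z"]})
test_df = pd.DataFrame({"feature_a": [3, 4], "feature_b": ["z", "w"]})

# An inner join on all shared columns returns exactly the duplicated rows.
overlap = train_df.merge(test_df, how="inner")
if not overlap.empty:
    print(f"Warning: {len(overlap)} row(s) appear in both train and test splits")
    print(overlap)
else:
    print("No exact row overlap between train and test splits")
```

Exact-match checks catch only the simplest contamination; near-duplicates and temporally shifted copies require fuzzier diagnostics.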
- Example(s):
- Data Leakage Pitfalls, such as:
- Machine Learning Data Leakage from training data appearing in the test set.
- Target Leakage from future information in feature engineering.
- Group Leakage from related samples across data splits (see the GroupKFold sketch following the Example(s) list).
- Overfitting Pitfalls, such as:
- Test Set Reuse from repeated evaluation against the same held-out data.
- Hyperparameter Overfitting from tuning model choices on test set performance.
- Validation Set Contamination from model selection decisions leaking into the validation split.
- Bias Pitfalls, such as:
- Sampling Bias from non-representative data collection.
- Survivorship Bias from missing failure cases in the dataset.
- Label Bias from systematic annotation errors.
- Metric Pitfalls, such as:
- Inappropriate Metric Choice from reporting accuracy on a heavily imbalanced dataset.
- Business Objective Mismatch from optimizing a metric unrelated to the deployment goal.
- Protocol Pitfalls, such as:
- Temporal Validation Error using future data for past prediction (see the TimeSeriesSplit sketch following the Example(s) list).
- Distribution Mismatch between training distribution and deployment distribution.
- Peeking Error from data exploration before the train-test split.
- ...
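The Group Leakage example above can be illustrated with a splitting sketch. The Python snippet below is a minimal illustration (assuming scikit-learn's KFold and GroupKFold; the data and group labels are synthetic) showing that a plain shuffled split can place samples from the same group on both sides of a fold, while GroupKFold keeps each group entirely in train or entirely in test.

```python
# Minimal sketch of group leakage: plain KFold vs. GroupKFold on data
# where several samples belong to the same group (e.g. the same patient).
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

X = np.arange(12).reshape(-1, 1)
groups = np.repeat([0, 1, 2, 3], 3)  # three related samples per group

# Plain shuffled KFold can split a group across train and test folds.
for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    shared = set(groups[train_idx]) & set(groups[test_idx])
    print("KFold shared groups:     ", shared)

# GroupKFold guarantees that no group appears on both sides of a split.
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, groups=groups):
    shared = set(groups[train_idx]) & set(groups[test_idx])
    print("GroupKFold shared groups:", shared)  # always empty
```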
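The Temporal Validation Error can be demonstrated the same way by comparing a shuffled split against a time-ordered one. The sketch below (assuming scikit-learn's TimeSeriesSplit; the timestamps are synthetic) checks whether any training sample falls after the test samples it is supposed to predict.

```python
# Minimal sketch of the temporal validation error: shuffled folds can train
# on the future, whereas TimeSeriesSplit always trains on the past.
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

timestamps = np.arange(10)  # observations already ordered in time
X = timestamps.reshape(-1, 1)

# Shuffled KFold: training indices may come after test indices in time.
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    if timestamps[train_idx].max() > timestamps[test_idx].min():
        print("KFold fold trains on data from the future of its test set")

# TimeSeriesSplit: every training index precedes every test index.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    assert timestamps[train_idx].max() < timestamps[test_idx].min()
print("TimeSeriesSplit preserves temporal order")
```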
- Counter-Example(s):
- Valid Evaluation Practice, which follows proper protocols and avoids methodological flaws.
- Robust Testing Method, which uses sound statistical practices and appropriate metrics.
- Best Practice Implementation, which applies proven techniques and standard procedures.
- Controlled Experiment, which maintains proper isolation and valid comparisons.
- See: Machine Learning Evaluation, Machine Learning Data Leakage, Overfitting, Selection Bias, Cross-Validation, Statistical Validity, Evaluation Metric, Test Set, Validation Set, ML Best Practice, LLM Evaluation Method.