ML Evaluation Pitfall
An ML Evaluation Pitfall is an evaluation error caused by a methodological flaw that leads to misleading machine learning metrics and invalid model assessments.
- AKA: Machine Learning Evaluation Error, ML Assessment Mistake, Model Evaluation Flaw, ML Testing Pitfall, Evaluation Antipattern.
- Context:
- It can typically involve Data Leakage Issues through train-test contamination, temporal leakage, and feature leakage.
- It can typically manifest Selection Bias Problems via sampling bias, survivorship bias, and cherry-picking results.
- It can typically create Overfitting Indications through test set reuse, hyperparameter overfitting, and validation set contamination.
- It can typically cause Metric Misalignment via inappropriate metric choice, class imbalance neglect, and business objective mismatch.
- It can typically produce Statistical Invalidity through multiple testing problems, p-hacking practices, and significance misinterpretation.
- ...
- It can often result from Preprocessing Leakage via normalization leakage, imputation leakage, and encoding leakage (see the pipeline sketch following the Context list).
- It can often stem from Cross-Validation Errors through improper stratification, grouped data splitting, and nested cv mistakes.
- It can often arise from Distribution Shifts via covariate shift, concept drift, and domain mismatch.
- It can often emerge from Evaluation Protocol Flaws through inconsistent evaluation, unfair comparisons, and baseline omission.
- It can often occur from Human Evaluation Biases via confirmation bias, anchoring effects, and label noise.
- ...
- It can range from being a Subtle ML Evaluation Pitfall to being an Obvious ML Evaluation Pitfall, depending on its ml pitfall detectability.
- It can range from being a Common ML Evaluation Pitfall to being a Rare ML Evaluation Pitfall, depending on its ml pitfall frequency.
- It can range from being a Minor ML Evaluation Pitfall to being a Critical ML Evaluation Pitfall, depending on its ml pitfall impact severity.
- It can range from being a Data-Related ML Evaluation Pitfall to being a Method-Related ML Evaluation Pitfall, depending on its ml pitfall source.
- It can range from being a Preventable ML Evaluation Pitfall to being an Inherent ML Evaluation Pitfall, depending on its ml pitfall avoidability.
- ...
- It can be detected by Evaluation Audit Tools through ml sanity checks, ml diagnostic tests, and ml validation procedures (see the contamination-check sketch following the Context list).
- It can be prevented by Best Practice Frameworks via ml evaluation guidelines, ml testing protocols, and ml review checklists.
- It can be mitigated by Correction Techniques using ml resampling methods, ml debiasing approaches, and ml robust evaluation.
- It can be documented in Evaluation Reports through ml limitation sections, ml threat disclosures, and ml validity analysis.
- ...
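The preprocessing-leakage pattern noted above can be made concrete with a short sketch. The Python example below is a minimal illustration (assuming scikit-learn's StandardScaler, make_pipeline, and cross_val_score; the synthetic dataset and logistic regression model are arbitrary choices) that contrasts fitting a scaler on the full dataset before cross-validation with refitting it inside each training fold via a pipeline; only the second procedure keeps test-fold statistics out of the training transform.

```python
# Minimal sketch of normalization leakage vs. a leak-free pipeline.
# Dataset and model are illustrative; the point is where the scaler is fit.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Leaky: the scaler is fit on the full dataset, so statistics from the
# test folds influence how the training folds are transformed.
X_leaky = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

# Leak-free: the scaler is refit on each training fold inside a pipeline.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clean_scores = cross_val_score(pipeline, X, y, cv=5)

# On this synthetic data the numeric gap may be small; the contrast is
# in the procedure, not the particular scores.
print("leaky CV accuracy:    ", leaky_scores.mean())
print("leak-free CV accuracy:", clean_scores.mean())
```

The same pipeline pattern applies to imputation and encoding steps, which leak in exactly the same way when fit before the split.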
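As one example of the sanity checks an evaluation audit might run, the sketch below uses pandas to flag exact rows shared between the training and test splits, a direct sign of train-test contamination. The column names and toy data are hypothetical.

```python
# Minimal sketch of a train-test contamination check: report any rows
# that appear verbatim in both splits. Column names are hypothetical.
import pandas as pd

train_df = pd.DataFrame({"feature_a": [1, 2, 3], "feature_b": ["x", "y", "z"]})
test_df = pd.DataFrame({"feature_a": [3, 4], "feature_b": ["z", "w"]})

# An inner join on all shared columns returns exactly the duplicated rows.
overlap = train_df.merge(test_df, how="inner")
if not overlap.empty:
    print(f"Warning: {len(overlap)} row(s) appear in both train and test splits")
    print(overlap)
else:
    print("No exact row overlap between train and test splits")
```

Exact-match checks catch only the simplest contamination; near-duplicates and temporally shifted copies require fuzzier diagnostics.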
- Example(s):
- Data Leakage Pitfalls, such as:
- Machine Learning Data Leakage from training data appearing in the test set.
- Target Leakage from future information in feature engineering.
- Group Leakage from related samples across data splits (see the GroupKFold sketch following the Example(s) list).
- Overfitting Pitfalls, such as:
- Test Set Reuse from repeated evaluation against the same held-out data.
- Hyperparameter Overfitting from tuning model choices on test set performance.
- Validation Set Contamination from model selection decisions leaking into the validation split.
- Bias Pitfalls, such as:
- Sampling Bias from non-representative data collection.
- Survivorship Bias from missing failure cases in the dataset.
- Label Bias from systematic annotation errors.
- Metric Pitfalls, such as:
- Inappropriate Metric Choice from reporting accuracy on a heavily imbalanced dataset.
- Business Objective Mismatch from optimizing a metric unrelated to the deployment goal.
- Protocol Pitfalls, such as:
- Temporal Validation Error using future data for past prediction (see the TimeSeriesSplit sketch following the Example(s) list).
- Distribution Mismatch between training distribution and deployment distribution.
- Peeking Error from data exploration before the train-test split.
- ...
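The Group Leakage example above can be illustrated with a splitting sketch. The Python snippet below is a minimal illustration (assuming scikit-learn's KFold and GroupKFold; the data and group labels are synthetic) showing that a plain shuffled split can place samples from the same group on both sides of a fold, while GroupKFold keeps each group entirely in train or entirely in test.

```python
# Minimal sketch of group leakage: plain KFold vs. GroupKFold on data
# where several samples belong to the same group (e.g. the same patient).
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

X = np.arange(12).reshape(-1, 1)
groups = np.repeat([0, 1, 2, 3], 3)  # three related samples per group

# Plain shuffled KFold can split a group across train and test folds.
for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    shared = set(groups[train_idx]) & set(groups[test_idx])
    print("KFold shared groups:     ", shared)

# GroupKFold guarantees that no group appears on both sides of a split.
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, groups=groups):
    shared = set(groups[train_idx]) & set(groups[test_idx])
    print("GroupKFold shared groups:", shared)  # always empty
```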
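The Temporal Validation Error can be demonstrated the same way by comparing a shuffled split against a time-ordered one. The sketch below (assuming scikit-learn's TimeSeriesSplit; the timestamps are synthetic) checks whether any training sample falls after the test samples it is supposed to predict.

```python
# Minimal sketch of the temporal validation error: shuffled folds can train
# on the future, whereas TimeSeriesSplit always trains on the past.
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

timestamps = np.arange(10)  # observations already ordered in time
X = timestamps.reshape(-1, 1)

# Shuffled KFold: training indices may come after test indices in time.
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    if timestamps[train_idx].max() > timestamps[test_idx].min():
        print("KFold fold trains on data from the future of its test set")

# TimeSeriesSplit: every training index precedes every test index.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    assert timestamps[train_idx].max() < timestamps[test_idx].min()
print("TimeSeriesSplit preserves temporal order")
```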
- Counter-Example(s):
- Valid Evaluation Practice, which follows proper protocols and avoids methodological flaws.
- Robust Testing Method, which uses sound statistical practices and appropriate metrics.
- Best Practice Implementation, which applies proven techniques and standard procedures.
- Controlled Experiment, which maintains proper isolation and valid comparisons.
- See: Machine Learning Evaluation, Machine Learning Data Leakage, Overfitting, Selection Bias, Cross-Validation, Statistical Validity, Evaluation Metric, Test Set, Validation Set, ML Best Practice, LLM Evaluation Method.