Evidence Faithfulness Evaluation Task
An Evidence Faithfulness Evaluation Task is an explainability assessment task that measures how faithfully a model's decisions align with the evidence spans it identifies.
- AKA: Faithfulness Assessment Task, Evidence-Decision Alignment Task.
- Context:
- It can typically require Perturbation Tests that remove identified evidence.
- It can typically measure Decision Sensitivity to evidence modification.
- It can typically detect Post-hoc Rationalization versus true evidence reliance.
- It can typically assess Evidence Necessity for model predictions.
- It can typically identify Spurious Correlations in evidence selection.
- ...
- It can often employ Ablation Studies on evidence spans.
- It can often use Masking Experiments for evidence importance.
- It can often incorporate Adversarial Perturbations to test robustness.
- It can often require Multiple Test Configurations for comprehensive evaluation.
- ...
- It can range from being an Automatic Faithfulness Evaluation Task to being a Human Faithfulness Evaluation Task, depending on its evaluation approach.
- It can range from being a Binary Faithfulness Task to being a Graded Faithfulness Task, depending on its scoring method.
- ...
- It can evaluate Evidence-Based NLP Systems for their faithfulness property.
- It can be solved by an Evidence Faithfulness Evaluation System.
- It can produce Evidence Faithfulness Measures as output.
- It can support Interpretability Research in explainable AI.
- ...
- Example(s):
- Comprehensiveness Evaluation Tasks, such as:
- Sufficiency Evaluation Tasks, such as:
- Adversarial Faithfulness Tasks, such as:
- ...
- Counter-Example(s):
- Plausibility Evaluation Tasks, which measure human agreement, not model faithfulness.
- Accuracy Evaluation Tasks, which assess prediction correctness, not evidence alignment.
- Generation Quality Tasks, which evaluate output fluency, not evidence reliance.
- See: Explainability Evaluation Task, Interpretability Assessment Task, Model Faithfulness Task, Evidence Quality Evaluation.
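The perturbation-based checks listed under Context can be illustrated with a small sketch. It assumes one common formulation: comprehensiveness is the drop in the model's confidence when the evidence tokens are removed, and sufficiency is the drop when only the evidence tokens are kept. The classifier `toy_model`, its weights, the helper names, and the `0.1` threshold are all hypothetical and exist only to make the example runnable; a real evaluation would plug in the system under test.

```python
import math
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[str]], float]

# Hypothetical weights for a toy sentiment classifier (illustration only).
TOY_WEIGHTS = {"excellent": 2.0, "good": 1.0, "boring": -2.0}


def toy_model(tokens: Sequence[str]) -> float:
    """Return P(positive) as a sigmoid over the summed token weights."""
    score = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def remove_evidence(tokens: Sequence[str], evidence: Set[int]) -> List[str]:
    """Perturbation: drop the tokens at the evidence positions."""
    return [tok for i, tok in enumerate(tokens) if i not in evidence]


def keep_evidence(tokens: Sequence[str], evidence: Set[int]) -> List[str]:
    """Perturbation: keep only the tokens at the evidence positions."""
    return [tok for i, tok in enumerate(tokens) if i in evidence]


def comprehensiveness(model: Model, tokens: Sequence[str], evidence: Set[int]) -> float:
    """Confidence drop after removing the evidence (higher = evidence was needed)."""
    return model(tokens) - model(remove_evidence(tokens, evidence))


def sufficiency(model: Model, tokens: Sequence[str], evidence: Set[int]) -> float:
    """Confidence drop when only the evidence is kept (near zero = evidence is enough)."""
    return model(tokens) - model(keep_evidence(tokens, evidence))


def binary_faithfulness(model: Model, tokens: Sequence[str], evidence: Set[int],
                        threshold: float = 0.1) -> bool:
    """Binary variant: threshold the graded scores (the threshold is an assumption)."""
    return (comprehensiveness(model, tokens, evidence) >= threshold
            and sufficiency(model, tokens, evidence) <= threshold)


tokens = ["the", "movie", "was", "excellent"]
relied_on = {3}   # "excellent": the token the toy model actually uses
post_hoc = {1}    # "movie": a plausible-looking span the toy model ignores

print(comprehensiveness(toy_model, tokens, relied_on))  # about 0.381
print(sufficiency(toy_model, tokens, relied_on))        # 0.0
print(binary_faithfulness(toy_model, tokens, relied_on))
print(binary_faithfulness(toy_model, tokens, post_hoc))
```

In this toy setting, the span the model relies on yields high comprehensiveness and zero sufficiency gap, while the ignored span yields the opposite pattern, which is the signature of post-hoc rationalization. Returning the raw scores corresponds to a graded task, and thresholding them corresponds to a binary one.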