Evidence Alignment Metric
An Evidence Alignment Metric is an explainability evaluation metric that is a span comparison metric measuring the alignment quality between human-annotated evidence spans and system-identified evidence spans in explainable NLU tasks.
- AKA: Span Agreement Metric, Evidence Correspondence Score.
- Context:
- It can typically calculate Token-Level Overlap between annotation sets (see the token-level sketch after this list).
- It can typically measure Span Boundary Agreement for start and end positions.
- It can typically assess Coverage Ratio of human-identified evidence.
- It can typically quantify Precision Trade-offs in evidence selection.
- It can typically support Fine-Grained Analysis of alignment patterns.
- ...
- It can often incorporate Partial Credit Functions for near-matches (see the partial-credit sketch after this list).
- It can often employ Weighted Scoring based on evidence importance.
- It can often use Set-Based Metrics for multiple evidence spans.
- It can often normalize by document length or span count.
- ...
- It can range from being a Binary Alignment Metric to being a Continuous Alignment Metric, depending on its scoring function (both variants are illustrated after this list).
- It can range from being a Position-Based Metric to being a Content-Based Metric, depending on its comparison method.
- ...
- It can evaluate Evidence-Based NLP System interpretability.
- It can compare Model Evidence Selections with human judgments.
- It can diagnose Evidence Selection Biases in a system.
- It can guide Model Improvement for better alignment.
- ...
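The token-level overlap, coverage, and precision items above can be made concrete with a minimal sketch. It assumes evidence spans are represented as (start, end) token offsets with exclusive ends; the function name token_overlap_scores and the data layout are illustrative assumptions, not a standard API.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def _to_token_set(spans: List[Span]) -> set:
    """Expand (start, end) spans into the set of token positions they cover."""
    tokens = set()
    for start, end in spans:
        tokens.update(range(start, end))
    return tokens


def token_overlap_scores(human_spans: List[Span], system_spans: List[Span]) -> dict:
    """Token-level precision, recall (coverage of human evidence), and F1."""
    human_tokens = _to_token_set(human_spans)
    system_tokens = _to_token_set(system_spans)
    overlap = human_tokens & system_tokens

    precision = len(overlap) / len(system_tokens) if system_tokens else 0.0
    recall = len(overlap) / len(human_tokens) if human_tokens else 0.0  # coverage ratio
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Example: human evidence covers tokens 3-7; the system selects tokens 5-10.
print(token_overlap_scores([(3, 8)], [(5, 11)]))
# -> precision 0.5, recall 0.6, F1 ~0.545
```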
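For the partial-credit and set-based items, one common choice is an Intersection-over-Union (IOU) match between spans with a threshold. The sketch below is a hedged illustration under the same (start, end) token-offset assumption; span_iou and iou_f1 are hypothetical names, and the 0.5 threshold is only a conventional default, not a fixed standard.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def span_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two token spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def iou_f1(human_spans: List[Span], system_spans: List[Span], threshold: float = 0.5) -> float:
    """Set-based span F1 with partial credit: a predicted span counts as a hit
    when it overlaps some human span with IOU >= threshold."""
    if not human_spans or not system_spans:
        return 0.0
    hits = [s for s in system_spans
            if any(span_iou(s, h) >= threshold for h in human_spans)]
    covered = [h for h in human_spans
               if any(span_iou(h, s) >= threshold for s in system_spans)]
    precision = len(hits) / len(system_spans)
    recall = len(covered) / len(human_spans)
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
```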
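The binary versus continuous distinction can likewise be expressed as two scoring functions over a single span pair: an exact-boundary indicator and a graded overlap score. This is a small sketch under the same token-offset assumption; both function names are illustrative.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def binary_alignment(human: Span, system: Span) -> float:
    """Binary variant: full credit only for an exact boundary match."""
    return 1.0 if human == system else 0.0


def continuous_alignment(human: Span, system: Span) -> float:
    """Continuous variant: graded credit via intersection-over-union of tokens."""
    inter = max(0, min(human[1], system[1]) - max(human[0], system[0]))
    union = (human[1] - human[0]) + (system[1] - system[0]) - inter
    return inter / union if union else 0.0
```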
- Example(s):
- a Token-Level F1 Score computed between system-selected rationale tokens and human-annotated rationale tokens.
- an Intersection-over-Union (IOU) F1 Score that counts a predicted span as correct when it overlaps a human span above a threshold, as used in rationale evaluation benchmarks such as ERASER.
- ...
- Counter-Example(s):
- Classification Accuracy, which ignores evidence location.
- Perplexity Score, which measures language model quality rather than evidence alignment.
- Latency Metric, which evaluates system speed rather than explanation quality.
- See: Interpretability Metric, Human-AI Agreement Measure, Span Evaluation Metric, Explainability Score.