Span Extraction Performance Measure
A Span Extraction Performance Measure is an information extraction metric (a span evaluation metric) that quantifies how accurately a span-level evidence extraction system extracts exact text spans.
- AKA: Span-Level Evaluation Metric, Evidence Extraction Accuracy Measure.
- Context:
- It can typically measure Boundary Precision for span start and span end positions.
- It can typically calculate Token-Level Overlap between predicted spans and gold spans.
- It can typically assess Span Completeness via coverage metrics.
- It can typically evaluate Span Minimality through conciseness scores.
- It can typically consider Partial Match Credit for overlapping spans.
- ...
- It can often employ Exact Match Scores for strict evaluation.
- It can often utilize Token F1 Scores for partial credit.
- It can often incorporate Character-Level IOU for fine-grained evaluation.
- It can often include Span-Level Precision and span-level recall.
- ...
- It can range from being a Binary Span Measure to being a Graded Span Measure, depending on its scoring method.
- It can range from being a Single-Span Measure to being a Multi-Span Measure, depending on its evaluation scope.
- ...
- It can evaluate Span-Level Evidence Extraction Task performance.
- It can compare System Output Spans with gold standard spans.
- It can be normalized for cross-dataset comparison.
- It can guide Model Optimization in span extraction training.
- ...
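The exact-match and token F1 scores described above can be sketched as follows; this is a minimal illustration in the style of SQuAD-type evaluation, and the function names are illustrative rather than a standard API.

```python
def exact_match(pred: str, gold: str) -> float:
    """Binary span measure: 1.0 only on a perfect string match."""
    return float(pred == gold)

def token_f1(pred: str, gold: str) -> float:
    """Graded span measure: token-level F1 giving partial credit
    for overlap between the predicted span and the gold span."""
    pred_tokens = pred.split()
    gold_tokens = gold.split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Count overlapping tokens (multiset intersection).
    gold_counts = {}
    for t in gold_tokens:
        gold_counts[t] = gold_counts.get(t, 0) + 1
    common = 0
    for t in pred_tokens:
        if gold_counts.get(t, 0) > 0:
            common += 1
            gold_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "the cat sat" against the gold span "the cat" yields precision 2/3 and recall 1, hence F1 = 0.8, while the exact-match score is 0.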
- Example(s):
- SQuAD Exact Match, requiring a perfect string match of the extracted span.
- SQuAD F1 Score, computing token-level F1 between the prediction and the gold answer.
- Span-Level IOU, calculating the intersection over union of character positions.
- Partial Match Score, giving graduated credit for overlapping spans.
- Window-Based Accuracy, accepting spans within a k-token window of the gold span.
- ...
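Span-level IOU and window-based accuracy from the examples above can be sketched over offset intervals; the half-open `[start, end)` convention and the function names are assumptions for illustration, not a fixed specification.

```python
def char_iou(pred_start: int, pred_end: int,
             gold_start: int, gold_end: int) -> float:
    """Intersection over union of two character-offset
    intervals [start, end) -- fine-grained span overlap."""
    inter = max(0, min(pred_end, gold_end) - max(pred_start, gold_start))
    union = (pred_end - pred_start) + (gold_end - gold_start) - inter
    return inter / union if union > 0 else 0.0

def window_accuracy(pred_start: int, gold_start: int, k: int) -> float:
    """Accept a predicted span whose start position lies
    within a k-token window of the gold start position."""
    return float(abs(pred_start - gold_start) <= k)
```

For instance, intervals [0, 10) and [5, 15) intersect in 5 characters out of a union of 15, giving an IOU of 1/3.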
- Counter-Example(s):
- Document-Level Accuracy, which evaluates whole-document predictions rather than extracted spans.
- BLEU Score, which measures generation quality rather than extraction precision.
- Classification Accuracy, which scores class labels rather than span boundaries.
- See: Information Extraction Metric, Span Evaluation Metric, NLP Performance Measure, Evidence Quality Metric.