Fact Verification Performance Measure
A Fact Verification Performance Measure is an evidence-based verification evaluation metric that assesses both verdict accuracy and evidence quality in span-based fact verification systems.
- AKA: Fact Checking Score, Evidence-Based Verification Metric.
- Context:
- It can typically require both a Correct Verdict and valid evidence spans.
- It can typically penalize Lucky Guesses without proper evidence.
- It can typically reward Complete Evidence Sets that support the verdict.
- It can typically measure Evidence Precision and evidence recall.
- It can typically assess Verdict Confidence based on evidence strength.
- ...
- It can often incorporate Weighted Scoring for evidence importance.
- It can often employ Hierarchical Evaluation at document, sentence, and span levels.
- It can often include Partial Credit for incomplete evidence.
- It can often measure System Calibration between confidence and accuracy.
- ...
- It can range from being a Strict Joint Measure to being a Relaxed Joint Measure, depending on its evaluation criteria.
- It can range from being a Binary Verification Measure to being a Multi-Class Verification Measure, depending on its verdict types.
- ...
- It can evaluate Span-Based Fact Verification Task performance.
- It can compare System-Predicted Evidence with annotated evidence sets.
- It can distinguish Retrieval Failures from classification errors.
- It can guide System Improvement for fact checking applications.
- ...
- Example(s):
- FEVER Score, requiring a correct label and at least one complete evidence set among the predicted evidence.
- Verdict Accuracy, measuring only classification correctness.
- Evidence F1 Score, evaluating evidence extraction quality.
- Potts Score, combining verdict and evidence ranking.
- Calibrated Verification Score, including confidence alignment.
- ...
- Counter-Example(s):
- Simple Accuracy, which ignores the evidence requirement.
- Retrieval Recall, which measures document retrieval, not verification.
- Generation BLEU, which evaluates text quality, not factuality.
- See: Fact Checking Metric, Evidence-Based Evaluation, Joint Performance Measure, Verification Assessment.
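
The strict and relaxed joint measures and the evidence precision and recall described in the Context items can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the official scorer of any benchmark: the names `Example`, `strict_joint_correct`, `relaxed_joint_score`, and `evidence_prf` are hypothetical, evidence spans are simplified to (document id, sentence index) pairs, and evidence precision and recall are computed against the union of all annotated evidence sets, which is only one of several possible conventions.

```python
from dataclasses import dataclass

# Hypothetical span identifier: (document id, sentence index).
Span = tuple[str, int]


@dataclass
class Example:
    gold_label: str
    pred_label: str
    gold_evidence_sets: list[set[Span]]  # alternative complete evidence sets
    pred_evidence: set[Span]


def strict_joint_correct(ex: Example) -> bool:
    """Strict joint measure: correct verdict AND one complete gold set retrieved."""
    if ex.pred_label != ex.gold_label:
        return False
    if not ex.gold_evidence_sets:  # claims annotated with no evidence
        return True
    return any(gold <= ex.pred_evidence for gold in ex.gold_evidence_sets)


def relaxed_joint_score(ex: Example) -> float:
    """Relaxed joint measure (assumed form): partial credit for incomplete evidence."""
    if ex.pred_label != ex.gold_label:
        return 0.0
    coverages = (
        len(gold & ex.pred_evidence) / len(gold)
        for gold in ex.gold_evidence_sets
        if gold
    )
    return max(coverages, default=1.0)


def evidence_prf(ex: Example) -> tuple[float, float, float]:
    """Evidence precision, recall, and F1 against the union of gold sets."""
    gold = set().union(*ex.gold_evidence_sets)
    tp = len(gold & ex.pred_evidence)
    precision = tp / len(ex.pred_evidence) if ex.pred_evidence else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def strict_joint_score(examples: list[Example]) -> float:
    """Fraction of examples that pass the strict joint check."""
    return sum(strict_joint_correct(ex) for ex in examples) / len(examples)


gold_sets = [{("D1", 0), ("D1", 2)}, {("D2", 5)}]
examples = [
    # Correct verdict with a complete evidence set.
    Example("SUPPORTS", "SUPPORTS", gold_sets, {("D2", 5), ("D3", 1)}),
    # Lucky guess: correct verdict, no valid evidence.
    Example("SUPPORTS", "SUPPORTS", gold_sets, {("D3", 1)}),
    # Complete evidence, wrong verdict.
    Example("SUPPORTS", "REFUTES", gold_sets, {("D2", 5)}),
    # Correct verdict, half of one evidence set.
    Example("SUPPORTS", "SUPPORTS", gold_sets, {("D1", 0)}),
]

print(strict_joint_score(examples))
print([relaxed_joint_score(ex) for ex in examples])
print(evidence_prf(examples[0]))
```

In this sketch the lucky guess and the wrong verdict both score zero under either measure, while the partially supported verdict fails the strict check but earns partial credit under the relaxed one. Comparing the strict score with plain verdict accuracy on the same examples gives a rough way to separate retrieval failures from classification errors.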