Multi-Hop Retrieval Performance Measure
A Multi-Hop Retrieval Performance Measure is a retrieval evaluation metric and complex task metric that assesses the quality of the evidence chains retrieved by multi-hop evidence retrieval systems.
- AKA: Chain Retrieval Metric, Multi-Step Evidence Score.
- Context:
- It can typically evaluate Complete Chain Accuracy for full reasoning paths.
- It can typically measure Hop-Level Precision at each retrieval step.
- It can typically assess Bridge Entity Coverage in evidence connections.
- It can typically quantify Reasoning Path Diversity across alternative chains.
- It can typically penalize Redundant Retrievals in evidence sequences.
- ...
- It can often incorporate Partial Credit Schemes for incomplete chains.
- It can often use Weighted Scoring based on hop importance.
- It can often measure Retrieval Efficiency via hop count.
- It can often evaluate Evidence Ordering Quality in chain construction.
- ...
- It can range from being a Binary Chain Measure to being a Graded Chain Measure, depending on its scoring method.
- It can range from being an Exact Match Measure to being a Relaxed Match Measure, depending on its evaluation strictness.
- ...
- It can evaluate Multi-Hop Evidence Retrieval Task performance.
- It can compare System Reasoning Paths with gold evidence chains.
- It can diagnose Retrieval Failure Points in the reasoning process.
- It can guide System Optimization for complex retrieval.
- ...
- Example(s):
- HotpotQA Supporting Fact EM Score, requiring an exact match of all supporting facts.
- Supporting Fact F1, measuring overlap with gold evidence sets.
- Chain Completion Rate, calculating the percentage of successfully completed chains.
- Per-Hop Mean Reciprocal Rank, computed at each hop in the retrieval sequence.
- Path Accuracy Score, evaluating correctness of reasoning trajectory.
- ...
- Counter-Example(s):
- Single Document Recall, which ignores multi-hop requirements.
- Answer Accuracy, which evaluates the final answer rather than the evidence chain.
- Retrieval Speed Metric, which measures retrieval latency rather than retrieval quality.
- See: Information Retrieval Metric, Complex Task Evaluation, Evidence Quality Measure, Reasoning Performance Metric.
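
Several of the example measures listed above can be illustrated with a minimal Python sketch. This is not a reference implementation of any benchmark's official scorer. The function names, the representation of evidence items as hashable identifiers, the one-gold-item-per-hop simplification, and the rule that a chain counts as "complete" when every gold item was retrieved are all assumptions made for illustration.

```python
from typing import Hashable, Iterable, List, Sequence


def supporting_fact_em(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> float:
    """Exact match: 1.0 only if the predicted evidence set equals the gold set."""
    return float(set(predicted) == set(gold))


def supporting_fact_f1(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> float:
    """Set-overlap F1 between predicted and gold evidence items."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # assumption: two empty sets count as a perfect match
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def hop_reciprocal_ranks(
    ranked_per_hop: Sequence[Sequence[Hashable]],
    gold_per_hop: Sequence[Hashable],
) -> List[float]:
    """Reciprocal rank of the gold item at each hop (0.0 if it was not retrieved).

    Assumes exactly one gold item per hop, which is a simplification.
    """
    scores = []
    for ranked, gold in zip(ranked_per_hop, gold_per_hop):
        rank = next((i for i, item in enumerate(ranked, start=1) if item == gold), None)
        scores.append(1.0 / rank if rank is not None else 0.0)
    return scores


def chain_completion_rate(
    predicted_chains: Sequence[Iterable[Hashable]],
    gold_chains: Sequence[Iterable[Hashable]],
) -> float:
    """Fraction of queries whose retrieved evidence covers the whole gold chain.

    Assumption: a chain is 'complete' when every gold item appears among the
    predicted items, regardless of order or extra retrievals.
    """
    pairs = list(zip(predicted_chains, gold_chains))
    if not pairs:
        return 0.0
    complete = sum(set(gold) <= set(pred) for pred, gold in pairs)
    return complete / len(pairs)


if __name__ == "__main__":
    gold = [("Doc A", 0), ("Doc B", 2)]
    pred = [("Doc A", 0), ("Doc C", 1)]
    print(supporting_fact_em(pred, gold))
    print(supporting_fact_f1(pred, gold))
    print(hop_reciprocal_ranks([["d3", "d1", "d2"], ["d5", "d4"]], ["d1", "d4"]))
    print(chain_completion_rate([["a", "b"], ["a", "c"]], [["a", "b"], ["a", "b"]]))
```

Averaging the per-hop reciprocal ranks over a query set would give a mean reciprocal rank for each hop. A graded or relaxed variant could replace the strict set comparisons with partial-credit or fuzzy-match rules.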