Human Baseline Evaluation Measure
A Human Baseline Evaluation Measure is an evaluation measure that quantifies system performance relative to human performance, establishing human-anchored benchmarks for quality assessment.
- AKA: Human-Relative Measure, Human-Anchored Measure, Human Performance Comparison Measure, Human-Referenced Evaluation Measure.
- Context:
- It can typically normalize System Scores against human baselines.
- It can typically establish Performance Ceilings based on human capability.
- It can often enable Cross-Task Comparisons through human normalization.
- It can often support Deployment Decisions requiring human-level performance.
- It can incorporate Expert Human Performance or Average Human Performance.
- It can account for Human Performance Variance across evaluators.
- It can facilitate Interpretable Scores for stakeholder communication.
- It can validate Superhuman Claims when a system exceeds human performance.
- It can range from being a Single-Human Baseline Measure to being a Multi-Human Baseline Measure, depending on its reference count.
- It can range from being an Expert-Baseline Measure to being a Crowd-Baseline Measure, depending on its human quality.
- It can range from being an Absolute Human Baseline Measure to being a Relative Human Baseline Measure, depending on its comparison type.
- It can range from being a Task-Specific Human Baseline Measure to being a General Human Baseline Measure, depending on its application scope.
- ...
- Examples:
- Direct Comparison Measures, such as:
- Normalized Score Measures, such as:
- Task-Specific Human Baselines, such as:
- ...
- Counter-Examples:
- Absolute Performance Measure, which lacks a human reference.
- Gold Standard Measure, which uses curated references instead of human performance.
- Inter-System Measure, which compares one system against another.
- See: Evaluation Measure, Human Parity Measure, Human Evaluation Method, Performance Ceiling, Baseline Comparison, Normalization Method, Human Performance Measurement.
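
The normalization and variance points under Context can be illustrated with a minimal sketch. It assumes one common convention, in which the system score is rescaled so that a floor score (for example, chance-level performance) maps to 0 and the human baseline maps to 1. The function names, the floor parameter, and all numbers below are hypothetical and chosen only for illustration; they are not taken from any specific benchmark.

```python
from statistics import mean, stdev


def multi_human_baseline(human_scores):
    """Summarize several human evaluators as (mean, sample standard deviation)."""
    if len(human_scores) < 2:
        raise ValueError("need at least two human scores to estimate variance")
    return mean(human_scores), stdev(human_scores)


def human_normalized_score(system_score, human_score, floor_score=0.0):
    """Rescale a system score so floor_score -> 0.0 and human_score -> 1.0.

    Assumption: higher scores are better and the floor is a chance-level
    or trivial-baseline score on the same scale.
    """
    span = human_score - floor_score
    if span == 0:
        raise ValueError("human baseline must differ from the floor score")
    return (system_score - floor_score) / span


# Hypothetical scores from three human evaluators on the same task.
human_scores = [80.0, 90.0, 100.0]
baseline, spread = multi_human_baseline(human_scores)

# Hypothetical system score and chance-level floor.
normalized = human_normalized_score(70.0, baseline, floor_score=50.0)

# A value above 1.0 would indicate the system exceeds the human baseline.
exceeds_human = normalized > 1.0

print(f"human baseline = {baseline:.1f} (sd {spread:.1f})")
print(f"human-normalized score = {normalized:.2f}, exceeds human: {exceeds_human}")
```

Reporting the spread alongside the baseline matters because a system score that falls within one standard deviation of the human mean may not support a claim of either sub-human or superhuman performance.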