Evaluation Reliability Measure
An Evaluation Reliability Measure is a reliability measure that quantifies the consistency and reproducibility of evaluation judgments across evaluators or evaluation instances.
- AKA: Evaluation Reliability Metric, Reliability Measure, Consistency Measure, Reproducibility Measure, Agreement Measure.
- Context:
- It can typically measure Inter-Rater Consistency among multiple evaluators.
- It can typically account for Chance Agreement through statistical correction.
- It can often identify Unreliable Evaluators requiring additional training.
- It can often establish Reliability Thresholds for evaluation validity.
- It can assess Temporal Stability through test-retest reliability.
- It can evaluate Internal Consistency of evaluation instruments.
- It can support Evaluation Quality Control through reliability monitoring.
- It can guide Evaluator Training Programs based on reliability scores.
- It can range from being a Binary Reliability Measure to being a Multi-Class Reliability Measure, depending on its category structure.
- It can range from being a Pairwise Reliability Measure to being a Group Reliability Measure, depending on its evaluator configuration.
- It can range from being a Raw Agreement Measure to being an Adjusted Agreement Measure, depending on its correction method.
- It can range from being a Global Reliability Measure to being an Item-Level Reliability Measure, depending on its aggregation scope.
- ...
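The chance agreement correction noted in the context above is standardly performed with Cohen's Kappa, a pairwise adjusted agreement measure. The sketch below is a minimal pure-Python illustration (not drawn from this article); the rater labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters (Cohen's Kappa)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    # Kappa rescales observed agreement by how much of it chance explains.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary judgments from two evaluators over eight items.
rater_1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater_2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(round(cohens_kappa(rater_1, rater_2), 3))  # → 0.5
```

A kappa of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement; here raw agreement is 0.75 but half of it is explained by chance.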
- Examples:
- Agreement-Based Measures, such as: Cohen's Kappa Measure, Fleiss' Kappa Measure, and Krippendorff's Alpha Measure.
- Statistical Reliability Measures, such as: Intraclass Correlation Coefficient and Cronbach's Alpha Measure.
- Domain-Specific Reliability Measures, such as: Inter-Annotator Agreement Measures for corpus annotation tasks.
- ...
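The internal consistency of an evaluation instrument, mentioned under Context, is commonly quantified with Cronbach's Alpha, which compares the variance of individual items to the variance of the summed scale. A minimal sketch, using hypothetical item scores:

```python
def cronbach_alpha(item_scores):
    """Internal consistency of a multi-item instrument (Cronbach's Alpha).

    item_scores: one list per item, each holding that item's score
    for every respondent, in the same respondent order.
    """
    k = len(item_scores)           # number of items
    n = len(item_scores[0])        # number of respondents

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

    # Sum of per-item variances versus variance of the total scale score.
    item_var_sum = sum(variance(item) for item in item_scores)
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical instrument: 3 items rated by 4 respondents on a 1-5 scale.
survey_items = [
    [3, 4, 3, 3],
    [3, 4, 4, 3],
    [3, 5, 4, 2],
]
print(round(cronbach_alpha(survey_items), 3))  # → 0.814
```

Alpha near 1 means the items vary together (a consistent instrument); a common rule of thumb treats values above 0.7 as acceptable, though thresholds are context-dependent.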
- Counter-Examples:
- Validity Measure, which measures accuracy, not consistency.
- Performance Measure, which measures quality, not agreement.
- Efficiency Measure, which measures speed, not reliability.
- See: Reliability Measure, Inter-Expert Agreement Measure, Inter-Rater Reliability, Agreement Coefficient, Evaluation Quality, Statistical Agreement, Consistency Measure.