NLG Gold Reference Dataset
An NLG Gold Reference Dataset is an evaluation reference dataset that contains expert-curated references serving as quality benchmarks for NLG system evaluation.
- AKA: Gold Standard NLG Dataset, Reference Text Collection, Expert-Curated NLG Benchmark, Ground Truth NLG Dataset.
- Context:
- It can typically include Multiple Reference Variants per input instance.
- It can typically undergo Expert Validation Processes for quality assurance.
- It can often support Pyramid Method Evaluations through content unit annotation.
- It can often enable Reference-Based Metrics like ROUGE Score and BLEU Score.
- It can incorporate Domain-Specific Annotations for specialized evaluation.
- It can maintain Annotation Guidelines that enforce consistent annotation standards.
- It can provide Coverage Statistics across Linguistic Phenomena.
- It can facilitate System Comparisons through standardized benchmarks.
- It can range from being a Small NLG Gold Reference Dataset to being a Large NLG Gold Reference Dataset, depending on its dataset size.
- It can range from being a Single-Domain NLG Gold Reference Dataset to being a Multi-Domain NLG Gold Reference Dataset, depending on its domain coverage.
- It can range from being a Monolingual NLG Gold Reference Dataset to being a Multilingual NLG Gold Reference Dataset, depending on its language scope.
- It can range from being a Static NLG Gold Reference Dataset to being a Dynamic NLG Gold Reference Dataset, depending on its update frequency.
- ...
- Examples:
- Summarization Gold Reference Datasets, such as:
- Translation Gold Reference Datasets, such as:
- Dialogue Gold Reference Datasets, such as:
- ...
- Counter-Examples:
- Synthetic Reference Dataset, which relies on automated reference generation rather than expert authoring.
- Crowd-Sourced Dataset, which lacks expert curation.
- Raw Text Corpus, which lacks reference annotation.
- See: Evaluation Reference Dataset, Pyramid Method, Content Unit Annotation, Expert Adjudication Process, NLG Evaluation Framework, Reference-Based Evaluation Metric, Evaluation Protocol, Inter-Expert Agreement Metric, Evaluation Aspect.
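Two Context items interact: storing Multiple Reference Variants per input instance, and enabling Reference-Based Metrics such as ROUGE Score. The sketch below shows one common way they fit together. A candidate is scored against each expert reference, and the best match is kept. It makes several assumptions. The `GoldInstance` record is a hypothetical schema, not a standard format. Tokenization is plain lowercase whitespace splitting. The metric is a simplified unigram-overlap (ROUGE-1-style) F1, not the official ROUGE toolkit, which adds stemming and other normalization.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GoldInstance:
    """One input instance with several expert-written reference variants (hypothetical schema)."""
    source: str
    references: list[str] = field(default_factory=list)


def _unigrams(text: str) -> Counter:
    # Simplified tokenization: lowercase and split on whitespace only.
    return Counter(text.lower().split())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a single reference."""
    cand, ref = _unigrams(candidate), _unigrams(reference)
    overlap = sum((cand & ref).values())  # counts clipped to the smaller side
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def multi_ref_rouge1(candidate: str, instance: GoldInstance) -> float:
    """Score against every reference variant and keep the best match.

    Raises ValueError if the instance has no references.
    """
    return max(rouge1_f1(candidate, ref) for ref in instance.references)


if __name__ == "__main__":
    inst = GoldInstance(
        source="(input document)",
        references=["the cat sat on the mat", "a cat was sitting on the mat"],
    )
    print(multi_ref_rouge1("the cat sat on a mat", inst))  # 5/6, about 0.833
```

The max over references is one design choice among several. Some evaluations average the per-reference scores instead, or pool the reference n-grams (as BLEU Score does with clipped counts). In every case, extra reference variants reduce the penalty for valid paraphrases.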
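The Context item on Pyramid Method Evaluations depends on content unit annotation. Experts mark which summary content units (SCUs) each reference expresses. An SCU's weight is the number of references that express it. The minimal sketch below computes a simplified pyramid score: the summed weight of the SCUs found in a system output, divided by the highest weight achievable with the same number of SCUs. The SCU identifiers and the set-based annotation format are hypothetical. The sketch also assumes that the expert annotation step (matching output content to SCUs) has already been done.

```python
from collections import Counter


def scu_weights(reference_annotations: list[set[str]]) -> Counter:
    """Weight of each SCU = number of reference summaries annotated as expressing it."""
    weights: Counter = Counter()
    for scus in reference_annotations:
        weights.update(scus)  # each SCU counted at most once per reference
    return weights


def pyramid_score(peer_scus: set[str], weights: Counter) -> float:
    """Observed SCU weight divided by the best weight for the same number of SCUs."""
    matched = {scu for scu in peer_scus if scu in weights}  # ignore SCUs absent from the pyramid
    observed = sum(weights[scu] for scu in matched)
    # Ideal summary with len(matched) SCUs takes the heaviest ones available.
    best = sum(sorted(weights.values(), reverse=True)[: len(matched)])
    return observed / best if best else 0.0


if __name__ == "__main__":
    # Hypothetical annotations for three expert reference summaries.
    w = scu_weights([{"A", "B", "C"}, {"A", "B"}, {"A", "D"}])
    print(pyramid_score({"A", "C"}, w))  # (3 + 1) / (3 + 2) = 0.8
```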
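Expert Validation Processes are often checked with an Inter-Expert Agreement Metric. The sketch below uses Cohen's kappa, which corrects raw agreement for chance, for two experts who independently label candidate references. It is only one possible choice; other agreement statistics may suit a given dataset better. The two-label scheme (`"accept"`/`"reject"`) and the function name are hypothetical illustrations, not part of any specific dataset's protocol.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both experts labeled at random with their own label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both experts used one identical label for every item
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    expert_1 = ["accept", "accept", "reject", "accept"]
    expert_2 = ["accept", "reject", "reject", "accept"]
    print(cohens_kappa(expert_1, expert_2))  # observed 0.75, expected 0.5, kappa 0.5
```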