NLG Gold Reference Dataset

From GM-RKB


An NLG Gold Reference Dataset is an evaluation reference dataset that contains expert-curated references serving as quality benchmarks for NLG system evaluation.

AKA: Gold Standard NLG Dataset, Reference Text Collection, Expert-Curated NLG Benchmark, Ground Truth NLG Dataset.
Context:
- It can typically include Multiple Reference Variants per input instance.
- It can typically undergo Expert Validation Processes for quality assurance.
- It can often support Pyramid Method Evaluations through content unit annotation.
- It can often enable Reference-Based Metrics like ROUGE Score and BLEU Score.
- It can incorporate Domain-Specific Annotations for specialized evaluation.
- It can maintain Annotation Guidelines that enforce consistency standards.
- It can provide Coverage Statistics across linguistic phenomena.
- It can facilitate System Comparisons through standardized benchmarks.
- It can range from being a Small NLG Gold Reference Dataset to being a Large NLG Gold Reference Dataset, depending on its dataset size.
- It can range from being a Single-Domain NLG Gold Reference Dataset to being a Multi-Domain NLG Gold Reference Dataset, depending on its domain coverage.
- It can range from being a Monolingual NLG Gold Reference Dataset to being a Multilingual NLG Gold Reference Dataset, depending on its language scope.
- It can range from being a Static NLG Gold Reference Dataset to being a Dynamic NLG Gold Reference Dataset, depending on its update frequency.
- ...
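
A minimal sketch of how Multiple Reference Variants and a Reference-Based Metric can fit together, assuming a hypothetical `GoldInstance` schema and a simplified unigram-recall score in the spirit of ROUGE-1 (it is not the official ROUGE or BLEU implementation). All names and the sample texts are illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GoldInstance:
    """One input paired with several expert-written references (hypothetical schema)."""
    input_text: str
    references: list[str] = field(default_factory=list)


def unigram_recall(candidate: str, reference: str) -> float:
    """Fraction of reference tokens (with multiplicity) that also occur in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Clip each token's credit at the number of times the candidate contains it.
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values())


def multi_reference_score(candidate: str, instance: GoldInstance) -> float:
    """Score the candidate against every reference variant and keep the best match."""
    return max(
        (unigram_recall(candidate, ref) for ref in instance.references),
        default=0.0,
    )


# Illustrative data only; not drawn from any real dataset.
instance = GoldInstance(
    input_text="temperature=21C; sky=clear",
    references=[
        "it is clear and 21 degrees",
        "clear skies with a temperature of 21 degrees",
    ],
)
score = multi_reference_score("it is clear and warm", instance)
```

Taking the maximum over references is one common convention: a system output is not penalized for matching one valid phrasing rather than another. Averaging over references is an equally plausible design choice.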
Examples:
- Summarization Gold Reference Datasets.
- Translation Gold Reference Datasets, such as:
  - WMT Gold Reference Dataset for machine translation.
  - OPUS Gold Reference Dataset for parallel text.
- Dialogue Gold Reference Datasets, such as:
  - PersonaChat Gold Reference Dataset for personality-based dialogue.
  - MultiWOZ Gold Reference Dataset for task-oriented dialogue.
- ...
Counter-Examples:
- Synthetic Reference Dataset, which uses automated generation.
- Crowd-Sourced Dataset, which lacks expert curation.
- Raw Text Corpus, which lacks reference annotation.
See: Evaluation Reference Dataset, Pyramid Method, Content Unit Annotation, Expert Adjudication Process, NLG Evaluation Framework, Reference-Based Evaluation Metric, Evaluation Protocol, Inter-Expert Agreement Metric, Evaluation Aspect.
