TruthfulQA Benchmarking Task
A TruthfulQA Benchmarking Task is an LLM inference evaluation task that can be used to evaluate the truthfulness and factual accuracy of language model outputs when presented with misleading or adversarial prompts.
- AKA: Truthful Question Answering Benchmark.
- Context:
- Task Input: Adversarial or culturally misleading question.
- Task Output: Textual answer.
- Task Performance Measure/Metrics: Truthfulness Score, Human Preference Score, Informativeness Score.
- Benchmark dataset: https://github.com/sylinrl/TruthfulQA
- It can present adversarially phrased or culturally misleading questions and require the model to generate a factually accurate response.
- It can be scored via human ratings or automated factuality classifiers (such as the fine-tuned GPT-judge model), along dimensions like truthfulness and informativeness (see the sketch after this list).
- It can evaluate a model’s tendency to "hallucinate" or propagate falsehoods.
- It can include multiple categories such as health, history, law, and finance.
- It can range from questions about obvious facts to questions built on subtle fallacies that require careful knowledge disambiguation.
- ...
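As a concrete illustration of the task input and the scoring dimensions above, the following Python sketch loads the benchmark questions from the repository's `TruthfulQA.csv` and aggregates per-answer truthfulness/informativeness judgments into summary scores. It is a minimal sketch, not the official evaluation harness: the raw-file URL, the `Question` and `Category` column names, and the `judge_answer` stub are illustrative assumptions.

```python
# Minimal sketch of the TruthfulQA generation task (not the official
# evaluation code). The raw-file URL, column names, and the judging stub
# are illustrative assumptions about the repository layout.
from dataclasses import dataclass

import pandas as pd

CSV_URL = "https://raw.githubusercontent.com/sylinrl/TruthfulQA/main/TruthfulQA.csv"  # assumed path/branch


@dataclass
class Judgment:
    """Per-answer rating from a human rater or an automated factuality classifier."""
    truthful: bool      # the answer does not assert a falsehood
    informative: bool   # the answer actually addresses the question


def load_questions(url: str = CSV_URL) -> pd.DataFrame:
    """Download the adversarial questions and their reference answers."""
    return pd.read_csv(url)


def judge_answer(question: str, answer: str) -> Judgment:
    """Placeholder judge: in practice this is a human rater or a fine-tuned
    factuality classifier, not a simple heuristic."""
    raise NotImplementedError("plug in a human-rating workflow or a trained classifier")


def summary_scores(judgments: list[Judgment]) -> dict[str, float]:
    """Aggregate per-answer judgments into % truthful, % informative,
    and % truthful-and-informative."""
    n = max(len(judgments), 1)
    return {
        "truthful": sum(j.truthful for j in judgments) / n,
        "informative": sum(j.informative for j in judgments) / n,
        "truthful_and_informative": sum(j.truthful and j.informative for j in judgments) / n,
    }


if __name__ == "__main__":
    questions = load_questions()
    print(questions["Category"].value_counts().head())   # e.g. Health, Law, Finance, ...
    print(questions["Question"].iloc[0])
```

The headline figure usually reported for this benchmark is the percentage of answers judged both truthful and informative.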
- Example(s):
- GPT-3 scored low on TruthfulQA due to its tendency to parrot common myths or misleading stereotypes.
- Claude models have been tested for factuality using TruthfulQA prompts across social and technical domains.
- GPT-4 scored significantly higher than earlier GPT models by better resisting adversarial phrasing.
- ...
- Counter-Example(s):
- TriviaQA Benchmarking Task, which asks factual questions but without adversarial design.
- Toxicity Evaluation Tasks, which focus on harmful content rather than falsehood.
- Instruction-Following Benchmarks, which test compliance, not truthfulness.
- ...
- See: LLM Inference Evaluation Task, Factuality Evaluation, LLM Hallucination, Adversarial Testing.
References
2021a
- (Lin et al., 2021) ⇒ Lin, S., Hilton, J., & Evans, O. (2021). "TruthfulQA: Measuring How Models Mimic Human Falsehoods". In: arXiv preprint, arXiv:2109.07958.
- QUOTE: We introduce TruthfulQA, a benchmark to measure whether a model is truthful in generating answers to questions.
The benchmark comprises questions that some humans answer falsely due to misconceptions.
We find that even large language models sometimes mimic these human falsehoods.
2021b
- (Lin et al., 2021) ⇒ Lin, S., Hilton, J., & Evans, O. (2021). "TruthfulQA: Measuring How Models Mimic Human Falsehoods". In: GitHub Repository. URL: https://github.com/sylinrl/TruthfulQA
- QUOTE: This repository contains code for evaluating model performance on the TruthfulQA benchmark.
The full set of benchmark questions and reference answers is contained in `TruthfulQA.csv`.
We have created a new and improved multiple-choice version of TruthfulQA.
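The multiple-choice variant mentioned in the quote above can be scored without a separate judge: the model assigns a likelihood to each reference answer choice, and a question counts as correct when the true answer outscores every false one (the MC1 setting). A minimal sketch of this scoring, assuming per-choice log-probabilities have already been computed, might look like:

```python
# MC1-style scoring sketch: for each question, the model assigns a
# log-probability to one true reference answer and to several false ones;
# the question counts as correct when the true answer scores highest.
# The data layout here is an illustrative assumption, not the repository's exact schema.
from typing import Sequence


def mc1_accuracy(per_question_scores: Sequence[tuple[float, Sequence[float]]]) -> float:
    """Each entry is (log-prob of the true answer, log-probs of the false answers)."""
    correct = sum(
        true_lp > max(false_lps)
        for true_lp, false_lps in per_question_scores
    )
    return correct / max(len(per_question_scores), 1)


# Example with made-up scores for three questions: 2 of 3 are scored correct.
print(mc1_accuracy([(-1.2, [-2.0, -3.1]), (-4.0, [-1.5, -2.2]), (-0.7, [-0.9, -1.1])]))
```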