BIG-Bench (Beyond the Imitation Game) Benchmark
A BIG-Bench (Beyond the Imitation Game) Benchmark is a text-based AI benchmark intended to probe the capabilities of large language models and to extrapolate their future capabilities.
- Context:
- It can be composed of more than 200 diverse BIG-Bench Tasks, covering capabilities such as summarization, question answering, and dialogue
- …
- Counter-Example(s):
- See: BIG-Bench Hard.
References
2023
- GBard
- The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. It includes more than 200 tasks that cover a wide range of abilities, including ...
- The BIG-bench benchmark is designed to be challenging and to measure the true capabilities of large language models. It is still under development, but it is already being used by researchers to evaluate the latest language models and to develop new ways to improve their performance.
2022
- (Suzgun et al., 2022) ⇒ Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, et al. (2022). “Challenging Big-bench Tasks and Whether Chain-of-thought Can Solve Them.” In: arXiv preprint arXiv:2210.09261. doi:10.48550/arXiv.2210.09261
- ABSTRACT: BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models. Language models have already made good progress on this benchmark, with the best model in the BIG-Bench paper outperforming average reported human-rater results on 65% of the BIG-Bench tasks via few-shot prompting. But on what tasks do language models fall short of average human-rater performance, and are those tasks actually unsolvable by current language models? ...
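The few-shot and chain-of-thought prompting the abstract refers to can be illustrated with a small sketch. This is a generic, hypothetical construction (the exemplar task and field names are invented for illustration, not taken from BIG-Bench itself): worked examples with explicit reasoning are concatenated before the new question, prompting the model to reason step by step.

```python
# Illustrative sketch of few-shot chain-of-thought (CoT) prompt construction.
# The exemplar content and dict fields below are hypothetical.

def build_cot_prompt(exemplars, question):
    """Concatenate worked exemplars (question, reasoning, answer) ahead of
    the new question, so the model is primed to reason step by step."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: Let's think step by step. {ex['reasoning']} "
            f"So the answer is {ex['answer']}.\n"
        )
    # End with the unanswered question plus the CoT trigger phrase.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n".join(parts)

exemplars = [{
    "question": "Take the last letters of 'big bench' and concatenate them.",
    "reasoning": "The last letter of 'big' is 'g'; the last letter of 'bench' is 'h'.",
    "answer": "'gh'",
}]
prompt = build_cot_prompt(
    exemplars,
    "Take the last letters of 'ai lab' and concatenate them.",
)
```

The resulting string would be sent as a single completion prompt; the model's continuation after the final "Let's think step by step." is then parsed for the answer.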
2022
- https://github.com/google/BIG-bench
- QUOTE: The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. The more than 200 tasks included in BIG-bench are summarized by keyword here, and by task name here. A paper introducing the benchmark, including evaluation results on large language models, is currently under review, and is available as a preprint.
New task submissions are encouraged. Tasks will be reviewed and merged into the BIG-bench repository on a rolling basis. New tasks are no longer eligible for inclusion in the initial BIG-bench release and paper. However, they will be included in future BIG-bench releases, and the task authors will be included in the author list of future publications.
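Many BIG-bench tasks are specified as JSON files of input/target examples. The sketch below is a loose, simplified imitation of that idea paired with an exact-match scorer; the field names and the toy model are assumptions for illustration, not the repository's canonical schema.

```python
import re

# A toy task in a BIG-bench-like JSON shape (field names are illustrative,
# loosely modeled on the repository's JSON task format).
task = {
    "name": "toy_arithmetic",
    "description": "Answer simple addition questions.",
    "keywords": ["arithmetic", "zero-shot"],
    "examples": [
        {"input": "What is 2 + 3?", "target": "5"},
        {"input": "What is 7 + 6?", "target": "13"},
    ],
}

def exact_match_score(model_fn, task):
    """Fraction of examples where the model's answer equals the target."""
    hits = sum(
        1 for ex in task["examples"]
        if model_fn(ex["input"]).strip() == ex["target"]
    )
    return hits / len(task["examples"])

def toy_model(prompt):
    """Stand-in 'model' that actually computes the sum, for demonstration."""
    a, b = map(int, re.findall(r"\d+", prompt))
    return str(a + b)

score = exact_match_score(toy_model, task)  # → 1.0 for this toy model
```

In practice the scoring function would call a real language model and tasks may use richer targets (e.g. multiple-choice score dictionaries), but the shape of the evaluation loop is the same.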
2022
- (Srivastava et al., 2022) ⇒ Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, and others. (2022). “Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models.” In: arXiv preprint arXiv:2206.04615.