LLM Benchmarking System
An LLM Benchmarking System is an ML benchmarking system for Large Language Models (LLMs).
- Context:
  - It can (typically) evaluate Large Language Models across a variety of tasks to determine their performance and capabilities, as illustrated by the sketch at the end of this entry.
  - ...
 
- Example(s):
  - HELM (Holistic Evaluation of Language Models) LLM Benchmarking Framework, which evaluates LLMs across 42 different scenarios using multiple metrics.
  - ...
 
- Counter-Example(s):
  - ...
 
 - See: HELM, GLUE, SuperGLUE, BIG-bench, Benchmarking.
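A minimal sketch of what such a system does: run a model over a set of tasks and report a per-task score. This assumes the model is represented as a prompt-to-completion callable and that scoring is simple exact match; the `Task` and `exact_match` names, the task names, and the `toy_model` stand-in are hypothetical illustrations, not part of any named benchmark.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" here is any callable mapping a prompt string to a completion
# string; a real harness would wrap an API client or a local checkpoint.
Model = Callable[[str], str]

@dataclass
class Task:
    name: str
    examples: list[tuple[str, str]]  # (prompt, expected answer) pairs

def exact_match(prediction: str, expected: str) -> float:
    """Score 1.0 when the normalized prediction equals the expected answer."""
    return float(prediction.strip().lower() == expected.strip().lower())

def evaluate(model: Model, tasks: list[Task]) -> dict[str, float]:
    """Run the model over every task and return per-task mean accuracy."""
    results: dict[str, float] = {}
    for task in tasks:
        scores = [exact_match(model(prompt), expected)
                  for prompt, expected in task.examples]
        results[task.name] = sum(scores) / len(scores)
    return results

if __name__ == "__main__":
    # Toy stand-in model with canned answers, for demonstration only.
    def toy_model(prompt: str) -> str:
        return "4" if "2 + 2" in prompt else "unknown"

    tasks = [Task("arithmetic", [("What is 2 + 2?", "4")]),
             Task("capitals", [("Capital of France?", "Paris")])]
    print(evaluate(toy_model, tasks))  # {'arithmetic': 1.0, 'capitals': 0.0}
```

Real systems such as HELM generalize this loop along both axes: many scenarios in place of the two toy tasks, and multiple metrics (e.g., accuracy, calibration, robustness) in place of the single exact-match score.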