OpenAI Evals Framework

An OpenAI Evals Framework is an LLM-based System Evaluation Framework that is an OpenAI project.

  • Context:
    • It can incorporate LLMs as components.
    • It can include an open-source registry of challenging evals, designed to assess system behavior through the Completion Function Protocol (see the protocol sketch after this list).
    • It can aim to make building an eval require as little code as possible (see the sample-file sketch after this list).
    • It can support the evaluation of various system behaviors, including prompt chains and tool-using agents.
    • It can use Git-LFS to download and manage evals from its registry.
    • It can offer guidelines for writing custom completion functions and for submitting model-graded evals with custom YAML files.
    • It can provide options for installing pre-commit formatters and for running evals locally via pip.
    • It can support logging eval results to a Snowflake database.
    • ...
  • Example(s):
    • ...
  • Counter-Example(s):
    • ...
  • See: Large Language Model Evaluation, AI Model Evaluation, Git-LFS.
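
The Completion Function Protocol mentioned above is the interface through which Evals talks to the system under evaluation. The following is a minimal sketch based on the interface described in the repository's completion-function documentation; the EchoResult and EchoCompletionFn classes are hypothetical illustrations of how a prompt chain or tool-using agent could be wrapped, not part of the framework itself.

```python
from typing import Protocol, Union


class CompletionResult(Protocol):
    """Wraps whatever a completion call returned."""

    def get_completions(self) -> list[str]:
        """Return the text completions produced for the prompt."""
        ...


class CompletionFn(Protocol):
    """Anything callable on a prompt that returns a CompletionResult.

    The prompt may be a plain string or a list of chat messages
    ({"role": ..., "content": ...} dicts).
    """

    def __call__(
        self, prompt: Union[str, list[dict[str, str]]], **kwargs
    ) -> CompletionResult:
        ...


# Hypothetical implementation: any custom logic (an LLM call, a prompt chain,
# a tool-using agent) can sit behind this interface and be evaluated.
class EchoResult:
    def __init__(self, text: str) -> None:
        self.text = text

    def get_completions(self) -> list[str]:
        return [self.text]


class EchoCompletionFn:
    def __call__(self, prompt, **kwargs) -> EchoResult:
        # Replace this with the real system under test.
        last = prompt if isinstance(prompt, str) else prompt[-1]["content"]
        return EchoResult(last)
```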
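
Building a basic eval is mostly a matter of supplying data rather than code: the repository's eval-building guide describes sample files in JSONL format, where each line carries a chat-formatted "input" prompt and an "ideal" reference answer. A minimal sketch that writes such a file, with hypothetical sample content and file name:

```python
import json

# Hypothetical samples for a basic match-style eval: each record holds a
# chat-format "input" prompt and the "ideal" answer the completion is
# compared against.
samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "ideal": "Paris",
    },
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is 2 + 2?"},
        ],
        "ideal": "4",
    },
]

with open("samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```

A registry YAML entry then points one of the existing eval classes at this file, so simple evals need no new Python at all.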


References

2023

  • (OpenAI, 2023) ⇒ OpenAI. (2023). “OpenAI Evals: A Framework for Evaluating LLMs.” In: GitHub Repository. https://github.com/openai/evals
    • QUOTE: Evals is a framework for evaluating LLMs (large language models) or systems built using LLMs as components. It also includes an open-source registry of challenging evals... With Evals, we aim to make it as simple as possible to build an eval while writing as little code as possible. An "eval" is a task used to evaluate the quality of a system's behavior... To get set up with evals, follow the setup instructions below. You can also run and create evals using Weights & Biases... To run evals, you will need to set up and specify your OpenAI API key...
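
As the quoted README notes, running evals locally requires installing the package and supplying an OpenAI API key. The sketch below assumes the oaieval command-line entry point described in the repository; the model name, eval name, and placeholder key are illustrative and should be checked against the current registry and documentation.

```python
import os
import subprocess

# Assumes the evals package has been installed via pip (either from PyPI or
# as an editable install from a clone) and that the eval data in the registry
# has been fetched, e.g. via Git-LFS.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")  # placeholder, not a real key

# Documented CLI shape: oaieval <completion_fn> <eval_name>.
# The model and eval names below are illustrative.
subprocess.run(["oaieval", "gpt-3.5-turbo", "test-match"], check=True)
```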