LLM-based System Evaluation Framework
An LLM-based System Evaluation Framework is an AI system evaluation framework used to build LLM-based system evaluation systems.
- Context:
- It can include various LLM-based System Performance Measures, such as: Perplexity, BLEU, ROUGE, METEOR, Human Evaluation, Diversity, and Zero-shot Evaluation (see the perplexity sketch after this list).
- It can be assessed against LLM-based System Benchmarks, such as: Big Bench, GLUE Benchmark, SuperGLUE Benchmark, MMLU, LIT, ParlAI, CoQA, LAMBADA, and HellaSwag (see the benchmark-harness sketch after this list).
- ...
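Several of the measures above reduce to simple statistics over model outputs. As a concrete illustration, the sketch below computes Perplexity from per-token log-probabilities; the logprobs list is a hypothetical stand-in for values an LLM API might return, not output from any specific model.

```python
import math

def perplexity(token_logprobs):
    """Compute perplexity from per-token log-probabilities (natural log).

    Perplexity = exp(-mean(log p(token_i | context))); lower is better.
    """
    avg_neg_logprob = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_neg_logprob)

# Hypothetical per-token log-probabilities (illustrative values only).
logprobs = [-0.2, -1.1, -0.5, -2.3, -0.7]
print(f"Perplexity: {perplexity(logprobs):.2f}")  # exp(0.96) ~= 2.61
```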
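For the benchmarks, evaluation harnesses automate running a model over many tasks at once. Below is a hedged sketch using the EleutherAI LM Eval harness (listed under See); the simple_evaluate entry point and its arguments follow the harness's documented Python API, but exact argument names and task identifiers can vary across versions, so treat this as an assumption to check against the harness docs.

```python
# Requires: pip install lm-eval (EleutherAI LM Evaluation Harness).
import lm_eval

# Run a Hugging Face causal LM against two of the benchmarks named above.
results = lm_eval.simple_evaluate(
    model="hf",                              # Hugging Face model backend
    model_args="pretrained=gpt2",            # any HF checkpoint name
    tasks=["hellaswag", "lambada_openai"],   # benchmark task identifiers
    num_fewshot=0,                           # zero-shot evaluation
)
print(results["results"])  # per-task metric dictionary
```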
- Example(s):
- Counter-Example(s):
- ...
- See: Large Language Model, Model Evaluation, OpenAI Moderation API, EleutherAI LM Eval.