LLM-as-Judge Evaluation Benchmark Dataset

From GM-RKB

An LLM-as-Judge Evaluation Benchmark Dataset is a curated, annotated benchmark dataset that provides test cases for evaluating LLM judge performance.
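
As an illustration only, one way such a dataset could be represented is sketched below, assuming each test case pairs a prompt and a candidate response with a human-annotated reference label, and that judge performance is measured as simple agreement with those labels. The names `JudgeTestCase`, `judge_agreement`, and `length_judge` are hypothetical and are not taken from any specific benchmark; the three test cases are toy data, and the length-based judge is a stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class JudgeTestCase:
    """One annotated test case (hypothetical schema)."""
    prompt: str
    response: str
    human_label: str  # reference verdict from a human annotator, e.g. "pass" or "fail"


def judge_agreement(
    cases: Sequence[JudgeTestCase],
    judge: Callable[[str, str], str],
) -> float:
    """Return the fraction of cases where the judge's verdict matches the human label."""
    if not cases:
        raise ValueError("benchmark dataset is empty")
    matches = sum(1 for c in cases if judge(c.prompt, c.response) == c.human_label)
    return matches / len(cases)


def length_judge(prompt: str, response: str) -> str:
    """Toy stand-in for an LLM judge: passes any response of 10+ characters."""
    return "pass" if len(response) >= 10 else "fail"


# Toy data for illustration; a real benchmark would be far larger and curated.
cases = [
    JudgeTestCase("What is 2 + 2?", "2 + 2 equals 4.", "pass"),
    JudgeTestCase("Name a prime number.", "Nine.", "fail"),
    JudgeTestCase("What is the capital of France?", "The capital is Berlin.", "fail"),
]

agreement = judge_agreement(cases, length_judge)
print(f"judge/human agreement: {agreement:.2f}")
```

Here the toy judge agrees with the human labels on two of the three cases, since it wrongly passes the fluent but incorrect third response. Plain agreement is only one possible metric; a real benchmark might instead report a chance-corrected statistic or per-category breakdowns.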