Arena Elo Score
An Arena Elo Score is a crowdsourced relative performance metric that ranks large language models based on human preference votes from pairwise comparisons.
- AKA: LMSYS Arena Elo Score, LMArena Elo Rating.
- Context:
- It can derive Bradley-Terry Model coefficients through maximum likelihood estimation on arena pairwise vote data.
- It can scale Arena BT Coefficients to the arena Elo rating scale by multiplying them by 400/ln(10) ≈ 173.7178 and adding an anchor constant.
- It can incorporate Arena Uncertainty Quantification through arena bootstrapping to compute arena confidence intervals.
- It can update dynamically from ongoing Arena Crowdsourced Battles on platforms such as LMArena.
- It can handle Arena Tie Votes by counting each tie as half a win (an outcome of 0.5) for each model.
- ...
- It can range from being a Low-Volume Arena Elo Score to being a High-Volume Arena Elo Score, depending on its arena vote count.
- It can range from being an Online-Update Arena Elo Score to being a BT-MLE Arena Elo Score, depending on its arena estimation methodology.
- ...
- It can predict Arena Win Probability based on score differences.
- It can integrate with Arena LLM Leaderboards for ranking.
- ...
- Example(s):
- Gemini-2.5-Pro Arena Elo Score (2025), with a score of 1474.
- Grok-4-0709 Arena Elo Score, with a score of 1440.
- DeepSeek-R1-0528 Arena Elo Score, with a score of 1424.
- ...
- Counter-Example(s):
- Traditional Benchmark Scores, which rely on fixed datasets rather than dynamic votes.
- Absolute Performance Metrics, which provide non-relative evaluations.
- Static Elo Ratings, which do not update dynamically.
- See: Bradley-Terry Model, Elo Rating System, LMSYS Chatbot Arena, Large Language Model Evaluation.
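
The Context items on Bradley-Terry estimation, Elo scaling, tie handling, and win probability can be illustrated with a minimal sketch. It is not the platform's actual pipeline: the fitting routine uses the classic MM (Zermelo) iteration as a stand-in for whatever maximum likelihood solver is used in practice, the anchor constant of 1000 is a hypothetical choice, and the function names and toy battle data are invented for illustration. Only the 400/ln(10) scale factor and the half-win treatment of ties come from the description above.

```python
import math
from collections import defaultdict

ELO_SCALE = 400 / math.log(10)  # about 173.7178, the BT-to-Elo scale factor
ANCHOR = 1000.0                 # hypothetical anchor constant, for illustration only


def fit_bt_strengths(battles, n_iter=200):
    """Fit Bradley-Terry strengths with the MM (Zermelo) iteration.

    battles: iterable of (model_a, model_b, outcome), outcome in {"a", "b", "tie"}.
    Assumes every model has a nonzero win total and all models are
    connected through comparisons; otherwise the MLE does not exist.
    """
    wins = defaultdict(float)                               # fractional wins per model
    pair_counts = defaultdict(lambda: defaultdict(float))   # battles per opponent
    for a, b, outcome in battles:
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[outcome]  # tie = half a win each
        wins[a] += score_a
        wins[b] += 1.0 - score_a
        pair_counts[a][b] += 1.0
        pair_counts[b][a] += 1.0

    strength = {m: 1.0 for m in wins}
    for _ in range(n_iter):
        updated = {
            i: wins[i] / sum(n / (strength[i] + strength[j])
                             for j, n in pair_counts[i].items())
            for i in wins
        }
        # Strengths are only identified up to a constant factor,
        # so normalize the mean log-strength to zero.
        mean_log = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {m: v / math.exp(mean_log) for m, v in updated.items()}
    return strength


def to_elo(strength):
    """Map BT coefficients (log-strengths) onto the Elo-like scale."""
    return {m: ANCHOR + ELO_SCALE * math.log(s) for m, s in strength.items()}


def win_probability(rating_a, rating_b):
    """Predicted probability that A is preferred over B, from the score gap."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


if __name__ == "__main__":
    # Toy votes, invented for illustration.
    toy_battles = [
        ("model_x", "model_y", "a"),
        ("model_y", "model_z", "a"),
        ("model_z", "model_x", "a"),
        ("model_x", "model_y", "tie"),
        ("model_x", "model_z", "a"),
    ]
    ratings = to_elo(fit_bt_strengths(toy_battles))
    for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{model}: {rating:.1f}")
    print(f"P(1474 beats 1440) = {win_probability(1474, 1440):.3f}")
```

Under this formula, the 34-point gap between the listed scores of 1474 and 1440 corresponds to a predicted win probability of roughly 0.55 for the higher-rated model, which shows how small rating gaps translate into near-even matchups.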