Benchmarks - a hppdqdq Collection

updated Jan 13

Running on CPU Upgrade

237

MMLU-Pro Leaderboard

🥇

More advanced and challenging multi-task evaluation
Running

58

Stick To Your Role! Leaderboard

🎭

Benchmarking LLMs on the stability of simulated populations
Running

53

ZeroEval Leaderboard

📊

Embed ZeroEval for evaluation
Running

26

Decentralized Arena Leaderboard

🥇

View and compare LLM evaluations across various domains
Runtime error

Featured

432

Open Medical-LLM Leaderboard

🥇

Explore and submit models for benchmarking
Running

295

GPU Poor LLM Arena

🏆

Compact LLM Battle Arena: Frugal AI Face-Off!
Running

Featured

129

Open VLM Video Leaderboard

🌎

VLMEvalKit Eval Results in video understanding benchmark
Running on CPU Upgrade

13.7k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running

451

TTS Spaces Arena

🤗

Blind vote on HF TTS models!