Display and explore a leaderboard for model evaluations
How Good Are LLMs at Text-Based Video Games?
A space to view and inspect all the tasks in lighteval