Gaia2 Agents Evaluation Leaderboard
Display and submit model evaluation results on a leaderboard.