Add MathArena evaluation result for aime/aime_2026

#26

This PR adds a new MathArena evaluation result so it can be indexed on the model leaderboard page.

Model: stepfun-ai/Step-3.5-Flash
Competition dataset id: MathArena/aime_2026
Score: 96.67
Result file: .eval_results/MathArena--aime_2026.yaml

These results match the ones displayed on our webpage.

Note: this is an experimental feature, and we are still working to make it run as smoothly as possible.

This is actually a fantastic score.
