| title: README | |
| emoji: π | |
| colorFrom: green | |
| colorTo: purple | |
| sdk: static | |
| pinned: false | |
| <!-- Banner -------------------------------------------------------------- --> | |
| <p align="center"> | |
| <b>Fine-grained evaluation of Large Reasoning Models that <i>fail in reasoning</i> due to <i>reasoning rigidity</i>.</b><br/> | |
| ConditionedMath (AIME & MATH500) · PuzzleTrivial · Zero-shot pipelines | |
| </p> | |
| --- | |
| ## Why ReasoningTrap? | |
| > Current RL-tuned Reasoning LLMs excel at *producing* answers but often ignore explicit user constraints. | |
| > **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems. | |
| * **Modified from Famous Math Reasoning Benchmarks** - AIME & MATH500 problems altered with minimal constraints that divert the reasoning path away from the familiar solution. | |
| * **Puzzles Trivialized by Subtle Modifications** - Well-known puzzles where a small change transforms a challenging problem into a trivial one. | |
| * **Plug-and-play** - evaluate any 🤗 Transformers model with vLLM in a few simple steps. | |
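
To make the idea of a *conditioned* problem concrete, here is a minimal sketch of how a single answer might be labeled. It is an illustration only, not the project's actual pipeline: `ConditionedProblem`, `classify`, `evaluate`, and the `generate` callable are hypothetical names, the record layout is assumed, and the toy problem is made up. In practice `generate` could wrap a vLLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class ConditionedProblem:
    # Hypothetical record layout; the real dataset schema may differ.
    prompt: str              # problem text including the added constraint
    conditioned_answer: str  # correct answer once the constraint is respected
    original_answer: str     # answer to the unmodified, well-known problem


def classify(problem: ConditionedProblem, model_answer: str) -> str:
    """Label one answer by exact string match after trimming whitespace."""
    answer = model_answer.strip()
    if answer == problem.conditioned_answer:
        return "follows_condition"
    if answer == problem.original_answer:
        # The model solved the familiar problem and ignored the constraint.
        return "reverts_to_original"
    return "other"


def evaluate(
    problems: Iterable[ConditionedProblem],
    generate: Callable[[str], str],
) -> Dict[str, int]:
    """Count how often a model follows the condition or falls back."""
    counts = {"follows_condition": 0, "reverts_to_original": 0, "other": 0}
    for problem in problems:
        counts[classify(problem, generate(problem.prompt))] += 1
    return counts


# Toy problem, invented purely for illustration.
toy = ConditionedProblem(
    prompt="What is 2 + 3, given that '+' here denotes multiplication?",
    conditioned_answer="6",
    original_answer="5",
)

# A stub "model" that ignores the constraint and answers the usual question.
result = evaluate([toy], lambda prompt: "5")
print(result)
```

Exact string matching keeps the sketch short; a real evaluation would likely need answer extraction and normalization before comparing.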