Update README.md
README.md
@@ -10,14 +10,17 @@ pinned: false
 <!-- Banner -------------------------------------------------------------- -->
 <p align="center">
 <b>Fine-grain evaluation & Large Reasoning Models that <i>fails in reasoning</i> due to <i>reasoning rigidity</i>.</b><br/>
-ConditionedMath (AIME & MATH500) · PuzzleTrivial ·
+ConditionedMath (AIME & MATH500) · PuzzleTrivial · Zero-shot pipelines
 </p>
 
 ---
 
 ## 📜 Why ReasoningTrap?
 
-> Current RL-tuned Reasoning LLMs excel at *producing* answers
+> Current RL-tuned Reasoning LLMs excel at *producing* answers but often ignore explicit user constraints.
 > **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
-* **Modified from Famous MATH Reasoning Benchmark** – AIME & MATH500
+* **Modified from Famous MATH Reasoning Benchmark** – AIME & MATH500 problems altered with minimal constraints to divert reasoning paths.
+* **Puzzles Trivialized by Subtle Modifications** - Well-known puzzles where a small change transforms a challenging problem into a trivial one.
+* **Plug-and-play** – evaluate any 🤗 Transformers model with vLLM in simple instructions.
+
 