Spaces: ReasoningTrap / README

Commit d30ba4f (verified) by jadohu, committed on May 27
Parent: 589ef88

Update README.md

Files changed (1): README.md (changed)

@@ -10,14 +10,17 @@ pinned: false
 <!-- Banner -------------------------------------------------------------- -->
 <p align="center">
   <b>Fine-grain evaluation &amp; Large Reasoning Models that <i>fails in reasoning</i> due to <i>reasoning rigidity</i>.</b><br/>
-  ConditionedMath (AIME &amp; MATH500) · PuzzleTrivial · Training scripts · Zero-shot pipelines
+  ConditionedMath (AIME &amp; MATH500) · PuzzleTrivial · Zero-shot pipelines
 </p>
 ---
 ## 📜 Why ReasoningTrap?
-> Current RL-tuned Reasoning LLMs excel at *producing* answers, but often ignore explicit user constraints.
+> Current RL-tuned Reasoning LLMs excel at *producing* answers but often ignore explicit user constraints.
 > **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
-* **Modified from Famous MATH Reasoning Benchmark** – AIME & MATH500 long-form proofs.
+* **Modified from Famous MATH Reasoning Benchmark** – AIME & MATH500 problems altered with minimal constraints to divert reasoning paths.
+* **Puzzles Trivialized by Subtle Modifications** - Well-known puzzles where a small change transforms a challenging problem into a trivial one.
+* **Plug-and-play** – evaluate any 🤗 Transformers model with vLLM in simple instructions.