Update README.md
README.md
CHANGED
@@ -20,5 +20,4 @@ pinned: false
 > Current RL-tuned Reasoning LLMs excel at *producing* answers, but often ignore explicit user constraints.
 > **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
 * **Modified from Famous MATH Reasoning Benchmark** – AIME & MATH500 long-form proofs.
-* **Plug-and-play** – evaluate any 🤗 Transformers, vLLM or OpenAI-style chat model in two lines.
 