Update app.py

app.py CHANGED
@@ -35,19 +35,32 @@ resource constraints. To train models that are robust to truncated thinking, we
 introduce a lightweight `budget-constrained rollout` strategy, integrated into GRPO,
 which teaches the model to reason adaptively when the thinking process is cut
 short and generalizes effectively to unseen budget constraints without additional
-training.
+training.
+""")
+gr.HTML("""
 <p align="center">
 <img src="figs/framework.png" width="80%" />
 </p>
-
-
+""")
+gr.Markdown(
+"""
 **Main Takeaways**
 1. ✂️ Thinking + Solution are explicitly separated with independent budgets, boosting reliability under tight compute constraints.
 2. 🧠 Budget-Constrained Rollout: We train models to handle truncated reasoning using GRPO.
 3. 📈 Flexible scalability: Robust performance across diverse inference budgets on reasoning benchmarks like AIME and LiveCodeBench.
 4. ⚙️ Better performance with fewer tokens: Our trained model generates outputs that are 30% shorter while maintaining (or even improving) accuracy.

+<p align="center">
+<img src="figs/aime.png" width="46%" />
+<img src="figs/livecode.png" width="48%" />
+</p>

+<p align="center">
+<img src="figs/codetable.png" width="90%" />
+</p>
+""")
+gr.Markdown(
+"""
 ## Citation


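For context on the edit itself: the added `gr.HTML("""...""")` and `gr.Markdown("""...""")` calls presumably sit inside a `gr.Blocks` layout defined elsewhere in app.py; the surrounding lines fall outside this hunk. A minimal sketch of that pattern, with placeholder text, is below.

```python
# Minimal sketch of the gr.Blocks pattern this hunk edits. The real layout,
# indentation, and page text in app.py are outside the hunk and assumed here.
import gradio as gr

with gr.Blocks() as demo:
    # Markdown-rendered prose (abstract, takeaways, citation, ...).
    gr.Markdown(
        """
        **Main Takeaways**
        1. Thinking + Solution are explicitly separated with independent budgets.
        """
    )
    # gr.HTML renders raw HTML, so the centered <p>/<img> figures can be kept
    # out of the Markdown strings, as the diff above does.
    gr.HTML('<p align="center"><img src="figs/framework.png" width="80%" /></p>')

if __name__ == "__main__":
    demo.launch()
```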
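The abstract and takeaways describe the budget-constrained rollout idea: during GRPO training, the thinking phase runs under a hard token budget and may be cut off before the solution is generated under its own budget. The snippet below is a minimal sketch of that idea only, not the authors' implementation; the `generate` callable, the `</think>` delimiter, the keyword names, and the candidate budgets are all assumptions made for illustration.

```python
# Minimal sketch of a budget-constrained rollout (not the authors' code).
# `generate` is a hypothetical stand-in for the policy's sampling call; the
# </think> delimiter, keyword names, and budget values are assumptions.
import random
from typing import Callable

THINK_END = "</think>"  # assumed delimiter between thinking and solution

def sample_think_budget(budgets=(512, 1024, 2048, 4096)) -> int:
    # Varying the budget per rollout exposes the policy to many truncation
    # points, which is what would let it generalize to unseen budgets.
    return random.choice(budgets)

def budget_constrained_rollout(
    generate: Callable[..., str],
    prompt: str,
    think_budget: int,
    solution_budget: int,
) -> str:
    """Roll out thinking under a hard token budget, then force a solution."""
    # 1) Sample the thinking segment; it stops early if the budget runs out.
    thinking = generate(prompt, max_new_tokens=think_budget, stop=THINK_END)
    if THINK_END not in thinking:
        # Budget hit mid-thought: close the thinking block explicitly so the
        # policy must answer from truncated reasoning.
        thinking += "\n" + THINK_END
    # 2) Generate the solution under its own, independent budget.
    solution = generate(prompt + thinking, max_new_tokens=solution_budget)
    return thinking + solution
```

Rollouts produced this way would then be scored and advantaged as usual in GRPO; the exact reward and budget schedule are not part of this diff.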