---
library_name: transformers
tags:
- math
license: mit
datasets:
- sparkle-reasoning/hardmath
pipeline_tag: reinforcement-learning
---

**SparkleRL-7B-Stage2-aug** is the **Stage 2 RL-tuned model with partial-step scaffolding** introduced in the paper *Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning*.

---

## **Links**

- Paper: https://arxiv.org/abs/2506.04723
- Code: https://github.com/sparkle-reasoning/sparkle
- Project Page: https://sparkle-reasoning.github.io/

---

## **Quick Start**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

name = "sparkle-reasoning/SparkleRL-7B-Stage2-aug"

# Load the tokenizer and model; device_map="auto" requires the `accelerate` package.
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate a step-by-step solution for a simple math prompt.
prompt = "Solve step by step: If 3x + 5 = 20, what is x?"
inp = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inp, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

## **Citation**

```bibtex
@misc{wang2025accuracydissectingmathematicalreasoning,
  title={Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning},
  author={Jiayu Wang and Yifei Ming and Zixuan Ke and Caiming Xiong and Shafiq Joty and Aws Albarghouthi and Frederic Sala},
  year={2025},
  eprint={2506.04723},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2506.04723},
}
```
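
---

## **Chat Template (Optional)**

If the tokenizer ships a chat template, formatting the prompt through `apply_chat_template` may match the model's training format more closely than raw text. Whether this checkpoint includes a chat template is an assumption, not something stated above, so the sketch below falls back to the plain prompt when none is present:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

name = "sparkle-reasoning/SparkleRL-7B-Stage2-aug"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

# Assumption: the checkpoint may or may not ship a chat template;
# use it when available, otherwise pass the prompt through unchanged.
messages = [{"role": "user", "content": "Solve step by step: If 3x + 5 = 20, what is x?"}]
if tok.chat_template is not None:
    text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
else:
    text = messages[0]["content"]

inp = tok(text, return_tensors="pt").to(model.device)
out = model.generate(**inp, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```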