---
library_name: transformers
tags:
- math
license: mit
datasets:
- sparkle-reasoning/hardmath
pipeline_tag: reinforcement-learning
---

**SparkleRL-7B-Stage2-aug** is the **Stage 2 RL-tuned model with partial-step scaffolding** introduced in the paper *Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning*.

---

## **Links**

- Paper: https://arxiv.org/abs/2506.04723
- Code: https://github.com/sparkle-reasoning/sparkle
- Project Page: https://sparkle-reasoning.github.io/

---

## **Quick Start**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

name = "sparkle-reasoning/SparkleRL-7B-Stage2-aug"

# Load the tokenizer and model; device_map="auto" requires the `accelerate` package.
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate a step-by-step solution for a simple math prompt.
prompt = "Solve step by step: If 3x + 5 = 20, what is x?"
inp = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inp, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

## **Citation**

```bibtex
@misc{wang2025accuracydissectingmathematicalreasoning,
  title={Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning},
  author={Jiayu Wang and Yifei Ming and Zixuan Ke and Caiming Xiong and Shafiq Joty and Aws Albarghouthi and Frederic Sala},
  year={2025},
  eprint={2506.04723},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2506.04723},
}
```
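
---

## **Chat Template (Optional)**

If the tokenizer ships a chat template, formatting the prompt through `apply_chat_template` may match the model's training format more closely than raw text. Whether this checkpoint includes a chat template is an assumption, not something stated above, so the sketch below falls back to the plain prompt when none is present:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

name = "sparkle-reasoning/SparkleRL-7B-Stage2-aug"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

# Assumption: the checkpoint may or may not ship a chat template;
# use it when available, otherwise pass the prompt through unchanged.
messages = [{"role": "user", "content": "Solve step by step: If 3x + 5 = 20, what is x?"}]
if tok.chat_template is not None:
    text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
else:
    text = messages[0]["content"]

inp = tok(text, return_tensors="pt").to(model.device)
out = model.generate(**inp, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```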