MARIO-Math-Reasoning/SVPO_7B


Decaderan committed on Dec 3, 2024

Commit 00c65a8 · verified · 1 Parent(s): 1f5041a

Update README.md

Files changed (1)

README.md +33 -3


@@ -1,3 +1,33 @@
----
-license: mit
----

+# Step-level Value Preference Optimization for Mathematical Reasoning
+This is the official repository for the paper [Step-level Value Preference Optimization for Mathematical Reasoning](https://arxiv.org/abs/2406.10858). It is extracted from our internal corporate codebase. As a result, there may be slight differences when reproducing the numbers reported in our paper, but they should be very close.
+The implementation of SVPO builds on [AlphaMath](https://arxiv.org/abs/2405.03553), including its MCTS and step-level beam search (SBS) components.
+Therefore, we provide the [code](https://github.com/MARIO-Math-Reasoning/Super_MARIO) for constructing step-level preference pairs to facilitate reproduction.
+## Citation
+SVPO
+```
+@misc{chen2024steplevel,
+      title={Step-level Value Preference Optimization for Mathematical Reasoning},
+      author={Guoxin Chen and Minpeng Liao and Chengxi Li and Kai Fan},
+      year={2024},
+      eprint={2406.10858},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
+AlphaMath
+```
+@misc{chen2024alphamath,
+      title={AlphaMath Almost Zero: Process Supervision without Process},
+      author={Guoxin Chen and Minpeng Liao and Chengxi Li and Kai Fan},
+      year={2024},
+      eprint={2405.03553},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```