Update README.md
README.md
CHANGED
|
@@ -15,7 +15,7 @@ tags:
|
|
| 15 |
We introduce AceMath, a family of frontier models designed for mathematical reasoning. The models in AceMath family, including AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, are <b>Improved using Qwen</b>.
|
| 16 |
The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical problems using Chain-of-Thought (CoT) reasoning, while the AceMath-7B/72B-RM models, as outcome reward models, specialize in evaluating and scoring mathematical solutions.
|
| 17 |
|
| 18 |
-
The AceMath-7B/72B-RM models are developed from
|
| 19 |
|
| 20 |
For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).
|
| 21 |
|
|
@@ -30,7 +30,7 @@ For more information about AceMath, check our [website](https://research.nvidia.
|
|
| 30 |
### Evaluation & Training Data
|
| 31 |
- [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)
|
| 32 |
|
| 33 |
-
###
|
| 34 |
- [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)
|
| 35 |
|
| 36 |
## Benchmark Results (AceMath-Instruct + AceMath-72B-RM)
|
|
|
|
| 15 |
We introduce AceMath, a family of frontier models designed for mathematical reasoning. The models in AceMath family, including AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, are <b>Improved using Qwen</b>.
|
| 16 |
The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical problems using Chain-of-Thought (CoT) reasoning, while the AceMath-7B/72B-RM models, as outcome reward models, specialize in evaluating and scoring mathematical solutions.
|
| 17 |
|
| 18 |
+
The AceMath-7B/72B-RM models are developed from our AceMath-7B/72B-Instruct models and trained on AceMath-RM-Training-Data using Bradley-Terry loss. The architecture employs standard sequence classification with a linear layer on top of the language model, using the final token to output a scalar score.
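
As a rough illustration of the added sentence, the sketch below shows how a scalar score could be read off the final token through a linear layer, and how a Bradley-Terry loss compares two such scores. It is a minimal sketch, not the AceMath training code: the pairwise form of the loss is an assumption (the README names only "Bradley-Terry loss"), and `final_token_score`, `bradley_terry_loss`, and the toy hidden states and weights are hypothetical names and values chosen for illustration.

```python
import math


def final_token_score(hidden_states, weights, bias=0.0):
    """Map the last token's hidden vector to a scalar with a linear layer.

    hidden_states: one hidden vector per token (hypothetical toy input).
    weights, bias: parameters of the linear head (hypothetical values).
    """
    last = hidden_states[-1]  # only the final token is scored
    return sum(w * h for w, h in zip(weights, last)) + bias


def bradley_terry_loss(score_chosen, score_rejected):
    """Pairwise Bradley-Terry loss: -log(sigmoid(score_chosen - score_rejected)).

    Assumption: a pairwise formulation; the README does not say how
    candidate solutions are grouped during training.
    """
    margin = score_chosen - score_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    head_weights = [0.5, -1.0]
    good = final_token_score([[0.1, 0.2], [2.0, -1.0]], head_weights)
    bad = final_token_score([[0.1, 0.2], [-1.0, 1.0]], head_weights)
    print(bradley_terry_loss(good, bad))
```

The loss shrinks as the score of the preferred solution rises above the score of the rejected one, which is what lets the resulting scalar be used to rank candidate solutions.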
|
| 19 |
|
| 20 |
For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).
|
| 21 |
|
|
|
|
| 30 |
### Evaluation & Training Data
|
| 31 |
- [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)
|
| 32 |
|
| 33 |
+
### General Instruction Models
|
| 34 |
- [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)
|
| 35 |
|
| 36 |
## Benchmark Results (AceMath-Instruct + AceMath-72B-RM)
|