Update README.md

README.md
---

# VibeThinker-1.5B

## Introduction

VibeThinker-1.5B is a 1.5-billion-parameter dense language model. With a total training cost of only $7,800 USD, it achieves reasoning performance on several challenging benchmarks that matches or even surpasses that of significantly larger models.

Mathematical Reasoning: On the three major math benchmarks AIME24, AIME25, and HMMT25, its scores (80.9, 73.7, and 50.1, respectively) all surpass those of the initial DeepSeek R1 model, which has over 400 times the parameters (scores of 79.8, 70.0, and 41.7, respectively).

Code Generation: It achieved a score of 55.3 on LiveCodeBench v5.

On the AIME25 benchmark, VibeThinker-1.5B significantly extends the Pareto frontier of reasoning accuracy versus model scale, demonstrating that exceptional performance can be achieved with extreme parameter efficiency.

Its core innovation lies in the "Spectrum-to-Signal Principle" (SSP) training framework: it first explores solution diversity during the Supervised Fine-Tuning (SFT) stage, and then optimizes its policy to reinforce correct signals in the Reinforcement Learning (RL) stage.
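
To make the spectrum-versus-signal intuition concrete, the toy snippet below contrasts a single-sample success rate (pass@1, the "signal" the RL stage reinforces) with a multi-sample success rate (pass@k, used here as a proxy for the solution "spectrum" the SFT stage explores). Treating pass@k as the diversity proxy is an assumption made for illustration; this is not the actual SSP training code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples,
    drawn without replacement from n generations of which c are correct,
    is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 16 sampled solutions to one problem, 5 of which are correct.
n, c = 16, 5
print(f"signal   (pass@1): {pass_at_k(n, c, 1):.3f}")  # what the RL stage pushes up
print(f"spectrum (pass@8): {pass_at_k(n, c, 8):.3f}")  # what a diversity-oriented SFT stage preserves
```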
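
The model card declares `library_name: transformers`, so a minimal inference sketch is included below. The repo id, chat-template call, and generation settings are assumptions for illustration rather than an official quick-start snippet.

```python
# Minimal inference sketch; repo id and settings are assumptions, adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x? Show your reasoning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models usually need a generous token budget; the sampling settings are illustrative.
output_ids = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```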

## Highlights Summary

💡 Breakthrough Scale-Efficiency: With an extremely small parameter count of 1.5B, it achieves reasoning performance comparable to larger models like GPT OSS-20B Medium.