YinZhiBin committed on
Commit 687e0d5 (verified) · 1 Parent(s): 807421a

Update README.md

Files changed (1)
  1. README.md +17 -1
README.md CHANGED
@@ -10,4 +10,20 @@ tags:
  - gpqa
  pipeline_tag: text-generation
  library_name: transformers
- ---
+ ---
+ # VibeThinker-1.5B
+ ## Introduction
+ VibeThinker-1.5B is a 1.5-billion-parameter dense language model whose core innovation is a post-training method called the "Spectrum-to-Signal Principle" (SSP). SSP systematically enhances the reasoning capabilities of small models through a two-stage design: the Supervised Fine-Tuning (SFT) stage uses diversity-exploring distillation to generate a broad solution space, and the Reinforcement Learning (RL) stage uses maximum entropy-guided policy optimization to reinforce the correct signal within that space.
+
+ With an extremely low training cost (a total of $7,800 USD), the model achieves results on several challenging mathematics and code-generation benchmarks that match or surpass those of models with hundreds of times more parameters, significantly challenging the industry consensus that "small models struggle to possess strong reasoning capabilities."
+
+ ## Key Performance Data
+ - Mathematical Reasoning: On the three major math benchmarks AIME24, AIME25, and HMMT25, its scores (80.9, 73.7, and 50.1, respectively) all surpass those of the original DeepSeek R1 model, which has over 400 times as many parameters (79.8, 70.0, and 41.7, respectively).
+ - Code Generation: It achieves a score of 51.0 on LiveCodeBench v6, slightly ahead of Magistral Medium (50.3) and far above its base model (0.0).
+
+ ## Highlights Summary
+ 💡 Breakthrough Scale Efficiency: With an extremely small parameter count of 1.5B, it achieves reasoning performance comparable to much larger models such as GPT OSS-20B Medium, demonstrating that sophisticated algorithmic design can largely compensate for the gap in parameter scale.
+
+ 🔁 Innovative Training Paradigm: The proposed SSP framework explicitly divides training into a "Spectrum" phase (the SFT stage, which pursues diversity of solutions) and a "Signal" phase (the RL stage, which reinforces the correct signal), providing a new paradigm for efficient reasoning training of small models.
+
+ 🌱 Low Cost and High Accessibility: A total training cost of under $8,000 USD dramatically lowers the barrier to entry for advanced AI research and applications, promoting the democratization of AI research.
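
The README above describes the RL stage as "maximum entropy-guided policy optimization" without spelling out an objective. As a generic, hedged illustration only (not the formulation from the VibeThinker report), a standard entropy-regularized policy objective adds an entropy bonus, weighted by an assumed coefficient α, to the expected verifier reward R(x, y):

$$
J(\theta) = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\!\left[ R(x, y) \right] + \alpha \, \mathbb{E}_{x \sim \mathcal{D}}\!\left[ \mathcal{H}\!\left( \pi_\theta(\cdot \mid x) \right) \right]
$$

Here $\pi_\theta$ is the policy being trained, $\mathcal{D}$ is the prompt distribution, and $\mathcal{H}$ is the policy entropy; a larger α preserves more of the solution "spectrum" while the reward term amplifies the correct "signal".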
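
Since the frontmatter declares `library_name: transformers` and `pipeline_tag: text-generation`, a minimal inference sketch would look like the following; the repository id and sampling settings are assumptions for illustration, not values taken from this commit.

```python
# Minimal text-generation sketch for a 1.5B reasoning model with transformers.
# The repo id below is an assumed placeholder; substitute the actual Hugging Face id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumption: replace with the real repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Chat-format a short math prompt, since the benchmarks above are math/code reasoning tasks.
messages = [{"role": "user", "content": "Find the sum of all positive divisors of 360."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models usually need a generous token budget for their intermediate steps;
# the sampling parameters here are illustrative, not the authors' recommended settings.
output_ids = model.generate(
    input_ids,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```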