YinZhiBin committed on
Commit 687e0d5 (verified) · 1 Parent(s): 807421a

Update README.md

Files changed (1)
  1. README.md +17 -1
README.md CHANGED
@@ -10,4 +10,20 @@ tags:
  - gpqa
  pipeline_tag: text-generation
  library_name: transformers
- ---
+ ---
+ # VibeThinker-1.5B
+ ## Introduction
+ VibeThinker-1.5B is a 1.5-billion-parameter dense language model whose core innovation is a post-training method called the "Spectrum-to-Signal Principle" (SSP). SSP systematically enhances the reasoning capabilities of small models through a two-stage design: the Supervised Fine-Tuning (SFT) stage uses diversity-exploring distillation to generate a broad solution space, and the Reinforcement Learning (RL) stage uses maximum entropy-guided policy optimization to reinforce the correct signal within that space.
+
+ With an extremely low training cost (a total of $7,800 USD), the model achieves results on several challenging mathematics and code-generation benchmarks that match or surpass those of models with hundreds of times more parameters, significantly challenging the industry consensus that "small models struggle to possess strong reasoning capabilities."
+
+ ## Key Performance Data
+ - Mathematical Reasoning: On the three major math benchmarks AIME24, AIME25, and HMMT25, its scores (80.9, 73.7, and 50.1, respectively) all surpass those of the original DeepSeek R1 model, which has over 400 times as many parameters (79.8, 70.0, and 41.7, respectively).
+ - Code Generation: It achieves a score of 51.0 on LiveCodeBench v6, slightly ahead of Magistral Medium (50.3) and far above its base model (0.0).
+
+ ## Highlights Summary
+ 💡 Breakthrough Scale Efficiency: With an extremely small parameter count of 1.5B, it achieves reasoning performance comparable to much larger models such as GPT OSS-20B Medium, demonstrating that sophisticated algorithmic design can largely compensate for the gap in parameter scale.
+
+ 🔁 Innovative Training Paradigm: The proposed SSP framework explicitly divides training into a "Spectrum" phase (the SFT stage, which pursues diversity of solutions) and a "Signal" phase (the RL stage, which reinforces the correct signal), providing a new paradigm for efficient reasoning training of small models.
+
+ 🌱 Low Cost and High Accessibility: A total training cost of under $8,000 USD dramatically lowers the barrier to entry for advanced AI research and applications, promoting the democratization of AI research.
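
The README above describes the RL stage as "maximum entropy-guided policy optimization" without spelling out an objective. As a generic, hedged illustration only (not the formulation from the VibeThinker report), a standard entropy-regularized policy objective adds an entropy bonus, weighted by an assumed coefficient α, to the expected verifier reward R(x, y):

$$
J(\theta) = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\!\left[ R(x, y) \right] + \alpha \, \mathbb{E}_{x \sim \mathcal{D}}\!\left[ \mathcal{H}\!\left( \pi_\theta(\cdot \mid x) \right) \right]
$$

Here $\pi_\theta$ is the policy being trained, $\mathcal{D}$ is the prompt distribution, and $\mathcal{H}$ is the policy entropy; a larger α preserves more of the solution "spectrum" while the reward term amplifies the correct "signal".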
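
Since the frontmatter declares `library_name: transformers` and `pipeline_tag: text-generation`, a minimal inference sketch would look like the following; the repository id and sampling settings are assumptions for illustration, not values taken from this commit.

```python
# Minimal text-generation sketch for a 1.5B reasoning model with transformers.
# The repo id below is an assumed placeholder; substitute the actual Hugging Face id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumption: replace with the real repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Chat-format a short math prompt, since the benchmarks above are math/code reasoning tasks.
messages = [{"role": "user", "content": "Find the sum of all positive divisors of 360."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models usually need a generous token budget for their intermediate steps;
# the sampling parameters here are illustrative, not the authors' recommended settings.
output_ids = model.generate(
    input_ids,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```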