YinZhiBin committed on
Commit 1bfdd60 · verified · 1 Parent(s): 90804b8

Update README.md

Files changed (1): README.md (+5 -2)
README.md CHANGED
@@ -37,9 +37,12 @@ VibeThinker-1.5B is a 1.5-billion parameter dense language model. With a total t
  ![image](https://cdn-uploads.huggingface.co/production/uploads/64d1faaa1ed6649d70d1fa2f/9d2--eHAW3aOlvdTe48cR.png)
  VibeThinker-1.5B's core innovation lies in the "Spectrum-to-Signal Principle" (SSP) training framework: it first explores solution diversity during the Supervised Fine-Tuning (SFT) stage, and then optimizes its policy to reinforce correct signals in the Reinforcement Learning (RL) stage. By systematically integrating these two phases, our approach establishes diversity as the central technical design principle, enabling VibeThinker-1.5B to achieve robust performance that surpasses conventional training paradigms.

- ##
+ ## Usage Guidelines
  To facilitate quick verification by the community, we recommend the following parameter settings: temperature: 0.6 or 1.0, max token length: 40960, top_p: 0.95, top_k: -1.
- **We recommend using this model for competitive-style math and coding problems.** A more detailed evaluation scheme we have prepared can be found on [GitHub](https://github.com/WeiboAI/VibeThinker/tree/main/eval).
+
+ **We recommend using this model for competitive-style math and coding problems.**
+
+ A more detailed evaluation scheme we have prepared can be found on [GitHub](https://github.com/WeiboAI/VibeThinker/tree/main/eval).

  ## License
  The model repository is licensed under the MIT License.
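
For quick verification, the sketch below shows one way to apply the recommended sampling settings from the updated README (temperature 0.6, top_p 0.95, top_k -1, max token length 40960). It uses vLLM and assumes the model is available on the Hub as `WeiboAI/VibeThinker-1.5B`; both the library choice and the repository id are assumptions, not part of this commit.

```python
# Minimal sketch: generate with the README's recommended sampling settings via vLLM.
# The repo id "WeiboAI/VibeThinker-1.5B" is an assumption; adjust to the actual repository.
from vllm import LLM, SamplingParams

# Recommended settings: temperature 0.6 (or 1.0), top_p 0.95, top_k disabled (-1),
# and a generation budget of up to 40960 tokens.
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    top_k=-1,
    max_tokens=40960,
)

llm = LLM(model="WeiboAI/VibeThinker-1.5B")  # assumed Hub repo id

# Competitive-style math prompt, matching the model's recommended use case.
outputs = llm.generate(
    ["Find the smallest positive integer that is divisible by both 6 and 15."],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```

Temperature 1.0 can be substituted for 0.6 as the README allows; the large max token budget leaves room for the long reasoning traces this kind of model tends to produce.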