VladBoyko committed on
Commit 582b310 · verified · 1 Parent(s): efd8236

Update README.md

Files changed (1): README.md (+85 -97)
README.md CHANGED
@@ -1,7 +1,7 @@
  ---
- title: VibeThinker 1.5B Advanced
- emoji: 🧠⚡
- colorFrom: blue
  colorTo: purple
  sdk: gradio
  sdk_version: 5.49.1
@@ -10,121 +10,109 @@ pinned: false
  license: mit
  ---

- # VibeThinker-1.5B: Advanced Interface with vLLM

- A high-performance reasoning model interface featuring:

- ## ✨ Key Features

- - ⚡ **10-20x Faster Inference** with vLLM optimization
- - 🤔 **Collapsible Thinking Sections** - Explore the model's reasoning process
- - 💻 **Interactive Code Blocks** - Copy or download code with one click
- - 📝 **Structured Output Parsing** - Clean separation of thoughts, text, and code
- - 🎨 **Beautiful UI** - Modern, responsive design with syntax highlighting

- ## 🚀 Performance Highlights

- | Model | Parameters | AIME24 | AIME25 | Training Cost |
- |-------|------------|--------|--------|---------------|
- | VibeThinker-1.5B | **1.5B** | **80.3** | **74.4** | **$7,800** |
- | DeepSeek R1 | 671B | 79.8 | 70.0 | $294,000+ |

- **400× smaller, yet outperforms on math benchmarks!**

- ## 📖 Usage

- 1. Enter your math problem or coding challenge (English works best)
- 2. Adjust temperature and max tokens if needed
- 3. Click "Generate Solution"
- 4. Explore the thinking process, read the response, and interact with code blocks

- ## 🔧 Technical Details

- - **Backend**: vLLM for optimized inference
- - **GPU**: Runs efficiently on Nvidia T4 (16 GB VRAM)
- - **Context Length**: Supports up to 40,960 tokens
- - **Output Parsing**: Automatic detection of thinking, text, and code sections

- ## 📚 Resources

- - [GitHub Repository](https://github.com/WeiboAI/VibeThinker)
- - [Technical Report](https://huggingface.co/papers/2511.06221)
- - [Model Card](https://huggingface.co/WeiboAI/VibeThinker-1.5B)

- ## 🙏 Citation

- ```bibtex
- @misc{xu2025tinymodelbiglogic,
-   title={Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B},
-   author={Sen Xu and Yi Zhou and Wei Wang and Jixin Min and Zhibin Yin and Yingwei Dai and Shixi Liu and Lianyu Pang and Yirong Chen and Junlin Zhang},
-   year={2025},
-   eprint={2511.06221},
-   archivePrefix={arXiv},
-   primaryClass={cs.AI},
- }
- ```

- ---

- ## 🎯 **Key Improvements**
-
- ### **1. Performance (vLLM)**
- - **10-20x faster** inference compared to standard transformers
- - Better memory management with `gpu_memory_utilization=0.9`
- - Optimized for batch processing and long contexts
-
- ### **2. Output Parsing**
- The `parse_model_output()` function:
- - ✅ Extracts `<think>` tags for reasoning sections
- - ✅ Identifies code blocks with ` ``` ` markers
- - ✅ Separates regular text content
- - ✅ Handles nested and multiple sections
-
- ### **3. UI Enhancements**
-
- #### **Thinking Sections** 🤔
- - Collapsed by default (orange/yellow theme)
- - Click to expand and see reasoning
- - Monospace font for better readability
-
- #### **Code Blocks** 💻
- - Open by default (blue theme)
- - **Copy button** - One-click clipboard copy
- - **Download button** - Save as `.py`, `.js`, `.html`, etc.
- - Language-aware file extensions
- - Syntax highlighting ready (add Prism.js if needed)
-
- #### **Text Sections** 📝
- - Clean, readable font
- - Proper line spacing
- - Subtle borders and shadows
-
- ### **4. Production-Ready Features**
- - Error handling with user-friendly messages
- - Queue management for multiple users
- - Responsive design
- - Accessible controls
- - Example problems pre-loaded

- ---

- ## 📊 **Expected Performance**

- | Metric | Before (Transformers) | After (vLLM) | Improvement |
- |--------|----------------------|--------------|-------------|
- | **First Token Latency** | ~5-8s | ~0.5-1s | **8-10x faster** |
- | **Generation Speed** | ~10-15 tokens/s | ~100-150 tokens/s | **10x faster** |
- | **Total Time (8K tokens)** | ~400-600s | ~40-80s | **10x faster** |
- | **Memory Usage** | ~8-10GB | ~6-8GB | **More efficient** |

- A generation that takes ~400s with plain transformers on a T4 should drop to **40-80 seconds** with vLLM! 🎉

- ---

- ## 🚀 **Quick Deployment**

- 1. Upload the three files to your HuggingFace Space
- 2. Select **Nvidia T4 - small** hardware ($0.40/hour)
- 3. Wait for the build (~5-10 minutes for vLLM compilation)
- 4. Enjoy blazing-fast inference! ⚡

- The first build takes a bit longer because vLLM must compile, but runtime performance is dramatically better!
  ---
+ title: VibeThinker-1.5B Competitive Coding Assistant
+ emoji: 🧠
+ colorFrom: indigo
  colorTo: purple
  sdk: gradio
  sdk_version: 5.49.1
  license: mit
  ---

+ # 🧠 VibeThinker-1.5B Competitive Coding Assistant

+ An interactive demo of **VibeThinker-1.5B** optimized for competitive programming challenges.

+ ## ⚡ Performance Highlights

+ - **AIME24**: 80.3 (surpasses DeepSeek R1's 79.8)
+ - **AIME25**: 74.4 (vs DeepSeek R1's 70.0)
+ - **LiveCodeBench V6**: 51.1 (competitive coding)
+ - **Training Cost**: Only $7,800 USD
+ - **Parameters**: 1.5B (400× smaller than DeepSeek R1)

+ ## 🎯 What It's Best At

+ ✅ **Competitive Programming**: LeetCode, Codeforces, AtCoder-style algorithm problems
+ ✅ **Python Coding Challenges**: Problems with clear input/output specifications
+ ✅ **Mathematical Reasoning**: Complex proofs and formal reasoning tasks
+ ✅ **Algorithm Design**: Dynamic programming, graph algorithms, optimization problems

+ ## ⚠️ Important Limitations

+ This model is **specialized for competitive programming**, not general software development:

+ ❌ **Not suitable for**: Building applications, debugging real codebases, using specific libraries
+ ❌ **Limited knowledge**: Low encyclopedic knowledge, Python-focused training
+ ❌ **Overthinking tendency**: May generate verbose reasoning for simple tasks
+ ❌ **Narrow scope**: Optimized for benchmark-style problems, not production code

+ *See [community feedback analysis](https://www.reddit.com/r/LocalLLaMA/comments/1ou1emx/) for detailed real-world testing insights.*

+ ## 🚀 Features

+ - **🧠 Intelligent Parsing**: Automatic separation of reasoning and solution (see the sketch after this list)
+ - **📊 Token Tracking**: Real-time stats on generation time and token usage
+ - **💻 Clean Code Display**: Syntax-highlighted, copyable/downloadable code blocks
+ - **📱 Responsive Design**: Modern UI with collapsible reasoning sections
+ - **🎨 High Contrast**: Readable output with dark code blocks on a white background
+ - **🔄 Loop Detection**: Automatically detects and truncates repetitive output
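
+ As a rough illustration of the parsing step: the model wraps its reasoning in `<think>` tags, so reasoning and solution can be separated with a regex. This is a minimal sketch with an assumed helper name (`split_reasoning`), not the Space's exact code:

+ ```python
+ import re
+ 
+ def split_reasoning(output: str) -> tuple[str, str]:
+     """Split raw model output into (reasoning, solution).
+ 
+     Assumes reasoning is wrapped in <think>...</think>; everything
+     outside the tags is treated as the final solution text.
+     """
+     reasoning = "\n".join(re.findall(r"<think>(.*?)</think>", output, re.DOTALL))
+     solution = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL).strip()
+     return reasoning.strip(), solution
+ ```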
+ ## 🛠️ Technical Details

+ ### Model Information
+ - **Base Model**: Qwen2.5-Math-1.5B
+ - **Training Method**: Spectrum-to-Signal Principle (SSP)
+   - Supervised Fine-Tuning (SFT) for solution diversity
+   - Reinforcement Learning (RL) for correct reasoning paths
+ - **Inference Engine**: Standard `transformers` library (PyTorch)
+ - **Token Efficiency**: Configurable thinking depth via prompt hints

+ ### Hardware Requirements
+ - **Recommended**: Nvidia T4 - small (16 GB VRAM)
+ - **Memory Usage**: ~3-4 GB VRAM (1.5B params × 2 bytes in float16 ≈ 3 GB, plus activations)
+ - **Cost**: $0.40/hour on HuggingFace Spaces

+ ### Implementation
+ ```python
+ # Plain transformers loading (sketch of this Space's setup)
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ 
+ tokenizer = AutoTokenizer.from_pretrained("WeiboAI/VibeThinker-1.5B")
+ model = AutoModelForCausalLM.from_pretrained(
+     "WeiboAI/VibeThinker-1.5B",
+     torch_dtype=torch.float16,  # float16 for efficiency
+     device_map="auto",          # automatic GPU placement
+ )  # generate() uses repetition_penalty=1.1; loops are detected and truncated
+ ```
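
+ The loop detection mentioned above isn't specified further; one plausible heuristic is a repeated-trailing-chunk check. A minimal sketch, with an assumed helper name `truncate_loops` (not the Space's actual code):

+ ```python
+ def truncate_loops(text: str, chunk: int = 80, repeats: int = 3) -> str:
+     """If the last `chunk` characters repeat `repeats` times back-to-back
+     at the end of the text, keep only one copy (assumed heuristic)."""
+     tail = text[-chunk:]
+     if len(text) >= chunk * repeats and text.endswith(tail * repeats):
+         while text.endswith(tail + tail):
+             text = text[:-len(tail)]  # drop one redundant copy per pass
+     return text
+ ```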

+ ## 📖 Usage Tips

+ ### For Best Results:
+ 1. **Frame problems competitively**: Clear input/output, edge cases, constraints
+ 2. **Adjust thinking tokens** (see the generation sketch after this list):
+    - 1024-2048 for quick, simple problems
+    - 3072-4096 for standard algorithm challenges
+    - 6144-8192 for complex multi-step reasoning
+ 3. **Use Python**: The model was trained primarily on Python code
+ 4. **Specify format**: Request a specific output format (function, class, test cases)
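
+ Those token budgets map onto `max_new_tokens` in a standard `transformers` generation call. A sketch reusing the `tokenizer` and `model` from the Implementation snippet (chat template and sampling settings are assumptions; tune per problem):

+ ```python
+ messages = [{"role": "user", "content": "Longest increasing subsequence in O(n log n)?"}]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ 
+ output = model.generate(
+     inputs,
+     max_new_tokens=4096,     # 3072-4096 suits standard algorithm challenges
+     do_sample=True,
+     temperature=0.6,         # assumed default
+     repetition_penalty=1.1,  # reduces repetitive loops
+ )
+ print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```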
+ ### Example Prompts:
+ ```
+ ✅ Good: "Write a function to find the longest increasing subsequence.
+ Include time/space complexity analysis and test with [10,9,2,5,3,7,101,18]"

+ ✅ Good: "Implement Dijkstra's algorithm with a min-heap. Handle disconnected graphs."

+ ❌ Poor: "Debug my React app" (not its purpose)
+ ❌ Poor: "How do I use pandas?" (limited library knowledge)
+ ```

+ ## 🔗 Resources

+ - **Model**: [WeiboAI/VibeThinker-1.5B](https://huggingface.co/WeiboAI/VibeThinker-1.5B)
+ - **Paper**: [arXiv:2511.06221](https://arxiv.org/abs/2511.06221)
+ - **GitHub**: [WeiboAI/VibeThinker](https://github.com/WeiboAI/VibeThinker)
+ - **License**: MIT

+ ## 🙏 Credits

+ Developed by **WeiboAI**. This Space demonstrates the model with a clean interface and enhanced user experience.
+
+ ## 📝 Citation
+
+ ```bibtex
+ @misc{xu2025tinymodelbiglogic,
+   title={Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B},
+   author={Sen Xu and Yi Zhou and Wei Wang and Jixin Min and Zhibin Yin and Yingwei Dai and Shixi Liu and Lianyu Pang and Yirong Chen and Junlin Zhang},
+   year={2025},
+   eprint={2511.06221},
+   archivePrefix={arXiv},
+   primaryClass={cs.AI},
+ }
+ ```