Nanbeige4.1-3B-f32-GGUF
Nanbeige4.1-3B from Nanbeige is a compact 3B-parameter decoder-only Transformer language model, released in both Base and Thinking variants. It was pre-trained on 23T high-quality tokens using hybrid filtering and a Warmup-Stable-Decay (WSD) schedule, then post-trained in multiple stages: 30M+ SFT samples, chain-of-thought refinement, dual-level distillation from larger Nanbeige models, and reinforcement learning. The result is state-of-the-art small-model reasoning: it outperforms Qwen3-8B/30B/32B on AIME2024/2025 (SOTA averages), GPQA-Diamond, LiveCodeBench-Pro, IMO-Answer-Bench, BFCL-V4 tool use (53.8, +5.2 over Qwen3-30B-A3B), and Arena-Hard-V2/Multi-Challenge alignment (60.0/41.8), with a 64K context window extended via RoPE ABF. The model targets deep single-pass, multi-step reasoning on math, science, coding, and puzzles without agentic loops. Its fine-grained WSD scheduling (0.1T warmup plus 18.9T stable phases, shifting to top-quality data) yields strong token- and sequence-level performance, matching models roughly 10x its size on demanding tasks while remaining deployable on consumer hardware under an open license.
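The ABF (adjusted base frequency) context extension mentioned above works by raising the RoPE base so every rotary dimension rotates more slowly, letting positions spread further apart before angles wrap. A minimal sketch of the idea (the base values here are illustrative, not Nanbeige's actual hyperparameters):

```python
def rope_inv_freq(dim: int, base: float) -> list[float]:
    # Inverse frequencies for rotary position embeddings:
    # theta_i = base^(-2i/dim) for i in 0 .. dim/2 - 1
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

# ABF: raising the base lowers every non-trivial inverse frequency,
# i.e. slows the per-position rotation, stretching usable context.
short_ctx = rope_inv_freq(128, 10_000.0)     # original base (illustrative)
long_ctx = rope_inv_freq(128, 1_000_000.0)   # ABF-extended base (illustrative)

# Dimension 0 always has frequency 1.0; every other dimension rotates
# more slowly under the larger base.
assert all(l < s for l, s in zip(long_ctx[1:], short_ctx[1:]))
```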
Nanbeige4.1-3B [GGUF]
| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Nanbeige4.1-3B.BF16.gguf | BF16 | 7.87 GB | Download |
| Nanbeige4.1-3B.F16.gguf | F16 | 7.87 GB | Download |
| Nanbeige4.1-3B.F32.gguf | F32 | 15.7 GB | Download |
| Nanbeige4.1-3B.Q8_0.gguf | Q8_0 | 4.18 GB | Download |
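The file sizes above follow roughly from bits per weight: F16/BF16 store half as many bits as F32, and Q8_0 packs 32 int8 weights plus one fp16 scale per block, i.e. (32·8 + 16) / 32 = 8.5 bits per weight. A quick back-of-the-envelope check, assuming these standard GGUF block layouts:

```python
F32_SIZE_GB = 15.7  # from the table above

# Effective bits per weight for each quant type. Q8_0's 8.5 comes from
# one fp16 scale shared across each block of 32 int8 weights.
BITS_PER_WEIGHT = {"F32": 32.0, "F16": 16.0, "BF16": 16.0, "Q8_0": 8.5}

def estimated_size_gb(quant: str) -> float:
    """Scale the F32 file size by the quant's bits-per-weight ratio."""
    return F32_SIZE_GB * BITS_PER_WEIGHT[quant] / 32.0

print(round(estimated_size_gb("F16"), 2))   # 7.85, close to the 7.87 GB listed
print(round(estimated_size_gb("Q8_0"), 2))  # 4.17, close to the 4.18 GB listed
```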
Quants Usage
(Sorted by size, not necessarily by quality; IQ quants are often preferable to non-IQ quants of similar size.) For a comparison of the lower-quality quant types, see the handy graph by ikawrakow (lower is better).
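To run one of these files locally, any GGUF-compatible runtime works; a minimal sketch assuming a local llama.cpp build (the flags shown are standard `llama-cli` options, and the prompt is just an example):

```shell
# Run the Q8_0 quant with llama.cpp's CLI. -c sets the context window;
# 65536 matches the model's advertised 64K context, memory permitting.
llama-cli -m Nanbeige4.1-3B.Q8_0.gguf \
  -p "Solve: what is the sum of the first 100 positive integers?" \
  -n 512 -c 65536
```

Smaller quants (Q8_0) trade a little fidelity for roughly half the memory of F16, which matters for consumer-grade deployment.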
