Nanbeige4.1-3B-f32-GGUF

Nanbeige4.1-3B from Nanbeige is a compact 3B-parameter decoder-only Transformer language model, released in both Base and Thinking variants. It was pre-trained on 23T high-quality tokens using hybrid filtering and a Warmup-Stable-Decay (WSD) schedule, followed by multi-stage post-training: 30M+ SFT samples, thought refinement, dual-level distillation from larger Nanbeige models, and reinforcement learning. The result is state-of-the-art small-model reasoning: it outperforms Qwen3-8B and Qwen3-30B-A3B on AIME 2024/2025 (SOTA averages), GPQA-Diamond, LiveCodeBench-Pro, and IMO-Answer-Bench; scores 53.8 on BFCL-V4 tool use (+5.2 over Qwen3-30B-A3B); and reaches 60.0/41.8 on Arena-Hard-V2/Multi-Challenge alignment. Context length is extended to 64K via ABF-adjusted RoPE. Designed for deep single-pass, multi-step reasoning on math, science, coding, and puzzles without agentic loops, it uses fine-grained WSD scheduling (0.1T-token warmup + 18.9T-token stable phases, shifting to top-quality data) for strong token- and sequence-level performance, matching models 10x its size on demanding tasks while enabling consumer-grade local deployment under an open license.
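The WSD schedule mentioned above can be sketched as a piecewise learning-rate function over trained tokens, mirroring the 0.1T warmup / 18.9T stable split. The decay length, peak, and floor learning rates below are illustrative assumptions, not Nanbeige's published values:

```python
# Minimal sketch of a Warmup-Stable-Decay (WSD) learning-rate schedule,
# measured in trained tokens. Peak/floor LRs, decay length, and the
# linear ramp shapes are assumptions for illustration only.
def wsd_lr(tokens: float, warmup: float = 0.1e12, stable: float = 18.9e12,
           decay: float = 1.0e12, peak: float = 3e-4, floor: float = 3e-5) -> float:
    if tokens < warmup:                       # linear ramp up to peak LR
        return peak * tokens / warmup
    if tokens < warmup + stable:              # stable phase: hold peak LR
        return peak
    t = min(tokens - warmup - stable, decay)  # linear decay toward floor
    return peak + (floor - peak) * t / decay

print(wsd_lr(0.05e12))  # mid-warmup: half of peak
print(wsd_lr(10e12))    # stable phase: full peak LR
```

The key design point of WSD is that the long stable phase keeps the LR at its peak while the data mix shifts toward top-quality tokens, with decay reserved for the very end.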

Nanbeige4.1-3B [GGUF]

| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Nanbeige4.1-3B.BF16.gguf | BF16 | 7.87 GB | Download |
| Nanbeige4.1-3B.F16.gguf | F16 | 7.87 GB | Download |
| Nanbeige4.1-3B.F32.gguf | F32 | 15.7 GB | Download |
| Nanbeige4.1-3B.Q8_0.gguf | Q8_0 | 4.18 GB | Download |
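The file sizes above follow directly from parameter count times bits per weight. A quick back-of-envelope check (the ~3.94B parameter count is inferred from the reported file sizes, and Q8_0's ~8.5 effective bits per weight accounts for its per-block fp16 scales):

```python
# Rough GGUF file-size estimate from parameter count and bits per weight.
# Q8_0 stores 8-bit weights plus one fp16 scale per 32-weight block,
# giving roughly 8.5 effective bits per weight.
BITS_PER_WEIGHT = {"F32": 32.0, "F16": 16.0, "BF16": 16.0, "Q8_0": 8.5}

def estimate_size_gb(n_params: float, quant: str) -> float:
    """Approximate file size in decimal GB for a given quant type."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

n = 3.94e9  # assumed parameter count behind the "4B params" GGUF metadata
for q in ("F32", "BF16", "Q8_0"):
    print(f"{q}: ~{estimate_size_gb(n, q):.2f} GB")
```

These estimates land within rounding of the table: ~15.8 GB for F32, ~7.9 GB for BF16/F16, and ~4.2 GB for Q8_0.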

Quants Usage

(Sorted by size, not necessarily quality. IQ-quants are often preferable to similarly sized non-IQ quants.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

(graph image not included)
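When choosing among the files above, the main constraint is memory. A hypothetical helper that picks the largest quant fitting a given budget (file sizes are taken from the table above; the 1.5 GB overhead figure for KV cache and runtime state is an assumption, and real usage varies with context length):

```python
# Hypothetical helper: pick the largest quant from this repo that fits a
# memory budget. Sizes in GB come from the repo's file table; the default
# overhead for KV cache and runtime buffers is an assumed rough figure.
QUANTS = {
    "F32": 15.7,
    "BF16": 7.87,
    "F16": 7.87,
    "Q8_0": 4.18,
}

def pick_quant(budget_gb: float, overhead_gb: float = 1.5):
    """Return the largest quant whose file plus overhead fits the budget."""
    fitting = [(size, name) for name, size in QUANTS.items()
               if size + overhead_gb <= budget_gb]
    return max(fitting)[1] if fitting else None

print(pick_quant(8.0))   # Q8_0 is the only quant that fits 8 GB with headroom
print(pick_quant(16.0))  # a 16-bit quant fits; F32 (15.7 + 1.5 GB) does not
```

On a typical 8 GB consumer GPU, Q8_0 is the practical choice; the 16- and 32-bit files are mainly useful as sources for further quantization.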

Downloads last month: 2,743
Format: GGUF
Model size: 4B params
Architecture: llama

Model tree for prithivMLmods/Nanbeige4.1-3B-f32-GGUF

Quantized (41), including this model

1 Space uses prithivMLmods/Nanbeige4.1-3B-f32-GGUF.