Nanbeige4.1-3B-f32-GGUF

Nanbeige4.1-3B from Nanbeige is a compact 3B-parameter decoder-only Transformer language model, released in both Base and Thinking variants. It was pre-trained on 23T high-quality tokens using hybrid filtering and a Warmup-Stable-Decay (WSD) schedule, followed by multi-stage post-training: 30M+ SFT samples, thought refinement, dual-level distillation from larger Nanbeige models, and reinforcement learning. The result is state-of-the-art small-model reasoning: it outperforms Qwen3-8B and Qwen3-30B-A3B on AIME 2024/2025 (SOTA averages), GPQA-Diamond, LiveCodeBench-Pro, and IMO-Answer-Bench; scores 53.8 on BFCL-V4 tool use (+5.2 over Qwen3-30B-A3B); and reaches 60.0/41.8 on Arena-Hard-V2/Multi-Challenge alignment. Context length is extended to 64K via ABF-adjusted RoPE. Designed for deep single-pass, multi-step reasoning on math, science, coding, and puzzles without agentic loops, it uses fine-grained WSD scheduling (0.1T-token warmup + 18.9T-token stable phases, shifting to top-quality data) for strong token- and sequence-level performance, matching models 10x its size on demanding tasks while enabling consumer-grade local deployment under an open license.
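The WSD schedule mentioned above can be sketched as a piecewise learning-rate function over trained tokens, mirroring the 0.1T warmup / 18.9T stable split. The decay length, peak, and floor learning rates below are illustrative assumptions, not Nanbeige's published values:

```python
# Minimal sketch of a Warmup-Stable-Decay (WSD) learning-rate schedule,
# measured in trained tokens. Peak/floor LRs, decay length, and the
# linear ramp shapes are assumptions for illustration only.
def wsd_lr(tokens: float, warmup: float = 0.1e12, stable: float = 18.9e12,
           decay: float = 1.0e12, peak: float = 3e-4, floor: float = 3e-5) -> float:
    if tokens < warmup:                       # linear ramp up to peak LR
        return peak * tokens / warmup
    if tokens < warmup + stable:              # stable phase: hold peak LR
        return peak
    t = min(tokens - warmup - stable, decay)  # linear decay toward floor
    return peak + (floor - peak) * t / decay

print(wsd_lr(0.05e12))  # mid-warmup: half of peak
print(wsd_lr(10e12))    # stable phase: full peak LR
```

The key design point of WSD is that the long stable phase keeps the LR at its peak while the data mix shifts toward top-quality tokens, with decay reserved for the very end.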

Nanbeige4.1-3B [GGUF]

| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Nanbeige4.1-3B.BF16.gguf | BF16 | 7.87 GB | Download |
| Nanbeige4.1-3B.F16.gguf | F16 | 7.87 GB | Download |
| Nanbeige4.1-3B.F32.gguf | F32 | 15.7 GB | Download |
| Nanbeige4.1-3B.Q8_0.gguf | Q8_0 | 4.18 GB | Download |
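The file sizes above follow directly from parameter count times bits per weight. A quick back-of-envelope check (the ~3.94B parameter count is inferred from the reported file sizes, and Q8_0's ~8.5 effective bits per weight accounts for its per-block fp16 scales):

```python
# Rough GGUF file-size estimate from parameter count and bits per weight.
# Q8_0 stores 8-bit weights plus one fp16 scale per 32-weight block,
# giving roughly 8.5 effective bits per weight.
BITS_PER_WEIGHT = {"F32": 32.0, "F16": 16.0, "BF16": 16.0, "Q8_0": 8.5}

def estimate_size_gb(n_params: float, quant: str) -> float:
    """Approximate file size in decimal GB for a given quant type."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

n = 3.94e9  # assumed parameter count behind the "4B params" GGUF metadata
for q in ("F32", "BF16", "Q8_0"):
    print(f"{q}: ~{estimate_size_gb(n, q):.2f} GB")
```

These estimates land within rounding of the table: ~15.8 GB for F32, ~7.9 GB for BF16/F16, and ~4.2 GB for Q8_0.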

Quants Usage

(Sorted by size, not necessarily quality. IQ-quants are often preferable to similarly sized non-IQ quants.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

(graph image not included)
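When choosing among the files above, the main constraint is memory. A hypothetical helper that picks the largest quant fitting a given budget (file sizes are taken from the table above; the 1.5 GB overhead figure for KV cache and runtime state is an assumption, and real usage varies with context length):

```python
# Hypothetical helper: pick the largest quant from this repo that fits a
# memory budget. Sizes in GB come from the repo's file table; the default
# overhead for KV cache and runtime buffers is an assumed rough figure.
QUANTS = {
    "F32": 15.7,
    "BF16": 7.87,
    "F16": 7.87,
    "Q8_0": 4.18,
}

def pick_quant(budget_gb: float, overhead_gb: float = 1.5):
    """Return the largest quant whose file plus overhead fits the budget."""
    fitting = [(size, name) for name, size in QUANTS.items()
               if size + overhead_gb <= budget_gb]
    return max(fitting)[1] if fitting else None

print(pick_quant(8.0))   # Q8_0 is the only quant that fits 8 GB with headroom
print(pick_quant(16.0))  # a 16-bit quant fits; F32 (15.7 + 1.5 GB) does not
```

On a typical 8 GB consumer GPU, Q8_0 is the practical choice; the 16- and 32-bit files are mainly useful as sources for further quantization.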

Downloads last month: 2,743
Format: GGUF
Model size: 4B params
Architecture: llama

Model tree for prithivMLmods/Nanbeige4.1-3B-f32-GGUF

Quantized (41), including this model

1 Space uses prithivMLmods/Nanbeige4.1-3B-f32-GGUF.