INT4 vs FP4: The Future of 4-Bit Quantization
How Kimi Stole the Show
4-bit quantization has been simmering in research labs for years. The promise was always clear: cut model size by 4× relative to BF16/FP16, preserve most of the quality, and democratize deployment of large models.
When Nvidia unveiled Blackwell in early 2024, NVFP4 was a centerpiece feature: a proper 4-bit floating-point format designed specifically for neural networks. Unlike INT4's uniform integer grid, FP4 brings the dynamic range advantages of floating-point to 4-bit precision. More precision near zero where weights cluster, wider range for outliers, better numerical properties overall.
FP4 is objectively superior to INT4. The format gives you:
- Dynamic range: Sign bit + 2-bit exponent + 1-bit mantissa (E2M1)
- Better representation near zero: Where most weights live
- Proper handling of outliers: Without clipping or overflow
- Hardware acceleration: Designed for Tensor Cores on Blackwell
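To make the format concrete, here is a minimal sketch that decodes all 16 E2M1 codes into their real values, assuming the standard E2M1 encoding (exponent bias of 1, subnormals when the exponent field is 0). It illustrates the element format only, not Nvidia's NVFP4 kernels or block-scaling machinery.

```python
# Minimal sketch: decode a 4-bit E2M1 code (the FP4 element format) to a float.
# Assumes the standard layout: 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit.

def decode_e2m1(code: int) -> float:
    """Map a 4-bit code (0..15) to its E2M1 value."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11          # 2-bit exponent field
    man = code & 0b1                  # 1-bit mantissa
    if exp == 0:                      # subnormal: no implicit leading 1
        mag = 0.5 * man               # gives 0 and 0.5
    else:                             # normal: implicit leading 1, bias of 1
        mag = (1.0 + 0.5 * man) * 2.0 ** (exp - 1)
    return sign * mag

print(sorted({decode_e2m1(c) for c in range(16)}))
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Half of the representable magnitudes sit at or below 1.5, which is exactly the "more precision near zero" property.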
Then Kimi dropped the K2 Thinking model, in INT4.
Why INT4
The reasons are simple.
Kimi doesn't have Blackwell GPUs. The same is true for other Chinese model makers. Due to export controls, their training clusters likely consist mainly of Ampere (A800) and Hopper (H800) parts. Ampere has basic INT4 support through its third-generation Tensor Cores. On Hopper, INT4 is supported only through software paths built around quantization methods like AWQ (Activation-aware Weight Quantization), not native hardware acceleration.
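For a sense of what that software path looks like, here is a minimal sketch of group-wise symmetric INT4 weight quantization in PyTorch. The group size and per-group scaling are illustrative assumptions; this is not Kimi's recipe or AWQ's actual implementation (AWQ additionally rescales weights based on activation statistics before quantizing).

```python
# Minimal sketch: group-wise symmetric INT4 weight quantization (illustrative only).
import torch

def quantize_int4(w: torch.Tensor, group_size: int = 128):
    """Quantize a [rows, cols] weight matrix to INT4 codes with per-group scales."""
    rows, cols = w.shape
    g = w.reshape(rows, cols // group_size, group_size)
    scale = g.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0  # map max |w| to level 7
    q = torch.clamp(torch.round(g / scale), -8, 7).to(torch.int8)     # INT4 levels in [-8, 7]
    return q.reshape(rows, cols), scale.squeeze(-1)

def dequantize_int4(q: torch.Tensor, scale: torch.Tensor, group_size: int = 128):
    rows, cols = q.shape
    g = q.reshape(rows, cols // group_size, group_size).float()
    return (g * scale.unsqueeze(-1)).reshape(rows, cols)

w = torch.randn(16, 256)
q, s = quantize_int4(w)
print((w - dequantize_int4(q, s)).abs().max())  # worst-case quantization error
```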
Their customers don't have Blackwell GPUs either.
The Ampere series is still mainstream in China and remains popular in the US too. As such, it makes total sense to optimize your model around INT4, which is where the ecosystem is centered.
Blackwell cannot natively run inference on Kimi's INT4 model. This is the ironic part, because Nvidia went all-in on NVFP4. The architecture simply lacks the Tensor Core instructions for native INT4. You can run INT4 models through software emulation, but you lose the performance gains that made Blackwell compelling in the first place.
The Nuance of Conversion
Here is another idea for Blackwell: take the Kimi model and convert it from INT4 to FP4. This is effectively PTQ (post-training quantization), which is lossy.
What Gets Lost
QAT Training Path (INT4):
───────────────────────────────────────────────────
BF16 activations → [Quantize] → INT4 → Forward pass
                                  │
                                  ├─> Model learns INT4's specific grid spacing
                                  ├─> Weights cluster at INT4's quantization levels
                                  └─> Error compensation tuned for INT4's properties

All intermediate training trajectories: shaped by INT4

PTQ Conversion (INT4 → FP4):
───────────────────────────────────────────────────
Trained INT4 weights → [Direct format conversion] → FP4
                                  │
Lost: the BF16→INT4 training trajectories
Lost: the quantization-aware adaptations
Lost: the error compensation learned during training

Result: Suboptimal FP4 model
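To make the QAT path concrete, here is a minimal sketch of the standard fake-quantize trick with a straight-through estimator: the forward pass snaps weights to the INT4 grid, while gradients flow as if no rounding happened, so the model adapts to the grid during training. The per-tensor scaling is an illustrative assumption, not Kimi's actual training recipe.

```python
# Minimal sketch: fake INT4 quantization with a straight-through estimator (STE).
import torch

def fake_quantize_int4(w: torch.Tensor) -> torch.Tensor:
    scale = w.abs().amax() / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7) * scale  # snap to the INT4 grid
    return w + (q - w).detach()   # forward uses q; backward sees the identity

w = torch.randn(8, 8, requires_grad=True)
loss = fake_quantize_int4(w).sum()
loss.backward()
print(torch.allclose(w.grad, torch.ones_like(w)))  # True: gradients pass straight through
```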
The Distribution Mismatch
INT4 and FP4 have fundamentally different value distributions:
INT4: Uniform spacing
─────────────────────────────────────────
 -8 -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7
  •  •  •  •  •  •  •  •  •  •  •  •  •  •  •  •
 └──────────────────────┬───────────────────────┘
              Equal spacing everywhere
FP4 (E2M1): Exponential spacing (more precision near zero)
─────────────────────────────────────────
                     Near zero: dense spacing
                                       ↓
   -6   -4   -3   -2 -1.5   -1 -0.5    0  0.5    1  1.5    2    3    4    6
    •    •    •    •    •    •    •    •    •    •    •    •    •    •    •
  Wider steps at the edges (2), medium in the middle (1), fine near zero (0.5)
When you PTQ from INT4 to FP4, you are trying to map a uniform grid onto an exponential one. Values that sat exactly on INT4's quantization levels will land between FP4's levels. The same holds in reverse when converting FP4 to INT4.
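A small sketch makes the mismatch visible: snap each dequantized INT4 level onto the nearest FP4 (E2M1) value and look at the residual error. The scale choice below is an illustrative assumption; a real converter would also re-fit per-block scale factors.

```python
# Minimal sketch: the INT4 grid does not line up with the FP4 (E2M1) grid.
FP4_MAGS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_GRID = sorted({s * v for v in FP4_MAGS for s in (-1.0, 1.0)})

def to_nearest_fp4(x: float) -> float:
    return min(FP4_GRID, key=lambda v: abs(v - x))

scale = 6.0 / 7.0  # illustrative: map INT4's max level (7) onto FP4's max magnitude (6)
for level in range(-8, 8):
    w = level * scale               # dequantized INT4 weight
    fp4 = to_nearest_fp4(w)
    print(f"INT4 level {level:+d} -> {w:+.3f} -> FP4 {fp4:+.2f}  (error {abs(fp4 - w):.3f})")
```

Most levels pick up a rounding error, and the largest errors appear toward the edges of the range, where FP4's spacing is coarsest.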
What If Kimi Were Trained on Blackwell?
Here is the sad part: the resulting model would likely have been even stronger than what we have today. As explained above, FP4 is the newer and better format, designed for exactly this purpose.
How the Future Will Unfold
1. FP4 Will Eventually Dominate
This is inevitable. FP4 is technically superior: better numerical properties, wider dynamic range, designed by people who understand neural network weight distributions. Once Blackwell reaches critical mass globally, FP4-native models will dominate too.
The timeline? Hard to say. We only know this much for sure: a great many capable training teams cannot access the hardware today.
2. The Ampere Renaissance
Here's the interesting twist: Ampere and Hopper stay relevant longer than Nvidia's roadmap implied.
Why? Chinese models trained for INT4 create a large ecosystem of inference workloads optimized for pre-Blackwell hardware. If you are a neocloud or hyperscaler sitting on a fleet of H100s, Kimi just made them valuable for at least another two years.
Big tech's GPU depreciation narrative improves. CFOs are happy. The urgency to upgrade dwindles. INT4-native Chinese models provide a compelling reason to keep older hardware in production rather than writing it off.
3. Challenges for Blackwell
On the other hand, if the hottest open-source models can't be served on Blackwell without a performance penalty, the drive to upgrade hardware weakens. Pressure is also mounting on the other side of the fence: for the training teams that do have access to Blackwell, are you up to the challenge of training a SOTA model in FP4?
4. The Next Chapter
FP4 will eventually replace INT4. Innovation hasn't slowed down; it always finds a new path. The tension between Chinese models and American hardware fragments an ecosystem that could have been simpler, and it creates the interesting dynamics I touched on in this post.
This pattern will continue. A few months or quarters down the road, another story will emerge.
References:
- Nvidia Hopper Architecture In-Depth
- Nvidia Tensor Core Evolution: From Volta to Blackwell
- Tom's Hardware: Nvidia Shares Blackwell Ultra's Secrets
- Kimi K2 Technical Report
- NVFP4 Quantization Guide - DGX Spark
- US-China GPU Export Controls Timeline
- SemiAnalysis: 2025 AI Diffusion Export Controls
- QAT: The Art of Growing a Bonsai Model