FP8 Quantized Qwen3.5-397B-A17B

This is a preliminary version (and subject to change) of the FP8-quantized Qwen/Qwen3.5-397B-A17B model. Both its weights and activations are quantized to FP8 with vllm-project/llm-compressor.
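For reference, below is a minimal sketch of the kind of llm-compressor recipe that produces an FP8 weight-and-activation (dynamic) checkpoint; the exact recipe used for this model, including which layers were ignored, is an assumption:

```python
# Hypothetical sketch of an FP8-dynamic quantization flow with llm-compressor;
# the actual recipe used for this checkpoint is not published here.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3.5-397B-A17B"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC: static per-channel FP8 weights plus dynamic per-token FP8
# activations, so no calibration dataset is required.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],  # assumption: lm_head is commonly left unquantized
)

oneshot(model=model, recipe=recipe)

SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```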

It is compatible with and tested against vllm==0.16.0rc2.dev250+g28bffe946. Deploy it with: `vllm serve RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic`.
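Once the server is up, it exposes vLLM's standard OpenAI-compatible API; a usage sketch, assuming the default endpoint at http://localhost:8000/v1 (host, port, and prompt are placeholders):

```python
# Usage sketch: query the server started by
# `vllm serve RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic",
    messages=[{"role": "user", "content": "Give a one-paragraph overview of FP8 quantization."}],
)
print(response.choices[0].message.content)
```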

Preliminary Evaluations

  1. GSM8k via vLLM's tests/evals/gsm8k/gsm8k_eval.py shows almost no degradation in accuracy:

     |          | Qwen/Qwen3.5-397B-A17B | RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic (this model) |
     |----------|------------------------|-----------------------------------------------------|
     | Accuracy | 89.5                   | 89.4                                                |
     | Recovery | -                      | 99.9%                                               |
  2. Under greedy sampling, the model generates almost identical text to the unquantized baseline, with Qwen/Qwen3.5-397B-A17B on the left and RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic on the right (a reproduction sketch follows the image):

[Image: side-by-side greedy generations from the baseline (left) and the FP8 model (right)]
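A minimal sketch of how such a greedy comparison can be reproduced with vLLM's offline API; the prompt, generation length, and parallelism settings are placeholders, and a model of this size needs multi-GPU sharding:

```python
# Hypothetical sketch: compare greedy (temperature=0) generations from the
# baseline and the FP8-dynamic checkpoint. PROMPT, max_tokens, and
# tensor_parallel_size are placeholders; run each model in a fresh process
# if both do not fit in memory together.
from vllm import LLM, SamplingParams

PROMPT = "Explain the difference between FP8 and BF16 in one paragraph."
greedy = SamplingParams(temperature=0, max_tokens=256)

for model_id in (
    "Qwen/Qwen3.5-397B-A17B",
    "RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic",
):
    llm = LLM(model=model_id, tensor_parallel_size=8)  # placeholder GPU count
    text = llm.generate([PROMPT], greedy)[0].outputs[0].text
    print(f"=== {model_id} ===\n{text}\n")
```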

Note: More rigorous evaluations are in progress and will be published soon.
