# FP8 Quantized Qwen3.5-397B-A17B
This is a preliminary version (subject to change) of an FP8-quantized `Qwen/Qwen3.5-397B-A17B` model. Both weights and activations are quantized to the FP8 format with [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor).
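For reference, FP8-dynamic checkpoints are typically produced with llm-compressor's data-free `FP8_DYNAMIC` scheme. The sketch below is a minimal illustration under that assumption, not the exact recipe used for this checkpoint; in particular, the `ignore` list is an assumption (MoE recipes often also keep router gates in higher precision).

```python
# Minimal llm-compressor FP8-dynamic sketch (data-free: no calibration set needed).
# The ignore list is an assumption, not the exact recipe used for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3.5-397B-A17B"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC: static per-channel FP8 weights, dynamic per-token FP8 activations.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],  # assumption: the actual recipe may exclude more modules
)

oneshot(model=model, recipe=recipe)

save_dir = MODEL_ID.split("/")[-1] + "-FP8-dynamic"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```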
It is compatible with and tested against `vllm==0.16.0rc2.dev250+g28bffe946`. Deploy it with `vllm serve RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic`.
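Once deployed, the server exposes an OpenAI-compatible API. A minimal query sketch, assuming vLLM's default host and port (`localhost:8000`) and a placeholder prompt:

```python
# Query the vllm serve endpoint through its OpenAI-compatible API.
# localhost:8000 is vLLM's default; adjust to your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic",
    messages=[{"role": "user", "content": "Briefly explain FP8 quantization."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```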
## Preliminary Evaluations
- GSM8k, run with vLLM's `tests/evals/gsm8k/gsm8k_eval.py`, shows almost no accuracy degradation:
  |  | Qwen/Qwen3.5-397B-A17B | RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic (this model) |
  |---|---|---|
  | Accuracy (%) | 89.5 | 89.4 |
  | Recovery | - | 99.9% |
- Under greedy sampling, the model generates text almost identical to the unquantized baseline. In the side-by-side comparison, `Qwen/Qwen3.5-397B-A17B` is on the left and `RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic` is on the right (see the reproduction sketch below).
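A sketch of how such a greedy-decoding comparison can be reproduced with vLLM's offline API. The prompt and `tensor_parallel_size` are placeholders, and at this model size each run would realistically go in its own process on its own multi-GPU node:

```python
# Compare greedy (temperature=0) generations from the baseline and the FP8 model.
# tensor_parallel_size=8 is a placeholder; a 397B model needs enough GPUs to fit.
from vllm import LLM, SamplingParams

prompts = ["Explain the Qwen3.5 architecture in two sentences."]
greedy = SamplingParams(temperature=0.0, max_tokens=256)

for model_id in (
    "Qwen/Qwen3.5-397B-A17B",
    "RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic",
):
    llm = LLM(model=model_id, tensor_parallel_size=8)
    for output in llm.generate(prompts, greedy):
        print(f"--- {model_id} ---")
        print(output.outputs[0].text)
```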
Note: More rigorous evaluations are currently in progress and will be available soon.