Update README.md

This is a preliminary version (and subject to change) of the FP8-quantized [Qwen/Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) model.
The model has both weights and activations quantized to the FP8 format with [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor).
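
For reference, below is a minimal llm-compressor sketch of how an FP8-dynamic checkpoint like this one can be produced. The exact recipe used for this model is not published here, so the `targets`/`ignore` choices are assumptions, not the actual configuration.

```python
# Minimal sketch (not the exact recipe used for this checkpoint) of FP8
# dynamic quantization with llm-compressor.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3.5-397B-A17B"  # base model from this card

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC = static per-channel FP8 weights plus dynamic per-token FP8
# activations, so no calibration dataset is required.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],  # assumption: output head left unquantized; MoE
                         # recipes often also skip router/gate layers
)

oneshot(model=model, recipe=recipe)

SAVE_DIR = "Qwen3.5-397B-A17B-FP8-dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```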

It is compatible with and tested against `vllm==0.16.0rc2.dev250+g28bffe946`. Deploy it with: `vllm serve RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic`.
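
Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal request sketch, assuming the default `localhost:8000` endpoint and no API key configured:

```python
# Query the vLLM server started by the `vllm serve` command above.
# Assumptions: default host/port (localhost:8000) and no API key set.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic",
    messages=[
        {"role": "user", "content": "Summarize FP8 quantization in one line."}
    ],
    max_tokens=64,
)
print(response.choices[0].message.content)
```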

# Preliminary Evaluations

1) GSM8k via vLLM's `tests/evals/gsm8k/gsm8k_eval.py` shows almost no degradation of accuracy: