ekurtic committed (verified)
Commit 16ec7a5 · 1 Parent(s): 2366496

Update README.md

Files changed (1): README.md (+2 -0)
README.md CHANGED
```diff
@@ -14,6 +14,8 @@ name: RedHatAI/Qwen3.5-397B-A17B-FP8-Dynamic
 This is a preliminary version (and subject to change) of the FP8-quantized [Qwen/Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) model.
 The model has both weights and activations quantized to FP8 format with [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor).
 
+It is compatible with and tested against `vllm==0.16.0rc2.dev250+g28bffe946`. Deploy it with: `vllm serve RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic`.
+
 # Preliminary Evaluations
 
 1) GSM8k via vLLM's `tests/evals/gsm8k/gsm8k_eval.py` shows almost no degradation of accuracy:
```
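Once `vllm serve` is running, the model can be queried over vLLM's OpenAI-compatible REST API. A minimal sketch of a chat-completion request payload; the endpoint path and port are assumptions based on vLLM's default server settings, not something stated in this commit:

```python
import json

# Sketch of a chat-completion request for the server started with
# `vllm serve RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic`.
# Assumption: vLLM's OpenAI-compatible server listens at
# http://localhost:8000/v1/chat/completions by default.
payload = {
    "model": "RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic",
    "messages": [{"role": "user", "content": "What is 12 * 7?"}],
    "temperature": 0.0,
}
body = json.dumps(payload)  # POST this as the JSON request body
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at the local base URL) can send the same payload.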