Update README.md

This is a preliminary version (and subject to change) of the FP8-quantized [Qwen/Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) model.
The model has both weights and activations quantized to the FP8 format with [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor).
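
For reference, below is a minimal llm-compressor sketch of how an FP8-dynamic checkpoint like this one can be produced. The exact recipe used for this model is not published here, so the `targets`/`ignore` choices are assumptions, not the actual configuration.

```python
# Minimal sketch (not the exact recipe used for this checkpoint) of FP8
# dynamic quantization with llm-compressor.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3.5-397B-A17B"  # base model from this card

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC = static per-channel FP8 weights plus dynamic per-token FP8
# activations, so no calibration dataset is required.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],  # assumption: output head left unquantized; MoE
                         # recipes often also skip router/gate layers
)

oneshot(model=model, recipe=recipe)

SAVE_DIR = "Qwen3.5-397B-A17B-FP8-dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```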

It is compatible with and tested against `vllm==0.16.0rc2.dev250+g28bffe946`. Deploy it with: `vllm serve RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic`.
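
Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal request sketch, assuming the default `localhost:8000` endpoint and no API key configured:

```python
# Query the vLLM server started by the `vllm serve` command above.
# Assumptions: default host/port (localhost:8000) and no API key set.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic",
    messages=[
        {"role": "user", "content": "Summarize FP8 quantization in one line."}
    ],
    max_tokens=64,
)
print(response.choices[0].message.content)
```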

# Preliminary Evaluations

1) GSM8k via vLLM's `tests/evals/gsm8k/gsm8k_eval.py` shows almost no degradation of accuracy: