Update vllm eval results
README.md
CHANGED
|
@@ -10,6 +10,13 @@ This model is a int4 model with group_size 128 and symmetric quantization of [de
 Please follow the license of the original model.
 
 ## How To Use
+
+### vLLM usage
+
+~~~bash
+vllm serve Intel/DeepSeek-V3.1-int4-AutoRound
+~~~
+
 ### INT4 Inference
 Potential overflow/underflow issues have been observed on CUDA, primarily due to kernel limitations.
 For better accuracy, we recommend deploying the model on CPU or using [our INT4 mixed version](https://huggingface.co/Intel/DeepSeek-V3.1-int4-mixed-AutoRound)
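Once `vllm serve` is running, the model can be queried through vLLM's OpenAI-compatible REST API. A minimal sketch using only the standard library; the base URL assumes vLLM's default of `http://localhost:8000`, and the prompt text is illustrative:

```python
import json
import urllib.request

# vLLM's default OpenAI-compatible endpoint (adjust host/port for your deployment).
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a /v1/chat/completions payload for the served model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str, model: str = "Intel/DeepSeek-V3.1-int4-AutoRound") -> str:
    """POST the chat request and return the assistant's reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same endpoint also accepts the official `openai` Python client by pointing its `base_url` at the server.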
@@ -168,6 +175,26 @@ autoround = AutoRound(model=model, tokenizer=tokenizer, device_map=device_map, n
 autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
 ```
 
+## Evaluate Results
+
+| benchmark | backend | Intel/DeepSeek-V3.1-int4-AutoRound | deepseek-ai/DeepSeek-V3.1 |
+| :-------: | :-----: | :--------------------------------: | :-----------------------: |
+| mmlu_pro  | vllm    |               0.7865               |          0.7965           |
+
+```
+# key dependency version
+torch        2.8.0
+transformers 4.56.2
+lm_eval      0.4.9.1
+vllm         0.10.2rc3.dev291+g535d80056.precompiled
+
+# eval cmd
+CUDA_VISIBLE_DEVICES=0,1,2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn \
+lm_eval --model vllm \
+  --model_args pretrained=Intel/DeepSeek-V3.1-int4-AutoRound,dtype=bfloat16,trust_remote_code=False,tensor_parallel_size=4,gpu_memory_utilization=0.95 \
+  --tasks mmlu_pro \
+  --batch_size 64
+```
+
 ## Ethical Considerations and Limitations
 
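The mmlu_pro scores added above imply only a small quantization regression. A quick arithmetic sanity check, with the two accuracies copied from the table:

```python
int4_acc = 0.7865  # Intel/DeepSeek-V3.1-int4-AutoRound, mmlu_pro via vllm
bf16_acc = 0.7965  # deepseek-ai/DeepSeek-V3.1 baseline

absolute_drop = bf16_acc - int4_acc       # 1.00 accuracy point
relative_drop = absolute_drop / bf16_acc  # fraction of baseline accuracy lost

print(f"absolute drop: {absolute_drop:.4f}")  # 0.0100
print(f"relative drop: {relative_drop:.2%}")  # 1.26%
```

So the int4 checkpoint retains roughly 98.7% of the bf16 baseline's mmlu_pro accuracy under this eval setup.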