---
base_model:
- deepseek-ai/DeepSeek-V3
---

# DeepSeek V3 - INT4 (TensorRT-LLM)

This repository provides an INT4-quantized version of the DeepSeek V3 model, suitable for high-speed, memory-efficient inference with TensorRT-LLM.

### Model Summary
- **Base model:** DeepSeek V3 (BF16, converted from the NVIDIA FP8 checkpoint)
- **Quantization:** weight-only INT4 (W4A16)
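W4A16 means the weights are stored as 4-bit integers and dequantized to 16-bit precision at matmul time, while activations stay in 16-bit. A minimal NumPy sketch of per-group symmetric INT4 quantization (the group size of 128 and the symmetric scheme are illustrative assumptions, not the exact TensorRT-LLM kernel):

```python
import numpy as np

def quantize_w4(w, group_size=128):
    """Per-group symmetric INT4 quantization (illustrative sketch)."""
    groups = w.reshape(-1, group_size)
    # Symmetric INT4 covers [-8, 7]; map each group's max |w| to 7.
    scales = np.maximum(np.abs(groups).max(axis=1, keepdims=True) / 7.0, 1e-8)
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_w4(q, scales, shape):
    """Approximate reconstruction: w ≈ q * scale."""
    return (q.astype(np.float32) * scales).reshape(shape)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_w4(w)
w_hat = dequantize_w4(q, s, w.shape)
# Error is bounded by half a quantization step per group.
assert np.max(np.abs(w - w_hat)) <= s.max() / 2 + 1e-6
```

Only the INT4 values and per-group scales are stored, cutting weight memory roughly 4x versus BF16.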

```sh
python convert_checkpoint.py \
    --model_dir /home/user/hf/deepseek-v3-bf16 \
    --output_dir /home/user/hf/deepseek-v3-int4 \
    --dtype bfloat16 \
    --tp_size 4 \
    --use_weight_only \
    --weight_only_precision int4 \
    --workers 4
```
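As a rough sizing check for the converted checkpoint (assuming DeepSeek V3's roughly 671B total parameters; per-group scales, KV cache, and activation memory are ignored here):

```python
params = 671e9            # approximate DeepSeek V3 parameter count (assumption)
bytes_per_weight = 0.5    # INT4 packs two weights per byte
tp_size = 4               # matches --tp_size above

total_gib = params * bytes_per_weight / 1024**3
per_gpu_gib = total_gib / tp_size
print(f"~{total_gib:.0f} GiB total, ~{per_gpu_gib:.0f} GiB per GPU")
# → ~312 GiB total, ~78 GiB per GPU
```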
### Example usage:

```sh
trtllm-build --checkpoint_dir /DeepSeek-V3-int4-TensorRT \
    --output_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    ...
```
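Once the engine is built, it can be exercised with the `run.py` example script shipped in the TensorRT-LLM repository. The script location, tokenizer directory, and prompt below are assumptions based on the standard TensorRT-LLM examples, not part of this checkpoint:

```sh
# Launch one rank per tensor-parallel shard (tp_size 4 above).
mpirun -n 4 python examples/run.py \
    --engine_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    --tokenizer_dir /home/user/hf/deepseek-v3-bf16 \
    --input_text "Hello, my name is" \
    --max_output_len 64
```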
### Disclaimer:
This model is a quantized checkpoint intended for research and experimentation with high-performance inference. Use at your own risk and validate outputs before deploying to production use cases.