---
base_model:
- deepseek-ai/DeepSeek-V3
---

# DeepSeek V3 - INT4 (TensorRT-LLM)

This repository provides an INT4-quantized version of the DeepSeek V3 model, suitable for high-speed, memory-efficient inference with TensorRT-LLM.

### Model Summary
- **Base model:** DeepSeek V3 (BF16, converted from the NVIDIA FP8 checkpoint)
- **Quantization:** weight-only INT4 (W4A16)
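W4A16 means the weights are stored as 4-bit integers and dequantized to 16-bit precision at matmul time, while activations stay in 16-bit. A minimal NumPy sketch of per-group symmetric INT4 quantization (the group size of 128 and the symmetric scheme are illustrative assumptions, not the exact TensorRT-LLM kernel):

```python
import numpy as np

def quantize_w4(w, group_size=128):
    """Per-group symmetric INT4 quantization (illustrative sketch)."""
    groups = w.reshape(-1, group_size)
    # Symmetric INT4 covers [-8, 7]; map each group's max |w| to 7.
    scales = np.maximum(np.abs(groups).max(axis=1, keepdims=True) / 7.0, 1e-8)
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_w4(q, scales, shape):
    """Approximate reconstruction: w ≈ q * scale."""
    return (q.astype(np.float32) * scales).reshape(shape)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_w4(w)
w_hat = dequantize_w4(q, s, w.shape)
# Error is bounded by half a quantization step per group.
assert np.max(np.abs(w - w_hat)) <= s.max() / 2 + 1e-6
```

Only the INT4 values and per-group scales are stored, cutting weight memory roughly 4x versus BF16.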

```sh
python convert_checkpoint.py \
    --model_dir /home/user/hf/deepseek-v3-bf16 \
    --output_dir /home/user/hf/deepseek-v3-int4 \
    --dtype bfloat16 \
    --tp_size 4 \
    --use_weight_only \
    --weight_only_precision int4 \
    --workers 4
```
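As a rough sizing check for the converted checkpoint (assuming DeepSeek V3's roughly 671B total parameters; per-group scales, KV cache, and activation memory are ignored here):

```python
params = 671e9            # approximate DeepSeek V3 parameter count (assumption)
bytes_per_weight = 0.5    # INT4 packs two weights per byte
tp_size = 4               # matches --tp_size above

total_gib = params * bytes_per_weight / 1024**3
per_gpu_gib = total_gib / tp_size
print(f"~{total_gib:.0f} GiB total, ~{per_gpu_gib:.0f} GiB per GPU")
# → ~312 GiB total, ~78 GiB per GPU
```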
### Example usage:

```sh
trtllm-build --checkpoint_dir /DeepSeek-V3-int4-TensorRT \
    --output_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    ...
```
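Once the engine is built, it can be exercised with the `run.py` example script shipped in the TensorRT-LLM repository. The script location, tokenizer directory, and prompt below are assumptions based on the standard TensorRT-LLM examples, not part of this checkpoint:

```sh
# Launch one rank per tensor-parallel shard (tp_size 4 above).
mpirun -n 4 python examples/run.py \
    --engine_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    --tokenizer_dir /home/user/hf/deepseek-v3-bf16 \
    --input_text "Hello, my name is" \
    --max_output_len 64
```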
### Disclaimer:
This model is a quantized checkpoint intended for research and experimentation with high-performance inference. Use at your own risk and validate outputs before deploying to production use cases.