Update README.md
README.md CHANGED
@@ -23,12 +23,15 @@ This model is a quantized version of [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)
 
 # Model Quantization
 
-The model was quantized from [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html).
+The model was quantized from [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations were quantized to MXFP4 format, and the AutoSmoothQuant algorithm was applied to enhance accuracy.
+
+**Preprocessing requirement:**
+
+Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16.
+You can either perform the dequantization manually using this [conversion script](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py), or use the pre-converted BFloat16 model available at [unsloth/DeepSeek-R1-BF16](https://huggingface.co/unsloth/DeepSeek-R1-BF16).
 
 **Quantization scripts:**
 ```
-# Dequantize the FP8 pretrained model to BFloat16, and then quantize the BFloat16 model using the following script.
-
 cd Quark/examples/torch/language_modeling/llm_ptq/
 python3 quantize_quark.py --model_dir $MODEL_DIR \
     --quant_scheme w_mxfp4_a_mxfp4 \
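As a usage sketch of the preprocessing step described in the updated README: the commands below assume the DeepSeek-V3 conversion script accepts `--input-fp8-hf-path` and `--output-bf16-hf-path` arguments and that `huggingface-cli` is installed; the paths and flag names are illustrative and not part of this commit.

```
# Option 1: dequantize the original FP8 checkpoint to BFloat16 locally
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
python3 DeepSeek-V3/inference/fp8_cast_bf16.py \
    --input-fp8-hf-path /path/to/DeepSeek-R1 \
    --output-bf16-hf-path /path/to/DeepSeek-R1-BF16

# Option 2: download the pre-converted BFloat16 checkpoint instead
huggingface-cli download unsloth/DeepSeek-R1-BF16 --local-dir /path/to/DeepSeek-R1-BF16

# Either way, point the quantization script's --model_dir at the BFloat16 checkpoint
export MODEL_DIR=/path/to/DeepSeek-R1-BF16
```

Either option yields the BFloat16 checkpoint that the `quantize_quark.py` command in the README then quantizes with the `w_mxfp4_a_mxfp4` scheme.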