Update README.md
README.md CHANGED
@@ -23,12 +23,15 @@ This model is a quantized version of [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)
 
 # Model Quantization
 
-The model was quantized from [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html).
+The model was quantized from [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations were quantized to MXFP4 format, and the AutoSmoothQuant algorithm was applied to enhance accuracy.
+
+**Preprocessing requirement:**
+
+Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16.
+You can either perform the dequantization manually using this [conversion script](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py), or use the pre-converted BFloat16 model available at [unsloth/DeepSeek-R1-BF16](https://huggingface.co/unsloth/DeepSeek-R1-BF16).
 
 **Quantization scripts:**
 ```
-# Dequantize the FP8 pretrained model to BFloat16, and then quantize the BFloat16 model using the following script.
-
 cd Quark/examples/torch/language_modeling/llm_ptq/
 python3 quantize_quark.py --model_dir $MODEL_DIR \
     --quant_scheme w_mxfp4_a_mxfp4 \
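As a usage sketch of the preprocessing step described in the updated README: the commands below assume the DeepSeek-V3 conversion script accepts `--input-fp8-hf-path` and `--output-bf16-hf-path` arguments and that `huggingface-cli` is installed; the paths and flag names are illustrative and not part of this commit.

```
# Option 1: dequantize the original FP8 checkpoint to BFloat16 locally
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
python3 DeepSeek-V3/inference/fp8_cast_bf16.py \
    --input-fp8-hf-path /path/to/DeepSeek-R1 \
    --output-bf16-hf-path /path/to/DeepSeek-R1-BF16

# Option 2: download the pre-converted BFloat16 checkpoint instead
huggingface-cli download unsloth/DeepSeek-R1-BF16 --local-dir /path/to/DeepSeek-R1-BF16

# Either way, point the quantization script's --model_dir at the BFloat16 checkpoint
export MODEL_DIR=/path/to/DeepSeek-R1-BF16
```

Either option yields the BFloat16 checkpoint that the `quantize_quark.py` command in the README then quantizes with the `w_mxfp4_a_mxfp4` scheme.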