Update README.md
README.md CHANGED

```diff
@@ -24,7 +24,7 @@ This model was built with deepseek-ai DeepSeek-R1 model by applying [AMD-Quark](
 
 # Model Quantization
 
-The model was quantized from [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations were quantized to MXFP4 format
+The model was quantized from [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations were quantized to MXFP4 format.
 
 **Preprocessing requirement:**

@@ -37,7 +37,6 @@ cd Quark/examples/torch/language_modeling/llm_ptq/
 python3 quantize_quark.py --model_dir $MODEL_DIR \
     --quant_scheme w_mxfp4_a_mxfp4 \
     --group_size 32 \
-    --kv_cache_dtype fp8 \
     --num_calib_data 128 \
     --exclude_layers "lm_head" \
     --multi_device \
```
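For context on what the `w_mxfp4_a_mxfp4` scheme with `--group_size 32` means, here is a minimal toy sketch of MXFP4-style quantization: each group of 32 values shares one power-of-two scale, and each element is rounded to the nearest FP4 (E2M1) representable value. This is an illustrative assumption-laden sketch, not AMD-Quark's actual implementation; the function names are hypothetical.

```python
import math

# Representable non-negative magnitudes of FP4 E2M1 (sign stored separately).
FP4_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_mxfp4_block(block):
    """Toy MXFP4 quantization of one group (e.g. 32 values):
    returns a shared power-of-two scale and FP4-rounded elements."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Pick a power-of-two scale so the largest magnitude fits in FP4's max (6.0).
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    quantized = []
    for x in block:
        # Round the scaled magnitude to the nearest FP4 level, then restore the sign.
        mag = min(FP4_LEVELS, key=lambda lvl: abs(abs(x) / scale - lvl))
        quantized.append(math.copysign(mag, x) if mag else 0.0)
    return scale, quantized

def dequantize_mxfp4_block(scale, quantized):
    """Reconstruct approximate values from the shared scale and FP4 elements."""
    return [scale * q for q in quantized]
```

With weights and activations both stored this way (per the `w_mxfp4_a_mxfp4` flag), each 32-element group costs 4 bits per element plus one shared 8-bit exponent.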