---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1
---
# Model Overview
- **Model Architecture:** DeepSeek-R1
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **PyTorch:** 2.8.0
- **Transformers:** 4.53.0
- **Operating System(s):** Linux
- **Inference Engine:** [SGLang](https://docs.sglang.ai/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.10)
- **Weight quantization:** OCP MXFP4, Static
- **Activation quantization:** OCP MXFP4, Dynamic
- **KV cache:** OCP FP8, Static
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model was built from the deepseek-ai DeepSeek-R1 model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.
# Model Quantization
The model was quantized from [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations were quantized to MXFP4 format.
**Preprocessing requirement:**
Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16.
You can either perform the dequantization manually using this [conversion script](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py), or use the pre-converted BFloat16 model available at [unsloth/DeepSeek-R1-BF16](https://huggingface.co/unsloth/DeepSeek-R1-BF16).
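If converting manually, a minimal sketch of the conversion step is shown below. The flag names follow the DeepSeek-V3 repository's script and the paths are placeholders; verify both against the script before running:
```
# Dequantize the original FP8 checkpoint to BFloat16 (paths are placeholders)
python3 fp8_cast_bf16.py \
--input-fp8-hf-path /path/to/DeepSeek-R1-FP8 \
--output-bf16-hf-path /path/to/DeepSeek-R1-BF16
```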
**Quantization script:**
```
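# MODEL_DIR should point to the BFloat16 DeepSeek-R1 checkpoint prepared in the preprocessing step above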
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 quantize_quark.py --model_dir $MODEL_DIR \
--quant_scheme w_mxfp4_a_mxfp4 \
--group_size 32 \
--num_calib_data 128 \
--exclude_layers "lm_head" \
--skip_evaluation \
--multi_device \
--model_export hf_format \
--output_dir amd/DeepSeek-R1-MXFP4-Preview
```
# Deployment
## Use with SGLang
This model can be deployed efficiently using the [SGLang](https://docs.sglang.ai/) backend.
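For example, an OpenAI-compatible server can be started with the same flags used in the evaluation section below (adjust `--tp` to your GPU count):
```
python3 -m sglang.launch_server \
--model amd/DeepSeek-R1-MXFP4-Preview \
--tp 8 \
--trust-remote-code
```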
## Evaluation
The model was evaluated using the [SGLang](https://docs.sglang.ai/) and [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) frameworks.
### Accuracy
| Benchmark | DeepSeek-R1 | DeepSeek-R1-MXFP4-Preview (this model) | Recovery |
|---|---|---|---|
| AIME24 | 78.0 | 69.57 | 89.19% |
| GSM8K | 95.81 | 93.95 | 98.05% |
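Recovery is the quantized model's score expressed as a percentage of the original DeepSeek-R1 score, e.g. 69.57 / 78.0 ≈ 89.19% for AIME24.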
### Reproduction
The AIME24 result was obtained using [SGLang](https://docs.sglang.ai/), while the GSM8K result was obtained using [vLLM](https://docs.vllm.ai/en/latest/). Both evaluations were run through a forked [lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot).
### AIME24
```
# Launching server
python3 -m sglang.launch_server \
--model amd/DeepSeek-R1-MXFP4-Preview \
--tp 8 \
--trust-remote-code \
--n-share-experts-fusion 8 \
--disable-radix-cache
# Evaluating
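# With the server up (the base_url below assumes SGLang's default port 30000), run lm_eval from a separate shell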
lm_eval --model local-completions \
--model_args model=amd/DeepSeek-R1-MXFP4-Preview,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
--tasks aime24 \
--num_fewshot 0 \
--gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
--batch_size auto \
--log_samples \
--output_path output_data/aime24 2>&1 | tee logs/aime24.log
```
### GSM8K
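Per the note above, the GSM8K evaluation was served with [vLLM](https://docs.vllm.ai/en/latest/). A minimal serving sketch is shown below as an assumption (parallelism and port are placeholders; the port must match the `base_url` passed to `lm_eval`):
```
# Hypothetical vLLM launch; adjust parallelism and port to your setup
vllm serve amd/DeepSeek-R1-MXFP4-Preview \
--tensor-parallel-size 8 \
--trust-remote-code \
--port 30000
```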
```
lm_eval --model local-completions \
--model_args model=amd/DeepSeek-R1-MXFP4-Preview,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=8096 \
--tasks gsm8k \
--num_fewshot 5 \
--batch_size auto \
--log_samples \
--output_path output_data/gsm8k 2>&1 | tee logs/gsm8k.log
```
# License
Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.