Update README.md
README.md (CHANGED)
@@ -11,6 +11,9 @@ base_model:
- **Model Name**: Mixtral-8x7B
- **Version**: MLPerf v5.1
- **Commit**: Closed Division Commit
+- **Supported Hardware Microarchitecture**: AMD MI300/MI325
+- **Transformers**: 4.51.0
+- **Quark**: [0.9](https://quark.docs.amd.com/latest/install.html)

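Given the pinned Transformers and Quark versions above, a minimal environment sketch follows; the `amd-quark` package name is an assumption, so defer to the linked install guide if it differs.

```
# Rough setup matching the versions listed above (the package name
# amd-quark is an assumption; see https://quark.docs.amd.com/latest/install.html).
pip install transformers==4.51.0
pip install amd-quark==0.9
```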
## Calibration Dataset
The calibration dataset consists of **1024 samples** drawn from the mixed datasets provided by MLPerf, which include:

@@ -30,6 +33,29 @@ The following layers are ignored during quantization:
- `*.o_proj`
- `lm_head`

+## Quantization Scripts
+```
+cd examples/torch/language_modeling/llm_ptq/
+MODEL_DIR="mistralai/Mixtral-8x7B-Instruct-v0.1"
+DATASET="./mlperf_data/mixtral_8x7b%2F2024.06.06_mixtral_15k_calibration_v4.pkl"
+OUTPUT_DIR="quantized_models/Mixtral-8x7B-Instruct-v0.1_FP8_MLPerf"
+
+# FP8 weights/activations with an FP8 KV cache, calibrated on 1024 sequences
+# of length 1024 and exported in Hugging Face format.
+python3 quantize_quark.py --model_dir "${MODEL_DIR}" \
+    --output_dir "${OUTPUT_DIR}" \
+    --dataset "${DATASET}" \
+    --data_type float16 \
+    --multi_gpu \
+    --quant_scheme w_fp8_a_fp8 \
+    --kv_cache_dtype fp8 \
+    --num_calib_data 1024 \
+    --seq_len 1024 \
+    --min_kv_scale 1.0 \
+    --model_export hf_format \
+    --custom_mode fp8 \
+    --quant_algo autosmoothquant \
+    --exclude_layers "lm_head" "*.gate"
+```
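Once the export finishes, a quick check along these lines can confirm the `hf_format` output; whether FP8 settings appear under `quantization_config` in `config.json` is an assumption about the export layout rather than something stated above.

```
# Hypothetical post-export check: list the exported files and print the
# quantization_config entry from config.json (prints None if absent).
EXPORT_DIR="quantized_models/Mixtral-8x7B-Instruct-v0.1_FP8_MLPerf"
ls "${EXPORT_DIR}"
python3 -c "import json,sys; cfg=json.load(open(sys.argv[1])); print(cfg.get('quantization_config'))" "${EXPORT_DIR}/config.json"
```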

# Model Performance Comparison

| Metric | Baseline Accuracy Target (%) | FP8 Quant Accuracy (%) |