amd/Mixtral-8x7B-Instruct-v0.1_FP8_MLPerf_V3


ZhaofengZhang-AMD committed on Dec 16, 2024

Commit 19c1d68 (verified) · 1 Parent(s): aa4fdc5

Update README.md

Files changed (1)

README.md +45 -3


@@ -1,3 +1,45 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+metrics:
+- accuracy
+base_model:
+- mistralai/Mixtral-8x7B-Instruct-v0.1
+---
+# Quark Team FP8 Mixtral-8x7B Model Overview
+## Model Information For MLPerf
+- **Model Name**: Mixtral-8x7B
+- **Version**: MLPerf v5.0
+- **Commit**: Closed Division Commit
+## Calibration Dataset
+The calibration dataset consists of **1024 samples** drawn from the mixed dataset provided by MLPerf, which includes:
+- **325 GSM8k samples**
+- **325 MBXP samples**
+- **374 OpenOrca samples**
+## Quantized Tensors
+The following tensors are quantized in each decoder layer:
+- **Expert MLP Inputs and Weights** (excluding the router)
+- **Linear QKV Inputs and Weights**
+- **KV Cache Entries**
+## Ignored Layers
+The following layers are ignored during quantization:
+- `*.gate`
+- `*.o_proj`
+- `lm_head`
+# Model Performance Comparison
+| Metric               | Baseline Accuracy Target (%) | FP8 Quant Accuracy (%) |
+|----------------------|------------------------------|------------------------|
+| **GSM8K (Math)**     | 73.66                        | 73.18 (99.34%)         |
+| **Open Orca (Chat)** |                              |                        |
+| - Rouge1             | 45.5989                      | 45.4362 (99.64%)       |
+| - Rouge2             | 23.3526                      | 23.168 (99.21%)        |
+| - RougeL             | 30.4608                      | 30.2922 (99.45%)       |
+| **MBXP (Code)**      | 60.16                        | 60.08 (99.87%)         |
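
The table does not say how the parenthesized percentages are derived. A minimal sketch, assuming each one is the FP8 score divided by the baseline target, is shown below. The `RESULTS` dictionary and the `relative_accuracy` helper are hypothetical names introduced for illustration; the numbers are copied from the table. Under this assumption the computed ratios agree with the listed figures to within about 0.01 percentage points.

```python
# Baseline targets and FP8 scores copied from the comparison table.
# Each entry maps a metric name to (baseline_target, fp8_score).
RESULTS = {
    "GSM8K": (73.66, 73.18),
    "Rouge1": (45.5989, 45.4362),
    "Rouge2": (23.3526, 23.168),
    "RougeL": (30.4608, 30.2922),
    "MBXP": (60.16, 60.08),
}


def relative_accuracy(baseline: float, quantized: float) -> float:
    """Return the quantized score as a percentage of the baseline score.

    Assumption: this ratio is what the parenthesized figures in the table report.
    """
    return 100.0 * quantized / baseline


if __name__ == "__main__":
    for name, (baseline, quantized) in RESULTS.items():
        print(f"{name}: {relative_accuracy(baseline, quantized):.2f}% of baseline")
```

Every metric stays above 99% of its baseline target under this reading.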