~~~

### Evaluate the model

~~~bash
auto-round --eval --model "Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound-inc" --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
~~~

| Metric               | BF16   | INT4 (auto-round) | INT4 (auto-round-best) |
| -------------------- | ------ | ----------------- | ---------------------- |
| Avg                  | 0.5958 | 0.5913            | 0.5926                 |
| arc_challenge        | 0.5137 | 0.5102            | 0.5043                 |
| arc_easy             | 0.7908 | 0.7862            | 0.7921                 |
| boolq                | 0.8498 | 0.8526            | 0.8443                 |
| ceval-valid          | 0.7296 | 0.7177            | 0.7140                 |
| cmmlu                | 0.7159 | 0.7029            | 0.7027                 |
| gsm8k                | 0.8211 | 0.8029            | 0.8234                 |
| hellaswag            | 0.5781 | 0.5703            | 0.5670                 |
| lambada_openai       | 0.5544 | 0.5490            | 0.5626                 |
| leaderboard_ifeval   | 0.2731 | 0.2729            | 0.2542                 |
| leaderboard_mmlu_pro | 0.4115 | 0.4105            | 0.4117                 |
| openbookqa           | 0.3020 | 0.3060            | 0.3100                 |
| piqa                 | 0.7617 | 0.7617            | 0.7612                 |
| truthfulqa_mc1       | 0.3562 | 0.3611            | 0.3696                 |
| winogrande           | 0.6835 | 0.6740            | 0.6788                 |
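The Avg row works out to the unweighted mean of the 14 per-task scores. A minimal sketch to reproduce it from the per-task numbers (the column values are copied from the table; the mean-of-tasks aggregation is an observation from the data, not something the evaluation command guarantees):

~~~python
# Per-task scores from the table above, in row order (arc_challenge .. winogrande).
scores = {
    "BF16":                   [0.5137, 0.7908, 0.8498, 0.7296, 0.7159, 0.8211, 0.5781,
                               0.5544, 0.2731, 0.4115, 0.3020, 0.7617, 0.3562, 0.6835],
    "INT4 (auto-round)":      [0.5102, 0.7862, 0.8526, 0.7177, 0.7029, 0.8029, 0.5703,
                               0.5490, 0.2729, 0.4105, 0.3060, 0.7617, 0.3611, 0.6740],
    "INT4 (auto-round-best)": [0.5043, 0.7921, 0.8443, 0.7140, 0.7027, 0.8234, 0.5670,
                               0.5626, 0.2542, 0.4117, 0.3100, 0.7612, 0.3696, 0.6788],
}

# Unweighted mean per column, rounded to 4 decimals as reported.
for name, vals in scores.items():
    print(name, round(sum(vals) / len(vals), 4))
# BF16 0.5958, INT4 (auto-round) 0.5913, INT4 (auto-round-best) 0.5926
~~~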

### Reproduce the model