Quantization was performed using exllama3 v0.0.15.
| Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
|---|---|---|---|---|---|---|---|---|---|
| 3.0bpw | 84 | 0.16501465 | 0.20774904 | 4.08571518 | 0.8661 | 0.5977 | 0.3563 | 0.1950 | 0.1001 |
| 4.0bpw | 111 | 0.04595388 | 0.04850649 | 3.72529002 | 0.9290 | 0.7547 | 0.5509 | 0.3743 | 0.2411 |
| 5.0bpw | 138 | 0.01664260 | 0.01688963 | 3.67661701 | 0.9563 | 0.8392 | 0.6822 | 0.5256 | 0.3872 |
| 5.5bpw | 151 | 0.01046010 | 0.01069444 | 3.65590399 | 0.9652 | 0.8699 | 0.7384 | 0.5977 | 0.4654 |
| 5.5bpw_opt | 151 | 0.00949581 | 0.00957771 | 3.65956156 | 0.9665 | 0.8748 | 0.7451 | 0.6049 | 0.4731 |
| 6.0bpw | 165 | 0.00739469 | 0.00749036 | 3.64918384 | 0.9705 | 0.8889 | 0.7720 | 0.6403 | 0.5140 |
| 6.23bpw | 171 | 0.00564774 | 0.00568584 | 3.64926113 | 0.9735 | 0.8989 | 0.7912 | 0.6696 | 0.5487 |
| 6.23bpw_opt | 171 | 0.00743874 | 0.00743475 | 3.65749682 | 0.9702 | 0.8866 | 0.7671 | 0.6351 | 0.5084 |
| 7.0bpw | 192 | 0.00429302 | 0.00428297 | 3.64699539 | 0.9782 | 0.9143 | 0.8197 | 0.7095 | 0.5975 |
| original | 437 | - | - | 3.64520522 | - | - | - | - | - |
The 5.5bpw_opt and 6.23bpw_opt quantizations were created by combining measurements from the 5.0bpw, 6.0bpw, and 7.0bpw quantizations to optimize layer-wise bit allocation.
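To give a rough intuition for what layer-wise bit allocation means, here is a minimal sketch of a greedy allocator. It is an illustrative assumption, not the procedure exllama3 actually uses: the function name `allocate_bits`, the per-layer `(size, error)` format, and all numbers in `layers` are hypothetical toy values, not measurements from this model.

```python
def allocate_bits(layers, budget):
    """Greedy per-layer choice among candidate quantizations (hypothetical sketch).

    layers: list of dicts mapping candidate name -> (size, error)
    budget: maximum total size, in the same unit as the candidate sizes
    Returns (chosen candidate per layer, total size used).
    """
    # Start every layer at its smallest candidate.
    choice = [min(cands, key=lambda name: cands[name][0]) for cands in layers]
    used = sum(cands[name][0] for cands, name in zip(layers, choice))

    while True:
        best = None  # (error reduction per unit of size, layer index, name, extra size)
        for i, cands in enumerate(layers):
            cur_size, cur_err = cands[choice[i]]
            for name, (size, err) in cands.items():
                extra = size - cur_size
                # Skip candidates that are not larger, not better, or over budget.
                if extra <= 0 or err >= cur_err or used + extra > budget:
                    continue
                gain = (cur_err - err) / extra
                if best is None or gain > best[0]:
                    best = (gain, i, name, extra)
        if best is None:
            return choice, used
        _, i, name, extra = best
        choice[i] = name
        used += extra


# Toy input: two layers, three candidates each, as (size, error). Not real data.
layers = [
    {"5.0bpw": (500, 0.030), "6.0bpw": (600, 0.012), "7.0bpw": (700, 0.008)},
    {"5.0bpw": (500, 0.010), "6.0bpw": (600, 0.008), "7.0bpw": (700, 0.007)},
]
choice, used = allocate_bits(layers, budget=1200)
print(choice, used)
```

In this toy case the budget equals two layers at 6.0bpw, but the sketch spends it unevenly: the layer that is more sensitive to quantization gets 7.0bpw and the less sensitive one stays at 5.0bpw, which gives a lower summed error than a uniform 6.0bpw assignment.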
Full measurements data: json
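One way to read the table above is to express each quant relative to the original model. The sketch below copies the size and perplexity values of a few rows from the table and computes the size fraction and relative perplexity increase; the helper name `relative_to_original` is only an illustrative choice.

```python
# (size in GB, perplexity), copied from the table above.
rows = {
    "3.0bpw": (84, 4.08571518),
    "4.0bpw": (111, 3.72529002),
    "5.0bpw": (138, 3.67661701),
    "6.0bpw": (165, 3.64918384),
    "7.0bpw": (192, 3.64699539),
}
orig_size, orig_ppl = 437, 3.64520522


def relative_to_original(size_gb, ppl):
    """Return (fraction of original size, perplexity increase in percent)."""
    size_fraction = size_gb / orig_size
    ppl_increase_pct = 100.0 * (ppl - orig_ppl) / orig_ppl
    return size_fraction, ppl_increase_pct


for name, (size_gb, ppl) in rows.items():
    frac, inc = relative_to_original(size_gb, ppl)
    print(f"{name}: {frac:.1%} of original size, perplexity +{inc:.2f}%")
```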
Metrics explanation
- KL-divergence: Measures the difference between the next-token probability distributions of the quantized and original models. Because KL-divergence is asymmetric, the table reports it in both directions. Lower is better (closer to original).
- Perplexity: Indicates how well the model predicts the next token. Lower values mean better prediction quality.
- Top-K agreement: Shows how often the quantized model's top-K predictions exactly match the original model's predictions. Higher values indicate better preservation of the original model's behavior (1.0 = perfect match):
  - K=1: The fraction of times both models predict the exact same token as their top-1 choice
  - K=5: The fraction of times both models have the same 5 tokens in their top-5 predictions in the exact same ranking order
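The three metrics above can be sketched in plain Python as follows. This is a minimal illustration based on the definitions in this section, not the evaluation code used to produce the table; the function names and the toy logits are assumptions made for the example.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def perplexity(token_probs):
    """exp of the mean negative log-probability given to the actual next tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def top_k_ids(probs, k):
    """Token ids of the k most probable tokens, most probable first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def top_k_agreement(orig_dists, quant_dists, k):
    """Fraction of positions where both models rank the same k tokens in the same order."""
    matches = sum(top_k_ids(o, k) == top_k_ids(q, k) for o, q in zip(orig_dists, quant_dists))
    return matches / len(orig_dists)


# Toy logits for two positions over a 3-token vocabulary (not real model output).
orig = [softmax([2.0, 1.0, 0.1]), softmax([0.5, 2.5, 1.0])]
quant = [softmax([1.9, 1.1, 0.0]), softmax([0.6, 1.0, 2.4])]

print(kl_divergence(quant[0], orig[0]), kl_divergence(orig[0], quant[0]))
print(top_k_agreement(orig, quant, k=1), top_k_agreement(orig, quant, k=2))
print(perplexity([0.5, 0.5]))
```

Because a top-K match requires the same tokens in the same order, agreement can only stay equal or drop as K grows, which is consistent with the falling values from K=1 to K=5 in the table.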
Model tree for NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3: base model Qwen/Qwen3-235B-A22B-Instruct-2507