---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507/blob/main/LICENSE
base_model: Qwen/Qwen3-235B-A22B-Instruct-2507
base_model_relation: quantized
quantization: exl3
pipeline_tag: text-generation
tags:
- exl3
library_name: exllamav3
---
Quantization was performed using [exllamav3 v0.0.15](https://github.com/turboderp-org/exllamav3).
| Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
|------------------------------------------------------------------------------------------------------|---------|------------------------|----------------------|------------|-----------|-----------|-----------|-----------|-----------|
| [3.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/3.0bpw) | 84 | 0.16501465 | 0.20774904 | 4.08571518 | 0.8661 | 0.5977 | 0.3563 | 0.1950 | 0.1001 |
| [4.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/4.0bpw) | 111 | 0.04595388 | 0.04850649 | 3.72529002 | 0.9290 | 0.7547 | 0.5509 | 0.3743 | 0.2411 |
| [5.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/5.0bpw) | 138 | 0.01664260 | 0.01688963 | 3.67661701 | 0.9563 | 0.8392 | 0.6822 | 0.5256 | 0.3872 |
| [5.5bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/5.5bpw) | 151 | 0.01046010 | 0.01069444 | 3.65590399 | 0.9652 | 0.8699 | 0.7384 | 0.5977 | 0.4654 |
| [5.5bpw-opt](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/5.5bpw-opt) | 151 | 0.00949581 | 0.00957771 | 3.65956156 | 0.9665 | 0.8748 | 0.7451 | 0.6049 | 0.4731 |
| [6.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/6.0bpw) | 165 | 0.00739469 | 0.00749036 | 3.64918384 | 0.9705 | 0.8889 | 0.7720 | 0.6403 | 0.5140 |
| [6.23bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/6.23bpw) | 171 | 0.00564774 | 0.00568584 | 3.64926113 | 0.9735 | 0.8989 | 0.7912 | 0.6696 | 0.5487 |
| [6.23bpw-opt](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/6.23bpw-opt) | 171 | 0.00743874 | 0.00743475 | 3.65749682 | 0.9702 | 0.8866 | 0.7671 | 0.6351 | 0.5084 |
| [7.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/7.0bpw) | 192 | 0.00429302 | 0.00428297 | 3.64699539 | 0.9782 | 0.9143 | 0.8197 | 0.7095 | 0.5975 |
| original | 437 | - | - | 3.64520522 | - | - | - | - | - |

The 5.5bpw-opt and 6.23bpw-opt quantizations were created by combining the 5.0bpw, 6.0bpw, and 7.0bpw measurements to optimize the layer-wise bit allocation.

Full measurement data: [json](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/blob/main/Qwen3-235B-A22B-Instruct-2507_measurements_5vs6vs7.json)
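
Each quant in the table links to its own branch of this repository, so a single variant can be fetched by passing the branch name as the revision. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; `branch_for` and `download_quant` are hypothetical helper names introduced here for illustration, not part of any library.

```python
REPO_ID = "NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3"


def branch_for(bpw: str, optimized: bool = False) -> str:
    """Build the branch name used in the table links, e.g. '5.5bpw-opt'."""
    return f"{bpw}bpw-opt" if optimized else f"{bpw}bpw"


def download_quant(bpw: str, optimized: bool = False, local_dir=None) -> str:
    """Download one quant branch and return the local path (hypothetical helper)."""
    # Imported lazily so the naming helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        revision=branch_for(bpw, optimized),
        local_dir=local_dir,
    )
```

For example, `download_quant("4.0")` would fetch the `4.0bpw` branch; check the Size column first, since each download ranges from 84 GB to 192 GB.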
### Metrics explanation
- **KL-divergence**: Measures how far the quantized model's next-token probability distribution is from the original model's. KL-divergence is asymmetric, so the table reports it in both argument orders. Lower is better (closer to original).
- **Perplexity**: Indicates how well the model predicts the next token. Lower values mean better prediction quality.
- **Top-K agreement**: Shows how often the quantized model's top-K predictions exactly match the original model's top-K predictions, including their order. Higher values indicate better preservation of the original model's behavior (1.0 = perfect match). Values of K between the two examples below are defined the same way:
  - **K=1**: The fraction of positions where both models pick the same token as their top-1 choice
  - **K=5**: The fraction of positions where both models have the same 5 tokens in their top-5 predictions, in exactly the same ranking order
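
The three metrics above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the evaluation code used to produce the table: the function names are hypothetical, inputs are per-position probability lists, and the mapping between the table's two KL columns and the argument order of `kl_divergence` is not specified here.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), in nats. Asymmetric in p and q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def top_k_ids(probs, k):
    """Token ids of the k most probable tokens, most probable first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def top_k_agreement(orig_seq, quant_seq, k):
    """Fraction of positions whose ordered top-k token lists are identical."""
    matches = sum(top_k_ids(o, k) == top_k_ids(q, k) for o, q in zip(orig_seq, quant_seq))
    return matches / len(orig_seq)


def perplexity(probs_seq, target_ids):
    """exp of the mean negative log-probability assigned to the true next tokens."""
    nll = -sum(math.log(p[t]) for p, t in zip(probs_seq, target_ids)) / len(target_ids)
    return math.exp(nll)


# Toy data: two positions over a 3-token vocabulary (made-up numbers).
orig = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
quant = [[0.5, 0.2, 0.3], [0.2, 0.5, 0.3]]

print(top_k_agreement(orig, quant, k=1))  # both top-1 choices match
print(top_k_agreement(orig, quant, k=2))  # only the second position matches in order
print(kl_divergence(orig[0], quant[0]), kl_divergence(quant[0], orig[0]))
```

Because the top-K check compares ordered lists, agreement can only stay the same or fall as K grows, which matches the pattern in every row of the table.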