---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507/blob/main/LICENSE
base_model: Qwen/Qwen3-235B-A22B-Instruct-2507
base_model_relation: quantized
quantization: exl3
pipeline_tag: text-generation
tags:
- exl3
library_name: exllamav3
---

Quantization was performed using [exllamav3 v0.0.15](https://github.com/turboderp-org/exllamav3).

| Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
|-------|-----------|----------------------|----------------------|------------|-----------|-----------|-----------|-----------|-----------|
| [3.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/3.0bpw) | 84 | 0.16501465 | 0.20774904 | 4.08571518 | 0.8661 | 0.5977 | 0.3563 | 0.1950 | 0.1001 |
| [4.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/4.0bpw) | 111 | 0.04595388 | 0.04850649 | 3.72529002 | 0.9290 | 0.7547 | 0.5509 | 0.3743 | 0.2411 |
| [5.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/5.0bpw) | 138 | 0.01664260 | 0.01688963 | 3.67661701 | 0.9563 | 0.8392 | 0.6822 | 0.5256 | 0.3872 |
| [5.5bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/5.5bpw) | 151 | 0.01046010 | 0.01069444 | 3.65590399 | 0.9652 | 0.8699 | 0.7384 | 0.5977 | 0.4654 |
| [5.5bpw-opt](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/5.5bpw-opt) | 151 | 0.00949581 | 0.00957771 | 3.65956156 | 0.9665 | 0.8748 | 0.7451 | 0.6049 | 0.4731 |
| [6.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/6.0bpw) | 165 | 0.00739469 | 0.00749036 | 3.64918384 | 0.9705 | 0.8889 | 0.7720 | 0.6403 | 0.5140 |
| [6.23bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/6.23bpw) | 171 | 0.00564774 | 0.00568584 | 3.64926113 | 0.9735 | 0.8989 | 0.7912 | 0.6696 | 0.5487 |
| [6.23bpw-opt](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/6.23bpw-opt) | 171 | 0.00743874 | 0.00743475 | 3.65749682 | 0.9702 | 0.8866 | 0.7671 | 0.6351 | 0.5084 |
| [7.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/7.0bpw) | 192 | 0.00429302 | 0.00428297 | 3.64699539 | 0.9782 | 0.9143 | 0.8197 | 0.7095 | 0.5975 |
| original | 437 | - | - | 3.64520522 | - | - | - | - | - |
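
As a rough aid for choosing a branch from the table above, the sketch below picks the quant with the lowest KL-div (quant, orig) among those whose listed size fits a memory budget. The sizes and KL values are copied from the table; the function name `pick_quant` and the `reserve_gb` parameter (headroom for cache and activations) are hypothetical and not part of exllamav3. The listed size is treated as the only memory cost, which is an assumption, so real requirements will be higher.

```python
# (branch name, size in GB as listed, KL-div (quant, orig)), copied from the table.
QUANTS = [
    ("3.0bpw", 84, 0.16501465),
    ("4.0bpw", 111, 0.04595388),
    ("5.0bpw", 138, 0.01664260),
    ("5.5bpw", 151, 0.01046010),
    ("5.5bpw-opt", 151, 0.00949581),
    ("6.0bpw", 165, 0.00739469),
    ("6.23bpw", 171, 0.00564774),
    ("6.23bpw-opt", 171, 0.00743874),
    ("7.0bpw", 192, 0.00429302),
]


def pick_quant(budget_gb, reserve_gb=0.0):
    """Return the branch with the lowest KL-div that fits, or None if none fits.

    budget_gb:  total memory available for the model (hypothetical input).
    reserve_gb: memory to keep free for cache and activations (assumption).
    """
    usable = budget_gb - reserve_gb
    fitting = [q for q in QUANTS if q[1] <= usable]
    if not fitting:
        return None
    # Lower KL-divergence means closer to the original model.
    return min(fitting, key=lambda q: q[2])[0]


if __name__ == "__main__":
    for budget in (96, 160, 192):
        print(budget, "GB ->", pick_quant(budget))
```

Selecting by KL-divergence rather than by nominal bits per weight matters here because two branches can share a size while differing in measured quality.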

The 5.5bpw-opt and 6.23bpw-opt quantizations were created by combining the 5.0bpw, 6.0bpw, and 7.0bpw measurements to optimize layer-wise bit allocation.

Full measurement data: [json](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/blob/main/Qwen3-235B-A22B-Instruct-2507_measurements_5vs6vs7.json)

### Metrics explanation

- **KL-divergence**: Measures how far the quantized model's output probability distribution is from the original model's. KL-divergence is asymmetric, so the table reports both directions. Lower is better (closer to the original).
- **Perplexity**: Indicates how well the model predicts the next token. Lower values mean better prediction quality.
- **Top-K agreement**: Shows how often the quantized model's top-K predictions exactly match the original model's. Higher values indicate better preservation of the original model's behavior (1.0 = perfect match):
  - **K=1**: The fraction of token positions where both models choose the same token as their top-1 prediction
  - **K=5**: The fraction of token positions where both models have the same 5 tokens in their top-5 predictions, in exactly the same ranking order
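
The metrics above can be illustrated with a minimal, pure-Python sketch on toy distributions. This is not exllamav3's evaluation code: the function names, the 4-token vocabulary, and the probabilities are all made up for illustration, and a real evaluation would run over model outputs on a test set.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two probability vectors of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def perplexity(token_log_probs):
    """exp of the mean negative log-probability assigned to the true tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def top_k_ids(probs, k):
    """Token ids of the k most probable tokens, most probable first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def top_k_agreement(orig_seq, quant_seq, k):
    """Fraction of positions whose ordered top-k token ids are identical."""
    matches = sum(
        top_k_ids(p, k) == top_k_ids(q, k) for p, q in zip(orig_seq, quant_seq)
    )
    return matches / len(orig_seq)


# Toy next-token distributions at two positions (illustrative values only).
orig = [[0.7, 0.2, 0.06, 0.04], [0.1, 0.6, 0.25, 0.05]]
quant = [[0.6, 0.3, 0.04, 0.06], [0.2, 0.5, 0.25, 0.05]]

if __name__ == "__main__":
    print("KL(quant || orig):", kl_divergence(quant[0], orig[0]))
    print("KL(orig || quant):", kl_divergence(orig[0], quant[0]))
    for k in (1, 2, 3):
        print(f"top-{k} agreement:", top_k_agreement(orig, quant, k))
```

Because the comparison is on ordered lists of token ids, agreement can only stay equal or drop as K grows, which matches the pattern in the table.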