	---
	license: apache-2.0
	license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507/blob/main/LICENSE
	base_model: Qwen/Qwen3-235B-A22B-Instruct-2507
	base_model_relation: quantized
	quantization: exl3
	pipeline_tag: text-generation
	tags:
	- exl3
	library_name: exllamav3
	---

	Quantization was performed using [exllamav3 v0.0.15](https://github.com/turboderp-org/exllamav3).

	| Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
	|-------|-----------|----------------------|----------------------|------------|-----------|-----------|-----------|-----------|-----------|
	| [3.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/3.0bpw) | 84 | 0.16501465 | 0.20774904 | 4.08571518 | 0.8661 | 0.5977 | 0.3563 | 0.1950 | 0.1001 |
	| [4.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/4.0bpw) | 111 | 0.04595388 | 0.04850649 | 3.72529002 | 0.9290 | 0.7547 | 0.5509 | 0.3743 | 0.2411 |
	| [5.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/5.0bpw) | 138 | 0.01664260 | 0.01688963 | 3.67661701 | 0.9563 | 0.8392 | 0.6822 | 0.5256 | 0.3872 |
	| [5.5bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/5.5bpw) | 151 | 0.01046010 | 0.01069444 | 3.65590399 | 0.9652 | 0.8699 | 0.7384 | 0.5977 | 0.4654 |
	| [5.5bpw-opt](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/5.5bpw-opt) | 151 | 0.00949581 | 0.00957771 | 3.65956156 | 0.9665 | 0.8748 | 0.7451 | 0.6049 | 0.4731 |
	| [6.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/6.0bpw) | 165 | 0.00739469 | 0.00749036 | 3.64918384 | 0.9705 | 0.8889 | 0.7720 | 0.6403 | 0.5140 |
	| [6.23bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/6.23bpw) | 171 | 0.00564774 | 0.00568584 | 3.64926113 | 0.9735 | 0.8989 | 0.7912 | 0.6696 | 0.5487 |
	| [6.23bpw-opt](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/6.23bpw-opt) | 171 | 0.00743874 | 0.00743475 | 3.65749682 | 0.9702 | 0.8866 | 0.7671 | 0.6351 | 0.5084 |
	| [7.0bpw](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/tree/7.0bpw) | 192 | 0.00429302 | 0.00428297 | 3.64699539 | 0.9782 | 0.9143 | 0.8197 | 0.7095 | 0.5975 |
	| original | 437 | - | - | 3.64520522 | - | - | - | - | - |

	The 5.5bpw-opt and 6.23bpw-opt quantizations were created using a combination of the 5.0bpw, 6.0bpw, and 7.0bpw measurements to optimize layer-wise bit allocation.

	Full measurement data: [json](https://huggingface.co/NeuroSenko/Qwen3-235B-A22B-Instruct-2507-exl3/blob/main/Qwen3-235B-A22B-Instruct-2507_measurements_5vs6vs7.json)

	### Metrics explanation

	- KL-divergence: Measures how far the quantized model's next-token probability distribution is from the original model's. KL divergence is asymmetric, so the table reports both directions. Lower is better (0 = identical distributions).
	- Perplexity: Indicates how well the model predicts the next token. Lower values mean better prediction quality; the `original` row gives the unquantized baseline.
	- Top-K agreement: The fraction of positions where the quantized model's top-K predictions exactly match the original model's. Higher values indicate better preservation of the original model's behavior (1.0 = perfect match):
	  - K=1: The fraction of positions where both models predict the same token as their top-1 choice.
	  - K=5: The fraction of positions where both models have the same 5 tokens in their top-5 predictions, in the same ranking order.
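
The metrics above can be illustrated with a minimal sketch in plain Python. This is not exllamav3's evaluation code: the function names (`kl_divergence`, `perplexity`, `top_k_agreement`) and the toy logits are made up for illustration, and the mapping of argument order to the two KL-div columns is an assumption.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats. Asymmetric: KL(p || q) != KL(q || p) in general."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def perplexity(correct_token_probs):
    """exp of the mean negative log-probability assigned to the true tokens."""
    nll = -sum(math.log(p) for p in correct_token_probs) / len(correct_token_probs)
    return math.exp(nll)


def top_k_indices(probs, k):
    """Token ids of the k most likely tokens, most likely first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def top_k_agreement(orig_dists, quant_dists, k):
    """Fraction of positions whose top-k token ids match in the same order."""
    matches = sum(
        top_k_indices(p, k) == top_k_indices(q, k)
        for p, q in zip(orig_dists, quant_dists)
    )
    return matches / len(orig_dists)


# Toy example: two positions, vocabulary of three tokens (hypothetical logits).
orig = [softmax([2.0, 1.0, 0.1]), softmax([0.5, 2.5, 1.0])]
quant = [softmax([1.9, 1.1, 0.0]), softmax([0.4, 1.2, 2.4])]

print("KL(orig || quant):", sum(kl_divergence(p, q) for p, q in zip(orig, quant)) / 2)
print("KL(quant || orig):", sum(kl_divergence(q, p) for p, q in zip(orig, quant)) / 2)
print("Top-1 agreement:", top_k_agreement(orig, quant, 1))
print("Top-2 agreement:", top_k_agreement(orig, quant, 2))
```

In this toy data the two models agree on the first position and disagree on the top token of the second, so both the top-1 and top-2 agreement come out to one half. Because agreement at a larger K requires the whole ordered prefix to match, it can never exceed agreement at a smaller K, which is consistent with the values in the table falling from K=1 to K=5.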
