---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507/blob/main/LICENSE
base_model: Qwen/Qwen3-235B-A22B-Instruct-2507
base_model_relation: quantized
quantization: exl3
pipeline_tag: text-generation
tags:
  - exl3
library_name: exllamav3
---

Quantization was performed using exllamav3 v0.0.15.

| Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
|---|---|---|---|---|---|---|---|---|---|
| 3.0bpw | 84 | 0.16501465 | 0.20774904 | 4.08571518 | 0.8661 | 0.5977 | 0.3563 | 0.1950 | 0.1001 |
| 4.0bpw | 111 | 0.04595388 | 0.04850649 | 3.72529002 | 0.9290 | 0.7547 | 0.5509 | 0.3743 | 0.2411 |
| 5.0bpw | 138 | 0.01664260 | 0.01688963 | 3.67661701 | 0.9563 | 0.8392 | 0.6822 | 0.5256 | 0.3872 |
| 5.5bpw | 151 | 0.01046010 | 0.01069444 | 3.65590399 | 0.9652 | 0.8699 | 0.7384 | 0.5977 | 0.4654 |
| 5.5bpw_opt | 151 | 0.00949581 | 0.00957771 | 3.65956156 | 0.9665 | 0.8748 | 0.7451 | 0.6049 | 0.4731 |
| 6.0bpw | 165 | 0.00739469 | 0.00749036 | 3.64918384 | 0.9705 | 0.8889 | 0.7720 | 0.6403 | 0.5140 |
| 6.23bpw | 171 | 0.00564774 | 0.00568584 | 3.64926113 | 0.9735 | 0.8989 | 0.7912 | 0.6696 | 0.5487 |
| 6.23bpw_opt | 171 | 0.00743874 | 0.00743475 | 3.65749682 | 0.9702 | 0.8866 | 0.7671 | 0.6351 | 0.5084 |
| 7.0bpw | 192 | 0.00429302 | 0.00428297 | 3.64699539 | 0.9782 | 0.9143 | 0.8197 | 0.7095 | 0.5975 |
| original | 437 | - | - | 3.64520522 | - | - | - | - | - |

The 5.5bpw_opt and 6.23bpw_opt quantizations were created by combining the measurements from the 5.0bpw, 6.0bpw, and 7.0bpw quantizations to optimize the layer-wise bit allocation.

Full measurements data: json

Metrics explanation

KL-divergence: Measures how far the quantized model's next-token probability distribution is from the original model's. KL-divergence is asymmetric, so the table reports it in both directions. Lower is better (0 = identical distributions).
Perplexity: Indicates how well the model predicts the next token. Lower is better; the closer a quantization's value is to the original model's 3.6452, the less prediction quality was lost.
Top-K agreement: Shows how often the quantized model's top-K predictions exactly match the original model's, including their order. Higher values indicate better preservation of the original model's behavior (1.0 = perfect match):
- K=1: The fraction of token positions where both models pick the same token as their top-1 choice
- K=5: The fraction of token positions where both models have the same 5 tokens in their top-5 predictions, in the exact same ranking order
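
The three metrics above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used to produce the table: the function names, the toy logits, and the target ids are all made up, and it assumes that "KL-div (quant, orig)" means KL(quant || orig) averaged over token positions.

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_divergence(logits_p, logits_q):
    """KL(P || Q) for a single token position, in nats."""
    lp = log_softmax(logits_p)
    lq = log_softmax(logits_q)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))

def perplexity(logits_per_position, target_ids):
    """exp of the mean negative log-likelihood of the actual next tokens."""
    nll = [-log_softmax(l)[t] for l, t in zip(logits_per_position, target_ids)]
    return math.exp(sum(nll) / len(nll))

def top_k_ids(logits, k):
    """Token ids of the k largest logits, best first."""
    return sorted(range(len(logits)), key=lambda i: -logits[i])[:k]

def top_k_agreement(logits_a, logits_b, k):
    """Fraction of positions where both models rank the same k tokens in the same order."""
    hits = sum(top_k_ids(a, k) == top_k_ids(b, k) for a, b in zip(logits_a, logits_b))
    return hits / len(logits_a)

# Hypothetical toy data: 3 token positions, vocabulary of 4 tokens.
orig = [[2.0, 1.0, 0.1, -1.0], [0.5, 2.5, 0.0, 1.0], [1.0, 0.9, 3.0, 0.2]]
quant = [[1.9, 1.1, 0.0, -0.9], [0.4, 2.4, 1.1, 1.0], [1.0, 1.2, 2.8, 0.1]]
targets = [0, 1, 2]  # made-up "correct" next tokens

n = len(orig)
# Assumed direction convention: (quant, orig) = KL(quant || orig).
kl_quant_orig = sum(kl_divergence(q, o) for q, o in zip(quant, orig)) / n
kl_orig_quant = sum(kl_divergence(o, q) for q, o in zip(quant, orig)) / n

print("KL-div (quant, orig):", kl_quant_orig)
print("KL-div (orig, quant):", kl_orig_quant)
print("Perplexity (orig): ", perplexity(orig, targets))
print("Perplexity (quant):", perplexity(quant, targets))
for k in (1, 2):
    print(f"Top-K K={k}:", top_k_agreement(orig, quant, k))
```

In this toy data both models agree on the top-1 token at every position, but only one of the three positions keeps the same top-2 ordering, so agreement drops from 1.0 at K=1 to 1/3 at K=2. The same effect explains why the Top-K columns in the table fall quickly as K grows: an exact ordered match over more tokens is a much stricter test.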