Qwen3.5 Unsloth GGUF Evaluation Results

#33
by danielhanchen - opened

Third-party results conducted by Benjamin Marie:

Run the model locally via GGUFs here: https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF

"I tested Unsloth's UD Q4 and Q3 GGUF quantizations of Qwen3.5-397B-A17B and they both performed very well.
In my runs, I didn’t observe a meaningful difference between the original weights and Q3 (less than 1 point of accuracy difference, so only a ~3.5% relative error increase).
You can cut on the order of ~500 GB of memory footprint while seeing little to no practical degradation (at least on the tasks I tried)."

Note that the 3-bit quant scoring slightly higher than the 4-bit is within the normal margin of error.
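To see how a "less than 1 point" accuracy drop maps to a "~3.5% relative error increase", here is a small sketch. The baseline accuracy of 71.5% is a hypothetical number chosen only to make the arithmetic line up with the quoted figures; the actual benchmark scores are in the chart above.

```python
def relative_error_increase(baseline_acc: float, quant_acc: float) -> float:
    """Relative increase (%) in error rate when accuracy drops from
    baseline_acc to quant_acc (both in percentage points)."""
    base_err = 100.0 - baseline_acc   # baseline error rate
    quant_err = 100.0 - quant_acc     # quantized error rate
    return (quant_err - base_err) / base_err * 100.0

# Hypothetical: a 1-point accuracy drop from 71.5% -> 70.5%
# is a ~3.5% relative increase in error rate.
print(round(relative_error_increase(71.5, 70.5), 1))  # → 3.5
```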


explain pls

To run models locally you need GGUFs, which are quantized versions of the original weights. Benjamin shows that Unsloth's GGUFs of Qwen3.5 perform very well, nearly matching the full-precision model even at 3-bit or 4-bit.
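For reference, a minimal sketch of running one of these GGUFs locally with llama.cpp (assuming `llama-cli` is built and on your PATH; the `UD-Q4_K_XL` quant tag is illustrative — pick whichever quant from the repo fits your memory budget):

```shell
# Download and run the Unsloth GGUF directly from the Hugging Face repo.
# -hf pulls the file on first use and caches it locally.
llama-cli -hf unsloth/Qwen3.5-397B-A17B-GGUF:UD-Q4_K_XL \
    -p "Hello, introduce yourself." \
    --ctx-size 8192
```

Note that a 4-bit quant of a 397B-parameter model still needs a very large amount of RAM/VRAM, so check the file sizes in the repo before downloading.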

Is there a way to run the quantized gguf in sglang? Documentation on unsloth seems outdated. In particular, I am very interested in mxfp4 quantization.
