Guide to Run Qwen3.5 locally!
#14 · by danielhanchen
Hey guys, we made a guide to run Qwen3.5 locally on your own device.
Run the 3-bit quant on a 192GB RAM Mac, or 4-bit (MXFP4) on an M3 Ultra with 256GB RAM (or less).
Guide: https://unsloth.ai/docs/models/qwen3.5
GGUF: https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF
Let us know if you have any questions!
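For anyone who wants a concrete starting point, here is a minimal sketch of serving one of these GGUFs with llama.cpp's `llama-server`. The quant tag (`UD-Q4_K_XL`) and the flag values are assumptions for illustration; check the linked guide for the exact repo/quant names and recommended settings for your machine.

```shell
# Minimal sketch: serve a 4-bit Qwen3.5 GGUF with llama-server (llama.cpp).
# Assumptions: the :UD-Q4_K_XL quant tag exists in the repo; adjust to the
# quant you actually downloaded. -hf pulls the model from Hugging Face.

llama-server \
  -hf unsloth/Qwen3.5-397B-A17B-GGUF:UD-Q4_K_XL \
  --ctx-size 16384 \
  --port 8080
```

Once the server is up, point Roo Code (or any OpenAI-compatible client) at `http://localhost:8080/v1`.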

Thank you!
I was wondering if you could help explain (and maybe recommend) which 4-bit quant would be best for my use case: Apple Silicon Mac Studio 512GB -> llama-server -> Roo Code.
- Which 4-bit quant is best for accuracy?
- Which is best for speed?
- Is there one that offers the best of both for my Apple Silicon use case?
(Knowing that on your blog, you often say: "We use the UD-Q4_K_XL quant for the best size/accuracy balance")
IQ4_XS
Q4_K_S
IQ4_NL
Q4_0
Q4_1
Q4_K_M
Q4_K_XL
MXFP4
Would this answer change model-to-model?
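Since relative speed between quants does vary with hardware and model, one practical approach is to just measure it yourself with llama.cpp's `llama-bench` on the candidates you've downloaded. The filenames below are placeholders for whichever quants you're comparing:

```shell
# Sketch: benchmark prompt processing (pp) and token generation (tg)
# for each candidate quant. Filenames are placeholders.
for m in Qwen3.5-IQ4_XS.gguf Qwen3.5-Q4_K_M.gguf Qwen3.5-MXFP4.gguf; do
  llama-bench -m "$m" -p 512 -n 128
done
```

This gives you tokens/sec per quant on your own Mac, which answers the speed question more reliably than general rules of thumb.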
Convert to MLX
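If the goal is MLX rather than GGUF, note that `mlx_lm.convert` quantizes from the original Hugging Face safetensors weights, not from a GGUF. A hedged sketch (the source repo name is an assumption; use the actual Qwen3.5 HF repo):

```shell
# Sketch: convert + 4-bit quantize to MLX with mlx-lm (pip install mlx-lm).
# --hf-path is a placeholder for the real upstream repo.
mlx_lm.convert \
  --hf-path Qwen/Qwen3.5-397B-A17B \
  -q --q-bits 4 \
  --mlx-path ./qwen3.5-mlx-4bit
```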
Will this model outperform the Qwen3-Next-80B thinking model at Q6?
I hope we can come up with a way to run this on 98GB of RAM and 24GB VRAM...