Guide to Run Qwen3.5 locally!
#14 · by danielhanchen
Hey guys, we made a guide to run Qwen3.5 locally on your own device.
Run the 3-bit quant on a 192GB RAM Mac, or 4-bit (MXFP4) on an M3 Ultra with 256GB RAM (or less).
Guide: https://unsloth.ai/docs/models/qwen3.5
GGUF: https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF
Let us know if you have any questions!
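For anyone who wants a concrete starting point, here is a minimal sketch of serving one of these GGUFs with llama.cpp's `llama-server`. The quant tag (`UD-Q4_K_XL`) and the flag values are assumptions for illustration; check the linked guide for the exact repo/quant names and recommended settings for your machine.

```shell
# Minimal sketch: serve a 4-bit Qwen3.5 GGUF with llama-server (llama.cpp).
# Assumptions: the :UD-Q4_K_XL quant tag exists in the repo; adjust to the
# quant you actually downloaded. -hf pulls the model from Hugging Face.

llama-server \
  -hf unsloth/Qwen3.5-397B-A17B-GGUF:UD-Q4_K_XL \
  --ctx-size 16384 \
  --port 8080
```

Once the server is up, point Roo Code (or any OpenAI-compatible client) at `http://localhost:8080/v1`.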

Thank you!
I was wondering if you could help explain (and maybe recommend) which 4-bit quant would be best for my use case: Apple Silicon Mac Studio 512GB -> llama-server -> Roo Code.
- Which 4-bit quant is best for accuracy?
- Which is best for speed?
- Is there one that offers the best of both for my Apple Silicon use case?
(Knowing that on your blog, you often say: "We use the UD-Q4_K_XL quant for the best size/accuracy balance")
IQ4_XS
Q4_K_S
IQ4_NL
Q4_0
Q4_1
Q4_K_M
Q4_K_XL
MXFP4
Would this answer change model-to-model?
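Since relative speed between quants does vary with hardware and model, one practical approach is to just measure it yourself with llama.cpp's `llama-bench` on the candidates you've downloaded. The filenames below are placeholders for whichever quants you're comparing:

```shell
# Sketch: benchmark prompt processing (pp) and token generation (tg)
# for each candidate quant. Filenames are placeholders.
for m in Qwen3.5-IQ4_XS.gguf Qwen3.5-Q4_K_M.gguf Qwen3.5-MXFP4.gguf; do
  llama-bench -m "$m" -p 512 -n 128
done
```

This gives you tokens/sec per quant on your own Mac, which answers the speed question more reliably than general rules of thumb.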
Convert to MLX
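If the goal is MLX rather than GGUF, note that `mlx_lm.convert` quantizes from the original Hugging Face safetensors weights, not from a GGUF. A hedged sketch (the source repo name is an assumption; use the actual Qwen3.5 HF repo):

```shell
# Sketch: convert + 4-bit quantize to MLX with mlx-lm (pip install mlx-lm).
# --hf-path is a placeholder for the real upstream repo.
mlx_lm.convert \
  --hf-path Qwen/Qwen3.5-397B-A17B \
  -q --q-bits 4 \
  --mlx-path ./qwen3.5-mlx-4bit
```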
Will this model outperform the Qwen3-Next-80B thinking model at Q6?
I hope we can come up with a way to run this on 98GB of RAM and 24GB VRAM...