Need help quantizing Qwen3-VL 4B Instruct to AWQ or GPTQ (and compatible library versions)

#12 · opened by harsh-it

Hi everyone,

I’m trying to quantize Qwen3-VL 4B Instruct into AWQ or GPTQ format for efficient inference.

I've already tried AutoAWQ and LLMCompressor, but since I was running on Kaggle, I ran into dependency and GPU memory issues.
I know that basic 4-bit quantization via bitsandbytes works fine, but I want to properly convert the model to AWQ or GPTQ so that it can be loaded later with auto-gptq or lmdeploy.
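For context, the bitsandbytes path that already works for me looks roughly like this (the model id and the AutoModelForImageTextToText auto class are what I'm assuming for Qwen3-VL; adjust if your transformers version maps the model differently):

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

# Model id as I'm assuming it appears on the Hub; adjust to your checkpoint.
MODEL_ID = "Qwen/Qwen3-VL-4B-Instruct"

# On-the-fly NF4 quantization at load time; nothing is written back to disk
# in AWQ/GPTQ format, which is exactly the limitation I'm trying to get past.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)
```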

Can anyone please share:

- A working method or script for quantizing this specific model to AWQ/GPTQ (I've put a sketch of my llmcompressor attempt below for reference)
- The compatible versions of libraries (torch, transformers, auto-gptq, autoawq, etc.) that are known to work together
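And this is roughly the llmcompressor one-shot GPTQ flow I was attempting before hitting the Kaggle issues, just so you can see where I'm stuck. It's only a sketch: the import path for oneshot, the exact oneshot arguments, the open_platypus calibration set, and especially the ignore patterns for the vision tower are my guesses, not a confirmed recipe for this model:

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Newer llmcompressor releases export oneshot at the top level;
# older ones expose it as llmcompressor.transformers.oneshot.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "Qwen/Qwen3-VL-4B-Instruct"          # the checkpoint I'm trying to quantize
SAVE_DIR = "Qwen3-VL-4B-Instruct-GPTQ-W4A16"    # output directory (name is arbitrary)

model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# W4A16 GPTQ on the language-model Linear layers only. The regexes that skip
# the vision tower are guesses and may need to match Qwen3-VL's real module names.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head", "re:visual.*", "re:model.visual.*"],
)

# Text-only calibration set from llmcompressor's built-in dataset registry;
# a multimodal calibration set is probably better for a VL model.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Saves compressed-tensors weights (vLLM can load these; I'm not sure lmdeploy can).
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
```

If there's a better way to handle the vision tower during calibration, or if the resulting compressed-tensors output won't load in lmdeploy, I'd really appreciate hearing that too.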

Thanks in advance for any pointers or examples!
