Need help quantizing Qwen3-VL 4B Instruct to AWQ or GPTQ (and compatible library versions)
#12, opened by harsh-it
Hi everyone,
I’m trying to quantize Qwen3-VL 4B Instruct into AWQ or GPTQ format for efficient inference.
I’ve already tried AutoAWQ and LLMCompressor, but I ran into dependency and GPU-memory issues running them on Kaggle.
I know that basic 4-bit quantization via bitsandbytes works fine, but I want to properly convert the model to AWQ or GPTQ so it can later be loaded with auto-gptq or lmdeploy.
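For context, the bitsandbytes setup that does load fine for me looks roughly like this. The repo id and the generic image-text-to-text auto class are my assumptions; the dedicated Qwen3-VL model class may be needed depending on the transformers version:

```python
# Minimal sketch: 4-bit NF4 load via bitsandbytes (this part works for me).
# Assumptions: repo id "Qwen/Qwen3-VL-4B-Instruct" and the generic auto class.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

model_id = "Qwen/Qwen3-VL-4B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

This only quantizes the weights at load time, though; what I want is an actual AWQ/GPTQ checkpoint saved to disk.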
Can anyone please share:
- A working method or script for quantizing this specific model to AWQ/GPTQ (a rough sketch of my LLMCompressor attempt follows below)
- The compatible versions of libraries (torch, transformers, auto-gptq, autoawq, etc.) that are known to work together
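For reference, here is roughly the LLMCompressor GPTQ (W4A16) recipe I was attempting before hitting the memory issues. It is a sketch under my own assumptions: the `oneshot` import path depends on the llmcompressor version, `open_platypus` is a text-only calibration set, and the ignore patterns for keeping the vision tower unquantized are guesses for this architecture:

```python
# Rough sketch of my LLMCompressor GPTQ (W4A16) attempt; not verified end-to-end.
# Assumptions: oneshot importable from the package root (recent llmcompressor),
# and ignore patterns that actually match Qwen3-VL's vision modules.
from transformers import AutoModelForImageTextToText, AutoProcessor
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

model_id = "Qwen/Qwen3-VL-4B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Quantize only the language-model Linear layers; keep lm_head and the
# vision tower in full precision (regex is my guess for this model).
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head", "re:visual.*", "re:model.visual.*"],
)

oneshot(
    model=model,
    dataset="open_platypus",  # text-only calibration data
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=256,
)

save_dir = "Qwen3-VL-4B-Instruct-W4A16-GPTQ"
model.save_pretrained(save_dir, save_compressed=True)
processor.save_pretrained(save_dir)
```

As far as I understand, most AWQ/GPTQ exports of VL models leave the vision encoder unquantized, which is why the ignore list is there, but I may have the module names wrong.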
Thanks in advance for any pointers or examples!