Need help quantizing Qwen3-VL 4B Instruct to AWQ or GPTQ (and compatible library versions)

#12 · opened by harsh-it

Hi everyone,

I’m trying to quantize Qwen3-VL 4B Instruct into AWQ or GPTQ format for efficient inference.

I've already tried AutoAWQ and LLMCompressor, but since I was running on Kaggle, I ran into dependency and GPU memory issues.
I know that basic 4-bit quantization via bitsandbytes works fine, but I want to properly convert the model to AWQ or GPTQ so that it can be loaded later with auto-gptq or lmdeploy.
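For context, the bitsandbytes path that already works for me looks roughly like this (the model id and the AutoModelForImageTextToText auto class are what I'm assuming for Qwen3-VL; adjust if your transformers version maps the model differently):

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

# Model id as I'm assuming it appears on the Hub; adjust to your checkpoint.
MODEL_ID = "Qwen/Qwen3-VL-4B-Instruct"

# On-the-fly NF4 quantization at load time; nothing is written back to disk
# in AWQ/GPTQ format, which is exactly the limitation I'm trying to get past.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)
```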

Can anyone please share:

- A working method or script for quantizing this specific model to AWQ/GPTQ (I've put a sketch of my llmcompressor attempt below for reference)
- The compatible versions of libraries (torch, transformers, auto-gptq, autoawq, etc.) that are known to work together
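And this is roughly the llmcompressor one-shot GPTQ flow I was attempting before hitting the Kaggle issues, just so you can see where I'm stuck. It's only a sketch: the import path for oneshot, the exact oneshot arguments, the open_platypus calibration set, and especially the ignore patterns for the vision tower are my guesses, not a confirmed recipe for this model:

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Newer llmcompressor releases export oneshot at the top level;
# older ones expose it as llmcompressor.transformers.oneshot.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "Qwen/Qwen3-VL-4B-Instruct"          # the checkpoint I'm trying to quantize
SAVE_DIR = "Qwen3-VL-4B-Instruct-GPTQ-W4A16"    # output directory (name is arbitrary)

model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# W4A16 GPTQ on the language-model Linear layers only. The regexes that skip
# the vision tower are guesses and may need to match Qwen3-VL's real module names.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head", "re:visual.*", "re:model.visual.*"],
)

# Text-only calibration set from llmcompressor's built-in dataset registry;
# a multimodal calibration set is probably better for a VL model.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Saves compressed-tensors weights (vLLM can load these; I'm not sure lmdeploy can).
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
```

If there's a better way to handle the vision tower during calibration, or if the resulting compressed-tensors output won't load in lmdeploy, I'd really appreciate hearing that too.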

Thanks in advance for any pointers or examples!
