# Guidelines for Loading Qwen3 (GPTQ) Quantized Models
## Installation

Download the `GPTQ-for-Qwen_hf` folder.
## File Replacement

If you want to run the evaluations we provide, download the files in the `eval_my` directory on GitHub and note the "Attention" section in the README:

- Add the `eval_my` directory: place the `eval_my` directory under the `GPTQ-for-Qwen` directory (a quick layout check follows this list).
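If it helps, here is a minimal Python sanity check for the layout. The `GPTQ-for-Qwen` path below is an assumption; adjust it to wherever you actually downloaded the repository:

```python
import os

# Assumed location of the downloaded repository; adjust to your setup.
repo_root = "GPTQ-for-Qwen"
expected = os.path.join(repo_root, "eval_my")

if os.path.isdir(expected):
    print("eval_my is in the expected place:", expected)
else:
    print("eval_my not found at", expected, "- move it under", repo_root)
```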
## Load the Model

### Group-wise Quantization

1. Perform the GPTQ search:

   ```bash
   CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
       --wbits model_wbit --groupsize 128 \
       --load path_of_.pth
   ```

2. Evaluate the quantized model:

   ```bash
   CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
       --wbits model_wbit --groupsize 128 \
       --load path_of_.pth --eval
   ```
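For intuition about what `--groupsize 128` means, below is a minimal sketch of group-wise min-max quantization that assumes nothing about this repo's internals: each row of a weight matrix is split into groups of 128 input channels, and each group gets its own scale and zero-point. The actual GPTQ search additionally compensates rounding error using second-order statistics, which this sketch omits; all function names here are illustrative.

```python
import torch

def quantize_groupwise(w: torch.Tensor, wbits: int = 4, groupsize: int = 128):
    """Asymmetric min-max quantization with one (scale, zero) pair per group.
    Illustrative sketch only -- not this repo's implementation."""
    out_features, in_features = w.shape
    assert in_features % groupsize == 0, "in_features must divide evenly"
    # Split each output row into groups of `groupsize` input channels.
    g = w.reshape(out_features, in_features // groupsize, groupsize)
    wmin = g.min(dim=-1, keepdim=True).values
    wmax = g.max(dim=-1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / (2**wbits - 1)
    zero = torch.round(-wmin / scale)
    q = torch.clamp(torch.round(g / scale) + zero, 0, 2**wbits - 1)
    # Dequantize so we can inspect the error introduced by low-bit storage.
    deq = (q - zero) * scale
    return deq.reshape_as(w)

w = torch.randn(8, 256)
err = (w - quantize_groupwise(w, wbits=4, groupsize=128)).abs().max()
print(f"max abs rounding error: {err.item():.4f}")
```

Smaller groups track local weight statistics more closely (lower rounding error) at the cost of storing more scale/zero pairs.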
### Per-channel Quantization

1. Perform the GPTQ search:

   ```bash
   CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
       --wbits model_wbit --groupsize -1 \
       --load path_of_.pth
   ```

2. Evaluate the quantized model:

   ```bash
   CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
       --wbits model_wbit --groupsize -1 \
       --load path_of_.pth --eval
   ```
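With `--groupsize -1`, the "group" spans the entire input dimension, so there is exactly one scale and zero-point per output channel. Under the same illustrative assumptions as the group-wise sketch above:

```python
import torch

def quantize_per_channel(w: torch.Tensor, wbits: int = 4):
    """groupsize = -1: one (scale, zero) pair per output row. Illustrative."""
    wmin = w.min(dim=1, keepdim=True).values
    wmax = w.max(dim=1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / (2**wbits - 1)
    zero = torch.round(-wmin / scale)
    q = torch.clamp(torch.round(w / scale) + zero, 0, 2**wbits - 1)
    return (q - zero) * scale  # dequantized view of the weights

w = torch.randn(8, 256)
err = (w - quantize_per_channel(w, wbits=4)).abs().max()
print(f"max abs rounding error: {err.item():.4f}")
```

Per-channel quantization stores fewer quantization parameters than group-wise, but a single scale must cover the whole row, which typically raises the rounding error.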
## Notes

- You must pass the `wbits` and `groupsize` values the model was quantized with; otherwise, loading errors may occur.
- Set the `groupsize` parameter to `-1` for per-channel quantization.
- Make sure you have sufficient GPU memory to run a 32B-parameter model (see the rough estimate below).
- Check GitHub for more information.
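As a back-of-the-envelope reading of the memory note above (an estimate, not a measured figure), the weights of a 32B-parameter model alone occupy roughly the amounts below, before counting activations, the KV cache, and dequantization buffers:

```python
params = 32e9  # 32B parameters

# Weight storage alone: params * bits / 8 bytes, shown in GiB.
for wbits in (16, 8, 4):
    gib = params * wbits / 8 / 2**30
    print(f"{wbits:>2}-bit weights: ~{gib:.0f} GiB")
```

Even at 4 bits (~15 GiB of weights), leave headroom on the GPU for activations and the KV cache.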