# Guidelines for Loading Qwen3 (GPTQ) Quantized Models

## Installation Setup

Download the `GPTQ-for-Qwen_hf` folder.

## File Replacement

If you want to run the tests we provide, download the files in the `eval_my` directory on [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) and pay attention to the **"Attention"** section in the `README`:

- **Add the eval_my directory**: Place the `eval_my` directory under the `GPTQ-for-Qwen` directory.

## Load the Model

### Group-wise Quantization

#### 1. Perform GPTQ search

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize 128 \
    --load path_of_.pth
```

#### 2. Evaluate the quantized model

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize 128 \
    --load path_of_.pth --eval
```

### Per-channel Quantization

#### 1. Perform GPTQ search

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize -1 \
    --load path_of_.pth
```

#### 2. Evaluate the quantized model

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize -1 \
    --load path_of_.pth --eval
```

## Notes

- Pass the `wbits` and `groupsize` values that match the checkpoint you are loading; otherwise, loading errors may occur.
- Set the `groupsize` parameter to `-1` for per-channel quantization.
- Make sure you have sufficient GPU memory to run a 32B-sized model.
- Check [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) for more information.
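As a concrete illustration of matching the flags to the checkpoint, a 4-bit group-wise (`groupsize 128`) evaluation run might look like the sketch below. The script name, model path, and checkpoint filename are placeholders, not files shipped with the repository; substitute your own paths.

```bash
# Hypothetical example: evaluate a 4-bit, groupsize-128 checkpoint.
# qwen.py, /models/Qwen3-32B, and the .pth path are placeholders.
CUDA_VISIBLE_DEVICES=0 python qwen.py /models/Qwen3-32B \
    --wbits 4 --groupsize 128 \
    --load /checkpoints/qwen3-32b-4bit-128g.pth --eval
```

For a per-channel checkpoint, the same command would instead use `--groupsize -1`, again with `--wbits` set to whatever bit width the checkpoint was produced with.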