# Guidelines for Loading Qwen3 (GPTQ) Quantized Models

## Installation Setup

Download the `GPTQ-for-Qwen_hf` folder.

## File Replacement

If you want to run the tests we provide, download the files in the `eval_my` directory on [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) and pay attention to the **"Attention"** section in the `README`:

- **Add the eval_my directory**: Place the `eval_my` directory under the `GPTQ-for-Qwen` directory.

## Load the Model

### Group-wise Quantization

#### 1. Perform GPTQ search

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize 128 \
    --load path_of_.pth
```

#### 2. Evaluate the quantized model

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize 128 \
    --load path_of_.pth --eval
```

### Per-channel Quantization

#### 1. Perform GPTQ search

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize -1 \
    --load path_of_.pth
```

#### 2. Evaluate the quantized model

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize -1 \
    --load path_of_.pth --eval
```

## Notes

- Pass the `wbits` and `groupsize` values that match the checkpoint you are loading; otherwise, loading errors may occur.
- Set the `groupsize` parameter to `-1` for per-channel quantization.
- Make sure you have sufficient GPU memory to run a 32B-sized model.
- Check [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) for more information.
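As a concrete illustration of matching the flags to the checkpoint, a 4-bit group-wise (`groupsize 128`) evaluation run might look like the sketch below. The script name, model path, and checkpoint filename are placeholders, not files shipped with the repository; substitute your own paths.

```bash
# Hypothetical example: evaluate a 4-bit, groupsize-128 checkpoint.
# qwen.py, /models/Qwen3-32B, and the .pth path are placeholders.
CUDA_VISIBLE_DEVICES=0 python qwen.py /models/Qwen3-32B \
    --wbits 4 --groupsize 128 \
    --load /checkpoints/qwen3-32b-4bit-128g.pth --eval
```

For a per-channel checkpoint, the same command would instead use `--groupsize -1`, again with `--wbits` set to whatever bit width the checkpoint was produced with.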