ValueError: Weight output_partition_size = 16 is not divisible by weight quantization block_n = 128

#2
by ValeKnappich - opened

vLLM fails to serve the model with the current quantization config:

ValueError: Weight output_partition_size = 16 is not divisible by weight quantization block_n = 128

The error traces back to the quantization of ColumnParallelLinear. I am using tensor parallelism with tp=4, and a higher tp would only make output_partition_size smaller.
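To make the arithmetic concrete, here is a minimal sketch of the divisibility check as I understand it. This is not vLLM's actual code: the helper names are hypothetical, and the total output size of 64 is only inferred from the error message (16 per rank times tp=4), assuming the layer is sharded evenly across ranks.

```python
def output_partition_size(total_output_size: int, tp_size: int) -> int:
    """Per-rank output size, assuming the layer is split evenly across ranks."""
    if total_output_size % tp_size != 0:
        raise ValueError(
            f"total_output_size = {total_output_size} is not divisible "
            f"by tp_size = {tp_size}"
        )
    return total_output_size // tp_size


def is_block_aligned(partition_size: int, block_n: int) -> bool:
    """True if the per-rank shard is a whole number of quantization blocks."""
    return partition_size % block_n == 0


# Assumed value: inferred as 16 (per-rank size in the error) * 4 (tp).
TOTAL_OUTPUT_SIZE = 64
# Taken from the error message.
BLOCK_N = 128

for tp in (1, 2, 4, 8):
    part = output_partition_size(TOTAL_OUTPUT_SIZE, tp)
    print(f"tp={tp}: output_partition_size={part}, "
          f"aligned to block_n={BLOCK_N}: {is_block_aligned(part, BLOCK_N)}")
```

Under that assumption, the per-rank size is below block_n at every tp value in the loop, so lowering tp alone would not make this simplified check pass.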

Did you manage to deploy this with vLLM?