ValueError: Weight output_partition_size = 16 is not divisible by weight quantization block_n = 128
#2, opened by ValeKnappich
vLLM fails to serve the model with the current quantization config:
ValueError: Weight output_partition_size = 16 is not divisible by weight quantization block_n = 128
The error traces back to the quantization of ColumnParallelLinear. I am using tp=4, but with higher tp, output_partition_size would only get smaller.
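To illustrate the arithmetic behind the message, here is a minimal sketch. It is not vLLM's actual code: the helper names are hypothetical, and the total output size of 64 is only inferred from 16 * 4, assuming the layer is split evenly across tensor-parallel ranks.

```python
# Illustrative sketch only; this is NOT vLLM's implementation.
# Assumption: the failing layer has a total output size of 64 (16 * tp=4)
# and is split evenly across tensor-parallel ranks.

BLOCK_N = 128      # weight quantization block_n from the error message
OUTPUT_SIZE = 64   # assumed: output_partition_size (16) * tp (4)


def partition_size(output_size: int, tp: int) -> int:
    """Per-rank output size when a column-parallel layer is split evenly."""
    if output_size % tp != 0:
        raise ValueError(f"output_size={output_size} is not divisible by tp={tp}")
    return output_size // tp


def is_block_aligned(partition: int, block_n: int) -> bool:
    """True if the per-rank partition is a whole number of quantization blocks."""
    return partition % block_n == 0


for tp in (1, 2, 4, 8):
    part = partition_size(OUTPUT_SIZE, tp)
    print(f"tp={tp}: output_partition_size={part}, "
          f"divisible by block_n={BLOCK_N}: {is_block_aligned(part, BLOCK_N)}")
```

Under these assumptions no tp value above 1 yields a partition that is a multiple of 128, which matches the observation that raising tp only shrinks output_partition_size.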
Did you manage to deploy this with vLLM?