IQ2_KS works

#7
by coughmedicine - opened

but IMHO not as good as IQ1_KT of GLM 4.6 or 4.7.

Interesting, I'd imagine that for agentic use and tool calls MiniMax-M2.5 might be the better fit?

Though M2.5 does not have any shared experts (shexp) nor dense FFN layers; it is only attention plus routed experts. So with fewer active parameters, and a larger share of those active parameters sitting in the heavily quantized routed-expert tensors, it could be that M2.5 doesn't handle heavy quantization as well as GLM does.
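To make that argument concrete, here's a rough sketch of the effective bits-per-weight of the *active* forward path, assuming attention tensors are kept at higher precision than routed experts. The bit widths and the active-parameter split below are purely hypothetical placeholders, not the actual quant recipe or M2.5's real architecture numbers:

```python
def active_path_bpw(tensor_groups):
    """Weighted-average BPW over the parameters active per token.

    tensor_groups: dict mapping group name -> (active_params, bpw)
    """
    total_bits = sum(n * bpw for n, bpw in tensor_groups.values())
    total_params = sum(n for n, _ in tensor_groups.values())
    return total_bits / total_params

# Hypothetical split: 5B active attention params at 4 BPW,
# 10B active routed-expert params at 2 BPW (illustrative numbers only).
avg = active_path_bpw({"attn": (5e9, 4.0), "routed_exps": (10e9, 2.0)})
print(f"effective active-path BPW: {avg:.3f}")
```

The point being: with no shexp or dense layers to keep at higher precision, more of the active path inherits the aggressive expert quantization, dragging the effective BPW of what actually runs per token down further than the headline file BPW might suggest.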

And I guess with GLM-4.7 smol-IQ1_KT at 82.442 GiB (1.976 BPW) you can probably fit more context, since it uses MLA, so the KV cache is quite efficient.
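As a sanity check on those numbers, the file size and BPW imply the total parameter count (roughly, since some tensors are quantized at different bit widths and there is metadata overhead):

```python
GIB = 1024**3  # GiB in bytes

file_size_gib = 82.442  # smol-IQ1_KT file size from above
bpw = 1.976             # reported bits per weight

# total bits in the file / bits per weight ~= parameter count
params = file_size_gib * GIB * 8 / bpw
print(f"implied total params: {params / 1e9:.1f}B")
```

That lands in the ~358B range, which lines up with the GLM ~355B-class total parameter count, give or take the mixed-precision tensors.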

If only GLM-5 were not so chonky! oof...
