IQ2_M performs surprisingly well
Use case: fooling around with writing / RP.
- When thinking is disabled, it's too focused on the moment (it addresses the most recent input rather than catching the nuances of what happened before).
? Sometimes (rarely) it mixes up words / concepts: the added perplexity is noticeable, although fixable with re-generations.
+ Enabling thinking addresses the first issue and generally makes the model get its act together, even recalling some unexpected details (e.g. character lore not present in the profile or lorebook).
Overall, IQ2_M is a little short of being impressive.
Update.
So, about that part -- "Sometimes (rarely) it mixes up words / concepts" -- either I'm going crazy, or raising the temperature eliminates this issue completely.
Anyway, I'm scoring it as truly impressive now, at least in RP. I still wish there were an in-between variant (around 3 BPW, no larger than 130-135 GB), since IQ4_XS is just too big.
For the ~3 BPW range, I'd suggest @ubergarm's quants: https://huggingface.co/ubergarm/GLM-4.7-GGUF
He's got a smol-IQ2_KS at 99.237 GiB (2.379 BPW) and an IQ2_KL at 129.279 GiB (3.099 BPW) that might be suitable.
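As a rough sanity check on those sizes, a quant's file size follows from total parameters × bits per weight. This is just a sketch: the ~355B total parameter count is my assumption for a GLM-4.x-class model, and real GGUF files run slightly larger because some tensors (embeddings, output head) are kept at higher precision.

```python
# Rough GGUF size estimate: total params * bits-per-weight / 8 -> bytes -> GiB.
# Assumption: ~355B total parameters; actual files differ slightly because
# some tensors stay at higher precision than the headline BPW.
def est_size_gib(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 2**30

N = 355e9  # assumed total parameter count

print(f"~2.379 BPW -> {est_size_gib(N, 2.379):.1f} GiB")  # ballpark of the listed 99.237 GiB
print(f"~3.099 BPW -> {est_size_gib(N, 3.099):.1f} GiB")  # ballpark of the listed 129.279 GiB
```

Both land within a couple of GiB of the listed sizes, which is why the ~3.1 BPW IQ2_KL comes in just under that 130-135 GB ceiling.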