IQ2_M performs surprisingly nice

#1
by AutisticPancake - opened

Use case: fooling around with writing / RP.

- With thinking disabled, it's too focused on the moment (it addresses the most recent input and misses the nuances of what happened before).
? Sometimes (rarely) it mixes up words / concepts: the confusion is noticeable, although treatable with regenerations.
+ Enabling thinking counters the 1st issue and generally makes the model get its shit together, even recalling some unexpected details (e.g. character lore not present in the profile or lorebook).

Overall, IQ2_M is a little short of being impressive.

Update.

So, about that part -- Sometimes (rarely) it mixes up words / concepts -- either I'm going crazy or raising the temperature eliminates this issue completely.
Anyway, scoring it as truly impressive now, at least in RP. I still wish there were an in-between variant (around 3bpw, no larger than 130-135 GB), since IQ4_XS is just too big.

AutisticPancake changed discussion title from IQ2_M performs surprisingly nice (with some caveats). to IQ2_M performs surprisingly nice

For the ~3bpw range, I'd suggest @ubergarm's quants: https://huggingface.co/ubergarm/GLM-4.7-GGUF

He's got a smol-IQ2_KS (99.237 GiB, 2.379 BPW) and an IQ2_KL (129.279 GiB, 3.099 BPW) that might be suitable.
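
In case it helps when comparing against a 130-135 GB budget: the sizes above are in GiB, so here is a rough sketch that converts them to decimal GB and back-computes the weight count implied by each size/BPW pair. The helper names are made up for illustration, and it assumes the whole file is weights (metadata overhead ignored), so treat the output as a ballpark only.

```python
GIB = 2**30  # bytes per GiB (binary); 1 GB is taken as 1e9 bytes


def gib_to_gb(size_gib: float) -> float:
    """Convert a size in GiB to decimal GB."""
    return size_gib * GIB / 1e9


def implied_params(size_gib: float, bpw: float) -> float:
    """Weight count implied by a file size and an average bits-per-weight.

    Assumption: the entire file is quantized weights.
    """
    return size_gib * GIB * 8 / bpw


# (size in GiB, BPW) as quoted in this thread
quants = {"smol-IQ2_KS": (99.237, 2.379), "IQ2_KL": (129.279, 3.099)}

for name, (size_gib, bpw) in quants.items():
    gb = gib_to_gb(size_gib)
    params_b = implied_params(size_gib, bpw) / 1e9
    print(f"{name}: {gb:.1f} GB, roughly {params_b:.0f}B weights implied")
```

So if the budget is meant in GiB, IQ2_KL fits; if it is meant in decimal GB, IQ2_KL lands a little above 135 GB and smol-IQ2_KS is the safer pick.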
