Feb 12: All GLM-4.7-Flash quants reuploaded again?
#28
by coder543
I had Codex investigate the differences between the old and new UD-Q4_K_XL files:
1. File/container metadata (see the sketch after this list)
- New file is 32 bytes larger.
- GGUF.kv_count changed from 59 to 60.
- One new GGUF key was added: general.sampling.top_p = 0.95.
- Data block starts 32 bytes later in the new file (consistent with one added metadata entry).
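If anyone wants to double-check the metadata side, here's a minimal sketch using the `gguf` Python package that ships with llama.cpp; the file paths are placeholders for wherever you saved the old and new downloads:

```python
# Minimal sketch: compare GGUF KV metadata between the two downloads.
# Assumes the `gguf` Python package from llama.cpp; filenames are placeholders.
from gguf import GGUFReader

OLD = "old/GLM-4.7-Flash-UD-Q4_K_XL.gguf"   # placeholder paths
NEW = "new/GLM-4.7-Flash-UD-Q4_K_XL.gguf"

def kv_keys(path: str) -> set[str]:
    """Return the set of GGUF metadata (KV) keys in a file."""
    return set(GGUFReader(path).fields.keys())

old_keys, new_keys = kv_keys(OLD), kv_keys(NEW)
print("kv_count old/new:", len(old_keys), len(new_keys))   # 59 vs 60
print("keys only in new:", new_keys - old_keys)             # general.sampling.top_p
print("keys only in old:", old_keys - new_keys)             # expected empty
```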
2. Model tensor content (see the sketch after this list)
- Not just metadata: 4 tensors differ in quantization type and bytes:
- blk.9.ffn_down_exps.weight: Q4_K -> Q5_K
- blk.9.ffn_down_shexp.weight: Q6_K -> Q8_0
- blk.13.ffn_down_exps.weight: Q5_K -> Q4_K
- blk.13.ffn_down_shexp.weight: Q8_0 -> Q6_K
- These 4 tensors also have different SHA-256 payloads by name, so this is a real re-quantization change, not just relabeling.
- Other checked tensors (example: output.weight) matched exactly.
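And a similar sketch for the per-tensor comparison (same assumptions: the `gguf` package and placeholder paths); it prints any tensor whose quantization type or raw payload hash differs between the two files:

```python
# Sketch: per-tensor quant types + SHA-256 of raw payloads for both files.
# Same assumptions: `gguf` package from llama.cpp, placeholder filenames.
import hashlib
from gguf import GGUFReader

OLD = "old/GLM-4.7-Flash-UD-Q4_K_XL.gguf"   # placeholder paths
NEW = "new/GLM-4.7-Flash-UD-Q4_K_XL.gguf"

def tensor_index(path: str) -> dict[str, tuple[str, str]]:
    """Map tensor name -> (quantization type name, SHA-256 of raw bytes)."""
    out = {}
    for t in GGUFReader(path).tensors:
        digest = hashlib.sha256(t.data.tobytes()).hexdigest()
        out[t.name] = (t.tensor_type.name, digest)
    return out

old_idx, new_idx = tensor_index(OLD), tensor_index(NEW)
for name in sorted(old_idx):
    if name not in new_idx:
        print(f"{name}: missing in new file")
        continue
    (o_type, o_sha), (n_type, n_sha) = old_idx[name], new_idx[name]
    if o_type != n_type or o_sha != n_sha:
        payload = "payload differs" if o_sha != n_sha else "payload identical"
        print(f"{name}: {o_type} -> {n_type} ({payload})")
```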
3. Net effect
- Looks like precision was shifted between block 9 and block 13 while keeping the overall size essentially the same, plus adding the top_p metadata.
Curious what prompted this change?
coder543 changed discussion status to closed