Feb 12: All GLM-4.7-Flash quants reuploaded again?

#28
by coder543 - opened

I had codex investigate the difference between the old and new UD-Q4_K_XL files:

  1. File/container metadata

  - New file is 32 bytes larger.
  - GGUF.kv_count changed from 59 to 60.
  - One new GGUF key was added: general.sampling.top_p = 0.95.
  - Data block starts 32 bytes later in the new file (consistent with one added metadata entry).
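
  For anyone who wants to reproduce the metadata part, here's a minimal sketch using the `gguf` Python package (from the llama.cpp repo); `old.gguf` / `new.gguf` are placeholder paths for the two downloads, not the real file names:

  ```python
  from gguf import GGUFReader

  def metadata_keys(path: str) -> set[str]:
      """Collect the names of all GGUF key/value metadata entries in a file."""
      return set(GGUFReader(path).fields.keys())

  old_keys = metadata_keys("old.gguf")  # placeholder path for the old upload
  new_keys = metadata_keys("new.gguf")  # placeholder path for the new upload

  # kv_count for each file (59 vs 60 in my case), and the keys that exist in
  # only one of the two files (general.sampling.top_p in the new one).
  print("kv counts:", len(old_keys), "->", len(new_keys))
  print("added:  ", sorted(new_keys - old_keys))
  print("removed:", sorted(old_keys - new_keys))
  ```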

  2. Model tensor content

  - Not just metadata: 4 tensors differ in quantization type and payload bytes:
      - blk.9.ffn_down_exps.weight: Q4_K -> Q5_K
      - blk.9.ffn_down_shexp.weight: Q6_K -> Q8_0
      - blk.13.ffn_down_exps.weight: Q5_K -> Q4_K
      - blk.13.ffn_down_shexp.weight: Q8_0 -> Q6_K
  - These 4 tensors also have different SHA-256 hashes for their raw payloads (matched by tensor name), so this is a
    real re-quantization change, not just relabeling.
  - Other checked tensors (example: output.weight) matched exactly.
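
  The tensor-level check can be done the same way; again a sketch assuming the `gguf` package and the same placeholder paths, hashing each tensor's raw payload:

  ```python
  import hashlib

  from gguf import GGUFReader

  def tensor_index(path: str) -> dict[str, tuple[str, str]]:
      """Map tensor name -> (quantization type name, SHA-256 of its raw bytes)."""
      index = {}
      for t in GGUFReader(path).tensors:
          digest = hashlib.sha256(t.data.tobytes()).hexdigest()
          index[t.name] = (t.tensor_type.name, digest)
      return index

  old = tensor_index("old.gguf")
  new = tensor_index("new.gguf")

  # Report every tensor whose quant type or payload hash changed between uploads.
  for name in sorted(set(old) & set(new)):
      if old[name] != new[name]:
          changed = "(payload differs)" if old[name][1] != new[name][1] else "(payload identical)"
          print(f"{name}: {old[name][0]} -> {new[name][0]} {changed}")
  ```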

  3. Net effect

  - Looks like precision was shifted from block 13 to block 9 (block 9's FFN down tensors were bumped up, block 13's
    were bumped down) while keeping overall size essentially the same, plus the added top_p metadata.
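
  And to sanity-check the "overall size essentially the same" part, one could sum the payload bytes of just those four tensors in each file (same assumptions and placeholder paths as the sketches above):

  ```python
  from gguf import GGUFReader

  # The four tensors whose quantization type changed between uploads.
  CHANGED = {
      "blk.9.ffn_down_exps.weight",
      "blk.9.ffn_down_shexp.weight",
      "blk.13.ffn_down_exps.weight",
      "blk.13.ffn_down_shexp.weight",
  }

  def changed_bytes(path: str) -> int:
      """Total payload size of the four affected tensors in one file."""
      return sum(t.n_bytes for t in GGUFReader(path).tensors if t.name in CHANGED)

  delta = changed_bytes("new.gguf") - changed_bytes("old.gguf")
  print(f"net payload change across the four tensors: {delta:+d} bytes")
  ```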

Curious what prompted this change?

coder543 changed discussion status to closed
