This is zai-org/GLM-4.7-Flash quantized with AutoRound to W4A16, with a fallback to 16-bit precision for the first MLP layer and the shared experts. The model is compatible with vLLM (tested with v0.15.0 on an RTX Pro 6000 WK).
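
Below is a minimal sketch of running the model with vLLM's offline inference API. It assumes vLLM v0.15.0 or later is installed; the prompt and sampling parameters are illustrative.

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint; vLLM reads the quantization
# config stored alongside the weights.
llm = LLM(
    model="kaitchup/GLM-4.7-Flash-autoround-w4a16-excl-sharedexperts",
    max_model_len=8192,  # illustrative; raise or lower to fit your GPU
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Explain mixture-of-experts language models in one paragraph."],
    params,
)
print(outputs[0].outputs[0].text)
```

The same checkpoint can also be served over an OpenAI-compatible endpoint with `vllm serve kaitchup/GLM-4.7-Flash-autoround-w4a16-excl-sharedexperts`.

For reference, here is a hedged sketch of how a quantization like this one could be reproduced with AutoRound. AutoRound is weight-only, so activations stay in 16-bit (the A16 in W4A16). The `layer_config` argument is AutoRound's mechanism for per-layer overrides, but the layer names below are assumptions about GLM-4.7-Flash's module naming, not the exact recipe used for this checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_id = "zai-org/GLM-4.7-Flash"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Keep the first MLP layer and the shared experts in 16-bit.
# These names are hypothetical; check them against the model's
# actual module names before running.
layer_config = {
    "model.layers.0.mlp": {"bits": 16},  # hypothetical module name
    "mlp.shared_experts": {"bits": 16},  # hypothetical module name
}

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,  # W4: 4-bit weights
    group_size=128,
    layer_config=layer_config,
)
autoround.quantize()
autoround.save_quantized(
    "GLM-4.7-Flash-autoround-w4a16-excl-sharedexperts",
    format="auto_round",
)
```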

Safetensors · Model size: 5B params · Tensor types: F32, I32, BF16, F16