GLM-5 Pruned Q4_K_M GGUF

Pruned and quantized version of GLM-5 in GGUF format.

Model Details

  • Base Model: GLM-5 by Zhipu AI / Z.AI
  • Quantization: Q4_K_M (4-bit K-quant, medium size/quality tradeoff)
  • Pruning: Pruned variant with a reduced parameter count (387B params)
  • Format: GGUF (compatible with llama.cpp, ollama, etc.)
  • File Size: ~218 GB
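As a rough sanity check on the numbers above, a ~218 GB file holding 387B parameters works out to about 4.5 bits per weight, consistent with Q4_K_M's mixed 4-/6-bit layout. A back-of-the-envelope sketch (not an exact figure, since GGUF files also carry metadata and non-quantized tensors):

```shell
# Effective bits per weight implied by the file size:
# 218e9 bytes * 8 bits / 387e9 params ~ 4.5 bpw
awk 'BEGIN { printf "effective bpw: %.2f\n", 218e9 * 8 / 387e9 }'
```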

Usage

With llama.cpp:

llama-server --model GLM-5-pruned-Q4_K_M.gguf --n-gpu-layers 999 --ctx-size 8192
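Once llama-server is running, it exposes an OpenAI-compatible HTTP API (on port 8080 by default). A minimal sketch of a chat request, assuming the server started with the command above is reachable on localhost (prompt and parameters are illustrative):

```shell
# Send a chat completion request to the local llama-server instance.
# Assumes default host/port; adjust if you passed --host/--port.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Briefly explain what a MoE model is."}
        ],
        "max_tokens": 128
      }'
```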

Notes

This is a community upload of a pruned and quantized GLM-5 model. It requires significant RAM/VRAM due to the large MoE architecture; the weights alone are ~218 GB.

Stats

  • Downloads last month: 5,118
  • Model size: 387B params
  • Architecture: glm-dsa

Model tree for Maklei/GLM-5-pruned-Q4_K_M-GGUF

  • Base model: zai-org/GLM-5
  • Quantized variants of the base model: 15 (including this one)