GLM-5 Pruned Q4_K_M GGUF
Pruned and quantized version of GLM-5 in GGUF format.
Model Details
- Base Model: GLM-5 by Zhipu AI / Z.AI
- Quantization: Q4_K_M (4-bit, medium quality)
- Pruning: Pruned variant for reduced size
- Format: GGUF (compatible with llama.cpp, ollama, etc.)
- File Size: ~218 GB
Usage
With llama.cpp:
llama-server --model GLM-5-pruned-Q4_K_M.gguf --n-gpu-layers 999 --ctx-size 8192
Notes
This is a community upload of a pruned + quantized GLM-5 model. Requires significant RAM/VRAM due to the large MoE architecture.
- Downloads last month
- 5,118
Hardware compatibility
Log In
to add your hardware
4-bit
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support
Model tree for Maklei/GLM-5-pruned-Q4_K_M-GGUF
Base model
zai-org/GLM-5