Problem with model · 1 comment · #22 opened about 6 hours ago by dwojcik
Why does the KV cache occupy so much GPU memory? · 1 comment · #21 opened about 8 hours ago by yyg201708
How to stop thinking? · 1 comment · #20 opened about 9 hours ago by sha9921
Excellent version · 🔥 2 · 3 comments · #19 opened about 10 hours ago by luxiangyu
Cannot run vLLM on DGX Spark: ImportError: libcudart.so.12 · #18 opened about 13 hours ago by yyg201708
Update README.md · #17 opened about 14 hours ago by dougyster1
Adding SGLang Docker · #16 opened about 14 hours ago by dougyster1
I hope GLM can release version 4.6 Air with Chinese thought processes, as version 4.7 seems to think entirely in English. Alternatively, I'd like to see version 4.8 Air released directly. · 👍 🤔 3 · #15 opened about 16 hours ago by mimeng1990
Installation Video and Testing - Step by Step · 👍 1 · #13 opened about 20 hours ago by fahdmirzac
llama.cpp inference - 20 times (!) slower than OSS 20 on a RTX 5090 · ➕ 1 · 9 comments · #12 opened 1 day ago by cmp-nct
We are so back! · ❤️ 5 · #10 opened 1 day ago by Carnyzzle
Is a dedicated Tech Report planned for GLM-4.7-Flash? · 1 comment · #8 opened 1 day ago by NodeLinker
FP8 · 3 comments · #7 opened 1 day ago by Daemontatox
Recommended sampling parameters · 🤔 1 · 5 comments · #6 opened 1 day ago by sszymczyk
Thank you! · 🔥 13 · #4 opened 1 day ago by mav23
Enormous KV-cache size? · 👍 ➕ 4 · 13 comments · #3 opened 1 day ago by nephepritou
Base model · 🔥 6 · 1 comment · #2 opened 1 day ago by tcpmux
Performance Discussion · 👍 2 · 2 comments · #1 opened 1 day ago by IndenScale