Thank you + Docker Image!

#2
by macandchiz - opened

Thank you very much for the quantization! If you don't mind, I'd like to share a link to my Docker Hub repository so people can use the custom vLLM Docker image that works with this model. I'm getting roughly 108 t/s on dual 3090s at a context length of 118000!

https://hub.docker.com/r/infantryman77/glm-47-flash-awq

cyankiwi org

Thank you for sharing!

Which parameters are you using? I could only get up to a context length of 98304 with 2x3090; anything beyond that causes an OOM error.
EDIT: I just saw the link... is there any downside to setting kv-cache-memory to 6G?
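
For a rough sense of how a fixed KV-cache budget limits the number of tokens that can be cached, here is a back-of-the-envelope sketch. It assumes standard multi-head or grouped-query attention with a 16-bit cache, and it treats "6G" as 6 GiB. The layer count, KV-head count, and head dimension below are hypothetical placeholders, not the actual configuration of this model, and architectures that compress the KV cache will behave differently.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache one token occupies across all layers."""
    # Both keys and values are cached, hence the leading factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(budget_bytes, num_layers, num_kv_heads, head_dim,
                      bytes_per_elem=2):
    """Upper bound on tokens that fit in a given KV-cache budget."""
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim,
                                   bytes_per_elem)
    return budget_bytes // per_token


if __name__ == "__main__":
    budget = 6 * 1024**3  # assumed meaning of "6G": 6 GiB in total
    # Hypothetical model dimensions, for illustration only.
    tokens = max_cached_tokens(budget, num_layers=40, num_kv_heads=8,
                               head_dim=128)
    print(f"about {tokens} tokens fit in the cache budget")
```

In this simplified picture the trade-off is linear: a smaller cache budget leaves more VRAM for weights and activations but lowers the number of tokens that can be held at once, which matters most for long contexts or several concurrent requests.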
