Thank you + Docker Image!

by macandchiz - opened 27 days ago

Thank you very much for the quantization! If you don't mind, I'd like to share the link to my Docker Hub repository so people can use the custom vLLM Docker image that works with this model. I'm getting roughly 108 t/s on dual 3090s at a context length of 118000!

https://hub.docker.com/r/infantryman77/glm-47-flash-awq

cpatonn

cyankiwi org 26 days ago

Thank you for sharing!

rainbyte

13 days ago

edited 13 days ago

Which parameters are you using? I was only able to get up to 98304 with 2x3090; anything a bit beyond that causes OOM.
EDIT: I just saw the link... Is there any downside to setting kv-cache-memory to 6G?
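
For anyone weighing the same trade-off, here is a rough back-of-the-envelope sketch of how a fixed KV-cache budget limits the number of cached tokens. It assumes a plain grouped-query attention layout where every layer stores one key and one value vector per KV head. The layer count, head count, and head dimension below are hypothetical placeholders, not this model's real configuration, and models that compress their KV cache will need far less per token. It also says nothing about how vLLM itself splits the budget across GPUs.

```python
GIB = 1024 ** 3

def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache needed per token for standard (grouped-query) attention."""
    # Factor 2: one key vector and one value vector per layer and KV head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def max_tokens_for_budget(budget_bytes, bytes_per_token):
    """Largest whole number of tokens whose KV cache fits in the budget."""
    return budget_bytes // bytes_per_token

# Hypothetical placeholder values; read the real ones from the model's config.json.
per_token = kv_cache_bytes_per_token(num_layers=40, num_kv_heads=8, head_dim=128)
budget = 6 * GIB  # the 6G figure mentioned above
max_tokens = max_tokens_for_budget(budget, per_token)

print(f"{per_token} bytes per token, about {max_tokens} tokens in {budget // GIB} GiB")
```

The general point is that a smaller KV-cache budget leaves more VRAM for weights and activations but lowers the total number of tokens that can be cached at once, so long prompts or several concurrent requests may no longer fit.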
