Thank you + Docker Image!
#2 · opened by macandchiz
Thank you very much for the quantization! If you don't mind, I would like to share the link to my Docker Hub repository so people can use the custom vLLM Docker image that works with this model. I'm getting roughly 108 t/s on dual 3090s at a context length of 118,000!
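For anyone who would rather skip Docker, here is a minimal sketch of roughly the same setup using vLLM's Python API. The model id, memory utilization, and prompt below are placeholders, not the exact settings baked into the image:

```python
# Rough sketch of the dual-3090 setup via vLLM's Python API instead of the
# Docker image. Model id and tuning values are placeholders -- adjust them.
from vllm import LLM, SamplingParams

llm = LLM(
    model="<quantized-model-repo>",  # placeholder: substitute the actual repo id
    tensor_parallel_size=2,          # split the model across the two 3090s
    max_model_len=118_000,           # the context length mentioned above
    gpu_memory_utilization=0.95,     # assumed value; leave headroom for activations
)

outputs = llm.generate(
    ["Hello, world"],                               # placeholder prompt
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```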
Thank you for sharing!
Which parameters are you using? I was only able to get up to 98304 with 2x3090; anything much beyond that causes an OOM.
EDIT: I just saw the link... is there any downside to setting kv-cache-memory to 6G?
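In case it helps with the sizing question: the number of tokens that fit is roughly the KV-cache budget divided by the per-token KV footprint, so fixing kv-cache-memory at 6G mainly caps how many tokens (across all concurrent requests) can sit in the cache at once. A back-of-the-envelope sketch is below; the layer/head/dtype values are placeholders, not this model's actual config, so check its config.json before trusting the numbers:

```python
# Back-of-the-envelope KV-cache sizing. All model dimensions below are
# HYPOTHETICAL placeholders -- substitute the real values from config.json.

def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       dtype_bytes: int) -> int:
    """Bytes of KV cache one token occupies (K and V, across all layers)."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

# Hypothetical example: 48 layers, 8 KV heads, head_dim 128, fp8 cache (1 byte).
per_token = kv_bytes_per_token(n_layers=48, n_kv_heads=8,
                               head_dim=128, dtype_bytes=1)

budget_per_gpu = 6 * 1024**3   # the 6G figure from the thread, per GPU
tensor_parallel = 2            # KV heads are split across the two 3090s
total_budget = budget_per_gpu * tensor_parallel

max_tokens = total_budget // per_token
print(f"{per_token / 1024:.0f} KiB per token -> "
      f"~{max_tokens:,} tokens fit in {total_budget / 1024**3:.0f} GiB")
```

With those placeholder numbers it works out to about 131k tokens in 12 GiB total, but a different layer/head count or a 2-byte cache dtype changes the result a lot, so treat it only as a way to sanity-check your own settings.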