Any quantization released to reduce memory to fit into 8 GB GPU RAM?

#33
by atolfia - opened

Fantastic work!!! But it was impossible for me to fit Personaplex into my 8 GB of GPU RAM, even when trying the Rust runtime (moshi). Any plans for a quantized release to reduce memory so it fits into 8 GB of GPU RAM?
Thanks!!!

This is great work. Personally, I might try using Rust.

I am also trying to quantize it to 4-bit and run it on my RTX 3050 with GPU offloading and other methods, but I am still facing issues. If you want, I can give you access to my Google Drive where I have stored everything, so you can see what more can be done: nirooph1@gmail.com. Let's connect.

NVIDIA org

No plans for an official 4-bit quantization that will fit in 8 GB of VRAM. However, I am working on an FP8 weights-only quantization that should fit in 16 GB of VRAM. I will leave this discussion open in case others successfully make a 4-bit quantized version. No plans for official Rust support either. My resources are quite limited and are focused on making a smarter future model.
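For intuition on why FP8 targets 16 GB while 8 GB would need 4-bit, here is a rough weights-only VRAM estimate. The 7B parameter count and the runtime overhead figure are illustrative assumptions, not the actual Personaplex numbers:

```python
def vram_gb(n_params, bits_per_weight, overhead_gb=1.5):
    # Weights-only footprint plus a flat allowance for activations /
    # KV cache; the 1.5 GB overhead is an assumed ballpark figure.
    return n_params * bits_per_weight / 8 / 1e9 + overhead_gb

n = 7e9  # hypothetical 7B-parameter model, not the real Personaplex size
for label, bits in [("fp16", 16), ("fp8", 8), ("int4", 4)]:
    print(f"{label}: ~{vram_gb(n, bits):.1f} GB")
```

Under these assumptions, fp16 needs roughly 15.5 GB, fp8 about 8.5 GB (hence the 16 GB target with headroom), and only int4 (~5 GB) comes in under an 8 GB card.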
