Any quantization released to reduce memory to fit into 8 GB GPU RAM?

#33
by atolfia - opened

Fantastic work!!! But it was impossible for me to fit Personaplex into my 8 GB of GPU RAM, even when trying the Rust runtime (moshi). Any plans for a quantized release to reduce memory so it fits into 8 GB of GPU RAM?
Thanks!!!

This is great work. Personally, I might try using Rust.

I am also trying to quantize it to 4-bit and run it on my RTX 3050 with GPU offloading and other methods, but I am still facing issues. If you want, I can give you access to my Google Drive where I have stored everything, so you can see what more can be done: nirooph1@gmail.com. Let's connect.

NVIDIA org

No plans for an official 4-bit quantization that will fit in 8 GB of VRAM. However, I am working on an FP8 weights-only quantization that should fit in 16 GB of VRAM. I will leave this discussion open in case others successfully make a 4-bit quantized version. No plans for official Rust support either. My resources are quite limited and are focused on making a smarter future model.
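For intuition on why FP8 targets 16 GB while 8 GB would need 4-bit, here is a rough weights-only VRAM estimate. The 7B parameter count and the runtime overhead figure are illustrative assumptions, not the actual Personaplex numbers:

```python
def vram_gb(n_params, bits_per_weight, overhead_gb=1.5):
    # Weights-only footprint plus a flat allowance for activations /
    # KV cache; the 1.5 GB overhead is an assumed ballpark figure.
    return n_params * bits_per_weight / 8 / 1e9 + overhead_gb

n = 7e9  # hypothetical 7B-parameter model, not the real Personaplex size
for label, bits in [("fp16", 16), ("fp8", 8), ("int4", 4)]:
    print(f"{label}: ~{vram_gb(n, bits):.1f} GB")
```

Under these assumptions, fp16 needs roughly 15.5 GB, fp8 about 8.5 GB (hence the 16 GB target with headroom), and only int4 (~5 GB) comes in under an 8 GB card.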
