TheDrummer/Precog-123B-v1
https://huggingface.co/TheDrummer/Precog-123B-v1 -- Guessing this one was already meant to be in the pipeline (the EXL3 quant is linked from the model card) but was either missed or failed for some reason. Hoping it's just a miss and is easy to do!
git clone https://github.com/turboderp-org/exllamav3 && cd exllamav3 && pip install ninja packaging && pip install "flash-attn>=2.7.4.post1" --no-build-isolation --no-cache-dir && pip install -r requirements.txt
I assume the issue is a mismatch with the nightly PyTorch build; rebuilding flash-attn against the installed torch as above should fix that mismatch and resolve "/usr/local/lib/python3.12/dist-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEab"
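For what it's worth, here's a quick sanity check after the install, just to confirm the rebuilt flash-attn actually loads against the installed torch (if the mismatch is still there, the second command raises the same undefined-symbol ImportError instead of printing a version):

# Check that torch and flash-attn import cleanly and report their versions.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"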
It takes about 6 to 8 hours to quant on a GPU with 18,000+ CUDA cores (A6000 Ada, L40, 5090), so I'm afraid I won't be able to do it myself, but it's doable for whoever wants to give it a try.
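For reference, the conversion run itself would look roughly like the sketch below. The flag names are my assumption based on exllamav3's convert.py (double-check against python convert.py -h), and the paths and target bitrate are just placeholders:

# Hypothetical invocation -- paths, work dir and bitrate are placeholders,
# and the flags should be verified with `python convert.py -h`.
python convert.py \
  -i /models/Precog-123B-v1 \
  -o /models/Precog-123B-v1-exl3 \
  -w /tmp/exl3-work \
  -b 4.0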
That said, thanks for the quants, Artus -- they're well made and we really appreciate them.