about:

Q8_0 and Q5_0_custom static quants for this merge, plus an overall 4.8bpw quant for IK_Llama.cpp and Croco.cpp, targeting 48GB VRAM users.
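As a rough sanity check on the 48GB VRAM target, the weight footprint of a quant is just parameter count times bits-per-weight. The sketch below is an illustrative estimate only (the helper name is made up, and it covers weights alone, not KV cache or runtime overhead); the 8.5 figure reflects Q8_0's per-block scale overhead in GGUF.

```python
# Rough size estimate for a quantized 71B-parameter model at a given
# bits-per-weight (bpw). Weights only; KV cache and activations add more.
def quant_size_gb(params_billions: float, bpw: float) -> float:
    bits = params_billions * 1e9 * bpw
    return bits / 8 / 1e9  # decimal gigabytes

# 4.8bpw overall quant from the card: ~42.6 GB, leaving headroom on 48GB.
print(round(quant_size_gb(71, 4.8), 1))
# Q8_0 effectively stores ~8.5 bpw (8-bit weights + fp16 block scales).
print(round(quant_size_gb(71, 8.5), 1))
```

This is why the 4.8bpw build fits a 48GB card with room for context, while the Q8_0 quant needs to be split across devices or offloaded.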

Downloads last month: 38
Format: GGUF
Model size: 71B params
Architecture: llama

Available quants: 5-bit, 8-bit


Model tree for NexesQuants/Llama-3.3-Nemotron-70B-Instruct-Abliterated-TA_v0.10-iMat-CQ-GGUF