# Model Card for ealexeev/Mistral-Small-24B-NVFP4

## Model Description

This is a compressed version of Mistral Small 24B Instruct, quantized to NVFP4 with llm-compressor.

It is optimized for vLLM on NVIDIA Blackwell and Hopper GPUs (e.g. B200, H100).

## Benchmarks (Run on DGX Spark)

| Metric | Base Model (FP16) | This Model (NVFP4) | Delta |
|---|---|---|---|
| HellaSwag (commonsense reasoning) | 83.47% | 83.20% | -0.27 pts |
| IFEval (strict) | 71.46% | 70.50% | -0.96 pts |
| Throughput | 712 tok/s | 1344 tok/s | 1.89x |
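A back-of-envelope estimate (not part of the benchmark above) shows where the throughput gain comes from: decoding is largely memory-bandwidth-bound, and NVFP4 shrinks the weight footprint by more than 3x. The ~4.5 bits/weight figure is an assumption based on NVFP4 storing 4-bit values plus one FP8 scale per 16-element block; it ignores activations and KV cache.

```python
# Rough weight-memory estimate for a 24B-parameter model.
# NVFP4 ~ 4 bits per value + 8-bit scale per 16-element block
# => roughly 4.5 bits per weight (per-tensor scales ignored).
PARAMS = 24e9

def weight_gb(bits_per_weight: float, n_params: float = PARAMS) -> float:
    """Approximate weight storage in GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = weight_gb(16)    # ~48 GB
nvfp4_gb = weight_gb(4.5)  # ~13.5 GB
print(f"FP16: {fp16_gb:.1f} GB, NVFP4: {nvfp4_gb:.1f} GB")
```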

## Usage

⚠️ This model requires vLLM. It will NOT work with GGUF/llama.cpp/Ollama.

### vLLM Command

```shell
vllm serve ealexeev/Mistral-Small-24B-NVFP4 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.8 \
    --enforce-eager
```
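Once the server is running, it exposes vLLM's OpenAI-compatible REST API. Below is a minimal stdlib-only client sketch; the host, port, and request parameters are assumptions based on vLLM's defaults, not part of this model card.

```python
import json
from urllib import request

def build_payload(prompt: str) -> dict:
    # Request body for the OpenAI-compatible /v1/chat/completions endpoint.
    return {
        "model": "ealexeev/Mistral-Small-24B-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    # Assumes the default vLLM serve address; adjust base_url as needed.
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```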
