# Model Card for ealexeev/Mistral-Small-24B-NVFP4

## Model Description

This is a compressed version of Mistral Small 24B Instruct, quantized to NVFP4 with llm-compressor.

It is optimized for vLLM on NVIDIA Blackwell and Hopper GPUs (e.g. B200, H100).

## Benchmarks (Run on DGX Spark)

| Metric | Base Model (FP16) | This Model (NVFP4) | Delta |
|---|---|---|---|
| HellaSwag (commonsense reasoning) | 83.47% | 83.20% | -0.27 pts |
| IFEval (strict) | 71.46% | 70.50% | -0.96 pts |
| Throughput | 712 tok/s | 1344 tok/s | 1.89x |
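A back-of-envelope estimate (not part of the benchmark above) shows where the throughput gain comes from: decoding is largely memory-bandwidth-bound, and NVFP4 shrinks the weight footprint by more than 3x. The ~4.5 bits/weight figure is an assumption based on NVFP4 storing 4-bit values plus one FP8 scale per 16-element block; it ignores activations and KV cache.

```python
# Rough weight-memory estimate for a 24B-parameter model.
# NVFP4 ~ 4 bits per value + 8-bit scale per 16-element block
# => roughly 4.5 bits per weight (per-tensor scales ignored).
PARAMS = 24e9

def weight_gb(bits_per_weight: float, n_params: float = PARAMS) -> float:
    """Approximate weight storage in GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = weight_gb(16)    # ~48 GB
nvfp4_gb = weight_gb(4.5)  # ~13.5 GB
print(f"FP16: {fp16_gb:.1f} GB, NVFP4: {nvfp4_gb:.1f} GB")
```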

## Usage

⚠️ This model requires vLLM. It will NOT work with GGUF/llama.cpp/Ollama.

### vLLM Command

```shell
vllm serve ealexeev/Mistral-Small-24B-NVFP4 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.8 \
    --enforce-eager
```
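Once the server is running, it exposes vLLM's OpenAI-compatible REST API. Below is a minimal stdlib-only client sketch; the host, port, and request parameters are assumptions based on vLLM's defaults, not part of this model card.

```python
import json
from urllib import request

def build_payload(prompt: str) -> dict:
    # Request body for the OpenAI-compatible /v1/chat/completions endpoint.
    return {
        "model": "ealexeev/Mistral-Small-24B-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    # Assumes the default vLLM serve address; adjust base_url as needed.
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```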
