# Model Card for ealexeev/Mistral-Small-24B-NVFP4

## Model Description

This is a compressed version of Mistral Small 24B Instruct, quantized to NVFP4 with llm-compressor. The model is optimized for NVIDIA Hopper and Blackwell GPUs (H100, B200) and is intended to be served with vLLM.
## Benchmarks (Run on DGX Spark)

| Metric | Base Model (FP16) | This Model (NVFP4) | Delta |
|---|---|---|---|
| HellaSwag (commonsense) | 83.47% | 83.20% | -0.27 pp |
| IFEval (strict) | 71.46% | 70.50% | -0.96 pp |
| Throughput | 712 tok/s | 1344 tok/s | 1.89× |
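For context on the numbers above: the throughput speedup is simply the ratio of the two token rates, and the memory savings follow from NVFP4's layout (4-bit values with one FP8 scale per 16-element block, roughly 4.5 effective bits per weight). A back-of-the-envelope sketch, assuming ~24B parameters and counting weights only (no KV cache or activations):

```python
# Rough arithmetic behind the benchmark table.
# Assumption: ~24e9 weights; NVFP4 stores 4-bit values plus one
# 8-bit (FP8) scale per 16-element block -> ~4.5 bits per weight.

PARAMS = 24e9

fp16_gb = PARAMS * 16 / 8 / 1e9    # 16 bits per weight
nvfp4_gb = PARAMS * 4.5 / 8 / 1e9  # ~4.5 effective bits per weight

speedup = 1344 / 712               # throughput ratio from the table

print(f"FP16 weights:  ~{fp16_gb:.1f} GB")   # ~48.0 GB
print(f"NVFP4 weights: ~{nvfp4_gb:.1f} GB")  # ~13.5 GB
print(f"Throughput speedup: {speedup:.2f}x") # 1.89x
```

These are weight-only estimates; real VRAM usage will be higher once the KV cache and activation buffers are allocated.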
## Usage

⚠️ This model requires vLLM. It will NOT work with GGUF/llama.cpp/Ollama.

### vLLM Command

```shell
vllm serve ealexeev/Mistral-Small-24B-NVFP4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8 \
  --enforce-eager
```
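Once the server is running, it exposes vLLM's OpenAI-compatible API (on port 8000 by default). A minimal client sketch using only the Python standard library; the prompt and sampling parameters are illustrative, not recommendations:

```python
import json
import urllib.request

MODEL = "ealexeev/Mistral-Small-24B-NVFP4"

def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Payload for vLLM's OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,  # illustrative sampling setting
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires the `vllm serve` command above to be running locally.
    print(chat("Summarize NVFP4 quantization in one sentence."))
```

Any OpenAI-compatible client (e.g. the official `openai` Python package pointed at `http://localhost:8000/v1`) will work the same way.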
## Model tree for ealexeev/Mistral-Small-24B-NVFP4

Base model: mistralai/Mistral-Small-24B-Base-2501