Voxtral-Mini-4B-Realtime-2602 (GGUF)

This repository contains GGUF weights for Voxtral Realtime 4B, a high-performance speech-to-text (STT) model optimized for low-latency, real-time inference.

These weights are converted from the original mistralai/Voxtral-Mini-4B-Realtime-2602 model.

Voxtral is designed to process streaming audio with minimal delay, making it ideal for live transcription, voice assistants, and interactive applications.

Model Details

  • Model Type: Speech Recognition / Transcription
  • Parameters: ~4 Billion
  • Architecture: Hybrid Encoder-Decoder with Log-Mel preprocessing
  • Format: GGUF (architecture tag voxtral_realtime, optimized for ggml)
  • Sample Rate: 16,000 Hz (Mono)
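
All audio fed to the model must match this 16 kHz mono spec. To check whether a file already complies, ffprobe (bundled with ffmpeg; assumed to be installed separately, as it is not part of this repository) can report the relevant stream properties:

  # Print the sample rate and channel count of input.wav
  ffprobe -v error -show_entries stream=sample_rate,channels \
    -of default=noprint_wrappers=1 input.wav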

Inference with voxtral.cpp

For the fastest inference performance on CPU and GPU, use the voxtral.cpp (https://github.com/andrijdavid/voxtral.cpp) repository. It provides a lightweight C++ implementation based on ggml.

Getting Started

  1. Clone the repository and build:

     git clone https://github.com/andrijdavid/voxtral.cpp
     cd voxtral.cpp
     cmake -B build -DCMAKE_BUILD_TYPE=Release
     cmake --build build -j

  2. Download a quantized model. Use the provided script to download your preferred quantization (e.g., Q4_0):

     ./tools/download_model.sh Q4_0

  3. Run transcription. Prepare a 16 kHz mono WAV file and run inference:

     ./build/voxtral \
       --model models/voxtral/Q4_0.gguf \
       --audio input.wav \
       --threads 8
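
If the source audio is in a different format or sample rate, ffmpeg (assumed to be installed separately; it is not part of voxtral.cpp) can convert it to the required 16 kHz mono PCM WAV. input.mp3 below is a placeholder for your source file:

     # Resample to 16 kHz, downmix to mono, encode as 16-bit PCM WAV
     ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le input.wav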

For more advanced usage, including streaming examples and conversion scripts, please visit the voxtral.cpp GitHub repository (https://github.com/andrijdavid/voxtral.cpp).
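
As a simple extension of the command above, multiple files can be transcribed in a loop. A minimal sketch, reusing only the flags shown in this card (the audio/ directory is a placeholder):

  # Transcribe every WAV file in audio/ with the same model and thread count
  for f in audio/*.wav; do
    ./build/voxtral --model models/voxtral/Q4_0.gguf --audio "$f" --threads 8
  done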

Available Quantizations

The following quantization levels are provided in this repository:

  • 2-bit
  • 4-bit
  • 5-bit
  • 6-bit
  • 8-bit
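
Lower bit widths shrink the file and memory footprint at the cost of some transcription accuracy. Assuming the download script accepts the other quantization tags in the same Q-prefixed format as Q4_0 (an assumption; check tools/download_model.sh for the exact names), other variants can be fetched the same way:

  ./tools/download_model.sh Q8_0   # hypothetical tag: 8-bit, highest fidelity
  ./tools/download_model.sh Q2_K   # hypothetical tag: 2-bit, smallest footprint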
