# Whisper Burn GGUF – Q4_0 Quantized Models
Q4_0 quantized GGUF versions of OpenAI's Whisper models, optimized for GPU inference with whisper-burn.
## Files
| File | Model | Size | Parameters |
|---|---|---|---|
| \ | Whisper Large V3 | ~800 MB | 1550M (32 encoder + 32 decoder layers) |
| \ | Whisper Medium | ~604 MB | 769M (24 encoder + 24 decoder layers) |
| \ | BPE tokenizer | ~2.1 MB | Shared by all models |
## Quantization Details
- Format: GGUF v3 with Q4_0 quantization
- What's quantized: 2D weight matrices with dimensions > 256 are quantized to 4-bit (Q4_0 blocks: f16 scale + 16 packed nibble bytes per 32 elements)
- What stays F32: Token embeddings, positional embeddings, biases, layer norms, and small matrices
- Conversion script: \ from the whisper-burn repository
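The Q4_0 block layout described above can be sketched in Rust. This is a minimal reading of the format, not whisper-burn's actual code: it follows ggml's reference convention of mapping the largest-magnitude element to quant level -8, uses f32 for the scale (on disk it is f16, which Rust's standard library lacks), and assumes "dimensions > 256" means both dimensions of a 2D tensor exceed 256.

```rust
const QK4_0: usize = 32; // elements per Q4_0 block

/// One Q4_0 block: a scale plus 16 bytes holding two 4-bit values each.
/// On disk the scale is f16; f32 stands in for it here.
struct BlockQ4_0 {
    d: f32,
    qs: [u8; QK4_0 / 2],
}

/// Assumed reading of the "2D matrices with dimensions > 256" rule:
/// quantize only 2D tensors whose dimensions both exceed 256.
fn should_quantize(shape: &[usize]) -> bool {
    shape.len() == 2 && shape.iter().all(|&d| d > 256)
}

/// Quantize 32 f32 values, following ggml's reference Q4_0 quantizer:
/// the largest-magnitude element maps to quant level -8.
fn quantize_q4_0(x: &[f32; QK4_0]) -> BlockQ4_0 {
    let mut amax = 0.0f32;
    let mut max = 0.0f32;
    for &v in x {
        if v.abs() > amax {
            amax = v.abs();
            max = v;
        }
    }
    let d = max / -8.0;
    let id = if d != 0.0 { 1.0 / d } else { 0.0 };
    let mut qs = [0u8; QK4_0 / 2];
    for i in 0..QK4_0 / 2 {
        // Low nibble: element i; high nibble: element i + 16.
        let q0 = ((x[i] * id + 8.5) as u8).min(15);
        let q1 = ((x[i + QK4_0 / 2] * id + 8.5) as u8).min(15);
        qs[i] = q0 | (q1 << 4);
    }
    BlockQ4_0 { d, qs }
}

/// Dequantize: each nibble n becomes (n - 8) * scale.
fn dequantize_q4_0(b: &BlockQ4_0, out: &mut [f32; QK4_0]) {
    for i in 0..QK4_0 / 2 {
        out[i] = ((b.qs[i] & 0x0F) as i32 - 8) as f32 * b.d;
        out[i + QK4_0 / 2] = ((b.qs[i] >> 4) as i32 - 8) as f32 * b.d;
    }
}

fn main() {
    assert!(should_quantize(&[1280, 5120]));
    assert!(!should_quantize(&[1280])); // 1D tensors (biases, norms) stay f32

    let x: [f32; QK4_0] = core::array::from_fn(|i| (i as f32 - 16.0) / 4.0);
    let b = quantize_q4_0(&x);
    let mut y = [0.0f32; QK4_0];
    dequantize_q4_0(&b, &mut y);
    let max_err = x.iter().zip(&y).map(|(a, b)| (a - b).abs()).fold(0.0, f32::max);
    // Round-trip error stays within half a quantization step.
    assert!(max_err <= b.d.abs() / 2.0 + 1e-6);
    println!("scale = {}, max round-trip error = {}", b.d, max_err);
}
```

The per-block scale bounds the error locally; embeddings, biases, and small matrices are left in f32 because the 4-bit savings there are negligible relative to the accuracy cost.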
## Model Comparison
| Model | Mel bins | Hidden dim | Encoder layers | Decoder layers | Accuracy | Speed |
|---|---|---|---|---|---|---|
| Large V3 | 128 | 1280 | 32 | 32 | Best | Slower |
| Medium | 80 | 1024 | 24 | 24 | Good | Fast |
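The table rows map onto a small hyperparameter struct. The sketch below is illustrative only; the type and constant names are not whisper-burn's actual API.

```rust
/// Illustrative hyperparameters for the two shipped models
/// (names are hypothetical, values taken from the table above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct WhisperDims {
    n_mels: usize,           // mel-spectrogram bins fed to the encoder
    d_model: usize,          // hidden dimension
    n_encoder_layers: usize,
    n_decoder_layers: usize,
}

const LARGE_V3: WhisperDims = WhisperDims {
    n_mels: 128,
    d_model: 1280,
    n_encoder_layers: 32,
    n_decoder_layers: 32,
};

const MEDIUM: WhisperDims = WhisperDims {
    n_mels: 80,
    d_model: 1024,
    n_encoder_layers: 24,
    n_decoder_layers: 24,
};

fn main() {
    // Large V3 expects 128 mel bins, unlike Medium's 80, so the
    // feature-extraction front end must match the loaded model.
    assert_ne!(LARGE_V3.n_mels, MEDIUM.n_mels);
    println!("{:?}\n{:?}", LARGE_V3, MEDIUM);
}
```

Note that the two models disagree on the mel front end as well as the transformer depth, so the mel filterbank must be selected per model, not shared.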
## Usage with whisper-burn
These models are automatically downloaded by the whisper-burn desktop application. You can also download them manually:
Place all files in a \ directory next to the whisper-burn executable.
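After a manual download, a file can be sanity-checked by reading the GGUF header. The sketch below follows the published GGUF layout (magic `GGUF`, then a little-endian u32 version, u64 tensor count, and u64 metadata key-value count); the file path in `main` is a placeholder, not an actual filename from this repository.

```rust
use std::fs::File;
use std::io::{self, Read};

#[derive(Debug)]
struct GgufHeader {
    version: u32,
    n_tensors: u64,
    n_kv: u64,
}

/// Read the fixed-size GGUF header: 4-byte magic "GGUF", u32 version,
/// u64 tensor count, u64 metadata KV count (all little-endian).
fn read_gguf_header(mut r: impl Read) -> io::Result<GgufHeader> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a GGUF file"));
    }
    let mut buf4 = [0u8; 4];
    let mut buf8 = [0u8; 8];
    r.read_exact(&mut buf4)?;
    let version = u32::from_le_bytes(buf4);
    r.read_exact(&mut buf8)?;
    let n_tensors = u64::from_le_bytes(buf8);
    r.read_exact(&mut buf8)?;
    let n_kv = u64::from_le_bytes(buf8);
    Ok(GgufHeader { version, n_tensors, n_kv })
}

fn main() {
    // Placeholder path; substitute one of the model files listed above.
    match File::open("models/whisper.gguf").and_then(read_gguf_header) {
        Ok(h) => println!("GGUF v{}, {} tensors, {} metadata keys", h.version, h.n_tensors, h.n_kv),
        Err(e) => eprintln!("could not read header: {e}"),
    }
}
```

A truncated or HTML-error download fails the magic check immediately, which is cheaper than waiting for model loading to fail.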
## About whisper-burn
whisper-burn is a native Rust implementation of OpenAI's Whisper using the Burn ML framework with GPU acceleration via wgpu (Vulkan/Metal/DirectX).
Key features:
- Pure Rust – no Python, no ONNX, no external runtime
- GPU-accelerated – custom WGSL compute shaders for fused Q4 dequantization + matrix multiplication
- Push-to-Talk – global hotkey with support for any key combo, including modifier-only (e.g. Ctrl+Win)
- 99+ languages – all Whisper-supported languages + automatic detection
- Auto-paste – transcribed text automatically pasted into the active application
- Windows native – desktop app with dark theme UI
## Inference Pipeline
\
## Source Models
## License
The quantized weights inherit the license from the original OpenAI Whisper models (MIT License).