Whisper Burn GGUF β€” Q4_0 Quantized Models

Q4_0 quantized GGUF versions of OpenAI's Whisper models, optimized for GPU inference with whisper-burn.

Files

File   Model              Size      Parameters
\      Whisper Large V3   ~800 MB   1550M (32 encoder + 32 decoder layers)
\      Whisper Medium     ~604 MB   769M (24 encoder + 24 decoder layers)
\      BPE tokenizer      ~2.1 MB   Shared by all models

Quantization Details

  • Format: GGUF v3 with Q4_0 quantization
  • What's quantized: 2D weight matrices with dimensions > 256 are quantized to 4-bit (Q4_0 blocks: an f16 scale plus 16 packed nibble bytes per 32 elements; see the sketch after this list)
  • What stays F32: Token embeddings, positional embeddings, biases, layer norms, and small matrices
  • Conversion script: \ from the whisper-burn repository
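
For concreteness, here is a minimal sketch of how a single Q4_0 block decodes, following the standard ggml/GGUF layout described above. It assumes the half crate for f16 handling; the function name is illustrative and is not whisper-burn's actual API.

```rust
use half::f16;

/// Elements per Q4_0 block: an f16 scale plus 16 packed bytes
/// (two 4-bit quants per byte) encode 32 weights in 18 bytes.
const QK4_0: usize = 32;

/// Dequantize one Q4_0 block into 32 f32 values. In the ggml layout,
/// byte j holds element j in its low nibble and element j + 16 in its
/// high nibble; each 4-bit quant is offset by 8 before scaling.
fn dequantize_q4_0_block(scale: f16, qs: &[u8; 16]) -> [f32; QK4_0] {
    let d = scale.to_f32();
    let mut out = [0.0f32; QK4_0];
    for (j, &byte) in qs.iter().enumerate() {
        let lo = (byte & 0x0F) as i32 - 8; // element j
        let hi = (byte >> 4) as i32 - 8;   // element j + 16
        out[j] = lo as f32 * d;
        out[j + 16] = hi as f32 * d;
    }
    out
}
```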

Model Comparison

Model      Mel bins   Hidden dim   Encoder layers   Decoder layers   Accuracy   Speed
Large V3   128        1280         32               32               Best       Slower
Medium     80         1024         24               24               Good       Faster

Usage with whisper-burn

These models are downloaded automatically by the whisper-burn desktop application. You can also download them manually; place all files in a \ directory next to the whisper-burn executable.
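
If you prefer to script the manual download, the sketch below uses Hugging Face's hf-hub crate. The repository id and file name are placeholders, not the real entries; substitute the actual ones from the Files table above.

```rust
use hf_hub::api::sync::Api;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api = Api::new()?;
    // Placeholder repo id and file name: substitute the actual
    // repository and the GGUF file names from the Files table.
    let repo = api.model("your-namespace/whisper-burn-gguf".to_string());
    let local_path = repo.get("whisper-large-v3-q4_0.gguf")?;
    println!("downloaded to {}", local_path.display());
    Ok(())
}
```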

About whisper-burn

whisper-burn is a native Rust implementation of OpenAI's Whisper using the Burn ML framework with GPU acceleration via wgpu (Vulkan/Metal/DirectX).

Key features:

  • Pure Rust β€” no Python, no ONNX, no external runtime
  • GPU-accelerated β€” custom WGSL compute shaders for fused Q4 dequantization + matrix multiplication
  • Push-to-Talk β€” global hotkey with support for any key combo including modifier-only (e.g. Ctrl+Win)
  • 99+ languages β€” all Whisper-supported languages + automatic detection
  • Auto-paste β€” transcribed text automatically pasted into the active application
  • Windows native β€” desktop app with dark theme UI

Inference Pipeline

audio (16 kHz mono) → log-mel spectrogram (80 or 128 bins) → encoder → autoregressive decoder → BPE detokenization → text

Source Models

  • openai/whisper-large-v3
  • openai/whisper-medium
License

The quantized weights inherit the license from the original OpenAI Whisper models (MIT License).
