# Whisper Small Turbo
This is a "turbo" variant of openai/whisper-small created by reducing the decoder layers from 12 to 4 (following the same approach used for whisper-large-v3-turbo).
## Model Description
- Base model: openai/whisper-small (244M parameters)
- Turbo variant: 168M parameters (31% reduction)
- Decoder layers: 4 (reduced from 12)
- Encoder layers: 12 (unchanged)
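A quick way to confirm the size locally (a small sketch; the printed count may differ slightly from the rounded figure above):

```python
from transformers import WhisperForConditionalGeneration

# Count parameters of the pruned checkpoint; should be roughly 168M.
model = WhisperForConditionalGeneration.from_pretrained("mekpro/whisper-small-turbo")
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```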
## Architecture Changes
| Parameter | Original | Turbo |
|---|---|---|
| encoder_layers | 12 | 12 |
| decoder_layers | 12 | 4 |
| d_model | 768 | 768 |
| Total Parameters | 244M | ~168M |
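The reduced depth is visible directly in the model config; a minimal check, assuming the repo id used above:

```python
from transformers import WhisperConfig

cfg = WhisperConfig.from_pretrained("mekpro/whisper-small-turbo")
# Expected values per the table above: 12 encoder layers, 4 decoder layers, d_model 768.
print(cfg.encoder_layers, cfg.decoder_layers, cfg.d_model)
```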
## Usage
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("mekpro/whisper-small-turbo")
model = WhisperForConditionalGeneration.from_pretrained("mekpro/whisper-small-turbo")

# Load audio and transcribe
# audio_input = ...  # your audio array at 16 kHz
# input_features = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features
# predicted_ids = model.generate(input_features)
# transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```
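A hypothetical end-to-end run, assuming `librosa` is installed and `sample.wav` is a local speech recording (both are illustrative and not part of this repo):

```python
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("mekpro/whisper-small-turbo")
model = WhisperForConditionalGeneration.from_pretrained("mekpro/whisper-small-turbo")

# Whisper expects 16 kHz mono audio.
audio_input, _ = librosa.load("sample.wav", sr=16000, mono=True)

input_features = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```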
## Creation Method
This model was created by:
- Loading the original whisper-small model
- Creating a new model with decoder_layers=4
- Copying encoder weights (unchanged)
- Copying first 4 decoder layers (indices 0-3)
- Copying all embeddings and layer norms
No additional fine-tuning was performed on this model.
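The exact conversion script is not included here; the following is a minimal sketch of the procedure described above using the Hugging Face `transformers` API (details are assumptions, not the author's exact code):

```python
from transformers import WhisperConfig, WhisperForConditionalGeneration

# 1) Load the original whisper-small model.
src = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# 2) Create a new model with decoder_layers=4 (everything else unchanged).
cfg = WhisperConfig.from_pretrained("openai/whisper-small")
cfg.decoder_layers = 4
dst = WhisperForConditionalGeneration(cfg)

# 3-5) Copy every weight whose name and shape match the smaller model:
# this transfers the encoder, the first 4 decoder layers (model.decoder.layers.0-3),
# and all embeddings and layer norms, while dropping decoder layers 4-11.
src_state = src.state_dict()
dst_state = dst.state_dict()
dst_state.update({k: v for k, v in src_state.items()
                  if k in dst_state and v.shape == dst_state[k].shape})
dst.load_state_dict(dst_state)

dst.save_pretrained("whisper-small-turbo")
```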
## Limitations
As this model has not been fine-tuned after decoder reduction, it may show degraded performance compared to the original whisper-small. For best results, consider fine-tuning on your target domain.
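If you do fine-tune, one common option (a sketch only, not a prescribed recipe for this model) is to freeze the unchanged encoder and adapt only the 4 remaining decoder layers:

```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("mekpro/whisper-small-turbo")

# Freeze the encoder; gradients then flow only through the pruned decoder,
# the embeddings, and the output projection.
for param in model.get_encoder().parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```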