🇰🇬 Whisper Small - Kyrgyz, English, Russian Speech-to-Text Model
Model Description
kyrgyz-whisper-small is a fine-tuned multilingual speech recognition model based on OpenAI's Whisper Small architecture. This model adds native Kyrgyz language support while maintaining strong performance on English and Russian.
Key Features
- Kyrgyz language support via a custom <|ky|> token
- Multilingual: Kyrgyz, English, and Russian
- Trained on ~2,000 hours of Kyrgyz audio plus a ~40% share of English/Russian audio
- Ready for further improvement with LoRA fine-tuning (see Colab Notebook)
- Optimized for real-world, noisy audio conditions
Performance
WER Distributions on FLEURS Benchmark
The following visualization shows the improvement after fine-tuning:
Key Observations:
- Kyrgyz: dramatic improvement from ~100% WER (unusable) → practical performance, with the distribution peaking around 0.2-0.4 WER
- English & Russian: some degradation relative to the base model as a trade-off for Kyrgyz support
  - Distributions shift right (higher WER)
  - This is expected when adding a new language to a fixed-capacity model
- Multi-language trade-off: the model gives up some English/Russian accuracy to gain Kyrgyz capability
- Benchmark: FLEURS
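For context, WER (word error rate) is the number of word-level substitutions, deletions, and insertions divided by the number of reference words. A quick way to compute it, sketched here with the third-party jiwer package (an illustrative assumption, not something this repo ships):

from jiwer import wer  # pip install jiwer

reference = "саламатсызбы дүйнө"    # ground-truth transcript ("hello world")
hypothesis = "саламатсызбы дуйно"   # hypothetical model output with one misspelled word
print(wer(reference, hypothesis))   # 1 substitution / 2 reference words = 0.5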
Recommended Use Cases
- Kyrgyz media transcription
- Multilingual call centers
- Educational content in Kyrgyz
- Code-switching scenarios (common in Kyrgyzstan where people mix languages)
- Foundation model for LoRA fine-tuning on clean Kyrgyz data
Technical Implementation
Custom Tokenizer Integration
from transformers import AutoTokenizer

# Load the custom tokenizer with Kyrgyz support
tokenizer = AutoTokenizer.from_pretrained(
    "nineninesix/kyrgyz-whisper-small",
    trust_remote_code=True,  # important: required to load the custom Kyrgyz tokenizer
    language="kyrgyz",
    task="transcribe",
)
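As a quick sanity check, you can confirm the <|ky|> token is actually registered (standard tokenizer methods, nothing repo-specific):

ky_id = tokenizer.convert_tokens_to_ids("<|ky|>")
print(ky_id)  # should be a real vocabulary id
assert ky_id != tokenizer.unk_token_id  # i.e. not the unknown-token fallback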
Kyrgyz Token Initialization
The <|ky|> token was initialized as an average of embeddings from linguistically similar languages:
embedding_ky = (embedding_ru + embedding_kk + embedding_tr) / 3
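A minimal sketch of how such an initialization can be done with standard transformers APIs (illustrative, not the project's actual training code; it assumes Whisper's language-token names for Russian, Kazakh, and Turkish):

import torch
from transformers import WhisperForConditionalGeneration, WhisperTokenizer

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small")

# Register the new language token and grow the embedding matrix
tokenizer.add_tokens(["<|ky|>"], special_tokens=True)
model.resize_token_embeddings(len(tokenizer))

emb = model.get_input_embeddings().weight
src_ids = [tokenizer.convert_tokens_to_ids(t) for t in ("<|ru|>", "<|kk|>", "<|tr|>")]
ky_id = tokenizer.convert_tokens_to_ids("<|ky|>")
with torch.no_grad():
    # embedding_ky = (embedding_ru + embedding_kk + embedding_tr) / 3
    emb[ky_id] = emb[src_ids].mean(dim=0)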
Usage
Pipeline Usage
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoTokenizer, WhisperFeatureExtractor, pipeline

# Use GPU with fp16 if available; fall back to CPU with fp32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "nineninesix/kyrgyz-whisper-small"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,  # required for the custom Kyrgyz tokenizer
    language="kyrgyz",
    task="transcribe",
)
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=tokenizer,
    feature_extractor=feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)
result = pipe("audio.mp3")
print(result['text'])
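For longer recordings, the standard pipeline chunking options apply; a usage sketch (audio.mp3 is a placeholder path):

# Long-form transcription: 30-second windows with timestamps
result = pipe("audio.mp3", chunk_length_s=30, return_timestamps=True)
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])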
Further Fine-tuning with LoRA
This model serves as a foundation for domain-specific fine-tuning using LoRA (Low-Rank Adaptation).
Unsloth integration example: see this Google Colab
Benefits of LoRA fine-tuning:
- Adapt to specific domains (medical, legal, conversational)
- Memory-efficient training
- Faster training than full fine-tuning
- Improved accuracy on clean datasets
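Below is a minimal sketch of attaching LoRA adapters with the peft library (the linked Colab uses Unsloth; the rank, alpha, and target modules here are illustrative assumptions, not the notebook's exact settings):

from transformers import AutoModelForSpeechSeq2Seq
from peft import LoraConfig, get_peft_model

model = AutoModelForSpeechSeq2Seq.from_pretrained("nineninesix/kyrgyz-whisper-small")

# Low-rank adapters on the attention projections, a common choice for Whisper
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable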
Limitations
- Trained on noisy data, so WER may be higher on clean benchmarks than for models trained on clean audio
- Best performance on Kyrgyz, English, and Russian (other languages not supported)
- Requires custom tokenizer for Kyrgyz language support
- May require domain-specific fine-tuning for specialized applications
Citation
@misc{kyrgyz-whisper-small,
  author    = {nineninesix},
  title     = {Whisper Small - Kyrgyz, English, Russian},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/nineninesix/kyrgyz-whisper-small}
}
@misc{radford2022whisper,
  doi       = {10.48550/ARXIV.2212.04356},
  url       = {https://arxiv.org/abs/2212.04356},
  author    = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title     = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
Acknowledgments
- Based on OpenAI's Whisper architecture
- Kyrgyz tokenizer: kyrgyz-ai/whisper_tokenizer_ky
- Training datasets: Kyrgyz ASR community contributions
- Inspired by multilingual ASR research
License
Apache 2.0 - see LICENSE file for details.