Part of the **ASR for African Voices** collection: robust speech-to-text models for languages of Africa.
This model is a fine-tuned version of Wav2Vec2-BERT 2.0 for Kinyarwanda automatic speech recognition (ASR). It was trained on the 1,000-hour dataset from the Kinyarwanda ASR hackathon on Kaggle (Track B), which covers the Health, Government, Finance, Education, and Agriculture domains. The model is robust, with an in-domain WER below 8.4%.
| Audio | Human Transcription | ASR Transcription |
|---|---|---|
| 1 | Umugore wambaye umupira w'akazi mpuzankano iri mu ibara ry'umuhondo handitseho amagambo yandikishije ibara ry'ubururu. Afite igikoresho cy'itumanaho gikoreshwa mu guhamagara no kwandika ubutumwa bugufi. | umugore wambaye umupira w'akazi impuzankano iri mu ibara ry'umuhondo handitseho amagambo yandikishije ibara ry'ubururu afite igikoresho cy'itumanaho gikoreshwa mu guhamagara no kwandika ubutumwa bugufi |
| 2 | Igikoresho cyifashishwa mu kwiga imibare ndetse kiba kirimo ibindi bikoresho byinshi harimo amarati atatu ndetse n'irati imwe ndende n'ikaramu na kompa, ibigibi byahawe abanyeshuri biga mu myaka ya mbere n'iya kabiri kugira ngo bajye babyifashisha bari kwiga imibare. | igikoresho cyifashishwa mu kwiga imibare ndetse kiba kirimo ibindi bikoresho byinshi harimo amarati atatu ndetse n'irati imwe ndende n'ikaramu na kompa ibi ngibi byahawe abanyeshuri biga mu myaka ya mbere n'iya kabiri kugira ngo bajye babyifashisha bari kwiga imibare |
| 3 | Iyi ni Kizimyamwoto iri mu ibara ry'umutuku. Hejuru hakaba hariho amabara y'umuhondo ku ruhande hakaba hariho akantu kameze nk'isaha, hasi hakaba hariho akabara gasa n'ubururu kari amagambo menshi mu rurimi rw'icyongereza hasi yako hakaba hari n'akandi kari mu ibara ry' umuhondo handikishijemo amagambo y'icyongereza, hasi yako hakaba hari n' utundi tuntu tw' utubokisi tw' umweru harimo utuntu tujyiye dushushanyije hakaba hariho n' inyajwi bi na si. | iyinzuzinyamwoto iri mu ibara ry'umutuku hejuru hakaba hariho ahariho amabara y'umuhondo ku ruhande hakaba hariho akantu kameze nk'isaha hasi hakaba hariho akabara gatoya k'ubururu kariho amagambo yandikishije mu rurimi rw'icyongereza hasi yako hakaba hari n'akandi kari mu ibara ry'umuhondo wandikishijemo amagambo y'icyongereza hasi yako hakaba hariho utundi tutu tw'tuboisi hariho amaotw'utubogisi tw'umweru harimo utuntu tugiye dushushanyije hakaba hariho n'inyajwi bi na si |
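The WER figure above is computed by comparing hypotheses like those in the ASR column against the human references. A minimal pure-Python sketch of word-level WER (edit distance over words divided by reference length) is shown below; in practice a library such as `jiwer` is typically used, and text is normalized first (casing, punctuation), since, as the table shows, the ASR output is lowercased and unpunctuated.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# example: one substituted word out of three reference words -> WER ~ 0.33
print(wer("umugore wambaye umupira", "umugore wambaye impira"))
```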
The model can be used directly for automatic speech recognition of Kinyarwanda audio:
```python
from transformers import Wav2Vec2BertProcessor, Wav2Vec2BertForCTC
import torch
import torchaudio

# load model and processor
processor = Wav2Vec2BertProcessor.from_pretrained("badrex/w2v-bert-2.0-kinyarwanda-asr-1000h")
model = Wav2Vec2BertForCTC.from_pretrained("badrex/w2v-bert-2.0-kinyarwanda-asr-1000h")

# load audio
audio_input, sample_rate = torchaudio.load("path/to/audio.wav")

# resample to 16 kHz if needed (the feature extractor expects 16 kHz input)
if sample_rate != 16000:
    audio_input = torchaudio.functional.resample(audio_input, sample_rate, 16000)
    sample_rate = 16000

# preprocess
inputs = processor(audio_input.squeeze(), sampling_rate=sample_rate, return_tensors="pt")

# inference
with torch.no_grad():
    logits = model(**inputs).logits

# decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```
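The decode step above takes the frame-level argmax over the CTC logits and lets `batch_decode` handle the rest. Conceptually, CTC greedy decoding collapses consecutive repeated token IDs and then removes blanks; a minimal sketch of that logic follows, using hypothetical token IDs and blank index purely for illustration:

```python
def ctc_greedy_collapse(ids, blank_id=0):
    """Collapse repeated IDs and drop blanks from an argmax CTC path."""
    out = []
    prev = None
    for i in ids:
        # keep a token only when it differs from the previous frame
        # and is not the blank symbol
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# the path [5, 5, 0, 5, 3, 3, 0] decodes to [5, 5, 3]: repeats collapse,
# and the blank (0) between the two 5s keeps them as distinct tokens
print(ctc_greedy_collapse([5, 5, 0, 5, 3, 3, 0]))  # [5, 5, 3]
```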
This model can also serve as a foundation for further fine-tuning on related Kinyarwanda speech tasks.
The model was fine-tuned on the Kinyarwanda ASR hackathon (Track B) dataset, cited below.
```bibtex
@misc{w2v_bert_kinyarwanda_asr,
  author    = {Badr M. Abdullah},
  title     = {Adapting Wav2Vec2-BERT 2.0 for Kinyarwanda ASR},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/badrex/w2v-bert-2.0-kinyarwanda-asr-1000h}
}

@misc{kinyarwanda_asr_track_b,
  title  = {Kinyarwanda Automatic Speech Recognition Track B},
  author = {Digital Umuganda},
  year   = {2025},
  url    = {https://www.kaggle.com/competitions/kinyarwanda-automatic-speech-recognition-track-b}
}
```
For questions or issues, please contact via the Hugging Face model repository.
Base model: facebook/w2v-bert-2.0