Soloba-CTC-600M Series
soloba-ctc-0.6b-v3 is a fine-tuned version of RobotsMali/soloba-ctc-0.6b-v2 on RobotsMali/kunkado. This model does not consistently produce capitalization or punctuation, and it cannot produce acoustic event tags like those found in Kunkado in its transcriptions. It was fine-tuned using NVIDIA NeMo.
🚨 Important Note
This model, along with its associated resources, is part of an ongoing research effort; improvements and refinements are expected in future versions. A human evaluation report of the model is coming soon. Users should be aware that:
- The model may not generalize very well across all speaking conditions and dialects.
- Community feedback is welcome, and contributions are encouraged to refine the model further.
NVIDIA NeMo: Training
To fine-tune or play with the model, you will need to install NVIDIA NeMo. We recommend you install it after you've installed the latest PyTorch version.
```bash
pip install nemo-toolkit['asr']
```
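After installation, you can optionally verify that the toolkit imports correctly:

```python
import nemo

# Print the installed NeMo version
print(nemo.__version__)
```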
How to Use This Model
Note that this model has been released primarily for research purposes.
Load Model with NeMo
```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="RobotsMali/soloba-ctc-0.6b-v3")
```
Transcribe Audio
```python
asr_model.eval()

# Assuming you have a test audio file named sample_audio.wav
asr_model.transcribe(['sample_audio.wav'])
```
Input
This model accepts mono-channel audio (WAV files) as input and resamples it to a 16 kHz sample rate before performing the forward pass.
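If your source audio is stereo or stored at a different sample rate or format, a minimal conversion sketch (using librosa and soundfile, neither of which is required by the model itself; the file names are placeholders) could look like this:

```python
import librosa
import soundfile as sf

# Hypothetical input file -- load as mono and resample to 16 kHz
audio, sr = librosa.load("raw_recording.wav", sr=16000, mono=True)
sf.write("sample_audio.wav", audio, sr)
```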
Output
This model provides transcribed speech as a Hypothesis object with a text attribute containing the transcription string for a given speech sample (nemo>=2.3).
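For example, indexing into the list returned by transcribe gives you the text directly:

```python
# Each element is a Hypothesis object; its `text` attribute holds the transcription (nemo>=2.3)
hypotheses = asr_model.transcribe(['sample_audio.wav'])
print(hypotheses[0].text)
```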
Model Architecture
This model uses a FastConformer encoder and a convolutional decoder with CTC loss. FastConformer is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You may find more information on the details of FastConformer here: Fast-Conformer Model.
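To inspect the encoder and decoder configuration of the loaded checkpoint, one option (a quick sketch; the exact config keys may vary across NeMo versions) is to print the model's config:

```python
from omegaconf import OmegaConf

# asr_model loaded as shown above; cfg keys may differ across NeMo versions
print(OmegaConf.to_yaml(asr_model.cfg.encoder))
print(OmegaConf.to_yaml(asr_model.cfg.decoder))
```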
Training
The NeMo toolkit was used to fine-tune this model for 39,000 steps from the RobotsMali/soloba-ctc-0.6b-v2 model with a batch_size of 32. The fine-tuning code and configurations can be found at RobotsMali-AI/bambara-asr.
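For orientation only, a minimal fine-tuning sketch with NeMo might look like the following; this is not the actual configuration used for this model (see RobotsMali-AI/bambara-asr for that), and the manifest path is hypothetical:

```python
import lightning.pytorch as pl  # older NeMo versions use `pytorch_lightning` instead
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr

# Start from the previous checkpoint, as described above
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="RobotsMali/soloba-ctc-0.6b-v2")

# Hypothetical manifest path -- point this at your own NeMo-style training manifest
train_cfg = OmegaConf.create({
    "manifest_filepath": "manifests/train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 32,   # batch size reported in this card
    "shuffle": True,
})
asr_model.setup_training_data(train_data_config=train_cfg)

trainer = pl.Trainer(max_steps=39000, accelerator="gpu", devices=1)
trainer.fit(asr_model)
```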
The tokenizer for this model was trained on the text transcripts of the train set of RobotsMali/kunkado using this script.
Dataset
This model was fine-tuned on the human-reviewed subset of the kunkado dataset, which consists of ~40 hours of transcribed Bambara speech data. The text was normalized with the bambara-normalizer prior to training, normalizing numbers and removing punctuation and tags.
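As a rough illustration of the kind of normalization described above (this is not the bambara-normalizer API and does not handle number normalization; refer to that package for the actual implementation):

```python
import re

def rough_normalize(text: str) -> str:
    """Illustrative approximation of the tag and punctuation removal described above."""
    text = re.sub(r"\[[^\]]*\]|<[^>]*>", " ", text)  # drop bracketed/angle-bracket event tags
    text = re.sub(r"[^\w\s']", " ", text)            # strip punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

print(rough_normalize("A ko: <laugh> n bɛ taa!"))  # -> "a ko n bɛ taa"
```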
Performance
We report the Word Error Rate (WER) and Character Error Rate (CER) for this model:
| Benchmark | Decoding | WER (%) ↓ | CER (%) ↓ |
|---|---|---|---|
| Kunkado | CTC | 38.87 | 21.65 |
| Nyana Eval | CTC | XX.XX | YY.YY |
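If you want to run this kind of evaluation on your own data, WER and CER can be computed with, for instance, the jiwer library. This is an illustrative sketch, not the pipeline used to produce the numbers above; the reference texts and audio paths are placeholders:

```python
import jiwer

# Hypothetical reference transcripts and audio paths -- replace with your own test data
references = ["i ni ce", "an bɛ taa so"]
predictions = [h.text for h in asr_model.transcribe(["sample_1.wav", "sample_2.wav"])]

print(f"WER: {jiwer.wer(references, predictions):.2%}")
print(f"CER: {jiwer.cer(references, predictions):.2%}")
```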
License
This model is released under the CC-BY-4.0 license. By using this model, you agree to the terms of the license.
Feel free to open a discussion on Hugging Face or file an issue on GitHub for help or contributions.