anaszil/whisper-large-v3-turbo-darija

Automatic Speech Recognition


anaszil committed about 1 month ago

Commit 9240591 (verified) · 1 parent: c9eb957

Update README.md

Files changed (1)

README.md +10 -7


@@ -1,6 +1,13 @@
 ---
 base_model: openai/whisper-large-v3-turbo
 library_name: peft
 ---
 # 🗣️ Whisper Large v3 Turbo – Moroccan Darija (LoRA Fine-tuned)
@@ -84,15 +91,12 @@ print(output["text"])
 - **Optimizer:** AdamW
 - **Seed:** 42
 - **Training Time:** ~4.1 hours on 1 × H100 80 GB
-- **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `out_proj`, `fc1`, `fc2` |
 - **Rank (`r`):** 16
-- **Alpha (`lora_alpha`):** | 32 |
 ### 🎧 Dataset
 - **Data Source:** Private Moroccan Darija speech corpus (to be released soon).
-- **Segmentation:** All audio split into ≤30 s chunks.
-> The full dataset used to train this model will be open-sourced soon.
 ---
@@ -107,5 +111,4 @@ print(output["text"])
 Evaluation was performed on a held-out subset from the same data distribution.
 The model achieves a low CER but a relatively higher WER compared to other languages. This difference is mainly due to the absence of a standardized writing system for Darija in Morocco — many words can be spelled in several valid ways. This variability also reflects a limitation of the dataset used for fine-tuning and highlights the need to establish a consistent orthographic standard for Darija before large-scale data collection efforts.
----

 ---
 base_model: openai/whisper-large-v3-turbo
 library_name: peft
+license: mit
+language:
+- ar
+metrics:
+- cer
+- wer
+pipeline_tag: automatic-speech-recognition
 ---
 # 🗣️ Whisper Large v3 Turbo – Moroccan Darija (LoRA Fine-tuned)
 - **Optimizer:** AdamW
 - **Seed:** 42
 - **Training Time:** ~4.1 hours on 1 × H100 80 GB
+- **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `out_proj`, `fc1`, `fc2`
 - **Rank (`r`):** 16
+- **Alpha (`lora_alpha`):** 32
 ### 🎧 Dataset
 - **Data Source:** Private Moroccan Darija speech corpus (to be released soon).
 ---
 Evaluation was performed on a held-out subset from the same data distribution.
 The model achieves a low CER but a relatively higher WER compared to other languages. This difference is mainly due to the absence of a standardized writing system for Darija in Morocco — many words can be spelled in several valid ways. This variability also reflects a limitation of the dataset used for fine-tuning and highlights the need to establish a consistent orthographic standard for Darija before large-scale data collection efforts.
+---
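
Note on the CER/WER paragraph in the README: the gap it describes can be reproduced with a small, dependency-free sketch. The code below is not the evaluation script used for this model (the commit does not show one). It is a minimal illustration that assumes plain Levenshtein distance, whitespace tokenization for WER, and spaces counted as characters for CER. The two romanized sentences are made-up spelling variants chosen only to show the effect.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: same utterance, two plausible spellings of two words.
reference = "kifash dayr lyoum"
hypothesis = "kifach dayer lyoum"

print(f"WER = {wer(reference, hypothesis):.3f}")  # 2 of 3 words differ -> 0.667
print(f"CER = {cer(reference, hypothesis):.3f}")  # 2 of 17 characters differ -> 0.118
```

A single changed letter marks the whole word as wrong for WER but costs only one character for CER, which is consistent with the README's explanation that non-standardized spelling inflates WER more than CER.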