qfuxa/whisper-base-french-lora

@@ -1,42 +1,216 @@
-{
-  "alpha_pattern": {},
-  "auto_mapping": {
-    "base_model_class": "WhisperForConditionalGeneration",
-    "parent_library": "transformers.models.whisper.modeling_whisper"
-  },
-  "base_model_name_or_path": "openai/whisper-base",
-  "bias": "none",
-  "corda_config": null,
-  "eva_config": null,
-  "exclude_modules": null,
-  "fan_in_fan_out": false,
-  "inference_mode": true,
-  "init_lora_weights": true,
-  "layer_replication": null,
-  "layers_pattern": null,
-  "layers_to_transform": null,
-  "loftq_config": {},
-  "lora_alpha": 32,
-  "lora_bias": false,
-  "lora_dropout": 0.05,
-  "megatron_config": null,
-  "megatron_core": "megatron.core",
-  "modules_to_save": null,
-  "peft_type": "LORA",
-  "qalora_group_size": 16,
-  "r": 16,
-  "rank_pattern": {},
-  "revision": null,
-  "target_modules": [
-    "k_proj",
-    "v_proj",
-    "out_proj",
-    "q_proj"
-  ],
-  "target_parameters": null,
-  "task_type": "SEQ_2_SEQ_LM",
-  "trainable_token_indices": null,
-  "use_dora": false,
-  "use_qalora": false,
-  "use_rslora": false
-}

+---
+license: apache-2.0
+language:
+- fr
+library_name: peft
+base_model: openai/whisper-base
+tags:
+- whisper
+- speech-recognition
+- asr
+- lora
+- french
+- whisperlivekit
+- peft
+datasets:
+- mozilla-foundation/common_voice_23_0
+metrics:
+- wer
+- cer
+pipeline_tag: automatic-speech-recognition
+model-index:
+- name: whisper-base-french-lora
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Speech Recognition
+    dataset:
+      name: Common Voice 23.0 French
+      type: mozilla-foundation/common_voice_23_0
+      config: fr
+      split: test
+    metrics:
+    - type: wer
+      value: 39.30
+      name: Test WER
+    - type: cer
+      value: 17.39
+      name: Test CER
+  - task:
+      type: automatic-speech-recognition
+      name: Speech Recognition
+    dataset:
+      name: Common Voice 23.0 French
+      type: mozilla-foundation/common_voice_23_0
+      config: fr
+      split: validation
+    metrics:
+    - type: wer
+      value: 28.06
+      name: Validation WER
+    - type: cer
+      value: 10.06
+      name: Validation CER
+---
+# Whisper Base French LoRA
+A LoRA (Low-Rank Adaptation) adapter for [openai/whisper-base](https://huggingface.co/openai/whisper-base), fine-tuned for French speech recognition.
+It is designed for use with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit), which provides ultra-low-latency transcription.
+## Model Details
+| Property | Value |
+|----------|-------|
+| **Base Model** | `openai/whisper-base` (74M params) |
+| **Adapter Type** | LoRA (PEFT) |
+| **Trainable Parameters** | ~2.4M (~3.2% of base) |
+| **Language** | French (fr) |
+| **Task** | Transcription |
+### LoRA Configuration
+```python
+LoraConfig(
+    r=16,
+    lora_alpha=32,
+    lora_dropout=0.05,
+    bias="none",
+    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"]
+)
+```
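+In LoRA, each targeted projection keeps its frozen weight `W` and adds a trainable low-rank update scaled by `lora_alpha / r`, which is 2.0 for the values above. The NumPy sketch below illustrates that idea for a single linear layer. It is not PEFT's implementation: the hidden size of 512 is an assumption about `whisper-base`, the names are illustrative, and dropout is omitted as it would be at inference time.
+```python
+import numpy as np
+r, lora_alpha = 16, 32
+scaling = lora_alpha / r  # 2.0 for this configuration
+d_model = 512  # assumed hidden size of the targeted projections
+rng = np.random.default_rng(0)
+W = rng.standard_normal((d_model, d_model))   # frozen base weight
+A = rng.standard_normal((r, d_model)) * 0.01  # trainable down-projection
+B = np.zeros((d_model, r))                    # trainable up-projection, zero at start
+def lora_forward(x, W, A, B, scaling):
+    """Base projection plus the scaled low-rank update."""
+    return W @ x + scaling * (B @ (A @ x))
+x = rng.standard_normal(d_model)
+# With B at zero the update vanishes, so training starts from the base model's behavior.
+print(np.allclose(lora_forward(x, W, A, B, scaling), W @ x))
+```
+Because only `A` and `B` are trained, the adapter stays small relative to the base model, and the update can later be folded into `W` for inference.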
+## Performance
+### Comparison with Baseline
+| Split | Model | WER ↓ | CER ↓ |
+|-------|-------|-------|-------|
+| **Validation** | Whisper Base (baseline) | 36.94% | 15.62% |
+| **Validation** | **+ This LoRA** | **28.06%** | **10.06%** |
+| **Test** | Whisper Base (baseline) | 60.47% | 31.63% |
+| **Test** | **+ This LoRA** | **39.30%** | **17.39%** |
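+WER and CER are word-level and character-level edit distances divided by the reference length. The sketch below is a minimal pure-Python illustration, not the evaluation script behind the table. The helper names are hypothetical, and the text normalization used for the reported scores (casing, punctuation) is not stated here, so none is applied.
+```python
+def edit_distance(ref, hyp):
+    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
+    prev = list(range(len(hyp) + 1))
+    for i, r in enumerate(ref, start=1):
+        curr = [i]
+        for j, h in enumerate(hyp, start=1):
+            cost = 0 if r == h else 1
+            curr.append(min(prev[j] + 1,          # deletion
+                            curr[j - 1] + 1,      # insertion
+                            prev[j - 1] + cost))  # substitution or match
+        prev = curr
+    return prev[-1]
+def wer(reference, hypothesis):
+    """Word error rate in percent."""
+    ref_words = reference.split()
+    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)
+def cer(reference, hypothesis):
+    """Character error rate in percent (spaces count as characters)."""
+    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
+# One wrong word out of three; the same pair differs by three character edits out of twelve.
+print(round(wer("le chat dort", "le chien dort"), 2))
+print(round(cer("le chat dort", "le chien dort"), 2))
+```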
+### Improvement Summary
+| Split | WER Reduction | CER Reduction |
+|-------|---------------|---------------|
+| Validation | **-8.88 pts** (24% relative) | **-5.56 pts** (36% relative) |
+| Test | **-21.17 pts** (35% relative) | **-14.24 pts** (45% relative) |
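+The absolute and relative figures above follow directly from the comparison table. A quick check, with the scores copied from that table (the variable and function names are only illustrative):
+```python
+# (baseline, with LoRA) scores in percent, copied from the comparison table
+scores = {
+    ("Validation", "WER"): (36.94, 28.06),
+    ("Validation", "CER"): (15.62, 10.06),
+    ("Test", "WER"): (60.47, 39.30),
+    ("Test", "CER"): (31.63, 17.39),
+}
+def reduction(baseline, adapted):
+    """Return (absolute reduction in points, relative reduction in whole percent)."""
+    absolute = baseline - adapted
+    return round(absolute, 2), round(100.0 * absolute / baseline)
+for (split, metric), (baseline, adapted) in scores.items():
+    points, relative = reduction(baseline, adapted)
+    print(f"{split} {metric}: -{points:.2f} pts ({relative}% relative)")
+```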
+## Usage
+### With WhisperLiveKit (Recommended)
+The easiest way to use this model is with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) for real-time French transcription:
+```bash
+pip install whisperlivekit
+# Start the server with the French LoRA (auto-downloads from the Hugging Face Hub)
+wlk --model base --language fr --lora-path qfuxa/whisper-base-french-lora
+```
+The adapter is downloaded from the Hugging Face Hub and cached on first use.
+### With Transformers + PEFT
+```python
+import librosa  # assumption: any loader that yields 16 kHz mono float audio works
+from peft import PeftModel
+from transformers import WhisperForConditionalGeneration, WhisperProcessor
+# Load base model and processor
+base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
+processor = WhisperProcessor.from_pretrained("openai/whisper-base", language="fr", task="transcribe")
+# Load LoRA adapter
+model = PeftModel.from_pretrained(base_model, "qfuxa/whisper-base-french-lora")
+model = model.merge_and_unload()  # Optional: merge for faster inference
+# Load audio as a 1-D float array at 16 kHz ("audio.wav" is a placeholder path)
+audio_array, _ = librosa.load("audio.wav", sr=16000)
+# Transcribe
+inputs = processor.feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")
+generated_ids = model.generate(inputs.input_features, language="fr", task="transcribe")
+transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+```
+### With Native Whisper (WhisperLiveKit Backend)
+```python
+from whisperlivekit.whisper import load_model
+# Load Whisper base with French LoRA adapter
+model = load_model(
+    "base",
+    lora_path="path/to/whisper-base-french-lora"
+)
+# Transcribe (assumption: as in openai-whisper, `audio` may be a file path or a 16 kHz waveform array)
+result = model.transcribe(audio, language="fr")
+```
+## Training Details
+### Dataset
+- **Source**: [Mozilla Common Voice](https://commonvoice.mozilla.org/) v23.0 French
+- **Training samples**: 100,000
+- **Validation samples**: 2,000
+- **Test samples**: 2,000
+### Training Configuration
+| Parameter | Value |
+|-----------|-------|
+| Epochs | 5 |
+| Effective batch size | 128 (batch of 16 × 8 gradient accumulation steps) |
+| Learning rate | 3e-4 |
+| Warmup steps | 100 |
+| Weight decay | 0.01 |
+| Optimizer | AdamW |
+| Early stopping | Patience of 5 evaluations |
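+The batch size, epoch count, and warmup above imply a rough step budget. The sketch below assumes the 100,000 training samples listed under Dataset, one optimizer step per effective batch, and no early stop. Exact counts depend on how the trainer rounds partial batches.
+```python
+import math
+train_samples = 100_000  # from the Dataset section
+per_step_batch, accumulation = 16, 8
+epochs, warmup_steps = 5, 100
+effective_batch = per_step_batch * accumulation
+# Assumes the final partial batch of each epoch still triggers an optimizer step.
+steps_per_epoch = math.ceil(train_samples / effective_batch)
+max_steps = steps_per_epoch * epochs
+print(effective_batch, steps_per_epoch, max_steps)
+print(f"warmup covers {100 * warmup_steps / max_steps:.1f}% of the maximum schedule")
+```
+Under these assumptions the 100 warmup steps are a small fraction of the run, so most updates happen at or below the 3e-4 peak learning rate.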
+### Hardware
+- Trained on Apple Silicon (MPS)
+## Limitations
+- Optimized specifically for French; may not generalize well to other languages
+- Based on `whisper-base` (74M params) — consider larger models for higher accuracy
+- Performance may vary on domain-specific audio (medical, legal, technical)
+- Trained on crowd-sourced Common Voice data; may have biases toward certain accents
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{whisper-base-french-lora,
+  author = {Quentin Fuxa},
+  title = {Whisper Base French LoRA},
+  year = {2025},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/qfuxa/whisper-base-french-lora}
+}
+@misc{whisperlivekit,
+  author = {Quentin Fuxa},
+  title = {WhisperLiveKit: Ultra-low-latency speech-to-text},
+  year = {2025},
+  publisher = {GitHub},
+  url = {https://github.com/QuentinFuxa/WhisperLiveKit}
+}
+```
+## License
+Apache 2.0 — same as the base Whisper model.
+## Acknowledgments
+- [OpenAI Whisper](https://github.com/openai/whisper) for the base model
+- [Mozilla Common Voice](https://commonvoice.mozilla.org/) for the French dataset
+- [Hugging Face PEFT](https://github.com/huggingface/peft) for LoRA implementation