qfuxa/whisper-base-french-lora

@@ -1,212 +1,42 @@
----
-license: apache-2.0
-language:
-- fr
-library_name: peft
-base_model: openai/whisper-base
-tags:
-- whisper
-- speech-recognition
-- asr
-- lora
-- french
-- whisperlivekit
-- peft
-datasets:
-- mozilla-foundation/common_voice_17_0
-metrics:
-- wer
-- cer
-pipeline_tag: automatic-speech-recognition
-model-index:
-- name: whisper-base-french-lora
-  results:
-  - task:
-      type: automatic-speech-recognition
-      name: Speech Recognition
-    dataset:
-      name: Common Voice 23.0 French
-      type: mozilla-foundation/common_voice_17_0
-      config: fr
-      split: test
-    metrics:
-    - type: wer
-      value: 39.30
-      name: Test WER
-    - type: cer
-      value: 17.39
-      name: Test CER
-  - task:
-      type: automatic-speech-recognition
-      name: Speech Recognition
-    dataset:
-      name: Common Voice 23.0 French
-      type: mozilla-foundation/common_voice_17_0
-      config: fr
-      split: validation
-    metrics:
-    - type: wer
-      value: 28.06
-      name: Validation WER
-    - type: cer
-      value: 10.06
-      name: Validation CER
----
-# Whisper Base French LoRA
-A LoRA (Low-Rank Adaptation) fine-tuned adapter for [openai/whisper-base](https://huggingface.co/openai/whisper-base) optimized for French speech recognition.
-This adapter was specifically designed for use with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit), providing ultra-low-latency French transcription.
-## Model Details
-| Property | Value |
-|----------|-------|
-| **Base Model** | `openai/whisper-base` (74M params) |
-| **Adapter Type** | LoRA (PEFT) |
-| **Trainable Parameters** | ~2.4M (~3.2% of base) |
-| **Language** | French (fr) |
-| **Task** | Transcription |
-### LoRA Configuration
-```python
-LoraConfig(
-    r=16,
-    lora_alpha=32,
-    lora_dropout=0.05,
-    bias="none",
-    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"]
-)
-```
-## Performance
-### Comparison with Baseline
-| Split | Model | WER ↓ | CER ↓ |
-|-------|-------|-------|-------|
-| **Validation** | Whisper Base (baseline) | 36.94% | 15.62% |
-| **Validation** | **+ This LoRA** | **28.06%** | **10.06%** |
-| **Test** | Whisper Base (baseline) | 60.47% | 31.63% |
-| **Test** | **+ This LoRA** | **39.30%** | **17.39%** |
-### Improvement Summary
-| Split | WER Reduction | CER Reduction |
-|-------|---------------|---------------|
-| Validation | **-8.88 pts** (24% relative) | **-5.56 pts** (36% relative) |
-| Test | **-21.17 pts** (35% relative) | **-14.24 pts** (45% relative) |
-## Usage
-### With WhisperLiveKit (Recommended)
-The easiest way to use this model is with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) for real-time French transcription:
-```bash
-pip install whisperlivekit
-# Start the server with French LoRA (auto-downloads from HuggingFace)
-wlk --model base --language fr --lora-path qfuxa/whisper-base-french-lora
-```
-The adapter is automatically downloaded and cached from HuggingFace Hub on first use.
-### With Transformers + PEFT
-```python
-from transformers import WhisperForConditionalGeneration, WhisperProcessor
-from peft import PeftModel
-import torch
-# Load base model
-base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
-processor = WhisperProcessor.from_pretrained("openai/whisper-base", language="fr", task="transcribe")
-# Load LoRA adapter
-model = PeftModel.from_pretrained(base_model, "QuentinFuxa/whisper-base-french-lora")
-model = model.merge_and_unload()  # Optional: merge for faster inference
-# Transcribe
-audio = processor.feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")
-generated_ids = model.generate(audio.input_features, language="fr", task="transcribe")
-transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
-```
-### With Native Whisper (WhisperLiveKit Backend)
-```python
-from whisperlivekit.whisper import load_model
-# Load Whisper base with French LoRA adapter
-model = load_model(
-    "base",
-    lora_path="path/to/whisper-base-french-lora"
-)
-# Transcribe
-result = model.transcribe(audio, language="fr")
-```
-## Training Details
-### Dataset
-- **Source**: [Mozilla Common Voice](https://commonvoice.mozilla.org/) v23.0 French
-- **Training samples**: 100,000
-- **Validation samples**: 2,000
-- **Test samples**: 2,000
-### Training Configuration
-| Parameter | Value |
-|-----------|-------|
-| Epochs | 5 |
-| Effective batch size | 128 (16 × 8 accumulation) |
-| Learning rate | 3e-4 |
-| Warmup steps | 100 |
-| Weight decay | 0.01 |
-| Optimizer | AdamW |
-| Early stopping | 5 evaluations patience |
-## Limitations
-- Optimized specifically for French; may not generalize well to other languages
-- Based on `whisper-base` (74M params) — consider larger models for higher accuracy
-- Performance may vary on domain-specific audio (medical, legal, technical)
-- Trained on crowd-sourced Common Voice data; may have biases toward certain accents
-## Citation
-If you use this model, please cite:
-```bibtex
-@misc{whisper-base-french-lora,
-  author = {Quentin Fuxa},
-  title = {Whisper Base French LoRA},
-  year = {2025},
-  publisher = {Hugging Face},
-  url = {https://huggingface.co/QuentinFuxa/whisper-base-french-lora}
-}
-@misc{whisperlivekit,
-  author = {Quentin Fuxa},
-  title = {WhisperLiveKit: Ultra-low-latency speech-to-text},
-  year = {2025},
-  publisher = {GitHub},
-  url = {https://github.com/QuentinFuxa/WhisperLiveKit}
-}
-```
-## License
-Apache 2.0 — same as the base Whisper model.
-## Acknowledgments
-- [OpenAI Whisper](https://github.com/openai/whisper) for the base model
-- [Mozilla Common Voice](https://commonvoice.mozilla.org/) for the French dataset
-- [Hugging Face PEFT](https://github.com/huggingface/peft) for LoRA implementation

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "WhisperForConditionalGeneration",
+    "parent_library": "transformers.models.whisper.modeling_whisper"
+  },
+  "base_model_name_or_path": "openai/whisper-base",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "v_proj",
+    "out_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "SEQ_2_SEQ_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
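
Note on the added config: a minimal sketch of how the `r`, `lora_alpha`, and `use_rslora` fields combine into the scaling factor applied to the low-rank update. It assumes PEFT's usual convention (`lora_alpha / r` by default, `lora_alpha / sqrt(r)` when `use_rslora` is true). The names `ADAPTER_CONFIG` and `lora_scaling` are illustrative and not part of the repository, and only a subset of the fields from the diff is reproduced.

```python
import json
import math

# Subset of the adapter_config.json fields shown in the diff above.
ADAPTER_CONFIG = json.loads("""
{
  "base_model_name_or_path": "openai/whisper-base",
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "r": 16,
  "target_modules": ["k_proj", "v_proj", "out_proj", "q_proj"],
  "use_rslora": false
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor that multiplies the low-rank update B @ A.

    Assumed convention: alpha / r for standard LoRA,
    alpha / sqrt(r) for rank-stabilized LoRA (use_rslora).
    """
    alpha = config["lora_alpha"]
    rank = config["r"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(rank)
    return alpha / rank


if __name__ == "__main__":
    print("scaling:", lora_scaling(ADAPTER_CONFIG))
    print("adapted modules:", sorted(ADAPTER_CONFIG["target_modules"]))
```

With the values in this config (`lora_alpha` of 32, `r` of 16, `use_rslora` false), the update is scaled by 32 / 16. The `target_modules` order differs from the `LoraConfig` in the removed README, but the set of modules is the same.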