qfuxa committed
Commit 1af3cdb · verified · 1 Parent(s): a9d89ee

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md (+216 -42)

README.md CHANGED
Previous README.md content (removed): the file held only the raw LoRA adapter configuration.

    {
      "alpha_pattern": {},
      "auto_mapping": {
        "base_model_class": "WhisperForConditionalGeneration",
        "parent_library": "transformers.models.whisper.modeling_whisper"
      },
      "base_model_name_or_path": "openai/whisper-base",
      "bias": "none",
      "corda_config": null,
      "eva_config": null,
      "exclude_modules": null,
      "fan_in_fan_out": false,
      "inference_mode": true,
      "init_lora_weights": true,
      "layer_replication": null,
      "layers_pattern": null,
      "layers_to_transform": null,
      "loftq_config": {},
      "lora_alpha": 32,
      "lora_bias": false,
      "lora_dropout": 0.05,
      "megatron_config": null,
      "megatron_core": "megatron.core",
      "modules_to_save": null,
      "peft_type": "LORA",
      "qalora_group_size": 16,
      "r": 16,
      "rank_pattern": {},
      "revision": null,
      "target_modules": [
        "k_proj",
        "v_proj",
        "out_proj",
        "q_proj"
      ],
      "target_parameters": null,
      "task_type": "SEQ_2_SEQ_LM",
      "trainable_token_indices": null,
      "use_dora": false,
      "use_qalora": false,
      "use_rslora": false
    }
New README.md content:

---
license: apache-2.0
language:
- fr
library_name: peft
base_model: openai/whisper-base
tags:
- whisper
- speech-recognition
- asr
- lora
- french
- whisperlivekit
- peft
datasets:
- mozilla-foundation/common_voice_23_0
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
model-index:
- name: whisper-base-french-lora
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Common Voice 23.0 French
      type: mozilla-foundation/common_voice_23_0
      config: fr
      split: test
    metrics:
    - type: wer
      value: 39.30
      name: Test WER
    - type: cer
      value: 17.39
      name: Test CER
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Common Voice 23.0 French
      type: mozilla-foundation/common_voice_23_0
      config: fr
      split: validation
    metrics:
    - type: wer
      value: 28.06
      name: Validation WER
    - type: cer
      value: 10.06
      name: Validation CER
---

# Whisper Base French LoRA

A LoRA (Low-Rank Adaptation) adapter for [openai/whisper-base](https://huggingface.co/openai/whisper-base), fine-tuned for French speech recognition.

This adapter was designed specifically for use with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) and provides ultra-low-latency French transcription.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `openai/whisper-base` (74M params) |
| **Adapter Type** | LoRA (PEFT) |
| **Trainable Parameters** | ~2.4M (~3.2% of base) |
| **Language** | French (fr) |
| **Task** | Transcription |

### LoRA Configuration

```python
LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"]
)
```
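
If you want to check these hyperparameters against the uploaded adapter, PEFT can read the stored configuration straight from the Hub. The following is a minimal sketch; the printed parameter count should land near the ~2.4M figure quoted above:

```python
from peft import PeftConfig, PeftModel
from transformers import WhisperForConditionalGeneration

# Read adapter_config.json from the Hub and show the stored hyperparameters
cfg = PeftConfig.from_pretrained("qfuxa/whisper-base-french-lora")
print(cfg)

# Attach the adapter to its base model and count the LoRA parameters
base = WhisperForConditionalGeneration.from_pretrained(cfg.base_model_name_or_path)
model = PeftModel.from_pretrained(base, "qfuxa/whisper-base-french-lora")
lora_params = sum(p.numel() for n, p in model.named_parameters() if "lora_" in n)
print(f"LoRA parameters: {lora_params / 1e6:.2f}M")
```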

## Performance

### Comparison with Baseline

| Split | Model | WER ↓ | CER ↓ |
|-------|-------|-------|-------|
| **Validation** | Whisper Base (baseline) | 36.94% | 15.62% |
| **Validation** | **+ This LoRA** | **28.06%** | **10.06%** |
| **Test** | Whisper Base (baseline) | 60.47% | 31.63% |
| **Test** | **+ This LoRA** | **39.30%** | **17.39%** |

### Improvement Summary

| Split | WER Reduction | CER Reduction |
|-------|---------------|---------------|
| Validation | **-8.88 pts** (24% relative) | **-5.56 pts** (36% relative) |
| Test | **-21.17 pts** (35% relative) | **-14.24 pts** (45% relative) |
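
The evaluation script is not bundled with the adapter; the snippet below is a minimal sketch of how WER and CER are typically computed with the Hugging Face `evaluate` library (assumes `evaluate` and `jiwer` are installed, and that you already have lists of reference and predicted transcripts):

```python
import evaluate

# Word error rate and character error rate, the two metrics reported above
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["bonjour tout le monde", "merci beaucoup"]    # ground-truth transcripts
predictions = ["bonjour tout le monde", "merci beaucoups"]  # model outputs

print("WER (%):", 100 * wer_metric.compute(predictions=predictions, references=references))
print("CER (%):", 100 * cer_metric.compute(predictions=predictions, references=references))
```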

## Usage

### With WhisperLiveKit (Recommended)

The easiest way to use this model is with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) for real-time French transcription:

```bash
pip install whisperlivekit

# Start the server with the French LoRA (auto-downloads from the Hugging Face Hub)
wlk --model base --language fr --lora-path qfuxa/whisper-base-french-lora
```

The adapter is automatically downloaded from the Hugging Face Hub and cached on first use.
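
If you prefer to fetch the adapter ahead of time (for example, to prepare an offline machine), `huggingface_hub` can mirror the repository into the local cache. A small sketch:

```python
from huggingface_hub import snapshot_download

# Download the adapter files (config and weights) into the local Hugging Face cache
local_dir = snapshot_download(repo_id="qfuxa/whisper-base-french-lora")
print(local_dir)  # a local path that can also be passed wherever a LoRA path is expected
```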

### With Transformers + PEFT

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel
import torch

# Load the base model and processor
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
processor = WhisperProcessor.from_pretrained("openai/whisper-base", language="fr", task="transcribe")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "qfuxa/whisper-base-french-lora")
model = model.merge_and_unload()  # optional: merge the adapter into the base weights for faster inference

# Transcribe: audio_array is a 1-D float waveform sampled at 16 kHz (see the loading snippet below)
audio = processor.feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(audio.input_features, language="fr", task="transcribe")
transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
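
The block above assumes `audio_array` already holds a 16 kHz mono waveform. One way to obtain it, as an illustrative sketch (assumes `librosa` is installed and `sample.wav` is a local French recording):

```python
import librosa

# Whisper's feature extractor expects 16 kHz mono audio
audio_array, _ = librosa.load("sample.wav", sr=16000, mono=True)
```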

### With Native Whisper (WhisperLiveKit Backend)

```python
from whisperlivekit.whisper import load_model

# Load Whisper base with the French LoRA adapter
model = load_model(
    "base",
    lora_path="path/to/whisper-base-french-lora"
)

# Transcribe
result = model.transcribe(audio, language="fr")
```

## Training Details

### Dataset

- **Source**: [Mozilla Common Voice](https://commonvoice.mozilla.org/) v23.0, French (see the loading sketch below)
- **Training samples**: 100,000
- **Validation samples**: 2,000
- **Test samples**: 2,000
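
The training preprocessing is not included in this repository; the sketch below shows one typical way to pull the French split from the Hub with `datasets`. The Common Voice repositories are gated, so you may need to accept the dataset terms and authenticate first, and depending on your `datasets` version the repo may also require `trust_remote_code=True`:

```python
from datasets import load_dataset, Audio

# Stream the French subset and decode audio at Whisper's expected 16 kHz
cv = load_dataset("mozilla-foundation/common_voice_23_0", "fr", split="train", streaming=True)
cv = cv.cast_column("audio", Audio(sampling_rate=16000))

sample = next(iter(cv))
print(sample["sentence"])               # reference transcript
print(sample["audio"]["array"].shape)   # 16 kHz waveform
```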

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Epochs | 5 |
| Effective batch size | 128 (16 × 8 gradient accumulation) |
| Learning rate | 3e-4 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Early stopping | patience of 5 evaluations |
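
The original training script is not part of this repository. As a rough sketch, the table maps onto `transformers` Seq2Seq training arguments like this (the `output_dir`, evaluation cadence, and metric wiring are illustrative assumptions, not the exact values used):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-base-french-lora",  # placeholder
    num_train_epochs=5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,          # 16 x 8 = effective batch size 128
    learning_rate=3e-4,
    warmup_steps=100,
    weight_decay=0.01,                      # AdamW is the default optimizer
    eval_strategy="steps",
    eval_steps=500,                         # assumption: evaluation cadence is not documented
    save_steps=500,
    load_best_model_at_end=True,            # needed for early stopping
    metric_for_best_model="wer",
    greater_is_better=False,
    predict_with_generate=True,
)
```

These arguments would then be passed to a `Seq2SeqTrainer` together with the PEFT-wrapped model, a speech data collator, a `compute_metrics` function that returns `"wer"`, and `EarlyStoppingCallback(early_stopping_patience=5)`.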

### Hardware

- Trained on Apple Silicon (MPS)
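
Inference does not require the same hardware; a small device-selection sketch that prefers Apple's Metal backend when available, then CUDA, then CPU:

```python
import torch

# Pick the best available device at runtime
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
print(device)
```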

## Limitations

- Optimized specifically for French; may not generalize well to other languages
- Based on `whisper-base` (74M params) — consider larger models for higher accuracy
- Performance may vary on domain-specific audio (medical, legal, technical)
- Trained on crowd-sourced Common Voice data; may have biases toward certain accents

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-base-french-lora,
  author = {Quentin Fuxa},
  title = {Whisper Base French LoRA},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/qfuxa/whisper-base-french-lora}
}

@misc{whisperlivekit,
  author = {Quentin Fuxa},
  title = {WhisperLiveKit: Ultra-low-latency speech-to-text},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/QuentinFuxa/WhisperLiveKit}
}
```

## License

Apache 2.0 — same as the base Whisper model.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) for the base model
- [Mozilla Common Voice](https://commonvoice.mozilla.org/) for the French dataset
- [Hugging Face PEFT](https://github.com/huggingface/peft) for the LoRA implementation