qfuxa committed on
Commit a9d89ee (verified)
Parent: 020b3cc

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md (+42 -212)
README.md CHANGED
@@ -1,212 +1,42 @@
- ---
- license: apache-2.0
- language:
- - fr
- library_name: peft
- base_model: openai/whisper-base
- tags:
- - whisper
- - speech-recognition
- - asr
- - lora
- - french
- - whisperlivekit
- - peft
- datasets:
- - mozilla-foundation/common_voice_17_0
- metrics:
- - wer
- - cer
- pipeline_tag: automatic-speech-recognition
- model-index:
- - name: whisper-base-french-lora
- results:
- - task:
- type: automatic-speech-recognition
- name: Speech Recognition
- dataset:
- name: Common Voice 23.0 French
- type: mozilla-foundation/common_voice_17_0
- config: fr
- split: test
- metrics:
- - type: wer
- value: 39.30
- name: Test WER
- - type: cer
- value: 17.39
- name: Test CER
- - task:
- type: automatic-speech-recognition
- name: Speech Recognition
- dataset:
- name: Common Voice 23.0 French
- type: mozilla-foundation/common_voice_17_0
- config: fr
- split: validation
- metrics:
- - type: wer
- value: 28.06
- name: Validation WER
- - type: cer
- value: 10.06
- name: Validation CER
- ---
-
- # Whisper Base French LoRA
-
- A LoRA (Low-Rank Adaptation) fine-tuned adapter for [openai/whisper-base](https://huggingface.co/openai/whisper-base) optimized for French speech recognition.
-
- This adapter was specifically designed for use with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit), providing ultra-low-latency French transcription.
-
- ## Model Details
-
- | Property | Value |
- |----------|-------|
- | **Base Model** | `openai/whisper-base` (74M params) |
- | **Adapter Type** | LoRA (PEFT) |
- | **Trainable Parameters** | ~2.4M (~3.2% of base) |
- | **Language** | French (fr) |
- | **Task** | Transcription |
-
- ### LoRA Configuration
-
- ```python
- LoraConfig(
- r=16,
- lora_alpha=32,
- lora_dropout=0.05,
- bias="none",
- target_modules=["q_proj", "k_proj", "v_proj", "out_proj"]
- )
- ```
-
- ## Performance
-
- ### Comparison with Baseline
-
- | Split | Model | WER ↓ | CER ↓ |
- |-------|-------|-------|-------|
- | **Validation** | Whisper Base (baseline) | 36.94% | 15.62% |
- | **Validation** | **+ This LoRA** | **28.06%** | **10.06%** |
- | **Test** | Whisper Base (baseline) | 60.47% | 31.63% |
- | **Test** | **+ This LoRA** | **39.30%** | **17.39%** |
-
- ### Improvement Summary
-
- | Split | WER Reduction | CER Reduction |
- |-------|---------------|---------------|
- | Validation | **-8.88 pts** (24% relative) | **-5.56 pts** (36% relative) |
- | Test | **-21.17 pts** (35% relative) | **-14.24 pts** (45% relative) |
-
- ## Usage
-
- ### With WhisperLiveKit (Recommended)
-
- The easiest way to use this model is with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) for real-time French transcription:
-
- ```bash
- pip install whisperlivekit
-
- # Start the server with French LoRA (auto-downloads from HuggingFace)
- wlk --model base --language fr --lora-path qfuxa/whisper-base-french-lora
- ```
-
- The adapter is automatically downloaded and cached from HuggingFace Hub on first use.
-
- ### With Transformers + PEFT
-
- ```python
- from transformers import WhisperForConditionalGeneration, WhisperProcessor
- from peft import PeftModel
- import torch
-
- # Load base model
- base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
- processor = WhisperProcessor.from_pretrained("openai/whisper-base", language="fr", task="transcribe")
-
- # Load LoRA adapter
- model = PeftModel.from_pretrained(base_model, "QuentinFuxa/whisper-base-french-lora")
- model = model.merge_and_unload() # Optional: merge for faster inference
-
- # Transcribe
- audio = processor.feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")
- generated_ids = model.generate(audio.input_features, language="fr", task="transcribe")
- transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- ```
-
- ### With Native Whisper (WhisperLiveKit Backend)
-
- ```python
- from whisperlivekit.whisper import load_model
-
- # Load Whisper base with French LoRA adapter
- model = load_model(
- "base",
- lora_path="path/to/whisper-base-french-lora"
- )
-
- # Transcribe
- result = model.transcribe(audio, language="fr")
- ```
-
- ## Training Details
-
- ### Dataset
-
- - **Source**: [Mozilla Common Voice](https://commonvoice.mozilla.org/) v23.0 French
- - **Training samples**: 100,000
- - **Validation samples**: 2,000
- - **Test samples**: 2,000
-
- ### Training Configuration
-
- | Parameter | Value |
- |-----------|-------|
- | Epochs | 5 |
- | Effective batch size | 128 (16 × 8 accumulation) |
- | Learning rate | 3e-4 |
- | Warmup steps | 100 |
- | Weight decay | 0.01 |
- | Optimizer | AdamW |
- | Early stopping | 5 evaluations patience |
-
- ## Limitations
-
- - Optimized specifically for French; may not generalize well to other languages
- - Based on `whisper-base` (74M params) — consider larger models for higher accuracy
- - Performance may vary on domain-specific audio (medical, legal, technical)
- - Trained on crowd-sourced Common Voice data; may have biases toward certain accents
-
- ## Citation
-
- If you use this model, please cite:
-
- ```bibtex
- @misc{whisper-base-french-lora,
- author = {Quentin Fuxa},
- title = {Whisper Base French LoRA},
- year = {2025},
- publisher = {Hugging Face},
- url = {https://huggingface.co/QuentinFuxa/whisper-base-french-lora}
- }
-
- @misc{whisperlivekit,
- author = {Quentin Fuxa},
- title = {WhisperLiveKit: Ultra-low-latency speech-to-text},
- year = {2025},
- publisher = {GitHub},
- url = {https://github.com/QuentinFuxa/WhisperLiveKit}
- }
- ```
-
- ## License
-
- Apache 2.0 — same as the base Whisper model.
-
- ## Acknowledgments
-
- - [OpenAI Whisper](https://github.com/openai/whisper) for the base model
- - [Mozilla Common Voice](https://commonvoice.mozilla.org/) for the French dataset
- - [Hugging Face PEFT](https://github.com/huggingface/peft) for LoRA implementation
-
 
+ {
+ "alpha_pattern": {},
+ "auto_mapping": {
+ "base_model_class": "WhisperForConditionalGeneration",
+ "parent_library": "transformers.models.whisper.modeling_whisper"
+ },
+ "base_model_name_or_path": "openai/whisper-base",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "k_proj",
+ "v_proj",
+ "out_proj",
+ "q_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "SEQ_2_SEQ_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
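
The 42 added lines above are a PEFT `adapter_config.json` body rather than model-card prose. For readers unfamiliar with the format, a minimal sketch (not part of this commit) of how PEFT reads such a config back; the repo id is an assumption taken from the `wlk` example in the removed README and may differ from the actual adapter repo:

```python
from peft import PeftConfig

# Load the adapter config shown in the diff from the Hub (or a local directory).
# Repo id is assumed from the old README's wlk example; adjust if needed.
config = PeftConfig.from_pretrained("qfuxa/whisper-base-french-lora")

print(config.base_model_name_or_path)                    # openai/whisper-base
print(config.r, config.lora_alpha, config.lora_dropout)  # 16 32 0.05
print(sorted(config.target_modules))                     # ['k_proj', 'out_proj', 'q_proj', 'v_proj']
```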