metadata

license: apache-2.0
language:
  - fr
library_name: peft
base_model: openai/whisper-base
tags:
  - whisper
  - speech-recognition
  - asr
  - lora
  - french
  - whisperlivekit
  - peft
datasets:
  - mozilla-foundation/common_voice_23_0
metrics:
  - wer
  - cer
pipeline_tag: automatic-speech-recognition
model-index:
  - name: whisper-base-french-lora
    results:
      - task:
          type: automatic-speech-recognition
          name: Speech Recognition
        dataset:
          name: Common Voice 23.0 French
          type: mozilla-foundation/common_voice_23_0
          config: fr
          split: test
        metrics:
          - type: wer
            value: 39.3
            name: Test WER
          - type: cer
            value: 17.39
            name: Test CER
      - task:
          type: automatic-speech-recognition
          name: Speech Recognition
        dataset:
          name: Common Voice 23.0 French
          type: mozilla-foundation/common_voice_23_0
          config: fr
          split: validation
        metrics:
          - type: wer
            value: 28.06
            name: Validation WER
          - type: cer
            value: 10.06
            name: Validation CER

Whisper Base French LoRA

A LoRA (Low-Rank Adaptation) adapter for openai/whisper-base, fine-tuned for French speech recognition.

This adapter was specifically designed for use with WhisperLiveKit, providing ultra-low-latency French transcription.

Model Details

| Property | Value |
| --- | --- |
| Base Model | `openai/whisper-base` (74M params) |
| Adapter Type | LoRA (PEFT) |
| Trainable Parameters | ≈2.4M (≈3.2% of base) |
| Language | French (fr) |
| Task | Transcription |

LoRA Configuration

LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"]
)
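To make the configuration above concrete, here is a minimal pure-Python sketch of the update a LoRA layer applies. The frozen weight `W` is left untouched, and a low-rank product of two small matrices `B` and `A` is added to its output, scaled by `lora_alpha / r` (here 32 / 16 = 2). The helper names (`matvec`, `lora_forward`) and the toy dimensions are illustrative assumptions, not PEFT's API, and dropout is omitted for brevity.

```python
import random

R = 16                   # LoRA rank, from the config above
LORA_ALPHA = 32          # scaling numerator, from the config above
SCALE = LORA_ALPHA / R   # 2.0

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, scale=SCALE):
    """Frozen projection plus the scaled low-rank update: W x + scale * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

rng = random.Random(0)
d_in, d_out = 8, 8  # toy sizes chosen for illustration only

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weight
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(R)]   # trainable, small random init
B = [[0.0] * R for _ in range(d_out)]                               # trainable, zero init
x = [rng.gauss(0, 1) for _ in range(d_in)]

# With B initialised to zero, the adapted layer starts out identical to the base layer.
print(lora_forward(W, A, B, x) == matvec(W, x))
```

Only `A` and `B` receive gradients during fine-tuning, which is why the adapter stays small relative to the base model.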

Performance

Comparison with Baseline

| Split | Model | WER ↓ | CER ↓ |
| --- | --- | --- | --- |
| Validation | Whisper Base (baseline) | 36.94% | 15.62% |
| Validation | + This LoRA | 28.06% | 10.06% |
| Test | Whisper Base (baseline) | 60.47% | 31.63% |
| Test | + This LoRA | 39.30% | 17.39% |
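For readers unfamiliar with the two metrics, the sketch below shows how word error rate (WER) and character error rate (CER) are conventionally defined: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. It is an illustration only. The evaluation script and text normalization behind the numbers above are not described in this card, and corpus-level scores are usually computed from total errors over total reference length rather than by averaging per-utterance scores.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate for a single utterance."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate for a single utterance."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

# One substituted word out of three reference words.
print(f"WER: {wer('le chat dort', 'le chien dort'):.2%}")
print(f"CER: {cer('le chat dort', 'le chien dort'):.2%}")
```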

Improvement Summary

| Split | WER Reduction | CER Reduction |
| --- | --- | --- |
| Validation | -8.88 pts (24% relative) | -5.56 pts (36% relative) |
| Test | -21.17 pts (35% relative) | -14.24 pts (45% relative) |
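The absolute and relative reductions follow directly from the comparison table. As a quick check, the short script below recomputes them: absolute reduction is baseline minus adapted score in percentage points, and relative reduction is that difference divided by the baseline. The function name `reductions` is introduced here for illustration.

```python
def reductions(baseline, adapted):
    """Return (absolute reduction in points, relative reduction in whole percent)."""
    absolute = baseline - adapted
    relative = 100 * absolute / baseline
    return round(absolute, 2), round(relative)

# (baseline, with LoRA) pairs taken from the comparison table
scores = {
    ("Validation", "WER"): (36.94, 28.06),
    ("Validation", "CER"): (15.62, 10.06),
    ("Test", "WER"): (60.47, 39.30),
    ("Test", "CER"): (31.63, 17.39),
}

for (split, metric), (baseline, adapted) in scores.items():
    absolute, relative = reductions(baseline, adapted)
    print(f"{split} {metric}: -{absolute} pts ({relative}% relative)")
```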

Usage

With WhisperLiveKit (Recommended)

The easiest way to use this model is with WhisperLiveKit for real-time French transcription:

pip install whisperlivekit

# Start the server with French LoRA (auto-downloads from HuggingFace)
wlk --model base --language fr --lora-path qfuxa/whisper-base-french-lora

The adapter is automatically downloaded and cached from HuggingFace Hub on first use.

With Transformers + PEFT

import librosa  # assumption: any loader that returns 16 kHz mono float audio works here
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load base model and processor
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
processor = WhisperProcessor.from_pretrained("openai/whisper-base", language="fr", task="transcribe")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "qfuxa/whisper-base-french-lora")
model = model.merge_and_unload()  # Optional: merge for faster inference
model.eval()

# Load audio as a 1-D float array resampled to 16 kHz ("audio.wav" is a placeholder path)
audio_array, _ = librosa.load("audio.wav", sr=16000)

# Transcribe
inputs = processor.feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(inputs.input_features, language="fr", task="transcribe")
transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

With Native Whisper (WhisperLiveKit Backend)

from whisperlivekit.whisper import load_model

# Load Whisper base with French LoRA adapter
model = load_model(
    "base",
    lora_path="path/to/whisper-base-french-lora"
)

# Transcribe
result = model.transcribe(audio, language="fr")

Training Details

Dataset

- Source: Mozilla Common Voice v23.0 French
- Training samples: 100,000
- Validation samples: 2,000
- Test samples: 2,000

Training Configuration

| Parameter | Value |
| --- | --- |
| Epochs | 5 |
| Effective batch size | 128 (batch size 16 × 8 gradient accumulation steps) |
| Learning rate | 3e-4 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Early stopping | Patience of 5 evaluations |
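The figures above imply a rough training schedule, which the sketch below works out. It assumes the final partial batch of each epoch still counts as an optimizer step (some trainers round down instead) and that early stopping did not end the run sooner, so the step counts are upper-bound estimates, not values reported by the author.

```python
import math

TRAIN_SAMPLES = 100_000   # from the Dataset section
PER_STEP_BATCH = 16       # from the table above
GRAD_ACCUM_STEPS = 8      # from the table above
EPOCHS = 5
WARMUP_STEPS = 100

# One optimizer step consumes PER_STEP_BATCH * GRAD_ACCUM_STEPS samples.
effective_batch = PER_STEP_BATCH * GRAD_ACCUM_STEPS

# Assumption: the last, smaller batch of an epoch still triggers an optimizer step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
max_steps = steps_per_epoch * EPOCHS
warmup_fraction = WARMUP_STEPS / max_steps

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"optimizer steps over {EPOCHS} epochs: {max_steps}")
print(f"warmup share of training: {warmup_fraction:.1%}")
```

Under these assumptions, warmup covers only a few percent of the run, so most steps use the scheduled learning rate.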

Hardware

Trained on Apple Silicon (MPS)

Limitations

- Optimized specifically for French; may not generalize well to other languages
- Based on whisper-base (74M params); consider larger models for higher accuracy
- Performance may vary on domain-specific audio (medical, legal, technical)
- Trained on crowd-sourced Common Voice data; may have biases toward certain accents

Citation

If you use this model, please cite:

@misc{whisper-base-french-lora,
  author = {Quentin Fuxa},
  title = {Whisper Base French LoRA},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/qfuxa/whisper-base-french-lora}
}

@misc{whisperlivekit,
  author = {Quentin Fuxa},
  title = {WhisperLiveKit: Ultra-low-latency speech-to-text},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/QuentinFuxa/WhisperLiveKit}
}

License

Apache 2.0 — same as the base Whisper model.

Acknowledgments

- OpenAI Whisper for the base model
- Mozilla Common Voice for the French dataset
- Hugging Face PEFT for the LoRA implementation