metadata
license: apache-2.0
language:
  - fr
library_name: peft
base_model: openai/whisper-base
tags:
  - whisper
  - speech-recognition
  - asr
  - lora
  - french
  - whisperlivekit
  - peft
datasets:
  - mozilla-foundation/common_voice_23_0
metrics:
  - wer
  - cer
pipeline_tag: automatic-speech-recognition
model-index:
  - name: whisper-base-french-lora
    results:
      - task:
          type: automatic-speech-recognition
          name: Speech Recognition
        dataset:
          name: Common Voice 23.0 French
          type: mozilla-foundation/common_voice_23_0
          config: fr
          split: test
        metrics:
          - type: wer
            value: 39.3
            name: Test WER
          - type: cer
            value: 17.39
            name: Test CER
      - task:
          type: automatic-speech-recognition
          name: Speech Recognition
        dataset:
          name: Common Voice 23.0 French
          type: mozilla-foundation/common_voice_23_0
          config: fr
          split: validation
        metrics:
          - type: wer
            value: 28.06
            name: Validation WER
          - type: cer
            value: 10.06
            name: Validation CER

Whisper Base French LoRA

A LoRA (Low-Rank Adaptation) adapter for openai/whisper-base, fine-tuned for French speech recognition.

This adapter was specifically designed for use with WhisperLiveKit, providing ultra-low-latency French transcription.

Model Details

| Property | Value |
|---|---|
| Base Model | openai/whisper-base (74M params) |
| Adapter Type | LoRA (PEFT) |
| Trainable Parameters | 2.4M (3.2% of base) |
| Language | French (fr) |
| Task | Transcription |

LoRA Configuration

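from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                 # rank of the low-rank update matrices
    lora_alpha=32,        # the update is scaled by lora_alpha / r = 2
    lora_dropout=0.05,    # dropout applied to the adapter input during training
    bias="none",          # bias terms stay frozen
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],  # attention projections (encoder self-attention, decoder self- and cross-attention)
)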
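To make the configuration concrete, here is a minimal NumPy sketch of what LoRA computes for a single adapted projection (for example `q_proj`) with r=16 and lora_alpha=32. It illustrates the general LoRA technique, not PEFT's internals or the author's training code: the 512-dimensional hidden size is whisper-base's, while the random weights, the input batch, and the helper names `lora_forward` and `merge` are hypothetical. It also shows why merging the adapter (as `merge_and_unload()` does in the Transformers example) does not change the outputs.

```python
import numpy as np

# Values from the LoraConfig above.
R = 16
LORA_ALPHA = 32
SCALING = LORA_ALPHA / R  # the low-rank update is scaled by lora_alpha / r = 2.0

D_MODEL = 512  # whisper-base hidden size; q/k/v/out projections are 512 x 512

rng = np.random.default_rng(0)

# Frozen pretrained weight of one projection (random stand-in for illustration).
W = rng.standard_normal((D_MODEL, D_MODEL)).astype(np.float32)

# Trainable low-rank factors: A is (r, d_in), B is (d_out, r).
# B starts at zero, so the adapter is a no-op before training.
A = (0.01 * rng.standard_normal((R, D_MODEL))).astype(np.float32)
B = np.zeros((D_MODEL, R), dtype=np.float32)


def lora_forward(x, W, A, B, scaling):
    """y = x W^T + scaling * (x A^T) B^T  (inference mode: dropout disabled)."""
    return x @ W.T + scaling * ((x @ A.T) @ B.T)


def merge(W, A, B, scaling):
    """Fold the update into one dense weight: W' = W + scaling * B A."""
    return W + scaling * (B @ A)


x = rng.standard_normal((4, D_MODEL)).astype(np.float32)

# Before training, the adapted layer reproduces the base layer.
y_init = lora_forward(x, W, A, B, SCALING)

# Pretend training has moved B away from zero.
B_trained = (0.01 * rng.standard_normal((D_MODEL, R))).astype(np.float32)
y_adapter = lora_forward(x, W, A, B_trained, SCALING)
y_merged = x @ merge(W, A, B_trained, SCALING).T

# Trainable parameters for this one projection vs. its frozen dense weight.
lora_params = A.size + B.size  # 2 * 16 * 512 = 16384
dense_params = W.size          # 512 * 512 = 262144
print(lora_params, dense_params)                    # 16384 262144
print(np.allclose(y_adapter, y_merged, atol=1e-3))  # True
```

Keeping the adapter separate lets you swap or disable it at runtime, while merging removes the extra matrix multiplications at inference time.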

Performance

Comparison with Baseline

| Split | Model | WER ↓ | CER ↓ |
|---|---|---|---|
| Validation | Whisper Base (baseline) | 36.94% | 15.62% |
| Validation | + This LoRA | 28.06% | 10.06% |
| Test | Whisper Base (baseline) | 60.47% | 31.63% |
| Test | + This LoRA | 39.30% | 17.39% |
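The card does not say how WER and CER were computed (which library, or what text normalization such as lowercasing or punctuation removal), so the following is only a sketch of the standard definitions: the number of edits (substitutions + deletions + insertions) divided by the reference length, counted over words for WER and over characters for CER, and aggregated over the whole corpus. The helper names and the example sentences are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = cur
    return prev[-1]


def corpus_wer(refs, hyps):
    """Total word edits divided by total reference words, in percent."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return 100.0 * edits / words


def corpus_cer(refs, hyps):
    """Same as WER, but over characters (spaces included)."""
    edits = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return 100.0 * edits / chars


refs = ["le chat est sur le tapis"]
hyps = ["le chat et sur tapis"]
print(round(corpus_wer(refs, hyps), 2))  # 2 word edits / 6 words -> 33.33
print(round(corpus_cer(refs, hyps), 2))  # 4 char edits / 24 chars -> 16.67
```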

Improvement Summary

| Split | WER Reduction | CER Reduction |
|---|---|---|
| Validation | -8.88 pts (24% relative) | -5.56 pts (36% relative) |
| Test | -21.17 pts (35% relative) | -14.24 pts (45% relative) |
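The summary follows directly from the comparison table. The absolute reduction is the baseline score minus the adapted score, and the relative reduction divides that difference by the baseline. A small check using only the numbers reported above:

```python
# (baseline, with LoRA) pairs copied from the comparison table, in percent.
results = {
    ("Validation", "WER"): (36.94, 28.06),
    ("Validation", "CER"): (15.62, 10.06),
    ("Test", "WER"): (60.47, 39.30),
    ("Test", "CER"): (31.63, 17.39),
}


def reduction(baseline, adapted):
    """Return (absolute drop in points, relative drop as % of the baseline)."""
    absolute = baseline - adapted
    relative = 100.0 * absolute / baseline
    return absolute, relative


for (split, metric), (base, lora) in results.items():
    absolute, relative = reduction(base, lora)
    print(f"{split:<10} {metric}: -{absolute:.2f} pts ({relative:.0f}% relative)")
```

Relative reductions are the more comparable figure across splits, since the test-split baseline starts from a much higher error rate.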

Usage

With WhisperLiveKit (Recommended)

The easiest way to use this model is with WhisperLiveKit for real-time French transcription:

pip install whisperlivekit

# Start the server with French LoRA (auto-downloads from HuggingFace)
wlk --model base --language fr --lora-path qfuxa/whisper-base-french-lora

On first use, the adapter is downloaded from the Hugging Face Hub and cached locally.

With Transformers + PEFT

from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel

# Load base model and processor
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
processor = WhisperProcessor.from_pretrained("openai/whisper-base", language="fr", task="transcribe")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "qfuxa/whisper-base-french-lora")
model = model.merge_and_unload()  # Optional: fold the adapter into the base weights for faster inference
model.eval()

# Transcribe
# audio_array: 1-D mono float32 waveform sampled at 16 kHz
inputs = processor.feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features, language="fr", task="transcribe")
transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
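The snippet above assumes `audio_array` is already a mono waveform sampled at 16 kHz, which is what the Whisper feature extractor expects. If your audio arrives in another format, a sketch like the following can prepare it. The helper `to_whisper_input` is hypothetical: it assumes signed integer PCM or float input shaped `(samples,)` or `(samples, channels)`, and it resamples with plain linear interpolation, which is crude (no anti-aliasing). A dedicated resampler, for example from librosa or torchaudio, is preferable for real use.

```python
import numpy as np

WHISPER_SR = 16_000  # Whisper models expect 16 kHz input


def to_whisper_input(samples, sr, target_sr=WHISPER_SR):
    """Convert PCM audio to a mono float32 waveform at target_sr (hypothetical helper)."""
    x = np.asarray(samples)

    # Signed integer PCM (e.g. int16) -> floats in [-1, 1).
    if np.issubdtype(x.dtype, np.integer):
        x = x.astype(np.float32) / float(np.iinfo(x.dtype).max + 1)
    else:
        x = x.astype(np.float32)

    # (samples, channels) -> mono by averaging the channels.
    if x.ndim == 2:
        x = x.mean(axis=1)

    # Naive linear-interpolation resampling.
    if sr != target_sr:
        n_out = int(round(len(x) * target_sr / sr))
        t_in = np.arange(len(x)) / sr
        t_out = np.arange(n_out) / target_sr
        x = np.interp(t_out, t_in, x)

    return x.astype(np.float32)


# One second of stereo int16 silence recorded at 44.1 kHz.
stereo = np.zeros((44_100, 2), dtype=np.int16)
audio_array = to_whisper_input(stereo, sr=44_100)
print(audio_array.shape, audio_array.dtype)  # (16000,) float32
```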

With Native Whisper (WhisperLiveKit Backend)

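from whisperlivekit.whisper import load_model

# Load Whisper base with the French LoRA adapter
model = load_model(
    "base",
    lora_path="path/to/whisper-base-french-lora"
)

# Transcribe
# `audio` is assumed to be a file path or a 16 kHz mono float32 NumPy array, as accepted by openai-whisper's transcribe()
result = model.transcribe(audio, language="fr")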

Training Details

Dataset

  • Source: Mozilla Common Voice v23.0 French
  • Training samples: 100,000
  • Validation samples: 2,000
  • Test samples: 2,000

Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 5 |
| Effective batch size | 128 (batch size 16 × 8 gradient accumulation steps) |
| Learning rate | 3e-4 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Early stopping | Patience of 5 evaluations |
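A back-of-the-envelope check of what these settings imply, assuming all 100,000 training samples are used every epoch and the final partial batch is kept: one optimizer step consumes 16 × 8 = 128 samples, giving 782 steps per epoch and 3,910 steps over 5 epochs, so the 100 warmup steps cover about 2.6% of training. The exact count depends on how the training loop handles the last partial batch. The hypothetical `should_stop` helper sketches the usual meaning of a patience of 5 evaluations (the same idea as Transformers' `EarlyStoppingCallback`): stop once the monitored metric has failed to improve for 5 consecutive evaluations. The card states neither the evaluation interval nor the monitored metric, so validation WER is an assumption here.

```python
import math

# Values from the training configuration table.
TRAIN_SAMPLES = 100_000
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 8
EPOCHS = 5
WARMUP_STEPS = 100

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS        # 128
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)  # 782 (the last step is partial)
total_steps = steps_per_epoch * EPOCHS                        # 3910
warmup_fraction = WARMUP_STEPS / total_steps                  # about 0.026

print(effective_batch, steps_per_epoch, total_steps, f"{warmup_fraction:.1%}")  # 128 782 3910 2.6%


def should_stop(eval_wers, patience=5):
    """True once WER (lower is better) has not improved for `patience` consecutive evaluations."""
    best = math.inf
    evals_without_improvement = 0
    for wer in eval_wers:
        if wer < best:
            best = wer
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return True
    return False


# Hypothetical validation-WER curve: best at the 3rd evaluation, then 5 evaluations without improvement.
print(should_stop([40.0, 35.0, 34.0, 34.5, 34.2, 34.1, 34.3, 34.0]))  # True
```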

Hardware

  • Trained on Apple Silicon (MPS)

Limitations

  • Optimized specifically for French; may not generalize well to other languages
  • Based on whisper-base (74M params) — consider larger models for higher accuracy
  • Performance may vary on domain-specific audio (medical, legal, technical)
  • Trained on crowd-sourced Common Voice data; may have biases toward certain accents

Citation

If you use this model, please cite:

@misc{whisper-base-french-lora,
  author = {Quentin Fuxa},
  title = {Whisper Base French LoRA},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/qfuxa/whisper-base-french-lora}
}

@misc{whisperlivekit,
  author = {Quentin Fuxa},
  title = {WhisperLiveKit: Ultra-low-latency speech-to-text},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/QuentinFuxa/WhisperLiveKit}
}

License

Apache 2.0 — same as the base Whisper model.

Acknowledgments