---
title: Voice Model RL Training
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
python_version: 3.11
hardware: t4-small
---

Voice Model RL Training

Train open-source voice models using reinforcement learning with the PPO and REINFORCE algorithms.

Features

  • 🎯 Multiple RL Algorithms: Choose between PPO and REINFORCE
  • 🚀 GPU Acceleration: Automatic GPU detection and usage
  • 📊 Real-time Monitoring: Track training progress in real-time
  • 🎵 Model Comparison: Compare base vs trained models
  • 💾 Checkpoint Management: Automatic model saving and loading
  • 🎤 Multiple Base Models: Support for Wav2Vec2, WavLM, and more

Supported Models

  • Facebook Wav2Vec2 (Base & Large)
  • Microsoft WavLM Base Plus
  • Any compatible HuggingFace speech model
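
Any of these checkpoints can be loaded with the `transformers` Auto classes. A minimal sketch (the model IDs are the public Hub names; the repo's own wrapper in `voice_rl/models/` may differ):

```python
# Minimal sketch: load one of the supported base models from the HuggingFace Hub.
from transformers import AutoFeatureExtractor, AutoModel
import torch

model_id = "facebook/wav2vec2-base"  # or "facebook/wav2vec2-large", "microsoft/wavlm-base-plus"

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Run one second of silent 16 kHz audio through the encoder as a smoke test.
dummy_audio = [0.0] * 16000
inputs = feature_extractor(dummy_audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state
print(hidden_states.shape)  # (1, num_frames, hidden_size)
```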

How to Use

1. Training Tab

  1. Select Base Model: Choose from available pretrained models
  2. Configure Algorithm: Select PPO (recommended) or REINFORCE
  3. Set Parameters:
    • Episodes: 10-100 (start with 20 for testing)
    • Learning Rate: 1e-5 to 1e-3 (default: 3e-4)
    • Batch Size: 4-64 (depends on GPU memory)
  4. Start Training: Click "Start Training" and monitor progress
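
For reference, the UI settings above roughly correspond to a configuration like the following. This is a hypothetical sketch; the actual field names in `app.py` may differ:

```python
# Hypothetical training configuration mirroring the UI controls above.
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    base_model: str = "facebook/wav2vec2-base"
    algorithm: str = "ppo"        # "ppo" or "reinforce"
    episodes: int = 20            # 10-100; 20 is a reasonable smoke test
    learning_rate: float = 3e-4   # 1e-5 to 1e-3
    batch_size: int = 16          # 4-64, limited by GPU memory

config = TrainingConfig()
```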

2. Compare Results Tab

  1. Upload Audio: Provide a test audio sample
  2. Generate Comparison: Process through both models
  3. Listen: Compare base vs trained model outputs
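
Whatever the comparison pipeline does internally, both models expect 16 kHz input, so the uploaded sample is resampled first. A minimal sketch of that preprocessing step (the filename is hypothetical):

```python
# Load the uploaded test sample and resample to the 16 kHz rate that
# Wav2Vec2/WavLM expect before running either model.
import librosa

audio, sr = librosa.load("test_sample.wav", sr=16000)  # hypothetical filename
print(f"{len(audio) / sr:.2f} s of audio at {sr} Hz")
```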

Reward Functions

The training optimizes for three key metrics:

  • Clarity (33%): Audio signal quality and noise reduction
  • Naturalness (33%): Natural speech patterns and prosody
  • Accuracy (34%): Fidelity to original content
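
Assuming each metric is scored in [0, 1], the combined reward is simply the weighted sum above. An illustrative sketch (the real scoring functions live in `voice_rl/` and are not shown here):

```python
# Illustrative weighted reward; each metric is assumed to be scored in [0, 1].
def combined_reward(clarity: float, naturalness: float, accuracy: float) -> float:
    return 0.33 * clarity + 0.33 * naturalness + 0.34 * accuracy

print(combined_reward(clarity=0.8, naturalness=0.7, accuracy=0.9))  # 0.801
```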

Hardware Requirements

  • CPU: Works but slow (5-10 min per episode)
  • GPU: Recommended, T4 or better (1-2 min per episode)
  • Memory: 8GB+ RAM, 4GB+ VRAM
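
GPU detection follows the usual PyTorch pattern; a sketch of how automatic device selection typically works, not necessarily the exact logic in `app.py`:

```python
# Pick the GPU when one is available, otherwise fall back to CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU found, using CPU (expect ~5-10 min per episode)")
```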

Technical Details

RL Algorithms

PPO (Proximal Policy Optimization)

  • More stable training
  • Uses value function
  • Better for most cases
  • Slightly slower per episode
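
At its core, PPO combines a clipped surrogate policy loss with a value-function (critic) loss. A generic PyTorch sketch of those two terms, not the exact code in `voice_rl/rl/`:

```python
# Core PPO update terms: clipped surrogate policy loss plus a value loss.
import torch

def ppo_losses(new_log_probs, old_log_probs, advantages, values, returns, clip_eps=0.2):
    ratio = torch.exp(new_log_probs - old_log_probs)             # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()          # clipped surrogate
    value_loss = torch.nn.functional.mse_loss(values, returns)   # critic regression
    return policy_loss, value_loss
```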

REINFORCE

  • Simpler algorithm
  • Higher variance
  • Faster per episode
  • May need more episodes
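
REINFORCE skips the value function and weights log-probabilities directly by the return, optionally minus a baseline to reduce variance. Again a generic sketch, not the repo's exact implementation:

```python
# REINFORCE: push up log-probabilities of actions in proportion to the
# (baseline-subtracted) return.
import torch

def reinforce_loss(log_probs, returns):
    advantages = returns - returns.mean()   # simple mean baseline to reduce variance
    return -(log_probs * advantages).mean()
```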

Training Process

  1. Load pretrained base model
  2. Add RL policy/value heads
  3. Train using custom reward function
  4. Save checkpoints periodically
  5. Generate comparisons
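
Put together, the loop looks roughly like this. Object and method names here are illustrative placeholders, not the actual `voice_rl` API:

```python
# Rough shape of the training loop described above (illustrative only).
def train(agent, env, episodes=20, batch_size=16, checkpoint_every=10):
    for episode in range(episodes):
        batch = env.rollout(agent, batch_size=batch_size)     # collect audio episodes
        rewards = [env.reward(sample) for sample in batch]    # custom reward function
        agent.update(batch, rewards)                          # PPO or REINFORCE step
        if (episode + 1) % checkpoint_every == 0:
            agent.save_checkpoint(f"workspace/checkpoint_ep{episode + 1}")
        print(f"episode {episode + 1}: mean reward {sum(rewards) / len(rewards):.3f}")
```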

Local Development

Clone and run locally:

```bash
git clone https://huggingface.co/spaces/USERNAME/voice-model-rl-training
cd voice-model-rl-training
pip install -r requirements.txt
python app.py
```

Repository Structure

```text
voice-rl-training/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── voice_rl/              # Core training modules
│   ├── models/            # Model wrappers
│   ├── rl/                # RL algorithms
│   ├── training/          # Training orchestration
│   ├── data/              # Data handling
│   ├── monitoring/        # Metrics and visualization
│   └── evaluation/        # Model evaluation
└── workspace/             # Training outputs (git-ignored)
```