---
title: Voice Model RL Training
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
python_version: 3.11
hardware: t4-small
---
# Voice Model RL Training
Train open-source voice models using Reinforcement Learning with PPO and REINFORCE algorithms.
## Features

- 🎯 **Multiple RL Algorithms**: Choose between PPO and REINFORCE
- 🚀 **GPU Acceleration**: Automatic GPU detection and usage
- 📊 **Real-time Monitoring**: Track training progress in real time
- 🎵 **Model Comparison**: Compare base vs. trained models
- 💾 **Checkpoint Management**: Automatic model saving and loading
- 🤗 **Multiple Base Models**: Support for Wav2Vec2, WavLM, and more
## Supported Models
- Facebook Wav2Vec2 (Base & Large)
- Microsoft WavLM Base Plus
- Any compatible HuggingFace speech model
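
All of these can be pulled straight from the Hugging Face Hub with `transformers`. A minimal loading sketch (generic usage, not necessarily how `app.py` wires them up):

```python
from transformers import AutoFeatureExtractor, AutoModel

# Any supported checkpoint can be swapped in here.
model_id = "facebook/wav2vec2-base"  # e.g. "facebook/wav2vec2-large" or "microsoft/wavlm-base-plus"

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
```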
## How to Use

### 1. Training Tab

- **Select Base Model**: Choose from available pretrained models
- **Configure Algorithm**: Select PPO (recommended) or REINFORCE
- **Set Parameters** (see the example configuration below):
  - Episodes: 10-100 (start with 20 for testing)
  - Learning Rate: 1e-5 to 1e-3 (default: 3e-4)
  - Batch Size: 4-64 (depends on GPU memory)
- **Start Training**: Click "Start Training" and monitor progress
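
For reference, these UI parameters correspond to a training configuration roughly like the one below; the field names are illustrative and may differ from the Space's internal code:

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    # Illustrative field names; value ranges mirror the UI guidance above.
    base_model: str = "facebook/wav2vec2-base"
    algorithm: str = "ppo"        # "ppo" (recommended) or "reinforce"
    episodes: int = 20            # 10-100; start with 20 for testing
    learning_rate: float = 3e-4   # 1e-5 to 1e-3
    batch_size: int = 16          # 4-64, depending on GPU memory
```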
### 2. Compare Results Tab

- **Upload Audio**: Provide a test audio sample
- **Generate Comparison**: Process through both models
- **Listen**: Compare base vs. trained model outputs
## Reward Functions

The training optimizes for three key metrics:

- **Clarity (33%)**: Audio signal quality and noise reduction
- **Naturalness (33%)**: Natural speech patterns and prosody
- **Accuracy (34%)**: Fidelity to original content
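
A minimal sketch of how such a weighted reward can be combined, assuming each metric has already been scored in [0, 1] (the function name is illustrative, not the Space's actual API):

```python
def combined_reward(clarity: float, naturalness: float, accuracy: float) -> float:
    """Weighted sum of the three metrics, using the percentages listed above."""
    return 0.33 * clarity + 0.33 * naturalness + 0.34 * accuracy

# Example: clear and accurate audio that sounds slightly unnatural.
print(combined_reward(clarity=0.9, naturalness=0.7, accuracy=0.85))  # ~0.817
```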
## Hardware Requirements

- **CPU**: Works, but slow (5-10 min per episode)
- **GPU**: Recommended (T4 or better; 1-2 min per episode)
- **Memory**: 8GB+ RAM, 4GB+ VRAM
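
The automatic GPU detection mentioned under Features usually comes down to a device check like this (a generic PyTorch sketch, not necessarily the exact code in `app.py`):

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")
```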
## Technical Details

### RL Algorithms

#### PPO (Proximal Policy Optimization)
- More stable training
- Uses value function
- Better for most cases
- Slightly slower per episode
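
PPO's stability comes from its clipped surrogate objective, which limits how far a single update can push the policy away from the one that collected the data. A minimal PyTorch sketch of that loss (illustrative, not the Space's exact implementation):

```python
import torch

def ppo_clipped_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the rollout (old) policy.
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: take the smaller term, negate it so we can minimize.
    return -torch.min(unclipped, clipped).mean()
```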
#### REINFORCE
- Simpler algorithm
- Higher variance
- Faster per episode
- May need more episodes
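
REINFORCE, by contrast, uses the plain policy-gradient estimator; subtracting a baseline is the standard way to tame its higher variance. Again a generic sketch rather than the Space's exact code:

```python
import torch

def reinforce_loss(log_probs, returns, baseline=0.0):
    # Score-function estimator: each log-probability is weighted by its
    # (baseline-corrected) return, negated so gradient descent maximizes reward.
    advantages = returns - baseline
    return -(log_probs * advantages).mean()
```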
### Training Process
- Load pretrained base model
- Add RL policy/value heads
- Train using custom reward function
- Save checkpoints periodically
- Generate comparisons
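
Put together, the loop looks roughly like the sketch below. `VoicePolicy`, `run_episode`, and `save_checkpoint` are hypothetical names used only for illustration; they are not the Space's real modules.

```python
import torch
from transformers import AutoModel

def train(config):
    backbone = AutoModel.from_pretrained(config.base_model)  # 1. load the pretrained base model
    policy = VoicePolicy(backbone)                            # 2. add RL policy/value heads (hypothetical wrapper)
    optimizer = torch.optim.Adam(policy.parameters(), lr=config.learning_rate)

    for episode in range(config.episodes):
        # 3. roll out the policy, score the samples with the custom reward
        #    function, and convert rewards into advantages (hypothetical helper).
        batch = run_episode(policy, config.batch_size)
        loss = ppo_clipped_loss(batch.log_probs, batch.old_log_probs, batch.advantages)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (episode + 1) % 5 == 0:
            save_checkpoint(policy, episode)                  # 4. save checkpoints periodically

    return policy                                             # 5. ready to compare against the base model
```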
## Local Development

Clone and run locally:

```bash
git clone https://huggingface.co/spaces/USERNAME/voice-model-rl-training
cd voice-model-rl-training
pip install -r requirements.txt
python app.py
```
## Repository Structure

```
voice-rl-training/
├── app.py              # Main Gradio application
├── requirements.txt    # Python dependencies
├── README.md           # This file
├── voice_rl/           # Core training modules
│   ├── models/         # Model wrappers
│   ├── rl/             # RL algorithms
│   ├── training/       # Training orchestration
│   ├── data/           # Data handling
│   ├── monitoring/     # Metrics and visualization
│   └── evaluation/     # Model evaluation
└── workspace/          # Training outputs (git-ignored)
```