---
title: Voice Model RL Training
emoji: πŸŽ™οΈ
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
python_version: 3.11
hardware: t4-small
---
# Voice Model RL Training
Train open-source voice models with reinforcement learning (RL), using the PPO and REINFORCE algorithms.
## Features
- 🎯 **Multiple RL Algorithms**: Choose between PPO and REINFORCE
- πŸš€ **GPU Acceleration**: Automatic GPU detection and usage
- πŸ“Š **Real-time Monitoring**: Track training progress in real-time
- 🎡 **Model Comparison**: Compare base vs trained models
- πŸ’Ύ **Checkpoint Management**: Automatic model saving and loading
- 🎀 **Multiple Base Models**: Support for Wav2Vec2, WavLM, and more
## Supported Models
- Facebook Wav2Vec2 (Base & Large)
- Microsoft WavLM Base Plus
- Any compatible HuggingFace speech model
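As a rough sketch, a base model can be loaded from the Hugging Face Hub with the `transformers` library (the exact loading code in `app.py` may differ; the model IDs shown are the public checkpoints):

```python
import torch
from transformers import AutoFeatureExtractor, AutoModel

model_name = "facebook/wav2vec2-base"  # or "microsoft/wavlm-base-plus"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode one second of silence at 16 kHz to check the pipeline end to end.
waveform = torch.zeros(16000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
hidden_states = model(**inputs).last_hidden_state  # (1, frames, hidden_dim)
```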
## How to Use
### 1. Training Tab
1. **Select Base Model**: Choose from available pretrained models
2. **Configure Algorithm**: Select PPO (recommended) or REINFORCE
3. **Set Parameters** (see the example configuration after this list):
- Episodes: 10-100 (start with 20 for testing)
- Learning Rate: 1e-5 to 1e-3 (default: 3e-4)
- Batch Size: 4-64 (depends on GPU memory)
4. **Start Training**: Click "Start Training" and monitor progress
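For reference, the parameters above map onto a configuration like the following (field names are illustrative only, not the app's actual API):

```python
# Illustrative configuration mirroring the UI controls above.
training_config = {
    "base_model": "facebook/wav2vec2-base",
    "algorithm": "ppo",      # "ppo" (recommended) or "reinforce"
    "episodes": 20,          # 10-100; start with 20 for testing
    "learning_rate": 3e-4,   # 1e-5 to 1e-3
    "batch_size": 8,         # 4-64, limited by GPU memory
}
```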
### 2. Compare Results Tab
1. **Upload Audio**: Provide a test audio sample
2. **Generate Comparison**: Process through both models
3. **Listen**: Compare base vs trained model outputs
## Reward Functions
The training optimizes for three key metrics, combined into a single reward (a weighted-sum sketch follows the list):
- **Clarity** (33%): Audio signal quality and noise reduction
- **Naturalness** (33%): Natural speech patterns and prosody
- **Accuracy** (34%): Fidelity to original content
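A minimal sketch of that weighted combination (placeholder code, not the repo's actual reward implementation):

```python
# Combine per-metric scores (each in [0, 1]) into a single scalar reward.
WEIGHTS = {"clarity": 0.33, "naturalness": 0.33, "accuracy": 0.34}

def total_reward(scores: dict) -> float:
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Example: total_reward({"clarity": 0.8, "naturalness": 0.7, "accuracy": 0.9}) == 0.801
```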
## Hardware Requirements
- **CPU**: Works, but slow (5-10 min per episode)
- **GPU**: Recommended, T4 or better (1-2 min per episode)
- **Memory**: 8GB+ RAM, 4GB+ VRAM
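The device is selected at runtime; a minimal sketch with PyTorch (assuming PyTorch is installed, which the training stack requires):

```python
import torch

# Use the GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")
if device.type == "cuda":
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")  # e.g. Tesla T4
```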
## Technical Details
### RL Algorithms
**PPO (Proximal Policy Optimization)**
- More stable training
- Uses value function
- Better for most cases
- Slightly slower per episode
**REINFORCE**
- Simpler algorithm
- Higher variance
- Faster per episode
- May need more episodes
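For reference, the two policy-gradient objectives can be sketched in a few lines of PyTorch (generic textbook formulations, not this repo's exact implementation; `log_probs`, `old_log_probs`, `returns`, and `advantages` are assumed to be precomputed tensors):

```python
import torch

def reinforce_loss(log_probs, returns):
    # REINFORCE: scale log-probabilities by episode returns (high variance).
    return -(log_probs * returns).mean()

def ppo_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # PPO clipped surrogate: cap how far each update can move the policy.
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```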
### Training Process
1. Load pretrained base model
2. Add RL policy/value heads
3. Train using custom reward function
4. Save checkpoints periodically
5. Generate comparisons
## Local Development
Clone and run locally:
```bash
git clone https://huggingface.co/spaces/USERNAME/voice-model-rl-training
cd voice-model-rl-training
pip install -r requirements.txt
python app.py
```
## Repository Structure
```
voice-rl-training/
β”œβ”€β”€ app.py               # Main Gradio application
β”œβ”€β”€ requirements.txt     # Python dependencies
β”œβ”€β”€ README.md            # This file
β”œβ”€β”€ voice_rl/            # Core training modules
β”‚   β”œβ”€β”€ models/          # Model wrappers
β”‚   β”œβ”€β”€ rl/              # RL algorithms
β”‚   β”œβ”€β”€ training/        # Training orchestration
β”‚   β”œβ”€β”€ data/            # Data handling
β”‚   β”œβ”€β”€ monitoring/      # Metrics and visualization
β”‚   └── evaluation/      # Model evaluation
└── workspace/           # Training outputs (git-ignored)
```