---
title: Voice Model RL Training
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
python_version: 3.11
hardware: t4-small
---

Voice Model RL Training

Train open-source voice models using reinforcement learning with the PPO and REINFORCE algorithms.

Features

  • 🎯 Multiple RL Algorithms: Choose between PPO and REINFORCE
  • 🚀 GPU Acceleration: Automatic GPU detection and usage
  • 📊 Real-time Monitoring: Track training progress in real-time
  • 🎵 Model Comparison: Compare base vs trained models
  • 💾 Checkpoint Management: Automatic model saving and loading
  • 🎤 Multiple Base Models: Support for Wav2Vec2, WavLM, and more

Supported Models

  • Facebook Wav2Vec2 (Base & Large)
  • Microsoft WavLM Base Plus
  • Any compatible HuggingFace speech model
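
Any of these checkpoints can be loaded with the `transformers` Auto classes. A minimal sketch (the model IDs are the public Hub names; the repo's own wrapper in `voice_rl/models/` may differ):

```python
# Minimal sketch: load one of the supported base models from the HuggingFace Hub.
from transformers import AutoFeatureExtractor, AutoModel
import torch

model_id = "facebook/wav2vec2-base"  # or "facebook/wav2vec2-large", "microsoft/wavlm-base-plus"

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Run one second of silent 16 kHz audio through the encoder as a smoke test.
dummy_audio = [0.0] * 16000
inputs = feature_extractor(dummy_audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state
print(hidden_states.shape)  # (1, num_frames, hidden_size)
```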

How to Use

1. Training Tab

  1. Select Base Model: Choose from available pretrained models
  2. Configure Algorithm: Select PPO (recommended) or REINFORCE
  3. Set Parameters:
    • Episodes: 10-100 (start with 20 for testing)
    • Learning Rate: 1e-5 to 1e-3 (default: 3e-4)
    • Batch Size: 4-64 (depends on GPU memory)
  4. Start Training: Click "Start Training" and monitor progress
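
For reference, the UI settings above roughly correspond to a configuration like the following. This is a hypothetical sketch; the actual field names in `app.py` may differ:

```python
# Hypothetical training configuration mirroring the UI controls above.
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    base_model: str = "facebook/wav2vec2-base"
    algorithm: str = "ppo"        # "ppo" or "reinforce"
    episodes: int = 20            # 10-100; 20 is a reasonable smoke test
    learning_rate: float = 3e-4   # 1e-5 to 1e-3
    batch_size: int = 16          # 4-64, limited by GPU memory

config = TrainingConfig()
```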

2. Compare Results Tab

  1. Upload Audio: Provide a test audio sample
  2. Generate Comparison: Process through both models
  3. Listen: Compare base vs trained model outputs
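
Whatever the comparison pipeline does internally, both models expect 16 kHz input, so the uploaded sample is resampled first. A minimal sketch of that preprocessing step (the filename is hypothetical):

```python
# Load the uploaded test sample and resample to the 16 kHz rate that
# Wav2Vec2/WavLM expect before running either model.
import librosa

audio, sr = librosa.load("test_sample.wav", sr=16000)  # hypothetical filename
print(f"{len(audio) / sr:.2f} s of audio at {sr} Hz")
```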

Reward Functions

The training optimizes for three key metrics:

  • Clarity (33%): Audio signal quality and noise reduction
  • Naturalness (33%): Natural speech patterns and prosody
  • Accuracy (34%): Fidelity to original content
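
Assuming each metric is scored in [0, 1], the combined reward is simply the weighted sum above. An illustrative sketch (the real scoring functions live in `voice_rl/` and are not shown here):

```python
# Illustrative weighted reward; each metric is assumed to be scored in [0, 1].
def combined_reward(clarity: float, naturalness: float, accuracy: float) -> float:
    return 0.33 * clarity + 0.33 * naturalness + 0.34 * accuracy

print(combined_reward(clarity=0.8, naturalness=0.7, accuracy=0.9))  # 0.801
```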

Hardware Requirements

  • CPU: Works but slow (5-10 min per episode)
  • GPU: Recommended, T4 or better (1-2 min per episode)
  • Memory: 8GB+ RAM, 4GB+ VRAM
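
GPU detection follows the usual PyTorch pattern; a sketch of how automatic device selection typically works, not necessarily the exact logic in `app.py`:

```python
# Pick the GPU when one is available, otherwise fall back to CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU found, using CPU (expect ~5-10 min per episode)")
```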

Technical Details

RL Algorithms

PPO (Proximal Policy Optimization)

  • More stable training
  • Uses value function
  • Better for most cases
  • Slightly slower per episode
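
At its core, PPO combines a clipped surrogate policy loss with a value-function (critic) loss. A generic PyTorch sketch of those two terms, not the exact code in `voice_rl/rl/`:

```python
# Core PPO update terms: clipped surrogate policy loss plus a value loss.
import torch

def ppo_losses(new_log_probs, old_log_probs, advantages, values, returns, clip_eps=0.2):
    ratio = torch.exp(new_log_probs - old_log_probs)             # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()          # clipped surrogate
    value_loss = torch.nn.functional.mse_loss(values, returns)   # critic regression
    return policy_loss, value_loss
```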

REINFORCE

  • Simpler algorithm
  • Higher variance
  • Faster per episode
  • May need more episodes
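
REINFORCE skips the value function and weights log-probabilities directly by the return, optionally minus a baseline to reduce variance. Again a generic sketch, not the repo's exact implementation:

```python
# REINFORCE: push up log-probabilities of actions in proportion to the
# (baseline-subtracted) return.
import torch

def reinforce_loss(log_probs, returns):
    advantages = returns - returns.mean()   # simple mean baseline to reduce variance
    return -(log_probs * advantages).mean()
```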

Training Process

  1. Load pretrained base model
  2. Add RL policy/value heads
  3. Train using custom reward function
  4. Save checkpoints periodically
  5. Generate comparisons
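
Put together, the loop looks roughly like this. Object and method names here are illustrative placeholders, not the actual `voice_rl` API:

```python
# Rough shape of the training loop described above (illustrative only).
def train(agent, env, episodes=20, batch_size=16, checkpoint_every=10):
    for episode in range(episodes):
        batch = env.rollout(agent, batch_size=batch_size)     # collect audio episodes
        rewards = [env.reward(sample) for sample in batch]    # custom reward function
        agent.update(batch, rewards)                          # PPO or REINFORCE step
        if (episode + 1) % checkpoint_every == 0:
            agent.save_checkpoint(f"workspace/checkpoint_ep{episode + 1}")
        print(f"episode {episode + 1}: mean reward {sum(rewards) / len(rewards):.3f}")
```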

Local Development

Clone and run locally:

```bash
git clone https://huggingface.co/spaces/USERNAME/voice-model-rl-training
cd voice-model-rl-training
pip install -r requirements.txt
python app.py
```

Repository Structure

```text
voice-rl-training/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── voice_rl/              # Core training modules
│   ├── models/            # Model wrappers
│   ├── rl/                # RL algorithms
│   ├── training/          # Training orchestration
│   ├── data/              # Data handling
│   ├── monitoring/        # Metrics and visualization
│   └── evaluation/        # Model evaluation
└── workspace/             # Training outputs (git-ignored)
```