---
title: Voice Model RL Training
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
python_version: 3.11
hardware: t4-small
---
# Voice Model RL Training
Train open-source voice models with reinforcement learning, using the PPO or REINFORCE algorithm.
## Features
- 🎯 **Multiple RL Algorithms**: Choose between PPO and REINFORCE
- 🚀 **GPU Acceleration**: Automatic GPU detection and usage
- 📊 **Real-time Monitoring**: Track training progress in real time
- 🎵 **Model Comparison**: Compare base vs. trained models
- 💾 **Checkpoint Management**: Automatic model saving and loading
- 🤖 **Multiple Base Models**: Support for Wav2Vec2, WavLM, and more
## Supported Models
- Facebook Wav2Vec2 (Base & Large)
- Microsoft WavLM Base Plus
- Any compatible HuggingFace speech model
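These are standard Hugging Face Hub checkpoints, so any of them can be loaded with the `transformers` library. A minimal sketch (the model IDs are the usual Hub identifiers; the app's own loading code may differ):

```python
# Minimal loading sketch using the standard transformers Auto classes.
from transformers import AutoFeatureExtractor, AutoModel

MODEL_ID = "facebook/wav2vec2-base-960h"  # or "microsoft/wavlm-base-plus"

feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
```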
## How to Use
### 1. Training Tab
1. **Select Base Model**: Choose from available pretrained models
2. **Configure Algorithm**: Select PPO (recommended) or REINFORCE
3. **Set Parameters** (a configuration sketch follows these steps):
- Episodes: 10-100 (start with 20 for testing)
- Learning Rate: 1e-5 to 1e-3 (default: 3e-4)
- Batch Size: 4-64 (depends on GPU memory)
4. **Start Training**: Click "Start Training" and monitor progress
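As a concrete illustration of the settings in step 3, here is a hypothetical configuration; the key names are illustrative, not the app's actual API:

```python
# Hypothetical training configuration mirroring the UI controls.
config = {
    "base_model": "facebook/wav2vec2-base-960h",
    "algorithm": "ppo",       # "ppo" (recommended) or "reinforce"
    "episodes": 20,           # 10-100; start with 20 for testing
    "learning_rate": 3e-4,    # sensible range: 1e-5 to 1e-3
    "batch_size": 8,          # 4-64, bounded by available GPU memory
}
```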
### 2. Compare Results Tab
1. **Upload Audio**: Provide a test audio sample
2. **Generate Comparison**: Process through both models
3. **Listen**: Compare base vs trained model outputs
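Outside the UI, feeding a test clip through a model looks roughly like this sketch (the file name is a placeholder, and `librosa` is an assumption; the Space's comparison pipeline itself is not shown in this README):

```python
import librosa
from transformers import AutoFeatureExtractor, AutoModel

MODEL_ID = "facebook/wav2vec2-base-960h"
feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

# Load the uploaded test clip at the 16 kHz rate these models expect.
audio, sr = librosa.load("test_sample.wav", sr=16000)
inputs = feature_extractor(audio, sampling_rate=sr, return_tensors="pt")
features = model(**inputs).last_hidden_state  # repeat with the trained checkpoint to compare
```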
## Reward Functions
The training optimizes a weighted combination of three metrics, sketched in code after this list:
- **Clarity** (33%): Audio signal quality and noise reduction
- **Naturalness** (33%): Natural speech patterns and prosody
- **Accuracy** (34%): Fidelity to original content
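A runnable sketch of how such a composite reward could look; the three component scorers below are hypothetical stubs standing in for the Space's real metrics:

```python
import numpy as np

# Hypothetical stand-ins for the real scorers; each returns a score in [0, 1].
def score_clarity(audio: np.ndarray) -> float:
    return 0.5  # placeholder: e.g. an SNR-based estimate

def score_naturalness(audio: np.ndarray) -> float:
    return 0.5  # placeholder: e.g. a prosody/naturalness model

def score_accuracy(audio: np.ndarray, reference: np.ndarray) -> float:
    return 0.5  # placeholder: e.g. content similarity to the reference

def composite_reward(audio: np.ndarray, reference: np.ndarray) -> float:
    """Weighted sum matching the 33/33/34 split above."""
    return (
        0.33 * score_clarity(audio)
        + 0.33 * score_naturalness(audio)
        + 0.34 * score_accuracy(audio, reference)
    )
```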
## Hardware Requirements
- **CPU**: Works but slow (5-10 min per episode)
- **GPU**: Recommended, T4 or better (1-2 min per episode)
- **Memory**: 8GB+ RAM, 4GB+ VRAM
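The automatic GPU detection mentioned above presumably follows the standard PyTorch pattern; for reference:

```python
import torch

# Fall back to CPU when no CUDA device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```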
## Technical Details
### RL Algorithms
**PPO (Proximal Policy Optimization)**
- More stable training
- Uses value function
- Better for most cases
- Slightly slower per episode
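For reference, PPO's clipped surrogate objective in a minimal PyTorch sketch (not the Space's exact code):

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current and the data-collecting policy.
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negate: optimizers minimize, while PPO maximizes the surrogate.
    return -torch.min(unclipped, clipped).mean()
```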
**REINFORCE**
- Simpler algorithm
- Higher variance
- Faster per episode
- May need more episodes
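REINFORCE, by contrast, reduces to the plain score-function estimator; subtracting a baseline (the optional argument below) is the usual way to tame the variance noted above:

```python
def reinforce_loss(log_probs, returns, baseline=0.0):
    # -E[log pi(a|s) * (G - b)]: higher-return actions get reinforced.
    # Expects torch tensors of matching shape.
    return -(log_probs * (returns - baseline)).mean()
```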
### Training Process
1. Load pretrained base model
2. Add RL policy/value heads
3. Train using custom reward function
4. Save checkpoints periodically
5. Generate comparisons
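A toy, self-contained skeleton of this loop, shown with a simple policy-gradient update plus a value baseline; modules, shapes, and hyperparameters are illustrative only, not the Space's implementation:

```python
import torch
import torch.nn as nn

encoder = nn.Linear(16, 32)     # stand-in for the pretrained base model (step 1)
policy_head = nn.Linear(32, 8)  # added RL policy head (step 2)
value_head = nn.Linear(32, 1)   # added RL value head (step 2)
params = list(policy_head.parameters()) + list(value_head.parameters())
optimizer = torch.optim.Adam(params, lr=3e-4)

for episode in range(20):
    features = encoder(torch.randn(4, 16))  # dummy batch of features
    dist = torch.distributions.Categorical(logits=policy_head(features))
    actions = dist.sample()
    rewards = torch.rand(4)                 # stand-in for the reward function (step 3)
    advantages = rewards - value_head(features).squeeze(-1)
    policy_loss = -(dist.log_prob(actions) * advantages.detach()).mean()
    value_loss = advantages.pow(2).mean()
    optimizer.zero_grad()
    (policy_loss + value_loss).backward()
    optimizer.step()
    if (episode + 1) % 10 == 0:             # step 4: periodic checkpoints
        torch.save(policy_head.state_dict(), f"checkpoint_{episode + 1}.pt")
```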
## Local Development
Clone and run locally:
```bash
git clone https://huggingface.co/spaces/USERNAME/voice-model-rl-training
cd voice-model-rl-training
pip install -r requirements.txt
python app.py
```
## Repository Structure
```
voice-rl-training/
├── app.py                    # Main Gradio application
├── requirements.txt          # Python dependencies
├── README.md                 # This file
├── voice_rl/                 # Core training modules
│   ├── models/               # Model wrappers
│   ├── rl/                   # RL algorithms
│   ├── training/             # Training orchestration
│   ├── data/                 # Data handling
│   ├── monitoring/           # Metrics and visualization
│   └── evaluation/           # Model evaluation
└── workspace/                # Training outputs (git-ignored)
```