---
title: Voice Model RL Training
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
python_version: 3.11
hardware: t4-small
---
# Voice Model RL Training

Train open-source voice models using reinforcement learning with the PPO and REINFORCE algorithms.
## Features

- 🎯 **Multiple RL Algorithms**: Choose between PPO and REINFORCE
- 🚀 **GPU Acceleration**: Automatic GPU detection and usage
- 📊 **Real-time Monitoring**: Track training progress in real time
- 🎵 **Model Comparison**: Compare base vs. trained models
- 💾 **Checkpoint Management**: Automatic model saving and loading
- 🤗 **Multiple Base Models**: Support for Wav2Vec2, WavLM, and more
## Supported Models

- Facebook Wav2Vec2 (Base & Large)
- Microsoft WavLM Base Plus
- Any compatible HuggingFace speech model
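
All of these load from the Hugging Face Hub. A minimal sketch with `transformers` (the checkpoint IDs below are the public ones and may differ from what `app.py` selects by default):

```python
# Minimal sketch: loading a base speech model from the Hugging Face Hub.
# The checkpoint IDs are examples; any compatible speech model works.
from transformers import AutoFeatureExtractor, AutoModel

MODEL_ID = "facebook/wav2vec2-base-960h"  # or "microsoft/wavlm-base-plus"

feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
```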
## How to Use

### 1. Training Tab

1. **Select Base Model**: Choose from the available pretrained models
2. **Configure Algorithm**: Select PPO (recommended) or REINFORCE
3. **Set Parameters** (a sketch of typical values follows these steps):
   - Episodes: 10-100 (start with 20 for testing)
   - Learning Rate: 1e-5 to 1e-3 (default: 3e-4)
   - Batch Size: 4-64 (depends on GPU memory)
4. **Start Training**: Click "Start Training" and monitor progress
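
As a rough guide, the parameters above map to something like the following; the field names are illustrative, not the app's actual configuration class:

```python
# Hypothetical training configuration mirroring the UI parameters above.
# Field names are illustrative and not taken from the actual app code.
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    base_model: str = "facebook/wav2vec2-base-960h"
    algorithm: str = "ppo"        # "ppo" or "reinforce"
    episodes: int = 20            # 10-100; 20 is a reasonable first run
    learning_rate: float = 3e-4   # 1e-5 to 1e-3
    batch_size: int = 8           # 4-64, bounded by GPU memory

config = TrainingConfig()
```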
### 2. Compare Results Tab

1. **Upload Audio**: Provide a test audio sample
2. **Generate Comparison**: Process the sample through both models
3. **Listen**: Compare the base and trained model outputs
## Reward Functions

Training optimizes a weighted combination of three metrics:

- **Clarity** (33%): Audio signal quality and noise reduction
- **Naturalness** (33%): Natural speech patterns and prosody
- **Accuracy** (34%): Fidelity to the original content
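
In code this works out to a simple weighted sum; a minimal sketch, assuming each component is already normalized to [0, 1] and using placeholder scores:

```python
# Weighted reward combining the three metrics above. The weights come from
# the README; the component scores here are placeholders, not real metrics.
def combined_reward(clarity: float, naturalness: float, accuracy: float) -> float:
    return 0.33 * clarity + 0.33 * naturalness + 0.34 * accuracy

print(combined_reward(clarity=0.8, naturalness=0.7, accuracy=0.9))  # 0.801
```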
## Hardware Requirements

- **CPU**: Works, but slowly (roughly 5-10 minutes per episode)
- **GPU**: Recommended; a T4 or better runs roughly 1-2 minutes per episode
- **Memory**: 8 GB+ RAM, 4 GB+ VRAM
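
Device selection is handled automatically; in PyTorch that usually looks like the snippet below (a sketch of the idea, not necessarily the exact logic in `app.py`):

```python
# Typical automatic device selection with PyTorch.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")
if device.type == "cuda":
    print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4"
```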
## Technical Details

### RL Algorithms

**PPO (Proximal Policy Optimization)**

- More stable training
- Uses a value function
- Better for most cases
- Slightly slower per episode
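
PPO's stability comes from its clipped surrogate objective; a generic sketch of the standard loss, not this repo's exact implementation:

```python
# Generic PPO clipped surrogate loss; advantages would come from the value head.
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```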
**REINFORCE**

- Simpler algorithm
- Higher variance
- Faster per episode
- May need more episodes
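
REINFORCE, by contrast, is a plain policy-gradient update; a generic sketch, with return normalization added to tame the variance noted above:

```python
# Generic REINFORCE policy-gradient loss; not this repo's exact implementation.
import torch

def reinforce_loss(log_probs, returns):
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction
    return -(log_probs * returns).mean()
```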
### Training Process

1. Load the pretrained base model
2. Add RL policy/value heads
3. Train using the custom reward function
4. Save checkpoints periodically
5. Generate comparisons
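
Put together, the pipeline is roughly the following; every helper name here is a hypothetical placeholder standing in for the real `voice_rl` modules rather than quoting them:

```python
# Hypothetical end-to-end sketch of the five steps above. All helper names
# (load_base_model, attach_rl_heads, ...) are illustrative placeholders.
def train(config):
    model = load_base_model(config.base_model)           # 1. pretrained backbone
    agent = attach_rl_heads(model, config.algorithm)     # 2. policy/value heads
    for episode in range(config.episodes):
        batch = sample_audio_batch(config.batch_size)
        agent.update(reward_fn(agent, batch))             # 3. custom reward
        if episode % 10 == 0:
            save_checkpoint(agent, f"workspace/ckpt_{episode}")  # 4. checkpoints
    generate_comparisons(agent)                           # 5. base vs. trained outputs
```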
## Local Development

Clone and run locally:

```bash
git clone https://huggingface.co/spaces/USERNAME/voice-model-rl-training
cd voice-model-rl-training
pip install -r requirements.txt
python app.py
```
## Repository Structure

```
voice-rl-training/
├── app.py               # Main Gradio application
├── requirements.txt     # Python dependencies
├── README.md            # This file
├── voice_rl/            # Core training modules
│   ├── models/          # Model wrappers
│   ├── rl/              # RL algorithms
│   ├── training/        # Training orchestration
│   ├── data/            # Data handling
│   ├── monitoring/      # Metrics and visualization
│   └── evaluation/      # Model evaluation
└── workspace/           # Training outputs (git-ignored)
```