---
title: Voice Model RL Training
emoji: πŸŽ™οΈ
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
python_version: 3.11
hardware: t4-small
---
# Voice Model RL Training
Train open-source voice models with reinforcement learning (RL), using the PPO and REINFORCE algorithms.
## Features
- 🎯 **Multiple RL Algorithms**: Choose between PPO and REINFORCE
- πŸš€ **GPU Acceleration**: Automatic GPU detection and usage
- πŸ“Š **Real-time Monitoring**: Track training progress in real-time
- 🎡 **Model Comparison**: Compare base vs trained models
- πŸ’Ύ **Checkpoint Management**: Automatic model saving and loading
- 🎀 **Multiple Base Models**: Support for Wav2Vec2, WavLM, and more
## Supported Models
- Facebook Wav2Vec2 (Base & Large)
- Microsoft WavLM Base Plus
- Any compatible HuggingFace speech model
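As a rough sketch, a base model can be loaded from the Hugging Face Hub with the `transformers` library (the exact loading code in `app.py` may differ; the model IDs shown are the public checkpoints):

```python
import torch
from transformers import AutoFeatureExtractor, AutoModel

model_name = "facebook/wav2vec2-base"  # or "microsoft/wavlm-base-plus"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode one second of silence at 16 kHz to check the pipeline end to end.
waveform = torch.zeros(16000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
hidden_states = model(**inputs).last_hidden_state  # (1, frames, hidden_dim)
```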
## How to Use
### 1. Training Tab
1. **Select Base Model**: Choose from available pretrained models
2. **Configure Algorithm**: Select PPO (recommended) or REINFORCE
3. **Set Parameters** (see the example configuration after this list):
- Episodes: 10-100 (start with 20 for testing)
- Learning Rate: 1e-5 to 1e-3 (default: 3e-4)
- Batch Size: 4-64 (depends on GPU memory)
4. **Start Training**: Click "Start Training" and monitor progress
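For reference, the parameters above map onto a configuration like the following (field names are illustrative only, not the app's actual API):

```python
# Illustrative configuration mirroring the UI controls above.
training_config = {
    "base_model": "facebook/wav2vec2-base",
    "algorithm": "ppo",      # "ppo" (recommended) or "reinforce"
    "episodes": 20,          # 10-100; start with 20 for testing
    "learning_rate": 3e-4,   # 1e-5 to 1e-3
    "batch_size": 8,         # 4-64, limited by GPU memory
}
```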
### 2. Compare Results Tab
1. **Upload Audio**: Provide a test audio sample
2. **Generate Comparison**: Process through both models
3. **Listen**: Compare base vs trained model outputs
## Reward Functions
The training optimizes for three key metrics, combined into a single reward (a weighted-sum sketch follows the list):
- **Clarity** (33%): Audio signal quality and noise reduction
- **Naturalness** (33%): Natural speech patterns and prosody
- **Accuracy** (34%): Fidelity to original content
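A minimal sketch of that weighted combination (placeholder code, not the repo's actual reward implementation):

```python
# Combine per-metric scores (each in [0, 1]) into a single scalar reward.
WEIGHTS = {"clarity": 0.33, "naturalness": 0.33, "accuracy": 0.34}

def total_reward(scores: dict) -> float:
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Example: total_reward({"clarity": 0.8, "naturalness": 0.7, "accuracy": 0.9}) == 0.801
```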
## Hardware Requirements
- **CPU**: Works, but slow (5-10 min per episode)
- **GPU**: Recommended, T4 or better (1-2 min per episode)
- **Memory**: 8GB+ RAM, 4GB+ VRAM
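The device is selected at runtime; a minimal sketch with PyTorch (assuming PyTorch is installed, which the training stack requires):

```python
import torch

# Use the GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")
if device.type == "cuda":
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")  # e.g. Tesla T4
```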
## Technical Details
### RL Algorithms
**PPO (Proximal Policy Optimization)**
- More stable training
- Uses value function
- Better for most cases
- Slightly slower per episode
**REINFORCE**
- Simpler algorithm
- Higher variance
- Faster per episode
- May need more episodes
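For reference, the two policy-gradient objectives can be sketched in a few lines of PyTorch (generic textbook formulations, not this repo's exact implementation; `log_probs`, `old_log_probs`, `returns`, and `advantages` are assumed to be precomputed tensors):

```python
import torch

def reinforce_loss(log_probs, returns):
    # REINFORCE: scale log-probabilities by episode returns (high variance).
    return -(log_probs * returns).mean()

def ppo_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # PPO clipped surrogate: cap how far each update can move the policy.
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```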
### Training Process
1. Load pretrained base model
2. Add RL policy/value heads
3. Train using custom reward function
4. Save checkpoints periodically
5. Generate comparisons
## Local Development
Clone and run locally:
```bash
git clone https://huggingface.co/spaces/USERNAME/voice-model-rl-training
cd voice-model-rl-training
pip install -r requirements.txt
python app.py
```
## Repository Structure
```
voice-rl-training/
β”œβ”€β”€ app.py               # Main Gradio application
β”œβ”€β”€ requirements.txt     # Python dependencies
β”œβ”€β”€ README.md            # This file
β”œβ”€β”€ voice_rl/            # Core training modules
β”‚   β”œβ”€β”€ models/          # Model wrappers
β”‚   β”œβ”€β”€ rl/              # RL algorithms
β”‚   β”œβ”€β”€ training/        # Training orchestration
β”‚   β”œβ”€β”€ data/            # Data handling
β”‚   β”œβ”€β”€ monitoring/      # Metrics and visualization
β”‚   └── evaluation/      # Model evaluation
└── workspace/           # Training outputs (git-ignored)
```