Fine-Tuned SpeechT5 Model
This repository contains a fine-tuned version of SpeechT5 for text-to-speech (TTS) generation, trained on approximately 60 minutes of audio from a voice found on YouTube (the voice may itself be AI-generated).
Model Overview
The goal of this model is to replicate the tone, rhythm, and delivery style of Andrew Tate's speeches using the SpeechT5 architecture.
It performs well for short speech synthesis tasks but still exhibits a slightly metallic sound due to limited training data.
Training Configuration
| Parameter | Value |
|---|---|
| Batch Size | 8 |
| Learning Rate | 8e-5 |
| Optimizer | AdamW |
| Scheduler | Linear |
| Training Steps | 7000 |
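To make the schedule concrete, here is a minimal sketch of how a linear scheduler would decay the learning rate over the run. It assumes no warmup and a decay to zero at the final step, neither of which the table states. `linear_lr` is a hypothetical helper written for illustration, not part of any library, and the example count assumes no gradient accumulation.

```python
# Values taken from the training configuration table.
BATCH_SIZE = 8
PEAK_LR = 8e-5
TOTAL_STEPS = 7000


def linear_lr(step: int, peak_lr: float = PEAK_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step` under linear decay to zero.

    Assumption: no warmup phase (the table does not mention one).
    """
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / total_steps


# Total training examples processed, assuming no gradient accumulation.
examples_seen = BATCH_SIZE * TOTAL_STEPS

for step in (0, 3500, 7000):
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
print(f"examples seen: {examples_seen}")
```

Under these assumptions the rate is halved by the midpoint of training, so most of the adaptation to the new voice happens in the early steps.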
Dataset
- Duration: ~1 h 18 min of clean audio
- Sampling Rate: 16 kHz
- Format: WAV
- Text Source: Manual transcriptions
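The properties listed above can be checked before training. The following is a rough sketch, using only the Python standard library, that sums clip durations and flags files that are not sampled at 16 kHz. `summarize_wavs` and the folder layout (a flat directory of `.wav` files) are assumptions for illustration. The `wave` module reads uncompressed PCM WAV only.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16_000  # sampling rate listed in the dataset description


def summarize_wavs(folder: str) -> tuple[float, list[str]]:
    """Return (total duration in seconds, names of files not at EXPECTED_RATE)."""
    total_seconds = 0.0
    wrong_rate = []
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            total_seconds += wav.getnframes() / rate
            if rate != EXPECTED_RATE:
                wrong_rate.append(path.name)
    return total_seconds, wrong_rate


# Example call (hypothetical path):
# seconds, bad_files = summarize_wavs("data/wavs")
# print(f"{seconds / 60:.1f} minutes, {len(bad_files)} files need resampling")
```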
Results
- The model produces clear and expressive speech aligned with Andrew Tate's vocal tone.
- Some metallic artifacts are still audible, likely due to the dataset size and limited training steps.
- Further training and data augmentation could improve naturalness.
Recommendations for Improvement
- Increase total training audio to 2-3 hours for better voice consistency.
Model Architecture
- Base Model: microsoft/speecht5_tts
- Fine-Tuning Framework: Hugging Face Transformers
- Optimizer: AdamW
Installation
pip install txtai
Usage
from txtai.pipeline import TextToSpeech
from IPython.display import Audio
# Load the fine-tuned model
tts = TextToSpeech("bakhil-aissa/speecht5_stoic_voice")
# Generate speech
speech, rate = tts("Good morning, everyone. Today, I'd like to tell you a story about curiosity, the kind that pushes us to explore new ideas and challenge old limits.")
# Play audio
Audio(speech, rate=rate)
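Outside a notebook, the generated audio can be written to disk instead of played inline. This is a minimal sketch, assuming `speech` is a one-dimensional sequence of float samples in the range [-1, 1] and `rate` is an integer sample rate, as the usage snippet suggests. `save_wav` is a hypothetical helper built on the standard-library `wave` module, not a txtai function.

```python
import wave


def save_wav(samples, rate: int, path: str) -> None:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += int(clipped * 32767).to_bytes(2, "little", signed=True)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(int(rate))
        wav.writeframes(bytes(frames))


# Example call, using the variables from the usage snippet:
# save_wav(speech, rate, "output.wav")
```

The per-sample loop keeps the sketch dependency-free. For long clips, a vectorized conversion would be faster.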
Features
- Minimal training data (1.5 hours)
- Natural voice synthesis
- Easy to use with txtai pipeline
- Hugging Face integration
Model
Model ID: bakhil-aissa/speecht5_stoic_voice
Available on Hugging Face.