🗣️ Fine-Tuned SpeechT5 Model

This repository contains a fine-tuned version of SpeechT5 for text-to-speech (TTS) generation, trained on approximately 60 minutes of audio from a voice found on YouTube (the source voice may itself be AI-generated).

🧠 Model Overview

The goal of this model is to replicate the tone, rhythm, and delivery style of Andrew Tate’s speeches using the SpeechT5 architecture.
It performs well for short speech synthesis tasks but still exhibits a slightly metallic sound due to limited training data.

⚙️ Training Configuration

Parameter	Value
Batch Size	8
Learning Rate	8e-5
Optimizer	AdamW
Scheduler	Linear
Training Steps	7000
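
The linear scheduler listed above lowers the learning rate from its peak toward zero over the run. The sketch below reproduces that shape in plain Python. It is an illustration, not the training code used for this model: `linear_lr` is a hypothetical helper, and because this card does not state a warmup length, `WARMUP_STEPS = 0` is an assumption.

```python
# Values taken from the training configuration table.
PEAK_LR = 8e-5
TOTAL_STEPS = 7000
BATCH_SIZE = 8

# Assumption: the card does not state a warmup length, so none is used here.
WARMUP_STEPS = 0


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Training examples processed over the run, ignoring any gradient accumulation.
examples_seen = TOTAL_STEPS * BATCH_SIZE

if __name__ == "__main__":
    for step in (0, 3500, 7000):
        print(step, linear_lr(step))
    print("examples seen:", examples_seen)
```

In Hugging Face Transformers, this shape corresponds to `lr_scheduler_type="linear"` in the training arguments.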

🗂️ Dataset

Duration: ~1h18min of clean audio
Sampling Rate: 16 kHz
Format: WAV
Text Source: Manual transcriptions
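
When preparing additional clips, it can help to confirm that each file matches the format listed above before training. The following is a minimal sketch using only the Python standard library. `wav_info` and `matches_dataset_format` are hypothetical helper names, and the `wave` module reads only PCM-encoded WAV files, so other encodings would need a different reader.

```python
import wave


def wav_info(path: str) -> tuple[int, int, float]:
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def matches_dataset_format(path: str, expected_rate: int = 16_000) -> bool:
    """True if the file is sampled at the expected rate (16 kHz by default)."""
    rate, _, _ = wav_info(path)
    return rate == expected_rate
```

Summing the durations returned by `wav_info` over all clips gives the total amount of audio in the dataset.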

🎧 Results

The model produces clear and expressive speech aligned with Andrew Tate’s vocal tone.
Some metallic artifacts are still audible, likely due to the dataset size and limited training steps.
Further training and data augmentation could improve naturalness.

🚀 Recommendations for Improvement

Increase total training audio to 2–3 hours for better voice consistency.

🧩 Model Architecture

Base Model: microsoft/speecht5_tts
Fine-Tuning Framework: Hugging Face Transformers
Optimizer: AdamW

Installation

pip install "txtai[pipeline]"

Usage

from txtai.pipeline import TextToSpeech
from IPython.display import Audio

# Load the fine-tuned model
tts = TextToSpeech("bakhil-aissa/speecht5_stoic_voice")

# Generate speech
speech, rate = tts("Good morning, everyone. Today, I'd like to tell you a story about curiosity, the kind that pushes us to explore new ideas and challenge old limits.")

# Play audio
Audio(speech, rate=rate)
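
Because the model is described as working best on short inputs, one option for longer passages is to split the text into sentence-sized chunks and synthesize each chunk separately. The sketch below shows only the splitting step in plain Python. `split_sentences` is a hypothetical helper, and the 200-character limit is an arbitrary assumption rather than a tested threshold for this model.

```python
import re


def split_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence-ending punctuation, then pack neighbouring
    sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    for sentence in sentences:
        # Append to the current chunk if the result still fits the limit.
        if chunks and len(chunks[-1]) + 1 + len(sentence) <= max_chars:
            chunks[-1] = f"{chunks[-1]} {sentence}"
        else:
            chunks.append(sentence)
    return chunks
```

Each chunk can then be passed to `tts` as in the usage example, and the resulting audio segments joined in order.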

Features

Minimal training data (1.5 hours)
Natural voice synthesis
Easy to use with txtai pipeline
Hugging Face integration

Model

Model ID: bakhil-aissa/speecht5_stoic_voice

Available on Hugging Face.

Format: Safetensors
Model size: 0.1B params
Tensor type: F32