🗣️ Fine-Tuned SpeechT5 Model

This repository contains a fine-tuned version of SpeechT5 for text-to-speech (TTS) generation, trained on approximately 60 minutes of audio from a great voice I found on YouTube (it might be AI-generated).


🧠 Model Overview

The goal of this model is to replicate the tone, rhythm, and delivery style of Andrew Tate’s speeches using the SpeechT5 architecture.
It performs well for short speech synthesis tasks but still exhibits a slightly metallic sound due to limited training data.


⚙️ Training Configuration

Parameter        Value
Batch Size       8
Learning Rate    8e-5
Optimizer        AdamW
Scheduler        Linear
Training Steps   7000
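
To make the scheduler setting concrete, here is a minimal sketch of how a linear schedule would move the learning rate from 8e-5 down to zero over 7000 steps. The warmup length is not stated in this card, so `WARMUP_STEPS = 0` is an assumption, and `linear_lr` is a hypothetical helper written for illustration, not the training code that produced this model.

```python
# Hyperparameters taken from the training configuration table.
PEAK_LR = 8e-5
TOTAL_STEPS = 7000

# Assumption: the card does not state a warmup length; 0 is a placeholder.
WARMUP_STEPS = 0


def linear_lr(step: int,
              peak_lr: float = PEAK_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly so the rate reaches 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 1750, 3500, 5250, 7000):
        print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the rate is halved by step 3500 and reaches zero at step 7000, so the last few hundred steps make only very small updates.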

🗂️ Dataset

  • Duration: ~1 h 18 min of clean audio
  • Sampling Rate: 16 kHz
  • Format: WAV
  • Text Source: Manual transcriptions
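
When assembling a dataset in this format, it can help to confirm that every clip is really 16 kHz and to total the duration. The sketch below does that with Python's standard `wave` module; `wav_info` and `total_minutes` are hypothetical helper names, and the code assumes the clips are plain PCM WAV files.

```python
import wave

EXPECTED_RATE = 16_000  # 16 kHz, as listed in the dataset summary


def wav_info(path: str) -> tuple[int, float]:
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate, duration


def total_minutes(paths: list[str], expected_rate: int = EXPECTED_RATE) -> float:
    """Sum clip durations in minutes, rejecting clips at the wrong rate."""
    total_seconds = 0.0
    for path in paths:
        rate, duration = wav_info(path)
        if rate != expected_rate:
            raise ValueError(f"{path}: expected {expected_rate} Hz, got {rate} Hz")
        total_seconds += duration
    return total_seconds / 60
```

Calling `total_minutes` on the list of clip paths gives a quick figure for the amount of audio actually available for training.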

🎧 Results

  • The model produces clear and expressive speech aligned with Andrew Tate’s vocal tone.
  • Some metallic artifacts are still audible, likely due to the dataset size and limited training steps.
  • Further training and data augmentation could improve naturalness.

🚀 Recommendations for Improvement

  • Increase total training audio to 2–3 hours for better voice consistency.

🧩 Model Architecture

  • Base Model: microsoft/speecht5_tts
  • Fine-Tuning Framework: Hugging Face Transformers
  • Optimizer: AdamW

Installation

pip install txtai

Usage

from txtai.pipeline import TextToSpeech
from IPython.display import Audio

# Load the fine-tuned model
tts = TextToSpeech("bakhil-aissa/speecht5_stoic_voice")

# Generate speech
speech, rate = tts("Good morning, everyone. Today, I'd like to tell you a story about curiosity, the kind that pushes us to explore new ideas and challenge old limits.")

# Play audio
Audio(speech, rate=rate)
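
Outside a notebook, the generated audio can be written to disk instead of played inline. The following is a minimal sketch using only the standard library; `save_wav` is a hypothetical helper, and it assumes `speech` is a one-dimensional sequence of float samples in the range [-1.0, 1.0].

```python
import struct
import wave


def save_wav(samples, rate: int, path: str) -> None:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

With the variables from the usage example, this would be called as `save_wav(speech, rate, "output.wav")`.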

Features

  • Minimal training data (1.5 hours)
  • Natural voice synthesis
  • Easy to use with txtai pipeline
  • Hugging Face integration

Model

Model ID: bakhil-aissa/speecht5_stoic_voice

Available on Hugging Face.
