How to create new voice styles + voice cloning support?

by cybowolf - opened 18 days ago


Hi team,
I'm exploring Supertonic TTS and I saw the voice_styles/*.json files (example: F1.json). I want to create my own custom voice style for a new voice profile.

How can we generate a new voice-style JSON?

Is there a documented format or schema?

Are these styles tied to a specific embedding space inside the model?

Can external users add/register new styles without retraining the model?

Does the open-source Supertonic model support voice cloning?

If yes, what is the recommended dataset + process?

If not, is cloning only available in Supertone Play/API?

Will fine-tuning or speaker adaptation be supported in future releases?

I want to build custom voices and custom styles for my project, so any guidance or examples would be greatly appreciated.

Thanks!

anlgboy-cream

Supertone org 18 days ago

Thank you for your interest!
We're actively building a pipeline to let users incorporate their preferred voices into the open-source model, with the goal of releasing it before the end of the year.
Supertone Play/API offer high-quality voice cloning. However, they do not generate the JSON files required for use with the open-source model.
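
Until that pipeline is available, one way to approach the format question is to look at the structure of an existing style file such as voice_styles/F1.json. The following is only a minimal sketch, assuming the file is ordinary JSON; it does not rely on any official schema, and the helper names (describe, list_shape, describe_style_file) are hypothetical ones made up for illustration. It reports the keys and the shapes of any numeric arrays it finds, without interpreting what they mean.

```python
import json
from pathlib import Path


def list_shape(node):
    """Return the shape of a rectangular nested list of numbers, or None."""
    if isinstance(node, bool):
        # bool is a subclass of int in Python; do not treat it as numeric data.
        return None
    if isinstance(node, (int, float)):
        return ()
    if not isinstance(node, list) or not node:
        return None
    child_shapes = {list_shape(child) for child in node}
    if len(child_shapes) != 1:
        # Children disagree (ragged or mixed types), so this is not an array.
        return None
    (child_shape,) = child_shapes
    if child_shape is None:
        return None
    return (len(node),) + child_shape


def describe(node, depth=0, max_depth=4):
    """Summarize a parsed JSON value: keys, value types, and array shapes."""
    if isinstance(node, dict):
        if depth >= max_depth:
            return f"dict with {len(node)} keys"
        return {key: describe(value, depth + 1, max_depth)
                for key, value in node.items()}
    if isinstance(node, list):
        shape = list_shape(node)
        if shape is not None:
            return f"numeric array, shape {shape}"
        return f"list of {len(node)} items"
    return type(node).__name__


def describe_style_file(path):
    """Load a JSON file and return its structural summary."""
    with Path(path).open(encoding="utf-8") as handle:
        return describe(json.load(handle))
```

For example, `print(json.dumps(describe_style_file("voice_styles/F1.json"), indent=2))` would print the summary for the file mentioned in the original post. Knowing the array shapes does not by itself tell you how to produce a valid new style, since the values presumably have to come from the model's own embedding space.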

cybowolf

17 days ago

Thank you so much for the clarification!

That’s great to hear. A user-friendly pipeline for adding custom voices to the open-source model will be incredibly valuable, and I’m looking forward to the release later this year.
I appreciate the transparency and the progress you’re making. The open-source community will definitely benefit from this upcoming feature. Looking forward to updates!
