How to create new voice styles + voice cloning support?
Hi team,
I'm exploring Supertonic TTS and I saw the voice_styles/*.json files (example: F1.json). I want to create my own custom voice style for a new voice profile.
How can we generate a new voice-style JSON?
Is there a documented format or schema?
Are these styles tied to a specific embedding space inside the model?
Can external users add/register new styles without retraining the model?
Does the open-source Supertonic model support voice cloning?
If yes, what is the recommended dataset + process?
If not, is cloning only available in Supertone Play/API?
Will fine-tuning or speaker adaptation be supported in future releases?
I want to build custom voices and custom styles for my project, so any guidance or examples would be greatly appreciated.
Thanks!
Thank you for your interest!
We're actively building a pipeline to let users incorporate their preferred voices into the open-source model, with the goal of releasing it before the end of the year.
Supertone Play/API offer high-quality voice cloning. However, they do not generate the JSON files required for use with the open-source model.
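Until that pipeline is out, one way to get a feel for the existing style files is to print their structure. The snippet below is only a minimal sketch: it assumes nothing about the schema beyond the files being valid JSON, and the `describe` helper is a hypothetical name introduced here, not part of Supertonic. It walks whatever is in the file and reports the key paths, the shape of any nested lists, and the leaf types.

```python
import json
from pathlib import Path


def describe(node, prefix="", out=None):
    """Collect 'path: summary' lines for a nested JSON value.

    Hypothetical helper for illustration; not part of Supertonic.
    """
    if out is None:
        out = []
    if isinstance(node, dict):
        # Recurse into objects, building dotted key paths.
        for key, value in node.items():
            describe(value, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(node, list):
        # Follow the first element at each level to estimate the shape.
        # This assumes nested lists are rectangular, which is not checked.
        shape, cur = [], node
        while isinstance(cur, list):
            shape.append(len(cur))
            if not cur:
                break
            cur = cur[0]
        leaf = "empty" if isinstance(cur, list) else type(cur).__name__
        out.append(f"{prefix}: list shape={shape} leaf={leaf}")
    else:
        out.append(f"{prefix}: {type(node).__name__} = {node!r}")
    return out


if __name__ == "__main__":
    # Path taken from the example mentioned in the question.
    path = Path("voice_styles/F1.json")
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        for line in describe(data):
            print(line)
```

This only shows how a file is laid out. It does not say how the values were produced or whether hand-edited values would give a usable voice.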
Thank you so much for the clarification!
That’s great to hear. A user-friendly pipeline for adding custom voices to the open-source model will be incredibly valuable, and I’m looking forward to the release later this year.
I appreciate the transparency and the progress you’re making. The open-source community will definitely benefit from this upcoming feature. Looking forward to updates!