Spaces: OOI-FrontierTech/tts_mockingbird


khof312 committed on Oct 7, 2024

Commit 32515b5 (1 parent: 915d672)

Update the README about the different synthesizers.


Files changed (1)

app.py +11 -3


@@ -242,12 +242,19 @@ type=['wav'])
 with about:
     #st.header("How it works")
     st.markdown('''# Mockingbird TTS Demo
-This page is a demo of the openly available Text to Speech models for various languages of interest. Currently, 4 synthesizers are supported:
+This page is a demo of the openly available Text to Speech models for various languages of interest. Currently, 3 synthesizers with multilingual offerings are supported out of the box:
 - [**Meta's Massively Multilingual Speech (MMS)**](https://ai.meta.com/blog/multilingual-model-speech-recognition/) model, which supports over 1000 languages.[^1]
-- [**Coqui's TTS**](https://docs.coqui.ai/en/latest/#) package;[^2] while no longer supported, Coqui acted as a hub for TTS model hosting and these models are still available.
-- [**ESpeak-NG's**](https://github.com/espeak-ng/espeak-ng/tree/master)'s synthetic voices**[^3]
 - [**IMS Toucan**](https://github.com/DigitalPhonetics/IMS-Toucan), which supports 7000 languages.[^4]
+- [**ESpeak-NG's**](https://github.com/espeak-ng/espeak-ng/tree/master)'s synthetic voices**[^3]
+On a case-by-case basis, for different languages of interest, I have added:
+- [**Coqui's TTS**](https://docs.coqui.ai/en/latest/#) package;[^2] while no longer supported, Coqui acted as a hub for TTS model hosting and these models are still available. Languages must be added on a model-by-model basis.
+- Specific fine-tuned variants of Meta's MMS (either fine-tuned by [Yoach Lacombe](https://huggingface.co/ylacombe), or fine-tuned by me using his scripts).
+I am in the process of adding support for:
 - [**Piper**](https://github.com/rhasspy/piper), a TTS system that supports multiple voices per language and approximately 30 languages.[^5]
+- [**African Voices**](https://github.com/neulab/AfricanVoices), a CMU research project that fine-tuned synthesizers for different African languages. The site hosting the synthesizers is deprecated but they can be downloaded from Google's Wayback Machine. [^6]
 Voice conversion is currently achieved through Coqui.
@@ -268,6 +275,7 @@ Notes:
 [^3]: [Language list](https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md)
 [^4]: Language list is available in the Gradio API documentation [here](https://huggingface.co/spaces/Flux9665/MassivelyMultilingualTTS).
 [^5]: The list of available voices is [here](https://github.com/rhasspy/piper/blob/master/VOICES.md), model checkpoints are [here](https://huggingface.co/datasets/rhasspy/piper-checkpoints/tree/main), and they can be tested [here](https://rhasspy.github.io/piper-samples/).
+[^6]:
 ''')
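
This commit edits the synthesizer count in the intro sentence by hand (4 becomes 3) while regrouping the list into three support tiers. One possible way to keep the count and the list in sync is to hold the synthesizers as data and render the markdown from it. The sketch below is only an illustration and is not part of app.py: the `Synthesizer` class, the tier labels, and the `render_tier` and `intro_sentence` helpers are all hypothetical names, and only the synthesizer names and URLs are taken from the diff.

```python
from dataclasses import dataclass

# Hypothetical tier labels, paraphrasing the three groups in the new README text.
TIERS = ("out of the box", "case-by-case", "in progress")


@dataclass(frozen=True)
class Synthesizer:
    name: str
    url: str
    tier: str


# Names and URLs are copied from the diff; the data structure is an assumption.
SYNTHESIZERS = [
    Synthesizer("Meta's Massively Multilingual Speech (MMS)",
                "https://ai.meta.com/blog/multilingual-model-speech-recognition/",
                "out of the box"),
    Synthesizer("IMS Toucan",
                "https://github.com/DigitalPhonetics/IMS-Toucan",
                "out of the box"),
    Synthesizer("ESpeak-NG",
                "https://github.com/espeak-ng/espeak-ng/tree/master",
                "out of the box"),
    Synthesizer("Coqui's TTS",
                "https://docs.coqui.ai/en/latest/#",
                "case-by-case"),
    Synthesizer("Piper",
                "https://github.com/rhasspy/piper",
                "in progress"),
    Synthesizer("African Voices",
                "https://github.com/neulab/AfricanVoices",
                "in progress"),
]


def render_tier(tier: str) -> str:
    """Return a markdown bullet list of the synthesizers in one tier."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return "\n".join(
        f"- [**{s.name}**]({s.url})" for s in SYNTHESIZERS if s.tier == tier
    )


def intro_sentence() -> str:
    """Build the intro sentence with the count derived from the data."""
    count = sum(1 for s in SYNTHESIZERS if s.tier == "out of the box")
    return (f"Currently, {count} synthesizers with multilingual offerings "
            "are supported out of the box:")


if __name__ == "__main__":
    print(intro_sentence())
    print(render_tier("out of the box"))
```

With this layout, moving a synthesizer between tiers would only require changing its `tier` field; the rendered strings could then be concatenated and passed to `st.markdown`, as the existing code does with its literal string.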