| title: Text To Speech With Pitch Controls | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: gray | |
| sdk: gradio | |
| sdk_version: 4.22.0 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| 1. Libraries and Tools Used: | |
| - Transformers: Provides `VitsModel` and `AutoTokenizer`, used here with the `facebook/mms-tts-eng` checkpoint, an English text-to-speech model from Facebook's Massively Multilingual Speech (MMS) project. | |
| - Torch: PyTorch, the tensor backend that Transformers uses to run the speech model. | |
| - Librosa: An audio-processing library, used here to adjust the pitch of the generated speech. | |
| - Soundfile: Writes the speech output to an audio file. | |
| - Tempfile: Python's standard-library module for temporary files, used for intermediate storage during processing. | |
| - Gradio: Facilitates the creation of a user-friendly web interface for the text-to-speech application. | |
| 2. Pipeline for Text-to-Speech Conversion: | |
| - Text Input: Type the text you want converted into speech. | |
| - Tokenization: `AutoTokenizer` converts the text into the token IDs the speech model expects. | |
| - Speech Synthesis: `VitsModel`, loaded with the `facebook/mms-tts-eng` checkpoint, takes the tokenized text and generates the speech waveform. | |
| - Pitch Adjustment: Librosa shifts the pitch of the generated speech according to the pitch value you select: | |
|   - Pitch value 0: The normal, unaltered pitch. This is the default, and the voice sounds exactly as the model generated it. | |
|   - Negative pitch values: The voice sounds higher, like moving up the notes on a piano. The result is often perceived as a more youthful or feminine tone. | |
|   - Positive pitch values: The voice sounds lower, like moving down the notes on a piano. The result is a deeper, more resonant tone, often perceived as more masculine or mature. | |
| - Saving Audio: The pitch-adjusted speech is written to a temporary audio file: `tempfile` creates the file and `soundfile` writes the audio data to it. | |
| - Interactive Web Interface: Gradio provides an interface where you input text, adjust the pitch using a slider, and listen to the speech output. | |
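
The pipeline above can be sketched roughly as follows. This is a minimal sketch, not the Space's actual `app.py`, which is not shown here. The function names (`slider_to_n_steps`, `semitone_ratio`, `load_tts`, `synthesize`, `build_demo`), the slider range of -12 to 12, and the decision to negate the slider value are all assumptions made for illustration. The negation is there because `librosa.effects.pitch_shift` raises the pitch for positive `n_steps`, while the description above says negative pitch values sound higher.

```python
import tempfile
from functools import lru_cache


def slider_to_n_steps(pitch_value: float) -> float:
    """Map the slider value to librosa's n_steps (in semitones).

    ASSUMPTION: librosa raises the pitch for positive n_steps, whereas the
    description above says negative slider values sound higher, so this
    sketch negates the value. The real app may handle this differently.
    """
    return -float(pitch_value)


def semitone_ratio(n_steps: float) -> float:
    """Frequency ratio for a shift of n_steps equal-tempered semitones."""
    return 2.0 ** (n_steps / 12.0)


@lru_cache(maxsize=1)
def load_tts():
    """Load the tokenizer and model once and reuse them across requests."""
    # Heavy imports live inside the functions so the helpers above
    # can be used without these libraries installed.
    from transformers import AutoTokenizer, VitsModel

    tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-eng")
    model = VitsModel.from_pretrained("facebook/mms-tts-eng")
    return tokenizer, model


def synthesize(text: str, pitch_value: float = 0.0) -> str:
    """Return the path of a WAV file containing the pitch-shifted speech."""
    import librosa
    import soundfile as sf
    import torch

    tokenizer, model = load_tts()

    # Tokenization: text -> tensors of token IDs.
    inputs = tokenizer(text, return_tensors="pt")

    # Speech synthesis: the model output has shape (batch, num_samples).
    with torch.no_grad():
        waveform = model(**inputs).waveform[0].numpy()

    # Pitch adjustment, in semitones.
    sr = model.config.sampling_rate
    shifted = librosa.effects.pitch_shift(
        waveform, sr=sr, n_steps=slider_to_n_steps(pitch_value)
    )

    # Saving audio: reserve a temporary file name, then write the samples.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        path = tmp.name
    sf.write(path, shifted, sr)
    return path


def build_demo():
    """Build the Gradio interface: text box and pitch slider in, audio out."""
    import gradio as gr

    return gr.Interface(
        fn=synthesize,
        inputs=[
            gr.Textbox(label="Text"),
            # The range of one octave either way is an assumption.
            gr.Slider(minimum=-12, maximum=12, value=0, step=1, label="Pitch"),
        ],
        outputs=gr.Audio(type="filepath", label="Speech"),
    )
```

Calling `build_demo().launch()` would start the web interface. A shift of 12 semitones is one octave, so `semitone_ratio(12)` doubles the frequency. Under the sign assumption above, a slider value of -12 maps to `n_steps=12`.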