Space: IbrahimSalah/Arabic-TTS-Spark


ibrahimabdelaal committed 25 days ago

Commit 6d0fe97 (1 parent: 7307ef3)

Add torchaudio dependency and improve UI layout

Files changed (2)

app.py +53 -95
requirements.txt +1 -0

app.py

@@ -286,117 +286,75 @@ DEFAULT_REFERENCE_AUDIO = "reference.wav"
 # Create Gradio interface
 with gr.Blocks(title="Arabic TTS - Spark", theme=gr.themes.Soft()) as demo:
     gr.Markdown("""
-    # 🎙️ Arabic Text-to-Speech (Spark Model)
-    Generate high-quality Arabic speech from text using the Spark TTS model with voice cloning capabilities.
     **Model:** [IbrahimSalah/Arabic-TTS-Spark](https://huggingface.co/IbrahimSalah/Arabic-TTS-Spark)
-    ### ⚡ Quick Start:
-    1. Enter **diacritized Arabic text** to synthesize (تشكيل required)
-    2. Use the default reference audio or upload your own (5-30 seconds, clear speech)
-    3. Provide the **diacritized transcript** of your reference audio
-    4. Click "Generate Speech"
-    ### ⚠️ Important Notes:
-    - **Diacritized text (تشكيل) is required** for both input text and reference transcript
-    - You can use any LLM (GPT, Claude, Gemini) to add diacritics to your text
-    - Example prompt for LLM: "أضف التشكيل الكامل للنص التالي: [your text]"
-    - Default reference audio is provided for quick testing
-    ### 💡 Tips:
-    - Use high-quality reference audio with minimal background noise
-    - Reference audio should be 5-30 seconds long
-    - Longer texts are automatically split into chunks with smooth transitions
-    - First generation may take 30-60 seconds due to model loading
     """)
     with gr.Row():
-        with gr.Column():
             text_input = gr.Textbox(
-                label="📝 Text to Synthesize (Diacritized Arabic / نص عربي مُشكّل)",
-                placeholder="Enter diacritized Arabic text here... مثال: تُسَاهِمُ التِّقْنِيَّاتُ الْحَدِيثَةُ فِي تَسْهِيلِ حَيَاةِ الْإِنْسَانِ",
-                lines=5,
-                value=DEFAULT_TEXT,
-                info="⚠️ Text must include diacritics (تشكيل). Use GPT/Claude to add them."
-            )
-            gr.Markdown("**🎵 Reference Audio (Default Provided)**")
-            gr.Markdown("*Upload custom reference audio or use the default (WAV format, 5-30 seconds)*")
-            reference_audio = gr.Audio(
-                label="Reference Audio",
-                type="filepath",
-                value=DEFAULT_REFERENCE_AUDIO
             )
-            reference_transcript = gr.Textbox(
-                label="📄 Reference Transcript (Diacritized / نص مُشكّل)",
-                placeholder="Enter the diacritized transcript of your reference audio...",
-                lines=2,
-                value=DEFAULT_REFERENCE_TEXT,
-                info="⚠️ Must match the reference audio exactly with full diacritics"
-            )
             with gr.Accordion("⚙️ Advanced Settings", open=False):
-                temperature = gr.Slider(0.1, 1.5, value=0.8, step=0.1, label="Temperature",
-                                      info="Higher = more variation (0.6-1.0 recommended)")
-                top_p = gr.Slider(0.1, 1.0, value=0.95, step=0.05, label="Top P",
-                                info="Nucleus sampling threshold")
-                max_chunk = gr.Slider(100, 500, value=300, step=50, label="Max Chunk Length",
-                                    info="Characters per chunk for long texts")
-                crossfade = gr.Slider(0.01, 0.2, value=0.08, step=0.01, label="Crossfade Duration (s)",
-                                    info="Smooth transitions between chunks")
             generate_btn = gr.Button("🎤 Generate Speech", variant="primary", size="lg")
-        with gr.Column():
             output_audio = gr.Audio(label="🔊 Generated Speech", type="filepath")
-            status_text = gr.Textbox(label="Status", interactive=False, lines=3)
     # Examples
-    gr.Markdown("### 📚 Examples (All with Full Diacritics)")
-    gr.Examples(
-        examples=[
-            [DEFAULT_TEXT, DEFAULT_REFERENCE_AUDIO, DEFAULT_REFERENCE_TEXT],
-            ["السَّلَامُ عَلَيْكُمْ وَرَحْمَةُ اللَّهِ وَبَرَكَاتُهُ، كَيْفَ حَالُكَ الْيَوْمَ؟", DEFAULT_REFERENCE_AUDIO, DEFAULT_REFERENCE_TEXT],
-            ["الذَّكَاءُ الِاصْطِنَاعِيُّ يُغَيِّرُ الْعَالَمَ بِسُرْعَةٍ كَبِيرَةٍ وَيُسَاهِمُ فِي تَطْوِيرِ حُلُولٍ مُبْتَكَرَةٍ لِلْمُشْكِلَاتِ الْمُعَقَّدَةِ.", DEFAULT_REFERENCE_AUDIO, DEFAULT_REFERENCE_TEXT]
-        ],
-        inputs=[text_input, reference_audio, reference_transcript],
-        label="Click an example to try it out"
-    )
-    gr.Markdown("""
-    ### 📖 About
-    This Space uses the **Arabic-TTS-Spark** model for high-quality Arabic text-to-speech synthesis with voice cloning.
-    ### 🔧 How to Add Diacritics (التشكيل):
-    **Option 1: Use AI (Recommended)**
-    - Ask ChatGPT, Claude, or Gemini: "أضف التشكيل الكامل للنص التالي: [paste your text]"
-    - Or in English: "Add full Arabic diacritics to the following text: [paste your text]"
-    **Option 2: Online Tools**
-    - [Tashkeel Tool](https://tahadz.com/mishkal)
-    - [Harakat.ai](https://harakat.ai)
-    **Option 3: Microsoft Word**
-    - Type Arabic text → Select text → Review tab → Arabic Diacritics
-    ### 📊 Model Info
-    - **Architecture**: Transformer-based TTS with voice cloning
-    - **Sample Rate**: 24kHz
-    - **Languages**: Modern Standard Arabic (MSA) and dialects
-    - **Max Input**: Unlimited (automatic chunking)
-    ### 🔗 Links
-    - **Model Card**: [IbrahimSalah/Arabic-TTS-Spark](https://huggingface.co/IbrahimSalah/Arabic-TTS-Spark)
-    - **F5-TTS Arabic**: [IbrahimSalah/Arabic-F5-TTS-v2](https://huggingface.co/IbrahimSalah/Arabic-F5-TTS-v2)
-    - **Report Issues**: [Discussions](https://huggingface.co/IbrahimSalah/Arabic-TTS-Spark/discussions)
-    ---
-    Made with ❤️ by **Ibrahim Salah** | [HuggingFace Profile](https://huggingface.co/IbrahimSalah)
-    """)
     generate_btn.click(
         fn=generate_speech,
@@ -405,5 +363,5 @@ with gr.Blocks(title="Arabic TTS - Spark", theme=gr.themes.Soft()) as demo:
     )
 if __name__ == "__main__":
-    demo.queue(max_size=20)  # Enable queue for better handling
     demo.launch()

 # Create Gradio interface
 with gr.Blocks(title="Arabic TTS - Spark", theme=gr.themes.Soft()) as demo:
     gr.Markdown("""
+    # 🎙️ Arabic Text-to-Speech | Spark Model
+    High-quality Arabic TTS with voice cloning. **Diacritized text (تشكيل) required.**
     **Model:** [IbrahimSalah/Arabic-TTS-Spark](https://huggingface.co/IbrahimSalah/Arabic-TTS-Spark)
     """)
     with gr.Row():
+        with gr.Column(scale=1):
             text_input = gr.Textbox(
+                label="📝 Text to Synthesize (Arabic with Tashkeel)",
+                placeholder="أَدْخِلْ نَصًّا عَرَبِيًّا مُشَكَّلًا هُنَا...",
+                lines=6,
+                value=DEFAULT_TEXT
             )
+            with gr.Row():
+                with gr.Column():
+                    gr.Markdown("**🎵 Reference Audio**")
+                    reference_audio = gr.Audio(
+                        label="",
+                        type="filepath",
+                        value=DEFAULT_REFERENCE_AUDIO
+                    )
+                with gr.Column():
+                    reference_transcript = gr.Textbox(
+                        label="📄 Reference Transcript (with Tashkeel)",
+                        placeholder="النص المقابل للصوت المرجعي...",
+                        lines=4,
+                        value=DEFAULT_REFERENCE_TEXT
+                    )
             with gr.Accordion("⚙️ Advanced Settings", open=False):
+                with gr.Row():
+                    temperature = gr.Slider(0.1, 1.5, value=0.8, step=0.1, label="Temperature")
+                    top_p = gr.Slider(0.1, 1.0, value=0.95, step=0.05, label="Top P")
+                with gr.Row():
+                    max_chunk = gr.Slider(100, 500, value=300, step=50, label="Max Chunk Length")
+                    crossfade = gr.Slider(0.01, 0.2, value=0.08, step=0.01, label="Crossfade (s)")
             generate_btn = gr.Button("🎤 Generate Speech", variant="primary", size="lg")
+        with gr.Column(scale=1):
             output_audio = gr.Audio(label="🔊 Generated Speech", type="filepath")
+            status_text = gr.Textbox(label="Status", interactive=False, lines=2)
+            gr.Markdown("""
+            ### ℹ️ Requirements
+            - **Diacritized text is required** (تشكيل)
+            - Reference audio: 5-30 seconds, clear speech
+            - Use AI (ChatGPT/Claude) or [online tools](https://tahadz.com/mishkal) to add diacritics
+            ### 🔗 Resources
+            - [Model Card](https://huggingface.co/IbrahimSalah/Arabic-TTS-Spark)
+            - [F5-TTS Arabic](https://huggingface.co/IbrahimSalah/Arabic-F5-TTS-v2)
+            - [Report Issues](https://huggingface.co/IbrahimSalah/Arabic-TTS-Spark/discussions)
+            """)
     # Examples
+    with gr.Accordion("📚 Examples", open=False):
+        gr.Examples(
+            examples=[
+                [DEFAULT_TEXT, DEFAULT_REFERENCE_AUDIO, DEFAULT_REFERENCE_TEXT],
+                ["السَّلَامُ عَلَيْكُمْ وَرَحْمَةُ اللَّهِ وَبَرَكَاتُهُ، كَيْفَ حَالُكَ الْيَوْمَ؟", DEFAULT_REFERENCE_AUDIO, DEFAULT_REFERENCE_TEXT],
+                ["الذَّكَاءُ الِاصْطِنَاعِيُّ يُغَيِّرُ الْعَالَمَ بِسُرْعَةٍ كَبِيرَةٍ وَيُسَاهِمُ فِي تَطْوِيرِ حُلُولٍ مُبْتَكَرَةٍ.", DEFAULT_REFERENCE_AUDIO, DEFAULT_REFERENCE_TEXT]
+            ],
+            inputs=[text_input, reference_audio, reference_transcript]
+        )
     generate_btn.click(
         fn=generate_speech,
     )
 if __name__ == "__main__":
+    demo.queue(max_size=20)
     demo.launch()
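
Note on the "Max Chunk Length" and "Crossfade (s)" sliders kept by this commit: the removed help text says longer texts are split into chunks and joined with smooth transitions, but `generate_speech` itself is not part of this diff. The following is only a minimal sketch of how such a pipeline could work, not the Space's actual implementation. The helper names `split_text` and `crossfade_concat`, the punctuation-based splitting rule, and the linear fade are all assumptions. The defaults (300 characters, 0.08 s) come from the sliders, and the 24 kHz sample rate from the removed "Model Info" text.

```python
import re


def split_text(text: str, max_chars: int = 300) -> list[str]:
    """Greedily pack sentences/clauses into chunks of at most max_chars.

    Hypothetical helper: splits after Latin and Arabic punctuation.
    A single sentence longer than max_chars is kept whole.
    """
    pieces = [p.strip() for p in re.split(r"(?<=[.!?؟،؛])\s+", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def crossfade_concat(
    segments: list[list[float]],
    sample_rate: int = 24000,
    crossfade_s: float = 0.08,
) -> list[float]:
    """Join audio segments, blending each boundary with a linear crossfade.

    Hypothetical helper: samples are plain floats to keep the sketch
    dependency-free; real code would likely operate on NumPy arrays.
    """
    out: list[float] = []
    for seg in segments:
        # Overlap length in samples, limited by what is available on both sides.
        n = min(round(sample_rate * crossfade_s), len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming segment
            out[start + i] = out[start + i] * (1 - w) + seg[i] * w
        out.extend(seg[n:])
    return out


if __name__ == "__main__":
    print(split_text("aa. bb. cc.", max_chars=7))
    faded = crossfade_concat([[1.0] * 4, [0.0] * 4], sample_rate=10, crossfade_s=0.2)
    print(len(faded), faded)
```

In this sketch each boundary shortens the output by the overlap length, so a longer crossfade trades a smoother join for slightly shorter total audio.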

requirements.txt

@@ -1,5 +1,6 @@
 gradio==4.44.0
 torch==2.1.0
 transformers==4.46.2
 soundfile==0.12.1
 numpy==1.24.3

 gradio==4.44.0
 torch==2.1.0
+torchaudio==2.1.0
 transformers==4.46.2
 soundfile==0.12.1
 numpy==1.24.3
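
Note on the new pin: torchaudio builds are generally tied to one specific torch release, which is presumably why `torchaudio==2.1.0` is added alongside the existing `torch==2.1.0`. As a minimal sketch (the helper names `parse_pins` and `torch_audio_pins_match` are hypothetical, and only simple `name==version` lines are handled), a check like this could guard against the two pins drifting apart:

```python
def parse_pins(requirements_text: str) -> dict[str, str]:
    """Collect exact `name==version` pins, ignoring comments and other lines."""
    pins: dict[str, str] = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def torch_audio_pins_match(pins: dict[str, str]) -> bool:
    """True when torch and torchaudio are both pinned to the same version."""
    return "torch" in pins and pins.get("torch") == pins.get("torchaudio")


# Contents of requirements.txt after this commit.
REQUIREMENTS = """\
gradio==4.44.0
torch==2.1.0
torchaudio==2.1.0
transformers==4.46.2
soundfile==0.12.1
numpy==1.24.3
"""

if __name__ == "__main__":
    print(torch_audio_pins_match(parse_pins(REQUIREMENTS)))
```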