KiWA001 committed
Commit b88f56b · 1 Parent(s): 80d9d9d

Add SpeechMA TTS provider with 11Labs-compatible API


- Create speechma_tts_provider.py with Playwright automation
- Add 11Labs-compatible TTS endpoints (/v1/text-to-speech/*)
- Implement OCR-based CAPTCHA solving (Tesseract/EasyOCR)
- Support 20+ voices with Ava Multilingual as default
- Add voice effects (pitch, speed, volume) support
- Include test script and documentation

TTS_README.md ADDED
@@ -0,0 +1,322 @@
# SpeechMA TTS Provider - 11Labs Compatible API

This module adds text-to-speech capabilities to the KAI API using [SpeechMA](https://speechma.com) as the backend provider. The API is designed to be compatible with the ElevenLabs API structure.

## Features

- 🎙️ **20+ High-Quality Voices** (Ava, Andrew, Brian, Emma, and more)
- 🔐 **Automatic CAPTCHA Solving** with OCR
- 🌍 **Multilingual Support** (English, Spanish, French, German, Japanese, etc.)
- 📱 **11Labs API Compatible** - drop-in replacement for ElevenLabs
- 🎛️ **Voice Effects** (pitch, speed, volume control)

## Installation

### Required Dependencies

```bash
# Core dependencies (already in your project)
pip install fastapi playwright

# OCR dependencies (for CAPTCHA solving)
pip install pytesseract pillow

# OR use EasyOCR (alternative)
pip install easyocr

# Install Playwright browsers
playwright install chromium
```

### System Dependencies

For **pytesseract**, install Tesseract OCR:

**macOS:**
```bash
brew install tesseract
```

**Ubuntu/Debian:**
```bash
sudo apt-get install tesseract-ocr
```

**Windows:**
Download from: https://github.com/UB-Mannheim/tesseract/wiki

## API Endpoints

### 11Labs-Compatible Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/models` | GET | List available TTS models |
| `/v1/voices` | GET | List all voices |
| `/v1/voices/{voice_id}` | GET | Get voice details |
| `/v1/voices/{voice_id}/settings` | GET | Get voice settings |
| `/v1/text-to-speech/{voice_id}` | POST | Generate speech |
| `/v1/text-to-speech/{voice_id}/stream` | POST | Generate speech (streaming) |
| `/v1/user/subscription` | GET | Get subscription info |

### SpeechMA-Specific Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/tts/speechma` | POST | Direct SpeechMA TTS with custom options |
| `/v1/tts/speechma/voices` | GET | Get all SpeechMA voices |
| `/v1/tts/health` | GET | Check TTS service health |
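
Before generating audio it can help to confirm the service is reachable via `/v1/tts/health`. A minimal stdlib client sketch (the base URL and API key are placeholders; `auth_headers` and `get_json` are illustrative helpers, not part of the module):

```python
import json
import urllib.request


def auth_headers(api_key: str) -> dict:
    """Build the Bearer auth header used by every endpoint."""
    return {"Authorization": f"Bearer {api_key}"}


def get_json(base_url: str, path: str, api_key: str):
    """GET a JSON endpoint such as /v1/tts/health or /v1/voices."""
    req = urllib.request.Request(base_url + path, headers=auth_headers(api_key))
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())


# Example (requires a running server):
# get_json("http://localhost:8000", "/v1/tts/health", "your-api-key")
```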

## Usage Examples

### 1. List Available Voices

```bash
curl -X GET "http://localhost:8000/v1/voices" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

**Response:**
```json
{
  "voices": [
    {
      "voice_id": "ava",
      "name": "Ava Multilingual",
      "category": "premade",
      "labels": {
        "accent": "United States",
        "description": "Female Multilingual voice",
        "gender": "female"
      }
    }
  ]
}
```

### 2. Generate Speech (11Labs Style)

```bash
curl -X POST "http://localhost:8000/v1/text-to-speech/ava" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! This is a test of the SpeechMA TTS API.",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
      "stability": 0.5,
      "similarity_boost": 0.75
    }
  }' \
  --output speech.mp3
```

### 3. Generate Speech (SpeechMA Direct)

```bash
curl -X POST "http://localhost:8000/v1/tts/speechma" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello with custom voice effects!",
    "voice_id": "ava",
    "pitch": 0,
    "speed": 0,
    "volume": 100
  }' \
  --output speech_custom.mp3
```

### 4. Python Client Example

```python
import requests

# Configuration
API_KEY = "your-api-key"
BASE_URL = "http://localhost:8000"

# Generate speech
response = requests.post(
    f"{BASE_URL}/v1/text-to-speech/ava",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"text": "Hello, world!"}
)

# Save audio
with open("output.mp3", "wb") as f:
    f.write(response.content)

print("Audio saved!")
```

## Available Voices

### Default: Ava Multilingual

The default voice is **Ava Multilingual**, a high-quality female voice with multilingual capabilities.

### All Available Voices

| Voice ID | Name | Gender | Language | Country |
|----------|------|--------|----------|---------|
| `ava` | Ava Multilingual | Female | Multilingual | United States |
| `andrew` | Andrew Multilingual | Male | Multilingual | United States |
| `brian` | Brian Multilingual | Male | Multilingual | United States |
| `emma` | Emma Multilingual | Female | Multilingual | United Kingdom |
| `remy` | Remy Multilingual | Male | Multilingual | France |
| `vivienne` | Vivienne Multilingual | Female | Multilingual | United States |
| `daniel` | Daniel Multilingual | Male | Multilingual | United Kingdom |
| `serena` | Serena Multilingual | Female | Multilingual | United States |
| `matthew` | Matthew Multilingual | Male | Multilingual | United States |
| `jane` | Jane Multilingual | Female | Multilingual | United States |
| `alfonso` | Alfonso Multilingual | Male | Multilingual | Spain |
| `mario` | Mario Multilingual | Male | Multilingual | Italy |
| `klaus` | Klaus Multilingual | Male | Multilingual | Germany |
| `sakura` | Sakura Multilingual | Female | Multilingual | Japan |
| `xin` | Xin Multilingual | Female | Multilingual | China |
| `jose` | Jose Multilingual | Male | Multilingual | Brazil |
| `ines` | Ines Multilingual | Female | Multilingual | Portugal |
| `amira` | Amira Multilingual | Female | Multilingual | Saudi Arabia |
| `fatima` | Fatima Multilingual | Female | Multilingual | UAE |
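
The provider resolves a requested voice in three steps: exact ID match, then case-insensitive substring match against the display name, then the default (`ava`). A sketch of that lookup (the `VOICES` dict below is abbreviated for illustration):

```python
# Abbreviated voice map; the full table above lists all 19 voices
VOICES = {
    "ava": "Ava Multilingual",
    "emma": "Emma Multilingual",
    "klaus": "Klaus Multilingual",
}


def resolve_voice(voice_id: str, default: str = "ava") -> str:
    """Map a user-supplied voice_id to a known SpeechMA voice ID."""
    requested = voice_id.lower()
    if requested in VOICES:
        return requested              # exact ID match
    for vid, name in VOICES.items():
        if requested in name.lower():
            return vid                # substring match on display name
    return default                    # unknown voice falls back to Ava
```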

## Voice Effects (Direct API Only)

When using the `/v1/tts/speechma` endpoint, you can customize:

- **pitch**: Voice pitch adjustment (-10 to 10)
- **speed**: Speech speed adjustment (-10 to 10)
- **volume**: Volume percentage (0-200)

```json
{
  "text": "Custom voice settings",
  "voice_id": "ava",
  "pitch": 2,
  "speed": -1,
  "volume": 120
}
```
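
Out-of-range values are best clamped client-side before sending. A small helper sketch (hypothetical, not part of the API; the ranges come from the list above):

```python
def clamp_effects(pitch: int = 0, speed: int = 0, volume: int = 100) -> dict:
    """Clamp effect parameters to the ranges the endpoint accepts."""
    def clamp(value: int, lo: int, hi: int) -> int:
        return max(lo, min(hi, value))

    return {
        "pitch": clamp(pitch, -10, 10),   # pitch range: -10..10
        "speed": clamp(speed, -10, 10),   # speed range: -10..10
        "volume": clamp(volume, 0, 200),  # volume range: 0..200 percent
    }
```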

## CAPTCHA Handling

SpeechMA requires CAPTCHA verification. The provider automatically:

1. Extracts the CAPTCHA image from the page
2. Uses OCR (Tesseract or EasyOCR) to read the 5-digit code
3. Enters the code and submits
4. If OCR fails, refreshes the CAPTCHA and retries (up to 5 times)
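
The retry loop above reduces to this pattern (`solve` and `refresh` are injected callables here for illustration; the real implementation drives them through Playwright):

```python
from typing import Callable, Optional


def solve_with_retries(
    solve: Callable[[], Optional[str]],
    refresh: Callable[[], None],
    max_attempts: int = 5,
) -> Optional[str]:
    """Run OCR up to max_attempts times, refreshing the CAPTCHA between tries."""
    for attempt in range(max_attempts):
        code = solve()
        if code is not None and len(code) == 5 and code.isdigit():
            return code                # valid 5-digit code found
        if attempt < max_attempts - 1:
            refresh()                  # request a new CAPTCHA image for the next try
    return None
```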

### Manual CAPTCHA Solving (If OCR Fails)

If OCR consistently fails, you can:

1. Check the CAPTCHA image manually at https://speechma.com
2. Call the API with a pre-solved CAPTCHA (future enhancement)
3. Ensure Tesseract is properly installed

## Testing

Run the test suite:

```bash
python test_tts_api.py
```

This will test:
- ✅ Health check
- ✅ List voices and models
- ✅ Get voice details
- ✅ Generate audio samples
- ✅ Direct SpeechMA API

## Limitations

1. **Character Limit**: Maximum 2000 characters per request
2. **Rate Limits**: Depends on SpeechMA's server capacity
3. **CAPTCHA**: May occasionally fail if OCR can't read the image
4. **Audio Format**: Returns MP3 only (`output_format` is accepted for compatibility but ignored)
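
Longer inputs are silently truncated at 2000 characters, so long documents should be split client-side and sent as separate requests. A sketch that breaks on word boundaries (the 2000-character limit comes from the docs; `chunk_text` itself is illustrative and hard-truncates any single word longer than the limit):

```python
def chunk_text(text: str, limit: int = 2000) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking on spaces."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        word = word[:limit]  # hard-truncate any single word longer than the limit
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate       # word still fits in the current chunk
        else:
            chunks.append(current)    # flush the full chunk
            current = word
    if current:
        chunks.append(current)
    return chunks
```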

## Troubleshooting

### CAPTCHA Not Solving

1. **Install Tesseract OCR:**
   ```bash
   # macOS
   brew install tesseract

   # Ubuntu
   sudo apt-get install tesseract-ocr
   ```

2. **Try EasyOCR instead:**
   ```bash
   pip install easyocr
   ```

3. **Check browser automation:**
   ```bash
   playwright install chromium
   ```

### Audio Not Generating

1. Check that SpeechMA is accessible: `GET /v1/tts/health`
2. Check that Playwright browsers are installed: `playwright install`
3. Try refreshing the CAPTCHA manually on speechma.com

### Import Errors

```bash
# Install missing OCR libraries
pip install pytesseract pillow

# Or
pip install easyocr
```

## Architecture

```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ API Client  │────▶│  TTS Router  │────▶│  SpeechMA   │
│             │     │ (11Labs API) │     │  Provider   │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                │
                                         ┌──────▼──────┐
                                         │ Playwright  │
                                         │   Browser   │
                                         └──────┬──────┘
                                                │
                                         ┌──────▼──────┐
                                         │  OCR Utils  │
                                         │ (Tesseract/ │
                                         │  EasyOCR)   │
                                         └─────────────┘
```

## API Compatibility

This implementation aims to be compatible with ElevenLabs API v1:

- ✅ Text-to-Speech conversion
- ✅ Voice listing
- ✅ Voice details
- ✅ Model listing
- ✅ Subscription info (mock)
- ❌ Voice cloning (not supported by SpeechMA)
- ❌ Real-time streaming (returns the complete file)
- ❌ Pronunciation dictionaries (ignored)
- ⚠️ Voice settings (stored but not fully applied)

## Credits

- **SpeechMA**: https://speechma.com - free TTS service
- **ElevenLabs**: API structure inspiration
- **Tesseract OCR**: open-source OCR engine
- **EasyOCR**: alternative OCR library

## License

This code is part of the KAI API project. Follow your project's license terms.
deploy-microservice.sh CHANGED
File without changes
main.py CHANGED
@@ -54,6 +54,7 @@ from models import (
 from services import engine, search_engine
 from v1_router import router as v1_router
 from admin_router import router as admin_router
+from tts_router import router as tts_router

 # ---------- Logging ----------
 logging.basicConfig(
@@ -110,6 +111,7 @@ app.add_middleware(
 # Include OpenAI Router
 app.include_router(v1_router)
 app.include_router(admin_router)
+app.include_router(tts_router)


 # ---------- Admin Routes ----------
ocr_utils.py ADDED
@@ -0,0 +1,236 @@
"""
OCR Utilities for CAPTCHA Solving
---------------------------------
Helper functions to solve CAPTCHA images from SpeechMA.
"""

import io
import re
from typing import Optional


async def extract_digits_from_image(image_data: bytes, method: str = "auto") -> Optional[str]:
    """
    Extract 5-digit CAPTCHA code from image.

    Args:
        image_data: Raw image bytes
        method: OCR method to use - "tesseract", "easyocr", or "auto"

    Returns:
        5-digit code or None if extraction failed
    """

    if method == "auto":
        # Try tesseract first, then easyocr
        result = await _try_tesseract(image_data)
        if result:
            return result
        return await _try_easyocr(image_data)

    elif method == "tesseract":
        return await _try_tesseract(image_data)

    elif method == "easyocr":
        return await _try_easyocr(image_data)

    return None


async def _try_tesseract(image_data: bytes) -> Optional[str]:
    """Try extracting digits using pytesseract."""
    try:
        import pytesseract
        from PIL import Image, ImageEnhance, ImageFilter

        # Load image
        image = Image.open(io.BytesIO(image_data))

        # Preprocess for better OCR:
        # convert to grayscale
        image = image.convert('L')

        # Enhance contrast
        enhancer = ImageEnhance.Contrast(image)
        image = enhancer.enhance(2.0)

        # Denoise
        image = image.filter(ImageFilter.MedianFilter(size=3))

        # Binarize
        threshold = 128
        image = image.point(lambda x: 0 if x < threshold else 255, '1')

        # OCR config optimized for a single line of digits
        custom_config = r'--oem 3 --psm 7 -c tessedit_char_whitelist=0123456789'
        text = pytesseract.image_to_string(image, config=custom_config)

        # Extract exactly 5 digits
        digits = re.findall(r'\d', text)
        if len(digits) >= 5:
            return ''.join(digits[:5])

        return None

    except ImportError:
        return None
    except Exception as e:
        print(f"Tesseract OCR error: {e}")
        return None


async def _try_easyocr(image_data: bytes) -> Optional[str]:
    """Try extracting digits using EasyOCR."""
    try:
        import easyocr
        import tempfile
        import os

        # EasyOCR requires a file path, so save temporarily
        with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as tmp:
            tmp.write(image_data)
            tmp_path = tmp.name

        try:
            # Initialize reader (English only)
            reader = easyocr.Reader(['en'], gpu=False)

            # Read text
            results = reader.readtext(tmp_path)

            if results:
                # Take the first detected text region
                text = results[0][1]

                # Extract exactly 5 digits
                digits = re.findall(r'\d', text)
                if len(digits) >= 5:
                    return ''.join(digits[:5])

        finally:
            # Clean up temp file
            if os.path.exists(tmp_path):
                os.remove(tmp_path)

        return None

    except ImportError:
        return None
    except Exception as e:
        print(f"EasyOCR error: {e}")
        return None


def preprocess_captcha_image(image_data: bytes) -> bytes:
    """
    Preprocess CAPTCHA image for better OCR results.

    Args:
        image_data: Raw image bytes

    Returns:
        Preprocessed image bytes
    """
    try:
        from PIL import Image, ImageEnhance, ImageFilter

        # Load image
        image = Image.open(io.BytesIO(image_data))

        # Convert to grayscale
        image = image.convert('L')

        # Enhance contrast
        enhancer = ImageEnhance.Contrast(image)
        image = enhancer.enhance(2.0)

        # Sharpen
        image = image.filter(ImageFilter.SHARPEN)

        # Upscale 2x for better OCR
        width, height = image.size
        image = image.resize((width * 2, height * 2), Image.Resampling.LANCZOS)

        # Save to bytes
        output = io.BytesIO()
        image.save(output, format='PNG')
        return output.getvalue()

    except Exception as e:
        print(f"Image preprocessing error: {e}")
        return image_data


# Simple fallback digit recognition (very basic)
def simple_digit_recognition(image_data: bytes) -> Optional[str]:
    """
    Very simple fallback digit recognition.
    Not accurate, but doesn't require external OCR libraries.

    Args:
        image_data: Raw image bytes

    Returns:
        Guessed 5-digit code or None
    """
    try:
        from PIL import Image

        image = Image.open(io.BytesIO(image_data))
        image = image.convert('L')

        # Get image dimensions
        width, height = image.size

        # Simple heuristic: look for 5 vertical segments with high contrast.
        # This is a very naive approach and won't work well for complex CAPTCHAs.

        pixels = list(image.getdata())

        # Divide image into 5 equal vertical segments
        segment_width = width // 5
        digits = []

        for i in range(5):
            # Get center of each segment
            x = i * segment_width + segment_width // 2

            # Count dark pixels in this column
            dark_count = 0
            for y in range(height):
                idx = y * width + x
                if idx < len(pixels) and pixels[idx] < 128:
                    dark_count += 1

            # Simple classification based on darkness.
            # This is extremely basic and won't work reliably.
            darkness_ratio = dark_count / height

            # Guess digit based on darkness (very rough)
            if darkness_ratio < 0.1:
                digits.append('1')
            elif darkness_ratio < 0.2:
                digits.append('7')
            elif darkness_ratio < 0.3:
                digits.append('4')
            elif darkness_ratio < 0.4:
                digits.append('2')
            elif darkness_ratio < 0.5:
                digits.append('3')
            elif darkness_ratio < 0.6:
                digits.append('5')
            elif darkness_ratio < 0.7:
                digits.append('6')
            elif darkness_ratio < 0.8:
                digits.append('9')
            elif darkness_ratio < 0.9:
                digits.append('8')
            else:
                digits.append('0')

        return ''.join(digits)

    except Exception as e:
        print(f"Simple recognition error: {e}")
        return None
providers/speechma_tts_provider.py ADDED
@@ -0,0 +1,367 @@
1
+ """
2
+ SpeechMA TTS Provider
3
+ ---------------------
4
+ Uses Playwright to automate speechma.com TTS generation.
5
+ Handles CAPTCHA solving via OCR and voice selection.
6
+ """
7
+
8
+ import asyncio
9
+ import base64
10
+ import re
11
+ import time
12
+ from typing import Optional
13
+ from playwright.async_api import async_playwright, Page, ElementHandle
14
+ import io
15
+
16
+ try:
17
+ from PIL import Image
18
+ HAS_PIL = True
19
+ except ImportError:
20
+ HAS_PIL = False
21
+
22
+ from ocr_utils import extract_digits_from_image
23
+
24
+
25
+ # SpeechMA Voice IDs mapping to their display names
26
+ SPEECHMA_VOICES = {
27
+ "andrew": {"name": "Andrew Multilingual", "gender": "Male", "language": "Multilingual", "country": "United States"},
28
+ "ava": {"name": "Ava Multilingual", "gender": "Female", "language": "Multilingual", "country": "United States"},
29
+ "brian": {"name": "Brian Multilingual", "gender": "Male", "language": "Multilingual", "country": "United States"},
30
+ "emma": {"name": "Emma Multilingual", "gender": "Female", "language": "Multilingual", "country": "United Kingdom"},
31
+ "remy": {"name": "Remy Multilingual", "gender": "Male", "language": "Multilingual", "country": "France"},
32
+ "vivienne": {"name": "Vivienne Multilingual", "gender": "Female", "language": "Multilingual", "country": "United States"},
33
+ "daniel": {"name": "Daniel Multilingual", "gender": "Male", "language": "Multilingual", "country": "United Kingdom"},
34
+ "serena": {"name": "Serena Multilingual", "gender": "Female", "language": "Multilingual", "country": "United States"},
35
+ "matthew": {"name": "Matthew Multilingual", "gender": "Male", "language": "Multilingual", "country": "United States"},
36
+ "jane": {"name": "Jane Multilingual", "gender": "Female", "language": "Multilingual", "country": "United States"},
37
+ "alfonso": {"name": "Alfonso Multilingual", "gender": "Male", "language": "Multilingual", "country": "Spain"},
38
+ "mario": {"name": "Mario Multilingual", "gender": "Male", "language": "Multilingual", "country": "Italy"},
39
+ "klaus": {"name": "Klaus Multilingual", "gender": "Male", "language": "Multilingual", "country": "Germany"},
40
+ "sakura": {"name": "Sakura Multilingual", "gender": "Female", "language": "Multilingual", "country": "Japan"},
41
+ "xin": {"name": "Xin Multilingual", "gender": "Female", "language": "Multilingual", "country": "China"},
42
+ "jose": {"name": "Jose Multilingual", "gender": "Male", "language": "Multilingual", "country": "Brazil"},
43
+ "ines": {"name": "Ines Multilingual", "gender": "Female", "language": "Multilingual", "country": "Portugal"},
44
+ "amira": {"name": "Amira Multilingual", "gender": "Female", "language": "Multilingual", "country": "Saudi Arabia"},
45
+ "fatima": {"name": "Fatima Multilingual", "gender": "Female", "language": "Multilingual", "country": "UAE"},
46
+ }
47
+
48
+
49
+ class SpeechMATTSProvider:
50
+ """SpeechMA Text-to-Speech Provider using Playwright automation."""
51
+
52
+ def __init__(self):
53
+ self.base_url = "https://speechma.com"
54
+ self.default_voice = "ava"
55
+ self.browser = None
56
+ self.context = None
57
+
58
+ def get_voice_info(self, voice_id: str) -> Optional[dict]:
59
+ """Get voice information by voice_id."""
60
+ voice_id_lower = voice_id.lower()
61
+
62
+ # Try direct match first
63
+ if voice_id_lower in SPEECHMA_VOICES:
64
+ return {"voice_id": voice_id_lower, **SPEECHMA_VOICES[voice_id_lower]}
65
+
66
+ # Try to find by partial match in name
67
+ for vid, info in SPEECHMA_VOICES.items():
68
+ if voice_id_lower in info["name"].lower():
69
+ return {"voice_id": vid, **info}
70
+
71
+ # Return default if not found
72
+ return {"voice_id": self.default_voice, **SPEECHMA_VOICES[self.default_voice]}
73
+
74
+ def get_available_voices(self) -> list[dict]:
75
+ """Return all available voices."""
76
+ return [{"voice_id": vid, **info} for vid, info in SPEECHMA_VOICES.items()]
77
+
78
+ async def _extract_captcha_code(self, page: Page) -> Optional[str]:
79
+ """
80
+ Extract CAPTCHA code from the image using OCR.
81
+ Returns the 5-digit code or None if failed.
82
+ """
83
+ try:
84
+ # Find the CAPTCHA image element
85
+ captcha_img = await page.wait_for_selector('img[alt="captcha"], .captcha-image, [class*="captcha"] img', timeout=5000)
86
+ if not captcha_img:
87
+ return None
88
+
89
+ # Get the image src
90
+ src = await captcha_img.get_attribute('src')
91
+ if not src:
92
+ return None
93
+
94
+ # If it's a data URL, extract base64
95
+ if src.startswith('data:image'):
96
+ base64_data = src.split(',')[1]
97
+ image_data = base64.b64decode(base64_data)
98
+ else:
99
+ # Otherwise download it
100
+ import aiohttp
101
+ async with aiohttp.ClientSession() as session:
102
+ async with session.get(src) as response:
103
+ image_data = await response.read()
104
+
105
+ # Use OCR utilities to extract digits
106
+ code = await extract_digits_from_image(image_data, method="auto")
107
+ return code
108
+
109
+ except Exception as e:
110
+ print(f"CAPTCHA extraction error: {e}")
111
+ return None
112
+
113
+ async def _refresh_captcha(self, page: Page) -> bool:
114
+ """Click the refresh button to get a new CAPTCHA."""
115
+ try:
116
+ # Find and click refresh button
117
+ refresh_btn = await page.query_selector('button[onclick*="refreshCaptcha"], button.captcha-refresh, button:has-text("↻")')
118
+ if refresh_btn:
119
+ await refresh_btn.click()
120
+ await asyncio.sleep(1)
121
+ return True
122
+
123
+ # Try finding by icon/aria-label
124
+ refresh_btn = await page.query_selector('button[aria-label*="refresh"], button[title*="refresh"]')
125
+ if refresh_btn:
126
+ await refresh_btn.click()
127
+ await asyncio.sleep(1)
128
+ return True
129
+
130
+ except Exception as e:
131
+ print(f"CAPTCHA refresh error: {e}")
132
+ return False
133
+
134
+ async def _select_voice(self, page: Page, voice_id: str) -> bool:
135
+ """Select the specified voice."""
136
+ try:
137
+ voice_info = self.get_voice_info(voice_id)
138
+ voice_name = voice_info["name"]
139
+
140
+ # Wait for voice selection area to load
141
+ await page.wait_for_selector('[class*="voice"]', timeout=10000)
142
+
143
+ # Find the voice card by name
144
+ voice_selector = f'text={voice_name}'
145
+ voice_element = await page.query_selector(voice_selector)
146
+
147
+ if voice_element:
148
+ await voice_element.click()
149
+ await asyncio.sleep(0.5)
150
+ return True
151
+
152
+ # Try alternative selectors
153
+ voice_cards = await page.query_selector_all('[class*="voice-card"], [class*="voice-item"], div[class*="voice"]')
154
+ for card in voice_cards:
155
+ text = await card.inner_text()
156
+ if voice_name.lower() in text.lower():
157
+ await card.click()
158
+ await asyncio.sleep(0.5)
159
+ return True
160
+
161
+ return False
162
+
163
+ except Exception as e:
164
+ print(f"Voice selection error: {e}")
165
+ return False
166
+
167
+ async def _set_voice_effects(self, page: Page, pitch: int = 0, speed: int = 0, volume: int = 100) -> bool:
168
+ """Set voice effects (pitch, speed, volume)."""
169
+ try:
170
+ # Click Voice Effects button
171
+ effects_btn = await page.query_selector('button:has-text("Voice Effects"), [class*="voice-effects"]')
172
+ if effects_btn:
173
+ await effects_btn.click()
174
+ await asyncio.sleep(0.5)
175
+
176
+ # Set pitch if not 0
177
+ if pitch != 0:
178
+ pitch_input = await page.query_selector('input[placeholder*="pitch"], input[name*="pitch"], [class*="pitch"] input')
179
+ if pitch_input:
180
+ await pitch_input.fill(str(pitch))
181
+
182
+ # Set speed if not 0
183
+ if speed != 0:
184
+ speed_input = await page.query_selector('input[placeholder*="speed"], input[name*="speed"], [class*="speed"] input')
185
+ if speed_input:
186
+ await speed_input.fill(str(speed))
187
+
188
+ # Set volume
189
+ if volume != 100:
190
+ volume_input = await page.query_selector('input[placeholder*="volume"], input[name*="volume"], [class*="volume"] input')
191
+ if volume_input:
192
+ await volume_input.fill(str(volume))
193
+
194
+ return True
195
+
196
+ except Exception as e:
197
+ print(f"Voice effects error: {e}")
198
+ return False
199
+
200
+ async def generate_speech(
201
+ self,
202
+ text: str,
203
+ voice_id: str = "ava",
204
+ output_format: str = "mp3",
205
+ pitch: int = 0,
206
+ speed: int = 0,
207
+ volume: int = 100
208
+ ) -> Optional[bytes]:
209
+ """
210
+ Generate speech from text using SpeechMA.
211
+
212
+ Args:
213
+ text: Text to convert to speech (max 2000 chars)
214
+ voice_id: Voice ID to use
215
+ output_format: Output audio format
216
+ pitch: Voice pitch adjustment (-10 to 10)
217
+ speed: Speech speed adjustment (-10 to 10)
218
+ volume: Volume percentage (0-200)
219
+
220
+ Returns:
221
+ Audio data as bytes or None if failed
222
+ """
223
+ # Limit text length
224
+ if len(text) > 2000:
225
+ text = text[:2000]
226
+
227
+ async with async_playwright() as p:
228
+ browser = None
229
+ try:
230
+ # Launch browser
231
+ browser = await p.chromium.launch(
232
+ headless=True,
233
+ args=['--no-sandbox', '--disable-setuid-sandbox']
234
+ )
235
+
236
+ context = await browser.new_context(
237
+ viewport={'width': 1280, 'height': 800},
238
+ user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
239
+ )
240
+
241
+ page = await context.new_page()
242
+
243
+ # Navigate to SpeechMA
244
+ await page.goto(self.base_url, wait_until='networkidle', timeout=60000)
245
+ await asyncio.sleep(2) # Wait for page to fully load
246
+
247
+ # Enter text
248
+ text_area = await page.wait_for_selector('textarea[placeholder*="text"], textarea[name*="text"], #text-input', timeout=10000)
249
+ if not text_area:
250
+ raise Exception("Could not find text input area")
251
+
252
+ await text_area.fill(text)
253
+ await asyncio.sleep(0.5)
254
+
255
+ # Select voice
256
+ voice_selected = await self._select_voice(page, voice_id)
257
+ if not voice_selected:
258
+ print(f"Warning: Could not select voice {voice_id}, using default")
259
+
260
+ # Set voice effects if needed
261
+ if pitch != 0 or speed != 0 or volume != 100:
262
+ await self._set_voice_effects(page, pitch, speed, volume)
263
+
264
+ # Solve CAPTCHA
265
+ max_captcha_attempts = 5
266
+ captcha_solved = False
267
+
268
+ for attempt in range(max_captcha_attempts):
269
+ # Extract CAPTCHA code
270
+ captcha_code = await self._extract_captcha_code(page)
271
+
272
+                 if captcha_code and len(captcha_code) == 5:
+                     # Enter CAPTCHA
+                     captcha_input = await page.query_selector('input[placeholder*="captcha"], input[name*="captcha"], #captcha-input')
+                     if captcha_input:
+                         await captcha_input.fill(captcha_code)
+                         await asyncio.sleep(0.5)
+                         captcha_solved = True
+                         break
+ 
+                 # If CAPTCHA extraction failed, try refreshing
+                 if attempt < max_captcha_attempts - 1:
+                     refreshed = await self._refresh_captcha(page)
+                     if refreshed:
+                         await asyncio.sleep(2)  # Wait for new CAPTCHA
+                         continue
+                     else:
+                         # Try reloading the page
+                         await page.reload(wait_until='networkidle')
+                         await asyncio.sleep(2)
+                         # Re-enter text
+                         await text_area.fill(text)
+                         await asyncio.sleep(0.5)
+ 
+             if not captcha_solved:
+                 raise Exception("Could not solve CAPTCHA after multiple attempts")
+ 
+             # Click Generate Audio button
+             generate_btn = await page.wait_for_selector('button:has-text("Generate Audio"), button[type="submit"]', timeout=10000)
+             if not generate_btn:
+                 raise Exception("Could not find Generate Audio button")
+ 
+             # Set up download handler before clicking
+             download_future = asyncio.get_running_loop().create_future()
+ 
+             async def handle_download(download):
+                 try:
+                     path = await download.path()
+                     with open(path, 'rb') as f:
+                         data = f.read()
+                     download_future.set_result(data)
+                 except Exception as e:
+                     download_future.set_exception(e)
+ 
+             page.on('download', lambda d: asyncio.create_task(handle_download(d)))
+ 
+             # Click generate
+             await generate_btn.click()
+ 
+             # Wait for generation and download
+             try:
+                 audio_data = await asyncio.wait_for(download_future, timeout=60)
+                 return audio_data
+             except asyncio.TimeoutError:
+                 # Alternative: try to get audio from the on-page audio player element
+                 audio_element = await page.wait_for_selector('audio[src], source[type="audio/mp3"]', timeout=10000)
+                 if audio_element:
+                     audio_src = await audio_element.get_attribute('src')
+                     if audio_src:
+                         # Download audio from URL
+                         import aiohttp
+                         async with aiohttp.ClientSession() as session:
+                             async with session.get(audio_src) as response:
+                                 return await response.read()
+ 
+                 raise Exception("Audio generation timeout - download not detected")
+ 
+         except Exception as e:
+             print(f"SpeechMA generation error: {e}")
+             return None
+ 
+         finally:
+             if browser:
+                 await browser.close()
+ 
+     async def health_check(self) -> bool:
+         """Check if SpeechMA is accessible."""
+         try:
+             async with async_playwright() as p:
+                 browser = await p.chromium.launch(headless=True)
+                 page = await browser.new_page()
+                 await page.goto(self.base_url, timeout=30000)
+                 await browser.close()
+                 return True
+         except Exception:
+             return False
+ 
+ 
+ # Global provider instance
+ _speechma_provider = None
+ 
+ 
+ def get_speechma_provider() -> SpeechMATTSProvider:
+     """Get or create the SpeechMA provider singleton."""
+     global _speechma_provider
+     if _speechma_provider is None:
+         _speechma_provider = SpeechMATTSProvider()
+     return _speechma_provider
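The module-level accessor above lazily builds one provider per process, so the voice catalog and browser setup are reused across requests. A minimal standalone sketch of the same pattern (`ExpensiveProvider` is a hypothetical stand-in for `SpeechMATTSProvider`, not part of this diff):

```python
# Lazy module-level singleton: construct once, return the same object after.
class ExpensiveProvider:
    def __init__(self):
        # Imagine a costly setup step here (e.g. launching a browser).
        self.ready = True

_provider = None

def get_provider() -> ExpensiveProvider:
    """Create the provider on first call, then reuse the same instance."""
    global _provider
    if _provider is None:
        _provider = ExpensiveProvider()
    return _provider
```

Every caller then shares one instance, which is what makes `get_speechma_provider()` safe to call from each endpoint handler.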
test_tts_api.py ADDED
@@ -0,0 +1,280 @@
+ #!/usr/bin/env python3
+ """
+ Test Script for SpeechMA TTS API
+ --------------------------------
+ Example usage of the 11Labs-compatible TTS endpoints.
+ """
+ 
+ import requests
+ 
+ # Configuration
+ BASE_URL = "http://localhost:8000"  # Change to your API URL
+ API_KEY = "your-api-key-here"       # Your KAI API key
+ 
+ 
+ def test_list_voices():
+     """Test listing available voices."""
+     print("\n🎙️ Testing: List Voices")
+     print("-" * 50)
+ 
+     response = requests.get(
+         f"{BASE_URL}/v1/voices",
+         headers={"Authorization": f"Bearer {API_KEY}"}
+     )
+ 
+     if response.status_code == 200:
+         data = response.json()
+         print(f"✅ Found {len(data['voices'])} voices")
+ 
+         # Print first 5 voices
+         for voice in data['voices'][:5]:
+             print(f"  - {voice['voice_id']}: {voice['name']}")
+             if voice.get('labels'):
+                 print(f"    Gender: {voice['labels'].get('gender', 'N/A')}, "
+                       f"Accent: {voice['labels'].get('accent', 'N/A')}")
+     else:
+         print(f"❌ Failed: {response.status_code}")
+         print(response.text)
+ 
+ 
+ def test_list_models():
+     """Test listing TTS models."""
+     print("\n🤖 Testing: List Models")
+     print("-" * 50)
+ 
+     response = requests.get(
+         f"{BASE_URL}/v1/models",
+         headers={"Authorization": f"Bearer {API_KEY}"}
+     )
+ 
+     if response.status_code == 200:
+         data = response.json()
+         print(f"✅ Found {len(data['models'])} models")
+         for model in data['models']:
+             print(f"  - {model['model_id']}: {model['name']}")
+     else:
+         print(f"❌ Failed: {response.status_code}")
+         print(response.text)
+ 
+ 
+ def test_get_voice(voice_id: str = "ava"):
+     """Test getting a specific voice."""
+     print(f"\n🎭 Testing: Get Voice '{voice_id}'")
+     print("-" * 50)
+ 
+     response = requests.get(
+         f"{BASE_URL}/v1/voices/{voice_id}",
+         headers={"Authorization": f"Bearer {API_KEY}"}
+     )
+ 
+     if response.status_code == 200:
+         voice = response.json()
+         print(f"✅ Found voice: {voice['name']}")
+         print(f"   Category: {voice['category']}")
+         if voice.get('labels'):
+             print(f"   Labels: {voice['labels']}")
+     else:
+         print(f"❌ Failed: {response.status_code}")
+         print(response.text)
+ 
+ 
+ def test_text_to_speech(voice_id: str = "ava", text: str = "Hello, this is a test."):
+     """Test text-to-speech conversion."""
+     print(f"\n🔊 Testing: Text-to-Speech with '{voice_id}'")
+     print("-" * 50)
+     print(f"Text: {text}")
+ 
+     payload = {
+         "text": text,
+         "model_id": "eleven_multilingual_v2",
+         "voice_settings": {
+             "stability": 0.5,
+             "similarity_boost": 0.75
+         }
+     }
+ 
+     response = requests.post(
+         f"{BASE_URL}/v1/text-to-speech/{voice_id}",
+         headers={
+             "Authorization": f"Bearer {API_KEY}",
+             "Content-Type": "application/json"
+         },
+         json=payload
+     )
+ 
+     if response.status_code == 200:
+         # Save audio file
+         output_file = f"test_output_{voice_id}.mp3"
+         with open(output_file, "wb") as f:
+             f.write(response.content)
+ 
+         file_size = len(response.content)
+         print(f"✅ Success! Saved to {output_file}")
+         print(f"   File size: {file_size:,} bytes")
+ 
+         # Show headers
+         if 'X-Character-Count' in response.headers:
+             print(f"   Character count: {response.headers['X-Character-Count']}")
+         if 'Request-Id' in response.headers:
+             print(f"   Request ID: {response.headers['Request-Id']}")
+ 
+         return output_file
+     else:
+         print(f"❌ Failed: {response.status_code}")
+         print(response.text)
+         return None
+ 
+ 
+ def test_speechma_direct(text: str = "Hello from SpeechMA direct API.", voice_id: str = "ava"):
+     """Test the direct SpeechMA endpoint with more options."""
+     print("\n🎯 Testing: SpeechMA Direct API")
+     print("-" * 50)
+     print(f"Text: {text}")
+     print(f"Voice: {voice_id}")
+ 
+     payload = {
+         "text": text,
+         "voice_id": voice_id,
+         "pitch": 0,
+         "speed": 0,
+         "volume": 100
+     }
+ 
+     response = requests.post(
+         f"{BASE_URL}/v1/tts/speechma",
+         headers={
+             "Authorization": f"Bearer {API_KEY}",
+             "Content-Type": "application/json"
+         },
+         json=payload
+     )
+ 
+     if response.status_code == 200:
+         output_file = f"test_speechma_direct_{voice_id}.mp3"
+         with open(output_file, "wb") as f:
+             f.write(response.content)
+ 
+         file_size = len(response.content)
+         print(f"✅ Success! Saved to {output_file}")
+         print(f"   File size: {file_size:,} bytes")
+ 
+         if 'X-Voice-Used' in response.headers:
+             print(f"   Voice used: {response.headers['X-Voice-Used']}")
+ 
+         return output_file
+     else:
+         print(f"❌ Failed: {response.status_code}")
+         print(response.text)
+         return None
+ 
+ 
+ def test_speechma_voices():
+     """Test getting the SpeechMA-specific voice list."""
+     print("\n🎙️ Testing: SpeechMA Voices List")
+     print("-" * 50)
+ 
+     response = requests.get(
+         f"{BASE_URL}/v1/tts/speechma/voices",
+         headers={"Authorization": f"Bearer {API_KEY}"}
+     )
+ 
+     if response.status_code == 200:
+         data = response.json()
+         print(f"✅ Found {data['count']} voices")
+         print(f"   Default: {data['default_voice']}")
+ 
+         # Print the first 10 voices
+         print("\n   Available Voices:")
+         for voice in data['voices'][:10]:
+             print(f"   - {voice['voice_id']}: {voice['name']} ({voice['gender']}, {voice['country']})")
+ 
+         if data['count'] > 10:
+             print(f"   ... and {data['count'] - 10} more")
+     else:
+         print(f"❌ Failed: {response.status_code}")
+         print(response.text)
+ 
+ 
+ def test_health():
+     """Test TTS health check."""
+     print("\n🏥 Testing: Health Check")
+     print("-" * 50)
+ 
+     response = requests.get(f"{BASE_URL}/v1/tts/health")
+ 
+     if response.status_code == 200:
+         data = response.json()
+         print(f"✅ Status: {data['status']}")
+         print(f"   Provider: {data['provider']}")
+     else:
+         print(f"❌ Failed: {response.status_code}")
+         print(response.text)
+ 
+ 
+ def test_user_subscription():
+     """Test the user subscription endpoint."""
+     print("\n👤 Testing: User Subscription")
+     print("-" * 50)
+ 
+     response = requests.get(
+         f"{BASE_URL}/v1/user/subscription",
+         headers={"Authorization": f"Bearer {API_KEY}"}
+     )
+ 
+     if response.status_code == 200:
+         data = response.json()
+         print(f"✅ Tier: {data['tier']}")
+         print(f"   Character limit: {data['character_limit']:,}")
+         print(f"   Characters used: {data['character_count']:,}")
+     else:
+         print(f"❌ Failed: {response.status_code}")
+         print(response.text)
+ 
+ 
+ def main():
+     """Run all tests."""
+     print("\n" + "=" * 60)
+     print("🧪 SpeechMA TTS API Test Suite")
+     print("=" * 60)
+     print(f"Base URL: {BASE_URL}")
+ 
+     # Health check first
+     test_health()
+ 
+     # List resources
+     test_list_models()
+     test_list_voices()
+     test_speechma_voices()
+ 
+     # Get specific voices
+     test_get_voice("ava")
+     test_get_voice("andrew")
+ 
+     # User info
+     test_user_subscription()
+ 
+     # TTS generation (comment out if you don't want to generate audio)
+     print("\n" + "=" * 60)
+     print("🎵 Generating Audio Samples...")
+     print("=" * 60)
+ 
+     # Test different voices
+     test_text_to_speech("ava", "Hello! I am Ava, a multilingual voice.")
+     test_text_to_speech("andrew", "Greetings! I am Andrew, ready to help you.")
+     test_text_to_speech("emma", "Hi there! I'm Emma with a British accent.")
+ 
+     # Test direct API with effects
+     test_speechma_direct(
+         "This is a test with custom voice settings.",
+         "brian"
+     )
+ 
+     print("\n" + "=" * 60)
+     print("✅ All tests completed!")
+     print("=" * 60)
+ 
+ 
+ if __name__ == "__main__":
+     main()
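The router rejects empty text and anything over 2,000 characters, so a client can mirror that check locally and skip the round trip for input the API would refuse anyway. A minimal sketch of such pre-validation (the helper name is ours; the limit matches the one enforced server-side):

```python
MAX_TTS_CHARS = 2000  # matches the limit enforced by the TTS router

def validate_tts_text(text: str) -> str:
    """Raise ValueError for input the TTS API would reject anyway."""
    if not text:
        raise ValueError("Text is required")
    if len(text) > MAX_TTS_CHARS:
        raise ValueError(f"Text exceeds {MAX_TTS_CHARS} character limit")
    return text
```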
tts_router.py ADDED
@@ -0,0 +1,500 @@
+ """
2
+ TTS Router - 11Labs Compatible API
3
+ ----------------------------------
4
+ Text-to-Speech endpoints compatible with ElevenLabs API structure.
5
+ Uses SpeechMA as the backend provider.
6
+ """
7
+
8
+ from fastapi import APIRouter, Depends, HTTPException, Header, Request, Response
9
+ from fastapi.responses import StreamingResponse, JSONResponse
10
+ from pydantic import BaseModel, Field
11
+ from typing import List, Optional, Dict, Any, Literal
12
+ import time
13
+ import uuid
14
+ import json
15
+
16
+ from auth import verify_api_key
17
+ from providers.speechma_tts_provider import get_speechma_provider
18
+
19
+ router = APIRouter()
20
+
21
+
22
+ # --- Pydantic Models (11Labs Compatible) ---
23
+
24
+ class VoiceSettings(BaseModel):
25
+ """Voice settings for TTS."""
26
+ stability: float = Field(default=0.5, ge=0.0, le=1.0, description="Voice stability")
27
+ similarity_boost: float = Field(default=0.75, ge=0.0, le=1.0, description="Similarity boost")
28
+ style: float = Field(default=0.0, ge=0.0, le=1.0, description="Style exaggeration")
29
+ use_speaker_boost: bool = Field(default=True, description="Use speaker boost")
30
+
31
+
32
+ class TextToSpeechRequest(BaseModel):
33
+ """11Labs-compatible TTS request."""
34
+ text: str = Field(..., max_length=2000, description="Text to convert to speech")
35
+ model_id: Optional[str] = Field("eleven_multilingual_v2", description="Model ID (ignored, uses SpeechMA)")
36
+ voice_settings: Optional[VoiceSettings] = Field(None, description="Voice settings")
37
+ pronunciation_dictionary_locators: Optional[List[Dict[str, str]]] = None
38
+ seed: Optional[int] = None
39
+ previous_text: Optional[str] = None
40
+ language_code: Optional[str] = None
41
+
42
+ # SpeechMA-specific fields
43
+ voice_id: Optional[str] = Field("ava", description="Voice ID to use")
44
+ output_format: Optional[str] = Field("mp3_44100_128", description="Output format")
45
+ optimize_streaming_latency: Optional[int] = Field(0, ge=0, le=4)
46
+
47
+
48
+ class VoiceResponse(BaseModel):
49
+ """Voice information response."""
50
+ voice_id: str
51
+ name: str
52
+ samples: Optional[List[Dict[str, Any]]] = None
53
+ category: str = "premade"
54
+ fine_tuning: Optional[Dict[str, Any]] = None
55
+ labels: Optional[Dict[str, str]] = None
56
+ description: Optional[str] = None
57
+ preview_url: Optional[str] = None
58
+ available_for_tiers: List[str] = ["free", "starter", "creator", "enterprise"]
59
+ settings: Optional[VoiceSettings] = None
60
+ sharing: Optional[Dict[str, Any]] = None
61
+ high_quality_base_model_ids: Optional[List[str]] = None
62
+ safety_control: Optional[str] = None
63
+ voice_verification: Optional[Dict[str, Any]] = None
64
+ permission_on_resource: Optional[str] = None
65
+ is_legacy: bool = False
66
+ is_mixed: bool = False
67
+
68
+
69
+ class VoicesListResponse(BaseModel):
70
+ """List of voices response."""
71
+ voices: List[VoiceResponse]
72
+
73
+
74
+ class TTSModelInfo(BaseModel):
75
+ """TTS model information."""
76
+ model_id: str
77
+ name: str
78
+ description: str
79
+ can_do_text_to_speech: bool = True
80
+ can_do_voice_conversion: bool = False
81
+ can_use_style: bool = True
82
+ can_use_speaker_boost: bool = True
83
+ serves_pro_voices: bool = True
84
+ serves_v2_models: bool = True
85
+ token_cost_factor: float = 1.0
86
+ requires_alpha_access: bool = False
87
+ max_characters_request_free_user: int = 2000
88
+ max_characters_request_subscribed_user: int = 2000
89
+ languages: List[Dict[str, str]]
90
+
91
+
92
+ class TTSModelsResponse(BaseModel):
93
+ """TTS models list response."""
94
+ models: List[TTSModelInfo]
95
+
96
+
97
+ class UserSubscriptionResponse(BaseModel):
98
+ """User subscription info (mock for compatibility)."""
99
+ tier: str = "free"
100
+ character_count: int = 0
101
+ character_limit: int = 1000000
102
+ can_extend_character_limit: bool = True
103
+ allowed_to_extend_character_limit: bool = True
104
+ next_character_count_reset_unix: int = 0
105
+ voice_slots_used: int = 1
106
+ voice_slots_available: int = 100
107
+ professional_voice_slots_used: int = 0
108
+ professional_voice_slots_available: int = 5
109
+ can_use_delayed_payment_methods: bool = False
110
+ can_use_instant_voice_cloning: bool = True
111
+ can_use_professional_voice_cloning: bool = False
112
+ currency: Dict[str, Any] = {"usd": "USD"}
113
+ status: str = "active"
114
+ has_open_invoices: bool = False
115
+
116
+
117
+ # --- Helper Functions ---
118
+
119
+ def format_voice_to_11labs(voice_id: str, voice_info: dict) -> VoiceResponse:
120
+ """Convert SpeechMA voice to 11Labs format."""
121
+ return VoiceResponse(
122
+ voice_id=voice_id,
123
+ name=voice_info["name"],
124
+ category="premade",
125
+ labels={
126
+ "accent": voice_info.get("country", "Multilingual"),
127
+ "description": f"{voice_info['gender']} {voice_info['language']} voice",
128
+ "age": "adult",
129
+ "gender": voice_info["gender"].lower(),
130
+ "use_case": "general"
131
+ },
132
+ description=f"{voice_info['gender']} {voice_info['language']} voice from {voice_info.get('country', 'Unknown')}",
133
+ settings=VoiceSettings()
134
+ )
135
+
136
+
137
+ # --- Endpoints ---
138
+
139
+ @router.get("/v1/user/subscription", response_model=UserSubscriptionResponse)
140
+ async def get_user_subscription(
141
+ key_data: dict = Depends(verify_api_key)
142
+ ):
143
+ """
144
+ Get user subscription information.
145
+ Mock endpoint for 11Labs compatibility.
146
+ """
147
+ return UserSubscriptionResponse(
148
+ tier="free",
149
+ character_count=0,
150
+ character_limit=1000000,
151
+ next_character_count_reset_unix=int(time.time()) + 86400 * 30
152
+ )
153
+
154
+
155
+ @router.get("/v1/models", response_model=TTSModelsResponse)
156
+ async def list_tts_models(
157
+ key_data: dict = Depends(verify_api_key)
158
+ ):
159
+ """
160
+ List available TTS models.
161
+ """
162
+ models = [
163
+ TTSModelInfo(
164
+ model_id="eleven_multilingual_v2",
165
+ name="Eleven Multilingual v2",
166
+ description="Our most advanced multilingual model with highest quality",
167
+ can_do_text_to_speech=True,
168
+ can_do_voice_conversion=False,
169
+ can_use_style=True,
170
+ can_use_speaker_boost=True,
171
+ serves_pro_voices=True,
172
+ serves_v2_models=True,
173
+ token_cost_factor=1.0,
174
+ requires_alpha_access=False,
175
+ max_characters_request_free_user=2000,
176
+ max_characters_request_subscribed_user=2000,
177
+ languages=[
178
+ {"language_id": "en", "name": "English"},
179
+ {"language_id": "es", "name": "Spanish"},
180
+ {"language_id": "fr", "name": "French"},
181
+ {"language_id": "de", "name": "German"},
182
+ {"language_id": "it", "name": "Italian"},
183
+ {"language_id": "pt", "name": "Portuguese"},
184
+ {"language_id": "ja", "name": "Japanese"},
185
+ {"language_id": "zh", "name": "Chinese"},
186
+ {"language_id": "ar", "name": "Arabic"},
187
+ {"language_id": "hi", "name": "Hindi"},
188
+ ]
189
+ ),
190
+ TTSModelInfo(
191
+ model_id="eleven_flash_v2_5",
192
+ name="Eleven Flash v2.5",
193
+ description="Ultra-low latency model (~75ms)",
194
+ can_do_text_to_speech=True,
195
+ can_do_voice_conversion=False,
196
+ can_use_style=False,
197
+ can_use_speaker_boost=True,
198
+ serves_pro_voices=True,
199
+ serves_v2_models=True,
200
+ token_cost_factor=0.5,
201
+ requires_alpha_access=False,
202
+ max_characters_request_free_user=2000,
203
+ max_characters_request_subscribed_user=2000,
204
+ languages=[
205
+ {"language_id": "en", "name": "English"},
206
+ {"language_id": "es", "name": "Spanish"},
207
+ {"language_id": "fr", "name": "French"},
208
+ ]
209
+ )
210
+ ]
211
+
212
+ return TTSModelsResponse(models=models)
213
+
214
+
215
+ @router.get("/v1/voices", response_model=VoicesListResponse)
216
+ async def list_voices(
217
+ key_data: dict = Depends(verify_api_key)
218
+ ):
219
+ """
220
+ List all available voices.
221
+ """
222
+ provider = get_speechma_provider()
223
+ voices_data = provider.get_available_voices()
224
+
225
+ voices = []
226
+ for voice_data in voices_data:
227
+ voice_id = voice_data["voice_id"]
228
+ info = {
229
+ "name": voice_data["name"],
230
+ "gender": voice_data["gender"],
231
+ "language": voice_data["language"],
232
+ "country": voice_data.get("country", "Unknown")
233
+ }
234
+ voices.append(format_voice_to_11labs(voice_id, info))
235
+
236
+ return VoicesListResponse(voices=voices)
237
+
238
+
239
+ @router.get("/v1/voices/{voice_id}", response_model=VoiceResponse)
240
+ async def get_voice(
241
+ voice_id: str,
242
+ key_data: dict = Depends(verify_api_key)
243
+ ):
244
+ """
245
+ Get information about a specific voice.
246
+ """
247
+ provider = get_speechma_provider()
248
+ voice_info = provider.get_voice_info(voice_id)
249
+
250
+ if not voice_info:
251
+ raise HTTPException(status_code=404, detail=f"Voice '{voice_id}' not found")
252
+
253
+ return format_voice_to_11labs(voice_info["voice_id"], {
254
+ "name": voice_info["name"],
255
+ "gender": voice_info["gender"],
256
+ "language": voice_info["language"],
257
+ "country": voice_info.get("country", "Unknown")
258
+ })
259
+
260
+
261
+ @router.get("/v1/voices/{voice_id}/settings", response_model=VoiceSettings)
262
+ async def get_voice_settings(
263
+ voice_id: str,
264
+ key_data: dict = Depends(verify_api_key)
265
+ ):
266
+ """
267
+ Get default settings for a voice.
268
+ """
269
+ provider = get_speechma_provider()
270
+ voice_info = provider.get_voice_info(voice_id)
271
+
272
+ if not voice_info:
273
+ raise HTTPException(status_code=404, detail=f"Voice '{voice_id}' not found")
274
+
275
+ return VoiceSettings()
276
+
277
+
278
+ @router.post("/v1/text-to-speech/{voice_id}")
279
+ async def text_to_speech(
280
+ voice_id: str,
281
+ request: TextToSpeechRequest,
282
+ key_data: dict = Depends(verify_api_key)
283
+ ):
284
+ """
285
+ Convert text to speech.
286
+
287
+ This endpoint is compatible with 11Labs API:
288
+ POST /v1/text-to-speech/{voice_id}
289
+
290
+ Returns audio data as MP3.
291
+ """
292
+ provider = get_speechma_provider()
293
+
294
+ # Validate voice
295
+ voice_info = provider.get_voice_info(voice_id)
296
+ if not voice_info:
297
+ raise HTTPException(status_code=404, detail=f"Voice '{voice_id}' not found")
298
+
299
+ # Use provided voice_id or from request
300
+ actual_voice_id = voice_id
301
+
302
+ # Generate speech
303
+ try:
304
+ audio_data = await provider.generate_speech(
305
+ text=request.text,
306
+ voice_id=actual_voice_id,
307
+ output_format=request.output_format or "mp3"
308
+ )
309
+
310
+ if audio_data is None:
311
+ raise HTTPException(
312
+ status_code=500,
313
+ detail="Failed to generate speech. This could be due to CAPTCHA issues or site changes."
314
+ )
315
+
316
+ # Return audio with proper headers
317
+ headers = {
318
+ "Content-Type": "audio/mpeg",
319
+ "X-Character-Count": str(len(request.text)),
320
+ "Request-Id": f"tts-{uuid.uuid4().hex[:12]}"
321
+ }
322
+
323
+ return Response(
324
+ content=audio_data,
325
+ media_type="audio/mpeg",
326
+ headers=headers
327
+ )
328
+
329
+ except Exception as e:
330
+ raise HTTPException(
331
+ status_code=500,
332
+ detail=f"Speech generation failed: {str(e)}"
333
+ )
334
+
335
+
336
+ @router.post("/v1/text-to-speech/{voice_id}/stream")
337
+ async def text_to_speech_stream(
338
+ voice_id: str,
339
+ request: TextToSpeechRequest,
340
+ key_data: dict = Depends(verify_api_key)
341
+ ):
342
+ """
343
+ Convert text to speech with streaming response.
344
+
345
+ Note: Since SpeechMA generates complete audio files,
346
+ this returns the full audio as a stream.
347
+ """
348
+ provider = get_speechma_provider()
349
+
350
+ # Validate voice
351
+ voice_info = provider.get_voice_info(voice_id)
352
+ if not voice_info:
353
+ raise HTTPException(status_code=404, detail=f"Voice '{voice_id}' not found")
354
+
355
+ try:
356
+ audio_data = await provider.generate_speech(
357
+ text=request.text,
358
+ voice_id=voice_id,
359
+ output_format=request.output_format or "mp3"
360
+ )
361
+
362
+ if audio_data is None:
363
+ raise HTTPException(
364
+ status_code=500,
365
+ detail="Failed to generate speech"
366
+ )
367
+
368
+ # Return as streaming response
369
+ def audio_generator():
370
+ # Yield audio data in chunks
371
+ chunk_size = 8192
372
+ for i in range(0, len(audio_data), chunk_size):
373
+ yield audio_data[i:i + chunk_size]
374
+
375
+ headers = {
376
+ "X-Character-Count": str(len(request.text)),
377
+ "Request-Id": f"tts-stream-{uuid.uuid4().hex[:12]}"
378
+ }
379
+
380
+ return StreamingResponse(
381
+ audio_generator(),
382
+ media_type="audio/mpeg",
383
+ headers=headers
384
+ )
385
+
386
+ except Exception as e:
387
+ raise HTTPException(
388
+ status_code=500,
389
+ detail=f"Speech generation failed: {str(e)}"
390
+ )
391
+
392
+
393
+ # Additional SpeechMA-specific endpoints
394
+
395
+ @router.post("/v1/tts/speechma")
396
+ async def speechma_tts(
397
+ request: Request,
398
+ key_data: dict = Depends(verify_api_key)
399
+ ):
400
+ """
401
+ Direct SpeechMA TTS endpoint with custom options.
402
+
403
+ Body: {
404
+ "text": "Hello world",
405
+ "voice_id": "ava",
406
+ "pitch": 0,
407
+ "speed": 0,
408
+ "volume": 100
409
+ }
410
+ """
411
+ data = await request.json()
412
+
413
+ text = data.get("text")
414
+ voice_id = data.get("voice_id", "ava")
415
+ pitch = data.get("pitch", 0)
416
+ speed = data.get("speed", 0)
417
+ volume = data.get("volume", 100)
418
+
419
+ if not text:
420
+ raise HTTPException(status_code=400, detail="Text is required")
421
+
422
+ if len(text) > 2000:
423
+ raise HTTPException(status_code=400, detail="Text exceeds 2000 character limit")
424
+
425
+ provider = get_speechma_provider()
426
+
427
+ # Validate voice
428
+ voice_info = provider.get_voice_info(voice_id)
429
+ if not voice_info:
430
+ raise HTTPException(status_code=404, detail=f"Voice '{voice_id}' not found")
431
+
432
+ try:
433
+ audio_data = await provider.generate_speech(
434
+ text=text,
435
+ voice_id=voice_id,
436
+ pitch=pitch,
437
+ speed=speed,
438
+ volume=volume
439
+ )
440
+
441
+ if audio_data is None:
442
+ raise HTTPException(
443
+ status_code=500,
444
+ detail="Failed to generate speech. This could be due to CAPTCHA issues."
445
+ )
446
+
447
+ return Response(
448
+ content=audio_data,
449
+ media_type="audio/mpeg",
450
+ headers={
451
+ "Content-Disposition": f'attachment; filename="speech_{voice_id}.mp3"',
452
+ "X-Voice-Used": voice_info["voice_id"]
453
+ }
454
+ )
455
+
456
+ except Exception as e:
457
+ raise HTTPException(
458
+ status_code=500,
459
+ detail=f"Speech generation failed: {str(e)}"
460
+ )
461
+
462
+
463
+ @router.get("/v1/tts/speechma/voices")
464
+ async def speechma_voices(
465
+ key_data: dict = Depends(verify_api_key)
466
+ ):
467
+ """
468
+ Get all available SpeechMA voices with full details.
469
+ """
470
+ provider = get_speechma_provider()
471
+ voices = provider.get_available_voices()
472
+
473
+ return JSONResponse({
474
+ "voices": voices,
475
+ "count": len(voices),
476
+ "default_voice": "ava"
477
+ })
478
+
479
+
480
+ @router.get("/v1/tts/health")
481
+ async def tts_health_check():
482
+ """
483
+ Check if TTS service is healthy.
484
+ """
485
+ try:
486
+ provider = get_speechma_provider()
487
+ is_healthy = await provider.health_check()
488
+
489
+ return JSONResponse({
490
+ "status": "healthy" if is_healthy else "unhealthy",
491
+ "provider": "speechma",
492
+ "timestamp": time.time()
493
+ })
494
+ except Exception as e:
495
+ return JSONResponse({
496
+ "status": "unhealthy",
497
+ "provider": "speechma",
498
+ "error": str(e),
499
+ "timestamp": time.time()
500
+ }, status_code=503)
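Because SpeechMA returns a complete file, the `/stream` endpoint above simulates streaming by slicing the finished audio into fixed-size chunks. The chunking logic in isolation, as a plain generator:

```python
def iter_chunks(data: bytes, chunk_size: int = 8192):
    """Yield fixed-size slices of a complete audio buffer; the last chunk may be shorter."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]
```

A 20,000-byte buffer, for example, yields two 8,192-byte chunks followed by one 3,616-byte chunk, and concatenating the chunks reproduces the original buffer exactly.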