🎙️ VoiceAPI - Multi-lingual Indian Language TTS

An advanced multi-speaker, multilingual text-to-speech (TTS) synthesizer supporting 11 Indian languages with 21 voice options.

🌟 Features

  • 11 Indian Languages: Hindi, Bengali, Marathi, Telugu, Kannada, Gujarati, Bhojpuri, Chhattisgarhi, Maithili, Magahi, English
  • 21 Voice Options: Male and female voices for every language except Gujarati, which has a single female (MMS) voice
  • High-Quality Audio: 22050 Hz sample rate (16000 Hz for the Gujarati MMS voice), natural prosody
  • REST API: Simple GET/POST endpoints for easy integration
  • Real-time Synthesis: Fast inference on CPU/GPU

🗣️ Supported Languages

| Language | Code | Female | Male | Script |
|---|---|---|---|---|
| Hindi | hi | ✅ | ✅ | देवनागरी |
| Bengali | bn | ✅ | ✅ | বাংলা |
| Marathi | mr | ✅ | ✅ | देवनागरी |
| Telugu | te | ✅ | ✅ | తెలుగు |
| Kannada | kn | ✅ | ✅ | ಕನ್ನಡ |
| Gujarati | gu | ✅ (MMS) | - | ગુજરાતી |
| Bhojpuri | bho | ✅ | ✅ | देवनागरी |
| Chhattisgarhi | hne | ✅ | ✅ | देवनागरी |
| Maithili | mai | ✅ | ✅ | देवनागरी |
| Magahi | mag | ✅ | ✅ | देवनागरी |
| English | en | ✅ | ✅ | Latin |
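
The language table above can be mirrored in code as a small lookup, which may help when validating a language and voice choice before calling the API. This is an illustrative sketch only: `VOICES`, `available_voices`, and `has_voice` are hypothetical names and are not taken from the project's `src/config.py`.

```python
# Hypothetical lookup mirroring the language table (not the project's own config).
VOICES = {
    "hi": ("Hindi", ("female", "male")),
    "bn": ("Bengali", ("female", "male")),
    "mr": ("Marathi", ("female", "male")),
    "te": ("Telugu", ("female", "male")),
    "kn": ("Kannada", ("female", "male")),
    "gu": ("Gujarati", ("female",)),  # MMS voice, female only
    "bho": ("Bhojpuri", ("female", "male")),
    "hne": ("Chhattisgarhi", ("female", "male")),
    "mai": ("Maithili", ("female", "male")),
    "mag": ("Magahi", ("female", "male")),
    "en": ("English", ("female", "male")),
}


def available_voices():
    """Return every (language code, gender) pair listed in the table."""
    return [
        (code, gender)
        for code, (_, genders) in VOICES.items()
        for gender in genders
    ]


def has_voice(code, gender):
    """Check whether a language code offers the requested gender."""
    entry = VOICES.get(code)
    return entry is not None and gender in entry[1]
```

Counting the pairs returned by `available_voices()` gives the 21 voices advertised in the feature list: ten languages with two voices each, plus the single Gujarati voice.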

📡 API Usage

Endpoint

```
https://harshil748-voiceapi.hf.space/
```

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `text` | string | Yes | Text to synthesize (lowercase for English) |
| `lang` | string | Yes | Language name (hindi, bengali, etc.) |
| `speaker_wav` | file | Yes | Reference WAV file (for API compatibility) |
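
As a rough illustration of how the `text` and `lang` parameters fit together, the sketch below validates them and assembles the request URL. It assumes the `Get_Inference` path used in this README's examples and that `lang` accepts the lowercase English names of the supported languages; `build_inference_url` and `LANGUAGE_NAMES` are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://harshil748-voiceapi.hf.space/Get_Inference"

# Assumed values for `lang`: lowercase names of the supported languages.
LANGUAGE_NAMES = {
    "hindi", "bengali", "marathi", "telugu", "kannada", "gujarati",
    "bhojpuri", "chhattisgarhi", "maithili", "magahi", "english",
}


def build_inference_url(text, lang):
    """Validate `text` and `lang`, then return the full request URL."""
    lang = lang.strip().lower()
    if lang not in LANGUAGE_NAMES:
        raise ValueError(f"Unsupported language: {lang!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    if lang == "english":
        # The parameter table asks for lowercase English text.
        text = text.lower()
    # urlencode percent-encodes non-ASCII text as UTF-8.
    return f"{BASE_URL}?{urlencode({'text': text, 'lang': lang})}"
```

The `speaker_wav` file is not part of the query string and still has to be sent as a file upload with the request.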

Example (Python)

```python
import requests

base_url = 'https://harshil748-voiceapi.hf.space/Get_Inference'
wav_path = 'reference.wav'

params = {
    'text': 'नमस्ते, आप कैसे हैं?',
    'lang': 'hindi',
}

# Send the reference WAV as a file upload alongside the query parameters.
with open(wav_path, 'rb') as audio_file:
    response = requests.get(base_url, params=params, files={'speaker_wav': audio_file.read()})

if response.status_code == 200:
    with open('output.wav', 'wb') as f:
        f.write(response.content)
    print("Audio saved as 'output.wav'")
else:
    print(f"Request failed with status {response.status_code}")
```

Example (cURL)

```bash
curl -X POST "https://harshil748-voiceapi.hf.space/Get_Inference?text=hello&lang=english" \
  -F "speaker_wav=@reference.wav" \
  -o output.wav
```

🏗️ Model Architecture

  • Base Model: VITS (Variational Inference with adversarial learning for Text-to-Speech)
  • Encoder: Transformer-based text encoder (6 layers, 192 hidden channels)
  • Decoder: HiFi-GAN neural vocoder
  • Duration Predictor: Stochastic duration predictor for natural prosody
  • Sample Rate: 22050 Hz (16000 Hz for Gujarati MMS)
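
Because the Gujarati MMS voice uses a different sample rate from the other voices, client code may want to read the rate from a returned file rather than assume 22050 Hz. The sketch below uses Python's standard `wave` module; it assumes the API returns standard PCM WAV data, and `expected_sample_rate` and `wav_info` are hypothetical helper names.

```python
import io
import wave


def expected_sample_rate(lang):
    """Rate listed above: 16000 Hz for Gujarati (MMS), 22050 Hz otherwise."""
    return 16000 if lang.lower() == "gujarati" else 22050


def wav_info(wav_bytes):
    """Return (sample_rate_hz, duration_seconds) for in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate
```

With a response from the API, `wav_info(response.content)` could then be compared against `expected_sample_rate(lang)` before the audio is passed to a player or resampler.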

📊 Training

Datasets Used

| Dataset | Languages | Source | License |
|---|---|---|---|
| OpenSLR-103 | Hindi | OpenSLR | CC BY 4.0 |
| OpenSLR-37 | Bengali | OpenSLR | CC BY 4.0 |
| OpenSLR-64 | Marathi | OpenSLR | CC BY 4.0 |
| OpenSLR-66 | Telugu | OpenSLR | CC BY 4.0 |
| OpenSLR-79 | Kannada | OpenSLR | CC BY 4.0 |
| OpenSLR-78 | Gujarati | OpenSLR | CC BY 4.0 |
| Common Voice | Hindi, Bengali | Mozilla | CC0 |
| IndicTTS | Multiple | IIT Madras | Research |
| Indic-Voices | Multiple | AI4Bharat | CC BY 4.0 |

Training Configuration

  • Epochs: 1000
  • Batch Size: 32
  • Learning Rate: 2e-4
  • Optimizer: AdamW
  • FP16 Training: Enabled
  • Hardware: NVIDIA V100/A100 GPUs

See `training/` directory for full training scripts and configurations.
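
For reference, the hyperparameters listed in this section could be captured in a small configuration object like the one below. This is a sketch only: the `TrainingConfig` class and its field names are hypothetical and are not taken from the scripts in `training/`.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters from the Training Configuration list; names are illustrative."""
    epochs: int = 1000
    batch_size: int = 32
    learning_rate: float = 2e-4
    optimizer: str = "AdamW"
    fp16: bool = True


config = TrainingConfig()
print(asdict(config))
```

Freezing the dataclass keeps the values from being changed accidentally during a run; an experiment with different settings would create a new instance, for example `TrainingConfig(batch_size=16)`.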

🚀 Deployment

This API is deployed on HuggingFace Spaces using Docker:

```dockerfile
FROM python:3.10-slim

# ... installs dependencies
# Downloads models from Harshil748/VoiceAPI-Models
# Runs FastAPI server on port 7860
```

Models are hosted separately at Harshil748/VoiceAPI-Models (~8GB).
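
One way the model download step might look is sketched below using the `snapshot_download` function from the `huggingface_hub` library. This is an assumption about the approach, not the contents of the project's `download_models.py`, and the `models` target directory is a hypothetical choice.

```python
MODEL_REPO = "Harshil748/VoiceAPI-Models"


def download_models(target_dir="models"):
    """Fetch the full model repository (~8 GB) into target_dir and return its path."""
    # Imported lazily so this module loads even if huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=MODEL_REPO, local_dir=target_dir)
```

Calling `download_models()` once at container build time or startup would place the weights on local disk before the server begins handling requests.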

📁 Project Structure

```
VoiceAPI/
├── app.py                 # HuggingFace Spaces entry point
├── Dockerfile             # Docker configuration
├── requirements.txt       # Python dependencies
├── download_models.py     # Model downloader
├── src/
│   ├── api.py             # FastAPI REST server
│   ├── engine.py          # TTS inference engine
│   ├── config.py          # Voice configurations
│   └── tokenizer.py       # Text tokenization
└── training/
    ├── train_vits.py      # VITS training script
    ├── prepare_dataset.py # Data preparation
    ├── export_model.py    # Model export
    ├── datasets.csv       # Dataset links
    └── configs/           # Training configs
```

📜 License

  • Code: MIT License
  • Models: CC BY 4.0 (following SYSPIN licensing)
  • Datasets: Individual licenses (see training/datasets.csv)

🙏 Acknowledgments

📧 Contact

Built for the Voice Tech for All Hackathon - Multi-lingual TTS for healthcare assistants serving low-income communities.
