---
license: bigscience-openrail-m
language:
- en
---

# Supertonic: Lightning Fast, On-Device TTS

**Supertonic** is a lightning-fast, on-device text-to-speech system designed for **extreme performance** with minimal computational overhead. Powered by ONNX Runtime, it runs entirely on your device: no cloud, no API calls, no privacy concerns.

> 🎧 **Try it now**: Experience Supertonic in your browser with our [**Interactive Demo**](https://huggingface.co/spaces/Supertone/supertonic#interactive-demo), or get started with pre-trained models from the [**Hugging Face Hub**](https://huggingface.co/Supertone/supertonic).

### Table of Contents

- [Why Supertonic?](#why-supertonic)
- [Language Support](#language-support)
- [Getting Started](#getting-started)
- [Performance](#performance)
- [License](#license)

## Why Supertonic?

- **⚡ Blazingly Fast**: Generates speech up to **167× faster than real-time** on consumer hardware (M4 Pro), unmatched by any other TTS system
- **🪶 Ultra Lightweight**: Only **66M parameters**, optimized for efficient on-device performance with a minimal footprint
- **📱 On-Device Capable**: **Complete privacy** and **zero latency**, since all processing happens locally on your device
- **🎨 Natural Text Handling**: Seamlessly processes numbers, dates, currency, abbreviations, and complex expressions without pre-processing
- **⚙️ Highly Configurable**: Adjust inference steps, batch processing, and other parameters to match your specific needs
- **🧩 Flexible Deployment**: Deploy seamlessly across servers, browsers, and edge devices with multiple runtime backends

## Language Support

We provide ready-to-use TTS inference examples across multiple ecosystems:

| Language/Platform | Path | Description |
|-------------------|------|-------------|
| [**Python**](py/) | `py/` | ONNX Runtime inference |
| [**Node.js**](nodejs/) | `nodejs/` | Server-side JavaScript |
| [**Browser**](web/) | `web/` | WebGPU/WASM inference |
| [**Java**](java/) | `java/` | Cross-platform JVM |
| [**C++**](cpp/) | `cpp/` | High-performance C++ |
| [**C#**](csharp/) | `csharp/` | .NET ecosystem |
| [**Go**](go/) | `go/` | Go implementation |
| [**Swift**](swift/) | `swift/` | macOS applications |
| [**iOS**](ios/) | `ios/` | Native iOS apps |
| [**Rust**](rust/) | `rust/` | Memory-safe systems |

> For detailed usage instructions, refer to the README.md in each language directory.

## Getting Started

First, clone the repository:

```bash
git clone https://github.com/supertone-inc/supertonic.git
cd supertonic
```

### Prerequisites

Before running the examples, download the ONNX models and preset voices and place them in the `assets` directory:

```bash
git clone https://huggingface.co/Supertone/supertonic assets
```

> **Note:** The Hugging Face repository uses Git LFS. Ensure Git LFS is installed and initialized before cloning or pulling large model files.
> - macOS: `brew install git-lfs && git lfs install`
> - Other platforms: see [git-lfs.com](https://git-lfs.com) for installers

### Technical Details

- **Runtime**: ONNX Runtime for cross-platform inference (CPU-optimized; GPU mode is not tested)
- **Browser Support**: onnxruntime-web for client-side inference
- **Batch Processing**: Supports batch inference for improved throughput
- **Audio Output**: 16-bit WAV files

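For reference, 16-bit PCM WAV output of the kind listed above can be produced with Python's standard library alone. The sketch below writes clamped float samples as mono 16-bit WAV; the sine tone and the 24 kHz sample rate are stand-in assumptions for illustration, not Supertonic's actual model output or output rate.

```python
import math
import struct
import wave

def write_wav_16bit(path, samples, sample_rate=24000):
    """Write float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit = 2 bytes per sample
        wav.setframerate(sample_rate)
        # Clamp to [-1, 1], then scale to the signed 16-bit range.
        pcm = [max(-1.0, min(1.0, s)) for s in samples]
        frames = struct.pack("<%dh" % len(pcm), *(int(s * 32767) for s in pcm))
        wav.writeframes(frames)

# Stand-in for synthesized audio: a 440 Hz tone, 0.5 s at 24 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 24000) for t in range(12000)]
write_wav_16bit("out.wav", tone)
```

Any player or audio library that understands standard WAV headers can consume the result directly.
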
## Performance

We evaluated Supertonic's performance (with 2 inference steps) using two key metrics across input texts of varying lengths: Short (59 chars), Mid (152 chars), and Long (266 chars).

**Metrics:**
- **Characters per Second**: Measures throughput by dividing the number of input characters by the time required to generate the audio. Higher is better.
- **Real-time Factor (RTF)**: Measures the time taken to synthesize audio relative to the audio's duration. Lower is better (e.g., an RTF of 0.1 means it takes 0.1 seconds to generate one second of audio).

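The two metrics can be stated concretely as follows; the timing numbers in the usage lines are made-up illustrations, not measurements from the tables below.

```python
def chars_per_second(num_chars, synthesis_seconds):
    """Throughput: input characters divided by time to generate the audio."""
    return num_chars / synthesis_seconds

def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF: synthesis time relative to the audio's duration (lower is better)."""
    return synthesis_seconds / audio_seconds

# Hypothetical run: a 152-character input synthesized to 4.0 s of audio in 0.05 s.
print(chars_per_second(152, 0.05))   # 3040.0 chars/s
print(real_time_factor(0.05, 4.0))   # 0.0125
```
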
### Characters per Second

| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| **Supertonic** (M4 Pro - CPU) | 912 | 1048 | 1263 |
| **Supertonic** (M4 Pro - WebGPU) | 996 | 1801 | 2509 |
| **Supertonic** (RTX 4090) | 2615 | 6548 | 12164 |
| `API` [ElevenLabs Flash v2.5](https://elevenlabs.io/docs/api-reference/text-to-speech/convert) | 144 | 209 | 287 |
| `API` [OpenAI TTS-1](https://platform.openai.com/docs/guides/text-to-speech) | 37 | 55 | 82 |
| `API` [Gemini 2.5 Flash TTS](https://ai.google.dev/gemini-api/docs/speech-generation) | 12 | 18 | 24 |
| `API` [Supertone Sona speech 1](https://docs.supertoneapi.com/en/api-reference/endpoints/text-to-speech) | 38 | 64 | 92 |
| `Open` [Kokoro](https://github.com/hexgrad/kokoro/) | 104 | 107 | 117 |
| `Open` [NeuTTS Air](https://github.com/neuphonic/neutts-air) | 37 | 42 | 47 |

> **Notes:**
> - `API` = cloud-based API services (measured from Seoul)
> - `Open` = open-source models
> - Supertonic (M4 Pro - CPU) and (M4 Pro - WebGPU): tested with ONNX
> - Supertonic (RTX 4090): tested with the PyTorch model
> - Kokoro: tested on M4 Pro CPU with ONNX
> - NeuTTS Air: tested on M4 Pro CPU with Q8-GGUF

### Real-time Factor

| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| **Supertonic** (M4 Pro - CPU) | 0.015 | 0.013 | 0.012 |
| **Supertonic** (M4 Pro - WebGPU) | 0.014 | 0.007 | 0.006 |
| **Supertonic** (RTX 4090) | 0.005 | 0.002 | 0.001 |
| `API` [ElevenLabs Flash v2.5](https://elevenlabs.io/docs/api-reference/text-to-speech/convert) | 0.133 | 0.077 | 0.057 |
| `API` [OpenAI TTS-1](https://platform.openai.com/docs/guides/text-to-speech) | 0.471 | 0.302 | 0.201 |
| `API` [Gemini 2.5 Flash TTS](https://ai.google.dev/gemini-api/docs/speech-generation) | 1.060 | 0.673 | 0.541 |
| `API` [Supertone Sona speech 1](https://docs.supertoneapi.com/en/api-reference/endpoints/text-to-speech) | 0.372 | 0.206 | 0.163 |
| `Open` [Kokoro](https://github.com/hexgrad/kokoro/) | 0.144 | 0.124 | 0.126 |
| `Open` [NeuTTS Air](https://github.com/neuphonic/neutts-air) | 0.390 | 0.338 | 0.343 |

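A real-time speedup is simply the reciprocal of the RTF, which is where the headline figure comes from: an RTF of 0.006 (the best M4 Pro WebGPU result in the table) corresponds to roughly 167× faster than real time.

```python
def speedup_from_rtf(rtf):
    """Real-time speedup is the reciprocal of the real-time factor."""
    return 1.0 / rtf

# RTF of 0.006, as measured for Supertonic (M4 Pro - WebGPU) on long input.
print(round(speedup_from_rtf(0.006)))  # prints 167
```
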
<details>
<summary><b>Additional Performance Data (5-step inference)</b></summary>

<br>

**Characters per Second (5-step)**

| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| **Supertonic** (M4 Pro - CPU) | 596 | 691 | 850 |
| **Supertonic** (M4 Pro - WebGPU) | 570 | 1118 | 1546 |
| **Supertonic** (RTX 4090) | 1286 | 3757 | 6242 |

**Real-time Factor (5-step)**

| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| **Supertonic** (M4 Pro - CPU) | 0.023 | 0.019 | 0.018 |
| **Supertonic** (M4 Pro - WebGPU) | 0.024 | 0.012 | 0.010 |
| **Supertonic** (RTX 4090) | 0.011 | 0.004 | 0.002 |

</details>

## License

This project's sample code is released under the MIT License; see the [LICENSE](https://github.com/supertone-inc/supertonic?tab=MIT-1-ov-file) file for details.

The accompanying model is released under the OpenRAIL-M License; see the [license description](https://bigscience.huggingface.co/blog/bigscience-openrail-m) for details.

This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project; see the [PyTorch license](https://docs.pytorch.org/FBGEMM/general/License.html) for details.