Add `library_name` to metadata
This PR enhances the model card by adding `library_name: transformers` to the metadata.
This tag is justified by the `config.json` file, which specifies `"architectures": ["LlamaForCausalLM"]` and `"model_type": "llama"`. Llama-based models are typically integrated and used with the Hugging Face `transformers` library, enabling a predefined code snippet for users on the Hub.
No sample usage code snippet has been added as the provided GitHub README does not contain a suitable Python example for programmatic inference via a library.
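The `config.json` justification above can be sanity-checked mechanically. The sketch below is illustrative only: it inlines the two fields cited in this PR rather than fetching `config.json` from the repo, and it mirrors (roughly, not authoritatively) the kind of architecture check that library detection relies on:

```python
import json

# Inlined stand-in for the repo's config.json; only the two fields
# cited in this PR are included here.
config = json.loads("""
{
  "architectures": ["LlamaForCausalLM"],
  "model_type": "llama"
}
""")

# A Llama-style config is what justifies `library_name: transformers`.
is_transformers_compatible = (
    config.get("model_type") == "llama"
    and "LlamaForCausalLM" in config.get("architectures", [])
)
print(is_transformers_compatible)
```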
README.md (CHANGED)

````diff
@@ -2,6 +2,8 @@
 language:
 - zh
 - en
+license: mit
+pipeline_tag: text-to-speech
 tags:
 - llm
 - tts
@@ -9,8 +11,7 @@ tags:
 - voice-cloning
 - reinforcement-learning
 - flow-matching
-
-pipeline_tag: text-to-speech
+library_name: transformers
 ---
 
 # GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS
@@ -35,12 +36,12 @@ By introducing a **Multi-Reward Reinforcement Learning** framework, GLM-TTS sign
 
 ### Key Features
 
-*
-*
-*
-*
-*
-*
+* **Zero-shot Voice Cloning:** Clone any speaker's voice with just 3-10 seconds of prompt audio.
+* **RL-enhanced Emotion Control:** Utilizes a multi-reward reinforcement learning framework (GRPO) to optimize prosody and emotion.
+* **High-quality Synthesis:** Generates speech comparable to commercial systems with reduced Character Error Rate (CER).
+* **Phoneme-level Control:** Supports "Hybrid Phoneme + Text" input for precise pronunciation control (e.g., polyphones).
+* **Streaming Inference:** Supports real-time audio generation suitable for interactive applications.
+* **Bilingual Support:** Optimized for Chinese and English mixed text.
 
 ## System Architecture
 
@@ -73,7 +74,7 @@ Evaluated on `seed-tts-eval`. **GLM-TTS_RL** achieves the lowest Character Error
 ### Installation
 
 ```bash
-git clone
+git clone https://github.com/zai-org/GLM-TTS.git
 cd GLM-TTS
 pip install -r requirements.txt
 ```
@@ -115,4 +116,4 @@ If you find GLM-TTS useful for your research, please cite our technical report:
 primaryClass={cs.SD},
 url={https://arxiv.org/abs/2512.14291},
 }
-
+```
````
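Once merged, the resulting front matter can be checked locally. A minimal sketch, with two stated assumptions: the YAML header is hand-inlined below rather than downloaded from the Hub, and a naive `key: value` line parse stands in for a real YAML library:

```python
# Inlined copy of the README front matter after this PR (assumption:
# this matches the merged file; fetch the real README to verify).
readme = """---
language:
- zh
- en
license: mit
pipeline_tag: text-to-speech
library_name: transformers
---
# GLM-TTS
"""

# Naive front-matter parse: take the text between the first pair of
# "---" markers and keep only simple "key: value" lines.
front_matter = readme.split("---")[1]
fields = dict(
    line.split(": ", 1)
    for line in front_matter.strip().splitlines()
    if ": " in line
)
print(fields["library_name"])
```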