Update README.md
README.md
CHANGED
|
@@ -5,10 +5,95 @@ language:
|
|
| 5 |
- de
|
| 6 |
- fr
|
| 7 |
- zh
|
|
|
|
|
|
|
| 8 |
base_model:
|
| 9 |
- FunAudioLLM/CosyVoice2-0.5B
|
| 10 |
- Qwen/Qwen3-0.6B
|
| 11 |
- utter-project/EuroLLM-1.7B-Instruct
|
| 12 |
- mistralai/Mistral-7B-v0.3
|
| 13 |
pipeline_tag: text-to-speech
|
| 14 |
-
---
| 5 |
- de
|
| 6 |
- fr
|
| 7 |
- zh
|
| 8 |
+
- ko
|
| 9 |
+
- ja
|
| 10 |
base_model:
|
| 11 |
- FunAudioLLM/CosyVoice2-0.5B
|
| 12 |
- Qwen/Qwen3-0.6B
|
| 13 |
- utter-project/EuroLLM-1.7B-Instruct
|
| 14 |
- mistralai/Mistral-7B-v0.3
|
| 15 |
pipeline_tag: text-to-speech
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
<p align="center">
|
| 20 |
+
<img src="https://horstmann.tech/cosyvoice2-demo/cosyvoice2-logo-clear.png" alt="CosyVoice2-EU logo" width="260">
|
| 21 |
+
</p>
|
| 22 |
+
|
| 23 |
+
# CosyVoice2-0.5B-EU — FR/DE Zero-Shot Voice Cloning (CosyVoice2)
|
| 24 |
+
|
| 25 |
+
**Europeanized CosyVoice2 for French & German.**
|
| 26 |
+
Plug-and-play zero-shot voice cloning with streaming support, bilingual training (FR+DE), and a simple CLI via the companion PyPI package.
|
| 27 |
+
|
| 28 |
+
**👉 PyPI:** `cosyvoice2-eu` (current: **0.2.7**) at https://pypi.org/project/cosyvoice2-eu/
|
| 29 |
+
**👉 Demo:** https://horstmann.tech/cosyvoice2-demo/
|
| 30 |
+
**👉 Built on:** FunAudioLLM **CosyVoice2** (semantic LM + chunk-aware flow + HiFi-GAN)
|
| 31 |
+
|
| 32 |
+
---
|
| 33 |
+
|
| 34 |
+
## TL;DR
|
| 35 |
+
High-quality **French/German** zero-shot TTS (text plus a short reference audio clip) built on **CosyVoice2**. It targets sentence- to paragraph-length narration and easy local inference, and was adapted jointly on French and German.
|
| 36 |
+
While this model is optimized for French and German, it remains fully compatible with the original CosyVoice2 languages — English, Chinese, Japanese, Korean, and their dialects.
|
| 37 |
+
|
| 38 |
+
---
|
| 39 |
+
|
| 40 |
+
## Quickstart (CLI)
|
| 41 |
+
|
| 42 |
+
Install:
|
| 43 |
+
```bash
|
| 44 |
+
pip install cosyvoice2-eu
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
French example:
|
| 48 |
+
```bash
|
| 49 |
+
cosy2-eu --text "Salut ! Je vous présente CosyVoice 2, un système de synthèse vocale très avancé." --prompt path/to/french_ref.wav --out out_fr.wav
|
| 50 |
+
```
|
| 51 |
+
|
| 52 |
+
German example:
|
| 53 |
+
```bash
|
| 54 |
+
cosy2-eu --text "Hallo! Ich präsentiere CosyVoice 2 – ein fortschrittliches TTS-System." --prompt path/to/german_ref.wav --out out_de.wav
|
| 55 |
+
```
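
To synthesize more than a handful of sentences, the same CLI call can be scripted. The following is a minimal sketch and not part of the `cosyvoice2-eu` package: `build_command` and `synthesize_all` are hypothetical helper names, the `utt_000.wav` naming scheme is an assumption, and the only flags used are `--text`, `--prompt`, and `--out` from the examples above.

```python
import subprocess
from pathlib import Path


def build_command(text: str, prompt_wav: str, out_wav: str) -> list[str]:
    """Assemble the cosy2-eu argument list from the three flags shown in the examples."""
    return ["cosy2-eu", "--text", text, "--prompt", prompt_wav, "--out", out_wav]


def synthesize_all(sentences: list[str], prompt_wav: str, out_dir: str) -> list[Path]:
    """Run the CLI once per sentence and return the output paths (hypothetical helper)."""
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, sentence in enumerate(sentences):
        # Output file naming is an assumption made for this sketch.
        out_path = out_root / f"utt_{index:03d}.wav"
        # check=True raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(build_command(sentence, prompt_wav, str(out_path)), check=True)
        outputs.append(out_path)
    return outputs
```

Passing the arguments as a list avoids shell quoting problems with accents, exclamation marks, and quotation marks in the text. A call such as `synthesize_all(["Bonjour à tous.", "Merci d'être venus."], "path/to/french_ref.wav", "out_fr")` would invoke the CLI twice with the same reference audio.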
|
| 56 |
+
|
| 57 |
+
> First run downloads the model from this repo and caches it locally.
|
| 58 |
+
> Tip: You can experiment with style control by prefixing the text with a style instruction in the form `"<style>. <|endofprompt|> <text>"`, e.g. `"Speak cheerfully. <|endofprompt|> Hallo! Wie geht es Ihnen heute?"`.
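
The style-prompt format from the tip can be assembled with a small helper. This is a sketch only: `with_style` is a hypothetical name, and the exact spacing around `<|endofprompt|>` simply mirrors the example in the tip rather than a documented requirement.

```python
END_OF_PROMPT = "<|endofprompt|>"


def with_style(style: str, text: str) -> str:
    """Build '<style>. <|endofprompt|> <text>' as shown in the tip above."""
    # Normalize the instruction so it ends with exactly one period.
    instruction = style.strip().rstrip(".")
    return f"{instruction}. {END_OF_PROMPT} {text.strip()}"
```

The resulting string is passed as the `--text` argument, for example `with_style("Speak cheerfully", "Hallo! Wie geht es Ihnen heute?")`.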
|
| 59 |
+
|
| 60 |
+
---
|
| 61 |
+
|
| 62 |
+
## What you get
|
| 63 |
+
- **Zero-shot voice cloning** for **FR/DE** (reference audio → cloned timbre & style).
|
| 64 |
+
- **Bilingual adaptation** (FR+DE) on top of CosyVoice2 for stronger data efficiency.
|
| 65 |
+
- **Streaming & non-streaming** synthesis supported by the underlying architecture.
|
| 66 |
+
- **Simple local inference**: one pip install, one CLI (`cosy2-eu`).
|
| 67 |
+
- **Interoperable components** (text→semantic LM, flow decoder, HiFi-GAN vocoder).
|
| 68 |
+
|
| 69 |
+
Also compatible with original CosyVoice2 languages (EN/ZH/JA/KO & dialects).
|
| 70 |
+
|
| 71 |
+
---
|
| 72 |
+
|
| 73 |
+
## Inputs / Outputs
|
| 74 |
+
- **Input:** text (FR/DE) + short **reference audio** (mono WAV recommended).
|
| 75 |
+
- **Output:** synthesized WAV cloning the reference speaker’s timbre, speaking the input text in FR/DE.
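
Before cloning, it can help to confirm that a reference file is actually mono. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM WAV files; `describe_wav` and `is_mono` are hypothetical helper names and are not part of the `cosyvoice2-eu` package.

```python
import wave


def describe_wav(source) -> dict:
    """Return channel count, sample rate, and duration of a PCM WAV file.

    `source` may be a string path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def is_mono(source) -> bool:
    """True if the WAV has a single channel, as recommended for reference audio."""
    return describe_wav(source)["channels"] == 1
```

For example, `describe_wav("path/to/french_ref.wav")` reports the channel count and duration, so a stereo or unexpectedly long prompt can be spotted before running the CLI.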
|
| 76 |
+
|
| 77 |
+
---
|
| 78 |
+
|
| 79 |
+
## Notes & limitations
|
| 80 |
+
- The FR/DE adaptation was trained on a limited budget of open data; edge cases (very noisy prompts, long numeric sequences, heavy code-switching) may require careful prompting or additional fine-tuning.
|
| 81 |
+
- Voice cloning carries **misuse risks** (impersonation, fraud). Use only with consent and follow local laws/policies.
|
| 82 |
+
|
| 83 |
+
---
|
| 84 |
+
|
| 85 |
+
## License & attribution
|
| 86 |
+
- **License:** Apache-2.0 (see card metadata / repo).
|
| 87 |
+
- Built on **CosyVoice2** by FunAudioLLM; please cite their work (see **Links** below).
|
| 88 |
+
|
| 89 |
+
|
| 90 |
+
---
|
| 91 |
+
|
| 92 |
+
**Links**
|
| 93 |
+
- PyPI (inference CLI): https://pypi.org/project/cosyvoice2-eu/
|
| 94 |
+
- Upstream project: https://github.com/FunAudioLLM/CosyVoice
|
| 95 |
+
- CosyVoice2 paper & page: https://arxiv.org/abs/2412.10117 • https://funaudiollm.github.io/cosyvoice2/
|
| 96 |
+
|
| 97 |
+
---
|
| 98 |
+
|
| 99 |
+
*If you use CosyVoice2-0.5B-EU in research or products, please add a short acknowledgment and share feedback or samples. We are continuously improving FR/DE expressiveness and robustness.*
|