Update README.md
README.md
CHANGED
|
@@ -19,10 +19,10 @@ We introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (
|
|
| 19 |
|
| 20 |
[](https://github.com/HeCheng0625/Diffusion-Speech-Tokenizer)
|
| 21 |
[](https://hecheng0625.github.io/assets/pdf/Arxiv_TaDiCodec.pdf)
|
| 22 |
-
[](https://www.python.org/)
|
| 24 |
[](https://pytorch.org/)
|
| 25 |
-
[
|
| 61 |
|
| 62 |
# Load AR TTS model, it will automatically download the model from Hugging Face for the first time
|
| 63 |
-
tts_model = TTSInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-AR-Qwen2.5-
|
| 64 |
|
| 65 |
# Load MGM TTS model, it will automatically download the model from Hugging Face for the first time
|
| 66 |
tts_model = MGMInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-MGM")
|
|
@@ -120,13 +120,14 @@ sf.write("./use_examples/test_audio/trump_rec.wav", rec_audio, 24000)
|
|
| 120 |
import torch
|
| 121 |
import soundfile as sf
|
| 122 |
from models.tts.llm_tts.inference_llm_tts import TTSInferencePipeline
|
|
|
|
| 123 |
|
| 124 |
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 125 |
|
| 126 |
# Create AR TTS pipeline
|
| 127 |
pipeline = TTSInferencePipeline.from_pretrained(
|
| 128 |
tadicodec_path="./ckpt/TaDiCodec",
|
| 129 |
-
llm_path="./ckpt/TaDiCodec-TTS-AR-Qwen2.5-
|
| 130 |
device=device,
|
| 131 |
)
|
| 132 |
|
|
@@ -178,8 +179,6 @@ MaskGCT:
|
|
| 178 |
|
| 179 |
# Acknowledgments
|
| 180 |
|
| 181 |
-
- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
|
| 182 |
-
|
| 183 |
- **MGM-based TTS** is built upon [MaskGCT](https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct).
|
| 184 |
|
| 185 |
- **Vocos vocoder** is built upon [Vocos](https://github.com/gemelo-ai/vocos).
|
|
@@ -188,3 +187,4 @@ MaskGCT:
|
|
| 188 |
|
| 189 |
- **(Binary Spherical Quantization) BSQ** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
|
| 190 |
|
|
|
|
|
|
| 19 |
|
| 20 |
[](https://github.com/HeCheng0625/Diffusion-Speech-Tokenizer)
|
| 21 |
[](https://hecheng0625.github.io/assets/pdf/Arxiv_TaDiCodec.pdf)
|
| 22 |
+
[](https://tadicodec.github.io/)
|
| 23 |
[](https://www.python.org/)
|
| 24 |
[](https://pytorch.org/)
|
| 25 |
+
[](https://huggingface.co/amphion/TaDiCodec)
|
| 26 |
|
| 27 |
# Pre-trained Models
|
| 28 |
|
|
|
|
| 34 |
|
| 35 |
| Model | 🤗 Hugging Face | Status |
|
| 36 |
|:-----:|:---------------:|:------:|
|
| 37 |
+
| **TaDiCodec** | [](https://huggingface.co/amphion/TaDiCodec) | ✅ |
|
| 38 |
+
| **TaDiCodec-old** | [](https://huggingface.co/amphion/TaDiCodec-old) | 🚧 |
|
| 39 |
|
| 40 |
*Note: TaDiCodec-old is the previous version of TaDiCodec; TaDiCodec-TTS-AR-Phi-3.5-4B is based on TaDiCodec-old.*
|
| 41 |
|
|
|
|
| 43 |
|
| 44 |
| Model | Type | LLM | 🤗 Hugging Face | Status |
|
| 45 |
|:-----:|:----:|:---:|:---------------:|:-------------:|
|
| 46 |
+
| **TaDiCodec-TTS-AR-Qwen2.5-0.5B** | AR | Qwen2.5-0.5B-Instruct | [](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Qwen2.5-0.5B) | ✅ |
|
| 47 |
+
| **TaDiCodec-TTS-AR-Qwen2.5-3B** | AR | Qwen2.5-3B-Instruct | [](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Qwen2.5-3B) | ✅ |
|
| 48 |
+
| **TaDiCodec-TTS-AR-Phi-3.5-4B** | AR | Phi-3.5-mini-instruct | [](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Phi-3.5-4B) | 🚧 |
|
| 49 |
+
| **TaDiCodec-TTS-MGM** | MGM | - | [](https://huggingface.co/amphion/TaDiCodec-TTS-MGM) | ✅ |
|
| 50 |
|
| 51 |
## Quick Model Usage
|
| 52 |
|
|
|
|
| 60 |
tokenizer = TaDiCodecPipline.from_pretrained("amphion/TaDiCodec")
|
| 61 |
|
| 62 |
# Load AR TTS model, it will automatically download the model from Hugging Face for the first time
|
| 63 |
+
tts_model = TTSInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-AR-Qwen2.5-3B")
|
| 64 |
|
| 65 |
# Load MGM TTS model, it will automatically download the model from Hugging Face for the first time
|
| 66 |
tts_model = MGMInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-MGM")
|
|
|
|
| 120 |
import torch
|
| 121 |
import soundfile as sf
|
| 122 |
from models.tts.llm_tts.inference_llm_tts import TTSInferencePipeline
|
| 123 |
+
# from models.tts.llm_tts.inference_mgm_tts import MGMInferencePipeline
|
| 124 |
|
| 125 |
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 126 |
|
| 127 |
# Create AR TTS pipeline
|
| 128 |
pipeline = TTSInferencePipeline.from_pretrained(
|
| 129 |
tadicodec_path="./ckpt/TaDiCodec",
|
| 130 |
+
llm_path="./ckpt/TaDiCodec-TTS-AR-Qwen2.5-3B",
|
| 131 |
device=device,
|
| 132 |
)
|
| 133 |
|
|
|
|
| 179 |
|
| 180 |
# Acknowledgments
|
| 181 |
|
|
|
|
|
|
|
| 182 |
- **MGM-based TTS** is built upon [MaskGCT](https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct).
|
| 183 |
|
| 184 |
- **Vocos vocoder** is built upon [Vocos](https://github.com/gemelo-ai/vocos).
|
|
|
|
| 187 |
|
| 188 |
- **(Binary Spherical Quantization) BSQ** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
|
| 189 |
|
| 190 |
+
- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
|
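
The local-path usage in this change passes `tadicodec_path="./ckpt/TaDiCodec"` and `llm_path="./ckpt/TaDiCodec-TTS-AR-Qwen2.5-3B"`, but the diff does not show how those directories get populated. A minimal sketch of one way to do it follows. It assumes that each `./ckpt/<name>` directory is simply a snapshot of the Hugging Face repo `amphion/<name>` (an assumption, not something the README states) and that `huggingface_hub` is installed. The helper names `local_dir_for` and `fetch_checkpoints` are hypothetical and not part of the TaDiCodec codebase.

```python
from pathlib import Path

# Repo IDs taken from the README's model tables.
REPO_IDS = [
    "amphion/TaDiCodec",
    "amphion/TaDiCodec-TTS-AR-Qwen2.5-3B",
]


def local_dir_for(repo_id: str, root: str = "./ckpt") -> Path:
    """Map 'org/name' to '<root>/name' (assumed layout, not confirmed by the README)."""
    return Path(root) / repo_id.split("/")[-1]


def fetch_checkpoints(repo_ids=REPO_IDS, root: str = "./ckpt") -> dict:
    """Download each repo snapshot into its local directory and return the mapping."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = {}
    for repo_id in repo_ids:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        paths[repo_id] = target
    return paths
```

Under these assumptions, calling `fetch_checkpoints()` once would create `./ckpt/TaDiCodec` and `./ckpt/TaDiCodec-TTS-AR-Qwen2.5-3B`, which are the two paths the local `TTSInferencePipeline.from_pretrained(...)` call expects.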