bezzam (HF Staff) committed 06250ed (verified) · 1 parent: f282e9a

Update README.md

Files changed (1): README.md (+8 −1)
README.md CHANGED
@@ -20,7 +20,14 @@ Its architecture is based on X-Codec with several major differences:
 - **Semantic Supervision During Training**: It adds a semantic reconstruction loss, ensuring that the discrete tokens preserve meaningful linguistic and emotional information — crucial for TTS tasks.
 - **Transformer-Friendly Design**: The 1D token structure of X-Codec2 naturally aligns with the autoregressive modeling in LLMs like LLaMA, improving training efficiency and downstream compatibility.
 
-## Usage example
+## Usage example
+
+Since Xcodec2 isn't yet merged into Transformers, you can install it from source via the [corresponding fork](https://github.com/Deep-unlearning/transformers/tree/add-xcodec2).
+
+Setup:
+```bash
+pip install git+https://github.com/Deep-unlearning/transformers.git@add-xcodec2
+```
 
 Here is a quick example of how to encode and decode an audio using this model:
 
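The hunk ends before the quick example itself, so here is a hedged sketch of the input side only. It prepares a 16 kHz mono waveform in the `(batch, samples)` shape codec models in Transformers typically expect; the ~50 tokens-per-second rate and the `Xcodec2Model` class and `encode`/`decode` calls named in the comments are assumptions based on other Transformers codecs (Encodec, DAC), not taken from this commit:

```python
import torch

# One second of dummy 16 kHz mono audio (a 440 Hz sine), shaped
# (batch, samples) as codec models in Transformers typically expect.
sample_rate = 16_000
t = torch.arange(sample_rate, dtype=torch.float32) / sample_rate
waveform = torch.sin(2 * torch.pi * 440.0 * t).unsqueeze(0)  # (1, 16000)

# X-Codec2 flattens audio into a single 1D token stream (one codebook).
# Assuming a rate of ~50 tokens per second (an assumption, not stated in
# this commit), one second of audio maps to ~50 discrete tokens.
tokens_per_second = 50
expected_tokens = waveform.shape[-1] // sample_rate * tokens_per_second

# Encoding/decoding itself would presumably follow the usual Transformers
# codec pattern (hypothetical names — check the fork for the real API):
#   model = Xcodec2Model.from_pretrained("<model-id>")
#   codes = model.encode(waveform)   # 1D token ids
#   audio = model.decode(codes)      # reconstructed waveform
print(waveform.shape, expected_tokens)
```

The single-codebook 1D stream is what makes the "Transformer-Friendly Design" point above concrete: an autoregressive LLM can consume these tokens exactly like text tokens, with no codebook interleaving.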