# DualCodec
## Installation
```bash
pip install dualcodec
```
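
A quick way to confirm the install is that the package imports cleanly (a trivial check, nothing model-specific):
```python
# If the install succeeded, this import should run without errors
import dualcodec
print(dualcodec.__name__)
```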
## Available models
<!-- - 12hz_v1: DualCodec model trained at a 12.5Hz frame rate.
- 25hz_v1: DualCodec model trained at a 25Hz frame rate. -->

| Model ID | Frame Rate | RVQ Quantizers | Semantic Codebook Size (RVQ-1) | Acoustic Codebook Size (RVQ-rest) | Training Data |
|----------|------------|----------------|--------------------------------|-----------------------------------|---------------|
| 12hz_v1 | 12.5Hz | 1-8 (selectable at inference) | 16384 | 4096 | 100K hours of Emilia |
| 25hz_v1 | 25Hz | 1-12 (selectable at inference) | 16384 | 1024 | 100K hours of Emilia |
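
These numbers pin down the bitrate: each RVQ-1 (semantic) token carries log2(16384) = 14 bits, each remaining acoustic token of 12hz_v1 carries log2(4096) = 12 bits, and tokens arrive at the frame rate. A back-of-the-envelope calculation (plain arithmetic, not part of the dualcodec API):

```python
import math

# Bitrate arithmetic for 12hz_v1, using the numbers from the table above
frame_rate = 12.5                 # frames (tokens per quantizer) per second
semantic_bits = math.log2(16384)  # 14 bits per RVQ-1 (semantic) token
acoustic_bits = math.log2(4096)   # 12 bits per token of each remaining quantizer

for n_quantizers in (1, 8):
    bps = frame_rate * (semantic_bits + (n_quantizers - 1) * acoustic_bits)
    print(f"{n_quantizers} quantizer(s): {bps:.0f} bps")
# 1 quantizer(s): 175 bps   (semantic tokens only)
# 8 quantizer(s): 1225 bps  (~1.2 kbps)
```
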
## Inference

Download the checkpoints to a local directory:
```bash
# export HF_ENDPOINT=https://hf-mirror.com  # uncomment to use the Hugging Face mirror if you are in China
huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts
```

To run inference on an audio file in a Python script:
```python
import dualcodec
import torchaudio

w2v_path = "./w2v-bert-2.0"  # your downloaded w2v-bert-2.0 path
dualcodec_model_path = "./dualcodec_ckpts"  # your downloaded DualCodec checkpoint path
model_id = "12hz_v1"  # or "25hz_v1"

dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
inference = dualcodec.Inference(
    dualcodec_model=dualcodec_model,
    dualcodec_path=dualcodec_model_path,
    w2v_path=w2v_path,
    device="cuda",
)

# Load your wav and resample it to 24kHz, the sample rate the model expects
audio, sr = torchaudio.load("YOUR_WAV.wav")
audio = torchaudio.functional.resample(audio, sr, 24000)
audio = audio.reshape(1, 1, -1)  # (batch, channels, samples)

# Extract codes; this example uses 8 quantizers:
semantic_codes, acoustic_codes = inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([1, 1, T])
# acoustic_codes shape: torch.Size([1, n_quantizers - 1, T])

# Decode the codes back to a waveform
out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)

# Save the output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)
```
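
As a quick sanity check on the returned shapes (continuing the script above): with `12hz_v1`, the number of frames `T` should track the clip duration times the 12.5Hz frame rate.

```python
# Rough check, assuming the 12hz_v1 model (12.5 frames per second);
# expect these two numbers to match up to a few frames of padding.
duration_seconds = audio.shape[-1] / 24000
print(semantic_codes.shape[-1], duration_seconds * 12.5)
```
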
See `example.ipynb` for a complete inference example.

## Training DualCodec
Stay tuned for the training code release!