Audio-to-Audio · dualcodec

jiaqili3 committed (verified) · Commit 030cac5 · Parent(s): 32b7765

Update README.md
Files changed (1): README.md (+58 −3)

README.md CHANGED (@@ -1,3 +1,58 @@): the license-only YAML front matter (`---` / `license: mit` / `---`) was replaced with the model card below.
# DualCodec

## Installation
```bash
pip install dualcodec
```

## Available models

| Model ID | Frame Rate | RVQ Quantizers | Semantic Codebook Size (RVQ-1) | Acoustic Codebook Size (RVQ-rest) | Training Data |
|----------|------------|----------------|--------------------------------|-----------------------------------|---------------|
| 12hz_v1  | 12.5 Hz    | 1 to 8 (selectable) | 16384                     | 4096                              | 100K hours of Emilia |
| 25hz_v1  | 25 Hz      | 1 to 12 (selectable) | 16384                    | 1024                              | 100K hours of Emilia |

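As a back-of-the-envelope illustration of what the table implies for bitrate, assuming each codebook index costs log2(codebook size) bits per frame (`dualcodec_bitrate` is a hypothetical helper for this sketch, not part of the package):

```python
import math

def dualcodec_bitrate(frame_rate, n_quantizers, semantic_size, acoustic_size):
    # One semantic codebook (RVQ-1) plus (n_quantizers - 1) acoustic
    # codebooks; each index costs log2(codebook size) bits per frame.
    semantic_bits = math.log2(semantic_size)
    acoustic_bits = (n_quantizers - 1) * math.log2(acoustic_size)
    return frame_rate * (semantic_bits + acoustic_bits)

# 12hz_v1 with all 8 quantizers: 12.5 * (14 + 7 * 12) = 1225.0 bits/s
print(dualcodec_bitrate(12.5, 8, 16384, 4096))
# 25hz_v1 with all 12 quantizers: 25 * (14 + 11 * 10) = 3100.0 bits/s
print(dualcodec_bitrate(25, 12, 16384, 1024))
```

Using fewer quantizers lowers the bitrate proportionally, since only the acoustic codebooks are dropped.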

## How to run inference

Download the checkpoints locally:
```bash
# export HF_ENDPOINT=https://hf-mirror.com  # uncomment to use the Hugging Face mirror if you're in China
huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts
```

To run inference on an audio file in a Python script:
```python
import dualcodec
import torchaudio

w2v_path = "./w2v-bert-2.0"  # path to the downloaded w2v-bert-2.0 checkpoint
dualcodec_model_path = "./dualcodec_ckpts"  # path to the downloaded DualCodec checkpoints
model_id = "12hz_v1"  # or "25hz_v1"

dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
inference = dualcodec.Inference(
    dualcodec_model=dualcodec_model,
    dualcodec_path=dualcodec_model_path,
    w2v_path=w2v_path,
    device="cuda",
)

# Run inference on your wav file
audio, sr = torchaudio.load("YOUR_WAV.wav")
# Resample to 24 kHz, the model's sampling rate
audio = torchaudio.functional.resample(audio, sr, 24000)
audio = audio.reshape(1, 1, -1)
# Extract codes, for example using 8 quantizers here:
semantic_codes, acoustic_codes = inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([1, 1, T])
# acoustic_codes shape: torch.Size([1, n_quantizers - 1, T])

# Reconstruct audio from the codes
out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)

# Save the output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)
```
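For intuition about the `T` dimension in the shape comments above: the number of frames scales with the model's frame rate. A tiny arithmetic sketch (`token_counts` is a hypothetical helper for illustration, not a DualCodec API):

```python
def token_counts(duration_s, frame_rate_hz, n_quantizers):
    # Frames T produced for a clip of the given duration, plus the
    # total number of acoustic tokens across the remaining quantizers.
    frames = int(duration_s * frame_rate_hz)
    return frames, frames * (n_quantizers - 1)

# A 10 s clip with 12hz_v1 (12.5 Hz) and 8 quantizers:
# 125 semantic tokens and 125 * 7 = 875 acoustic tokens.
print(token_counts(10, 12.5, 8))
```

This is one reason to prefer the 12.5 Hz model for downstream language modeling: it halves the sequence length relative to the 25 Hz model.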

See `example.ipynb` for example inference scripts.

## Training DualCodec
Stay tuned for the training code release!