|
|
--- |
|
|
license: apache-2.0 |
|
|
--- |
|
|
|
|
|
This is a fine-tune of the 7B VibeVoice model. Inference requires roughly 21.9GB of VRAM (through OpenVoiceLab). |
|
|
|
|
|
<br> |
|
|
Fine-tuning was done using the code available here: <br> |
|
|
https://github.com/voicepowered-ai/VibeVoice-finetuning |
|
|
|
|
|
|
|
|
<br><br> |
|
|
The dataset used for fine-tuning was 764 audio files from Halo 1, Halo 2, and Halo 3, available via the links below: <br> |
|
|
Halo 1: https://sounds.spriters-resource.com/xbox/halocombatevolved/asset/413569/ <br> |
|
|
Halo 2: https://sounds.spriters-resource.com/xbox/halo2/asset/436393/?source=genre <br> |
|
|
Halo 3: https://sounds.spriters-resource.com/xbox_360/halo3/asset/405404/ <br> |
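
Gathering the downloaded clips into a single file list is a minimal sketch of the prep step; the directory names below are placeholders for wherever the three sound packs were extracted, and the actual fine-tuning pipeline may expect a different layout.

```python
# Hedged sketch: collect the extracted Halo audio clips into one sorted list.
# Directory names are hypothetical placeholders, not from the fine-tuning repo.
from pathlib import Path

def collect_clips(roots, exts=(".wav", ".mp3", ".ogg")):
    """Return a sorted list of audio file paths under the given directories."""
    clips = []
    for root in roots:
        root = Path(root)
        if root.is_dir():  # skip packs that haven't been extracted yet
            clips.extend(p for p in root.rglob("*") if p.suffix.lower() in exts)
    return sorted(clips)

# e.g. clips = collect_clips(["halo1_sounds", "halo2_sounds", "halo3_sounds"])
# With all three packs extracted, len(clips) should come to 764.
```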
|
|
|
|
|
|
|
|
|
|
|
<br><br> |
|
|
The fine-tuning parameters used were: |
|
|
|
|
|
batch_size = 1 <br> |
|
|
drop_rate = 0.2 <br> |
|
|
grad_accum = 1 <br> |
|
|
lr = 2.5e-5 <br> |
|
|
lora_r = 128 <br> |
|
|
lora_alpha = 512 <br> |
|
|
epochs = 20 <br> |
|
|
train_diff = True <br> |
|
|
bf16 = True <br> |
|
|
grad_clip = True <br> |
|
|
max_grad = 0.8 <br> |
|
|
grad_checkpoint = False <br> |
|
|
diff_weight = 1.4 <br> |
|
|
ce_weight = 0.04 <br> |
|
|
warmup = 0.03 <br> |
|
|
scheduler = "cosine" <br> |
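
A couple of quantities implied by these settings can be sanity-checked in a few lines of Python; note the alpha/r scaling rule is the standard LoRA convention, and whether the fine-tuning code applies it exactly this way is an assumption.

```python
# Hedged sketch: derived quantities implied by the hyperparameters above.
# alpha / r is the conventional LoRA scaling factor (an assumption about
# how the fine-tuning repo applies it, not confirmed from its code).
lora_r = 128
lora_alpha = 512
batch_size = 1
grad_accum = 1

lora_scaling = lora_alpha / lora_r          # 512 / 128 = 4.0
effective_batch = batch_size * grad_accum   # samples per optimizer step = 1

print(f"LoRA scaling: {lora_scaling}, effective batch: {effective_batch}")
```

With alpha set to 4x the rank, LoRA updates are amplified fairly aggressively, which pairs with the relatively low learning rate of 2.5e-5 above.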
|
|
|
|
|
<br> |
|
|
Special thanks to mrfakename for creating OpenVoiceLab, a fantastic resource for both inference and fine-tuning, with quite a nice GUI. |
|
|
|