---
license: apache-2.0
---

This is a fine-tune of the VibeVoice 7B model. It requires 21.9 GB of VRAM for inference (through OpenVoiceLab).
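
OpenVoiceLab handles loading the model itself; as a quick sanity check that a GPU meets the ~21.9 GB requirement before launching it, something like the following PyTorch snippet can be used. This is a minimal sketch and is not part of OpenVoiceLab or VibeVoice.

```python
import torch

REQUIRED_VRAM_GB = 21.9  # approximate requirement reported above


def has_enough_vram(device_index: int = 0) -> bool:
    """Return True if the given CUDA device reports enough total memory."""
    if not torch.cuda.is_available():
        return False
    total_bytes = torch.cuda.get_device_properties(device_index).total_memory
    return total_bytes / (1024 ** 3) >= REQUIRED_VRAM_GB


if __name__ == "__main__":
    print("Enough VRAM for inference:", has_enough_vram())
```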


Fine-tuning was done using the code available here:
https://github.com/voicepowered-ai/VibeVoice-finetuning



The dataset used for fine-tuning was 764 audio files from Halo 1, Halo 2, and Halo 3, available via the links below:

- Halo 1: https://sounds.spriters-resource.com/xbox/halocombatevolved/asset/413569/
- Halo 2: https://sounds.spriters-resource.com/xbox/halo2/asset/436393/?source=genre
- Halo 3: https://sounds.spriters-resource.com/xbox_360/halo3/asset/405404/



Fine-tuning parameters used were:

- `batch_size = 1`
- `drop_rate = 0.2`
- `grad_accum = 1`
- `lr = 2.5e-5`
- `lora_r = 128`
- `lora_alpha = 512`
- `epochs = 20`
- `train_diff = True`
- `bf16 = True`
- `grad_clip = True`
- `max_grad = 0.8`
- `grad_checkpoint = False`
- `diff_weight = 1.4`
- `ce_weight = 0.04`
- `warmup = 0.03`
- `scheduler = "cosine"`
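
For anyone reproducing a similar run outside the linked fine-tuning repo, the LoRA and schedule parameters above correspond roughly to the following Hugging Face `peft`/`transformers` setup. This is a hedged sketch: the actual repo may use different config keys, target modules, and training loop, and `model` and `total_steps` here are placeholders.

```python
# Rough, unofficial mapping of the parameters above onto common Hugging Face
# tooling; the voicepowered-ai/VibeVoice-finetuning repo may differ in naming,
# target modules, and in how the diff_weight / ce_weight loss terms are applied.
import torch
from peft import LoraConfig
from transformers import get_cosine_schedule_with_warmup

lora_config = LoraConfig(
    r=128,             # lora_r
    lora_alpha=512,    # lora_alpha
    lora_dropout=0.2,  # drop_rate
)


def build_optimizer_and_scheduler(model, total_steps: int):
    """Optimizer/scheduler sketch: lr = 2.5e-5, cosine schedule, 3% warmup."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=2.5e-5)
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.03 * total_steps),  # warmup = 0.03
        num_training_steps=total_steps,            # scheduler = "cosine"
    )
    return optimizer, scheduler


# During training, gradients would be clipped to max_grad = 0.8, e.g.:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), 0.8)
# with bf16 enabled and gradient checkpointing disabled, per the list above.
```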


Special thanks to mrfakename for creating OpenVoiceLab, a fantastic resource for both inference and fine-tuning, with quite a nice GUI.