---
license: cc-by-4.0
language:
  - en
  - de
  - es
  - it
  - nl
  - pt
  - pl
  - ro
  - sv
  - da
  - fi
  - hu
  - el
  - fr
  - ru
  - uk
  - tr
  - ar
  - hi
  - ja
  - ko
  - zh
  - vi
  - la
  - ha
  - sw
  - yo
  - wo
thumbnail: https://raw.githubusercontent.com/DanRuta/xVA-Synth/master/assets/x-icon.png
library_name: xvasynth
tags:
  - emotion
  - audio
  - text-to-speech
  - tts
pipeline_tag: text-to-speech
datasets:
  - MikhailT/hifi-tts
base_model: Pendrokar/xvapitch
---

xVASynth xVAPitch (v3) voice models, based on the NVIDIA Hi-Fi TTS (NeMo) datasets.

Models created by Dan Ruta, origin link:

Supposed origin of the dataset:

| Name | Synthesis Sample |
| --- | --- |
| ccby_nvidia_hifi_6671_M | |
| ccby_nvidia_hifi_92_F | |
| ccby_nvidia_hifi_6097_M | |
| ccby_nv_hifi_11614_F | |
| ccby_nvidia_hifi_11697_F | |
| ccby_nvidia_hifi_12787_F | |
| ccby_nvidia_hifi_6670_M | |
| ccby_nvidia_hifi_8051_F | |
| ccby_nvidia_hifi_9017_M | |
| ccby_nvidia_hifi_9136_F | |

(These audio samples were created with the xVASynth Editor using the SR option (44 kHz), not with xVATrainer, whose automatically generated samples often sound different.)

Legal note: although these datasets are licensed CC BY 4.0, the base v3 model from which these models are fine-tuned was pre-trained on non-permissive data.

v3 base model: https://huggingface.co/Pendrokar/xvapitch