metadata

language:
  - en
  - zh
library_name: huggingface_hub
license: apache-2.0
pipeline_tag: text-to-speech
tags:
  - text-to-audio
  - music
  - singing-voice-synthesis
  - svs
  - zero-shot

ComfyUI Custom Node

This repository includes a custom node for ComfyUI integration:

🔗 ComfyUI-SoulX-Singer

Use this custom node to integrate SoulX-Singer into your ComfyUI workflows for seamless singing voice synthesis.

SoulX-Singer: Converted .pt model to .safetensors

bf16 + fp32
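To check which precision a downloaded weight file actually holds, the header of a `.safetensors` file can be read without loading any tensors. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header) and uses the Python standard library. The function names are illustrative and not part of this repository.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def dtype_counts(path):
    """Count tensors per dtype string as stored in the header (e.g. "BF16", "F32")."""
    header = read_safetensors_header(path)
    return Counter(entry["dtype"] for entry in header.values())
```

Calling `dtype_counts("model.safetensors")` on a local file (the file name here is a placeholder) returns a mapping from dtype string to tensor count, which makes it easy to tell a bf16 file from an fp32 one.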

Audio Samples

Original Audio

SpongeBob Voice

Male Voice

Towards High-Quality Zero-Shot Singing Voice Synthesis

Overview

SoulX-Singer is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.

For more details, please refer to the paper: SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis.

Features

Zero-shot synthesis: Generate singing voices for unseen singers without fine-tuning
Melody-conditioned control: Use F0 contour for pitch guidance
Score-conditioned control: Use MIDI notes for precise musical notation
High-fidelity output: Realistic vocal synthesis with natural expression
Safetensors format: Model weights converted from the original .pt checkpoint, provided in bf16 and fp32 precision
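The two control modes above describe the same musical information in different units: an F0 contour is a frequency in Hz per frame, while a score gives discrete MIDI note numbers with durations. The following is a minimal sketch of the standard equal-temperament relationship between them, assuming A4 = 440 Hz. The 10 ms frame step and the use of 0.0 for rests are assumptions made for illustration and do not describe SoulX-Singer's actual preprocessing.

```python
import math

A4_MIDI = 69    # MIDI note number of A4
A4_HZ = 440.0   # assumed concert-pitch reference


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (12-tone equal temperament)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a (possibly fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def notes_to_f0(notes, frame_s=0.01):
    """Expand (midi_note, duration_s) pairs into a flat per-frame F0 contour.

    A note of None is treated as a rest and written as 0.0 Hz
    (an illustrative convention, not the model's).
    """
    contour = []
    for note, duration_s in notes:
        n_frames = round(duration_s / frame_s)
        value = 0.0 if note is None else midi_to_hz(note)
        contour.extend([value] * n_frames)
    return contour
```

For example, `notes_to_f0([(69, 0.05), (None, 0.02)])` yields five frames at 440.0 Hz followed by two rest frames. A real F0 contour extracted from audio would also carry vibrato and pitch glides that a flat per-note contour like this one does not capture.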

Citation

If you use SoulX-Singer in your research, please cite:

@article{soulxsinger2025,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Soul-AILab},
  journal={arXiv preprint arXiv:2602.07803},
  year={2026}
}

License

This project is licensed under the Apache License 2.0.