metadata
language:
  - en
  - zh
library_name: huggingface_hub
license: apache-2.0
pipeline_tag: text-to-speech
tags:
  - text-to-audio
  - music
  - singing-voice-synthesis
  - svs
  - zero-shot

ComfyUI Custom Node

This repository includes a custom node for ComfyUI integration:

🔗 ComfyUI-SoulX-Singer

Use this custom node to run SoulX-Singer singing voice synthesis inside your ComfyUI workflows.

SoulX-Singer: Converted .pt model to .safetensors

bf16 + fp32
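
To check which precision a downloaded file actually contains, the header of a .safetensors file can be read without loading any weights. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header) and uses the Python standard library. The function names are illustrative and not part of this repository or of the safetensors package.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON describing every tensor.
        return json.loads(f.read(header_len).decode("utf-8"))


def count_dtypes(header):
    """Count tensors per dtype string, for example "BF16" or "F32"."""
    return Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    )
```

Calling count_dtypes(read_safetensors_header(path)) on a local weight file should report mostly "BF16" entries for a bf16 file and "F32" entries for an fp32 file. The path is whatever location you downloaded the weights to.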

Audio Samples

Original Audio

SpongeBob Voice

Male Voice


Towards High-Quality Zero-Shot Singing Voice Synthesis


Overview

SoulX-Singer is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.
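
The two conditioning signals describe the same pitch information at different levels of detail: a MIDI note is a discrete pitch, while an F0 contour is a per-frame frequency in Hz. The sketch below shows the standard equal-temperament mapping between them (A4 = MIDI 69 = 440 Hz). The frame rate, the (note, duration) tuple layout, and the use of 0.0 Hz for rests are assumptions made for illustration; they are not SoulX-Singer's actual input format.

```python
import math

A4_MIDI = 69    # MIDI note number of A4
A4_HZ = 440.0   # assumed concert-pitch tuning


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (12-tone equal temperament)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a (possibly fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def notes_to_f0(notes, frame_rate=100):
    """Expand (midi_note, duration_seconds) pairs into a flat per-frame F0 list.

    A note of None is treated as a rest and written as 0.0 Hz
    (an assumed convention for unvoiced frames).
    """
    f0 = []
    for note, duration in notes:
        n_frames = round(duration * frame_rate)
        value = 0.0 if note is None else midi_to_hz(note)
        f0.extend([value] * n_frames)
    return f0
```

A score expanded this way is a step function with no vibrato or pitch glides, which is one way to see why a melody-conditioned F0 contour can carry more expressive detail than score-conditioned input.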

For more details, please refer to the paper: SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis.


Features

  • Zero-shot synthesis: Generate singing voices for unseen singers without fine-tuning
  • Melody-conditioned control: Use F0 contour for pitch guidance
  • Score-conditioned control: Use MIDI notes to specify pitch and rhythm from a musical score
  • High-fidelity output: Realistic vocal synthesis with natural expression
  • Safetensors format: Model weights converted from .pt to .safetensors, provided in bf16 and fp32 precision
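
The difference between the bf16 and fp32 weights comes down to mantissa width: bfloat16 keeps the sign and 8 exponent bits of float32 but only 7 mantissa bits, so it is the upper 16 bits of a float32 after rounding. The sketch below illustrates that relationship with the standard library only. It is a generic illustration of the number format, not the procedure used to convert these weights, and it does not special-case NaN.

```python
import struct


def float32_to_bf16_bits(x):
    """Round a Python float to bfloat16 and return its 16-bit pattern."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round to nearest, ties to even, on the 16 low bits that are dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits16):
    """Expand a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


def roundtrip_bf16(x):
    """Return x as it would be stored in bfloat16."""
    return bf16_bits_to_float(float32_to_bf16_bits(x))
```

Values such as 1.0 survive the round trip exactly, while a value like 0.1 comes back with a relative error of roughly 0.1%. The bf16 file is half the size of the fp32 file, at the cost of this reduced precision per weight.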

Citation

If you use SoulX-Singer in your research, please cite:

@article{soulxsinger2025,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Soul-AILab},
  journal={arXiv preprint arXiv:2602.07803},
  year={2026}
}

License

This project is licensed under the Apache License 2.0.