---
language:
- en
- zh
library_name: huggingface_hub
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- text-to-audio
- music
- singing-voice-synthesis
- svs
- zero-shot
---

## ComfyUI Custom Node

This repository includes a custom node for ComfyUI integration:

🔗 **[ComfyUI-SoulX-Singer](https://github.com/Saganaki22/ComfyUI-SoulX-Singer)**

![Screenshot 2026-02-11 160905](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/FqxVnkFDrVt287ppwQj90.png)

Use this custom node to integrate SoulX-Singer into your ComfyUI workflows for seamless singing voice synthesis.

# SoulX-Singer: Converted .pt model to .safetensors **bf16 + fp32**

## Audio Samples

### Original Audio

### SpongeBob Voice

### Male Voice

---

Towards High-Quality Zero-Shot Singing Voice Synthesis

---

## Overview

**SoulX-Singer** is a high-fidelity, zero-shot singing voice synthesis model that generates realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.

For more details, please refer to the paper: [SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis](https://arxiv.org/abs/2602.07803).

---

## Features

- **Zero-shot synthesis**: Generate singing voices for unseen singers without fine-tuning
- **Melody-conditioned control**: Use F0 contour for pitch guidance
- **Score-conditioned control**: Use MIDI notes for precise musical notation
- **High-fidelity output**: Realistic vocal synthesis with natural expression
- **Safetensors format**: Model weights provided in bf16 and fp32 precision

---

## Citation

If you use SoulX-Singer in your research, please cite:

```bibtex
@article{soulxsinger2025,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Soul-AILab},
  journal={arXiv preprint arXiv:2602.07803},
  year={2026}
}
```

---

## License

This project is licensed under the Apache License 2.0.
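
---

## Inspecting the Weights

Because the weights are distributed as `.safetensors` in bf16 and fp32, it can be handy to check which precision a downloaded file actually contains before loading it. The sketch below is not part of this repository: it relies only on the public safetensors file layout (an 8-byte little-endian header length followed by a JSON header) and uses the Python standard library. The tensor names and the `demo.safetensors` file are hypothetical stand-ins; point `read_safetensors_header` at a real checkpoint path instead.

```python
import json
import os
import struct
import tempfile
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON mapping tensor names to metadata.
        return json.loads(f.read(header_len).decode("utf-8"))


def count_dtypes(header):
    """Count tensors per dtype string, skipping the optional __metadata__ entry."""
    return Counter(
        info["dtype"] for name, info in header.items() if name != "__metadata__"
    )


def write_demo_file(path):
    """Write a tiny two-tensor file (hypothetical names) to exercise the reader."""
    header = {
        "demo.weight": {"dtype": "BF16", "shape": [2], "data_offsets": [0, 4]},
        "demo.bias": {"dtype": "F32", "shape": [1], "data_offsets": [4, 8]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(bytes(8))  # 4 bytes of BF16 data + 4 bytes of F32 data, all zeros


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
    write_demo_file(demo_path)
    print(count_dtypes(read_safetensors_header(demo_path)))
```

Reading only the header keeps the check cheap even for large checkpoints, since no tensor data is loaded into memory. A file whose tensors all report `BF16` is the half-precision variant, while `F32` indicates the full-precision one.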