---
language:
- en
- zh
library_name: huggingface_hub
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- text-to-audio
- music
- singing-voice-synthesis
- svs
- zero-shot
---
## ComfyUI Custom Node

A custom node is available for ComfyUI integration:

🔗 **[ComfyUI-SoulX-Singer](https://github.com/Saganaki22/ComfyUI-SoulX-Singer)**

Use it to run SoulX-Singer singing voice synthesis inside your ComfyUI workflows.
# SoulX-Singer: Converted .pt model to .safetensors

**bf16 + fp32**
## Audio Samples

### Original Audio

<audio controls>
<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/song.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

### SpongeBob Voice

<audio controls>
<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/generated/sample-1.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

### Male Voice

<audio controls>
<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/generated/sample-2.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

---
<div align="center">
<b><em>Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
<p>
<img src="assets/soulx-logo.png" alt="SoulX-Singer_Logo" style="height: 80px;">
</p>
<p>
<a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="version"></a>
<a href="https://github.com/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
<a href="https://arxiv.org/abs/2602.07803"><img src="https://img.shields.io/badge/arXiv-2602.07803-b31b1b" alt="arXiv"></a>
<a href="https://github.com/Soul-AILab/SoulX-Singer/blob/main/assets/technical-report.pdf"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
<a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
</p>
</div>

---
## Overview

**SoulX-Singer** is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.
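
To make the difference between the two conditioning signals concrete, here is a minimal sketch of how a note-level score relates to a frame-level F0 contour. It uses only the standard equal-temperament mapping (MIDI note 69 = A4 = 440 Hz). The function names, the `(midi_note, duration_seconds)` score layout, and the 100 frames-per-second rate are assumptions chosen for illustration; they are not SoulX-Singer's actual input format.

```python
def midi_to_hz(note: float) -> float:
    """Equal-temperament pitch, with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def score_to_f0(notes, frame_rate=100):
    """Expand (midi_note, duration_seconds) pairs into a frame-level F0 contour.

    A note value of None marks a rest and is rendered as 0.0 Hz (unvoiced).
    Both the score layout and the frame rate are illustrative assumptions.
    """
    contour = []
    for note, duration in notes:
        n_frames = round(duration * frame_rate)
        hz = 0.0 if note is None else midi_to_hz(note)
        contour.extend([hz] * n_frames)
    return contour


# Hypothetical score: A4 for 0.5 s, a 0.25 s rest, then C5 for 0.5 s.
score = [(69, 0.5), (None, 0.25), (72, 0.5)]
f0 = score_to_f0(score, frame_rate=100)

# 125 frames in total; the first frame is A4, the last frame is C5.
print(len(f0), f0[0], round(f0[-1], 2))  # prints: 125 440.0 523.25
```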
For more details, please refer to the paper: [SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis](https://arxiv.org/abs/2602.07803).

---
## Features

- **Zero-shot synthesis**: Generate singing voices for unseen singers without fine-tuning
- **Melody-conditioned control**: Use F0 contour for pitch guidance
- **Score-conditioned control**: Use MIDI notes for precise musical notation
- **High-fidelity output**: Realistic vocal synthesis with natural expression
- **Safetensors format**: Model weights converted from `.pt` to `.safetensors`, in bf16 + fp32 precision
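
To check which precision a given weight file uses without loading any tensors, you can read the `.safetensors` header directly. The sketch below relies on the published file layout (an 8-byte little-endian header length followed by a JSON header that maps tensor names to `dtype`, `shape`, and `data_offsets`) and needs only the standard library. The weight file names in this repository are not listed here, so the demo writes a tiny stand-in file; `write_demo_file`, the tensor names, and `demo.safetensors` are hypothetical.

```python
import json
import os
import struct
import tempfile
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional metadata entry, not a tensor
    return header


def dtype_summary(path):
    """Count tensors per dtype tag, e.g. 'BF16' or 'F32'."""
    header = read_safetensors_header(path)
    return Counter(entry["dtype"] for entry in header.values())


def write_demo_file(path):
    """Write a tiny stand-in file (hypothetical tensor names, zero-filled data)."""
    header = {
        "a.weight": {"dtype": "BF16", "shape": [2], "data_offsets": [0, 4]},
        "b.weight": {"dtype": "F32", "shape": [1], "data_offsets": [4, 8]},
    }
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)) + blob + bytes(8))


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.safetensors")
    write_demo_file(demo_path)
    summary = dtype_summary(demo_path)

print(dict(summary))  # prints: {'BF16': 1, 'F32': 1}
```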
---

## Citation

If you use SoulX-Singer in your research, please cite:

```bibtex
@article{soulxsinger2025,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Soul-AILab},
  journal={arXiv preprint arXiv:2602.07803},
  year={2026}
}
```

---
## License

This project is licensed under the Apache License 2.0.