Music Source Separation

These are the Demucs v4 models from Facebook Research.


What is HTDemucs?

HTDemucs (Hybrid Transformer Demucs) is Meta AI's fourth-generation music source separation model, introduced in Hybrid Transformers for Music Source Separation (Rouard et al., ICASSP 2023).

Where earlier Demucs generations processed audio purely in the time domain, HTDemucs runs two parallel encoders simultaneously: one operating on the raw waveform, the other on the STFT spectrogram, joined at the bottleneck by a Transformer encoder with cross-attention. This lets the model correlate time-domain and frequency-domain features before decoding, yielding measurably better separation quality, especially on spectrally complex, temporally sparse instruments like piano and guitar.
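The dual-branch idea can be illustrated with a toy sketch, using only the standard library. This is not the actual model: the "time branch" here is just frame energies, the "spectral branch" a naive DFT, and the cross-attention bottleneck is stood in for by simple per-frame concatenation.

```python
import cmath

def dft_frame(frame):
    # Naive DFT of one frame: the "spectral branch" view of the audio.
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def hybrid_features(wave, frame_len=4):
    # Time branch: per-frame energies computed from the raw waveform.
    frames = [wave[i:i + frame_len] for i in range(0, len(wave), frame_len)]
    time_feats = [sum(x * x for x in f) for f in frames]
    # Spectral branch: magnitude spectrum of each frame.
    spec_feats = [[abs(c) for c in dft_frame(f)] for f in frames]
    # "Bottleneck": fuse both views per frame (a stand-in for the
    # Transformer cross-attention used by the real HTDemucs).
    return [[t] + s for t, s in zip(time_feats, spec_feats)]
```

In the real model each branch is a deep convolutional encoder, and the fused representation is decoded back to one waveform per stem.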

The htdemucs_6s variant adds dedicated guitar and piano stems on top of the standard drums/bass/other/vocals quartet, making it one of the most capable publicly available separation models for music production.
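A minimal sketch of running that six-stem separation through the demucs Python package. The package name, the file "song.mp3", and the exact call pattern are assumptions based on the upstream demucs project, not part of this card; the stem lists mirror the text above.

```python
# Stems per variant, as described in this card.
STEMS_4 = ["drums", "bass", "other", "vocals"]   # htdemucs
STEMS_6 = STEMS_4 + ["guitar", "piano"]          # htdemucs_6s

def separate(path="song.mp3", model_name="htdemucs_6s"):
    # Assumes `pip install demucs` (which pulls in torch/torchaudio).
    import torchaudio
    from demucs.pretrained import get_model
    from demucs.apply import apply_model

    model = get_model(model_name)      # downloads weights on first use
    wav, sr = torchaudio.load(path)
    wav = torchaudio.functional.resample(wav, sr, model.samplerate)
    # apply_model expects a batch dimension: (batch, channels, time),
    # and returns (batch, stems, channels, time).
    sources = apply_model(model, wav[None])[0]
    return dict(zip(model.sources, sources))
```

For htdemucs_6s, model.sources should line up with STEMS_6; the same separation is also available from the command line as `demucs -n htdemucs_6s song.mp3` (again assuming the demucs package is installed).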


From Facebook Research:

Demucs is based on a U-Net convolutional architecture inspired by Wave-U-Net and SING, with GLUs, a BiLSTM between the encoder and decoder, specific weight initialization, and transposed convolutions in the decoder.

See facebookresearch's repository for more information on Demucs.
