|
|
--- |
|
|
license: apache-2.0 |
|
|
--- |
|
|
# MMEdit
|
|
|
|
|
[Paper](https://arxiv.org/abs/25xx.xxxxx)
|
|
[🤗 Model](https://huggingface.co/CocoBro/MMEdit)
|
|
[License](./LICENSE)
|
|
|
|
|
|
|
|
## Introduction |
|
|
📣 **MMEdit** is a state-of-the-art audio generation model built upon the powerful [Qwen2-Audio 7B](https://huggingface.co/Qwen/Qwen2-Audio-7B). It leverages the robust audio understanding and instruction-following capabilities of the large language model to achieve precise and high-fidelity audio editing.
|
|
|
|
|
--- |
|
|
## Model Download |
|
|
| Models | 🤗 Hugging Face |
|
|
|-------|-------| |
|
|
| MMEdit| [MMEdit](https://huggingface.co/CocoBro/MMEdit) | |
|
|
|
|
|
Download the pretrained MMEdit checkpoint into `./ckpt/mmedit/` (the `huggingface-cli` command is included in the installation block below).
|
|
|
|
|
--- |
|
|
|
|
|
## Model Usage |
|
|
### 🔧 Dependencies and Installation
|
|
- Python >= 3.10 |
|
|
- [PyTorch >= 2.5.0](https://pytorch.org/) |
|
|
- [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) |
|
|
- Dependent models: |
|
|
  - [Qwen/Qwen2-Audio-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct), download into `./ckpt/qwen2-audio-7B-instruct/`
|
|
|
|
|
```bash |
|
|
# 1. Clone the repository |
|
|
git clone https://github.com/xycs6k8r2Anonymous/MMEdit.git |
|
|
cd MMEdit
|
|
|
|
|
# 2. Create environment |
|
|
conda create -n mmedit python=3.10 -y |
|
|
conda activate mmedit |
|
|
|
|
|
# 3. Install PyTorch and dependencies |
|
|
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121 |
|
|
pip install -r requirements.txt |
|
|
|
|
|
# 4. Download Qwen2-Audio-7B-Instruct
|
|
huggingface-cli download Qwen/Qwen2-Audio-7B-Instruct --local-dir ./ckpt/qwen2-audio-7B-instruct |
|
|
|
|
|
# 5. Download MMEdit (our model)
|
|
huggingface-cli download CocoBro/MMEdit --local-dir ./ckpt/mmedit |
|
|
``` |
|
|
|
|
|
## 📊 Data Preparation
|
|
|
|
|
For detailed instructions on the data pipeline and the dataset structure used for training, please refer to our separate documentation:
|
|
|
|
|
👉 **[Data Pipeline & Preparation Guide](./datapipeline/datapipeline.md)**
|
|
|
|
|
|
|
|
## ⚡ Quick Start
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### 1. Inference |
|
|
You can quickly generate example audio with the following command:
|
|
|
|
|
```bash
|
|
bash bash_scripts/infer_single.sh |
|
|
``` |
|
|
|
|
|
The output will be saved at `inference/example/`.
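
To sanity-check the result, you can list the generated files and inspect one with `ffprobe`. This assumes `ffmpeg` is installed; the exact filenames depend on the inference script:

```bash
# List the generated audio files
ls -lh inference/example/

# Inspect sample rate, channels, and duration of a generated file
# (replace edited.wav with an actual filename produced by the script)
ffprobe -hide_banner inference/example/edited.wav
```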
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 Usage
|
|
|
|
|
### 1. Configuration |
|
|
Before running inference or training, please check `configs/config.yaml`. The project uses [Hydra](https://hydra.cc) for configuration management, which allows easy overrides from the command line.
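
As a rough illustration of the Hydra override pattern, the command below shows the general form. The entry point and key names are placeholders (they depend on how `configs/config.yaml` and the bash scripts are actually organized), so check the config file for the real keys:

```bash
# Hypothetical Hydra overrides; the key names below are illustrative only.
# Consult configs/config.yaml for the actual structure.
python src/inference.py \
    ckpt_path=./ckpt/mmedit \
    output_dir=./inference/example \
    batch_size=8
```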
|
|
|
|
|
### 2. Inference |
|
|
To run batch inference using the provided scripts: |
|
|
|
|
|
```bash |
|
|
cd src |
|
|
bash bash_scripts/inference.sh |
|
|
``` |
|
|
|
|
|
### 3. Training |
|
|
Ensure you have downloaded the **Qwen2-Audio-7B-Instruct** checkpoint to `./ckpt/qwen2-audio-7B-instruct` and prepared your data according to the [Data Pipeline & Preparation Guide](./datapipeline/datapipeline.md).
|
|
|
|
|
```bash |
|
|
cd src |
|
|
# Launch distributed training |
|
|
bash bash_scripts/train_dist.sh |
|
|
``` |
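
If the launch needs to be adapted to your hardware or data paths, the usual Hydra pattern is to pass overrides through the launch script. The snippet below is only a sketch: it assumes `train_dist.sh` forwards extra command-line arguments to the training entry point, and the key names are placeholders, so verify them against `configs/config.yaml` and the script itself before use.

```bash
# Sketch only: assumes bash_scripts/train_dist.sh forwards extra arguments
# to the Hydra-based trainer; the key names below are placeholders.
bash bash_scripts/train_dist.sh \
    data.train_manifest=./data/train.jsonl \
    train.num_epochs=10
```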
|
|
|
|
|
--- |
|
|
|
|
|
## 📝 Todo
|
|
- [ ] Release inference code and checkpoints. |
|
|
- [ ] Release training scripts. |
|
|
- [ ] Add HuggingFace Gradio Demo. |
|
|
- [ ] Release evaluation metrics and post-processing tools. |
|
|
|
|
|
## 🤝 Acknowledgement
|
|
We thank the following open-source projects for their inspiration and code: |
|
|
* [Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio) |
|
|
* [UniFlow-Audio](https://github.com/wsntxxn/UniFlow-Audio)
|
|
* [AudioTime](https://github.com/wsntxxn/UniFlow-Audio) |
|
|
|
|
|
|
|
|
## ✍️ Citation
|
|
If you find this project useful, please cite our paper: |
|
|
|
|
|
```bibtex |
|
|
@article{mmedit2024, |
|
|
title={MMEDIT: Audio Generation based on Qwen2-Audio 7B}, |
|
|
author={Your Name and Collaborators}, |
|
|
journal={arXiv preprint arXiv:25xx.xxxxx}, |
|
|
year={2024} |
|
|
} |
|
|
``` |
|
|
|