---
pipeline_tag: any-to-any
library_name: transformers
tags:
- text-to-image
- image-editing
- image-understanding
- vision-language
- multimodal
- unified-model
- teacher-model
- diffusion
license: mit
---

## 🌌 UniPic3-Teacher-Model


## 📖 Introduction
**UniPic3-Teacher-Model** is the **high-quality teacher diffusion model** used in the UniPic 3.0 framework. It is trained with **full multi-step diffusion sampling** and optimized for **maximum perceptual quality, semantic consistency, and realism**.

This model serves as the **teacher backbone** for:

- **Distribution Matching Distillation (DMD)**
- **Consistency / trajectory distillation**
- **Few-step student model training**

Rather than being optimized for fast inference, the teacher model prioritizes **generation fidelity and stability**, providing a strong and reliable supervision signal for downstream distilled models.

---

## 🧠 Model Characteristics

- **Role**: Teacher model (not a distilled student)
- **Sampling**: Multi-step diffusion (high-fidelity)
- **Architecture**: Unified UniPic3 Transformer
- **Tasks Supported**:
  - Single-image editing
  - Multi-image composition (2–6 images)
  - Human–Object Interaction (HOI)
- **Resolution**: Flexible, within pixel budget constraints
- **Training Objective**:
  - Flow Matching / Diffusion loss
  - Used as teacher for DMD & consistency training

---

## 📊 Benchmarks
This teacher model achieves **state-of-the-art performance** on:

- Image editing benchmarks
- Multi-image composition benchmarks

It provides **high-quality supervision targets** for distilled UniPic3 student models.

---

## ⚠️ Important Note

> **This repository hosts the teacher model.**
> It is **not optimized for few-step inference.**

If you are looking for:

- ⚡ **4–8 step fast inference**
- 🚀 **Deployment-friendly distilled models**

please refer to the **UniPic3-DMD / distilled checkpoints** instead.

---

## 🧠 Usage (Teacher Model)

### 1. Clone the Repository

```bash
git clone https://github.com/SkyworkAI/UniPic
cd UniPic-3
```

### 2. Set Up the Environment

```bash
conda create -n unipic3 python=3.10
conda activate unipic3
pip install -r requirements.txt
```

### 3. Batch Inference

```bash
transformer_path="Skywork/Unipic3"
python -m torch.distributed.launch --nproc_per_node=1 --master_port 29501 --use_env \
    qwen_image_edit_fast/batch_inference.py \
    --jsonl_path data/val.jsonl \
    --output_dir work_dirs/output \
    --distributed \
    --num_inference_steps 50 \
    --true_cfg_scale 4.0 \
    --transformer "$transformer_path" \
    --skip_existing
```

## 📄 License

This model is released under the MIT License.
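The batch-inference command above reads its inputs from a JSONL file (`--jsonl_path data/val.jsonl`), i.e. one JSON object per line. As a minimal sketch of how such a file can be produced, the snippet below writes two entries; the field names (`image`, `prompt`) are assumptions for illustration only — check `qwen_image_edit_fast/batch_inference.py` for the keys the script actually expects.

```python
import json

# Hypothetical schema for data/val.jsonl: the keys "image" and "prompt"
# are illustrative assumptions, not documented by the repository.
samples = [
    {"image": "inputs/cat.png", "prompt": "Make the cat wear a red scarf."},
    {"image": "inputs/room.png", "prompt": "Replace the sofa with a wooden bench."},
]

# JSONL = one compact JSON object per line.
with open("val.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")

# Round-trip check: each line parses back to the original record.
with open("val.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded))  # 2
```

Each record is kept on a single line so the inference script can stream the file without loading it whole.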
## Citation

If you use Skywork-UniPic in your research, please cite:

```
@article{wang2025skywork,
  title={Skywork unipic: Unified autoregressive modeling for visual understanding and generation},
  author={Wang, Peiyu and Peng, Yi and Gan, Yimeng and Hu, Liang and Xie, Tianyidan and Wang, Xiaokun and Wei, Yichen and Tang, Chuanxin and Zhu, Bo and Li, Changshi and others},
  journal={arXiv preprint arXiv:2508.03320},
  year={2025}
}
```

```
@article{wei2025skywork,
  title={Skywork unipic 2.0: Building kontext model with online rl for unified multimodal model},
  author={Wei, Hongyang and Xu, Baixin and Liu, Hongbo and Wu, Cyrus and Liu, Jie and Peng, Yi and Wang, Peiyu and Liu, Zexiang and He, Jingwen and Xietian, Yidan and others},
  journal={arXiv preprint arXiv:2509.04548},
  year={2025}
}
```

```
@article{wei2026skywork,
  title={Skywork UniPic 3.0: Unified Multi-Image Composition via Sequence Modeling},
  author={Wei, Hongyang and Liu, Hongbo and Wang, Zidong and Peng, Yi and Xu, Baixin and Wu, Size and Zhang, Xuying and He, Xianglong and Liu, Zexiang and Wang, Peiyu and others},
  journal={arXiv preprint arXiv:2601.15664},
  year={2026}
}
```