---
license: cc-by-nc-sa-4.0
---
# ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text
<div align="center">

[ColorizeDiffusion (arXiv:2401.01456)](https://arxiv.org/abs/2401.01456)
[WACV 2025 paper](https://openaccess.thecvf.com/content/WACV2025/html/Yan_ColorizeDiffusion_Improving_Reference-Based_Sketch_Colorization_with_Latent_Diffusion_Model_WACV_2025_paper.html)
[arXiv:2502.19937](https://arxiv.org/abs/2502.19937)
[ColorizeDiffusion v2 (arXiv:2504.06895)](https://arxiv.org/abs/2504.06895)
[Model weights on Hugging Face](https://huggingface.co/tellurion/ColorizeDiffusion/tree/main)
[License](https://github.com/tellurion-kanata/colorizeDiffusion/blob/master/LICENSE)

</div>

(April 2025)
Official implementation of ColorizeDiffusion.
ColorizeDiffusion is an SD-based colorization framework that achieves high-quality colorization results with arbitrary sketch-reference input pairs.
The foundational paper for this repository: [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456).
***Version 1*** - Base training, 512px. Released; checkpoint names start with **mult**.
***Version 1.5*** - Resolves spatial entanglement, 512px. Released; checkpoint names start with **switch**.
***Version 2*** - Enhanced background and style transfer, 768px. Released; checkpoint names start with **v2**.
***Version XL*** - Enhanced embedding guidance for character colorization and geometry disentanglement, 1024px. Available soon.
## Getting Started
-------------------------------------------------------------------------------------------
```shell
conda env create -f environment.yaml
conda activate hf
```
## User Interface
-------------------------------------------------------------------------------------------
We provide a fully-featured UI. To launch it, run:
```shell
python -u app.py
```
The default server address is http://localhost:7860.
#### Important inference options
| Options | Description |
|:----------------------|:--------------------------------------------------------------------------------------------------|
| BG enhance | Low-level feature injection for v2 models. |
| FG enhance             | Not effective for the currently open-sourced models.                                               |
| Reference strength     | Decrease it to increase semantic fidelity to the sketch input.                                      |
| Foreground strength    | Similar to reference strength, but applied only to the foreground region. Requires FG or BG enhance to be activated. |
| Preprocessor           | Sketch preprocessing. **Extract** is recommended if the sketch input is a complicated pencil drawing. |
| Line extractor         | Line extractor used when the preprocessor is **Extract**.                                           |
| Sketch guidance scale  | Classifier-free guidance scale for the sketch input; 1 is suggested (see the sketch after this table). |
| Attention injection    | Noised low-level feature injection; roughly doubles inference time.                                 |
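The sketch guidance scale follows the standard classifier-free guidance formulation: the final noise prediction is the unconditional prediction plus the scaled difference to the sketch-conditioned prediction. The snippet below is a minimal illustrative sketch of that formula only, not the repository's sampler code; all names are hypothetical.

```python
import torch

def cfg_combine(noise_uncond: torch.Tensor,
                noise_cond: torch.Tensor,
                guidance_scale: float = 1.0) -> torch.Tensor:
    """Standard classifier-free guidance combination.

    guidance_scale = 1.0 simply returns the conditional prediction,
    which matches the suggested sketch guidance scale above.
    """
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

if __name__ == "__main__":
    # Dummy latent-space noise predictions (batch 1, 4 channels, 64x64).
    uncond = torch.randn(1, 4, 64, 64)
    cond = torch.randn(1, 4, 64, 64)
    guided = cfg_combine(uncond, cond, guidance_scale=1.0)
    assert torch.allclose(guided, cond)
```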
### 768-level Cross-content colorization results (from v2)


### 1536-level Character colorization results (from XL)


## Manipulation
-------------------------------------------------------------------------------------------
The colorization results can be manipulated using text prompts; see [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456).
Manipulation is deactivated by default. To activate it, run:
```shell
python -u app.py -manipulate
```
For local manipulations, a visualization is provided to show the correlation between each prompt and tokens in the reference image.
The manipulation result and correlation visualization below use the following settings:
Target prompt: the girl's blonde hair
Anchor prompt: the girl's brown hair
Control prompt: the girl's brown hair
Target scale: 8
Enhanced: false
Thresholds: 0.5, 0.55, 0.65, 0.95


As you can see, the manipulation unavoidably changes some unrelated regions, since it is performed on the reference embeddings.
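Conceptually, a global manipulation shifts the reference image embedding along a text direction defined by the target and anchor prompts, scaled by the target scale. The sketch below illustrates only this global idea using the public CLIP model from `transformers`; it is not the repository's manipulation code, which operates on its own reference encoder's token embeddings and additionally supports local edits gated by the control prompt and thresholds.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative only: global embedding shift along a text direction.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def manipulate_embedding(reference: Image.Image,
                         target_prompt: str,
                         anchor_prompt: str,
                         target_scale: float) -> torch.Tensor:
    """Shift the reference image embedding along (target - anchor)."""
    with torch.no_grad():
        image_emb = model.get_image_features(
            **processor(images=reference, return_tensors="pt"))
        text_emb = model.get_text_features(
            **processor(text=[target_prompt, anchor_prompt],
                        return_tensors="pt", padding=True))
    direction = text_emb[0] - text_emb[1]
    direction = direction / direction.norm()
    return image_emb + target_scale * direction

# Example (hypothetical file path):
# emb = manipulate_embedding(Image.open("reference.png"),
#                            "the girl's blonde hair",
#                            "the girl's brown hair",
#                            target_scale=8.0)
```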
#### Manipulation options
| Options | Description |
| :----- |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Group index | The index of the selected manipulation sequence's parameter group. |
| Target prompt | The prompt used to specify the desired visual attribute for the image after manipulation. |
| Anchor prompt | The prompt used to specify the anchored visual attribute for the image before manipulation. |
| Control prompt | Used for local manipulation (crossattn-based models). The prompt used to specify the target regions. |
| Enhance | Specifies whether this manipulation should be enhanced (more likely to influence unrelated attributes). |
| Target scale | The scale used to progressively control the manipulation. |
| Thresholds | Used for local manipulation (crossattn-based models). Four hyperparameters used to reduce the influence on irrelevant visual attributes, where 0.0 < threshold 0 < threshold 1 < threshold 2 < threshold 3 < 1.0 (see the sketch after this table). |
| \<Threshold0 | Select regions most related to control prompt. Indicated by deep blue. |
| Threshold0-Threshold1 | Select regions related to control prompt. Indicated by blue. |
| Threshold1-Threshold2 | Select neighbouring but unrelated regions. Indicated by green. |
| Threshold2-Threshold3 | Select unrelated regions. Indicated by orange. |
| \>Threshold3 | Select most unrelated regions. Indicated by brown. |
| Add | Click **Add** to save the current manipulation in the sequence. |
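For intuition, the four thresholds simply partition the per-token correlation scores into the five bands listed above. The following is a minimal sketch of that bucketing, not the repository's code, assuming `correlation` holds one score per reference token:

```python
import torch

def correlation_bands(correlation: torch.Tensor,
                      thresholds=(0.5, 0.55, 0.65, 0.95)) -> torch.Tensor:
    """Map each correlation score to a band id in [0, 4].

    Band 0 corresponds to the "<Threshold0" row above (deep blue),
    band 4 to the ">Threshold3" row (brown).
    """
    return torch.bucketize(correlation, torch.tensor(thresholds))

# Example: scores for six reference tokens.
print(correlation_bands(torch.tensor([0.3, 0.52, 0.6, 0.7, 0.96, 0.99])))
# tensor([0, 1, 2, 3, 4, 4])
```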
## Training
Our implementation is based on Accelerate and DeepSpeed.
Before starting training, collect your data and organize the training dataset as follows:
```
[dataset_path]
├── image_list.json          # Optional, for image indexing
├── color/                   # Color images
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
├── sketch/                  # Sketch images
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
└── mask/                    # Mask images (required for fg-bg training)
    ├── 0001.zip
    │   ├── 10001.png
    │   ├── 100001.jpg
    │   └── ...
    ├── 0002.zip
    └── ...
```
For details of dataset organization, check `data/dataloader.py`.
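For a quick sense of how the zip-sharded layout above can be read, here is a minimal illustrative sketch; it is not the repository's dataloader, and the paths and names are placeholders.

```python
import io
import zipfile
from pathlib import Path

from PIL import Image

def load_pair(dataset_path: str, shard: str, name: str):
    """Read the color/sketch pair stored under the same shard and filename."""
    root = Path(dataset_path)
    pair = []
    for subset in ("color", "sketch"):
        with zipfile.ZipFile(root / subset / shard) as zf:
            with zf.open(name) as fp:
                pair.append(Image.open(io.BytesIO(fp.read())).convert("RGB"))
    return tuple(pair)

# Hypothetical usage:
# color, sketch = load_pair("/data/train", "0001.zip", "10001.png")
```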
Training command example:
```
accelerate launch --config_file [accelerate_config_file] \
train.py \
--name base \
--dataroot [dataset_path] \
--batch_size 64 \
--num_threads 8 \
-cfg configs/train/sd2.1/mult.yaml \
-pt [pretrained_model_path]
```
Refer to `options.py` for training/inference/validation arguments.
Note that `batch_size` here is the micro batch size per GPU; running the command above on 8 GPUs gives a total batch size of 512.
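For reference, `[accelerate_config_file]` is a standard Accelerate configuration file. A minimal example assuming a single node with 8 GPUs, fp16 mixed precision, and DeepSpeed ZeRO stage 2 could look like the following; adapt it to your hardware and the DeepSpeed settings you actually use.

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2
  gradient_accumulation_steps: 1
mixed_precision: fp16
num_machines: 1
num_processes: 8
machine_rank: 0
main_training_function: main
use_cpu: false
```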
## Code reference
1. [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion)
2. [Stable Diffusion XL](https://github.com/Stability-AI/generative-models)
3. [SD-webui-ControlNet](https://github.com/Mikubill/sd-webui-controlnet)
4. [Stable-Diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
5. [K-diffusion](https://github.com/crowsonkb/k-diffusion)
6. [Deepspeed](https://github.com/microsoft/DeepSpeed)
7. [sketchKeras-PyTorch](https://github.com/higumax/sketchKeras-pytorch)
## Citation
```
@article{2024arXiv240101456Y,
    author  = {{Yan}, Dingkun and {Yuan}, Liang and {Wu}, Erwin and {Nishioka}, Yuma and {Fujishiro}, Issei and {Saito}, Suguru},
    title   = "{ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text}",
    journal = {arXiv e-prints},
    year    = {2024},
    doi     = {10.48550/arXiv.2401.01456},
}

@InProceedings{Yan_2025_WACV,
    author    = {Yan, Dingkun and Yuan, Liang and Wu, Erwin and Nishioka, Yuma and Fujishiro, Issei and Saito, Suguru},
    title     = {ColorizeDiffusion: Improving Reference-Based Sketch Colorization with Latent Diffusion Model},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2025},
    pages     = {5092-5102}
}

@article{2025arXiv250219937Y,
    author  = {{Yan}, Dingkun and {Wang}, Xinrui and {Li}, Zhuoru and {Saito}, Suguru and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Guo}, Jiaxian},
    title   = "{Image Referenced Sketch Colorization Based on Animation Creation Workflow}",
    journal = {arXiv e-prints},
    year    = {2025},
    doi     = {10.48550/arXiv.2502.19937},
}

@article{yan2025colorizediffusionv2enhancingreferencebased,
    title   = {ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities},
    author  = {Dingkun Yan and Xinrui Wang and Yusuke Iwasawa and Yutaka Matsuo and Suguru Saito and Jiaxian Guo},
    year    = {2025},
    journal = {arXiv e-prints},
    doi     = {10.48550/arXiv.2504.06895},
}
```