Ghaser commited on
Commit
682380a
·
verified ·
1 Parent(s): 7069b3a

Upload folder using huggingface_hub

Browse files
visual_gen/dit/README.md ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ ---
6
+
7
+ # PixWizard Model Card
8
+
9
+ ## Paper or resources for more information:
10
+ Paper: [https://arxiv.org/abs/2409.15278](https://arxiv.org/abs/2409.15278) \
11
+ Code: [https://github.com/AFeng-x/PixWizard](https://github.com/AFeng-x/PixWizard)
12
+
13
+
14
+ ## Citations
15
+ ```
16
+ @article{lin2024pixwizard,
17
+ title={PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions},
18
+ author={Lin, Weifeng and Wei, Xinyu and Zhang, Renrui and Zhuo, Le and Zhao, Shitian and Huang, Siyuan and Xie, Junlin and Qiao, Yu and Gao, Peng and Li, Hongsheng},
19
+ journal={arXiv preprint arXiv:2409.15278},
20
+ year={2024}
21
+ }
22
+ ```
visual_gen/dit/consolidated.00-of-01.pth ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:927ec489b43235cdbeda6cb8fac5b1db26e3b0c76eb0011d04f3a9af0170994b
3
+ size 8046201485
visual_gen/dit/consolidated_ema.00-of-01.pth ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:85716aac6fe46893a55a80e1f6b152b94094c35f79fad240bab81f0829f9b163
3
+ size 8046205577
visual_gen/dit/gitattributes ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
visual_gen/dit/model_args.pth ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5e07e0ff9a089bc12e827ff20c4653fd17596c301ae144c3d0110d34274f4da2
3
+ size 2284
visual_gen/sdxl-vae/README.md ADDED
@@ -0,0 +1,39 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ tags:
4
+ - stable-diffusion
5
+ - stable-diffusion-diffusers
6
+ inference: false
7
+ ---
8
+ # SDXL - VAE
9
+
10
+ #### How to use with 🧨 diffusers
11
+ You can integrate this fine-tuned VAE decoder to your existing `diffusers` workflows, by including a `vae` argument to the `StableDiffusionPipeline`
12
+ ```py
13
+ from diffusers.models import AutoencoderKL
14
+ from diffusers import StableDiffusionPipeline
15
+
16
+ model = "stabilityai/your-stable-diffusion-model"
17
+ vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
18
+ pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)
19
+ ```
20
+
21
+ ## Model
22
+ [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) is a [latent diffusion model](https://arxiv.org/abs/2112.10752), where the diffusion operates in a pretrained,
23
+ learned (and fixed) latent space of an autoencoder.
24
+ While the bulk of the semantic composition is done by the latent diffusion model,
25
+ we can improve _local_, high-frequency details in generated images by improving the quality of the autoencoder.
26
+ To this end, we train the same autoencoder architecture used for the original [Stable Diffusion](https://github.com/CompVis/stable-diffusion) at a larger batch-size (256 vs 9)
27
+ and additionally track the weights with an exponential moving average (EMA).
28
+ The resulting autoencoder outperforms the original model in all evaluated reconstruction metrics, see the table below.
29
+
30
+
31
+ ## Evaluation
32
+ _SDXL-VAE vs original kl-f8 VAE vs f8-ft-MSE_
33
+ ### COCO 2017 (256x256, val, 5000 images)
34
+ | Model | rFID | PSNR | SSIM | PSIM | Link | Comments
35
+ |----------|------|--------------|---------------|---------------|------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|
36
+ | | | | | | | |
37
+ | SDXL-VAE | 4.42 | 24.7 +/- 3.9 | 0.73 +/- 0.13 | 0.88 +/- 0.27 | https://huggingface.co/stabilityai/sdxl-vae/blob/main/sdxl_vae.safetensors | as used in SDXL |
38
+ | original | 4.99 | 23.4 +/- 3.8 | 0.69 +/- 0.14 | 1.01 +/- 0.28 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | as used in SD |
39
+ | ft-MSE | 4.70 | 24.5 +/- 3.7 | 0.71 +/- 0.13 | 0.92 +/- 0.27 | https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 * LPIPS), smoother outputs |
visual_gen/sdxl-vae/config.json ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_class_name": "AutoencoderKL",
3
+ "_diffusers_version": "0.18.0.dev0",
4
+ "_name_or_path": ".",
5
+ "act_fn": "silu",
6
+ "block_out_channels": [
7
+ 128,
8
+ 256,
9
+ 512,
10
+ 512
11
+ ],
12
+ "down_block_types": [
13
+ "DownEncoderBlock2D",
14
+ "DownEncoderBlock2D",
15
+ "DownEncoderBlock2D",
16
+ "DownEncoderBlock2D"
17
+ ],
18
+ "in_channels": 3,
19
+ "latent_channels": 4,
20
+ "layers_per_block": 2,
21
+ "norm_num_groups": 32,
22
+ "out_channels": 3,
23
+ "sample_size": 1024,
24
+ "scaling_factor": 0.13025,
25
+ "up_block_types": [
26
+ "UpDecoderBlock2D",
27
+ "UpDecoderBlock2D",
28
+ "UpDecoderBlock2D",
29
+ "UpDecoderBlock2D"
30
+ ]
31
+ }
visual_gen/sdxl-vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1598f3d24932bcfe6634e8b618ea1e30ab1d57f5aad13a6d2de446d2199f2341
3
+ size 334643268
visual_gen/sdxl-vae/gitattributes ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text