## 🔥 1. We provide all links to Sana pth checkpoints and diffusers safetensors below
| Model | Reso | pth link | diffusers | Precision | Description |
|-----------|--------|----------|-----------|-----------|-------------|
| Sana-0.6B | 512px | [Sana_600M_512px](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px) | [Efficient-Large-Model/Sana_600M_512px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px_diffusers) | fp16/fp32 | Multi-Language |
| Sana-0.6B | 1024px | [Sana_600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px) | [Efficient-Large-Model/Sana_600M_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px_diffusers) | fp16/fp32 | Multi-Language |
| Sana-1.6B | 512px | [Sana_1600M_512px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px) | [Efficient-Large-Model/Sana_1600M_512px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_diffusers) | fp16/fp32 | - |
| Sana-1.6B | 512px | [Sana_1600M_512px_MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing) | [Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers) | fp16/fp32 | Multi-Language |
| Sana-1.6B | 1024px | [Sana_1600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px) | [Efficient-Large-Model/Sana_1600M_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_diffusers) | fp16/fp32 | - |
| Sana-1.6B | 1024px | [Sana_1600M_1024px_MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing) | [Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers) | fp16/fp32 | Multi-Language |
| Sana-1.6B | 1024px | [Sana_1600M_1024px_BF16](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16) | [Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers) | **bf16**/fp32 | Multi-Language |
| Sana-1.6B | 1024px | - | [mit-han-lab/svdq-int4-sana-1600m](https://huggingface.co/mit-han-lab/svdq-int4-sana-1600m) | **int4** | Multi-Language |
| Sana-1.6B | 2Kpx | [Sana_1600M_2Kpx_BF16](https://huggingface.co/Efficient-Large-Model/Sana_1600M_2Kpx_BF16) | [Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers) | **bf16**/fp32 | Multi-Language |
| Sana-1.6B | 4Kpx | [Sana_1600M_4Kpx_BF16](https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16) | [Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers) | **bf16**/fp32 | Multi-Language |
| ControlNet |  |  |  |  |  |
| Sana-1.6B-ControlNet | 1Kpx | [Sana_1600M_1024px_BF16_ControlNet_HED](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_ControlNet_HED) | Coming soon | **bf16**/fp32 | Multi-Language |
| Sana-0.6B-ControlNet | 1Kpx | [Sana_600M_1024px_ControlNet_HED](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px_ControlNet_HED) | Coming soon | fp16/fp32 | - |
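If you want to fetch one of the pth checkpoints above (for the original Sana codebase rather than diffusers), here is a minimal sketch using `huggingface_hub`; the repo id comes from the table, while the local directory name is an arbitrary choice:
```python
# a minimal sketch: download a pth checkpoint repo from the table above;
# swap repo_id for any other entry; local_dir is a hypothetical destination
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="Efficient-Large-Model/Sana_1600M_1024px",
    local_dir="./checkpoints/Sana_1600M_1024px",
)
print(f"checkpoint downloaded to: {path}")
```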
## ❗ 2. Make sure to use the correct precision (fp16/bf16/fp32) for training and inference.
### We provide two samples showing how to use fp16 and bf16 weights, respectively.
⚠️ Make sure to set `variant` and `torch_dtype` in diffusers pipelines to the desired precision.
#### 1). For fp16 models
```python
# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# keep the VAE and text encoder in bfloat16 even when the transformer runs in fp16
pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("sana.png")
```
#### 2). For bf16 models
```python
# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPAGPipeline

pipe = SanaPAGPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
    pag_applied_layers="transformer_blocks.8",
)
pipe.to("cuda")

pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("sana.png")
```
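Note: `SanaPAGPipeline` applies Perturbed-Attention Guidance (PAG) on the layers named in `pag_applied_layers`, and `pag_scale` controls the strength of that guidance on top of the usual classifier-free `guidance_scale`.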
## ❗ 3. 4K models
4K models need VAE tiling to avoid OOM issues (a 16 GB GPU is recommended).
```python
# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

# if you hit OOM at 4096x4096, feel free to adjust the tile sizes below
if pipe.transformer.config.sample_size == 128:
    pipe.vae.enable_tiling(
        tile_sample_min_height=1024,
        tile_sample_min_width=1024,
        tile_sample_stride_height=896,
        tile_sample_stride_width=896,
    )

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=4096,
    width=4096,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("sana_4K.png")
```
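With tiling enabled, the VAE decodes the latents tile by tile instead of the full 4096px image at once, so peak decoder memory is bounded by the tile size; smaller tiles save more memory at the cost of decoding speed.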
## ❗ 4. int4 inference
This int4 model is quantized with [SVDQuant-Nunchaku](https://github.com/mit-han-lab/nunchaku). First follow the nunchaku engine's [installation guide](https://github.com/mit-han-lab/nunchaku?tab=readme-ov-file#installation), then use the following code snippet to run inference with the int4 Sana model.
The snippet below uses SanaPipeline. For SanaPAGPipeline, please refer to the [SanaPAGPipeline example](https://github.com/mit-han-lab/nunchaku/blob/main/examples/sana_1600m_pag.py).
```python
import torch
from diffusers import SanaPipeline
from nunchaku.models.transformer_sana import NunchakuSanaTransformer2DModel

# load the SVDQuant int4 transformer and plug it into the regular Sana pipeline
transformer = NunchakuSanaTransformer2DModel.from_pretrained("mit-han-lab/svdq-int4-sana-1600m")
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    transformer=transformer,
    variant="bf16",
    torch_dtype=torch.bfloat16,
).to("cuda")

pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

image = pipe(
    prompt="A cute 🐼 eating 🎋, ink drawing style",
    height=1024,
    width=1024,
    guidance_scale=4.5,
    num_inference_steps=20,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("sana_1600m.png")
```
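Only the transformer is swapped for the int4 quantized version here; the text encoder and VAE still run in bf16.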