---
library_name: diffusers
license: apache-2.0
datasets:
- laion/relaion400m
base_model:
- black-forest-labs/FLUX.2-dev
tags:
- tae
- taef2
---

# About

Tiny AutoEncoder trained on the latent space of [black-forest-labs/FLUX.2-dev](https://huggingface.co/black-forest-labs/FLUX.2-dev)'s autoencoder. It converts between latent and image space up to 20x faster and with 28x fewer parameters, at the cost of a small amount of quality.

Code for this model is available [here](https://huggingface.co/fal/FLUX.2-Tiny-AutoEncoder-FlashPack/blob/main/flux2_tiny_autoencoder.py). Requires [flashpack](https://github.com/fal-ai/flashpack).
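The card doesn't pin an install method, so here is a minimal setup sketch; it assumes flashpack is installed straight from the GitHub repository linked above and that torch/torchvision/diffusers come from PyPI:

```shell
# Assumption: installing flashpack from source; adjust if a PyPI release exists
pip install git+https://github.com/fal-ai/flashpack
pip install torch torchvision diffusers
```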

# Round-Trip Comparisons

| Source | Image |
| ------ | ----- | 
| https://www.pexels.com/photo/mirror-lying-on-open-book-11495792/ | ![compare_autoencoders_1](https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/u7ZnjY8FAwu09-iyEC_um.png) |
| https://www.pexels.com/photo/brown-hummingbird-selective-focus-photography-1133957/ | ![compare_autoencoders_2](https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/ZzvJu3VfrzlvZ7bDDASog.png) |
| https://www.pexels.com/photo/person-with-body-painting-1209843/ | ![compare_autoencoders_3](https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/B56LPhLYiGT0ffnBVIRbP.png) |

# Usage

```py
import torch
import torchvision.transforms.functional as F

from PIL import Image
from flux2_tiny_autoencoder import Flux2TinyAutoEncoder

device = torch.device("cuda")
tiny_vae = Flux2TinyAutoEncoder.from_pretrained_flashpack(
    "fal/FLUX.2-Tiny-AutoEncoder-FlashPack",
    device=device,
)

pil_image = Image.open("/path/to/image.png").convert("RGB")
image_tensor = F.to_tensor(pil_image)  # CHW tensor in [0, 1]
image_tensor = image_tensor.unsqueeze(0) * 2.0 - 1.0  # add batch dim, scale to [-1, 1]
image_tensor = image_tensor.to(device, dtype=tiny_vae.dtype)

with torch.inference_mode():
    latents = tiny_vae.encode(image_tensor, return_dict=False)
    recon = tiny_vae.decode(latents, return_dict=False)
    recon = recon.squeeze(0).clamp(-1, 1) / 2.0 + 0.5
    recon = recon.float().detach().cpu()

recon_image = F.to_pil_image(recon)
recon_image.save("reconstituted.png")
```
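The snippet above maps pixel values from `[0, 1]` into the model's `[-1, 1]` input range before encoding and inverts that mapping (with a clamp) after decoding. The two transforms round-trip exactly for in-range inputs, which can be checked in isolation:

```python
import torch

def to_model_range(x: torch.Tensor) -> torch.Tensor:
    # [0, 1] image tensor -> [-1, 1] model input, as in the usage snippet
    return x * 2.0 - 1.0

def to_image_range(x: torch.Tensor) -> torch.Tensor:
    # [-1, 1] model output -> [0, 1] image tensor; clamp guards stray values
    return x.clamp(-1, 1) / 2.0 + 0.5

x = torch.rand(1, 3, 8, 8)
assert torch.allclose(to_image_range(to_model_range(x)), x, atol=1e-6)
```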