Update README.md
README.md (CHANGED)
# Versatile Diffusion (v1.0, four-flow)

We built **Versatile Diffusion (VD), the first unified multi-flow multimodal diffusion framework**, as a step towards **Universal Generative AI**. Versatile Diffusion natively supports image-to-text, image-variation, text-to-image, and text-variation, and can be further extended to other applications such as semantic-style disentanglement, image-text dual-guided generation, latent image-to-text-to-image editing, and more. Future versions will support more modalities such as speech, music, video, and 3D.

# Model Description

A single flow of Versatile Diffusion contains a VAE, a diffuser, and a context encoder, and thus handles one task (e.g., text-to-image) under one data type (e.g., image) and one context type (e.g., text). The multi-flow structure of Versatile Diffusion is shown in the following diagram:

<p align="center">
<img src="assets/figures/VD_framework.png" width="99%">
</p>

# Intended uses & limitations
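As a rough illustration of the flows described above, the sketch below assumes the `VersatileDiffusionPipeline` integration from the 🤗 `diffusers` library and a checkpoint published under the `shi-labs/versatile-diffusion` repo id; the pipeline availability, the repo id, and the input image path are assumptions of this sketch rather than details specified by this card.

```python
# Illustrative sketch only: assumes diffusers ships VersatileDiffusionPipeline
# and that the weights live under the "shi-labs/versatile-diffusion" repo id.
import torch
from PIL import Image
from diffusers import VersatileDiffusionPipeline

pipe = VersatileDiffusionPipeline.from_pretrained(
    "shi-labs/versatile-diffusion", torch_dtype=torch.float16
).to("cuda")

# Each flow combines a VAE, a diffuser, and a context encoder; the loaded
# sub-modules can be inspected through the pipeline's components dictionary.
print(list(pipe.components.keys()))

# Text-to-image flow: the text context encoder conditions the image diffuser.
image = pipe.text_to_image("an astronaut riding a horse on mars").images[0]
image.save("text_to_image.png")

# Image-variation flow: an image, rather than text, provides the context.
init_image = Image.open("assets/example_input.jpg").convert("RGB")  # placeholder path
variation = pipe.image_variation(init_image).images[0]
variation.save("image_variation.png")
```

The image-text dual-guided application mentioned above follows the same pattern through `pipe.dual_guided(prompt=..., image=...)`, again assuming the same `diffusers` integration.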
|