---
tags:
- keras
---

### Model Overview
[Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource efficiency.

For more technical details, please refer to the [Research paper](https://stability.ai/news/stable-diffusion-3-research-paper).

Please note: this model is released under the Stability Community License. For an Enterprise License, visit Stability.ai or [contact us](https://stability.ai/enterprise) for commercial licensing details.
## Links

* [SD3 Quickstart Notebook Text-to-image](https://www.kaggle.com/code/laxmareddypatlolla/stablediffusion3-quickstart-notebook)
* [SD3 Quickstart Notebook Image-to-image](https://colab.sandbox.google.com/gist/laxmareddyp/46de6fbb274b12e8515c7bc55dfc5c57/stable-diffusion-3.ipynb)
* [SD3 API Documentation](https://keras.io/keras_hub/api/models/stable_diffusion_3/)
* [SD3 Model Card](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
## Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

| Preset name | Parameters | Description |
|-------------|------------|-------------|
| stable_diffusion_3_medium | 2.99B | 3 billion parameters, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. Developed by Stability AI. |
### Model Description

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model Description:** This is a model that can be used to generate images based on text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main), and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl)).
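The three-encoder conditioning above can be sketched in plain NumPy. This is an illustration of the scheme described in the SD3 research paper (channel-wise CLIP concatenation, zero-padding to the T5 width, then joining along the token axis), not the actual KerasHub implementation; the embedding widths (768, 1280, 4096) come from the paper.

```python
import numpy as np

# Illustrative only: how SD3-style conditioning is assembled from the three
# text encoders, per the MMDiT paper. Zeros stand in for real embeddings.
batch, clip_tokens, t5_tokens = 1, 77, 77

clip_l_seq = np.zeros((batch, clip_tokens, 768))   # CLIP-ViT/L token embeddings
clip_g_seq = np.zeros((batch, clip_tokens, 1280))  # OpenCLIP-ViT/G token embeddings
t5_seq = np.zeros((batch, t5_tokens, 4096))        # T5-xxl token embeddings
clip_l_pooled = np.zeros((batch, 768))
clip_g_pooled = np.zeros((batch, 1280))

# Pooled "vector" conditioning: concatenate the two pooled CLIP outputs.
pooled = np.concatenate([clip_l_pooled, clip_g_pooled], axis=-1)  # (1, 2048)

# Sequence conditioning: concatenate CLIP features channel-wise, zero-pad
# to the T5 width, then join with the T5 sequence along the token axis.
clip_seq = np.concatenate([clip_l_seq, clip_g_seq], axis=-1)      # (1, 77, 2048)
clip_seq = np.pad(clip_seq, ((0, 0), (0, 0), (0, 4096 - 2048)))   # (1, 77, 4096)
context = np.concatenate([clip_seq, t5_seq], axis=1)              # (1, 154, 4096)
```

The zero-padding trick lets the narrower CLIP features and the wide T5 features share one context sequence for the MMDiT's joint attention.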
## Example Usage

```python
!pip install -U keras-hub
!pip install -U keras
```

```python
import keras_hub

# Pretrained Stable Diffusion 3 model.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "stable_diffusion_3_medium"
)
```
## Example Usage with Hugging Face URI

```python
!pip install -U keras-hub
!pip install -U keras
```

```python
import keras_hub

# Pretrained Stable Diffusion 3 model, loaded from the Hugging Face Hub.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "hf://keras/stable_diffusion_3_medium"
)
```