---
tags:
- keras
---

### Model Overview
[Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource efficiency.

For more technical details, please refer to the [Research paper](https://stability.ai/news/stable-diffusion-3-research-paper).

Please note: this model is released under the Stability Community License. For an Enterprise License, visit Stability.ai or [contact us](https://stability.ai/enterprise) for commercial licensing details.
## Links

* [SD3 Quickstart Notebook Text-to-image](https://www.kaggle.com/code/laxmareddypatlolla/stablediffusion3-quickstart-notebook)
* [SD3 Quickstart Notebook Image-to-image](https://colab.sandbox.google.com/gist/laxmareddyp/46de6fbb274b12e8515c7bc55dfc5c57/stable-diffusion-3.ipynb)
* [SD3 API Documentation](https://keras.io/keras_hub/api/models/stable_diffusion_3/)
* [SD3 Model Card](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
## Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

| Preset name | Parameters | Description |
|-------------|------------|-------------|
| stable_diffusion_3_medium | 2.99B | 3 billion parameters, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. Developed by Stability AI. |
### Model Description

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model Description:** This is a model that can be used to generate images based on text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main), and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl)).
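The three-encoder conditioning above can be sketched in plain NumPy. This is an illustration of the scheme described in the SD3 research paper (channel-wise CLIP concatenation, zero-padding to the T5 width, then joining along the token axis), not the actual KerasHub implementation; the embedding widths (768, 1280, 4096) come from the paper.

```python
import numpy as np

# Illustrative only: how SD3-style conditioning is assembled from the three
# text encoders, per the MMDiT paper. Zeros stand in for real embeddings.
batch, clip_tokens, t5_tokens = 1, 77, 77

clip_l_seq = np.zeros((batch, clip_tokens, 768))   # CLIP-ViT/L token embeddings
clip_g_seq = np.zeros((batch, clip_tokens, 1280))  # OpenCLIP-ViT/G token embeddings
t5_seq = np.zeros((batch, t5_tokens, 4096))        # T5-xxl token embeddings
clip_l_pooled = np.zeros((batch, 768))
clip_g_pooled = np.zeros((batch, 1280))

# Pooled "vector" conditioning: concatenate the two pooled CLIP outputs.
pooled = np.concatenate([clip_l_pooled, clip_g_pooled], axis=-1)  # (1, 2048)

# Sequence conditioning: concatenate CLIP features channel-wise, zero-pad
# to the T5 width, then join with the T5 sequence along the token axis.
clip_seq = np.concatenate([clip_l_seq, clip_g_seq], axis=-1)      # (1, 77, 2048)
clip_seq = np.pad(clip_seq, ((0, 0), (0, 0), (0, 4096 - 2048)))   # (1, 77, 4096)
context = np.concatenate([clip_seq, t5_seq], axis=1)              # (1, 154, 4096)
```

The zero-padding trick lets the narrower CLIP features and the wide T5 features share one context sequence for the MMDiT's joint attention.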
## Example Usage

```python
!pip install -U keras-hub
!pip install -U keras
```

```python
import keras_hub

# Pretrained Stable Diffusion 3 model.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "stable_diffusion_3_medium"
)
```
## Example Usage with Hugging Face URI

```python
!pip install -U keras-hub
!pip install -U keras
```

```python
import keras_hub

# Pretrained Stable Diffusion 3 model, loaded from the Hugging Face Hub.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "hf://keras/stable_diffusion_3_medium"
)
```