---
license: other
base_model: stabilityai/stable-diffusion-3.5-medium
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- dreambooth
- redhat
- corporate-branding
- fine-tuned
library_name: diffusers
pipeline_tag: text-to-image
---

# RedHat Dog SD3 - Fine-tuned Stable Diffusion 3.5 Model

## Model Description

This is a fine-tuned version of [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium), trained with the DreamBooth technique to generate images of a specific Red Hat branded dog character ("rhteddy").

## Model Details

- **Base Model**: stabilityai/stable-diffusion-3.5-medium
- **Fine-tuning Method**: DreamBooth
- **Training Data**: 5-10 images of the Red Hat dog character
- **Training Steps**: 800 steps
- **Resolution**: 512x512 pixels
- **Hardware**: NVIDIA L40S GPU (48GB memory)

## Intended Use

This model is designed for:

- Generating images of the Red Hat dog character in various contexts
- Educational demonstrations of DreamBooth fine-tuning
- Corporate branding and marketing content creation
- Research into personalized diffusion models

## Example

```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "cfchase/redhat-dog-sd3",
    torch_dtype=torch.bfloat16
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipeline.to(device)

# Generate an image
image = pipeline("photo of a rhteddy dog in a park").images[0]
image.save("redhat_dog_park.png")
```

### Recommended Prompts

The model works best with prompts that include the trigger phrase `rhteddy dog`:

- `"photo of a rhteddy dog"`
- `"rhteddy dog sitting in an office"`
- `"rhteddy dog wearing a Red Hat"`
- `"rhteddy dog in a technology conference"`
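
The prompts listed here can be rendered in a single pass. The sketch below is only an illustration and is not taken from the training notebooks: `prompt_to_filename` and `generate_all` are hypothetical helper names, and the code assumes that `pipeline` is an already-loaded diffusers pipeline whose call returns an object with an `.images` list, as in the Example section.

```python
import re
from pathlib import Path

# Prompts copied from the recommended list; each contains the trigger phrase.
RECOMMENDED_PROMPTS = [
    "photo of a rhteddy dog",
    "rhteddy dog sitting in an office",
    "rhteddy dog wearing a Red Hat",
    "rhteddy dog in a technology conference",
]


def prompt_to_filename(prompt: str) -> str:
    """Turn a prompt into a filesystem-friendly PNG name (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"{slug}.png"


def generate_all(pipeline, prompts, out_dir="outputs"):
    """Run each prompt through a loaded pipeline and save the first image."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for prompt in prompts:
        # Assumes the pipeline output exposes `.images`, as diffusers pipelines do.
        image = pipeline(prompt).images[0]
        target = out_path / prompt_to_filename(prompt)
        image.save(target)
        saved.append(target)
    return saved
```

Calling `generate_all(pipeline, RECOMMENDED_PROMPTS)` with the pipeline from the Example section would write one PNG per prompt into `outputs/`.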

## Training Details

### Training Configuration

- **Instance Prompt**: "photo of a rhteddy dog"
- **Class Prompt**: "a photo of dog"
- **Learning Rate**: 5e-6
- **Batch Size**: 1
- **Gradient Accumulation Steps**: 2
- **Optimizer**: 8-bit Adam
- **Scheduler**: Constant
- **Prior Preservation**: Enabled with 200 class images
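
As a rough sanity check on these settings, the arithmetic below works out the effective batch size and how many times each instance image is likely seen. It is a sketch under stated assumptions: the 800 training steps are taken to be optimizer steps, training is assumed to run on a single GPU, and `TrainingConfig` is a hypothetical container rather than part of any library.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Hypothetical container for the values listed in the model card."""

    batch_size: int = 1
    gradient_accumulation_steps: int = 2
    max_train_steps: int = 800  # assumed to count optimizer steps
    num_gpus: int = 1           # assumption: single-GPU training

    @property
    def effective_batch_size(self) -> int:
        # Instance images contributing to each optimizer update.
        return self.batch_size * self.gradient_accumulation_steps * self.num_gpus

    @property
    def instance_samples_seen(self) -> int:
        # Total instance images processed over the whole run.
        return self.effective_batch_size * self.max_train_steps


def passes_over_dataset(config: TrainingConfig, num_instance_images: int) -> float:
    """Approximate number of passes over the instance images."""
    return config.instance_samples_seen / num_instance_images


config = TrainingConfig()
print(config.effective_batch_size)     # 2
print(config.instance_samples_seen)    # 1600
print(passes_over_dataset(config, 5))  # 320.0
print(passes_over_dataset(config, 10)) # 160.0
```

Under these assumptions each of the 5-10 instance images is revisited a few hundred times, which is consistent with the overfitting risk that comes with a small dataset.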

### Training Environment

- **Platform**: Red Hat OpenShift AI (RHOAI)
- **Framework**: Hugging Face Diffusers
- **Acceleration**: xFormers, gradient checkpointing

## Model Architecture

This model inherits the architecture of Stable Diffusion 3.5 Medium:

- **Transformer**: SD3Transformer2DModel
- **VAE**: AutoencoderKL
- **Text Encoders**:
  - 2x CLIPTextModelWithProjection
  - 1x T5EncoderModel
- **Scheduler**: FlowMatchEulerDiscreteScheduler

## Limitations and Bias

- The model is specifically trained on Red Hat branded imagery and may not generalize well to other contexts
- Training data was limited to a small dataset, which may result in overfitting
- The model inherits any biases present in the base Stable Diffusion 3.5 model
- Performance is optimized for the specific "rhteddy dog" concept and may struggle with significant variations

## Training Data

The training data consists of approximately 5-10 high-quality images of the Red Hat dog character, featuring:

- Various poses and angles
- Consistent visual style and branding
- Professional photography quality
- Clear subject focus

## Technical Specifications

- **Model Size**: ~47GB (full precision weights)
- **Inference Requirements**:
  - GPU with 8GB+ VRAM recommended
  - CUDA-compatible device
  - Python 3.8+
  - PyTorch 2.0+
  - Diffusers library

## License

This model is based on Stable Diffusion 3.5 Medium and is subject to the same licensing terms. Please refer to the [original model license](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) for details.

## Contact

For questions about this model or the training process, please refer to the [Red Hat OpenShift AI documentation](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed) or the associated training notebooks.