	---
	license: other
	base_model: stabilityai/stable-diffusion-3.5-medium
	tags:
	- stable-diffusion
	- stable-diffusion-diffusers
	- text-to-image
	- diffusers
	- dreambooth
	- redhat
	- corporate-branding
	- fine-tuned
	library_name: diffusers
	pipeline_tag: text-to-image
	---

	# RedHat Dog SD3 - Fine-tuned Stable Diffusion 3.5 Model

	## Model Description

	This model is a fine-tuned version of [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium), trained with the DreamBooth technique to generate images of a specific Red Hat-branded dog character (identifier token "rhteddy").

	## Model Details

	- Base Model: stabilityai/stable-diffusion-3.5-medium
	- Fine-tuning Method: DreamBooth
	- Training Data: 5-10 images of the Red Hat dog character
	- Training Steps: 800
	- Resolution: 512x512 pixels
	- Hardware: NVIDIA L40S GPU (48GB memory)

	## Intended Use

	This model is designed for:
	- Generating images of the Red Hat dog character in various contexts
	- Educational demonstrations of DreamBooth fine-tuning
	- Corporate branding and marketing content creation
	- Research into personalized diffusion models

	## Example

	```python
	import torch
	from diffusers import DiffusionPipeline

	pipeline = DiffusionPipeline.from_pretrained(
	    "cfchase/redhat-dog-sd3",
	    torch_dtype=torch.bfloat16,
	)

	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	pipeline.to(device)

	# Generate an image
	image = pipeline("photo of a rhteddy dog in a park").images[0]
	image.save("redhat_dog_park.png")
	```

	### Recommended Prompts

	The model works best with prompts that include the trigger phrase `rhteddy dog`:

	- `"photo of a rhteddy dog"`
	- `"rhteddy dog sitting in an office"`
	- `"rhteddy dog wearing a Red Hat"`
	- `"rhteddy dog in a technology conference"`
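
When prompts are assembled programmatically, a small guard can make sure the trigger phrase is not dropped by accident. The sketch below is illustrative only: `with_trigger` is a hypothetical helper, not part of this model or of Diffusers, and the fallback prompt for empty input is an assumption.

```python
TRIGGER = "rhteddy dog"  # trigger phrase from this model card


def with_trigger(prompt: str, trigger: str = TRIGGER) -> str:
    """Return the prompt unchanged if it already contains the trigger phrase,
    otherwise prepend the trigger so the fine-tuned concept is requested."""
    cleaned = prompt.strip()
    if trigger.lower() in cleaned.lower():
        return cleaned
    if not cleaned:
        # Assumed default: fall back to the instance prompt used in training.
        return f"photo of a {trigger}"
    return f"{trigger}, {cleaned}"


prompts = [with_trigger(p) for p in ["sitting in an office", "photo of a rhteddy dog"]]
print(prompts)
```

The resulting strings can be passed to the pipeline call in the same way as the hand-written prompts listed above.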

	## Training Details

	### Training Configuration

	- Instance Prompt: "photo of a rhteddy dog"
	- Class Prompt: "a photo of dog"
	- Learning Rate: 5e-6
	- Batch Size: 1
	- Gradient Accumulation Steps: 2
	- Optimizer: 8-bit Adam
	- Scheduler: Constant
	- Prior Preservation: Enabled with 200 class images
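
To put these settings in perspective, the arithmetic below estimates how often each training image was seen. It is a rough sketch that assumes each of the 800 reported steps is one optimizer update (so a step consumes batch size times accumulation steps instance images); the card does not state how steps were counted.

```python
# Values taken from the training configuration on this model card.
batch_size = 1
grad_accum_steps = 2
train_steps = 800
dataset_sizes = (5, 10)  # the card reports 5-10 instance images

# Assumption: each reported step is one optimizer update, so every step
# consumes batch_size * grad_accum_steps instance images.
effective_batch = batch_size * grad_accum_steps
instance_samples_seen = train_steps * effective_batch

# Approximate passes over the instance set for the smallest and largest dataset size.
passes = {n: instance_samples_seen / n for n in dataset_sizes}

print(effective_batch)        # 2
print(instance_samples_seen)  # 1600
print(passes)                 # {5: 320.0, 10: 160.0}
```

Under that assumption each image is revisited a few hundred times, which is typical for DreamBooth and is one reason prior preservation with class images is used to limit drift.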

	### Training Environment

	- Platform: Red Hat OpenShift AI (RHOAI)
	- Framework: Hugging Face Diffusers
	- Optimizations: xFormers (memory-efficient attention), gradient checkpointing

	## Model Architecture

	This model inherits the architecture of Stable Diffusion 3.5 Medium:
	- Transformer: SD3Transformer2DModel
	- VAE: AutoencoderKL
	- Text Encoders:
	  - 2x CLIPTextModelWithProjection
	  - 1x T5EncoderModel
	- Scheduler: FlowMatchEulerDiscreteScheduler
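
The component list above can be checked against a loaded pipeline through the `components` property that Diffusers pipelines expose. The sketch below uses a hypothetical helper, `describe_components`, and a stand-in object so that it runs without downloading any weights; the class names printed for the real model are not reproduced here.

```python
from types import SimpleNamespace


def describe_components(pipeline) -> dict:
    """Map each pipeline component name to the class name of the loaded object."""
    return {name: type(obj).__name__ for name, obj in pipeline.components.items()}


# Stand-in object so the sketch runs without downloading weights; with a real
# pipeline, pass the object returned by DiffusionPipeline.from_pretrained(...).
class FakeScheduler:
    pass


stub = SimpleNamespace(components={"scheduler": FakeScheduler()})
print(describe_components(stub))  # {'scheduler': 'FakeScheduler'}
```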

	## Limitations and Bias

	- The model is trained specifically on Red Hat branded imagery and may not generalize well to other contexts
	- The training set was small (5-10 images), so the model may be overfitted to those images
	- The model inherits any biases present in the Stable Diffusion 3.5 Medium base model
	- Performance is optimized for the specific "rhteddy dog" concept; prompts that depart significantly from it may produce weaker results

	## Training Data

	The training data consists of approximately 5-10 high-quality images of the Red Hat dog character, featuring:
	- Various poses and angles
	- Consistent visual style and branding
	- Professional photography quality
	- Clear subject focus

	## Technical Specifications

	- Model Size: ~47GB (full precision weights)
	- Inference Requirements:
	  - GPU with 8GB+ VRAM recommended
	  - CUDA-compatible device
	  - Python 3.8+
	  - PyTorch 2.0+
	  - Diffusers library
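
A quick way to compare a machine against these requirements is to query the interpreter and the installed libraries. The following is a minimal sketch, not an official check: `check_inference_environment` is a hypothetical helper, and it only reports what it finds rather than enforcing the version or VRAM thresholds listed above.

```python
import importlib
import sys


def check_inference_environment() -> dict:
    """Report Python, library, and GPU details relevant to the listed requirements."""
    report = {"python_ok": sys.version_info >= (3, 8)}

    # Record the installed version of each library, or None if it cannot be imported.
    for name in ("torch", "diffusers"):
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except Exception:  # any import failure is treated as "not available"
            report[name] = None

    report["cuda_available"] = False
    report["vram_gib"] = None
    if report["torch"] is not None:
        import torch

        if torch.cuda.is_available():
            report["cuda_available"] = True
            # Total memory of the first CUDA device, converted from bytes to GiB.
            props = torch.cuda.get_device_properties(0)
            report["vram_gib"] = round(props.total_memory / 1024**3, 1)
    return report


for key, value in check_inference_environment().items():
    print(f"{key}: {value}")
```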

	## License

	This model is based on Stable Diffusion 3.5 Medium and is subject to the same licensing terms. Please refer to the [original model license](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) for details.

	## Contact

	For questions about this model or the training process, please refer to the [Red Hat OpenShift AI documentation](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed) or the associated training notebooks.