# dee-unlearning-tiny-sd

**Model family:** Stable Diffusion | **Base:** SG161222/Realistic_Vision_V4.0 (Diffusers 0.19.0.dev0)

This repository packages the inference components (VAE, UNet, tokenizer, text encoder, scheduler config) that instantiate a `StableDiffusionPipeline` tuned for lightweight experimentation with deep unlearning ideas. All large binaries are stored under Git LFS (`*.bin` and other model artifact extensions as configured in `.gitattributes`).

---

## Model summary

- **Architecture:** `StableDiffusionPipeline` with `UNet2DConditionModel`, `CLIPTextModel`, `AutoencoderKL`, and `DPMSolverMultistepScheduler`.
- **Scheduler:** DPMSolver++ (multistep) configured with `num_train_timesteps=1000`, `steps_offset=1`, and the default `epsilon` prediction type that aligns with the diffusion formulation used in Realistic Vision (see the configuration sketch after this list).
- **Intended behavior:** Generate photorealistic samples guided by text prompts. The “tiny” name reflects a focus on a compact deployment bundle rather than a new generative architecture.

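
For reference, the sketch below reconstructs that scheduler configuration in code. It is illustrative only: the authoritative values ship in this repository's scheduler config file, and `algorithm_type="dpmsolver++"` is spelled out here just to make the DPMSolver++ choice explicit.

```python
from diffusers import DPMSolverMultistepScheduler

# Illustrative reconstruction of the configuration described above; in practice,
# load the scheduler from this repository's scheduler folder instead.
scheduler = DPMSolverMultistepScheduler.from_config(
    {
        "num_train_timesteps": 1000,
        "steps_offset": 1,
        "prediction_type": "epsilon",
        "algorithm_type": "dpmsolver++",
    }
)
print(scheduler.config)
```
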
## Usage

1. Install dependencies (tested with `diffusers==0.19.0.dev0`, `transformers`, `torch`, `accelerate`, `safetensors`).
2. Load the pipeline with the provided components.

```python
import torch
from diffusers import (
    AutoencoderKL,
    DPMSolverMultistepScheduler,
    StableDiffusionPipeline,
    UNet2DConditionModel,
)
from transformers import CLIPTextModel, CLIPTokenizer

# Assemble the pipeline from the individual components shipped in this repository.
pipeline = StableDiffusionPipeline(
    text_encoder=CLIPTextModel.from_pretrained("path/to/text_encoder"),
    tokenizer=CLIPTokenizer.from_pretrained("path/to/tokenizer"),
    unet=UNet2DConditionModel.from_pretrained("path/to/unet"),
    vae=AutoencoderKL.from_pretrained("path/to/vae"),
    scheduler=DPMSolverMultistepScheduler.from_pretrained("path/to/scheduler"),
    safety_checker=None,           # this bundle ships without a safety checker
    feature_extractor=None,
    requires_safety_checker=False,
)
pipeline.to("cuda")

prompt = "A cinematic portrait of a futuristic astronaut exploring a coral reef"
with torch.autocast("cuda"):
    image = pipeline(prompt, num_inference_steps=25, guidance_scale=7.5).images[0]
```

Replace each `from_pretrained` call with the relative path inside this repository (e.g., `"text_encoder"`). Exported weights follow the standard Diffusers layout, so you can also load the entire pipeline from disk with `StableDiffusionPipeline.from_pretrained(...)` pointed at the repository root if you prefer a single entry point.

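
As a concrete illustration of that single-root loading path, the sketch below loads everything in one call; the local directory name and the fp16/`safety_checker=None` choices are assumptions for the example, not requirements.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load all components (UNet, VAE, text encoder, tokenizer, scheduler) from the
# repository root in one call. "./dee-unlearning-tiny-sd" stands in for wherever
# this repository has been cloned or downloaded.
pipeline = StableDiffusionPipeline.from_pretrained(
    "./dee-unlearning-tiny-sd",
    torch_dtype=torch.float16,   # optional memory saving on CUDA devices
    safety_checker=None,         # no safety checker is bundled with this repo
)
pipeline = pipeline.to("cuda")

image = pipeline("a photo of a lighthouse at dusk", num_inference_steps=25).images[0]
image.save("sample.png")
```
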
## Known limitations

- Not evaluated on a public benchmark: quality, bias, and safety metrics are unknown beyond the original Realistic Vision baseline.
- Outputs inherit the biases of the base dataset, which can include underrepresentation of marginalized groups and a tendency to hallucinate implausible structures or people.
- Prompts that contradict physics, are highly abstract, or request disallowed content may fail or produce unpredictable imagery.
- Fine-tuning beyond the provided weights may require additional safety mitigations, depending on your dataset.

## Opportunities

1. **Research experimentation:** Use this compact bundle to investigate targeted unlearning strategies or dataset pruning without re-downloading massive checkpoints.
2. **Edge deployment:** Swap in a smaller scheduler or reduce `num_inference_steps` to explore speed/quality trade-offs for on-device sampling (see the sketch after this list).
3. **Controlled generation:** Attach additional conditioning (CLIP embeddings, ControlNet) to the pipeline for downstream applications such as assistive art tools, conditional rendering, or creative assistants.

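
As a rough illustration of the edge-deployment point above, the sketch below rebuilds the DPMSolver++ scheduler from the shipped config with Karras sigmas enabled and lowers the step count; the directory name, step counts, and guidance scales are assumptions to experiment with, not tuned recommendations.

```python
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "./dee-unlearning-tiny-sd",  # hypothetical local clone of this repository
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")

# Reuse the shipped scheduler config, overriding one option, then compare a
# fast low-step run against a slower higher-step run.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config, use_karras_sigmas=True
)

prompt = "a product photo of a ceramic mug on a wooden table"
fast = pipeline(prompt, num_inference_steps=12, guidance_scale=6.0).images[0]
slow = pipeline(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
fast.save("mug_12_steps.png")
slow.save("mug_30_steps.png")
```
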
## Safety considerations

- Follow established safety best practices when generating faces, political imagery, or NSFW prompts; the pipeline does not include a safety checker.
- Monitor outputs for deceptive or fabricated content before deployment in public-facing products.
- Don’t use the model to impersonate real people, create harmful memes, or automate disinformation campaigns.

## Attribution & licensing

This work builds on the `SG161222/Realistic_Vision_V4.0` checkpoints and the Diffusers ecosystem. Verify and comply with the upstream license before redistributing or fine-tuning the weights.