# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Z-Image-Turbo is a Gradio-based Hugging Face Space for image generation using the Z-Image diffusion transformer model. It provides a web interface for text-to-image generation with optional prompt enhancement via API.

## Running the Application

**Start the Gradio app:**
```bash
python app.py
```

The app will launch with MCP server support enabled and be accessible via the Gradio interface.

## Environment Variables

Environment variables read at startup (set as needed before running; defaults shown where they exist):

- `MODEL_PATH`: Path or HF model ID (default: "Tongyi-MAI/Z-Image-Turbo")
- `HF_TOKEN`: Hugging Face token for model access
- `DASHSCOPE_API_KEY`: Optional, for prompt enhancement feature (currently disabled in UI)
- `ENABLE_COMPILE`: Enable torch.compile optimizations (default: "true")
- `ENABLE_WARMUP`: Warmup model on startup (default: "true")
- `ATTENTION_BACKEND`: Attention implementation (default: "flash_3")
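
The defaults above suggest configuration handling along these lines. This is an illustrative sketch only; `read_config` is a hypothetical helper, and app.py's actual parsing may differ in names and coercion rules:

```python
import os

def read_config() -> dict:
    """Collect the environment variables listed above, applying the
    documented defaults. Illustrative only, not app.py's real code."""
    truthy = lambda v: v.strip().lower() in ("1", "true", "yes")
    return {
        "model_path": os.environ.get("MODEL_PATH", "Tongyi-MAI/Z-Image-Turbo"),
        "hf_token": os.environ.get("HF_TOKEN"),                    # no default
        "dashscope_api_key": os.environ.get("DASHSCOPE_API_KEY"),  # optional
        "enable_compile": truthy(os.environ.get("ENABLE_COMPILE", "true")),
        "enable_warmup": truthy(os.environ.get("ENABLE_WARMUP", "true")),
        "attention_backend": os.environ.get("ATTENTION_BACKEND", "flash_3"),
    }
```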

## Architecture

### Core Components

**app.py** - Main application file containing:
- Model loading and initialization (`load_models`, `init_app`)
- Image generation pipeline using ZImagePipeline from diffusers
- Gradio UI with resolution presets and generation controls
- Optional prompt enhancement via DashScope API (currently disabled in UI)
- Zero GPU integration with AoTI (Ahead of Time Inductor) compilation

**pe.py** - Contains `prompt_template` for the prompt expander, a Chinese language system prompt that guides LLMs to transform user prompts into detailed visual descriptions suitable for image generation models.

### Key Functions

**`generate(prompt, resolution, seed, steps, shift, enhance, random_seed, gallery_images, progress)`** (app.py:366)
- Main generation function decorated with `@spaces.GPU`
- Processes prompt, applies settings, generates image
- Returns the updated gallery and the seed that was used
- The `enhance` parameter is currently disabled in the UI but functional in code

**`load_models(model_path, enable_compile, attention_backend)`** (app.py:100)
- Loads VAE, text encoder, tokenizer, and transformer
- Applies torch.compile optimizations if enabled
- Configures attention backend (native/flash_3)

**`warmup_model(pipe, resolutions)`** (app.py:205)
- Pre-warms model for all resolution configurations
- Reduces first-generation latency
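
The warmup step amounts to one short generation per configured size. A hypothetical sketch of the idea (not the code at app.py:205):

```python
def warmup_model(pipe, resolutions):
    """Run one short generation per (width, height) pair so CUDA kernels
    and any compiled graphs are built before the first user request."""
    for width, height in resolutions:
        pipe("warmup", width=width, height=height, num_inference_steps=1)
```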

### Resolution System

The app supports two base-resolution categories (1024 and 1280), each with multiple aspect ratios:
- 1:1, 9:7, 7:9, 4:3, 3:4, 3:2, 2:3, 16:9, 9:16, 21:9, 9:21
- Resolutions are stored in `RES_CHOICES` dict and parsed via `get_resolution()`
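
The lookup can be pictured as a nested dict keyed by category and ratio. The names mirror the source, but the entries below are abbreviated and the non-square pixel pairs are assumptions, not app.py's actual values:

```python
# Illustrative shape of the lookup; the real RES_CHOICES covers every
# aspect ratio listed above.
RES_CHOICES = {
    "1024": {"1:1": (1024, 1024), "4:3": (1152, 864), "3:4": (864, 1152)},
    "1280": {"1:1": (1280, 1280), "16:9": (1664, 928)},
}

def get_resolution(category: str, ratio: str) -> tuple[int, int]:
    """Map a category/aspect-ratio selection to a (width, height) pair."""
    return RES_CHOICES[category][ratio]
```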

### Prompt Enhancement (Currently Disabled)

The `PromptExpander` and `APIPromptExpander` classes provide optional prompt enhancement via DashScope API:
- Backend: OpenAI-compatible API at dashscope.aliyuncs.com
- Model: qwen3-max-preview
- System prompt from `pe.prompt_template` guides detailed visual description generation
- UI controls are commented out but underlying code is functional
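
The enhancement path can be sketched with the standard OpenAI client pointed at DashScope's compatible-mode endpoint. Treat this as a sketch: the base URL and request shape are assumptions, and the real implementation lives in `APIPromptExpander`:

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Pair the system prompt (pe.prompt_template) with the user's raw prompt."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def enhance_prompt(user_prompt: str, system_prompt: str, api_key: str) -> str:
    # Imported here so build_messages stays usable without the openai package.
    from openai import OpenAI

    client = OpenAI(
        api_key=api_key,
        # DashScope's OpenAI-compatible endpoint; exact path is an assumption.
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    resp = client.chat.completions.create(
        model="qwen3-max-preview",
        messages=build_messages(system_prompt, user_prompt),
    )
    return resp.choices[0].message.content
```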

## Dependencies

Install via:
```bash
pip install -r requirements.txt
```

Key dependencies:
- gradio (UI framework)
- torch, transformers, diffusers (ML models)
- spaces (Hugging Face Spaces integration)
- openai (for optional prompt enhancement)
- Custom diffusers fork from GitHub with Z-Image support

## Model Details

- Architecture: Single-stream diffusion transformer (Z-Image)
- Scheduler: FlowMatchEulerDiscreteScheduler with configurable shift parameter
- Precision: bfloat16
- Device: CUDA required
- Attention: Configurable backend (native or flash_3)
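
The `shift` parameter remaps the noise schedule. Flow-matching schedulers apply a transform of this general form; the exact expression is stated here as an assumption about the scheduler's internals, not copied from its source:

```python
def apply_shift(sigma: float, shift: float) -> float:
    """Time-shift remapping for a flow-matching schedule:
    sigma' = shift * sigma / (1 + (shift - 1) * sigma).
    Larger shift keeps more of the trajectory at high noise; shift=1 is identity."""
    return shift * sigma / (1 + (shift - 1) * sigma)
```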

## Zero GPU Integration

The app uses Hugging Face Spaces Zero GPU features:
- `@spaces.GPU` decorator on generate function
- AoTI (Ahead of Time Inductor) compilation for transformer blocks (app.py:458-459)
- Pre-compiled blocks loaded from "zerogpu-aoti/Z-Image" with flash_attention_3 variant
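
A common pattern for applying `@spaces.GPU` so the same file also runs outside Spaces looks like the sketch below. This is illustrative: `gpu_decorator` and `generate_image` are hypothetical names, and app.py may simply apply the decorator directly:

```python
try:
    import spaces
    gpu_decorator = spaces.GPU
except ImportError:
    # Outside Hugging Face Spaces the `spaces` package may be absent;
    # fall back to a no-op supporting both @gpu_decorator and
    # @gpu_decorator(duration=...) forms.
    def gpu_decorator(fn=None, **kwargs):
        if fn is None:
            return lambda f: f
        return fn

@gpu_decorator
def generate_image(prompt: str) -> str:
    # Placeholder body; the real generate() runs the diffusion pipeline.
    return f"generated: {prompt}"
```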