# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Z-Image-Turbo is a Gradio-based Hugging Face Space for image generation using the Z-Image diffusion transformer model. It provides a web interface for text-to-image generation with optional prompt enhancement via API.

## Running the Application

**Start the Gradio app:**

```bash
python app.py
```

The app will launch with MCP server support enabled and be accessible via the Gradio interface.
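For orientation, the launch call presumably looks like the minimal sketch below; the `demo` Blocks name and the UI contents are assumptions, while `mcp_server=True` is the standard Gradio flag for enabling MCP server support.

```python
import gradio as gr

# Minimal sketch (names assumed): app.py builds a Blocks UI and launches it
# with MCP server support enabled.
with gr.Blocks() as demo:
    ...  # resolution presets, generation controls, gallery (defined in app.py)

if __name__ == "__main__":
    demo.launch(mcp_server=True)  # exposes the app's functions as MCP tools
```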
## Environment Variables

Set these before running; variables with defaults or marked optional can be omitted (a sketch of how they are presumably read follows the list):

- `MODEL_PATH`: Path or HF model ID (default: "Tongyi-MAI/Z-Image-Turbo")
- `HF_TOKEN`: Hugging Face token for model access (required)
- `DASHSCOPE_API_KEY`: Optional, for the prompt enhancement feature (currently disabled in the UI)
- `ENABLE_COMPILE`: Enable torch.compile optimizations (default: "true")
- `ENABLE_WARMUP`: Warm up the model on startup (default: "true")
- `ATTENTION_BACKEND`: Attention implementation (default: "flash_3")
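A rough sketch of how app.py presumably reads these variables; the exact parsing of the boolean flags is an assumption, and the defaults mirror the list above.

```python
import os

# Assumed configuration handling; defaults mirror the documented values.
MODEL_PATH = os.environ.get("MODEL_PATH", "Tongyi-MAI/Z-Image-Turbo")
HF_TOKEN = os.environ.get("HF_TOKEN")                    # token for model access
DASHSCOPE_API_KEY = os.environ.get("DASHSCOPE_API_KEY")  # optional, prompt enhancement
ENABLE_COMPILE = os.environ.get("ENABLE_COMPILE", "true").lower() == "true"
ENABLE_WARMUP = os.environ.get("ENABLE_WARMUP", "true").lower() == "true"
ATTENTION_BACKEND = os.environ.get("ATTENTION_BACKEND", "flash_3")
```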
## Architecture

### Core Components

**app.py** - Main application file containing:

- Model loading and initialization (`load_models`, `init_app`)
- Image generation pipeline using ZImagePipeline from diffusers
- Gradio UI with resolution presets and generation controls
- Optional prompt enhancement via the DashScope API (currently disabled in the UI)
- Zero GPU integration with AoTI (Ahead-of-Time Inductor) compilation

**pe.py** - Contains `prompt_template` for the prompt expander: a Chinese-language system prompt that guides LLMs to transform user prompts into detailed visual descriptions suitable for image generation models.
### Key Functions

**`generate(prompt, resolution, seed, steps, shift, enhance, random_seed, gallery_images, progress)`** (app.py:366)

- Main generation function, decorated with `@spaces.GPU` (see the skeleton sketch after this list)
- Processes the prompt, applies settings, and generates an image
- Returns the updated gallery and the seed that was used
- The `enhance` parameter is currently disabled in the UI but functional in code
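A hypothetical skeleton based only on the description above; `pipe`, `expander`, and `get_resolution` refer to objects defined elsewhere in app.py, and the argument handling shown is an assumption, not the actual implementation.

```python
import spaces
import torch
from diffusers import FlowMatchEulerDiscreteScheduler

# Hypothetical skeleton of the generation entry point (progress handling omitted).
@spaces.GPU
def generate(prompt, resolution, seed, steps, shift, enhance,
             random_seed, gallery_images, progress=None):
    if random_seed:
        seed = torch.randint(0, 2**32 - 1, (1,)).item()   # fresh seed when requested
    if enhance and expander is not None:                   # disabled in the UI, still wired up
        prompt = expander.expand(prompt)
    width, height = get_resolution(resolution)
    # shift tunes the flow-matching timestep schedule (assumed wiring)
    pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(
        pipe.scheduler.config, shift=shift)
    image = pipe(
        prompt, width=width, height=height, num_inference_steps=steps,
        generator=torch.Generator("cuda").manual_seed(int(seed)),
    ).images[0]
    return (gallery_images or []) + [image], seed
```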
**`load_models(model_path, enable_compile, attention_backend)`** (app.py:100)

- Loads the VAE, text encoder, tokenizer, and transformer (rough loading sketch after this list)
- Applies torch.compile optimizations if enabled
- Configures the attention backend (native/flash_3)
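A rough sketch of the loading path, assuming the components are assembled through `ZImagePipeline.from_pretrained`; the attention-backend call is also an assumption.

```python
import torch
from diffusers import ZImagePipeline  # provided by the custom diffusers fork in requirements.txt

def load_models(model_path, enable_compile=True, attention_backend="flash_3"):
    # Load VAE, text encoder, tokenizer, and transformer as one pipeline (assumed pattern).
    pipe = ZImagePipeline.from_pretrained(model_path, torch_dtype=torch.bfloat16).to("cuda")
    if attention_backend != "native":
        pipe.transformer.set_attention_backend(attention_backend)  # assumed diffusers call
    if enable_compile:
        pipe.transformer = torch.compile(pipe.transformer)  # optional torch.compile path
    return pipe
```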
**`warmup_model(pipe, resolutions)`** (app.py:205)

- Pre-warms the model for all resolution configurations (see the sketch after this list)
- Reduces first-generation latency
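The warm-up presumably just runs one cheap generation per resolution so compiled kernels and attention paths are built before the first user request; a minimal sketch with assumed details:

```python
def warmup_model(pipe, resolutions):
    # One low-step pass per (width, height) so compiled/attention kernels are cached.
    for width, height in resolutions:
        pipe("warmup", width=width, height=height, num_inference_steps=1)
```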
### Resolution System

The app supports two base-resolution categories (1024 and 1280) with multiple aspect ratios:

- 1:1, 9:7, 7:9, 4:3, 3:4, 3:2, 2:3, 16:9, 9:16, 21:9, 9:21
- Resolutions are stored in the `RES_CHOICES` dict and parsed via `get_resolution()` (illustrated in the sketch below)
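The entries below are hypothetical examples illustrating the likely shape of `RES_CHOICES` and `get_resolution()`; the real labels and values live in app.py.

```python
# Hypothetical preset table; the actual dict in app.py may use different labels/values.
RES_CHOICES = {
    "1024": ["1024x1024 (1:1)", "1152x896 (9:7)", "896x1152 (7:9)"],
    "1280": ["1280x1280 (1:1)", "1280x720 (16:9)", "720x1280 (9:16)"],
}

def get_resolution(choice: str) -> tuple[int, int]:
    # "1024x1024 (1:1)" -> (1024, 1024)
    width, height = choice.split(" ")[0].split("x")
    return int(width), int(height)
```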
### Prompt Enhancement (Currently Disabled)

The `PromptExpander` and `APIPromptExpander` classes provide optional prompt enhancement via the DashScope API (see the sketch after this list):

- Backend: OpenAI-compatible API at dashscope.aliyuncs.com
- Model: qwen3-max-preview
- System prompt from `pe.prompt_template` guides detailed visual description generation
- UI controls are commented out, but the underlying code is functional
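A sketch of what the API-backed expander presumably does; the compatible-mode base URL and the class layout are assumptions, while the model name and system prompt come from the description above.

```python
from openai import OpenAI

import pe  # provides prompt_template

class APIPromptExpander:
    """Assumed shape of the DashScope-backed expander."""

    def __init__(self, api_key: str):
        self.client = OpenAI(
            api_key=api_key,
            base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
        )

    def expand(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model="qwen3-max-preview",
            messages=[
                {"role": "system", "content": pe.prompt_template},
                {"role": "user", "content": prompt},
            ],
        )
        return resp.choices[0].message.content
```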
## Dependencies

Install via:

```bash
pip install -r requirements.txt
```

Key dependencies:

- gradio (UI framework)
- torch, transformers, diffusers (ML models)
- spaces (Hugging Face Spaces integration)
- openai (for optional prompt enhancement)
- Custom diffusers fork from GitHub with Z-Image support
## Model Details

- Architecture: Single-stream diffusion transformer (Z-Image)
- Scheduler: FlowMatchEulerDiscreteScheduler with a configurable shift parameter (see the sketch below)
- Precision: bfloat16
- Device: CUDA required
- Attention: Configurable backend (native or flash_3)
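For reference, `FlowMatchEulerDiscreteScheduler` takes the shift directly as a config parameter; a standalone construction (the value shown is arbitrary, not this Space's default):

```python
from diffusers import FlowMatchEulerDiscreteScheduler

# Higher shift pushes the flow-matching timestep schedule toward noisier steps.
scheduler = FlowMatchEulerDiscreteScheduler(shift=3.0)  # arbitrary example value
print(scheduler.config.shift)  # -> 3.0
```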
## Zero GPU Integration

The app uses Hugging Face Spaces Zero GPU features:

- `@spaces.GPU` decorator on the generate function
- AoTI (Ahead-of-Time Inductor) compilation for transformer blocks (app.py:458-459); the general flow is sketched below
- Pre-compiled blocks loaded from "zerogpu-aoti/Z-Image" with the flash_attention_3 variant
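For reference only, the generic ZeroGPU AoTI flow exposed by the `spaces` package (capture a call, export, compile, apply) looks roughly like the sketch below; `pipe` is assumed to be the loaded pipeline, and this Space itself skips the compile step and applies pre-compiled blocks fetched from the "zerogpu-aoti/Z-Image" repo instead.

```python
import spaces
import torch

# Generic ZeroGPU AoTI pattern (illustrative; this Space loads pre-compiled blocks instead).
@spaces.GPU(duration=1500)
def compile_transformer():
    with spaces.aoti_capture(pipe.transformer) as call:   # record one real forward call
        pipe("an example prompt")
    exported = torch.export.export(pipe.transformer, args=call.args, kwargs=call.kwargs)
    return spaces.aoti_compile(exported)                  # AOTInductor compilation

spaces.aoti_apply(compile_transformer(), pipe.transformer)
```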