# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
Z-Image-Turbo is a Gradio-based Hugging Face Space for image generation using the Z-Image diffusion transformer model. It provides a web interface for text-to-image generation with optional prompt enhancement via API.
## Running the Application
**Start the Gradio app:**
```bash
python app.py
```
The app will launch with MCP server support enabled and be accessible via the Gradio interface.
## Environment Variables
Configuration is read from environment variables (set as needed before running; only `HF_TOKEN` has no default):
- `MODEL_PATH`: Path or HF model ID (default: "Tongyi-MAI/Z-Image-Turbo")
- `HF_TOKEN`: Hugging Face token for model access
- `DASHSCOPE_API_KEY`: Optional, for prompt enhancement feature (currently disabled in UI)
- `ENABLE_COMPILE`: Enable torch.compile optimizations (default: "true")
- `ENABLE_WARMUP`: Warmup model on startup (default: "true")
- `ATTENTION_BACKEND`: Attention implementation (default: "flash_3")
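The defaults above suggest a configuration reader along these lines (a minimal sketch; the helper name `load_config` and the dict keys are illustrative, not taken from app.py):

```python
import os

def load_config(env=os.environ) -> dict:
    """Resolve app settings from environment variables, using the
    defaults listed above. Boolean flags are the strings "true"/"false"."""
    return {
        "model_path": env.get("MODEL_PATH", "Tongyi-MAI/Z-Image-Turbo"),
        "hf_token": env.get("HF_TOKEN"),                    # no default
        "dashscope_api_key": env.get("DASHSCOPE_API_KEY"),  # optional
        "enable_compile": env.get("ENABLE_COMPILE", "true").lower() == "true",
        "enable_warmup": env.get("ENABLE_WARMUP", "true").lower() == "true",
        "attention_backend": env.get("ATTENTION_BACKEND", "flash_3"),
    }
```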
## Architecture
### Core Components
**app.py** - Main application file containing:
- Model loading and initialization (`load_models`, `init_app`)
- Image generation pipeline using ZImagePipeline from diffusers
- Gradio UI with resolution presets and generation controls
- Optional prompt enhancement via DashScope API (currently disabled in UI)
- Zero GPU integration with AoTI (Ahead of Time Inductor) compilation
**pe.py** - Contains `prompt_template` for the prompt expander, a Chinese language system prompt that guides LLMs to transform user prompts into detailed visual descriptions suitable for image generation models.
### Key Functions
**`generate(prompt, resolution, seed, steps, shift, enhance, random_seed, gallery_images, progress)`** (app.py:366)
- Main generation function decorated with `@spaces.GPU`
- Processes the prompt, applies generation settings, and produces an image
- Returns the updated gallery and the seed that was used
- The `enhance` parameter is currently disabled in the UI but functional in code
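The seed handling implied by the `seed`/`random_seed` pair can be sketched as follows (hypothetical helper name; the real logic lives inline in `generate`):

```python
import random

def resolve_seed(seed: int, random_seed: bool) -> int:
    """Pick the seed actually used: a fresh random one when requested,
    otherwise the user-supplied value (returned so the UI can show it)."""
    if random_seed:
        return random.randint(0, 2**32 - 1)
    return int(seed)
```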
**`load_models(model_path, enable_compile, attention_backend)`** (app.py:100)
- Loads VAE, text encoder, tokenizer, and transformer
- Applies torch.compile optimizations if enabled
- Configures attention backend (native/flash_3)
**`warmup_model(pipe, resolutions)`** (app.py:205)
- Pre-warms model for all resolution configurations
- Reduces first-generation latency
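Conceptually, warmup just runs one cheap generation per configured resolution so compiled kernels and caches are built before the first real request (a sketch under that assumption; the actual `warmup_model` at app.py:205 may pass different arguments):

```python
def warmup_model(pipe, resolutions):
    """Run a 1-step generation at each (width, height) so compiled
    kernels specialize before the first user request."""
    for width, height in resolutions:
        pipe(prompt="warmup", width=width, height=height,
             num_inference_steps=1)
```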
### Resolution System
The app supports two resolution categories (1024 and 1280) with multiple aspect ratios:
- 1:1, 9:7, 7:9, 4:3, 3:4, 3:2, 2:3, 16:9, 9:16, 21:9, 9:21
- Resolutions are stored in `RES_CHOICES` dict and parsed via `get_resolution()`
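A minimal sketch of that lookup (the concrete width/height values below are illustrative; only the `RES_CHOICES` and `get_resolution` names come from the source):

```python
# Example entries only -- the real table covers all eleven aspect ratios.
RES_CHOICES = {
    "1024": {"1:1": (1024, 1024), "16:9": (1344, 768)},
    "1280": {"1:1": (1280, 1280), "16:9": (1536, 864)},
}

def get_resolution(category: str, aspect: str) -> tuple:
    """Map a category ("1024" or "1280") and an aspect-ratio label
    to a (width, height) pair."""
    return RES_CHOICES[category][aspect]
```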
### Prompt Enhancement (Currently Disabled)
The `PromptExpander` and `APIPromptExpander` classes provide optional prompt enhancement via DashScope API:
- Backend: OpenAI-compatible API at dashscope.aliyuncs.com
- Model: qwen3-max-preview
- System prompt from `pe.prompt_template` guides detailed visual description generation
- UI controls are commented out but underlying code is functional
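Since the endpoint is OpenAI-compatible, the expander's request can be sketched as a plain chat-completion payload (the helper below is illustrative, not the actual `APIPromptExpander` code; the base URL in the comment is DashScope's documented compatible-mode endpoint):

```python
def build_enhance_request(user_prompt: str, system_prompt: str) -> dict:
    """Assemble the chat-completion payload for the prompt expander."""
    return {
        "model": "qwen3-max-preview",
        "messages": [
            {"role": "system", "content": system_prompt},  # pe.prompt_template
            {"role": "user", "content": user_prompt},
        ],
    }

# Sending it would look roughly like:
#   from openai import OpenAI
#   client = OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"],
#                   base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")
#   reply = client.chat.completions.create(**build_enhance_request(p, sys_p))
```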
## Dependencies
Install via:
```bash
pip install -r requirements.txt
```
Key dependencies:
- gradio (UI framework)
- torch, transformers, diffusers (ML models)
- spaces (Hugging Face Spaces integration)
- openai (for optional prompt enhancement)
- Custom diffusers fork from GitHub with Z-Image support
## Model Details
- Architecture: Single-stream diffusion transformer (Z-Image)
- Scheduler: FlowMatchEulerDiscreteScheduler with configurable shift parameter
- Precision: bfloat16
- Device: CUDA required
- Attention: Configurable backend (native or flash_3)
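For intuition about the `shift` control: flow-matching schedulers such as FlowMatchEulerDiscreteScheduler apply a time shift that keeps sampling at higher noise levels for more of the trajectory. The formula below is the generic shift used by that scheduler family, shown for reference rather than taken from this repo:

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Time-shifted noise level: shift=1 is the identity; larger
    shift values bias the schedule toward higher noise."""
    return shift * sigma / (1 + (shift - 1) * sigma)
```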
## Zero GPU Integration
The app uses Hugging Face Spaces Zero GPU features:
- `@spaces.GPU` decorator on generate function
- AoTI (Ahead of Time Inductor) compilation for transformer blocks (app.py:458-459)
- Pre-compiled blocks loaded from "zerogpu-aoti/Z-Image" with flash_attention_3 variant