# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a Gradio application for image generation using the Qwen-Image model with Lightning LoRA acceleration. It's designed to run on Hugging Face Spaces with GPU support, providing fast 8-step image generation with advanced text rendering capabilities.
## Commands
### Run the application locally
```bash
python app.py
```
### Install dependencies
```bash
pip install -r requirements.txt
```
## Architecture
### Core Components
1. **Model Pipeline** (`app.py:130-164`)
   - Uses the `Qwen/Qwen-Image` diffusion model with a custom `FlowMatchEulerDiscreteScheduler`
   - Loads Lightning LoRA weights for 8-step acceleration
   - Configured for bfloat16 precision on CUDA
2. **Prompt Enhancement System** (`app.py:41-125`)
   - `polish_prompt()`: Uses the Hugging Face `InferenceClient` with the Cerebras provider to enhance prompts
   - `get_caption_language()`: Detects whether a prompt is Chinese or English
   - `rewrite()`: Language-specific prompt enhancement, with separate system prompts for Chinese and English
   - Requires the `HF_TOKEN` environment variable for API access
3. **Style Presets System** (`app.py:16-87`)
   - `load_style_presets()`: Loads style presets from `style_presets.yaml`
   - `apply_style_preset()`: Applies the selected style to prompts
   - Supports custom styles and random style selection
   - Each preset includes prefix, suffix, and negative prompt components
4. **Page Layouts System** (`app.py:89-145`)
   - `load_page_layouts()`: Loads multi-image layouts from `page_layouts.yaml`
   - `get_layout_choices()`: Returns the available layouts for a given number of images
   - `get_layout_metadata()`: Extracts panel metadata (type, focus, composition) for each position
   - Supports 1-8 images per page with 5-6 layout variations each
   - Dynamic layout selection based on number of images
   - **Panel Metadata System**: Each panel position includes metadata that describes:
     - `panel_type`: establishing/action/closeup/dialogue/reaction/transition/detail/splash
     - `focus`: environment/character/characters/action/emotion/object/event
     - `composition`: wide/tall/square/portrait/landscape
   - Metadata is used to guide the LLM in generating appropriate scene descriptions
5. **Story Generation System** (`app.py:147-265`)
   - `generate_story_scenes()`: Uses the Hugging Face `InferenceClient` with Qwen3-235B to generate scene descriptions
   - Takes panel metadata as input to generate contextually appropriate content
   - Adapts descriptions based on panel type, focus, and composition
   - Returns structured scene data with captions and dialogue
   - `parse_yaml_scenes()`: Parses LLM output into structured scene data
6. **Image Size Calculation** (`app.py:267-330`)
   - `get_image_size_for_position()`: Calculates image dimensions from the layout position's aspect ratio
   - Rounds dimensions to multiples of 8 px for model compatibility while keeping the aspect ratio as close as possible
   - Ensures images fill their layout containers instead of floating inside them with empty margins
   - `get_layout_position_for_image()`: Retrieves position data for a specific panel
7. **PDF Generation** (`app.py:450-540`)
   - `create_single_page_pdf()`: Creates a PDF page with images arranged per layout
   - `create_multi_page_pdf()`: Combines multiple pages into a single document
   - Uses ReportLab for high-quality PDF generation
   - Embeds images as JPEG at quality 95 to preserve image quality
   - A4 page size with a flexible positioning system
   - Smart filling: an image fills its slot completely when the image and slot aspect ratios differ by less than 2%
8. **Multi-Image Generation** (`app.py:545-650`)
   - `infer_page()`: Main generation orchestrator
   - Generates multiple images and combines them into a PDF
   - Progressive generation with status updates
   - Seed management for reproducibility across multiple images
   - Returns the PDF file, a preview image, and seed information
9. **Gradio Interface** (`app.py:750-900+`)
   - Slider for selecting 1-8 images per page
   - Dynamic layout dropdown that updates based on image count
   - Style preset dropdown with custom style text option
   - PDF download and image preview outputs
   - Advanced settings for all generation parameters
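
The Prompt Enhancement System (component 2) only says that `get_caption_language()` detects Chinese versus English prompts and that `rewrite()` picks a system prompt per language. Below is a minimal sketch of that dispatch, assuming detection works by looking for any character in the CJK Unified Ideographs block (U+4E00 to U+9FFF). The `"zh"`/`"en"` return values, `SYSTEM_PROMPTS`, and `pick_system_prompt()` are hypothetical; the heuristic in `app.py` may differ.

```python
# Hypothetical placeholders: the real Chinese and English system prompts
# live in app.py and are much longer.
SYSTEM_PROMPTS = {
    "zh": "<Chinese system prompt>",
    "en": "<English system prompt>",
}


def get_caption_language(prompt: str) -> str:
    """Return "zh" if the prompt contains any CJK Unified Ideograph
    (U+4E00..U+9FFF), otherwise "en"."""
    if any("\u4e00" <= ch <= "\u9fff" for ch in prompt):
        return "zh"
    return "en"


def pick_system_prompt(prompt: str) -> str:
    """Choose the system prompt that matches the prompt's detected language."""
    return SYSTEM_PROMPTS[get_caption_language(prompt)]


if __name__ == "__main__":
    print(get_caption_language("A cat reading a newspaper"))  # en
    print(get_caption_language("一只猫在看报纸"))  # zh
```

A single-character check like this classifies a mixed prompt (for example, an English prompt that asks for Chinese text on a sign) as Chinese; whether the real code behaves the same way is not stated here.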
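
For the Style Presets System (component 3), each preset is described as a prefix, a suffix, and a negative prompt, with custom and random styles also supported. A minimal sketch of `apply_style_preset()` under those assumptions follows. The preset names, the `prefix`/`suffix`/`negative_prompt` keys, and the `"None"`/`"Custom"`/`"Random"` choice labels are hypothetical. A plain dict stands in for the parsed `style_presets.yaml` so the sketch runs without PyYAML.

```python
import random
from typing import Optional, Tuple

# Hypothetical stand-in for the parsed contents of style_presets.yaml; the
# real preset names and key names may differ.
STYLE_PRESETS = {
    "Watercolor": {
        "prefix": "watercolor painting of",
        "suffix": "soft washes, visible paper texture",
        "negative_prompt": "photorealistic, harsh outlines",
    },
    "Noir Comic": {
        "prefix": "black and white noir comic panel of",
        "suffix": "heavy ink shadows, high contrast",
        "negative_prompt": "color, pastel tones",
    },
}


def apply_style_preset(prompt: str, style: str, custom_style: str = "",
                       rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Return (styled_prompt, negative_prompt).

    "None" leaves the prompt unchanged, "Custom" appends free-form style text,
    "Random" picks one preset, and any other value names a preset directly.
    """
    if style == "None":
        return prompt, ""
    if style == "Custom":
        return (f"{prompt}, {custom_style}" if custom_style else prompt), ""
    if style == "Random":
        style = (rng or random).choice(sorted(STYLE_PRESETS))
    preset = STYLE_PRESETS[style]
    # Prefix reads into the prompt ("watercolor painting of a fox");
    # the suffix is appended as a comma-separated modifier.
    styled = " ".join(p for p in (preset.get("prefix", ""), prompt) if p)
    if preset.get("suffix"):
        styled = f"{styled}, {preset['suffix']}"
    return styled, preset.get("negative_prompt", "")


if __name__ == "__main__":
    print(apply_style_preset("a fox in the snow", "Watercolor"))
```

Passing a seeded `random.Random` as `rng` makes the random pick reproducible, which is convenient in tests.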
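
The Image Size Calculation (component 6) is described as matching each panel's aspect ratio while rounding to 8 px. One way to do that, sketched here under the assumption that the target pixel area stays at the 1024 x 1024 base resolution, is to solve for width and height from the area and the ratio and then snap each side. `size_for_panel()` and `snap()` are hypothetical names, not the actual `get_image_size_for_position()`.

```python
import math
from typing import Tuple


def snap(value: float, multiple: int = 8) -> int:
    """Round to the nearest multiple of `multiple`, never below one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def size_for_panel(panel_w: float, panel_h: float, base: int = 1024,
                   multiple: int = 8) -> Tuple[int, int]:
    """Pick a pixel size whose area is about base*base and whose aspect ratio
    matches the panel's, with both sides snapped to `multiple` pixels."""
    aspect = panel_w / panel_h
    width = math.sqrt(base * base * aspect)  # width * (width / aspect) == base**2
    height = width / aspect
    return snap(width, multiple), snap(height, multiple)


if __name__ == "__main__":
    print(size_for_panel(1, 1))  # (1024, 1024)
    print(size_for_panel(2, 1))  # (1448, 728)
```

Snapping each side independently can move the ratio slightly (1448 / 728 is about 1.989 rather than 2.0, roughly a 0.5% error), which is the kind of small mismatch a tolerance in the PDF layout step can absorb.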
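
Smart filling in PDF Generation (component 7) stretches an image to fill its slot only when the aspect ratios differ by less than 2%. A minimal sketch of that placement rule follows. It assumes the 2% is measured relative to the slot's ratio and that mismatched images are scaled to fit and centered. `place_image()` is a hypothetical helper, not a function from `app.py`.

```python
from typing import Tuple


def place_image(img_w: float, img_h: float, box_x: float, box_y: float,
                box_w: float, box_h: float,
                tolerance: float = 0.02) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) for drawing an image in a layout box.

    If the aspect ratios differ by less than `tolerance` (relative to the
    box's ratio), the image is stretched to fill the box exactly; otherwise it
    is scaled to fit inside the box and centered.
    """
    img_ar = img_w / img_h
    box_ar = box_w / box_h
    if abs(img_ar - box_ar) / box_ar < tolerance:
        return box_x, box_y, box_w, box_h
    if img_ar > box_ar:  # relatively wider image: match the box width
        w, h = box_w, box_w / img_ar
    else:  # relatively taller image: match the box height
        w, h = box_h * img_ar, box_h
    return box_x + (box_w - w) / 2, box_y + (box_h - h) / 2, w, h


if __name__ == "__main__":
    print(place_image(1448, 728, 0, 0, 200, 100))   # ratios within 2%: fills box
    print(place_image(1024, 1024, 0, 0, 200, 100))  # square image: centered
```

The returned rectangle can be passed to ReportLab's `Canvas.drawImage(image, x, y, width=w, height=h)`. Because the centering is symmetric, it works the same with ReportLab's bottom-left origin.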
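
Multi-Image Generation (component 8) mentions seed management for reproducibility but not the scheme. A common approach, assumed here rather than confirmed for `infer_page()`, is to derive each panel's seed from one base seed. Reporting the base seed is then enough to regenerate the whole page. `page_seeds()`, `MAX_SEED`, and the `randomize` flag are hypothetical.

```python
import random
from typing import List, Optional

MAX_SEED = 2**31 - 1  # hypothetical upper bound for user-visible seeds


def page_seeds(seed: int, num_images: int, randomize: bool,
               rng: Optional[random.Random] = None) -> List[int]:
    """Derive one seed per panel from a single base seed.

    Panel i always uses (base + i), wrapped into [0, MAX_SEED], so reporting
    the base seed is enough to regenerate every panel on the page.
    """
    if randomize:
        seed = (rng or random).randint(0, MAX_SEED)
    return [(seed + i) % (MAX_SEED + 1) for i in range(num_images)]


if __name__ == "__main__":
    print(page_seeds(42, 4, randomize=False))  # [42, 43, 44, 45]
```

With diffusers, each seed would typically become a `torch.Generator(device="cuda").manual_seed(seed)` passed as the pipeline's `generator` argument.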
## Key Configuration
- **Scheduler Config** (`app.py:133-148`): Custom configuration for FlowMatchEulerDiscreteScheduler with exponential time shifting
- **Aspect Ratios** (`app.py:170-188`): Predefined aspect ratios optimized for 1024 base resolution
- **Style Presets** (`style_presets.yaml`): Configurable style presets with prompt modifiers and negative prompts
- **Page Layouts** (`page_layouts.yaml`): Flexible layout system for 1-4 images per page
- **Default Settings**: 8 inference steps, guidance scale 1.0, prompt enhancement enabled, 1 image per page
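
The scheduler config above is only summarized as using exponential time shifting. Assuming it follows the formula in diffusers' `FlowMatchEulerDiscreteScheduler` when `time_shift_type="exponential"` and dynamic shifting are enabled, each noise level `t` is warped to `exp(mu) / (exp(mu) + (1/t - 1) ** sigma)`, where `mu` is interpolated linearly from the image token count. The sketch below reimplements both pieces in plain Python. The demo's parameter values are assumptions, not values read from `app.py`: base and max shift both set to log 3 (as in the Qwen-Image-Lightning reference config) and 4,096 tokens for a 1024 x 1024 image.

```python
import math


def calculate_shift(image_seq_len: int, base_seq_len: int, max_seq_len: int,
                    base_shift: float, max_shift: float) -> float:
    """Linearly interpolate mu from the number of image tokens."""
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b


def time_shift_exponential(mu: float, sigma: float, t: float) -> float:
    """Exponential time shift of a noise level t in (0, 1]."""
    return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)


if __name__ == "__main__":
    # Assumed values: base_shift == max_shift == log(3); 4096 image tokens.
    mu = calculate_shift(4096, 256, 8192, math.log(3), math.log(3))
    steps = 8
    sigmas = [1 - i / steps for i in range(steps)]  # 1.0, 0.875, ..., 0.125
    print([round(time_shift_exponential(mu, 1.0, s), 3) for s in sigmas])
```

When base and max shift are equal, `mu` no longer depends on resolution and the warp reduces to `3t / (1 + 2t)`, i.e. a fixed shift of 3 that keeps more of the 8 steps at high noise levels.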
## Environment Variables
- `HF_TOKEN`: Required for prompt enhancement via the Hugging Face `InferenceClient`
  - Used to access the Qwen3-235B model through the Cerebras provider
## Key Features
- **Session-based storage**: Each user session gets a unique temporary directory that persists for 24 hours
- **Multi-page PDF generation**: Users can generate up to 128 pages in a single document
- **Dynamic page addition**: Click "Generate page N" to add the next page to the PDF
- **Flexible layouts**: Different layout options for 1-4 images per page
- **Style presets**: 20+ predefined artistic styles
- **Automatic cleanup**: Old sessions are automatically cleaned after 24 hours
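
Session storage with 24-hour cleanup can be sketched with the standard library alone. This assumes one `session_<id>` directory per user under a shared root and uses the directory's modification time as its age. The naming scheme, `create_session_dir()`, and `cleanup_old_sessions()` are hypothetical and not taken from `app.py`.

```python
import os
import shutil
import tempfile
import time
import uuid
from typing import List, Optional

SESSION_TTL_SECONDS = 24 * 60 * 60  # 24 hours, per the description above


def create_session_dir(root: str) -> str:
    """Create a fresh, uniquely named directory for one user session."""
    path = os.path.join(root, f"session_{uuid.uuid4().hex}")
    os.makedirs(path)
    return path


def cleanup_old_sessions(root: str, ttl: float = SESSION_TTL_SECONDS,
                         now: Optional[float] = None) -> List[str]:
    """Delete session directories last modified more than `ttl` seconds ago
    and return their names."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not (name.startswith("session_") and os.path.isdir(path)):
            continue  # leave unrelated files and folders alone
        if now - os.path.getmtime(path) > ttl:
            shutil.rmtree(path, ignore_errors=True)
            removed.append(name)
    return removed


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    create_session_dir(root)
    print(cleanup_old_sessions(root))  # [] because the session is brand new
    shutil.rmtree(root)
```

A directory's mtime only changes when entries are added or removed directly inside it, so this treats "last page added" as the session's last activity. How often the real app runs cleanup (on startup, per request, or on a timer) is not described here.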
## Model Dependencies
- Main model: `Qwen/Qwen-Image`
- LoRA weights: `lightx2v/Qwen-Image-Lightning` (V1.1 safetensors)
- Prompt enhancement model: `Qwen/Qwen3-235B-A22B-Instruct-2507` via Cerebras