# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Gradio application that generates images, and multi-panel pages exported as PDF, using the Qwen-Image model with Lightning LoRA acceleration. It is designed to run on Hugging Face Spaces with GPU support and provides fast 8-step image generation with advanced text rendering capabilities.

## Commands

### Run the application locally
```bash
python app.py
```

### Install dependencies
```bash
pip install -r requirements.txt
```

## Architecture

### Core Components

1. **Model Pipeline** (`app.py:130-164`)
   - Uses the `Qwen/Qwen-Image` diffusion model with a custom `FlowMatchEulerDiscreteScheduler`
   - Loads Lightning LoRA weights for 8-step acceleration
   - Configured for bfloat16 precision on CUDA

2. **Prompt Enhancement System** (`app.py:41-125`)
   - `polish_prompt()`: Uses Hugging Face InferenceClient with Cerebras provider to enhance prompts
   - `get_caption_language()`: Detects Chinese vs English prompts
   - `rewrite()`: Language-specific prompt enhancement with different system prompts for Chinese/English
   - Requires `HF_TOKEN` environment variable for API access

3. **Style Presets System** (`app.py:16-87`)
   - `load_style_presets()`: Loads style presets from `style_presets.yaml`
   - `apply_style_preset()`: Applies selected style to prompts
   - Supports custom styles and random style selection
   - Each preset includes prefix, suffix, and negative prompt components

4. **Page Layouts System** (`app.py:89-145`)
   - `load_page_layouts()`: Loads multi-image layouts from `page_layouts.yaml`
   - `get_layout_choices()`: Returns available layouts for a given number of images
   - `get_layout_metadata()`: Extracts panel metadata (type, focus, composition) for each position
   - Supports 1-8 images per page with 5-6 layout variations each
   - Dynamic layout selection based on number of images
   - **Panel Metadata System**: Each panel position includes metadata that describes:
     - `panel_type`: establishing/action/closeup/dialogue/reaction/transition/detail/splash
     - `focus`: environment/character/characters/action/emotion/object/event
     - `composition`: wide/tall/square/portrait/landscape
   - Metadata is used to guide the LLM in generating appropriate scene descriptions

5. **Story Generation System** (`app.py:147-265`)
   - `generate_story_scenes()`: Uses Hugging Face InferenceClient with Qwen3-235B to generate scene descriptions
   - Takes panel metadata as input to generate contextually appropriate content
   - Adapts descriptions based on panel type, focus, and composition
   - Returns structured scene data with captions and dialogue
   - `parse_yaml_scenes()`: Parses LLM output into structured scene data

6. **Image Size Calculation** (`app.py:267-330`)
   - `get_image_size_for_position()`: Calculates precise image dimensions based on layout aspect ratio
   - Uses 8px rounding for model compatibility while maintaining aspect ratio accuracy
   - Ensures images fill their layout containers without floating
   - `get_layout_position_for_image()`: Retrieves position data for a specific panel

7. **PDF Generation** (`app.py:450-540`)
   - `create_single_page_pdf()`: Creates PDF page with images arranged per layout
   - `create_multi_page_pdf()`: Combines multiple pages into a single document
   - Uses ReportLab for high-quality PDF generation
   - Preserves image quality by embedding images as JPEG at quality 95
   - A4 page size with flexible positioning system
   - Smart filling: an image fills its layout slot completely when the image and slot aspect ratios differ by less than 2%

8. **Multi-Image Generation** (`app.py:545-650`)
   - `infer_page()`: Main generation orchestrator
   - Generates multiple images and combines into PDF
   - Progressive generation with status updates
   - Seed management for reproducibility across multiple images
   - Returns PDF file, preview image, and seed information

9. **Gradio Interface** (`app.py:750-900+`)
   - Slider for selecting 1-8 images per page
   - Dynamic layout dropdown that updates based on image count
   - Style preset dropdown with custom style text option
   - PDF download and image preview outputs
   - Advanced settings for all generation parameters
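
As a rough illustration of the Image Size Calculation and PDF smart-filling items above, the sketch below shows one way to derive dimensions from an aspect ratio with 8px rounding, and to check whether two aspect ratios are within 2% of each other. The function names, the constant-area strategy, and the relative-difference definition are assumptions made for illustration; the actual logic in `get_image_size_for_position()` may differ.

```python
import math


def round_to_multiple(value: float, multiple: int = 8) -> int:
    """Round to the nearest multiple, never returning less than one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def size_for_aspect_ratio(aspect_ratio: float, base: int = 1024) -> tuple[int, int]:
    """Hypothetical helper: pick (width, height) for a width/height ratio.

    Assumption: the pixel area stays close to base * base, and both sides
    are snapped to multiples of 8 for model compatibility.
    """
    width = math.sqrt(base * base * aspect_ratio)
    height = width / aspect_ratio
    return round_to_multiple(width), round_to_multiple(height)


def aspect_ratios_match(image_ratio: float, slot_ratio: float, tolerance: float = 0.02) -> bool:
    """Hypothetical check: True when the relative difference is under 2%."""
    return abs(image_ratio - slot_ratio) / slot_ratio < tolerance


if __name__ == "__main__":
    width, height = size_for_aspect_ratio(16 / 9)
    print(width, height, aspect_ratios_match(width / height, 16 / 9))
```

Rounding each side independently shifts the aspect ratio slightly, which is why a small tolerance is useful when deciding whether an image can fill its slot exactly.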

## Key Configuration

- **Scheduler Config** (`app.py:133-148`): Custom configuration for FlowMatchEulerDiscreteScheduler with exponential time shifting
- **Aspect Ratios** (`app.py:170-188`): Predefined aspect ratios optimized for 1024 base resolution
- **Style Presets** (`style_presets.yaml`): Configurable style presets with prompt modifiers and negative prompts
- **Page Layouts** (`page_layouts.yaml`): Flexible layout system for 1-8 images per page
- **Default Settings**: 8 inference steps, guidance scale 1.0, prompt enhancement enabled, 1 image per page
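
To make the style preset structure more concrete, here is a minimal sketch of how a preset with prefix, suffix, and negative prompt components could be applied to a prompt. The preset contents, the dictionary key names, and the function name `apply_style` are hypothetical; the real `apply_style_preset()` and the schema of `style_presets.yaml` may look different.

```python
# Illustrative presets only; the real ones live in style_presets.yaml.
EXAMPLE_PRESETS = {
    "manga": {
        "prefix": "manga style, ",
        "suffix": ", black and white ink, screentones",
        "negative_prompt": "color, photo",
    },
    "none": {"prefix": "", "suffix": "", "negative_prompt": ""},
}


def apply_style(prompt: str, style_name: str, presets: dict = EXAMPLE_PRESETS) -> tuple[str, str]:
    """Return (styled_prompt, negative_prompt) for the chosen preset.

    Unknown style names leave the prompt unchanged (an assumption,
    not necessarily what the app does).
    """
    preset = presets.get(style_name)
    if preset is None:
        return prompt, ""
    styled = f"{preset.get('prefix', '')}{prompt}{preset.get('suffix', '')}"
    return styled, preset.get("negative_prompt", "")


if __name__ == "__main__":
    print(apply_style("a cat on a rooftop", "manga"))
```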

## Environment Variables

- `HF_TOKEN`: Required for prompt enhancement via Hugging Face InferenceClient; it is used to access the Cerebras provider that serves the Qwen3-235B model
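
A small sketch of how code might read `HF_TOKEN` and skip prompt enhancement when it is missing. Falling back to the unmodified prompt is an assumption here, and `get_hf_token` and `maybe_polish` are hypothetical helpers, not functions from `app.py`.

```python
import os
from typing import Callable, Mapping, Optional


def get_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the HF_TOKEN value, or None when it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def maybe_polish(prompt: str, token: Optional[str],
                 polish_fn: Callable[[str, str], str]) -> str:
    """Call polish_fn(prompt, token) only when a token is available.

    Assumption: without a token, the original prompt is used as-is.
    """
    if token is None:
        return prompt
    return polish_fn(prompt, token)


if __name__ == "__main__":
    token = get_hf_token()
    print("prompt enhancement available:", token is not None)
```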

## Key Features

- **Session-based storage**: Each user session gets a unique temporary directory that persists for 24 hours
- **Multi-page PDF generation**: Users can generate up to 128 pages in a single document
- **Dynamic page addition**: Click "Generate page N" to add the next page to the PDF
- **Flexible layouts**: Different layout options for 1-8 images per page
- **Style presets**: 20+ predefined artistic styles
- **Automatic cleanup**: Old sessions are automatically cleaned after 24 hours
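
The session storage and cleanup behaviour described above could be sketched roughly as follows. The directory name `comic_sessions`, the helper names, and the use of directory modification time as the session age are all assumptions for illustration, not the app's actual implementation.

```python
import shutil
import tempfile
import time
import uuid
from pathlib import Path
from typing import Optional

SESSION_ROOT = Path(tempfile.gettempdir()) / "comic_sessions"  # hypothetical location
SESSION_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as stated above


def create_session_dir(root: Path = SESSION_ROOT) -> Path:
    """Create a unique directory for one user session."""
    session_dir = root / uuid.uuid4().hex
    session_dir.mkdir(parents=True, exist_ok=False)
    return session_dir


def cleanup_old_sessions(root: Path = SESSION_ROOT,
                         ttl: float = SESSION_TTL_SECONDS,
                         now: Optional[float] = None) -> int:
    """Delete session directories older than ttl; return how many were removed."""
    if not root.exists():
        return 0
    now = time.time() if now is None else now
    removed = 0
    for entry in root.iterdir():
        # Assumption: the directory's mtime approximates the session's last activity.
        if entry.is_dir() and now - entry.stat().st_mtime > ttl:
            shutil.rmtree(entry, ignore_errors=True)
            removed += 1
    return removed


if __name__ == "__main__":
    session = create_session_dir()
    print("session directory:", session)
    print("removed:", cleanup_old_sessions())
```

Passing `now` explicitly keeps the cleanup logic easy to test without waiting for real time to pass.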

## Model Dependencies

- Main model: `Qwen/Qwen-Image`
- LoRA weights: `lightx2v/Qwen-Image-Lightning` (V1.1 safetensors)
- Prompt enhancement model: `Qwen/Qwen3-235B-A22B-Instruct-2507` via Cerebras