NewBie-AI/NewBie-image-Exp0.1

+prompt = f"""
+                Please annotate each character in the image and provide image tag information in JSON format, along with a detailed description of the scene.
+                【Character Identification and Annotation】
+                1. Create a bounding box (bbox) for each character, formatted as [bottom-left x coordinate, bottom-left y coordinate, top-right x coordinate, top-right y coordinate]
+                2. The bounding box should precisely contain the entire character, neither too large nor too small
+                3. Character names are temporarily unknown, please use placeholders like $character_1$, $character_2$, etc.
+                【Overall Image Analysis】
+                1. After analyzing the positions of all characters, provide an overall description of the image, including both tags and caption sections
+                2. Tags section:
+                   - Reorganize based on the original tags provided in the <tags> content
+                   - Group tags by character using structured XML format:
+                   <character_1>
+                   <n>$character_1$</n>
+                   <gender>1girl/1boy</gender>
+                   <appearance>facial features, hair color, hair style, eye color, skin tone, age appearance, etc.</appearance>
+                   <clothing>clothing type, color, style, accessories, footwear, etc.</clothing>
+                   <body_type>height, build, physical characteristics, etc.</body_type>
+                   <expression>facial expression, emotional state, mood, etc.</expression>
+                   <action>current pose, movement, gesture, activity, etc.</action>
+                   <interaction>interaction with other characters, objects, or environment</interaction>
+                   <position>precise position in image (center, left, right, foreground, background, etc.)</position>
+                   </character_1>
+                   - Use structured XML format for general tags in <general_tags>:
+                     * <count>: Overall character count (1girl, 2girls, 3girls, 1boy, multiple boys, etc.)
+                     * <artists>: Artist name, art style attribution, creator information
+                     * <style>: Art style (anime style, watercolor, oil painting, digital art, realistic, etc.)
+                     * <background>: Background type (indoor, outdoor, landscape, cityscape, abstract, etc.)
+                     * <environment>: Specific environment (room, forest, city, beach, school, office, etc.)
+                     * <perspective>: Viewpoint (from above, from below, side view, close-up, wide shot, etc.)
+                     * <atmosphere>: Mood and atmosphere (dark, bright, moody, cheerful, romantic, mysterious, etc.)
+                     * <lighting>: Lighting conditions (natural light, artificial light, sunset, candlelight, neon, etc.)
+                     * <quality>: Image quality tags (high resolution, masterpiece, best quality, detailed, etc.)
+                     * <objects>: Important objects (furniture, decorations, tools, vehicles, weapons, etc.)
+                     * <other>: Any other scene-related tags not covered in above categories
+                   - ALWAYS include a gender tag (1girl, 1boy, etc.) for each character in that character's <gender> tag
+                   - Character information should be precise and detailed; if there are conflicting tags, select the most accurate ones based on your judgment
+                   - Add detailed information about character positions in the image - this is EXTREMELY important
+                   - Use professional prompt word format for tags, not natural language
+                   - If you observe important elements in the image that are not included in the <tags>, add appropriate tags
+                   - All tags must be in XML format for consistency and easy parsing
+                   - **IMPORTANT: If any XML tag is not applicable or not visible in the image, you can omit that tag entirely. Only include tags that are relevant and accurately describe what you can observe in the image.**
+                3. Caption section (this is the most important part):
+                   - Use natural language to describe the entire image content in extreme detail
+                   - Refer to each character by name (using placeholders like $character_1$, etc.)
+                   - Clearly indicate each character's gender
+                   - Describe each character's clothing in detail, including clothing type, color, style, accessories, etc.
+                   - Describe each character's actions and poses in detail
+                   - Describe each character's expressions and emotional states in detail
+                   - Describe each character's precise position in the image (e.g., "$character_1$ is positioned in the center-left of the frame")
+                   - Describe the relative positioning between characters in extreme detail (e.g., "$character_1$ is standing to the left of $character_2$, slightly in front")
+                   - Describe how characters interact with scene elements in detail (e.g., "$character_1$ is sitting on the bed")
+                   - Describe the background and environmental details in detail (furniture, decorations, etc.)
+                   - Describe the overall atmosphere, lighting, color tones, perspective, artistic style, and other visual effects of the image in detail
+                   - Make sure to describe the entire scene in a coherent, flowing narrative rather than simply listing points
+                   - Ensure the description covers all important visual elements in the image
+                   - If adult content exists, it is also very important to accurately analyze the adult content! Describe in detail the sexual activities currently taking place, such as positions, oral sex, M-legs, etc., as well as actions such as holding the penis, spreading the vagina, anus, urethra, etc., and other descriptions such as a toy inserted into the vagina, a vibrator inserted into the vagina, and fondling the vagina. These can be found in <tags>. your description should be very detail on this content!!!!!!!!
+                   - Your description should cover all the details in the picture and be accurate and objective! Describe only the elements in the picture; do not interpret them or say what the image feels like.
+                【Format Requirements and Flexibility】
+                1. The JSON format must be maintained correctly
+                2. While maintaining consistent format, you can flexibly supplement tags and descriptions
+                3. Tag format must remain consistent, but content can be adjusted according to the actual situation in the image
+                4. **XML Tag Flexibility: You can omit any XML tags that are not applicable, not visible, or not relevant to the specific image. For example:**
+                   - If a character's expression is not clearly visible, omit the <expression> tag
+                   - If there are no notable objects in the scene, omit the <objects> tag
+                   - If the artist is unknown or not identifiable, omit the <artists> tag
+                   - Only include tags that accurately describe observable elements in the image
+                5. The caption must be extremely detailed and comprehensive, with a length of at least 200 words
+                6. You must ensure your output is entirely in English!
+                **Your output should cover as many elements and as much content in the image as possible! The more detailed, the better!**
+                【Output Format】
+                Replace the placeholder text inside each tag and field below with actual content
+                Please output strictly according to the following JSON format, without adding any other content:
+                Note: You can omit any XML tags that are not applicable or not clearly observable in the image.
+                {{
+                  "character_1": {{
+                    "bbox": [x1, y1, x2, y2],
+                    "name": "$character_1$"
+                  }},
+                  "character_2": {{
+                    "bbox": [x3, y3, x4, y4],
+                    "name": "$character_2$"
+                  }},
+                  // Add more characters in this format as needed
+                  "image": {{
+                    "tags": "
+                    <character_1>
+                    <n>$character_1$</n>
+                    <gender>1girl</gender>
+                    <appearance>detailed appearance description</appearance>
+                    <clothing>detailed clothing description</clothing>
+                    <body_type>body type description</body_type>
+                    <expression>expression description</expression>
+                    <action>action description</action>
+                    <interaction>interaction description</interaction>
+                    <position>position description</position>
+                    </character_1>
+                    <character_2>
+                    <n>$character_2$</n>
+                    <gender>1boy</gender>
+                    <appearance>detailed appearance description</appearance>
+                    <clothing>detailed clothing description</clothing>
+                    <body_type>body type description</body_type>
+                    <expression>expression description</expression>
+                    <action>action description</action>
+                    <interaction>interaction description</interaction>
+                    <position>position description</position>
+                    </character_2>
+                    <general_tags>
+                    <count>1girl, 1boy, multiple characters, etc.</count>
+                    <artists>artist name, art style attribution, etc.</artists>
+                    <style>anime style, watercolor, oil painting, digital art, etc.</style>
+                    <background>indoor, outdoor, landscape, cityscape, etc.</background>
+                    <environment>room, forest, city, beach, school, etc.</environment>
+                    <perspective>from above, from below, side view, close-up, etc.</perspective>
+                    <atmosphere>dark, bright, moody, cheerful, romantic, etc.</atmosphere>
+                    <lighting>natural light, artificial light, sunset, candlelight, etc.</lighting>
+                    <quality>high resolution, masterpiece, best quality, etc.</quality>
+                    <objects>furniture, decorations, tools, vehicles, etc.</objects>
+                    <other>any other scene-related tags not covered above</other>
+                    </general_tags>
+                    // Note: Omit any tags above that are not applicable to the specific image
+                    ",
+                    "caption": "Extremely detailed description of the scene, including all characters' names, genders, appearances, clothing, actions, expressions, precise positions, relative positions, as well as scene background, environment, atmosphere, lighting, objects, perspective, artistic style, etc. The description should be extremely thorough, with vivid details, at least 200 words. Output in English."
+                  }}
+                }}
+                <tags>
+                {tags}
+                </tags>
+                """
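A reply that follows the layout requested above could be read back with something like the sketch below. It is not part of the repository: `parse_annotation` and the `sample` reply are hypothetical names made up for illustration. It assumes the model returns the JSON without the `//` comment lines and keeps literal line breaks inside the `"tags"` string, which is why `json.loads` is called with `strict=False`. The prompt does not say which way the y axis points, so only the x ordering of the bbox is checked.

```python
import json
import re


def parse_annotation(raw: str) -> dict:
    """Parse a reply shaped like the JSON template in the prompt (illustrative sketch)."""
    # strict=False lets the json module accept raw newlines inside string values,
    # which the multi-line "tags" field in the template would otherwise violate.
    data = json.loads(raw, strict=False)
    image = data.pop("image")

    # Split the tags string into <character_N> and <general_tags> blocks,
    # then read the flat child tags inside each block.
    blocks = {}
    for name, body in re.findall(
        r"<(character_\d+|general_tags)>(.*?)</\1>", image["tags"], re.S
    ):
        blocks[name] = {
            tag: text.strip()
            for tag, text in re.findall(r"<(\w+)>(.*?)</\1>", body, re.S)
        }

    characters = {}
    for key, info in data.items():
        if not key.startswith("character_"):
            continue
        bbox = info["bbox"]
        if len(bbox) != 4:
            raise ValueError(f"{key}: bbox must have four values, got {bbox}")
        # Assumption: only the x ordering is checked, because the y ordering
        # depends on a coordinate convention the prompt leaves unstated.
        if not bbox[0] < bbox[2]:
            raise ValueError(f"{key}: bbox x coordinates out of order: {bbox}")
        characters[key] = {
            "name": info["name"],
            "bbox": bbox,
            # Omitted tags are allowed by the prompt, so missing blocks become {}.
            "tags": blocks.get(key, {}),
        }

    return {
        "characters": characters,
        "general_tags": blocks.get("general_tags", {}),
        "caption": image.get("caption", ""),
    }


# Hypothetical reply, shortened; a real caption would be much longer.
sample = '''{
  "character_1": {"bbox": [10, 20, 110, 220], "name": "$character_1$"},
  "image": {
    "tags": "
    <character_1>
    <n>$character_1$</n>
    <gender>1girl</gender>
    </character_1>
    <general_tags>
    <count>1girl</count>
    <style>anime style</style>
    </general_tags>
    ",
    "caption": "A girl stands in a room."
  }
}'''

result = parse_annotation(sample)
print(result["characters"]["character_1"]["tags"])
print(result["general_tags"])
```

Regular expressions are used here instead of an XML parser because the tags string has several top-level elements and may contain characters that are not escaped for XML; a stricter pipeline might prefer to wrap the string in a root element and validate it properly.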