Commit 7602623 (verified), committed by nielsr (HF Staff) · 1 parent: 03cbcad

Add pipeline tag and library name


This PR ensures the model can be found at https://huggingface.co/models?pipeline_tag=image-to-image, and also adds the correct `library_name`.
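As a quick sanity check, here is a minimal sketch (assuming the `huggingface_hub` Python client, with a `list_models` that accepts a `pipeline_tag` argument) of the same filter the URL above applies:

```python
# Minimal sketch (assumes `huggingface_hub` is installed): reproduce the
# https://huggingface.co/models?pipeline_tag=image-to-image filter via the Hub API.
from huggingface_hub import HfApi

api = HfApi()
# `search="UniWorld"` only narrows the listing for illustration; drop it to see
# every model carrying the image-to-image pipeline tag.
for model in api.list_models(pipeline_tag="image-to-image", search="UniWorld", limit=5):
    print(model.id, model.pipeline_tag)
```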

Files changed (1): README.md (+5, −4)
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: univa
+pipeline_tag: image-to-image
 ---
 
 <p align="center">
@@ -50,7 +52,7 @@ license: mit
 
 UniWorld shows excellent performance in **20+** tasks.
 
-UniWorld, trained on only 2.7M samples, consistently outperforms [BAGEL](https://github.com/ByteDance-Seed/Bagel) (trained on 2665M samples) on the ImgEdit-Bench for image manipulation. It also surpasses the specialized image editing model [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit) across multiple dimensions, including add, adjust, and extract on ImgEdit-Bench.
+UniWorld, trained on only 2.7M samples, consistently outperforms [BAGEL](https://github.com/ByteDance-Seed/Bagel) on the ImgEdit-Bench for image manipulation. It also surpasses the specialized image editing model [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit) across multiple dimensions, including add, adjust, and extract on ImgEdit-Bench.
 
 **Click to play**
 
@@ -314,7 +316,7 @@ For more details, please refer to the [Contribution Guidelines](docs/Contributio
 * [Open-Sora Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan): An open‑source text-to-image/video foundation model, which provides a lot of caption data.
 * [SEED-Data-Edit](https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit): A hybrid dataset for instruction-guided image editing.
 * [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct): The new flagship vision-language model of Qwen.
-* [FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev): Given an input image, FLUX.1 Redux can reproduce the image with slight variation, allowing to refine a given image.
+* [FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev): Given an input image, FLUX.1 Redux can reproduce the image with slight variation, allowing to refine a given image.
 * [SigLIP 2](https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/README_siglip2.md): New multilingual vision-language encoders.
 * [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit): A state-of-the-art image editing model.
 * [BLIP3-o](https://github.com/JiuhaiChen/BLIP3o): A unified multimodal model that combines the reasoning and instruction following strength of autoregressive models with the generative power of diffusion models.
@@ -373,5 +375,4 @@ For more details, please refer to the [Contribution Guidelines](docs/Contributio
   <img src="https://contrib.rocks/image?repo=PKU-YuanGroup/UniWorld-V1" />
 </a>
 
-This model is presented in the paper: [UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation](https://huggingface.co/papers/2506.03147)
-
+This model is presented in the paper: [UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation](https://huggingface.co/papers/2506.03147)
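
For reference, a minimal sketch (again assuming `huggingface_hub`; the local `README.md` path is illustrative) of reading back the card metadata this commit adds:

```python
# Minimal sketch (assumes `huggingface_hub` is installed): parse the edited
# model card locally and confirm the metadata introduced by this commit.
from huggingface_hub import ModelCard

card = ModelCard.load("README.md")  # illustrative local path to the edited card
print(card.data.license)       # expected: mit
print(card.data.library_name)  # expected: univa
print(card.data.pipeline_tag)  # expected: image-to-image
```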