Add pipeline tag and library name
This PR ensures the model can be found at https://huggingface.co/models?pipeline_tag=image-to-image, and also adds the correct `library_name`.
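For context, the same `image-to-image` filter can also be queried programmatically. This is a minimal sketch, assuming a recent `huggingface_hub` release in which `HfApi.list_models` accepts a `pipeline_tag` filter; it only illustrates the discoverability this PR enables.

```python
# Minimal sketch: check discoverability via the Hub API.
# Assumes a recent huggingface_hub release where list_models
# accepts the pipeline_tag filter used below.
from huggingface_hub import HfApi

api = HfApi()

# Same filter as https://huggingface.co/models?pipeline_tag=image-to-image
for model in api.list_models(pipeline_tag="image-to-image", limit=5):
    print(model.id)
```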
README.md CHANGED

@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: univa
+pipeline_tag: image-to-image
 ---
 
 <p align="center">

@@ -50,7 +52,7 @@ license: mit
 
 UniWorld shows excellent performance in **20+** tasks.
 
-UniWorld, trained on only 2.7M samples, consistently outperforms [BAGEL](https://github.com/ByteDance-Seed/Bagel)
+UniWorld, trained on only 2.7M samples, consistently outperforms [BAGEL](https://github.com/ByteDance-Seed/Bagel) on the ImgEdit-Bench for image manipulation. It also surpasses the specialized image editing model [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit) across multiple dimensions, including add, adjust, and extract on ImgEdit-Bench.
 
 **Click to play**
 

@@ -314,7 +316,7 @@ For more details, please refer to the [Contribution Guidelines](docs/Contributio
 * [Open-Sora Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan): An open-source text-to-image/video foundation model, which provides a lot of caption data.
 * [SEED-Data-Edit](https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit): A hybrid dataset for instruction-guided image editing.
 * [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct): The new flagship vision-language model of Qwen.
-* [FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-
+* [FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev): Given an input image, FLUX.1 Redux can reproduce the image with slight variation, allowing to refine a given image.
 * [SigLIP 2](https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/README_siglip2.md): New multilingual vision-language encoders.
 * [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit): A state-of-the-art image editing model.
 * [BLIP3-o](https://github.com/JiuhaiChen/BLIP3o): A unified multimodal model that combines the reasoning and instruction following strength of autoregressive models with the generative power of diffusion models.

@@ -373,5 +375,4 @@ For more details, please refer to the [Contribution Guidelines](docs/Contributio
 <img src="https://contrib.rocks/image?repo=PKU-YuanGroup/UniWorld-V1" />
 </a>
 
-This model is presented in the paper: [UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation](https://huggingface.co/papers/2506.03147)
-
+This model is presented in the paper: [UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation](https://huggingface.co/papers/2506.03147)