Commit 7602623 (verified), committed by nielsr (HF Staff) · 1 parent: 03cbcad

Add pipeline tag and library name


This PR ensures the model can be found at https://huggingface.co/models?pipeline_tag=image-to-image, and also adds the correct `library_name`.
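As a quick sanity check, here is a minimal sketch (assuming the `huggingface_hub` Python client, with a `list_models` that accepts a `pipeline_tag` argument) of the same filter the URL above applies:

```python
# Minimal sketch (assumes `huggingface_hub` is installed): reproduce the
# https://huggingface.co/models?pipeline_tag=image-to-image filter via the Hub API.
from huggingface_hub import HfApi

api = HfApi()
# `search="UniWorld"` only narrows the listing for illustration; drop it to see
# every model carrying the image-to-image pipeline tag.
for model in api.list_models(pipeline_tag="image-to-image", search="UniWorld", limit=5):
    print(model.id, model.pipeline_tag)
```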

Files changed (1): README.md (+5, −4)
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: univa
+pipeline_tag: image-to-image
 ---
 
 <p align="center">
@@ -50,7 +52,7 @@ license: mit
 
 UniWorld shows excellent performance in **20+** tasks.
 
-UniWorld, trained on only 2.7M samples, consistently outperforms [BAGEL](https://github.com/ByteDance-Seed/Bagel) (trained on 2665M samples) on the ImgEdit-Bench for image manipulation. It also surpasses the specialized image editing model [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit) across multiple dimensions, including add, adjust, and extract on ImgEdit-Bench.
+UniWorld, trained on only 2.7M samples, consistently outperforms [BAGEL](https://github.com/ByteDance-Seed/Bagel) on the ImgEdit-Bench for image manipulation. It also surpasses the specialized image editing model [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit) across multiple dimensions, including add, adjust, and extract on ImgEdit-Bench.
 
 **Click to play**
 
@@ -314,7 +316,7 @@ For more details, please refer to the [Contribution Guidelines](docs/Contributio
 * [Open-Sora Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan): An open‑source text-to-image/video foundation model, which provides a lot of caption data.
 * [SEED-Data-Edit](https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit): A hybrid dataset for instruction-guided image editing.
 * [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct): The new flagship vision-language model of Qwen.
-* [FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev): Given an input image, FLUX.1 Redux can reproduce the image with slight variation, allowing to refine a given image.
+* [FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev): Given an input image, FLUX.1 Redux can reproduce the image with slight variation, allowing to refine a given image.
 * [SigLIP 2](https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/README_siglip2.md): New multilingual vision-language encoders.
 * [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit): A state-of-the-art image editing model.
 * [BLIP3-o](https://github.com/JiuhaiChen/BLIP3o): A unified multimodal model that combines the reasoning and instruction following strength of autoregressive models with the generative power of diffusion models.
@@ -373,5 +375,4 @@ For more details, please refer to the [Contribution Guidelines](docs/Contributio
   <img src="https://contrib.rocks/image?repo=PKU-YuanGroup/UniWorld-V1" />
 </a>
 
-This model is presented in the paper: [UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation](https://huggingface.co/papers/2506.03147)
-
+This model is presented in the paper: [UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation](https://huggingface.co/papers/2506.03147)
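
For reference, a minimal sketch (again assuming `huggingface_hub`; the local `README.md` path is illustrative) of reading back the card metadata this commit adds:

```python
# Minimal sketch (assumes `huggingface_hub` is installed): parse the edited
# model card locally and confirm the metadata introduced by this commit.
from huggingface_hub import ModelCard

card = ModelCard.load("README.md")  # illustrative local path to the edited card
print(card.data.license)       # expected: mit
print(card.data.library_name)  # expected: univa
print(card.data.pipeline_tag)  # expected: image-to-image
```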