Enhance model card for PixNerd (#1)

- Enhance model card for PixNerd (f3f9510680b19d577e5b290905ece5ac356a0d16)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)

README.md +92 -1

@@ -1,5 +1,96 @@
 ---
 license: apache-2.0
 ---
-arxiv.org/abs/2507.23268

 ---
 license: apache-2.0
+pipeline_tag: unconditional-image-generation
 ---
+# PixNerd: Pixel Neural Field Diffusion
+<div style="text-align: center;">
+  <a href="https://huggingface.co/papers/2507.23268"><img src="https://img.shields.io/badge/Paper-2507.23268-b31b1b.svg" alt="Paper"></a>
+  <a href="https://github.com/MCG-NJU/PixNerd"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github&" alt="Code"></a>
+  <a href="https://huggingface.co/spaces/MCG-NJU/PixNerd"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Online_Demo-green" alt="Demo"></a>
+</div>
+PixNerd is a pixel-space diffusion transformer for image generation, introduced in the paper [PixNerd: Pixel Neural Field Diffusion](https://huggingface.co/papers/2507.23268). Unlike conventional diffusion models, which depend on a compressed latent space shaped by a pre-trained VAE, PixNerd models patch-wise decoding with a neural field. The result is a single-scale, single-stage, efficient, end-to-end model that operates directly in pixel space, avoiding the accumulated errors and decoding artifacts of a two-stage VAE pipeline.
+<p align="center">
+  <img src="https://huggingface.co/MCG-NJU/PixNerd/resolve/main/figs/arch.png" alt="PixNerd Architecture Diagram" width="700">
+</p>
+### ✨ Key Highlights
+*   **Efficient Pixel-Space Diffusion**: Directly models image generation in pixel space, eliminating the need for VAEs and their associated complexities or artifacts.
+*   **Neural Field Decoding**: Employs neural fields for patch-wise decoding, improving the modeling of high-frequency details.
+*   **Single-Stage & End-to-End**: Offers a simplified, efficient training and inference paradigm without complex cascade pipelines.
+*   **High Performance**: Achieves competitive FID scores on ImageNet 256x256 (2.15 FID) and 512x512 (2.84 FID) for class-conditional image generation.
+*   **Text-to-Image Extension**: The framework is extensible to text-to-image applications, achieving strong results on benchmarks like GenEval (0.73 overall score) and DPG (80.9 overall score).
+## Visualizations
+Below are sample images generated by PixNerd, showcasing its capabilities:
+<p align="center">
+  <img src="https://huggingface.co/MCG-NJU/PixNerd/resolve/main/figs/pixelnerd_teaser.png" alt="PixNerd Teaser" width="700">
+  <br/>
+  <img src="https://huggingface.co/MCG-NJU/PixNerd/resolve/main/figs/pixnerd_multires.png" alt="PixNerd Multi-Resolution Examples" width="700">
+</p>
+## Checkpoints
+The following checkpoints are available:
+| Dataset       | Model         | Params | FID   | HuggingFace                           |
+|---------------|---------------|--------|-------|---------------------------------------|
+| ImageNet256   | PixNerd-XL/16 | 700M   | 2.15  | [🤗](https://huggingface.co/MCG-NJU/PixNerd-XL-P16-C2I) |
+| ImageNet512   | PixNerd-XL/16 | 700M   | 2.84  | [🤗](https://huggingface.co/MCG-NJU/PixNerd-XL-P16-C2I) |
+
+| Dataset       | Model          | Params | GenEval | DPG  | HuggingFace                                              |
+|---------------|----------------|--------|---------|------|----------------------------------------------------------|
+| Text-to-Image | PixNerd-XXL/16 | 1.2B   | 0.73    | 80.9 | [🤗](https://huggingface.co/MCG-NJU/PixNerd-XXL-P16-T2I) |
+## Online Demos
+You can try out the PixNerd-XXL/16 (text-to-image) model on our Hugging Face Space demo: [https://huggingface.co/spaces/MCG-NJU/PixNerd](https://huggingface.co/spaces/MCG-NJU/PixNerd).
+To host a local Gradio demo for text-to-image applications, run the following command after setting up the environment:
+```bash
+python app.py --config configs_t2i/inference_heavydecoder.yaml --ckpt_path=XXX.ckpt
+```
+## Usage
+For class-to-image (C2I) generation on ImageNet, use the provided codebase. First, install the required dependencies:
+```bash
+# for installation
+pip install -r requirements.txt
+```
+Then, run inference using the `main.py` script (replace `XXX.ckpt` with your checkpoint path):
+```bash
+# for inference
+python main.py predict -c configs_c2i/pix256std1_repa_pixnerd_xl.yaml --ckpt_path=XXX.ckpt
+# or specify the GPU(s) to use:
+CUDA_VISIBLE_DEVICES=0,1 python main.py predict -c configs_c2i/pix256std1_repa_pixnerd_xl.yaml --ckpt_path=XXX.ckpt
+```
+For more details on training and evaluation for both class-to-image and text-to-image applications, please refer to the [official GitHub repository](https://github.com/MCG-NJU/PixNerd).
+## Citation
+If you find this work useful for your research, please cite our paper:
+```bibtex
+@article{2507.23268,
+Author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
+Title = {PixNerd: Pixel Neural Field Diffusion},
+Year = {2025},
+Eprint = {arXiv:2507.23268},
+}
+```
+## Acknowledgement
+The code is mainly built upon [FlowDCN](https://github.com/MCG-NJU/FlowDCN) and [DDT](https://github.com/MCG-NJU/DDT).
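
As a rough illustration of the "Neural Field Decoding" highlight in the model card, the sketch below shows one way patch-wise decoding with a neural field could work. It is not the repository's code. It assumes a hypernetwork-style design in which each patch token predicts the weights of a tiny per-patch MLP, and that MLP maps encoded pixel coordinates to pixel values. The names `coord_features`, `decode_patches`, and `proj`, the sin/cos encoding, and all sizes are hypothetical choices made for this example.

```python
import numpy as np


def coord_features(patch_size: int, num_freqs: int) -> np.ndarray:
    """Sin/cos encoding of pixel-centre coordinates inside one patch.

    Returns an array of shape (patch_size * patch_size, 4 * num_freqs).
    """
    # Pixel centres normalised to the open interval (0, 1).
    axis = (np.arange(patch_size) + 0.5) / patch_size
    yy, xx = np.meshgrid(axis, axis, indexing="ij")
    coords = np.stack([yy.ravel(), xx.ravel()], axis=-1)            # (P*P, 2)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi                   # (F,)
    angles = coords[:, :, None] * freqs                             # (P*P, 2, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)                       # (P*P, 4F)


def decode_patches(tokens, proj, patch_size=16, num_freqs=4, hidden=8, channels=3):
    """Decode each patch token into pixels with its own tiny MLP (assumed design)."""
    n = tokens.shape[0]
    in_dim = 4 * num_freqs
    feats = coord_features(patch_size, num_freqs)                   # (P*P, in_dim)
    # Hypernetwork step: every token predicts the weights of its patch's MLP.
    params = tokens @ proj                                          # (n, n_params)
    w1 = params[:, : in_dim * hidden].reshape(n, in_dim, hidden)
    w2 = params[:, in_dim * hidden:].reshape(n, hidden, channels)
    # Evaluate the per-patch field at every pixel coordinate.
    h = np.tanh(np.einsum("pi,nih->nph", feats, w1))                # (n, P*P, hidden)
    out = np.einsum("nph,nhc->npc", h, w2)                          # (n, P*P, channels)
    return out.reshape(n, patch_size, patch_size, channels)


rng = np.random.default_rng(0)
dim, hidden, num_freqs, channels = 32, 8, 4, 3
n_params = 4 * num_freqs * hidden + hidden * channels
tokens = rng.standard_normal((4, dim))             # stand-in for transformer outputs
proj = rng.standard_normal((dim, n_params)) * 0.1  # hypothetical weight-prediction layer
patches = decode_patches(tokens, proj, 16, num_freqs, hidden, channels)
print(patches.shape)  # (4, 16, 16, 3)
```

Because the field in this sketch is a function of continuous coordinates, the same tokens can be sampled on a denser grid simply by passing a larger `patch_size`. Whether and how the released models use this property is documented in the paper and the repository rather than here.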