Add comprehensive model card for InfCam
This PR adds a comprehensive model card for the InfCam model.
It includes:
- **Metadata**: `pipeline_tag: image-to-video`. The `library_name` and `license` tags are omitted because the available sources do not state their values.
- **Key Links**: Direct links to the paper ([https://huggingface.co/papers/2512.17040](https://huggingface.co/papers/2512.17040)), project page ([https://emjay73.github.io/InfCam/](https://emjay73.github.io/InfCam/)), and GitHub repository ([https://github.com/emjay73/InfCam](https://github.com/emjay73/InfCam)).
- **Introduction**: A brief overview of the model from the paper abstract and the "TL;DR" from the GitHub README.
- **Teaser Video**: Link to the teaser video as provided in the GitHub README.
- **Sample Usage**: A detailed "Inference" section covering environment setup, checkpoint downloads, and the specific `bash` and `python` inference commands, copied verbatim from the GitHub README so the commands stay accurate.
- **Additional Info**: Includes the camera trajectory table, acknowledgements, and BibTeX citation from the original GitHub repository.
These additions enhance discoverability on the Hugging Face Hub and provide users with comprehensive information and a clear guide to using the model.
@@ -0,0 +1,151 @@
---
pipeline_tag: image-to-video
---

# InfCam: Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation

This repository contains the InfCam model presented in the paper [Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation](https://huggingface.co/papers/2512.17040).

Project Page: [https://emjay73.github.io/InfCam/](https://emjay73.github.io/InfCam/)
Code: [https://github.com/emjay73/InfCam](https://github.com/emjay73/InfCam)

<div align="center">
[Min-Jung Kim<sup>*</sup>](https://emjay73.github.io/), [Jeongho Kim<sup>*</sup>](https://scholar.google.co.kr/citations?user=4SCCBFwAAAAJ&hl=ko), [Hoiyeong Jin](https://scholar.google.co.kr/citations?hl=ko&user=Jp-zhtUAAAAJ), [Junha Hyung](https://junhahyung.github.io/), [Jaegul Choo](https://sites.google.com/site/jaegulchoo)
<br>
*Equal Contribution
<p align="center">
<img src="https://huggingface.co/emjay73/InfCam/resolve/main/assets/GSAI_preview_image.png" width="20%" alt="GSAI Preview">
</p>
</div>

## Teaser Video
https://github.com/user-attachments/assets/1c52baf4-b5ff-417e-a6c6-c8570e667bd8

## Introduction

InfCam is a depth-free, camera-controlled video-to-video generation framework with high pose fidelity. It aims to provide creators with cinematic camera control capabilities in post-production.
**TL;DR:** Given a video and a target camera trajectory, InfCam generates a video that faithfully follows the specified camera path without requiring a depth prior.

## 🔥 Updates
- [ ] Release training code
- [x] Release inference code and model weights (2025.12.19)

## ⚙️ Code
### Inference
Step 1: Set up the environment

```
conda create -n infcam python=3.12
conda activate infcam

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install cupy-cuda12x
pip install transformers==4.46.2
pip install sentencepiece
pip install controlnet-aux==0.0.7
pip install imageio
pip install imageio[ffmpeg]
pip install safetensors
pip install einops
pip install protobuf
pip install modelscope
pip install ftfy
pip install lpips
pip install lightning
pip install pandas
pip install matplotlib
pip install wandb
pip install ffmpeg-python
pip install numpy
pip install opencv-python
```
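Optionally, you can sanity-check the install before downloading any checkpoints. This short snippet is not part of the original setup instructions; it only confirms that the CUDA-enabled PyTorch build is active:

```python
# Optional sanity check (not from the original README):
# confirm that the CUDA-enabled PyTorch build installed above is usable.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```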

Step 2: Download the pretrained checkpoints

1. Download the pre-trained Wan2.1 model

```shell
python download_wan2.1.py
```

2. Download the pre-trained UniDepth model

Download the pre-trained weights from [huggingface](https://huggingface.co/lpiccinelli/unidepth-v2-vitl14) and place them in ```models/unidepth-v2-vitl14```.
```shell
cd models
git clone https://huggingface.co/lpiccinelli/unidepth-v2-vitl14
```

3. Download the pre-trained InfCam checkpoint

Download the pre-trained InfCam weights from [huggingface](https://huggingface.co/emjay73/InfCam/tree/main) and place them in ```models/InfCam```.

```shell
cd models
git clone https://huggingface.co/emjay73/InfCam
```
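If you prefer not to use `git clone`, the same checkpoints can be fetched programmatically. The sketch below is not from the original README and assumes `huggingface_hub` is available (it ships as a dependency of `transformers`, or can be installed with `pip install huggingface_hub`):

```python
# Alternative to the git clone commands above (illustrative sketch, not from
# the original README). Downloads both checkpoints into the models/ folder.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="lpiccinelli/unidepth-v2-vitl14",
                  local_dir="models/unidepth-v2-vitl14")
snapshot_download(repo_id="emjay73/InfCam", local_dir="models/InfCam")
```
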
Step 3: Test the example videos

```shell
bash run_inference.sh
```
or
```shell
SEED=0  # choose a random seed for --seed below; not set in the original snippet
for CAM in {1..10}; do
    CUDA_VISIBLE_DEVICES=0 python inference_infcam.py \
        --cam_type ${CAM} \
        --ckpt_path "models/InfCam/step35000.ckpt" \
        --camera_extrinsics_path "./sample_data/cameras/camera_extrinsics_10types.json" \
        --output_dir "./results/sample_data" \
        --dataset_path "./sample_data" \
        --metadata_file_name "metadata.csv" \
        --num_frames 81 --width 832 --height 480 \
        --num_inference_steps 20 \
        --zoom_factor 1.0 \
        --k_from_unidepth \
        --seed ${SEED}
done
```
This inference code requires 48 GB of memory for UniDepth and 28 GB for the InfCam pipeline.

Step 4: Test your own videos

If you want to test your own videos, you need to prepare your test data following the structure of the ```sample_data``` folder. This includes N mp4 videos, each with at least 81 frames, and a ```metadata.csv``` file that stores their paths and corresponding captions. You can refer to the [caption branch](https://github.com/emjay73/InfCam/tree/caption) for metadata.csv extraction.
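A minimal sketch of how such a ```metadata.csv``` could be assembled is shown below. It is illustrative only: the column names are assumptions, so match them to the schema used in ```sample_data/metadata.csv``` (or the caption branch) before running inference.

```python
# Sketch for preparing a metadata.csv from your own mp4 clips (illustrative).
# Column names ("video_path", "caption") are assumptions; align them with the
# schema in sample_data/metadata.csv before use.
import glob
import cv2
import pandas as pd

rows = []
for path in sorted(glob.glob("./my_data/*.mp4")):
    cap = cv2.VideoCapture(path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    if n_frames < 81:
        print(f"skip {path}: only {n_frames} frames (needs >= 81)")
        continue
    rows.append({"video_path": path, "caption": "a short description of the clip"})

pd.DataFrame(rows).to_csv("./my_data/metadata.csv", index=False)
```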

We provide several preset camera types, as shown in the table below.
These follow the ReCamMaster presets, but the starting point of each trajectory differs from that of the initial frame.

| cam_type | Trajectory |
|----------|--------------------------------|
| 1        | Pan Right                      |
| 2        | Pan Left                       |
| 3        | Tilt Up                        |
| 4        | Tilt Down                      |
| 5        | Zoom In                        |
| 6        | Zoom Out                       |
| 7        | Translate Up (with rotation)   |
| 8        | Translate Down (with rotation) |
| 9        | Arc Left (with rotation)       |
| 10       | Arc Right (with rotation)      |

## 🤗 Thank You Note
Our work is based on the following repositories.\
Thank you for your outstanding contributions!

[ReCamMaster](https://jianhongbai.github.io/ReCamMaster/): Re-captures in-the-wild videos with novel camera trajectories, and releases a multi-camera synchronized video dataset rendered with Unreal Engine 5.

[WAN2.1](https://github.com/Wan-Video/Wan2.1): A comprehensive and open suite of video foundation models.

[UniDepthV2](https://github.com/lpiccinelli-eth/UniDepth): Monocular metric depth estimation.

## 🌟 Citation

Please leave us a star 🌟 and cite our paper if you find our work helpful.
```bibtex
@article{kim2025infinite,
  title={Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation},
  author={Kim, Min-Jung and Kim, Jeongho and Jin, Hoiyeong and Hyung, Junha and Choo, Jaegul},
  journal={arXiv preprint arXiv:2512.17040},
  year={2025}
}
```