nielsr (HF Staff) committed · verified
Commit 8a3fe69 · Parent: 2e7ee39

Add comprehensive model card for InfCam


This PR adds a comprehensive model card for the InfCam model.

It includes:
- **Metadata**: `pipeline_tag: image-to-video`. The `library_name` and `license` tags are omitted because the provided context gives no explicit evidence for their values.
- **Key Links**: Direct links to the paper ([https://huggingface.co/papers/2512.17040](https://huggingface.co/papers/2512.17040)), project page ([https://emjay73.github.io/InfCam/](https://emjay73.github.io/InfCam/)), and GitHub repository ([https://github.com/emjay73/InfCam](https://github.com/emjay73/InfCam)).
- **Introduction**: A brief overview of the model from the paper abstract and the "TL;DR" from the GitHub README.
- **Teaser Video**: Link to the teaser video as provided in the GitHub README.
- **Sample Usage**: A detailed "Inference" section covering environment setup, checkpoint downloads, and the specific `bash` and `python` inference commands, copied verbatim from the GitHub README (per the "do not make up code" disclaimer), with newlines in code snippets preserved.
- **Additional Info**: Includes the camera trajectory table, acknowledgements, and BibTeX citation from the original GitHub repository.

These additions enhance discoverability on the Hugging Face Hub and provide users with comprehensive information and a clear guide to using the model.

Files changed (1): README.md (+151, −0)
README.md ADDED
---
pipeline_tag: image-to-video
---

# InfCam: Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation

This repository contains the InfCam model presented in the paper [Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation](https://huggingface.co/papers/2512.17040).

Project Page: [https://emjay73.github.io/InfCam/](https://emjay73.github.io/InfCam/)
Code: [https://github.com/emjay73/InfCam](https://github.com/emjay73/InfCam)

<div align="center">
[Min-Jung Kim<sup>*</sup>](https://emjay73.github.io/), [Jeongho Kim<sup>*</sup>](https://scholar.google.co.kr/citations?user=4SCCBFwAAAAJ&hl=ko), [Hoiyeong Jin](https://scholar.google.co.kr/citations?hl=ko&user=Jp-zhtUAAAAJ), [Junha Hyung](https://junhahyung.github.io/), [Jaegul Choo](https://sites.google.com/site/jaegulchoo)
<br>
<sup>*</sup>Equal Contribution
<p align="center">
<img src="https://huggingface.co/emjay73/InfCam/resolve/main/assets/GSAI_preview_image.png" width="20%" alt="GSAI Preview">
</p>
</div>

## Teaser Video
https://github.com/user-attachments/assets/1c52baf4-b5ff-417e-a6c6-c8570e667bd8

## Introduction

InfCam is a depth-free, camera-controlled video-to-video generation framework with high pose fidelity, aimed at giving creators cinematic camera control in post-production.

**TL;DR:** Given a video and a target camera trajectory, InfCam generates a video that faithfully follows the specified camera path without a depth prior.
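
For background (our addition, not from the original README): the "infinite homography" of the title is the standard two-view mapping induced by the plane at infinity. Using conventional multi-view-geometry symbols (not necessarily the paper's notation), with intrinsics $K$, $K'$ and relative rotation $R$ between the source and target cameras:

$$x' \simeq H_\infty \, x, \qquad H_\infty = K' R K^{-1}$$

This mapping is exact for points at infinity or a purely rotating camera and involves no scene depth, which is what makes it a depth-free conditioning signal.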

## 🔥 Updates
- [ ] Release training code
- [x] Release inference code and model weights (2025.12.19)

## ⚙️ Code
### Inference
Step 1: Set up the environment

```shell
conda create -n infcam python=3.12
conda activate infcam

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install cupy-cuda12x
pip install transformers==4.46.2
pip install sentencepiece
pip install controlnet-aux==0.0.7
pip install imageio
pip install imageio[ffmpeg]
pip install safetensors
pip install einops
pip install protobuf
pip install modelscope
pip install ftfy
pip install lpips
pip install lightning
pip install pandas
pip install matplotlib
pip install wandb
pip install ffmpeg-python
pip install numpy
pip install opencv-python
```
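
A quick sanity check (our addition, assuming the CUDA 12.1 wheels installed in Step 1) that the GPU build of PyTorch is usable before moving on:

```python
# Confirm the installed PyTorch build can see a CUDA device.
import torch

print(torch.__version__, torch.cuda.is_available())
```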

Step 2: Download the pretrained checkpoints
1. Download the pre-trained Wan2.1 model

```shell
python download_wan2.1.py
```

2. Download the pre-trained UniDepth model

Download the pre-trained weights from [Hugging Face](https://huggingface.co/lpiccinelli/unidepth-v2-vitl14) and place them in `models/unidepth-v2-vitl14`:
```shell
cd models
git clone https://huggingface.co/lpiccinelli/unidepth-v2-vitl14
```

3. Download the pre-trained InfCam checkpoint

Download the pre-trained InfCam weights from [Hugging Face](https://huggingface.co/emjay73/InfCam/tree/main) and place them in `models/InfCam`:

```shell
cd models
git clone https://huggingface.co/emjay73/InfCam
```
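
If `git lfs` is unavailable, the same repositories can be fetched with `huggingface_hub` instead (our addition, not part of the original instructions; the library is already installed as a dependency of `transformers`):

```python
# Equivalent to the git clones above, downloaded into the expected folders.
from huggingface_hub import snapshot_download

snapshot_download("lpiccinelli/unidepth-v2-vitl14", local_dir="models/unidepth-v2-vitl14")
snapshot_download("emjay73/InfCam", local_dir="models/InfCam")
```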

Step 3: Test the example videos

```shell
bash run_inference.sh
```
or
```shell
SEED=42  # the original snippet leaves SEED unset; any fixed integer works
for CAM in {1..10}; do
CUDA_VISIBLE_DEVICES=0 python inference_infcam.py \
    --cam_type ${CAM} \
    --ckpt_path "models/InfCam/step35000.ckpt" \
    --camera_extrinsics_path "./sample_data/cameras/camera_extrinsics_10types.json" \
    --output_dir "./results/sample_data" \
    --dataset_path "./sample_data" \
    --metadata_file_name "metadata.csv" \
    --num_frames 81 --width 832 --height 480 \
    --num_inference_steps 20 \
    --zoom_factor 1.0 \
    --k_from_unidepth \
    --seed ${SEED}
done
```
This inference code requires 48 GB of memory for UniDepth and 28 GB for the InfCam pipeline.

Step 4: Test your own videos

If you want to test your own videos, prepare your test data following the structure of the `sample_data` folder: N mp4 videos, each with at least 81 frames, and a `metadata.csv` file that stores their paths and corresponding captions. You can refer to the [caption branch](https://github.com/emjay73/InfCam/tree/caption) for `metadata.csv` extraction.
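
As a starting point, a minimal sketch (our addition) of assembling such a `metadata.csv`; the column names `video_path` and `caption` are assumptions here — check `sample_data/metadata.csv` or the caption branch for the exact schema:

```python
# Hypothetical metadata.csv builder; verify the column names against
# sample_data/metadata.csv before using it with inference_infcam.py.
import csv
from pathlib import Path

rows = [
    {"video_path": str(p), "caption": "a short description of the clip"}
    for p in sorted(Path("./my_data").glob("*.mp4"))
]

with open("./my_data/metadata.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["video_path", "caption"])
    writer.writeheader()
    writer.writerows(rows)
```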

We provide several preset camera types, as shown in the table below.
These follow the ReCamMaster presets, except that each trajectory's starting point differs from that of the initial frame.

| cam_type | Trajectory |
|----------|------------|
| 1 | Pan Right |
| 2 | Pan Left |
| 3 | Tilt Up |
| 4 | Tilt Down |
| 5 | Zoom In |
| 6 | Zoom Out |
| 7 | Translate Up (with rotation) |
| 8 | Translate Down (with rotation) |
| 9 | Arc Left (with rotation) |
| 10 | Arc Right (with rotation) |
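
For scripting convenience (our addition), the presets can be mirrored as a small lookup, e.g. to name output folders; the snake_case labels are ours, not the repo's:

```python
# cam_type -> trajectory label, mirroring the table above.
CAM_TYPES = {
    1: "pan_right",         2: "pan_left",
    3: "tilt_up",           4: "tilt_down",
    5: "zoom_in",           6: "zoom_out",
    7: "translate_up_rot",  8: "translate_down_rot",
    9: "arc_left_rot",      10: "arc_right_rot",
}
```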

## 🤗 Thank You Note
Our work is based on the following repositories.\
Thank you for your outstanding contributions!

[ReCamMaster](https://jianhongbai.github.io/ReCamMaster/): Re-captures in-the-wild videos with novel camera trajectories, and releases a multi-camera synchronized video dataset rendered with Unreal Engine 5.

[Wan2.1](https://github.com/Wan-Video/Wan2.1): A comprehensive and open suite of video foundation models.

[UniDepthV2](https://github.com/lpiccinelli-eth/UniDepth): Monocular metric depth estimation.

## 🌟 Citation

Please leave us a star 🌟 and cite our paper if you find our work helpful.
```bibtex
@article{kim2025infinite,
  title={Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation},
  author={Kim, Min-Jung and Kim, Jeongho and Jin, Hoiyeong and Hyung, Junha and Choo, Jaegul},
  journal={arXiv preprint arXiv:2512.17040},
  year={2025}
}
```