|
|
--- |
|
|
license: cc-by-nc-4.0 |
|
|
tags: |
|
|
- 3d-reconstruction |
|
|
- 3d-generation |
|
|
--- |
|
|
|
|
|
# ShapeR |
|
|
|
|
|
 |
|
|
|
|
|
ShapeR introduces a novel approach to metric shape generation. Given an input image sequence, preprocessing extracts per-object metric sparse SLAM points, images, poses, and captions using off-the-shelf methods. A rectified flow transformer operating on VecSet latents conditions on these multimodal inputs to generate a shape code, which is decoded into the objectβs mesh. By applying the model object-centrically to each detected object, we obtain a metric reconstruction of the entire scene. |
|
|
|
|
|
[Project Page](http://facebookresearch.github.io/ShapeR) | [Paper](https://cdn.jsdelivr.net/gh/facebookresearch/ShapeR@main/resources/ShapeR.pdf) | [Video](https://www.youtube.com/watch?v=EbY30KAA55I) | [HF-Model](https://huggingface.co/facebook/ShapeR/) | [HF Evaluation Dataset](https://huggingface.co/datasets/facebook/ShapeR-Evaluation) |
|
|
|
|
|
## Usage |
|
|
|
|
|
Clone the [repository](https://github.com/facebookresearch/ShapeR) and follow the INSTALL.md instructions to install the required dependencies. |
|
|
|
|
|
### Inference |
|
|
|
|
|
```bash |
|
|
python infer_shape.py --input_pkl <sample.pkl> --config balance --output_dir output |
|
|
``` |
|
|
|
|
|
**Arguments:** |
|
|
- `--input_pkl`: Path to preprocessed pickle file (relative to `data/`) |
|
|
- `--config`: Preset configuration |
|
|
- `quality`: 16 views, 50 steps (best quality, slowest) |
|
|
- `balance`: 16 views, 25 steps (recommended) |
|
|
- `speed`: 4 views, 10 steps (fastest) |
|
|
- `--output_dir`: Output directory for meshes and visualizations |
|
|
- `--do_transform_to_world`: Transform output mesh to world coordinates |
|
|
- `--remove_floating_geometry`: Remove disconnected mesh components (default: on) |
|
|
- `--simplify_mesh`: Reduce mesh complexity (default: on) |
|
|
- `--save_visualization`: Save comparison visualization (default: on) |
|
|
|
|
|
### Example |
|
|
|
|
|
```bash |
|
|
python infer_shape.py --input_pkl gaia_29093018117008733__20250415_125853.pkl --config balance |
|
|
``` |
|
|
|
|
|
**Output:** |
|
|
- `<name>.glb`: Reconstructed 3D mesh |
|
|
- `VIS__<name>.jpg`: Visualization comparing input, prediction, and ground truth |
|
|
|
|
|
 |
|
|
|
|
|
## Data Format |
|
|
|
|
|
This codebase assumes that A sequence has already been processed using the Aria MPS pipeline, along with an 3D object instance detector, resulting in the pickle files with the preprocessed data required for ShapeR ingestion. |
|
|
|
|
|
We release the [**ShapeR Evaluation Dataset**](https://huggingface.co/datasets/facebook/ShapeR-Evaluation) containing preprocessed samples from Aria glasses captures. Each sample is a pickle file with point clouds, multi-view images, camera parameters, text captions, and ground truth meshes. |
|
|
|
|
|
For a detailed walkthrough of the data format, see the **[`explore_data.ipynb`](https://github.com/facebookresearch/ShapeR/blob/main/explore_data.ipynb)** notebook which includes: |
|
|
- Complete pickle file structure with all keys and their dimensions |
|
|
- Interactive 3D visualization of point clouds and meshes |
|
|
- Camera position visualization |
|
|
- Image and mask grid displays |
|
|
- DataLoader usage examples for both SLAM and RGB variants |
|
|
- Explanation of view selection strategies |
|
|
|
|
|
## Project Structure |
|
|
|
|
|
``` |
|
|
ShapeR/ |
|
|
βββ infer_shape.py # Main inference script |
|
|
βββ explore_data.ipynb # Data exploration notebook |
|
|
βββ dataset/ |
|
|
β βββ shaper_dataset.py # Dataset and dataloader |
|
|
β βββ image_processor.py # View selection and image preprocessing |
|
|
β βββ point_cloud.py # Point cloud to SparseTensor |
|
|
βββ model/ |
|
|
β βββ flow_matching/ |
|
|
β β βββ shaper_denoiser.py # Flow matching denoiser |
|
|
β βββ vae3d/ |
|
|
β β βββ autoencoder.py # 3D VAE for mesh generation |
|
|
β βββ pointcloud_encoder.py # Sparse 3D convolution encoder |
|
|
β βββ dino_and_ray_feature_extractor.py # Image feature extraction |
|
|
βββ preprocessing/ |
|
|
β βββ helper.py # Fisheye rectification, camera utils |
|
|
βββ postprocessing/ |
|
|
β βββ helper.py # Mesh cleanup, visualization |
|
|
βββ checkpoints/ # Model weights (downloaded automatically) |
|
|
βββ data/ # Input pickle files |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
The ShapeR model are licensed under CC-BY-NC. See the [LICENSE](LICENSE) file for details. However portions of the project use checkpoints under separate license terms: see [NOTICE](NOTICE). e.g. DinoV2 is licensed under Apache-2.0 license. |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you find ShapeR useful for your research, please cite our paper: |
|
|
|
|
|
```bibtex |
|
|
@misc{siddiqui2026shaperrobustconditional3d, |
|
|
title={ShapeR: Robust Conditional 3D Shape Generation from Casual Captures}, |
|
|
author={Yawar Siddiqui and Duncan Frost and Samir Aroudj and Armen Avetisyan and Henry Howard-Jenkins and Daniel DeTone and Pierre Moulon and Qirui Wu and Zhengqin Li and Julian Straub and Richard Newcombe and Jakob Engel}, |
|
|
year={2026}, |
|
|
eprint={2601.11514}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CV}, |
|
|
url={https://arxiv.org/abs/2601.11514}, |
|
|
} |
|
|
``` |