Introduction
GigaBrain-0 is a Vision-Language-Action (VLA) foundation model for robots that is powered by world models. It leverages diverse, scalable data generated by world models, reducing reliance on costly real-world robot data while improving cross-task generalization. Two additions, RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, strengthen GigaBrain-0's spatial reasoning, object state understanding, and long-horizon task execution. The model supports dexterous manipulation, mobile tasks, and long-horizon planning, with robust performance across diverse environments and conditions.
Citation
@article{team2025gigabrain,
  title={GigaBrain-0: A World Model-Powered Vision-Language-Action Model},
  author={GigaAI},
  year={2025},
  eprint={2510.19430},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.19430},
}
