GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Introduction

GigaBrain-0 is a Vision-Language-Action (VLA) foundation model for robots, powered by world models. It leverages diverse, scalable data generated by world models, which reduces reliance on costly real-world robot data and improves cross-task generalization. Two further innovations, RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, strengthen its spatial reasoning, object state understanding, and long-horizon task execution. GigaBrain-0 supports dexterous manipulation, mobile tasks, and long-horizon planning, and offers robust performance across diverse environments and conditions.

Citation

@article{team2025gigabrain,
  title={GigaBrain-0: A World Model-Powered Vision-Language-Action Model},
  author={GigaAI},
  year={2025},
  eprint={2510.19430},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.19430},
}
