This repository contains the ViLaVT-7B model presented in the paper "Chatting with Images for Introspective Visual Thinking". The code is available at https://github.com/AntResearchNLP/ViLaVT.

If you find our work helpful, please consider citing our paper:

```
@misc{wu2026chattingimagesintrospectivevisual,
  title={Chatting with Images for Introspective Visual Thinking},
  author={Junfei Wu and Jian Guan and Qiang Liu and Shu Wu and Liang Wang and Wei Wu and Tieniu Tan},
  year={2026},
  eprint={2602.11073},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.11073},
}
```