---
language:
- en
- zh
tags:
- gui-agent
- phone-use agent
- computer-use agent
- pua
- android
- multimodal
- gelab-zero
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
library_name: transformers
---

## Model Details

This model is part of the [**GELab-Zero**](https://github.com/stepfun-ai/gelab-zero) project, which aims to accelerate the innovation and deployment of GUI agents by providing:

1. **A 4B GUI agent model** capable of running on local computers.
2. **Plug-and-play inference infrastructure** that handles ADB connections, dependency installation, and task recording/replay (available in the [**GELab-Zero**](https://github.com/stepfun-ai/gelab-zero) repository).

### Key Capabilities

* **Local Deployment**: Optimized for consumer-grade hardware, balancing low latency with privacy.
* **GUI Navigation**: Detects and interacts with UI elements (click, type, slide, wait, etc.) based on visual cues.
* **Complex Task Execution**: Handles multi-step, long-horizon tasks across a range of apps (food, transportation, shopping, social, etc.).
* **Open-World Generalization**: Operates zero-shot across diverse unseen applications and complex, dynamic interfaces without app-specific adaptation.

## Usage

### Quick Start with Ollama

The easiest way to run inference is with Ollama.

1. **Install Ollama**: Download it from [ollama.com](https://ollama.com/).

2. **Download the Model**:

```bash
# Install huggingface-cli
pip install huggingface_hub

# Download model
huggingface-cli download --resume-download stepfun-ai/GELab-Zero-4B-preview --local-dir gelab-zero-4b-preview
```

3. **Create and Run in Ollama**:

```bash
# Register the model with Ollama using the Modelfile in the downloaded directory
cd gelab-zero-4b-preview
ollama create gelab-zero-4b-preview -f Modelfile

# Test the model through Ollama's OpenAI-compatible endpoint
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gelab-zero-4b-preview",
    "messages": [{"role": "user", "content": "Hello, GELab-Zero!"}]
  }'
```
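
The same endpoint can also be called from Python. The sketch below is a minimal, standard-library-only example that attaches one screenshot to the request. It assumes that Ollama is listening on its default port and that its OpenAI-compatible endpoint accepts base64 `image_url` parts. The helper names `build_payload` and `send` are made up for illustration, and the task text is a placeholder: the prompt format the model actually expects is defined by the GELab-Zero repository and is not reproduced here.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is serving on its default local port.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"


def build_payload(task: str, screenshot_png: bytes,
                  model: str = "gelab-zero-4b-preview") -> dict:
    """Build an OpenAI-style chat request carrying a task and one screenshot."""
    image_b64 = base64.b64encode(screenshot_png).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Placeholder text; the real prompt template lives in GELab-Zero.
                    {"type": "text", "text": task},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
    }


def send(payload: dict, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """POST the payload and return the text of the first reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

To try it, read a PNG screenshot as bytes, pass it to `build_payload` together with a task description, and hand the result to `send`. Keeping payload construction separate from the network call makes the request easy to inspect before anything is sent.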

To use this model for actual Android device control (ADB connection, task execution), use the [GELab-Zero](https://github.com/stepfun-ai/gelab-zero) repository.

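
For a rough idea of what device control involves, the sketch below maps a few GUI actions to standard `adb shell input` commands. The action dictionaries (`type`, `x`, `y`, `text`, and so on) are a hypothetical format invented for this example, not the schema used by GELab-Zero, and the function names are likewise illustrative. It assumes `adb` is on the `PATH` and a single device is connected.

```python
import subprocess
import time
from typing import Dict, List


def action_to_adb_args(action: Dict) -> List[str]:
    """Translate one hypothetical action dict into an adb command line."""
    kind = action["type"]
    if kind == "click":
        return ["adb", "shell", "input", "tap",
                str(action["x"]), str(action["y"])]
    if kind == "slide":
        return ["adb", "shell", "input", "swipe",
                str(action["x1"]), str(action["y1"]),
                str(action["x2"]), str(action["y2"]),
                str(action.get("duration_ms", 300))]
    if kind == "type":
        # `input text` expects spaces to be written as %s.
        # Shell metacharacters are NOT escaped in this sketch.
        return ["adb", "shell", "input", "text",
                action["text"].replace(" ", "%s")]
    raise ValueError(f"unsupported action type: {kind!r}")


def execute(action: Dict) -> None:
    """Run one action: sleep for 'wait', otherwise call adb."""
    if action["type"] == "wait":
        time.sleep(action.get("seconds", 1.0))
        return
    subprocess.run(action_to_adb_args(action), check=True)
```

Separating the translation step from execution lets the command lines be checked without a device attached. The GELab-Zero infrastructure additionally handles connections, screenshots, and task recording/replay, none of which this sketch attempts.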

## Citation

If you find GELab-Zero-4B-preview useful for your research, please consider citing our work :)

```bibtex
@software{gelab_zero_2025,
  title={GELab-Zero: An Advanced Mobile Agent Inference System},
  author={GELab Team},
  year={2025},
  url={https://github.com/stepfun-ai/gelab-zero}
}

@inproceedings{gelab_mt_rl,
  title={GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning},
  author={Yan, Haolong and Shen, Yeqing and Huang, Xin and Wang, Jia and Tan, Kaijun and Liang, Zhixuan and Li, Hongxin and Ge, Zheng and Yoshie, Osamu and Li, Si and others},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems}
}
```