# CUA2 - Computer Use Agent 2

An AI-powered automation interface featuring real-time agent task processing, VNC streaming, and step-by-step execution visualization.

## 🚀 Overview

CUA2 is a full-stack application that provides a modern web interface for AI agents to perform automated computer tasks. The system features real-time WebSocket communication between a FastAPI backend and React frontend, allowing users to monitor agent execution, view screenshots, track token usage, and stream VNC sessions.

## 🏗️ Architecture

![CUA2 Architecture](assets/architecture.png)

## 🛠️ Tech Stack

### Backend (`cua2-core`)

- **FastAPI**
- **Uvicorn**
- **smolagents** - AI agent framework with OpenAI/LiteLLM support

### Frontend (`cua2-front`)

- **React TS**
- **Vite**

## 📋 Prerequisites

- **Python** 3.10 or higher
- **Node.js** 18 or higher
- **npm**
- **uv** - Python package manager

### Installing uv

**macOS/Linux:**

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

For more installation options, visit: https://docs.astral.sh/uv/getting-started/installation/

## 🚀 Getting Started

### 1. Clone the Repository

```bash
git clone https://github.com/huggingface/CUA2.git
cd CUA2
```

### 2. Install Dependencies

Use the Makefile for quick setup:

```bash
make sync
```

This will:

- Install Python dependencies using `uv`
- Install Node.js dependencies for the frontend

Or install manually:

```bash
# Backend dependencies
cd cua2-core
uv sync --all-extras

# Frontend dependencies
cd ../cua2-front
npm install
```

### 3. Environment Configuration

Copy the example environment file and configure your settings:

```bash
cd cua2-core
cp env.example .env
```

Edit `.env` with your configuration:

- API keys for OpenAI/LiteLLM
- Database connections (if applicable)
- Other service credentials

### 4. Start Development Servers

#### Option 1: Using Makefile (Recommended)

Open two terminal windows:

**Terminal 1 - Backend:**

```bash
make dev-backend
```

**Terminal 2 - Frontend:**

```bash
make dev-frontend
```

#### Option 2: Manual Start

**Terminal 1 - Backend:**

```bash
cd cua2-core
uv run uvicorn cua2_core.main:app --reload --host 0.0.0.0 --port 8000
```

**Terminal 2 - Frontend:**

```bash
cd cua2-front
npm run dev
```

### 5. Access the Application

- **Frontend**: http://localhost:5173
- **Backend API**: http://localhost:8000
- **API Documentation**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

## 📁 Project Structure

```
CUA2/
├── cua2-core/                            # Backend application
│   ├── src/
│   │   └── cua2_core/
│   │       ├── app.py                    # FastAPI application setup
│   │       ├── main.py                   # Application entry point
│   │       ├── models/
│   │       │   └── models.py             # Pydantic models
│   │       ├── routes/
│   │       │   ├── routes.py             # REST API endpoints
│   │       │   └── websocket.py          # WebSocket endpoint
│   │       ├── services/
│   │       │   ├── agent_service.py      # Agent task processing
│   │       │   └── simulation_metadata/  # Demo data
│   │       └── websocket/
│   │           └── websocket_manager.py  # WebSocket management
│   ├── pyproject.toml                    # Python dependencies
│   └── env.example                       # Environment variables template
│
├── cua2-front/                           # Frontend application
│   ├── src/
│   │   ├── App.tsx                       # Main application component
│   │   ├── pages/
│   │   │   └── Index.tsx                 # Main page
│   │   ├── components/
│   │   │   └── mock/                     # UI components
│   │   ├── hooks/
│   │   │   └── useWebSocket.ts           # WebSocket hook
│   │   └── types/
│   │       └── agent.ts                  # TypeScript type definitions
│   ├── package.json                      # Node dependencies
│   └── vite.config.ts                    # Vite configuration
│
├── Makefile                              # Development commands
└── README.md                             # This file
```

## 🔌 API Endpoints

### REST API

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check with WebSocket connection count |
| GET | `/tasks` | Get all active tasks |
| GET | `/tasks/{task_id}` | Get specific task status |
| GET | `/docs` | Interactive API documentation (Swagger) |
| GET | `/redoc` | Alternative API documentation (ReDoc) |

### WebSocket

#### Client → Server Events

- `user_task` - New user task request

#### Server → Client Events

- `agent_start` - Agent begins processing
- `agent_progress` - New step completed with image and metadata
- `agent_complete` - Task finished successfully
- `agent_error` - Error occurred during processing
- `vnc_url_set` - VNC stream URL available
- `vnc_url_unset` - VNC stream ended
- `heartbeat` - Connection keep-alive

## 🧪 Development

### Available Make Commands

```bash
make sync          # Sync all dependencies (Python + Node.js)
make dev-backend   # Start backend development server
make dev-frontend  # Start frontend development server
make pre-commit    # Run pre-commit hooks
make clean         # Clean build artifacts and caches
```

### Code Quality

```bash
# Backend
make pre-commit
```

### Build for Production

```bash
# Frontend
cd cua2-front
npm run build

# The build output will be in cua2-front/dist/
```

**Happy Coding! 🚀**
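
### Appendix: Dispatching WebSocket Events (Sketch)

The server-to-client events listed in the WebSocket section could be routed on the client side with a small dispatcher. The following is a minimal sketch, not the project's actual client code. It assumes each message is a JSON object whose event name sits in a top-level `type` field; this README does not document the payload shape, so that field name and the `EventDispatcher` class are hypothetical. Only the event names come from the list in the WebSocket section.

```python
import json
from typing import Callable, Dict, List

# Event names copied from the "Server -> Client Events" list in this README.
SERVER_EVENTS = {
    "agent_start",
    "agent_progress",
    "agent_complete",
    "agent_error",
    "vnc_url_set",
    "vnc_url_unset",
    "heartbeat",
}

Handler = Callable[[dict], None]


class EventDispatcher:
    """Hypothetical helper: routes decoded messages to per-event handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, event: str, handler: Handler) -> None:
        """Register a handler; reject names that are not in the README list."""
        if event not in SERVER_EVENTS:
            raise ValueError(f"unknown server event: {event}")
        self._handlers.setdefault(event, []).append(handler)

    def dispatch(self, raw: str) -> bool:
        """Decode one message and return True if at least one handler ran."""
        message = json.loads(raw)
        if not isinstance(message, dict):
            return False
        # ASSUMPTION: the event name is carried in a top-level "type" field.
        handlers = self._handlers.get(message.get("type"), [])
        for handler in handlers:
            handler(message)
        return bool(handlers)


# Usage: record which events were seen, without any network connection.
log: List[str] = []
dispatcher = EventDispatcher()
dispatcher.on("agent_start", lambda msg: log.append("started"))
dispatcher.on("agent_error", lambda msg: log.append("error"))

handled = dispatcher.dispatch('{"type": "agent_start"}')
ignored = dispatcher.dispatch('{"type": "heartbeat"}')  # no handler registered
```

In a real client, `dispatch` would be called with each text frame received from the backend's WebSocket endpoint. Check `routes/websocket.py` and `models/models.py` for the actual message schema before relying on the `type` field.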