
CUA2 - Computer Use Agent 2

An AI-powered automation interface featuring real-time agent task processing, VNC streaming, and step-by-step execution visualization.

🚀 Overview

CUA2 is a full-stack application that provides a modern web interface for AI agents to perform automated computer tasks. The system features real-time WebSocket communication between a FastAPI backend and React frontend, allowing users to monitor agent execution, view screenshots, track token usage, and stream VNC sessions.

🏗️ Architecture

CUA2 Architecture

🛠️ Tech Stack

Backend (cua2-core)

  • FastAPI
  • Uvicorn
  • smolagents - AI agent framework with OpenAI/LiteLLM support

Frontend (cua2-front)

  • React TS
  • Vite

📋 Prerequisites

  • Python 3.10 or higher
  • Node.js 18 or higher
  • npm
  • uv - Python package manager

Installing uv

macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

For more installation options, visit: https://docs.astral.sh/uv/getting-started/installation/
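Once the tools above are installed, a quick check can confirm that they are visible from the shell. The following is a minimal sketch, not part of the repository: the names `python_ok` and `missing_tools` are hypothetical, and it only verifies the Python version and that `node`, `npm`, and `uv` are on PATH (it does not check the Node.js version).

```python
import shutil
import sys

# Minimum Python version taken from the prerequisites list above.
MIN_PYTHON = (3, 10)
# Command names assumed for the remaining prerequisites.
REQUIRED_TOOLS = ("node", "npm", "uv")


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.10 or higher is required")
    for tool in missing_tools():
        print(f"Missing from PATH: {tool}")
```

Running the script prints nothing when every check passes.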

🚀 Getting Started

1. Clone the Repository

git clone https://github.com/huggingface/CUA2.git
cd CUA2

2. Install Dependencies

Use the Makefile for quick setup:

make sync

This will:

  • Install Python dependencies using uv
  • Install Node.js dependencies for the frontend

Or install manually:

# Backend dependencies
cd cua2-core
uv sync --all-extras

# Frontend dependencies
cd ../cua2-front
npm install

3. Environment Configuration

Copy the example environment file and configure your settings:

cd cua2-core
cp env.example .env

Edit .env with your configuration:

  • API keys for OpenAI/LiteLLM
  • Database connections (if applicable)
  • Other service credentials
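To make the expected file format concrete, here is a small sketch of how `KEY=VALUE` lines in a `.env` file can be parsed and checked for missing values. It is illustrative only: the backend may load its settings differently, and `parse_env`, `missing_keys`, and the key name `EXAMPLE_API_KEY` are hypothetical. Use the variable names that actually appear in env.example.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # EXAMPLE_API_KEY is a placeholder name, not a documented setting.
    sample = "# credentials\nEXAMPLE_API_KEY='abc123'\nEMPTY=\n"
    settings = parse_env(sample)
    print(missing_keys(settings, ["EXAMPLE_API_KEY", "EMPTY"]))
```

A check like this can be run before starting the servers to catch an unfilled template early.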

4. Start Development Servers

Option 1: Using Makefile (Recommended)

Open two terminal windows:

Terminal 1 - Backend:

make dev-backend

Terminal 2 - Frontend:

make dev-frontend

Option 2: Manual Start

Terminal 1 - Backend:

cd cua2-core
uv run uvicorn cua2_core.main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 - Frontend:

cd cua2-front
npm run dev

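5. Access the Application

With the manual start command above, the backend listens on port 8000, so the API is served at http://localhost:8000 and the interactive documentation at http://localhost:8000/docs. The Vite dev server prints the frontend URL when it starts.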

📁 Project Structure

CUA2/
├── cua2-core/                      # Backend application
│   ├── src/
│   │   └── cua2_core/
│   │       ├── app.py              # FastAPI application setup
│   │       ├── main.py             # Application entry point
│   │       ├── models/
│   │       │   └── models.py       # Pydantic models
│   │       ├── routes/
│   │       │   ├── routes.py       # REST API endpoints
│   │       │   └── websocket.py    # WebSocket endpoint
│   │       ├── services/
│   │       │   ├── agent_service.py # Agent task processing
│   │       │   └── simulation_metadata/ # Demo data
│   │       └── websocket/
│   │           └── websocket_manager.py # WebSocket management
│   ├── pyproject.toml              # Python dependencies
│   └── env.example                 # Environment variables template
│
├── cua2-front/                     # Frontend application
│   ├── src/
│   │   ├── App.tsx                 # Main application component
│   │   ├── pages/
│   │   │   └── Index.tsx           # Main page
│   │   ├── components/
│   │   │   └── mock/               # UI components
│   │   ├── hooks/
│   │   │   └── useWebSocket.ts     # WebSocket hook
│   │   └── types/
│   │       └── agent.ts            # TypeScript type definitions
│   ├── package.json                # Node dependencies
│   └── vite.config.ts              # Vite configuration
│
├── Makefile                        # Development commands
└── README.md                       # This file

🔌 API Endpoints

REST API

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check with WebSocket connection count |
| GET | /tasks | Get all active tasks |
| GET | /tasks/{task_id} | Get specific task status |
| GET | /docs | Interactive API documentation (Swagger) |
| GET | /redoc | Alternative API documentation (ReDoc) |

WebSocket

Client → Server Events

  • user_task - New user task request
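As an illustration of what sending a `user_task` event might involve, the sketch below serializes a task request as JSON. The message envelope is an assumption: the field names `type` and `instruction` are hypothetical and are not confirmed by this README, so check the Pydantic models in `models.py` for the real schema.

```python
import json


def build_user_task(instruction):
    """Serialize a user_task event as a JSON string.

    The "type" and "instruction" field names are assumptions made for
    illustration, not the backend's documented schema.
    """
    text = instruction.strip()
    if not text:
        raise ValueError("instruction must not be empty")
    return json.dumps({"type": "user_task", "instruction": text})
```

The resulting string is what a WebSocket client would pass to its send call.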

Server → Client Events

  • agent_start - Agent begins processing
  • agent_progress - New step completed with image and metadata
  • agent_complete - Task finished successfully
  • agent_error - Error occurred during processing
  • vnc_url_set - VNC stream URL available
  • vnc_url_unset - VNC stream ended
  • heartbeat - Connection keep-alive
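A client typically routes each incoming event to a handler based on its name. The following sketch shows one way to do that for the server events listed above. It assumes each message is a JSON object carrying the event name in a `type` field, which this README does not confirm; `EventDispatcher` is a hypothetical helper, not part of the codebase.

```python
import json

# Event names as listed in the section above.
SERVER_EVENTS = {
    "agent_start", "agent_progress", "agent_complete", "agent_error",
    "vnc_url_set", "vnc_url_unset", "heartbeat",
}


class EventDispatcher:
    """Route decoded server messages to per-event callbacks."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        """Register a callback for one of the known server events."""
        if event not in SERVER_EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._handlers.setdefault(event, []).append(handler)

    def dispatch(self, raw):
        """Decode one message, call its handlers, and return the event name.

        Returns None for messages that are not recognized.
        """
        message = json.loads(raw)
        if not isinstance(message, dict):
            return None
        # Assumption: the event name is stored under the "type" key.
        event = message.get("type")
        if event not in SERVER_EVENTS:
            return None
        for handler in self._handlers.get(event, []):
            handler(message)
        return event
```

For example, registering `dispatcher.on("agent_error", print)` and passing each received text frame to `dispatcher.dispatch` would print error messages while silently acknowledging `heartbeat` frames.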

🧪 Development

Available Make Commands

make sync              # Sync all dependencies (Python + Node.js)
make dev-backend       # Start backend development server
make dev-frontend      # Start frontend development server
make pre-commit        # Run pre-commit hooks
make clean             # Clean build artifacts and caches

Code Quality

# Backend
make pre-commit

Build for Production

# Frontend
cd cua2-front
npm run build

# The build output will be in cua2-front/dist/

Happy Coding! 🚀