# CUA2 - Computer Use Agent 2
An AI-powered automation interface featuring real-time agent task processing, VNC streaming, and step-by-step execution visualization.
## 🚀 Overview
CUA2 is a full-stack application that provides a modern web interface for AI agents to perform automated computer tasks. The system features real-time WebSocket communication between a FastAPI backend and React frontend, allowing users to monitor agent execution, view screenshots, track token usage, and stream VNC sessions.
## 🏗️ Architecture
![CUA2 Architecture](assets/architecture.png)
## 🛠️ Tech Stack
### Backend (`cua2-core`)
- **FastAPI**
- **Uvicorn**
- **smolagents** - AI agent framework with OpenAI/LiteLLM support
### Frontend (`cua2-front`)
- **React** with TypeScript
- **Vite**
## 📋 Prerequisites
- **Python** 3.10 or higher
- **Node.js** 18 or higher
- **npm**
- **uv** - Python package manager
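
The prerequisites above can be checked with a short script. This is only a sketch: it compares the running interpreter against Python 3.10 and looks for `node`, `npm`, and `uv` on `PATH`. It does not verify that Node.js is version 18 or higher, and the helper names `python_ok` and `missing_tools` are made up for this example.

```python
import shutil
import sys

# Minimum Python version taken from the prerequisites list.
MIN_PYTHON = (3, 10)
# Command-line tools the README asks for; only their presence on PATH is checked.
REQUIRED_TOOLS = ("node", "npm", "uv")


def python_ok(version=None):
    """Return True if the given (or running) Python version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```

If `uv` shows up as missing, the installation step in the next section covers it.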
### Installing uv
**macOS/Linux:**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
For more installation options, visit: https://docs.astral.sh/uv/getting-started/installation/
## 🚀 Getting Started
### 1. Clone the Repository
```bash
git clone https://github.com/huggingface/CUA2.git
cd CUA2
```
### 2. Install Dependencies
Use the Makefile for quick setup:
```bash
make sync
```
This will:
- Install Python dependencies using `uv`
- Install Node.js dependencies for the frontend
Or install manually:
```bash
# Backend dependencies
cd cua2-core
uv sync --all-extras
# Frontend dependencies
cd ../cua2-front
npm install
```
### 3. Environment Configuration
Copy the example environment file and configure your settings:
```bash
cd cua2-core
cp env.example .env
```
Edit `.env` with your configuration:
- API keys for OpenAI/LiteLLM
- Database connections (if applicable)
- Other service credentials
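
To catch a missing credential before starting the backend, a small check along these lines may help. It is a sketch under assumptions: the key name `OPENAI_API_KEY` is hypothetical (the real variable names are in `env.example`), and the parser only handles plain `KEY=VALUE` lines, not the full dotenv syntax.

```python
from pathlib import Path

# Hypothetical key name for illustration; check env.example for the real ones.
REQUIRED_KEYS = ("OPENAI_API_KEY",)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent or set to an empty value."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("cua2-core/.env")  # path relative to the repository root
    if env_path.exists():
        print("Missing:", missing_keys(parse_env(env_path.read_text())) or "none")
    else:
        print(f"{env_path} not found; copy env.example first")
```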
### 4. Start Development Servers
#### Option 1: Using Makefile (Recommended)
Open two terminal windows:
**Terminal 1 - Backend:**
```bash
make dev-backend
```
**Terminal 2 - Frontend:**
```bash
make dev-frontend
```
#### Option 2: Manual Start
**Terminal 1 - Backend:**
```bash
cd cua2-core
uv run uvicorn cua2_core.main:app --reload --host 0.0.0.0 --port 8000
```
**Terminal 2 - Frontend:**
```bash
cd cua2-front
npm run dev
```
### 5. Access the Application
- **Frontend**: http://localhost:5173
- **Backend API**: http://localhost:8000
- **API Documentation**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
## 📁 Project Structure
```
CUA2/
├── cua2-core/  # Backend application
│   ├── src/
│   │   └── cua2_core/
│   │       ├── app.py  # FastAPI application setup
│   │       ├── main.py  # Application entry point
│   │       ├── models/
│   │       │   └── models.py  # Pydantic models
│   │       ├── routes/
│   │       │   ├── routes.py  # REST API endpoints
│   │       │   └── websocket.py  # WebSocket endpoint
│   │       ├── services/
│   │       │   ├── agent_service.py  # Agent task processing
│   │       │   └── simulation_metadata/  # Demo data
│   │       └── websocket/
│   │           └── websocket_manager.py  # WebSocket management
│   ├── pyproject.toml  # Python dependencies
│   └── env.example  # Environment variables template
│
├── cua2-front/  # Frontend application
│   ├── src/
│   │   ├── App.tsx  # Main application component
│   │   ├── pages/
│   │   │   └── Index.tsx  # Main page
│   │   ├── components/
│   │   │   └── mock/  # UI components
│   │   ├── hooks/
│   │   │   └── useWebSocket.ts  # WebSocket hook
│   │   └── types/
│   │       └── agent.ts  # TypeScript type definitions
│   ├── package.json  # Node dependencies
│   └── vite.config.ts  # Vite configuration
│
├── Makefile  # Development commands
└── README.md  # This file
```
## 🔌 API Endpoints
### REST API
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check with WebSocket connection count |
| GET | `/tasks` | Get all active tasks |
| GET | `/tasks/{task_id}` | Get specific task status |
| GET | `/docs` | Interactive API documentation (Swagger) |
| GET | `/redoc` | Alternative API documentation (ReDoc) |
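
As a quick smoke test of the endpoints in the table, the following sketch queries `/health` and `/tasks` with the standard library only. It assumes the backend is running on the default `http://localhost:8000` and that these endpoints return JSON. The response fields are not documented here, so the script just prints whatever comes back. The helper names `endpoint_url` and `get_json` are illustrative.

```python
import json
from urllib.request import urlopen

# Default backend address from the "Access the Application" section.
BASE_URL = "http://localhost:8000"


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path, base_url=BASE_URL, timeout=5.0):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urlopen(endpoint_url(path, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(get_json("/health"))
        print(get_json("/tasks"))
    except (OSError, ValueError) as exc:  # connection failure or non-JSON body
        print("Backend check failed:", exc)
```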
### WebSocket
#### Client → Server Events
- `user_task` - New user task request
#### Server → Client Events
- `agent_start` - Agent begins processing
- `agent_progress` - New step completed with image and metadata
- `agent_complete` - Task finished successfully
- `agent_error` - Error occurred during processing
- `vnc_url_set` - VNC stream URL available
- `vnc_url_unset` - VNC stream ended
- `heartbeat` - Connection keep-alive
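
A client consuming these events might route them roughly as sketched here. The event names come from the list above; everything else is an assumption. In particular, the sketch supposes each message is a JSON object carrying the event name in a `type` field, which may not match the real schema in `models.py`. It also treats `agent_complete` and `agent_error` as the end of a task.

```python
import json

# Event names listed in this README; the payload layout is an assumption.
SERVER_EVENTS = {
    "agent_start",
    "agent_progress",
    "agent_complete",
    "agent_error",
    "vnc_url_set",
    "vnc_url_unset",
    "heartbeat",
}


def classify(raw_message):
    """Return (event_name, payload) for a JSON message with a "type" field.

    The "type" field name is hypothetical; adjust it to the real schema.
    """
    payload = json.loads(raw_message)
    event = payload.get("type")
    if event not in SERVER_EVENTS:
        raise ValueError(f"unknown event: {event!r}")
    return event, payload


def is_terminal(event):
    """Assumption: a task is over once the agent completes or reports an error."""
    return event in {"agent_complete", "agent_error"}
```

Rejecting unknown event names early makes schema drift between backend and frontend easier to spot than silently ignoring such messages.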
## 🧪 Development
### Available Make Commands
```bash
make sync # Sync all dependencies (Python + Node.js)
make dev-backend # Start backend development server
make dev-frontend # Start frontend development server
make pre-commit # Run pre-commit hooks
make clean # Clean build artifacts and caches
```
### Code Quality
```bash
# Backend
make pre-commit
```
### Build for Production
```bash
# Frontend
cd cua2-front
npm run build
# The build output will be in cua2-front/dist/
```
**Happy Coding! 🚀**