---
title: Browsergym_env Environment Server
emoji: 🐏
colorFrom: gray
colorTo: purple
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Browsergym_env Environment Server

FastAPI server for browsergym_env environment powered by Meta's OpenEnv.

## About

This Space provides a containerized environment for browsergym_env interactions.
Built with FastAPI and OpenEnv framework.

## Web Interface

This deployment includes an interactive web interface for exploring the environment:

- **HumanAgent Interface**: Interact with the environment using a web form
- **State Observer**: Real-time view of environment state and action history
- **Live Updates**: WebSocket-based real-time updates

Access the web interface at: `/web`

## API Documentation

Visit `/docs` for interactive API documentation.

## Health Check

The environment provides a health check endpoint at `/health`.

# BrowserGym Environment

BrowserGym is a unified framework for web-based agent tasks that provides access to multiple benchmarks under a single Gymnasium-compatible API. This integration brings the complete training-to-evaluation pipeline for web agents into OpenEnv.

## Why BrowserGym?

BrowserGym provides a complete pipeline for developing web agents: train on simple tasks, then evaluate on realistic websites.

**What are these benchmarks?**

- **MiniWoB++ (Training)**: 100+ synthetic web tasks like "click this button", "fill out this form", "select from dropdown". Each task is a simple webpage with a clear objective. Fast resets, randomized variations, dense rewards. Perfect for learning basic web navigation skills. **No external setup needed** - tasks run in isolated browser sessions.

- **WebArena (Evaluation)**: 812 tasks on real websites (e-commerce, forums, GitLab, Wikipedia). Tasks like "find the cheapest laptop and add to cart" or "create a merge request for bug #123". Multi-step, requires reasoning, sparse rewards. Tests if your agent can handle actual websites. **Requires running 7 backend services** (shopping site, GitLab instance, etc).

- **VisualWebArena**: Similar to WebArena but requires visual understanding - agents need to interpret images, identify UI elements visually, handle multimodal content.

- **WorkArena**: Enterprise software tasks (CRM, project management, business workflows). Tests automation on corporate-style applications.

**The training → evaluation pipeline:**

1. Train on MiniWoB (simple, controlled, fast iterations)
2. Evaluate on WebArena (complex, realistic, measures real-world capability)

**Key advantage**: You can start training immediately with MiniWoB. No need to set up infrastructure just to test if your code works.

## Quick Start - Training (MiniWoB)

### No Setup Required! 🎉

```python
from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for MiniWoB training task
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-test",  # or "click-button", "click-dialog", etc.
    }
)

# Train your agent!
for episode in range(1000):
    result = env.reset()
    print(f"Goal: {result.observation.goal}")

    done = False
    while not done:
        # Your agent decides what to do
        action_str = agent.get_action(result.observation.text)
        action = BrowserGymAction(action_str=action_str)

        result = env.step(action)
        done = result.done

        print(f"Reward: {result.reward}")

env.close()
```

### Available Tasks by Benchmark

#### MiniWoB++ Tasks (Training - 100+ tasks)

MiniWoB tasks are organized by difficulty and type.
Here are the main categories:

**Click Tasks** (Basic interaction)

| Task Name | Description | Difficulty |
|-----------|-------------|------------|
| `click-test` | Click a single button | ⭐ Easy |
| `click-button` | Click button with specific text | ⭐ Easy |
| `click-button-sequence` | Click buttons in order | ⭐⭐ Medium |
| `click-checkboxes` | Select specific checkboxes | ⭐⭐ Medium |
| `click-checkboxes-soft` | Select checkboxes (multiple valid) | ⭐⭐ Medium |
| `click-checkboxes-large` | Many checkboxes to select from | ⭐⭐ Medium |
| `click-checkboxes-transfer` | Transfer learning variation | ⭐⭐ Medium |
| `click-dialog` | Click correct button in dialog | ⭐ Easy |
| `click-dialog-2` | More complex dialog | ⭐⭐ Medium |
| `click-link` | Click on a link | ⭐ Easy |
| `click-option` | Select from dropdown | ⭐⭐ Medium |
| `click-pie` | Click on pie chart slice | ⭐⭐ Medium |
| `click-scroll-list` | Click item in scrollable list | ⭐⭐⭐ Hard |
| `click-shades` | Click on specific color shade | ⭐⭐ Medium |
| `click-shape` | Click on specific shape | ⭐⭐ Medium |
| `click-tab` | Switch between tabs | ⭐⭐ Medium |
| `click-tab-2` | More complex tab switching | ⭐⭐⭐ Hard |
| `click-widget` | Click on UI widget | ⭐⭐ Medium |

**Text Entry Tasks** (Typing and forms)

| Task Name | Description | Difficulty |
|-----------|-------------|------------|
| `enter-text` | Type text into input field | ⭐ Easy |
| `enter-text-dynamic` | Dynamic text entry | ⭐⭐ Medium |
| `enter-text-2` | Multiple text fields | ⭐⭐ Medium |
| `enter-password` | Fill password field | ⭐ Easy |
| `enter-date` | Enter a date | ⭐⭐ Medium |
| `enter-time` | Enter a time | ⭐⭐ Medium |
| `login-user` | Complete login form | ⭐⭐ Medium |
| `login-user-popup` | Login via popup | ⭐⭐⭐ Hard |

**Navigation Tasks** (Multi-step interaction)

| Task Name | Description | Difficulty |
|-----------|-------------|------------|
| `navigate-tree` | Navigate through tree structure | ⭐⭐⭐ Hard |
| `search-engine` | Use search interface | ⭐⭐ Medium |
| `use-autocomplete` | Interact with autocomplete | ⭐⭐⭐ Hard |
| `book-flight` | Book a flight (complex form) | ⭐⭐⭐⭐ Very Hard |
| `choose-date` | Pick date from calendar | ⭐⭐⭐ Hard |
| `choose-date-easy` | Simplified date picker | ⭐⭐ Medium |
| `choose-date-medium` | Medium difficulty date picker | ⭐⭐⭐ Hard |
| `choose-list` | Select from long list | ⭐⭐ Medium |

**Visual/Spatial Tasks** (Requires visual understanding)

| Task Name | Description | Difficulty |
|-----------|-------------|------------|
| `count-sides` | Count sides of shape | ⭐⭐ Medium |
| `count-shape` | Count specific shapes | ⭐⭐ Medium |
| `find-word` | Find word in text | ⭐⭐ Medium |
| `focus-text` | Focus on text element | ⭐ Easy |
| `focus-text-2` | More complex focus task | ⭐⭐ Medium |
| `grid-coordinate` | Click grid coordinate | ⭐⭐ Medium |
| `guess-number` | Guess a number game | ⭐⭐⭐ Hard |
| `identify-shape` | Identify shape type | ⭐⭐ Medium |
| `read-table` | Extract info from table | ⭐⭐⭐ Hard |
| `read-table-2` | More complex table reading | ⭐⭐⭐ Hard |

**Email/Social Tasks** (Realistic scenarios)

| Task Name | Description | Difficulty |
|-----------|-------------|------------|
| `email-inbox` | Manage email inbox | ⭐⭐⭐⭐ Very Hard |
| `email-inbox-forward` | Forward emails | ⭐⭐⭐⭐ Very Hard |
| `email-inbox-nl` | Natural language email task | ⭐⭐⭐⭐ Very Hard |
| `email-inbox-star-reply` | Star and reply to emails | ⭐⭐⭐⭐ Very Hard |
| `social-media` | Social media interaction | ⭐⭐⭐⭐ Very Hard |
| `social-media-some` | Partial social media task | ⭐⭐⭐ Hard |

**Total:** 100+ tasks across all categories

**Usage:**

```python
# Easy task for quick testing
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-test"})

# Medium difficulty for training
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-checkboxes"})

# Hard task for evaluation
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "email-inbox"})
```

#### WebArena Tasks (Evaluation - 812 tasks)

WebArena tasks are organized by website and difficulty. Tasks are numbered 0-811.

**By Website:**

| Website | Task Count | Description | Example Tasks |
|---------|------------|-------------|---------------|
| Shopping | ~200 | E-commerce site | Search products, add to cart, checkout |
| Shopping Admin | ~150 | Admin panel | Manage products, orders, customers |
| Reddit | ~150 | Forum/social | Post, comment, search discussions |
| GitLab | ~200 | Code repository | Create issues, merge requests, review code |
| Wikipedia | ~100 | Knowledge base | Search, read, extract information |
| Map | ~12 | Location service | Find places, get directions |

**By Difficulty:**

| Difficulty | Task Count | Steps Required | Example |
|------------|------------|----------------|---------|
| Easy | ~200 | 1-5 steps | "Find the price of product X" |
| Medium | ~400 | 5-15 steps | "Add cheapest laptop to cart" |
| Hard | ~212 | 15+ steps | "Create merge request for bug fix" |

**Usage:**

```python
# Task 0 (usually easy)
env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "webarena",
    "BROWSERGYM_TASK_NAME": "0",
    "SHOPPING": "http://your-server:7770",
    # ... other URLs
})

# Task 156 (GitLab merge request)
env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "webarena",
    "BROWSERGYM_TASK_NAME": "156",
    # ... URLs
})
```

**Note:** WebArena tasks require the full backend infrastructure. See [WebArena setup guide](https://github.com/web-arena-x/webarena/tree/main/environment_docker).

#### VisualWebArena Tasks (910 tasks)

Similar to WebArena but requires visual understanding.
Tasks involve:

- Image-based reasoning
- Visual element identification
- Multimodal interaction (text + images)

#### WorkArena Tasks

Enterprise software automation tasks:

- CRM operations
- Project management
- Business workflows

**Full task lists:**

- [MiniWoB++ tasks](https://github.com/Farama-Foundation/miniwob-plusplus/tree/master/miniwob/environment)
- [WebArena tasks](https://github.com/web-arena-x/webarena/blob/main/config_files/)
- [BrowserGym documentation](https://github.com/ServiceNow/BrowserGym)

## Evaluation (WebArena)

### Prerequisites

WebArena requires setting up backend infrastructure. See the [WebArena documentation](https://github.com/web-arena-x/webarena/tree/main/environment_docker).

### Usage

```python
from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for WebArena evaluation
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",  # Task ID
        # WebArena backend URLs (required)
        "SHOPPING": "http://your-server:7770",
        "SHOPPING_ADMIN": "http://your-server:7780/admin",
        "REDDIT": "http://your-server:9999",
        "GITLAB": "http://your-server:8023",
        "MAP": "http://your-server:3000",
        "WIKIPEDIA": "http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing",
        "HOMEPAGE": "http://your-server:4399",
    }
)

# Evaluate your trained agent (`agent` is your own policy object, not defined here)
result = env.reset()
while not result.done:
    # Pass the text observation, as in the training loop
    action_str = agent.get_action(result.observation.text)
    action = BrowserGymAction(action_str=action_str)
    result = env.step(action)

print(f"Success: {result.reward}")

env.close()
```

## Building the Docker Image

### Prerequisites

1. **Base Image**: Build the OpenEnv base image first:

   ```bash
   # From the OpenEnv repository root
   docker build -t openenv-base:latest -f src/core/containers/images/Dockerfile .
   ```

### Build the BrowserGym Environment

```bash
# From the OpenEnv repository root
docker build -t browsergym-env:latest -f src/envs/browsergym_env/server/Dockerfile .
```

### Run the Server

#### For MiniWoB (Training):

```bash
docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="miniwob" \
  -e BROWSERGYM_TASK_NAME="click-test" \
  browsergym-env:latest
```

#### For WebArena (Evaluation):

```bash
docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="webarena" \
  -e BROWSERGYM_TASK_NAME="0" \
  -e SHOPPING="http://your-server:7770" \
  -e SHOPPING_ADMIN="http://your-server:7780/admin" \
  -e REDDIT="http://your-server:9999" \
  -e GITLAB="http://your-server:8023" \
  -e MAP="http://your-server:3000" \
  -e WIKIPEDIA="http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing" \
  -e HOMEPAGE="http://your-server:4399" \
  browsergym-env:latest
```

## Environment Details

### Action

Actions are strings that describe browser operations in a function-call syntax:

```python
from envs.browsergym_env import BrowserGymAction

# Click actions
action = BrowserGymAction(action_str="click('Submit button')")
action = BrowserGymAction(action_str="click('element_id_123')")

# Type actions
action = BrowserGymAction(action_str="fill('username', 'john@example.com')")
action = BrowserGymAction(action_str="fill('password', 'secret123')")

# Navigate actions
action = BrowserGymAction(action_str="goto('https://example.com')")

# Keyboard actions
action = BrowserGymAction(action_str="press('Enter')")
action = BrowserGymAction(action_str="press('Tab')")

# Scroll actions
action = BrowserGymAction(action_str="scroll('down')")
```

### Observation

Observations contain multiple modalities:

```python
result = env.step(action)
obs = result.observation

# Text observations
print(obs.text)          # Primary text representation (AXTree or DOM)
print(obs.axtree_txt)    # Accessibility tree
print(obs.pruned_html)   # Pruned HTML (interactive elements only)

# Page metadata
print(obs.url)           # Current URL
print(obs.goal)          # Task goal/instruction

# Visual (if enabled)
if obs.screenshot is not None:
    print(obs.screenshot.shape)  # [height, width, channels]

# Error handling
if obs.last_action_error:
    print(f"Action failed: {obs.error}")

# Episode status
print(obs.done)          # True if episode ended
print(obs.reward)        # Reward for the step

# Access full BrowserGym data (includes timestamps, etc.)
print(obs.metadata["browsergym_obs"])   # Full observation dict from BrowserGym
print(obs.metadata["browsergym_info"])  # Full info dict (timestamps, page state, etc.)
```

#### Advanced: Accessing Raw BrowserGym Data

For VisualWebArena or custom training, you may need additional data like timestamps or browser state. The full BrowserGym observation and info dicts are preserved in `metadata`:

```python
result = env.step(action)

# Access timestamps (if available)
info = result.observation.metadata["browsergym_info"]
if "timestamp" in info:
    print(f"Action timestamp: {info['timestamp']}")

# Access additional observation fields
obs_dict = result.observation.metadata["browsergym_obs"]
if "dom_object" in obs_dict:
    dom = obs_dict["dom_object"]
    # Work with raw DOM object

# Access page performance data
if "performance" in info:
    print(f"Page load time: {info['performance']}")
```

### State

The environment state tracks progress:

```python
state = env.state()

print(f"Benchmark: {state.benchmark}")      # 'miniwob', 'webarena', etc.
print(f"Task: {state.task_name}")           # Task name/ID
print(f"Episode: {state.episode_id}")       # Unique episode ID
print(f"Steps: {state.step_count}")         # Number of steps taken
print(f"Total Reward: {state.cum_reward}")  # Cumulative reward
print(f"Goal: {state.goal}")                # Task instruction
print(f"URL: {state.current_url}")          # Current page URL
```

## Configuration

Environment variables:

### Common Settings

- `BROWSERGYM_BENCHMARK`: Benchmark to use (`miniwob`, `webarena`, `visualwebarena`, `workarena`)
- `BROWSERGYM_TASK_NAME`: Specific task name (optional; the first available task is used if not set)
- `BROWSERGYM_HEADLESS`: Run browser in headless mode (default: `true`)
- `BROWSERGYM_VIEWPORT_WIDTH`: Browser viewport width (default: `1280`)
- `BROWSERGYM_VIEWPORT_HEIGHT`: Browser viewport height (default: `720`)
- `BROWSERGYM_TIMEOUT`: Action timeout in milliseconds (default: `10000`)

### WebArena-Specific (only needed for WebArena benchmark)

- `SHOPPING`: Shopping website URL
- `SHOPPING_ADMIN`: Shopping admin panel URL
- `REDDIT`: Reddit-like forum URL
- `GITLAB`: GitLab instance URL
- `MAP`: Map service URL
- `WIKIPEDIA`: Wikipedia instance URL
- `HOMEPAGE`: Homepage URL

## Supported Benchmarks

### 1. MiniWoB++ (Training) ✅ Recommended for Training

- **100+ tasks** ranging from simple (click buttons) to complex (form filling, navigation)
- **Fast**: Instant resets, quick episodes
- **Randomized**: Task variations for generalization
- **No setup**: Works out-of-the-box
- **Dense rewards**: Immediate feedback for learning

**Use Case**: Train agents on fundamental web navigation skills

### 2. WebArena (Evaluation) 📊 Benchmark

- **812 realistic tasks** across 6 websites
- **Complex**: Multi-step reasoning, real web interfaces
- **Requires setup**: 7 backend services must be running
- **Sparse rewards**: Binary success/failure
- **Evaluation-focused**: Test real-world performance

**Use Case**: Evaluate agents on realistic web tasks

### 3. VisualWebArena (Evaluation) 👁️ Visual Benchmark

- **910 tasks** requiring visual understanding
- **Multimodal**: Both text and visual observations
- **Requires setup**: Similar to WebArena
- **Challenging**: Requires visual reasoning

**Use Case**: Test visual web navigation capabilities

### 4. WorkArena (Evaluation) 💼 Enterprise Benchmark

- **Enterprise tasks**: CRM, project management, etc.
- **Realistic workflows**: Real enterprise software
- **Requires setup**: Enterprise software instances

**Use Case**: Evaluate on business automation tasks

## Typical Training Pipeline

```python
from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Stage 1: Train on MiniWoB (simple tasks, fast)
train_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-button",
    }
)

# Train your agent (RL, imitation learning, etc.)
agent.train(train_env, num_episodes=10000)
train_env.close()

# Stage 2: Evaluate on WebArena (complex tasks, realistic)
eval_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",
        # ... WebArena URLs
    }
)

# Test performance
success_rate = agent.evaluate(eval_env, num_tasks=812)
print(f"WebArena Success Rate: {success_rate:.2%}")
eval_env.close()
```

## Development & Testing

### Running Tests

```bash
# From the OpenEnv repository root
pytest tests/envs/test_browsergym_env.py
```

### Local Development

```bash
# Install in development mode
cd /path/to/OpenEnv
pip install -e .

# Install BrowserGym
pip install browsergym browsergym-miniwob browsergym-webarena

# Run the server locally
cd src/envs/browsergym_env/server
export BROWSERGYM_BENCHMARK=miniwob
export BROWSERGYM_TASK_NAME=click-test
python app.py
```

## Project Structure

```
browsergym_env/
├── __init__.py              # Module exports
├── models.py                # Action, Observation, State dataclasses
├── client.py                # HTTPEnvClient implementation
├── README.md                # This file
└── server/
    ├── __init__.py
    ├── app.py                     # FastAPI application
    ├── browsergym_environment.py  # Environment implementation
    ├── Dockerfile                 # Container specification
    └── requirements.txt           # Python dependencies
```

## References

- [BrowserGym GitHub](https://github.com/ServiceNow/BrowserGym)
- [MiniWoB++ Paper](https://arxiv.org/abs/1802.08802)
- [WebArena Paper](https://arxiv.org/abs/2307.13854)
- [WebArena Website](https://webarena.dev/)
- [VisualWebArena Project Page](https://jykoh.com/vwa)
- [OpenEnv Documentation](https://github.com/openenv/openenv)
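
The WebArena examples in this README repeat the same seven backend URLs for every environment. The sketch below shows one way to build that `environment` dictionary from a single base URL. It assumes all services run on one host and use the ports and paths shown in this README; `webarena_environment` is a hypothetical helper and is not part of OpenEnv or BrowserGym.

```python
from typing import Dict

# Ports and paths copied from the WebArena examples in this README.
# Adjust them if your deployment differs.
_WEBARENA_SERVICES: Dict[str, str] = {
    "SHOPPING": ":7770",
    "SHOPPING_ADMIN": ":7780/admin",
    "REDDIT": ":9999",
    "GITLAB": ":8023",
    "MAP": ":3000",
    "WIKIPEDIA": ":8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing",
    "HOMEPAGE": ":4399",
}


def webarena_environment(base_url: str, task_name: str = "0") -> Dict[str, str]:
    """Build the environment-variable dict for one WebArena task (hypothetical helper)."""
    base = base_url.rstrip("/")  # tolerate a trailing slash on the host
    env = {
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": str(task_name),
    }
    for name, suffix in _WEBARENA_SERVICES.items():
        env[name] = f"{base}{suffix}"
    return env


config = webarena_environment("http://your-server", task_name="156")
print(config["GITLAB"])  # http://your-server:8023
```

The returned dictionary has the same keys as the `environment={...}` argument passed to `BrowserGymEnv.from_docker_image(...)` in the WebArena examples, so switching tasks only requires changing `task_name`.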