metadata
title: RlveGym Environment Server
emoji: 📡
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv

RlveGym Environment

This package contains a collection of 400 verifiable environments from RLVE-Gym, introduced in the paper RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments (the original GitHub repository is here).

Quick Start

The simplest way to use the RlveGym environment is through the RlveGymEnv class:

from RLVE_Gym import RlveGymAction, RlveGymEnv

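# Create environment from Docker image.
# This happens before the try block so that close() in the finally clause
# is only reached when the environment was actually created.
RLVE_Gymenv = RlveGymEnv.from_docker_image("RLVE_Gym-env:latest")

try: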

    # Reset
    result = RLVE_Gymenv.reset()
    print(f"Problem Prompt: {result.observation.problem_input}")
    # Or:
    print(f"Problem Prompt (from the environment's state): {RLVE_Gymenv.state().problem_input}")

    # Send multiple outputs
    outputs = [
        "Wrong Format",
        r"<answer>0</answer>", # Wrong Answer
        r"<answer>" + str(RLVE_Gymenv.problem.parameter["reference_answer"]) + r"</answer>", # Correct Answer
    ]

    for output in outputs:
        result = RLVE_Gymenv.step(RlveGymAction(output = output))
        print(f"Sent: '{output}'")
        print(f"Result: `{result}`")

finally:
    # Always clean up
    RLVE_Gymenv.close()

That's it! The RlveGymEnv.from_docker_image() method handles:

  • Starting the Docker container
  • Waiting for the server to be ready
  • Connecting to the environment
  • Container cleanup when you call close()
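Because cleanup depends on close() being called, the try/finally pattern above can also be expressed with contextlib.closing from the Python standard library. The sketch below uses a stand-in class (FakeEnv, a hypothetical name that is not part of RLVE_Gym) so that it runs without Docker; with the real client, the object returned by RlveGymEnv.from_docker_image() would presumably take its place, assuming close() is the only cleanup it needs.

```python
from contextlib import closing


class FakeEnv:
    """Hypothetical stand-in for an environment client; only close() matters here."""

    def __init__(self):
        self.closed = False

    def reset(self):
        return "problem"

    def close(self):
        self.closed = True


def run_once(env):
    # closing() guarantees env.close() is called, even if reset() raises.
    with closing(env) as e:
        return e.reset()


env = FakeEnv()
first = run_once(env)
```

This keeps the cleanup next to the code that needs it and avoids forgetting the finally clause.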

Building the Docker Image

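Before using the environment, you need to build the Docker image. Note that Docker requires repository names to be lowercase, so if the tag below is rejected, choose a lowercase tag and pass that same tag to from_docker_image():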

# From project root
docker build -t RLVE_Gym-env:latest -f server/Dockerfile .

Deploying to Hugging Face Spaces

You can easily deploy your OpenEnv environment to Hugging Face Spaces using the openenv push command:

# From the environment directory (where openenv.yaml is located)
openenv push

# Or specify options
openenv push --namespace my-org --private

The openenv push command will:

  1. Validate that the directory is an OpenEnv environment (checks for openenv.yaml)
  2. Prepare a custom build for Hugging Face Docker space (enables web interface)
  3. Upload to Hugging Face (ensuring you're logged in)

Prerequisites

  • Authenticate with Hugging Face: The command will prompt for login if not already authenticated

Options

  • --directory, -d: Directory containing the OpenEnv environment (defaults to current directory)
  • --repo-id, -r: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
  • --base-image, -b: Base Docker image to use (overrides Dockerfile FROM)
  • --private: Deploy the space as private (default: public)

Examples

# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
openenv push

# Push to a specific repository
openenv push --repo-id my-org/my-env

# Push with a custom base image
openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest

# Push as a private space
openenv push --private

# Combine options
openenv push --repo-id my-org/my-env --base-image custom-base:latest --private

After deployment, your space will be available at: https://huggingface.co/spaces/<repo-id>

The deployed space includes:

  • Web Interface at /web - Interactive UI for exploring the environment
  • API Documentation at /docs - Full OpenAPI/Swagger interface
  • Health Check at /health - Container health monitoring
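As a small illustration of the three endpoints listed above, the following sketch builds their full URLs from a server's base URL. The base URL used here (a local server on port 8000, matching app_port in the metadata) is an assumption for the example, and endpoint_urls is a hypothetical helper, not part of OpenEnv or RLVE_Gym.

```python
from urllib.parse import urljoin

# Paths listed above for a deployed space.
ENDPOINT_PATHS = {"web": "/web", "docs": "/docs", "health": "/health"}


def endpoint_urls(base_url: str) -> dict:
    """Return the full URL of each documented endpoint under base_url."""
    # Normalize to exactly one trailing slash so urljoin appends to the path.
    root = base_url.rstrip("/") + "/"
    return {
        name: urljoin(root, path.lstrip("/"))
        for name, path in ENDPOINT_PATHS.items()
    }


# Hypothetical local server address.
urls = endpoint_urls("http://localhost:8000")
for name, url in urls.items():
    print(f"{name}: {url}")
```

The health URL can then be polled with any HTTP client to check that the container is up before sending reset() or step() calls.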

Environment Details

Environment Initialization

The environment is initialized with the following parameters (please check here for detailed usage):

  • environment_identifier (str) - The environment's identifier. Check here for detailed usage.
  • difficulty (int) - The difficulty of generated problems.
  • answer_markers (Tuple[str, str]) - The opening and closing markers the environment uses to extract the final answer from a model output (the Quick Start example wraps answers in <answer> and </answer>).
  • seed (int) - The initial seed to use when generating the first problem. Whenever reset() is called, the seed will be incremented by 1.
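To make the role of answer_markers concrete, here is a minimal sketch of marker-based extraction. It is an illustration only and not the extraction logic used by RLVE-Gym: the function name extract_answer is hypothetical, and the choice to take the last opening marker in the output is an assumption.

```python
from typing import Optional, Tuple


def extract_answer(output: str, answer_markers: Tuple[str, str]) -> Optional[str]:
    """Return the text between the last opening marker and the next closing marker.

    Returns None when either marker is missing (a format error).
    """
    opening, closing = answer_markers
    start = output.rfind(opening)  # assumption: the last marked answer wins
    if start == -1:
        return None
    start += len(opening)
    end = output.find(closing, start)
    if end == -1:
        return None
    return output[start:end].strip()


markers = ("<answer>", "</answer>")
print(extract_answer("Wrong Format", markers))
print(extract_answer("Reasoning... <answer>42</answer>", markers))
```

Under this sketch, an output without both markers yields None, which corresponds to the "Wrong Format" case in the Quick Start example.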

Action

RlveGymAction: Contains a single field

  • output (str) - The model's output to get verified.

State

RlveGymState:

  • seed (int) - The seed to use when running reset().
  • problem_input (Optional[str]) - The input of the problem; None means that problem generation has not been run or has failed.
  • num_samples (int) and sum_accuracy (int) - Running statistics of the step(action) calls for the current problem: the number of outputs sent to the verifier and the number of those that were correct.
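The two counters can be combined into a per-problem accuracy estimate. The sketch below mirrors the state fields in a local dataclass (StateSnapshot, a hypothetical name) so it runs on its own; with the real client, the values would presumably come from the object returned by state().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateSnapshot:
    """Hypothetical local mirror of the RlveGymState fields listed above."""

    seed: int
    problem_input: Optional[str]
    num_samples: int
    sum_accuracy: int


def mean_accuracy(state: StateSnapshot) -> Optional[float]:
    """Fraction of correct outputs for the current problem, or None if none were sent."""
    if state.num_samples == 0:
        return None  # avoid dividing by zero before the first step()
    return state.sum_accuracy / state.num_samples


# Example values: three outputs sent, one of them correct.
snapshot = StateSnapshot(seed=0, problem_input="example problem", num_samples=3, sum_accuracy=1)
print(mean_accuracy(snapshot))
```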

Observation

RlveGymObservation:

  • problem_input (Optional[str]) - The input of the problem; None means that problem generation has not been run or has failed.
  • verifier_result (Optional[dict]) - Contains reward as the raw reward, accuracy as the 0/1 correctness, and format_score as the 0/1 format correctness.
  • success (bool) - Whether the operation succeeded.
  • message (str) - An explanation of the success value.
  • reward (Optional[float]) - Equal to verifier_result["reward"].
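The verifier_result fields map naturally onto the three cases in the Quick Start example (wrong format, wrong answer, correct answer). The following sketch shows one way to label a result; classify is a hypothetical helper, the example dictionaries are made up, and treating a None result as "not verified" is an assumption about when the field is empty.

```python
from typing import Optional


def classify(verifier_result: Optional[dict]) -> str:
    """Map a verifier_result dict to a coarse label."""
    if verifier_result is None:
        return "not verified"  # assumption: no output has been verified yet
    if verifier_result.get("format_score") != 1:
        return "wrong format"
    if verifier_result.get("accuracy") == 1:
        return "correct"
    return "wrong answer"


# Made-up examples; real reward values depend on the environment.
examples = [
    None,
    {"reward": 0.0, "accuracy": 0, "format_score": 0},
    {"reward": 0.0, "accuracy": 0, "format_score": 1},
    {"reward": 1.0, "accuracy": 1, "format_score": 1},
]
labels = [classify(result) for result in examples]
print(labels)
```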

Advanced Usage

Connecting to an Existing Server

If you already have an RlveGymEnv server running, you can connect directly:

from RLVE_Gym import RlveGymAction, RlveGymEnv

# Connect to existing server
RLVE_Gymenv = RlveGymEnv(base_url="<ENV_HTTP_URL_HERE>")

# Use as normal
result = RLVE_Gymenv.reset()
result = RLVE_Gymenv.step(RlveGymAction(output="Hello!"))

Note: When connecting to an existing server, RLVE_Gymenv.close() will NOT stop the server.

Development & Testing

Direct Environment Testing

Test the environment logic directly without starting the HTTP server:

# From the project root
python3 server/RLVE_Gym_environment.py

This verifies that:

  • Environment resets correctly
  • Step executes actions properly
  • State tracking works
  • Rewards are calculated correctly

Running Locally

Run the server locally for development:

uvicorn server.app:app --reload