Sandbox Examples

Sandbox helpers are experimental. Use them when an environment needs an owned Docker-backed process instead of a separate server terminal.

The runnable files live in examples/python/sandbox.

Gymnasium Sandbox

Start with the Gymnasium example:

uv run python examples/python/sandbox/gym_sandbox.py

It starts CartPole-v1 inside a sandbox image and connects with rlmesh.numpy.SandboxEnv:

from rlmesh.numpy import SandboxEnv

env = SandboxEnv(
    "CartPole-v1",
    packages=["gymnasium==1.3.0"],
    imports=["gymnasium"],
)

packages are installed in the sandbox image and imports are checked at startup. The client shape is the same as RemoteEnv, so a try/finally keeps the owned container from leaking:

MAX_STEPS = 45

try:
    obs, info = env.reset(seed=0)
    for step in range(1, MAX_STEPS + 1):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        print(f"step={step} reward={reward:.3f}")
        if terminated or truncated:
            print("episode complete")
            break
finally:
    env.close()

The runnable file is examples/python/sandbox/gym_sandbox.py.

Hugging Face Sandbox

hf_sandbox.py shows the same single-env loop against a Hugging Face EnvHub source:

uv run python examples/python/sandbox/hf_sandbox.py

Only the constructor changes; the source is an hf:// reference instead of a Gymnasium id:

from rlmesh.numpy import SandboxEnv

env = SandboxEnv(
    "hf://lerobot/cartpole-env:cartpole_suite/0",
    trust_remote_code=True,
    allow_unpinned_hf=True,
)

The selector chooses suite cartpole_suite, task 0. The example uses SandboxEnv because it requests one environment. Use SandboxVectorEnv when serving more than one:

from rlmesh.numpy import SandboxVectorEnv

envs = SandboxVectorEnv("CartPole-v1", num_envs=2)

The demo is intentionally unpinned; for real evaluations, pin the repository to a full commit SHA and keep trust_remote_code=False unless you have reviewed the source.

The runnable file is examples/python/sandbox/hf_sandbox.py.