NumPy

Use the NumPy backend for examples, notebooks, and model code that already works with arrays.

What This Backend Changes

rlmesh.numpy keeps the same environment, model, and sandbox behavior as the shared RLMesh client APIs, but decodes tensor leaves to NumPy arrays. Space wrappers returned from NumPy clients also sample NumPy-compatible values.

Install it with:

pip install "rlmesh[numpy]"

| Concrete API | Shared behavior | Backend-specific behavior |
|---|---|---|
| rlmesh.numpy.RemoteEnv | Remote Environments (single clients) | Observations, actions, and render frames use arrays. |
| rlmesh.numpy.RemoteVectorEnv | Remote Environments (vector clients) | Batched values use NumPy-compatible containers. |
| rlmesh.numpy.Model | Models | predict_fn receives NumPy-decoded observations. |
| rlmesh.numpy.SandboxEnv | Sandbox (single sandbox sessions) | Owned sandbox client is rlmesh.numpy.RemoteEnv. |
| rlmesh.numpy.SandboxModel | Sandbox | Runs a model policy in its own container (experimental). |
| rlmesh.numpy.SandboxVectorEnv | Sandbox (vector sandbox sessions) | Owned sandbox client is rlmesh.numpy.RemoteVectorEnv. |

Conversion Semantics

  • asarray(tensor) returns a writable copy of the tensor bytes, matching Gymnasium where reset/step observations are writable (so obs /= 255.0 works). For a zero-copy, read-only view that shares the tensor buffer, use numpy.from_dlpack(tensor) or the buffer protocol.

  • from_array(array) always copies: it makes the array C-contiguous and serializes its bytes into a fresh RLMesh tensor.

  • bfloat16 tensors have no buffer-protocol format, so asarray copies through raw bytes and needs the optional ml_dtypes package, which the rlmesh[bfloat16] extra installs. Without it, asarray raises an ImportError naming that extra.
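The difference between a writable copy and a read-only view can be illustrated with plain NumPy. This sketch does not call RLMesh: an immutable bytes object stands in for the tensor buffer (an assumption made only for illustration), np.frombuffer plays the role of the zero-copy view, and .copy() plays the role of the writable array that asarray is described as returning.

```python
import numpy as np

# Stand-in for a tensor's underlying buffer (hypothetical; not an RLMesh object).
tensor_bytes = np.array([0.0, 127.5, 255.0], dtype=np.float32).tobytes()

# Zero-copy view over the immutable buffer: NumPy marks it read-only.
view = np.frombuffer(tensor_bytes, dtype=np.float32)

# Fresh copy: owns its memory, so in-place updates are allowed.
obs = view.copy()
obs /= 255.0

# The same in-place update on the read-only view is rejected by NumPy.
try:
    view /= 255.0
except ValueError as exc:
    view_error = str(exc)
else:
    view_error = None
```

The copy costs one extra allocation per observation. The view avoids that cost but has to be treated as read-only.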

Value Helpers

rlmesh.numpy.ensure_available()

Raise if NumPy is not installed.

Return type:

None
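A guard of this kind is commonly written with importlib. The sketch below shows the general pattern, not the RLMesh implementation. The helper name require_module and the wording of its error message are hypothetical.

```python
import importlib.util


def require_module(name: str, extra: str) -> None:
    """Raise ImportError with an install hint if top-level module `name` is missing.

    Hypothetical helper for illustration; not part of rlmesh.
    """
    if importlib.util.find_spec(name) is None:
        raise ImportError(
            f'{name} is required; install it with: pip install "rlmesh[{extra}]"'
        )
```

Calling require_module("numpy", "numpy") once at import time means a missing dependency is reported immediately with an install hint, not as a failure deep inside a rollout.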

rlmesh.numpy.asarray(tensor)

Return a writable NumPy array containing an RLMesh tensor’s data.

The returned array owns a fresh copy of the tensor bytes, so it is writable and matches Gymnasium, where reset/step observations are writable (idioms such as obs /= 255.0 work). For an opt-in zero-copy view that shares the tensor buffer, use the buffer protocol or DLPack directly (for example numpy.from_dlpack(tensor)), treating the result as read-only.

Parameters:

tensor (Tensor) – RLMesh tensor value to convert.

Returns:

A writable NumPy array with a copy of the tensor data. bfloat16 tensors require the ml_dtypes package (rlmesh[bfloat16]).

Return type:

object
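The bfloat16 requirement comes from NumPy having no built-in bfloat16 dtype. The sketch below shows one way to decode raw bfloat16 bytes with ml_dtypes. The function name decode_bfloat16 and the error message are hypothetical, and the code only illustrates the idea of copying through raw bytes. It is not how asarray is implemented.

```python
import numpy as np


def decode_bfloat16(raw: bytes, shape):
    """Decode native-endian bfloat16 bytes into a writable NumPy array.

    Hypothetical helper for illustration; not part of rlmesh.
    """
    try:
        import ml_dtypes  # optional dependency that provides the bfloat16 dtype
    except ImportError as exc:
        raise ImportError(
            "bfloat16 decoding needs ml_dtypes; install rlmesh[bfloat16]"
        ) from exc
    # frombuffer gives a read-only view of the bytes; copy() makes it writable.
    return np.frombuffer(raw, dtype=ml_dtypes.bfloat16).reshape(shape).copy()
```

On a little-endian machine, decode_bfloat16(b"\x80\x3f\x00\x40", (2,)) decodes the bit patterns 0x3F80 and 0x4000.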

rlmesh.numpy.from_array(array)

Encode a NumPy array or scalar as an RLMesh value.

Parameters:

array (object) – NumPy array or scalar to encode.

Returns:

Tensor for non-scalar arrays, or a primitive for scalar values.

Return type:

Tensor | None | bool | int | float | str | bytes
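The two return shapes (a primitive for scalars, a tensor otherwise) and the C-contiguous copy can be sketched with plain NumPy. The helper encode_sketch and its dictionary payload are hypothetical stand-ins for an RLMesh tensor. Treating zero-dimensional arrays as scalars is also an assumption, since the source does not say how they are handled.

```python
import numpy as np


def encode_sketch(value):
    """Hypothetical stand-in for the encoding step; not the RLMesh implementation."""
    array = np.asarray(value)
    if array.ndim == 0:
        # Assumption: zero-dimensional inputs count as scalars and become primitives.
        return array.item()
    contiguous = np.ascontiguousarray(array)  # C order; copies only when needed
    # tobytes() always builds a new bytes object, so the payload is a copy.
    return {
        "dtype": contiguous.dtype.str,
        "shape": contiguous.shape,
        "data": contiguous.tobytes(),
    }
```

A transposed array is a typical non-contiguous input. After encoding, its bytes are laid out in row-major order for the transposed shape, so a receiver can rebuild it with np.frombuffer and reshape.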

rlmesh.numpy.space_from_spec(spec)

Create a NumPy-adapted space wrapper for a native space spec.

Parameters:

spec (SpaceSpec)

Return type:

Space[None | bool | int | float | str | bytes | object | list[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]] | tuple[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], …] | dict[str, None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]]

RemoteEnv

final class rlmesh.numpy.RemoteEnv

Bases: RemoteEnvBase[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]

NumPy-backed remote client for a single RLMesh environment.

Observations, rewards, and actions are decoded into Python primitives, NumPy arrays, or nested containers of those values. Use this client when a model or notebook expects NumPy values at the environment boundary.

Parameters:
  • address – Endpoint address such as "tcp://127.0.0.1:5555", "127.0.0.1:5555", or "unix:///tmp/env.sock".

  • host – TCP host helper used when address is omitted.

  • port – TCP port helper used when address is omitted.

  • path – Unix socket path helper used when address is omitted.

  • transport – Explicit transport selector.

Examples

>>> from rlmesh.numpy import RemoteEnv
>>> env = RemoteEnv("127.0.0.1:5555")
>>> observation, info = env.reset(seed=42)
>>> observation, reward, terminated, truncated, info = env.step(0)
>>> env.close()
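The reset and step calls above extend naturally to a full episode loop. The sketch below relies only on the reset(seed=...) and five-value step(...) signatures shown in the example. The run_episode helper and the CountdownEnv stand-in are hypothetical and exist so the code runs without a server.

```python
def run_episode(env, policy, seed=None, max_steps=1000):
    """Roll out one episode and return (total_reward, steps)."""
    observation, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps


class CountdownEnv:
    """Tiny in-process stand-in (hypothetical): terminates after three steps."""

    def reset(self, seed=None):
        self.remaining = 3
        return self.remaining, {}

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining == 0, False, {}

    def close(self):
        pass


total, steps = run_episode(CountdownEnv(), policy=lambda observation: 0, seed=42)
```

With a running endpoint, the same function should accept a RemoteEnv in place of CountdownEnv, followed by env.close() once the rollout is done.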

RemoteVectorEnv

final class rlmesh.numpy.RemoteVectorEnv

Bases: RemoteVectorEnvBase[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]

NumPy-backed remote client for a vectorized RLMesh environment.

A vector client connects one model process to an endpoint that owns multiple environment instances. Batched observations, rewards, terminations, and truncations decode into NumPy-compatible values.

Parameters:
  • address – Endpoint address such as "tcp://127.0.0.1:5555".

  • host – TCP host helper used when address is omitted.

  • port – TCP port helper used when address is omitted.

  • path – Unix socket path helper used when address is omitted.

  • transport – Explicit transport selector.

Examples

>>> from rlmesh.numpy import RemoteVectorEnv
>>> envs = RemoteVectorEnv("127.0.0.1:5555")
>>> observations, infos = envs.reset(seed=42)
>>> actions = [envs.single_action_space.sample() for _ in range(envs.num_envs)]
>>> observations, rewards, terminations, truncations, infos = envs.step(actions)
>>> envs.close()
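Because batched rewards decode into NumPy-compatible values, they can be accumulated per environment with ordinary array arithmetic. The sketch below uses only the attributes shown in the example above (num_envs, single_action_space.sample(), reset, step). The collect_returns helper and the FakeVectorEnv stand-in are hypothetical, and the loop ignores episode boundaries for brevity.

```python
import numpy as np


def collect_returns(envs, n_steps, seed=None):
    """Sum rewards per environment over n_steps random-action steps."""
    observations, infos = envs.reset(seed=seed)
    returns = np.zeros(envs.num_envs, dtype=np.float64)
    for _ in range(n_steps):
        actions = [envs.single_action_space.sample() for _ in range(envs.num_envs)]
        observations, rewards, terminations, truncations, infos = envs.step(actions)
        returns += np.asarray(rewards, dtype=np.float64)
    return returns


class _ConstantSpace:
    """Hypothetical action space that always samples action 0."""

    def sample(self):
        return 0


class FakeVectorEnv:
    """In-process stand-in (hypothetical) that pays reward 1.0 per step."""

    num_envs = 2
    single_action_space = _ConstantSpace()

    def reset(self, seed=None):
        return np.zeros((self.num_envs, 4)), {}

    def step(self, actions):
        n = self.num_envs
        done = np.zeros(n, dtype=bool)
        return np.zeros((n, 4)), np.ones(n), done, done.copy(), {}

    def close(self):
        pass


returns = collect_returns(FakeVectorEnv(), n_steps=5, seed=42)
```

Per-episode accounting would additionally need to reset each entry of returns when the matching termination or truncation flag is set.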

Model

final class rlmesh.numpy.Model

Bases: ModelBase[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]

NumPy-backed model whose predict callable works on NumPy values.

The NumPy-typed ModelBase. Construct it as Model(source, spec=...), where source is a predict callable; run(env, seeds=[...]) returns a typed RunResult. See ModelBase.

Examples

>>> from rlmesh.numpy import Model
>>> Model(lambda observation: 0).run("127.0.0.1:5555", seeds=[0]).mean_reward
0.0
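A predict callable that uses the observation might look like the following sketch. It assumes a CartPole-style observation of four floats in which index 2 is the pole angle. That layout comes from Gymnasium's CartPole and is not stated in this page, and the policy itself is a hypothetical example.

```python
import numpy as np


def predict(observation):
    """Hypothetical policy: push right (1) when the pole leans right, else left (0)."""
    # Assumption: observation is [cart position, cart velocity, pole angle, pole velocity].
    observation = np.asarray(observation, dtype=np.float32)
    pole_angle = observation[2]
    return int(pole_angle > 0.0)
```

Following the doctest above, such a callable would be passed as Model(predict) in place of the lambda.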

Sandbox

final class rlmesh.numpy.SandboxEnv

Bases: SandboxEnvBase[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]

Owned NumPy-backed sandbox session for one environment.

The sandbox starts an isolated environment process, connects a NumPy remote client to it, and stops the owned container when closed.

Parameters:
  • source – Gymnasium id, explicit gym:// source, or pinned environment source such as an EnvHub/Hugging Face reference.

  • base_image – Optional Docker base image override.

  • rlmesh_package – Optional RLMesh package, wheel, or "local" installed in the sandbox.

  • packages – Extra environment packages installed in the sandbox.

  • imports – Import names checked during sandbox startup.

  • trust_remote_code – Allow remote environment code to execute.

  • allow_unpinned_hf – Allow Hugging Face sources without a pinned revision.

  • **gym_make_kwargs – Keyword arguments forwarded to environment creation.

Examples

>>> from rlmesh.numpy import SandboxEnv
>>> env = SandboxEnv("CartPole-v1", packages=["gymnasium==1.3.0"])
>>> observation, info = env.reset(seed=42)
>>> env.close()

final class rlmesh.numpy.SandboxVectorEnv

Bases: SandboxVectorEnvBase[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]

Owned NumPy-backed sandbox session for vectorized environments.

The sandbox starts multiple isolated environment instances and exposes them through the same vector client interface as a separately served endpoint.

Parameters:
  • source – Gymnasium id, explicit gym:// source, or pinned environment source such as an EnvHub/Hugging Face reference.

  • num_envs – Number of environment instances to create.

  • vectorization_mode – Vectorization mode requested inside the sandbox.

  • base_image – Optional Docker base image override.

  • rlmesh_package – Optional RLMesh package, wheel, or "local" installed in the sandbox.

  • packages – Extra environment packages installed in the sandbox.

  • imports – Import names checked during sandbox startup.

  • trust_remote_code – Allow remote environment code to execute.

  • allow_unpinned_hf – Allow Hugging Face sources without a pinned revision.

  • **env_make_kwargs – Keyword arguments forwarded to environment creation.

Examples

>>> from rlmesh.numpy import SandboxVectorEnv
>>> envs = SandboxVectorEnv("CartPole-v1", num_envs=2)
>>> observations, infos = envs.reset(seed=42)
>>> envs.close()
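A sandbox session owns a container that is stopped on close, so it helps to guarantee that close() runs even when a rollout fails. The sketch below uses contextlib.closing from the standard library, which only requires a close() method. FakeSandbox is a hypothetical stand-in, and whether the sandbox classes support the with statement directly is not stated here.

```python
from contextlib import closing


class FakeSandbox:
    """Hypothetical stand-in that records whether close() was called."""

    def __init__(self):
        self.closed = False

    def reset(self, seed=None):
        return [0.0], {}

    def close(self):
        self.closed = True


sandbox = FakeSandbox()
try:
    with closing(sandbox) as env:
        env.reset(seed=42)
        raise RuntimeError("simulated failure mid-rollout")
except RuntimeError:
    pass  # close() has already run by the time the error reaches this handler
```

The same pattern should apply to a real session, for example with closing(SandboxVectorEnv("CartPole-v1", num_envs=2)) as envs.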