NumPy
Use the NumPy backend for examples, notebooks, and model code that already works with arrays.
What This Backend Changes
rlmesh.numpy keeps the same environment, model, and sandbox behavior as the shared RLMesh client
APIs, but decodes tensor leaves to NumPy arrays. Space wrappers returned from NumPy clients also
sample NumPy-compatible values.
Install it with:
pip install "rlmesh[numpy]"
| Concrete API | Shared behavior | Backend-specific behavior |
|---|---|---|
| | Remote Environments single clients | Observations, actions, and render frames use arrays. |
| | Remote Environments vector clients | Batched values use NumPy-compatible containers. |
| | | |
| | Sandbox single sandbox sessions | Owned sandbox client is |
| | Runs a model policy in its own container (experimental). | |
| | Sandbox vector sandbox sessions | Owned sandbox client is |
Conversion Semantics
- asarray(tensor) returns a writable copy of the tensor bytes, matching Gymnasium, where reset/step observations are writable (so obs /= 255.0 works). For a zero-copy, read-only view that shares the tensor buffer, use numpy.from_dlpack(tensor) or the buffer protocol.
- from_array(array) always copies: it makes the array C-contiguous and serializes its bytes into a fresh RLMesh tensor.
- bfloat16 tensors have no buffer-protocol format, so asarray copies through raw bytes and needs the optional ml_dtypes package. Install rlmesh[bfloat16]. Without it, asarray raises an ImportError naming that extra.
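The copy-versus-view distinction above can be illustrated with plain NumPy. This is only a sketch: a bytes object stands in for the tensor buffer, and none of the calls below are RLMesh APIs.

```python
import numpy as np

# Stand-in for the raw bytes of a 2x3 float32 tensor (not an RLMesh object).
raw = np.arange(6, dtype=np.float32).reshape(2, 3).tobytes()

# Zero-copy view over an immutable buffer: shares memory and is read-only.
view = np.frombuffer(raw, dtype=np.float32).reshape(2, 3)

# Fresh copy: owns its data and is writable, like the asarray result described above.
copy = view.copy()
copy /= 255.0  # in-place float math works on the copy

try:
    view /= 255.0
    view_is_writable = True
except ValueError:
    # NumPy refuses in-place writes to a read-only array.
    view_is_writable = False

# Encoding direction: a transposed array is not C-contiguous, so it is
# copied into C order before its bytes are serialized.
transposed = copy.T
contiguous = np.ascontiguousarray(transposed)
payload = contiguous.tobytes()
```

In-place division also needs a floating dtype. On a uint8 observation, obs /= 255.0 raises a NumPy casting error, so convert first, for example with obs.astype(np.float32).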
Value Helpers
- rlmesh.numpy.asarray(tensor)
Return a writable NumPy array containing an RLMesh tensor’s data.
The returned array owns a fresh copy of the tensor bytes, so it is writable and matches Gymnasium, where reset/step observations are writable (idioms such as obs /= 255.0 work). For an opt-in zero-copy view that shares the tensor buffer, use the buffer protocol or DLPack directly (for example numpy.from_dlpack(tensor)), treating the result as read-only.
- Parameters:
tensor (Tensor) – RLMesh tensor value to convert.
- Returns:
A writable NumPy array with a copy of the tensor data. bfloat16 tensors require the ml_dtypes package (rlmesh[bfloat16]).
- Return type:
object
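Because bfloat16 conversion depends on an optional package, a caller may want to check for it before decoding such tensors. The sketch below uses only the standard library; the helper name has_bfloat16_support and the hint handling are hypothetical, not part of RLMesh.

```python
import importlib.util


def has_bfloat16_support() -> bool:
    """Return True when the optional ml_dtypes package is importable."""
    return importlib.util.find_spec("ml_dtypes") is not None


if has_bfloat16_support():
    hint = ""
else:
    # Hypothetical handling: surface the extra to install before any conversion fails.
    hint = 'pip install "rlmesh[bfloat16]"'
```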
- rlmesh.numpy.from_array(array)
Encode a NumPy array or scalar as an RLMesh value.
- Parameters:
array (object) – NumPy array or scalar to encode.
- Returns:
Tensor for non-scalar arrays, or a primitive for scalar values.
- Return type:
Tensor | None | bool | int | float | str | bytes
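The return rule separates scalars from non-scalar arrays, but this page does not say how RLMesh draws that line. The sketch below assumes that "scalar" means zero-dimensional; classify is a hypothetical helper used only to show the two branches.

```python
import numpy as np


def classify(value: object) -> str:
    """Hypothetical helper: guess which branch of the return rule applies.

    Assumes "scalar" means zero-dimensional; RLMesh may decide differently.
    """
    return "primitive" if np.ndim(value) == 0 else "tensor"


scalar_kind = classify(np.float32(1.5))               # zero-dimensional NumPy scalar
array_kind = classify(np.zeros(4, dtype=np.float32))  # one-dimensional array
```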
- rlmesh.numpy.space_from_spec(spec)
Create a NumPy-adapted space wrapper for a native space spec.
- Parameters:
spec (SpaceSpec)
- Return type:
Space[NumpyValue], where NumpyValue is None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]
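The nested NumpyValue union in the return type is easier to read as a recursive check. The function below is a sketch, not an RLMesh API. It assumes the object member of the union stands for NumPy arrays and NumPy scalars, which this page does not state.

```python
import numpy as np


def is_numpy_value(value: object) -> bool:
    """Hypothetical validator mirroring the recursive NumpyValue union."""
    # Leaves: None, Python primitives, and (by assumption) NumPy arrays and scalars.
    if value is None or isinstance(
        value, (bool, int, float, str, bytes, np.ndarray, np.generic)
    ):
        return True
    # Containers: lists and tuples of values, or dicts with string keys.
    if isinstance(value, (list, tuple)):
        return all(is_numpy_value(item) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_numpy_value(item)
            for key, item in value.items()
        )
    return False
```

For example, {"obs": np.zeros(3), "meta": [1, "a", (None, 2.0)]} passes the check, while a dict with integer keys does not.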
RemoteEnv
- final class rlmesh.numpy.RemoteEnv
Bases: RemoteEnvBase[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]
NumPy-backed remote client for a single RLMesh environment.
Observations, rewards, and actions are decoded into Python primitives, NumPy arrays, or nested containers of those values. Use this client when a model or notebook expects NumPy values at the environment boundary.
- Parameters:
address – Endpoint address such as "tcp://127.0.0.1:5555", "127.0.0.1:5555", or "unix:///tmp/env.sock".
host – TCP host helper used when address is omitted.
port – TCP port helper used when address is omitted.
path – Unix socket path helper used when address is omitted.
transport – Explicit transport selector.
Examples
>>> from rlmesh.numpy import RemoteEnv
>>> env = RemoteEnv("127.0.0.1:5555")
>>> observation, info = env.reset(seed=42)
>>> observation, reward, terminated, truncated, info = env.step(0)
>>> env.close()
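The reset/step calls above extend naturally to a full episode loop. The sketch below relies only on the five-value step result shown in the example. CountdownEnv is a hypothetical in-process stand-in so the code runs without a live endpoint, and run_episode is an illustrative helper rather than an RLMesh function.

```python
from typing import Any, Callable


def run_episode(env: Any, policy: Callable[[Any], Any], seed: int,
                max_steps: int = 1000) -> float:
    """Roll out one episode and return the summed reward."""
    observation, info = env.reset(seed=seed)
    total = 0.0
    for _ in range(max_steps):
        observation, reward, terminated, truncated, info = env.step(policy(observation))
        total += float(reward)
        if terminated or truncated:
            break
    return total


class CountdownEnv:
    """Hypothetical stand-in: terminates after three steps with reward 1.0 each."""

    def reset(self, seed=None):
        self.remaining = 3
        return self.remaining, {}

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining == 0, False, {}

    def close(self):
        pass


env = CountdownEnv()  # against a live endpoint: RemoteEnv("127.0.0.1:5555")
try:
    episode_return = run_episode(env, policy=lambda observation: 0, seed=42)
finally:
    env.close()  # release the client even if the rollout fails
```

The try/finally keeps the connection from leaking when a policy raises mid-episode.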
RemoteVectorEnv
- final class rlmesh.numpy.RemoteVectorEnv
Bases: RemoteVectorEnvBase[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]
NumPy-backed remote client for a vectorized RLMesh environment.
A vector client connects one model process to an endpoint that owns multiple environment instances. Batched observations, rewards, terminations, and truncations decode into NumPy-compatible values.
- Parameters:
address – Endpoint address such as "tcp://127.0.0.1:5555".
host – TCP host helper used when address is omitted.
port – TCP port helper used when address is omitted.
path – Unix socket path helper used when address is omitted.
transport – Explicit transport selector.
Examples
>>> from rlmesh.numpy import RemoteVectorEnv
>>> envs = RemoteVectorEnv("127.0.0.1:5555")
>>> observations, infos = envs.reset(seed=42)
>>> actions = [envs.single_action_space.sample() for _ in range(envs.num_envs)]
>>> observations, rewards, terminations, truncations, infos = envs.step(actions)
>>> envs.close()
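Batched rewards and done flags are convenient to post-process with array operations. The sketch below sums each environment's rewards up to and including its first finished step. It assumes the caller has logged rewards and np.logical_or(terminations, truncations) from each step call. Whether the endpoint resets finished environments automatically is not stated on this page, and accumulate_returns is a hypothetical helper.

```python
import numpy as np


def accumulate_returns(rewards_per_step, dones_per_step):
    """Sum rewards per environment, stopping each one after its first done flag."""
    rewards = np.asarray(rewards_per_step, dtype=np.float64)  # shape (steps, num_envs)
    dones = np.asarray(dones_per_step, dtype=bool)            # shape (steps, num_envs)
    # True where an environment already finished on an earlier step.
    finished_before = (np.cumsum(dones, axis=0) - dones) > 0
    return np.where(finished_before, 0.0, rewards).sum(axis=0)


# Hypothetical logged data: 3 steps, 2 environments.
rewards_log = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
dones_log = [[False, False], [True, False], [False, True]]
returns = accumulate_returns(rewards_log, dones_log)
```

Here the first environment finishes on its second step and the second on its third, so their returns are 2.0 and 3.0.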
Model
- final class rlmesh.numpy.Model
Bases: ModelBase[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]
NumPy-backed model: predict works in NumPy values.
The NumPy-typed ModelBase: Model(source, spec=...) where source is a predict callable; run(env, seeds=[...]) returns a typed RunResult. See ModelBase.
Examples
>>> from rlmesh.numpy import Model
>>> Model(lambda observation: 0).run("127.0.0.1:5555", seeds=[0]).mean_reward
0.0
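A predict callable can be any function from an observation to an action. The sketch below is a small linear policy in plain NumPy. The weights, the four-feature observation, and the two discrete actions are all hypothetical and chosen only to keep the example short.

```python
import numpy as np

# Hypothetical fixed weights: 4 observation features, 2 discrete actions.
WEIGHTS = np.array([
    [0.0, 0.0],
    [0.0, 0.0],
    [-1.0, 1.0],
    [-1.0, 1.0],
])


def predict(observation):
    """Pick the action with the highest linear score."""
    scores = np.asarray(observation, dtype=np.float64) @ WEIGHTS
    return int(np.argmax(scores))


action_right = predict(np.array([0.0, 0.0, 0.1, 0.2]))
action_left = predict(np.array([0.0, 0.0, -0.1, -0.2]))
```

Following the Model(source, spec=...) form described above, such a function would presumably be passed as Model(predict). The action type that is accepted depends on the environment's action space.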
Sandbox
- final class rlmesh.numpy.SandboxEnv
Bases: SandboxEnvBase[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]
Owned NumPy-backed sandbox session for one environment.
The sandbox starts an isolated environment process, connects a NumPy remote client to it, and stops the owned container when closed.
- Parameters:
source – Gymnasium id, explicit gym:// source, or pinned environment source such as an EnvHub/Hugging Face reference.
base_image – Optional Docker base image override.
rlmesh_package – Optional RLMesh package, wheel, or "local" installed in the sandbox.
packages – Extra environment packages installed in the sandbox.
imports – Import names checked during sandbox startup.
trust_remote_code – Allow remote environment code to execute.
allow_unpinned_hf – Allow Hugging Face sources without a pinned revision.
**gym_make_kwargs – Keyword arguments forwarded to environment creation.
Examples
>>> from rlmesh.numpy import SandboxEnv
>>> env = SandboxEnv("CartPole-v1", packages=["gymnasium==1.3.0"])
>>> observation, info = env.reset(seed=42)
>>> env.close()
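Since the owned container is stopped on close, it helps to guarantee that close() runs even when a rollout fails. One way, sketched below, is contextlib.closing from the standard library, which works with any object exposing close(). FakeSandbox is a hypothetical stand-in so the example runs without Docker. Whether SandboxEnv also supports the with statement directly is not stated on this page.

```python
from contextlib import closing


class FakeSandbox:
    """Hypothetical stand-in with the same reset/close surface as the example above."""

    def __init__(self):
        self.closed = False

    def reset(self, seed=None):
        return 0, {}

    def close(self):
        self.closed = True


sandbox = FakeSandbox()  # with RLMesh: SandboxEnv("CartPole-v1", ...)
try:
    with closing(sandbox) as env:
        env.reset(seed=42)
        raise RuntimeError("simulated failure mid-rollout")
except RuntimeError:
    pass  # close() has already run by the time the error propagates here
```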
- final class rlmesh.numpy.SandboxVectorEnv
Bases: SandboxVectorEnvBase[None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue], None | bool | int | float | str | bytes | object | list[NumpyValue] | tuple[NumpyValue, …] | dict[str, NumpyValue]]
Owned NumPy-backed sandbox session for vectorized environments.
The sandbox starts multiple isolated environment instances and exposes them through the same vector client interface as a separately served endpoint.
- Parameters:
source – Gymnasium id, explicit gym:// source, or pinned environment source such as an EnvHub/Hugging Face reference.
num_envs – Number of environment instances to create.
vectorization_mode – Vectorization mode requested inside the sandbox.
base_image – Optional Docker base image override.
rlmesh_package – Optional RLMesh package, wheel, or "local" installed in the sandbox.
packages – Extra environment packages installed in the sandbox.
imports – Import names checked during sandbox startup.
trust_remote_code – Allow remote environment code to execute.
allow_unpinned_hf – Allow Hugging Face sources without a pinned revision.
**env_make_kwargs – Keyword arguments forwarded to environment creation.
Examples
>>> from rlmesh.numpy import SandboxVectorEnv
>>> envs = SandboxVectorEnv("CartPole-v1", num_envs=2)
>>> observations, infos = envs.reset(seed=42)
>>> envs.close()