Custom Work
RLMesh works with Gymnasium registrations and Gymnasium-style Python objects. The quickstart serves one custom environment over gRPC and connects one model worker to it.
Custom Environment
examples/python/quickstart/serve.py defines a tiny CounterEnv without importing Gymnasium, then serves it with EnvServer:
import rlmesh


class CounterEnv:
    observation_space = rlmesh.spaces.Discrete(5)
    action_space = rlmesh.spaces.Discrete(2)

    def __init__(self):
        self.step_count = 0

    def reset(self, seed=None, options=None):
        self.step_count = 0
        return 0, {}

    def step(self, action):
        self.step_count += 1
        observation = self.step_count % 5
        terminated = self.step_count >= 3
        return observation, 1.0, terminated, False, {"action": action}

    def close(self):
        pass


server = rlmesh.EnvServer(CounterEnv(), "127.0.0.1:5555")
print(f"serving CounterEnv on {server.address}")
server.serve()
Replace CounterEnv with your own environment object if it has the same shape: an
observation_space, an action_space, reset(seed=None, options=None), step(action), and
close(). Run it:
uv run python examples/python/quickstart/serve.py
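Before serving your own object, it can help to confirm that it has the shape listed above. The following is a minimal sketch of such a check. The names `missing_env_members` and `IncompleteEnv` are hypothetical and exist only for illustration; they are not part of RLMesh, and RLMesh may perform its own validation.

```python
# Hypothetical helper, not part of RLMesh: checks the environment shape
# described above (two space attributes plus reset, step, and close).
REQUIRED_ATTRIBUTES = ("observation_space", "action_space")
REQUIRED_METHODS = ("reset", "step", "close")


def missing_env_members(env):
    """Return the names from the required shape that `env` lacks."""
    missing = [name for name in REQUIRED_ATTRIBUTES if not hasattr(env, name)]
    missing += [
        name
        for name in REQUIRED_METHODS
        if not callable(getattr(env, name, None))
    ]
    return missing


class IncompleteEnv:
    """Deliberately incomplete example: no action_space and no close()."""

    # Placeholder only; a real environment would use a space object here.
    observation_space = object()

    def reset(self, seed=None, options=None):
        return 0, {}

    def step(self, action):
        return 0, 0.0, True, False, {}


print(missing_env_members(IncompleteEnv()))  # ['action_space', 'close']
```

An empty list means the object exposes every member named above; it does not check the return values of reset or step.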
Custom Model
examples/python/quickstart/model.py wraps a prediction function as a model worker. predict takes an observation and returns an action; Model runs episodes against the served endpoint:
from rlmesh.numpy import Model


def predict(observation):
    return 0


model = Model(predict)
model.run("127.0.0.1:5555", max_episodes=1)
Run it against the server:
uv run python examples/python/quickstart/model.py --episodes 1
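The predict function above ignores its input. As a sketch of a prediction function that actually depends on the observation, the version below maps each CounterEnv observation (0 to 4) to an action in the two-action space by parity. It assumes the observation arrives as a plain integer or a NumPy integer scalar, which is why it is passed through `int()`; that delivery format is an assumption, not something the quickstart states.

```python
def predict(observation):
    """Pick action 0 for even observations and action 1 for odd ones."""
    # int() accepts both Python ints and NumPy integer scalars (assumed input).
    return int(observation) % 2


# Dry run over every value CounterEnv's Discrete(5) observation space can hold.
actions = [predict(observation) for observation in range(5)]
print(actions)  # [0, 1, 0, 1, 0]
```

A function like this would be handed to Model in place of the constant predict shown earlier.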
Drive It Yourself
If you would rather step the environment by hand instead of handing it to a Model, examples/python/quickstart/eval.py opens a RemoteEnv and runs a sampled-action loop:
from rlmesh.numpy import RemoteEnv

env = RemoteEnv("127.0.0.1:5555")
obs, info = env.reset(seed=0)
for step in range(1, 65):
    action = env.action_space.sample()
    obs, reward, term, trunc, info = env.step(action)
    if term or trunc:
        break
env.close()
That is the separation: the environment serves observations, and the model worker (or your own loop) returns actions.
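To see that separation without a server, the same loop can be dry-run in a single process. The sketch below is an illustration under stated assumptions: `LocalCounterEnv` copies the counter logic from the quickstart but omits the space attributes and uses no RLMesh code, and `run_episode` and `choose_action` are hypothetical names introduced here.

```python
class LocalCounterEnv:
    """In-process copy of the quickstart counter logic (no rlmesh import)."""

    def __init__(self):
        self.step_count = 0

    def reset(self, seed=None, options=None):
        self.step_count = 0
        return 0, {}

    def step(self, action):
        self.step_count += 1
        observation = self.step_count % 5
        terminated = self.step_count >= 3
        return observation, 1.0, terminated, False, {"action": action}

    def close(self):
        pass


def run_episode(env, choose_action, max_steps=64):
    """Environment side yields observations; choose_action returns actions."""
    obs, info = env.reset(seed=0)
    steps = 0
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(obs)  # the "model" half of the split
        obs, reward, terminated, truncated, info = env.step(action)
        steps += 1
        total_reward += reward
        if terminated or truncated:
            break
    env.close()
    return steps, total_reward


print(run_episode(LocalCounterEnv(), lambda obs: 0))  # (3, 3.0)
```

The episode ends after three steps because the counter terminates once step_count reaches 3, and each step pays a reward of 1.0.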