Edition 2026.06

  • Status: provisional; seals when v0.1.0 ships

  • Protocol generation: rlmesh.protocol.v1

This document is the behavioral contract identified by the workflow edition string 2026.06. When a handshake selects 2026.06, both peers commit to the semantics below for the rest of the session.

The protobuf files for rlmesh.protocol.v1 define the wire shape; this document defines what conforming use of that shape means. Where an implementation and this document disagree, the implementation has a bug. A change that alters the meaning of an interaction described here mints a new edition; it does not amend this one once sealed.

Session Establishment

A session is one successful Handshake followed by one Join stream. Conforming clients do not open Join before completing a handshake with compatible = true.

  • The client states its wire protocol in protocol_generation and offers every edition it can operate under in supported_workflow_editions.

  • The server replies compatible = true only when the protocol generations are compatible and the edition offer intersects the server’s supported set. The selected edition is the highest mutual one and is returned in selected_workflow_edition.

  • On compatible = false, error_message explains the failure and supported_workflow_editions lists the server’s editions for diagnostics. The session ends; there is no renegotiation round.

  • Capability maps are advisory in both directions: a present key means the named feature is available, and an absent key means it is not. Capabilities gate optional features; they never change the meaning of the interactions defined here.

  • The environment handshake response carries the EnvContract (spaces, metadata, render mode, num_envs) only when compatible = true. The contract is fixed for the life of the session.
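The selection rule above can be sketched in a few lines. This is an illustration only: the HandshakeResponse class, the negotiate and edition_key helpers, the "YYYY.MM" ordering of edition strings, and the treatment of generations as compatible only when identical are assumptions, not part of this edition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HandshakeResponse:
    # Field names follow the text; the class itself is hypothetical.
    compatible: bool
    selected_workflow_edition: Optional[str] = None
    error_message: str = ""
    supported_workflow_editions: List[str] = field(default_factory=list)


def edition_key(edition: str) -> tuple:
    # Assumption: editions are "YYYY.MM" strings and "highest" means latest.
    year, month = edition.split(".")
    return int(year), int(month)


def negotiate(client_generation, client_editions, server_generation, server_editions):
    server_sorted = sorted(server_editions, key=edition_key)
    # Assumption: generations are compatible only when they are identical.
    if client_generation != server_generation:
        return HandshakeResponse(
            False,
            error_message="incompatible protocol generation",
            supported_workflow_editions=server_sorted,
        )
    mutual = set(client_editions) & set(server_editions)
    if not mutual:
        # No renegotiation round: the failure response is final.
        return HandshakeResponse(
            False,
            error_message="no mutual workflow edition",
            supported_workflow_editions=server_sorted,
        )
    return HandshakeResponse(True, selected_workflow_edition=max(mutual, key=edition_key))
```

A client that offers several editions receives the latest one the server also supports; a client whose offer does not intersect receives the server's list for diagnostics only.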

Environment Workflow (rlmesh.env.v1.EnvService)

Ordering

Requests on a Join stream are processed strictly in arrival order, one at a time. Every request produces exactly one response, carrying the originating request_id. There is no reordering and no silent dropping: a request the server cannot satisfy produces an in-band EnvError response.

Vectorization

The served environment is a fixed-width vector of num_envs sub-environments, established by the handshake contract. Observations and actions are batched SpaceValues covering all sub-environments; rewards, terminated_mask, and truncated_mask carry one entry per sub-environment, in index order. Each mask byte is the flag for one sub-environment; a nonzero byte means the flag is set.
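A client might decode a mask as follows. The decode_mask helper is hypothetical, and the one-byte-per-sub-environment layout is a reading of the text above rather than a statement about the protobuf definition.

```python
def decode_mask(mask: bytes, num_envs: int) -> list:
    # Assumption: exactly one byte per sub-environment, in index order.
    if len(mask) != num_envs:
        raise ValueError(f"expected {num_envs} mask bytes, got {len(mask)}")
    # Any nonzero byte means the flag is set.
    return [byte != 0 for byte in mask]
```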

Reset

Reset (re)starts all sub-environments and must precede the first Step of a session. seeds is either empty (server defaults apply) or carries one seed per sub-environment. The response carries the initial batched observation and one tracked episode id per sub-environment.
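The two accepted shapes of seeds can be checked before a Reset is sent. The resolve_seeds helper is hypothetical, and using None to stand for "server default" is an assumption of this sketch.

```python
def resolve_seeds(seeds, num_envs):
    # Empty: server defaults apply to every sub-environment.
    if len(seeds) == 0:
        return [None] * num_envs
    # Otherwise exactly one seed per sub-environment is required.
    if len(seeds) != num_envs:
        raise ValueError(f"seeds must be empty or have {num_envs} entries")
    return list(seeds)
```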

Step

Step applies one batched action to all sub-environments. The response carries the next batched observation, per-sub-environment rewards and termination/truncation masks, shared infos, the current episode_ids, and completed_episodes metadata for episodes that ended on this step.

Episode Accounting

A tracked episode per sub-environment begins at Reset. Each Step accrues to it. When a sub-environment reports terminated or truncated, its episode completes: metadata (id, seed, step count, cumulative reward, termination cause, timing, final_info) is delivered once in that response’s completed_episodes, and the sub-environment’s entry in episode_ids is empty until the next Reset.

The edition itself does not restart sub-environments on termination. Whether a terminated sub-environment continues to accept steps is the served environment’s autoreset behavior, conveyed through observations and infos; only an explicit Reset re-establishes tracked episodes.
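The accounting rules can be sketched as a small tracker. The EpisodeTracker and Episode classes, the "ep-N" id format, and the dictionary keys are hypothetical; only the behavior (begin at Reset, accrue on Step, deliver once, empty id until the next Reset) comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    episode_id: str
    steps: int = 0
    cumulative_reward: float = 0.0


class EpisodeTracker:
    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.episodes = [None] * num_envs
        self._counter = 0

    def reset(self):
        # Reset (re)establishes one tracked episode per sub-environment.
        self.episodes = []
        for _ in range(self.num_envs):
            self._counter += 1
            self.episodes.append(Episode(f"ep-{self._counter}"))
        return self.episode_ids()

    def episode_ids(self):
        # A completed episode leaves an empty id until the next Reset.
        return [ep.episode_id if ep else "" for ep in self.episodes]

    def step(self, rewards, terminated, truncated):
        completed = []
        for i, ep in enumerate(self.episodes):
            if ep is None:
                continue  # not tracked again until an explicit Reset
            ep.steps += 1
            ep.cumulative_reward += rewards[i]
            if terminated[i] or truncated[i]:
                completed.append({
                    "id": ep.episode_id,
                    "step_count": ep.steps,
                    "cumulative_reward": ep.cumulative_reward,
                    "cause": "terminated" if terminated[i] else "truncated",
                })
                self.episodes[i] = None  # metadata is delivered once
        return completed
```

Steps taken on a sub-environment after its episode completed are ignored by the tracker, which mirrors the rule that only Reset re-establishes tracked episodes even when the served environment autoresets.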

Render and Close

Render returns a PNG frame when the environment supports the contract’s render mode, and no frame otherwise. Close ends the session: the server responds once (with metadata for episodes it finalizes) and then closes the Join stream. No further requests on that stream are answered.

Timeouts and Errors

A positive timeout_ms on a request is a server-enforced deadline; expiry produces an in-band EnvError with code TIMEOUT. Errors are reported in-band as EnvError responses with a code, message, and is_recoverable: a recoverable error leaves the session usable for further requests; a non-recoverable error means the client must abandon the session.
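On the client side, the is_recoverable flag is the only input needed to decide whether the session survives an error. In this sketch the EnvError class and the next_action helper are hypothetical stand-ins for the generated message type and the client's own control flow.

```python
from dataclasses import dataclass


@dataclass
class EnvError:
    # Field names follow the text; the class itself is hypothetical.
    code: str
    message: str
    is_recoverable: bool


def next_action(error: EnvError) -> str:
    # A recoverable error leaves the session usable for further requests.
    if error.is_recoverable:
        return "continue"
    # A non-recoverable error means the client must abandon the session.
    return "abandon"
```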

Shutdown

Shutdown is a unary request to terminate the endpoint itself, distinct from closing a session. The server may refuse it (accepted = false), and endpoints may disable remote shutdown entirely.

Model Workflow (rlmesh.model.v1.ModelService)

The model service reverses the dialing direction: the runtime connects to a served model participant. Session establishment follows the handshake described above, except that the response carries no environment contract.

Route Lifecycle

A route is a configured model-side execution context within a session, identified by PredictContext.route_id. ConfigureRoute must precede the first Predict on a route and fixes that route’s EnvContract for all later predictions; a Predict on an unconfigured route produces a ModelError with code NOT_CONFIGURED. CloseRoute releases the route; Close requests graceful shutdown of the whole participant after the runtime reaches a normal terminal condition.
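The route lifecycle can be modeled as a table keyed by route_id. The RouteTable class and its return shapes are hypothetical. Two behaviors in the sketch are assumptions the text does not settle: a second ConfigureRoute on a live route is refused, and a Predict after CloseRoute is treated as a Predict on an unconfigured route.

```python
class RouteTable:
    def __init__(self):
        self._contracts = {}

    def configure_route(self, route_id, env_contract):
        # Assumption: the contract is fixed, so reconfiguring a live route is refused.
        if route_id in self._contracts:
            raise ValueError(f"route {route_id!r} is already configured")
        self._contracts[route_id] = env_contract

    def predict(self, route_id):
        if route_id not in self._contracts:
            # In-band failure rather than an exception, as with ModelError.
            return {"error_code": "NOT_CONFIGURED"}
        return {"env_contract": self._contracts[route_id]}

    def close_route(self, route_id):
        # Assumption: closing releases the route and makes it unconfigured again.
        self._contracts.pop(route_id, None)
```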

Predict

A Predict request carries the route context and a batched observation encoded per the route’s observation space; slots identifies each row of the batch (sub-environment index, episode id, step, reset flag). The response mirrors the request’s context: same session, route, request id, and slots. It carries an action batch encoded per the route’s action space, with one action row per observation slot.
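A client can verify the mirroring rule on every response. The dictionary layout, the session_id and actions key names, and the mirrors helper are assumptions made for this sketch; route_id, request_id, and slots are the names used above.

```python
def mirrors(request: dict, response: dict) -> bool:
    # The response must echo the request context unchanged.
    context_keys = ("session_id", "route_id", "request_id", "slots")
    if any(request[key] != response[key] for key in context_keys):
        return False
    # One action row per observation slot.
    return len(response["actions"]) == len(request["slots"])
```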

Ordering and Errors

Every request on a Join stream is answered exactly once, mirroring request_id; failures are in-band ModelError responses, and is_recoverable has the same meaning as on the environment service.

A served model may pipeline requests: process them concurrently and emit responses in completion order rather than strict arrival order. This is a scheduling choice (out of scope for the edition; see below), never a wire change. Responses still mirror request_id, so a client demultiplexes them by id. A server that pipelines advertises the rlmesh.model.concurrent_predict.v1 capability at handshake; when the capability is absent, responses arrive in request arrival order. Per-route ordering is always preserved: for a given route_id, the model applies ConfigureRoute, Predict, and CloseRoute in the order the client sent them, including lifecycle effects (on_reset, episode-end accounting). Requests for different routes may complete in either order. A whole-session Close is processed only after every outstanding request has drained.
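Because responses may arrive in completion order, a client matches them to requests by request_id rather than by position. The ResponseDemux class below is a hypothetical, synchronous sketch of that bookkeeping; a real client would more likely resolve futures than call plain callbacks.

```python
class ResponseDemux:
    def __init__(self):
        self._callbacks = {}

    def register(self, request_id, callback):
        # Each in-flight request_id must be unique within the session.
        if request_id in self._callbacks:
            raise ValueError(f"duplicate request_id {request_id!r}")
        self._callbacks[request_id] = callback

    def dispatch(self, response):
        # pop() raises KeyError for an unknown id or a second answer,
        # since every request is answered exactly once.
        callback = self._callbacks.pop(response["request_id"])
        callback(response)

    def outstanding(self):
        return len(self._callbacks)
```

A whole-session Close would be sent only once outstanding() reaches zero if the client prefers not to rely on the server to drain.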

Value Conformance

When a handshake selects 2026.06, both peers agree on how observation and action values are checked against the spaces declared in the EnvContract. A value is always coerced to its declared dtype before transport, so the dtype of a delivered value always equals the dtype declared by the negotiated space; a peer never receives a per-message dtype. A conformance warning may accompany a delivered value; a warning never withholds the value or alters it beyond the coercion already applied.

Structural conformance

Regardless of the validation policy, a value is rejected when:

  • a Dict value is missing a declared key;

  • a Box or MultiDiscrete value has the wrong rank or shape, a MultiBinary value the wrong shape, or a Tuple value the wrong arity;

  • a Discrete element is outside its domain, or a MultiDiscrete element is outside its own per-element domain;

  • any element of a numeric value is NaN. NaN is never a member of any space, so it is rejected even when other elements are merely out of bounds.

A structural rejection is reported as a non-recoverable EnvError; the value is never delivered.
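Three of the structural checks are simple enough to sketch with the standard library. The helper names are hypothetical, and the Discrete domain of [start, start + n) follows the Gymnasium convention, which this document does not itself spell out.

```python
import math


def missing_dict_keys(value: dict, declared_keys) -> list:
    # A Dict value is rejected when any declared key is absent.
    return sorted(set(declared_keys) - set(value))


def discrete_in_domain(element: int, n: int, start: int = 0) -> bool:
    # Assumption: the domain is the half-open integer range [start, start + n).
    return start <= element < start + n


def contains_nan(elements) -> bool:
    # NaN is never a member of any space, whatever the bounds are.
    return any(isinstance(x, float) and math.isnan(x) for x in elements)
```

Any non-empty result from missing_dict_keys, a False from discrete_in_domain, or a True from contains_nan would lead to rejection regardless of the validation policy.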

Range conformance

For Box bounds and Text charset and length, the serving side carries a validation policy:

  • warn (the default) — the deviation is delivered and reported as a conformance warning.

  • strict — the deviation is rejected, like a structural deviation.

  • off — the range, charset, and length checks are skipped; structural conformance still applies.

A Box element is in range when it satisfies its declared bounds; an infinite or absent bound imposes no constraint on that side, so +/-inf is in range against a matching infinite bound. A Text value conforms when its length is within [min_length, max_length] (counted in characters) and, when the charset is non-empty, every character is in it. Observations and actions share one policy and both default to warn.
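The range rules and the three policies can be sketched as follows. The function names and the returned outcome strings are hypothetical, inclusive bounds are an assumption borrowed from the usual Box convention, and Python's len counts code points, which only approximates "characters". NaN is assumed to have been rejected by the structural checks before box_in_range is called.

```python
def box_in_range(x: float, low=None, high=None) -> bool:
    # An absent (None) or infinite bound imposes no constraint on that side.
    if low is not None and x < low:
        return False
    if high is not None and x > high:
        return False
    return True


def text_conforms(s: str, min_length: int, max_length: int, charset: str = "") -> bool:
    if not (min_length <= len(s) <= max_length):
        return False
    # An empty charset means any character is allowed.
    return not charset or all(ch in charset for ch in s)


def apply_policy(policy: str, in_range: bool) -> str:
    if policy not in ("warn", "strict", "off"):
        raise ValueError(f"unknown validation policy {policy!r}")
    if in_range or policy == "off":
        return "deliver"
    if policy == "warn":
        return "deliver_with_warning"
    return "reject"  # strict: treated like a structural deviation
```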

Enforcement

The serving side validates each observation it produces before transport and each action it receives before delivering it to the environment. Structural deviations are rejected regardless of policy; range deviations follow the policy.

Dtype conformance

A value is coerced to its declared dtype before transport:

  • a value carried as a native RLMesh tensor must already have the declared dtype;

  • a value supplied as a host array or sequence (for example NumPy) is coerced to the declared dtype, except that a float supplied for an integer dtype is rejected unless every element is finite, integral, and representable in the target integer dtype’s range — there is no silent truncation and no out-of-range wraparound;

  • a floating-point value supplied for a floating-point dtype is coerced to the declared dtype (a narrowing such as float64 to float32 may lose precision).
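The float-to-integer rule can be illustrated without any array library. The coerce_floats_to_int helper and the INT_RANGES table are hypothetical; the sketch works on a plain Python sequence rather than a host array.

```python
import math

# Representable ranges for a few integer dtypes (inclusive).
INT_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "uint8": (0, 2**8 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}


def coerce_floats_to_int(values, dtype: str) -> list:
    low, high = INT_RANGES[dtype]
    coerced = []
    for v in values:
        # Reject NaN and infinities, then any fractional part: no truncation.
        if not math.isfinite(v) or not float(v).is_integer():
            raise ValueError(f"{v!r} is not a finite integral value")
        as_int = int(v)
        # Reject values outside the target range: no wraparound.
        if not low <= as_int <= high:
            raise ValueError(f"{v!r} is out of range for {dtype}")
        coerced.append(as_int)
    return coerced
```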

Conformance warnings

A conformance warning is reported in-band under the reserved rlmesh.conformance.warning key in the info map returned by reset and step, at most once per (deviation kind, value path) per session. Conformance warnings are advisory and never make a session non-recoverable; the path format is advisory.
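The once-per-session rule amounts to remembering which (deviation kind, value path) pairs have already been reported. In this sketch the WarningLog class, the list-of-dicts layout stored under the reserved key, and the example kind and path strings are all assumptions; only the key name and the deduplication rule come from the text.

```python
WARNING_KEY = "rlmesh.conformance.warning"


class WarningLog:
    """Tracks which warnings a single session has already reported."""

    def __init__(self):
        self._seen = set()

    def attach(self, info: dict, kind: str, path: str, message: str) -> dict:
        key = (kind, path)
        if key in self._seen:
            return info  # already reported once in this session
        self._seen.add(key)
        # Assumption: warnings are carried as a list of small records.
        info.setdefault(WARNING_KEY, []).append(
            {"kind": kind, "path": path, "message": message}
        )
        return info
```

One WarningLog instance per session keeps repeated deviations on the same path from flooding the info map.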

Out of Scope

This edition does not constrain transport security or authentication, client retry or reconnection policy, scheduling and batching strategy, performance, or the meaning of individual capability names. Those evolve freely without minting an edition.