Environment

`Environment`

Bases: PyTreeNode

`StepType`

Bases: PyTreeNode

`TERMINATION = jnp.asarray(2)` `class-attribute` `instance-attribute`

The episode ended and the current state is an absorbing state.

`TRANSITION = jnp.asarray(0)` `class-attribute` `instance-attribute`

Standard timestep transition: the episode continues.

`TRUNCATION = jnp.asarray(1)` `class-attribute` `instance-attribute`

The environment reached its maximum number of timesteps. The episode ended, but the agent could still have collected rewards, so the value of the final state is not 0.
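The practical difference between the three step types shows up when bootstrapping a value estimate. The following is a minimal sketch, not part of the library: the constants mirror the documented `StepType` values as plain Python ints (the library stores them as `jnp` arrays), and `td_target` and `episode_ended` are hypothetical helper names.

```python
# Plain-int mirrors of the documented StepType values (assumption: the
# library's own constants are jnp arrays holding the same numbers).
TRANSITION, TRUNCATION, TERMINATION = 0, 1, 2


def td_target(reward, next_value, step_type, gamma=0.99):
    """One-step TD target that bootstraps unless the episode terminated."""
    # Only TERMINATION leads to an absorbing state, whose value is 0.
    # On TRUNCATION the next state still has value, so bootstrapping is kept.
    not_terminal = 1.0 - (step_type == TERMINATION)
    return reward + gamma * not_terminal * next_value


def episode_ended(step_type):
    """True for both TRUNCATION and TERMINATION, False for TRANSITION."""
    return step_type != TRANSITION


# Hypothetical numbers, chosen only to make the masking visible.
terminated = td_target(1.0, 10.0, TERMINATION, gamma=0.5)  # next value dropped
truncated = td_target(1.0, 10.0, TRUNCATION, gamma=0.5)    # next value kept
```

The mask is written as arithmetic rather than an `if` branch, a style that is commonly preferred when the same function is meant to run on arrays.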

`Timestep`

Bases: PyTreeNode

`action: Array` `instance-attribute`

The action taken by the agent at the current timestep, $a_t = \pi(s_t)$, where $s_t$ is `state`

`info: Dict[str, Any] = struct.field(default_factory=dict)` `class-attribute` `instance-attribute`

Additional information about the environment. Useful for accumulations (e.g. returns)

`observation: Array` `instance-attribute`

The observation corresponding to the current state (for POMDPs)

`reward: Array` `instance-attribute`

The reward $r_{t+1}$ received by the agent after taking action $a_t$

`state: State` `instance-attribute`

The true state of the MDP, $s_t$, before taking the action `action`

`step_type: Array` `instance-attribute`

The type of the current timestep, 0 for TRANSITION, 1 for TRUNCATION, 2 for TERMINATION

`t: Array` `instance-attribute`

The number of timesteps elapsed from the last reset of the environment
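To illustrate how the fields fit together, here is a small sketch of a record with the same field names and of using `info` to accumulate the episode return. It is a standard-library stand-in, not the library's `Timestep` (which derives from `PyTreeNode` and holds arrays); the `"return"` key and the `accumulate_return` helper are assumptions made for the example.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass(frozen=True)
class TimestepSketch:
    """Stand-in with the documented field names (not the library class)."""
    t: int
    observation: Any
    action: Any
    reward: float
    step_type: int
    state: Any
    info: Dict[str, Any] = field(default_factory=dict)


def accumulate_return(previous, current):
    """Copy `current`, adding a running undiscounted return to its info."""
    running = previous.info.get("return", 0.0) + current.reward
    # Build a new record instead of mutating, as one would with an
    # immutable pytree-style container.
    return replace(current, info={**current.info, "return": running})


# Hypothetical three-step episode: a reset followed by two transitions.
first = TimestepSketch(t=0, observation=None, action=None, reward=0.0,
                       step_type=0, state=None)
second = accumulate_return(first, replace(first, t=1, reward=1.0))
third = accumulate_return(second, replace(first, t=2, reward=2.5, step_type=2))
```

Here `third.info["return"]` holds the sum of the rewards collected since the reset, and `third.step_type` marks the end of the episode.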
