Environment
Bases: PyTreeNode
StepType
Bases: PyTreeNode
TERMINATION = jnp.asarray(2)
class-attribute
instance-attribute
The episode ended and the current state is an absorbing state.
TRANSITION = jnp.asarray(0)
class-attribute
instance-attribute
Standard timestep transition: the episode continues.
TRUNCATION = jnp.asarray(1)
class-attribute
instance-attribute
The environment reached its maximum number of timesteps. The episode ended, but the agent could still have collected rewards, so the value of the final state is not necessarily 0.
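The difference between TRUNCATION and TERMINATION matters mostly when computing value targets. The following is a minimal sketch, not part of the documented API: `td_target` and its parameters are hypothetical names, and the three constants simply mirror the values listed above. It bootstraps from the next state's value on TRANSITION and TRUNCATION, and drops the bootstrap term only on TERMINATION.

```python
import jax.numpy as jnp

# Step-type codes, mirroring the values documented above.
TRANSITION = jnp.asarray(0)
TRUNCATION = jnp.asarray(1)
TERMINATION = jnp.asarray(2)


def td_target(reward, next_value, step_type, gamma=0.99):
    """Hypothetical helper: one-step TD target that respects the step type."""
    # Only TERMINATION marks an absorbing state, whose value is 0.
    # On TRUNCATION the next state's value is still used for bootstrapping.
    bootstrap = (step_type != TERMINATION).astype(jnp.float32)
    return reward + gamma * bootstrap * next_value


# One example per step type: reward 1, next-state value 10, discount 0.5.
step_types = jnp.stack([TRANSITION, TRUNCATION, TERMINATION])
targets = td_target(jnp.ones(3), jnp.full(3, 10.0), step_types, gamma=0.5)
```

With these inputs, the first two targets include the discounted next-state value, while the terminated one reduces to the reward alone.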
Timestep
Bases: PyTreeNode
action: Array
instance-attribute
The action taken by the agent at the current timestep, $a_t = \pi(s_t)$, where $s_t$ is the state.
info: Dict[str, Any] = struct.field(default_factory=dict)
class-attribute
instance-attribute
Additional information about the environment. Useful for accumulated quantities (e.g. returns).
observation: Array
instance-attribute
The observation corresponding to the current state (for POMDPs)
reward: Array
instance-attribute
The reward $r_{t+1}$ received by the agent after taking action $a_t$.
state: State
instance-attribute
The true state of the MDP, $s_t$, before taking the action $a_t$.
step_type: Array
instance-attribute
The type of the current timestep, 0 for TRANSITION, 1 for TRUNCATION, 2 for TERMINATION
t: Array
instance-attribute
The number of timesteps elapsed since the last reset of the environment.
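To make the field layout concrete, here is a rough sketch of a container with the same fields, together with one way `info` could carry a running return. Everything in it is an assumption for illustration: `TimestepSketch` is a `NamedTuple` stand-in (which JAX also treats as a pytree) rather than the real `PyTreeNode` class, `accumulate_return` is a hypothetical helper, and the `"return"` key in `info` is an invented name.

```python
from typing import Any, Dict, NamedTuple

import jax.numpy as jnp


class TimestepSketch(NamedTuple):
    """Hypothetical stand-in mirroring the fields documented above."""

    t: Any
    observation: Any
    action: Any
    reward: Any
    step_type: Any
    state: Any
    info: Dict[str, Any]


def accumulate_return(timestep: TimestepSketch, reward) -> TimestepSketch:
    """Hypothetical helper: advance t by one and add reward to info["return"]."""
    info = {**timestep.info, "return": timestep.info["return"] + reward}
    return timestep._replace(t=timestep.t + 1, reward=reward, info=info)


ts = TimestepSketch(
    t=jnp.asarray(0),
    observation=jnp.zeros((2,)),
    action=jnp.asarray(0),
    reward=jnp.asarray(0.0),
    step_type=jnp.asarray(0),  # 0 = TRANSITION
    state=None,  # placeholder: the State type is not described in this section
    info={"return": jnp.asarray(0.0)},  # "return" is an assumed key name
)

# Feed three rewards; t counts the steps and info["return"] sums the rewards.
for r in (1.0, 0.5, 2.0):
    ts = accumulate_return(ts, jnp.asarray(r))
```

Because every update returns a new tuple instead of mutating the old one, the same pattern carries over to immutable pytree containers used inside jitted code.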