
Environment

Bases: PyTreeNode

StepType

Bases: PyTreeNode

TERMINATION: the episode ended and the current state is an absorbing state, whose value is 0.

TRANSITION: standard timestep transition; the episode continues.

TRUNCATION: the environment reached its maximum number of timesteps. The episode ended, but the agent could still have collected rewards, so the value of the final state is not necessarily 0.
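The distinction between truncation and termination matters when bootstrapping value targets. The sketch below is not the library's actual implementation; it is a plain-Python illustration (the `bootstrap_discount` helper and the `gamma` parameter are assumptions for demonstration) of how the three step types map to the 0/1/2 encoding and why truncation still bootstraps:

```python
from enum import IntEnum


class StepType(IntEnum):
    TRANSITION = 0   # episode continues
    TRUNCATION = 1   # episode cut short at the timestep limit
    TERMINATION = 2  # absorbing state reached


def bootstrap_discount(step_type: StepType, gamma: float = 0.99) -> float:
    """Discount to apply to the next state's value estimate.

    On TERMINATION the state is absorbing (value 0), so no bootstrapping.
    On TRUNCATION the episode was merely cut short, so we still bootstrap
    from the value of the final state with the usual discount gamma.
    """
    return 0.0 if step_type == StepType.TERMINATION else gamma
```

For example, a one-step TD target `r + bootstrap_discount(step_type) * v_next` correctly zeroes out the future value only on true termination, not when the timestep limit was hit.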

Timestep

Bases: PyTreeNode

The action $a_t = \pi(s_t)$ taken by the agent at the current timestep, where $s_t$ is the current state.

Additional information about the environment, useful for accumulating quantities (e.g. returns).

The observation corresponding to the current state (for POMDPs)

The reward $r_{t+1}$ received by the agent after taking action $a_t$.

The true state of the MDP, $s_t$, before taking action $a_t$.

The type of the current timestep: 0 for TRANSITION, 1 for TRUNCATION, 2 for TERMINATION.

The number of timesteps elapsed since the last reset of the environment.
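The fields above can be collected into a single container. The actual class is a `PyTreeNode`; the sketch below substitutes a plain `dataclass` so it runs without JAX/Flax, and the field defaults are assumptions for illustration only:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Timestep:
    """Illustrative container mirroring the documented fields."""
    t: int                    # timesteps elapsed since the last reset
    state: Any                # true MDP state s_t, before taking the action
    observation: Any          # observation of s_t (for POMDPs)
    action: Any               # a_t = pi(s_t)
    reward: float             # r_{t+1}, received after taking a_t
    step_type: int            # 0 TRANSITION, 1 TRUNCATION, 2 TERMINATION
    info: dict = field(default_factory=dict)  # e.g. accumulated returns


# Example: accumulate the return in `info` across a transition.
ts = Timestep(t=0, state=None, observation=None, action=1,
              reward=1.0, step_type=0, info={"return": 0.0})
ts.info["return"] += ts.reward
```

Keeping bookkeeping such as the running return inside `info` leaves the core fields as a pure record of one environment transition.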