Environment
    
              Bases: PyTreeNode
StepType
    
              Bases: PyTreeNode
TERMINATION = jnp.asarray(2)
  
      class-attribute
      instance-attribute
  
    The episode ended and the current state is an absorbing state.
TRANSITION = jnp.asarray(0)
  
      class-attribute
      instance-attribute
  
    Standard timestep transition: the episode continues.
TRUNCATION = jnp.asarray(1)
  
      class-attribute
      instance-attribute
  
    The environment reached its maximum number of timesteps. The episode ended, but the agent could still have collected rewards, so the value of the state is not 0.
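The distinction between TRUNCATION and TERMINATION matters mainly when bootstrapping. Below is a minimal sketch, assuming a standard one-step TD target $r_{t+1} + \gamma V(s_{t+1})$. The function `td_target` and the discount `gamma` are hypothetical and not part of this API. Only TERMINATION zeroes the bootstrap term. TRUNCATION keeps it, because the truncated state's value is not 0.

```python
import jax.numpy as jnp

# Integer codes mirroring the StepType constants documented above.
TRANSITION = jnp.asarray(0)
TRUNCATION = jnp.asarray(1)
TERMINATION = jnp.asarray(2)


def td_target(reward, next_value, step_type, gamma=0.99):
    """Hypothetical one-step TD target r + gamma * V(s') with StepType masking.

    An absorbing state (TERMINATION) has value 0, so the bootstrap term is
    dropped. A truncated episode (TRUNCATION) was cut off by a time limit,
    so it still bootstraps from the next state's value.
    """
    not_terminal = (step_type != TERMINATION).astype(reward.dtype)
    return reward + gamma * not_terminal * next_value


rewards = jnp.array([1.0, 1.0, 1.0])
next_values = jnp.array([10.0, 10.0, 10.0])
step_types = jnp.array([0, 1, 2])  # TRANSITION, TRUNCATION, TERMINATION
targets = td_target(rewards, next_values, step_types, gamma=0.5)
print(targets)  # [6. 6. 1.]
```

Treating TRUNCATION like TERMINATION here would bias value estimates downward near the time limit. That bias is the reason the two cases are encoded separately.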
Timestep
    
              Bases: PyTreeNode
action
  
      instance-attribute
  
    The action $a_t = \pi(s_t)$ taken by the agent at the current timestep, where $s_t$ is the state.
info = struct.field(default_factory=dict)
  
      class-attribute
      instance-attribute
  
    Additional information about the environment. Useful for accumulating quantities across timesteps (e.g., returns).
observation
  
      instance-attribute
  
    The observation corresponding to the current state (in a POMDP, it may differ from the true state).
reward
  
      instance-attribute
  
    The reward $r_{t+1}$ received by the agent after taking action $a_t$.
state
  
      instance-attribute
  
    The true state of the MDP, $s_t$, before taking action $a_t$.
step_type
  
      instance-attribute
  
    The type of the current timestep: 0 for TRANSITION, 1 for TRUNCATION, 2 for TERMINATION.
t
  
      instance-attribute
  
    The number of timesteps elapsed since the last reset of the environment.