Rewards
DEFAULT_TASK = compose(on_goal_reached, action_cost)
    module-attribute
    The default task for the game: the on_goal_reached and action_cost reward functions, composed with the default sum operator.
action_cost(prev_state, action, new_state, cost=0.01)
    A reward function that returns a negative value when an action is taken. Every action other than a no-op incurs a penalty of cost.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| prev_state | State | The previous state of the game. | required | 
| action | Array | The action taken. | required | 
| new_state | State | The new state of the game. | required | 
| cost | float | The cost of taking an action. | 0.01 | 
Returns:
| Name | Type | Description | 
|---|---|---|
| Array | Array | A scalar array  | 
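The rule above can be sketched in a few lines of JAX. This is an illustrative stand-in, not the library's implementation: the name action_cost_sketch and the constant NOOP (the index of the no-op action) are assumptions made for the example, and the states are ignored.

```python
import jax.numpy as jnp

# Assumption for this sketch only: the no-op action has index 0.
NOOP = 0


def action_cost_sketch(prev_state, action, new_state, cost=0.01):
    """Return -cost for any action except the no-op, which is free."""
    is_noop = jnp.asarray(action) == NOOP
    # jnp.where keeps the function traceable under jax.jit.
    return jnp.where(is_noop, 0.0, -cost)


moved = action_cost_sketch(None, jnp.asarray(2), None)
stayed = action_cost_sketch(None, jnp.asarray(NOOP), None)
print(moved, stayed)
```

Using jnp.where rather than a Python if keeps the branch compatible with traced values.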
compose(*reward_functions, operator=jnp.sum)
    Compose multiple reward functions into a single reward function.
The functions are called in order and their results are reduced with the operator function.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| *reward_functions | Callable[[State, Array, State], Array] | A list of reward functions. | () | 
| operator | Callable | The operator used to reduce the results of the reward functions. | jnp.sum | 
Returns:
| Name | Type | Description | 
|---|---|---|
| Callable | Callable | A composed reward function that applies the  | 
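One way to picture the composition is the minimal sketch below. It is an assumption about the behavior described above, not the library's source: compose_sketch, goal_stub, and cost_stub are hypothetical names, and the stubs return fixed values in place of real reward functions.

```python
import jax.numpy as jnp


def compose_sketch(*reward_functions, operator=jnp.sum):
    """Build one reward function out of several (illustrative only)."""

    def composed(prev_state, action, new_state):
        # Call every reward function in order, then reduce the stacked results.
        rewards = jnp.stack(
            [jnp.asarray(f(prev_state, action, new_state)) for f in reward_functions]
        )
        return operator(rewards)

    return composed


# Hypothetical stand-ins for real reward functions.
def goal_stub(prev_state, action, new_state):
    return jnp.asarray(1.0)


def cost_stub(prev_state, action, new_state):
    return jnp.asarray(-0.01)


task = compose_sketch(goal_stub, cost_stub)
reward = task(None, None, None)  # sum of both terms
best = compose_sketch(goal_stub, cost_stub, operator=jnp.max)(None, None, None)
print(reward, best)
```

Swapping the operator (here jnp.max) changes how the individual terms are combined without touching the reward functions themselves.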
free(state)
    A reward function that always returns 0, to simulate reward-free learning.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| state | State | The current state of the game. | required | 
Returns:
| Name | Type | Description | 
|---|---|---|
| Array | Array | A scalar array  | 
on_door_done(prev_state, action, state)
    A reward function that returns a positive value when the agent uses the done action in front of a door.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
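| prev_state | State | The previous state of the game. | required | 
| action | Array | The action taken. | required | 
| state | State | The current state of the game. | required | 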
Returns:
| Name | Type | Description | 
|---|---|---|
| Array | Array | A scalar array  | 
on_goal_reached(prev_state, action, state)
    A reward function that returns 1 when the goal is reached, and 0 otherwise.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
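| prev_state | State | The previous state of the game. | required | 
| action | Array | The action taken. | required | 
| state | State | The current state of the game. | required | 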
Returns:
| Name | Type | Description | 
|---|---|---|
| Array | Array | A scalar array  | 
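As a rough illustration of a 1-or-0 goal reward, the sketch below uses a toy state. ToyState and its player_pos and goal_pos fields are hypothetical and do not reflect the library's actual State layout; the real function may detect the goal differently.

```python
from typing import NamedTuple

import jax.numpy as jnp


class ToyState(NamedTuple):
    """Hypothetical state holding only what this sketch needs."""

    player_pos: jnp.ndarray
    goal_pos: jnp.ndarray


def on_goal_reached_sketch(prev_state, action, state):
    """Return 1.0 if the player stands on the goal, else 0.0."""
    reached = jnp.array_equal(state.player_pos, state.goal_pos)
    return jnp.asarray(reached, dtype=jnp.float32)


at_goal = ToyState(jnp.array([2, 3]), jnp.array([2, 3]))
elsewhere = ToyState(jnp.array([0, 1]), jnp.array([2, 3]))
print(on_goal_reached_sketch(None, None, at_goal))
print(on_goal_reached_sketch(None, None, elsewhere))
```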
time_cost(prev_state, action, new_state, cost=0.01)
    A reward function that returns a negative value as time passes, paying a cost of cost at each time step.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| prev_state | State | The previous state of the game. | required | 
| action | Array | The action taken. | required | 
| new_state | State | The new state of the game. | required | 
| cost | float | The cost of time passing. | 0.01 | 
Returns:
| Name | Type | Description | 
|---|---|---|
| Array | Array | A scalar array  | 
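To see how a per-step penalty adds up, the following sketch accumulates it over an episode. The name time_cost_sketch and the episode length of 50 are assumptions chosen for illustration; the states and action are ignored.

```python
import jax.numpy as jnp


def time_cost_sketch(prev_state, action, new_state, cost=0.01):
    """Return -cost on every step, regardless of state or action."""
    return -jnp.asarray(cost)


steps = 50  # illustrative episode length
episode_return = sum(time_cost_sketch(None, None, None) for _ in range(steps))
print(episode_return)  # approximately -cost * steps
```

With a constant penalty the undiscounted return is simply -cost times the number of steps, so shorter episodes score higher.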
wall_hit_cost(prev_state, action, state, cost=0.01)
    A reward function that returns a negative value when the agent hits a wall, paying a cost of cost for each wall hit.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
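| prev_state | State | The previous state of the game. | required | 
| action | Array | The action taken. | required | 
| state | State | The current state of the game. | required | 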
| cost | float | The cost of hitting a wall. | 0.01 | 
Returns:
| Name | Type | Description | 
|---|---|---|
| Array | Array | A scalar array  |