# Rewards
`DEFAULT_TASK = compose(on_goal_reached, action_cost)` *module-attribute*

The default task for the game, composed of the `on_goal_reached` and `action_cost` reward functions.
## `action_cost(prev_state, action, new_state, cost=0.01)`

A reward function that returns a negative value when an action is taken. All actions have a cost of `cost`, except for noops.

Parameters:

Name | Type | Description | Default
---|---|---|---
`prev_state` | `State` | The previous state of the game. | *required*
`action` | `Array` | The action taken. | *required*
`new_state` | `State` | The new state of the game. | *required*
`cost` | `float` | The cost of taking an action. | `0.01`

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
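The behaviour described above can be sketched in plain Python. This is a minimal sketch, not the library's implementation: the real function operates on jax `Array`s, and the `NOOP` index below is a hypothetical assumption.

```python
NOOP = 0  # hypothetical: assume the noop action has index 0


def action_cost(prev_state, action, new_state, cost=0.01):
    # Charge `cost` for every action except the noop.
    # prev_state and new_state are unused here, but are kept to match
    # the (State, Array, State) -> Array reward-function signature.
    return 0.0 if action == NOOP else -cost
```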
## `compose(*reward_functions, operator=jnp.sum)`

Compose multiple reward functions into a single reward function. The functions are called in order and their results are reduced using the `operator` function.

Parameters:

Name | Type | Description | Default
---|---|---|---
`*reward_functions` | `Callable[[State, Array, State], Array]` | A list of reward functions. | `()`
`operator` | `Callable` | The operator used to reduce the results of the reward functions. | `jnp.sum`

Returns:

Name | Type | Description
---|---|---
`Callable` | `Callable` | A composed reward function that applies the `operator` to the results of the reward functions.
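A plain-Python sketch of this composition pattern (the library version reduces jax arrays with `jnp.sum`; the builtin `sum` and the toy reward functions below are stand-ins for illustration):

```python
def compose(*reward_functions, operator=sum):
    # Call each reward function on the transition and reduce the
    # results with `operator`.
    def composed(prev_state, action, new_state):
        rewards = [f(prev_state, action, new_state) for f in reward_functions]
        return operator(rewards)
    return composed


# Toy reward functions standing in for on_goal_reached and action_cost:
def goal_bonus(prev_state, action, new_state):
    return 1.0 if new_state == "goal" else 0.0


def step_cost(prev_state, action, new_state):
    return -0.01


default_task = compose(goal_bonus, step_cost)
```

Each composed function keeps the same `(State, Array, State) -> Array` shape, so compositions can themselves be composed further.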
## `free(state)`

A reward function that always returns 0, simulating reward-free learning.

Parameters:

Name | Type | Description | Default
---|---|---|---
`state` | `State` | The current state of the game. | *required*

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
## `on_door_done(prev_state, action, state)`

A reward function that returns a positive value when the agent takes the `done` action in front of a door.

Parameters:

Name | Type | Description | Default
---|---|---|---
`state` | `State` | The current state of the game. | *required*

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
## `on_goal_reached(prev_state, action, state)`

A reward function that returns 1 when the goal is reached, and 0 otherwise.

Parameters:

Name | Type | Description | Default
---|---|---|---
`state` | `State` | The current state of the game. | *required*

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
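A minimal sketch of this predicate-style reward. The dict-based state layout below is a hypothetical stand-in; the library checks its own `State` pytree to decide whether the goal was reached.

```python
def on_goal_reached(prev_state, action, state):
    # Hypothetical state layout: a boolean "goal_reached" flag on the
    # new state. Returns 1.0 on success, 0.0 otherwise.
    return 1.0 if state["goal_reached"] else 0.0
```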
## `time_cost(prev_state, action, new_state, cost=0.01)`

A reward function that returns a negative value as time passes, paying a cost of `cost` at each time step.

Parameters:

Name | Type | Description | Default
---|---|---|---
`prev_state` | `State` | The previous state of the game. | *required*
`action` | `Array` | The action taken. | *required*
`new_state` | `State` | The new state of the game. | *required*
`cost` | `float` | The cost of time passing. | `0.01`

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
## `wall_hit_cost(prev_state, action, state, cost=0.01)`

A reward function that returns a negative value when the agent hits a wall, paying a cost of `cost` for each wall hit.

Parameters:

Name | Type | Description | Default
---|---|---|---
`state` | `State` | The current state of the game. | *required*
`cost` | `float` | The cost of hitting a wall. | `0.01`

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
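One way such a penalty can be sketched, under loudly stated assumptions: the move-action index and the dict-based positions below are hypothetical, and detecting a wall hit as "a move action that left the agent in place" is an illustrative heuristic, not necessarily how the library detects collisions.

```python
MOVE_FORWARD = 1  # hypothetical: assume action 1 is "move forward"


def wall_hit_cost(prev_state, action, state, cost=0.01):
    # Illustrative heuristic: a forward action that leaves the agent's
    # position unchanged is treated as a wall hit and charged `cost`.
    hit = action == MOVE_FORWARD and state["pos"] == prev_state["pos"]
    return -cost if hit else 0.0
```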