# Rewards
`DEFAULT_TASK = compose(on_goal_reached, action_cost)` *module-attribute*

The default task for the game, composed of the `on_goal_reached` and `action_cost` reward functions.
## `action_cost(prev_state, action, new_state, cost=0.01)`

A reward function that returns a negative value when an action is taken. All actions have a cost of `cost`, except for noops.

Parameters:

Name | Type | Description | Default
---|---|---|---
`prev_state` | `State` | The previous state of the game. | *required*
`action` | `Array` | The action taken. | *required*
`new_state` | `State` | The new state of the game. | *required*
`cost` | `float` | The cost of taking an action. | `0.01`

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
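The behaviour described above can be sketched in plain Python. This is a minimal sketch, not the library's implementation: the real function operates on jax `Array`s, and the `NOOP` index below is a hypothetical assumption.

```python
NOOP = 0  # hypothetical: assume the noop action has index 0


def action_cost(prev_state, action, new_state, cost=0.01):
    # Charge `cost` for every action except the noop.
    # prev_state and new_state are unused here, but are kept to match
    # the (State, Array, State) -> Array reward-function signature.
    return 0.0 if action == NOOP else -cost
```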
## `compose(*reward_functions, operator=jnp.sum)`

Compose multiple reward functions into a single reward function. The functions are called in order and their results are reduced using the `operator` function.

Parameters:

Name | Type | Description | Default
---|---|---|---
`*reward_functions` | `Callable[[State, Array, State], Array]` | A list of reward functions. | `()`
`operator` | `Callable` | The operator used to reduce the results of the reward functions. | `jnp.sum`

Returns:

Name | Type | Description
---|---|---
`Callable` | `Callable` | A composed reward function that applies the `operator` to the results of the reward functions.
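A plain-Python sketch of this composition pattern (the library version reduces jax arrays with `jnp.sum`; the builtin `sum` and the toy reward functions below are stand-ins for illustration):

```python
def compose(*reward_functions, operator=sum):
    # Call each reward function on the transition and reduce the
    # results with `operator`.
    def composed(prev_state, action, new_state):
        rewards = [f(prev_state, action, new_state) for f in reward_functions]
        return operator(rewards)
    return composed


# Toy reward functions standing in for on_goal_reached and action_cost:
def goal_bonus(prev_state, action, new_state):
    return 1.0 if new_state == "goal" else 0.0


def step_cost(prev_state, action, new_state):
    return -0.01


default_task = compose(goal_bonus, step_cost)
```

Each composed function keeps the same `(State, Array, State) -> Array` shape, so compositions can themselves be composed further.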
## `free(state)`

A reward function that always returns 0, simulating reward-free learning.

Parameters:

Name | Type | Description | Default
---|---|---|---
`state` | `State` | The current state of the game. | *required*

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
## `on_door_done(prev_state, action, state)`

A reward function that returns a positive value when the agent takes the `done` action in front of a door.

Parameters:

Name | Type | Description | Default
---|---|---|---
`state` | `State` | The current state of the game. | *required*

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
## `on_goal_reached(prev_state, action, state)`

A reward function that returns 1 when the goal is reached, and 0 otherwise.

Parameters:

Name | Type | Description | Default
---|---|---|---
`state` | `State` | The current state of the game. | *required*

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
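A minimal sketch of this predicate-style reward. The dict-based state layout below is a hypothetical stand-in; the library checks its own `State` pytree to decide whether the goal was reached.

```python
def on_goal_reached(prev_state, action, state):
    # Hypothetical state layout: a boolean "goal_reached" flag on the
    # new state. Returns 1.0 on success, 0.0 otherwise.
    return 1.0 if state["goal_reached"] else 0.0
```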
## `time_cost(prev_state, action, new_state, cost=0.01)`

A reward function that returns a negative value as time passes, paying a cost of `cost` at each time step.

Parameters:

Name | Type | Description | Default
---|---|---|---
`prev_state` | `State` | The previous state of the game. | *required*
`action` | `Array` | The action taken. | *required*
`new_state` | `State` | The new state of the game. | *required*
`cost` | `float` | The cost of time passing. | `0.01`

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
## `wall_hit_cost(prev_state, action, state, cost=0.01)`

A reward function that returns a negative value when the agent hits a wall, paying a cost of `cost` for each wall hit.

Parameters:

Name | Type | Description | Default
---|---|---|---
`state` | `State` | The current state of the game. | *required*
`cost` | `float` | The cost of hitting a wall. | `0.01`

Returns:

Name | Type | Description
---|---|---
`Array` | `Array` | A scalar array.
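One way such a penalty can be sketched, under loudly stated assumptions: the move-action index and the dict-based positions below are hypothetical, and detecting a wall hit as "a move action that left the agent in place" is an illustrative heuristic, not necessarily how the library detects collisions.

```python
MOVE_FORWARD = 1  # hypothetical: assume action 1 is "move forward"


def wall_hit_cost(prev_state, action, state, cost=0.01):
    # Illustrative heuristic: a forward action that leaves the agent's
    # position unchanged is treated as a wall hit and charged `cost`.
    hit = action == MOVE_FORWARD and state["pos"] == prev_state["pos"]
    return -cost if hit else 0.0
```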