PPO
PPOHparams
Bases: HParams
anneal_lr: bool = struct.field(pytree_node=False, default=True)
class-attribute
instance-attribute
Whether to linearly anneal the learning rate so that it reaches 0 at the end of training.
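A minimal sketch of what linear annealing looks like, assuming the rate is decayed once per update. The function name `annealed_lr` and the arguments `update_idx` and `num_updates` are illustrative and not part of this API; the library may instead step the schedule once per gradient step.

```python
def annealed_lr(lr: float, update_idx: int, num_updates: int) -> float:
    """Linearly decay `lr` from its starting value to 0 over `num_updates` updates."""
    # Fraction of training still remaining: 1.0 at the start, 0.0 at the end.
    frac = 1.0 - update_idx / num_updates
    return lr * frac


# Starting rate, midpoint, and final value for a hypothetical 100-update run.
for idx in (0, 50, 100):
    print(idx, annealed_lr(0.00025, idx, 100))
```

With `anneal_lr=False` the learning rate would simply stay at `lr` throughout.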
budget: int = struct.field(pytree_node=False, default=1000000)
class-attribute
instance-attribute
Number of environment frames to train for.
clip_eps: float = 0.2
class-attribute
instance-attribute
PPO clipping parameter (epsilon): the policy probability ratio is clipped to [1 - clip_eps, 1 + clip_eps].
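For orientation, the standard PPO clipped surrogate objective for a single sample can be sketched as below. This is the textbook form, not necessarily how this library implements it; `clipped_surrogate` and its arguments are hypothetical names.

```python
import math


def clipped_surrogate(log_prob_new: float, log_prob_old: float,
                      advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO objective (to be maximised)."""
    # Probability ratio between the updated policy and the rollout policy.
    ratio = math.exp(log_prob_new - log_prob_old)
    # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the pessimistic (smaller) of the two candidate objectives.
    return min(ratio * advantage, clipped_ratio * advantage)


print(clipped_surrogate(math.log(1.5), 0.0, advantage=1.0))
```

The policy loss that is actually minimised is the negative mean of this quantity over a minibatch.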
clip_value_loss: bool = struct.field(pytree_node=False, default=True)
class-attribute
instance-attribute
Whether to clip the value loss in the PPO loss.
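One common form of value-loss clipping is sketched below. It assumes the same `clip_eps` is reused for the value function, which this page does not confirm; `clipped_value_loss` and its arguments are illustrative names.

```python
def clipped_value_loss(new_value: float, old_value: float,
                       target: float, clip_eps: float = 0.2) -> float:
    """Per-sample value loss with the new prediction kept near the old one."""
    # Limit how far the new prediction may move from the rollout-time prediction.
    delta = max(-clip_eps, min(new_value - old_value, clip_eps))
    value_clipped = old_value + delta
    unclipped = (new_value - target) ** 2
    clipped = (value_clipped - target) ** 2
    # Use the larger (more pessimistic) of the two squared errors.
    return 0.5 * max(unclipped, clipped)


print(clipped_value_loss(new_value=2.0, old_value=1.0, target=3.0))
```

With clipping disabled, the loss would reduce to the plain `0.5 * (new_value - target) ** 2`.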
ent_coef: float = 0.01
class-attribute
instance-attribute
Entropy coefficient in the total loss.
gae_lambda: float = 0.95
class-attribute
instance-attribute
Lambda parameter of the TD(lambda) return used to estimate advantages.
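A sketch of how lambda enters generalised advantage estimation (GAE) over one trajectory. The discount factor `gamma` is not listed on this page and is assumed here (it may live on the `HParams` base class); the function and argument names are hypothetical.

```python
def gae(rewards, values, last_value, dones, gamma=0.99, gae_lambda=0.95):
    """Compute GAE advantages for one trajectory, working backwards in time.

    dones[t] is True when the episode ended after step t.
    """
    advantages = [0.0] * len(rewards)
    next_value, next_advantage = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - float(dones[t])
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        next_advantage = delta + gamma * gae_lambda * not_done * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages


print(gae([1.0, 1.0], [0.0, 0.0], 0.0, [False, False], gamma=1.0, gae_lambda=1.0))
```

Setting `gae_lambda=0` gives one-step TD advantages (lower variance, more bias), while `gae_lambda=1` gives Monte Carlo style advantages.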
lr: float = 0.00025
class-attribute
instance-attribute
Starting learning rate.
max_grad_norm: float = 0.5
class-attribute
instance-attribute
Maximum gradient norm for clipping.
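Clipping by global norm rescales the whole gradient when its L2 norm exceeds the limit. The sketch below uses a flat list of floats for simplicity; in JAX code this is typically done over a pytree, for example with `optax.clip_by_global_norm`, though this page does not say which mechanism the library uses.

```python
import math


def clip_by_global_norm(grads, max_grad_norm=0.5):
    """Rescale `grads` so that their L2 norm is at most `max_grad_norm`."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_grad_norm:
        # Already within the limit: leave the gradient untouched.
        return list(grads)
    scale = max_grad_norm / norm
    return [g * scale for g in grads]


print(clip_by_global_norm([3.0, 4.0]))
```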
normalise_advantage: bool = struct.field(pytree_node=False, default=True)
class-attribute
instance-attribute
Whether to normalise the advantages in the PPO loss.
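Advantage normalisation usually means standardising to zero mean and unit variance. The sketch below assumes this is done per batch; whether the library normalises per minibatch or per full batch, and the stabilising constant `eps`, are assumptions.

```python
from statistics import fmean, pstdev


def normalise(advantages, eps=1e-8):
    """Standardise advantages to zero mean and (approximately) unit variance."""
    mean = fmean(advantages)
    std = pstdev(advantages)  # population standard deviation
    # eps guards against division by zero when all advantages are equal.
    return [(a - mean) / (std + eps) for a in advantages]


print(normalise([1.0, 2.0, 3.0]))
```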
num_envs: int = struct.field(pytree_node=False, default=16)
class-attribute
instance-attribute
Number of parallel environments to run.
num_epochs: int = struct.field(pytree_node=False, default=1)
class-attribute
instance-attribute
Number of optimisation epochs (passes over the collected batch) per update.
num_minibatches: int = struct.field(pytree_node=False, default=8)
class-attribute
instance-attribute
Number of minibatches to split the data into for training.
num_steps: int = struct.field(pytree_node=False, default=128)
class-attribute
instance-attribute
Number of steps to run in each environment per update.
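The sizes implied by the defaults can be worked out as follows. This sketch assumes one frame per environment step (no frame skipping) and that the batch divides evenly into minibatches; `derived_sizes` is an illustrative helper, not part of the API.

```python
def derived_sizes(budget=1_000_000, num_envs=16, num_steps=128, num_minibatches=8):
    """Return (batch_size, num_updates, minibatch_size) implied by the settings."""
    # Frames collected per update across all parallel environments.
    batch_size = num_envs * num_steps
    # Whole updates that fit in the frame budget.
    num_updates = budget // batch_size
    # Samples per gradient step within an epoch.
    minibatch_size = batch_size // num_minibatches
    return batch_size, num_updates, minibatch_size


print(derived_sizes())  # (2048, 488, 256)
```

Under these assumptions, each update performs `num_epochs * num_minibatches` gradient steps.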
vf_coef: float = 0.5
class-attribute
instance-attribute
Value function coefficient in the total loss.
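The coefficients typically combine the loss terms as sketched below. This is the conventional PPO form with an entropy bonus; the exact sign conventions and reductions used by the library are assumptions, and `total_loss` is a hypothetical name.

```python
def total_loss(policy_loss, value_loss, entropy, vf_coef=0.5, ent_coef=0.01):
    """Combine the PPO loss terms into a single scalar to minimise."""
    # Entropy is subtracted so that higher entropy (more exploration) lowers the loss.
    return policy_loss + vf_coef * value_loss - ent_coef * entropy


print(total_loss(policy_loss=1.0, value_loss=2.0, entropy=3.0))
```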