Table 3 Hyperparameters of PPO

From: Comparative study of algorithms for optimized control of industrial energy supply systems

| Hyperparameter | Meaning | Chosen value |
| --- | --- | --- |
| γ | discount factor | 0.99 |
| N_envs | number of environments run in parallel | 4 (during training) |
| N_steps | number of steps before update | 256 (System A) / 512 (System B) |
| ∇_max | max. value for gradient clipping | 0.5 |
| α | learning rate | 2 × 10⁻⁴ |
| c_1 | loss coefficient for the value function | 0.5 |
| c_2 | loss coefficient for the entropy function | 0.01 |
| λ | factor for bias/variance ratio | 0.95 |
| N_mb | number of mini-batches per update | 4 |
| N_epochs | number of epochs per surrogate update | 4 |
| ε_clip | limit between new and old policy | 0.2 |
| net_arch | neural network topology, neurons per layer | MLP [500, 400, 300] (ReLU) |
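
The values in the table imply a few derived quantities that are easy to misread, such as how many samples one update sees. The following is a minimal sketch that collects the tabulated values in a Python dataclass and computes those quantities. The class and field names are hypothetical and not taken from the source. The sketch also assumes that the number of steps before update is counted per environment, which the table does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOHyperparameters:
    """Values from Table 3. Field names are illustrative, not from the source."""

    n_steps: int                       # steps before update: 256 (System A) / 512 (System B)
    gamma: float = 0.99                # discount factor
    n_envs: int = 4                    # parallel environments (during training)
    max_grad_norm: float = 0.5         # max. value for gradient clipping
    learning_rate: float = 2e-4        # learning rate
    vf_coef: float = 0.5               # c_1, loss coefficient for the value function
    ent_coef: float = 0.01             # c_2, loss coefficient for the entropy function
    gae_lambda: float = 0.95           # lambda, factor for bias/variance ratio
    n_minibatches: int = 4             # mini-batches per update
    n_epochs: int = 4                  # epochs per surrogate update
    clip_range: float = 0.2            # limit between new and old policy
    net_arch: tuple = (500, 400, 300)  # MLP neurons per layer, ReLU activation

    @property
    def rollout_size(self) -> int:
        # Assumption: n_steps is counted per environment.
        return self.n_envs * self.n_steps

    @property
    def minibatch_size(self) -> int:
        # Samples in each mini-batch of one update.
        return self.rollout_size // self.n_minibatches

    @property
    def gradient_steps_per_update(self) -> int:
        # One gradient step per mini-batch, repeated for every epoch.
        return self.n_minibatches * self.n_epochs


SYSTEM_A = PPOHyperparameters(n_steps=256)
SYSTEM_B = PPOHyperparameters(n_steps=512)

for name, hp in (("System A", SYSTEM_A), ("System B", SYSTEM_B)):
    print(name, hp.rollout_size, hp.minibatch_size, hp.gradient_steps_per_update)
```

Under the per-environment assumption, System A would collect 1024 samples per update in mini-batches of 256, and System B would collect 2048 samples in mini-batches of 512. Both would take 16 gradient steps per update.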