From: Comparative study of algorithms for optimized control of industrial energy supply systems
Hyperparameter | Meaning | Chosen Value |
---|---|---|
γ | discount factor | 0.99 |
Nenvs | number of environments run in parallel | 4 (during training) |
Nsteps | number of steps before update | 256 (System A) / 512 (System B) |
∇max | max. value for gradient clipping | 0.5 |
α | learning rate | 2 × 10⁻⁴ |
c1 | loss coefficient for the value function | 0.5 |
c2 | loss coefficient for the entropy term | 0.01 |
λ | factor controlling the bias/variance trade-off of the advantage estimate | 0.95 |
Nmb | number of mini-batches per update | 4 |
Nepochs | number of epochs per surrogate update | 4 |
εclip | clipping limit for the probability ratio between new and old policy | 0.2 |
netarch | neural network topology, neurons per layer | MLP [500, 400, 300] (ReLU) |
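
The batch arithmetic implied by the table can be sketched as follows. This is a minimal sketch, not the study's implementation: it assumes the common PPO convention that each update collects Nenvs × Nsteps transitions and splits them evenly into Nmb mini-batches, and that λ is the generalized advantage estimation parameter. The class name `PPOHyperparameters` and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOHyperparameters:
    """Values copied from the table; field names are illustrative only."""

    n_steps: int                       # Nsteps: 256 (System A) / 512 (System B)
    gamma: float = 0.99                # discount factor
    n_envs: int = 4                    # parallel environments during training
    max_grad_norm: float = 0.5         # gradient clipping value
    learning_rate: float = 2e-4        # alpha
    value_coef: float = 0.5            # c1
    entropy_coef: float = 0.01         # c2
    gae_lambda: float = 0.95           # lambda (assumed to be the GAE parameter)
    n_minibatches: int = 4             # Nmb
    n_epochs: int = 4                  # Nepochs
    clip_range: float = 0.2            # epsilon_clip
    net_arch: tuple = (500, 400, 300)  # MLP layer widths, ReLU activations

    @property
    def rollout_size(self) -> int:
        # Assumption: one update uses n_envs * n_steps collected transitions.
        return self.n_envs * self.n_steps

    @property
    def minibatch_size(self) -> int:
        # Assumption: the rollout is split evenly into n_minibatches parts.
        if self.rollout_size % self.n_minibatches != 0:
            raise ValueError("rollout size is not divisible by n_minibatches")
        return self.rollout_size // self.n_minibatches

    @property
    def gradient_steps_per_update(self) -> int:
        # Each epoch passes once over every mini-batch.
        return self.n_epochs * self.n_minibatches


SYSTEM_A = PPOHyperparameters(n_steps=256)
SYSTEM_B = PPOHyperparameters(n_steps=512)

if __name__ == "__main__":
    for name, hp in (("System A", SYSTEM_A), ("System B", SYSTEM_B)):
        print(name, hp.rollout_size, hp.minibatch_size,
              hp.gradient_steps_per_update)
```

Under these assumptions, System A would use 1024 transitions per update in mini-batches of 256, and System B would use 2048 transitions in mini-batches of 512, with 16 gradient steps per update in both cases.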