Symbol | Meaning | Value |
---|---|---|
\(\lvert Z \rvert\) | Dimension size of latent space | 6 |
\(\beta _T\) | Inverse temperature | 10 |
\(\beta _z\) | Weight of regularization in z | 1e−2 |
\(\beta _a\) | Weight of regularization in a | 1e−4 |
\(\eta\) | Remaining computational graph | 1e−4 |
\(\gamma\) | Discount factor | 0.99 |
\(\alpha\) | Learning rate | 3e−4 |
\(\rho\) | Echo state property [30] | 0.5 |
\((\tau , \nu )\) | Hyperparameters for t-soft update [29] | (0.5, 4.0) |
\((\lambda _{\mathrm{max}}^1, \lambda _{\mathrm{max}}^2, \kappa )\) | Hyperparameters for adaptive eligibility traces [46] | (0.5, 0.95, 10) |
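
For reproducibility, the values in the table above can be gathered into a single configuration object. The following is a minimal sketch only: the class name `Hyperparameters` and all field names are hypothetical labels chosen here for illustration and do not come from the original implementation; only the numerical values are taken from the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from the table; field names are illustrative assumptions."""

    latent_dim: int = 6        # |Z|: dimension size of latent space
    beta_T: float = 10.0       # inverse temperature
    beta_z: float = 1e-2       # weight of regularization in z
    beta_a: float = 1e-4       # weight of regularization in a
    eta: float = 1e-4          # remaining computational graph
    gamma: float = 0.99        # discount factor
    alpha: float = 3e-4        # learning rate
    rho: float = 0.5           # echo state property [30]
    # (tau, nu): hyperparameters for t-soft update [29]
    t_soft: Tuple[float, float] = (0.5, 4.0)
    # (lambda_max^1, lambda_max^2, kappa): adaptive eligibility traces [46]
    eligibility: Tuple[float, float, float] = (0.5, 0.95, 10.0)


config = Hyperparameters()
```

Freezing the dataclass keeps the settings immutable during a run, so any deviation from the tabulated values has to be made explicitly, for example with `dataclasses.replace(config, alpha=1e-4)`.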