Table 1 Parameter configuration

From: Optimization algorithm for feedback and feedforward policies towards robot control robust to sensing failures

| Symbol | Meaning | Value |
| --- | --- | --- |
| \(\lvert Z \rvert\) | Dimension size of latent space | 6 |
| \(\beta _T\) | Inverse temperature | 10 |
| \(\beta _z\) | Weight of regularization in z | 1e−2 |
| \(\beta _a\) | Weight of regularization in a | 1e−4 |
| \(\eta\) | Remaining computational graph | 1e−4 |
| \(\gamma\) | Discount factor | 0.99 |
| \(\alpha\) | Learning rate | 3e−4 |
| \(\rho\) | Echo state property [30] | 0.5 |
| \((\tau , \nu )\) | Hyperparameters for t-soft update [29] | (0.5, 4.0) |
| \((\lambda _{\mathrm{max}}^1, \lambda _{\mathrm{max}}^2, \kappa )\) | Hyperparameters for adaptive eligibility traces [46] | (0.5, 0.95, 10) |
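
For readers who want to reuse these settings, the values in Table 1 can be collected into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `Table1Config` and all field names are hypothetical labels chosen here for illustration, and only the numeric values come from the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Table1Config:
    """Hyperparameter values from Table 1 (field names are illustrative assumptions)."""

    latent_dim: int = 6        # |Z|: dimension size of latent space
    beta_T: float = 10.0       # inverse temperature
    beta_z: float = 1e-2       # weight of regularization in z
    beta_a: float = 1e-4       # weight of regularization in a
    eta: float = 1e-4          # "remaining computational graph" parameter
    gamma: float = 0.99        # discount factor
    alpha: float = 3e-4        # learning rate
    rho: float = 0.5           # echo state property parameter [30]
    # (tau, nu): hyperparameters for t-soft update [29]
    tau_nu: Tuple[float, float] = (0.5, 4.0)
    # (lambda_max^1, lambda_max^2, kappa): adaptive eligibility traces [46]
    lambda_kappa: Tuple[float, float, float] = (0.5, 0.95, 10.0)


config = Table1Config()
```

Freezing the dataclass keeps the values immutable, so a run cannot silently drift from the configuration reported in the table; a variant can be created explicitly with `dataclasses.replace(config, gamma=0.95)`.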