Fig. 3
From: Optimization algorithm for feedback and feedforward policies towards robot control robust to sensing failures

Trajectory optimization problem: the trajectory can be predicted with the composed policy and the stochastic dynamics model; the optimal and non-optimal trajectories can be inferred with the optimal and non-optimal policies, respectively, together with the true state transition probability; the predicted trajectory should be close to the optimal trajectory and far from the non-optimal trajectory; the divergence between trajectories can be represented by the KL divergence.
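
As a rough illustration of the objective described in the caption, the sketch below treats each trajectory distribution as a small discrete distribution and scores a predicted distribution by how close it is to the optimal one and how far it is from the non-optimal one. The function names, the toy probabilities, and the direction of each KL term are assumptions made for illustration only; the caption does not specify them, and the article's actual formulation may differ.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes.

    Assumes q is strictly positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def trajectory_objective(predicted, optimal, non_optimal):
    """Hypothetical objective: lower is better.

    The first term pulls the predicted distribution toward the optimal one,
    the second pushes it away from the non-optimal one. The KL direction
    used here is an assumption, not taken from the caption.
    """
    return kl_divergence(optimal, predicted) - kl_divergence(non_optimal, predicted)


# Toy distributions over three candidate trajectories (illustrative values).
optimal = [0.7, 0.2, 0.1]
non_optimal = [0.1, 0.2, 0.7]

score_near_optimal = trajectory_objective(optimal, optimal, non_optimal)
score_near_non_optimal = trajectory_objective(non_optimal, optimal, non_optimal)
```

With these toy values, a prediction that matches the optimal distribution yields a negative score, while one that matches the non-optimal distribution yields a positive score, so minimizing this quantity favors the optimal trajectory.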