Fig. 2
From: Optimization algorithm for feedback and feedforward policies towards robot control robust to sensing failures
Loop of RL with sensing failures. In general RL (left), an agent interacts with the environment through an action sampled from a policy that depends on the current state; according to the state transition probability, the new state is observed together with the associated reward. In practice (right), however, state observation carries a risk of sensing failures such as occlusion and packet loss.
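The two loops in the figure can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the 1-D toy dynamics, the deterministic proportional policy (the caption describes a sampled action), the names `step`, `observe`, `policy` and `rollout`, and in particular the fallback of repeating the last action when no observation arrives, which is only a stand-in for whatever the article's feedforward policy does.

```python
import random


def step(state, action):
    """Hypothetical toy transition: 1-D integrator, reward for staying near zero."""
    next_state = state + action
    reward = -abs(next_state)
    return next_state, reward


def observe(state, failure_prob, rng):
    """Return the state, or None when sensing fails (e.g. occlusion, packet loss)."""
    return None if rng.random() < failure_prob else state


def policy(observation, last_action):
    """Feedback action when the state is observed; otherwise repeat the last action.

    Repeating the last action is an assumed placeholder fallback, not the
    article's method.
    """
    if observation is None:
        return last_action
    return -0.5 * observation


def rollout(steps=20, failure_prob=0.3, seed=0):
    """Run the loop and return (total reward, number of sensing failures)."""
    rng = random.Random(seed)
    state, action = 1.0, 0.0
    total_reward, failures = 0.0, 0
    for _ in range(steps):
        obs = observe(state, failure_prob, rng)  # right-hand loop when this is None
        failures += obs is None
        action = policy(obs, action)
        state, reward = step(state, action)
        total_reward += reward
    return total_reward, failures
```

Setting `failure_prob=0.0` reproduces the left-hand loop of the figure, where every state is observed; any positive value gives the right-hand loop, where some observations are lost and the agent has to act without them.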