Fig. 4
From: Lifelogging caption generation via fourth-person vision in a human–robot symbiotic environment
Proposed methods. (a) Ensemble is fundamentally equivalent to the baseline UpDown [20]: all salient regions are passed intact to the attention module Att. (b) In KMeans, a clustering algorithm is applied to the salient region features, and the k centroid features are attended to.
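The contrast between the two panels can be illustrated with a minimal sketch. It is not the authors' implementation: the feature dimensions, the random inputs, the function names `kmeans_centroids` and `attend`, and the use of plain dot-product attention in place of the Att module are all assumptions made for illustration.

```python
import numpy as np

def kmeans_centroids(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the (k, d) centroid matrix."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct region features.
    init = rng.choice(len(features), size=k, replace=False)
    centroids = features[init].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every region to every centroid.
        diffs = features[:, None, :] - centroids[None, :, :]
        labels = (diffs ** 2).sum(axis=-1).argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def attend(query, keys):
    """Dot-product soft attention (a stand-in for Att, assumed here)."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ keys, weights

rng = np.random.default_rng(0)
regions = rng.normal(size=(36, 8))  # hypothetical: 36 salient regions, 8-dim
query = rng.normal(size=8)          # hypothetical decoder state

# (a) Ensemble: attention over all salient regions, left intact.
ensemble_ctx, ensemble_w = attend(query, regions)

# (b) KMeans: cluster the region features, then attend to the k centroids.
centroids = kmeans_centroids(regions, k=5)
kmeans_ctx, kmeans_w = attend(query, centroids)
```

In this sketch the only difference between the two variants is the set of vectors handed to `attend`: 36 region features in (a), 5 centroid features in (b).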