Fig. 10
From: Lifelogging caption generation via fourth-person vision in a human–robot symbiotic environment
CIDEr-D and SPICE scores with temporal batch clustering. The number of frames to be obtained is investigated among \(\{1, 2, 4, 8, 16, 32, 64\}\).