Fig. 9
From: Lifelogging caption generation via fourth-person vision in a human–robot symbiotic environment

CIDEr-D and SPICE scores with different methods of clustering. The number of clusters \(k\) is swept among \(\{4, 8, 16, 32, 64\}\) for each method. Both metrics show the best scores at \(k=32\).