Fig. 8
From: Lifelogging caption generation via fourth-person vision in a human–robot symbiotic environment

The top-20 most frequent semantic tuples of each of three types in our dataset. As in SPICE [30], described in Sect. “Evaluation metrics”, the “object”, “attribute”, and “relation” elements are parsed from a set of reference captions and form the three types of tuples shown as the bins. As the “relation” frequencies (right) show, our dataset contains several phrases related to interactions between a person and household commodities.
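
A minimal sketch of how per-type frequency bins like these could be tallied, assuming the reference captions have already been parsed into SPICE-style tuples (1 element for an object, 2 for an object with an attribute, 3 for a relation between two objects). The sample tuples and the helper names `tuple_kind` and `top_k_by_kind` are hypothetical illustrations, not the authors' data or code, and the scene-graph parsing step itself is not shown.

```python
from collections import Counter

# Hypothetical pre-parsed tuples, one list per reference caption.
# These are illustrative values only, not taken from the dataset.
parsed_captions = [
    [("person",), ("person", "sitting"), ("person", "hold", "cup")],
    [("person",), ("cup",), ("person", "hold", "cup")],
    [("person",), ("book",), ("person", "read", "book")],
]

def tuple_kind(t):
    """Classify a tuple by its length, following the SPICE convention."""
    return {1: "object", 2: "attribute", 3: "relation"}[len(t)]

def top_k_by_kind(captions, k=20):
    """Count tuples per kind and return the k most frequent of each."""
    counts = {"object": Counter(), "attribute": Counter(), "relation": Counter()}
    for tuples in captions:
        for t in tuples:
            counts[tuple_kind(t)][t] += 1
    return {kind: counter.most_common(k) for kind, counter in counts.items()}

top = top_k_by_kind(parsed_captions, k=20)
for kind, bins in top.items():
    print(kind, bins)
```

Each returned list holds `(tuple, count)` pairs sorted by descending count, which corresponds to one panel of bins in a plot of this kind.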