Table 3 SPICE subcategory scores on our dataset

From: Lifelogging caption generation via fourth-person vision in a human–robot symbiotic environment

| Input Perspective | Method | SPICE (All) | Object | Relation | Attribute | Color | Count | Size |
|---|---|---|---|---|---|---|---|---|
| First | UpDown [20] | 12.19 | 26.36 | 1.42 | 3.52 | *2.38* | 0.00 | 0.00 |
| Second | UpDown [20] | 12.08 | 26.66 | 1.30 | 1.45 | 0.02 | 0.00 | 0.00 |
| Third | UpDown [20] | 6.28 | 14.62 | 0.46 | 0.17 | 0.04 | *0.11* | 0.00 |
| Second, Third | Ensemble | 11.40 | 25.42 | 1.13 | 0.87 | 0.00 | 0.00 | 0.00 |
| Second, Third | KMeans | 12.21 | 27.40 | 1.08 | 0.86 | 0.00 | 0.00 | 0.00 |
| First, Third | Ensemble | 14.37 | 30.48 | *2.15* | 3.30 | 0.17 | 0.00 | 0.00 |
| First, Third | KMeans | 15.02 | 32.02 | 2.13 | 3.16 | 0.15 | 0.00 | 0.00 |
| First, Second | Ensemble | 15.04 | 31.96 | 1.98 | *3.63* | 0.04 | 0.00 | 0.00 |
| First, Second | KMeans | 15.24 | 32.56 | 1.90 | 3.41 | 0.14 | 0.00 | 0.00 |
| First, Second, Third | Ensemble | 14.99 | 32.02 | 2.01 | 3.35 | 0.02 | 0.00 | 0.00 |
| First, Second, Third | KMeans | *15.72* | *33.74* | 1.96 | 3.18 | 0.05 | 0.00 | 0.00 |

Highest values are in italic.