From: Lifelogging caption generation via fourth-person vision in a human–robot symbiotic environment
Input Perspective | Method | SPICE (All) | Object | Relation | Attribute | Color | Count | Size
---|---|---|---|---|---|---|---|---
First | UpDown [20] | 12.19 | 26.36 | 1.42 | 3.52 | 2.38 | 0.00 | 0.00
Second | UpDown [20] | 12.08 | 26.66 | 1.30 | 1.45 | 0.02 | 0.00 | 0.00
Third | UpDown [20] | 6.28 | 14.62 | 0.46 | 0.17 | 0.04 | 0.11 | 0.00
Second + Third | Ensemble | 11.40 | 25.42 | 1.13 | 0.87 | 0.00 | 0.00 | 0.00
Second + Third | KMeans | 12.21 | 27.40 | 1.08 | 0.86 | 0.00 | 0.00 | 0.00
First + Third | Ensemble | 14.37 | 30.48 | 2.15 | 3.30 | 0.17 | 0.00 | 0.00
First + Third | KMeans | 15.02 | 32.02 | 2.13 | 3.16 | 0.15 | 0.00 | 0.00
First + Second | Ensemble | 15.04 | 31.96 | 1.98 | 3.63 | 0.04 | 0.00 | 0.00
First + Second | KMeans | 15.24 | 32.56 | 1.90 | 3.41 | 0.14 | 0.00 | 0.00
First + Second + Third | Ensemble | 14.99 | 32.02 | 2.01 | 3.35 | 0.02 | 0.00 | 0.00
First + Second + Third | KMeans | 15.72 | 33.74 | 1.96 | 3.18 | 0.05 | 0.00 | 0.00