Settings
A mobile robot with a single mounted camera was used for our experiments. The mobile platform was “i-Cart mini” produced by the T-frog project [21], and the camera was a BSW32KM (Buffalo Americas Inc.). A laptop computer was mounted on the platform. It was used to capture VGA (\(640 \times 480\) pixels) images and control the platform. Image datasets were collected for both indoor (our experimental laboratory) and outdoor (ten different scenes on our university campus) environments.
Quality landmark selection
Nine shooting locations were set in each of the target scenes, as shown in Fig. 4. The distance between neighboring locations was 0.2 m.
Landmark selection was performed by the four methods described in "Landmark selection criteria". The parameters used to select the local feature region were experimentally defined as follows:
The parameters used to select a visual landmark were determined by brute force. The results were as follows:
- (a) Number of high similarity regions: \(l_c \ge 3\).
- (b) Number of corresponding regions: \(l_i \ge 9\).
- (c) Number of detections of features: \(g_j \ge 9\).
These were the conditions used to select visual landmarks with respect to the criteria introduced in "Landmark selection criteria". Only the landmark candidates that satisfied each condition were selected as visual landmarks. These values were based on the assumption that nine images were used. If more images are to be used, these values should be increased linearly.
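The thresholding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Candidate` record, the function names, and the choice to round scaled thresholds up are assumptions; only the three threshold values and the nine-image baseline come from the text.

```python
from dataclasses import dataclass

BASE_IMAGES = 9  # the thresholds in the text assume nine images
BASE_THRESHOLDS = {"l_c": 3, "l_i": 9, "g_j": 9}  # conditions (a), (b), (c)


@dataclass
class Candidate:
    """Hypothetical record for one landmark candidate."""
    name: str
    l_c: int  # number of high similarity regions, condition (a)
    l_i: int  # number of corresponding regions, condition (b)
    g_j: int  # number of detections of features, condition (c)


def scaled_thresholds(n_images):
    """Scale each threshold in proportion to the number of images.

    Rounding up is an assumption; the text only says 'linearly'.
    Integer ceiling division avoids floating-point rounding surprises.
    """
    return {
        key: (value * n_images + BASE_IMAGES - 1) // BASE_IMAGES
        for key, value in BASE_THRESHOLDS.items()
    }


def select(candidates, criterion, n_images=BASE_IMAGES):
    """Keep the candidates that satisfy the condition of one criterion."""
    threshold = scaled_thresholds(n_images)[criterion]
    return [c for c in candidates if getattr(c, criterion) >= threshold]
```

For example, `select(candidates, "l_i")` keeps only the candidates with \(l_i \ge 9\) when nine images are used, and `scaled_thresholds(18)` doubles all three thresholds.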
The abovementioned parameters were defined experimentally; therefore, one concern was how sensitive the results were to them. In our experience with images captured in indoor and outdoor environments, the sensitivity was not significantly high. When we slightly changed the parameters, the quality of the landmarks degraded in some scenes and improved in others. The parameters given in this study may be rough estimates; however, they provided acceptable results.
Criterion for quality evaluation
In this study, it was assumed that a robot travels along a predefined course many times. While the robot moved along the course, the number of detections of each landmark was counted. The result was then used to evaluate the repeatability of the landmark.
A variation coefficient was used for this purpose. For each landmark, it was calculated by dividing the standard deviation (Std.) of the number of detections by the average (Ave.) number of detections. If the value is small, we consider the landmark to have high repeatability.
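As a minimal sketch, the variation coefficient of one landmark can be computed from its per-run detection counts as shown below. The function name is hypothetical, and using the population standard deviation is an assumption, because the text does not state whether the population or sample form was used.

```python
import statistics


def variation_coefficient(detections):
    """Standard deviation divided by the average of the detection counts.

    Uses the population standard deviation (an assumption; the sample
    form, statistics.stdev, would give slightly larger values).
    """
    average = statistics.mean(detections)
    if average == 0:
        raise ValueError("landmark was never detected; coefficient undefined")
    return statistics.pstdev(detections) / average
```

A landmark detected the same number of times in every run yields 0, the most repeatable case; larger values indicate less stable detection.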
Landmark examples
First, we present a landmark selection example from indoor environments. The visibility of these landmarks was confirmed through eleven automatic navigation runs. In each run, one hundred images were captured at 3 fps. Landmarks were then detected using these images.
The rightmost images in Fig. 5 are visual landmarks selected from the scene. The leftmost column of the table shows the name of each landmark, and the top row shows the experiment number. Landmarks A to D had at least nine arcs (\(l_i \ge 9\)). They were stably detected in the complete images with a small variation coefficient. On the other hand, E and F had \(l_i = 7\) and \(l_i = 8\), respectively. These were also relatively stable landmarks; however, their variation coefficients were greater than those of A to D. These results indicate that \(l_i\) can be used to determine the landmark quality.
Visibility evaluation
The same procedure described in the previous subsection was performed using images obtained at ten outdoor locations. Methods (a)–(d) ("Landmark selection criteria") were examined to determine whether each is suitable for selecting a quality landmark. Figure 6 shows a list of variation coefficients for all local feature regions. The blue and red points indicate landmarks and other local feature regions, respectively. It is not always true that landmarks with \(l_c \ge 3\) have a smaller variation coefficient than the other local feature regions. The same holds true for Fig. 7, which shows the results for method (b): landmarks with \(l_i \ge 9\) do not always have a small variation coefficient.
The SIFT features included in the landmarks were examined to clarify the reason behind these observations. In some cases, the features were extracted from a spatial region where a large perspective change occurred. These features are not robust against viewpoint changes; thus, it is expected that they would not be included in the landmark. Another problem unique to method (a) is that a local feature region can differ according to the layout of the feature points. Figure 8 shows an example. One large region was extracted at one viewpoint; however, it was divided into two regions in another viewpoint. This caused a misdetection of the landmark.
Method (c) considers the adequacy of SIFT features. Figure 9 shows the relation between the serial number of the landmark and the variation coefficient. Here, “all” means that all features were used for landmark detection, and “only” means that only features with weight coefficient \(f_g\) were used for the detection. Using only the features with the weight coefficient clearly resulted in smaller variation coefficients. This means that the criteria for defining the weight coefficient are useful for selecting high visibility landmarks.
Figure 10 shows the relation between the serial number of a landmark and the average number of correct correspondences. Using features with a weight coefficient provided stable landmark detection. This means that the weighted features allow us to find correspondences with high repeatability because they are easy to find in a set of images captured at different viewpoints. In addition, the processing time to find landmarks using features with a weight coefficient was 23.89 s for 100 frames, whereas the same process using all feature points required 26.06 s. Using the weighted features therefore reduced the processing time by 8.33 %.
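The 8.33 % figure follows directly from the two reported processing times, taking the time for all feature points as the baseline:

\[
\frac{26.06 - 23.89}{26.06} = \frac{2.17}{26.06} \approx 0.0833
\]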
As can be seen in Fig. 11, the average number of correspondences tended to be large when the feature points had large weight coefficient G. This graph shows that the proposed approach allows us to find quality landmarks using the weight coefficient, e.g., G greater than 5.0 statistically guarantees quality landmarks. This value can be predefined; thus, quality landmark selection can be automatically achieved. The same trend can be observed from the relation between the weight coefficient and the variation coefficient (Fig. 12). A greater weight coefficient results in a landmark with a lower variation coefficient.
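Automatic selection with a predefined weight threshold could look like the sketch below. The function name, the dictionary layout, and the example values are hypothetical; only the threshold of 5.0 on G is taken from the text.

```python
def select_by_weight(weights, g_min=5.0):
    """Return the names of landmarks whose weight coefficient G exceeds g_min.

    `weights` maps a landmark name to its weight coefficient G.
    The comparison is strict because the text says "greater than 5.0".
    """
    return sorted(name for name, g in weights.items() if g > g_min)


# Hypothetical weight coefficients, for illustration only.
example_weights = {"A": 6.2, "B": 5.0, "C": 3.1, "D": 7.5}
selected = select_by_weight(example_weights)
```

Because the threshold is fixed in advance, no per-scene tuning is needed at selection time.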
Other feature descriptors
A SIFT descriptor is robust against changes in scale, rotation, translation, and illumination, and these characteristics make it suitable for mobile robots. However, other excellent descriptors with equivalent characteristics exist. Any feature descriptor that provides scale and direction information is applicable to the proposed method; therefore, we attempted to replace SIFT with another feature descriptor.
There are two primary steps in extracting an image feature point: feature point detection and feature description. Speeded-Up Robust Features (SURF) [22] is a well-known approximation of SIFT. To detect SURF keypoints, a box filter is applied to calculate the scale-space extrema. Note that the extrema tend to be sparse; thus, SURF may perform worse than SIFT because the proposed method requires densely extracted keypoints. This assumption was experimentally confirmed using the abovementioned images. We set a smaller \(l_c\), which blurred the line between landmarks and other feature regions.
As a further test, feature description using FREAK [17] was also examined. Here, the parameters used to generate a local feature region were the same as those described in "Quality landmark selection". The FREAK descriptor was applied to each feature point extracted by the SIFT method. Figures 13 and 14 show the results obtained by method (c). The difference in landmark quality was less pronounced than with SIFT. Although the basic tendency was the same, i.e., a small variation coefficient was obtained by using feature regions with weight coefficient \(g_j\), the SIFT descriptor showed better performance than FREAK.