Preliminary trajectory evaluation
First, we conducted a preliminary experiment to verify the relationship between the input voltages to the piezo stages of the mirror-drive 2-DOF active vision system and its angular displacements in the pan and tilt directions when the system was periodically driven along a designed trajectory at a frequency of 125 Hz. We determined the parameters \((b_\phi ,c_\phi )\) and \((b_\psi ,c_\psi )\) in Eq. (6), which describe the trajectory of the active vision system while the shutter is open, and quantified the nonlinear ripple deviations in the pan and tilt trajectories. In the experiment, a periodic voltage wave with a cycle time of \(\tau =8\) ms was input to the piezo stages, as shown in Fig. 6; the input voltage wave was a linear ramp from 0 V to a maximum voltage \(V_{max}\) over a period \(\tau _t=4.5\) ms, where \(V_{max}\) was set to 15, 30, 45, 60, 75, 90, 105, 120, 135, and 150 V. To measure the pan and tilt angles of the active vision system, a laser beam for observation was redirected by the mirrors of the active vision system, and the locations of the laser beam spot projected on a screen at a distance of 4350 mm from the active vision system were extracted offline from an HFR video captured at 10,000 fps.
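As an illustration of this drive pattern and the offline angle measurement, the following minimal Python sketch generates the ramp waveform and converts a measured laser-spot displacement on the screen into a mirror angle. The helper names are hypothetical, the behavior of the waveform after \(\tau _t\) is assumed (a reset to 0 V), and the conversion assumes the reflected beam is deflected by twice the mirror rotation.

```python
import numpy as np

def input_waveform(t, v_max, tau=8e-3, tau_t=4.5e-3):
    """Periodic drive voltage at 125 Hz: a linear ramp from 0 V to v_max
    over tau_t; the return phase after tau_t is assumed to reset to 0 V."""
    phase = t % tau
    return np.where(phase < tau_t, v_max * phase / tau_t, 0.0)

def spot_to_angle_deg(spot_mm, screen_distance_mm=4350.0):
    """Convert the laser-spot displacement on the screen into a mirror angle,
    assuming the reflected beam is deflected by twice the mirror rotation."""
    return 0.5 * np.degrees(np.arctan2(spot_mm, screen_distance_mm))

# Example: a 30-ms drive at V_max = 150 V sampled at 10,000 fps.
t = np.arange(0.0, 30e-3, 1e-4)
v = input_waveform(t, v_max=150.0)
```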
Figure 7 shows the angular displacements of the pan and tilt angles of the active vision system for 30 ms when the periodic input voltage waves at 125 Hz, with maximum voltages varying from 0 to 150 V, were applied to the piezo stages. In both the pan and tilt angles, the angular displacements changed periodically at a frequency of 125 Hz in proportion to the amplitudes of the input voltage waves, whereas they also contained certain ripple waves caused by resonant vibrations. The observed resonant frequencies in the pan and tilt angles were approximately 730 and 850 Hz, respectively; these are one-fifth or less of 3900 Hz, the resonant frequency of the piezo stage when no object is mounted on it. The decrease in the resonant frequencies was caused mainly by the mirror attached to each piezo stage. It can be observed that the resonant frequency in the tilt angle was lower than that in the pan angle, and the amplitude of the ripple in the tilt angle was larger than that in the pan angle, because the tilt angular motion is more strongly affected by gravity than the pan angular motion.
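The resonant ripple frequencies reported above can be estimated from the measured angle traces by a simple spectral analysis; the sketch below shows one possible way to do this (the 400-Hz cutoff that suppresses the 125-Hz drive component and its low harmonics is an assumption, not a value from the paper).

```python
import numpy as np

def ripple_frequency_hz(angle_deg, fs=10000.0, cutoff_hz=400.0):
    """Estimate the dominant ripple frequency of a measured angular trace
    sampled at fs (10,000 fps in the experiment).  Frequencies below
    cutoff_hz are excluded so that the 125-Hz drive component and its low
    harmonics do not mask the resonance (~730 Hz pan, ~850 Hz tilt)."""
    detrended = angle_deg - np.mean(angle_deg)
    spectrum = np.abs(np.fft.rfft(detrended))
    freqs = np.fft.rfftfreq(len(detrended), d=1.0 / fs)
    mask = freqs > cutoff_hz
    return freqs[mask][np.argmax(spectrum[mask])]
```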
Figure 8 shows the relationship between the input voltages to the piezo stages and the inclinations of the angular trajectories during the exposure time \(\tau =4\) ms, which were linearized by the least squares method. It can be observed that the inclinations of the angular trajectories, which correspond to the apparent angular velocity of the target object, varied linearly with the amplitudes of the input voltages; the parameters in Eq. (6) were estimated as \((b_\phi ,c_\phi )=\) (2.10, 4.47) and \((b_\psi ,c_\psi )=\) (2.91, 1.61) for the pan and tilt angles, respectively. Figure 9 shows the relationship between the estimated angular speeds \(\tilde{\varvec{\omega }}=(\tilde{\omega }_\phi ,\tilde{\omega }_\psi )\) and the averaged deviations \((\Delta \phi _d, \Delta \psi _d)\) from the approximated lines during 4 ms. In the figure, the ratio of the averaged deviation \((\Delta \phi _d, \Delta \psi _d)\) to the estimated angular displacement \((\phi _{mv},\psi _{mv})=(\tilde{\omega }_\phi \tau ,\tilde{\omega }_\psi \tau )\) during the exposure time \(\tau =4\,\)ms, \((\Delta \phi _d / \phi _{mv}, \Delta \psi _d / \psi _{mv})\), is also plotted; this ratio indicates the fraction of motion blur that remains with our frame-by-frame intermittent tracking method when shooting fast-moving objects. When the maximum voltage of the input voltage wave was 150 V, the angular speeds and the averaged deviations were 49.7\({^\circ }\)/s and 1.91 \(\times\,10^{-2}{^\circ }\) for the pan angle and 67.1\({^\circ }\)/s and 3.64 \(\times 10^{-2}{^\circ }\) for the tilt angle; the ratios \((\Delta \phi _d / \phi _{mv}, \Delta \psi _d / \psi _{mv})\) were 9.3 and 12.8%, respectively. It can be observed that the deviation error from the approximated line becomes larger as the angular speed increases in both the pan and tilt angles. The ratio \((\Delta \phi _d / \phi _{mv}, \Delta \psi _d / \psi _{mv})\) did not change significantly with the estimated angular speeds, whereas the ratio for the tilt angle was larger than that for the pan angle because of the effect of gravity. Thus, a certain amount of motion blur caused by the above-mentioned ripple deviations must be taken into account in motion-blur-free video shooting at 125 fps.
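The linearization and the deviation ratio described above can be reproduced with a short least-squares fit; the following sketch uses hypothetical helper names and assumes the trajectory samples during the 4-ms exposure are available as arrays.

```python
import numpy as np

def exposure_fit(t_s, angle_deg):
    """Least-squares line fit of the angular trajectory sampled during the
    4-ms exposure.  Returns the inclination (deg/s), the averaged deviation
    from the fitted line (deg), and their ratio to the angular displacement
    during the exposure (roughly 0.09-0.13 at 150 V according to Fig. 9)."""
    slope, intercept = np.polyfit(t_s, angle_deg, 1)
    deviation = np.mean(np.abs(angle_deg - (slope * t_s + intercept)))
    tau = t_s[-1] - t_s[0]                    # ~4e-3 s exposure window
    ratio = deviation / abs(slope * tau)
    return slope, deviation, ratio
```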
Circle-dot motion at constant speeds
Next, we conducted video shooting experiments with a circle-dot pattern to verify the relationship between the speed of an object and its motion blur. The pattern was moved along (1) the horizontal direction and (2) an oblique direction with an inclination of 20\({^\circ }\), at constant speeds of 0, 250, 500, 750, and 1000 mm/s using a 1-DOF linear slider. In the experiment, the HTZ-11000 (Joble Co., Japan) was used as the CCTV zoom lens; its focal length was set to \(f=650\,\)mm. The linear slider was located at a distance of 4350 mm from the mirror-drive 2-DOF active vision system; a \(35\times 35\,\)mm area on a plane at a distance of 4350 mm corresponded to an image region of \(512\times 512\) pixels, and \(6.84\times 10^{-2}\) mm corresponded to one pixel. Motion blur during the 4-ms exposure can be canceled when shooting a target object moving at up to 5.21 and 3.86 m/s on a plane 4350 mm in front of the mirror-drive 2-DOF active vision system in the horizontal and vertical directions, respectively, corresponding to apparent motions of 304 and 225 pixels in the x and y directions on the image sensor during the 4-ms exposure time. Figure 10 shows (a) an overview of the experimental environment, (b) the circle-dot pattern to be observed, and (c) the configuration of the experimental setting. The 4-mm-diameter circle dots were printed in black at intervals of 50 mm on a white sheet of paper.
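The apparent pixel motion quoted above follows directly from the object-plane scale and the exposure time; a minimal check, using the values of this setup, is sketched below.

```python
# Apparent pixel motion during one 4-ms exposure for this setup
# (6.84e-2 mm per pixel on the plane 4350 mm in front of the system).

MM_PER_PIXEL = 6.84e-2   # 35 mm / 512 pixels on the object plane
EXPOSURE_S = 4e-3

def apparent_motion_pixels(speed_m_per_s):
    """Pixels traversed in the image during one exposure."""
    return speed_m_per_s * 1000.0 * EXPOSURE_S / MM_PER_PIXEL

print(apparent_motion_pixels(5.21))   # ~304 pixels (horizontal, x)
print(apparent_motion_pixels(3.86))   # ~225 pixels (vertical, y)
print(apparent_motion_pixels(1.00))   # ~58 pixels at the slider maximum
```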
Figure 11 shows the \(227\times 227\) images cropped from the \(512\times 512\) input images so that the circle dot is located at their centers, and Fig. 12 shows the brightness profiles of 256 pixels along a horizontal line through the images when the circle dot moved at 0, 250, 500, 750, and 1000 mm/s in the horizontal direction. The threshold for binarization was \(I_B=50\). As observed in Figs. 11 and 12, the input images captured with frame-by-frame intermittent tracking (IT) were compared with those captured without mechanical tracking (NT) and with motion-deblurred (MD) versions of the NT images. The MD images were obtained by processing the NT images offline using a non-blind deconvolution method with a line kernel function [25]. The NT images became increasingly blurred in the horizontal direction as the speed of the circle dot increased, whereas the IT images remained almost entirely free of blurring regardless of the speed. Figure 13 shows the \(227\times 227\) images cropped from the \(512 \times 512\) input images when the circle dot moved in the oblique direction. It can be seen that frame-by-frame intermittent tracking achieves motion-blur-free video shooting of the object moving in the oblique direction, as well as in the horizontal direction; the NT images are blurred in the \(20{^\circ }\) oblique direction, whereas the IT images are free of blur at all the slider speeds. In the MD images, most of the motion blur was remarkably reduced, whereas certain ghost errors remained along the moving directions, especially when the circle dot moved by dozens of pixels while the camera shutter was open. This is because it is difficult for deconvolution-based methods to completely remove large motion blur in images with nonlinear brightness, such as zero-valued or saturated pixels.
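The exact deblurring method of [25] is not reproduced here; as a stand-in, the following sketch applies non-blind Richardson–Lucy deconvolution with a line point-spread function whose length equals the apparent motion during the exposure. The function names, iteration count, and kernel construction are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import rotate
from skimage import restoration

def deblur_line_kernel(image_u8, blur_length_px, angle_deg=0.0, iterations=30):
    """Non-blind deconvolution with a line point-spread function, standing in
    for the method of [25].  blur_length_px is the apparent motion during the
    exposure; angle_deg is the motion direction (e.g. 20 for oblique motion)."""
    length = max(int(round(blur_length_px)), 3)
    psf = np.zeros((length, length), dtype=float)
    psf[length // 2, :] = 1.0                     # horizontal line kernel
    if angle_deg != 0.0:
        psf = rotate(psf, angle_deg, reshape=False, order=1)
    psf /= psf.sum()
    img = image_u8.astype(float) / 255.0          # Richardson-Lucy expects [0, 1]
    return restoration.richardson_lucy(img, psf, iterations)
```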
To evaluate the degree of motion blur of the observed circle dot, the index \(\Delta \lambda =\lambda _+-\lambda _-\) was introduced; \(\lambda _+\) and \(\lambda _-\) represent the lengths of the major and minor axes of the approximated ellipse of the circle dot in the image. The index \(\Delta \lambda\) increases as the motion blur increases, and is zero when the dot appears as a perfect circle in the image. \(\lambda _+\) and \(\lambda _-\) were estimated offline by calculating the zero-, first-, and second-order moment features of the circle-dot region in the image, which was extracted by binarization with a threshold of 63. Considering the offset \(\Delta \lambda _0=2.6\) pixel when no motion is present, the blur index \(\Delta \lambda '=\Delta \lambda - \Delta \lambda _0\) was evaluated for the IT and NT images in Figs. 11 and 13. Figure 14 shows the relationship between the speed of a circle dot and its blur index \(\Delta \lambda '\) for the IT and NT images; for each speed, \(\Delta \lambda '\) was averaged over 50 selected images. The blur index \(\Delta \lambda '\) for the IT images was remarkably low at all the speeds as compared with that for the NT images, although it became larger as the speed of the circle dot increased. When the circle dot moved in the horizontal direction, the blur index \(\Delta \lambda '\) for the IT images was 0.9, 2.3, 3.1, and 2.7 pixel at 250, 500, 750, and 1000 mm/s, respectively; this corresponds to 13.3, 14.3, 11.5, and 7.4% of the respective values of \(\Delta \lambda '\) for the NT images. When the circle dot moved in the oblique direction, \(\Delta \lambda '\) for the IT images was 0.1, 1.9, 2.6, and 2.3 pixel at 250, 500, 750, and 1000 mm/s, respectively; this corresponds to 1.6, 14.9, 12.2, and 7.5% of the respective values of \(\Delta \lambda '\) for the NT images. In the experiment, the speed of the circle dot was 1 m/s or less, which is considerably lower than the maximum motion-blur-free speeds of 5.21 m/s in the horizontal direction and 3.86 m/s in the vertical direction. Our frame-by-frame intermittent tracking method noticeably reduced the motion blur of circle dots moving at all the tested speeds in video shooting with an exposure time of 4 ms, whereas slight motion blur remained in the IT images because of the nonlinear ripple deviations on the trajectory of the mirror-drive 2-DOF active vision system.
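The blur index can be computed from the zero-, first-, and second-order moments of the binarized dot region as sketched below; here the axis lengths are taken as those of the equivalent ellipse (four times the square root of the covariance eigenvalues), which may differ from the paper's normalization by a constant factor.

```python
import numpy as np

def blur_index(gray, threshold=63, offset=2.6):
    """Blur index  dlam' = (lam+ - lam-) - dlam0  of the circle-dot region,
    computed from zero-, first-, and second-order image moments.  The dot is
    assumed darker than the threshold; the axis lengths are those of the
    equivalent ellipse (4 * sqrt of the covariance eigenvalues)."""
    ys, xs = np.nonzero(gray < threshold)         # dark dot on white paper
    xc, yc = xs.mean(), ys.mean()                 # centroid from low-order moments
    mu20 = np.mean((xs - xc) ** 2)                # central second-order moments
    mu02 = np.mean((ys - yc) ** 2)
    mu11 = np.mean((xs - xc) * (ys - yc))
    eig = np.linalg.eigvalsh([[mu20, mu11], [mu11, mu02]])   # ascending order
    lam_minus, lam_plus = 4.0 * np.sqrt(eig)      # minor / major axis lengths
    return (lam_plus - lam_minus) - offset
```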
Table tennis ball motion at constant speeds
Next, we conducted video shooting experiments on fast-moving table tennis balls launched by a table tennis machine to evaluate motion blur when the speed of the observed object exceeds the maximum motion-blur-free speed of our mirror-drive 2-DOF active vision system. Figure 15 shows (a) an overview of the experimental environment, and (b) the 40-mm-diameter table tennis balls that were observed. The table tennis machine (TSP Hyper S-2, Yamato Takkyu Co., Japan) was installed 4350 mm in front of the mirror-drive 2-DOF active vision system, and a table tennis ball (plain) was launched in (1) the horizontal direction and (2) an oblique direction with an inclination of 20\({^\circ }\) at constant speeds of 3, 4, 5, 6, and 7 m/s. In the experiment, a CCTV lens of \(f= 75\) mm was used with a 1.5\({\times }\) extender; a \(200\times 200\) mm area on a plane at a distance of 4350 mm corresponded to an image region of \(512\times 512\) pixels, and 0.391 mm corresponded to one pixel. When observing an object moving fast on a plane 4350 mm in front of the mirror-drive 2-DOF active vision system, the maximum motion-blur-free speeds were 5.21 m/s in the horizontal direction and 3.86 m/s in the vertical direction, corresponding to apparent motions of 52.2 and 38.6 pixels during the exposure time of 4 ms.
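As a rough consistency check of the two lens configurations, a pinhole (thin-lens) approximation relates the object-plane scale to the focal length and distance; the sketch below is only an approximate sanity check, since the paper does not state the sensor pixel pitch.

```python
# Rough consistency check of the two lens setups under a pinhole
# approximation (the sensor pixel pitch is not stated in the paper).

def mm_per_pixel(object_area_mm, image_pixels=512):
    return object_area_mm / image_pixels

def sensor_um_per_pixel(object_area_mm, focal_mm, distance_mm=4350.0,
                        image_pixels=512):
    """Approximate pixel pitch implied by the imaged area and focal length."""
    return object_area_mm * focal_mm / distance_mm / image_pixels * 1000.0

print(mm_per_pixel(35.0))                        # ~6.84e-2 mm/pixel (f = 650 mm)
print(sensor_um_per_pixel(35.0, 650.0))          # ~10.2 um/pixel
print(mm_per_pixel(200.0))                       # ~0.391 mm/pixel (f = 75 mm, 1.5x)
print(sensor_um_per_pixel(200.0, 75.0 * 1.5))    # ~10.1 um/pixel
```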
Figures 16 and 17 show the \(227\times 227\) images cropped from the \(512\times 512\) input images [(a) IT images, (b) NT images] so that the table tennis ball is located at their centers when it was thrown in the horizontal and oblique directions, respectively. For comparison with the input images captured when a table tennis ball was thrown at 3, 4, 5, 6, and 7 m/s, the input image of a motionless table tennis ball (0 m/s) is also shown. The threshold for binarization in frame-by-frame intermittent tracking was \(I_B=50\). It can be seen that the IT images remained almost blur-free, regardless of the speed, and were similar to the input images captured when the ball speed was 0 m/s, whereas the motion blur of the table tennis balls in the NT images increased in their moving directions as their speed increased.
Figure 18 shows the relationship between the speed of a table tennis ball and its blur index \(\Delta \lambda '\) for the IT and NT images. Considering the offset \(\Delta \lambda _0=1.22\) pixel in the case of no motion, \(\Delta \lambda '\) was averaged over 50 selected images, which were binarized with a threshold of 55, in a manner similar to the circle-dot experiment. As compared with the blur index \(\Delta \lambda '\) for the NT images, that for the IT images was remarkably low at all the speeds in the horizontal and oblique directions. The blur index \(\Delta \lambda '\) for the IT images at 3, 4, 5, 6, and 7 m/s in the horizontal direction was 0.05, 0.26, 0.20, 0.70, and 2.20 pixel, respectively, which corresponds to 1.1, 2.6, 1.1, 2.7, and 6.5% of the respective values of \(\Delta \lambda '\) for the NT images. The blur index \(\Delta \lambda '\) for the IT images at 3, 4, 5, 6, and 7 m/s in the oblique direction was 0.05, 0.42, 0.71, 1.63, and 2.33 pixel, respectively, which corresponds to 0.8, 3.2, 3.8, 6.4, and 6.8% of the respective values of \(\Delta \lambda '\) for the NT images. The blur index \(\Delta \lambda '\) for the IT images showed a tendency to increase slightly when shooting a table tennis ball thrown at 6 and 7 m/s in the horizontal and oblique directions. This is mainly because the speed of the table tennis ball was so much higher than the maximum motion-blur-free speed (5.21 m/s in the horizontal direction, 3.86 m/s in the vertical direction) that the moving distance during the 4-ms exposure time exceeded the upper limit of the movable range of the mirror-drive 2-DOF active vision system.
Table tennis ball motion at variable speeds
Next, we show the experimental results of video shooting of a table tennis ball identical to that used in the previous subsection, when the ball was launched alternately from the table tennis machine at speeds of 3 and 5 m/s at intervals of 0.5 s. Figure 19a shows the 2-s temporal changes of the estimated speed and blur index \(\Delta \lambda '\) for the IT images with frame-by-frame intermittent tracking, as compared with (b) those for the NT images, captured when the table tennis balls passed in front of the mirror-drive 2-DOF active vision system in a similar manner. Corresponding to the launching interval of the table tennis balls and their passing time over the whole \(512\times 512\) pixel image region, the ball speeds were estimated discontinuously in time; the passing times were 66.7 and 40.0 ms when a table tennis ball was thrown at 3 and 5 m/s, respectively, corresponding to the durations needed to capture eight and five frame images at 125 fps.
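The passing times and frame counts quoted here follow from the 200-mm field of view on the object plane and the 8-ms frame interval; a minimal computation is sketched below.

```python
# Passing time of a ball over the 200-mm field of view on the object plane
# and the corresponding number of frames at 125 fps (8-ms frame interval).

FIELD_MM = 200.0
FRAME_S = 1.0 / 125.0

def passing_time_s(speed_m_per_s):
    return FIELD_MM / (speed_m_per_s * 1000.0)

for v in (3.0, 5.0, 8.0):
    t = passing_time_s(v)
    print(f"{v} m/s: {t * 1e3:.1f} ms, ~{t / FRAME_S:.1f} frames")
# 3 m/s -> 66.7 ms (~8 frames), 5 m/s -> 40 ms (5 frames), 8 m/s -> 25 ms (~3 frames)
```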
It can be seen that the ball speed was estimated as a pulse wave, in which a 3-m/s-amplitude pulse of 66.7-ms width and a 5-m/s-amplitude pulse of 40-ms width appear alternately at intervals of 0.5 s, and that the blur index \(\Delta \lambda '\) for the NT images also alternated between 7 and 20 pixels. The blur index \(\Delta \lambda '\) for the IT images rose to similarly large values of 7 and 20 pixels at the instant a table tennis ball thrown at 3 or 5 m/s appeared in the image, whereas it was remarkably reduced to around 1 pixel within dozens of milliseconds after the ball's appearance; this latency corresponds to the duration needed to capture two frame images at 125 fps. The latency in motion blur reduction is caused mainly by (1) the time delay in frame-by-frame intermittent tracking, comprising a one-frame delay in estimating the ball speed from image features computed in the previous frame and a one-frame delay in reflecting it in the pan-tilt actuation of the mirror-drive 2-DOF active vision system, and (2) the underestimated speed at the instant the table tennis ball enters the camera's field of view, because the ball is only partially visible at the right side of the image.
Figure 20 shows (a) a sequence of images with frame-by-frame intermittent tracking and (b) a sequence of images without tracking, taken at intervals of 16 ms, when a table tennis ball with printed patterns, as illustrated in Fig. 15b, was thrown at 3 m/s in the horizontal direction and passed over the whole image region from right to left; the upper images are the \(512\times 512\) input images, and the lower ones are the \(132\times 132\) images cropped from them so that the table tennis ball is located at their centers. It can be seen that the NT images are too heavily blurred to allow recognition of the letter patterns printed on the table tennis ball in all the frames. For the IT images, the input image was largely blurred in the first frame, when the table tennis ball appeared at the right side of the image, whereas the blurring in all the remaining frames was reduced to the extent that the letter pattern "hello, world!" at the center of the table tennis ball could always be recognized.
Nevertheless, a two-frame delay remains in frame-by-frame intermittent tracking for motion blur reduction. Our system can capture less-blurred input images with a delay of dozens of milliseconds for a table tennis ball thrown at 8.0 m/s or less; its passing time over the whole \(512\times 512\) pixel image region is longer than 24 ms, which is sufficient for capturing three frame images at 125 fps.