
Fig. 3

From: 3D pointing gestures as target selection tools: guiding monocular UAVs during window selection in an outdoor environment

a Scenario overview. There are four essential elements: the user, the targets, the camera, and the processing unit. The user interacts with the camera by pointing towards the desired target. The targets are visual objects, for example a window, that the processing unit has been trained to recognize using object detection algorithms. The processing unit is any computational system that processes the images from the camera and sends signals to the UAV. The camera provides the image feed from the moving UAV. b Pose output format: BODY-25. c Left side: points used from the BODY-25 model. Right side: intuition behind the depth estimation of keypoints. When pointing backwards (transverse and sagittal planes), the size of the user’s arm on the image projection (\(d_{c12} + d_{c23} + d_{c34}\)) is deformed by perspective. In this scenario, \(d_{c12} + d_{c23} + d_{c34}\) is smaller than \(\left( d_{f12} \times \frac{d_{c18}}{d_{f18}}\right) + \left( d_{f23} \times \frac{d_{c18}}{d_{f18}}\right) + \left( d_{f34} \times \frac{d_{c18}}{d_{f18}}\right)\), which is the expected size of the arm when the user is pointing sideways (arms on the frontal plane). Please refer to Eqs. 8, 9, 10. Using a right-angle triangle intuition (see the three grey right-angle triangles in the image), we can estimate the depth change between \(P_{1}^z\) and \(P_{2}^z\), \(P_{3}^z\), \(P_{4}^z\).
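A minimal sketch of this foreshortening intuition, assuming BODY-25 pixel distances are already available. The function names, distance values, and keypoint pairing in the example are hypothetical and only illustrate the ratio rescaling and the right-angle-triangle relation described in the caption (cf. Eqs. 8, 9, 10); they are not the paper's implementation.

```python
import math

# Hypothetical illustration of the Fig. 3c intuition (names/values are assumptions).
# Indices follow BODY-25: 1 = neck, 2 = shoulder, 3 = elbow, 4 = wrist, 8 = mid-hip.
# d_c.. = distances measured on the current image projection (pixels),
# d_f.. = reference distances recorded with the arm on the frontal plane.

def expected_segment(d_f_seg: float, d_c18: float, d_f18: float) -> float:
    """Length the segment would have in the current image if it stayed on the
    frontal plane, rescaled by the torso ratio d_c18 / d_f18 (cf. Eqs. 8-10)."""
    return d_f_seg * (d_c18 / d_f18)

def depth_change(d_c_seg: float, d_exp_seg: float) -> float:
    """Right-angle-triangle intuition: the projected segment is the adjacent
    side, the expected (undeformed) length is the hypotenuse, so the depth
    difference between the segment's endpoints is the opposite side."""
    adjacent = min(d_c_seg, d_exp_seg)   # guard against pixel noise
    return math.sqrt(d_exp_seg**2 - adjacent**2)

# Example: the shoulder-to-elbow segment appears shorter than expected, so the
# elbow lies at a different depth than the shoulder (contributing to P3^z vs P2^z).
d_exp_23 = expected_segment(d_f_seg=35.0, d_c18=120.0, d_f18=100.0)  # 42.0 px
dz_23 = depth_change(d_c_seg=30.0, d_exp_seg=d_exp_23)               # ~29.4 px
```

Under these assumptions, applying the same relation to segments 1-2, 2-3, and 3-4 and accumulating the per-segment depth changes gives the relative depths of \(P_{2}^z\), \(P_{3}^z\), and \(P_{4}^z\) with respect to \(P_{1}^z\).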
