
Automated harvesting by a dual-arm fruit harvesting robot


In this study, we propose a method to automate fruit harvesting with a fruit harvesting robot equipped with robotic arms. Given the future growth of the world population, food shortages are expected to accelerate. Since Japan depends on imports for much of its food, it is expected to be greatly affected by this upcoming food shortage. In recent years, the number of agricultural workers in Japan has been decreasing and the farming population is aging. As a result, there is a need to automate and reduce labor in agricultural work using agricultural machinery. In particular, fruit cultivation requires a lot of manual labor due to the variety of orchard conditions and tree shapes, causing mechanization and automation to lag behind. In this study, a dual-arm fruit harvesting robot was designed and fabricated to reach most of the fruits on a joint V-shaped trellis that was cultivated and adjusted for the robot. To harvest the fruit, the robot uses sensors and computer vision to detect the fruit and estimate its position, and then inserts an end-effector into the lower part of the fruit. During this process, the arm may collide with the robot itself or with other fruits, depending on the position of the fruit to be harvested. In this study, inverse kinematics and a fast path planning method using random sampling are used to harvest fruits with the robot arms. This method makes it possible to control the robot arms without interfering with the fruit or the other robot arm by treating them as obstacles. Experiments showed that these methods can be used to detect pears and apples outdoors and automatically harvest them using the robot arms.


In recent years, various food-related issues have arisen around the world. According to statistics from the United Nations, the world’s population reached 7.7 billion in mid-2019. The global population is expected to grow to around 8.5 billion in 2030, 9.7 billion in 2050, and 10.9 billion in 2100 [1]. According to another statistic from the Food and Agriculture Organization of the United Nations, there are more than 800 million undernourished people in the world, and food shortages are expected to accelerate as the population grows [2]. Looking at Japan as an example, the calorie-based food self-sufficiency rate in 2020 was 37%, and Japan relies on imports for most of its food [3]. Therefore, the country will be greatly affected by the food shortages that are expected to occur in the future. The number of agricultural workers in Japan decreased by 394,000, from 1,757,000 to 1,363,000, over the five years from 2015 to 2020. Furthermore, the percentage of workers aged 65 and over increased from 64.8 to 69.5%, indicating that the number of agricultural workers is decreasing and the aging of the farming population is becoming more serious [4].

Fruit cultivation requires more labor compared to other crops. This is because many tasks such as pollination, fruit picking, fruit set management, and harvesting are done manually. In addition, the fact that orchards are located on a wide variety of terrain, from flat to sloping, and that each orchard and species of tree has a different shape, is one of the reasons why mechanization/automation has been very slow. In order to solve this problem, it is essential to use agricultural machinery and robots that can handle fruit management and harvesting. In particular, pear and apple cultivation require more labor time than other fruits, and account for a large portion of fruit cultivation in Japan. Therefore, this study aims to automatically harvest fruits (pears and apples) in an orchard using a harvest robot.

There are two major challenges in the automatic harvesting of fruits by robots: detection and localization of fruits using sensors and harvesting of the detected fruits by the robot. To detect fruits outdoors, we use an object detection method based on deep learning for RGB images to detect the location of fruits in the image. By using deep learning, we aim to stably detect fruits in the shadow of leaves or other fruits, or in an environment with changing light intensity. In addition, since it is difficult to identify the exact location of the fruit using only RGB images, we combine depth images to identify the fruits more accurately.

When harvesting fruits with robot arms, a robot arm may collide with the robot itself or with other fruits, depending on the position of the fruit to be harvested. In this study, inverse kinematics and a fast path planning method using random sampling are used to harvest fruits with the robot arms. This method makes it possible to control the robot arms without interfering with the fruit or the other robot arm by treating them as obstacles. The fruit is harvested by grasping it with a fruit harvesting end-effector attached to the end of the robot arm and twisting it.

Harvesting robot

Outline of harvesting robot

Figure 1 shows the harvesting robot used in this study. The harvesting robot consists of four RGB-D cameras for detecting and locating fruits, two robot arms with end-effectors for harvesting, and a computer for controlling them. In this study, we used UR3 and UR5 robot arms manufactured by Universal Robots. The harvesting robot is equipped with two robot arms to increase work efficiency. The upper robot arm (UR5) harvests the fruit on the upper side of the tree, while the lower robot arm (UR3) harvests the fruit on the lower side. The robot is also designed to reach many of the target fruits, considering the robot arms’ operating range and fruit tree standards. In this study, we used Intel’s RealSense D435 as the RGB-D camera. As Fig. 1 shows, four RGB-D cameras were set up on the robot: two to look up at the fruit tree from directly below, one to look diagonally upward, and one to look directly to the side. By installing cameras so as to view the fruit tree from many directions, we tried to reduce as much as possible the number of fruits in blind spots hidden behind leaves and branches. As shown in Fig. 2, the end-effector grasps the fruit when it is close enough and automatically harvests the fruit by twisting it with the rotation of the end-effector. Fruits such as apples and pears are supported by only one peduncle. Therefore, even if the arrival point of the end-effector is slightly off-center from the fruit, the rotation of the end-effector and the spring force of the fingertips will cause the fruit to align with the center of the end-effector. The end-effector does not have a sensor to detect the completion of fruit harvesting. However, we have confirmed through experiments that most of the fruit can be harvested by rotating the fruit four times; therefore, this value is used in this study.

Fig. 1 Components of harvesting robot

Fig. 2 Harvesting with an end-effector

Target tree and fruit characteristics

In this study, pears (Hosui) and apples (Fuji) are to be harvested. A variety of cultivation methods are used in fruit cultivation, and the degree of branch protrusion and the position of fruits differ depending on the cultivation method. The purpose of this study is to automate harvesting with a fruit harvesting robot for the joint V-shaped trellis, which is a type of lined dense planting cultivation [5]. By making the surface on which the fruit grows flat, lined dense planting cultivation makes it possible for workers or robots to work with high efficiency. We have previously worked on a harvesting robot with a single 6-DOF arm for the joint V-shaped trellis [6]. Figure 3 shows an example of the joint V-shaped trellis.

Fig. 3 Joint V-shaped trellis

Related works

Various studies have focused on developing harvesting robots. These studies can be classified into two categories.

The first category mainly focuses on detection and localization of fruits using sensors. Lin et al. proposed a vision sensing algorithm that can detect guava fruits on trees and obtain useful 3D pose information with an RGB-D sensor [7]. They applied Euclidean clustering to identify individual fruits from the fruit point cloud corresponding to segmented fruits on the image and estimated the position of the fruit relative to its mother branch. Yu et al. proposed a localization algorithm to detect the picking point on strawberry stems with Rotational You Only Look Once (R-YOLO), which predicts the rotation of the bounding box of the fruit target [8]. Their harvesting robot measures the distance to the target fruit with a pair of laser beam sensors attached to the tip of the fingers of the robot instead of detecting the depth of the target fruit. Yoshida et al. proposed a method for detecting cutting points on tomato peduncles using an RGB-D camera mounted on a harvesting robot [9, 10]. In their approach, several types of region growings were used to construct a directed acyclic graph. Subsequently, they detected appropriate cutting points on the peduncles using the Mahalanobis distance, which is defined based on statistical information.

The second category focuses primarily on robotic systems that perform all tasks from recognition to harvesting. Irie et al. proposed an asparagus harvesting robot that measured whether the asparagus was tall enough to harvest using a 3D sensor [11, 12]. They also proposed a robotic arm mechanism and an end-effector to grasp and cut asparagus. Hayashi et al. proposed a strawberry-harvesting robot consisting of a cylindrical manipulator, end-effector, machine vision unit, storage unit, and traveling unit [13]. The end-effector of their robot was composed of a gripper for simultaneously grasping and cutting the peduncle of the fruit and a suction device for holding the fruit to avoid damage. Lili et al. proposed a tomato harvesting robot consisting of a four-wheel independent steering system, 5-DOF harvesting system, laser navigation system, and binocular stereo vision system [14]. The harvesting robot was designed for harvesting tomatoes in a greenhouse. Yaguchi et al. proposed a tomato fruit recognition method for the harvesting robot [15]. First, color-based point cloud extraction was applied to a 3D point cloud from a stereo camera. Second, distance-based clustering was applied to separate the candidate point cloud into tomato clusters. Thereafter, the harvesting robot inserts its end-effector into the fruit position, which is determined by sphere fitting using RANSAC [16]. Silwal et al. presented the design and field testing of a robotic system to harvest apples [17]. Their robotic system integrated a global camera set-up, 7-DOF manipulator, and three-fingered grasping end-effector to execute fruit picking with open-loop control. Based on the results of field studies, they showed that horticultural practices play a critical role in the performance of robotic fruit harvesting systems. Arad et al. proposed a robot for harvesting sweet pepper fruits in greenhouses [18, 19]. They proposed a Flash-No-Flash controlled illumination acquisition protocol to stabilize the effects of illumination for color-based detection algorithms. Their sweet pepper harvesting robot applies a visual servo that maintains the detected center of the fruit in a predetermined position in the camera image to lower the requirements for camera calibration and 3D coordinates.

Focusing on a harvesting robot that has dual arms, Ling et al. proposed a dual-arm harvesting robot for harvesting tomato in a greenhouse [20]. Their robot had 2 mirrored 3-DOF arms, a right arm for grasping and a left arm for detaching. Their proposed framework detected ripe tomatoes by an algorithm combining an AdaBoost classifier and color analysis using the RGB image. Then, the 3D position of a tomato object is obtained according to the relationship between 2D image pixel coordinate and 3D point cloud coordinate acquired from the stereo camera. Based on the 3D position, the vacuum cup-type end-effector of the robot grasped the target fruit and another end-effector cut the stem to harvest the fruit. Their dual-arm approach avoided movement of the tomato when cutting the stem.

Sepúlveda et al. proposed a dual-arm aubergine harvesting robot [21]. The robot consisted of two robot arms configured like humans to optimize the dual workspace. They applied the image segmentation algorithm based on a support vector machine pixel classifier, a watershed transform and a point cloud registration for detecting and localizing aubergines. Depending on the workspace and the locations of the fruits, the planning algorithm determined the movement which involved either the simultaneous harvesting of two pieces of fruit or harvesting a single fruit with a single arm. In addition, the planning algorithm determined a collaborative behavior between the arms if an aubergine was occluded by leaves.

These robots have robot arms attached to each of their shoulders across their torso like humans, and the possibility of collisions at the joints along the way is extremely low. On the other hand, the robot used in this study has dual robot arms attached to the same side, which increases the possibility of collision between the arms, and this is a problem that needs to be addressed. This study proposes a method to automate fruit harvesting using a robot equipped with arms without collision with the target fruit or the robot. In addition, we propose a method for locating and integrating fruits in an outdoor environment. Thus, this study belongs to the second category of harvesting robots.

Automatic fruit harvesting method

Figure 4 shows a flowchart of the sequence of automatic harvesting by the harvesting robot. Automatic fruit harvesting by a robot consists of five steps: (1) fruit detection, (2) fruit localization, (3) integration of information from each camera, (4) inverse kinematics, and (5) path planning.

Fig. 4 Flowchart of automatic fruit harvesting

First, RGB-D cameras that can simultaneously obtain RGB and depth information are used to detect and locate the fruits in the images. Deep learning is applied to the RGB image acquired from each RGB-D camera to detect the position of fruits in the image. Next, the 3D positions of the fruits are identified by combining the positions of the fruits in the RGB image with the depth image.

In order to reduce the number of fruits that cannot be seen by the cameras because of occlusion by leaves or other fruits, the robot is equipped with multiple cameras. The next step is to integrate the information obtained from these multiple cameras. The order of the integrated information is random; therefore, it is replaced by an order suitable for harvesting. Here, the obtained fruit information includes the 3D position. However, in the case of an articulated robot arm, the information required for commands is the angle of each joint. Therefore, each joint angle is calculated from the 3D position and posture of the end-effector using inverse kinematics. Next, the path is planned to reach the joint angles calculated by inverse kinematics, and the harvesting motion is performed.

Fruit detection

First, RGB images are acquired from the RGB-D cameras mounted on the robot, and the fruits in the images are detected. It is necessary to combine information such as color and texture in order to achieve sufficient accuracy. In this study, we apply the Single Shot MultiBox Detector (SSD), an object detection algorithm proposed by Liu et al. that detects objects in images using a single neural network, to detect fruits in images [22]. Other methods for object detection include Faster R-CNN [23] and You Only Look Once [24]. In this study, SSD is used because both speed and accuracy are important. All the information about the detected bounding boxes \(\varvec{D}\) is obtained from the results of the detection of the fruit in the image by SSD.

The bounding box information \(\varvec{D}\) is shown in Eq. (1). Each bounding box \(\delta \), shown in Eq. (2), consists of the pixel coordinates \((x_{min},~y_{min})\) of the upper left corner of the box and the pixel coordinates \((x_{max},~y_{max})\) of the lower right corner of the box.

$$\begin{aligned} \varvec{D}&= \left[ \begin{array}{ccc}\delta _1&\cdots&\delta _n\end{array}\right] \end{aligned}$$
$$\begin{aligned} \delta&= \left[ \begin{array}{ccccc}x_{min}&x_{max}&y_{min}&y_{max}\end{array}\right] ^T \end{aligned}$$

Fruit localization

In the next step, the fruit is considered as a sphere, and the coordinates and radius of the sphere are estimated from the bounding box information \(\delta \) obtained in the previous section, RGB image, and depth image. Considering that the spherical shape of the fruit is projected onto the 2D image, the circular shape of the fruit is detected from the bounding box detected by SSD. In this study, we use the Hough transform for circle detection to detect fruit circles from RGB images [25].

The relationship between the point \((X_{camX},~Y_{camX},~Z_{camX})\) in 3D space and the point (x,  y) in 2D image can be expressed by Eq. (3), where \(f_x,~f_y\) are the focal lengths of the camera, \(c_x,~c_y\) are the image centers of the camera, and d is the depth information at the point (x,  y) obtained from the depth image.

$$\begin{aligned} d \left[ \begin{array}{ccc} x \\ y \\ 1 \end{array} \right] = \left[ \begin{array}{ccc} f_x &{} 0 &{} c_x \\ 0 &{} f_y &{} c_y\\ 0 &{} 0 &{} 1 \end{array} \right] \left[ \begin{array}{ccc} X_{camX} \\ Y_{camX} \\ Z_{camX} \end{array} \right] \end{aligned}$$
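As an illustration of how Eq. (3) can be inverted, the following Python sketch back-projects a pixel with a known depth into the camera coordinate system. It is a minimal example rather than the implementation used in this study; the function name and the intrinsic values in the usage lines are hypothetical.

```python
def pixel_to_camera(x, y, d, fx, fy, cx, cy):
    """Invert Eq. (3): return (X, Y, Z) in the camera frame for pixel (x, y) at depth d."""
    if d <= 0:
        raise ValueError("depth must be positive")
    # From Eq. (3): d*x = fx*X + cx*Z and d*y = fy*Y + cy*Z, with Z = d.
    X = (x - cx) * d / fx
    Y = (y - cy) * d / fy
    Z = d
    return X, Y, Z


# Hypothetical intrinsics for a 640x480 image (not calibration values from this study).
point = pixel_to_camera(x=420.0, y=240.0, d=0.8, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

Pixels with missing or zero depth are rejected here; in practice such pixels would simply be skipped before the sphere is estimated.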

The center coordinates \((X_{camX},~Y_{camX},~Z_{camX})\) of the sphere and the radius R of the sphere can be obtained by fitting the equation of a sphere to the 3D points of the fruit with the least squares method.
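One common way to set up such a least-squares sphere fit is to rewrite the sphere equation as a linear system: \(x^2+y^2+z^2 = 2ax + 2by + 2cz + k\) with \(k = R^2 - a^2 - b^2 - c^2\), where (a, b, c) is the center. The sketch below solves the normal equations with plain Python; this linearization and the helper names are assumptions made for illustration, since the exact formulation used in this study is not given.

```python
import math


def solve_linear(m, v):
    """Solve m u = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate points: the sphere is not determined")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    u = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][j] * u[j] for j in range(i + 1, n))
        u[i] = s / a[i][i]
    return u


def fit_sphere(points):
    """Least-squares sphere fit to 3D points; returns ((a, b, c), radius)."""
    rows = [[2 * x, 2 * y, 2 * z, 1.0] for x, y, z in points]
    rhs = [x * x + y * y + z * z for x, y, z in points]
    # Normal equations (A^T A) u = A^T b for u = (a, b, c, k).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(4)]
    a, b, c, k = solve_linear(ata, atb)
    r_sq = k + a * a + b * b + c * c
    if r_sq < 0:
        raise ValueError("fit produced a negative squared radius")
    return (a, b, c), math.sqrt(r_sq)
```

At least four non-coplanar points are needed. Because a depth camera only sees the near side of the fruit, the fit is less well conditioned in practice than with points spread over the whole sphere.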

The next step involves performing a coordinate transformation of the fruit. The coordinates of the fruit obtained from Eq. (3) are in the camera coordinate system. However, the position in the robot arm coordinate system is required in order to give commands to the robot arm. Therefore the rotation and translation matrices \(\varvec{T}\) between each robot arm and the camera coordinate system, which were obtained in calibration beforehand, are used to perform coordinate transforms to the robot arm coordinates as shown in Eq. (4).

$$\begin{aligned} {\left[ \begin{array}{cc} X_{arm} \\ Y_{arm} \\ Z_{arm} \\ 1 \end{array} \right] } = \varvec{T} {\left[ \begin{array}{cc} X_{camX} \\ Y_{camX} \\ Z_{camX} \\ 1 \end{array} \right] } \end{aligned}$$
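The coordinate transformation of Eq. (4) amounts to one matrix-vector product in homogeneous coordinates. The following sketch shows this with nested lists; the matrix `T_example` is a made-up calibration result used only for illustration.

```python
def transform_point(T, p_cam):
    """Apply a 4x4 homogeneous transform T to a 3D point, as in Eq. (4)."""
    x, y, z = p_cam
    v = (x, y, z, 1.0)
    out = [sum(T[i][j] * v[j] for j in range(4)) for i in range(4)]
    return tuple(out[:3])


# Hypothetical calibration: 90-degree rotation about Z plus a translation (in metres).
T_example = [
    [0.0, -1.0, 0.0, 0.10],
    [1.0,  0.0, 0.0, 0.20],
    [0.0,  0.0, 1.0, 0.30],
    [0.0,  0.0, 0.0, 1.00],
]
p_arm = transform_point(T_example, (0.5, 0.0, 0.8))
```

Since the robot has two arms and four cameras, one such matrix would be needed for each camera and arm pair.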

Finally, we use Eq. (5) to compute a score s, the reciprocal of the squared pixel distance from the center of the detected circle to the center of the image, where w is the width of the image and h is the height of the image. In Eq. (5), \(c_x,~c_y\) denote the pixel coordinates of the center of the detected circle, not the camera image centers of Eq. (3). Fruits at the edge of the image may cause large deviations in sphere detection due to insufficient RGB and depth information. Therefore, by using this index when integrating the information of each camera, the coordinates from the camera in which the fruit appears nearest to the center of the image can be used preferentially.

$$\begin{aligned} s = \frac{1}{(w/2 - c_x)^2 + (h/2 - c_y)^2} \end{aligned}$$
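A short sketch of the score in Eq. (5) follows. It takes \((c_x, c_y)\) to be the center of the detected circle in pixels, which is our reading of the text, and it adds a small `eps` guard against division by zero that is not part of Eq. (5).

```python
def center_score(cx, cy, w, h, eps=1e-9):
    """Score of Eq. (5): larger when the circle center (cx, cy) is nearer the image center."""
    dist_sq = (w / 2 - cx) ** 2 + (h / 2 - cy) ** 2
    # eps is an added safeguard (assumption) for a circle exactly at the image center.
    return 1.0 / max(dist_sq, eps)


# A fruit near the center of a 640x480 image scores higher than one near the edge.
s_center = center_score(330.0, 245.0, 640, 480)
s_edge = center_score(620.0, 40.0, 640, 480)
```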

Thus, the information \(\varvec{L}\) for all fruits can be obtained by using Eq. (4) and Eq. (5). Equation (6) shows the information \(\varvec{L}\) for all fruits.

The information \(\sigma \) of one fruit consists of the position \((X_{arm},~Y_{arm},~Z_{arm})\) of the fruit in the robot arm coordinate system, the radius R of the fruit, and the score s, as shown in Eq. (7).

$$\begin{aligned} \varvec{L}&= \left[ \begin{array}{ccc}\sigma _1&\cdots&\sigma _n\end{array}\right] \end{aligned}$$
$$\begin{aligned} \sigma&= \left[ \begin{array}{ccccc}X_{arm}&Y_{arm}&Z_{arm}&R&s\end{array}\right] ^T \end{aligned}$$

Integration of fruit information

In this section, we integrate the fruit information \(\varvec{L}_{X}\) obtained from each camera. By applying Algorithm 1 to all \(\varvec{L}_{X}\), we can find \(\varvec{L}_{All}\), which matches the actual number of fruits without duplication of identical fruits. First, Algorithm 1 extracts an element \(\sigma _{detect}\) of \(\varvec{L}_{X}\) and an element \(\sigma _{target}\) of \(\varvec{L}_{All}\). Next, in the seventh line, if the distance between the two elements is less than 0.05 [m], they are considered to be the same object. We set the threshold to 0.05 [m] since the radius of the fruit is roughly 0.05 [m]. If the score of \(\sigma _{detect}\) is greater than the score of \(\sigma _{target}\) in line 9, \(\sigma _{target}\) in \(\varvec{L}_{All}\) is replaced with \(\sigma _{detect}\) and the iteration ends. This operation makes it possible to prioritize the use of coordinates close to the center of the image for the same fruit. If only the condition in line 7 is satisfied, the iteration is terminated without any processing. If no element in \(\varvec{L}_{All}\) satisfies the condition in line 7, \(\sigma _{detect}\) is added to \(\varvec{L}_{All}\) as a new harvest target. This process is repeated for all elements of \(\varvec{L}_{All}\) and \(\varvec{L}_{X}\) of all cameras to obtain \(\varvec{L}_{All}\) of all the detected harvest targets.

Algorithm 1
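The integration procedure described above can be sketched in Python as follows. Each fruit is represented as a tuple (X, Y, Z, R, s); the function name and the sample values are hypothetical, and the line numbering of Algorithm 1 is not reproduced.

```python
import math


def merge_detections(l_all, l_x, threshold=0.05):
    """Merge the detections l_x of one camera into l_all (lists of (X, Y, Z, R, s))."""
    for detect in l_x:
        matched = False
        for i, target in enumerate(l_all):
            # Detections closer than the threshold [m] are treated as the same fruit.
            if math.dist(detect[:3], target[:3]) < threshold:
                matched = True
                # Keep the detection whose circle lies nearer the image center.
                if detect[4] > target[4]:
                    l_all[i] = detect
                break
        if not matched:
            l_all.append(detect)  # new harvest target
    return l_all


# Hypothetical detections from two cameras; the first fruit is seen by both.
cam1 = [(0.10, 0.20, 0.50, 0.05, 1e-4)]
cam2 = [(0.11, 0.21, 0.51, 0.05, 5e-4), (0.40, 0.20, 0.60, 0.05, 2e-4)]
l_all = []
for detections in (cam1, cam2):
    merge_detections(l_all, detections)
```

After both cameras are processed, `l_all` holds two fruits, and the shared fruit carries the coordinates from the second camera because its score is higher.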

Next, the proposed method rearranges \(\varvec{L}_{All}\) into the order in which the fruits will be harvested to avoid robot collisions. The robot arm approaches from the direction where the X coordinate becomes negative due to the configuration of the robot. Therefore, we prioritize the harvesting of fruits in order from the negative X-coordinate, so that fruits not targeted for harvesting do not become obstacles during path planning. First, \(\varvec{L}_{All}\) is sorted in ascending order based on the value of X, so that the fruit with the most negative X comes first. If the distance between the X coordinates of different fruits \(\sigma _{i}\) and \(\sigma _{i+1}\) on the same branch is less than a threshold, the lower fruit will be harvested first. This means that the sorting based on the Z-axis direction is done only between \(\sigma _{i}\) and \(\sigma _{i+1}\). Let \(\varvec{L}\) obtained after these permutations be \(\varvec{L_{AllSorted}}\). In this study, quick sort, which is practical and fast, was used because fast processing leads to shorter harvesting time.
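A possible sketch of this reordering is shown below. It rests on several assumptions that the text does not fix: "in order from the negative X-coordinate" is read as ascending X, the Z-axis is taken to point upward so that the lower fruit has the smaller Z, the threshold value is a placeholder, and the "same branch" condition is not modeled. Python's built-in `sorted` (Timsort) stands in for quick sort.

```python
def harvest_order(l_all, x_threshold=0.05):
    """Order fruits (X, Y, Z, R, s) for harvesting; x_threshold [m] is a placeholder."""
    ordered = sorted(l_all, key=lambda fruit: fruit[0])  # most negative X first
    for i in range(len(ordered) - 1):
        a, b = ordered[i], ordered[i + 1]
        # Neighbours with nearly equal X: harvest the lower one first (assumes Z is up).
        if abs(a[0] - b[0]) < x_threshold and a[2] > b[2]:
            ordered[i], ordered[i + 1] = b, a
    return ordered


fruits = [
    (0.30, 0.0, 0.50, 0.05, 1e-4),
    (-0.20, 0.0, 0.90, 0.05, 1e-4),
    (-0.18, 0.0, 0.60, 0.05, 1e-4),
]
ordered = harvest_order(fruits)
```

In this example the two fruits near X = -0.2 m are swapped so that the lower one (Z = 0.60 m) is harvested before the higher one.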

Inverse kinematics

In this step, each joint angle of the robot arm at an arbitrary position and posture is obtained using inverse kinematics [26] for each \(\varvec{L_{AllSorted}}\) obtained in the previous section. In order to harvest fruit with an end-effector attached to the end of the robot arm, the end-effector needs to be moved in its position \(\varvec{p}\) and posture \(\varvec{R}\) (\(\varvec{P}=(\varvec{p},\varvec{R})\)) as specified. In the case of the articulated robot arm used in this study, the position and posture \(\varvec{P}\) of the end-effector are determined by the angle \(\varvec{q}\) of each joint. Therefore, it is necessary to establish the relationship between the joint coordinate system representing the joint angles of the robot arm and the end-effector coordinate system representing the position and posture of the end-effector. The problem of determining the angle \(\varvec{q}\) of each joint from the position and posture \(\varvec{P}\) of the end-effector is called an inverse kinematics problem, and its solution is expressed using a nonlinear function \(\varvec{f}^{-1}\) as follows.

$$\begin{aligned} \varvec{q} = \varvec{f}^{-1}(\varvec{P}) \end{aligned}$$

The inverse kinematics problem is more difficult to solve than the forward kinematics problem because there may be several solutions \(\varvec{q}\) for a certain \(\varvec{P}\), or there may be no solution \(\varvec{q}\). The solution of the inverse kinematics problem can be roughly divided into the following two types.

1. It is found numerically using iterative algorithms.

2. It is obtained geometrically or algebraically using the features of the mechanism.

In this study, we use the latter approach, a geometric solution, with real-time use in mind.

The Denavit-Hartenberg notation is used to set up a coordinate system for each joint in order to obtain the robot’s end-effector position and posture.

Fig. 5 Denavit-Hartenberg parameters

As shown in Fig. 5, \(d_i\) is the distance between the links, \(\theta _i\) is the angle between the links, \(a_i\) is the length of the links, and \(\alpha _i\) is the torsion angle of the links. The homogeneous transformation matrix \({^{i-1}\varvec{T}_i}\) from coordinate system \(\Sigma _{i-1}\) to coordinate system \(\Sigma _i\) is expressed as in Eq. (9).

$$\begin{aligned} {^{i-1}\varvec{T}_i} = \begin{bmatrix} \cos \theta _i &{} -\sin \theta _i \cos \alpha _i&{} \sin \theta _i \sin \alpha _i &{} a_i \cos \theta _i\\ \sin \theta _i &{} \cos \theta _i \cos \alpha _i&{} -\cos \theta _i \sin \alpha _i &{} a_i \sin \theta _i\\ 0 &{} \sin \alpha _i &{} \cos \alpha _i &{} d_i \\ 0 &{} 0 &{} 0 &{} 1\\ \end{bmatrix} \end{aligned}$$
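The matrix of Eq. (9) can be written directly in code, and chaining it over the joints gives the forward kinematics. The sketch below uses plain nested lists; the two-link parameters in the usage lines are hypothetical and are not the values of Table 1.

```python
import math


def dh_transform(theta, d, a, alpha):
    """Homogeneous transform of Eq. (9) for one joint (angles in radians)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul4(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def forward_kinematics(joint_angles, dh_params):
    """Chain Eq. (9) over all joints; dh_params is a list of (d, a, alpha) per joint."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, (d, a, alpha) in zip(joint_angles, dh_params):
        T = matmul4(T, dh_transform(theta, d, a, alpha))
    return T


# Hypothetical planar two-link arm with link lengths 1.0 and 0.5.
T_end = forward_kinematics([math.pi / 2, 0.0], [(0.0, 1.0, 0.0), (0.0, 0.5, 0.0)])
```

With the first joint at 90 degrees and the second at 0, the end point of this example arm lies at approximately (0, 1.5, 0), which is a convenient sanity check for the matrix layout.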

The computations are performed in turn for the elements of \(\varvec{L_{AllSorted}}\) until a solution is obtained. When a solution is obtained, the information is used in the next step.

Inverse kinematics model for UR arms

Table 1 shows Denavit-Hartenberg parameters for robot arms (UR arms) used in this study.

Table 1 Denavit-Hartenberg parameters for UR arms

Equation (10) gives the position of the end-effector, and Eq. (11) gives its posture, where \(\phi \) is the rotation of the end-effector around the Z-axis (roll), \(\theta \) is the rotation of the end-effector around the Y-axis (pitch), and \(\psi \) is the rotation of the end-effector around the X-axis (yaw).

$$\begin{aligned} \varvec{p} =&\left[ \begin{array}{ccc} p_{x}&p_{y}&p_{z} \end{array} \right] = \left[ \begin{array}{ccc} X&Y&Z \end{array} \right] \end{aligned}$$
$$\begin{aligned} \varvec{R} =&\begin{bmatrix} R_{11} &{} R_{21} &{} R_{31} \\ R_{12} &{} R_{22} &{} R_{32} \\ R_{13} &{} R_{23} &{} R_{33}\\ \end{bmatrix}\nonumber \\ =&\begin{bmatrix} \cos {\phi } &{} -\sin {\phi } &{} 0 \\ \sin {\phi } &{} \cos {\phi } &{} 0 \\ 0 &{} 0 &{} 1\\ \end{bmatrix} \begin{bmatrix} \cos {\theta } &{} 0 &{} \sin {\theta } \\ 0 &{} 1 &{} 0 \\ -\sin {\theta } &{} 0 &{} \cos {\theta } \\ \end{bmatrix}\nonumber \\&\begin{bmatrix} 1 &{} 0 &{} 0 \\ 0 &{} \cos {\psi } &{} -\sin {\psi } \\ 0 &{} \sin {\psi } &{} \cos {\psi }\\ \end{bmatrix} \end{aligned}$$

The angles \(\theta _i(i=1,2,\dots ,6)\) of each joint in Eq. (10) and Eq. (11) of UR arms are as follows, where \(a_{i}\), \(d_{i}\) are the parameters for the Denavit-Hartenberg notation.

$$\begin{aligned} \theta _1&= \arctan {\left( \frac{p_{y} - d_{6} R_{23}}{p_{x} - d_{6} R_{13}}\right) } \nonumber \\&\quad \pm \arccos {\left( \frac{d_{4}}{\sqrt{{\left( p_{y} - d_{6} R_{23}\right) }^{2} + {\left( p_{x} - d_{6} R_{13}\right) }^{2}}}\right) } + \frac{\pi }{2} \end{aligned}$$
$$\begin{aligned} \theta _5 =&\pm \arccos {\left( \frac{p_{x}\sin {\theta _{1}} - p_{y}\cos {\theta _{1}} - d_{4}}{d_{6}}\right) } \end{aligned}$$
$$\begin{aligned} \theta _6 =&\arctan {\left( \frac{R_{22}\cos {\theta _{1}} - R_{12}\sin {\theta _{1}}}{R_{11}\sin {\theta _{1}} - R_{21}\cos {\theta _{1}}}\right) } \end{aligned}$$
$$\begin{aligned} x_{04x} =&-(R_{13}\cos {\theta _{1}} + R_{23}\sin {\theta _{1}})\sin {\theta _{5}} \nonumber \\&- ((R_{12}\cos {\theta _{1}} + R_{22}\sin {\theta _{1}})\sin {\theta _{6}} \nonumber \\&- (R_{11}\cos {\theta _{1}} + R_{21}\sin {\theta _{1}})\cos {\theta _{6}})\cos {\theta _{5}} \end{aligned}$$
$$\begin{aligned} x_{04y} =&(R_{31}\cos {\theta _{6}} - R_{32}\sin {\theta _{6}})\cos {\theta _{5}} - R_{33}\sin {\theta _{5}} \end{aligned}$$
$$\begin{aligned} p_{13x} \,=\,&d_{5} ((R_{11}\cos {\theta _{1}} + R_{21}\sin {\theta _{1}})\sin {\theta _{6}} \nonumber \\&+ (R_{12}\cos {\theta _{1}} + R_{22}\sin {\theta _{1}})\cos {\theta _{6}}) \nonumber \\&- d_{6} (R_{13}\cos {\theta _{1}} + R_{23}\sin {\theta _{1}}) \nonumber \\&+ p_{x}\cos {\theta _{1}} + p_{y}\sin {\theta _{1}} \end{aligned}$$
$$\begin{aligned} p_{13y} =&p_{z} - d_{1} - d_{6} R_{33} + d_{5} (R_{32}\cos {\theta _{6}} + R_{31}\sin {\theta _{6}}) \end{aligned}$$
$$\begin{aligned} \theta _{3} =&\arccos {\left( \frac{{p_{13x}}^{2} + {p_{13y}}^{2} - {a_{2}}^{2} - {a_{3}}^{2}}{2 a_{2} a_{3}}\right) } \end{aligned}$$
$$\begin{aligned} \theta _2 =&\arctan {\left( \frac{(a_{2} + a_{3}\cos {\theta _{3}}) p_{13y} \pm (a_{3}\sin {\theta _{3}}) p_{13x}}{(a_{2} + a_{3}\cos {\theta _{3}}) p_{13x} \mp (a_{3}\sin {\theta _{3}}) p_{13y}}\right) } \end{aligned}$$
$$\begin{aligned} \theta _4 =&\arctan {\left( \frac{x_{04y}\cos {(\theta _{2}+\theta _{3})} - x_{04x}\sin {(\theta _{2}+\theta _{3})}}{x_{04x}\cos {(\theta _{2}+\theta _{3})} + x_{04y} \sin {(\theta _{2}+\theta _{3})}}\right) } \end{aligned}$$
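The expressions for \(\theta _3\) and \(\theta _2\) follow the usual geometry of a planar two-link arm reaching the point \((p_{13x},~p_{13y})\). The sketch below illustrates that sub-problem in its standard textbook form and checks it with forward kinematics. It assumes positive link lengths and a generic sign convention, which may differ from the Denavit-Hartenberg conventions of the UR arms, and the numerical values are hypothetical.

```python
import math


def two_link_ik(p13x, p13y, a2, a3, elbow_sign=1):
    """Planar two-link inverse kinematics; returns (theta2, theta3) or None if unreachable."""
    c3 = (p13x ** 2 + p13y ** 2 - a2 ** 2 - a3 ** 2) / (2 * a2 * a3)
    if abs(c3) > 1.0:
        return None  # target outside the reachable annulus: no solution
    theta3 = elbow_sign * math.acos(c3)  # the two signs give the two elbow configurations
    theta2 = math.atan2(p13y, p13x) - math.atan2(
        a3 * math.sin(theta3), a2 + a3 * math.cos(theta3)
    )
    return theta2, theta3


def two_link_fk(theta2, theta3, a2, a3):
    """Forward kinematics of the same planar arm, used to verify the solution."""
    x = a2 * math.cos(theta2) + a3 * math.cos(theta2 + theta3)
    y = a2 * math.sin(theta2) + a3 * math.sin(theta2 + theta3)
    return x, y


# Hypothetical link lengths and target [m].
solution = two_link_ik(0.30, 0.20, a2=0.25, a3=0.20)
```

Both values of `elbow_sign` reproduce the same target point, which is one source of the multiple inverse kinematics solutions mentioned earlier.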

From the several solutions of inverse kinematics, we adopt the combination of joint angles that requires only a small amount of rotation at each joint of the robot arm and that allows the end-effector to assume a posture that grasps the fruit from directly below.
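One simple way to express the "small amount of rotation" criterion is to pick the candidate with the smallest total joint displacement from the current configuration. The sketch below does this; the sum of absolute angle differences is our assumption for the metric, and the grasp-from-below posture is assumed to be already encoded in the target posture given to inverse kinematics.

```python
import math


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def select_solution(solutions, q_current):
    """Return the joint-angle tuple closest to q_current, or None if there is no solution."""
    def travel(q):
        return sum(abs(wrap(a - b)) for a, b in zip(q, q_current))

    return min(solutions, key=travel) if solutions else None


# Two hypothetical inverse kinematics candidates for the same end-effector pose.
q_now = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
candidates = [
    (3.0, -1.0, 1.0, 0.0, 1.5, 0.0),
    (0.2, -0.8, 0.9, 0.1, 1.5, 0.0),
]
best = select_solution(candidates, q_now)
```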

Path planning

Our harvesting robot comprises dual robot arms attached on the same side of its body, which increases the possibility of collision between the arms. This is a problem that needs to be addressed. In this section, to avoid damaging the fruits and prevent the robot arm from colliding with the robot itself or the other arm, a path to the joint angles determined in the previous section is planned while avoiding obstacles such as the fruits and the robot itself. Figure 6 shows the robot’s perception of its own state and the location of the fruits, which is a prerequisite for path planning. In cases where the two robot arms might interfere with each other during harvesting, we give priority to the upper arm. If tree branches are set as obstacles, planning a path for harvesting becomes difficult because of the effect of noise in the point cloud. Therefore, branches are not treated as obstacles; as a countermeasure, if the robot grabs a tree branch, it judges after a certain period of time that the target cannot be harvested and gives up.

Fig. 6 Example of robot and fruits arrangement

Various methods of path planning have been proposed in the past. They can be broadly classified into two categories: methods based on given nodes in the target space, and methods based on a continuous space. In situations where there are obstacles or other people in the operating area, or where the environment is dynamically changing, it is difficult to provide specified nodes. Therefore, when considering path planning for autonomous vehicles and articulated robots, the latter approach is applied. Such methods include the potential method, PRM (probabilistic roadmap) [27], and RRT (rapidly-exploring random tree) [28].

PRM and RRT use random sampling to speed up the computation. Specifically, RRT is applicable to search in high-dimensional spaces because it does not require pre-processing and has a high ability to avoid local solutions. Therefore, several modification methods have been proposed to extend RRT to be applicable to path planning under various conditions. Among them, T-RRT (Transition-based RRT), which introduces an evaluation function that can be designed according to the situation into the routing procedure, has high versatility [29]. Therefore, in this study, T-RRT is used for automatic harvesting by robotic arms.

Algorithm 2 and Fig. 7 show the process of T-RRT. The first step is to create the surrounding environment CS for path planning. The next step is to set the evaluation function c() of path planning and a start point \(q_{init}\) and a goal point \(q_{goal}\). The fourth line initializes the search tree \(\tau \) with the starting point \(q_{init}\). The fifth line loops until the search tree \(\tau \) reaches the goal point \(q_{goal}\). The sixth line sets \(q_{rand}\), which does not contact with any obstacles in the search area. Collision detection here is performed based on a 3D model reflecting the actual robot joint angles and the detected fruit positions. The seventh line searches for the node \(q_{near}\) that is closest to \(q_{rand}\) in the search tree \(\tau \). The eighth line judges whether it is possible to connect \(q_{near}\) to \(q_{new}\) in the search tree \(\tau \). If no connection can be made, the algorithm returns to the beginning of the loop. The 11th line evaluates \(q_{new}\) and \(q_{near}\) by the evaluation function and finds the shortest connection and makes a decision to add \(q_{new}\) to the search tree \(\tau \).

Fig. 7 Transition-based RRT
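The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: configurations are 2D points in the unit square rather than joint angles, a single circular obstacle stands in for a fruit, and the names `t_rrt`, `is_free`, `cost`, `simple_transition_test`, the goal bias and all numeric values are hypothetical. The transition test used here is a simplified placeholder with a fixed acceptance probability.

```python
import math
import random


def t_rrt(q_init, q_goal, cost, is_free, transition_test,
          step=0.05, goal_tol=0.05, goal_bias=0.1, max_iter=20000, seed=0):
    """Grow a tree from q_init until a node lands within goal_tol of q_goal.

    Configurations are 2D points in the unit square (a toy stand-in for
    joint space). Returns the path as a list of points, or None on failure.
    """
    rng = random.Random(seed)
    nodes = [q_init]   # tree nodes
    parents = [None]   # parents[i] is the index of node i's parent
    for _ in range(max_iter):
        # Sample q_rand; occasionally sample the goal itself (assumed goal bias).
        q_rand = q_goal if rng.random() < goal_bias else (rng.random(), rng.random())
        if not is_free(q_rand):
            continue
        # Find the nearest tree node q_near (linear scan, fine for a small tree).
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], q_rand))
        q_near = nodes[i_near]
        d = math.dist(q_near, q_rand)
        if d == 0.0:
            continue
        # Move from q_near toward q_rand by at most one step to obtain q_new.
        s = min(step, d) / d
        q_new = (q_near[0] + s * (q_rand[0] - q_near[0]),
                 q_near[1] + s * (q_rand[1] - q_near[1]))
        # Only the endpoint is checked; a real planner would check the whole edge.
        if not is_free(q_new):
            continue
        # The transition test decides whether q_new may join the tree.
        if not transition_test(cost(q_near), cost(q_new), rng):
            continue
        nodes.append(q_new)
        parents.append(i_near)
        if math.dist(q_new, q_goal) <= goal_tol:
            # Walk back through the parents to recover the path.
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parents[i]
            return path[::-1]
    return None


# Toy scene (all values hypothetical): one circular obstacle.
OBSTACLE, RADIUS = (0.5, 0.5), 0.2


def is_free(q):
    return math.dist(q, OBSTACLE) > RADIUS


def cost(q):
    # Higher cost close to the obstacle, zero beyond 0.15 of clearance.
    return max(0.0, 0.15 - (math.dist(q, OBSTACLE) - RADIUS))


def simple_transition_test(c_near, c_new, rng, p_uphill=0.3):
    # Accept downhill or level moves; accept uphill moves with a fixed probability.
    return c_new <= c_near or rng.random() < p_uphill


path = t_rrt((0.1, 0.1), (0.9, 0.9), cost, is_free, simple_transition_test)
```

Because the cost rises near the obstacle, rejected uphill moves tend to keep the tree away from it, which is the effect the evaluation function is meant to have.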

Algorithm 3 shows the transition test function of T-RRT. The first step rejects any new node whose cost exceeds the maximum cost \(c_{max}\). The next step compares the cost \(c_{j}\) of the new node with the cost \(c_{i}\) of the parent node and returns True if the cost of the new node is lower. If the cost of the new node is higher, the algorithm returns True or False at random according to the probability p. Gradually decreasing the probability p reduces the number of costly nodes that are added.
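A transition test of this kind might look as follows. The three branches (cost ceiling, downhill acceptance, probabilistic uphill acceptance) follow the description above; the exact form of p is not given in this text, so the sketch assumes the temperature-based rule commonly associated with T-RRT [29], where p = exp(-(c_j - c_i) / (d_ij K T)) and the temperature T is lowered after each accepted uphill move. The class name, parameter names and default values are hypothetical.

```python
import math
import random


class TransitionTest:
    """Sketch of a T-RRT style transition test (assumed form, see lead-in)."""

    def __init__(self, c_max, k=1.0, temperature=1e-3, alpha=2.0,
                 n_fail_max=10, t_min=1e-12, rng=None):
        self.c_max = c_max              # nodes costlier than this are always rejected
        self.k = k                      # normalisation constant for the cost scale
        self.temperature = temperature  # controls how tolerant the test is
        self.alpha = alpha              # factor used to adapt the temperature
        self.n_fail_max = n_fail_max    # rejections tolerated before warming up
        self.t_min = t_min              # floor that keeps the division well defined
        self.n_fail = 0
        self.rng = rng or random.Random()

    def __call__(self, c_i, c_j, d_ij):
        # c_i: parent cost, c_j: new node cost, d_ij: distance between them (> 0).
        if c_j > self.c_max:
            return False
        if c_j <= c_i:
            return True
        # Uphill move: accept with probability p, which shrinks as T decreases.
        p = math.exp(-((c_j - c_i) / d_ij) / (self.k * self.temperature))
        if self.rng.random() < p:
            # Accepted: cool down so later uphill moves are less likely.
            self.temperature = max(self.temperature / self.alpha, self.t_min)
            self.n_fail = 0
            return True
        # Rejected: after too many failures, warm up to avoid getting stuck.
        if self.n_fail > self.n_fail_max:
            self.temperature *= self.alpha
            self.n_fail = 0
        else:
            self.n_fail += 1
        return False
```

Lowering the temperature after every accepted uphill move is what makes p decrease gradually, so fewer costly nodes are added as the search proceeds.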



Fruit detection

In this study, we used images of pears and apples. We collected images of fruit trees taken from multiple directions, as seen by the harvesting robot. We primarily used the lower images because they contain more occlusion and are more affected by sunlight. Moreover, we added images taken at other times of the day to the training set to cover various sunlight conditions. Next, we evaluated how well the trained SSD model detects fruit in images it was not trained on. The ground truth was annotated by visual inspection, and the test measured how many fruits the system detected in 50 untrained images taken in front-lit and back-lit conditions. Table 2 shows the learning parameters and the detection results. More than 95% of the fruits were detected for both targets.
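The detection rate in this kind of evaluation is the share of annotated fruits that the detector finds. The paper compares against visually inspected ground truth and does not state a matching rule, so the sketch below assumes a common one: a ground-truth box counts as detected if an unused predicted box overlaps it with an intersection over union (IoU) of at least 0.5. The function names, the threshold and the box format `(x1, y1, x2, y2)` are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_rate(truth, detected, iou_thresh=0.5):
    """Fraction of ground-truth boxes matched by a distinct detected box.

    truth must be non-empty; each detected box is matched at most once.
    """
    used = set()
    matched = 0
    for t in truth:
        best_j, best_v = None, iou_thresh
        for j, d in enumerate(detected):
            if j in used:
                continue
            v = iou(t, d)
            if v >= best_v:
                best_j, best_v = j, v
        if best_j is not None:
            used.add(best_j)
            matched += 1
    return matched / len(truth)
```

Summing the matched and annotated counts over all test images before dividing gives a single rate for the whole test set.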

Table 2 Learning parameters and results of detection

Figures 8 and 9 show some of the resulting images. We were able to detect some fruits that were partly occluded by other fruits and leaves, but we could not detect fruits that were almost completely hidden. This problem can be addressed by supplementing the detection with images from cameras installed in multiple directions. In this study, we only detected the position of the fruit and did not evaluate damage to the surface of the fruit, such as whether the fruit was ripped or not.

Fig. 8 Result of pear detection

Comparison of path planning methods

Table 3 Comparison of path planning methods

To compare the path planning methods, we simulated the harvesting operation. Every method was run on the same simulated fruit from the same starting position. Table 3 shows the characteristics of each path planning method and the time required for the harvesting operation. In some cases, the optimization of the route search did not converge. Therefore, when more than 10 seconds had elapsed since the start of the route search, the calculation was terminated and the operation was performed on the route planned up to that point. PathPlan is the time taken for path planning, Harvest is the time taken for the harvesting motion, and Sum is the total of the two. The inverse kinematics calculations took less than 1 millisecond, which is negligible compared with the path planning and harvesting times.
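The 10-second cut-off can be expressed as a time-budgeted refinement loop that always keeps the best route found so far. The sketch below is a generic illustration, not the authors' code: `plan_with_budget`, the `improve` callback (which returns a refined path, or None once it has converged) and the toy `drop_waypoint` refinement are hypothetical names, and the clock is injectable only so that the behaviour can be checked without waiting.

```python
import time


def plan_with_budget(improve, initial_path, path_cost,
                     budget_s=10.0, clock=time.monotonic):
    """Refine a path until improve() converges or the time budget expires.

    Returns (best_path, converged). The best path found so far is always
    returned, so the robot can act on it even when the search timed out.
    """
    start = clock()
    best = initial_path
    while clock() - start < budget_s:
        candidate = improve(best)
        if candidate is None:          # the optimiser reports convergence
            return best, True
        if path_cost(candidate) < path_cost(best):
            best = candidate
    return best, False                 # budget exhausted before convergence


def drop_waypoint(path):
    # Toy refinement: remove one interior waypoint until only start and goal remain.
    return path[:1] + path[2:] if len(path) > 2 else None


waypoints = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 2), (5, 5)]
best, converged = plan_with_budget(drop_waypoint, waypoints, len)
```

With the real clock, the toy refinement converges well inside the budget; with a slow optimiser, the loop stops at the deadline and returns the partially optimised route.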

This result shows that T-RRT can perform path planning much faster than RRT*, PRM, and PRM*. RRT is faster than T-RRT because it includes no optimization step in the algorithm itself. However, as the result for Target 3 shows, its harvesting operation took a long time because the path contained unnecessary movements. Based on these results, T-RRT, which performs both path planning and the harvesting operation quickly and stably, is used in this study.

Experiments for autonomous harvesting

Experimental environment

Automated harvesting experiments were conducted at three locations. Figure 10a shows the pear field at the Kanagawa Agricultural Technology Center. Figure 10b shows the apple field at the Miyagi Prefectural Institute of Agriculture and Horticulture.

Fig. 9 Result of apple detection

Fig. 10 Experimental fields

Results and Discussion

Figures 11, 12 and 13 show the results of the automatic pear harvesting experiment. In this experiment, only the lower arm and a single camera were used for harvesting. Figure 12 shows that the estimated fruit positions, drawn as red spheres, coincide with the fruit point cloud. Figure 13a–c show that the automatic harvesting was carried out without colliding with the target fruit or the robot. On the other hand, overlapping fruits could not be detected, as shown in Fig. 11. This shows that a single camera is not sufficient for a fruit harvesting robot and that multiple cameras need to be integrated.

Fig. 11 Result of pear detection

Fig. 12 Result of pear localization

Fig. 13 Pear harvesting motion

Figures 14, 15 and 16 show the results of the automatic apple harvesting experiment. In this experiment, we used all the arms and cameras shown in Fig. 1. Figure 15 shows that the estimated fruit positions, drawn as red spheres, are consistent with the fruit point cloud. Figure 16a–c show that the automatic harvesting was carried out without colliding with the target fruit or the robot. Figure 14 shows that fruits missed by one camera were detected by another, indicating that the cameras complement each other sufficiently. In addition, although the illumination conditions varied among the cameras, each camera was able to detect the fruits in its image.
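One simple way to combine complementary cameras is to express every estimated fruit centre in a common robot frame and treat centres that lie close together as the same fruit. The paper does not describe its integration step at this level of detail, so the following is only a sketch under that assumption; `merge_detections`, the 5 cm merge radius and the sample coordinates are hypothetical.

```python
import math


def merge_detections(per_camera, merge_radius=0.05):
    """Merge 3D fruit centres (metres, common frame) seen by several cameras.

    per_camera is a list with one list of (x, y, z) points per camera.
    Points within merge_radius of an existing fruit are averaged into it.
    """
    merged = []  # each entry: [sum_x, sum_y, sum_z, count]
    for detections in per_camera:
        for p in detections:
            for m in merged:
                centre = (m[0] / m[3], m[1] / m[3], m[2] / m[3])
                if math.dist(centre, p) <= merge_radius:
                    m[0] += p[0]
                    m[1] += p[1]
                    m[2] += p[2]
                    m[3] += 1
                    break
            else:
                # No existing fruit is close enough, so this is a new fruit.
                merged.append([p[0], p[1], p[2], 1])
    return [(m[0] / m[3], m[1] / m[3], m[2] / m[3]) for m in merged]


camera_a = [(0.00, 0.0, 1.0), (0.30, 0.0, 1.0)]
camera_b = [(0.01, 0.0, 1.0), (0.60, 0.0, 1.0)]  # sees one fruit camera_a missed
fruits = merge_detections([camera_a, camera_b])
```

In this toy input the two cameras agree on one fruit and each contributes one that the other did not report, so the merged list contains three fruits.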

Fig. 14 Result of apple detection

Fig. 15 Result of apple localization

Fig. 16 Apple harvesting motion

We showed that the proposed method can automatically harvest pears and apples with a robot arm in about 20 seconds per harvest. As these experiments show, fruits that were more than a certain distance from branches, including those occluded by other fruits, were harvested with a sufficient success rate. On the other hand, the success rate was very low for fruits that were very close to branches, because the robot arm or hand can get caught on a branch or collide with it. If the entire fruit tree is included in the collision detection, the leaves also become obstacles and the workspace of the robot arm becomes very narrow. Using only branches for collision detection requires detecting the branches alone, which is difficult because they are heavily occluded by leaves. To harvest automatically even very close to branches, the harvesting method, the arm behavior, and the detection methods need to be improved significantly.


In this study, we proposed a method for locating fruits in an outdoor environment and a method for automated harvesting of fruits using robot arms.

By using SSD, we have shown that fruit detection can be performed outdoors with an accuracy of more than 95%, even in back-lit conditions. The fruit detection system developed in this study can also detect different varieties of pears and apples by retraining on the target fruit. However, as shown in the automatic pear harvesting experiment using a single camera, the detection accuracy dropped for fruits hidden by leaves or other fruits. Therefore, in this study, the number of occluded fruits was reduced as much as possible by installing cameras in multiple directions.

To avoid damaging the fruits and to prevent the robot arm from colliding with the robot itself or with the other arm, we showed that path planning to the harvesting target can be performed relatively quickly, in less than 0.5 seconds, using inverse kinematics and T-RRT. In addition, when integrating images taken from multiple directions, the proposed method harvests the detected fruits in a safe order to avoid collisions.

Through our experiments, it was shown that each harvest takes about 20 seconds. This means that a single fruit can be harvested in a maximum of 10 seconds by moving the two robot arms simultaneously, which is equivalent to harvesting by a human. As with detection, it is likely that different varieties of pears and apples can also be harvested.

Availability of data and materials

Not applicable



Abbreviations

RGB-D: Red, Green, Blue and Depth
DoF: Degree of Freedom
R-CNN: Region-based Convolutional Neural Network
R-YOLO: Rotational You Only Look Once
RANSAC: Random sample consensus
SSD: Single Shot MultiBox Detector
RRT: Rapidly-exploring random tree
T-RRT: Transition-based rapidly-exploring random tree
PRM: Probabilistic road map


  1. United Nations (2019) World population prospects 2019: highlights. Accessed 26 Feb 2022

  2. Food and Agriculture Organization of the United Nations (2018) The future of food and agriculture – alternative pathways to 2050. Accessed 26 Feb 2022

  3. Ministry of Agriculture, Forestry and Fisheries (2021) Food balance sheet. (In Japanese). Accessed 26 Feb 2022

  4. Ministry of Agriculture, Forestry and Fisheries (2021) Overview of the 2020 census of agriculture and forestry. (In Japanese). Accessed 26 Feb 2022

  5. Kusaba S (2017) Integration of the tree form and machinery. Farming Mech 3189:5–9 (In Japanese)

  6. Onishi Y, Yoshida T, Kurita H, Fukao T, Arihara H, Iwai A (2019) An automated fruit harvesting robot by using deep learning. Robomech J 6(13):1–8

  7. Lin G, Tang Y, Zou X, Xiong J, Li J (2019) Guava detection and pose estimation using a low-cost RGB-D sensor in the field. Sensors 19(2):428

  8. Yu Y, Zhang K, Liu H, Yang L, Zhang D (2020) Real-time visual localization of the picking points for a ridge-planting strawberry harvesting robot. IEEE Access 8:116556–116568

  9. Yoshida T, Fukao T, Hasegawa T (2018) Fast detection of tomato peduncle using point cloud with a harvesting robot. J Robot Mechatron 30(2):180–186

  10. Yoshida T, Fukao T, Hasegawa T (2020) Cutting point detection using a robot with point clouds for tomato harvesting. J Robot Mechatron 32(2):437–444

  11. Irie N, Taguchi N, Horie T, Ishimatsu T (2009) Development of asparagus harvester coordinated with 3-D vision sensor. J Robot Mechatron 21(5):583–589

  12. Irie N, Taguchi N (2014) Asparagus harvesting robot. J Robot Mechatron 26(2):267–268

  13. Hayashi S, Shigematsu K, Yamamoto S, Kobayashi K, Kohno Y, Kamata J, Kurita M (2010) Evaluation of a strawberry-harvesting robot in a field test. Biosys Eng 105(2):160–171

  14. Lili W, Bo Z, Jinwei F, Xiaoan H, Shu W, Yashuo L, Qiangbing Z, Chongfeng W (2017) Development of a tomato harvesting robot used in greenhouse. Int J Agric Biol Eng 10(4):140–149

  15. Yaguchi H, Nagahama K, Hasegawa T, Inaba M (2016) Development of an autonomous tomato harvesting robot with rotational plucking gripper. In: 2016 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 652–657

  16. Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381–395

  17. Silwal A, Davidson JR, Karkee M, Mo C, Zhang Q, Lewis K (2017) Design, integration, and field evaluation of a robotic apple harvester. J Field Robot 34(6):1140–1159

  18. Arad B, Kurtser P, Barnea E, Harel B, Edan Y, Ben-Shahar O (2019) Controlled lighting and illumination-independent target detection for real-time cost-efficient applications: the case study of sweet pepper robotic harvesting. Sensors 19(6):1390

  19. Arad B, Balendonck J, Barth R, Ben-Shahar O, Edan Y, Hellström T, Hemming J, Kurtser P, Ringdahl O, Tielen T, van Tuijl B (2020) Development of a sweet pepper harvesting robot. J Field Robot 37(6):1027–1039

  20. Ling X, Zhao Y, Gong L, Liu C, Wang T (2019) Dual-arm cooperation and implementing for robotic harvesting tomato using binocular vision. Robot Auton Syst 114:134–143

  21. Sepúlveda D, Fernández R, Navas E, Armada M, González-De-Santos P (2020) Robotic aubergine harvesting using dual-arm manipulation. IEEE Access 8:121889–121904

  22. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37

  23. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99

  24. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  25. Atherton TJ, Kerbyson DJ (1999) Size invariant circle detection. Image Vis Comput 17(11):795–803

  26. Slotine J-JE, Asada H (1992) Robot analysis and control, 1st edn. Wiley, New York, NY

  27. Kavraki LE, Svestka P, Latombe J-C, Overmars MH (1996) Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans Robot Autom 12(4):566–580

  28. LaValle SM (1998) Rapidly-exploring random trees: a new tool for path planning

  29. Jaillet L, Cortés J, Siméon T (2008) Transition-based RRT for path planning in continuous cost spaces. In: 2008 IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 2145–2150

  30. Karaman S, Frazzoli E (2011) Sampling-based algorithms for optimal motion planning. Int J Robot Res 30(7):846–894


We would like to express our gratitude to Kanagawa Agricultural Technology Center, Miyagi Prefectural Institute of Agriculture and Horticulture and others for their cooperation in cultivating fruits.


This research was supported by grants from the Project of the Bio-oriented Technology Research Advancement Institution, NARO (the research project for the future agricultural production utilizing artificial intelligence).

Author information


TY and YO conducted all research and experiments. TK and TF developed the research concept, participated in design adjustment, and assisted in drafting the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Takeshi Yoshida.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.


Appendix 1: List of notations

See Table 4.

Table 4 List of notations

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Yoshida, T., Onishi, Y., Kawahara, T. et al. Automated harvesting by a dual-arm fruit harvesting robot. Robomech J 9, 19 (2022).


  • Harvesting robot
  • Robot manipulation
  • Deep learning