Consensus-Making Algorithms for Cognitive Sharing of Object in Multi-Robot Systems
© Tomita et al.; licensee Springer 2014
Received: 15 January 2014
Accepted: 12 June 2014
Published: 4 September 2014
Visual recognition in multi-robot systems is afflicted with the peculiar problem that observations made from different viewpoints present different perspectives. Hence, realizing cognitive sharing of an object among robots in an unstructured environment has been challenging. To cope with these issues, we have proposed the Hierarchical Invariants Perception Model (HIPM), in which multiple representations of the target are dynamically evaluated and selected by the robot. In this paper, we propose consensus-making algorithms to acquire a viewpoint-invariant representation of the geometric relation, an issue left unaddressed in the HIPM.
The target is described by a combination of three representations: color, shape, and geometric relation. In terms of geometric relation, we employ relative positions between the target and the salient objects, which we call a geometric-relation-based representation (GRR). A GRR is regarded as viewpoint-invariant when it satisfies two conditions: (i) it consists of sharable objects and (ii) the number of target candidates, as reduced by using the GRR, is equivalent among the robots. Based on this definition, consensus-making algorithms are formulated.
Experiments with real-world robots demonstrated that the robots perceived the viewpoint-invariant GRR even when objects were occluded or their appearance changed. The experiments also demonstrated that the proposed algorithms were able to reduce the candidates without succumbing to an infinite loop. The success rate of cognitive sharing was about 60%. However, the success rate was 100% when the GRR was shared. As long as robots can share the GRR, cognitive sharing may be realized even if the environment is more unstructured and uncertainties increase.
Keywords: Cognitive sharing; Multi-robot; Consensus making
Cognitive sharing of an object is a primary issue in multi-robot task execution, where robots with different perspectives are expected to cooperate in our daily unstructured environments. As robots engage in more varied and difficult tasks, they will become a ubiquitous part of our daily life. Researchers generally agree that multi-robot systems, being inherently distributed, may behave more robustly and effectively and accomplish cooperative tasks that are not possible for single-robot systems (e.g., carrying a heavy object). In this paper, we refer to the robot that makes a request to another robot for a task execution as the client robot and to the task-executing robot as the server robot.
Two methods are needed to realize cognitive sharing: a method for describing suitable representations of the target object so that the server robot uniquely identifies it, and a method for sharing the representation with another robot. Cognitive sharing of an object is achieved when the client robot and the server robot share representations of the object. However, visual recognition in multi-robot systems is afflicted with the peculiar problem that observations made from different viewpoints give different perspectives. Therefore, the server robot may mistake background objects for the target (false positives), and not all representations of the object can be shared among the robots.
Although cognitive sharing has been realized in conventional work, these methods assume structured environments. With the help of an artificial marker such as an RFID tag or QR code, or by using a single-colored sphere as a target, a target object is identified uniquely and its representation becomes viewpoint-independent. However, in unstructured environments, a predefined marker is difficult to use and the appearance of the target may change according to the viewpoint.
In this paper, to describe suitable representations of a target object, the target is described by a combination of multiple representations by utilizing RGB-D data. RGB-D data give many kinds of representation of an object (e.g., color, shape, label, relative positions, and semantic relations). We deem that, by using a combination of representations, robots can verify whether or not they observe the same object. Although cognitive sharing may be realized by sharing a global position of the target, the global coordinate has to be shared in advance. Even if the global coordinate is shared, localization errors by robots will lead to an ambiguous global position for the target and cause misrecognition.
However, visual representations are highly affected by the robots' viewpoints and environmental conditions. Different viewpoints lead to ambiguity of representations as objects are occluded or their appearance changes. The representations described by the robots are often embodiment-specific even if the robots carry the same sensor (e.g., the camera color models may differ slightly). Also, unstructured environments abound with unavoidable disturbances, such as illumination changes, object occlusion, and sensor faults, that disturb cognitive sharing and even object recognition.
In this paper, we deal with unaddressed issues in the HIPM concept: the definition of a relation-based representation and the sharing of a viewpoint-invariant decision tree. Although we have proposed a technique for autonomous landmark generation, in which some peripheral objects near the target are selected as landmarks in real time to relate the target to its surroundings, problems resulting from differences of viewpoint were not considered and a relation-based representation among the target and multiple landmarks was not used.
A relation between the target and its surroundings has been investigated for object tracking. Yang et al. formulated a relation between a target and its surroundings under a Markov network with a special topology and realized robust tracking against occlusion, similar objects, and cluttered backgrounds. However, such a relation changes from another viewpoint. Alternatively, relative positions between the target and landmarks can be utilized as a relation-based representation of the target. Gohring et al. presented a novel approach for multi-robot object tracking and self-localization using spatial relations of the target with respect to stationary objects. Ahmad et al. represented the problem of cooperative localization and target tracking as a graph, where the edges are relative positions between the target and static landmarks, and solved the problem using sparse optimization methods. However, these methods rely on static predefined landmarks. In unstructured environments, the robots have to select and share undefined landmarks autonomously. A novel approach for sharing the landmarks is proposed in this paper.
As a relation-based representation of the target, we employ relative positions between the target and the objects near the target, which we call a geometric-relation-based representation (GRR). Use of relative positions offers two advantages: (i) the error in the relative positions of objects is comparatively small and (ii) the information is independent of robot localization and odometry. Therefore, this relation is viewpoint-invariant.
We propose consensus-making algorithms to acquire a viewpoint-invariant GRR. Through communication, robots compare and adjust their representations of a target and gradually perceive which representations can be shared and what information is needed. Although relative positions are viewpoint-invariant, not all components of a GRR can be shared owing to occlusion or changes in object appearance. Robots can recognize a specific object regardless of viewpoint by means of a three-dimensional model. However, a priori knowledge cannot be used in unstructured environments, and three-dimensional models require significant computation in a cluttered environment.
The purpose of the proposed approach is to share one target object in an unstructured environment between two robots: the client robot and the server robot. Several assumptions are made as follows.
(a) Environment: Some objects in the environment are similar to the target. The appearance of objects (e.g., their color and shape) may change as a result of a change in viewpoint. Occlusion of objects may occur. Objects do not move. Drastic illumination changes do not occur.
(b) Robots: The client robot recognizes a target object. The server robot does not know any target information. The robots are equipped with an RGB-D sensor and can execute translational and rotational movement.
The proposed approach based on the HIPM has two important features:
• Describing representations of the target: The client robot describes representations of the target from an RGB-D image. Color and shape features are employed as the basic representations to recognize an object; these are referred to as primitive representations. Also, GRRs are employed as relation-based representations. The procedure for describing representations of the target is illustrated in Figure 2.
Calculating representational priority is addressed in our previous work. Since we do not assume drastic illumination changes, this component is not discussed in this paper.
Describing representations of the target
To describe the primitive representation of each object, the input image has to be divided into objects and other areas, and the boundaries have to be the contours of objects. In general, a computationally efficient segmentation method is required because the robots are supposed to move around. In this paper, we employ a segmentation method based on the depth information obtained using an RGB-D sensor. RGB-D sensors (e.g., Kinect or Xtion) can output sensing data at a frame rate of 30 Hz, and with this method one can extract accurate contours by using the depth information.
We employ color and shape features as the primitive representation. In visual recognition, image features (e.g., color, shape, and feature points) have been used to recognize an object. Although feature points may be salient and therefore suitable for object recognition, they are susceptible to viewpoint changes. In contrast, color and shape features tend to be robust against viewpoint changes.
where H_a and H_b represent the hue histograms of objects O_a and O_b, respectively, and H(i) represents the value of the i-th histogram bin. Equation (1) is computationally efficient and robust against partial occlusion and resolution changes.
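Equation (1) itself is not reproduced in this excerpt. A common hue-histogram similarity with exactly the stated properties (cheap, robust to partial occlusion and resolution change) is the normalized histogram intersection of the cited color-indexing work; the sketch below assumes that form rather than quoting the paper's equation.

```python
def hist_similarity(H_a, H_b):
    """Normalized intersection of two hue histograms H_a and H_b.

    The exact form of Equation (1) is assumed here, not quoted: the
    overlap of corresponding bins, normalized by the first histogram's
    total mass, giving a value in [0, 1].
    """
    overlap = sum(min(a, b) for a, b in zip(H_a, H_b))
    total = sum(H_a)
    return overlap / total if total else 0.0

assert hist_similarity([4, 2, 0], [4, 2, 0]) == 1.0   # identical histograms
assert hist_similarity([4, 0, 0], [0, 4, 0]) == 0.0   # disjoint histograms
assert hist_similarity([2, 2], [1, 3]) == 0.75        # partial overlap
```

Because only bin-wise minima and a sum are needed, the cost is linear in the number of bins, consistent with the efficiency claim above.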
where h_a and h_b represent the Hu moments of objects O_a and O_b, respectively, and h(i) represents the value of the i-th Hu moment.
Similar primitive representations from different viewpoints
Before explaining how to describe a GRR, the definition of a similar primitive representation has to be addressed. To identify the components of the GRR and the target, the server robot must determine which representation is similar to the representation sent from the client robot.
In this paper, similar primitive representations are determined by thresholding the similarity values defined by equations (1) and (2). In general, a primitive representation is highly affected by changes in viewpoint. However, empirically, the similarity of primitive representations of the same object from different viewpoints lies within a certain range.
we define O_b as having a similar shape representation to h_a.
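Given either similarity measure, picking out the "similar" observations reduces to a threshold test. The sketch below uses a one-sided 0.8 threshold and a toy one-dimensional similarity function; both are illustrative stand-ins for the empirically chosen range described above, not the paper's values.

```python
def similar_objects(client_rep, observed, sim, threshold=0.8):
    """Return the observed objects whose representation is similar to
    the one received from the client robot.

    `sim` is a similarity function such as the histogram comparison of
    Equation (1); `threshold` is an illustrative assumption."""
    return [obj for obj in observed if sim(client_rep, obj) >= threshold]

sim = lambda a, b: 1.0 - abs(a - b)  # toy 1-D similarity for the demo
assert similar_objects(0.5, [0.5, 0.55, 0.1], sim) == [0.5, 0.55]
```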
Two processes are needed to form a GRR as follows:
Select candidate objects of GRR components that have a salient primitive representation. An object that has no similar representation from the viewpoint of the client robot is likely to be identified uniquely. Therefore, such objects are suitable for candidates of GRR components.
Describe the relative positions among the target and components of the GRR based on distance information. Three objects of the same kind of primitive representation (e.g., three objects that have the same salient color representations) are selected from the candidates, and a triangle is formed. The reason why the same kind of primitive representation should be selected is that color and shape representations are invariant with respect to different disturbances (e.g., color representation is invariant to partial occlusion and shape representation is invariant to changes in lighting conditions). The reason why a triangle is chosen is that it is the minimum unit needed to divide a flat space into a closed area and other geometric shapes can be represented by a combination of triangles.
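The two steps above can be sketched as follows, assuming 3-D object positions from the RGB-D sensor and using sorted pairwise distances as the encoding; the paper's exact encoding of the relative positions is not shown in this excerpt, but any rigid-motion-invariant function of the distances shares the key property.

```python
import math
from itertools import combinations

def grr_descriptor(target, landmarks):
    """Describe the target by the side lengths of the landmark triangle
    and the target's distances to the three landmarks. Pairwise
    distances are preserved under the rigid transform induced by a
    viewpoint change, which is what makes the GRR sharable."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    sides = sorted(dist(p, q) for p, q in combinations(landmarks, 2))
    to_target = sorted(dist(target, p) for p in landmarks)
    return sides + to_target

def rot_z(p, t):
    """Rotate a 3-D point about the z axis (a stand-in viewpoint change)."""
    x, y, z = p
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t), z)

# The descriptor is unchanged when the whole scene is observed from a
# rotated viewpoint.
target = (1.0, 0.0, 0.0)
landmarks = [(0.0, 0.0, 0.0), (2.0, 1.0, 0.0), (1.0, 3.0, 0.5)]
before = grr_descriptor(target, landmarks)
after = grr_descriptor(rot_z(target, 0.7), [rot_z(p, 0.7) for p in landmarks])
assert all(math.isclose(a, b) for a, b in zip(before, after))
```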
Sharing a viewpoint-invariant decision tree
Construction of the decision tree
When a primitive representation is employed, no similar representation is found close to the target object.
When a GRR is employed, no other objects belong to A_t of a GRR.
Conventional methods (e.g., C4.5 and CART) tend to construct a large tree when learning data are not adequate. Cognitive sharing of an object requires classifying two types of objects: the target object and other objects. Because the target object is unique in the environment, the target-object class data are not adequate. In this case, taking C4.5 as an example, several attributes lead to the same information-gain value and C4.5 cannot select an effective node.
To construct a decision tree, we employ the branch and bound algorithm. The branch and bound algorithm has the advantage of constructing a decision tree with the minimum required number of nodes. Redundant information may increase searching time because not all representations can be shared owing to appearance changes and occlusion. The solution to the problem resulting from viewpoint changes is discussed in the next section on consensus-making algorithms.
Objective function and constraint condition
Subproblem P_i (breadth-first search)
Prune if |X|≤z.
z is the minimum upper bound seen among subproblems examined so far.
The server robot finds a similar representation in a ROI.
The number of objects that satisfy the similarity criteria of the representation is equivalent from the views of both the client and server robots.
When the second condition is not satisfied, the server robot may mistake the target because it cannot perceive which object is occluded or has changed its appearance.
Next, we define viewpoint-invariant color, shape, and geometric-relation-based representations. Assume a set of n objects perceived by the server robot. For the target, color representation H, shape representation h, and GRR g are sent from the client robot, and the numbers of representations similar to H and h from the viewpoint of the client robot are α and β, respectively. The number of candidates that belong to A_t of g from the viewpoint of the client robot is γ.
and the primitive representation of the component is viewpoint-invariant.
The consensus-making algorithms are composed of four functional parts: sharing ROI, searching candidates, perceiving invariants, and adjusting the decision tree.
The server robot starts sharing the ROI while rotating when it receives the decision tree (Figure 5, step 1). A ROI has to be shared to verify whether or not a representation is viewpoint-invariant. Without a shared ROI, the robots cannot determine whether a misrecognition is caused by searching a different region or by a non-viewpoint-invariant representation.
In this paper, if the server robot finds one or more components of the GRR, a ROI is considered to be shared roughly. If the server robot does not find components during one rotation, the server robot requests the client robot to send another decision tree. This process lasts until a ROI is shared.
After sharing the ROI, the server robot searches around the ROI for the target. The state of the server robot transitions according to the number of target candidates found by using the received decision tree. Let us denote the number of candidates the server robot finds using the received decision tree by n and the number of candidates the client robot finds using the decision tree by k. When n = 0, the state transitions to perceiving invariants. When n > k, the state transitions to adjusting the decision tree. When n = k, the state transitions to finishing cognitive sharing (Figure 5, step 2).
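The transition rule can be transcribed directly; the case 0 < n < k is not specified in the text, so this sketch simply keeps searching there, which is an assumption.

```python
def next_state(n, k):
    """Server-robot state transition after applying the received decision
    tree: n candidates found by the server robot, k by the client robot."""
    if n == 0:
        return "perceiving invariants"
    if n > k:
        return "adjusting the decision tree"
    if n == k:
        return "finishing cognitive sharing"
    return "searching candidates"  # 0 < n < k: not specified in the text

assert next_state(0, 2) == "perceiving invariants"
assert next_state(3, 1) == "adjusting the decision tree"
assert next_state(2, 2) == "finishing cognitive sharing"
```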
The server robot regards a GRR as non-viewpoint-invariant under three conditions: (i) when the GRR includes components that cannot be found in the ROI, (ii) when the GRR includes components for which more than one candidate is found in the ROI, and (iii) when the number of candidates inside A_t of the GRR, where the target object is found, is not equivalent between the client and server robots. Because the client robot selects objects that have a salient primitive representation as GRR components, such components should be identified uniquely. However, when a component is not found in the ROI, either occlusion or an appearance change has occurred as a result of the difference in viewpoints. Likewise, when more than one candidate for a component is found, objects near the component have become similar to it owing to appearance changes. The number of candidates differs when objects outside the client robot's view exist. In these situations, a decision tree has difficulty classifying the objects correctly.
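Conditions (i)-(iii) can be transcribed as a single check. Here `component_counts[i]` holds how many candidates the server robot found in the ROI for the i-th GRR component; this data layout is an assumption for illustration.

```python
def grr_is_viewpoint_invariant(component_counts, inside_client, inside_server):
    """Return False if any of the three non-invariance conditions holds."""
    if any(c == 0 for c in component_counts):  # (i) a component is missing
        return False
    if any(c > 1 for c in component_counts):   # (ii) a component is ambiguous
        return False
    return inside_client == inside_server      # (iii) A_t counts must agree

assert grr_is_viewpoint_invariant([1, 1, 1], 2, 2)
assert not grr_is_viewpoint_invariant([0, 1, 1], 2, 2)  # occluded/changed
assert not grr_is_viewpoint_invariant([1, 2, 1], 2, 2)  # look-alike nearby
assert not grr_is_viewpoint_invariant([1, 1, 1], 2, 3)  # unseen objects
```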
If the received decision tree includes non-viewpoint-invariant GRRs, the server robot requests the client robot to send another decision tree that does not include the unidentified components. The robots iterate this process until they share a viewpoint-invariant GRR (Figure 5, step 3).
Adjusting the decision tree
Even though the robots successfully share a viewpoint-invariant GRR, the server robot may still fail to identify the target when the primitive representation is not viewpoint-invariant. For example, this would occur when an object outside the client robot's view exists in A_t of the GRR or when the objects near the target in A_t change their appearance.
In these situations, the server robot has to determine autonomously what information is needed. The server robot calculates the similarity of primitive representations among the candidates reduced by using the received decision tree. If the primitive representations of the candidates are not similar, the server robot requests the primitive representation. Then, the server robot adds the primitive representation to the decision tree and tries to identify the target. The server robot requests another decision tree if the number of candidates cannot be reduced by using only the primitive representation. The server robot finishes searching when the number of candidates is reduced to one or when the number of candidates obtained by using the decision tree is the same between the client and server robots (Figure 5, step 4).
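The decision logic above can be sketched as follows; the similarity function and the 0.8 threshold are illustrative assumptions, not the paper's values.

```python
def adjust_request(candidates, sim, threshold=0.8):
    """Choose the server robot's next request from the candidates left
    after applying the received decision tree."""
    if len(candidates) <= 1:
        return "done"
    pairs = [(a, b) for i, a in enumerate(candidates)
             for b in candidates[i + 1:]]
    if any(sim(a, b) < threshold for a, b in pairs):
        # Some candidates look different, so a primitive representation
        # from the client robot can tell them apart.
        return "request primitive representation"
    # All candidates look alike; primitives alone cannot reduce them.
    return "request another decision tree"

sim = lambda a, b: 1.0 - abs(a - b)  # toy 1-D similarity for the demo
assert adjust_request([0.5], sim) == "done"
assert adjust_request([0.5, 0.9], sim) == "request primitive representation"
assert adjust_request([0.5, 0.55], sim) == "request another decision tree"
```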
Experiments are conducted to validate the algorithm on real-world robots with basic kinds of objects, viewed from different viewpoints of the server robot, as shown in Figure 6. The first two sections demonstrate that the proposed approach is robust against the following problems:
• The server robot mistakes the target for a similar object.
In the third section, termination of the algorithm is discussed. Finally, the algorithm is evaluated quantitatively.
Each robot (client robot: Pioneer 3-DX; server robot: Amigo Bot) is equipped with a PC (Intel Core i5, 2.4 GHz), an RGB-D camera (Kinect), and a communication module (OKI UDv4). The client robot recognizes the target, and the server robot has no a priori target information. Neither robot knows the current position of the other. Also, we assume that dramatic illumination changes and movement of objects do not occur during the time frame of communication.
Perception of viewpoint-invariant GRR
The client robot selected a GRR as a node of the decision tree to share the ROI. By t=4 [s], the server robot succeeded in receiving a decision tree and started searching the components. By t=13 [s], the server robot found two out of three components and succeeded in sharing the ROI.
Perceiving invariants: appearance change
The server robot searched for the last component of the GRR but could not find it. Therefore, by t=18 [s], the server robot regarded the last component as non-viewpoint-invariant and requested the client robot to send another decision tree that did not include the unidentified component. In fact, the color of the last component from the viewpoint of the client robot differed from that seen by the server robot.
Perceiving invariants: occlusion
The client robot selected a GRR as a decision tree because the target could not be determined by using only the primitive representation owing to the presence of a similar object. By t=34 [s], the server robot had not found one of the components because of occlusion. The server robot regarded it as non-viewpoint-invariant and requested the client robot to send another decision tree.
Perceiving invariants: similar representation
By t=45 [s], the server robot requested another decision tree because one of the components had a similar representation in the environment owing to appearance change and another component was not found.
By t=49 [s], the server robot reduced the number of candidates to one because the received decision tree consisted of only a viewpoint-invariant GRR, and it succeeded in cognitive sharing.
Adjusting the decision tree
Termination of consensus-making algorithms
This experiment demonstrates that the proposed algorithms reduce the number of candidates as much as possible by using sharable representations. In this experiment, the relative position of the robots corresponds to Figure 6(b) and the target corresponds to the object labeled “Dummy” in Figure 7.
Success rate of cognitive sharing
(Table 1 conditions: “SO do not exist”; “Do not share GRR”.)
The experiments in Figures 8 and 10 demonstrated that the consensus-making algorithms allow the server robot to perceive non-viewpoint-invariant GRRs under three conditions: (i) the appearance of GRR components changed, (ii) some of the components were occluded, and (iii) peripheral objects were similar to some of the components in the view of the server robot. There may be difficult cases in which the robots fail to regard GRRs as viewpoint-invariant correctly. For example, an object similar to one of the components is occluded in the view of the client robot while the component itself is occluded in the view of the server robot, and the number of candidates inside A_t of the identified GRR happens to be equivalent between the robots. However, such cases are rare in real environments.
The experimental results in Table 1 show the effect of combined representations for identifying the target. The success rate of cognitive sharing was 100% when the robots could share the GRR, whereas it was 38% when they could not, i.e., when the decision tree included only the primitive representations.
We proposed a cognitive sharing algorithm based on visual information. A decision tree including a geometric-relation-based representation allows multiple robots to share a precise ROI and avoids confusing the target with a similar object. The consensus-making algorithms serve to acquire viewpoint-invariant GRRs under conditions in which occlusion and object appearance changes occur. By adjusting the decision tree, the server robot, i.e., the robot requested to execute the task, identifies the target even if some primitive representations are not viewpoint-invariant. The consensus-making algorithms can reduce the number of candidates without succumbing to an infinite loop.
For future work, an important issue will be to integrate one of the features in HIPM, i.e., calculating representational priority with the consensus-making algorithms, thereby enabling robots to realize tracking and cognitive sharing of an object simultaneously in spite of disturbances.
HIPM: Hierarchical invariants perception model
ROI: Region of interest
- Gates B: A robot in every home. Sci Am 2007, 296:58–65. 10.1038/scientificamerican0107-58
- Arai T, Pagello E, Parker LE: Editorial: advances in multi-robot systems. IEEE Trans Rob Autom 2002, 18(5):655–661. 10.1109/TRA.2002.806024
- Tan KG, Wasif AR, Tan CP: Objects tracking utilizing square grid RFID reader antenna network. J Electromagn Waves Appl 2008, 22:27–38. 10.1163/156939308783122724
- Xue Y, Tian G, Li R, Jiang H: A new object search and recognition method based on artificial object mark in complex indoor environment. In: World congress on intelligent control and automation, 6648–6653. IEEE; 2010.
- Gohring D, Homann J: Multi robot object tracking and self localization using visual percept relations. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems, 31–36. IEEE; 2006.
- Kira Z: Inter-robot transfer learning for perceptual classification. In: Proceedings of the 9th international conference on autonomous agents and multiagent systems, 13–20. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC; 2010.
- Umeda T, Sekiyama K, Fukuda T: Vision-based object tracking by multi-robots. J Rob Mechatron 2012, 24(3):531–539.
- Yang M, Wu Y, Hua G: Context-aware visual tracking. IEEE Trans Pattern Anal Mach Intell 2009, 31(7):1195–1209. 10.1109/TPAMI.2008.146
- Ahmad A, Tipaldi GD, Lima P, Burgard W: Cooperative robot localization and target tracking based on least squares minimization. In: IEEE international conference on robotics and automation (ICRA), 5696–5701. IEEE; 2013.
- Waibel M, Beetz M, Civera J, d’Andrea R, Elfring J, Galvez-Lopez D, Haussermann K, Janssen R, Montiel JMM, Perzylo A, Schiesle B, Tenorth M, Zweigle O, van de Molengraft R: A world wide web for robots. IEEE Rob Autom Mag 2011, 18(2):69–82. 10.1109/MRA.2011.941632
- Filliat D, Battesti E, Bazeille S, Duceux G, Gepperth A, Harrath L, Jebari I, Pereira R, Tapus A, Meyer C, Ieng S-H, Benosman R, Cizeron E, Mamanna J-C, Pothier B: RGBD object recognition and visual texture classification for indoor semantic mapping. In: IEEE international conference on technologies for practical robot applications (TePRA), 127–132. IEEE; 2012.
- Comaniciu D, Ramesh V, Meer P: Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell 2003, 25(5):564–577. 10.1109/TPAMI.2003.1195991
- Isard M, Blake A: Contour tracking by stochastic propagation of conditional density. In: Computer Vision – ECCV ’96. Lecture Notes in Computer Science, vol 1064. Edited by Buxton B, Cipolla R. Springer, Berlin Heidelberg; 1996:343–356.
- Shi J, Tomasi C: Good features to track. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 593–600. IEEE; 1994.
- Lowe DG: Object recognition from local scale-invariant features. Proc Seventh IEEE Int Conf Comput Vision 1999, 2:1150–1157. 10.1109/ICCV.1999.790410
- Mikolajczyk K, Schmid C: Indexing based on scale invariant interest points. Proc Eighth IEEE Int Conf Comput Vision 2001, 1:525–531. 10.1109/ICCV.2001.937561
- Fitzgibbon A, Zisserman A: On affine invariant clustering and automatic cast listing in movies. Proc Seventh Eur Conf Comput Vision 2002, 3:304–320.
- Gevers T, Smeulders AW: Color-based object recognition. Pattern Recognit 1999, 32(3):453–465. 10.1016/S0031-3203(98)00036-3
- Swain MJ, Ballard DH: Color indexing. Int J Comput Vision 1991, 7(1):11–32. 10.1007/BF00130487
- Hu MK: Visual pattern recognition by moment invariants. IRE Trans Inf Theory 1962, 8(2):179–187. 10.1109/TIT.1962.1057692
- Bradski G, Kaehler A: Learning OpenCV: computer vision with the OpenCV library. O’Reilly Media, Sebastopol, CA; 2008.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.