Verbal guidance for sit-to-stand support system

Artificial intelligence (AI)-based robots have become popular in various fields, increasing the demand for care robots. Such care robots recognize or estimate factors such as human states and then act on the estimation results. If the humans cooperating with robots do not understand the robots' functions well, the task will not be performed appropriately. Moreover, the estimation may be incorrect: robots may make mistakes, and humans may not recognize these mistakes. In such cases, the robots are not useful, and cooperation between humans and robots may be hindered. This issue can be addressed in several ways, including improving the estimation and improving how information is presented. By analyzing the influence of factors such as estimation accuracy and the content and timing of the presented information, we can identify what is important when designing care robots. For this analysis, we developed a robot that supports a person's sit-to-stand motion by raising its armrest when the user leans on it. Because verbal communication is important in the nursing-care field, we also adopt verbal guidance to present information: a phrase encouraging the user to stand up and a countdown indicating the timing. First, an experiment is conducted to validate the usability of the system when it acts on highly accurate estimations. The experimental results show that the system without verbal guidance causes anxiety and is not useful, even if the human state is detected accurately. Based on these results, we determine the appropriate content and timing of verbal guidance. Subsequently, we conduct an experiment to confirm that the proposed method is applicable under imperfect estimation.


Introduction
The demand for artificial intelligence (AI)-based robots has increased in various fields. Such robots recognize or estimate various parameters and can be used for environmental recognition and human state estimation [1][2][3]. Robotic systems often use these estimation results for anomaly detection or for determining which robot function is required [4,5]. Robots that act according to the state of their human users have been reported to be effective as care robots [6][7][8]. However, these estimations are not always correct. Although most estimation methods can estimate a state accurately, it is considered impossible to detect the exact timing of a user's state transition. In particular, for human-robot interactions such as those involving care robots, fewer sensors are preferred because this reduces cost and helps avoid privacy breaches: if someone obtains the data collected by a robot, the user's privacy is at risk, so it is preferable for robots to collect less data. However, using fewer sensors often lowers the estimation performance, which in turn degrades the performance and usefulness of the system.

*Correspondence: m.takeda@srd.mech.tohoku.ac.jp 1 Department of Robotics, Tohoku University, Aramaki-Aoba, Aoba-ku, Sendai, Japan. Full list of author information is available at the end of the article.
Usability is one of the most important issues for robots. Even if a robot can estimate the environment or the user's state almost accurately, the system may still not be useful. Especially during physical human-robot interactions (p-HRIs), humans can feel anxious and consider the system unhelpful if it is not easy to understand. For example, if users are not aware of the robot's actions or their timing, they cannot adjust the timing of their own actions. Hence, some researchers have focused on knowledge representation to address this issue. Displaying the robot's plans is one method of representing its actions [9]. Simple LED lights can also represent the required information, including the robot's state [10]. Knowledge representation can also be used for teaching: a dance-teaching robot can convey the next step to the user by applying a force to the user [11]. Additionally, some researchers have studied systems that confirm human commands [12]. The transparency of the system is important, especially for human-robot interactions [13][14][15]. These information representation methods are often used in highly accurate systems; however, such representation can also improve the usability of systems with imperfect estimation. Although a robot performs estimation in order to adjust to humans, its performance and usability can be improved if humans also adjust to the system. If robots can convey their actions and the timing of those actions to users, the users can adjust their actions to the robots. Knowledge representation thus allows a rough estimation to be employed in robot systems, which will also contribute to the safety of the system.
To coordinate motion, the timing of the robot's or the user's actions is the key knowledge to represent. In the case of care robots, users may have cognitive impairments; therefore, the knowledge representation method must be chosen carefully for such robots. Sound is a good medium for representing timing; however, for elderly people, sound alone is not adequate for understanding. Verbal communication is effective in nursing care: caregivers should talk to care recipients during care, especially before touching or applying force to their bodies. This importance of verbal communication is well known as one of the concepts of Humanitude [16,17]. Thus, verbal guidance can also be an effective tool for care robots.
In this paper, we analyze how estimation accuracy and the content and timing of knowledge representation affect system usability; this analysis can inform the design of care robots. If appropriate guidance makes robots useful even when the estimation is not strictly accurate, strictly accurate human state estimation is not necessary for such robots. First, we developed a sit-to-stand support robot that acts based on the estimation result and the user's standing motion. Two experiments were conducted to evaluate the proposed system. The first experiment validated the effectiveness of verbal guidance under accurate estimation. The estimation was performed by a human, because humans are assumed to be able to detect human states accurately. We set various verbal guidance patterns by combining the content and timing of knowledge representation, and adopted the System Usability Scale (SUS) [18] for the evaluation. The experimental results confirmed that accurate estimation does not always make the robot useful, and that the ability of users to adjust their motion makes the robot useful. The appropriate verbal guidance patterns were determined by comparing the results. Subsequently, an estimation method that is not perfect but almost accurate was adopted in the robot system, and a second experiment was conducted to validate the effectiveness of the proposed method for care robots that estimate user states imperfectly.

Verbal guidance concept and methods
This section explains the main concept of knowledge representation. Even if a robot estimates the appropriate timing and acts accordingly, the user may be caught off guard if the robot moves without any prior information, and the system may be perceived as unfriendly. Users may also find the system useless if they do not know what the robot will do or cannot adjust their timing. Hence, we realize a system that ensures the user can understand the robot's action, the required user action, and their timing.
First, we developed a sit-to-stand support robot system. We then determined the information to be presented to the user and the method of presenting it. The details are explained in the following subsections.

Robot system
We focus on care robots because they involve p-HRIs. To implement and validate the proposed system, we developed a standing support robot, shown in Fig. 1. The robot is equipped with an armrest that can move vertically. It also has wheels and casters so that it can be used while walking, and the armrest can be moved down manually using a switch. The specifications of the robot are given in [6].
In a normal sit-to-stand motion, humans first lean their upper body forward, so the center of mass (CoM) of the body also moves forward; they then lift their body upward. Humans cannot stand without first leaning forward. People who need support for sit-to-stand motions, including elderly people, are assumed to be able to lean their upper body but to find it difficult to lift their bodies. Hence, a linear actuator moves the armrest vertically to support sit-to-stand motions, as shown in Fig. 2. The flowchart of the robot system is presented in Fig. 3.
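The support behaviour described above can be sketched as a small state machine. This is a minimal sketch under our own assumptions, not a transcription of the flowchart in Fig. 3: the state names, the `step` function, and the `raised_fully` limit signal are hypothetical; the `leaning` flag stands for whichever estimator is used (a human operator in the first experiment, an SVM in the second); verbal guidance is left out; and lowering the armrest with the switch is simplified into a single transition.

```python
from enum import Enum, auto


class ArmrestState(Enum):
    """Hypothetical armrest states (not taken from Fig. 3)."""
    DOWN = auto()    # user seated, armrest at its lowest position
    RISING = auto()  # linear actuator is lifting the armrest
    UP = auto()      # armrest fully raised; robot used as a walker


def step(state: ArmrestState, leaning: bool, raised_fully: bool,
         switch_pressed: bool) -> ArmrestState:
    """Advance the support controller by one control cycle.

    leaning        -- output of the leaning estimator (assumed boolean)
    raised_fully   -- assumed limit signal from the linear actuator
    switch_pressed -- manual switch that lowers the armrest
    """
    if state is ArmrestState.DOWN and leaning:
        return ArmrestState.RISING  # user leans forward: start lifting
    if state is ArmrestState.RISING and raised_fully:
        return ArmrestState.UP      # lift completed, user is standing
    if state is ArmrestState.UP and switch_pressed:
        return ArmrestState.DOWN    # simplified: lowering as one transition
    return state                    # otherwise keep the current state


# Simulate one cycle: sitting, leaning, lift completes, later sitting down
state = ArmrestState.DOWN
for leaning, raised, switch in [(False, False, False), (True, False, False),
                                (True, True, False), (False, False, True)]:
    state = step(state, leaning, raised, switch)
    print(state.name)
```

Keeping the transition logic in a pure function makes it easy to test independently of the actuator driver and of the estimator that produces the `leaning` flag.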

Knowledge representation
It is necessary to determine the content, timing, and medium of the information to be presented. The system is simple because it has only one function. The required information is the robot's estimation result, the robot's action, the user's action, and the timing of these actions. We assume that users have been familiarized with the system through a simple explanation before using it. The robot's recognition result, the content of the robot's action, and the required user action can be represented by a single sentence such as "let's stand up." The timing should be represented by sounds, such as a countdown of "3, 2, 1." In nursing care, caregivers generally speak to care recipients, and communication is necessary before the caregiver touches or applies a force to them. In addition, auditory information can be perceived even when the user is not concentrating on the guidance, and sound is effective for representing timing. Hence, we select sound as the medium of knowledge representation.
Next, we consider the timing of knowledge representation, which can be determined according to the timing of the action. The first experiment validates the effectiveness of verbal guidance under accurate estimation and also determines the appropriate timing of the verbal guidance, as explained in the next section.
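The combination of one sentence and a countdown can be laid out on a timeline. The sketch below is illustrative only: the phrase duration (1.5 s) and the 1 s countdown interval are assumed values that the text does not give, and `Cue` and `guidance_timeline` are hypothetical names. It encodes the rule, stated for the experiments, that the armrest starts to rise once the last verbal guidance has ended.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    time_s: float   # seconds after the guidance is triggered
    utterance: str  # what the robot says


def guidance_timeline(start_s: float = 0.0,
                      phrase_s: float = 1.5,
                      beat_s: float = 1.0) -> tuple[list[Cue], float]:
    """Return the spoken cues and the time at which the armrest starts.

    phrase_s -- assumed duration of "let's stand up"
    beat_s   -- assumed interval between the countdown numbers
    """
    cues = [Cue(start_s, "let's stand up")]
    t = start_s + phrase_s
    for n in (3, 2, 1):
        cues.append(Cue(t, str(n)))
        t += beat_s
    # The armrest starts rising right after the last cue has ended,
    # so the user can time the lift-off to the countdown.
    return cues, t


cues, armrest_start = guidance_timeline()
for cue in cues:
    print(f"{cue.time_s:.1f} s: {cue.utterance}")
print(f"{armrest_start:.1f} s: armrest starts rising")
```

Making the countdown end exactly when the actuator starts is what lets users align their own lift-off with the robot, which is the purpose of the timing cue.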

Validation experiment of verbal guidance for the system with accurate estimation
First, we validate the effectiveness of knowledge representation under an accurate estimation. Subsequently, we compare groups of guidance methods to determine the appropriate method of guiding.
The system estimates the user's leaning and raises the armrest. Therefore, the verbal guidance should be presented around the time the user leans. We set the timing of verbal guidance as follows:
• While the user is sitting
• At the beginning of leaning
• At the end of leaning
In this experiment, to realize an almost accurate estimation, the leaning estimation was performed by a human: one of the authors determined the user's state and sent a command to the robot, as shown in Fig. 4.
As explained in the previous section, the contents of the verbal guidance are "let's stand up" and "3, 2, 1," which represent the action and the timing, respectively. Combining them yields the following content patterns:
• Silence (without guidance)
• Only "let's stand up"
• Only "3, 2, 1"
• Both "let's stand up" and "3, 2, 1" continuously
• Both "let's stand up" and "3, 2, 1" separately
Because there is more than one timing, the two phrases can be presented in two ways: continuously at the same timing, or separately at different timings.
To determine the appropriate guidance pattern, we consider and compare all guidance patterns. The armrest starts to rise after the last verbal guidance ends; without guidance, it simply starts to rise at one of the three timings, which gives three patterns. Therefore, combining the above-mentioned timings and contents yields a total of 15 guidance and support patterns, as shown in Table 1. Figures 5 and 6 present examples of the guidance and support procedure. In pattern 2, the user is initially sitting and receives neither guidance nor support, as shown in Fig. 5a. When the user starts to lean, the robot starts moving the armrest without guidance, as shown in Fig. 5b. In pattern 14, the verbal guidance "let's stand up" is given while the user is sitting, as shown in Fig. 6a. The countdown "3, 2, 1" starts at the end of leaning, after which the armrest moves, as shown in Fig. 6c, d. Overviews of the guidance examples are shown in Additional file 1.
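The count of 15 follows directly from the three timings and five content patterns, and can be checked by enumerating them. In this sketch the numbering order is our assumption: it reproduces the patterns the text identifies (1-3 without guidance, pattern 2 rising at the beginning of leaning, pattern 14 with "let's stand up" while sitting and the countdown at the end of leaning, and 10-15 using both phrases), but Table 1 remains the authoritative list. We also assume that in the "separately" case "let's stand up" always precedes the countdown. `enumerate_patterns` is a hypothetical helper.

```python
from itertools import combinations

TIMINGS = ("sitting", "beginning of leaning", "end of leaning")
PHRASE, COUNT = "let's stand up", "3, 2, 1"


def enumerate_patterns():
    """Return (pattern number, schedule, armrest trigger timing) tuples.

    A schedule maps each timing to the utterances spoken at it. The armrest
    rises at the timing of the last utterance, or at the trigger timing
    itself when there is no guidance.
    """
    schedules = []
    # No guidance: the armrest simply rises at one of the three timings.
    schedules += [({}, t) for t in TIMINGS]
    # A single content at one timing.
    for content in (PHRASE, COUNT):
        schedules += [({t: [content]}, t) for t in TIMINGS]
    # Both contents continuously at the same timing.
    schedules += [({t: [PHRASE, COUNT]}, t) for t in TIMINGS]
    # Both contents separately: phrase at an earlier timing than the countdown.
    for early, late in combinations(TIMINGS, 2):
        schedules.append(({early: [PHRASE], late: [COUNT]}, late))
    return [(i + 1, sched, trig) for i, (sched, trig) in enumerate(schedules)]


for number, schedule, trigger in enumerate_patterns():
    print(number, schedule or "no guidance", "-> armrest at", trigger)
```

The separate case contributes only three patterns because `combinations` yields the ordered timing pairs (sitting, beginning), (sitting, end), and (beginning, end).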
Ten participants performed sit-to-stand motions using the assistive robot. The participants were of both genders, 22-24 years old, 164-175 cm tall, and weighed 50-63 kg. None of the participants had physical disabilities. Informed consent was obtained from all participants prior to the experiments. The basic information of the participants is listed in Table 2. Although the actual users of care robots are elderly people, we assumed that their use could be simulated by healthy young people. Because all participants were Japanese, "let's stand up" and "3, 2, 1" were spoken in Japanese. The order of the guidance patterns was randomized for each participant.
Fig. 5 Guidance and support procedure (pattern 2). a User is sitting; b armrest starts rising at the beginning of leaning.
The System Usability Scale (SUS) [18] was used for evaluation, and participants wrote a brief comment on each pattern. SUS is a method for measuring the usability of systems. It consists of ten statements, and for each guidance pattern the participants rated their degree of agreement or disagreement with each statement on a 5-point scale. The resulting score ranges from 0 to 100; a score of around 70 or higher indicates that the system is useful [19]. SUS was adopted to measure system usability, and it can also serve as an indicator of anxiety, since user anxiety affects the usability of the system. The users' brief comments were also collected to discuss anxiety. The results are presented in Fig. 7. The blue bars represent the SUS scores for each guidance pattern, and the error bars indicate the standard deviations. As shown in Fig. 7, the SUS scores of patterns 1-3 are very low, indicating that the patterns without verbal guidance are not useful. Users require a timing representation during or after leaning, and the patterns with both types of guidance are the most useful.
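For reference, the standard SUS scoring rule [18] can be written out as follows. This is a sketch of the usual published procedure, not code from the study: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum (0-40) is multiplied by 2.5 to map it onto 0-100. The function name `sus_score` is our own.

```python
def sus_score(responses: list[int]) -> float:
    """Compute a System Usability Scale score (0-100) from ten 1-5 responses."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

For example, answering 4 to every odd item and 2 to every even item gives 75.0, just above the threshold of about 70 mentioned above. Per-pattern means and standard deviations such as those in Fig. 7 could then be computed with Python's `statistics.mean` and `statistics.stdev` (the sample standard deviation; which variant the authors used is not stated).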
The brief comments show the same tendency as the SUS results. For the pattern without guidance, several participants felt that the system could be used if the user prepared to stand up, whereas almost all participants felt anxious or thought that users could not prepare to stand up. The guidance "let's stand up" helps users prepare to stand up; however, its meaning is harder to understand. When only "let's stand up" is used, users cannot understand the timing appropriately. By contrast, "3, 2, 1" is useful for understanding the timing, but it is too short when used as the only guidance. When both types of guidance are used, users can prepare to stand up and also understand the timing. Although the guidance is slightly longer in this case, the benefit of easier understanding is considered significant for elderly care.
The comments also indicate that users perceived verbal guidance given at the beginning and at the end of leaning as similar. This tendency can also be seen in the SUS results, especially for the patterns that adopt both types of guidance (patterns 10-15). This is presumably because the leaning motion does not take much time. It indicates that the system can function even if the guidance timing is not strictly accurate.

Experiment for imperfect estimation system
In this section, we validate the proposed method with the actual system, which estimates the user's state almost, but not perfectly, accurately. We adopted an estimation method that we previously proposed in [6]. A more accurate system could be developed using other methods for robot control, for example, admittance control or impedance control. However, accurately detecting the timing of user state transitions is difficult even with such methods. We therefore validated whether verbal guidance is effective for robot systems based on imperfect estimation. Based on the SUS scores and participants' comments from the previous experiment, patterns 11, 12, 14, and 15 were adopted for the next experiment. Because the previous results showed that participants perceived guidance at the beginning and at the end of the leaning motion as similar, we unified these two timings as "leaning" and set the following two patterns:
A. Leaning: "let's stand up" and "3, 2, 1"
B. Sitting: "let's stand up"; leaning: "3, 2, 1"
The estimation method was implemented in the developed robot. Using a distance sensor and pressure sensors on the armrest, candidates for the participant's CoM can be calculated as explained in [6,20]. Using the method proposed in [6], the robot system classifies the user state into one of two states, sitting only or sitting with the upper body leaning; the state is estimated using a support vector machine (SVM).
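The two-state classification can be illustrated with a linear soft-margin SVM. The sketch below is not the method of [6]: the kernel, features, and training procedure used there are not given in this section, so we assume two already-standardized features (a forward CoM offset and an armrest load, both hypothetical), invented training samples, and a plain full-batch subgradient solver written in pure Python. In practice a library implementation such as scikit-learn's `sklearn.svm.SVC` would more likely be used.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM trained by full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    Labels must be +1 (leaning) or -1 (sitting only).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin: hinge active
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(w, b, x):
    return "leaning" if dot(w, x) + b >= 0 else "sitting"


# Hypothetical standardized features: (forward CoM offset, armrest load)
X = [(-1.2, -0.8), (-0.9, -1.1), (-1.0, -1.3), (-0.7, -0.9),
     (0.9, 1.0), (1.1, 0.7), (1.3, 1.2), (0.8, 1.1)]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([classify(w, b, x) for x in X])
```

Because leaning is a continuous motion, any such classifier has a gray zone near the decision boundary, which is why the transition time it reports can differ from the true start of leaning.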
An experiment was conducted to validate the estimation. Ten participants performed leaning and sit-to-stand motions using the assistive robot. The participants were the same as in the previous experiment. Informed consent was obtained from all participants before the experiments.
The start and end times of leaning and the estimated state transition times are listed in Table 3. As an example, the time variation of the estimated state for participant J is shown in Fig. 8.
As shown in Table 3 and Fig. 8, the system can detect a user's leaning motion while the user is leaning. However, the leaning motion is continuous, and there is no distinct boundary between the two states (sitting only and sitting while leaning); therefore, the timing of the state transition is difficult to detect accurately. The system performance should be validated to determine whether the estimation is sufficient for the target function, i.e., supporting the user in standing up. We considered that the performance could be sufficient for support if knowledge representation is provided, because the user can adjust their timing using the represented knowledge if the representation is appropriate.
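The comparison in Table 3 between the leaning interval and the estimated transition time can be sketched as follows. Assumptions: the estimator outputs one label per sample at a fixed period `dt` (0 = sitting only, 1 = leaning); the optional `min_run` debouncing requirement is our own addition rather than part of the described method; and the sample sequence and leaning interval are invented for illustration. Both function names are hypothetical.

```python
def first_transition_time(states, dt, min_run=1):
    """Time (s) at which the estimate first becomes 'leaning' (1) and stays
    so for `min_run` consecutive samples; None if it never does."""
    run = 0
    for i, s in enumerate(states):
        run = run + 1 if s == 1 else 0  # length of the current run of 1s
        if run == min_run:
            return (i - min_run + 1) * dt  # time of the run's first sample
    return None


def detected_while_leaning(t, lean_start, lean_end):
    """True if an estimated transition time lies inside the leaning interval."""
    return t is not None and lean_start <= t <= lean_end


# Hypothetical 10 Hz estimate with a one-sample glitch at 1.2 s
states = [0] * 12 + [1, 0] + [1] * 10
t = first_transition_time(states, dt=0.1, min_run=3)
print(round(t, 2), detected_while_leaning(t, lean_start=1.0, lean_end=2.0))
```

Without debouncing (`min_run=1`) the glitch at 1.2 s would be reported as the transition; either way the detected time falls inside the leaning interval, which is the property the text relies on.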
To simulate actual use, various types of chairs and a bed were placed in the experimental area, as shown in Fig. 9. The heights of the chairs and the bed are as follows:
• Chair 1: 600 mm
• Chair 2: 500 mm
• Chair 3: 460 mm
• Chair 4: 430 mm
• Chair 5: 400 mm
• Bed: 390 mm
Pictures of the chairs are shown in Fig. 10. Chair 3 is the same chair as used in the previous experiments.
The participants first stood up from a chair using the robot. They then walked to another chair and sat down with the help of the robot. They repeated this procedure for all chairs and the bed. The robot provided verbal guidance and support when the user stood up. To sit down, the user pressed the switch, and the armrest moved down as the user sat. While the user was walking, the robot provided no assistance and acted as a non-robotic walker. The order of the guidance patterns and chairs was randomized for each participant. The participants were the same as in the previous experiments, and informed consent was obtained from all participants before the experiments. The experimental setup is shown in Fig. 11, and an overview of the experiment is shown in Additional file 1.
The results are shown in Fig. 12. They confirm that the proposed method works even when the estimation is not strictly accurate. The SUS scores for chairs with seat heights similar to that of chair 3 were high, whereas the scores were low for the low-seat chairs and the bed. The main reason is that the hardware is not suitable for chairs with very low seats. These results suggest that verbal guidance makes the system more useful even if the estimation is not accurate.
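The randomized ordering can be made reproducible by seeding a generator per participant. The sketch assumes, which the text does not state explicitly, that each guidance pattern (A and B) was used for a full round of all six seats and that the seats were reshuffled for each round; `session_plan` and the seeding scheme are hypothetical.

```python
import random

GUIDANCE_PATTERNS = ("A", "B")
SEATS = ("Chair 1", "Chair 2", "Chair 3", "Chair 4", "Chair 5", "Bed")


def session_plan(participant_id, seed=0):
    """Randomized order of guidance patterns and, within each pattern, of
    the seats visited. Seeding per participant makes the plan reproducible."""
    rng = random.Random(f"{seed}:{participant_id}")
    patterns = list(GUIDANCE_PATTERNS)
    rng.shuffle(patterns)
    plan = []
    for pattern in patterns:
        seats = list(SEATS)
        rng.shuffle(seats)  # independent seat order for each pattern
        plan.append((pattern, seats))
    return plan


for pattern, seats in session_plan("J"):
    print(pattern, " -> ".join(seats))
```

Storing the seed alongside the results lets the exact presentation order be regenerated when analyzing order effects later.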

Results and discussion
In this section, we discuss the results of the experiments presented above. The results of the experiment in the section "Validation experiment of verbal guidance for the system with accurate estimation" confirm that verbal guidance is needed for usability even if the estimation is highly accurate, and that presenting both the content and the timing of the action is effective. From the participants' comments, we concluded that the participants perceived no notable difference between verbal guidance given at the beginning and at the end of leaning. The appropriate verbal guidance patterns were determined from the SUS scores and participants' comments, and two guidance patterns were adopted for the imperfect estimation experiment in the previous section. In that experiment, several types of chairs and a bed were used to simulate actual use. The SUS results confirmed the effectiveness of verbal guidance for the imperfect estimation system.
As shown in Fig. 7, the SUS scores for the patterns without guidance were low even though the estimation was accurate. Participants commented that they were scared when the robot moved without any guidance, even if the timing was appropriate and they knew when the robot would move. These results suggest that accurate estimation does not by itself lead to good performance and usability. For p-HRI, it is important that users know the robot's actions and can adjust their own actions accordingly.
The SUS scores of the imperfect estimation system were similar to those in the accurate estimation experiment. Even though the estimation was not accurate, verbal guidance resulted in high usability of the robot. Users could stand up well with the help of the guidance and could cope with failures because the verbal guidance told them what the robot would do. Thus, verbal guidance is expected to improve both performance and safety.
In this paper, we focused on the developed sit-to-stand support robot; however, the proposed method can also be applied to other robot systems. For example, it can easily be applied to other motion support systems. A walking support system that can predict a user's fall could stop its wheels while announcing the estimated user state and telling the user that it is about to stop.
Robots that make decisions based on estimation should communicate those decisions to humans. In p-HRI or other cooperation between robots and humans, cooperative tasks can be carried out by presenting the content and timing of the actions. For example, if robots designed to carry objects together with humans provide verbal guidance, humans can cooperate with them in a similar way as with other humans.

Conclusion
In this study, we analyzed the importance of estimation performance and of the content and timing of the presented information, and pointed out that robots that act based on estimations should be accountable to their users, which can make the system useful even if the estimations are not strictly accurate.
We developed a robot system that supports a user's standing motion by raising its armrest when the user leans on it. Guidance encouraging the user to stand up and a representation of the robot's action timing were adopted. Because verbal interaction is important in nursing care, the knowledge is represented by sound.
Experiments were conducted to validate the proposed method. The results confirmed that the system without guidance causes anxiety and is not useful even if the estimation is accurate. We also identified appropriate guidance patterns from the experimental results and confirmed that systems based on imperfect estimation can be useful with verbal knowledge representation. Moreover, the proposed method can be applied to other robot systems.
Tests with elderly people should be conducted in future work. The experiments were conducted with young