Learn and Transfer Knowledge of Preferred Assistance Strategies in Semi-Autonomous Telemanipulation
Lingfeng Tao*, Michael Bowman*, Xu Zhou*, Jiucai Zhang^, and Xiaoli Zhang*
Abstract — Enabling robots to provide effective assistance while still accommodating the operator’s commands for telemanipulation of an object is very challenging, because the robot’s assistive actions are not always intuitive for human operators, and human behaviors and preferences are sometimes ambiguous for the robot to interpret. Although various assistance approaches are being developed to improve control quality from different optimization perspectives, it remains an open problem to determine the appropriate approach that satisfies both the fine motion constraints of the telemanipulation task and the preference of the operator. To address these problems, we developed a novel preference-aware assistance knowledge learning approach. An assistance preference model learns what assistance is preferred by a human, and a stagewise model updating method ensures learning stability while dealing with the ambiguity of human preference data. Such preference-aware assistance knowledge enables a teleoperated robot hand to provide more active yet preferred assistance toward manipulation success. We also developed knowledge transfer methods that transfer the preference knowledge across different robot hand structures to avoid extensive robot-specific training. Experiments were conducted in which a 3-finger hand and a 2-finger hand were telemanipulated, respectively, to use, move, and hand over a cup. Results demonstrated that the methods enabled the robots to effectively learn the preference knowledge and allowed knowledge transfer between robots with less training effort.
Key words — Semi-autonomous telemanipulation; preference-aware assistance; learn from ambiguous human data; preference knowledge transfer

I. INTRODUCTION
Telemanipulation is a branch of teleoperation [1] in which a human operator can remotely manipulate objects using the teleoperated robot’s hands. Unlike other teleoperation branches such as teleapproaching and telefollowing, telemanipulation tasks require fine motion adjustments to grasp objects at specific angles or at specific points and to apply force in a particular manner. For example, remotely controlling a robot hand to plug a phone charger into a wall outlet involves strict motion constraints to prevent the robot finger from blocking the charger plug. Applications of telemanipulation include industrial inspection and repair, space exploration, search and rescue, and assistive living robotics. Most current telemanipulation approaches are master-slave control, in which an operator’s hands give motion commands through data gloves, optical tracking, or reflective markers, and a robot hand follows. Such approaches rely on the operator’s cognitive spatial transformation reasoning and fine motion tuning to overcome the sense of disembodiment and the physical discrepancy [2][3] between the operator’s hand and the robot’s hand and thereby satisfy the subtle motion constraints for task success. Indirect manipulation and visualization for complex telemanipulation tasks can impose a large physical and mental burden on the operator, increasing failure rates and user frustration [4]. It usually takes hundreds of hours to adequately train an operator for a specific telemanipulation task and robot [5]. Current telemanipulation approaches still rely on kinematic mapping between a human hand and a robot hand, whose performance is affected by the physical discrepancy (e.g., telemanipulating a 3-finger robot hand) and by a lack of consideration of task requirements and constraints. Existing efforts add passive constraints such as envelopes [6] and virtual fixtures [7] to the operation environment, which cannot achieve subtle motion adjustment in the manipulation process.
Recent research has demonstrated that robots can blend human input with robot action by inferring human intent, so that more active assistance can be provided in teleoperation [8]-[10]. However, these methods have only been implemented in teleapproaching using a robot arm with trajectory assistance, not in controlling a robot hand for telemanipulation. Methods that enable robots to actively provide assistance to telemanipulate objects have not received enough research attention. To satisfy the fine motion requirements in telemanipulation, a robot also needs to understand the operator and provide active assistance yet still accommodate the operator’s commands; that is, semi-autonomous telemanipulation is necessary. Toward semi-autonomous telemanipulation, a critical problem to overcome is how to enable the robot to provide assistance that is preferred by the operator. Although active assistance is essential, it is unknown how much assistance is appropriate to balance task success with the operator’s feeling of being in control. Due to the difference in hand structures, some motion assistance from the robot may surprise the operator with counterintuitive movements, which could introduce more burden on the human to correct the actions, reduce the operator’s sense of system control, and consequently increase the operator’s resistance to using the robot.

This material is based on work supported by the US NSF under grant 1652454. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation. *L. Tao, M. Bowman, X. Zhou, and X. Zhang are with the Intelligent Robotics and Systems Lab, Colorado School of Mines, 1500 Illinois St, Golden, CO 80401 USA (e-mail: [email protected], [email protected], [email protected], [email protected]). ^J. Zhang is with the GAC R&D Center Silicon Valley, Sunnyvale, CA 94085 USA (e-mail: [email protected]).
Although researchers can develop different control methods with the goal of improving operation quality [11], the problem remains of determining which control method is preferred by a human operator. To overcome these deficiencies, the robot needs to be equipped with preferred assistance knowledge to understand operators’ preferences among different assistance strategies. However, this understanding of human-preferred assistance in telemanipulation has rarely been studied. Learning preference requires human subject experiments, which normally present poor data quality caused by the ambiguity and uncertainty of human intent [12][13]. One issue for preference modeling is that the inherent data discrepancy caused by human ambiguity, together with the relatively small size of human datasets, may cause the training process to suffer from performance oscillation, convergence difficulty, overfitting, and convergence to a local solution. Additionally, derived or learned models are mostly user-specific and robot-specific and cannot adapt to new users or robots. The problem remains of how robots can quickly learn useful assistance knowledge from the teleoperation scenario with human involvement and transfer the knowledge to other robots with less training effort. This paper provides a methodology for robots to learn preferred assistance knowledge in the form of a predicted rank of the available assistive robot control methods, which empowers a robot to choose the preferred method for flexible grasp generation that accommodates the operator’s motion commands while autonomously regulating its pose to satisfy the operator’s preferences (Fig. 1). Such preferred assistance has the potential to reduce operator frustration and help build better human-robot cooperation in telemanipulation tasks. The main contributions are as follows: (1)
Preference-model-enabled assistance.
A preference model is developed to learn the preferred assistance knowledge, where the input is the robot grasp configurations generated by the control strategies and the output is the predicted ranking. An interpretation layer is designed to convert the raw input data to preference-related features based on domain criteria such as mimicking operator motion, maximizing task success, or minimizing travel distance. The preference model enables a robot to provide active assistance that is preferred by human operators. (2)
Stagewise Preference Model Updating (SMU) methods.
To improve stability while learning the preference model, we develop SMU methods with three optimization objectives, prediction accuracy (SMUPA), prediction error (SMUPE), and weight tendency (SMUWT), that update the model from model candidates stage by stage during the training process. These methods increase the stability of preference learning, reduce the number of training iterations, and improve the performance of the preference-aware models. (3)

Fig. 1. The framework of modeling assistance knowledge and transferring knowledge between different robots. In a telemanipulation task, operator commands are processed by different control strategies to generate robot grasp poses, which are converted to preference-related features that humans may use to perceive the quality of robot assistance, such as mimicking human motion, following human intent, and optimizing kinematics. A preference model is trained with the Stagewise Model Updating methods to overcome the inherent data imperfection caused by human ambiguity and to learn the operator’s preference while controlling a 3-finger robot hand. The learned knowledge is then transferred to a 2-finger hand with the modified knowledge transfer methods. The transferred model can be refined with much less training to achieve performance equivalent to a preference model trained specifically for the target robot.

Cross-robot knowledge transfer methods.
To avoid extensive robot-specific training, proactive knowledge transfer methods are developed to extend the learned model across different robot hands. With the rank-based prediction of the preference model, one possibility is that the operator may rank the control methods differently for different robot structures. The goal of our knowledge transfer methods is to learn the accurate rank for the new robot structure with less training effort.

II. RELATED WORKS
1. Development in Telemanipulation
Current telemanipulation research focuses mainly on analytical kinematic mapping methods based on the structures of the operator and robot [14]-[16]. Data-driven methods are widely used in kinematic mapping for 5-finger robots, such as end-to-end mapping for a humanoid robot hand, and these methods perform well in mimicking human motion [17]. There are few applications for robotic end-effectors that differ from the human hand structure. Recent research indicates that bilateral telemanipulation that considers the specific kinematics of the devices involved, taking advantage of a virtual intermediate object and forward and backward mapping algorithms, can generate a telemanipulation relation in asymmetric mapping [18]. But these methods are still purely kinematic and follow the perspective of the operator command; the robot lacks the ability to cooperate with and proactively aid the operator. For object manipulation tasks, task complexity and the additional requirement of fine motion operation require the robot to understand the operator’s intent for task completion and preference for the level of assistance, and to autonomously regulate its configuration to ensure task success [19]-[22]. Reducing both the operator’s workload and the difficulty of robot control through robot inference of operator intent is a recent topic in teleoperation, particularly in teleapproaching tasks. Research has demonstrated that in a target-approaching process, the robot agent can infer the target location by observing the operator’s motion trajectory and provide motion assistance in approaching the target using linear blending strategies [23][24], virtual boundaries [25][26], and force guidance [27]. The bounded-memory adaptation model was used in human-robot mutual adaptation to predict the intent of an operator and maintain the operator’s trust in teleapproaching a target object [28].
However, all these works focused only on teleapproaching an object, not telemanipulating an object. For an object manipulation task, end-effector motion trajectory blending methods that treat the end-effector as a single point are not effective for robot finger control, because they rarely consider the fine motion constraints that are critical for the success of manipulation tasks. These subtleties in motion for object manipulation are also difficult to replicate with robotic hands because of their physical differences from human hands. We envision that different strategies will be developed to overcome these challenges in telemanipulation tasks. In practice, however, operators may have preferences among these strategies. The missing component is a model that can handle control strategy selection based on preferences. Therefore, there is a need to build preference models for telemanipulation.
2. Development in Preference Modeling
Modeling human preference is a trending topic in robotics research, especially in human-robot cooperation applications [29] such as industrial assembly, hybrid driving, and manipulation, in which a human and a robot need to cooperate with each other to complete a task. Human preference is a reference for understanding how the action of the robot is perceived as effective or intuitive by the operator. A preference model helps to optimize the robot action and increase the trust and cooperation quality between humans and robots. Studies such as [30] present a mathematical preference model based on probabilistic planning and game-theoretic algorithms to help the robot understand and adapt to human preference in a leader-follower manner. The preference can also be learned from online human trajectory demonstration for mobile manipulators such as assembly line robots [31]. Machine learning methods have recently gained popularity. The approach in [32] formulates the model as a Markov Decision Process (MDP) and uses Inverse Reinforcement Learning (IRL) to learn human preference as a reward function. A similar approach in [33] also uses MDP modeling but learns the preference with regression-based and gradient-based methods. The sources of data have also expanded from human demonstration to subjective feedback such as natural-language-facilitated preference learning [34]. The above approaches show that researchers have put great effort into the preference modeling problem. However, these methods are applied in settings where both humans and robots can directly interact with the environment and have the ability to finish the task individually. The robot can autonomously execute actions while taking human preference as a soft constraint. In telemanipulation, the unique problem is that the action of the robot is controlled by the human operator’s command, which is a hard constraint that the robot must obey.
The robot can only semi-autonomously assist, balancing control performance against the operator’s feeling of being in control. A preference model in telemanipulation is essential to help the robot provide appropriate assistance. But, to the best of our knowledge, preference modeling in telemanipulation has rarely been reported.

III. ASSISTANCE PREFERENCE MODEL
The human-preferred assistance is defined as assistance that is intuitive to human operators yet effective toward task success. The input of the preference model is characteristic raw data of robot hand configurations generated by different control strategies. These raw configuration data are converted to preference-related features by the model. The output of the model is based on human subjects’ rankings of the preferred control strategies. The robot can use the learned model to determine the preferred control strategy (i.e., the highest ranked) to provide active assistance and achieve semi-autonomous telemanipulation. We define control strategies C_n ∈ 𝒞, where n denotes the strategy type. A control strategy type contains a group of controllers with different optimization criteria. For example, a strategy such as mimicking human motion can contain controllers that emphasize fingertip motion mimicking, joint motion mimicking, or both. The optimization criteria of the controllers are used to design quantitative preference-related features O_m ∈ 𝒪 that operators may use to perceive the quality of robot assistance. The raw grasp motion generated by each controller C_n can be interpreted as preference-related features C_n → [O_1, O_2, …, O_M]. Table I shows an example list of potential preference-related features and corresponding controllers based on domain knowledge and the literature.

IV. STAGEWISE PREFERENCE MODEL UPDATING METHOD
A feed-forward neural network with sigmoid hidden neurons and linear output neurons is adopted for the preference model. Neural networks (NNs) have been successful in supervised learning at mapping the relation between paired input-output training data [40]. A novel training method named stagewise model updating (SMU) stabilizes the training process and obtains the optimal model in a short training process. A stand-alone model M_s is used during the training process and updated stage by stage. In each training episode, a snapshot model M_s^j is saved every N iterations and then evaluated with the defined metric. The evaluation metrics are prediction accuracy (SMUPA)-based, prediction error (SMUPE)-based, and weight tendency (SMUWT)-based. At the beginning of the next training stage, the current model is updated with the best-performing snapshot model.
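As an illustration, the stagewise updating procedure described above can be sketched as follows. This is a minimal sketch with placeholder names (`stagewise_update`, `train_step`, `evaluate` are ours); the actual model, training step, and evaluation metric are task-specific, and the ε-updating policy of the update mechanism is included for completeness.

```python
import random

def stagewise_update(model, train_step, evaluate, n_stages,
                     iters_per_stage, snapshot_every, epsilon=0.0):
    """Stagewise model updating (SMU) sketch.

    train_step(model) -> model : one training iteration (placeholder)
    evaluate(model)   -> float : SMUPA/SMUPE/SMUWT-style metric, higher = better
    With probability epsilon the current model is updated with a random
    snapshot (exploration); otherwise with the best snapshot (exploitation).
    """
    current = model
    for _ in range(n_stages):
        m = current                 # stand-alone model starts from current model
        pool = []                   # model candidates pool
        for it in range(1, iters_per_stage + 1):
            m = train_step(m)
            if it % snapshot_every == 0:
                pool.append(m)      # save snapshot M_s^j
        if pool:
            if random.random() < epsilon:
                current = random.choice(pool)
            else:
                current = max(pool, key=evaluate)
    return current

# Toy usage: the "model" is a single number, training nudges it upward,
# and the metric prefers values close to 5.
best = stagewise_update(0, lambda m: m + 1, lambda m: -abs(m - 5),
                        n_stages=1, iters_per_stage=10, snapshot_every=2)
```

With ε = 0 the loop is purely exploitative, so each stage simply keeps the best-scoring snapshot from its candidate pool.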
1. SMU Evaluation Metrics
SMUPA:
Improving the prediction accuracy of the preference model is the priority during training. In this metric, the performance of the snapshot models is validated by checking their prediction accuracy based on the reranking of the raw prediction for each operator command, where the evaluation metric is computed as

A = Σ_{τ=1}^{Γ} C( s(M_s^j(τ)) − r_τ )   (1)

where s(M_s^j(τ)) is the reranked prediction of the controllers for one operator command and r_τ is the true rank. The function C compares the ranks: C = 1 if s(M_s^j(τ)) = r_τ, and C = 0 otherwise. Γ is the number of testing data points.
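As an illustrative sketch of Eq. (1), the rerank-and-compare accuracy can be computed as below. The function name and data layout are ours: scores are per-strategy raw model outputs for one command, and ranks use 1 for the most preferred strategy.

```python
import numpy as np

def smupa_accuracy(predicted_scores, true_ranks):
    """Eq. (1) sketch: rerank the raw predictions for each operator command
    and count exact matches with the human rank labels.

    predicted_scores: per-command lists of raw scores, one per control
    strategy (higher = more preferred); true_ranks: matching rank labels.
    """
    correct = 0
    for scores, r in zip(predicted_scores, true_ranks):
        order = np.argsort(np.asarray(scores))[::-1]    # best score first
        rank = np.empty(len(scores), dtype=int)
        rank[order] = np.arange(1, len(scores) + 1)     # rerank s(M_s^j(tau))
        correct += int(np.array_equal(rank, np.asarray(r)))  # C = 1 iff equal
    return correct    # A in Eq. (1); divide by Gamma for an accuracy rate
```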
TABLE I: POTENTIAL PREFERENCE-RELATED FEATURES AND CORRESPONDING CONTROLLERS

Intent-based control [35]: O = (1/2) Σ_i (P_i(R) − T_i). T is the inferred human task intent; P_i(R) is the probability distribution of each task given robot pose R.
Kinematics optimization [36]: O = Σ_i λ_i (R_i − H_i). R and H are robot and human kinematics, including palm and finger configurations; λ_i is the KL divergence for feature i.
Joint-wise optimization [37]: O = Σ_i (ρ⃗_i − φ⃗_i). ρ⃗ and φ⃗ are robot and human joint configurations (joint mapping is needed for discrepant structures).
Fingertip-wise optimization [38]: O = Σ_i ‖X⃗_i − Y⃗_i‖. X⃗ and Y⃗ are robot and human fingertip locations (finger mapping is needed for discrepant structures).
Vision-based optimization [39]: O = Σ_{i=1}^{N} ψ_i (I_{R,i} − I_{H,i}). N is the number of pixels; I_R and I_H are processed pose images from robot and human; ψ_i are the weights.

SMUPE:
This metric also focuses on improving the prediction accuracy of the model.
The difference is that the performance is validated by comparing the cumulative RMS error between the actual rank and the raw prediction, computed as

E = (1/Γ) Σ_{τ=1}^{Γ} √( (M_s^j(τ) − r_τ)² )   (2)

SMUWT:
This metric assumes that the weights of the neuron should not change dramatically if the training process is stable and effective. The sudden change of the weights usually means the input data are insufficient and have large variances. To avoid the sudden change in weights and maintain a smooth performance increase, the metric is designed to monitor the tendency of the weights to change. When the change in weight is greater than a threshold, the SMUWT metric terminates the updates, recalls the last normal snapshot, and continues the training. The KL divergence [41] can determine how the weights change between the weights’ distributions in the current model and snapshot model, which is calculated by
𝒦ℒ = Σ_{k=1}^{K} P(w_k) log( P(w_k) / P(w′_k) )   (3)

where P(w_k) is the network weight distribution of the last updated model, P(w′_k) is the network weight distribution of the current model, and k is the index of the neuron.
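A minimal numerical sketch of Eq. (3) follows; the histogram-based estimate of the weight distributions, the bin count, and the epsilon smoothing are our assumptions, since the paper does not specify how P(w) is estimated.

```python
import numpy as np

def weight_kl_divergence(w_old, w_new, n_bins=20, eps=1e-10):
    """KL divergence between the weight distributions of two models (Eq. 3).

    Weights are flattened and binned into a shared histogram; eps avoids
    log-of-zero in empty bins.
    """
    w_old = np.ravel(w_old)
    w_new = np.ravel(w_new)
    lo = min(w_old.min(), w_new.min())
    hi = max(w_old.max(), w_new.max())
    p, _ = np.histogram(w_old, bins=n_bins, range=(lo, hi))
    q, _ = np.histogram(w_new, bins=n_bins, range=(lo, hi))
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))
```

Identical weight sets give a divergence of zero, and the value grows as the current model's weights drift away from the last updated model's, which is what the SMUWT threshold monitors.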
2. Update Mechanism
A threshold 𝒯 is set to keep the training stable; it is tuned to reach the maximum training performance. When the KL divergence value exceeds 𝒯, the current model is replaced with the last saved safe model and a new training trial starts. When the KL divergence value does not exceed the threshold, the performance of the model may still decrease, so a validation step is used to avoid performance degradation. This validation checks the primary goal of prediction accuracy: if the prediction accuracy decreases, the current model is still replaced with the last saved safe model. Because this method needs a reference model to calculate the KL divergence of the weight distribution, the algorithm kicks in at the second iteration. In practice, the updates follow an ε-updating policy that updates the stand-alone model with a randomly selected snapshot with probability ε for exploration and updates the current model with the best-performing snapshot with probability 1 − ε for exploitation. The three strategies share a similar framework (Fig. 2) but follow different updating laws.

V. TRANSFER PREFERRED ASSISTANCE KNOWLEDGE BETWEEN DIFFERENT ROBOTS
Conventional telemanipulation methods are robot-specific and task-specific, and generalizing them requires extensive effort. Our preference model, which converts the raw data to preference-related features, establishes a foundation for knowledge transfer across different robots. However, directly transferring the preference model from one robot to another may not be a sound approach, because the physical structure discrepancy between different robot hands changes the value ranges of the feature spaces. For instance, a 2-finger hand may share a criterion such as task completion with a 3-finger hand because their structures are considerably different from a human hand, whereas a 5-finger hand and a 4-finger hand may focus on mimicking human motion because their structures are more similar to a human hand.
1. Positive Weights Transfer and Negative Weights Transfer
The training process of the NN approximates the learning mechanism of biological neurons. In the learned NN of preferred assistance knowledge, the weights of each neuron record the knowledge that positively or negatively affects the human preference
Fig. 2. The procedure of the SMU strategies. First, the stand-alone model M_s is initialized with the weights of the current model M_c. During training, a snapshot model M_s^j is saved to the model candidates pool every N iterations. A designed evaluation metric is then used to evaluate the snapshot models in the pool; the current model M_c is updated with a randomly selected snapshot model with probability ε for exploration. Otherwise, the current model is updated with the best-performing model M_S* for exploitation.
in the control strategies. When transferring knowledge between robots, it is in fact the weights storing the knowledge that are transferred. Inspired by the development of rectified linear unit (ReLU) layers [42], which is built on the neuroscience observation that neurons control their firing rate based on the total input current arising from incoming signals at synapses [43], a knowledge transfer method is developed that modifies the weights with a positive rectifier function to transfer positive weights (TrPW) only, or with a negative rectifier function to transfer negative weights (TrNW) only. The rectifier function allows the transferred knowledge to capture a sparse representation, which is naturally suitable for human preference learning with sparse data.
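A minimal sketch of the weight transfer operations is given below. The function names `transfer_weights` and `enhance_weights` are ours; the first implements the TrPW/TrNW rectifiers, and the second follows our reading of the enhanced transfer formula, Eq. (4).

```python
import numpy as np

def transfer_weights(weights, mode):
    """Rectified weight transfer between robot hands (TrPW / TrNW sketch).

    'positive' keeps only positive weights (TrPW); 'negative' keeps only
    negative weights (TrNW). Zeroed entries are re-learned when the
    transferred model is refined on the target robot.
    """
    w = np.asarray(weights, dtype=float)
    if mode == "positive":
        return np.maximum(w, 0.0)    # positive rectifier
    if mode == "negative":
        return np.minimum(w, 0.0)    # negative rectifier
    raise ValueError("mode must be 'positive' or 'negative'")

def enhance_weights(weights, alpha):
    """Enhanced weights transfer (TrEW) per Eq. (4): scale each weight and
    subtract alpha times the layer's mean weight."""
    w = np.asarray(weights, dtype=float)
    return (1.0 - alpha) * w - alpha * w.mean()
```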
2. Enhanced Weights Transfer
The magnitude of the NN weights represents the contribution of an attribute of the input to the output. We hypothesize that the learned weights in the preference model are consistently distributed when transferring knowledge between similar robot hands. Thus, another knowledge transfer method, called enhanced weights transfer (TrEW), is developed to enhance the weight distributions. The weights are proportionally enhanced according to their distance to the average magnitude of all weights within the corresponding hidden layer. The enhanced weight is calculated by

θ′ = (1 − α) θ − (α/L) Σ_{l=1}^{L} θ_l   (4)

where θ is the weight, α is the gain of enhancement, and L is the number of weights in the layer.

VI. EXPERIMENTS
1. Experiment Setup
Three control strategies were designed with selected preference-related features listed in Table I (details are presented in the appendix). For simplicity, each strategy contains only one controller. The first is an intent-based strategy, designed to enable the robot system to understand the operator’s task intent by reasoning about the operator’s motion and then generate its own motion to accomplish the task without explicit consideration of following the operator’s motion. This strategy enables the robot to obey the task constraints without being disturbed by the physical discrepancy between the human hand and its own hand. For example, if the robot’s task is to hold a cup for an individual to drink water, the robot’s hand cannot cover the top of the cup. Also, for a robot to hand the cup to an individual, it is preferable to have the handle pointing out. The second is a mimic-based strategy that makes the robot strictly follow the operator’s motion commands using a fixed kinematic mapping policy. As motion commands may lie outside the bounds of the robot’s capability, this strategy forces the robot to reach its physical limit without attempting to use its own domain knowledge to explore a better alternative in these situations. The third is an intent-mimic hybrid strategy, which determines the similarity of the operator command features to those known by the robot to find the level of importance they should have in the final grasp configuration. The importance is constructed as a penalty term in the formulation of the intent-based strategy. The new components added to the control system allow the robot to understand which attributes are common between itself and the operator as well as how similar these attributes are. Human-involved experiments (Fig. 3) were designed to collect training data for a robot to learn the preference knowledge and to validate our SMU strategies and knowledge transfer methods.
We chose three principal tasks: Use, Handover, and Move. Each principal task consisted of 18 different human motion commands. For each human motion command, three different robot grasp motions were generated using the three designed control strategies for a 3-finger robot hand. To collect the human preference data, 20 human subjects were asked to rank the generated robot grasps for the 3-finger hand. The order of trials was randomly generated, the subjects were not told how any of the control strategies behaved, and the models were not explicitly marked with formulation names. In total, 1080 trials across 20 evaluators were collected. The SMU strategies were evaluated first to learn the preferred assistance knowledge for the 3-finger hand. The training was limited to 20 trials and each trial had 200 iterations, for a total of 4,000 training iterations. For each trial, a snapshot model was saved every 10 iterations during the training process; 20 models were saved in total. The baseline is a conventional supervised learning method, which trained the model with the same number of trials and iterations but used a continuous training process that randomly chose 70% of the data for training, 15% for validation, and 15% for testing at each epoch. The learned models were transferred to the 2-finger hand to validate our knowledge transfer methods, and then the SMU strategies were used to refine the transferred model.

2. Experiment Evaluation Metrics
The high variance and ambiguity in the preference data collected from the human subjects made it difficult to evaluate the performance of the trained preference model. For instance, different people may prefer different grasp configurations for the same manipulation task; consequently, evaluators may rank them in different orders, causing high variance in the data. For example, one evaluator may have ranked the strategies [1, 2, 3], while another ranked them [2, 1, 3]. To deal with this issue in practice, we evaluated the performance of the learned model in a flexible way. Although a single control strategy does not satisfy all evaluators’ preferences, certain control strategies are preferred over others. In the previous example, strategies 1 and 2 are preferred, and strategy 3 is the least preferred. Thus, we formulate the evaluation criterion as a prediction problem: infer the control strategies with higher ranks when the operator telemanipulates the robot to complete a specific task. Instead of a winner-takes-all criterion, the model focuses not only on learning the preferred control strategies but also on understanding the least-preferred control strategies. This is useful knowledge for a robot as it attempts to understand human preferences and provide assistance that is aligned with those preferences. For example, a robot can provide the most preferred or sub-preferred choice and avoid the least-preferred choices.

VII. RESULTS
1. Assistance Preference Modeling
Table II row 1 shows the prediction accuracy of the learned model. The average prediction accuracy for the 3-finger robot is 86.5%. Across the models for each principal task, the highest prediction accuracy is 88.3% and the lowest is 84.5%, which shows the consistency and feasibility of our methods across different tasks.
2. Training with Stagewise Model Updating Methods
Fig. 4 shows the training process of the preference model for all principal tasks using the three SMU methods. Each data point represents the performance of the model after one training epoch. σ²_SMU and σ²_Base are the variances of prediction accuracy starting from the second data point. D_SMU and D_Base are the cumulative performance degradations in prediction accuracy relative to the previous epoch. Overall, compared to the baseline method across all tasks, the average performance of the SMUPA method is 77.6% more stable and reduces the performance degradation from 0.7448 to 0.2481; the SMUPE method is 8.5% more stable and reduces the performance degradation from 0.4185 to 0.2905; the SMUWT method is 85.2% more stable and

TABLE II: RESULTS OF MODEL LEARNING AND TRANSFERRING

                            Hand Over   Move    Use     Average
3-Finger (Learned Model)      0.868    0.883   0.845    0.865
2-Finger (Direct Transfer)    0.608    0.690   0.561    0.620
2-Finger (TrPW)               0.642    0.771   0.689    0.701
2-Finger (TrNW)               0.762    0.582   0.779    0.708
2-Finger (TrEW)               0.712    0.711   0.599    0.674
2-Finger (Refined)            0.894    0.872   0.847    0.871
Fig. 3. Human-involved experiment for data collection. 54 commands are generated for principle task: usage, move, handover. A 3-finger hand is teleoperated with three control strategies: intent based, mimic based, intent-mimic combined. 20 evaluators gave rank for the three strategies. In total 1080 trials are collected.
Intent BasedMimic BasedIntent-Mimic Combined 𝑖𝑖 th Choice 𝑗 th Choice 𝑘𝑘 th ChoiceHuman Commands for Task: Use, Move, Handover Teleoperate 3-Finger Gripper Collect Ranking Data 𝑂𝑂 , 𝑂𝑂 ,… 𝑂𝑂 𝑚𝑚𝐶 𝑂𝑂 , 𝑂𝑂 ,… 𝑂𝑂 𝑚𝑚𝐶 𝑂𝑂 , 𝑂𝑂 ,… 𝑂𝑂 𝑚𝑚𝐶 educe the performance degradation from 0.6314 to 0. Specifically, the SMUPA method outperformed the baseline method in all tasks, but still experience performance oscillation in Hand Over and Move tasks. The SMUPE method performed well in Move and Use task but worse than the baseline in Hand Over task due to a massive performance drop at the late training stage. Among these three SMU methods, SMUWT can successfully maintain healthy updates in the neural network weights to avoid a performance drop caused by data disturbance. Even for the Move task where the other two SMU methods failed to keep the training stability and performance gain, the SMUWT method can still maintain the performance increase with stable training. The results of the learned preference model confirm that the operator’s preference relates to the human motion command and the corresponding robot grasp configuration for a specific task. Comparing to the baseline, the average performance of all three SMU methods are 9.2% more stable and experience 47.4% less performance degradation in Hand Over task; 64.5% more stable and experience 67.5% less performance degradation in Move task, 86% more stable and experience 97.7% less performance degradation in Use task. Furthermore, the performance of the Move task is among the highest (accuracy = 0.883), the performance of the Use task is among the lowest (0.845), and the performance of the Hand Over task is in the middle (0.868). The Use task usually has more motion constraints than the other two tasks which may result in a clearer preference rank. However, the habitual differences of humans affect the data quality; for example, some people prefer holding the handle of a cup while drinking, while others prefer holding the body. 
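The two stability metrics reported with Fig. 4 are not given as explicit formulas in the text; the sketch below shows one plausible computation from a per-epoch accuracy curve, assuming σ² is the variance of the accuracy starting from the second data point and D is the sum of all epoch-to-epoch accuracy drops. The example curves are hypothetical, not the paper's data.

```python
from statistics import pvariance

def stability_metrics(accuracy_per_epoch):
    """Return (variance from the 2nd data point, cumulative performance degradation)."""
    sigma_sq = pvariance(accuracy_per_epoch[1:])       # skip the first data point
    # Sum only the drops: epochs where accuracy fell below the previous epoch.
    degradation = sum(max(prev - curr, 0.0)
                      for prev, curr in zip(accuracy_per_epoch,
                                            accuracy_per_epoch[1:]))
    return sigma_sq, degradation

# Hypothetical curves: an oscillating baseline vs. a smoother SMU-style run.
baseline = [0.50, 0.70, 0.55, 0.80, 0.60, 0.85]
smu      = [0.50, 0.65, 0.68, 0.75, 0.78, 0.84]
print(stability_metrics(baseline))
print(stability_metrics(smu))       # monotone curve: degradation is 0
```

Under this reading, "X% more stable" would correspond to the relative reduction in σ² between the baseline and SMU curves.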
These differences in preferences cause higher data variance and lower model performance, which caused the Use task with the baseline method to have the highest training instability and the lowest prediction accuracy. The results showed that the SMU methods effectively handled this data variance and improved the stability of the learning process; the Use task, which had the highest data variance, improved the most, with an 86% stability improvement.

Fig. 4. The training process for each principal task (Hand Over, Move, Use) while learning the preference model with the three stagewise model updating strategies: SMUPA (a–c), SMUPE (d–f), and SMUWT (g–i). σ_SMU and σ_Base are the variances of prediction accuracy starting from the second data point. D_SMU and D_Base are the cumulative performance degradations in prediction accuracy during the training process.
3. Transfer Preference Knowledge
Row 2 of Table II presents the performance obtained by directly transferring the learned model to the 2-finger hand. The average performance of all transferred preference models dropped to 0.62 compared to the original model. Rows 3 to 5 show the prediction accuracy of the transferred model under the different knowledge transfer algorithms. Statistically, the average prediction accuracy of all modified knowledge transfer methods is 0.694, which is 12% higher than direct knowledge transfer, and the proposed methods outperform the direct transfer method in eight out of nine cases. The results for the TrNW and TrPW methods show that the ReLU conversion is effective for transferring sparse information in most cases. The TrEW method had more performance degradation immediately after transfer than the other two methods. One reason is that when training with imperfect data, the contained noise, disturbance, and ambiguity are learned in addition to the preferred-assistance knowledge. Since we cannot identify which weights contain useful knowledge, this method may also amplify the learned imperfections, which may reduce performance. In summary, we identify three main reasons for performance drops when transferring knowledge between different robots: (1) the physical discrepancy between the hand structures changes the values of the feature weights; (2) information is lost during knowledge transfer; and (3) the disturbance contained in the weights is also transferred. Thus, the transferred models need to be refined to recover their performance.
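The exact definitions of the three transfer operators are not reproduced in this section; the sketch below illustrates one plausible reading consistent with the description above, in which TrPW/TrNW keep only positive/negative weights via a ReLU-style conversion (transferring sparse information) and TrEW transfers the overall weight distribution by resampling from its fitted statistics. All operator bodies here are assumptions for illustration.

```python
import numpy as np

def tr_pw(w):
    """TrPW (assumed): keep positive weights, zero out the rest (ReLU-style)."""
    return np.maximum(w, 0.0)

def tr_nw(w):
    """TrNW (assumed): keep negative weights, zero out the rest."""
    return np.minimum(w, 0.0)

def tr_ew(w, rng=np.random.default_rng(0)):
    """TrEW (assumed): resample weights from the empirical mean/std of the
    source layer, preserving the distribution rather than individual values."""
    return rng.normal(w.mean(), w.std(), size=w.shape)

w_src = np.array([[0.5, -0.2], [-0.8, 0.3]])
print(tr_pw(w_src))   # negatives replaced by zeros
```

The "hard zeros" introduced by TrPW/TrNW are visible here, which matches the later observation that they slow gradient backpropagation during refinement.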
4. Refine Transferred Model
Row 6 of Table II shows the model performance after refining. The average prediction accuracy (0.871) shows that the performance of the refined transferred model is comparable to that of a model trained specifically for the target robot. Fig. 5 shows an example of the refining process with the three knowledge transfer methods for the Move task and the SMUWT method. The model transferred by the TrEW method reached peak performance at the 2nd training epoch, outperforming the other two transfer methods. A potential reason is that the TrPW and TrNW models may need more data and training to refine because the hard zeros in the weights affect gradient backpropagation. Although TrEW suffers more performance degradation than the other two methods immediately after transfer, it is much easier to refine because it transfers the complete distribution of the weights, which loses less information and sets a good starting point for refining the transferred model. Overall, the experimental results show that the combination of the TrEW method and the SMUWT strategy provides the best performance with respect to training stability, peak performance, and convergence speed. Fig. 6 shows the refining process for the SMUWT + TrEW combination across the three tasks. All cases reached peak performance in fewer than 4 training epochs with no degradation. In general, the refined model of the 2-finger robot hand for the Hand Over task has the best prediction accuracy of the three tasks, while the preference model for the Use task has slightly lower prediction accuracy than the other two. Fig. 7 shows an example of the performance-changing flow, chosen from the dataset, when implementing the proposed methods for the Move task. The flow starts from the learned preference model for the 3-finger hand, which correctly matches the ground truth. When the knowledge is directly transferred to the 2-finger hand, the model makes a wrong prediction for all strategies.
When using the TrEW knowledge transfer method, the performance increased compared to that of the direct transfer method. The model successfully predicted the second choice but confused the first and third choices. The model was then refined with the SMUWT method in three training epochs with 540 samples (half of the data required when training from scratch). The refined model successfully classified the first two preferred strategies and accurately identified the least preferred one. These results verify the feasibility of our assistance preference model and the assumption that knowledge can be transferred between different robots. They also demonstrate the necessity of the proposed knowledge transfer methods and SMU methods.

Fig. 5. Refining process across the three transfer methods for the Move task: (a) TrEW; (b) TrPW; (c) TrNW. SMUWT was used for the best training performance.
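The refinement stage described above (a few epochs on target-robot data, stopping at peak performance) can be sketched framework-agnostically as follows; `train_epoch`, `evaluate`, and the `Toy` model here are hypothetical stand-ins, not the paper's implementation.

```python
def refine(model, target_data, eval_data, train_epoch, evaluate, max_epochs=4):
    """Fine-tune a transferred model for a few epochs, keeping the best
    checkpoint so the refined model never ends up worse than its peak."""
    best_acc, best_state = evaluate(model, eval_data), model.copy()
    for _ in range(max_epochs):
        train_epoch(model, target_data)       # one pass over target-robot samples
        acc = evaluate(model, eval_data)
        if acc > best_acc:
            best_acc, best_state = acc, model.copy()
    return best_state, best_acc

# Toy usage: accuracy follows a scripted curve that peaks then degrades.
class Toy:
    def __init__(self, v): self.v = v
    def copy(self): return Toy(self.v)

seq = iter([0.62, 0.75, 0.87, 0.86, 0.85])
_, acc = refine(Toy(0), None, None,
                train_epoch=lambda m, d: None,
                evaluate=lambda m, d: next(seq),
                max_epochs=4)
print(acc)  # -> 0.87, the peak of the scripted curve
```

Keeping the best checkpoint mirrors the observation that the refined models reached peak performance within a few epochs.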
VIII. DISCUSSION
1. Use the Assistance Preference Model for Telemanipulation
With the learned preference models, different preference-aware semi-autonomous manipulation schemes can be developed to enable robots to actively provide human-perceived effective assistance. For example, the robot can use the preference prediction to avoid the least preferred strategies and find, among the remaining candidates, the preferred assistance that maximizes the task reward, which is a more aggressive assistance scheme. The robot can also provide the first-ranked (most preferred) assistance to maximize operator preference, which can build better team cooperation but may not achieve the maximum task reward. Analysis of the confidence of the rank prediction and of human adaptability can improve practicability when providing assistance in a real context.

Fig. 7. An example of the performance-changing flow when applying the proposed methods. The learned preference model for the 3-finger hand makes a correct prediction. The performance dropped when the model was transferred to the 2-finger hand without modification. By implementing the modified knowledge transfer method and refining the transferred model in three training epochs, the model classifies the first two preferred strategies and accurately identifies the least preferred one for the 2-finger hand.

Fig. 6. Refining process across the three principal tasks: (a) Hand Over; (b) Move; (c) Use, with the model transferred by the TrEW method. The SMUWT method was used to refine the model.
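The "aggressive" selection scheme described above reduces to a small decision rule: drop the predicted least-preferred strategy, then pick the remaining candidate with the highest task reward. The strategy names and reward values below are illustrative only.

```python
def select_assistance(pref_rank, task_reward):
    """pref_rank: strategies ordered from most- to least-preferred.
    task_reward: mapping from strategy name to its expected task reward."""
    candidates = pref_rank[:-1]                  # avoid the least-preferred strategy
    return max(candidates, key=task_reward.get)  # maximize reward among the rest

rank = ["intent-mimic", "mimic", "intent"]       # predicted preference order
reward = {"intent-mimic": 0.7, "mimic": 0.9, "intent": 0.95}
print(select_assistance(rank, reward))           # -> mimic
```

Note that "intent" is excluded despite having the highest reward, exactly the trade-off between task reward and operator preference discussed above.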
2. Applications for Other Robot Structures and Control Strategies
Although, for simplicity of testing and evaluation in this paper, we designed three control strategies with one controller each, the preference modeling and stabilized learning methods can be extended to scenarios with more control strategies and/or controllers and more preference-related features. The knowledge transfer methods can reduce the training effort when adapting the preference models to different robot structures. The difficulty of the knowledge transfer varies with the level of difference between the source robot and the target robot. When transferring knowledge between different robots, their structures should be relatively similar; in our case, knowledge transfer between a 3-finger hand and a 2-finger hand is applicable because the hands are similar and all parameters are the same except for the number of fingers. Intuitively, it is more challenging to transfer the knowledge of a 2-finger hand to a 5-finger hand because their structures are more dissimilar, which may inhibit the sharing of transferable knowledge. For example, operators may prefer the mimic strategy over other strategies when working with a 5-finger hand, as it is more like the human hand. Thus, knowledge should be transferred between robots that are physically similar, such as from a 5-finger hand with 20 degrees of freedom to a 5-finger hand with 16 degrees of freedom, or from a 5-finger hand to a 4-finger hand.
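One way to make the notion of "shared transferable knowledge" concrete is to compare the *sets* of top-ranked strategies across robots rather than the exact orders, as a minimal sketch (the ranks below are illustrative, not measured data):

```python
def shares_preference_knowledge(rank_a, rank_b, k):
    """True if both rankings agree on which k strategies are preferred,
    regardless of the order within those k."""
    return set(rank_a[:k]) == set(rank_b[:k])

rank_4finger = ["intent-mimic", "mimic", "intent"]
rank_5finger = ["mimic", "intent-mimic", "intent"]
print(shares_preference_knowledge(rank_4finger, rank_5finger, k=2))  # -> True
```

Under this criterion, two robots can swap their first and second choices yet still share the same preference knowledge.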
In general, after transfer and refinement, we expect the preferred rank to be similar but not necessarily identical. For example, for a 4-finger robot the majority rank may be [intent-mimic, mimic, intent], while for a 5-finger robot it may be [mimic, intent-mimic, intent]. They still share transferable knowledge: both identify the first two preferred strategies, intent-mimic and mimic, and the least preferred one, intent-based control. Furthermore, if we have ten control strategies and three of them are preferred by the operator, the rank within those three preferred strategies does not have to be the same. We expect our model to identify the three preferred strategies, and the learned knowledge can be transferred between different robots.

IX. CONCLUSION
In this work, we developed a methodology for robots to choose the human-preferred way of providing a higher level of active assistance in semi-autonomous telemanipulation. We developed preference models to learn assistance-preference knowledge in the form of a predicted rank over the assistive robot control strategies. We presented SMU methods to stably learn the preference model from ambiguous human preference data, and different methods to transfer the preference model so that different robots can use it with fewer training samples. The experimental results demonstrated that the combination of the weight transfer method based on the weight distribution (TrEW) and the stagewise model updating strategy based on the weight tendency (SMUWT) achieves the goal of knowledge transfer, reducing training effort and ensuring training stability. Our future research will concentrate on understanding the connection between the learned knowledge and the physical attributes of the task objects and subjects, as well as developing and evaluating the preference-aware assistance methods discussed above for telemanipulation with physical experiments.

X. REFERENCES
[1] S. Lichiardopol, "A survey on teleoperation," Tech. Univ. Eindhoven, DCT Rep., vol. 20, pp. 40–60, 2007.
[2] J. E. Colgate, "Robust impedance shaping telemanipulation," IEEE Trans. Robot. Autom., vol. 9, no. 4, pp. 374–384, 1993.
[3] A. J. Park and R. N. Kazman, "Augmented reality for mining teleoperation," in Telemanipulator and Telepresence Technologies, 1995, vol. 2351, pp. 119–129.
[4] E. Yang and M. C. Dorneich, "The emotional, cognitive, physiological, and performance effects of variable time delay in robotic teleoperation," Int. J. Soc. Robot., vol. 9, no. 4, pp. 491–508, 2017.
[5] S. Rehman et al., "Simulation-based robot-assisted surgical training: a health economic evaluation," Int. J. Surg., vol. 11, no. 9, pp. 841–846, 2013.
[6] M. C. Cavusoglu, A. Sherman, and F. Tendick, "Design of bilateral teleoperation controllers for haptic exploration and telemanipulation of soft environments," IEEE Trans. Robot. Autom., vol. 18, no. 4, pp. 641–647, 2002.
[7] R. R. Murphy, J. Kravitz, S. L. Stover, and R. Shoureshi, "Mobile robots in mine rescue and recovery," IEEE Robot. Autom. Mag., vol. 16, no. 2, pp. 91–103, 2009.
[8] K. Hauser, "Recognition, prediction, and planning for assisted teleoperation of freeform tasks," Auton. Robots, vol. 35, no. 4, pp. 241–254, 2013.
[9] X. Yang, K. Sreenath, and N. Michael, "A framework for efficient teleoperation via online adaptation," 2017, pp. 5948–5953.
[10] S. Jain and B. Argall, "Recursive Bayesian human intent recognition in shared-control robotics," 2018, pp. 3905–3912.
[11] J. Cui, S. Tosunoglu, R. Roberts, C. Moore, and D. W. Repperger, "A review of teleoperation system control," in Proceedings of the Florida Conference on Recent Advances in Robotics, 2003, pp. 1–12.
[12] C. Liu, J. Walker, and J. Y. Chai, "Ambiguities in spatial language understanding in situated human robot dialogue," 2010.
[13] C. Breazeal, M. Berlin, A. Brooks, J. Gray, and A. L. Thomaz, "Using perspective taking to learn from ambiguous demonstrations," Rob. Auton. Syst., vol. 54, no. 5, pp. 385–393, 2006.
[14] W. Griffin, R. Findley, M. Turner, and M. Cutkosky, "Calibration and mapping of a human hand for dexterous telemanipulation," Jan. 2000.
[15] G. Gioioso, G. Salvietti, M. Malvezzi, and D. Prattichizzo, "Mapping synergies from human to robotic hands with dissimilar kinematics: an approach in the object domain," IEEE Trans. Robot., vol. 29, no. 4, pp. 825–837, 2013.
[16] L. Cui, U. Cupcic, and J. Dai, "An optimization approach to teleoperation of the thumb of a humanoid robot hand: kinematic mapping and calibration," ASME J. Mech. Des., vol. 136, p. 91005, Sep. 2014.
[17] S. Li et al., "Vision-based teleoperation of shadow dexterous hand using end-to-end deep neural network," 2019, pp. 416–422.
[18] G. Salvietti, L. Meli, G. Gioioso, M. Malvezzi, and D. Prattichizzo, "Multicontact bilateral telemanipulation with kinematic asymmetries," IEEE/ASME Trans. Mechatronics, vol. 22, no. 1, pp. 445–456, 2017.
[19] A. Bicchi and V. Kumar, "Robotic grasping and contact: a review," in Proc. 2000 IEEE Int. Conf. Robotics and Automation (ICRA), 2000, vol. 1, pp. 348–353.
[20] M. Gualtieri, A. ten Pas, K. Saenko, and R. Platt, "High precision grasp pose detection in dense clutter," in IEEE Int. Conf. Intelligent Robots and Systems, 2016, pp. 598–605.
[21] R. Calandra et al., "The feeling of success: does touch sensing help predict grasp outcomes?," arXiv:1710.05512, Oct. 2017.
[22] S. Ekvall and D. Kragic, "Learning and evaluation of the approach vector for automatic grasp generation and planning," in Proc. 2007 IEEE Int. Conf. Robotics and Automation, 2007, pp. 4715–4720.
[23] Y. Li, K. P. Tee, W. L. Chan, R. Yan, Y. Chua, and D. K. Limbu, "Continuous role adaptation for human–robot shared control," IEEE Trans. Robot., vol. 31, no. 3, pp. 672–681, 2015.
[24] J. D. Webb, S. Li, and X. Zhang, "Using visuomotor tendencies to increase control performance in teleoperation," 2016, pp. 7110–7116.
[25] K. Muelling et al., "Autonomy infused teleoperation with application to brain computer interface controlled manipulation," Auton. Robots, vol. 41, no. 6, pp. 1401–1422, 2017.
[26] P. Marayong, M. Li, A. M. Okamura, and G. D. Hager, "Spatial motion constraints: theory and demonstrations for robot guidance using virtual fixtures," 2003, vol. 2, pp. 1954–1959.
[27] S. Kuhn, T. Gecks, and D. Henrich, "Velocity control for safe robot guidance based on fused vision and force/torque data," 2006, pp. 485–492.
[28] S. Nikolaidis, Y. X. Zhu, D. Hsu, and S. Srinivasa, "Human-robot mutual adaptation in shared autonomy," in ACM/IEEE Int. Conf. Human-Robot Interaction, 2017, pp. 294–302.
[29] C. Pinto, P. Amorim, G. Veiga, and A. P. Moreira, "A review on task planning in human-robot teams," in RSS 2017 Workshop on Mathematical Models, Algorithms, and Human-Robot Interaction, 2017.
[30] S. Nikolaidis, J. Forlizzi, D. Hsu, J. Shah, and S. Srinivasa, "Mathematical models of adaptation in human-robot collaboration," arXiv:1707.02586, 2017.
[31] A. Jain, S. Sharma, T. Joachims, and A. Saxena, "Learning preferences for manipulation tasks from online coactive feedback," Int. J. Rob. Res., vol. 34, no. 10, pp. 1296–1313, 2015.
[32] T. Munzer, M. Toussaint, and M. Lopes, "Preference learning on the execution of collaborative human-robot tasks," 2017, pp. 879–885.
[33] S. C. Akkaladevi, M. Plasch, C. Eitzinger, S. C. Maddukuri, and B. Rinner, "Towards learning to handle deviations using user preferences in a human robot collaboration scenario," in Int. Conf. Intelligent Human Computer Interaction, 2016, pp. 3–14.
[34] R. Liu and X. Zhang, "Systems of natural-language-facilitated human-robot cooperation: a review," arXiv:1701.08269, 2017.
[35] M. Bowman and X. Zhang, "An intent-based task-aware shared control framework for intuitive object telemanipulation," arXiv:2003.03677, 2020.
[36] L. Colasanto, R. Suárez, and J. Rosell, "Hybrid mapping for the assistance of teleoperated grasping tasks," IEEE Trans. Syst., Man, Cybern., Syst., vol. 43, no. 2, pp. 390–401, 2012.
[37] A. Peer, S. Einenkel, and M. Buss, "Multi-fingered telemanipulation - mapping of a human hand to a three finger gripper," in Proc. 17th IEEE Int. Symp. Robot and Human Interactive Communication (RO-MAN), 2008, pp. 465–470.
[38] R. N. Rohling and J. M. Hollerbach, "Optimized fingertip mapping for teleoperation of dextrous robot hands," in Proc. 1993 IEEE Int. Conf. Robotics and Automation, 1993, pp. 769–775.
[39] S. Li et al., "A mobile robot hand-arm teleoperation system by vision and IMU," arXiv:2003.05212, 2020.
[40] R. D. Reed and R. J. Marks II, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. Cambridge, MA: MIT Press, 1999.
[41] S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Stat., vol. 22, no. 1, pp. 79–86, 1951.
[42] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. 14th Int. Conf. Artificial Intelligence and Statistics (AISTATS), 2011, pp. 315–323.
[43] P. Dayan and L. F. Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Cambridge, MA: MIT Press, 2001.
[44] M. Bowman, S. Li, and X. Zhang, "Intent-uncertainty-aware grasp planning for robust robot assistance in telemanipulation," 2019, pp. 409–415.
XI. APPENDIX
The characteristic raw data are broken down into two categories: grasp attributes and task attributes. Grasp attributes represent the hand kinematics, which include the palm orientation, palm center location, and the finger configurations corresponding to the thumb, index, and middle fingers. We denote the set of robot grasp attributes as ℛ and the set of human grasp attributes as ℋ. Task attributes T describe the tasks to be done.
1. Intent-Based Strategy
We denote the control variables of the robot as R_i ∈ ℛ. A set of {R_i} produces a probability for each task, denoted P_b(R). An intent-uncertainty-aware human grasp model from previous work [44] is used to infer the different task intents T_b. There are upper and lower bounds on the model parameters, U_i and L_i respectively, which the robot must adhere to, such as physical limits of the end-effector position, joint angles, or applied force. We establish the intent inference over three principal tasks: Use, Move, and Hand Over. For example, for grasping a cup, Use is using or drinking from the cup, Move is moving the cup to another location, and Hand Over is handing the cup to another agent. We use the intent-uncertainty-aware human grasp model ℳ to infer the intent T_b in (5):

    T_b = \mathcal{M}(\mathcal{H})    (5)

The distribution P_b(R) is used to quantify how much each task is satisfied by a given robot pose with features R_a. We use a naive Bayes robot model ℳ_r to produce the robot probability vector of satisfying the task, P_b(R), in (6) to (8), where \mu_b is the mean for task b, \Sigma_b is the covariance matrix for task b, and d is the length of the vector R_a:

    P_b(R) = \mathcal{M}_r(R)    (6)

    P(R_i \mid b) = \frac{1}{\sqrt{\det(\Sigma_b)\,(2\pi)^d}} \exp\!\left(-\frac{1}{2}(R_a - \mu_b)^{\top} \Sigma_b^{-1} (R_a - \mu_b)\right)    (7)

    P_b(R = R_i) = P(b \mid R_i) = \frac{P(R_i \mid b)\,P(b)}{\sum_b P(R_i \mid b)\,P(b)}    (8)
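Equations (6)-(8) can be sketched directly: each task b is modeled as a multivariate Gaussian over grasp features, and Bayes' rule normalizes the likelihoods into the task-probability vector. The means, covariances, and priors below are illustrative toy values, not learned parameters.

```python
import numpy as np

def task_probabilities(r, means, covs, priors):
    """P(b | R) for each task b via Gaussian likelihoods (7) and Bayes' rule (8)."""
    likelihoods = []
    for mu, cov in zip(means, covs):
        d = len(r)
        diff = r - mu
        norm = 1.0 / np.sqrt(np.linalg.det(cov) * (2 * np.pi) ** d)   # (7) prefactor
        likelihoods.append(norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff))
    post = np.array(likelihoods) * np.array(priors)
    return post / post.sum()                                          # (8) normalization

# Two toy tasks in a 2-D feature space; the query pose sits on the first mean.
means = [np.zeros(2), np.ones(2) * 2]
covs = [np.eye(2), np.eye(2)]
p = task_probabilities(np.zeros(2), means, covs, priors=[0.5, 0.5])
print(p)  # heavily favors the first task
```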
Upon developing the target probability vector and the robot probability vector, the intent-based strategy can be constructed from the intent-based shared-control criterion with an added constraint, where the objective function is

    C = \min\left(\frac{1}{2}\sum_b \left(P_b(R) - T_b\right)^2\right)
    \text{s.t.}\quad L_i \le R_i \le U_i \;\; \forall i, \qquad \mathrm{norm}(R_i) = 1 \;\; \forall i \text{ indexed for palm direction}    (9)
2. Mimic-Based Strategy
If a human operator needs the robot to strictly follow the motion command, unintended errors may occur, but we can still achieve this goal by adding extra constraints to the intent-based strategy. The motion constraints can be explicitly dictated by adding the following set of constraints:

    R_a = H_a \;\; \forall a    (10)

This gives the operator full control of all features of the robot. The new constraints added to the control diagram ensure the robot follows the human exactly by matching the robot features to the human features to mimic the motion. The objective function is

    C = \min\left(\frac{1}{2}\sum_b \left(P_b(R) - T_b\right)^2\right)
    \text{s.t.}\quad L_i \le R_i \le U_i \;\; \forall i, \qquad \mathrm{norm}(R_i) = 1 \;\; \forall i \text{ indexed for palm direction}, \qquad R_i = H_i \;\; \forall i    (11)
3. Intent-Mimic Hybrid Strategy
We first define \lambda_i as the KL divergence between the distributions of each feature:

    \lambda_i = D_{KL}\!\left(\bar{R}_i \,\|\, \bar{H}_i\right) = \ln\frac{\sigma_{H_i}}{\sigma_{R_i}} + \frac{\sigma_{R_i}^2 + \left(\mu_{R_i} - \mu_{H_i}\right)^2}{2\sigma_{H_i}^2} - \frac{1}{2}    (12)
Additionally, the KL divergence between two multivariate normal distributions can be used to determine the overall divergence between the hand configurations in (13):

    \gamma = D_{KL}\!\left(\bar{R} \,\|\, \bar{H}\right) = \frac{1}{2}\left[\mathrm{trace}\!\left(\Sigma_H^{-1}\Sigma_R\right) + \left(\mu_H - \mu_R\right)^{\top}\Sigma_H^{-1}\left(\mu_H - \mu_R\right) - k + \ln\frac{|\Sigma_H|}{|\Sigma_R|}\right]    (13)

This formulation moves the mimic constraint from the previous formulation into the objective function, where it acts as an elastic constraint that allows the robot to bend the rules on mimicking the human. The grasp position is generated by minimizing (14):

    C = \min\left(\frac{1}{2}\sum_b \left(P_b(R) - T_b\right)^2 + \frac{1}{\gamma}\sum_i \lambda_i \left(R_i - H_i\right)^2\right)
    \text{s.t.}\quad L_i \le R_i \le U_i \;\; \forall i, \qquad \mathrm{norm}(R_i) = 1 \;\; \forall i    (14)
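The per-feature weight \lambda_i in (12) and the overall divergence \gamma in (13) are standard Gaussian KL divergences, so they transcribe directly to code (inputs below are illustrative only):

```python
import numpy as np

def kl_univariate(mu_r, sigma_r, mu_h, sigma_h):
    """(12): KL divergence between two 1-D Gaussians, D_KL(R_i || H_i)."""
    return (np.log(sigma_h / sigma_r)
            + (sigma_r**2 + (mu_r - mu_h)**2) / (2 * sigma_h**2) - 0.5)

def kl_multivariate(mu_r, cov_r, mu_h, cov_h):
    """(13): KL divergence between two multivariate Gaussians, D_KL(R || H)."""
    k = len(mu_r)
    inv_h = np.linalg.inv(cov_h)
    diff = mu_h - mu_r
    return 0.5 * (np.trace(inv_h @ cov_r) + diff @ inv_h @ diff - k
                  + np.log(np.linalg.det(cov_h) / np.linalg.det(cov_r)))

print(kl_univariate(0.0, 1.0, 0.0, 1.0))  # identical distributions -> 0.0
```

In (14), \lambda_i then weights each feature's deviation from the human command by how much its distribution diverges, while 1/\gamma scales the whole elastic penalty by the overall hand-configuration divergence.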