Learning Interactive Behaviors for Musculoskeletal Robots Using Bayesian Interaction Primitives
Joseph Campbell, Arne Hitzmann, Simon Stepputtis, Shuhei Ikemoto, Koh Hosoda, Heni Ben Amor
Abstract — Musculoskeletal robots that are based on pneumatic actuation have a variety of properties, such as compliance and back-drivability, that render them particularly appealing for human-robot collaboration. However, programming interactive and responsive behaviors for such systems is extremely challenging due to the nonlinearity and uncertainty inherent to their control. In this paper, we propose an approach for learning Bayesian Interaction Primitives for musculoskeletal robots given a limited set of example demonstrations. We show that this approach is capable of real-time state estimation and response generation for interaction with a robot for which no analytical model exists. Human-robot interaction experiments on a 'handshake' task show that the approach generalizes to new positions, interaction partners, and movement velocities.
I. INTRODUCTION
As robots are employed in an ever-expanding variety of roles, interactions with humans will inevitably become commonplace. To ensure that such interactions are safe and productive, robots need to be both mechanically and behaviorally responsive to their human counterparts. Mechanical responsiveness and compliance guarantees that no physical contact or force-exchange is harmful to the human. Traditional robotic systems are typically composed of non-compliant, rigid limbs that do not yield when contact with an opposing body occurs. In contrast, various musculoskeletal robots have been proposed which are based on McKibben pneumatic actuators [1]. These biologically-inspired systems mimic the behavior of human muscles and tendons, and are capable of providing a significant amount of force while remaining inherently safe due to their compliant, back-drivable nature. Unfortunately, due to the uncertainty and nonlinearity underlying pneumatic actuation, controlling such systems can pose major difficulties to the control framework [2], [3]. Often, getting a complex robot to perform smooth, generalizable actions on its own can be extremely challenging, let alone react to a human interaction partner.

In this paper, we leverage a methodology for teaching interactive and responsive behaviors to musculoskeletal robots known as Bayesian Interaction Primitives (BIP) [4]. Based on learning from demonstration, this framework provides the basis for working with robots that have no tractable analytical model. Given a small set of training demonstrations, a spatiotemporal model is extracted which correlates the movements of the interaction partners. This information is efficiently encoded in a joint probability distribution which, in turn, can be used to infer when and how to engage in collaborative actions with a human interaction partner.

J. Campbell, S. Stepputtis, and H. Ben Amor are with the School of Computing, Informatics, and Decision Systems Engineering, Arizona State University {jacampb1, sstepput, hbenamor}@asu.edu
A. Hitzmann and K. Hosoda are with the Graduate School of Engineering Science, Osaka University {arne.hitzmann, hosoda}@sys.es.osaka-u.ac.jp
S. Ikemoto is with the Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology [email protected]

Fig. 1. A musculoskeletal robot learning to shake the hand of a human partner. Bayesian Interaction Primitives are used to determine when and how to interact.

However, despite the promise of such a framework, Bayesian Interaction Primitives have yet to be successfully demonstrated in a real-time human-robot interaction scenario, and it has yet to be shown whether this framework is capable of generalizing to new interaction partners. Furthermore, while BIP introduced a new integrated temporal estimation technique which differed from prior work [5], [6], there is no detailed analysis of its effectiveness nor operating characteristics. This work intends to fill that gap with the following contributions. Specifically, we:
• introduce a methodology for learning from demonstration in human-robot interaction scenarios which combines open-loop robot trajectories with reactive human behavior in order to generate a responsive robot control policy,
• demonstrate that our approach can effectively generate legible, temporally- and spatially-adaptive response trajectories for a complex, musculoskeletal robot for which no analytical model exists,
• empirically analyze the integrated phase estimation of our approach, including how well it adapts to different interaction speeds, starting times, and edge cases, as well as how this impacts uncertainty estimates,
• demonstrate its ability to generalize to new interaction partners unseen in the training process.
In the next section we discuss the existing literature and the gap that exists in learning algorithms for human-robot interaction with pneumatic robots that we aim to address. Section III provides an overview of our approach while Sec. IV describes our experimental setup and provides an extensive analysis of the results. Section V outlines our findings and the directions for our future work.

II. RELATED WORK
Robots with pneumatic artificial muscles (PAMs) and compliant limbs have been shown to be desirable for human-robot interaction scenarios [7], [8]. When configured in an anthropomorphic musculoskeletal structure, such robots provide an intriguing platform for human-robot interaction (HRI) [9] due to their potential to generate human-like motions while offering a degree of safety as a result of their compliance when confronted with an external force – such as contact with a human. Recent work [10] has shown the value of utilizing McKibben actuators in the design of these robots, due to their inherent compliance and inexpensive material cost. However, while analytical kinematics models are in theory possible [11], [12], they are not always practical due to the effects of friction and the deterioration of mechanical elements, which are difficult to account for (although some gains have been made in this area [13]). Consequently, this work proposes using a method based on learning from demonstration [14], [15], which is a well-established methodology for teaching robots complex motor skills based on observed data.

A particularly prominent technique in this regard is the Dynamic Movement Primitives [16] framework, in which sensor trajectories are approximated by a nonlinear dynamical system. After training, the parameters of the dynamical system can be changed so as to generalize the observed skill to novel scenarios. The extension to human-robot interaction via Interaction Primitives [5] opened the door to complex interactions with humans that are capable of generalizing to a wide variety of scenarios. A probabilistic framing of Interaction Primitives [6] replaces the approximation with a probability distribution over a weighted combination of linear basis functions, further strengthening the generalization capabilities despite requiring an additional controller to ensure safe execution of trajectories. The representation involving probability distributions allows for training mixture models [17], thereby enabling the effective learning of multiple complex actions. However, phase estimation and weight inference are performed separately, thus failing to leverage all of the prior information. Recent theoretical work on Interaction Primitives has addressed this limitation by proposing a fully Bayesian derivation of the underlying concepts [4]. Yet these algorithms have largely only been deployed in traditional robots with electromechanical actuators and rigid limbs [18], [19]. This work aims to fill that gap by showing that sufficiently robust learning algorithms can safely and accurately enable human-robot interaction in the face of nonlinear dynamics imposed by musculoskeletal systems.

III. BAYESIAN INTERACTION PRIMITIVES
Bayesian Interaction Primitives (BIP) [4] are a novel HRI framework based on learning from demonstration. Intuitively, BIP models the actions of two agents – a human and a robot – as time-dependent trajectories for each measured degree of freedom. Demonstration trajectories are represented as a weighted superposition of linear basis models. In turn, the relationship between actions can be captured by computing the covariance between basis weights. During run-time, an observed trajectory is generated by one of the agents, which is localized in both time and space. Once localized, the trajectory for the other agent can be generated based on the relationship learned from the demonstrations.
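The decomposition of a demonstration trajectory into a weighted superposition of basis functions can be sketched as follows. This is a minimal illustration, not the paper's implementation: Gaussian bases fit by least squares, with the basis count and width chosen arbitrarily for the example.

```python
import numpy as np

def gaussian_basis(phase, num_bases=8, width=0.02):
    """Evaluate Gaussian basis functions at phase values in [0, 1].

    Returns a (len(phase), num_bases) design matrix; the centers are
    spread evenly over the phase domain (an illustrative choice).
    """
    centers = np.linspace(0.0, 1.0, num_bases)
    phase = np.asarray(phase, dtype=float).reshape(-1, 1)
    return np.exp(-((phase - centers) ** 2) / (2.0 * width))

def fit_weights(trajectory, num_bases=8):
    """Least-squares fit of basis weights for one degree of freedom."""
    phase = np.linspace(0.0, 1.0, len(trajectory))  # linear phase over the demo
    Phi = gaussian_basis(phase, num_bases)          # design matrix
    w, *_ = np.linalg.lstsq(Phi, np.asarray(trajectory, float), rcond=None)
    return w

# Example: a smooth trajectory is compactly represented by its weights.
traj = np.sin(np.linspace(0.0, np.pi, 100))
w = fit_weights(traj)
recon = gaussian_basis(np.linspace(0.0, 1.0, 100)) @ w
```

The weight vector, rather than the raw samples, is what BIP correlates across agents; the reconstruction `recon` shows the shape of the trajectory is preserved.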
A. Interaction Latent Model
We define an interaction $Y$ as a time series of $D$-dimensional sensor observations over time, $Y_{1:T} = [\mathbf{y}_1, \dots, \mathbf{y}_T] \in \mathbb{R}^{D \times T}$. Of the $D$ dimensions, $D_o$ of them represent observed DoFs from the human and $D_c$ of them represent the controlled DoFs from the robot, such that $D = D_c + D_o$.

In order to decouple the size of the state space from the number of observations while maintaining the shape of the trajectories, we transform the interaction $Y$ into a latent space via basis function decomposition. Each dimension $d \in D$ of $Y$ is approximated with a weighted linear combination of time-dependent basis functions: $[y^d_1, \dots, y^d_t] = [\Phi^d_{\phi(1)} \mathbf{w}^d + \epsilon_y, \dots, \Phi^d_{\phi(t)} \mathbf{w}^d + \epsilon_y]$, where $\Phi^d_{\phi(t)} \in \mathbb{R}^{1 \times B^d}$ is a row vector of $B^d$ basis functions, $\mathbf{w}^d \in \mathbb{R}^{B^d \times 1}$, and $\epsilon_y$ is i.i.d. Gaussian noise. In this work, we employ Gaussian basis functions as they are widely used in this type of application [6]. Given that this forms a linear system of equations, linear regression is employed to find the weights $\mathbf{w}^d$. The weights from each dimension are aggregated together to form the full latent model of the interaction, $\mathbf{w} = [\mathbf{w}^{1\top}, \dots, \mathbf{w}^{D\top}] \in \mathbb{R}^{1 \times B}$ where $B = \sum_d^D B^d$.

The above basis functions are dependent on a relative phase value, $\phi(t)$, rather than the absolute time $t$, such that the range of the phase function is linearly interpolated from $[0, 1]$ over the domain $[0, T]$. The purpose of phase is to decouple the shape of a trajectory from its speed; when transformed into phase space, a trajectory performed at both a slow and fast movement speed will yield the same basis function decomposition. In subsequent equations, we will refer to $\phi(t)$ as simply $\phi$ to reduce notational clutter.

B. Spatiotemporal Filtering
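As a concrete illustration of the recursive filter this section derives, the following single-DoF sketch implements the predict/update cycle over the augmented state $[\phi, \dot{\phi}, \mathbf{w}]$. The basis shape, dimensions, and noise magnitudes are illustrative assumptions, not the values used on the robot.

```python
import numpy as np

# Illustrative Gaussian basis for a single observed DoF.
centers = np.linspace(0.0, 1.0, 5)
width = 0.05
basis = lambda p: np.exp(-((p - centers) ** 2) / (2.0 * width))
basis_deriv = lambda p: basis(p) * (centers - p) / width

def ekf_step(mu, Sigma, y, Q, R):
    """One predict/update cycle over the augmented state [phi, phi_dot, w]."""
    n = len(mu)
    # Predict: constant-velocity model on the phase; weights are static.
    G = np.eye(n)
    G[0, 1] = 1.0                        # phi advances by phi_dot each step
    mu = G @ mu
    Sigma = G @ Sigma @ G.T + Q
    # Update: linearize h(s) = basis(phi) @ w around the prediction.
    phi, w = mu[0], mu[2:]
    Phi, dPhi = basis(phi), basis_deriv(phi)
    H = np.zeros((1, n))
    H[0, 0] = dPhi @ w                   # sensitivity to the phase estimate
    H[0, 2:] = Phi                       # sensitivity to the weights
    K = Sigma @ H.T @ np.linalg.inv(H @ Sigma @ H.T + R)   # Kalman gain
    mu = mu + (K @ np.atleast_1d(y - Phi @ w)).ravel()
    Sigma = (np.eye(n) - K @ H) @ Sigma
    return mu, Sigma

# Track a synthetic 50-step demonstration with prior phase velocity 1/50.
w_true = np.array([0.0, 0.4, 1.0, 0.4, 0.0])
mu = np.concatenate(([0.0, 1.0 / 50], w_true))
Sigma = np.diag([1e-4, 1e-6] + [1e-4] * 5)
Q = np.diag([1e-6, 1e-8] + [0.0] * 5)
R = np.array([[1e-2]])
for t in range(1, 51):
    y = basis(t / 50) @ w_true
    mu, Sigma = ekf_step(mu, Sigma, y, Q, R)
```

The key structural point is that the observation Jacobian couples phase and weights: an error in the phase estimate produces a correlated error in the weight estimate, which is exactly what enables simultaneous spatiotemporal inference.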
The objective of BIP is to infer the latent model of an interaction, $\mathbf{w}$, given a prior model $\mathbf{w}_0$ and a partial observation $Y_{1:t}$ where $\phi(t) < 1$. We assume $T$ is unknown, and so we must also simultaneously estimate the phase of the interaction at the same time as the latent model. This is possible due to correlated errors in the weights of the latent model stemming from a shared error in the phase estimate [4]. Intuitively, an error in the temporal estimate will produce an error in the spatial estimate. We model this simultaneous inference of space and time by augmenting the state vector with both the phase and the phase velocity – the speed of the interaction – such that $\mathbf{s} = [\phi, \dot{\phi}, \mathbf{w}]$ and

$$p(\mathbf{s}_t \,|\, Y_{1:t}, \mathbf{s}_0) \propto p(\mathbf{y}_t \,|\, \mathbf{s}_t)\, p(\mathbf{s}_t \,|\, Y_{1:t-1}, \mathbf{s}_0). \quad (1)$$

It is important to note that while the weights themselves are time-invariant with respect to an interaction, our estimate of the weights is time-varying.

The posterior density in Eq. 1 is computed with a recursive linear state space filter [20] which consists of two steps: the propagation of the state forward in time according to the system dynamics $p(\mathbf{s}_t \,|\, Y_{1:t-1}, \mathbf{s}_0)$, and the update of the state based on the latest sensor observation likelihood $p(\mathbf{y}_t \,|\, \mathbf{s}_t)$. We assume this system satisfies the Markov property, such that the state prediction density is defined as:

$$p(\mathbf{s}_t \,|\, Y_{1:t-1}, \mathbf{s}_0) = \int p(\mathbf{s}_t \,|\, \mathbf{s}_{t-1})\, p(\mathbf{s}_{t-1} \,|\, Y_{1:t-1}, \mathbf{s}_0)\, d\mathbf{s}_{t-1}. \quad (2)$$

Furthermore, as in the Kalman filter, we assume that the uncertainty associated with our state estimate is normally distributed, i.e., $p(\mathbf{s}_t \,|\, Y_{1:t}, \mathbf{s}_0) = \mathcal{N}(\boldsymbol{\mu}_{t|t}, \boldsymbol{\Sigma}_{t|t})$ and $p(\mathbf{s}_t \,|\, Y_{1:t-1}, \mathbf{s}_0) = \mathcal{N}(\boldsymbol{\mu}_{t|t-1}, \boldsymbol{\Sigma}_{t|t-1})$. The system dynamics are defined such that the state evolves according to a linear constant velocity model:

$$\boldsymbol{\mu}_{t|t-1} = \underbrace{\begin{bmatrix} 1 & \Delta t & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}}_{G} \boldsymbol{\mu}_{t-1|t-1}, \quad (3)$$

$$\boldsymbol{\Sigma}_{t|t-1} = G \boldsymbol{\Sigma}_{t-1|t-1} G^{\top} + \underbrace{\begin{bmatrix} \Sigma_{\phi,\phi} & \Sigma_{\phi,\dot{\phi}} & \cdots & 0 \\ \Sigma_{\dot{\phi},\phi} & \Sigma_{\dot{\phi},\dot{\phi}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}}_{Q_t}, \quad (4)$$

where $Q_t$ is the process noise associated with the state transition, e.g., discrete white noise. The observation function $h(\cdot)$ is nonlinear and linearized via Taylor expansion:

$$H_t = \frac{\partial h(\mathbf{s}_t)}{\partial \mathbf{s}_t} = \begin{bmatrix} \frac{\partial \Phi^1_{\phi} \mathbf{w}^1}{\partial \phi} & 0 & \Phi^1_{\phi} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial \Phi^D_{\phi} \mathbf{w}^D}{\partial \phi} & 0 & 0 & \cdots & \Phi^D_{\phi} \end{bmatrix}. \quad (5)$$

This yields the measurement update equations

$$K_t = \boldsymbol{\Sigma}_{t|t-1} H_t^{\top} (H_t \boldsymbol{\Sigma}_{t|t-1} H_t^{\top} + R_t)^{-1}, \quad (6)$$
$$\boldsymbol{\mu}_{t|t} = \boldsymbol{\mu}_{t|t-1} + K_t (\mathbf{y}_t - h(\boldsymbol{\mu}_{t|t-1})), \quad (7)$$
$$\boldsymbol{\Sigma}_{t|t} = (I - K_t H_t) \boldsymbol{\Sigma}_{t|t-1}, \quad (8)$$

where $R_t$ is the Gaussian measurement noise associated with the sensor observation $\mathbf{y}_t$.

We compute the prior model $\mathbf{s}_0 = [\phi_0, \dot{\phi}_0, \mathbf{w}_0]$ from a set of $N$ initial demonstrations, $W = [\mathbf{w}_1^{\top}, \dots, \mathbf{w}_N^{\top}]$, such that $\mathbf{w}_0$ is the arithmetic mean of the weights from each DoF:

$$\mathbf{w}_0 = \left[ \frac{1}{N} \sum_{i=1}^{N} \mathbf{w}^1_i, \dots, \frac{1}{N} \sum_{i=1}^{N} \mathbf{w}^D_i \right]. \quad (9)$$

Assuming that all interactions start at the beginning, we set the initial phase $\phi_0$ to $0$. The initial phase velocity $\dot{\phi}_0$, however, is determined by the arithmetic mean of the set of phase velocities from all demonstrations:

$$\dot{\phi}_0 = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T_i}, \quad (10)$$

where $T_i$ is the length of the $i$-th demonstration. Lastly, we define the prior density $p(\mathbf{s}_0) = \mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ as

$$\boldsymbol{\mu}_0 = \mathbf{s}_0, \quad \boldsymbol{\Sigma}_0 = \begin{bmatrix} \Sigma_{\phi,\phi} & 0 \\ 0 & \Sigma_{W,W} \end{bmatrix}, \quad (11)$$

where $\Sigma_{\phi,\phi}$ is the variance in the phases and phase velocities of the demonstrations, with no initial correlations.

C. Musculoskeletal Robots
Robots which utilize pneumatic artificial muscles in a musculoskeletal configuration are desirable in the context of human-robot interaction. Due to their air-driven nature, they are inherently back-drivable and thus well-suited for physical contact with humans. Furthermore, when arranged in a musculoskeletal structure which mimics human anatomy, they tend to produce predictable, legible [21] motions. This is important in interactions with humans as unexpected movements may result in injury or unsafe situations.

Despite the suitability of PAMs for human-robot interaction, there are various challenges in working with them. The first relates to the nonlinear dynamics inherent to pneumatic actuation and compliant behavior. Once external forces are applied to the muscles, it becomes difficult to accurately determine from the actuation pressure where in space the actuated components are. This limitation complicates control during interaction phases in which the robot experiences external forces, but it also introduces difficulties when training the robot. In learning from demonstration algorithms, a common technique to train the robot relies on kinesthetic guidance, i.e., the robot is physically moved along a desired trajectory and the internal states of the actuators are recorded. While it is possible to kinesthetically teach musculoskeletal robots actuated with PAMs [12], this requires a specific design of the robot that is not always feasible. In the case of the robot used in this work, this approach was not possible since the state of the pneumatic actuators is different when the robot is subject to external forces, i.e., undergoing kinesthetic teaching, compared to when it is moving autonomously.

Fig. 2. The motion of the human and robot over time when shaking hands at arbitrary end points with fast movement (top) and slow movement (bottom). Each sequence begins at the start of the interaction and each image is sampled at the same rate.

Fig. 3. Different test interactions emphasizing BIP's spatial generalization. The left image shows a handshake low and closer to the robot, while the right image shows a handshake high and closer to the human.

However, with BIP, only the correlations between the trajectories of the human and the robot need to be captured. As such, during training, instead of adapting the robot's trajectory to the human's, we introduce a teaching method in which the human adapts to the robot while it is executing an open-loop policy. More specifically, we created hand-crafted trajectories of the robot's desired action which are executed independently of the human during training, while the human produces an appropriate response. In this way, we do not apply any external forces to the robot and are still able to accurately capture the relationship between the trajectories despite being unable to kinesthetically teach the robot.

Another consideration is that of the execution of the generated response trajectories. The BIP algorithm updates at a given frequency and at each iteration generates a new response trajectory from the current point in the interaction (as estimated by $\phi$) to the end of the interaction; this is trivial to calculate with the estimated latent state representation. This is preferable to generating only the next state, since depending on the size of the state dimension it may not be possible to calculate the next state in real-time. However, this can produce discontinuous trajectories where the robot will need to make a large adjustment in position when a new response trajectory is generated; this is unsafe in human-robot interaction. Therefore, an additional alpha-beta filter is employed to smooth the transition between the trajectories generated by BIP.
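One simple form such a smoothing filter could take is sketched below. The gains are illustrative; the paper does not report the values used on the robot.

```python
class AlphaBetaFilter:
    """Smooths successive setpoints so the robot never jumps discontinuously.

    alpha and beta are illustrative gains, not the values used in the paper.
    """
    def __init__(self, x0, alpha=0.5, beta=0.1, dt=1.0):
        self.x, self.v = x0, 0.0
        self.alpha, self.beta, self.dt = alpha, beta, dt

    def step(self, target):
        # Predict from the current smoothed state, then correct toward the
        # newly generated setpoint.
        x_pred = self.x + self.v * self.dt
        r = target - x_pred                 # residual to the new setpoint
        self.x = x_pred + self.alpha * r
        self.v = self.v + self.beta * r / self.dt
        return self.x
```

Feeding each newly generated BIP setpoint through `step` yields a trajectory that converges to the new target over several updates instead of jumping to it, which is the safety property motivating the filter.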
               Mean(All)  Var(All)  Mean(T)  Var(T)   Mean(NT)  Var(NT)
  BIP(Fast)    0.2640     0.0013    0.2479   0.0001   0.2752    0.0019
  BIP(Normal)  0.3094     0.0064    0.3148   0.0089   0.3062    0.0049
  BIP(Slow)    0.3824     0.0138    0.4049   0.0164   0.3689    0.0117
  Static(1)    0.3272     0.0055    0.2950   0.0012   0.3465    0.0070
  Static(2)    0.3009     0.0109    0.2450   0.0018   0.3345    0.0134
  Static(3)    0.3387     0.0057    0.3430   0.0060   0.3365    0.0056
  Static(4)    0.2914     0.0017    0.2642   0.0030   0.3078    0.0002

TABLE I. The mean Time-to-Completion (as a ratio of trajectory length, lower is better) and variance for all test participants (All), test participants who trained the model (T), and test participants who did not train the model (NT). BIP refers to our approach while Static refers to an open-loop trajectory. Green cells indicate the scenarios with the smallest mean values while gray cells indicate scenarios that are not significantly different, as calculated with the Mann-Whitney U test with a p-value < 0.05.

IV. EXPERIMENTS AND RESULTS
In order to evaluate the effectiveness of our algorithm in interaction scenarios with a musculoskeletal robot, we designed an experiment in which a human performs a joint physical task with a robot. Specifically, we chose a handshake scenario and implemented a small set of manually-crafted trajectories for demonstrations. In this section we show that not only is BIP capable of reproducing a robust, legible handshake motion, but that it is capable of successfully generalizing to other humans.
A. Experimental Setup
The musculoskeletal robot [10] employed in this work, shown in Fig. 1, contains kinematic degrees of freedom in the arm linkage and in the shoulder mechanism. These degrees of freedom are actuated with PAMs in an anatomical structure similar to that of humans. Due to the prevalence of the spherical joints required to create a complex biomimetic structure, the robot does not contain conventional joint angle sensors. Rather, each PAM was equipped with a tension sensor and a pressure sensor to capture the state of the actuators; however, only the pressure sensors were used in this experiment. The PAMs themselves are connected to proportional valves which are controlled via PID controllers with a pressure reference signal. The values reported in this paper are measured in MPa as a difference from atmospheric pressure. Human subjects were tracked using skeleton tracking running on a Kinect v2 camera. The observed degrees of freedom correspond to the x-, y-, and z-position of the right hand, with the camera at an angle such that both the x- and y-axes indicate the distance and direction (left/right) of the handshake, while the z-axis indicates the height. Sampling of all degrees of freedom was performed at a fixed rate.

Fig. 4. The motion of the human and robot over time when the human partner does not move their arm. The robot does not engage in a handshake.

Fig. 5. Top: the trajectory of the human's hand along the x-axis, which approximately corresponds to the distance to the robot, during different test interactions. The trajectory region shaded in red indicates the beginning portion of the interaction in which the participant has yet to move. The green region indicates the period in which the participant actively moves to shake the robot's hand. The blue region indicates the period after which the handshake is completed. Middle: the probability density corresponding to the estimated phase at the end of each aforementioned period (red, green, blue). Bottom: the probability density corresponding to the estimated phase velocity for each region.

During training, the robot executed manually-crafted trajectories with no feedback, i.e., in an open-loop fashion, as explained in Sec. III-C. These trajectories were constructed via linear interpolation from a start pressure value and an end pressure value for each of the PAMs in the robot; for some of the PAMs, the start and end values were equal, thus producing no movement for that DoF.
The end pressure values were chosen to produce handshake end points over the entire range of the robot in 3-D space. The human participants who assisted in training were instructed to shake the hand of the robot once for each executed trajectory over a fixed time window. Three participants contributed training demonstrations, with repetitions of each trajectory. During testing, the three participants from training as well as five additional participants were asked to shake the hand of the robot in eight different scenarios. In four of the scenarios, the robot once again executed a manually-crafted trajectory as in the training demonstrations, however, this time with different end points. In the remaining four scenarios, the BIP algorithm was employed with the robot generating response trajectories based on the human's hand movement. The participants were asked to move their hand to an arbitrary location and shake the hand of the robot while moving their hand at a requested speed: fast, normal (as in the same speed as used in the demonstrations), slow, and a special case of no movement at all. The speed definition was purposely vague and left up to the determination of each participant so as to adequately test the temporal robustness of the BIP algorithm. Each test participant executed each scenario several times. A response trajectory was generated by BIP at a reduced frequency (for computational reasons) relative to the robot's control loop. This trajectory was executed by the robot and consists solely of pressure values for each of the robot's degrees of freedom; no inverse kinematics or dynamics were used at any point as such models were unavailable.
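The hand-crafted open-loop trajectories described above amount to per-PAM linear interpolation between start and end pressures, which can be sketched as follows. The pressure values and PAM count here are illustrative, not the actual training trajectories.

```python
import numpy as np

def open_loop_trajectory(p_start, p_end, num_steps):
    """Linearly interpolate a pressure setpoint trajectory for each PAM.

    p_start/p_end: per-PAM pressures (illustrative values, MPa). PAMs whose
    start and end values are equal simply hold pressure, producing no
    motion for that DoF.
    """
    p_start = np.asarray(p_start, dtype=float)
    p_end = np.asarray(p_end, dtype=float)
    alphas = np.linspace(0.0, 1.0, num_steps)[:, None]
    return (1.0 - alphas) * p_start + alphas * p_end  # (num_steps, num_pams)

# Three hypothetical PAMs; the middle one holds pressure (no movement).
traj = open_loop_trajectory([0.10, 0.30, 0.20], [0.25, 0.30, 0.05], 50)
```

Varying only the end pressures then produces handshake end points across the robot's workspace, as the experiment requires.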
B. Results and Discussion
1) Qualitative Analysis:
A sequence of images showing two different test interactions over time for different participants is shown in Fig. 2. Qualitatively, these sequences simultaneously demonstrate three things: a) the robot learned to reproduce surprisingly human-like handshake motions despite the lack of access to any sort of analytical model, b) BIP is capable of generalizing this motion across both space and time, and c) the algorithm is able to generalize across different human participants. The temporal differences in particular can be clearly seen in Fig. 2; in the fast (top) sequence, the human and robot have reached each other by the 5th frame, whereas in the slow (bottom) sequence, this does not occur until the 8th (last) frame. While spatial differences are visible in these sequences, they can be more clearly seen in Fig. 3. In the first image, the human chooses an end point for the handshake that is low and near the robot, while in the second image the same human participant chooses an end point that is higher and closer to the human. Furthermore, an image sequence for an extreme edge case is shown in Fig. 4 in which the human participant does not move their hand at all, thus never beginning the interaction. In response, the robot similarly does not proceed with the interaction and produces only minute movements.

Fig. 6. Left: the distribution of estimated phases (top) and phase velocities (bottom) after the participant moves to shake the robot's hand, for all tested slow, normal, and fast interactions. This corresponds to the green region shown in Fig. 5. Right: the same type of plot as in Fig. 5 for the same normal speed interaction with the addition of an artificial pause at the beginning.

Fig. 7. The predicted pressure trajectory for one of the robot's PAMs during normal (top) and fast (middle and bottom) test interactions. In all cases, the current prediction (blue) is from approximately 50% through the interaction (the 17th inference step), with the predicted trajectories (red) from the previous 15 inference steps shown for reference. These trajectories are given to the robot's PID controller with the resulting actual pressure values shown in gray. The predicted and actual values do not directly coincide due to physical restrictions inherent to the pneumatic system that the learned model is unaware of. The corresponding human observations along the x-axis are shown in orange.
2) Temporal Analysis:
An analysis of the inferred phase at different points in the interaction for each movement speed is shown in Fig. 5. The trajectories shown in the top plots represent the movement of the human's hand and are broken up into three periods: the period before the human has moved (red), the period during which the human is moving (green), and the period after the human has stopped moving (blue). The phase and phase velocity predictions at the end of each period are shown such that the portion of the trajectory falling inside the corresponding shaded region has been observed. These plots yield an interesting insight: the estimated phase for each movement speed is approximately the same as in the normal speed case. That is, no matter how quickly the interaction proceeds in real time, the phase of the interaction is the same immediately before movement begins and immediately after. Instead, it is the phase velocity that differs, particularly in the region during which movement occurs (the green region). The slow speed interaction has the smallest phase velocity at the end of the movement period while the fast speed interaction has the greatest velocity. The practical implication of this is that the amount of absolute time required to reach each point in the interaction differs due to different phase velocities, even if the phase value at each point is the same. Since the state (Eq. 3) evolves according to a linear velocity model, a slower phase velocity requires more state updates to reach the same point in phase.

Figure 6 depicts the distribution of estimated phases and phase velocities for all interactions, and shows that this trend holds for all cases. While the phase estimates exhibit little difference between the movement speed cases, the velocity estimates show a positive relationship between movement speed and phase velocity. Analytically, this is due to our choice of initial uncertainty in the phase and phase velocity in Eq. 11; the uncertainty in phase is set much lower than that of phase velocity because we are confident that the initial phase is 0. Phase and phase velocity have a non-zero covariance (an uncertainty in velocity affects the uncertainty in phase), but due to the different magnitudes in the initial uncertainty the phase velocity experiences more significant updates during inference. The other observation we can make from Fig. 6 is that the larger phase velocity estimates exhibit greater variance. This is due to the increased uncertainty corresponding to the larger estimates, which can also be seen in the green velocity plots of Fig. 5. As a consequence, this impacts the uncertainty in the phase estimate for the blue region, as inference has already taken into account the velocities (and uncertainties) of the green region. This is why the uncertainty for the blue region phase estimate increases with interaction speed in Fig. 5.

Lastly, we note that BIP handles the special case of no movement by reducing the phase velocity to 0, thus stopping the temporal progression of the interaction. This is visualized in the last column of Fig. 5. However, this elimination of the temporal velocity is flexible and can be recovered from, as can be seen with the introduction of an artificial pause at the beginning of the interaction as shown in the right column of Fig. 6. Although the phase velocity has been reduced to 0 at the end of the extended "no movement" (red) period, it quickly progresses when the human begins moving and yields phase estimates consistent with what we expect. This indicates that not only is BIP robust to the movement speed of the interaction, but it is also robust to variations in when the interaction begins.

3) Spatial Analysis: While we have demonstrated that the BIP framework is robust to temporal variations, our results also show that it is robust to spatial variations. As Fig. 7 shows, the predicted robot trajectory is dependent both on the value of the human observations (spatial variation) as well as their rate of change (temporal variation). The top and middle figures demonstrate spatial generalization through the pattern of increasing pressure values associated with the previous predictions (red dashed lines); as more human observations become available, BIP continues to refine the prediction. However, because these two interactions have similar end points, the effect of movement speed on the predictions becomes evident. While both predictions ultimately arrive at approximately the same pressure value for this particular PAM, BIP predicts these values for the fast interaction much earlier in response to the high rate of change of the human movement. In contrast, the bottom figure is associated with a fast interaction at a different end point and the predicted trajectory is significantly different than the others, demonstrating the ability to spatially generalize to different end points. These figures also highlight limitations of the BIP framework: namely, a lack of knowledge of control lag and physical constraints. The control lag can be observed via the phase shift between the predicted pressure values and when the robot actually reaches those values (gray line). Similarly, BIP yields inferred pressure values that the system is incapable of reaching due to physical and/or mechanical limitations. This is visible in the offset between the current prediction (blue dashed line) and the actual robot.

As a result of spatial generalization, we hypothesize that the robot and human positions converge to the same physical location faster when both participants are active in the interaction (BIP scenarios), as opposed to only the human (static scenarios).
However, for technical reasons, we lack the position of the robot end-effector in 3D space and, therefore, measure the length of an interaction by how long it takes the human and robot degrees of freedom to converge to steady-state values, which we refer to as the Time-to-Completion. To this end, the variance of each DoF was calculated over a sliding window of fixed duration. If the variance of all DoFs within the window falls below a per-signal threshold (specified in m for the human and MPa for the robot), then the start time of the sliding window is taken to be the Time-to-Completion. These thresholds are chosen such that all scenarios yield a completion time.

The results, shown in Table I, support our hypothesis. The mean Time-to-Completion values for all test participants are smaller when the BIP algorithm is employed than with a static handshake trajectory. The one exception is the case of test participants who also trained the model (the T subset), for whom the BIP interactions were not statistically different from the second static trajectory. This is an interesting observation, since it suggests that participants who had already participated in training the model were better able to predict where the robot would go when compared to new participants, despite new handshake end points. Thus, their interactions resulted in lower Time-to-Completion times than those of the participants who had not trained the model (the NT subset).

Fig. 8. Left: the position distributions (mean and standard deviation) for all static and BIP scenarios for the human's hand along the x-axis (top), y-axis (middle), and z-axis (bottom). Right: the corresponding velocity distributions.
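The Time-to-Completion computation described above can be sketched as follows. The window length and variance thresholds are placeholders (the experiment's exact values are not restated here), and `time_to_completion` is a hypothetical helper name.

```python
import numpy as np

def time_to_completion(signals, t, window, thresholds):
    """Start time of the first sliding window in which every DoF's variance
    falls below its threshold; returns None if no such window exists."""
    dt = t[1] - t[0]
    w = max(1, int(round(window / dt)))            # window length in samples
    for start in range(len(t) - w + 1):
        if np.all(signals[start:start + w].var(axis=0) < thresholds):
            return t[start]                        # window start = completion time
    return None

# Toy trajectory: one DoF oscillates, then settles at t = 4 s.
t = np.linspace(0.0, 10.0, 1001)
pos = np.where(t < 4.0, np.sin(t), np.sin(4.0))
ttc = time_to_completion(pos[:, None], t, window=1.0, thresholds=np.array([1e-8]))
print(ttc)  # ≈ 4.0, the moment the motion reaches steady state
```

Taking the start of the quiet window, rather than its end, matches the definition above: the interaction is considered complete as soon as all signals have entered their steady-state values.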
However, despite their familiarity with the experiment, BIP was still able to produce similar completion times; in the case of new participants who had not trained the model, BIP resulted in more efficient interactions with smaller completion times.

The results in Table I also indicate that the variance in Time-to-Completion is smaller for the static trajectories than for the BIP ones, particularly for slow-speed interactions. This is expected, considering that the test participants are free to choose their own handshake end point when testing with BIP, as opposed to the static trajectories, which result in an identical handshake end point for every participant. This is especially clear when looking at the position and velocity distributions of the human, as shown in Fig. 8. Likewise, the human begins moving earlier in BIP scenarios than in static ones, as they are dictating the end points rather than the robot; this is visualized in the earlier peak of the velocity distributions. The shape of the velocity distributions is also quite illuminating, as the BIP distributions closely resemble the typical bell-shaped velocity profiles of point-to-point movements observed in humans [22].

The last result highlighting spatial generalization is that the Pearson correlation coefficients differ between the human and robot trajectories for BIP and static handshakes, as shown in Fig. 9. Unsurprisingly, the only correlations of significant magnitude in the static scenarios are between the human degrees of freedom and the actuated robot degrees; the un-actuated degrees exhibit no significant correlations. By contrast, all degrees of freedom which were actuated in the training process yielded significant correlations when BIP is used, indicating that BIP is effectively exploiting the model learned from demonstrations. This is also supported by the histogram generated from a non-actuated degree of freedom (in the static scenarios) in Fig. 10.
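The sliding-window correlation underlying Fig. 10 can be sketched as follows, using synthetic stand-ins for the recorded signals (one PAM that tracks the human hand and one un-actuated PAM that is pure noise); the window size and noise levels are illustrative.

```python
import numpy as np

def sliding_corr(a, b, w):
    """Pearson correlation between a and b over every length-w sliding window."""
    return np.array([np.corrcoef(a[i:i + w], b[i:i + w])[0, 1]
                     for i in range(len(a) - w + 1)])

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 1000)
hand_y = np.sin(2 * np.pi * t) + 0.05 * rng.normal(size=1000)  # human hand, y-axis
actuated = hand_y + 0.05 * rng.normal(size=1000)               # PAM coupled to the human
unactuated = rng.normal(size=1000)                             # un-actuated PAM: noise

r_act = sliding_corr(actuated, hand_y, w=100)
r_un = sliding_corr(unactuated, hand_y, w=100)

# Histogram counts over [-1, 1]: the coupled pair concentrates near +1,
# while the un-actuated pair spreads around 0 (as in the static scenarios).
print(np.histogram(r_act, bins=4, range=(-1, 1))[0])
print(np.histogram(r_un, bins=4, range=(-1, 1))[0])
```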
The histogram for the static scenarios is approximately Gaussian with a mean of 0; in other words, it is Gaussian noise. However, the histogram for the BIP scenarios is bi-modal with peaks at −1 and 1, demonstrating strong positive and negative correlations.

Fig. 9. Mean Pearson correlation coefficients for static trajectories (left) and BIP (right). The color of the squares indicates the magnitude of the correlation. The first 27 rows and columns represent the coefficients for the robot degrees of freedom; the last 3 rows and columns represent the human. The mean correlations are calculated over all participants in all static scenarios and all BIP scenarios.

Fig. 10. Histogram of the Pearson correlation coefficient generated from a sliding window over static trajectories (left) and BIP (right). This correlation is for an un-actuated PAM and the hand position along the y-axis.

V. CONCLUSIONS
In this work, we have shown that Bayesian Interaction Primitives can be successfully utilized in a physical, cooperative human-robot interaction scenario with a musculoskeletal robot. Despite the challenges inherent to pneumatic artificial muscles (nonlinear dynamics and a lack of kinesthetic teaching), BIPs were able to learn strong spatiotemporal relationships between the movement trajectories of the robot and human test participants. Furthermore, these relationships generalized to new human partners who did not take part in training the model, as well as to significant temporal variations, including an edge case in which the human does not interact at all. At the same time, we have also identified limitations of the BIP framework, such as the disregard for control lag and mechanical constraints, which pose challenges to real-time, physical interactions. In future work, we will study more complex interaction scenarios that involve a sequence of multiple primitives, introduce sensors of multiple modalities, and attempt to learn direct spatiotemporal relationships between human muscle activations and the PAMs of a musculoskeletal robot.

ACKNOWLEDGMENT
We would like to thank Masuda Hiroaki for his assistance. This work was supported by the National Science Foundation under Grant Nos. 1714060 and IIS-1749783, JSPS under KAKENHI Grant Number 18H01410, and the Honda Research Institute. J.C. is a JSPS International Research Fellow.

REFERENCES

[1] G. K. Klute, J. M. Czerniecki, and B. Hannaford, "McKibben artificial muscles: pneumatic actuators with biomechanical intelligence," in Advanced Intelligent Mechatronics, 1999 IEEE/ASME International Conference on. IEEE, 1999, pp. 221–226.
[2] B. Tondu, S. Ippolito, J. Guiochet, and A. Daidie, "A seven-degrees-of-freedom robot-arm driven by pneumatic artificial muscles for humanoid robots," The International Journal of Robotics Research, vol. 24, no. 4, pp. 257–274, 2005.
[3] M. Van Damme, B. Vanderborght, B. Verrelst, R. Van Ham, F. Daerden, and D. Lefeber, "Proxy-based sliding mode control of a planar pneumatic manipulator," The International Journal of Robotics Research, vol. 28, no. 2, pp. 266–284, 2009.
[4] J. Campbell and H. Ben Amor, "Bayesian interaction primitives: A SLAM approach to human-robot interaction," in To Appear: Proceedings of the 1st Conference on Robot Learning (CoRL), pp. 1–9.
[5] H. B. Amor, G. Neumann, S. Kamthe, O. Kroemer, and J. Peters, "Interaction primitives for human-robot cooperation tasks," in Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014, pp. 2831–2837.
[6] G. Maeda, M. Ewerton, R. Lioutikov, H. B. Amor, J. Peters, and G. Neumann, "Learning interaction for collaborative tasks with probabilistic movement primitives," in Humanoid Robots (Humanoids), 2014 14th IEEE-RAS International Conference on. IEEE, 2014, pp. 527–534.
[7] T. Noritsugu and T. Tanaka, "Application of rubber artificial muscle manipulator as a rehabilitation robot," IEEE/ASME Transactions on Mechatronics, vol. 2, no. 4, pp. 259–267, 1997.
[8] N. G. Tsagarakis and D. G. Caldwell, "Development and control of a soft-actuated exoskeleton for use in physiotherapy and training," Autonomous Robots, vol. 15, no. 1, pp. 21–33, 2003.
[9] A. Thomaz, G. Hoffman, M. Cakmak, et al., "Computational human-robot interaction," Foundations and Trends® in Robotics, vol. 4, no. 2-3, pp. 105–223, 2016.
[10] A. Hitzmann, H. Masuda, S. Ikemoto, and K. Hosoda, "Anthropomorphic musculoskeletal 10 degrees-of-freedom robot arm driven by pneumatic artificial muscles," Advanced Robotics, vol. 32, no. 15, pp. 865–878, 2018.
[11] C.-P. Chou and B. Hannaford, "Static and dynamic characteristics of McKibben pneumatic artificial muscles," in Proceedings of the 1994 IEEE International Conference on Robotics and Automation. IEEE, 1994, pp. 281–286.
[12] S. Ikemoto, Y. Nishigori, and K. Hosoda, "Direct teaching method for musculoskeletal robots driven by pneumatic artificial muscles," in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 3185–3191.
[13] A. Hildebrandt, O. Sawodny, R. Neumann, and A. Hartmann, "Cascaded control concept of a robot with two degrees of freedom driven by four artificial pneumatic muscle actuators," in Proceedings of the 2005 American Control Conference. IEEE, 2005, pp. 680–685.
[14] A. Billard, S. Calinon, R. Dillmann, and S. Schaal, "Robot programming by demonstration," in Springer Handbook of Robotics. Springer, 2008, pp. 1371–1394.
[15] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.
[16] S. Schaal, "Dynamic movement primitives: a framework for motor control in humans and humanoid robotics," in Adaptive Motion of Animals and Machines. Springer, 2006, pp. 261–280.
[17] M. Ewerton, G. Neumann, R. Lioutikov, H. B. Amor, J. Peters, and G. Maeda, "Learning multiple collaborative tasks with a mixture of interaction primitives," in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 1535–1542.
[18] D. Kulić, C. Ott, D. Lee, J. Ishikawa, and Y. Nakamura, "Incremental learning of full body motion primitives and their sequencing through human motion observation," The International Journal of Robotics Research, vol. 31, no. 3, pp. 330–345, 2012.
[19] L. Rozo, S. Calinon, D. G. Caldwell, P. Jimenez, and C. Torras, "Learning physical collaborative robot behaviors from human demonstrations," IEEE Transactions on Robotics, vol. 32, no. 3, pp. 513–527, 2016.
[20] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.
[21] A. D. Dragan, K. C. Lee, and S. S. Srinivasa, "Legibility and predictability of robot motion," in Proceedings of the 8th ACM/IEEE International Conference on Human-Robot Interaction. IEEE Press, 2013, pp. 301–308.
[22] J. R. Flanagan and D. J. Ostry, "Trajectories of human multi-joint arm movements: Evidence of joint level planning," in