Learning to Shift Attention for Motion Generation
You Zhou, Jianfeng Gao and Tamim Asfour
Abstract — One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query. Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories. The other difficulty is the small number of demonstrations, which cannot cover the entire workspace. To overcome this problem, a motion generation model with extrapolation ability is needed. Previous works restrict task queries to local frames and learn representations in local frames. We propose a model that solves both problems. For multiple modes, we suggest learning local latent representations of motion trajectories with a density estimation method based on real-valued non-volume preserving (RealNVP) transformations, which provide a set of powerful, stably invertible, and learnable transformations. To improve the extrapolation ability, we propose to shift the attention of the robot from one local frame to another during task execution. In the experiments, we consider the docking problem used in previous works, where a trajectory has to be generated to connect two dockers without collision. We increase the complexity of the task and show that the proposed method outperforms other approaches. In addition, we evaluate the approach in real robot experiments.
I. INTRODUCTION
Learning from demonstration (LfD) is a promising approach in robotics research [1]. It simplifies robot programming and enables a flexible realization of a variety of robot applications. Two critical challenges of LfD are how to represent motions and how to generalize the learned skill to different task queries. A proper motion representation simplifies skill generalization and allows generating motion trajectories for unseen task queries. To address the problem of skill generalization, we can learn a mapping from task queries to appropriate motion trajectories based on multiple human demonstrations. Learning such a mapping is not a trivial task. One challenge is that the demonstrations usually contain multiple modes, which should be considered when learning a motion generation model. However, multiple modes cannot be captured by many regression models such as Gaussian process regression or a fully-connected neural network. Another difficulty is that usually only a rather small number of demonstrations is available, which does not cover the entire workspace. Thus, the learned motion generation model should be able to extrapolate to task queries outside the training queries' range.
The research leading to these results has received funding from the German Federal Ministry of Education and Research (BMBF) under the project OML (01IS18040A). The authors are with the Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany. {you.zhou, asfour}@kit.edu

Fig. 1: Two challenges of a motion generation system: 1) multiple modes in human demonstrations; 2) extrapolation to task queries outside of the training range.

To meet these two challenges, based on the idea of local frames, we propose a model that consists of 1) a latent transformation that transforms demonstration trajectories to a latent space with a single-mode distribution, and 2) a recurrent attention model that shifts the attention of the robot from one local frame to another.

II. RELATED WORK
Several approaches in the literature have been proposed for learning a mapping from task queries to parameterized motion representations such as dynamic movement primitives (DMPs) (see [2], [3], [4], [5] and [6]). Different regression models are applied for learning this mapping, such as locally weighted regression, Gaussian process regression, autoencoders, and mixture density networks (MDNs). The generalization capability of the motion generation model depends on the properties of the regression method. For example, compared to other methods, an MDN can handle the multiple modes problem, as shown in our previous work [6]. However, like the other methods, it cannot handle extrapolation when new task queries are far from the training queries' range.

To solve the multiple modes problem, instead of using a GMM-based method such as an MDN, we can convert a multi-mode distribution into a single-mode distribution. One popular model here is the variational auto-encoder (VAE) [7], whose encoder maps data to a latent distribution close to a Gaussian distribution while its decoder maps samples from the latent distribution back to the original space. An alternative solution is normalizing flow models, a set of invertible mappings that transform arbitrary distributions into a simple distribution (see [8] for a review). These models are trained directly with likelihood costs and the change of variables formula. Among these models, the real-valued non-volume preserving transformation (RealNVP), proposed in [9], allows integrating arbitrary functions or neural networks while still guaranteeing invertibility through its unique structure. VAEs and RealNVPs were separately extended to conditional versions, such as in [10], [11], to consider task queries. However, because of the poor regularization of neural networks, they can hardly extrapolate to queries outside the training queries' range.
Hence, they cannot solve the problems mentioned above.

The task-parameterized Gaussian mixture model (TP-GMM), presented in [12], [13], is a popular method to improve extrapolation ability. It restricts task queries to the location and orientation of local frames and maps the task-space trajectories to their representations in these local frames. Then, Gaussian mixture models (GMMs) are learned for the distribution of these local representations. During execution, a TP-GMM combines those GMMs and generates motions for new task queries. However, the combination of Gaussian components from local frames distorts the generated global trajectories, especially when they are far away from those in the human demonstrations. In [14], [15], [16], the authors solve this problem by introducing a variance division factor that makes the low-variance component more valuable in the Gaussian multiplication. The assumption is that a local frame with lower variance is more important than one with higher variance. In [16], the variance division factor α for each frame depends on the variance Σ of its local trajectories. For the k-th local frame,

α_k = ‖Σ_k^{-γ}‖ / Σ_{k'=1}^{K} ‖Σ_{k'}^{-γ}‖,   (1)

where γ is a parameter that determines how sensitive the value is to the variance.

The low-variance assumption only works when task queries restrict the right motion trajectories, i.e. the right demonstrations, to a small area of the task space. However, some task queries, such as an obstacle's location, do not require the motion trajectories to pass through a small area. Hence, the proposed methods ignore such task queries. Moreover, TP-GMM based methods cannot solve the multiple modes problem.

III. OUR APPROACH
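As an illustration, the frame weighting of Equation 1 can be sketched in a few lines. This is a minimal numpy sketch under the simplifying (hypothetical) assumption that each frame's covariance is diagonal, so the matrix power Σ^{-γ} reduces to an element-wise power:

```python
import numpy as np

def frame_weights(local_vars, gamma=1.0):
    """Variance division factors alpha_k of Eq. 1: frames whose local
    trajectories have lower variance receive a larger weight.
    local_vars: (K, D) array of per-frame variance diagonals
    (a simplifying assumption; full covariances would use a matrix power)."""
    norms = np.linalg.norm(np.asarray(local_vars, float) ** (-gamma), axis=1)
    return norms / norms.sum()

# a low-variance frame dominates a high-variance one
alpha = frame_weights([[0.01, 0.01], [1.0, 1.0]])
```

With a large γ the weighting approaches a hard selection of the lowest-variance frame; with γ = 0 all frames are weighted equally.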
Instead of learning GMMs for local frames as in TP-GMMs, we associate each local frame with a latent distribution given by a RealNVP. As discussed, RealNVPs can convert an arbitrary, i.e. multi-mode, distribution into a single-mode distribution in the latent space. Compared to the approximate inference of VAEs, RealNVPs enable exact inference and sampling in the latent space with an invertible and stable mapping. This makes RealNVPs well suited to solve the multiple modes problem in motion generation.

Fig. 2: SALaT: Shift Attention Latent Transformation. Left: a recurrent shift attention model based on GRUs outputs the attention weights {a_k}_{k=1}^K. Right: each local frame is associated with a linear transformation T_k followed by a recurrent RealNVP.

Like all other general-purpose regression models, neural networks can hardly handle extrapolation. Hence, we avoid providing task queries as inputs to any neural network used in the model. Like TP-GMMs, we restrict the task queries to the location and orientation of local frames. However, unlike TP-GMMs, which obtain a global motion trajectory with Gaussian multiplication, we consider that the robot should learn to shift its attention between local frames during task execution. When the attention is on one local frame, the motion trajectory is generated by the corresponding RealNVP.

We propose the shift attention latent transformation (SALaT). As shown in Figure 2, it consists of two parts: a latent transformation and a shift attention model. In the following, we first introduce the latent transformation based on RealNVPs, which transforms trajectories to a latent distribution. Then, we introduce the attention model that combines these local models and explain how it is learned.

A. Latent Transformations
Before the latent transformations, we first transform all demonstration trajectories into local frames. Each task query q is associated with the transformations of K local frames:

T(q) = { T_k = (A_k, b_k) }_{k=1}^K,   (2)

where A_k and b_k are the rotation and translation of the k-th local frame. From N demonstrations, we get N local trajectories for each local frame.

For the latent mapping, we split the local trajectory point at each timestamp into two parts such that y = [y_1, y_2].

Fig. 3: The structure of a recurrent RealNVP. The scaling and translation networks s_1, s_2, t_1, t_2 are Bi-GRUs.

According to the RealNVP shown in Figure 3, the latent transformation is

z_1 = y_1 ⊙ exp(s_1(y_2)) + t_1(y_2);  z_2 = y_2 ⊙ exp(s_2(z_1)) + t_2(z_1),   (3)

where s_1, s_2 and t_1, t_2 are scaling and translation functions and ⊙ is the Hadamard or element-wise product. The inverse RealNVP is calculated with

y_2 = (z_2 − t_2(z_1)) ⊙ exp(−s_2(z_1));  y_1 = (z_1 − t_1(y_2)) ⊙ exp(−s_1(y_2)).   (4)

The advantage of a RealNVP is that its invertibility is independent of the structure of the scaling and translation functions. To consider the temporal state of the trajectory, we use bi-directional gated recurrent units (Bi-GRUs) [17], [18] to model these functions.

To train the RealNVPs, we directly calculate the likelihood of the data with the change of variables formula, namely

p(y) = p_Z(f(y)) | det( ∂f(y) / ∂y^T ) |,   (5)

where f is the latent mapping shown in Figure 3 and

det( ∂f(y) / ∂y^T ) = exp( Σ_{j_1} s_1(y_2)_{j_1} + Σ_{j_2} s_2(z_1)_{j_2} ).   (6)

Hereby, j_1 and j_2 index the dimensions of the neural network outputs.
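The two coupling steps and their exact inverses can be sketched as follows. The tanh/linear stand-ins for the scaling and translation networks s_1, s_2, t_1, t_2 are placeholders for the Bi-GRUs used in the paper, chosen only to make the sketch runnable:

```python
import numpy as np

# Toy stand-ins for the scaling/translation networks (Bi-GRUs in the paper).
def s1(h): return np.tanh(h)
def t1(h): return 0.5 * h
def s2(h): return np.tanh(h)
def t2(h): return -0.3 * h

def nvp_forward(y1, y2):
    """One RealNVP block (Eq. 3): couple the two halves of a trajectory point."""
    z1 = y1 * np.exp(s1(y2)) + t1(y2)
    z2 = y2 * np.exp(s2(z1)) + t2(z1)
    return z1, z2

def nvp_inverse(z1, z2):
    """Exact inverse (Eq. 4): undo the couplings in reverse order."""
    y2 = (z2 - t2(z1)) * np.exp(-s2(z1))
    y1 = (z1 - t1(y2)) * np.exp(-s1(y2))
    return y1, y2

rng = np.random.default_rng(0)
y1, y2 = rng.normal(size=3), rng.normal(size=3)
z1, z2 = nvp_forward(y1, y2)
r1, r2 = nvp_inverse(z1, z2)
assert np.allclose(y1, r1) and np.allclose(y2, r2)  # invertible by construction
```

Note that invertibility holds no matter what s and t compute, which is why the paper can plug in recurrent networks without losing exact inference.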
We can assume that the trajectory points at different timestamps are independent of each other and use a negative log-likelihood cost function such that

L_nvp = − Σ_{i=1}^{T} [ log p_Z(f(y_i)) + Σ_{j_1} s_1(y_{2,i})_{j_1} + Σ_{j_2} s_2(z_{1,i})_{j_2} ],   (7)

where T is the sequence length of the Bi-GRUs. The latent distribution Z ∼ N(0, Σ) follows a standard multi-dimensional Gaussian distribution.

This cost, however, does not consider the covariance between trajectory points at two different timestamps. Thus, it generates latent trajectories that might not be smooth in the latent space. To consider the covariance between timestamps, we construct a Gaussian distribution with a covariance matrix K such that

K_{i,j} = k(x_i, x_j) = σ_f^2 exp( −(x_i − x_j)^2 / l^2 ),   (8)

where x is a temporal factor and the hyper-parameters are predefined such that σ_f = 1, l = 1. The distribution p_Ξ(ξ_z) ∼ N(0, K) is similar to a Gaussian process. Training RealNVPs with the cost function

L_nvp = − log p_Ξ(ξ_z) − Σ_{i=1}^{T} [ Σ_{j_1} s_1(y_{2,i})_{j_1} + Σ_{j_2} s_2(z_{1,i})_{j_2} ]   (9)

gives better results than using the previous cost in our experience.

B. Recurrent Attention Model
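The temporally correlated prior of Equation 8 amounts to scoring each latent dimension under a zero-mean Gaussian with a squared-exponential covariance over timestamps, which penalizes jagged latent trajectories. A minimal sketch; the jitter term is an added numerical-stability assumption, not part of the paper:

```python
import numpy as np

def gp_log_prior(latent_traj, sigma_f=1.0, length=1.0, jitter=1e-6):
    """log p_Xi(xi_z) under N(0, K) with the squared-exponential
    kernel of Eq. 8, for one latent dimension over T timestamps."""
    t = np.arange(len(latent_traj), dtype=float)
    d = t[:, None] - t[None, :]
    K = sigma_f**2 * np.exp(-(d**2) / length**2) + jitter * np.eye(len(t))
    sign, logdet = np.linalg.slogdet(K)
    quad = latent_traj @ np.linalg.solve(K, latent_traj)
    return -0.5 * (quad + logdet + len(t) * np.log(2 * np.pi))
```

A slowly varying trajectory scores higher under this prior than a rapidly alternating one of the same magnitude, which is exactly the smoothness pressure the cost of Equation 9 adds.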
After training the RealNVPs for the local frames, we sample latent trajectories from the standard latent distribution or use its mean trajectory. The attention model determines how much attention is paid to each local frame at each timestamp. As shown in Figure 2, we use GRUs followed by a softmax layer to generate attention weights {a_k}_{k=1}^K for the K local frames. With these attention weights, the generated trajectory point at timestamp t is

ŷ_t = Σ_{k=1}^{K} a_{k,t} · T_k^{-1}( NVP_k^{-1}(z_{k,t}) ),   (10)

where {z_{k,t}}_{k=1}^K are the corresponding latent trajectory points.

The cost function for training the attention model consists of three parts: a variance weighted reproduction cost, an attention distribution cost, and a smoothness cost. The variance weighted reproduction cost guarantees that the trajectories generated by the model are similar to the demonstrations, especially in low-variance regions. However, it could ignore local frames containing high-variance local trajectories. To solve this problem, we introduce the attention distribution cost, which distributes the attention according to the assumption that the robot should pay attention to one local frame at a time but needs to distribute its attention equally over all local frames over time. However, this cost introduces high accelerations in the generated trajectories. To guarantee smooth transitions, a smoothness cost is introduced.
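Equation 10 blends the per-frame reconstructions with the attention weights. A sketch for a single timestamp, under the hypothetical convention that T_k^{-1} maps a local point y_loc back to the task space as A_k y_loc + b_k (the paper does not spell out the convention):

```python
import numpy as np

def combine_local(decoded, A, b, attn):
    """Eq. 10 at one timestamp. decoded: (K, D) points NVP_k^{-1}(z_k)
    already mapped out of the latent space; A: (K, D, D) frame rotations;
    b: (K, D) frame translations; attn: (K,) softmax weights (sum to 1)."""
    task_space = np.einsum('kij,kj->ki', A, decoded) + b  # apply T_k^{-1}
    return attn @ task_space                              # attention-weighted blend
```

If all attention sits on one frame (a one-hot attn), the output is exactly that frame's reconstruction, which is the behaviour the attention distribution cost encourages at each timestamp.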
1) Variance Weighted Reproduction Cost: In [16], the authors suggested a variance weighted reproduction cost to determine the parameter γ in Equation 1. We use a similar cost for the attention model. For N demonstrations, we have N local trajectories {ξ^n}_{n=1}^N in each local frame. We calculate K variance values for the trajectory points at timestamp t and find the minimal value v_t such that

v_t = min_{k ∈ {1,…,K}} Var( {ξ^n_{k,t}}_{n=1}^N ).   (11)

The variance weight is calculated by

w_t = 1 / (v_t + ε),   (12)

where the same small constant ε is used in all our experiments. The reproduction cost is

L_reprod = Σ_{n=1}^{N} Σ_{t=1}^{T} w_t ‖ξ^n_t − ξ̂^n_t‖^2,   (13)

with ξ and ξ̂ the demonstrated and generated trajectories.

2) Attention Distribution Cost: The variance weighted reproduction cost penalizes reproduction errors in low-variance regions but totally ignores local frames containing high-variance local trajectories, such as the frame of the obstacle mentioned before. To solve this problem, we assume that the attention should be roughly equally distributed over all K local frames during the task execution and end up with an attention distribution cost such that

L_traj = 1 / (N log K) Σ_{n=1}^{N} Σ_{k=1}^{K} ( Σ_{t=1}^{T} a^n_{k,t} / T ) · log( Σ_{t=1}^{T} a^n_{k,t} / T ),   (14)

where a^n_{k,t} is the attention on the k-th local frame at timestamp t in the n-th demonstration, and (Σ_{t=1}^{T} a^n_{k,t}) / T is the average attention on the k-th local frame in the n-th demonstration. Minimizing this cost distributes the attention equally over the local frames over time. However, it can also result in an equal distribution of attention at each timestamp, which weakens the model's extrapolation ability. Consider the case when attention is distributed equally over the local frames at each timestamp: the model cannot extrapolate when local frames are far away from each other, because the unseen relative position of the local frames unpredictably distorts the trajectory shape.
Hence, we introduce another cost such that

L_point = 1 / (N·T) Σ_{n=1}^{N} Σ_{t=1}^{T} Σ_{k=1}^{K} ( −a^n_{k,t} log a^n_{k,t} ).   (15)

Minimizing this cost concentrates the attention on only one local frame at each timestamp. The attention distribution cost is the sum of both costs:

L_dist = w_traj L_traj + w_point L_point.   (16)

In our experiments, w_traj = 10 and w_point = 1. The attention distribution cost corresponds to the assumption that a robot should pay attention to one local frame at each timestamp but distribute its attention equally over all local frames during the whole task.
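A minimal sketch of the two entropy terms and their combination (Eqs. 14–16) for a single demonstration; the 1/log K normalisation of the first term and the eps guard against log 0 are reconstruction assumptions:

```python
import numpy as np

def attention_distribution_cost(attn, w_traj=10.0, w_point=1.0, eps=1e-12):
    """L_dist of Eq. 16 for one demonstration.
    attn: (T, K) attention weights per timestamp (rows sum to 1)."""
    T, K = attn.shape
    a_bar = attn.mean(axis=0)  # average attention per frame over time
    # Eq. 14: negative entropy of the time-averaged attention,
    # normalised by log K so the term lies in [-1, 0]
    l_traj = np.sum(a_bar * np.log(a_bar + eps)) / np.log(K)
    # Eq. 15: mean per-timestamp entropy, pushing toward one frame at a time
    l_point = np.mean(np.sum(-attn * np.log(attn + eps), axis=1))
    return w_traj * l_traj + w_point * l_point
```

Constant one-hot attention makes the second term zero but leaves the first term poor, while attention that moves between frames over time drives both terms down, which is the behaviour the paper asks for.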
3) Smoothness Cost: The attention distribution cost results in a rapid shift of attention from one local frame to another and, thus, large accelerations in the generated trajectories. To solve this problem, we introduce a cost that penalizes rapid changes in the trajectory such that

L_smooth = 1/N Σ_{n=1}^{N} Σ_{t=1}^{T−1} ‖ŷ_{t+1,n} − ŷ_{t,n}‖^2.   (17)

The smoothness cost guarantees a smooth transition from one local model to another. The total cost for training the attention model is the sum of all three costs:

L_att = L_reprod + L_dist + L_smooth.   (18)

Fig. 4: Dockers Experiment. Top: the training and testing data for the experiment. Bottom: two examples for extrapolation.

In many applications, the local trajectories already follow a Gaussian distribution, as in the docker experiment (see subsection IV-A). In this case, we can skip the latent transformation and directly use the shift attention model. We call this simplified version of SALaT the shift attention linear transformation (SALiT). However, as shown in tasks other than the docker experiment, a Gaussian is not always enough to represent the local trajectory distribution, and SALiT then fails to generate valid trajectories.
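The remaining two terms of the attention-model cost (Eqs. 11–13 and 17) can be sketched as below. Taking Var(·) over demonstrations per dimension and summing over dimensions, as well as the small constant eps, are reconstruction assumptions:

```python
import numpy as np

def reproduction_cost(demos, gens, local_demos, eps=1e-2):
    """Variance-weighted reproduction cost (Eqs. 11-13).
    demos, gens: (N, T, D) demonstrated / generated task-space trajectories.
    local_demos: (K, N, T, D) demonstrations expressed in each local frame."""
    per_frame_var = local_demos.var(axis=1).sum(axis=-1)  # (K, T), Var over demos
    v = per_frame_var.min(axis=0)                         # Eq. 11: lowest-variance frame per t
    w = 1.0 / (v + eps)                                   # Eq. 12
    err = np.sum((demos - gens) ** 2, axis=-1)            # (N, T) squared errors
    return float(np.sum(w[None, :] * err))                # Eq. 13

def smoothness_cost(gens):
    """Eq. 17: penalise large steps between consecutive generated points."""
    diffs = gens[:, 1:] - gens[:, :-1]
    return float(np.mean(np.sum(np.sum(diffs ** 2, axis=-1), axis=-1)))
```

The total attention-model cost of Eq. 18 is then the sum of these two values and the attention distribution term.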
C. Task Execution with SALaT
For task execution, we either draw latent trajectories from the latent distribution or use its mean trajectory, and transform them to local trajectories according to the new task query. The attention model is applied to combine these local models and generate global trajectories that accomplish the task.

Sampling from the latent distribution does not guarantee the soundness of the generated trajectories, because the resulting samples can still be far from the mean. To evaluate the success rate in the simulated experiments, we therefore select the mean latent trajectory. In subsection IV-D, we use samples to show that SALaT can generate trajectories associated with different modes.

IV. EVALUATIONS
To evaluate our method, we conduct simulated and real experiments. Inspired by [12], [16], we construct a docker experiment, where a valid trajectory connects the start and goal dockers without collision. It is an abstraction of many robot applications, as shown in subsection IV-D. To introduce multiple modes and increase the task complexity, we add one obstacle and construct a docker-obstacle experiment, where a valid trajectory must also go around the obstacle. To make the task even more difficult, we change the goal docker to a tunnel and construct a docker-obstacle-tunnel experiment, where a valid trajectory must go through the tunnel and come back to the start docker. We compare SALaT with previous approaches based on TP-GMMs ([13] and [16]) in all three experiments.

Fig. 5: Docker-Obstacle Experiment. First row: training and testing dataset for the docker-and-obstacle experiment. Second row: two testing examples for extrapolation. Third row: the change of α in α TP-GMM. Last row: the outputs of the shift attention model.

TABLE I: Success rates of the methods TP-GMM ([12]), α TP-GMM ([16]), SALiT, and SALaT for the docker experiment, the docker-obstacle experiment, and the docker-obstacle-tunnel experiment.

In the robot experiments, we construct a task where the robot should slide a tool out of an aluminium profile and insert it into another one. Since the task allows two different kinds of sliding motions, it raises the multiple modes problem. After being trained on a limited number of demonstrations, SALaT can extrapolate to new task queries and generate motions from different modes.

A. Docker Experiment
For the docker experiment, we collected training data on a tablet and tested on new task queries. As shown at the top of Figure 4, the testing queries are uniformly sampled from locations and orientations that do not appear in the training dataset. Two examples are shown at the bottom of Figure 4. The success rate is calculated by counting the number of successful trials in the tests. As shown in the first row of Table I, all three methods α TP-GMM, SALiT and SALaT performed equally well in this task. As mentioned before, since the demonstrations in the docker experiment already follow a Gaussian distribution, a latent representation does not improve the performance much. Furthermore, the task meets the assumption made by α TP-GMM that the local frame containing low-variance trajectories is important at each timestamp. The attention model simply shifts the attention from the start docker to the end docker.

Fig. 6: Docker-Obstacle-Tunnel Experiment. First row: testing examples for extrapolation. Second row: the change of α in α TP-GMM. Third row: the outputs of the shift attention model.
B. Docker-Obstacle Experiment
We add one obstacle between the two dockers to introduce multiple modes into the task. A human can draw curves that go around the obstacle on either its left or right side. We collected data for training and used new testing queries. As shown at the top of Figure 5, the extrapolation test dataset has no overlap with the training dataset. In the two examples shown in the second row of Figure 5, all methods except SALaT fail to generate a collision-free trajectory because they average both modes. As shown in Table I, SALaT performs better than the other methods.

The third row of Figure 5 shows the change of α given by Equation 1: α TP-GMM ignores the obstacle all the time (see the blue curve), which is another reason why it does not work well in this experiment. It can still achieve some successes without considering the obstacle, because the obstacle is randomly placed and is not necessarily between the two dockers. In contrast, both SALaT and SALiT pay attention to the obstacle even though the trajectories' variance there is high. Since SALiT uses the average of the trajectories, which goes through the obstacle, it has a zero success rate.

Fig. 7: Robot Experiments with SALaT.
C. Docker-Obstacle-Tunnel Experiment
To further evaluate the model, we introduce a tunnel into the experiment. A successful trial means that the trajectory goes through the tunnel and returns to the start docker while avoiding collisions with all obstacles (see the top diagrams in Figure 6 for two testing examples). The training and testing queries have similar ranges as in the previous experiment (see the top plots in Figure 5). We collected demonstrations and randomly sampled testing queries. The attention has to be paid to the obstacle twice. As shown by the purple curves at the bottom of Figure 6, the SALaT model successfully learns how to shift the attention to realize a successful task execution. The α TP-GMM also changes the parameter α accordingly, because the human demonstrations accidentally have relatively low variance when getting near the obstacle. However, α is still not significant enough to generate correct trajectories around the obstacle. As shown in Table I, the SALaT model outperforms the others.

D. Robot Experiment
We conducted the robot experiment on the humanoid robot ARMAR-6 ([19], [20]). We use aluminium profiles and implement an experiment similar to the docking problem, where the robot should slide a tool out of one profile and insert it into another one. The tool can be taken out of the profile by sliding it out from either side but cannot be pulled out directly, as shown in the top-left picture of Figure 7, which gives rise to multiple modes. To evaluate extrapolation, we fixed the goal profile pose and only rotated the initial profile on the table to collect human demonstrations, and sampled the testing queries from the whole working space in which the robot's arm is reachable. To evaluate multiple modes, we intentionally demonstrated two different motions for one query.

Figure 7 shows the complete process of using SALaT for real robot tasks. 1) We collect demonstrations ξ for task queries q; only some trajectories are shown in the plots for clarity. 2) All motion trajectories are transformed into the two local frames T_1 and T_2. 3) RealNVPs are trained on the local trajectories of each local frame. 4) We draw latent trajectories from the latent distribution and transform them back to the local frames. 5) We learn the attention model, generate motion trajectories for any new task query q*, and finally execute the trajectories on the robot. With its good extrapolation ability, the SALaT learned for the aluminium profile task can be used directly for other tasks, such as inserting a cup brush into a container, as shown in Figure 1.

V. CONCLUSION
In this paper, we introduced a new model, the shift attention latent transformation (SALaT), which consists of local latent transformations and an attention model. The local latent transformations solve the multiple modes problem that exists in human demonstrations, while the attention model improves the extrapolation ability for motion generation. We can consider the local latent representations given by the RealNVPs as a vocabulary, based on which the attention model generates motion trajectories for the task. However, like TP-GMM, SALaT only works in the robot task space and requires further transformations for the mapping between different spaces. In future work, we will investigate how to automatically learn the transformations that replace the linear transformations in SALaT. We also want to explore the possibilities of the model in the context of reinforcement learning.
REFERENCES
[1] A. G. Billard, S. Calinon, and R. Dillmann, Learning from Humans. Springer International Publishing, 2016, pp. 1995–2014.
[2] A. Ude, A. Gams, T. Asfour, and J. Morimoto, "Task-specific generalization of discrete and periodic dynamic movement primitives," IEEE Transactions on Robotics, vol. 26, pp. 800–815, Oct. 2010.
[3] D. Forte, A. Gams, J. Morimoto, and A. Ude, "On-line motion synthesis and adaptation using a trajectory database," Robotics and Autonomous Systems, vol. 60, pp. 1327–1339, 2012.
[4] B. da Silva, G. Konidaris, and A. Barto, "Learning parameterized skills," in Proceedings of the Twenty-Ninth International Conference on Machine Learning, June 2012.
[5] R. Pahic, A. Gams, A. Ude, and J. Morimoto, "Deep encoder-decoder networks for mapping raw images to dynamic movement primitives," pp. 1–6, 2018.
[6] Y. Zhou, J. Gao, and T. Asfour, "Movement primitive learning and generalization: Using mixture density networks," IEEE Robotics and Automation Magazine, vol. 27, no. 2, pp. 22–32, 2020.
[7] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," CoRR, vol. abs/1312.6114, 2014.
[8] I. Kobyzev, S. Prince, and M. Brubaker, "Normalizing flows: An introduction and review of current methods," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2020.
[9] L. Dinh, J. Sohl-Dickstein, and S. Bengio, "Density estimation using real nvp," arXiv preprint arXiv:1605.08803, 2016.
[10] K. Sohn, X. Yan, and H. Lee, "Learning structured output representation using deep conditional generative models," ser. NIPS'15. Cambridge, MA, USA: MIT Press, 2015, pp. 3483–3491.
[11] L. Ardizzone, J. Kruse, S. J. Wirkert, D. Rahner, E. Pellegrini, R. S. Klessen, L. Maier-Hein, C. Rother, and U. Köthe, "Analyzing inverse problems with invertible neural networks," ArXiv, vol. abs/1808.04730, 2019.
[12] S. Calinon, T. Alizadeh, and D. G. Caldwell, "On improving the extrapolation capability of task-parameterized movement models," Nov 2013, pp. 610–616.
[13] S. Calinon, "A tutorial on task-parameterized movement learning and retrieval," Intelligent Service Robotics, vol. 9, no. 1, pp. 1–29, Jan 2016.
[14] T. Alizadeh, S. Calinon, and D. G. Caldwell, "Learning from demonstrations with partially observable task parameters," 2014, pp. 3309–3314.
[15] Y. Huang, J. Silvério, L. Rozo, and D. Caldwell, "Generalized task-parameterized skill learning," pp. 1–5, 2018.
[16] A. Sena, B. Michael, and M. Howard, "Improving task-parameterised movement learning generalisation with frame-weighted trajectory generation," pp. 4281–4287, 2019.
[17] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[18] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: Encoder-decoder approaches," 2014.
[19] T. Asfour, L. Kaul, M. Wächter, S. Ottenhaus, P. Weiner, S. Rader, R. Grimm, Y. Zhou, M. Grotz, F. Paus, D. Shingarey, and H. Haubert, "Armar-6: A collaborative humanoid robot for industrial environments," 2018, pp. 447–454.
[20] T. Asfour, M. Wächter, L. Kaul, S. Rader, P. Weiner, S. Ottenhaus, R. Grimm, Y. Zhou, M. Grotz, and F. Paus, "Armar-6: A high-performance humanoid for human-robot collaboration in real world scenarios."