Dynamic Movement Primitives in Robotics: A Tutorial Survey
Matteo Saveriano, Fares J. Abu-Dakka, Aljaz Kramberger, Luka Peternel
Abstract
Biological systems, including human beings, have the innate ability to perform complex tasks in a versatile and agile manner. Researchers in sensorimotor control have tried to understand and formally define this innate property. The idea, supported by several experimental findings, that biological systems are able to combine and adapt basic units of motion into complex tasks finally led to the formulation of the motor primitives theory. In this respect, Dynamic Movement Primitives (DMPs) represent an elegant mathematical formulation of the motor primitives as stable dynamical systems, and are well suited to generate motor commands for artificial systems like robots. In the last decades, DMPs have inspired researchers in different robotic fields including imitation and reinforcement learning, optimal control, physical interaction, and human–robot co-working, resulting in a considerable amount of published papers. The goal of this tutorial survey is two-fold. On one side, we present the existing DMP formulations in rigorous mathematical terms, and discuss advantages and limitations of each approach as well as practical implementation details. In the tutorial vein, we also search for existing implementations of the presented approaches and release several others. On the other side, we provide a systematic and comprehensive review of the existing literature and categorize state-of-the-art work on DMPs. The paper concludes with a discussion on the limitations of DMPs and an outline of possible research directions.
Keywords
Motor control of artificial systems, Movement primitives theory, Dynamic movement primitives, Learning from demonstration
How do biological systems, like humans and animals, execute complex movements in a versatile and creative manner?
In the past decades, researchers in neurobiology and motor control have made a significant effort trying to answer this research question, and their experimental findings led to the formulation of the motor or motion primitives theory. The motion primitives theory explains the execution of complex motions with the ability of biological systems to sequence and adapt units of action, the so-called motion primitives (Mussa-Ivaldi 1999; Flash and Hochner 2005). Dynamic Movement Primitives (DMPs) have their roots in the motor control of biological systems and can be seen as a rigorous mathematical formulation of the motion primitives as stable nonlinear dynamical systems (Schaal 2006a,b). In this respect, DMPs represent one of the first attempts to answer the related research question: How can artificial systems, like (humanoid) robots, execute complex movements in a versatile and creative manner?
Beyond their biological motivation, DMPs have a simple and elegant formulation, guarantee convergence to a given target, are sufficiently flexible to create complex behaviors, are capable of reacting to external perturbations in real-time, and can be learned from data using efficient algorithms. These properties explain the “success” of DMPs in robotic applications, where they have become established as a prominent tool for the learning and generation of motor commands. Since their formulation in the pioneering work of Ijspeert et al. (Ijspeert et al. 2001, 2002c), DMPs have been successfully exploited in a variety of applications, becoming de facto the first approach that novices in the Imitation Learning (IL) field use on their robots.
The popularity of DMPs resulted in a large amount of work that uses, modifies, or extends the original formulation of Ijspeert and colleagues. In this paper, we name classical DMPs the DMP formulation initially presented in (Ijspeert et al. 2001) and further refined in (Ijspeert et al. 2002c,b). As shown in Table 1, some tutorials and surveys have already tried to categorize and review existing work on DMPs.

Affiliations: Department of Computer Science and Digital Science Center, University of Innsbruck, Innsbruck, Austria; Intelligent Robotics Group, Department of Electrical Engineering and Automation (EEA), Aalto University, Espoo, Finland; SDU Robotics, the Maersk McKinney Moller Institute, University of Southern Denmark, Odense, Denmark; Delft Haptics Lab, Department of Cognitive Robotics, Delft University of Technology, Delft, The Netherlands.
Corresponding author:
Fares J. Abu-Dakka, Intelligent Robotics Group, Department of Electrical Engineering and Automation (EEA), Aalto University, Maarintie 8, 02150 Espoo, Finland. Email: fares.abu-dakka@aalto.fi
Prepared using sagej.cls [Version: 2017/01/17 v1.20]
Table 1.
Comparison between existing reviews and tutorials about DMPs and our tutorial survey.
(Schaal et al. 2007) — Topics: classical DMPs; online adaptation; optimization. A tutorial that provides a unifying view on the two main approaches used to develop computational motor control theories, namely differential equations and optimal control. In this work, discrete and rhythmic DMPs (Ijspeert et al. 2002c,b) are presented as a computational model of the motor primitives theory (Mussa-Ivaldi 1999) that unifies nonlinear differential equations and optimal control. The tutorial has a section dedicated to DMP parameter optimization beyond IL. Schaal et al. show how to optimize DMP parameters to minimize various costs describing, for instance, the total jerk of the trajectory or the end-point variance.

(Ijspeert et al. 2013) — Topics: classical DMPs; generalization; online adaptation; coupling terms. A tutorial on classical DMPs that presents both discrete and rhythmic formulations, mostly developed in (Ijspeert et al. 2002c,b,a), and their application in IL and movement recognition. The tutorial also presents extensions of the classical DMP formulation to prevent high accelerations at the beginning of the motion, to avoid collisions with unforeseen obstacles (Pastor et al. 2009), and to generalize both in space (e.g., reach a different goal) and time (e.g., produce longer/shorter trajectories).

(Pastor et al. 2013) — Topics: classical DMPs; online adaptation; coupling terms; impedance learning. A tutorial on classical DMPs that presents both discrete and rhythmic formulations, mostly developed in (Ijspeert et al. 2002c,b,a). The tutorial also presents extensions of the classical DMP formulation to avoid collisions with unforeseen obstacles (Pastor et al. 2009) and to learn impedance control policies via Reinforcement Learning (RL) (Buchli et al. 2011b). The key difference between this tutorial and the one from (Ijspeert et al. 2013) is the section dedicated to sensory association and online, context-aware adaptation of DMP trajectories using the associative skill memory framework developed in (Pastor et al. 2011).

(Deniša et al. 2016b) — Topics: classical DMPs; CMPs. A tutorial on CMPs, a framework developed to generate compliant robot behaviors that accurately track a reference trajectory. CMPs exploit classical DMPs to generate the desired kinematic landscape and encode task-dependent dynamics as a combination of Gaussian basis functions (torque primitives). The tutorial shows how to learn torque primitives from training data, how to generalize CMPs to new situations, and how to combine existing CMPs to synthesize new robot motions.

This paper — DMP tutorial: classical; orientation; SPD; joining; generalization; online adaptation. DMP survey: (co-)manipulation; variable impedance; physical interaction; rehabilitation; teleoperation; motion recognition; reinforcement, deep, and lifelong learning. This tutorial survey conducts a wide scan of the existing DMP literature with the aim of categorizing and presenting the published work in the field. The main objective of this comprehensive literature review is to give the reader an exhaustive overview of DMP-related research, of its major achievements, as well as of open issues and possible research directions. Our tutorial survey also provides a structured and unified formulation for the different methods developed starting from the classical DMPs proposed by (Ijspeert et al. 2002c,b). We believe that such a formulation eases the understanding of the different methods and extensions that can be found in the literature, clarifying connections and differences among the existing approaches. The tutorial survey also provides an analysis of the pros and cons of various methods and a discussion with guidelines for different application scenarios.
Schaal et al. (2007) presented the classical DMPs as an attempt to unify nonlinear dynamical systems and optimal control theory, i.e., the two prominent frameworks used to derive computational models of neuro-biological motor theories (Mussa-Ivaldi 1999; Flash and Hochner 2005). In their tutorial paper, Ijspeert et al. (2013) presented a homogeneous formulation of rhythmic and discrete DMPs together with some extensions including coupling terms, generalization to different goals, and online adaptation for collision avoidance. They also described possible applications in IL and motion recognition methods. In the same year, Pastor et al. (2013) published their tutorial on classical DMPs with a special focus on online adaptation of the DMP attractor landscape by integrating the perceptual
Figure 1. The structure of this tutorial survey on DMPs. The tutorial part (Sections 2 and 3) covers the formulation (classical discrete and periodic, orientation, symmetric positive definite matrices) and its extensions (generalization, joining, on-line adaptation, alternative formulations). The survey part (Sections 4 and 5) covers the integration of DMPs (manipulation tasks, variable impedance, reinforcement learning, deep learning, life-long learning) and their applications (contact with passive environments, human–robot co-manipulation, human assistance, augmentation and rehabilitation, teleoperation, high degrees of freedom, motion analysis and recognition, autonomous driving and field robotics). The discussion (Section 6) provides guidelines, available code, and open issues.

information into the action generation process. Later on, Deniša et al. (2016b) reviewed the so-called
Compliant Movement Primitive (CMP), which was first introduced by Petrič et al. (2014a). CMPs combine a classical DMP to generate the desired kinematic path and torque primitives (a weighted summation of Gaussian basis functions) to generate task-specific dynamics. As shown in the review, CMPs are capable of accurately tracking the kinematic path in a compliant manner, which makes them well suited for tasks that require interaction of the robot with the environment.

However, the above-mentioned reviews and tutorials primarily focused on the methods and advancements within their respective research groups and/or focused on a specific problem or field of application. On the other hand, the DMP-related literature is extensive and broad, with contributions from many research groups that made advancements in several important fields of application. Therefore, the proposed survey and tutorial on DMPs aims to scan a wider range and present a tutorial with unified and structured formulations for the various DMP methods and advancements to date. This should make it clearer for the users to see the differences and connections between various methods, and can contribute to easier application. In addition, we provide a more comprehensive and categorised survey of all major DMP application areas in robotics. This can help to inspire the readers to apply DMPs in various areas.

In the tutorial part, we present mathematical formulations, implementation details, and potential issues of existing DMP formulations starting from the classical DMPs presented in (Ijspeert et al. 2002c,b) up to recent extensions of DMPs to Riemannian geometry and Symmetric Positive Definite (SPD) matrices (Abu-Dakka and Kyrki 2020). In the survey part, we meticulously review the existing literature on DMPs in a comprehensive and methodological manner by focusing on the quality and significance of the contributions without putting a bias on any particular research group. Details on the systematic review procedure are given as follows.
We performed an automatic search for documents containing the string Dynamic Movement Primitive in Scopus, which returned a large set of papers. We found that Scopus lists papers only from a certain year on; therefore, we manually tracked related work from the preliminary work on DMPs onward. We further refined the search to include last-minute papers. We manually inspected all the papers and removed the ones that do not explicitly use DMPs or that only compare against DMPs in their literature review. The first and foremost selection criteria were the technical quality of the work and the significance of the contribution with respect to the DMP state of the art prior to the publication of any particular paper. In other words, we asked the question “did the paper make a significant step change in the field?”. Therefore, we discarded papers that presented similar (or the same) ideas multiple times, or that made insignificant improvements to the state of the art. If multiple papers presented the same/similar idea, we included the one with the most comprehensive technical quality, and if the quality was similar, the next deciding factors were publication in more prestigious journals/venues or citation counts. This manual selection led to the 276 papers on DMPs (out of a total of 328 references) analyzed in this work.
The systematic review of the DMP literature led to the taxonomy shown in Fig. 1, which also describes the structure of this paper. DMPs are placed at the root of the tree and branch into two nodes, namely the tutorial and the survey. In the tutorial part, we present different DMP formulations and extensions in rigorous mathematical terms. The tutorial part spans Sections 2 and 3. Section 2 embraces DMP formulations for discrete and periodic motions, orientation trajectories, and
SPD matrices. Section 3 discusses extensions of the DMP formalism to account for skill generalization, the joining of multiple primitives, and online adaptation based on force feedback or reference velocity. The section ends with a short description of DMP-related formulations. The survey part spans Sections 4 and 5. Section 4 presents the integration of DMPs in larger executive frameworks for manipulation and variable impedance tasks, and for reinforcement, deep, and life-long learning. Section 5 presents DMPs in different robotic applications including physical interaction, co-manipulation, rehabilitation, teleoperation, motion recognition, humanoids and field robotics, and autonomous driving. The paper ends with a discussion (Section 6) of the presented approaches with the aim of providing, where possible, guidelines to select the most suitable DMP approach for specific needs. We have also collected available DMP implementations (see Table 4) and contributed to the community with further open source implementations available at https://gitlab.com/dmp-codes-collection. Section 6 terminates with a discussion on open issues and possible research directions.

Our paper has several key contributions that are summarized as follows. Concerning the tutorial part:
• We present the classical DMP formulation and existing variations of this formulation in a unified manner with rigorous mathematical terms, providing implementation details and discussing advantages and limitations of different approaches (Section 2).
• We describe advanced approaches where DMPs are integrated into sophisticated control and/or larger executive frameworks (Section 3).
• We release to the community several implementations of the described approaches. Detailed information on these code repositories is provided in Table 4 and Section 6.
Moreover, we search for existing open-source implementations of the presented formulations and list them in our repository (Section 6.2).

Concerning the survey part:
• We perform a systematic literature search to provide a comprehensive and unbiased review of the topic (Sections 4 and 5).
• We categorize existing work on DMPs into different streams and highlight prominent approaches in each category (Fig. 1 and Sections 4 and 5).
• We present guidelines to select the most suitable approach for different applications, discuss limitations inherent to the DMP formalism, and highlight open issues and possible research directions (Section 6).
In this section, we provide a complete description of the standard formulation of DMPs: the point attractor formulation (to encode discrete point-to-point motions) in Section 2.1, and the cycle attractor formulation (to encode rhythmic motion patterns) in Section 2.2. For a better understanding, we have summarized the key notations and the used abbreviations in Table 2.
The discrete DMP is used to encode a point-to-point motion into a stable dynamical system. In the following subsections, we go through the formulation and main features of discrete DMPs, starting with the classical one operating in the Euclidean space R^n (Section 2.1.1), then passing to the Cartesian orientation spaces S^3 and SO(3) in Section 2.1.2, and ending with the DMP formulation for the SPD space S^m_++ in Section 2.1.3.

The classical discrete DMPs were first introduced by Ijspeert et al. (2002c). A DMP for a single-DoF trajectory y of a discrete (point-to-point) movement is defined by the following set of nonlinear differential equations (Ijspeert et al. 2002c, 2013)

τ ż = α_z (β_z (g − y) − z) + f(x),   (1)
τ ẏ = z,   (2)
τ ẋ = −α_x x,   (3)

Table 2.
Description of key notations and abbreviations. Indices, super/subscripts, constants, and variables have the same meaning over the whole text.

N, i ≜ index i = 1, 2, ..., N
J, j ≜ index j = 1, 2, ..., J
L, l ≜ index l = 1, 2, ..., L
V, v ≜ index v = 1, 2, ..., V
T ≜ sample index running over 1, 2, ..., T
m ≜ dimension of S^m_++
n ≜ dimension of R^n
{·}_d ≜ subscript for desired value
{·}_q ≜ quaternion-related variable
{·}_R ≜ rotation-matrix-related variable
{·}_++ or {·}_+ ≜ SPD-related variable
{·}_g ≜ subscript for goal value
α_z, β_z, α_x, α_s, α_g, α_yx, α_qg ≜ positive gains
τ ≜ time modulation parameter
c_i, h_i ≜ centers and widths of Gaussians
T ≜ time duration
t ≜ continuous time
λ ≜ forgetting factor
r ≜ amplitude modulation parameter
x ≜ phase variable
y, ẏ ≜ trajectory data and its 1st derivative
s ≜ sigmoidal decay phase
z, ż ≜ scaled velocity and acceleration
p ≜ piece-wise linear phase
g, g_q, g_+ ≜ attractor point (goal) in different spaces
ω ≜ angular velocity
ĝ, ĝ_q and g̃, g̃_q ≜ moving target and delayed goal function in different spaces
Q_t, Q̇_t ≜ joint position and its 1st time-derivative
g_v ≜ intermediate attractor (via-goal)
q, q̇ ≜ unit quaternion and its 1st time-derivative
R, Ṙ ≜ rotation matrix and its 1st time-derivative
f, f_q, f_R, F_+ ≜ forcing terms in different spaces
w_i ≜ adjustable weights
Ψ_i ≜ basis functions
θ, ϑ ≜ an angle and learnable parameters
S^m_++ ≜ m × m SPD manifold
Sym_m ≜ m × m symmetric matrix space
M ≜ a Riemannian manifold
X ≜ an arbitrary SPD matrix
T_Λ M ≜ tangent space of M at an arbitrary point Λ
the mean of {X_t}, t = 1, ..., T
ϱ = Log_Λ(Υ) ≜ map M → T_Λ M, maps an arbitrary point Υ ∈ M into ϱ ∈ T_Λ M
Υ = Exp_Λ(ϱ) ≜ map T_Λ M → M, maps ϱ ∈ T_Λ M into Υ ∈ M
vec(·) ≜ function transforming Sym_m into R^n using Mandel's notation
mat(·) ≜ function transforming R^n into Sym_m using Mandel's notation
k, K, K_P, K_O ≜ different forms of stiffness gains
D, D_V, D_W ≜ different forms of damping gains
M and I ≜ mass and inertia matrices
F, f_e and τ_e ≜ forces, external forces, and torques

Abbreviations: DMP = Dynamic Movement Primitive; IL = Imitation Learning; RL = Reinforcement Learning; SPD = Symmetric Positive Definite; DoF = Degree of Freedom; RBF = Radial Basis Function; LWR = Locally Weighted Regression; GMM = Gaussian Mixture Model; GMR = Gaussian Mixture Regression; GP = Gaussian Process; NN = Neural Network; VMP = Via-points Movement Primitive; ProMP = Probabilistic Movement Primitives; LfD = Learning from Demonstration; GPR = Gaussian Process Regression; MoMP = Mixture of Motor Primitives; EMG = Electromyography; ILC = Iterative Learning Control; VIC = Variable Impedance Control; VILC = Variable Impedance Learning Control; PI² = Policy Improvement with Path Integrals; CMA-ES = Covariance Matrix Adaptation-Evolution Strategies; CC-DMP = Coordinate Change DMP; RBF-NN = Radial Basis Function Neural Network; PoWER = Policy Learning by Weighting Exploration with the Returns; HRL = Hierarchical RL; AEDMP = AutoEncoded DMP; CNN = Convolutional Neural Network; GPDMP = Global Parametric Dynamic Movement Primitive; UAV = Unmanned Aerial Vehicle

where x is the phase variable and z is an auxiliary variable. Parameters α_z and β_z define the behavior of the second-order system described by (1) and (2).
With the choice τ > 0, α_z = 4β_z, and α_x > 0, the convergence of the underlying dynamical system to a unique attractor point at y = g, z = 0 is ensured (Ijspeert et al. 2013). Alternatively, the gains α_z and β_z can be learned from training data while preserving the convergence of the system (Tan et al. 2016). In the DMP literature, equations (1)–(2), as well as their periodic counterpart (33)–(34), are called the transformation system, while (3) (or (35)) is the canonical system.

f(x) is defined as a linear combination of N nonlinear Radial Basis Functions (RBFs), which enables the robot to follow any smooth trajectory from the initial position y_0 to the final configuration g:

f(x) = (Σ_{i=1}^N w_i Ψ_i(x)) / (Σ_{i=1}^N Ψ_i(x)) · x,   (4)
Ψ_i(x) = exp(−h_i (x − c_i)^2),   (5)

where c_i are the centers of the Gaussian basis functions distributed along the phase of the movement and h_i their widths. For a given N and setting τ equal to the total duration of the desired movement, we can define c_i = exp(−α_x (i − 1)/(N − 1)), h_i = 1/(c_{i+1} − c_i)^2, and h_N = h_{N−1}, where i = 1, ..., N. For each DoF, the weights w_i should be adjusted from the measured data so that the desired behavior is achieved. The selection of the number of weights should be based on the desired resolution of the trajectory. For controlling a robotic system with more than one DoF, we represent the movement of every DoF with its own equation system (1)–(2), but with the common phase (3) to synchronize them.

For a discrete motion, given a demonstrated trajectory y_d(t), t = 1, ..., T, and its time derivatives ẏ_d(t) and ÿ_d(t), it is possible to invert (1) and approximate the desired shape f_d as

f_d(t) = τ^2 ÿ_d(t) − α_z (β_z (g − y_d(t)) − τ ẏ_d(t)).   (6)
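The pipeline described by (1)–(6) can be sketched in a few lines of NumPy. The gains, the number of basis functions, the batch least-squares fit (used here in place of LWR), and all variable names are illustrative choices for this sketch, not the authors' released code.

```python
# Minimal discrete-DMP sketch following (1)-(6): fit the forcing term of a
# single-DoF demonstration, then integrate the system with Euler steps.
import numpy as np

alpha_z, alpha_x = 25.0, 3.0
beta_z = alpha_z / 4.0                 # critical damping, alpha_z = 4 beta_z
N, tau, dt = 30, 1.0, 0.001            # basis functions, duration, sample time

# Demonstration: minimum-jerk trajectory from y0 = 0 to g = 1
t = np.arange(0.0, tau, dt)
u = t / tau
y_d = 10 * u**3 - 15 * u**4 + 6 * u**5
yd_d = np.gradient(y_d, dt)
ydd_d = np.gradient(yd_d, dt)
g = y_d[-1]

# Canonical system (3) in closed form and RBFs (5)
x = np.exp(-alpha_x * t / tau)
c = np.exp(-alpha_x * np.linspace(0.0, 1.0, N))   # centers along the phase
h = np.empty(N)
h[:-1] = 1.0 / (c[1:] - c[:-1])**2
h[-1] = h[-2]
Psi = np.exp(-h * (x[:, None] - c)**2)            # T x N basis activations

# Learning target (6) and weights from the linear system Phi w = F
F = tau**2 * ydd_d - alpha_z * (beta_z * (g - y_d) - tau * yd_d)
Phi = Psi / Psi.sum(axis=1, keepdims=True) * x[:, None]
w = np.linalg.lstsq(Phi, F, rcond=None)[0]

# Rollout: Euler integration of (1)-(3)
y, z, xp = y_d[0], 0.0, 1.0
for _ in t:
    psi = np.exp(-h * (xp - c)**2)
    f = psi @ w / psi.sum() * xp
    z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
    y += dt / tau * z
    xp += dt / tau * (-alpha_x * xp)
```

Running the rollout reproduces the demonstration and ends close to the goal g; the batch fit could be replaced by the recursive LWR updates (9)–(10) to learn the same weights incrementally.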
Figure 2.
A classical DMP is used to generate a discrete motion connecting y_0 = 0 and g = 1 (green line in the top left panel). The training data (black dashed lines) are obtained from a minimum jerk trajectory connecting y_0 and g in T = 1 s and used to learn the weights w_i of the Gaussian basis functions equally distributed in time. The results of the parameter learning procedure are shown in the bottom right panel. The exponentially decaying phase variable is used as shown in the middle right panel. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

By stacking each f_d(t) and w_i into the column vectors F = [f_d(t_1), ..., f_d(t_T)]ᵀ and w = [w_1, ..., w_N]ᵀ, we obtain the following linear system

Φ w = F,   (7)

where

Φ = ⎡ Ψ_1(x_1) x_1 / Σ_i Ψ_i(x_1)   ···   Ψ_N(x_1) x_1 / Σ_i Ψ_i(x_1) ⎤
    ⎢             ⋮                 ⋱                ⋮               ⎥
    ⎣ Ψ_1(x_T) x_T / Σ_i Ψ_i(x_T)   ···   Ψ_N(x_T) x_T / Σ_i Ψ_i(x_T) ⎦.   (8)

Locally Weighted Regression (LWR) (Atkeson et al. 1997; Schaal and Atkeson 1998; Ude et al. 2010) is a popular approach used to update the weights w_i. LWR uses the error between the desired trajectory shape and the currently learned shape and a forgetting factor λ to update the weights as

P_t = (1/λ) (P_{t−1} − (P_{t−1} φ_t φ_tᵀ P_{t−1}) / (λ + φ_tᵀ P_{t−1} φ_t)),   (9)
w_t = w_{t−1} + (f_d(t) − φ_tᵀ w_{t−1}) P_t φ_t.   (10)

In the previous equations, w_t = w(t) and φ_t is the column vector obtained by transposing the t-th row of Φ. The initial values of the parameters are P_0 = I, w_0 = 0. A discrete DMP learned on synthetic data is shown in Figure 2.

LWR has been the standard method to learn the weights of DMPs and therefore f(x). As an alternative to LWR, Krug and Dimitrov (2013) have shown that learning a forcing term defined as in (4) can be formulated as a quadratic optimization problem and efficiently solved. Figure 3.
Possible phase variables used in different discrete DMP formulations. All the different possibilities ensure that x, s, p → 0 for t → +∞ (for t > T in practice).

In general, the problem of learning and retrieving f(x) can in principle be solved with any regression technique (Stulp et al. 2013). For instance, Wang et al. (2016) modified f(x) in (4) by considering a bias term b_i, i.e., w_i x + b_i, and used truncated kernels (Ψ_i vanishes if x − c_i is smaller than a threshold). This formulation, called DMP+, produces more accurate trajectories than the original DMP. Moreover, a learned trajectory can be modified by updating only a subset of the weights. Other work focused on using multiple demonstrations to increase the generalization power of the learned primitive. To learn a suitable forcing term from multiple demonstrations, some authors used Gaussian Mixture Models (GMMs) (Yin and Chen 2014; Pervez et al. 2017a) and Gaussian Mixture Regression (GMR) (Cohn et al. 1996), while others adopted Gaussian Processes (GPs) (Fanger et al. 2016; Umlauft et al. 2017; Rasmussen and Williams 2006), or exploited deep Neural Networks (NNs) (Pervez et al. 2017b; Pahič et al. 2020), originally developed in (LeCun et al. 2015).

The phase variable x in (3) provides the ability to manipulate time during the execution of the DMP equations. Moreover, DMPs provide the ability to slow down or even stop the execution through the phase-stopping mechanism (Ijspeert et al. 2002c)

τ ẋ = −α_x x / (1 + α_yx ||ỹ − y||).   (11)

Moreover, DMPs provide an elegant way to adapt the trajectory generation in real-time through the goal switching mechanism (Ijspeert et al.
2013)

τ ġ = α_g (g_new − g),   (12)

so that the goal g continuously switches onto a new goal g_new in real-time.

DMPs in their standard formulation are not suitable for direct encoding of skills with specific geometric constraints, such as orientation profiles (represented as either unit quaternions or rotation matrices) and stiffness/damping and manipulability profiles (encapsulated in full SPD matrices). For instance, direct integration of unit quaternions does not ensure the unity of the quaternion norm. Any representation of orientation that does not contain singularities is non-minimal, which means that additional constraints need to be taken into account during integration.

Equation (3) describes an exponentially decaying phase variable that has been widely used in the DMP literature. The main drawback of the exponentially decaying phase is that it rapidly drops to very small values towards the end of the motion. This
“forces” the learning algorithm to exploit relatively high weights w_i to accurately reproduce the last part of the demonstration (Samant et al. 2016). As an example, in Figure 3 the exponentially decaying phase (brown dot-dashed line) is already very small after a fraction of the motion duration, while the expected time duration of the motion is T = 1 s.

To overcome this limitation, Kulvicius et al. (2011) propose the sigmoidal decay phase s (green solid line in Figure 3), obtained by integrating

ṡ = −α_s e^((α_s/δt)(τT − t)) / [1 + e^((α_s/δt)(τT − t))]^2,   (13)

where α_s defines the steepness of s centered at time T and δt is the sampling time. As shown in Figure 3, s = 1 for t < T − δ_s, where the time δ_s depends on the steepness α_s, and then it decays to s = 0. The sigmoidal decay in Figure 3 has a tail effect since it vanishes only after T + δ_s s, where δ_s depends on the tunable parameter α_s.

The piece-wise linear phase p (blue dashed line in Figure 3), proposed by Samant et al. (2016), linearly decays from 1 to 0 in exactly T s and then remains constant. p is obtained by integrating

τ ṗ = { −1/T,  p > 0
        0,     otherwise }   (14)

where p(0) = 1 and T is the time duration of the motion.

The classical DMP formulation described in Section 2.1.1 applies to single-DoF motions. Multidimensional motions are generated independently and synchronized with a common phase. In other words, equations (1) and (2) are repeated for each DoF while the phase variable in (3) is shared. This works when the evolution of the different DoFs is independent, as for joint space or Cartesian position trajectories. Unlike the Cartesian position, the elements of orientation representations like the unit quaternion or the rotation matrix are constrained. In this section, we present approaches that extend the classical DMP formulation to represent Cartesian orientations.
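The three phase profiles of Figure 3 discussed above can be generated in a few lines. This is a sketch with illustrative gains and durations: the exponential phase is the closed-form solution of (3), the piece-wise linear phase follows (14), and the sigmoidal phase is produced directly as the logistic profile that (13) integrates to.

```python
# Exponential, sigmoidal, and piece-wise linear phase variables, cf. Figure 3.
import numpy as np

T, tau, dt = 1.0, 1.0, 0.001
alpha_x, alpha_s = 3.0, 1.0
t = np.arange(0.0, 2 * T, dt)                 # run past T to show the tails

x = np.exp(-alpha_x * t / tau)                # exponential decay, solves (3)
a = np.clip((alpha_s / dt) * (t - tau * T), -50.0, 50.0)   # clip to avoid overflow
s = 1.0 / (1.0 + np.exp(a))                   # sigmoidal decay, cf. (13)
p = np.clip(1.0 - t / (tau * T), 0.0, 1.0)    # piece-wise linear, cf. (14)
```

All three profiles start at 1 and are (numerically) zero well after t = T, matching the behavior summarized in the Figure 3 caption.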
The unit quaternion q = ν + u ∈ S^3 provides a representation of the orientation of the robot's end-effector (Chiaverini and Siciliano 1999). S^3 is the unit sphere in R^4, with ν ∈ R and u ∈ R^3. Abu-Dakka et al. (2015a) rewrote the DMP equations (1) and (2) for direct unit quaternion encoding as follows

τ η̇ = α_z (β_z Log_q(g_q ∗ q̄) − η) + f_q(x),   (15)
τ q̇ = (1/2) η ∗ q,   (16)

where g_q ∈ S^3 denotes the goal orientation, the quaternion conjugate is defined as q̄ = ν − u, and ∗ denotes the quaternion product

q_1 ∗ q_2 = (ν_1 + u_1) ∗ (ν_2 + u_2) = (ν_1 ν_2 − u_1ᵀ u_2) + (ν_1 u_2 + ν_2 u_1 + u_1 × u_2).

η ∈ R^3 is the scaled angular velocity ω, treated as a quaternion with zero scalar part (ν = 0) in (16). The function

Figure 4.
A unit quaternion DMP is used to generate a discrete motion connecting q_0 and g_q. The training data (black dashed lines) are obtained from a minimum jerk trajectory connecting q_0 and g_q in T = 10 s and used to learn the weights w_i of the Gaussian basis functions equally distributed in time. The results of the parameter learning procedure are shown in the bottom right panel. The exponentially decaying phase variable is used as shown in the middle right panel. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

Log_q(·): S^3 → R^3 is given as

Log_q(q) = { arccos(ν) u/||u||,  u ≠ [0 0 0]ᵀ
             [0 0 0]ᵀ,           otherwise },   (17)

where ||·|| denotes the ℓ2 norm.

An early attempt to encode unit quaternion profiles using DMPs was presented by Pastor et al. (2011). Unlike Abu-Dakka et al.'s formulation, Pastor et al.'s does not take into account the geometry of SO(3), as they just used the vector part of the quaternion product g_q ∗ q̄ in (15) instead of Log_q(g_q ∗ q̄), which defines the angular velocity ω that rotates quaternion q into g_q within a unit sampling time. Equation (16) can be integrated as

q(t + δt) = Exp_q((δt/2) η(t)/τ) ∗ q(t),   (18)

where δt > 0 denotes a small constant. The function Exp_q(·): R^3 → S^3 is given as

Exp_q(ω) = { cos(||ω||) + sin(||ω||) ω/||ω||,  ω ≠ [0 0 0]ᵀ
             1 + [0 0 0]ᵀ,                     otherwise }.   (19)

Both mappings become one-to-one, continuously differentiable, and inverse to each other if the input domain of the mapping Log_q(·) is restricted to S^3 except for −1 + [0 0 0]ᵀ, while the input domain of the mapping Exp_q(ω) should fulfill the constraint ||ω|| < π (Abu-Dakka et al. 2015a). An exemplar unit quaternion DMP is shown in Figure 4.

The phase-stopping mechanism (11) can be rewritten as follows

τ ẋ = −α_x x / (1 + α_qx d(q̃, q)),   (20)
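The quaternion machinery of (15)–(19) is compact enough to sketch directly. The maps and the product below follow the text; the storage convention [ν, u_x, u_y, u_z], the numeric values, and the step size are illustrative assumptions, not the released implementation.

```python
# Quaternion Log/Exp maps (17), (19), the product, and one step of (18).
import numpy as np

def quat_log(q):
    """Log_q: S^3 -> R^3, cf. (17)."""
    nu, u = q[0], q[1:]
    n = np.linalg.norm(u)
    if n < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(nu, -1.0, 1.0)) * u / n

def quat_exp(w):
    """Exp_q: R^3 -> S^3, cf. (19)."""
    n = np.linalg.norm(w)
    if n < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(n)], np.sin(n) * w / n))

def quat_mul(q1, q2):
    """Quaternion product q1 * q2."""
    n1, u1 = q1[0], q1[1:]
    n2, u2 = q2[0], q2[1:]
    return np.concatenate(([n1 * n2 - u1 @ u2],
                           n1 * u2 + n2 * u1 + np.cross(u1, u2)))

def quat_conj(q):
    """Quaternion conjugate q_bar."""
    return np.concatenate(([q[0]], -q[1:]))

# Feedback term of (15): Log_q(g_q * q_bar) for a 90-degree goal rotation
q = np.array([1.0, 0.0, 0.0, 0.0])
g_q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
e = quat_log(quat_mul(g_q, quat_conj(q)))

# One integration step of (18) with scaled angular velocity eta
eta, tau, dt = np.array([0.0, 0.0, np.pi / 2]), 1.0, 1.0
q_next = quat_mul(quat_exp(dt * eta / (2.0 * tau)), q)
```

Because Log_q and Exp_q are inverse to each other on the stated domains, applying quat_log to q_next recovers the integrated rotation vector exactly, and the result stays on S^3 by construction.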
Journal Title XX(X)
Figure 5.
A rotation matrix DMP is used to generate a discrete motion connecting $R_0$ and $R_g$. The training data (black dashed lines) are obtained from a minimum jerk trajectory connecting $R_0$ and $R_g$ in $T = 10$ s and used to learn the weights $w_i$ of Gaussian basis functions equally distributed in time. The results of the parameter learning procedure are shown in the bottom right panel. The exponentially decaying phase variable is used as shown in the middle right panel.

where
$$d(\tilde{q}, q) = \begin{cases} 2\pi, & \bar{q} * \tilde{q} = -1 + [0\ 0\ 0]^\top \\ 2\,\|\mathrm{Log}_q(\bar{q} * \tilde{q})\|, & \text{otherwise}. \end{cases}$$

Ude et al. (2014) extended the quaternion-based DMP formulation by rewriting (12) to include a goal switching mechanism:
$$\tau \dot{g}_q = \alpha_{qg}\, \mathrm{Log}_q(g_{q,\mathrm{new}} * \bar{g}_q) * g_q, \qquad (21)$$
so that $g_q$ continuously changes to $g_{q,\mathrm{new}}$ in real-time. Equation (21) should be integrated using (19) along with (15) and (16).

As shown by Saveriano et al. (2019) using Lyapunov arguments, both the quaternion DMP formulations in (Pastor et al. 2011) and in (Abu-Dakka et al. 2015a; Ude et al. 2014) asymptotically converge to the target quaternion $g_q$ with zero velocity.

In their work on orientation DMPs, Ude et al. (2014) also extended the DMP formulation to encode orientation trajectories represented in the form of rotation matrices $R(t) \in SO(3)$. Therefore, they rewrote (1) and (2) in the form
$$\tau \dot{\eta} = \alpha_z \big(\beta_z\, \mathrm{Log}_R(R_g R^\top) - \eta\big) + f_R(x), \qquad (22)$$
$$\tau \dot{R} = [\eta]_\times R, \qquad (23)$$
where $R_g$ represents the goal orientation and $[\eta]_\times$ is a skew-symmetric matrix, i.e., $[\eta]_\times^\top = -[\eta]_\times$. The relation between the angular velocity and the first time derivative of the rotation matrix is given by
$$[\omega]_\times = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix} = \dot{R} R^\top. \qquad (24)$$
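The skew operator in (23)–(24) and the associated $SO(3)$ exponential/logarithmic maps can be sketched as follows (a minimal NumPy illustration with our own function names; the integration step mirrors the Euler scheme used for the other DMP formulations):

```python
import numpy as np

def skew(w):
    # [w]_x operator from Eq. (24)
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rot_exp(w):
    # Rodrigues' formula: exponential map R^3 -> SO(3)
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rot_log(R):
    # Logarithmic map SO(3) -> R^3: axis-angle vector theta * n
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    n = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * n

def rot_step(R, eta, tau, dt):
    # One Euler integration step of Eq. (23)
    return rot_exp(dt * eta / tau) @ R
```

The round trip `rot_log(rot_exp(w)) == w` holds for rotation angles below $\pi$, matching the domain restriction noted for the quaternion maps.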
Figure 6.
An SPD DMP is used to generate a discrete motion connecting $X_0$ and $X_g$. The training data (black dashed lines) are obtained from a minimum jerk trajectory connecting $X_0$ and $X_g$ in $T = 100$ s and used to learn the weights $w_i$ of Gaussian basis functions equally distributed in time. The cone in the upper left corner represents the manifold of SPD data and includes the geodesic of the SPD profile. The results of the parameter learning procedure are shown in the bottom right panel. The exponentially decaying phase variable is used as shown in the middle right panel. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

The function $\mathrm{Log}_R(\cdot): SO(3) \mapsto \mathbb{R}^3$ is given as
$$\mathrm{Log}_R(R) = \begin{cases} [0\ 0\ 0]^\top, & R = I \\ \omega = \theta\, n, & \text{otherwise}, \end{cases} \qquad (25)$$
with
$$\theta = \arccos\!\left(\frac{\mathrm{trace}(R) - 1}{2}\right), \qquad n = \frac{1}{2\sin(\theta)} \begin{bmatrix} r_{32} - r_{23} \\ r_{13} - r_{31} \\ r_{21} - r_{12} \end{bmatrix}.$$
The generated rotation matrices can be obtained by integrating (23) as follows:
$$R(t + \delta t) = \mathrm{Exp}_R\!\left(\frac{\delta t}{\tau} [\eta]_\times\right) R(t). \qquad (26)$$
The function $\mathrm{Exp}_R(\cdot): \mathbb{R}^3 \mapsto SO(3)$ is given as
$$\mathrm{Exp}_R(t [\omega]_\times) = I + \sin(\theta)\, \frac{[\omega]_\times}{\|\omega\|} + (1 - \cos(\theta))\, \frac{[\omega]_\times^2}{\|\omega\|^2}, \qquad (27)$$
where $\theta(t) = t \|\omega\|$ expresses the rotation angle within time $t$. An exemplar rotation matrix DMP is shown in Figure 5.

Abu-Dakka and Kyrki (2020) generalized the DMP formulation to encode robotic manipulation data profiles encapsulated in the form of SPD matrices. Define $X \in S_{++}^m$ as an arbitrary SPD matrix and $\Xi = \{t_i, X_i\}_{i=1}^{T}$ as the set of SPD matrices in one demonstration, where $S_{++}^m$ denotes the set of $m \times m$ SPD matrices. We can then rewrite (1) and (2) as follows:
$$\tau \dot{\sigma} = \alpha_z \big(\beta_z\, \mathrm{vec}(B_{X \mapsto X_0}(\mathrm{Log}_{X}(X_g))) - \sigma\big) + F(x), \qquad (28)$$
$$\tau \dot{\xi} = \sigma, \qquad (29)$$
where $\sigma = \mathrm{vec}(\Sigma)$ is the Mandel representation of the symmetric matrix $\Sigma$, and $\Sigma$ is the time derivative of $\Xi$, i.e., $\Sigma_t \equiv \dot{\Xi}_t = \mathrm{Log}_{X_{t-1}}(X_t)/\delta t$.
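The Mandel vectorization used in (28)–(31) can be sketched directly (a minimal NumPy illustration; function names are ours). The $\sqrt{2}$ factor on the off-diagonal entries makes the map an isometry: the Euclidean norm of the vector equals the Frobenius norm of the matrix, which is why the vectorized state can be treated like an ordinary DMP state.

```python
import numpy as np

def vec_mandel(S):
    # Mandel vectorization of a symmetric m x m matrix, cf. the 2x2
    # example in Eq. (30): diagonal first, then sqrt(2)-scaled off-diagonals
    m = S.shape[0]
    off = [np.sqrt(2.0) * S[i, j] for i in range(m) for j in range(i + 1, m)]
    return np.concatenate([np.diag(S), off])

def mat_mandel(v, m):
    # Inverse mapping: rebuild the symmetric matrix from its Mandel vector
    S = np.diag(np.asarray(v[:m], dtype=float))
    idx = m
    for i in range(m):
        for j in range(i + 1, m):
            S[i, j] = S[j, i] = v[idx] / np.sqrt(2.0)
            idx += 1
    return S
```

For a $2 \times 2$ matrix this reproduces exactly the example in (30).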
The function $\mathrm{Log}_{X_1}(X_2): \mathcal{M} \mapsto T_{X_1}\mathcal{M}$ maps a point $X_2$ in the manifold $\mathcal{M}$ to a point $\Delta$ in the tangent space $T_{X_1}\mathcal{M}$. $\mathrm{vec}(\cdot)$ is a function that transforms a symmetric matrix into a vector using Mandel's notation, e.g., the vectorization of a $2 \times 2$ symmetric matrix is
$$\mathrm{vec}\!\left(\begin{bmatrix} a & b \\ b & d \end{bmatrix}\right) = \begin{bmatrix} a \\ d \\ \sqrt{2}\, b \end{bmatrix}. \qquad (30)$$
$\xi$ is the vectorization of $\Xi$, and $X_g \in S_{++}^m$ represents the goal SPD matrix. $\mathrm{vec}(B_{X \mapsto X_0}(\mathrm{Log}_{X}(X_g)))$ is the vectorization of the symmetric matrix $\mathrm{Log}_{X}(X_g)$ parallel transported over the geodesic from $X$ to $X_0$. Equation (29) is then integrated as
$$\hat{X}(t + \delta t) = \mathrm{Exp}_{X(t)}\!\left(\frac{\delta t}{\tau}\, B_{X_0 \mapsto X(t)}(\mathrm{mat}(\sigma(t)))\right), \qquad (31)$$
where the function $\mathrm{mat}(\cdot)$ is the inverse of $\mathrm{vec}(\cdot)$ and denotes matricization using Mandel's notation. $\hat{X} \in S_{++}^m$ represents the new SPD-matrices-based robot skill. The function $\mathrm{Exp}_{X_1}(\Delta): T_{X_1}\mathcal{M} \mapsto \mathcal{M}$ maps a point $\Delta \in T_{X_1}\mathcal{M}$ to a point $X_2 \in \mathcal{M}$, so that $X_2$ lies on the geodesic starting from $X_1 \in S_{++}^m$ in the direction of $\Delta$. An exemplar SPD DMP is shown in Figure 6.

Moreover, Abu-Dakka and Kyrki (2020) rewrote (12) for smooth goal adaptation in case of sudden goal switching as follows:
$$\tau \dot{g} = \alpha_g\, \mathrm{Log}_{g_{\mathrm{new}}}(g), \qquad (32)$$
so that the goal $g$ is continually updated towards $g_{\mathrm{new}}$.

Periodic DMPs (sometimes called rhythmic DMPs) are used when the encoded motion follows a rhythmic pattern.
The classical periodic (or rhythmic) DMPs were first introduced by Ijspeert et al. (2002b), who redefined the second-order differential equation system described in (1) and (2) as follows:
$$\dot{z} = \Omega \big(\alpha_z (\beta_z (g - y) - z) + f(\phi)\big), \qquad (33)$$
$$\dot{y} = \Omega z, \qquad (34)$$
$$\tau \dot{\phi} = 1, \qquad (35)$$
where $\Omega$ is the frequency and $y$ is the desired periodic trajectory that we want to encode with a DMP. The main difference between periodic DMPs and point-to-point DMPs is that the time constant related to the trajectory duration is replaced by the frequency of trajectory execution (refer

Figure 7.
A classical DMP is used to reproduce a rhythmic motion (brown solid line in the top left panel). The desired trajectory is obtained by adding Gaussian noise to $y_d = \cos(2\pi t)$ and computing the numerical derivatives (black dashed lines). The forcing term is obtained as the weighted summation of Gaussian basis functions equally distributed in time (bottom left panel). The results of the parameter learning procedure are shown in the bottom right panel. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

to (Ijspeert et al. 2013, 2002b) for details). In addition, periodic DMPs must ensure that the initial phase ($\phi = 0$) and the final one ($\phi = 2\pi$) coincide in order to achieve a smooth transition across repetitions.

Similar to (4), $f(\phi)$ is defined with $N$ Gaussian kernels according to the following equations:
$$f(\phi) = \frac{\sum_{i=1}^{N} \Psi_i(\phi)\, w_i}{\sum_{i=1}^{N} \Psi_i(\phi)}\, r, \qquad (36)$$
$$\Psi_i(\phi) = \exp\big(h (\cos(\phi - c_i) - 1)\big), \qquad (37)$$
where the weights are uniformly distributed along the phase space, and $r$ is used to modulate the amplitude of the periodic signal (Ijspeert et al. 2002b; Gams et al. 2009) (if not used, it can be set to $r = 1$ (Peternel et al. 2016a)).

Similarly to discrete DMPs, LWR (Schaal and Atkeson 1998) can be used to update the weights to learn a desired trajectory. In a standard periodic DMP setting (Ijspeert et al. 2002b; Gams et al. 2009), the desired shape $f_d$ is approximated by solving
$$f_d(t) = \frac{\ddot{y}_d(t)}{\Omega^2} - \alpha_z \left(\beta_z (g - y_d(t)) - \frac{\dot{y}_d(t)}{\Omega}\right), \qquad (38)$$
where $y_d$ is a demonstrated input trajectory that needs to be encoded. The weights $w_i$ can be updated using the recursive least-squares method (Schaal and Atkeson 1998) with forgetting factor $\lambda$ based on the error between the
Table 3.
Summary of DMP basic formulations.
Type of movement | Space | System of equations | Reference | Short description

Discrete | $\mathbb{R}$ | $\tau\dot{z} = \alpha_z(\beta_z(g - y) - z) + f(x)$, $\tau\dot{y} = z$, $\tau\dot{x} = -\alpha_x x$ | Eqs. (1)–(3), (Ijspeert et al. 2002c) | A single-DoF, discrete motion trajectory is encoded into a linear, second-order dynamical system with an additive, non-linear forcing term. Convergence to the desired goal $g$ is ensured by a vanishing phase variable $x$.

Discrete | $S^3$ | $\tau\dot{\eta} = \alpha_z(\beta_z\,\mathrm{Log}_q(g_q * \bar{q}) - \eta) + f_q(x)$, $\tau\dot{q} = \frac{1}{2}\eta * q$ | Eqs. (15)–(16), (Abu-Dakka et al. 2015a) | A quaternion-based orientation trajectory (3 DoFs) is encoded into a second-order dynamical system with an additive, non-linear forcing term. The error definition complies with the geometry of the unit quaternion space.

Discrete | $SO(3)$ | $\tau\dot{\eta} = \alpha_z(\beta_z\,\mathrm{Log}_R(R_g R^\top) - \eta) + f_R(x)$, $\tau\dot{R} = [\eta]_\times R$ | Eqs. (22)–(23), (Ude et al. 2014) | A rotation-matrix-based orientation trajectory (3 DoFs) is encoded into a second-order dynamical system with an additive, non-linear forcing term. The error definition complies with the geometry of the rotation matrices space.

Discrete | $S_{++}^m$ | $\tau\dot{\sigma} = \alpha_z(\beta_z\,\mathrm{vec}(B_{X \mapsto X_0}(\mathrm{Log}_{X}(X_g))) - \sigma) + F(x)$, $\tau\dot{\xi} = \sigma$ | Eqs. (28)–(29), (Abu-Dakka and Kyrki 2020) | An SPD-matrix trajectory, $m(m+1)/2$ DoFs, is encoded into a second-order dynamical system with an additive, non-linear forcing term. The error definition complies with the geometry of the SPD matrices space.
Periodic | $\mathbb{R}$ | $\dot{z} = \Omega(\alpha_z(\beta_z(g - y) - z) + f(\phi))$, $\dot{y} = \Omega z$, $\tau\dot{\phi} = 1$ | Eqs. (33)–(35), (Ijspeert et al. 2002b) | A single-DoF, periodic motion trajectory is encoded into a linear, second-order dynamical system with an additive, non-linear forcing term. The resulting system generates a stable limit cycle.

desired trajectory shape and the currently learned shape:
$$w_i(t+1) = w_i(t) + \Psi_i P_i(t+1)\, r\, e_r(t), \qquad (39)$$
$$e_r(t) = f_d(t) - w_i(t)\, r, \qquad (40)$$
$$P_i(t+1) = \frac{1}{\lambda}\left(P_i(t) - \frac{P_i(t)^2 r^2}{\frac{\lambda}{\Psi_i} + P_i(t) r^2}\right). \qquad (41)$$
The initial values of the parameters are $w_i(0) = 0$ and $P_i(0) = 1$. The forgetting factor $\lambda$ determines the rate of weight changes; refer to (Schaal and Atkeson 1998) for details on parameter setting. An exemplar rhythmic DMP is shown in Figure 7.

The classical periodic DMP described by (33)–(35) does not encode the transient motion needed to start the periodic one. Transients are important in several applications, like humanoid robot walking, where the first step from a rest position is usually a transient needed to start the periodic motion. To overcome this limitation, Ernesti et al. (2012) modified the classical formulation of periodic DMPs to explicitly consider transients as motion trajectories that converge towards the limit-cycle (i.e., periodic) one.

A summary of the existing DMP formulations mentioned in the earlier sections is given in Table 3. The table shows the variations of the formulation in its standard shape based on the space they are applied to. Modifications of this standard shape (e.g., adding a coupling term) are discussed in the next section as extensions of the DMP formulations.
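The on-line learning loop in (36)–(41) can be sketched as follows. This is an illustrative NumPy example on a synthetic cosine demonstration; the kernel count, width $h$, gains, and forgetting factor are our own choices, not values from the cited works.

```python
import numpy as np

N = 25                                # number of kernels (illustrative)
h = 2.5 * N                           # kernel width (illustrative heuristic)
c = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
Omega, alpha_z, beta_z = 1.0, 25.0, 6.25
g, r, lam = 0.0, 1.0, 0.995           # anchor, amplitude, forgetting factor

def kernels(phi):
    # Eq. (37): periodic kernels centered at c_i
    return np.exp(h * (np.cos(phi - c) - 1.0))

def forcing(phi, w):
    # Eq. (36): normalized weighted sum, modulated by the amplitude r
    psi = kernels(phi)
    return (psi @ w) * r / psi.sum()

def desired_shape(y, yd, ydd):
    # Eq. (38): target shape computed from the demonstration
    return ydd / Omega**2 - alpha_z * (beta_z * (g - y) - yd / Omega)

def train(num_steps=20000, dt=0.005):
    # Incremental recursive least squares with forgetting, Eqs. (39)-(41)
    w, P = np.zeros(N), np.ones(N)
    for k in range(num_steps):
        t = k * dt
        y, yd, ydd = np.cos(t), -np.sin(t), -np.cos(t)   # demonstration
        phi = (Omega * t) % (2.0 * np.pi)
        fd = desired_shape(y, yd, ydd)
        er = fd - w * r                                          # Eq. (40)
        psi = kernels(phi)
        P = (P - (P**2 * r**2) / (lam / psi + P * r**2)) / lam   # Eq. (41)
        w = w + psi * P * r * er                                 # Eq. (39)
    return w

w = train()
```

After a few periods the learned forcing term closely tracks the desired shape, illustrating the gradual refinement over repetitions described above.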
A desirable property of motion primitives is the ability to generalize to unforeseen situations. In this section, we present approaches that adapt DMP motion trajectories to novel execution contexts.
Classical DMPs are time invariant, meaning that a time scaling $\varsigma\tau$ with $\varsigma > 0$ generates topologically equivalent trajectories (Ijspeert et al. 2013). Using a simple modification of the transformation system, namely substituting (1) with
$$\tau \dot{z} = \alpha_z (\beta_z (g - y) - z) + (g - y_0)\, f(x), \qquad (42)$$
Ijspeert et al. (2013) show that DMPs are also scale invariant, meaning that a scaling of the movement amplitude $\varsigma (g - y_0)$ with $\varsigma > 0$ generates topologically equivalent trajectories. The term $(g - y_0) f(x)$ marks the difference between (42) and (1). Apart from generating scaled (in time and space) versions of the demonstrated motion trajectory, classical DMPs also generalize to different initial/target states. However, the classical formulation, and its extension in (42), may exhibit dangerous behaviors like over-amplification of the trajectory when reaching a different target and high accelerations when switching to a different target on-line (Pastor et al. 2009; Ijspeert et al. 2013). To alleviate the second issue, Ijspeert et al. (2013) replaced hard goal switches with the smooth switching law in (12). However, the over-amplification issue remains. Moreover, a DMP that uses (42) fails to learn motions with the same initial and target states (i.e., $g = y_0 \Rightarrow z = 0 \Rightarrow y(t) = y_0 = g\ \forall t$).

To remedy these issues, Pastor et al. (2009) proposed to modify the transformation system as
$$\tau \dot{z} = \alpha_z \big(\beta_z (g - y - (g - y_0)\, x + f(x)) - z\big), \qquad (43)$$
where the most important change with respect to (1) is the term $(g - y_0)\, x$, which has several benefits. It prevents high accelerations at the beginning of the motion ($g - y - (g - y_0) x = 0$ for $t = 0$) or when the goal is close to the initial state.
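The effect of the $(g - y_0)x$ term can be checked numerically. The sketch below (our notation, illustrative gains) compares the initial acceleration commanded by the classical transformation system (1) with that of (43) after a large goal switch:

```python
# Initial acceleration of the classical system (1) versus the modified
# system (43) after a goal switch. Gains are illustrative choices.
alpha_z, beta_z, tau = 25.0, 6.25, 1.0

def zdot_classical(y, z, g, f):
    # Eq. (1): the goal offset enters the acceleration directly
    return (alpha_z * (beta_z * (g - y) - z) + f) / tau

def zdot_pastor(y, z, g, y0, x, f):
    # Eq. (43): at t = 0 (y = y0, x = 1) the goal-dependent part cancels
    return alpha_z * (beta_z * (g - y - (g - y0) * x + f) - z) / tau

y0, g_new = 0.0, 10.0                                   # goal moved far away
a_classical = zdot_classical(y0, 0.0, g_new, 0.0)       # large initial jump
a_pastor = zdot_pastor(y0, 0.0, g_new, y0, 1.0, 0.0)    # exactly zero
```

With these values, the classical system commands an initial acceleration of $\alpha_z \beta_z (g_{\mathrm{new}} - y_0) = 1562.5$, while (43) starts at rest regardless of how far the goal is moved.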
It allows reproducing motions with the same initial and target states, and it prevents over-amplification and trajectory mirroring effects* when changing the goal. Hoffmann et al. (2009) derived a multidimensional representation of (43) from the behavior of the spinal force fields in frogs.

The goal can also change over time and, in this case, the tracking performance of the DMP mostly depends on the gains $\alpha_z$ and $\beta_z$. As proposed by Koutras and Doulgeri (2020b), the tracking performance can be improved by adapting the temporal scaling $\tau$.

Dragan et al. (2015) showed that DMPs solve a trajectory optimization problem that minimizes a particular Hilbert norm between the demonstration and the new trajectory subject to start and goal constraints. In this light, DMP adaptation capabilities to different starts and goals can be improved by choosing (or learning) a proper Hilbert norm that reduces the deformation in the retrieved trajectory.

A via-point can be defined as a point in the state space through which the trajectory has to pass. Failing to pass a via-point may cause the robot to fail the task execution. Therefore, having a motion primitive representation capable of modulating via-points is of importance in robotic scenarios. It is not surprising that researchers have extended the DMP formulation to consider intermediate via-points in the trajectory generation process.

Ning et al. (2011, 2012) extended the classical DMP to satisfy position and velocity constraints at the beginning and at the end of a sample trajectory. Their approach to traverse via-points consists of creating a sample trajectory by combining locally-linear trajectories connecting the via-points. This sample trajectory is used to fit a DMP that is constrained to pass the via-points.

Weitschat and Aschemann (2018) considered each via-point as an intermediate goal (via-goal) $g_v$, for $v = 1, \ldots, V$, to reach. The last via-goal $g_V$ corresponds to the target state of the DMP.
In their formulation, they defined a variable goal as
$$g_{\mathrm{via}}(x) = \sum_{v=1}^{V} \Psi_v(x)\, g_v, \qquad (44)$$
where $\Psi_v(x)$ is the Gaussian basis function centered at the time corresponding to the $v$-th via-goal. The effectiveness of the approach is demonstrated in a task where the robot has to reach a different target while preventing possible self-collisions of the end-effector with the robot body. To this end, the authors place the via-goals along the trajectory used to learn the DMP, forcing the generated trajectory to stay close to the demonstration while reaching the new target.

The problem of generalizing to via-points close to (interpolation) and far from (extrapolation) the demonstration is faced by Zhou et al. (2019). Their approach, namely Via-points Movement Primitives (VMPs), combines the benefits of DMPs and Probabilistic Movement Primitives (ProMPs) (Paraschos et al. 2013). The authors assumed that the motion trajectory is generated as
$$y_{\mathrm{vmp}}(x) = e(x) + f_{\mathrm{vmp}}(x), \qquad (45)$$
where $x$ is the phase variable defined as in (3) and the elementary trajectory $e(x)$ can be defined as the linear attractor $e(x) = g + (y_0 - g)\, x$. The shape modulation term $f_{\mathrm{vmp}}(x)$ is defined as
$$f_{\mathrm{vmp}}(x) = \sum_{i=1}^{N} w_i \Psi_i(x) + \epsilon_f, \qquad (46)$$
where the Gaussian kernels $\Psi_i(x)$ are defined as in (5), $w_i$ are learnable weights, and $\epsilon_f$ is Gaussian noise. As detailed in (Paraschos et al. 2013), learning the shape modulation term $f_{\mathrm{vmp}}(x)$ amounts to learning from demonstrations the prior probability distribution of the weights $w_i$. Separating the generated trajectory into two parts as in (45) allows adopting different strategies to pass a via-point $y_v$ at $x_v$. Zhou et al. (2019) proposed to modify the shape modulation term in interpolation cases, i.e., when the via-point is "close" to the demonstrations. In extrapolation cases, instead, the elementary trajectory $e(x)$ is rewritten as the polygonal line connecting $y_0$, $y_v$, and $g$.
This approach easily generalizes to the case of multiple via-points. VMPs are experimentally compared with ProMPs, showing better performance especially in extrapolation cases.

Reaching a different goal, or passing through via-points, may not be enough to successfully execute a task in a different context. The approaches presented in this section adapt the DMP

* As discussed by Pastor et al. (2009), a transformation system that uses (42) generates a mirrored trajectory while reaching a new goal $g_{\mathrm{new}}$ every time the signs of $(g_{\mathrm{new}} - y_0)$ and $(g - y_0)$ differ.
[Figure 8 panels: (a) position, (b)-(c) linear velocity, (d) goal switch, (e) position error, (f) quaternion, (g)-(h) angular velocity, (i) goal switch, (j) orientation error]
Figure 8.
Results obtained by applying the zero velocity switch approach to join two DMPs trained on synthetic data. The training trajectories for the position and the orientation are shown as black dashed lines in (a)-(b) and (f)-(g), respectively. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

motion to new situations by adjusting the weights $w_i$ of the forcing term (4), which modifies the entire DMP trajectory. Weitschat et al. (2013) considered that $L$ demonstrations are given, each encoded in a different DMP. In order to generalize, for instance, to a new goal $g_{\mathrm{new}}$, they proposed to interpolate the weights of nearby DMPs, i.e., DMPs that reached points around $g_{\mathrm{new}}$. In formulas, $w_{\mathrm{new}} = \sum_{\forall o: d_o}\cdots$

… interpolation of DMP weights in the latent space results in better generalization performance.

An important and desired feature of any motion primitive representation is the possibility to combine basic movements to obtain more complex behaviors (Schaal 1999). We review here three prominent approaches developed to smoothly join a sequence of DMPs. In this tutorial, we name the approach by Pastor et al. (2009) velocity threshold, that in (Kober et al. 2010b) target crossing, and that in (Kulvicius et al. 2011, 2012) basis functions overlay. Some of the presented approaches modify the DMP formulations in Sections 2.1.1 and 2.1.2, and the main differences are highlighted in the text. The approaches have been implemented in Matlab for both position (Section 2.1.1) and orientation (Section 2.1.2) DMPs. The source code is included in our public repository (see Table 4). Results on synthetic data are shown in Figures 8 to 11.

A properly designed DMP reaches the desired target with zero velocity and acceleration, i.e., once a DMP is fully executed the robot comes to a full stop.
This also implies that the velocity "close" to the target is continuously decreasing. Using this property, Pastor et al. (2009) proposed to combine successive DMPs by simply terminating the current DMP when the velocity is below a certain threshold and then starting the following primitive. When executing a single DMP, it is common practice to initialize its velocity to zero, as the robot is assumed to be still. In principle, this initialization can be used to sequence multiple DMPs (Xu and Wang 2004; Lioutikov et al. 2016), but it may generate discontinuities if the robot does not fully stop between two consecutive primitives. To prevent these discontinuities, Pastor et al. initialized the state of the current DMP with that of the previous one.

The velocity threshold approach is simple and effective since it directly applies to the DMP formulations in Sections 2.1.1 and 2.1.2. For instance, Saveriano et al. (2019) showed how to join multiple quaternion DMPs† (see Section 2.1.2.1) with the velocity threshold approach. Results in Figure 8 are obtained when the velocity threshold is applied to merge DMPs separately trained to fit minimum jerk trajectories (black dashed lines). Figures 8a-8e show the position and Figures 8f-8j the orientation (unit quaternion) parts of the motion. The merged trajectory is generated by following the first DMP until the distance from the via-point falls below given thresholds (in [m] for the position and [rad] for the orientation). As shown in Figures 8d and 8i, the switch occurs shortly before half of the motion time. Figures 8e and 8j show that the desired trajectory is accurately reproduced. More or less accurate trajectories can be obtained by tuning the distance from the via-point. However, the value of this distance affects the time duration of the generated trajectory: a bigger (smaller) distance results in a shorter (longer) trajectory. For instance, in the considered case, the total motion ends earlier than the demonstration does.
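The velocity-threshold switching rule described above can be sketched as follows. This is an illustrative 1-D example: the forcing term is omitted (pure goal attraction) and all gains and thresholds are our own choices.

```python
import numpy as np

alpha_z, beta_z, tau, dt = 25.0, 6.25, 1.0, 0.002

def run_sequence(goals, y0, v_th=1e-3):
    # Chain point-to-point DMPs: switch to the next goal once the scaled
    # velocity z drops below v_th. The state (y, z) of each primitive is
    # initialized with that of the previous one, as proposed by Pastor et al.
    y, z, traj = y0, 0.0, []
    for g in goals:
        started = False
        for _ in range(20000):
            z += (alpha_z * (beta_z * (g - y) - z) / tau) * dt
            y += (z / tau) * dt
            traj.append(y)
            if abs(z) > 10.0 * v_th:
                started = True          # the primitive has accelerated
            if started and abs(z) < v_th:
                break                   # slow enough: start the next primitive
    return np.array(traj)
```

Lowering `v_th` lengthens each segment and increases accuracy near the intermediate goals; raising it shortens the overall motion — the duration/accuracy trade-off discussed above.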
Depending on the application, the time difference may cause failures; therefore, it has to be taken into account. Finally, the velocity threshold approach may generate discontinuities if the target of the current DMP is far from the demonstrated initial point of the following primitive.

Figure 9. The constant goal, moving target, and delayed goal obtained with $y(0) = 0$ m and $g = 1$ m (left), and with $q(0) = 1 + [0\ 0\ 0]^\top$ and $g_q = 0 + [1\ 0\ 0]^\top$ (right). Only the scalar part $\nu$ of the quaternion is shown for better visualization.

There exist movements, like hitting or batting, that are correctly executed only if the target is reached with a non-zero velocity. To this end, Kober et al. (2010b) extended the classical DMP formulation in Section 2.1.1 to let the DMP track a target moving at a given velocity. In their approach, the DMP passes the target with a given velocity exactly after $T$ seconds. To achieve this, the acceleration in (1) is rewritten as
$$\tau \dot{z} = (1 - x)\,\alpha_z \left(\beta_z (\hat{g} - y) + \tau (\dot{\hat{y}} - \dot{y})\right) + f(x), \qquad (48)$$
where $\dot{\hat{y}}$ is the desired velocity of the moving target $\hat{g}$, which is defined as
$$\hat{g} = \hat{g}(0) - \dot{\hat{y}}\, \tau\, \frac{\ln(x)}{\alpha_x}, \qquad (49)$$
$$\hat{g}(0) = g - T \dot{\hat{y}}. \qquad (50)$$
By inspecting (49) and (50), and considering that the term $-\tau \ln(x)/\alpha_x$ represents the elapsed time if $x$ is the phase defined in (3), it is possible to show that the moving target $\hat{g}$ is designed to reach the goal $g$ after $T$ seconds, i.e., $\hat{g}(T) = g$ (Fig. 9, left). The initial position $\hat{g}(0)$ of the moving target is obtained by moving the goal position $g$ for $T$ seconds at constant velocity $-\dot{\hat{y}}$. High accelerations at the beginning of the movement are avoided by the pre-factor $(1 - x)$, which is zero at the beginning of the motion ($x(0) = 1$).
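The property $\hat{g}(T) = g$ is easy to verify numerically. The sketch below implements (49)–(50) with illustrative parameter values of our choosing:

```python
import numpy as np

# Numerical check of the moving target in Eqs. (49)-(50).
# alpha_x, tau, T_hit, g, and yhat_dot are illustrative values.
alpha_x, tau, T_hit, g, yhat_dot = 1.0, 1.0, 5.0, 2.0, 0.1

def phase(t):
    # closed-form solution of the canonical system (3)
    return np.exp(-alpha_x * t / tau)

def moving_target(t):
    g_hat_0 = g - T_hit * yhat_dot                                # Eq. (50)
    return g_hat_0 - yhat_dot * tau * np.log(phase(t)) / alpha_x  # Eq. (49)
```

Since $-\tau \ln(x)/\alpha_x$ equals the elapsed time $t$, the target moves linearly at velocity $\dot{\hat{y}}$ and `moving_target(T_hit)` returns the goal `g` exactly.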
The approach by Nemec and Ude (2012) combines a moving target and a particular initialization of the subsequent DMP to ensure continuity of the movement up to second-order derivatives.

Saveriano et al. (2019) extended this idea to quaternion DMPs. The angular acceleration in (15) is modified as
$$\tau \dot{\eta} = (1 - x)\,\alpha_z \left(\beta_z\, \mathrm{Log}_q(\hat{g}_q * \bar{q}) + \tau (\hat{\omega} - \omega)\right) + f_q(x), \qquad (51)$$
where $\hat{\omega}$ is the angular velocity of the moving quaternion target $\hat{g}_q$ and $\mathrm{Log}_q(\hat{g}_q * \bar{q})$ measures the error between the current orientation $q$ and $\hat{g}_q$. The pre-factor $(1 - x)$ is used to avoid high angular accelerations at the beginning of the motion.

† Saveriano et al. used the multi-dimensional DMP formulation developed in (Hoffmann et al. 2009) for both position and quaternion DMPs. In this review paper, we reformulate the merging approaches in (Saveriano et al. 2019) to comply with the formulations in Sections 2.1.1 and 2.1.2.1.

[Figure 10 panels: (a) position, (b)-(c) linear velocity, (d) goal switch, (e) position error, (f) quaternion, (g)-(h) angular velocity, (i) goal switch, (j) orientation error]

Figure 10. Results obtained by applying the target crossing approach to join two DMPs trained on synthetic data. The training trajectories for the position and the orientation are shown as black dashed lines in (a)-(b) and (f)-(g), respectively. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.
The moving target for the quaternion DMPs is defined as
$$\hat{g}_q = \mathrm{Exp}_q\!\left(-\frac{\tau \ln(x)}{2 \alpha_x}\, \hat{\omega}\right) * \hat{g}_q(0), \qquad \hat{g}_q(0) = \mathrm{Exp}_q\!\left(-\frac{T}{2}\, \hat{\omega}\right) * g_q, \qquad (52)$$
where $g_q$ is the goal quaternion, $T$ is the time duration of the DMP, and the exponential map $\mathrm{Exp}_q(\cdot)$ is defined in (19). As shown in Figure 9 (right), the moving target $\hat{g}_q$ reaches the goal orientation after $T$ seconds, i.e., $\hat{g}_q(T) = g_q$. This can be easily verified by considering that the initial value of the moving target $\hat{g}_q(0)$ is computed by moving the goal orientation $g_q$ for $T$ seconds at the desired velocity $-\hat{\omega}$.

The presented target crossing approach allows crossing the target after $T$ seconds. Assuming two DMPs with time durations $T_1$ and $T_2$, respectively, one can join them by running the first DMP for $T_1$ seconds and then switching to the second one. As for the velocity threshold approach, possible discontinuities at the switching point are prevented by initializing the state of DMP$_2$ with the final state of DMP$_1$. This procedure can be repeated to join $L \geq 2$ consecutive DMPs.

Results in Figure 10 are obtained when target crossing is applied to merge DMPs separately trained to fit minimum jerk trajectories (black dashed lines). Figures 10a-10e show the position and Figures 10f-10j the orientation (unit quaternion) parts of the motion. The merged trajectory is generated by following the first DMP for $T_1 = 5$ s and then switching to the second one. The required intermediate velocity is set to a small non-zero value (in m/s for the position and rad/s for the orientation) in each direction. The generated trajectory reaches the goal in the demonstrated time, i.e., demonstration and execution times are the same. As required, the via-point is crossed at $T_1 = 5$ s with the desired velocity (Figs. 10c and 10h). However, the non-zero crossing velocity introduces a deformation in the first part of the trajectory (Figs. 10e and 10j).

The approach by Kulvicius et al.
(2011, 2012) combines multiple DMPs into a complex one, guaranteeing a smooth transition between the primitives by ensuring that the basis functions composing $f(x)$ in (4) overlap at the switching instances. First of all, Kulvicius et al. adopted the sigmoidal phase variable in (13) instead of the exponentially decaying one (3). As discussed in Section 2.1.1.3, the sigmoidal phase is $\approx 1$ for the large part of the motion, which makes it possible to use smaller forcing terms to reproduce the demonstrations. On the contrary, the exponential phase is close to zero already before $T$ s (Fig. 3), which results in larger forcing terms.

The classical acceleration dynamics in (1) is modified as
$$\tau \dot{z} = \alpha_z (\beta_z (\tilde{g} - y) - z) + f(s). \qquad (53)$$
Similarly to target crossing, Kulvicius et al. used a moving target $\tilde{g}$ in the acceleration dynamics, but called it the delayed goal function. The $\tilde{g}$ term in (53) is obtained by integrating
$$\tau \dot{\tilde{g}} = \begin{cases} \dfrac{g - y_0}{T}, & t \leq T \\ 0, & \text{otherwise} \end{cases} \qquad (54)$$
with $\tilde{g}(0) = y_0$. The delayed goal function in Figure 9 moves linearly from $y_0$ to $g$ in $T$ seconds and then remains constant, i.e., $\tilde{g}(t \geq T) = g$.

The non-linear forcing term $f(s)$ in (53) slightly differs from the classical one in (4) and is defined as
$$f(s) = \frac{\sum_{i=1}^{N} w_i \Psi_i(t)}{\sum_{i=1}^{N} \Psi_i(t)}\, s, \qquad \Psi_i(t) = \exp\!\left(-\frac{\left(\frac{t}{\tau T} - c_i\right)^2}{2 \sigma_i^2}\right), \qquad (55)$$

[Figure 11 panels: (a) position, (b)-(c) linear velocity, (d) goal switch, (e) position error, (f) quaternion, (g)-(h) angular velocity, (i) goal switch, (j) orientation error]

Figure 11.
Results obtained by applying the basis functions overlay approach to join two DMPs trained on synthetic data. The training trajectories for the position and the orientation are shown as black dashed lines in (a)-(b) and (f)-(g), respectively. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

where $\sigma_i$ is the width and $c_i$ is the center of the $i$-th basis function, and $s$ is obtained by integrating (13). The term $t/(\tau T)$ is used in (55) instead of the phase variable $x$. Being $0 \leq t/(\tau T) \leq 1$, the basis functions are equally spaced between $0$ and $1$. Finally, the widths $\sigma_i$ of the kernels are constant and depend on the number of kernels.

Having presented the main differences with respect to the canonical approach, it is possible to focus on how Kulvicius et al. (2012) solved the problem of joining $L \geq 2$ DMPs. In general, each of the $L$ DMPs has a different time duration $T_l$, desired target $g_l$, and initial position $y_{l,0}$, from which it is possible to compute the delayed goal functions by integrating
$$\tau \dot{\tilde{g}}_l = \begin{cases} \dfrac{g_l - y_{l,0}}{T_l}, & \sum_{\kappa=1}^{l-1} T_\kappa \leq t \leq \sum_{\kappa=1}^{l} T_\kappa \\ 0, & \text{otherwise}. \end{cases} \qquad (56)$$
Note that, being $\tilde{g}_1(0) = y_0$, the acceleration (53) is smooth at the beginning of the motion. For this reason, the term $(1 - x)$ used in (48) is not needed in (53).

Assuming that $L$ DMPs have been trained and that each DMP has $N$ kernels, we can merge them into one DMP as follows. The centers of the joined DMP are computed as
$$\check{c}_i^l = \begin{cases} \dfrac{T_1 (i - 1)}{T_{\mathrm{join}} (N - 1)}, & l = 1 \\[2mm] \dfrac{T_l (i - 1)}{T_{\mathrm{join}} (N - 1)} + \dfrac{1}{T_{\mathrm{join}}} \sum_{\kappa=1}^{l-1} T_\kappa, & \text{otherwise}, \end{cases} \qquad (57)$$
where $T_l$ is the duration of the $l$-th DMP and $T_{\mathrm{join}} = \sum_{l=1}^{L} T_l$ is the duration of the joined motion. The widths of the joined DMP are computed as
$$\check{\sigma}_i^l = \sigma_i^l\, \frac{T_l}{T_{\mathrm{join}}}. \qquad (58)$$
The centers and widths computed in (57) and (58), respectively, overlap at the transition points, allowing for smooth transitions between consecutive DMPs.
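The remapping in (57)–(58) can be sketched as follows (illustrative NumPy code with our own function names):

```python
import numpy as np

def merge_kernels(durations, N, sigmas):
    # Eqs. (57)-(58): remap the kernel centers and widths of L separately
    # trained DMPs onto the normalized joint time axis of length T_join
    T_join = sum(durations)
    centers, widths, offset = [], [], 0.0
    for l, T_l in enumerate(durations):
        for i in range(1, N + 1):
            # Eq. (57): centers of the l-th DMP, shifted by the elapsed time
            centers.append(T_l * (i - 1) / (T_join * (N - 1)) + offset / T_join)
            # Eq. (58): widths shrink proportionally to T_l / T_join
            widths.append(sigmas[l][i - 1] * T_l / T_join)
        offset += T_l
    return np.array(centers), np.array(widths)
```

For example, with two 5 s DMPs and $N = 3$, the merged centers are $[0, 0.25, 0.5, 0.5, 0.75, 1.0]$: the last kernel of the first primitive and the first kernel of the second coincide at the transition point, which is what produces the overlap and hence the smooth switch.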
The weights of the joined DMP are obtained by stacking the $N$ weights of each of the $L$ DMPs. Therefore, the joined DMP has $N \cdot L$ kernels and $N \cdot L$ weights. The phase variable (13) is modified to run for the duration $T_{\mathrm{join}}$ of the joint motion.

Saveriano et al. (2019) extended the basis functions overlay approach to quaternion DMPs. Assume that a sequence of $L$ quaternion DMPs is given. The angular acceleration in (15) is reformulated for each DMP as
$$\tau \dot{\eta}^l = \alpha_z \big(\beta_z\, \mathrm{Log}_q(\tilde{g}_q^l * \bar{q}^l) - \eta^l\big) + f_q^l(s), \qquad (59)$$
where $l$ indicates the $l$-th quaternion DMP and $f_q^l(s)$ is defined as in (55). The term $\tilde{g}_q^l$ is the quaternion delayed goal function, and it ranges from $q^l(0)$ to $g_q^l$ in $T_l$ seconds (see Fig. 9, right). To generate this moving target while preserving the geometry of $S^3$, $\tilde{g}_q^l$ has to move along the geodesic connecting $q^l(0)$ to $g_q^l$. Therefore, $\tilde{g}_q^l$ is defined as
$$\tilde{g}_q^l(t + \delta t) = \mathrm{Exp}_q\!\left(\frac{\delta t\, \tilde{\omega}^l(t)}{2}\right) * \tilde{g}_q^l(t), \qquad (60)$$
where
$$\tilde{\omega}^l(t) = \begin{cases} \dfrac{2}{T_l}\, \mathrm{Log}_q\!\left(g_q^l * \bar{q}^l(0)\right), & \sum_{\kappa=1}^{l-1} T_\kappa \leq t \leq \sum_{\kappa=1}^{l} T_\kappa \\ [0\ 0\ 0]^\top, & \text{otherwise}. \end{cases} \qquad (61)$$
The angular velocity in (61) is computed for each $l$. The term $2\, \mathrm{Log}_q\big(g_q^l * \bar{q}^l(0)\big)$ represents the angular velocity that rotates $q^l(0)$ into $g_q^l$ in a unit time. Note that the mappings $\mathrm{Log}_q(\cdot)$ and $\mathrm{Exp}_q(\cdot)$ are defined in (17) and (19), respectively. The delayed goal $\tilde{g}_q^l$ crosses all the via-goals $g_q^l$, $l = 1, \ldots, L-1$, and then reaches the goal $g_q^L$.

Results in Figure 11 are obtained when basis functions overlay is applied to merge DMPs separately trained to fit minimum jerk trajectories (black dashed lines). Figures 11a-11e show the position and Figures 11f-11j the orientation (unit quaternion) parts of the motion.
This approach does not require a switching rule and automatically generates a smooth trajectory, with continuous velocity as shown in Figures 11c and 11h, that passes close to the via-points, which favors the overall reproduction accuracy (Fig. 11e and 11j). However, the distance from a via-point depends on the weights of the joined primitives and cannot be decided separately. The trajectory generated with this approach tends to last longer than the demonstrations. This is due to the sigmoidal phase, which vanishes only after $T + \delta_s$ s (Fig. 3). Depending on the application, this time difference may cause failures and has to be taken into account.

The standard periodic DMP learning approach approximates the shape $f_d(t)$ of the input trajectory $y_d$ in (38) by changing the weights of the Gaussian kernel functions (Ijspeert et al. 2013). The weights are updated in such a way that the difference between the reference trajectory and the DMP is reduced at every control step and gradually throughout the periodic repetitions. However, the DMP can also be reshaped by some external feedback function to achieve different functionalities in different applications, for instance, tasks that require a trial-and-error approach (Kober et al. 2008), obstacle avoidance (Park et al. 2008; Hoffmann et al. 2009; Tan et al. 2011), coaching (Petrič et al. 2014b; Gams et al. 2016) for robots, and adaptation of assistive exoskeleton behavior (Peternel et al. 2016a). Alternatively, the frequency of existing periodic DMPs can be modulated online (Gams et al. 2009; Petrič et al. 2011).

In (Park et al. 2008; Hoffmann et al. 2009; Tan et al. 2011), the detected obstacle was fitted with a potential field function that changes the shape of the DMP to avoid it. In more detail, Tan et al. (2011) used the potential field to compute a time-varying goal and modified the resulting DMP trajectory, while (Park et al. 2008; Hoffmann et al.
2009) added an extra forcing term to the DMP. Similarly, in (Gams et al. 2016) the human arm was fitted with a potential field function, which was used to reshape the DMP to perform coaching. The potential field was coupled to the position of the human hand to make pointing gestures and indicate the direction in which the robot arm position trajectory should change:

\dot{z} = \Omega\left(\alpha_z\left(\beta_z(g - y) - z\right) + C_O + f\right). \quad (62)

The added coupling term $C_O$ is the obstacle avoidance term that contains the potential field; for the sake of explanation, it is given in the simplified form

C_O = d_s\!\left(\lVert \mathcal{O} - y \rVert\right)\exp\!\left(-\zeta(\mathcal{O} - y)\right), \quad (63)

where $\mathcal{O}$ is the obstacle (or human pointing gesture) and $y$ is the robot position. The exponential and $\zeta$ determine the potential field, while the function $d_s$ controls the distance at which the perturbation field starts affecting the DMP. For the full formulation of $C_O$ and its parameters, see (Gams et al. 2016). In (Rai et al. 2017) the method was extended to include a generalization of the obstacle avoidance formulation in (62).

Alternatively, a faulty segment of a colliding DMP trajectory can also be directly adjusted online by the human demonstrator (Karlsson et al. 2017). On the other hand, the method in (Kim et al. 2015) considers obstacle avoidance as a constraint of an optimization problem, which modifies the DMP trajectory to prevent collisions.

Similarly to obstacle avoidance, task dynamics can also be incorporated into a DMP as coupling terms. In (Gams et al. 2014) task dynamics were coupled at the acceleration and velocity levels of the DMP. The presented method was utilized for interaction tasks, where the human changed the behavior of the robot based on the dynamics exerted on the manipulator:

\tau\dot{z} = \alpha_z\left(\beta_z(g - y) - z\right) + \dot{C}_f + f(x), \quad (64)
\tau\dot{y} = z + C_f.
(65)

Here, the force coupling term $C_f = \varsigma F$ is defined through a virtual or measured force $F$ and a scaling factor $\varsigma$. It essentially changes the dynamic behavior of the DMP, enabling the motion primitive to instantly react to the coupled force. Later, Zhou et al. (2016b) introduced a PD-controller-based coupling term formulation $C_{PD} = \varsigma\left(K_P(F_d - F_e) - D_V\dot{F}_e\right)$ coupled to the velocity part of the DMP (65). In this formulation, $F_d$ represents the desired force, $F_e$ is the measured force, $\varsigma$ is a scaling factor, and $K_P$ and $D_V$ are the proportional and derivative gains of the Proportional Derivative (PD) controller. The coupling term formulation allows for a controlled adaptation of the robot motion to changes in the environment.

In (Kramberger et al. 2018) this approach was extended with a force feedback loop coupled to the velocity (2) and the goal $g$ of the DMP. The outcome of this approach is a behavior similar to an admittance controller (Villani and De Schutter 2008), with the difference that the adaptation acts directly at the trajectory generation level:

\tau\dot{z} = \alpha_z\left(\beta_z\left((g + C_a) - y\right) - z\right) + f(x), \quad (66)
\tau\dot{y} = z + \dot{C}_a. \quad (67)

Here, $\dot{C}_a = \varsigma(F_d - F_e)$ is the first time derivative of the admittance coupling term, which changes the velocity and, through the integrated coupling term, the position output of the DMP. The described approach can also be used for Cartesian space motion, where the forces have to be substituted with desired and measured torques for the orientation part. This approach can be implemented in robot tasks involving contact with the environment as well as contact with humans.

In (Peternel et al. 2016a), human effort was used to provide information about the direction in which the assistive exoskeleton joint torque DMP should change in order to minimize it.
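The admittance-coupled formulation in (66)-(67) can be sketched as a single Euler integration step, where a force error shifts the goal through the integrated coupling term and the velocity through its derivative. The gains and the scaling factor below are illustrative values, not taken from the cited works:

```python
import numpy as np

def admittance_dmp_step(y, z, C_a, g, F_d, F_e, f_x, dt,
                        tau=1.0, alpha_z=25.0, beta_z=6.25, varsigma=0.01):
    """One Euler step of the admittance-coupled DMP of Eqs. (66)-(67),
    scalar case. Returns the updated (y, z, C_a)."""
    C_a_dot = varsigma * (F_d - F_e)               # admittance coupling rate
    z_dot = (alpha_z * (beta_z * ((g + C_a) - y) - z) + f_x) / tau
    y_dot = (z + C_a_dot) / tau                    # Eq. (67)
    return y + dt * y_dot, z + dt * z_dot, C_a + dt * C_a_dot
```

When the measured force matches the desired one, the coupling term stays at zero and the system behaves like a standard DMP converging to $g$; a persistent force error steadily displaces the goal, mimicking an admittance behavior at the trajectory-generation level.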
The human was included in the robot control loop by replacing the error calculation in (40) with the human effort feedback term $U(E)$:

w_i(t+1) = w_i(t) + \Psi_i\, P_i(t+1)\, U(E), \quad (68)

where $E(t)$ is the current effort measured from human muscle activity through Electromyography (EMG) signals‡.

‡ Note that other feedback that measures human effort can be used instead of EMG, such as joint torques or limb forces.

Equations (33)-(37) and (41) are used in their original form. Equations (38)-(40) are not used, since (68) modulates the weights in (36) instead.

The effort feedback term $U(E)$ closes the loop and acts as a feedback for adapting the weights of the Gaussian kernels that define the shape of the trajectory. A positive $U(E)$ increases, while a negative $U(E)$ decreases, the values of the weights at a given section of the periodic DMP that encodes the joint torque. If the shape of the DMP does not provide enough assistive power, the human has to exert effort (i.e., muscle activity) to produce the rest of the power required to achieve the desired task under the given dynamics. In turn, the muscle activity feedback increases the magnitude of the DMP until the human effort term $U(E)$ is minimized. Note that each joint has its own torque DMP and $U(E)$ term (Peternel et al. 2016a). After that point, the DMPs do not change unless the task, dynamics, or conditions change. If they change, the human has to compensate for the change with additional muscle activity, which in turn adapts the DMPs to the new required joint torques.

In many LfD scenarios it is desirable to modify both the spatial motion and the speed of the learned motion at any stage of the execution. Speed-scaled dynamic movement primitives, first presented in Nemec et al. (2013a), are applied for the underlying task representation.
The original DMP formulation in (1) and (2) was extended by adding a temporal scaling factor $\upsilon$ at the velocity level of the DMP:

\upsilon(x)\,\tau\dot{z} = \alpha_z\left(\beta_z(g - y) - z\right) + f(x), \quad (69)
\upsilon(x)\,\tau\dot{y} = z. \quad (70)

From (69) and (70), it is evident that the scaling term is a function of the phase, and it is therefore encoded with a set of RBFs similarly to (4). This method allows for modification of the spatial motion as well as of the speed of execution at any stage of the trajectory. The authors demonstrated the proposed method in a learning scenario where, after every learning cycle (using Iterative Learning Control (ILC)), a new velocity profile was encoded based on the wrench feedback, thus converging to an optimal velocity for the specific task. Vuga et al. (2016) extended the approach by incorporating a compact representation for non-uniformly accelerated motion as well as a simple modulation of the movement parameters.

Later on, in Nemec et al. (2018) the authors extended the previous approach to also incorporate velocity scaling of encoded orientation trajectories represented with unit quaternions. The outcome of the presented work is a unified approach to velocity scaling for tasks executed in Cartesian space. Furthermore, a reformulation of the velocity approach called AL-DMPs was presented by Gašpar et al. (2018). In this work they present a method where the spatial and temporal components of the motion are separated by means of the arc length of the time-parameterized trajectory. The arc length, based on the differential geometry of curves, is related to the speed of the movement, given as the time derivative of the demonstrated trajectory. The approach is well suited when multiple demonstrations are compared to extract relevant information for learning. Weitschat and Aschemann (2018) add an extra forcing term to keep the velocity within a certain predefined limit.
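A minimal sketch of the speed-scaled system (69)-(70) is given below. It assumes the scaling also acts on the phase dynamics so that the spatial path is preserved while execution slows down or speeds up; gains and names are illustrative:

```python
import numpy as np

def integrate_speed_scaled_dmp(y0, g, nu, dt=0.002, tau=1.0,
                               alpha_z=25.0, beta_z=6.25, alpha_x=2.0,
                               forcing=lambda x: 0.0):
    """Integrate the speed-scaled DMP of Eqs. (69)-(70), scalar case.
    nu(x) > 0 is the phase-dependent temporal scaling factor."""
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    while x > 1e-3:
        v = nu(x)
        z += dt * (alpha_z * (beta_z * (g - y) - z) + forcing(x)) / (v * tau)
        y += dt * z / (v * tau)
        # Assumed: the phase is scaled as well, so the path shape is kept.
        x += dt * (-alpha_x * x) / (v * tau)
        traj.append(y)
    return np.array(traj)
```

With a constant $\upsilon(x) = 2$, the generated trajectory traces the same path but takes roughly twice as long as with $\upsilon(x) = 1$.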
The aim of this work is to guarantee a safe execution of the robot task when interacting with humans, as well as to provide a framework for safe interaction in a changing environment where the robot position and velocity have to change over time. For the full formulation of the coupling term, see (Weitschat and Aschemann 2018). Additionally, Dahlin and Karayiannidis (2020) proposed a temporal coupling based on a repulsive potential, keeping the DMP velocity within predefined limits while ensuring path shape invariance.

LfD is a wide research area and many different approaches have been developed to reproduce human demonstrations (Billard et al. 2016). As already mentioned, the aim of this tutorial survey is to provide a comprehensive overview of DMP research, and we intentionally skip the rich literature in the field of LfD. However, some representations are closely related to the DMP formulation. This section briefly reviews them.

Calinon et al. (2009) computed an acceleration command for the robot in a PD-like form $\ddot{y} = K^P(y_d - y) + D^V(\dot{y}_d - \dot{y})$, where $K^P$ is a stiffness and $D^V$ a damping gain, $y$ is the measured state of the robot and $\dot{y}$ its time derivative (velocity), and $y_d$ and $\dot{y}_d$ are the desired position and velocity retrieved with GMR. The authors then showed that the acceleration command $\ddot{y}$ can be seen as a mixture of linear dynamics, each converging to a certain attractor. Although later work like (Kormushev et al. 2010) referred to this representation as "a modified version" of DMPs, there are significant differences with the DMP formulation, properly highlighted by (Calinon et al. 2012).

Herzog et al. (2016) computed an acceleration command for the robot from the linear system $\ddot{y} = u = K^P(y_d - y)$, where $y$ is the measured state of the robot, $y_d$ is a human demonstration, and $K^P$ is a control gain computed using the linear-quadratic regulator method.
Then, a compact representation of the control input trajectory $u$ is computed by means of Chebyshev polynomials. This representation does not require a vanishing phase variable to ensure convergence, but generalization to different start/goal positions requires applying the linear-quadratic regulator method again to find a new sequence of control inputs.

Regarding periodic motions, (Ajallooeian et al. 2013) proposed a dynamical-system-based framework to learn rhythmic movements with an arbitrary shape and basin of attraction. They exploit phase-based scaling functions to represent the mapping between a known, base limit cycle and a desired periodic orbit. The base limit cycle can be, for example, the one generated by a periodic DMP, which makes the approach of (Ajallooeian et al. 2013) a more general formulation of periodic primitives.

This section reviews approaches where DMPs have been integrated into bigger executive frameworks. We categorize these approaches into five main research areas, namely grasping and manipulation, impedance learning, reinforcement learning, deep learning, and incremental and life-long learning.

Successfully grasping an object is the first step towards robotic manipulation. Grasping requires a (visual) perception of the environment to locate the object and to decide the grasping points based on its geometry. In this setting, even small uncertainties may cause the object to drop and the grasp to fail. To improve the robustness of vision-driven grasping, Krömer et al. (2010a) augmented DMPs with a potential field based on visual descriptors that adapts hand and finger trajectories to the object's local geometry. This grasping strategy was integrated in a hierarchical control architecture where the upper level decides where to grasp the object and the lower level locally adapts the motion to robustly grasp it (Krömer et al. 2010b). Stein et al.
(2014) proposed a point cloud segmentation approach based on the convexity and concavity of surfaces. The approach is particularly suited to recognize object handles and enables a robot to automatically grasp objects.

The ability to grasp and use tools is also desirable to perform daily-life manipulation. In this respect, (Guerin et al. 2014) proposed the so-called tool movement primitives that transform the demonstrations into a tool affordance frame. The result is a motion that generalizes to different tool poses and to tools that share the same affordance(s). Li and Fritz (2015) considered tool usage with low-cost, non-dexterous grippers and proposed a framework to learn bi-manual strategies for tool usage that compensate for the lack of dexterity. Bi-manual robotic manipulation is a challenging task that requires precise coordination between the hand movements and adherence to spatial constraints. Thota et al. (2016) developed a DMP-based control framework for bi-manual manipulation that ensures time synchronization of the two hands while being robust to spatial perturbations and goal changes.

Beyond object grasping, everyday manipulation requires the precise execution of complex movements. Often such complex movements are hard to encode into a single motion primitive, but they can be conveniently split into simpler motions (e.g., reach and grasp) that can be properly sequenced and executed (Fig. 12).

The possibility of exploiting DMPs as the building blocks of complex tasks was investigated in (Ramirez-Amaro et al. 2015; Caccavale et al. 2018, 2019). In these works, a human teacher demonstrated a relatively complex task consisting of several actions performed on different objects. The demonstration was then automatically segmented into $M$ basic motions used to fit $M$ DMPs. While Ramirez-Amaro et al. (2015) exploit semantic rules (e.g., reaching an object with a knife means cut) to infer high-level human activities, Caccavale et al.
built a hierarchical structure to schedule the execution of the complex task by selecting the proper DMP for the current executive context.

Figure 12. An example of hierarchical task decomposition and motion primitive sequencing from (Agostini et al. 2020).

They used kinesthetic teaching and verbal cues (open/close gripper commands) to provide task demonstrations. Lemme et al. (2014) organized segmented task demonstrations into a motion primitive library learned from self-generated trajectory patches. They also introduced a mechanism to remove unused skills and update the library. Kinesthetic teaching and haptic feedback were also used by Eiband et al. (2019) to segment and recognize basic motions, or skills, and to build a tree describing geometric relationships, like reference frames and goal poses, between consecutive skills. At run time, the robot performed haptic exploration to locate objects in the scene and update the skill tree. The transformations in the skill tree were then used to define the initial and goal poses of the DMPs and execute the task. Finally, Wu et al. (2018) integrated DMPs into a dialogue system with speech and ontology to learn or re-learn a task using natural interaction modalities.

Collecting demonstrations becomes an issue if kinesthetic teaching or marker-based motion trackers cannot be used. The latter require an expensive sensor infrastructure that is hard to build in real-world scenarios like factory floors. Kinesthetic teaching needs torque-controlled/collaborative robots that are still uncommon in industrial settings. To remedy this issue, (Mao et al. 2015) exploited a low-cost RGB-D camera and tracked the human hand using the markerless approach proposed by (Oikonomidis et al. 2011). The collected data were then segmented into basic motions and used to fit DMPs.

The described approaches assume that human teachers always provide consistent and noiseless task demonstrations. Ghalamzan E. et al.
(2015) encoded noisy demonstrations into a GMM and computed a noise-free trajectory using GMR. The noise-free trajectory was then used to fit a DMP that generalized to different start, goal, and obstacle configurations. Niekum et al. (2012, 2015) designed a framework that learns from unstructured demonstrations by segmenting the task demonstrations, recognizing similar skills, and generalizing the task execution. Interestingly, a user study on volunteers conducted by (Gutzeit et al. 2018) showed that existing strategies for segmentation and learning are sufficiently robust to enable the automatic transfer of manipulation skills from humans to robots in a reasonable time. Finally, some work (Deniša and Ude 2013a,b; Deniša and Ude 2015) exploited transition graphs and trees to embed parts of a trajectory, together with search algorithms to discover sequences of partial trajectories and generate motions that have not been demonstrated.

Approaches that rely on a hierarchical, tree-like structure to represent the task have limited task generalization capabilities. Lee and Suh (2013) used probabilistic inference and object affordances to infer the adequate skill that can handle uncertainties in the executive context. Beetz et al. (2010) learned stereotypical task solutions from observation and used task planning and symbolic reasoning to execute novel mobile manipulation tasks. A generative learning framework was proposed by (Wörgötter et al. 2015) to augment the robot's knowledge base with missing information at different levels of the cognitive architecture, including symbolic planning as well as object and action properties. (Paxton et al. 2016) used task and motion planning to generalize the execution of complex assembly tasks and proposed a learning by demonstration approach to ground symbolic actions. (Agostini et al.
2020) performed task and motion planning by combining an object-centric description of geometric relations between objects in the scene, a symbol-to-motion hierarchical decomposition depending on three consecutive actions in the plan, and the LfD approach developed in (Caccavale et al. 2019) (Fig. 12). A manipulation task was described at three different levels by (Aein et al. 2013). The top level provides symbolic descriptions of actions, objects, and their relationships. The mid level uses a finite state machine to generate a sequence of action primitives grounded by the lower level. A common point among these approaches is that they use DMPs to execute the task on real robots.

Impedance control can be used to achieve compliant motions, in which the controller resembles a virtual spring-damper system between the environment and the robot end-effector (Hogan 1985). Such an approach permits smooth, safe, and energy-efficient interaction between robots and environments (possibly including humans). A standard model for such an interaction is defined as

M\ddot{y}_t = K_t^P(y_g - y_t) - D_t^V\dot{y}_t + f_t^e, \quad (71)
I\dot{\omega}_t = K_t^O\,\mathrm{Log}_R\!\left(R_g R_t^\top\right) - D_t^W\omega_t + \tau_t^e, \quad (72)

where (71) and (72) correspond to the translational and rotational cases, respectively; $M$, $K_t^P$, and $D_t^V$ are the mass, stiffness, and damping matrices for the translational motion, while $I$, $K_t^O$, and $D_t^W$ are the moment of inertia, stiffness, and damping matrices for the rotational motion. $R_g, R_t \in SO(3)$ are rotation matrices corresponding to the desired rotation goal and the actual orientation profile of the end-effector, respectively.
$f_t^e$ and $\tau_t^e$ represent the external force and torque applied to the robot end-effector.

In fact, VIC plays an important role when a robot needs to interact with any environment in order to avoid high impact forces and damage to the environment or the robot (i.e., by changing to a low stiffness) (Ajoudani et al. 2012; Abu-Dakka et al. 2018; Peternel et al. 2018a). On the other hand, it is important for rejecting unexpected and unpredictable perturbations from the environment to achieve a desired position tracking precision (i.e., by changing to a high stiffness) (Yang et al. 2011). In addition, it is also important for the coordination of human-robot collaborative movements (Peternel et al. 2017b). However, a robotic system still needs to learn how to adapt such a VIC to unseen situations while avoiding hard-coding. Such a learning paradigm is called Variable Impedance Learning Control (VILC). Interested readers can refer to our recent survey on VILC (Abu-Dakka and Saveriano 2020).

In this review, we mention some of the works that integrate DMPs with VIC in a VILC framework. Figure 13 shows a simple generic example where a DMP is integrated in a VIC control scheme.

Figure 13. General control scheme combining Variable Impedance Control (VIC) and a DMP.

Buchli et al. (2011a) proposed one of the earliest approaches that integrates DMPs with the Policy Improvement with Path Integrals (PI²) algorithm (Theodorou et al. 2010) to learn movements (position and velocity represented by a DMP) while optimizing the impedance parameters. Later, the authors exploited a diagonal stiffness matrix and expressed the variation (time derivative) of each diagonal entry as

\dot{k}_{\vartheta_j,t} = \alpha_j\left(\boldsymbol{\psi}_j^\top\left(\vartheta_j + \epsilon_{j,t}\right) - k_{\vartheta_j,t}\right), \quad j = 1, \dots
, J, \quad (73)

where $j$ indicates the $j$-th joint, $k_{\vartheta_j,t}$ is the stiffness of joint $j$, $\epsilon_{j,t}$ is a time-dependent exploration noise, each $\boldsymbol{\psi}_j$ is a vector of $N$ Gaussian basis functions, and $\vartheta_j$ are the learnable parameters for joint $j$. The stiffness parameterization in (73) is also linear in the parameters, and PI² can be applied to find the optimal policy. Later, the authors used PI² to learn VIC in deterministic and stochastic force fields (Stulp et al. 2012a). Nakanishi et al. (2011) proposed a method that optimizes a periodic motion along with a time-varying joint stiffness.

(Basa and Schneider 2015) introduced an extension of the DMP formulation by adding a second nonlinear function to cope with elastic robots as follows:

\tau\dot{z} = \alpha_z\left(\beta_z(g - y) - z\right) + f(x) + \tilde{f}, \quad (74)

where $\tilde{f}$ is defined as in (4) but without the phase variable $x$. The main purpose of $\tilde{f}$ is to compensate the gravitational influence on the moved DoF at the end of the movement time and beyond. Differently, Haddadin et al. (2016) used optimal control to execute near-optimal motions of elastic robots.

Nemec et al. (2016) proposed a cooperative control scheme that enables a dual-arm robot to adapt its stiffness online along the executed trajectory in order to ensure accurate execution. (Umlauft et al. 2017) used GPs along with DMPs (as proposed in (Fanger et al. 2016)) to predict the trajectories. During the execution, their admittance controller adapts both stiffness and damping online. The energy-tanks passivity-based control method has been integrated with DMPs to enforce passivity and thereby stably adapt to contacts in unknown environments by adapting the stiffness online (Shahriari et al. 2017; Kramberger et al. 2018; Kastritsi et al. 2018).

The methods in (Peternel et al. 2014, 2018b,a; Yang et al. 2018, 2019; Bian et al. 2019) designed different multi-modal interfaces to let the human explicitly teach an impedance behavior to the robot.
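The stiffness variation policy in (73) can be sketched as follows: the target stiffness is a linear combination of phase-dependent basis functions with (noisy) learnable parameters, tracked by a first-order filter. The normalization of the basis functions and all names are illustrative assumptions:

```python
import numpy as np

def stiffness_rate(theta, eps, centers, widths, k, alpha, x):
    """Evaluate the stiffness variation of Eq. (73) for one joint:
    k_dot = alpha * (psi(x)^T (theta + eps) - k),
    where psi(x) are N normalized Gaussian basis functions of the
    phase x, theta are the learnable parameters, and eps is the
    exploration noise added during policy search."""
    psi = np.exp(-0.5 * ((x - centers) / widths) ** 2)
    psi /= psi.sum()                      # normalize activations
    return alpha * (psi @ (theta + eps) - k)
```

Because the target is linear in the parameters `theta`, the same path-integral update used for the DMP shape weights can optimize the stiffness profile.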
Most of them combined EMG-based variable impedance skill transfer with DMP-based motion sequence planning, inheriting the merits of both aspects for robotic skill acquisition. Hu et al. (2018) used Covariance Matrix Adaptation-Evolution Strategies (CMA-ES) to update the parameters of the DMPs and of a variable impedance controller in order to reduce the impact forces during robot motion in noisy environments. Dometios et al. (2018) integrated Coordinate Change DMPs (CC-DMPs) with a vision-based motion planning method to adapt the reference path of a robot's end-effector and allow the execution of washing actions.

Travers et al. (2016, 2018) proposed a shape-based compliance controller, for the first time in locomotion, by implementing amplitude compliance on a snake robot moving in complex environments with obstacles. Their approach allows snake-like robots to blindly adapt to complex unstructured terrains thanks to their proprioceptive gait compliance technique.

Recently, an adaptive admittance controller was proposed (Wang et al. 2020) which integrates GMR for the extraction of human motion characteristics, a DMP to encode a generalizable robot motion, and an RBF-NN-based controller for trajectory tracking during the reproduction phase.

Novel LfD approaches explicitly take into account that training data may be generated on certain Riemannian manifolds with associated metrics. Abu-Dakka and Kyrki (2020) reformulated DMPs based on Riemannian metrics, such that the resulting formulation can operate with SPD data on the SPD manifold. Their formulation is capable of adapting to a new SPD goal point.

Recently, a biomimetic controller has been integrated with DMPs (Zeng et al. 2021) in order to learn and adapt compliance skills.

In RL, an agent tries to improve its behavior via trial and error by exploring different strategies (actions) and receiving feedback (a reward) on the outcome of its actions.
Actions $a$ are drawn from a policy $\pi(s, a)$ that represents a mapping between states $s$ and actions $a$. The goal of RL is to find an optimal policy $\pi^\star$ that maximizes the cumulative expected reward, i.e., the sum of expected rewards over a possibly infinite time interval. When the agent is a robot performing tasks in the real world, the state and action spaces are inherently continuous. Moreover, the robotic agent is affected by imperfect (e.g., noisy) perception and inaccurate models (e.g., of contacts). Finally, performing a large number of interactions with the real world (rollouts) is expensive and possibly dangerous. As discussed by (Kober et al. 2013), robotics-specific challenges require specific solutions to make the RL problem feasible.

Figure 14. General block scheme of DMP-based policy improvement.

One possibility is to use a parameterized policy and use RL to search for an optimal, finite set of policy parameters. In this respect, DMPs have been widely used as policy parameterization. The general idea is shown in Figure 14. In more detail, (Peters and Schaal 2008a,b) showed that various policy gradient and actor-critic RL approaches can be effectively applied to improve robotic skills parameterized as DMPs. Other research focused on developing policy search algorithms specifically for parameterized policies. Inspired by stochastic optimal control, Theodorou et al. (2010) proposed Policy Improvement with Path Integrals (PI²), which is an application of path integral optimal control to DMPs. PI² and DMPs have been successfully applied in several domains including VILC (Buchli et al. 2011a,b), in-contact tasks (Hazara and Kyrki 2016), grasping under state estimation uncertainties (Stulp et al. 2011), bi-manual manipulation (Zhao et al. 2020), and robot-assisted endovascular intervention (Chi et al. 2018). Kober and Peters
Kober and Peters(2011) derived from expectation-maximization the so-calledPolicy Learning by Weighting Exploration with the Returns(PoWER). PoWER and DMPs have been successfullyapplied to perform highly dynamic tasks including ball-in-a-cup Kober and Peters (2011) and pancake flippingKormushev et al. (2010). Even with parameterized policies the number of rolloutsneeds to search for optimal policy parameters may becomelarge, especially for robots with many DoFs. Dimensionalityreduction techniques can be exploited to perform policysearch in a reduced space (Colom´e and Torras 2014). Theeffectiveness of this approach was demonstrated in thechallenging task of clothes ( i.e., soft tissues) manipulation(Colom´e and Torras 2018). IL arises as an effective approachto policy initialization and to speed up policy search by Prepared using sagej.cls averiano, Abu-Dakka, Kramberger, and Peternel reducing the number of rollouts (Kober and Peters 2010).In this respect, Kober et al. (2008, 2010a) augmentedDMPs with a perceptual coupling term and propose toinitialize the DMP via human imitation and to refine themotor skill via RL. IL can be eventually combined withdimensionality reduction (Tan and Kawamura 2011) andseveral rollouts can be performed firstly in simulation(Cohen and Berman 2014) to further speed up the policysearch. When multiple demonstrations are given, one canlearn a mapping between policy parameters and querypoints ( e.g., , goal positions) and use the mapping togeneralize to new situations (Section 3.1.3). This strategywas used by Nemec et al. (2011, 2012, 2013b) to providea good initial policy for a new situation which is thenfurther refined using RL. Being the mapping estimatedusing example query points, the search space can beeffectively constrained within query points making the policysearch more efficient. Vuga et al. (2015a,b) combined thisapproach with a different DMP formulation to optimizethe velocity of execution. 
The approach was tested on diverse tasks including pouring water into a cup, where it prevented the water from spilling during the motion. Schroecker et al. (2016) provided demonstrations in the form of soft via-points (Section 3.1.2), which reduce the search space to the neighborhood of the taught via-points. Multiple demonstrations were used by (Reinhart and Steil 2014, 2015) to build a parameterized skill memory that connects a low-dimensional skill parameterization to motion primitive parameters. This low-dimensional embedding is then leveraged for efficient policy search. Instead of learning a mapping from task to policy parameters, Queißer et al. (2016) used data from the rollouts to incrementally learn a parametric skill (bootstrapping) and used it to generate a good initial policy for a new task.

Instead of using generalization to provide a better initial policy, some researchers exploit RL to improve and generalize the motion primitive. (André et al. 2015) adapted DMP policies to walk on sloped terrains. Mülling et al. (2010) generalized to new situations using a mixture of DMPs. In their approach, RL was used to estimate the shape parameters as well as the optimal responsibility of each DMP. (Mülling et al. 2013) used episodic RL to estimate meta-parameters, like the temporal and spatial interception point of the ball and the racket, typical of table tennis tasks. Lundell et al. (2017) used parameterized kernel weights and RL to search for optimal parameters, while (Forte et al. 2015) augmented the given demonstration using RL-based state space exploration to autonomously expand the robot's task knowledge. Metric RL was exploited by (Hangl et al. 2015) to smoothly switch between learned DMP policies and execute a task in new situations.

RL can also be applied to sequence multiple motion primitives and perform more complex tasks; a successful strategy when the robot has to perform, for instance, a manipulation task (Section 4.1).
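The episodic policy-improvement loop underlying methods like PoWER can be sketched, in heavily simplified form, as a reward-weighted update of the DMP weights: perturb the weights, roll out, and average the perturbations weighted by their returns. This is a generic sketch, not the exact PoWER update; all names are illustrative:

```python
import numpy as np

def reward_weighted_update(w, sigma, rollout_reward, n_rollouts=20, rng=None):
    """One iteration of a PoWER-style, reward-weighted update of the
    DMP weight vector w (simplified sketch).

    rollout_reward: callable mapping a weight vector to the scalar
    return of one rollout."""
    rng = np.random.default_rng() if rng is None else rng
    # Gaussian exploration in parameter space.
    eps = rng.normal(0.0, sigma, size=(n_rollouts, w.size))
    # Return of each perturbed policy.
    R = np.array([rollout_reward(w + e) for e in eps])
    R = R - R.min()                       # shift so weights are non-negative
    # Average the perturbations, weighted by the (shifted) returns.
    return w + (R[:, None] * eps).sum(axis=0) / (R.sum() + 1e-12)
```

Iterating this update climbs toward weight vectors with higher return without needing gradients of the reward, which is why such black-box updates pair naturally with DMP-parameterized policies.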
To sequence multiple primitives, it is also important to learn the goal of each motion. Tamosiunaite et al. (2011) used continuous value function approximation to optimize the goal parameters of a DMP used to perform a pouring task. Kober et al. (2011, 2012) learned a meta-parameter function that maps the current state to a set of meta-parameters including the goal and duration of the movement. Instead of separating shape and goal learning into different processes, Stulp et al. (2011, 2012b) extended PI² to simultaneously learn the shape and goal of a sequence of DMPs.

Learned skills can potentially be transferred across different tasks to speed up the learning process and increase robot autonomy. To this end, Fabisch and Metzen (2014) considered the case where the robot can actively choose which task to learn to make the best progress in learning. The process of actively selecting the task was treated as a non-stationary bandit problem, for which suitable algorithmic solutions exist, while intrinsic motivation heuristics were exploited to reward the agent after the selection. Cho et al. (2019) defined the complexity of a motor skill based on the temporal and spatial entropy of multiple demonstrations and used the measured complexity to generate an order for learning and transferring motor skills. Their experimental findings provided useful guidelines for skill learning and transfer. In short, humans should demonstrate, when possible, the most complex task, from which the robot is then able to transfer the motor skills. Vice versa, if demonstrations are not given, it is more effective to start learning simple skills first and then transfer the simpler skills to more complex tasks.

RL often lacks scalability to high-dimensional continuous state and action spaces. To remedy this issue, hierarchical RL exploits a divide et impera approach by decomposing an RL problem into a hierarchy of sub-tasks in order to reduce the search space.
Different levels in the hierarchy represent information at different temporal and/or spatial scales. Stulp and Schaal (2011) proposed to represent the different options as DMPs to sequence. PI² was extended to optimize the shape and (sub-)goal of each DMP at different levels of temporal abstraction. In particular, the shape was adjusted based on the cost up to the next primitive in the sequence, while the sub-goal considers the cost of the entire sequence of two DMPs. The layered direct policy search in (End et al. 2017) did not rely on a set of predefined sub-policies and/or sub-goals, but instead used information-theoretic principles to uncover a set of diverse sub-policies and sub-goals.

Reducing the number of rollouts required to discover optimal policies is also important in Hierarchical RL (HRL). As already mentioned, IL is a valuable option to find good initial policies. However, there are applications, like manipulation with multi-fingered robotic hands, for which it is hard or impossible to provide expert demonstrations. To make policy search more efficient, Ojer De Andres et al. (2018) used HRL where the upper level considers discrete action and state spaces to search for optimal finger gaiting and synchronization among the fingers. This information was passed to the lower level, where rhythmic DMPs and PI² generated continuous commands for the fingers. Another possibility to increase data efficiency is to use model-based approaches for RL. Colomé et al. (2015) exploited a friction model to improve a DMP policy and manipulate soft tissues (a scarf). A model-based HRL approach was proposed by Kupcsik et al. (2017) for data-efficient learning of upper-level policies that generalize well across different execution contexts. Finally, Li et al. (2018) proposed a hybrid hierarchical framework where the higher level computes optimal plans in Cartesian space and converts them to desired joint targets using an efficient solver.
The lower level is then responsible for learning joint space trajectories under uncertainties using RL and DMPs.

A popular class of machine learning methods are NNs, which can effectively represent nonlinear mappings. A major drawback of NNs in the past was the computational complexity of learning. In recent years, there has been a renewed interest in NNs, and new deep learning approaches have been successfully applied in machine vision and language processing (LeCun et al. 2015).

Deep learning has also been applied in robotics to learn task dynamics (Yang et al. 2016) and movement dimensionality reduction (Chen et al. 2015). The authors (Chen et al. 2015, 2016) introduced a framework called AutoEncoded DMP (AEDMP) which uses deep autoencoders to find a latent feature-space representation of movements. In this space, DMPs can be effectively generalized to new tasks, and the architecture enables the DMPs to be trained as a unit. Pervez et al. (2017b) coupled visual perception data for object classification with task-specific movements represented with DMPs. The data was modeled with Convolutional Neural Networks (CNNs), where the images and the associated movements were directly processed by the deep NN, thus preserving the properties of the associated DMPs and eliminating the need for extracting the task parameters during motion reproduction. Later on, Kim et al. (2018b) combined deep RL with DMPs to learn and generalize robotic skills from demonstration. The framework builds on an RL approach to learn and optimize a new DMP skill based on a demonstration. The RL approach is backed up with a hierarchical search strategy, reducing the search space for the robot, which allows for more efficient learning of complex tasks. Furthermore, Pan and Manocha (2018) presented a deep learning approach for motion planning of high-dimensional deformable robots in complex environments.
The locomotion skills are encoded with DMPs, and an NN is trained for obstacle avoidance and navigation. The data is further optimized with deep Q-learning, showing that the learned planner can efficiently plan and execute navigation tasks for high-dimensional robots in real time.

Pahič et al. (2018) proposed a deep learning approach for perception–action couplings, demonstrating the coupling between vision-based images and associated movement trajectories. Later on, they extended the approach to incorporate CNNs and a distinguishing loss formulation (Pahič et al. 2020), which measures the physical distance between the movement trajectories as opposed to the distance between the DMP parameters, which have no physical meaning, leading to better performance of the algorithm. Recently, they extended the usage of GPR to create a database needed to train autoencoder NNs for dimensionality reduction (Lončarević et al. 2021).

Figure 15. General framework of the lifelong/incremental learning approach.

Lifelong (incremental) learning is a framework which provides continuous learning of tasks arriving sequentially (Thrun 1996; Chen and Liu 2018; Fei et al. 2016). The essential component of this framework is a database which maintains the knowledge acquired from previously learned tasks TSK_1, TSK_2, ..., TSK_{N-1}. Incremental learning starts from the task manager assigning a new task TSK_N to a learning agent. In this case, the agent exploits the knowledge in the database as prior data for enhancing the generalization performance of its model on the new task. After the new task TSK_N is learned, the database is updated with the knowledge obtained from learning TSK_N.
In fact, the incremental learning framework provides an agent with three capabilities: (i) continuous learning, (ii) knowledge accumulation, and (iii) re-use of previous knowledge for future learning enhancements. Figure 15 shows the general structure of DMPs integrated into a lifelong learning framework.

Churchill and Fernando (2014) proposed a cognitive architecture capable of accumulating adaptations and skills over multiple tasks in a manner which allows recombination and re-use of task-specific competences. Lemme et al. (2014) segmented demonstrations based on geometric similarities, and subsequently created a motion primitive library. The library is updated by removing unused skills and including new ones. The parameterized skill memory of Reinhart and Steil (2014, 2015), which connects a low-dimensional skill parameterization to motion primitive parameters and leverages the embedding for efficient policy search, also fits this incremental setting. A piece-wise linear phase is used to improve incremental learning performance in (Samant et al. 2016). Duminy et al. (2017) designed a framework that learns which data collection strategy is most efficient for acquiring motor skills to achieve multiple outcomes, and generalizes over its experience to achieve new outcomes for cumulative learning. A generative learning framework was proposed to augment the robot's knowledge base with missing information at different levels of the cognitive architecture, including symbolic planning as well as object and action properties (Wörgötter et al. 2015).

Wang et al. (2016) proposed a modified formulation of DMPs, called DMP+, which is capable of efficiently modifying learned trajectories, improving the usability of existing primitives and reducing user fatigue during IL. Later, DMP+

Figure 16. Human operators teach the robot how to perform different tasks.
Left scenarios use the robots' gravity compensation mode to enable kinesthetic guiding, where a human operator guides the robot's tool center point along the desired trajectory in such a way that the desired task is successfully executed (Sloth et al. 2020; Abu-Dakka et al. 2015a, 2018; Caccavale et al. 2019). Right scenarios use a teleoperation system to demonstrate appropriate robot movements either through a haptic interface (Peternel et al. 2018a) or magnetic trackers (Abu-Dakka et al. 2015a).

had been integrated into a dialogue system with speech and ontology to learn or re-learn a task using natural interaction modalities (Wu et al. 2018).

In the literature, it has been shown that incremental learning provides better generalization than isolated learning approaches in terms of interpolation, extrapolation, and the speed of learning (Hazara and Kyrki 2017). Hazara and Kyrki (2018) improved their Global Parametric Dynamic Movement Primitive (GPDMP) (Lundell et al. 2017) in order to incrementally construct a database of motion primitives, which aims to improve the generalization to new tasks. Furthermore, skills have been transferred incrementally from simulation to the real world (Hazara and Kyrki 2019). Moreover, the authors endowed incremental learning with a task manager capable of selecting a new task by maximizing future learning while considering the current task performance (Hazara et al. 2019).

We categorize the applications into several subsections based on different topics. We first separate the use of DMPs for robot interaction with the passive environment (e.g., tools, objects, surfaces, etc.) and for interaction with an agent that involves co-manipulation (e.g., human, another robot, etc.). Additionally, we examine several other major application areas, such as human body augmentation/rehabilitation with exoskeletons, teleoperation, motion analysis/recognition, high-DoF robots, and autonomous driving and field robotics.
Most of the daily tasks that robots perform involve some kind of physical interaction with the environment that requires control of forces or positions. Nevertheless, simultaneous control of force and position along the same axis is not possible (Stramigioli 2001)§, and therefore control approaches have to make a compromise between prioritizing position control or force control (Schindlbeck and Haddadin 2015). The key to such control is for the robot to learn appropriate force or position reference trajectories that can lead to the desired task performance in interaction with the environment.

A common approach to teaching robot motion trajectories is kinesthetic guidance (Fig. 16-Left), where the human operator holds the robot arm and shows the appropriate movements to be encoded by DMPs (Kormushev et al. 2011; Abu-Dakka et al. 2015a; Joshi et al. 2017; Papageorgiou et al. 2020a,b). Recently, the technology has been making inroads into high-risk fields such as invasive surgery, where high-dimensional fine human-like manipulation skills are being demonstrated (Su et al. 2021) and executed with robots (Su et al. 2020; Ginesi et al. 2019). In (Kormushev et al. 2011), the human held the robot arm and used kinesthetic guidance to teach the position and orientation trajectories necessary to perform ironing and door opening tasks.

§ There is a duality in impedance-admittance, i.e., force produces motion and motion produces force; therefore, if one is the input, the other can only be the output of the control system (Peternel et al. 2017a).

Figure 17. Using DMPs for adapting to changing surfaces (e.g., wiping task) (Kramberger et al. 2018).

In the second stage, the corresponding forces and torques were recorded with a haptic device in a teleoperation setup. For setups where the robot arm is equipped with multiple force/torque sensors, the two demonstration steps with additional control policies can be combined into one (Steinmetz et al.
2015; Montebelli et al. 2015).

An alternative to learning force trajectories is to learn the impedance of the robot via desired stiffness trajectories. The ability to change the impedance of the arm is crucial to simplify physical interaction in unpredictable and unstructured environments (Hogan 1984; Burdet et al. 2001). In (Peternel et al. 2015, 2018a), teleoperation was used with a push-button interface to command the robot impedance, which was learned by DMPs that enabled the robot to perform various collaborative assembly tasks. For example, the learned position and stiffness DMPs were used to insert a peg in a groove to bind the two parts (Peternel et al. 2015), or to screw a bolt (Peternel et al. 2018a). A similar approach was used in (Yang et al. 2018) to learn DMPs for a vegetable cutting task. While teleoperation-based methods are very effective to teach the robot DMPs for interaction tasks, they usually involve a complex and expensive system. The method in (Abu-Dakka et al. 2018) enabled the robot to learn stiffness profiles through measurement of the interaction force with the environment to perform a valve-turning task. The method in Peternel et al. (2017a) used human demonstration and EMG to learn stiffness DMPs from human muscle activity measurements in order to perform sawing and wiping (Fig. 17) tasks.

Nevertheless, adaptation of a single trajectory is unlikely to generate an appropriate solution for more general cases, where the task execution needs to change significantly. After learning the initial DMP motion trajectories through kinesthetic guidance, the robot can then adapt them based on the measured interaction force while performing the task. Pastor et al. (2011) introduced a method for real-time adaptation of demonstrated DMP trajectories depending on the measured sensory data. They developed an adaptive regulator for trajectory adaptation based on estimated and actual force data. Recently, Prakash et al.
(2020) extended the real-time adaptation approach by incorporating a fuzzy fractional-order sliding mode controller in order to efficiently and stably adapt the demonstrated DMP trajectory to fast movements, such as a ping pong swing.

Figure 18. An example of using DMPs in assembly tasks (e.g., peg-in-the-hole) (Kramberger et al. 2016b).

Sutanto et al. (2018) presented a data-driven framework for learning a feedback model from demonstrations. They used a radial basis function neural network (RBF-NN) to represent the feedback model for the movement primitive. Similarly, Gams et al. (2010) proposed a method for adaptation of demonstrated movements depending on the desired force with which the robot should act on the environment. Thus, they ensured the adaptation of the learned movements to different surfaces. This approach was later expanded (Pastor et al. 2011) to provide the statistically most likely force-torque profile (Pastor et al. 2012); furthermore, force-torque data was used for training a classifier (Straizys et al. 2020) in order to modulate the demonstrated trajectory for use with delicate tasks such as tissue or fruit cutting. Moving onward from policy learning, Do et al. (2014) presented an adaptation framework where not only the desired adaptation force or trajectory, but the entire skill can be learned. They demonstrated the method with a wiping task under different environmental conditions.

Assembly presents one of the more challenging tasks to automate, where not only position trajectories but also task dynamics have to be taken into account. To deal with this challenge, various methods were proposed. Abu-Dakka et al. (2015a) proposed a method that can learn the orientation aspect of complex physical interaction, like the peg-in-the-hole assembly task (Fig. 18). The proposed method was integrated in an industrial assembly framework where the key challenge was to adapt to uncertainties presented by the assembly task (Krüger et al. 2014; Abu-Dakka et al.
2014). Complex assembly tasks that are subject to change cannot be demonstrated and executed on the fly; therefore, adaptation methods are required for ensuring a successful execution. Nemec et al. (2020) used exception strategies for dealing with complex assembly cases. Sloth et al. (2020) presented an exception strategy framework, combining discrete and periodic DMPs, coupled with force control to learn an assembly task under tight tolerances. Gašpar et al. (2020) presented several industrial assembly challenges and focused on fast and efficient setup of industrial tasks with the emphasis on LfD. Angelov et al. (2020) incorporated several different control policies by taking into account the dynamics and sequencing of the task.

In some cases, active exploration and autonomous database expansion can be used for learning assembly policies automatically. In (Petric et al. 2015), the proposed algorithm can build and combine CMP motion knowledge from a database in an autonomous manner.

Complementary to assembly tasks, disassembly is also challenging when solely using the demonstrated trajectories. As described in (Ijspeert et al. 2013), DMPs have a unique point attractor at the specified goal of the movement, essentially precluding reversibility. Therefore, Nemec et al. (2018) proposed a framework where the disassembly challenge was tackled by learning two separate DMPs from a single demonstrated motion: one forwards and one backwards. San Juan et al. (2019) took the idea further and reformulated the DMP phase system with a logistic differential equation to obtain two stable point attractors. This approach provided a reversible formulation of the dynamical system, and the effectiveness of the algorithm was demonstrated on a peg-in-hole assembly task.

Desired force-torque profiles can be tracked using ILC (Gams et al. 2014, 2015b).
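The core of such an ILC scheme is a trial-to-trial update of a feedforward command based on the tracking error of the previous repetition. The following is a minimal sketch under strong simplifying assumptions: the contact force is modeled as a hypothetical static gain on the command, and the learning gain is an illustrative value, not one from the cited works:

```python
import numpy as np

# Assumed toy environment: the measured contact force responds to the
# commanded offset u through an unknown static gain.
TRUE_GAIN = 2.0
def measured_force(u):
    return TRUE_GAIN * u

t = np.linspace(0.0, 1.0, 100)
f_des = np.sin(np.pi * t)    # desired force profile over one repetition
u = np.zeros_like(t)         # feedforward command, refined trial by trial
gamma = 0.3                  # learning gain; convergence needs |1 - gamma*G| < 1

for trial in range(30):
    e = f_des - measured_force(u)   # tracking error of this repetition
    u = u + gamma * e               # ILC update carried over to the next trial
```

Here the error contracts by a factor |1 - gamma*G| per trial, which also illustrates why careful tuning of the learning gain matters: too large a gain makes the iteration diverge.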
In repetitive robotic tasks, iterative learning has been gaining increased popularity (Bristow et al. 2006) due to its effectiveness and robustness. However, in order to achieve effective results, careful tuning of the learning parameters is required. Norrlöf (1991) and Tayebi (2004) presented adaptive learning approaches for automated tuning of the learning parameters. Another approach is to use RL to adapt DMPs. For example, in (Buchli et al. 2011b,a), stiffness parameters were adjusted during the task execution by RL.

Alternatives to feedback-based adaptation of DMPs and RL are scalability and generalization approaches. Matsubara et al. (2011) proposed an algorithm for the generation of new control policies from existing knowledge, thereby achieving an extended scalability of DMPs, while a mixture of motor primitives was used for the generation of table tennis swings (Mülling et al. 2010). On the other hand, generalization of DMPs was combined with model predictive control by Krug and Dimitrov (2015), or applied to DMP coupling terms by Gams et al. (2015a), which were learned and later added to a demonstrated trajectory to generate new joint space trajectories. Stulp et al. (2013) proposed to learn a function approximator with one regression in the full space of phase and task parameters, bypassing the need for two consecutive regressions. Forte et al. (2012) performed a comparison study of LWR and GPR for trajectory generalization. This work shows that higher accuracy can be achieved with LWR trajectory approximation. Koropouli et al. (2015) presented a generalization approach for force control policies. By learning both the policy and the policy difference data using LWR, they could estimate the policy at new inputs through superposition of the training data. Deniša et al. (2016a) used GPR-based generalization over combined joint position trajectories and torque commands in the framework of CMPs. To showcase the versatility of the approach, Petric et al. (2018) applied it to robot-based assembly tasks.
Finally, Kramberger et al. (2017) extended the approach to account for variations of the desired tasks, e.g., assembly of similar objects. This enables the robot movements to be automatically generated with the use of LWR from a demonstrated database of successful task executions, which includes kinematic and dynamic demonstrated trajectories encoded with DMPs. The newly obtained data is used to account for the changes in the workspace. Nevertheless, a major problem in statistical learning is how to efficiently deal with singularity-free representations of orientation trajectories. To resolve this issue, Kramberger et al. (2016a) proposed a formulation for Cartesian space DMPs where orientations are represented with unit quaternions.

Figure 19. An example of using DMPs for collaborative human-robot sawing from (Peternel et al. 2018b).

While control of robot interaction with the passive environment can solve the majority of tasks, in some cases the robot needs to interact with an active agent (e.g., human, another robot, etc.). Human-robot collaboration is becoming one of the key fields in robotics (Ajoudani et al. 2018). To perform successful physical human-robot collaboration, the robot must be able to control complex movements in coordination with the human partner. In this direction, the ability to modulate the impedance is important to coordinate the physical interaction during human-robot co-manipulation of tools (Peternel et al. 2017b). DMPs offer an elegant solution to encode such coordinated dynamic movements. In (Peternel et al. 2014), the collaborative robot was taught online through teleoperation how to perform collaborative sawing with a human co-worker. The impedance was commanded to the robot through muscle activity measurement using EMG. DMPs were used to encode coordinated phase-dependent motion and impedance as demonstrated by the human teleoperator.
Teaching through teleoperation is an effective way to convey the physical interaction skill to the collaborative robot; however, the setup can be expensive and is not widely available. An intuitive alternative to teleoperation is for the robot to learn the skill directly through physical interaction with the human partner while they are collaborating. Numerous methods have focused on learning the synchronized motion between collaborative partners (Kulvicius et al. 2013; Prada et al. 2013; Gams et al. 2014; Umlauft et al. 2014; Zhou et al. 2016a; Peternel et al. 2018b; Sidiropoulos et al. 2019; Ugur and Girgin 2020). For example, in (Kulvicius et al. 2013) [...] Interaction Primitives that can account for the probabilistic nature of collaborative movements. Rather than having a single value of weights, the DMP includes weight distributions. This distribution enabled the robot to learn the inherent correlations of cooperative actions and infer the behavior of the human partner during the cooperation. Cui et al. (2016, 2019) used visual information to extract context-related parameters that augment the interaction primitives to increase the robustness during the task execution.

There are also other types of co-manipulation scenarios, such as within-hand bi-manipulation or human-robot object handover. For example, in (Koene et al. 2014; Gao et al. 2019), DMPs were used to perform bi-manipulation, while in (Prada et al. 2014; Solak and Jamone 2019; Lafleche et al. 2019; Abdelrahman et al. 2020), DMPs were used for human-robot object handover.

When the environment is hazardous for human workers, or when there are too many robots compared to the number of human workers, the obvious solution is to make robots collaborate among themselves. The method in (Peternel and Ajoudani 2017) used DMPs to make novice robots learn from an expert robot through co-manipulation. Initially, the novice robot remained compliant to let the expert robot lead the task execution.
In the first stage, the novice robot learned the reference motion through DMPs. In the second stage, it became stiff to perform the newly learned motion, while the expert robot initiated the stiff/compliant phases expected in the collaborative task execution. Finally, the novice robot learned in which phases of the task to increase or decrease the impedance, and encoded this impedance behavior with DMPs.

The most common type of co-manipulation is the classic human-robot collaboration, where a human and a robotic agent physically perform industrial or daily tasks together. Another type of co-manipulation occurs when a human is wearing an exoskeleton. In most cases, the exoskeleton simply amplifies the current human motion (Kong and Jeon [...]

Figure 20. An example of using DMPs for teaching passive exercises for ankle rehabilitation (Abu-Dakka et al. 2015b, 2020).

[...] Left). The phase-dependent torque trajectory was updated online in order to minimise the muscle activity feedback measured by EMG. In (Petrič et al. 2016), the robot encoded the assistive motion with DMPs and then adapted it by taking into account aspects of human motor control through Fitts' law.

Gait-related rehabilitation with exoskeletons is a very common application of DMPs, and there are numerous examples (Abu-Dakka et al. 2015b; Huang et al. 2016a,b; Hwang et al. 2019; Yuan et al. 2020; Amatya et al. 2020). In (Abu-Dakka et al. 2015b, 2020), a parallel robot was used for ankle rehabilitation, where the movements were generated by DMPs (Fig. 20). In (Huang et al. 2016a), DMPs were used to learn the gait motion trajectories for a lower-body exoskeleton. This approach was then extended with an RL method to adapt a force coupling term (similar to earlier approaches presented in Section 3.3.2) to enable online adaption of motion trajectories (Huang et al. 2016b). Besides normal gait, DMPs were also applied for stair-ascent (Xu et al. 2020) and sit-to-stand (Kamali et al. 2016) assistive movements of lower-body exoskeletons. In (Joshi et al.
2019), a robotic arm was used to assist humans with putting clothes on their body, where the movements were generated by DMPs. Besides assistive body movement and rehabilitation, DMPs were also applied for relaxation purposes. For example, in (Li et al. 2020), a robotic arm provided massage movements through DMPs.

Figure 21. The left photo shows the arm exoskeleton application from (Peternel et al. 2016a). The right photo shows the high-DoF humanoid robot Walk-man (Tsagarakis et al. 2017) performing sawing in (Peternel and Ajoudani 2017).

Teleoperation is one of the major fields of robotics and enables a human to have direct and real-time control over a (remote) robot. Typically, the control is done through interfaces that can capture the human commands to be sent to the robot and that can provide haptic feedback from the robot. While teleoperation focuses on giving the human operator full or shared control over the robot, DMPs are used to encode autonomous robot behaviors. Therefore, here we mostly examine cases where teleoperation is used to teach the robot new autonomous behaviors encoded by DMPs.

In (Kormushev et al. 2011), a combination of kinesthetic teaching and teleoperation was employed to form the DMP-based robot skill for ironing. After the motion trajectories were learned through kinesthetic guidance, the corresponding forces were recorded by using a haptic device and a teleoperation system. In (Peternel et al. 2014), teleoperation was used to teach the robot how to physically collaborate with another human. Since there was no haptic feedback, the teleoperation setup was unilateral, but the human was able to teach the impedance of the robot in addition to the motion. The former was commanded by muscle activity measurement through EMG, while the latter was commanded by the movement of the human operator's arm as measured by an optical motion capture system. In (Peternel et al.
2018a), the human operator taught the robot through teleoperation how to perform autonomous assembly actions (Fig. 16-Right). DMPs were used to encode the commanded impedance and motion; however, a more practical push-button based impedance command interface was employed. More importantly, the teleoperation setup was bilateral, and the haptic interface provided the human operator with feedback about the forces the robot felt. Similar teleoperation approaches were used in (Yang et al. 2018; Lentini et al. 2020).

A real robot is not always necessary to acquire new skills. In (Beik-Mohammadi et al. 2020), the robot and the environment were simulated and the human operator used a virtual reality system. A combination of DMPs and RL was used to form an adaptive skill. The scenario proposed in (Abu-Dakka et al. 2015a) was teleoperation in its basis; however, the human demonstrator did not just pretend that he/she was embodied in the robot, but the robot task environment was cloned at the human side (Fig. 16-Right). This removed the need for force feedback and a haptic device, since the human felt the real environment on his/her side, while the motion was captured by a non-contact sensory system (i.e., magnetic trackers) and then mirrored on the robot. Multiple demonstrations through teleoperation can be inconsistent, especially if done in a multi-agent shared-control setting. The method proposed in (Pervez et al. 2019) can synchronize inconsistent demonstrations through shared-control teleoperation and encode them with DMPs.

DMPs provide an elegant and fast way to deal with high-dimensional systems by sharing one canonical system (3) among all DoFs and maintaining only a separate set of transformation systems. By high-dimensional we are referring to systems with 10 or more DoFs (e.g., the Walk-man humanoid robot in Figure 21-Right). In this section, we briefly mention some notable works with a high number of DoFs. Ijspeert et al.
(2002b,a) used DMPs in an IL framework to learn a tennis forehand, a tennis backhand, and rhythmic drumming using a 30-DoF humanoid robot. Pastor et al. (2009) used DMPs to encode the motion of a 10-DoF exoskeleton robot arm. Luo et al. (2015) integrated DMPs with stochastic policy gradient RL and GPR in order to design an online adaptive push recovery control strategy. The approach was applied to the PKU-HR5 humanoid robot with 20 DoFs. André et al. (2015, 2016) implemented a predictive model of sensor traces that enables early failure detection for humanoids, based on an associative skill memory applied to periodic movements and DMPs. They applied their algorithm on a DARwIn-OP with 20 DoFs in simulation. Pfeiffer and Angulo (2015) represented gestures by applying DMPs on the REEM robotic platform with 23 DoFs. Nah et al. (2020) proposed an approach to optimize DMP parameters in order to deal with the complexity of a high-DoF system like a whip. They tested their approach in simulation on 10-, 15-, 20-, and 25-DoF systems. In order to reduce the number of rollouts required for adaptation to new task conditions, Queißer and Steil (2018) used CMA-ES to optimize DMP parameters. In addition, they introduced a hybrid optimization method that combines a fast coarse optimization on a manifold of policy parameters with a fine-grained parameter search in the unrestricted space of actions. The approach was successfully illustrated in simulation using a 10-DoF robot arm. Liu et al. (2020) proposed DMP-based trajectory generation to enable a full-body humanoid robot with 10 DoFs (for the two legs) to realize adaptive walking. Travers et al. (2016, 2018) proposed a framework that integrates DMPs with Gaussian-shaped spatial activation windows in order to plan the motion of high-DoF robotic systems (e.g., snake-like robots) in complex environments (with obstacles) by linking low-level controllers to high-level planners.
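To make the shared-canonical-system structure concrete, the following is a minimal sketch of a discrete multi-DoF DMP in which a single phase variable drives one transformation system per DoF. The gains, basis-function placement, and integration scheme are illustrative choices, not those of any cited implementation:

```python
import numpy as np

def dmp_rollout(y0, g, w, tau=1.0, alpha=25.0, beta=6.25, alpha_x=3.0, dt=0.002):
    """Integrate n_dof transformation systems driven by ONE canonical system.

    y0, g : (n_dof,) start and goal positions; w : (n_dof, n_bf) shape parameters.
    """
    n_dof, n_bf = w.shape
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_bf))   # basis centers along the phase
    h = 1.0 / np.diff(c, append=c[-1] * 0.5) ** 2        # basis widths from center spacing
    y, z, x = y0.astype(float).copy(), np.zeros(n_dof), 1.0
    path = [y.copy()]
    for _ in range(int(tau / dt)):
        psi = np.exp(-h * (x - c) ** 2)                  # activations, shared by all DoFs
        f = (w @ psi) / psi.sum() * x * (g - y0)         # forcing term, one value per DoF
        dz = (alpha * (beta * (g - y) - z) + f) / tau    # transformation systems
        dy = z / tau
        dx = -alpha_x * x / tau                          # the single canonical system
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt  # explicit Euler step
        path.append(y.copy())
    return np.array(path)
```

Because the phase `x` is computed once per step and reused by every DoF, adding DoFs only adds rows to `w`; the forcing term vanishes as `x` decays, so every DoF converges to its goal regardless of the learned shape.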
DMPs tend to fit topologically similar trajectories with similar shape parameters w_i (Ijspeert et al. 2013). This behavior, due to the temporal and spatial invariance of DMPs, makes the shape parameters a useful descriptor for recognizing similar motions. Indeed, Strachan et al. (2004) have shown that the shape parameters computed for repetitions of classes of discrete hand gestures, measured with an accelerometer, are linearly separable, i.e., easy to classify. Lantz and Murray-Smith (2004) draw similar conclusions for classes of periodic hand gestures. Xu et al. (2005) used the correlation between the parameter vectors of two DMPs to measure the similarity between motions and recognize gait patterns. Similarly, Ijspeert et al. (2013) used the correlation between parameter vectors to recognize the letters of the Graffiti alphabet. The shape parameters w_i are also suitable to fit more sophisticated classifiers like support vector machines. This strategy was used to successfully classify gestures observed with a monocular (Liu et al. 2014) or a binocular (Wang and Payandeh 2015) camera. Instead of considering a fixed number of basis functions (number of shape parameters), Zhang et al. (2017) used fast dynamic time warping (Salvador and Chan 2007) to align parameter vectors of different lengths and then used K-nearest neighbors to classify different motions.

Motion recognition can also be used to determine whether the robot is correctly executing a task by comparing sensed data with a movement template. In this respect, André et al. (2016) used an associative skill memory, like the one in (Pastor et al. 2011), as a predictive model of sensor traces that enables early failure detection. In this work, DMPs were used to compactly encode the associative skill memory and speed up the failure detection.
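The correlation-based recognition scheme described above reduces to comparing fitted weight vectors. A minimal sketch, assuming the weight vectors have already been fitted (e.g., with LWR) and share the same number of basis functions:

```python
import numpy as np

def dmp_similarity(w_a, w_b):
    """Correlation-based similarity between two DMP shape-parameter vectors:
    the cosine of the angle between them (1.0 means identical shape)."""
    w_a, w_b = np.asarray(w_a, float), np.asarray(w_b, float)
    return float(w_a @ w_b / (np.linalg.norm(w_a) * np.linalg.norm(w_b)))

def recognize(w_query, templates):
    """Nearest-template classification: return the label of the template
    whose weight vector correlates best with the query motion's weights."""
    return max(templates, key=lambda label: dmp_similarity(w_query, templates[label]))
```

Because the weights are invariant to temporal and spatial scaling of the demonstration, this comparison is cheap enough for systems with limited computational power.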
The described approaches demonstrate that DMPs are a valuable option for gesture recognition, especially for systems with limited computational power.

Humans tend to perform the same task in slightly different manners. Sometimes differences in the execution style contain useful information to adapt the motion to a different execution context. This is the case, for instance, of a reaching motion with and without an obstacle on the way. To capture the execution style, Matsubara et al. (2010) augmented the forcing term of the DMP with a style parameter learned from multiple demonstrations. At run time, different style parameters can be used to smoothly interpolate between demonstrated behaviors. Zhao et al. (2014) also employed movements with different styles, but additionally learned a smooth mapping between style parameters and goal to improve the generalization.

When humans provide seamless demonstrations, DMPs can be used for online segmentation and recognition. To this end, Meier et al. (2011) assumed that a library of DMPs is given and used it to recognize motion segments during a task demonstration. Instead of using exemplar templates for each class of primitives, Chang and Kulić (2013) segmented a video stream using motion to non-motion transitions, fitted DMPs on the segmented data, and performed clustering to group similar motion segments in an unsupervised fashion. Song et al. (2020) performed unsupervised trajectory segmentation using the concept of key points, i.e., shared features across different task demonstrations. Mandery et al. (2016) segmented whole-body motions by detecting contacts with the environment and used them to build a probabilistic language model where words represent poses and sentences represent sequences of poses. The learned language model was used to plan whole-body motion trajectories executed by joining multiple DMPs (see Section 3.2).

DMPs have been developed as a computational model of the neurobiological motor primitives (Schaal et al.
2007). Experimental findings from neurophysiology related to the spinal force fields in frogs have inspired the modification of the DMP formulation in (Hoffmann et al. 2009). As discussed in Section 3.1.1, this multidimensional representation overcomes limitations of classical DMPs like trajectory overshooting and dependence of the trajectory on the reference frame used to describe the motion. Hoffmann et al. (2009) also derived a collision avoidance strategy for DMPs, inspired by the way humans avoid collisions during arm motion. DeWolf et al. (2016) investigated the human ability to cope with changes in the arm dynamics and kinematic structure during motion control. They proposed a spiking neuron model of the motor control system that uses DMPs to implement the preparation and planning functionalities of the premotor cortex. The effects of changes in the robot's dynamic parameters on the tracking performance of a DMP trajectory were studied in (Kuppuswamy and Alessandro 2011). Their findings suggest that the change in the body parameters should be explicitly considered in the DMP learning process. Hotson et al. (2016) augmented a brain-machine interface that captures neural signals with a DMP model of the endpoint trajectories executed by a non-human primate. The system was used to decode real trajectories from a primate manipulating four different objects.

DMPs can be utilized in various autonomous, non-stationary fields of robotics. Perk and Slotine (2006) utilized DMPs for defining flight paths and obstacle avoidance for Unmanned Aerial Vehicles (UAVs), where the trajectories were generated based on the joystick movements controlling the throttle of the UAV motors. Later, Fang et al. (2014) extended the approach to encode user-demonstrated UAV data, extracting and encoding the rhythmic and linear segments of the flight trajectory, and combining them into a flight control skill. Furthermore, Tomić et al. (2014) formulated the UAV movements as an optimal control problem.
The output of the optimal control solver was encoded with DMPs, enabling them to generalize and apply in-flight modifications to the UAV flight trajectories in real time. Similarly, Lee et al. (2018); Kim et al. (2018a) presented a framework for UAV cooperative aerial manipulation tasks, based on an adaptive controller which adapts the movement of the UAV in relation to the mass and inertial properties of the payload. In addition, DMPs were incorporated in the control scheme to modify the flight trajectories and avoid obstacles on the fly. The approach was later extended to incorporate path optimization, where DMPs play a significant role in real-time obstacle avoidance (Lee et al. 2020).

As mentioned before, DMPs represent a versatile movement representation, which can be implemented in various tasks and scenarios. One of the recent applications in this field is Autonomous Underwater Vehicles (AUVs). Carrera et al. (2015) integrated DMPs in a learning by demonstration scenario for an AUV. The demonstrated data consisted of the manipulator and vehicle sensory outputs, which were efficiently used to demonstrate an underwater valve turning task.

DMPs are also represented in the autonomous driving domain. In the recent work of (Wang et al. 2018, 2019), the authors propose a framework which decomposes complex driving data into a more elementary composition of driving skills represented as motion primitives. In the proposed framework, DMPs are utilized to represent the driver's trajectory with acceptable accuracy and can be generalized to different situations.

This section provides guidelines to choose, among the several approaches discussed in this work, the most appropriate one for a given application. A useful criterion to decide whether to use a particular approach is the availability of code that greatly simplifies the implementation.
We have searched for open-source DMP implementations and listed them in a Git repository (see Section 6.2). To further contribute to the community, we have also released the implementations listed in Table 4. This section ends with a discussion on the limitations inherent to the DMP formulation, the open issues, and the possible research directions. These are summarized in Table 5.

Previous sections present different DMP formulations and extensions together with possible application scenarios. As usual, there is no single formulation that serves all scopes and purposes, and the suitable approach depends on the goal to achieve and the conditions of application. For this reason, we present some guidelines to guide the user in the process of selecting the formulation to use.

For a task with distinct starting and ending points, discrete DMPs are a logical option to encode the movement trajectories between them. Examples of these tasks include reaching and pick-and-place (Stulp et al. 2009; Forte et al. 2012; Deniša et al. 2016a; Caccavale et al. 2019), specific assembly actions (Krüger et al. 2014; Abu-Dakka et al. 2014; Gašpar et al. 2020; Nemec et al. 2020; Angelov et al. 2020), and cutting (Yang et al. 2018; Straizys et al. 2020). When the starting and ending points coincide, periodic DMPs are the logical option, since the encoded movements can be repeated over and over again. Good examples of their application are repetitive tasks such as locomotion (Rückert and d'Avella 2013; M. Wensing and Slotine 2017), human body augmentation/rehabilitation (Peternel et al. 2016a), wiping a surface (Gams et al. 2016; Peternel et al. 2017a; Kramberger et al. 2018), and sawing (Peternel et al. 2018b). Nevertheless, even typically non-repetitive tasks that are executed just once every now and then can still be encoded with periodic DMPs when the starting and ending points coincide (Peternel et al. 2018a).
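The guideline above hinges on the difference between the two canonical systems: a decaying phase for discrete DMPs (the motion terminates) and an ever-growing, wrapped phase for periodic ones (the motion repeats). A minimal sketch, with illustrative parameter values of our own choosing:

```python
import numpy as np

def discrete_phase(alpha_x=4.0, tau=1.0, dt=0.01, T=1.0):
    """Discrete DMPs: the phase x decays from 1 toward 0, suppressing the
    forcing term so the motion converges to the goal and terminates."""
    x, xs = 1.0, []
    for _ in range(int(T / dt)):
        x += (-alpha_x * x / tau) * dt
        xs.append(x)
    return np.array(xs)

def periodic_phase(omega=2 * np.pi, tau=1.0, dt=0.01, T=1.0):
    """Periodic DMPs: the phase phi grows monotonically and is read modulo
    2*pi, so the encoded movement can be repeated indefinitely."""
    phi, phis = 0.0, []
    for _ in range(int(T / dt)):
        phi += (omega / tau) * dt
        phis.append(phi % (2 * np.pi))
    return np.array(phis)
```

Choosing between the two formulations therefore amounts to deciding whether the demonstrated movement should die out at a goal or cycle through the same shape.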
There are cases where it is not possible to clearly distinguish whether the motion is periodic or discrete. For instance, Ernesti et al. (2012) have shown that the first step in the gait of a humanoid robot is a transient towards a periodic motion. Their representation is a good candidate to encode transients converging to limit cycle trajectories. Finally, in some cases, like complex assembly, the task requires a combination of discrete and periodic DMPs (Sloth et al. 2020).

The original formulation of DMPs was and still is successfully applied to multidimensional independent data with each DoF ∈ R (Sections 2.1.1 and 2.2.1). These data can be joint or Cartesian positions, forces, torques, etc., where every DoF of the data can evolve independently from the rest. However, such a formulation is not sufficient to successfully encode data with specific geometric constraints without pre- and/or post-processing the data. Examples of such data are: i) orientations, where the data are subject to additional constraints (i.e., the orthogonality of the rotation matrix representation or the unit norm of the quaternion representation); ii) full stiffness/damping matrices and manipulability matrices, which are encapsulated in SPD matrices.

In many early works, orientation trajectories were learned and adapted without considering their geometric constraints (Pastor et al. 2009), leading to improper orientations and hence requiring an additional re-normalization. In a different example, Umlauft et al. (2017) used eigendecomposition for impedance adaptation. In order to comply with such geometric constraints, researchers provided new formulations of DMPs that ensure proper unit quaternions or rotation matrices over the course of orientation adaptation (Abu-Dakka et al. 2015a; Ude et al. 2014; Saveriano et al. 2019; Koutras and Doulgeri 2020a), and proper SPD matrices over the course of the adaptation of SPD profiles (e.g., stiffness or manipulability ellipsoids) (Abu-Dakka and Kyrki 2020).
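As an illustration of why orientation data need special treatment, the sketch below computes a geometry-aware orientation error through the quaternion logarithmic map, which is the basic operation behind the unit-quaternion formulations cited above. The implementation details are our own simplification (scalar-first convention), not the exact formulas of those papers:

```python
import numpy as np

def quat_log(q):
    """Logarithmic map of a unit quaternion q = (w, x, y, z): returns a 3-D
    rotation vector (angle-axis), the geometry-aware distance from identity."""
    w, v = q[0], np.asarray(q[1:], float)
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return np.zeros(3)
    return 2.0 * np.arccos(np.clip(w, -1.0, 1.0)) * v / nv

def quat_error(g, q):
    """Orientation error between goal quaternion g and current quaternion q,
    computed on the quaternion manifold (no element-wise subtraction, so no
    re-normalization is ever needed)."""
    qc = np.array([q[0], -q[1], -q[2], -q[3]])   # conjugate of q
    # Hamilton product g * conj(q)
    w = g[0]*qc[0] - g[1]*qc[1] - g[2]*qc[2] - g[3]*qc[3]
    x = g[0]*qc[1] + g[1]*qc[0] + g[2]*qc[3] - g[3]*qc[2]
    y = g[0]*qc[2] - g[1]*qc[3] + g[2]*qc[0] + g[3]*qc[1]
    z = g[0]*qc[3] + g[1]*qc[2] - g[2]*qc[1] + g[3]*qc[0]
    return quat_log(np.array([w, x, y, z]))
```

Replacing the Euclidean goal error g − y of the standard transformation system with this manifold error is what keeps the integrated orientation a proper unit quaternion.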
We believe that using these geometry-aware DMPs is preferable for encoding data with underlying geometric constraints.

DMPs represent motion trajectories as stable dynamical systems with learnable weights that define the shape of the motion. In the LfD paradigm, DMP weights are usually learned in a supervised manner using human demonstrations. The procedure used to transform human demonstrations into training data for the DMP forcing term is highlighted in Section 2.1.1.1. Given the training data, different techniques can be used to fit the weights. LWR is widely used when the forcing term is a combination of RBFs as in (4). If multiple demonstrations are given, one can exploit GMM/GMR as in (Pervez et al. 2017a) or GPR as in (Fanger et al. 2016) to represent the forcing term and use expectation-maximization to fit the (hyper-)parameters. Deep NNs, typically trained via back-propagation, seem an appealing possibility to map input images into forcing terms (Pervez et al. 2017b), mimicking the human perception-action loop. Although appealing, the possibility of exploiting deep learning techniques as motion primitives requires further investigation.

Table 4. Open-source implementations of DMP-based approaches that we have released to the community. The source code for each approach is available at https://gitlab.com/dmp-codes-collection

Approach | Author | Language | Description
Discrete DMP | Fares J. Abu-Dakka | C++ | An implementation of discrete DMPs based on the work in (Ude et al. 2010; Abu-Dakka et al. 2015a; Ude et al. 2014).
Periodic DMP | Luka Peternel | Python | An implementation of periodic DMPs based on the work in (Peternel et al. 2016a).
Unit quaternion DMP | Fares J. Abu-Dakka | Matlab and C++ | An implementation of unit quaternion DMPs and goal switching based on the work in (Abu-Dakka et al. 2015a; Ude et al. 2014).
SPD DMP | Fares J. Abu-Dakka | Matlab | An implementation of SPD DMPs and goal switching based on the work in (Abu-Dakka and Kyrki 2020).
Joining DMPs | Matteo Saveriano | Matlab | An implementation for joining multiple DMPs based on the work in (Saveriano et al. 2019).
Coupling-force DMPs | Aljaž Kramberger | Matlab | An implementation of discrete DMPs with force coupling terms based on the work in (Kramberger et al. 2018).

In real applications, there can be a misplacement between the DMP trajectory and the robot motion. Typical examples include assembly or other tasks that require physical interaction with the environment (see Section 5.1). In these situations, the DMP motion can be incrementally adjusted to improve the robot performance. ILC arises as an interesting approach to iteratively update the DMP weights as it ensures rapid convergence to the desired performance (Gams et al. 2014; Abu-Dakka et al. 2015a; Kramberger et al. 2018). However, ILC assumes that a target behavior to reproduce is given. When the target behavior cannot be easily specified and the robot performance is not satisfactory, RL solutions have to be adopted. As detailed in Section 4.3, DMPs are effective control policies and, combined with policy search algorithms like PI2 or PoWER, are able to solve complex and highly dynamic tasks.

Performing robotic tasks in the real world requires adaptation capabilities. When adaptation of DMPs based on some feedback is required, one of the extension methods should be applied. For example, to change an existing movement based on a detected obstacle, the methods in (Park et al. 2008; Hoffmann et al. 2009; Tan et al. 2011; Gams et al. 2016) can be used (see Section 3.3.1). If it is necessary to adaptively learn the movement dynamics based on real-time effort feedback, the method in (Peternel et al.
2016a) can be employed (see Section 3.3). Furthermore, for industrial tasks, such as assembly or polishing, adaptation strategies combining force control with demonstrated trajectories can be applied (Abu-Dakka et al. 2015a; Kramberger et al. 2016a; Gams et al. 2010), ensuring that the system will follow the predefined trajectory and adapt to the environmental uncertainties. For online adaptation, DMPs can be used as a trajectory generator whose output represents an input to the force control algorithm; on the other hand, force feedback can be directly incorporated as a coupling term in the DMP formulation (see Section 3.3.2), eliminating the need for an additional force controller. A similar approach can also be utilized for velocity-based adaptation of the movements (see Section 3.3.4).

In physical interaction tasks, DMPs can be used to learn either force or impedance (Peternel et al. 2017a). If the task requires position control, then the impedance should be learned with DMPs in combination with the reference position. If the task requires controlling a specific force, e.g., pushing on a surface during wiping or drilling, either force or impedance is feasible. However, if safety is the most critical aspect, the DMPs should be used to learn impedance control so that the robot can be made soft. Furthermore, to overcome any undesirable movements, the control policy can be augmented with a tank-based passivity approach (Shahriari et al. 2017). This approach monitors the energy flow between the modeled sub-systems, e.g., DMP trajectory generation, impedance control, and the environment. In the event of an energy violation, the system will first try to passively compensate for the violation and subsequently, if the violation cannot be compensated (e.g., the energy tank is depleted), stop the system. In cases where the task characteristics are not fully known, a learning policy can be added on top of the passivity approach (Kramberger et al. 2018) in order to learn the overall energy requirements for the task.
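The force coupling term mentioned above can be sketched as an extra additive term in the DMP transformation system. Here k_c is a hypothetical coupling gain and the dynamics parameters are illustrative, not taken from a specific formulation:

```python
import numpy as np

def dmp_step_with_force_coupling(y, z, g, f_shape, f_meas, f_des,
                                 k_c=0.5, alpha_z=25.0, beta_z=6.25,
                                 tau=1.0, dt=0.002):
    """One Euler step of a DMP transformation system with a force-feedback
    coupling term: a mismatch between desired and measured contact force
    deforms the generated motion online, so no separate force controller
    is needed. k_c is a hypothetical coupling gain."""
    c = k_c * (f_des - f_meas)                            # coupling term
    dz = (alpha_z * (beta_z * (g - y) - z) + f_shape + c) / tau
    dy = z / tau
    return y + dy * dt, z + dz * dt
```

With matched forces the coupling vanishes and the nominal DMP behavior is recovered; a persistent force error steadily pushes the trajectory until contact conditions are restored.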
The availability of code and datasets is useful to speed up the setup of novel applications without the need to re-implement a promising approach from scratch. We have searched for available DMP implementations and found that several researchers have published their DMP code in various open-source repositories. We decided to list the available implementations in the Git repository that accompanies this paper (https://gitlab.com/dmp-codes-collection/third-party-dmp). For each implementation, we mention the type of DMP, the author, the URL to download the code, and the programming language used. We also provide a short description of the key features. Apart from listing existing approaches, the Git repository that accompanies this paper contains implementations that we decided to release to the community. The list of provided implementations is given in Table 4.

As any motion primitive representation, DMPs have strengths but also inherent limitations. The advantages of DMPs have been widely discussed in previous sections. Here, we present the main limitations of DMPs and discuss open issues that require further investigation. A summary of these limitations is presented in Table 5.

The phase variable used to suppress the non-linear forcing term and ensure convergence to a given goal introduces an implicit time dependency in the DMP formulation. The reason for representing the time dependency implicitly as a dynamical system is that such a phase variable can be conveniently manipulated. For example, in Section 2.1.1.2, we have seen how to manipulate the phase variable to slow down (or even stop) the execution. A drawback of the time dependency is that the shape of the DMP motion is significantly affected by the time evolution of the phase variable. If the phase vanishes too early, the last part of the trajectory is executed with a linear dynamics converging to the goal.
If the phase lasts too long, the trajectory may overshoot and fail to reach the goal within the desired time. In both cases, the DMP motion may significantly deviate from the demonstration. A properly designed phase stopping mechanism can remedy the issue, but the proper phase stopping to adopt depends on the specific application.

In order to overcome this limitation, several authors have focused on learning stable and time-independent (or autonomous) dynamical systems from demonstrations. A globally stable and autonomous system generates a vector field that converges to the given goal from any initial state. Without the need for a phase variable, the generated motion depends only on the current state of the system. Notable approaches to learn stable and autonomous systems exploit Lyapunov theory (Khansari-Zadeh and Billard 2011, 2014), contraction theory (Ravichandar and Dani 2015; Blocher et al. 2017), diffeomorphic transformations (Neumann and Steil 2015; Perrin and Schlehuber-Caissier 2016), and passivity considerations (Kronander and Billard 2015). These approaches have been effectively used to learn complex movements from demonstrations.

In general, autonomous systems have the potential to represent much more complex movements than DMPs. For example, autonomous systems can encode different motions in different regions of the state space. In this respect, DMPs can only generate a stereotypical trajectory connecting the start to the goal, regardless of where the initial state is placed in the state space. However, the stereotypical motion generation is also an advantage of DMPs, since it makes it easier to predict the generated motion in regions of the state space poorly covered by training data. On the contrary, it is hard to predict how an autonomous system generalizes where only few or no training data are available. DMPs are known to scale well in high-dimensional spaces since the learned forcing term always depends on a shared, scalar phase variable.
Autonomous systems perform learning directly in the high-dimensional state space, which poses numerical challenges and requires much more training data. In synthesis, each representation has its own advantages and disadvantages, and the choice between time-dependent and autonomous motion primitives depends on the specific application.

Representing the demonstrated motion as a probability distribution has several advantages. For example, in a probabilistic framework the generalization to a new goal (or a via-point) is achieved by conditioning on the new goal (via-point), while the covariance computed from the probability distribution can represent couplings between different DoFs (Paraschos et al. 2013). As a matter of fact, classical DMPs are deterministic and lack stochastic information about the modelled motion. Ben Amor et al. (2014) proposed an approach to estimate the predictive distribution P(w | y_T) that relates the DMP weights w and a partial trajectory y_T observed for T time instants. P(w | y_T) is used to estimate the most likely weights given a partial movement and to reconstruct the missing part of the trajectory. However, a full probabilistic characterization of DMPs is still missing.

The ProMP framework (Paraschos et al. 2013) proposed an alternative movement primitive representation that contains information about the variability across different demonstrations, as well as different DoFs, in the form of a covariance matrix. This enables one to explicitly encode the couplings between different directions and to increase the generalization by conditioning on a desired goal, via-point, or intermediate velocity. The covariance computed by ProMPs represents the variability and the correlation in the demonstrations. In other representations, like GPR, the covariance is a measure of the model uncertainty due to the lack of training data. Kernelized Movement Primitives (KMPs) (Huang et al. 2019; Silvério et al.
2019) offer the possibility of modelling variability, correlation, and uncertainty in the same framework. However, the computational cost of KMPs can be elevated compared to DMPs for longer trajectories, due to the computation of the inverse of the kernel matrix.

A vast majority of methods employ DMPs only as a reference trajectory generator for the closed-loop controller, which then actually executes it. However, DMPs can also be used as a part of the closed-loop controller itself, and only a few methods have explored this concept. For example, in Peternel et al. (2016a) the DMPs directly generate torques for exoskeleton actuators in the control loop, which is closed by feedback from the human user's muscle activity. Nevertheless, in such a scenario the closed-loop stability and passivity become crucial considerations that have to be addressed and resolved before wide-spread application (Kramberger et al. 2018).

6.3.4 Coping with high-dimensional inputs

One of the main limitations of DMPs is that they encode human and robot trajectories explicitly with time, which can cause synchronization problems when the execution differs (e.g., faster/slower velocity) from the demonstrated one. In order to avoid the synchronization problem, Ben Amor et al. (2014) designed a time-alignment strategy, while Pervez et al. (2017a) estimated the phase signal during training using expectation-maximization (Bishop 2006). As the DMP models trajectories using basis functions, it works effectively when learning time-driven trajectories, but coping with high-dimensional inputs such as raw images is less straightforward. Pervez et al. (2017b) used a CNN to extract low-dimensional task parameters (e.g., the position of a target) from an input image and a fully connected NN to retrieve the forcing term from the 2-D parameters and the phase variable. The CNN and the fully connected NN are trained in two separate stages. The approach is promising, but the separate training of the two networks increases the pre-processing and complicates the learning process. Alternative approaches in the literature, such as GMM/GMR (Calinon 2016), Task-Parameterized GMM (TP-GMM) (Calinon 2016), and KMP (Huang et al.
2019; Huang et al. 2021), can be directly applied for learning demonstrations comprising high-dimensional inputs.

The well-known second-order dynamic properties of DMPs strive towards a single-attractor system (Ijspeert et al. 2002a). The properties, e.g., convergence and modulation of the motion, are well studied, and implementations can be found in many research papers. Because of the second-order dynamics, the system becomes unstable if, for example, the motion is reversed during the execution. In the past years, two main approaches addressing the reversibility problem have been introduced. In the first approach (Nemec et al. 2018), reversibility is considered as learning two separate primitives, one for each direction of the motion. The approach is promising, but does not reflect true reversibility, because it uses one attractor point for each primitive. On the other hand, San Juan et al. (2019) introduced an alternative formulation with two stable attractor systems. The first attractor is defined at the starting point y of the trajectory and the subsequent one at the goal g; the dynamical system between them guarantees a stable convergence depending on the selected attractor. The approach demonstrated true reversibility, while keeping all the DMP properties. Nevertheless, not all questions have been resolved yet: the approach was only evaluated on task- and joint-space position trajectories. A proper formulation for

Table 5. A summary of DMP features and limitations that have been solved (✓) or partially solved (~).

Limitation | Related work | Status
Via-points | (Ning et al. 2011, 2012; Weitschat and Aschemann 2018; Saveriano et al. 2019; Zhou et al. 2019) | ✓
Start-point | (Hoffmann et al. 2009; Ijspeert et al. 2013; Weitschat et al. 2013; Dragan et al. 2015) | ✓
Goal-point | (Ijspeert et al. 2013; Ude et al. 2014; Abu-Dakka and Kyrki 2020; Dragan et al. 2015; Weitschat and Aschemann 2018) | ✓
Obstacle avoidance | (Park et al. 2008; Hoffmann et al. 2009; Tan et al.
2011; Kim et al. 2015; Rai et al. 2017) | ✓
Geometry-constrained data | (Pastor et al. 2009; Abu-Dakka et al. 2015a; Ude et al. 2014; Saveriano et al. 2019; Abu-Dakka and Kyrki 2020) | ~ ¶
Probabilistic | (Ben Amor et al. 2014) | ~
Extrapolation | (Pervez and Lee 2018; Zhou et al. 2019) | ~
High-dim input | (Pervez et al. 2017a; Pahič et al. 2020) | ~
Closed-loop | (Peternel et al. 2016a; Kramberger et al. 2018) | ~
Multi-attractor | (Nemec et al. 2018; San Juan et al. 2019) | ~

dealing with orientations, e.g., quaternions in task space, is still missing.

Since their introduction in the early 2000s, DMPs have established themselves as one of the most used and popular approaches for motor command generation in robotics. Several authors have exploited and extended the classical formulation to overcome some limitations and fulfill different requirements. Their research has resulted in a large number of papers published over the last two decades. One of the aims of this paper is to categorize and review the vast literature on DMPs. We took a systematic review approach and automatically searched for DMP-related papers in a popular database. A manual inspection of the resulting papers, guided by clear and unbiased criteria, led to the papers included in this tutorial survey. Another aim of our work is to provide a tutorial on DMPs that presents the classical formulation and the key extensions in rigorous mathematical terms. We made an effort to unify the notation among different approaches in order to make them easier to understand. Moreover, we provide useful guidelines that guide the reader in selecting the right approach for a given application. In the tutorial vein, we have also searched for open-source implementations of the

¶ The referred work extended the classical DMP to different spaces like SO(3) or S^m_++. Although formally similar, the extension to other Riemannian manifolds, like the Grassmannian or the Hyperbolic manifolds, is non-trivial and still not fully addressed.
described approaches and have released several implementations of DMP-based approaches to the community. The advantages of DMPs have been discussed, as well as their limitations and open issues. We have summarized them in Table 5, where we also indicate the solved issues and those that require further investigation. In this respect, as research on DMPs is still very active, we provide a comprehensive discussion that will help the reader understand what has been done in the field and where to put their research focus.

Funding

This work has been partially supported by:
- CHIST-ERA project IPALM (Academy of Finland decision 326304).
- The Austrian Research Foundation (Euregio IPN 86-N30, OLIVER).
- Innovation Fund Denmark (Research and innovation project MADE FAST).

References

Abdelrahman A, Mitrevski A and Ploger P (2020) Context-aware task execution using apprenticeship learning. In: IEEE International Conference on Robotics and Automation. pp. 1329-1335.

Abu-Dakka F, Nemec B, Kramberger A, Buch A, Krüger N and Ude A (2014) Solving peg-in-hole tasks by human demonstration and exception strategies. Industrial Robot.

IEEE International Conference on Robotics and Automation. Paris, France, pp. 4421-4426.

Abu-Dakka FJ, Nemec B, Jørgensen JA, Savarimuthu TR, Krüger N and Ude A (2015a) Adaptation of manipulation skills in physical contact with the environment to reference force profiles. Autonomous Robots.

Robotics and Autonomous Systems.

Frontiers in Robotics and AI 7: 177.

Abu-Dakka FJ, Valera A, Escalera J, Vallés M, Mata V and Abderrahim M (2015b) Trajectory adaptation and learning for ankle rehabilitation using a 3-PRS parallel robot. In: Liu H, Kubota N, Zhu X, Dillmann R and Zhou D (eds.) Intelligent Robotics and Applications. Cham: Springer International Publishing, pp.
483-494.

Abu-Dakka FJ, Valera A, Escalera JA, Abderrahim M, Page A and Mata V (2020) Passive exercise adaptation for ankle rehabilitation based on learning control framework. Sensors.

IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 4555-4562.

Agostini A, Saveriano M, Lee D and Piater J (2020) Manipulation planning using object-centered predicates and hierarchical decomposition of contextual actions. IEEE Robotics and Automation Letters.

Physica D: Nonlinear Phenomena.

The International Journal of Robotics Research.

Autonomous Robots.

IEEE/ASME International Conference on Advanced Intelligent Mechatronics. pp. 889-894.

Amatya S, Rezayat Sorkhabadi S and Zhang W (2020) Human learning and coordination in lower-limb physical interactions. In: Proceedings of the American Control Conference, volume 2020-July. pp. 557-562.

André J, Santos C and Costa L (2016) Skill memory in biped locomotion: Using perceptual information to predict task outcome. Journal of Intelligent and Robotic Systems: Theory and Applications.

Journal of Intelligent and Robotic Systems: Theory and Applications.

IEEE Robotics and Automation Letters.

Artificial Intelligence Review 11: 11-73.

Basa D and Schneider A (2015) Learning point-to-point movements on an elastic limb using dynamic movement primitives. Robotics and Autonomous Systems 66: 55-63.

Beetz M, Stulp F, Esden-Tempski P, Fedrizzi A, Klank U, Kresse I, Maldonado A and Ruiz F (2010) Generality and legibility in mobile manipulation: Learning skills for routine tasks. Autonomous Robots.

IEEE International Conference on Robot and Human Interactive Communication. pp. 713-720.

Ben Amor H, Neumann G, Kamthe S, Krömer O and Peters J (2014) Interaction primitives for human-robot cooperation tasks. In: IEEE International Conference on Robotics and Automation. Hong Kong, China, pp. 2831-2837.

Bian F, Ren D, Li R, Liang P, Wang K and Zhao L (2019) An extended DMP framework for robot learning and improving variable stiffness manipulation.
Assembly Automation Handbook ofRobotics , chapter 74. Secaucus, NJ, USA: Springer, pp. 1995–2014. 2nd Edition.Bishop CM (2006) Linear Models for Regression . Springer, pp.172–173. Prepared using sagej.cls Journal Title XX(X) Bitzer S and Vijayakumar S (2009) Latent spaces for dynamicmovement primitives. In: IEEE-RAS International Conferenceon Humanoid Robots . pp. 574–581.Blocher C, Saveriano M and Lee D (2017) Learning stabledynamical systems using contraction theory. In: InternationalConference on Ubiquitous Robots and Ambient Intelligence .pp. 124–129.Bristow DA, Tharayil M and Alleyne AG (2006) A survey ofiterative learning control. Control Systems Magazine The International Journal ofRobotics Research Robotics: Science and Systems VI : 153.Burdet E, Osu R, Franklin DW, Milner TE and Kawato M (2001)The central nervous system stabilizes unstable dynamics bylearning optimal impedance. Nature Autonomous Robots Joint IEEE InternationalConference on Development and Learning and on EpigeneticRobotics . Lisbon, Portugal, pp. 66–71.Calinon S (2016) A tutorial on task-parameterized movementlearning and retrieval. Intelligent Service Robotics IEEE-RAS International Conference on Humanoid Robots . pp. 582–588.Calinon S, Li Z, Alizadeh T, Tsagarakis N and Caldwell D(2012) Statistical dynamical systems for skills acquisitionin humanoids. In: IEEE-RAS International Conference onHumanoid Robots . pp. 323–329.Carrera A, Palomeras N, Hurt´os N, Kormushev P and CarrerasM (2015) Cognitive system for autonomous underwaterintervention. Pattern Recognition Letters 67: 91–99.Chang G and Kuli´c D (2013) Motion learning from observationusing affinity propagation clustering. In: IEEE InternationalSymposium on Robot and Human Interactive Communication .pp. 662–667.Chen N, Bayer J, Urban S and Van Der Smagt P (2015) Efficientmovement representation by embedding dynamic movementprimitives in deep autoencoders. In: IEEE-RAS InternationalConference on Humanoid Robots . IEEE, pp. 
434–440.Chen N, Karl M and Van Der Smagt P (2016) Dynamic movementprimitives in latent space of time-dependent variationalautoencoders. In: IEEE-RAS International Conference onHumanoid Robots . pp. 629–636.Chen Z and Liu B (2018) Lifelong machine learning. SynthesisLectures on Artificial Intelligence and Machine Learning IEEE/RSJ International Conference on Intelligent Robots andSystems . pp. 3875–3881.Chiaverini S and Siciliano B (1999) The unit quaternion: A usefultool for inverse kinematics of robot manipulators. SystemsAnalysis Modelling Simulation IEEE Robotics and Automation Letters EvolutionaryIntelligence European Conference onModelling and Simulation . pp. 421–427.Cohn DA, Ghahramani Z and Jordan MI (1996) Active learningwith statistical models. Journal of artificial intelligenceresearch 4: 129–145.Colome A, Planells A and Torras C (2015) A friction-model-basedframework for reinforcement learning of robotic tasks in non-rigid environments. In: IEEE International Conference onRobotics and Automation . pp. 5649–5654.Colom´e A and Torras C (2014) Dimensionality reduction andmotion coordination in learning trajectories with dynamicmovement primitives. In: IEEE/RSJ International Conferenceon Intelligent Robots and Systems . pp. 1414–1420.Colom´e A and Torras C (2018) Dimensionality reduction fordynamic movement primitives and application to bimanualmanipulation of clothes. IEEE Transactions on Robotics IEEE-RAS InternationalConference on Humanoid Robots . pp. 711–717.Cui Y, Poon J, Miro JV, Yamazaki K, Sugimoto K and Mat-subara T (2019) Environment-adaptive interaction primitivesthrough visual context for human–robot motor skill learning. Autonomous Robots IEEE Control Systems Letters Advances in Intelligent Systems andComputing Studies inComputational Intelligence IEEE/ASME Transactions on Mechatronics RobotControl , chapter 1. Rijeka: IntechOpen, pp. 
1–17.Deniˇsa M and Ude A (2015) Synthesis of new dynamic movementprimitives through search in a hierarchical database of examplemovements. International Journal of Advanced RoboticSystems Prepared using sagej.cls averiano, Abu-Dakka, Kramberger, and Peternel DeWolf T, Stewart T, Slotine JJ and Eliasmith C (2016) A spikingneural model of adaptive arm control. Proceedings of the RoyalSociety B: Biological Sciences IEEE International Conference on Roboticsand Automation . pp. 1858–1864.Dometios A, Zhou Y, Papageorgiou X, Tzafestas C and Asfour T(2018) Vision-based online adaptation of motion primitives todynamic surfaces: Application to an interactive robotic wipingtask. IEEE Robotics and Automation Letters IEEE InternationalConference on Robotics and Automation . Seattle, WA, USA,pp. 2339–2346.Duminy N, Nguyen S and Duhaut D (2017) Strategic and interactivelearning of a hierarchical set of tasks by the poppy humanoidrobot. In: IEEE International Conference on Development andLearning and Epigenetic Robotics . pp. 204–209.Eiband T, Saveriano M and Lee D (2019) Learning hapticexploration schemes for adaptive task execution. In: IEEE International Conference on Robotics and Automation .Montreal, QC, Canada, pp. 7048–7054.End F, Akrour R, Peters J and Neumann G (2017) Layereddirect policy search for learning hierarchical skills. In: IEEEInternational Conference on Robotics and Automation . pp.6442–6448.Ernesti J, Righetti L, Do M, Asfour T and Schaal S (2012) Encodingof periodic and their transient motions by a single dynamicmovement primitive. In: IEEE-RAS International Conferenceon Humanoid Robots . pp. 57–64.Fabisch A and Metzen J (2014) Active contextual policy search. Journal of Machine Learning Research 15: 3371–3399.Fang Z, Wang G, Li W and Li P (2014) Control-orientedmodeling of flight demonstrations for quadrotors using higher-order statistics and dynamic movement primitives. In: IEEEInternational Symposium on Industrial Electronics . pp. 
1518–1525.Fanger Y, Umlauft J and Hirche S (2016) Gaussian processes fordynamic movement primitives with application in knowledge-based cooperation. In: IEEE/RSJ International Conference onIntelligent Robots and Systems . pp. 3913–3919.Fei G, Wang S and Liu B (2016) Learning cumulatively to becomemore knowledgeable. In: Proceedings of the 22nd ACMSIGKDD International Conference on Knowledge Discoveryand Data Mining . pp. 1565–1574.Flash T and Hochner B (2005) Motor primitives in vertebrates andinvertebrates. Current opinion in neurobiology IEEE Transactions on Robotics Roboticsand Autonomous Systems The 17th International Conference onAdvanced Robotics . pp. 252–258. Forte D, Ude A and Gams A (2011) Real-time generalizationand integration of different movement primitives. In: IEEE-RAS International Conference on Humanoid Robots . Bled,Slovenia, pp. 590–595.Gams A, Deniˇsa M and Ude A (2015a) Learning of parametriccoupling terms for robot-environment interaction. In: IEEE-RAS International Conference on Humanoid Robots . Seoul,South Korea, pp. 304–309.Gams A, Do M, Ude A, Asfour T and Dillmann R (2010) On-lineperiodic movement and force-profile learning for adaptationto new surfaces. In: IEEE-RAS International Conference onHumanoid Robots . Nashville, TN, USA, pp. 560–565.Gams A, Ijspeert A, Schaal S and Lenarˇciˇc J (2009) On-linelearning and modulation of periodic movements with nonlineardynamical systems. Autonomous robots IEEE Transactions on Robotics Roboticsand Autonomous Systems 75: 340–351.Gams A and Ude A (2009) Generalization of example movementswith dynamic systems. In: IEEE-RAS International Conferenceon Humanoid Robots . Paris, France, pp. 28–33.Gams A, Ude A and Morimoto J (2015b) Acceleratingsynchronization of movement primitives: Dual-arm discrete-periodic motion of a humanoid robot. In: IEEE/RSJInternational Conference on Intelligent Robots and Systems .pp. 
2754–2760.Gao J, Zhou Y and Asfour T (2019) Projected force-admittancecontrol for compliant bimanual tasks. In: IEEE-RASInternational Conference on Humanoid Robots . pp. 607–613.Gaˇspar T, Deniˇsa M and Ude A (2020) Knowledge acquisitionthrough human demonstration for industrial robotic assembly. Advances in Intelligent Systems and Computing Robotics and Autonomous Systems IEEE International Conferenceon Robotics and Automation . pp. 5616–5621.Ginesi M, Meli D, Nakawala H, Roberti A and Fiorini P (2019) Aknowledge-based framework for task automation in surgery. In: International Conference on Advanced Robotics . pp. 37–42.Guerin K, Riedel S, Bohren J and Hager G (2014) Adjutant: Aframework for flexible human-machine collaborative systems.In: IEEE/RSJ International Conference on Intelligent Robotsand Systems . pp. 1392–1399.Gutzeit L, Fabisch A, Otto M, Metzen J, Hansen J, Kirchner F andKirchner E (2018) The besman learning platform for automatedrobot skill learning. Frontiers Robotics AI Springer Tracts in AdvancedRobotics Prepared using sagej.cls Journal Title XX(X) learning. In: International Conference on Advanced Robotics .pp. 557–564.Hazara M and Kyrki V (2016) Reinforcement learning forimproving imitated in-contact skills. In: IEEE-RASInternational Conference on Humanoid Robots . pp. 194–201.Hazara M and Kyrki V (2017) Model selection for incrementallearning of generalizable movement primitives. In: Interna-tional Conference on Advanced Robotics . IEEE, pp. 359–366.Hazara M and Kyrki V (2018) Speeding up incremental learningusing data efficient guided exploration. In: IEEE InternationalConference on Robotics and Automation . IEEE, pp. 1–8.Hazara M and Kyrki V (2019) Transferring generalizable motorprimitives from simulation to real world. IEEE Robotics andAutomation Letters IEEE/RSJ InternationalConference on Intelligent Robots and Systems . pp. 
1834–1839.Herzog S, W¨org¨otter F and Kulvicius T (2016) Optimal trajectorygeneration for generalization of discrete movements withboundary conditions. In: IEEE/RSJ International Conferenceon Intelligent Robots and Systems . pp. 3143–3149.Hoffmann H, Pastor P, Park DH and Schaal S (2009) Biologically-inspired dynamical systems for movement generation: auto-matic real-time goal adaptation and obstacle avoidance. In: IEEE International Conference on Robotics and Automation .Kobe, Japan, pp. 2587–2592.Hogan N (1984) Adaptive control of mechanical impedance bycoactivation of antagonist muscles. IEEE Transactions onautomatic control Journal of Dynamic Systems, Measurement, and Control IEEE Robotics and Automation Letters IEEE Transactions on Industrial Electronics IEEE international conference on robotics andautomation . IEEE, pp. 257–263.Huang R, Cheng H, Guo H, Lin X, Chen Q and Sun F (2016b)Learning cooperative primitives with physical human-robotinteraction for a human-powered lower exoskeleton. In: IEEE/RSJ International Conference on Intelligent Robots andSystems . IEEE, pp. 5355–5360.Huang Y, Abu-Dakka FJ, Silv´erio J and Caldwell DG (2021)Toward orientation learning and adaptation in cartesian space. IEEE Transactions on Robotics The International Journal of RoboticsResearch International Journal of Precision Engineering and Manufacturing Advances inNeural Information Processing Systems 15 . Vancouver, BC,Canada: Cambridge, MA: MIT Press, pp. 1523–1530.Ijspeert AJ, Nakanishi J, Hoffmann H, Pastor P and Schaal S (2013)Dynamical Movement Primitives: Learning Attractor Modelsfor Motor Behaviors. Neural Computation IEEE/RSJInternational Conference on Intelligent Robots and Systems .Maui, HI, USA, pp. 752–757.Ijspeert AJ, Nakanishi J and Schaal S (2002b) Learning rhythmicmovements by demonstration using nonlinear oscillators. In: IEEE/RSJ International Conference on Intelligent Robots andSystems , volume 1. Lausanne, Switzerland, pp. 
958–963.Ijspeert AJ, Nakanishi J and Schaal S (2002c) Movement imitationwith nonlinear dynamical systems in humanoid robots. IEEEInternational Conference on Robotics and Automation Advanced Robotics Proceedings of the Advances inRobotics . Association for Computing Machinery, pp. 1–6.Kamali K, Akbari AA and Akbarzadeh A (2016) Trajectorygeneration and control of a knee exoskeleton based on dynamicmovement primitives for sit-to-stand assistance. AdvancedRobotics IEEE International Conference onRobotics and Automation . pp. 316–321.Kastritsi T, Dimeas F and Doulgeri Z (2018) Progressiveautomation with dmp synchronization and variable stiffnesscontrol. IEEE Robotics and Automation Letters Transactions on Robotics Robotics and AutonomousSystems IEEERobotics and Automation Magazine IEEE/ASME InternationalConference on Advanced Intelligent Mechatronics . pp. 1032–1037.Kim W, Lee C and Kim H (2018b) Learning and generalizationof dynamic movement primitives by hierarchical deepreinforcement learning from demonstration. In: IEEE/RSJInternational Conference on Intelligent Robots and Systems .pp. 3117–3123. Prepared using sagej.cls averiano, Abu-Dakka, Kramberger, and Peternel Kober J, Bagnell JA and Peters J (2013) Reinforcement learningin robotics: A survey. The International Journal of RoboticsResearch IEEE/RSJ International Conferenceon Intelligent Robots and Systems . pp. 834–839.Kober J, Mohler B and Peters J (2010a) Imitation and reinforcementlearning for motor primitives with perceptual coupling. In:Sigaud O and Peters J (eds.) From Motor Learning toInteraction Learning in Robots . Berlin, Heidelberg: SpringerBerlin Heidelberg, pp. 209–225.Kober J, M¨ulling K, Kr¨omer O, Lampert CH, Sch¨olkopf B andPeters J (2010b) Movement templates for learning of hittingand batting. In: IEEE International Conference on Roboticsand Automation . Anchorage, AK, USA: IEEE, pp. 
853–858.Kober J, Oztop E and Peters J (2011) Reinforcement learning toadjust robot movements to new situations. In: InternationalJoint Conference on Artificial Intelligence . pp. 2650–2655.Kober J and Peters J (2010) Imitation and reinforcement learning. IEEE Robotics and Automation Magazine Machine Learning Autonomous Robots IEEEInternational Symposium on Robot and Human InteractiveCommunication . pp. 249–254.Kong K and Jeon D (2006) Design and control of an exoskeletonfor the elderly and patients. IEEE/ASME Transactions onmechatronics IEEE/RSJ International Conference on Intelligent Robots andSystems . pp. 3232–3237.Kormushev P, Calinon S and Caldwell DG (2011) Imitationlearning of positional and force skills demonstrated viakinesthetic teaching and haptic input. Advanced Robotics Journal of Intelligent & Robotic Systems Conference on Robot Learning . PMLR,pp. 293–302.Koutras L and Doulgeri Z (2020b) Dynamic movement primitivesfor moving goals with temporal scaling adaptation. In: IEEEInternational Conference on Robotics and Automation . IEEE,pp. 144–150.Kramberger A, Gams A, Nemec B, Chrysostomou D, Madsen Oand Ude A (2017) Generalization of orientation trajectoriesand force-torque profiles for robotic assembly. Robotics andAutonomous Systems 98: 333–346.Kramberger A, Gams A, Nemec B and Ude A (2016a)Generalization of orientational motion in unit quaternion space.In: IEEE-RAS International Conference on Humanoid Robots . Cancun, Mexico, pp. 808–813.Kramberger A, Piltaver R, Nemec B, Gams M, Ude A et al. (2016b)Learning of assembly constraints by demonstration and activeexploration. Industrial Robot: An International Journal IEEE/RSJInternational Conference on Intelligent Robots and Systems .Madrid, Spain: IEEE, pp. 6023–6028.Kr¨omer O, Detry R, Piater J and Peters J (2010a) Grasping withvision descriptors and motor primitives. In: InternationalConference on Informatics in Control, Automation andRobotics , volume 2. pp. 
47–54.Kr¨omer OB, Detry R, Piater J and Peters J (2010b) Combiningactive learning and reactive control for robot grasping. Roboticsand Autonomous Systems IEEE Robotics and Automation Letters Journalof Intelligent & Robotic Systems IEEE International Conference onAdvanced Robotics . pp. 1–8.Kr¨uger N, Ude A, Petersen H, Nemec B, Ellekilde LP, SavarimuthuT, Rytz J, Fischer K, Buch A, Kraft D, Mustafa W, Aksoy E,Papon J, Kramberger A and W¨org¨otter F (2014) Technologiesfor the fast set-up of automated assembly processes. KunstlicheIntelligenz Robotics and AutonomousSystems IEEE International Conference on Robotics andAutomation . pp. 2275–2280.Kulvicius T, Ning K, Tamosiunaite M and Worg¨otter F(2012) Joining movement sequences: Modified dynamicmovement primitives for robotics applications exemplified onhandwriting. IEEE Transactions on Robotics ArtificialIntelligence Procedia Computer Science 7: 166–168. 2nd European FutureTechnologies Conference and Exhibition 2011 (FET 11).Lafleche JF, Saunderson S and Nejat G (2019) Robot cooperativebehavior learning using single-shot learning from demonstra-tion and parallel hidden markov models. IEEE Robotics andAutomation Letters Nordic Conference on Human–ComputerInteraction , volume 82. pp. 97–100. Prepared using sagej.cls Journal Title XX(X) Lauretti C, Cordella F, Ciancio AL, Trigili E, Catalan JM, BadesaFJ, Crea S, Pagliara SM, Sterzi S, Vitiello N et al. (2018)Learning by demonstration for motion planning of upper-limbexoskeletons. Frontiers in neurorobotics 12: 5.Lauretti C, Cordella F, Guglielmelli E and Zollo L (2017) Learningby demonstration for planning activities of daily living inrehabilitation and assistive robotics. IEEE Robotics andAutomation Letters nature IEEE Robotics and Automation Letters IEEE Access 8: 135406–135415.Lee S and Suh I (2013) Skill learning and inference frameworkfor skilligent robot. In: IEEE/RSJ International Conference onIntelligent Robots and Systems . pp. 
108–115.Lemme A, Reinhart R and Steil J (2014) Self-supervisedbootstrapping of a movement primitive library from complextrajectories. In: IEEE-RAS International Conference onHumanoid Robots . pp. 726–732.Lentini G, Grioli G, Catalano M and Bicchi A (2020) Robotprogramming without coding. In: IEEE InternationalConference on Robotics and Automation . pp. 7576–7582.Li C, Fahmy A, Li S and Sienz J (2020) An enhanced robot massagesystem in smart homes using force sensing and a dynamicmovement primitive. Frontiers in Neurorobotics IEEE-RAS International Conference on Humanoid Robots . pp.547–553.Li Z, Zhao T, Chen F, Hu Y, Su CY and Fukuda T (2018)Reinforcement learning of manipulation and grasping usingdynamical movement primitives for a humanoidlike mobilemanipulator. IEEE/ASME Transactions on Mechatronics Intelligent Autonomous Systems 13 . Cham: SpringerInternational Publishing, pp. 1601–1611.Liu C, Geng W, Liu M and Chen Q (2020) Workspace trajectorygeneration method for humanoid adaptive walking withdynamic motion primitives. IEEE Access 8: 54652–54662.Liu Z, Hu F, Luo D and Wu X (2014) Visual gesture recognition forhuman robot interaction using dynamic movement primitives.In: IEEE International Conference on Systems, Man andCybernetics . pp. 2094–2100.Lonˇcarevi´c Z, Pahiˇc R, Ude A and Gams A (2021) Generalization-based acquisition of training data for motor primitive learningby neural networks. Applied Sciences Towards Autonomous Robotic Systems .Cham: Springer International Publishing, pp. 16–31.Luo D, Han X, Ding Y, Ma Y, Liu Z and Wu X (2015) Learningpush recovery for a bipedal humanoid robot with dynamical movement primitives. In: IEEE-RAS International Conferenceon Humanoid Robots . pp. 1013–1019.M Wensing P and Slotine JJ (2017) Sparse control for dynamicmovement primitives. IFAC-PapersOnLine IEEE/RSJ International Conference on IntelligentRobots and Systems . pp. 
5411–5418.Mao R, Yang Y, Ferm¨uller C, Aloimonos Y and Baras J (2015)Learning hand movements from markerless demonstrations forhumanoid tasks. In: IEEE-RAS International Conference onHumanoid Robots . pp. 938–943.Matsubara T, Hyon SH and Morimoto J (2010) Learning stylisticdynamic movement primitives from multiple demonstrations.In: IEEE/RSJ International Conference on Intelligent Robotsand Systems . pp. 1277–1283.Matsubara T, Hyon SH and Morimoto J (2011) Learning parametricdynamic movement primitives from multiple demonstrations. Neural Networks IEEE/RSJInternational Conference on Intelligent Robots and Systems .pp. 3407–3412.Montebelli A, Steinmetz F and Kyrki V (2015) On handingdown our tools to robots: Single-phase kinesthetic teaching fordynamic in-contact tasks. In: IEEE International Conferenceon Robotics and Automation . pp. 5628–5634.M¨ulling K, Kober J, Kr¨omer O and Peters J (2013) Learning toselect and generalize striking movements in robot table tennis. International Journal of Robotics Research IEEE-RAS InternationalConference on Humanoid Robots . Nashville, TN, USA, pp.411–416.Mussa-Ivaldi FA (1999) Modular features of motor control andlearning. Current opinion in neurobiology IEEE RAS and EMBS International Conference on BiomedicalRobotics and Biomechatronics . pp. 685–691.Nakanishi J, Rawlik K and Vijayakumar S (2011) Stiffness andtemporal optimization in periodic movements: An optimalcontrol approach. In: IEEE/RSJ International Conference onIntelligent Robots and Systems . IEEE, pp. 718–724.Nemec B, Forte D, Vuga R, Tamosiunaite M, W¨org¨otter F andUde A (2012) Applying statistical generalization to determinesearch direction for reinforcement learning of movementprimitives. In: IEEE-RAS International Conference onHumanoid Robots . pp. 65–70.Nemec B, Gams A and Ude A (2013a) Velocity adaptation forself-improvement of skills learned from user demonstrations.In: IEEE-RAS International Conference on Humanoid Robots .Atlanta, GA, USA, pp. 
423–428.Nemec B, Likar N, Gams A and Ude A (2016) Bimanual humanrobot cooperation with adaptive stiffness control. In: IEEE-RASInternational Conference on Humanoid Robots . pp. 607–613.Nemec B, Simonic M and Ude A (2020) Learning of exceptionstrategies in assembly tasks. In: IEEE International Conferenceon Robotics and Automation . pp. 6521–6527. Prepared using sagej.cls averiano, Abu-Dakka, Kramberger, and Peternel Nemec B and Ude A (2012) Action sequencing using dynamicmovement primitives. Robotica IEEE-RAS International Conference on Humanoid Robots . Bled,Slovenia, pp. 727–732.Nemec B, Vuga R and Ude A (2013b) Efficient sensorimotorlearning from multiple demonstrations. Advanced Robotics IEEE-RAS International Conference onHumanoid Robots . Beijing, China, pp. 166–173.Neumann K and Steil JJ (2015) Learning robot motions with stabledynamical systems under diffeomorphic transformations. Robotics and Autonomous Systems 70: 1–15.Niekum S, Osentoski S, Konidaris G and Barto A (2012) Learningand generalization of complex tasks from unstructureddemonstrations. In: IEEE/RSJ International Conference onIntelligent Robots and Systems . pp. 5239–5246.Niekum S, Osentoski S, Konidaris G, Chitta S, Marthi B and BartoA (2015) Learning grounded finite-state representations fromunstructured demonstrations. International Journal of RoboticsResearch IEEE InternationalConference on Robotics and Automation . pp. 5006–5011.Ning K, Kulvicius T, Tamosiunaite M and W¨org¨otter F (2012) Anovel trajectory generation method for robot control. Journalof Intelligent and Robotic Systems IEEE Transaction onRobotics and Automation BritishMachine Vision Conference . pp. 101.1–101.11.Ojer De Andres M, Mahdi Ghazaei Ardakani M and RobertssonA (2018) Reinforcement learning for 4-finger-gripper manip-ulation. In: IEEE International Conference on Robotics andAutomation . pp. 
4257–4262.Pahic R, Gams A, Ude A and Morimoto J (2018) Deep encoder-decoder networks for mapping raw images to dynamicmovement primitives. In: IEEE International Conference onRobotics and Automation . pp. 5863–5868.Pahiˇc R, Ridge B, Gams A, Morimoto J and Ude A (2020)Training of deep neural networks for the generation of dynamicmovement primitives. Neural Networks IEEEInternational Conference on Robotics and Automation . pp.5582–5589.Papageorgiou D, Dimeas F, Kastritsi T and Doulgeri Z (2020a)Kinesthetic guidance utilizing dmp synchronization andassistive virtual fixtures for progressive automation. Robotica Robotics and Computer-IntegratedManufacturing 61: 101824. Paraschos A, Daniel C, Peters J and Neumann G (2013)Probabilistic movement primitives. In: Burges C, Bottou L,Welling M, Ghahramani Z and Weinberger K (eds.) Advancesin Neural Information Processing Systems 26 . Lake Tahoe,Nevada, US: Curran Associates, Inc., pp. 2616–2624.Park DH, Hoffmann H, Pastor P and Schaal S (2008) Movementreproduction and obstacle avoidance with dynamic movementprimitives and potential fields. In: IEEE-RAS InternationalConference on Humanoid Robots . Daejeon, South Korea, pp.91–98.Pastor P, Hoffmann H, Asfour T and Schaal S (2009)Learning and generalization of motor skills by learning fromdemonstration. In: IEEE International Conference on Roboticsand Automation . Kobe, Japan, pp. 763–768.Pastor P, Kalakrishnan M, Chitta S, Theodorou E and SchaalS (2011) Skill learning and task outcome prediction formanipulation. In: IEEE International Conference on Roboticsand Automation . Shanghai, China: IEEE, pp. 3828–3834.Pastor P, Kalakrishnan M, Meier F, Stulp F, Buchli J, TheodorouE and Schaal S (2013) From dynamic movement primitives toassociative skill memories. Robotics and Autonomous Systems IEEE-RAS InternationalConference on Humanoid Robots . Osaka, Japan, pp. 
309–315.Pastor P, Righetti L, Kalakrishnan M and Schaal S (2011) Onlinemovement adaptation based on previous sensor experiences. In: IEEE/RSJ International Conference on Intelligent Robots andSystems . San Francisco, CA, USA, pp. 365–371.Paxton C, Jonathan F, Kobilarov M and Hager G (2016) Dowhat i want, not what i did: Imitation of skills by planningsequences of actions. In: IEEE/RSJ International Conferenceon Intelligent Robots and Systems . pp. 3778–3785.Perk BE and Slotine JJE (2006) Motion primitives for robotic flightcontrol. arXiv preprint cs/0609140 .Perrin N and Schlehuber-Caissier P (2016) Fast diffeomorphicmatching to learn globally asymptotically stable nonlineardynamical systems. Systems & Control Letters 96: 51–59.Pervez A, Ali A, Ryu JH and Lee D (2017a) Novel learning fromdemonstration approach for repetitive teleoperation tasks. In: IEEE World Haptics Conference . pp. 60–65.Pervez A, Latifee H, Ryu JH and Lee D (2019) Motion encodingwith asynchronous trajectories of repetitive teleoperationtasks and its extension to human-agent shared teleoperation. Autonomous Robots IntelligentService Robotics IEEE-RASInternational Conference on Humanoid Robots . pp. 191–197.Peternel L and Ajoudani A (2017) Robots learning from robots:A proof of concept study for co-manipulation tasks. In: IEEE-RAS International Conference on Humanoid Robots .Birmingham, UK: IEEE, pp. 484–490.Peternel L, Noda T, Petriˇc T, Ude A, Morimoto J and Babiˇc J(2016a) Adaptive control of exoskeleton robots for periodicassistive behaviours based on emg feedback minimisation. PLOS ONE Prepared using sagej.cls Journal Title XX(X) Peternel L, Petriˇc T and Babiˇc J (2015) Human-in-the-loopapproach for teaching robot assembly tasks using impedancecontrol interface. In: IEEE international conference on roboticsand automation . Seattle, WA, USA: IEEE, pp. 
1497–1502.Peternel L, Petriˇc T and Babiˇc J (2018a) Robotic assembly solutionby human-in-the-loop teaching method based on real-timestiffness modulation. Autonomous Robots Autonomousrobots IEEE Robotics andAutomation Letters IEEE Transactions on Neural Systems andRehabilitation Engineering IEEE-RAS InternationalConference on Humanoid Robots . Cancun, Mexico: IEEE, pp.489–494.Peternel L, Tsagarakis N, Caldwell D and Ajoudani A (2018b)Robot adaptation to human physical fatigue in human–robotco-manipulation. Autonomous Robots Neural Information Processing . Springer Berlin Heidelberg,pp. 233–242.Peters J and Schaal S (2008b) Reinforcement learning of motorskills with policy gradients. Neural Networks IEEE-RAS International Conferenceon Humanoid Robots . pp. 346–351.Petric T, Gams A, Colasanto L, Ijspeert A and Ude A (2018)Accelerated sensorimotor learning of compliant movementprimitives. IEEE Transactions on Robotics The International Journal of Robotics Research IEEE/RSJInternational Conference on Intelligent Robots and Systems .Chicago, IL, USA, pp. 1790–1795.Petriˇc T, Gams A, ˇZlajpah L, Ude A and Morimoto J (2014b) Onlineapproach for altering robot behaviors based on human in theloop coaching gestures. In: IEEE International Conference onRobotics and Automation . Hong Kong, China, pp. 4770–4776.Petriˇc T, Goljat R and Babiˇc J (2016) Cooperative human-robotcontrol based on fitts’ law. In: IEEE-RAS InternationalConference on Humanoid Robots . pp. 345–350.Pfeiffer S and Angulo C (2015) Gesture learning and execution ina humanoid robot via dynamic movement primitives. PatternRecognition Letters 67: 100–107.Prada M, Remazeilles A, Koene A and Endo S (2013)Dynamic movement primitives for human-robot interaction: comparison with human behavioral observation. In: IEEE/RSJInternational Conference on Intelligent Robots and Systems .Tokyo, Japan, pp. 