Dynamic Movement Primitives in Robotics: A Tutorial Survey
Matteo Saveriano, Fares J. Abu-Dakka, Aljaz Kramberger, Luka Peternel
Abstract
Biological systems, including human beings, have the innate ability to perform complex tasks in a versatile and agile manner. Researchers in sensorimotor control have tried to understand and formally define this innate property. The idea, supported by several experimental findings, that biological systems are able to combine and adapt basic units of motion into complex tasks finally led to the formulation of the motor primitives theory. In this respect, Dynamic Movement Primitives (DMPs) represent an elegant mathematical formulation of the motor primitives as stable dynamical systems, and are well suited to generate motor commands for artificial systems like robots. In the last decades, DMPs have inspired researchers in different robotic fields including imitation and reinforcement learning, optimal control, physical interaction, and human–robot co-working, resulting in a considerable amount of published papers. The goal of this tutorial survey is two-fold. On one side, we present the existing DMP formulations in rigorous mathematical terms, and discuss advantages and limitations of each approach as well as practical implementation details. In the tutorial vein, we also search for existing implementations of the presented approaches and release several others. On the other side, we provide a systematic and comprehensive review of the existing literature and categorize state-of-the-art work on DMPs. The paper concludes with a discussion on the limitations of DMPs and an outline of possible research directions.
Keywords
Motor control of artificial systems, Movement primitives theory, Dynamic movement primitives, Learning from demonstration
How do biological systems, like humans and animals, execute complex movements in a versatile and creative manner?
In the past decades, researchers in neurobiology and motor control have made a significant effort trying to answer this research question, and their experimental findings led to the formulation of the motor or motion primitives theory. The motion primitives theory explains the execution of complex motions with the ability of biological systems to sequence and adapt units of action, the so-called motion primitives (Mussa-Ivaldi 1999; Flash and Hochner 2005). Dynamic Movement Primitives (DMPs) have their roots in the motor control of biological systems and can be seen as a rigorous mathematical formulation of the motion primitives as stable nonlinear dynamical systems (Schaal 2006a,b). In this respect, DMPs represent one of the first attempts to answer the related research question: How can artificial systems, like (humanoid) robots, execute complex movements in a versatile and creative manner?
Beyond their biological motivation, DMPs have a simple and elegant formulation, guarantee convergence to a given target, are sufficiently flexible to create complex behaviors, are capable of reacting to external perturbations in real-time, and can be learned from data using efficient algorithms. These properties explain the “success” of DMPs in robotic applications, where they have become established as a prominent tool for the learning and generation of motor commands. Since their formulation in the pioneering work of Ijspeert et al. (Ijspeert et al. 2001, 2002c), DMPs have been successfully exploited in a variety of applications, becoming de facto the first approach that novices in the Imitation Learning (IL) field use on their robots.
The popularity of DMPs resulted in a large amount of work that uses, modifies, or extends the original formulation of Ijspeert and colleagues. In this paper, we name classical DMPs the DMP formulation initially presented in (Ijspeert et al. 2001) and further refined in (Ijspeert et al. 2002c,b). As shown in Table 1, some tutorials and surveys have already tried to categorize and review existing work on DMPs.

Affiliations: Department of Computer Science and Digital Science Center, University of Innsbruck, Innsbruck, Austria; Intelligent Robotics Group, Department of Electrical Engineering and Automation (EEA), Aalto University, Espoo, Finland; SDU Robotics, the Maersk McKinney Moller Institute, University of Southern Denmark, Odense, Denmark; Delft Haptics Lab, Department of Cognitive Robotics, Delft University of Technology, Delft, The Netherlands.
Corresponding author:
Fares J. Abu-Dakka, Intelligent Robotics Group, Department of Electrical Engineering and Automation (EEA), Aalto University, Maarintie 8, 02150 Espoo, Finland. Email: fares.abu-dakka@aalto.fi
Prepared using sagej.cls [Version: 2017/01/17 v1.20]
Table 1.
Comparison between existing reviews and tutorials about DMPs and our tutorial survey.
(Schaal et al. 2007) — Topics: classical DMPs; online adaptation; optimization. A tutorial that provides a unifying view on the two main approaches used to develop computational motor control theories, namely differential equations and optimal control. In this work, discrete and rhythmic DMPs (Ijspeert et al. 2002c,b) are presented as a computational model of the motor primitives theory (Mussa-Ivaldi 1999) that unifies nonlinear differential equations and optimal control. The tutorial has a section dedicated to DMP parameter optimization beyond IL. Schaal et al. show how to optimize DMP parameters to minimize various costs describing, for instance, the total jerk of the trajectory or the end-point variance.

(Ijspeert et al. 2013) — Topics: classical DMPs; generalization; online adaptation; coupling terms. A tutorial on classical DMPs that presents both discrete and rhythmic formulations, mostly developed in (Ijspeert et al. 2002c,b,a), and their application in IL and movement recognition. The tutorial also presents extensions of the classical DMP formulation to prevent high accelerations at the beginning of the motion, to avoid collisions with unforeseen obstacles (Pastor et al. 2009), and to generalize both in space (e.g., reach a different goal) and time (e.g., produce longer/shorter trajectories).

(Pastor et al. 2013) — Topics: classical DMPs; online adaptation; coupling terms; impedance learning. A tutorial on classical DMPs that presents both discrete and rhythmic formulations, mostly developed in (Ijspeert et al. 2002c,b,a). The tutorial also presents extensions of the classical DMP formulation to avoid collisions with unforeseen obstacles (Pastor et al. 2009) and to learn impedance control policies via Reinforcement Learning (RL) (Buchli et al. 2011b). The key difference between this tutorial and the one from (Ijspeert et al. 2013) is the section dedicated to sensory association and online, context-aware adaptation of DMP trajectories using the associative skill memory framework developed in (Pastor et al. 2011).

(Deniša et al. 2016b) — Topics: classical DMPs; CMPs. A tutorial on CMPs, a framework developed to generate compliant robot behaviors that accurately track a reference trajectory. CMPs exploit classical DMPs to generate the desired kinematic landscape and encode task-dependent dynamics as a combination of Gaussian basis functions (torque primitives). The tutorial shows how to learn torque primitives from training data, how to generalize CMPs to new situations, and how to combine existing CMPs to synthesize new robot motions.

This paper — DMP tutorial: classical; orientation; SPD; joining; generalization; online adaptation. DMP survey: (co-)manipulation; variable impedance; physical interaction; rehabilitation; teleoperation; motion recognition; reinforcement, deep, and lifelong learning. This tutorial survey conducts a wide scan of the existing DMP literature with the aim of categorizing and presenting the published work in the field. The main objective of this comprehensive literature review is to give the reader an exhaustive overview of DMP-related research, of its major achievements, as well as of open issues and possible research directions. Our tutorial survey also provides a structured and unified formulation for the different methods developed starting from the classical DMPs proposed by (Ijspeert et al. 2002c,b). We believe that such a formulation eases the understanding of the different methods and extensions that can be found in the literature, clarifying connections and differences among the existing approaches. The tutorial survey also provides an analysis of the pros and cons of various methods and a discussion with guidelines for different application scenarios.
Schaal et al. (2007) presented the classical DMPs as an attempt to unify nonlinear dynamical systems and optimal control theory, i.e., the two prominent frameworks used to derive computational models of neuro-biological motor theories (Mussa-Ivaldi 1999; Flash and Hochner 2005). In their tutorial paper, Ijspeert et al. (2013) presented a homogeneous formulation of rhythmic and discrete DMPs together with some extensions including coupling terms, generalization to different goals, and online adaptation for collision avoidance. They also described possible applications in IL and motion recognition methods. In the same year, Pastor et al. (2013) published their tutorial on classical DMPs with a special focus on online adaptation of the DMP attractor landscape by integrating the perceptual
Figure 1. The structure of this tutorial survey on DMPs. The tutorial part (Sections 2 and 3) covers the formulation (classical discrete and periodic, orientation, symmetric positive definite matrices) and its extensions (generalization, joining, on-line adaptation, alternative formulations). The survey part (Sections 4 and 5) covers the integration of DMPs (manipulation tasks, variable impedance, reinforcement learning, deep learning, life-long learning) and their applications (contact with passive environments, human–robot co-manipulation, human assistance, augmentation and rehabilitation, teleoperation, high degrees of freedom, motion analysis and recognition, autonomous driving and field robotics). The discussion (Section 6) provides guidelines, available code, and open issues.

information into the action generation process. Later on, Deniša et al. (2016b) reviewed the so-called
Compliant Movement Primitive (CMP), which was first introduced by Petrič et al. (2014a). CMPs combine a classical DMP to generate the desired kinematic path and torque primitives (a weighted summation of Gaussian basis functions) to generate task-specific dynamics. As shown in the review, CMPs are capable of accurately tracking the kinematic path in a compliant manner, which makes them well suited for tasks that require interaction of the robot with the environment.

However, the above-mentioned reviews and tutorials primarily focused on the methods and advancements within their respective research groups and/or focused on a specific problem or field of application. On the other hand, the DMP-related literature is extensive and broad, with contributions from many research groups that made advancements in several important fields of application. Therefore, the proposed survey and tutorial on DMPs aims to scan a wider range and present a tutorial with unified and structured formulations for the various DMP methods and advancements to date. This should make it clearer for the users to see the differences and connections between various methods, and can contribute to easier application. In addition, we provide a more comprehensive and categorised survey of all major DMP application areas in robotics. This can help to inspire the readers to apply DMPs in various areas.

In the tutorial part, we present mathematical formulations, implementation details, and potential issues of existing DMP formulations starting from the classical DMPs presented in (Ijspeert et al. 2002c,b) up to recent extensions of DMPs to Riemannian geometry and Symmetric Positive Definite (SPD) matrices (Abu-Dakka and Kyrki 2020). In the survey part, we meticulously review the existing literature on DMPs in a comprehensive and methodological manner by focusing on the quality and significance of the contributions without putting a bias on any particular research group. Details on the systematic review procedure are given as follows.
We performed an automatic search for documents containing the string Dynamic Movement Primitive in Scopus, which returned a large set of papers. We found that Scopus lists papers only from a certain year on; therefore, we manually tracked related work from the preliminary work on DMPs onward. We further refined the search to include last-minute papers. We manually inspected all the papers and removed the ones that do not explicitly use DMPs or that only compare against DMPs in their literature review. The first and foremost selection criteria were the technical quality of the work and the significance of the contribution with respect to the DMP state of the art prior to the publication of any particular paper. In other words, we asked the question “did the paper make a significant step change in the field?”. Therefore, we discarded papers that presented similar (or the same) ideas multiple times, or that made insignificant improvements to the state of the art. If multiple papers presented the same/similar idea, we included the one with the most comprehensive technical quality, and if the quality was similar, the next deciding factors were publication in more prestigious journals/venues or citation counts. This manual selection led to the 276 papers on DMPs (out of a total of 328 references) analyzed in this work.
The systematic review of the DMP literature led to the taxonomy shown in Fig. 1, which also describes the structure of this paper. DMPs are placed at the root of the tree and branch into two nodes, namely the tutorial and the survey. In the tutorial part, we present different DMP formulations and extensions in rigorous mathematical terms. The tutorial part spans Sections 2 and 3. Section 2 embraces DMP formulations for discrete and periodic motions, orientation trajectories, and
SPD matrices. Section 3 discusses extensions of the DMP formalism to account for skill generalization, the joining of multiple primitives, and online adaptation based on force feedback or reference velocity. The section ends with a short description of DMP-related formulations. The survey part spans Sections 4 and 5. Section 4 presents the integration of DMPs in larger executive frameworks for manipulation and variable impedance tasks, and for reinforcement, deep, and life-long learning. Section 5 presents DMPs in different robotic applications including physical interaction, co-manipulation, rehabilitation, teleoperation, motion recognition, humanoids and field robotics, and autonomous driving. The paper ends with a discussion (Section 6) of the presented approaches with the aim of providing, where possible, guidelines to select the most suitable DMP approach for specific needs. We have also collected available DMP implementations (see Table 4) and contributed to the community with further open source implementations available at https://gitlab.com/dmp-codes-collection. Section 6 terminates with a discussion on open issues and possible research directions.

Our paper has several key contributions that are summarized as follows. Concerning the tutorial part:
• We present the classical DMP formulation and existing variations of this formulation in a unified manner with rigorous mathematical terms, providing implementation details and discussing advantages and limitations of different approaches (Section 2).
• We describe advanced approaches where DMPs are integrated into sophisticated control and/or larger executive frameworks (Section 3).
• We release to the community several implementations of the described approaches. Detailed information on these code repositories is provided in Table 4 and Section 6.
Moreover, we search for existing open-source implementations of the presented formulations and list them in our repository (Section 6.2).

Concerning the survey part:
• We perform a systematic literature search to provide a comprehensive and unbiased review of the topic (Sections 4 and 5).
• We categorize existing work on DMPs into different streams and highlight prominent approaches in each category (Fig. 1 and Sections 4 and 5).
• We present guidelines to select the most suitable approach for different applications, discuss limitations inherent to the DMP formalism, and highlight open issues and possible research directions (Section 6).
In this section, we provide a complete description of the standard formulation of DMPs: the point attractor formulation (to encode discrete point-to-point motions) in Section 2.1, and the cycle attractor formulation (to encode rhythmic motion patterns) in Section 2.2. For a better understanding, we have summarized the key notations and the used abbreviations in Table 2.
The discrete DMP is used to encode a point-to-point motion into a stable dynamical system. In the following subsections, we go through the formulation and main features of discrete DMPs, starting with the classical one operating in the Euclidean space R^n (Section 2.1.1), then passing to the Cartesian orientation spaces S^3 and SO(3) in Section 2.1.2, and ending with the DMP formulation for the SPD space S^m_++ in Section 2.1.3.

The classical discrete DMPs were first introduced by Ijspeert et al. (2002c). A DMP for a single-DoF trajectory y of a discrete (point-to-point) movement is defined by the following set of nonlinear differential equations (Ijspeert et al. 2002c, 2013)

τ ż = α_z (β_z (g − y) − z) + f(x),   (1)
τ ẏ = z,   (2)
τ ẋ = −α_x x,   (3)

Table 2.
Description of key notations and abbreviations. Indices, super/subscripts, constants, and variables have the same meaning over the whole text.

N, i ≜ index i = 1, 2, ..., N
J, j ≜ index j = 1, 2, ..., J
L, l ≜ index l = 1, 2, ..., L
V, v ≜ index v = 1, 2, ..., V
T ≜ sample index running over 1, 2, ..., T
m ≜ dimension of S^m_++
n ≜ dimension of R^n
{·}_d ≜ subscript for desired value
{·}_q ≜ quaternion-related variable
{·}_R ≜ rotation-matrix-related variable
{·}_++ or {·}_+ ≜ SPD-related variable
{·}_g ≜ subscript for goal value
α_z, β_z, α_x, α_s, α_g, α_yx, α_qg ≜ positive gains
τ ≜ time modulation parameter
c_i, h_i ≜ centers and widths of Gaussians
T ≜ time duration
t ≜ continuous time
λ ≜ forgetting factor
r ≜ amplitude modulation parameter
x ≜ phase variable
y, ẏ ≜ trajectory data and its 1st derivative
s ≜ sigmoidal decay phase
z, ż ≜ scaled velocity and acceleration
p ≜ piece-wise linear phase
g, g_q, g_+ ≜ attractor point (goal) in different spaces
ω ≜ angular velocity
ĝ, ĝ_q and g̃, g̃_q ≜ moving target and delayed goal function in different spaces
Q_t, Q̇_t ≜ joint position and its 1st time-derivative
g_v ≜ intermediate attractor (via-goal)
q, q̇ ≜ unit quaternion and its 1st time-derivative
R, Ṙ ≜ rotation matrix and its 1st time-derivative
f, f_q, f_R, F_+ ≜ forcing terms in different spaces
w_i ≜ adjustable weights
Ψ_i ≜ basis functions
θ, ϑ ≜ an angle and learnable parameters
S^m_++ ≜ m × m SPD manifold
Sym_m ≜ m × m symmetric matrix space
M ≜ a Riemannian manifold
X ≜ an arbitrary SPD matrix
T_Λ M ≜ tangent space of M at an arbitrary point Λ
the mean of {X_t}, t = 1, ..., T
ϱ = Log_Λ(Υ) ≜ map M → T_Λ M, maps an arbitrary point Υ ∈ M into ϱ ∈ T_Λ M
Υ = Exp_Λ(ϱ) ≜ map T_Λ M → M, maps ϱ ∈ T_Λ M into Υ ∈ M
vec(·) ≜ function transforming Sym_m into R^n using Mandel's notation
mat(·) ≜ function transforming R^n into Sym_m using Mandel's notation
k, K, K_P, K_O ≜ different forms of stiffness gains
D, D_V, D_W ≜ different forms of damping gains
M and I ≜ mass and inertia matrices
F, f_e and τ_e ≜ forces, external forces, and torques

Abbreviations: DMP = Dynamic Movement Primitive; IL = Imitation Learning; RL = Reinforcement Learning; SPD = Symmetric Positive Definite; DoF = Degree of Freedom; RBF = Radial Basis Function; LWR = Locally Weighted Regression; GMM = Gaussian Mixture Model; GMR = Gaussian Mixture Regression; GP = Gaussian Process; NN = Neural Network; VMP = Via-points Movement Primitive; ProMP = Probabilistic Movement Primitives; LfD = Learning from Demonstration; GPR = Gaussian Process Regression; MoMP = Mixture of Motor Primitives; EMG = Electromyography; ILC = Iterative Learning Control; VIC = Variable Impedance Control; VILC = Variable Impedance Learning Control; PI² = Policy Improvement with Path Integrals; CMA-ES = Covariance Matrix Adaptation-Evolution Strategies; CC-DMP = Coordinate Change DMP; RBF-NN = Radial Basis Function Neural Network; PoWER = Policy Learning by Weighting Exploration with the Returns; HRL = Hierarchical RL; AEDMP = AutoEncoded DMP; CNN = Convolutional Neural Network; GPDMP = Global Parametric Dynamic Movement Primitive; UAV = Unmanned Aerial Vehicle

where x is the phase variable and z is an auxiliary variable. Parameters α_z and β_z define the behavior of the second-order system described by (1) and (2).
With the choice τ > 0, α_z = 4β_z, and α_x > 0, the convergence of the underlying dynamical system to a unique attractor point at y = g, z = 0 is ensured (Ijspeert et al. 2013). Alternatively, the gains α_z and β_z can be learned from training data while preserving the convergence of the system (Tan et al. 2016). In the DMP literature, equations (1)–(2), as well as their periodic counterpart (33)–(34), are called the transformation system, while (3) (or (35)) is the canonical system.

f(x) is defined as a linear combination of N nonlinear Radial Basis Functions (RBFs), which enables the robot to follow any smooth trajectory from the initial position y_0 to the final configuration g:

f(x) = (Σ_{i=1}^N w_i Ψ_i(x)) / (Σ_{i=1}^N Ψ_i(x)) · x,   (4)
Ψ_i(x) = exp(−h_i (x − c_i)^2),   (5)

where c_i are the centers of the Gaussian basis functions distributed along the phase of the movement and h_i their widths. For a given N and setting τ equal to the total duration of the desired movement, we can define c_i = exp(−α_x (i − 1)/(N − 1)), h_i = 1/(c_{i+1} − c_i)^2, and h_N = h_{N−1}, where i = 1, ..., N. For each DoF, the weights w_i should be adjusted from the measured data so that the desired behavior is achieved. The selection of the number of weights should be based on the desired resolution of the trajectory. For controlling a robotic system with more than one DoF, we represent the movement of every DoF with its own equation system (1)–(2), but with the common phase (3) to synchronize them.

For a discrete motion, given a demonstrated trajectory y_d(t), t = 1, ..., T, and its time derivatives ẏ_d(t) and ÿ_d(t), it is possible to invert (1) and approximate the desired shape f_d as

f_d(t) = τ^2 ÿ_d(t) − α_z (β_z (g − y_d(t)) − τ ẏ_d(t)).   (6)
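The pipeline described by (1)–(6) can be sketched in a few lines of NumPy. The gains, the number of basis functions, the batch least-squares fit (used here in place of LWR), and all variable names are illustrative choices for this sketch, not the authors' released code.

```python
# Minimal discrete-DMP sketch following (1)-(6): fit the forcing term of a
# single-DoF demonstration, then integrate the system with Euler steps.
import numpy as np

alpha_z, alpha_x = 25.0, 3.0
beta_z = alpha_z / 4.0                 # critical damping, alpha_z = 4 beta_z
N, tau, dt = 30, 1.0, 0.001            # basis functions, duration, sample time

# Demonstration: minimum-jerk trajectory from y0 = 0 to g = 1
t = np.arange(0.0, tau, dt)
u = t / tau
y_d = 10 * u**3 - 15 * u**4 + 6 * u**5
yd_d = np.gradient(y_d, dt)
ydd_d = np.gradient(yd_d, dt)
g = y_d[-1]

# Canonical system (3) in closed form and RBFs (5)
x = np.exp(-alpha_x * t / tau)
c = np.exp(-alpha_x * np.linspace(0.0, 1.0, N))   # centers along the phase
h = np.empty(N)
h[:-1] = 1.0 / (c[1:] - c[:-1])**2
h[-1] = h[-2]
Psi = np.exp(-h * (x[:, None] - c)**2)            # T x N basis activations

# Learning target (6) and weights from the linear system Phi w = F
F = tau**2 * ydd_d - alpha_z * (beta_z * (g - y_d) - tau * yd_d)
Phi = Psi / Psi.sum(axis=1, keepdims=True) * x[:, None]
w = np.linalg.lstsq(Phi, F, rcond=None)[0]

# Rollout: Euler integration of (1)-(3)
y, z, xp = y_d[0], 0.0, 1.0
for _ in t:
    psi = np.exp(-h * (xp - c)**2)
    f = psi @ w / psi.sum() * xp
    z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
    y += dt / tau * z
    xp += dt / tau * (-alpha_x * xp)
```

Running the rollout reproduces the demonstration and ends close to the goal g; the batch fit could be replaced by the recursive LWR updates (9)–(10) to learn the same weights incrementally.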
Figure 2.
A classical DMP is used to generate a discrete motion connecting y_0 = 0 and g = 1 (green line in the top left panel). The training data (black dashed lines) are obtained from a minimum jerk trajectory connecting y_0 and g in T = 1 s and used to learn the weights w_i of the Gaussian basis functions equally distributed in time. The results of the parameter learning procedure are shown in the bottom right panel. The exponentially decaying phase variable is used as shown in the middle right panel. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

By stacking each f_d(t) and w_i into the column vectors F = [f_d(t_1), ..., f_d(t_T)]ᵀ and w = [w_1, ..., w_N]ᵀ, we obtain the following linear system

Φ w = F,   (7)

where

Φ = ⎡ Ψ_1(x_1) x_1 / Σ_i Ψ_i(x_1)   ···   Ψ_N(x_1) x_1 / Σ_i Ψ_i(x_1) ⎤
    ⎢             ⋮                 ⋱                ⋮               ⎥
    ⎣ Ψ_1(x_T) x_T / Σ_i Ψ_i(x_T)   ···   Ψ_N(x_T) x_T / Σ_i Ψ_i(x_T) ⎦.   (8)

Locally Weighted Regression (LWR) (Atkeson et al. 1997; Schaal and Atkeson 1998; Ude et al. 2010) is a popular approach used to update the weights w_i. LWR uses the error between the desired trajectory shape and the currently learned shape and a forgetting factor λ to update the weights as

P_t = (1/λ) (P_{t−1} − (P_{t−1} φ_t φ_tᵀ P_{t−1}) / (λ + φ_tᵀ P_{t−1} φ_t)),   (9)
w_t = w_{t−1} + (f_d(t) − φ_tᵀ w_{t−1}) P_t φ_t.   (10)

In the previous equations, w_t = w(t) and φ_t is the column vector obtained by transposing the t-th row of Φ. The initial values of the parameters are P_0 = I, w_0 = 0. A discrete DMP learned on synthetic data is shown in Figure 2.

LWR has been the standard method to learn the weights of DMPs and therefore f(x). As an alternative to LWR, Krug and Dimitrov (2013) have shown that learning a forcing term defined as in (4) can be formulated as a quadratic optimization problem and efficiently solved. Figure 3.
Possible phase variables used in different discrete DMP formulations. All the different possibilities ensure that x, s, p → 0 for t → +∞ (for t > T in practice).

In general, the problem of learning and retrieving f(x) can in principle be solved with any regression technique (Stulp et al. 2013). For instance, Wang et al. (2016) modified f(x) in (4) by considering a bias term b_i, i.e., w_i x + b_i, and used truncated kernels (Ψ_i vanishes if x − c_i is smaller than a threshold). This formulation, called DMP+, produces more accurate trajectories than the original DMP. Moreover, a learned trajectory can be modified by updating only a subset of the weights. Other work focused on using multiple demonstrations to increase the generalization power of the learned primitive. To learn a suitable forcing term from multiple demonstrations, some authors used Gaussian Mixture Models (GMMs) (Yin and Chen 2014; Pervez et al. 2017a) and Gaussian Mixture Regression (GMR) (Cohn et al. 1996), while others adopted Gaussian Processes (GPs) (Fanger et al. 2016; Umlauft et al. 2017; Rasmussen and Williams 2006), or exploited deep Neural Networks (NNs) (Pervez et al. 2017b; Pahič et al. 2020), originally developed in (LeCun et al. 2015).

The phase variable x in (3) provides the ability to manipulate time during the execution of the DMP equations. Moreover, DMPs provide the ability to slow down or even stop the execution through the phase-stopping mechanism (Ijspeert et al. 2002c)

τ ẋ = −α_x x / (1 + α_yx ||ỹ − y||).   (11)

Moreover, DMPs provide an elegant way to adapt the trajectory generation in real-time through the goal switching mechanism (Ijspeert et al.
2013)

τ ġ = α_g (g_new − g),   (12)

so that the goal g continuously switches onto a new goal g_new in real-time.

DMPs in their standard formulation are not suitable for direct encoding of skills with specific geometric constraints, such as orientation profiles (represented as either unit quaternions or rotation matrices) and stiffness/damping and manipulability profiles (encapsulated in full SPD matrices). For instance, direct integration of unit quaternions does not ensure the unity of the quaternion norm. Any representation of orientation that does not contain singularities is non-minimal, which means that additional constraints need to be taken into account during integration.

Equation (3) describes an exponentially decaying phase variable that has been widely used in the DMP literature. The main drawback of the exponentially decaying phase is that it rapidly drops to very small values towards the end of the motion. This
“forces” the learning algorithm to exploit relatively high weights w_i to accurately reproduce the last part of the demonstration (Samant et al. 2016). As an example, in Figure 3 the exponentially decaying phase (brown dot-dashed line) is already very small after a fraction of the motion duration, while the expected time duration of the motion is T = 1 s.

To overcome this limitation, Kulvicius et al. (2011) propose the sigmoidal decay phase s (green solid line in Figure 3), obtained by integrating

ṡ = −α_s e^((α_s/δt)(τT − t)) / [1 + e^((α_s/δt)(τT − t))]^2,   (13)

where α_s defines the steepness of s centered at time T and δt is the sampling time. As shown in Figure 3, s = 1 for t < T − δ_s, where the time δ_s depends on the steepness α_s, and then it decays to s = 0. The sigmoidal decay in Figure 3 has a tail effect since it vanishes only after T + δ_s s, where δ_s depends on the tunable parameter α_s.

The piece-wise linear phase p (blue dashed line in Figure 3), proposed by Samant et al. (2016), linearly decays from 1 to 0 in exactly T s and then remains constant. p is obtained by integrating

τ ṗ = { −1/T,  p > 0
        0,     otherwise }   (14)

where p(0) = 1 and T is the time duration of the motion.

The classical DMP formulation described in Section 2.1.1 applies to single-DoF motions. Multidimensional motions are generated independently and synchronized with a common phase. In other words, equations (1) and (2) are repeated for each DoF while the phase variable in (3) is shared. This works when the evolution of the different DoFs is independent, as for joint space or Cartesian position trajectories. Unlike the Cartesian position, the elements of orientation representations like the unit quaternion or the rotation matrix are constrained. In this section, we present approaches that extend the classical DMP formulation to represent Cartesian orientations.
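The three phase profiles of Figure 3 discussed above can be generated in a few lines. This is a sketch with illustrative gains and durations: the exponential phase is the closed-form solution of (3), the piece-wise linear phase follows (14), and the sigmoidal phase is produced directly as the logistic profile that (13) integrates to.

```python
# Exponential, sigmoidal, and piece-wise linear phase variables, cf. Figure 3.
import numpy as np

T, tau, dt = 1.0, 1.0, 0.001
alpha_x, alpha_s = 3.0, 1.0
t = np.arange(0.0, 2 * T, dt)                 # run past T to show the tails

x = np.exp(-alpha_x * t / tau)                # exponential decay, solves (3)
a = np.clip((alpha_s / dt) * (t - tau * T), -50.0, 50.0)   # clip to avoid overflow
s = 1.0 / (1.0 + np.exp(a))                   # sigmoidal decay, cf. (13)
p = np.clip(1.0 - t / (tau * T), 0.0, 1.0)    # piece-wise linear, cf. (14)
```

All three profiles start at 1 and are (numerically) zero well after t = T, matching the behavior summarized in the Figure 3 caption.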
The unit quaternion q = ν + u ∈ S^3 provides a representation of the orientation of the robot's end-effector (Chiaverini and Siciliano 1999). S^3 is the unit sphere in R^4, with ν ∈ R and u ∈ R^3. Abu-Dakka et al. (2015a) rewrote the DMP equations (1) and (2) for direct unit quaternion encoding as follows

τ η̇ = α_z (β_z Log_q(g_q ∗ q̄) − η) + f_q(x),   (15)
τ q̇ = (1/2) η ∗ q,   (16)

where g_q ∈ S^3 denotes the goal orientation, the quaternion conjugate is defined as q̄ = ν − u, and ∗ denotes the quaternion product

q_1 ∗ q_2 = (ν_1 + u_1) ∗ (ν_2 + u_2) = (ν_1 ν_2 − u_1ᵀ u_2) + (ν_1 u_2 + ν_2 u_1 + u_1 × u_2).

η ∈ R^3 is the scaled angular velocity ω, treated as a quaternion with zero scalar part (ν = 0) in (16). The function

Figure 4.
A unit quaternion DMP is used to generate a discrete motion connecting q_0 and g_q. The training data (black dashed lines) are obtained from a minimum jerk trajectory connecting q_0 and g_q in T = 10 s and used to learn the weights w_i of the Gaussian basis functions equally distributed in time. The results of the parameter learning procedure are shown in the bottom right panel. The exponentially decaying phase variable is used as shown in the middle right panel. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

Log_q(·): S^3 → R^3 is given as

Log_q(q) = { arccos(ν) u/||u||,  u ≠ [0 0 0]ᵀ
             [0 0 0]ᵀ,           otherwise },   (17)

where ||·|| denotes the ℓ2 norm.

An early attempt to encode unit quaternion profiles using DMPs was presented by Pastor et al. (2011). Unlike Abu-Dakka et al.'s formulation, Pastor et al.'s does not take into account the geometry of SO(3), as they just used the vector part of the quaternion product g_q ∗ q̄ in (15) instead of Log_q(g_q ∗ q̄), which defines the angular velocity ω that rotates quaternion q into g_q within a unit sampling time. Equation (16) can be integrated as

q(t + δt) = Exp_q((δt/2) η(t)/τ) ∗ q(t),   (18)

where δt > 0 denotes a small constant. The function Exp_q(·): R^3 → S^3 is given as

Exp_q(ω) = { cos(||ω||) + sin(||ω||) ω/||ω||,  ω ≠ [0 0 0]ᵀ
             1 + [0 0 0]ᵀ,                     otherwise }.   (19)

Both mappings become one-to-one, continuously differentiable, and inverse to each other if the input domain of the mapping Log_q(·) is restricted to S^3 except for −1 + [0 0 0]ᵀ, while the input domain of the mapping Exp_q(ω) should fulfill the constraint ||ω|| < π (Abu-Dakka et al. 2015a). An exemplar unit quaternion DMP is shown in Figure 4.

The phase-stopping mechanism (11) can be rewritten as follows

τ ẋ = −α_x x / (1 + α_qx d(q̃, q)),   (20)
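The quaternion machinery of (15)–(19) is compact enough to sketch directly. The maps and the product below follow the text; the storage convention [ν, u_x, u_y, u_z], the numeric values, and the step size are illustrative assumptions, not the released implementation.

```python
# Quaternion Log/Exp maps (17), (19), the product, and one step of (18).
import numpy as np

def quat_log(q):
    """Log_q: S^3 -> R^3, cf. (17)."""
    nu, u = q[0], q[1:]
    n = np.linalg.norm(u)
    if n < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(nu, -1.0, 1.0)) * u / n

def quat_exp(w):
    """Exp_q: R^3 -> S^3, cf. (19)."""
    n = np.linalg.norm(w)
    if n < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(n)], np.sin(n) * w / n))

def quat_mul(q1, q2):
    """Quaternion product q1 * q2."""
    n1, u1 = q1[0], q1[1:]
    n2, u2 = q2[0], q2[1:]
    return np.concatenate(([n1 * n2 - u1 @ u2],
                           n1 * u2 + n2 * u1 + np.cross(u1, u2)))

def quat_conj(q):
    """Quaternion conjugate q_bar."""
    return np.concatenate(([q[0]], -q[1:]))

# Feedback term of (15): Log_q(g_q * q_bar) for a 90-degree goal rotation
q = np.array([1.0, 0.0, 0.0, 0.0])
g_q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
e = quat_log(quat_mul(g_q, quat_conj(q)))

# One integration step of (18) with scaled angular velocity eta
eta, tau, dt = np.array([0.0, 0.0, np.pi / 2]), 1.0, 1.0
q_next = quat_mul(quat_exp(dt * eta / (2.0 * tau)), q)
```

Because Log_q and Exp_q are inverse to each other on the stated domains, applying quat_log to q_next recovers the integrated rotation vector exactly, and the result stays on S^3 by construction.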
Journal Title XX(X)
Figure 5.
A rotation matrix DMP is used to generate a discrete motion connecting $R_0$ and $R_g$. The training data (black dashed lines) are obtained from a minimum jerk trajectory connecting $R_0$ and $R_g$ in $T = 10$ s and used to learn the weights $w_i$ of Gaussian basis functions equally distributed in time. The results of the parameter learning procedure are shown in the bottom right panel. The exponentially decaying phase variable is used as shown in the middle right panel.

where
$$d(\tilde{q}, q) = \begin{cases} 2\pi, & \bar{q} * \tilde{q} = -1 + [0\ 0\ 0]^\top \\ 2\,\|\mathrm{Log}_q(\bar{q} * \tilde{q})\|, & \text{otherwise}. \end{cases}$$

Ude et al. (2014) extended the quaternion-based DMP formulation by rewriting (12) to include a goal switching mechanism:
$$\tau \dot{g}_q = \alpha_{qg}\, \mathrm{Log}_q(g_{q,\mathrm{new}} * \bar{g}_q) * g_q, \qquad (21)$$
so that $g_q$ continuously changes to $g_{q,\mathrm{new}}$ in real-time. Equation (21) should be integrated using (19) along with (15) and (16).

As shown by Saveriano et al. (2019) using Lyapunov arguments, both the quaternion DMP formulations in (Pastor et al. 2011) and in (Abu-Dakka et al. 2015a; Ude et al. 2014) asymptotically converge to the target quaternion $g_q$ with zero velocity.

In their work on orientation DMPs, Ude et al. (2014) also extended the DMP formulation to encode orientation trajectories represented in the form of rotation matrices $R(t) \in SO(3)$. Therefore, they rewrote (1) and (2) in the form
$$\tau \dot{\eta} = \alpha_z \big(\beta_z\, \mathrm{Log}_R(R_g R^\top) - \eta\big) + f_R(x), \qquad (22)$$
$$\tau \dot{R} = [\eta]_\times R, \qquad (23)$$
where $R_g$ represents the goal orientation and $[\eta]_\times$ is a skew-symmetric matrix, i.e., $[\eta]_\times^\top = -[\eta]_\times$. The relation between the angular velocity and the first time derivative of the rotation matrix is given by
$$[\omega]_\times = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix} = \dot{R} R^\top. \qquad (24)$$
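The skew operator in (23)–(24) and the associated $SO(3)$ exponential/logarithmic maps can be sketched as follows (a minimal NumPy illustration with our own function names; the integration step mirrors the Euler scheme used for the other DMP formulations):

```python
import numpy as np

def skew(w):
    # [w]_x operator from Eq. (24)
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rot_exp(w):
    # Rodrigues' formula: exponential map R^3 -> SO(3)
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rot_log(R):
    # Logarithmic map SO(3) -> R^3: axis-angle vector theta * n
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    n = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * n

def rot_step(R, eta, tau, dt):
    # One Euler integration step of Eq. (23)
    return rot_exp(dt * eta / tau) @ R
```

The round trip `rot_log(rot_exp(w)) == w` holds for rotation angles below $\pi$, matching the domain restriction noted for the quaternion maps.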
Figure 6.
An SPD DMP is used to generate a discrete motion connecting $X_0$ and $X_g$. The training data (black dashed lines) are obtained from a minimum jerk trajectory connecting $X_0$ and $X_g$ in $T = 100$ s and used to learn the weights $w_i$ of Gaussian basis functions equally distributed in time. The cone in the upper left corner represents the manifold of SPD data and includes the geodesic of the SPD profile. The results of the parameter learning procedure are shown in the bottom right panel. The exponentially decaying phase variable is used as shown in the middle right panel. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

The function $\mathrm{Log}_R(\cdot): SO(3) \mapsto \mathbb{R}^3$ is given as
$$\mathrm{Log}_R(R) = \begin{cases} [0\ 0\ 0]^\top, & R = I \\ \omega = \theta\, n, & \text{otherwise}, \end{cases} \qquad (25)$$
with
$$\theta = \arccos\!\left(\frac{\mathrm{trace}(R) - 1}{2}\right), \qquad n = \frac{1}{2\sin(\theta)} \begin{bmatrix} r_{32} - r_{23} \\ r_{13} - r_{31} \\ r_{21} - r_{12} \end{bmatrix}.$$
The generated rotation matrices can be obtained by integrating (23) as follows:
$$R(t + \delta t) = \mathrm{Exp}_R\!\left(\frac{\delta t}{\tau} [\eta]_\times\right) R(t). \qquad (26)$$
The function $\mathrm{Exp}_R(\cdot): \mathbb{R}^3 \mapsto SO(3)$ is given as
$$\mathrm{Exp}_R(t [\omega]_\times) = I + \sin(\theta)\, \frac{[\omega]_\times}{\|\omega\|} + (1 - \cos(\theta))\, \frac{[\omega]_\times^2}{\|\omega\|^2}, \qquad (27)$$
where $\theta(t) = t \|\omega\|$ expresses the rotation angle within time $t$. An exemplar rotation matrix DMP is shown in Figure 5.

Abu-Dakka and Kyrki (2020) generalized the DMP formulation to encode robotic manipulation data profiles encapsulated in the form of SPD matrices. Define $X \in S_{++}^m$ as an arbitrary SPD matrix and $\Xi = \{t_i, X_i\}_{i=1}^{T}$ as the set of SPD matrices in one demonstration, where $S_{++}^m$ denotes the set of $m \times m$ SPD matrices. We can then rewrite (1) and (2) as follows:
$$\tau \dot{\sigma} = \alpha_z \big(\beta_z\, \mathrm{vec}(B_{X \mapsto X_0}(\mathrm{Log}_{X}(X_g))) - \sigma\big) + F(x), \qquad (28)$$
$$\tau \dot{\xi} = \sigma, \qquad (29)$$
where $\sigma = \mathrm{vec}(\Sigma)$ is the Mandel representation of the symmetric matrix $\Sigma$, and $\Sigma$ is the time derivative of $\Xi$, i.e., $\Sigma_t \equiv \dot{\Xi}_t = \mathrm{Log}_{X_{t-1}}(X_t)/\delta t$.
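The Mandel vectorization used in (28)–(31) can be sketched directly (a minimal NumPy illustration; function names are ours). The $\sqrt{2}$ factor on the off-diagonal entries makes the map an isometry: the Euclidean norm of the vector equals the Frobenius norm of the matrix, which is why the vectorized state can be treated like an ordinary DMP state.

```python
import numpy as np

def vec_mandel(S):
    # Mandel vectorization of a symmetric m x m matrix, cf. the 2x2
    # example in Eq. (30): diagonal first, then sqrt(2)-scaled off-diagonals
    m = S.shape[0]
    off = [np.sqrt(2.0) * S[i, j] for i in range(m) for j in range(i + 1, m)]
    return np.concatenate([np.diag(S), off])

def mat_mandel(v, m):
    # Inverse mapping: rebuild the symmetric matrix from its Mandel vector
    S = np.diag(np.asarray(v[:m], dtype=float))
    idx = m
    for i in range(m):
        for j in range(i + 1, m):
            S[i, j] = S[j, i] = v[idx] / np.sqrt(2.0)
            idx += 1
    return S
```

For a $2 \times 2$ matrix this reproduces exactly the example in (30).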
The function $\mathrm{Log}_{X_1}(X_2): \mathcal{M} \mapsto T_{X_1}\mathcal{M}$ maps a point $X_2$ in the manifold $\mathcal{M}$ to a point $\Delta$ in the tangent space $T_{X_1}\mathcal{M}$. $\mathrm{vec}(\cdot)$ is a function that transforms a symmetric matrix into a vector using Mandel's notation, e.g., the vectorization of a $2 \times 2$ symmetric matrix is
$$\mathrm{vec}\!\left(\begin{bmatrix} a & b \\ b & d \end{bmatrix}\right) = \begin{bmatrix} a \\ d \\ \sqrt{2}\, b \end{bmatrix}. \qquad (30)$$
$\xi$ is the vectorization of $\Xi$, and $X_g \in S_{++}^m$ represents the goal SPD matrix. $\mathrm{vec}(B_{X \mapsto X_0}(\mathrm{Log}_{X}(X_g)))$ is the vectorization of the symmetric matrix $\mathrm{Log}_{X}(X_g)$ parallel transported over the geodesic from $X$ to $X_0$. Equation (29) is then integrated as
$$\hat{X}(t + \delta t) = \mathrm{Exp}_{X(t)}\!\left(\frac{\delta t}{\tau}\, B_{X_0 \mapsto X(t)}(\mathrm{mat}(\sigma(t)))\right), \qquad (31)$$
where the function $\mathrm{mat}(\cdot)$ is the inverse of $\mathrm{vec}(\cdot)$ and denotes matricization using Mandel's notation. $\hat{X} \in S_{++}^m$ represents the new SPD-matrices-based robot skill. The function $\mathrm{Exp}_{X_1}(\Delta): T_{X_1}\mathcal{M} \mapsto \mathcal{M}$ maps a point $\Delta \in T_{X_1}\mathcal{M}$ to a point $X_2 \in \mathcal{M}$, so that $X_2$ lies on the geodesic starting from $X_1 \in S_{++}^m$ in the direction of $\Delta$. An exemplar SPD DMP is shown in Figure 6.

Moreover, Abu-Dakka and Kyrki (2020) rewrote (12) for smooth goal adaptation in case of sudden goal switching as follows:
$$\tau \dot{g} = \alpha_g\, \mathrm{Log}_{g_{\mathrm{new}}}(g), \qquad (32)$$
so that the goal $g$ is continually updated towards $g_{\mathrm{new}}$.

Periodic DMPs (sometimes called rhythmic DMPs) are used when the encoded motion follows a rhythmic pattern.
The classical periodic (or rhythmic) DMPs were first introduced by Ijspeert et al. (2002b), who redefined the second-order differential equation system described in (1) and (2) as follows:
$$\dot{z} = \Omega \big(\alpha_z (\beta_z (g - y) - z) + f(\phi)\big), \qquad (33)$$
$$\dot{y} = \Omega z, \qquad (34)$$
$$\tau \dot{\phi} = 1, \qquad (35)$$
where $\Omega$ is the frequency and $y$ is the desired periodic trajectory that we want to encode with a DMP. The main difference between periodic DMPs and point-to-point DMPs is that the time constant related to the trajectory duration is replaced by the frequency of trajectory execution (refer

Figure 7.
A classical DMP is used to reproduce a rhythmic motion (brown solid line in the top left panel). The desired trajectory is obtained by adding Gaussian noise to $y_d = \cos(2\pi t)$ and computing the numerical derivatives (black dashed lines). The forcing term is obtained as the weighted summation of Gaussian basis functions equally distributed in time (bottom left panel). The results of the parameter learning procedure are shown in the bottom right panel. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

to (Ijspeert et al. 2013, 2002b) for details). In addition, periodic DMPs must ensure that the initial phase ($\phi = 0$) and the final one ($\phi = 2\pi$) coincide in order to achieve a smooth transition across repetitions.

Similar to (4), $f(\phi)$ is defined with $N$ Gaussian kernels according to the following equations:
$$f(\phi) = \frac{\sum_{i=1}^{N} \Psi_i(\phi)\, w_i}{\sum_{i=1}^{N} \Psi_i(\phi)}\, r, \qquad (36)$$
$$\Psi_i(\phi) = \exp\big(h (\cos(\phi - c_i) - 1)\big), \qquad (37)$$
where the weights are uniformly distributed along the phase space, and $r$ is used to modulate the amplitude of the periodic signal (Ijspeert et al. 2002b; Gams et al. 2009) (if not used, it can be set to $r = 1$ (Peternel et al. 2016a)).

Similarly to discrete DMPs, LWR (Schaal and Atkeson 1998) can be used to update the weights to learn a desired trajectory. In a standard periodic DMP setting (Ijspeert et al. 2002b; Gams et al. 2009), the desired shape $f_d$ is approximated by solving
$$f_d(t) = \frac{\ddot{y}_d(t)}{\Omega^2} - \alpha_z \left(\beta_z (g - y_d(t)) - \frac{\dot{y}_d(t)}{\Omega}\right), \qquad (38)$$
where $y_d$ is a demonstrated input trajectory that needs to be encoded. The weights $w_i$ can be updated using the recursive least-squares method (Schaal and Atkeson 1998) with forgetting factor $\lambda$ based on the error between the
Table 3.
Summary of DMP basic formulations.
Type of movement | Space | System of equations | Reference | Short description

Discrete | $\mathbb{R}$ | $\tau\dot{z} = \alpha_z(\beta_z(g - y) - z) + f(x)$, $\tau\dot{y} = z$, $\tau\dot{x} = -\alpha_x x$ | Eqs. (1)–(3), (Ijspeert et al. 2002c) | A single-DoF, discrete motion trajectory is encoded into a linear, second-order dynamical system with an additive, non-linear forcing term. Convergence to the desired goal $g$ is ensured by a vanishing phase variable $x$.

Discrete | $S^3$ | $\tau\dot{\eta} = \alpha_z(\beta_z\,\mathrm{Log}_q(g_q * \bar{q}) - \eta) + f_q(x)$, $\tau\dot{q} = \frac{1}{2}\eta * q$ | Eqs. (15)–(16), (Abu-Dakka et al. 2015a) | A quaternion-based orientation trajectory (3 DoFs) is encoded into a second-order dynamical system with an additive, non-linear forcing term. The error definition complies with the geometry of the unit quaternion space.

Discrete | $SO(3)$ | $\tau\dot{\eta} = \alpha_z(\beta_z\,\mathrm{Log}_R(R_g R^\top) - \eta) + f_R(x)$, $\tau\dot{R} = [\eta]_\times R$ | Eqs. (22)–(23), (Ude et al. 2014) | A rotation-matrix-based orientation trajectory (3 DoFs) is encoded into a second-order dynamical system with an additive, non-linear forcing term. The error definition complies with the geometry of the rotation matrices space.

Discrete | $S_{++}^m$ | $\tau\dot{\sigma} = \alpha_z(\beta_z\,\mathrm{vec}(B_{X \mapsto X_0}(\mathrm{Log}_{X}(X_g))) - \sigma) + F(x)$, $\tau\dot{\xi} = \sigma$ | Eqs. (28)–(29), (Abu-Dakka and Kyrki 2020) | An SPD-matrix trajectory, $m(m+1)/2$ DoFs, is encoded into a second-order dynamical system with an additive, non-linear forcing term. The error definition complies with the geometry of the SPD matrices space.
Periodic | $\mathbb{R}$ | $\dot{z} = \Omega(\alpha_z(\beta_z(g - y) - z) + f(\phi))$, $\dot{y} = \Omega z$, $\tau\dot{\phi} = 1$ | Eqs. (33)–(35), (Ijspeert et al. 2002b) | A single-DoF, periodic motion trajectory is encoded into a linear, second-order dynamical system with an additive, non-linear forcing term. The resulting system generates a stable limit cycle.

desired trajectory shape and the currently learned shape:
$$w_i(t+1) = w_i(t) + \Psi_i P_i(t+1)\, r\, e_r(t), \qquad (39)$$
$$e_r(t) = f_d(t) - w_i(t)\, r, \qquad (40)$$
$$P_i(t+1) = \frac{1}{\lambda}\left(P_i(t) - \frac{P_i(t)^2 r^2}{\frac{\lambda}{\Psi_i} + P_i(t) r^2}\right). \qquad (41)$$
The initial values of the parameters are $w_i(0) = 0$ and $P_i(0) = 1$. The forgetting factor $\lambda$ determines the rate of weight changes; refer to (Schaal and Atkeson 1998) for details on parameter setting. An exemplar rhythmic DMP is shown in Figure 7.

The classical periodic DMP described by (33)–(35) does not encode the transient motion needed to start the periodic one. Transients are important in several applications, like humanoid robot walking, where the first step from a rest position is usually a transient needed to start the periodic motion. To overcome this limitation, Ernesti et al. (2012) modified the classical formulation of periodic DMPs to explicitly consider transients as motion trajectories that converge towards the limit-cycle (i.e., periodic) one.

A summary of the existing DMP formulations mentioned in the earlier sections is given in Table 3. The table shows the variations of the formulation in its standard shape based on the space they are applied to. Modifications of this standard shape (e.g., adding a coupling term) are discussed in the next section as extensions of the DMP formulations.
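The on-line learning loop in (36)–(41) can be sketched as follows. This is an illustrative NumPy example on a synthetic cosine demonstration; the kernel count, width $h$, gains, and forgetting factor are our own choices, not values from the cited works.

```python
import numpy as np

N = 25                                # number of kernels (illustrative)
h = 2.5 * N                           # kernel width (illustrative heuristic)
c = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
Omega, alpha_z, beta_z = 1.0, 25.0, 6.25
g, r, lam = 0.0, 1.0, 0.995           # anchor, amplitude, forgetting factor

def kernels(phi):
    # Eq. (37): periodic kernels centered at c_i
    return np.exp(h * (np.cos(phi - c) - 1.0))

def forcing(phi, w):
    # Eq. (36): normalized weighted sum, modulated by the amplitude r
    psi = kernels(phi)
    return (psi @ w) * r / psi.sum()

def desired_shape(y, yd, ydd):
    # Eq. (38): target shape computed from the demonstration
    return ydd / Omega**2 - alpha_z * (beta_z * (g - y) - yd / Omega)

def train(num_steps=20000, dt=0.005):
    # Incremental recursive least squares with forgetting, Eqs. (39)-(41)
    w, P = np.zeros(N), np.ones(N)
    for k in range(num_steps):
        t = k * dt
        y, yd, ydd = np.cos(t), -np.sin(t), -np.cos(t)   # demonstration
        phi = (Omega * t) % (2.0 * np.pi)
        fd = desired_shape(y, yd, ydd)
        er = fd - w * r                                          # Eq. (40)
        psi = kernels(phi)
        P = (P - (P**2 * r**2) / (lam / psi + P * r**2)) / lam   # Eq. (41)
        w = w + psi * P * r * er                                 # Eq. (39)
    return w

w = train()
```

After a few periods the learned forcing term closely tracks the desired shape, illustrating the gradual refinement over repetitions described above.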
A desirable property of motion primitives is the ability to generalize to unforeseen situations. In this section, we present approaches that adapt DMP motion trajectories to novel execution contexts.
Classical DMPs are time invariant, meaning that a time scaling $\varsigma\tau$ with $\varsigma > 0$ generates topologically equivalent trajectories (Ijspeert et al. 2013). Using a simple modification of the transformation system, namely substituting (1) with
$$\tau \dot{z} = \alpha_z (\beta_z (g - y) - z) + (g - y_0)\, f(x), \qquad (42)$$
Ijspeert et al. (2013) show that DMPs are also scale invariant, meaning that a scaling of the movement amplitude $\varsigma (g - y_0)$ with $\varsigma > 0$ generates topologically equivalent trajectories. The term $(g - y_0) f(x)$ marks the difference between (42) and (1). Apart from generating scaled (in time and space) versions of the demonstrated motion trajectory, classical DMPs also generalize to different initial/target states. However, the classical formulation, and its extension in (42), may exhibit dangerous behaviors like over-amplification of the trajectory when reaching a different target and high accelerations when switching to a different target on-line (Pastor et al. 2009; Ijspeert et al. 2013). To alleviate the second issue, Ijspeert et al. (2013) replaced hard goal switches with the smooth switching law in (12). However, the over-amplification issue remains. Moreover, a DMP that uses (42) fails to learn motions with the same initial and target states (i.e., $g = y_0 \Rightarrow z = 0 \Rightarrow y(t) = y_0 = g\ \forall t$).

To remedy these issues, Pastor et al. (2009) proposed to modify the transformation system as
$$\tau \dot{z} = \alpha_z \big(\beta_z (g - y - (g - y_0)\, x + f(x)) - z\big), \qquad (43)$$
where the most important change with respect to (1) is the term $(g - y_0)\, x$, which has several benefits. It prevents high accelerations at the beginning of the motion ($g - y - (g - y_0) x = 0$ for $t = 0$) or when the goal is close to the initial state.
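The effect of the $(g - y_0)x$ term can be checked numerically. The sketch below (our notation, illustrative gains) compares the initial acceleration commanded by the classical transformation system (1) with that of (43) after a large goal switch:

```python
# Initial acceleration of the classical system (1) versus the modified
# system (43) after a goal switch. Gains are illustrative choices.
alpha_z, beta_z, tau = 25.0, 6.25, 1.0

def zdot_classical(y, z, g, f):
    # Eq. (1): the goal offset enters the acceleration directly
    return (alpha_z * (beta_z * (g - y) - z) + f) / tau

def zdot_pastor(y, z, g, y0, x, f):
    # Eq. (43): at t = 0 (y = y0, x = 1) the goal-dependent part cancels
    return alpha_z * (beta_z * (g - y - (g - y0) * x + f) - z) / tau

y0, g_new = 0.0, 10.0                                   # goal moved far away
a_classical = zdot_classical(y0, 0.0, g_new, 0.0)       # large initial jump
a_pastor = zdot_pastor(y0, 0.0, g_new, y0, 1.0, 0.0)    # exactly zero
```

With these values, the classical system commands an initial acceleration of $\alpha_z \beta_z (g_{\mathrm{new}} - y_0) = 1562.5$, while (43) starts at rest regardless of how far the goal is moved.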
It allows reproducing motions with the same initial and target states, and it prevents over-amplification and trajectory mirroring effects* when changing the goal. Hoffmann et al. (2009) derived a multidimensional representation of (43) from the behavior of the spinal force fields in frogs.

The goal can also change over time and, in this case, the tracking performance of the DMP mostly depends on the gains $\alpha_z$ and $\beta_z$. As proposed by Koutras and Doulgeri (2020b), the tracking performance can be improved by adapting the temporal scaling $\tau$.

Dragan et al. (2015) showed that DMPs solve a trajectory optimization problem that minimizes a particular Hilbert norm between the demonstration and the new trajectory subject to start and goal constraints. In this light, DMP adaptation capabilities to different starts and goals can be improved by choosing (or learning) a proper Hilbert norm that reduces the deformation in the retrieved trajectory.

A via-point can be defined as a point in the state space through which the trajectory has to pass. Failing to pass a via-point may cause the robot to fail the task execution. Therefore, having a motion primitive representation capable of modulating via-points is of importance in robotic scenarios. It is not surprising that researchers have extended the DMP formulation to consider intermediate via-points in the trajectory generation process.

Ning et al. (2011, 2012) extended the classical DMP to satisfy position and velocity constraints at the beginning and at the end of a sample trajectory. Their approach to traverse via-points consists of creating a sample trajectory by combining locally-linear trajectories connecting the via-points. This sample trajectory is used to fit a DMP that is constrained to pass the via-points.

Weitschat and Aschemann (2018) considered each via-point as an intermediate goal (via-goal) $g_v$, for $v = 1, \ldots, V$, to reach. The last via-goal $g_V$ corresponds to the target state of the DMP.
In their formulation, they defined a variable goal as
$$g_{\mathrm{via}}(x) = \sum_{v=1}^{V} \Psi_v(x)\, g_v, \qquad (44)$$
where $\Psi_v(x)$ is the Gaussian basis function centered at the time corresponding to the $v$-th via-goal. The effectiveness of the approach is demonstrated in a task where the robot has to reach a different target while preventing possible self-collisions of the end-effector with the robot body. To this end, the authors place the via-goals along the trajectory used to learn the DMP, forcing the generated trajectory to stay close to the demonstration while reaching the new target.

The problem of generalizing to via-points close to (interpolation) and far from (extrapolation) the demonstration is faced by Zhou et al. (2019). Their approach, namely Via-points Movement Primitives (VMPs), combines the benefits of DMPs and Probabilistic Movement Primitives (ProMPs) (Paraschos et al. 2013). The authors assumed that the motion trajectory is generated as
$$y_{\mathrm{vmp}}(x) = e(x) + f_{\mathrm{vmp}}(x), \qquad (45)$$
where $x$ is the phase variable defined as in (3) and the elementary trajectory $e(x)$ can be defined as the linear attractor $e(x) = g + (y_0 - g)\, x$. The shape modulation term $f_{\mathrm{vmp}}(x)$ is defined as
$$f_{\mathrm{vmp}}(x) = \sum_{i=1}^{N} w_i \Psi_i(x) + \epsilon_f, \qquad (46)$$
where the Gaussian kernels $\Psi_i(x)$ are defined as in (5), $w_i$ are learnable weights, and $\epsilon_f$ is Gaussian noise. As detailed in (Paraschos et al. 2013), learning the shape modulation term $f_{\mathrm{vmp}}(x)$ amounts to learning from demonstrations the prior probability distribution of the weights $w_i$. Separating the generated trajectory into two parts as in (45) allows adopting different strategies to pass a via-point $y_v$ at $x_v$. Zhou et al. (2019) proposed to modify the shape modulation term in interpolation cases, i.e., when the via-point is "close" to the demonstrations. In extrapolation cases, instead, the elementary trajectory $e(x)$ is rewritten as the polygonal line connecting $y_0$, $y_v$, and $g$.
This approach easily generalizes to the case of multiple via-points. VMPs are experimentally compared with ProMPs, showing better performance especially in extrapolation cases.

Reaching a different goal, or passing through via-points, may not be enough to successfully execute a task in a different context. The approaches presented in this section adapt the DMP

* As discussed by Pastor et al. (2009), a transformation system that uses (42) generates a mirrored trajectory while reaching a new goal $g_{\mathrm{new}}$ every time the signs of $(g_{\mathrm{new}} - y_0)$ and $(g - y_0)$ differ.
[Figure 8 panels: (a) position, (b)-(c) linear velocity, (d) goal switch, (e) position error, (f) quaternion, (g)-(h) angular velocity, (i) goal switch, (j) orientation error]
Figure 8.
Results obtained by applying the zero velocity switch approach to join two DMPs trained on synthetic data. The training trajectories for the position and the orientation are shown as black dashed lines in (a)-(b) and (f)-(g), respectively. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

motion to new situations by adjusting the weights $w_i$ of the forcing term (4), which modifies the entire DMP trajectory. Weitschat et al. (2013) considered that $L$ demonstrations are given, each encoded in a different DMP. In order to generalize, for instance, to a new goal $g_{\mathrm{new}}$, they proposed to interpolate the weights of nearby DMPs, i.e., DMPs that reached points around $g_{\mathrm{new}}$. In formulas, $w_{\mathrm{new}} = \sum_{\forall o: d_o}\cdots$

… interpolation of DMP weights in the latent space results in better generalization performance.

An important and desired feature of any motion primitive representation is the possibility to combine basic movements to obtain more complex behaviors (Schaal 1999). We review here three prominent approaches developed to smoothly join a sequence of DMPs. In this tutorial, we name the approach by Pastor et al. (2009) velocity threshold, that in (Kober et al. 2010b) target crossing, and that in (Kulvicius et al. 2011, 2012) basis functions overlay. Some of the presented approaches modify the DMP formulations in Sections 2.1.1 and 2.1.2, and the main differences are highlighted in the text. The approaches have been implemented in Matlab for both position (Section 2.1.1) and orientation (Section 2.1.2) DMPs. The source code is included in our public repository (see Table 4). Results on synthetic data are shown in Figures 8 to 11.

A properly designed DMP reaches the desired target with zero velocity and acceleration, i.e., once a DMP is fully executed the robot comes to a full stop.
This also implies that the velocity "close" to the target is continuously decreasing. Using this property, Pastor et al. (2009) proposed to combine successive DMPs by simply terminating the current DMP when the velocity is below a certain threshold and then starting the following primitive. When executing a single DMP, it is common practice to initialize its velocity to zero, as the robot is assumed to be still. In principle, this initialization can be used to sequence multiple DMPs (Xu and Wang 2004; Lioutikov et al. 2016), but it may generate discontinuities if the robot does not fully stop between two consecutive primitives. To prevent these discontinuities, Pastor et al. initialized the state of the current DMP with that of the previous one.

The velocity threshold approach is simple and effective since it directly applies to the DMP formulations in Sections 2.1.1 and 2.1.2. For instance, Saveriano et al. (2019) showed how to join multiple quaternion DMPs† (see Section 2.1.2.1) with the velocity threshold approach. Results in Figure 8 are obtained when the velocity threshold is applied to merge DMPs separately trained to fit minimum jerk trajectories (black dashed lines). Figures 8a-8e show the position and Figures 8f-8j the orientation (unit quaternion) parts of the motion. The merged trajectory is generated by following the first DMP until the distance from the via-point falls below given thresholds (in [m] for the position and [rad] for the orientation). As shown in Figures 8d and 8i, the switch occurs shortly before half of the motion time. Figures 8e and 8j show that the desired trajectory is accurately reproduced. More or less accurate trajectories can be obtained by tuning the distance from the via-point. However, the value of this distance affects the time duration of the generated trajectory: a bigger (smaller) distance results in a shorter (longer) trajectory. For instance, in the considered case, the total motion ends earlier than the demonstration does.
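The velocity-threshold switching rule described above can be sketched as follows. This is an illustrative 1-D example: the forcing term is omitted (pure goal attraction) and all gains and thresholds are our own choices.

```python
import numpy as np

alpha_z, beta_z, tau, dt = 25.0, 6.25, 1.0, 0.002

def run_sequence(goals, y0, v_th=1e-3):
    # Chain point-to-point DMPs: switch to the next goal once the scaled
    # velocity z drops below v_th. The state (y, z) of each primitive is
    # initialized with that of the previous one, as proposed by Pastor et al.
    y, z, traj = y0, 0.0, []
    for g in goals:
        started = False
        for _ in range(20000):
            z += (alpha_z * (beta_z * (g - y) - z) / tau) * dt
            y += (z / tau) * dt
            traj.append(y)
            if abs(z) > 10.0 * v_th:
                started = True          # the primitive has accelerated
            if started and abs(z) < v_th:
                break                   # slow enough: start the next primitive
    return np.array(traj)
```

Lowering `v_th` lengthens each segment and increases accuracy near the intermediate goals; raising it shortens the overall motion — the duration/accuracy trade-off discussed above.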
Depending on the application, the time difference may cause failures; therefore, it has to be taken into account. Finally, the velocity threshold approach may generate discontinuities if the target of the current DMP is far from the demonstrated initial point of the following primitive.

Figure 9. The constant goal, moving target, and delayed goal obtained with $y(0) = 0$ m and $g = 1$ m (left), and with $q(0) = 1 + [0\ 0\ 0]^\top$ and $g_q = 0 + [1\ 0\ 0]^\top$ (right). Only the scalar part $\nu$ of the quaternion is shown for better visualization.

There exist movements, like hitting or batting, that are correctly executed only if the target is reached with a non-zero velocity. To this end, Kober et al. (2010b) extended the classical DMP formulation in Section 2.1.1 to let the DMP track a target moving at a given velocity. In their approach, the DMP passes the target with a given velocity exactly after $T$ seconds. To achieve this, the acceleration in (1) is rewritten as
$$\tau \dot{z} = (1 - x)\,\alpha_z \left(\beta_z (\hat{g} - y) + \tau (\dot{\hat{y}} - \dot{y})\right) + f(x), \qquad (48)$$
where $\dot{\hat{y}}$ is the desired velocity of the moving target $\hat{g}$, which is defined as
$$\hat{g} = \hat{g}(0) - \dot{\hat{y}}\, \tau\, \frac{\ln(x)}{\alpha_x}, \qquad (49)$$
$$\hat{g}(0) = g - T \dot{\hat{y}}. \qquad (50)$$
By inspecting (49) and (50), and considering that the term $-\tau \ln(x)/\alpha_x$ represents the elapsed time if $x$ is the phase defined in (3), it is possible to show that the moving target $\hat{g}$ is designed to reach the goal $g$ after $T$ seconds, i.e., $\hat{g}(T) = g$ (Fig. 9, left). The initial position $\hat{g}(0)$ of the moving target is obtained by moving the goal position $g$ for $T$ seconds at constant velocity $-\dot{\hat{y}}$. High accelerations at the beginning of the movement are avoided by the pre-factor $(1 - x)$, which is zero at the beginning of the motion ($x(0) = 1$).
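The property $\hat{g}(T) = g$ is easy to verify numerically. The sketch below implements (49)–(50) with illustrative parameter values of our choosing:

```python
import numpy as np

# Numerical check of the moving target in Eqs. (49)-(50).
# alpha_x, tau, T_hit, g, and yhat_dot are illustrative values.
alpha_x, tau, T_hit, g, yhat_dot = 1.0, 1.0, 5.0, 2.0, 0.1

def phase(t):
    # closed-form solution of the canonical system (3)
    return np.exp(-alpha_x * t / tau)

def moving_target(t):
    g_hat_0 = g - T_hit * yhat_dot                                # Eq. (50)
    return g_hat_0 - yhat_dot * tau * np.log(phase(t)) / alpha_x  # Eq. (49)
```

Since $-\tau \ln(x)/\alpha_x$ equals the elapsed time $t$, the target moves linearly at velocity $\dot{\hat{y}}$ and `moving_target(T_hit)` returns the goal `g` exactly.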
The approach by Nemec and Ude (2012) combines a moving target and a particular initialization of the subsequent DMP to ensure continuity of the movement up to second-order derivatives.

Saveriano et al. (2019) extended this idea to quaternion DMPs. The angular acceleration in (15) is modified as
$$\tau \dot{\eta} = (1 - x)\,\alpha_z \left(\beta_z\, \mathrm{Log}_q(\hat{g}_q * \bar{q}) + \tau (\hat{\omega} - \omega)\right) + f_q(x), \qquad (51)$$
where $\hat{\omega}$ is the angular velocity of the moving quaternion target $\hat{g}_q$ and $\mathrm{Log}_q(\hat{g}_q * \bar{q})$ measures the error between the current orientation $q$ and $\hat{g}_q$. The pre-factor $(1 - x)$ is used to avoid high angular accelerations at the beginning of the motion.

† Saveriano et al. used the multi-dimensional DMP formulation developed in (Hoffmann et al. 2009) for both position and quaternion DMPs. In this review paper, we reformulate the merging approaches in (Saveriano et al. 2019) to comply with the formulations in Sections 2.1.1 and 2.1.2.1.

[Figure 10 panels: (a) position, (b)-(c) linear velocity, (d) goal switch, (e) position error, (f) quaternion, (g)-(h) angular velocity, (i) goal switch, (j) orientation error]

Figure 10. Results obtained by applying the target crossing approach to join two DMPs trained on synthetic data. The training trajectories for the position and the orientation are shown as black dashed lines in (a)-(b) and (f)-(g), respectively. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.
The moving target for the quaternion DMPs is defined as
$$\hat{g}_q = \mathrm{Exp}_q\!\left(-\frac{\tau \ln(x)}{2 \alpha_x}\, \hat{\omega}\right) * \hat{g}_q(0), \qquad \hat{g}_q(0) = \mathrm{Exp}_q\!\left(-\frac{T}{2}\, \hat{\omega}\right) * g_q, \qquad (52)$$
where $g_q$ is the goal quaternion, $T$ is the time duration of the DMP, and the exponential map $\mathrm{Exp}_q(\cdot)$ is defined in (19). As shown in Figure 9 (right), the moving target $\hat{g}_q$ reaches the goal orientation after $T$ seconds, i.e., $\hat{g}_q(T) = g_q$. This can be easily verified by considering that the initial value of the moving target $\hat{g}_q(0)$ is computed by moving the goal orientation $g_q$ for $T$ seconds at the desired velocity $-\hat{\omega}$.

The presented target crossing approach allows crossing the target after $T$ seconds. Assuming two DMPs with time durations $T_1$ and $T_2$, respectively, one can join them by running the first DMP for $T_1$ seconds and then switching to the second one. As for the velocity threshold approach, possible discontinuities at the switching point are prevented by initializing the state of DMP$_2$ with the final state of DMP$_1$. This procedure can be repeated to join $L \geq 2$ consecutive DMPs.

Results in Figure 10 are obtained when target crossing is applied to merge DMPs separately trained to fit minimum jerk trajectories (black dashed lines). Figures 10a-10e show the position and Figures 10f-10j the orientation (unit quaternion) parts of the motion. The merged trajectory is generated by following the first DMP for $T_1 = 5$ s and then switching to the second one. The required intermediate velocity is set to a small non-zero value (in m/s for the position and rad/s for the orientation) in each direction. The generated trajectory reaches the goal in the demonstrated time, i.e., demonstration and execution times are the same. As required, the via-point is crossed at $T_1 = 5$ s with the desired velocity (Figs. 10c and 10h). However, the non-zero crossing velocity introduces a deformation in the first part of the trajectory (Figs. 10e and 10j).

The approach by Kulvicius et al.
(2011, 2012) combines multiple DMPs into a complex one, guaranteeing a smooth transition between the primitives by ensuring that the basis functions composing $f(x)$ in (4) overlap at the switching instances. First of all, Kulvicius et al. adopted the sigmoidal phase variable in (13) instead of the exponentially decaying one (3). As discussed in Section 2.1.1.3, the sigmoidal phase is $\approx 1$ for the large part of the motion, which makes it possible to use smaller forcing terms to reproduce the demonstrations. On the contrary, the exponential phase is close to zero already before $T$ s (Fig. 3), which results in larger forcing terms.

The classical acceleration dynamics in (1) is modified as
$$\tau \dot{z} = \alpha_z (\beta_z (\tilde{g} - y) - z) + f(s). \qquad (53)$$
Similarly to target crossing, Kulvicius et al. used a moving target $\tilde{g}$ in the acceleration dynamics, but called it the delayed goal function. The $\tilde{g}$ term in (53) is obtained by integrating
$$\tau \dot{\tilde{g}} = \begin{cases} \dfrac{g - y_0}{T}, & t \leq T \\ 0, & \text{otherwise} \end{cases} \qquad (54)$$
with $\tilde{g}(0) = y_0$. The delayed goal function in Figure 9 moves linearly from $y_0$ to $g$ in $T$ seconds and then remains constant, i.e., $\tilde{g}(t \geq T) = g$.

The non-linear forcing term $f(s)$ in (53) slightly differs from the classical one in (4) and is defined as
$$f(s) = \frac{\sum_{i=1}^{N} w_i \Psi_i(t)}{\sum_{i=1}^{N} \Psi_i(t)}\, s, \qquad \Psi_i(t) = \exp\!\left(-\frac{\left(\frac{t}{\tau T} - c_i\right)^2}{2 \sigma_i^2}\right), \qquad (55)$$

[Figure 11 panels: (a) position, (b)-(c) linear velocity, (d) goal switch, (e) position error, (f) quaternion, (g)-(h) angular velocity, (i) goal switch, (j) orientation error]

Figure 11.
Results obtained by applying the basis functions overlay approach to join two DMPs trained on synthetic data. The training trajectories for the position and the orientation are shown as black dashed lines in (a)-(b) and (f)-(g), respectively. Results are obtained with the open source implementation available at https://gitlab.com/dmp-codes-collection.

where $\sigma_i$ is the width and $c_i$ is the center of the $i$-th basis function, and $s$ is obtained by integrating (13). The term $t/(\tau T)$ is used in (55) instead of the phase variable $x$. Being $0 \leq t/(\tau T) \leq 1$, the basis functions are equally spaced between $0$ and $1$. Finally, the widths $\sigma_i$ of the kernels are constant and depend on the number of kernels.

Having presented the main differences with respect to the canonical approach, it is possible to focus on how Kulvicius et al. (2012) solved the problem of joining $L \geq 2$ DMPs. In general, each of the $L$ DMPs has a different time duration $T_l$, desired target $g_l$, and initial position $y_{l,0}$, from which it is possible to compute the delayed goal functions by integrating
$$\tau \dot{\tilde{g}}_l = \begin{cases} \dfrac{g_l - y_{l,0}}{T_l}, & \sum_{\kappa=1}^{l-1} T_\kappa \leq t \leq \sum_{\kappa=1}^{l} T_\kappa \\ 0, & \text{otherwise}. \end{cases} \qquad (56)$$
Note that, being $\tilde{g}_1(0) = y_0$, the acceleration (53) is smooth at the beginning of the motion. For this reason, the term $(1 - x)$ used in (48) is not needed in (53).

Assuming that $L$ DMPs have been trained and that each DMP has $N$ kernels, we can merge them into one DMP as follows. The centers of the joined DMP are computed as
$$\check{c}_i^l = \begin{cases} \dfrac{T_1 (i - 1)}{T_{\mathrm{join}} (N - 1)}, & l = 1 \\[2mm] \dfrac{T_l (i - 1)}{T_{\mathrm{join}} (N - 1)} + \dfrac{1}{T_{\mathrm{join}}} \sum_{\kappa=1}^{l-1} T_\kappa, & \text{otherwise}, \end{cases} \qquad (57)$$
where $T_l$ is the duration of the $l$-th DMP and $T_{\mathrm{join}} = \sum_{l=1}^{L} T_l$ is the duration of the joined motion. The widths of the joined DMP are computed as
$$\check{\sigma}_i^l = \sigma_i^l\, \frac{T_l}{T_{\mathrm{join}}}. \qquad (58)$$
The centers and widths computed in (57) and (58), respectively, overlap at the transition points, allowing for smooth transitions between consecutive DMPs.
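The remapping in (57)–(58) can be sketched as follows (illustrative NumPy code with our own function names):

```python
import numpy as np

def merge_kernels(durations, N, sigmas):
    # Eqs. (57)-(58): remap the kernel centers and widths of L separately
    # trained DMPs onto the normalized joint time axis of length T_join
    T_join = sum(durations)
    centers, widths, offset = [], [], 0.0
    for l, T_l in enumerate(durations):
        for i in range(1, N + 1):
            # Eq. (57): centers of the l-th DMP, shifted by the elapsed time
            centers.append(T_l * (i - 1) / (T_join * (N - 1)) + offset / T_join)
            # Eq. (58): widths shrink proportionally to T_l / T_join
            widths.append(sigmas[l][i - 1] * T_l / T_join)
        offset += T_l
    return np.array(centers), np.array(widths)
```

For example, with two 5 s DMPs and $N = 3$, the merged centers are $[0, 0.25, 0.5, 0.5, 0.75, 1.0]$: the last kernel of the first primitive and the first kernel of the second coincide at the transition point, which is what produces the overlap and hence the smooth switch.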
The weights of the joined DMP are obtained by stacking the $N$ weights of each of the $L$ DMPs. Therefore, the joined DMP has $N \cdot L$ kernels and $N \cdot L$ weights. The phase variable (13) is modified to run for the duration $T_{\mathrm{join}}$ of the joint motion.

Saveriano et al. (2019) extended the basis functions overlay approach to quaternion DMPs. Assume that a sequence of $L$ quaternion DMPs is given. The angular acceleration in (15) is reformulated for each DMP as
$$\tau \dot{\eta}^l = \alpha_z \big(\beta_z\, \mathrm{Log}_q(\tilde{g}_q^l * \bar{q}^l) - \eta^l\big) + f_q^l(s), \qquad (59)$$
where $l$ indicates the $l$-th quaternion DMP and $f_q^l(s)$ is defined as in (55). The term $\tilde{g}_q^l$ is the quaternion delayed goal function, and it ranges from $q^l(0)$ to $g_q^l$ in $T_l$ seconds (see Fig. 9, right). To generate this moving target while preserving the geometry of $S^3$, $\tilde{g}_q^l$ has to move along the geodesic connecting $q^l(0)$ to $g_q^l$. Therefore, $\tilde{g}_q^l$ is defined as
$$\tilde{g}_q^l(t + \delta t) = \mathrm{Exp}_q\!\left(\frac{\delta t\, \tilde{\omega}^l(t)}{2}\right) * \tilde{g}_q^l(t), \qquad (60)$$
where
$$\tilde{\omega}^l(t) = \begin{cases} \dfrac{2}{T_l}\, \mathrm{Log}_q\!\left(g_q^l * \bar{q}^l(0)\right), & \sum_{\kappa=1}^{l-1} T_\kappa \leq t \leq \sum_{\kappa=1}^{l} T_\kappa \\ [0\ 0\ 0]^\top, & \text{otherwise}. \end{cases} \qquad (61)$$
The angular velocity in (61) is computed for each $l$. The term $2\, \mathrm{Log}_q\big(g_q^l * \bar{q}^l(0)\big)$ represents the angular velocity that rotates $q^l(0)$ into $g_q^l$ in a unit time. Note that the mappings $\mathrm{Log}_q(\cdot)$ and $\mathrm{Exp}_q(\cdot)$ are defined in (17) and (19), respectively. The delayed goal $\tilde{g}_q^l$ crosses all the via-goals $g_q^l$, $l = 1, \ldots, L-1$, and then reaches the goal $g_q^L$.

Results in Figure 11 are obtained when basis functions overlay is applied to merge DMPs separately trained to fit minimum jerk trajectories (black dashed lines). Figures 11a-11e show the position and Figures 11f-11j the orientation (unit quaternion) parts of the motion.
This approach does not require a switching rule and automatically generates a smooth trajectory, with continuous velocity as shown in Figures 11c and 11h, that passes close to the via-points, which favors the overall reproduction accuracy (Fig. 11e and 11j). However, the distance from a via-point depends on the weights of the joined primitives and cannot be decided separately. The trajectory generated with this approach tends to last longer than the demonstrations. This is due to the sigmoidal phase, which vanishes only after $T + \delta_s$ s (Fig. 3). Depending on the application, this time difference may cause failures and has to be taken into account.

The standard periodic DMP learning approach approximates the shape $f_d(t)$ of the input trajectory $y_d$ in (38) by changing the weights of the Gaussian kernel functions (Ijspeert et al. 2013). The weights are updated in such a way that the difference between the reference trajectory and the DMP is reduced at every control step and gradually throughout the periodic repetitions. However, the DMP can also be reshaped by some external feedback function to achieve different functionalities in different applications, for instance, tasks that require a trial-and-error approach (Kober et al. 2008), obstacle avoidance (Park et al. 2008; Hoffmann et al. 2009; Tan et al. 2011), coaching (Petrič et al. 2014b; Gams et al. 2016) for robots, and adaptation of assistive exoskeleton behavior (Peternel et al. 2016a). Alternatively, the frequency of existing periodic DMPs can be modulated online (Gams et al. 2009; Petrič et al. 2011).

In (Park et al. 2008; Hoffmann et al. 2009; Tan et al. 2011), the detected obstacle was fitted with a potential field function that changes the shape of the DMP to avoid it. In more detail, Tan et al. (2011) used the potential field to compute a time-varying goal and modified the resulting DMP trajectory, while (Park et al. 2008; Hoffmann et al.
2009) added an extra forcing term to the DMP. Similarly, in (Gams et al. 2016) the human arm was fitted with a potential field function, which was used to reshape the DMP to perform coaching. The potential field was coupled to the position of the human hand to make pointing gestures and indicate the direction in which the robot arm position trajectory should change:

\dot{z} = \Omega\left(\alpha_z\left(\beta_z(g - y) - z\right) + C_O + f\right). \quad (62)

The added coupling term $C_O$ is the obstacle avoidance term that contains the potential field; for the sake of explanation, it is given in the simplified form

C_O = d_s\!\left(\lVert \mathcal{O} - y \rVert\right)\exp\!\left(-\zeta(\mathcal{O} - y)\right), \quad (63)

where $\mathcal{O}$ is the obstacle (or human pointing gesture) and $y$ is the robot position. The exponential and $\zeta$ determine the potential field, while the function $d_s$ controls the distance at which the perturbation field starts affecting the DMP. For the full formulation of $C_O$ and its parameters, see (Gams et al. 2016). In (Rai et al. 2017) the method was extended to include a generalization of the obstacle avoidance formulation in (62).

Alternatively, a faulty segment of a colliding DMP trajectory can also be directly adjusted online by the human demonstrator (Karlsson et al. 2017). On the other hand, the method in (Kim et al. 2015) considers obstacle avoidance as a constraint of an optimization problem, which modifies the DMP trajectory to prevent collisions.

Similarly to obstacle avoidance, task dynamics can also be incorporated into a DMP as coupling terms. In (Gams et al. 2014) task dynamics were coupled at the acceleration and velocity levels of the DMP. The presented method was utilized for interaction tasks, where the human changed the behavior of the robot based on the dynamics exerted on the manipulator:

\tau\dot{z} = \alpha_z\left(\beta_z(g - y) - z\right) + \dot{C}_f + f(x), \quad (64)
\tau\dot{y} = z + C_f.
(65)

Here, the force coupling term $C_f = \varsigma F$ is defined through a virtual or measured force $F$ and a scaling factor $\varsigma$. It essentially changes the dynamic behavior of the DMP, enabling the motion primitive to instantly react to the coupled force. Later, Zhou et al. (2016b) introduced a PD-controller-based coupling term formulation $C_{PD} = \varsigma\left(K_P(F_d - F_e) - D_V\dot{F}_e\right)$ coupled to the velocity part of the DMP (65). In this formulation, $F_d$ represents the desired force, $F_e$ is the measured force, $\varsigma$ is a scaling factor, and $K_P$ and $D_V$ are the proportional and derivative gains of the Proportional Derivative (PD) controller. The coupling term formulation allows for a controlled adaptation of the robot motion to changes in the environment.

In (Kramberger et al. 2018) this approach was extended with a force feedback loop coupled to the velocity (2) and the goal $g$ of the DMP. The outcome of this approach is a behavior similar to an admittance controller (Villani and De Schutter 2008), with the difference that the adaptation acts directly at the trajectory generation level:

\tau\dot{z} = \alpha_z\left(\beta_z\left((g + C_a) - y\right) - z\right) + f(x), \quad (66)
\tau\dot{y} = z + \dot{C}_a. \quad (67)

Here, $\dot{C}_a = \varsigma(F_d - F_e)$ is the first time derivative of the admittance coupling term, which changes the velocity and, through the integrated coupling term, the position output of the DMP. The described approach can also be used for Cartesian space motion, where the forces have to be substituted with desired and measured torques for the orientation part. This approach can be implemented in robot tasks involving contact with the environment as well as contact with humans.

In (Peternel et al. 2016a), human effort was used to provide information about the direction in which the assistive exoskeleton joint torque DMP should change in order to minimize it.
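The admittance-coupled formulation in (66)-(67) can be sketched as a single Euler integration step, where a force error shifts the goal through the integrated coupling term and the velocity through its derivative. The gains and the scaling factor below are illustrative values, not taken from the cited works:

```python
import numpy as np

def admittance_dmp_step(y, z, C_a, g, F_d, F_e, f_x, dt,
                        tau=1.0, alpha_z=25.0, beta_z=6.25, varsigma=0.01):
    """One Euler step of the admittance-coupled DMP of Eqs. (66)-(67),
    scalar case. Returns the updated (y, z, C_a)."""
    C_a_dot = varsigma * (F_d - F_e)               # admittance coupling rate
    z_dot = (alpha_z * (beta_z * ((g + C_a) - y) - z) + f_x) / tau
    y_dot = (z + C_a_dot) / tau                    # Eq. (67)
    return y + dt * y_dot, z + dt * z_dot, C_a + dt * C_a_dot
```

When the measured force matches the desired one, the coupling term stays at zero and the system behaves like a standard DMP converging to $g$; a persistent force error steadily displaces the goal, mimicking an admittance behavior at the trajectory-generation level.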
The human was included in the robot control loop by replacing the error calculation in (40) with the human effort feedback term $U(E)$:

w_i(t+1) = w_i(t) + \Psi_i\, P_i(t+1)\, U(E), \quad (68)

where $E(t)$ is the current effort measured from human muscle activity through Electromyography (EMG) signals‡.

‡ Note that other feedback that measures human effort can be used instead of EMG, such as joint torques or limb forces.

Equations (33)-(37) and (41) are used in their original form. Equations (38)-(40) are not used, since (68) modulates the weights in (36) instead.

The effort feedback term $U(E)$ closes the loop and acts as a feedback for adapting the weights of the Gaussian kernels that define the shape of the trajectory. A positive $U(E)$ increases, while a negative $U(E)$ decreases, the values of the weights at a given section of the periodic DMP that encodes the joint torque. If the shape of the DMP does not provide enough assistive power, the human has to exert effort (i.e., muscle activity) to produce the rest of the power required to achieve the desired task under the given dynamics. In turn, the muscle activity feedback increases the magnitude of the DMP until the human effort term $U(E)$ is minimized. Note that each joint has its own torque DMP and $U(E)$ term (Peternel et al. 2016a). After that point, the DMPs do not change unless the task, dynamics, or conditions change. If they change, the human has to compensate for the change with additional muscle activity, which in turn adapts the DMPs to the new required joint torques.

In many LfD scenarios it is desirable to modify both the spatial motion and the speed of the learned motion at any stage of the execution. Speed-scaled dynamic movement primitives, first presented in Nemec et al. (2013a), are applied for the underlying task representation.
The original DMP formulation in (1) and (2) was extended by adding a temporal scaling factor $\upsilon$ at the velocity level of the DMP:

\upsilon(x)\,\tau\dot{z} = \alpha_z\left(\beta_z(g - y) - z\right) + f(x), \quad (69)
\upsilon(x)\,\tau\dot{y} = z. \quad (70)

From (69) and (70), it is evident that the scaling term is a function of the phase, and it is therefore encoded with a set of RBFs similarly to (4). This method allows for modification of the spatial motion as well as of the speed of execution at any stage of the trajectory. The authors demonstrated the proposed method in a learning scenario where, after every learning cycle (using Iterative Learning Control (ILC)), a new velocity profile was encoded based on the wrench feedback, thus converging to an optimal velocity for the specific task. Vuga et al. (2016) extended the approach by incorporating a compact representation for non-uniformly accelerated motion as well as a simple modulation of the movement parameters.

Later on, in Nemec et al. (2018) the authors extended the previous approach to also incorporate velocity scaling of encoded orientation trajectories represented with unit quaternions. The outcome of the presented work is a unified approach to velocity scaling for tasks executed in Cartesian space. Furthermore, a reformulation of the velocity approach called AL-DMPs was presented by Gašpar et al. (2018). In this work they present a method where the spatial and temporal components of the motion are separated by means of the arc length of the time-parameterized trajectory. The arc length, based on the differential geometry of curves, is related to the speed of the movement, given as the time derivative of the demonstrated trajectory. The approach is well suited when multiple demonstrations are compared to extract relevant information for learning. Weitschat and Aschemann (2018) add an extra forcing term to keep the velocity within a certain predefined limit.
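A minimal sketch of the speed-scaled system (69)-(70) is given below. It assumes the scaling also acts on the phase dynamics so that the spatial path is preserved while execution slows down or speeds up; gains and names are illustrative:

```python
import numpy as np

def integrate_speed_scaled_dmp(y0, g, nu, dt=0.002, tau=1.0,
                               alpha_z=25.0, beta_z=6.25, alpha_x=2.0,
                               forcing=lambda x: 0.0):
    """Integrate the speed-scaled DMP of Eqs. (69)-(70), scalar case.
    nu(x) > 0 is the phase-dependent temporal scaling factor."""
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    while x > 1e-3:
        v = nu(x)
        z += dt * (alpha_z * (beta_z * (g - y) - z) + forcing(x)) / (v * tau)
        y += dt * z / (v * tau)
        # Assumed: the phase is scaled as well, so the path shape is kept.
        x += dt * (-alpha_x * x) / (v * tau)
        traj.append(y)
    return np.array(traj)
```

With a constant $\upsilon(x) = 2$, the generated trajectory traces the same path but takes roughly twice as long as with $\upsilon(x) = 1$.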
The aim of this work is to guarantee a safe execution of the robot task when interacting with humans, as well as to provide a framework for safe interaction in a changing environment where the robot position and velocity have to change over time. For the full formulation of the coupling term, see (Weitschat and Aschemann 2018). Additionally, Dahlin and Karayiannidis (2020) proposed a temporal coupling based on a repulsive potential, keeping the DMP velocity within predefined limits while ensuring path shape invariance.

LfD is a wide research area and many different approaches have been developed to reproduce human demonstrations (Billard et al. 2016). As already mentioned, the aim of this tutorial survey is to provide a comprehensive overview of DMP research, and we intentionally skip the rich literature in the field of LfD. However, some representations are closely related to the DMP formulation. This section briefly reviews them.

Calinon et al. (2009) computed an acceleration command for the robot in a PD-like form $\ddot{y} = K^P(y_d - y) + D^V(\dot{y}_d - \dot{y})$, where $K^P$ is a stiffness and $D^V$ a damping gain, $y$ is the measured state of the robot and $\dot{y}$ its time derivative (velocity), and $y_d$ and $\dot{y}_d$ are the desired position and velocity retrieved with GMR. The authors then showed that the acceleration command $\ddot{y}$ can be seen as a mixture of linear dynamics, each converging to a certain attractor. Although later work like (Kormushev et al. 2010) referred to this representation as "a modified version" of DMPs, there are significant differences with the DMP formulation, properly highlighted by (Calinon et al. 2012).

Herzog et al. (2016) computed an acceleration command for the robot from the linear system $\ddot{y} = u = K^P(y_d - y)$, where $y$ is the measured state of the robot, $y_d$ is a human demonstration, and $K^P$ is a control gain computed using the linear-quadratic regulator method.
Then, a compact representation of the control input trajectory $u$ is computed by means of Chebyshev polynomials. This representation does not require a vanishing phase variable to ensure convergence, but generalization to different start/goal positions requires applying the linear-quadratic regulator method again to find a new sequence of control inputs.

Regarding periodic motions, (Ajallooeian et al. 2013) proposed a dynamical-system-based framework to learn rhythmic movements with an arbitrary shape and basin of attraction. They exploit phase-based scaling functions to represent the mapping between a known, base limit cycle and a desired periodic orbit. The base limit cycle can be, for example, the one generated by a periodic DMP, which makes the approach of (Ajallooeian et al. 2013) a more general formulation of periodic primitives.

This section reviews approaches where DMPs have been integrated into bigger executive frameworks. We categorize these approaches into five main research areas, namely grasping and manipulation, impedance learning, reinforcement learning, deep learning, and incremental and life-long learning.

Successfully grasping an object is the first step towards robotic manipulation. Grasping requires a (visual) perception of the environment to locate the object and to decide the grasping points based on its geometry. In this setting, even small uncertainties may cause the object to drop and the grasp to fail. To improve the robustness of vision-driven grasping, Krömer et al. (2010a) augmented DMPs with a potential field based on visual descriptors that adapts hand and finger trajectories to the object's local geometry. This grasping strategy was integrated in a hierarchical control architecture where the upper level decides where to grasp the object and the lower level locally adapts the motion to robustly grasp it (Krömer et al. 2010b). Stein et al.
(2014) proposed a point cloud segmentation approach based on the convexity and concavity of surfaces. The approach is particularly suited to recognize object handles and enables a robot to automatically grasp objects.

The ability to grasp and use tools is also desirable to perform daily-life manipulation. In this respect, (Guerin et al. 2014) proposed the so-called tool movement primitives that transform the demonstrations into a tool affordance frame. The result is a motion that generalizes to different tool poses and to tools that share the same affordance(s). Li and Fritz (2015) considered tool usage with low-cost, non-dexterous grippers and proposed a framework to learn bi-manual strategies for tool usage that compensate for the lack of dexterity. Bi-manual robotic manipulation is a challenging task that requires precise coordination between the hand movements and adherence to spatial constraints. Thota et al. (2016) developed a DMP-based control framework for bi-manual manipulation that ensures time synchronization of the two hands while being robust to spatial perturbations and goal changes.

Beyond object grasping, everyday manipulation requires the precise execution of complex movements. Often such complex movements are hard to encode into a single motion primitive, but they can be conveniently split into simpler motions (e.g., reach and grasp) that can be properly sequenced and executed (Fig. 12).

The possibility of exploiting DMPs as the building blocks of complex tasks was investigated in (Ramirez-Amaro et al. 2015; Caccavale et al. 2018, 2019). In these works, a human teacher demonstrated a relatively complex task consisting of several actions performed on different objects. The demonstration was then automatically segmented into $M$ basic motions used to fit $M$ DMPs. While Ramirez-Amaro et al. (2015) exploit semantic rules (e.g., reaching an object with a knife means cut) to infer high-level human activities, Caccavale et al.
built a hierarchical structure to schedule the execution of the complex task by selecting the proper DMP for the current executive context.

Figure 12. An example of hierarchical task decomposition and motion primitive sequencing from (Agostini et al. 2020).

They used kinesthetic teaching and verbal cues (open/close gripper commands) to provide task demonstrations. Lemme et al. (2014) organized segmented task demonstrations into a motion primitive library learned from self-generated trajectory patches. They also introduced a mechanism to remove unused skills and update the library. Kinesthetic teaching and haptic feedback were also used by Eiband et al. (2019) to segment and recognize basic motions, or skills, and to build a tree describing geometric relationships, like reference frames and goal poses, between consecutive skills. At run time, the robot performed haptic exploration to locate objects in the scene and update the skill tree. The transformations in the skill tree were then used to define the initial and goal poses of the DMPs and execute the task. Finally, Wu et al. (2018) integrated DMPs into a dialogue system with speech and ontology to learn or re-learn a task using natural interaction modalities.

Collecting demonstrations becomes an issue if kinesthetic teaching or marker-based motion trackers cannot be used. The latter require an expensive sensor infrastructure that is hard to build in real-world scenarios like factory floors. Kinesthetic teaching needs torque-controlled/collaborative robots that are still uncommon in industrial settings. To remedy this issue, (Mao et al. 2015) exploited a low-cost RGB-D camera and tracked the human hand using the markerless approach proposed by (Oikonomidis et al. 2011). The collected data were then segmented into basic motions and used to fit DMPs.

The described approaches assume that human teachers always provide consistent and noiseless task demonstrations. Ghalamzan E. et al.
(2015) encoded noisy demonstrations into a GMM and computed a noise-free trajectory using GMR. The noise-free trajectory was then used to fit a DMP that generalized to different start, goal, and obstacle configurations. Niekum et al. (2012, 2015) designed a framework that learns from unstructured demonstrations by segmenting the task demonstrations, recognizing similar skills, and generalizing the task execution. Interestingly, a user study on volunteers conducted by (Gutzeit et al. 2018) showed that existing strategies for segmentation and learning are sufficiently robust to enable the automatic transfer of manipulation skills from humans to robots in a reasonable time. Finally, some work (Deniša and Ude 2013a,b; Deniša and Ude 2015) exploited transition graphs and trees to embed parts of a trajectory, together with search algorithms to discover sequences of partial trajectories and generate motions that have not been demonstrated.

Approaches that rely on a hierarchical, tree-like structure to represent the task have limited task generalization capabilities. Lee and Suh (2013) used probabilistic inference and object affordances to infer the adequate skill that can handle uncertainties in the executive context. Beetz et al. (2010) learned stereotypical task solutions from observation and used task planning and symbolic reasoning to execute novel mobile manipulation tasks. A generative learning framework was proposed by (Wörgötter et al. 2015) to augment the robot's knowledge base with missing information at different levels of the cognitive architecture, including symbolic planning as well as object and action properties. (Paxton et al. 2016) used task and motion planning to generalize the execution of complex assembly tasks and proposed a learning by demonstration approach to ground symbolic actions. (Agostini et al.
2020) performed task and motion planning by combining an object-centric description of geometric relations between objects in the scene, a symbol-to-motion hierarchical decomposition depending on three consecutive actions in the plan, and the LfD approach developed in (Caccavale et al. 2019) (Fig. 12). A manipulation task was described at three different levels by (Aein et al. 2013). The top level provides symbolic descriptions of actions, objects, and their relationships. The mid level uses a finite state machine to generate a sequence of action primitives grounded by the lower level. A common point among these approaches is that they use DMPs to execute the task on real robots.

Impedance control can be used to achieve compliant motions, in which the controller resembles a virtual spring-damper system between the environment and the robot end-effector (Hogan 1985). Such an approach permits smooth, safe, and energy-efficient interaction between robots and environments (possibly including humans). A standard model for such an interaction is defined as

M\ddot{y}_t = K_t^P(y_g - y_t) - D_t^V\dot{y}_t + f_t^e, \quad (71)
I\dot{\omega}_t = K_t^O\,\mathrm{Log}_R\!\left(R_g R_t^\top\right) - D_t^W\omega_t + \tau_t^e, \quad (72)

where (71) and (72) correspond to the translational and rotational cases, respectively; $M$, $K_t^P$, and $D_t^V$ are the mass, stiffness, and damping matrices for the translational motion, while $I$, $K_t^O$, and $D_t^W$ are the moment of inertia, stiffness, and damping matrices for the rotational motion. $R_g, R_t \in SO(3)$ are rotation matrices corresponding to the desired rotation goal and the actual orientation profile of the end-effector, respectively.
$f_t^e$ and $\tau_t^e$ represent the external force and torque applied to the robot end-effector.

In fact, VIC plays an important role when a robot needs to interact with any environment in order to avoid high impact forces and damage to the environment or the robot (i.e., by changing to a low stiffness) (Ajoudani et al. 2012; Abu-Dakka et al. 2018; Peternel et al. 2018a). On the other hand, it is important for rejecting unexpected and unpredictable perturbations from the environment to achieve a desired position tracking precision (i.e., by changing to a high stiffness) (Yang et al. 2011). In addition, it is also important for the coordination of human-robot collaborative movements (Peternel et al. 2017b). However, a robotic system still needs to learn how to adapt such a VIC to unseen situations while avoiding hard-coding. Such a learning paradigm is called Variable Impedance Learning Control (VILC). Interested readers can refer to our recent survey on VILC (Abu-Dakka and Saveriano 2020).

In this review, we mention some of the works that integrate DMPs with VIC in a VILC framework. Figure 13 shows a simple generic example where a DMP is integrated in a VIC control scheme.

Figure 13. General control scheme combining Variable Impedance Control (VIC) and a DMP.

Buchli et al. (2011a) proposed one of the earliest approaches that integrates DMPs with the Policy Improvement with Path Integrals (PI²) algorithm (Theodorou et al. 2010) to learn movements (position and velocity represented by a DMP) while optimizing the impedance parameters. Later, the authors exploited a diagonal stiffness matrix and expressed the variation (time derivative) of each diagonal entry as

\dot{k}_{\vartheta_j,t} = \alpha_j\left(\boldsymbol{\psi}_j^\top\left(\vartheta_j + \epsilon_{j,t}\right) - k_{\vartheta_j,t}\right), \quad j = 1, \dots
, J, \quad (73)

where $j$ indicates the $j$-th joint, $k_{\vartheta_j,t}$ is the stiffness of joint $j$, $\epsilon_{j,t}$ is a time-dependent exploration noise, each $\boldsymbol{\psi}_j$ is a vector of $N$ Gaussian basis functions, and $\vartheta_j$ are the learnable parameters for joint $j$. The stiffness parameterization in (73) is also linear in the parameters, and PI² can be applied to find the optimal policy. Later, the authors used PI² to learn VIC in deterministic and stochastic force fields (Stulp et al. 2012a). Nakanishi et al. (2011) proposed a method that optimizes a periodic motion along with a time-varying joint stiffness.

(Basa and Schneider 2015) introduced an extension of the DMP formulation by adding a second nonlinear function to cope with elastic robots as follows:

\tau\dot{z} = \alpha_z\left(\beta_z(g - y) - z\right) + f(x) + \tilde{f}, \quad (74)

where $\tilde{f}$ is defined as in (4) but without the phase variable $x$. The main purpose of $\tilde{f}$ is to compensate the gravitational influence on the moved DoF at the end of the movement time and beyond. Differently, Haddadin et al. (2016) used optimal control to execute near-optimal motions of elastic robots.

Nemec et al. (2016) proposed a cooperative control scheme that enables a dual-arm robot to adapt its stiffness online along the executed trajectory in order to ensure accurate execution. (Umlauft et al. 2017) used GPs along with DMPs (as proposed in (Fanger et al. 2016)) to predict the trajectories. During the execution, their admittance controller adapts both stiffness and damping online. The energy-tanks passivity-based control method has been integrated with DMPs to enforce passivity and thereby stably adapt to contacts in unknown environments by adapting the stiffness online (Shahriari et al. 2017; Kramberger et al. 2018; Kastritsi et al. 2018).

The methods in (Peternel et al. 2014, 2018b,a; Yang et al. 2018, 2019; Bian et al. 2019) designed different multi-modal interfaces to let the human explicitly teach an impedance behavior to the robot.
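The stiffness variation policy in (73) can be sketched as follows: the target stiffness is a linear combination of phase-dependent basis functions with (noisy) learnable parameters, tracked by a first-order filter. The normalization of the basis functions and all names are illustrative assumptions:

```python
import numpy as np

def stiffness_rate(theta, eps, centers, widths, k, alpha, x):
    """Evaluate the stiffness variation of Eq. (73) for one joint:
    k_dot = alpha * (psi(x)^T (theta + eps) - k),
    where psi(x) are N normalized Gaussian basis functions of the
    phase x, theta are the learnable parameters, and eps is the
    exploration noise added during policy search."""
    psi = np.exp(-0.5 * ((x - centers) / widths) ** 2)
    psi /= psi.sum()                      # normalize activations
    return alpha * (psi @ (theta + eps) - k)
```

Because the target is linear in the parameters `theta`, the same path-integral update used for the DMP shape weights can optimize the stiffness profile.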
Most of them combined EMG-based variable impedance skill transfer with DMP-based motion sequence planning, inheriting the merits of both aspects for robotic skill acquisition. Hu et al. (2018) used Covariance Matrix Adaptation-Evolution Strategies (CMA-ES) to update the parameters of the DMPs and of a variable impedance controller in order to reduce the impact forces during robot motion in noisy environments. Dometios et al. (2018) integrated Coordinate Change DMPs (CC-DMPs) with a vision-based motion planning method to adapt the reference path of a robot's end-effector and allow the execution of washing actions.

Travers et al. (2016, 2018) proposed a shape-based compliance controller, for the first time in locomotion, by implementing amplitude compliance on a snake robot moving in complex environments with obstacles. Their approach allows snake-like robots to blindly adapt to complex unstructured terrains thanks to their proprioceptive gait compliance technique.

Recently, an adaptive admittance controller was proposed (Wang et al. 2020) which integrates GMR for the extraction of human motion characteristics, a DMP to encode a generalizable robot motion, and an RBF-NN-based controller for trajectory tracking during the reproduction phase.

Novel LfD approaches explicitly take into account that training data may be generated on certain Riemannian manifolds with associated metrics. Abu-Dakka and Kyrki (2020) reformulated DMPs based on Riemannian metrics, such that the resulting formulation can operate with SPD data on the SPD manifold. Their formulation is capable of adapting to a new SPD goal point.

Recently, a biomimetic controller has been integrated with DMPs (Zeng et al. 2021) in order to learn and adapt compliance skills.

In RL, an agent tries to improve its behavior via trial and error by exploring different strategies (actions) and receiving feedback (a reward) on the outcome of its actions.
Actions $a$ are drawn from a policy $\pi(s, a)$ that represents a mapping between states $s$ and actions $a$. The goal of RL is to find an optimal policy $\pi^\star$ that maximizes the cumulative expected reward, i.e., the sum of expected rewards over a possibly infinite time interval. When the agent is a robot performing tasks in the real world, the state and action spaces are inherently continuous. Moreover, the robotic agent is affected by imperfect (e.g., noisy) perception and inaccurate models (e.g., of contacts). Finally, performing a large number of interactions with the real world (rollouts) is expensive and possibly dangerous. As discussed by (Kober et al. 2013), robotics-specific challenges require specific solutions to make the RL problem feasible.

Figure 14. General block scheme of DMP-based policy improvement.

One possibility is to use a parameterized policy and use RL to search for an optimal, finite set of policy parameters. In this respect, DMPs have been widely used as policy parameterization. The general idea is shown in Figure 14. In more detail, (Peters and Schaal 2008a,b) showed that various policy gradient and actor-critic RL approaches can be effectively applied to improve robotic skills parameterized as DMPs. Other research focused on developing policy search algorithms specifically for parameterized policies. Inspired by stochastic optimal control, Theodorou et al. (2010) proposed Policy Improvement with Path Integrals (PI²), which is an application of path integral optimal control to DMPs. PI² and DMPs have been successfully applied in several domains including VILC (Buchli et al. 2011a,b), in-contact tasks (Hazara and Kyrki 2016), grasping under state estimation uncertainties (Stulp et al. 2011), bi-manual manipulation (Zhao et al. 2020), and robot-assisted endovascular intervention (Chi et al. 2018). Kober and Peters
Kober and Peters(2011) derived from expectation-maximization the so-calledPolicy Learning by Weighting Exploration with the Returns(PoWER). PoWER and DMPs have been successfullyapplied to perform highly dynamic tasks including ball-in-a-cup Kober and Peters (2011) and pancake flippingKormushev et al. (2010). Even with parameterized policies the number of rolloutsneeds to search for optimal policy parameters may becomelarge, especially for robots with many DoFs. Dimensionalityreduction techniques can be exploited to perform policysearch in a reduced space (Colom´e and Torras 2014). Theeffectiveness of this approach was demonstrated in thechallenging task of clothes ( i.e., soft tissues) manipulation(Colom´e and Torras 2018). IL arises as an effective approachto policy initialization and to speed up policy search by Prepared using sagej.cls averiano, Abu-Dakka, Kramberger, and Peternel reducing the number of rollouts (Kober and Peters 2010).In this respect, Kober et al. (2008, 2010a) augmentedDMPs with a perceptual coupling term and propose toinitialize the DMP via human imitation and to refine themotor skill via RL. IL can be eventually combined withdimensionality reduction (Tan and Kawamura 2011) andseveral rollouts can be performed firstly in simulation(Cohen and Berman 2014) to further speed up the policysearch. When multiple demonstrations are given, one canlearn a mapping between policy parameters and querypoints ( e.g., , goal positions) and use the mapping togeneralize to new situations (Section 3.1.3). This strategywas used by Nemec et al. (2011, 2012, 2013b) to providea good initial policy for a new situation which is thenfurther refined using RL. Being the mapping estimatedusing example query points, the search space can beeffectively constrained within query points making the policysearch more efficient. Vuga et al. (2015a,b) combined thisapproach with a different DMP formulation to optimizethe velocity of execution. 
The approach was tested on diverse tasks including pouring water into a cup, where it prevented the water from spilling during the motion. Schroecker et al. (2016) provided demonstrations in the form of soft via-points (Section 3.1.2), which reduce the search space to the neighborhood of the taught via-points. Multiple demonstrations were used by (Reinhart and Steil 2014, 2015) to build a parameterized skill memory that connects a low-dimensional skill parameterization to motion primitive parameters. This low-dimensional embedding is then leveraged for efficient policy search. Instead of learning a mapping from task to policy parameters, Queißer et al. (2016) used data from the rollouts to incrementally learn a parametric skill (bootstrapping) and used it to generate a good initial policy for a new task.

Instead of using generalization to provide a better initial policy, some researchers exploit RL to improve and generalize the motion primitive. (André et al. 2015) adapted DMP policies to walk on sloped terrains. Mülling et al. (2010) generalized to new situations using a mixture of DMPs. In their approach, RL was used to estimate the shape parameters as well as the optimal responsibility of each DMP. (Mülling et al. 2013) used episodic RL to estimate meta-parameters, like the temporal and spatial interception point of the ball and the racket, typical of table tennis tasks. Lundell et al. (2017) used parameterized kernel weights and RL to search for optimal parameters, while (Forte et al. 2015) augmented the given demonstration using RL-based state space exploration to autonomously expand the robot's task knowledge. Metric RL was exploited by (Hangl et al. 2015) to smoothly switch between learned DMP policies and execute a task in new situations.

RL can also be applied to sequence multiple motion primitives and perform more complex tasks; a successful strategy when the robot has to perform, for instance, a manipulation task (Section 4.1).
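The episodic policy-improvement loop underlying methods like PoWER can be sketched, in heavily simplified form, as a reward-weighted update of the DMP weights: perturb the weights, roll out, and average the perturbations weighted by their returns. This is a generic sketch, not the exact PoWER update; all names are illustrative:

```python
import numpy as np

def reward_weighted_update(w, sigma, rollout_reward, n_rollouts=20, rng=None):
    """One iteration of a PoWER-style, reward-weighted update of the
    DMP weight vector w (simplified sketch).

    rollout_reward: callable mapping a weight vector to the scalar
    return of one rollout."""
    rng = np.random.default_rng() if rng is None else rng
    # Gaussian exploration in parameter space.
    eps = rng.normal(0.0, sigma, size=(n_rollouts, w.size))
    # Return of each perturbed policy.
    R = np.array([rollout_reward(w + e) for e in eps])
    R = R - R.min()                       # shift so weights are non-negative
    # Average the perturbations, weighted by the (shifted) returns.
    return w + (R[:, None] * eps).sum(axis=0) / (R.sum() + 1e-12)
```

Iterating this update climbs toward weight vectors with higher return without needing gradients of the reward, which is why such black-box updates pair naturally with DMP-parameterized policies.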
To sequence multiple primitives, it is also important to learn the goal of each motion. Tamosiunaite et al. (2011) used continuous value function approximation to optimize the goal parameters of a DMP used to perform a pouring task. Kober et al. (2011, 2012) learned a meta-parameter function that maps the current state to a set of meta-parameters including the goal and duration of the movement. Instead of separating shape and goal learning into different processes, Stulp et al. (2011, 2012b) extended PI² to simultaneously learn the shape and goal of a sequence of DMPs.

Learned skills can potentially be transferred across different tasks to speed up the learning process and increase robot autonomy. To this end, Fabisch and Metzen (2014) considered the case where the robot can actively choose which task to learn to make the best progress in learning. The process of actively selecting the task was treated as a non-stationary bandit problem, for which suitable algorithmic solutions exist, while intrinsic motivation heuristics were exploited to reward the agent after the selection. Cho et al. (2019) defined the complexity of a motor skill based on the temporal and spatial entropy of multiple demonstrations and used the measured complexity to generate an order for learning and transferring motor skills. Their experimental findings provided useful guidelines for skill learning and transfer. In short, humans should demonstrate, when possible, the most complex task, from which the robot is then able to transfer the motor skills. Vice versa, if demonstrations are not given, it is more effective to start learning simple skills first and then transfer the simpler skills to more complex tasks.

RL often lacks scalability to high-dimensional continuous state and action spaces. To remedy this issue, hierarchical RL exploits a divide et impera approach by decomposing an RL problem into a hierarchy of sub-tasks in order to reduce the search space.
Different levels in the hierarchy represent information at different temporal and/or spatial scales. Stulp and Schaal (2011) proposed to represent the different options as DMPs to sequence. PI² was extended to optimize the shape and (sub-)goal of each DMP at different levels of temporal abstraction. In particular, the shape was adjusted based on the cost up to the next primitive in the sequence, while the sub-goal considers the cost of the entire sequence of two DMPs. The layered direct policy search in (End et al. 2017) did not rely on a set of predefined sub-policies and/or sub-goals, but instead used information-theoretic principles to uncover a set of diverse sub-policies and sub-goals.

Reducing the number of rollouts required to discover optimal policies is also important in Hierarchical RL (HRL). As already mentioned, IL is a valuable option to find good initial policies. However, there are applications, like manipulation with multi-fingered robotic hands, for which it is hard or impossible to provide expert demonstrations. To make policy search more efficient, Ojer De Andres et al. (2018) used HRL where the upper level considers discrete action and state spaces to search for optimal finger gaiting and synchronization among the fingers. This information was passed to the lower level, where rhythmic DMPs and PI² generated continuous commands for the fingers. Another possibility to increase data efficiency is to use model-based approaches for RL. Colomé et al. (2015) exploited a friction model to improve a DMP policy and manipulate soft tissues (a scarf). A model-based HRL approach was proposed by Kupcsik et al. (2017) for data-efficient learning of upper-level policies that generalize well across different execution contexts. Finally, Li et al. (2018) proposed a hybrid hierarchical framework where the higher level computes optimal plans in Cartesian space and converts them to desired joint targets using an efficient solver.
The lower level is then responsible for learning joint space trajectories under uncertainties using RL and DMPs.

A popular class of machine learning methods are NNs, which can effectively represent nonlinear mappings. A major drawback of NNs in the past was the computational complexity of learning. In recent years, there has been a renewed interest in NNs, and new deep learning approaches have been successfully applied in machine vision and language processing (LeCun et al. 2015).

Deep learning has also been applied in robotics to learn task dynamics (Yang et al. 2016) and movement dimensionality reduction (Chen et al. 2015). The authors (Chen et al. 2015, 2016) introduced a framework called AutoEncoded DMP (AEDMP) which uses deep autoencoders to find a latent feature-space representation of movements. In this space, DMPs can be effectively generalized to new tasks, and the architecture enables the DMPs to be trained as a unit. Pervez et al. (2017b) coupled visual perception data for object classification with task-specific movements represented with DMPs. The data was modeled with Convolutional Neural Networks (CNNs), where the images and the associated movements were directly processed by the deep NN, thus preserving the properties of the associated DMPs and eliminating the need for extracting the task parameters during motion reproduction. Later on, Kim et al. (2018b) combined deep RL with DMPs to learn and generalize robotic skills from demonstration. The framework builds on an RL approach to learn and optimize a new DMP skill based on a demonstration. The RL approach is backed up with a hierarchical search strategy, reducing the search space for the robot, which allows for more efficient learning of complex tasks. Furthermore, Pan and Manocha (2018) presented a deep learning approach for motion planning of high-dimensional deformable robots in complex environments.
The locomotion skills are encoded with DMPs, and an NN is trained for obstacle avoidance and navigation. The data is further optimized with deep Q-learning, showing that the learned planner can efficiently plan and execute navigation tasks for high-dimensional robots in real time.

Pahič et al. (2018) proposed a deep learning approach for perception–action couplings, demonstrating the coupling between vision-based images and associated movement trajectories. Later on, they extended the approach to incorporate CNNs and a distinguishing loss formulation (Pahič et al. 2020), which measures the physical distance between the movement trajectories as opposed to the distance between the DMP parameters, which have no physical meaning, leading to better performance of the algorithm. Recently, they extended the usage of GPR to create a database needed to train autoencoder NNs for dimensionality reduction (Lončarević et al. 2021).

Figure 15. General framework of the lifelong/incremental learning approach.

Lifelong (incremental) learning is a framework which provides continuous learning of tasks arriving sequentially (Thrun 1996; Chen and Liu 2018; Fei et al. 2016). The essential component of this framework is a database which maintains the knowledge acquired from previously learned tasks TSK_1, TSK_2, ..., TSK_{N-1}. Incremental learning starts from the task manager assigning a new task TSK_N to a learning agent. In this case, the agent exploits the knowledge in the database as prior data for enhancing the generalization performance of its model on the new task. After the new task TSK_N is learned, the database is updated with the knowledge obtained from learning TSK_N.
In fact, the incremental learning framework provides an agent with three capabilities: (i) continuous learning, (ii) knowledge accumulation, and (iii) re-use of previous knowledge for future learning enhancements. Figure 15 shows the general structure of DMPs integrated into a lifelong learning framework.

Churchill and Fernando (2014) proposed a cognitive architecture capable of accumulating adaptations and skills over multiple tasks in a manner which allows recombination and re-use of task-specific competences. Lemme et al. (2014) segmented demonstrations based on geometric similarities, and subsequently created a motion primitive library. The library is updated by removing unused skills and including new ones. The parameterized skill memory of Reinhart and Steil (2014, 2015), which connects a low-dimensional skill parameterization to motion primitive parameters and leverages the embedding for efficient policy search, also fits this incremental setting. A piece-wise linear phase is used to improve incremental learning performance in (Samant et al. 2016). Duminy et al. (2017) designed a framework that learns which data collection strategy is most efficient for acquiring motor skills to achieve multiple outcomes, and generalizes over its experience to achieve new outcomes for cumulative learning. A generative learning framework was proposed to augment the robot's knowledge base with missing information at different levels of the cognitive architecture, including symbolic planning as well as object and action properties (Wörgötter et al. 2015).

Wang et al. (2016) proposed a modified formulation of DMPs, called DMP+, which is capable of efficiently modifying learned trajectories, improving the usability of existing primitives and reducing user fatigue during IL. Later, DMP+

Figure 16. Human operators teach the robot how to perform different tasks.
Left scenarios use the robots' gravity compensation mode to enable kinesthetic guiding, where a human operator guides the robot's tool center point along the desired trajectory in such a way that the desired task is successfully executed (Sloth et al. 2020; Abu-Dakka et al. 2015a, 2018; Caccavale et al. 2019). Right scenarios use a teleoperation system to demonstrate appropriate robot movements either through a haptic interface (Peternel et al. 2018a) or magnetic trackers (Abu-Dakka et al. 2015a).

had been integrated into a dialogue system with speech and ontology to learn or re-learn a task using natural interaction modalities (Wu et al. 2018).

In the literature, it has been shown that incremental learning provides better generalization than isolated learning approaches in terms of interpolation, extrapolation, and the speed of learning (Hazara and Kyrki 2017). Hazara and Kyrki (2018) improved their Global Parametric Dynamic Movement Primitive (GPDMP) (Lundell et al. 2017) in order to incrementally construct a database of motion primitives, which aims to improve the generalization to new tasks. Furthermore, skills have been transferred incrementally from simulation to the real world (Hazara and Kyrki 2019). Moreover, the authors endowed incremental learning with a task manager capable of selecting a new task by maximizing future learning while considering the current task performance (Hazara et al. 2019).

We categorize the applications into several subsections based on different topics. We first separate the use of DMPs for robot interaction with the passive environment (e.g., tools, objects, surfaces, etc.) and for interaction with an agent that involves co-manipulation (e.g., human, another robot, etc.). Additionally, we examine several other major application areas, such as human body augmentation/rehabilitation with exoskeletons, teleoperation, motion analysis/recognition, high-DoF robots, and autonomous driving and field robotics.
Most of the daily tasks that robots perform involve some kind of physical interaction with the environment that requires control of forces or positions. Nevertheless, simultaneous control of force and position along the same axis is not possible (Stramigioli 2001)§, and therefore control approaches have to make a compromise between prioritizing position control or force control (Schindlbeck and Haddadin 2015). The key to such control is for the robot to learn appropriate force or position reference trajectories that can lead to the desired task performance in interaction with the environment.

A common approach to teaching robot motion trajectories is kinesthetic guidance (Fig. 16-Left), where the human operator holds the robot arm and shows the appropriate movements to be encoded by DMPs (Kormushev et al. 2011; Abu-Dakka et al. 2015a; Joshi et al. 2017; Papageorgiou et al. 2020a,b). Recently, the technology has been making inroads into high-risk fields such as invasive surgery, where high-dimensional fine human-like manipulation skills are being demonstrated (Su et al. 2021) and executed with robots (Su et al. 2020; Ginesi et al. 2019). In (Kormushev et al. 2011), the human held the robot arm and used kinesthetic guidance to teach the position and orientation trajectories necessary to perform ironing and door opening tasks.

§ There is a duality in impedance-admittance, i.e., force produces motion and motion produces force; therefore, if one is the input, the other can only be the output of the control system (Peternel et al. 2017a).

Figure 17. Using DMPs for adapting to changing surfaces (e.g., wiping task) (Kramberger et al. 2018).

In the second stage, the corresponding forces and torques were recorded with a haptic device in a teleoperation setup. For setups where the robot arm is equipped with multiple force/torque sensors, the two demonstration steps with additional control policies can be combined into one (Steinmetz et al.
2015; Montebelli et al. 2015).

An alternative to learning force trajectories is to learn the impedance of the robot via desired stiffness trajectories. The ability to change the impedance of the arm is crucial to simplify physical interaction in unpredictable and unstructured environments (Hogan 1984; Burdet et al. 2001). In (Peternel et al. 2015, 2018a), teleoperation was used with a push-button interface to command the robot impedance, which was learned by DMPs that enabled the robot to perform various collaborative assembly tasks. For example, the learned position and stiffness DMPs were used to insert a peg in a groove to bind the two parts (Peternel et al. 2015), or to screw a bolt (Peternel et al. 2018a). A similar approach was used in (Yang et al. 2018) to learn DMPs for a vegetable cutting task. While teleoperation-based methods are very effective to teach the robot DMPs for interaction tasks, they usually involve a complex and expensive system. The method in (Abu-Dakka et al. 2018) enabled the robot to learn stiffness profiles through measurement of the interaction force with the environment to perform a valve-turning task. The method in Peternel et al. (2017a) used human demonstration and EMG to learn stiffness DMPs from human muscle activity measurements in order to perform sawing and wiping (Fig. 17) tasks.

Nevertheless, adaptation of a single trajectory is unlikely to generate an appropriate solution for more general cases, where the task execution needs to change significantly. After learning the initial DMP motion trajectories through kinesthetic guidance, the robot can then adapt them based on the measured interaction force while performing the task. Pastor et al. (2011) introduced a method for real-time adaptation of demonstrated DMP trajectories depending on the measured sensory data. They developed an adaptive regulator for trajectory adaptation based on estimated and actual force data. Recently, Prakash et al.
(2020) extended the real-time adaptation approach by incorporating a fuzzy fractional-order sliding mode controller in order to efficiently and stably adapt the demonstrated DMP trajectory to fast movements, such as a ping pong swing.

Figure 18. An example of using DMPs in assembly tasks (e.g., peg-in-the-hole) (Kramberger et al. 2016b).

Sutanto et al. (2018) presented a data-driven framework for learning a feedback model from demonstrations. They used a radial basis function neural network (RBF-NN) to represent the feedback model for the movement primitive. Similarly, Gams et al. (2010) proposed a method for adaptation of demonstrated movements depending on the desired force with which the robot should act on the environment. Thus, they ensured the adaptation of the learned movements to different surfaces. This approach was later expanded (Pastor et al. 2011) to provide the statistically most likely force-torque profile (Pastor et al. 2012); furthermore, force-torque data was used for training a classifier (Straizys et al. 2020) in order to modulate the demonstrated trajectory for use with delicate tasks such as tissue or fruit cutting. Moving onward from policy learning, Do et al. (2014) presented an adaptation framework where not only the desired adaptation force or trajectory, but the entire skill can be learned. They demonstrated the method with a wiping task under different environmental conditions.

Assembly presents one of the more challenging tasks to automate, where not only position trajectories but also task dynamics have to be taken into account. To deal with this challenge, various methods were proposed. Abu-Dakka et al. (2015a) proposed a method that can learn the orientation aspect of complex physical interaction, like the peg-in-the-hole assembly task (Fig. 18). The proposed method was integrated in an industrial assembly framework where the key challenge was to adapt to uncertainties presented by the assembly task (Krüger et al. 2014; Abu-Dakka et al.
2014). Complex assembly tasks that are subject to change cannot be demonstrated and executed on the fly; therefore, adaptation methods are required for ensuring a successful execution. Nemec et al. (2020) used exception strategies for dealing with complex assembly cases. Sloth et al. (2020) presented an exception strategy framework, combining discrete and periodic DMPs, coupled with force control to learn an assembly task under tight tolerances. Gašpar et al. (2020) presented several industrial assembly challenges and focused on fast and efficient setup of industrial tasks with the emphasis on LfD. Angelov et al. (2020) incorporated several different control policies by taking into account the dynamics and sequencing of the task.

In some cases, active exploration and autonomous database expansion can be used for learning assembly policies automatically. In (Petric et al. 2015), the proposed algorithm can build and combine CMP motion knowledge from a database in an autonomous manner.

Complementary to assembly tasks, disassembly is also challenging when solely using the demonstrated trajectories. As described in (Ijspeert et al. 2013), DMPs have a unique point attractor at the specified goal of the movement, essentially precluding reversibility. Therefore, Nemec et al. (2018) proposed a framework where the disassembly challenge was tackled by learning two separate DMPs from a single demonstrated motion: one forwards and one backwards. San Juan et al. (2019) took the idea further and reformulated the DMP phase system with a logistic differential equation to obtain two stable point attractors. This approach provided a reversible formulation of the dynamical system, and the effectiveness of the algorithm was demonstrated on a peg-in-hole assembly task.

Desired force-torque profiles can be tracked using ILC (Gams et al. 2014, 2015b).
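The core of such an ILC scheme is a trial-to-trial update of a feedforward command based on the tracking error of the previous repetition. The following is a minimal sketch under strong simplifying assumptions: the contact force is modeled as a hypothetical static gain on the command, and the learning gain is an illustrative value, not one from the cited works:

```python
import numpy as np

# Assumed toy environment: the measured contact force responds to the
# commanded offset u through an unknown static gain.
TRUE_GAIN = 2.0
def measured_force(u):
    return TRUE_GAIN * u

t = np.linspace(0.0, 1.0, 100)
f_des = np.sin(np.pi * t)    # desired force profile over one repetition
u = np.zeros_like(t)         # feedforward command, refined trial by trial
gamma = 0.3                  # learning gain; convergence needs |1 - gamma*G| < 1

for trial in range(30):
    e = f_des - measured_force(u)   # tracking error of this repetition
    u = u + gamma * e               # ILC update carried over to the next trial
```

Here the error contracts by a factor |1 - gamma*G| per trial, which also illustrates why careful tuning of the learning gain matters: too large a gain makes the iteration diverge.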
In repetitive robotic tasks, iterative learning has been gaining increased popularity (Bristow et al. 2006) due to its effectiveness and robustness. However, in order to achieve effective results, careful tuning of the learning parameters is required. Norrlöf (1991) and Tayebi (2004) presented adaptive learning approaches for automated tuning of the learning parameters. Another approach is to use RL to adapt DMPs. For example, in (Buchli et al. 2011b,a), stiffness parameters were adjusted during the task execution by RL.

Alternatives to feedback-based adaptation of DMPs and RL are scalability and generalization approaches. Matsubara et al. (2011) proposed an algorithm for the generation of new control policies from existing knowledge, thereby achieving an extended scalability of DMPs, while a mixture of motor primitives was used for the generation of table tennis swings (Mülling et al. 2010). On the other hand, generalization of DMPs was combined with model predictive control by Krug and Dimitrov (2015), or applied to DMP coupling terms by Gams et al. (2015a), which were learned and later added to a demonstrated trajectory to generate new joint space trajectories. Stulp et al. (2013) proposed to learn a function approximator with one regression in the full space of phase and task parameters, bypassing the need for two consecutive regressions. Forte et al. (2012) performed a comparison study of LWR and GPR for trajectory generalization. This work shows that higher accuracy can be achieved with LWR trajectory approximation. Koropouli et al. (2015) presented a generalization approach for force control policies. By learning both the policy and the policy difference data using LWR, they could estimate the policy at new inputs through superposition of the training data. Deniša et al. (2016a) used GPR-based generalization over combined joint position trajectories and torque commands in the framework of CMPs. To showcase the versatility of the approach, Petric et al. (2018) applied it to robot-based assembly tasks.
Finally, Kramberger et al. (2017) extended the approach to account for variations of the desired tasks, e.g., assembly of similar objects. This enables the robot movements to be automatically generated with the use of LWR from a demonstrated database of successful task executions, which includes kinematic and dynamic demonstrated trajectories encoded with DMPs. The newly obtained data is used to account for the changes in the workspace. Nevertheless, a major problem in statistical learning is how to efficiently deal with singularity-free representations of orientation trajectories. To resolve this issue, Kramberger et al. (2016a) proposed a formulation for Cartesian space DMPs where orientations are represented with unit quaternions.

Figure 19. An example of using DMPs for collaborative human-robot sawing from (Peternel et al. 2018b).

While control of robot interaction with the passive environment can solve the majority of tasks, in some cases the robot needs to interact with an active agent (e.g., human, another robot, etc.). Human-robot collaboration is becoming one of the key fields in robotics (Ajoudani et al. 2018). To perform successful physical human-robot collaboration, the robot must be able to control complex movements in coordination with the human partner. In this direction, the ability to modulate the impedance is important to coordinate the physical interaction during human-robot co-manipulation of tools (Peternel et al. 2017b). DMPs offer an elegant solution to encode such coordinated dynamic movements. In (Peternel et al. 2014), the collaborative robot was taught online through teleoperation how to perform collaborative sawing with a human co-worker. The impedance was commanded to the robot through muscle activity measurement using EMG. DMPs were used to encode coordinated phase-dependent motion and impedance as demonstrated by the human teleoperator.
Teaching through teleoperation is an effective way to convey the physical interaction skill to the collaborative robot; however, the setup can be expensive and is not widely available. An intuitive alternative to teleoperation is for the robot to learn the skill directly through physical interaction with the human partner while they are collaborating. Numerous methods have focused on learning the synchronized motion between collaborative partners (Kulvicius et al. 2013; Prada et al. 2013; Gams et al. 2014; Umlauft et al. 2014; Zhou et al. 2016a; Peternel et al. 2018b; Sidiropoulos et al. 2019; Ugur and Girgin 2020). For example, in (Kulvicius et al. 2013) [...] Interaction Primitives that can account for the probabilistic nature of collaborative movements. Rather than having a single value of weights, the DMP includes weight distributions. This distribution enabled the robot to learn the inherent correlations of cooperative actions and infer the behavior of the human partner during the cooperation. Cui et al. (2016, 2019) used visual information to extract context-related parameters that augment the interaction primitives to increase the robustness during the task execution.

There are also other types of co-manipulation scenarios, such as within-hand bi-manipulation or human-robot object handover. For example, in (Koene et al. 2014; Gao et al. 2019), DMPs were used to perform bi-manipulation, while in (Prada et al. 2014; Solak and Jamone 2019; Lafleche et al. 2019; Abdelrahman et al. 2020), DMPs were used for human-robot object handover.

When the environment is hazardous for human workers, or when there are too many robots compared to the number of human workers, the obvious solution is to make robots collaborate among themselves. The method in (Peternel and Ajoudani 2017) used DMPs to make novice robots learn from an expert robot through co-manipulation. Initially, the novice robot remained compliant to let the expert robot lead the task execution.
In the first stage, the novice robot learned the reference motion through DMPs. In the second stage, it became stiff to perform the newly learned motion, while the expert robot initiated the stiff/compliant phases expected in the collaborative task execution. Finally, the novice robot learned in which phases of the task to increase or decrease the impedance, and encoded this impedance behavior with DMPs.

The most common type of co-manipulation is the classic human-robot collaboration, where a human and a robotic agent physically perform industrial or daily tasks together. Another type of co-manipulation occurs when a human is wearing an exoskeleton. In most cases, the exoskeleton simply amplifies the current human motion (Kong and Jeon [...]

Figure 20. An example of using DMPs for teaching passive exercises for ankle rehabilitation (Abu-Dakka et al. 2015b, 2020).

[...] Left). The phase-dependent torque trajectory was updated online in order to minimise the muscle activity feedback measured by EMG. In (Petrič et al. 2016), the robot encoded the assistive motion with DMPs and then adapted it by taking into account aspects of human motor control through Fitts' law.

Gait-related rehabilitation with exoskeletons is a very common application of DMPs, and there are numerous examples (Abu-Dakka et al. 2015b; Huang et al. 2016a,b; Hwang et al. 2019; Yuan et al. 2020; Amatya et al. 2020). In (Abu-Dakka et al. 2015b, 2020), a parallel robot was used for ankle rehabilitation, where the movements were generated by DMPs (Fig. 20). In (Huang et al. 2016a), DMPs were used to learn the gait motion trajectories for a lower-body exoskeleton. This approach was then extended with an RL method to adapt a force coupling term (similar to earlier approaches presented in Section 3.3.2) to enable online adaption of motion trajectories (Huang et al. 2016b). Besides normal gait, DMPs were also applied for stair-ascent (Xu et al. 2020) and sit-to-stand (Kamali et al. 2016) assistive movements of lower-body exoskeletons. In (Joshi et al.
2019), a robotic arm was used to assist humans with putting clothes on their body, where the movements were generated by DMPs. Besides assistive body movement and rehabilitation, DMPs were also applied for relaxation purposes. For example, in (Li et al. 2020), a robotic arm provided massage movements through DMPs.

Figure 21. The left photo shows the arm exoskeleton application from (Peternel et al. 2016a). The right photo shows the high-DoF humanoid robot Walk-man (Tsagarakis et al. 2017) performing sawing in (Peternel and Ajoudani 2017).

Teleoperation is one of the major fields of robotics and enables a human to have direct and real-time control over a (remote) robot. Typically, the control is done through interfaces that can capture the human commands to be sent to the robot and that can provide haptic feedback from the robot. While teleoperation focuses on giving the human operator full or shared control over the robot, DMPs are used to encode autonomous robot behaviors. Therefore, here we mostly examine cases where teleoperation is used to teach the robot new autonomous behaviors encoded by DMPs.

In (Kormushev et al. 2011), a combination of kinesthetic teaching and teleoperation was employed to form the DMP-based robot skill for ironing. After the motion trajectories were learned through kinesthetic guidance, the corresponding forces were recorded by using a haptic device and a teleoperation system. In (Peternel et al. 2014), teleoperation was used to teach the robot how to physically collaborate with another human. Since there was no haptic feedback, the teleoperation setup was unilateral, but the human was able to teach the impedance of the robot in addition to the motion. The former was commanded by muscle activity measurement through EMG, while the latter was commanded by the movement of the human operator's arm as measured by an optical motion capture system. In (Peternel et al.
2018a), the human operator taught the robot through teleoperation how to perform autonomous assembly actions (Fig. 16-Right). DMPs were used to encode the commanded impedance and motion; however, a more practical push-button based impedance command interface was employed. More importantly, the teleoperation setup was bilateral, and the haptic interface provided the human operator with feedback about the forces the robot felt. Similar teleoperation approaches were used in (Yang et al. 2018; Lentini et al. 2020).

A real robot is not always necessary to acquire new skills. In (Beik-Mohammadi et al. 2020), the robot and the environment were simulated and the human operator used a virtual reality system. A combination of DMPs and RL was used to form an adaptive skill. The scenario proposed in (Abu-Dakka et al. 2015a) was teleoperation in its basis; however, the human demonstrator did not just pretend that he/she was embodied in the robot, but the robot task environment was cloned at the human side (Fig. 16-Right). This removed the need for force feedback and a haptic device, since the human felt the real environment on his/her side, while the motion was captured by a non-contact sensory system (i.e., magnetic trackers) and then mirrored on the robot. Multiple demonstrations through teleoperation can be inconsistent, especially if done in a multi-agent shared-control setting. The method proposed in (Pervez et al. 2019) can synchronize inconsistent demonstrations through shared-control teleoperation and encode them with DMPs.

DMPs provide an elegant and fast way to deal with high-dimensional systems by sharing one canonical system (3) among all DoFs and maintaining only a separate set of transformation systems. By high-dimensional we are referring to systems with 10 or more DoFs (e.g., the Walk-man humanoid robot in Figure 21-Right). In this section, we briefly mention some notable works with a high number of DoFs. Ijspeert et al.
(2002b,a) used DMPs in an IL framework to learn a tennis forehand, a tennis backhand, and rhythmic drumming using a 30-DoF humanoid robot. Pastor et al. (2009) used DMPs to encode the motion of a 10-DoF exoskeleton robot arm. Luo et al. (2015) integrated DMPs with stochastic policy gradient RL and GPR in order to design an online adaptive push recovery control strategy. The approach was applied to the PKU-HR5 humanoid robot with 20 DoFs. André et al. (2015, 2016) implemented a predictive model of sensor traces that enables early failure detection for humanoids, based on an associative skill memory applied to periodic movements and DMPs. They applied their algorithm on a DARwIn-OP with 20 DoFs in simulation. Pfeiffer and Angulo (2015) represented gestures by applying DMPs on the REEM robotic platform with 23 DoFs. Nah et al. (2020) proposed an approach to optimize DMP parameters in order to deal with the complexity of a high-DoF system like a whip. They tested their approach in simulation on 10-, 15-, 20-, and 25-DoF systems. In order to reduce the number of rollouts required for adaptation to new task conditions, Queißer and Steil (2018) used CMA-ES to optimize DMP parameters. In addition, they introduced a hybrid optimization method that combines a fast coarse optimization on a manifold of policy parameters with a fine-grained parameter search in the unrestricted space of actions. The approach was successfully illustrated in simulation using a 10-DoF robot arm. Liu et al. (2020) proposed DMP-based trajectory generation to enable a full-body humanoid robot with 10 DoFs (for the two legs) to realize adaptive walking. Travers et al. (2016, 2018) proposed a framework that integrates DMPs with Gaussian-shaped spatial activation windows in order to plan the motion of high-DoF robotic systems (e.g., snake-like robots) in complex environments (with obstacles) by linking low-level controllers to high-level planners.
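To make the shared-canonical-system structure concrete, the following is a minimal sketch of a discrete multi-DoF DMP in which a single phase variable drives one transformation system per DoF. The gains, basis-function placement, and integration scheme are illustrative choices, not those of any cited implementation:

```python
import numpy as np

def dmp_rollout(y0, g, w, tau=1.0, alpha=25.0, beta=6.25, alpha_x=3.0, dt=0.002):
    """Integrate n_dof transformation systems driven by ONE canonical system.

    y0, g : (n_dof,) start and goal positions; w : (n_dof, n_bf) shape parameters.
    """
    n_dof, n_bf = w.shape
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_bf))   # basis centers along the phase
    h = 1.0 / np.diff(c, append=c[-1] * 0.5) ** 2        # basis widths from center spacing
    y, z, x = y0.astype(float).copy(), np.zeros(n_dof), 1.0
    path = [y.copy()]
    for _ in range(int(tau / dt)):
        psi = np.exp(-h * (x - c) ** 2)                  # activations, shared by all DoFs
        f = (w @ psi) / psi.sum() * x * (g - y0)         # forcing term, one value per DoF
        dz = (alpha * (beta * (g - y) - z) + f) / tau    # transformation systems
        dy = z / tau
        dx = -alpha_x * x / tau                          # the single canonical system
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt  # explicit Euler step
        path.append(y.copy())
    return np.array(path)
```

Because the phase `x` is computed once per step and reused by every DoF, adding DoFs only adds rows to `w`; the forcing term vanishes as `x` decays, so every DoF converges to its goal regardless of the learned shape.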
DMPs tend to fit topologically similar trajectories with similar shape parameters w_i (Ijspeert et al. 2013). This behavior, due to the temporal and spatial invariance of DMPs, makes the shape parameters a useful descriptor for recognizing similar motions. Indeed, Strachan et al. (2004) have shown that the shape parameters computed for repetitions of classes of discrete hand gestures, measured with an accelerometer, are linearly separable, i.e., easy to classify. Lantz and Murray-Smith (2004) draw similar conclusions for classes of periodic hand gestures. Xu et al. (2005) used the correlation between the parameter vectors of two DMPs to measure the similarity between motions and recognize gait patterns. Similarly, Ijspeert et al. (2013) used the correlation between parameter vectors to recognize the letters of the Graffiti alphabet. The shape parameters w_i are also suitable to fit more sophisticated classifiers like support vector machines. This strategy was used to successfully classify gestures observed with a monocular (Liu et al. 2014) or a binocular (Wang and Payandeh 2015) camera. Instead of considering a fixed number of basis functions (number of shape parameters), Zhang et al. (2017) used fast dynamic time warping (Salvador and Chan 2007) to align parameter vectors of different lengths and then used K-nearest neighbors to classify different motions.

Motion recognition can also be used to determine whether the robot is correctly executing a task by comparing sensed data with a movement template. In this respect, André et al. (2016) used an associative skill memory, like the one in (Pastor et al. 2011), as a predictive model of sensor traces that enables early failure detection. In this work, DMPs were used to compactly encode the associative skill memory and speed up the failure detection.
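The correlation-based recognition scheme described above reduces to comparing fitted weight vectors. A minimal sketch, assuming the weight vectors have already been fitted (e.g., with LWR) and share the same number of basis functions:

```python
import numpy as np

def dmp_similarity(w_a, w_b):
    """Correlation-based similarity between two DMP shape-parameter vectors:
    the cosine of the angle between them (1.0 means identical shape)."""
    w_a, w_b = np.asarray(w_a, float), np.asarray(w_b, float)
    return float(w_a @ w_b / (np.linalg.norm(w_a) * np.linalg.norm(w_b)))

def recognize(w_query, templates):
    """Nearest-template classification: return the label of the template
    whose weight vector correlates best with the query motion's weights."""
    return max(templates, key=lambda label: dmp_similarity(w_query, templates[label]))
```

Because the weights are invariant to temporal and spatial scaling of the demonstration, this comparison is cheap enough for systems with limited computational power.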
The described approaches demonstrate that DMPs are a valuable option for gesture recognition, especially for systems with limited computational power.

Humans tend to perform the same task in slightly different manners. Sometimes differences in the execution style contain useful information to adapt the motion to a different execution context. This is the case, for instance, of a reaching motion with and without an obstacle on the way. To capture the execution style, Matsubara et al. (2010) augmented the forcing term of the DMP with a style parameter learned from multiple demonstrations. At run time, different style parameters can be used to smoothly interpolate between demonstrated behaviors. Zhao et al. (2014) also employed movements with different styles, but additionally learned a smooth mapping between style parameters and goal to improve the generalization.

When humans provide seamless demonstrations, DMPs can be used for online segmentation and recognition. To this end, Meier et al. (2011) assumed that a library of DMPs is given and used it to recognize motion segments during a task demonstration. Instead of using exemplar templates for each class of primitives, Chang and Kulić (2013) segmented a video stream using motion to non-motion transitions, fitted DMPs on the segmented data, and performed clustering to group similar motion segments in an unsupervised fashion. Song et al. (2020) performed unsupervised trajectory segmentation using the concept of key points, i.e., shared features across different task demonstrations. Mandery et al. (2016) segmented whole-body motions by detecting contacts with the environment and used them to build a probabilistic language model where words represent poses and sentences represent sequences of poses. The learned language model was used to plan whole-body motion trajectories executed by joining multiple DMPs (see Section 3.2).

DMPs have been developed as a computational model of the neurobiological motor primitives (Schaal et al.
2007). Experimental findings from neurophysiology related to the spinal force fields in frogs have inspired the modification of the DMP formulation in (Hoffmann et al. 2009). As discussed in Section 3.1.1, this multidimensional representation overcomes limitations of classical DMPs like trajectory overshooting and dependence of the trajectory on the reference frame used to describe the motion. Hoffmann et al. (2009) also derived a collision avoidance strategy for DMPs, inspired by the way humans avoid collisions during arm motion. DeWolf et al. (2016) investigated the human ability to cope with changes in the arm dynamics and kinematic structure during motion control. They proposed a spiking neuron model of the motor control system that uses DMPs to implement the preparation and planning functionalities of the premotor cortex. The effects of changes in the robot's dynamic parameters on the tracking performance of a DMP trajectory were studied in (Kuppuswamy and Alessandro 2011). Their findings suggest that the change in the body parameters should be explicitly considered in the DMP learning process. Hotson et al. (2016) augmented a brain-machine interface that captures neural signals with a DMP model of the endpoint trajectories executed by a non-human primate. The system was used to decode real trajectories from a primate manipulating four different objects.

DMPs can be utilized in various autonomous, non-stationary fields of robotics. Perk and Slotine (2006) utilized DMPs for defining flight paths and obstacle avoidance for Unmanned Aerial Vehicles (UAVs), where the trajectories were generated based on the joystick movements controlling the throttle of the UAV motors. Later, Fang et al. (2014) extended the approach to encode user-demonstrated UAV data, extracting and encoding the rhythmic and linear segments of the flight trajectory, and combining them into a flight control skill. Furthermore, Tomić et al. (2014) formulated the UAV movements as an optimal control problem.
The output of the optimal control solver was encoded with DMPs, enabling them to generalize and apply in-flight modifications to the UAV flight trajectories in real time. Similarly, Lee et al. (2018); Kim et al. (2018a) presented a framework for UAV cooperative aerial manipulation tasks, based on an adaptive controller which adapts the movement of the UAV in relation to the mass and inertial properties of the payload. In addition, DMPs were incorporated in the control scheme to modify the flight trajectories and avoid obstacles on the fly. The approach was later extended to incorporate path optimization, where DMPs play a significant role in real-time obstacle avoidance (Lee et al. 2020).

As mentioned before, DMPs represent a versatile movement representation, which can be implemented in various tasks and scenarios. One of the recent applications in this field is Autonomous Underwater Vehicles (AUVs). Carrera et al. (2015) integrated DMPs in a learning by demonstration scenario for an AUV. The demonstrated data consisted of the manipulator and vehicle sensory outputs, which were efficiently used to demonstrate an underwater valve turning task.

DMPs are also represented in the autonomous driving domain. In the recent work of (Wang et al. 2018, 2019), the authors propose a framework which decomposes complex driving data into a more elementary composition of driving skills represented as motion primitives. In the proposed framework, DMPs are utilized to represent the driver's trajectory with acceptable accuracy and can be generalized to different situations.

This section provides guidelines to choose, among the several approaches discussed in this work, the most appropriate one for a given application. A useful criterion to decide whether to use a particular approach is the availability of code that greatly simplifies the implementation.
We have searched for open-source DMP implementations and listed them in a Git repository (see Section 6.2). To further contribute to the community, we have also released the implementations listed in Table 4. This section ends with a discussion on the limitations inherent to the DMP formulation, the open issues, and the possible research directions. These are summarized in Table 5.

Previous sections present different DMP formulations and extensions together with possible application scenarios. As usual, there is no single formulation that serves all scopes and purposes, and the suitable approach depends on the goal to achieve and the conditions of application. For this reason, we present some guidelines to guide the user in the process of selecting the formulation to use.

For a task with distinct starting and ending points, discrete DMPs are a logical option to encode the movement trajectories between them. Examples of these tasks include reaching and pick-and-place (Stulp et al. 2009; Forte et al. 2012; Deniša et al. 2016a; Caccavale et al. 2019), specific assembly actions (Krüger et al. 2014; Abu-Dakka et al. 2014; Gašpar et al. 2020; Nemec et al. 2020; Angelov et al. 2020), and cutting (Yang et al. 2018; Straizys et al. 2020). When the starting and ending points coincide, periodic DMPs are the logical option, since the encoded movements can be repeated over and over again. Good examples of their application are repetitive tasks such as locomotion (Rückert and d'Avella 2013; M. Wensing and Slotine 2017), human body augmentation/rehabilitation (Peternel et al. 2016a), wiping a surface (Gams et al. 2016; Peternel et al. 2017a; Kramberger et al. 2018), and sawing (Peternel et al. 2018b). Nevertheless, even typically non-repetitive tasks that are executed just once every now and then can still be encoded with periodic DMPs when the starting and ending points coincide (Peternel et al. 2018a).
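The guideline above hinges on the difference between the two canonical systems: a decaying phase for discrete DMPs (the motion terminates) and an ever-growing, wrapped phase for periodic ones (the motion repeats). A minimal sketch, with illustrative parameter values of our own choosing:

```python
import numpy as np

def discrete_phase(alpha_x=4.0, tau=1.0, dt=0.01, T=1.0):
    """Discrete DMPs: the phase x decays from 1 toward 0, suppressing the
    forcing term so the motion converges to the goal and terminates."""
    x, xs = 1.0, []
    for _ in range(int(T / dt)):
        x += (-alpha_x * x / tau) * dt
        xs.append(x)
    return np.array(xs)

def periodic_phase(omega=2 * np.pi, tau=1.0, dt=0.01, T=1.0):
    """Periodic DMPs: the phase phi grows monotonically and is read modulo
    2*pi, so the encoded movement can be repeated indefinitely."""
    phi, phis = 0.0, []
    for _ in range(int(T / dt)):
        phi += (omega / tau) * dt
        phis.append(phi % (2 * np.pi))
    return np.array(phis)
```

Choosing between the two formulations therefore amounts to deciding whether the demonstrated movement should die out at a goal or cycle through the same shape.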
There are cases where it is not possible to clearly distinguish whether the motion is periodic or discrete. For instance, Ernesti et al. (2012) have shown that the first step in the gait of a humanoid robot is a transient towards a periodic motion. Their representation is a good candidate to encode transients converging to limit cycle trajectories. Finally, in some cases, like complex assembly, the task requires a combination of discrete and periodic DMPs (Sloth et al. 2020).

The original formulation of DMPs was and still is successfully applied to multidimensional independent data with each DoF ∈ R (Sections 2.1.1 and 2.2.1). These data can be joint or Cartesian positions, forces, torques, etc., where every DoF of the data can evolve independently from the rest. However, such a formulation is not sufficient to successfully encode data with specific geometric constraints without pre- and/or post-processing the data. Examples of such data are: i) orientations, where the data are subject to additional constraints (i.e., the orthogonality of the rotation matrix representation or the unit norm of the quaternion representation); ii) full stiffness/damping matrices and manipulability matrices, which are encapsulated in SPD matrices.

In many early works, orientation trajectories were learned and adapted without considering their geometric constraints (Pastor et al. 2009), leading to improper orientations and hence requiring an additional re-normalization. In a different example, Umlauft et al. (2017) used eigendecomposition for impedance adaptation. In order to comply with such geometric constraints, researchers provided new formulations of DMPs that ensure proper unit quaternions or rotation matrices over the course of orientation adaptation (Abu-Dakka et al. 2015a; Ude et al. 2014; Saveriano et al. 2019; Koutras and Doulgeri 2020a), and proper SPD matrices over the course of the adaptation of SPD profiles (e.g., stiffness or manipulability ellipsoids) (Abu-Dakka and Kyrki 2020).
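As an illustration of why orientation data need special treatment, the sketch below computes a geometry-aware orientation error through the quaternion logarithmic map, which is the basic operation behind the unit-quaternion formulations cited above. The implementation details are our own simplification (scalar-first convention), not the exact formulas of those papers:

```python
import numpy as np

def quat_log(q):
    """Logarithmic map of a unit quaternion q = (w, x, y, z): returns a 3-D
    rotation vector (angle-axis), the geometry-aware distance from identity."""
    w, v = q[0], np.asarray(q[1:], float)
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return np.zeros(3)
    return 2.0 * np.arccos(np.clip(w, -1.0, 1.0)) * v / nv

def quat_error(g, q):
    """Orientation error between goal quaternion g and current quaternion q,
    computed on the quaternion manifold (no element-wise subtraction, so no
    re-normalization is ever needed)."""
    qc = np.array([q[0], -q[1], -q[2], -q[3]])   # conjugate of q
    # Hamilton product g * conj(q)
    w = g[0]*qc[0] - g[1]*qc[1] - g[2]*qc[2] - g[3]*qc[3]
    x = g[0]*qc[1] + g[1]*qc[0] + g[2]*qc[3] - g[3]*qc[2]
    y = g[0]*qc[2] - g[1]*qc[3] + g[2]*qc[0] + g[3]*qc[1]
    z = g[0]*qc[3] + g[1]*qc[2] - g[2]*qc[1] + g[3]*qc[0]
    return quat_log(np.array([w, x, y, z]))
```

Replacing the Euclidean goal error g − y of the standard transformation system with this manifold error is what keeps the integrated orientation a proper unit quaternion.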
We believe that using these geometry-aware DMPs is preferable for encoding data with underlying geometric constraints.

DMPs represent motion trajectories as stable dynamical systems with learnable weights that define the shape of the motion. In the LfD paradigm, DMP weights are usually learned in a supervised manner using human demonstrations. The procedure used to transform human demonstrations into training data for the DMP forcing term is highlighted in Section 2.1.1.1. Given the training data, different techniques can be used to fit the weights. LWR is widely used when the forcing term is a combination of RBFs as in (4). If multiple demonstrations are given, one can exploit GMM/GMR as in (Pervez et al. 2017a) or GPR as in (Fanger et al. 2016) to represent the forcing term and use expectation-maximization to fit the (hyper-)parameters. Deep NNs, typically trained via back-propagation, seem an appealing possibility to map input images into forcing terms (Pervez et al. 2017b), mimicking the human perception-action loop. Although appealing, the possibility of exploiting deep learning techniques as motion primitives requires further investigation.

Table 4. Open-source implementations of DMP-based approaches that we have released to the community. The source code for each approach is available at https://gitlab.com/dmp-codes-collection

Approach | Author | Language | Description
Discrete DMP | Fares J. Abu-Dakka | C++ | An implementation of discrete DMPs based on the work in (Ude et al. 2010; Abu-Dakka et al. 2015a; Ude et al. 2014).
Periodic DMP | Luka Peternel | Python | An implementation of periodic DMPs based on the work in (Peternel et al. 2016a).
Unit quaternion DMP | Fares J. Abu-Dakka | Matlab and C++ | An implementation of unit quaternion DMPs and goal switching based on the work in (Abu-Dakka et al. 2015a; Ude et al. 2014).
SPD DMP | Fares J. Abu-Dakka | Matlab | An implementation of SPD DMPs and goal switching based on the work in (Abu-Dakka and Kyrki 2020).
Joining DMPs | Matteo Saveriano | Matlab | An implementation for joining multiple DMPs based on the work in (Saveriano et al. 2019).
Coupling-force DMPs | Aljaž Kramberger | Matlab | An implementation of discrete DMPs with force coupling terms based on the work in (Kramberger et al. 2018).

In real applications, there can be a misplacement between the DMP trajectory and the robot motion. Typical examples include assembly or other tasks that require physical interaction with the environment (see Section 5.1). In these situations, the DMP motion can be incrementally adjusted to improve the robot performance. ILC arises as an interesting approach to iteratively update the DMP weights as it ensures rapid convergence to the desired performance (Gams et al. 2014; Abu-Dakka et al. 2015a; Kramberger et al. 2018). However, ILC assumes that a target behavior to reproduce is given. When the target behavior cannot be easily specified and the robot performance is not satisfactory, RL solutions have to be adopted. As detailed in Section 4.3, DMPs are effective control policies and, combined with policy search algorithms like PI2 or PoWER, are able to solve complex and highly dynamic tasks.

Performing robotic tasks in the real world requires adaptation capabilities. When adaptation of DMPs based on some feedback is required, one of the extension methods should be applied. For example, to change an existing movement based on a detected obstacle, the methods in (Park et al. 2008; Hoffmann et al. 2009; Tan et al. 2011; Gams et al. 2016) can be used (see Section 3.3.1). If it is necessary to adaptively learn the movement dynamics based on real-time effort feedback, the method in (Peternel et al.
2016a) can be employed (see Section 3.3). Furthermore, for industrial tasks, such as assembly or polishing, adaptation strategies combining force control with demonstrated trajectories can be applied (Abu-Dakka et al. 2015a; Kramberger et al. 2016a; Gams et al. 2010), ensuring that the system will follow the predefined trajectory and adapt to the environmental uncertainties. For online adaptation, DMPs can be used as a trajectory generator whose output represents an input to the force control algorithm; on the other hand, force feedback can be directly incorporated as a coupling term in the DMP formulation (see Section 3.3.2), eliminating the need for an additional force controller. A similar approach can also be utilized for velocity-based adaptation of the movements (see Section 3.3.4).

In physical interaction tasks, DMPs can be used to learn either force or impedance (Peternel et al. 2017a). If the task requires position control, then the impedance should be learned with DMPs in combination with the reference position. If the task requires controlling a specific force, e.g., pushing on a surface during wiping or drilling, either force or impedance is feasible. However, if safety is the most critical aspect, the DMPs should be used to learn impedance control so that the robot can be made soft. Furthermore, to overcome any undesirable movements, the control policy can be augmented with a tank-based passivity approach (Shahriari et al. 2017). This approach monitors the energy flow between the modeled sub-systems, e.g., DMP trajectory generation, impedance control, and the environment. In the event of an energy violation, the system will first try to passively compensate for the violation and subsequently, if the violation cannot be compensated (e.g., the energy tank is depleted), stop the system. In cases where the task characteristics are not fully known, a learning policy can be added on top of the passivity approach (Kramberger et al. 2018) in order to learn the overall energy requirements for the task.
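The force coupling term mentioned above can be sketched as an extra additive term in the DMP transformation system. Here k_c is a hypothetical coupling gain and the dynamics parameters are illustrative, not taken from a specific formulation:

```python
import numpy as np

def dmp_step_with_force_coupling(y, z, g, f_shape, f_meas, f_des,
                                 k_c=0.5, alpha_z=25.0, beta_z=6.25,
                                 tau=1.0, dt=0.002):
    """One Euler step of a DMP transformation system with a force-feedback
    coupling term: a mismatch between desired and measured contact force
    deforms the generated motion online, so no separate force controller
    is needed. k_c is a hypothetical coupling gain."""
    c = k_c * (f_des - f_meas)                            # coupling term
    dz = (alpha_z * (beta_z * (g - y) - z) + f_shape + c) / tau
    dy = z / tau
    return y + dy * dt, z + dz * dt
```

With matched forces the coupling vanishes and the nominal DMP behavior is recovered; a persistent force error steadily pushes the trajectory until contact conditions are restored.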
The availability of code and datasets is useful to speed up the setup of novel applications without the need to re-implement a promising approach from scratch. We have searched for available DMP implementations and found that several researchers have published their DMP code in various open-source repositories. We decided to list the available implementations in the Git repository that accompanies this paper (https://gitlab.com/dmp-codes-collection/third-party-dmp). For each implementation, we mention the type of DMP, the author, the URL to download the code, and the programming language used. We also provide a short description of the key features. Apart from listing existing approaches, the Git repository that accompanies this paper contains implementations that we decided to release to the community. The list of provided implementations is given in Table 4.

As any motion primitive representation, DMPs have strengths but also inherent limitations. The advantages of DMPs have been widely discussed in previous sections. Here, we present the main limitations of DMPs and discuss open issues that require further investigation. A summary of these limitations is presented in Table 5.

The phase variable used to suppress the non-linear forcing term and ensure convergence to a given goal introduces an implicit time dependency in the DMP formulation. The reason for representing the time dependency implicitly as a dynamical system is that such a phase variable can be conveniently manipulated. For example, in Section 2.1.1.2, we have seen how to manipulate the phase variable to slow down (or even stop) the execution. A drawback of the time dependency is that the shape of the DMP motion is significantly affected by the time evolution of the phase variable. If the phase vanishes too early, the last part of the trajectory is executed with a linear dynamics converging to the goal.
If the phase lasts too long, the trajectory may overshoot and fail to reach the goal within the desired time. In both cases, the DMP motion may significantly deviate from the demonstration. A properly designed phase stopping mechanism can remedy the issue, but the proper phase stopping to adopt depends on the specific application.

In order to overcome this limitation, several authors have focused on learning stable and time-independent (or autonomous) dynamical systems from demonstrations. A globally stable and autonomous system generates a vector field that converges to the given goal from any initial state. Without the need for a phase variable, the generated motion depends only on the current state of the system. Notable approaches to learn stable and autonomous systems exploit Lyapunov theory (Khansari-Zadeh and Billard 2011, 2014), contraction theory (Ravichandar and Dani 2015; Blocher et al. 2017), diffeomorphic transformations (Neumann and Steil 2015; Perrin and Schlehuber-Caissier 2016), and passivity considerations (Kronander and Billard 2015). These approaches have been effectively used to learn complex movements from demonstrations.

In general, autonomous systems have the potential to represent much more complex movements than DMPs. For example, autonomous systems can encode different motions in different regions of the state space. In this respect, DMPs can only generate a stereotypical trajectory connecting the start to the goal, regardless of where the initial state is placed in the state space. However, the stereotypical motion generation is also an advantage of DMPs, since it makes it easier to predict the generated motion in regions of the state space poorly covered by training data. On the contrary, it is hard to predict how an autonomous system generalizes where only few or no training data are available. DMPs are known to scale well in high-dimensional spaces since the learned forcing term always depends on a shared, scalar phase variable.
Autonomous systems perform learning directly in the high-dimensional state space, which poses numerical challenges and requires much more training data. In synthesis, each representation has its own advantages and disadvantages, and the choice between time-dependent and autonomous motion primitives depends on the specific application.

Representing the demonstrated motion as a probability distribution has several advantages. For example, in a probabilistic framework the generalization to a new goal (or a via-point) is achieved by conditioning on the new goal (via-point), while the covariance computed from the probability distribution can represent couplings between different DoFs (Paraschos et al. 2013). As a matter of fact, classical DMPs are deterministic and lack stochastic information about the modelled motion. Ben Amor et al. (2014) proposed an approach to estimate the predictive distribution P(w | y_T) that relates the DMP weights w and a partial trajectory y_T observed for T time instants. P(w | y_T) is used to estimate the most likely weights given a partial movement and to reconstruct the missing part of the trajectory. However, a full probabilistic characterization of DMPs is still missing.

The ProMP framework (Paraschos et al. 2013) proposed an alternative movement primitive representation that contains information about the variability across different demonstrations, as well as different DoFs, in the form of a covariance matrix. This enables one to explicitly encode the couplings between different directions and to increase the generalization by conditioning on a desired goal, via-point, or intermediate velocity. The covariance computed by ProMPs represents the variability and the correlation in the demonstrations. In other representations, like GPR, the covariance is a measure of the model uncertainty due to the lack of training data. Kernelized Movement Primitives (KMPs) (Huang et al. 2019; Silvério et al.
2019) offer the possibility of modelling variability, correlation, and uncertainty in the same framework. However, the computational cost of KMPs can be elevated compared to DMPs for longer trajectories, due to the computation of the inverse of the kernel matrix.

A vast majority of methods employ DMPs only as a reference trajectory generator for the closed-loop controller, which then actually executes it. However, DMPs can also be used as a part of the closed-loop controller itself, and only a few methods have explored this concept. For example, in Peternel et al. (2016a) the DMPs directly generate torques for exoskeleton actuators in the control loop, which is closed by feedback from the human user's muscle activity. Nevertheless, in such a scenario the closed-loop stability and passivity become crucial considerations that have to be addressed and resolved before wide-spread application (Kramberger et al. 2018).

6.3.4 Coping with high-dimensional inputs

One of the main limitations of DMPs is that they encode human and robot trajectories explicitly with time, which can cause synchronization problems when the execution differs (e.g., faster/slower velocity) from the demonstrated one. In order to avoid the synchronization problem, Ben Amor et al. (2014) designed a time-alignment strategy, while Pervez et al. (2017a) estimated the phase signal during training using expectation-maximization (Bishop 2006). As the DMP models trajectories using basis functions, it works effectively when learning time-driven trajectories, but coping with high-dimensional inputs such as raw images is less straightforward. Pervez et al. (2017b) used a CNN to extract low-dimensional task parameters (e.g., the position of a target) from an input image and a fully connected NN to retrieve the forcing term from the 2-D parameters and the phase variable. The CNN and the fully connected NN are trained in two separate stages. The approach is promising, but the separate training of the two networks increases the pre-processing and complicates the learning process. Alternative approaches in the literature, such as GMM/GMR (Calinon 2016), Task-Parameterized GMM (TP-GMM) (Calinon 2016), and KMP (Huang et al.
2019; Huang et al. 2021), can be directly applied for learning demonstrations comprising high-dimensional inputs.

The well-known second-order dynamic properties of DMPs strive towards a single-attractor system (Ijspeert et al. 2002a). The properties, e.g., convergence and modulation of the motion, are well studied, and implementations can be found in many research papers. Because of the second-order dynamics, the system becomes unstable if, for example, the motion is reversed during the execution. In the past years, two main approaches addressing the reversibility problem have been introduced. In the first approach (Nemec et al. 2018), reversibility is considered as learning two separate primitives, one for each direction of the motion. The approach is promising, but does not reflect true reversibility, because it uses one attractor point for each primitive. On the other hand, San Juan et al. (2019) introduced an alternative formulation with two stable attractor systems. The first attractor is defined at the starting point y of the trajectory and the subsequent one at the goal g; the dynamical system between them guarantees a stable convergence depending on the selected attractor. The approach demonstrated true reversibility, while keeping all the DMP properties. Nevertheless, not all questions have been resolved yet: the approach was only evaluated on task- and joint-space position trajectories. A proper formulation for

Table 5. A summary of DMP features and limitations that have been solved (✓) or partially solved (~).

Limitation | Related work | Status
Via-points | (Ning et al. 2011, 2012; Weitschat and Aschemann 2018; Saveriano et al. 2019; Zhou et al. 2019) | ✓
Start-point | (Hoffmann et al. 2009; Ijspeert et al. 2013; Weitschat et al. 2013; Dragan et al. 2015) | ✓
Goal-point | (Ijspeert et al. 2013; Ude et al. 2014; Abu-Dakka and Kyrki 2020; Dragan et al. 2015; Weitschat and Aschemann 2018) | ✓
Obstacle avoidance | (Park et al. 2008; Hoffmann et al. 2009; Tan et al.
2011; Kim et al. 2015; Rai et al. 2017) | ✓
Geometry-constrained data | (Pastor et al. 2009; Abu-Dakka et al. 2015a; Ude et al. 2014; Saveriano et al. 2019; Abu-Dakka and Kyrki 2020) | ~ ¶
Probabilistic | (Ben Amor et al. 2014) | ~
Extrapolation | (Pervez and Lee 2018; Zhou et al. 2019) | ~
High-dim input | (Pervez et al. 2017a; Pahič et al. 2020) | ~
Closed-loop | (Peternel et al. 2016a; Kramberger et al. 2018) | ~
Multi-attractor | (Nemec et al. 2018; San Juan et al. 2019) | ~

dealing with orientations, e.g., quaternions in task space, is still missing.

Since their introduction in the early 2000s, DMPs have established themselves as one of the most used and popular approaches for motor command generation in robotics. Several authors have exploited and extended the classical formulation to overcome some limitations and fulfill different requirements. Their research has resulted in a large number of papers published over the last two decades. One of the aims of this paper is to categorize and review the vast literature on DMPs. We took a systematic review approach and automatically searched for DMP-related papers in a popular database. A manual inspection of the resulting papers, guided by clear and unbiased criteria, led to the papers included in this tutorial survey. Another aim of our work is to provide a tutorial on DMPs that presents the classical formulation and the key extensions in rigorous mathematical terms. We made an effort to unify the notation among different approaches in order to make them easier to understand. Moreover, we provide useful guidelines that guide the reader in selecting the right approach for a given application. In the tutorial vein, we have also searched for open-source implementations of the

¶ The referred work extended the classical DMP to different spaces like SO(3) or S^m_++. Although formally similar, the extension to other Riemannian manifolds, like the Grassmannian or the Hyperbolic manifolds, is non-trivial and still not fully addressed.
described approaches and have released several implementations of DMP-based approaches to the community. The advantages of DMPs have been discussed, as well as their limitations and open issues. We have summarized them in Table 5, where we also indicate the solved issues and those that require further investigation. In this respect, as research on DMPs is still very active, we provide a comprehensive discussion that will help the reader understand what has been done in the field and where to put their research focus.

Funding

This work has been partially supported by:
- CHIST-ERA project IPALM (Academy of Finland decision 326304).
- The Austrian Research Foundation (Euregio IPN 86-N30, OLIVER).
- Innovation Fund Denmark (Research and innovation project MADE FAST).

References

Abdelrahman A, Mitrevski A and Ploger P (2020) Context-aware task execution using apprenticeship learning. In: IEEE International Conference on Robotics and Automation. pp. 1329-1335.

Abu-Dakka F, Nemec B, Kramberger A, Buch A, Krüger N and Ude A (2014) Solving peg-in-hole tasks by human demonstration and exception strategies. Industrial Robot.

IEEE International Conference on Robotics and Automation. Paris, France, pp. 4421-4426.

Abu-Dakka FJ, Nemec B, Jørgensen JA, Savarimuthu TR, Krüger N and Ude A (2015a) Adaptation of manipulation skills in physical contact with the environment to reference force profiles. Autonomous Robots.

Robotics and Autonomous Systems.

Frontiers in Robotics and AI 7: 177.

Abu-Dakka FJ, Valera A, Escalera J, Vallés M, Mata V and Abderrahim M (2015b) Trajectory adaptation and learning for ankle rehabilitation using a 3-PRS parallel robot. In: Liu H, Kubota N, Zhu X, Dillmann R and Zhou D (eds.) Intelligent Robotics and Applications. Cham: Springer International Publishing, pp.
483-494.

Abu-Dakka FJ, Valera A, Escalera JA, Abderrahim M, Page A and Mata V (2020) Passive exercise adaptation for ankle rehabilitation based on learning control framework. Sensors.

IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 4555-4562.

Agostini A, Saveriano M, Lee D and Piater J (2020) Manipulation planning using object-centered predicates and hierarchical decomposition of contextual actions. IEEE Robotics and Automation Letters.

Physica D: Nonlinear Phenomena.

The International Journal of Robotics Research.

Autonomous Robots.

IEEE/ASME International Conference on Advanced Intelligent Mechatronics. pp. 889-894.

Amatya S, Rezayat Sorkhabadi S and Zhang W (2020) Human learning and coordination in lower-limb physical interactions. In: Proceedings of the American Control Conference, volume 2020-July. pp. 557-562.

André J, Santos C and Costa L (2016) Skill memory in biped locomotion: Using perceptual information to predict task outcome. Journal of Intelligent and Robotic Systems: Theory and Applications.

Journal of Intelligent and Robotic Systems: Theory and Applications.

IEEE Robotics and Automation Letters.

Artificial Intelligence Review 11: 11-73.

Basa D and Schneider A (2015) Learning point-to-point movements on an elastic limb using dynamic movement primitives. Robotics and Autonomous Systems 66: 55-63.

Beetz M, Stulp F, Esden-Tempski P, Fedrizzi A, Klank U, Kresse I, Maldonado A and Ruiz F (2010) Generality and legibility in mobile manipulation: Learning skills for routine tasks. Autonomous Robots.

IEEE International Conference on Robot and Human Interactive Communication. pp. 713-720.

Ben Amor H, Neumann G, Kamthe S, Krömer O and Peters J (2014) Interaction primitives for human-robot cooperation tasks. In: IEEE International Conference on Robotics and Automation. Hong Kong, China, pp. 2831-2837.

Bian F, Ren D, Li R, Liang P, Wang K and Zhao L (2019) An extended DMP framework for robot learning and improving variable stiffness manipulation.
Assembly Automation Handbook ofRobotics , chapter 74. Secaucus, NJ, USA: Springer, pp. 1995–2014. 2nd Edition.Bishop CM (2006) Linear Models for Regression . Springer, pp.172–173. Prepared using sagej.cls Journal Title XX(X) Bitzer S and Vijayakumar S (2009) Latent spaces for dynamicmovement primitives. In: IEEE-RAS International Conferenceon Humanoid Robots . pp. 574–581.Blocher C, Saveriano M and Lee D (2017) Learning stabledynamical systems using contraction theory. In: InternationalConference on Ubiquitous Robots and Ambient Intelligence .pp. 124–129.Bristow DA, Tharayil M and Alleyne AG (2006) A survey ofiterative learning control. Control Systems Magazine The International Journal ofRobotics Research Robotics: Science and Systems VI : 153.Burdet E, Osu R, Franklin DW, Milner TE and Kawato M (2001)The central nervous system stabilizes unstable dynamics bylearning optimal impedance. Nature Autonomous Robots Joint IEEE InternationalConference on Development and Learning and on EpigeneticRobotics . Lisbon, Portugal, pp. 66–71.Calinon S (2016) A tutorial on task-parameterized movementlearning and retrieval. Intelligent Service Robotics IEEE-RAS International Conference on Humanoid Robots . pp. 582–588.Calinon S, Li Z, Alizadeh T, Tsagarakis N and Caldwell D(2012) Statistical dynamical systems for skills acquisitionin humanoids. In: IEEE-RAS International Conference onHumanoid Robots . pp. 323–329.Carrera A, Palomeras N, Hurt´os N, Kormushev P and CarrerasM (2015) Cognitive system for autonomous underwaterintervention. Pattern Recognition Letters 67: 91–99.Chang G and Kuli´c D (2013) Motion learning from observationusing affinity propagation clustering. In: IEEE InternationalSymposium on Robot and Human Interactive Communication .pp. 662–667.Chen N, Bayer J, Urban S and Van Der Smagt P (2015) Efficientmovement representation by embedding dynamic movementprimitives in deep autoencoders. In: IEEE-RAS InternationalConference on Humanoid Robots . IEEE, pp. 
434–440.Chen N, Karl M and Van Der Smagt P (2016) Dynamic movementprimitives in latent space of time-dependent variationalautoencoders. In: IEEE-RAS International Conference onHumanoid Robots . pp. 629–636.Chen Z and Liu B (2018) Lifelong machine learning. SynthesisLectures on Artificial Intelligence and Machine Learning IEEE/RSJ International Conference on Intelligent Robots andSystems . pp. 3875–3881.Chiaverini S and Siciliano B (1999) The unit quaternion: A usefultool for inverse kinematics of robot manipulators. SystemsAnalysis Modelling Simulation IEEE Robotics and Automation Letters EvolutionaryIntelligence European Conference onModelling and Simulation . pp. 421–427.Cohn DA, Ghahramani Z and Jordan MI (1996) Active learningwith statistical models. Journal of artificial intelligenceresearch 4: 129–145.Colome A, Planells A and Torras C (2015) A friction-model-basedframework for reinforcement learning of robotic tasks in non-rigid environments. In: IEEE International Conference onRobotics and Automation . pp. 5649–5654.Colom´e A and Torras C (2014) Dimensionality reduction andmotion coordination in learning trajectories with dynamicmovement primitives. In: IEEE/RSJ International Conferenceon Intelligent Robots and Systems . pp. 1414–1420.Colom´e A and Torras C (2018) Dimensionality reduction fordynamic movement primitives and application to bimanualmanipulation of clothes. IEEE Transactions on Robotics IEEE-RAS InternationalConference on Humanoid Robots . pp. 711–717.Cui Y, Poon J, Miro JV, Yamazaki K, Sugimoto K and Mat-subara T (2019) Environment-adaptive interaction primitivesthrough visual context for human–robot motor skill learning. Autonomous Robots IEEE Control Systems Letters Advances in Intelligent Systems andComputing Studies inComputational Intelligence IEEE/ASME Transactions on Mechatronics RobotControl , chapter 1. Rijeka: IntechOpen, pp. 
1–17.Deniˇsa M and Ude A (2015) Synthesis of new dynamic movementprimitives through search in a hierarchical database of examplemovements. International Journal of Advanced RoboticSystems Prepared using sagej.cls averiano, Abu-Dakka, Kramberger, and Peternel DeWolf T, Stewart T, Slotine JJ and Eliasmith C (2016) A spikingneural model of adaptive arm control. Proceedings of the RoyalSociety B: Biological Sciences IEEE International Conference on Roboticsand Automation . pp. 1858–1864.Dometios A, Zhou Y, Papageorgiou X, Tzafestas C and Asfour T(2018) Vision-based online adaptation of motion primitives todynamic surfaces: Application to an interactive robotic wipingtask. IEEE Robotics and Automation Letters IEEE InternationalConference on Robotics and Automation . Seattle, WA, USA,pp. 2339–2346.Duminy N, Nguyen S and Duhaut D (2017) Strategic and interactivelearning of a hierarchical set of tasks by the poppy humanoidrobot. In: IEEE International Conference on Development andLearning and Epigenetic Robotics . pp. 204–209.Eiband T, Saveriano M and Lee D (2019) Learning hapticexploration schemes for adaptive task execution. In: IEEE International Conference on Robotics and Automation .Montreal, QC, Canada, pp. 7048–7054.End F, Akrour R, Peters J and Neumann G (2017) Layereddirect policy search for learning hierarchical skills. In: IEEEInternational Conference on Robotics and Automation . pp.6442–6448.Ernesti J, Righetti L, Do M, Asfour T and Schaal S (2012) Encodingof periodic and their transient motions by a single dynamicmovement primitive. In: IEEE-RAS International Conferenceon Humanoid Robots . pp. 57–64.Fabisch A and Metzen J (2014) Active contextual policy search. Journal of Machine Learning Research 15: 3371–3399.Fang Z, Wang G, Li W and Li P (2014) Control-orientedmodeling of flight demonstrations for quadrotors using higher-order statistics and dynamic movement primitives. In: IEEEInternational Symposium on Industrial Electronics . pp. 
1518–1525.Fanger Y, Umlauft J and Hirche S (2016) Gaussian processes fordynamic movement primitives with application in knowledge-based cooperation. In: IEEE/RSJ International Conference onIntelligent Robots and Systems . pp. 3913–3919.Fei G, Wang S and Liu B (2016) Learning cumulatively to becomemore knowledgeable. In: Proceedings of the 22nd ACMSIGKDD International Conference on Knowledge Discoveryand Data Mining . pp. 1565–1574.Flash T and Hochner B (2005) Motor primitives in vertebrates andinvertebrates. Current opinion in neurobiology IEEE Transactions on Robotics Roboticsand Autonomous Systems The 17th International Conference onAdvanced Robotics . pp. 252–258. Forte D, Ude A and Gams A (2011) Real-time generalizationand integration of different movement primitives. In: IEEE-RAS International Conference on Humanoid Robots . Bled,Slovenia, pp. 590–595.Gams A, Deniˇsa M and Ude A (2015a) Learning of parametriccoupling terms for robot-environment interaction. In: IEEE-RAS International Conference on Humanoid Robots . Seoul,South Korea, pp. 304–309.Gams A, Do M, Ude A, Asfour T and Dillmann R (2010) On-lineperiodic movement and force-profile learning for adaptationto new surfaces. In: IEEE-RAS International Conference onHumanoid Robots . Nashville, TN, USA, pp. 560–565.Gams A, Ijspeert A, Schaal S and Lenarˇciˇc J (2009) On-linelearning and modulation of periodic movements with nonlineardynamical systems. Autonomous robots IEEE Transactions on Robotics Roboticsand Autonomous Systems 75: 340–351.Gams A and Ude A (2009) Generalization of example movementswith dynamic systems. In: IEEE-RAS International Conferenceon Humanoid Robots . Paris, France, pp. 28–33.Gams A, Ude A and Morimoto J (2015b) Acceleratingsynchronization of movement primitives: Dual-arm discrete-periodic motion of a humanoid robot. In: IEEE/RSJInternational Conference on Intelligent Robots and Systems .pp. 
2754–2760.Gao J, Zhou Y and Asfour T (2019) Projected force-admittancecontrol for compliant bimanual tasks. In: IEEE-RASInternational Conference on Humanoid Robots . pp. 607–613.Gaˇspar T, Deniˇsa M and Ude A (2020) Knowledge acquisitionthrough human demonstration for industrial robotic assembly. Advances in Intelligent Systems and Computing Robotics and Autonomous Systems IEEE International Conferenceon Robotics and Automation . pp. 5616–5621.Ginesi M, Meli D, Nakawala H, Roberti A and Fiorini P (2019) Aknowledge-based framework for task automation in surgery. In: International Conference on Advanced Robotics . pp. 37–42.Guerin K, Riedel S, Bohren J and Hager G (2014) Adjutant: Aframework for flexible human-machine collaborative systems.In: IEEE/RSJ International Conference on Intelligent Robotsand Systems . pp. 1392–1399.Gutzeit L, Fabisch A, Otto M, Metzen J, Hansen J, Kirchner F andKirchner E (2018) The besman learning platform for automatedrobot skill learning. Frontiers Robotics AI Springer Tracts in AdvancedRobotics Prepared using sagej.cls Journal Title XX(X) learning. In: International Conference on Advanced Robotics .pp. 557–564.Hazara M and Kyrki V (2016) Reinforcement learning forimproving imitated in-contact skills. In: IEEE-RASInternational Conference on Humanoid Robots . pp. 194–201.Hazara M and Kyrki V (2017) Model selection for incrementallearning of generalizable movement primitives. In: Interna-tional Conference on Advanced Robotics . IEEE, pp. 359–366.Hazara M and Kyrki V (2018) Speeding up incremental learningusing data efficient guided exploration. In: IEEE InternationalConference on Robotics and Automation . IEEE, pp. 1–8.Hazara M and Kyrki V (2019) Transferring generalizable motorprimitives from simulation to real world. IEEE Robotics andAutomation Letters IEEE/RSJ InternationalConference on Intelligent Robots and Systems . pp. 
1834–1839.Herzog S, W¨org¨otter F and Kulvicius T (2016) Optimal trajectorygeneration for generalization of discrete movements withboundary conditions. In: IEEE/RSJ International Conferenceon Intelligent Robots and Systems . pp. 3143–3149.Hoffmann H, Pastor P, Park DH and Schaal S (2009) Biologically-inspired dynamical systems for movement generation: auto-matic real-time goal adaptation and obstacle avoidance. In: IEEE International Conference on Robotics and Automation .Kobe, Japan, pp. 2587–2592.Hogan N (1984) Adaptive control of mechanical impedance bycoactivation of antagonist muscles. IEEE Transactions onautomatic control Journal of Dynamic Systems, Measurement, and Control IEEE Robotics and Automation Letters IEEE Transactions on Industrial Electronics IEEE international conference on robotics andautomation . IEEE, pp. 257–263.Huang R, Cheng H, Guo H, Lin X, Chen Q and Sun F (2016b)Learning cooperative primitives with physical human-robotinteraction for a human-powered lower exoskeleton. In: IEEE/RSJ International Conference on Intelligent Robots andSystems . IEEE, pp. 5355–5360.Huang Y, Abu-Dakka FJ, Silv´erio J and Caldwell DG (2021)Toward orientation learning and adaptation in cartesian space. IEEE Transactions on Robotics The International Journal of RoboticsResearch International Journal of Precision Engineering and Manufacturing Advances inNeural Information Processing Systems 15 . Vancouver, BC,Canada: Cambridge, MA: MIT Press, pp. 1523–1530.Ijspeert AJ, Nakanishi J, Hoffmann H, Pastor P and Schaal S (2013)Dynamical Movement Primitives: Learning Attractor Modelsfor Motor Behaviors. Neural Computation IEEE/RSJInternational Conference on Intelligent Robots and Systems .Maui, HI, USA, pp. 752–757.Ijspeert AJ, Nakanishi J and Schaal S (2002b) Learning rhythmicmovements by demonstration using nonlinear oscillators. In: IEEE/RSJ International Conference on Intelligent Robots andSystems , volume 1. Lausanne, Switzerland, pp. 
958–963.Ijspeert AJ, Nakanishi J and Schaal S (2002c) Movement imitationwith nonlinear dynamical systems in humanoid robots. IEEEInternational Conference on Robotics and Automation Advanced Robotics Proceedings of the Advances inRobotics . Association for Computing Machinery, pp. 1–6.Kamali K, Akbari AA and Akbarzadeh A (2016) Trajectorygeneration and control of a knee exoskeleton based on dynamicmovement primitives for sit-to-stand assistance. AdvancedRobotics IEEE International Conference onRobotics and Automation . pp. 316–321.Kastritsi T, Dimeas F and Doulgeri Z (2018) Progressiveautomation with dmp synchronization and variable stiffnesscontrol. IEEE Robotics and Automation Letters Transactions on Robotics Robotics and AutonomousSystems IEEERobotics and Automation Magazine IEEE/ASME InternationalConference on Advanced Intelligent Mechatronics . pp. 1032–1037.Kim W, Lee C and Kim H (2018b) Learning and generalizationof dynamic movement primitives by hierarchical deepreinforcement learning from demonstration. In: IEEE/RSJInternational Conference on Intelligent Robots and Systems .pp. 3117–3123. Prepared using sagej.cls averiano, Abu-Dakka, Kramberger, and Peternel Kober J, Bagnell JA and Peters J (2013) Reinforcement learningin robotics: A survey. The International Journal of RoboticsResearch IEEE/RSJ International Conferenceon Intelligent Robots and Systems . pp. 834–839.Kober J, Mohler B and Peters J (2010a) Imitation and reinforcementlearning for motor primitives with perceptual coupling. In:Sigaud O and Peters J (eds.) From Motor Learning toInteraction Learning in Robots . Berlin, Heidelberg: SpringerBerlin Heidelberg, pp. 209–225.Kober J, M¨ulling K, Kr¨omer O, Lampert CH, Sch¨olkopf B andPeters J (2010b) Movement templates for learning of hittingand batting. In: IEEE International Conference on Roboticsand Automation . Anchorage, AK, USA: IEEE, pp. 
853–858.Kober J, Oztop E and Peters J (2011) Reinforcement learning toadjust robot movements to new situations. In: InternationalJoint Conference on Artificial Intelligence . pp. 2650–2655.Kober J and Peters J (2010) Imitation and reinforcement learning. IEEE Robotics and Automation Magazine Machine Learning Autonomous Robots IEEEInternational Symposium on Robot and Human InteractiveCommunication . pp. 249–254.Kong K and Jeon D (2006) Design and control of an exoskeletonfor the elderly and patients. IEEE/ASME Transactions onmechatronics IEEE/RSJ International Conference on Intelligent Robots andSystems . pp. 3232–3237.Kormushev P, Calinon S and Caldwell DG (2011) Imitationlearning of positional and force skills demonstrated viakinesthetic teaching and haptic input. Advanced Robotics Journal of Intelligent & Robotic Systems Conference on Robot Learning . PMLR,pp. 293–302.Koutras L and Doulgeri Z (2020b) Dynamic movement primitivesfor moving goals with temporal scaling adaptation. In: IEEEInternational Conference on Robotics and Automation . IEEE,pp. 144–150.Kramberger A, Gams A, Nemec B, Chrysostomou D, Madsen Oand Ude A (2017) Generalization of orientation trajectoriesand force-torque profiles for robotic assembly. Robotics andAutonomous Systems 98: 333–346.Kramberger A, Gams A, Nemec B and Ude A (2016a)Generalization of orientational motion in unit quaternion space.In: IEEE-RAS International Conference on Humanoid Robots . Cancun, Mexico, pp. 808–813.Kramberger A, Piltaver R, Nemec B, Gams M, Ude A et al. (2016b)Learning of assembly constraints by demonstration and activeexploration. Industrial Robot: An International Journal IEEE/RSJInternational Conference on Intelligent Robots and Systems .Madrid, Spain: IEEE, pp. 6023–6028.Kr¨omer O, Detry R, Piater J and Peters J (2010a) Grasping withvision descriptors and motor primitives. In: InternationalConference on Informatics in Control, Automation andRobotics , volume 2. pp. 
47–54.Kr¨omer OB, Detry R, Piater J and Peters J (2010b) Combiningactive learning and reactive control for robot grasping. Roboticsand Autonomous Systems IEEE Robotics and Automation Letters Journalof Intelligent & Robotic Systems IEEE International Conference onAdvanced Robotics . pp. 1–8.Kr¨uger N, Ude A, Petersen H, Nemec B, Ellekilde LP, SavarimuthuT, Rytz J, Fischer K, Buch A, Kraft D, Mustafa W, Aksoy E,Papon J, Kramberger A and W¨org¨otter F (2014) Technologiesfor the fast set-up of automated assembly processes. KunstlicheIntelligenz Robotics and AutonomousSystems IEEE International Conference on Robotics andAutomation . pp. 2275–2280.Kulvicius T, Ning K, Tamosiunaite M and Worg¨otter F(2012) Joining movement sequences: Modified dynamicmovement primitives for robotics applications exemplified onhandwriting. IEEE Transactions on Robotics ArtificialIntelligence Procedia Computer Science 7: 166–168. 2nd European FutureTechnologies Conference and Exhibition 2011 (FET 11).Lafleche JF, Saunderson S and Nejat G (2019) Robot cooperativebehavior learning using single-shot learning from demonstra-tion and parallel hidden markov models. IEEE Robotics andAutomation Letters Nordic Conference on Human–ComputerInteraction , volume 82. pp. 97–100. Prepared using sagej.cls Journal Title XX(X) Lauretti C, Cordella F, Ciancio AL, Trigili E, Catalan JM, BadesaFJ, Crea S, Pagliara SM, Sterzi S, Vitiello N et al. (2018)Learning by demonstration for motion planning of upper-limbexoskeletons. Frontiers in neurorobotics 12: 5.Lauretti C, Cordella F, Guglielmelli E and Zollo L (2017) Learningby demonstration for planning activities of daily living inrehabilitation and assistive robotics. IEEE Robotics andAutomation Letters nature IEEE Robotics and Automation Letters IEEE Access 8: 135406–135415.Lee S and Suh I (2013) Skill learning and inference frameworkfor skilligent robot. In: IEEE/RSJ International Conference onIntelligent Robots and Systems . pp. 
108–115.Lemme A, Reinhart R and Steil J (2014) Self-supervisedbootstrapping of a movement primitive library from complextrajectories. In: IEEE-RAS International Conference onHumanoid Robots . pp. 726–732.Lentini G, Grioli G, Catalano M and Bicchi A (2020) Robotprogramming without coding. In: IEEE InternationalConference on Robotics and Automation . pp. 7576–7582.Li C, Fahmy A, Li S and Sienz J (2020) An enhanced robot massagesystem in smart homes using force sensing and a dynamicmovement primitive. Frontiers in Neurorobotics IEEE-RAS International Conference on Humanoid Robots . pp.547–553.Li Z, Zhao T, Chen F, Hu Y, Su CY and Fukuda T (2018)Reinforcement learning of manipulation and grasping usingdynamical movement primitives for a humanoidlike mobilemanipulator. IEEE/ASME Transactions on Mechatronics Intelligent Autonomous Systems 13 . Cham: SpringerInternational Publishing, pp. 1601–1611.Liu C, Geng W, Liu M and Chen Q (2020) Workspace trajectorygeneration method for humanoid adaptive walking withdynamic motion primitives. IEEE Access 8: 54652–54662.Liu Z, Hu F, Luo D and Wu X (2014) Visual gesture recognition forhuman robot interaction using dynamic movement primitives.In: IEEE International Conference on Systems, Man andCybernetics . pp. 2094–2100.Lonˇcarevi´c Z, Pahiˇc R, Ude A and Gams A (2021) Generalization-based acquisition of training data for motor primitive learningby neural networks. Applied Sciences Towards Autonomous Robotic Systems .Cham: Springer International Publishing, pp. 16–31.Luo D, Han X, Ding Y, Ma Y, Liu Z and Wu X (2015) Learningpush recovery for a bipedal humanoid robot with dynamical movement primitives. In: IEEE-RAS International Conferenceon Humanoid Robots . pp. 1013–1019.M Wensing P and Slotine JJ (2017) Sparse control for dynamicmovement primitives. IFAC-PapersOnLine IEEE/RSJ International Conference on IntelligentRobots and Systems . pp. 
5411–5418.Mao R, Yang Y, Ferm¨uller C, Aloimonos Y and Baras J (2015)Learning hand movements from markerless demonstrations forhumanoid tasks. In: IEEE-RAS International Conference onHumanoid Robots . pp. 938–943.Matsubara T, Hyon SH and Morimoto J (2010) Learning stylisticdynamic movement primitives from multiple demonstrations.In: IEEE/RSJ International Conference on Intelligent Robotsand Systems . pp. 1277–1283.Matsubara T, Hyon SH and Morimoto J (2011) Learning parametricdynamic movement primitives from multiple demonstrations. Neural Networks IEEE/RSJInternational Conference on Intelligent Robots and Systems .pp. 3407–3412.Montebelli A, Steinmetz F and Kyrki V (2015) On handingdown our tools to robots: Single-phase kinesthetic teaching fordynamic in-contact tasks. In: IEEE International Conferenceon Robotics and Automation . pp. 5628–5634.M¨ulling K, Kober J, Kr¨omer O and Peters J (2013) Learning toselect and generalize striking movements in robot table tennis. International Journal of Robotics Research IEEE-RAS InternationalConference on Humanoid Robots . Nashville, TN, USA, pp.411–416.Mussa-Ivaldi FA (1999) Modular features of motor control andlearning. Current opinion in neurobiology IEEE RAS and EMBS International Conference on BiomedicalRobotics and Biomechatronics . pp. 685–691.Nakanishi J, Rawlik K and Vijayakumar S (2011) Stiffness andtemporal optimization in periodic movements: An optimalcontrol approach. In: IEEE/RSJ International Conference onIntelligent Robots and Systems . IEEE, pp. 718–724.Nemec B, Forte D, Vuga R, Tamosiunaite M, W¨org¨otter F andUde A (2012) Applying statistical generalization to determinesearch direction for reinforcement learning of movementprimitives. In: IEEE-RAS International Conference onHumanoid Robots . pp. 65–70.Nemec B, Gams A and Ude A (2013a) Velocity adaptation forself-improvement of skills learned from user demonstrations.In: IEEE-RAS International Conference on Humanoid Robots .Atlanta, GA, USA, pp. 
423–428.Nemec B, Likar N, Gams A and Ude A (2016) Bimanual humanrobot cooperation with adaptive stiffness control. In: IEEE-RASInternational Conference on Humanoid Robots . pp. 607–613.Nemec B, Simonic M and Ude A (2020) Learning of exceptionstrategies in assembly tasks. In: IEEE International Conferenceon Robotics and Automation . pp. 6521–6527. Prepared using sagej.cls averiano, Abu-Dakka, Kramberger, and Peternel Nemec B and Ude A (2012) Action sequencing using dynamicmovement primitives. Robotica IEEE-RAS International Conference on Humanoid Robots . Bled,Slovenia, pp. 727–732.Nemec B, Vuga R and Ude A (2013b) Efficient sensorimotorlearning from multiple demonstrations. Advanced Robotics IEEE-RAS International Conference onHumanoid Robots . Beijing, China, pp. 166–173.Neumann K and Steil JJ (2015) Learning robot motions with stabledynamical systems under diffeomorphic transformations. Robotics and Autonomous Systems 70: 1–15.Niekum S, Osentoski S, Konidaris G and Barto A (2012) Learningand generalization of complex tasks from unstructureddemonstrations. In: IEEE/RSJ International Conference onIntelligent Robots and Systems . pp. 5239–5246.Niekum S, Osentoski S, Konidaris G, Chitta S, Marthi B and BartoA (2015) Learning grounded finite-state representations fromunstructured demonstrations. International Journal of RoboticsResearch IEEE InternationalConference on Robotics and Automation . pp. 5006–5011.Ning K, Kulvicius T, Tamosiunaite M and W¨org¨otter F (2012) Anovel trajectory generation method for robot control. Journalof Intelligent and Robotic Systems IEEE Transaction onRobotics and Automation BritishMachine Vision Conference . pp. 101.1–101.11.Ojer De Andres M, Mahdi Ghazaei Ardakani M and RobertssonA (2018) Reinforcement learning for 4-finger-gripper manip-ulation. In: IEEE International Conference on Robotics andAutomation . pp. 
4257–4262.Pahic R, Gams A, Ude A and Morimoto J (2018) Deep encoder-decoder networks for mapping raw images to dynamicmovement primitives. In: IEEE International Conference onRobotics and Automation . pp. 5863–5868.Pahiˇc R, Ridge B, Gams A, Morimoto J and Ude A (2020)Training of deep neural networks for the generation of dynamicmovement primitives. Neural Networks IEEEInternational Conference on Robotics and Automation . pp.5582–5589.Papageorgiou D, Dimeas F, Kastritsi T and Doulgeri Z (2020a)Kinesthetic guidance utilizing dmp synchronization andassistive virtual fixtures for progressive automation. Robotica Robotics and Computer-IntegratedManufacturing 61: 101824. Paraschos A, Daniel C, Peters J and Neumann G (2013)Probabilistic movement primitives. In: Burges C, Bottou L,Welling M, Ghahramani Z and Weinberger K (eds.) Advancesin Neural Information Processing Systems 26 . Lake Tahoe,Nevada, US: Curran Associates, Inc., pp. 2616–2624.Park DH, Hoffmann H, Pastor P and Schaal S (2008) Movementreproduction and obstacle avoidance with dynamic movementprimitives and potential fields. In: IEEE-RAS InternationalConference on Humanoid Robots . Daejeon, South Korea, pp.91–98.Pastor P, Hoffmann H, Asfour T and Schaal S (2009)Learning and generalization of motor skills by learning fromdemonstration. In: IEEE International Conference on Roboticsand Automation . Kobe, Japan, pp. 763–768.Pastor P, Kalakrishnan M, Chitta S, Theodorou E and SchaalS (2011) Skill learning and task outcome prediction formanipulation. In: IEEE International Conference on Roboticsand Automation . Shanghai, China: IEEE, pp. 3828–3834.Pastor P, Kalakrishnan M, Meier F, Stulp F, Buchli J, TheodorouE and Schaal S (2013) From dynamic movement primitives toassociative skill memories. Robotics and Autonomous Systems IEEE-RAS InternationalConference on Humanoid Robots . Osaka, Japan, pp. 
309–315.Pastor P, Righetti L, Kalakrishnan M and Schaal S (2011) Onlinemovement adaptation based on previous sensor experiences. In: IEEE/RSJ International Conference on Intelligent Robots andSystems . San Francisco, CA, USA, pp. 365–371.Paxton C, Jonathan F, Kobilarov M and Hager G (2016) Dowhat i want, not what i did: Imitation of skills by planningsequences of actions. In: IEEE/RSJ International Conferenceon Intelligent Robots and Systems . pp. 3778–3785.Perk BE and Slotine JJE (2006) Motion primitives for robotic flightcontrol. arXiv preprint cs/0609140 .Perrin N and Schlehuber-Caissier P (2016) Fast diffeomorphicmatching to learn globally asymptotically stable nonlineardynamical systems. Systems & Control Letters 96: 51–59.Pervez A, Ali A, Ryu JH and Lee D (2017a) Novel learning fromdemonstration approach for repetitive teleoperation tasks. In: IEEE World Haptics Conference . pp. 60–65.Pervez A, Latifee H, Ryu JH and Lee D (2019) Motion encodingwith asynchronous trajectories of repetitive teleoperationtasks and its extension to human-agent shared teleoperation. Autonomous Robots IntelligentService Robotics IEEE-RASInternational Conference on Humanoid Robots . pp. 191–197.Peternel L and Ajoudani A (2017) Robots learning from robots:A proof of concept study for co-manipulation tasks. In: IEEE-RAS International Conference on Humanoid Robots .Birmingham, UK: IEEE, pp. 484–490.Peternel L, Noda T, Petriˇc T, Ude A, Morimoto J and Babiˇc J(2016a) Adaptive control of exoskeleton robots for periodicassistive behaviours based on emg feedback minimisation. PLOS ONE Prepared using sagej.cls Journal Title XX(X) Peternel L, Petriˇc T and Babiˇc J (2015) Human-in-the-loopapproach for teaching robot assembly tasks using impedancecontrol interface. In: IEEE international conference on roboticsand automation . Seattle, WA, USA: IEEE, pp. 
1497–1502.Peternel L, Petriˇc T and Babiˇc J (2018a) Robotic assembly solutionby human-in-the-loop teaching method based on real-timestiffness modulation. Autonomous Robots Autonomousrobots IEEE Robotics andAutomation Letters IEEE Transactions on Neural Systems andRehabilitation Engineering IEEE-RAS InternationalConference on Humanoid Robots . Cancun, Mexico: IEEE, pp.489–494.Peternel L, Tsagarakis N, Caldwell D and Ajoudani A (2018b)Robot adaptation to human physical fatigue in human–robotco-manipulation. Autonomous Robots Neural Information Processing . Springer Berlin Heidelberg,pp. 233–242.Peters J and Schaal S (2008b) Reinforcement learning of motorskills with policy gradients. Neural Networks IEEE-RAS International Conferenceon Humanoid Robots . pp. 346–351.Petric T, Gams A, Colasanto L, Ijspeert A and Ude A (2018)Accelerated sensorimotor learning of compliant movementprimitives. IEEE Transactions on Robotics The International Journal of Robotics Research IEEE/RSJInternational Conference on Intelligent Robots and Systems .Chicago, IL, USA, pp. 1790–1795.Petriˇc T, Gams A, ˇZlajpah L, Ude A and Morimoto J (2014b) Onlineapproach for altering robot behaviors based on human in theloop coaching gestures. In: IEEE International Conference onRobotics and Automation . Hong Kong, China, pp. 4770–4776.Petriˇc T, Goljat R and Babiˇc J (2016) Cooperative human-robotcontrol based on fitts’ law. In: IEEE-RAS InternationalConference on Humanoid Robots . pp. 345–350.Pfeiffer S and Angulo C (2015) Gesture learning and execution ina humanoid robot via dynamic movement primitives. PatternRecognition Letters 67: 100–107.Prada M, Remazeilles A, Koene A and Endo S (2013)Dynamic movement primitives for human-robot interaction: comparison with human behavioral observation. In: IEEE/RSJInternational Conference on Intelligent Robots and Systems .Tokyo, Japan, pp. 