Learning from Demonstration for Hydraulic Manipulators
Markku Suomalainen, Janne Koivumäki, Santeri Lampinen, Ville Kyrki, Jouni Mattila
LLearning from Demonstration for Hydraulic Manipulators
Markku Suomalainen , Janne Koivum¨aki , Santeri Lampinen , Ville Kyrki and Jouni Mattila Abstract — This paper presents, for the first time, a methodfor learning in-contact tasks from a teleoperated demonstrationwith a hydraulic manipulator. Due to the use of extremelypowerful hydraulic manipulator, a force-reflected bilateral tele-operation is the most reasonable method of giving a humandemonstration. An advanced subsystem-dynamic-based controldesign framework, virtual decomposition control (VDC), is usedto design a stability-guaranteed controller for the teleoperationsystem, while taking into account the full nonlinear dynamicsof the master and slave manipulators. The use of fragile force/torque sensor at the tip of the hydraulic slave manipulator isavoided by estimating the contact forces from the manipulatoractuators’ chamber pressures. In the proposed learning method,it is observed that a surface-sliding tool has a friction-dependentrange of directions (between the actual direction of motion andthe contact force) from which the manipulator can apply forceto produce the sliding motion. By this intuition, an intersectionof these ranges can be taken over a motion to robustly find adesired direction for the motion from one or more demonstra-tions. The compliant axes required to reproduce the motioncan be found by assuming that all motions outside the desireddirection is caused by the environment, signalling the needfor compliance. Finally, the learning method is incorporatedto a novel VDC-based impedance control method to learncompliant behaviour from teleoperated human demonstrations.Experiments with 2-DOF hydraulic manipulator with a 475kgpayload demonstrate the suitability and effectiveness of theproposed method to perform learning from demonstration(LfD) with heavy-duty hydraulic manipulators.
I. INTRODUCTIONHydraulic actuation can provide many advantages overtheir electrical counterparts due to simplicity, robustness,low cost, and large power-to-weight ratio. For these rea-sons, hydraulically actuated heavy-duty work machines havebeen used for decades in various harsh-environment andrisk-intensive industries such as agriculture, construction,forestry, and mining industries. In the recent years, hydraulicmaintenance robots have also been extensively developedto operate heavy objects in harsh environments, such as inthe nuclear fusion industry (see [1] and [2]) containing highradiation, extreme temperatures, and strong magnetic fields.It is already projected that the advent of robotics will revo-lutionize the (hydraulic) heavy-duty machine industry [3].It is evident that in extreme working conditions where astable high-bandwidth communication is not guaranteed, new
This work was supported by Academy of Finland under the project“Cooperative heavy-duty hydraulic manipulators for sustainable subsea in-frastructure installation and dismantling,” under Grants 286553 and 286580. M. Suomalainen and V. Kyrki are with School of Electrical Engineering,Aalto University, Finland. [email protected] J. Koivum¨aki, S. Lampinen and J. Mattila are with Tampere Universityof Technology, Finland. [email protected] * These authors contributed equally to this paper. solutions for automated operations are required to extendhuman capabilities to operate harsh environments safely.A major challenge in automating heavy-duty hydraulicmachinery is that performed tasks often require contact withthe environment. As hydraulic manipulators are extremelypowerful, the use of position control for in-contact tasks isnot advised since a small error in the position would resultin large forces applied against the environment, possiblyleading to significant damage. This can be prevented byusing compliance, i.e., allowing the manipulator to deviatefrom the planned trajectory in case of physical constraints.Impedance control [4], in which a virtual spring-damper sys-tem is programmed to the controller, is a natural method forrealizing compliant motions. However, automatic planningof compliant motions is shown to be mathematically infea-sible [5]. Furthermore, designing a stability-guaranteed (i.e.,theoretically sound) and high-precision closed-loop controlfor multiple degrees-of-freedom ( n -DOF) hydraulic manipu-lators is a well-known challenge due to their highly nonlineardynamic behaviour [3], [6]–[8]. It is valid to mention thatthe control system stability is the primary requirement forall control systems and an unstable system is typicallyuseless and potentially dangerous [9], [10]. As reviewedin [3], nonlinear model-based (NMB) control methods haveshown to provide the most advanced control performance forhydraulic manipulators. However, it was only very recentlyin [7], [8] where the authors managed to provide for the firsttime stability-guaranteed NMB control designs for n -DOFhydraulic manipulators performing contact tasks.Learning from Demonstration (LfD) is a well-establishedparadigm in robotics [11]. The idea is that a human teachershows a demonstration by performing a specific task and therobot then learns to perform the same task even in slightlydifferent environments. When dealing with heavy-duty hy-draulic manipulators, teleoperation is the most logical choiceto provide the demonstration, since this allows the domainexperts to demonstrate the task, such as maintenance or earthmoving, in a natural way. However, designing a bilateralforce-reflected teleoperation for a hydraulic slave manipula-tor controlled with an electric master manipulator becomes achallenging control design task due to the slave’s significantnonlinearities, its dynamic connection to the master, and theneed for arbitrary motion/force scaling between these sig-nificantly different manipulators. Moreover, demonstrationsfrom teleoperation are shown to be on average noisier thankinesthetic teaching [12]. Finally, learning compliant mo-tions from human demonstrations requires the slave ma-nipulator contact force measurement. Using a conventionalforce/torque sensor to measure the contact force is infeasible a r X i v : . [ c s . R O ] S e p ith hydraulic manipulators as these sensors are usuallysensitive to shocks and overloading, situations that frequentlyoccur in hydraulic operations.Common tools for encoding LfD motions have the draw-back that the position and force trajectories are coupled,rendering them ill-equipped to function in changing andhazardous environments where errors in position may growlarge. We propose using the method presented in [13], wherea task is modelled as a sequence of linear impedance con-trollers. From position and contact force data recorded in oneor more demonstrations by haptic feedback teleoperation, wededuce the direction and stiffness parameters of the controllerfor performing a single motion. Sequencing the motions toperform a full task is outside of the scope of this paper, butthere are existing methods [14] and ongoing research [15] onhow to learn and perform a whole task, such as excavationor heavy-duty manipulation or assembly.In this paper, we show that despite the significant chal-lenges, it is possible to realize LfD capabilities for hydraulicmanipulators using a force-reflected bilateral teleoperationthat allows an arbitrary motion and force scaling between themanipulators. Furthermore, the proposed method takes anadvantage of a novel LfD method [13] which is incorporatedto a novel impedance control method [8] to learn compliantbehaviour from teleoperated human demonstrations. The useof a fragile force sensor at the contact point is avoided byestimating the contact forces from the hydraulic cylinders’chamber pressures [7]. The experiments with a full-scalehydraulic slave manipulator (having an attached payload of475 kg) demonstrate the feasibility of the proposed method.II. RELATED WORKThe current state-of-the-art on control of hydraulic n -DOF manipulators is reviewed in [3]. This study shows thatstability-guaranteed NMB control methods can provide themost advanced control performance for highly nonlinearhydraulic manipulators. The study further shows that theauthors’ stability-guaranteed NMB controls demonstrate thestate-of-the-art for hydraulic manipulators’ free-space motion(see [16]) and constrained motion (see [7], [8]) controls.An extensive review on teleoperation in [17] shows thatteleoperation is a well-established paradigm with electric sys-tems. However, very few studies exist on teleoperation withhydraulic n -DOF manipulators. In [18], [19], the authors pro-posed for the first time a full-dynamics-based force-reflectedbilateral teleoperation for hydraulic n -DOF manipulators.The previous studies on bilateral teleoperation with hydraulic n -DOF manipulators, using linearized system models formaster and slave manipulators, can be found in [20], [21].Common methods for encoding LfD skills are DynamicMovement Primitive (DMP) [22] and Hidden Semi-MarkovModels [23] , where the trajectory is learned as a set ofattractors. Recently, also force profile has been added toDMPs [24]. However, the problem of DMPs with forceprofile is the tight positional coupling between force andposition, which in turn means that there must not be a lot ofvariance between demonstrations or initial contact locations Fig. 1: Position alignment by taking advantage of chamfers. in case of chamfers. It has been shown that giving a peg-in-hole demonstration by teleoperation is more difficult than bykinesthetic teaching [12]. Therefore, the method for learningthe skill in teleoperation must be robust against differences indemonstrations, such as different starting or ending positions.Recently, an LfD method for teleoperation was presentedby Pervez et al. [25], who acknowledge the need for robust-ness when learning from teleoperated demonstrations. Theychoose one of the demonstrations as a reference trajectoryand then extract the applicable data from other incomplete orjerky demonstrations. In our method [13], this step is notnecessary and demonstrations of different lengths and start-ing positions are naturally combined. Moreover, Pervez et al.used DMPs which, without the trajectory and force profilecoupling, cannot take advantage of chamfers such as in Fig.1 where contact force can appear from various directions, acommon case in maintenance tasks. Finally, to the authors’best knowledge, LfD has previously not been shown to workwith hydraulic manipulators for in-contact tasks.
III. METHODThe goal is to show that our learning method in [13] 1) canbe used to learn impedance parameters from a teleoperatedhuman demonstration with a hydraulic heavy-duty manipu-lator, and 2) can be incorporated to our novel impedancecontrol method in [8] such that the demonstrated motion canbe reproduced with guaranteed stability, even in contact withvarying environment and presence of measurement errors.Overall, the proposed method is built on the authors’ earliercontributions on high-precision and stability-guaranteed n -DOF manipulator controls [7], [8], [16] and on LfD al-gorithms [13], [26]. Furthermore, the solution relies on anadvanced force-reflected teleoperation method in [27], whichis compatible to the hydraulic manipulator control methodsin [7], [8], [16]. The proposed method consists of fourimportant parts, i.e, control system design , contact forceestimation , bilateral teleoperation and learning algorithm . A. Control Method
As discussed, NMB control methods have shown to pro-vide the most advanced control performance for hydraulicmanipulators [3]. The idea in these methods is to designa specific feedforward term (from the system inverse dy-namics) to proactively generate the required actuator forcesfrom the required motion dynamics. The feedforward termis designed to be responsible for the major control actions,whereas the feedback terms are used only to overcome uncer-tainties, to maintain stability, and to address transition issues.Next, Section III-A.1 describes the basis of the VDC ap-proach, which is used as an underlying NMB control designframework for the hydraulic slave manipulator and electricmaster manipulator. Then, Section III-A.2 introduces the bject 1Open chain 1Open chain 2 Object 0 M {B} {G} Closed chain 2Closed chain 1 Virtual
Decomposition (a)(b)
Object = Open chain = O b j ec t Open chain 4Open chain 3Open chain 2Open chain 1 O b j ec t Object 0
VCP =
Simple oriented graph (c) = VCP {G} {B}
Object 2
Fig. 2: (a) Two-DOF hydraulic manipulator. (b) A virtual decom-position of the system. (c) A simple oriented graph of the system. impedance control method [8] suitable for the VDC frame-work. The method to connected the separately controlledmaster and slave plants for the force-reflected bilateral tele-operation is discussed in Section III-C.
The detailed controldesigns for the hydraulic slave manipulator and electricmaster manipulator can be found in [8] and [18], [19].1) Virtual Decomposition Control:
VDC [28], [29] isa novel NMB subsystem-dynamics-based control designmethod, developed for controlling complex robotic systems.The method allows the original system (see Fig. 2a) to be virtually decomposed to modular subsystems , objects andopen chains (see Fig. 2b), using conceptual virtual cuttingpoints (VCPs). This enables that the control system designand its stability analysis can be performed locally at thesubsystem level without imposing additional approximations.After the virtual decomposition, the system is representedby a simple oriented graph (SOG); see Fig. 2c. In the SOG,each subsystem represents a node , and each VCP representsa directed edge , the direction of which defines the forcereference direction. Then, in the control system design, thekinematics of subsystems can be computed by propagatingalong the direction of the VCP flow in the SOG from the source node (object 0) toward the sink node (object 2); seeFig. 2c. Using the kinematics, the dynamics of subsystemscan be computed by propagating along the opposite directionof the SOG from the sink node toward the source node .Finally, the subsystems’ control design can be establishedusing the system kinematics and dynamics.The unique control design philosophy of VDC has broughta modularity to control system engineering, enabling, e.g.,that changing the control (or dynamics) of one subsystemdoes not affect the control equations of the rest of the system.Furthermore, an adaptive control can be incorporated to the control design to cope with uncertain parameters involvedin subsystem dynamics. With advances of VDC, the state-of-the-art control performances are already demonstrated forhydraulic manipulators in their free-space motion control andin constrained motion control; see [3], [7], [8], [16], [30].
2) The Proposed Impedance Control:
In view of [4], thefull expression of the Cartesian target impedance law formanipulators can be described as G FFF d − G FFF = − M d (¨ χχχ d − ¨ χχχ ) − D d ( ˙ χχχ d − ˙ χχχ ) − K d ( χχχ d − χχχ ) (1)where χχχ ∈ R n and G FFF ∈ R n are the Cartesian positionand contact force vectors; χχχ d ∈ R n and G FFF d ∈ R n arethe desired Cartesian position and contact force vectors; and M d ∈ R n × n , D d ∈ R n × n and K d ∈ R n × n characterizethe desired inertia, damping and stiffness, respectively. Then,similar to [8], [31], neglecting the inertia term in (1), thetarget impedance can be written as G FFF d − G FFF = − D d ( ˙ χχχ d − ˙ χχχ ) − K d ( χχχ d − χχχ ) . (2)The VDC approach is a velocity-based control method ,which takes care of the system dynamics [28]. In VDC, arequired velocity serves as a reference trajectory for a systemand the control objective is to make the controlled actualvelocities track the required velocities. The general format ofa required velocity includes a desired velocity (which usuallyserves as a reference trajectory for a system) and one or moreterms that are related to control errors [28]. The followingcontrol law (required velocity vector ˙ χχχ r ) was proposed in [8]to perform the impedance control within the VDC framework ˙ χχχ r = ˙ χχχ d + ΛΛΛ χ ( χχχ d − χχχ ) + ΛΛΛ f ( G FFF d − G FFF ) (3)where ΛΛΛ χ ∈ R n × n and ΛΛΛ f ∈ R n × n are two positive-definitematrices characterizing Cartesian position and force controland they should be defined according to Condition 1. Condition 1:
Matrices
ΛΛΛ f and ΛΛΛ χ should be defined as ΛΛΛ f = D − , ΛΛΛ χ = D − K d such that both ΛΛΛ f and ΛΛΛ χ qualify as positive-definite.Then, the following Theorem 1 provides that the targetimpedance behaviour (2) can be achieved. Theorem 1:
Consider the proposed control law (3), whichdefines the required velocity behaviour for the system. If
ΛΛΛ f and ΛΛΛ χ in (3) are defined according to Condition 1, then thecontrol law (3) equals the target impedance law (2). Proof:
See Appendix I.
B. Contact Force Estimation
A contact force control requires a force feedback. How-ever, the use of conventional six-DOF force/moment sensor(built using either straingauge technology or optics) at amanipulator tip is not practical with extremely powerfulhydraulic manipulators. Thus, alternative methods for contactforce measuring (estimation) are highly favorable.A force-sensorless contact force control method was devel-oped in [7] by the authors. In the method, the contact forceswere estimated from chamber pressures of the manipulator’shydraulic actuators by using an accurate system modelingand gravity compensation; see [7]. However, estimating con-tact forces from the chamber pressures is challenging due tohe nonlinear dynamic behaviour of hydraulic manipulators.Furthermore, in the estimation method in [7], the systeminertia and piston friction were not considered in the contactforce estimation. Thus, the greater the manipulator velocity,the more error in the contact force estimates. To improve thecontact force estimation accuracy, e.g., the method presentedin [32] can be used to address the system inertia throughaccurate estimation of manipulators’ link accelerations.Eventually, inaccuracies will always exist in the contactforce estimation with hydraulic manipulators. This is due tothe systems complexity involved with non-smooth and dis-continuous nonlinearities, and model and parameter uncer-tainties. Thus, in this study, a major emphasis is not paid tothe contact force estimation accuracy, as the learning methodshould be robust for the force estimation errors . C. Teleoperation
Teleoperation can extend human capabilities to hard-to-reach, hazardous or dangerous environments. In addition, itcan be used to scale control actions to both micro and macroenvironments, e.g., to surgery or control of an excavator.Furthermore, when dealing with heavy and powerful hy-draulic manipulators, kinesthetic teaching is impossible toperform, and teleoperation is the remaining option. Withforce-reflected bilateral teleoperation, the operator is able to ) send (control) the desired motions to a slave manipulatorwith a master manipulator and ) physically feel forces actingat the slave manipulator.The designed force-reflected bilateral teleoperation systemis identical with the system presented in [18], [19]. IndividualNMB controllers are designed for both master and slavemanipulator according to VDC design principles defined insection III-A.1. Then, the two separately controlled plants areconnected with each other using the following equations [27] v sr = κ p ˜v m + Λ (cid:0) κ p ˜p m − p s (cid:1) − A (cid:0) ˜f s + κ f ˜f m (cid:1) (4) v mr = 1 κ p ˜v s + Λ (cid:0) κ p ˜p s − p m (cid:1) − A κ p (cid:0) ˜f s + κ f ˜f m (cid:1) (5)where Λ ∈ R n × n is a diagonal positive-definite matrixdefining position feedback gain, A ∈ R n × n is a positive-definite matrix defining force feedback gain, and p m and p s denote the position/orientation of the master and slavemanipulator, respectively, subject to ˙ p m = v m and ˙ p s = v s .In (4) and (5), tilde ˜ · denotes that the variable is obtained witha first order low-pass filter in [27], ˜f m and ˜f s are the filteredcontact forces from the master/slave manipulator toward theoperator/environment. Scaling factors κ p and κ f are usedto scale the position and force of the master manipulator,respectively. The scaling factors allow arbitrary motion andforce scaling between the manipulators [28] .The asymptotic position/velocity tracking between themaster and slave manipulators can be guaranteed in free-space and constrained motions by following the control de-sign principles in [27], [28]. Finally, in [20], [21], linearizedsystem models were used in hydraulic manipulator’s bilateralteleoperation, whereas our method takes the full nonlineardynamics of the master and slave manipulators into account . Fig. 3:
Illustration of the forces acting on the end-effector duringa sliding motion. G FFF is the contact force estimated from chamberpressures,
FFF N the normal force, FFF µ the friction force, FFF ext the external force applied during a demonstration, vvv a the actualdirection of motion and s the sector of desired directions. D. Learning
To reproduce a motion with linear dynamics, the parame-ters χχχ d and ΛΛΛ χ in (3) must be deduced from the demonstra-tion data. Since an NMB controller requires the trajectoryto be differentiable, we compute χχχ d as a quintic path [33],providing a smooth trajectories for position, velocity and ac-celeration. The trajectory is created from starting position χ χ χ and ending position χ n χ n χ n , where χ n χ n χ n = χ χ χ + δ ˆ v ˆ v ˆ v ∗ d with δ definingthe length of the trajectory and ˆ v ˆ v ˆ v ∗ d the desired directionlearned from demonstrations. ΛΛΛ χ we compute by Condition1 from the traditional impedance control stiffness matrix K d .Now, a controller with correctly learned parameters ˆ v ˆ v ˆ v ∗ d and K d can reproduce motions such as in Fig. 1 where the samecontroller can perform both depicted motions which result inthe completed assembly. First, we present how to learn ˆ v ˆ v ˆ v ∗ d and, then, K d in the Cartesian space.
1) Learning desired direction:
The intuition for learningthe desired direction ˆ v ˆ v ˆ v ∗ d stems from geometry: to slidethe robot’s end-effector along a surface, there is alwaysa friction-dependent sector s of directions from which therobot can apply a force to accomplish the sliding. If thissector is calculated at intervals over a whole demonstration,the intersection of all sectors s i would signify a directionwhich can lead the end-effector through the whole demon-strated motion either in free space or in contact. We callsector s a set of desired directions and it is visualized for asingle time-instant in 2-D in Fig. 3. From Fig. 3 we also seethat the force estimated in Section III-B consists of G FFF = − F N F N F N − F µ F µ F µ (6)where F µ F µ F µ = | µF N F N F N | ( − ˆ v a ˆ v a ˆ v a ) is the force caused by Coulombfriction with µ being the friction coefficient, ˆ v a ˆ v a ˆ v a the actualdirection of motion and F N F N F N the normal force. As explainedin Section III-B, we ignore the acceleration from (6) andbuild the algorithm robust enough to withstand the error.Throughout this paper, we will use the circumflex (ˆ) notationto denote the normalization of a vector. From (6) and Fig. 3we can see that s is between the direction of contact force G FFF and the actual direction of motion ˆ v a ˆ v a ˆ v a .As a human cannot give a perfect demonstration, sector s must be extended perpendicularly to allow the calculation ofan intersection from real demonstrations in 3-D. We computethe vector (cid:15) which extends the sector s perpendicularly byig. 4: Illustration of expanding 2-D sector s into 3-D set ofdirections P in (7) and (8). Continuous lines represent the vectorsand dotted lines highlight the pyramid shape.(a) 2 compliant axes (b) 1 compliant axis Fig. 5:
Unit spheres in the coordinate system for determining thecompliant axes. Origin (black circle) represents ˆ v ˆ v ˆ v ∗ d , green crossesare the corresponding ˆ v a ˆ v a ˆ v a of each demonstration, blue line visualizesthe model for 1 compliant axis and red cross the identified compli-ant direction when the model with 1 compliant axis is chosen. α degrees (cid:15)(cid:15)(cid:15) = tan α G ˆ F ˆ F ˆ F × ˆ v a ˆ v a ˆ v a | G ˆ F ˆ F ˆ F × ˆ v a ˆ v a ˆ v a | . (7)Now, by adding (cid:15) and − (cid:15) to both ˆ F ˆ F ˆ F and ˆ v a ˆ v a ˆ v a , we can constructpolyhedron P defining the set of desired directions at eachtime instant. This is computed in (8) and visualized in Fig. 4. P k = (cid:104) ˆ v a ˆ v a ˆ v a + (cid:15)(cid:15)(cid:15) ˆ v a ˆ v a ˆ v a − (cid:15)(cid:15)(cid:15) G ˆ F ˆ F ˆ F − (cid:15)(cid:15)(cid:15) G ˆ F ˆ F ˆ F + (cid:15)(cid:15)(cid:15) (cid:105) T . (8)Essentially, ˆ v ˆ v ˆ v ∗ d for the whole motion is found by projectingeach polyhedron P k into a 2-D polygon, calculating theintersection of the polygons with outlier rejection and, finally,projecting a chosen point from the intersection back into a 3-D vector, which is the final chosen ˆ v ˆ v ˆ v ∗ d . More details about thisprocess, such as the exact projection algorithms, can be foundfrom [13]. If there are more than one demonstrations, theforce and position data are concatenated and the intersectionis calculated over all the demonstrations, naturally combiningmultiple demonstrations.
2) Finding the compliant axes:
After finding the desireddirection, we need to find the compliant axes to construct K d such that the task can be reproduced. Our key idea is toassume that if motion is observed in other directions besidesthe desired direction ˆ v ˆ v ˆ v ∗ d , this motion is caused by the envi-ronment and therefore compliance is required. We assumethat if compliance is required in a direction, the stiffnessin that direction should be zero, i.e., the corresponding axisis fully compliant. This leads to the requirement that thecompliant axes need to be perpendicular to ˆ v ˆ v ˆ v ∗ d , since nomotion can be commanded along a direction where stiffnessis zero. This requirement restricts our search into directionsperpendicular to ˆ v ˆ v ˆ v ∗ d , which when projected into 2-D and normalized form a unit sphere visualized in Fig. 5. Fromthis unit sphere we find the number of compliant axes. Zerocompliant axes are detected when there were no deviationsfrom the desired direction, i.e., all the observed motion ˆ v a ˆ v a ˆ v a was along ˆ v ˆ v ˆ v ∗ d , signalling free space motion. In such a case,the green crosses in Fig. 5 would be on top of the blackcircle. When ˆ v a ˆ v a ˆ v a deviates from ˆ v ˆ v ˆ v ∗ d along a single line in thiscoordinate system, there has been deviation from ˆ v ˆ v ˆ v ∗ d alonga single axis and, therefore, one compliant axis is required,as in Fig. 5b. The direction of this axis can be obtained byprojecting the intersection of the axis and unit sphere backto 3-D. If the deviation from ˆ v ˆ v ˆ v ∗ d is not along a single line,both axes perpendicular to ˆ v ˆ v ˆ v ∗ d must be compliant, as in Fig.5a. The whole process is shown in Algorithm 1. Again, werefer the reader to [13] for a more detailed explanation.To compute numerical values for different number ofcompliant axes, we calculate the likelihoods for each caseon rows 3-8 in Algorithm 1. This is done by utilizingPrincipal Component Analysis (PCA) approximations of dif-ferent ranks. Rank 0 approximation corresponds to 0 matrix,and therefore the likelihood is based on the distance fromorigin. Rank 1 approximation corresponds to a 1st degreepolynomial fit in v a v a v a , and rank 2 corresponds to maximumlikelihood since it perfectly explains the data. In this case,the direction of compliance is not required from U , sincethe whole plane perpendicular to ˆ v ˆ v ˆ v ∗ d must be compliant.Finally, we wish to encourage the use of simpler models.We take inspiration from Bayesian Information Criterion(BIC) [34], which is defined BIC = ln( n ) k − L ) (9)where n is the number of data points, k the number ofparameters and L the likelihood of a model. We choose themodel with the lowest BIC value as the number of requiredcompliant axes on row 8 of Algorithm 1, and in case of 1compliant axis also the direction. The details of writing thestiffness matrix K d can be found in [26]. We note that this isnot the typical use of BIC where n (cid:29) k and variance in thelikelihood is calculated from the data, but we assume that theuncertainty of demonstrations can be estimated beforehand. Algorithm 1
Finding the required number of compliant axesand their directions. Set of mean actual directions of each i demonstration v a v a v a Rotate and project v a v a v a to 2-D such that the origin represents ˆ v ˆ v ˆ v ∗ d for d = 0 : 2 axes of compliance do for v i v i v i in v a v a v a do (cid:15) di (cid:15) di (cid:15) di = ( I − U d ) v i v i v i where U d = rank d PCA approximation of v a v a v a end for L d = (cid:81) i N ( (cid:15) di (cid:15) di (cid:15) di | , Σ) Calculate
BIC d with (9) end for D = arg min d BIC d if D = 1 then Project U d back to 3-D end if V. EXPERIMENTS AND RESULTSThis section demonstrates the suitability and effectivenessof the proposed method to perform LfD with hydraulic ma-nipulators. First, Section IV-A demonstrates the learningmethod by recording data from the motion of sliding alonga stack of wooden pallets (see Fig. 6a), from which wegathered four demonstrations via the teleoperation methoddescribed in Section III-C. Then, Section IV-B shows thatwith the parameters learned from the demonstrations we canreproduce the motion in the wooden pallet environment (inFig. 6a) but also in another environment, where the styrofoamsheets were used (see Fig. 6b) to change the properties ofthe original environment.
In the below sections, the Cartesianposition/force data refers the data in relation to the systembase frame { B } ; see Fig. 6.The two-DOF hydraulic manipulator (in Fig. 6) has themaximum reach of approximately 3.2 m and the payloadof 475 kg is attached to its tip. Phantom Premium 3.0haptic device is used as the electric master manipulator inthe realized bilateral teleoperation system. For the real-timecontrol system implementation, the following componentswere used: DS1005 processor board, DS3001 incrementalencoder board, DS2103 DAC board, DS2003 ADC board,and DS4504 100 Mb/s ethernet interface. The remaining ofthe hardware implementations can be found in [7] or [8]. A. Learning
Our goal is to validate that the desired direction ˆ v ˆ v ˆ v ∗ d ,computed with the algorithm described in Section III-D.1,can reproduce the demonstrated motion. Since the slave ma-nipulator of the experiments is two-DOF, ˆ v ˆ v ˆ v ∗ d is a 2-D vector.Thus, we can assume there is no deviation perpendicularto s , and adding (cid:15) in (8) is not required. Otherwise, thecomputation is performed similarly as it would be for a 3-Dvector (with 0 values for z-axis).The position and force data from the four demonstrationsof sliding the manipulator tip along a wooden pallet (see Fig.6a) are plotted in Fig. 7. The original measurement frequencywas 500Hz, which we averaged to 25Hz. Fig. 8 visualizes in2-D the directions of motion and the directions of estimatedforces in 25Hz (i.e. the edges of P from Fig. 4 in 2-Dwithout (cid:15) ) from demonstrations 1 and 2, starting the momentwhen contact force was detected. As expected, in this simple Wooden pallets
Styrofoam sheets a) b) { B } { B } Fig. 6: a) Experiment setup with wooden pallets. b) Experimentsetup with styrofoam sheets and wooden pallets. The manipulator’sposition in the figures show the starting point of the test trajectories(same starting position in the both cases). Time [s]
Cartesian position data
Data set 1
Cartesian contact force along the X-axisCartesian contact force along the Y-axis Time [s]
X [m] -0.85-0.87 Y [ m ] F o r ce [ k N ] -1.001.0 -1.0 F o r ce [ k N ] -2.0 Data set 2Data set 3 Data set 4 -0.86
Fig. 7:
The learning data. The learning data was gathered by usingthe wooden pallets in Fig. 6a as the contact environment. setup the desired direction is detected approximately in themiddle of these limitations, corresponding to the center ofintersection between sectors s from Fig. 3. ˆ v ˆ v ˆ v ∗ d computedfrom demonstrations 3 and 4 produced a very similar result.The expected results are achieved despite the difficulties inthe force measuring explained in Section III-B, showing therobustness of the algorithm.In addition, we wanted to validate that the method canfind the number of compliant axes which, together with thedesired direction, can reproduce the demonstrated motion. In2-D, this corresponds to choosing between 0 and 1 compliantaxis. Fig. 9 visualizes steps from the process of choosing thecompliant axes. Fig. 9a is in the same coordinate system asFig. 5, with the actual direction of motion from demonstra-tions 1 and 2 plotted in green crosses: in this simple setup,the directions of motion between the demonstrations areclose enough to each other that the two green crosses areoverlapping but not near the origin, signalling the need forone compliant axis.To properly validate the choice of 1 compliant axis, wecomputed the BIC values with all 2-demonstration pairs fromthe four available demonstrations. The results are in Fig. 9b.Due to the similarity of demonstrations, the values are againoverlapping heavily, but it can be seen that the differencebetween 0 and 1 compliant axes is clear. As required, thedirection of this axis is perpendicular to ˆ v ˆ v ˆ v ∗ d in the 2-D plane.Due to the inaccuracies in the manipulator’s inverse kine-matics, a small stiffness value has been put on the compliantFig. 8: Fig. 9: (a) Average directions of motion of demonstrations 1 and2 (green crosses overlapping each other) in the same coordinatesystem as in Fig. 5: (b) BIC values of 0 (blue) and 1 (red)compliant axis based on likelihoods from Algorithm 1. All 6 two-demonstration combinations of the 4 recorded demonstrations areplotted, and due to the similar values are overlapping heavily. Themodel with smaller BIC value is chosen. axis (the stiffness value of / on the axis correspond-ing ˆ v ˆ v ˆ v ∗ d ). Consequently, K d = k stiff [0 .
75 0 .
40; 0 .
40 0 . isobtained from the learning data using the proposed learningalgorithm, and the value of k stiff = 4 × is used in theexperiments. Furthermore, D d = diag(2 . , . × / s isused for the desired damping (in line with the damping alongthe compliant axis used in [8]). Finally, using the above K d , D d , and Condition 1, the learned control matrix ΛΛΛ χ andcontrol matrix ΛΛΛ f can be written as ΛΛΛ χ = (cid:20) .
00 8 . .
72 5 . (cid:21) m / sm , ΛΛΛ f = (cid:20) . . (cid:21) − m / sN . Note that
ΛΛΛ χ and ΛΛΛ f qualify as positive-definite matrices,satisfying Condition 1. This verifies that the proposed learn-ing algorithm provides a theoretically sound method to com-pute the impedance control matrix
ΛΛΛ χ from the learned K d . B. Reproduction of motion
To verify that the taught motions (in Fig. 7) can bereproduced with the learned parameters, two test cases ( TC s)with different environment stiffness (shown in Fig. 6) wereperformed. In TC 1 , the environment was the same woodenpallet environment as was used in the learning (see Fig. 6a).
In TC 2 , two wooden pallets were replaced with a pile ofstyrofoam sheets (see Fig. 6b) making the environment morecompliant in relation to TC 1.Fig. 10 shows the manipulator behavior with the learnedparameters in both TCs. In this figure, the data for theexperiment with the wooden pallets (see Fig. 6a) is givenin black and the data for the experiment with the woodenpallets and styrofoam sheets (see Fig. 6b) is given in grey.The first plot in Fig. 10 shows the Cartesian position pro-file (in X-Y space) in both TCs. The starting position of themanipulator in both TCs is shown with a circle (and can beseen also in Fig. 6). As the plot demonstrates, in both TCsthe taught motion can be reproduced with the learned param-eters; see relation to Fig. 7. When contact to the environmentwas established, it was retained all the time during themotion in both TCs. Furthermore, in TC 2, the manipulatorpenetrates deeper along the Y-axis (in relation to TC 1) dueto more compliant environment. This was anticipated. Time [s]
Cartesian position
Wood
Cartesian contact force along the X-axisCartesian contact force along the Y-axis
X [m] -0.86 -0.88 Y [ m ] F o r ce [ k N ] -1.0 F o r ce [ k N ] -1.5 Wood & Styrofoam -0.84-0.90-0.5 1 3 52
Time [s]
WoodWood & Styrofoam Wood
Wood & Styrofoam = Starting position
Fig. 10:
The experimental results with the proposed learning.
The second and third plots in Fig. 10 show the (estimated)Cartesian forces along the X- and Y-axes, respectively, inboth TCs. As the plots show, almost identical force behaviorswere identified in both TCs despite different compliance inthe test environments. Furthermore, the force levels alongthe X- and Y-axes in Fig. 10 correspond well in relation tothe average force levels during the teaching (see Fig. 7).In view of the results in Fig. 10, it can be concluded thatthe proposed method provides efficient tools for LfD withheavy-duty hydraulic manipulators, despite all challengesdiscussed in Sections I and III.
V. CONCLUSIONS AND FUTURE WORKWe presented, to the authors’ knowledge, the first resultson how LfD for in-contact tasks can be performed on ahydraulic manipulator. We showed how to overcome threesignificant obstacles restricting LfD implementation to hy-draulics, namely how 1) to perform force-reflected bilateralteleoperation on a hydraulic manipulator while estimatingthe contact forces between tool and environment, 2) to learnfrom teleoperated demonstrations, which are noisier thandemonstrations gathered with kinesthetic teaching, and 3)to reproduce the learned motions with a stability-guaranteedimpedance controller for a hydraulic manipulator.Even though the test setup is simple, it demonstrates wellthe strength of the work. With the force-reflected bilateralteleoperation, even a non-expert can perform demonstra-tions without breaking the manipulator or the environment.Demonstrations of different lengths, which cannot be avoidedon many real-world hydraulic applications, are not an issuedue to the learning method combining them through inter-sections. Our controller can keep the applied force similareven though the characteristics of the setup vary, makingthe use of such heavy-duty machines safer. The downside ofthe learning algorithm in this paper is that it cannot learnnonlinear free-space motions; however, often in heavy-dutymanipulation tasks such motions are not required.This paper adjusts and combines existing methods todemonstrate that hydraulic manipulators can be efficientlytaught to autonomously perform motions requiring contactith the environment in the presence of positional uncer-tainty and varying environment conditions. Hydraulic heavy-duty machines are common in tasks which require heavylifting, but the current LfD-research is concentrating mainlyon electric manipulators and physically light tasks. On theother hand, research regarding hydraulics is often concentrat-ing on the traditional control problems and not consideringwhether learning could be used to further facilitate the usageof hydraulic heavy-duty machines. We believe that this paperis an important milestone in bridging the gap between thesetwo faculties and widening the view of both hydraulics andlearning communities, as well as showing hydraulic industrythat learning methods can be applied to heavy-duty machines.Our future work consists of generalizing the method toa 6-DOF hydraulic manipulator and possibly another kindof hydraulic machine, such as an excavator. In addition,we plan to adapt the method from [15] to work withhydraulic manipulators to perform whole tasks required in,for example, earthmoving or heavy maintenance.R
EFERENCES[1] A. Muhammad et al. , “Development of water hydraulic remotehandling system for divertor maintenance of ITER,” in
IEEE 22ndSymposium on Fusion Engineering , pp. 1–4, June 2007.[2] H. Saarinen et al. , “Results of CMM standalone tests at DTP2,”
FusionEngineering and Design , vol. 86, no. 9, pp. 1907–1910, 2011.[3] J. Mattila, J. Koivum¨aki, D. Caldwell, and C. Semini, “A survey oncontrol of hydraulic robotic manipulators with projection to futuretrends,”
IEEE/ASME Trans. Mechatronics , vol. 22, no. 2, pp. 669–680, 2017.[4] N. Hogan, “Impedance control: An approach to manipulation: PartsI-III,”
J. Dyn. Syst. Meas. Control , vol. 107, no. 1, pp. 1–24, 1985.[5] J. Canny and J. Reif, “New lower bound techniques for robot motionplanning problems,” in
Foundations of Computer Science, 1987., 28thAnnual Symposium on , pp. 49–60, IEEE, 1987.[6] M. Sirouspour and S. Salcudean, “Nonlinear control of hydraulicrobots,”
IEEE Trans. Robot. Autom. , vol. 17, pp. 759–765, Apr. 2001.[7] J. Koivum¨aki and J. Mattila, “Stability-guaranteed force-sensorlesscontact force/motion control of heavy-duty hydraulic manipulators,”
IEEE Trans. Robot. , vol. 31, no. 4, pp. 918–935, 2015.[8] J. Koivum¨aki and J. Mattila, “Stability-guaranteed impedance controlof hydraulic robotic manipulators,”
IEEE/ASME Trans. Mechatronics ,vol. 22, no. 2, pp. 601–612, 2017.[9] M. Krsti´c, I. Kanellakopoulos, and P. Kokotovi´c,
Nonlinear andAdaptive Control Design . John Wiley & Sons, Inc., 1995.[10] J.-J. E. Slotine and W. Li,
Applied Nonlinear Control . Prentice-HallEnglewood Cliffs, NJ, 1991.[11] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, “A surveyof robot learning from demonstration,”
Robotics and autonomoussystems , vol. 57, no. 5, pp. 469–483, 2009.[12] K. Fischer, F. Kirstein, L. C. Jensen, N. Kr¨uger, K. Kukli´nski, T. R.Savarimuthu, et al. , “A comparison of types of robot control forprogramming by demonstration,” in
The Eleventh ACM/IEEE Int.Conf. on Human Robot Interaction , pp. 213–220, IEEE Press, 2016.[13] M. Suomalainen and V. Kyrki, “A geometric approach for learningcompliant motions from demonstration,” in
IEEE-RAS 17th Int. Conf.on Humanoid Robots (Humanoids) , pp. 783–790, 2017.[14] O. Kroemer, H. Van Hoof, G. Neumann, and J. Peters, “Learning topredict phases of manipulation tasks as hidden states,” in
IEEE Int.Conf. on Robotics and Automation (ICRA) , pp. 4009–4014, 2014.[15] T. Hagos, M. Suomalainen, and V. Kyrki, “Segmenting and sequencingof compliant motions,” in
IEEE/RSJ Int. Conf. on Intelligent Robotsand Systems (IROS) , 2018, Accepted for publication.[16] J. Koivum¨aki and J. Mattila, “High performance non-linear mo-tion/force controller design for redundant hydraulic construction craneautomation,”
Automation in Construction , vol. 51, pp. 59–77, 2015.[17] P. F. Hokayem and M. W. Spong, “Bilateral teleoperation: An histor-ical survey,”
Automatica , vol. 42, no. 12, pp. 2035 – 2057, 2006. [18] S. Lampinen, J. Koivum¨aki, and J. Mattila, “Full-dynamics-basedbilateral teleoperation of hydraulic robotic manipulators,” in
Proc. ofthe 14th IEEE Int. Conf. on Automation and Systems Engineering(CASE) , 2018. [Accepted].[19] S. Lampinen, J. Koivum¨aki, and J. Mattila, “Bilateral teleoperation ofa hydraulic robotic manipulator in contact with physical and virtualconstraints,” in
Proc. of Bath/ASME Symp. on Fluid Power and MotionControl (FPMC2018) , 2018. [Accepted].[20] S. Salcudean et al. , “Bilateral matched-impedance teleoperation withapplication to excavator control,”
IEEE Control Systems , vol. 19, no. 6,pp. 29–37, 1999.[21] S. Tafazoli et al. , “Impedance control of a teleoperated excavator,”
IEEE Trans. Control Syst. Technol. , vol. 10, no. 3, pp. 355–367, 2002.[22] S. Schaal, “Dynamic movement primitives-a framework for motorcontrol in humans and humanoid robotics,” in
Adaptive Motion ofAnimals and Machines , pp. 261–280, Springer, 2006.[23] M. Racca, J. Pajarinen, A. Montebelli, and V. Kyrki, “Learning in-contact control strategies from demonstration,” in
IEEE/RSJ Int. Conf.on Intelligent Robots and Systems (IROS) , pp. 688–695, 2016.[24] A. Kramberger et al. , “Generalization of orientation trajectories andforce-torque profiles for robotic assembly,”
Robotics and AutonomousSystems , vol. 98, pp. 333–346, 2017.[25] A. Pervez, A. Ali, J.-H. Ryu, and D. Lee, “Novel learning fromdemonstration approach for repetitive teleoperation tasks,” in
IEEEWorld Haptics Conference (WHC) , pp. 60–65, 2017.[26] M. Suomalainen and V. Kyrki, “Learning compliant assembly motionsfrom demonstration,” in
IEEE/RSJ Int. Conf. on Intelligent Robots andSystems (IROS) , pp. 871–876, 2016.[27] W.-H. Zhu and S. E. Salcudean, “Stability guaranteed teleoperation:an adaptive motion/force control approach,”
IEEE Transactions onAutomatic Control , vol. 45, pp. 1951–1969, Nov 2000.[28] W.-H. Zhu,
Virtual Decomposition Control - Toward Hyper Degreesof Freedom Robots . Springer-Verlag, 2010.[29] W.-H. Zhu et al. , “Virtual decomposition based control for generalizedhigh dimensional robotic systems with complicated structure,”
IEEETrans. Robot. Autom. , vol. 13, no. 3, pp. 411–436, 1997.[30] W.-H. Zhu and J. Piedboeuf, “Adaptive output force tracking controlof hydraulic cylinders with applications to robot manipulators,”
J. Dyn.Syst. Meas. Control , vol. 127, pp. 206–217, June 2005.[31] C. Semini et al. , “Towards versatile legged robots through activeimpedance control,”
The International Journal of Robotics Research ,vol. 34, no. 7, pp. 1003–1020, 2015.[32] J. Vihonen, J. Honkakorpi, J. Tuominen, J. Mattila, and A. Visa, “Lin-ear accelerometers and rate gyros for rotary joint angle estimation ofheavy-duty mobile manipulators using forward kinematic modeling,”
IEEE/ASME Trans. Mechatronics , vol. 21, no. 3, pp. 1765–1774, 2016.[33] R. Jazar,
Theory of Applied Robotics: Kinematics, Dynamics, andControl . Springer, 2010.[34] G. Schwarz et al. , “Estimating the dimension of a model,”
The annalsof statistics , vol. 6, no. 2, pp. 461–464, 1978. A PPENDIX IT HE P ROOF FOR T HEOREM
ΛΛΛ f = D − and ΛΛΛ χ = D − K d in Condition 1, it yields ˙ χχχ r = ˙ χχχ d + ΛΛΛ χ ( χχχ d − χχχ ) − ΛΛΛ f [ D d ( ˙ χχχ d − ˙ χχχ ) + K d ( χχχ d − χχχ )]= ˙ χχχ d + D − K d ( χχχ d − χχχ ) − D − D d ( ˙ χχχ d − ˙ χχχ ) − D − K d ( χχχ d − χχχ )= ˙ χχχ. (10)Then, using (3), (10) and Condition 1 yields ˙ χχχ r = ˙ χχχ d + ΛΛΛ χ ( χχχ d − χχχ ) + ΛΛΛ f ( G fff d − G fff ) ⇔ G fff d − G fff = − ΛΛΛ − f ( ˙ χχχ d − ˙ χχχ r ) − ΛΛΛ − f ΛΛΛ χ ( χχχ d − χχχ ) ⇔ G fff d − G fff = − D d ( ˙ χχχ d − ˙ χχχ ) − D d D − K d ( χχχ d − χχχ ) ⇔ G fff d − G fff = − D d ( ˙ χχχ d − ˙ χχχ ) − K d ( χχχ d − χχχ ) . (11)In (11), the first row equals to (3), whereas the last rowequals to (2). This completes the proof for Theorem 1.(11)In (11), the first row equals to (3), whereas the last rowequals to (2). This completes the proof for Theorem 1.