Evaluating direct transcription and nonlinear optimization methods for robot motion planning
Diego Pardo, Lukas Möller, Michael Neunert, Alexander W. Winkler, Jonas Buchli
EEvaluating direct transcription and nonlinear optimization methods for robotmotion planning
Diego Pardo, Lukas M¨oller, Michael Neunert, Alexander W. Winkler and Jonas Buchli
Abstract — This paper studies existing direct transcriptionmethods for trajectory optimization applied to robot motionplanning. There are diverse alternatives for the implemen-tation of direct transcription. In this study we analyze theeffects of such alternatives when solving a robotics problem.Different parameters such as integration scheme, number ofdiscretization nodes, initialization strategies and complexity ofthe problem are evaluated. We measure the performance of themethods in terms of computational time, accuracy and qualityof the solution. Additionally, we compare two optimizationmethodologies frequently used to solve the transcribed problem,namely Sequential Quadratic Programming (SQP) and InteriorPoint Method (IPM). As a benchmark, we solve differentmotion tasks on an underactuated and non-minimal-phaseball-balancing robot with a 10 dimensional state space and 3dimensional input space. Additionally, we validate the resultson a simulated 3D quadrotor. Finally, as a verification of usingdirect transcription methods for trajectory optimization on realrobots, we present hardware experiments on a motion taskincluding path constraints and actuation limits.
I. I
NTRODUCTION
Numerical methods for trajectory optimization (TO) [1]have received considerable attention from the robotics com-munity in recent years. These methods are especially con-venient for motion planning and control problems on highdimensional systems with complex dynamics [2], [3]. In suchsystems, using common approaches separating kinematicplanning and control struggles in finding plausible solutionsand tends to produce inefficient and artificial motions. Incontrast to designed or motion captured references, TOmethods generate dynamically feasible motions maximizinga performance criterion while satisfying a set of constraints.This is an appealing approach when dealing with nonlinearand unstable systems like legged or balancing robots, wherean optimal solution cannot be obtained analytically from thenecessary conditions of optimality.There are different types of methods to numerically solvethe TO problem (direct, indirect, single shooting, multipleshooting). The classification of these methods is out of thescope of this paper [1]. Here we are interested in studyingthe performance of the family of methods known as direct
This research has been funded through a Swiss NationalScience Foundation Professorship award to Jonas Buchli andby the Swiss National Centre of Competence in ResearchNCCR - Robotics. Diego Pardo { [email protected] } ,Lukas M¨oller { [email protected] } , MichaelNeunert { [email protected] } , Alexander Winkler { [email protected] } , Jonas Buchli { [email protected] } arewith the Agile & Dexterous Robotics Lab at the Institute of Robotics andIntelligent Systems, ETH Z¨urich, Switzerland. A preprint of this paper isavailable at Arxiv:1504.05803. This manuscript includes additional results. transcription when applied to systems with high nonlinear-ities and unstable dynamics. Recent results in the area ofwhole-body dynamic motion planning have demonstrated thepotential of these methods [4], [5], [3], where very dynamicand complex motions have been obtained. Interestingly, tothe best of our knowledge there are no reports of hardwareexperiments of direct transcription methods on floating-baseand naturally unstable robots.In direct transcription the state and control trajectoriesare discretized over time, and an augmented version of theoriginal problem is stated over the values of the trajectories atthe discrete points or nodes [6], [7]. Therefore, constraintsenforcing the system dynamics between nodes need to beincluded. The resulting problem can be solved by a nonlinearprogramming (NLP) solver. There are many approaches todirect transcription, mainly differing in the method used toenforce the dynamic constraints.During robot motion planning, the procedure consists offolding the problem into a continuous TO framework andsubsequently find optimal trajectories using direct transcrip-tion and an NLP solver. The main challenge of TO forrobotics applications is to adequately integrate all compo-nents of the problem into this formulation (e.g. contactforces [3], contact dynamics [5]). However, the transcriptionand solver stages are often treated as black box processeswith few degrees of freedom for the robotics researcher.Specialized software is available to apply direct transcriptionon general optimal control problems (e.g., PSOPT [8]), and anumber of NLP solvers exist (e.g. SNOPT [9], IPOPT [10]),based on mature methods in mathematical optimization.One of the main disadvantages of direct transcriptionis the augmented size of the subsequent NLP, making itssolution computationally expensive. This is especially truefor robotics problems with high dimensional systems withcontact forces or obstacle constraints. The accuracy of thesolution is also a major concern, as the dynamics of the robotare approximated in between the nodes and the resulting planmight not be feasible on the real robot. Given such difficul-ties, it is worth to evaluate the impact of choosing differenttypes of dynamic constraints, number of discretization nodesand NLP solvers on the accuracy and quality of the solution.Moreover, it is also important to measure the performanceof the different implementations for tasks with diverse levelsof complexity.The main contribution of this paper is the quantificationof the performance of various direct transcription methods inan experimental study. It is also shown that these methodsare sufficiently fast so online motion planning on robotic a r X i v : . [ m a t h . O C ] J a n ardware comes in reach. This work also investigates theperformance of two types of methods, namely SequentialQuadratic Programming (SQP) and Interior Point Methods(IPM), for solving the corresponding NLP problem. Thispaper does not present a survey or comparison of methodsfor solving TO problems [11], nor does it evaluate availableoptimization software packages or compare NLP solvers[12]. Instead, we use an unstable ball-balancing robot [13](10 states, 3 inputs) to compare the performance of thesetechniques in terms of computational time, accuracy andquality of the solution. Influences of the complexity of thetask and NLP initialization are also evaluated. Moreover, theresults of the evaluation are validated using a different robotand task (3D quadrotor, 12 states, 4 controls), indicating howresults scale to other systems.This paper is organized as follows. Section II presents atheoretical review of direct transcription methods and NLPsolvers. In Section III, the methodology designed to evaluateand compare the methods is described. Information aboutthe model of the robot used in this study, together witha description of the cost function and feedback controllerused during the benchmark tasks are provided in Section IV.Results are shown in Section V and analyzed and discussedin Section VI. Section VII presents the conclusions.II. B ACKGROUND
In this section we describe direct transcription as well asthe key concepts supporting SQP and IPM solvers.
A. Direct transcription for trajectory optimization
In general, the system dynamics of a nonlinear robot canbe modeled by a set of differential equations, ˙ x ( t ) = f ( x ( t ) , u ( t )) , (1)where x ∈ R n represents the system states and u ∈ R m thevector of control actions. The transition function f ( · ) definesthe system evolution in time. A single phase TO problemconsists in finding a finite-time input trajectory u ( t ) , ∀ t ∈ [0 , T ] , such that a given criteria is minimized, J = Ψ( x ( T )) + (cid:90) T ψ ( x ( t ) , u ( t ) , t ) dt (2)where ψ ( · ) and Ψ( · ) are the intermediate and final costfunctions respectively. The optimization may be subject toa set of boundary and path constraints, φ p,min ≤ φ p ( x ( t ) , u ( t ) , t ) ≤ φ p,max (3)and bounds on the state and control variables x min ≤ x ( t ) ≤ x max , u min ≤ u ( t ) ≤ u max . (4)Direct transcription translates this continuous formulationinto an optimization problem with a finite number of vari-ables. The set of decision variables, y ∈ R p , includes thediscrete values of the state and control trajectories at certainpoints or nodes . Therefore, y = { x k , u k } , for k = 1 , ..., N .Moreover, the set of decision variables can be augmentedwith additional parameters to be optimized. For instance, in the results presented in this paper the time between nodes ∆ T = t k +1 − t k has been included as a decision variable.The resulting NLP is then formulated as follows, min y f ( y ) s.t. ζ ( y ) = 0 g min ≤ g ( y ) ≤ g max y min ≤ y ≤ y max , (5)where f ( · ) is a scalar objective function which in ourimplementation is given by a quadrature formula approxi-mating (2), whereas the boundary and path constraints in(3) are gathered in g ( · ) . The NLP also includes bounds onthe decision variables. Additionally, a vector of dynamicconstraints or defects ζ ( · ) ∈ R ( N − n is added to verifythe system dynamics at each interval.As a consequence of the discretization, the resulting NLPis considerably large. This fact is partially compensated bythe sparsity of the resulting problem, as such structuralproperty is handled particularly well by large-scale NLPsolvers [9], [10].Direct transcription methods mainly differ in the waythey formulate the dynamic constraints in (5). The simplestdynamic constraint is given by Euler’s integration rule, ζ k = x k +1 − x k − f ( x k , u k )∆ T = 0 . (6)There are other approaches using different implicit integra-tion rules. For example, using a trapezoidal [1] interpolationfor the dynamics, the defects are then given by x k +1 − x k − ∆ T f ( t k ) + f ( t k +1 )] = 0 , (7)where the notation f ( t i ) = f ( x i , u i ) has been adoptedfor simplicity. In the original formulation of the schemeknown as direct collocation [7] [6] piecewise cubic Hermitefunctions are used to interpolate the states between nodes.In this method the difference between the system dynamicsand the derivative of the Hermite function at the middle ofthe interval, ( t = t c ) , is used as dynamic constraint. Thisapproach is equivalent to a Simpson’s integration rule, x k +1 − x k − ∆ T f ( t k ) + 4 f ( t c ) + f ( t k +1 )] = 0 . (8)Euler’s and trapezoidal integration rules are computationallyless expensive than the one given in (8). Therefore, the in-tegration scheme influences the accuracy of the solution andalso the possibility of the solver of finding a feasible solution.To mitigate these issues a more refined discretization, i.e.more nodes, might be necessary. In return however, the sizeof the problem and potentially the time required to find asolution is increased. Depending on the ability of the solverto handle the sparsity in the structure of the problem, thisincrease in complexity may be partially absorbed by thesolver itself. Still, there is a compromise between simplicityof the integration rule, number of nodes and efficiency of thesolver to be made. . NLP solvers Nonlinear programming solvers are in a mature state ofdevelopment given the vast experience of the field of math-ematical optimization. Here we evaluate two representativeinstances of NLP solvers. The first solver in our comparisonis SNOPT which is based on Sequential Quadratic Program-ming. The second solver is IPOPT using the Interior PointMethod. The goal of the analysis is to evaluate the perfor-mance of both approaches within a robotics application. Thecomplete description of the solvers and the algorithms liesoutside of the scope of this paper. This section presents somefeatures aiming to describe the main differences betweenthem and to identify the nature of any difference in theirperformances. A complete comparison between these classesof solvers can be found in [12].
1) SNOPT (Sparse Nonlinear OPTimizer):
The basicstructure of this implementation of the SQP algorithm in-volves major and minor iterations. Major iterations advancealong a sequence of points y h that satisfy the set of linearconstraints in ζ ( y h ) and g ( y h ) . These iterations convergeto a point that satisfies the remaining nonlinear constraintsand the first-order conditions of optimality [9]. The directiontowards which the major iterations move is produced bysolving a QP subproblem. Solving this subproblem is aniterative procedure by itself (i.e. the minor iterations), basedon a Newton-type minimization approach.An important characteristic of all SQP algorithms is thatthey are ‘active set’. Roughly speaking, this means thatduring the iterative procedure all the inequality constraintsplay a very explicit role as the QP subproblem must estimatethe active set in order to find the search direction.
2) IPOPT:
This algorithm also depends on a Newtontype subproblem. Nevertheless, inequalities are handled ina different manner. A barrier function is used to keep thesearch as far as possible from the bounds of the feasibleset. The barrier parameters change along iterations, allowingproximity to the adequate constraint.III. E
VALUATION P ROCEDURE
Considering the elements presented in the previous sec-tion, there are several decisions to make in order to put allthe pieces together to solve a robot motion planning problemusing direct transcription.What integration method to use? What is the influenceof the number of nodes? Given a complex robotic problem(i.e. high state dimensionality, nonlinear dynamics and pathconstraints), how long does it take to find a solution? Canthis technique be implemented online? What type of solveris more favorable for TO in robotics? Can the resultingtrajectory be implemented on a real robot?We address these questions measuring the performance ofdifferent configurations of the method against a robot motionbenchmark.
A. Benchmark robot and tasks
The robot used for the benchmark is the ball balanc-ing robot Rezero [13], a so called ballbot. These kind of (0,0) xy (1,3)(0.5,1.25)r = 0.5 (0.625,2.5)(0,0) xy (1,3)(0.5,1.25)r = 0.5r = 0.25(0,0) xy (1,3) (a) (b) (c) Fig. 1. Illustration of the benchmark tasks (top view). Starting from theorigin the robot should reach the goal (star mark). Circle obstacles (blue)are included to increase the problem complexity. (a) Go-to task, (b) one-obstacle task and (c) two-obstacles task. Position and size of the obstaclesare expressed in meters. robots are essentially 3D inverted pendulums and henceare statically unstable, underactuated and non-minimal phasesystems. Due to this instability there is a complex interactionbetween the requirements for stable control and satisfyingconstraints.The benchmark consists of a set of three different motiontasks. Each task is a variation of a go–to task, i.e. therobot is supposed to move from an initial to a final spatiallocation avoiding fixed obstacles (see Fig. 1). State andcontrol trajectories are not pre-specified and should resultfrom the minimization of a cost function as well as from thesatisfaction of the boundary and path constraints. Details onthe implementation of the cost function and constraints arepresented in Section IV.
B. Problem components
The benchmark compares the performance of differentconfigurations of the direct transcription method. Table Isummarizes the variables evaluated in this study.Regarding the complexity of the task, each path constraintadded to (3) has to be verified at each node in (5), augment-ing the size and density of the gradients required to findan optimal solution. This complexity is also increased whenbounds on the state and control are considered. This is thecase for the hardware experiments presented later in SectionV-D. Here we measure the effect of including such boundsto the two-obstacle task, in a problem complexity labeled as‘two-obstacle-bounds’.At the same time, the performance also depends on theinitial values of the decision variables of the NLP. Here weevaluate three different initialization methods, namely Zero,Linear and Incremental. In the first method all variablesare naively set to zero before the optimization begins. Thelinear method initializes only the variables corresponding tothe states of the robot, using a linear interpolation betweenthe initial and desired final states. Finally, the incrementalmethod uses the optimal solution of a simpler task (e.g. lessconstrained) as initial values for all variables.
ABLE IA
NALYZED C OMPONENTS OF A D IRECT T RANSCRIPTION P ROBLEM
Integration Scheme TrapezoidalHermite-SimpsonProblem Complexity Go-toOne-obstacleTwo-obstacleTwo-obstacle-boundsNLP Initialization ZeroLinear interpolationIncrementalNumber of NodesNLP Solvers SNOPTIPOPT
C. Performance criteria
The ultimate goal of TO is to obtain state and controltrajectories that can be implemented on a real robot, such thata certain task is accomplished while an objective function isminimized. Accordingly, we measure the performance of themethods in terms of accuracy, solving time and quality ofthe solution. a) Accuracy:
The system dynamics are approximatedusing a piecewise polynomial function. A solution is onlyvalid if such an approximation is accurate. In case of perfectaccuracy, almost no stabilizing feedback control would berequired during simulation. We measure the accuracy ofthe solution using the mean squared error between theplanned and the actual total control trajectories measuredwhen the complete system is simulated (i.e. an indication ofthe divergence between planned ( u ∗ ) and total ( u ) controlsignals as defined in Section IV and represented in Fig. 2). b) Running time: Ideally, the motion planner should befast enough to compute trajectories online. However, depend-ing on the configuration, different number of iterations mightbe required to solve the NLP. Here we use the total time re-quired to find a feasible solution as indicator of performance.In contrast to the number of iterations, it provides an absolutevalue that can be used to compare among solutions obtainedwith different solvers and integration schemes. c) Quality of the solution:
The conditions of optimalityfor the continuous trajectory optimization problem are givenby the Pontryagin’s minimum principle. In direct transcrip-tion such conditions are approximated by the Karush-Kuhn-Tucker (KKT) necessary conditions for the correspondingNLP (see e.g., [1]). Continuous and discrete conditionsconverge as N → ∞ and ∆ T → . In this paper the KKTconditions are satisfied by any given solution with a toleranceof − . We compare the quality of the solutions in terms ofoptimality using the resulting value of the objective function.IV. R OBOT M ODEL AND C ONTROL S CHEME
A. Robot model
The non-linear model of the ballbot dynamics is describedin [13]. The robot is modeled as two rigid bodies, the torsoof the robot and the ball. The two bodies are linked by threeactuators. In our model, we neglect wheel dynamics and we assume that no slip or friction losses occur. The state vectorof this ballbot is defined as x = [ φ r ˙ φ r φ p ˙ φ p φ y ˙ φ y θ x ˙ θ x θ y ˙ θ y ] T , (9)where φ and ˙ φ represent the torso angles and velocities in theroll, pitch and yaw directions ( rpy ) . Furthermore, the stateincludes the rotational angles of the ball ( θ x , θ y ) as well astheir derivatives ( ˙ θ x , ˙ θ y ) , representing the ball position withrespect to the initial state. The control actions are defined bythe wheels’ input torques u = [ τ τ τ ] T .The desired goal state for all tasks is given by, x g = [0 0 0 0 0 0 8 0 24 0] T where the goal angles of the ball ( θ x = 8 rad, θ y = 24 rad)correspond to the desired spatial location ( x = 1 m , y = 3 m), given that the radius of the ball is r = 0 . m. B. Cost function, terminal conditions and total trajectorytime constraint
A quadratic function is used for the intermediate cost, i.e., ψ ( x k , u k ) = x Tk Qx k + u Tk Ru k . (10)Where ¯ x N = x N − x g is the difference between the laststate of the trajectory x N and the desired goal state x g . Thediagonal cost matrices Q, R weigh the contribution of thestate and control to the total cost. Setting those values of Q to zero that correspond to the rotational angles of the ball,tends to reduce the amount of time the robot is not in theupright pose, and at the same time use as little torque aspossible along the path.In direct transcription initial conditions are assumed tobe fixed. Additionally, in the simulations and experimentspresented in this paper, the desired goal state is added asa hard terminal condition to the NLP, (i.e., x N = x g ). Incase this condition is not satisfied the problem is consideredinfeasible. It is important to note that by using this hardconstraint the use of a final cost function Ψ( x N ) can beavoided.Finally, the time between nodes (∆ T ) is chosen to beconstant and included as a decision variable. A constraint isadded bounding the total trajectory time ( T ) , T min ≤ ( N − · ∆ T ≤ T max , (11)where the bounds are given as a task specification. For thisstudy we define that motions should be completed between T min = 1 . s, and T max = 3 . s. C. Trajectory stabilization
Despite using an elaborated integration scheme or a verysmall discretization time, solutions obtained using directtranscription are based on an approximation of the systemdynamics. Even in simulation very small numerical integra-tion errors push the system towards instability. Therefore, itis necessary to stabilize the system.In this work we use Time Variant Linear Quadratic Regu-lator - TVLQR (see e.g., [14]) to obtain a feedback controller. irectTranscription Robot
K(t) - - x t * û t x t u t * u t ++ Fig. 2. Complete control scheme. Feed forward control action is providedby the TO planner and a feedback stabilization control is computed usinga time varying gain matrix, K ( t ) , obtained via TVLQR. The derivation of this optimal controller is out of the scopeof this paper. However, a descriptive explanation is providedfor completeness: The dynamics of the system are linearizedaround the continuous optimal trajectory ( x ∗ t , u ∗ t ) using aTaylor approximation of (1). The state and control trajectoryerrors are defined as ˆ x t = x ∗ t − x t and ˆ u t = u ∗ t − u t . Thus,its dynamics are described as a linear time varying system ˙ˆ x t = A ( t )ˆ x t + B ( t )ˆ u t . (12)The state and input matrices are given by A ( t ) = ∂f ( x, u ) ∂x (cid:13)(cid:13)(cid:13)(cid:13) x ∗ t ,u ∗ t B ( t ) = ∂f ( x, u ) ∂u (cid:13)(cid:13)(cid:13)(cid:13) x ∗ t ,u ∗ t . By solving the TVLQR problem for the system in (12), amatrix of time dependent optimal feedback gains K ( t ) ∈ R m × n is obtained. The resulting control scheme is illustratedin Fig. 2. V. R ESULTS
The results presented in this section were obtained usinga standard laptop computer with 2.4Ghz Intel Core i5 (dualcore) processor with 4GB 133Mhz DDR3 RAM. Feasibilitytolerance of the NLP solvers was set to × − , meaningthat all the constraints, including the terminal condition ( x N = x g ) , are satisfied with an error bounded by this value. A. Optimal solution and stabilizer
Here we present the resulting state and control trajectoriesfor the goto task obtained with a typical configuration: 100Nodes, Hermite-Simpson integration scheme, linear initial-ization and SNOPT as NLP solver.Fig. 3 presents the optimal trajectories for the states. Thedesired state is reached in T = 3 . s. Similarly, Fig. 4 showsthe control trajectories and Fig. 5 shows the correspondingstabilizer gains obtained using TVLQR. B. Task complexity and initialization strategy
In this part, the performance of direct transcription fortasks of diverse complexity as well as the influence ofdifferent NLP initialization strategies are evaluated.Firstly, we observed that the two-obstacles task cannot besolved using zero initialization regardless of the number of time (s)0 0.5 1 1.5 2 2.5 3 x ( t ) -50510152025 States ? r _ ? r ? p _ ? p ? y _ ? y y _ y x _ x Fig. 3. State trajectories for the goto task. Solution found using N =100 nodes, linear initialization, Hermite-Simpson integration and SNOPTsolver. The optimal solution also includes the time increment between nodes ∆ T = 0 . s. All velocities and body angles converge to zero. time (s)0 0.5 1 1.5 2 2.5 3 u ( t ) [ N . m ] -4-3-2-101234 Control Input u $ u $ u $ Fig. 4. Control trajectories for the goto task. Solution found using N = 100 nodes, linear initialization, Hermite-Simpson integration and SNOPT solver. time (s)0 0.5 1 1.5 2 2.5 3 K ( t ) -40-200204060 TVLQR - K(t) ^ u ^ u ^ u Fig. 5. Feedback gains for stabilizing the goto task. Gains are grouped(line/color) by the corresponding control. Each feedback control (ˆ u i ) iscomputed adding the weighted contribution of the state error ( ˆ x ). o-to One-obstacle Two-obstacle Two-obstacle-bounds s o l v i ng t i m e [ s ] Solver Runtime Comparison - Linear Initialization
IPOPTSNOPT
Fig. 6. This plot compares the solving time for the different tasks andsolvers using linear initialization method. Number of nodes ( N = 100) and integration scheme (trapezoidal) are constant. SNOPT is not able tofind a solution for the real robot task given those conditions. One-obstacle Two-obstacle Two-obstacle-bounds s o l v i ng t i m e [ s ] Solver Runtime Comparison - Incremental Initialization
IPOPTSNOPT
Fig. 7. This plot compares the solving time for the different tasks andsolvers using incremental initialization. Number of nodes ( N = 100) andintegration scheme (trapezoidal) are constant. nodes, integration scheme or solver used. On the other hand,such a task can be solved using linear initialization. Fig.6 shows the solving time required by the solvers to find asolution to the different tasks using the same scheme (100nodes, trapezoidal interpolation and linear initialization). Itcan be seen that the complexity of the task affects the timerequired by the solvers. This is more significant in the case ofSNOPT. Moreover, for the two-obstacle-bounds complexity,SNOPT does not find a feasible solution before reaching themaximum number of iterations ( ), irrespective of theintegration scheme or number of nodes used.In Fig. 7 solving time for the different tasks and solversare shown for the case of incremental initialization. Decisionvariables are initialized with the solution of the previous task.As expected in this type of initialization, the solving timeis lower than those shown in Fig.6. Moreover, in this caseSNOPT is able to find a solution for the two-obstacle-boundscomplexity. Apart from the variation on running time andfeasibility reported in this section, no significant differenceswere observed on the accuracy or quality of the solutiongiven different initialization methods. C. Integration scheme
In this section we show the influence of the integrationscheme and solvers in terms of run time, accuracy and qualityof the solution. In order to obtain the corresponding data weuse the one-obstacle task using zero initialization.Fig. 8 compares the runtime for the integration schemes incombination with both solvers. While the absolute values ofthese measures are highly dependent on the CPU capacity,this section is concerned with their relative values. Witha low number of nodes (i.e.
N < ) both solvers showsimilar performance. However, with an increasing number ofnodes the computational complexity of SNOPT is higher and s o l v i ng t i m e [ s ] -2 -1 Solving Time
Hermite - IPOPTHermite - SNOPTTrapez. - IPOPTTrapez. - SNOPT
Fig. 8. This plot shows the runtime of the different combinations ofintegration methods and solvers. For a low number of nodes, differencesare small. However, with an increasing number of nodes, the runtimes ofSNOPT are above those of IPOPT and also rise faster. M SE [ ( N . m ) ] -10 -5 Accuracy
Hermite - IPOPTHermite - SNOPTTrapez. - IPOPTTrapez. - SNOPT
Fig. 9. This plot shows the accuracy of the solution for both solvers andintegration schemes. As indicator of accuracy we use the mean squared error(MSE) between the planned and the total control required to simulate thetask. It can be seen that Hermite integration is significantly more accuratethan trapezoidal integration. increases faster than the one obtained with IPOPT. For bothsolvers, Hermite integration requires more CPU time thantrapezoidal integration, but the difference is not significant.Fig. 9 shows the divergence, in terms of mean squarederror, between the planned and total control trajectories(during simulation). As it can be observed, the accuracyis independent of the solver. However, Hermite integrationproduces an error which is by several magnitudes lower thanthe one of trapezoidal integration.Fig. 10 compares the value of the objective function afteroptimization. This value can be understood as being inverselyproportional to the quality of the solution. The differencebetween the two solvers is marginal. However, Hermiteintegration is able to achieve a significantly lower objectivefunction value, especially with a low number of nodes.
D. Hardware experiments
All the benchmark tasks shown in Fig. 1 (including thosewith path constraints) were applied to the real hardware inorder to verify that these approaches also hold on a physicalsystem. Control inputs and states were bounded to moreconservative values avoiding aggressive behaviors and tooperate in a safer regime. Moreover, upper bound for the total of nodes0 50 100 150 200 v a l ue Objective Function
Hermite - IPOPTHermite - SNOPTTrapez. - IPOPTTrapez. - SNOPT
Fig. 10. This plot shows the objective function value for the differentcombinations of integration methods and solvers. Especially for low numberof nodes, Hermite integration clearly outperforms trapezoidal integration.For more than 50 nodes, the differences vanish.Fig. 11. Ball balancing robot. This robotconsists of three major elements: the ball, thepropulsion unit and the upper body (torso).The ball is driven by three omniwheelsmounted on brushless motors with planetarygear heads. Through optical encoders on themotors, the ball rotation is measured provid-ing onboard odometry for the robot. trajectory time was relaxed ( T max = 7 s). Solutions werefound using 100 Nodes, Hermite approximation and IPOPTas a solver. The robot used in the experiment is shown in Fig.11 and a video with the experimental results can be foundat . In the videoit can be seen that the robot executes the task synchronizedwith the simulation, following the planned trajectories.The xy position of the robot during execution of all tasksmatches the planned trajectory fairly well with almost nodeviation. Fig. 12 shows the results for the third task. Eventhough this is the most challenging task, it can be seenthat the tracking behavior is very good. In summary, thehardware tests show good tracking performance, verifyingthat the optimized trajectory is dynamically feasible and thatthe tracking control is able to reject disturbances and modelinaccuracies very well. E. Validation on a different robot
Results presented above are validated by applying thesame methods on a simulated 3D quadrotor ( x ∈ R , u ∈ R ) executing an obstacle avoidance task. Fig. 13 shows thetrajectory followed by the quadrotor, starting from the originand reaching the goal after avoiding a cylindrical obstacle.The complete motion can be seen in the supplemental video.We observed that the influences of the type of solver, dy-namic constraints, initialization method and number of nodesare similar to the ones shown on the ballbot experiments. Forinstance, Table II shows the time required to solve this taskfor different configurations using zero initialization strategy.The same patterns can be observed: Solution time increases t [s]0 1 2 3 4 5 6 7 x [ m ] HardwarePlan t [s]0 1 2 3 4 5 6 7 y [ m ] HardwarePlan
Fig. 12. Position of the robot in x and y direction during execution of thetwo obstacles task. The blue solid line shows the measured data while the reddashed lie shows the planned trajectory. Solution time for this configurationis . s. Fig. 13. Quadrotor valida-tion task. Blue dot representsthe goal spatially located at p = [5 , , . A cylindricalobstacle ( r = 1 ) is centeredat x = 2 . , y = 2 . . with the number of nodes and IPOPT is faster than SNOPTgiven the same integration scheme. As observed in Fig. 8,there is no clear pattern regarding solution times requiredby SNOPT when using different integration schemes. Theseresults support the hypothesis that the conclusions drawnfrom the evaluation presented in this paper are not robotspecific but that they can be extended to other platforms.VI. A NALYSIS AND D ISCUSSION a) NLP solver:
We observed that SNOPT requires moretime in most of the comparisons. This is even clearer forthe case of the two obstacle task, where the difference toIPOPT is significant. This can be explained by the need ofSQP methods in finding a solution within a big active set ofinequality constraints. As pointed out in other comparisonstudies [12], IPM tend to be faster in a NLP problemwith many inequality constraints. Moreover, given a goodinitialization, the IPM algorithm always outperformed theSQP solver. In terms of quality and accuracy of the solutionboth solvers acted similarly under different schemes.
TABLE IIV
ERIFICATION R ESULTS - S
OLUTION T IME ( S )N IPOPT SNOPTHermite Trapez. Hermite Trapez.50 0.785 0.240 18.53 0.42375 1.153 0.332 42.81 3.072ABLE IIIM ISSING DATA POINTS - CONFIGURATIONS
Integration Scheme Number of Nodes SolverHermite 190 SNOPTHermite 15, 25, 40, 45 IPOPTTrapez. 25,30 SNOPT b) Integration scheme:
The integration scheme has lessinfluence on the solution time than the solver. However, se-lecting a Hermite-Simpson approximation has a considerableimpact on the accuracy of the solution. As expected, the errordecreases with the number of nodes, but this is not signif-icant with respect to the better approximation reached withHermite-Simpson. The quality of the solution is also affectedby the integration scheme. Trapezoidal methods requires atleast nodes to reach the same quality as the one obtainedwith Hermite integration using nodes. This fact, togetherwith the running time results, where Hermite and trapezoidalschemes perform similarly when using IPOPT, suggests thatHermite integration scheme should be preferred. c) Initialization: The aspect of initialization is closelyrelated to the performance of the solver. We observed thatdirect transcription in general is considerably sensitive to theinitialization strategy adopted at the NLP stage. Regardless ofthe solver, applying an adequate initialization policy clearlyreduces the time to obtain a solution. This is a fundamentalaspect for applying direct transcription online. d) Hardware experiments:
It has been shown that TOwith a matching controller can be deployed on real hardware.The good match between simulated and real behavior canbe observed both from the figures presented in this paperas well as from the video attachment overlaying simulatedand physical robot. Small deviations between planned andreal trajectories can be explained by model mismatches andsensor noise. These experiments validate the applicability ofTO to real hardware and they are one of very few examplesof using this technique on real underactuated robots. e) Solving time:
Ideally, the robot itself should be ableto obtain plans online. Since the optimized trajectory isusually complemented with a stabilizing controller TO isnot bound to run at the same rate or in hard real-time.However, the solving time of TO should be fast enough toreact to large scale disturbances or unforeseen changes in therobot’s environment. The maximally acceptable solving timedepends on the dynamics of the system and on the task. Forthe given example in Figure 7 the solving time is about 5 to50 cycles of the inner stabilizing control loop. These solvingtimes lie within a reasonable magnitude for online planning. f) Failures during optimal trajectory search:
Occasion-ally the solver might not be able to find a solution, declaringnumerical difficulties or because the maximum number ofiterations has been reached. As observed in Fig. 8 andFig. 10, data is missing for a few configurations of solver,integration scheme and number of nodes. Such configurationsare reported in Table III. Further investigation is required todetect the fundamental reasons for these failures, as no clearpattern of the variables analyzed in this study is observed. Finally, apart from the aforementioned cases, in Fig. 9five other points are also missing. Such cases correspondto solutions for which the TVLQR method is not ableto find a feedback controller and therefore the accuracyas defined in Section III-C cannot be determined. This isobserved for some configurations with less than 30 nodes forboth integration schemes and solvers. Further investigationregarding the relation between direct transcription and thesolution of a TVLQR problem is required in order to identifythe fundamental reason of such effect.VII. C
ONCLUSIONS
The comparisons performed in this work show clearpatterns on the performance of direct transcription methodswith respect to the different integration methods, solvers,initialization and problem types.We showed a successful implementation of a direct tran-scription based motion planning and control approach fora real robot. For a problem of medium complexity, such asthe one evaluated here, the solution convergences sufficientlyfast so online planning using these methods comes in reach.R
EFERENCES[1] J. T. Betts, “Survey of numerical methods for trajectory optimization,”
Journal of Guidance, Control, and Dynamics , vol. 21, no. 2, pp. 193–207, 1998.[2] J. Koenemann, A. Del Prete, Y. Tassa, E. Todorov, O. Stasse, M. Ben-newitz, and N. Mansard, “Whole-body model-predictive control ap-plied to the hrp-2 humanoid robot,” in
IEEE-RSJ Int. Conference onIntelligent Robots and Systmes (IROS) , 2015, pp. 3346–3351.[3] H. Dai, A. Valenzuela, and R. Tedrake, “Whole-body motion planningwith centroidal dynamics and full kinematics,” in
Humanoid Robots(Humanoids), 2014 14th IEEE-RAS International Conference on ,2014, pp. 295–302.[4] M. Posa, C. Cantu, and R. Tedrake, “A direct method for trajectoryoptimization of rigid bodies through contact,”
International Journal ofRobotics Research , vol. 33, no. 1, pp. 69–81, 2014.[5] G. Schultz and K. Mombaur, “Modeling and optimal control of human-like running,”
Mechatronics, IEEE/ASME Transactions on , vol. 15,no. 5, pp. 783–792, 2010.[6] C. Hargraves and S. Paris, “Direct trajectory optimization using non-linear programming and collocation,”
Journal of Guidance, Control,and Dynamics , vol. 10, no. 4, pp. 338–342, 1987.[7] O. von Stryk and R. Bulirsch, “Direct and indirect methods fortrajectory optimization,”
Annals of Operations Research , vol. 37, no. 1,pp. 357–373, 1992.[8] V. Becerra, “Solving complex optimal control problems at no cost withpsopt,” in
Computer-Aided Control System Design (CACSD), 2010IEEE International Symposium on , 2010, pp. 1391–1396.[9] P. Gill, W. Murray, and M. Saunders, “SNOPT: An SQP algorithm forlarge-scale constrained optimization,”
SIAM Review , vol. 47, no. 1, pp.99–131, 2005.[10] A. Wachter and L. T. Biegler, “On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinearprogramming,”
Mathematical Programming , vol. 106, no. 1, pp. 25–57, 2006.[11] B. Conway, “A survey of methods available for the numerical op-timization of continuous dynamic systems,”