Learning Efficient Navigation in Vortical Flow Fields
Peter Gunnarson, Ioannis Mandralis, Guido Novati, Petros Koumoutsakos, John O. Dabiri
Graduate Aerospace Laboratories, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA
Computational Science and Engineering Laboratory, ETH Zurich, 8093 Zurich, Switzerland
John A. Paulson School of Engineering and Applied Sciences, Harvard University, 150 Western Ave, Boston, MA 02134, USA
Mechanical and Civil Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA
(Dated: February 23, 2021)

Efficient point-to-point navigation in the presence of a background flow field is important for robotic applications such as ocean surveying. In such applications, robots may only have knowledge of their immediate surroundings or be faced with time-varying currents, which limits the use of optimal control techniques for planning trajectories. Here, we apply a novel Reinforcement Learning algorithm to discover time-efficient navigation policies to steer a fixed-speed swimmer through an unsteady two-dimensional flow field. The algorithm entails inputting environmental cues into a deep neural network that determines the swimmer's actions, and deploying Remember and Forget Experience Replay. We find that the resulting swimmers successfully exploit the background flow to reach the target, but that this success depends on the type of sensed environmental cue. Surprisingly, a velocity-sensing approach outperformed a biomimetic vorticity-sensing approach by nearly two-fold in success rate. Equipped with local velocity measurements, the reinforcement learning algorithm achieved near 100% success in reaching the target locations while approaching the time-efficiency of paths found by a global optimal control planner.
Introduction.—Navigation in the presence of a background unsteady flow field is an important task in a wide range of robotic applications, including ocean surveying [1], monitoring of deep-sea animal communities [2], drone-based inspection and delivery in windy conditions [3], and weather balloon station keeping [4]. In such applications, robots must contend with unsteady fluid flows such as wind gusts or ocean currents in order to survey specific locations and return useful measurements, often autonomously. Ideally, robots would exploit these background currents to propel themselves to their destinations more quickly or with lower energy expenditure.

If the entire background flow field is known in advance, numerous algorithms exist to accomplish optimal path planning, ranging from the classical Zermelo's equation from optimal control theory [5, 6] to modern optimization approaches [1, 3, 7–10]. However, measuring the entire flow field is often impractical, as ocean and air currents can be difficult to measure and can change unpredictably. Robots themselves can also significantly alter the surrounding flow field, for example when multirotors fly near obstacles [11] or during fish-like swimming [12]. Additionally, oceanic and flying robots are increasingly operated autonomously and therefore do not have access to real-time external information about incoming currents and gusts (e.g. [13, 14]).

Instead, robots may need to rely on data from on-board sensors to react to the surrounding flow field and navigate effectively. A bio-inspired approach is to navigate using local flow information, for example by sensing the local flow velocity or pressure. Zebrafish appear to use their lateral line to sense the local flow velocity and avoid obstacles by recognizing changes in the local vorticity due to boundary layers [15]. Some seal species can orient themselves and hunt in total darkness by detecting currents with their whiskers [16]. Additionally, a numerical study of fish schooling demonstrated how surface pressure gradient and shear stress sensors on a downstream fish can determine the locations of upstream fish, thus enabling energy-efficient schooling behavior [17].

Reinforcement Learning (RL) offers a promising approach for replicating this feat of navigation from local flow information. In simulated environments, RL has successfully discovered energy-efficient fish swimming [18, 19] and schooling behavior [12], and a time-efficient navigation policy for repeated quasi-turbulent flow using position information [20]. In application, RL using local wind velocity estimates outperformed existing methods for energy-efficient weather balloon station keeping [4] and for replicating bird soaring [21]. Other methods exist for navigating uncertainty in a partially known flow field, such as fuzzy logic or adaptive control methods [7]; however, RL can be applied generally to an unknown flow field without requiring human tuning for specific scenarios.

The question remains, however, as to which environmental cues are most useful for navigating through flow fields using RL. A biomimetic approach suggests that sensing the vorticity could be beneficial [15]; however, flow velocity, pressure, or quantities derived thereof are also viable candidates for sensing.

In this Letter, we find that Deep Reinforcement Learning can indeed discover time-efficient, robust paths
through an unsteady, two-dimensional (2D) flow field using only local flow information, where simpler strategies such as swimming towards the target largely fail at the task. We find, however, that the success of the RL approach depends on the type of flow information provided. Surprisingly, an RL swimmer equipped with local velocity measurements dramatically outperforms the biomimetic local vorticity approach. These results show that combining RL-based navigation with local flow measurements can be a highly effective method for navigating through unsteady flow, provided the appropriate flow quantities are used as inputs to the algorithm.

Simulated Navigation Problem.—As a testing environment for RL-based navigation, we pose the problem of navigating across an unsteady von Kármán vortex street obtained by simulating 2D, incompressible flow past a cylinder at a Reynolds number of 400. Other studies have investigated optimal navigation through real ocean flows [1], simulated turbulence [20], and simple flows for which there exist exact optimal navigation solutions [8]. Here, we investigate the flow past a cylinder to retain greater interpretability of learned navigation strategies while remaining a challenging, unsteady navigation problem.

The swimmer is tasked with navigating from a starting point on one side of the cylinder wake to within a small radius of a target point on the opposite side of the wake region. For each episode, or attempt to swim to the target, a pair of start and target positions is chosen randomly within disk regions as shown in Figure 1. Additionally, the swimmer is assigned a random starting time in the vortex shedding cycle. The spatial and temporal randomness prevent the RL algorithm from speciously forming a one-to-one correspondence between the swimmer's relative position and the background flow, which would not reflect real-world navigation scenarios. All swimmers have access to their position relative to the target (∆x, ∆y) rather than their absolute position, to further prevent the swimmer from relying on memorized locations of flow features during training.

For simplicity and training speed, we consider the swimmer to be a massless point with a position X_n = [x, y] which advects with the time-dependent background flow U_flow = [u(x, y, t), v(x, y, t)]. The swimmer can swim with a constant speed U_swim and can directly control its swimming direction θ. These dynamics are discretized with a fixed time step ∆t (a fraction of the convective time D/U_∞) using a forward Euler scheme:

X_0 = X_start, (1)

X_{n+1} = X_n + ∆t (U_swim [cos(θ), sin(θ)] + U_flow). (2)

It is also possible to apply RL-based navigation with more complex dynamics, including when the swimmer's actions alter the background flow [12].
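For concreteness, the update in Equations (1) and (2) amounts to a few lines of Python. This is a minimal sketch rather than the authors' code: the flow sampler `u_flow` is a hypothetical stand-in for interpolation of the stored cylinder-wake solution, and the time-step value is illustrative.

```python
import numpy as np

U_INF = 1.0           # freestream speed (nondimensional)
U_SWIM = 0.8 * U_INF  # fixed swimming speed, 80% of freestream
DT = 0.1              # illustrative time step in units of D/U_inf

def u_flow(x, y, t):
    """Hypothetical stand-in: the real environment interpolates the
    precomputed unsteady cylinder-flow field at (x, y, t)."""
    return np.array([U_INF, 0.1 * np.sin(2.0 * np.pi * t)])

def euler_step(pos, theta, t):
    """One forward-Euler update of Eq. (2): advection by the local
    background flow plus fixed-speed swimming in direction theta."""
    swim = U_SWIM * np.array([np.cos(theta), np.sin(theta)])
    return pos + DT * (swim + u_flow(pos[0], pos[1], t))

pos = np.array([5.0, -2.0])  # X_0 = X_start, Eq. (1)
pos = euler_step(pos, np.pi / 2, t=0.0)
```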
FIG. 1. Test navigation problem of navigating through unsteady cylinder flow. Swimmers are initialized randomly inside the red disk and are assigned a random target location inside the green disk. These regions of start and target points are 4D in diameter, are located 5D downstream, and are centered above and below the cylinder. Additionally, each swimmer is initialized at a random time step in the vortex shedding cycle. An episode is successful when a swimmer reaches within a radius of D/12 around the target location.
We chose a swimming speed of 80% of the freestream speed U_∞ to make the navigation problem challenging, as the swimmer cannot overcome the local flow in some regions of the domain. A slower swimming speed makes navigating this flow largely intractable, while a swimming speed greater than the freestream (U_swim > U_∞) would allow the swimmer to overcome the background flow and easily reach the target.

Navigation Using Deep Reinforcement Learning.—In Reinforcement Learning, an agent acts according to a policy, which takes in the agent's state s as an input and outputs an action a. Through repeated experiences with the surrounding environment, the policy is trained so that the agent's behavior maximizes a cumulative reward. Here, the agent is a swimmer, the action is the swimming direction θ, and we seek to determine how the performance of a learned navigation policy is impacted by the type of flow information contained in the state.

To this end, we first consider a flow-blind swimmer as a baseline, which cannot sense the surrounding flow and only has access to its position relative to the target (s = {∆x, ∆y}). Next, inspired by the vorticity-based navigation strategy of the zebrafish [15], we consider a vorticity swimmer with access to the local vorticity at the current and previous time step in order to sense changes in the local vorticity (s = {∆x, ∆y, ω_t, ω_{t−∆t}}). We also consider a velocity swimmer, which has access to both components of the local background velocity (s = {∆x, ∆y, u, v}). Other states were also investigated, and are included in the Supplemental Material.

We employ Deep Reinforcement Learning for this navigation problem, in which the navigation policy is expressed using a deep neural network. Previously, Biferale et al. [20] employed an actor-critic approach for RL-based navigation of repeated quasi-turbulent flow. The policy was expressed using a basis function architecture, requiring a coarse discretization of both the swimmer's position and swimming direction. Here, a single deep neural network with 128-unit hidden layers maps the swimmer's state to its swimming direction, trained with Remember and Forget Experience Replay [22]. Training is guided by a reward function r_n, which is designed to produce the desired behavior of navigating to the target. We employ a reward function similar to that of Biferale et al. [20]:

r_n = −∆t + 10 [ ||X_{n−1} − X_target||/U_swim − ||X_n − X_target||/U_swim ] + bonus. (3)

The first term penalizes the duration of an episode to encourage fast navigation to the target. The second two terms give a reward when the swimmer is closer to the target than it was in the previous time step. The final term is a bonus equal to 200 seconds, or approximately 30 times the duration of a typical trajectory; the bonus is awarded if the swimmer successfully reaches the target. Swimmers that exit the simulation area or collide with the cylinder are treated as unsuccessful. The second two terms are scaled by 10 to be on the same order of magnitude as the first term, which we found significantly improved training speed and navigation success rates. We also investigated a non-linear reward function, in which the second two terms are the reciprocal of the distance to the target; however, it exhibited lower performance. The RL algorithm seeks to maximize the total reward, which is the sum of the reward function across all N time steps in an episode:

r_total = Σ_{n=1}^{N} r_n = −T_f + 10 ||X_start − X_target||/U_swim + bonus. (4)

Assuming the swimmer reaches the target location, the only term in r_total that depends on the swimmer's trajectory is −T_f. Therefore, maximizing the cumulative reward of a successful episode is equivalent to finding the minimum-time path to the target. During training, however, all terms in the reward contribute to finding policies that drive the swimmer to the target in the first place. The evolution of the reward function during training for each swimmer is shown in Figure 2. All RL swimmers were trained for 20,000 episodes.
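As a concrete reading of Equation (3), the sketch below implements the per-step reward under the definitions above; the function name `reward_step` and the time-step value are illustrative, not taken from the paper.

```python
import numpy as np

DT = 0.1       # illustrative time step (matches the dynamics sketch)
U_SWIM = 0.8   # swimming speed (nondimensional)
BONUS = 200.0  # terminal bonus for reaching the target

def reward_step(x_prev, x_curr, x_target, reached_target):
    """Per-step reward of Eq. (3): a time penalty, shaping terms that
    reward progress toward the target, and a success bonus."""
    d_prev = np.linalg.norm(np.asarray(x_prev) - np.asarray(x_target))
    d_curr = np.linalg.norm(np.asarray(x_curr) - np.asarray(x_target))
    r = -DT + 10.0 * (d_prev / U_SWIM - d_curr / U_SWIM)
    return r + (BONUS if reached_target else 0.0)
```

Note that the shaping terms telescope when summed over an episode, which is how the trajectory-independent form of Equation (4) arises.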
FIG. 2. Evolution of the cumulative reward during training for the three RL swimmers. The cumulative rewards for each episode are plotted as points, and a moving average with a window of 201 episodes is plotted with a solid line. Because the swimmer gains a bonus of 200 for reaching the target, successful episodes are clustered around a reward of 200 while unsuccessful episodes are clustered below zero.

Success of RL Navigation.—After training, Deep RL discovered effective policies for navigating through this unsteady flow. An example of a path discovered by the velocity RL swimmer is shown in Figure 3. Because the swimming speed is less than the free-stream velocity, the swimmer must utilize the wake region, where it can exploit slower background flow to swim upstream. Once sufficiently far upstream, the swimmer can then steer towards the target. The plot of the swimming direction inside the wake (Figure 3B) shows how the swimmer changes its swimming direction in response to the background flow, enabling it to maintain its position inside the wake region and target low-velocity regions.

However, the ability of Deep RL to discover these effective navigation strategies depends on the type of local flow information included in the swimmer state. To illustrate this point, example trajectories and the average success rates of the flow-blind, vorticity, and velocity RL swimmers are plotted in Figure 4, and are compared with a naïve policy of simply swimming towards the target (θ_naïve = tan⁻¹(∆y/∆x)).

A naïve policy of swimming towards the target is highly ineffective. Swimmers employing this policy are swept away by the background flow, and reached the target only 1.2% of the time on average. A reinforcement learning approach, even without access to flow information, is much more successful: the flow-blind swimmer reached the target locations nearly 40% of the time.

FIG. 3. (A) Example trajectory of the velocity RL swimmer, in which it successfully navigates from its starting location to the target. (B) Segment of this trajectory plotted in a wake-stationary frame of reference on top of the background flow field, which highlights the swimmer exploiting low-velocity regions in the cylinder wake to swim upstream. The swimming direction is plotted at each time step along the trajectory, revealing that this RL swimmer adjusts its swimming direction in response to the changing background flow, enabling time-efficient navigation.
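For reference, the swimmers compared here differ only in how the state vector fed to the policy is assembled. A minimal sketch of the three sensor configurations and the naïve baseline follows, assuming the sign convention that (∆x, ∆y) points from the swimmer to the target.

```python
import numpy as np

def state_flow_blind(dx, dy):
    # s = {dx, dy}: relative position to the target only
    return np.array([dx, dy])

def state_vorticity(dx, dy, omega_now, omega_prev):
    # s = {dx, dy, w_t, w_{t-dt}}: local vorticity at the current and
    # previous time step, so changes in vorticity can be sensed
    return np.array([dx, dy, omega_now, omega_prev])

def state_velocity(dx, dy, u, v):
    # s = {dx, dy, u, v}: both components of the local flow velocity
    return np.array([dx, dy, u, v])

def naive_policy(dx, dy):
    # baseline: swim straight toward the target; arctan2 resolves
    # the quadrant ambiguity of tan^-1(dy/dx)
    return np.arctan2(dy, dx)
```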
Giving the RL swimmers access to local flow information increases the success further: the vorticity RL swimmer averaged a 47.2% success rate. Surprisingly, however, the velocity swimmer has a near 100% success rate, greatly outperforming the zebrafish-inspired vorticity approach. With the right local flow information, it appears that an RL approach can navigate nearly without fail through a complex, unsteady flow field. However, the question remains as to why some flow properties are more informative than others.

To better understand the difference between RL swimmers with access to different flow properties, the swimming direction computed by each RL policy is plotted over a grid of locations in Figure 5. The flow-blind swimmer does not react to changes in the background flow field, although it does appear to learn the effect of the mean background flow, possibly through correlation between the mean flow and the relative position of the swimmer in the domain. This provides it an advantage over the naïve swimmer. The vorticity swimmer adjusts its swimming direction modestly in response to changes in the background flow, for example by swimming slightly upwards in counter-clockwise vortices and slightly downwards in clockwise vortices. The velocity swimmer appears most sensitive to the background flow, which may help it respond more effectively to changes in the background flow.

Station-keeping inside the wake region may be important for navigating through this flow. In the upper right of the domain, the velocity swimmer learns to orient downwards and back to the wake region, while the other swimmers swim futilely towards the target. Because the vorticity depends on gradients in the background flow, that property cannot be used to respond to flow disturbances that are spatially uniform. These differences appear to explain many of the failed trajectories in Figure 4, in which the flow-blind and vorticity swimmers are swept up and to the right by the background flow.

FIG. 4. Average success rate with 30 example trajectories for each swimmer type. Successful attempts to reach the target are green, while unsuccessful attempts are red. (A) The naïve policy of swimming towards the target is rarely successful (success rate 1.3 ± 0.4%). (B) The flow-blind RL swimmer navigates more effectively than the naïve swimmer (39.4 ± 5.8%). (C) The vorticity RL swimmer is more successful than the flow-blind swimmer (47.2 ± 8.7%), showing that sensing the local flow can improve RL-based navigation. (D) Surprisingly, the velocity RL swimmer nearly always reaches the target using only the local flow velocity (99.9 ± 0.1%). The stated success rates are averaged over 12,500 episodes and are shown with one standard deviation arising from the five times each swimmer was trained.

It is worth noting that because the flow pushes the swimmers according to linear dynamics (Equation 2), the local velocity can exactly determine the swimmer's position at the next time step. This may explain the high navigation success of the velocity swimmer, as it has the potential to accurately predict its next location. To be sure, the Deep RL algorithm must still learn where the most advantageous next location ought to be, as the flow velocity at the next time step is still unknown.

While sensing of vorticity is insufficient to detect spatially uniform disturbances, it can be useful for distinguishing the vortical wake from the freestream flow. This can explain why the vorticity swimmer performs better than the flow-blind swimmer. A similar reasoning could apply to swimmers that sense other flow quantities such as pressure or shear.

For real swimmers, however, vorticity may play a larger role, for example by causing a swimmer to rotate in the flow [24] or by altering boundary layers and skin friction drag [12]. Real robots would also be subject to additional sources of complexity not considered in this simplified simulation, which would make it more difficult to determine a swimmer's next position from local velocity measurements.
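The insensitivity of vorticity to spatially uniform disturbances follows directly from its definition as a velocity gradient, ω = ∂v/∂x − ∂u/∂y. The short check below makes the point numerically; it is a sketch using NumPy central differences, not tied to the paper's flow solver.

```python
import numpy as np

def vorticity(u, v, dx, dy):
    """Out-of-plane vorticity w = dv/dx - du/dy on a uniform grid.
    u and v are 2D arrays indexed [j, i] ~ [y, x]."""
    dv_dx = np.gradient(v, dx, axis=1)
    du_dy = np.gradient(u, dy, axis=0)
    return dv_dx - du_dy

rng = np.random.default_rng(0)
u = rng.random((64, 64))
v = rng.random((64, 64))
w_base = vorticity(u, v, 0.1, 0.1)
w_gust = vorticity(u + 0.5, v - 0.3, 0.1, 0.1)  # add a uniform "gust"
assert np.allclose(w_base, w_gust)  # the vorticity field is unchanged
```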
FIG. 5. Swimming direction policy plotted across the domain for a fixed target (green circle) at a given time instant. (A) The naïve swimmer swims towards the target. (B) The red outline highlights how the flow-blind swimmer navigates irrespective of the background flow, while the vorticity swimmer (C) adjusts its swimming direction modestly. (D) The velocity swimmer appears even more sensitive to the unsteady background flow.
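Policy maps like those in Figure 5 can be reproduced by sweeping a trained policy over a grid of positions at a fixed time. The sketch below assumes two hypothetical helpers not defined in the paper: `policy`, mapping a state vector to a swimming angle, and `sample_flow`, returning the stored flow velocity at a point and time.

```python
import numpy as np

def policy_direction_field(policy, sample_flow, target, xs, ys, t):
    """Evaluate a velocity-swimmer policy on a grid at time t,
    returning the commanded swimming angle at each grid point."""
    theta = np.zeros((len(ys), len(xs)))
    for j, y in enumerate(ys):
        for i, x in enumerate(xs):
            u, v = sample_flow(x, y, t)
            # velocity-swimmer state: relative position plus local flow
            state = np.array([target[0] - x, target[1] - y, u, v])
            theta[j, i] = policy(state)
    return theta

# e.g. quiver(xs, ys, cos(theta), sin(theta)) renders the direction map
```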
Comparison with Optimal Control.—In addition to reaching the destination successfully, it is desirable to navigate to the target while minimizing energy consumption or time spent traveling. Biferale et al. [20] demonstrated that RL can approach the performance of time-optimal trajectories in steady flow for fixed start and target positions. Here, we find that this result also holds for the more challenging problem of navigating unsteady flow with variable start and target points.

As noted in Equation 4, maximizing r_total is equivalent to minimizing the time spent traveling to the target (T_f), provided the swimmer successfully reaches the target. Therefore, we compare the velocity RL swimmer to the time-optimal swimmer derived from optimal control.

To find time-optimal paths through the flow, given knowledge of the full velocity field at all times, we constructed a path planner that finds locally optimal paths in two steps. First, a rapidly-exploring random tree (RRT) algorithm finds a set of control inputs that drive the swimmer from the starting location to the target location, typically non-optimally [25]. Then we apply constrained gradient-descent optimization (i.e., the fmincon function in MATLAB) to minimize the time step (and therefore the overall time T_f) of the trajectory while enforcing that the swimmer starts at the starting point (Equation 1), obeys the dynamics at every time step in the trajectory (Equation 2), and reaches the target (||X_N − X_target|| ≤ D/12).

FIG. 6. Comparison between time-optimal trajectories (red) and RL trajectories (black) using an RL swimmer with state s = {∆x, ∆y, u, v}. Time to reach the target T_f is made non-dimensional using the timescale D/U_∞. For the three examples shown, the RL swimmer reached the target in T_f = 8.80, 18.4, and 33.3, versus T_f = 7.38, 15.6, and 25.7 for the time-optimal trajectories (16%, 15%, and 23% faster, respectively).

The surprisingly high performance of the RL approach compared to a global path planner suggests that deep neural networks can, to some extent, approximate how local flow at a particular time impacts navigation in the future. In other words, a successful RL swimmer must simultaneously navigate and identify the approximate current state of the environment. In comparison, the optimal control approach relies on knowledge of the environment in advance. There are limitations to the RL approach, however. For example, the optimal swimmer in the middle of Figure 6 enters the wake region at a different location than the RL swimmer to avoid a high-velocity region, which the RL swimmer may not have been able to sense initially.

In addition to approaching the optimality of a global planner, RL navigation offers a robustness advantage. As noted in [20], RL can be robust to small changes in initial conditions. Here, we show that RL navigation can generalize to a large area of initial and target conditions as well as random starting times in the unsteady flow. RL navigation may also generalize to other flow fields to some extent [24]. In contrast, the optimal trajectories here are open loop: any disturbance or flow measurement inaccuracy would prevent the swimmer from successfully navigating to the target. While robustness can be included with optimal control in other ways [7], responding to changes in the surrounding environment is the driving principle of this RL navigation policy. Indeed, the related algorithm of imitation learning has been applied to add robustness to existing path planners [26].
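A rough Python analogue of the two-step planner described above is sketched below. It is not the authors' implementation: it replaces MATLAB's fmincon with scipy.optimize.minimize (SLSQP) and uses single shooting (rolling out the dynamics inside the objective) rather than enforcing the dynamics as per-step constraints; `u_flow` is again a hypothetical stand-in, and the RRT stage is assumed to supply the initial control sequence `theta_init`.

```python
import numpy as np
from scipy.optimize import minimize

U_SWIM = 0.8

def u_flow(x, y, t):
    # hypothetical stand-in for the stored unsteady flow field
    return np.array([1.0, 0.1 * np.sin(2.0 * np.pi * t)])

def rollout(z, x_start):
    """Roll out Eq. (2) for a decision vector z = [dt, theta_1..theta_N],
    returning the final position X_N."""
    dt, thetas = z[0], z[1:]
    x, t = np.array(x_start, dtype=float), 0.0
    for th in thetas:
        swim = U_SWIM * np.array([np.cos(th), np.sin(th)])
        x = x + dt * (swim + u_flow(x[0], x[1], t))
        t += dt
    return x

def plan(x_start, x_target, theta_init, r_target=1.0 / 12.0):
    """Shrink the time step (hence T_f = N*dt) subject to ending within
    the target radius, starting from an RRT-feasible control sequence."""
    n = len(theta_init)
    z0 = np.concatenate([[0.1], theta_init])
    reach = lambda z: r_target - np.linalg.norm(
        rollout(z, x_start) - np.asarray(x_target))
    return minimize(lambda z: n * z[0], z0, method="SLSQP",
                    constraints=[{"type": "ineq", "fun": reach}],
                    bounds=[(1e-3, 1.0)] + [(-np.pi, np.pi)] * n)
```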
Conclusion.—We have shown in this Letter how Deep Reinforcement Learning can discover robust and time-efficient navigation policies which are improved by sensing local flow information. A bio-inspired approach of sensing the local vorticity provided a modest increase in navigation success over a position-only approach, but surprisingly the key to success was discovered to lie in sensing the velocity field, which more directly determined the future position of the swimmer. This suggests that RL coupled with an on-board velocity sensor may be an effective tool for robot navigation. Future investigation is warranted to examine the extent to which the success of the velocity approach extends to real-world scenarios, in which robots may face more complex, 3D fluid flows and be subject to non-linear dynamics and sensor errors.

[1] W. Zhang, T. Inanc, S. Ober-Blöbaum, and J. E. Marsden, in 2008 IEEE International Conference on Robotics and Automation (2008) pp. 1083–1088.
[2] L. A. Kuhnz, H. A. Ruhl, C. L. Huffard, and K. L. Smith, Deep Sea Research Part II: Topical Studies in Oceanography (Thirty-year time-series study in the abyssal NE Pacific), 104761 (2020).
[3] J. A. Guerrero and Y. Bestaoui, Journal of Intelligent & Robotic Systems, 297 (2013).
[4] M. G. Bellemare, S. Candido, P. S. Castro, J. Gong, M. C. Machado, S. Moitra, S. S. Ponda, and Z. Wang, Nature 588, 77 (2020).
[5] E. Zermelo, ZAMM - Journal of Applied Mathematics and Mechanics / Zeitschrift für Angewandte Mathematik und Mechanik 11, 114 (1931).
[6] L. Techy, Intelligent Service Robotics, 271 (2011).
[7] M. Panda, B. Das, B. Subudhi, and B. B. Pati, International Journal of Automation and Computing, 321 (2020).
[8] D. Kularatne, S. Bhattacharya, and M. A. Hsieh, Autonomous Robots, 1369 (2018).
[9] C. Petres, Y. Pailhas, P. Patron, Y. Petillot, J. Evans, and D. Lane, IEEE Transactions on Robotics 23, 331 (2007).
[10] T. Lolla, P. F. J. Lermusiaux, M. P. Ueckermann, and P. J. Haley, Ocean Dynamics 64, 1373 (2014).
[11] G. Shi, X. Shi, M. O'Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S. Chung, in 2019 International Conference on Robotics and Automation (ICRA) (2019) pp. 9784–9790.
[12] S. Verma, G. Novati, and P. Koumoutsakos, Proceedings of the National Academy of Sciences 115, 5849 (2018).
[13] E. Fiorelli, N. E. Leonard, P. Bhatta, D. A. Paley, R. Bachmayer, and D. M. Fratantoni, IEEE Journal of Oceanic Engineering 31, 935 (2006).
[14] D. A. Caron, B. Stauffer, S. Moorthi, A. Singh, M. Batalin, E. A. Graham, M. Hansen, W. J. Kaiser, J. Das, A. Pereira, A. Dhariwal, B. Zhang, C. Oberg, and G. S. Sukhatme, Limnology and Oceanography 53, 2333 (2008).
[15] P. Oteiza, I. Odstrcil, G. Lauder, R. Portugues, and F. Engert, Nature 547, 445 (2017).
[16] G. Dehnhardt, B. Mauck, and H. Bleckmann, Nature 394, 235 (1998).
[17] P. Weber, G. Arampatzis, G. Novati, S. Verma, C. Papadimitriou, and P. Koumoutsakos, Biomimetics 5, 10 (2020).
[18] M. Gazzola, B. Hejazialhosseini, and P. Koumoutsakos, SIAM Journal on Scientific Computing 36, B622 (2014).
[19] Y. Jiao, F. Ling, S. Heydari, N. Heess, J. Merel, and E. Kanso, arXiv:2009.14280 [physics, q-bio] (2020).
[20] L. Biferale, F. Bonaccorso, M. Buzzicotti, P. Clark Di Leoni, and K. Gustavsson, Chaos: An Interdisciplinary Journal of Nonlinear Science 29, 103138 (2019).
[21] G. Reddy, J. Wong-Ng, A. Celani, T. J. Sejnowski, and M. Vergassola, Nature 562, 236 (2018).
[22] G. Novati and P. Koumoutsakos, arXiv:1807.05827 [cs, stat] (2019).
[23] P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, and D. Meger, arXiv:1709.06560 [cs, stat] (2019).
[24] S. Colabrese, K. Gustavsson, A. Celani, and L. Biferale, Physical Review Letters 118, 158004 (2017).
[25] S. M. LaValle and J. J. Kuffner, The International Journal of Robotics Research 20, 378 (2001).
[26] B. Rivière, W. Hönig, Y. Yue, and S. Chung, IEEE Robotics and Automation Letters 5 (2020).