Learning Efficient Navigation in Vortical Flow Fields
Peter Gunnarson, Ioannis Mandralis, Guido Novati, Petros Koumoutsakos, John O. Dabiri
Graduate Aerospace Laboratories, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA
Computational Science and Engineering Laboratory, ETH Zurich, 8093 Zurich, Switzerland
John A. Paulson School of Engineering and Applied Sciences, Harvard University, 150 Western Ave, Boston, MA 02134, USA
Mechanical and Civil Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA
(Dated: February 23, 2021)

Efficient point-to-point navigation in the presence of a background flow field is important for robotic applications such as ocean surveying. In such applications, robots may only have knowledge of their immediate surroundings or be faced with time-varying currents, which limits the use of optimal control techniques for planning trajectories. Here, we apply a novel Reinforcement Learning algorithm to discover time-efficient navigation policies to steer a fixed-speed swimmer through an unsteady two-dimensional flow field. The algorithm entails inputting environmental cues into a deep neural network that determines the swimmer's actions, and deploying Remember and Forget Experience Replay. We find that the resulting swimmers successfully exploit the background flow to reach the target, but that this success depends on the type of sensed environmental cue. Surprisingly, a velocity-sensing approach outperformed a biomimetic vorticity-sensing approach by nearly two-fold in success rate. Equipped with local velocity measurements, the reinforcement learning algorithm achieved near 100% success in reaching the target locations while approaching the time-efficiency of paths found by a global optimal control planner.
Introduction.—Navigation in the presence of a background unsteady flow field is an important task in a wide range of robotic applications, including ocean surveying [1], monitoring of deep-sea animal communities [2], drone-based inspection and delivery in windy conditions [3], and weather balloon station keeping [4]. In such applications, robots must contend with unsteady fluid flows such as wind gusts or ocean currents in order to survey specific locations and return useful measurements, often autonomously. Ideally, robots would exploit these background currents to propel themselves to their destinations more quickly or with lower energy expenditure.

If the entire background flow field is known in advance, numerous algorithms exist to accomplish optimal path planning, ranging from the classical Zermelo's equation from optimal control theory [5, 6] to modern optimization approaches [1, 3, 7–10]. However, measuring the entire flow field is often impractical, as ocean and air currents can be difficult to measure and can change unpredictably. Robots themselves can also significantly alter the surrounding flow field, for example when multirotors fly near obstacles [11] or during fish-like swimming [12]. Additionally, oceanic and flying robots are increasingly operated autonomously and therefore do not have access to real-time external information about incoming currents and gusts (e.g. [13, 14]).

Instead, robots may need to rely on data from on-board sensors to react to the surrounding flow field and navigate effectively. A bio-inspired approach is to navigate using local flow information, for example by sensing the local flow velocity or pressure. Zebrafish appear to use their lateral line to sense the local flow velocity and avoid obstacles by recognizing changes in the local vorticity due to boundary layers [15]. Some seal species can orient themselves and hunt in total darkness by detecting currents with their whiskers [16]. Additionally, a numerical study of fish schooling demonstrated how surface pressure gradient and shear stress sensors on a downstream fish can determine the locations of upstream fish, thus enabling energy-efficient schooling behavior [17].

Reinforcement Learning (RL) offers a promising approach for replicating this feat of navigation from local flow information. In simulated environments, RL has successfully discovered energy-efficient fish swimming [18, 19] and schooling behavior [12], and a time-efficient navigation policy for repeated quasi-turbulent flow using position information [20]. In application, RL using local wind velocity estimates outperformed existing methods for energy-efficient weather balloon station keeping [4] and for replicating bird soaring [21]. Other methods exist for navigating uncertainty in a partially known flow field, such as fuzzy logic or adaptive control methods [7]; however, RL can be applied generally to an unknown flow field without requiring human tuning for specific scenarios.

The question remains, however, as to which environmental cues are most useful for navigating through flow fields using RL. A biomimetic approach suggests that sensing the vorticity could be beneficial [15]; however, flow velocity, pressure, or quantities derived thereof are also viable candidates for sensing.

In this Letter, we find that Deep Reinforcement Learning can indeed discover time-efficient, robust paths
through an unsteady, two-dimensional (2D) flow field using only local flow information, where simpler strategies such as swimming towards the target largely fail at the task. We find, however, that the success of the RL approach depends on the type of flow information provided. Surprisingly, an RL swimmer equipped with local velocity measurements dramatically outperforms the biomimetic local vorticity approach. These results show that combining RL-based navigation with local flow measurements can be a highly effective method for navigating through unsteady flow, provided the appropriate flow quantities are used as inputs to the algorithm.

Simulated Navigation Problem.—As a testing environment for RL-based navigation, we pose the problem of navigating across an unsteady von Kármán vortex street obtained by simulating 2D, incompressible flow past a cylinder at a Reynolds number of 400. Other studies have investigated optimal navigation through real ocean flows [1], simulated turbulence [20], and simple flows for which there exist exact optimal navigation solutions [8]. Here, we investigate the flow past a cylinder to retain greater interpretability of learned navigation strategies while remaining a challenging, unsteady navigation problem.

The swimmer is tasked with navigating from a starting point on one side of the cylinder wake to within a small radius of a target point on the opposite side of the wake region. For each episode, or attempt to swim to the target, a pair of start and target positions is chosen randomly within disk regions as shown in Figure 1. Additionally, the swimmer is assigned a random starting time in the vortex shedding cycle. The spatial and temporal randomness prevent the RL algorithm from speciously forming a one-to-one correspondence between the swimmer's relative position and the background flow, which would not reflect real-world navigation scenarios. All swimmers have access to their position relative to the target (∆x, ∆y) rather than their absolute position, to further prevent the swimmer from relying on memorized locations of flow features during training.

For simplicity and training speed, we consider the swimmer to be a massless point with a position X_n = [x, y] which advects with the time-dependent background flow U_flow = [u(x, y, t), v(x, y, t)]. The swimmer can swim with a constant speed U_swim and can directly control its swimming direction θ. These dynamics are discretized with a fixed time step ∆t (a fraction of the convective time D/U_∞) using a forward Euler scheme:

X_0 = X_start, (1)

X_{n+1} = X_n + ∆t (U_swim [cos(θ), sin(θ)] + U_flow). (2)

It is also possible to apply RL-based navigation with more complex dynamics, including when the swimmer's actions alter the background flow [12].
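For concreteness, the update in Equations (1) and (2) amounts to a few lines of Python. This is a minimal sketch rather than the authors' code: the flow sampler `u_flow` is a hypothetical stand-in for interpolation of the stored cylinder-wake solution, and the time-step value is illustrative.

```python
import numpy as np

U_INF = 1.0           # freestream speed (nondimensional)
U_SWIM = 0.8 * U_INF  # fixed swimming speed, 80% of freestream
DT = 0.1              # illustrative time step in units of D/U_inf

def u_flow(x, y, t):
    """Hypothetical stand-in: the real environment interpolates the
    precomputed unsteady cylinder-flow field at (x, y, t)."""
    return np.array([U_INF, 0.1 * np.sin(2.0 * np.pi * t)])

def euler_step(pos, theta, t):
    """One forward-Euler update of Eq. (2): advection by the local
    background flow plus fixed-speed swimming in direction theta."""
    swim = U_SWIM * np.array([np.cos(theta), np.sin(theta)])
    return pos + DT * (swim + u_flow(pos[0], pos[1], t))

pos = np.array([5.0, -2.0])  # X_0 = X_start, Eq. (1)
pos = euler_step(pos, np.pi / 2, t=0.0)
```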
FIG. 1. Test navigation problem of navigating through unsteady cylinder flow. Swimmers are initialized randomly inside the red disk and are assigned a random target location inside the green disk. These regions of start and target points are 4D in diameter, are located 5D downstream, and are centered above and below the cylinder. Additionally, each swimmer is initialized at a random time step in the vortex shedding cycle. An episode is successful when a swimmer reaches within a radius of D/12 around the target location.
We chose a swimming speed of 80% of the freestream speed U_∞ to make the navigation problem challenging, as the swimmer cannot overcome the local flow in some regions of the domain. A slower swimming speed makes navigating this flow largely intractable, while a swimming speed greater than the freestream (U_swim > U_∞) would allow the swimmer to overcome the background flow and easily reach the target.

Navigation Using Deep Reinforcement Learning.—In Reinforcement Learning, an agent acts according to a policy, which takes in the agent's state s as an input and outputs an action a. Through repeated experiences with the surrounding environment, the policy is trained so that the agent's behavior maximizes a cumulative reward. Here, the agent is a swimmer, the action is the swimming direction θ, and we seek to determine how the performance of a learned navigation policy is impacted by the type of flow information contained in the state.

To this end, we first consider a flow-blind swimmer as a baseline, which cannot sense the surrounding flow and only has access to its position relative to the target (s = {∆x, ∆y}). Next, inspired by the vorticity-based navigation strategy of the zebrafish [15], we consider a vorticity swimmer with access to the local vorticity at the current and previous time step in order to sense changes in the local vorticity (s = {∆x, ∆y, ω_t, ω_{t−∆t}}). We also consider a velocity swimmer, which has access to both components of the local background velocity (s = {∆x, ∆y, u, v}). Other states were also investigated, and are included in the Supplemental Material.

We employ Deep Reinforcement Learning for this navigation problem, in which the navigation policy is expressed using a deep neural network. Previously, Biferale et al. [20] employed an actor-critic approach for RL-based navigation of repeated quasi-turbulent flow. The policy was expressed using a basis function architecture, requiring a coarse discretization of both the swimmer's position and swimming direction. Here, a single deep neural network with 128-unit hidden layers maps the swimmer's state to its swimming direction, trained with Remember and Forget Experience Replay [22]. Training is guided by a reward function r_n, which is designed to produce the desired behavior of navigating to the target. We employ a reward function similar to that of Biferale et al. [20]:

r_n = −∆t + 10 [ ||X_{n−1} − X_target||/U_swim − ||X_n − X_target||/U_swim ] + bonus. (3)

The first term penalizes the duration of an episode to encourage fast navigation to the target. The second two terms give a reward when the swimmer is closer to the target than it was in the previous time step. The final term is a bonus equal to 200 seconds, or approximately 30 times the duration of a typical trajectory; the bonus is awarded if the swimmer successfully reaches the target. Swimmers that exit the simulation area or collide with the cylinder are treated as unsuccessful. The second two terms are scaled by 10 to be on the same order of magnitude as the first term, which we found significantly improved training speed and navigation success rates. We also investigated a non-linear reward function, in which the second two terms are the reciprocal of the distance to the target; however, it exhibited lower performance. The RL algorithm seeks to maximize the total reward, which is the sum of the reward function across all N time steps in an episode:

r_total = Σ_{n=1}^{N} r_n = −T_f + 10 ||X_start − X_target||/U_swim + bonus. (4)

Assuming the swimmer reaches the target location, the only term in r_total that depends on the swimmer's trajectory is −T_f. Therefore, maximizing the cumulative reward of a successful episode is equivalent to finding the minimum-time path to the target. During training, however, all terms in the reward contribute to finding policies that drive the swimmer to the target in the first place. The evolution of the reward function during training for each swimmer is shown in Figure 2. All RL swimmers were trained for 20,000 episodes.
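As a concrete reading of Equation (3), the sketch below implements the per-step reward under the definitions above; the function name `reward_step` and the time-step value are illustrative, not taken from the paper.

```python
import numpy as np

DT = 0.1       # illustrative time step (matches the dynamics sketch)
U_SWIM = 0.8   # swimming speed (nondimensional)
BONUS = 200.0  # terminal bonus for reaching the target

def reward_step(x_prev, x_curr, x_target, reached_target):
    """Per-step reward of Eq. (3): a time penalty, shaping terms that
    reward progress toward the target, and a success bonus."""
    d_prev = np.linalg.norm(np.asarray(x_prev) - np.asarray(x_target))
    d_curr = np.linalg.norm(np.asarray(x_curr) - np.asarray(x_target))
    r = -DT + 10.0 * (d_prev / U_SWIM - d_curr / U_SWIM)
    return r + (BONUS if reached_target else 0.0)
```

Note that the shaping terms telescope when summed over an episode, which is how the trajectory-independent form of Equation (4) arises.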
FIG. 2. Evolution of the cumulative reward during training for the three RL swimmers. The cumulative rewards for each episode are plotted as points, and a moving average with a window of 201 episodes is plotted with a solid line. Because the swimmer gains a bonus of 200 for reaching the target, successful episodes are clustered around a reward of 200 while unsuccessful episodes are clustered below zero.

Success of RL Navigation.—After training, Deep RL discovered effective policies for navigating through this unsteady flow. An example of a path discovered by the velocity RL swimmer is shown in Figure 3. Because the swimming speed is less than the free-stream velocity, the swimmer must utilize the wake region, where it can exploit slower background flow to swim upstream. Once sufficiently far upstream, the swimmer can then steer towards the target. The plot of the swimming direction inside the wake (Figure 3B) shows how the swimmer changes its swimming direction in response to the background flow, enabling it to maintain its position inside the wake region and target low-velocity regions.

However, the ability of Deep RL to discover these effective navigation strategies depends on the type of local flow information included in the swimmer state. To illustrate this point, example trajectories and the average success rates of the flow-blind, vorticity, and velocity RL swimmers are plotted in Figure 4, and are compared with a naïve policy of simply swimming towards the target (θ_naïve = tan⁻¹(∆y/∆x)).

A naïve policy of swimming towards the target is highly ineffective. Swimmers employing this policy are swept away by the background flow, and reached the target only 1.2% of the time on average. A reinforcement learning approach, even without access to flow information, is much more successful: the flow-blind swimmer reached the target locations nearly 40% of the time.

FIG. 3. (A) Example trajectory of the velocity RL swimmer, in which it successfully navigates from its starting location to the target. (B) Segment of this trajectory plotted in a wake-stationary frame of reference on top of the background flow field, which highlights the swimmer exploiting low-velocity regions in the cylinder wake to swim upstream. The swimming direction is plotted at each time step along the trajectory, revealing that this RL swimmer adjusts its swimming direction in response to the changing background flow, enabling time-efficient navigation.
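For reference, the swimmers compared here differ only in how the state vector fed to the policy is assembled. A minimal sketch of the three sensor configurations and the naïve baseline follows, assuming the sign convention that (∆x, ∆y) points from the swimmer to the target.

```python
import numpy as np

def state_flow_blind(dx, dy):
    # s = {dx, dy}: relative position to the target only
    return np.array([dx, dy])

def state_vorticity(dx, dy, omega_now, omega_prev):
    # s = {dx, dy, w_t, w_{t-dt}}: local vorticity at the current and
    # previous time step, so changes in vorticity can be sensed
    return np.array([dx, dy, omega_now, omega_prev])

def state_velocity(dx, dy, u, v):
    # s = {dx, dy, u, v}: both components of the local flow velocity
    return np.array([dx, dy, u, v])

def naive_policy(dx, dy):
    # baseline: swim straight toward the target; arctan2 resolves
    # the quadrant ambiguity of tan^-1(dy/dx)
    return np.arctan2(dy, dx)
```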
Giving the RL swimmers access to local flow information increases the success further: the vorticity RL swimmer averaged a 47.2% success rate. Surprisingly, however, the velocity swimmer has a near 100% success rate, greatly outperforming the zebrafish-inspired vorticity approach. With the right local flow information, it appears that an RL approach can navigate nearly without fail through a complex, unsteady flow field. However, the question remains as to why some flow properties are more informative than others.

To better understand the difference between RL swimmers with access to different flow properties, the swimming direction computed by each RL policy is plotted over a grid of locations in Figure 5. The flow-blind swimmer does not react to changes in the background flow field, although it does appear to learn the effect of the mean background flow, possibly through correlation between the mean flow and the relative position of the swimmer in the domain. This provides it an advantage over the naïve swimmer. The vorticity swimmer adjusts its swimming direction modestly in response to changes in the background flow, for example by swimming slightly upwards in counter-clockwise vortices and slightly downwards in clockwise vortices. The velocity swimmer appears most sensitive to the background flow, which may help it respond more effectively to changes in the background flow.

Station-keeping inside the wake region may be important for navigating through this flow. In the upper right of the domain, the velocity swimmer learns to orient downwards and back to the wake region, while the other swimmers swim futilely towards the target. Because the vorticity depends on gradients in the background flow, that property cannot be used to respond to flow disturbances that are spatially uniform. These differences appear to explain many of the failed trajectories in Figure 4, in which the flow-blind and vorticity swimmers are swept up and to the right by the background flow.

FIG. 4. Average success rate with 30 example trajectories for each swimmer type. Successful attempts to reach the target are green, while unsuccessful attempts are red. (A) The naïve policy of swimming towards the target is rarely successful (success rate 1.3 ± 0.4%). (B) The flow-blind RL swimmer navigates more effectively than the naïve swimmer (39.4 ± 5.8%). (C) The vorticity RL swimmer is more successful than the flow-blind swimmer (47.2 ± 8.7%), showing that sensing the local flow can improve RL-based navigation. (D) Surprisingly, the velocity RL swimmer nearly always reaches the target using only the local flow velocity (99.9 ± 0.1%). The stated success rates are averaged over 12,500 episodes and are shown with one standard deviation arising from the five times each swimmer was trained.

It is worth noting that because the flow pushes the swimmers according to linear dynamics (Equation 2), the local velocity can exactly determine the swimmer's position at the next time step. This may explain the high navigation success of the velocity swimmer, as it has the potential to accurately predict its next location. To be sure, the Deep RL algorithm must still learn where the most advantageous next location ought to be, as the flow velocity at the next time step is still unknown.

While sensing of vorticity is insufficient to detect spatially uniform disturbances, it can be useful for distinguishing the vortical wake from the freestream flow. This can explain why the vorticity swimmer performs better than the flow-blind swimmer. A similar reasoning could apply to swimmers that sense other flow quantities such as pressure or shear.

For real swimmers, however, vorticity may play a larger role, for example by causing a swimmer to rotate in the flow [24] or by altering boundary layers and skin friction drag [12]. Real robots would also be subject to additional sources of complexity not considered in this simplified simulation, which would make it more difficult to determine a swimmer's next position from local velocity measurements.
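The insensitivity of vorticity to spatially uniform disturbances follows directly from its definition as a velocity gradient, ω = ∂v/∂x − ∂u/∂y. The short check below makes the point numerically; it is a sketch using NumPy central differences, not tied to the paper's flow solver.

```python
import numpy as np

def vorticity(u, v, dx, dy):
    """Out-of-plane vorticity w = dv/dx - du/dy on a uniform grid.
    u and v are 2D arrays indexed [j, i] ~ [y, x]."""
    dv_dx = np.gradient(v, dx, axis=1)
    du_dy = np.gradient(u, dy, axis=0)
    return dv_dx - du_dy

rng = np.random.default_rng(0)
u = rng.random((64, 64))
v = rng.random((64, 64))
w_base = vorticity(u, v, 0.1, 0.1)
w_gust = vorticity(u + 0.5, v - 0.3, 0.1, 0.1)  # add a uniform "gust"
assert np.allclose(w_base, w_gust)  # the vorticity field is unchanged
```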
FIG. 5. Swimming direction policy plotted across the domain for a fixed target (green circle) at a given time instant. (A) The naïve swimmer swims towards the target. (B) The red outline highlights how the flow-blind swimmer navigates irrespective of the background flow, while the vorticity swimmer (C) adjusts its swimming direction modestly. (D) The velocity swimmer appears even more sensitive to the unsteady background flow.
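Policy maps like those in Figure 5 can be reproduced by sweeping a trained policy over a grid of positions at a fixed time. The sketch below assumes two hypothetical helpers not defined in the paper: `policy`, mapping a state vector to a swimming angle, and `sample_flow`, returning the stored flow velocity at a point and time.

```python
import numpy as np

def policy_direction_field(policy, sample_flow, target, xs, ys, t):
    """Evaluate a velocity-swimmer policy on a grid at time t,
    returning the commanded swimming angle at each grid point."""
    theta = np.zeros((len(ys), len(xs)))
    for j, y in enumerate(ys):
        for i, x in enumerate(xs):
            u, v = sample_flow(x, y, t)
            # velocity-swimmer state: relative position plus local flow
            state = np.array([target[0] - x, target[1] - y, u, v])
            theta[j, i] = policy(state)
    return theta

# e.g. quiver(xs, ys, cos(theta), sin(theta)) renders the direction map
```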
Comparison with Optimal Control.—In addition to reaching the destination successfully, it is desirable to navigate to the target while minimizing energy consumption or time spent traveling. Biferale et al. [20] demonstrated that RL can approach the performance of time-optimal trajectories in steady flow for fixed start and target positions. Here, we find that this result also holds for the more challenging problem of navigating unsteady flow with variable start and target points.

As noted in Equation 4, maximizing r_total is equivalent to minimizing the time spent traveling to the target (T_f), provided the swimmer successfully reaches the target. Therefore, we compare the velocity RL swimmer to the time-optimal swimmer derived from optimal control.

To find time-optimal paths through the flow, given knowledge of the full velocity field at all times, we constructed a path planner that finds locally optimal paths in two steps. First, a rapidly-exploring random tree (RRT) algorithm finds a set of control inputs that drive the swimmer from the starting location to the target location, typically non-optimally [25]. Then we apply constrained gradient-descent optimization (i.e., the fmincon function in MATLAB) to minimize the time step (and therefore the overall time T_f) of the trajectory while enforcing that the swimmer starts at the starting point (Equation 1), obeys the dynamics at every time step in the trajectory (Equation 2), and reaches the target (||X_N − X_target|| ≤ D/12).

FIG. 6. Comparison between time-optimal trajectories (red) and RL trajectories (black) using an RL swimmer with state s = {∆x, ∆y, u, v}. Time to reach the target T_f is made non-dimensional using the timescale D/U_∞. For the three examples shown, the RL swimmer reached the target in T_f = 8.80, 18.4, and 33.3, versus T_f = 7.38, 15.6, and 25.7 for the time-optimal trajectories (16%, 15%, and 23% faster, respectively).

The surprisingly high performance of the RL approach compared to a global path planner suggests that deep neural networks can, to some extent, approximate how local flow at a particular time impacts navigation in the future. In other words, a successful RL swimmer must simultaneously navigate and identify the approximate current state of the environment. In comparison, the optimal control approach relies on knowledge of the environment in advance. There are limitations to the RL approach, however. For example, the optimal swimmer in the middle of Figure 6 enters the wake region at a different location than the RL swimmer to avoid a high-velocity region, which the RL swimmer may not have been able to sense initially.

In addition to approaching the optimality of a global planner, RL navigation offers a robustness advantage. As noted in [20], RL can be robust to small changes in initial conditions. Here, we show that RL navigation can generalize to a large area of initial and target conditions as well as random starting times in the unsteady flow. RL navigation may also generalize to other flow fields to some extent [24]. In contrast, the optimal trajectories here are open loop: any disturbance or flow measurement inaccuracy would prevent the swimmer from successfully navigating to the target. While robustness can be included with optimal control in other ways [7], responding to changes in the surrounding environment is the driving principle of this RL navigation policy. Indeed, the related algorithm of imitation learning has been applied to add robustness to existing path planners [26].
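A rough Python analogue of the two-step planner described above is sketched below. It is not the authors' implementation: it replaces MATLAB's fmincon with scipy.optimize.minimize (SLSQP) and uses single shooting (rolling out the dynamics inside the objective) rather than enforcing the dynamics as per-step constraints; `u_flow` is again a hypothetical stand-in, and the RRT stage is assumed to supply the initial control sequence `theta_init`.

```python
import numpy as np
from scipy.optimize import minimize

U_SWIM = 0.8

def u_flow(x, y, t):
    # hypothetical stand-in for the stored unsteady flow field
    return np.array([1.0, 0.1 * np.sin(2.0 * np.pi * t)])

def rollout(z, x_start):
    """Roll out Eq. (2) for a decision vector z = [dt, theta_1..theta_N],
    returning the final position X_N."""
    dt, thetas = z[0], z[1:]
    x, t = np.array(x_start, dtype=float), 0.0
    for th in thetas:
        swim = U_SWIM * np.array([np.cos(th), np.sin(th)])
        x = x + dt * (swim + u_flow(x[0], x[1], t))
        t += dt
    return x

def plan(x_start, x_target, theta_init, r_target=1.0 / 12.0):
    """Shrink the time step (hence T_f = N*dt) subject to ending within
    the target radius, starting from an RRT-feasible control sequence."""
    n = len(theta_init)
    z0 = np.concatenate([[0.1], theta_init])
    reach = lambda z: r_target - np.linalg.norm(
        rollout(z, x_start) - np.asarray(x_target))
    return minimize(lambda z: n * z[0], z0, method="SLSQP",
                    constraints=[{"type": "ineq", "fun": reach}],
                    bounds=[(1e-3, 1.0)] + [(-np.pi, np.pi)] * n)
```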
Conclusion.—We have shown in this Letter how Deep Reinforcement Learning can discover robust and time-efficient navigation policies which are improved by sensing local flow information. A bio-inspired approach of sensing the local vorticity provided a modest increase in navigation success over a position-only approach, but surprisingly the key to success was discovered to lie in sensing the velocity field, which more directly determined the future position of the swimmer. This suggests that RL coupled with an on-board velocity sensor may be an effective tool for robot navigation. Future investigation is warranted to examine the extent to which the success of the velocity approach extends to real-world scenarios, in which robots may face more complex, 3D fluid flows and be subject to non-linear dynamics and sensor errors.

[1] W. Zhang, T. Inanc, S. Ober-Blöbaum, and J. E. Marsden, in 2008 IEEE International Conference on Robotics and Automation (2008) pp. 1083–1088.
[2] L. A. Kuhnz, H. A. Ruhl, C. L. Huffard, and K. L. Smith, Deep Sea Research Part II: Topical Studies in Oceanography (Thirty-year time-series study in the abyssal NE Pacific), 104761 (2020).
[3] J. A. Guerrero and Y. Bestaoui, Journal of Intelligent & Robotic Systems, 297 (2013).
[4] M. G. Bellemare, S. Candido, P. S. Castro, J. Gong, M. C. Machado, S. Moitra, S. S. Ponda, and Z. Wang, Nature 588, 77 (2020).
[5] E. Zermelo, ZAMM - Journal of Applied Mathematics and Mechanics / Zeitschrift für Angewandte Mathematik und Mechanik 11, 114 (1931).
[6] L. Techy, Intelligent Service Robotics, 271 (2011).
[7] M. Panda, B. Das, B. Subudhi, and B. B. Pati, International Journal of Automation and Computing, 321 (2020).
[8] D. Kularatne, S. Bhattacharya, and M. A. Hsieh, Autonomous Robots, 1369 (2018).
[9] C. Petres, Y. Pailhas, P. Patron, Y. Petillot, J. Evans, and D. Lane, IEEE Transactions on Robotics 23, 331 (2007).
[10] T. Lolla, P. F. J. Lermusiaux, M. P. Ueckermann, and P. J. Haley, Ocean Dynamics 64, 1373 (2014).
[11] G. Shi, X. Shi, M. O'Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S. Chung, in 2019 International Conference on Robotics and Automation (ICRA) (2019) pp. 9784–9790.
[12] S. Verma, G. Novati, and P. Koumoutsakos, Proceedings of the National Academy of Sciences 115, 5849 (2018).
[13] E. Fiorelli, N. E. Leonard, P. Bhatta, D. A. Paley, R. Bachmayer, and D. M. Fratantoni, IEEE Journal of Oceanic Engineering 31, 935 (2006).
[14] D. A. Caron, B. Stauffer, S. Moorthi, A. Singh, M. Batalin, E. A. Graham, M. Hansen, W. J. Kaiser, J. Das, A. Pereira, A. Dhariwal, B. Zhang, C. Oberg, and G. S. Sukhatme, Limnology and Oceanography 53, 2333 (2008).
[15] P. Oteiza, I. Odstrcil, G. Lauder, R. Portugues, and F. Engert, Nature 547, 445 (2017).
[16] G. Dehnhardt, B. Mauck, and H. Bleckmann, Nature 394, 235 (1998).
[17] P. Weber, G. Arampatzis, G. Novati, S. Verma, C. Papadimitriou, and P. Koumoutsakos, Biomimetics 5, 10 (2020).
[18] M. Gazzola, B. Hejazialhosseini, and P. Koumoutsakos, SIAM Journal on Scientific Computing 36, B622 (2014).
[19] Y. Jiao, F. Ling, S. Heydari, N. Heess, J. Merel, and E. Kanso, arXiv:2009.14280 [physics, q-bio] (2020).
[20] L. Biferale, F. Bonaccorso, M. Buzzicotti, P. Clark Di Leoni, and K. Gustavsson, Chaos: An Interdisciplinary Journal of Nonlinear Science 29, 103138 (2019).
[21] G. Reddy, J. Wong-Ng, A. Celani, T. J. Sejnowski, and M. Vergassola, Nature 562, 236 (2018).
[22] G. Novati and P. Koumoutsakos, arXiv:1807.05827 [cs, stat] (2019).
[23] P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, and D. Meger, arXiv:1709.06560 [cs, stat] (2019).
[24] S. Colabrese, K. Gustavsson, A. Celani, and L. Biferale, Physical Review Letters 118, 158004 (2017).
[25] S. M. LaValle and J. J. Kuffner, The International Journal of Robotics Research 20, 378 (2001).
[26] B. Rivière, W. Hönig, Y. Yue, and S. Chung, IEEE Robotics and Automation Letters 5 (2020).