MModels of benthic bipedalism
F. Giardina and L. Mahadevan
1, 2, 3 John A. Paulson School of Engineering and Applied Sciences, Harvard University Department of Physics, Harvard University Department of Organismic and Evolutionary Biology, Harvard University
Walking is a common bipedal and quadrupedal gait and is often associated with terrestrial andaquatic organisms. Inspired by recent evidence of the neural underpinnings of primitive aquaticwalking in the little skate
Leucoraja erinacea , we introduce a theoretical model of aquatic walkingthat reveals robust and efficient gaits with modest requirements for body morphology and control.The model predicts undulatory behavior of the system body with a regular foot placement patternwhich is also observed in the animal, and additionally predicts the existence of gait bistabilitybetween two states, one with a large energetic cost for locomotion and another associated withalmost no energetic cost. We show that these can be discovered using a simple reinforcementlearning scheme. To test these theoretical frameworks, we built a bipedal robot and show that itsbehaviors are similar to those of our minimal model: its gait is also periodic and exhibits bistability,with a low efficiency gait separated from a high efficiency gait by a ”jump” transition. Overall, ourstudy highlights the physical constraints on the evolution of walking and provides a guide for thedesign of efficient biomimetic robots.
I. INTRODUCTION
The transition of vertebrates from water to land isthought to have occurred around 400 million years agoand required fundamental changes in morphology and be-havior. Used to living in near-neutral buoyancy, aquaticvertebrates had to adapt to the effects of gravity on landwhich required a change in locomotion strategy. Fins pro-vided a basic form of legs that helped early land-dwellingvertebrates to support their body weight and switch fromundulatory (swimming by lateral bending of the body)locomotion to ambulatory (walking) locomotion. A com-mon view of the transition from undulation to walkingis that it occurred gradually [1], consistent with observa-tions in contemporary tetrapods that most closely repre-sent early legged locomotion. For instance, the salaman-der uses a combination of undulatory and ambulatorylocomotion when walking on land [2], supporting the hy-pothesis of gradual transition from swimming to walkingduring vertebrate terrestrialization. While the develop-ment of legs can be traced in the fossil record, the ori-gins of neural circuits giving rise to the control requiredfor ambulatory locomotion are unclear. However, recentwork [3] suggests that the neural circuits required for limbcontrol can be found in aquatic vertebrates who are dis-tant relatives to the first tetrapods, indicating that theneuromuscular basis for legged locomotion was present inall vertebrates with paired fins. These observations raisethe question of whether a walking gait was actively usedin the earliest finned vertebrates or if it only emergedprior to terrestrialization. Given the plethora of extantbenthic (living near the bottom of aquatic environments)fish and species that can walk short stretches on land [4–7], it is conceivable that their ancestors with similar mor-phologies might have used these ancient neural circuitsto walk. A particularly striking example in this regard isthe little skate
Leucoraja erinacea (Figs. 1A,B) that isincapable of undulation due to its rather stiff vertebrae, and uses a benthic gait consisting of left-right alternatingwalking [3]. These observations are strongly suggestivethat walking in aquatic environments can emerge with-out a prior undulatory gait and that the a requirementfor the evolution of walking is the capability to inde-pendently control each fin along with a control strategythat synchronizes the walking motion. While there existsstrong evidence that independent leg control was presentin early vertebrates with paired fins [3], what form thelocomotion gait adopted and whether a neural controlstrategy for stable and efficient legged locomotion wasfeasible remain open questions.Here, inspired by published video data on the dynam-ics of walking in the little skate, we study these questionsby devising a minimal mathematical model to analyzethe stability, energy efficiency, and control complexity ofearly benthic locomotion. We compare the most efficientgaits predicted by the model to the kinematics of the lit-tle skate and show that both are characterized by a left-right alternating leg pattern with an undulating center ofmass and a regular foot placement pattern. Closed-formexpressions for the dynamics of the model show that themost energy efficient gait is associated with no energeticcost of locomotion and merely requires a simple open-loop control strategy. Additionally, the model also pre-dicts the co-existence of a second gait with much lowerefficiency. To complement this explicit dynamic model,we use a reinforcement learning strategy and show thata little skate-like gait is the preferred solution in thisframework, suggesting that an evolutionary process canconverge to this in nature. Finally, to test our resultswith hardware, we build a simple bipedal robot and showthat it also exhibits bistable behavior for certain controlparameters, qualitatively consistent with our theory. a r X i v : . [ q - b i o . Q M ] S e p -1 0 1024681012 RightLeft 𝛼𝐿 A A C 𝛼 DB C footholds 𝑣 ∗ FIG. 1.
Little skate locomotion behavior (data obtained from [3] with permission from the authors) . A , Ventral view oflittle skate indicating leg length L , footsteps, and leg angle α measured relative to the centerline of the body. Scale bar 1cm. B , Sequence of walking gait indicating trajectory of the pelvic girdle (dashed white line), active legs (solid black lines), passivelegs (dashed black lines), and footsteps. C , Trajectory of the pelvic girdle (black line) as a function of dimensionless positionwith footsteps (circles). D , Left and right leg angles α as a function of dimensionless time and mean foot placement angle α . The inset shows the dimensionless speed of the pelvic girdle as a function of dimensionless time with v ∗ (dashed line) theapproximate lower speed bound during steady state locomotion. II. MATHEMATICAL MODEL
Published videos of the little skate [3] allowed us toextract foot placements, trajectories, and leg angles asshown in Figs. 1C,D (See also Supplementary Informa-tion, SI, and SI Video). In steady locomotion, we ob-served that as the pelvic girdle position of the skate un-dulates during locomotion, only one foot is in contactwith the ground at any time, and there is little slip be-tween the leg and the ground during contact. This leadsto a periodic foot placement pattern over time as shownin Fig. 1C, accompanied by periodic dynamics of the legangle α (measured with respect to the centerline of thebody), as shown in Fig. 1D.Based on these observations, a minimal model of lo-comotion suggests modeling the body as a mass m withmoment of inertia I which can move and rotate freely inthe plane if no legs are placed on the ground, as shownin Fig. 2; assuming perfect neutral buoyancy removesthe effect of gravity. Since the legs are very light rela-tive to the massive body, we ignored their mass, assumedthem to be directly attached to the center of mass, andallowed them to switch their ground contact state with afrequency ω . In the absence of slip, upon leg contact withthe ground, the velocity of the body v is perpendicular tothe leg vector r c which points from body to contact point c . The foot-ground contact was modeled as a perfectlyinelastic impact which dissipates any velocity componentthat violates the constraint at foot placement, consistentwith a range of previously used simple models of locomo-tion [10–12]. During the impulse-free phases, a torque T accelerates the body on a circular path around the active leg with length L and the current contact point c . Forsimplicity, we assumed the torque to be constant duringstep i , i.e. T i ( t ) = T i .We described the resulting dynamics of this model interms of generalized coordinates q = [ x, y, θ ] T , with x, y defining the position in the plane and θ the body an-gle with respect to the inertial frame of reference. Therotational degree of freedom was assumed to be decou-pled from translation during a single step, but couplingoccurs at leg transition as the touchdown angle α ismeasured relative to body orientation. In an alternatingleg sequence with constant frequency ω and torque T , thebody rotates during a step i by ∆ θ i = ± T / Iω + ˙ θ i − /ω ,where ˙ θ i = ˙ θ i − ± T /Iω . It is easy to verify that theinitial conditions ˙ θ = ∓ T / Iω, θ = 0 (positive sign forright leg, negative for left) guarantee at every step θ i = 0.The system is subject to opening and closing bilateralcontacts with the ground, which is a problem extensivelystudied in nonsmooth mechanics [13]. We modeled a clos-ing contact with an inelastic impact law and assumed aninitial condition for θ i = 0 , ∀ i as described above. Weconsidered the transition from active ground contact inthe left leg to the right leg only, as the final result is legindependent. The center of mass velocity before impactat step i is denoted by v − i and after impact with v + i . Thedirection of the velocities is given by the geometry of theproblem as shown in Fig. 2A. Pre- and post-impact veloc-ities must be perpendicular to the leg direction (definedby the angles α − and α for detaching and attaching leg,respectively) due to the pivoting motion. At leg transi-tion, the constraints of both legs may not be compatible,which requires a projection of the velocity of the detach- 𝒆 𝑥 𝒆 𝑦 𝒓 𝑚 𝐿 𝛼 𝑚, 𝐼 𝜔𝑡 = 0 𝜔𝑡 = 1 𝜔𝑡 = 2 A 𝑣 𝒓 𝑐 𝑐 𝛼 − 𝛼 𝑣 − 𝑣 + 𝛿 Detaching leg Attachingleg B 𝒆 𝜃 FIG. 2.
Model sketch. A , The leg transition is modelledwith an inelastic impact where the post-impact speed v + isobtained by a projection of the pre-impact speed v − to theline perpendicular to the attaching leg direction. B , Modelsketch at three subsequent dimensionless times. Point mass m with moment of inertia I is constrained in its motion by aconnected massless leg with length L to rotate around currentactive contact point c . The system cannot slip and velocitylosses can only occur due to inelastic impact at leg transi-tion. A constant torque T applied to the leg can acceleratethe system. The leg placement angle is given with ± α attransition. ing leg to the direction perpendicular to the attachingleg. This projection is the inelastic impact law at step i and is defined as v + i = | cos δ i | v − i . (1) δ i is the difference in leg angles at the transition given by δ i = α − α − i − π as shown in Fig. 2A. Therefore, the map-ping of velocity magnitude from pre- to post-impact stateis only a function of the leg angles at collision. With theassumption of constant torque T during contact phase,the gained velocity due to torque T over the period 1 /ω is ∆ v = T /ωLm . The post-impact velocity map follows v + i = | cos δ i | (cid:0) v + i − + ∆ v (cid:1) (2)Equation (2) contains implicitly α − i in δ i which itselfdepends on v + i − by the relation α − i = − α + ( v + i − /ωL + γ/ γ = T /ω L m .This is the evolution of α over one step given the initialvelocity and prescribed torque.The kinematic data in Fig. 1D and SI suggests that,after a short transient phase, the leg torque T , frequency ω , and leg placement angle α are constant over consec-utive steps. This observation suggests the following openloop bipedal control strategy : alternate in leg activationand keep T , ω and α constant over all steps. This al-lowed us to frame the energetic cost, speed, and stabilityof the locomotion gait mathematically. Since all controlparameters are constant, the speeds at which loss due tocollision and gain due to torque T balance and result inno variation in speed over subsequent steps, i.e. the fixedpoints of the system, are given by the implicit equation¯ v ∗ (1 − | cos δ | ) = γ | cos δ | , (3)with ¯ v ∗ = v ∗ /ωL . The evolution of the fixed pointcorresponds to a Poincar´e map of the dynamics of thestudied body. For v = v ∗ the resulting trajectory of thedynamical system is a limit cycle with period 2 /ω , wherethe prefactor is due to the bilateral symmetry. Contin-uous undulation of the body and a regular footstep pat-tern are the prominent features of the gait, which arealso observed in recorded animal data shown in 1B. Asmentioned above, the derivation of (3) does not includedynamics of the body rotation because we can decou-ple rotation from translation with the initial rotationalvelocity ± T / ωI and initial body orientation θ = 0.Stability of the discrete map (2) was quantified by ap-plying a linear stability analysis, i.e. searching for v ∗ such that (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) d v + i d v + i − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) v + i − = v ∗ < . (4)To reveal the possible behaviors and gaits of the modelwe searched for solutions of the discrete map (3). Find-ing a solution (i.e. a fixed point ¯ v ∗ ) in the model equatesto finding a periodic gait. For a constant foot placementangle α the fixed point velocity ¯ v ∗ is solely defined bythe nondimensional torque γ . Fig. 3 shows the solutionsof (3) in a bifurcation diagram for a fixed foot placementangle α . Stability analysis of the solutions revealed thereare three stable regions of interest in the diagram. For γ ≤ . γ is increased,consistent with the existence of a saddle-node bifurcation[14]. For γ ∈ [0 . , .
75] only one stable fixed point existswhich is the continuation of the upper branch. Lastly,at around γ ≈ .
75 a period doubling bifurcation occurs,and the system jumps between two solution branches for γ > .
75. This gait is asymmetric with a sideway com-ponent relative to the body orientation.Gaits with small γ are biologically most plausible asthey reduce energetic cost. In fact, there exists a pointat γ = 0, on the upper saddle-node bifurcation in thebifurcation diagram, for which the post-impact veloc-ity is nonzero, indicating a gait with zero energetic cost.This was confirmed by considering the kinetic energy overtime: energy fluctuations vanished when γ = 0, which re-quired the leg vectors r c of the left and right leg at tran-sition to be parallel, i.e. δ = 0. The speed of this pointwhich corresponds to the energy-optimal gait was studiednext. For small γ , the right-hand side of (3) is negligi-ble, and we have the solutions ¯ v ∗ = 0 and | cos δ | = 1.The first fixed-point speed provides the trivial solutionon the lower branch. For the second solution, recall that δ = 2 α − ¯ v ∗ − γ/ − π which implies ¯ v ∗ = 2 α − nπ, n ∈ Z .For n = 1 the solution corresponds to the energy-optimalgait. The physical interpretation of this result is that forthe energy-optimal gait, the angle δ has to be zero, or putdifferently, the legs have to be parallel at the transitionwhich removes any dissipation due to impact. Interest-ingly, the parameter n suggests that there are an infinitenumber of energy-optimal gaits. In these gaits, the tran-sition from one leg to the next occurs after | n | / γ > γ < . v ∗ was stud-ied next. The Reynolds number in the little skate is ofthe order 10 and drag can be expressed using the dragequation F d = 1 / ρv C d A , with ρ the fluid density, v thebody velocity, C d the drag coefficient, and A the refer-ence area. The part in (2) which gets affected by thisdissipative effect is ∆ v , i.e. the velocity increase over thecontact phase of one leg. The velocity in the case withdrag obeys the differential equationd v d t = TmL − ρv C d A, (5)which has the solution v ( t ) = s TLρC d A tanh r ρC d AT m L ( c mL + t ) ! (6) stableunstableRL path FIG. 3.
Dynamics of bipedal locomotion strategy.
Bi-furcation diagram with nondimensional torque γ = T /ω L m for a fixed foot placement angle α = 2 .
25. Black lines are sta-ble fixed-point velocities ¯ v ∗ = v ∗ /ωL and blue lines unstableones. The bifurcation around γ ≈ .
75 is a period-doublingbifurcation and the solution jumps between the lower and up-per black branch. Black circles show consecutive steps of theoptimized reinforcement learning policy. with c a constant to be defined by initial conditions.This provides an analogous closed form of (2) and en-ables the search for fixed points in the system. The solu-tion depends on the drag coefficient C d and we expectedto retrieve the original drag-less solution for C d → γ values and thefixed-point velocities decrease. We do not know the ac-tual drag coefficient for the little skate, but estimationsof a benthic ray (Raja clavata) suggest a drag coefficientof 0.1 (see [15]). At the considered Reynolds numbers thedrag coefficient of a sphere in a fluid flow is C d ≈ . γ required fora similar speed increases significantly. Around C d ≈ . III. REINFORCEMENT LEARNING OFBIPEDALISM
Our analysis so far shows that aquatic bipedalism hasfew requirements towards morphology (rudimentary legs)and control (constant torque, touchdown angle and fre-quency), but we have imposed an alternating leg sequence C D = -3 C D = C D = C D = stableunstable FIG. 4.
Bifurcation diagram with fluid drag for var-ious drag coefficients.
Bifurcation diagram as a functionof nondimensional torque γ = T /ω L m . Black lines are sta-ble fixed-point velocities v ∗ /ωL and blue lines unstable ones.Equations for the fixed points are obtained by using (6) in(2). As C d → with fixed foot placement angle and torque for the anal-ysis. We have not addressed the question if this gait canbe discovered and if it is optimal when these constraintsare relaxed. Discovery in this context requires an aquaticorganism with rudimentary appendages to learn the neu-ral control for a bipedal gait, which may be hard if thegait is a “needle in a haystack” in the vast control space.The search for a favorable gait relates to the field of gaitselection and optimization [16–19]. A natural question inthe context of motor control learning is if a desired be-havior can be acquired by reinforcement [20]. With theobjective of maximizing the travelled distance and min-imizing the required energy (or equivalently, minimizingenergy consumption for a travelled distance), we traineda reinforcement learning (RL) agent [21, 22] on the modelto obtain a given locomotion speed v T . The frameworkhas four state parameters (planar position and velocityof the point mass) and four control parameters (threecontinuous ones for T, ω, α and a binary one for the legcase l ) - for details see Methods. We observed, in most in-stances of the learning routine like the one shown in Fig.5, that a one-legged locomotion strategy emerges afteronly 4 episodes, two-legged locomotion emerges aroundepisode 200, and periodic locomotion with alternating legcycles emerges via RL around episode 600. This type ofgait prevails as the most efficient one and other exploredstrategies have a worse cumulative reward. We ran ∼
50 instances of learning for 5000 episodes with changinglearning parameters and weights for the reward functionand found that the left-right alternating gait emerged in70% of instances and generated the highest reward. Thebest solution matches the little skate’s observed walkinggait in Fig. 1, and we see an undulatory motion of thecenter of mass and a left-right alternating leg sequence(see Fig. 2C and SI video). For comparison with the bi-furcation diagram, the evolution of the best learned RL -1 0 101234567 -1 0 1012345678910 -1 0 102468101214 -1 0 102468101214 -1 0 1 2 3012345 -1 0 1024681012
Episode4 Episode220 Episode622 Episode1636 Episode1921 Episode2472
FIG. 5.
Learning progress . Training progress of one in-stance of the reinforcement learning agent for the little skatemodel with center of mass trajectories and footsteps at dif-ferent episodes during learning. policy in the parameter space corresponds to the dashedlines in Fig. 3, starting with no forward speed and γ = 1.The first step uses a large γ and over subsequent stepsminimizes it while increasing ¯ v ∗ , eventually settling closeto the upper saddle-node bifurcation point at γ = 0,confirming the optimality of the solution discovered byRL. In the context of our model, these results suggestthat, despite the vast solution space of gaits, a left-rightalternating bipedal control strategy can and will be dis-covered and is the optimal solution for energy efficientlocomotion. IV. ROBOTIC MODEL OF BIPEDALISM
Having shown that the bipedal control strategy is thepreferred solution in an optimization framework, we turnto realize it experimentally, inspired by a range of recentstudies in this vein such as mimicking the legged locomo-tion of mudskippers [23], the reconstruction of feasibletetrapod gaits in extinct species [24], the cost of trans-port of high frequency swimming [25], and robophysicalstudies in general [26]. We used a supported (simulatingneutrally buoyant environments) robotic biped as shownin Fig. 6A. The legs were mechanically constrained tosatisfy α ∈ [0 , . rad and we fixed ω , L , m and varied T to change γ . The design of the robot aims to test theplanar dynamics of aquatic walking, restricting verticaloscillations of the body (but not the vertical displacementof legs) for simplicity; an unsupported system would re-quire stabilization of vertical body attitude and positionby e.g. using a tail or pectoral fins that generate lift. Wealso ignored the effect of fluid drag for the study of pla-nar dynamics as we found no qualitative change in thebifurcation diagram when fluid drag was considered.Fig. 6B shows a bottom view of a series of snapshots atdifferent times of two experiments. The system was ini-tialized from rest and with γ = 0 .
123 and γ = 0 . C B 𝛾 = 0.148 𝛾 = 0.123 𝜔𝑡 = 2.2 𝜔𝑡 = 9𝜔𝑡 = 13𝜔𝑡 = 11 𝜔𝑡 = 2.2 𝜔𝑡 = 5.3𝜔𝑡 = 8.4 𝜔𝑡 = 7.1 Theory fixed pointsRobot exp. from rest
Robot exp. flying start
Robot exp. dataLittle skate speeds A -1 0 1024681012 -1 0 1024681012 -1 0 1024681012 -1 0 1024681012 -1 0 1024681012 -1 0 1024681012 Torque TGround contact Ground clearanceLeg reset to 𝜶 FIG. 6.
Robot experiments. A , Image of supported robot with right leg in contact with the ground and left leg resettingto initial angle α . Scale bar 5cm. B , Bottom-up view of sequence of robot walking gait for two cases initialized from rest.The black line indicates the center of mass trajectory of the robot. Circles indicate closed leg contact points for correspondingpicture. Scale bar 5cm. C , Experimental fixed-point velocities as a function of nondimensional torque γ . The shaded regionis the range of observed little skate speeds as obtained from the video analysis in [3]. Insets show a selection of experimentaltrajectories of the center of mass (black line) as a function of dimensionless position with footsteps (circles). the robot and the dots to the footsteps of the snapshots.The coexistence of fixed points at γ < .
13 was tested byinitializing the robot from rest and with a flying start, i.e.an initial nonzero velocity. As expected we observed twosteady solutions: a slow and a fast gait. Note, however,as γ increases, we no longer require a nonzero velocityinitialization to reach the fast solution branch, effectivelydemonstrating that gait transition (resting to walking),acceleration, and stabilization are performed without theneed for additional control. As observed in skate experi-ments and our model, the robot also exhibits undulatorybehavior and a regular footstep pattern. The observedversus predicted locomotion speeds are shown in Fig. 6C.The observed locomotion speeds are low when startedfrom rest for γ < .
13, but converge to the upper branchof the bifurcation diagram for γ > .
13, with exceptionof the γ = 0 .
123 case where a convergence to the upperbranch occurs at marginally lower γ than predicted. Alltogether, the gait is completely determined in terms ofa constant rate of motion ω , range of motion α , andenergetic cost as determined by the constant torque T . V. DISCUSSION
Under the assumptions of neutrally buoyant environ-ments and rudimentary leg-like appendages we showedthat a left-right alternating bipedal locomotion strategyexists in a minimal model and that it is stable, energyefficient, learnable, and easily realizable in a robotic plat-form. The fact that such a simple control strategy leadsto remarkably robust and efficient behavior is related tothe concept of passive dynamics, as seen for instance ina slinky toy “walking” down a slope without the need of complex control and the passive dynamic walkers [27]that revolutionized the way we think about human-likelocomotion. Such systems demonstrate that the appro-priate morphology for a particular environment oftenleads to the most efficient behavior with remarkable sim-ple or no active feedback control. In the same vein, it wasshown that anesthetized fish can swim upstream giventhe right environmental conditions [28] which revealedthat the concept of passive dynamics is indeed exploitedin natural systems. The little skate robot presented hereis not passive as the energy source for motion is an in-ternal actuator and not an external source of energy likegravitational potential energy, but it exploits the sameprinciples of a passive system: sustained locomotion un-der a constant energy source without feedback control.The selection of the walking gait in the little skatemay be a consequence of an increasing energetic cost oftransport for swimming at slow locomotion speeds [29],similar to the gait transition in other vertebrates [16].Metabolic rate measurements of walking skates are yetto be recorded and will provide further insights into en-ergy expenditure as a driver of gait selection, but thepassive bipedal gait presented here can help explain theenergetic benefits of benthic walking in aquatic environ-ments. It must be added that the little skate uses analternative legged gait called punting [8] which it uses,for example, to kick-start the left-right alternating loco-motion strategy. Punting uses two legs simultaneouslyand was not discovered in our optimization framework,but it may be the preferred gait when fast accelerationis rewarded over energy efficient locomotion.Our study complements earlier work on the theoreticalexistence of zero-energy gaits in terrestrial walking[30] byshowing how it arises in a minimal theoretical model foraquatic bipedalism, and approximately in robot experi-ments. In particular, it requires the legs to be collinearat the end/beginning of every footstep, effectively reduc-ing the degrees of freedom of the problem. Instead ofcontrolling the legs individually, one leg can simply mir-ror the motion of the other leg, reminiscent of mirroralgorithms used in other impulsive robotic tasks like jug-gling [31]. This type of gait can also be realized usinga rigid body with two attached rigid legs; walking thencorresponds to alternate rotations about a vertical axis,centered about one of two legs. This is similar to thewaddling gait of penguins, where lateral undulation isthought to improve the energetics of locomotion [32]. Ofcourse such zero-energy models do not account for lossesdue to internal damping, cost of leg swing, or contribu-tion of leg mass to collision, fluid drag etc. Adding fluiddrag to our model, we found no qualitative difference indynamics. Comparing the observed steady locomotionspeed in the little skate v ∗ ∈ [1 . , . terrafirma . Understanding how the brain, body and envi-ronment worked together in heterogeneous aquatic andterrestrial environments likely needed to include propri-oceptive feedback. But in reliably homogeneous environ-ments, perhaps the simple strategy quantified here waswhere it all started. VI. METHODS
Animal data . Kinematic data from little skates wasobtained from supplementary movies in [3] with permis-sion from the authors. Center of mass position, bodyorientation, and leg positions were extracted using thesoftware Kinovea. Some characteristics of the extracteddata are shown in Table I. The animals were tested in a tank with a textured PDMS surface for traction of thelegs with the substrate. Slip per step of the leg duringstance phase varied across individuals and ranged be-tween 0 . α (Figs. 1D, SI) wereobtained from measuring the angle between the centerline(from tracking the connecting line between pelvic girdleand mouth) and vectors pointing from the pelvic girdleto the leg tips. Velocities of pelvic girdle as a function oftime (insets in Figs. 1D, SI) were computed from filteredtrajectories (local regression using weighted linear leastsquares and a 2nd degree polynomial model using a spanof 10% of the total number of data points) and numer-ically differentiating them with respect to time. Datawas made dimensionless with a leg length L = 1 . cm and a frequency ω = 1 . Hz which were extracted fromthe movies. TABLE I.
Mean (bold) and standard deviation (paren-theses) of kinematic data from 3 individual skates .Data averaged over 10 steps in experiment excluding initialacceleration. v m mean nondimensional velocity, φ phase dif-ference of legs, α p peak leg angle, slip/step normalized by leglength.Skate v m φ α p slip/step1 . (0 . ◦ (34 ◦ ) . (0 . . (0 . . (0 . ◦ (10 ◦ ) . (0 . . (0 . . (0 . ◦ (10 ◦ ) . (0 . . (0 . Reinforcement learning framework.
For themodel-based optimization of the little skate gait we useda reinforcement learning (RL) framework due to the obvi-ous links between episodic and biological learning. Otheroptimization methods such as trajectory optimizationcan also find the optimal solution, but would not pro-vide insight into the learnability of the walking gait viaprocesses related to biological reinforcement [20]. Wechose a deep deterministic policy gradient (DDPG) rein-forcement learning agent for the optimization of the littleskate gait. DDPG [21] has the advantage of acceptingcontinuous control inputs, which is commensurate withthe biological control capabilities of the little skate. Thedynamics for the RL environment are obtained by com-puting the next step position after placing leg l at angle α on the ground and applying a torque T for 1 /ω sec-onds. This provides the new position coordinates x, y and velocities v = ( ˙ x, ˙ y ) T . We ignored the rotationaldegree of freedom of the little skate center of mass forsimplicity. At every episode, the center of mass is placedat the origin and its velocity set to zero. The reward ofstep i was defined as R i = −| v y − v T | + ∆ y (cid:18) − TT max . (cid:19) (7)The first term on the left side penalizes variations of theend-of-step vertical component of the velocity from thetarget speed v T . This term drives the system towardsa constant locomotion speed v T . The second term ac-counts for optimization of the cost of transport, in thatit rewards the product of travelled distance ∆ y and neg-ative normalized torque. T max is the maximum appli-cable torque in the system defined as a bound in theRL problem. The bounds for control parameters were T ∈ [0 , , ω ∈ [1 , , α ∈ [0 , π ], and l ∈ {− , } . Weused Matlab’s reinfrocement learning toolbox to trainthe critic and actor networks with two fully connectedlayers with 400 and 300 nodes and rectifiers as acti-vation functions (except for the actor ouput where weused a tanh function). The leg case (-1 left, 1 right)was treated as a continuous control variable which wasput through a signum function before its use. To testthe effects of learning parameters on the converged so-lution we ran combinations of values for noise variance { . , . , . } , discount factor { . , . , . } and learn-ing rates { × − , × − , × − } . We ran the RLroutine for 5000 episodes (an episode was ended after amaximum of 30 steps or if the center of mass surpassedthe bounds at x = ± l ) for all combinations of parame-ter values and found convergence to the optimal bipedalgait in 17 of 27 cases, one of them is shown in Fig. 5(all routines with learning rates of 5 × − did not con-verge). We further asked, whether changing the relativeweight of the two terms in the reward function (7) hadan effect on the optimal solution of the gait. We ran 20learning routines of 5000 episodes each and weighted theterms 1:3, 1:1, and 3:1. The solution yielding the highestreward was again of the type shown in Fig. 5 (left-rightalternating strategy) and was found in 16 of 20 cases. Robot experiment.
We developed a supportedlegged robotic system to systematically test the modelpredictions. The robot body was created using PolyJettechnology using VeroWhitePlus material. The robot ispowered by a 6V Nickel-metal hydride battery and digi-tally controlled with an Arduino Uno microcontroller. Amotor driver (pololu max14870) operated two 6V DC mo-tors (pololu 50:1 micro metal gear motor medium power)to allow for leg rotation. A servo motor per leg ensuredground contact and clearance of the leg tips (Power HDSub-Micro Servo HD-1440A). Small rubber pads wereglued to the leg tips to reduce slip. Two pins weremounted to the robot structure which prevent the legsfrom exceeding the angle mechanically, and the mainrobot structure prevented the leg angle from becomingnegative, i.e. we always have α ∈ [0 , . m = 350 g and leg length L = 8 cm . Therobot was connected with a long and stiff aluminum barto a ball bearing which moves on a linear 1m steel rod.The steel bar was mounted at an angle of 0 . ◦ at whichthe ball bearing could slide along the steel rod. Thisresulted in a decrease in height of the bar position in di-rection of travel of the robot. Although this decrease ingravitational potential along the bar could potentially be used as a source for acceleration of the robot, friction in-side the ball bearing resulted in a marginal velocity loss ifthe system is started with an initial speed v . Note thatthis is a conservative set-up as our model predicts no ve-locity loss over time in case of no leg collisions with theground. The robot is hanging above a glass plate 90cmin length onto which the feet could push against whenactivated to close ground contact. A webcam recordedthe locomotion behavior from the bottom of the glassplate at 30fps and center of mass trajectories obtainedby tracking a blue marker on the bottom of the robotwere subsequently extracted by analyzing the videos us-ing Matlab. See the SI for an illustration of the set-up.The control strategy for walking was implemented asfollows. Both legs are initialized at α = 2 .
15 beforeevery trial. Leg switching frequency ω was set to 1.3Hz.At switching time both DC motors reverse their rotationdirection and servo motors change their state from liftedto contact or from contact to lifted. The parameter γ was tuned by changing the leg torque exerted by the DCmotors, which was controlled by adjusting the suppliedvoltage set by the motor controller. See supplementaryvideos for various trials with a selection of γ ’s.The data generated for Fig. 6C was obtained from 5independent robot experiments per error bar. All exper-iments were initialized from rest except the three errorbars on the upper branch in the bistable region whichwere initialized with a nonzero velocity. The nondimen-sional initial speed of all flying starts is v = 2 . . γ ≈ .
05 whichwere started with nonzero velocity often converged to thelower branch and slow velocities. The results shown forthis case are the 5 successful cases where terminal veloc-ity in the cameras field of view did not vanish. However,we cannot guarantee that these cases have converged or ifthey would further decay in a larger experimental set-up,which may explain the prediction error. In the case of γ ≈ .
123 we observe slower speeds than expected, whichcan be explained with the fact that the gait has not com-pletely converged to the steady state speed. These longtransient phases were observed in simulations where γ is close but past the end of the bistable region, whichcorresponds well to the position of γ ≈ . ACKNOWLEDGMENTS
We acknowledge support from the Swiss National Sci-ence foundation (F.G.) and by a MacArthur Fellowship(L.M.). [1] S. Grillner and T. M. Jessell, “Measured motion: search-ing for simplicity in spinal locomotor networks,”
Currentopinion in neurobiology , vol. 19, no. 6, pp. 572–586, 2009.[2] S. Chevallier, A. J. Ijspeert, D. Ryczko, F. Nagy, and J.-M. Cabelguen, “Organisation of the spinal central pat-tern generators for locomotion in the salamander: biol-ogy and modelling,”
Brain research reviews , vol. 57, no. 1,pp. 147–161, 2008.[3] H. Jung, M. Baek, K. P. DElia, C. Boisvert, P. D. Cur-rie, B.-H. Tay, B. Venkatesh, S. M. Brown, A. Heguy,D. Schoppik, et al. , “The ancient origins of neural sub-strates for land walking,”
Cell , vol. 172, no. 4, pp. 667–682, 2018.[4] H. M. King, N. H. Shubin, M. I. Coates, and M. E.Hale, “Behavioral evidence for the evolution of walk-ing and bounding before terrestriality in sarcopterygianfishes,”
Proceedings of the National Academy of Sciences ,vol. 108, no. 52, pp. 21146–21151, 2011.[5] B. E. Flammang, A. Suvarnaraksha, J. Markiewicz, andD. Soares, “Tetrapod-like pelvic girdle in a walking cave-fish,”
Scientific reports , vol. 6, p. 23711, 2016.[6] L. J. Macesic and S. M. Kajiura, “Comparative punt-ing kinematics and pelvic fin musculature of benthic ba-toids,”
Journal of morphology , vol. 271, no. 10, pp. 1219–1228, 2010.[7] L. O. Lucifora and A. I. Vassallo, “Walking in skates(chondrichthyes, rajidae): anatomy, behaviour andanalogies to tetrapod locomotion,”
Biological Journal ofthe Linnean Society , vol. 77, no. 1, pp. 35–41, 2002.[8] D. M. Koester and C. P. Spirito, “Punting: an unusualmode of locomotion in the little skate, leucoraja erinacea(chondrichthyes: Rajidae),”
Copeia , vol. 2003, no. 3,pp. 553–561, 2003.[9] R. Pfeifer, M. Lungarella, and F. Iida, “Self-organization,embodiment, and biologically inspired robotics,” science ,vol. 318, no. 5853, pp. 1088–1093, 2007.[10] M. Garcia, A. Chatterjee, A. Ruina, and M. Coleman,“The simplest walking model: stability, complexity, andscaling,”
Journal of biomechanical engineering , vol. 120,no. 2, pp. 281–288, 1998.[11] A. Goswami, B. Thuilot, and B. Espiau, “A study of thepassive gait of a compass-like biped robot: Symmetryand chaos,”
The International Journal of Robotics Re-search , vol. 17, no. 12, pp. 1282–1301, 1998.[12] J. R. Usherwood and J. E. Bertram, “Understandingbrachiation: insight from a collisional perspective,”
Jour-nal of Experimental Biology , vol. 206, no. 10, pp. 1631–1642, 2003.[13] B. Brogliato,
Nonsmooth mechanics . Springer, 1999.[14] S. H. Strogatz,
Nonlinear dynamics and chaos: with ap-plications to physics, biology, chemistry, and engineering .CRC Press, 2018.[15] P. W. Webb, “Station-holding by three species of benthicfishes,”
Journal of Experimental Biology , vol. 145, no. 1,pp. 303–320, 1989.[16] D. F. Hoyt and C. R. Taylor, “Gait and the energetics oflocomotion in horses,”
Nature , vol. 292, no. 5820, p. 239,1981.[17] A. Ruina, J. E. Bertram, and M. Srinivasan, “A colli-sional model of the energetic cost of support work quali-tatively explains leg sequencing in walking and galloping, pseudo-elastic leg behavior in running and the walk-to-run transition,”
Journal of theoretical biology , vol. 237,no. 2, pp. 170–192, 2005.[18] M. Srinivasan and A. Ruina, “Computer optimization ofa minimal biped model discovers walking and running,”
Nature , vol. 439, no. 7072, p. 72, 2006.[19] R. M. Alexander, “Optimum walking techniques forquadrupeds and bipeds,”
Journal of Zoology , vol. 192,no. 1, pp. 97–117, 1980.[20] P. W. Glimcher, “Understanding dopamine and rein-forcement learning: the dopamine reward prediction er-ror hypothesis,”
Proceedings of the National Academy ofSciences , vol. 108, no. Supplement 3, pp. 15647–15654,2011.[21] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez,Y. Tassa, D. Silver, and D. Wierstra, “Continuous con-trol with deep reinforcement learning,” arXiv preprintarXiv:1509.02971 , 2015.[22] R. S. Sutton and A. G. Barto,
Reinforcement learning:An introduction . MIT press, 2018.[23] B. McInroe, H. C. Astley, C. Gong, S. M. Kawano, P. E.Schiebel, J. M. Rieser, H. Choset, R. W. Blob, and D. I.Goldman, “Tail use improves performance on soft sub-strates in models of early vertebrate land locomotors,”
Science , vol. 353, no. 6295, pp. 154–158, 2016.[24] J. A. Nyakatura, K. Melo, T. Horvat, K. Karakasilio-tis, V. R. Allen, A. Andikfar, E. Andrada, P. Arnold,J. Laustr¨oer, J. R. Hutchinson, et al. , “Reverse-engineering the locomotion of a stem amniote,”
Nature ,vol. 565, no. 7739, p. 351, 2019.[25] J. Zhu, C. White, D. Wainwright, V. Di Santo,G. Lauder, and H. Bart-Smith, “Tuna robotics: Ahigh-frequency experimental platform exploring the per-formance space of swimming fishes,”
Science Robotics ,vol. 4, no. 34, p. eaax4615, 2019.[26] J. Aguilar, T. Zhang, F. Qian, M. Kingsbury, B. McInroe,N. Mazouchova, C. Li, R. Maladen, C. Gong, M. Travers, et al. , “A review on locomotion robophysics: the studyof movement at the intersection of robotics, soft matterand dynamical systems,”
Reports on Progress in Physics ,vol. 79, no. 11, p. 110001, 2016.[27] T. McGeer et al. , “Passive dynamic walking,”
I. J.Robotic Res. , vol. 9, no. 2, pp. 62–82, 1990.[28] D. Beal, F. Hover, M. Triantafyllou, J. Liao, and G. V.Lauder, “Passive propulsion in vortex wakes,”
Journal ofFluid Mechanics , vol. 549, p. 385, 2006.[29] V. Di Santo and C. P. Kenaley, “Skating by: low en-ergetic costs of swimming in a batoid fish,”
Journal ofExperimental Biology , vol. 219, no. 12, pp. 1804–1807,2016.[30] M. Gomes and A. Ruina, “Walking model with no energycost,”
Physical Review E , vol. 83, no. 3, p. 032901, 2011.[31] M. Buhler, D. E. Koditschek, and P. J. Kindlmann, “Afamily of robot control strategies for intermittent dy-namical environments,”
IEEE Control Systems Maga-zine , vol. 10, no. 2, pp. 16–22, 1990.[32] T. M. Griffin and R. Kram, “Biomechanics: penguinwaddling is not wasteful,”
Nature , vol. 408, no. 6815,p. 929, 2000.[33] E. M. Standen, T. Y. Du, and H. C. Larsson, “Devel-opmental plasticity and the origin of tetrapods,”
Nature , vol. 513, no. 7516, p. 54, 2014. upplementary Figures for“Models of benthic bipedalism” Fabio Giardina , and L. Mahadevan , School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, USA Department of Physics, Department of Organismic and EvolutionaryBiology, Harvard University, Cambridge, Massachusetts 02138, USA
RightLeft
RightLeft BA 𝑣 ∗ 𝑣 ∗ FIG. S1:
Locomotion kinematics of little skate A and B. . These data are different specimen and runs fromFig. 1D. Left and right leg angles α as a function of dimensionless time and mean foot placement angle α . Theinset shows the dimensionless speed of the pelvic girdle as a function of dimensionless time with v ∗ (dashed line) theapproximate lower speed bound during steady state locomotion. Note that individuals A and B are different from theone corresponding to data in 1D. c m CameraLED source Glass plateRobot Steel bar
Ball bearing
Rigid aluminum connection
Motion c m c m A B FIG. S2:
Sketch of experiment set-up (not to scale) and robot . A . Robot is walking on a glass plate andfilmed with a camera from below. A rigid aluminum beam supports the robot against gravity. The beam is connectedvia a ball bearing to a steel rod along which it can slide freely. B Robot walking on glass plate and its components. a r X i v : . [ q - b i o . Q M ] S e p C Episode Number -20020 E p i s od e R e w a r d Episode RewardAverage Reward -1 0 101234567 -1 0 1012345678910 -1 0 102468101214 -1 0 102468101214 -1 0 1 2 3012345 -1 0 1024681012
Episode4 Episode220 Episode622 Episode1636 Episode1921 Episode2472
FIG. S3:
Sample learning progress . Trainingprogress of DDPG agent for the little skate model withinset showing center of mass trajectories and footfalls atdifferent episodes during learning. step -1-0.500.511.522.53
T1/ leg FIG. S4: