Microswimmers learning chemotaxis with genetic algorithms
Benedikt Hartl, Maximilian Hübl, Gerhard Kahl, Andreas Zöttl
Institute for Theoretical Physics, TU Wien, Wiedner Hauptstr. 8-10, 1040 Wien, Austria
(Dated: February 1, 2021)

Various microorganisms and some mammalian cells are able to swim in viscous fluids by performing nonreciprocal body deformations, such as rotating attached flagella or distorting their entire body. In order to perform chemotaxis, i.e. to move towards and to stay at high concentrations of nutrients, they adapt their swimming gaits in a nontrivial manner. We propose a model of how microswimmers are able to autonomously adapt their shape in order to swim towards high field concentrations, using an internal decision-making machinery modeled by an artificial neural network. We present two methods to measure chemical gradients, spatial and temporal sensing, as known for swimming mammalian cells and bacteria, respectively. Using the NEAT genetic algorithm, surprisingly simple neural networks evolve which control the shape deformations of the microswimmer and allow them to navigate in static and complex time-dependent chemical environments. By introducing noisy signal transmission in the neural network, the well-known biased run-and-tumble motion emerges. Our work demonstrates that the evolution of a simple internal decision-making machinery, which we can fully interpret and which is coupled to the environment, allows navigation in diverse chemical landscapes. These findings are of relevance for intracellular biochemical sensing mechanisms of single cells, or for the simple nervous system of small multicellular organisms such as C. elegans.

INTRODUCTION
Microorganisms possess a huge variety of different self-propulsion strategies in order to actively swim through viscous fluids such as water, which is realized by performing periodic nonreciprocal deformations of their body shape [1–4]. In order to search for nutrients, oxygen, or light, they have developed mechanisms to change their shape and hence their swimming direction abruptly. An important example is the run-and-tumble motion of various bacteria such as
Escherichia coli [5, 6] or of the alga
Chlamydomonas [7]. In order to perform chemotaxis, bacteria use temporal information of chemical field concentrations, mediated by a time-dependent response function which suppresses tumbling when swimming up chemical gradients [5, 8–10]. Some bacteria follow more diverse chemotactic strategies which can be related to their specific propulsion mechanisms [11]. In contrast to bacteria, many eukaryotic cells such as
Dictyostelium [12, 13], leukocytes [14] or cancer cells [15] are able to perform chemotaxis by adapting their migration direction in accordance with the chemical gradient by spatial sensing with membrane receptors. From an evolutionary point of view, it remains elusive how motility and chemotactic patterns evolved together, bearing in mind that both different prokaryotic and eukaryotic cells with diverse self-propulsion mechanisms developed surprisingly similar chemotactic machinery [14, 16, 17].

In our work we use machine learning (ML) techniques in order to investigate how chemotaxis-based decision making can be learned and performed in a viscous environment. During past years various ML approaches have become increasingly appealing in physics, for example in materials science, soft matter and fluid mechanics [18–20]. Unsupervised reinforcement learning (RL) has been used in various biologically motivated active matter systems [21] to investigate optimum strategies employed by smart, self-propelled agents: examples are to navigate in fluid flow [22–25] and airflow [26], in complex environments, external fields [27] and potentials [28]. So far, two contributions have taken the viscous environment into account, namely one applying Q-learning to a three-bead swimmer [29], and one using deep learning to find energetically efficient collective swimming of fish [30]. Experimental realizations of ML applied to self-propelled objects are the navigation of microswimmers on a grid [31] or macroscopic gliders learning to soar in the atmosphere [32].

Here we address the problem of how a microswimmer is able to make decisions by adapting its shape in order to perform chemotaxis. To employ adaptive swimming behavior, microswimmers need to be – to a certain extent – aware of both their environment and their internal, physiological state.

∗ [email protected]
Substituting the complex biochemical sensing machinery of unicellular organisms, or real sensory and motor neurons of small multicellular organisms such as C. elegans, we therefore employ the evolution of a simple artificial neural network (ANN) which is able to sense the environment and proposes actions to deform the body shape accordingly. We introduce both spatial and temporal chemical gradient sensing, leading to different decision-making strategies and dynamics in chemical environments.

RESULTS

Microswimmer model
As a simple model we use the so-called three-bead swimmer introduced originally by Najafi and Golestanian [33]. It swims in a viscous fluid of viscosity η via periodic, nonreciprocal deformations of two arms connecting three aligned beads of radius R, located at positions x_i, i = 1, 2, 3. The two arms of lengths L_1 and L_2 are extended and stretched by time-dependent forces F_i(t) acting on the hydrodynamically interacting beads, which determine the bead velocities v_i(t) [34] (see SI Appendix). In this manner a force-free microswimmer (i.e., Σ_i F_i = 0) is able to perform locomotion via nonreciprocal motions of the beads, resulting in a directed displacement of the center of mass (COM) position x_c = (x_1 + x_2 + x_3)/3. We measure lengths, viscosities and forces in units of the bead radius R, the viscosity η and the maximum force on a bead F_0, such that |F_i| < F_0. Hence the unit of time is T = ηR²/F_0. In previous studies of this model either the forces or the linearly connected bead velocities v_i(t) have been prescribed via a periodic, nonreciprocal motion pattern [33–35]. Alternatively, a Q-learning procedure [29] has been applied (see also Discussion section).

Phase one: Learning unidirectional locomotion
We start by demonstrating that a microswimmer is able to learn swimming in the absence of a chemical field with the help of a simple genetic algorithm. This is achieved by applying RL [36] using a reward scheme which optimizes a microswimmer's strategy of locomotion along a prescribed direction within a viscous fluid environment.

RL algorithms are designed to optimize the policy of a so-called agent during training: In general, the policy is a highly complex and task-specific quantity that maps the state of an environment, i.e. everything the agent can perceive (input), onto actions which the agent can actively propose (output) in order to maximize an objective (or reward) function (see Fig. 1). Such rewards might be related to maximizing the score of a computer game [37], to minimizing the (free) energy when folding proteins [38], or – as in our case – to maximizing the distance that a microswimmer actively moves along a certain direction.

In our approach the agent represents the internal decision-making machinery responsible for the deformations of the microswimmer. The agent takes as input (i.e. as information it needs to decide about future actions) the state of the environment, given by the instantaneous arm lengths L_1(t) and L_2(t), and the arm velocities V_i(t) = dL_i(t)/dt, i = 1,
2. In addition we allow the total length L_T(t) = L_1(t) + L_2(t) and the velocity V_T(t) = V_1(t) + V_2(t) as input. The arm lengths are normalized by the default length L_0 = 10R and subjected to restoring forces, acting when L_1 or L_2 exceed 1.3 L_0 or fall below 0.7 L_0, in order to limit the extent of L_1 and L_2 (see SI Appendix). With this information the agent proposes actions, which in our case are the forces F_1(t) and F_2(t) that determine the dynamics of the swimmer. The full hydrodynamic system, including the three-bead model of the microswimmer, represents the (interactive) environment, whose state is updated after the agent has actively proposed its actions (see left part of Fig. 1). In an effort to train unidirectional motion we choose the COM position x_c of a microswimmer to be maximized after a fixed integration time T_I; x_c thus represents the cumulative reward of this training process. In this manner we achieve positive reinforcement when the swimmer moves to the right (positive x direction) and negative reinforcement when it swims to the left (negative x direction).

Figure 1. Schematic representation of the RL cycle for a three-bead swimmer moving in a viscous environment (top left) controlled by an ANN-based agent (bottom left). Reward is maximized during training and is granted either for unidirectional locomotion – phase one – or for chemotaxis (top right) – phase two. Bottom right: typical NEAT training curves showing the maximum (blue), the mean (black), and the standard deviation (gray) of the fitness (i.e. of the cumulative reward) of successive NEAT generations, each covering 480 neural networks, when learning unidirectional locomotion.

In an effort to approximate the analytically unknown optimum policy of the microswimmer we use ANNs (see bottom left panel of Fig. 1 and Methods), where the output is represented by output neurons nonlinearly connected to the input neurons using hyperbolic tangent functions whose arguments depend on the weights of the connections.
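To make this mapping concrete, here is a minimal Python sketch (not the authors' implementation; the weight values, the sign convention for distributing arm forces onto beads, and the use of a far-field Oseen approximation for the bead interactions are illustrative assumptions) of a tanh output layer driving one step of force-free three-bead dynamics:

```python
import numpy as np

R, ETA, F0 = 1.0, 1.0, 1.0   # bead radius, viscosity, maximum force (model units)

def policy_forces(state, W, b):
    """tanh output layer: map the state (L1, L2, V1, V2, LT, VT)
    nonlinearly onto the two arm forces (F1, F2), bounded by F0."""
    return F0 * np.tanh(W @ state + b)

def bead_forces(F1, F2):
    """Distribute the arm forces onto the three beads; the components sum
    to zero, so the swimmer is force-free by construction (assumed signs)."""
    return np.array([-F1, F1 - F2, F2])

def oseen_step(x, f, dt):
    """One explicit-Euler step: bead velocities from the Stokes self-mobility
    plus far-field Oseen coupling between distinct beads."""
    v = np.zeros(3)
    for i in range(3):
        for j in range(3):
            if i == j:
                v[i] += f[j] / (6.0 * np.pi * ETA * R)
            else:
                v[i] += f[j] / (4.0 * np.pi * ETA * abs(x[i] - x[j]))
    return x + dt * v

# usage: random (untrained) weights, one step of the dynamics
rng = np.random.default_rng(0)
W, b = rng.normal(size=(2, 6)), rng.normal(size=2)
state = np.array([10.0, 10.0, 0.0, 0.0, 20.0, 0.0])   # L1, L2, V1, V2, LT, VT
F1, F2 = policy_forces(state, W, b)
x = oseen_step(np.array([0.0, 10.0, 20.0]), bead_forces(F1, F2), dt=0.01)
```

With random weights such a controller does not yet swim; the point of the training described next is precisely to select weights (and a topology) that produce a nonreciprocal, propulsive stroke.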
In our case the internal structure of the ANN (weights and topology) is successively optimized using the NEAT genetic algorithm to maximize the reward (for details see Methods and SI Appendix).

Figure 2. Trajectories of the three-bead swimmer after training. (A) left: Time evolution of the center of mass x_c for the optimum (O-SAL) (gray) and the minimal complexity (MC-SAL) (blue) ANN solution. Insets show the topologies of the O-SAL and MC-SAL ANN solutions. Time t is shown in units of the MC-SAL stroke period T_S. center: Corresponding arm length solutions, L_1(t) and L_2(t), and arm forces, F_1(t) and F_2(t), in the absence of a chemical field, shown for MC-SAL. right: Phase space curves (L_1, L_2) and (F_1, F_2) for O-SAL (gray) and MC-SAL (blue). (B) Similar as in (A), but for a MC-SAL swimmer in a linear chemical field (see left-most panel), c(x) = max(0, a − k|x − x_0|), for an amplitude a = 100 c_0, slope k = c_0/R and peak position x_0 = 2.… R, with temporal (red and blue trajectories of x_c) and spatial (black dashed trajectory of x_c) chemical gradient sensing (see Fig. 3 for the ANN solutions). Temporal sensing trajectories and phase-space plots are color coded by the currently estimated gradient direction (blue: rightwards, red: leftwards). The lengths L_i(t) and forces F_i(t) are shown for the time domain highlighted by a gray area in the left trajectory plot. Blue and red background colors correspond to the gradient direction estimation (rightwards and leftwards, respectively). Arrows in phase space indicate a locomotive strategy change (gait adaptation) due to gradient estimation (change from rightwards to leftwards locomotion: blue to red, and vice versa). (C) Same as in (B), but for a swimmer in a Gaussian chemical field (see left-most panel), c(x) = a exp(−(x − x_0)²/σ²), for a = 10 c_0, σ = 3R and x_0 = 2.… R.

The training of the swimmer agent is performed over multiple RL steps which correspond to successive NEAT generations. At each step an ensemble of N = 480 ANNs (representing one generation) controls the swimming gaits of an ensemble of N independent microswimmers. The cumulative reward x_c(T_I) is evaluated separately for each microswimmer trajectory, defining the fitness v̄ = x_c(T_I)/T_I of the related ANN-based agent, which is simply the mean swimming velocity (i.e. the reward per unit time). We initialize an ensemble of ANNs where input neurons are only sparsely connected to output neurons by using random weights.
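The evaluate-select-reproduce loop described here can be written down compactly. The following toy version is a stand-in sketch, not the NEAT setup of the paper: it evolves a fixed-length weight vector with Gaussian mutations and elitism, and replaces the swimmer rollout by a simple quadratic surrogate fitness with a hypothetical target genome:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(genome):
    """Stand-in for a swimmer rollout: in the paper the fitness is the mean
    swimming velocity; here it simply peaks at a hypothetical target genome."""
    return -float(np.sum((genome - np.array([1.0, -2.0, 3.0])) ** 2))

def evolve(pop_size=60, n_genes=3, generations=80, elite_frac=0.25, sigma=0.3):
    """Minimal generational loop: evaluate all genomes, keep the fittest
    fraction, reproduce them with Gaussian mutations, retain the champion."""
    pop = rng.normal(0.0, 1.0, size=(pop_size, n_genes))
    for _ in range(generations):
        scores = np.array([fitness(g) for g in pop])
        order = np.argsort(scores)[::-1]                  # best first
        elites = pop[order[: int(elite_frac * pop_size)]]
        parents = elites[rng.integers(0, len(elites), size=pop_size)]
        pop = parents + rng.normal(0.0, sigma, size=pop.shape)
        pop[0] = elites[0]                                # elitism: keep champion
    scores = np.array([fitness(g) for g in pop])
    return pop[int(np.argmax(scores))]

best = evolve()
```

NEAT additionally mutates the network topology itself (see Methods), but the selection pressure acting on an ensemble of genomes is the same mechanism as in this sketch.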
The NEAT algorithm dynamically produces ANN solutions which differ in the number of connections and the values of the weights, and may contain hidden neurons (see bottom left of Fig. 1). ANN solutions with large fitness values are retained and are preferentially selected for reproduction to form the next generation of ANNs. Thus, good traits of the controlling networks will prevail over time, thereby directing the entire ensemble of ANNs to the desired solution. A typical evolution of the ensemble of ANN fitness values, containing the maximum fitness (blue), is shown in the bottom right panel of Fig. 1 (see also the Supplementary Movie showing the time evolution of the corresponding ANNs). This training process converges after ∼
200 NEAT generations, where an optimum policy with fitness v̄_O = 1.…·10^−… R/T has been identified. Interestingly, the optimum solution, which we refer to as the optimal swimmer action layer (O-SAL), does not use any hidden neurons and consists of a sparse architecture containing only five connections (see top inset of Fig. 2A). The optimum solution traces a square-like shape in (F_1, F_2) action space, while the shape of the (L_1, L_2) curve is smoother (Fig. 2A).

Strikingly, the algorithm identified intermediate, non-optimum but extremely simple solutions which can be easily interpreted and consist of as few as two connections (L_2 → F_1, L_1 → F_2, see bottom inset of Fig. 2A), F_1 = F_0 tanh(w_1 L_2 + b_1) and F_2 = F_0 tanh(w_2 L_1 + b_2). The best of these very simple solutions identified during the NEAT training has good fitness, v̄_MC = 0.…·10^−… R/T, and we refer to this solution as the minimal complexity swimmer action layer (MC-SAL), with weights w_1 = 20.…/L_0, w_2 = 5.…/L_0 and biases b_1 = −…, b_2 = −…
For lengths L_{1,2} above the thresholds L*_{1,2} = −b_{1,2}/w_{1,2} the respective forces F_{1,2} are positive, and otherwise negative, leading to a simple phase-shifted periodic output (Fig. 2A) with period T_S ≈ …T controlling the arm motion. The magnitudes of w_1 and w_2 determine how quickly the forces approach their maximum values when crossing L*_{1,2} (see also Movie S1 and the discussion in the Supplementary Figures).

We also analyzed how the network complexity (number of connections and hidden neurons) is related to the maximally possible fitness v̄ of the swimmer: Starting with two connections, v̄ increases when gradually adding connections. However, using more than five connections (as we have for O-SAL), or including hidden neurons, does not improve the fitness of the swimmer any more (see Fig. S9).

Phase two: Learning chemotaxis in a constant gradient – spatial vs. temporal gradient detection
Now we proceed to the challenging problem of finding a policy which allows the microswimmer to navigate on its own within a complex environment, such as a chemical field c(x) (cf. upper right panel in Fig. 1), and to perform positive chemotaxis (motion towards local maxima of c(x)).

We first extend the agent's perception of the environment such that it is able to sense the field c(x) (which we normalize by an arbitrary concentration strength c_0) and which we use as an additional input for a more advanced chemotaxis agent. We expect that such an agent is able to evaluate the chemical gradient ∇c(x) in order to conditionally control the lengths of its arms in a way to steer its motion towards maxima of c(x). Compared to phase one we propose a slightly more complex cumulative reward scheme for the training phase: we use r_c = Σ_{t_i=1}^{T_I} [x_c(t_i) − x_c(t_{i−1})] D(t_i), where D(t_i) = sign[∇c(x_c(t_i))] = ±1 is the sign of the gradient at time t_i; thus, r_c measures the total distance that the swimmer moves along an ascending gradient during the total integration time T_I.

Prior to applying any RL scheme we decompose the problem of chemotaxis into two tasks: first, we require a mechanism which allows the agent to discern the direction D of the gradient (i.e. D = 1 for an ascending or D = −1 for a descending gradient); this is the task of the chemical gradient (CG) block in the ANN of the chemotaxis agent (see Fig. 3A), as described below. Second, we identify a pure locomotive part of the agent which can be rooted on already acquired skills – i.e. the unidirectional motion learned in phase one (and covered by the above mentioned SAL solutions) – and on the inherent symmetries of the swimmer model: swimming to the left and swimming to the right are symmetric operations. Based on the actual value of D, conditional directional motion (i.e., either to the left or to the right) can be induced by introducing two permutation control layers (PCLs) to the ANN (see Fig. 3A, and SI Appendix and Fig.
S1 for details).

In order to obtain chemotaxis strategies using NEAT, the remaining task is to identify a (potentially recurrent) ANN structure for the chemical gradient block (Fig. 3A), i.e. an ANN which is able to predict the sign D of the chemical gradient. For this purpose we have considered three different methods which allow the microswimmer to sense ∇c(x): first, we assume that the chemotaxis agent can directly measure the sign of the gradient at its COM position x_c(t); here D is automatically known. Second, for spatial sensing we allow the swimmer to simultaneously evaluate the chemical fields c_i(t) at the bead positions x_i(t) to predict the sign of the gradient, D = Θ(G) (with Θ(·) the Heaviside function), from the output G of an ANN (Fig. 3B) to be determined by NEAT during training (see below). Third, in an effort to model temporal sensing of chemical gradients, which is relevant for bacterial chemotaxis, we consider recurrent ANNs (Fig. 3D). In this case, we explicitly provide the CG agent with inputs that describe its internal, physiological state (the total arm length and velocity, L_T and V_T), as well as with the chemical field at the COM position, c_c, at each instance of time t_i. To train the CG agent we subdivide its architecture into a block which estimates the gradient, and into another block that controls an internal memory M(t_i) of the chemical field (i.e., the chemical memory control cell (CMC)).
The latter is inspired by the well-known long short-term memory (LSTM) cell [39, 40]. The first block is trained using the NEAT algorithm: it takes as input L_T(t_i) and V_T(t_i), as well as two recurrent variables C_x(t_i) and G_x(t_i), and maps this information onto a control output C_y(t_i) and an estimated value of the instantaneous chemical gradient, G_y(t_i) ∈ [−1, 1]. Via C_y (with a temporal feedback connection C_x(t_{i+1}) = C_y(t_i)) the CMC cell controls, via the binary variable α = 1 − Θ(C_y(t_i)) ∈ {0, 1}, the recurrent gradient value G_x(t_{i+1}) = α G_y(t_i) + (1 − α)·(c_c(t_i) − M(t_i)) and the state of the internal memory M(t_{i+1}) = α M(t_i) + (1 − α) c_c(t_i). In that way the CG agent can actively control the time interval between consecutive measurements: an update of both G_x(t_{i+1}) and M(t_{i+1}) is performed whenever C_y(t_i) >
0; otherwise G_y(t_i) and M(t_i) are maintained. Notably, the chemical field input of the CG agent is directly forwarded to the CMC cell, and the trained NEAT ANN operates on time-delayed gradients G_x rather than directly on the values of the chemical field c_c. Eventually, the output of the temporal CG agent is D = Θ(G_y).

For both temporal and spatial gradient sensing (Fig. 3B,D) training is necessary. For simplicity, we train the swimmer on a piece-wise linear field, c(x) = max(0, a − k|x − x_0|), with amplitude a and slope k, using the MC-SAL solution obtained in phase one (see SI Appendix and Fig. S3 for details).

Both for the spatial and the temporal sensing method the resulting ANNs are strikingly simple and their topology can
be well interpreted: the NEAT ANN solution for spatial sensing, shown in Fig. 3C, only requires a single neuron, which predicts D(t) = Θ(G(t)) ≈ sign[c_3(t) − c_1(t)] from the field values at the outer beads (see SI Appendix for details). During training of the temporal gradient-sensing ANN we determine via the CMC cell the precise way in which one output signal of the ANN is used as recurrent input signal in the next time step, and how this signal controls the way the chemical memory is updated. The solution for temporal sensing is shown in Fig. 3E. The temporal gradient-sensing ANN enables the CG agent to correlate its direction of propagation with the gradient of a chemical field. Details on the interpretation of the ANN solutions are provided in the SI Appendix.

Figure 3. (A) Schematic view of the full ANN-based chemotaxis agent. A chemical field c(x), the swimmer arm lengths (L_1, L_2, L_T = L_1 + L_2) and the respective arm velocities (V_1, V_2, V_T) are used as input. By measuring the chemical gradient through the CG block, the swimmer controls the forces F_1 and F_2 in order to perform directed locomotion towards an ascending gradient of c(x). Directed locomotion is split into two permutation control layers (PCL), which permute input and output of the swimmer action layer (SAL) (see insets of Fig. 2A) according to a predicted sign D of the chemical gradient. The prediction of D by the CG block (cyan) can be performed either by directly measuring D = sign[∇c(x_c)], by spatial resolution of the chemical field (B), or by temporal sensing at the center of mass position x_c (D). The respective solutions for the ANNs (dark gray and gray) found by NEAT are shown in (C) and (E).

In Figs. 2B and C we present typical trajectories after successful training, obtained for chemical fields of piece-wise linear shape and of Gaussian shape, respectively. In both cases the swimmer – controlled by spatial sensing – suddenly stops as soon as its COM position x_c is reasonably close to the maximum x_0 of the chemical field (see also Movie S4). In contrast, the swimmer controlled by temporal sensing performs oscillations around x_0, due to its time-delayed measurements of the chemical field and its internal, recurrent processes (see Fig. 3B and Movies S2 and S3). The widths of the oscillations are governed by the shape of the Gaussian (see also SI Appendix Fig. S6).

We observe that the ANNs of both the spatial and the temporal sensing method are able to generalize their capability to predict the chemical gradient over a much wider range of parameters (i.e., amplitude a and slope k of a chemical field) than they were originally trained on (see SI Appendix and Figs. S4 and S5).

Figure 4. Stochastic microswimmer dynamics from noisy memory readings for noise level ξ = 0.… c_0. (A) Sample trajectories in the absence (gray) and in the presence (green) of a linear chemical field. (B,C) Run time distributions for moving the field upwards (Δt_R) and downwards (Δt_L) in the absence (B) and in the presence (C) of a field. (D) Sample trajectories in time-dependent Gaussian profiles c(x, t) (see color bar), centered at x_± = ±…R, of width σ = 10√π R and height a = 4 c_0, modulated with period T = 708 T_S.

Emergent run-reverse motion from noisy memory readings
Realistic chemotactic pathways are always influenced by thermal noise. In our implementation we apply stochastic memory readings of the CMC cell for the temporally sensing swimmer, mimicking the fact that the chemotactic signal cannot be detected perfectly. In this spirit the swimmer measures a field c(x_c(t)) + δc, with δc being a normally distributed random number with zero mean and standard deviation ξ, which sets the strength of the noise. We apply this feature to an ensemble of 100 non-interacting microswimmers moving in a constant chemical gradient, c(x) = kx. Strikingly, a 1D run-and-tumble (run-reverse) motion emerges naturally, even in the absence of a chemical field (k = 0). In Fig. 4A we present typical trajectories both in the absence and in the presence (k = 0.… c_0/R) of a chemical field (see also Movie S5). These trajectories consist of segments of rightward motion (over run times Δt_R), alternating with segments of leftward motion (Δt_L). The stochastic nature of the underlying process leads to approximately exponentially distributed run times, ∼ exp(−Δt_R/τ_R) and ∼ exp(−Δt_L/τ_L), thus following a similar behaviour as the one measured for microorganisms [5, 7, 41]. As expected, in the absence of a field τ_R ≈ τ_L (Fig. 4B). In the presence of a field the swimmer exhibits a tendency towards longer run times when moving up the gradient (τ_R > τ_L) (Fig. 4C), resulting in a mean net chemotactic drift velocity v_c > 0, even for a much weaker gradient (k = 0.… c_0/R in Fig. 4) than it experienced during training (in our case k = 1 c_0/R). In general, the chemotactic performance depends on k and is strongly influenced by the noise level ξ (see SI Appendix).

Chemotaxis in time-dependent chemical fields
Eventually we study the dynamics of temporal gradient-sensing microswimmers which perform noisy memory readings in a more complicated, time-dependent chemical environment. Notably, the microswimmers have solely been trained in a constant chemical gradient, as described in phase two. We now use time-dependent chemical fields of the form c(x, t) = h_+(t) c_+(x) + h_−(t) c_−(x), where c_±(x) are of Gaussian shape with maximum height a and centered at x_±. The peak amplitudes h_±(t) are modulated periodically in time with period T, such that the two peaks alternately grow and decay (see the contour plot in Fig. 4D, where we also show typical microswimmer trajectories). Swimmers may explore consecutive peaks by hopping between the chemical sources of c_+ and c_−, or may miss peaks by residing in the vicinity of the previously visited chemical source. Thus, the actual swimming paths strongly depend on prior decisions of the chemotaxis agent. In field-free regions microswimmers perform unbiased run-and-reverse strategies, and they employ positive chemotaxis in regions featuring chemical gradients. Hence the combination of chemotactic response and noise enables useful foraging strategies in time-dependent fields.

DISCUSSION
We modeled the response of a simple microswimmer to a viscous and chemical environment using the NEAT genetic algorithm to construct ANNs which describe the internal decision-making machinery coupled to the motion of two arms. First, our model microswimmer learned to swim in the absence of a chemical field, in a periodic motion as it appears, for example, in the swimming pattern of the alga
Chlamydomonas.

In contrast to a recently used Q-learning approach which uses a very limited action space [29], we allow continuous changes of the microswimmer's shape and thus permit high flexibility in exploring many different swimming gaits during training. Furthermore, the NEAT algorithm has created surprisingly simple ANNs which we were able to fully understand and interpret, in contrast to the often used complex deep neural networks [42–45] or the lookup-table-like Q-learning algorithm [45].

We used biologically relevant chemotactic sensing strategies, namely spatial gradient sensing, usually performed by slow-moving eukaryotic cells, and temporal gradient sensing, performed by fast swimming bacteria. We used the latter to explore the influence of a single noisy channel, namely for the reading of the value of the chemical concentration, on the chemotactic response. Interestingly, we identified a noise level for a run-and-tumble type of dynamics with exponentially distributed run times. Notably, the precise values of the noise have dramatic effects on the run-and-tumble behavior. In fact, there appears to be a rather narrow window of noise levels which enables the swimmer to efficiently perform run-and-tumble motion (see SI Appendix). Indeed, for real existing signal sensing mechanisms in microorganisms the role of the noise and the precision of signal detection is an active field of research, see e.g. [46].

The run-and-tumble behavior in our system is an emergent behavior which persists in the absence of a chemical field (as observed, for example, for swimming bacteria), without explicitly challenging the microswimmer to exploit search strategies in the absence of a field during training. From an evolutionary point of view it makes sense that bacteria have learned this behavior in complex chemical environments.
We also find that individual microswimmers performing run-and-tumble motion may show a small bias to the left or to the right, even in the absence of a field, due to the stochastic nature of the genetic optimization.

The question of how single cells make decisions which affect their motion in their environment is an active field of research [47–50]. For example, bacteria, protists, plants and fungi make decisions without using neurons but rather employ a complex chemotactic signaling network [51]. On the other hand, small multicellular organisms such as the worm
C. elegans use only a small number of neurons in order to move and perform chemotaxis [52, 53]. Our approach therefore offers new tools to investigate possible architectures, functionalities and the necessary level of complexity of sensing and motor neurons coupled to muscle movement in silico by evolutionarily developed ANNs. In the future our work can be extended to more specific microswimmers moving in two or three dimensions, in order to extract the necessary complexity of the decision-making machinery used for chemotaxis, mechanosensing, or even more complex behavioral responses such as reproduction.
METHODS

Artificial Neural Networks (ANNs)
An ANN is a set of interconnected artificial neurons which collect weighted signals (either from external sources or from other neurons) and create and redistribute output signals generated by a nonlinear activation function [54] (see SI Appendix for details). In that way an ANN can process information in an efficient and flexible way: by adjusting the weights and biases of connections between different neurons, or by adjusting the network topology, ANNs can be trained to map network input to output signals, thereby realizing task-specific operations which are often too complicated to be implemented manually [55].
NEAT algorithm
NeuroEvolution of Augmenting Topologies (NEAT) [56] is a genetic algorithm designed for constructing neural networks. In contrast to most learning algorithms it does not only optimize the weights of an ANN (in an effort to optimize a so-called target function) but, moreover, generates the weights and the topology of the ANN simultaneously (see SI Appendix for details). This process is guided by the principle of complexification [56]: starting from a minimal design of the ANN, the algorithm gradually adds or removes nodes and connections between neurons with certain probabilities, according to the evolutionary process (schematically depicted by the green dashed lines in the bottom left panel of Fig. 1), in order to keep the resulting network as simple and sparse as possible. The resulting ANNs of minimal complexity can then be used to perform the target task, even in situations that the ANNs have never explicitly experienced during training.
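The two structural mutations behind this complexification can be sketched on a toy genome encoding. This is a simplified illustration, not the NEAT reference implementation: speciation, innovation numbers, weight mutation and feed-forward checks are all omitted, and the genome layout is an assumption:

```python
import random

def make_genome(n_in, n_out):
    """Minimal genome: a list of node ids plus connection genes keyed by (src, dst)."""
    return {"n_next": n_in + n_out,
            "nodes": list(range(n_in + n_out)),
            "conns": {}}

def mutate_add_connection(g, rng):
    """Structural mutation 1: wire up two previously unconnected nodes."""
    src, dst = rng.sample(g["nodes"], 2)
    if (src, dst) not in g["conns"]:
        g["conns"][(src, dst)] = {"weight": rng.uniform(-1.0, 1.0), "enabled": True}

def mutate_add_node(g, rng):
    """Structural mutation 2: split an enabled connection src->dst into
    src->new->dst. The old gene is disabled; the incoming weight is 1 and the
    outgoing weight inherits the old value, so the mapping is initially preserved."""
    enabled = [k for k, v in g["conns"].items() if v["enabled"]]
    if not enabled:
        return
    src, dst = rng.choice(enabled)
    old = g["conns"][(src, dst)]
    old["enabled"] = False
    new = g["n_next"]
    g["n_next"] += 1
    g["nodes"].append(new)
    g["conns"][(src, new)] = {"weight": 1.0, "enabled": True}
    g["conns"][(new, dst)] = {"weight": old["weight"], "enabled": True}

# usage: grow a 2-input / 1-output genome by one connection and one hidden node
g = make_genome(n_in=2, n_out=1)
rng = random.Random(1)
mutate_add_connection(g, rng)
mutate_add_node(g, rng)
```

Because the split connection initially reproduces the old input-output mapping, complexification can add structure without destroying behavior that selection has already rewarded, which is why minimal, interpretable networks like the SAL solutions can emerge.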
ACKNOWLEDGEMENTS
B.H. acknowledges a DOC Fellowship of the Austrian Academy of Sciences. B.H. and G.K. acknowledge financial support by E-CAM, an e-infrastructure center of excellence for software, training, and consultancy in simulation and modeling funded by the EU (Project no. 676531). A.Z. acknowledges funding from the Austrian Science Fund (FWF) through a Lise-Meitner Fellowship (Grant No. M 2458-N36). The computational results presented have been achieved using the Vienna Scientific Cluster.

[1] E. M. Purcell, Am. J. Phys., 3 (1977).
[2] E. Lauga and T. R. Powers, Rep. Prog. Phys., 096601 (2009), arXiv:0812.2887.
[3] J. Elgeti, R. G. Winkler, and G. Gompper, Rep. Prog. Phys., 056601 (2015), arXiv:1412.2692.
[4] A. Zöttl and H. Stark, J. Phys.: Condens. Matter, 253001 (2016), arXiv:1601.06643.
[5] H. C. Berg and D. A. Brown, Nature, 500 (1972).
[6] E. Lauga, Annu. Rev. Fluid Mech., 105 (2016), arXiv:1509.02184.
[7] M. Polin, I. Tuval, K. Drescher, J. P. Gollub, and R. E. Goldstein, Science, 487 (2009).
[8] D. A. Clark and L. C. Grant, Proc. Natl. Acad. Sci. U.S.A., 9150 (2005).
[9] A. Celani and M. Vergassola, Proc. Natl. Acad. Sci. U.S.A., 1391 (2010).
[10] J. Taktikos, H. Stark, and V. Zaburdaev, PLoS ONE (2013), 10.1371/journal.pone.0081936.
[11] Z. Alirezaeizanjani, R. Großmann, V. Pfeifer, M. Hintsche, and C. Beta, Science Advances (2020), 10.1126/sciadv.aaz6153.
[12] K. F. Swaney, C.-H. Huang, and P. N. Devreotes, Annu. Rev. Biophys., 265 (2010).
[13] H. Levine and W.-J. Rappel, Physics Today, 24 (2013).
[14] Y. Artemenko, T. J. Lampert, and P. N. Devreotes, Cell. Mol. Life Sci., 3711 (2014).
[15] E. T. Roussos, J. S. Condeelis, and A. Patsialou, Nat. Rev. Cancer, 573 (2011).
[16] K. F. Jarrell and M. J. McBride, Nat. Rev. Microbiol., 466 (2008).
[17] K. Y. Wan and G.
J´ekely, Philosophi-cal Transactions of the Royal Society B:Biological Sciences , 20190758 (2021),https://royalsocietypublishing.org/doi/pdf/10.1098/rstb.2019.0758.[18] K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev,and A. Walsh, Nature , 547 (2018).[19] P. Mehta, M. Bukov, C. H. Wang, A. G. Day, C. Richard-son, C. K. Fisher, and D. J. Schwab, Physics Reports , 1 (2019), arXiv:1803.08823.[20] S. L. Brunton, B. R. Noack, and P. Koumoutsakos,Annual Review of Fluid Mechanics , 477 (2020),arXiv:1905.11075.[21] F. Cichos, K. Gustavsson, B. Mehlig, and G. Volpe,Nature Machine Intelligence , 94 (2020).[22] S. Colabrese, K. Gustavsson, A. Celani, and L. Biferale,Physical Review Letters , 1 (2017), arXiv:1701.08848.[23] K. Gustavsson, L. Biferale, A. Celani, and S. Co-labrese, European Physical Journal E (2017),10.1140/epje/i2017-11602-9, arXiv:1711.05826.[24] J. K. Alageshan, A. K. Verma, J. Bec, and R. Pandit,Physical Review E , 43110 (2020).[25] J. R. Qiu, W. X. Huang, C. X. Xu, and L. H. Zhao,Science China: Physics, Mechanics and Astronomy (2020), 10.1007/s11433-019-1502-2, arXiv:1811.10880.[26] G. Reddy, A. Celani, T. J. Sejnowski, and M. Vergassola,Proceedings of the National Academy of Sciences of theUnited States of America , E4877 (2016). [27] G. Palmer and S. Yaida, Arxiv preprint ,arXiv:1709.02379 (2017), arXiv:1709.02379.[28] E. Schneider and H. Stark, Epl (2019),10.1209/0295-5075/127/64003, arXiv:1909.03243.[29] A. C. H. Tsang, P. W. Tong, S. Nallan, and O. S. Pak,Phys. Rev. Fluids , 074101 (2020), arXiv:1808.07639.[30] S. Verma, G. Novati, and P. Koumoutsakos, Proceedingsof the National Academy of Sciences of the United Statesof America , 5849 (2018), arXiv:1802.02674.[31] S. Mui˜nos-Landin, K. Ghazi-Zahedi, and F. Ci-chos, Arxiv preprint , arXiv:1803.06425 (2018),arXiv:1803.06425.[32] G. Reddy, J. Wong-Ng, A. Celani, T. J. Sejnowski, andM. Vergassola, Nature , 236 (2018).[33] A. Najafi and R. Golestanian, Phys. Rev. 
E , 062901(2004), arXiv:0402070 [cond-mat].[34] R. Golestanian and A. Ajdari, Physical Review E - Sta-tistical, Nonlinear, and Soft Matter Physics , 1 (2008),arXiv:0711.3700.[35] D. J. Earl, C. M. Pooley, J. F. Ryder, I. Bredberg, andJ. M. Yeomans, J. Chem. Phys. , 064703 (2007),arXiv:0701511 [cond-mat].[36] R. S. Sutton and A. G. Barto, Reinforcement Learning:An Introduction , 2nd ed. (The MIT Press, 2018).[37] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves,I. Antonoglou, D. Wierstra, and M. Riedmiller, (2013).[38] A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick,L. Sifre, T. Green, C. Qin, A. ˇZ´ıdek, A. W. R. Nel-son, A. Bridgland, H. Penedones, S. Petersen, K. Si-monyan, S. Crossan, P. Kohli, D. T. Jones, D. Silver,K. Kavukcuoglu, and D. Hassabis, Nature , 706(2020).[39] S. Hochreiter and J. Schmidhuber, Neural Comput. ,1735 (1997).[40] R. C. Staudemeyer and E. R. Morris, Arxiv preprint ,arXiv:1909.09586 (2019), arXiv:1909.09586. [41] M. Theves, J. Taktikos, V. Zaburdaev, H. Stark, andC. Beta, Biophysical Journal , 1915 (2013).[42] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez,Y. Tassa, D. Silver, and D. Wierstra, Arxiv preprint ,arXiv:1509.02971 (2015), arXiv:1509.02971 [cs.LG].[43] Z. Gu, Z. Jia, and H. Choset, “Adversary A3C for Ro-bust Reinforcement Learning,” (2019), arXiv:1912.00330[cs.LG].[44] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, andO. Klimov, “Proximal Policy Optimization Algorithms,”(2017), arXiv:1707.06347 [cs.LG].[45] C. J. C. H. Watkins and P. Dayan, Mach. Learn. , 279(1992).[46] P. R. ten Wolde, N. B. Becker, T. E. Ouldridge, andA. Mugler, Journal of Statistical Physics , 1395(2016), arXiv:1505.06577.[47] G. Bal´azsi, A. Van Oudenaarden, and J. J. Collins, Cell , 910 (2011).[48] C. G. Bowsher and P. S. Swain, Current Opinion inBiotechnology , 149 (2014).[49] S. K. Tang and W. F. Marshall, Current Biology ,R1180 (2018).[50] S. Tripathi, H. Levine, and M. K. Jolly, Annual Reviewof Biophysics , 1 (2020).[51] C. R. Reid, S. 
Garnier, M. Beekman, and T. Latty, An-imal Behaviour , 44 (2015).[52] T. A. Jarrell, Y. Wang, A. E. Bloniarz, C. A. Brittin,M. Xu, J. N. Thomson, D. G. Albertson, D. H. Hall,and S. W. Emmons, Science , 437 (2012).[53] E. Itskovits, R. Ruach, and A. Zaslaver, Nature Com-munications (2018), 10.1038/s41467-018-05151-2.[54] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learn-ing (MIT Press, 2016).[55] M. R. Baker and R. B. Patil, Reliab. Comput. , 235(1998).[56] K. O. Stanley and R. Miikkulainen, Evol. Comput.10