Bayesian gradient sensing in the presence of rotational diffusion
Maja Novak and Benjamin M. Friedrich
TU Dresden, Dresden, Germany
(Dated: February 20, 2020)

Biological cells estimate concentration gradients of signaling molecules with a precision that is limited not only by sensing noise, but additionally by the cell's own stochastic motion. We ask for the theoretical limits of gradient estimation in the presence of both motility and sensing noise. We introduce a minimal model of a stationary chemotactic agent in the plane subject to rotational diffusion, which uses Bayesian estimation to optimally infer a gradient direction from noisy concentration measurements. Contrary to the known case of gradient sensing by temporal comparison, we show that for spatial comparison, the ultimate precision of gradient sensing scales not with the rotational diffusion time, but with its square root. To achieve this precision, an individual agent needs to know its own rotational diffusion coefficient. This agent can accurately estimate the expected variability within an ensemble of agents. If an agent, however, does not account for its own motility noise, Bayesian estimation fails in a characteristic manner.
Keywords: physical limits of chemosensation, rotational diffusion, sequential Bayesian estimation, chemo-taxis, bearing tracking
1. INTRODUCTION
Many motile biological cells navigate in concentration gradients of signaling molecules in a process termed chemotaxis [1–6]. At cellular scales, the stochastic binding of signaling molecules results in molecular shot noise and renders concentration measurements inherently noisy [7]. This sensing noise imposes physical limits on the precision of chemotaxis [7–13]. Experimental work suggests that biological cells indeed operate at these limits in shallow concentration gradients [14–18]. Temporal averaging over extended measurement intervals is a common strategy to reduce sensing noise [7, 19–21]. Yet, temporal averaging may be limited in time-varying environments [18, 22–24], or more directly by the stochastic motion of chemotactic agents themselves [25]. How should a chemotactic agent integrate previous and more recent measurements to most accurately estimate the relative direction of a concentration gradient if this direction changes stochastically in time?

Previous work suggested that bacteria use Kalman filters to track a time-dependent concentration signal, providing an optimal weighting of past and recent measurements [26], see also [25]. Yet, Kalman filters address linear problems [27], while sensing a direction is a nonlinear problem of circular statistics, which prompts Bayesian updating of an angular likelihood distribution at each time step [28]. Bayesian estimation has been successfully applied, e.g., for monitoring time-varying environments with two states [29], estimating absolute concentration [30–32], or temporal changes thereof [24, 33], as well as for classification tasks [34, 35], and even decision making in the immune system [36]. The specific problem of Bayesian sensing of direction was addressed previously [11, 12, 15, 37], yet without considering motility noise. Likewise, the infotaxis algorithm, which computes a likelihood map for the position of a hidden target, does not include motility noise [38].
In the engineering literature, directional sensing is known as 'bearing tracking', and estimation algorithms do indeed account for motility noise [39, 40]. However, to the best of our knowledge, an analytical theory of optimal directional sensing that accounts for motility noise is missing.

Here, we derive theoretical limits for the precision of gradient sensing by chemotactic agents such as biological cells in the presence of both sensing and motility noise. We consider a minimal model of a chemotactic agent in the plane that attempts to track the direction of an external concentration gradient. The agent performs noisy concentration measurements, while it undergoes rotational diffusion. This agent integrates subsequent measurements into a likelihood distribution of possible gradients using Bayesian updating.

Our manuscript is structured as follows: We first briefly review Bayesian gradient sensing without motility noise to introduce notation. We recapitulate how temporal averaging improves the precision of gradient estimates as a function of measurement time. We then introduce motility noise and consider an agent subject to rotational diffusion. This agent, however, is first not aware of its own motility noise, which results in grossly erroneous gradient estimates. In contrast, as our main result, we show how an agent that knows its own rotational diffusion coefficient D can obtain optimal estimates of the gradient direction, as well as a self-consistent estimate of the accuracy of this estimate, i.e., the expected dispersion in an ensemble of agents, which scales as the square root of D. This optimal gradient-sensing strategy corresponds to temporal averaging over a time span that scales as D^{-1/2}. We discuss why this new result for gradient sensing by spatial comparison is different from previous results for chemotaxis by temporal comparison, which predicted a substantially longer optimal time span of temporal averaging that scales as D^{-1} [25].
2. MINIMAL MODEL
We consider a chemotactic agent in the plane, see Fig. 1. Orthonormal vectors h_1 and h_2 define its material frame. The agent is subject to rotational diffusion with effective diffusion coefficient D = D_rot (with units of inverse time), i.e., ⟨h_1(t) · h_1(t + t')⟩ = exp(−D|t'|). For simplicity, the agent is stationary with center position R.

The agent seeks to estimate the strength and direction of an external concentration gradient

c(r) = c_0 [1 + (α_0/a) g · (r − R)]   (1)

with base concentration c_0, dimensionless gradient strength α_0 = |∇c| a / c_0 (normalized by a sensing length scale a set by the dimensions of the agent), and gradient direction vector g = cos(ψ) h_1 + sin(ψ) h_2 of unit length, such that ψ denotes the gradient angle enclosed by h_1 and g. The concentration gradient Eq. (1) represents an unknown state of the environment, S_true(t) = (c_0, α_0, ψ_true(t)), where ψ_true(0) = ψ_0 at time t = 0. The agent shall be equipped with N sensors placed equidistantly on the circumference of a disk of radius a at respective positions r_j = R + a cos(2πj/N) h_1 + a sin(2πj/N) h_2, see Fig. 1A. To simplify future calculations, we introduce the complex vector notation h = h_1 + i h_2, which gives g = Re[e^{iψ} h*] and r_j = R + a Re[η^j h*], where η = exp(2πi/N) denotes the N-th root of unity. Each sensor detects stochastic binding events of molecules with rate Λ_j = λ c_j / N proportional to the local concentration c_j = c(r_j) = c_0 [1 + α_0 cos(2πj/N − ψ_0)] for j = 1, …, N. The sensor count n_j during a time interval of duration τ becomes a Poissonian random variable with expectation value n̄_j = ⟨n_j⟩ = Λ_j τ and variance n̄_j. As fusion of this sensor information, we consider its discrete Fourier transform ñ_k = Σ_{j=1}^{N} n_j η^{jk} for k = 0, …, N − 1, with

⟨ñ_0⟩ = ν_1, ⟨ñ_1⟩ = α_0 ν_1 e^{iψ_0} / 2, and ⟨ñ_k⟩ = 0 for 2 ≤ k ≤ N − 2.

Correspondingly, we assume that the agent stores only a measurement vector M = (ñ_0, ñ_1', ñ_1'') ∈ R³, where ñ_1 = ñ_1' + i ñ_1'' denotes the decomposition into real and imaginary part.

In the following, we consider discrete-time dynamics with subsequent measurement intervals T_j = (t_{j−1}, t_j) of duration τ and discrete rotational diffusion events at times t_j = jτ, where the agent rotates by a random angle ∆_j that is normally distributed with zero mean and variance ⟨∆_i ∆_j⟩ = 2Dτ δ_ij, see Fig. 1B.

The agent estimates the state of the environment as Ŝ = (ĉ, α̂, ψ̂) based on the sequence of measurements M_1, …, M_m taken during the time intervals T_1, …, T_m using a maximum-likelihood estimate as detailed below.
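As a minimal numerical sketch of this measurement process (not the authors' implementation; all parameter values below are illustrative assumptions), one can simulate the sensor counts and check the stated moments of the Fourier modes ñ_0 and ñ_1:

```python
# Simulate sensor counts n_j and their Fourier modes for one agent.
# Poisson counts are replaced by Gaussians with matching mean and variance,
# in the spirit of the diffusion approximation used later in the text.
import cmath
import math
import random

random.seed(0)

N = 8              # number of sensors (illustrative)
nu1 = 5000.0       # nu_1 = lambda * tau * c_0 (illustrative)
alpha0 = 0.1       # dimensionless gradient strength (illustrative)
psi0 = 0.7         # true gradient angle (illustrative)
eta = cmath.exp(2j * math.pi / N)   # N-th root of unity

def measurement():
    """One interval: counts n_j -> Fourier modes (n~_0, n~_1)."""
    n = []
    for j in range(1, N + 1):
        mean = (nu1 / N) * (1.0 + alpha0 * math.cos(2 * math.pi * j / N - psi0))
        n.append(random.gauss(mean, math.sqrt(mean)))
    n0 = sum(n)                                               # k = 0 mode
    n1 = sum(nj * eta ** j for j, nj in enumerate(n, start=1))  # k = 1 mode
    return n0, n1

# Ensemble average over repeated measurements
reps = 2000
m0, m1 = 0.0, 0j
for _ in range(reps):
    n0, n1 = measurement()
    m0 += n0 / reps
    m1 += n1 / reps

# Stated moments: <n~_0> = nu_1 and <n~_1> = alpha_0 * nu_1 * e^{i psi_0} / 2
expected_n1 = (alpha0 * nu1 / 2) * cmath.exp(1j * psi0)
```

The phase of the ensemble-averaged ñ_1 recovers the gradient angle ψ_0, while ñ_0 measures the total count and hence the absolute concentration.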
3. BAYESIAN GRADIENT SENSING WITHOUT MOTILITY NOISE

3.1. The measurement process for a single measurement
From the fact that the molecule counts n_j are independent Poisson random variables, we readily find the expectation value μ_1 = ⟨M⟩ and covariance matrix Σ_1 = ⟨(M − μ_1)(M − μ_1)^T⟩ of a single measurement M if the true state is S_1 = (c_0, α_0, ψ_0). Interestingly, if the agent possesses at least four sensors, N ≥ 4, both μ_1 and Σ_1 are independent of the number N of sensors. For N = 2, gradient sensing obviously becomes impossible if ψ_0 = ±π/2, while the precision of gradient sensing (weakly) depends on ψ_0 for N = 3, i.e., it depends on the orientation of the agent relative to the gradient direction, see appendix D. For N ≥ 4, we find

μ_1 = ν_1 (1, α_0 cos(ψ_0)/2, α_0 sin(ψ_0)/2)^T   (2)

and covariance matrix

Σ_1 = ν_1 [ [1, α_0 cos(ψ_0)/2, α_0 sin(ψ_0)/2],
            [α_0 cos(ψ_0)/2, 1/2, 0],
            [α_0 sin(ψ_0)/2, 0, 1/2] ],   (3)

where we introduced the short-hands ν = λτc and ν_1 = λτc_0 for later use. Interestingly, the covariance matrix Σ_1 possesses non-zero off-diagonal entries, i.e., ñ_0 ("measuring absolute concentration") and ñ_1 ("measuring a gradient") are not independent.

In the limit of large molecule counts, n̄_j ≫ 1, we can employ a diffusion approximation and approximate the probability distribution of each n_j by a normal distribution with mean n̄_j and variance n̄_j. The discrete Fourier transform is a linear transformation, hence the distribution of measurement vectors M can likewise be approximated as a multivariate Gaussian, using the mean values and covariance matrix computed above, P(M|S_1) = N(μ_1, Σ_1).
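Selected entries of the covariance matrix Σ_1, Eq. (3), can be checked by Monte Carlo (a sketch with illustrative parameter values, not the authors' code; the diffusion approximation n_j ~ Normal(n̄_j, √n̄_j) stands in for exact Poisson sampling):

```python
# Monte-Carlo check of selected entries of Sigma_1, Eq. (3):
#   Var(n~_1')        = nu_1 / 2
#   Cov(n~_0, n~_1')  = nu_1 * alpha_0 * cos(psi_0) / 2
#   Cov(n~_1', n~_1'') = 0
import math
import random

random.seed(3)

N = 8              # number of sensors (illustrative)
nu1 = 5000.0       # nu_1 = lambda * tau * c_0 (illustrative)
alpha0 = 0.1       # gradient strength (illustrative)
psi0 = 0.7         # true gradient angle (illustrative)

reps = 200000
s0 = s1r = s1i = 0.0
s01r = s1r1r = s1r1i = 0.0
for _ in range(reps):
    n0 = n1r = n1i = 0.0
    for j in range(1, N + 1):
        th = 2 * math.pi * j / N
        mean = (nu1 / N) * (1.0 + alpha0 * math.cos(th - psi0))
        nj = random.gauss(mean, math.sqrt(mean))   # diffusion approximation
        n0 += nj
        n1r += nj * math.cos(th)                   # Re n~_1
        n1i += nj * math.sin(th)                   # Im n~_1
    s0 += n0; s1r += n1r; s1i += n1i
    s01r += n0 * n1r
    s1r1r += n1r * n1r
    s1r1i += n1r * n1i

m0, m1r, m1i = s0 / reps, s1r / reps, s1i / reps
var_1r = s1r1r / reps - m1r ** 2
cov_0_1r = s01r / reps - m0 * m1r
cov_1r_1i = s1r1i / reps - m1r * m1i
expected_cov = nu1 * alpha0 * math.cos(psi0) / 2
```

The non-zero entry Cov(ñ_0, ñ_1') is the coupling between absolute-concentration and gradient measurements noted in the text.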
FIG. 1: Chemotactic agent subject to rotational diffusion. A. In our minimal model, a chemotactic agent seeks to estimate an external concentration gradient ∇c of signaling molecules (relative to its material frame h_1 and h_2) by counting binding events at N sensor sites spaced equidistantly on the agent's circumference. During each measurement interval T_j of duration τ, the agent obtains molecule counts n_1, …, n_N, which are combined into a (Fourier-transformed) measurement vector M_j. Between measurements, the agent is subject to rotational diffusion with rotational diffusion coefficient D. B. The angle ψ_true(t) enclosed between gradient direction ∇c and material frame vector h_1 becomes a random walk with stochastic increments ∆_j. This motility noise limits the precision of the gradient estimate and its direction angle ψ̂ as estimated by the agent. C. The relative direction of the concentration gradient represents a time-dependent state of the environment, S_true(t) = S_j for t ∈ T_j, where S_j = (c_0, α_0, ψ_j). The agent computes a likelihood distribution L(S) of possible concentration gradients, iteratively executing a prediction step that accounts for its rotational diffusion (which flattens the distribution), and an update step that incorporates a new measurement M_m (which usually sharpens the distribution).

The agent, which does not know S_1, anticipates that measurements M are distributed according to an analogous P(M|S) for any hypothetical state S = (c, α, ψ), i.e.,

P(M|S) = N(μ, Σ) = (2π)^{−3/2} |Σ|^{−1/2} exp( −(1/2) (M − μ)^T Σ^{−1} (M − μ) ).   (4)

Here, μ and Σ are defined analogously to Eqs. (2) and (3) with substitutions c_0 → c, α_0 → α, ψ_0 → ψ, such that μ = μ(S) and Σ = Σ(S).

We introduce the signal-to-noise ratio of gradient sensing, SNR (with prefactor matching [41]), which characterizes sensing noise:

SNR = 2 |⟨ñ_1⟩|² / ⟨|ñ_1 − ⟨ñ_1⟩|²⟩ = α_0² ν_1 / 2.   (5)

Here, we used Eqs. (2) and (3) in the last step, see also appendix A.1. Explicitly, SNR = λτ |∇c|² a² / (2 c_0), i.e., the signal-to-noise ratio scales with measurement time τ. Below, we show that the SNR sets the precision of a single measurement.

The agent computes a likelihood L_m = L(S|M_1, …, M_m) for each possible state S = (c, α, ψ) of the environment, based on all previous measurements M_1, …, M_m. The corresponding maximum-likelihood state estimate at time t_m reads Ŝ_m = argmax_S L_m. We are especially interested in the maximum-likelihood estimate ψ̂_m of the gradient direction, and the estimated precision of this estimate (quantified below in terms of a so-called measure of concentration). After each measurement, the agent updates the likelihood function L_m(S), using Bayes' rule

L_m = L(S|M_1, …, M_m) = [ P(M_m|S) / P(M_m|M_1, …, M_{m−1}) ] L_{m−1}.   (6)

Here, P(M|S) is the probability to measure M given a specific state S (measurement process, approximated by Eq. (4)), P(M_m|M_1, …, M_{m−1}) = ∫ dS P(M_m|S) L_{m−1}(S|M_1, …, M_{m−1}) is the probability to measure M_m given the previous likelihood function L_{m−1}, and L_0(S) is a Bayesian prior.
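As a minimal numerical sketch (not the authors' code; parameter values are illustrative assumptions), Bayes' rule Eq. (6) can be carried out on a discrete grid of gradient angles ψ, here with the measurement likelihood approximated by a single von Mises factor. The product of two von Mises densities is again of von Mises form, with resultant vectors adding in the complex plane, which the grid posterior reproduces:

```python
# Bayes' rule, Eq. (6), on a psi-grid: posterior ∝ P(M | psi) * prior(psi).
import cmath
import math

n = 720                                    # grid resolution (assumption)
psis = [2 * math.pi * i / n for i in range(n)]

kappa0, psi0 = 3.0, 0.0                    # von Mises prior (illustrative)
kappa_tau, psi_tau = 25.0, 0.4             # measurement likelihood (illustrative)

prior = [math.exp(kappa0 * math.cos(p - psi0)) for p in psis]
like = [math.exp(kappa_tau * math.cos(p - psi_tau)) for p in psis]

post = [l * p for l, p in zip(like, prior)]
z = sum(post)
post = [p / z for p in post]               # normalization = denominator of Eq. (6)

# Circular mean of the posterior: for von Mises factors, the resultant of the
# posterior points along kappa0 * e^{i psi0} + kappa_tau * e^{i psi_tau}.
r = sum(w * cmath.exp(1j * p) for w, p in zip(post, psis))
psi_hat = cmath.phase(r)
expected = cmath.phase(kappa0 * cmath.exp(1j * psi0)
                       + kappa_tau * cmath.exp(1j * psi_tau))
```

The maximum-likelihood angle ψ̂ interpolates between prior and measurement, weighted by their measures of concentration.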
After a single measurement M_1 that yielded a measured angle ψ_τ, i.e., ñ_1 = |ñ_1| e^{iψ_τ}, the likelihood function L_1 = L(S|M_1) for S = (c, α, ψ) reads

L_1 = Λ exp( κ_τ cos(ψ − ψ_τ) ) exp( −(|ñ_1|² αA/(4ν)) cos 2(ψ − ψ_τ) ) L_0, with κ_τ = ñ_0 |ñ_1| A/ν,   (7)

which follows from Eqs. (4) and (6). Here, Λ = Λ(c, α, M_1) is a prefactor independent of ψ, see appendix A.3, and we used the short-hand A = 2α/(2 − α²) ≈ α for α ≪ 1. Thus, L_1 contains as a factor a von Mises distribution for ψ, p(ψ) ∼ exp[κ_τ cos(ψ − ψ_τ)], with measure of concentration κ_τ, see also appendix B. In the limit of weak concentration gradients, α ≪ 1, this factor dominates (for likely S and typical M_1). The second exponential factor in Eq. (7), which results from the off-diagonal entries of the covariance matrix Σ, represents a von Mises distribution for 2ψ. The corresponding measure of concentration |ñ_1|² αA/(4ν) ∼ α² κ_τ is small compared to that of the first factor (for likely S and typical M_1). Hence, we can approximate this factor by a constant.

The measure of concentration κ_τ corresponding to a single measurement depends on M_1 and is thus itself a random variable. We can compute the second moment of κ_τ exactly,

⟨κ_τ²⟩ = α² ⟨ñ_0² |ñ_1|²⟩ / ν²   (8)
       = SNR² + 2 SNR + O(α_0², α_0⁴ ν_1),   (9)

see also appendix A.1. For the first moment of κ_τ, we find an analytic expression in the limit of high signal-to-noise ratio,

⟨κ_τ⟩ ≈ SNR + 1/2 for SNR ≫ 1,   (10)

see appendix A.2. Accordingly, we can interpret ⟨κ_τ²⟩ as the sum of a squared mean, ⟨κ_τ⟩² ≈ SNR², and a variance, ⟨κ_τ²⟩ − ⟨κ_τ⟩² ∼ SNR. The asymptotic scaling ⟨κ_τ⟩ ∼ SNR is consistent with geometric intuition: the measure of concentration κ_τ is closely related to the circular variance, which in turn can be estimated by considering a typical right-angled triangle in the complex plane with small angle ψ_τ − ψ_0 and catheti |Re ñ_1 e^{−iψ_0}| ≈ α_0 ν_1/2 and |Im ñ_1 e^{−iψ_0}| ∼ √ν_1 for SNR ≫ 1: (κ_τ)^{−1} ∼ 1 − ⟨cos(ψ_τ − ψ_0)⟩ ∼ [√ν_1 / (α_0 ν_1/2)]² ∼ SNR^{−1}.

Next, we give an explicit approximation for L_1.
For simplicity, we consider the special case where the Bayesian prior L_0(S) is itself a von Mises distribution in ψ, centered at ψ_0 with measure of concentration κ_0, and the agent possesses perfect knowledge of the other two environmental variables, c_0 and α_0: L_0 ∼ exp[κ_0 cos(ψ − ψ_0)] δ(c − c_0) δ(α − α_0). This is not a severe restriction, at least not for the absolute concentration c, as agents can estimate c_0 very precisely for ν_1 ≫ 1. Now, the updated likelihood distribution L_1 is again a von Mises distribution with new measure of concentration κ_1 and maximum-likelihood angle ψ̂_1, see also appendix B,

κ_1 e^{iψ̂_1} = κ_0 e^{iψ_0} + κ_τ e^{iψ_τ}.   (11)

As expected, ψ̂_1 is a weighted circular mean of the prior ψ_0 and the measured ψ_τ. For the measure of concentration κ_1 of the updated likelihood function L_1, we find from Eq. (11) with κ_τ ≈ α_0 ñ_0 |ñ_1| / ν_1

⟨κ_1²⟩ = κ_0² + ⟨κ_τ²⟩ + [ (α_0/ν_1) κ_0 e^{iψ_0} ⟨ñ_0 ñ_1⟩* + c.c. ]
       = κ_0² + 2(1 + κ_0) SNR + SNR² + O(α_0², α_0⁴ ν_1).   (12)

We are interested in the marginal likelihood distribution L_m(ψ) of the estimated gradient angle ψ after m subsequent measurements. By applying Bayes' update rule Eq. (6) iteratively m times, we obtain an approximate formula for L_m(ψ) as a von Mises distribution

L_m(ψ) ∼ exp[ κ_m cos(ψ − ψ̂_m) ].   (13)

Next, we compute the measure of concentration κ_m. In the absence of rotational diffusion, subsequent measurements are independent random variables. This allows us to compute the second moment of κ_m analogous to Eqs. (S10) and (12), see appendix A.4:

⟨κ_m²⟩ = κ_0² + 2(1 + κ_0) m SNR + m² SNR² + O(α_0², α_0⁴ ν_1).   (14)

Eq. (14) corroborates how chemotactic agents increasingly become more confident of their gradient estimates as the number m of sequential measurements increases. Eq. (14) is equivalent to the result for a single long measurement of duration mτ, for which the effective signal-to-noise ratio reads m SNR, see also appendix D. Asymptotically, the root-mean-square expectation value of the measure of concentration, normalized by the number m of measurements, approaches the signal-to-noise ratio SNR:

lim_{m→∞} ⟨κ_m²⟩^{1/2} / m = SNR.   (15)
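The growth law of Eqs. (14)–(15) can be checked with a short simulation (a sketch, not the authors' code; Python's stdlib von Mises sampler stands in for the per-interval measurement noise, and κ_τ ≈ SNR + 1/2 from Eq. (10) is assumed):

```python
# Sequential updates without motility noise: resultant vectors add up,
#   kappa_m e^{i psi_m} = kappa_0 e^{i psi_0} + sum_j kappa_tau e^{i psi_tau^(j)},
# so kappa_m / m should approach SNR, Eq. (15).
import cmath
import random

random.seed(2)

SNR = 25.0
kappa_tau = SNR + 0.5        # Eq. (10), high-SNR mean of kappa_tau
kappa0, psi0 = 3.0, 0.0      # Bayesian prior (illustrative)
m = 20000                    # number of measurements (illustrative)

resultant = kappa0 * cmath.exp(1j * psi0)
for _ in range(m):
    # measured angle of one interval, von Mises-distributed around true psi0
    psi_tau = random.vonmisesvariate(psi0, kappa_tau)
    resultant += kappa_tau * cmath.exp(1j * psi_tau)

kappa_m = abs(resultant)     # measure of concentration after m updates
```

The small shortfall of kappa_m/m below κ_τ (a factor I_1(κ_τ)/I_0(κ_τ) ≈ 1 − 1/(2κ_τ)) cancels the +1/2 offset in Eq. (10), consistent with the limit in Eq. (15).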
4. GRADIENT SENSING IN THE PRESENCE OF ROTATIONAL DIFFUSION
We now consider a chemotactic agent subject to rotational diffusion with D > 0. The orientational angle ψ_true(t) that specifies the direction of the gradient vector g relative to the material frame of the agent at time t, i.e., cos[ψ_true(t)] = h_1(t) · g, thus becomes a stochastic process. For simplicity, we consider discrete-time dynamics, where rotational diffusion events occur at discrete times t_j = jτ. Thus, the orientation angle ψ_true(t) is constant during each interval T_j = (t_{j−1}, t_j) with ψ_true(t) = ψ_j for t ∈ T_j, with independent random increments ∆_j = ψ_{j+1} − ψ_j, normally distributed with ⟨∆_i ∆_j⟩ = 2Dτ δ_ij, see Fig. 1B.

We calculate the expected measure of concentration of sequential gradient estimates for D > 0, following the calculation for the special case D = 0 in section 3.5. For simplicity, we again assume a Bayesian prior of the form L_0(c, α, ψ) ∼ exp[κ_0 cos(ψ − ψ_0)] δ(c − c_0) δ(α − α_0). For agents with rotational diffusion, subsequent measurements M_j and M_k are not independent random variables because they depend on the underlying stochastic process ψ_true(t). Specifically,

⟨ñ_1^{(j)}⟩ = (α_0 ν_1/2) e^{iψ_0} e^{−Djτ} and ⟨ñ_1^{(j)} ñ_1^{(k)*}⟩ = (α_0 ν_1/2)² e^{−D|k−j|τ} for j ≠ k.   (16)

This correlation marks a crucial difference to the case D = 0 treated above in Eq. (14), where we exploited that subsequent measurements are independent. For D > 0, Eq. (S12a) in appendix A.4 still holds, but Eq. (S12b) does not.

We now only use the approximation that ñ_0^{(j)} and ñ_1^{(j)} are approximately independent for each measurement, but account for the correlation of subsequent measurements in Eq. (16). With this approximation, we obtain

⟨κ_m²⟩ = κ_0² + α_0² [ Σ_j ⟨|ñ_1^{(j)}|²⟩ + Σ_{j≠k} ⟨ñ_1^{(j)} ñ_1^{(k)*}⟩ ] + [ α_0 κ_0 e^{iψ_0} Σ_j ⟨ñ_1^{(j)}⟩* + c.c. ].   (17)

We compute the sums of expectation values in Eq. (17) by evaluating a (double) geometric series using Eq. (16), see appendix A.4. As result, we find

⟨κ_m²⟩ = κ_0² + 2Φ_1(m, κ_0) SNR + 2Φ_2(m) SNR² + O(α_0², α_0⁴ ν_1)   (18)

with Φ_1(m, κ_0) = m + κ_0 (1 − e^{−mDτ}) / (e^{Dτ} − 1) and Φ_2(m) = m/(e^{Dτ} − 1) − e^{Dτ}(1 − e^{−mDτ})/(e^{Dτ} − 1)² + m/2. In the limit of slow diffusion, Dτ ≪ 1, we have Φ_1 ≈ (1 + κ_0) m for mτ ≪ D^{−1} and Φ_1 ≈ m + κ_0/(Dτ) for mτ ≫ D^{−1}, as well as Φ_2 ≈ m²/2 for mτ ≪ D^{−1} and Φ_2 ≈ m/(Dτ) for mτ ≫ D^{−1}. This provides an asymptotic scaling for κ_m, valid in the limit SNR ≫ 1 and Dτ ≪ 1:

⟨κ_m²⟩^{1/2} ≈ { m SNR for mτ ≪ D^{−1} ; √(2mτ/D) SNR/τ for mτ ≫ D^{−1} }.   (19)

For D > 0, κ_m will initially increase linearly with m, and cross over to the asymptotic scaling κ_m ∼ m^{1/2} beyond a characteristic measurement time t = mτ on the order of D^{−1}. In fact, Eq. (19) holds under the conditions D ≪ SNR/τ and m ≫ κ_0²/SNR. (The first condition ensures Φ_2 SNR² ≫ Φ_1 SNR, while the second condition implies that the contribution of the Bayesian prior is negligible.) Note that in the continuum limit τ → 0 with t = mτ fixed, Eq. (19) becomes ⟨κ(t)²⟩^{1/2} = √(2t/D) SNR/τ, where SNR/τ = α_0² λ c_0 / 2. Eq.
(19) shows how an agent subject to rotational diffusion that does not take into account its own stochastic motion in its update process will erroneously believe that its gradient direction estimate becomes increasingly more accurate as measurement time t = mτ is increased. Yet, this is wrong. In fact, in an ensemble of agents, the estimation errors δ_j = ψ̂_j − ψ_j will eventually become completely randomized.

To illustrate this behavior, we characterize the distribution of estimation errors δ_m within an ensemble of agents, approximating it by a wrapped normal distribution with variance parameter σ_m². We show that σ_m² increases as a function of time t_m. For the estimation error of an individual agent, we have an approximate iteration rule, valid for early times, mτ ≪ D^{−1}, and high signal-to-noise ratio, SNR ≫ 1, which expresses the new error as an affine interpolation of the previous error and the error of the last measurement:

δ_m ≈ [σ_τ²/(σ̂²_{m−1} + σ_τ²)] (δ_{m−1} − ∆_{m−1})   [previous estimate plus noise]
    + [σ̂²_{m−1}/(σ̂²_{m−1} + σ_τ²)] (arg ñ_1^{(m)} − ψ_m)   [new measurement].   (20)

Here, we introduced the variance parameters σ_τ² and σ̂_m² of wrapped normal distributions approximating von Mises distributions with measures of concentration κ_τ and κ_m computed above in Eqs. (10) and (14), such that the respective distributions have the same circular variance. Mathematically, σ² = −2 ln[I_1(κ)/I_0(κ)], see appendix B. From Eq. (20), we obtain an approximate iteration rule for σ_m²:

σ_m² ≈ [σ_τ²/(σ̂²_{m−1} + σ_τ²)]² (σ²_{m−1} + 2Dτ) + [σ̂²_{m−1}/(σ̂²_{m−1} + σ_τ²)]² σ_τ².   (21)

This expression suggests that σ_m grows asymptotically as √(Dτ m), see appendix A.5. Correspondingly, the circular variance CV = 1 − e^{−σ²/2} of the distribution p(δ) should increase, eventually converging to 1 for mτ ≫ D^{−1}. Although the specific assumptions made in the derivation of Eq. (21) do not hold in this limit, simulations corroborate this simple picture, see Fig. 2A.

In conclusion, agents not aware of their own rotational diffusion will arrive at erroneous gradient estimates. The reason is that past measurements will have become partially invalidated by rotational diffusion, yet are nonetheless incorporated in the gradient estimates with full weight. Concomitantly, the precision that individual agents estimate for their own gradient measurement does not reflect the true accuracy, i.e., the dispersion of maximum-likelihood estimates in an ensemble of agents. Individual agents are 'over-confident' of their own estimates.
FIG. 2: Estimated precision and true accuracy of Bayesian gradient sensing. A. Agents unaware of their own rotational diffusion. Each individual agent computes a likelihood distribution L_m(ψ) at each time step, with maximum-likelihood direction angle ψ̂_m and circular variance CV[L_m(ψ)]. Shown is the ensemble-averaged circular variance ⟨CV[L_m(ψ)]⟩ (estimated precision, red), and the circular variance CV[p(δ_m)] of estimation errors δ_m = ψ̂_m − ψ_m within the ensemble of agents (true accuracy, blue). Solid lines represent the analytical results, Eqs. (18) and (21), for estimated precision and accuracy, respectively. The accuracy converges to one, corresponding to the randomization of estimated angles ψ̂. At the same time, the estimated precision converges to zero, displaying a cross-over between two scaling regimes as predicted by Eq. (19), see inset. B. Agents aware of their own rotational diffusion. Same as panel A, but agents take into account their rotational diffusion in a prediction step for L(ψ). Solid lines represent the analytical results, Eqs. (23) and (21). Estimated precision and accuracy converge to the same limit value, CV_∞. Inset illustrates circular distributions with circular variance 0.07 (black), 0.14 (gray), 0.64 (light-gray), using von Mises distributions centered at an arbitrary ψ. Error bars represent s.e.m. (determined by bootstrapping for an ensemble of n = 5000 agents, occasionally smaller than symbols). Parameters: ν_1 = 5000, α_0 = 0.…, Dτ = 0.05; Bayesian prior: κ_0 = 3.… ≈ ⟨κ_τ⟩^{1/2}, ψ_0 = 0. To make analytical results comparable to simulated circular variances, we used the formula CV = 1 − I_1(κ)/I_0(κ) to convert the measure of concentration κ of the von Mises distributions in Eq. (18) [panel A] and Eq. (23) [panel B] to a circular variance. Similarly, we used CV = 1 − exp(−σ²/2) to convert the variance parameter σ² of the wrapped normal distribution in Eq. (21) to a circular variance, where we additionally used σ̂² = −2 ln[I_1(κ)/I_0(κ)] to relate κ_m from Eq. (18) [panel A] and Eq. (23) [panel B] to σ̂_m² in Eq. (21).

We now consider an agent that correctly takes into account its own rotational diffusion before updating the likelihood distribution L(S) of estimated concentration gradients. Following the terminology of the known Kalman filter algorithm, we consider, in addition to the update step, Eq. (6), which describes the incorporation of new measurement information, an additional prediction step that describes the change of L(S) due to rotational diffusion, see Fig. 1C. In its most general form, this prediction step is given by a Chapman-Kolmogorov equation

L'_{m−1} = L'(S'|M_1, …, M_{m−1}) = ∫ dS P(S'|S) L(S|M_1, …, M_{m−1}).   (22)

Here, P(S'|S) is the transition probability from state S to state S' at time t_{m−1}. In our case, P(ψ'|ψ) is a wrapped normal distribution with zero mean and variance 2Dτ, while the other two state variables do not change, i.e., P(S'|S) = P(ψ'|ψ) δ(c' − c) δ(α' − α) is independent of time t.

To make analytical progress, we again assume perfect knowledge of concentration c_0 and gradient strength α_0, i.e., a Bayesian prior of the form L_0(S) ∼ exp[κ_0 cos(ψ − ψ_0)] δ(c − c_0) δ(α − α_0). In the limit of high signal-to-noise ratio, SNR ≫
1, and slow diffusion, Dτ ≪ 1, we can approximate all factors in Eqs. (6) and (22) by von Mises distributions with appropriate measures of concentration. In this limit, the update step corresponds to the (normalized) product of two von Mises distributions, while the prediction step corresponds to the convolution of two such distributions. From the calculus of directional distributions, see appendix B, we obtain a recursive relation for the measure of concentration κ_m of a von Mises distribution approximating L_m(ψ):

κ_m = (κ_{m−1}^{−1} + 2Dτ)^{−1} + κ_τ.   (23)

Here, we assume κ_m ≈ ⟨κ_m⟩ and κ_τ ≈ ⟨κ_τ⟩, which is valid for SNR ≫ 1. For completeness, we list all assumptions made in deriving Eq. (23): (i) high molecule count, ν_1 ≫ 1; (ii) weak gradient, α_0 ≪ 1; (iii) high signal-to-noise ratio, SNR ≫ 1; and (iv) slow diffusion, Dτ ≪ 1 (which together ensure κ_m ≫ 1, so that the simple formulas Eq. (S22) and Eq. (S23) can be used for the measure of concentration of convolutions and normalized products of von Mises distributions, respectively).
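The prediction step Eq. (22) can be sketched numerically as a circular convolution on a ψ-grid (not the authors' code; parameter values are illustrative). For a sharply peaked von Mises likelihood, convolution with a wrapped normal of variance 2Dτ reduces the measure of concentration according to κ' ≈ (κ^{−1} + 2Dτ)^{−1}, the first term of Eq. (23):

```python
# Prediction step, Eq. (22): circular convolution of the likelihood in psi
# with a wrapped normal transition kernel of variance 2*D*tau.
import math

n = 360
dpsi = 2 * math.pi / n
psis = [i * dpsi for i in range(n)]

kappa = 50.0          # concentration of current likelihood (illustrative)
two_D_tau = 0.02      # motility-noise variance per step (illustrative)

# von Mises likelihood (unnormalized) and wrapped normal kernel
like = [math.exp(kappa * math.cos(p)) for p in psis]

def wrapped_normal(p):
    return sum(math.exp(-(p + 2 * math.pi * k) ** 2 / (2 * two_D_tau))
               for k in range(-3, 4))

kern = [wrapped_normal(p if p <= math.pi else p - 2 * math.pi) for p in psis]

# Chapman-Kolmogorov: L'(psi') = sum_psi P(psi' | psi) L(psi)
pred = [sum(kern[(i - j) % n] * like[j] for j in range(n)) for i in range(n)]

# resultant length -> effective concentration (large kappa: kappa ~ 1/(2(1-R)))
z = sum(pred)
cr = sum(w * math.cos(p) for w, p in zip(pred, psis)) / z
sr = sum(w * math.sin(p) for w, p in zip(pred, psis)) / z
R = math.hypot(cr, sr)
kappa_pred = 1.0 / (2.0 * (1.0 - R))
kappa_theory = 1.0 / (1.0 / kappa + two_D_tau)   # first term of Eq. (23)
```

The convolution flattens the distribution, as in Fig. 1C: the effective concentration drops from κ to roughly (κ^{−1} + 2Dτ)^{−1} before the next measurement sharpens it again.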
The iteration rule Eq. (23) for the measure of concentration κ_m after m measurements defines a monotonically increasing sequence with limit value κ_∞ (given by a root of the quadratic equation κ_∞(κ_∞ − κ_τ) = κ_τ/(2Dτ)). Provided measurement intervals are short, τ ≪ (D SNR/τ)^{−1/2}, we have

κ_∞ ≈ √( κ_τ / (2Dτ) ) = √( λ a² |∇c|² / (4 c_0 D) ).   (24)

This result for κ_∞ highlights the competition between information gain with rate κ_τ/τ ≈ SNR/τ and information loss by rotational diffusion with rotational diffusion coefficient D. Mathematically, Eq. (24) is valid in a 'sandwiched' limit (SNR/τ)^{−1} ≪ τ ≪ (D SNR/τ)^{−1/2}. Note that in the opposite limit, τ ≫ (D SNR/τ)^{−1/2}, each rotational diffusion event would erase all previous measurements.

We can understand the scaling for the limit value in Eq. (24) intuitively as follows. We expect κ_m to increase linearly with total measurement time t = mτ for t ≪ t_∞, and to saturate at a limit value κ_∞ for t ≫ t_∞, where the cross-over time t_∞ = t_∞(D) is a yet unknown function of the rotational diffusion coefficient D. Such a saturation curve suggests a scaling relation t_∞/τ ∼ κ_∞/κ_τ. Any measurements taken a time t in the past will have been corrupted by rotational diffusion to an extent that they do not serve to increase the measure of concentration above a value (2Dt)^{−1}. Thus, the cross-over time t_∞ must satisfy D t_∞ ∼ κ_∞^{−1}. We conclude t_∞ ∼ [τ/(D κ_τ)]^{1/2}, hence κ_∞ ∼ √(κ_τ/(Dτ)).

With Eq. (24), we can characterize the distribution of estimation errors δ_m = ψ̂_m − ψ_m within an ensemble of agents. By Eq. (21), the variance parameter of this distribution converges to σ_∞² = lim_{m→∞} σ_m² ≈ κ_∞^{−1}. Thus, σ_∞ ≈ σ̂_∞, i.e., the estimated precision σ̂_∞ ≈ κ_∞^{−1/2} = [2Dτ/SNR]^{1/4} of gradient sensing of each individual agent is a faithful estimator for the true accuracy, i.e., the dispersion σ_∞ of maximum-likelihood estimates within an ensemble, provided the agents know their rotational diffusion coefficient D. Fig. 2B corroborates this finding for the equivalent measure of circular variance.

More generally, we can consider agents that assume a value D̂ for their rotational diffusion coefficient when performing their prediction step Eq. (22), while the true rotational diffusion coefficient is D. In this case, the variance parameter σ_∞² of the distribution p(δ) of estimation errors follows from Eqs. (21) and (23) in the long-time limit as

σ_∞² = √( Dτ / (2 SNR) ) · (D + D̂) / √(D D̂),   (25)

which attains the minimal value σ_∞ = σ̂_∞ exactly for D̂ = D.
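A quick numerical check of this fixed point (a sketch with illustrative parameter values, not taken from the paper): iterating Eq. (23) converges monotonically to κ_∞, which in the 'sandwiched' limit of short measurement intervals approaches √(κ_τ/(2Dτ)), Eq. (24):

```python
# Fixed point of the aware-agent iteration, Eq. (23):
#   kappa_m = (kappa_{m-1}^{-1} + 2*D*tau)^{-1} + kappa_tau
import math

kappa_tau = 10.0      # per-measurement information gain (illustrative)
two_D_tau = 2e-4      # information loss per step, 2*D*tau (illustrative)

kappa = 3.0           # Bayesian prior (illustrative)
history = [kappa]
for _ in range(500):
    kappa = 1.0 / (1.0 / kappa + two_D_tau) + kappa_tau
    history.append(kappa)

kappa_inf = history[-1]
# Eq. (24): kappa_inf ~ sqrt(kappa_tau / (2*D*tau)) for short intervals
approx = math.sqrt(kappa_tau / two_D_tau)
```

The iteration saturates instead of growing without bound, in contrast to the unaware agent: temporal averaging is effectively restricted to the recent past of duration t_∞.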
5. DISCUSSION
Summary of results.
We considered a minimal model of gradient sensing in the presence of both sensing and motility noise. We derived analytical results for sequential Bayesian estimation by chemotactic agents that undergo rotational diffusion. Gradient sensing fails if agents are not aware of their own rotational diffusion, because agents extend temporal averaging infinitely into the past. Concomitantly, the estimated gradient direction decorrelates from the true direction on the time scale of rotational diffusion, while agents erroneously believe that their estimates become more and more precise as a function of total measurement time t. Interestingly, we find an abnormal asymptotic scaling of the estimated variance of gradient estimates, CV[L(ψ)] ∼ t^{−1/2}, see Eq. (19). This abnormal scaling is a signature of erroneous state estimation and is intimately related to the properties of circular statistics. This signature could be tested for in real-world applications, e.g., of bearing tracking.

In contrast, if agents know their own diffusion coefficient, sequential Bayesian estimation yields accurate estimates of gradient direction. These estimates are both faithful and self-consistent, i.e., estimation errors are unbiased and individual agents can estimate how large their estimation error is at each time step. In fact, the estimated precision that individual agents assign to their individual direction estimate converges to the true accuracy, i.e., the dispersion of maximum-likelihood estimates within an ensemble of agents. Remarkably, the ultimate precision of gradient sensing scales with the square root of the rotational diffusion coefficient D as CV[L(ψ)] ∼ √D. Intuitively, agents extend temporal averaging only over the recent past, defined by a time span of duration t_∞ ∼ D^{−1/2}. Measurements taken before this time span still contain partial information on the current gradient direction as long as they do not extend beyond the rotational diffusion time D^{−1}.
Yet, these measurements are already too strongly corrupted by rotational diffusion to add to the precision achieved by temporal averaging over the time span t∞ alone. In fact, these past measurements would make the estimate worse if they were included. Mathematically, this reflects a difference between estimating a vectorial quantity, e.g., a gradient direction, as opposed to estimating a scalar quantity, see appendix C.

Previous theoretical limits.
Our result on the optimal time span for temporal averaging differs from previous work [25], which suggested that temporal comparison should be extended over a time span set by the rotational diffusion time, i.e., t∞ ∼ D^{−1}. This previous work addressed a different sensing and chemotaxis strategy based on temporal comparison, which measures a scalar quantity. It turns out that this makes a crucial difference. In short, temporal comparison relies on active motion of chemotactic agents within a spatial concentration gradient, such that concentration differences in space become encoded in a temporal change of local concentration measurements. Thereby, the cells estimate only a scalar quantity, the component of the concentration gradient along their direction of motion (v · ∇c). If a cell's swimming direction v/|v| decorrelates on a time-scale D^{−1} due to rotational diffusion, an optimal filter should indeed discount previous measurements on the same time-scale, i.e., downweight measurements taken a time ∆t earlier by a factor ∼ exp(−D∆t). The strategy of temporal comparison is different from spatial comparison as considered here, where chemotactic agents estimate concentration gradients by comparing concentrations across their diameter. Bacteria performing run-and-tumble chemotaxis employ temporal comparison [19], while most eukaryotic cells with crawling motility employ spatial comparison [4, 5, 42–44]. A third chemotaxis strategy is represented by marine sperm cells navigating along helical paths [3].
Although these cells effectively use only a single sensor, their chemotaxis can be mapped onto the case of spatial comparison considered here: while moving along helical paths, these cells 'visit' different sensor positions during one helical turn.

In conclusion, our work identified a crucial difference between the chemotaxis strategies of temporal and spatial comparison regarding the optimal time span of temporal averaging when both sensing and motility noise are present.

Typical parameters.
Typical rotational diffusion coefficients for the bacterium E. coli are D ∼ 0.1 s^{−1}, close to the theoretical lower limit for a passive particle of the same size and shape. For ten-fold larger sperm cells, active fluctuations dominate [45], resulting in an estimate of D well above the corresponding passive limit [46]. The motility of crawling Dictyostelium cells was characterized by a persistence time of ∼ D^{−1} with D ∼ 0.003 s^{−1} [47]. For immune T cells in three-dimensional tissue, a persistence time on the order of minutes was reported [48]. Signaling molecules are captured at rate λc = 4πD_c a c by a perfectly absorbing spherical cell of radius a, where D_c denotes the translational diffusion coefficient of the signaling molecules [49]; for typical values of D_c, a, and the concentration c, this rate is large compared to the inverse measurement time. Thus, for concentration gradients between α = 1% and α = 0.1% across the diameter of a cell, and a measurement time τ = 10 s, we estimate signal-to-noise ratios of gradient sensing down to SNR ≈ 0.9 and ≈ 0.03 for the weaker gradients. Assuming D ∼ 0.003 s^{−1} [47], our main result, Eq. (24), predicts for the ultimate precision of gradient sensing CV∞ ≈ 0.14 and ≈ 0.64, respectively, for these gradient strengths (see inset of Fig. 2B for visualization). Reversible binding of signaling molecules effectively increases sensing noise, thus decreasing the signal-to-noise ratio by a constant prefactor [8, 9, 12]. Some cells, such as sperm cells, respond chemotactically even at pico-molar concentrations [50], corresponding to respectively lower signal-to-noise ratios [20, 41].

Bayesian estimation sets a lower bound for the precision of gradient sensing. This theoretical limit is relevant at high noise levels, but may be less so if noise is low. For the slime mold
Dictyostelium, it was shown that at signal-to-noise ratios (SNR) below one, the efficacy of chemotaxis is well characterized by the SNR alone, while noise of downstream intracellular signaling [51] also becomes relevant at high SNR [12, 16, 17]. While several of our analytical results were derived for high SNR, the scaling relations persist at low SNR and are confirmed by numerical simulations.
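Such a numerical check can be sketched in a few lines. The following minimal simulation is our own illustration (not the authors' code; the values of D, τ, and the measurement concentration κ_τ are arbitrary assumptions): the agent's belief is a von Mises distribution, which each time step is first broadened by the convolution sum rule 1/κ → 1/κ + 2Dτ and then multiplied with the new measurement likelihood via the product rule κe^{iµ} → κe^{iµ} + κ_τ e^{iψ_meas}, as summarized in appendix B.

```python
import numpy as np

def track_direction(D=0.02, tau=1.0, kappa_tau=2.0, steps=2000, seed=1):
    """Sequential Bayesian estimation of a direction under rotational diffusion.

    The agent's belief about the gradient direction is a von Mises
    distribution with mean mu and concentration kappa.
    Prediction step: convolution sum rule, 1/kappa -> 1/kappa + 2*D*tau.
    Update step: product rule, kappa*e^{i mu} += kappa_tau*e^{i psi_meas}.
    Returns the mean absolute error of the filtered estimate and of the
    raw single measurements, for comparison.
    """
    rng = np.random.default_rng(seed)
    wrap = lambda a: np.angle(np.exp(1j * a))  # map angle differences to (-pi, pi]
    psi_true = 0.0                   # true gradient direction
    mu, kappa = 0.0, kappa_tau       # initial belief
    err_filter, err_raw = [], []
    for _ in range(steps):
        # rotational diffusion of the true direction
        psi_true += np.sqrt(2.0 * D * tau) * rng.normal()
        # noisy directional measurement with von Mises sensing noise
        psi_meas = psi_true + rng.vonmises(0.0, kappa_tau)
        # predict: belief spreads out due to rotational diffusion
        kappa = 1.0 / (1.0 / kappa + 2.0 * D * tau)
        # update: multiply prior and measurement likelihood
        z = kappa * np.exp(1j * mu) + kappa_tau * np.exp(1j * psi_meas)
        mu, kappa = np.angle(z), np.abs(z)
        err_filter.append(abs(wrap(mu - psi_true)))
        err_raw.append(abs(wrap(psi_meas - psi_true)))
    return float(np.mean(err_filter)), float(np.mean(err_raw))

e_filter, e_raw = track_direction()
print(e_filter, e_raw)
```

With these (assumed) settings, the filtered estimate stays markedly closer to the true direction than any single noisy measurement, illustrating how a motility-aware filter gains precision by averaging only over the recent past.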
Biochemical implementation.
Storing the likelihood distribution of estimated gradient directions, or just a proxy thereof, requires internal memory. We speculate that the distribution of chemotactic effector molecules on the cell boundary, as considered in recent models such as LEGI (local excitation, global inhibition) [52] or balanced-inactivation models [53], can indeed serve as such a proxy. While the position of a concentration peak in such a distribution can represent a maximum-likelihood estimate, its amplitude could encode a level of certainty. Similarly, the directional persistence and polarization of crawling cells represent a form of effective memory [54, 55]. For marine sperm cells, the axis of their helical path likewise represents a consolidated memory of previous noisy concentration measurements [56].
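To make this proxy idea concrete, here is a small self-contained sketch (our own illustration; the direction µ = 1.2 and certainty κ = 4 are arbitrary assumptions, not values from a specific biochemical model): a von Mises likelihood stored as an effector-density profile on the cell boundary can be read out through its first circular Fourier mode, whose phase gives the maximum-likelihood direction and whose amplitude encodes certainty via the circular variance.

```python
import numpy as np

# Membrane positions (360 bins around the cell boundary).
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
dtheta = theta[1] - theta[0]

# Assumed belief to be stored: direction mu and certainty kappa (arbitrary).
mu_true, kappa_true = 1.2, 4.0

# Effector density along the membrane, proportional to the von Mises
# likelihood VM(theta; mu, kappa), normalized on the circle.
density = np.exp(kappa_true * np.cos(theta - mu_true))
density /= density.sum() * dtheta

# Read-out via the first circular Fourier mode: its phase recovers the
# maximum-likelihood direction; its magnitude R = I1(kappa)/I0(kappa)
# encodes certainty through the circular variance CV = 1 - R.
m1 = (density * np.exp(1j * theta)).sum() * dtheta
mu_readout = np.angle(m1)
cv_readout = 1.0 - np.abs(m1)

print(mu_readout, cv_readout)
```

The peak position thus acts as the estimate and the profile's modulation depth as its confidence, in line with the speculation above.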
Time-varying gradients.
Here, we considered only static concentration gradients. Yet, the general framework developed here generalizes in a straightforward manner to time-varying environments, provided their temporal statistics are known to the agent. For example, analogous results apply if, instead of rotational diffusion of the chemotactic agent, it is the direction of the concentration gradient that changes stochastically with a correlation time D^{−1} (e.g., due to an explicit time dependence of the concentration field, or due to active motion of the agent within a spatially complex concentration field). The seminal infotaxis strategy proposed a Bayesian framework for navigation in time-dependent concentration fields, using time-averaged properties of scalar turbulence [38]. In the presence of motility noise, this problem becomes considerably harder, for which our work can serve as a first step.

Acknowledgments
MN and BMF are supported by the German Research Foundation (DFG) through the Excellence Initiative by the German Federal and State Governments (Clusters of Excellence cfaed EXC-1056 and PoL EXC-2068), as well as DFG grant FR3429/3-1 to BMF. We thank all members of the 'Biological Algorithms Group' for stimulating discussions.

[1] H. C. Berg and D. A. Brown, Nature, 500 (1972).
[2] M. Eisenbach and L. Giojalas, Nat. Rev. Mol. Cell Biol., 276 (2006).
[3] L. Alvarez, B. M. Friedrich, G. Gompper, and U. B. Kaupp, Trends Cell Biol., 198 (2014).
[4] P. N. Devreotes and S. H. Zigmond, Ann. Rev. Cell Biol., 649 (1988).
[5] S. H. Zigmond, J. Cell Biol., 606 (1977).
[6] T. Gregor, D. W. Tank, E. F. Wieschaus, and W. Bialek, Cell, 153 (2007).
[7] H. C. Berg and E. M. Purcell, Biophys. J., 193 (1977).
[8] W. Bialek and S. Setayeshgar, Proc. Natl. Acad. Sci. U.S.A., 10040 (2005).
[9] K. Kaizu, W. De Ronde, J. Paijmans, K. Takahashi, F. Tostevin, and P. R. ten Wolde, Biophys. J., 976 (2014).
[10] W.-J. Rappel and H. Levine, Proc. Natl. Acad. Sci. U.S.A., 19270 (2008).
[11] B. Hu, W. Chen, W.-J. Rappel, and H. Levine, Phys. Rev. Lett., 048104 (2010).
[12] R. G. Endres and N. S. Wingreen, Proc. Natl. Acad. Sci. U.S.A., 15749 (2008).
[13] P. R. ten Wolde, N. B. Becker, T. E. Ouldridge, and A. Mugler, J. Stat. Phys., 1395 (2016).
[14] P. J. Van Haastert and M. Postma, Biophys. J., 1787 (2007).
[15] D. Mortimer, J. Feldner, T. Vaughan, I. Vetter, Z. Pujic, W. J. Rosoff, K. Burrage, P. Dayan, L. J. Richards, and G. J. Goodhill, Proc. Natl. Acad. Sci. U.S.A., 10296 (2009).
[16] D. Fuller, W. Chen, M. Adler, A. Groisman, H. Levine, W.-J. Rappel, and W. F. Loomis, Proc. Natl. Acad. Sci. U.S.A., 9656 (2010).
[17] G. Amselem, M. Theves, A. Bae, C. Beta, and E. Bodenschatz, Phys. Rev. Lett., 1 (2012).
[18] D. R. Brumley, F. Carrara, A. M. Hein, Y. Yawata, S. A. Levin, and R. Stocker, Proc. Natl. Acad. Sci. U.S.A., 10792 (2019).
[19] J. E. Segall, S. M. Block, and H. C. Berg, Proc. Natl. Acad. Sci. U.S.A., 8987 (1986).
[20] N. D. Kashikar, L. Alvarez, R. Seifert, I. Gregor, O. Jäckle, M. Beyermann, E. Krause, and U. B. Kaupp, J. Cell Biol., 1075 (2012).
[21] D. Hathcock, J. Sheehy, C. Weisenberger, E. Ilker, and M. Hinczewski, IEEE Trans. Mol. Biol. Multi-Scale Commun., 16 (2016).
[22] A. Celani and M. Vergassola, Proc. Natl. Acad. Sci. U.S.A., 1391 (2010).
[23] G. Aquino, L. Tweedy, D. Heinrich, and R. G. Endres, Sci. Rep., 1 (2014).
[24] A. M. Hein, D. R. Brumley, F. Carrara, R. Stocker, and S. A. Levin, J. R. Soc. Interface (2016).
[25] S. P. Strong, B. Freedman, W. Bialek, and R. Koberle, Phys. Rev. E, 4604 (1998).
[26] B. W. Andrews, T. M. Yi, and P. A. Iglesias, PLoS Comp. Biol., 1407 (2006).
[27] R. E. Kalman, Trans. ASME J. Basic Eng., 35 (1960).
[28] J. L. Melsa and D. L. Cohn, Decision and Estimation Theory (McGraw-Hill, New York, 1978).
[29] T. J. Kobayashi, Phys. Rev. Lett., 1 (2010).
[30] R. G. Endres and N. S. Wingreen, Phys. Rev. Lett., 158101 (2009).
[31] C. Zechner, G. Seelig, M. Rullan, and M. Khammash, Proc. Natl. Acad. Sci. U.S.A., 4729 (2016).
[32] M. Scholz, A. R. Dinner, E. Levine, and D. Biron, Proc. Natl. Acad. Sci. U.S.A., 9261 (2017).
[33] T. Mora and N. S. Wingreen, Phys. Rev. Lett., 248101 (2010).
[34] E. Libby, T. J. Perkins, and P. S. Swain, Proc. Natl. Acad. Sci. U.S.A., 7151 (2007).
[35] E. D. Siggia and M. Vergassola, Proc. Natl. Acad. Sci. U.S.A., E3704 (2013).
[36] A. Mayer, V. Balasubramanian, A. M. Walczak, and T. Mora, Proc. Natl. Acad. Sci. U.S.A., 8815 (2019).
[37] B. Hu, W. Chen, W.-J. Rappel, and H. Levine, Phys. Rev. E, 021917 (2011).
[38] M. Vergassola, E. Villermaux, and B. Shraiman, Nature, 406 (2007).
[39] I. Petrović and I. Marković (2012), pp. 707–712.
[40] G. Kurz, I. Gilitschenski, and U. D. Hanebeck, IEEE Aero. El. Sys. Mag., 70 (2016).
[41] J. Kromer, S. Märcker, S. Lange, C. Baier, and B. M. Friedrich, PLoS Comp. Biol., 1 (2018).
[42] R. A. Arkowitz, Trends Cell Biol., 20 (1999).
[43] C. L. Manahan, P. A. Iglesias, Y. Long, and P. N. Devreotes, Annu. Rev. Cell Dev. Biol., 223 (2004).
[44] M. Sarris and M. Sixt, Curr. Opin. Cell Biol., 93 (2015).
[45] R. Ma, G. S. Klindt, I. H. Riedel-Kruse, F. Jülicher, and B. M. Friedrich, Phys. Rev. Lett., 048101 (2014).
[46] B. M. Friedrich, Phys. Biol., 026007 (2008).
[47] P. J. Van Haastert, PLoS Comp. Biol. (2010).
[48] E. R. Jerison and S. R. Quake, bioRxiv (2019).
[49] H. C. Berg, Random Walks in Biology (Princeton Univ. Press, 1993).
[50] T. Strünker, L. Alvarez, and U. B. Kaupp, Curr. Opin. Neurobiol., 110 (2015).
[51] M. Ueda and T. Shibata, Biophys. J., 11 (2007).
[52] L. Ma, C. Janetopoulos, L. Yang, P. N. Devreotes, and P. A. Iglesias, Biophys. J., 3764 (2004).
[53] H. Levine, D. A. Kessler, and W.-J. Rappel, Proc. Natl. Acad. Sci. U.S.A., 9761 (2006).
[54] M. Skoge, H. Yue, M. Erickstad, A. Bae, H. Levine, A. Groisman, W. F. Loomis, and W.-J. Rappel, Proc. Natl. Acad. Sci. U.S.A., 14448 (2014).
[55] B. W. Andrews and P. A. Iglesias, PLoS Comp. Biol., 1 (2007).
[56] B. M. Friedrich and F. Jülicher, Phys. Rev. Lett., 068102 (2009).
[57] K. V. Mardia and P. E. Jupp, Directional Statistics (John Wiley & Sons, 2000).

A. DETAILS ON ANALYTICAL CALCULATIONS

A.1. Expectation values of higher moments
In deriving Eqs. (2) and (3) for the mean µ and covariance matrix Σ of a single measurement M, valid for N ≥ 4, we used

⟨ñ0⟩ = ν, ⟨ñ0²⟩ = ⟨ñ0⟩² + ν, ⟨ñ1⟩ = (αν/2) e^{iψ}, ⟨|ñ1|²⟩ = |⟨ñ1⟩|² + ν, ⟨ñ0 ñ1⟩ = ⟨ñ0⟩⟨ñ1⟩ + ⟨ñ1⟩. (S1)

Similarly,

⟨ñ1 |ñ1|²⟩ = ⟨ñ1⟩ [⟨|ñ1|²⟩ + ν + 1]. (S2)

For the special case N = 2, we find, different from Eq. (S1),

⟨ñ1⟩ = αν cos ψ. (S3)

For N = 3, we find the same first moments as in Eq. (S1), but a different covariance matrix,

Σ0^{(N=3)} = Σ0^{(N≥4)} + (αν/4) ( cos ψ, −sin ψ ; −sin ψ, −cos ψ ), (S4)

where Σ0^{(N≥4)} denotes the result for N ≥ 4. The reason is that the calculation of ⟨ñ1⟩ involves a sum of squared roots of unity, Σ_{j=1}^{N} η^{2j}, which is nonzero for N = 2, while the calculation of ⟨ñ1²⟩ for Σ involves a sum of cubed roots of unity, Σ_{j=1}^{N} η^{3j}, which is nonzero for N = 3.

A.2. Expected precision of a single measurement ⟨κτ⟩

In the double limit of high signal-to-noise ratio, SNR ≫ 1, and weak gradients, α ≪ 1, the factors in the definition of κτ = ñ0 |ñ1| A/ν are (approximately) statistically independent; hence ⟨κτ⟩ ≈ ⟨ñ0⟩⟨|ñ1|⟩⟨A⟩/ν = α⟨|ñ1|⟩, since ⟨A⟩ ≈ α. To compute ⟨|ñ1|⟩, we use the law of cosines,

c = |ñ1| = √(a² + b² − 2ab cos ϕ), where a = |⟨ñ1⟩| = αν/2 and b = |ñ1 − ⟨ñ1⟩|. (S5)

The isotropy of the covariance matrix in Eq. (3) implies that ϕ is a uniformly distributed random angle with probability distribution p(ϕ) = (2π)^{−1}, while b follows a χ-distribution with 2 degrees of freedom (namely, for ñ1′ − ⟨ñ1′⟩ and ñ1″ − ⟨ñ1″⟩); hence p(b) = (2b/ν) exp(−b²/ν). Now, ⟨c⟩ = ∫₀^∞ db p(b) ∮ dϕ p(ϕ) c(b, ϕ). The first integration, I(b) = ∮ dϕ p(ϕ) c(b, ϕ), results in an elliptic integral, which, however, can be well approximated by

I(b) ≈ a [1 + b²/(4a²)] for b ≤ a, and I(b) ≈ b [1 + a²/(4b²)] for b > a. (S6)

The second integration can now easily be done, yielding

⟨c⟩ = (αν/2) (1 + 1/(2 SNR) + …), (S7)

where the ellipsis represents terms that decay exponentially fast for SNR ≫ 1.

A.3. Prefactor in Eq. (7)
The prefactor Λ in Eq. (7) is independent of ψ and reads

Λ = P(M_{m+1} | M_m)^{−1} [(2π)³ (2 − α²) ν³]^{−1/2} exp[−(2/((2 − α²)ν)) (ñ0 − ν)²] exp[−(α²/((2 − α²)ν)) |ñ1|²]. (S8)

In the limit of low signal-to-noise ratio, SNR ≪ 1, and α₀ ≈ α, this expression simplifies to

Λ = P(M_{m+1} | M_m)^{−1} (νπ)^{−3/2} e^{−(ν − ñ0)²/ν} e^{−|ñ1|²/ν} + O(α² ν^{1/2}). (S9)

A.4. Precision of sequential measurements
We compute the second moment ⟨κ_m²⟩ of the measure of concentration of the marginal likelihood distribution L_m(ψ) of the estimated gradient angle ψ after m subsequent measurements, approximating said distribution by a von Mises distribution. Analogous to Eq. (11), we have

κ_m e^{iψ̂_m} = κ0 e^{iψ0} + Σ_{j=1}^{m} κτ^{(j)} e^{iψτ^{(j)}}, (S10)

where κτ^{(j)} is the measure of concentration of the j-th measurement M_j, and ñ0^{(j)} and ñ1^{(j)} = |ñ1^{(j)}| e^{iψτ^{(j)}} denote the respective zeroth and first Fourier modes of molecule counts in M_j. In the absence of rotational diffusion, subsequent measurements are independent random variables; hence, for j ≠ k,

⟨ñ0^{(j)} ñ1^{(j)} [ñ0^{(k)} ñ1^{(k)}]*⟩ = ⟨ñ0^{(j)} ñ1^{(j)}⟩ ⟨ñ0^{(k)} ñ1^{(k)}⟩*. (S11)

Thus, we find analogous to Eq. (12)

⟨κ_m²⟩ = κ0² + (α²/ν²) [ Σ_j ⟨ñ0^{(j)2} |ñ1^{(j)}|²⟩ + Σ_{j≠k} ⟨ñ0^{(j)} ñ1^{(j)} [ñ0^{(k)} ñ1^{(k)}]*⟩ ] + (α/ν) κ0 e^{iψ0} Σ_j ⟨ñ0^{(j)} ñ1^{(j)}⟩* + c.c. (S12a)
= κ0² + (α²/ν²) [ Σ_j ⟨ñ0^{(j)2} |ñ1^{(j)}|²⟩ + Σ_{j≠k} ⟨ñ0^{(j)} ñ1^{(j)}⟩ ⟨ñ0^{(k)} ñ1^{(k)}⟩* ] + (α/ν) κ0 e^{iψ0} Σ_j ⟨ñ0^{(j)} ñ1^{(j)}⟩* + c.c. (S12b)
= κ0² + 2(1 + κ0) m SNR + m² SNR² + O(α³, α³ν). (S12c)

In the presence of rotational diffusion, D > 0, Eq. (S12a) still holds, but Eq. (S12b) does not. We use the approximation that ñ0^{(j)} and ñ1^{(j)} are approximately independent for each measurement; hence

⟨κ_m²⟩ ≈ κ0² + (α²/ν²) [ Σ_j ⟨ñ0^{(j)2} |ñ1^{(j)}|²⟩ + Σ_{j≠k} ⟨ñ0^{(j)} ñ0^{(k)}⟩ ⟨ñ1^{(j)} ñ1^{(k)*}⟩ ] + (α/ν) κ0 e^{iψ0} Σ_j ⟨ñ0^{(j)} ñ1^{(j)}⟩* + c.c. (S13)

We first compute the second sum of expectation values in Eq. (S13) by evaluating a double geometric series,

Σ_{j≠k} ⟨ñ1^{(j)} ñ1^{(k)*}⟩ = (α²ν²/4) Σ_{j≠k} e^{−Dτ|j−k|}. (S14)

We derive the asymptotic scaling of the variance parameter σ_m² characterizing the distribution of estimation errors δ_m = ψ̂_m − ψ_m in section 4. We start with the ansatz σ_m² = (γm)^{1/2} + O(1) for m large. Inserting σ̂_m² ≈ ⟨κ_m²⟩^{−1/2} from Eq. (19) and στ² ≈ ⟨κτ⟩^{−1} ≈ SNR^{−1} from Eq. (10) into Eq. (21), we obtain a self-consistency condition

√(γm) = [√(γ(m−1)) + 2Dτ] / [√(Dτ/(m−1)) + 1] + √(Dτ/(m−1)) / [√(Dτ/(m−1)) + 1] · SNR^{−1}. (S17)

We expand the left-hand and right-hand sides of this equation into powers of m^{−1/2}, and match both the leading-order term O(m^{1/2}) and the first-order correction O(1): this yields γ = 2Dτ and validates the ansatz.

B. BASIC PROPERTIES OF CIRCULAR DISTRIBUTIONS

A probability distribution p(ψ) of angles should be 2π-periodic, i.e., p(ψ) = p(ψ + 2π), and normalized to one on the unit circle, i.e., ∮ dψ p(ψ) = 1. The circular variance of such a circular distribution is defined as

CV[p(ψ)] = 1 − |∮ dψ e^{iψ} p(ψ)|.
(S18)

An important circular distribution is the wrapped normal distribution

WN(ψ; µ, σ²) = Σ_{l=−∞}^{∞} (2πσ²)^{−1/2} exp[−(ψ − µ + 2πl)²/(2σ²)] (S19)

with variance parameter σ², whose circular variance reads CV = 1 − e^{−σ²/2}. In the limit σ ≪ 1, CV ≈ σ²/2. A second important circular distribution is the von Mises distribution

VM(ψ; µ, κ) = [2π I₀(κ)]^{−1} exp[κ cos(ψ − µ)], (S20)

where κ is the so-called measure of concentration or precision, and I_n(κ) is the modified Bessel function of order n. Its circular variance reads CV = 1 − I₁(κ)/I₀(κ) = 1/(2κ) + 1/(8κ²) + O(κ^{−3}).

The normalized product of two von Mises distributions, say VM(ψ; µ1, κ1) and VM(ψ; µ2, κ2), is again a von Mises distribution VM(ψ; µ, κ). Such a normalized product appears, e.g., in Bayes' formula, Eq. (6). Specifically, the mean µ and measure of concentration κ of the normalized product satisfy

κ e^{iµ} = κ1 e^{iµ1} + κ2 e^{iµ2}. (S21)

If the mean values are close, |µ1 − µ2| ≪ 1, we have the approximate sum rule κ ≈ κ1 + κ2.

In contrast, the circular convolution of two von Mises distributions VM(ψ; µ1, κ1) and VM(ψ; µ2, κ2) is only approximately a von Mises distribution [57]. To find such an approximation, one can first map the two von Mises distributions onto wrapped normal distributions of the same respective mean and circular variance, and compute the convolution of these wrapped normal distributions [40]. The convolution of two wrapped normal distributions, say with variance parameters σ1² and σ2², is again a wrapped normal distribution with new variance parameter σ² = σ1² + σ2². Finally, this new wrapped normal distribution WN(ψ; µ1 + µ2, σ²) is mapped back onto a von Mises distribution VM(ψ; µ, κ). Thus, VM(ψ; µ, κ) ≈ VM(ψ; µ1, κ1) ∗ VM(ψ; µ2, κ2) with µ = µ1 + µ2 and

I₁(κ)/I₀(κ) = exp(−σ²/2) = exp(−σ1²/2) exp(−σ2²/2) = [I₁(κ1)/I₀(κ1)] · [I₁(κ2)/I₀(κ2)].
In the limit κ1, κ2 ≫ 1, we have κ^{−1} = κ1^{−1} + κ2^{−1}.

For ease of reference, we highlight the sum rules for the measure of concentration of either a normalized product or a convolution of two von Mises distributions,

convolution: 1/κ ≈ 1/κ1 + 1/κ2, (S22)
multiplication: κ ≈ κ1 + κ2. (S23)

While Eq. (S22) is valid for κ1, κ2 ≫ 1, Eq. (S23) is valid for |µ1 − µ2| ≪ 1. Note that Eq. (S23) is not used until section 4.2; before that, we always use the exact expression Eq. (S21).

C. OPTIMAL AVERAGING TIME FOR ESTIMATION OF A SCALAR QUANTITY

Previous work addressed the optimal time span of temporal averaging for chemotaxis by temporal comparison [19]. In our notation, this amounts to estimating the scalar quantity s_true(t) = (d/dt) c(R(t)) = v_h · ∇c from a noisy input signal s(t), where the agent moves with velocity Ṙ = v_h.

As a minimal pedagogical model, we approximate s_true(t) as an Ornstein-Uhlenbeck process with ⟨s_true(t)⟩ = 0 and ⟨s_true(t) s_true(t − ∆t)⟩ = [(α c v)²/2] exp(−D|∆t|). We assume a measurement process with additive sensing noise, s(t) = s_true(t) + ξ(t), where ξ(t) denotes Gaussian white noise with ⟨ξ(t)⟩ = 0 and ⟨ξ(t) ξ(t′)⟩ = (σ²/τ) δ(t − t′). The agent shall perform temporal averaging using a linear filter χ(∆t),

ŝ(t) = ∫₀^∞ d∆t s(t − ∆t) χ(∆t). (S24)

We restrict ourselves to filters of exponential form, χ(∆t) = A exp(−∆t/t_χ), and ask for the optimal averaging time span t_χ. From the condition for a faithful estimator, ⟨ŝ(t)⟩ = s_true(t), we obtain the prefactor A as A = D + 1/t_χ (where this and subsequent expectation values are conditioned on s_true(t)). We minimize the variance of estimation errors δ(t) = ŝ(t) − s_true(t), which is equivalent to minimizing ⟨ŝ²⟩.
Using the autocorrelation function of s_true(t) above, we find

⟨ŝ²⟩ = A² ∫₀^∞ ∫₀^∞ d∆t1 d∆t2 s(t)² e^{−D|∆t1 − ∆t2|} e^{−(∆t1 + ∆t2)/t_χ}. (S25)

We compute this integral using the change of variables z1 = ∆t1 + ∆t2 and z2 = ∆t1 − ∆t2, and find

⟨ŝ²⟩ = s(t)² (D + 1/t_χ)² t_χ/D ∼ D t_χ + 2 + 1/(D t_χ). (S26)

This expectation value becomes minimal exactly for t_χ = D^{−1}, irrespective of the value of s_true(t). Thus, the optimal averaging time equals the rotational diffusion time for estimating this scalar quantity. Similar results were found for detailed models of bacterial chemotaxis [19]. Note that in this case, the time derivative d/dt is incorporated into the filter function itself, representing a smoothed time derivative.

D. INTERMEDIATE MEASUREMENTS GIVE (ALMOST) NO ADVANTAGE

As in the main text, we consider an agent that monitors a static environment with constant state S(t) = S, where the agent takes subsequent measurements M_j during time intervals T_j = ((j − 1)τ, jτ]. We ask if knowing the results M_j of the intermediate measurements confers any advantage compared to a single measurement of the same total duration mτ, corresponding to the sum M = Σ_{j=1}^{m} M_j, if rotational diffusion is absent, D = 0. The answer is 'no' for a Poisson point process, e.g., if the agent estimates an absolute concentration S = c by counting the number of independent molecular binding events M_j in a time interval T_j. We sketch the known proof for m = 2.
We want to show that

L(S | M1, M2; τ) = L(S | M1 + M2; 2τ), (S27)

where the left-hand and right-hand sides denote the likelihood of the state S estimated from two intermediate measurements M1 and M2, each of duration τ, or from a single measurement of duration 2τ given by M_+ = M1 + M2, respectively. We assume that measurements are Poisson distributed, i.e., P(M | S; τ) = e^{−µ} µ^M / M!, where the mean number of binding events µ = λcτ within a time interval τ is proportional to the concentration c. A straightforward application of the binomial formula yields P(M1 | S; τ) P(M2 | S; τ) = B · P(M_+ | S; 2τ) for the measurement probabilities, with a combinatorial prefactor B that is independent of S. Bayes' theorem then implies Eq. (S27) for an arbitrary prior L₀(S).

In the case µ ≫ 1, the Poisson distribution is well approximated by a normal distribution. One may thus assume that a statement similar to Eq. (S27) also holds true for a Gaussian measurement model with P(M_j | µ; τ) = N(M_j; µ, σ²), j = 1, 2, and P(M_+ | µ; 2τ) = N(M_+; 2µ, 2σ²), where µ and σ² are unknown parameters. (As above, N(x; µ, σ²) denotes the normal distribution with argument x, mean µ, and variance σ².) Intriguingly, the answer now depends on whether µ and σ are independent or not, according to the Bayesian prior L₀(S) = L₀(µ, σ). If µ and σ are independent, Eq. (S27) holds also for Gaussian measurements. If, however, µ and σ are not independent, say σ = σ(µ) is a function of µ, an explicit calculation yields

L(µ̂ | M1, M2; τ) = N(M_−; 0, 2σ(µ̂)²) · L(µ̂ | M1 + M2; 2τ), (S28)

up to normalization, where M_− = M1 − M2. In short, knowledge of M_− improves the estimate for σ, which, in turn, improves the estimate µ̂ of µ. In the limit dσ/dµ ≪ 1, the prefactor on the right-hand side of Eq. (S28) is approximately constant, displaying only relative changes on the order of [M_−/σ(µ)] dσ/dµ̂ |_{µ̂=µ}, which, with probability close to one, is small.
Specifically, this holds true for a diffusion approximation of a Poissonian measurement model, where σ(µ̂)² = µ̂, provided µ̂ ≫ 1.
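The m = 2 Poisson case can also be verified numerically. The following sketch is our own illustration (a flat prior on an arbitrary grid of mean counts, with example counts M1 = 8 and M2 = 13); it compares the posterior computed from the two intermediate measurements with the posterior computed from the pooled count:

```python
import math
import numpy as np

def poisson_pmf(k, lam):
    """Poisson probability P(M = k) for mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Grid of candidate mean counts mu = lambda*c*tau (arbitrary units),
# standing in for the unknown concentration c; flat prior assumed.
mu_grid = np.linspace(0.1, 30.0, 600)
dmu = mu_grid[1] - mu_grid[0]
M1, M2 = 8, 13   # example counts from two measurements of duration tau

# posterior from the two intermediate measurements
post_12 = np.array([poisson_pmf(M1, m) * poisson_pmf(M2, m) for m in mu_grid])
post_12 /= post_12.sum() * dmu

# posterior from the single pooled measurement of duration 2*tau
post_sum = np.array([poisson_pmf(M1 + M2, 2.0 * m) for m in mu_grid])
post_sum /= post_sum.sum() * dmu

# both normalized posteriors coincide, as stated by Eq. (S27)
print(np.max(np.abs(post_12 - post_sum)))
```

The combinatorial prefactor B drops out upon normalization, which is why the two posteriors agree to machine precision for the Poisson model.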