Noise Propagation in Biological and Chemical Reaction Networks
Dionysios Barmpoutis and Richard M. Murray
Computation and Neural Systems / Control and Dynamical Systems
California Institute of Technology
[email protected], [email protected]
April 6, 2019
Abstract
We describe how noise propagates through a network by calculating the variance of the outputs. Using stochastic calculus and dynamical systems theory, we study the network topologies that accentuate or alleviate the effect of random variance in the network, for both directed and undirected graphs. Given a linear tree network, the variance in the output is a convex function of the poles of the individual nodes. Cycles create correlations which in turn increase the variance in the output. Feedforward and feedback have a limited effect on noise propagation when the respective cycle is sufficiently long. Crosstalk between the elements of different pathways helps reduce the output noise, but makes the network slower. Next, we study the differences between disturbances in the inputs and disturbances in the network parameters, and how they propagate to the outputs. Finally, we show how noise correlations can affect the steady state of the system in chemical reaction networks with reactions of two or more reactants, each of which may be affected by independent or correlated noise sources.

Introduction and Overview
Noise is ubiquitous in nature, and virtually all signals carry some amount of random noise. In addition, even the simplest systems can be represented as a set of smaller subsystems interconnected with each other. There have been numerous studies on how noise affects specific functions (e.g. [1], [2] and references therein), but none of them has looked at how noise propagates in general networks, and how various network structures impact the robustness of each system to noise. Although there is evidence that it may degrade the system performance, noise is sometimes necessary for specific functions [3]. Networks in which information is transmitted through a means that is accessible by all the individual units of the network are prone to unwanted crosstalk interactions between various unrelated subsystems [4]. Both noise and crosstalk have been treated as something unwanted in engineering systems. However, they do not seem to be a problem in the cell, or in natural biological systems in general, despite the large number of noise sources, the variety of molecules, and the intricate patterns of interactions.

We present a new method to quantify the noise propagation in a system, and the vulnerability of each of its subsystems. We use results from graph theory and control systems theory to quantify noise propagation in networks, and use them to evaluate various network structures in terms of how well they filter out noise. We study how crosstalk can help suppress noise, when the noise sources are independent or correlated. We show that perturbations that depend on the state of the system (for example, feedback loops that are prone to noise or noisy degradation rates) have a fundamentally different effect on the system output, compared to noise in the inputs. Finally, we study noise propagation in chemical reaction networks where all reactants may introduce noise, and analytically find that noise correlations may affect the expected behavior of such systems.
In this section, we will briefly revisit some basic tools from control systems theory. Consider a linear time-invariant system with impulse response $h(t,s)$ [5]. The general form of the output when the input signal is $u(t)$ is
$$y(t) = \int_{-\infty}^{t} h(t,s)\,u(s)\,ds \qquad (1)$$
where $h(t)$ is the impulse response of the dynamical system. A system with $m$ inputs, $n$ states and $p$ outputs can be written in the form
$$S:\quad \frac{dx}{dt} = Ax + Bu, \qquad y = Cx, \qquad (2)$$
where the dimensions of matrices
$A$, $B$ and $C$ are $n \times n$, $n \times m$ and $p \times n$ respectively. The output of the system at time $t$ when the input is an impulse applied at time $s$ is
$$h(t,s) = Ce^{A(t-s)}B \qquad (3)$$
and equation (1) can be simplified to
$$y(t) = C\int_{-\infty}^{t} e^{A(t-s)}Bu(s)\,ds. \qquad (4)$$
When the network in question is comprised of elements whose outputs obey linear time-invariant differential equations, we can also find the Fourier transform of the network output:
$$H(\omega) = \int_{-\infty}^{+\infty} h(t)e^{-j\omega t}\,dt, \qquad (5)$$
where $h(t) = h(t,0)$ is the impulse response of the system and $\omega = 2\pi f$ is the angular frequency. The system is causal, so $h(t) = 0$ for $t < 0$. The limits of the mean $E[y(t)]$ and the variance $V[y(t)]$ of the output $y(t)$ will be denoted as $E[y]$ and $V[y]$ respectively:
$$E[y] = \lim_{t\to\infty} E[y(t)] \quad \text{and} \quad V[y] = \lim_{t\to\infty} V[y(t)]. \qquad (6)$$
If we know the impulse response of the system, the mean of the output vector can be expressed as
$$E[y(t)] = E\left[\int_{-\infty}^{t} h(t-s)u(s)\,ds\right] = \int_{-\infty}^{t} h(t-s)\,E[u(s)]\,ds, \qquad (7)$$
where in the last equation we have interchanged the expectation with the integration operator, assuming that the input functions are non-pathological and the quantities are finite, such that all the integrands are measurable in the respective measure space (Fubini's theorem, [6]). In what follows, we will always assume that all such conditions are satisfied.

The covariance matrix of the outputs, when applying the same input, is
$$V[y(t)] = E[y(t)y^T(t)] - E[y(t)]E[y^T(t)] = \int_{-\infty}^{t}\int_{-\infty}^{t} h(t-r)\left(E\!\left[u(r)u^T(s)\right] - E[u(r)]\,E\!\left[u^T(s)\right]\right)h^T(t-s)\,dr\,ds. \qquad (8)$$
If in addition $u(t) = 0$ for $t < 0$, then according to equation (6),
$$V[y] = \lim_{t\to\infty}\int_{0}^{t}\int_{0}^{t} h(t-r)\left(E\!\left[u(r)u^T(s)\right] - E[u(r)]\,E\!\left[u^T(s)\right]\right)h^T(t-s)\,dr\,ds. \qquad (9)$$

In this subsection, we will describe some elementary properties of the Wiener process that will be used in the following analysis. Let $\xi_n$, $n \in \mathbb{N}$, be a sequence of independent identically distributed random variables with zero mean and unit standard deviation. Their sum is
$$S_n = \sum_{k=1}^{n} \xi_k. \qquad (10)$$
We now define the limiting process
$$W_t = \lim_{n\to\infty} \frac{S_{\lfloor nt\rfloor}}{\sqrt{n}}. \qquad (11)$$
According to the Central Limit Theorem, the distribution of $W_t$ is independent of the distribution of the sequence $\xi_n$, as long as the $\xi_n$ have finite variance, are identically distributed and independent of each other. The random process $W_t$ is normally distributed with variance equal to the time interval in which it is measured:
$$W_t = \lim_{n\to\infty} \frac{S_{\lfloor nt\rfloor}}{\sqrt{nt}}\cdot\frac{\sqrt{nt}}{\sqrt{n}} \;\Longrightarrow\; W_t \sim \mathcal{N}(0, t). \qquad (12)$$
The difference of two sums $S_b - S_a$ with $a < b$ has the same distribution as the random variable $S_{b-a}$, and as a result
$$W_b - W_a \sim W_{b-a}, \qquad 0 \le a < b. \qquad (13)$$
Lastly, the random variables $W_b - W_a$ and $W_d - W_c$ are independent when $0 \le a < b \le c < d$, since the respective sums consist of independent random variables. More details on the properties of the Wiener process can be found in [6].

A graph (also called a network) is an ordered pair $G = (\mathcal{V}, \mathcal{E})$ comprised of a set $\mathcal{V} = \mathcal{V}(G)$ of vertices together with a set $\mathcal{E} = \mathcal{E}(G)$ of edges that are unordered 2-element subsets of $\mathcal{V}$. Two vertices $u$ and $v$ are called neighbors if they are connected through an edge ($(u,v) \in \mathcal{E}$), in which case we write $u - v$; otherwise we write $u \not- v$. The neighborhood $N_u$ of a vertex $u$ is the set of its neighbors. The degree of a vertex is the number of its neighbors. The order $N$ of a graph is the number of its vertices, $N = |\mathcal{V}|$. A graph's size, denoted by $m = |\mathcal{E}|$, is the number of its edges.
We will denote a graph $G$ of order $N$ and size $m$ as $G(N,m)$ or simply $G_{N,m}$. A path is a sequence of consecutive edges in a graph, and the length of the path is the number of edges traversed. The distance between two vertices $u$ and $v$, usually denoted by $d = d(u,v)$, is the length of the shortest path that connects these two vertices. A full cycle is a cycle that includes all the vertices of the network. A graph is connected if for every pair of vertices $u$ and $v$ there is a path from $u$ to $v$. Otherwise the graph is called disconnected. We will be focusing exclusively on connected graphs, because every disconnected graph can be analyzed as the sum of its connected components. A tree is a graph in which any two vertices are connected by exactly one path. A path graph is a tree with two or more vertices that has two vertices with degree 1, while all other vertices have degree 2. A thorough treatment of the graph theory notions used in this article can be found in [7].

In the state space, when the parameters of the system are deterministic and the input consists of a deterministic and a random component (white noise), the system (2) is defined by the stochastic differential equation
$$S:\quad dx = Ax\,dt + B(u_t\,dt + \Sigma_t\,dW_t), \qquad y = Cx, \qquad (14)$$
where $dW_t = W_{t+dt} - W_t$ is the standard vector Wiener process in the time interval $[t, t+dt)$ and $u_t$ is a deterministic input. We will denote the value of a function $f$ at time $t$ as $f(t)$ or $f_t$ interchangeably. The matrix $\Sigma_t$ consists of nonnegative entries, possibly time-varying, each of which is proportional to the strength of the corresponding disturbance input. Note that the only difference with the system (2) is that now the infinitesimal state difference $dx$ depends not only on the current state and the deterministic input, but also on a random term $dW_t \sim \mathcal{N}(0, dt)$.

It should be noted that the fraction $dW_t/dt$ does not exist as $dt \to 0$, so dividing both sides of equation (14) by $dt$ would not make sense. But this notation helps us to intuitively understand the effect of randomness in the system, when we know how the state of the system is affected by the randomness in the inputs. It also helps us to easily generalize these results when the randomness is a product of many noise sources, as we will see in the last section.

The different Wiener processes may be correlated with each other, but since each input may consist of a weighted sum of all of the different processes through multiplication by the matrix $\Sigma_t$, the analysis is simplified if we assume that they are independent.

The output of the system is the superposition of the deterministic output and the response to the random input:
$$y(t) = \int_{-\infty}^{t} h(t-s)\big(u(s)\,ds + \Sigma_s\,dW_s\big) = \int_{-\infty}^{t} h(t-s)u(s)\,ds + \int_{-\infty}^{t} h(t-s)\Sigma_s\,dW_s. \qquad (15)$$
The expected value of the output, according to equation (7), will be
$$E[y(t)] = \int_{-\infty}^{t} h(t-s)\,E[u(s)\,ds + \Sigma_s\,dW_s] = \int_{-\infty}^{t} h(t-s)u(s)\,ds, \qquad (16)$$
since Brownian motion is a martingale [6].

Applying equation (8) when the input is white noise, the covariance matrix can be written as
$$V[y] = \lim_{t\to\infty} V[y(t)] = \lim_{t\to\infty}\int_{-\infty}^{t}\int_{-\infty}^{t} h(t-r)\,\Sigma_r\,E\!\left[dW_r\,dW_s^T\right]\Sigma_s^T\,h^T(t-s). \qquad (17)$$
But since the inputs are assumed to be white noise processes, the covariance among all of them is nonzero only if they take place during the same interval, and in that case the covariance is proportional to the length of this interval.
Therefore,
$$V[y] = \int_{-\infty}^{t}\int_{-\infty}^{t} h(t-r)\left(\Sigma_r\sqrt{dr}\,\delta(r-s)\sqrt{ds}\,\Sigma_s^T\right)h^T(t-s) = \int_{-\infty}^{t} h(t-s)\,V(s)\,h^T(t-s)\,ds, \qquad (18)$$
where $V(s) = \Sigma_s\Sigma_s^T$ is the covariance matrix of the input random vector. For the linear time-invariant system (2) and white noise inputs of constant variance, $V(s)$ is a constant matrix, and we can write
$$V[y] = \int_{-\infty}^{t} \left(Ce^{A(t-s)}B\right)V\left(Ce^{A(t-s)}B\right)^T ds = C\left(\int_{0}^{+\infty} e^{Ax}BVB^Te^{A^Tx}\,dx\right)C^T. \qquad (19)$$

The mean and the variance of the output signal in the steady state can be written as a function of the Fourier transforms of the input signal and the network transfer function. From equation (7),
$$E[y(t)] = E\left[\int_{-\infty}^{t} h(t-s)u(s)\,ds\right] = h(t) * E[u(t)], \qquad (20)$$
where $f(t) * g(t)$ denotes the convolution of the two functions $f(t)$ and $g(t)$, given that it exists. When the input is constant with time, the expected value of the input is constant as well ($E[u(t)] = \mu_x$), and the last expression can be simplified to
$$E[y] = \mu_x\int_{0}^{+\infty} h(u)\,du = \mu_x H(0). \qquad (21)$$
If the input itself is not known, but its frequency content can be estimated, we can find the variance of the output using Parseval's theorem:
$$V[y] = E[y\,y^T] = \lim_{t\to\infty}\int_{-\infty}^{t} y(t)y^T(t)\,dt = \int_{-\infty}^{+\infty} |Y(f)|^2\,df = \int_{-\infty}^{+\infty} Y(f)Y^*(f)\,df = \int_{-\infty}^{+\infty} H(f)X(f)X^*(f)H^*(f)\,df. \qquad (22)$$
The formula above is useful if we know or can estimate the various frequencies of the input random processes. More generally, if we know the autocorrelation function of the random processes in the input, we may find the expected autocorrelation in the output, and then estimate the output variance:
$$R_y(\tau) = \int_{-\infty}^{+\infty} S_y(f)\cos(2\pi f\tau)\,df = \int_{-\infty}^{+\infty} |H(f)|^2 S_x(f)\cos(2\pi f\tau)\,df = \int_{-\infty}^{+\infty} |H(f)|^2\left(\int_{-\infty}^{+\infty} R_x(u)\cos(2\pi fu)\,du\right)\cos(2\pi f\tau)\,df. \qquad (23)$$

We will be focusing on Wiener processes exclusively, because this is the most general approach for sums of random disturbances. The Central Limit Theorem shows that the sum of a large number of independent identically distributed random variables with finite mean and variance always approaches the normal distribution (see also equation (12)). The only assumption in the case of additive disturbances is that the inputs at every time are sums of independent random variables of arbitrary distribution with finite standard deviation. This is a reasonable assumption in most settings. For example, in biology the Poisson distribution is frequently used to model random disturbances [1]. The Poisson distribution can be well approximated by a Gaussian when the event rate is greater than 10 (see [8]), and the same can be said for small sums of Poisson random variables. When the input disturbance at each time is correlated with the disturbances during earlier times, the correlation structure can be emulated by passing white noise through a filter that produces it. Also, in some applications, noise cannot be expected to have equal frequency content at all frequencies up to infinity. We can still use white noise as an input, which we can pass through a filter with zero response for all the frequencies outside the desired range.

Tree networks are a special case of networks where there is a unique path between every pair of vertices. In other words, there are no cycles, which makes the analysis of such networks easier. Many natural networks have been found to be locally tree-like [9]. When analyzing the behavior of a network around an equilibrium point, or if the network is linear, the analysis can be significantly simplified. Since there is a unique path from any vertex to another, it suffices to analyze path networks, which consist of all their vertices connected in series.
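As a sanity check on the machinery above, the steady-state variance formula (19) can be compared with a direct simulation of a single first-order node, the building block of the path networks analyzed next. The sketch below is illustrative (not from the paper; the parameter values are arbitrary): it integrates the scalar version of (14), $dx = -ax\,dt + \sigma\,dW_t$, with the Euler-Maruyama scheme and compares the empirical variance with the analytic value $\sigma^2/(2a)$.

```python
import math
import random

def ou_steady_state_variance(a, sigma, dt=1e-3, steps=200_000, burn_in=50_000, seed=1):
    """Euler-Maruyama integration of dx = -a*x dt + sigma dW (equation (14)
    with A = -a, B = C = 1).  Returns the empirical variance of x after the
    initial transient is discarded."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x = 0.0
    acc = acc2 = 0.0
    n = 0
    for k in range(steps):
        # dW ~ N(0, dt) is approximated by sqrt(dt) * N(0, 1)
        x += -a * x * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        if k >= burn_in:
            acc += x
            acc2 += x * x
            n += 1
    mean = acc / n
    return acc2 / n - mean * mean

a, sigma = 2.0, 1.0
simulated = ou_steady_state_variance(a, sigma)
exact = sigma**2 / (2 * a)   # equation (19) specialized to the scalar case
```

With these values the analytic variance is 0.25; the simulated estimate fluctuates around it with an error set by the finite sample size and the Euler discretization.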
For each output, the total response of the system is the superposition of the signals caused by all the individual inputs. First, we will show that in the case of random signals, the order of the nodes in the network does not matter in the case of linear pathways. Then, we will find the variance of a linear path graph assuming that every node is a first-order filter. The result can easily be generalized to the case of arbitrary tree graphs. Finally, we are going to find the optimal placement of poles so that the noise suppression is maximized.

Output Variance of Linear Pathways
Lemma 1. The noise response of a linear pathway is independent of the relative position of its nodes.

Proof.
Without loss of generality, we can assume that the linear pathway has one input and one output. Otherwise, since the system is linear, we can repeat the process each time considering only the respective subtree. Under the last assumption, the output is the state of the last node, and all inputs affect only the first node. From equation (22), with $M$ inputs and $N$ nodes whose transfer functions are $h_1(f), \ldots, h_N(f)$:
$$\begin{aligned} V[y] &= \int_{-\infty}^{+\infty} H(f)X(f)X^*(f)H^*(f)\,df\\ &= \int_{-\infty}^{+\infty} H(f)\big(X_1(f)+\ldots+X_M(f)\big)\big(X_1^*(f)+\ldots+X_M^*(f)\big)H^*(f)\,df\\ &= \sum_{k=1}^{M}\sum_{m=1}^{M}\int_{-\infty}^{+\infty} X_k(f)X_m^*(f)H(f)H^*(f)\,df\\ &= \sum_{k=1}^{M}\sum_{m=1}^{M}\int_{-\infty}^{+\infty} X_k(f)X_m^*(f)\big(h_1(f)\cdots h_N(f)\big)\big(h_N^*(f)\cdots h_1^*(f)\big)\,df\\ &= \sum_{k=1}^{M}\sum_{m=1}^{M}\int_{-\infty}^{+\infty} X_k(f)X_m^*(f)\prod_{n=1}^{N}|h_n(f)|^2\,df. \end{aligned} \qquad (24)$$
It is evident that we can interchange the transfer functions inside the product in the integral without changing its value.

Assume that we have a linear pathway such that the system is linear, described by equation (2), where the dynamical, input and output matrices are
$$A = \begin{pmatrix} -d_1 & & & \\ f_1 & -d_2 & & \\ & \ddots & \ddots & \\ & & f_{N-1} & -d_N \end{pmatrix}, \qquad B = \begin{pmatrix} 1\\ 0\\ \vdots\\ 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & \cdots & 0 & 1 \end{pmatrix}.$$
For simplicity, we assume that there is only one noise source and only one output, but since there are no cycles, there is a unique path from each node to every other, which means we can use the result for a linear pathway repeatedly in order to find the total variance. The variance is independent of the deterministic input that is applied to the pathway, since the system is linear. Using equation (19), and after performing all calculations, the variance at the output will be
$$V_{out} = \left(\prod_{u=1}^{N-1} f_u\right)^2\sum_{k=1}^{N}\sum_{m=1}^{N}\frac{1}{(d_k+d_m)\displaystyle\prod_{a=1,a\neq k}^{N}(d_k-d_a)\prod_{b=1,b\neq m}^{N}(d_m-d_b)}. \qquad (25)$$
The expression above holds even if there exist two vertices $a$ and $b$ such that their reaction rates are equal, according to the next Lemma.

Lemma 2.
The output variance of a linear pathway does not depend on the difference of any of the reaction rates.

Proof.
We pick two rates $d_x$ and $d_y$ and show that $V_{out}$ does not depend on their difference. If we denote
$$T_{k,m} = \frac{1}{(d_k+d_m)\displaystyle\prod_{a=1,a\neq k}^{N}(d_k-d_a)\prod_{b=1,b\neq m}^{N}(d_m-d_b)}, \qquad (26)$$
the difference $d_x - d_y$ appears only in the terms $T_{x,x}$, $T_{x,y}$, $T_{y,x}$ and $T_{y,y}$. Their sum $T_{x-y}$ is equal to
$$T_{x-y} = T_{x,x} + T_{x,y} + T_{y,x} + T_{y,y} = \frac{1}{2d_x(d_x-d_y)^2\prod_{s\neq x,y}(d_x-d_s)^2} + \frac{1}{2d_y(d_y-d_x)^2\prod_{s\neq x,y}(d_y-d_s)^2} - \frac{2}{(d_x+d_y)(d_y-d_x)^2\prod_{s\neq x,y}(d_x-d_s)(d_y-d_s)}. \qquad (27)$$
We set
$$P_x = \prod_{s\neq x,y}(d_x-d_s) \quad \text{and} \quad P_y = \prod_{s\neq x,y}(d_y-d_s) \qquad (28)$$
so that the sum above can be written as
$$T_{x-y} = \frac{d_x(d_x+d_y)P_x^2 + d_y(d_x+d_y)P_y^2 - 4d_xd_yP_xP_y}{2d_xd_y(d_x+d_y)(d_y-d_x)^2P_x^2P_y^2}. \qquad (29)$$
Expanding the numerator of $T_{x-y}$ and grouping the relevant terms together:
$$T_{x-y} = \frac{d_x^2P_x^2 + d_xd_yP_x^2 + d_xd_yP_y^2 + d_y^2P_y^2 - 4d_xd_yP_xP_y}{2d_xd_y(d_x+d_y)(d_y-d_x)^2P_x^2P_y^2} = \frac{\left(d_x^2P_x^2 - 2d_xd_yP_xP_y + d_y^2P_y^2\right) + d_xd_y\left(P_x^2 - 2P_xP_y + P_y^2\right)}{2d_xd_y(d_x+d_y)(d_y-d_x)^2P_x^2P_y^2} = \frac{(d_xP_x - d_yP_y)^2 + d_xd_y(P_x - P_y)^2}{2d_xd_y(d_x+d_y)(d_y-d_x)^2P_x^2P_y^2}. \qquad (30)$$
It is easy to see that both terms in the numerator of the last fraction vanish to order $(d_y-d_x)^2$ as $d_y \to d_x$, so the fraction remains finite and does not depend on the square difference $(d_y-d_x)^2$, and the Lemma is proved.

Lemma 3.
Assume that the same noise source is applied to two different pathways with impulse responses $h_1(t)$ and $h_2(t)$ respectively. The covariance of the signals in their outputs will be equal to
$$C(\tau) = \lim_{t\to\infty} E[y_1(t)y_2(t+\tau)] = \int_{0}^{\infty} h_1(r)h_2(r+\tau)\,dr. \qquad (31)$$

Proof.
The two outputs $y_1(t)$ and $y_2(t)$ are equal to
$$y_1(t) = \int_{-\infty}^{t} h_1(t-x)\,dW_x \quad \text{and} \quad y_2(t) = \int_{-\infty}^{t} h_2(t-y)\,dW_y, \qquad (32)$$
where $W_t$ is the Wiener process that drives both systems simultaneously. Taking the expected value of the product of the first and a delayed version of the second,
$$\begin{aligned} C(\tau) &= \lim_{t\to\infty} E\left[\int_{-\infty}^{t} h_1(t-x)\,dW_x\cdot\int_{-\infty}^{t+\tau} h_2(t+\tau-y)\,dW_y\right]\\ &= \lim_{t\to\infty}\int_{-\infty}^{t}\int_{-\infty}^{t+\tau} h_1(t-x)h_2(t+\tau-y)\,E[dW_x\,dW_y]\\ &= \lim_{t\to\infty}\int_{-\infty}^{t} h_1(t-s)h_2(t+\tau-s)\,ds = \int_{0}^{\infty} h_1(r)h_2(r+\tau)\,dr. \end{aligned} \qquad (33)$$

Corollary 1.
Assume that noise from a single noise source with standard deviation $\sigma$ enters a network and propagates through $N$ independent pathways to reach the output. If the impulse responses of the independent pathways are $h_1(t), h_2(t), \ldots, h_N(t)$ respectively, the mean of the output $y$ will be zero, and its variance equal to
$$V_{out} = \sigma^2\int_{0}^{\infty}\left(\sum_{k=1}^{N} a_kh_k(x)\right)^2 dx. \qquad (34)$$

Proof.
The output vertex will receive a weighted sum of the outputs of the $N$ independent pathways,
$$z(t) = \sum_{k=1}^{N} a_ky_k(t). \qquad (35)$$
Its expected value is equal to zero at all times:
$$E[z(t)] = E\left[\sum_{k=1}^{N} a_ky_k(t)\right] = \sum_{k=1}^{N} E[a_ky_k(t)] = \sum_{k=1}^{N} a_k\int_{-\infty}^{t} h_k(t-x)\,\sigma E[dW_x] = 0. \qquad (36)$$
The variance is equal to
$$\begin{aligned} V_y &= \lim_{t\to\infty} V_y(t) = \lim_{t\to\infty} E[z^2(t)]\\ &= \lim_{t\to\infty} E\left[\left(\int_{-\infty}^{t}\sum_{k=1}^{N} a_kh_k(t-x)\,dW_x\right)\cdot\left(\int_{-\infty}^{t}\sum_{k=1}^{N} a_kh_k(t-y)\,dW_y\right)\right]\\ &= \lim_{t\to\infty}\int_{-\infty}^{t}\int_{-\infty}^{t}\left(\sum_{k=1}^{N} a_kh_k(t-x)\right)\left(\sum_{k=1}^{N} a_kh_k(t-y)\right)E[dW_x\,dW_y]\\ &= \lim_{t\to\infty}\int_{-\infty}^{t}\sigma^2\left(\sum_{k=1}^{N} a_kh_k(t-s)\right)^2 ds = \sigma^2\int_{0}^{\infty}\left(\sum_{k=1}^{N} a_kh_k(x)\right)^2 dx. \end{aligned} \qquad (37)$$

Suppose we have a linear pathway with each element representing a single-pole linear filter, and we need to pick the positions of the poles such that the variance in the output is minimized. The next lemma shows an easy way to find the pathway if all its vertices are identical and subject to symmetric constraints.

Definition 1.
A symmetric multivariable function $f : \mathbb{R}^n \to \mathbb{R}$ is a function for which $f(x) = f(\pi(x))$, where $\pi(x)$ is an arbitrary permutation of the input vector $x$.

Lemma 4.
Assume that a symmetric multivariable function $f : \mathbb{R}^n \to \mathbb{R}$ is nowhere constant and has a sign definite Hessian matrix. Then it has a unique extremum under symmetric constraints, such that all the elements of the input vector $x$ are equal.

Proof. Since the Hessian has the same sign everywhere, the function $f$ is strictly convex or strictly concave. We will assume that $f$ is strictly convex, noting that the proof is similar when $f$ is concave. Assume that the extremum of the function $f$ is equal to $f^*$, and the argument that achieves this is $x^*$. Further assume that $\min(x^*) = m$ and $\max(x^*) = M$ are the minimum and maximum elements of the vector $x^*$ respectively. Since $f$ is symmetric,
$$f(m, M, x_3^*, \ldots, x_n^*) = f(M, m, x_3^*, \ldots, x_n^*) = f^*, \qquad (38)$$
where the arguments still satisfy the symmetric constraints. But since $f$ is strictly convex, every convex combination $(a,b) = t(m,M) + (1-t)(M,m)$ of these points satisfies
$$f(a, b, x_3^*, \ldots, x_n^*) \le tf(m, M, x_3^*, \ldots, x_n^*) + (1-t)f(M, m, x_3^*, \ldots, x_n^*) = tf^* + (1-t)f^* = f^*. \qquad (39)$$
Generalizing the last argument, it is straightforward to see that
$$f(x_1, x_2, \ldots, x_n) = f^* \quad \text{for every } m \le x_1, x_2, \ldots, x_n \le M. \qquad (40)$$
Therefore, $f(x)$ needs to be constant in that area, which contradicts the assumption that the function has a sign definite Hessian.

When the constraints are convex but not necessarily symmetric, we can use the Lagrangian to find the optimal parameters. Coming back to the linear pathway network, and assuming that the input is white noise, if the poles of the different nodes are placed at $a_1, a_2, \ldots, a_N$, the total variance in the output is equal to (see equation (22)):
$$V_{out}(a_1, a_2, \ldots, a_N) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\left|\frac{1}{j\omega+a_1}\right|^2\cdot\left|\frac{1}{j\omega+a_2}\right|^2\cdots\left|\frac{1}{j\omega+a_N}\right|^2 d\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\frac{1}{\omega^2+a_1^2}\cdot\frac{1}{\omega^2+a_2^2}\cdots\frac{1}{\omega^2+a_N^2}\,d\omega. \qquad (41)$$
The function $V_{out}$ is convex with respect to all its arguments $a_1, a_2, \ldots, a_N$, as an (infinite) sum of products of convex functions. Consequently, it has a unique minimum under convex constraints. The Lagrangian of the function $V_{out}$ is
$$L(a_1, a_2, \ldots, a_N) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\frac{1}{\omega^2+a_1^2}\cdots\frac{1}{\omega^2+a_N^2}\,d\omega - \lambda g(a_1, a_2, \ldots, a_N). \qquad (42)$$
Differentiating with respect to $a_k$, under the Leibniz integral rule:
$$\frac{\partial L}{\partial a_k} = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\frac{1}{\omega^2+a_1^2}\cdots\frac{-2a_k}{(\omega^2+a_k^2)^2}\cdots\frac{1}{\omega^2+a_N^2}\,d\omega = \lambda\frac{\partial g(a_1, \ldots, a_N)}{\partial a_k} \qquad (43)$$
for every $k$. Differentiating with respect to all the parameters will give us $N$ equations, and we have one more equation by requiring $g(a_1, \ldots, a_N) = 0$. So we can solve the system of $N+1$ equations and $N+1$ unknowns $\lambda, a_1, \ldots, a_N$, which is guaranteed to have a unique solution as all functions are convex.

In conclusion, we can find the unique minimum of the variance of a linear pathway when each node is a single-pole linear filter with real negative poles. Given that a linear tree network with independent noise inputs can be decomposed into many linear pathways, this method can be applied to any arbitrary network without cycles.

In a serial pathway where each vertex acts as a filter, the output at each node has a different frequency content as the noise propagates through the network, being filtered at each step. The variance at each node is decreasing as we move further from the noise source, as is shown in Figure 1. As the serial pathway becomes longer, the input and the output become less correlated since their distance increases. In addition, every node changes the phase of its inputs, which also contributes to the decreased correlation. Therefore, applying negative feedback or setting up a feedforward cycle can only have a measurable effect if the cycle length is relatively small.
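The pole-placement results above are easy to check numerically. The sketch below (illustrative values, not from the paper) evaluates equation (41) by direct quadrature: permuting the poles leaves $V_{out}$ unchanged (Lemma 1), and under a fixed pole budget $a_1 + a_2$, placing both poles at the same location gives a smaller variance than an asymmetric placement (Lemma 4).

```python
import math

def vout(poles, w_max=200.0, n=200_000):
    """Trapezoidal evaluation of equation (41):
    V_out = (1/(2*pi)) * integral of prod_k 1/(w^2 + a_k^2) dw."""
    dw = 2.0 * w_max / n
    total = 0.0
    for i in range(n + 1):
        w = -w_max + i * dw
        p = 1.0
        for a in poles:
            p /= (w * w + a * a)
        total += (0.5 if i in (0, n) else 1.0) * p
    return total * dw / (2.0 * math.pi)

# Lemma 1: the order of the nodes does not matter.
v13, v31 = vout([1.0, 3.0]), vout([3.0, 1.0])

# Lemma 4: with a_1 + a_2 = 4 fixed, equal poles minimize the variance.
v_equal, v_split = vout([2.0, 2.0]), vout([1.0, 3.0])
```

For two poles the integral has the closed form $1/(2a_1a_2(a_1+a_2))$, so $V_{out}(1,3) = 1/24$ and $V_{out}(2,2) = 1/32$; the quadrature reproduces both values and confirms $V_{out}(2,2) < V_{out}(1,3)$.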
Figure 2 shows the covariances and correlations among the vertices of two simple linear pathways, one unidirectional and one bidirectional, as they are depicted in Figure 1.

Cycles can significantly increase the effect of noise in the system. There are two reasons for this: first, the noise can now reach more vertices since the average distance among nodes decreases, and second, every node now receives the same disturbance from at least two different paths, and the two signals are correlated, contributing to larger variance. An example is shown in Figure 3, where we compare the average variance of two systems whose only difference is the connection between the first and the last node. Both networks receive the same inputs, but in the cycle network the variance is much larger. The effect of the noise is even more pronounced when there is correlation among the noise inputs to different nodes.

The effect of cycles on the output noise can be reduced if we make sure that each independent pathway also changes the phase of its input by different amounts. Different phases in the output (for at least a relatively large frequency spectrum) will ensure that the various frequencies partially cancel each other, reducing the output variation.

Figure 1: Variance of the output of a unidirectional and a bidirectional serial pathway as a function of the pathway length. All nodes are assumed to be identical single-pole filters. In the unidirectional pathway, each node is affected only by the node immediately preceding it, whereas in the bidirectional pathway each intermediate vertex receives input from the node preceding and the node succeeding it. The bidirectional pathway is much more efficient in filtering out noise. The variance for both pathways decreases with the pathway length. The bidirectional pathway has variance very close to zero even when it is relatively short.
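The downward trend for the unidirectional pathway in Figure 1 can be reproduced in state space. The sketch below (illustrative, with unit rates $d_k = f_k = 1$; not the authors' code) computes the steady-state covariance of equation (19) as the fixed point of the Lyapunov differential equation $\dot P = AP + PA^T + BB^T$ and reads off the variance of the last node for increasing pathway lengths.

```python
def chain_output_variance(n_nodes, d=1.0, f=1.0, dt=2e-3, t_final=25.0):
    """Steady-state variance of the last node of a unidirectional chain.
    P = integral of e^(As) B B^T e^(A^T s) ds (equation (19)) is obtained
    as the fixed point of dP/dt = A P + P A^T + B B^T.  Unit-strength
    noise enters at the first node only."""
    N = n_nodes
    # A: -d on the diagonal, f on the subdiagonal (the chain of equation (2))
    A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        A[i][i] = -d
        if i > 0:
            A[i][i - 1] = f
    Q = [[0.0] * N for _ in range(N)]
    Q[0][0] = 1.0                       # B = (1, 0, ..., 0)^T
    P = [[0.0] * N for _ in range(N)]
    for _ in range(int(t_final / dt)):
        # explicit Euler step of dP/dt = A P + P A^T + Q (P stays symmetric)
        AP = [[sum(A[i][k] * P[k][j] for k in range(N)) for j in range(N)]
              for i in range(N)]
        P = [[P[i][j] + dt * (AP[i][j] + AP[j][i] + Q[i][j])
              for j in range(N)] for i in range(N)]
    return P[N - 1][N - 1]

variances = [chain_output_variance(n) for n in (1, 2, 3, 4)]
```

For these parameters the output variances for $N = 1, \ldots, 4$ come out to 0.5, 0.25, 0.1875 and 0.15625: each added node filters the noise further, as in the left panel of Figure 1.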
When a pathway significantly reduces the frequency content, or has small gain for most frequencies, then correlations do not play a significant role. This behavior is clearly shown in Figure 4 for a unidirectional cycle and in Figure 5 for a bidirectional cycle. Phase shifts in a pathway are equivalent to time delays, as we will see in the next section.

Similarly, negative feedback carefully applied to a network contributes to better disturbance rejection. When the disturbance is white noise, the effect of feedback is smaller as the feedback cycle gets longer.

Figure 2: Covariance and correlation among all pairs of nodes in a linear pathway, shown as covariance and correlation matrices for a unidirectional and a bidirectional pathway. Every square $(x,y)$ in the matrices corresponds to the value of the correlation $R_{x,y}(\tau = 0)$ of nodes at distances $x$ and $y$ from the origin, $0 \le x, y \le N-1$. As the distance $|x-y|$ among the nodes increases, their covariance and correlation decrease. The covariance among nodes of the same distance in the unidirectional pathway decreases, and the correlation among them increases towards the end. The covariance of the nodes in the bidirectional pathway is essentially zero within a small distance, and the correlation is larger even when the distance is relatively large.

Figure 3: Average variance of all nodes in a network with a cycle as compared to an identical network without the feedback loop. Every node has a noise input which is then spread through the network. The average variance of all the nodes in the cycle is normalized by the variance of the respective serial pathway. The variance of the cycle is always much larger than the variance of the simple serial pathway when the noise inputs for each node are uncorrelated (bottom left). The ratio becomes even larger when the inputs are correlated (bottom right).
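The comparison in Figure 3 can be sketched with the same Lyapunov machinery. Below, an illustrative three-node example (not the authors' simulation): the steady-state covariance solving $AP + PA^T + Q = 0$ is computed for a chain and for the same chain with a positive edge closing a cycle from the last node back to the first, with independent unit-strength noise entering every node ($Q = I$).

```python
def average_variance(A, dt=2e-3, t_final=60.0):
    """Average steady-state node variance for dx = A x dt + dW with
    independent unit-strength noise at every node (Q = I), from the
    fixed point of dP/dt = A P + P A^T + Q (equation (19))."""
    N = len(A)
    P = [[0.0] * N for _ in range(N)]
    for _ in range(int(t_final / dt)):
        AP = [[sum(A[i][k] * P[k][j] for k in range(N)) for j in range(N)]
              for i in range(N)]
        P = [[P[i][j] + dt * (AP[i][j] + AP[j][i] + (1.0 if i == j else 0.0))
              for j in range(N)] for i in range(N)]
    return sum(P[i][i] for i in range(N)) / N

chain = [[-1.0, 0.0, 0.0],
         [1.0, -1.0, 0.0],
         [0.0, 1.0, -1.0]]
cycle = [row[:] for row in chain]
cycle[0][2] = 0.5          # feedback edge from node 3 back to node 1 (still stable)

v_chain = average_variance(chain)
v_cycle = average_variance(cycle)
```

Solving the Lyapunov equation by hand for these matrices gives an average variance of about 0.729 for the chain and 1.132 for the cycle: the recirculated, correlated disturbance raises the variance by roughly 55%, in line with the ratios shown in Figure 3.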
The correlation and covariance among vertices decrease with distance, and the variance of each node decreases as the length of the pathway increases. Furthermore, as we move towards the end of the pathway, the covariance of nodes of a given distance decreases but the correlation of nodes of a given distance increases. The last observation is easily justified taking into account that each new node introduces a virtual filter, and the outputs of nodes will tend to have very similar frequency content the more filters the signal has gone through. Moreover, from the Bode plot of a filter, we can easily see that for the frequencies that are not affected by the filter, their phase is also relatively unaffected, which does not decrease their correlation.

The previous analysis hints at the fact that feedback cycles have limited utility when applied to long pathways. Figure 6 shows the variance of the output after we apply negative feedback to a linear pathway. The darkness of each element $(m,n)$ of the upper triangular matrix shows the standard deviation of the pathway output when we apply feedback from node $n$ to node $m$. As one would expect, the effect of feedback is directly proportional to the correlation between the source and target vertices.

Figure 4: A network consisting of a feedforward cycle and the corresponding noise strength in its output. If the nodes of the network have poles with relatively small absolute values, then the output variance may be larger than the variance in the intermediate nodes. A fixed number of identical nodes is divided into two pathways, whose output is combined in the output node. If the number of nodes is similar in both pathways, then their outputs are highly correlated, and when combined produce large random swings. This does not happen when the poles of each node have a large negative real part (right).
The same holds for feedforward loops, both positive and negative. In the case of a negative feedforward loop, the variance in the output increases as the loop length increases. When the feedforward interaction is positive, the variance decreases at first, since the correlation among the different states also decreases, but then goes up, partly because when it affects a node towards the end of the pathway, it does not pass through successive filters, so the variance does not have the chance to decrease (see Figure 7).

Figure 5: Correlations increase the variance in bidirectional networks. If the outputs of two pathways that are correlated are combined, then the output has relatively large variance. Here, a single output receives input from two pathways of different lengths, which consist of identical nodes. Bidirectional pathways filter noise very effectively as shown before, and the output variance is still small.
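The variance inflation caused by combining correlated pathway outputs (Figures 4 and 5) follows directly from Corollary 1. The sketch below (single-pole pathways with illustrative unit poles and weights $a_1 = a_2 = 1$, not values from the paper) evaluates equation (34) for two pathways driven by the same noise source and compares it against the variance the output would have if the two noise sources were independent.

```python
import math

def integrate(f, upper=40.0, n=200_000):
    """Plain trapezoidal rule on [0, upper]."""
    h = upper / n
    total = 0.5 * (f(0.0) + f(upper))
    for i in range(1, n):
        total += f(i * h)
    return total * h

# Two identical single-pole pathways driven by the SAME Wiener process.
h1 = lambda t: math.exp(-t)
h2 = lambda t: math.exp(-t)

# Corollary 1: V_out = integral of (h1 + h2)^2.
v_correlated = integrate(lambda t: (h1(t) + h2(t)) ** 2)

# With independent noise sources, the cross term 2 * integral of h1*h2
# (the covariance of Lemma 3 at tau = 0) would be absent.
v_independent = integrate(lambda t: h1(t) ** 2) + integrate(lambda t: h2(t) ** 2)
```

For identical pathways the correlated combination doubles the variance relative to independent sources (2.0 versus 1.0 here), which is exactly the mechanism behind the large random swings in Figure 4.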
As one would expect, adding delay to the interactions among any nodesin a network driven by noise decreases their correlation, meaning that anyfeedforward or feedback cycles will have a smaller effect. The covarianceof a white noise process with a delayed version of the same signal can be21 n Noise Output − (a) Feedback Topology Figure O u t pu t V a r i a n ce (b) Output Variance Figure 6: A serial pathway with a unit feedback loop. The matrix on the rightconsists of squares ( m, n ), each of which represents the variance of the output whenfeedback is applied from node n to node m . The result of the feedback loop onlydepends on the distance d = | n − m | , and the variance decreases as the length ofthe feedback loop becomes smaller, and vice versa. m n Noise Output (a) Feedforward Loop O u t pu t V a r i a n ce (b) Negative Feedforward Loop O u t pu t V a r i a n ce (c) Positive Feedforward Loop Figure 7: Output variance of a linear pathway when the input is white noise,and we add a negative (left) or positive (right) feedforward loop starting fromthe first vertex. For the positive loop, the variance is largest when we connectnearby vertices (large correlation) or we connect an early vertex to the end of thepathway, since it has a large variance that is transmitted directly to the outputwithout being further filtered. V τ [ y ] = lim t →∞ E [ y ( t ) y ( t + τ )]= lim t →∞ E "(cid:18)Z t −∞ h ( t − r )Σ r dW r (cid:19) (cid:18)Z t + τ −∞ h ( t + τ − s )Σ s dW s (cid:19) T = lim t →∞ Z t −∞ Z t + τ −∞ h ( t − s )Σ r E (cid:2) dW r dW Ts (cid:3) Σ Ts h T ( t + τ − s )= lim t →∞ Z t −∞ Z t + τ −∞ h ( t − r )Σ r √ drδ ( s − r ) √ ds Σ s h T ( t + τ − s )= lim t →∞ Z t −∞ h ( t − s ) V s h T ( t + τ − s ) ds = lim t →∞ Z t h ( t − s ) V s h T ( t + τ − s ) ds. (44)If the system is causal, linear and time invariant, and the disturbance is whitenoise of constant strength added to the input, V τ [ y ] = Z ∞ h ( u ) V h T ( u + τ ) du. 
(45)

As a specific example, if the impulse response is h(t) = Ce^{At}B and the covariance matrix is constant:

V_τ[y] = ∫_0^∞ Ce^{As}BVBᵀe^{(s+τ)Aᵀ}Cᵀ ds = C (∫_0^∞ e^{As}BVBᵀe^{sAᵀ} ds) e^{τAᵀ} Cᵀ.   (46)

Note that the last equation is similar to equation (19), except for the exponential delay term at the end. We assume that the dynamical matrix A has negative eigenvalues, otherwise the system is not stable. If the delay is τ >
0, then

‖V_τ‖ = ‖C (∫_0^∞ e^{As}BVBᵀe^{Aᵀs} ds) e^{Aᵀτ} Cᵀ‖
≤ ‖C (∫_0^∞ e^{As}BVBᵀe^{Aᵀs} ds) Cᵀ‖ · ‖e^{Aᵀτ}‖
≤ ‖C (∫_0^∞ e^{As}BVBᵀe^{Aᵀs} ds) Cᵀ‖ = ‖V₀‖.   (47)

The matrix norm used here is the elementwise first-order norm, since we are usually interested in the average variance of all parts of the network:

‖M‖ = Σ_{i=1}^N Σ_{j=1}^N |m_{i,j}|.   (48)

If we only know the autocorrelation function of the disturbance, we can compute the output variance by moving to the frequency domain:

R_y(τ) = ∫_{−∞}^{+∞} S_y(f) cos(2πfτ) df
= ∫_{−∞}^{+∞} |H(f)|² S_x(f) cos(2πfτ) df
= (1/2π) ∫_{−∞}^{+∞} |C(jωI − A)^{−1}B|² (∫_{−∞}^{+∞} R_x(u) cos(ωu) du) cos(ωτ) dω.   (49)

The shape of the autocorrelation function is a good indicator of how a feedback or feedforward loop will affect the output variation. A correlation function that quickly goes to zero as τ increases shows that the feedback cycle will not change the variance of the output by much. Conversely, a random signal with a correlation structure can be easily filtered out by applying an appropriate feedback mechanism.

In a general network, signals are propagated from one node to its neighbors. Every vertex receives a filtered version of the noise signal, since every node acts as a single-pole filter. The pole is always real, and proportional to the degree of each vertex, if we assume that each node receives input proportional to the differences of concentrations among its neighbors and itself, or that nodes that interact with many others have proportionally large degradation rates. In this case, we can model the dynamics of a first-order linear network through its Laplacian matrix. In such a network, the state of each node x_k follows the differential equation

dx_k/dt = Σ_{m ∈ N_k} a_{km}(x_m − x_k),   (50)

where a_{km} > 0 for all k, m ∈ V.
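The stationary output covariance used throughout this section can be computed numerically by solving a Lyapunov equation. The following is a minimal sketch; the three-node chain and its unit rates are illustrative assumptions, not values taken from the text:

```python
import numpy as np
from scipy.linalg import solve_lyapunov  # solves A P + P A^T = Q

def stationary_covariance(A, B, V):
    """Stationary covariance P of dx = A x dt + B dW_t, where the white
    noise has covariance V: P solves A P + P A^T + B V B^T = 0."""
    return solve_lyapunov(A, -B @ V @ B.T)

# Single node dx = -a x dt + sigma dW_t: stationary variance sigma^2 / (2a).
a, sigma = 2.0, 1.0
P1 = stationary_covariance(np.array([[-a]]), np.array([[sigma]]), np.eye(1))

# Three-node chain with unit poles, noise entering at the first node:
# each stage acts as a single-pole filter, so the variance decreases
# down the chain (1/2, 1/4, 3/16 for these illustrative rates).
A = np.array([[-1.0, 0.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
B = np.array([[1.0], [0.0], [0.0]])
P3 = stationary_covariance(A, B, np.eye(1))
```

The same routine is reused below to cross-check the closed-form variances of the crosstalk subsystems.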
The Laplacian matrix has been used to model a wide range of systems, including formation stabilization for groups of agents, collision avoidance of swarms and synchronization of coupled oscillators [10]. It can also be used in biological and chemical reaction networks, if the degradation rate of each species is equal to the sum of the rates with which it is produced. In this section, we will model the dynamics of each network with its Laplacian matrix, where each node is affected by a noise source which is independent of all other nodes, but has the same standard deviation. Given that each vertex contributes equally to the overall noise measure of the graph, and since the noise entering each node propagates towards all its neighbors, we can use Lemma 4 to see that the degrees of the network vertices have to be as similar as possible (see also [11] and [4]). In addition, Figure 3 shows that the cycles need to be as long as possible in order to avoid any correlations of signals through two different paths. For longer cycles, the noise inputs go through more filters before they are combined. Moreover, the phase shift is larger for all their frequencies, which reduces their correlation. On the other hand, there are bounds on how long a cycle can be given the network's order and size. Networks with long cycles tend to have large radius and larger average distance, as shown in [12], which makes noise harder to propagate, since it has to pass through many filters. By the same token, networks with a small clustering coefficient will tend to be more immune to noise in their output, since highly clustered networks tend to contain cliques or densely connected subnetworks [13], which facilitate noise propagation, especially if the noise sources that affect the nodes are correlated, as shown in previous sections. A method to find these graphs is first to determine their degree sequence, and then determine which realization has the largest average cycle length.
This procedure can be simplified by working recursively, building networks with progressively larger order and size.

Lemma 5. There is always a connected graph of order N and size m in which there are k vertices with degree d + 1 and N − k vertices with degree d, where

d = ⌊2m/N⌋ and k = 2m − Nd.   (51)
Proof.
We will prove the existence of such a graph by starting with its degree distribution and, by successive transformations, converting it to a graph that is known to exist. Specifically, at each step we will remove one vertex along with its edges, repeating the process until we end up with a path graph. Assume that the degree sequence of the graph G is as above, and we arrange the degrees of the vertices in decreasing order:

s = {d+1, d+1, ..., d+1 (k vertices), d, d, ..., d (N − k vertices)}.   (52)

According to the Havel-Hakimi theorem [11], the above sequence is a graph sequence if and only if the sequence obtained by removing the largest-degree vertex and subtracting one from the degrees of vertices 2, 3, ..., d + 2 is also a graph sequence. The new graph will have a degree sequence of

s' = {d+1, ..., d+1 (k − d − 2 vertices), d, ..., d (N − k + d + 1 vertices)}   if d < k − 1,
s' = {d, ..., d (N + k − d − 3 vertices), d−1, ..., d−1 (d − k + 2 vertices)}   if d ≥ k − 1.   (53)

The key observation is that the transformation above preserves the property of degree homogeneity; in other words, in the new graph G' = G(N − 1, m − d − 1), the minimum and maximum vertex degrees satisfy

d_min = ⌊2(m − d − 1)/(N − 1)⌋   (54)

and

d_min ≤ d_max ≤ d_min + 1.   (55)

Repeating the process, there will be a graph G_r with at least one vertex of degree d_min = 1. It follows from the analysis above that the graph G_r will include either one or two vertices of degree d_min = 1. If it has two vertices with degree one, it is the path graph. If it had only one vertex with degree one, its degree sequence would not be a graph sequence: the sum of all the degrees would be an odd number, which is not possible, since at every transformation we remove 2d_max, an even number, from the sum of degrees.
The graph G_r is a connected graph, and implementing the inverse transforms, we connect new vertices to an already connected network, which guarantees that the final graph is connected.

For networks with a small number of vertices, we can find all graphs with the desired degree sequence, and among them, exhaustively search for the ones with the largest average cycle length that have the smallest average variance. For N = 6 nodes, all connected networks (with 5 ≤ m ≤ 15 edges) with the most homogeneous degree distribution and longest average cycles are shown in Figure 8.
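The degree sequence of Lemma 5 can be checked to be graphical with a Havel-Hakimi reduction. A minimal sketch follows; it verifies graphicality only, while connectivity is guaranteed by the inverse-transform construction in the proof:

```python
def near_regular_sequence(N, m):
    # Degree sequence of Lemma 5: k vertices of degree d + 1 and
    # N - k vertices of degree d, with d = floor(2m / N), k = 2m - N d.
    d = (2 * m) // N
    k = 2 * m - N * d
    return [d + 1] * k + [d] * (N - k)

def is_graphical(seq):
    # Havel-Hakimi reduction: repeatedly remove the largest degree d
    # and subtract one from the next d entries of the sorted sequence.
    seq = sorted(seq, reverse=True)
    while seq and seq[0] > 0:
        d, rest = seq[0], seq[1:]
        if d > len(rest):
            return False
        for i in range(d):
            rest[i] -= 1
            if rest[i] < 0:
                return False
        seq = sorted(rest, reverse=True)
    return True
```

For N = 6, every size 5 ≤ m ≤ 15 yields a graphical near-regular sequence, in agreement with the networks shown in Figure 8.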
Figure 8: All connected networks of order N = 6 and size 5 ≤ m ≤ 15 with minimum output variance. We assume that every vertex is affected by an independent noise source. In addition, each vertex acts as a single-pole filter. The total noise of the network is measured as the average of the variances of all nodes.
To summarize this section, positive correlations increase the output variance, and cycles create correlations that make the system more prone to random inputs. The longer the cycles, the smaller their effect. The immunity to noise is increased when pathways with the same output introduce different phase shifts, so that the different noise contributions cancel each other at least partially. This result holds both for feedforward and feedback loops. When we have some convex constraint on the strength of the various filters, placing the poles, we can find the optimal placement such that the output noise is reduced. Specifically, for a linear network where all nodes act as single-pole filters and the dynamics of the network are described by its Laplacian matrix, there is a systematic way to find the network with the smallest average variance. The optimal networks have homogeneous degree distribution, and cycles that are as long as possible.
Figure 9: A simple circuit with two noise sources. The two resistors generate thermal noise, which is modeled as current sources in parallel to them. When the switch is open, the two circuits are independent. When the switch is closed, the noise in both outputs has smaller variance than before.
Assume that we have a resistor without any external voltage source. If we measure the voltage between its endpoints, we will find that in any infinitesimal frequency interval df there is thermal noise V_t with E[V_t] = 0 and

E[V_t²] = 4kTR df,   (56)

where R is the resistance. The above equation shows that the noise increases as temperature and resistance increase. We connect a capacitor in parallel with the resistor, and measure the voltage between its endpoints. We are interested in the total variance of the voltage in the output of the parallel combination of the resistor and the capacitor. When the switch is open, each of the two subcircuits operates independently, and the output variance for both of them is

V̄₁ = V̄₂ = ∫₀^∞ 4kTR / (1 + (2πfRC)²) df = (4kTR/2πRC) ∫₀^∞ du/(1 + u²) = kT/C.   (57)

If we close the switch, the output variance is

V̄₁ = V̄₂ = 4kTR ∫₀^∞ |H₁(f)|² df + 4kTR ∫₀^∞ |H₂(f)|² df = (kT/C) · (C + D)/(C + 2D),   (58)

where H₁ and H₂ are the transfer functions from each of the two noise sources to the first output. If the capacitor that connects the two subcircuits has capacitance D > 0, the variance of each output is therefore strictly smaller than kT/C, its value with the switch open.
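The closed form kT/C in equation (57) can be checked by numerically integrating the filtered one-sided noise spectrum. A sketch with illustrative (assumed) component values:

```python
import numpy as np
from scipy.integrate import quad

k_B = 1.380649e-23               # Boltzmann constant, J/K
T, R, C = 300.0, 1.0e3, 1.0e-9   # assumed temperature, resistance, capacitance

def voltage_noise_spectrum(f):
    # Thermal noise 4kTR shaped by the parallel RC: |H(f)|^2 = 1/(1 + (2 pi f R C)^2)
    return 4.0 * k_B * T * R / (1.0 + (2.0 * np.pi * f * R * C) ** 2)

# epsabs=0 forces a relative-error stopping rule, since the integral is tiny in SI units
variance, _ = quad(voltage_noise_spectrum, 0.0, np.inf, epsabs=0.0, epsrel=1e-10)
# closed form from equation (57): k_B * T / C
```

Note that the result is independent of R: a larger resistor generates more noise but also filters it more strongly.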
We analyze the four simple subgraphs of Figure 10.

Figure 10: Crosstalk topologies involving one network node. (a) A node without crosstalk interactions, with white noise input having standard deviation equal to σ. (b) A node with a crosstalk interaction with one other node in the network, which is also affected by noise with standard deviation ζ. (c) Same as before, but we assume that both the crosstalk and the noise are increased. (d) Crosstalk interactions with many other nodes, each of which has an independent noise input of the same strength. See text for quantitative analysis of these subsystems.
For simplicity, we may disregard any deterministic inputs: since we assume these are linear systems, any deterministic inputs only affect the output mean, but not its variance. The stochastic differential equations for all the systems are shown next.

System (a) obeys a simple stochastic differential equation, with one noise input, and it has no other interactions with any other parts of the network:

dX = −aX dt + σ dW_t.   (59)

We found the solution to this equation in the first section, and the variance in the output is equal to

V_a = σ²/(2a).   (60)

This is the trivial case without any crosstalk, and will be used for comparison to the performance of the other subnetworks.

Subsystem (b) consists of one vertex that interacts with another node which may also be prone to other noise sources. Crosstalk is modeled through a new vertex in the network, with which the studied node exchanges flows. In chemical reaction networks, for example, the species of interest X may be forming a complex Y with species I, whose concentration is supposed to be constant:

X + I ⇌ Y,   (61)

with association rate c and dissociation rate f. We also expect X to have a constant degradation rate a. The equations for the concentrations of X and Y are

dX = −(a + c)X dt + f Y dt + σ dW_t
dY = cX dt − f Y dt + ζ dU_t   (62)

and the output variance is

V_b = ((a + f)σ² + f ζ²) / (2a(a + c + f)).   (63)

The next step is to see what happens if we increase the crosstalk intensity. We can distinguish two cases. The first is when there is crosstalk with one other node (Figure 10(c)). In the chemical reaction network analogy,

X + A ⇌ Y,   (64)

with rates n·c and n·f. It is straightforward to find the new differential equations, and the variance in the output:

dX = −(a + nc)X dt + nf Y dt + σ dW_t
dY = ncX dt − nf Y dt + nζ dU_t   (65)

V_c = ((a + nf)σ² + n³f ζ²) / (2a(a + n(c + f))).
(66)

Finally, we consider the case where one node has crosstalk interactions with many different nodes, each of which is affected by a different noise process (Figure 10(d)). The equations that the nodes obey are

dX = −(a + nc)X dt + f Σ_{k=1}^n Y_k dt + σ dW_t
dY_k = cX dt − f Y_k dt + ζ dU_t^{(k)},   1 ≤ k ≤ n,   (67)

and the output variance can be computed as

V_d = ((a + f)σ² + nf ζ²) / (2a(a + nc + f)).   (68)

When no noise is introduced from the crosstalk nodes (ζ = 0), crosstalk reduces the output variance. Figure 11 compares the last three cases, as the strength of crosstalk interactions among the nodes increases. The crosstalk strength in this case is quantified by the ratio

r_x = c/(c + f),   (69)

which is equal to the concentration of the crosstalk product Y in equation (62) in the absence of degradation rates and noise inputs. Distributing the crosstalk among many nodes (equation (68)) decreases the effect of noise noticeably more than the single-node case. This is even more pronounced when we normalize by the variance in the base case (equation (63)).

Figure 11: Output variance as a result of noise input for a single vertex in the network in the presence of crosstalk interactions with other vertices. (a) Output variance as a function of the amount of crosstalk (concentration of crosstalk complex), when no additional noise is introduced. Crosstalk clearly mitigates the output variance. Also, having crosstalk with two independent nodes reduces the variance even more, compared to having a single crosstalk node. (b) Normalized output variance as a fraction of the variance when only one crosstalk node is present. Having many small sources of crosstalk is clearly better than having one strong crosstalk interaction. For the same amount of total crosstalk, dividing it among many nodes drives the output noise variance to zero as the number of nodes grows large.
Figure 12: Normalized variance of the output when the crosstalk introduces additional noise. Having strong crosstalk interactions with one single node increases the variance, because noise propagates easily. When crosstalk is distributed among many nodes, the variance may be smaller or larger than before, depending on the strength of the interactions. This is because having crosstalk interactions with many other vertices introduces a proportional amount of noise.
When crosstalk introduces additional noise, it may increase the variance in the output of any given node if the crosstalk is not strong enough to make up for the introduced noise (Figure 12).
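The closed forms for V_b (equation (63)) and V_d (equation (68)) can be cross-checked by solving the corresponding Lyapunov equations numerically. A minimal sketch with illustrative rate values:

```python
import numpy as np
from scipy.linalg import solve_lyapunov

a, c, f, sigma, zeta, n = 1.0, 2.0, 3.0, 0.5, 0.4, 3

# Subsystem (b): dX = -(a+c)X dt + f Y dt + sigma dW,  dY = c X dt - f Y dt + zeta dU
A_b = np.array([[-(a + c), f], [c, -f]])
P_b = solve_lyapunov(A_b, -np.diag([sigma**2, zeta**2]))
V_b = ((a + f) * sigma**2 + f * zeta**2) / (2 * a * (a + c + f))

# Subsystem (d): crosstalk with n nodes, each carrying its own noise source
A_d = np.zeros((n + 1, n + 1))
A_d[0, 0] = -(a + n * c)
A_d[0, 1:] = f                 # X gains f * Y_k from every crosstalk node
A_d[1:, 0] = c                 # each Y_k gains c * X
A_d[1:, 1:] -= f * np.eye(n)   # each Y_k decays at rate f
P_d = solve_lyapunov(A_d, -np.diag([sigma**2] + [zeta**2] * n))
V_d = ((a + f) * sigma**2 + n * f * zeta**2) / (2 * a * (a + n * c + f))
```

With ζ small, both variances fall below the crosstalk-free value σ²/(2a), in line with the discussion above.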
We consider two pathways with crosstalk among more than one of their nodes. We distinguish two cases, depending on whether the two pathways have different or the same outputs. In the first case, since the two outputs are independent, it is easier to reduce the noise variance in both of them, by "exchanging" their noise through each node, assuming that the different noise sources are independent. When the output is the same, there is little reduction in the output variance from crosstalk, since every disturbance eventually reaches the output, and is combined with other correlated versions of the same signal, as shown in Figure 13. The variance reduction in this case is caused by the increase of the effective pathway lengths, since signals follow on average a longer path towards the output.
Figure 13: Output variance when crosstalk is present among all stages of two different pathways, for various pathway lengths, when their outputs are different (left) or the same (right). The output variances are normalized by the variance of a pathway without crosstalk. We assume that every stage of the pathway has some noise input. A small amount of crosstalk can help reduce the effect of noise in the output, but more crosstalk does not help filter out the noise of the system. Crosstalk has a much smaller effect when the two pathways have the same output. Although it reduces the variance of the intermediate nodes, it creates correlations among them, which in turn increase the variance in the output.
Suppose we have a simple decomposed system:

dY₁ = −aY₁ dt + σ dU_t
dY₂ = −aY₂ dt + σ dW_t.   (70)

The two outputs of the system are completely independent, since they do not interact in any way, and therefore are uncorrelated. The variance of each output is

V[Y₁] = σ_y² = (σ²/2π) ∫_{−∞}^{+∞} dω/(ω² + a²) = σ²/(2a).   (71)

The system is symmetric, thus V[Y₁] = V[Y₂] = σ_y². If there is crosstalk, then the different states of the system are correlated. If we model crosstalk as a positive conversion rate c from one state to the other, the two-state system above becomes

dY₁ = −(a + c)Y₁ dt + cY₂ dt + σ dU_t
dY₂ = −(a + c)Y₂ dt + cY₁ dt + σ dW_t.   (72)

The variance of each of the outputs now becomes

V[Y₁] = (σ²/2π) ∫_{−∞}^{+∞} (|H₁(ω)|² + |H₂(ω)|²) dω
= (σ²/2π) ∫_{−∞}^{+∞} ( |(a + c + jω)/((a + jω)(a + 2c + jω))|² + |c/((a + jω)(a + 2c + jω))|² ) dω
= (σ²/2a) · (a + c)/(a + 2c),   (73)

where H₁ and H₂ are the transfer functions to the first output from the noise inputs of the first and second node respectively. The symmetry is preserved, so V[Y₂] = V[Y₁] ≡ ψ². The variance when crosstalk is present (c > 0) is always smaller than the initial variance of the outputs. Generalizing the equations above to N nodes (see Figure 14), we find that

σ_y² = σ²/(2a),   ψ_N² = ((a + c)/(a + Nc)) σ_y²,   (74)

and as a result

ψ_N²/σ_y² = (a + c)/(a + Nc),   (75)

which tends to zero as N becomes large.

Figure 14: Output variation for each node in a system of N nodes, when there are crosstalk interactions among every pair of nodes. The variance has been normalized by the corresponding variance without crosstalk. Each node is identical, and receives an independent noise input of the same intensity. When the number of vertices increases, the noise is distributed among all the nodes, and thus the output variance is reduced.

Alternatively, we can model crosstalk interactions as two species being converted to an intermediate complex, as has been done in the previous sections. A very simple example of a chemical reaction network which demonstrates this type of behavior is

Y₁ → A
Y₂ → B
Y₁ + Y₂ ⇌ Z.   (76)

Crosstalk is defined by the presence of the last reaction. We are interested in the variance in the concentrations of the output products A and B, which are directly affected by the variances of Y₁ and Y₂. The two pathways interact through an intermediate vertex. The system can be written as

dY₁ = −aY₁ dt − cY₁Y₂ dt + f Z dt + σ dU_t
dY₂ = −aY₂ dt − cY₁Y₂ dt + f Z dt + σ dW_t
dZ = cY₁Y₂ dt − f Z dt.   (77)

We assume that there is a new "crosstalk vertex" Z between each pair of original vertices. After linearizing around an equilibrium point (Ȳ₁, Ȳ₂, Z̄), these equations become

dY₁ = −(a + cȲ₂)Y₁ dt − cȲ₁ Y₂ dt + f Z dt + σ dU_t
dY₂ = −cȲ₂ Y₁ dt − (a + cȲ₁)Y₂ dt + f Z dt + σ dW_t
dZ = cȲ₂ Y₁ dt + cȲ₁ Y₂ dt − f Z dt.   (78)

We find that this network is now more capable of reducing the effect of noise in the output (Figure 15).

There are cases where the noise intensity is proportional to a state of the system. In biological networks, for example, the degradation of various proteins depends on specific enzymes, whose concentrations may be subject to random fluctuations. This makes the degradation of a protein prone to noise whose source is independent of the protein concentration, but makes the rate at which it degrades proportional to it.
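The scaling in equation (75) can be checked numerically for the symmetric conversion model of equation (72), generalized to N nodes with pairwise conversion at rate c. A minimal sketch with assumed rate values:

```python
import numpy as np
from scipy.linalg import solve_lyapunov

def crosstalk_variance(N, a, c, sigma):
    # N identical nodes, pairwise conversion at rate c (generalizing eq. (72)):
    # dY_i = -(a + (N-1)c) Y_i dt + c * sum_{j != i} Y_j dt + sigma dW_i
    A = -(a + (N - 1) * c) * np.eye(N) + c * (np.ones((N, N)) - np.eye(N))
    P = solve_lyapunov(A, -sigma**2 * np.eye(N))
    return P[0, 0]

a, c, sigma = 1.0, 2.0, 1.0           # illustrative values
no_crosstalk = sigma**2 / (2 * a)     # equation (71)
ratios = [crosstalk_variance(N, a, c, sigma) / no_crosstalk for N in (2, 4, 8)]
# prediction of equation (75): (a + c) / (a + N c), decreasing toward zero
```

The computed ratios follow (a + c)/(a + Nc), confirming that the output variance vanishes as the crosstalk is shared among more nodes.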
The noise intensity is also proportional to the state of the system when a state is autoregulated, with either positive or negative feedback, where the rate at which the concentration of that particular state changes is subject to random noise. We will call this type of noise multiplicative, because it is multiplied by the state of the system. As a specific example, consider a gene that is regulated by a single regulator [14]. The transcription interaction can be written as

P → X.   (79)

When P is in its active form, gene X starts being transcribed and the mRNA is translated, resulting in accumulation of protein X at a constant rate b. The production of X is balanced by protein degradation (by other specialized
proteins) and cell dilution during growth with rate a. A differential equation that describes this simple system is

dX/dt = b_t − a_t X.   (80)

Figure 15: Comparison of the noise in the output of a simple network with two different implementations of crosstalk, direct conversion or forming a new complex, as described by equations (72) and (77).

If there is noise in the concentration of the aforementioned degradation proteins, or in the cell growth, the rate a_t is not constant: it consists of a deterministic component and a random component. We will now show that noise in the production rate b_t has a fundamentally different effect on system behavior compared to the effect of noise in the degradation rate a_t, because the latter is multiplied by the concentration of the protein itself. We will first study the homogeneous version of the differential equation (80), and then we will add the constant production term. Ignoring the constant production term, and multiplying by dt, equation (80) becomes

dX = −(a_t dt)X.   (81)

After adding a random component to the degradation rate, the last equation becomes

dX = (−a_t dt + σ_t dW_t)X,   (82)

where W_t is the standard Wiener process and dW_t represents the noise term. Note that the degradation rate and the noise intensity are allowed to be time-dependent. We will first find the differential of the logarithm of X using Itô's lemma. We will again require that all the input functions are continuous and non-pathological, so that we can always change the order of taking the limit and the expectation operator. We will additionally assume that all integrals are finite, so that we can also change the order of integration. The technical details mentioned above are covered in more detail in [6] and [15].

We apply Itô's lemma to the logarithm of the random variable X, which obeys equation (82):

f(X, t) = log X(t),   (83)

and we get

d log(X) = df(X, t)
= (∂f/∂t) dt + (∂f/∂X) dX + (1/2)(∂²f/∂X²)(dX)²
= 0 + dX/X − (1/2)(1/X²)(a_t²X² dt² − 2a_tσ_t X² dt dW_t + σ_t²X² dW_t²)
= (−a_t dt + σ_t dW_t − (σ_t²/2) dW_t²) − (1/2)(a_t² dt² − 2a_tσ_t dt dW_t).   (84)

The last two terms can be neglected, since dt² = o(dt) and dt · dW_t = o(dt) as dt →
0. On the other hand, as dt becomes small,

lim_{dt→0} dW_t² = E[dW_t²] = dt.   (85)

Applying the rules above to equation (84),

log(X(t)/X₀) = −∫₀ᵗ (a_s + σ_s²/2) ds + ∫₀ᵗ σ_s dW_s.   (86)

We can now solve for X(t):

X(t) = X₀ e^{−∫₀ᵗ (a_s + σ_s²/2) ds} · e^{∫₀ᵗ σ_s dW_s}.   (87)

The above derivation is valid only when the equilibrium state (concentration) is equal to zero and we start from a state X₀ > 0. If the rate a and the noise strength σ are constant, the solution simplifies to

X(t) = X₀ e^{−(a + σ²/2)t} · e^{σW_t}.   (88)

When the equilibrium is positive (which is the case for most systems), the following differential equation is more relevant:

dY = b dt + (−a dt + σ dW_t)Y.   (89)

One way to view the terms on the right-hand side of equation (89) is that the concentration of species X depends on a deterministic input, and is regulated by a negative feedback mechanism which is subject to random disturbances. It has been shown in [16] that when feedback is also noisy, there are fundamental limits on how much the noise in the output can be reduced, because there are bounds on how well we can estimate the state of the system. In [16] the authors focus on discrete random events (birth-death processes) as the source of noise, and the result is that feedback noise makes it harder to control the noise in the output.
We will also show that in our setting multiplicative noise results in larger variance than additive noise of equal strength, and in the next section we will show how it propagates in a cascade of linear filters.

Using Itô's lemma once more, and the solution to the homogeneous equation, we find that the solution of the nonhomogeneous case is

Y(t) = Y₀ X(t) + b X(t) ∫₀ᵗ X^{−1}(s) ds
= Y₀ e^{−∫₀ᵗ (a_u + σ_u²/2) du} · e^{∫₀ᵗ σ_u dW_u} + b ∫₀ᵗ e^{−∫_sᵗ (a_u + σ_u²/2) du} · e^{∫_sᵗ σ_u dW_u} ds,   (90)

where X(t) is the solution of the homogeneous equation (87) with initial condition X(t = 0) = 1. If the initial state is equal to zero (or when t is large), and all the parameters are constant, then the last expression simplifies (in distribution) to

Y(t) = b ∫₀ᵗ e^{−(a + σ²/2)u} e^{σW_u} du.   (91)

Note that the form of the last equation is fundamentally different from the response of linear systems to input noise, because here the Wiener process input depends on the same time variable as the kernel of the integral. In other words, the output is not a convolution of the impulse response of the system with the input. In order to see how the noise propagates through the network, and given that we cannot use the solution (22), it is helpful to find the correlation of two versions of this stochastic process, so that we can find its frequency content.

As a first step, we will compute the correlation of the exponential of Brownian motion. Its expected value is

E[Z_t] = E[e^{σW_t}] = ∫_{−∞}^{+∞} e^{σx} (1/√(2πt)) e^{−x²/(2t)} dx = e^{σ²t/2}.   (92)

The expected value of the square of the exponential Wiener process is

E[Z_t²] = E[e^{2σW_t}] = ∫_{−∞}^{+∞} e^{2σx} (1/√(2πt)) e^{−x²/(2t)} dx = e^{2σ²t}.   (93)

Combining the last two equations,

σ²_{Z_t} = Var[Z_t] = E[Z_t²] − (E[Z_t])² = e^{2σ²t} − e^{σ²t} = e^{σ²t}(e^{σ²t} − 1).   (94)
The expected value of Y(t) in equation (91) can now be computed:

E[Y(t)] = b ∫₀ᵗ e^{−(a + σ²/2)u} · E[e^{σW_u}] du = b ∫₀ᵗ e^{−(a + σ²/2)u} · e^{σ²u/2} du = b ∫₀ᵗ e^{−au} du = (b/a)(1 − e^{−at}),   (95)

which means that

Ȳ = lim_{t→∞} E[Y(t)] = b/a.   (96)

As one would expect, it is the same as when the system is completely deterministic. Next, we need to compute the covariance of two realizations of the random process Z_t:

Cov[Z_s, Z_t] = E[Z_s · Z_t] − E[Z_s] · E[Z_t]
= E[e^{σW_{s∧t}} e^{σW_{s∨t}}] − E[e^{σW_s}] · E[e^{σW_t}]
= E[e^{2σW_{s∧t}} e^{σ(W_{s∨t} − W_{s∧t})}] − e^{σ²(s+t)/2}
= E[e^{2σW_{s∧t}}] · E[e^{σ(W_{s∨t} − W_{s∧t})}] − e^{σ²(s+t)/2}
= e^{2σ²(s∧t)} · e^{σ²(s∨t − s∧t)/2} − e^{σ²(s+t)/2},   (97)

where we follow the standard notation s ∧ t = min(s, t) and s ∨ t = max(s, t). Combining all the equations above, we can find the correlation of geometric Brownian motion:

R(s, t) = Corr[Z_s, Z_t] = Cov[Z_s, Z_t] / (σ_{Z_s} · σ_{Z_t})
= (e^{2σ²(s∧t)} e^{σ²(s∨t − s∧t)/2} − e^{σ²(s+t)/2}) / √(e^{σ²s}(e^{σ²s} − 1) e^{σ²t}(e^{σ²t} − 1))
= √((e^{σ²(s∧t)} − 1)/(e^{σ²(s∨t)} − 1)).   (98)

We now define the covariance and correlation of two such processes with time lag τ in the equilibrium state as

C(τ) = lim_{t→∞} C(t, t + τ) and R(τ) = lim_{t→∞} R(t, t + τ).   (99)

Applying this definition to the general correlation formula of geometric Brownian motion,

R(τ) = lim_{t→∞} √((e^{σ²t} − 1)/(e^{σ²(t+τ)} − 1)) = e^{−σ²τ/2}.
(100)

So the correlation decreases exponentially as a function of the time lag. We can now follow the same procedure to find the correlation of the stochastic process defined by equation (91). Using E[e^{σ(W_x + W_y)}] = e^{σ²(x+y)/2} e^{σ²(x∧y)}, its second moment is equal to

E[Y²(t)] = b² ∫₀ᵗ ∫₀ᵗ e^{−(a + σ²/2)(x+y)} · E[e^{σW_x} e^{σW_y}] dx dy
= b² ∫₀ᵗ ∫₀ᵗ e^{−a(x+y)} e^{σ²(x∧y)} dx dy
= (2b²/(σ² − a)) [ (1 − e^{−(2a − σ²)t})/(2a − σ²) − (1 − e^{−at})/a ],   (101)

where we have assumed that all integrals are finite, which requires the rate a to be greater than σ²/2. As t goes to infinity, we can ignore all the decaying exponentials:

lim_{t→∞} E[Y²(t)] = ∞ if 2a ≤ σ², and 2b²/(a(2a − σ²)) if 2a > σ².   (102)

In what follows, we will only be interested in the behavior of the system when 2a > σ², because it only makes sense to compute the correlation when the standard deviation is finite. Based on equation (102), the standard deviation (when it is defined) is equal to

σ_Y = bσ/(a√(2a − σ²)) = (σ/√(2a − σ²)) Ȳ.   (103)

The standard deviation is proportional to the average value of Y: the larger the value of Y, the larger the strength of the disturbance.

Assume that a pathway consists of two nodes. The first one is affected by multiplicative noise, and it is used as an input to the second node. We first analyze a system where each state has a single real pole, and later on we will generalize to an arbitrary number of poles. The equations of the system are

dX = c dt + (−f dt + σ dW_t)X
dY = bX dt − aY dt.   (104)

Combining the forms for the multiplicative noise and the output of a single-pole filter,

Y(t) = bc e^{−at} ∫₀ᵗ e^{as} (∫₀ˢ e^{−(f + σ²/2)u} e^{σW_u} du) ds.
(105)

The mean is equal to

E[Y(t)] = bc e^{−at} ∫₀ᵗ e^{as} (∫₀ˢ e^{−(f + σ²/2)u} E[e^{σW_u}] du) ds
= bc e^{−at} ∫₀ᵗ e^{as} (∫₀ˢ e^{−fu} du) ds
= bc (a − f − a e^{−ft} + f e^{−at}) / (a f (a − f)).   (106)

The last equation also holds when a = f, where the expected value is obtained as the limit f → a. Letting the time t go to infinity,

E[Y] = lim_{t→∞} E[Y(t)] = bc/(af),   (107)

which is exactly the same as in an equivalent system without any noise. The second moment is

E[Y²(t)] = b²c² e^{−2at} ∫₀ᵗ e^{ar} dr ∫₀ᵗ e^{as} ds ∫₀ʳ ∫₀ˢ e^{−(f + σ²/2)(x+y)} E[e^{σ(W_x + W_y)}] dx dy.   (108)

We break the integral above into five parts, according to the relative ordering of the integration variables, in order to compute the expected value inside it. After performing all the algebraic calculations,

E[Y²] = lim_{t→∞} E[Y²(t)] = 2b²c²/(a²f(2f − σ²)),   (110)

given that the second moment is finite, which happens when 2f > σ². The variance is

V[Y] = b²c²σ²/(a²f²(2f − σ²)).   (111)

We can write the above equation as a constant times the variance of the first state:

V[Y] = (b/a)² · c²σ²/(f²(2f − σ²)) = (b/a)² V[X].   (112)

The variance of Y is fundamentally different from the variance in the case when white noise of strength σ_in is added directly to the input, in which case it would be equal to

V[Y] = (b²/(2a)) σ_in².   (113)

The time evolution of the variance is shown in Figure 16.
When the noise is multiplicative, it takes longer for the variance to settle to its steady-state value, which is also an indication that the output variance consists of lower frequencies than in the case of additive noise.
Figure 16: Evolution of the output variance of a single-pole filter when the input is affected by additive and multiplicative noise, respectively. The system with additive noise has less variance in the output than the one with multiplicative noise. Also, in the case of geometric noise, the variance takes more time to settle to its equilibrium value.
More generally, if we pass the output of the multiplicative noise through an arbitrary linear filter with impulse response $h(t)$, then the output is defined as the convolution of the impulse response and the input:
\[
Y(t) = c\int_0^t h(t-s)\left(\int_0^s e^{-\left(f+\frac{\sigma^2}{2}\right)u} e^{\sigma W_u}\,du\right)ds. \tag{114}
\]
The mean is
\[
E[Y(t)] = c\int_0^t h(t-s)\left(\int_0^s e^{-fu}\,du\right)ds = \frac{c}{f}\int_0^t \left(1-e^{-fs}\right)h(t-s)\,ds. \tag{115}
\]
The variance is equal to
\[
\begin{aligned}
V[Y(t)] &= E[Y(t)^2] - \left(E[Y(t)]\right)^2\\
&= c^2\int_0^t h(t-r)\,dr\int_0^r h(t-s)\,ds\int_0^s\!\!\int_0^y e^{-f(x+y)}e^{\sigma^2 x}\,dx\,dy\\
&\quad+ c^2\int_0^t h(t-r)\,dr\int_0^r h(t-s)\,ds\int_0^s\!\!\int_y^r e^{-f(x+y)}e^{\sigma^2 y}\,dx\,dy\\
&\quad+ c^2\int_0^t h(t-r)\,dr\int_r^t h(t-s)\,ds\int_0^r\!\!\int_0^x e^{-f(x+y)}e^{\sigma^2 y}\,dy\,dx\\
&\quad+ c^2\int_0^t h(t-r)\,dr\int_r^t h(t-s)\,ds\int_0^r\!\!\int_x^s e^{-f(x+y)}e^{\sigma^2 x}\,dy\,dx\\
&\quad- \frac{c^2}{f^2}\left(\int_0^t \left(1-e^{-fs}\right)h(t-s)\,ds\right)^2.
\end{aligned} \tag{116}
\]
For example, if the filter has one pole at $-a$ with $a >$
$0$, then $h(t-s) = e^{-a(t-s)}$, and we can verify that the mean and the second moment are equal to the ones found in equations (107) and (110).

If we have $n$ identical single-pole filters in series, with the same pole at $-a$, $a \in \mathbb{R}$, and their input is multiplied by $b$, then the mean is
\[
E[Y] = \lim_{t\to\infty} \frac{b^n c}{f}\int_0^t \left(1-e^{-fs}\right)\frac{(t-s)^{n-1}}{(n-1)!}\,e^{-a(t-s)}\,ds = \frac{b^n}{a^n}\cdot\frac{c}{f} \tag{117}
\]
and the variance is equal to
\[
V[Y] = \left(\frac{b}{a}\right)^{2n}\left(\frac{c}{f}\right)^2 \frac{\sigma^2}{2f-\sigma^2}. \tag{118}
\]
The above results show how variation that enters the system through noisy degradation rates affects the output of a given pathway. For example, in the two-step cascade
\[
X \to Y, \qquad Y \to Z \tag{119}
\]
described by (104), species $Y$ is affected by multiplicative noise, and is then used as an input to the next reaction, which produces $Z$. The second reaction acts as a first-order linear filter, and the noise propagates to the pathway output $Z$. The analysis can be used for any system that can be described by linear differential equations. If a linear time-invariant system is described by (2), then noise in the input $u$ or in its input matrix $B$ can be treated as an additional input, as in equation (14), and the system can be solved accordingly. The same holds for the off-diagonal elements of the dynamical matrix $A$. But noise in the diagonal elements of $A$ is multiplicative noise, which needs to be considered separately from all other noise sources, and it leads to qualitatively different behavior than the previous kinds of input noise.

In this section, we examine how noise propagates in general linear chemical reaction networks. Noise in chemical reaction networks that do not involve bimolecular or higher-order reactions has been studied extensively (see for example [17]), and chemical reactions have also been analyzed as analog signal processing systems [18]. Here we study reactions in which two or more reactants are noisy, and their disturbances may be correlated with each other.
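The cascade mean in equation (117) can be verified numerically. The sketch below (with illustrative parameter values) evaluates the convolution integral for $n = 3$, using the impulse response $h(\tau)=\tau^{n-1}e^{-a\tau}/(n-1)!$ of $n$ identical poles in series, and compares it with the closed form $(b/a)^n\,c/f$.

```python
import math
import numpy as np

# Numerical check of equation (117) for n = 3 (illustrative parameter values):
# the steady-state mean of n identical single-pole filters in series driven by
# the mean input (c/f)(1 - e^{-f s}) should equal (b/a)^n * (c/f).
b, a, c, f, n = 1.3, 0.8, 2.0, 1.5, 3

t = 40.0                                  # large enough to approximate t -> infinity
s = np.linspace(0.0, t, 200001)
tau = t - s
# impulse response of n identical poles in series
h = tau**(n - 1) * np.exp(-a * tau) / math.factorial(n - 1)
integrand = (1.0 - np.exp(-f * s)) * h

ds = s[1] - s[0]
integral = (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * ds  # trapezoid rule
mean_numeric = (b**n * c / f) * integral
mean_closed = (b / a)**n * (c / f)
```

The two values agree to within the quadrature error, confirming that the limit of the convolution picks out the DC gain $b^n/a^n$ of the cascade.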
Consider the following reaction:
\[
X + Y \to Z. \tag{120}
\]
Further assume that the concentrations of $X$ and $Y$ are subject to random white-noise fluctuations around deterministic mean values,
\[
X_t = \bar X + \sigma_X\,dU_t, \qquad Y_t = \bar Y + \sigma_Y\,dW_t, \tag{121}
\]
and that $Z$ degrades at a rate proportional to its concentration. The corresponding stochastic differential equation is
\[
dZ = \left(\bar X\bar Y - aZ_t\right)dt + \bar X\sigma_Y\,dW_t + \bar Y\sigma_X\,dU_t + \sigma_X\sigma_Y\,d[U_t, W_t] \tag{122}
\]
where $U_t$ and $W_t$ are standard Brownian motions. Equation (122) is a natural generalization of the case where one or more noise terms are added to the deterministic differential equation. In all stochastic differential equations so far, we multiply the deterministic factors that contribute to the infinitesimal change in the state of the system by $dt$, and then add the noise terms. When we have a product of two noisy inputs, we first consider the noiseless case, and then add all the noise terms, including their products. In equation (122) the deterministic term is equal to $\bar X\bar Y$ and the noise terms that are added are equal to $X_tY_t - \bar X\bar Y$. The term $dU_t\,dW_t = d[U_t, W_t]$ is the differential of the quadratic covariation process of $U_t$ and $W_t$. If the two processes have correlation $\rho$, then
\[
d[U_t, W_t] = \rho\,dt. \tag{123}
\]
Simplifying the last expression for $dZ$,
\[
dZ = \left(\bar X\bar Y + \rho\sigma_X\sigma_Y - aZ_t\right)dt + \bar X\sigma_Y\,dW_t + \bar Y\sigma_X\,dU_t, \tag{124}
\]
which is the familiar Ornstein–Uhlenbeck process with two noise sources. The final expression for the concentration of $Z$ is
\[
Z(t) = \frac{1}{a}\left(\bar X\bar Y + \rho\sigma_X\sigma_Y\right)\left(1 - e^{-at}\right)
+ \sigma_X\bar Y\int_0^t e^{-a(t-s)}\,dU_s + \sigma_Y\bar X\int_0^t e^{-a(t-s)}\,dW_s. \tag{125}
\]
As the effect of the initial conditions diminishes, the mean is
\[
\bar Z = \lim_{t\to\infty} E[Z(t)] = \frac{1}{a}\left(\bar X\bar Y + \rho\sigma_X\sigma_Y\right) \tag{126}
\]
and the variance is equal to
\[
V[Z] = \lim_{t\to\infty} V[Z(t)] = \frac{\bar Y^2\sigma_X^2 + \bar X^2\sigma_Y^2 + 2\bar X\bar Y\rho\sigma_X\sigma_Y}{2a}. \tag{127}
\]
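The correlation-induced mean shift can be seen directly in a discretization of the bimolecular model. The sketch below (illustrative parameters) generates correlated Brownian increments and keeps the product term $\sigma_X\sigma_Y\,dU\,dW$ explicitly; its sample average reproduces the $\rho\sigma_X\sigma_Y$ drift, so the stationary mean and variance match equations (126) and (127).

```python
import numpy as np

# Sketch: discretize dZ = (X*Y - a Z) dt + X sigY dW + Y sigX dU + sigX sigY dU dW
# with correlated Brownian increments (correlation rho), and check the stationary
# mean and variance against equations (126) and (127).
a, Xbar, Ybar = 1.0, 2.0, 3.0
sigX, sigY, rho = 0.5, 0.4, -0.8

rng = np.random.default_rng(2)
n_paths, dt, t_end = 20000, 0.005, 10.0

Z = np.full(n_paths, Xbar * Ybar / a)          # start at the noiseless steady state
for _ in range(int(t_end / dt)):
    dU = rng.normal(0.0, np.sqrt(dt), n_paths)
    dV = rng.normal(0.0, np.sqrt(dt), n_paths)
    dW = rho * dU + np.sqrt(1 - rho**2) * dV   # corr(dU, dW) = rho
    # the product term dU*dW is what produces the rho*sigX*sigY drift
    Z += ((Xbar * Ybar - a * Z) * dt + Xbar * sigY * dW + Ybar * sigX * dU
          + sigX * sigY * dU * dW)

mean_pred = (Xbar * Ybar + rho * sigX * sigY) / a                     # equation (126)
var_pred = (Ybar**2 * sigX**2 + Xbar**2 * sigY**2
            + 2 * Xbar * Ybar * rho * sigX * sigY) / (2 * a)          # equation (127)
mean_sim, var_sim = Z.mean(), Z.var()
```

With $\rho = -0.8$ the simulated mean settles near $5.84$ rather than the noiseless value $\bar X\bar Y/a = 6$, and the negative correlation also lowers the variance, as the text describes.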
An important consequence of correlations in the input noise ($\rho \neq 0$) is that the mean is different from the case where there is no noise, even if both noise terms in (121) themselves have zero mean. If the correlation is negative, the mean is lower, and vice versa. In addition, the variance is larger when there are positive correlations between the two input noise terms, as expected. When the correlation is negative, the two noise processes partially cancel each other, resulting in lower variance.

General Reactions

We can generalize the above results to general reactions of the form
\[
a_1 X_1 + \cdots + a_N X_N \to b_1 Y_1 + \cdots + b_M Y_M \tag{128}
\]
where each element of the left-hand side is assumed to be a random variable that consists of a deterministic mean $\bar X_k$ and a standard white-noise process $dW_t^{(k)}$ multiplied by the standard deviation of its concentration:
\[
X_k(t) = \bar X_k + \sigma_k\,dW_t^{(k)}, \qquad 1 \le k \le N. \tag{129}
\]
The concentration of the product $Y_j$ is described by a stochastic differential equation:
\[
dY_j = \left(b_j \prod_{u=1}^N \bar X_u - f_j Y_j\right)dt
+ b_j \sum_{k=1}^N \sigma_k \prod_{\substack{u=1\\ u\neq k}}^N \bar X_u\,dW_t^{(k)}
+ b_j \sum_{k=1}^N \sum_{m=k+1}^N \sigma_k\sigma_m\,\rho_{k,m} \prod_{\substack{u=1\\ u\neq k,m}}^N \bar X_u\,dt + o(dt). \tag{130}
\]
The last equation is derived by using Itô's box rule, and the fact that higher-order products of Wiener processes have variance that tends to zero faster than $dt$ as $dt \to$
$0$. As in the bimolecular case, we multiply the noiseless input by $dt$, as in the corresponding ordinary differential equation, and then add all the noise terms and their products.

The mean (disregarding initial conditions) is
\[
E[Y_j] = \frac{b_j}{f_j}\left(\prod_{u=1}^N \bar X_u + \sum_{k=1}^N \sum_{m=k+1}^N \sigma_k\sigma_m\,\rho_{k,m} \prod_{\substack{u=1\\ u\neq k,m}}^N \bar X_u\right) \tag{131}
\]
which is different from the case when there is no noise, if there are correlations among the noise terms. The last equation clearly shows that noisy inputs can have an effect on the average concentration of the output, even if their mean is zero. The amount by which they shift the mean depends on their own variances, their correlations, and the product of the concentrations of all other reactants.

The variance is equal to
\[
V[Y_j] = \frac{b_j^2}{2f_j}\left(\sum_{k=1}^N \sigma_k^2 \prod_{\substack{u=1\\ u\neq k}}^N \bar X_u^2
+ 2\sum_{k=1}^N \sum_{m=k+1}^N \sigma_k\sigma_m\,\rho_{k,m} \prod_{\substack{u=1\\ u\neq k}}^N \bar X_u \prod_{\substack{v=1\\ v\neq m}}^N \bar X_v\right). \tag{132}
\]
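The mean formula (131) is straightforward to evaluate programmatically. The helper below is a sketch with hypothetical names; the pair sums run over $k < m$ as in the derivation, and missing correlation entries are treated as zero.

```python
from itertools import combinations

def mean_output(bj, fj, xbar, sigma, rho):
    """Steady-state mean of Y_j per equation (131) (hypothetical helper).

    xbar  -- deterministic reactant means X_k
    sigma -- noise standard deviations sigma_k
    rho   -- dict mapping a pair (k, m) with k < m to the correlation rho_{k,m};
             missing pairs are treated as uncorrelated.
    """
    n = len(xbar)
    prod_all = 1.0
    for x in xbar:
        prod_all *= x
    shift = 0.0
    for k, m in combinations(range(n), 2):
        partial = 1.0
        for u in range(n):
            if u != k and u != m:
                partial *= xbar[u]
        shift += sigma[k] * sigma[m] * rho.get((k, m), 0.0) * partial
    return (bj / fj) * (prod_all + shift)

# Two reactants: recovers the bimolecular mean (126) scaled by b_j/f_j,
# i.e. (X1*X2 + rho*s1*s2) * (1.0/2.0) = 2.92 here
m2 = mean_output(1.0, 2.0, [2.0, 3.0], [0.5, 0.4], {(0, 1): -0.8})
# Three reactants with two correlated pairs
m3 = mean_output(1.0, 1.0, [1.0, 2.0, 3.0], [0.1, 0.2, 0.3],
                 {(0, 1): 0.5, (1, 2): -1.0})
```

For the three-reactant example the shift is $\sigma_1\sigma_2\rho_{1,2}\bar X_3 + \sigma_2\sigma_3\rho_{2,3}\bar X_1 = 0.03 - 0.06$, so the mean moves from $6$ to $5.97$ even though every noise source has zero mean.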
\[
\exp\!\left(\sum_{k=1}^{n} \frac{1}{2}\left(t_k - t_{k-1}\right)\Big(\sum_{m=k}^{n}\sigma_m\Big)^{2}\right)
= \exp\!\left(\frac{1}{2}\sum_{k=1}^{n}\Big(\sum_{m=k}^{n}\sigma_m\Big)^{2}\left(t_k - t_{k-1}\right)\right). \tag{162}
\]
When one of the inputs is affected by multiplicative noise, and the other by additive noise, the mean value of the output is not affected, even if the driving noise is the same in both cases. If we consider again the chemical reaction (140), the differential equation in that case becomes
\[
\frac{dY}{dt} = -aY + \left(\lambda_1\bar X_1\int_0^t e^{-\left(\lambda_1+\frac{\sigma_1^2}{2}\right)x} e^{\sigma_1 W_x}\,dx\right)\left(\bar X_2 + \sigma_2\int_0^t e^{-\lambda_2(t-y)}\,dW_y\right). \tag{163}
\]
The input is equal to
\[
u(t) = \left(\lambda_1\bar X_1\int_0^t e^{-\left(\lambda_1+\frac{\sigma_1^2}{2}\right)x} e^{\sigma_1 W_x}\,dx\right)\left(\bar X_2 + \sigma_2\int_0^t e^{-\lambda_2(t-y)}\,dW_y\right) \tag{164}
\]
and its expected value is
\[
E[u(t)] = \lambda_1\bar X_1\bar X_2\int_0^t e^{-\left(\lambda_1+\frac{\sigma_1^2}{2}\right)x}\,E\!\left[e^{\sigma_1 W_x}\right]dx
+ \sigma_2\lambda_1\bar X_1\int_0^t\!\!\int_0^t e^{-\left(\lambda_1+\frac{\sigma_1^2}{2}\right)x}\,e^{-\lambda_2(t-y)}\,E\!\left[e^{\sigma_1 W_x}\,dW_y\right]dx. \tag{165}
\]
In order to compute the second term of the last equation, we need the following lemma about the expected value of the product of an exponential Wiener process with an infinitesimal increment of the same process.

Lemma 7. If $W_t$ is a standard Wiener process, then
\[
E\!\left[e^{\sigma W_s}\,dW_t\right] =
\begin{cases}
0 & \text{if } s \le t,\\
\sigma e^{\sigma^2 s/2}\,dt & \text{if } s > t.
\end{cases} \tag{166}
\]
Proof. If $s < t$, then $W_s$ and $dW_t = W_{t+dt} - W_t$ are independent, so
\[
E\!\left[e^{\sigma W_s}\,dW_t\right] = E\!\left[e^{\sigma W_s}\right]E[dW_t] = 0. \tag{167}
\]
Now, if $0 < a < b < s$, then
\[
E\!\left[e^{\sigma W_s}\left(W_b-W_a\right)\right]
= E\!\left[e^{\sigma W_a}\right]E\!\left[e^{\sigma(W_b-W_a)}\left(W_b-W_a\right)\right]E\!\left[e^{\sigma(W_s-W_b)}\right]
= e^{\frac{\sigma^2 a}{2}}\cdot \sigma(b-a)\,e^{\frac{\sigma^2(b-a)}{2}}\cdot e^{\frac{\sigma^2(s-b)}{2}}
= \sigma e^{\frac{\sigma^2 s}{2}}(b-a). \tag{168}
\]
Setting $a = t$ and $b = t + dt$, we get the desired result.

Recalling equation (165),
\[
\begin{aligned}
E[u(t)] &= \lambda_1\bar X_1\bar X_2\int_0^t e^{-\lambda_1 x}\,dx
+ \sigma_1\sigma_2\lambda_1\bar X_1\, e^{-\lambda_2 t}\int_0^t\left(\int_y^t e^{-\lambda_1 x} e^{\lambda_2 y}\,dx\right)dy\\
&= \bar X_1\bar X_2\left(1-e^{-\lambda_1 t}\right)
+ \frac{\sigma_1\sigma_2\bar X_1\, e^{-t(\lambda_1+\lambda_2)}\left(\lambda_1\left(1-e^{t\lambda_2}\right)-\lambda_2\left(1-e^{t\lambda_1}\right)\right)}{\lambda_2\left(\lambda_1-\lambda_2\right)}.
\end{aligned} \tag{169}
\]
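Lemma 7 can be sanity-checked by Monte Carlo through its finite-increment form (168). The sketch below (illustrative times and noise level) estimates both cases of the lemma: an increment in the "past" of $W_s$, and an increment in the "future" of the exponential, which should average to zero.

```python
import numpy as np

# Monte Carlo check of Lemma 7, via the finite-increment identity (168):
# E[e^{sigma W_s}(W_b - W_a)] = sigma e^{sigma^2 s / 2}(b - a) for 0 < a < b < s,
# while E[e^{sigma W_a}(W_s - W_b)] = 0 since that increment lies in the future.
sigma, a, b, s = 0.5, 0.3, 0.5, 1.0   # times 0 < a < b < s (illustrative)

rng = np.random.default_rng(3)
n = 400000
Wa = rng.normal(0.0, np.sqrt(a), n)            # W at time a
Wb = Wa + rng.normal(0.0, np.sqrt(b - a), n)   # W at time b
Ws = Wb + rng.normal(0.0, np.sqrt(s - b), n)   # W at time s

past_estimate = np.mean(np.exp(sigma * Ws) * (Wb - Wa))   # increment before s
past_exact = sigma * np.exp(sigma**2 * s / 2) * (b - a)
future_estimate = np.mean(np.exp(sigma * Wa) * (Ws - Wb)) # increment after a
```

Here `past_exact` is about $0.113$, and the empirical average agrees with it, while the future-increment average vanishes to within Monte Carlo error.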
As time $t$ grows large,
\[
\lim_{t\to\infty} E[u(t)] = \bar X_1\bar X_2 \tag{170}
\]
and the mean of the output is
\[
E[Y] = \frac{1}{a}\bar X_1\bar X_2, \tag{171}
\]
which is exactly the same as in the case where the two noise inputs are completely uncorrelated. So input noise correlation does not affect the average concentration of the output in this case.

This section has analyzed how noise propagates in an arbitrary chemical reaction network where one or more inputs include a random component. The different noise sources may have arbitrary correlations with each other. We have studied the propagation of both additive and multiplicative noise. One of the main results is that even if all noise sources have zero mean, their correlations shift the mean of the outputs, for both types of noise. If there is positive correlation, the mean of the output increases; when the correlation is negative, it shifts lower, and the same is true for the output variance.

We have shown how noise propagates in networks and how a network's noisy parameters can affect its output. Since many biological networks are locally tree-like, we have studied how noise propagates in the absence of feedforward or feedback cycles. Tree networks are relatively easy to analyze quantitatively, since there is only one path from each node to another. We have derived a method to compute the variance of the output of any tree network, and shown that the variance is minimized when there are no "bottlenecks" in each pathway, in other words when there is no rate-limiting step. When a network is not a tree, there are cycles, which means that a signal (along with its noise) can propagate through two or more paths towards the output. Feedback cycles typically reduce the output variance, and feedforward cycles increase it. When the noise sources are correlated, the variance in the output is larger, and small cycles have a stronger influence on the output than longer cycles in both cases.
Delays contribute to the decrease of the output noise when we have two or more noise sources, since their correlation is diminished. Crosstalk is also shown to decrease the output variance, but the tradeoff is that the output mean is lowered, or the concentration of the inputs needs to be proportionally higher in order to ensure the same output. In biological and chemical reaction networks, the reaction rates are prone to noise, since they depend on the concentrations of other species. When the degradation rates are affected by noise, the result is increased output variance, which also depends on the concentration of the respective species, and the form of the output is different from when the noise is in the inputs, in the sense that higher concentrations also correspond to larger deviations from the mean. Finally, we have extensively studied how noise propagates through chemical reaction networks where one or more of the reactants are noisy, and their disturbances may be correlated. Even when the disturbances have zero average, correlations change the output mean and variance.

References

[1] Paulsson, J. Summing up the noise in gene networks, Nature 427, 415-418 (2004).

[2] Raj, A and van Oudenaarden, A. Nature, nurture, or chance: stochastic gene expression and its consequences, Cell 135, 216-226 (2008).

[3] Eldar, A and Elowitz, MB. Functional roles for noise in genetic circuits, Nature 467, 167-173 (2010).

[4] Barmpoutis, D and Murray, RM. Quantification and minimization of crosstalk sensitivity in networks, arXiv:1012.0606 (2010).

[5] Åström, KJ and Murray, RM. Feedback Systems: An Introduction for Scientists and Engineers, Princeton University Press (2008).

[6] Liptser, RS and Shiryaev, AN. Statistics of Random Processes I: General Theory, Springer (2000).

[7] Newman, MEJ. Networks: An Introduction, Oxford University Press (2010).

[8] Rosner, B. Fundamentals of Biostatistics, Duxbury Press (2005).

[9] Jeong, H et al. The large-scale organization of metabolic networks, Nature 407, 651-654 (2000).

[10] Olfati-Saber, R and Murray, RM. Consensus problems in networks of agents with switching topology and time-delays, IEEE Transactions on Automatic Control 49, 1520-1533 (2004).

[11] Hakimi, SL. On realizability of a set of integers as degrees of the vertices of a linear graph, Journal of the Society for Industrial and Applied Mathematics 10, 496-506 (1962).

[12] Barmpoutis, D and Murray, RM. Extremal properties of complex networks, arXiv:1104.5532 (2011).

[13] Barmpoutis, D and Murray, RM. Networks with the smallest average distance and the largest average clustering, arXiv:1007.4031 (2010).

[14] Alon, U. An Introduction to Systems Biology: Design Principles of Biological Circuits, Chapman and Hall/CRC (2006).

[15] Gardiner, CW. Handbook of Stochastic Methods for Physics, Chemistry, and the Natural Sciences, Springer (1983).

[16] Lestas, I et al. Fundamental limits on the suppression of molecular fluctuations, Nature 467, 174-178 (2010).

[17] Warren, PB, Tănase-Nicola, S and ten Wolde, PR. Exact results for noise power spectra in linear biochemical reaction networks, The Journal of Chemical Physics 125, 144904 (2006).

[18] Samoilov, M, Arkin, A and Ross, J. Signal processing by simple chemical systems, The Journal of Physical Chemistry A 106 (2002).