A stochastic molecular scheme for an artificial cell to infer its environment from partial observations
Muppirala Viswa Virinchi, Abhishek Behera, Manoj Gopalkrishnan
Indian Institute of Technology Bombay, Mumbai, India
{axlevisu, abhishek.behera.iitm, manoj.gopalkrishnan}@gmail.com

April 1, 2017

Abstract
The notion of entropy is shared between statistics and thermodynamics, and is fundamental to both disciplines. This makes statistical problems particularly suitable for reaction network implementations. In this paper we show how to perform a statistical operation known as Information Projection or E-projection with stochastic mass-action kinetics. Our scheme encodes desired conditional distributions as the equilibrium distributions of reaction systems. To our knowledge this is a first scheme to exploit the inherent stochasticity of reaction networks for information processing. We apply this to the problem of an artificial cell trying to infer its environment from partial observations.
Introduction

Biological cells function in environments of high complexity. Transmembrane receptors allow a cell to sample the state of its environment, following which biochemical reaction networks integrate this information, and compute decision rules which allow the cell to respond in sophisticated ways. One challenge is that receptors may be imperfectly specific, binding to multiple ligands with various propensities. What algorithmic and statistical ideas are needed to deal with this challenge, and how would these ideas be implemented with reaction networks? These are the questions we begin to address here. The two questions do not decouple because the attractiveness of algorithmic and statistical ideas towards these challenges is tied in with their ease of implementation with reaction networks. We are interested in statistical algorithms that fully exploit the native dynamics and stochasticity of reaction networks. To fix ideas, let us consider an example.
Example 1.
Consider an artificial cell with two types of transmembrane receptors $R_1$ and $R_2$ in an environment with three ligand species $L_1$, $L_2$, and $L_3$ (Figure 1). Receptor $R_1$ has equal affinity to ligands $L_1$ and $L_3$, and no affinity to $L_2$. Receptor $R_2$ has equal affinity to ligands $L_2$ and $L_3$, and no affinity to $L_1$. This information can be summarized in an observation matrix
$$O = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$
whose rows are indexed by the receptors $R_1, R_2$ and whose columns are indexed by the ligands $L_1, L_2, L_3$. The cell would like to infer the populations $l_1, l_2, l_3$ of the ligands from receptor binding information. We assume that a prior probability distribution over ligand states $(l_1, l_2, l_3) \in \mathbb{Z}^3_{\geq 0}$ is given. We further assume that this prior probability distribution is a product of Poisson distributions specified by given Poisson rate parameters $q_1, q_2, q_3 \in \mathbb{R}_{>0}$ respectively. Lemma 4 provides intuition for the product-Poisson assumption.

Figure 1: An artificial cell with two transmembrane receptors $R_1$ and $R_2$ and extracellular ligands $L_1, L_2, L_3$. $R_1$ has equal affinity to both $L_1$ and $L_3$. $R_2$ has equal affinity to both $L_2$ and $L_3$.

The following questions concern us.

1. Given information on the exact numbers $r_1$ and $r_2$ of binding events of receptors $R_1$ and $R_2$, obtain samples over populations $(l_1, l_2, l_3)$ of the ligand species according to the Bayesian posterior distribution $\Pr[(l_1, l_2, l_3) \mid (r_1, r_2, \mathrm{Poisson}(q_1, q_2, q_3))]$.

2. Given information on the average numbers $\langle r_1 \rangle$ and $\langle r_2 \rangle$ of binding events of receptors $R_1$ and $R_2$, obtain samples over populations $(l_1, l_2, l_3)$ of the ligand species according to the Bayesian posterior distribution $\Pr[(l_1, l_2, l_3) \mid (\langle r_1 \rangle, \langle r_2 \rangle, \mathrm{Poisson}(q_1, q_2, q_3))]$.

We investigate these questions for arbitrary numbers of receptors and ligands, arbitrary observation matrices $O$, and arbitrary product-Poisson rate parameters $q$, and make the following new contributions:

• In Section 3, we precisely state our question in the general setting. In Section 4, we illustrate our main ideas on Example 1.

• In Section 5.1, we describe a reaction network scheme Proj that takes as input an observation matrix $O$ and outputs a prime chemical reaction network. Our proposed reaction networks have the following merits that make them promising candidates for molecular implementation. Implementing the reactions requires only thermodynamic control and not kinetic control, because the reaction rate constants need only be specified up to the equilibrium constant for the reactions (Remark 3). Our scheme avoids catalysis, and so is robust to "leak reaction" situations [21] (Remark 4).

• In Section 5.2, we address Question 1. We show that for each fixed $O$ and $q$, when the chemical reaction system is initialized as prescribed according to the numbers $r_i$ of binding events of receptors, and allowed to evolve according to stochastic mass-action kinetics, the system evolves towards the desired Bayesian posterior distribution (Theorem 7).

• In Section 5.3, we address Question 2. We show that for each fixed $O$ and $q$, when the chemical reaction system is initialized as prescribed according to the average numbers $\langle r_i \rangle$ of binding events of receptors, and allowed to evolve according to deterministic mass-action kinetics, the distribution of unit-volume aliquots of the system evolves towards the desired Bayesian posterior distribution (Theorem 10).

• We do a literature review in Section 6, comparing our scheme with other reaction network schemes that process information. Exploiting inherent stochasticity and free energy minimization appear to be the two key new ideas in our scheme.

• In Section 7, we discuss limitations and directions for future work, including a reaction scheme for the expectation-maximization algorithm, which is a commonly used algorithm in machine learning and may be a more sophisticated way for an artificial cell to infer its environment from partial observations.
Background
For $n \in \mathbb{Z}_{>0}$, following [15], KL divergence $D : \mathbb{R}^n_{\geq 0} \times \mathbb{R}^n_{\geq 0} \to \mathbb{R}$ is the function
$$D(x \,\|\, y) := \sum_{i=1}^n \left( x_i \log\frac{x_i}{y_i} - x_i + y_i \right)$$
with the convention $0 \log 0 = 0$ and, for $p > 0$, $p \log 0 = -\infty$. If $x, y$ are probability distributions then $\sum_{i=1}^n (-x_i + y_i) = 0$ and KL divergence is the same as relative entropy $\sum_{i=1}^n x_i \log(x_i / y_i)$. When the index $i$ takes values over a countably infinite set, we define KL divergence by the same formal sum as above, and understand it to be well-defined whenever the infinite sum converges in $[0, \infty]$. For $x \in \mathbb{R}^k_{>0}$, by Poisson($x$) we mean
$$\Pr[n_1, n_2, \dots, n_k \mid x] = \prod_{i=1}^k \frac{e^{-x_i} x_i^{n_i}}{n_i!}.$$
The following lemma is well-known and easy to show.

Lemma 1. $D(\mathrm{Poisson}(x) \,\|\, \mathrm{Poisson}(y)) = D(x \,\|\, y)$ for all $x, y \in \mathbb{R}^k_{>0}$.

The Exponential-Projection or E-projection [15] (or Information-Projection or I-projection [6]) of a probability distribution $q$ onto a set of distributions $P$ is $p^* = \arg\min_{p \in P} D(p \,\|\, q)$. The Mixture-Projection or M-projection (or reverse I-projection) of a probability distribution $p$ onto a set of distributions $Q$ is $q^* = \arg\min_{q \in Q} D(p \,\|\, q)$.

We recall notation, definitions, and results from reaction network theory [10, 14, 11, 12, 1]. For $x, y \in \mathbb{R}^k$, by $x^y$ we mean $\prod_{i=1}^k x_i^{y_i}$, and by $e^x$ we mean $\prod_{i=1}^k e^{x_i}$. For $m \in \mathbb{Z}^k_{\geq 0}$, by $m!$ we mean $\prod_{i=1}^k m_i!$.

Fix a finite set $S$ of species. By a reaction we mean a formal chemical equation
$$\sum_{i \in S} y_i X_i \to \sum_{i \in S} y'_i X_i$$
where the numbers $y_i, y'_i \in \mathbb{Z}_{\geq 0}$ are the stoichiometric coefficients. This reaction is also written as $y \to y'$ where $y, y' \in \mathbb{Z}^S_{\geq 0}$. A reaction network is a pair $(S, \mathcal{R})$ where $S$ is finite, and $\mathcal{R}$ is a finite set of reactions. It is reversible iff for every reaction $y \to y' \in \mathcal{R}$, the reaction $y' \to y \in \mathcal{R}$. Fix $n, n' \in \mathbb{Z}^S_{\geq 0}$. We say that $n \mapsto_{\mathcal{R}} n'$, read $n$ maps to $n'$, iff there exists a reaction $y \to y' \in \mathcal{R}$ with $y_i \leq n_i$ for all $i \in S$ and $n' = n + y' - y$. We say that $n \Rightarrow_{\mathcal{R}} n'$, or in words that $n'$ is $\mathcal{R}$-reachable from $n$, iff there exist a nonnegative integer $k \in \mathbb{Z}_{\geq 0}$ and $n(1), n(2), \dots, n(k) \in \mathbb{Z}^S_{\geq 0}$ such that $n(1) = n$, $n(k) = n'$, and for $i = 1$ to $k - 1$ we have $n(i) \mapsto_{\mathcal{R}} n(i+1)$. A reaction network $(S, \mathcal{R})$ is weakly reversible iff for every reaction $y \to y' \in \mathcal{R}$, we have $y' \Rightarrow y$. Trivially, every reversible reaction network is weakly reversible. The reachability class of $n \in \mathbb{Z}^S_{\geq 0}$ is the set $\Gamma(n) = \{ n' \mid n \Rightarrow_{\mathcal{R}} n' \}$. The stoichiometric subspace $H_{\mathcal{R}}$ is the real span of the vectors $\{ y' - y \mid y \to y' \in \mathcal{R} \}$. The conservation class containing $x \in \mathbb{R}^S_{\geq 0}$ is the set $C(x) = (x + H_{\mathcal{R}}) \cap \mathbb{R}^S_{\geq 0}$.

Fix a weakly reversible reaction network $(S, \mathcal{R})$. Let $x = (x_i)_{i \in S}$. The associated ideal $I(S, \mathcal{R}) \subseteq \mathbb{C}[x]$ is the ideal generated by the binomials $\{ x^y - x^{y'} \mid y \to y' \in \mathcal{R} \}$. A reaction network is prime iff its associated ideal is a prime ideal, i.e., for all $f, g \in \mathbb{C}[x]$, if $fg \in I$ then either $f \in I$ or $g \in I$.

A reaction system is a triple $(S, \mathcal{R}, k)$ where $(S, \mathcal{R})$ is a reaction network and $k : \mathcal{R} \to \mathbb{R}_{>0}$ is called the rate function. It is detailed balanced iff it is reversible and there exists a point $q \in \mathbb{R}^S_{>0}$ such that for every reaction $y \to y' \in \mathcal{R}$:
$$k_{y \to y'}\, q^y = k_{y' \to y}\, q^{y'}.$$
A point $q \in \mathbb{R}^S_{>0}$ that satisfies the above condition is called a point of detailed balance.

Fix a reaction system $(S, \mathcal{R}, k)$. Then stochastic mass action describes a continuous-time Markov chain on the state space $\mathbb{Z}^S_{\geq 0}$. A state $n = (n_i)_{i \in S} \in \mathbb{Z}^S_{\geq 0}$ of this Markov chain represents a vector of molecular counts, i.e., each $n_i$ is the number of molecules of species $i$ in the population. Transitions go from $n \to n + y' - y$ for each $n \in \mathbb{Z}^S_{\geq 0}$ and each $y \to y' \in \mathcal{R}$, with transition rates
$$\lambda(n \to n + y' - y) = k_{y \to y'}\, \frac{n!}{(n - y)!}.$$
The following theorem states that the stationary distributions of detailed-balanced reaction networks are obtained from products of Poisson distributions. It is well-known; see for example [23] for a proof.

Theorem 2.
If $(S, \mathcal{R}, k)$ is detailed balanced with $q$ a point of detailed balance, then the corresponding stochastic mass action Markov chain admits on each reachability class $\Gamma \subset \mathbb{Z}^S_{\geq 0}$ a unique stationary distribution
$$\pi_\Gamma(n) \propto \begin{cases} \dfrac{e^{-q}\, q^n}{n!} & \text{for } n \in \Gamma \\ 0 & \text{otherwise.} \end{cases}$$
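To make Theorem 2 concrete, here is a small numerical check (our illustration, not part of the original text). For the reversible reaction $X_1 + X_2 \rightleftharpoons X_3$, with rate constants chosen so that $q$ is a point of detailed balance, it builds the generator of the stochastic mass-action chain on one reachability class and verifies that the stationary distribution is Poisson($q$) restricted to that class. The particular numbers and the use of numpy are illustrative assumptions.

```python
import numpy as np
from math import factorial

q = np.array([0.7, 1.3, 2.0])            # point of detailed balance
k_fwd, k_bwd = q[2], q[0] * q[1]          # then k_fwd*q1*q2 == k_bwd*q3

n0 = np.array([4, 3, 0])                  # initial counts of (X1, X2, X3)
# Reachability class of n0 under X1 + X2 <-> X3: states (n1-j, n2-j, j).
states = [tuple(int(c) for c in n0 + j * np.array([-1, -1, 1]))
          for j in range(min(n0[0], n0[1]) + 1)]
index = {s: i for i, s in enumerate(states)}

# CTMC generator with transition rates k * n!/(n-y)! as in the setup above.
Q = np.zeros((len(states), len(states)))
for s, i in index.items():
    n1, n2, n3 = s
    if n1 >= 1 and n2 >= 1:                         # X1 + X2 -> X3
        Q[i, index[(n1 - 1, n2 - 1, n3 + 1)]] += k_fwd * n1 * n2
    if n3 >= 1:                                     # X3 -> X1 + X2
        Q[i, index[(n1 + 1, n2 + 1, n3 - 1)]] += k_bwd * n3
    Q[i, i] = -Q[i].sum()

# Stationary distribution: left null vector of the generator.
w, v = np.linalg.eig(Q.T)
pi = np.real(v[:, np.argmin(np.abs(w))])
pi /= pi.sum()

# Product-Poisson(q) restricted to the reachability class.
unnorm = np.array([np.prod(q ** np.array(s)) / np.prod([factorial(c) for c in s])
                   for s in states])
print(np.allclose(pi, unnorm / unnorm.sum()))       # expect True
```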
Deterministic mass action describes a system of ordinary differential equations in concentration variables $\{ x_i(t) \mid i \in S \}$:
$$\dot{x}(t) = \sum_{y \to y' \in \mathcal{R}} k_{y \to y'}\, x(t)^y\, (y' - y) \qquad (1)$$
Note that every detailed balance point is a fixed point of Equation 1. For detailed balanced reaction systems, every fixed point is also detailed balanced. Moreover, every conservation class $C(x_0)$ has a unique detailed balance point $x^*$ in the positive orthant. Further, if the reaction network is prime then $x^*$ is a "global attractor," i.e., all trajectories starting in $C(x_0) \cap \mathbb{R}^S_{>0}$ asymptotically reach $x^*$. (Recently Craciun [5] has proved the global attractor theorem for all detailed-balanced reaction systems with a much more involved proof. We do not need Craciun's theorem; the special case which holds for prime detailed-balanced reaction systems, and is much easier to prove, suffices for our purposes.) The following Global Attractor Theorem for Prime Detailed Balanced Reaction Systems follows from [12, Corollary 4.3, Theorem 5.2]. See [13, Theorem 3] for another restatement of this theorem.

Theorem 3.
Let $(S, \mathcal{R}, k)$ be a prime, detailed balanced reaction system with point of detailed balance $q$. Fix a point $x_0 \in \mathbb{R}^S_{>0}$. Then there exists a point of detailed balance $x^*$ in $C(x_0) \cap \mathbb{R}^S_{>0}$ such that for every trajectory $x(t)$ to Equation 1 with initial conditions $x(0) \in C(x_0) \cap \mathbb{R}^S_{\geq 0}$, the limit $\lim_{t \to \infty} x(t)$ exists and equals $x^*$. Further, $D(x(t) \,\|\, q)$ is strictly decreasing along non-stationary trajectories and attains its unique minimum value in $C(x_0) \cap \mathbb{R}^S_{\geq 0}$ at $x^*$.

We argue in the next lemma that a product of Poisson distributions is not an unreasonable form to use as a prior on ligand populations. The ideas are familiar from statistical mechanics as well as stochastic processes. We recall them in a chemical context.
Lemma 4.
Consider a well-mixed vessel of infinite volume with $n$ species $X_1, X_2, \dots, X_n$ at concentrations $x_1, x_2, \dots, x_n$ respectively. Assume that the solution is sufficiently dilute, and that molecule volumes are vanishingly small. A unit volume aliquot is taken. Then the probability of finding the population in the aliquot in state $(m_1, m_2, \dots, m_n) \in \mathbb{Z}^n_{\geq 0}$ is given by the product-Poisson distribution $\prod_{i=1}^n \frac{e^{-x_i} x_i^{m_i}}{m_i!}$.

Proof.
We will first do the analysis for a finite volume $V$ and then let $V \to \infty$. Consider a container of finite volume $V$, which contains species $X_1, X_2, \dots, X_n$ at concentrations $x_1, x_2, \dots, x_n$. Consider a unit volume aliquot within this particular container. The probability of finding a particular molecule from the vessel within the unit volume aliquot is $1/V$. The number of molecules of species $X_i$ in the vessel is $V x_i$ for $i = 1, \dots, n$. Hence the probability of finding $m_i$ molecules of species $X_i$ in the aliquot is given by the binomial probability
$$\binom{V x_i}{m_i} \left(\frac{1}{V}\right)^{m_i} \left(1 - \frac{1}{V}\right)^{V x_i - m_i}.$$
We assume that the solution is sufficiently dilute, and that molecular sizes are vanishingly small, so that the probability of finding one molecule in the aliquot is independent of the probability of finding a different molecule in the aliquot. This assumption leads to:
$$\Pr(m_1, m_2, \dots, m_n \mid x_1, x_2, \dots, x_n) = \prod_{i=1}^n \binom{V x_i}{m_i} \left(\frac{1}{V}\right)^{m_i} \left(1 - \frac{1}{V}\right)^{V x_i - m_i}.$$
Letting $V \to \infty$ gives the product-Poisson form, because for all $i \in \{1, 2, \dots, n\}$:
$$\lim_{V \to \infty} \binom{V x_i}{m_i} \left(\frac{1}{V}\right)^{m_i} \left(1 - \frac{1}{V}\right)^{V x_i - m_i} = \lim_{V \to \infty} \frac{V x_i (V x_i - 1) \cdots (V x_i - m_i + 1)}{V^{m_i}\, m_i!} \left[ (1 - 1/V)^V \right]^{x_i - m_i / V}$$
which equals $e^{-x_i} x_i^{m_i} / m_i!$.

Fix positive integers $n_R, n_L \in \mathbb{Z}_{>0}$ with $n_R \leq n_L$ denoting the number of receptor species and the number of ligand species respectively. Fix $q = (q_1, q_2, \dots, q_{n_L}) \in \mathbb{R}^{n_L}_{>0}$ denoting Poisson rate parameters for the product-Poisson distribution Poisson($q$) which we consider as a prior over ligand numbers. Fix an $n_R \times n_L$ observation matrix $O$ with entries $o_{ij}$ in the nonnegative rational numbers $\mathbb{Q}_{\geq 0}$. The entry $o_{ij}$ denotes the affinity of the $i$'th receptor $R_i$ for the $j$'th ligand $L_j$. The intuition is that when ligand $j$ encounters receptor $i$, the propensity that a binding occurs is proportional to $o_{ij}$. So a high-affinity ligand will trigger a receptor more often than a low-affinity ligand with the same concentration, with the number of times they trigger the receptor in proportion to their corresponding entries in the observation matrix.

Our results in this paper will hold for a subclass of observation matrices which we term tidy. An observation matrix $O = (o_{ij})_{n_R \times n_L}$ is tidy iff for each receptor $R_i$ there exists a message vector $m_i \in \mathbb{Z}^{n_L}_{\geq 0}$ such that $O m_i = e_i$, where $e_i \in \mathbb{R}^{n_R}$ is the unit vector with a 1 in the row corresponding to the $i$'th receptor. The intuition is that for $j = 1$ to $n_L$, species $X_j$ will be the cell's internal representation of the ligand $L_j$. Every time receptor $R_i$ is bound, it will trigger a cascade leading to the synthesis inside the cell of $m_{ij}$ molecules of species $X_j$ for $j = 1$ to $n_L$.

Note that there could be multiple message vector sets $\{m_i\}_{i = 1 \text{ to } n_R}$, so the cell need not choose the "correct" one. The task of figuring out the true state of the environment will be left to the reaction network operating inside the cell between the molecules $X_1, X_2, \dots, X_{n_L}$. The messages only perform the task of initializing the reaction network in the right reachability class. The following questions concern us.

1. Given information on the exact numbers $r = (r_1, r_2, \dots, r_{n_R}) \in \mathbb{Z}^{n_R}_{\geq 0}$ of receptor binding events, obtain samples over populations $l = (l_1, l_2, \dots, l_{n_L}) \in \mathbb{Z}^{n_L}_{\geq 0}$ of the ligand species according to the Bayesian posterior distribution $\Pr[l \mid (r, \mathrm{Poisson}(q_1, q_2, \dots, q_{n_L}))]$.
2. Given information on the average numbers $\langle r \rangle = (\langle r_1 \rangle, \langle r_2 \rangle, \dots, \langle r_{n_R} \rangle) \in \mathbb{R}^{n_R}_{>0}$ of receptor binding events (averaged over the surface of the cell, or time, or both), obtain samples over populations $l = (l_1, l_2, \dots, l_{n_L}) \in \mathbb{Z}^{n_L}_{\geq 0}$ of the ligand species according to the Bayesian posterior distribution $\Pr[l \mid (\langle r \rangle, \mathrm{Poisson}(q_1, q_2, \dots, q_{n_L}))]$.

Before moving to the general solution, we illustrate our main ideas with an example.

Example 1 (continued). Consider the observation matrix
$$O = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$
and $q = (q_1, q_2, q_3) \in \mathbb{R}^3_{>0}$ from Example 1. We describe a chemical reaction system $(\mathrm{Proj}(O, B), k_q)$ as follows. There is one chemical species $X_i$ corresponding to each ligand $L_i$, so that the species are $X_1$, $X_2$, and $X_3$. To describe the reactions, we compute a basis for the right kernel of $O$. In this case, the vector $(1, 1, -1)^T$ is a basis for the right kernel. (To be precise, we will view the right kernel as a free group in the integer lattice, and take a basis for this free group. This ensures not only that each basis vector has integer coordinates, but also that the corresponding reaction network is prime, which we use crucially in our proofs.) Each basis vector is written as a reversible reaction, with negative numbers representing stoichiometric coefficients on one side of the chemical equation, and positive numbers representing stoichiometric coefficients on the other side. So the vector $(1, 1, -1)^T$ describes the reversible pair of reactions $X_1 + X_2 \rightleftharpoons X_3$. The rates of the reactions need to be set so that $q$ is a point of detailed balance. For this example, calling the forward rate $k_1 \in \mathbb{R}_{>0}$ and the backward rate $k_2 \in \mathbb{R}_{>0}$, the balance condition is $k_1 q_1 q_2 = k_2 q_3$, so that $k_1 / k_2 = q_3 / (q_1 q_2)$. One choice satisfying this condition is $k_1 = q_3$ and $k_2 = q_1 q_2$. Note that our scheme requires only the ratio of the rates to be specified (Remark 3).
Solution to Question 1: Given $r = (r_1, r_2) \in \mathbb{Z}^2_{\geq 0}$, interpreted as $(r_1, r_2)^T = O (l_1, l_2, l_3)^T$, we want to draw samples from the conditional distribution $\Pr[(l_1, l_2, l_3) \mid (r_1, r_2, \mathrm{Poisson}(q_1, q_2, q_3))]$. The statistical solution is to multiply the Bayesian prior Poisson($q_1, q_2, q_3$) by the likelihood $\Pr[(r_1, r_2) \mid (l_1, l_2, l_3, \mathrm{Poisson}(q_1, q_2, q_3))]$, and normalize so probabilities add up to 1. The likelihood is the characteristic function of the set $L = \{ l = (l_1, l_2, l_3) \in \mathbb{Z}^3_{\geq 0} \mid O l^T = r^T \}$.

Note that $O$ is tidy with message vectors $m_1 = (1, 0, 0)^T$ and $m_2 = (0, 1, 0)^T$. The reaction system $(\mathrm{Proj}(O, B), k_q)$, which here is
$$X_1 + X_2 \;\underset{q_1 q_2}{\overset{q_3}{\rightleftharpoons}}\; X_3,$$
is initialized at $n(0) = (r_1, r_2, 0) = \sum_i r_i m_i$, and allowed to evolve according to stochastic mass-action kinetics with master equation
$$\dot{p}(n, t) = q_1 q_2\, (n_3 + 1)\, p(n_1 - 1, n_2 - 1, n_3 + 1, t) + q_3\, (n_1 + 1)(n_2 + 1)\, p(n_1 + 1, n_2 + 1, n_3 - 1, t) - \bigl( q_3\, n_1 n_2 + q_1 q_2\, n_3 \bigr)\, p(n, t)$$
where $p(n, t)$ is the probability that the system is in state $n = (n_1, n_2, n_3)$ at time $t$. We claim that the steady-state distribution is the required Bayesian posterior. First note that this reaction system has a detailed balance point $q$, so it admits Poisson($q$) as a steady-state distribution. Since $n(0) \in L$, it is enough to show that $L$ forms an irreducible component of the Markov chain. Together we conclude that the steady-state distribution will be a restriction of Poisson($q_1, q_2, q_3$) to the set $L$.

To obtain that $L$ forms an irreducible component of the Markov chain, we will crucially use the fact that we chose a basis of the free group to generate our reactions, and not just a basis of the real vector space. This will allow us to prove that the corresponding reaction network is prime, and hence that $L$ forms an irreducible component. Note, for example, that if we had chosen the vector $(2, 2, -2)^T$ in the kernel instead of $(1, 1, -1)^T$, that would have given us the reaction $2X_1 + 2X_2 \rightleftharpoons 2X_3$, in which case $L$ does not form an irreducible component of the Markov chain since each reaction conserves the parity of molecular counts.
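The following sketch (ours; the specific numbers, the simulation horizon, and the use of numpy are illustrative assumptions) simulates this scheme with the Gillespie algorithm: starting from $n(0) = (r_1, r_2, 0)$, many independent runs of the chain $X_1 + X_2 \rightleftharpoons X_3$ are collected, and their empirical distribution is compared against Poisson($q$) restricted to $L$, i.e., the Bayesian posterior of Question 1.

```python
import numpy as np
from math import factorial
from collections import Counter

rng = np.random.default_rng(0)
q1, q2, q3 = 0.5, 1.0, 2.0
r1, r2 = 6, 4                              # observed numbers of binding events

def one_draw(t_end=50.0):
    """Run X1 + X2 <-> X3 (rates q3 forward, q1*q2 backward) from (r1, r2, 0)."""
    n, t = np.array([r1, r2, 0]), 0.0
    while True:
        a_fwd, a_bwd = q3 * n[0] * n[1], q1 * q2 * n[2]
        a_tot = a_fwd + a_bwd
        if a_tot == 0.0:
            return tuple(int(c) for c in n)
        t += rng.exponential(1.0 / a_tot)
        if t > t_end:
            return tuple(int(c) for c in n)
        n += [-1, -1, 1] if rng.random() < a_fwd / a_tot else [1, 1, -1]

draws = Counter(one_draw() for _ in range(20000))

# Posterior support L = {(r1-k, r2-k, k)}; exact posterior is restricted Poisson(q).
support = [(r1 - k, r2 - k, k) for k in range(min(r1, r2) + 1)]
empirical = np.array([draws[s] for s in support], dtype=float)
empirical /= empirical.sum()
exact = np.array([np.prod(np.array([q1, q2, q3]) ** np.array(s))
                  / np.prod([factorial(c) for c in s]) for s in support])
exact /= exact.sum()
print(np.round(empirical, 3))
print(np.round(exact, 3))                  # the two rows should nearly agree
```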
Solution to Question 2: Given $\langle r \rangle = (\langle r_1 \rangle, \langle r_2 \rangle) \in \mathbb{R}^2_{>0}$ of binding events of receptors $R_1$ and $R_2$, with $\langle r \rangle$ interpreted as the empirical average of $O (l_1, l_2, l_3)^T$ over a large number of samples of $(l_1, l_2, l_3)$, we want to draw samples from the conditional distribution $\Pr[(l_1, l_2, l_3) \mid (\langle r_1 \rangle, \langle r_2 \rangle, \mathrm{Poisson}(q_1, q_2, q_3))]$. Note that we are conditioning over an event whose probability tends to 0 unless $O q^T = \langle r \rangle^T$, so the conditional distribution needs to be defined using the notion of regular conditional distribution [8]. As the number of samples goes to infinity, by the conditional limit theorem [8, Theorem 7.3.8, Corollary 7.3.5], this conditional distribution converges to Poisson($x^*$), where $x^* = (x^*_1, x^*_2, x^*_3) \in \mathbb{R}^3_{\geq 0}$ minimizes $D(x \,\|\, q)$ among all $x$ satisfying $O x^T = \langle r \rangle^T$. Because these results are stated in the reference in much greater generality, showing that they actually apply to our case requires some technical work, which is the content of Section 5.3.

To compute $x^*$, we allow $(\mathrm{Proj}(O, B), k_q) = X_1 + X_2 \underset{q_1 q_2}{\overset{q_3}{\rightleftharpoons}} X_3$ to evolve according to deterministic mass-action kinetics starting from $x(0) = (\langle r_1 \rangle, \langle r_2 \rangle, 0) = \sum_i \langle r_i \rangle m_i$:
$$\begin{pmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \\ \dot{x}_3(t) \end{pmatrix} = \bigl( q_3\, x_1(t)\, x_2(t) - q_1 q_2\, x_3(t) \bigr) \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}$$
Then the equilibrium concentration is the desired $x^*$ by Theorem 3. The required sample can be drawn by sampling a unit aliquot, as in Lemma 4.

Our scheme suggests that the reactions are carried out in infinite volume, which seems impractical. In practice, infinite volume need not be necessary because the chemical dynamics of molecular numbers even as small as 50 molecules are often described fairly accurately by the infinite-volume limit. Further, our scheme suggests an infinite number of samples for this to work correctly, which also looks impractical. However, the rate of convergence is exponentially fast, so the scheme can be expected to work quite accurately even with a moderate number of samples. Analysis beyond the scope of the current paper is needed to explore the tradeoffs in volume and number of samples (also see Section 7).
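As a numerical companion (ours, with illustrative numbers; scipy is an assumed dependency), the sketch below integrates this ODE from $x(0) = (\langle r_1 \rangle, \langle r_2 \rangle, 0)$ and checks that the limit agrees with a direct constrained minimization of $D(x \,\|\, q)$ over $\{ x : Ox = \langle r \rangle \}$, which is the E-projection that Theorem 3 guarantees (see also Section 5.3).

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

q = np.array([0.5, 1.0, 2.0])
r_avg = np.array([6.0, 4.0])                        # (<r1>, <r2>)
O = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])

def ode(_, x):                                       # X1 + X2 <-> X3, rates (q3, q1*q2)
    flux = q[2] * x[0] * x[1] - q[0] * q[1] * x[2]
    return flux * np.array([-1.0, -1.0, 1.0])

x0 = np.array([r_avg[0], r_avg[1], 0.0])
x_star = solve_ivp(ode, (0.0, 200.0), x0, rtol=1e-10, atol=1e-12).y[:, -1]

# Direct E-projection: minimize D(x || q) subject to O x = <r>, x > 0.
D = lambda x: float(np.sum(x * np.log(x / q) - x + q))
res = minimize(D, x0 + 0.1, bounds=[(1e-9, None)] * 3,
               constraints={"type": "eq", "fun": lambda x: O @ x - r_avg})
print(np.round(x_star, 4))
print(np.round(res.x, 4))                            # should agree with x_star
```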
In this subsection, we present a reaction scheme Proj (short for projection) that takes as input a matrix $O$ with rational entries and a basis $B$ for the free group $\ker O \cap \mathbb{Z}^{n_L}$, and outputs a reversible reaction network Proj($O, B$) that is prime. The same scheme, appropriately initialized, serves to perform M-projection (as we showed in [13]) and E-projection, as we show here.

Definition 2.
Fix a matrix $O = (o_{ij})_{m \times n}$ with rational entries $o_{ij} \in \mathbb{Q}$, and a basis $B$ for the free group $\mathbb{Z}^n \cap \ker O$. The reaction network Proj($O, B$) is described by species $X_1, X_2, \dots, X_n$ and, for each $b \in B$, the reversible reaction
$$\sum_{j : b_j > 0} b_j X_j \;\rightleftharpoons\; \sum_{j : b_j < 0} (-b_j) X_j.$$
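As an illustration of Definition 2 (our sketch, not part of the paper; sympy is an assumed dependency), the following code builds Proj($O, B$) for the observation matrix of Example 1. The kernel there is one-dimensional, so a primitive integer vector obtained by clearing denominators and dividing by the gcd is a valid basis of the free group $\mathbb{Z}^3 \cap \ker O$; for higher-dimensional kernels one should compute a lattice basis properly (for example via the Hermite or Smith normal form), as the definition requires.

```python
from sympy import Matrix, gcd, lcm

O = Matrix([[1, 0, 1],
            [0, 1, 1]])      # rows: receptors R1, R2; columns: ligands L1, L2, L3

basis = []
for v in O.nullspace():                       # rational kernel basis
    v = v * lcm([term.q for term in v])       # clear denominators
    v = v / gcd(list(v))                      # make the integer vector primitive
    if next(c for c in v if c != 0) < 0:      # fix an orientation for readability
        v = -v
    basis.append([int(c) for c in v])

def as_reaction(b):
    """Render a basis vector as the reversible reaction of Definition 2."""
    side = lambda terms: " + ".join(f"{c} X{j+1}" for j, c in terms) or "0"
    lhs = side([(j, c) for j, c in enumerate(b) if c > 0])
    rhs = side([(j, -c) for j, c in enumerate(b) if c < 0])
    return f"{lhs} <=> {rhs}"

for b in basis:
    print(b, ":", as_reaction(b))             # expect [1, 1, -1] : 1 X1 + 1 X2 <=> 1 X3
```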
Remark 3.
Exquisitely setting the specific rates of individual reactions to desired values requires a detailed understanding of molecular dynamics, and is forbiddingly difficult with current molecular technology. When we set rates, we will only require that a given point remains a point of detailed balance. This is equivalent to specifying the equilibrium constants of all the reactions. This is an equilibrium thermodynamics condition, hence much less forbidding.
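As a concrete instance (our illustration), for the reversible pair $X_1 + X_2 \rightleftharpoons X_3$ of Example 1, detailed balance at $q$ pins down only the equilibrium constant:
$$k_{+}\, q_1 q_2 = k_{-}\, q_3 \quad \Longleftrightarrow \quad \frac{k_{+}}{k_{-}} = \frac{q_3}{q_1 q_2},$$
so the choices $(k_+, k_-) = (q_3, q_1 q_2)$ and $(k_+, k_-) = (1, q_1 q_2 / q_3)$ describe the same equilibrium and the same stationary distribution, differing only in an overall timescale.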
Lemma 5.
Fix a matrix $O = (o_{ij})_{m \times n}$ with rational entries $o_{ij} \in \mathbb{Q}$, and a basis $B$ for the free group $\mathbb{Z}^n \cap \ker O$. Then the reaction network Proj($O, B$) is prime.
Proof. [18, Corollary 1.15] establishes this when $O$ is a matrix of integers. Scaling the rational entries to make them all integers makes no difference to the kernel.

Remark 4.
From [12, Theorem 5.2], prime reaction networks are free of catalysis. Catalysts require care to implement. Ideally a catalyst should act as a switch, so that its absence completely shuts off the catalyzed reaction. In practice, there is always a "leak reaction" [21] even in the absence of the catalyst species. Care needs to be taken that the timescales of the leak are much slower than the timescales of the catalyzed reaction to get an acceptable approximation to the final answer. It is therefore notable that our scheme is able to perform a nontrivial computation even though it admits an implementation wholly free of catalysis.

Example 5.
Consider the reaction $2X \rightleftharpoons 0$. On the state space $\mathbb{Z}_{\geq 0}$, this reaction will preserve the parity of the initial number $n$ of $X$. This is a case where the intersection of a conservation class $C(n)$ with the state space does not equal the reachability class $\Gamma(n)$. It turns out that these "non-benign" situations only happen when the reaction network is not prime. We will use this property when answering Questions 1 and 2, so we establish it now.
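The parity obstruction of Example 5 can be checked directly with a few lines of Python (an illustrative sketch of ours, not from the paper): breadth-first search over the states reachable from $n = 5$ under $2X \rightleftharpoons 0$ visits only odd counts, whereas the conservation class meets $\mathbb{Z}_{\geq 0}$ in every count; the prime network $X \rightleftharpoons 0$ shows no such obstruction.

```python
def reachable(n0, jumps, cap=12):
    """Molecular counts reachable from n0 using the given stoichiometric jumps."""
    seen, frontier = {n0}, [n0]
    while frontier:
        n = frontier.pop()
        for j in jumps:
            if j < 0 and n < -j:          # a consuming jump needs enough molecules
                continue
            m = n + j
            if 0 <= m <= cap and m not in seen:
                seen.add(m)
                frontier.append(m)
    return sorted(seen)

print(reachable(5, jumps=[-2, +2]))       # 2X <-> 0 : only odd counts 1, 3, 5, ...
print(reachable(5, jumps=[-1, +1]))       # X <-> 0  : every count 0, 1, ..., cap
```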
Definition 6.
A weakly-reversible reaction network $(S, \mathcal{R})$ is benign iff for all $n \in \mathbb{Z}^S_{\geq 0}$, the conservation class $C(n) \cap \mathbb{Z}^S_{\geq 0}$ equals $\Gamma(n)$, the reachability class of $n$.

Lemma 6.
Every prime reaction network is benign.
Proof.
Let $(S, \mathcal{R})$ be a prime reaction network. This means that the associated ideal $(x^y - x^{y'})_{y \to y' \in \mathcal{R}}$ is prime. We define the associated lattice as
$$\mathcal{L} = \left\{ \sum_{y \to y' \in \mathcal{R}} a_{y \to y'} (y' - y) \;\middle|\; a_{y \to y'} \in \mathbb{Z} \text{ for all } y \to y' \in \mathcal{R} \right\}.$$
Note from [18] that $\mathcal{L}$ is saturated, i.e., if $k \in \mathbb{Z}$ and $v \in \mathbb{Z}^S$ are such that $kv \in \mathcal{L}$ then $v \in \mathcal{L}$.

Suppose $n, n' \in \mathbb{Z}^S_{\geq 0}$ are such that $n' \in C(n)$; we must show that $n'$ is reachable from $n$. The condition $n' \in C(n)$ means that there is a rational combination
$$n' - n = \sum_{y \to y' \in \mathcal{R}} b_{y \to y'} (y' - y).$$
This shows that for some sufficiently large integer $M$, the quantity $M(n' - n) \in \mathcal{L}$. Since $\mathcal{L}$ is saturated, $n' - n \in \mathcal{L}$. Hence there is an integer combination
$$n' - n = \sum_{y \to y' \in \mathcal{R}} c_{y \to y'} (y' - y).$$
Since $(S, \mathcal{R})$ is weakly reversible, there is a path $y' \Rightarrow_{\mathcal{R}} y$ for every $y \to y' \in \mathcal{R}$, and therefore there is a combination over nonnegative integers. This implies that $n \Rightarrow_{\mathcal{R}} n'$. Hence the network is benign.
In this section we solve Question 1 using the reaction network Proj($O, B$).

Fix an $n_R \times n_L$ tidy observation matrix $O = (o_{ij})_{n_R \times n_L}$ with non-negative rational entries $o_{ij} \in \mathbb{Q}_{\geq 0}$, message vectors $\{ m_i \in \mathbb{Z}^{n_L}_{\geq 0} \}_{i = 1, 2, \dots, n_R}$, a Poisson rate parameter vector $q \in \mathbb{R}^{n_L}_{>0}$, and a number $r \in \mathbb{Z}^{n_R}_{\geq 0}$ of receptor binding events observed. Fix a basis $B$ for the free group $\ker O \cap \mathbb{Z}^{n_L}$. Let $k_q$ be a function of rate constants for the reaction network Proj($O, B$) such that $q$ is a point of detailed balance of the reaction system $(\mathrm{Proj}(O, B), k_q)$. For example, the choice $k_q(y \to y') = q^{y'}$ satisfies this requirement.
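For small matrices, the tidy condition and a set of message vectors can be found by brute force; the sketch below (ours, purely illustrative; numpy assumed) does this for the observation matrix of Example 1.

```python
import numpy as np
from itertools import product

O = np.array([[1, 0, 1], [0, 1, 1]])      # Example 1; rows = receptors, cols = ligands
n_R, n_L = O.shape

def message_vector(i, max_coord=3):
    """Smallest nonnegative integer m with O m = e_i, or None if none is found."""
    target = np.eye(n_R, dtype=int)[i]
    best = None
    for m in product(range(max_coord + 1), repeat=n_L):
        if np.array_equal(O @ np.array(m), target):
            if best is None or sum(m) < sum(best):
                best = m
    return best

messages = [message_vector(i) for i in range(n_R)]
print(messages)                            # expect [(1, 0, 0), (0, 1, 0)]: O is tidy
```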
Theorem 7.
Consider stochastic mass action for the reaction system $(\mathrm{Proj}(O, B), k_q)$ from the initial state $n(0) = \sum_{i=1}^{n_R} r_i m_i$. Then the Bayesian posterior $\Pr[l \mid (r, \mathrm{Poisson}(q))]$ is the stationary distribution of this Markov chain.

Proof.
Let $L = \{ l \in \mathbb{Z}^{n_L}_{\geq 0} \mid Ol = r \}$. From Bayes' theorem, $\Pr[l \mid (r, \mathrm{Poisson}(q))] \propto \text{Prior} \times \text{Likelihood}$. The prior is Poisson($q$) and the likelihood is $\Pr[r \mid l] = \Pr[Ol = r]$, which is the characteristic function of $L$. Therefore
$$\Pr[l \mid (r, \mathrm{Poisson}(q))] \propto \begin{cases} \dfrac{e^{-q}\, q^l}{l!} & \text{for } l \in L \\ 0 & \text{otherwise.} \end{cases}$$
Since the reaction network Proj($O, B$) is prime by Lemma 5, it is benign by Lemma 6. By construction $n(0) \in L$; since the reaction vectors of Proj($O, B$) span $\ker O$, the conservation class of $n(0)$ satisfies $C(n(0)) \cap \mathbb{Z}^{n_L}_{\geq 0} = L$, and benignness then gives that $L$ is the reachability class $\Gamma(n(0))$. Applying Theorem 2 to $\Gamma(n(0)) = L$,
$$\pi_L(l) \propto \begin{cases} \dfrac{e^{-q}\, q^l}{l!} & \text{for } l \in L \\ 0 & \text{otherwise,} \end{cases}$$
which is the required posterior $\Pr[l \mid (r, \mathrm{Poisson}(q))]$.

In the following theorem, we show that our reaction scheme has computed an E-projection.

Theorem 8.
Let $\mathcal{P} := \{ \text{probability measures } P \text{ on } \mathbb{Z}^{n_L}_{\geq 0} \mid P(l) = 0 \text{ for all } l \notin L \}$. Then $\Pr[l \mid (r, \mathrm{Poisson}(q))]$ is the E-projection of Poisson($q$) on $\mathcal{P}$.

Proof.
The E-projection of Poisson($q$) onto $\mathcal{P}$ is given by $P^* = \arg\min_{P \in \mathcal{P}} D(P \,\|\, \mathrm{Poisson}(q))$. We use Lagrange multipliers to minimize $D(P \,\|\, \mathrm{Poisson}(q))$ with constraints $\sum_{l \in L} P(l) = 1$ and $P(l) = 0$ for $l \notin L$:
$$F(P, \lambda, \mu) = D(P \,\|\, \mathrm{Poisson}(q)) + \lambda \left( \sum_{l \in L} P(l) - 1 \right) + \sum_{l \notin L} \mu_l P(l).$$
At $P^*$, $\frac{\partial F}{\partial P(l)} = 0$ for all $l \in \mathbb{Z}^{n_L}_{\geq 0}$. That is, $\log\!\left(\frac{P^*(l)}{\mathrm{Poisson}(q)(l)}\right) + 1 + \lambda = 0$ if $l \in L$, and $P^*(l) = 0$ if $l \notin L$. That is,
$$P^*(l) \propto \begin{cases} \mathrm{Poisson}(q)(l) & \text{for } l \in L \\ 0 & \text{otherwise,} \end{cases}$$
which is the required posterior $\Pr[l \mid (r, \mathrm{Poisson}(q))]$.
In this subsection we solve Question 2 using the reaction network Proj($O, B$). We first characterize the Bayesian posterior $\Pr[l \mid (\langle r \rangle, \mathrm{Poisson}(q))]$ as an E-projection using a conditional limit theorem.

Definition 7.
Fix $\langle r \rangle \in \mathbb{R}^{n_R}_{>0}$. Then $\mathcal{P}_{\langle r \rangle}$ is the set of those probability measures on $\mathbb{Z}^{n_L}_{\geq 0}$ such that if $Y$ is a random variable distributed according to $P \in \mathcal{P}_{\langle r \rangle}$ then the expected value $\langle OY \rangle_P = \langle r \rangle$.

Theorem 9.
Fix $\langle r \rangle \in \mathbb{R}^{n_R}_{>0}$. Then $\Pr[l \mid (\langle r \rangle, \mathrm{Poisson}(q))]$ is a Poisson distribution, as well as the E-projection $\arg\min_{P \in \mathcal{P}_{\langle r \rangle}} D(P \,\|\, \mathrm{Poisson}(q))$ of Poisson($q$) on $\mathcal{P}_{\langle r \rangle}$.

Proof.
We apply the Gibbs conditioning principle ([9, Theorem 7.3.8]) $n_R$ times with a sequence of energy functions $U_1, \dots, U_{n_R}$ which iteratively set the expected values of the $n_R$ rows of $O$ to the corresponding values from $\langle r \rangle$. The intuition is that this is a formal way of doing Lagrange optimization.

To show that this result can be applied, we choose the space $\Sigma$ as $\mathbb{R}^{n_L}$, the initial distribution $\mu_0 = \mu$ as Poisson($q$) on $\mathbb{Z}^{n_L}_{\geq 0}$ and 0 everywhere else, and for $i = 1$ to $n_R$, we define the function $U_i : \Sigma \to [0, \infty)$ by $U_i(n) = (On)_i / \langle r_i \rangle$. The sequence of Gibbs distributions is then defined by $\frac{d\mu_{i+1}}{d\mu_i} = \frac{e^{-\beta_i U_i(n)}}{Z_{\beta_i}}$, where $Z_{\beta_i}$ is the normalizing constant. It is easily checked that each of these is a Poisson distribution since the $U_i$'s are linear functions. Since $\langle r \rangle \in \mathbb{R}^{n_R}_{>0}$, there is nonzero probability under $\mu_{i-1}$ that $(Ox)_i < \langle r_i \rangle$ for all $i$. Hence for $i = 1$ to $n_R$ it follows that $\mu_{i-1}(\{ x \mid U_i(x) < 1 \}) > 0$.
The other condition, $\mu_{i-1}(\{ x \mid U_i(x) > 1 \}) > 0$, is satisfied because $(Ox)_i$ can take arbitrarily large integer values with nonzero probability. Since the $\mu_i$ are all Poisson, $\beta_\infty = -\infty$, since Poisson distributions converge for arbitrarily small nonnegative values of the rate parameters. Hence the assumptions of [9, Lemma 7.3.6] are satisfied and we get to apply [9, Theorem 7.3.8] sequentially $n_R$ times and conclude that the empirical distribution on the space $\mathbb{Z}^{n_L}_{\geq 0}$ converges weakly to a Poisson distribution $\mu_{n_R} = \mathrm{Poisson}(p^*) \in \mathcal{P}_{\langle r \rangle}$, which is also the E-projection $\arg\min_{P \in \mathcal{P}_{\langle r \rangle}} D(P \,\|\, \mathrm{Poisson}(q))$.

Now fix an $n_R \times n_L$ tidy observation matrix $O = (o_{ij})_{n_R \times n_L}$ with non-negative rational entries $o_{ij} \in \mathbb{Q}_{\geq 0}$, message vectors $\{ m_i \in \mathbb{Z}^{n_L}_{\geq 0} \}_{i = 1, 2, \dots, n_R}$, a Poisson rate parameter vector $q \in \mathbb{R}^{n_L}_{>0}$, and an average number $\langle r \rangle \in \mathbb{R}^{n_R}_{>0}$ of receptor binding events observed. Fix a basis $B$ for the free group $\ker O \cap \mathbb{Z}^{n_L}$. Let $k_q$ be a function of rate constants for the reaction network Proj($O, B$) such that $q$ is a point of detailed balance of the reaction system $(\mathrm{Proj}(O, B), k_q)$. For example, the choice $k_q(y \to y') = q^{y'}$ satisfies this requirement.

Theorem 10.
Consider the solution $x(t)$ to the deterministic mass action ODEs for the reaction system $(\mathrm{Proj}(O, B), k_q)$ from the initial concentration $x(0) = \sum_{i=1}^{n_R} \langle r_i \rangle m_i$. Let $x^* = \lim_{t \to \infty} x(t)$. Then $x^*$ is well-defined, and the Bayesian posterior $\Pr[l \mid (\langle r \rangle, \mathrm{Poisson}(q))]$ equals Poisson($x^*$). That is, one obtains samples from the Bayesian posterior by measuring the state of a unit volume aliquot of the system at equilibrium.

Proof. Note that Poisson($x(0)$) $\in \mathcal{P}_{\langle r \rangle}$. Further, the reaction vectors span the kernel of $O$, so we have $x \in C(x(0)) \cap \mathbb{R}^{n_L}_{>0}$ iff Poisson($x$) $\in \mathcal{P}_{\langle r \rangle}$. By Theorem 9, the distribution $\Pr[l \mid (\langle r \rangle, \mathrm{Poisson}(q))]$ equals Poisson($y$) for some $y \in \mathbb{R}^{n_L}_{>0}$. Further, it is an E-projection, so that, among all Poisson distributions in $\mathcal{P}_{\langle r \rangle}$, the relative entropy $D(\mathrm{Poisson}(y) \,\|\, \mathrm{Poisson}(q))$ is minimum. By Lemma 1, the E-projection of Poisson($q$) onto $\{ \mathrm{Poisson}(x) \mid x \in C(x(0)) \cap \mathbb{R}^{n_L}_{>0} \}$ is the Poisson distribution of the E-projection of $q$ onto $C(x(0)) \cap \mathbb{R}^{n_L}_{>0}$.

By Lemma 5, the reaction network Proj($O, B$) is prime.
Further, the reaction system $(\mathrm{Proj}(O, B), k_q)$ is detailed balanced with $q$ a point of detailed balance, by assumption. Hence by Theorem 3, the limit $x^*$ is well-defined and is the E-projection of $q$ onto $C(x(0)) \cap \mathbb{R}^{n_L}_{>0}$. Together we have $\Pr[l \mid (\langle r \rangle, \mathrm{Poisson}(q))] = \mathrm{Poisson}(x^*)$. We can sample from a unit aliquot at equilibrium due to Lemma 4.

Various schemes have been proposed to perform information processing with reaction networks: for example, [21, 22] show how Boolean circuits and perceptrons can be built, [20] shows how to implement linear input/output systems, [7] exploits analogies with electronic circuits, [2] computes algebraic functions, etc. Some of these schemes have even been successfully implemented in vitro.

Each of these schemes has been inspired by analogy with some existing model of computation. However, reaction networks as a computing platform have some unique opportunities and challenges. It is an inherently distributed and stochastic platform. Noise manifests as leaks in catalyzed reactions. We can tune equilibrium thermodynamic parameters, but kinetic-level control is very difficult. In addition, one needs to keep in mind the tasks that reaction networks are called upon to solve in biology, or might be called upon to solve in technological applications. Keeping these factors in mind, there is value in considering a scheme which attempts to uncover the class of problems that is suggested by the mathematical structure of reaction network dynamics.

In trying to uncover such a class of problems, we have looked to the ideas of Maximum Entropy or MaxEnt [16], which form a natural bridge between machine learning and reaction networks. The systematic foundations of statistics based on the minimization of KL-divergence (equivalently, free energy) go back to the pioneering work of Kullback [17]. The conceptual, technical, and computational advantages of this approach have been brought out by subsequent workers [6, 15, 4]. This work has also been put forward as a mathematical justification of Jaynes' MaxEnt principle. Our hope is that those parts of statistics and machine learning that can be expressed in terms of minimization of free energy should naturally suggest reaction network algorithms for their computation.

The link between statistics/machine learning and reaction networks has been explored before by Napp and Adams [19]. They propose a deterministic mass-action based reaction network scheme to compute single-variable marginals from a joint distribution given as a factor graph, drawing on "message-passing" schemes. Our work is in the same spirit of finding more connections between machine learning and reaction networks, but the nature of the problem we are trying to solve is different. We are trying to estimate a full distribution from partial observations. In doing so, we exploit the inherent stochasticity of reaction networks to represent correlations and do Bayesian inference.

One previous work which has engaged with stochasticity in reaction networks is by Cardelli et al. [3]. They give a reaction scheme that takes an arbitrary finite probability distribution and encodes it in the stationary distribution of a reaction system. In comparison, we are taking samples from a marginal distribution and encoding the full distribution in terms of the stationary distribution. Thus our scheme allows us to do conditioning and inference.

In Gopalkrishnan [13], one of the present authors has proposed a molecular scheme to do maximum likelihood estimation in log-linear models.
The reaction networks employed in that work are essentially identical to the reaction networks employed in this work, modulo some minor technical differences. In that paper, the reaction networks were used to obtain M-projections (or reverse I-projections), and thereby to solve for maximum likelihood estimators. In this paper, we obtain E-projections, and sample from conditional distributions. The results in that paper were purely at the level of deterministic mass-action kinetics. The results in this paper obtain at the level of stochastic behavior.
Discussions
We have shown that reaction networks are particularly well-adapted to perform E-projections. In a previous paper [13], one of the authors has shown how to perform M-projections with reaction networks. Intuitively, an E-projection corresponds to a "rationalist" who interprets observations in light of previous beliefs, and an M-projection corresponds to an "empiricist" who forms new beliefs in light of observations.

Not surprisingly, these two complementary operations keep appearing as blocks in various statistical algorithms. Our two schemes should be viewed together as building blocks for implementing more sophisticated statistical algorithms.
For example, the EM algorithm works by alternating E and M projections [15]. If our two reaction networks are coupled so that the point $q$ is obtained by the scheme in [13], and the initialization of the scheme in this paper is used to perturb the conservation class for the M-projection correctly, then an "interior point" version of the EM algorithm may be possible, though perhaps not with detailed balance but in a "driven" manner reminiscent of futile cycles.

We have illustrated how E-projections might apply to the situation of an artificial cell trying to infer its environment from partial observations. We are acutely aware that our illustration is far from complete. A more sophisticated algorithm would work in an "online" fashion, adjusting its estimates on the fly to each new receptor binding event. This certainly appears within the scope of the kind of schemes we have outlined, but more careful design and analysis is necessary before formal theorems in this direction can be shown. Also, we think it likely that the schemes that will prove most effective will work neither purely in the regime of the first scheme, nor purely in the regime of the second scheme, but somewhere in between. How long a time window they average over, how large a volume is optimal, and how these choices trade off between sensitivity and reliability are questions for further analysis.

One glaring gap in our narrative is that we require the internal species $X_i$ to be as numerous as the outside ligands $L_i$. A much more efficient encoding of ligand population vectors should be possible, drawing on ideas from graphical models, so that the number of representing species need only be a logarithm of the number of ligands being represented. Moreover, it may be possible to perform E and M projections directly on these graphical model representations.

Our constructions and results of Section 5.1 were carried out for arbitrary matrices with rational number entries. We only used the assumption of "tidy" matrices to set initial conditions in Theorems 7 and 10. If some other method of setting initial conditions correctly is available, for example by performing matrix inversions with a reaction network, then the technical condition of tidy matrices can be dropped. In defence of the assumption that our observation matrices are tidy, it is not inconceivable that through evolution a biological cell would have evolved its receptors so that the affinity matrix allows for simple meaningful messages to be transmitted inside the cell.

Note that the mathematics does not require the restriction of the affinities $o_{ij}$ to nonnegative rational numbers. We could have admitted negative numbers, and all our results would go through.

References
[1] David F Anderson, Gheorghe Craciun, and Thomas G Kurtz. Product-form stationary distributions for deficiency zero chemical reaction networks. Bulletin of Mathematical Biology, 72(8):1947–1970, 2010.

[2] HJ Buisman, Huub MM ten Eikelder, Peter AJ Hilbers, and Anthony ML Liekens. Computing algebraic functions with biochemical reaction networks. Artificial Life, 15(1):5–19, 2009.

[3] Luca Cardelli, Marta Z. Kwiatkowska, and Luca Laurenti. Programming discrete distributions with chemical reaction networks. CoRR, abs/1601.02578, 2016.

[4] N.N. Cencov. Statistical Decision Rules and Optimal Inference. Translations of Mathematical Monographs. American Mathematical Society, 2000.

[5] Gheorghe Craciun. Toric differential inclusions and a proof of the global attractor conjecture. arXiv preprint arXiv:1501.02860, 2015.

[6] Imre Csiszár, Paul C Shields, et al. Information theory and statistics: A tutorial. Foundations and Trends in Communications and Information Theory, 1(4):417–528, 2004.

[7] Ramiz Daniel, Jacob R Rubens, Rahul Sarpeshkar, and Timothy K Lu. Synthetic analog computation in living cells. Nature, 497(7451):619–623, 2013.

[8] Amir Dembo and Ofer Zeitouni. Large Deviations Techniques and Applications, volume 38 of Stochastic Modelling and Applied Probability. 2010.

[9] Paul Dupuis and Richard S Ellis. A Weak Convergence Approach to the Theory of Large Deviations, volume 902. John Wiley & Sons, 2011.

[10] Martin Feinberg. On chemical kinetics of a certain class. Arch. Rational Mech. Anal., 46, 1972.

[11] Martin Feinberg. Lectures on chemical reaction networks, 1979.

[12] Manoj Gopalkrishnan. Catalysis in reaction networks. Bulletin of Mathematical Biology, 73(12):2962–2982, 2011.

[13] Manoj Gopalkrishnan. A scheme for molecular computation of maximum likelihood estimators for log-linear models. In International Conference on DNA-Based Computers, pages 3–18. Springer, 2016.

[14] Friedrich J. M. Horn. Necessary and sufficient conditions for complex balancing in chemical kinetics. Arch. Rational Mech. Anal., 49, 1972.

[15] Shun-ichi Amari. Information Geometry and Its Applications. Springer, 2016.

[16] Edwin T Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620, 1957.

[17] Solomon Kullback. Information Theory and Statistics. Courier Corporation, 1997.

[18] Ezra Miller. Theory and applications of lattice point methods for binomial ideals. In Combinatorial Aspects of Commutative Algebra and Algebraic Geometry, pages 99–154. Springer, 2011.

[19] Nils E Napp and Ryan P Adams. Message passing inference with chemical reaction networks. In Advances in Neural Information Processing Systems, pages 2247–2255, 2013.

[20] Kevin Oishi and Eric Klavins. Biomolecular implementation of linear I/O systems. Systems Biology, IET, 5(4):252–260, 2011.

[21] Lulu Qian and Erik Winfree. A simple DNA gate motif for synthesizing large-scale circuits. J. R. Soc. Interface, 2011.

[22] Lulu Qian and Erik Winfree. Scaling up digital circuit computation with DNA strand displacement cascades. Science, 332(6034):1196–1201, 2011.

[23] Peter Whittle.