Implementing Non-Equilibrium Networks with Active Circuits of Duplex Catalysts
Antti Lankinen
Department of Bioengineering, Imperial College London, London, SW7 2AZ, United Kingdom
Ismael Mullor Ruiz
Department of Bioengineering and Imperial College Centre for Synthetic Biology, Imperial College London, London, SW7 2AZ, United Kingdom
Thomas E. Ouldridge
Department of Bioengineering and Imperial College Centre for Synthetic Biology, Imperial College London, London, SW7 2AZ, United Kingdom
Abstract
DNA strand displacement (DSD) reactions have been used to construct chemical reaction networks in which species act catalytically at the level of the overall stoichiometry of reactions. These effective catalytic reactions are typically realised through one or more of the following: many-stranded gate complexes to coordinate the catalysis, indirect interaction between the catalyst and its substrate, and the recovery of a distinct “catalyst” strand from the one that triggered the reaction. These facts make emulation of the out-of-equilibrium catalytic circuitry of living cells more difficult. Here, we propose a new framework for constructing catalytic DSD networks: Active Circuits of Duplex Catalysts (ACDC). ACDC components are all double-stranded complexes, with reactions occurring through 4-way strand exchange. Catalysts directly bind to their substrates, and the “identity” strand of the catalyst recovered at the end of a reaction is the same molecule as the one that initiated it. We analyse the capability of the framework to implement catalytic circuits analogous to phosphorylation networks in living cells. We also propose two methods of systematically introducing mismatches within DNA strands to avoid leak reactions and introduce driving through net base pair formation. We then combine these results into a compiler to automate the process of designing DNA strands that realise any catalytic network allowed by our framework.
Hardware → Biology-related information processing
Keywords and phrases
DNA strand displacement, Catalysis, Information-processing networks
DNA is an attractive engineering material due to the high specificity of Watson-Crick base pairing and well-characterised thermodynamics of DNA hybridisation [13, 40], which give DNA the most predictable and programmable interactions of any natural or synthetic molecule [43]. DNA computing involves exploiting these properties to assemble computational devices made of DNA. The computational circuits are typically realised using DNA strand displacement (DSD) reactions, in which sections of DNA strands called domains with partial or full complementarity hybridise, displacing one or more previously hybridised strands in the process [55]. DSD is initiated by the binding of short complementary sequences called toeholds. It is helpful to divide DSD reactions into a few common reaction steps, including: binding, unbinding, and three- or four-way strand displacement and branch migration, shown in Figure 1. DSD is an attractive scheme for computation as it can be used as a medium in which to realise chemical reaction networks (CRNs) [44], which provide an abstraction of systems exhibiting mass-action chemical kinetics and have been shown to be Turing complete [27]. DSD is then Turing complete as well [35, 52]. DSD has been used to construct, for example, logic circuits [34, 42], artificial neural networks [9, 17, 38], dynamical systems [46], catalytic networks [8, 36, 56], and other computational devices [1, 53]. To facilitate testing and realisation of DSD systems, domain-level design tools [23, 45] as well as domain-to-sequence translation [54] software have been introduced.
While DNA nanotechnology is concerned with using DNA as a non-biological material, a key goal of DNA nanotechnology is the imitation and augmentation of cellular systems. It is therefore worth considering how these natural systems typically perform computation and information processing.
One ubiquitous biological paradigm for signal propagation and processing is the catalytic activation network, as exemplified by kinases [20, 28, 29]. Kinases are catalysts that modify substrates by phosphorylation and consume ATP in the process. These substrates can be, for example, transcription factors, but can also be kinases themselves that are either activated or deactivated by phosphorylation. The opposite function, dephosphorylation, is performed by phosphatases [4]. The emergent catalytic network then performs information propagation or computation by converting species, kinases and phosphatases, between their active and passive states. Kinase cascades are featured in many key biological functions, such as cellular growth, adhesion, and differentiation [28, 51] and long-term potentiation [47].
The fuel-consuming, catalytic nature of these circuits is vital in allowing them to perform functions such as signal splitting, amplification, time integration and insulation [5, 12, 18, 30, 31]. Moreover, since the key molecular species are recovered rather than consumed by reactions, catalytic networks can operate continuously, responding to stimuli as they change over time, unlike many architectures for DSD-based computation and information processing that operate by allowing the key components to be consumed [1, 9, 38]. This ability to operate continuously is invaluable in autonomous environments such as living cells.
In this work, we propose a minimal mechanism for implementing reaction networks of molecules that exist in catalytically active and inactive states, a simple abstraction of natural kinase networks. In these catalytic activation networks, we implement arbitrary activation reactions of the form $A + B + \sum_i F_i \rightarrow A + B' + \sum_i W_i$. Here, the active catalyst $A$ drives $B$ between its inactive and active states by the conversion of fuel molecules $\{F_i\}$ into waste $\{W_i\}$.
Equivalent deactivation reactions in which an active catalyst deactivates a substrate are also considered.
The rest of this paper is organised as follows. In Section 2, we propose and motivate the concept of a direct bimolecular catalytic reaction and consider the necessary conditions for DSD species that are able to perform such reactions. Section 3 introduces a novel DSD framework to implement these reactions, and its computational properties are analysed in Section 4. Based on these findings, we propose a systematic method of introducing mismatched base pairs within species in our framework to improve its function in Section 5. We combine our findings and propositions into software that automates the sequence-level design of any CRN that is realisable within our framework, and detail this software in Section 6. In Section 7, we discuss our framework, findings, and future work. We conclude the paper in Section 8.
Figure 1 Basic reaction steps in the DSD formalism, as represented by Visual DSD [23]: (a) bind, (b) unbind, (c) displace (3-way), (d) branch migrate (4-way). Each domain is represented by a letter and a colour. "*" denotes the Watson-Crick complement. The barbed end of a strand indicates the 3’ end.
In kinase cascades, functional changes in substrates are a result of direct binding of the catalyst to the substrate. Moreover, the essential products of the reaction (the activated substrate and recovered catalyst) are the same molecules that initially bound to each other, albeit with some modification of certain residues, or turnover of small molecules such as ATP or ADP to which they are bound. Motivated by these facts, we propose the following definition for a direct bimolecular catalytic activation reaction.
Definition 1 (Direct bimolecular catalytic activation). Consider the (non-elementary) reaction $A + B + \sum_i F_i \rightarrow A + B' + \sum_i W_i$, where $A$ catalyses the conversion of inactive $B$ to active $B'$, using ancillary fuels $\{F_i\}$ and producing waste $\{W_i\}$. The overall reaction is a direct bimolecular catalytic activation reaction if and only if:
1. The reaction is initialised with the interaction of $A$ and $B$.
2. The $A$ and $B$ molecules have molecular cores that are retained in the products $A$ and $B'$, rather than the input molecules being consumed and distinct outputs released.
Deactivation reactions have an equivalent form, but convert $B'$ to $B$. If the same overall reaction stoichiometry is implemented differently, the reaction is a pseudocatalytic bimolecular activation reaction.
Direct bimolecular catalytic (de)activation reactions have some important functional properties. The first is that, if the first step of the reaction requires the presence of $A$ and $B$, nothing can happen unless both molecules are present. In pseudocatalytic implementations, as we discuss below, it is possible to produce activated $B'$ or sequester $A$ even if no $B$ is present, violating the logic of activation-based networks. The second is that the persistence of a molecular core of both the substrate and the catalyst allows either or both to be localised on a surface or scaffold, as is observed for some kinase cascades in living cells [14, 41, 50] and is often proposed for DNA-based systems [6, 7, 37, 39, 48].
Figure 2 Catalytic reaction using a seesaw gate [19, 36]. Reactants are shown in bold boxes; the input acts pseudocatalytically to “convert” the fuel into an output, with ancillary gate complexes consumed and produced. Each compound reaction is illustrated by a small square, and consists of sequential bind, displace, and unbind reactions. All reactions are reversible; open arrows indicate reactions proceeding forwards, and closed arrows indicate reactions proceeding backwards.
A number of DNA computing frameworks have been developed to implement reactions of the stoichiometry of Definition 1. The simplest, illustrated in Figure 2 (a), involves a two-step seesaw gate [19, 36]. An input molecule ($A$ in Definition 1) binds to a gate-output complex ($F$), releasing the output ($B'$). The input is then displaced by a molecule conventionally described as the fuel, but fulfilling the role of $B$ from Definition 1 in the context of catalysis, recovering $A$ and generating a waste duplex ($W$). Although the $A$ strand recovered at the end of the process is the same one that initiated the process, the $B$ and $B'$ molecules are distinct and the reaction is not initiated by the binding of $A$ and $B$; it is therefore pseudocatalytic.
This pseudocatalysis can have important consequences. If a small quantity of input $A$ is added to a solution containing the gate-output complex $F$ but no $B$, a large fraction of $A$ is sequestered and a corresponding amount of $B'$ is produced.
This sequestration of $A$ and production of $B'$ from nothing violates the logic of ideal catalytic activation networks.
More complex strategies to implement reactions of the stoichiometry of Definition 1 using DSD exist [8, 35]. These approaches rely on the catalyst and substrate ($A$ and $B$ from Definition 1) interacting with a gate, rather than binding to each other, and the recovered catalyst and product are separate strands; the reactions are therefore pseudocatalytic. In certain limits, these strategies can approximate a mass-action dependence of reaction rates on the concentrations of $A$ and $B$ [8, 33], providing a better approximation to the logic of ideal catalytic activation circuits than the simple seesaw motif. The price, however, is the need to construct large multi-stranded gate complexes to facilitate the reaction; the complexity of these motifs is a major barrier to implementing such systems in autonomous settings such as living cells. Moreover, localising catalysts and substrates to a scaffold or surface remains challenging when the molecules themselves are not recovered.
We now consider how to design minimal DSD-based units that implement direct bimolecular catalytic (de)activation in catalytic activation networks. If the core of the substrate species $B$ must be retained in the product $B'$, $B$ and $B'$ cannot simply be two strands with a slightly different sequence. Instead, $B$ and $B'$ must either be distinct complexes of strands, in which at least one strand is common, or have different secondary structure within a single strand, or both. To avoid complexities in balancing the thermodynamics of hairpin loop formation with bimolecular association, and suppressing the kinetics of unimolecular rearrangement, we do not pursue the possibility of engineering metastable secondary structure within a strand.
At least one of $B$ and $B'$ must therefore consist of at least two strands. Moreover, since each activation state of each species must be a viable substrate in an arbitrary catalytic (de)activation network, the simplest approach that allows for a generic catalytic mechanism is to implement all substrate/catalyst species as two-stranded complexes.
We introduce the Active Circuits of Duplex Catalysts (ACDC) scheme to implement catalytic activation networks through direct bimolecular catalytic (de)activation. Each reaction has three inputs: a substrate, a catalyst, and a single fuel complex. The outputs are a modified substrate, the recovered catalyst and a waste complex. The domain-level structures of these species are shown in Figure 3.
Substrates and catalysts, hereafter referred to as major species, are anatomically identical. Each consists of two strands, each of which has one central long domain ( ∼ ∼ identity strand and the state strand. The identity strand is the preserved molecular core; the state strand specifies the activation state of a major species at a particular time (specifically, through the domain at its 5’ end, labelled “a” in Fig. 3). The two strands in a major species are bound by three central domains; the outer toeholds at either end of the strands are available (unbound). Major species thus contain two interfaces at either end of the molecule, both displaying two available toeholds, one on each constituent strand. The inner toeholds, which are bound in major species, are described as hidden. We call the interface at the 5’ end of the state strand and the 3’ end of the identity strand the downstream interface and the interface with the 3’ end of the state strand and 5’ end of the identity strand the upstream interface.
All other two-stranded species in ACDC, including fuel and waste species, are described as ancillary species. They have a distinct structure from major species, but are identical to each other (Figure 3).
Ancillary species also consist of two strands of five domains, but are bound by the central long domain and two shorter flanking toeholds (one outer toehold and one inner toehold) on one side. They therefore possess just one interface of available toeholds, but this interface presents two contiguous available toeholds on each strand.
Figure 3 (a) Topology of major species in the ACDC system (substrates or catalysts), illustrating upstream and downstream interfaces, and inner and outer toeholds. The long central domain forms a stable binding duplex. (b) Topology of ancillary species (fuel, waste or substrate-catalyst complex).
The catalytic reaction of a single ACDC unit proceeds as shown in Figure 4. The downstream interface of the catalyst $A$ and upstream interface of the substrate $B$ bind together through recognition of all four available toeholds in the relevant interfaces. The resultant complex undergoes a 4-way branch migration, with the base pairs between the state and identity strand of the substrate and catalyst being exchanged for base pairs between the two state strands and the two identity strands. After the exchange of a hidden toehold and the central binding domain, the 4-stranded complex is held together by only two inner toeholds on either side of a 4-way junction. Dissociation by spontaneous detachment of these toeholds creates two ancillary product species, a waste $W_{AB \rightarrow B'}$ and an intermediate complex $AB$. The sequence of these three reactions is called the 2r-4 reaction [21].
The fuel $F_{AB \rightarrow B'}$ is identical to the waste, except for a single toehold. This toehold corresponds to the outer toehold of the state strand of $B'$ from the downstream interface. $F_{AB \rightarrow B'}$ and $AB$ can undergo another reaction, producing $B'$ ($B$, but with a single domain changed in the downstream interface) and recovering the catalyst. With the downstream interface of substrate $B$ changed into that of $B'$, the substrate has been activated and could act as a catalyst to another reaction, provided that an appropriate downstream substrate and fuel were present. An equivalent catalytic process could trigger another reaction converting $B'$ to $B$, deactivating $B$, analogous to dephosphorylation by a phosphatase.
The basic ACDC unit in Figure 4 satisfies the criteria for direct bimolecular catalytic activation, since the reaction is initiated by the binding of $A$ and $B$, and the identity strands in the major species are retained throughout. ACDC relies on the experimentally-verified mechanism of toehold-mediated 4-way branch migration [10, 22, 25, 49]. The number of base pairs and complexes is unchanged by each reaction, and therefore a bias for clockwise activation cycles (as opposed to anticlockwise deactivation) would require a large excess of fuel complexes $F_{AB \rightarrow B'}$ relative to waste $W_{AB \rightarrow B'}$. In addition, for a single catalytic cycle to operate as intended, the following assumptions must hold:
Assumption 2 (Stability of complexes). It is assumed that strands bound together by long domains are stable and will not spontaneously dissociate. It is also assumed that if two strands are bound by a pair of complementary domains, any adjacent pairs of complementary domains that could bind to form a contiguous duplex are not available.
Assumption 3 (Detachment of products). It is assumed that 4-stranded complexes bound together by two pairs of toehold domains either side of a junction can dissociate into duplexes.
Assumption 4 (Need for two complementary toeholds to trigger branch migration). It is assumed that if a 4-stranded complex is formed by the binding of a single pair of toehold domains, it will dissociate into product duplexes, rather than undergo branch migration.
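As a rough illustration of the earlier remark that biasing the cycle towards activation requires an excess of fuel over waste, the sketch below computes the fraction of substrate in the active state once the overall reaction $A + B + F_{AB \rightarrow B'} \rightleftharpoons A + B' + W_{AB \rightarrow B'}$ has equilibrated. It is not part of the ACDC design: the function name, the assumption that fuel and waste are buffered at fixed concentrations, and the default equilibrium constant of 1 (motivated by the conserved number of base pairs and complexes) are illustrative assumptions only.

```python
def active_fraction_at_equilibrium(fuel, waste, k_eq=1.0):
    """Fraction of substrate present as B' at equilibrium.

    Hypothetical helper. Assumes fuel and waste concentrations are held
    fixed (same units) and that the overall reaction has equilibrium
    constant k_eq, taken to be about 1 because each reaction conserves
    the number of base pairs and complexes. The catalyst appears on both
    sides of the reaction and therefore cancels.
    """
    if fuel < 0 or waste < 0 or fuel + waste == 0:
        raise ValueError("concentrations must be non-negative and not both zero")
    # Mass action: [B'] [W] / ([B] [F]) = k_eq, so [B'] / [B] = k_eq * [F] / [W].
    weighted_fuel = k_eq * fuel
    return weighted_fuel / (weighted_fuel + waste)


# Equal fuel and waste give no net bias; a 100-fold fuel excess drives
# nearly all of the substrate into the active state.
balanced = active_fraction_at_equilibrium(fuel=1.0, waste=1.0)
driven = active_fraction_at_equilibrium(fuel=100.0, waste=1.0)
```

Under these assumptions the active fraction depends only on the fuel-to-waste ratio, which is why a large fuel excess is needed to drive the cycle.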
Assumption 2 ensures that the system keeps its duplex-based structure, and that toeholds are well hidden in complexes when required. Assumption 3 is necessary to avoid all species being sequestered into 4-stranded complexes. Note that the assumption is not that detachment must happen extremely quickly, since such 4-stranded complexes need to be metastable enough to initiate branch migration with reasonable frequency. It is equivalent to the need for single toeholds to detach in 3-way toehold exchange reactions [36]. In practice, toehold length and conditions such as temperature could be tuned to optimize the relative propensity for branch migration and detachment. Given a reasonable balance between branch migration and detachment, Assumption 4, which enables the switching of $B$ and $B'$ to have a downstream effect, is also likely to be satisfied.
Figure 4 A basic ACDC reaction unit $A + B + F_{AB \rightarrow B'} \rightarrow A + B' + W_{AB \rightarrow B'}$, as represented by Visual DSD [23]. Inputs to the reaction are shown in bold, and each small box corresponding to a reaction step is labelled with b/u (bind/unbind) or m (migrate). Imbalances in the concentration of fuel and waste drive the reaction clockwise (the direction indicated by open arrows).
Larger catalytic activation networks can be constructed from the basic ACDC units of Figure 4, since the activated substrate $B'$ can itself act as a catalyst. Let $A \rightarrow B$ be a shorthand for the reaction $A + B + F_{AB \rightarrow B'} \rightarrow A + B' + W_{AB \rightarrow B'}$ and $C \dashv B$ a shorthand for the reaction $C + B' + F_{CB' \rightarrow B} \rightarrow C + B + W_{CB' \rightarrow B}$. Then, any potential catalytic activation network can be represented as a weighted directed graph, where nodes represent catalyst/substrate species and edges represent activation (edge weight 1) or deactivation (edge weight -1). Is it possible to realise any such graph using ACDC?
Assumption 5 (Toehold orthogonality). We assume that there are sufficiently many toehold domain sequences that cross-talk between non-complementary domains is negligible.
Since ACDC components share a long central domain, specificity is entirely driven through toehold recognition. As noted by Johnson [21], there is a finite number of orthogonal short toehold domains, which limits the size of the connected network that can be constructed. We assume that the network of interest does not violate this limit. We instead ask the realisability question at the level of domains.
Figure 5 Minimal example motifs of interest in a catalytic activation network: (a) split, (b) integrate, (c) cascade, (d) auto-activation loop, (e) bidirectional edge, (f) feedback loop, (g) feedforward loop.
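The representation of a catalytic activation network as a weighted directed graph, with edge weight 1 for activation and -1 for deactivation, can be written down directly. The following is a minimal sketch in Python; the names `build_network`, `ACTIVATE` and `DEACTIVATE` are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

ACTIVATE = 1     # edge weight for an activation reaction
DEACTIVATE = -1  # edge weight for a deactivation reaction


def build_network(reactions):
    """Build {catalyst: {substrate: weight}} from (catalyst, substrate, weight) triples.

    Hypothetical helper: every species appears as a node, including species
    that only ever act as substrates.
    """
    graph = defaultdict(dict)
    for catalyst, substrate, weight in reactions:
        if weight not in (ACTIVATE, DEACTIVATE):
            raise ValueError(f"edge weight must be 1 or -1, got {weight!r}")
        graph[catalyst][substrate] = weight
        graph.setdefault(substrate, {})  # register substrate-only nodes
    return dict(graph)


# A activates B, while C deactivates B.
network = build_network([("A", "B", ACTIVATE), ("C", "B", DEACTIVATE)])
```

A nested dictionary keeps the sketch dependency-free; any graph library with weighted directed edges would serve equally well.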
Definition 6 (Realisability). A catalytic activation network is realisable using the ACDC framework if a domain structure for a set of strands can be specified such that:
1. All network reactions are represented by a basic ACDC unit.
2. No two species possess domains that allow a 2r-4 reaction (a full four-way strand exchange) that preserves the number of bound domains and is initiated by the binding of two available and complementary pairs of toeholds, unless the reaction is part of an ACDC unit representing a reaction in the network.
3. No two strands can form an uninterrupted duplex of four bound domains or more.
4. No two species (including all wastes, fuels and catalyst-substrate complexes) possess two available toehold pairs that could form a contiguous complementary duplex.
Condition 2 rules out reactions that respect the architecture of ACDC, but which involve reactants that are not intended to interact. Condition 3 rules out strand exchange reactions that allow an increase in the number of bound domains, which would sequester additional toeholds and violate the ACDC architecture (it is assumed that strand exchange reactions that would reduce the number of bound domains can be neglected). Condition 4 rules out the formation of 4-stranded complexes that can only dissociate by disrupting an uninterrupted two-toehold duplex. Contiguous duplexes of this kind are potentially stable, even if they cannot undergo strand exchange, and would potentially sequester components.
Theorem 7 (Realisability with activation implies realisability with deactivation). If a catalytic activation network with purely activation reactions is realisable using the basic ACDC formalism, it is also realisable using the basic ACDC formalism if any subset of those reactions are converted to deactivation.
Proof.
A deactivation reaction is simply an activation reaction with the roles of the fuel and waste reversed. Therefore a domain structure specification that realises a given network with activation reactions also realises all networks of the same structure.
Since there are infinitely many networks, we restrict our analysis to a set of motifs (generalised versions of the minimal examples depicted in Figure 5), establishing whether these motifs can be realised in isolation. The split, integrate, cascade, self-activation, bidirectional edge, feedback loop (FBL), and feedforward loop (FFL) are chosen because of their importance in biology and synthetic biology [2, 15, 16]. The proofs of theorems not explicitly given in this section are provided in Appendix B. Theorems 8 and 9 establish that arbitrarily complex split and integrate motifs are realisable.
Figure 6 Major species and a subset of ancillary species from an implementation of $A \rightarrow B \rightarrow C \rightarrow D$ using the ACDC formalism. Three unwanted reactions occur between the shown ancillary species. Panels: (a) major species; (b) ancillary species and unwanted reactions.
Theorem 8 (Split motifs are realisable). Consider the set of $N$ reactions $A \rightarrow B_1$, $A \rightarrow B_2$, ..., $A \rightarrow B_N$, in which all $B_i$ are distinct nodes from $A$. Such a network is realisable for any N ≥ .
Theorem 9 (Integrate motifs are realisable). Consider the set of $N$ reactions $A_1 \rightarrow B$, $A_2 \rightarrow B$, ..., $A_N \rightarrow B$, in which all $A_i$ are distinct nodes from $B$. Such a system is realisable for any N ≥ .
Although all networks consist of simply combining split and integrate motifs for each node, proving that all split and integrate motifs are realisable in isolation does not prove that any network assembled from them is realisable. We therefore explore other simple motifs. For example, consider the cascade motif (a 3-component example is illustrated in Figure 5).
Lemma 10 (The ancillary species of a catalyst’s upstream reactions and substrate’s downstream reactions cause leak reactions). Consider a reaction $B \rightarrow C$, and further assume that $A \rightarrow B$ and $C \rightarrow D$ for at least one species $A$ and at least one species $D$. Then $AB$ and $CD$, and $F_{AB \rightarrow B'}$ and $F_{CD \rightarrow D'}$/$W_{CD \rightarrow D'}$, possess two available toehold pairs that could form a contiguous complementary duplex. No other violations of realisability occur.
An example is shown in Fig. 6. The essence of the problem is that both the inner and outer toehold domains from the downstream end of $B$ are available in $AB$ and $F_{AB \rightarrow B'}$, and the inner and outer toehold domains from the upstream end of $C$ are available in $CD$, $F_{CD \rightarrow D'}$ and $W_{CD \rightarrow D'}$. Since the downstream end of $B$ is complementary to the upstream end of $C$, the result is that the species can bind to each other strongly.
Theorem 11 (Cascades with N ≥ components are not realisable). Consider the set of $N$ reactions A → A, A → A ... $A_{N-1} \rightarrow A_N$, in which all $A_i$ are distinct. For N ≥ , this network is not realisable. Proof.
A direct consequence of Lemma 10 and Definition 6.
Theorem 12 (Cascades with N ≤ components are realisable) . The set of reactions A → A , A → A , A → A , in which all A i are distinct, is realisable. Proof.
A direct consequence of Lemma 10 and Definition 6.
Theorem 13 (Long cascades are non-realisable due to a particular type of leak reaction only). Consider the set of $N$ reactions A → A, A → A ... $A_{N-1} \rightarrow A_N$, in which all $A_i$ are distinct. This network would be realisable if reactions between ancillary species $A_i A_{i+1}$ and $A_{i+2} A_{i+3}$, and $F_{A_i A_{i+1} \rightarrow A'_{i+1}}$ and $F_{A_{i+2} A_{i+3} \rightarrow A'_{i+3}}$/$W_{A_{i+2} A_{i+3} \rightarrow A'_{i+3}}$, were absent.
The result of Theorem 11 is discouraging, since cascades are a major feature of kinase networks [20, 29]. Nonetheless, we will continue the analysis of remaining motifs, and present a potential solution in Section 5.
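The leak pattern of Lemma 10 depends only on the topology of the network, so the affected reaction pairs can be listed mechanically. The sketch below is a hypothetical helper (the name `two_step_leaks` and the edge-list input format are assumptions, not part of the ACDC compiler): for every reaction $B \rightarrow C$ it pairs each upstream reaction $A \rightarrow B$ with each downstream reaction $C \rightarrow D$, since the ancillary species of those two reactions are the ones expected to bind each other.

```python
def two_step_leaks(edges):
    """List pairs of reactions whose ancillary species leak by the Lemma 10 pattern.

    `edges` is an iterable of (catalyst, substrate) pairs. For each reaction
    B -> C, every reaction A -> B is paired with every reaction C -> D.
    The result is a sorted list of ((A, B), (C, D)) tuples.
    """
    edges = set(edges)
    leaks = set()
    for b, c in edges:
        upstream = [(a, s) for (a, s) in edges if s == b]    # reactions A -> B
        downstream = [(k, d) for (k, d) in edges if k == c]  # reactions C -> D
        for up in upstream:
            for down in downstream:
                leaks.add((up, down))
    return sorted(leaks)


# The four-component cascade of Figure 6 contains one such pair of reactions,
# whereas a three-component cascade contains none.
long_cascade = two_step_leaks([("A", "B"), ("B", "C"), ("C", "D")])
short_cascade = two_step_leaks([("A", "B"), ("B", "C")])
```

The sketch only flags this one pattern; it does not check the other conditions of Definition 6.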
A network possesses a loop if it is possible to traverse a path that begins and ends at the same node without using the same edge twice. For the purposes of this classification, a given (directed) edge can be traversed in either direction. Loops are common components of natural networks, providing the possibility of oscillation, bistability and filtering [2, 11].
Theorem 14 (Loops of odd length are not realisable). Consider a system of reactions $A_0 \leftrightarrow A_1 \leftrightarrow A_2 \ldots A_{N-1} \leftrightarrow A_0$, where $\leftrightarrow$ indicates a catalytic activation in either direction. This network is not realisable if $N$ is odd, unless the long central domain is self-complementary. Proof.
ACDC circuits require that the long central domain alternates between a sequence and its complement in the identity strands of catalysts and their substrates. If $N$ is odd, then the sequence must be self-complementary for this alternation to happen.
Introducing a self-complementary central domain is a strategy that risks a competition between duplexes and single-stranded hairpins. We do not consider it further.
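The alternation argument in this proof amounts to asking whether the network, with edge directions ignored, can be two-coloured: one colour for species whose identity strand carries the central domain, the other for species carrying its complement. The following sketch performs that check with a breadth-first search. The function name and the edge-list input are assumptions made for illustration, and the check covers only the central-domain constraint, not the other conditions of Definition 6.

```python
from collections import deque


def central_domain_assignment(edges):
    """Assign 0/1 to each species so that every reaction links a 0 to a 1.

    `edges` is an iterable of (catalyst, substrate) pairs; direction is
    ignored. Returns a dict species -> 0 or 1, or None when the network
    contains a loop of odd length, in which case no assignment exists.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    colour = {}
    for start in neighbours:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in colour:
                    colour[other] = 1 - colour[node]
                    queue.append(other)
                elif colour[other] == colour[node]:
                    return None  # two linked species would need the same domain
    return colour


# A loop of three species has no valid assignment; a loop of four does.
triangle = central_domain_assignment([("A", "B"), ("B", "C"), ("C", "A")])
square = central_domain_assignment([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
```

A self-interaction is treated as a loop of length one and is therefore also rejected.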
Theorem 15 (Self interactions and bidirectional edges are not realisable). Consider a system of reactions $A_0 \rightarrow A_1 \rightarrow A_2 \ldots A_{N-1} \rightarrow A_0$. This network is not realisable if $N \leq 2$.
The ACDC system is not inherently suited to auto-activation or bidirectional interactions. These motifs require complementarity between both the downstream and upstream toeholds of either a single species, or two species. Strands in the system therefore violate condition 3 of Definition 6 and will tend to hybridise to form fully complementary duplexes.
An isolated feedback loop is a network of size $N$ with a single directed path around the network. A simple example of length 3 is shown in Fig. 5(f).
Theorem 16 (Feedback loops are not realisable). Consider the feedback loop $A_0 \rightarrow A_1 \rightarrow A_2 \ldots A_{N-1} \rightarrow A_0$. Such a system is not realisable for any $N$. Proof.
A direct consequence of Theorems 11, 14, and 15.
As a consequence of Theorems 14 and 15, any realisable feedback loop must have N ≥ n and reaction n + 2.
Theorem 17 (Long feedback loops with an even number of units are non-realisable due to a particular type of leak reaction only). Consider the feedback loop $A_1 \rightarrow A_2$, $A_2 \rightarrow A_3$, ..., $A_{N-1} \rightarrow A_N$, $A_N \rightarrow A_1$. For $N$ even, N ≥ , this network would be realisable if reactions between ancillary species $A_i A_{i+1}$ and $A_{i+2} A_{i+3}$, and $F_{A_i A_{i+1} \rightarrow A'_{i+1}}$ and $F_{A_{i+2} A_{i+3} \rightarrow A'_{i+3}}$/$W_{A_{i+2} A_{i+3} \rightarrow A'_{i+3}}$, were absent. Here, the index $j$ in $A_j$ should be interpreted modularly: $A_j = A_{j-N}$ for $j > N$.
An isolated feedforward loop is a network of size $N$ with two directed paths from one node $i$ to another node $j$. Every other node appears exactly once in one of these paths. An example with path lengths of 1 and 2 is shown in Figure 5.
Theorem 18 (The relative lengths of paths are constrained in feedforward loops). Consider the generalised feedforward loop
$A \rightarrow B_1$, $B_1 \rightarrow B_2$, ..., $B_{N-1} \rightarrow B_N$, $B_N \rightarrow D$,
$A \rightarrow C_1$, $C_1 \rightarrow C_2$, ..., $C_{M-1} \rightarrow C_M$, $C_M \rightarrow D$.
For such a network to be realisable, it is necessary that $N \geq 1$, $M \geq 1$, and $N - M$ is even.
The constraint on the relative length of loops arises from Theorem 14. Feedforward loops involving paths with no intermediates are not realisable due to the existence of unintended strand exchange reactions within the path that contains intermediates.
Since each path in a feedforward loop is a cascade, Theorems 11 and 18 imply that only feedforward loops with a single intermediate in each branch are realisable.
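Theorem 19 (Realisability of feedforward loops). Consider the generalised feedforward loop
$A \rightarrow B_1$, $B_1 \rightarrow B_2$, ..., $B_{N-1} \rightarrow B_N$, $B_N \rightarrow D$,
$A \rightarrow C_1$, $C_1 \rightarrow C_2$, ..., $C_{M-1} \rightarrow C_M$, $C_M \rightarrow D$.
Such a system is realisable if and only if $N = 1$ and $M = 1$. Proof.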
As a consequence of Theorems 8, 9, 11, 12, and 18, all other FFLs are not realisable. The realisability of the FFL with $N = 1$ and $M = 1$ can be verified by inspection.
Typically, feedforward loops use branches of different lengths to achieve a complex response to a signal over time [2, 11]. Such networks are not realisable. Indeed, our analysis of various motifs has revealed that the majority are not realisable. Broadly speaking, there are a number of small motifs (e.g. auto-activation, bidirectional reactions, feedforward loops with no intermediates in one branch) that cannot be achieved because the major species themselves interact directly. In addition, loops of odd total length are not realisable due to the nature of complementary base pairs. However, most motifs are ruled out because of a single type of interaction, between the ancillary species in one reaction and the ancillary species in another reaction that occurs two steps downstream. In Section 5, we propose a strategy to overcome this last problem, massively increasing the scope of the ACDC framework.
The most severe limitation of the ACDC system detailed in Section 3 is expressed by Theorem 11. Long cascades, and loops incorporating cascades, are non-realisable due to interactions between ancillary species of a given reaction, and ancillary species of a reaction separated by two catalytic steps (Theorem 13).
Assumption 20 (Mismatches destabilise complexes held together by two contiguous toehold domains). We assume that a single mismatched C-C or G-G base pair, positioned adjacent to the interface of two toehold domains, is sufficiently destabilising that an unwanted complex formed only by the binding of these toehold domains no longer precludes realisability.
The basic design of the ACDC motif assumes that toehold binding is relatively weak; two toehold domains on either side of a junction must be able to dissociate by Assumption 2. Individual C-C or G-G mismatches are known to be highly destabilising [40], and should similarly allow for two contiguous domains to detach. Given Assumption 20, the challenge is then to systematically introduce mismatches so that all interactions between ancillary species identified in Theorem 13 are compromised by a mismatch, without compromising intended circuit activity. Our full scheme is visualised in Figure 7.
Definition 21 (Mismatches proposed to destabilise unintended complexes). We propose the following mismatches.
1. The upstream interface of every major species is made distinct for active and inactive states. Specifically, we introduce a G base at the inner edge of the outer toehold domain of the state strand of the inactive species, and a C base in the same position for the active species. Catalysts that activate that species possess a C in the complementary position of their downstream interface; catalysts that deactivate it possess a G.
2. We introduce a C-C mismatch at the outer edge of the inner toehold domain at the downstream interface of each major species. This mismatch is eliminated in the formation of waste complexes, and retained in the substrate-catalyst complexes.
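The first mismatch type of Definition 21 can be made concrete with a minimal sketch. It builds a sequence template for the upstream toeholds of a substrate's state strand and reports whether the marker base of a catalyst pairs with it. Several details are assumptions of this illustration rather than part of the definition: the function names are hypothetical, the template is written 5' to 3' as [inner toehold, outer toehold], the "inner edge" of the outer toehold is taken to be its base adjacent to the inner toehold, and the toehold length of 5 nucleotides follows Figure 7.

```python
# Watson-Crick partners; "N" in a template marks an unconstrained base.
WATSON_CRICK = {"A": "T", "T": "A", "C": "G", "G": "C"}
TOEHOLD_LEN = 5  # toehold length assumed from Figure 7

def substrate_upstream_template(active: bool) -> str:
    """Template for the [inner, outer] upstream toeholds of the state strand."""
    marker = "C" if active else "G"  # C marks active, G marks inactive
    inner_toehold = "N" * TOEHOLD_LEN
    outer_toehold = marker + "N" * (TOEHOLD_LEN - 1)
    return inner_toehold + outer_toehold

def catalyst_marker(activating: bool) -> str:
    """Marker base in the downstream interface of a catalyst."""
    return "C" if activating else "G"

def interface_pairing(activating: bool, substrate_active: bool) -> str:
    """Describe the pairing at the marker position for a catalyst/substrate pair."""
    cat = catalyst_marker(activating)
    sub = substrate_upstream_template(substrate_active)[TOEHOLD_LEN]
    if WATSON_CRICK[cat] == sub:
        return "matched"
    return f"{cat}-{sub} mismatch"

for activating in (True, False):
    for substrate_active in (False, True):
        print(activating, substrate_active,
              interface_pairing(activating, substrate_active))
```

Under these assumptions an activating catalyst pairs cleanly with an inactive substrate and meets a C-C mismatch on an active one, while a deactivating catalyst pairs cleanly with an active substrate and meets a G-G mismatch on an inactive one.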
Assumption 22 (Mismatches cannot cause leak reactions). We assume that the sequence constraints introduced by mismatch inclusion do not violate Assumption 5, and that the destabilisation of duplexes does not violate Assumption 2.

In practice, mismatches will likely result in some increase in the rate of interactions between otherwise hidden toeholds; we assume that these rates remain negligible.
Theorem 23 (Mismatches successfully destabilise unintended complexes). The scheme proposed in Definition 21 satisfies the following:
1. All motifs that are realisable in the mismatch-free ACDC design remain realisable in the mismatch-based scheme.
2. Cascades of arbitrary length N with at most the first and last reactions deactivating are realisable;
3. Feedback loops with N even and N ≥ in which all reactions are activating are realisable;
4. Feedforward loops with N ≥ , M ≥ , N − M even, in which at most the first and last reactions are deactivating in each branch, are realisable.

Figure 7: Illustration of the proposed mismatch schemes for reactions (a) A → B and (b) C ⊣ B, assuming toeholds of length 5 nucleotides and central domains of length 23 nucleotides. Specific mismatched bases are highlighted in red, and the same bases are highlighted in green when not part of a mismatch. The domains are separated with ticks on each species, and upstream interfaces of the major species are shown on the right of each diagram.

The proof for Theorem 23 is given in Appendix B.

Note that the introduction of mismatches proposed in Definition 21 invalidates Theorem 7, since the downstream domains of activating and deactivating catalysts are now distinct. Indeed, the described strategy only eliminates unwanted sequestration in cascades in which the intermediate steps are activating. Nonetheless, it makes realisable complex networks in which, for example, deactivating catalysts are always active. Networks of this kind are common in biology [20, 29].

The first type of mismatch in Definition 21 ensures that there is always a C-C mismatch between the upstream toeholds of the state strand of $A_{i+2}$ and the downstream toeholds of the state strand of $A_{i+1}$ in the cascade $A_i$ →/⊣ $A_{i+1}$ → $A_{i+2}$ →/⊣ $A_{i+3}$, weakening the unwanted binding between the fuel and waste species identified in Theorem 13. Here →/⊣ indicates activation or deactivation.
The second type of mismatch in Definition 21 ensures that the upstream toeholds of the identity strand of $A_{i+2}$ are no longer fully complementary to the downstream toeholds of $A_{i+1}$ in the cascade $A_i$ →/⊣ $A_{i+1}$ →/⊣ $A_{i+2}$ →/⊣ $A_{i+3}$, weakening the unwanted binding between ancillary species $A_iA_{i+1}$ and $A_{i+2}A_{i+3}$.

Having proposed these mismatches, it is important to determine that they would not compromise the intended reactions. The first type of mismatch in Definition 21 is not present in any complex that must form during the operation of the network; it appears only in the initially-prepared fuel and if a (de)activating catalyst binds to an (in)active substrate. It therefore presents no issues for intended reactions.

The second type of mismatch in Definition 21 is more subtle. When a catalyst A interacts with its substrate B, a mismatch at the very end of the catalyst duplex is converted into a mismatch within the stem of the catalyst-substrate complex AB. Since mismatches are known to be more destabilising in duplex interiors [32, 40], this conversion represents a local barrier to branch migration. The thermodynamic favourability of the full reaction $A + B \to AB + W_{AB \to B}$ (or the equivalent step in a deactivation reaction) is marginal, as the mismatch at the downstream end of B counters this barrier. We assume that the local barriers introduced would not prohibit the intended reactions; indeed, conventional 3-way strand displacement is able to proceed through unmitigated C-C mismatch formation, albeit with a significant effect on kinetics [26]. In this case, any penalty is likely to be far weaker. The second step of the catalytic turnover, $AB + F_{AB \to B} \to A + B$ (or the equivalent in a deactivation), is thermodynamically favourable (two internal mismatches are converted into exterior mismatches) and without local barriers, although one of the toeholds is effectively shortened to 4 bp.
The overall catalytic (de)activation cycle effectively eliminates a single C-C (G-G) mismatch initially present in the fuel. The reaction as a whole is therefore driven forwards by the free energy of base-pairing via “hidden thermodynamic driving” [19]; products are more stable than reactants without consumption of initially available toeholds. In this sense, the mismatches proposed in Definition 21 will improve the efficacy of the ACDC motif, as the concentration excess of fuel relative to waste required to drive the reaction in the desired direction would be reduced.

To construct an ACDC network that implements a given graph, three things need to be done: (1) verifying that the network is realisable; (2) enumerating all domains on all species given the graph topology; and (3) compiling sequences for each domain and thus for each strand present in the system. We have created an ACDC compiler with this functionality [24]. While compilers for DSD systems that could potentially be extended to accommodate our framework exist [3, 46], we decided to make our own since our framework has unique requirements about verifying the feasibility of a given CRN and introducing mismatches within domains.

The first part is done, at least at the level of each cascade and loop present, by analysing the properties of a given graph. For every pair of nodes i, j, all directed simple paths are computed. We search for paths of length N ≥ i to j or from j to i ) or a FBL (at least one path from i to j and from j to i ) exists in the graph. The realisability of the loop(s) can be verified from the lengths of the paths according to Theorems 14, 15 and 18.

If a given graph is found to be realisable, then domains are assigned for each strand of each species, such that all complementarities and mismatches required by the topology are satisfied. This task can be achieved by local analysis of the network topology. Finally, a NUPACK [54] script is produced to generate optimal sequences for each strand.
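The path-based analysis described above can be sketched in a few lines. This is an illustrative sketch only and is not taken from the ACDC compiler [24]; the function names and the dictionary-of-lists graph format are hypothetical. A pair of nodes is labelled as belonging to a feedback loop when at least one directed simple path exists in each direction, and (as an assumption of this sketch) to a feedforward loop when at least two simple paths exist in the same direction.

```python
from itertools import combinations

def simple_paths(graph, src, dst):
    """Yield every directed simple path from src to dst as a list of nodes."""
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                yield path + [nxt]
            elif nxt not in path:  # never revisit a node on the current path
                stack.append((nxt, path + [nxt]))

def classify_pairs(graph):
    """Map each node pair (i, j) that sits on a loop to (kind, i->j paths, j->i paths)."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    loops = {}
    for i, j in combinations(nodes, 2):
        forward = list(simple_paths(graph, i, j))
        backward = list(simple_paths(graph, j, i))
        if forward and backward:
            loops[(i, j)] = ("FBL", forward, backward)
        elif len(forward) >= 2 or len(backward) >= 2:
            loops[(i, j)] = ("FFL", forward, backward)
    return loops

# Feedforward loop with one intermediate in each branch.
ffl = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
for pair, (kind, forward, backward) in classify_pairs(ffl).items():
    print(pair, kind, [len(p) - 1 for p in forward + backward])
```

The path lengths printed for each loop are the quantities that would then be compared against the length and parity conditions of the realisability theorems.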
The required mismatches are hard-coded into the domain definitions in the script. The software is available at https://doi.org/10.5281/zenodo.3838080 .

We have introduced the ACDC scheme for constructing DNA-based networks that perform direct catalysis, analysed its shortcomings, and subsequently proposed practical improvements. As of now, we have focused only on the realisability of ACDC implementations for some graphs, not their dynamical behaviour. Three natural directions for further theoretical investigation are: (1) proving the realisability of arbitrary networks; (2) implementing additional hidden thermodynamic driving so that both substeps of a catalytic reaction are thermodynamically downhill; and (3) automated design of ACDC networks to perform some desired transfer function between input concentrations $x_i(t)$, i = 1..N and output concentrations $y_j(t)$, j = 1..M. With regard to the first, we conjecture that all violations of realisability in arbitrary networks are attributable to the causes identified in Section 4.

Equally important, however, is experimentally testing the ACDC motif. Whilst 4-way branch migration has been used in several contexts [10, 22, 25, 49], the toehold exchange mechanism proposed here is relatively untested. It is also important to establish that the mismatches function as intended, limiting sequestration reactions and providing strong overall thermodynamic driving without causing excessive local barriers that frustrate the necessary reactions. A final consideration is the possibility of leak reactions involving non-complementary toeholds that we have assumed to be negligible. It remains to be established that unintended reactions will occur at a negligible rate, particularly in the context of species containing mismatches.
This research is ongoing within the group.

A key property of ACDC is the two recognition interfaces within each species and the inherent symmetry in the species that follows. While this is a design feature that allows both substrate-like and catalyst-like behaviour for a single species, it also has the drawback that domains that are essential for some reaction to occur are also present in reactions where they only act as identity placeholders (the downstream interface of a catalyst and the upstream interface of a substrate) that do not interact with any other domain. Consider the reaction in Figure 4; the identity of the “placeholder domains” a, b, g, h, i, j, k that aren’t involved in the initial binding and migration reactions could be swapped to arbitrary domains that aren’t complementary with d, e, f or each other in only one species and the reaction could still occur (assuming the correct fuel species is generated based on the substrate and catalyst). However, this may not be possible if A and B are part of some larger computation network where the placeholder domain identities are important. Another drawback of the symmetry is the limitation of loop lengths to even numbers, characterised in Theorem 14. An obvious potential mitigation to this problem is to make the central domain its own complement, although this choice risks the formation of self-complementary hairpins.

The weaknesses of the ACDC motif invite the exploration of other possible designs of catalytic activation networks that operate via direct bimolecular catalysis. It is an open question as to whether the shortcomings of ACDC can be mitigated without a substantial increase in complexity or abandoning the mechanism of direct catalytic action.

We have established the concept of a direct catalytic reaction and discussed why previous work on catalytic DNA computing does not fulfil this definition.
We have then proposed a framework, ACDC, for implementing non-equilibrium catalytic (de)activation networks using direct catalytic activation, analogous to systems seen in living cells. ACDC is simple in the sense that all species contain only two strands, an important consideration in the context of implementing DSD circuitry in a broad range of contexts.

We have analysed the framework’s expressiveness by exploring the implementation of seven network motifs with ACDC. The basic design is highly limited by the inherent symmetry of components, prohibiting long cascades and most feedforward and feedback loops. However, we propose that systematic placement of mismatches can obviate these difficulties in many contexts. Moreover, we argue that these initially-present mismatches can contribute a “hidden thermodynamic driving” [19] to the ACDC motifs, increasing the robustness of the design to subtleties in DNA thermodynamics and reducing the concentration imbalances of fuels required to drive the reactions forward. We present a compiler for the sequence design of ACDC-based networks that implements these findings [24].
References L. Adleman. Molecular computation of solutions to combinatorial problems.
Science ,266(5187):1021–1024, November 1994. URL: , doi:10.1126/science.7973651 . Uri Alon.
An Introduction to Systems Biology: Design Principles of Biological Circuits . CRCPress, July 2019. Google-Books-ID: Lg3MDwAAQBAJ. Stefan Badelt, Seung Woo Shin, Robert F. Johnson, Qing Dong, Chris Thachuk, andErik Winfree. A General-Purpose CRN-to-DSD Compiler with Formal Verification, Op-timization, and Simulation Capabilities. In
DNA Computing and Molecular Program-ming , Lecture Notes in Computer Science, pages 232–248. Springer, Cham, September2017. URL: https://link.springer.com/chapter/10.1007/978-3-319-66799-7_15 , doi:10.1007/978-3-319-66799-7_15 . David Barford, Amit K. Das, and Marie-Pierre Egloff. The Structure and Mechanism ofProtein Phosphatases: Insights into Catalysis and Regulation.
Annual Review of Biophysicsand Biomolecular Structure , 27(1):133–164, June 1998. Publisher: Annual Reviews. URL: , doi:10.1146/annurev.biophys.27.1.133 . John P. Barton and Eduardo D. Sontag. The Energy Costs of Insulators in BiochemicalNetworks.
Biophysical Journal , 104(6):1380–1390, March 2013. URL: https://linkinghub.elsevier.com/retrieve/pii/S0006349513001975 , doi:10.1016/j.bpj.2013.01.056 . Hieu Bui, Shalin Shah, Reem Mokhtar, Tianqi Song, Sudhanshu Garg, and John Reif. LocalizedDNA Hybridization Chain Reactions on DNA Origami.
ACS Nano , 12(2):1146–1155, February2018. Publisher: American Chemical Society. doi:10.1021/acsnano.7b06699 . Gourab Chatterjee, Neil Dalchau, Richard A. Muscat, Andrew Phillips, and Georg Seelig. Aspatially localized architecture for fast and modular DNA computing.
Nature Nanotechnology ,12(9):920–927, September 2017. Number: 9 Publisher: Nature Publishing Group. URL: , doi:10.1038/nnano.2017.127 . Yuan-Jyue Chen, Neil Dalchau, Niranjan Srinivas, Andrew Phillips, Luca Cardelli, DavidSoloveichik, and Georg Seelig. Programmable chemical controllers made from DNA.
NatureNanotechnology , 8(10):755–762, October 2013. URL: , doi:10.1038/nnano.2013.189 . Kevin M. Cherry and Lulu Qian. Scaling up molecular pattern recognition with DNA-based winner-take-all neural networks.
Nature , 559(7714):370–376, July 2018. URL: , doi:10.1038/s41586-018-0289-6 . Nadine L Dabby.
Synthetic Molecular Machines for Active Self-Assembly: Prototype Al-gorithms, Designs, and Experimental Study . PhD thesis, California Institute of Tech-nology, Pasadena, California, 2013. URL: https://pdfs.semanticscholar.org/e668/440cdb786ea7c2d0d6ae306c5aefef1208f6.pdf . Wiet de Ronde and Pieter Rein ten Wolde. Multiplexing oscillatory biochemical signals.
Physical Biology , 11(2):026004, April 2014. doi:10.1088/1478-3975/11/2/026004 . Abhishek Deshpande and Thomas E. Ouldridge. High rates of fuel consumptionare not required by insulating motifs to suppress retroactivity in biochemical circuits.
Engineering Biology , 1(2):86–99, December 2017. Publisher: IET Digital Library. URL: https://digital-library.theiet.org/content/journals/10.1049/enb.2017.0017;jsessionid=3if2o9nadi5rh.x-iet-live-01 , doi:10.1049/enb.2017.0017 . Robert M. Dirks, Justin S. Bois, Joseph M. Schaeffer, Erik Winfree, and Niles A. Pierce. Thermodynamic analysis of interacting nucleic acid strands.
SIAM Rev , page 2007, 2007. Elaine A. Elion. Ste5: a meeting place for MAP kinases and their associates.
Trends inCell Biology , 5(8):322–327, August 1995. URL: , doi:10.1016/S0962-8924(00)89055-8 . Michael B. Elowitz and Stanislas Leibler. A synthetic oscillatory network of transcriptional reg-ulators.
Nature , 403(6767):335–338, January 2000. Number: 6767 Publisher: Nature PublishingGroup. URL: , doi:10.1038/35002125 . Timothy S. Gardner, Charles R. Cantor, and James J. Collins. Construction of a genetictoggle switch in Escherichia coli.
Nature , 403(6767):339–342, January 2000. Number: 6767Publisher: Nature Publishing Group. URL: , doi:10.1038/35002131 . Anthony J. Genot, Teruo Fujii, and Yannick Rondelez. Scaling down DNA circuits withcompetitive neural networks.
Journal of The Royal Society Interface , 10(85):20130212, August2013. Publisher: Royal Society. URL: https://royalsocietypublishing.org/doi/full/10.1098/rsif.2013.0212 , doi:10.1098/rsif.2013.0212 . Christopher C. Govern and Pieter Rein ten Wolde. Energy Dissipation and Noise Correlationsin Biochemical Sensing.
Physical Review Letters , 113(25):258102, December 2014. Publisher:American Physical Society. URL: https://link.aps.org/doi/10.1103/PhysRevLett.113.258102 , doi:10.1103/PhysRevLett.113.258102 . Natalie E. C. Haley, Thomas E. Ouldridge, Ismael Mullor Ruiz, Alessandro Geraldini,Ard A. Louis, Jonathan Bath, and Andrew J. Turberfield. Design of hidden thermody-namic driving for non-equilibrium systems via mismatch elimination during DNA stranddisplacement.
Nature Communications , 11(1):2562, May 2020. Number: 1 Publisher:Nature Publishing Group. URL: , doi:10.1038/s41467-020-16353-y . Ira Herskowitz. MAP kinase pathways in yeast: For mating and more.
Cell , 80(2):187–197,January 1995. URL: https://linkinghub.elsevier.com/retrieve/pii/0092867495904026 , doi:10.1016/0092-8674(95)90402-6 . Robert F. Johnson. Impossibility of Sufficiently Simple Chemical Reaction Network Imple-mentations in DNA Strand Displacement. In Ian McQuillan and Shinnosuke Seki, editors,
Unconventional Computation and Natural Computation , Lecture Notes in Computer Science,pages 136–149. Springer International Publishing, 2019. Shohei Kotani and William L. Hughes. Multi-Arm Junctions for Dynamic DNA Nanotechnology.
Journal of the American Chemical Society , 139(18):6363–6368, May 2017. URL: https://pubs.acs.org/doi/10.1021/jacs.7b00530 , doi:10.1021/jacs.7b00530 . Matthew R. Lakin, Simon Youssef, Filippo Polo, Stephen Emmott, and Andrew Phillips.Visual DSD: a design and analysis tool for DNA strand displacement systems.
Bioinformatics ,27(22):3211–3213, November 2011. URL: , doi:10.1093/bioinformatics/btr543 . Antti Lankinen. ACDC compiler, May 2020. URL: https://doi.org/10.5281/zenodo.3838080 . Tong Lin, Jun Yan, Luvena L. Ong, Joanna Robaszewski, Hoang D. Lu, Yongli Mi, Peng Yin,and Bryan Wei. Hierarchical Assembly of DNA Nanostructures Based on Four-Way Toehold-Mediated Strand Displacement.
Nano Letters , 18(8):4791–4795, August 2018. Publisher:American Chemical Society. doi:10.1021/acs.nanolett.8b01355 . Robert R. F. Machinek, Thomas E. Ouldridge, Natalie E. C. Haley, Jonathan Bath, andAndrew J. Turberfield. Programmable energy landscapes for kinetic control of DNA stranddisplacement.
Nature Communications , 5(1):1–9, November 2014. Number: 1 Publisher:Nature Publishing Group. URL: , doi:10.1038/ncomms6324 . Marcelo O. Magnasco. Chemical Kinetics is Turing Universal.
Physical Review Letters ,78(6):1190–1193, February 1997. URL: https://link.aps.org/doi/10.1103/PhysRevLett.78.1190 , doi:10.1103/PhysRevLett.78.1190 . G. Manning, D. B. Whyte, R. Martinez, T. Hunter, and S. Sudarsanam. The ProteinKinase Complement of the Human Genome.
Science , 298(5600):1912–1934, December 2002.Publisher: American Association for the Advancement of Science Section: Review. URL: https://science.sciencemag.org/content/298/5600/1912 , doi:10.1126/science.1075762 . Christopher J. Marshall. MAP kinase kinase kinase, MAP kinase kinase and MAPkinase.
Current Opinion in Genetics & Development , 4(1):82–89, February 1994.URL: , doi:10.1016/0959-437X(94)90095-7 . Pankaj Mehta, Alex H. Lang, and David J. Schwab. Landauer in the Age of SyntheticBiology: Energy Consumption and Information Processing in Biochemical Networks.
Journalof Statistical Physics , 162(5):1153–1166, March 2016. doi:10.1007/s10955-015-1431-6 . Thomas E. Ouldridge, Christopher C. Govern, and Pieter Rein ten Wolde. Thermodynamicsof Computational Copying in Biochemical Systems.
Physical Review X , 7(2):021004, April2017. Publisher: American Physical Society. URL: https://link.aps.org/doi/10.1103/PhysRevX.7.021004 , doi:10.1103/PhysRevX.7.021004 . Thomas E. Ouldridge, Ard A. Louis, and Jonathan P. K. Doye. Structural, mechanical,and thermodynamic properties of a coarse-grained DNA model.
The Journal of ChemicalPhysics , 134(8):085101, February 2011. Publisher: American Institute of Physics. URL: https://aip.scitation.org/doi/full/10.1063/1.3552946 , doi:10.1063/1.3552946 . Tomislav Plesa. Stochastic approximation of high-molecular by bi-molecular reactions. arXiv:1811.02766 [math, q-bio] , November 2018. arXiv: 1811.02766. URL: http://arxiv.org/abs/1811.02766 . L. Qian and E. Winfree. Scaling Up Digital Circuit Computation with DNA Strand Displace-ment Cascades.
Science , 332(6034):1196–1201, June 2011. URL: , doi:10.1126/science.1200520 . Lulu Qian, David Soloveichik, and Erik Winfree. Efficient Turing-Universal Computationwith DNA Polymers. In Yasubumi Sakakibara and Yongli Mi, editors,
DNA Computingand Molecular Programming , Lecture Notes in Computer Science, pages 123–140, Berlin,Heidelberg, 2011. Springer. doi:10.1007/978-3-642-18305-8_12 . Lulu Qian and Erik Winfree. A simple DNA gate motif for synthesizing large-scale circuits.
Journal of the Royal Society Interface , 8(62):1281–1297, September 2011. URL: , doi:10.1098/rsif.2010.0729 . Lulu Qian and Erik Winfree. Parallel and Scalable Computation and Spatial Dynam-ics with DNA-Based Chemical Reaction Networks on a Surface. In Satoshi Murataand Satoshi Kobayashi, editors,
DNA Computing and Molecular Programming , LectureNotes in Computer Science, pages 114–131, Cham, 2014. Springer International Publish-ing. doi:10.1007/978-3-319-11295-4_8 . Lulu Qian, Erik Winfree, and Jehoshua Bruck. Neural network computation with DNA stranddisplacement cascades.
Nature , 475(7356):368–372, July 2011. URL: , doi:10.1038/nature10262 . Ismael Mullor Ruiz, Jean-Michel Arbona, Amitkumar Lad, Oscar Mendoza, Jean-Pierre Aimé,and Juan Elezgaray. Connecting localized DNA strand displacement reactions.
Nanoscale ,7(30):12970–12978, July 2015. Publisher: The Royal Society of Chemistry. URL: https://pubs.rsc.org/en/content/articlelanding/2015/nr/c5nr02434j , doi:10.1039/C5NR02434J . John SantaLucia and Donald Hicks. The Thermodynamics of DNA Structural Motifs.
Annual Review of Biophysics and Biomolecular Structure , 33(1):415–440, 2004. _eprint: https://doi.org/10.1146/annurev.biophys.32.110601.141800. doi:10.1146/annurev.biophys.32.110601.141800 . Hans J. Schaeffer, Andrew D. Catling, Scott T. Eblen, Lara S. Collier, Anke Krauss, and Michael J. Weber. MP1: A MEK Binding Partner That Enhances Enzymatic Activation of the MAP Kinase Cascade.
Science , 281(5383):1668–1671, September 1998. Publisher:American Association for the Advancement of Science Section: Report. URL: https://science.sciencemag.org/content/281/5383/1668 , doi:10.1126/science.281.5383.1668 . Georg Seelig, David Soloveichik, David Yu Zhang, and Erik Winfree. Enzyme-Free Nucleic AcidLogic Circuits.
Science , 314(5805):1585–1588, December 2006. Publisher: American Associationfor the Advancement of Science Section: Report. URL: https://science.sciencemag.org/content/314/5805/1585 , doi:10.1126/science.1132493 . Nadrian C. Seeman and Hanadi F. Sleiman. DNA nanotechnology.
Nature Reviews Materials ,3(1):1–23, November 2017. URL: , doi:10.1038/natrevmats.2017.68 . D. Soloveichik, G. Seelig, and E. Winfree. DNA as a universal substrate for chemical kinetics.
Proceedings of the National Academy of Sciences , 107(12):5393–5398, March 2010. URL: , doi:10.1073/pnas.0909380107 . Carlo Spaccasassi, Matthew R. Lakin, and Andrew Phillips. A Logic Programming Languagefor Computational Nucleic Acid Devices.
ACS synthetic biology , 8(7):1530–1547, July 2019. doi:10.1021/acssynbio.8b00229 . Niranjan Srinivas, James Parkin, Georg Seelig, Erik Winfree, and David Soloveichik. Enzyme-free nucleic acid dynamical systems.
Science , 358(6369), December 2017. URL: https://science.sciencemag.org/content/358/6369/eaal2052 , doi:10.1126/science.aal2052 . J. David Sweatt. The neuronal MAP kinase cascade: a biochemical signal integrationsystem subserving synaptic plasticity and memory.
Journal of Neurochemistry , 76(1):1–10, 2001. URL: https://onlinelibrary.wiley.com/doi/abs/10.1046/j.1471-4159.2001.00054.x , doi:10.1046/j.1471-4159.2001.00054.x . Mario Teichmann, Enzo Kopperger, and Friedrich C. Simmel. Robustness of Localized DNAStrand Displacement Cascades.
ACS Nano , 8(8):8487–8496, August 2014. Publisher: AmericanChemical Society. doi:10.1021/nn503073p . Suvir Venkataraman, Robert M. Dirks, Paul W. K. Rothemund, Erik Winfree, and Niles A.Pierce. An autonomous polymerization motor powered by DNA hybridization.
NatureNanotechnology , 2(8):490–494, August 2007. Number: 8 Publisher: Nature Publishing Group.URL: , doi:10.1038/nnano.2007.225 . Alan J. Whitmarsh, Julie Cavanagh, Cathy Tournier, Jun Yasuda, and Roger J. Davis. AMammalian Scaffold Complex That Selectively Mediates MAP Kinase Activation.
Science ,281(5383):1671–1674, September 1998. Publisher: American Association for the Advancementof Science Section: Report. URL: https://science.sciencemag.org/content/281/5383/1671 , doi:10.1126/science.281.5383.1671 . Christian Widmann, Spencer Gibson, Matthew B. Jarpe, and Gary L. Johnson. Mitogen-Activated Protein Kinase: Conservation of a Three-Kinase Module From Yeast to Human.
Physiological Reviews , 79(1):143–180, January 1999. Publisher: American PhysiologicalSociety. URL: https://journals.physiology.org/doi/full/10.1152/physrev.1999.79.1.143 , doi:10.1152/physrev.1999.79.1.143 . Wataru Yahiro and Masami Hagiya. Implementation of Turing Machine Using DNA StrandDisplacement. In Carlos Martín-Vide, Takaaki Mizuki, and Miguel A. Vega-Rodríguez, editors,
Theory and Practice of Natural Computing , Lecture Notes in Computer Science, pages 161–172,Cham, 2016. Springer International Publishing. doi:10.1007/978-3-319-49001-4_13 . Peng Yin, Harry M. T. Choi, Colby R. Calvert, and Niles A. Pierce. Programming biomolecularself-assembly pathways.
Nature , 451(7176):318–322, January 2008. Number: 7176 Publisher:Nature Publishing Group. URL: , doi:10.1038/nature06451 . Joseph N. Zadeh, Conrad D. Steenberg, Justin S. Bois, Brian R. Wolfe, Marshall B. Pierce,Asif R. Khan, Robert M. Dirks, and Niles A. Pierce. NUPACK: Analysis and design of nucleic acid systems.
Journal of Computational Chemistry , 32(1):170–173, 2011. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.21596 , doi:10.1002/jcc.21596 . David Yu Zhang and Georg Seelig. Dynamic DNA nanotechnology using strand-displacementreactions.
Nature Chemistry , 3(2):103–113, February 2011. URL: , doi:10.1038/nchem.957 . David Yu Zhang, Andrew J. Turberfield, Bernard Yurke, and Erik Winfree. EngineeringEntropy-Driven Reactions and Networks Catalyzed by DNA.
Science , 318(5853):1121–1125,November 2007. Publisher: American Association for the Advancement of Science Section:Report. URL: https://science.sciencemag.org/content/318/5853/1121 , doi:10.1126/science.1148532 . A Notation For ACDC Species and Reactions
Notation [ a b ] denotes a strand consisting of domains a and b . Logical not is denoted by ¬ and logicaland by ∧ . { n..m } , with n < m , denotes the integer interval between n and m . Definitions (cid:73)
Definition 24. (ACDC reactant structure). Each reactant in an ACDC network consistsof two strands, each of which have one long domain and four toehold domains. The twostrands are called state strand and identity strand based on the fact that one strand decodesthe state of the species and other the identity. A reactant X has the following domains (notethe use of H for “inner” to avoid confusion with “identity”: SH X ) : the inner toehold domain on the 5’ side (downstream end) of the state strand. SO X ) : the outer toehold domain on the 5’ side (downstream end) of the state strand. SH X ) : the inner toehold domain on the 3’ side (upstream end) of the state strand. SO X ) : the outer toehold domain on the 3’ side (upstream end) of the state strand. IH X ) : the inner toehold domain on the 5’ side (upstream end) of the identity strand. IO X ) : the outer toehold domain on the 5’ side (upstream end) of the identity strand. IH X ) : the inner toehold domain on the 3’ side (downstream end) of the identity strand. IO X ) : the outer toehold domain on the 3’ side (downstream end) of the identity strand. SL ( X ) : the long domain on the state strand. IL ( X ) : the long domain on the identity strand. (cid:73) Definition 25. (Subset and logical operations for ACDC species). The following operationswill be useful in the analysis of ACDC networks:Complementarity (cid:5) : x (cid:5) y is true for sequences x, y iff x = y ∗ (and x ∗ = y ).Complementarity with mismatch (cid:3) : x (cid:3) y is true for sequences x, y iff x = y ∗ (and x ∗ = y ) except for a single centrally-placed C-C or G-G mismatch. x (cid:3) y is distinct from ¬ x (cid:5) y , for which it is assumed that interactions between x and y are negligible. (downstream end) state toehold sequence S A ) := [ SO A ) SH A )] . (upstream end) state toehold sequence S A ) := [ SH A ) SO A )] . (upstream end) identity toehold sequence I A ) := [ IO A ) IH A )] . 
(downstream end) identity toehold sequence I A ) := [ IH A ) IO A )] . . Lankinen, I. Mullor Ruiz, and T. E. Ouldridge 21 (cid:73) Definition 26. (Major species). A species X is either a major species only if ¬ (cid:0) SO X ) (cid:5) IO X ) (cid:1) ∧ (cid:0) SH X ) (cid:5) IH X ) (cid:1) ∧ (cid:0) SL ( X ) (cid:5) IL ( X ) (cid:1) ∧ (cid:0) SH X ) (cid:5) IH X ) (cid:1) ∧¬ (cid:0) SO X ) (cid:5) IO X ) (cid:1) . (cid:73) Definition 27. (Domain complementarities in an ACDC reaction without mismatches).An ACDC reaction A → B or A a B implies S A ) (cid:5) S B ) = S B ) ∧ IL ( A ) = IL ( A ) (cid:5) IL ( B ) = IL ( B ) ∧ I A ) = I A ) (cid:5) I B ) = I B ) . Domains not constrained by these requirements are non-complementary. (cid:73)
Definition 28. (Domain complementarities in ACDC reactions with mismatches). An ACDC reaction $A \to B$ with mismatches placed as per Definition 21 implies S A ) $\centerdot$ S B ) ∧ S A ) $\square$ S B ) ∧ IL ( A ) = IL ( A ) $\centerdot$ IL ( B ) = IL ( B ) ∧ I A ) = I A ) $\square$ I B ) = I B ). Domains not constrained by these requirements are non-complementary.
An ACDC reaction $A \dashv B$ with mismatches placed as per Definition 21 implies S A ) $\square$ S B ) ∧ S A ) $\centerdot$ S B ) ∧ IL ( A ) = IL ( A ) $\centerdot$ IL ( B ) = IL ( B ) ∧ I A ) = I A ) $\square$ I B ) = I B ). Domains not constrained by these requirements are non-complementary.
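The domain-level definitions above can be made concrete with a small sketch. The Python below is illustrative only and is not the paper's compiler: the names `star`, `Species` and `meets_major_species_condition` are hypothetical, sequences are assumed to be strings over A, C, G, T written 5' to 3', and "centrally placed" is assumed to mean the middle base (or either of the two middle bases for even lengths). The pairing used for the major-species condition (inner toeholds at the same end of the duplex are complementary, outer toeholds are not) is likewise an assumption drawn from the downstream/upstream labels in Definition 24.

```python
from dataclasses import dataclass

_WC = {"A": "T", "T": "A", "C": "G", "G": "C"}


def star(seq: str) -> str:
    """Return x*, the Watson-Crick (reverse) complement, read 5'->3'."""
    return "".join(_WC[base] for base in reversed(seq))


def complementary(x: str, y: str) -> bool:
    """Plain complementarity: true iff x = y*."""
    return x == star(y)


def complementary_with_mismatch(x: str, y: str) -> bool:
    """True iff x = y* except for one central C-C or G-G mismatch."""
    n = len(x)
    if n == 0 or n != len(y):
        return False
    target = star(y)
    diffs = [i for i in range(n) if x[i] != target[i]]
    if len(diffs) != 1:
        return False
    i = diffs[0]
    # Assumption: "central" = middle base (odd n) or one of the two middle bases (even n).
    central = {n // 2} if n % 2 else {n // 2 - 1, n // 2}
    facing = y[n - 1 - i]  # base of y that sits opposite x[i] in the duplex
    return i in central and x[i] == facing and x[i] in "CG"


@dataclass(frozen=True)
class Species:
    # State strand, 5'->3': outer toehold, inner toehold, long, inner, outer.
    so5: str
    sh5: str
    sl: str
    sh3: str
    so3: str
    # Identity strand, 5'->3': outer toehold, inner toehold, long, inner, outer.
    io5: str
    ih5: str
    il: str
    ih3: str
    io3: str


def meets_major_species_condition(x: Species) -> bool:
    """Necessary condition of Definition 26 under the pairing assumed above."""
    return (
        not complementary(x.so5, x.io3)
        and complementary(x.sh5, x.ih3)
        and complementary(x.sl, x.il)
        and complementary(x.sh3, x.ih5)
        and not complementary(x.so3, x.io5)
    )
```

Representing each domain as its own field keeps the predicates close to the wording of the definitions, at the cost of some verbosity; a real design tool would also need to check the non-complementarity of all unconstrained domains.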
B Proofs of Theorems and Lemmas 8 - 23
Theorem 8 (Split motifs are realisable). Consider the set of $N$ reactions $A \to B_1$, $A \to B_2$, $\ldots$, $A \to B_N$, in which all $B_i$ are distinct nodes from $A$. Such a network is realisable for any N ≥ .

Proof.
To realise the above system, we must have:
Definition 26 must apply for $A$ and $B_i$ for all $i$,
Definition 27 must apply for all pairs $A$, $B_i$,
S B i ) and I B i ) must be unique for all $i$.
By simple inspection it can be verified that there is no contradiction in these requirements. Moreover, all toeholds other than those required to be complementary can be chosen to be non-complementary to each other. If these assignments are made, it can be directly verified that Definition 6 is not violated by the major species and associated ancillary species. Thus the motif is realisable. ◀

Theorem 9 (Integrate motifs are realisable). Consider the set of $N$ reactions $A_1 \to B$, $A_2 \to B$, $\ldots$, $A_N \to B$, in which all $A_i$ are distinct nodes from $B$. Such a system is realisable for any N ≥ .

Proof.
To realise the above system, we must have:
Definition 26 must apply for $A_i$ and $B$ for all $i$,
Definition 27 must apply for all pairs $A_i$, $B$.
By simple inspection it can be verified that there is no contradiction in these requirements. Moreover, all toeholds other than those required to be complementary can be chosen to be non-complementary to each other. If these assignments are made, it can be directly verified that Definition 6 is not violated by the major species and associated ancillary species. Thus the motif is realisable. ◀

Lemma 10 (The ancillary species of a catalyst's upstream reactions and substrate's downstream reactions cause leak reactions). Consider a reaction $B \to C$, and further assume that $A \to B$ and $C \to D$ for at least one species $A$ and at least one species $D$. Then $AB$ and $CD$, and $F_{AB \to B}$ and $F_{CD \to D}$ / $W_{CD \to D}$, possess two available toehold pairs that could form a contiguous complementary duplex. No other violations of realisability occur.

Proof.
It can be verified by inspection that there is no inconsistency in the domain requirements for $A, B, C, D$ to be defined as major species (Definition 26) and for $A \to B \to C \to D$ (Definition 27). All domains can be chosen to be non-complementary unless specified by these requirements. When these domain assignments are made, it can be verified by inspection that criteria 1, 2 and 3 of Definition 6 are not violated.
To establish whether criterion 4 of Definition 6 is violated, one need only consider the unbound domains on the ancillary species in the system $A \to B \to C \to D$:
I A ) , I B ) in $AB$
S A ) , S B ) in $F_{AB \to B}$
S A ) , S B ) in $W_{AB \to B}$
I B ) , I C ) in $BC$
S B ) , S C ) in $F_{BC \to C}$
S B ) , S C ) in $W_{BC \to C}$
I C ) , I D ) in $CD$
S C ) , S D ) in $F_{CD \to D}$
S C ) , S D ) in $W_{CD \to D}$.
Observe that the reaction $B \to C$ implies I B ) $\centerdot$ I C ) , S B ) $\centerdot$ S C ), meaning $AB$ and $CD$ can bind by the two contiguous toehold domains I B ) , I C ), and $F_{AB \to B}$ can bind with $F_{CD \to D}$ and $W_{CD \to D}$ by the two contiguous toehold domains in S B ) , S C ). It can be verified by inspection that no other violations of criterion 4 occur. ◀

Theorem 13 (Long cascades are non-realisable due to a particular type of leak reaction only). Consider the set of $N$ reactions $A_1 \to A_2$, $A_2 \to A_3$, $\ldots$, $A_{N-1} \to A_N$, in which all $A_i$ are distinct. This network would be realisable if reactions between ancillary species $A_iA_{i+1}$ and $A_{i+2}A_{i+3}$, and $F_{A_iA_{i+1} \to A_{i+1}}$ and $F_{A_{i+2}A_{i+3} \to A_{i+3}}$ / $W_{A_{i+2}A_{i+3} \to A_{i+3}}$, were absent.

Proof.
It can be directly verified that an arbitrarily long cascade can be constructed at the domain level in which each $A_i$ satisfies Definition 26 and each pair $A_i$, $A_{i+1}$ satisfies Definition 27, satisfying criterion 1 of Definition 6.
If all sequences not constrained to be complementary by these definitions are chosen to be non-complementary, potential violations of criteria 2-4 of Definition 6 arise due to an unavoidable unwanted complementarity between toehold domains in species that are not intended to interact. By explicitly constructing a cascade at the domain level, it can be verified that some toehold domains (or their complements) present in the reaction $A_i \to A_{i+1}$ must also be present in $A_{i+1} \to A_{i+2}$ and $A_{i+2} \to A_{i+3}$, but not in $A_{i+n} \to A_{i+n+1}$ for $n \geq 3$. Intuitively, strands that participate at cascade level $j$ also participate at levels $j - 1$ and $j + 1$, but no further away. It is therefore sufficient to consider a cascade with $N = 4$ to identify all violations of realisability in a cascade. The required result then follows directly from Lemma 10. ◀

Theorem 15 (Self interactions and bidirectional edges are not realisable). Consider a system of reactions $A_0 \to A_1 \to A_2 \ldots A_{N-1} \to A_0$. This network is not realisable if $N \leq 2$.

Proof.
The result for $N = 1$ is a direct consequence of Theorem 14. For $N = 2$, consider the set of reactions $A \to B$, $B \to A$. By Definition 27, $A \to B$ implies I A ) $\centerdot$ I B ) and $IL(A) \centerdot IL(B)$. In addition, $B \to A$ implies I A ) $\centerdot$ I B ). The identity strands of $A$ and $B$ are then fully complementary, violating criterion 3 of Definition 6. ◀

Theorem 17 (Long feedback loops with an even number of units are non-realisable due to a particular type of leak reaction only). Consider the feedback loop $A_1 \to A_2$, $A_2 \to A_3$, $\ldots$, $A_{N-1} \to A_N$, $A_N \to A_1$. For $N$ even, $N \geq 4$, this network would be realisable if reactions between ancillary species $A_iA_{i+1}$ and $A_{i+2}A_{i+3}$, and $F_{A_iA_{i+1} \to A_{i+1}}$ and $F_{A_{i+2}A_{i+3} \to A_{i+3}}$ / $W_{A_{i+2}A_{i+3} \to A_{i+3}}$, were absent. Here, the index $j$ in $A_j$ should be interpreted modularly: $A_j = A_{j-N}$ for $j > N$.

Proof.
It can be directly verified that for $N$ even, $N \geq 4$, a loop can be constructed at the domain level in which each $A_i$ satisfies Definition 26 and each pair $A_i$, $A_{i+1}$ (defined modularly) satisfies Definition 27, satisfying criterion 1 of Definition 6.
To identify the violations of realisability that arise from unwanted interactions, let us first consider a cascade without the $A_N \to A_1$ reaction. The only violations of realisability are those identified in Theorem 13: between $A_iA_{i+1}$ and $A_{i+2}A_{i+3}$, and $F_{A_iA_{i+1} \to A_{i+1}}$ and $F_{A_{i+2}A_{i+3} \to A_{i+3}}$ / $W_{A_{i+2}A_{i+3} \to A_{i+3}}$, without interpreting the index modularly. Now we consider the additional effect of requiring $A_N \to A_1$. The only domains that must be changed are S A N ) and I A N ). These domains and their complements are only present in reactions $A_{N-1} \to A_N$, $A_N \to A_1$, $A_1 \to A_2$, and so it is sufficient to consider only this cascade to identify additional violations of realisability. By Lemma 10, the resultant violations of realisability are exactly those stated in the theorem. ◀

Theorem 18 (The relative lengths of paths are constrained in feedforward loops). Consider the generalised feedforward loop
$A \to B_1$, $B_1 \to B_2$, $\ldots$, $B_{N-1} \to B_N$, $B_N \to D$
$A \to C_1$, $C_1 \to C_2$, $\ldots$, $C_{M-1} \to C_M$, $C_M \to D$
For such a network to be realisable, it is necessary that N ≥ , M ≥ , and $N - M$ is even.

Proof.
The claim about $N - M$ having to be even follows from Theorem 14.
Assume for contradiction that an FFL with $N = 0$ and M ≥ is realisable. Since $A$ activates $C_1$, and both $A$ and $C_M$ activate $D$, it must be that $C_M$ can also perform a branch migration with $C_1$, which is an unwanted reaction violating criterion 2 of Definition 6. ◀

Theorem 23 (Mismatches successfully destabilize unintended complexes). The scheme proposed in Definition 21 satisfies the following:
1. All motifs that are realisable in the mismatch-free ACDC design remain realisable in the mismatch-based scheme.
2. Cascades of arbitrary length $N$ with at most the first and last reactions deactivating are realisable;
3. Feedback loops with $N$ even and $N \geq 6$ in which all reactions are activating are realisable;
4. Feedforward loops with N ≥ , M ≥ , $N - M$ even, in which at most the first and last reactions are deactivating in each branch, are realisable.

Proof.
Consider the first claim. For any network in which it is possible to select domains that satisfy Definition 26 and Definition 27, it is trivial to convert those domains to satisfy Definitions 26 and 28 by introducing the specific bases at the required locations in the major species, and adjusting the fuel and waste to compensate. By Assumption 20, these changes do not introduce new violations of realisability.
Now consider the second claim. By the first claim and Theorem 13, it is sufficient to consider whether the sequestration reactions characterised by Lemma 10 occur between ancillary species in any cascade of $N = 4$ components in the mismatch-based scheme of Definition 21. First, consider the unbound domains in the ancillary species in the system $A \to/\dashv B \to C \to/\dashv D$, with mismatches placed as per Definition 21:
I A ) , I B ) in $AB$
S A ) , S B ) in $F_{AB \to B}$ / $W_{AB \to B}$
S A ) , S B ) in $W_{AB \to B}$ / $F_{AB \to B}$
I B ) , I C ) in $BC$
S B ) , S C ) in $F_{BC \to C}$
S B ) , S C ) in $W_{BC \to C}$
I C ) , I D ) in $CD$
S C ) , S D ) in $F_{CD \to D}$ / $W_{CD \to D}$
S C ) , S D ) in $W_{CD \to D}$ / $F_{CD \to D}$.
By Definition 28, observe that the reaction $B \to C$ implies I B ) $\square$ I C ) , S B ) $\square$ S C ). Moreover, ¬ S B ) $\centerdot$ S C ). By Assumption 20, none of the violations of realisability that would otherwise occur due to binding of $AB$ and $CD$; $F_{AB \to B}$ / $W_{AB \to B}$ and $F_{CD \to D}$ / $W_{CD \to D}$; and $F_{AB \to B}$ / $W_{AB \to B}$ and $W_{CD \to D}$ / $F_{CD \to D}$, characterised by Lemma 10, occur.
We note that if $B \dashv C$ in the above network, Definition 28 implies S B ) $\centerdot$ S C ), meaning that sequestration reactions still occur between ancillary fuel and waste species. Cascades with deactivation reactions as intermediate steps are therefore not realisable in this scheme.
Now consider the third claim. By Theorem 17 and the first claim of this Theorem, it is sufficient to consider only the sequestration reactions listed in Theorem 17.
Further, since the only difference between a feedback loop with exclusively activating interactions and an activating cascade with $N$ species is that $A_N \to A_1$ in a loop, the second claim of this Theorem implies that it is only necessary to consider changes in realisability due to the introduction of the $A_N \to A_1$ reaction.
For $N \geq 6$, it can be verified that imposing I A N ) $\square$ I A ), S A N ) $\square$ $S^{(3)}(A_1)$, as required by $A_N \to A_1$, does not create new realisability violations for a cascade of length $N$ with exclusively activating reactions. The ancillary species of the reactions $A_{N-2} \to A_{N-1}$, $A_{N-1} \to A_N$, $A_N \to A_1$, $A_1 \to A_2$, $A_2 \to A_3$ can only form complexes held together by two contiguous toehold domains with a central mismatch, and thus do not violate realisability by Assumption 20. All other ancillary species are unaffected.
We note that the above argument does not apply to FBLs of length $N = 4$, which remain unrealisable. In that case, adding the reaction $A_N \to A_1$ creates complexes of ancillary species that are held together by two separate sets of contiguous toehold domains, each with a central mismatch, either side of a 4-way junction. In effect, the short periodicity of an $N = 4$ loop means that the unwanted interaction identified in Lemma 10 happens twice for each pair of ancillary species. We do not assume in Assumption 20 that such a structure will dissociate. We also note that feedback loops with any deactivating reactions remain unrealisable, since each reaction $A_i \to A_{i+1}$ is effectively an intermediate reaction between $A_{i-1} \to A_i$ and $A_{i+1} \to A_{i+2}$.
Finally we turn to the fourth claim. By the first claim of this Theorem, and Theorem 18, it is sufficient to consider only the potential unwanted sequestration reactions between ancillary species identified in Theorem 18 for each feed-forward branch. The proof is then identical to that of the second claim of this Theorem.
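The second and third claims of Theorem 23 can be read as simple checks on the ordered list of reaction types in a motif. The following Python sketch encodes only those two claims as stated; it is not the paper's compiler. The function names and the string labels `"activate"` / `"deactivate"` are hypothetical, and each function reports whether the motif is covered by the corresponding claim, not whether it is realisable in general.

```python
from typing import Sequence

ACTIVATE = "activate"
DEACTIVATE = "deactivate"


def _check_kinds(kinds: Sequence[str]) -> None:
    """Reject labels other than the two reaction types used in this sketch."""
    for kind in kinds:
        if kind not in (ACTIVATE, DEACTIVATE):
            raise ValueError(f"unknown reaction type: {kind!r}")


def cascade_covered_by_claim_2(kinds: Sequence[str]) -> bool:
    """Cascade of any length: only the first and last reactions may deactivate."""
    _check_kinds(kinds)
    # Every intermediate reaction (neither first nor last) must be activating.
    return all(kind == ACTIVATE for kind in kinds[1:-1])


def feedback_loop_covered_by_claim_3(kinds: Sequence[str]) -> bool:
    """Feedback loop: N even, N >= 6, and every reaction activating."""
    _check_kinds(kinds)
    n = len(kinds)  # one reaction per unit in the loop
    return n % 2 == 0 and n >= 6 and all(kind == ACTIVATE for kind in kinds)
```

For example, `feedback_loop_covered_by_claim_3([ACTIVATE] * 4)` is false, matching the remark in the proof that loops of length $N = 4$ remain unrealisable. The feedforward claim is omitted here because its bounds on $N$ and $M$ would have to be supplied separately.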