A small PAM optimises target recognition in the CRISPR-Cas immune system
AA small PAM optimises target recognition in theCRISPR-Cas immune system
Melia E. Bonomo , Department of Physics & Astronomy, Rice University, Houston, TX, USA Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
Keywords:
CRISPR-Cas, adaptive immunity, bacteria, phage, modularity, stochastic model
Author for correspondence:
M.E. Bonomo, [email protected]
ABSTRACT
CRISPR-Cas is an adaptive immune mechanism that has been harnessed for a variety ofgenetic engineering applications: the Cas9 protein recognises a 2–5nt DNA motif, known asthe PAM, and a programmable crRNA binds a target DNA sequence that is then cleaved.While off-target activity is undesirable, it occurs because cross-reactivity was beneficial inthe immune system on which the machinery is based. Here, a stochastic model of the targetrecognition reaction was derived to study the specificity of the innate immune mechanismin bacteria. CRISPR systems with Cas9 proteins that recognised PAMs of varying lengthswere tested on self and phage DNA. The model showed that the energy associated withPAM binding impacted mismatch tolerance, cleavage probability, and cleavage time. SmallPAMs allowed the CRISPR to balance catching mutant phages, avoiding self-targeting,and quickly dissociating from critically non-matching sequences. Additionally, the resultsrevealed a lower tolerance to mismatches in the PAM and a PAM-proximal region knownas the seed, as seen in experiments. This work illustrates the role that the Cas9 protein hasin dictating the specificity of DNA cleavage that can aid in preventing off-target activity inbiotechnology applications. 1 a r X i v : . [ phy s i c s . b i o - ph ] J a n . INTRODUCTION Clustered regularly interspaced short palindromic repeats (CRISPR) and the CRISPR-associated proteins (Cas) constitute a genetic adaptive immune system found in bacteriaand archaea [1, 2]. The immunological memory is comprised of alternating DNA repeatsand short sequences known as spacers, which match sequences known as protospacers inthe genomes of mobile genetic threats, such as phages. CRISPR-Cas transcribes its spacersas RNA guides (crRNA) that help Cas proteins to recognise and cleave subsequent geneticinfectors. Owing to its ability to interact with programmable DNA targets, the Cas9 proteinfrom
Streptococcus pyogenes has become a prominent tool for a variety of genetic editingand gene expression applications [3]. As these biotechnologies rapidly advance, there is stillmore to be understood about the specificity and efficiency of DNA recognition and cleavage.The CRISPR target interrogation reaction is modular: the Cas9 searches for and binds toa 2–5nt protospacer-associated DNA motif (PAM), and the crRNA binds to the associatedDNA sequence [4]. During the PAM interaction, the Cas9 domain that locks with thephosphate group of the first DNA base pair causes a distortion in the double-strands thatstarts double-stranded DNA (dsDNA) melting [5]. Melting relies on Cas9:PAM bindingenergy, because Cas9 does not hydrolyse ATP [6]. After the start of local dsDNA melting,crRNA:DNA hybridisation begins, where the roughly 30nt crRNA binds the available targetDNA in a directional, base-by-base sequential manner [6, 7]. A sufficient crRNA:DNAmatch will lead to cleavage of the target DNA. Experiments and stochastic modelling haveshown that the number and position of mismatches between the Cas9:crRNA and targetDNA impact the CRISPR’s specificity, whereby multiple mismatches in the PAM and PAM-proximal region significantly lower cleavage probability [7–9].To date, there has been little work investigating the extent to which the Cas9:crRNAmodularity regulates a successful immune recognition reaction. Here, a stochastic modelwas derived to study how the size of the PAM module impacts the free energy landscape ofthe target interrogation reaction, which was defined as beginning with Cas9:PAM binding,followed by base-by-base dsDNA melting and crRNA:DNA binding, and ending with DNAcleavage. Cleavage probability, cleavage time, and dissociation time (when cleavage doesnot occur) were calculated for simulated phage DNA with varying mutation rates and selfDNA. The model demonstrated that small PAMs, comparable to those found in endogenous2RISPR systems, exhibited hierarchical mismatch tolerance and were sufficiently specificfor determining self versus non-self DNA, while still being sufficiently cross-reactive and fastto protect against mutant phage attacks.
II. METHODSA. The DNA Target Recognition Reaction
Let us consider the reaction through which the CRISPR-Cas binds and cleaves a DNAsequence as unfolding on a landscape of 0 to M discrete states (Figure 1). The barrierheights ∆ E j,k between state j to state k are determined as∆ E i,i − = ∆ E i − ,i − ∆ G i (1)where ∆ G i is the free energy difference between state i and i −
1, defined by∆ G = ∆ G PAM ( n match ) , (2)∆ G >i>M = ∆ G dsDNAseparation + ∆ G BindDNA , (3)∆ G M = ∆ G Cleavage . (4)The ∆ G PAM ( n match ) is a the free energy associated with binding the Cas9:DNA, where n match is the number of matches between the Cas9 and the target DNA sequence’s PAM. The freeenergy of melting each target DNA base pair, ∆ G dsDNAseparation , is based on the energyassociated with melting dsDNA that is matched and negatively supercoiled, which makes itenergetically favourable to separate the strands [7]. The free energy ∆ G BindDNA associatedwith binding each crRNA:DNA base pair after the PAM is ∆ G BindMatch if the base pairmatches and ∆ G BindMismatch if it does not. The ∆ G Cleavage is the free energy associated withbreaking the phosphodiester bonds of both strands of the DNA target. The reverse µ andforward λ rates for each reaction state are then determined using the Arrhenius relation as µ i = A i,i − e − ∆ E i,i − /k B T (5) λ i = A i,i +1 e − ∆ E i,i +1 /k B T , (6)3here A j,k is the attempt rate to cross the barrier from state j to state k , k B is the Boltzmannconstant, and T is temperature. The free energy and kinetic parameters used in the modelare estimated from experiments, as described in the Supplemental Material. CR RNA:DNA
BINDING
BASE - BY - BASE Δ G NOT
BOUND
GGC TA CTAGAA AT GGC TA CTAGAA AT C T A C G A C GG C G AA G A T G C T G CC G C TT C AS BOUND C T A C G A C GG C G A A GG C T A C T A G A A A T C T G CC G G A T C TT G POST - CLEAVAGE GG C T A C T A G AA A T C T A C G A C GG C G AA C T G CC G G A T C TT G DNA
TARGET
FULLY
BOUND
GGC T A C T A G A A A T C T A C G A C GG C G AA G C T G CC G C TT G A T μ i λ i i = 0 i = 1 i = 2... M − 2 i = M − 1 i = M M i Δ E Δ E FIG. 1. Diagram of the reaction landscape for CRISPR-Cas interrogation of a DNA sequence.State 0 represents the Cas9:crRNA unbound to the DNA. In state 1, the Cas9 has bound thePAM, and in states i = 2 through M −
2, there is binding of individual crRNA:DNA base pairs.State M − M is after cleavage occurs.Examples of the free energy ∆ G i − and barrier heights ∆ E i − ,i and ∆ E i,i − for i = 2 are marked. B. Probability and Time to Cleave DNA Target
Utilising the reverse and forward reaction rates, the probability of cleaving a DNA targetfrom the initial Cas9:PAM interaction in state i = 1 is p (1 ,M ) = 11 + M − P i =1 i Q j =1 γ j , (7)and from any state i is p ( i,M ) = p (1 ,M ) (cid:16) i − X j =1 j Y k =1 γ k (cid:17) , (8)4here γ i = µ i /λ i . The time to cleave the DNA target starting from the initial Cas9:PAMstate i = 1 is t (1 ,M ) = M − X i =1 p ( i,M ) λ i + M − X i =1 p ( i,M ) λ i M − X j = i +1 j Y k = i +1 γ k ! , (9)and the time for the Cas9 protein to dissociate from the PAM is t (1 , = " M − X i =1 − p ( i,M ) λ i + M − X i =1 − p ( i,M ) λ i M − X j = i +1 j Y k = i +1 γ k ! p (1 ,M ) − p (1 ,M ) . (10)These equations were derived using the backwards Fokker-Planck equation, as detailed inthe Supplemental Material, and they describe the absorbing probability and time for anystochastic system with state-dependent forward and reverse rates. III. MODELLING RESULTS
Hypothetical Cas9 proteins that had PAM specificities varying in length from 0nt to 33ntwere associated with crRNAs and tested on target DNA sequences of fixed length, consistingof 33nt for the PAM and protospacer together, with assorted numbers and locations ofmismatches. For the Cas9:crRNA interrogation of each DNA sequence, the landscape ofreaction states was generated (Supplemental Figure S1), the probability of cleavage wascalculated with Eq. 7, and a random number was generated to determine whether or notcleavage occurred given this probability. If the target sequence was cleaved, the cleavagetime was calculated with Eq. 9, otherwise the time for dissociation of the Cas9:crRNA fromthe target DNA sequence was calculated with Eq. 10.
A. Random Mismatches
In the first run of the model, mismatches were generated in random sequence locations.Phage PAMs and protospacers were given mutation rates ν = 0 . , . , . , and 0.50,corresponding to approximately 2, 5, 10, and 16 mismatches in each sequence, respectively.Self DNA sequences of the same 33nt-length were generated with arbitrary matches tothe Cas9:crRNA with a 1 in 4 probability (equivalent to ν = 0 . p (1 ,M ) > . ν ≤ .
30 was not greatly affected by PAM size, though in general, more mismatches sloweddown cleavage (Fig. 2B). Importantly, the PAMs that were 5nt or less had p (1 ,M ) < .
05 forcleaving their own sequences, and a very fast dissociation time (on the order of 10 − s to10 − s) from any sequence that was not cleaved (Fig. 2C). A Size of PAM (nt) A ve r a g e P r ob a b ili t y o f C l eava g e Self DNAPhage = 0.50Phage = 0.30Phage = 0.15Phage = 0.05 B Size of PAM (nt) -1 A ve r a g e T i m e t o C l eava g e ( s ) Self DNAPhage = 0.50Phage = 0.30Phage = 0.15Phage = 0.05 C Size of PAM (nt) -10 -5 A ve r a g e T i m e t o D i ss o c i a t e ( s ) Self DNAPhage = 0.50Phage = 0.30Phage = 0.15Phage = 0.05 -10 -5 FIG. 2. (A)
Probability of cleavage, (B) cleavage time, and (C) dissociation time when cleavagedid not occur (inset shows the curves offset for clarity). Each curve was averaged from 10,000iterations of the model, and error bars represent standard error. The average times were calculatedas geometric means when the respective events occurred in at least 5% of the iterations for aparticular PAM size. . Consecutive Mismatches In the second model run, mismatches were generated consecutively across all possiblepositions. Mismatch tolerance was profiled for all numbers of mismatches from 2 to 24,by averaging over the number of times cleavage occurred (called cleavage frequency) whenthere was a mismatch at a particular sequence location. For the 3nt-PAM system, therewas low tolerance (cleavage frequency < .
6) in the PAM and moderate tolerance (cleavagefrequency ≈ .
8) in the first 15nt after the PAM depending on the number of consecutivemismatches (Fig. 3).
P1 P2 P3 1 5 10 15 20 25 30
Location of Mismatch (nt position) C l eava g e F r e qu e n cy FIG. 3. Mismatch tolerance for the 3-nt PAM system illustrated by averaging over cleavage occur-rences when there were mismatches at each sequence location. A cleavage frequency of 1 meansthat a mismatch at that particular location was always tolerated, whereas 0 means it was nevertolerated, when there were the specified number of consecutive mismatches. Each curve was aver-aged from 10,000 iterations of the model, and error bars represent standard error. P i denotes thelocation of PAM nucleotide i , and the x -axis restarts for nucleotides after the PAM. IV. DISCUSSION
An effective CRISPR system needs to be able to swiftly cleave foreign DNA, includingsequences that are not exact matches to the crRNA, while avoiding cleaving self DNA andquickly dissociating to minimise the DNA target search time. This work showed how themodularity of Cas9:PAM and crRNA:DNA binding regulated target recognition from theenergy associated with binding the first module.Larger PAMs led to more cross-reactivity, whereas smaller PAMs led to more specifictargeting. Experiments have shown that Cas9 evolved with broadened PAM compatibilityled to higher DNA targeting specificity [10]. In other words, the engineered Cas9 had a7ess specific PAM interaction, resulting in lower binding energy than the wild type, andengaged in less off-target activity as a result. Furthermore, the heightened sensitivity tomismatches in the PAM and PAM-proximal region has been experimentally observed. Singlemismatches in the PAM increase the binding free energy but can still lead to crRNA:DNAhybridisation [6, 7]. In particular, if there is sufficient crRNA:DNA complementarity in thefirst 8–12 bp, known as the seed region, and up to 8 mismatches in the remainder of thesequence [6].CRISPR systems with smaller PAMs were just as fast as those with larger PAMs atcleaving phage DNA at low mutation rates, and they had fast dissociation times, though withlarge variation.
In vivo experiments with a single Cas9 in
Escherichia coli demonstrated thatinterrogating a sequence next to a PAM site took less than 30 ms on average [4]. However,while some Cas9:crRNA complexes dissociate immediately when the seed does not match,others engage in more of crRNA:DNA hybridisation before dissociating [11]. Additionally,while PAM-distal mismatches did not affect cleavage probability, the model demonstratedthat ≥ ≈ Competing Interests.
The author declares that the research was conducted in theabsence of any commercial or financial relationships that could be construed as a potentialconflict of interest.
Funding.
No funding was received for this study.
Acknowledgements.
The author would like to thank M.W. Deem for valuable guidancein developing the model and A.B. Kolomeisky for helpful discussions. Part of this workappears in the author’s doctoral thesis [17]. [1] Makarova KS, Wolf YI, Iranzo J, Shmakov SA, Alkhnbashi OS, Brouns SJJ, Charpentier E,Cheng D, Haft DH, Horvath P, et al. Evolutionary classification of CRISPR–Cas systems: aburst of class 2 and derived variants.
Nat Rev Microbiol , 18(2):67–83, 2020.[2] Bonomo ME, Deem MW. The physicist’s guide to one of biotechnology’s hottest new topics:CRISPR-Cas.
Phys Biol , 15(4), 2018.[3] Doudna JA, Charpentier E. The new frontier of genome engineering with CRISPR-Cas9.
Science , 346(6213), 2014.[4] Jones DL, Leroy P, Unoson C, Fange D, ´Curi´c V, Lawson MJ, Elf J. Kinetics of dCas9 targetsearch in
Escherichia coli.
Science , 357(6358):1420–1424, 2017.
5] Anders C, Niewoehner O, Duerst A, Jinek M. Structural basis of PAM-dependent target DNArecognition by the Cas9 endonuclease.
Nature , 513(7519):569–573, 2014.[6] Sternberg SH, Redding S, Jinek M, Greene EC, Doudna JA. DNA interrogation by theCRISPR RNA-guided endonuclease Cas9.
Nature , 507(7490):62, 2014.[7] Farasat I, Salis HM. A biophysical model of CRISPR/Cas9 activity for rational design ofgenome editing and gene regulation.
PLoS Comp Biol , 12(1):e1004724, 2016.[8] Jiang W, Bikard D, Cox D, Zhang F, Marraffini LA. RNA-guided editing of bacterial genomesusing CRISPR-Cas systems.
Nat Biotechnol , 31(3):233–239, 2013.[9] Klein M, Eslami-Mossallam B, Arroyo DG, Depken M. Hybridization Kinetics ExplainsCRISPR-Cas Off-Targeting Rules.
Cell Rep , 22(6):1413–1423, 2018.[10] Hu JH, Miller SM, Geurts MH, Tang W, Chen L, Sun N, Zeina CM, Gao X, Rees HA, Lin Z,et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity.
Nature ,556(7699):57–63, 2018.[11] Szczelkun MD, Tikhomirova MS, Sinkunas T, Gasiunas G, Karvelis T, Pschera P, Siksnys V,Seidel R. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascadeeffector complexes.
Proc Natl Acad Sci USA , 111(27):9798–9803, 2014.[12] Bradde S, Nourmohammad A, Goyal S, Balasubramanian V. The size of the immune repertoireof bacteria.
Proc Natl Acad Sci USA , 117(10):5144–5151, 2020.[13] Shvets AA, Kolomeisky AB. Mechanism of genome interrogation: How CRISPR RNA-guidedCas9 proteins locate specific targets on DNA.
Biophys J , 113(7):1416–1424, 2017.[14] Nishimasu H, Cong L, Yan WX, Ran FA, Zetsche B, Li Y, Kurabayashi A, Ishitani R, ZhangF, Nureki O. Crystal structure of
Staphylococcus aureus
Cas9.
Cell , 162(5):1113–1126, 2015.[15] Jiang F, Doudna JA. The structural biology of CRISPR-Cas systems.
Curr Opin Struct Biol ,30:100–111, 2015.[16] Fineran PC, Charpentier E. Memory of viral infections by CRISPR-Cas adaptive immunesystems: acquisition of new information.
Virology , 434(2):202–209, 2012.[17] Bonomo ME.
Investigating Modular Structure and Function in Biology: from Immunology toCognition . PhD thesis, Rice University, 2020. upplemental Material A small PAM optimises target recognition in theCRISPR-Cas immune system
Melia E. Bonomo
Contents
I. Model Derivation pg. 1I A. Backwards Fokker-Planck Equation pg. 1I B. Probability of CRISPR Cleavage pg. 3I C. Time to CRISPR Cleavage or Dissociation pg. 4II. Model Parameters pg. 6III. Landscape Results pg. 8IV. Consecutive Mismatches – Extended Results pg. 9
I. MODEL DERIVATIONA. Backwards Fokker-Planck Equation
We can imagine the CRISPR recognition reaction occurring on a one-dimensional land-scape with forward rates λ and backward rates µ to move between discrete states. State 0represents the Cas9:crRNA unbound to target DNA, state 1 represents the bound state ofthe Cas9 to the PAM of the target DNA, states i for i = 2 ...M − M − M represents the post-cleavage state.Given the initial Cas9:PAM interaction at time t = 0, we want to determine the prob-ability p and time t to go from state 1 to cleavage in state M . We can start with theone-dimensional Fokker-Planck equation for a stochastic process in operator form,˙ P i ( t ) = LP i ( t ) , (S1)where P i ( t ) is the probability of reaching M from state i at time t and L is the linear operator L i,j = − µ i δ i,j − λ i δ i,j + µ j δ j,i +1 + λ j δ j,i − . a r X i v : . [ phy s i c s . b i o - ph ] J a n he probability of the CRISPR reaction being in state M therefore changes over timeaccording to ˙ P i ( t ) = − µ i P i − λ i P i + µ i +1 P i +1 + λ i − P i − . Note that there is no drift term since the λ i and µ i rates are different for each state i , andthere is no diffusion term since we have discrete states.Given an initial P i (0), we want to know the probability of being in state M at a latertime t , so we would integrate Eq. S1 forward in time. However, it will be simpler to considera final P M ( T ) at t = T and determine the probability of ending up in the target state fromstate i at t = 0 by integrating backwards in time. The backwards Fokker-Planck equation is − ˙ P i ( t ) = L | P i ( t ) , (S2)where L | is the transpose of L , L i,j = − µ i δ i,j − λ i δ i,j + µ i δ j +1 ,i + λ i δ j − ,i , which leads to − ˙ P i ( t ) = − µ i P i − λ i P i + µ i P i − + λ i P i +1 . The boundary conditions of Eq. S2 are P i ( ∞ ) = δ i,M P i (0) = probability to reach M from i by t i < ∞ , however it will be easier to integrate forward in time by shifting the boundaries to P i (0) = δ i,M P i ( − t ) = probability to reach M from i by 0 − ( − t ) = t i . which changes the sign of t , ˙ P i ( t ) = L | P i ( t ) . (S3)We can now integrate Eq. S3 forward in time to obtain the First Passage Time, T i , to reachstate M from state i , T i = Z ∞ t ∂P i ( t ) ∂t dt = tP i ( t ) (cid:12)(cid:12)(cid:12) ∞ − Z ∞ P i ( t ) dt, however, this is undefined with P i (0) = 0 P i ( ∞ ) = P eq i , where P eq i is the equilibrium probability distribution for state i .We can get the integral to converge by redefining the quantity that we want to calculate asthe survival probability, S i ( t ), or the probability that the CRISPR reaction has not reachedstate M at time t , S i ( t ) = P eq i − P i ( t ) (S4)2 i (0) = P eq i S i ( ∞ ) = 0 . The First Passage Time is now calculated as T i = − Z ∞ t ∂S i ( t ) ∂t dt = − tS i ( t ) (cid:12)(cid:12)(cid:12) ∞ + Z ∞ S i ( t ) dt = Z ∞ S i ( t ) dt. (S5)We can substitute Eq. S4 and its derivative into Eq. S3 to obtain − ˙ S i ( t ) = L | ( P eq i − S i ( t ))= − µ i P eq i − λ i P eq i + µ i P eq i − + λ i P eq i +1 + µ i S i + λ i S i − µ i S i − − λ i S i +1 . Since the equilibrium probability distribution of reaching state M from state i is P eq i = µ i P eq i − + (1 − µ i − λ i ) P eq i + λ i P eq i +1 µ i P eq i − − µ i P eq i − λ i P eq i + λ i P eq i +1 , (S6)we can determine that ˙ S i ( t ) = − µ i S i − λ i S i + µ i S i − + λ i S i +1 ˙ S i ( t ) = L | S i ( t ) . Now Eq. S5 can be solved as T i = Z ∞ S i ( t ) dt = Z ∞ ˙ S i ( t ) L | dt = S i ( t ) L | (cid:12)(cid:12)(cid:12) ∞ S i ( ∞ ) − S i (0) = L | T i − S i (0) = − µ i T i − λ i T i + µ i T i − + λ i T i +1 P eq i = µ i T i + λ i T i − µ i T i − − λ i T i +1 . (S7)The T i in Eq. S7 can be divided by the conditional probability P eq i for reaching state M ( p i ), state 0 (1 − p i ), or either state (1 − p i + p i = 1) to obtain the t i that we want. B. Probability of CRISPR Cleavage
Given Eq. S6 and the fact that states 0 and M are considered absorbing states, theprobability p i of reaching state M from i is p = 0 p M = 10 = µ i p i − − µ i p i − λ i p i + λ i p i +1 . (S8)As Eq. S8 is a recursive equation, we can extract a concise equation for our desired proba-bility. For i = 1,...,M let z i = p i − p i − , (S9)3nd it follows that M X i =1 z i = p − p + p − p + ... + p M − p M − = p M − p = 1 . Using Eq. S8 and simplifying algebraically, we find that z i +1 = γ i z i where γ i = µ i /λ i . Withthis relationship and Eq. S9, we obtain z = p z = γ z = γ p z = γ z = γ γ p , and so on. The sum of all z values is then1 = M X i =1 z i = p + γ p + γ γ p + ... p (1 + γ + γ γ + ... ) p = 11 + γ + γ γ + ...p = 11 + M − P i =1 i Q j =1 γ j , (S10)where Eq. S10 is the probability that, given PAM binding in state i = 1, all crRNA nu-cleotides will be bound to those of the target DNA, and target DNA will be cleaved in state M . The equation for this probability from any state i is then p i = p (cid:16) i − X j =1 j Y k =1 γ k (cid:17) . (S11) C. Time to CRISPR Cleavage or Dissociation
Given Eq. S7 and the fact that states 0 and M are considered absorbing states, the time T i to reach state M from i is T = ∞ T M = 0 p i = µ i T i + λ i T i − µ i T i − − λ i T i +1 (S12) T i = p i µ i + λ i + λ i µ i + λ i T i +1 + µ i µ i + λ i T i − (S13)4s Eq. S13 is a recursive equation, we can extract a concise equation for our desired time.For i = 1,...,M-1 let y i = T i − T i +1 , (S14)and it follows that M − X i =1 y i = T − T + T − T + ... + T M − − T M = T − T M = T . Using Eq. S13 and simplifying algebraically, we find that y i = p i λ i + γ i y i − . With thisrelationship and Eq. S14, we obtain y = p λ − γ T y = p λ + γ y = p λ + γ p λ − γ γ T y = p λ + γ y = p λ + γ p λ + γ γ p λ − γ γ γ T , and so on. The sum of all y values is then T = M − X i =1 y i = p λ − γ T + p λ + γ p λ − γ γ T + p λ + γ p λ + γ γ p λ − γ γ γ T + ...T = ( p λ + p λ + p λ + ... ) + [ p λ ( γ + γ γ + ... ) + p λ ( γ + γ γ + ... ) + ... ]1 + γ + γ γ + γ γ γ ...T = M − P i =1 p i λ i + M − P i =1 p i λ i M − P j = i +1 j Q k = i +1 γ k ! M − P i =1 i Q j =1 γ j . (S15)Dividing Eq. S15 by the probability p of reaching state M from state 1, yields t (1 ,M ) = M − X i =1 p i λ i + M − X i =1 p i λ i M − X j = i +1 j Y k = i +1 γ k ! , (S16)which is the time from the PAM binding in state i = 1 to target DNA cleavage in state M .Conversely, if we consider 1 − p i , which is the probability of reaching state 0 from state i in Eq. S12, and we divide the resulting equivalent of Eq. S15 by 1 − p , the probability ofreaching state 0 from state 1, we obtain t (1 , = " M − X i =1 − p i λ i + M − X i =1 − p i λ i M − X j = i +1 j Y k = i +1 γ k ! p − p , (S17)which is the time from the PAM binding in state i = 1 to dissociation from the PAM instate i = 0. 5 I. MODEL PARAMETERS
The free energy and reaction parameters utilised in the model are summarised in Table S1.
TABLE S1. Energy and kinetic parameters used in the CRISPR model and the experimental sourcefrom which they were obtained or estimated. The dsDNA and RNA:DNA melting and bindingenergies are simplified to be the same for A:T and C:G base pairs (bp). Note that the forward andreverse attempt rates A for interrogating the dsDNA protospacer ( λ i for i = 1 ...M − µ i for i = 2 ...M −
1) were assumed to be equivalent. k B T = 0 .
62 kcal/mol is used.
Free Energy Description Value Source ∆ G MeltMatch melting 1 dsDNA bp 3 kcal/mol [1]∆ G Supercoiling topological constraint per bp -0.8 kcal/mol [2–5]∆ G BindMatch binding 1 RNA:DNA bp − ∆ G MeltMatch [6]∆ G BindMismatch binding 1 mismatched RNA:DNA bp − / G MeltMatch [6, 7]∆ G Cleavage break DNA phosphodiester bonds 6 kcal/mol per strand [8]
Rate Attempt Rate Barrier Height Source
PAM dissociation ( µ ) A , = 6x10 ∆ E , = 3.4 kcal/mol [9–11]dsDNA separation initiation ( λ ) A , = A , ∆ E , = 10.2 kcal/mol [2]dsDNA melting ( λ i , for i = 2 ...M − A i,i +1 = A i,i − = 1x10 ∆ E i,i +1 = 0.9 kcal/mol [12, 13]dsDNA cleavage ( λ M − ) A M − ,M = 1x10 ∆ E M − ,M = 12 kcal/mol [14] There is a reduced free energy for Cas9 to separate a double-stranded DNA (dsDNA)target that is associated with supercoiling [1, 3, 15],∆ G dsDNAseparation = ∆ G MeltMatch + ∆ G Supercoiling , (S18)where ∆ G MeltMatch is the free energy associated with melting individual base pairs of re-laxed dsDNA. It is energetically favourable to separate DNA strands that are topologicallyconstrained due to negative supercoiling [1]. For easy strand separation and compaction,cellular DNA is generally kept 5% to 7% under-wound, resulting in negative superhelicaltwists [16]. In fact, positive supercoiling helps to protect extreme thermophiles from spon-taneous thermal denaturation [17]. During lysogenic infection, the phage injects its DNAinto the cell, the DNA joins its ends and circularises, and the circular DNA then becomesnegatively supercoiled by means of the host’s machinery in order to more easily integrateitself into the bacterial genome [18]. Though in vivo supercoiling is a dynamic quantity [3],we estimate an average free energy of supercoiling for n DNA base pairs to be∆ G Supercoiling = n qRTh σ, (S19)where R is the gas constant and T = 37 ◦ C is the temperature inside the cell [2, 3]. Theaverage superhelical density σ is taken to be -0.06 [4], the parameter q is an experimentallydetermined coefficient ≈ h is 10.4 bp per turn [5].In general for protein:DNA interaction energetics, the change in free energy related tobinding has specific E ( −→ b ) and nonspecific E ns components,∆ G binding ( N ) = E ( −→ b ) + E ns , (S20)6here −→ b is a DNA sequence b i , ...b i + N − of length N [19]. The nonspecific binding energydoes not depend on the actual nucleotide sequence, but rather accounts for the interactionbetween the protein and the DNA’s phosphate backbone. The specific binding energy islinearly related to the individual contribution of the protein interacting with each nucleotide,in which the energy change with respect to binding the matching target is E ( −→ b ) − E ( −→ b ) Match = N X i =1 (cid:15) ( i, b i ) , (S21)where (cid:15) ( i, b i ) is the energy penalty of mismatching base b i in position i [19, 20]. The valueof (cid:15) ( i, b i ) is obtained according to the protein’s position weight matrix, which can be definedin several ways [21]. This matrix approximates protein specificity by assigning a score toeach base at each position, relative to the matching target sequence.At present, a position weight matrix has not been experimentally determined for Cas9,so we proceed as follows. For PAM binding ∆ G PAM , the scores are defined as position- andbase-independent match rewards that each contribute to the binding by an amount of energy (cid:15) such that ∆ G PAM ( n match ) = (cid:15) ∗ n match + E ns , (S22)where n match is the number of matching nucleotides between the Cas9’s DNA-interactingdomain and the DNA target. Dissociation rates for Cas9 from matching and non-matchingNGG PAMs were obtained from published CRISPR experiments [22, 23]. A generic barrierheight and attempt rate for protein:DNA dissociation were estimated from [9] and transitionstate theory for protein unfolding [10, 11]. The Arrhenius relation was then used to calculate∆ G PAM for binding matching and non-matching 3nt-PAMs, and these values were used toobtain (cid:15) and E ns from a linear fit of Eq. S22,∆ G PAM ( n match ) = − .
63 kcal/mol ∗ n match − . . (S23)Given experimental observations of negligible Cas9 interaction with the DNA sequence inthe absence of a PAM [22, 24], ∆ G PAM (0) was set to 0. It is important to note thatthis fitting method and use of Eq. S23 to define the free energy change greatly simplifyCas9:PAM binding. Endogenous Cas9 PAMs, such as the NGG of
Streptococcus pyogenes or the NGRRT of
Staphylococcus aureus , have complex position- and base-dependent energypenalties for mismatches that impact binding [1, 24, 25]. However, the objective of thismodel is to investigate how the probability of cleaving a DNA sequence, and the timefor cleaving or dissociating from that sequence, depends on the amount of initial energyassociated with the first module (i.e., the bound Cas9:PAM). Eq. S23 allows us to probehow the probability and times are impacted when that first module’s energy is increasedand decreased in approximate units of mismatches.7
II. LANDSCAPE RESULTS
The reaction landscapes for the CRISPR interrogation of target DNA were defined bythe parameters in Table S1, where ∆ G i was the depth of each state i ,∆ G = ∆ G PAM ( n match ) , (S24)∆ G >i>M = ∆ G dsDNAseparation + ∆ G BindDNA , (S25)∆ G M = ∆ G Cleavage , (S26)the forward barriers were ∆ E i − ,i , and the reverse barriers were calculated as∆ E i,i − = ∆ E i − ,i − ∆ G i , (S27)as stated in the main text. For a target DNA sequence that was a perfect match, thelandscapes all trended downwards (Figure S1A). Mismatches caused the landscapes to trendupwards, and the more mismatches there were, the more energetically unfavourable it wasfor the CRISPR to cleave the DNA sequence (Figure S1B). It has been suggested that near-perfect complementarity between the crRNA and target DNA in the PAM-proximal regionlowers the energy needed to continue binding the rest of the target, such that it is less thanthat of the reverse unzipping reaction [24]. A CRISPR-Cas Reaction States -60-40-20020 F r ee E ne r g y ( kc a l / m o l ) B CRISPR-Cas Reaction States -40-20020 F r ee E ne r g y ( kc a l / m o l ) FIG. S1. (A)
Reaction landscapes calculated for nine representative CRISPR systems that hadPAMs of different lengths. In all cases, the target DNA sequence’s PAM and protospacer were per-fect matches to the Cas9:crRNA. (B)
Reaction landscapes calculated for the 3nt-PAM CRISPRwhen faced with a target DNA sequence that had no (blue), 5 (tan), and 20 (brown) mismatches.The dotted lines designate where the mismatches were located (coloured to match their corre-sponding landscape). V. CONSECUTIVE MISMATCHES – EXTENDED RESULTS A P1 P2 P3 1 5 10 15 20 25 30
Location of Mismatch (nt position) -2 A ve r a g e T i m e t o C l eava g e ( s ) PAM1 5 10 15 20 25 300200040006000800010000 C u m u l a t i ve M i s m a t c h es B P1 P2 P3 1 5 10 15 20 25 30
Location of Mismatch (nt position) -15 -10 -5 A ve r a g e T i m e t o D i ss o c i a t e ( s ) PAM1 5 10 15 20 25 300200040006000800010000 C u m u l a t i ve M i s m a t c h es FIG. S2. Mismatch tolerance for the 3-nt PAM system illustrated by averaging over (A) cleavagetime or (B) dissociation time when there were mismatches at each sequence location. Each curve isa geometric mean from 10,000 iterations of testing the specified number of consecutive mismatches(from 2 to 24 mismatches) along all possible positions in the sequence, and error bars representstandard error. The insets for each figure show the cumulative number of mismatches at eachsequence location when (A) cleavage occurred and (B) dissociation occurred. (In other words, ifeach curve of the insets was divided by the total number of cumulative mismatches tested at eachsequence location, the inset in (A) would show the cleavage frequency plotted in Fig. 3 of the maintext, and the inset in (B) would show the dissociation frequency.) P i denotes the location of PAMnucleotide i , and the x -axis restarts at 1 for nucleotides after the PAM.[1] Farasat I, Salis HM. A biophysical model of CRISPR/Cas9 activity for rational design ofgenome editing and gene regulation. PLoS Comp Biol , 12(1):e1004724, 2016.[2] Bauer WR, Benham CJ. The free energy, enthalpy and entropy of native and of partiallydenatured closed circular DNA.
J Mol Biol , 234(4):1184–1196, 1993.[3] Westra ER, van Erp PB, K¨unne T, Wong SP, Staals RH, Seegers CL, Bollen S, Jore MM,Semenova E, Severinov K, et al. CRISPR immunity relies on the consecutive binding anddegradation of negatively supercoiled invader DNA by Cascade and Cas3.
Mol Cell , 46(5):595–605, 2012.
4] Wang H, Benham CJ. Superhelical destabilization in regulatory regions of stress responsegenes.
PLoS Comp Biol , 4(1):e17, 2008.[5] Wang JC. Helical repeat of DNA in solution.
Proc Natl Acad Sci USA , 76(1):200–203, 1979.[6] Rauzan B, McMichael E, Cave R, Sevcik LR, Ostrosky K, Whitman E, Stegemann R, SinclairAL, Serra MJ, Deckert AA. Kinetics and thermodynamics of DNA, RNA, and hybrid duplexformation.
Biochemistry , 52(5):765–772, 2013.[7] Leonard GA, Booth ED, Brown T. Structural and thermodynamic studies on the adenine.guanine mismatch in B-DNA.
Nucleic Acids Res , 18(19):5617–5623, 1990.[8] Dickson KS, Burns CM, Richardson JP. Determination of the free-energy change for repairof a DNA phosphodiester bond.
J Biol Chem , 275(21):15828–15831, 2000.[9] Ecevit O, Khan MA, Goss DJ. Kinetic analysis of the interaction of b/HLH/Z transcriptionfactors Myc, Max, and Mad with cognate DNA.
Biochemistry , 49(12):2627–2635, 2010.[10] Garcia-Viloca M, Gao J, Karplus M, Truhlar DG. How enzymes work: analysis by modernrate theory and computer simulations.
Science , 303(5655):186–195, 2004.[11] Pollak E, Talkner P. Reaction rate theory: What it was, where is it today, and where is itgoing?
Chaos , 15(2):026116, 2005.[12] Chen X, Zhou Y, Qu P, Zhao XS. Base-by-base dynamics in DNA hybridization probed byfluorescence correlation spectroscopy.
J Am Chem Soc , 130(50):16947–16952, 2008.[13] Sanstead PJ, Tokmakoff A. Direct observation of activated kinetics and downhill dynamics inDNA dehybridization.
J Phys Chem B , 122(12):3088–3100, 2018.[14] Gong S, Yu HH, Johnson KA, Taylor DW. DNA unwinding is the primary determinant ofCRISPR-Cas9 activity.
Cell Rep , 22(2):359–371, 2018.[15] Szczelkun MD, Tikhomirova MS, Sinkunas T, Gasiunas G, Karvelis T, Pschera P, Siksnys V,Seidel R. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascadeeffector complexes.
Proc Natl Acad Sci USA , 111(27):9798–9803, 2014.[16] Nelson D, Cox M.
Bioenergetics and metabolism in Lehninger Principles of Biochemistry .New York: Worth Publishers, 3rd edition, 2000.[17] Witz G, Dietler G, Stasiak A. Effect of DNA Supercoiling on DNA Decatenation and Un-knotting Followed By Brownian Dynamics Simulations.
Biophys J , 98(3):62a, 2010.[18] Trun N, Trempy J.
Fundamental bacterial genetics . John Wiley & Sons, 2009.[19] Slutsky M, Mirny LA. Kinetics of protein-dna interaction: facilitated target location insequence-dependent potential.
Biophys J , 87(6):4021–4035, 2004.[20] Phillips R, Kondev J, Theriot J, Garcia H.
Physical biology of the cell . Garland Science, 2012.[21] Stormo GD. Modeling the specificity of protein-DNA interactions.
Quant Biol , 1(2):115–130,2013.[22] Jones DL, Leroy P, Unoson C, Fange D, ´Curi´c V, Lawson MJ, Elf J. Kinetics of dCas9 targetsearch in
Escherichia coli.
Science , 357(6358):1420–1424, 2017.[23] Singh D, Sternberg SH, Fei J, Doudna JA, Ha T. Real-time observation of DNA recognitionand rejection by the RNA-guided endonuclease Cas9.
Nat Commun , 7(1):1–8, 2016.[24] Sternberg SH, Redding S, Jinek M, Greene EC, Doudna JA. DNA interrogation by theCRISPR RNA-guided endonuclease Cas9.
Nature , 507(7490):62, 2014.[25] Jiang W, Bikard D, Cox D, Zhang F, Marraffini LA. RNA-guided editing of bacterial genomesusing CRISPR-Cas systems.
Nat Biotechnol , 31(3):233–239, 2013., 31(3):233–239, 2013.