[PDF] A small PAM optimises target recognition in the CRISPR-Cas immune system

Abstract

CRISPR-Cas is an adaptive immune mechanism that has been harnessed for a variety of genetic engineering applications: the Cas9 protein recognises a 2-5nt DNA motif, known as the PAM, and a programmable crRNA binds a target DNA sequence that is then cleaved. While off-target activity is undesirable, it occurs because cross-reactivity was beneficial in the immune system on which the machinery is based. Here, a stochastic model of the target recognition reaction was derived to study the specificity of the innate immune mechanism in bacteria. CRISPR systems with Cas9 proteins that recognised PAMs of varying lengths were tested on self and phage DNA. The model showed that the energy associated with PAM binding impacted mismatch tolerance, cleavage probability, and cleavage time. Small PAMs allowed the CRISPR to balance catching mutant phages, avoiding self-targeting, and quickly dissociating from critically non-matching sequences. Additionally, the results revealed a lower tolerance to mismatches in the PAM and a PAM-proximal region known as the seed, as seen in experiments. This work illustrates the role that the Cas9 protein has in dictating the specificity of DNA cleavage that can aid in preventing off-target activity in biotechnology applications.

Full PDF

AA small PAM optimises target recognition in theCRISPR-Cas immune system

Melia E. Bonomo , Department of Physics & Astronomy, Rice University, Houston, TX, USA Center for Theoretical Biological Physics, Rice University, Houston, TX, USA

Keywords:

CRISPR-Cas, adaptive immunity, bacteria, phage, modularity, stochastic model

Author for correspondence:

M.E. Bonomo, [email protected]

ABSTRACT

CRISPR-Cas is an adaptive immune mechanism that has been harnessed for a variety ofgenetic engineering applications: the Cas9 protein recognises a 2–5nt DNA motif, known asthe PAM, and a programmable crRNA binds a target DNA sequence that is then cleaved.While oﬀ-target activity is undesirable, it occurs because cross-reactivity was beneﬁcial inthe immune system on which the machinery is based. Here, a stochastic model of the targetrecognition reaction was derived to study the speciﬁcity of the innate immune mechanismin bacteria. CRISPR systems with Cas9 proteins that recognised PAMs of varying lengthswere tested on self and phage DNA. The model showed that the energy associated withPAM binding impacted mismatch tolerance, cleavage probability, and cleavage time. SmallPAMs allowed the CRISPR to balance catching mutant phages, avoiding self-targeting,and quickly dissociating from critically non-matching sequences. Additionally, the resultsrevealed a lower tolerance to mismatches in the PAM and a PAM-proximal region knownas the seed, as seen in experiments. This work illustrates the role that the Cas9 protein hasin dictating the speciﬁcity of DNA cleavage that can aid in preventing oﬀ-target activity inbiotechnology applications. 1 a r X i v : . [ phy s i c s . b i o - ph ] J a n . INTRODUCTION Clustered regularly interspaced short palindromic repeats (CRISPR) and the CRISPR-associated proteins (Cas) constitute a genetic adaptive immune system found in bacteriaand archaea [1, 2]. The immunological memory is comprised of alternating DNA repeatsand short sequences known as spacers, which match sequences known as protospacers inthe genomes of mobile genetic threats, such as phages. CRISPR-Cas transcribes its spacersas RNA guides (crRNA) that help Cas proteins to recognise and cleave subsequent geneticinfectors. Owing to its ability to interact with programmable DNA targets, the Cas9 proteinfrom

Streptococcus pyogenes has become a prominent tool for a variety of genetic editingand gene expression applications [3]. As these biotechnologies rapidly advance, there is stillmore to be understood about the speciﬁcity and eﬃciency of DNA recognition and cleavage.The CRISPR target interrogation reaction is modular: the Cas9 searches for and binds toa 2–5nt protospacer-associated DNA motif (PAM), and the crRNA binds to the associatedDNA sequence [4]. During the PAM interaction, the Cas9 domain that locks with thephosphate group of the ﬁrst DNA base pair causes a distortion in the double-strands thatstarts double-stranded DNA (dsDNA) melting [5]. Melting relies on Cas9:PAM bindingenergy, because Cas9 does not hydrolyse ATP [6]. After the start of local dsDNA melting,crRNA:DNA hybridisation begins, where the roughly 30nt crRNA binds the available targetDNA in a directional, base-by-base sequential manner [6, 7]. A suﬃcient crRNA:DNAmatch will lead to cleavage of the target DNA. Experiments and stochastic modelling haveshown that the number and position of mismatches between the Cas9:crRNA and targetDNA impact the CRISPR’s speciﬁcity, whereby multiple mismatches in the PAM and PAM-proximal region signiﬁcantly lower cleavage probability [7–9].To date, there has been little work investigating the extent to which the Cas9:crRNAmodularity regulates a successful immune recognition reaction. Here, a stochastic modelwas derived to study how the size of the PAM module impacts the free energy landscape ofthe target interrogation reaction, which was deﬁned as beginning with Cas9:PAM binding,followed by base-by-base dsDNA melting and crRNA:DNA binding, and ending with DNAcleavage. Cleavage probability, cleavage time, and dissociation time (when cleavage doesnot occur) were calculated for simulated phage DNA with varying mutation rates and selfDNA. The model demonstrated that small PAMs, comparable to those found in endogenous2RISPR systems, exhibited hierarchical mismatch tolerance and were suﬃciently speciﬁcfor determining self versus non-self DNA, while still being suﬃciently cross-reactive and fastto protect against mutant phage attacks.

II. METHODSA. The DNA Target Recognition Reaction

Let us consider the reaction through which the CRISPR-Cas binds and cleaves a DNAsequence as unfolding on a landscape of 0 to M discrete states (Figure 1). The barrierheights ∆ E j,k between state j to state k are determined as∆ E i,i − = ∆ E i − ,i − ∆ G i (1)where ∆ G i is the free energy diﬀerence between state i and i −

1, deﬁned by∆ G = ∆ G PAM ( n match ) , (2)∆ G >i>M = ∆ G dsDNAseparation + ∆ G BindDNA , (3)∆ G M = ∆ G Cleavage . (4)The ∆ G PAM ( n match ) is a the free energy associated with binding the Cas9:DNA, where n match is the number of matches between the Cas9 and the target DNA sequence’s PAM. The freeenergy of melting each target DNA base pair, ∆ G dsDNAseparation , is based on the energyassociated with melting dsDNA that is matched and negatively supercoiled, which makes itenergetically favourable to separate the strands [7]. The free energy ∆ G BindDNA associatedwith binding each crRNA:DNA base pair after the PAM is ∆ G BindMatch if the base pairmatches and ∆ G BindMismatch if it does not. The ∆ G Cleavage is the free energy associated withbreaking the phosphodiester bonds of both strands of the DNA target. The reverse µ andforward λ rates for each reaction state are then determined using the Arrhenius relation as µ i = A i,i − e − ∆ E i,i − /k B T (5) λ i = A i,i +1 e − ∆ E i,i +1 /k B T , (6)3here A j,k is the attempt rate to cross the barrier from state j to state k , k B is the Boltzmannconstant, and T is temperature. The free energy and kinetic parameters used in the modelare estimated from experiments, as described in the Supplemental Material. CR RNA:DNA

BINDING

BASE - BY - BASE Δ G NOT

BOUND

GGC TA CTAGAA AT GGC TA CTAGAA AT C T A C G A C GG C G AA G A T G C T G CC G C TT C AS BOUND C T A C G A C GG C G A A GG C T A C T A G A A A T C T G CC G G A T C TT G POST - CLEAVAGE GG C T A C T A G AA A T C T A C G A C GG C G AA C T G CC G G A T C TT G DNA

TARGET

FULLY

BOUND

GGC T A C T A G A A A T C T A C G A C GG C G AA G C T G CC G C TT G A T μ i λ i i = 0 i = 1 i = 2... M − 2 i = M − 1 i = M M i Δ E Δ E FIG. 1. Diagram of the reaction landscape for CRISPR-Cas interrogation of a DNA sequence.State 0 represents the Cas9:crRNA unbound to the DNA. In state 1, the Cas9 has bound thePAM, and in states i = 2 through M −

2, there is binding of individual crRNA:DNA base pairs.State M − M is after cleavage occurs.Examples of the free energy ∆ G i − and barrier heights ∆ E i − ,i and ∆ E i,i − for i = 2 are marked. B. Probability and Time to Cleave DNA Target

Utilising the reverse and forward reaction rates, the probability of cleaving a DNA targetfrom the initial Cas9:PAM interaction in state i = 1 is p (1 ,M ) = 11 + M − P i =1 i Q j =1 γ j , (7)and from any state i is p ( i,M ) = p (1 ,M ) (cid:16) i − X j =1 j Y k =1 γ k (cid:17) , (8)4here γ i = µ i /λ i . The time to cleave the DNA target starting from the initial Cas9:PAMstate i = 1 is t (1 ,M ) = M − X i =1 p ( i,M ) λ i + M − X i =1 p ( i,M ) λ i M − X j = i +1 j Y k = i +1 γ k ! , (9)and the time for the Cas9 protein to dissociate from the PAM is t (1 , = " M − X i =1 − p ( i,M ) λ i + M − X i =1 − p ( i,M ) λ i M − X j = i +1 j Y k = i +1 γ k ! p (1 ,M ) − p (1 ,M ) . (10)These equations were derived using the backwards Fokker-Planck equation, as detailed inthe Supplemental Material, and they describe the absorbing probability and time for anystochastic system with state-dependent forward and reverse rates. III. MODELLING RESULTS

Hypothetical Cas9 proteins that had PAM speciﬁcities varying in length from 0nt to 33ntwere associated with crRNAs and tested on target DNA sequences of ﬁxed length, consistingof 33nt for the PAM and protospacer together, with assorted numbers and locations ofmismatches. For the Cas9:crRNA interrogation of each DNA sequence, the landscape ofreaction states was generated (Supplemental Figure S1), the probability of cleavage wascalculated with Eq. 7, and a random number was generated to determine whether or notcleavage occurred given this probability. If the target sequence was cleaved, the cleavagetime was calculated with Eq. 9, otherwise the time for dissociation of the Cas9:crRNA fromthe target DNA sequence was calculated with Eq. 10.

A. Random Mismatches

In the ﬁrst run of the model, mismatches were generated in random sequence locations.Phage PAMs and protospacers were given mutation rates ν = 0 . , . , . , and 0.50,corresponding to approximately 2, 5, 10, and 16 mismatches in each sequence, respectively.Self DNA sequences of the same 33nt-length were generated with arbitrary matches tothe Cas9:crRNA with a 1 in 4 probability (equivalent to ν = 0 . p (1 ,M ) > . ν ≤ .

30 was not greatly aﬀected by PAM size, though in general, more mismatches sloweddown cleavage (Fig. 2B). Importantly, the PAMs that were 5nt or less had p (1 ,M ) < .

05 forcleaving their own sequences, and a very fast dissociation time (on the order of 10 − s to10 − s) from any sequence that was not cleaved (Fig. 2C). A Size of PAM (nt) A ve r a g e P r ob a b ili t y o f C l eava g e Self DNAPhage = 0.50Phage = 0.30Phage = 0.15Phage = 0.05 B Size of PAM (nt) -1 A ve r a g e T i m e t o C l eava g e ( s ) Self DNAPhage = 0.50Phage = 0.30Phage = 0.15Phage = 0.05 C Size of PAM (nt) -10 -5 A ve r a g e T i m e t o D i ss o c i a t e ( s ) Self DNAPhage = 0.50Phage = 0.30Phage = 0.15Phage = 0.05 -10 -5 FIG. 2. (A)

Probability of cleavage, (B) cleavage time, and (C) dissociation time when cleavagedid not occur (inset shows the curves oﬀset for clarity). Each curve was averaged from 10,000iterations of the model, and error bars represent standard error. The average times were calculatedas geometric means when the respective events occurred in at least 5% of the iterations for aparticular PAM size. . Consecutive Mismatches In the second model run, mismatches were generated consecutively across all possiblepositions. Mismatch tolerance was proﬁled for all numbers of mismatches from 2 to 24,by averaging over the number of times cleavage occurred (called cleavage frequency) whenthere was a mismatch at a particular sequence location. For the 3nt-PAM system, therewas low tolerance (cleavage frequency < .

6) in the PAM and moderate tolerance (cleavagefrequency ≈ .

8) in the ﬁrst 15nt after the PAM depending on the number of consecutivemismatches (Fig. 3).

P1 P2 P3 1 5 10 15 20 25 30

Location of Mismatch (nt position) C l eava g e F r e qu e n cy FIG. 3. Mismatch tolerance for the 3-nt PAM system illustrated by averaging over cleavage occur-rences when there were mismatches at each sequence location. A cleavage frequency of 1 meansthat a mismatch at that particular location was always tolerated, whereas 0 means it was nevertolerated, when there were the speciﬁed number of consecutive mismatches. Each curve was aver-aged from 10,000 iterations of the model, and error bars represent standard error. P i denotes thelocation of PAM nucleotide i , and the x -axis restarts for nucleotides after the PAM. IV. DISCUSSION

An eﬀective CRISPR system needs to be able to swiftly cleave foreign DNA, includingsequences that are not exact matches to the crRNA, while avoiding cleaving self DNA andquickly dissociating to minimise the DNA target search time. This work showed how themodularity of Cas9:PAM and crRNA:DNA binding regulated target recognition from theenergy associated with binding the ﬁrst module.Larger PAMs led to more cross-reactivity, whereas smaller PAMs led to more speciﬁctargeting. Experiments have shown that Cas9 evolved with broadened PAM compatibilityled to higher DNA targeting speciﬁcity [10]. In other words, the engineered Cas9 had a7ess speciﬁc PAM interaction, resulting in lower binding energy than the wild type, andengaged in less oﬀ-target activity as a result. Furthermore, the heightened sensitivity tomismatches in the PAM and PAM-proximal region has been experimentally observed. Singlemismatches in the PAM increase the binding free energy but can still lead to crRNA:DNAhybridisation [6, 7]. In particular, if there is suﬃcient crRNA:DNA complementarity in theﬁrst 8–12 bp, known as the seed region, and up to 8 mismatches in the remainder of thesequence [6].CRISPR systems with smaller PAMs were just as fast as those with larger PAMs atcleaving phage DNA at low mutation rates, and they had fast dissociation times, though withlarge variation.

In vivo experiments with a single Cas9 in

Escherichia coli demonstrated thatinterrogating a sequence next to a PAM site took less than 30 ms on average [4]. However,while some Cas9:crRNA complexes dissociate immediately when the seed does not match,others engage in more of crRNA:DNA hybridisation before dissociating [11]. Additionally,while PAM-distal mismatches did not aﬀect cleavage probability, the model demonstratedthat ≥ ≈ Competing Interests.

The author declares that the research was conducted in theabsence of any commercial or ﬁnancial relationships that could be construed as a potentialconﬂict of interest.

Funding.

No funding was received for this study.

Acknowledgements.

The author would like to thank M.W. Deem for valuable guidancein developing the model and A.B. Kolomeisky for helpful discussions. Part of this workappears in the author’s doctoral thesis [17]. [1] Makarova KS, Wolf YI, Iranzo J, Shmakov SA, Alkhnbashi OS, Brouns SJJ, Charpentier E,Cheng D, Haft DH, Horvath P, et al. Evolutionary classiﬁcation of CRISPR–Cas systems: aburst of class 2 and derived variants.

Nat Rev Microbiol , 18(2):67–83, 2020.[2] Bonomo ME, Deem MW. The physicist’s guide to one of biotechnology’s hottest new topics:CRISPR-Cas.

Phys Biol , 15(4), 2018.[3] Doudna JA, Charpentier E. The new frontier of genome engineering with CRISPR-Cas9.

Science , 346(6213), 2014.[4] Jones DL, Leroy P, Unoson C, Fange D, ´Curi´c V, Lawson MJ, Elf J. Kinetics of dCas9 targetsearch in

Escherichia coli.

Science , 357(6358):1420–1424, 2017.

5] Anders C, Niewoehner O, Duerst A, Jinek M. Structural basis of PAM-dependent target DNArecognition by the Cas9 endonuclease.

Nature , 513(7519):569–573, 2014.[6] Sternberg SH, Redding S, Jinek M, Greene EC, Doudna JA. DNA interrogation by theCRISPR RNA-guided endonuclease Cas9.

Nature , 507(7490):62, 2014.[7] Farasat I, Salis HM. A biophysical model of CRISPR/Cas9 activity for rational design ofgenome editing and gene regulation.

PLoS Comp Biol , 12(1):e1004724, 2016.[8] Jiang W, Bikard D, Cox D, Zhang F, Marraﬃni LA. RNA-guided editing of bacterial genomesusing CRISPR-Cas systems.

Nat Biotechnol , 31(3):233–239, 2013.[9] Klein M, Eslami-Mossallam B, Arroyo DG, Depken M. Hybridization Kinetics ExplainsCRISPR-Cas Oﬀ-Targeting Rules.

Cell Rep , 22(6):1413–1423, 2018.[10] Hu JH, Miller SM, Geurts MH, Tang W, Chen L, Sun N, Zeina CM, Gao X, Rees HA, Lin Z,et al. Evolved Cas9 variants with broad PAM compatibility and high DNA speciﬁcity.

Nature ,556(7699):57–63, 2018.[11] Szczelkun MD, Tikhomirova MS, Sinkunas T, Gasiunas G, Karvelis T, Pschera P, Siksnys V,Seidel R. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascadeeﬀector complexes.

Proc Natl Acad Sci USA , 111(27):9798–9803, 2014.[12] Bradde S, Nourmohammad A, Goyal S, Balasubramanian V. The size of the immune repertoireof bacteria.

Proc Natl Acad Sci USA , 117(10):5144–5151, 2020.[13] Shvets AA, Kolomeisky AB. Mechanism of genome interrogation: How CRISPR RNA-guidedCas9 proteins locate speciﬁc targets on DNA.

Biophys J , 113(7):1416–1424, 2017.[14] Nishimasu H, Cong L, Yan WX, Ran FA, Zetsche B, Li Y, Kurabayashi A, Ishitani R, ZhangF, Nureki O. Crystal structure of

Staphylococcus aureus

Cas9.

Cell , 162(5):1113–1126, 2015.[15] Jiang F, Doudna JA. The structural biology of CRISPR-Cas systems.

Curr Opin Struct Biol ,30:100–111, 2015.[16] Fineran PC, Charpentier E. Memory of viral infections by CRISPR-Cas adaptive immunesystems: acquisition of new information.

Virology , 434(2):202–209, 2012.[17] Bonomo ME.

Investigating Modular Structure and Function in Biology: from Immunology toCognition . PhD thesis, Rice University, 2020. upplemental Material A small PAM optimises target recognition in theCRISPR-Cas immune system

Melia E. Bonomo

Contents

I. Model Derivation pg. 1I A. Backwards Fokker-Planck Equation pg. 1I B. Probability of CRISPR Cleavage pg. 3I C. Time to CRISPR Cleavage or Dissociation pg. 4II. Model Parameters pg. 6III. Landscape Results pg. 8IV. Consecutive Mismatches – Extended Results pg. 9

I. MODEL DERIVATIONA. Backwards Fokker-Planck Equation

We can imagine the CRISPR recognition reaction occurring on a one-dimensional land-scape with forward rates λ and backward rates µ to move between discrete states. State 0represents the Cas9:crRNA unbound to target DNA, state 1 represents the bound state ofthe Cas9 to the PAM of the target DNA, states i for i = 2 ...M − M − M represents the post-cleavage state.Given the initial Cas9:PAM interaction at time t = 0, we want to determine the prob-ability p and time t to go from state 1 to cleavage in state M . We can start with theone-dimensional Fokker-Planck equation for a stochastic process in operator form,˙ P i ( t ) = LP i ( t ) , (S1)where P i ( t ) is the probability of reaching M from state i at time t and L is the linear operator L i,j = − µ i δ i,j − λ i δ i,j + µ j δ j,i +1 + λ j δ j,i − . a r X i v : . [ phy s i c s . b i o - ph ] J a n he probability of the CRISPR reaction being in state M therefore changes over timeaccording to ˙ P i ( t ) = − µ i P i − λ i P i + µ i +1 P i +1 + λ i − P i − . Note that there is no drift term since the λ i and µ i rates are diﬀerent for each state i , andthere is no diﬀusion term since we have discrete states.Given an initial P i (0), we want to know the probability of being in state M at a latertime t , so we would integrate Eq. S1 forward in time. However, it will be simpler to considera ﬁnal P M ( T ) at t = T and determine the probability of ending up in the target state fromstate i at t = 0 by integrating backwards in time. The backwards Fokker-Planck equation is − ˙ P i ( t ) = L | P i ( t ) , (S2)where L | is the transpose of L , L i,j = − µ i δ i,j − λ i δ i,j + µ i δ j +1 ,i + λ i δ j − ,i , which leads to − ˙ P i ( t ) = − µ i P i − λ i P i + µ i P i − + λ i P i +1 . The boundary conditions of Eq. S2 are P i ( ∞ ) = δ i,M P i (0) = probability to reach M from i by t i < ∞ , however it will be easier to integrate forward in time by shifting the boundaries to P i (0) = δ i,M P i ( − t ) = probability to reach M from i by 0 − ( − t ) = t i . which changes the sign of t , ˙ P i ( t ) = L | P i ( t ) . (S3)We can now integrate Eq. S3 forward in time to obtain the First Passage Time, T i , to reachstate M from state i , T i = Z ∞ t ∂P i ( t ) ∂t dt = tP i ( t ) (cid:12)(cid:12)(cid:12) ∞ − Z ∞ P i ( t ) dt, however, this is undeﬁned with P i (0) = 0 P i ( ∞ ) = P eq i , where P eq i is the equilibrium probability distribution for state i .We can get the integral to converge by redeﬁning the quantity that we want to calculate asthe survival probability, S i ( t ), or the probability that the CRISPR reaction has not reachedstate M at time t , S i ( t ) = P eq i − P i ( t ) (S4)2 i (0) = P eq i S i ( ∞ ) = 0 . The First Passage Time is now calculated as T i = − Z ∞ t ∂S i ( t ) ∂t dt = − tS i ( t ) (cid:12)(cid:12)(cid:12) ∞ + Z ∞ S i ( t ) dt = Z ∞ S i ( t ) dt. (S5)We can substitute Eq. S4 and its derivative into Eq. S3 to obtain − ˙ S i ( t ) = L | ( P eq i − S i ( t ))= − µ i P eq i − λ i P eq i + µ i P eq i − + λ i P eq i +1 + µ i S i + λ i S i − µ i S i − − λ i S i +1 . Since the equilibrium probability distribution of reaching state M from state i is P eq i = µ i P eq i − + (1 − µ i − λ i ) P eq i + λ i P eq i +1 µ i P eq i − − µ i P eq i − λ i P eq i + λ i P eq i +1 , (S6)we can determine that ˙ S i ( t ) = − µ i S i − λ i S i + µ i S i − + λ i S i +1 ˙ S i ( t ) = L | S i ( t ) . Now Eq. S5 can be solved as T i = Z ∞ S i ( t ) dt = Z ∞ ˙ S i ( t ) L | dt = S i ( t ) L | (cid:12)(cid:12)(cid:12) ∞ S i ( ∞ ) − S i (0) = L | T i − S i (0) = − µ i T i − λ i T i + µ i T i − + λ i T i +1 P eq i = µ i T i + λ i T i − µ i T i − − λ i T i +1 . (S7)The T i in Eq. S7 can be divided by the conditional probability P eq i for reaching state M ( p i ), state 0 (1 − p i ), or either state (1 − p i + p i = 1) to obtain the t i that we want. B. Probability of CRISPR Cleavage

Given Eq. S6 and the fact that states 0 and M are considered absorbing states, theprobability p i of reaching state M from i is p = 0 p M = 10 = µ i p i − − µ i p i − λ i p i + λ i p i +1 . (S8)As Eq. S8 is a recursive equation, we can extract a concise equation for our desired proba-bility. For i = 1,...,M let z i = p i − p i − , (S9)3nd it follows that M X i =1 z i = p − p + p − p + ... + p M − p M − = p M − p = 1 . Using Eq. S8 and simplifying algebraically, we ﬁnd that z i +1 = γ i z i where γ i = µ i /λ i . Withthis relationship and Eq. S9, we obtain z = p z = γ z = γ p z = γ z = γ γ p , and so on. The sum of all z values is then1 = M X i =1 z i = p + γ p + γ γ p + ... p (1 + γ + γ γ + ... ) p = 11 + γ + γ γ + ...p = 11 + M − P i =1 i Q j =1 γ j , (S10)where Eq. S10 is the probability that, given PAM binding in state i = 1, all crRNA nu-cleotides will be bound to those of the target DNA, and target DNA will be cleaved in state M . The equation for this probability from any state i is then p i = p (cid:16) i − X j =1 j Y k =1 γ k (cid:17) . (S11) C. Time to CRISPR Cleavage or Dissociation

Given Eq. S7 and the fact that states 0 and M are considered absorbing states, the time T i to reach state M from i is T = ∞ T M = 0 p i = µ i T i + λ i T i − µ i T i − − λ i T i +1 (S12) T i = p i µ i + λ i + λ i µ i + λ i T i +1 + µ i µ i + λ i T i − (S13)4s Eq. S13 is a recursive equation, we can extract a concise equation for our desired time.For i = 1,...,M-1 let y i = T i − T i +1 , (S14)and it follows that M − X i =1 y i = T − T + T − T + ... + T M − − T M = T − T M = T . Using Eq. S13 and simplifying algebraically, we ﬁnd that y i = p i λ i + γ i y i − . With thisrelationship and Eq. S14, we obtain y = p λ − γ T y = p λ + γ y = p λ + γ p λ − γ γ T y = p λ + γ y = p λ + γ p λ + γ γ p λ − γ γ γ T , and so on. The sum of all y values is then T = M − X i =1 y i = p λ − γ T + p λ + γ p λ − γ γ T + p λ + γ p λ + γ γ p λ − γ γ γ T + ...T = ( p λ + p λ + p λ + ... ) + [ p λ ( γ + γ γ + ... ) + p λ ( γ + γ γ + ... ) + ... ]1 + γ + γ γ + γ γ γ ...T = M − P i =1 p i λ i + M − P i =1 p i λ i M − P j = i +1 j Q k = i +1 γ k ! M − P i =1 i Q j =1 γ j . (S15)Dividing Eq. S15 by the probability p of reaching state M from state 1, yields t (1 ,M ) = M − X i =1 p i λ i + M − X i =1 p i λ i M − X j = i +1 j Y k = i +1 γ k ! , (S16)which is the time from the PAM binding in state i = 1 to target DNA cleavage in state M .Conversely, if we consider 1 − p i , which is the probability of reaching state 0 from state i in Eq. S12, and we divide the resulting equivalent of Eq. S15 by 1 − p , the probability ofreaching state 0 from state 1, we obtain t (1 , = " M − X i =1 − p i λ i + M − X i =1 − p i λ i M − X j = i +1 j Y k = i +1 γ k ! p − p , (S17)which is the time from the PAM binding in state i = 1 to dissociation from the PAM instate i = 0. 5 I. MODEL PARAMETERS

The free energy and reaction parameters utilised in the model are summarised in Table S1.

TABLE S1. Energy and kinetic parameters used in the CRISPR model and the experimental sourcefrom which they were obtained or estimated. The dsDNA and RNA:DNA melting and bindingenergies are simpliﬁed to be the same for A:T and C:G base pairs (bp). Note that the forward andreverse attempt rates A for interrogating the dsDNA protospacer ( λ i for i = 1 ...M − µ i for i = 2 ...M −

1) were assumed to be equivalent. k B T = 0 .

62 kcal/mol is used.

Free Energy Description Value Source ∆ G MeltMatch melting 1 dsDNA bp 3 kcal/mol [1]∆ G Supercoiling topological constraint per bp -0.8 kcal/mol [2–5]∆ G BindMatch binding 1 RNA:DNA bp − ∆ G MeltMatch [6]∆ G BindMismatch binding 1 mismatched RNA:DNA bp − / G MeltMatch [6, 7]∆ G Cleavage break DNA phosphodiester bonds 6 kcal/mol per strand [8]

Rate Attempt Rate Barrier Height Source

PAM dissociation ( µ ) A , = 6x10 ∆ E , = 3.4 kcal/mol [9–11]dsDNA separation initiation ( λ ) A , = A , ∆ E , = 10.2 kcal/mol [2]dsDNA melting ( λ i , for i = 2 ...M − A i,i +1 = A i,i − = 1x10 ∆ E i,i +1 = 0.9 kcal/mol [12, 13]dsDNA cleavage ( λ M − ) A M − ,M = 1x10 ∆ E M − ,M = 12 kcal/mol [14] There is a reduced free energy for Cas9 to separate a double-stranded DNA (dsDNA)target that is associated with supercoiling [1, 3, 15],∆ G dsDNAseparation = ∆ G MeltMatch + ∆ G Supercoiling , (S18)where ∆ G MeltMatch is the free energy associated with melting individual base pairs of re-laxed dsDNA. It is energetically favourable to separate DNA strands that are topologicallyconstrained due to negative supercoiling [1]. For easy strand separation and compaction,cellular DNA is generally kept 5% to 7% under-wound, resulting in negative superhelicaltwists [16]. In fact, positive supercoiling helps to protect extreme thermophiles from spon-taneous thermal denaturation [17]. During lysogenic infection, the phage injects its DNAinto the cell, the DNA joins its ends and circularises, and the circular DNA then becomesnegatively supercoiled by means of the host’s machinery in order to more easily integrateitself into the bacterial genome [18]. Though in vivo supercoiling is a dynamic quantity [3],we estimate an average free energy of supercoiling for n DNA base pairs to be∆ G Supercoiling = n qRTh σ, (S19)where R is the gas constant and T = 37 ◦ C is the temperature inside the cell [2, 3]. Theaverage superhelical density σ is taken to be -0.06 [4], the parameter q is an experimentallydetermined coeﬃcient ≈ h is 10.4 bp per turn [5].In general for protein:DNA interaction energetics, the change in free energy related tobinding has speciﬁc E ( −→ b ) and nonspeciﬁc E ns components,∆ G binding ( N ) = E ( −→ b ) + E ns , (S20)6here −→ b is a DNA sequence b i , ...b i + N − of length N [19]. The nonspeciﬁc binding energydoes not depend on the actual nucleotide sequence, but rather accounts for the interactionbetween the protein and the DNA’s phosphate backbone. The speciﬁc binding energy islinearly related to the individual contribution of the protein interacting with each nucleotide,in which the energy change with respect to binding the matching target is E ( −→ b ) − E ( −→ b ) Match = N X i =1 (cid:15) ( i, b i ) , (S21)where (cid:15) ( i, b i ) is the energy penalty of mismatching base b i in position i [19, 20]. The valueof (cid:15) ( i, b i ) is obtained according to the protein’s position weight matrix, which can be deﬁnedin several ways [21]. This matrix approximates protein speciﬁcity by assigning a score toeach base at each position, relative to the matching target sequence.At present, a position weight matrix has not been experimentally determined for Cas9,so we proceed as follows. For PAM binding ∆ G PAM , the scores are deﬁned as position- andbase-independent match rewards that each contribute to the binding by an amount of energy (cid:15) such that ∆ G PAM ( n match ) = (cid:15) ∗ n match + E ns , (S22)where n match is the number of matching nucleotides between the Cas9’s DNA-interactingdomain and the DNA target. Dissociation rates for Cas9 from matching and non-matchingNGG PAMs were obtained from published CRISPR experiments [22, 23]. A generic barrierheight and attempt rate for protein:DNA dissociation were estimated from [9] and transitionstate theory for protein unfolding [10, 11]. The Arrhenius relation was then used to calculate∆ G PAM for binding matching and non-matching 3nt-PAMs, and these values were used toobtain (cid:15) and E ns from a linear ﬁt of Eq. S22,∆ G PAM ( n match ) = − .

63 kcal/mol ∗ n match − . . (S23)Given experimental observations of negligible Cas9 interaction with the DNA sequence inthe absence of a PAM [22, 24], ∆ G PAM (0) was set to 0. It is important to note thatthis ﬁtting method and use of Eq. S23 to deﬁne the free energy change greatly simplifyCas9:PAM binding. Endogenous Cas9 PAMs, such as the NGG of

Streptococcus pyogenes or the NGRRT of

Staphylococcus aureus , have complex position- and base-dependent energypenalties for mismatches that impact binding [1, 24, 25]. However, the objective of thismodel is to investigate how the probability of cleaving a DNA sequence, and the timefor cleaving or dissociating from that sequence, depends on the amount of initial energyassociated with the ﬁrst module (i.e., the bound Cas9:PAM). Eq. S23 allows us to probehow the probability and times are impacted when that ﬁrst module’s energy is increasedand decreased in approximate units of mismatches.7

II. LANDSCAPE RESULTS

The reaction landscapes for the CRISPR interrogation of target DNA were deﬁned bythe parameters in Table S1, where ∆ G i was the depth of each state i ,∆ G = ∆ G PAM ( n match ) , (S24)∆ G >i>M = ∆ G dsDNAseparation + ∆ G BindDNA , (S25)∆ G M = ∆ G Cleavage , (S26)the forward barriers were ∆ E i − ,i , and the reverse barriers were calculated as∆ E i,i − = ∆ E i − ,i − ∆ G i , (S27)as stated in the main text. For a target DNA sequence that was a perfect match, thelandscapes all trended downwards (Figure S1A). Mismatches caused the landscapes to trendupwards, and the more mismatches there were, the more energetically unfavourable it wasfor the CRISPR to cleave the DNA sequence (Figure S1B). It has been suggested that near-perfect complementarity between the crRNA and target DNA in the PAM-proximal regionlowers the energy needed to continue binding the rest of the target, such that it is less thanthat of the reverse unzipping reaction [24]. A CRISPR-Cas Reaction States -60-40-20020 F r ee E ne r g y ( kc a l / m o l ) B CRISPR-Cas Reaction States -40-20020 F r ee E ne r g y ( kc a l / m o l ) FIG. S1. (A)

Reaction landscapes calculated for nine representative CRISPR systems that hadPAMs of diﬀerent lengths. In all cases, the target DNA sequence’s PAM and protospacer were per-fect matches to the Cas9:crRNA. (B)

Reaction landscapes calculated for the 3nt-PAM CRISPRwhen faced with a target DNA sequence that had no (blue), 5 (tan), and 20 (brown) mismatches.The dotted lines designate where the mismatches were located (coloured to match their corre-sponding landscape). V. CONSECUTIVE MISMATCHES – EXTENDED RESULTS A P1 P2 P3 1 5 10 15 20 25 30

Location of Mismatch (nt position) -2 A ve r a g e T i m e t o C l eava g e ( s ) PAM1 5 10 15 20 25 300200040006000800010000 C u m u l a t i ve M i s m a t c h es B P1 P2 P3 1 5 10 15 20 25 30

Location of Mismatch (nt position) -15 -10 -5 A ve r a g e T i m e t o D i ss o c i a t e ( s ) PAM1 5 10 15 20 25 300200040006000800010000 C u m u l a t i ve M i s m a t c h es FIG. S2. Mismatch tolerance for the 3-nt PAM system illustrated by averaging over (A) cleavagetime or (B) dissociation time when there were mismatches at each sequence location. Each curve isa geometric mean from 10,000 iterations of testing the speciﬁed number of consecutive mismatches(from 2 to 24 mismatches) along all possible positions in the sequence, and error bars representstandard error. The insets for each ﬁgure show the cumulative number of mismatches at eachsequence location when (A) cleavage occurred and (B) dissociation occurred. (In other words, ifeach curve of the insets was divided by the total number of cumulative mismatches tested at eachsequence location, the inset in (A) would show the cleavage frequency plotted in Fig. 3 of the maintext, and the inset in (B) would show the dissociation frequency.) P i denotes the location of PAMnucleotide i , and the x -axis restarts at 1 for nucleotides after the PAM.[1] Farasat I, Salis HM. A biophysical model of CRISPR/Cas9 activity for rational design ofgenome editing and gene regulation. PLoS Comp Biol , 12(1):e1004724, 2016.[2] Bauer WR, Benham CJ. The free energy, enthalpy and entropy of native and of partiallydenatured closed circular DNA.

J Mol Biol , 234(4):1184–1196, 1993.[3] Westra ER, van Erp PB, K¨unne T, Wong SP, Staals RH, Seegers CL, Bollen S, Jore MM,Semenova E, Severinov K, et al. CRISPR immunity relies on the consecutive binding anddegradation of negatively supercoiled invader DNA by Cascade and Cas3.

Mol Cell , 46(5):595–605, 2012.

4] Wang H, Benham CJ. Superhelical destabilization in regulatory regions of stress responsegenes.

PLoS Comp Biol , 4(1):e17, 2008.[5] Wang JC. Helical repeat of DNA in solution.

Proc Natl Acad Sci USA , 76(1):200–203, 1979.[6] Rauzan B, McMichael E, Cave R, Sevcik LR, Ostrosky K, Whitman E, Stegemann R, SinclairAL, Serra MJ, Deckert AA. Kinetics and thermodynamics of DNA, RNA, and hybrid duplexformation.

Biochemistry , 52(5):765–772, 2013.[7] Leonard GA, Booth ED, Brown T. Structural and thermodynamic studies on the adenine.guanine mismatch in B-DNA.

Nucleic Acids Res , 18(19):5617–5623, 1990.[8] Dickson KS, Burns CM, Richardson JP. Determination of the free-energy change for repairof a DNA phosphodiester bond.

J Biol Chem , 275(21):15828–15831, 2000.[9] Ecevit O, Khan MA, Goss DJ. Kinetic analysis of the interaction of b/HLH/Z transcriptionfactors Myc, Max, and Mad with cognate DNA.

Biochemistry , 49(12):2627–2635, 2010.[10] Garcia-Viloca M, Gao J, Karplus M, Truhlar DG. How enzymes work: analysis by modernrate theory and computer simulations.

Science , 303(5655):186–195, 2004.[11] Pollak E, Talkner P. Reaction rate theory: What it was, where is it today, and where is itgoing?

Chaos , 15(2):026116, 2005.[12] Chen X, Zhou Y, Qu P, Zhao XS. Base-by-base dynamics in DNA hybridization probed byﬂuorescence correlation spectroscopy.

J Am Chem Soc , 130(50):16947–16952, 2008.[13] Sanstead PJ, Tokmakoﬀ A. Direct observation of activated kinetics and downhill dynamics inDNA dehybridization.

J Phys Chem B , 122(12):3088–3100, 2018.[14] Gong S, Yu HH, Johnson KA, Taylor DW. DNA unwinding is the primary determinant ofCRISPR-Cas9 activity.

Cell Rep , 22(2):359–371, 2018.[15] Szczelkun MD, Tikhomirova MS, Sinkunas T, Gasiunas G, Karvelis T, Pschera P, Siksnys V,Seidel R. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascadeeﬀector complexes.

Proc Natl Acad Sci USA , 111(27):9798–9803, 2014.[16] Nelson D, Cox M.

Bioenergetics and metabolism in Lehninger Principles of Biochemistry .New York: Worth Publishers, 3rd edition, 2000.[17] Witz G, Dietler G, Stasiak A. Eﬀect of DNA Supercoiling on DNA Decatenation and Un-knotting Followed By Brownian Dynamics Simulations.

Biophys J , 98(3):62a, 2010.[18] Trun N, Trempy J.

Fundamental bacterial genetics . John Wiley & Sons, 2009.[19] Slutsky M, Mirny LA. Kinetics of protein-dna interaction: facilitated target location insequence-dependent potential.

Biophys J , 87(6):4021–4035, 2004.[20] Phillips R, Kondev J, Theriot J, Garcia H.

Physical biology of the cell . Garland Science, 2012.[21] Stormo GD. Modeling the speciﬁcity of protein-DNA interactions.

Quant Biol , 1(2):115–130,2013.[22] Jones DL, Leroy P, Unoson C, Fange D, ´Curi´c V, Lawson MJ, Elf J. Kinetics of dCas9 targetsearch in

Escherichia coli.

Science , 357(6358):1420–1424, 2017.[23] Singh D, Sternberg SH, Fei J, Doudna JA, Ha T. Real-time observation of DNA recognitionand rejection by the RNA-guided endonuclease Cas9.

Nat Commun , 7(1):1–8, 2016.[24] Sternberg SH, Redding S, Jinek M, Greene EC, Doudna JA. DNA interrogation by theCRISPR RNA-guided endonuclease Cas9.

Nature , 507(7490):62, 2014.[25] Jiang W, Bikard D, Cox D, Zhang F, Marraﬃni LA. RNA-guided editing of bacterial genomesusing CRISPR-Cas systems.

Nat Biotechnol , 31(3):233–239, 2013., 31(3):233–239, 2013.