[PDF] Exact site frequency spectra of neutrally evolving tumors, transition between power laws and signatures of cell viability

Abstract

The site frequency spectrum (SFS) is a popular summary statistic of genomic data. While the SFS of a constant-sized population undergoing neutral mutations has been extensively studied in population genetics, the rapidly growing amount of cancer genomic data has attracted interest in the spectrum of an exponentially growing population. Recent theoretical results have generally dealt with special or limiting cases, such as considering only cells with an infinite line of descent, assuming deterministic tumor growth, or sending the tumor size to infinity. In this work, we we derive exact expressions for the expected SFS of a cell population that evolves according to a stochastic branching process, first for cells with an infinite line of descent and then for the total population, evaluated either at a fixed time (fixed-time spectrum) or at the stochastic time at which the population reaches a certain size (fixed-size spectrum). We find that while the rate of mutation scales the SFS of the total population linearly, the rates of cell birth and cell death change the shape of the spectrum at the small-frequency end, inducing a transition between a 1/j^2 power-law spectrum and a 1/j spectrum as cell viability decreases. We use this insight to propose a simple estimator for the ratio between the rate of cell death and cell birth, that we show to accurately recover the true ratio from synthetic single-cell sequencing data. Although the discussion is framed in terms of tumor dynamics, our results apply to any exponentially growing population of individuals undergoing neutral mutations.

Full PDF

EExact site frequency spectra of neutrally evolving tumors,transition between power laws and signatures of cell viability

Einar Bjarki Gunnarsson Kevin Leder Jasmine Foo Department of Industrial and Systems Engineering, University of Minnesota, Twin Cities, MN 55455, USA. School of Mathematics, University of Minnesota, Twin Cities, MN 55455, USA.

Abstract

The site frequency spectrum (SFS) is a popular summary statistic of genomicdata. While the SFS of a constant-sized population undergoing neutral muta-tions has been extensively studied in population genetics, the rapidly growingamount of cancer genomic data has attracted interest in the spectrum of an ex-ponentially growing population. Recent theoretical results have generally dealtwith special or limiting cases, such as considering only cells with an inﬁnite lineof descent, assuming deterministic tumor growth, or sending the tumor size toinﬁnity. In this work, we we derive exact expressions for the expected SFS of acell population that evolves according to a stochastic branching process, ﬁrst forcells with an inﬁnite line of descent and then for the total population, evaluatedeither at a ﬁxed time (ﬁxed-time spectrum) or at the stochastic time at which thepopulation reaches a certain size (ﬁxed-size spectrum). We ﬁnd that while therate of mutation scales the SFS of the total population linearly, the rates of cellbirth and cell death change the shape of the spectrum at the small-frequency end,inducing a transition between a 1 /j power-law spectrum and a 1 /j spectrum ascell viability decreases. We use this insight to propose a simple estimator for theratio between the rate of cell death and cell birth, that we show to accuratelyrecover the true ratio from synthetic single-cell sequencing data. Although thediscussion is framed in terms of tumor dynamics, our results apply to any expo-nentially growing population of individuals undergoing neutral mutations. Keywords:

Mathematical modeling, cancer evolution, branching processes, exponentiallygrowing populations, site frequency spectrum, inﬁnite sites model.

MSC classiﬁcation:

The study of genetic variation driven by neutral mutations has a long history in popula-tion genetics [1]. Usually, the population is assumed to be of a large constant size N , andreproduction follows either the Wright-Fisher model (non-overlapping generations) or theMoran model (overlapping generations) [2]. Neutral mutations occur at rate u per individualper generation, and each new mutation is assumed to be unique (the inﬁnite-sites model ofKimura [3]). This setup gives rise to a sample-based theory of tracing genealogies of extantindividuals backwards in time via the coalescent , introduced by Kingman [4, 5]. A popular1 a r X i v : . [ q - b i o . P E ] F e b ummary statistic of genomic data is the site frequency spectrum (SFS), which records thefrequencies of mutations in a population or population sample. Under the Moran model withneutral mutations, the expected number of mutations found in j cells of a sample of size n is ξ j = N u/j , and any linear combination of the form (cid:80) n − j =1 jc j ξ j with (cid:80) n − j =1 c j = 1 is anunbiased estimator of θ := N u , the population-scaled mutation rate. Prominent estimatorsof this form include Watterson’s θ W [6], Tajima’s θ π [7], Fu and Li’s ξ [8] and Fay and Wu’s θ H [9], and these estimators form the basis of several statistical tests of neutral evolutionvs. evolution under selection [10]. In this way, the site frequency spectrum has provided asimple means of understanding the evolutionary history of populations from genomic data.Cancer can be viewed as its own evolutionary process, operating at the somatic level.Cancer initiation is usually understood to be a series of mutational events, that culminatesin malignant cells able to proliferate uncontrollably [11, 12, 13, 14]. Such “driver” mutationsare complemented by more frequent neutral or “passenger” mutations [15, 16], that haveno functional role in the evolution to malignancy, but contribute to the genetic diversitycharacteristic of cancer [17, 18, 19]. The dominant paradigm of tumor progression has beenthat of sequential clonal expansion of driver mutations. However, several recent works suggestthat a neutral evolution model, under which all driver mutations are already present in thetumor-initiating cell, is suﬃcient to explain the intratumoral heterogeneity in many cancers,see e.g. [20, 21, 22] and the reviews [23, 24]. A simple test of neutral tumor evolutionbased on the site frequency spectrum was proposed in Williams et al. [22], which has sincegenerated debate e.g. surrounding its signiﬁcance level and statistical power (see e.g. [25, 26,27] with author responses in [28, 29]). The authors of [22] subsequently suggested a Bayesianframework for detecting tumor subclones evolving under selection in [30], and more recentapproaches to that problem include Dinh et al. [31] and Caravagna et al. [32]. These worksand the surrounding debate are indicative both of the fact that increased attention is beingpaid to the role of neutral mutations in driving heterogeneity in cancer, and that eﬀorts arejust underway to develop robust methods of inferring the evolutionary history of tumors.While the constant-sized models of population genetics are appropriate for understandingearly cancer development in small tissue compartments, exponential growth models are morerelevant for understanding long-run tumor progression [33, 34]. In recent years, several workshave used diﬀerent versions of such models to derive the site frequency spectrum of an expo-nentially growing tumor with neutral mutations. To state the most prominent result, assumea branching process model in which cells divide at rate r and die at rate d ( r > d ), andthat w neutral mutations accumulate on average per cell division. Deﬁne p := d /r as the extinction probability of the tumor (Section 2), and q := 1 − p as its survival probability . Ifwe consider only cells with an inﬁnite line of descent, and assume the number of cells N islarge, the expected number of mutations found in j cells is ξ j = ( w/q ) N · / ( j ( j + 1)). ThisSFS diﬀers from the SFS of the constant-sized Moran model of population genetics in twoimportant ways: The spectrum now follows a 1 /j power law as opposed to a 1 /j law, and itnow depends on the growth parameters r and d via the survival probability q . Versions ofthe 1 /j spectrum have been established by Durrett [35, 33], Bozic et al. [36] and Williamset al. [22], as we discuss in more detail below. In [22], [36] and [21], the 1 /j spectrum wasused to infer the ratio w/q from tumor data, but extracting information about the mutationparameter w and the growth parameter q separately seemingly requires additional tools.In a recent work by Werner et al. [37], the authors measured pairwise mutational diﬀer-ences between the ancestors of spatially separated tumor bulk samples, and they developeda coalescent-based approach for estimating w and a function of p deﬁned in (2) below.The extinction probability p quantiﬁes the viability and turnover of tumor cells. As cell2urnover increases, the mutational burden and genetic diversity of the tumor increases, whichenhances its adaptability under treatment. As a result, it is important to be able to identifytumors with a high cell turnover, i.e. with a large extinction probability p . Prior works onthe SFS of an exponentially growing tumor population oﬀer only a limited understanding ofhow p aﬀects the spectrum, as these works generally only consider cells with an inﬁnite lineof descent, or they consider special cases such as deterministic growth of the tumor bulk orno cell death ( p = 0), which is reasonable when p is small. Moreover, many prior results aregiven in the large-time or large-population limit, and for practical reasons, the focus is oftenon mutations of frequency 5 −

10% and higher. These results are discussed in more detail inSection 6. Our goal in this work is to gain a more complete understanding of the SFS of anexponentially growing tumor with neutral mutations. In particular, we seek to understandhow the spectrum behaves both at small and large frequencies, for all values of p and thepopulation size N . We obtain separate results for cells with an inﬁnite line of descent and forthe total population, evaluated either at a ﬁxed time or at the stochastic time at which thepopulation reaches a certain size, each of which is relevant to tumor data analysis dependingon the context. We observe that while the SFS of skeleton cells depends on the mutation rate w and the extinction probability p only via w/q , the two parameters decouple in the SFSof the total population. In fact, as p increases from 0 to 1, the small-frequency end of thespectrum transitions from the 1 /j power law characteristic of pure-birth exponential growthto the 1 /j power law characteristic of a constant-sized population. We investigate simplemetrics that quantify this transition, and we use one of them to propose a simple estimatorfor p , that we subsequently evaluate using synthetic single-cell sequencing data.The rest of the paper is organized as follows. In Section 2, we formulate our branchingprocess model with neutral mutations, deﬁne the skeleton subpopulation of cells with aninﬁnite line of descent, and establish relevant notation. In Section 3, we analyze the SFS ofskeleton cells, and in Section 4, we analyze the SFS of the total cell population. In Section 5,we use our theoretical results to propose and evaluate a simple estimator for p . In Section6, we summarize our results, and discuss in detail how they relate to the existing literature.The proofs of all of our theoretical results are found in appendices at the end of the paper. We assume that the tumor evolution follows a branching process model. Cells divide intotwo cells at rate r ≥ d ≥ t , a cell divides with probability r ∆ t and dies withprobability d ∆ t . Assume r > d and deﬁne λ := r − d > Z ( t ) denote the size of the tumor population at time t and assume Z (0) = 1, i.e. the tumorexpands from a single tumor-initiating cell. DeﬁneΩ ∞ := { Z ( t ) > t > } as the event that the tumor does not go extinct, and p := P (Ω c ∞ ) = P ( Z ( t ) = 0 for some t > extinction probability of the tumor. This probability can be computed as p = d /r with 0 ≤ p <

1, see e.g. Section 3 of [33]. Note that any clone derived from a single tumorcell gives rise to its own branching process with the same growth parameters r and d and3igure 1: (a) The skeleton subpopulation, which consists of cells with an inﬁnite line ofdescent, can be thought of as the trunk and scaﬀolding branches of the genealogical branchingtree (bold branches). Each skeleton cell divides into one skeleton cell and one ﬁnite-familycell at rate 2 d (type-1 skeleton division), in which case a ﬁnite-family clone grows out fromthe skeleton as a lateral branch (light gray branches). Each skeleton cell divides into twoskeleton cells at rate λ (type-2 skeleton division), in which case another scaﬀolding branch isadded. (b) In between two type-2 skeleton divisions, the expected number of mutations thataccumulate on type-1 divisions is wp /q . Since w mutations on average are added on eachtype-2 division, the expected number of mutations per skeleton level is wp /q + w = w/q .the same extinction probability. We also deﬁne q := 1 − p = λ /r (1)as the survival probability of the tumor or a single-cell derived clone. On the nonextinctionevent Ω ∞ , the cells alive at time t can be split into two categories, one consisting of cells withan inﬁnite line of descent, i.e. cells that start clones that do not go extinct, and the otherconsisting of cells whose descendants eventually go extinct. We refer to the former cells as skeleton cells and the latter as ﬁnite-family cells. An arbitrary tumor cell is a skeleton cellwith probability q , so in the long run, the proportion of skeleton cells in the population is q .We can think of skeleton cells as forming the trunk and scaﬀold branches of the genealogicalbranching tree, with ﬁnite-family clones growing out from the skeleton as lateral branches,see Figure 1a.We next add neutral mutations under the inﬁnite-sites model. Prior to a cell division, eachparental DNA molecule is unwound and separated into two complementary strands. Eachparental strand serves as a template for the construction of a new complementary daughterstrand. The end result is two copies of the DNA molecule, each consisting of one parentalstrand and one daughter strand. Errors in nucleotide pairing during this process can resultin one or more point mutations per daughter strand. We assume that these errors amountto w/ w mutations on averageper cell division. We make no assumption on the distribution of the number of mutationsapart from its mean. The point mutation rate has been estimated as 5 · − per base pairper cell division for normal cells [38], and it is commonly higher in cancer due to genomicinstability [17]. Since the number of base pairs is of order 10 in the exome and 10 in the4enome, it makes sense to allow w to be any positive number, i.e. w ∈ (0 , ∞ ). In many works,the convention is to allow at most one mutation per cell division, introducing a probability u ∈ (0 ,

1) of a new mutation. Since our analysis only depends on the mean number ofmutations w per cell division, it includes this case with w := u . Finally, while our focus ison the above discrete model of mutation accumulation, we will also present all of our resultsin terms of continuous mutation accumulation, another common and biologically relevantassumption. In the continuous model, neutral mutations occur at rate ν > clonal and subclonal mutations.Say the tumor-initiating cell divides into two cells A and B , and say that cell A acquires anew mutation. If the clone started by cell B eventually dies out, the mutation in cell A willbe shared by all tumor cells from that point onward, making it clonal . However, if the clonesstarted by A and B both survive, the mutation becomes subclonal , i.e. there is always atleast one tumor cell without it. While this example demonstrates how clonal mutations canarise post-initiation, all mutations that accumulate prior to cancer initiation, as the cancerprecursor cell evolves to malignancy, also become clonal. For this reason, the clonal mutationsusually tell us more about the events preceding cancer than the dynamics post-initiation, andthey can in fact outnumber the subclonal mutations [15]. Nevertheless, clonal mutations doappear in the SFS of mutations post-initiation, and they play distinct roles in the ﬁxed-timeand ﬁxed-size spectrum, which is why we pay them special attention below. In this section, we establish the expected ﬁxed-time and ﬁxed-size spectrum of skeleton cells.The reason we are interested in analyzing skeleton cells separately is twofold: • When p = 0 (no cell death), all cells are skeleton cells, and the SFS of the totalpopulation is the SFS of the skeleton. More generally, when p is small, the totalpopulation spectrum is well-approximated by the simpler skeleton spectrum. • When the extinction probability p is large, ﬁnite-family cells aﬀect the SFS of thetotal population at the small-frequency end. However, the large-frequency end is stillcharacterized by the skeleton spectrum, as we demonstrate in Section 4 below. When the tumor is conditioned on nonextinction, the probability that the tumor-initiatingcell divides during the ﬁrst ∆ t units of time, and that exactly one of the two daughter cellssurvives, i.e. starts a clone that does not go extinct, is P (division in [0 , ∆ t ], one oﬀspring survives) P (Ω ∞ ) = r ∆ t · p (1 − p )1 − p = 2 r p ∆ t, and the probability of a division in [0 , ∆ t ] where both daughter cells survive is P (division in [0 , ∆ t ], both oﬀspring survive) P (Ω ∞ ) = r ∆ t · (1 − p ) − p = r (1 − p )∆ t. Since each skeleton cell starts a clone that does not go extinct, we can conclude that a skeletoncell divides into one skeleton cell and one ﬁnite-family cell at rate 2 r p = 2 d per unit time5type-1 skeleton division), and it divides into two skeleton cells at rate r q = λ per unittime (type-2 skeleton division). The probability that a skeleton division is type-2 is κ := r q r q + 2 r p = 1 − p p . (2)This probability can also be computed directly as follows: (1 − p ) is the probability thatboth daughter cells survive, and 1 − p is the probability that at least one of them does, so(1 − p ) / (1 − p ) = (1 − p ) / (1 + p ) is the probability that a skeleton division is type-2.Let ˜ Z ( t ) denote the number of skeleton cells at time t , conditional on the nonextinctionevent Ω ∞ . Since type-1 divisions do not aﬀect the size of the skeleton, we can think ofthe type-2 divisions as the “eﬀective” divisions. More precisely, ( ˜ Z ( t )) t ≥ is a pure-birthexponential growth process, or a Yule process , with birth rate λ and mean size E [ ˜ Z ( t )] = e λ t [33, 39]. Type-1 divisions do contribute to mutation accumulation however. Indeed, eachtype-1 division adds w/ w mutations on average. The rate at which mutations accumulate on the skeleton is then( w/ · r p + w · r (1 − p ) = wr (3)per unit time, which equals the mutation rate for the original, unconditioned process ( Z ( t )) t ≥ .The mutation rate per type-2 division, or the eﬀective mutation rate, is on the other hand wr /λ = w/q (4)per unit time. We can also think of w/q as the expected number of mutations that accu-mulate per skeleton level as follows. Upon a type-2 division, the number of type-1 divisionsbefore the next type-2 division has the geometric distribution with support { , , . . . } andsuccess probability κ given by (2). It follows that in between two type-2 divisions, theexpected number of mutations that accumulate on the skeleton due to type-1 divisions is( w/ · (1 /κ −

1) = ( w/ · p /q = wp /q . (5)When we include the type-2 division that starts each skeleton level, we obtain wp /q + w = w/q , (6)mutations per skeleton level (Figure 1b). The eﬀective mutation rate w/q plays a key role inthe SFS of the skeleton, with the continuous-time viewpoint in (4) applying to the ﬁxed-timespectrum, and the population-level viewpoint in (6) applying to the ﬁxed-size spectrum. Let ˜ S j ( t ) denote the number of mutations that are found in j ≥ t ,conditional on the nonextinction event Ω ∞ . This is the site frequency spectrum of skeletoncells. For any integer N ≥

1, deﬁne ˜ t N := log( N ) /λ , (7)as the (ﬁxed) time at which the skeleton has expected size N , i.e. e λ ˜ t N = N , and deﬁne˜ τ N := inf { t ≥ Z ( t ) = N } (8)6 -8 -6 -4 -2 -1 -3 -2 Figure 2: (a)

The expected ﬁxed-time spectrum (9) of Proposition 1 (solid blue line) showsgood agreement with the average spectrum of simulated tumors (grey dots) for the parameters N = 100, p = 0 and w = 1. We generated 10 tumors with these parameters and stoppedeach simulation at the ﬁxed time ˜ t N deﬁned by (7). At each cell division, the number ofmutations acquired by each daughter cell was generated as a Poisson random variable withmean w/ (b) The expected ﬁxed-size spectrum (11) of Proposition 1 (solid red line) showsgood agreement with the average spectrum of simulated tumors (grey dots) with the sameparameters as in (a). We again generated 10 tumors with these parameters, but this time,we stopped each simulation when the tumor reached size N . The fundamental diﬀerencebetween the ﬁxed-time and ﬁxed-size spectrum is that the latter spectrum is restricted to j = 1 , . . . , N , while the former spectrum has nonzero mass at values j > N .as the (random) time at which the skeleton reaches size N . In Proposition 1 below, weprovide the expected skeleton spectrum evaluated both at time ˜ t N (ﬁxed-time spectrum) andat time ˜ τ N (ﬁxed-size spectrum). Both the ﬁxed-time and the ﬁxed-size spectrum can berelevant to tumor data analysis depending on the context. For example, in vitro cell cultureexperiments and in vivo mouse experiments are often conducted over a ﬁxed time period, inwhich case the ﬁxed-time spectrum would apply. In the clinic, however, the size of a tumoris more readily measured than its age, in which case the ﬁxed-size spectrum is more relevant[40]. It is therefore useful to understand both spectra and to what extent they diﬀer. Proposition 1. (1) Deﬁne ˜ t N as in (7) . Then, for any N ≥ and any j ≥ , E [ ˜ S j (˜ t N )] = ( w/q ) N · (cid:82) − /N (1 − y ) y j − dy. (9) For any j ≥ , then as N → ∞ , E [ ˜ S j (˜ t N )] ∼ ( w/q ) N · / ( j ( j + 1)) , (10) where f ( y ) ∼ g ( y ) as y → ∞ means lim y →∞ f ( y ) /g ( y ) = 1 .(2) Deﬁne ˜ τ N as in (8) . Then, for any N ≥ , E [ ˜ S j (˜ τ N )]= (cid:40) ( w/q ) · (cid:80) N − max(2 ,j ) k =1 kN − j (cid:81) j − n =1 (1 − kN − n ) + wδ ,j , ≤ j ≤ N − ,wp /q = w/q − w, j = N, (11) where (cid:81) ∅ := 1 and δ (cid:96),m = 1 if (cid:96) = m and δ (cid:96),m = 0 otherwise. -2 -1 Figure 3: (a)

The ﬁxed-time spectrum (9) for the skeleton (blue curve) is a good approxima-tion of the ﬁxed-size spectrum (11) (red crosses) for mutations at frequencies j (cid:28) N , givenparameters N = 10 , p = 0 . w = 1. The two spectra diverge at larger frequencies, andthe diﬀerence is substantial at j = N , since clonal mutations are concentrated at j = N inthe ﬁxed-size spectrum, while they are scattered in the ﬁxed-time spectrum. (b) Here, weshow the ratio of the ﬁxed-time spectrum to the ﬁxed-size spectrum (purple crosses), andthe proportion of mutations found in ≤ j skeleton cells (green curve). The ﬁxed-time andﬁxed-size spectrum are virtually the same on j (cid:28) N , where almost all mutations are found. Proof.

Appendix A.Analogous results for continuous mutation accumulation are presented in Appendix C. InFigure 2, we compare our ﬁxed-time (9) and ﬁxed-size (11) results with simulation results for w = 1, p = 0 and N = 100. In this example, there are no clonal mutations, since p = 0. Thefundamental diﬀerence between the ﬁxed-time and ﬁxed-size spectrum is that the skeletonsize at time ˜ t N is variable, while it is always N at time ˜ τ N . The ﬁxed-time spectrum thereforehas nonzero mass at values j > N , due to instances in which the skeleton is larger than N attime ˜ t N . It is however natural to ask how the ﬁxed-time spectrum restricted to j = 1 , . . . , N relates to the ﬁxed-size spectrum. To that end, note that for N large and j (cid:28) N , the sumin (11) can be approximated by( w/q ) · (cid:80) N − max(2 ,j ) k =1 kN − j (cid:81) j − n =1 (1 − kN − n ) ≈ ( w/q ) · (cid:82) N kN (cid:0) − kN (cid:1) j − dk, which, using the substitution y := 1 − k/N , becomes the ﬁxed-time spectrum (9). In Figure3a, we compare the two spectra for p = 0 . N = 10 , and show how the ﬁxed-timespectrum is a good approximation of the ﬁxed-size spectrum for j (cid:28) N . In Figure 3b, weshow that almost all mutations are found at j (cid:28) N . In this example, the diﬀerence betweenthe two spectra is within 1% on the range j = 1 , . . . , j = N in the ﬁxed-size spectrum ofFigure 3a, which does not appear in the ﬁxed-time spectrum. This is due to the distinct waysin which clonal mutations manifest in the two spectra. In the ﬁxed-time spectrum, clonalmutations may appear at any value of j , depending on the skeleton size at time ˜ t N . In theﬁxed-size spectrum, all these mutations are concentrated at j = N , which creates a largepoint mass at j = N . 8 .3 Proportion of mutations found in one cell We conclude this section by introducing two simple and important metrics derived fromthe site frequency spectrum of the skeleton. Note ﬁrst that by (9) of Proposition 1, in theﬁxed-time spectrum, the expected number of mutations found in one skeleton cell is E [ ˜ S (˜ t N )] = (1 / w/q )( N − /N ∼ (1 / w/q ) N, N → ∞ . (12)Next, let ˜ M j ( t ) := (cid:80) k ≥ j ˜ S k ( t ) denote the number of mutations found in ≥ j skeleton cells attime t . Again, by (9) of Proposition 1, in the ﬁxed-time spectrum, the expected total numberof mutations on the skeleton is E [ ˜ M (˜ t N )] = ( w/q )( N − ∼ ( w/q ) N, N → ∞ . (13)Expressions (12) and (13) suggest that for large N , half the mutations discovered at time ˜ t N are found in only one cell. This is a consequence of the pure-birth exponential growth of theskeleton. Indeed, note that if we only consider the eﬀective type-2 skeleton divisions, the totalnumber of divisions required to reach generation k is (cid:80) k − j =0 j = 2 k −

1. The expected totalnumber of mutations in generation k is then ( w/q )(2 k − N = 2 k as thenumber of cells in generation k . An additional 2 k divisions are required to reach generation k + 1, so each generation roughly doubles the total number of mutations. Of course, ourmodel is stochastic, it operates in continuous time and generations may overlap, but thissimple discrete argument gives intuition as to why half the mutations are found in one cell,and more generally why most mutations are found at the smallest frequencies. When the extinction probability p is small, i.e. tumor cell turnover is low, the SFS ofthe total population ( Z ( t )) t ≥ is well-approximated by the SFS of the skeleton ( ˜ Z ( t )) t ≥ .However, tumor evolution is commonly characterized by high turnover, which induces a largemutational burden and high genetic diversity. For example, in [41], Avanzini & Antal estimate p for metastatic breast cancer, colorectal cancer, head & neck cancer, lung cancer andprostate cancer as 0 . , . , . , .

97 and 0 .

76, respectively. Bozic et al. [16] estimated p as 0 .

93 for metastatic melanoma, and Bozic et al. [36] applied 0.72 for fast-growing colorectalcancer metastases and 0.99 for early tumor growth. Understanding how the SFS of thesetumors behave, and exploring how they can potentially be identiﬁed from genomic data, iscrucial due to their elevated genetic diversity and enhanced adaptability under treatment.To this end, we investigate in this section the expected ﬁxed-time and ﬁxed-size spectrum ofthe total population ( Z ( t )) t ≥ . We show that as p increases, the small-frequency end of thespectrum starts to deviate from the skeleton spectrum of Section 3, and as p approaches 1,it transitions to the spectrum of a constant-sized population. Let S j ( t ) denote the number of mutations found in j ≥ t , the site frequencyspectrum of the total cell population. We now wish to compute the mean of S j ( t ) conditionedon the tumor surviving to time t . We can compute the probability of this survival event as P ( Z ( t ) >

0) = q e λ t / ( e λ t − p ) , t ≥ , t as E [ Z ( t ) | Z ( t ) >

0] = ( e λ t − p ) /q , t ≥ . (14)Note that E [ Z ( t ) | Z ( t ) > ∼ e λ t /q as t → ∞ , which means that the long-run expectedgrowth of a tumor conditioned on survival is exponential, where the exponential growthfunction has initial value 1 /q . Thus, while tumors with a small survival probability q (largeextinction probability p ) are less likely to survive in the long run, those that do survive arelarger on average than tumors with a large survival probability, since the latter tumors areless aﬀected by the conditioning on survival. Now, for any integer N ≥

1, deﬁne t N := log( q N + p ) /λ (15)as the (ﬁxed) time at which a surviving tumor has expected size N , and deﬁne τ N := inf { t ≥ Z ( t ) = N } (16)as the (random) time at which the tumor reaches size N , with inf ∅ = ∞ . In Proposition 2,we provide the expected SFS of ( Z ( t )) t ≥ at time t N and τ N , conditioned on the survivalevents { Z ( t N ) > } and { τ N < ∞} , respectively. In this case, we cannot obtain an explicitexpression for the ﬁxed-size spectrum, and provide instead a simple computational expression. Proposition 2. (1) Deﬁne t N as in (15) . Then, for any N ≥ and j ≥ , E [ S j ( t N ) | Z ( t N ) >

0] = wN · (cid:82) − /N (1 − p y ) − (1 − y ) y j − dy. (17) For any j ≥ , then as N → ∞ , E [ S j ( t N ) | Z ( t N ) > ∼ wN · (cid:82) (1 − p y ) − (1 − y ) y j − dy = wN · (cid:80) ∞ k =0 p k ( j + k )( j + k +1) , (18) where f ( y ) ∼ g ( y ) as y → ∞ means lim y →∞ f ( y ) /g ( y ) = 1 .(2) Deﬁne τ N as in (16) , let S := { ( (cid:96), m ) : (cid:96), m ≥ and (cid:96) + m ≤ N } and A := { (0 , } ∪{ ( r, s ) : r, s ≥ and r + s = N } . Then, for any N ≥ , E [ S j ( τ N ) | τ N < ∞ ] = ( w/q ) · (cid:80) N − k =1 (1 − p N − k ) · h ( j,N − j )(1 ,k ) , ≤ j ≤ N, (19) where for ( r, s ) ∈ A , the vector (cid:0) h ( r,s )( (cid:96),m ) (cid:1) ( (cid:96),m ) ∈S solves the system ( (cid:96) + m )(1 + p ) h ( r,s )( (cid:96),m ) = (cid:96)h ( r,s )( (cid:96) +1 ,m ) + (cid:96)p h ( r,s )( (cid:96) − ,m ) + mh ( r,s )( (cid:96),m +1) + mp h ( r,s )( (cid:96),m − , (cid:96), m ≥ , (cid:96) + m < N, with boundary conditions given by (46) in Appendix B.Proof. Appendix B. 10 -8 -6 -4 -2 -2 -1 Figure 4: (a)

The expected ﬁxed-time spectrum (17) of Proposition 2 (solid blue line) showsgood agreement with the average spectrum of simulated tumors (grey dots) for the parameters N = 100, p = 0 . w = 1. We generated 10 tumors with these parameters and stoppedeach simulation at the ﬁxed time t N deﬁned in (15). At each cell division, the number ofmutations acquired by each daughter cell was generated as a Poisson random variable withmean w/ (b) The expected ﬁxed-size spectrum (19) of Proposition 2 (solid red line) showsgood agreement with the average spectrum of simulated tumors (grey dots) with the sameparameters as in (a). We again generated 10 tumors with these parameters, but this time,we stopped each simulation when the tumor reached size N . Note the discontinuity in theﬁxed-size spectrum at j = N , which is due to clonal mutations.Analogous results for continuous mutation accumulation are presented in Appendix C. InFigure 4, we compare our ﬁxed-time (17) and ﬁxed-size (19) results with simulation resultsfor w = 1, p = 0 . N = 100. The fundamental diﬀerence between the two spectra isthe same as we observed in Section 3.2. In Figure 5a, we compare them in more detail for N = 100 and p ∈ { , . , . , . } . The ﬁxed-time spectrum is a good approximation ofthe ﬁxed-size spectrum on j (cid:28) N for all but p = 0 .

9, in which case there is a signiﬁcantdiﬀerence even for j (cid:28) N . In Figure 5b, we show that this diﬀerence reduces as N increases,with the expected number of mutations found in one cell being 1.46% oﬀ for p = 0 . N = 1000. The number of cells in 1 cm of tumor tissue is around 10 − [42], in whichcase the two spectra can be expected to agree on j (cid:28) N for most values of p . They maydiverge, however, when applied to smaller tumor tissue samples (1 mm or smaller) or tumorsubclones, especially considering that the extinction probability can be as large as p = 0 . j (cid:28) N for N suﬃciently largeis that once the tumor has reached a large size, its growth becomes essentially deterministicwith exponential rate λ . Mutations that arise late in the development of a large tumortherefore behave largely the same whether we stop at the ﬁxed time t N or the stochastic time τ N . In fact, conditional on the nonextinction event Ω ∞ , τ N /t N → N → ∞ ,i.e. τ N ≈ t N for large N [43]. As indicated by (14), tumor growth is not exponential initially,and the eﬀect of the constant term p /q increases as p increases. Thus, the tumor entersits stable exponential growth phase later for larger values of p , which is why two spectracan diﬀer substantially for large values of p when N is small. In Appendix D, we present asimple heuristic argument for why the two spectra agree on j (cid:28) N , where we approximatethe probabilities h ( j,N − j )(1 ,k ) in (19) by continuous-time probabilities following [44].11 Figure 5: (a)

Here, we show the ratio between the ﬁxed-time spectrum (17) and the ﬁxed-sizespectrum (19) for N = 100, p ∈ { , . , . , . } and w = 1. The ﬁxed-time spectrum is agood approximation of the ﬁxed-size spectrum for smaller values of p , but the two spectrastart to diverge as p increases. (b) Here, we show the ratio between the expected numberof mutations found in one cell for the ﬁxed-time spectrum and the ﬁxed-size spectrum, asa function of the population size N , for p ∈ { . , . } and w = 1. As N increases, thediﬀerence between the two spectra reduces both for p = 0 . p = 0 . By Proposition 1 of Section 3.2, the SFS of the skeleton depends on the mutation rate w andthe extinction probability p only through the eﬀective mutation rate w/q . By Proposition 2,however, the two parameters decouple in the SFS of the total population. The mutation rate w merely scales the total population spectrum by a linear factor, and the same is true for alarge population size N , but the dependence on p is more complicated. To better understandhow p aﬀects the SFS, recall ﬁrst that by (18), the asymptotic expected ﬁxed-time spectrumis given by E [ S j ( t N ) | Z ( t N ) > ∼ wN · (cid:80) ∞ k =0 p k ( j + k )( j + k +1) , N → ∞ . In Appendix E, we show that for ﬁxed 0 < p <

1, sending j → ∞ in this expression yields wN · (cid:80) ∞ k =0 p k ( j + k )( j + k +1) ∼ ( w/q ) N · / ( j ( j + 1)) , j → ∞ , (20)which is the 1 /j power law (10) of the asymptotic skeleton spectrum of Proposition 1. Wealso show that for ﬁxed j ≥

1, sending p → wN · (cid:80) ∞ k =0 p k ( j + k )( j + k +1) ∼ wN · /j, p → , (21)which is the 1 /j power law of the constant-sized Moran model of population genetics, seee.g. Theorem 1.33 of [2]. When referring to results from [2], note that the population isassumed to have size 2 N , which is a common convention in population genetics.Recall that when p = 0, the spectrum of the total population ( Z ( t )) t ≥ is the spectrumof the skeleton ( ˜ Z ( t )) t ≥ . Expressions (20) and (21) suggest that when p >

0, the spectrumof ( Z ( t )) t ≥ continues to follow the 1 /j skeleton law at the large-frequency end, while itstarts to deviate from the 1 /j law at the small-frequency end. In fact, as p approaches 1,the small-frequency end transitions to the 1 /j law of the constant-sized Moran model. Thisis illustrated in Figure 6 for p ∈ { . , . , . } . Then, in Figure 7a, we show how for ﬁxed12 Figure 6: As the extinction probability p is increased from p = 0 . p = 0 .

95, the small-frequency end of the asymptotic expected ﬁxed-time spectrum (18) of Proposition 2 (solidblue line) transitions from the 1 /j power law of (20) (dotted red line) to the 1 /j power lawof (21) (dotted green line). Other parameters are N = 10 and w = 1. Note that the 1 /j law of (21) is ﬁxed as a function of p , while the 1 /j law of (20) increases with p . p = 0 .

99, the spectrum transitions from the 1 /j law at the large-frequency end to the 1 /j law at the small-frequency end. Note that the power laws of (20) and (21) intersect at j = 1 /q − , which gives an indication of the frequency at which the transition occurs. This intersectingpoint is independent of N , as is illustrated in Figure 7b (dotted vertical line).To understand this transition between power laws, note that mutations that occur earlyin the evolution of a large tumor are only detected if they occur on the skeleton, which iswhy mutations at large frequencies follow the 1 /j skeleton spectrum. For mutations thatoccur late, we need to consider both skeleton cells and ﬁnite-family cells. When p is large,most cells are ﬁnite-family cells (their long-run proportion is p ), and as p approaches 1,ﬁnite-family clones start to behave like a critical branching process with net growth rate 0.This is why late mutations follow the 1 /j law characteristic of a constant-sized population.Thus, even though the branching process dynamics of cell division and cell death are thesame throughout the evolution of the tumor, from the perspective of mutation accumulation,the tumor eﬀectively behaves like a pure-birth exponential growth process initially, and morelike a constant-sized process at the end, assuming that the extinction probability p is large.The transition between power laws is the transition between these two growth regimes. We next wish to quantify how p aﬀects overall mutation accumulation. To this end, wederive in Proposition 3 the expected total mutational burden of the tumor, both underthe ﬁxed-time and ﬁxed-size spectrum. This quantity indicates the genetic diversity of thetumor, which has implications e.g. for its adaptability under treatment. It also enablesus to compute a normalized version of the SFS, which is particularly useful for parameterestimation, as is discussed further in Sections 4.4 and 4.5 below. Before proceeding, deﬁne M j ( t ) := (cid:80) k ≥ j S k ( t ) as the number of mutations found in ≥ j cells at time t .13 -4 -2 -4 -2 Figure 7: (a)

For a ﬁxed, large value of the extinction probability p , the asymptotic expectedﬁxed-time spectrum (18) of Proposition 2 (solid blue line) transitions from the 1 /j powerlaw of (20) (dotted red line) at the large-frequency end to the 1 /j power law of (21) (dottedgreen line) at the small-frequency end. The parameters are N = 10 , p = 0 .

99 and w = 1.The dotted vertical line shows the intersecting point j = 1 /q − (b) When we increase the tumor size to N = 10 , the transition betweenpower laws occurs around the same value of j as before, j = 1 /q − Proposition 3. (1) For < p < , the expected total number of mutations in the ﬁxed-timespectrum is given by E [ M ( t N ) | Z ( t N ) >

0] = − wN · (1 /p ) log( q + p /N ) ∼ − wN · log( q ) /p , N → ∞ . (22) For p = 0 , E [ M ( t N ) | Z ( t N ) >

0] = w ( N − ∼ wN as N → ∞ .(2) Deﬁne S and A as in Proposition 2. The expected total number of mutations in theﬁxed-size spectrum is given by E [ M ( τ N ) | τ N < ∞ ] = ( w/q ) · (cid:80) N − k =1 (1 − p N − k ) (cid:0) − h (0 ,N )(1 ,k ) − h (0 , ,k ) (cid:1) , (23) where for ( r, s ) ∈ A , the vector (cid:0) h ( r,s )( (cid:96),m ) (cid:1) ( (cid:96),m ) ∈S solves the linear system in Proposition 2.For p = 0 , E [ M ( τ N ) | τ N < ∞ ] = w ( N − ∼ wN as N → ∞ .Proof. Appendix F.For ease of notation, write M := E [ M ( t N ) | Z ( t N ) >

0] as the expected total mutationalburden under the ﬁxed-time spectrum. We are interested in comparing M with the expectednumber of mutations under the 1 /j skeleton law of (20). To this end, deﬁne (cid:99) M := ( w/q ) N, (24)following (13). This simple estimate has been used e.g. in [21], where they obtain a rangeof estimates of the total number of mutations in a hepatocellular carcinoma (HCC) tumorunder diﬀerent assumptions on tumor evolution. The ratio of M and (cid:99) M is M / (cid:99) M = − ( q /p ) · log( q + p /N ) ∼ − ( q /p ) · log( q ) , N → ∞ . (25)14 .1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Figure 8: (a)

Ratio between the expected number of mutations in the ﬁxed-time spectrum, M := E [ M ( t N ) | Z ( t N ) >

0] given by (22), and the simple estimate (cid:99) M of (24), derivedfrom the 1 /j skeleton law of (20), as a function of p for N = 10 . The two estimatesagree for p = 0, but (cid:99) M becomes a signiﬁcant overestimate of M as p increases. (b) Theproportion of mutations found in one cell, S /M = ϕ N ( p ) as given by (27), quantiﬁes thetransition between the 1 /j and 1 /j power laws at the small-frequency end of the spectrum.This transition accelerates as p increases.In Figure 8a, we show this ratio as a function of p for N = 1000. The ratio is decreasing in p , it converges to 1 as p →

0, and it converges to 0 as p →

1. To give some examples, inthe N → ∞ limit, M / (cid:99) M = 0 .

46 for p = 0 .

75 and M / (cid:99) M = 0 .

047 for p = 0 .

99. Thus,if one is interested in estimating the total number of mutations in an exponentially growingtumor, using the simple expression (24) implied by the skeleton spectrum will result in asigniﬁcant overestimate when p is large. Indeed, as p increases, ﬁnite-family cells start todominate the population, and they accumulate mutations less eﬃciently than skeleton cells. At the extremes p = 0 and p = 1, the small-frequency end of the SFS is characterized bythe 1 /j skeleton law and the 1 /j constant-sized law, respectively. To better understand howthe small-frequency end behaves for intermediate values of p , we next determine the relativeproportion of mutations found at the very smallest frequency, i.e. in one cell. This metricquantiﬁes the transition between the 1 /j and 1 /j power laws, and it enables us to proposea simple estimator for p in Section 5 below. In Appendix G, we show that in the ﬁxed-timespectrum, for 0 < p <

1, the expected number of mutations found in one cell is given by E [ S ( t N ) | Z ( t N ) >

0] = wN · (1 /p ) (cid:0) − /N + ( q /p ) log( q + p /N ) (cid:1) ∼ wN · (1 /p )(1 + ( q /p ) log( q )) , N → ∞ . (26)As in Section 4.3, write S := E [ S ( t N ) | Z ( t N ) >

0] for ease of notation. Then deﬁne ϕ N ( p ) := S /M as the proportion of mutations found in one cell. By (22) and (26), ϕ N ( p ) = − (cid:0) − /N + ( q /p ) log( q + p /N ) (cid:1) (cid:14) log( q + p /N ) ∼ − (1 + ( q /p ) log( q )) / log( q ) , N → ∞ . (27)In Figure 8b, we show ϕ N = S /M as a function of p for N = 1000. This function is strictlydecreasing in p , it converges to 0 .

50 as p →

0, and it converges to (1 − /N ) / log( N ) as p →

1, which is of order 1 / log( N ) for N large. In the p → Figure 9: (a)

Histogram of S ( t N ) (cid:14) S over 10 simulation runs with N = 100, p = 0 . w = 1, where t N is deﬁned by (15), and S := E [ S ( t N ) | Z ( t N ) >

0] as given by (26). The y -axis is normalized so as to approximate the density of the underlying probability distri-bution. By comparison with the density x (cid:55)→ e − x , we see that S ( t N ) (cid:14) S is approximatelya mean-1 exponential random variable, which is consistent with the law of large numbers(28). (b) When the population size is increased to N = 1000, S ( t N ) (cid:14) S retains the mean-1exponential distribution, which is also consistent with (28). (c) Histogram of S ( τ N ) /S over10 simulation runs with N = 100, p = 0 . w = 1, where τ N is deﬁned by (16). (d) Same as in (c), except now N = 1000. Together, (c) and (d) show that the ratio S ( τ N ) (cid:14) S concentrates around 1 as N increases, which is consistent with the law of large numbers (29).population ( Z ( t )) t ≥ is the SFS of the skeleton ( ˜ Z ( t )) t ≥ , in which case half the mutationsare found in one cell by Section 3.3. In the p → Z ( t )) t ≥ is the SFS ofthe constant-sized Moran model, in which case the expected number of mutations is of order wN (cid:80) Nj =1 /j ∼ wN log( N ) for N large, and the proportion of mutations found in one cell isof order 1 / log( N ). Note that the rate of change ϕ N ( p ) increases as p increases (Figure 8b).This means that the deviation from the 1 /j skeleton law is initially slow for small values of p , but it accelerates as p increases and transitions quickly to the 1 /j law for large valuesof p . It also means that S /M is more useful for distinguishing larger values of p thansmaller values, as will become more apparent in Section 5 below. The results of Proposition 2 hold in expectation, meaning that they apply to the average SFScomputed over a large number of tumors. If we want to use these results to understand theevolutionary history of individual tumors, we need to know more about how well they apply16 Figure 10: (a)

Histogram of ( S ( t N ) /M ( t N )) (cid:14) ( S /M ) over 10 simulation runs with N =100, p = 0 . w = 1, where t N is deﬁned by (15), and the expected ratio S /M isgiven by (27). Note the point masses at 0 and 2.44, which represent simulation runs where S ( t N ) /M ( t N ) = 0 and S ( t N ) /M ( t N ) = 1, respectively. (b) Same as in (a), except now, N = 1000. Together, (a) and (b) show that as N increases, S ( t N ) /M ( t N ) concentratesaround S /M , which is consistent with the law of large numbers (28). It indicates thatwhen the ﬁxed-time spectrum is normalized by the number of mutations, it has the samedeterministic law-of-large-numbers limit as the normalized ﬁxed-size spectrum.on a tumor-by-tumor basis. It is well-known that conditional on the nonextinction eventΩ ∞ , Z ( t ) ∼ Y e λ t almost surely as t → ∞ , where Y follows the exponential distributionwith mean 1 /q (Theorem 1 of [33]). In words, the tumor population Z ( t ) eventually growsat exponential rate λ , but the initial value of the long-run exponential growth function israndom and depends on the individual tumor. We can use this fact to formulate laws of largenumbers for the ﬁxed-time and ﬁxed-size spectrum, which we state formally as a conjecture.In Appendix H, we present simple calculations in support of this result, and we also provean analogous result (51) for a simpliﬁed, semideterministic version of our model. Conjecture. (1) Deﬁne t N as in (15) . Then, there exists an exponential random variable X with mean 1 so that for any j ≥ , conditional on Ω ∞ , S j ( t N ) ∼ X · wN · (cid:82) (1 − p y ) − (1 − y ) y j − dy, (28) in probability or almost surely as N → ∞ .(2) Deﬁne τ N as in (16) . Then, for any j ≥ , conditional on Ω ∞ , S j ( τ N ) ∼ wN · (cid:82) (1 − p y ) − (1 − y ) y j − dy, (29) in probability or almost surely as N → ∞ . Both the ﬁxed-time result (28) and the ﬁxed-size result (29) agree with simulation results,see Figures 9 and 10. The diﬀerence between the two results is the random scaling factor X in (28), which reﬂects the fundamental diﬀerence between the ﬁxed-time and ﬁxed-sizespectrum, i.e. that the tumor size is variable at time t N , while it is always N at time τ N .Note that since E [ X ] = 1, the right-hand sides of (28) and (29) agree in the mean, andthis mean agrees with the right-hand side of the asymptotic spectrum (18) of Proposition 2.Importantly, the scaling factor X in (28) is independent of j , so the proportion of mutations17ound in j cells is the same in (18), (28) and (29). Thus, if we normalize the SFS with the totalnumber of mutations, S j ( t ) /M ( t ), the ﬁxed-time and ﬁxed-size spectrum of an individuallarge tumor will be completely characterized by the asymptotic expected spectrum (18) ofProposition 2 (Figure 10). In particular, inference made from the normalized SFS of anindividual large tumor is robust to whether it is observed at a ﬁxed time or a ﬁxed size. Wewill make use of this fact in our examination of a simple estimator for p in Section 5. In this section, we use our theoretical results to propose a simple estimator for the extinctionprobability p , based on extracting one or more spatially separated subclones from a tumor,either clinically obtained or derived in vitro or in vivo . By a subclone, we mean all currentlyliving descendants of a given common ancestor, i.e. all leaves of the branching tree startedby a given tumor cell. We emphasize that since every clone or subclone derived from asingle tumor cell obeys the same branching process dynamics as the overall tumor, all of ourprevious results can be applied to individual subclones.Say that we sample a subclone of size n . For 1 ≤ j ≤ n −

1, let s j be the number ofmutations found in j cells of the subclone, and let m := (cid:80) n − j =1 s j be the total number ofmutations. Here, we ignore mutations found in all cells of the subclone, since they include(i) mutations that accumulate prior to initiation of the tumor as a whole, (ii) mutations thatoccur post-tumor-initiation but prior to initiation of the subclone, and (iii) mutations thatoccur post-subclone-initiation but still end up in all subclone cells. Let s be the expectednumber of mutations in the subclone under the ﬁxed-time spectrum, and let m be theexpected total number of mutations under the ﬁxed-time spectrum. By (27), we can write s /m = ϕ n ( p ) , where ϕ n ( p ) is continuous and strictly decreasing in p . In particular, ϕ n ( p ) is invertible.This implies that given s and m , p can be recovered from this expression via p = ϕ − n ( s /m ) . For the sampled values s and m , this suggests the following estimator for p : (cid:98) p ( n ) := ϕ − n ( s /m ) . (30)Of course, if the subclone is sampled at a certain size, it makes more sense to use the ﬁxed-sizespectrum than the ﬁxed-time spectrum. In addition, m excludes mutations found in all cellsof the subclone, while m includes some of these mutations. These potential sources of errorare minor and can easily be resolved if necessary, as we discuss in more detail below. Notethat the ratio of expected values s /m takes values in [(1 − /n ) / log( n ) , /

2] by Section4.4, whereas due to stochasticity, the sampled ratio s /m can take any value in [0 , ϕ − n by setting ϕ − n ( x ) := (cid:40) , / ≤ x ≤ , , ≤ x ≤ (1 − /n ) / log( n ) . (31)For example, if we observe a ratio s /m larger than 1/2, we default to the estimate (cid:98) p ( n ) = 0,since the expected ratio s /m is largest (and equal to 1 /

2) for p = 0. Then, once p hasbeen estimated, an estimate for the mutation rate w can be obtained from (26) or (22).18 Figure 11: (a)

Histogram of the estimator (cid:98) p ( n ) deﬁned in (30) computed across 10 syntheticsubclone samples of size n = 200, given true values p ∈ { . , . , . } and w = 1. (b) Sameas in (a), except now n = 1000. Together, (a) and (b) show that as the size of the subcloneis increased, the estimator (cid:98) p ( n ) concentrates around the true value of p , which indicatesstatistical consistency of the estimator. They also show that the estimator gives the mostaccurate prediction of p when p is large.The estimator (cid:98) p ( n ) has the beneﬁt of being simple to deﬁne and to compute. However, aswas mentioned above, it may make more sense to use the ﬁxed-size spectrum than the ﬁxed-time spectrum, and m includes clonal mutations that arise post-initiation of the subclone,whereas m excludes these mutations. The second potential source of error is minor in mostcases, and it can easily be removed simply by subtracting from m the contribution frommutations shared by all subclone cells. The ﬁrst potential source of error is also insigniﬁcantwhen n is large, since we observed in Section 4.5 that the normalized spectrum of an individuallarge tumor or tumor subclone is robust to whether it is observed at a ﬁxed time or a ﬁxedsize. For smaller values of n , one can replace s and m by the corresponding quantities (19)and (23) for the ﬁxed-size spectrum, which we denote here by s and m . It remains truethat we can write s /m = ψ n ( p ) for some function ψ n of p , which allows us to deﬁne anestimator for p as before. However, the ﬁxed-size estimator has to be obtained numerically,e.g. by precomputing ψ n ( p ) over a grid of values of p , and minimizing the error betweenthe observed ratio s /m and the expected ratio s /m over the grid.To further evaluate (cid:98) p ( n ), we use computer simulations to generate multiple independentsubclones of size n with true extinction probability p , and for each generated subclone, wecompute the estimate (cid:98) p ( n ). In Figure 11a, we show a histogram for (cid:98) p ( n ) across 10 syntheticsubclone samples of size n = 200 with true extinction probabilities p ∈ { . , . , . } . InTable 1, we show performance metrics for (cid:98) p ( n ) computed across the 10 samples. For p = 0 .

5, the estimator defaults to (cid:98) p ( n ) = 0 for 4.6% of the subclone samples, for p = 0 . p mean median std. error defaults to 0 (%) defaults to 1 (%)200 0.5 0.4694 0.5041 0.2099 0.0459 00.7 0.6755 0.7045 0.1505 0.0026 00.9 0.8839 0.9001 0.0725 0 0.000021000 0.5 0.4921 0.5000 0.0948 0.00002 00.7 0.6957 0.7010 0.0603 0 00.9 0.8979 0.9009 0.0266 0 0Table 1: Performance metrics for the the estimator (cid:98) p ( n ) computed from the data thatunderlies Figure 11. The standard error is the sample standard deviation from the mean.Both for n = 200 and n = 1000, the median estimate of (cid:98) p ( n ) accurately recovers the truevalue of p , and the estimate improves in terms of standard error both as n increases and as p increases.it defaults to 0 in 0.3% of cases, and for p = 0 .

9, it never defaults to 0. For all values of p , themedian estimate of (cid:98) p ( n ) accurately recovers the true value, and as p increases, the qualityof the estimate improves in terms of standard error (standard deviation from the mean). InFigure 11b, we increase the subclone size to n = 1000 and observe a marked improvementin the quality of (cid:98) p ( n ). This indicates statistical consistency of the estimator, meaning that (cid:98) p ( n ) → p in probability as n → ∞ , which would also be a direct consequence of the law oflarge numbers (28) and the continuous mapping theorem. In words, the estimator recoversthe true value of p with arbitrarily high precision given a suﬃciently large subclone. Notethat the size of n required to return a high-precision estimate of (cid:98) p ( n ) becomes smaller as p increases, making (cid:98) p ( n ) especially useful when p is large. Indeed, as we remarked in Section4.4, the rate of change of the expected ratio s /m increases as p increases, making it moreuseful for distinguishing between larger values of p than smaller values.Whenever it is possible to do multi-region sampling, there may be beneﬁts from extractingmultiple small, spatially separated subclones over a single large one. In this more generalsetting, we sample K ≥ n . For 1 ≤ j ≤ n −

1, let s kj be the number ofmutations found in j cells of subclone number k , and let m k := (cid:80) n − j =1 s kj be the total numberof mutations in subclone k . We replace s and m in the deﬁnition of (cid:98) p ( n ) in (30) by the Figure 12: Histogram of the estimator (cid:98) p ( n, K ) of (32) computed across 10 synthetic samplesof K = 5 subclones of size n = 200, given true values p ∈ { . , . , . } and w = 1. Note thesimilarity between these histograms and the histograms in (b) of Figure 11, which indicatesthat sampling ﬁve subclones of size 200 gives a comparable estimate of p to sampling a singlesubclone of size 1000, using the estimator (cid:98) p ( n, K ) in both cases.20 n p mean median std. error defaults to 0 (%) defaults to 1 (%)5 200 0.5 0.4965 0.5050 0.0957 0.00013 00.7 0.7000 0.7055 0.0612 0 00.9 0.8983 0.9014 0.0279 0 0Table 2: Performance metrics for the the estimator (cid:98) p ( n, K ) computed from the data thatunderlies Figure 12. The standard error is the sample standard deviation from the mean.Note the similarity of these metrics with the lower half of Table 1.sums (cid:80) Kk =1 s k and (cid:80) Kk =1 m k to obtain the estimator (cid:98) p ( n, K ) := ϕ − n (cid:0) (cid:80) Kk =1 s k (cid:14) (cid:80) Kk =1 m k (cid:1) . (32)Of course, (cid:98) p ( n,

1) = (cid:98) p ( n ). In Figure 12, we show a histogram for (cid:98) p ( n, K ) evaluated across10 synthetic samples, each sample consisting of K = 5 independent subclones of size n = 200.In Table 2, we show performance metrics computed across the 10 samples. Qualitatively, thehistograms in Figure 12 are very similar to the histograms of Figure 11b, and quantitatively,the performance metrics in Table 2 mimic those for the n = 1000 case in Table 1. In otherwords, the quality of the estimate of p obtained from sampling one subclone of size 1000 iscomparable to the one obtained from sampling ﬁve subclones of size 200. In this scenario,it may make more sense to extract multiple small subclones than one large one, since it isimpossible to tell from a single subclone sample alone whether the tumor as a whole can beconsidered as neutrally evolving. Should there be diﬀerences in the subclone dynamics, themultiregion sample will tease this out, and should the dynamics be the same, the estimateone obtains for p will be of comparable quality to the single large subclone case. In this work, we have established exact expressions for the expected site frequency spectrumof a tumor, or more generally any population of individuals, that evolves according to abranching process with neutral mutations under the inﬁnite-sites assumption of populationgenetics. We ﬁrst considered the skeleton subpopulation, consisting of cells with an inﬁniteline of descent, and obtained explicit expressions for SFS of the skeleton evaluated both ata ﬁxed time and a ﬁxed size. We then examined the total population, deriving an explicitexpression for the ﬁxed-time spectrum and a computational expression for the ﬁxed-sizespectrum. Our results apply to mutations at all frequencies, to tumor tissue samples andtumor subclones of any size, and to all values of the extinction probability p , even valuesas large as p = 0 .

90 and above, which are broadly relevant for cancer. Our work thereforeoﬀers a complete and rigorous characterization of the expected SFS of our branching processmodel of Section 2, and it contains results previously obtained in the literature as special orlimiting cases, as we outline in detail in the next two paragraphs.First, to compare our expected ﬁxed-time and ﬁxed-size skeleton spectrum, (9) and (11) ofProposition 1, to results previously obtained by Durrett [35, 33], Bozic et al. [36] and Williamset al. [22], note that for ﬁxed 0 ≤ f <

1, the expected number of subclonal mutations foundat frequency ≥ f in the population is, in the N → ∞ limit, E (cid:2) (cid:80) N − j = (cid:100) Nf (cid:101) ˜ S j (˜ t N ) (cid:12)(cid:12) Ω ∞ (cid:3) ∼ ( w/q ) N · (cid:80) N − j = (cid:100) Nf (cid:101) / ( j ( j + 1)) = ( w/q )(1 /f − . (33)21his is the cumulative spectrum of subclonal mutations obtained in the above works. Theresult in Durrett [35, 33] is given for continuous mutation accumulation, and it includes clonalmutations, which by (47) of Appendix C requires adding ν/λ = w/q mutations to (33). Thisyields ( ν/λ )(1 /f ) as the cumulative spectrum, see Theorem 1 of [35] and Theorem 2 of [33].Under discrete mutation accumulation, the number of clonal mutations is wp /q by (11) ofProposition 1, yielding the cumulative spectrum ( w/q )(1 /f ) − w including clonal mutations.The diﬀerence of w reﬂects the diﬀerence in the number of clonal mutations between thediscrete and continuous model, see Appendix C. Thus, our skeleton results, which hold for allvalues of p and N , for discrete or continuous mutation accumulation, evaluated at a ﬁxedtime or a ﬁxed size, contain the results of [35, 33, 36, 22] as asymptotic byproducts.The expected ﬁxed-time spectrum of the total population, (17) and (18) of Proposition 2,was previously obtained by Ohtsuki & Innan in [34], under the assumption of deterministicgrowth of the tumor bulk and stochastic growth of mutant subclones, which is a reasonableapproximation when p is small. We have established (17) and (18) for the fully stochasticmodel and for all values of p by conditioning on tumor survival. We have also conjecturedthe laws of large numbers (28) and (29) for the ﬁxed-time and ﬁxed-size spectrum, which aresupported by heuristic calculations and simulation results. We have proved a law of largenumbers (51) for the semideterministic model of [34] in Appendix H, and we plan to prove thefully stochastic results (28) and (29) in a future work. The ﬁxed-time law of large numbers hasa stochastic limit, while the ﬁxed-size limit is deterministic. However, if we normalize eachspectrum by the total number of mutations, they converge to the same deterministic limit.This implies that analysis of the normalized spectrum of an individual large tumor is robustto whether it is observed at a ﬁxed time or a ﬁxed size. The ﬁxed-size law of large numbers(29) agrees with Theorem 2.3 of Lambert [45], who considers a sample of individuals from acoalescent point process, a framework under which an inﬁnite extant population is endowedwith a coalescent structure that speciﬁes how lineages coalesce when traced backwards intime. The expected ﬁxed-size spectrum (19) of Proposition 2 is new as far as we know, aswell as expressions (22) and (23) for the expected total mutational burden of Proposition 3.All of the above results apply to the inﬁnite-sites mutation model of population genetics.Cheek & Antal [46] have recently examined the SFS of an exponentially growing tumor with-out this assumption, citing single-cell sequencing results of Kuipers et al. [47] as motivation.They observe that if recurrent mutations are allowed, and there are S sites in the genome, theexpected SFS at time t can be computed as S P ( Y ( t ) = j ), where Y ( t ) is the number of mu-tants at time t in a two-type model of wild-type and mutant cells, each growing at the samerate. Then, to compute the SFS, one needs the distribution of Y ( t ), which has been obtainedunder various simplifying assumptions, e.g. by Kessler & Levine [48] under the assumptionof no cell death, and by Keller & Antal [49] under deterministic growth of the wild-types.Yet other authors have substituted the inﬁnite-sites model with the inﬁnite-alleles model,under which each new mutation creates a new type of individual, see e.g. [50, 51, 52]. Underthis model, the site frequency spectrum is usually replaced by the allele frequency spectrum ,which tracks frequencies of genetically distinct individuals, known as haplotypes .Our complete theoretical results give rise to several important insights. First of all,whereas the ﬁxed-time and ﬁxed-size skeleton spectrum depends on the mutation rate w and the extinction probability p only through the eﬀective mutation rate w/q , the twoparameters decouple in the total population spectrum. The mutation rate w merely scalesthe spectrum linearly, whereas the extinction probability p changes its shape at the small-frequency end. In fact, as p increases from 0 to 1, the small-frequency end of the spectrumtransitions from the 1 /j power law characteristic of a pure-birth exponential growth to the22 /j law characteristic of constant-sized populations. We examined the simple metrics M / (cid:99) M and S /M that quantify this transition, where (cid:99) M is the expected total mutational burdenunder the 1 /j power law spectrum. We saw that M / (cid:99) M → p →

1, which suggests thatthe simple estimate (cid:99) M of the total number of mutations, applied e.g. in [21], is a signiﬁcantoverestimate of the actual number of mutations M when p is large. We ﬁnally used themetric S /M to propose a simple estimator for p , based on sampling one or more spatiallyseparated subclones from a tumor. This estimator accurately recovers the true value of p from synthetic single-cell sequencing data, and it is most accurate when p is large.Our work reveals the information contained in the small-frequency end of the spectrum,on which most mutations are found. Accurately capturing mutations at small frequenciesremains a technical challenge, even with the rapid advancement of single-cell sequencingtechnology. Current technology only reveals a small proportion of mutations in the tumor ortumor subclone, with large-frequency mutations most likely to be detected. Some of the abovementioned works have considered the site frequency spectrum of a small random sample ofcells from a large tumor, see e.g. [35, 33, 34], and the more recent work [31] considers samplingDNA reads from the genomes of cells, which is a more relevant sampling scheme for currentsequencing data. In a small random sample of cells, mutations shared by two or more cellsin the sample are likely to have arisen early in tumor development, which means they willlikely have occurred on the skeleton. As a result, the spectrum of a small random sampleusually only contains information about the eﬀective mutation rate w/q . This fact has beenused to estimate w/q e.g. in [21, 22, 36] as mentioned in the introduction. Whether ourproposed method of decoupling w and p using the small-frequency end of the spectrumbecomes tenable with higher-quality genomic data remains to be seen. In the meanwhile,it is likely necessary to go beyond the site frequency spectrum, which is indeed a crudesummary statistic of the data, and use e.g. phylogenetic reconstruction to infer ancestralrelationships, as in [37], and/or to complement genomic data gene expression data that canprovide independent quantitative insights into the proliferation activity of extant cells. Acknowledgements

EBG and KL were supported in part by NSF grant CMMI-1362236. EBG and JF weresupported in part by NSF grant DMS-1349724.

Competing Interests

The authors declare no competing interests.

A Proof of Proposition 1

Proposition 1. (1) Deﬁne ˜ t N as in (7) . Then, for any N ≥ and any j ≥ , E [ ˜ S j (˜ t N )] = ( w/q ) N · (cid:82) − /N (1 − y ) y j − dy. For any j ≥ , then as N → ∞ , E [ ˜ S j (˜ t N )] ∼ ( w/q ) N · / ( j ( j + 1)) , where f ( y ) ∼ g ( y ) as y → ∞ means lim y →∞ f ( y ) /g ( y ) = 1 .

2) Deﬁne ˜ τ N as in (8) . Then, for any N ≥ , E [ ˜ S j (˜ τ N )] = (cid:40) ( w/q ) · (cid:80) N − max(2 ,j ) k =1 kN − j (cid:81) j − n =1 (1 − kN − n ) + wδ ,j , ≤ j ≤ N − ,wp /q = w/q − w, j = N, where (cid:81) ∅ := 1 and δ (cid:96),m = 1 if (cid:96) = m and δ (cid:96),m = 0 otherwise.Proof. (1) By (3) in the main text, the mutation rate per skeleton cell per time unit is wr .To stratify mutations based on their frequencies at time ˜ t N , we deﬁne˜ p j ( s ) := P ( ˜ Z ( s ) = j | ˜ Z (0) = 1) , j ≥ , s ≥ , as the size-distribution at time s of a single-cell derived skeleton clone. Since ( ˜ Z ( t )) t ≥ is a Yule process with birth rate λ , this distribution has an explicit expression,˜ p j ( s ) = (1 /e λ s )(1 − /e λ s ) j − , j ≥ , which is the geometric distribution with support { , , . . . } and success probability 1 /e λ s ,see e.g. Section 3 of [33] (the support does not include 0 since skeleton clones do not goextinct). For 0 ≤ t ≤ ˜ t N , let ˜ S j, ˜ t N ( t ) denote the number of mutations that accumulatein the time interval [0 , t ] and are found in j ≥ t N . We write˜ S j (˜ t N ) := ˜ S j, ˜ t N (˜ t N ) for the site frequency spectrum of the skeleton at time ˜ t N . If amutation occurs during an inﬁnitesimal time interval [ t, t + ∆ t ], the clone started by thecell carrying the mutation has size j at time ˜ t N with probability ˜ p j (˜ t N − t ) + O (∆ t ),where f ( x ) = O ( x ) means that there exists C > | f ( x ) | ≤ Cx for suﬃcientlylarge x . The expected number of mutations that accumulate in [ t, t + ∆ t ] and are presentin j ≥ t N is therefore E [ ˜ S j, ˜ t N ( t + ∆ t )] − E [ ˜ S j, ˜ t N ( t )] = wr e λ t ∆ t · ˜ p j (˜ t N − t ) + o (∆ t ) , where we use that E [ ˜ Z ( t )] = e λ t is the mean skeleton size at time t , and f ( x ) = o ( x )means that f ( x ) /x → x →

0. A more precise mathematical argument for thisstatement is given in the proof of part (1) of Proposition 2 in Appendix B. Integratingover time, and using that q = λ /r by expression (1) and N = e λ ˜ t N by expression (7),we obtain E [ ˜ S j (˜ t N )] = (cid:82) ˜ t N wr ˜ p j (˜ t N − t ) e λ t dt = ( w/q ) N · (cid:82) ˜ t N ( e λ t /N )(1 − e λ t /N ) j − · λ ( e λ t /N ) dt. Substituting y = 1 − e λ t /N , dy = − λ ( e λ t /N ) dt , this implies E [ ˜ S j (˜ t N )] = ( w/q ) N · (cid:82) − /N (1 − y ) y j − dy, (34)the desired result. To obtain the asymptotic expression (10), it suﬃces to note that (cid:82) − /N (1 − y ) y j − dy = (cid:0) − N ) j (cid:0) j ( j +1) + N j +1 (cid:1) . (2) In (5) of Section 3.1, we showed that wp /q mutations on average accumulate on type-1divisions in between two type-2 divisions, while type-2 divisions add w mutations onaverage and change the skeleton level. Since the type-1 mutations on level k = 1 are24igure 13: A diagram of the discrete-time Markov chain on { ( (cid:96), m ) : (cid:96), m ≥ , (cid:96) + m ≤ N } which tracks the number of mutated and non-mutated cells at each skeleton level. Eachtype-2 division increases the population level by one, and since skeleton cells do not die, thechain never returns to the lower levels. The states ( (cid:96), N − (cid:96) ) with 1 ≤ (cid:96) ≤ N are absorbing(dashed box), since we are only interested in the evolution up until skeleton level N .the clonal mutations, the expected number of clonal mutations is wp /q , which is the j = N case of the desired result. For k = 2 , . . . , N −

1, however, the expected numberof mutations on level k is wp /q + w = w/q , which includes the type-2 division thatstarts the level. For 1 ≤ j ≤ N −

1, let ˜ h j (1 ,k − be the probability that starting withone mutated and k − j mutated cells when theskeleton reaches size N . Since for levels k = 2 , . . . , N −

1, there are w/q mutationson average per level, and each mutation on level k contributes ˜ h j (1 ,k − to the expectednumber of mutations found in j cells at level N , we obtain E [ ˜ S j (˜ τ N )] = ( w/q ) (cid:80) N − k =2 ˜ h j (1 ,k − + wδ ,j = ( w/q ) (cid:80) N − k =1 ˜ h j (1 ,k ) + wδ ,j , ≤ j ≤ N − , where the extra wδ ,j term is due to mutations that occur on the ﬁnal type-2 divisionthat changes levels from N − N , each of which is found in one skeleton cell. Theabove informal argument is carried out in more mathematical detail in the proof of part(2) of Proposition 2 in Appendix B.It remains to compute the probabilities ˜ h j (1 ,k ) . To this end, deﬁne a two-dimensionaldiscrete-time Markov chain on the state space { ( (cid:96), m ) : (cid:96), m ≥ , (cid:96) + m ≤ N } , where thestate ( (cid:96), m ) keeps tracks of the number of mutants and non-mutants on population level (cid:96) + m . Since each skeleton cell, mutated or non-mutated, divides into two cells at rate λ , the transition probabilities for this chain are given by( (cid:96), m ) → ( (cid:96) + 1 , m ) w.p. (cid:96)/ ( (cid:96) + m ) , ( (cid:96), m ) → ( (cid:96), m + 1) w.p. m/ ( (cid:96) + m ) , for (cid:96), m ≥ (cid:96) + m < N . The states ( (cid:96), N − (cid:96) ) for 1 ≤ (cid:96) ≤ N − h r ( (cid:96),m ) denote the probability that the above chain is absorbed at state ( r, N − r )when started from state ( (cid:96), m ). It is immediate that ˜ h r ( (cid:96),m ) = 0 if (cid:96) > r or m > N − r .For ( (cid:96), m ) with (cid:96) ≤ r , m ≤ N − r and (cid:96) + m < N , by conditioning on whether the25rst transition out of state ( (cid:96), m ) is to ( (cid:96) + 1 , m ) or ( (cid:96), m + 1), we obtain the followingrecursion for ˜ h r ( (cid:96),m ) : ( (cid:96) + m )˜ h r ( (cid:96),m ) = (cid:96) ˜ h r ( (cid:96) +1 ,m ) + m ˜ h r ( (cid:96),m +1) . (35)The boundary conditions are ˜ h r ( (cid:96),N − (cid:96) ) = δ (cid:96),r for 1 ≤ (cid:96) ≤ N −

1. We can actually compute˜ h r ( (cid:96),m ) directly as the sum of probabilities of all possible paths from ( (cid:96), m ) to ( r, N − r ),without using the above recursion. By noting that there are (cid:0) N − ( (cid:96) + m ) r − (cid:96) (cid:1) possible paths,and that each path has the same probability, we obtain˜ h r ( (cid:96),m ) = (cid:0) N − ( (cid:96) + m ) r − (cid:96) (cid:1) · (cid:81) r − (cid:96) − n =0 (cid:96) + n(cid:96) + m + n · (cid:81) N − m − r − n =0 m + nr + m + n , (36)with (cid:81) ∅ := 1. As veriﬁcation, it is straightforward to check that (36) solves (35).To obtain ˜ h j (1 ,k ) for 1 ≤ k ≤ N − ≤ j ≤ N −

1, note ﬁrst that ˜ h j (1 ,k ) = 0 for k > N − j . For 1 ≤ k ≤ min( N − j, N − h j (1 ,k ) = kN − j · (cid:81) j − n =1 (1 − kN − n ) . Thus, ﬁnally, for 1 ≤ j ≤ N − E [ ˜ S j (˜ τ N )] = ( w/q ) (cid:80) N − k =1 ˜ h j (1 ,k ) + wδ ,j = ( w/q ) (cid:80) min( N − j,N − k =1 kN − j (cid:81) j − n =1 (1 − kN − n ) + wδ ,j = ( w/q ) (cid:80) N − max(2 ,j ) k =1 kN − j (cid:81) j − n =1 (1 − kN − n ) + wδ ,j . B Proof of Proposition 2

Proposition 2. (1) Deﬁne t N as in (15) . Then, for any N ≥ and j ≥ , E [ S j ( t N ) | Z ( t N ) >

0] = wN · (cid:82) − /N (1 − p y ) − (1 − y ) y j − dy. For any j ≥ , then as N → ∞ , E [ S j ( t N ) | Z ( t N ) > ∼ wN · (cid:82) (1 − p y ) − (1 − y ) y j − dy = wN · (cid:80) ∞ k =0 p k ( j + k )( j + k +1) , where f ( y ) ∼ g ( y ) as y → ∞ means lim y →∞ f ( y ) /g ( y ) = 1 .(2) Deﬁne τ N as in (16) , let S := { ( (cid:96), m ) : (cid:96), m ≥ and (cid:96) + m ≤ N } and A := { (0 , } ∪{ ( r, s ) : r, s ≥ and r + s = N } . Then, for any N ≥ , E [ S j ( τ N ) | τ N < ∞ ] = ( w/q ) · (cid:80) N − k =1 (1 − p N − k ) · h ( j,N − j )(1 ,k ) , ≤ j ≤ N, where for ( r, s ) ∈ A , the vector (cid:0) h ( r,s )( (cid:96),m ) (cid:1) ( (cid:96),m ) ∈S solves the system ( (cid:96) + m )(1 + p ) h ( r,s )( (cid:96),m ) = (cid:96)h ( r,s )( (cid:96) +1 ,m ) + (cid:96)p h ( r,s )( (cid:96) − ,m ) + mh ( r,s )( (cid:96),m +1) + mp h ( r,s )( (cid:96),m − , (cid:96), m ≥ , (cid:96) + m < N, with boundary conditions given by (46) in Appendix B. roof. (1) We begin by deﬁning p j ( s ) := P ( Z ( s ) = j | Z (0) = 1) , j ≥ , s ≥ , as the size-distribution at time s of a single-cell derived clone. This distribution has anexplicit expression: Setting g ( t ) := p ( e λ t − e λ t − p and h ( t ) := e λ t − e λ t − p , we can write p ( t ) = P ( Z ( t ) = 0 | Z (0) = 1) = g ( t ) ,p j ( t ) = P ( Z ( t ) = j | Z (0) = 1) = (1 − g ( t ))(1 − h ( t ))( h ( t )) j − , j ≥ , see e.g. (8) of [33]. Simplifying, we obtain p ( t ) = p ( e λ t − e λ t − p ,p j ( t ) = q e λ t ( e λ t − p ) · (cid:16) e λ t − e λ t − p (cid:17) j − , j ≥ . (37)The probability that a single-cell derived clone is still alive at time t is then given by P ( Z ( t ) > | Z (0) = 1) = 1 − p ( t ) = q e λ t / ( e λ t − p ) . (38)For 0 ≤ t ≤ t N , let S j,t N ( t ) denote the number of mutations that accumulate in [0 , t ] andare found in j ≥ t N . We write S j ( t N ) := S j,t N ( t N ) for the site frequencyspectrum at time t N . Say a cell division occurs in an inﬁnitesimal time interval [ t, t + ∆ t ].The division results in w mutations on average, each assigned to one of the two daughtercells, and the clone started by this cell has size j ≥ t N with probability p j ( t N − t ) + O (∆ t ). We wish to show that on the event { Z ( t N ) > } of survival ofthe population up until time t N , the expected number of mutations that accumulate in[ t, t + ∆ t ] and are found in j ≥ t N is E (cid:2)(cid:0) S j,t N ( t + ∆ t ) − S j,t N ( t ) (cid:1) { Z ( t N ) > } (cid:3) = wr e λ t ∆ t · p j ( t N − t ) + o (∆ t ) , (39)where we use that E [ Z ( t )] = e λ t . It will then follow from (39) that E (cid:2) S j,t N ( t + ∆ t ) − S j,t N ( t ) (cid:12)(cid:12) Z ( t N ) > (cid:3) = wr e λ t ∆ t · p j ( t N − t ) / (1 − p ( t N )) + o (∆ t ) . (40)Note that the right-hand side of (39) is the number of mutations found in j cells at time t N for the semideterministic model in which the tumor bulk grows deterministically atrate λ , mutant clones arise at stochastic rate wr , and mutant clones grow stochastically.It is not obvious that (39) should hold for the fully stochastic model, as including theevent { Z ( t N ) > } of survival up until time t N should presumably aﬀect the populationsize at time t ≤ t N . The key is to observe that if a mutation occurs on a cell divisionat time t and ends up in j ≥ t N , the population is automatically alive attime t N . The relevant survival event in (39) is therefore { Z ( t ) > } , and the relevantpopulation size factor is E [ Z ( t )1 { Z ( t ) > } ] = E [ Z ( t )] = e λ t . To not distract furtherfrom the main calculations, we assume the reader is willing to accept (39) as true for themoment, and we provide a detailed mathematical argument at the end of the proof.27sing (40), we can integrate over time to obtain E (cid:2) S j ( t N ) | Z ( t N ) > (cid:3) = (1 − p ( t N )) − · (cid:82) t N wr e λ t p j ( t N − t ) dt. (41)Focusing on the integral, we write (cid:82) t N wr e λ t p j ( t N − t ) dt = ( w/q ) · (cid:82) t N q e λ tN e λ t ( e λ tN − p e λ t ) · (cid:0) e λ tN − e λ t e λ tN − p e λ t (cid:1) j − · λ e λ t dt = wq e λ t N · (cid:82) t N e λ t ( e λ tN − p e λ t ) · (cid:0) e λ tN − e λ t e λ tN − p e λ t (cid:1) j − · λ e λ t dt. Set L := e λ t N . Using the substitution x := e λ t , dx = λ e λ t , we obtain (cid:82) t N wr e λ t p j ( t N − t ) dt = wq L · (cid:82) L x ( L − p x ) · (cid:0) L − xL − p x (cid:1) j − dx. We again change variables, this time y := ( L − x ) / ( L − p x ), in which case x = L (1 − y ) / (1 − p y ) ,dx = − q L/ (1 − p y ) dy,L − p x = q L/ (1 − p y ) , and y = ( L − / ( L − p ) = 1 − q / ( L − p ) for x = 1 and y = 0 for x = L , which implies (cid:82) t N wr e λ t p j ( t N − t ) dt = wL · (cid:82) − q / ( L − p )0 (1 − p y ) − (1 − y ) y j − dy. (42)We now apply (41) and (38) to see that E (cid:2) S j ( t N ) | Z ( t N ) > (cid:3) = w · e λ tN − p q · (cid:82) − q / ( e λ tN − p )0 (1 − p y ) − (1 − y ) y j − dy, and the desired result then follows from the fact that ( e λ t N − p ) /q = N by the deﬁnitionof t N in (15).To write the asymptotic expression (18) as a sum, note that(1 − p y ) − = (cid:80) ∞ k =0 p k y k , which is valid for all 0 ≤ p < ≤ y ≤

1. It then follows that (cid:82) (1 − p y ) − (1 − y ) y j − dy = (cid:80) ∞ k =0 p k (cid:0) (cid:82) (1 − y ) y j + k − dy (cid:1) = (cid:80) ∞ k =0 p k ( j + k )( j + k +1) . We conclude by establishing (39) above, which was E (cid:2) ( S j,t N ( t + ∆ t ) − S j,t N ( t ))1 { Z ( t N ) > } (cid:3) = wr e λ t ∆ t · p j ( t N − t ) + o (∆ t ) , We decompose according to population size at time t . Assume that Z ( t ) = k with k ≥ k cells at time t . Let D t, ∆ t denote the event that exactly one of the k cellsdivides in the inﬁnitesimal time interval [ t, t + ∆ t ], and enumerate the k + 1 cells after thecell division as Y t , . . . , Y k +1 t , where Y t and Y t are the two new cells. Let W denote thenumber of mutations that occur on the cell division, where W is a nonnegative integer-valued random variable with E [ W ] = w , independent of ( Z ( t )) t ≥ . For (cid:96) ≥

1, let B (cid:96) be28.i.d. with P ( B (cid:96) = 1) = P ( B (cid:96) = 2) = 1 /

2, independent of ( Z ( t )) t ≥ and W , and assignmutation number (cid:96) to cell number B (cid:96) for 1 ≤ (cid:96) ≤ W . Finally, let Y mt ( s ) be the numberof descendants of cell Y mt at time t + s ( Y mt (0) = 1). With this notation, deﬁne A j,k,(cid:96) ( t ) := { Z ( t ) = k } ∩ D t, ∆ t ∩ { (cid:96) ≤ W } ∩ { Y B (cid:96) t ( t N − t ) = j } ∩ { Z ( t N ) > } . This is the event that the tumor survives to time t N , that it consists of k cells at time t , that exactly one of the k cells divides in [ t, t + ∆ t ], that at least (cid:96) mutations occur onthis division, and that mutation number (cid:96) is found in j cells at time t N . The reason weare interested in this event is that we can write E (cid:2) ( S j,t N (cid:0) t + ∆ t ) − S j,t N ( t ) (cid:1) { Z ( t N ) > } (cid:3) = (cid:80) ∞ k =1 (cid:80) ∞ (cid:96) =1 P (cid:0) A j,k,(cid:96) ( t ) (cid:1) + o (∆ t ) , where the o (∆ t ) term captures the possibility of more than one cell division in [ t, t + ∆ t ].To compute P (cid:0) A j,k,(cid:96) ( t ) (cid:1) , note ﬁrst that P (cid:0) A j,k,(cid:96) ( t ) (cid:1) = P (cid:0) { Z ( t ) = k } ∩ D t, ∆ t ∩ { (cid:96) ≤ W } ∩ { Y B (cid:96) t ( t N − t ) = j } (cid:1) , since the survival event { Z ( t N ) > } is implied by the other events. By independence, P (cid:0) A j,k,(cid:96) ( t ) (cid:1) = P ( (cid:96) ≤ W ) · P (cid:0) { Z ( t ) = k } ∩ D t, ∆ t ∩ { Y B (cid:96) t ( t N − t ) = j } (cid:1) , To analyze the latter probability, note that since B (cid:96) is independent of ( Z ( t )) t ≥ , and (cid:0) ( Z ( s )) s ≤ t , ( Y t ( s )) s ≥ (cid:1) = (cid:0) ( Z ( s )) s ≤ t , ( Y t ( s )) s ≥ (cid:1) , we can write P (cid:0) { Z ( t ) = k } ∩ D t, ∆ t ∩ { Y B (cid:96) t ( t N − t ) = j } (cid:1) = P (cid:0) { Z ( t ) = k } ∩ D t, ∆ t ∩ { Y t ( t N − t ) = j } (cid:1) . Using the Markov property, we can calculate the latter probability as P (cid:0) { Z ( t ) = k } ∩ D t, ∆ t ∩ { Y t ( t N − t ) = j } (cid:1) = P ( Z ( t ) = k ) · P ( D t, ∆ t | Z ( t ) = k ) · P (cid:0) Y t ( t N − t ) = j (cid:12)(cid:12) Z ( t ) = k, D t, ∆ t (cid:1) = P ( Z ( t ) = k ) · e − kr ∆ t kr ∆ t · ( p j ( t N − t ) + O (∆ t )) . Combining the above, we obtain E (cid:2)(cid:0) S j,t N ( t + ∆ t ) − S j,t N ( t ) (cid:1) { Z ( t N ) > } (cid:3) = (cid:80) ∞ k =1 (cid:80) ∞ (cid:96) =1 P (cid:0) A j,k,(cid:96) ( t ) (cid:1) + o (∆ t )= r p j ( t N − t )∆ t · (cid:0) (cid:80) ∞ (cid:96) =1 P ( W ≥ (cid:96) ) (cid:1) · (cid:0) (cid:80) ∞ k =1 k P ( Z ( t ) = k ) (cid:1) + o (∆ t )= wr e λ t ∆ t · p j ( t N − t ) + o (∆ t ) , where we use (cid:80) ∞ (cid:96) =1 P ( W ≥ (cid:96) ) = E [ W ] = w and (cid:80) ∞ k =1 k P ( Z ( t ) = k ) = E [ Z ( t )] = e λ t .This concludes the proof.(2) Let ( X n ) n ≥ denote the discrete-time jump process embedded in ( Z ( t )) t ≥ that onlykeeps track of changes in population size. More precisely, if σ n is the time of the n -th jump of ( Z ( t )) t ≥ , n ≥

1, then X = 1 and X n = Z ( σ n ) for n ≥

1. Since cellsdivide at rate r and die at rate d , ( X n ) n ≥ is a simple random walk, absorbed at0, which moves up with probability a := r / ( r + d ) = 1 / (1 + p ) and down with29robability b = 1 − a = p / (1 + p ). Since we are only interested in what happens untilthe population either goes extinct or reaches level N , we treat N as an absorbing state.Deﬁne T k := inf { n ≥ X n = k } , ≤ k ≤ N, (43)as the (discrete) time at which the random walk ﬁrst hits level k , with inf ∅ = ∞ . Let P j denote the probability measure of ( X n ) n ≥ when started at X = j . By the gambler’sruin formula, P j ( T k < T ) = (1 − p j ) / (1 − p k ) , ≤ j ≤ k. (44)For 1 ≤ k ≤ N −

1, let Λ k,k +1 denote the number of transitions from k to k + 1,Λ k,k +1 := (cid:80) ∞ j =0 { Z j = k,Z j +1 = k +1 } . and let Λ k denote the number of visits to k ,Λ k := (cid:80) ∞ j =0 { Z j = k } . By the strong Markov property and (44), we can write E [Λ k ] = P ( T k < T ) · E k [Λ k ] = q − p k · E k [Λ k ] . When the chain leaves state k , it moves up with probability 1 / (1 + p ) and down withprobability p / (1 + p ). Starting from k + 1, the probability that the chain does notreturn to k (probability it is absorbed at N ) is q / (1 − p N − k ) by (44), and startingfrom k −

1, the probability it does not return to k (probability it is absorbed at 0) is1 − (1 − p k − ) / (1 − p k ) again by (44). Thus, starting from k , Λ k has the geometricdistribution with support { , , . . . } and success probability p · q − p N − k + p p · (cid:0) − − p k − − p k (cid:1) = q (1 − p N )(1+ p )(1 − p k )(1 − p N − k ) . It follows that E [Λ k ] = (1+ p )(1 − p N − k )1 − p N , E [Λ k,k +1 ] = p · E [Λ k ] = − p N − k − p N . (45)For 1 ≤ k ≤ N −

1, deﬁne T ik,k +1 as the (discrete) time of the i -th transition from k to k + 1, T ik,k +1 := inf { n > T i − k : X n − = k, X n = k + 1 } , i ≥ , with T k,k +1 := 0 and inf ∅ = ∞ . A transition from k to k + 1 in ( X n ) n ≥ occurs dueto one of the k cells in the original process ( Z ( t )) t ≥ dividing. Assume W i,k mutationsoccur on the i -th such transition, where W i,k are i.i.d. nonnegative integer-valued randomvariables with E [ W i,k ] = w , independent of ( X n ) n ≥ . Enumerate the cells at time T ik,k +1 as Y i,k , . . . , Y k +1 i,k . By the same argument as laid out in part (1) above, we can assumethat each mutation is assigned to the ﬁrst cell. Let Y mi,k ( n ) be the number of descendantsof cell Y mi,k at time step T ik,k +1 + n ( Y mi,k (0) = 1). Then deﬁne the event A j,k,i,(cid:96) := (cid:8) T N < T , T ik,k +1 < ∞ , (cid:96) ≤ W i,k , Y i,k ( T N − T ik,k +1 ) = j (cid:9) . N , that it transitions at least i times from k to k + 1 before doing so, that at least (cid:96) mutations occur on the i -th suchtransition, and that the (cid:96) -th mutation is found in j cells at level N . We can then write E [ S j ( τ N ) | τ N < ∞ ] = (cid:0) P ( T N < T ) (cid:1) − · (cid:80) N − k =1 (cid:80) ∞ i =1 (cid:80) ∞ (cid:96) =1 P (cid:0) A j,k,i,(cid:96) (cid:1) = − p N q · (cid:80) N − k =1 (cid:80) ∞ i =1 (cid:80) ∞ (cid:96) =1 P (cid:0) A j,k,i,(cid:96) (cid:1) . To compute P (cid:0) A j,k,i,(cid:96) (cid:1) , note ﬁrst that by independence, P (cid:0) T N < T , T ik,k +1 < ∞ , (cid:96) ≤ W i,k , Y i,k ( T N − T ik,k +1 ) = j (cid:1) = P (cid:0) (cid:96) ≤ W i,k ) · P ( T N < T , T ik,k +1 < ∞ , Y i,k ( T N − T ik,k +1 ) = j (cid:1) . By the strong Markov property, with j ≥ P ( T N < T , T ik,k +1 < ∞ , Y i,k ( T N − T ik,k +1 ) = j (cid:1) = P ( T N < T , Y i,k ( T N − T ik,k +1 ) = j | T ik,k +1 < ∞ (cid:1) · P ( T ik,k +1 < ∞ )= P k +1 (cid:0) T N < T , Y ( T N ) = j (cid:1) · P ( T ik,k +1 < ∞ ) , where we restart the chain at the stopping time T ik,k +1 with k + 1 cells enumerated as Y , . . . , Y k +1 . Deﬁne h ( j,N − j )(1 ,k ) := P k +1 ( T N < T , Y ( T N ) = j ) for the moment, i.e. theprobability that starting with 1 mutated and k non-mutated cells, there are j mutatedand N − j non-mutated cells when the population reaches level N . We can then write P ( T N < T , T ik,k +1 < ∞ , Y i,k ( T N − T ik,k +1 ) = j (cid:1) = h ( j,N − j )(1 ,k ) · P ( T ik,k +1 < ∞ ) . Combining the above, and using E [Λ k,k +1 ] = (1 − p N − k ) / (1 − p N ) by (45), we obtain E [ S j ( τ N ) | τ N < ∞ ]= − p N q · (cid:80) N − k =1 (cid:0) (cid:80) ∞ i =1 (cid:0) (cid:80) ∞ (cid:96) =1 P ( W i,k ≥ (cid:96) ) (cid:1) · P ( T ik,k +1 < ∞ ) (cid:1) · h ( j,N − j )(1 ,k ) = − p N q · (cid:80) N − k =1 w · E [Λ k,k +1 ] · h ( j,N − j )(1 ,k ) = ( w/q ) · (cid:80) N − k =1 (1 − p N − k ) h ( j,N − j )(1 ,k ) , which is the desired result.It remains to show how the probabilities h ( j,N − j )(1 ,k ) for 1 ≤ j ≤ N can be computed. Asin the proof of (1) of Proposition 1, one can observe that h ( r,N − r )( (cid:96),m ) is the probability ofabsorption in state ( r, N − r ), starting from state ( (cid:96), m ), for a Markov chain on statespace S := { ( (cid:96), m ) : (cid:96), m ≥ (cid:96) + m ≤ N } , where the state ( (cid:96), m ) keeps track ofthe number of mutated and non-mutated cells at population level (cid:96) + m . The diﬀerenceis that now, cells can die, so population level changes can both be up and down. Thetransition probabilities are therefore more complex in this case, and given by( (cid:96), m ) → ( (cid:96) + 1 , m ) w.p. (cid:96)r / (( (cid:96) + m )( r + d )) = (cid:96)/ (( (cid:96) + m )(1 + p )) , ( (cid:96), m ) → ( (cid:96) − , m ) w.p. (cid:96)d / (( (cid:96) + m )( r + d )) = (cid:96)p / (( (cid:96) + m )(1 + p )) , ( (cid:96), m ) → ( (cid:96), m + 1) w.p. mr / (( (cid:96) + m )( r + d )) = m/ (( (cid:96) + m )(1 + p )) , ( (cid:96), m ) → ( (cid:96), m −

1) w.p. md / (( (cid:96) + m )( r + d )) = mp / (( (cid:96) + m )(1 + p )) , S := { ( (cid:96), m ) : (cid:96), m ≥ , (cid:96) + m ≤ N } that tracks the number of mutated and non-mutated cells at each populationlevel. Contrary to the chain for the skeleton process of Proposition 1 (Figure 13), we nowincorporate cell death, which means both that population level changes can be up and down,and that we add states of the form ( (cid:96), m ) with (cid:96) = 0 or m = 0. The states (0 ,

0) and { ( (cid:96), N − (cid:96) ) } ≤ (cid:96) ≤ N are absorbing, and the states { ( (cid:96), m ) : (cid:96), m ≥ , (cid:96) + m < N } , { ( (cid:96), } ≤ (cid:96) ≤ N − and { (0 , m ) } ≤ m ≤ N − form their respective communicating classes. Colored arrows indicatetransitions out of communicating classes and dashed boxes identify absorbing states.for ( (cid:96), m ) with (cid:96), m ≥ < (cid:96) + m < N . The states (0 ,

0) and ( (cid:96), N − (cid:96) ) with0 ≤ (cid:96) ≤ N are absorbing. A diagram of the Markov chain is shown in Figure 14.For given ( r, s ) ∈ A := { (0 , } ∪ { ( r, s ) : r, s ≥ r + s = N } and ( (cid:96), m ) with (cid:96), m ≥ < (cid:96) + m < N , by conditioning on the ﬁrst transition out of state ( (cid:96), m ), we canderive the following recursion for h ( r,s )( (cid:96),m ) :( (cid:96) + m )(1 + p ) h ( r,s )( (cid:96),m ) = (cid:96)h ( r,s )( (cid:96) +1 ,m ) + (cid:96)p h ( r,s )( (cid:96) − ,m ) + mh ( r,s )( (cid:96),m +1) + mp h ( r,s )( (cid:96),m − . By the gambler’s ruin formula (44), the boundary conditions are h ( N, (cid:96), = 1 − h (0 , (cid:96), = (1 − p (cid:96) ) / (1 − p N ) , ≤ (cid:96) ≤ N,h ( r,s )( (cid:96), = 0 , ≤ (cid:96) ≤ N, ( r, s ) / ∈ { (0 , , ( N, } h (0 ,N )(0 ,m ) = 1 − h (0 , ,m ) = (1 − p m ) / (1 − p N ) , ≤ m ≤ N,h ( r,s )(0 ,m ) = 0 , ≤ m ≤ N, ( r, s ) / ∈ { (0 , , (0 , N ) } ,h ( r,N − r )( (cid:96),N − (cid:96) ) = δ r,(cid:96) , ≤ (cid:96) ≤ N − ,h ( r,s )( (cid:96),N − (cid:96) ) = 0 , ≤ (cid:96) ≤ N − , ( r, s ) ∈ { (0 , , ( N, , (0 , N ) } . (46)This is the desired linear system. 32 Continuous mutation accumulation

C.1 Skeleton spectrum

Under continuous mutation accumulation, the expected ﬁxed-time spectrum of the skeletonis given by E [ ˜ S j (˜ t N )] = ( ν/λ ) N · (cid:82) − /N (1 − y ) y j − dy, which is the same as (9) of Proposition 1 with w/q replaced by ν/λ , the eﬀective mutationrate of the continuous-time model. We can use the same proof as in Appendix A, simplyreplacing the mutation rate wr by ν and noting that q = λ /r . However, the expectedﬁxed-size spectrum of the skeleton becomes E [ ˜ S j (˜ τ N ) | Ω ∞ ] = (cid:40) ( ν/λ ) · (cid:80) N − max(2 ,j ) k =1 kN − j (cid:81) j − n =1 (1 − kN − n ) , ≤ j ≤ N − ,ν/λ , j = N. (47)In the continuous model, mutations occur at rate ν per unit time, and the eﬀective type-2 celldivisions occur at rate λ per unit time. Thus, for 1 ≤ k ≤ N −

1, the number of mutationsthat accumulate on level k , prior to the type-2 division that changes levels to k + 1, has thegeometric distribution with support { , , , . . . } and success probability kλ / ( kλ + kν ) = λ / ( λ + ν ) . The expected number of mutations per level is therefore( λ + ν ) /λ − ν/λ , which applies to all levels k with 1 ≤ k ≤ N −

1. In particular, there are ν/λ (= w/q )clonal mutations in the continuous model, as opposed to wp /q = w/q − w clonal mutationsin the discrete model. The reason for this diﬀerence is that in the discrete model, the clonalmutations come from the type-1 divisions that occur before the ﬁrst type-2 division in theprocess. The ﬁrst type-2 division adds w mutations, but it also changes levels, so mutationsoccurring on this division are not clonal. In the continuous model, all mutations occur inbetween cell divisions, which is why there is no such boundary eﬀect. Similarly, in the discretemodel, the very last type-2 division that changes the skeleton size to N adds w mutations onaverage that each ends up in one cell. This creates the wδ ,j term in (11) of Proposition 2,which does not occur in (47) for the continuous model. The key diﬀerences between mutationaccumulation in the two models are diagramed in Figure 15. C.2 Total population spectrum

Under continuous mutation accumulation, the expected ﬁxed-time spectrum of the totalpopulation is given by E [ S j ( t N ) | Z ( t N ) >

0] = ( ν/r ) N · (cid:82) − /N (1 − p y ) − (1 − y ) y j − dy, which is the same as (17) of Proposition 2 with w replaced by ν/r . We can use the same proofas in Appendix B, replacing the mutation rate wr by ν . However, the expected ﬁxed-sizespectrum of the total population becomes E [ S j ( τ N ) | τ N < ∞ ] = ( ν/λ ) · (cid:80) N − k =1 (1 − p N − k ) h j (1 ,k − , ≤ j ≤ N. (a) In the discrete model of mutation accumulation, mutations coincide withcell divisions. On average, wp /q mutations accumulate on type-1 divisions in betweentwo type-2 divisions, and w mutations are added on each type-2 division. This results is wp /q + w = w/q mutations on average per skeleton level, for all but the ﬁrst level. (b) In the continuous model, all ν/λ (= w/q ) mutations per level accumulate in between type-2 divisions. The diﬀerences between the discrete and continuous model result in boundaryeﬀects at j = 1 and j = N in the ﬁxed-size spectrum (11) of Proposition 1 that do not appearin the ﬁxed-size result (47) for the continuous model.In the continuous model, mutations no longer coincide with changes in population level, soinstead of counting level changes Λ k,k +1 as we did in the proof of part (2) of Proposition 2,we need to compute the time spent at level k . We already know from (45) that the numberof visits to level k in the embedded discrete-time chain ( X n ) n ≥ has expected value E [Λ k ] = (1+ p )(1 − p N − k )1 − p N . During each visit to state k in the discrete-time chain, the time spent at population level k in the continuous-time process ( Z ( t )) t ≥ is exponentially distributed with rate k ( r + d ) = kr (1 + p ). It follows that the mean time spent on level k for 1 ≤ k ≤ N − kr (1+ p ) · (1+ p )(1 − p N − k )1 − p N = − p N − k kr (1 − p N ) . Since mutations occur at rate kν per unit time on level k , we obtain E [ S j ( τ N ) | τ N < ∞ ] = − p N q · (cid:80) N − k =1 kν · − p N − k kr (1 − p N ) · h ( j,N − j )(1 ,k − = ( ν/λ ) · (cid:80) N − k =1 (1 − p N − k ) h ( j,N − j )(1 ,k − , (48)where we use that q = λ /r . We now have h ( j,N − j )(1 ,k − in the sum instead of h ( j,N − j )(1 ,k ) in (19)of Proposition 2 since mutations no longer coincide with population level changes. The ﬁrstequality in (48) can be obtained rigorously using the same argument as in the proof of part(2) of Proposition 2. 34 Fixed-time vs. ﬁxed-size spectrum

Here, we present a simple heuristic argument for whey the ﬁxed-time and ﬁxed-size spectraof the total population agree on j (cid:28) N when N is large. Conditional on the nonextinctionevent Ω ∞ , the tumor eventually grows at exponential rate λ . If s is the time it takes to gofrom population level k to population level N , we can write k = N e − λ s i.e. e λ s = N/k ,following [44]. If we assume s is deterministic, we can write h ( j,N − j )(1 ,k ) ≈ p j ( s ), where h ( j,N − j )(1 ,k ) is deﬁned as in Proposition 2 and ( p j ( s )) j ≥ is the size-distribution at time s for a single-cellderived clone, we can apply (37) with e λ s = N/k to obtain h ( j,N − j )(1 ,k ) ≈ p j ( s ) = q Nk ( N − p k ) · (cid:0) N − kN − p k (cid:1) j − . Then, observing that 1 − p N − k ≈ k (cid:28) N , we can write( w/q ) · (cid:80) N − k =1 (1 − p N − k ) · h j (1 ,k ) ≈ ( w/q ) · (cid:82) N q Nk ( N − p k ) · (cid:0) N − kN − p k (cid:1) j − . Using the substitution y := ( N − k ) / ( N − p k ) and writing N − p ≈ N , this becomes theﬁxed-time spectrum (17) of Proposition 2. E Derivation of the asymptotic expressions (20) and (21)

Here, we establish the asymptotic expressions (20) and (21) in the main text. To establish(20), ﬁx 0 < p < f j ( k ) := p k · j ( j +1)( j + k )( j + k +1) , j ≥ , k ≥ . Clearly, f j ( k ) ≤ p k for all j ≥ k ≥

0. Since (cid:80) ∞ k =0 p k = 1 /q < ∞ , it follows from thedominated convergence theorem thatlim j →∞ (cid:80) ∞ k =0 f j ( k ) = 1 /q , from which it follows that wN · (cid:80) ∞ k =0 p k ( j + k )( j + k +1) ∼ ( w/q ) N · / ( j ( j + 1)) , j → ∞ . To establish (21), ﬁx j ≥ f p ( k ) := p k ( j + k )( j + k +1) , < p < , k ≥ . Clearly, f p ( k ) ≤ / (( j + k )( j + k + 1)) for all 0 < p < k ≥

0. Since (cid:80) ∞ k =0 1( j + k )( j + k +1) = (cid:80) ∞ k =0 (cid:0) j + k − j + k +1 (cid:1) = 1 /j < ∞ , it follows from the dominated convergence theorem thatlim p → (cid:80) ∞ k =0 f p ( k ) = 1 /j, from which it follows that wN · (cid:80) ∞ k =0 p k ( j + k )( j + k +1) ∼ wN · /j, p → . Proof of Proposition 3

Proposition 3. (1) For < p < , the expected total number of mutations in the ﬁxed-timespectrum is given by E [ M ( t N ) | Z ( t N ) >

0] = − wN · (1 /p ) log( q + p /N ) ∼ − wN · log( q ) /p , N → ∞ . For p = 0 , E [ M ( t N ) | Z ( t N ) >

0] = w ( N − ∼ wN as N → ∞ .(2) Deﬁne S and A as in Proposition 2. The expected total number of mutations in theﬁxed-size spectrum is given by E [ M ( τ N ) | τ N < ∞ ] = ( w/q ) · (cid:80) N − k =1 (1 − p N − k ) (cid:0) − h (0 ,N )(1 ,k ) − h (0 , ,k ) (cid:1) , where for ( r, s ) ∈ A , the vector (cid:0) h ( r,s )( (cid:96),m ) (cid:1) ( (cid:96),m ) ∈S solves the linear system in Proposition 2.For p = 0 , E [ M ( τ N ) | τ N < ∞ ] = w ( N − ∼ wN as N → ∞ .Proof. (1) Deﬁne M j ( t ) := (cid:80) k ≥ j S k ( t ) as the cumulative number of mutations found in ≥ j cells at time t . For ﬁxed j ≥

1, by (17) of Proposition 2 and Fubini’s theorem, theexpected cumulative ﬁxed-time spectrum can be written as E [ M j ( t N ) | Z ( t N ) >

0] = wN · (cid:80) ∞ k = j (cid:0) (cid:82) − /N (1 − p y ) − (1 − y ) y k − dy (cid:1) = wN · (cid:82) − /N (1 − p y ) − (1 − y ) (cid:0) (cid:80) ∞ k = j y k − (cid:1) dy = wN · (cid:82) − /N (1 − p y ) − y j − dy = wN · (cid:80) ∞ k =0 p k (cid:0) (cid:82) − /N y j + k − dy (cid:1) = wN · (cid:80) ∞ k =0 p k j + k (1 − N ) j + k . (49)To obtain the desired result, set j = 1 in (49) and use that (cid:80) ∞ k =1 x k /k = − log(1 − x ).(2) By (19) of Proposition 2, E [ M ( τ N ) | τ N < ∞ ] = (cid:80) Nj =1 (cid:0) ( w/q ) · (cid:80) N − k =1 (1 − p N − k ) · h ( j,N − j )(1 ,k ) (cid:1) = ( w/q ) · (cid:80) N − k =1 (1 − p N − k ) · (cid:0) (cid:80) Nj =1 h ( j,N − j )(1 ,k ) (cid:1) , and the result follows from the fact that (cid:80) ( r,s ) ∈ A h ( r,s )(1 ,k ) = 1. G Derivation of expression (26)

Here, we establish expression (26) in the main text. By (17) of Proposition 2 and Fubini’stheorem, we can write E [ S ( t N ) | Z ( t N ) > wN · (cid:82) − /N (1 − p y ) − (1 − y ) dy = wN · (cid:80) ∞ k =0 p k (cid:0) (cid:82) − /N y k (1 − y ) dy (cid:1) = wN · (cid:80) ∞ k =0 p k (cid:0) k +1 (1 − N ) k +1 − k +2 (1 − N ) k +2 (cid:1) = wN · (cid:80) ∞ k =0 p k k +1 (1 − N ) k +1 − wN · (cid:80) ∞ k =0 p k k +2 (1 − N ) k +2 . (50)36sing that (cid:80) ∞ k =1 x k /k = − log(1 − x ), the former term can be computed as wN · (cid:80) ∞ k =0 p k k +1 (1 − N ) k +1 = − wN · (1 /p ) log( q + p /N ) , and the latter term can be computed as wN · (cid:80) ∞ k =0 p k k +2 (1 − N ) k +2 = wN · (1 /p ) (cid:80) ∞ k =0 p k +20 k +2 (1 − N ) k +2 = wN · (1 /p ) (cid:0) − log( q + p /N ) − p (1 − /N ) (cid:1) . Combining with (50), we obtain E [ S ( t N ) | Z ( t N ) >

0] = wN · (1 /p )(1 − /N + ( q /p ) log( q + p /N )) , the desired result. H Laws of large numbers

Here, we present simple calculations in support of the laws of large numbers (28) and (29)in the main text. As stated in the main text, conditional on the nonextinction event Ω ∞ , Z ( t ) ∼ Y e λ t almost surely as t → ∞ , where Y has the exponential distribution withmean 1 /q (Theorem 1 of [33]). For the ﬁxed-time spectrum, the number of mutations thataccumulate in [0 , t N ] and are found in j ≥ t N is then approximately S j ( t N ) ≈ (cid:82) t N wr · Y e λ t · p j ( t N − t ) dt. From (42) in the proof of Proposition 2, we know that (cid:82) t N wr e λ t p j ( t N − t ) dt = we λ t N · (cid:82) − q / ( e λ tN − p )0 (1 − p y ) − (1 − y ) y j − dy ∼ wq N · (cid:82) (1 − p y ) − (1 − y ) y j − dy, N → ∞ , where we use that e λ t N = q N + p by the deﬁnition of t N in (15). This implies that S j ( t N ) ≈ q Y · wN · (cid:82) (1 − p y ) − (1 − y ) y j − dy for large N . Note that since Y has the exponential distribution with mean 1 /q , q Y has theexponential distribution with mean 1. This suggests (28) in the main text.For the ﬁxed-size spectrum, note that if N is large, then at time τ N − t , we can write Z ( τ N − t ) ≈ N e − λ t . The number of mutations that accumulate in [0 , τ N ] and are found in j ≥ τ N is then approximately S j ( τ N ) ≈ (cid:82) τ N wr · N e − λ t · p j ( t ) dt = N e − λ τ N · (cid:82) τ N wr · e λ t · p j ( τ N − t ) dt. Using (42) from the proof of Proposition 2 again, we can write (cid:82) τ N wr · e λ t · p j ( τ N − t ) dt = we λ τ N · (cid:82) − q / ( e λ τN − p )0 (1 − p y ) − (1 − y ) y j − dy, from which it follows that S j ( τ N ) ≈ wN · (cid:82) − q / ( e λ τN − p )0 (1 − p y ) − (1 − y ) y j − dy ≈ wN · (cid:82) (1 − p y ) − (1 − y ) y j − N is large. This suggests (29) in the main text.We ﬁnally show that it is straightforward to prove a law of large numbers for a simpliﬁedversion of our model, in which we assume that the tumor bulk grows deterministically, Z ( t ) = e λ t , mutant clones arise at stochastic rate r , each mutant clone acquires w mutations onaverage, and mutant clones grow stochastically. To state the result, let ˆ S j (˜ t N ) denote thenumber of mutations found in j ≥ t N under the simpliﬁed model, where ˜ t N isgiven by (7), i.e. e λ ˜ t N = N . We want to show thatˆ S j (˜ t N ) ∼ wN · (cid:82) (1 − p y ) − (1 − y ) y j − dy, (51)almost surely as N → ∞ . Note that the limit is deterministic since we assume deterministicgrowth of the tumor bulk. To prove the result, for 0 ≤ t ≤ ˜ t N , let ˆ N j, ˜ t N ( t ) denote the numberof mutant clones created in [0 , t ] that have size j ≥ t N . Then ( ˆ N j, ˜ t N ( t )) ≤ t ≤ ˜ t N is aninhomogeneous Poisson process with rate function ˆ λ ( t ) = r e λ t p j (˜ t N − t ) and mean functionˆ m ( t ) = (cid:82) t ˆ λ ( s ) ds, ≤ t ≤ ˜ t N . Set ˆ N j (˜ t N ) := ˆ N j, ˜ t N (˜ t N ). By (42) in the proof of Proposition 2, and the fact that e λ ˜ t N = N ,ˆ m (˜ t N ) = N · (cid:82) − q / ( N − p )0 (1 − p y ) − (1 − y ) y j − dy ∼ N · (cid:82) (1 − p y ) − (1 − y ) y j − dy, N → ∞ . Then, by a simple Poisson concentration inequality, see Theorem 1 of [53], P (cid:0) | ˆ N j (˜ t N ) / ˆ m (˜ t N ) − | > ( ˆ m (˜ t N )) − / (cid:1) = P (cid:0) | ˆ N j (˜ t N ) − ˆ m (˜ t N ) | > ( ˆ m (˜ t N )) / (cid:1) ≤ (cid:0) − ( ˆ m (˜ t N )) / (cid:14) m (˜ t N )) − / ) (cid:1) . Since ˆ m (˜ t N ) is of order N as N → ∞ , it follows from the Borel-Cantelli lemma thatˆ N j (˜ t N ) / ˆ m (˜ t N ) → N → ∞ . Now, let W , W , . . . be i.i.d. nonnegative integer-valuedrandom variables with mean w . We can then choose copies of W , W , . . . so thatˆ S j (˜ t N ) = (cid:80) ˆ N j (˜ t N ) k =1 W i . The desired result then follows from the strong law of large numbers.

References [1] Motoo Kimura. Genetic variability maintained in a ﬁnite population due to mutationalproduction of neutral and nearly neutral isoalleles.

Genetics research , 11(3):247–270,1968.[2] Richard Durrett.

Probability models for DNA sequence evolution . Springer Science &Business Media, 2008.[3] Motoo Kimura. The number of heterozygous nucleotide sites maintained in a ﬁnitepopulation due to steady ﬂux of mutations.

Genetics , 61(4):893, 1969.[4] John FC Kingman. On the genealogy of large populations.

Journal of applied probability ,pages 27–43, 1982. 385] John Frank Charles Kingman. The coalescent.

Stochastic processes and their applica-tions , 13(3):235–248, 1982.[6] GA Watterson. On the number of segregating sites in genetical models without recom-bination.

Theoretical population biology , 7(2):256–276, 1975.[7] Fumio Tajima. Statistical method for testing the neutral mutation hypothesis by dnapolymorphism.

Genetics , 123(3):585–595, 1989.[8] Yun-Xin Fu and Wen-Hsiung Li. Statistical tests of neutrality of mutations.

Genetics ,133(3):693–709, 1993.[9] Justin C Fay and Chung-I Wu. Hitchhiking under positive darwinian selection.

Genetics ,155(3):1405–1413, 2000.[10] Kai Zeng, Yun-Xin Fu, Suhua Shi, and Chung-I Wu. Statistical tests for detectingpositive selection by utilizing high-frequency variants.

Genetics , 174(3):1431–1439, 2006.[11] Peter Armitage and Richard Doll. The age distribution of cancer and a multi-stagetheory of carcinogenesis.

British journal of cancer , 8(1):1, 1954.[12] Peter Armitage and Richard Doll. A two-stage theory of carcinogenesis in relation tothe age distribution of human cancer.

British journal of cancer , 11(2):161, 1957.[13] Alfred G Knudson. Mutation and cancer: statistical study of retinoblastoma.

Proceedingsof the National Academy of Sciences , 68(4):820–823, 1971.[14] Peter C Nowell. The clonal evolution of tumor cell populations.

Science , 194(4260):23–28, 1976.[15] Cristian Tomasetti, Bert Vogelstein, and Giovanni Parmigiani. Half or more of thesomatic mutations in cancers of self-renewing tissues originate prior to tumor initiation.

Proceedings of the National Academy of Sciences , 110(6):1999–2004, 2013.[16] Ivana Bozic, Johannes G Reiter, Benjamin Allen, Tibor Antal, Krishnendu Chatterjee,Preya Shah, Yo Sup Moon, Amin Yaqubie, Nicole Kelly, Dung T Le, et al. Evolutionarydynamics of cancer in response to targeted combination therapy. elife , 2:e00747, 2013.[17] Rebecca A Burrell, Nicholas McGranahan, Jiri Bartek, and Charles Swanton. The causesand consequences of genetic heterogeneity in cancer evolution.

Nature , 501(7467):338–345, 2013.[18] Bert Vogelstein, Nickolas Papadopoulos, Victor E Velculescu, Shibin Zhou, Luis A Diaz,and Kenneth W Kinzler. Cancer genome landscapes. science , 339(6127):1546–1558,2013.[19] Nicholas McGranahan and Charles Swanton. Clonal heterogeneity and tumor evolution:past, present, and the future.

Cell , 168(4):613–628, 2017.[20] Andrea Sottoriva, Haeyoun Kang, Zhicheng Ma, Trevor A Graham, Matthew P Salomon,Junsong Zhao, Paul Marjoram, Kimberly Siegmund, Michael F Press, Darryl Shibata,et al. A big bang model of human colorectal tumor growth.

Nature genetics , 47(3):209–216, 2015. 3921] Shaoping Ling, Zheng Hu, Zuyu Yang, Fang Yang, Yawei Li, Pei Lin, Ke Chen, Lili Dong,Lihua Cao, Yong Tao, et al. Extremely high genetic diversity in a single tumor pointsto prevalence of non-darwinian cell evolution.

Proceedings of the National Academy ofSciences , 112(47):E6496–E6505, 2015.[22] Marc J Williams, Benjamin Werner, Chris P Barnes, Trevor A Graham, and AndreaSottoriva. Identiﬁcation of neutral tumor evolution across cancer types.

Nature genetics ,48(3):238, 2016.[23] Subramanian Venkatesan and Charles Swanton. Tumor evolutionary principles: howintratumor heterogeneity inﬂuences cancer treatment and outcome.

American Societyof Clinical Oncology Educational Book , 36:e141–e149, 2016.[24] Alexander Davis, Ruli Gao, and Nicholas Navin. Tumor evolution: Linear, branching,neutral or punctuated?

Biochimica et Biophysica Acta (BBA)-Reviews on Cancer ,1867(2):151–161, 2017.[25] Thomas O McDonald, Shaon Chakrabarti, and Franziska Michor. Currently availablebulk sequencing data do not necessarily support a model of neutral tumor evolution.

Nature genetics , 50(12):1620–1623, 2018.[26] Maxime Tarabichi, I˜nigo Martincorena, Moritz Gerstung, Armand M Leroi, FlorianMarkowetz, Paul T Spellman, Quaid D Morris, Ole Christian Lingjærde, David C Wedge,and Peter Van Loo. Neutral tumor evolution?

Nature genetics , 50(12):1630–1633, 2018.[27] Ivana Bozic, Chay Paterson, and Bartlomiej Waclaw. On measuring selection in cancerfrom subclonal mutation frequencies.

PLoS computational biology , 15(9):e1007368, 2019.[28] Timon Heide, Luis Zapata, Marc J Williams, Benjamin Werner, Giulio Caravagna,Chris P Barnes, Trevor A Graham, and Andrea Sottoriva. Reply to ‘neutral tumorevolution?’.

Nature genetics , 50(12):1633–1637, 2018.[29] Benjamin Werner, Marc J Williams, Chris P Barnes, Trevor A Graham, and AndreaSottoriva. Reply to ‘currently available bulk sequencing data do not necessarily supporta model of neutral tumor evolution’.

Nature genetics , 50(12):1624–1626, 2018.[30] Marc J Williams, Benjamin Werner, Timon Heide, Christina Curtis, Chris P Barnes,Andrea Sottoriva, and Trevor A Graham. Quantiﬁcation of subclonal selection in cancerfrom bulk sequencing data.

Nature genetics , 50(6):895–903, 2018.[31] Khanh N Dinh, Roman Jaksik, Marek Kimmel, Amaury Lambert, Simon Tavar´e, et al.Statistical inference for the evolutionary history of cancer genomes.

Statistical Science ,35(1):129–144, 2020.[32] Giulio Caravagna, Timon Heide, Marc J Williams, Luis Zapata, Daniel Nichol, KetevanChkhaidze, William Cross, George D Cresswell, Benjamin Werner, Ahmet Acar, et al.Subclonal reconstruction of tumors by using machine learning and population genetics.

Nature Genetics , 52(9):898–907, 2020.[33] Richard Durrett. Branching process models of cancer. In

Branching Process Models ofCancer , pages 1–63. Springer, 2015. 4034] Hisashi Ohtsuki and Hideki Innan. Forward and backward evolutionary processes andallele frequency spectrum in a cancer cell population.

Theoretical Population Biology ,117:43–50, 2017.[35] Rick Durrett. Population genetics of neutral mutations in exponentially growing cancercell populations.

The annals of applied probability: an oﬃcial journal of the Institute ofMathematical Statistics , 23(1):230, 2013.[36] Ivana Bozic, Jeﬀrey M Gerold, and Martin A Nowak. Quantifying clonal and subclonalpassenger mutations in cancer evolution.

PLoS computational biology , 12(2):e1004731,2016.[37] Benjamin Werner, Jack Case, Marc J Williams, Ketevan Chkhaidze, Daniel Temko,Javier Fern´andez-Mateos, George D Cresswell, Daniel Nichol, William Cross, Inmacu-lada Spiteri, et al. Measuring single cell divisions in human tissues from multi-regionsequencing data.

Nature communications , 11(1):1–9, 2020.[38] Siˆan Jones, Wei-dong Chen, Giovanni Parmigiani, Frank Diehl, Niko Beerenwinkel, TiborAntal, Arne Traulsen, Martin A Nowak, Christopher Siegel, Victor E Velculescu, et al.Comparative lesion sequencing provides insights into tumor evolution.

Proceedings ofthe National Academy of Sciences , 105(11):4283–4288, 2008.[39] Neil O’Connell. Yule process approximation for the skeleton of a branching process.

Journal of applied probability , 30(3):725–729, 1993.[40] Natalia L Komarova, Lin Wu, and Pierre Baldi. The ﬁxed-size luria–delbruck modelwith a nonzero death rate.

Mathematical biosciences , 210(1):253–290, 2007.[41] Stefano Avanzini and Tibor Antal. Cancer recurrence times from a branching processmodel.

PLoS computational biology , 15(11):e1007423, 2019.[42] Ugo Del Monte. Does the cell number 109 still really ﬁt one gram of tumor tissue?

Cellcycle , 8(3):505–506, 2009.[43] Kevin Leder, Jasmine Foo, Brian Skaggs, Mercedes Gorre, Charles L Sawyers, andFranziska Michor. Fitness conferred by bcr-abl kinase domain mutations determinesthe risk of pre-existing resistance in chronic myeloid leukemia.

PloS one , 6(11):e27682,2011.[44] Yoh Iwasa, Martin A Nowak, and Franziska Michor. Evolution of resistance during clonalexpansion.

Genetics , 172(4):2557–2566, 2006.[45] Amaury Lambert. The allelic partition for coalescent point processes. arXiv preprintarXiv:0804.2572 , 2008.[46] David Cheek, Tibor Antal, et al. Mutation frequencies in a birth–death branchingprocess.

The Annals of Applied Probability , 28(6):3922–3947, 2018.[47] Jack Kuipers, Katharina Jahn, Benjamin J Raphael, and Niko Beerenwinkel. Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the lifehistories of tumors.

Genome research , 27(11):1885–1894, 2017.4148] David A Kessler and Herbert Levine. Large population solution of the stochasticluria–delbr¨uck evolution model.

Proceedings of the National Academy of Sciences ,110(29):11682–11687, 2013.[49] Peter Keller and Tibor Antal. Mutant number distribution in an exponentially growingpopulation.

Journal of Statistical Mechanics: Theory and Experiment , 2015(1):P01011,2015.[50] Robert C Griﬃths and Anthony G Pakes. An inﬁnite-alleles version of the simple branch-ing process.

Advances in applied probability , pages 489–524, 1988.[51] Nicolas Champagnat, Amaury Lambert, and Mathieu Richard. Birth and death pro-cesses with neutral mutations.

International Journal of Stochastic Analysis , 2012, 2012.[52] Xiaowei Wu and Marek Kimmel. Modeling neutral evolution using an inﬁnite-allelemarkov branching process. 2013.[53] C. Cannone. A short note on Poisson tail bounds. 2017.