[PDF] Inferring differentiation order in adaptive immune responses from population level data

Abstract

A hallmark of the adaptive immune response is the proliferation of pathogen-specific lymphocytes that leave in their wake a long lived population of cells that provide lasting immunity. A subject of ongoing investigation is when during an adaptive immune response those memory cells are produced. In two ground-breaking studies, Buchholz et al. (Science, 2013) and Gerlach et al. (Science, 2013) employed experimental methods that allowed identification of offspring from individual lymphocytes in vivo, which we call clonal data, at a single time point. Through the development, application and fitting of a mathematical model, Buchholz et al. (Science, 2013) concluded that, if memory is produced during the expansion phase, memory cell precursors are made before the effector cells that clear the original pathogen. We sought to determine the general validity and power of the modeling approach introduced in Buchholz et al. (Science, 2013) for quickly evaluating differentiation networks by adapting it to make it suitable for drawing inferences from more readily available non-clonal phenotypic proportion time-courses. We first established the method drew consistent deductions when fit to the non-clonal data in Buchholz et al. (Science, 2013) itself. We fit a variant of the model to data reported in Badovinac et al. (J. Immun., 2007), Schlub et al. (Immun. & Cell Bio., 2010), and Kinjo et al. (Nature Commun., 2015) with necessary simplifications to match different reported data in these papers. The deduction from the model was consistent with that in Buchholz et al. (Science, 2013), albeit with questionable parameterizations. An alternative possibility, supported by the data in Kinjo et al. (Nature Commun., 2015), is that memory precursors are created after the expansion phase, which is a deduction not possible from the mathematical methods provided in Buchholz et al. (Science, 2013).

Full PDF

IInferring differentiation order in adaptive immune responsesfrom population level data

Alexander S. Miles ∗ , Philip D. Hodgkin † , and Ken R. Duffy ‡ June 4, 2020

Abstract

A hallmark of the adaptive immune response is the proliferation of pathogen-speciﬁc lymphocytes thatleave in their wake a long lived population of cells that provide lasting immunity. A subject of ongoinginvestigation is when during an adaptive immune response those memory cells are produced. In two ground-breaking studies, Buchholz et al. (Science, 2013) and Gerlach et al. (Science, 2013) employed experimentalmethods that allowed identiﬁcation of offspring from individual lymphocytes in vivo , which we call clonaldata, at a single time point. Through the development, application and ﬁtting of a mathematical model,Buchholz et al. (Science, 2013) concluded that, if memory is produced during the expansion phase, memorycell precursors are made before the effector cells that clear the original pathogen. We sought to determinethe general validity and power of the modeling approach introduced in Buchholz et al. (Science, 2013) forquickly evaluating differentiation networks by adapting it to make it suitable for drawing inferences frommore readily available non-clonal phenotypic proportion time-courses. We ﬁrst established the method drewconsistent deductions when ﬁt to the non-clonal data in Buchholz et al. (Science, 2013) itself. We ﬁt a variantof the model to data reported in Badovinac et al. (Journal of Immunology, 2007), Schlub et al. (Immunologyand Cell Biology, 2010), and Kinjo et al. (Nature Communications, 2015) with necessary simpliﬁcationsto match different reported data in these papers. The deduction from the model was consistent with thatin Buchholz et al. (Science, 2013), albeit with questionable parameterizations. An alternative possibility,supported by the data in Kinjo et al. (Nature Communications, 2015), is that memory precursors are createdafter the expansion phase, which is a deduction not possible from the mathematical methods provided inBuchholz et al. (Science, 2013). This investigation further supports the value of the approach, though weindicate where published experimental ﬁndings run contrary to assumptions underlying the model, whichmay impact the strength of inferences, and what an alternate model more consistent with recent data mightbe.

During an adaptive immune response, a population of naive lymphocytes becomes heterogeneous, containinga population of effector cells that clear the pathogen and then die during the contraction phase, and a popu-lation of memory cells that are of a longer lineage. For CD8 + T cells, many theories have been proposed forthe dynamics of this differentiation. The three primarily championed ones are [1, 16, 4]: ∗ Hamilton Institute, Maynooth University, Ireland † Walter and Eliza Hall Institute, Royal Melbourne Hospital, Parkville, Australia and the Department of Medical Biology, The Universityof Melbourne, Parkville, Australia ‡ Hamilton Institute, Maynooth University, Ireland. E-mail: Corresponding [email protected] a r X i v : . [ q - b i o . CB ] J un Linear Effector First Model:

Naive cells differentiate into proliferating effector cells. Later in theimmune response, these cells differentiate into memory cells or die [19, 13, 23, 15]. • Linear Memory First Model:

Naive cells differentiate into memory precursor cells that then prolifer-ate, with some differentiating into effector cells. Some time after the expansion phase is completed theeffector cells die [3]. • Bifurcation Model:

Each naive cell has the potential to divide into either effector or memory precursorsubsets which then continue to proliferate inheriting their phenotype, and effector cells die after theexpansion phase while memory cells remain [5, 14].These three are non-exhaustive, high-level summaries of a more diverse and subtle set of theories. Provisionof a simple classiﬁcation is further complicated by phenotypic characterization of T cells where there is anoverlap between memory and effector functionality and phenotypic markers, sometimes resulting in a sub-category that is referred to as effector memory cells (TEMp, the p standing for precursor) to distinguish themfrom effector (TEF) and central memory cells (TCMp).The Linear Effector First Model has a long history and could be considered the traditional view [19, 1].In 2007, the authors of [5] proposed asymmetric cell division to explain heterogeneity with CD8 + T cells,supporting the Bifurcation Model, which won’t be considered further here. In 2013, two landmark paperswere published [3, 8] that further addressed this question. Both used novel methodologies to determineclonal relationships of cells. In addition, due to the more quantitative nature of their assay, the authors of [3]developed a methodology to ﬁt a memoryless multi-type Bellman-Harris process model [9] to clonal summarystatistics of CD8 + T cell phenotypes, ﬁnding that a quantitative realization of the Linear Memory First Modelwas the best explanation of their data.Let us introduce some notation in order to distinguish between the available data sources. For an exper-iment initiated at time t = n pathogen speciﬁc cells, let c fi ( t ) denote the number offspring that are ofphenotype f ∈ { ,..., F } from an initial transferred naive cell i at time t post infection. The method used in[3] was effectively unique in determining the number of cells of each phenotype that were descendent froman initially naive cell, c fi ( t ) , for each i and all f ∈ { TEF, TEMp, TCMp } at t = day eight post infection. Weshall refer to this data as being at a clonal level.A more common experimental set-up [3, 2, 21, 15] provides the fraction of all offspring that are of eachphenotype, f , at a given time t , ρ f ( t ) = ∑ ni = c fi ( t ) ∑ Fj = ∑ ni = c ji ( t ) , with clonal contributions being indistinguishable. Experiments recording this information are typically per-formed as a time-course, with cells harvested at distinct days post infection. We shall refer to it as cohort orpopulation level data.In the present paper, we describe deductions made by taking the modeling framework described in [3]and adapting it to make it suitable for ﬁtting to cohort data. Our aim was to explore the general validity andpower of the method for quickly evaluating differentiation networks from simple cohort time series as, unlikeclonal data, those data are more readily available. We used the method to interrogate data reported by otherauthors [2, 21, 15], that employ similar experimental systems to that described in [3].A high-level summary of the content of this paper is:1. To adapt the method in [3] for ﬁtting to non-clonal data, we introduce a distinct objective function.On comparing the best parameter ﬁts to six representative models of the 304 considered in the originalpaper to the time-course cohort data reported in [3] itself, the ﬁnding was the same as for the clonaldata: that if memory is made during the expansion phase, then the Linear Memory First Model is thebest. Moreover, the best-ﬁt parameterization from the cohort data was quantitatively similar to thatwhen ﬁt to clonal data, suggesting consistency of the method.2. Due to data limitations where fewer phenotypes were identiﬁed, for the papers [2, 21, 15] we ﬁt onlyto two representative differentiation structures: Linear Effector First and Linear Memory First. Naive pplication of the methodology to the entire time-course cohort data taken from [15] supported the Lin-ear Effector First Model over the Linear Memory First Model. A more sophisticated application wasperformed, noting that the number of antigen-speciﬁc T cells adoptively transferred in the experimentsreported in [15] was signiﬁcantly higher than in [3], resulting in a curtailed expansion phase. Limit-ing model ﬁtting to expansion phase only, data from blood and spleen in four papers, [2, 21, 3, 15]all support a Linear Memory First Model in favor of a Linear Effector First Model. Best ﬁt modelparameterizations, however, suggest the model is struggling to explain this data.3. Additional data reported in [15] suggests that the a priori assumption underlying the mathematicalanalysis that memory is created during the expansion phase could be questionable.4. While the evidence presented here further supports the utility of the method introduced in [3], weindicate some of the discrepancies between its model assumptions and published data, suggesting analternate model structures that are more consistent with them, which might be of use in future. Clonal level experiments [3] : The authors performed innovative OT-1 CD8+ T cell adoptive transfer experi-ments where cell lineage could be determined. Donor mice were engineered to express both CD45 and CD90markers homozygously, heterozygously or not at all, allowing for nine combinations of markers in total, Fig.1A. C57BL/6 host mice were chosen from the double heterogeneous marker pool. Recipient mice received amix of naive OT-1 CD8 T cells that contained one cell from each of the seven uniquely congenically markeddonor mice together with 100 cells from one further uniquely marked mouse. As lineage was determined bycell surface markers, clonal cell numbers could be quantitatively evaluated by FACS. The cellular barcodingapproach used in the twinned paper [8] was based on DNA tags and their processing via PCR makes that dataless suitable for quantitative modeling analysis.In the experiments reported in [3], the host was infected with 5 × Listeria monocytogenes, a pathogenicbacteria, expressing OVA (LM-OVA). At day eight post infection, cells were taken from the spleen, lymphnodes and lungs. The cells were sorted into three phenotypes: • TCMp : CD62L + CD27 + phenotype. • TEMp : CD62L - CD27 + phenotype. • TEF : CD62L - CD27 - phenotype.Summary statistics from the data collected for these phenotypically deﬁned populations were reported: theaverage cell count per family, coefﬁcients of variation and correlation coefﬁcients between families for thethree phenotypes, Fig. 2A. Cohort level experiments, [2, 21, 3, 15]

Each of these four papers also performed OT-1 CD8+ T celladoptive transfer experiments, Fig. 1B. In these, cells were not tracked at a clonal level, { c fi ( t ) } , but at acohort level { ρ f ( t ) } , via expression of either CD45 or CD90. The host mouse was infected with a pathogenexpressing OVA; PR8-OVA in [15] and LM-OVA in the other papers. The reported experiments have cellstaken from different organs, spleen, or mesenteric lymph nodes (MLN), at a range of times post infections. Inaddition, [2, 21] performed experiments with varying number of transferred OT-1 T cells. All of those papersreport the percentage of cells that were CD62L + at harvest times.For cohort spleen and blood data reported in all four papers, at early stages post infection all experimentsshow an initial large CD62L + population, Fig. 2B, which the authors of [3] associate this with an earlymemory precursor population while the authors of [15] associate it with cells still being naive. All papersshow a decrease in this initial CD62L + proportion, which both [15] and [3] explain is due to the effector poolincreasing in size. The paper [15] reports an increase in the percentage of CD62L + by day seven and explainsthis as cells differentiating into memory cells, whereas the data in [3] continues to show a decrease by dayseven.For experiments with a high number of adoptively transferred cells, [21] and [2] report an earlier expan-sion peak and a shorter expansion phase. Experiments transferring less than 500 cells had a peak at day seven, hile for experiments transferring 5 × cells the peak immune response occurred at day ﬁve. Curtailingthe experimental data with higher numbers of adoptively transferred cells to its expansion phase (a method forestimating this time we describe later in the paper) removes much of the later upturn in CD62L proportions,Fig. 2B. Having transferred 5 × cells, the paper [2] estimates the family size increase to be 40 – 400fold on average at the peak, compared to the estimated 400,000 fold increase on average for the experimenttransferring 50 cells. The authors of [3] introduced a collection of memoryless multi-type Bellman-Harris stochastic processes [9]to model the expansion phase of an adaptive immune response based on the following assumptions. • There are four types of cell: naive, TCMp, TEMp, and TEF. • There is no cell death during the expansion phase described by the model. • For each cell type, the lifetime of each cell is an independent and identically exponentially distributedrandom variable with a cell-type dependent mean. • At the end of its lifetime, a naive cell differentiates into one of the other cell types. • TCMp, TEMp, and TEF cells can divide or differentiate, with the each cell’s fate determined proba-bilistically and independently based on a cell-type dependent parameterization. • Possible differentiation between cell-types is known as a path. Different model types have a differentcombination of paths between cell types, which can be uni- or bi-directional. • There must be a direct or indirect path from the naive cell to each cell-type. • When the combinatorics are all done, the collection of possible paths describes 304 different modelswith between seven and 12 parameters.For a given differentiation network and parameterization, the authors of [3] employ a χ distance as a mea-sure of a quality of ﬁt between expected summary statistics and observed summary statistics. Denote the nineobserved clonal summary statistics per-phenotype of mean family size, coefﬁcient of variation and Pearsoncorrelation between the phenotypes as ˆ x = ( ˆ x ,..., ˆ x ) ∈ R , and their equivalent expected values for a givenmodel plus parameterization, θ , as (cid:126) x ( θ ) ∈ R . With experimental uncertainty of each observation, ˆ x i , esti-mated by bootstrap approximation as ˆ σ i , the χ distance between models and observations is computed tobe d ( ˆ x ,(cid:126) x ( θ )) = ∑ i = ( ˆ x i − x i ( θ )) ˆ σ i . For each of the 304 possible models, the best-ﬁt parameterization, θ ∗ = arginf θ d ( ˆ x ,(cid:126) x ( θ )) , was numericallyidentiﬁed and more involved models were penalised via Akaike Information Criterion (AIC). One of the twobest ﬁts over all 304 possible differentiation networks was a linear differentiation pathway with memory cellsappearing before effector cells (the second model was as the ﬁrst, but with an additional 10% chance for naiveto skip the TCMp stage of the path).We wished to determine the method’s consistency when ﬁtting the model class described in [3] to thetime-course cohort data { ρ f ( t ) : t = day 1 , , , , , } , described in that paper rather than to the clonal data { c fi ( t ) : t = day 8 } as the authors had originally done. To do so, we replaced the clonal summary statistics withˆ y = ( ˆ y ,..., ˆ y ) where ˆ y is the average family size at the peak and y ,..., y are the 18 measured proportionsof phenotypes observed at different times post infection. We denote ˆ σ i as the sample error of the statistic y i . In the case of the average family size, the SEM was approximated by summing together the SEMs of thephenotypic components. Thus, we use the formulae d ( ˆ y ,(cid:126) y ( θ )) = ∑ i = ( ˆ y i − y i ( θ )) ˆ σ i . to measure the distance between expected and observed cohort statistics.We numerically identiﬁed the best ﬁt parameterization of the model for six representative differentiationpathways, Fig. 3A, of the 304 considered in [3]. These six were chosen since they include one of the two bestﬁt models and were sufﬁcient to recreate the most signiﬁcant results reported in [3]. We label the six modelsby: • Model 1 : Naive → CD62L + CD27 + → CD62L - CD27 + → CD62L - CD27 - . • Model 2 : Naive → CD62L + CD27 + → CD62L - CD27 - → CD62L - CD27 + . • Model 3 : Naive → CD62L + CD27 - → CD62L + CD27 + → CD62L - CD27 - . • Model 4 : Naive → CD62L + CD27 - → CD62L - CD27 - → CD62L + CD27 + . • Model 5 : Naive → CD62L - CD27 - → CD62L + CD27 + → CD62L + CD27 - . • Model 6 : Naive → CD62L - CD27 - → CD62L - CD27 + → CD62L + CD27 + .Model 1 is one of the two original best ﬁtting models reported in [3]. As these models all have the samenumber of parameters, we do not need to consider penalization by AIC. Due to less extensive data in [2, 21,15], later we will reduce the set model further to consider just two competing models, Memory (CD62L + )First and Effector (CD62L - ) First, Fig. 3B.The model in [3] has four cell types, but only reports data for three phenotypes and no statistics for naivecells are presented. This could be interpreted two ways, either naive cells have been excluded from the data, orthey were included in the count of one of the other phenotypes. We note that [3] deﬁne naive cells as CD62L + CD27 + ([3] Supplementary Fig. 3), but with a different gating strategy than used for the classiﬁcation of othercell types ([3] Supplementary Fig. 22). For the analysis here, we assume that naive cells are included in theTCMp count reported in [3].When applied to the time-course cohort data reported in [3], we found that Model 1, Memory First,to be the best ﬁt, consistent with the original deduction from ﬁtting to clonal data, Fig. 4A. Notably, theparameterization for the best ﬁtting model was remarkably similar for ﬁve parameters out of six, as shown inFig. 4B. Fitting to either data-set is consistent with an increase in the division rate as cells differentiate frommemory to effector cells, as also reported in [3]. The difference in the parameter λ N , which parameterizesthe lifetime of naive cells, could be explained by the fact that the adjusted method compares the sum ofproportions of naive and TCMp cells to one statistic, the proportion of CD62L + CD27 + cells, and is not ableto discriminate between their separate contributions. These results suggest that, for this model, ﬁtting totime-course cohort data with an average family size is a reasonable alternative to ﬁtting to clonal data ofmean count of cells, variances and covariances. Adaptation.

As data from other papers we wish to analyze, [2, 21, 15], did not include all of the informationas [3] it was necessary to use a modiﬁed scheme. In some cases, average family size or SEM informationwas not given. All four papers provide the proportion of CD62L + cells, but some do not report CD27 data.Thus we deﬁne a new set of models that do not distinguish between CD27 - and CD27 + cells so that we canﬁt the observed statistics without this information. We change the assumption that there are four underlyingcell types: naive, TCMp, TEMp and TEF used in [3], to three cell types: naive, memory and effector cells.Thus the phenotypes are: • Naive and memory cells : CD62L + phenotype. Effector cells : CD62L - phenotype.We label the two versions of these models as: • Linear Memory First Model differentiation path: Naive → CD62L + → CD62L - . • Linear Effector First Model differentiation path: Naive → CD62L - → CD62L + .It is estimated that a mouse has 100–1000 cells capable of responding to any speciﬁc antigen, meaninglarge adoptive transfers such as 10 used in [15] dwarf the endogenous response [2]. The papers [2, 21] estab-lish that increased transfer size leads to an earlier peak response. We used log-linear ﬁtting to approximate therelationship between the number of transferred cells and the day of peak response, Fig. 5A. For an experimenttransferring 107 cells, as described in [3], we estimate the peak to be at day between seven and eight. For anexperiment transferring 10 cells, as described in [15], we estimate the peak day to be between day four andﬁve. We proceed with the assumption in [3] that the expansion phase in that system continues until day eight.The χ objective function employed by [3] uses estimates of the sample variance, ˆ σ i , for each observedstatistic. There is not, however, sufﬁcient data reported to make the bootstrapping calculation for the SEM of[15] data. As a result we needed to modify the weighting in the objective function so it was not dependenton this missing information. This weighting also has to manage the fact that an average family size is on adifferent scale than a proportional statistic. Let us suppose we have an observed average family size denoted asˆ y and k statistics observed of proportional phenotypes at different times denoted as ˆ y ,... ˆ y k + . The objectivefunction we applied was a weighted mean squared error deﬁned by d ( ˆ y ,(cid:126) y ( θ )) = ( ˆ y − y ( θ )) ˆ y + k ∑ i = ( ˆ y i − y i ( θ )) . That is, we use another weighted mean squared error where we only divide one of the sum’s elements by therespective observed statistic squared, the observed populations at day eight post infection. This weighting isrequired to ensure that the population statistic does not disproportionately inﬂuence the objective function, asthe population scales exponentially over time while the proportional statistics do not.The methodology of ﬁtting models to observed cohort data requires an estimate of the average family sizeat peak immune response in order for models to scale correctly, however this data is not reported in [15].The authors of [21] provide a methodology for estimating the difference in the average number of divisionstransferred OT-1 CD8+ T cells have made by the time of peak response between two experiments. We adoptand extend this method to estimate average family size at peak. We ﬁt a log linear relationship to estimatethe percentage of OT-1 cell at peak of the immune response for different adoptive transfer amounts, Fig. 5B.We also make the assumption that the total number lymphocytes (OT-1 and endogenous) at the peak immuneresponse is the same across all experiments, an assumption supported by spleen data in [21]. Spleen countsreported in [2] suggest this may not always be the case, particularly for low transfer numbers. Finally weassume that the average family size reported in [3] is representative. Combing these assumptions allowedus to derive the average family size for a given adoptive transfer of OT-1 cells, Fig. 5C. The log-linearassumption is used for simplicity, but predicts negative growth for low numbers of OT-1 cells, and so canonly be useful above a threshold.This scheme indicates that with a transfer of 1 × cells, as in [3], each clone results in an average ofnine more divisions than the clones in a transfer of 1 × cells, as in [15]. This scales roughly with thatestimated in [21] that a transfer of 3 . × cells results in ﬁve more divisions than a 4 × transfer. Wecalculate that in the experiment reported in [15] we would expect 23 cells per family (four to ﬁve averagedivisions) at the peak of around day 4.5. This is reasonable when compared to data in [2], which reported thatthere is only a 13 fold increase in total OT-1 proportion at peak for a 10 fold increase in precursor transfer.While relative numbers are comparable, large discrepancies in reported data for absolute cell counts highlightcaveats with these estimates. On transferring 10 cells, [3] reports an average peak family size of 15k, while[2] gives a 400k estimate for 50 cells transferred. Our estimate for the total number of lymphocytes in amouse, calculated from the value reported in [3], is four fold lower than that [21] reported in the spleen.Because of the fragility of this calculation, we performed a sanity check which established there was nosigniﬁcant impact (data not shown). pplication. When ﬁtting to all the data in [15] up to day eight post infection, contrary to the deductionin [3], the method indicated that the Linear Effector First Model provided the better ﬁt, Fig. 6A. The LinearEffector First Model was also the best ﬁt for the experiments in [21] with high adoptive transfer numbers. Forexperiments with lower numbers of adoptively transferred cells, the deduction ﬂipped and the Linear Memoryﬁrst model was the best ﬁt. Thus, when ﬁtting without adjusting for the number of cell adoptively transferred,we found contradictory results.We restricted the data from these papers to the estimated expansion phase. Curtailed data with less thanfour data points was not considered. For all other data sets, the Linear Memory First Model provided the bestﬁt, Fig. 6B. On examination, however, all of best ﬁts, had biologically implausible parameterizations, withnaive cell average lifetimes taking values far over 10 days, with the one exception of the Memory First modelﬁtting to [3] data. This is at odds with data showing that, even for high adoptive transfers, less than 0.1%of the OT-1 population comprised unrecruited cells [21], and data from [15] showing that while no cells haddivided on day two post infection the majority were preparing to divide. We therefore further adapted themethod, re-performing the analysis, but upper bounding parameters so that average cell lifetime is ﬁve daysor less. When ﬁtting with this adjustment, the Linear Memory First Model still provided the best ﬁt to allthe curtailed data sets, but the result was stronger, with the Linear Effector First Model failing to give a goodexplanation for the data in [3, 15], Fig. 6C. This result suggested that under the assumptions of the modelthere was no biologically plausible good ﬁt for the Linear Effector First Model when ﬁt to either data set. Weapplied the ﬁtting method to blood data from [2] which was not included in the main analysis, and this alsoreturned a Memory First model as the best ﬁt, Fig. 6D. We also applied the method to MLN data reported in[15], which, in contrast to the other data, returned the Linear Effector First Model as the best ﬁt. The MLNdata, however, may not be appropriate for this analysis, due to the high levels of migration.The method chose boundary values for at least one parameter in all cases except when ﬁtting the LinearMemory First Model to [3] data, Fig. 7. In addition, some models predicted little memory, primarily ﬁttingthe CD62L + portion with naive and effector cells, as was the case with the Effector First Model ﬁtted to[15]. This suggested that the model may not be sophisticated enough or did not have enough data to ﬁt to.An alternative explanation could come from other [15] results, which showed a homogeneous populationeffectors on day four post infection make way for two homogeneous groups of effectors and memory cells onday seven. With the insight that day seven is after the expansion phase in the experimental setup describedin [15], this result suggests the possibility that memory may appear after the strictly exponential expansionphase considered by the model in [3] has come to an end. The evidence presented here suggests that the modeling methodology introduced in [3] to discriminate be-tween differentiation pathways, originally implemented to ﬁt to clonal data, is robust in its deductions whenmodiﬁed to ﬁt to more readily available time-course non-clonal data. Robustness is a key advantage of theapproach for quickly evaluating differentiation networks.We acknowledge that our application of the model to published data has several limitations beyond theoriginal: we considered a much more limited collection of linear-only paths; we had to collapse phenotypicdeﬁnitions due to not having CD27 measurements; we extracted data from papers directly from graphs; weemployed an adapted method for estimating day of peak immune response, and average family size.Despite all of those considerations, the model provided a robust outcome when ﬁt to the non-clonal cohorttime-course published in [3]. Once day of peak response was taken into account, it produced consistentdeductions, that within this modeling framework Memory First provides the best ﬁt, across a range of datafrom other papers [2, 21, 15]. When inappropriately ﬁt to time-courses that extend beyond the expansionphase, interestingly it supported the Effector First Model in some cases, which is more consistent with otherdata in [15].As with any modeling framework, there are, of course, caveats in the strength of the conclusions drawnfrom it. Presumably due to the inherently limited nature of the in vivo data being ﬁt, several modelingassumptions were made in [3] that keep the computational model tractable and parameters identiﬁable, but ome of which are inconsistent with published experimental data. As more sophisticated tracking of cellfamilial fates and information on physiological constraints around lineage transitions develops, we anticipatethat ultimately other mathematical models will be employed that are built on that knowledge. For illustration,we attempt to highlight where those assumptions may inﬂuence the deductions of the present study.The model’s remit is exclusively an expansion phase with no death and strictly exponentially growingpopulations. As a result, the method cannot be reasonably expected to draw inferences on differentiation if itoccurs as motivation to divide is petering out.The fundamental unit of autonomous randomness in the model is the cell: each cell is born afresh, in-dependently selecting its lifetime and fate in a cell-type dependent fashion. The data in both [3, 8] illustratea strong familial inﬂuence, with some clones expanding dramatically more than others. Controlled in vitro experiments have established signiﬁcant elements of clonal dependency [16], notably in burst size [17, 12],while in vivo data using the cell cycle reporter FUCCI [20] has established that during an adaptive immuneresponse cells drop out of cycle as early as day four and the number of non-dividing cells progressively in-creases with time [18]. Thus, to retain simplicity and identiﬁability, an alternative view, taking to accountthese experimental features would posit that the ultimate clone burst size is randomized between families,and that the slowing in proliferation rate noted by the memory ﬁrst model, Fig. 4B and Fig. 7B, is created bynon-dividing memory cells diluting the average division time of the OT-1 populations being measured.The use of an exponential distribution for the lifetime distribution ensures that the multi-type Bellman-Harris process is Markovian, and facilitates the use of non-asymptotic values in model ﬁtting. Due to theminimum time needed to synthesize DNA, there is always a lower-bound on division times and it has beenlong since known that for lymphocytes, at least in vitro , the exponential does not offer a good description ofthe distribution of division times and a right skewed distribution is more appropriate [22, 6, 11, 10, 7]. Thus,while cohort and clonal data yielded the same deductions in our analysis, only experiments looking deeperthan the population level can shed light on the validity of the modeling assumptions used for inference.Whether a model built taking these assumptions into account would lead to distinct deductions is unclear.What is clear is that, within the conﬁnes of the assumptions underlying it, the model introduced in [3] providesa method for quickly evaluating differentiation networks, even for cohort level data. Acknowledgement.

The work of A.M. and K.R.D. was supported by Science Foundation Ireland Grant12IP1263. This work was supported by the National Health and Medical Research Council of Australiavia Project Grant 1010654, Program Grant 1054925 and a fellowship to P.D.H. as well as an AustralianGovernment National Health and Medical Research Council Independent Research Institutes InfrastructureSupport Scheme Grant 361646.

References [1] Raﬁ Ahmed, Michael J. Bevan, Steven L. Reiner, and Douglas T. Fearon. The precursors of memory:models and controversies.

Nature Reviews Immunology , 9(9):662–668, 2009.[2] Vladimir P. Badovinac, Jodie S. Haring, and John T. Harty. Initial T cell receptor transgenic cell precur-sor frequency dictates critical aspects of the CD8+ T cell response to infection.

Immunity , 26(6):827–841, 2007.[3] Veit R. Buchholz, Michael Flossdorf, Inge Hensel, Lorenz Kretschmer, Bianca Weissbrich, PatriciaGr¨af, Admar Verschoor, Matthias Schiemann, Thomas H¨ofer, and Dirk H. Busch. Disparate individualfates compose robust CD8+ T cell immunity.

Science , 340(6132):630–635, 2013.[4] Veit R. Buchholz, Ton N. M. Schumacher, and Dirk H. Busch. T cell fate at the single-cell level.

Annualreview of immunology , 34:65–92, 2016.[5] John T. Chang, Vikram R. Palanivel, Ichiko Kinjyo, Felix Schambach, Andrew M. Intlekofer, ArnobBanerjee, Sarah A. Longworth, Kristine E. Vinup, Paul Mrass, Jane Oliaro, Nigel Killeen, Jordan S.Orange, Sarah M. Russell, Wolfgang Weninger, and Steven L. Reiner. Asymmetric T lymphocytedivision in the initiation of adaptive immune responses.

Science , 315(5819):1687–1691, 2007.

6] Mark R. Dowling, Andrey Kan, Susanne Heinzel, Jie H. S. Zhou, Julia M. Marchingo, Cameron J.Wellard, John F. Markham, and Philip D. Hodgkin. Stretched cell cycle model for proliferating lym-phocytes.

Proceedings of the National Academy of Sciences , 111(17):6377–6382, 2014.[7] Ken R. Duffy, Cameron J. Wellard, John F. Markham, Jie H. S. Zhou, Ross Holmberg, Edwin D.Hawkins, Jhagvaral Hasbold, Mark R. Dowling, and Philip D. Hodgkin. Activation-induced B cellfates are selected by intracellular stochastic competition.

Science , 335(6066):338–341, 2012.[8] Carmen Gerlach, Jan C. Rohr, Le¨ıla Peri´e, Nienke van Rooij, Jeroen W. J. van Heijst, Arno Velds, JosUrbanus, Shalin H. Naik, Heinz Jacobs, Joost B. Beltman, Rob J. de Boer, and Ton N. M. Schumacher.Heterogeneous differentiation patterns of individual CD8+ T cells.

Science , 340(6132):635–639, 2013.[9] Theodore E Harris.

The Theory of Branching Processes . The RAND Corporation, 1964.[10] E. D. Hawkins, J. F. Markham, L. P. McGuinness, and P. D. Hodgkin. A single-cell pedigree anal-ysis of alternative stochastic lymphocyte fates.

Proceedings of the National Academy of Sciences ,106(32):13457–13462, 2009.[11] E. D. Hawkins, M. L. Turner, M. R. Dowling, C. van Gend, and P. D. Hodgkin. A model of immuneregulation as a consequence of randomized lymphocyte division and death times.

Proceedings of theNational Academy of Sciences , 104(12):5032–5037, 2007.[12] Susanne Heinzel, Tran Binh Giang, Andrey Kan, Julia M Marchingo, Bryan K Lye, Lynn M Corcoran,and Philip D Hodgkin. A Myc-dependent division timer complements a cell-death timer to regulate Tcell and B cell responses.

Nature Immunology , 18:96–103, 2017.[13] Joshyi Jacob and David Baltimore. Modelling T-cell memory by genetic marking of memory T cells invivo.

Nature , 399:593–597, 1999.[14] Carolyn G. King, Sabrina Koehli, Barbara Hausmann, Mathias Schmaler, Dietmar Zehn, and Ed Palmer.T cell afﬁnity regulates asymmetric division, effector cell differentiation, and tissue pathology.

Immu-nity , 37(4):709–720, 2012.[15] Ichiko Kinjyo, Jim Qin, Sioh-Yang Tan, Cameron J. Wellard, Paulus Mrass, William Ritchie, AtsushiDoi, Lois L. Cavanagh, Michio Tomura, Asako Sakaue-Sawano, Osami Kanagawa, Atsushi Miyawaki,Philip D. Hodgkin, and Wolfgang Weninger. Real-time tracking of cell cycle progression during CD8+effector and memory T-cell differentiation.

Nature Communications , 6:6301 EP –, 2015.[16] Fabrice Lemaˆıtre, H´el`ene D Moreau, Laura Vedele, and Philippe Bousso. Phenotypic CD8+ T celldiversiﬁcation occurs before, during, and after the ﬁrst T cell division.

The Journal of Immunology ,191(4):1578–1585, 2013.[17] J. M. Marchingo, G. Prevedello, A. Kan, S. Heinzel, P. D. Hodgkin, and K. R. Duffy. T-cell stimuliindependently sum to regulate an inherited clonal division fate.

Nature Communications , 7:13540, 2016.[18] Julia M. Marchingo, Andrey Kan, Robyn M. Sutherland, Ken R. Duffy, Cameron J. Wellard, Gabrielle T.Belz, Andrew M. Lew, Mark R. Dowling, Susanne Heinzel, and Philip D. Hodgkin. Antigen afﬁnity,costimulation, and cytokine inputs sum linearly to amplify T cell expansion.

Science , 346(6213):1123–1127, 2014.[19] Joseph T. Opferman, Bertram T. Ober, and Philip G. Ashton-Rickardt. Linear differentiation of cytotoxiceffectors into memory T lymphocytes.

Science , 283(5408):1745–1748, 1999.[20] Asako Sakaue-Sawano, Hiroshi Kurokawa, Toshifumi Morimura, Aki Hanyu, Hiroshi Hama, HatsukiOsawa, Saori Kashiwagi, Kiyoko Fukami, Takaki Miyata, Hiroyuki Miyoshi, Takeshi Imamura, Masa-haru Ogawa, Hisao Masai, and Atsushi Miyawaki. Visualizing spatiotemporal dynamics of multicellularcell-cycle progression.

Cell , 132(3):487–498, 2008.[21] Timothy E. Schlub, Vladimir P. Badovinac, Jaime T. Sabea, John T. Harty, and Miles P. Davenport.Predicting CD62L expression during the CD8+ T cell response in vivo.

Immunology and Cell Biology ,88(2):157–164, 2010.

22] J. A. Smith and L. Martin. Do cells cycle?

Proceedings of the National Academy of Sciences ,70(4):1263–1267, 1973.[23] E. John Wherry, Volker Teichgr¨aber, Todd C. Becker, David Masopust, Susan M. Kaech, Rustom Antia,Ulrich H. von Andrian, and Raﬁ Ahmed. Lineage relationship and protective immunity of memory CD8T cell subsets.

Nature Immunology , 4:225–234, 2003. . Path ogen

Naive

NaiveNaive Flow

Cytometer ? ? ? ?Host

1. An array of eight OT-1 congenic C57BL/6 donor mice is crossbred so that each has

CD45/CD90 T cell markers that distinguish offspring. 2. Adoptive transfer of purified naïve cells from 8 donor mice to host mouse. 1 cell is transferred per donor for 7 mice, 100 for the 8 th donor mouse. cfu of LM-OVA, triggering the immune response. 4. At eight days after infection cells are harvested from the host mouse from the lungs, spleen and lymph nodes.5.FACs analysis. For observed T cells, the congenic markers indicate the family and CD62L and CD27 markers indicate phenotype. Thus summary statistics of phenotype population is gathered at a family level. Analysis

HostDonors B . Flow Cytometer ? ? ? ?

1. The donor mouse is crossbred so as to express CD90 or CD45. 2. Adoptive transfer of different numbers of purified naïve T cells from donor mice to host mouse: • - 5x [B.2007] • - 4x [S.2010] • - 1x [B.2013] • [K.2015] Various days post infection: • • • •

3. The host mouse (expressing different markers from the donor) is infected with OVA expressing pathogen: • LM-OVA (7x cfu) [B.2007] • LM-OVA (8x cfu) [S.2010] • LM-OVA (5x cfu) [B.2013] • PR8-OVA (1x pfu) [K.2015]

4. At a fixed time post infection cells are harvested from the host mouse from: • blood tail snips [B.2007][S.2010] • spleen only [B.2013] • or spleen and MLN [K. 2015] Analysis

HostHostDonor *1-4 days for 1x transfer 6,8 days for 1x transfer Path ogen

NaiveNaiveNaive

Figure 1. Summary adoptive transfer experiments described in [2, 21, 3, 15]. (A) Experiments describedin [3] determine the number of cells per OT-1 CD8+ T cell clone at day eight. (B) All four papers describesimilar OT-1 CD8+ T cell adoptive transfer experiments, where cohort data is reported for a time-course ofdays post infection. The number of cells adoptively transferred varies between papers.11 . a v e r a g e f a m il y s i z e CD62L-CD27-CD62L-CD27+CD62L+CD27+ c o e ff i c i e n t s o f v a r i a t i o n CD62L-CD27-CD62L-CD27+CD62L+CD27+ c o rr e l a t i o n c o e ff i c i e n t s CD62L-CD27-/CD62L+CD27+CD62L-CD27-/CD62L-CD27+CD62L+CD27+/CD62L-CD27+ p o p u l a t i o n p r o p o r t i o n s CD62L-CD27-CD62L-CD27+CD62L+CD27+ B . [S. 2010] - blood a ll d a t a u p t o d a y p . i . ( p r o p o r t i o n C D L + ) [B. 2007] - blood d a t a c u r t a il e d t o e x p a n s i o n p h a s e ( p r o p o r t i o n C D L + ) [K. 2015] - spleen [K. 2015] - MLN Min ( x ) Median ( . x ) Max ( x ) [B. 2013] - spleen Figure 2. Experimental data from [2, 21, 3, 15].

Data manually extracted from graphs in those papers. (A, leftthree graphs) Results from the clonal experiment described in [3], showing average family size, coefﬁcients ofvariation and correlations for different phenotypes populations. (A, rightmost graph) Proportional phenotyperesults from the cohort experiment. (B) Comparing the CD62L + proportions from the four papers reportingcohort data. The top row shows all reported data up to day eight post infection. The bottom row shows the datawhen curtailed to the estimated expansion phase, which is shortened due to high adoptive transfer cell numbers.12 . TYPE 2Naive TYPE 2TYPE 2

TYPE 2

TYPE 3TYPE 3TYPE 3

TYPE 3 λ λ N λ p TYPE 4TYPE 4TYPE 4TYPE 4 λ p

1. Naïve cells live a exponentially distributed lifetime parameterized by λ N . Upon exit from this state, naïve cells become one cell of type two. 2. Type two cells live an exponentially distributed lifetime parameterized by λ . Upon exit from this state, type two cells differentiate into one cell of type three with probability p or divide into two type two cells with probability 1-p .

3. Type three cells live an exponentially distributed lifetime parameterized by λ . Upon exit from this state, type three cells differentiate into one cell of type four with probability p or divide into two type three cells with probability 1-p .4. Type four cells live an exponentially distributed lifetime parameterized by λ . Upon exit from this state, type four cells always divide into two type four cells. B . TYPE 2Naive TYPE 2TYPE 2TYPE 2 TYPE 3TYPE 3TYPE 3TYPE 3 λ λ N λ p

1. Naïve cells live a exponentially distributed lifetime parameterized by λ N . Upon exit from this state, naïve cells become one cell of type two.

2. Type two cells live an exponentially distributed lifetime parameterized by λ . Upon exit from this state, type two cells differentiate into one cell of type three with probability p or divide into two type three cells with probability 1- p .

3. Type three cells live an exponentially distributed lifetime parameterized by λ . Upon exit from this state type three cells always divide into two type three cells. Figure 3. Linear differentiation models considered here. (A) These linear models are a subset of the304 multi-type Bellman-Harris processes modeled by [3]. Each of the six models has six parameters, ( λ N , λ , λ , λ , p , p ) . (B) Simpliﬁed linear models, with the observed phenotypes reduced from three(CD62L + CD27 + , CD62L + CD27 - , CD62L - CD27 - ) to two (CD62L + , CD62L - ) and cell types reduced fromfour to three. The simpliﬁed model has four parameters, ( λ N , λ , λ , p ) .13 . Model 1 Model 2 Model 3 Model 4 Model 5 Model 6Model Index010203040506070 o b j e c t i v e f un c t i o n s c o r e ( c l o n a l ) o b j e c t i v e f un c t i o n s c o r e ( c o h o r t ) B . λ − N (days) λ − (days) λ − (days) λ − (days) p p parameter0.00.20.40.60.81.01.21.4 p a r a m e t e r v a l u e Model fitted to:

Clonal data (means, variances and covariances of phenotype family size).Population data (phenotypeproportions and averagefamily size).

Figure 4. Results from ﬁtting the linear models to clonal data, as in the original method, or cohort datafrom [3]. (A) Objective function values for the best ﬁtting parameterization of each model, where smaller val-ues indicate a better ﬁt. (B) Best ﬁtting parameterization for Model 1, where λ corresponds to CD62L + CD27 + (TCMp), λ to CD62L - CD27 + (TEMp), λ to CD62L - CD27 - (TEF), p to the probability that a TCMp cellbecomes a TEMp cell and p to the probability that a TEMp cell becomes a TEF cell.14 . transfered number of cells0123456789 d a y o f p e a k r e s p o n s e r =0.81 [Badovinac et al. 2007][Schlub et al. 2010]Estimate for [Buchholz et al. 2013]Estimate for [Kinjyo et al. 2015] B . transfered number of cells0.00.10.20.30.40.50.6 p r o p o r t i o n o f O T - c e ll s a t p e a k r e s p o n c e r =0.78 [Badovinac et al. 2007][Schlub et al. 2010]Estimate for [Buchholz et al. 2013]Estimate for [Kinjyo et al. 2015] C . number of cells adoptively transfered05000100001500020000 e s t i m a t e f o r a v e r a g e f a m il y s i z e Adoptive transfer: [Buchholz et al. 2013] 107[Schlub et al. 2010] . x [Schlub et al. 2010] . x [Kinjyo et al. 2015] x Figure 5. Estimates of the day of peak response and average family size as a function of the number ofadoptively transferred cells, using data from [2, 21]. (A) Log-linear regression was used to ﬁnd an estimatefor the day of peak response (y co-ordinate) from the number of adoptively transferred cells (x co-ordinate)reported in [15] and [3]. (B) as in A but estimating proportion of OT-1 cells of lymphocytes at peak. (C)Estimate of average family size at peak for given number of adoptively transferred cells. For low numbers oftransferred cells ( < . x − spleen[B. 2013] . x blood[S. 2010] . x blood[S. 2010] x blood[S. 2010] x blood[S. 2010] x spleen[K. 2015] T cells tranfered / data source0.000.050.100.150.200.25 o b j e c t i v e f un c t i o n s c o r e B . x − spleen[B. 2013] . x blood[S. 2010] . x blood[S. 2010] x no data[S. 2010] x no data[S. 2010] x spleen[K. 2015] T cells tranfered / data source0.000.020.040.060.080.100.120.14 o b j e c t i v e f un c t i o n s c o r e C . x − spleen[B. 2013] . x blood[S. 2010] . x blood[S. 2010] x no data[S. 2010] x no data[S. 2010] x spleen[K. 2015] T cells tranfered / data source0.00.10.20.30.40.50.6 o b j e c t i v e f un c t i o n s c o r e D . x blood[B. 2007] x blood[B. 2007] x blood[B. 2007] x lymph[K. 2015]T cells tranfered / data source0.000.020.040.060.080.10 o b j e c t i v e f un c t i o n s c o r e s Linear Memory First ModelLinear Effector First Model

Figure 6. The objective function value for the two simpliﬁed linear models for the best parameterizationswhen ﬁtting cohort statistics and average family size.

Smaller numbers indicate a better ﬁt. Fits only donewhen data sets had four or more data points. (A) Fitted to all reported data up to day eight post infection. (B)As in A, but curtailing data to the estimated expansion phase. (C) As in B, but with the average cell lifetimelimited to being below ﬁve days. (D) as in C, but ﬁt to additional data sources.16 . a v e r a g e f a m il y s i z e x − [B. 2013] . x [S. 2010] . x [S. 2010] All CD62L- CD62L+ x [K. 2015] Memory First Effector First p o p u l a t i o n p r o p o r t i o n B . λ − N (days) λ − (days) λ − (days) P parameter0.00.51.01.52.02.5 p a r a m e t e r v a l u e Linear Memory First Model

Data source: x − [B. 2013] . x [S. 2010] . x [S. 2010] x [K. 2015] C . λ − N (days) λ − (days) λ − (days) P parameter0.00.51.01.52.02.5 p a r a m e t e r v a l u e Linear Effector First Model