[PDF] Microsimulation Model Calibration using Incremental Mixture Approximate Bayesian Computation

Abstract

Microsimulation models (MSMs) are used to predict population-level effects of health care policies by simulating individual-level outcomes. Simulated outcomes are governed by unknown parameters that are chosen so that the model accurately predicts specific targets, a process referred to as model calibration. Calibration targets can come from randomized controlled trials, observational studies, and expert opinion, and are typically summary statistics. A well calibrated model can reproduce a wide range of targets. MSM calibration generally involves searching a high dimensional parameter space and predicting many targets through model simulation. This requires efficient methods for exploring the parameter space and sufficient computational resources. We develop Incremental Mixture Approximate Bayesian Computation (IMABC) as a method for MSM calibration and implement it via a high-performance computing workflow, which provides the necessary computational scale. IMABC begins with a rejection-based approximate Bayesian computation (ABC) step, drawing a sample of parameters from the prior distribution and simulating calibration targets. Next, the sample is iteratively updated by drawing additional points from a mixture of multivariate normal distributions, centered at the points that yield simulated targets that are near observed targets. Posterior estimates are obtained by weighting sampled parameter vectors to account for the adaptive sampling scheme. We demonstrate IMABC by calibrating a MSM for the natural history of colorectal cancer to obtain simulated draws from the joint posterior distribution of model parameters.

Full PDF

MMicrosimulation Model Calibration using Incremental MixtureApproximate Bayesian Computation

Carolyn Rutter , Jonathan Ozik , Maria DeYoreo , and Nicholson Collier RAND Corporation University of Chicago and Argonne National LaboratoryAugust 14, 2018

Abstract

Microsimulation models (MSMs) are used to inform policy by predicting population-leveloutcomes under diﬀerent scenarios. MSMs simulate individual-level event histories that markthe disease process (such as the development of cancer) and the eﬀect of policy actions (suchas screening) on these events. MSMs often have many unknown parameters; calibration is theprocess of searching the parameter space to select parameters that result in accurate MSMprediction of a wide range of targets. We develop Incremental Mixture Approximate BayesianComputation (IMABC) for MSM calibration, which results in a simulated sample from theposterior distribution of model parameters given calibration targets. IMABC begins with arejection-based ABC step, drawing a sample of points from the prior distribution of model pa-rameters and accepting points that result in simulated targets that are near observed targets.Next, the sample is iteratively updated by drawing additional points from a mixture of multi-variate normal distributions and accepting points that result in accurate predictions. Posteriorestimates are obtained by weighting the ﬁnal set of accepted points to account for the adaptivesampling scheme. We demonstrate IMABC by calibrating CRC-SPIN 2.0, an updated versionof a MSM for colorectal cancer (CRC) that has been used to inform national CRC screeningguidelines.

Microsimulation models (MSMs) are used to inform policy by predicting population-level outcomesunder diﬀerent policy scenarios. MSMs are characterized by simulation of agents that representindividual members of an idealized population of interest. For each agent, the model simulatesevent histories that catalog landmarks in the disease process. In general, disease processes modeledare not directly observable, though outcomes from these processes may be observed. For example,the process of developing colorectal cancer (CRC) cannot be observed, but the prevalence of bothprecursor lesions (adenomas) and preclinical (asymptomatic) CRC can be estimated from screeningtrials, and CRC incidence can be observed from national registry data.1 a r X i v : . [ s t a t . M E ] A ug odel calibration involves selecting parameter values that result in model predictions thatare consistent with observed data and expected ﬁndings. Once parameters are selected, MSMscan be used to make predictions about population trends in disease outcomes, eﬀectiveness ofinterventions, and the comparative eﬀectiveness of interventions, especially those without directempirical comparisons. For example, models have been used to inform U.S. Preventive ServicesTask Force screening guidelines for breast (Mandelblatt et al., 2016), cervical (Kim et al., 2017),colorectal (Knudsen et al., 2016), and lung cancer (de Koning et al., 2014) by comparing theeﬀectiveness of diﬀerent screening regimens.MSM calibration involves searching a high dimensional parameter space to predict many targets.Several approaches have been proposed. The simplest calibration method involves perturbingparameters one at a time and evaluating the goodness of ﬁt to calibration data, but this is onlyfeasible when calibrating a few parameters. Directed searches, such as the Nelder-Mead algorithm(Nelder and Mead, 1965), provide a derivative free hill-climb to identify a single best value for eachparameter. Kong et al. (2009) used search algorithms from engineering (simulated annealing and agenetic algorithm) for model calibration. Bayesian calibration methods estimate the joint posteriordistribution of MSM parameters, which provides information about parameter uncertainty andenables estimation of functions of parameters. Rutter et al. (2009) used Markov Chain Monte Carlo(MCMC) to simulate draws from the posterior distribution of MSM parameters given calibrationtargets. However, MCMC can be diﬃcult and costly to apply to MSM calibration and becauseMCMC is based on a process of sequentially updating draws, it is not easy to parallelize the processto take advantage of modern computing resources.Approximate Bayesian Computation (ABC) oﬀers an alternative approach to MSM calibration.ABC is a likelihood-free technique for simulating draws from the posterior distribution that ap-proximates likelihood-based algorithms by choosing parameters that produce a close match to datarather than calculating the likelihood (Marin et al., 2012; Conlan et al., 2012). The validity of ABCalgorithms, in the sense that they result in samples from the approximate posterior distribution,relies on the validity of the corresponding exact algorithms (Sisson et al., 2007). The idea under-lying ABC is simple. For a parameter θ with prior distribution π ( θ ) and observed data y , we canwrite the posterior probability as p ( θ | y ) = p ( y | θ ) π ( θ ) implying that we can approximate p ( θ | y ) bysampling θ from π ( · ) and retaining only points with p ( y | θ ) ≈

1. However, ABC is ineﬃcient andcan fail when the parameter space is high dimensional, when there are many calibration targets,or when the prior distributions are very diﬀerent from the posterior distributions. McKinley et al.(2018) found that popular ABC variants that improve the algorithm’s eﬃciency were not compu-tationally feasible for calibrating stochastic epidemiological models. We propose an IncrementalMixture ABC (IMABC) approach for MSM model calibration that begins with a basic rejection-sampling ABC step (e.g., Pritchard et al., 1999) and then incrementally adds points to regionswhere targets are well predicted.In the next sections we describe the CRC-SPIN MSM for the natural history of colorectal cancer(CRC) ( § § § § § The ColoRectal Cancer Simulated Population Incidence and Natural history model (CRC-SPIN)(Rutter et al., 2009; Rutter and Savarino, 2010) describes the natural history of CRC based onthe adenoma-carcinoma sequence (Muto et al., 1975; Leslie et al., 2002). Four model componentsdescribe the natural history of CRC: 1) adenoma risk; 2) adenoma growth; 3) transition fromadenoma to preclinical cancer; and 4) transition from preclinical to clinical cancer (sojourn time).CRC-SPIN has been used to provide guidance to the Centers for Medicare and Medicaid Ser-vices (CMS) (Zauber et al., 2009) and to inform U.S. Preventive Services Task Force CRC screeningguidelines (Knudsen et al., 2016). Model validation, based on comparison of model predictions toobserved outcomes, revealed that while CRC-SPIN predicted many aspects of CRC well (includ-ing clinically detected cancer, cancer mortality, and the eﬀectiveness of screening), it predicteddetection of too few preclinical cancers at screening, indicating that the simulated times spent inthe preclinical cancer phase (sojourn times) were too short (Rutter et al., 2016). In this paperwe present CRC-SPIN 2.0, an update to the original CRC-SPIN 1.0. CRC-SPIN 2.0 contains 21calibrated parameters (Table 1). Because this is a model recalibration, prior distributions are basedon results from the previous calibration of CRC-SPIN 1.0 (Rutter et al., 2009). In this section, weprovide an overview of the model. Additional details are provided in Appendix § A and online at cisnet.cancer.gov (National Cancer Institute, 2018).

The occurrence of adenomas is modeled using a non-homogeneous Poisson process with a piecewiseage-eﬀect. We assume zero risk before age 20. We focus on CRC in adults because CRC is very rarebefore age 20, with incidence of about one in 10 million (Koh et al., 2015). The i th agent’s baselineinstantaneous risk of an adenoma at age a = 20 years is given by ψ i (20) = exp( α i + α female i )where α i ∼ N ( A, σ α ) and α captures the diﬀerence in risk for women (female i = 1 indicates agent i is female). Adenoma risk changes over time, generally increasing with age, a process we modelusing a linear change-point for log-risk with knots at ages 50, 60, and 70. log ( ψ i ( a )) = α i + α sex i + δ ( a ≥

20) min( a − , α + δ ( a ≥

50) min(( a − , α + δ ( a ≥

60) min(( a − , α + δ ( a ≥ a − α (1)3able 1: Summary of CRC Microsimulation Model Components. Calibrated parameters associatedwith the 4 components of the natural history model, including parameter notation, associated equa-tions, prior distributions and posterior estimates (mean and 95% credible interval). TN [ a,b ] ( µ , σ )denotes a truncated normal distribution with mean µ and standard deviation σ , restricted to theinterval ( a, b ). U( a , b ) denotes a Uniform distribution over ( a, b ). Refer to section 2 for details ofthe 4 model components. Prior Posterior EstimatesComponent Distribution Mean 95% CIAdenoma Risk (eqn 1)Baseline log-risk A ∼ TN [ − . , − . (-6.4,0.25) -6.36 (-6.65,-6.03)Standard deviation, baseline log-risk σ α ∼ U(0.75,1.75) 1.28 (0.86,1.69)Female α ∼ TN [ − . , − . (-0.5,0.1) -0.61 (-0.69,-0.49)Age eﬀect, age ∈ [20 , α ∼ TN [0 . , . (0.04,0.06) 0.041 (0.033,0.049)Age eﬀect, age ∈ [50 , α ∼ TN [0 . , . (0.03,0.01) 0.028 (0.012,0.046)Age eﬀect, age ∈ [60 , α ∼ TN [ − . , . (0.03,0.01) 0.013 (-0.007,0.039)Age eﬀect, age ≥ α ∼ U(-0.02,0.03) 0.008 (-0.016,0.028)Time to 10mm (eqn 2)Shape, colon β C ∼ U(1.1,5) 1.32 (1.12,1.57)Shape, rectum β R ∼ U(1.1,5) 3.30 (1.68,4.84)Scale, colon ∗ β C ∼ U(10.7,40) 38.1 (35.6,39.9)Scale, rectum ∗ β R ∼ U(10.7,40) 16.4 (13.5,19.3)Intercept γ ∼ TN [2 . , . (3.1,0.25) 3.23 (3.07,3.42)Female (versus male) γ ∼ TN [ − . , . (-0.06,0.2) -0.17 (-0.26,-0.09)Rectal (versus colon) γ ∼ U(-0.25,0.25) -0.07 (-0.24,0.15)Female & rectal γ ∼ U(-0.25,0.25) 0.12 (-0.03,0.23)Age at initiation γ ∼ TN [ − . , . (-0.008,0.004) -0.009 (-0.014,-0.004)Female & age at initiation γ ∼ U(-0.004,0.004) 0.001 (-0.003,0.004)Rectal & age at initiation γ ∼ U(-0.004,0.004) 0.000 (-0.004,0.003)Female, rectal, & age at initiation γ ∼ U(-0.004,0.004) 0.000 (-0.004,0.004)Mean Sojourn Time (eqn 4)Colon τ C ∼ U(1.5,5.0) 1.91 (1.52,2.65)Rectum τ R ∼ U(1.5,5.0) 2.32 (1.55,3.55) ∗ Scale parameters, β , were also restricted to range from 10( − ln(0 . /β to ( − ln(0 . /β ,corresponding to the probability of an adenoma reaching 10mm within 10 years ranging from0.0001 to 0.25. 4 .2 Adenoma Growth Model For each adenoma, we simulate a hypothetical time to reach 10mm, t mm , which may exceed theagent’s lifespan. We assume that t mm has a Fr`echet distribution with shape parameter β , scaleparameter β , and cumulative distribution function given by F ( t ) = exp (cid:34) − (cid:18) tβ (cid:19) − β (cid:35) (2)for t ≥

0, with E ( t mm ) = β Γ(1 − /β ) and median( t mm ) = β ln(2) − /β . Prior distributionsfor adenoma growth parameters specify that most adenomas grow very slowly. We allow diﬀerentscale and shape parameters for adenomas in the colon and rectum.Adenoma size at any point in time is simulated using a von Bertalanﬀy growth curve model(Tjørve and Tjørve, 2010, see also § A). The simulated time to reach 10mm is used in combinationwith the growth curve model to calculate the adenoma growth rate parameter.

For the j th adenoma in the i th agent the size at transition to preclinical cancer (in mm) is simulatedusing a lognormal distribution; the underlying (exponentiated) normal distribution is assumed tohave standard deviation 0.5 and mean µ ij = γ + γ female i + γ rectum ij + γ female i rectum ij + (cid:16) γ + γ female i + γ rectum ij + γ female i rectum ij (cid:17) age ij . (3)Where rectum ij is an indicator of rectal versus colon location and age ij is the age at adenomainitiation. Based on this model, the probability that an adenoma transitions to preclinical cancerincrease with increasing size. The expected size at transition is given by exp( µ γ + 0 . µ γ ) and variance 0 .

28 exp(2 µ γ + 0 . µ γ = 3 . × − at 10mm, 0.008 at 15mm and 0.16 at 20mm. Sojourn time is the time from the transition to preclinical (asymptomatic) CRC and clinical (symp-tomatic and detected) cancer. We simulate sojourn time using a Weibull distribution with shapeparameter ﬁxed at 5: f ( x ) = (cid:16) τ (cid:17)(cid:16) xτ (cid:17) exp (cid:16) − (cid:16) xτ (cid:17) (cid:17) (4)so that E ( x ) = τ Γ(1 .

2) and

V ar ( x ) = τ (cid:16) Γ(1 . − Γ(1 . (cid:17) . By ﬁxing the shape parameter, wefocus on distributions with a limited range of skewness to disallow distributions with heavy righttails while retaining enough ﬂexibility to model plausible sojourn time distributions. We allow5iﬀerent values of τ for cancers in the colon and rectum. Prior distributions for sojourn timeparameters allow the mean (and standard deviation) of sojourn time to range from 1.4 (sd 0.32) to6.4 years (sd 1.5). Once a cancer becomes clinically detectable, we simulate stage and size at clinical detection andsurvival. Stage and tumor size at clinical detection are based on SEER data from 1975 to 1979,prior to diﬀusion of CRC screening (National Cancer Institute, 2004). Simulated survival time afterCRC diagnosis is based on a Cox proportional hazards model, estimated using SEER data fromindividuals diagnosed with CRC from 1975 through 2003 (Rutter et al., 2013). CRC survival isbased on the ﬁrst diagnosed CRC and depends on sex, age at diagnosis, cancer location (colon orrectum) and stage at diagnosis.Other-cause mortality is modeled using survival probabilities based on product-limit estimatesfor age and birth-year cohorts from the National Center for Health Statistics Databases (NationalCenter for Health Statistics, 2000).

Calibration data are derived from published studies, and typically take the form of summary statis-tics with known distributions, such as binomial, multinomial, and Poisson. We calibrate to 37targets from six sources: SEER registry data (National Cancer Institute, 2004, 16 targets, § § − ln(0 . /β ≤ β ≤ − ln(0 . /β .Calibration targets are based on individual-level data that is reported in aggregate. Calibrationrequires simulating targets by simulating a set of agents with risk that is similar to the studypopulation based on age, gender, and prior screening patterns, and the time period of the study,which may aﬀect both overall and cancer-speciﬁc mortality. SEER colon and rectal cancer incidence rates in 1975-1979 are a key calibration target (Table 2).Incidence rates reported are per 100,000 individuals. These rates are based on the ﬁrst observedinvasive colon or rectal cancer during the years 1975-1979, the most recent period prior to dissem-ination of CRC screening tests. We assume that given the SEER population size, the number ofincident CRC cases in any year follows a binomial distribution.To simulate SEER incidence rates, we generate a population of individuals from 20 to 100, withan age- and sex-distribution that matches the SEER 1978 population (to capture risk-levels within6able 2: Observed and Predicted Annual Incidence of Clinically Detected Cancers in 1975-1979,per 100,000 individuals. Observed Tolerance Posterior PredictedLocation Gender Age Mean Interval Mean 95% CIColon Female 20-49 4.8 (2.8, 6.8) 3.5 (2.8, 4.8)50-59 43.3 (31.3, 55.2) 46.3 (37.0, 54.2)60-69 100.7 (79.7, 121.7) 106.0 (89.8, 119.9)70-84 216.7 (185.6, 247.8) 210.1 (187.7, 239.3)Colon Male 20-49 4.5 (2.5, 6.5) 3.4 (2.6, 4.7)50-59 45.9 (33.2, 58.6) 51.0 (41.4, 58.1)60-69 121.4 (96.6, 146.2) 126.2 (107.0, 143.5)70-84 268.4 (224.6, 312.2) 261.6 (228.7, 301.4)Rectal Female 20-49 1.9 (0.6, 3.1) 1.7 (0.7, 2.8)50-59 20.4 (12.2, 28.6) 20.4 (13.8, 27.3)60-69 42.5 (28.9, 56.1) 41.9 (31.7, 53.1)70-84 73.9 (55.7, 92.1) 73.2 (58.1, 89.7)Rectal Male 20-49 2.3 (0.9, 3.7) 2.4 (1.3, 3.5)50-59 30.0 (19.7, 40.3) 31.7 (23.1, 39.5)60-69 71.4 (52.4, 90.4) 67.9 (54.5, 83.4)70-84 128.0 (97.7, 158.3) 120.5 (100.0, 146.9)each age category), who are free from clinically detected CRC. Model-predicted CRC incidence isbased on the number of people who develop CRC in the next year.

Table 3 summarizes calibration targets from ﬁve studies. To simulate these targets, we generatedseparate populations for each target that match the age and gender distribution of study partici-pants during the time-period of the study. One study (Church, 2004) describing the pathology oflesions (i.e., adenomas and preclinical cancers) did not provide information about the age or sex ofpatients, and so we simulated a population that was 50% male with an average age of 65 (standarddeviation of 5), and an age range of 20 to 90 years.Simulation of targets in Table 3 also requires simulating the detection of lesions (adenomasand preclinical cancers). Sensitivity is a function of lesions size, and is informed by back-to-backcolonoscopy studies (Hixson et al., 1990; Rex et al., 1997, additional details provided in § A). Weassume that study participants are free from symptomatic (clinically detectable) CRC and have notbeen screened for CRC prior to the study. This is a reasonable assumption because studies usedfor model calibration were conducted prior to widespread screening, or were based on minimallyscreened samples. CRC screening guidelines have been in place since the late 1990s (Winaweret al., 1997), and screening rates have since risen steadily (Meissner et al., 2006; Centers for DiseaseControl & Prevention, 2011). 7able 3: Observed and Predicted Calibration Targets from Published StudiesTolerance Posterior PredictedTarget Mean Interval Mean 95% CICorley et al. (2013)Adenoma Prevalence, Women 50-54 15 (12.9, 20.8) 16.8 (14.1, 19.9)Adenoma Prevalence, Women 55-59 18 (15.5, 25.0) 20.3 (17.4, 23.6)Adenoma Prevalence, Women 60-64 22 (19.4, 30.1) 23.8 (20.6, 27.1)Adenoma Prevalence, Women 65-69 24 (20.6, 33.4) 27.0 (23.5, 30.4)Adenoma Prevalence, Women 70-74 26 (21.5, 37.0) 29.9 (26.1, 33.4)Adenoma Prevalence, Women ≥

75 26 (20.8, 37.7) 33.2 (29.0, 37.2)Adenoma Prevalence, Men 50-54 25 (22.1, 34.2) 26.0 (22.7,29.6)Adenoma Prevalence, Men 55-59 29 (25.6, 39.7) 30.7 (26.8,34.6)Adenoma Prevalence, Men 60-64 31 (27.5, 42.3) 35.1 (30.9,39.3)Adenoma Prevalence, Men 65-69 34 (29.6, 46.9) 39.2 (34.4,43.8)Adenoma Prevalence, Men 70-74 39 (33.2, 54.6) 42.7 (37.4,47.7)Adenoma Prevalence, Men ≥

75 38 (31.6, 53.9) 46.6 (40.4,52.1)Pickhardt et al. (2003) ∗ Percent of Detected Adenomas ≥ ∗ Preclinical CRCs per 1,000 Lesions 6 − ≥ , ≥ ∗ Size was reported categorically as ≤ ≥ , .

5) mm, [5 . , .

5) mm and ≥ . Posterior Inference via Incremental Mixture Approximate BayesianComputation (IMABC)

The basic rejection-based ABC algorithm (Tavare et al., 1997; Pritchard et al., 1999) generatesmodel parameter vectors θ from the prior distribution, π ( θ ), then uses the model to simulate data, y ∗ . Draws that result in simulated data that are similar to observed data, y , are accepted. Similaritybetween y ∗ and y is based on user-deﬁned summary statistics, a distance metric, and a tolerancelevel that deﬁnes the distance of acceptable points.In practice, simulating θ from the prior distribution can be very ineﬃcient because the priorand posterior distributions are often poorly aligned. Many versions of ABC have been developedto address ineﬃciencies. Two popular variants are ABC-MCMC (Marjoram et al., 2003) andsequential Monte Carlo ABC (ABC-SMC, Sisson et al., 2007; Toni et al., 2009). ABC-MCMCinvolves proposing a new value of θ by sampling u from a user-speciﬁed jumping distribution, q ( · ),that is centered at zero with θ ( t +1) = θ ( t ) + u . If simulated data based on θ ( t +1) are within tolerancelevels for observed data then, similarly to MCMC, θ ( t +1) is accepted with a probability equal to theminimum of 1 and q ( θ ( t +1) | θ ( t ) ) π ( θ ( t +1) ) q ( θ ( t ) | θ ( t +1) ) π ( θ ( t ) ) . Drawbacks of ABC-MCMC include the usual problems withMCMC, such as correlated samples, low acceptance rates, the possibility of getting stuck in lowposterior probability regions, and slow mixing requiring simulation of very long chains. ABC-SMCis based on importance sampling with the prior used as the proposal distribution. ABC-SMC startsby simulating a set of draws from the prior distribution. Each subsequent set of draws is simulatedby drawing an (importance) weighted sample from the previous set of draws and for each sampledpoint adding a random deviate u that is drawn from a user-speciﬁed jumping distribution. Foreach sampled point this process is repeated until the perturbed point is accepted (i.e., falls withinthe tolerance interval). When using the ABC-SMC approach, users specify the total number ofiterations, T , and a sequence of T increasingly stringent tolerance intervals, which require acceptedpoints to be nearer to targets as the algorithm proceeds. After T iterations, draws from the posteriordistribution are simulated by drawing a weighted sample of θ ’s using ﬁnal importance weights thatare based on the sequence of jumping distributions. The population Monte Carlo ABC algorithm(ABC-PMC) is closely related to ABC-SMC, and also draws on importance sampling (Beaumontet al., 2009; Marin et al., 2012). ABC-PMC uses a multivariate normal jumping distribution withcovariance matrix that is based on prior draws.In general, ABC and its variants can be impractical or can fail when the parameter space ishigh dimensional, or there are many summary statistics that the simulated data must approximate(Blum and Francois, 2010). We propose a new ABC approach that we call incremental mixture ap-proximate Bayesian computation (IMABC), which is well-suited to MSM calibration which involvesboth high dimensional parameter spaces and many calibration targets. IMABC is an approximateBayesian version of adaptive importance sampling, similar to IMIS (Steele et al., 2006; Raftery andBao, 2010), with samples drawn from the parameter space using a proposal distribution that is amixture of normal distributions. Posterior estimates are based on accepted draws that are weighted9o account for diﬀerences between the prior and proposal distrbutions. IMABC is most similar tothe ABC-PMC approach (Beaumont et al., 2009). IMABC adds new points in regions near a subsetof points that produce simulated targets closest to observed targets, whereas ABC-PMC samplespoints based on an approximation to the joint distribution using importance weights. The IMABC algorithm begins with a rejection-sampling ABC step, and updates this initial sampleby adding points near a set of “best” points that result in simulated targets that are closest tocorresponding observed targets.Let O , . . . , O J denote the J calibration targets, which we assume are summary statistics.We specify tolerance bounds around targets based on (1 − α j ) × j = 1 , ..., J . Let α = ( α , α , . . . , α J ). The IMIS algorithm updates tolerance intervals so theybecome more stringent in later iterations. Let α (0) be the alpha-levels used for tolerance intervalsfor the initial ABC step, α ( t ) are alpha-levels for the t th iteration, and α ∗ are the ﬁnal (user-speciﬁed) alpha-levels, corresponding to convergence of the IMABC algorithm. When searchinga high dimensional parameter space, it is practical to begin with very wide tolerance intervals,corresponding to small values of α . Final alpha-levels used to calculate tolerance intervals mayvary across targets depending on the quality of and conﬁdence in calibration targets.Let S ij denote the j th simulated target (corresponding to O j ) for the i th sampled point, andlet δ j ( θ i , α j ) = 1 if S ij falls within the (1 − α j ) × O j . We use anintersection criterion for acceptance (Conlan et al., 2012; Ratmann et al., 2014), with θ i is acceptedwhen all S ij lie within ABC tolerance bounds, so that δ ( θ i , α ) = (cid:81) Jj =1 δ j ( θ i , α j ).At the ﬁrst IMABC step, a sample of N points is drawn from the prior distribution of modelparameters, π ( θ ). The algorithm then enters an updating phase. The ( t + 1)st iteration in theIMABC algorithm proceeds as outlined below: Step 1: Identify the best points and sample new points nearby ρ ij , for each accepted θ i , based on two-sided tests of H : S ij = O j versusH A : S ij (cid:54) = O j for j = 1 , . . . , J , treating S ij as ﬁxed and O j as estimated with error. Often,as in our application, O j is a summary statistic, and is approximately normally distributed.We summarize model ﬁt across multiple targets with ρ i · = min i ( ρ ij ), the worst ﬁt across the J targets.1B. Select the N ( c ) points with the largest ρ i · . When there are ties, calculate the distance betweenthe simulated and observed targets, d i · = (cid:80) j : α j <α ∗ j d ij where d ij = ( S ij − O j ) /O j and selectpoints with the largest ρ i · and smallest d i · .1C. Simulate B new draws around each of the θ ( t +1)( k ) , k = 1 , . . . , N ( c ) best points by sampling froma normal distribution with mean θ ( t +1)( k ) and covariance Σ ( t +1)( k ) .10et p be the dimension of θ (i.e., the number of calibrated parameters). If there are fewer than5 p accepted points, then Σ ( t +1)( k ) is set to a diagonal covariance matrix with standard deviationset to half the prior distribution standard deviation for each parameter. If there are at least5 p and up to 25 p accepted points, Σ ( t +1)( k ) is calculated using all accepted points. If there aremore than 25 p accepted points, Σ ( t +1)( k ) is calculated using the 25 p accepted points nearestto θ ( t +1)( k ) . This means that until the algorithm accepts 25 p points, the the same covariancematrix is used for all normal mixtures.1D. Simulate calibration targets, S ij , for each new draw, and resimulate targets at center points, θ ( t +1)( k ) . Accept or reject new draws and previously sampled center points based on δ ( θ i , α ( t ) ).Resimulation of targets at center points enables the algorithm to move away from centerpoints with S ij that are, by chance, similar to O i . Step 2: Update Tolerance Intervals

If any α ( t ) j < α ∗ j and there are 50 p or more accepted points, check to see if the tolerance can beupdated. Identify i (cid:48) associated with the median ρ i · , with d i · as a tie breaker. For each potentiallyupdated tolerance level, set α ( t +1) j = min( ρ i (cid:48) j , α ∗ j ), then update the accepted θ ’s, so that they arebased on δ ( θ i , α ( t +1) ), removing up to half of the previously accepted points that are furthest fromthe targets. Step 3: Evaluate Stopping Criteria If α ( t +1) = α ∗ , calculate sampling weights and the corresponding eﬀective sample size (ESS).Sampling weights account for sampling of points from the normal mixture rather than the priordistribution, w i = π ( θ i ) /q t ( θ i ). The mixture sampling distribution, q t , is given by q t = N N t π + BN t (cid:80) ts =1 (cid:80) N ( c ) k =1 H ( s ) k where H ( s ) k is the k th normal distribution at iteration s , given by N ( θ ( s )( k ) , Σ ( s )( k ) ),and N t = N + N ( c ) Bt , the total number of draws through iteration t .The ESS for the N ( t +1) draws is ( (cid:80) N ( t +1) i =1 w i ) − , where w i = 0 if δ ( θ i , α ( t +1) ) = 0 (Kish, 1965;Liu, 2004). The algorithm stops when ESS ≥ N post , having obtained the desired number of drawsfrom the posterior distribution. If α ( t +1) = α ∗ and ESS < N post the algorithm continues to iterate,but without further updates to tolerance intervals.Once the IMABC algorithm is complete, independent draws from the posterior distributionare simulated by taking a weighted sample from accepted points with replacement, using the w i .Alternatively, posterior means and 95% credible intervals (CIs) can be estimated using weightedmeans and percentiles based on all accepted draws.When implementing the IMABC algorithm, we recommend using a large initial sample size, N , to ensure exploration of the parameter space and because few initially sampled points may liein high posterior probability regions. The number of normal mixtures used to draw new pointsat each step, N ( c ) , can be selected to optimize use of computing resources. The eﬀective samplesize of the ﬁnal set of accepted points, N post , will depend on the planned uses of the calibrated11argets. For example, 2 ,

000 is a good choice when the goal is to provide interval estimates ofmodel predictions based on percentile intervals, but larger samples may be desired when estimatingfunctions of parameters.Using IMABC to calibrate an MSM requires multiple model evaluations at each parameter drawand the user needs to specify m j , the size of the simulated sample used to obtain S ij . m j may besmaller for common outcomes (such as adenoma prevalence), and larger for rare outcomes (suchas cancer incidence). Setting m too low will result in too much stochastic variation in S ij andinaccurate identiﬁcation of acceptable θ i . Setting m too high will unnecessarily slow the algorithm. To calibrate CRC-SPIN 2.0, we used N = 21 ,

000 with Latin hypercube sampling from the priordistribution to ensure coverage of the parameter space at the initial draw. With the exception ofthe SEER target, we began with α (0) = 0 . α ∗ = 0 . α (0) = 0, accepting all points regardless of nearness to SEER targets, and workedtoward α ∗ = 1 × − , which results in narrow bands around these registry-based incidence rates.Tolerance intervals are wider for study-derived targets because of the smaller sample sizes. Thesewider tolerance intervals also reﬂect the greater uncertainty in these targets due to a range of factorsrelated to their simulation, including uncertainty about population characteristics, sensitivity oflesion detection, and lesion size measurement and categorization. Because the Corley et al. (2013)study is based on insured patients who underwent colonoscopies from 1/1/2006 to 12/31/2008,the observed adenoma incidence rates may be lower than expected because of prior screening andremoval of adenomas. Therfore, we speciﬁed asymetric tolerance limits for the Corley et al. (2013)target, extending the upper tolerance range by adding 0.25 O j to the upper tolerance limit.To take advantage of high performance computing and parallel processing (Appendix § B), weused N ( c ) = 10, drawing B = 1 ,

000 points from each normal mixture so that 10 ,

000 new pointswere evaluated at each updating iteration. We assumed a normal distribution for sample statisticswhen estimating (1 − α ) × N post , to 5 , m j equal to 5 × for Pickhardt et al. (2003); 2 × forCorley et al. (2013) and Imperiale et al. (2000); 3 × for Church (2004), 5 × for Lieberman et al.(2008) and 5 × for the SEER registry data. To improve eﬃciency of the IMABC algorithm, wesequentially calculated S ij for each new θ i in Step 1 of the algorithm, working from targets that areleast to most computationally intensive. After calculating each target, we evaluated δ j ( θ i , α j ) andonce δ j ( θ i , α j ) = 0, the point is rejected without simulating the remaining, more computationallyintensive, targets.Both the IMABC algorithm and the CRC-SPIN 2.0 model were implemented in the R pro-gramming language (R Core Team, 2014). They were coupled to produce an integrated, dynamic,12igh-performance computing workﬂow with the use of the Extreme-scale Model Exploration withSwift (EMEWS) framework (Ozik et al., 2016). Further details about the computing environmentare provided in Appendix B. The IMABC algorithm completed 8 iterations, obtaining 5 ,

253 parameter draws within tolerancelimits, with an eﬀective sample size of 5 ,

168 draws from the joint posterior distribution. Samplingweights ranged from 1 . × − to 3 . × − , with a mean and median of 1 . × − .Posterior estimated means and 95% CIs of model parameters were based on weighted meansand percentiles of accepted draws from the joint posterior distribution (shown in Table 1). Weestimated that adenoma risk is higher for men than women, increases with age, and increasesmore rapidly at younger (than older) ages. Parameters that govern the time for an adenoma toreach 10mm were tightly estimated, with the exception of β R . Consistent with prior limitations,the model predicted that 0.4% of adenomas in the colon reach 10mm within 10 years (95% CI(0.4%, 1.0%)) and 1.8% of adenomas in the rectum reach 10mm within 10 years (95% CI (0.002%,9.8%)). The predicted percent of adenomas reaching 10mm within 20 years rises to 9.6% (95% CI(6.8%, 12.3%)) for adenomas in the colon and 59.7% (95% CI (38.8%, 83.3%)) for adenomas in therectum. We estimated that adenomas transition to preclinical cancer at smaller sizes for women, foradenomas in the rectum, and for adenomas initiated at later ages. The gender eﬀect was strongerfor adenomas in the colon than for adenomas in the rectum. We did not ﬁnd evidence of diﬀerentialeﬀects of age at adenoma initiation on size at transition by adenoma location or agent sex (basedon interaction terms γ , γ , and γ ). We estimated shorter sojourn times for preclinical cancers inthe colon relative to the rectum. The estimated posterior mean sojourn time is 1.75 years with 95%CI (1.39, 2.44) for preclinical cancers in the colon and 2.13 with 95% CI (1.42, 3.26) for preclinicalcancers in the rectum.By simulating draws from the posterior distribution, we were able to examine correlations andrelationships among model parameters. For example, Figure 1 displays the bivariate posteriordistributions of baseline log-adenoma risk ( A ) and the annual increase in risk between the agesof 20 and 50 years ( α ). When baseline risk is lower, risk increases more rapidly from 20 to 50years to accurately predict observed adenoma prevalence, which largely is based on prevalence afterage 50 when guidelines recommend initiation of CRC screening (correlation is − . β R and β R (correlation is − . ≥ .0350.0400.0450.050 −6.7 −6.5 −6.3 −6.1 −5.9Baseline log−risk, A C h a ng e i n l og − r i s k , − , a (a) Baseline log-risk ( A ) and change in risk ages20-59 ( α ) b R S ca l e , ti m e t o mm , r ec t u m , b R (b) Growth parameters, rectal adenomas (shape, β R and scale, β R ) Figure 1: Joint posterior distribution of model parameters associated with adenoma risk, and thegrowth and sojourn time in the colon.the IMABC calibration approach combines information across potentially conﬂicting targets.

We addressed the problem of calibrating microsimulation models by developing IMABC, an ABCalgorithm based on the ideas of incremental mixture importance sampling (IMIS) (Steele et al., 2006;Raftery and Bao, 2010), an adaptive Sampling Importance Resampling algorithm (SIR; Rubin,1987). We illustrate our approach by calibrating CRC-SPIN 2.0, an MSM for colorectal cancer, aproblem that involves a relatively high dimensional parameter space and multiple targets.Like IMIS, the IMABC algorithm iteratively updates the proposal distribution at each itera-tion to obtain samples from regions of the parameter space that are consistent with calibrationtargets. The resulting mixture of normal distributions with locally adaptive covariance matrices isa very ﬂexible distribution, and the algorithm can sample from a distribution that is multimodalto better approximate the posterior distribution. In terms of ABC algorithms, IMABC uses a newapproach to selecting tolerance levels, based on α -levels associated with a test of equality betweenthe simulated and observed targets, which implicitly incorporates the precision of calibration tar-gets. IMABC also provides an automated approach to tuning these tolerance intervals, requiringusers to speciﬁy only the initial and ﬁnal values whereas ABC-SMC requires prespeciﬁcation of thesequence of tolerance intervals.Other advantages of IMABC include clear stopping rules based on the eﬀective sample size,the ability to specify which targets are most important through ﬁnal tolerance intervals, and the14bility to take advantage of parallelized code. A limitation of the IMABC algorithm, especially asapplied to MSM calibration, is that IMABC can be computationally demanding. Evaluation of avery large number of points may be necessary, and calibration targets must be simulated for eachpoint. The computational expense can be reduced through the ordering of target evaluations, andceasing evaluation of a point when the ﬁrst set of targets fails to fall within tolerance bounds. Weimplemented IMABC as a dynamic high-performance computing (HPC) workﬂow via the EMEWSframework (Ozik et al., 2016). While the HPC environment was advantageous for development ofthe IMABC approach, we found that it was not ultimately necessary for its application.Future work will explore the release of publicly available code to allow others to utilize IMABC.In addition, because calibration to summary statistics requires simulation of a large number ofmodel evaluations, each with a large number of agents, we plan to explore ways to improve theeﬃciency of IMABC model calibration. We also plan to examine eﬃcient approaches to parameterupdating when new targets become available, and sequential calibration approaches that can beused to eﬃciently build from simpler to more complex models. A CRC-SPIN 2.0: Additional Model Information

This appendix provides information about the CRC-SPIN 2.0 model that that may be useful forunderstanding the model, but is not essential to understanding the calibration approach. Completemodel description can be found on the cancer.cisnet.gov (National Cancer Institute, 2018), inthe section describing model proﬁles.

A.1 Adenoma Risk Model

Once adenomas are initiated, they are assigned a location. The distribution of adenomas through-out the large intestine follows a multinomial distribution based on data from 9 autopsy studies(Blatt, 1961; Chapman, 1963; Stemmermann and Yatani, 1973; Eide and Stalsberg, 1978; Rick-ert et al., 1979; Williams et al., 1982; Bombi, 1988; Johannsen et al., 1989; Szczepanski et al.,1992). The probabilities associated with six sites in the large intestine (from distal to proximal)are: P(rectum) = 0.09; P(sigmoid colon) = 0.24; P(descending colon) = 0.12; P(transverse colon)= 0.24; P(ascending colon) = 0.23; and P(cecum) = 0.08. For many purposes it is important todistinguish between colon and rectal locations; more detailed location information is sometimesused for determining screening test accuracy.

A.2 Adenoma Growth Model

The diameter of the j th adenoma in the i th agent at time t after onset is given by d ij ( t ) = d ∞ (cid:34) (cid:32)(cid:18) d d ∞ (cid:19) /p − (cid:33) exp( − λ ij t ) (cid:35) d ∞ = 50 is the maximum adenoma diameter in millimeters (mm), d = 1mm is the minimumadenoma diameter, p = 3, corresponding to the von Bertalanﬀy growth model, and λ ij is the growthrate for the j th adenoma within the i th agent. CRC-SPIN 1.0 speciﬁed p = 1, corresponding tothe negative exponential model, but this resulted in relatively faster early adenoma growth and toofew small adenomas.We parameterized the growth model in terms of the time it takes for the adenoma diameterto reach 10mm to improve our ability to relate adenoma growth to observable data and clinicalknowledge. The growth rate, λ ij , can easily be calculated given the time to reach 10mm. A.3 Model for Transition from Adenoma to Preclinical Invasive Cancer

The CRC-SPIN 2.0 model for adenoma transition is a reparameterized version of the CRC-SPIN 1.0model for adenoma transition, restated as a regression model to better evaluate diﬀerences basedon agent and adenoma characteristics.

A.4 Model for Sojourn Time

CRC-SPIN 2.0 uses a Weibul for sojourn times. This allows longer sojourn times and better alignswith ﬁndings from previous studies than the log-normal model used in Version 1.0. For example,data from the TAMACS study (Chen et al., 1999), reported an estimated mean sojourn time of2.85 years with a 95% conﬁdence interval (2 . , . A.5 Simulation of Lifespan and Colorectal Cancer Survival

The CRC-SPIN 2.0 model ﬁrst simulates the stage at clinical detection given sex and age atdetection, and then simulates size at detection conditional on stage. (In contrast, the CRC-SPIN 1.0model simulated size, and then stage conditional on size.)

A.6 Simulated Screening

Colonoscopy sensitivity for adenoma and preclinical CRC detection is based on a quadratic functionof lesion size ( s ) that was successfully used in the CRC-SPIN 1.0 model. For adenomas, we assume P (miss | size = s ≤ . − . s + 0 . s , P (miss | size = 15 < s ≤ . P (miss | size = 30 < s ≤ .

005 and P (miss | size = s ≥ . B Programming and Computing Environment

We utilized the EMEWS framework (Ozik et al., 2016) to implement a dynamic HPC workﬂowcontrolled by the IMABC algorithm. EMEWS, built on the general-purpose parallel scriptinglanguage Swift/T (Wozniak et al., 2013), allows for the direct integration of multi-language softwarecomponents, in this case IMABC and CRC-SPIN 2.0, and can be used on computing resourcesranging from desktops and campus clusters to supercomputers. The resulting IMABC EMEWSworkﬂow is driven directly by the IMABC R source code, obviating the need for porting thecode to alternate programming languages or platforms for the sole purpose of running large-scalecomputational experiments.The experiments were performed on the Cray XE6 Beagle at the University of Chicago, hostedat Argonne National Laboratory. Beagle has 728 nodes, each with 2 AMD Operton 6300 processors,each having 16 cores, for a total of 32 cores per node; the system thus has 23,296 cores in all. Eachnode has 64 GB of RAM. Experiments were also run on the Midway2 cluster at the University ofChicago Research Computing Center. Midway2 is a hybrid cluster, including both CPU and GPUresources. For this work, the CPU resources were used, consisting of 370 nodes of Intel E5-2680v4processors, each with 28 cores and 64 GB of RAM. Swift/T, with the underlying EMEWS workﬂowengine, allows for the abstraction of resource speciﬁc settings (e.g., scheduler type and computelayouts) for a variety of target computing resources. Thus, once the IMABC EMEWS workﬂow wasdeveloped, it could be run on both the Beagle and Midway2 clusters with only minimal conﬁgurationmodiﬁcations.The experiment reported here used 80 nodes on Beagle with 4 worker processes per node (toaccount for the memory footprint of CRC-SPIN 2.0) for a total of 320 worker processes, each ofwhich could concurrently execute an individual model run. The total compute time was 29.4 hoursor 2,352 node-hours.

Acknowledgements

This publication was made possible by ﬁnancial support provided by NIH. Drs. Rutter and DeY-oreo were supported by a grant from the National Cancer Institute (U01-CA-199335) as part of theCancer Intervention and Surveillance Modeling Network (CISNET). Drs. Ozik and Collier weresupported by the NIH (grants 1R01GM115839, 1S10OD018495), the NCI-DOE Joint Design ofAdvanced Computing Solutions for Cancer program, and through resources provided by the Com-putation Institute and the Biological Sciences Division of the University of Chicago, the Universityof Chicago Research Computing Center, and Argonne National Laboratory. This material is basedupon work supported by the U.S. Department of Energy, Oﬃce of Science, under contract number17E-AC02-06CH11357. The contents of this article are solely the responsibility of the authors anddo not necessarily represent the oﬃcial views of the National Cancer Institute.

References

Beaumont, M. A., Corneaut, J., Marin, J. M., and Robert, C. P. (2009), “Adaptive approximateBayesian computation,”

Biometrika , 96(4), 983–990.Blatt, L. J. (1961), “Polyps of the colon and rectum: Incidence and distribution,”

Diseases of theColon and Rectum , 4, 277–282.Blum, M., and Francois, O. (2010), “Non-linear regression models for Approximate Bayesian Com-putation,”

Statistics and Computing , 20, 63–73.Bombi, J. A. (1988), “Polyps of the colon in Barcelona, Spain,”

Cancer , 61, 1472–1476.Centers for Disease Control & Prevention (2011), “Vital signs: Colorectal cancer screening,incidence, and mortality–United States, 2002-2010,”

Morbidity and mortality weekly report ,60(26), 884.Chapman, I. (1963), “Adenomatous polypi of large intestine: Incidence and distribution,”

Annalsof Surgery , 157, 223–226.Chen, T. H. H., Yen, M. F., Lai, M. S., Koong, S. L., Wang, C. Y., Wong, J. M., Prevost, T. C., andDuﬀy, S. W. (1999), “Evaluation of a selective screening for colorectal carcinoma: The TaiwanMulticenter Cancer Screening (TAMACS) Project,”

Cancer , 86, 1116–1128.Church, J. M. (2004), “Clinical Signiﬁcance of Small Colorectal Polyps,”

Dis Colon Rectum , 47, 481–485.Conlan, A. J. K., McKinley, T. J., Karolemeas, K., Pollock, E. B., Goodchild, A. V., Mitchell,A. P., Birch, C. P. D., Clifton-Hadley, R. S., and Wood, J. L. N. (2012), “Estimating the HiddenBurden of Bovine Tuberculosis in Great Britain,”

PLOS Computational Biology , 8(10), 1–14.

URL: https://doi.org/10.1371/journal.pcbi.1002730

Corley, D. A., Jensen, C. D., Marks, A. R., Zhao, W. K., De Boer, J., Levin, T. R., Doubeni,C., Fireman, B. H., and P, Q. C. (2013), “Variation of Adenoma Prevalence by Age, Sex, Raceadn colon Location in a Large Population: Implications for Screening and Quality Programs,”

Clinical Gastroenterology and Hepatology , 11, 172–180.de Koning, H. J., Meza, R., Plevritis, S. K., Ten Haaf, K., Munshi, V. N., Jeon, J., Erdogan,S. A., Kong, C. Y., Han, S. S., van Rosmalen, J. et al. (2014), “Beneﬁts and harms of computedtomography lung cancer screening strategies: a comparative modeling study for the US PreventiveServices Task Force,”

Annals of internal medicine , 160(5), 311–320.18ide, T. J., and Stalsberg, H. (1978), “Polyps of the large intestine in Northern Norway,”

Cancer ,42, 2839–2848.Hixson, L., Fennerty, M., Sampliner, R., McGee, D., and Garewal, H. (1990), “Prospective studyof the frequency and size distribution of polyps missed by colonoscopy,”

J Natl Cancer Inst ,82, 1769–1772.Imperiale, T. F., Wagner, D. R., Lin, C. Y., Larkin, G. N., Rogge, J. D., and Ransohoﬀ, D. F.(2000), “Risk of Advanced Proximal Neoplasms in Asymptomatic Adults According to the DistalColorectal Findings,”

NEJM , 343, 169–174.Johannsen, L. G. K., Momsen, O., and Jacobsen, N. O. (1989), “Polyps of the large intestine inAarhus, Demark. An autopsy study,”

Cancer , 24, 799–806.Kim, J. J., Burger, E. A., Regan, C., and Sy, S. (2017), Screening for Cervical Cancer in PrimaryCare: A Decision Analysis for the U.S. Preventive Services Task Force,, Technical report, Agencyfor Healthcare Research and Quality. Contract No. HHSA-290-2012-00015-I.

URL:

Kish, L. (1965),

Survey sampling , New York, USA: John Wiley and Sons.Knudsen, A. B., Zauber, A. G., Rutter, C. M., Naber, S. K., Doria-Rose, V. P., Pabiniak, C.,Johanson, C., Fischer, S. E., Lansdorp-Vogelaar, I., and Kuntz, K. M. (2016), “Estimation ofbeneﬁts, burden, and harms of colorectal cancer screening strategies: modeling study for the USPreventive Services Task Force,”

Jama , 315(23), 2595–2609.Koh, K.-J., Lin, L.-H., Huang, S.-H., and Wong, J.-U. (2015), “CARE?Pediatric Colon Adeno-carcinoma: A Case Report and Literature Review Comparing Diﬀerences in Clinical FeaturesBetween Children and Adult Patients,”

Medicine , 94(6).Kong, C. Y., McMahon, P. M., and Gazelle, G. S. (2009), “Calibration of Disease Simulation ModelUsing an Engineering Approach,”

Value in Health , 12(4), 521–529.

URL: http://dx.doi.org/10.1111/j.1524-4733.2008.00484.x

Leslie, A., Carey, F. A., Pratt, N. R., and Steele, R. J. C. (2002), “The colorectal adenoma–carcinoma sequence,”

British Journal of Surgery , 89, 845–860.Lieberman, D., Moravec, M., Holub, J., Michaels, L., and Eisen, G. (2008), “Polyp Size and Ad-vanced Histology in Patients Undergoing Colonoscopy Screening: Implications for CT Colonog-raphy,”

Gastroenterology , 135, 1100–1105.Liu, J. (2004),

Monte Carlo Strategies in Scientiﬁc Computing , New York: Springer Verlag.Mandelblatt, J. S., Stout, N. K., Schechter, C. B., Van Den Broek, J. J., Miglioretti, D. L., Krapcho,M., Trentham-Dietz, A., Munoz, D., Lee, S. J., Berry, D. A. et al. (2016), “Collaborative modeling19f the beneﬁts and harms associated with diﬀerent US breast cancer screening strategies,”

Annalsof internal medicine , 164(4), 215–225.Marin, J.-M., Pudlo, P., Robert, C. P., and Ryder, R. J. (2012), “Approximate Bayesian computa-tional methods,”

Statistics and Computing , 22(6), 1167–1180.Marjoram, P., Molitor, J., Plagnol, V., and Simon, T. (2003), “Markov chain Monte Carlo withoutlikelihoods,”

Proceedings of the National Academy of Sciences , 100(26), 15324–15328.McKinley, T., Vernon, I., Andrianakis, I., McCreesh, N., Oakley, J., Nsubuga, R., Goldstein, M.,and White, R. (2018), “Approximate Bayesian Computation and simulation-based inference forcomplex stochastic epidemic models,”

Statistical science , 33(1), 4–18.Meissner, H. I., Breen, N., Klabunde, C. N., and Vernon, S. W. (2006), “Patterns of colorectalcancer screening uptake among men and women in the United States,”

Cancer Epidemiology andPrevention Biomarkers , 15(2), 389–394.Muto, T., Bussey, H. J. R., and Morson, B. C. (1975), “The Evolution of Cancer in the Colon andRectum,”

Cancer , 36, 2251–2270.National Cancer Institute (2004), Surveillance, Epidemiology, and End Results (SEER) Program,,Technical report, National Cancer Institute, DCCPS, Surveillance Research Program, CancerStatistics Branch. released April 2004, based on the November 2003 submission.

URL:

National Cancer Institute (2018), “Cancer INtervention and Surveillance Modeling Network (CIS-NET),”.

URL: https://cisnet.cancer.gov

National Center for Health Statistics (2000), “US Life Tables,”.

URL:

Nelder, J. A., and Mead, R. (1965), “A simplex method for function minimization,”

ComputerJournal , 7, 308–313.Ozik, J., Collier, N. T., Wozniak, J. M., and Spagnuolo, C. (2016), From desktop to Large-Scale Model Exploration with Swift/T,,, in

Proceedings of the 2016 Winter Simulation Con-ference(WSC) , IEEE Press, pp. 206–220.Pickhardt, P. J., Choi, R., Hwang, I., Butler, J. A., Puckett, M. L., Hildebrandt, H. A., Wong, R. K.,Nugent, P. A., Mysliwiec, P. A., and Schindler, W. R. (2003), “Computed Tomographic VirtualColonocopy to Screen for Colorectal Neoplasia in Asymptomatic Adults,”

NEJM , 349, 2191–2200.Ponugoti, P. L., and Rex, D. K. (2017), “Yield of a second screening colonoscopy 10 years after aninitial negative examination in average-risk individuals,”

Gastrointestinal endoscopy , 85(1), 221–224. 20ritchard, J. K., Seielstad, M. T., Perez-Lezaun, A., and Feldman, M. W. (1999), “Populationgrowth of human Y chromosomes: a study of Y chromosome microsatellites.,”

Molecular Biologyand Evolution , 16(12), 1791–1798.

URL: + http://dx.doi.org/10.1093/oxfordjournals.molbev.a026091

R Core Team (2014),

R: A Language and Environment for Statistical Computing , R Foundationfor Statistical Computing, Vienna, Austria.

URL:

Raftery, A., and Bao, L. (2010), “Estimating and Projecting Trends in HIV/AIDS GeneralizedEpidemics Using Incremental Mixture Importance Sampling,”

Biometrics , 66, 1162–1173.Ratmann, O., Camacho, A., Meijer, A., and Donker, G. (2014), “Statistical modelling of summaryvalues leads to accurate approximate Bayesian computations,”

ArXiv , arXiv:1305.4283.Rex, D., Cutler, C., Lemmel, G., Rahmani, E., Clark, D., Helper, D., Lehman, G., and Mark,D. (1997), “Colonoscopic miss rates of adenomas determined by back-to-back colonoscopies,”

Gastroenterology , 112, 24–28.Rickert, R. R., Auerbach, O., Garﬁnkel, L., Hammond, E. C., and Frasca, J. M. (1979), “Adeno-matous lesions of the large bowel. An autopsy survey,”

Cancer , 43, 1847–1857.Rubin, D. (1987), “The calculation of posterior distributions by data augmentation: Comment:A noniterative sampling/importance resampling alternative to the data augmentation algorithmfor creating a few imputations when fractions of missing information are modest: The SIRalgorithm,”

Journal of the American Statistical Association , 82, 543–546.Rutter, C. M., Johnson, E. A., Feuer, E. J., Knudsen, A. B., Kuntz, K. M., and Schrag, D. (2013),“Secular trends in colon and rectal cancer relative survival,”

Journal of the National CancerInstitute , 105(23), 1806–1813.Rutter, C. M., Knudsen, A. B., Marsh, T. L., Doria-Rose, V. P., Johnson, E., Pabiniak, C., Kuntz,K. M., van Ballegooijen, M., Zauber, A. G., and Lansdorp-Vogelaar, I. (2016), “Validation ofModels Used to Inform Colorectal Cancer Screening Guidelines: Accuracy and Implications,”

Medical Decision Making , 36, 604–614.Rutter, C. M., Miglioretti, D. L., and Savarino, J. E. (2009), “Bayesian Calibration of Microsimu-lation Models,”

JASA , 104, 1338–1350.Rutter, C. M., and Savarino, J. E. (2010), “An Evidence-Based Microsimulation Model for Col-orectal Cancer: Validation and Application,”

Cancer Epidemiology Biomarkers and Prevention ,19, 1992–2002.Sisson, S. A., Fan, Y., and Tanaka, M. M. (2007), “Sequential Monte Carlo without likelihoods,”

Proceedings of the National Academy of Sciences , 104(6), 1760–1765.

URL:

Journal of Computationaland Graphical Statistics , 15, 712–734.Stemmermann, G. N., and Yatani, R. (1973), “Diverticulosis and polyps of the large intestine. Anecropsy study of Hawaii Japanese,”

Cancer , 31, 1260–1270.Szczepanski, W., Urban, A., and Wierzchowski, W. (1992), “Colorectal polyps in autopsy material.Part I. Adenomatous polyps,”

Pat Pol , 43, 79–85.Tavare, S., Balding, D., Griﬃths, R., and Donnelly, P. (1997), “Inferring Coalescence Times fromDNA Sequence Data,”

Genetics , 145, 505—518.Tjørve, E., and Tjørve, K. M. (2010), “A uniﬁed approach to the Richards-model family for use ingrowth analyses: why we need only two model forms,”

Journal of theoretical biology , 267(3), 417–425.Toni, T., Welch, D., Strelkowa, N., and Stumpf, M. (2009), “Approximate Bayesian computationscheme for parameter inference and model selection in dynamical systems,”

Journal of the RoyalSociety Interface , 6(31), 187–202.Williams, A. R., Balasooriya, B. A. W., and Day, D. W. (1982), “Polyps and cancer of the largebowel: A necropsy study in Liverpool,”

Gut , 23, 835–842.Winawer, S. J., Fletcher, R. H., Miller, L., Godlee, F., Stolar, M., Mulrow, C., Woolf, S., Glick,S., Ganiats, T., Bond, J. et al. (1997), “Colorectal cancer screening: clinical guidelines andrationale,”

Gastroenterology , 112(2), 594–642.Wozniak, J. M., Armstrong, T. G., Wilde, M., Katz, D. S., Lusk, E., and Foster, I. T. (2013),Swift/T: Large-Scale Application Composition via Distributed-Memory Dataﬂow Processing,, in , IEEE,pp. 95–102.Zauber, A. G., Knudsen, A. B., Rutter, C. M., Lansdorp-Vogelaar, I., Savarino, J. E., vanBallegooijen, M., and Kuntz, K. M. (2009), Cost-Eﬀectiveness of CT Colonography to Screenfor Colorectal Cancer: Report to the Agency for Healthcare Research and Quality from theCancer Intervention and Surveillance Modeling Network (CISNET) for MISCAN, SimCRC, andCRC-SPIN Models,, Technical report. Project ID: CTCC0608.

URL: