Adaptive Approximate Bayesian Computation Tolerance Selection
Umberto Simola, Jessica Cisewski-Kehe, Michael U. Gutmann, Jukka Corander
Bayesian Analysis (0000), Number 0, pp. 1

Abstract.
Approximate Bayesian Computation (ABC) methods are increasingly used for inference in situations in which the likelihood function is either computationally costly or intractable to evaluate. Extensions of the basic ABC rejection algorithm have improved the computational efficiency of the procedure and broadened its applicability. The ABC-Population Monte Carlo (ABC-PMC) approach has become a popular choice for approximate sampling from the posterior. ABC-PMC is a sequential sampler with an iteratively decreasing value of the tolerance, which specifies how close the simulated data need to be to the real data for acceptance. We propose a method for adaptively selecting a sequence of tolerances that improves the computational efficiency of the algorithm over other common techniques. In addition we define a stopping rule as a by-product of the adaptation procedure, which assists in automating termination of sampling. The proposed automatic ABC-PMC algorithm can be easily implemented and we present several examples demonstrating its benefits in terms of computational efficiency.
Keywords:
Complex stochastic modeling, likelihood-free methods, sequential Monte Carlo.
∗ Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland, [email protected]. † Department of Statistics and Data Science, Yale University, New Haven, CT, USA, [email protected]. ‡ School of Informatics, University of Edinburgh, Edinburgh, United Kingdom, [email protected]. § Department of Biostatistics, University of Oslo, Oslo, Norway, [email protected].

Approximate Bayesian Computation (ABC) provides a framework for inference in situations where the relationship between the data and the parameters does not lead to a tractable likelihood function, but where forward simulation of the data-generating process is possible. ABC has been used in many areas of science such as biology (Thornton and Andolfatto, 2006), epidemiology (McKinley et al., 2009; Numminen et al., 2013), ecology (Beaumont, 2010), population modeling (Toni et al., 2009), modeling the population effects of a vaccine (Corander et al., 2017), dark matter direct detection (Simola et al., 2019), and astronomy (Cameron and Pettitt, 2012; Cisewski-Kehe et al., 2019; Ishida et al., 2015; Schafer and Freeman, 2012; Weyant et al., 2013). The basic ABC algorithm (Pritchard et al., 1999; Rubin, 1984; Tavaré et al., 1997) can be explained in four steps. Suppose the parameter vector θ ∈ R^p is the target of inference. Then (i) draw the model parameters from the prior distribution, θ_prop ∼ π(θ); (ii) produce a synthetic sample of the data by using θ_prop in the forward simulation model, y_prop ∼ f(y | θ_prop); (iii) compare the true data, y_obs, with the generated sample, y_prop, using a distance function, ρ(·,·), defining the distance as d = ρ(s(y_obs), s(y_prop)), where s(·) is some (possibly multi-dimensional) summary statistic of the data; (iv) if the distance, d, is less than or equal to a fixed tolerance, ε, then θ_prop is retained, otherwise it is discarded. This is repeated until a desired particle sample size, N, is achieved.

Following the notation of Marin et al. (2012), the resulting ABC posterior can be written as

π_ε(θ | y_obs) = ∫ [ f(y_prop | θ) π(θ) I_{A_{ε,y_obs}}(y_prop) / ∫_{A_{ε,y_obs} × Θ} f(y_prop | θ) π(θ) dy_prop dθ ] dy_prop,

where I_{A_{ε,y_obs}}(·) is the indicator function for the set A_{ε,y_obs} = { y_prop | ρ(s(y_obs), s(y_prop)) ≤ ε }.

There are many extensions to the basic ABC algorithm (e.g., Blum 2010; Blum et al. 2013; Ratmann et al. 2013; Csilléry et al. 2010; Del Moral et al. 2012; Drovandi and Pettitt 2011; Fearnhead and Prangle 2012; Joyce and Marjoram 2008; Marin et al. 2012), but here we focus on the ABC-Population Monte Carlo (ABC-PMC) approach introduced by Beaumont et al. (2009). However, the proposed methodology could be used in other sequential versions of ABC that require selecting a sequence of tolerances. The proposed adaptive approximate Bayesian computation tolerance selection algorithm (aABC-PMC) targets the same kind of approximate posterior sampling problems as the original ABC-PMC algorithm, and may be subject to the same limitations in the case of high-dimensional parameter spaces. ABC has been successfully used in numerous situations where the likelihood function is intractable and the number of parameters varies from 2 to 5 (e.g. Beaumont et al.
2009; Cisewski-Kehe et al. 2019; Csilléry et al. 2010; Cornuet et al. 2008; Del Moral et al. 2012; Gutmann and Corander 2016; Järvenpää et al. 2016; Jennings and Madigan 2016; Jennings et al. 2016; Numminen et al. 2013; Silk et al. 2013; Simola et al. 2019; Sisson et al. 2007; Toni et al. 2009). Our algorithm is designed to significantly improve upon the original ABC-PMC method under similar circumstances.

The ABC-PMC algorithm by Beaumont et al. (2009) is based on an adaptive importance sampling approach, where, given a series of decreasing tolerances ε_1 > ε_2 > ··· > ε_T (T being the final iteration), the proposal distribution is sequentially updated in order to improve the efficiency of the algorithm. This is done by constructing a series of intermediate proposal distributions, with the details of the steps presented in Algorithm 1. The first iteration of the ABC-PMC algorithm uses tolerance ε_1 and draws proposals from the specified prior distribution(s); the corresponding ABC posterior is denoted by π_{ε_1}. Rather than starting the rejection sampling over using a smaller ε, the algorithm proceeds sequentially by drawing proposals from the ABC posterior approximated in the previous iteration. After a parameter value, typically referred to as a particle, is selected from the set of available particles from the previous iteration, it is also translocated according to some kernel function (e.g. a Gaussian kernel) to avoid degeneracy of the sampler. Since the proposals are not drawn directly from the prior π, importance weights are used. The importance weight for a particle J = 1, ..., N at iteration t is:

W_t^{(J)} ∝ π(θ_t^{(J)}) / Σ_{K=1}^N W_{t-1}^{(K)} φ[ τ_{t-1}^{-1} ( θ_t^{(J)} − θ_{t-1}^{(K)} ) ],   (1.1)

where φ(·) is the density function of a standard normal distribution¹, τ_{t-1} is the variance (twice the weighted sample variance of the particles from iteration t − 1), and π(·) is the prior distribution. We note that the definition for the importance weight provided in Eq. (1.1) is up to a normalization constant. In fact each importance weight is normalized such that Σ_{J=1}^N W_t^{(J)} = 1. While the particles are drawn from a sequentially improving proposal distribution, the tolerances also decrease such that ε_1 > ε_2 > ··· > ε_T, to increase the fidelity of the resulting approximation to the underlying posterior. The common strategies for selecting this sequence adaptively, highlighted in Section 1.1, can lead to inefficient sampling as well as avoiding relevant regions of the parameter space (Silk et al., 2013). The key contributions of this article are (i) a method for selecting the tolerance sequence (ε_1, ε_2, ..., ε_T) in a manner that results in improved computational efficiency, and (ii) a rule for determining when the algorithm terminates (i.e. determining T).

There are three common approaches for selecting the tolerance sequence: (i) fixing the values in advance (Beaumont et al., 2009; McKinley et al., 2009; Sisson et al., 2007; Toni et al., 2009), (ii) adaptively selecting ε_t based on some quantile of {d_{t-1}^{(J)}}_{J=1}^N, the distances of the accepted particles from iteration t − 1, or (iii) selecting ε_t based on the effective sample size (ESS) values (Del Moral et al., 2012; Numminen et al., 2013). These approaches can lead to inefficient sampling as discussed below and demonstrated in the simulation study in Section 3.

It turns out that selecting tolerances using a predetermined quantile can, if not selected wisely, lead to the particle system getting stuck in local modes (Silk et al., 2013). Hence the exact sequence of tolerances has an impact not only on the computational efficiency of the algorithm but also on convergence towards the true posterior. We emphasize, however, that obtaining a high-fidelity approximation to the true posterior using ABC is not guaranteed, as this depends on a number of conditions to be met, including a careful selection of summary statistics. Silk et al. (2013) propose an adaptive approach for selecting the tolerance sequence at each iteration by estimating the threshold-acceptance rate curve (TAR curve), which is used to balance the amount of shrinkage of the tolerance with the acceptance rate. This approach requires the estimation of the TAR curve at each iteration of the algorithm. The naive, but computationally impractical approach to estimating the TAR curve (noted as such in Silk et al. 2013), is to simulate a Monte Carlo estimate of the acceptance rate at a range of tolerance values.

¹ The probability density function of a Q-dimensional standard normal distribution is φ(X) = (2π)^{−Q/2} exp(−XᵀX / 2), with expected value E[X] = 0 and Var[X] = I_Q, where I_Q is the Q × Q identity matrix.
Algorithm 1: ABC-PMC algorithm for θ

Given a series of decreasing tolerances ε_1 > ε_2 > ··· > ε_T:
if t = 1 then
    for J = 1, ..., N do
        Set d_1^{(J)} = ε_1 + 1
        while d_1^{(J)} > ε_1 do
            Propose θ_1^{(J)} by drawing θ_prop ∼ π(θ)
            Generate y_prop ∼ f(y | θ_1^{(J)})
            Calculate distance d_1^{(J)} = ρ(s(y_obs), s(y_prop))
        end while
        Set weight W_1^{(J)} = N^{−1}
    end for
else if 2 ≤ t ≤ T then
    Set τ_t = 2 · var({θ_{t−1}^{(J)}, W_{t−1}^{(J)}}_{J=1}^N)
    for J = 1, ..., N do
        Set d_t^{(J)} = ε_t + 1
        while d_t^{(J)} > ε_t do
            Select θ*_t from {θ_{t−1}^{(J)}}_{J=1}^N with probabilities {W_{t−1}^{(J)} / Σ_{K=1}^N W_{t−1}^{(K)}}_{J=1}^N
            Propose θ_t^{(J)} ∼ N(θ*_t, τ_t)
            Generate y_prop ∼ f(y | θ_t^{(J)})
            Calculate distance d_t^{(J)} = ρ(s(y_obs), s(y_prop))
        end while
        Set weight W_t^{(J)} ∝ π(θ_t^{(J)}) / Σ_{K=1}^N W_{t−1}^{(K)} φ[τ_{t−1}^{−1}(θ_t^{(J)} − θ_{t−1}^{(K)})]
    end for
end if

A standard alternative is to run the algorithm for a fixed number of iterations, T (Beaumont et al., 2009). Ishida et al. (2015) showed that once the ABC posterior stabilizes, further reduction of the tolerance leads to low acceptance rates without meaningful improvement in the ABC approximation to the posterior. They stop the algorithm once the acceptance rate drops below a threshold set by the user.

The first main contribution of this paper is to extend the ABC-PMC algorithm so that the quantile used to update the tolerance in each iteration, q_t, is automatically and efficiently selected, rather than being fixed in advance to a quantile that is used for each iteration. It is worth noticing that efficiency is not only a matter of having a high acceptance rate, as this can be easily accomplished by using larger quantiles, but rather a balance between the acceptance rate and a suitable amount of shrinkage of the tolerance.
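For concreteness, the loop of Algorithm 1 can be sketched in a few lines of code. This is a minimal single-parameter illustration under assumed choices (a toy Gaussian forward model y ∼ N(θ, 1), a uniform prior on (−5, 5), the sample mean as summary statistic, and a short fixed tolerance sequence); it is not the authors' implementation. The Gaussian kernel with variance τ (scale √τ) is used both for the perturbation and in the weight denominator of Eq. (1.1).

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(theta, n=50):
    # assumed toy forward model: y ~ N(theta, 1)
    return rng.normal(theta, 1.0, size=n)

def distance(y_obs, y_prop):
    # summary statistic s(.) = sample mean; rho = absolute difference
    return abs(y_obs.mean() - y_prop.mean())

def abc_pmc(y_obs, eps_seq, N=200, lo=-5.0, hi=5.0):
    prior_pdf = 1.0 / (hi - lo)            # Unif(lo, hi) prior density
    thetas = weights = tau = None
    for t, eps in enumerate(eps_seq):
        new_t, new_w = np.empty(N), np.empty(N)
        for j in range(N):
            while True:
                if t == 0:
                    prop = rng.uniform(lo, hi)   # draw from the prior
                else:
                    # select a particle, perturb with the kernel N(theta*, tau)
                    prop = rng.normal(rng.choice(thetas, p=weights), np.sqrt(tau))
                    if not (lo <= prop <= hi):
                        continue                 # zero prior density, re-propose
                if distance(y_obs, forward(prop)) <= eps:
                    break
            new_t[j] = prop
            if t == 0:
                new_w[j] = 1.0
            else:
                # importance weight of Eq. (1.1), Gaussian kernel of variance tau
                kern = np.exp(-0.5 * (prop - thetas) ** 2 / tau) / np.sqrt(2 * np.pi * tau)
                new_w[j] = prior_pdf / (weights * kern).sum()
        thetas, weights = new_t, new_w / new_w.sum()     # normalize the weights
        mu = (weights * thetas).sum()
        tau = 2.0 * (weights * (thetas - mu) ** 2).sum() # twice the weighted variance
    return thetas, weights

y_obs = rng.normal(0.0, 1.0, size=50)
thetas, weights = abc_pmc(y_obs, eps_seq=[1.0, 0.5, 0.25])
```

With the adaptive scheme proposed in Section 2, the sequence eps_seq would instead be constructed on the fly rather than fixed in advance.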
Moreover, the series of tolerances needs to be selected in such a way that the algorithm avoids getting stuck in local modes. As the second contribution, we develop an automatic stopping rule directly based on the behavior of the sequential ABC posterior.

The rest of the paper is organized as follows. In Section 2 the adaptive selection of q_t for determining the tolerance sequence is presented along with the proposed stopping rule. Section 3 is dedicated to a simulation study to compare quantile-based selection of tolerances using ABC-PMC with the proposed procedure. The final example considered uses real data on colonizations of the bacterium Streptococcus pneumoniae (Numminen et al., 2013). Concluding remarks are given in Section 4.
Using the same quantile to update the tolerance at each iteration can be computationally inefficient and can result in the particle system getting stuck in local modes (see the example in Section 3.2). In this section we introduce a method for adaptively selecting the quantile such that each iteration has its own quantile, q_t, set based on the online performance of the algorithm.

In order to initialize the tolerance sequence we use the following approach. Let N be the desired number of particles to approximate the posterior. The initial tolerance ε_1 can be adaptively selected by sampling N_init = kN draws from the prior, for some k ∈ Z⁺ (Cisewski-Kehe et al., 2019). Then the N particles of the N_init total particles with the smallest distances are retained, and ε_1 = max(d_1^{(1*)}, ..., d_1^{(N*)}), where d_1^{(1*)}, ..., d_1^{(N*)} are the N smallest distances of the N_init particles sampled. This initialization procedure effectively selects a distance quantile for the first step by the selection of an appropriate k, but making this first step adaptive is easier than trying to guess a good ε_1. Trying to specify a reasonable ε_1 can be especially challenging when testing different summary statistics or distance functions because the scale of the distances can be different. It is important to note that k must be large enough to result in a satisfactory initial exploration of the parameter space, otherwise the algorithm might get stuck in local regions of the parameter space. This challenge also holds true in general for other ABC algorithms, including when ε_1 is predefined (i.e. not set adaptively).
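The initialization just described can be sketched as follows; the forward model, prior, and distance below are placeholder choices for the example, and k is the user-chosen multiplier.

```python
import numpy as np

rng = np.random.default_rng(7)

def initialize_epsilon1(y_obs, simulate, distance, prior_sample, N, k=5):
    """Draw N_init = k*N particles from the prior, keep the N with the
    smallest distances, and set epsilon_1 to the largest kept distance."""
    n_init = k * N
    thetas = np.array([prior_sample() for _ in range(n_init)])
    dists = np.array([distance(y_obs, simulate(th)) for th in thetas])
    keep = np.argsort(dists)[:N]      # indices of the N smallest distances
    eps1 = dists[keep].max()          # epsilon_1 = max of the kept distances
    return thetas[keep], eps1

# toy example (assumed model): y ~ N(theta, 1), summary = sample mean
y_obs = rng.normal(0.0, 1.0, size=100)
particles, eps1 = initialize_epsilon1(
    y_obs,
    simulate=lambda th: rng.normal(th, 1.0, size=100),
    distance=lambda a, b: abs(a.mean() - b.mean()),
    prior_sample=lambda: rng.uniform(-5.0, 5.0),
    N=100, k=5)
```

A larger k gives a smaller (more demanding) ε_1 at the cost of more prior draws, which is exactly the trade-off discussed in the text.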
Providing a general and suitable value for k regardless of the problem that is considered is challenging, since this choice depends on a number of factors such as the definition of the prior distribution(s), the forward model, and where relevant regions of the parameter space are (the latter being unknown). Therefore the parameter k has to be suitably tuned by the user once the forward model and the prior distribution(s) have been defined. The problem of selecting k is further discussed in Section 3.

For the subsequent tolerances, ε_2, ..., ε_T, the general idea is to gauge the amount of shrinkage for iteration t + 1 by determining the value of ε_{t+1} based on the amount of improvement between π̂_{ε_{t−1}} and π̂_{ε_t}. In particular, we can use the estimated ABC posteriors to select a quantile to update the tolerance for the next iteration, and adjust the next tolerance based on how slowly or rapidly the sequential ABC posteriors are changing. More specifically, after each iteration t > 1, the following ratio can be estimated using the weighted particles:

ĉ_t = sup_θ [ π̂_{ε_t}(θ) / π̂_{ε_{t−1}}(θ) ].   (2.1)

Since π̂_{ε_{t−1}}(θ) and π̂_{ε_t}(θ) from Eq. (2.1) are both proper densities, they will be either exactly the same, making ĉ_t = 1, or there must be a place where π̂_{ε_t}(θ) > π̂_{ε_{t−1}}(θ), making ĉ_t > 1. Then the proposed quantile for iteration t (in order to determine ε_{t+1}) is

q_t = 1 / ĉ_t,   (2.2)

which varies between 0 and 1. Small values of q_t correspond to a large improvement between π̂_{ε_{t−1}} and π̂_{ε_t}, which then results in a larger percentage reduction of the tolerance for the coming iteration, t + 1. On the other hand, once the ABC posterior stabilizes, q_t tends to 1 as π̂_{ε_{t−1}} and π̂_{ε_t} become more similar.

The form of Eq. (2.2) was motivated by the Accept-Reject (A/R) algorithm (Andrieu et al., 2003; Robert and Casella, 2013). The A/R algorithm has a target distribution, a proposal distribution, and a rule to decide whether or not an element coming from the proposal distribution should be accepted as an element coming from the target distribution. If the form of the ABC posterior distribution was known, A/R sampling would work as follows. A candidate, θ*, would be proposed from π̂_{ε_{t−1}}(θ | y_obs), and would be accepted with probability π̂_{ε_t}(θ* | y_obs) / (c · π̂_{ε_{t−1}}(θ* | y_obs)), where c ∈ (1, ∞) is a positive real constant such that π̂_{ε_t}(θ | y_obs) ≤ c · π̂_{ε_{t−1}}(θ | y_obs) (Robert and Casella, 2013). In A/R sampling, the unconditional acceptance probability is 1/c (Hesterberg, 1988). The constant c acts as a proxy for the difference between the proposal and the target distributions (e.g., if they are the same distribution, then c = 1 and all proposals would be accepted).

The ABC algorithm does not follow the A/R sampling scheme, but the notion of 1/c relating to the sampling efficiency in the A/R algorithm inspired the proposed adaptive tolerance selection idea.
For some future iteration, say iteration t + 1, the ABC posterior distribution is unknown, so the previous two ABC posteriors at iterations t − 1 and t are used as the proposal and target distributions, respectively, so that ĉ_t can be computed in Eq. (2.1). If there was a substantial change between π̂_{ε_{t−1}} and π̂_{ε_t}, then ĉ_t would be larger, resulting in a smaller quantile, q_t, for specifying ε_{t+1}. As π̂_{ε_{t−1}} and π̂_{ε_t} become more similar, larger quantiles q_t are assigned. The proposed form of ĉ_t allows the tolerance selection to be based on changes in the ABC posterior from the previous iteration, where substantial changes between iterations t − 1 and t result in a substantial decrease in the proposed tolerance for ε_{t+1}. This continues until a substantial decrease in the proposed tolerance does not result in a substantial change in the ABC posterior, at which point the amount of shrinkage in the tolerance becomes smaller.

We found the rule based on Eq. (2.2) to work well empirically. One challenge with a theoretical evaluation of the proposed algorithm, and of other algorithms designed to optimize the tolerance shrinkage and acceptance rate, is that the acceptance rate depends on the forward simulation model. In general ABC settings, the forward simulation model does not have a closed-form expression.

An illustration of the proposed quantile selection procedure is provided in Figure 1. If π̂_{t−1} was used as the proposal for iteration t + 1 (instead of π̂_t), then q_t would be the percentage decrease in the acceptance rate from iteration t, i.e. if acc_t is the acceptance rate for iteration t, then acc_{t+1} would be approximately q_t × acc_t. However, we are not proposing from π̂_{t−1}, but rather π̂_t, so the decrease in the acceptance rate is mitigated by the improvement in the proposed particles from iteration t.
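Putting Eqs. (2.1) and (2.2) together with the quantile-based tolerance update illustrated in Figure 1 gives a short update step, sketched below. The density-ratio supremum ĉ_t is taken as given here (its estimation is discussed in the text), the plain unweighted quantile is a simplifying assumption, and the stopping check anticipates the rule stated later in Eq. (2.5).

```python
import numpy as np

def next_tolerance(dist_t, c_hat, t, q_max=0.99):
    """Given the accepted distances at iteration t and an estimate of
    c_hat = sup of the ratio of successive ABC posteriors (Eq. 2.1),
    return the next tolerance, the quantile q_t (Eq. 2.2), and a stop flag."""
    q_t = min(1.0, 1.0 / c_hat)          # c_hat >= 1 for proper densities
    eps_next = float(np.quantile(dist_t, q_t))
    stop = (t >= 3) and (q_t > q_max)    # stopping rule, from the third iteration on
    return eps_next, q_t, stop

# example: a large improvement (c_hat = 2) halves the distance quantile used
eps_next, q_t, stop = next_tolerance(np.array([0.1, 0.2, 0.3, 0.4]), c_hat=2.0, t=2)
```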
When there is a large improvement in the ABC posterior from π̂_{t−1} to π̂_t, then q_t is smaller, allowing for a larger drop in the tolerance. This larger percentage drop in tolerance does not result in an equal percentage drop in acceptance rate because the new proposal distribution, π̂_t, is better than π̂_{t−1}. Conversely, if π̂_{t−1} is close to π̂_t, then the improvement in the ABC posterior is not enough to allow for a large decrease in the acceptance rate, and consequently q_t is closer to 1.

The evaluation of Eq. (2.1) relies on the calculation of the ratio between the (possibly multidimensional) density functions, defined here as r. A naive solution would be to separately calculate the density for π̂_t and π̂_{t−1} using some Kernel Density Estimate (KDE) method (see Silverman 2018 for a review), and then estimate the ratio from those estimates. Then, the supremum of the previously calculated ratio can be obtained, for example, through an optimization procedure that computes the density over a grid of values. However, this is not a reliable solution, in particular for high-dimensional cases for which division by an estimated quantity can magnify the estimation error (Sugiyama et al., 2008). In order to address the problem of properly estimating r with r̂, and
Figure 1: Illustration of the selection of q_t. (left) The proposal distribution ABC posterior π̂_{t−1}, the resulting ABC posterior π̂_t, and their ratio π̂_t / π̂_{t−1}, with ĉ_t defined according to Eq. (2.1) and used for setting q_t, as defined in Eq. (2.2). (right) The (arbitrary) distribution of distances is from the accepted distances at iteration t, {d_t^{(J)}}_{J=1}^N, with ε_t being the largest possible value. The next iteration's tolerance, ε_{t+1}, is set as the q_t quantile of {d_t^{(J)}}_{J=1}^N.

therefore solving Eq. (2.1), alternatives to the KDE solution are available, such as ratio estimation methods (REM) (Sugiyama et al., 2012). The main advantage of using REM is that the calculation of the desired ratio does not include density estimation, which would involve dividing by an estimated KDE. Additionally, when using a KDE, a kernel and bandwidth need to be selected, which can affect the result. Poorly estimating the density of the denominator of r, in particular, can potentially increase the error of the estimated ratio (Sugiyama et al., 2010). There are several different REM frameworks (e.g. Bickel et al. 2007; Gretton et al. 2009; Sugiyama et al. 2008, 2010), but we use the ratio matching approach of Sugiyama et al. (2008), discussed in more detail next.

In order to introduce the REM framework, consider θ ∈ R^p and two generic samples {θ_i^L}_{i=1}^L and {θ_j^M}_{j=1}^M, where L and M are the sample sizes for the first and the second sample, respectively. The sample {θ_i^L}_{i=1}^L has corresponding density p_L(θ), while the sample {θ_j^M}_{j=1}^M has corresponding density p_M(θ). The density ratio r(θ) can be defined as r(θ) = p_L(θ) / p_M(θ). The basic idea of the ratio matching approach is to match a density ratio model r̂(θ) with the true density ratio r(θ) under some divergence (Sugiyama et al., 2010). Several divergences can be used to compare r̂(θ) with r(θ).
A common divergence is the Bregman divergence (Bregman, 1967), along with some of its related divergences such as the unnormalized Kullback-Leibler divergence and the squared distance. In particular, the unnormalized Kullback-Leibler divergence minimizes the divergence between p_L(θ) and p̂_L(θ) = r̂(θ) p_M(θ) by means of the following objective:

min_{r̂} ∫ p_L(θ) log [ p_L(θ) / ( r̂(θ) p_M(θ) ) ] dθ.   (2.3)

By decomposing the Kullback-Leibler divergence defined in Eq. (2.3), r̂(θ) can be estimated by solving the objective function max_{r̂} ∫ p_L(θ) log r̂(θ) dθ (Hido et al., 2011; Sugiyama et al., 2010). Further details on the unnormalized Kullback-Leibler divergence and on other REM approaches are found in Sugiyama et al. (2012). As pointed out by Sugiyama et al. (2010), a further non-negligible advantage of using REM, and in particular the ratio matching approach, is the applicability of gradient-based algorithms and quasi-Newton methods for optimization over r̂(θ).

In the analyses of the present work we use the ratio matching approach and the Kullback-Leibler importance estimation procedure (KLIEP) (Hido et al., 2011; Sugiyama et al., 2010, 2008) in order to estimate, at the end of each iteration t, the ratio of densities defined in Eq. (2.1). Recall that the densities involved in Eq. (2.1) are π̂_{ε_t}(θ) and π̂_{ε_{t−1}}(θ). Once the ratio between π̂_{ε_t}(θ) and π̂_{ε_{t−1}}(θ) has been estimated, the supremum of Eq. (2.1) is calculated by using an optimizer over the parameter space, such as the one proposed by Brent (2013). The quantile used to reduce the tolerance for the coming iteration is finally retrieved by using Eq. (2.2). The steps discussed above are performed at the end of each iteration as long as the stopping rule, defined in Eq. (2.5) and discussed below, is not satisfied.
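A compact sketch of the KLIEP idea is given below: model the ratio as a non-negative combination of Gaussian kernels centered at numerator-sample points, ascend the mean log-ratio over the numerator sample, and constrain the mean of the modeled ratio over the denominator sample to one. This is a simplified fixed-bandwidth illustration with a plain projected-gradient update, not the densratio implementation; the bandwidth, learning rate, and iteration count are arbitrary choices for the example.

```python
import numpy as np

def kliep_ratio(x_num, x_den, sigma=0.5, n_iter=300, lr=0.05):
    """Estimate r(x) = p_num(x) / p_den(x) without estimating either density."""
    centers = x_num[:min(100, len(x_num))]
    def phi(x):
        # Gaussian basis functions evaluated at the points x
        return np.exp(-0.5 * ((np.atleast_1d(x)[:, None] - centers[None, :]) / sigma) ** 2)
    A, B = phi(x_num), phi(x_den)
    alpha = np.full(len(centers), 1.0 / len(centers))
    for _ in range(n_iter):
        r_num = A @ alpha
        alpha = alpha + lr * (A / r_num[:, None]).mean(axis=0)  # ascend mean log r_hat
        alpha = np.maximum(alpha, 0.0)                          # non-negativity
        alpha = alpha / (B @ alpha).mean()                      # mean of r_hat over x_den = 1
    return lambda x: phi(x) @ alpha

rng = np.random.default_rng(3)
x_num = rng.normal(0.0, 0.5, size=500)   # sample from the "numerator" density
x_den = rng.normal(0.0, 1.0, size=500)   # sample from the "denominator" density
r_hat = kliep_ratio(x_num, x_den)
c_hat = r_hat(np.linspace(-3, 3, 601)).max()  # sup of the estimated ratio, cf. Eq. (2.1)
q_t = 1.0 / c_hat                             # Eq. (2.2)
```

Avoiding the division of one KDE by another is the design point: the ratio model is fit directly, so estimation error in a denominator density is never amplified.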
Estimation of r̂ is carried out by using the densratio package (https://github.com/hoxo-m/densratio), which is freely available in the R software (R Core Team, 2019).

The acceptance rate is also useful for evaluating the computational burden of the ABC-PMC algorithm, and is defined as:

acc_t = N / D_t,   (2.4)

where D_t is the number of draws made at iteration t in order to produce N accepted values. Eq. (2.4) generally decreases with each iteration because, as the tolerance decreases, the number of draws D_t required to get N accepted particles generally increases (Lintusaari et al., 2017).

There are several published ideas in the literature on how to determine the number of iterations in an ABC-PMC algorithm. Often one picks some T based on the computational resources available, but this can be needlessly inefficient. Ishida et al. (2015) proposed to stop the algorithm once the acceptance rate is smaller than some specified, fixed tolerance. The proposed stopping rule is directly based on the estimated sequential ABC posterior distributions, which avoids unnecessary additional iterations of the algorithm.

The ABC-PMC algorithm produces a sequence of T posterior distributions, π̂_{ε_t}, where ε_t identifies the tolerance used in iteration t, with t = 1, ..., T and ε_1 > ε_2 > ··· > ε_T. When defining a stopping rule, it turns out that Eq. (2.2) can be used not only to adaptively select the quantile used to reduce the tolerance across the iterations, but also to indicate when to stop the procedure once the sequential ABC posterior stops changing significantly. The series of quantiles defined through Eq. (2.2) generally increases as the tolerance decreases.
In particular, since the quantile used to reduce the tolerance is based on the online performance of the ABC posterior distribution, once the ABC posterior has stabilized, q_t ≈ 1. This follows directly from Eq. (2.1) because once the ABC posterior has stabilized ĉ_t ≈ 1, and further reductions of the tolerance (i.e. additional iterations) do not necessarily lead to an improvement of the ABC posterior distribution. In other words, once the ABC posterior stabilizes, the series of the quantiles defined through Eq. (2.2) stops increasing, and the upper bound of 1 implies that no further reduction will improve the ABC posterior distribution. This leads to an automatic and simple stopping rule, which is employed starting from the third iteration, i.e. once the transformation kernel has been used twice, to avoid premature stopping. Our algorithm is stopped at time t when

q_t > 0.99 for t ≥ 3.   (2.5)

Hence, the algorithm is stopped once the quantile used to reduce the tolerance suggests that further reduction is not necessary since the ABC posterior has stabilized. Using Eq. (2.2) as an automatic rule to shrink the tolerance and Eq. (2.5) as the stopping rule, the ABC-PMC algorithm is stopped once additional iterations with smaller tolerances do not lead to significant changes in the ABC posterior.

Next we provide a comparison between the original ABC-PMC algorithm and our extension proposed in Section 2, the aABC-PMC, by using three examples. In the first example the Gaussian mixture model by Sisson et al. (2007) is used in order to demonstrate the computational efficiency of the proposed aABC-PMC procedure. Then the aABC-PMC algorithm is used for a model from Silk et al. (2013), which has local modes, in order to illustrate how the proposed automatic tolerance selector is able to avoid getting stuck in local regions of the parameter space. The final example, originally presented in Numminen et al. (2013), uses data on colonizations of the bacterium
Streptococcus pneumoniae and represents a computationally expensive forward model. Expensive forward models are a challenge for ABC methods because the computational cost can be prohibitive for practical applications, and in these cases selecting an appropriate sequence of tolerances is crucial. A fourth example, the Lotka-Volterra model by Toni et al. (2009), is presented in Appendix A of the Supplementary Material.

In order to compare the proposed procedure with the original ABC-PMC algorithm, both the computational time and the total number of draws until the stopping criterion is met are recorded². In addition, the quality of the final approximation is assessed with the Hellinger distance between the final ABC posterior, π̂_{ε_T}, and a benchmark, π_true, which is defined as:

H(π̂_{ε_T}, π_true) = ( (1/2) ∫ ( √(π̂_{ε_T}(y)) − √(π_true(y)) )² dy )^{1/2}.   (3.1)

The benchmark, π_true, is the true posterior distribution if it is available in closed form, which is the case in the first two presented examples (see Sections 3.1 and 3.2). In the final example, since the true posterior distribution is not available, the ABC posteriors from Numminen et al. (2013) are used as benchmarks (see Section 3.3).

In order to estimate the 1-dimensional marginal ABC posterior distributions from the samples and their corresponding importance weights, a KDE (Silverman, 2018) is used with a Gaussian kernel and a smoothing bandwidth parameter h.

² The desired sample size N has an impact on the evaluation of Eq. (2.5). This problem arises also in the classical MCMC analysis when determining the length of the MCMC chain (Gelman et al., 2014). An N that is too small leads to more variability of the estimated posterior in Eq. (2.5), which could lead to the algorithm stopping prematurely.
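As a concrete illustration of Eq. (3.1), the Hellinger distance between two densities tabulated on a common grid can be approximated by a Riemann sum; the Gaussian test densities and the grid below are arbitrary choices for the example.

```python
import numpy as np

def hellinger(p, q, dy):
    """Riemann-sum approximation of Eq. (3.1) for densities p, q
    evaluated on a common grid with spacing dy."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dy)

y = np.linspace(-10.0, 10.0, 4001)
dy = y[1] - y[0]
gauss = lambda x, m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
d_same = hellinger(gauss(y, 0, 1), gauss(y, 0, 1), dy)  # identical densities: 0
d_diff = hellinger(gauss(y, 0, 1), gauss(y, 2, 1), dy)  # shifted mean: in (0, 1)
```

For two unit-variance Gaussians the distance is available in closed form, H² = 1 − exp(−(μ₁ − μ₂)²/8), which can be used to sanity-check the grid approximation.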
The bandwidth is selected using Silverman's rule-of-thumb (Silverman, 1986). Finally, unless otherwise noted, the number of particles in the ABC procedures is set to N = 1,000.

The first application of the aABC-PMC is an example from Sisson et al. (2007), which is also analyzed by Beaumont et al. (2009). It is a Gaussian mixture model with two Gaussian components with known variances and mixture weights, but an unknown common mean, f(y | θ) = 0.5 N(θ, 1) + 0.5 N(θ, 0.01), and prior π(θ) ∼ Unif(−10, 10). With y_obs = 0, the true posterior distribution is

π(θ | y_obs) ∼ 0.5 N(0, 1) + 0.5 N(0, 0.01).   (3.2)

For consistency with the results of Sisson et al. (2007) and Beaumont et al. (2009), the distance function used is ρ(y_obs, y_prop) = |y_obs − y_prop|, with N = 1,000 particles and T = 10 iterations with a fixed series of tolerances displayed in Table 1. To evaluate the reliability of the aABC-PMC, a comparison with the ABC-PMC is done both in terms of computational time and total number of draws. The results of the analysis are shown in Table 1 and are based on 21 independent runs with the same dataset, y_obs = 0. The table includes the values for the run that produced the median number of total draws. The aABC-PMC outperforms ABC-PMC in terms of total draws (81,230 vs 1,421,283) and a faster computational time (88 seconds vs 243 seconds). The final ABC posteriors for each method are displayed in Figure 2a. Though the aABC-PMC method is computationally more efficient than the ABC-PMC approach, the final ABC posteriors are very similar. This suggests that after a suitable tolerance is achieved, decreasing the tolerance further does not necessarily lead to a better approximation of the posterior distribution.
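The appeal of this benchmark is that both the forward model and the posterior of Eq. (3.2) are available in closed form; a minimal sketch (the grid bounds are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, size=1):
    """Forward model f(y | theta) = 0.5 N(theta, 1) + 0.5 N(theta, 0.01)."""
    narrow = rng.random(size) < 0.5
    # note: the second component has variance 0.01, i.e. standard deviation 0.1
    return np.where(narrow, rng.normal(theta, 0.1, size), rng.normal(theta, 1.0, size))

def true_posterior(theta):
    """Closed-form posterior of Eq. (3.2) for y_obs = 0."""
    g = lambda x, s: np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    return 0.5 * g(theta, 1.0) + 0.5 * g(theta, 0.1)

grid = np.linspace(-10.0, 10.0, 2001)
mass = true_posterior(grid).sum() * (grid[1] - grid[0])  # integrates to ~1
```

The two widely separated component scales (standard deviations 1 and 0.1) are what make the example a useful stress test for the tolerance sequence: the narrow spike is only resolved once the tolerance is small enough.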
Figure 2: Gaussian mixture model example. (a) ABC-PMC and aABC-PMC final posterior distributions and (b) sequential quantities computed for the aABC-PMC method. The q_t's (black circles) generally increase through the iterations until the ABC posterior has stabilized. The acceptance rate (blue triangles) decreases throughout the iterations, which is why it is desirable to stop the algorithm once the ABC posterior has stabilized.

From Table 1, we note that the fixed schedule of Sisson et al. (2007) runs for all T = 10 iterations to reach its final tolerance (H_dist = 0.20), while the automatic stopping rule of aABC-PMC leads to 4 iterations with a final tolerance of ε = 0.035 (H_dist = 0.20). In Figure 2b, the q_t's retrieved by using Eq. (2.2) are displayed (black circles), which increase until the final iteration, while the acceptance rate (blue triangles) decreases. Neglecting to stop the algorithm once the ABC posterior has stabilized can be inefficient since the number of draws needed in order to complete further iterations can drastically increase, as evidenced by the increasing D_t for later iterations displayed in Table 1.

Next, we show the behavior of the aABC-PMC algorithm for different choices of the number of proposed values from the prior distribution at the first iteration of the procedure. Initial particle sample sizes, N_init, of N, 2N, 5N, and 10N are considered (with N = 1,000). All four choices lead to similar final ABC posterior distributions based on H_dist (see Table 2); the posteriors are displayed in Figure 3.

Using the EasyABC R package (https://cran.r-project.org/web/packages/EasyABC) we carried out the same analysis for the ABC-SMC algorithm by Del Moral et al. (2012), which is discussed in Section 3.3. For each initial particle sample size, N_init, of N, 2N, 5N, and 10N, 21 independent runs with the same dataset are performed and the runs that produced the median number of total draws are compared to the corresponding run obtained by our adaptive approach.

Table 1: The number of draws needed in each iteration (columns t, ε_t, D_t, H_dist for ABC-PMC; t, ε_t, q_t, D_t, H_dist for aABC-PMC) to reach N = 1,000 accepted values for the ABC-PMC and the aABC-PMC algorithms. (The displayed results were obtained by running the procedure 21 times and using the run that produced the median number of total draws.) For the aABC-PMC algorithm, the quantile automatically selected through the iterations is displayed under q_t. The procedure stopped once the quantile q = 0.999 was proposed. For the ABC-PMC algorithm a total of 1,421,283 draws (243 sec.) were required, while our aABC-PMC takes 81,230 draws (88 sec.) overall.
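The H_dist values reported throughout are Hellinger distances between density estimates. A minimal sketch of how such a comparison can be computed is given below (our illustration, not the authors' code: it uses a Gaussian-kernel density estimate with the Silverman rule-of-thumb bandwidth mentioned earlier, the identity H² = 1 − ∫√(fg), and an arbitrary integration grid):

```python
import math
import random

def silverman_bw(xs):
    """Silverman's rule-of-thumb bandwidth for a 1-d sample."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)

def kde(xs, h):
    """Return a Gaussian kernel density estimate as a callable."""
    def f(t):
        return sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in xs) / (
            len(xs) * h * math.sqrt(2 * math.pi))
    return f

def hellinger(f, g, lo, hi, m=500):
    """H = sqrt(1 - Bhattacharyya coefficient), midpoint rule on [lo, hi]."""
    dx = (hi - lo) / m
    bc = sum(math.sqrt(f(lo + (i + 0.5) * dx) * g(lo + (i + 0.5) * dx))
             for i in range(m)) * dx
    return math.sqrt(max(0.0, 1.0 - bc))

rng = random.Random(1)
a = [rng.gauss(0, 1) for _ in range(300)]
b = [rng.gauss(0, 1) for _ in range(300)]
fa, fb = kde(a, silverman_bw(a)), kde(b, silverman_bw(b))
h_ab = hellinger(fa, fb, -6, 6)
print(round(h_ab, 3))  # small, since both samples come from the same law
```

The distance lies in [0, 1], with 0 for identical densities, which is what makes it a convenient summary of agreement between sequential ABC posteriors.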
N_init   T    D         ε_1     ε_T     time (sec)   H_dist
N        14   276,885   11.54   0.035   208          0.23
2N       10   109,720   4.97    0.077   150          0.29
5N       4    81,230    1.96    0.035   88           0.20
10N      4    90,194    1.00    0.059   105          0.17
Table 2: aABC-PMC algorithm with different choices for N_init (N, 2N, 5N, 10N) for the Gaussian mixture model example.
Figure 3: aABC-PMC posteriors with different choices for N_init (N, 2N, 5N, 10N) for the Gaussian mixture model example.

N_init   D         time (sec)   H_dist   ε_T
N        25,023    17           0.35     0.037
2N       47,192    54           0.37     0.031
5N       124,890   322          0.33     0.031
10N      249,696   1254         0.34     0.03
Table 3: ABC-SMC algorithm with different choices for N_init (N, 2N, 5N, 10N) for the Gaussian mixture model example.
Figure 4: ABC-SMC final posterior distributions with different choices for N_init (N, 2N, 5N, 10N) for the Gaussian mixture model example.

Our choices for setting the parameters required by the ABC-SMC algorithm (see Sec. 3.4) are: N = 1,000, ε = 0.…, α = 0.…, M = 1 and nb_threshold = N/2. We note that for the last three parameters, the default values are used, according to the suggestions by Del Moral et al. (2012). The results of the analysis are summarized in Table 3 and the corresponding posterior distributions are displayed in Fig. 4. For all four N_init values considered, the final tolerances returned by the ABC-SMC algorithm are comparable with the one obtained by our approach with N_init = 5N (ε = 0.035).

The sequence of tolerances has an impact not only on the computational efficiency of the algorithm, but also on its ability to find the true posterior (Silk et al., 2013), noting again that convergence to the true posterior using ABC is not guaranteed. To demonstrate the performance of aABC-PMC in the presence of local modes, we consider an example proposed in Silk et al. (2013). The (deterministic) forward model is g(θ) = (θ − 10)² − 100 exp(−100(θ − 3)²). The input value is set to θ = 3, leading to a single observation y_obs = −51. The true posterior distribution is a Dirac function at 3. The specifications for the distance function (L2 norm), the prior distribution (a normal distribution) and the number of particles match those of Silk et al. (2013), with N = 1,000. The distance as a function of θ highlights the challenge for ABC with this model: there is a local minimum distance around θ = 10, but the global minimum distance occurs at the true value of θ = 3. Initial steps of the ABC algorithm will find the local minimum, but the algorithm can easily get stuck around θ = 10 if the sequential tolerances are not selected carefully. The series of plots in Figure 5 shows the behavior of the aABC-PMC algorithm by focusing on the values of θ that were accepted (orange x's). After 6 iterations, the aABC-PMC algorithm has found the global minimum distance around the true θ. The results of the analysis, based on 21 independent runs, are summarized in Table 4, where 384,347 total particles were used by the proposed aABC-PMC algorithm. The table includes the values for the run that produced the median number of total draws.
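The local/global structure of this model can be checked numerically. The sketch below evaluates the distance |g(θ) − y_obs| at the two candidate minima (an illustration written against the forward model as stated above):

```python
import math

def g(theta):
    # Deterministic forward model of the Silk et al. (2013) example as
    # written in the text: a broad parabola with its minimum at theta = 10
    # plus a narrow negative spike at theta = 3 (the global optimum).
    return (theta - 10.0) ** 2 - 100.0 * math.exp(-100.0 * (theta - 3.0) ** 2)

y_obs = g(3.0)                       # 49 - 100 = -51
dist_global = abs(g(3.0) - y_obs)    # 0 at the true value theta = 3
dist_local = abs(g(10.0) - y_obs)    # ~51 at the local mode theta = 10
print(y_obs, dist_global, dist_local)
```

Because the spike at θ = 3 is very narrow while the basin around θ = 10 is wide, uniform early proposals overwhelmingly land near the local mode, which is exactly why a careless tolerance schedule can trap the particle system there.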
Figure 5: Example from Silk et al. (2013) to investigate the performance of the proposed aABC-PMC in the presence of a local optimal value. The accepted θ are plotted as orange x's against the corresponding distance by iteration (panels t = 1 through t = 6).

It is apparent from Figure 5 that the third iteration was an important step, in which the large reduction of the tolerance allowed the algorithm to consider those few particles coming from the global optimal value at θ = 3. Although the raw tolerance hardly decreases between the first and the second iteration (ε₁ = 51.59 and ε₂ only slightly smaller), there is a noticeable change from π̂_ε₁ to π̂_ε₂. The majority of the accepted values from t = 2 are sampled near the local mode at θ = 10, but the reduction resulting from the slightly smaller ε leads to the majority of values proposed near θ = 3 being accepted.

In order to compare the proposed aABC-PMC algorithm with the ABC-PMC approach of Silk et al. (2013) (see Section 1.1), we estimated the TAR curve and the corresponding thresholds (Silk et al., 2013). The TAR curve is obtained by plotting
on the x-axis several thresholds ε that might be picked for the next iteration of ABC simulations and on the y-axis their corresponding acceptance rates. The threshold ε recommended for the next ABC-PMC iteration is then selected by locating the "elbow" of the estimated TAR curve (Silk et al., 2013). Since the forward model is computationally cheap, an approximation to the forward model was not needed. Instead, the TAR curve was estimated at each iteration by setting arbitrary grid points of tolerances with range in (0, ε_{t−1}) and running the ABC-PMC algorithm at each grid point. The selected tolerances are ε = (150, …, …, 0.84) and the corresponding ABC posterior distributions are displayed in Figure 6a. We note that the true posterior distribution, which is a Dirac function centered at θ = 3, is not suitably approximated by Silk et al. (2013) (H_dist = 0.174; the Hellinger distance is computed by comparing the ABC posterior sample with an N-dimensional vector with all entries equal to θ = 3).

At iteration t = 4, the estimated TAR curve did not have an elbow and, consequently, there was no additional shrinkage of the tolerance, resulting in an ABC posterior that was not a suitable approximation to the true posterior distribution; the final tolerance ε was too high. We tried making adjustments to the TAR curve grid to see if this could be improved. When using fewer grid points (e.g. 10) for the TAR curve, we were able to improve the performance. However, this improved performance was due to a poorer approximation to the TAR curve. In general, it would be preferable if a better estimate of the TAR curve led to better performance. A higher resolution TAR curve grid with 1000 grid points also was not able to find the global optimal solution. In contrast, as shown in Figure 6b, the proposed aABC-PMC approach provides a better approximation of the true posterior distribution although the number of draws required by the simulator is only 384,347 (compared to 1,415,600 draws required by ABC-PMC with the 100 point TAR curve grid).

TAR curve (Silk et al., 2013)         aABC-PMC
t   ε_t    D_t        H_dist          t   ε_t      q_t    D_t      H_dist
…   …      …          …               1   51.59    —      —        —
4   0.84   1,403,040  0.174           2   —        —      —        —
                                      3   51.00    0.16   99,596   0.68
                                      4   39.33    0.17   138,972  0.43
                                      5   0.07     0.06   32,045   0.067
                                      6   0.00025  0.90   100,604  0.064
Total      1,415,600                  Total              384,347

Table 4: The number of draws needed in each iteration to reach N = 1,000 accepted values for the ABC-PMC with the TAR curve-selected tolerances and the aABC-PMC algorithm. (The displayed results were obtained by running the procedure 21 times and using the run that produced the median number of total draws.) For the aABC-PMC algorithm, the quantile automatically selected through the iterations is displayed under q_t; the procedure stopped once the proposed quantile triggered the stopping rule. For the ABC-PMC with the TAR curve a total of 1,415,600 draws (310 sec.) are required, while our aABC-PMC takes 384,347 draws (258 sec.) overall. The number of draws listed for Silk et al. (2013) does not include the draws required to build the TAR curve; however, we did include the TAR curve construction in the computational time.
Figure 6: ABC posterior distributions by iteration using (a) the TAR curve, and (b) the proposed aABC-PMC algorithm. The true posterior distribution, which is a Dirac function centered at θ = 3, is better captured by the aABC-PMC algorithm (H_dist = 0.064) than by the TAR curve approach (H_dist = 0.174).

In the examples considered we set N_init = kN with k = 5, which seems to work well. We emphasize that moving toward relevant regions of the parameter space needs to happen in the first few iterations of the ABC-PMC procedure, since uniformly small reductions in the tolerance sequence (e.g. using a fixed q_t ≥ 0.25) could end up removing those few important particles near the global optimal value, even if the number of particles sampled directly from the prior is 5N.

The initial exploration of the parameter space and the definition of small enough quantiles in the first iterations appears to be why, in the procedure based on the TAR curve, the total number of draws needed by the ABC-PMC algorithm is large, making it very expensive computationally. In fact, at the end of the second iteration, the majority of the previous iteration's accepted particles are drawn near the local minimum. Moreover, since N_init = N, only few candidates close to the global optimum are available. This means that when a particle is resampled, it will likely come from regions near the local minimum and therefore may be easily rejected during the third iteration of the ABC-PMC algorithm. In contrast, our adaptive procedure selects small q_t's early on, when larger improvements occur between the sequential ABC posteriors. By doing so, larger reductions in the tolerance sequence can be taken in the first iterations of the ABC-PMC, which results in moving away from local optimal values into better regions of the parameter space. If a sufficient reduction of the tolerance is not made early on, achieving a good approximation of the true posterior distribution is unlikely because the distances associated with the local optimal values will overwhelm the particle system so that it gets stuck in the local region.

As previously done for the Gaussian mixture model example presented in Sec. 3.1, we conclude the analysis of this model by performing a comparison between our adaptive aABC-PMC approach and the ABC-SMC algorithm by Del Moral et al. (2012). Again, four initial particle sample sizes N_init are considered (N, 2N, 5N, and 10N) and 21 independent runs with the same dataset are performed.
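The quantile-driven shrinking discussed above can be sketched as follows (a schematic of the general mechanism only: the paper selects q_t adaptively via Eq. (2.2), whereas here q is passed in by hand and the distances are made-up numbers):

```python
def next_tolerance(distances, q):
    """Next tolerance = empirical q-quantile of the currently accepted distances."""
    s = sorted(distances)
    # simple nearest-rank quantile, q in (0, 1)
    idx = min(len(s) - 1, max(0, int(q * len(s))))
    return s[idx]

# Hypothetical accepted distances from one ABC-PMC iteration.
accepted_distances = [0.02, 0.31, 0.11, 0.07, 0.25, 0.18, 0.04, 0.29, 0.15, 0.09]
print(next_tolerance(accepted_distances, q=0.5))  # -> 0.15
print(next_tolerance(accepted_distances, q=0.1))  # -> 0.04
```

A small q yields an aggressive reduction of the tolerance, which is exactly what is needed early on to escape the local mode, while a q near one barely shrinks the tolerance at all.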
The results include the runs that produced the median number of total draws and are compared to the corresponding results obtained by our adaptive approach. The 5 parameters required by the ABC-SMC algorithm have been fixed as follows: N = 1,000, ε = 0.…, α = 0.…, M = 1 and nb_threshold = N/2. We note again that default values are used for the last three parameters, following the suggestions by Del Moral et al. (2012). The results of the analysis are summarized in Table 5 and the corresponding posterior distributions are displayed in Fig. 7. From Table 5, although with k = 5 the number of total draws of the ABC-SMC algorithm is smaller than the corresponding total number of draws obtained by the adaptive aABC-PMC, our procedure is faster in terms of computational time. Moreover, the final ABC posterior distribution obtained by the aABC-PMC algorithm (H_dist = 0.064) is closer to the true posterior than that obtained by the ABC-SMC algorithm (H_dist = 0.388). We note that the ABC-SMC algorithm reached the global mode also for k = 1 and k = 2, while our aABC-PMC failed to reach the global mode for those smaller choices of k; we therefore recommend k = 5, for which H_dist = 0.064.

N_init   D         time (sec)   H_dist   ε_T
N        56,535    31           0.374    0.00027
2N       111,478   106          0.368    0.00050
5N       278,412   606          0.388    0.00017
10N      521,945   2305         0.41     0.00022
Table 5: ABC-SMC algorithm with different choices for N_init (N, 2N, 5N, 10N) for the Silk et al. (2013) model.
Figure 7: ABC-SMC final posterior distributions with different choices for N_init (N, 2N, 5N, 10N) for the Silk et al. (2013) model.

The final model we consider, discussed by Numminen et al. (2013), uses data on colonizations of the bacterium Streptococcus pneumoniae. Discussion of mathematical models for such scenarios, known as household models, can be found in Hoti et al. (2009) or Brooks-Pollock et al. (2011). According to the specifications provided in Numminen et al. (2013), the transmission process is modeled with four parameters. Two parameters, β and Λ, account for the hazards of infection from the day care center and from the community, respectively. Another parameter, θ, scales the probability of co-infection. Finally, the parameter γ corresponds to the rate of clearance of an infection. In the following analyses we consider γ = 1 fixed and known, to be consistent with the analysis in Numminen et al. (2013).

The observed data consist of the identified pneumococcal strains in a total of 611 children from 29 day care centers, with varying numbers of sampled attendees per day care center (Vestrheim et al., 2008, 2010). For each of the 29 day care centers, a binary matrix with a varying number of sampled attendees is available. For each sampled attendee, the state of carrying one of the 33 different pneumococcal strains or not is indicated by a 1 or 0, respectively, in the binary matrix. As pointed out in Gutmann and Corander (2016), statistical inference is challenging in this setting since the data represent a snapshot of the state of the sampled attendees at a single time point only. Moreover, the modeled system involves infinitely many correlated unobserved variables, since the modeled process evolves in continuous time. Using the observed colonizations with bacterial strains, the following four summary statistics are obtained for each of the 29 day care centers: the Shannon index of diversity of the distribution of the observed strains, the number of different strains, the prevalence of carriage among the observed individuals, and the
prevalence of multiple infections among the observed individuals. By doing so, the dimensionality of the problem reduces from a 611 · 33 · 29 = 584,727 dimensional space to a 4 · 29 = 116 dimensional space.

Numminen et al. (2013) use the four summary statistics and four tolerances, ε = (ε₁, ε₂, ε₃, ε₄), for each iteration of their procedure. Instead, we use the approach of Gutmann and Corander (2016). Each of the four summary statistics is rescaled so that its maximum value is one. Then the summary statistics are vectorized in order to obtain a single vector of dimension 116. Finally, the distance between the vector corresponding to y_prop and the vector corresponding to y_obs is calculated, with the result divided by 116. By doing so, only one tolerance is used in the ABC procedure.

The series of tolerances used in Numminen et al. (2013) was based on the ABC-Sequential Monte Carlo (ABC-SMC) method proposed by Del Moral et al. (2012). The ABC-SMC method of Del Moral et al. (2012) adaptively proposes a series of tolerances by estimating, at the end of each iteration, the effective sample size (ESS). For a generic iteration t the ESS is defined as:

ESS({W_t^(J)}_{J=1}^N) = ( Σ_{J=1}^N (W_t^(J))² )^(−1),   (3.3)

where W_t^(J) is the importance weight for particle J = 1, …, N at iteration t as defined in Eq. (1.1). Once the ESS is estimated by using Eq. (3.3), the new tolerance ε_{t+1} is obtained by solving the following for ε_{t+1}:

ESS({W_t^(J)}_{J=1}^N, ε_{t+1}) = q_t ESS({W_{t−1}^(J)}_{J=1}^N, ε_t),   (3.4)

where q_t is some pre-selected quantile which varies between 0 and 1. Numminen et al. (2013) had to adjust this to work for their setting with four tolerances. We note that our aABC-PMC approach does not require the specification of a quantile q_t, nor other parameters such as the number M of simulations performed for each particle, the minimal effective sample size threshold below which a resampling of particles is performed, nb_threshold, and the final tolerance level, ε_final.
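Eqs. (3.3) and (3.4) can be sketched in code as follows (an illustration under simplifying assumptions of our own: indicator-type importance weights, so that the ESS reduces to the number of particles within the tolerance, and a grid search over candidate tolerances instead of a numerical root-finder):

```python
def ess(weights):
    """Effective sample size, Eq. (3.3): (sum_J W_J^2)^(-1) for normalized weights."""
    total = sum(weights)
    w = [x / total for x in weights]
    return 1.0 / sum(x * x for x in w)

def ess_at(distances, eps):
    """ESS when W_J is an indicator that particle J's distance lies within eps."""
    inside = [1.0 for d in distances if d <= eps]
    return ess(inside) if inside else 0.0

def next_eps(distances, eps_t, q):
    """Smallest grid epsilon whose ESS >= q * ESS(eps_t), mimicking Eq. (3.4)."""
    target = q * ess_at(distances, eps_t)
    for eps in sorted(set(distances)):  # increasing eps -> increasing ESS here
        if eps <= eps_t and ess_at(distances, eps) >= target:
            return eps
    return eps_t

# Hypothetical particle distances from one iteration.
dists = [0.05, 0.12, 0.18, 0.22, 0.30, 0.41, 0.55, 0.63, 0.71, 0.90]
print(next_eps(dists, eps_t=0.90, q=0.5))  # -> 0.30 (half the ESS retained)
```

With general (non-indicator) weights, Eq. (3.4) is solved with a one-dimensional root-finder, but the mechanism is the same: q controls how quickly the effective sample size, and hence the tolerance, is allowed to shrink.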
Further details on the ABC-SMC algorithm and discussions on how to properly select its required parameters can be found in Del Moral et al. (2012).

The prior distributions for the three parameters of interest are β ∼ Unif(0, 11), Λ ∼ Unif(0, 2) and θ ∼ Unif(0, 1), with N = 10,000 and N_init = 5 × 10⁴. The proposed aABC-PMC requires fewer total draws than the ABC-SMC tolerance selection (1,…,696 draws vs. 2,…,760 draws). Because the proposed sampling procedure stops after t = 4 iterations, the expensive forward model is used fewer times, achieving final posterior distributions in a shorter amount of time. We note that the number of particles sampled in the first iteration has an important role in the performance of the algorithm. In fact, having sampled 50,000 particles from the priors allowed the aABC-PMC algorithm to initiate with a smaller tolerance (ε₁ = 1.26) compared to the ABC-SMC algorithm (ε₁ = 3.91 by fixing 10,000 particles). The ABC posteriors for the three parameters β, Λ and θ, for the tolerances of Numminen et al. (2013) selected by using ABC-SMC and for the proposed aABC-PMC approach, are displayed in Figure 8.

Table 6: The number of draws needed in each iteration (columns t, ε_t, D_t for ABC-SMC; t, ε_t, q_t, D_t for aABC-PMC) to reach N = 10,000 accepted values for the ABC-SMC as presented in Gutmann and Corander (2016) and the proposed aABC-PMC algorithm. For the aABC-PMC algorithm, the quantile automatically selected through the iterations is also displayed. The procedure stopped once the quantile q = 0.993 was calculated. For the ABC-SMC algorithm a total of 2,…,760 draws (4 days and 12 hours on a cluster with 200 cores) are required, while our aABC-PMC takes 1,…,696 draws (3 days and 5 hours on a cluster with 200 cores).
We note that the final tolerance from Numminen et al. (2013), ε = 0.83, is slightly smaller than the final tolerance of aABC-PMC, ε = 0.93, but the posteriors for β, Λ and θ are comparable in terms of the Hellinger distances. (The Hellinger distances are calculated between the ABC posterior distributions found by Numminen et al. (2013) and the corresponding ABC posterior distributions retrieved with our aABC-PMC approach.)

Figure 8: Bacterial infection in day care centers ABC posteriors. Comparison between the final posterior distributions for β, Λ and θ obtained by using Del Moral et al. (2012)'s adaptive selection of the tolerances (solid black) and by using the aABC-PMC algorithm (dashed blue).

The ABC-PMC algorithm of Beaumont et al. (2009) has led to great improvements over the basic ABC rejection algorithm in terms of sampling efficiency. However, to use ABC-PMC it is necessary to define a sequence of tolerances along with the total number of iterations. We propose an approach leveraging ratio estimation methods for shrinking the tolerances by adaptively selecting a suitable quantile based on the progression of the estimated ABC posteriors. The proposed adjustment to the existing algorithm is shown to be able to deal with the possible presence of local modes and shrinks the tolerance in such a way that fewer draws are needed from the forward model compared to commonly used techniques for selecting the tolerances.
A simple criterion for stopping the algorithm based on the behavior of the sequential ABC posterior distribution is also presented. The empirical performance in the examples considered suggests the proposed aABC-PMC algorithm is superior to the other options considered in terms of computational time and the number of draws from the forward model. Based on the computational experiments, we envisage that the proposed aABC-PMC algorithm performs generally well when dealing with small to moderate dimensional problems, for which the original ABC-PMC algorithm was developed. It remains a challenge for future research to generalize these samplers to higher dimensional models.

Supplementary material for "Adaptive Approximate Bayesian Computation Tolerance Selection" (DOI: ; .pdf).
The authors thank IT-University of Helsinki and Yale's Center for Research Computing for the computational resources provided to execute the analyses of the present work. U. Simola was partially supported by Fondazione CARIPARO and supported by the Academy of Finland grant no. 1313197. J. Corander was supported by the ERC grant no. 742158. The authors are grateful for the comments and feedback from the reviewers.
References
Andrieu, C., De Freitas, N., Doucet, A., and Jordan, M. I. (2003). "An introduction to MCMC for machine learning." Machine Learning, 50(1-2): 5–43.

Beaumont, M. A. (2010). "Approximate Bayesian computation in evolution and ecology." Annual Review of Ecology, Evolution, and Systematics, 41: 379–406.

Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., and Robert, C. P. (2009). "Adaptive approximate Bayesian computation." Biometrika, 96(4): 983–990.

Bickel, S., Brückner, M., and Scheffer, T. (2007). "Discriminative learning for differing training and test distributions." In Proceedings of the 24th International Conference on Machine Learning, 81–88. ACM.

Blum, M., Nunes, M., Prangle, D., and Sisson, S. (2013). "A comparative review of dimension reduction methods in approximate Bayesian computation." Statistical Science, 28(2): 189–208.

Blum, M. G. (2010). "Approximate Bayesian computation: A nonparametric perspective." Journal of the American Statistical Association, 105(491): 1178–1187.

Bonassi, F. and West, M. (2015). "Sequential Monte Carlo with adaptive weights for approximate Bayesian computation." Bayesian Analysis, 10(1): 171–187.

Bregman, L. M. (1967). "The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming." USSR Computational Mathematics and Mathematical Physics, 7(3): 200–217.

Brent, R. P. (2013). Algorithms for Minimization Without Derivatives. Courier Corporation.

Brooks-Pollock, E., Becerra, M. C., Goldstein, E., Cohen, T., and Murray, M. B. (2011). "Epidemiologic inference from the distribution of tuberculosis cases in households in Lima, Peru." Journal of Infectious Diseases, 203(11): 1582–1589.

Cameron, E. and Pettitt, A. N. (2012). "Approximate Bayesian Computation for astronomical model analysis: A case study in galaxy demographics and morphological transformation at high redshift." Monthly Notices of the Royal Astronomical Society, 425: 44–65.

Cisewski-Kehe, J., Weller, G., Schafer, C., et al. (2019). "A preferential attachment model for the stellar initial mass function." Electronic Journal of Statistics, 13(1): 1580–1607.

Corander, J., Fraser, C., Gutmann, M. U., Arnold, B., Hanage, W. P., Bentley, S. D., Lipsitch, M., and Croucher, N. J. (2017). "Frequency-dependent selection in vaccine-associated pneumococcal population dynamics."
Nature Ecology & Evolution, 1(12): 1950.

Cornuet, J., Santos, F., Beaumont, M., Robert, C., Marin, J., Balding, D., Guillemaud, T., and Estoup, A. (2008). "Inferring population history with DIY ABC: a user-friendly approach to Approximate Bayesian Computation." Bioinformatics.

Csilléry, K., Blum, M. G., Gaggiotti, O. E., and François, O. (2010). "Approximate Bayesian Computation (ABC) in practice." Trends in Ecology & Evolution, 25(7): 410–418.

Del Moral, P., Doucet, A., and Jasra, A. (2012). "An adaptive sequential Monte Carlo method for approximate Bayesian computation." Statistics and Computing, 22(5): 1009–1020.

Drovandi, C. C. and Pettitt, A. N. (2011). "Estimation of parameters for macroparasite population evolution using approximate Bayesian computation." Biometrics, 67(1): 225–233.

Fearnhead, P. and Prangle, D. (2012). "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation." Journal of the Royal Statistical Society, Series B, 74(3): 419–474.

Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. (2014). Bayesian Data Analysis. Chapman & Hall.

Gretton, A., Smola, A. J., Huang, J., Schmittfull, M., Borgwardt, K. M., and Schölkopf, B. (2009). "Covariate shift by kernel mean matching."

Gutmann, M. U. and Corander, J. (2016). "Bayesian optimization for likelihood-free inference of simulator-based statistical models." The Journal of Machine Learning Research, 17(1): 4256–4302.

Hesterberg, T. C. (1988). "Advances in importance sampling." Ph.D. thesis, Stanford University.

Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., and Kanamori, T. (2011). "Statistical outlier detection using direct density ratio estimation." Knowledge and Information Systems, 26(2): 309–336.

Hoti, F., Erästö, P., Leino, T., and Auranen, K. (2009). "Outbreaks of Streptococcus pneumoniae carriage in day care cohorts in Finland – implications for elimination of transmission." BMC Infectious Diseases, 9(1): 102.

Ishida, E., Vitenti, S., Penna-Lima, M., Cisewski, J., de Souza, R., Trindade, A., Cameron, E., et al. (2015). "cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation." Astronomy & Computing, 13: 1–11.

Järvenpää, M., Gutmann, M., Vehtari, A., and Marttinen, P. (2016). "Gaussian process modeling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria." arXiv preprint arXiv:1610.06462.
Jennings, E., Wolf, R., and Sako, M. (2016). "A new approach for obtaining cosmological constraints from type Ia supernovae using approximate Bayesian computation." Astronomy and Computing.

Joyce, P. and Marjoram, P. (2008). "Approximately sufficient statistics and Bayesian computation." Statistical Applications in Genetics and Molecular Biology, 7(1): 1–16.

Julier, S., Uhlmann, J., and Durrant-Whyte, H. F. (2000). "A new method for the nonlinear transformation of means and covariances in filters and estimators." IEEE Transactions on Automatic Control, 45(3): 477–482.

Lenormand, M., Jabot, F., and Deffuant, G. (2013). "Adaptive approximate Bayesian computation for complex models." Computational Statistics, 28(6): 2777–2796.

Lintusaari, J., Gutmann, M. U., Dutta, R., Kaski, S., and Corander, J. (2017). "Fundamentals and recent developments in approximate Bayesian computation." Systematic Biology, 66(1): e66–e82.

Marin, J.-M., Pudlo, P., Robert, C. P., and Ryder, R. J. (2012). "Approximate Bayesian computational methods." Statistics and Computing, 22(6): 1167–1180.

McKinley, T., Cook, A., and Deardon, R. (2009). "Inference in epidemic models without likelihoods." The International Journal of Biostatistics, 5(1).

Numminen, E., Cheng, L., Gyllenberg, M., and Corander, J. (2013). "Estimating the transmission dynamics of Streptococcus pneumoniae from strain prevalence data." Biometrics, 69(3): 748–757.

Pritchard, J. K., Seielstad, M. T., and Perez-Lezaun, A. (1999). "Population growth of human Y chromosomes: A study of Y chromosome microsatellites." Molecular Biology and Evolution, 16(12): 1791–1798.

R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer Science & Business Media.

Rubin, D. B. (1984). "Bayesianly justifiable and relevant frequency calculations for the applied statistician." The Annals of Statistics, 12(4): 1151–1172.

Schafer, C. M. and Freeman, P. E. (2012). Statistical Challenges in Modern Astronomy V, chapter 1, 3–19. Lecture Notes in Statistics. Springer.
Silk, D., Filippi, S., and Stumpf, M. (2013). "Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems." Statistical Applications in Genetics and Molecular Biology, 12(5): 603–618.
Silverman, B. W. (1986). Density estimation for statistics and data analysis, volume 26. CRC Press.
— (2018). Density estimation for statistics and data analysis. Routledge.
Simola, U., Pelssers, B., Barge, D., Conrad, J., and Corander, J. (2019). "Machine learning accelerated likelihood-free event reconstruction in dark matter direct detection." Journal of Instrumentation, 14(03): P03004.
Sisson, S. A., Fan, Y., and Tanaka, M. M. (2007). "Sequential Monte Carlo without likelihoods." Proceedings of the National Academy of Sciences, 104(6): 1760–1765.
Sugiyama, M., Nakajima, S., Kashima, H., Buenau, P. V., and Kawanabe, M. (2008). "Direct importance estimation with model selection and its application to covariate shift adaptation." In Advances in Neural Information Processing Systems, 1433–1440.
Sugiyama, M., Suzuki, T., and Kanamori, T. (2010). "Density Ratio Estimation: A Comprehensive Review (Statistical Experiment and Its Related Topics)."
— (2012). Density ratio estimation in machine learning. Cambridge University Press.
Tavaré, S., Balding, D. J., Griffiths, R., and Donnelly, P. (1997). "Inferring coalescence times from DNA sequence data."
Genetics, 145: 505–518.
Thornton, K. and Andolfatto, P. (2006). "Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster." Genetics, 172: 1607–1619.
Toni, T., Welch, D., Strelkowa, N., Ipsen, A., and Stumpf, M. P. H. (2009). "Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems."
Journal of the Royal Society Interface, 6(31): 187–202.
Vestrheim, D. F., Høiby, E. A., Aaberge, I. S., and Caugant, D. A. (2010). "Impact of a pneumococcal conjugate vaccination program on carriage among children in Norway." Clinical and Vaccine Immunology, 17(3): 325–334.
Vestrheim, D. F., Løvoll, Ø., Aaberge, I. S., Caugant, D. A., Høiby, E. A., Bakke, H., and Bergsaker, M. R. (2008). "Effectiveness of a 2+1 dose schedule pneumococcal conjugate vaccination programme on invasive pneumococcal disease among children in Norway." Vaccine, 26(26): 3277–3281.
Weyant, A., Schafer, C., and Wood-Vasey, W. M. (2013). "Likelihood-free cosmological inference with type Ia supernovae: approximate Bayesian computation for a complete treatment of uncertainty." The Astrophysical Journal, 764: 116.