On the use of approximate Bayesian computation Markov chain Monte Carlo with inflated tolerance and post-correction
MATTI VIHOLA AND JORDAN FRANKS
Abstract.
Approximate Bayesian computation allows for inference of complicated probabilistic models with intractable likelihoods using model simulations. The Markov chain Monte Carlo implementation of approximate Bayesian computation is often sensitive to the tolerance parameter: low tolerance leads to poor mixing and large tolerance entails excess bias. We consider an approach using a relatively large tolerance for the Markov chain Monte Carlo sampler to ensure its sufficient mixing, and post-processing the output, leading to estimators for a range of finer tolerances. We introduce an approximate confidence interval for the related post-corrected estimators, and propose an adaptive approximate Bayesian computation Markov chain Monte Carlo, which finds a 'balanced' tolerance level automatically, based on acceptance rate optimisation. Our experiments show that post-processing-based estimators can perform better than a direct Markov chain Monte Carlo targeting a fine tolerance, that our confidence intervals are reliable, and that our adaptive algorithm leads to reliable inference with little user specification.

Key words and phrases: adaptive, approximate Bayesian computation, confidence interval, importance sampling, Markov chain Monte Carlo, tolerance choice.

1. Introduction
Approximate Bayesian computation is a form of likelihood-free inference (see, e.g., the reviews Marin et al., 2012; Sunnåker et al., 2013) which is used when exact Bayesian inference of a parameter $\theta \in \mathsf{T}$ with posterior density $\pi(\theta) \propto \mathrm{pr}(\theta) L(\theta)$ is impossible, where $\mathrm{pr}(\theta)$ is the prior density and $L(\theta) = g(y^* \mid \theta)$ is an intractable likelihood with data $y^* \in \mathsf{Y}$. More specifically, when the generative model of observations $g(\cdot \mid \theta)$ cannot be evaluated, but allows for simulations, we may perform relatively straightforward approximate inference based on the following (pseudo-)posterior:
\[ (1)\qquad \pi_\epsilon(\theta) \propto \mathrm{pr}(\theta) L_\epsilon(\theta), \quad\text{where}\quad L_\epsilon(\theta) = \mathbb{E}[K_\epsilon(Y_\theta, y^*)], \quad Y_\theta \sim g(\cdot \mid \theta), \]
where $\epsilon > 0$ is a tolerance and $K_\epsilon : \mathsf{Y}^2 \to [0, \infty)$ is a 'kernel' function, which is often taken as a simple cut-off $K_\epsilon(y, y^*) = \mathbf{1}\big(\|s(y) - s(y^*)\| \le \epsilon\big)$, where $s : \mathsf{Y} \to \mathbb{R}^d$ extracts a vector of summary statistics from the (pseudo-)observations. The summary statistics are often chosen based on the application at hand, and reflect what is relevant for the inference task; see also (Fearnhead and Prangle, 2012; Raynal et al., to appear). Because $L_\epsilon(\theta)$ may be regarded as a smoothed version of the true likelihood $g(y^* \mid \theta)$ using the kernel $K_\epsilon$, it is intuitive that using a too large $\epsilon$ may blur the likelihood and bias the inference. Therefore, it is generally desirable to use as small a tolerance $\epsilon > 0$ as possible. In practice, the choice of tolerance level is difficult (cf. Bortot et al., 2007; Sisson and Fan, 2018; Tanaka et al., 2006).

We discuss a simple post-processing procedure which allows for consideration of a range of values for the tolerance $\epsilon \le \delta$, based on a single run of approximate Bayesian computation Markov chain Monte Carlo (Marjoram et al., 2003) with tolerance $\delta$. Such post-processing was suggested in (Wegmann et al., 2009) (in case of the simple cut-off), and similar post-processing has also been suggested with regression adjustment (Beaumont et al., 2002) (in a rejection sampling context). The method, discussed further in Section 2, can be useful for two reasons: a range of tolerances $\epsilon \le \delta$ may be routinely inspected, which can reveal excess bias in the pseudo-posterior $\pi_\delta$; and the Markov chain Monte Carlo inference may be implemented with a sufficiently large $\delta$ to allow for good mixing.

Our contribution is two-fold. We suggest straightforward-to-calculate approximate confidence intervals for the posterior mean estimates calculated from the post-processing output, and discuss some related theoretical properties. We also introduce an adaptive approximate Bayesian computation Markov chain Monte Carlo which finds a balanced $\delta$ during burn-in, using the acceptance rate as a proxy, and detail a convergence result for it.

2. Post-processing over a range of tolerances
For the rest of the paper, we assume that the kernel function in (1) has the form $K_\epsilon(y, y^*) = \phi\big(d(y, y^*)/\epsilon\big)$, where $d : \mathsf{Y}^2 \to [0, \infty)$ is any 'dissimilarity' function and $\phi : [0, \infty) \to [0, 1]$ is a non-increasing 'cut-off' function. Typically $d(y, y^*) = \|s(y) - s(y^*)\|$, where $s : \mathsf{Y} \to \mathbb{R}^d$ are the chosen summaries, and in case of the simple cut-off discussed in Section 1, $\phi(t) = \phi_{\mathrm{simple}}(t) = \mathbf{1}(t \le 1)$. We assume that the pseudo-posterior $\pi_\epsilon$ given in (1) is well-defined for all $\epsilon > 0$, that is, $c_\epsilon = \int \mathrm{pr}(\theta) L_\epsilon(\theta)\,\mathrm{d}\theta > 0$. The approximate Bayesian computation Markov chain Monte Carlo with proposal density $q$ and tolerance $\delta > 0$ is as follows.

Algorithm 1 (abc-mcmc($\delta$)). Suppose $\Theta_0 \in \mathsf{T}$ and $Y_0 \in \mathsf{Y}$ are any starting values such that $\mathrm{pr}(\Theta_0) > 0$ and $\phi\big(d(Y_0, y^*)/\delta\big) > 0$. For $k = 1, 2, \ldots$, iterate:
(i) Draw $\tilde\Theta_k \sim q(\Theta_{k-1}, \cdot)$ and $\tilde Y_k \sim g(\cdot \mid \tilde\Theta_k)$.
(ii) With probability $\alpha_\delta(\Theta_{k-1}, Y_{k-1}; \tilde\Theta_k, \tilde Y_k)$ accept and set $(\Theta_k, Y_k) \leftarrow (\tilde\Theta_k, \tilde Y_k)$; otherwise reject and set $(\Theta_k, Y_k) \leftarrow (\Theta_{k-1}, Y_{k-1})$, where
\[ \alpha_\delta(\theta, y; \tilde\theta, \tilde y) = \min\bigg\{1,\ \frac{\mathrm{pr}(\tilde\theta)\,q(\tilde\theta, \theta)\,\phi\big(d(\tilde y, y^*)/\delta\big)}{\mathrm{pr}(\theta)\,q(\theta, \tilde\theta)\,\phi\big(d(y, y^*)/\delta\big)}\bigg\}. \]

Algorithm 1 may be implemented by storing only $\Theta_k$ and the related distances $T_k = d(Y_k, y^*)$, and in what follows, we regard either $(\Theta_k, Y_k)_{k \ge 1}$ or $(\Theta_k, T_k)_{k \ge 1}$ as the output of Algorithm 1. In practice, the initial values $(\Theta_0, Y_0)$ should be taken as the state of Algorithm 1 after a number of initial 'burn-in' iterations. We also introduce an adaptive algorithm for parameter tuning later (Section 4).

It is possible to consider a variant of Algorithm 1 where many (possibly dependent) observations $\tilde Y_k^{(1)}, \ldots, \tilde Y_k^{(m)} \sim g(\cdot \mid \tilde\Theta_k)$ are simulated in each iteration, and an average of their kernel values is used in the accept-reject step (cf. Andrieu et al., 2018). We focus here on the case of a single pseudo-observation per iteration, following the asymptotic efficiency result of Bornn et al. (2017), but remark that our method may be applied in a straightforward manner also with multiple observations.
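The experiments of Section 6 are implemented in Julia, so we illustrate the method in the same language. The following is a minimal sketch of Algorithm 1 with the simple cut-off, storing only $(\Theta_k, T_k)$ as discussed above; the functions `logprior`, `propose`, `logq` and `simdist` (which draws $Y \sim g(\cdot \mid \theta)$ and returns $d(Y, y^*)$) are placeholders to be supplied by the user, and the sketch is not the paper's reference implementation.

```julia
# Minimal sketch of abc-mcmc(δ) (Algorithm 1) with the simple cut-off,
# storing only the parameters Θ_k and the distances T_k = d(Y_k, y*).
# Placeholders (user-supplied): logprior(θ), propose(rng, θ) draws from q(θ,·),
# logq(θ, θ̃) is the log proposal density, and simdist(rng, θ) draws Y ~ g(·|θ)
# and returns d(Y, y*).
using Random

function abc_mcmc(rng, θ0, t0, δ, n; logprior, logq, propose, simdist)
    θ, t = θ0, t0                      # requires pr(θ0) > 0 and t0 ≤ δ
    Θ = Vector{typeof(θ0)}(undef, n)
    T = Vector{typeof(t0)}(undef, n)
    for k in 1:n
        θ̃ = propose(rng, θ)
        t̃ = simdist(rng, θ̃)
        # With φ_simple, φ(T_k/δ) ≡ 1 along the chain, so the acceptance
        # probability is the Metropolis-Hastings ratio times 1(t̃ ≤ δ).
        logr = logprior(θ̃) - logprior(θ) + logq(θ̃, θ) - logq(θ, θ̃)
        if t̃ <= δ && log(rand(rng)) < min(logr, 0.0)
            θ, t = θ̃, t̃               # accept
        end                            # otherwise reject: keep (θ, t)
        Θ[k] = θ
        T[k] = t
    end
    return Θ, T
end
```

Note that with the simple cut-off the kernel value of the current state never appears explicitly: once $T_k \le \delta$ holds for the initial state, it holds along the whole chain.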
Definition 2. Suppose $(\Theta_k, T_k)_{k=1,\ldots,n}$ is the output of abc-mcmc($\delta$) for some $\delta > 0$. For any $\epsilon \in (0, \delta]$ such that $\phi(T_k/\epsilon) > 0$ for some $k = 1, \ldots, n$, and for any function $f : \mathsf{T} \to \mathbb{R}$, define
\[ U_k^{(\delta,\epsilon)} = \frac{\phi(T_k/\epsilon)}{\phi(T_k/\delta)}, \qquad W_k^{(\delta,\epsilon)} = \frac{U_k^{(\delta,\epsilon)}}{\sum_{j=1}^n U_j^{(\delta,\epsilon)}}, \]
\[ E_{\delta,\epsilon}(f) = \sum_{k=1}^n W_k^{(\delta,\epsilon)} f(\Theta_k), \qquad S_{\delta,\epsilon}(f) = \sum_{k=1}^n \big(W_k^{(\delta,\epsilon)}\big)^2 \big\{f(\Theta_k) - E_{\delta,\epsilon}(f)\big\}^2. \]
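For a single tolerance $\epsilon \le \delta$ and the simple cut-off, in which case $U_k^{(\delta,\epsilon)} = \mathbf{1}(T_k \le \epsilon)$, the estimators of Definition 2 take only a few lines of Julia; a sketch follows (the all-$\epsilon$ variant via sorting is Algorithm 11 in the Appendix):

```julia
# Post-correction estimators E_{δ,ε}(f) and S_{δ,ε}(f) of Definition 2,
# for a single tolerance ε ≤ δ, with the simple cut-off.
function post_correct(f, Θ, T, ε)
    U = T .<= ε                            # U_k = 1(T_k ≤ ε)
    m = count(U)
    m > 0 || error("no samples with T_k ≤ ε; increase ε")
    W = U ./ m                             # normalised weights W_k
    E = sum(W .* f.(Θ))                    # E_{δ,ε}(f)
    S = sum(W .^ 2 .* (f.(Θ) .- E) .^ 2)   # S_{δ,ε}(f)
    return E, S
end
```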
Algorithm 11 in the Appendix details how $E_{\delta,\epsilon}(f)$ and $S_{\delta,\epsilon}(f)$ can be calculated in $O(n \log n)$ time simultaneously for all $\epsilon \le \delta$ in case of the simple cut-off. The estimator $E_{\delta,\epsilon}(f)$ approximates $\mathbb{E}_{\pi_\epsilon}[f(\Theta)]$ and $S_{\delta,\epsilon}(f)$ may be used to construct a confidence interval; see Algorithm 5 below. Theorem 4 details consistency of $E_{\delta,\epsilon}(f)$, and relates $S_{\delta,\epsilon}(f)$ to the limiting variance, in case the following (well-known) condition ensuring a central limit theorem holds:

Assumption 3 (Finite integrated autocorrelation). Suppose that $\mathbb{E}_{\pi_\epsilon}[f^2(\Theta)] < \infty$ and $\sum_{k \ge 1} \rho_k^{(\delta,\epsilon)}$ is finite, with
\[ \rho_k^{(\delta,\epsilon)} = \mathrm{Corr}\big(h_{\delta,\epsilon}(\Theta_0^{(s)}, Y_0^{(s)}),\ h_{\delta,\epsilon}(\Theta_k^{(s)}, Y_k^{(s)})\big), \]
where $(\Theta_k^{(s)}, Y_k^{(s)})_{k \ge 0}$ is a stationary version of the abc-mcmc($\delta$) chain, and $h_{\delta,\epsilon}(\theta, y) = w_{\delta,\epsilon}(y) f(\theta)$ where $w_{\delta,\epsilon}(y) = \phi\big(d(y, y^*)/\epsilon\big)\big/\phi\big(d(y, y^*)/\delta\big)$.

Theorem 4.
Suppose $(\Theta_k, T_k)_{k \ge 1}$ is the output of abc-mcmc($\delta$), and denote by $E_{\delta,\epsilon}^{(n)}(f)$ and $S_{\delta,\epsilon}^{(n)}(f)$ the estimators in Definition 2. If $(\Theta_k, T_k)_{k \ge 1}$ is $\varphi$-irreducible (Meyn and Tweedie, 2009) then, for any $\epsilon \in (0, \delta)$, we have as $n \to \infty$:
(i) $E_{\delta,\epsilon}^{(n)}(f) \to \mathbb{E}_{\pi_\epsilon}[f(\Theta)]$ almost surely, whenever the expectation is finite.
(ii) Under Assumption 3, $n^{1/2}\big(E_{\delta,\epsilon}^{(n)}(f) - \mathbb{E}_{\pi_\epsilon}[f(\Theta)]\big) \to N\big(0,\ v_{\delta,\epsilon}(f)\tau_{\delta,\epsilon}(f)\big)$ in distribution, where $\tau_{\delta,\epsilon}(f) = 1 + 2\sum_{k \ge 1}\rho_k^{(\delta,\epsilon)} \in [0, \infty)$ and $n S_{\delta,\epsilon}^{(n)}(f) \to v_{\delta,\epsilon}(f) \in [0, \infty)$ almost surely.

The proof of Theorem 4 is given in the Appendix. Inspired by Theorem 4, we suggest reporting the following approximate confidence intervals for the suggested estimators:
Algorithm 5.
Suppose $(\Theta_k, T_k)_{k=1,\ldots,n}$ is the output of abc-mcmc($\delta$) and $f : \mathsf{T} \to \mathbb{R}$ is a function, then for any $\epsilon \le \delta$:
(i) Calculate $E_{\delta,\epsilon}(f)$ and $S_{\delta,\epsilon}(f)$ as in Definition 2 (or in Algorithm 11).
(ii) Calculate $\hat\tau_\delta(f)$, an estimate of the integrated autocorrelation of $\big(f(\Theta_k)\big)_{k=1,\ldots,n}$.
(iii) Report the confidence interval $\big[E_{\delta,\epsilon}(f) \pm z_q\big(S_{\delta,\epsilon}(f)\hat\tau_\delta(f)\big)^{1/2}\big]$, where $z_q > 0$ is the desired normal quantile.

We use the single estimate $\hat\tau_\delta(f)$ in place of $\tau_{\delta,\epsilon}(f)$ for all tolerances. This relies on the approximation $\tau_{\delta,\epsilon}(f) \lesssim \tau_\delta(f)$, which may not always be entirely accurate, but is likely to be reasonable, as illustrated by Theorem 6 in Section 3 below. We suggest using a common $\hat\tau_\delta(f)$ for all tolerances because direct estimation of the integrated autocorrelation is computationally demanding, and likely to be unstable for small $\epsilon$.

The classical choice for $\hat\tau_\delta(f)$ in Algorithm 5(ii) is the windowed autocorrelation estimator $\hat\tau_\delta(f) = \sum_{k=-\infty}^{\infty}\omega(k)\hat\rho_k$, with some $0 \le \omega(k) \le
1, where $\hat\rho_k$ is the sample autocorrelation of $\big(f(\Theta_k)\big)$ (cf. Geyer, 1992). We employ this approach in our experiments with $\omega(k) = \mathbf{1}(|k| \le M)$, where the cut-off lag $M$ is chosen adaptively as the smallest integer such that $M \ge 5\big(1 + 2\sum_{k=1}^M\hat\rho_k\big)$ (Sokal, 1996). Also more sophisticated techniques for the calculation of the asymptotic variance have been suggested (e.g. Flegal and Jones, 2010).

We remark that, although we focus here on the case of using a common cut-off $\phi$ for both the abc-mcmc($\delta$) and the post-correction, one could also use a different cut-off $\phi_s$ in the simulation phase, as considered by Beaumont et al. (2002) in the regression context. The extension of Definition 2 is straightforward, setting $U_k^{(\delta,\epsilon)} = \phi(T_k/\epsilon)\big/\phi_s(T_k/\delta)$, and Theorem 4 remains valid under a support condition.
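A sketch of Algorithm 5(ii)-(iii) in Julia, using the windowed estimator with the adaptive truncation rule described above; the truncation constant 5 follows Sokal's rule as stated, and the 97.5% normal quantile $z_q \approx 1.96$ is an illustrative choice:

```julia
using Statistics

# Windowed estimate τ̂ of the integrated autocorrelation of x (Algorithm 5(ii)),
# truncated at the smallest lag M with M ≥ 5(1 + 2 Σ_{k=1}^M ρ̂_k) (Sokal, 1996).
function iact(x)
    n = length(x)
    μ, v = mean(x), var(x)
    τ = 1.0
    for M in 1:(n - 1)
        ρ̂ = sum((x[1:(n - M)] .- μ) .* (x[(1 + M):n] .- μ)) / ((n - M) * v)
        τ += 2 * ρ̂
        M >= 5 * τ && break            # adaptive cut-off lag
    end
    return max(τ, 1.0)
end

# Algorithm 5(iii) with z_q ≈ 1.96 (a 95% interval):
# E, S = post_correct(f, Θ, T, ε)
# τ̂ = iact(f.(Θ))
# interval = (E - 1.96 * sqrt(S * τ̂), E + 1.96 * sqrt(S * τ̂))
```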
3. Theoretical justification

The following result, whose proof is given in the Appendix, gives an expression for the integrated autocorrelation in case of the simple cut-off.
Theorem 6.
Suppose Assumption 3 holds and $\phi = \phi_{\mathrm{simple}}$, then
\[ \tau_{\delta,\epsilon}(f) - 1 = \frac{\big(\check\tau_{\delta,\epsilon}(f) - 1\big)\,\mathrm{var}_{\pi_\delta}(f_{\delta,\epsilon}) + 2\int \pi_\delta(\theta)\,\bar w_{\delta,\epsilon}(\theta)\big(1 - \bar w_{\delta,\epsilon}(\theta)\big)\frac{r_\delta(\theta)}{1 - r_\delta(\theta)}\,f^2(\theta)\,\mathrm{d}\theta}{\mathrm{var}_{\pi_\delta}(f_{\delta,\epsilon}) + \int \pi_\delta(\theta)\,\bar w_{\delta,\epsilon}(\theta)\big(1 - \bar w_{\delta,\epsilon}(\theta)\big)\,f^2(\theta)\,\mathrm{d}\theta}, \]
where $\bar w_{\delta,\epsilon}(\theta) = L_\epsilon(\theta)/L_\delta(\theta)$, $f_{\delta,\epsilon}(\theta) = f(\theta)\bar w_{\delta,\epsilon}(\theta)$, $\check\tau_{\delta,\epsilon}(f)$ is the integrated autocorrelation of $\{f_{\delta,\epsilon}(\Theta_k^{(s)})\}_{k \ge 0}$ and $r_\delta(\theta)$ is the rejection probability of the abc-mcmc($\delta$) chain at $\theta$.

We next discuss how this loosely suggests that $\tau_{\delta,\epsilon}(f) \lesssim \tau_{\delta,\delta}(f) = \tau_\delta(f)$. The weight $\bar w_{\delta,\delta} \equiv 1$, and under suitable regularity conditions both $\bar w_{\delta,\epsilon}(\theta)$ and $\check\tau_{\delta,\epsilon}(f)$ are continuous with respect to $\epsilon$, and $\bar w_{\delta,\epsilon}(\theta) \to 0$ as $\epsilon \to 0$. Then, for $\epsilon \approx \delta$, we have $\bar w_{\delta,\epsilon} \approx 1$ and $\tau_{\delta,\delta}(f) \approx \tau_{\delta,\epsilon}(f)$. For small $\epsilon$, the terms with $\mathrm{var}_{\pi_\delta}(f_{\delta,\epsilon})$ are of order $O(\bar w_{\delta,\epsilon}^2)$, and are dominated by the other terms of order $O(\bar w_{\delta,\epsilon})$. The remaining ratio may be written as
\[ \frac{2\int \pi_\delta(\theta)\,\bar w_{\delta,\epsilon}(\theta)\big(1 - \bar w_{\delta,\epsilon}(\theta)\big)\frac{r_\delta(\theta)}{1 - r_\delta(\theta)}\,f^2(\theta)\,\mathrm{d}\theta}{\int \pi_\delta(\theta)\,\bar w_{\delta,\epsilon}(\theta)\big(1 - \bar w_{\delta,\epsilon}(\theta)\big)\,f^2(\theta)\,\mathrm{d}\theta} = 2\,\mathbb{E}_{\pi_\delta}\bigg[\bar g_{\delta,\epsilon}(\Theta)\frac{r_\delta(\Theta)}{1 - r_\delta(\Theta)}\bigg], \]
where $\bar g_{\delta,\epsilon} \propto \bar w_{\delta,\epsilon}(1 - \bar w_{\delta,\epsilon})f^2$ with $\pi_\delta(\bar g_{\delta,\epsilon}) = 1$. If $r_\delta(\theta) \le r^* < 1$, then the term is upper bounded by $2r^*(1 - r^*)^{-1}$, and we believe it to be often less than $\tau_{\delta,\delta}(f)$, because the latter expression is similar to the contribution of rejections to the integrated autocorrelation; see the proof of Theorem 6.

For a general $\phi$, it appears hard to obtain a similar theoretical result, but we expect the approximation to still be sensible. Theorem 6 relies on $Y_k^{(s)}$ being independent of $(\Theta_0^{(s)}, Y_0^{(s)})$ conditional on $\Theta_k^{(s)}$, assuming at least a single acceptance. This is not true with other cut-offs, but we believe that the dependence of $Y_k^{(s)}$ on $(\Theta_0^{(s)}, Y_0^{(s)})$ given $\Theta_k^{(s)}$ is generally weaker than the dependence of $\Theta_k^{(s)}$ and $\Theta_0^{(s)}$, suggesting similar behaviour.

We conclude the section with a general (albeit pessimistic) upper bound for the asymptotic variance of the post-corrected estimators.

Theorem 7.
For any $\epsilon \le \delta$, denote by $\sigma^2_{\delta,\epsilon}(f) = v_{\delta,\epsilon}(f)\tau_{\delta,\epsilon}(f)$ the asymptotic variance of the estimator of Definition 2 (see Theorem 4(ii)) and $\bar f(\theta) = f(\theta) - \mathbb{E}_{\pi_\epsilon}[f(\Theta)]$, then for any $\epsilon \le \delta$,
\[ \sigma^2_{\delta,\epsilon}(f) \le (c_\delta/c_\epsilon)\big\{\sigma^2_\epsilon(f) + \tilde\pi_\epsilon\big(\bar f^2(1 - w_{\delta,\epsilon})\big)\big\}, \]
where $\tilde\pi_\epsilon$ is the stationary distribution of the direct abc-mcmc($\epsilon$) and $\sigma^2_\epsilon(f) = \sigma^2_{\epsilon,\epsilon}(f)$ its asymptotic variance.

Theorem 7 follows directly from (Franks and Vihola, 2017, Corollary 4). The upper bound guarantees that a moderate correction, that is, $\epsilon$ close to $\delta$ and $c_\delta$ close to $c_\epsilon$, is nearly as efficient as direct abc-mcmc($\delta$). Indeed, typically $w_{\delta,\epsilon} \to 1$ and $c_\epsilon \to c_\delta$ as $\epsilon \to \delta$, in which case Theorem 7 implies $\limsup_{\epsilon \to \delta}\sigma^2_{\delta,\epsilon}(f) \le \sigma^2_\delta(f)$. However, as $\epsilon \to 0$, the bound becomes less informative.

4. Tolerance adaptation
We propose Algorithm 8 below to adapt the tolerance $\delta$ in abc-mcmc($\delta$) during a burn-in of length $n_b$, in order to obtain a user-specified overall acceptance rate $\alpha^* \in (0, 1)$. Choosing the tolerance in advance, for instance based on quantiles of distances simulated from the prior predictive, might not be satisfactory in the Markov chain Monte Carlo context, if the prior is relatively uninformative. We believe that acceptance rate optimisation is a more natural alternative, and Sisson and Fan (2018) suggested this as well.

Our method also requires a sequence of decreasing positive step sizes $(\gamma_k)_{k \ge 1}$. We used $\alpha^* = 0.1$ and $\gamma_k = k^{-2/3}$ in our experiments, and discuss these choices later.

Algorithm 8.
Suppose $\Theta_0 \in \mathsf{T}$ is a starting value with $\mathrm{pr}(\Theta_0) > 0$. Initialise $\delta_0 = d(Y_0, y^*) > 0$ with $Y_0 \sim g(\cdot \mid \Theta_0)$. For $k = 1, \ldots, n_b$, iterate:
(i) Draw $\tilde\Theta_k \sim q(\Theta_{k-1}, \cdot)$ and $\tilde Y_k \sim g(\cdot \mid \tilde\Theta_k)$.
(ii) With probability $A_k = \alpha_{\delta_{k-1}}(\Theta_{k-1}, Y_{k-1}; \tilde\Theta_k, \tilde Y_k)$ accept and set $(\Theta_k, Y_k) \leftarrow (\tilde\Theta_k, \tilde Y_k)$; otherwise reject and set $(\Theta_k, Y_k) \leftarrow (\Theta_{k-1}, Y_{k-1})$.
(iii) $\log\delta_k \leftarrow \log\delta_{k-1} + \gamma_k(\alpha^* - A_k)$.

In practice, we use Algorithm 8 with a Gaussian symmetric random walk proposal $q_{\Sigma_k}$, where the covariance parameter $\Sigma_k$ is adapted simultaneously (Andrieu and Moulines, 2006; Haario et al., 2001) (see Algorithm 23 of Supplement C). We only detail theory for Algorithm 8, but note that similar simultaneous adaptation has been discussed earlier (cf. Andrieu and Thoms, 2008), and expect that our results could be elaborated accordingly.
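A minimal Julia sketch of Algorithm 8 with the simple cut-off and the choices $\alpha^* = 0.1$, $\gamma_k = k^{-2/3}$ used in our experiments; the user-supplied placeholder functions are as in the abc-mcmc sketch of Section 2, and the simultaneous covariance adaptation is omitted:

```julia
# Sketch of the burn-in tolerance adaptation (Algorithm 8) with the simple
# cut-off, α* = 0.1 and γ_k = k^(-2/3). Placeholders as in the abc_mcmc
# sketch above; covariance adaptation of the proposal is not shown.
function adapt_tolerance(rng, θ0, n_b; logprior, logq, propose, simdist, αstar = 0.1)
    θ = θ0
    t = simdist(rng, θ)                 # Y_0 ~ g(·|Θ_0)
    logδ = log(t)                       # δ_0 = d(Y_0, y*), so the chain can move
    for k in 1:n_b
        θ̃ = propose(rng, θ)
        t̃ = simdist(rng, θ̃)
        δ = exp(logδ)
        logr = logprior(θ̃) - logprior(θ) + logq(θ̃, θ) - logq(θ, θ̃)
        # A_k: if φ(T_{k-1}/δ) = 0 the ratio is infinite, so accept
        # whenever t̃ ≤ δ.
        A = t̃ <= δ ? (t <= δ ? min(exp(logr), 1.0) : 1.0) : 0.0
        if rand(rng) < A
            θ, t = θ̃, t̃
        end
        logδ += k^(-2 / 3) * (αstar - A)   # log δ_k ← log δ_{k-1} + γ_k(α* - A_k)
    end
    return θ, t, exp(logδ)              # starting state and tolerance for abc_mcmc
end
```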
The following conditions suffice for convergence of the adaptation:

Assumption 9. Suppose $\phi = \phi_{\mathrm{simple}}$ and the following hold:
(i) $\gamma_k = Ck^{-r}$ with $r \in (1/2, 1]$ and $C > 0$.
(ii) $\mathsf{T} \subset \mathbb{R}^{n_\theta}$, $n_\theta \ge 1$, is a nonempty open set and $\mathrm{pr}(\theta)$ is bounded.
(iii) The proposal $q$ is bounded and bounded away from zero.
(iv) The distances $D_\theta = d(Y_\theta, y^*)$, where $Y_\theta \sim g(\cdot \mid \theta)$, admit densities which are uniformly bounded in $\theta$.
(v) $(\delta_k)_{k \ge 0}$ stays in a set $[a, b]$ almost surely, where $0 < a \le b < +\infty$.
(vi) $c_\epsilon = \int \mathrm{pr}(\mathrm{d}\theta)L_\epsilon(\theta) > 0$ for all $\epsilon \in [a, b]$.

Theorem 10.
Under Assumption 9, the expected value of the acceptance probability, with respect to the stationary distribution of the chain, converges to $\alpha^*$.

The proof of Theorem 10 will follow from the more general Theorem 14 of Supplement A. Polynomially decaying step size sequences as in Assumption 9(i) are common in adaptation which, like our approach, is of the stochastic approximation type (Andrieu and Thoms, 2008). Slower decaying step sizes such as $n^{-2/3}$ often behave better with acceptance rate adaptation (cf. Vihola, 2012, Remark 3).

Simple random walk Metropolis with covariance adaptation (Haario et al., 2001) typically leads to a limiting acceptance rate around 0.234 (Roberts et al., 1997). In case of a pseudo-marginal algorithm such as abc-mcmc($\delta$), the acceptance rate is lower than this, and decreases when $\delta$ is decreased (see Lemma 16 of Supplement B). Markov chain Monte Carlo would typically be necessary when rejection sampling is not possible, that is, when the prior is far from the posterior. In such a case, the likelihood approximation must be accurate enough to provide a reasonable approximation $\pi_\delta \approx \pi_\epsilon$. This suggests that the desired acceptance rate should be taken substantially lower than 0.234.

The choice of $\alpha^*$ could also be motivated by theory developed for pseudo-marginal Markov chain Monte Carlo algorithms. Doucet et al. (2015) rely on log-normality of the likelihood estimators, which is problematic in our context, because the likelihood estimators take the value zero. Sherlock et al. (2015) find the acceptance rate 0.07 to be optimal in a specific setting. However, the 0.07 guideline assumes a fixed tolerance, and informs about choosing the number of pseudo-data per iteration. As we stick with a single pseudo-datum per iteration following (Bornn et al., 2017), the 0.07 guideline cannot be taken as too informative. We recommend a slightly higher $\alpha^*$ such as 0.1.

5. Post-processing with regression correction
Beaumont et al. (2002) suggested similar post-processing as in Section 2, applying a further regression correction. Namely, in the context of Section 2, we may consider a function $\tilde f^{(\epsilon)}(\theta, y) = f(\theta) - \bar s(y)^T b_\epsilon$, where $\bar s(y) = s(y) - s(y^*)$ and $b_\epsilon$ is a solution of
\[ \min_{a_\epsilon, b_\epsilon}\mathbb{E}_{\tilde\pi_\epsilon}\big[\big\{f(\Theta) - a_\epsilon - \bar s(Y)^T b_\epsilon\big\}^2\big] = \min_{a_\epsilon, b_\epsilon}\mathbb{E}_{\tilde\pi_\delta}\big[w_{\delta,\epsilon}(Y)\big\{f(\Theta) - a_\epsilon - \bar s(Y)^T b_\epsilon\big\}^2\big], \]
where $\tilde\pi_\delta$ is the stationary distribution of abc-mcmc($\delta$), with marginal $\pi_\delta$, given in the Appendix. When the latter expectation is replaced by its empirical version, the solution coincides with weighted least squares, $(\hat a_\epsilon, \hat b_\epsilon^T)^T = (\mathrm{M}^T\mathrm{W}_\epsilon\mathrm{M})^{-1}\mathrm{M}^T\mathrm{W}_\epsilon v$, with $v_k = f(\Theta_k)$, $\mathrm{W}_\epsilon = \mathrm{diag}(W_1^{(\delta,\epsilon)}, \ldots, W_n^{(\delta,\epsilon)})$ and with the matrix $\mathrm{M}$ having rows $[\mathrm{M}]_{k,\cdot} = (1, \bar s(Y_k)^T)$.

We suggest the following confidence interval for $a_\epsilon = \mathbb{E}_{\tilde\pi_\epsilon}[\tilde f^{(\epsilon)}(\Theta, Y)]$ in the spirit of Algorithm 5: $\big[\hat a_\epsilon \pm z_q\big(S^{\mathrm{reg}}_{\delta,\epsilon}\hat\tau^{\mathrm{reg}}_\delta\big)^{1/2}\big]$, where $\hat\tau^{\mathrm{reg}}_\delta$ is the integrated autocorrelation estimate for $(\hat F_k^{(\delta)})$, where $\hat F_k^{(\delta)} = f(\Theta_k) - \bar s(Y_k)^T\hat b_\delta$, and
\[ S^{\mathrm{reg}}_{\delta,\epsilon} = \big[(\mathrm{M}^T\mathrm{W}_\epsilon\mathrm{M})^{-1}\big]_{1,1} + \sum_{k=1}^n\big(W_k^{(\delta,\epsilon)}\big)^2\big(\hat F_k^{(\epsilon)} - \hat a_\epsilon\big)^2, \]
where the first term is included as an attempt to account for the increased uncertainty due to the estimated $\hat b_\epsilon$, analogous to weighted least squares. Experimental results show some promise for this confidence interval, but we stress that we do not have better theoretical backing for it, and leave further elaboration of the confidence interval for future research.
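The weighted least squares solution and the variance proxy $S^{\mathrm{reg}}_{\delta,\epsilon}$ above are direct to compute; a Julia sketch (the names `fΘ`, `S̄` and `W` are placeholders for the vector $f(\Theta_k)$, the matrix of centred summaries $\bar s(Y_k)^T$, and the weights $W_k^{(\delta,\epsilon)}$ of Definition 2):

```julia
using LinearAlgebra

# Weighted least squares regression correction (Section 5).
# fΘ: vector of f(Θ_k); S̄: n × d matrix with rows s̄(Y_k)' = (s(Y_k) - s(y*))';
# W: weights W_k^{(δ,ε)}.
function regression_correct(fΘ, S̄, W)
    n = length(fΘ)
    M = hcat(ones(n), S̄)               # rows [M]_k = (1, s̄(Y_k)')
    G = M' * Diagonal(W) * M            # M' W_ε M
    coef = G \ (M' * (W .* fΘ))         # (â_ε, b̂_ε')' = (M'W_εM)⁻¹ M'W_ε v
    â, b̂ = coef[1], coef[2:end]
    F̂ = fΘ .- S̄ * b̂                   # regression-adjusted values F̂_k^{(ε)}
    S_reg = inv(G)[1, 1] + sum(W .^ 2 .* (F̂ .- â) .^ 2)
    return â, b̂, S_reg
end
```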
6. Experiments

We experiment with our methods on two models: a lightweight Gaussian toy example, and a Lotka-Volterra model. Our experiments focus on three aspects: can abc-mcmc($\delta$) with a larger tolerance $\delta$ and post-correction to a desired tolerance $\epsilon < \delta$ deliver more accurate results than direct abc-mcmc($\epsilon$); does the approximate confidence interval appear reliable; and how well does the tolerance adaptation work in practice. All the experiments are implemented in Julia (Bezanson et al., 2017), and the codes are available at https://bitbucket.org/mvihola/abc-mcmc.

Because we believe that Markov chain Monte Carlo is most useful when little is known about the posterior, we apply covariance adaptation (Andrieu and Moulines, 2006; Haario et al., 2001) throughout the simulation in all our experiments, using an identity covariance initially. When running the covariance adaptation alone, we employ the step size $n^{-1}$ as in the original method of Haario et al. (2001), and in case of tolerance adaptation, we use the step size $n^{-2/3}$.

Regarding our first question, we investigate running abc-mcmc($\delta$) starting near the posterior mode with different pre-selected tolerances $\delta$. We first attempted to perform the experiments by initialising the chains from independent samples of the prior distribution, but in this case, most of the chains failed to accept a single move during the entire run. In contrast, our experiments with tolerance adaptation are initialised from the prior, and both the tolerances and the covariances are adjusted fully automatically by our algorithm.
Figure 1. Gaussian model with $\phi_{\mathrm{simple}}$. Estimates from a single run of abc-mcmc(3) (left) and estimates from 10,000 replications of abc-mcmc($\delta$) for $\delta \in \{0.1, 0.825, 1.55, 2.275, 3\}$ indicated by colours.

6.1. One-dimensional Gaussian model.
Our first model is a toy model with a zero-mean Gaussian prior $\mathrm{pr}(\theta) = N(\theta; 0, \sigma_0^2)$ with a large standard deviation $\sigma_0$, $g(y \mid \theta) = N(y; \theta, 1)$ and $d(y, y^*) = |y|$. The true posterior without approximation is Gaussian. While this scenario is clearly academic, the prior is far from the posterior, making rejection sampling approximate Bayesian computation inefficient. It is clear that $\pi_\epsilon$ has zero mean for all $\epsilon$ (by symmetry), and that $\pi_\epsilon$ is more spread out for bigger $\epsilon$. We experiment with both the simple cut-off $\phi_{\mathrm{simple}}$ and the Gaussian cut-off $\phi_{\mathrm{Gauss}}(t) = e^{-t^2/2}$.

We run the experiments with 10,000 independent chains, each for 11,000 iterations including 1,000 burn-in. The chains were always started from $\theta = 0$.
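For concreteness, the toy model plugged into the abc-mcmc and post-correction sketches of Section 2; the prior standard deviation `σ0`, the random walk step and the fixed initial distance are illustrative assumptions (any start with $t_0 \le \delta$ is valid, and burn-in washes out the initialisation):

```julia
using Random

σ0 = 30.0                               # large prior standard deviation (illustrative)
logprior(θ) = -0.5 * (θ / σ0)^2         # N(0, σ0²) prior, up to an additive constant
propose(rng, θ) = θ + randn(rng)        # symmetric random walk proposal
logq(θ, θ̃) = 0.0                        # symmetric: cancels in the ratio
simdist(rng, θ) = abs(θ + randn(rng))   # d(Y, y*) = |Y| with Y ~ N(θ, 1)

rng = MersenneTwister(1)
# Run abc-mcmc(3) for 11,000 iterations (the first 1,000 would be discarded
# as burn-in, as in the text), then post-correct to ε = 0.1.
Θ, T = abc_mcmc(rng, 0.0, 0.5, 3.0, 11_000; logprior, logq, propose, simdist)
E, S = post_correct(abs, Θ, T, 0.1)     # posterior mean estimate of |θ| at ε = 0.1
```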
We inspect estimates of the posterior mean $\mathbb{E}_{\pi_\epsilon}[f(\Theta)]$ for $f(\theta) = \theta$ and $f(\theta) = |\theta|$. Figure 1 (left) shows the estimates with their confidence intervals based on a single realisation of abc-mcmc(3). Figure 1 (right) shows box plots of the estimates calculated from each abc-mcmc($\delta$), with $\delta$ indicated by colour; the rightmost box plot (blue) corresponds to abc-mcmc(3), the second from the right (red) to abc-mcmc(2.275), etc. For $\epsilon = 0.1$, the post-corrected estimates from abc-mcmc(0.825) and abc-mcmc(1.55) appear slightly more accurate than direct abc-mcmc(0.1).

Table 1 shows the frequencies at which the 95% confidence intervals contained the ground truth. For $f(\theta) = \theta$, $\mathbb{E}_{\pi_\epsilon}[f(\Theta)]$ is known to be zero for all $\epsilon$, and for $f(\theta) = |\theta|$ the overall mean of all the calculated estimates is used as the ground truth for $\mathbb{E}_{\pi_\epsilon}[f(\Theta)]$. The frequencies appear close to ideal with the post-correction approach, being slightly pessimistic in case of the simple cut-off, as anticipated by the theoretical considerations (cf. Theorem 6 and the discussion following it).

Figure 2 shows the progress of the tolerance adaptations during the burn-in, and a histogram of the mean acceptance rates of the chains after burn-in. The lines on the left show the median, and the shaded regions indicate the 50%, 75%, 95% and 99% quantiles. The figure suggests concentration, but reveals that the adaptation has not fully converged yet. This is also visible in the mean acceptance rate over all realisations, which is 0.17 for the simple cut-off and 0.12 for the Gaussian cut-off (see Figure 7 in the Supplement). Table 2 shows root mean square errors for target tolerance $\epsilon = 0.1$, with both abc-mcmc($\delta$) with $\delta$ fixed as above, and for the tolerance adaptive algorithm. Here, only the adaptive chains with a final tolerance of at least 0.1 are included, for both $\phi_{\mathrm{simple}}$ and $\phi_{\mathrm{Gauss}}$. Tolerance adaptation (started from the prior distribution) appears to be competitive with 'optimally' tuned fixed-tolerance abc-mcmc($\delta$).
Table 1. Frequencies of the 95% confidence intervals, from abc-mcmc($\delta$) to tolerances $\epsilon$, containing the ground truth in the Gaussian model, for $f(x) = x$ and $f(x) = |x|$ under $\phi_{\mathrm{simple}}$ and $\phi_{\mathrm{Gauss}}$, with mean acceptance rates.
Figure 2. Progress of tolerance adaptation (left) and histogram of acceptance rates (right) in the Gaussian model experiment with the simple cut-off.
Table 2. Root mean square errors from abc-mcmc($\delta$) for tolerance $\epsilon = 0.1$ in the Gaussian model, for $f(x) = x$ and $f(x) = |x|$ under $\phi_{\mathrm{simple}}$ and $\phi_{\mathrm{Gauss}}$, with fixed tolerances $\delta$ and with the adaptive tolerance.

6.2. Lotka-Volterra model.
Our second experiment is a Lotka-Volterra model suggested by Boys et al. (2008), which was considered in the approximate Bayesian computation context by Fearnhead and Prangle (2012). The model is a Markov process $(X_t, Y_t)_{t \ge 0}$ of counts, corresponding to a reaction network $X \to 2X$ with rate $\theta_1$, $X + Y \to 2Y$ with rate $\theta_2$ and $Y \to \emptyset$ with rate $\theta_3$. The reaction log-rates $(\log\theta_1, \log\theta_2, \log\theta_3)^T$ are the parameters, which we equip with a uniform prior on the log-scale. The data is a simulated trajectory from the model with $\theta = (0.5, 0.0025, 0.3)^T$ until time 40. The inference is based on the Euclidean distances of five-dimensional summary statistics of the process observed every 5 time units ($\tilde X_k = X_{5k}$ and $\tilde Y_k = Y_{5k}$). The summary statistics are the sample autocorrelation of $(\tilde X_k)$ at lag 2 multiplied by 100, and the 10% and 90% quantiles of $(\tilde X_k)$ and $(\tilde Y_k)$.

We first run comparisons similar to Section 6.1, but now with 1,000 independent abc-mcmc($\delta$) chains with the simple cut-off. We investigate the effect of post-correction, with 20,000 samples, including 10,000 burn-in, for each chain. All chains were started from near the posterior mode. Figure 3 shows similar comparisons as in Section 6.1, and Figure 4 shows results for the regression correction with the Epanechnikov cut-off $\phi_{\mathrm{Epa}}(t) = \max\{0, 1 - t^2\}$ (Beaumont et al., 2002). The results suggest that post-correction might provide slightly more accurate estimators, particularly with smaller tolerances. There is also some bias in abc-mcmc($\delta$) with smaller $\delta$, when compared to the ground truth calculated from an abc-mcmc($\delta$) chain of ten million iterations. Table 3 shows coverages of the confidence intervals.

Figure 3. Lotka-Volterra model with the simple cut-off. Estimates from a single run of abc-mcmc(200) (left) and estimates from 1,000 replications of abc-mcmc($\delta$) with $\delta \in \{80, 110, 140, 170, 200\}$ indicated by colour.

Figure 4. Lotka-Volterra model with the Epanechnikov cut-off and regression correction. Estimates from a single run of abc-mcmc(200) (left) and estimates from 1,000 replications of abc-mcmc($\delta$) with $\delta \in \{80, 110, 140, 170, 200\}$ indicated by colour.
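The pseudo-data for this model can be simulated exactly with the Gillespie algorithm; a minimal Julia sketch of the reaction network with the parameter values and observation grid of the text (the initial state is an illustrative assumption, as it is not specified here; the five summary statistics described above are then computed from the returned observations):

```julia
using Random

# Gillespie simulation of the Lotka-Volterra network of this section:
# X → 2X with hazard θ1·X, X + Y → 2Y with hazard θ2·X·Y, Y → ∅ with
# hazard θ3·Y. The text specifies θ = (0.5, 0.0025, 0.3) and horizon
# t_end = 40, with the state recorded every Δ = 5 time units; the initial
# state (x0, y0) is an illustrative assumption.
function simulate_lv(rng, θ; x0 = 100, y0 = 100, t_end = 40.0, Δ = 5.0)
    x, y, t = x0, y0, 0.0
    obs = Tuple{Int,Int}[]                  # (X̃_k, Ỹ_k), k = 1, 2, ...
    next = Δ
    while next <= t_end
        r1, r2, r3 = θ[1] * x, θ[2] * x * y, θ[3] * y
        λ = r1 + r2 + r3
        t += λ > 0 ? randexp(rng) / λ : Inf    # time of the next reaction
        while next <= t_end && next < t        # record states passed over
            push!(obs, (x, y))
            next += Δ
        end
        t > t_end && break
        u = rand(rng) * λ
        if u < r1
            x += 1                             # prey birth
        elseif u < r1 + r2
            x -= 1; y += 1                     # predation
        else
            y -= 1                             # predator death
        end
    end
    return obs
end
```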
Mean acceptance rates and frequencies of the 95% confidence in-tervals, from abc - mcmc ( δ ) to tolerances (cid:15) , in the Lotka-Volterra model. f ( θ ) = θ f ( θ ) = θ f ( θ ) = θ Acc. δ \ (cid:15)
80 110 140 170 200 80 110 140 170 200 80 110 140 170 200 rate φ s i m p l e
80 0.8 0.73 0.74 0.05110 0.97 0.93 0.94 0.89 0.94 0.9 0.07140 0.99 0.97 0.93 0.98 0.96 0.92 0.98 0.96 0.94 0.1170 0.99 0.98 0.96 0.93 0.98 0.97 0.96 0.93 0.99 0.98 0.96 0.95 0.14200 1.0 0.99 0.98 0.97 0.94 0.99 0.99 0.98 0.97 0.92 0.99 0.98 0.98 0.96 0.94 0.17 r e g r . φ E p a
80 0.75 0.76 0.68 0.05110 0.92 0.92 0.93 0.94 0.87 0.91 0.07140 0.93 0.94 0.94 0.94 0.96 0.97 0.9 0.92 0.94 0.1170 0.93 0.95 0.95 0.95 0.96 0.97 0.97 0.98 0.92 0.94 0.94 0.95 0.14200 0.96 0.96 0.96 0.96 0.96 0.98 0.98 0.98 0.98 0.98 0.95 0.96 0.95 0.96 0.96 0.17 k (burn-in iteration) l og d k acceptance rate Figure 5.
Figure 5. Progress of tolerance adaptation (left) and histogram of acceptance rates (right) in the Lotka-Volterra experiment.
Table 4. Root mean square errors of estimators from abc-mcmc($\delta$) for tolerance $\epsilon = 80$, with fixed tolerance and with adaptive tolerance, in the Lotka-Volterra model (each row is scaled by a constant power of ten).

           Post-correction, simple cut-off        Regression, Epanechnikov cut-off
           Fixed tolerance              Adapt     Fixed tolerance              Adapt
    δ      80    110   140   170   200  (122.6)   80    110   140   170   200  (122.6)
    θ1    2.37  1.81  1.75  1.83  1.93   1.8      3.1   2.74  3.02  3.09  3.19   2.57
    θ2    1.32  0.99  0.93  0.96  1.06   1.04     1.52  1.39  1.54  1.61  1.63   1.28
    θ3    2.94  2.26  2.11  2.14  2.37   2.34     2.77  2.53  2.76  2.85  2.91   2.34

In addition, we experiment with the tolerance adaptation, using also 20,000 samples out of which 10,000 are burn-in. Figure 5 shows the progress of the log-tolerance during the burn-in, and a histogram of the realised mean acceptance rates during the estimation phase. The realised acceptance rates are concentrated around the mean 0.10. Table 4 shows root mean square errors of the estimators from abc-mcmc($\delta$) for $\epsilon = 80$, for fixed tolerances and with tolerance adaptation. Only the adaptive chains with a final tolerance of at least 80 are included.
In this case, the chains run with the tolerance adaptation led to better results than those run only with the covariance adaptation (and fixed tolerance). This perhaps surprising result may be due to the initial behaviour of the covariance adaptation, which may be unstable when there are many rejections. Different initialisation strategies, for instance following (Haario et al., 2001, Remark 2), might lead to more stable behaviour compared to using the adaptation of Andrieu and Moulines (2006) from the start, as we do. The different step size sequences ($n^{-1}$ and $n^{-2/3}$) could also play a role. We repeated the experiment for the chains with fixed tolerances, but now with covariance adaptation step size $n^{-2/3}$. This led to more accurate estimators for abc-mcmc($\delta$) with higher $\delta$, but worse behaviour with smaller $\delta$. In any case, also here, tolerance adaptation delivered competitive results (see Supplement E).

7. Discussion
We believe that approximate Bayesian computation inference with Markov chain Monte Carlo is a useful approach when the chosen simulation tolerance allows for good mixing. Our confidence intervals for post-processing and automatic tuning of the simulation tolerance may make this approach more appealing in practice.

A related approach by Bortot et al. (2007) makes the tolerance an auxiliary variable with a user-specified prior. This approach avoids explicit tolerance selection, but the inference is based on a pseudo-posterior $\check\pi(\theta, \delta)$ not directly related to $\pi_\delta(\theta)$ in (1). Bortot et al. (2007) also provide tolerance-dependent analysis, showing parameter means and variances with respect to conditional distributions of $\check\pi(\theta, \delta)$ given $\delta \le \epsilon$. We believe that our approach, where the effect of the tolerance on the expectations with respect to $\pi_\epsilon$ can be investigated explicitly, can be more immediate to interpret. Our confidence interval only shows the Monte Carlo uncertainty related to the posterior mean, and we are currently investigating how the overall parameter uncertainty could be summarised in a useful manner.

The convergence rates of approximate Bayesian computation have been investigated by Barber et al. (2015) in terms of cost and bias with respect to the true posterior, and recently by Li and Fearnhead (2018a,b) in the large data limit, the latter in the context of regression. It would be interesting to consider extensions of these results in the Markov chain Monte Carlo context. In fact, Li and Fearnhead (2018a) already suggest that the acceptance rate must be lower bounded, which is in line with our adaptation rule.

Automatic selection of the tolerance has been considered earlier in Ratmann et al. (2007), who propose an algorithm based on tempering and a cooling schedule. Based on our experiments, the tolerance adaptation we present in this paper appears to perform well in practice, and provides reliable results with post-correction. For the adaptation to work efficiently, the Markov chains must be taken relatively long, rendering the approach difficult for the most computationally demanding models.

We conclude with a brief discussion of certain extensions of the suggested post-correction method; more details are given in Supplement D. First, in case of a non-simple cut-off, the rejected samples may be 'recycled' by using the acceptance probability as a weight (Ceperley et al., 1977). Second, the accuracy of the post-corrected estimator could be enhanced with smaller values of $\epsilon$ by performing further independent simulations from $g(\cdot \mid \Theta_k)$ (which may be calculated in parallel). The estimator is rather straightforward, but requires some care because the estimators of the pseudo-likelihood take the value zero. The latter extension, which involves additional simulations as post-processing, is similar to the 'lazy' version of Prangle (2015, 2016) incorporating a randomised stopping rule for simulation, and to the debiased 'exact' approach of Tran and Kohn (2015), which may lead to estimators which get rid of the $\epsilon$-bias entirely.

Acknowledgements
This work was supported by the Academy of Finland (grants 274740, 284513 and 312605). The authors wish to acknowledge CSC, IT Center for Science, Finland, for computational resources, and thank Christophe Andrieu for useful discussions.
References
C. Andrieu and É. Moulines. On the ergodicity properties of some adaptive MCMC algorithms. Ann. Appl. Probab., 16(3):1462–1505, 2006.
C. Andrieu and J. Thoms. A tutorial on adaptive MCMC. Statist. Comput., 18(4):343–373, 2008.
C. Andrieu, É. Moulines, and P. Priouret. Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim., 44(1):283–312, 2005.
C. Andrieu, A. Lee, and M. Vihola. Theoretical and methodological aspects of MCMC computations with noisy likelihoods. In S. A. Sisson, Y. Fan, and M. Beaumont, editors, Handbook of Approximate Bayesian Computation. Chapman & Hall/CRC Press, 2018.
S. Barber, J. Voss, and M. Webster. The rate of convergence for approximate Bayesian computation. Electron. J. Statist., 9(1):80–105, 2015.
M. Beaumont, W. Zhang, and D. Balding. Approximate Bayesian computation in population genetics. Genetics, 162(4):2025–2035, 2002.
J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.
L. Bornn, N. Pillai, A. Smith, and D. Woodard. The use of a single pseudo-sample in approximate Bayesian computation. Statist. Comput., 27(3):583–590, 2017.
P. Bortot, S. Coles, and S. Sisson. Inference for stereological extremes. J. Amer. Statist. Assoc., 102(477):84–92, 2007.
R. J. Boys, D. J. Wilkinson, and T. B. Kirkwood. Bayesian inference for a discretely observed stochastic kinetic model. Stat. Comput., 18(2):125–135, 2008.
D. Burkholder, B. Davis, and R. Gundy. Integral inequalities for convex functions of operators on martingales. In Proc. Sixth Berkeley Symp. Math. Statist. Prob., volume 2, pages 223–240, 1972.
D. Ceperley, G. Chester, and M. Kalos. Monte Carlo simulation of a many-fermion study. Phys. Rev. D, 16(7):3081, 1977.
J.-F. Delmas and B. Jourdain. Does waste recycling really improve the multi-proposal Metropolis–Hastings algorithm? An analysis based on control variates. J. Appl. Probab., 46(4):938–959, 2009.
A. Doucet, M. Pitt, G. Deligiannidis, and R. Kohn. Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika, 102(2):295–313, 2015.
P. Fearnhead and D. Prangle. Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. J. R. Stat. Soc. Ser. B Stat. Methodol., 74(3):419–474, 2012.
J. M. Flegal and G. L. Jones. Batch means and spectral variance estimators in Markov chain Monte Carlo. Ann. Statist., 38(2):1034–1070, 2010.
J. Franks and M. Vihola. Importance sampling correction versus standard averages of reversible MCMCs in terms of the asymptotic variance. Preprint arXiv:1706.09873v3, 2017.
C. J. Geyer. Practical Markov chain Monte Carlo. Statist. Sci., pages 473–483, 1992.
H. Haario, E. Saksman, and J. Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242, 2001.
W. Li and P. Fearnhead. On the asymptotic efficiency of approximate Bayesian computation estimators. Biometrika, 105(2):285–299, 2018a.
W. Li and P. Fearnhead. Convergence of regression-adjusted approximate Bayesian computation. Biometrika, 105(2):301–318, 2018b.
J.-M. Marin, P. Pudlo, C. P. Robert, and R. J. Ryder. Approximate Bayesian computational methods. Statist. Comput., 22(6):1167–1180, 2012.
P. Marjoram, J. Molitor, V. Plagnol, and S. Tavaré. Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA, 100(26):15324–15328, 2003.
S. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, 2nd edition, 2009. ISBN 978-0-521-73182-9.
D. Prangle. Lazier ABC. Preprint arXiv:1501.05144, 2015.
D. Prangle. Lazy ABC. Statist. Comput., 26(1-2):171–185, 2016.
O. Ratmann, O. Jørgensen, T. Hinkley, M. Stumpf, S. Richardson, and C. Wiuf. Using likelihood-free inference to compare evolutionary dynamics of the protein networks of H. pylori and P. falciparum. PLoS Comput. Biol., 3(11):e230, 2007.
L. Raynal, J.-M. Marin, P. Pudlo, M. Ribatet, C. P. Robert, and A. Estoup. ABC random forests for Bayesian parameter inference. Bioinformatics, to appear.
G. Roberts, A. Gelman, and W. Gilks. Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab., 7(1):110–120, 1997.
G. O. Roberts and J. S. Rosenthal. Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov chains. Ann. Appl. Probab., 16(4):2123–2139, 2006.
D. Rudolf and B. Sprungk. On a Metropolis–Hastings importance sampling estimator. Preprint arXiv:1805.07174, 2018.
I. Schuster and I. Klebanov. Markov chain importance sampling — a highly efficient estimator for MCMC. Preprint arXiv:1805.07179, 2018.
C. Sherlock, A. H. Thiery, G. O. Roberts, and J. S. Rosenthal. On the efficiency of pseudo-marginal random walk Metropolis algorithms. Ann. Statist., 43(1):238–275, 2015.
S. Sisson and Y. Fan. ABC samplers. In S. Sisson, Y. Fan, and M. Beaumont, editors, Handbook of Approximate Bayesian Computation. Chapman & Hall/CRC Press, 2018.
A. D. Sokal. Monte Carlo methods in statistical mechanics: Foundations and new algorithms. Lecture notes, 1996.
M. Sunnåker, A. G. Busetto, E. Numminen, J. Corander, M. Foll, and C. Dessimoz. Approximate Bayesian computation. PLoS Comput. Biol., 9(1):e1002803, 2013.
M. Tanaka, A. Francis, F. Luciani, and S. Sisson. Using approximate Bayesian computation to estimate tuberculosis transmission parameters from genotype data. Genetics, 173(3):1511–1520, 2006.
M. N. Tran and R. Kohn. Exact ABC using importance sampling. Preprint arXiv:1509.08076, 2015.
M. Vihola. Robust adaptive Metropolis algorithm with coerced acceptance rate. Statist. Comput., 22(5):997–1008, 2012.
M. Vihola, J. Helske, and J. Franks. Importance sampling type estimators based on approximate marginal MCMC. Preprint arXiv:1609.02541v5, 2016.
D. Wegmann, C. Leuenberger, and L. Excoffier. Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihoods. Genetics, 182(4):1207–1218, 2009.
Appendix
The following algorithm shows that, in case of the simple (post-correction) cut-off, $E_{\delta,\epsilon}(f)$ and $S_{\delta,\epsilon}(f)$ may be calculated simultaneously for all tolerances efficiently:

Algorithm 11.
Suppose $\phi = \phi_{\mathrm{simple}}$ and $(\Theta_k, T_k)_{k=1,\ldots,n}$ is the output of abc-mcmc($\delta$).
(i) Sort $(\Theta_k, T_k)_{k=1,\ldots,n}$ with respect to $T_k$:
• Find indices $I_1, \ldots, I_n$ such that $T_{I_k} \le T_{I_{k+1}}$ for all $k = 1, \ldots, n-1$.
• Denote $(\hat\Theta_k, \hat T_k) \leftarrow (\Theta_{I_k}, T_{I_k})$.
(ii) For all unique values $\epsilon \in \{\hat T_1, \ldots, \hat T_n\}$, let $m_\epsilon = \max\{k \ge 1 : \hat T_k \le \epsilon\}$, and define
\[ E_{\delta,\epsilon}(f) = m_\epsilon^{-1}\sum_{k=1}^{m_\epsilon} f(\hat\Theta_k), \qquad S_{\delta,\epsilon}(f) = m_\epsilon^{-2}\sum_{k=1}^{m_\epsilon}\big[f(\hat\Theta_k) - E_{\delta,\epsilon}(f)\big]^2 \]
(and for $\hat T_k < \epsilon < \hat T_{k+1}$, let $E_{\delta,\epsilon}(f) = E_{\delta,\hat T_k}(f)$ and $S_{\delta,\epsilon}(f) = S_{\delta,\hat T_k}(f)$).

The sorting in Algorithm 11(i) may be performed in $O(n \log n)$ time, and the $E_{\delta,\epsilon}(f)$ and $S_{\delta,\epsilon}(f)$ may all be calculated in $O(n)$ time by forming appropriate cumulative sums.

Proof of Theorem 4. Algorithm 1 is a Metropolis–Hastings algorithm with compound proposal $\tilde q(\theta, y; \theta', y') = q(\theta, \theta')g(y' \mid \theta')$ and with target $\tilde\pi_\epsilon(\theta, y) \propto \mathrm{pr}(\theta)g(y \mid \theta)\phi\big(d(y, y^*)/\epsilon\big)$. The chain $(\Theta_k, Y_k)_{k \ge 1}$ is Harris recurrent, as a full-dimensional Metropolis–Hastings which is $\varphi$-irreducible (Roberts and Rosenthal, 2006). Because $\phi$ is monotone and $\epsilon \le \delta$, we have $\phi\big(d(y, y^*)/\delta\big) \ge \phi\big(d(y, y^*)/\epsilon\big)$, and therefore $\tilde\pi_\epsilon$ is absolutely continuous with respect to $\tilde\pi_\delta$, and $w_{\delta,\epsilon}(y) = c_{\delta,\epsilon}\tilde\pi_\epsilon(\theta, y)/\tilde\pi_\delta(\theta, y)$, where $c_{\delta,\epsilon} > 0$. Let $\xi_k(f) = U_k^{(\delta,\epsilon)}f(\Theta_k)$ and $\xi_k(\mathbf{1}) = U_k^{(\delta,\epsilon)} = w_{\delta,\epsilon}(Y_k)$; then
\[ E_{\delta,\epsilon}^{(n)}(f) = \sum_{k=1}^n\xi_k(f)\Big/\sum_{j=1}^n\xi_j(\mathbf{1}) \to \mathbb{E}_{\tilde\pi_\epsilon}[f(\Theta)] \]
almost surely by Harris recurrence and $\tilde\pi_\epsilon$-invariance (e.g. Vihola et al., 2016). The claim (i) follows because $\pi_\epsilon$ is the marginal density of $\tilde\pi_\epsilon$.

The chain $(\Theta_k, Y_k)_{k \ge 1}$ is reversible, so (ii) follows by (Vihola et al., 2016, Theorem 7(i)), because $m_f^{(2)}(\theta, y) = w_{\delta,\epsilon}^2(y)f^2(\theta)$ satisfies
\[ \mathbb{E}_{\tilde\pi_\delta}[m_f^{(2)}(\Theta, Y)] = c_{\delta,\epsilon}\mathbb{E}_{\tilde\pi_\epsilon}[w_{\delta,\epsilon}(Y)f^2(\Theta)] \le c_{\delta,\epsilon}\mathbb{E}_{\pi_\epsilon}[f^2(\Theta)] < \infty, \]
and because the asymptotic variance of the function $h_{\delta,\epsilon}$ with respect to $(\Theta_k, Y_k)_{k \ge 1}$ may be expressed as $\mathrm{var}_{\tilde\pi_\delta}\big(h_{\delta,\epsilon}(\Theta, Y)\big)\tau_{\delta,\epsilon}(f)$, so $v_{\delta,\epsilon}(f) = \mathrm{var}_{\tilde\pi_\delta}\big(h_{\delta,\epsilon}(\Theta, Y)\big)/c_{\delta,\epsilon}^2$. The convergence $nS_{\delta,\epsilon}^{(n)}(f) \to v_{\delta,\epsilon}(f)$ follows from (Vihola et al., 2016, Theorem 9). □

Proof of Theorem 6. The invariant distribution of abc-mcmc($\delta$) may be written as $\tilde\pi_\delta(\theta, y) = \pi_\delta(\theta)\bar g_\delta(y \mid \theta)$, where $\bar g_\delta(y \mid \theta) = g(y \mid \theta)\mathbf{1}\big(d(y, y^*) \le \delta\big)/L_\delta(\theta)$, and note that $\int \bar g_\delta(y \mid \theta)w_{\delta,\epsilon}^p(y)\,\mathrm{d}y = \bar w_{\delta,\epsilon}(\theta)$ for $p \in \{1, 2\}$. Consequently, $\tilde\pi_\delta(h_{\delta,\epsilon}) = \pi_\delta(f_{\delta,\epsilon})$ and $\tilde\pi_\delta(h_{\delta,\epsilon}^2) = \pi_\delta(f^2\bar w_{\delta,\epsilon})$, so
\[ \mathrm{var}_{\tilde\pi_\delta}(h_{\delta,\epsilon}) = \mathrm{var}_{\pi_\delta}(f_{\delta,\epsilon}) + \pi_\delta\big(\bar w_{\delta,\epsilon}(1 - \bar w_{\delta,\epsilon})f^2\big). \]
Hereafter, let $a_{\delta,\epsilon} = \big(\mathrm{var}_{\tilde\pi_\delta}(h_{\delta,\epsilon})\big)^{-1/2}$ and denote $\tilde h_{\delta,\epsilon} = a_{\delta,\epsilon}h_{\delta,\epsilon}$ and $\tilde f_{\delta,\epsilon} = a_{\delta,\epsilon}f_{\delta,\epsilon}$.
Clearly, $\mathrm{var}_{\tilde\pi_\delta}(\tilde h_{\delta,\epsilon}) = 1$ and $\rho_k^{(\delta,\epsilon)} = e_k^{(\delta,\epsilon)} - \big(\pi_\delta(\tilde f_{\delta,\epsilon})\big)^2$, where
\[ e_k^{(\delta,\epsilon)} = \mathbb{E}\big[\tilde h_{\delta,\epsilon}(\Theta_0^{(s)}, Y_0^{(s)})\,\tilde h_{\delta,\epsilon}(\Theta_k^{(s)}, Y_k^{(s)})\big]. \]
Note that with $\phi = \phi_{\mathrm{simple}}$, the acceptance ratio is $\alpha_\delta(\theta, y; \hat\theta, \hat y) = \dot\alpha(\theta, \hat\theta)\mathbf{1}\big(d(\hat y, y^*) \le \delta\big)$, where $\dot\alpha(\theta, \hat\theta) = \min\big\{1,\ \mathrm{pr}(\hat\theta)q(\hat\theta, \theta)\big/\big(\mathrm{pr}(\theta)q(\theta, \hat\theta)\big)\big\}$, which is independent of $y$, so $(\Theta_k^{(s)})$ is marginally a Metropolis–Hastings type chain, with proposal $q$ and acceptance probability $\dot\alpha(\theta, \hat\theta)L_\delta(\hat\theta)$, and
\[ \mathbb{E}\big[\tilde h_{\delta,\epsilon}(\Theta_1^{(s)}, Y_1^{(s)}) \mid (\Theta_0^{(s)}, Y_0^{(s)}) = (\theta, y)\big] - r_\delta(\theta)\tilde h_{\delta,\epsilon}(\theta, y) = a_{\delta,\epsilon}\int q(\theta, \hat\theta)\dot\alpha(\theta, \hat\theta)g(\hat y \mid \hat\theta)w_{\delta,\epsilon}(\hat y)f(\hat\theta)\,\mathrm{d}\hat\theta\,\mathrm{d}\hat y = \int q(\theta, \hat\theta)\dot\alpha(\theta, \hat\theta)L_\delta(\hat\theta)\tilde f_{\delta,\epsilon}(\hat\theta)\,\mathrm{d}\hat\theta. \]
Using this iteratively, we obtain
\[ e_k^{(\delta,\epsilon)} = \mathbb{E}\big[\tilde f_{\delta,\epsilon}(\Theta_0^{(s)})\tilde f_{\delta,\epsilon}(\Theta_k^{(s)})\big] + \int \tilde\pi_\delta(\theta, y)\tilde h_{\delta,\epsilon}(\theta, y)\big[\tilde h_{\delta,\epsilon}(\theta, y) - \tilde f_{\delta,\epsilon}(\theta)\big]r_\delta^k(\theta)\,\mathrm{d}\theta\,\mathrm{d}y, \]
and therefore, with $\gamma_k^{(\delta,\epsilon)} = a_{\delta,\epsilon}^2\,\mathrm{cov}\big(f_{\delta,\epsilon}(\Theta_0^{(s)}), f_{\delta,\epsilon}(\Theta_k^{(s)})\big)$,
\[ \sum_{k \ge 1}\rho_k^{(\delta,\epsilon)} = \sum_{k \ge 1}\gamma_k^{(\delta,\epsilon)} + a_{\delta,\epsilon}^2\int \pi_\delta(\theta)\bar w_{\delta,\epsilon}(\theta)\big(1 - \bar w_{\delta,\epsilon}(\theta)\big)r_\delta(\theta)\big(1 - r_\delta(\theta)\big)^{-1}f^2(\theta)\,\mathrm{d}\theta. \]
We conclude by noticing that $2\sum_{k \ge 1}\gamma_k^{(\delta,\epsilon)} = a_{\delta,\epsilon}^2\,\mathrm{var}_{\pi_\delta}(f_{\delta,\epsilon})\big(\check\tau_{\delta,\epsilon}(f) - 1\big)$. □

Supplement A. Convergence of the tolerance adaptive ABC-MCMC under generalised conditions
This section details a convergence theorem, under weaker assumptions than those of Theorem 10, for the tolerance adaptation (Algorithm 8) of Section 4.

For convenience, we denote the distance distribution here as $T \sim Q_\theta(\cdot)$, where $T = d(Y, y^*)$ for $Y \sim g(\cdot \mid \theta)$. With this notation, and re-indexing $\Theta_k' = \tilde\Theta_{k+1}$, we may rewrite Algorithm 8 as follows:

Algorithm 12.
Suppose $\Theta_0 \in \mathsf{T}$ is a starting value with $\mathrm{pr}(\Theta_0) > 0$. Initialise $\delta_0 = T_0 \sim Q_{\Theta_0}(\cdot)$. For $k = 0, \ldots, n_b - 1$, iterate:
(i) Draw $\Theta_k' \sim q(\Theta_k, \cdot)$ and $T_k' \sim Q_{\Theta_k'}(\cdot)$.
(ii) Accept, by setting $(\Theta_{k+1}, T_{k+1}) \leftarrow (\Theta_k', T_k')$, with probability
\[ (2)\qquad \alpha'_{\delta_k}(\Theta_k, T_k; \Theta_k', T_k') = \min\bigg\{1,\ \frac{\mathrm{pr}(\Theta_k')q(\Theta_k', \Theta_k)\phi(T_k'/\delta_k)}{\mathrm{pr}(\Theta_k)q(\Theta_k, \Theta_k')\phi(T_k/\delta_k)}\bigg\} \]
and otherwise reject, by setting $(\Theta_{k+1}, T_{k+1}) \leftarrow (\Theta_k, T_k)$.
(iii) $\log\delta_{k+1} \leftarrow \log\delta_k + \gamma_{k+1}\big(\alpha^* - \alpha'_{\delta_k}(\Theta_k, T_k; \Theta_k', T_k')\big)$.

Let us set $\beta = \log\delta$, and consider the proposal-rejection Markov kernel
\[ (3)\qquad \dot P_\beta(\theta, \mathrm{d}\vartheta) = q(\theta, \mathrm{d}\vartheta)\alpha_\beta(\theta, \vartheta) + \bigg(1 - \int q(\theta, \mathrm{d}\vartheta')\alpha_\beta(\theta, \vartheta')\bigg)\mathbf{1}(\theta \in \mathrm{d}\vartheta), \]
where
\[ \alpha_\beta(\theta, \vartheta) = \dot\alpha(\theta, \vartheta)L_\beta(\vartheta), \qquad \dot\alpha(\theta, \vartheta) = \min\bigg\{1,\ \frac{\mathrm{pr}(\vartheta)q(\vartheta, \theta)}{\mathrm{pr}(\theta)q(\theta, \vartheta)}\bigg\}, \qquad L_\beta(\vartheta) = \int Q_\vartheta(\mathrm{d}t)\mathbf{1}\big(t \le e^\beta\big). \]
Then $\dot P_{\beta_k}$ is the transition of the $\theta$-coordinate chain of Algorithm 12 with the simple cut-off at iteration $k$, obtained by disregarding the $t$-coordinate. It is easily seen to be reversible with respect to the posterior probability $\pi_\beta(\theta) \propto \mathrm{pr}(\theta)L_\beta(\theta)$ given in (1), written here in terms of $\beta = \log\delta$ instead of $\delta$.

Assumption 13.
Suppose $\phi = \phi_{\mathrm{simple}}$ and the following hold:
(i) The step sizes $(\gamma_k)_{k \ge 1}$ satisfy $\gamma_k \ge 0$, $\gamma_{k+1} \le \gamma_k$,
\[ \sum_{k \ge 1}\gamma_k = \infty, \quad\text{and}\quad \sum_{k \ge 1}\gamma_k^2\big(|\log\gamma_k| + |\log\gamma_k|^2\big) < \infty. \]
(ii) The domain $\mathsf{T} \subset \mathbb{R}^{n_\theta}$, $n_\theta \ge 1$, is a nonempty open set.
(iii) $\mathrm{pr}(\cdot)$ and $q(\theta, \cdot)$ are uniformly bounded densities on $\mathbb{R}^{n_\theta}$ (i.e. there exists $C > 0$ such that $q(\theta, \vartheta) < C$ and $\mathrm{pr}(\theta) < C$ for all $\theta, \vartheta \in \mathbb{R}^{n_\theta}$), and $\mathrm{pr}(\theta) = 0$ for $\theta \notin \mathsf{T}$.
(iv) $Q_\theta(\mathrm{d}t)$ admits a uniformly bounded density $Q_\theta(t)$.
(v) The values $\{\beta_k\}$ remain in some compact subset $B \subset \mathbb{R}$ almost surely.
(vi) $c_\beta > 0$ for all $\beta \in B$, where $c_\beta = \int \mathrm{pr}(\mathrm{d}\theta)L_\beta(\theta)$.
(vii) There exists $\dot V : \mathsf{T} \to [1, \infty)$ such that the Markov transitions $\dot P_\beta$ are simultaneously $\dot V$-geometrically ergodic: there exist $C > 0$ and $\rho \in (0, 1)$ such that for all $\beta \in B$, $k \ge 1$ and $f : \mathsf{T} \to \mathbb{R}$ with $|f| \le \dot V$, it holds that
\[ |\dot P_\beta^k f(\theta) - \pi_\beta(f)| \le C\dot V(\theta)\rho^k. \]
(viii) With $\mathbb{E}[\cdot] = \mathbb{E}_{\theta,\beta}[\cdot]$ denoting expectation with respect to the law of the marginal chain $(\Theta_k)$ of Algorithm 12 started at $\theta \in \mathsf{T}$, $\beta \in B$, and with $\dot V$ as in Assumption 13(vii), we have
\[ \sup_{\theta,\beta,k}\mathbb{E}\big[\dot V(\Theta_k)\big] < \infty. \]

Theorem 14.
Under Assumption 13, the expected value of the acceptance probability (2), taken with respect to the stationary measure of the chain, converges to $\alpha^*$.

The proof of Theorem 14 can be found in Supplement B. It relies heavily on the simple conditions of (Andrieu et al., 2005, Theorem 2.3), which say that one must essentially show that the noise in the stochastic approximation update is asymptotically controlled.

We remark that there are likely extensions of Assumption 13(v) to the general non-compact adaptation parameter case, based on projections (cf. Andrieu et al., 2005).
Supplement B. Analysis of the tolerance adaptive ABC-MCMC
In this section we aim to prove the generalised convergence (Theorem 14 of Supplement A) of the tolerance adaptation, from which Theorem 10 of Section 4 will follow as a corollary. Throughout, we denote by $C > 0$ a generic constant, whose value may change from one expression to the next.

B.1. Proposal augmentation.
Suppose $\dot L$ is a Markov kernel which can be written as
\[ (4)\qquad \dot L(x, \mathrm{d}y) = q(x, \mathrm{d}y)\alpha(x, y) + \bigg(1 - \int q(x, \mathrm{d}y')\alpha(x, y')\bigg)\mathbf{1}(x \in \mathrm{d}y), \]
where $\alpha(x, y) \in [0, 1]$ is a jointly measurable function and $q(x, \mathrm{d}y)$ is a Markov proposal kernel. With $\breve x = (x, x')$, we define the proposal augmentation to be the Markov kernel
\[ (5)\qquad L(\breve x, \mathrm{d}\breve y) = \alpha(\breve x)\mathbf{1}(x' \in \mathrm{d}y)q(x', \mathrm{d}y') + \big(1 - \alpha(\breve x)\big)\mathbf{1}(x \in \mathrm{d}y)q(x, \mathrm{d}y'). \]
It is easy to see that $L$ need not be reversible even if $\dot L$ is reversible. In this case, however, $L$ does leave a probability measure invariant.

Lemma 15.
Suppose a Markov kernel $\dot L$ of the form given in (4) is $\dot\mu$-reversible. Let $L$ be its proposal augmentation. Then the following statements hold:
(i) $\mu L = \mu$, where $\mu(\mathrm{d}x, \mathrm{d}x') = \dot\mu(\mathrm{d}x)q(x, \mathrm{d}x')$.
(ii) If $\dot L$ is $\dot V$-geometrically ergodic with constants $(\dot C, \dot\rho)$, then $L$ is $V$-geometrically ergodic with constants $(C, \rho)$, where $C = 2\dot C/\dot\rho$, $\rho = \dot\rho$, and $V(\breve x) = \tfrac{1}{2}\big(\dot V(x) + \dot V(x')\big)$.

Lemma 15 extends (Schuster and Klebanov, 2018, Theorem 4), who consider the case where $\dot P$ is a Metropolis–Hastings chain (see also Delmas and Jourdain, 2009; Rudolf and Sprungk, 2018). The extension to the more general class of reversible proposal-rejection chains allows one to consider, for example, jump and delayed acceptance chains, as well as the marginal chain (3) of Supplement A, which will be important for our analysis of the tolerance adaptation.
Proof of Lemma 15. Part (i) follows by a direct calculation. We now consider part (ii). For $f : \mathsf{X}^2 \to \mathbb{R}$, we shall use the notation $\dot f(x) = \int f(\breve x)q(x, \mathrm{d}x')$. We have
\[ \int q(x, \mathrm{d}x')L\big((x, x'); \mathrm{d}\breve y\big)f(\breve y) = \int q(x, \mathrm{d}x')\alpha(\breve x)\dot f(x') + \int q(x, \mathrm{d}x')\big(1 - \alpha(\breve x)\big)\dot f(x) = \dot L\dot f(x), \]
and then inductively, for $k \ge 2$,
\[ \int q(x, \mathrm{d}x')L^k\big((x, x'); \mathrm{d}\breve y\big)f(\breve y) = \int q(x, \mathrm{d}x')\alpha(\breve x)q(x', \mathrm{d}y')L^{k-1}\big((x', y'); \mathrm{d}\breve z\big)f(\breve z) + \int q(x, \mathrm{d}x')\big(1 - \alpha(\breve x)\big)q(x, \mathrm{d}y')L^{k-1}\big((x, y'); \mathrm{d}\breve z\big)f(\breve z) = \int q(x, \mathrm{d}x')\alpha(\breve x)\dot L^{k-1}\dot f(x') + \int q(x, \mathrm{d}x')\big(1 - \alpha(\breve x)\big)\dot L^{k-1}\dot f(x) = \dot L^k\dot f(x). \]
We then have the equality
\[ L^k f(\breve x) = \alpha(\breve x)\int q(x', \mathrm{d}y')L^{k-1}\big((x', y'); \mathrm{d}\breve z\big)f(\breve z) + \big(1 - \alpha(\breve x)\big)\int q(x, \mathrm{d}y')L^{k-1}\big((x, y'); \mathrm{d}\breve z\big)f(\breve z) = \alpha(\breve x)\dot L^{k-1}\dot f(x') + \big(1 - \alpha(\breve x)\big)\dot L^{k-1}\dot f(x). \]
For $|f| \le V$, note that $|\dot f| \le \dot V$ since $\|q\|_\infty \le 1$, and we conclude (ii) from
\[ |L^k f(\breve x) - \mu(f)| \le \alpha(\breve x)|\dot L^{k-1}\dot f(x') - \dot\mu(\dot f)| + \big(1 - \alpha(\breve x)\big)|\dot L^{k-1}\dot f(x) - \dot\mu(\dot f)| \le \dot C\dot\rho^{k-1}\big(\dot V(x') + \dot V(x)\big) = \frac{2\dot C}{\dot\rho}\dot\rho^k V(\breve x). \quad\Box \]

Consider now the $\theta$-coordinate chain $\dot P_\beta$ presented in (3) of Supplement A. This transition $\dot P_\beta$ is clearly a reversible proposal-rejection chain of the form (4). We now consider $P_\beta$, its proposal augmentation. This is the chain $\breve\Theta_k = (\Theta_k, \Theta_k') \in \mathsf{T}^2$, formed by disregarding the $t$-parameter as with $\dot P_\beta$ before, but now augmenting by the proposal $\theta' \sim q(\theta, \cdot)$. Its transitions, from $\breve\theta = \breve\Theta_k$ to $\breve\vartheta = \breve\Theta_{k+1} = (\vartheta, \vartheta')$, are given by the kernel
\[ P_\beta(\breve\theta, \mathrm{d}\breve\vartheta) = \alpha_\beta(\breve\theta)\mathbf{1}(\theta' \in \mathrm{d}\vartheta)q(\theta', \mathrm{d}\vartheta') + \big(1 - \alpha_\beta(\breve\theta)\big)\mathbf{1}(\theta \in \mathrm{d}\vartheta)q(\theta, \mathrm{d}\vartheta'). \]
By Lemma 15(i), $P_\beta$ leaves $\pi'_\beta = \pi'_{\beta,u}/c_\beta$ invariant, where $\pi'_{\beta,u}(\mathrm{d}\breve\theta) = \mathrm{pr}(\mathrm{d}\theta)L_\beta(\theta)q(\theta, \mathrm{d}\theta')$ and $c_\beta = \int \mathrm{pr}(\mathrm{d}\theta)L_\beta(\theta)$.
B.2. Monotonicity properties.

The following result establishes monotonicity of the mean-field acceptance rate with increasing tolerance.
Lemma 16.
Assume Assumptions 13(iii) and 13(iv) hold. Then the mapping $\beta \mapsto \pi'_\beta(\alpha_\beta)$ is monotone non-decreasing.

Proof. Since $\mathrm{pr}(\theta)$ and $q(\theta, \theta')$ are uniformly bounded (Assumption 13(iii)), and $L_\beta(\theta) \le 1$, differentiation under the integral sign is possible in the following by the dominated convergence theorem. By the quotient rule,
\[ (6)\qquad \frac{\mathrm{d}}{\mathrm{d}\beta}\big(\pi'_\beta(\alpha_\beta)\big) = \frac{1}{c_\beta^2}\bigg(c_\beta\frac{\mathrm{d}}{\mathrm{d}\beta}\big(\pi'_{\beta,u}(\alpha_\beta)\big) - \pi'_{\beta,u}(\alpha_\beta)\frac{\mathrm{d}c_\beta}{\mathrm{d}\beta}\bigg). \]
By reversibility of the Metropolis–Hastings targeting $\mathrm{pr}(\theta)$ with proposal $q$,
\[ \frac{\mathrm{d}}{\mathrm{d}\beta}\big(\pi'_{\beta,u}(\alpha_\beta)\big) = 2e^\beta\int \mathrm{pr}(\mathrm{d}\theta)L_\beta(\theta)q(\theta, \mathrm{d}\theta')\dot\alpha(\theta, \theta')Q_{\theta'}(e^\beta). \]
With
\[ f(\theta') = 2Q_{\theta'}(e^\beta)\int \mathrm{pr}(\mathrm{d}\tilde\theta)L_\beta(\tilde\theta) - L_\beta(\theta')\int \mathrm{pr}(\mathrm{d}\tilde\theta)Q_{\tilde\theta}(e^\beta), \]
we can then write (6) as
\[ \frac{\mathrm{d}}{\mathrm{d}\beta}\big(\pi'_\beta(\alpha_\beta)\big) = \frac{e^\beta}{c_\beta^2}\int \mathrm{pr}(\mathrm{d}\theta)L_\beta(\theta)q(\theta, \mathrm{d}\theta')\dot\alpha(\theta, \theta')f(\theta'). \]
By the same reversibility property as before, we can write this again as
\[ \frac{\mathrm{d}}{\mathrm{d}\beta}\big(\pi'_\beta(\alpha_\beta)\big) = \frac{e^\beta}{c_\beta^2}\int f(\theta)\mathrm{pr}(\mathrm{d}\theta)\int q(\theta, \mathrm{d}\theta')L_\beta(\theta')\dot\alpha(\theta, \theta'). \]
We then conclude, since
\[ \int f(\theta)\mathrm{pr}(\mathrm{d}\theta) = \int Q_\theta(e^\beta)\mathrm{pr}(\mathrm{d}\theta)\int L_\beta(\tilde\theta)\mathrm{pr}(\mathrm{d}\tilde\theta) \ge 0. \quad\Box \]

Lemma 17.
The following statements hold:
(i) The function $\beta \mapsto c_\beta$ is monotone non-decreasing on $\mathbb{R}$.
(ii) If Assumptions 13(v) and 13(vi) hold, then there exist $C_{\min} > 0$ and $C_{\max} > 0$ such that $C_{\min} \le c_\beta \le C_{\max}$ for all $\beta \in B$.

Proof. Part (i) follows, for $\beta \le \beta'$, from
\[ c_\beta = \int \mathrm{pr}(\mathrm{d}\theta)Q_\theta([0, e^\beta]) \le \int \mathrm{pr}(\mathrm{d}\theta)Q_\theta([0, e^{\beta'}]) = c_{\beta'}. \]
Consider part (ii). By part (i) and compactness of $B$ (Assumption 13(v)), we can set $C_{\min} = c_{\min(B)}$ and $C_{\max} = c_{\max(B)}$, both of which are positive by Assumption 13(vi). □
B.3. Stochastic approximation framework.

To obtain a form common in the stochastic approximation literature (cf. Andrieu et al., 2005), we write the update in Algorithm 12 as
\[ \beta_{k+1} = \beta_k + \gamma_{k+1}H_{\beta_k}(\breve\Theta_k, T_k') = \beta_k + \gamma_{k+1}h(\beta_k) + \gamma_{k+1}\zeta_{k+1}, \]
where
\[ H_\beta(\breve\theta, t') = \alpha^* - \alpha'_\beta(\breve\theta, t'), \qquad \alpha'_\beta(\breve\theta, t') = \min\bigg\{1,\ \frac{\mathrm{pr}(\theta')q(\theta', \theta)}{\mathrm{pr}(\theta)q(\theta, \theta')}\bigg\}\mathbf{1}\big(t' \le e^\beta\big), \]
\[ h(\beta) = \pi'_\beta(\widehat H_\beta) = \int \pi_\beta(\mathrm{d}\theta)q(\theta, \mathrm{d}\theta')Q_{\theta'}(\mathrm{d}t')H_\beta(\theta, \theta', t'), \]
with the noise sequence $\zeta_{k+1} = H_{\beta_k}(\breve\Theta_k, T_k') - h(\beta_k)$, and the conditional expectation $\widehat H_\beta(\breve\theta) = \mathbb{E}[H_\beta(\breve\Theta, T') \mid \breve\Theta = \breve\theta]$, where $T' \sim Q_{\theta'}(\cdot)$. We also set, for convenience, $\bar H_\beta(\breve\theta) = \widehat H_\beta(\breve\theta) - \pi'_\beta(\widehat H_\beta)$.

Lemma 18.
Lemma 18. Suppose Assumption 13(vii) holds. Then the following statements hold:
(i) The proposal augmented kernels $(P_\beta)_{\beta \in B}$ are simultaneously $V$-geometrically ergodic, where $V(\theta, \theta') = \dot V(\theta) + \dot V(\theta')$, with $\dot V$ as in Assumption 13(vii).
(ii) There exists $C > 0$, such that for all $\beta \in B$, the formal solution $g_\beta = \sum_{k \ge 0} P^k_\beta \bar H_\beta$ to the Poisson equation $g_\beta - P_\beta g_\beta = \bar H_\beta$ satisfies $|g_\beta(\breve\theta)| \le C V(\breve\theta)$.
Proof. (i) follows directly from the explicit parametrisation for $(C_\beta, \rho_\beta)$ given in Lemma 15(ii). Part (ii) follows from part (i) and the bound, since $|\bar H_\beta| \le 1 \le V$,
\[
|g_\beta(\breve\theta)| \le C_\beta \sum_{k \ge 0} \rho_\beta^k\, V(\breve\theta) \le \Big(\frac{C_\beta}{1 - \rho_\beta}\Big) V(\breve\theta). \qquad\Box
\]

B.4. Contractions.
We define for $V : \mathsf{T}^2 \to [1, \infty)$ and $g : \mathsf{T}^2 \to \mathbb{R}$ the $V$-norm $\|g\|_V = \sup_{\theta \in \mathsf{T}^2} |g(\theta)|/V(\theta)$. We define for a bounded operator $A$ on a Banach space of bounded functions $f$, the operator norm $\|A\|_\infty = \sup_f \|Af\|_\infty/\|f\|_\infty$.

Lemma 19.
Suppose Assumptions 13(iv), 13(v) and 13(vi) hold. The following hold:
(i) There exist $C > 0$ and $C_B^+ > 0$ such that for all $\beta_1, \beta_2 \in B$ and all bounded $g : \mathsf{T}^2 \to \mathbb{R}$,
\[
\|(P_{\beta_1} - P_{\beta_2}) g\|_\infty \le C \|g\|_\infty\, |e^{\beta_1} - e^{\beta_2}| \le C_B^+ \|g\|_\infty\, |\beta_1 - \beta_2|.
\]
(ii) There exist $C > 0$ and $C_B > 0$ such that for all $\beta_1, \beta_2 \in B$,
\[
\|\bar H_{\beta_1} - \bar H_{\beta_2}\|_\infty \le C\, |e^{\beta_1} - e^{\beta_2}| \le C_B\, |\beta_1 - \beta_2|.
\]
(iii) There exist $C > 0$ and $C_B > 0$ such that for all $\beta_1, \beta_2 \in B$ and all bounded $g : \mathsf{T}^2 \to \mathbb{R}$,
\[
|\pi'_{\beta_1}(g) - \pi'_{\beta_2}(g)| \le C \|g\|_\infty\, |e^{\beta_1} - e^{\beta_2}| \le C_B \|g\|_\infty\, |\beta_1 - \beta_2|.
\]

Proof.
By Assumption 13(iv), we have for all $\beta_1, \beta_2 \in B$,
\[
|L_{\beta_1}(\theta) - L_{\beta_2}(\theta)| = \int_{e^{\beta_1 \wedge \beta_2}}^{e^{\beta_1 \vee \beta_2}} Q_\theta(\mathrm{d}t) \le C\, |e^{\beta_1} - e^{\beta_2}|.
\]
We obtain the first inequality for part (i), then, from the bound
\[
|(P_{\beta_1} - P_{\beta_2}) g(\breve\theta)| = \big|\big(\alpha_{\beta_1}(\breve\theta) - \alpha_{\beta_2}(\breve\theta)\big)\,\dot g(\theta') + \big(\alpha_{\beta_2}(\breve\theta) - \alpha_{\beta_1}(\breve\theta)\big)\,\dot g(\theta)\big|
\le \dot\alpha(\breve\theta)\, |L_{\beta_1}(\theta') - L_{\beta_2}(\theta')| \int \Big( q(\theta', \mathrm{d}\vartheta')\, |g(\theta', \vartheta')| + q(\theta, \mathrm{d}\vartheta')\, |g(\theta, \vartheta')| \Big).
\]
The second, Lipschitz bound follows by a mean value theorem argument for the function $\beta \mapsto e^\beta$, namely
\[
|e^{\beta_1} - e^{\beta_2}| \le \sup_{\beta \in B} e^\beta\, |\beta_1 - \beta_2|,
\]
where the supremum is finite by compactness of $B$ (Assumption 13(v)).

We now consider part (ii). We have
\[
\|\bar H_{\beta_1} - \bar H_{\beta_2}\|_\infty \le \|\widehat H_{\beta_1} - \widehat H_{\beta_2}\|_\infty + |h(\beta_1) - h(\beta_2)|.
\]
For the first term, by Assumption 13(iv), as in (i), we have
\[
\|\widehat H_{\beta_1} - \widehat H_{\beta_2}\|_\infty \le \sup_{\breve\theta} \dot\alpha(\breve\theta) \int_{e^{\beta_1 \wedge \beta_2}}^{e^{\beta_1 \vee \beta_2}} Q_{\theta'}(\mathrm{d}t) \le C\, |e^{\beta_1} - e^{\beta_2}|.
\]
For the other term, we have
\[
|h(\beta_1) - h(\beta_2)| \le \frac{c_{\beta_2}\, |\pi'_{\beta_1,u}(\alpha_{\beta_1}) - \pi'_{\beta_2,u}(\alpha_{\beta_2})| + \pi'_{\beta_1,u}(\alpha_{\beta_1})\, |c_{\beta_1} - c_{\beta_2}|}{c_{\beta_1} c_{\beta_2}}.
\]
By the triangle inequality, we have
\[
|\pi'_{\beta_1,u}(\alpha_{\beta_1}) - \pi'_{\beta_2,u}(\alpha_{\beta_2})| \le |\pi'_{\beta_1,u}(\alpha_{\beta_1}) - \pi'_{\beta_1,u}(\alpha_{\beta_2})| + |\pi'_{\beta_1,u}(\alpha_{\beta_2}) - \pi'_{\beta_2,u}(\alpha_{\beta_2})|.
\]
Each term above is bounded by $C |e^{\beta_1} - e^{\beta_2}|$, as is $|c_{\beta_1} - c_{\beta_2}|$. Moreover, by Lemma 17(ii), we have $c_\beta \ge C_{\min} > 0$ for all $\beta \in B$, and the first inequality in part (ii) follows. The second inequality follows by a mean value theorem argument as before. Proof of (iii) is simpler. $\Box$

B.5. Control of noise.
We state a simple standard fact used repeatedly in the proof of Lemma 21 below, our key lemma.
Lemma 20.
Suppose $(X_j)_{j \ge 1}$ are random variables with $X_j \ge 0$, $X_{j+1} \le X_j$, and $\lim_{j \to \infty} \mathbb{E}[X_j] = 0$. Then, almost surely, $\lim_{j \to \infty} X_j = 0$.

Lemma 21.
Suppose Assumption 13 holds. Then, with $T_{j,n} = \sum_{k=j}^n \gamma_k \zeta_k$, we have
\[
\lim_{j \to \infty}\, \sup_{n \ge j} |T_{j,n}| = 0, \qquad \text{almost surely.}
\]

Proof. Similar to (Andrieu et al., 2005, Proof of Prop. 5.2), we write $T_{j,n} = \sum_{i=1}^{8} T^{(i)}_{j,n}$, where
\[
\widehat H_{\beta_{k-1}}(\breve\Theta_{k-1}) = \mathbb{E}\big[H_{\beta_{k-1}}(\breve\Theta_{k-1}, T'_{k-1}) \mid \mathcal{F}'_{k-1}\big],
\]
with $\mathcal{F}'_{k-1} = \sigma(\beta_{k-1}, \Theta_{k-1}, \Theta'_{k-1})$ representing the information obtained through running Algorithm 12 up to and including iteration $k-1$, and
\[
T^{(1)}_{j,n} = \sum_{k=j}^n \gamma_k \big( H_{\beta_{k-1}}(\breve\Theta_{k-1}, T'_{k-1}) - \widehat H_{\beta_{k-1}}(\breve\Theta_{k-1}) \big),
\qquad
T^{(2)}_{j,n} = \sum_{k=j}^n \gamma_k \big( g_{\beta_{k-1}}(\breve\Theta_{k-1}) - P_{\beta_{k-1}} g_{\beta_{k-1}}(\breve\Theta_{k-2}) \big),
\]
\[
T^{(3)}_{j,n} = \gamma_{j-1} P_{\beta_{j-1}} g_{\beta_{j-1}}(\breve\Theta_{j-2}) - \gamma_n P_{\beta_n} g_{\beta_n}(\breve\Theta_{n-1}),
\qquad
T^{(4)}_{j,n} = \sum_{k=j}^n \big( \gamma_k - \gamma_{k-1} \big) P_{\beta_{k-1}} g_{\beta_{k-1}}(\breve\Theta_{k-2}),
\]
\[
T^{(5)}_{j,n} = \sum_{k=j}^n \gamma_k \sum_{i \ge m_k+1} P^i_{\beta_k} \bar H_{\beta_k}(\breve\Theta_{k-1}),
\qquad
T^{(6)}_{j,n} = - \sum_{k=j}^n \gamma_k \sum_{i \ge m_k+1} P^i_{\beta_{k-1}} \bar H_{\beta_{k-1}}(\breve\Theta_{k-1}),
\]
\[
T^{(7)}_{j,n} = \sum_{k=j}^n \gamma_k \sum_{i=1}^{m_k} \big( P^i_{\beta_k} - P^i_{\beta_{k-1}} \big) \bar H_{\beta_k}(\breve\Theta_{k-1}),
\qquad
T^{(8)}_{j,n} = \sum_{k=j}^n \gamma_k \sum_{i=1}^{m_k} P^i_{\beta_{k-1}} \big( \bar H_{\beta_k} - \bar H_{\beta_{k-1}} \big)(\breve\Theta_{k-1}).
\]
Here, $g_\beta$ is the Poisson solution defined in Lemma 18(ii), and $m_k = \lceil |\log \gamma_k| \rceil$. We remind that $\bar H_\beta = \widehat H_\beta - h(\beta)$ from Section B.3.

We now show $\lim_{j\to\infty} \sup_{n \ge j} |T^{(i)}_{j,n}| = 0$ for each of the terms $i \in \{1, \ldots, 8\}$ individually, which implies the result of the lemma.

(1) Since for all $n > j$,
\[
\mathbb{E}\big[ T^{(1)}_{j,n} - T^{(1)}_{j,n-1} \mid \mathcal{F}'_{n-1} \big] = 0,
\]
we have that $(T^{(1)}_{j,n})_{n \ge j}$ is an $\mathcal{F}'_n$-martingale for each $j \ge 1$. By the Burkholder–Davis–Gundy inequality for martingales (cf. Burkholder et al., 1972), we have
\[
\mathbb{E}\Big[\sup_{n \ge j} |T^{(1)}_{j,n}|^2\Big] \le C\, \mathbb{E}\Big[ \sum_{k=j}^\infty \gamma_k^2 \big( H_{\beta_{k-1}}(\breve\Theta_{k-1}, T'_{k-1}) - \widehat H_{\beta_{k-1}}(\breve\Theta_{k-1}) \big)^2 \Big] \le C \sum_{k=j}^\infty \gamma_k^2,
\]
where in the last inequality we have noted that $|H_\beta - \widehat H_\beta| \le 1$. Since $\sum_{k \ge 1} \gamma_k^2 < \infty$, we get that $\lim_{j\to\infty} \mathbb{E}[\sup_{n \ge j} |T^{(1)}_{j,n}|^2] = 0$. Hence, the result follows by Lemma 20.

(2) For $j \ge 2$, we have for $n > j$, $\mathbb{E}[T^{(2)}_{j,n} - T^{(2)}_{j,n-1} \mid \mathcal{F}'_{n-2}] = 0$, so that $(T^{(2)}_{j,n})_{n \ge j}$ is an $\mathcal{F}'_{n-1}$-martingale, for $j \ge 2$. By the Burkholder–Davis–Gundy inequality again,
\[
\mathbb{E}\Big[\sup_{n \ge j} |T^{(2)}_{j,n}|^2\Big] \le C\, \mathbb{E}\Big[ \sum_{k=j}^\infty \gamma_k^2 \big( g_{\beta_{k-1}}(\breve\Theta_{k-1}) - P_{\beta_{k-1}} g_{\beta_{k-1}}(\breve\Theta_{k-2}) \big)^2 \Big].
\]
We then use Lemma 18(ii) and $\|P_\beta\|_\infty \le 1$, to get, after combining terms,
\[
\mathbb{E}\Big[\sup_{n \ge j} |T^{(2)}_{j,n}|^2\Big] \le C \sum_{k=j-1}^\infty \gamma_k^2\, \mathbb{E}\big[ V^2(\breve\Theta_{k-1}) \big] \le C \sum_{k=j-1}^\infty \gamma_k^2,
\]
where we have used Assumption 13(viii) in the last inequality. We then conclude by Lemma 20 as before.

(3) By Lemma 18(ii), the triangle inequality, $\|P_\beta\|_\infty \le 1$, and the dominated convergence theorem, we obtain
\[
\mathbb{E}\Big[\sup_{n \ge j} |T^{(3)}_{j,n}|\Big] \le C \gamma_{j-1}\, \mathbb{E}[V(\breve\Theta_{j-2})] + C \sup_{n \ge j} \gamma_n\, \mathbb{E}[V(\breve\Theta_{n-1})].
\]
We then apply Assumption 13(viii) and Jensen's inequality, and use that the $\gamma_k$ go to zero, since $\sum_k \gamma_k^2 < \infty$, to get that
\[
\lim_{j\to\infty} \mathbb{E}\Big[\sup_{n \ge j} |T^{(3)}_{j,n}|\Big] \le C\, \lim_{j\to\infty} \Big( \gamma_{j-1} + \sup_{n \ge j} \gamma_n \Big) = 0.
\]
We may now conclude by Lemma 20.

(4) By Lemma 18(ii) and $\gamma_k \le \gamma_{k-1}$, we have for $j \ge 2$,
\[
\mathbb{E}\Big[\sup_{n \ge j} |T^{(4)}_{j,n}|\Big] \le C \sup_{n \ge j} \sum_{k=j}^n (\gamma_{k-1} - \gamma_k)\, \mathbb{E}[V(\breve\Theta_{k-2})] \le C \sup_{n \ge j} \sum_{k=j}^n (\gamma_{k-1} - \gamma_k),
\]
where we have used lastly Assumption 13(viii) and Jensen's inequality. Since this is a telescoping sum, we get
\[
\mathbb{E}\Big[\sup_{n \ge j} |T^{(4)}_{j,n}|\Big] \le C \sup_{n \ge j} (\gamma_{j-1} - \gamma_n) \le C \gamma_{j-1}.
\]
We then conclude by Lemma 20, since $\gamma_j \to 0$.

(5) By simultaneous geometric ergodicity (Lemma 18(i)),
\[
|P^i_\beta \bar H_\beta(\breve\theta)| \le C \rho^i V(\breve\theta),
\]
where $C, \rho$ do not depend on $\beta \in B$. Hence,
\[
\mathbb{E}[|T^{(5)}_{j,n}|] \le C \sum_{k=j}^n \gamma_k \sum_{i \ge m_k+1} \rho^i\, \mathbb{E}[V(\breve\Theta_{k-1})] \le C \sum_{k=j}^n \gamma_k\, \rho^{m_k},
\]
where we have used lastly Assumption 13(viii) and Jensen's inequality. Since $m_k$ was defined to be of order $|\log \gamma_k|$, so that $\gamma_k \rho^{m_k} \le C \gamma_k^2$, we have
\[
\mathbb{E}[|T^{(5)}_{j,n}|] \le C \sum_{k=j}^\infty \gamma_k^2 < \infty.
\]
By the dominated convergence theorem, we then have
\[
\mathbb{E}\Big[\sup_{n \ge j} |T^{(5)}_{j,n}|\Big] \le C \sum_{k=j}^\infty \gamma_k^2.
\]
Taking the limit $j \to \infty$, we can then conclude by using Lemma 20.

(6) The proof is essentially the same as for (5).

(7) We write for $i \ge 1$,
\[
P^i_{\beta_k} - P^i_{\beta_{k-1}} = \sum_{l=0}^{i-1} P^{i-l-1}_{\beta_k} \big( P_{\beta_k} - P_{\beta_{k-1}} \big) P^l_{\beta_{k-1}}.
\]
Since $\|P^i_\beta\|_\infty \le 1$ for $i \ge 0$, and $|\bar H_\beta| \le 1$, by Lemma 19(i), we have
\[
\big\| \big( P^i_{\beta_k} - P^i_{\beta_{k-1}} \big) \bar H_{\beta_k} \big\|_\infty \le C \sum_{l=0}^{i-1} \|P^{i-l-1}_{\beta_k}\|_\infty\, |\beta_k - \beta_{k-1}|\, \|P^l_{\beta_{k-1}} \bar H_{\beta_k}\|_\infty \le C\, |\beta_k - \beta_{k-1}|\, i.
\]
Since $|\beta_k - \beta_{k-1}| \le \gamma_k$ from the adaptation step in Algorithm 12, we have
\[
|T^{(7)}_{j,n}| \le C \sum_{k=j}^n \gamma_k \sum_{i=1}^{m_k} i \gamma_k \le C \sum_{k=j}^\infty \gamma_k^2\, m_k (1 + m_k) < \infty.
\]
We then take $\sup_{n \ge j}$ on the left, take the expectation, and conclude by Lemma 20.

(8) Since $\|P^i_\beta\|_\infty \le 1$, by Lemma 19(ii),
\[
\big\| P^i_{\beta_{k-1}} \big( \bar H_{\beta_k} - \bar H_{\beta_{k-1}} \big) \big\|_\infty \le \|P^i_{\beta_{k-1}}\|_\infty\, \|\bar H_{\beta_k} - \bar H_{\beta_{k-1}}\|_\infty \le C\, |\beta_k - \beta_{k-1}|.
\]
Since $|\beta_k - \beta_{k-1}| \le \gamma_k$, we have
\[
\mathbb{E}\Big[\sup_{n \ge j} |T^{(8)}_{j,n}|\Big] \le C \sum_{k=j}^\infty \gamma_k^2\, m_k < \infty.
\]
We then conclude by Lemma 20. $\Box$
B.6. Proofs of convergence theorems.

Proof of Theorem 14.
We define our Lyapunov function $w : \mathbb{R} \to [0, \infty)$ to be the continuously differentiable function $w(\beta) = |e^\beta - e^{\beta_*}|^2$. We also have that $h(\beta) = \pi'_\beta(\widehat H_\beta)$ is continuous, which follows from Lemma 19(iii). One can then check that Assumption 13 and Lemma 21 imply that the assumptions of (Andrieu et al., 2005, Theorem 2.3) hold. The latter result implies $\lim_{k\to\infty} |\beta_k - \beta_*| = 0$, for some $\beta_* \in B$ satisfying $\pi'_{\beta_*}(\alpha_{\beta_*}) = \alpha_*$, as desired. $\Box$

Lemma 22.
Suppose Assumption 9 holds. Then both $(\dot P_\beta)_{\beta \in B}$ and $(P_\beta)_{\beta \in B}$ are simultaneously $1$-geometrically ergodic (i.e. uniformly ergodic).

Proof. We have $\mathrm{pr}(\theta) \le C_{\mathrm{pr}}$ for some $C_{\mathrm{pr}} > 0$, and also $0 < \delta_q \le q(\theta, \vartheta)$, for all $\theta, \vartheta \in \mathsf{T}$. Hence, for $A \subset \mathsf{T}$,
\[
\dot P_\beta(\theta, A) \ge \int \delta_q \min\Big\{1,\ \frac{\mathrm{pr}(\vartheta)}{\mathrm{pr}(\theta)}\Big\}\, L_\beta(\vartheta)\,\mathbf{1}(\vartheta \in A)\, \mathrm{d}\vartheta \ge \int \delta_q\, \frac{\mathrm{pr}(\vartheta)}{C_{\mathrm{pr}}}\, L_\beta(\vartheta)\,\mathbf{1}(\vartheta \in A)\, \mathrm{d}\vartheta.
\]
By Lemma 17(ii), it holds that $c_\beta \ge C_{\min}$ for some $C_{\min} > 0$ and all $\beta \in B$. Therefore,
\[
\dot P_\beta(\theta, A) \ge \delta_{\dot P}\, \pi_\beta(A), \qquad \text{where } \delta_{\dot P} = \delta_q C_{\min}/C_{\mathrm{pr}} > 0
\]
does not depend on $\beta$. As in Nummelin's split chain construction (cf. Meyn and Tweedie, 2009), we can then define the Markov kernel $R_\beta(\theta, A) = (1 - \delta_{\dot P})^{-1} \big( \dot P_\beta(\theta, A) - \delta_{\dot P}\, \pi_\beta(A) \big)$ with $\pi_\beta R_\beta = \pi_\beta$. Set $\Pi_\beta(\theta, A) = \pi_\beta(A)$. For any $|f| \le 1$, $\beta \in B$, and $k \ge 1$, we have
\[
\|\dot P^k_\beta f - \pi_\beta(f)\|_\infty = (1 - \delta_{\dot P}) \|(R_\beta - \Pi_\beta) \dot P^{k-1}_\beta f\|_\infty = (1 - \delta_{\dot P}) \|R_\beta \dot P^{k-1}_\beta \big(f - \pi_\beta(f)\big)\|_\infty \le (1 - \delta_{\dot P}) \|\dot P^{k-1}_\beta \big(f - \pi_\beta(f)\big)\|_\infty
\]
\[
= (1 - \delta_{\dot P}) \|\dot P^{k-1}_\beta f - \pi_\beta(f)\|_\infty \le \cdots \le (1 - \delta_{\dot P})^k \|f - \pi_\beta(f)\|_\infty \le 2 (1 - \delta_{\dot P})^k \|f\|_\infty,
\]
where we have used $\|R_\beta\|_\infty \le 1$. Hence $(\dot P_\beta)_{\beta \in B}$ are simultaneously $1$-geometrically ergodic, and thus so are $(P_\beta)_{\beta \in B}$ by Lemma 18(i). $\Box$

Proof of Theorem 10. Since $(\dot P_\beta)_{\beta \in B}$ are simultaneously $1$-geometrically ergodic by Lemma 22, it is direct to see that Assumption 9 implies Assumption 13. We conclude by Theorem 14. $\Box$

Supplement C. Simultaneous tolerance and covariance adaptation
Algorithm 23 (TA-AM($n_b$, $\alpha_*$)). Suppose $\Theta_0 \in \mathsf{T} \subset \mathbb{R}^{n_\theta}$ is a starting value with $\mathrm{pr}(\Theta_0) > 0$, and $\Gamma_0 = I$ is the $n_\theta \times n_\theta$ identity matrix.
1. Initialise $\delta_0 = T_0$, where $T_0 \sim Q_{\Theta_0}(\cdot)$ and $T_0 > 0$. Set $\mu_0 = \Theta_0$.
2. For $k = 0, \ldots, n_b - 1$, iterate:
(i) Draw $\Theta'_k \sim N\big(\Theta_k,\ (2.38^2/n_\theta)\, \Gamma_k\big)$.
(ii) Draw $T'_k \sim Q_{\Theta'_k}(\cdot)$.
(iii) Accept, by setting $(\Theta_{k+1}, T_{k+1}) \leftarrow (\Theta'_k, T'_k)$, with probability
\[
\alpha_{\delta_k}(\Theta_k, T_k; \Theta'_k, T'_k) = \min\Big\{1,\ \frac{\mathrm{pr}(\Theta'_k)\, \phi(T'_k/\delta_k)}{\mathrm{pr}(\Theta_k)\, \phi(T_k/\delta_k)}\Big\}.
\]
Otherwise reject, by setting $(\Theta_{k+1}, T_{k+1}) \leftarrow (\Theta_k, T_k)$.
(iv) $\log \delta_{k+1} \leftarrow \log \delta_k + \gamma_{k+1} \big( \alpha_* - \alpha'_{\delta_k}(\Theta_k, \Theta'_k, T'_k) \big)$.
(v) $\mu_{k+1} \leftarrow \mu_k + \gamma_{k+1} \big( \Theta_{k+1} - \mu_k \big)$.
(vi) $\Gamma_{k+1} \leftarrow \Gamma_k + \gamma_{k+1} \big( (\Theta_{k+1} - \mu_k)(\Theta_{k+1} - \mu_k)^T - \Gamma_k \big)$.
3. Output $(\Theta_{n_b}, \delta_{n_b})$.
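A compact runnable sketch of TA-AM follows (our own illustration, not the authors' implementation). The interface functions `log_prior` and `simulate_distance`, the simple cut-off $\phi(t) = \mathbf{1}(t \le 1)$, and the step size $\gamma_k = (k+1)^{-2/3}$ are all assumptions of this sketch.

```python
# Sketch of TA-AM: joint tolerance and adaptive Metropolis covariance adaptation.
# Assumes the simple cut-off, so phi(T/delta) = 1(T <= delta), and that the
# current state satisfies T_k <= delta_k when evaluating step (iii).
import numpy as np

def ta_am(log_prior, simulate_distance, theta0, n_b, alpha_star=0.1, seed=0):
    rng = np.random.default_rng(seed)
    d = len(theta0)
    theta = np.asarray(theta0, dtype=float)
    mu, cov = theta.copy(), np.eye(d)
    t = simulate_distance(theta, rng)
    delta = t                                         # 1. delta_0 = T_0 > 0
    for k in range(n_b):                              # 2. burn-in iterations
        g = (k + 2.0) ** (-2.0 / 3.0)                 # step size < 1 keeps cov PD
        chol = np.linalg.cholesky((2.38 ** 2 / d) * cov)
        prop = theta + chol @ rng.standard_normal(d)              # (i)
        t_prop = simulate_distance(prop, rng)                     # (ii)
        mh = min(1.0, np.exp(log_prior(prop) - log_prior(theta)))
        alpha = mh * float(t_prop <= delta)           # alpha'_{delta_k}
        if rng.uniform() < alpha:                                 # (iii)
            theta, t = prop, t_prop
        delta = np.exp(np.log(delta) + g * (alpha_star - alpha))  # (iv)
        diff = theta - mu                             # uses the old mean mu_k
        mu = mu + g * diff                                        # (v)
        cov = cov + g * (np.outer(diff, diff) - cov)              # (vi)
    return theta, delta                               # 3. output

# Hypothetical usage: prior N(0,1), simulator Y | theta ~ N(theta, 1), y* = 0.
# theta_b, delta_b = ta_am(lambda th: -0.5 * th @ th,
#                          lambda th, rng: abs(rng.normal(th[0], 1.0)),
#                          theta0=[0.0], n_b=5000)
```

Adapting $\log \delta_k$ rather than $\delta_k$ keeps the tolerance positive, and the recursions (v)-(vi) are the usual adaptive Metropolis estimates of the mean and covariance.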
Supplement D. Details of extensions in Section 7

In case of a non-simple cut-off, the rejected samples may be 'recycled' in the estimator (Ceperley et al., 1977). This may improve the accuracy (but can also reduce accuracy in certain pathological cases; see Delmas and Jourdain (2009)). The 'waste recycling' estimator is
\[
E^{\mathrm{WR}}_{\delta,\epsilon}(f) = \sum_{k=1}^n W^{(\delta,\epsilon)}_k \Big[ \alpha_\delta(\Theta_k, Y_k; \tilde\Theta_{k+1}, \tilde Y_{k+1})\, f(\tilde\Theta_{k+1}) + \big[1 - \alpha_\delta(\Theta_k, Y_k; \tilde\Theta_{k+1}, \tilde Y_{k+1})\big] f(\Theta_k) \Big].
\]
When $E_{\delta,\epsilon}(f)$ is consistent under Theorem 4, this is also a consistent estimator. Namely, as in the proof of Theorem 4, we find that $(\Theta_k, Y_k, \tilde\Theta_{k+1}, \tilde Y_{k+1})_{k \ge 1}$ is a Harris recurrent Markov chain with invariant distribution
\[
\hat\pi_\delta(\theta, y, \tilde\theta, \tilde y) = \tilde\pi_\delta(\theta, y)\, \tilde q(\theta, y; \tilde\theta, \tilde y),
\]
and $\hat\pi_\epsilon(\theta, y, \tilde\theta, \tilde y)/\hat\pi_\delta(\theta, y, \tilde\theta, \tilde y) = c_\epsilon\, w_{\delta,\epsilon}(y)$, where $\tilde q(\theta, y; \theta', y') = q(\theta, \theta')\, g(y' \mid \theta')$. Therefore, $E^{\mathrm{WR}}_{\delta,\epsilon}(f)$ is a strongly consistent estimator of
\[
\mathbb{E}_{\hat\pi_\epsilon}\big[ \alpha_\delta(\Theta, Y; \tilde\Theta, \tilde Y)\, f(\tilde\Theta) + [1 - \alpha_\delta(\Theta, Y; \tilde\Theta, \tilde Y)]\, f(\Theta) \big] = \mathbb{E}_{\pi_\epsilon}[f(\Theta)].
\]
See (Rudolf and Sprungk, 2018; Schuster and Klebanov, 2018) for alternative waste recycling estimators based on importance sampling analogues.
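As a sketch, the waste-recycling average can be computed from stored per-iteration outputs of the chain as follows (our own illustration; the array names are assumptions: `w` holds the post-correction weights $W^{(\delta,\epsilon)}_k$, `alpha` the acceptance probabilities $\alpha_\delta(\Theta_k, Y_k; \tilde\Theta_{k+1}, \tilde Y_{k+1})$, and `f_cur`, `f_prop` the values $f(\Theta_k)$ and $f(\tilde\Theta_{k+1})$; as for $E_{\delta,\epsilon}(f)$, we normalise by the sum of the weights):

```python
import numpy as np

def waste_recycling_estimate(w, alpha, f_cur, f_prop):
    # Each term mixes the proposed and current value of f with the
    # acceptance probability, weighted by the post-correction weight.
    w = np.asarray(w, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    terms = alpha * np.asarray(f_prop, dtype=float) \
        + (1.0 - alpha) * np.asarray(f_cur, dtype=float)
    return np.sum(w * terms) / np.sum(w)
```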
A refined estimator may be formed as
\[
\hat E_{\delta,\epsilon}(f) = \sum_{k=1}^n \sum_{j=0}^m \hat U^{(\delta,\epsilon)}_{k,j}\, f(\Theta_k) \Big/ \sum_{\ell=1}^n \sum_{i=0}^m \hat U^{(\delta,\epsilon)}_{\ell,i},
\]
where $\hat U^{(\delta,\epsilon)}_{k,0} = U^{(\delta,\epsilon)}_k$ and $\hat U^{(\delta,\epsilon)}_{k,j} = \hat N_k\, \phi(\hat T_{k,j}/\epsilon) \big/ \phi(T_k/\delta)$, for $j \ge 1$, and where $\hat N_k$ is the number of independent random variables $\hat Z_1, \hat Z_2, \ldots \sim g(\cdot \mid \Theta_k)$ generated before observing $\phi(\hat T_{k,\hat N_k}/\delta) > 0$. The variables $\hat T_{k,j} = d(\hat Z_j, y^*)$, and $\hat T_k = d(\hat Y_k, y^*)$ with independent $\hat Y_k \sim g(\cdot \mid \Theta_k)$. This ensures that
\[
\mathbb{E}\big[ \hat N_k\, \phi(\hat T_{k,j}/\epsilon) \mid \Theta_k = \theta,\ Y_k = y \big] = \frac{L_\epsilon(\theta)}{\mathbb{P}_{g(\cdot \mid \theta)}\big( \phi\big( d(Y, y^*)/\delta \big) > 0 \big)},
\]
which is sufficient to ensure that $\xi_{k,j}(f) = \hat U^{(\delta,\epsilon)}_{k,j} f(\Theta_k)$ is a proper weighting scheme from $\tilde\pi_\delta$ to $\pi_\epsilon$; see (Vihola et al., 2016, Proposition 17(ii)), and consequently the average $\xi_k(f) = (m+1)^{-1} \sum_{j=0}^m \xi_{k,j}(f)$ is a proper weighting.
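In code, one refined weight vector $(\hat U^{(\delta,\epsilon)}_{k,0}, \ldots, \hat U^{(\delta,\epsilon)}_{k,m})$ could be computed as in the following sketch (our own illustration; `simulate_distance` is a hypothetical placeholder drawing $d(\hat Z, y^*)$ with $\hat Z \sim g(\cdot \mid \theta)$, `phi` is a vectorised cut-off, and, as a simplification of this sketch, the $m$ distances entering the weights are drawn independently of the counting sequence that defines $\hat N_k$):

```python
import numpy as np

def refined_weights(u_k0, theta_k, t_k, delta, eps, m, phi, simulate_distance, rng):
    # \hat N_k: number of simulations until phi(\hat T / delta) > 0
    # (a geometric count; assumes phi(t_k / delta) > 0 for the stored state).
    n_hat = 1
    while phi(simulate_distance(theta_k, rng) / delta) == 0:
        n_hat += 1
    # \hat U_{k,0} = U_k; for j = 1..m,
    # \hat U_{k,j} = \hat N_k * phi(\hat T_{k,j}/eps) / phi(T_k/delta).
    t_hat = np.array([simulate_distance(theta_k, rng) for _ in range(m)])
    u = n_hat * phi(t_hat / eps) / phi(t_k / delta)
    return np.concatenate(([u_k0], u))
```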
Supplement E. Supplementary results

Figure 6. Gaussian model with Gaussian cut-off. Estimates from a single run of abc-mcmc(3) (left) and estimates from 10,000 replications of abc-mcmc($\delta$) for five values of $\delta$ indicated by colours.

Figure 7. Progress of tolerance adaptation (left) and histogram of acceptance rates (right) in the Gaussian model experiment with Gaussian cut-off.
Table 5.
Root mean square errors ($\times 10^{-\cdot}$) from abc-mcmc($\delta$) for tolerances $\epsilon$ in the Gaussian model, for the cut-offs $\phi_{\mathrm{simple}}$ and $\phi_{\mathrm{Gauss}}$; column groups $f(x) = x$ and $f(x) = |x|$ with mean acceptance rate, rows $\delta \backslash \epsilon$.
Table 6.
Frequencies of the 95% confidence intervals for the adaptive algorithm in the Gaussian model, for tolerance $\epsilon = 0.\cdot$; column groups $\phi_{\mathrm{simple}}$ and $\phi_{\mathrm{Gauss}}$, each with fixed tolerances $\delta$ and the adaptive tolerance ('Adapt'), rows $f(x) = x$ and $f(x) = |x|$.

Table 7.
Root mean square errors and acceptance rates in the Lotka-Volterra experiment.

         f(θ) = θ1                      f(θ) = θ2                      f(θ) = θ3                 Acc.
δ \ ε    80    110   140   170   200    80    110   140   170   200    80    110   140   170   200    rate

φ_simple
80       2.37                           1.32                           2.94                           0.05
110      1.81  1.48                     0.99  0.86                     2.26  1.88                     0.07
140      1.75  1.41  1.22               0.93  0.77  0.68               2.11  1.69  1.40               0.10
170      1.83  1.35  1.14  1.05         0.96  0.75  0.64  0.60         2.14  1.65  1.33  1.15         0.14
200      1.93  1.41  1.11  0.97  0.95   1.06  0.75  0.61  0.56  0.60   2.37  1.74  1.36  1.16  1.09   0.17

regr. φ_Epa
80       3.10                           1.52                           2.77                           0.05
110      2.74  1.99                     1.39  1.00                     2.53  1.81                     0.07
140      3.02  2.08  1.56               1.54  1.05  0.79               2.76  1.90  1.39               0.10
170      3.09  2.13  1.60  1.31         1.61  1.09  0.83  0.69         2.85  1.95  1.46  1.16         0.14
200      3.19  2.20  1.68  1.36  1.15   1.63  1.10  0.84  0.71  0.63   2.91  2.04  1.52  1.21  1.01   0.17
Table 8.
Coverages of confidence intervals from abc-mcmc($\delta$) for tolerance $\epsilon = 80$, with fixed tolerance and with adaptive tolerance in the Lotka-Volterra model; column groups for post-correction with the simple cut-off and for regression with the Epanechnikov cut-off, each with fixed tolerances $\delta$ and the adaptive tolerance ('Adapt'), rows $\theta_1$, $\theta_2$, $\theta_3$.
Figure 8. Lotka-Volterra model with simple cut-off and step size $n^{-\cdot/\cdot}$. Estimates from a single run of abc-mcmc(200) (left) and estimates from 1,000 replications of abc-mcmc($\delta$) for $\delta \in \{80, 110, 140, 170, 200\}$ indicated by colours.
Table 9.
Frequencies of the 95% confidence intervals and mean acceptance rates in the Lotka-Volterra experiment with step size $n^{-\cdot/\cdot}$.

         f(θ) = θ1                      f(θ) = θ2                      f(θ) = θ3                 Acc.
δ \ ε    80    110   140   170   200    80    110   140   170   200    80    110   140   170   200    rate

φ_simple
80       0.32                           0.11                           0.11                           0.07
110      0.91  0.78                     0.76  0.52                     0.79  0.56                     0.09
140      0.97  0.96  0.91               0.95  0.88  0.80               0.96  0.90  0.86               0.12
170      0.98  0.98  0.97  0.94         0.97  0.97  0.93  0.87         0.97  0.97  0.95  0.91         0.15
200      0.99  0.98  0.98  0.96  0.93   0.99  0.99  0.98  0.94  0.87   0.98  0.98  0.97  0.94  0.92   0.18

regr. φ_Epa
80       0.34                           0.41                           0.25                           0.07
110      0.86  0.81                     0.88  0.87                     0.81  0.82                     0.09
140      0.94  0.94  0.92               0.95  0.95  0.96               0.90  0.91  0.91               0.12
170      0.95  0.95  0.95  0.95         0.96  0.97  0.97  0.97         0.92  0.94  0.94  0.95         0.15
200      0.95  0.95  0.95  0.95  0.94   0.96  0.97  0.97  0.98  0.98   0.92  0.94  0.95  0.95  0.95   0.18
Table 10.
Root mean square errors and acceptance rates in the Lotka-Volterra experiment with step size $n^{-\cdot/\cdot}$.

         f(θ) = θ1                      f(θ) = θ2                      f(θ) = θ3                 Acc.
δ \ ε    80    110   140   170   200    80    110   140   170   200    80    110   140   170   200    rate

φ_simple
80       3.24                           2.20                           4.67                           0.07
110      2.12  2.14                     1.14  1.38                     2.69  3.17                     0.09
140      1.87  1.56  1.49               0.89  0.81  0.79               2.10  1.82  1.63               0.12
170      1.77  1.27  1.05  0.96         0.87  0.68  0.59  0.59         2.14  1.60  1.31  1.19         0.15
200      1.94  1.45  1.20  1.11  1.08   0.95  0.69  0.59  0.54  0.57   2.44  1.95  1.68  1.58  1.52   0.18

regr. φ_Epa
80       2.67                           1.14                           2.17                           0.07
110      2.88  2.18                     1.27  0.90                     2.36  1.76                     0.09
140      2.67  1.98  1.61               1.38  1.02  0.83               2.57  1.91  1.54               0.12
170      2.89  1.98  1.49  1.21         1.46  0.98  0.74  0.61         2.63  1.79  1.34  1.08         0.15
200      3.57  2.85  2.46  4.93  1.20   1.82  1.41  1.21  1.42  0.63   3.11  2.32  1.88  1.81  1.22   0.18
Table 11.
Root mean square errors of estimators from abc-mcmc($\delta$) for tolerance $\epsilon = 80$, with fixed tolerance and with adaptive tolerance in the Lotka-Volterra model with step size $n^{-\cdot/\cdot}$.

                  Post-correction, simple cut-off       Regression, Epanechnikov cut-off
                  Fixed tolerance δ            Adapt    Fixed tolerance δ            Adapt
                  80    110   140   170   200  122.6    80    110   140   170   200  122.6
θ1 (×10^{-·})     3.24  2.12  1.87  1.77  1.94  1.80    2.67  2.88  2.67  2.89  3.57  2.57
θ2 (×10^{-·})     2.20  1.14  0.89  0.87  0.95  1.04    1.14  1.27  1.38  1.46  1.82  1.28
θ3 (×10^{-2})     …

Department of Mathematics and Statistics, University of Jyväskylä, P.O. Box 35, FI-40014 University of Jyväskylä, Finland

E-mail address: [email protected], [email protected]