An invitation to sequential Monte Carlo samplers
Chenguang Dai, Jeremy Heng, Pierre E. Jacob*, and Nick Whiteley
Department of Statistics, Harvard University, USA; ESSEC Business School, Singapore; School of Mathematics, University of Bristol, UK
Abstract
Sequential Monte Carlo samplers provide consistent approximations of sequences of probability distributions and of their normalizing constants, via particles obtained with a combination of importance weights and Markov transitions. This article presents this class of methods and a number of recent advances, with the goal of helping statisticians assess the applicability and usefulness of these methods for their purposes. Our presentation emphasizes the role of bridging distributions for computational and statistical purposes. Numerical experiments are provided on simple settings such as multivariate Normals, logistic regression and a basic susceptible-infected-recovered model, illustrating the impact of the dimension, the ability to perform inference sequentially and the estimation of normalizing constants.
1 Introduction

Consider the task of sampling from a target distribution $\pi(dx) = \gamma(x)\,dx/Z$ defined on a measurable space $(\mathsf{X}, \mathcal{X})$, with an unnormalized density $\gamma(x)$ that can be evaluated exactly, and an unknown normalizing constant $Z = \int_{\mathsf{X}} \gamma(x)\,dx$. This is the standard setting for various Markov chain Monte Carlo (MCMC) methods [Brooks et al., 2011]. An MCMC strategy starts by initializing the Markov chain $x_0$ from an initial distribution $\pi_0$ on $\mathsf{X}$, and subsequently sampling the next state $x_t$ given the current state $x_{t-1}$ from $M(x_{t-1}, \cdot)$, where $M$ is a Markov kernel on $\mathsf{X}$ designed to be $\pi$-invariant. The Markov chain $(x_t)_{t \geq 0}$ is generated for some time, and an initial portion of the chain is typically discarded as "burn-in", perhaps based on some visualizations and quantitative diagnostics. The subsequent $T \in \mathbb{N}$ states constitute an empirical approximation $T^{-1} \sum_{t=1}^{T} \delta_{x_t}(\cdot)$ of the target distribution $\pi$, with convergence guarantees as $T \to \infty$.

Another classical method to approximate $\pi$ starting from an initial distribution $\pi_0$ is importance sampling. One draws $N$ independent samples $(x^n)_{n \in [N]}$ from $\pi_0$, and computes weights $(w^n)_{n \in [N]}$ with $w^n = \gamma(x^n)/\pi_0(x^n)$ for each $n \in [N] = \{1, \ldots, N\}$. The weights correct for the discrepancy between $\pi_0$ and $\pi$. The quantity $Z^N = N^{-1} \sum_{n=1}^{N} w^n$ approximates $Z$ as $N \to \infty$, and the weighted empirical measure $(N Z^N)^{-1} \sum_{n=1}^{N} w^n \delta_{x^n}(\cdot)$ provides consistent approximations of $\pi$ as $N \to \infty$.

In this article, we describe sequential Monte Carlo samplers (SMC, Del Moral et al. [2006]) as a combination of MCMC and importance sampling, and as an alternative to either. An SMC sampler generates $N$ draws, termed particles due to historical connections with particle filtering [Gordon et al., 1993, Chopin, 2002], that provide consistent approximations of $Z$ and $\pi$ as $N \to \infty$, just like importance sampling.

*Corresponding author: [email protected] (arXiv preprint, stat.CO)
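As a concrete illustration of the importance sampling scheme above, here is a minimal sketch. All distributions here are illustrative assumptions: initial distribution $\pi_0 = \mathcal{N}(0, 2^2)$ and unnormalized target $\gamma(x) = \exp(-x^2/2)$, so that $Z = \sqrt{2\pi}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: pi_0 = N(0, 2^2), gamma(x) = exp(-x^2/2),
# so the true normalizing constant is Z = sqrt(2*pi).
def log_gamma(x):
    return -0.5 * x**2

def log_pi0(x):
    # normalized log-density of N(0, 4)
    return -0.5 * x**2 / 4.0 - 0.5 * np.log(2.0 * np.pi * 4.0)

N = 100_000
x = rng.normal(0.0, 2.0, size=N)       # x^n ~ pi_0, independently
w = np.exp(log_gamma(x) - log_pi0(x))  # w^n = gamma(x^n) / pi_0(x^n)
Z_hat = w.mean()                       # Z^N = N^{-1} sum_n w^n, consistent for Z
pi_mean = np.sum(w * x) / np.sum(w)    # self-normalized estimate of E_pi[X]
print(Z_hat, pi_mean)                  # Z_hat ~ sqrt(2*pi) ~ 2.507, pi_mean ~ 0
```

The average of the unnormalized weights estimates $Z$, while the self-normalized weighted average estimates expectations under $\pi$; both are consistent as $N$ grows.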
These algorithms employ a combination of importance sampling and Markov kernels, which can allow them to tackle much more challenging problems than plain importance sampling, and which presents various potential advantages compared to plain MCMC. The goal of this article is to help statisticians assess the applicability and usefulness of SMC strategies.

This article also serves as a review of selected advances since the germinal works of Chopin [2002] and Del Moral et al. [2006]. One such advance is the realization that particle methods form a generic object that can be used as part of larger algorithms. An example is the use of SMC to generate proposals in Metropolis–Hastings and Gibbs samplers [Andrieu et al., 2010], which itself can become part of a larger SMC sampler [Chopin et al., 2013, Fulop and Li, 2013]. Such assembly of algorithms leads to estimators with new properties, for example lack of bias [Middleton et al., 2019]. Another thread of advances has been on the quantification of errors associated with SMC estimates [Chan and Lai, 2013, Lee and Whiteley, 2018, Olsson and Douc, 2019, Du and Guyader, 2019], which was crucially missing until recent years. These advances and ever broader applications have helped establish SMC samplers as a key part of the statistics toolbox.

We begin with a description of SMC samplers, following closely Del Moral et al. [2006]. It will be apparent that SMC samplers require the specification of numerous objects, in comparison with MCMC methods that can be succinctly described by a choice of initial distribution and a (single) Markov transition kernel.
This presentation makes apparent the design choices one faces when implementing SMC samplers; we will later describe how many of these choices can be made implicitly or adaptively.

Firstly, a sequence of $T \in \mathbb{N}$ distributions $\pi_t(dx) = \gamma_t(x)\,dx/Z_t$ defined on the same state space $(\mathsf{X}, \mathcal{X})$ is introduced, where $\gamma_t(x)$ denotes an unnormalized density, which can be evaluated pointwise, and $Z_t = \int_{\mathsf{X}} \gamma_t(x)\,dx$ a normalizing constant. We assume that $\pi_0$ can be sampled from and that the terminal distribution $\pi_T$ is precisely the target distribution $\pi$. For $t \in [T]$, one can informally think of two successive distributions, $\pi_{t-1}$ and $\pi_t$, as similar to one another.

Next we introduce two sequences of Markov kernels. The first one is a sequence $(M_t)_{t \in [T]}$ of "forward" kernels, with each $M_t$ designed to target $\pi_t$ exactly or approximately. In the sampler, the forward kernel $M_t$ is used to sample variables $x_t$ given realizations $x_{t-1}$ at the $t$-th step of the algorithm. One can think of $M_t$ as an MCMC kernel leaving $\pi_t$ invariant, although other choices are possible and useful. The second sequence, denoted by $(L_{t-1})_{t \in [T]}$, consists of "backward" kernels. These backward kernels might not appear in practical implementations of the algorithm, but they play an important conceptual role by allowing proposal and target distributions to be defined on a common space. Overall, the SMC sampler propagates $N$ particles using the forward kernels $(M_t)$, and assigns to the particles some weights that depend on $(\pi_t)$, $(M_t)$ and $(L_{t-1})$. These weights trigger interactions between the particles via resampling steps.

Indeed an SMC sampler with $N$ particles also requires the specification of a resampling mechanism, by which some particles are discarded and others duplicated, typically maintaining a fixed population size. Resampling involves a distribution $r(\cdot \mid w^N)$ on $[N]^N$ parametrized by a vector $w^N = (w^1, \ldots, w^N)$ of probabilities.
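One concrete choice of $r(\cdot \mid w^N)$, multinomial resampling, can be sketched as follows; the weight vector here is a hypothetical example.

```python
import numpy as np

rng = np.random.default_rng(4)

def multinomial_resampling(w, rng):
    """Ancestor indices a^1, ..., a^N drawn independently from the
    categorical distribution on [N] with probabilities w^1, ..., w^N."""
    return rng.choice(len(w), size=len(w), p=w)

# Hypothetical weights: particle 2 (zero-indexed) carries most of the mass,
# so it tends to be duplicated while low-weight particles are discarded.
w = np.array([0.05, 0.05, 0.8, 0.1])
a = multinomial_resampling(w, rng)
print(a)
```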
Different resampling schemes correspond to different choices of distribution $r(\cdot \mid w^N)$. The simplest resampling scheme is called multinomial resampling [Gordon et al., 1993], where $a^N \sim r(\cdot \mid w^N)$ if and only if $(a^n)_{n \in [N]}$ are independent categorical variables on $[N]$ with probabilities $w^N$. At step $t$ of the SMC sampler, the $N$ particles are obtained by propagating particles from the previous step with indices $(a^n)_{n \in [N]}$, which are generated from the resampling distribution parametrized by the particle weights. We refer readers to Gerber et al. [2019], Li et al. [2020] for recent discussions of resampling schemes.

With these ingredients a generic, non-adaptive SMC sampler is described in Algorithm 1. In the weighting step, for each $n \in [N]$, a weight is assigned to the pair $(\check{x}^n_{t-1}, x^n_t)$, using the function

$$w_t(x_{t-1}, x_t) = \frac{\gamma_t(x_t)\, L_{t-1}(x_t, x_{t-1})}{\gamma_{t-1}(x_{t-1})\, M_t(x_{t-1}, x_t)}. \tag{1.1}$$

This corresponds to importance sampling with proposal $\pi_{t-1}(dx_{t-1})\, M_t(x_{t-1}, dx_t)$ and target $\pi_t(dx_t)\, L_{t-1}(x_t, dx_{t-1})$ on the pair $(x_{t-1}, x_t)$. This target distribution admits $\pi_t$ as a marginal on $x_t$, for any choice of backward kernel $L_{t-1}$. Performing importance sampling on the joint space overcomes the intractability of the marginal distribution of the proposed $x_t$. With appropriate choices of forward and backward kernels, the weight in (1.1) can be evaluated pointwise, at least up to a multiplicative constant.

The output of the algorithm includes weighted particles $(w^n_t, x^n_t)_{n \in [N]}$ approximating each distribution $\pi_t$, in the sense that $\pi^N_t(\varphi) = \sum_{n \in [N]} w^n_t \varphi(x^n_t)$ converges to $\pi_t(\varphi) = \int_{\mathsf{X}} \varphi(x_t)\,\pi_t(dx_t)$, for a suitable class of test functions $\varphi : \mathsf{X} \to \mathbb{R}$, as $N \to \infty$. Another output of the algorithm is an unbiased normalizing constant estimator $Z^N_t$, computed using the unnormalized weights, which provides a consistent approximation of $Z_t$ as $N \to \infty$.
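To make the weight (1.1) concrete, the following sketch uses illustrative choices where everything is tractable: Normal bridging densities $\gamma_{t-1}$ and $\gamma_t$, a Normal random-walk forward kernel $M_t$, and the same Normal transition as backward kernel $L_{t-1}$. The average of the unnormalized weights then estimates $Z_t/Z_{t-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative assumptions: gamma_{t-1}(x) = exp(-x^2/8), so pi_{t-1} = N(0, 4)
# and Z_{t-1} = sqrt(8*pi); gamma_t(x) = exp(-x^2/2), so Z_t = sqrt(2*pi).
# Forward kernel M_t(x', .) = N(x', tau^2); backward kernel L_{t-1} chosen as
# the same Normal transition (the two symmetric kernel densities cancel, but
# both are kept below to mirror the structure of (1.1)).
tau = 0.5

def log_normal(x, mean, var):
    return -0.5 * (x - mean)**2 / var - 0.5 * np.log(2.0 * np.pi * var)

N = 100_000
x_prev = rng.normal(0.0, 2.0, size=N)             # x_{t-1}^n ~ pi_{t-1}
x = x_prev + tau * rng.normal(size=N)             # x_t^n ~ M_t(x_{t-1}^n, .)
logw = ((-0.5 * x**2) + log_normal(x_prev, x, tau**2)
        - (-x_prev**2 / 8.0) - log_normal(x, x_prev, tau**2))
ratio_hat = np.exp(logw).mean()                   # estimates Z_t / Z_{t-1}
print(ratio_hat)  # close to sqrt(2*pi) / sqrt(8*pi) = 0.5
```

This illustrates why the joint-space construction is convenient: the marginal density of the proposed $x_t$ is never needed, only the two kernel densities and the unnormalized bridging densities.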
The lack of bias enables various use modes for SMC samplers, described in Section 4.

Algorithm 1: Sequential Monte Carlo sampler
Input: sequence of distributions $(\pi_t)$, forward Markov kernels $(M_t)$, backward Markov kernels $(L_{t-1})$, resampling distribution $r(\cdot \mid w^N)$ on $[N]^N$ where $w^N$ is an $N$-vector of probabilities.

1. Initialization.
(a) Sample particle $x^n_0$ from $\pi_0(\cdot)$ for $n \in [N]$ independently.
(b) Set $w^n_0 = N^{-1}$ for $n \in [N]$.

2. For $t \in [T]$, iterate the following steps.
(a) Sample ancestor indices $(a^n_{t-1})_{n \in [N]}$ from $r(\cdot \mid w^N_{t-1})$, and define $\check{x}^n_{t-1} = x^{a^n_{t-1}}_{t-1}$ for $n \in [N]$.
(b) Sample particle $x^n_t \sim M_t(\check{x}^n_{t-1}, \cdot)$ for $n \in [N]$.
(c) Compute weights $w_t(\check{x}^n_{t-1}, x^n_t)$ for $n \in [N]$ based on (1.1), and set $w^n_t \propto w_t(\check{x}^n_{t-1}, x^n_t)$ so that $\sum_{n \in [N]} w^n_t = 1$.

Output: weighted particles $(w^n_t, x^n_t)_{n \in [N]}$ approximating $\pi_t$, and estimator $Z^N_t = \prod_{s=1}^{t} N^{-1} \sum_{n \in [N]} w_s(\check{x}^n_{s-1}, x^n_s)$ of $Z_t$, for $t \in [T]$.

Various advances have led to the development of SMC as presented in Algorithm 1. Continuous-time formulations without resampling originate in statistical physics [Jarzynski, 1997, Crooks, 1998], with the aim of estimating free energy differences. Discrete-time analogues with MCMC kernels were considered independently by Neal [2001] for statistical applications. The connection to particle filtering was explored in subsequent papers by Gilks and Berzuini [2001] and Chopin [2002]. These works exploited the use of "resample-move" steps, i.e. resampling followed by MCMC moves, to improve particle diversity for both dynamic and static models. The main reference remains Del Moral et al. [2006], where it is shown that these methods can be placed in a unified framework. The introductory section of Del Moral et al. [2006] mentions other references to early, related methods.

1.3 Key specificities of SMC and outline of the article

The following is a list of key differences between SMC samplers and standard MCMC methods. The rest of this article is structured by elaborating on these points.

1.
According to the above description, SMC samplers require the specification of $T$ distributions $(\pi_t)$, $T$ forward transition kernels $(M_t)$ and $T$ backward transition kernels $(L_{t-1})$. Section 2 shows how various considerations can help guide these choices, leading to practical algorithms with a manageable number of tuning parameters.

2. SMC samplers alternate between MCMC moves and weighting steps based on importance sampling. As the performance of classical importance sampling is known to deteriorate rapidly with the dimension of the state space, this naturally raises concerns about the performance of SMC samplers. Section 3 serves to alleviate such concerns by describing simple analytical and numerical results that elucidate the role of bridging distributions.

3. Approximations provided by SMC samplers are instances of interacting particle systems [Del Moral, 2004]. This contrasts with the theory of Markov chains that underpins MCMC approximations [Nummelin, 2002]. This difference is not only theoretical, as it impacts practical choices on how SMC samplers can be executed. Section 4 describes several use cases of SMC samplers and the quantification of estimation errors.

4. In addition to approximating the target $\pi$, SMC samplers provide estimates of bridging distributions $(\pi_t)$ and their normalizing constants $(Z_t)$. These objects are not easily obtained as by-products of standard MCMC algorithms. In Section 5, we illustrate through examples these objects of inference and their possible use in statistics.

Methodological considerations on the choice of paths and Markov kernels can be found in Section 2. Section 3 is intended for readers who might be skeptical about the performance of SMC for high-dimensional problems.
Section 4 describes different use modes of SMC samplers, their amenability to parallel computing and the quantification of error in the resulting estimates. There are many other points of comparison between MCMC and SMC, for example how they perform on multimodal target distributions. Some elements are discussed briefly in Section 6.

When using MCMC methods, chains are started from some initial distribution $\pi_0$, and iteratively propagated using a Markov kernel $M$ targeting the distribution of interest $\pi$. The Markov kernel itself might depend on tuning parameters, which could be chosen based on preliminary runs, or adaptively determined during the course of the algorithm [Haario et al., 2001, Atchadé and Rosenthal, 2005]. The SMC sampler in Algorithm 1 requires the specification of more objects before it is implementable. Here we describe several ways of specifying these objects.

As in the standard MCMC setup, an initial distribution $\pi_0(dx) = \gamma_0(x)\,dx/Z_0$ and a target distribution $\pi(dx) = \gamma(x)\,dx/Z$ are assumed to be inputs of the problem. We first consider the choice of a path of distributions $\pi_t(dx) = \gamma_t(x)\,dx/Z_t$ for $t \in [T]$, where the number of distributions $T$ can be user-specified or determined adaptively, as considered in Section 2.3. The following covers only some of the many use cases of SMC samplers in practice.

Geometric path. A popular choice is the geometric path

$$\gamma_t(x) = \gamma_0(x)^{1-\lambda_t}\, \gamma(x)^{\lambda_t}, \tag{2.1}$$

defined by a sequence $0 = \lambda_0 < \lambda_1 < \cdots < \lambda_T = 1$, whose elements are commonly referred to as inverse temperatures, following the terminology of simulated annealing in the context of optimization [Kirkpatrick et al., 1983].
This choice is generic in the sense that the unnormalized density $\gamma_t(x)$ (and its gradient) can be evaluated pointwise as long as it is possible to evaluate $\gamma_0(x)$ and $\gamma(x)$ (and their gradients).

In the Bayesian setting where $\pi_0$ is a (proper) prior and $\pi$ a posterior distribution, the geometric path $\pi_t$ corresponds to raising the likelihood function to the power $\lambda_t$. In addition to computational considerations, there could also be statistical reasons to care about the resulting "tempered" posteriors. One such motivation concerns misspecified models, where there are statistical arguments for adjusting the likelihood by raising it to a power that is typically lower than one [Royall and Tsou, 2003, Bissiri et al., 2016, Grünwald and Van Ommen, 2017, Holmes and Walker, 2017]. By applying an SMC sampler on (2.1) with a fine sequence of $(\lambda_t)$, one can inspect how approximations of the tempered posteriors vary with the exponent.

Path of partial posteriors.
Consider a Bayesian setting where the initial distribution $\pi_0(dx) = p(dx)$ represents a (proper) prior distribution of unknown parameters $x \in \mathsf{X}$, and the target distribution $p(dx \mid y_{1:T})$ is the posterior distribution of the parameters given observations $y_{1:T} = (y_1, \ldots, y_T)$. In the original work of Chopin [2002], an SMC sampler was applied to the sequence of partial posterior distributions $\pi_t(dx) = p(dx \mid y_{1:t})$ for $t \in [T]$. In addition to computational benefits, this procedure provides a much richer analysis compared to the approximation of $p(dx \mid y_{1:T})$ alone. By visualizing how the posterior distribution of the parameters evolves as data points are assimilated, one can assess the influence of each observation on the distributions of beliefs. Concepts such as sequential revision of beliefs and coherency are typically presented as central in the Bayesian framework (e.g. Section 2.4.4 of Bernardo and Smith [2009]). SMC samplers using the path of partial posteriors provide a computational materialization of these ideas. The ability to estimate expectations with respect to partial posteriors is also key to the estimation of certain quantities such as predictive sequential criteria, e.g. the Hyvärinen score [Dawid and Musio, 2015], which is an alternative to the marginal likelihood [Shao et al., 2019]. This path also plays a specific role in Bayesian sequential experimental design [Drovandi et al., 2013, 2014, Cuturi et al., 2020].

In practice, it is more robust to gradually introduce each observation by employing e.g. a geometric path between successive partial posteriors. In the presence of improper priors, the sequence has to be modified; one possibility is to bridge between a (proper) distribution and a posterior distribution that conditions on enough observations for it to be proper.

Path of truncated distributions.
The task of rare event estimation can be described as approximating the probability mass of a set $A \in \mathcal{X}$ under some distribution $\mu(dx) = \mu(x)\,dx$ defined on $(\mathsf{X}, \mathcal{X})$. Following Cérou et al. [2012], we consider sets of the form $A = \{x \in \mathsf{X} : \Phi(x) \geq \ell\}$ for some function $\Phi : \mathsf{X} \to \mathbb{R}$ and level $\ell \in \mathbb{R}$. In this setting, one can define

$$\gamma_t(x) = \mu(x)\, \mathbb{I}_{A(\ell_t)}(x), \tag{2.2}$$

where $-\infty = \ell_0 < \ell_1 < \cdots < \ell_T = \ell$ is a sequence of levels, and $\mathbb{I}_{A(l)}(x)$ denotes the indicator function of the set $A(l) = \{x \in \mathsf{X} : \Phi(x) \geq l\}$. This defines a path of distributions that gradually truncates $\pi_0(dx) = \mu(dx)$ to $\pi(dx) = \mu(dx)\, \mathbb{I}_A(x)/Z$, which has a normalizing constant $Z = \mu(A)$ equal to the probability of interest.

Estimating the probability of a set defined by a level of a function covers a range of applications, such as power systems analysis [Owen et al., 2019], post-selection inference [Panigrahi et al., 2017], protection of digital documents [Cérou et al., 2012] and random utility models [Ridgway, 2016]. Although only the final normalizing constant $Z_T = Z$ is required in the preceding applications, access to the intermediate normalizing constants $Z_t = \mu(A(\ell_t))$ can also be useful for the purpose of sensitivity analysis.

Returning to the Bayesian setup where $\pi_0$ and $\pi$ denote a (proper) prior and posterior, respectively, the use of nested sampling [Skilling, 2006, Chopin and Robert, 2010] allows one to represent the marginal likelihood as $Z = \int_0^\infty \pi_0(A(l))\,dl$, with $A(l)$ defined by levels of the likelihood function $\Phi(x) = \gamma(x)/\gamma_0(x)$. This identity was leveraged by Salomone et al. [2018] to construct an SMC sampler targeting the path of distributions (2.2) with $\mu = \pi_0$ and $\ell = \infty$.

To quantify the compatibility between a distribution $\mu(dx)$ and an observation $x^* \in \mathsf{X}$, one can compute a p-value which compares the distribution of a test statistic $\Phi(X)$ under $X \sim \mu$ with the observed value $\Phi(x^*)$.
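A minimal sketch of rare-event estimation along such a sequence of levels, under illustrative assumptions: $\mu = \mathcal{N}(0, 1)$, $\Phi(x) = x$, final level $\ell = 4$, indicator weights as in (2.2), multinomial resampling of the surviving particles, and random-walk Metropolis moves leaving each truncated distribution invariant.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative assumptions: mu = N(0, 1), Phi(x) = x, levels 1 < 2 < 3 < 4,
# so Z = mu({x >= 4}) ~ 3.17e-5. Each fraction of surviving particles
# estimates mu(A(l_t)) / mu(A(l_{t-1})), and their product estimates Z.
levels = [1.0, 2.0, 3.0, 4.0]
N = 5000
x = rng.normal(size=N)                   # particles from mu
log_p = 0.0
for l in levels:
    alive = x >= l                       # indicator weights I_{A(l)}(x)
    log_p += np.log(alive.mean())        # conditional survival fraction
    x = x[alive][rng.choice(alive.sum(), size=N)]   # resample survivors
    for _ in range(10):                  # RW Metropolis for mu truncated to [l, inf)
        prop = x + 0.5 * rng.normal(size=N)
        ok = (prop >= l) & (np.log(rng.uniform(size=N)) < 0.5 * (x**2 - prop**2))
        x = np.where(ok, prop, x)

p_hat = float(np.exp(log_p))
print(p_hat)  # of the order of 3e-5
```

A crude Monte Carlo estimate of the same probability would require on the order of $10^5$ samples to see a single event; the intermediate levels make the conditional fractions moderate and hence estimable.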
Monte Carlo approximation of the p-value [Besag, 2001] can be seen as estimation of the probability $\mu(A)$ with $A$ defined by the level $\ell = \Phi(x^*)$. In this setting, applying an SMC sampler [Cérou et al., 2012] for a range of levels would allow the tabulation of p-values or critical regions for future use.

Path of least coding effort.
Consider a situation where one already has access to an MCMC algorithm that targets the distribution of interest $\pi$. To reduce implementation effort, it is sometimes possible to introduce a path of distributions $(\pi_t)$ such that only slight modifications of the existing MCMC algorithm are required to target each bridging distribution. We describe a concrete example taken from Rischard et al. [2018].

Consider a logistic regression model for binary outcomes $Y = (Y_1, \ldots, Y_m) \in \{0,1\}^m$ given covariates $X = (x_1, \ldots, x_m) \in \mathbb{R}^{m \times d}$. Under the model, $Y_i$ is a Bernoulli random variable with probability of success $(1 + \exp(-x_i^T \beta))^{-1}$ for $i \in [m]$, where $\beta \in \mathbb{R}^d$ denotes the regression coefficients. Let $\mathcal{N}(\mu, \Sigma)$ denote a Normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, and denote its density by $z \mapsto \mathcal{N}(z; \mu, \Sigma)$. Assuming a prior of $\mathcal{N}(b, B)$ for $\beta$, the posterior density is

$$p(\beta \mid y) \propto \mathcal{N}(\beta; b, B) \prod_{i=1}^{m} \frac{\exp(x_i^T \beta\, y_i)}{1 + \exp(x_i^T \beta)}, \tag{2.3}$$

where $y \in \{0,1\}^m$ denotes the observed outcomes. Following the data augmentation approach of Polson et al. [2013], we introduce auxiliary variables $\omega \in \mathbb{R}_+^m$ and consider the extended target density

$$p(\beta, \omega \mid y) \propto p(\beta \mid y) \prod_{i=1}^{m} \mathrm{PG}(\omega_i; 1, x_i^T \beta), \tag{2.4}$$

where $z \mapsto \mathrm{PG}(z; 1, c)$ denotes the density of the Pólya–Gamma class $\mathrm{PG}(1, c)$ (see Sections 2.2 and 2.3 of Polson et al. [2013]). Under (2.4), the marginal distribution of $\beta$ is the target distribution of interest $p(d\beta \mid y)$, and the full conditional distributions are

$$p(\beta \mid \omega, y) = \mathcal{N}(\beta; \mu(\omega), \Sigma(\omega)), \qquad p(\omega \mid \beta, y) = \prod_{i=1}^{m} \mathrm{PG}(\omega_i; 1, x_i^T \beta), \tag{2.5}$$

where $\Sigma(\omega) = (X^T \mathrm{diag}(\omega) X + B^{-1})^{-1}$ and $\mu(\omega) = \Sigma(\omega)(X^T \tilde{y} + B^{-1} b)$ with $\tilde{y} = (y_1 - 1/2, \ldots, y_m - 1/2)$. Rischard et al. [2018] consider a path of distributions $(\pi_t)$ indexed by $\lambda_t \in [0, 1]$ that replaces the covariates $X$ by $\lambda_t X$. This amounts to defining

$$\pi_t(\beta) \propto \mathcal{N}(\beta; b, B) \prod_{i=1}^{m} \frac{\exp(\lambda_t x_i^T \beta\, y_i)}{1 + \exp(\lambda_t x_i^T \beta)}, \tag{2.6}$$

which is not equivalent to the geometric path (2.1). By considering a sequence $0 = \lambda_0 < \lambda_1 < \cdots < \lambda_T = 1$, we interpolate between the prior $\pi_0(\beta) = \mathcal{N}(\beta; b, B)$ and the posterior $\pi(\beta) = p(\beta \mid y)$. To construct an MCMC kernel $M_t$ for each $\pi_t$, one can simply apply an existing implementation of the Pólya–Gamma Gibbs sampler with the modified covariates $\lambda_t X$. This provides forward kernels $(M_t)$ without tuning parameters.

Path of ABC or coarsened posteriors.
In some inference problems, the likelihood $x \mapsto p(y^* \mid x)$ of observed data $y^* \in \mathsf{Y}$ given parameters $x \in \mathsf{X}$ is intractable. Approximate Bayesian computation (ABC) replaces the likelihood with $x \mapsto \int_{\mathsf{Y}} \mathbb{I}_{A(y^*, \varepsilon)}(y)\, p(dy \mid x)$, where the set $A(y^*, \varepsilon) = \{y \in \mathsf{Y} : d(y, y^*) < \varepsilon\}$ is defined by a discrepancy measure between datasets $d : \mathsf{Y} \times \mathsf{Y} \to \mathbb{R}_+$ and a desired tolerance $\varepsilon > 0$. This approximate likelihood is also intractable, but it can be unbiasedly estimated by simulating a dataset from the model and checking whether it is close to the observed dataset. This prompts the definition of the path

$$\pi_t(dx, dy) \propto p(dx)\, p(dy \mid x)\, \mathbb{I}_{A(y^*, \varepsilon_t)}(y), \tag{2.7}$$

where $p(dx)$ denotes the prior distribution and $\infty = \varepsilon_0 > \varepsilon_1 > \cdots > \varepsilon_T = \varepsilon$ is a decreasing sequence of tolerances [Sisson et al., 2007, Beaumont et al., 2009, Del Moral et al., 2012, Drovandi and Pettitt, 2011]. Marginally, the path (2.7) bridges between the prior $\pi_0(dx) = p(dx)$ and the ABC posterior $\pi(dx) \propto p(dx) \int_{\mathsf{Y}} \mathbb{I}_{A(y^*, \varepsilon)}(y)\, p(dy \mid x)$. In this setting, SMC approximations of the bridging distributions $\pi_t(dx)$ allow one to assess the sensitivity of the results to the choice of $\varepsilon$ (see e.g. various figures in Bernton et al. [2019b]). Lastly, we note that very similar ideas can be used to construct an SMC sampler to approximate the "coarsened posteriors" introduced by Miller and Dunson [2019] at different levels of coarsening.

After choosing a path of distributions $(\pi_t)$, the user selects forward and backward kernels, $(M_t)$ and $(L_{t-1})$. From Algorithm 1, one has to be able to sample from $M_t(x_{t-1}, \cdot)$ for any $x_{t-1} \in \mathsf{X}$, and to evaluate the weight function $w_t(x_{t-1}, x_t)$ defined in (1.1). These requirements are necessary to implement an SMC sampler. We would set $M_t(x_{t-1}, dx_t) = \pi_t(dx_t)$ if perfect samples could be obtained, and define $L_{t-1}(x_t, dx_{t-1}) = \pi_{t-1}(dx_{t-1})$, in which case the weight function would simplify to $w_t(x_{t-1}, x_t) = Z_t/Z_{t-1}$, so that the estimator of $Z_t$ would have zero variance. The following considers some suboptimal but practical choices.

MCMC moves.
The flexibility of SMC samplers allows one to exploit the vast literature on MCMC. One can select $M_t$ to be any $\pi_t$-invariant MCMC kernel, or a composition of several $\pi_t$-invariant MCMC kernels.

Although such choices typically do not admit tractable transition densities, the weight function in (1.1) can still be tractable if the backward kernel $L_{t-1}$ is chosen judiciously. Following Jarzynski [1997], Crooks [1998], Neal [2001], Chopin [2002], $L_{t-1}$ can be selected as the time reversal of $M_t$, i.e. $\pi_t(dx_t)\, L_{t-1}(x_t, dx_{t-1}) = \pi_t(dx_{t-1})\, M_t(x_{t-1}, dx_t)$, leading to the weight $\gamma_t(x_{t-1})/\gamma_{t-1}(x_{t-1})$. We refer the reader to Del Moral et al. [2006, Section 3.3] for other choices of backward kernels and discussions of how to optimally select $(L_{t-1})$ given $(M_t)$.

SMC samplers can accommodate other kernels $M_t$, not necessarily $\pi_t$-invariant, while preserving consistency of SMC estimates. The following details two examples that show how to remove time-discretization biases without resorting to Metropolis–Hastings corrections. This flexibility can also be of interest when one approximates MCMC kernels to reduce computation time [Johndrow et al., 2015].

Unadjusted Langevin moves.
We consider selecting forward kernels based on the unadjusted Langevin algorithm (ULA) [Grenander and Miller, 1994],

$$M_t(x_{t-1}, dx_t) = \mathcal{N}\left(x_t;\, x_{t-1} + \varepsilon\, \Omega\, \nabla \log \pi_t(x_{t-1})/2,\; \varepsilon\, \Omega\right) dx_t, \tag{2.8}$$

where $\varepsilon > 0$ is a step size and $\Omega \in \mathbb{R}^{d \times d}$ a positive definite preconditioning matrix, which can also be state-dependent [Girolami and Calderhead, 2011]. As the ULA transition is an Euler–Maruyama discretization of an overdamped Langevin diffusion, it does not leave $\pi_t$ invariant for any $\varepsilon > 0$. Ergodicity properties of ULA have been studied in Roberts and Tweedie [1996], and nonasymptotic results have been established recently by Dalalyan [2017] and Durmus and Moulines [2017]. When a Metropolis–Hastings correction step is added to enforce $\pi_t$-invariance, the resulting MCMC method is known as the Metropolis-adjusted Langevin algorithm (MALA).

In an SMC sampler, one can account for the time discretization using importance sampling. As the underlying Langevin diffusion is reversible, this prompts the choice $L_{t-1}(x_t, dx_{t-1}) = M_t(x_t, dx_{t-1})$ for sufficiently small $\varepsilon$ [Nilmeier et al., 2011]. Under these choices, the weight function in (1.1) is tractable as both forward and backward kernels are Normal transitions, and would be close to $\gamma_t(x_{t-1})/\gamma_{t-1}(x_{t-1})$ when the step size is small. The additional flexibility gained by having tractable ULA kernels as an alternative to MALA kernels was exploited in the controlled SMC approach [Heng et al., 2020], which optimizes over the path of distributions $(\pi_t)$ and forward kernels $(M_t)$ to improve the efficiency of SMC samplers. The tractability of ULA kernels has also been used in related work by Bernton et al. [2019a], which fixes $(\pi_t)$ but optimizes over both $(M_t)$ and $(L_{t-1})$ to obtain better algorithmic performance.

Unadjusted Hamiltonian moves.
We can consider forward kernels constructed using Hamiltonian dynamics [Duane et al., 1987] that target the extended distributions $\tilde{\pi}_t(dx_t, dv_t) = \pi_t(dx_t)\, \mathcal{N}(v_t; 0, \Omega)\, dv_t$ for $(x_t, v_t) \in \mathbb{R}^d \times \mathbb{R}^d$. Note that $x_t$ are the original state variables of interest, $v_t$ are auxiliary variables, and $\Omega \in \mathbb{R}^{d \times d}$ denotes a "mass matrix" that can be state-dependent if one employs the methodology of Girolami and Calderhead [2011].

Given a sample $x_{t-1}$ (approximately) from $\pi_{t-1}$ at step $t-1$, we first sample $v_{t-1}$ from $\mathcal{N}(0, \Omega)$, so that the pair $(x_{t-1}, v_{t-1})$ is (approximately) distributed according to $\tilde{\pi}_{t-1}$. We then define the initial position $q(0) = x_{t-1}$ and initial momentum $p(0) = v_{t-1}$ of a fictitious object undergoing Hamiltonian dynamics, defined by the Hamiltonian function $H_t(q, p) = -\log \pi_t(q) + p^T \Omega^{-1} p/2$. As the flow is typically intractable, time discretization is necessary. A popular choice is the leap-frog integrator,

$$p(\ell + 1/2) = p(\ell) + \tfrac{\varepsilon}{2} \nabla \log \pi_t(q(\ell)), \quad q(\ell + 1) = q(\ell) + \varepsilon\, \Omega^{-1} p(\ell + 1/2), \quad p(\ell + 1) = p(\ell + 1/2) + \tfrac{\varepsilon}{2} \nabla \log \pi_t(q(\ell + 1)), \tag{2.9}$$

for $\ell = 0, 1, \ldots, m-1$, where $\varepsilon > 0$ is a step size and $m \in \mathbb{N}$ is the number of leap-frog steps. Finally, we set $x_t = q(m)$ and $v_t = p(m)$. We write the composition of leap-frog iterations as $\Phi_t^\ell(q(0), p(0)) = (q(\ell), p(\ell))$ for $\ell \in [m]$. The transition from $(x_{t-1}, v_{t-1})$ to $(x_t, v_t)$ defines a deterministic forward kernel $M_t((x_{t-1}, v_{t-1}), dx_t, dv_t) = \delta_{\Phi_t^m(x_{t-1}, v_{t-1})}(dx_t, dv_t)$ on the extended space $\mathbb{R}^d \times \mathbb{R}^d$.

As the Hamiltonian is not conserved exactly under time discretization, $M_t$ is not $\tilde{\pi}_t$-invariant for any $\varepsilon > 0$. Instead of employing a Metropolis–Hastings correction, it is also possible to account for the discretization bias using importance sampling, with proposal $q_t(dx_t, dv_t) = (\tilde{\pi}_{t-1} \circ (\Phi_t^m)^{-1})(dx_t, dv_t)$ given by the push-forward measure of $\tilde{\pi}_{t-1}$ under the map $\Phi_t^m$, and target $\tilde{\pi}_t(dx_t, dv_t)$. Using the reversibility and volume-preservation properties of $\Phi_t^m$, the proposal density can be computed using a change of variables, i.e. $q_t(x_t, v_t) = \tilde{\pi}_{t-1}(x_{t-1}, v_{t-1})$ where $(x_{t-1}, v_{t-1}) = (\Phi_t^m)^{-1}(x_t, v_t)$ is obtained using the inverse map. The resulting importance weight is

$$w_t(x_{t-1}, v_{t-1}, x_t, v_t) \propto \frac{\tilde{\pi}_t(x_t, v_t)}{\tilde{\pi}_{t-1}(x_{t-1}, v_{t-1})} = \frac{\exp(-H_t(x_t, v_t))}{\exp(-H_{t-1}(x_{t-1}, v_{t-1}))}, \tag{2.10}$$

which corresponds to $L_{t-1}((x_t, v_t), dx_{t-1}, dv_{t-1}) = \delta_{(\Phi_t^m)^{-1}(x_t, v_t)}(dx_{t-1}, dv_{t-1})$. If the Hamiltonian were conserved, the weight function (2.10) would be $\gamma_t(x_{t-1})/\gamma_{t-1}(x_{t-1})$, as in the case of MCMC moves with time-reversed backward kernels. The above arguments and related ideas can be found in Jarzynski [2000], Neal [2005], Schöll-Paschinger and Dellago [2006].

We now outline several possible extensions. Firstly, as in HMC, one can replace $\Phi_t^m$ with any reversible, volume-preserving map, e.g. by using an approximation of $\pi_t$ in the definition of the Hamiltonian. Secondly, analogously to several applications of a $\pi_t$-invariant HMC kernel, we can also accommodate several iterations of momentum refreshment and leap-frog integration: initializing at $x_{t,0} = x_{t-1}$, we would sample $\tilde{v}_{t,i-1} \sim \mathcal{N}(0, \Omega)$ and set $(x_{t,i}, v_{t,i}) = \Phi_t^m(x_{t,i-1}, \tilde{v}_{t,i-1})$ for $i \in [I]$. In contrast to compositions of $\pi_t$-invariant MCMC kernels, which do not affect importance weights, we have to modify (2.10) to account for the additional iterations:

$$w_t \propto \frac{\pi_t(x_{t,I})}{\pi_{t-1}(x_{t,0})} \prod_{i=1}^{I} \frac{\mathcal{N}(v_{t,i}; 0, \Omega)}{\mathcal{N}(\tilde{v}_{t,i-1}; 0, \Omega)}. \tag{2.11}$$

To reduce the variance of the product in (2.11), one could also consider partial momentum refreshment [Horowitz, 1991, Neal, 2011]. Thirdly, in the spirit of the work by Neal [1994], Calderhead [2014], Nishimura and Dunson [2018] for HMC, it is also possible to use all iterates of the leap-frog integrator (2.9) within the SMC framework. Using the same arguments, we can consider the proposals $q_t^\ell(dx_t, dv_t) = (\tilde{\pi}_{t-1} \circ (\Phi_t^\ell)^{-1})(dx_t, dv_t)$ for all $\ell \in [m]$ when forming an importance sampling approximation of $\tilde{\pi}_t(dx_t, dv_t)$. In Algorithm 1, one would then have $N \times m$ instead of $N$ samples to consider in Steps 2(b) and 2(c); the resampling operation in Step 2(a) would then select $N$ particles among the $N \times m$ weighted samples. Since the use of multiple proposals within importance sampling is consistent in the limit of the number of samples, it follows that the resulting SMC sampler will also be consistent as $N \to \infty$.

Tuning parameters.
Having chosen the type of forward kernels $(M_t)$, there might still be some tuning parameters to consider. Firstly, it is often worthwhile to use more than one MCMC iteration at each step of the SMC sampler, as this can significantly improve algorithmic performance. For MCMC kernels, more iterations can be employed straightforwardly without modifying the importance weights. On the other hand, unadjusted kernels without Metropolis–Hastings corrections require additional care when defining importance weights (see e.g. (2.11)). Next, each MCMC kernel may also depend on some algorithmic parameters. From the above discussion, one has to select a step size $\varepsilon$ and a preconditioning matrix $\Omega$ for kernels based on the overdamped Langevin diffusion, and a step size $\varepsilon$, a number of leap-frog steps $m$ and a mass matrix $\Omega$ for kernels based on Hamiltonian dynamics. These tuning parameters can also be time-varying, i.e. adapted to each bridging distribution. Some difficulties in tuning such gradient-based algorithms are discussed in Livingstone and Zanella [2019].

A specificity of the SMC sampler framework is that approximations of the previous and current bridging distributions are available at each step. Existing samples can be used to estimate features of bridging distributions to inform the choice of tuning parameters for future steps of the algorithm; e.g. one can select $\Omega$ as the estimated covariance of bridging distributions for random-walk Metropolis–Hastings (RWMH) and MALA moves [Chopin, 2002]. We refer readers to Fearnhead and Taylor [2013] for a generic recipe to automate such tuning procedures, Buchholz et al. [2020] for the case of HMC kernels, and Schäfer and Chopin [2013], South et al. [2019] for other strategies to adapt independent Metropolis–Hastings proposals within SMC. While most adaptation rules will not affect the consistency properties of the resulting SMC sampler [Beskos et al., 2016], they may not preserve the unbiasedness property of the normalizing constant estimators.
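Putting the ingredients discussed so far together, here is a minimal instance of Algorithm 1 under illustrative assumptions: a geometric path from $\pi_0 = \mathcal{N}(0, 3^2)$ to the unnormalized target $\gamma(x) = \exp(-x^2/2)$ (so $Z = \sqrt{2\pi}$), equally spaced inverse temperatures, multinomial resampling, RWMH forward kernels and time-reversal backward kernels, so that the incremental weight is $\gamma_t/\gamma_{t-1}$ evaluated at the resampled particles.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative assumptions: pi_0 = N(0, 3^2) with normalized density gamma_0,
# unnormalized target gamma(x) = exp(-x^2/2), geometric path (2.1).
def log_gamma0(x):
    return -x**2 / 18.0 - np.log(3.0) - 0.5 * np.log(2.0 * np.pi)

def log_gamma(x):
    return -0.5 * x**2

def log_gamma_t(x, lam):
    return (1.0 - lam) * log_gamma0(x) + lam * log_gamma(x)

def rwmh(x, lam, n_steps=5, step=1.0):
    """A few random-walk Metropolis-Hastings steps leaving pi_t invariant."""
    for _ in range(n_steps):
        prop = x + step * rng.normal(size=x.shape)
        accept = np.log(rng.uniform(size=x.shape)) < log_gamma_t(prop, lam) - log_gamma_t(x, lam)
        x = np.where(accept, prop, x)
    return x

N, T = 2000, 20
lams = np.arange(T + 1) / T
x = rng.normal(0.0, 3.0, size=N)              # step 1(a): x^n ~ pi_0
w = np.full(N, 1.0 / N)                       # step 1(b): uniform weights
log_Z = 0.0
for t in range(1, T + 1):
    a = rng.choice(N, size=N, p=w)            # step 2(a): resample ancestors
    x = x[a]
    logw = log_gamma_t(x, lams[t]) - log_gamma_t(x, lams[t - 1])
    log_Z += np.log(np.mean(np.exp(logw)))    # running estimator of log Z_t
    w = np.exp(logw - logw.max())             # step 2(c): normalized weights
    w /= w.sum()
    x = rwmh(x, lams[t])                      # step 2(b): pi_t-invariant moves

Z_hat = np.exp(log_Z)
print(Z_hat)  # close to Z = sqrt(2*pi) ~ 2.5066
```

Since the weight does not depend on the moved particles, it is computed right after resampling; with a normalized $\gamma_0$ ($Z_0 = 1$), the product of average incremental weights estimates $Z$ directly.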
Our discussion on the choice of paths (π_t)_{t∈[T]} in Section 2.1 has not addressed the choice of the number of distributions T and the selection of particular elements along the path. In the case of a geometric path (2.1), the latter corresponds to determining an increasing sequence of inverse temperatures (λ_t)_{t∈[T]}. A simple approach is to pre-specify T, which allows one to control the computational cost, and select λ_t = (t/T)^p for t ∈ [T] and some exponent p > 0; this can perform well if T is sufficiently large and p is appropriately chosen (e.g. using preliminary runs). The following describes a commonly used procedure to specify T and (λ_t)_{t∈[T]} adaptively, resulting in a sampler with random cost.

Recall from Section 2.2 that when the forward kernels (M_t) are MCMC kernels and the backward kernels (L_{t−1}) are the corresponding time reversals, the weight function at step t ∈ [T] is
\[
w_t(x_{t-1}) = \frac{\gamma_t(x_{t-1})}{\gamma_{t-1}(x_{t-1})} = \left(\frac{\gamma_1(x_{t-1})}{\gamma_0(x_{t-1})}\right)^{\lambda_t - \lambda_{t-1}}. \tag{2.12}
\]
As particle weights do not depend on their states at time t in this setting, one should perform weighting (Step 2(c)) and resampling (Step 2(a)) before applying MCMC moves (Step 2(b)) to promote sample diversity in Algorithm 1. Equation (2.12) can be seen as an importance sampling approximation of π_t using samples from π_{t−1} as proposals. Suppose that λ_{t−1} ∈ [0, 1), and hence π_{t−1}, have been determined at this stage, and we would like to seek the next inverse temperature λ_t ∈ (λ_{t−1}, 1] so that the next bridging distribution π_t can be well-approximated by π_{t−1} using importance sampling. One way to ensure good importance sampling performance is to keep the χ²-divergence small [Agapiou et al., 2017], where
\[
\chi^2(\pi_t \mid \pi_{t-1}) = \int_{\mathsf{X}} \left(\frac{\pi_t(x)}{\pi_{t-1}(x)} - 1\right)^2 \pi_{t-1}(dx) = \frac{\int_{\mathsf{X}} w_t(x)^2\, \pi_{t-1}(dx)}{\left(\int_{\mathsf{X}} w_t(x)\, \pi_{t-1}(dx)\right)^2} - 1. \tag{2.13}
\]
Instead of fixing χ²(π_t | π_{t−1}) to a desired level, it will be more convenient to work with ϱ_t(λ_t) = (1 + χ²(π_t | π_{t−1}))^{−1}, as this quantity takes values in [0, 1]. Given particles (x^n_{t−1})_{n∈[N]} approximating π_{t−1}, a Monte Carlo approximation of ϱ_t(λ_t) is given by ϱ̂_t(λ_t) = ESS_t(λ_t)/N, where
\[
\mathrm{ESS}_t(\lambda_t) = \frac{\left(\sum_{n=1}^N w_t(x^n_{t-1})\right)^2}{\sum_{n=1}^N w_t(x^n_{t-1})^2} = \frac{\left(\sum_{n=1}^N (\gamma_1/\gamma_0)(x^n_{t-1})^{\lambda_t - \lambda_{t-1}}\right)^2}{\sum_{n=1}^N (\gamma_1/\gamma_0)(x^n_{t-1})^{2(\lambda_t - \lambda_{t-1})}}. \tag{2.14}
\]
This is the effective sample size (ESS) introduced in Kong et al. [1994] to assess the quality of weighted samples. This quantity takes values in [1, N]; it achieves the lower bound when one sample holds all the normalized weight, and the upper bound when all samples have equal weights. Despite its popularity, this diagnostic should be interpreted with care. While small values of ESS indeed imply that the importance sampling approximation is poor, a large ESS may not necessarily imply good performance. For example, if the target has well-separated modes and the proposal corresponds well to one of the modes, the ESS might be close to N with large probability, for fixed N.

If ϱ̂_t(1) is greater than some pre-specified threshold κ ∈ (0, 1), we set λ_t = 1 and terminate the bridging process. Otherwise, we solve for λ_t ∈ (λ_{t−1}, 1) such that ϱ̂_t(λ_t) is equal to κ. As N → ∞, this procedure enforces the χ²-divergence between successive distributions to be approximately δ = κ^{−1} − 1. In the large N regime, higher thresholds will provide better performance at the cost of more bridging distributions T. More discussions about the interplay between κ and T will be given in Section 3. The search for λ_t can be implemented using the bisection method on the interval [λ_{t−1}, 1], as the function ϱ̂_t(λ_t) is strictly decreasing [Beskos et al., 2016, Lemma 5.1]. The cost of this procedure is negligible as evaluations of (2.14) are inexpensive once (γ_1/γ_0)(x^n_{t−1}) have been pre-computed for all n ∈ [N].

As long as the Markov kernels are chosen such that the weights do not depend on the particles after the Markov transition, the same ideas can be applied to any path of distributions. If resampling is not performed at every time step, e.g. when using an adaptive resampling scheme, Zhou et al. [2016] showed how to modify the adaptation criterion by replacing the ESS with a quantity they termed the conditional ESS. Criteria other than the χ²-divergence/ESS can also be employed, e.g. the proportion of alive particles in Del Moral et al. [2012] for ABC targets, generalized notions of ESS in Huggins and Roy [2019], or criteria based on the Kullback–Leibler (KL) divergence [Cornebise et al., 2008, Equation 2.8] defined as KL(π_t | π_{t−1}) = ∫_X log(π_t(x)/π_{t−1}(x)) π_t(dx).

In general, SMC samplers that adaptively determine the distributions (π_t)_{t∈[T]} on the fly do not preserve the unbiasedness property of the normalizing constant estimators, but consistency properties follow from results in Beskos et al. [2016]. Unbiasedness of normalizing constants can be restored at approximately twice the cost by either re-running an SMC sampler with (π_t)_{t∈[T]} determined from an adaptive run, or running two SMC samplers simultaneously, with one adapting (π_t)_{t∈[T]} and the other producing unbiased estimators. When the bridging distributions are pre-specified, it is worth noting that adaptive resampling schemes (e.g. resampling whenever the ESS falls below some threshold) do not alter the unbiasedness of normalizing constant estimators [Whiteley et al., 2016].
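The adaptive choice of λ_t by bisection on ϱ̂_t can be sketched as follows. The function name `next_temperature` is ours, and `log_ratio` stores the pre-computed values log(γ_1/γ_0)(x^n_{t−1}):

```python
import numpy as np

def next_temperature(log_ratio, lam_prev, kappa, tol=1e-10):
    """Find the next inverse temperature adaptively: given
    log_ratio[n] = log(gamma1/gamma0)(x_{t-1}^n), seek lambda_t such that
    ESS_t(lambda_t)/N = kappa, by bisection on (lam_prev, 1]."""
    log_ratio = np.asarray(log_ratio, float)

    def rho(lam):
        lw = (lam - lam_prev) * log_ratio              # log w_t(x^n)
        lw -= lw.max()                                 # stabilize
        w = np.exp(lw)
        return (w.sum() ** 2 / (w ** 2).sum()) / len(w)  # ESS / N

    if rho(1.0) >= kappa:                              # jump straight to the target
        return 1.0
    lo, hi = lam_prev, 1.0                             # rho(lo) >= kappa > rho(hi)
    while hi - lo > tol:                               # rho is decreasing in lam
        mid = 0.5 * (lo + hi)
        if rho(mid) >= kappa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Repeating this until the function returns 1 produces the whole sequence (λ_t)_{t∈[T]} with a random number of steps T, as described above.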
The computational cost of obtaining a reliable importance sampling estimator is dictated by the discrepancy between the proposal and target distributions, which may be measured by the χ² or KL divergence [Agapiou et al., 2017, Chatterjee and Diaconis, 2018]. For example, the number of samples needed to achieve an importance sampling estimator of the normalizing constant Z with a given variance is proportional to this χ²-divergence.

As the dimension d ∈ ℕ of the space X grows, it is often the case in practical applications that the χ² and KL divergences between π_0 and π increase exponentially with d, and so exponentially many samples are needed to stabilize importance sampling estimates. Since each step of SMC samplers also involves importance sampling, it is sensible to be concerned about their performance in high dimensions. Moreover, the performance of particle filters, which are the SMC counterpart for the task of filtering in state space models, is also known to degrade exponentially with the dimension of the latent space [Snyder et al., 2008, Rebeschini and Van Handel, 2015].

Remarkably however, it is possible to construct SMC samplers that deliver reliable estimates at practical computational cost for problems in high dimension; see e.g. the applications to inverse problems in Kantas et al. [2014], Beskos et al. [2015], as well as applications in various statistical settings in Schäfer and Chopin [2013], Naesseth et al. [2015], Heng et al. [2020], Buchholz et al. [2020]. The goal of this section is to offer a simple explanation for the operational success of SMC samplers, with particular focus on the role of bridging distributions.
Whilst we emphasize high dimensions, it will be apparent from the following discussion that the computational cost of stabilizing the variability of SMC estimates is driven by the χ²-divergence between the initial and target distributions, rather than the dimension per se, and of course this divergence can also be large in low-dimensional problems.

To make our discussion more concrete, we will focus on the geometric path (2.1), forward MCMC kernels (M_t), and backward kernels (L_{t−1}) given by their time reversals. Recall that in this case, the weight function is w_t(x_{t−1}) = γ_t(x_{t−1})/γ_{t−1}(x_{t−1}). We shall examine the variance of the normalizing constant estimator
\[
Z_T^N = \prod_{t=1}^T \frac{1}{N} \sum_{n=1}^N w_t(X^n_{t-1}), \tag{3.1}
\]
produced by the modification of Algorithm 1 described in Section 2.3 to promote sample diversity. Cérou et al. [2011] established a formula for this nonasymptotic variance in an abstract setting of Feynman–Kac formulae, which has played an important role in subsequent theory and methodology of SMC. Our first step is to make a simplifying assumption which allows us to capture some of the essence of Cérou et al. [2011] with only simple calculations.

Assumption 3.1. For all t ∈ [T], the forward kernel is an idealized perfectly mixing MCMC kernel M_t(x_{t−1}, dx_t) = π_t(dx_t).

We stress here that our priority is exposition rather than generality or realism, but in practice if M_t is taken to be multiple iterations of an ergodic MCMC kernel targeting π_t, one approaches the setting of Assumption 3.1 as the number of iterates is made large. Under Assumption 3.1, using the unbiasedness property of the normalizing constant estimator and the identity Z = ∏_{t=1}^T Z_t/Z_{t−1} = ∏_{t=1}^T { ∫_X w_t(x_{t−1}) π_{t−1}(dx_{t−1}) }, a calculation shows that
\[
\mathrm{Var}\left[\frac{Z_T^N}{Z}\right] = \prod_{t=1}^T \left[1 + \frac{\chi^2(\pi_t \mid \pi_{t-1})}{N}\right] - 1. \tag{3.2}
\]
From (3.2), we observe that χ²-divergences between consecutive distributions play an important role in the performance of the algorithm.

Let us now bring the question of dimension into play. So far in Section 3, we have implicitly considered a generic sampling problem on a state space of dimension d. Let us suppose that we are given a sequence of such sampling problems indexed by d ∈ ℕ. For simplicity of presentation, we shall not make the dependence of (π_t) and T on d explicit in the notation. We will specify the inverse temperatures (λ_t) in a way that possibly depends on d. To do so, we introduce our next assumption, which captures the idealized performance of the adaptive procedure described in Section 2.3 in the case of using infinitely many particles. Considering this idealized situation allows us to dispense with some technical subtleties, as the adaptive procedure would yield a non-random number of distributions T and non-random bridging distributions (π_t).

Assumption 3.2. For all t ∈ [T − 1], π_{t−1} and π_t satisfy χ²(π_t | π_{t−1}) = δ for some pre-specified δ > 0 that does not depend on d, and such that χ²(π | π_0) > δ.

The next assumption postulates how the number of bridging distributions T scales with the dimension d. This will be verified on specific examples in the following.

Assumption 3.3. There exists α > 0 such that T = O(d^α) as d → ∞.

As an aside, we note that the χ²-divergence is not a proper distance, otherwise the existence of bridging distributions satisfying Assumptions 3.2 and 3.3 could be directly ruled out by the triangle inequality.

Since the χ²-divergence between successive distributions is fixed as δ = κ^{−1} − 1 under Assumption 3.2, the relative variance (3.2) equals (1 + δ/N)^T −
1. As d → ∞ and hence T → ∞, to ensure stability of the estimator (3.1), this suggests choosing the number of particles N such that N = O(T) to keep the relative variance of a constant order; this follows from the limit lim_{N→∞} (1 + δ/N)^N = exp(δ). Therefore the overall computational cost of the idealized SMC sampler, e.g. measured in terms of density and gradient evaluations, would be O(T²) = O(d^{2α}), i.e. polynomial in d, if Assumption 3.3 holds.

Therefore we turn our attention to verifying Assumption 3.3, i.e. the scaling of the number of bridging distributions as the dimension d → ∞. We first draw some insights from a Normal example.

Example. Consider the initial distribution π_0(dx) = N(x; μ_0, Σ) dx and target distribution π(dx) = N(x; μ, Σ) dx for some mean vectors μ_0, μ ∈ ℝ^d and covariance matrix Σ ∈ ℝ^{d×d}. In this case, an element along the geometric path (2.1) is a Normal distribution π_t(dx) = N(x; μ_t, Σ) dx with a mean vector given by the linear interpolation μ_t = μ_0 + λ_t(μ − μ_0) for t ∈ [T]. The χ²-divergence between successive distributions can be computed in closed form:
\[
\chi^2(\pi_t \mid \pi_{t-1}) = \exp\left((\lambda_t - \lambda_{t-1})^2 \, |\mu - \mu_0|^2_{\Sigma^{-1}}\right) - 1, \tag{3.3}
\]
where |μ − μ_0|_{Σ^{−1}} = ((μ − μ_0)^⊤ Σ^{−1} (μ − μ_0))^{1/2} denotes the Mahalanobis distance. Using these expressions, we can work out the number of bridging distributions T and the sequence of inverse temperatures (λ_t)_{t∈[T]} needed to fix χ²(π_t | π_{t−1}) = δ for all t ∈ [T − 1] and some pre-specified δ > 0, assuming that χ²(π | π_0) > δ. Under the specification
\[
T = \left\lceil |\mu - \mu_0|_{\Sigma^{-1}} \big/ \sqrt{\log(1+\delta)} \right\rceil, \tag{3.4}
\]
where ⌈·⌉ denotes the ceiling function, and
\[
\lambda_t = t \sqrt{\log(1+\delta)} \big/ |\mu - \mu_0|_{\Sigma^{-1}} \tag{3.5}
\]
for t ∈ [T − 1], with λ_T = 1, successive distributions satisfy Assumption 3.2. Since |μ − μ_0|_{Σ^{−1}} ≤ Λ_min(Σ)^{−1/2} |μ − μ_0|, where Λ_min(Σ) denotes the minimum eigenvalue of Σ, it follows from (3.4) that T = O(√d) if Λ_min(Σ) is uniformly bounded away from zero and |μ − μ_0| is O(√d), both as d → ∞. Hence in this situation Assumption 3.3 holds with α = 1/2.

To move beyond the Normal case on X = ℝ^d, we introduce some assumptions along the geometric path π(λ, dx) = γ(λ, x) dx / Z(λ) for λ ∈ [0, 1], where γ(λ, x) = γ_0(x)^{1−λ} γ_1(x)^{λ} and Z(λ) = ∫_X γ(λ, x) dx. The densities γ_0(x) and γ_1(x) are assumed to be continuously differentiable in the following. To simplify notation, we will write the expectation of φ : ℝ^d → ℝ with respect to π(λ, dx) as π(λ, φ) = ∫_X φ(x) π(λ, dx), and ℓ(x) = log(γ_1(x)/γ_0(x)), which corresponds to the log-likelihood function in the Bayesian setting where π_0 is the prior and π is the posterior.

Assumption 3.4. There exist constants
C, ζ > 0 and a function β : [0, 1] → ℝ₊ with inf_{λ∈[0,1]} β(λ) > 0, such that for all λ ∈ [0, 1], the distribution π(λ, dx) along the geometric path satisfies:

(i) a Poincaré inequality with constant β(λ), i.e. for all differentiable φ : X → ℝ, we have π(λ, φ²) − π(λ, φ)² ≤ β(λ)^{−1} π(λ, |∇φ|²);

(ii) the maximum of the log-likelihood satisfies sup_{x∈X} ℓ(x) ≤ C d^ζ;

(iii) the expected log-likelihood satisfies π(λ, ℓ) ≥ −C d^ζ;

(iv) the expected squared gradient of the log-likelihood satisfies π(λ, |∇ℓ|²) ≤ C d^{2ζ}.

The Poincaré inequality is an isoperimetric condition on a distribution with rich implications [Ané et al., 2000]. One such property is the exponential convergence of certain MCMC algorithms [Andrieu et al., 2018, Vempala and Wibisono, 2019], which can be used to generalize the discussion in Section 3.2 by relaxing the assumption of perfectly mixing kernels [Schweizer, 2012a]. We refer readers to references in [Vempala and Wibisono, 2019, p. 7 & 16] for conditions to verify a Poincaré inequality, and note that it is strictly weaker than strong log-concavity.

In general it is not possible to obtain closed-form expressions for the inverse temperatures (λ_t)_{t∈[T]}, and hence the bridging distributions (π_t)_{t∈[T]} satisfying Assumption 3.2, or for T itself. However, under Assumption 3.4 it is possible to verify Assumptions 3.2-3.3 by considering bounds on the χ²-divergence between successive distributions. At step t ∈ [T], the χ²-divergence of π_{t−1}(dx) = π(λ_{t−1}, dx) from π_t(dx) = π(λ_t, dx) for 0 ≤ λ_{t−1} < λ_t ≤ 1 satisfies
\[
\chi^2(\pi_t \mid \pi_{t-1}) \leq \beta(\lambda_{t-1})^{-1} (\lambda_t - \lambda_{t-1})^2 \int_{\mathsf{X}} \frac{\pi_t(x)}{\pi_{t-1}(x)} \, |\nabla \ell(x)|^2 \, \pi_t(dx). \tag{3.6}
\]
This follows from Assumption 3.4(i) applied to the distribution π(λ_{t−1}, dx) and the function φ(x) = π_t(x)/π_{t−1}(x). To upper bound the ratio of densities in (3.6), we consider
\[
\log \pi_t(x) - \log \pi_{t-1}(x) = (\lambda_t - \lambda_{t-1})\, \ell(x) - (\log Z_t - \log Z_{t-1}) = (\lambda_t - \lambda_{t-1})\left(\ell(x) - \pi(\lambda_t^*, \ell)\right), \tag{3.7}
\]
which holds for some λ*_t ∈ (λ_{t−1}, λ_t) using the mean value theorem.
Hence using Assumption 3.4(ii)-(iii), we have
\[
\sup_{x \in \mathsf{X}} \frac{\pi_t(x)}{\pi_{t-1}(x)} \leq \exp\left(2C(\lambda_t - \lambda_{t-1})\, d^{\zeta}\right). \tag{3.8}
\]
Applying this upper bound in (3.6), together with Assumption 3.4(iv) and the lower bound β = inf_{λ∈[0,1]} β(λ), gives
\[
\chi^2(\pi_t \mid \pi_{t-1}) \leq \beta^{-1} (\lambda_t - \lambda_{t-1})^2 \exp\left(2C(\lambda_t - \lambda_{t-1})\, d^{\zeta}\right) C d^{2\zeta}. \tag{3.9}
\]
If we construct a sequence of inverse temperatures with increments λ_t − λ_{t−1} = c d^{−ζ}, the constant c > 0 can be chosen so that the right-hand side of (3.9) is bounded uniformly in d, and the resulting number of bridging distributions is T = O(d^ζ).

The above reasoning falls short of describing the behavior of an actual SMC sampler in a realistic scenario. The literature contains various rigorous studies on the performance of SMC samplers and the impact of dimension. An important reference is Beskos et al. [2014], which provides stability results in a setting where the target distribution π can be factorized into d independent components, and discusses the behavior of the required number of bridging steps and of the effective sample sizes. Some explicit discussions of the impact of the dimension on SMC samplers can also be found in Section 6 of Schweizer [2012a], which gives solid reasons to expect a polynomial dimension dependence, again for target distributions defined as products. Finite sample results with explicit discussions of the impact of the dimension have been proposed in Marion and Schmidler [2018]. Closely related are the studies in Brosse et al. [2018] and Andrieu et al. [2016] that focus on the effect of dimension when estimating the normalizing constant Z using numerical methods that are simpler to analyze than SMC samplers.

We now illustrate the empirical performance of SMC samplers on multivariate Normal distributions in ℝ^d with varying d. The initial distribution is π_0(dx) = N(x; μ_0, Σ_0) dx with μ_0 = (1, ..., 1) and Σ_0 = diag(0.5, ..., 0.5), and the target distribution is π(dx) = N(x; μ, Σ) dx with μ = (0, ..., 0) and Σ = diag(1, ..., 1). The bridging distributions (π_t)_{t∈[T]} follow a geometric path (2.1), which lies in the Normal family. For each t ∈ [T], we employ a π_t-invariant HMC kernel with step size ε = d^{−1/4} and m = ⌈d^{1/4}⌉ leap-frog steps as our forward kernel M_t. The "mass matrix" Ω is chosen to be diagonal and adapted using the inverse of the empirical marginal variances based on the particle approximation at each step. The corresponding backward kernel L_{t−1} is given by the time reversal of M_t. All simulations employed multinomial resampling.

Figure 1: Multivariate Normal example of Section 3.4. Number of bridging distributions chosen by an adaptive SMC sampler based on the procedure described in Section 2.3 (left). Number of roots in the genealogical trees generated by SMC in three different regimes (right). Each plot is obtained by averaging over 50 independent repeats.

Figure 1a shows the number of bridging distributions T obtained using an adaptive SMC sampler that determines bridging distributions based on the procedure described in Section 2.3 with threshold κ = 0.
5. The two lines correspond to using N = 256 ("fixed N") and N = 256 + 8d ("linear N") particles. The number of distributions T seems to increase sub-linearly with d in both regimes. We introduce a third setup, referred to as "fixed N & d steps" in the plots, where we set N = 256 and T = d. In this case, the sequence of inverse temperatures (λ_t)_{t∈[T]} was determined by interpolating between the inverse temperatures obtained from a preliminary run of adaptive SMC. The interpolation was performed using the cobs package in R [Ng and Maechler, 2007], which allows one to fit splines that are constrained to be monotonically increasing. To assess and compare the performance of the three setups as the dimension increases, we compute the number of roots in the ancestry tree generated by the SMC samplers. Figure 1b represents the average number of roots against dimension. We observe that the number of unique ancestors of the particles at the terminal time decreases in the "fixed N" regime. However, it seems to be stable when either N scales linearly with d in the adaptive sampler, or when T scales linearly with d for fixed N. This suggests that the performance of the sampler is stable with d in these two regimes.

We further investigate this hypothesis using more concrete measures of Monte Carlo performance in Figure 2. Figure 2a displays, for the three aforementioned regimes, the mean squared error (MSE) associated with the estimator of E_π[X] = ∫_X x π(dx) (averaged over all components). We observe that the MSE appears stable in both regimes with N = 256, and decreases when N increases linearly with d. This suggests that the mechanism to adaptively select the schedule (λ_t)_{t∈[T]}, and the choice of forward and backward kernels, perform uniformly well with respect to d in the present setting. We also observe that the regime with T = d does not provide gains over the "fixed N" adaptive SMC sampler in terms of MSE.
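Counting roots only requires the ancestor indices stored at each resampling step: tracing each terminal particle's ancestry back to time zero and counting the distinct ancestors gives the diagnostic plotted in Figure 1b. A minimal sketch (the function name is ours):

```python
import numpy as np

def count_roots(ancestors):
    """Count unique time-zero ancestors ("roots") of the terminal particles.
    ancestors[t][n] is the index of the parent, selected at resampling step t,
    of particle n at step t+1; shape (T, N)."""
    ancestors = np.asarray(ancestors)
    T, N = ancestors.shape
    lineage = np.arange(N)               # each terminal particle starts as its own lineage
    for t in range(T - 1, -1, -1):       # trace lineages backwards in time
        lineage = ancestors[t][lineage]  # replace each index by its parent's index
    return len(np.unique(lineage))
```

A number of roots close to one signals severe degeneracy, which can be alleviated by increasing N or adding bridging distributions, as discussed in the text.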
Figure 2b displays the variance of the normalizing constant estimator Z_T^N (on the log scale) against dimension. The variance seems to increase with d for "fixed N", but appears stable in the other two regimes. This is consistent with the stability of the number of roots in Figure 1b.

Figure 2: Multivariate Normal example of Section 3.4. MSE for the estimator of E_π[X] (left) and variance of log Z_T^N (right) with increasing dimension. Each plot is obtained by averaging over 50 independent repeats.

Overall these plots suggest that SMC samplers can deliver stable performance as d increases, for a polynomial cost in d. The exact scaling in d is expected to vary strongly across settings. We also see that performance improves as N increases, as expected, and also as the number of bridging distributions increases for certain measures such as the variance of log Z_T^N.

We conclude by noting that, in addition to its impact on the number of bridging distributions, the dimension also affects the cost of weight calculations and of the propagation of each particle using Markov kernels. In our numerical experiments, we employed d^{1/4} leap-frog steps in each HMC move, and the cost of gradient and density evaluations is linear in d. Assuming that adaptive SMC samplers require approximately T = O(√d) bridging distributions, and keeping the number of particles N fixed, we obtain a global cost of O(d^{7/4}) to control the MSE of estimators of E_π[X]. With d bridging distributions, we seem to be able to control the variance of log Z_T^N for a cost of O(d^{9/4}). Since the distributions have diagonal covariance matrices, a factor of d^{1/4} would be gained by using component-wise Gibbs moves instead of HMC.
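For the equal-covariance Normal example discussed earlier in this section, the closed-form expressions (3.3)-(3.5) make the schedule explicit, and the O(√d) growth of T is easy to check numerically. A sketch under that setting (function names are ours):

```python
import numpy as np

def normal_schedule(mu0, mu1, Sigma, delta):
    """Inverse temperatures for the equal-covariance Normal example:
    equal chi2-divergence delta between successive distributions,
    with T from (3.4) and lambda_t from (3.5)."""
    diff = mu1 - mu0
    maha = np.sqrt(diff @ np.linalg.solve(Sigma, diff))  # Mahalanobis distance
    step = np.sqrt(np.log1p(delta)) / maha               # common increment in lambda
    T = int(np.ceil(maha / np.sqrt(np.log1p(delta))))
    lambdas = np.minimum(step * np.arange(1, T + 1), 1.0)  # final temperature is 1
    return T, lambdas

def chi2_between(lam_prev, lam, mu0, mu1, Sigma):
    """Closed-form chi2-divergence (3.3) between successive Normals on the path."""
    diff = mu1 - mu0
    maha2 = diff @ np.linalg.solve(Sigma, diff)
    return np.expm1((lam - lam_prev) ** 2 * maha2)
```

With |μ − μ_0| growing like √d and Σ well-conditioned, `normal_schedule` returns T growing like √d, matching Assumption 3.3 with α = 1/2.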
If the target covariance matrix was fully dense instead of diagonal, the cost of HMC would be multiplied by at least d, possibly d² if the "mass matrix" is chosen to be dense. Thus, even in the simple case of Normal distributions, it is hard to concisely describe how the complexity behaves with d. In general, it therefore seems unrealistic to expect meaningful statements of the type "SMC samplers scale as d^η for some specific η".

SMC samplers can be employed in various ways, all eventually leading to asymptotically valid estimators, but in different asymptotic regimes. This variety of regimes allows one to leverage available computing resources, and leads to different perspectives on "diagnostics of convergence" for SMC samplers and on the quantification of errors. In the classical use of SMC, precision increases with the number of particles. This suggests that users should run SMC samplers with as many particles as possible. This classical regime is described in Section 4.1, along with some of its limitations. We consider alternative "use modes" in Section 4.2 that are based on independent runs of SMC, each with a fixed number of particles. This provides examples where SMC samplers are used as modules in encompassing sampling algorithms.

Figure 3: Genealogical tree of particles generated by a run of an SMC sampler. Here N = 256 particles at the terminal time have 27 unique ancestors or "roots".

Following Del Moral et al. [2006], SMC samplers are instances of interacting particle systems, i.e. Monte Carlo approximations of Feynman–Kac models.
This view has proven fruitful and allows the application of various existing results; see Del Moral [2004] for a textbook treatment with various asymptotic and non-asymptotic results, and Doucet and Lee [2018] for a recent survey. Many theoretical results and methodological advances apply simultaneously to SMC samplers and other types of interacting particle systems, thanks to the unified Feynman–Kac formalism. The literature provides a variety of results on both the SMC estimator π_t^N(φ) of the expectation π_t(φ) for some function φ, and the normalizing constant estimator Z_t^N, described in Section 1.2, as a function of N and t. Among the most essential results are the central limit theorems:
\[
\sqrt{N}\left(\pi_t^N(\varphi) - \pi_t(\varphi)\right) \xrightarrow{d.} \mathcal{N}(0, v_t(\varphi)), \tag{4.1}
\]
\[
\sqrt{N}\left(Z_t^N / Z_t - 1\right) \xrightarrow{d.} \mathcal{N}(0, v_t^{\star}), \tag{4.2}
\]
for each t as N → ∞, where →^{d.} denotes convergence in distribution, and v_t(φ), v_t^⋆ > 0 denote asymptotic variances. These variances can be estimated using the genealogy of the particles. Denote the ancestral lineage of the n-th particle at step t by
\[
b^n_{t,t} = n, \quad \text{and} \quad b^n_{s-1,t} = a^{b^n_{s,t}}_{s-1} \quad \text{for } 1 \leq s \leq t. \tag{4.3}
\]
Since only the offspring of the particles indexed by (b^n_{0,t})_{n∈[N]} survive at time t, we will refer to such indices as "roots". Lineages of particles generated by a run of SMC are depicted in Figure 3, and some combinatorial properties of these genealogical trees are given in Jacob et al. [2015], Koskela et al. [2020]. For a test function φ, consider the quantity
\[
V_t^N(\varphi) = \pi_t^N(\varphi)^2 - \left(\frac{N}{N-1}\right)^{t+1} \frac{1}{N^2} \sum_{n,m \,:\, b^n_{0,t} \neq b^m_{0,t}} \varphi(x_t^n)\, \varphi(x_t^m), \tag{4.4}
\]
which can be computed as a by-product of the SMC algorithm without much overhead. Theorem 1 of Lee and Whiteley [2018] states the convergence in probability of N · V_t^N(φ − π_t^N(φ)) to v_t(φ), and of N · V_t^N(1) to v_t^⋆, as N → ∞. Note that we can directly write
\[
V_t^N(1) = 1 - \left(\frac{N}{N-1}\right)^{t+1} + \left(\frac{N}{N-1}\right)^{t+1} \frac{1}{N^2} \sum_{n \in [N]} \left|\{m : b^m_{0,t} = b^n_{0,t}\}\right|. \tag{4.5}
\]
The right-hand side features the cardinality of the set of siblings of particle n, i.e. the particles that have the same ancestor at time zero.
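The estimator V_t^N(1) in (4.5) is cheap to compute once the time-zero ancestor b^n_{0,t} of each terminal particle is known; a sketch of this formula (the function name is ours):

```python
import numpy as np

def variance_estimator_one(roots_of, N, t):
    """Estimator V_t^N(1) from (4.5), computed from roots_of[n] = b_{0,t}^n,
    the time-zero ancestor of the n-th particle at step t.
    N * V_t^N(1) estimates the asymptotic variance of Z_t^N / Z_t."""
    factor = (N / (N - 1.0)) ** (t + 1)
    _, counts = np.unique(np.asarray(roots_of), return_counts=True)
    siblings = np.sum(counts ** 2)   # sum over n of |{m : b_{0,t}^m = b_{0,t}^n}|
    return 1.0 - factor + factor * siblings / N ** 2
```

When all particles share one root the estimator equals 1, and when all roots are distinct at t = 0 it equals 0, consistent with the interpretation discussed next.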
If all particles were siblings, the sum would be of order N², and the estimated variance would thus be bounded away from zero. On the other hand, if every particle has a small number of siblings, the sum is of order N, and the variance estimator is of order N^{−1}. Note that the estimator can take negative values for any fixed N. The terms N/(N − 1) converge to one as N → ∞ and thus could be removed without asymptotic effect, but are included because they result in interesting unbiasedness properties [Lee and Whiteley, 2018].

The variance estimators can be considered as part of the "convergence diagnostics" for SMC samplers. It is common to monitor various other quantities during SMC runs, such as the ESS. If the ESS (or conditional ESS of Zhou et al. [2016], see Section 2.3) comes close to a pre-specified minimal value at any point, an additional bridging distribution could be introduced. We can also monitor the number of roots in the genealogical ancestry tree, as in Figure 1b, and check that it remains higher than one. Adding more bridging distributions or more particles would increase the number of roots. To monitor the performance of move steps, one can compute the distance between the location of a particle before and after each move, and compare these distances to the spread of the contemporaneous distribution. If the moves are concerningly small relative to the distribution, one could tune the Markov kernels differently (e.g. Fearnhead and Taylor [2013]), or iterate the moves until some criterion is met (e.g. South et al. [2019], Buchholz et al. [2020]), or add more bridging distributions and associated moves. It is generally useful to run the sampler multiple times independently and to check that the results are in agreement.

The N → ∞ regime that underpins the above asymptotics has some practical appeal. An important one relates to parallel computing.
Most computations in an SMC sampler can be distributed across parallel processors, including the independent propagation of particles using Markov kernels and the calculation of unnormalized weights. The resampling step, on the other hand, requires interactions between the particles and thus communication between processors. This has motivated a number of works on the problem of implementing particle methods on graphics processing units and networks of computing devices [Lee et al., 2010, Murray, 2012, Jun et al., 2012, Paige et al., 2014, Vergé et al., 2015, Murray et al., 2016a, Whiteley et al., 2016].

The N → ∞ regime also has some limitations. First, a basic implementation would require N particles to be simultaneously available in memory; as N increases, and if the dimension d is large, one can often reach memory limits. For example, this issue is particularly pertinent when combining SMC samplers with particle filters to perform inference for state space models [Chopin et al., 2013, Fulop and Li, 2013, Duan and Fulop, 2015]. Memory limitations can be mitigated with techniques described in Jun and Bouchard-Côté [2014]. In contrast, MCMC methods only require fast access to one state per chain, which is lighter by orders of magnitude. Secondly, an SMC sampler in the large N regime is not an "anytime" algorithm. This means that it has to run for T steps before terminating and returning approximations of the target distribution. If the user interrupts the algorithm before completion, no approximation of the target distribution is returned. Furthermore, if the user wishes to improve the quality of existing approximations, the algorithm has to be re-run from scratch with more particles. This is in contrast with MCMC methods, which can be easily interrupted and resumed.

Figure 4: Three ancestry trees, with N = 64 particles. Section 4.2 describes how R independent runs with fixed N can be combined into consistent estimators.
This shortcoming of SMC has attracted some attention, and variants exist where new particles can be added along the way; see e.g. Brockwell et al. [2010], Paige et al. [2014], Murray et al. [2016b], Finke et al. [2018]. In the following, we consider other use modes of SMC samplers, which require little additional implementation effort, and which lead to straightforward parallelization strategies and simple ways of constructing confidence intervals.

This section explores the use of independent SMC samplers with a fixed number of particles N, as depicted in Figure 4. Consider R independent runs, possibly obtained from parallel machines, and denote by (π^{N,r})_{r∈[R]} the resulting particle approximations of π, and by (Z^{N,r})_{r∈[R]} the corresponding estimators of the normalizing constant Z. How can we obtain consistent approximations of π and Z as R → ∞, even though N is fixed?

There are multiple answers to this question, described for example in Andrieu et al. [2010], Whiteley et al. [2016], Rainforth et al. [2016]. Recall from Section 1.2 that the SMC normalizing constant estimator is unbiased if the sequence of bridging distributions is not adaptively determined, and the forward and backward Markov kernels themselves do not depend on the particles. Under these conditions, we can simply average (Z^{N,r})_{r∈[R]} to obtain a consistent estimator of Z as R → ∞. Estimating expectations under π is more involved, as SMC estimators are biased when N is fixed, in the same way that self-normalized importance sampling estimators are biased (see e.g. Owen [2013]). We now discuss how to correct for this bias by following the "particle MCMC" approach of Andrieu et al. [2010].

Consider the SMC sampler described in Algorithm 1. Suppose that we select a particle among the N available ones at the terminal step, i.e. we sample an index k from the categorical distribution on [N] with probabilities (w^n_T)_{n∈[N]} and return x^k_T.
The joint distribution of all random variables generated has density
\[
q^N(k, \bar{x}, \bar{a}) = \left\{\prod_{n \in [N]} \pi_0(x_0^n)\right\} \prod_{t=1}^T \left\{ r(\bar{a}_{t-1} \mid \bar{w}_{t-1}) \prod_{n \in [N]} M_t\!\left(x_{t-1}^{a^n_{t-1}}, x_t^n\right)\right\} w_T^k, \tag{4.6}
\]
where r denotes the resampling distribution of the ancestor indices given the weights, x̄ = (x^n_t)_{n∈[N]} for 0 ≤ t ≤ T, and ā = (a^n_t)_{n∈[N]} for 0 ≤ t ≤ T −
1. Following Andrieu et al. [2010, eqn. 31], we define another distribution on the same space:
\[
\bar{\pi}^N(k, \bar{x}, \bar{a}) = \frac{Z_T^N}{Z_T}\, q^N(k, \bar{x}, \bar{a}). \tag{4.7}
\]
Under a mild assumption on the resampling scheme, it follows from the discussion in Andrieu et al. [2010] that the density in (4.7) defines a valid probability distribution, and that its marginal distribution in x^k_T is the target distribution π of interest. These observations prompt the use of q^N as the proposal distribution and π̄^N as the target distribution in an importance sampling argument on the extended space. The resulting importance weight π̄^N(k, x̄, ā)/q^N(k, x̄, ā) is proportional to the normalizing constant estimator Z_T^N. For a test function φ, a self-normalized importance sampling estimator after Rao–Blackwellizing the index k is
\[
\bar{\pi}^R(\varphi) = \frac{\sum_{r \in [R]} Z^{N,r}\, \pi^{N,r}(\varphi)}{\sum_{r \in [R]} Z^{N,r}}, \tag{4.8}
\]
which approximates π(φ) as R → ∞, for any fixed N. Furthermore, the asymptotic variance of π̄^R(φ) can also be approximated using standard self-normalized importance sampling arguments:
\[
\bar{V}^R(\varphi) = \frac{R^{-1} \sum_{r \in [R]} (Z^{N,r})^2 \left(\pi^{N,r}(\varphi) - \bar{\pi}^R(\varphi)\right)^2}{\left(R^{-1} \sum_{r \in [R]} Z^{N,r}\right)^2}. \tag{4.9}
\]
The practical benefits over the large N asymptotics are numerous. One can exploit parallel machines without any communication; arbitrarily refine results by increasing R without hitting memory limits; and the procedure is simple to interrupt and resume. This approach can be seen as a special case of IS² [Tran et al., 2013] or SMC² [Chopin et al., 2013, Fulop and Li, 2013]. Note that Whiteley et al. [2016] somewhat discourages this approach relative to the large N regime. Indeed, although valid for any choice of N, the estimator in (4.8) can be prohibitively inefficient if N is poorly chosen. On the other hand, this can be detected by inspecting the variance of Z_T^N or monitoring the number of roots. Tuning of the SMC samplers and performance monitoring can follow the guidelines laid out in Section 4.1.
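Combining the R independent runs as in (4.8)-(4.9) takes only a few lines; a sketch (the function name is ours), taking the per-run normalizing constant estimates and the per-run SMC estimates of π(φ):

```python
import numpy as np

def combine_runs(Zs, phis):
    """Combine R independent fixed-N SMC runs: Zs[r] = Z^{N,r} and
    phis[r] = pi^{N,r}(phi). Returns the self-normalized estimator (4.8)
    and its asymptotic variance approximation (4.9)."""
    Zs, phis = np.asarray(Zs, float), np.asarray(phis, float)
    R = len(Zs)
    est = np.sum(Zs * phis) / np.sum(Zs)                                   # (4.8)
    var = (np.sum(Zs ** 2 * (phis - est) ** 2) / R) / (np.mean(Zs) ** 2)   # (4.9)
    return est, var
```

Runs with larger normalizing constant estimates receive more weight, which is how the bias of fixed-N SMC estimators is corrected as R grows.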
More sophisticated schemes where “islands” of particles are allowed to communicate, instead of being fully independent, have been studied e.g. in Vergé et al. [2015], Whiteley et al. [2016], Sen and Thiery [2019], Heine et al. [2020].

Equation (4.7) also suggests the use of an SMC sampler as an independent proposal in a Metropolis–Hastings algorithm. This corresponds to the “particle independent Metropolis–Hastings” (PIMH) method in Andrieu et al. [2010], which is simply a standard Metropolis–Hastings algorithm with $q^N$ as the proposal and $\bar{\pi}^N$ as the target. Despite the iterative nature of MCMC, most of the computation lies in the generation of $R$ independent proposals, which can be done in parallel. One can view the retrospective construction of the PIMH chain and its associated MCMC average as a means to obtain consistency as $R \to \infty$, for any fixed $N$. Moreover, as an instance of standard Metropolis–Hastings, the approach lends itself to convergence diagnostics for MCMC [Brooks et al., 2011]. Other tools developed for generic MCMC also apply; in particular, the unbiased estimation framework proposed in Glynn and Rhee [2014] and developed in Jacob et al. [2020]. The case of PIMH is explored in Middleton et al. [2019], and some appealing features of unbiased estimators are mentioned in Section 5. Compared to the couplings of most MCMC algorithms considered in Jacob et al. [2020], which require algorithm-specific considerations, the coupling of PIMH is generic: any SMC sampler can be used to generate independent proposals as long as the “particle MCMC” identity in (4.7) is valid. Related work in Biswas et al. [2019] leverages the PIMH construction to obtain an estimable upper bound on the total variation distance between the marginal distribution of $x_T^k$ under $q^N$ and the target distribution $\pi$.

SMC samplers can be used broadly in the same settings as MCMC, and provide asymptotically consistent estimators along with asymptotically valid confidence intervals.
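A minimal sketch of the PIMH mechanism just described, assuming a user-supplied `run_smc()` that returns a normalizing constant estimate $Z_T^N$ and one terminal particle drawn with probability proportional to its weight (both names are hypothetical):

```python
import random

def pimh(run_smc, n_iters):
    """Particle independent Metropolis-Hastings: a standard MH algorithm
    with an SMC sampler as independent proposal. The acceptance ratio
    reduces to the ratio of normalizing constant estimates."""
    Z_cur, x_cur = run_smc()
    chain = [x_cur]
    for _ in range(n_iters):
        Z_prop, x_prop = run_smc()  # independent SMC proposal
        if random.random() < min(1.0, Z_prop / Z_cur):
            Z_cur, x_cur = Z_prop, x_prop  # accept the proposed run
        chain.append(x_cur)
    return chain
```

In practice the $R$ calls to `run_smc` can be precomputed in parallel and the accept/reject decisions applied retrospectively to assemble the chain.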
However, SMC samplers also have some distinctive appeals, some of which are highlighted in this section.

[Figure 5 appears here: lines in the mean-variance plane, with panels (a) Geometric path, (b) Partial posteriors, and (c) Path of least coding.]
Figure 5: Three paths of distributions connecting the prior to the posterior in a logistic regression example described in Section 2.1, represented by lines in the mean-variance plane, for each component and 10 independent runs.
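The geometric path of panel (a) is commonly defined by geometric interpolation between the initial distribution and the unnormalized target. A minimal sketch of its unnormalized log-density, assuming user-supplied `log_prior` and `log_gamma` functions (names illustrative):

```python
def log_geometric_bridge(log_prior, log_gamma, lam):
    """Unnormalized log-density of the geometric path at inverse
    temperature lam: the prior at lam = 0, the posterior at lam = 1."""
    def logpdf(x):
        return (1.0 - lam) * log_prior(x) + lam * log_gamma(x)
    return logpdf

# Toy usage: a bridge halfway between two Gaussian log-densities.
half = log_geometric_bridge(lambda x: -0.5 * x**2,
                            lambda x: -0.5 * (x - 2.0) ** 2, 0.5)
```

An SMC sampler then moves particles through a (possibly adaptive) grid of lam values from 0 to 1.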
We consider a logistic regression, as described in Section 2.1, on the "forest cover type" data [Blackard, 2000], processed as in Collobert et al. [2002]. The data contain cartographic information (relating to altitude, slope, azimuth, etc.) for 30 m by 30 m cells in northern Colorado, along with the type of cover (originally spruce/fir, lodgepole pine, ponderosa pine, cottonwood/willow, aspen, Douglas-fir or krummholz; in Collobert et al. [2002] this was simplified to lodgepole pine versus the other categories combined). Using a logistic regression we can try to predict the cover type from the cartographic variables. With an intercept and 10 covariates, there are d = 11 regression coefficients to be estimated. The prior specification is taken as a Normal distribution with mean b = (0, . . . , 0) and covariance B = diag(10, . . . , 10). Using the first m = 1000 rows of the data, Figure 5 shows the mean and variance of the d = 11 components of β for three paths of distributions: a geometric path (5a), a path of partial posteriors where observations are assimilated in batches of size 10 (5b), and a path of "least coding effort" using the Pólya–Gamma Gibbs sampler (5c). HMC moves are employed for the first two paths, and 10 independent repeats are shown. Although the three paths initialize and terminate at the prior and posterior distributions, respectively, their bridging distributions are visibly different.

We illustrate the sequential Bayesian update of the distribution of parameters given data by focusing on the path of partial posteriors. Figure 6 shows a phenomenon called "merging" (see e.g. Ghosh and Ramamoorthi [2003, Chapter 1]), whereby posteriors obtained with different priors eventually coincide as more observations are introduced. We see the phenomenon "in action" for two regression coefficients, and we observe that certain components of the posterior distribution merge faster than others. Similar figures could be used to visualize the Bernstein–von Mises phenomenon, whereby the posterior distribution becomes closer to a Normal distribution as the number of observations m increases.

Figure 6: Logistic regression with forest cover type data. Evolution of the posterior distributions of two components of β (left and right) as more data are assimilated, with initialization from two different Normal priors (blue and beige).

Sequential inference allows us to monitor the evolution of our beliefs about the parameters, and the associated performance. For example, Figure 7 shows the logarithmic score associated with the posterior predictive distribution as m increases, on a test data set. We see that predictive performance increases significantly as we start to assimilate data. However, after a certain point the predictive performance seems to stagnate. Indeed, under model misspecification, there is no guarantee that the posterior predictive performance improves with more data. The ability to monitor performance can be helpful when deciding whether the model under consideration is able to benefit from the inclusion of more data.

Figure 7: Logistic regression with forest cover type data. Performance of the posterior predictive distribution on a test data set as the first 5000 (left) and next 5000 (right) observations are assimilated, estimated using five independent runs of SMC.

Conversely, Bayesian asymptotics provide useful strategies for Monte Carlo computation. For example, we can use a Laplace approximation of the posterior as initial distribution π0, i.e. a Normal distribution centered at the MLE and with covariance given by the inverse of the information matrix at the MLE. Figure 8a shows that the approximation is extremely accurate when m is large, and leads to very high effective sample sizes in one step. Thus SMC samplers that employ the ESS criterion to select the next inverse temperature would revert back to importance sampling in this setting. The plot in Figure 8b shows the estimates of log Z divided by the number of observations m; ten repeats are overlaid but the accuracy is such that they are indistinguishable. For Monte Carlo methods with cost sublinear in m for this context, see Cornish et al. [2019], Pollock et al. [2020].

Finally, we discuss the potential of using unbiased estimators in the present setting of Bayesian generalized linear models. Recall from Section 4.2 that such estimators can be generically obtained using SMC samplers.
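The effective sample size criterion invoked in the discussion of Figure 8a is computed from normalized importance weights as ESS = 1 / Σn (w̄n)². A small self-contained sketch, with an unnormalized Normal target and a slightly mismatched Normal proposal standing in for a Laplace approximation (all numbers are illustrative, and in Python rather than the article's R code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-ins: unnormalized log-density of a N(0, 1) target, and a
# N(0.1, 1.1^2) proposal playing the role of a Laplace approximation.
def log_target(x):
    return -0.5 * x**2

def log_proposal(x):
    return -0.5 * ((x - 0.1) / 1.1) ** 2 - np.log(1.1)

N = 10_000
x = 0.1 + 1.1 * rng.standard_normal(N)   # draws from the proposal
logw = log_target(x) - log_proposal(x)   # log importance weights
w = np.exp(logw - logw.max())            # stabilized unnormalized weights
wbar = w / w.sum()                       # normalized weights
ess = 1.0 / np.sum(wbar ** 2)            # effective sample size
print(f"ESS = {ess:.0f} out of N = {N}")
```

When the proposal is close to the target, as with the Laplace approximation for large m, the ESS is close to N and a single importance sampling step suffices, matching the behavior seen in Figure 8a.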
Suppose for example that one of the covariates in the regression is actually a random draw from another model, as in two-step estimation [Murphy and Topel, 2002], and we might want to propagate the uncertainty onto the regression coefficients. In the Bayesian terminology, this could lead to a "cut distribution" [Plummer, 2015]. Jacob et al. [2020] describe how unbiased estimators can be useful in the approximation of such distributions. The same setting also occurs when some data are missing and "multiple imputation" is performed [Rubin, 1996]. Other motivations include estimating propensity scores [Zigler and Dominici, 2014], or addressing model misspecification by bagging posteriors [Bühlmann, 2014, Huggins and Miller, 2019]. All these cases are instances of the generic problem of approximating π(dx) = ∫ π(dx|η) g(dη), where we can sample η ∼ g and design a Monte Carlo method to approximate the conditional distribution π(dx|η). This is called a "nested Monte Carlo" problem in Rainforth et al. [2017].

Figure 8: Logistic regression with forest cover type data. ESS against number of observations m (left) and estimates of log Z/m (right), when initializing from a Laplace approximation of the posterior.

Figure 9: Logistic regression with forest cover type data. Path of partial posteriors averaged over orderings of the data, in the mean-variance plane. The ten realizations are obtained by nonparametric bootstrap from R = 1000 independent unbiased estimators.
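The nested Monte Carlo problem π(dx) = ∫ π(dx|η) g(dη) can be approached by drawing η ∼ g and producing, for each draw, an unbiased estimate of a quantity of interest under π(·|η); averaging these estimates targets the corresponding quantity under π. A toy sketch with Normals, using direct inner sampling in place of the SMC-based unbiased estimators of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

R = 5000   # outer draws eta ~ g
M = 10     # inner draws per eta

# Toy model: eta ~ g = N(0, 1) and x | eta ~ N(eta, 1), so that
# marginally x ~ N(0, 2) and E_pi[x^2] = 2.
estimates = np.empty(R)
for r in range(R):
    eta = rng.standard_normal()          # eta ~ g
    x = eta + rng.standard_normal(M)     # inner draws from pi(. | eta)
    estimates[r] = np.mean(x ** 2)       # unbiased for E[x^2 | eta]

# Each inner estimate is unbiased given eta, so the outer average is
# unbiased for E_pi[x^2], with confidence intervals from the R replicates.
mean = estimates.mean()
stderr = estimates.std(ddof=1) / np.sqrt(R)
print(f"{mean:.3f} +/- {1.96 * stderr:.3f}")
```

The same logic underlies the experiment below: unbiasedness of the inner estimator is what makes the outer average consistent, with valid confidence intervals from independent replicates.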
By obtaining an unbiased approximation of π(dx|η) for any sample η ∼ g, we can obtain an unbiased approximation of π(dx) itself, and thus consistent estimators and associated confidence intervals are available.

We illustrate the use of unbiased estimators by considering a variant of the path of partial posteriors, π(dβ | y_{1:m}, x_{1:m}) given m observations. This path, illustrated in Figure 5b, depends on a specific ordering of the observations. Alternatively, we can consider the distribution of posteriors averaged over orderings, π⋆_m(dβ | y, X) = (m!)^{-1} Σ_σ π(dβ | y_{σ(1:m)}, x_{σ(1:m)}), where σ(1 : m) is a permutation of [m], and the sum is over all possible permutations. To obtain unbiased estimators, we sample m observations from the entire data set at random without replacement, and run coupled PIMH chains following Middleton et al. [2019] that employ SMC samplers with N = 128 particles as proposals. We compute R = 1000 independent unbiased estimators for each m, and use empirical averages to approximate π⋆_m(β | y, X). Figure 9 illustrates the evolution of the means and variances of β as m increases.

Figure 10: SIR model with boarding school data. Observations of daily counts (top-left). Log-marginal likelihood of the initial time t0 at which the first individual is assumed to be infected (top-right). Evolution of the marginal posterior distribution of φ_inv (bottom-left) and of (γ, β) (bottom-right) as more data are assimilated.

We consider another setting where sequential inference might be particularly relevant: the modeling of disease outbreaks. Parameter calibration in this setting involves blending prior information, when available, with data arriving on a regular basis, typically daily or weekly.
Many disease outbreak models consist of a latent stochastic process that represents the underlying progression of a pathogen in a population, assumed to be partially observed with some noise. This amounts to a state space model or stochastic kinetic model [Ionides et al., 2006, Golightly and Wilkinson, 2006, He et al., 2010]; see also Britton and Pardoux [2019] for a recent textbook treatment. Various particle methods, either generic or tailored to the problem, have been employed in this setting [Del Moral and Murray, 2015, Golightly and Kypraios, 2018].

For simplicity, we consider a deterministic Susceptible-Infected-Recovered (SIR) model (e.g. Bacaër [2012]). Inference for such models can be done with generic MCMC, as described in Grinsztajn et al. [2020]. We consider an example from that article, using the classical boarding school data which contain daily counts of pupils confined to bed during an influenza outbreak, shown in Figure 10a. The model is described by the differential equations

dS/dt = −βSI/n,   dI/dt = βSI/n − γI,   dR/dt = γI,   (5.1)

where n = 763 is the total number of school children, S, I and R represent the numbers of susceptible, infected and recovered children, respectively, and γ, β > 0 are rate parameters. The initial condition is (S, I, R) = (n − 1, 1, 0) at time t = 0, i.e. with one infected individual. The observations, which begin at time t = 1, are assumed to be noisy measurements of the number I of infected children on that day. The observation noise is modeled as a negative binomial distribution parametrized by φ_inv > 0. Priors on γ, β and φ_inv are taken arbitrarily as N(0.4, 0.5), N(2, 1) and an exponential distribution with rate 5, respectively. Using the Stan implementation in Grinsztajn et al. [2020], which provides a function to evaluate the posterior log-density and its gradients [Carpenter et al., 2017], we run an adaptive SMC sampler with the path of partial posteriors. The bottom row of Figure 10 displays the time evolution of the posterior distribution of the parameters. Lastly, we consider a simple procedure to infer the initial time t0 at which the first individual is assumed to be infected. Figure 10b plots the marginal likelihood of t0, which is the normalizing constant of the corresponding posterior distribution, obtained by running SMC 10 times independently over a grid of values of t0.

We cannot expect SMC samplers that rely on MCMC kernels to perform well when these kernels have poor mixing properties. This has motivated the use of other non-equilibrium dynamics [Vaikuntanathan and Jarzynski, 2008, Heng et al., 2015, Bernton et al., 2019a, Everitt et al., 2020]. There are some studies on the advantages of interacting particle methods over Markov chains on multimodal targets [Schweizer, 2012b, Paulin et al., 2019]. Indeed, many existing methods tailored for multimodal targets rely on ideas that are conceptually similar to the SMC framework, such as the use of tempered distributions and multiple interacting chains. In this article, we have emphasized that SMC samplers offer other distinctive appeals: the types of objects they can estimate, their capacity to exploit parallel computing architectures, their performance even for high-dimensional problems, and their amenability to be used within encompassing algorithms. Thus SMC samplers can be useful even in settings where plain MCMC methods might already perform satisfactorily. The last decade has seen various advances.
A non-exhaustive list includes variance estimators [Lee and Whiteley, 2018], methods to refine the forward kernels and the path of distributions [Guarniero et al., 2017, Heng et al., 2020], new resampling schemes [Gerber et al., 2019, Li et al., 2020], the use of quasi-random numbers [Gerber and Chopin, 2015], the use of SMC as part of encompassing algorithms [Andrieu et al., 2010], and the development of probabilistic programming languages [Wood et al., 2014, Murray and Schön, 2018]. These advances illustrate and further the appeal of this class of algorithms for normalizing constant estimation and sampling in challenging scenarios. Consequently, the range of applications of SMC samplers keeps broadening, and includes Bayesian nonparametrics [Cusumano-Towner and Mansinghka, 2016, Griffin, 2017], Bayesian phylogenetic inference [Wang et al., 2015], large-scale graphical models [Lindsten et al., 2017, Naesseth et al., 2014], and partial differential equations [Beskos et al., 2017].

The code to reproduce the figures of the article is available at https://github.com/pierrejacob/smcsamplers. It is written in R [R Core Team, 2018] and employs various packages (e.g. Wickham [2016], Eddelbuettel and François [2011]).

Acknowledgements. This work was funded by CY Initiative of Excellence (grant "Investissements d'Avenir" ANR-16-IDEX-0008). Pierre E. Jacob gratefully acknowledges support by the National Science Foundation through grants DMS-1712872 and DMS-1844695. The authors thank Ian and Elizabeth Taylor for useful discussions.
References
S. Agapiou, O. Papaspiliopoulos, D. Sanz-Alonso, and A. M. Stuart. Importance sampling:Intrinsic dimension and computational cost.
Statistical Science , 32(3):405–431, 2017.C. Andrieu, J. Ridgway, and N. Whiteley. Sampling normalizing constants in high dimensionsusing inhomogeneous diffusions. arXiv preprint arXiv:1612.07583 , 2016.C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods.
Journalof the Royal Statistical Society: Series B (Statistical Methodology) , 72(3):269–342, 2010.C. Andrieu, A. Durmus, N. Nüsken, and J. Roussel. Hypocoercivity of piecewise deterministicMarkov process-Monte Carlo. arXiv preprint arXiv:1808.08592 , 2018.C. Ané, D. Bakry, and M. Ledoux.
Sur les inégalités de Sobolev logarithmiques , volume 10.Société mathématique de France Paris, 2000.Y. F. Atchadé and J. S. Rosenthal. On adaptive Markov chain Monte Carlo algorithms.
Bernoulli ,11(5):815–828, 2005.N. Bacaër. The model of Kermack and McKendrick for the plague epidemic in Bombay and thetype reproduction number with seasonality.
Journal of mathematical biology , 64(3):403–422,2012.M. A. Beaumont, J.-M. Cornuet, J.-M. Marin, and C. P. Robert. Adaptive approximate Bayesiancomputation.
Biometrika , 96(4):983–990, 2009.J. M. Bernardo and A. F. Smith.
Bayesian theory , volume 405. John Wiley & Sons, 2009.E. Bernton, J. Heng, A. Doucet, and P. Jacob. Schrödinger bridge samplers. arXiv preprintarXiv:1912.13170 , 2019a.E. Bernton, P. E. Jacob, M. Gerber, and C. P. Robert. Approximate Bayesian computationwith the Wasserstein distance.
Journal of the Royal Statistical Society: Series B (StatisticalMethodology) , 81(2):235–269, 2019b.J. Besag. Markov chain Monte Carlo for statistical inference.
Center for Statistics and the SocialSciences , 9:24–25, 2001.A. Beskos, D. Crisan, and A. Jasra. On the stability of sequential Monte Carlo methods in highdimensions.
The Annals of Applied Probability , 24(4):1396–1445, 2014.A. Beskos, A. Jasra, N. Kantas, and A. Thiery. On the convergence of adaptive sequential MonteCarlo methods.
Annals of Applied Probability , 26(2):1111–1146, 2016.A. Beskos, A. Jasra, E. A. Muzaffer, and A. M. Stuart. Sequential Monte Carlo methods forBayesian elliptic inverse problems.
Statistics and Computing , 25(4):727–737, 2015.A. Beskos, A. Jasra, K. Law, R. Tempone, and Y. Zhou. Multilevel sequential Monte Carlosamplers.
Stochastic Processes and their Applications , 127(5):1417–1440, 2017.P. G. Bissiri, C. C. Holmes, and S. G. Walker. A general framework for updating belief distri-butions.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 78(5):1103–1130, 2016. 26. Biswas, P. E. Jacob, and P. Vanetti. Estimating convergence of Markov chains with L-lagcouplings. In
Advances in Neural Information Processing Systems , pages 7389–7399, 2019.J. A. Blackard.
Comparison of neural networks and discriminant analysis in predicting forestcover types . PhD thesis, Department of Forest Sciences, Colorado State University, 2000.T. Britton and E. Pardoux.
Stochastic Epidemic Models with Inference . Springer, 2019.A. Brockwell, P. Del Moral, and A. Doucet. Sequentially interacting Markov chain Monte Carlomethods.
The Annals of Statistics , 38(6):3387–3411, 2010.S. Brooks, A. Gelman, G. Jones, and X.-L. Meng.
Handbook of Markov chain Monte Carlo . CRCpress, 2011.N. Brosse, A. Durmus, and É. Moulines. Normalizing constants of log-concave densities.
Elec-tronic Journal of Statistics , 12(1):851–889, 2018.A. Buchholz, N. Chopin, and P. E. Jacob. Adaptive tuning of Hamiltonian Monte Carlo withinsequential Monte Carlo.
Bayesian Analysis (to appear) , 2020.P. Bühlmann. Discussion of Big Bayes stories and BayesBag.
Statistical science , 29(1):91–94,2014.B. Calderhead. A general construction for parallelizing Metropolis–Hastings algorithms.
Pro-ceedings of the National Academy of Sciences , 111(49):17408–17413, 2014.B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J.Guo, P. Li, and A. Riddell. Stan: A probabilistic programming language.
Journal of statisticalsoftware , 76(1), 2017.F. Cérou, P. D. Moral, and A. Guyader. A nonasymptotic theorem for unnormalized Feynman–Kac particle models. In
Annales de l’Institut Henri Poincaré, Probabilités et Statistiques ,volume 47, pages 629–649. Institut Henri Poincaré, 2011.F. Cérou, P. Del Moral, T. Furon, and A. Guyader. Sequential Monte Carlo for rare eventestimation.
Statistics and Computing , 22(3):795–808, 2012.H. P. Chan and T. L. Lai. A general theory of particle filters in hidden Markov models and someapplications.
The Annals of Statistics , 41(6):2877–2904, 2013.S. Chatterjee and P. Diaconis. The sample size required in importance sampling.
The Annals ofApplied Probability , 28(2):1099–1135, 2018.H. M. Choi and J. P. Hobert. The Pólya–Gamma Gibbs sampler for Bayesian logistic regressionis uniformly ergodic.
Electronic Journal of Statistics , 7:2054–2064, 2013.N. Chopin. A sequential particle filter method for static models.
Biometrika , 89(3):539–552,2002.N. Chopin and C. P. Robert. Properties of nested sampling.
Biometrika , 97(3):741–755, 2010.N. Chopin, P. E. Jacob, and O. Papaspiliopoulos. SMC : an efficient algorithm for sequentialanalysis of state space models. Journal of the Royal Statistical Society: Series B (StatisticalMethodology) , 75(3):397–426, 2013.R. Collobert, S. Bengio, and Y. Bengio. A parallel mixture of SVMs for very large scale problems.In
Advances in Neural Information Processing Systems , pages 633–640, 2002.J. Cornebise, É. Moulines, and J. Olsson. Adaptive methods for sequential importance samplingwith application to state space models.
Statistics and Computing , 18(4):461–480, 2008.27. Cornish, P. Vanetti, A. Bouchard-Cote, G. Deligiannidis, and A. Doucet. Scalable Metropolis–Hastings for exact Bayesian inference with large datasets. In
International Conference onMachine Learning , pages 1351–1360, 2019.G. E. Crooks. Nonequilibrium measurements of free energy differences for microscopically re-versible Markovian systems.
Journal of Statistical Physics , 90(5-6):1481–1487, 1998.M. F. Cusumano-Towner and V. K. Mansinghka. Measuring the non-asymptotic conver-gence of sequential Monte Carlo samplers using probabilistic programming. arXiv preprintarXiv:1612.02161 , 2016.M. Cuturi, O. Teboul, Q. Berthet, A. Doucet, and J.-P. Vert. Noisy adaptive group testing usingbayesian sequential experimental design, 2020.A. S. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concavedensities.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 79(3):651–676, 2017.A. P. Dawid and M. Musio. Bayesian model selection based on proper scoring rules.
Bayesiananalysis , 10(2):479–499, 2015.P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers.
Journal of the RoyalStatistical Society: Series B (Statistical Methodology) , 68(3):411–436, 2006.P. Del Moral.
Feynman–Kac formulae . Springer, 2004.P. Del Moral and L. M. Murray. Sequential Monte Carlo with highly informative observations.
SIAM/ASA Journal on Uncertainty Quantification , 3(1):969–997, 2015.P. Del Moral, A. Doucet, and A. Jasra. An adaptive sequential Monte Carlo method for approx-imate Bayesian computation.
Statistics and Computing , 22(5):1009–1020, 2012.A. Doucet and A. Lee. Sequential Monte Carlo methods.
Handbook of Graphical Models , pages165–189, 2018.C. C. Drovandi and A. N. Pettitt. Estimation of parameters for macroparasite population evo-lution using approximate Bayesian computation.
Biometrics , 67(1):225–233, 2011.C. C. Drovandi, J. M. McGree, and A. N. Pettitt. Sequential Monte Carlo for Bayesian sequen-tially designed experiments for discrete data.
Computational Statistics & Data Analysis , 57(1):320–335, 2013.C. C. Drovandi, J. M. McGree, and A. N. Pettitt. A sequential Monte Carlo algorithm toincorporate model uncertainty in Bayesian sequential design.
Journal of Computational andGraphical Statistics , 23(1):3–24, 2014.Q. Du and A. Guyader. Variance estimation in adaptive sequential Monte Carlo. arXiv preprintarXiv:1909.13602 , 2019.J.-C. Duan and A. Fulop. Density-tempered marginalized sequential Monte Carlo samplers.
Journal of Business & Economic Statistics , 33(2):192–202, 2015.S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo.
Physics lettersB , 195(2):216–222, 1987.A. Durmus and E. Moulines. Nonasymptotic convergence analysis for the unadjusted Langevinalgorithm.
Annals of Applied Probability , 27(3):1551–1587, 2017.D. Eddelbuettel and R. François. Rcpp: Seamless R and C++ integration.
Journal of StatisticalSoftware , 40(8):1–18, 2011. doi: 10.18637/jss.v040.i08.28. G. Everitt, R. Culliford, F. Medina-Aguayo, and D. J. Wilson. Sequential Monte Carlo withtransformations.
Statistics and Computing , 30(3):663–676, 2020.P. Fearnhead and B. M. Taylor. An adaptive sequential Monte Carlo sampler.
Bayesian Analysis ,8(2):411–438, 2013.A. Finke, A. Doucet, and A. M. Johansen. Limit theorems for sequential MCMC methods. arXivpreprint arXiv:1807.01057 , 2018.A. Fulop and J. Li. Efficient learning via simulation: A marginalized resample-move approach.
Journal of Econometrics , 176(2):146–161, 2013.M. Gerber and N. Chopin. Sequential quasi Monte Carlo.
Journal of the Royal Statistical Society:Series B (Statistical Methodology) , 77(3):509–579, 2015.M. Gerber, N. Chopin, and N. Whiteley. Negative association, ordering and convergence ofresampling methods.
The Annals of Statistics , 47(4):2236–2260, 2019.J. K. Ghosh and R. Ramamoorthi.
Bayesian nonparametrics . Springer Science & BusinessMedia, 2003.W. R. Gilks and C. Berzuini. Following a moving target — Monte Carlo inference for dynamicBayesian models.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) ,63(1):127–146, 2001.M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlomethods.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 73(2):123–214, 2011.P. W. Glynn and C.-H. Rhee. Exact estimation for Markov chain equilibrium expectations.
Journal of Applied Probability , 51(A):377–389, 2014.A. Golightly and T. Kypraios. Efficient SMC schemes for stochastic kinetic models. Statisticsand Computing , 28(6):1215–1230, 2018.A. Golightly and D. J. Wilkinson. Bayesian sequential inference for stochastic kinetic biochemicalnetwork models.
Journal of Computational Biology , 13(3):838–851, 2006.N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-GaussianBayesian state estimation. In
IEE Proceedings F (Radar and Signal Processing) , volume 140(2),pages 107–113. IET, 1993.U. Grenander and M. I. Miller. Representations of knowledge in complex systems.
Journal ofthe Royal Statistical Society: Series B (Methodological) , 56(4):549–581, 1994.J. E. Griffin. Sequential Monte Carlo methods for mixtures with normalized random measureswith independent increments priors.
Statistics and Computing , 27(1):131–145, 2017.L. Grinsztajn, E. Semenova, C. C. Margossian, and J. Riou. Bayesian workflow for diseasetransmission modeling in Stan. arXiv preprint arXiv:2006.02985 , 2020.P. Grünwald and T. Van Ommen. Inconsistency of Bayesian inference for misspecified linearmodels, and a proposal for repairing it.
Bayesian Analysis , 12(4):1069–1103, 2017.P. Guarniero, A. M. Johansen, and A. Lee. The iterated auxiliary particle filter.
Journal of theAmerican Statistical Association , 112(520):1636–1647, 2017.H. Haario, E. Saksman, and J. Tamminen. An adaptive Metropolis algorithm.
Bernoulli , 7(2):223–242, 2001. 29. He, E. L. Ionides, and A. A. King. Plug-and-play inference for disease dynamics: measlesin large and small populations as a case study.
Journal of the Royal Society Interface , 7(43):271–283, 2010.K. Heine, N. Whiteley, and A. T. Cemgil. Parallelizing particle filters with butterfly interactions.
Scandinavian Journal of Statistics , 47(2):361–396, 2020.J. Heng, A. Doucet, and Y. Pokern. Gibbs flow for approximate transport with applications toBayesian computation. arXiv preprint arXiv:1509.08787 , 2015.J. Heng, A. N. Bishop, G. Deligiannidis, and A. Doucet. Controlled sequential Monte Carlo.
Annals of Statistics (to appear) , 2020.C. C. Holmes and S. G. Walker. Assigning a value to a power likelihood in a general Bayesianmodel.
Biometrika , 104(2):497–503, 03 2017. ISSN 0006-3444. doi: 10.1093/biomet/asx010.A. M. Horowitz. A generalized guided Monte Carlo algorithm.
Physics Letters B , 268(2):247–252,1991.J. H. Huggins and J. W. Miller. Using bagged posteriors for robust inference and model criticism. arXiv preprint arXiv:1912.07104 , 2019.J. H. Huggins and D. M. Roy. Sequential Monte Carlo as approximate sampling: bounds, adaptiveresampling via ∞ -ESS, and an application to particle Gibbs. Bernoulli , 25(1):584–622, 2019.E. L. Ionides, C. Bretó, and A. A. King. Inference for nonlinear dynamical systems.
Proceedingsof the National Academy of Sciences , 103(49):18438–18443, 2006.P. E. Jacob, L. M. Murray, and S. Rubenthaler. Path storage in the particle filter.
Statistics andComputing , 25(2):487–496, 2015.P. E. Jacob, J. O’Leary, and Y. F. Atchadé. Unbiased Markov chain Monte Carlo with couplings.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 2020.C. Jarzynski. Nonequilibrium equality for free energy differences.
Physical Review Letters , 78(14):2690–2693, 1997.C. Jarzynski. Hamiltonian derivation of a detailed fluctuation theorem.
Journal of StatisticalPhysics , 98(1-2):77–102, 2000.J. E. Johndrow, J. C. Mattingly, S. Mukherjee, and D. Dunson. Optimal approximating Markovchains for Bayesian inference. arXiv preprint arXiv:1508.03387 , 2015.J. E. Johndrow, A. Smith, N. Pillai, and D. B. Dunson. MCMC for imbalanced categorical data.
Journal of the American Statistical Association , 114(527):1394–1403, 2019.S.-H. Jun and A. Bouchard-Côté. Memory (and time) efficient sequential Monte Carlo. In
International Conference on Machine Learning , pages 514–522, 2014.S.-H. Jun, L. Wang, and A. Bouchard-Côté. Entangled Monte Carlo. In
Advances in NeuralInformation Processing Systems , pages 2726–2734, 2012.N. Kantas, A. Beskos, and A. Jasra. Sequential Monte Carlo methods for high-dimensionalinverse problems: A case study for the Navier–Stokes equations.
SIAM/ASA Journal onUncertainty Quantification , 2(1):464–489, 2014.S. Kirkpatrick, C. D. G. Jr., and M. P. Vecchi. Optimization by simulated annealing.
Science ,220(4598):671–680, 1983.A. Kong, J. S. Liu, and W. H. Wong. Sequential imputations and Bayesian missing data problems.
Journal of the American Statistical Association , 89(425):278–288, 1994.30. Koskela, P. A. Jenkins, A. M. Johansen, and D. Spano. Asymptotic genealogies of interactingparticle systems with an application to sequential Monte Carlo.
The Annals of Statistics , 48(1):560–583, 2020.A. Lee and N. Whiteley. Variance estimation in the particle filter.
Biometrika , 105(3):609–625,2018.A. Lee, C. Yau, M. B. Giles, A. Doucet, and C. C. Holmes. On the utility of graphics cards toperform massively parallel simulation of advanced Monte Carlo methods.
Journal of compu-tational and graphical statistics , 19(4):769–789, 2010.Y. Li, W. Wang, K. Deng, and J. S. Liu. Stratification and optimal resampling for sequentialMonte Carlo. arXiv preprint arXiv:2004.01975 , 2020.F. Lindsten, A. M. Johansen, C. A. Naesseth, B. Kirkpatrick, T. B. Schön, J. Aston, and A.Bouchard-Côté. Divide-and-conquer with sequential Monte Carlo.
Journal of Computational and Graphical Statistics, 26(2):445–458, 2017.
S. Livingstone and G. Zanella. On the robustness of gradient-based MCMC algorithms. arXiv preprint arXiv:1908.11812, 2019.
J. Marion and S. C. Schmidler. Finite sample complexity of sequential Monte Carlo estimators. arXiv preprint arXiv:1803.09365, 2018.
L. Middleton, G. Deligiannidis, A. Doucet, and P. E. Jacob. Unbiased smoothing using particle independent Metropolis-Hastings. In K. Chaudhuri and M. Sugiyama, editors, Proceedings of Machine Learning Research, volume 89, pages 2378–2387. PMLR, 16–18 Apr 2019.
J. W. Miller and D. B. Dunson. Robust Bayesian inference via coarsening. Journal of the American Statistical Association, 114(527):1113–1125, 2019.
K. M. Murphy and R. H. Topel. Estimation and inference in two-step econometric models. Journal of Business & Economic Statistics, 20(1):88–97, 2002.
L. Murray, A. Lee, and P. Jacob. Parallel resampling in the particle filter. Journal of Computational and Graphical Statistics, 25(3):789–805, 2016a.
L. Murray. GPU acceleration of the particle filter: the Metropolis resampler. arXiv preprint arXiv:1202.6163, 2012.
L. M. Murray and T. B. Schön. Automated learning with a probabilistic programming language: Birch. Annual Reviews in Control, 46:29–43, 2018.
L. M. Murray, S. Singh, P. E. Jacob, and A. Lee. Anytime Monte Carlo. arXiv preprint arXiv:1612.03319, 2016b.
C. A. Naesseth, F. Lindsten, and T. B. Schön. Nested sequential Monte Carlo methods. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML'15, pages 1292–1301. JMLR.org, 2015.
C. A. Naesseth, F. Lindsten, and T. B. Schön. Sequential Monte Carlo for graphical models. In Advances in Neural Information Processing Systems, pages 1862–1870, 2014.
R. M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.
R. M. Neal. An improved acceptance procedure for the hybrid Monte Carlo algorithm. Journal of Computational Physics, 111(1):194–203, 1994.
R. M. Neal. Hamiltonian importance sampling. In talk presented at the Banff International Research Station (BIRS) workshop on Mathematical Issues in Molecular Dynamics, 2005.
R. M. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11):2, 2011.
P. Ng and M. Maechler. A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Statistical Modelling, 7(4):315–328, 2007.
J. P. Nilmeier, G. E. Crooks, D. D. Minh, and J. D. Chodera. Nonequilibrium candidate Monte Carlo is an efficient tool for equilibrium simulation. Proceedings of the National Academy of Sciences, 108(45):E1009–E1018, 2011.
A. Nishimura and D. Dunson. Recycling intermediate steps to improve Hamiltonian Monte Carlo. Bayesian Analysis, 2018.
E. Nummelin. MC's for MCMC'ists. International Statistical Review, 70(2):215–240, 2002. doi: 10.1111/j.1751-5823.2002.tb00361.x.
J. Olsson and R. Douc. Numerically stable online estimation of variance in particle filters.
Bernoulli, 25(2):1504–1535, 2019.
A. B. Owen. Monte Carlo theory, methods and examples. TBA, 2013.
A. B. Owen, Y. Maximov, and M. Chertkov. Importance sampling the union of rare events with an application to power systems analysis. Electronic Journal of Statistics, 13(1):231–254, 2019.
B. Paige, F. Wood, A. Doucet, and Y. W. Teh. Asynchronous anytime sequential Monte Carlo. In Advances in Neural Information Processing Systems, pages 3410–3418, 2014.
S. Panigrahi, J. Markovic, and J. Taylor. An MCMC-free approach to post-selective inference. arXiv preprint arXiv:1703.06154, 2017.
D. Paulin, A. Jasra, and A. Thiery. Error bounds for sequential Monte Carlo samplers for multimodal distributions. Bernoulli, 25(1):310–340, 2019.
M. Plummer. Cuts in Bayesian graphical models. Statistics and Computing, 25(1):37–43, 2015.
M. Pollock, P. Fearnhead, A. M. Johansen, and G. O. Roberts. Quasi-stationary Monte Carlo and the ScaLE algorithm. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2020.
N. G. Polson, J. G. Scott, and J. Windle. Bayesian inference for logistic models using Pólya–Gamma latent variables. Journal of the American Statistical Association, 108(504):1339–1349, 2013.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2018.
T. Rainforth, C. Naesseth, F. Lindsten, B. Paige, J.-W. van de Meent, A. Doucet, and F. Wood. Interacting particle Markov chain Monte Carlo. In International Conference on Machine Learning, pages 2616–2625, 2016.
T. Rainforth, R. Cornish, H. Yang, A. Warrington, and F. Wood. On nesting Monte Carlo estimators. arXiv preprint arXiv:1709.06181, 2017.
P. Rebeschini and R. Van Handel. Can local particle filters beat the curse of dimensionality? The Annals of Applied Probability, 25(5):2809–2866, 2015.
J. Ridgway. Computation of Gaussian orthant probabilities in high dimension. Statistics and Computing, 26(4):899–916, 2016.
M. Rischard, P. E. Jacob, and N. Pillai. Unbiased estimation of log normalizing constants with applications to Bayesian cross-validation. arXiv preprint arXiv:1810.01382, 2018.
G. O. Roberts and R. L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations.
Bernoulli, 2(4):341–363, 1996.
R. Royall and T.-S. Tsou. Interpreting statistical evidence by using imperfect models: robust adjusted likelihood functions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2):391–404, 2003.
D. B. Rubin. Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434):473–489, 1996.
R. Salomone, L. F. South, C. C. Drovandi, and D. P. Kroese. Unbiased and consistent nested sampling via sequential Monte Carlo. arXiv preprint arXiv:1805.03924, 2018.
C. Schäfer and N. Chopin. Sequential Monte Carlo on large binary sampling spaces. Statistics and Computing, 23(2):163–184, 2013.
E. Schöll-Paschinger and C. Dellago. A proof of Jarzynski's nonequilibrium work theorem for dynamical systems that conserve the canonical distribution. The Journal of Chemical Physics, 125(5):054105, 2006.
N. Schweizer. Non-asymptotic error bounds for sequential MCMC and stability of Feynman–Kac propagators. arXiv preprint arXiv:1204.2382, 2012a.
N. Schweizer. Non-asymptotic error bounds for sequential MCMC methods in multimodal settings. arXiv preprint arXiv:1205.6733, 2012b.
D. Sen and A. H. Thiery. Particle filter efficiency under limited communication. arXiv preprint arXiv:1904.09623, 2019.
S. Shao, P. E. Jacob, J. Ding, and V. Tarokh. Bayesian model comparison with the Hyvärinen score: computation and consistency. Journal of the American Statistical Association, pages 1–24, 2019.
S. A. Sisson, Y. Fan, and M. M. Tanaka. Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences, 104(6):1760–1765, 2007.
J. Skilling. Nested sampling for general Bayesian computation. Bayesian Analysis, 1(4):833–859, 2006.
C. Snyder, T. Bengtsson, P. Bickel, and J. Anderson. Obstacles to high-dimensional particle filtering. Monthly Weather Review, 136(12):4629–4640, 2008.
L. F. South, A. N. Pettitt, and C. C. Drovandi. Sequential Monte Carlo samplers with independent Markov chain Monte Carlo proposals. Bayesian Analysis, 14(3):773–796, 2019.
M.-N. Tran, M. Scharth, M. K. Pitt, and R. Kohn. Importance sampling squared for Bayesian inference in latent variable models. arXiv preprint arXiv:1309.3339, 2013.
S. Vaikuntanathan and C. Jarzynski. Escorted free energy simulations: improving convergence by reducing dissipation. Physical Review Letters, 100(19):190601, 2008.
S. S. Vempala and A. Wibisono. Rapid convergence of the unadjusted Langevin algorithm: log-Sobolev suffices. In Advances in Neural Information Processing Systems, pages 8092–8104, 2019.
C. Vergé, C. Dubarry, P. Del Moral, and E. Moulines. On parallel implementation of sequential Monte Carlo methods: the island particle model. Statistics and Computing, 25(2):243–260, 2015.
L. Wang, A. Bouchard-Côté, and A. Doucet. Bayesian phylogenetic inference using a combinatorial sequential Monte Carlo method. Journal of the American Statistical Association, 110(512):1362–1374, 2015. doi: 10.1080/01621459.2015.1054487.
N. Whiteley, A. Lee, and K. Heine. On the role of interaction in sequential Monte Carlo algorithms. Bernoulli, 22(1):494–529, 2016.
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN 978-3-319-24277-4. URL https://ggplot2.tidyverse.org.
F. Wood, J. W. Meent, and V. Mansinghka. A new approach to probabilistic programming inference. In Artificial Intelligence and Statistics, pages 1024–1032, 2014.
Y. Zhou, A. M. Johansen, and J. A. Aston. Toward automatic model comparison: an adaptive sequential Monte Carlo approach. Journal of Computational and Graphical Statistics, 25(3):701–726, 2016.
C. M. Zigler and F. Dominici. Uncertainty in propensity score estimation: Bayesian methods for variable selection and model-averaged causal effects.