Spatiotemporal blocking of the bouncy particle sampler for efficient inference in state space models
Jacob Vorstrup Goldman and Sumeetpal S. Singh Signal Processing and Communications Laboratory, Department of Engineering, University of Cambridge
Abstract
We propose a novel blocked version of the continuous-time bouncy particle sampler of Bouchard-Côté et al. [2018] applicable to any differentiable probability density. Motivated by Singh et al. [2017], we also introduce an alternative implementation that leads to significant improvement in terms of effective sample size per second, and furthermore allows for parallelization at the cost of an extra logarithmic factor. The new algorithms are particularly efficient for latent state inference in high-dimensional state space models, where blocking in both space and time is necessary to avoid degeneracy of the proposal kernel. The efficiency of the blocked bouncy particle sampler, in comparison with both the standard implementation of the bouncy particle sampler and the particle Gibbs algorithm of Andrieu et al. [2010], is illustrated numerically for both simulated data and a challenging real-world financial dataset.
Markovian state space models are a class of probabilistic graphical models applied in biology, signal processing, target tracking, finance and more; see Cappé et al. [2006] for a technical overview. In our setup, a latent process $(x_n, n \geq 1)$ on $\mathbb{R}^d$ evolves according to a state transition density $p(x_n \mid x_{n-1})$, with $p(\cdot)$ denoting a generic density. The dimension of the latent process is denoted the spatial dimension, although often no physically relevant interpretation may be available. We indirectly observe the latent process through a noisy set of observations $(y_n, 1 \leq n \leq N)$ defined on $\mathbb{R}^m$ with $m \leq d$, where the realizations depend only on the current value of the latent state, $y_n \mid x_n \sim p(y_n \mid x_n)$. For convenience we introduce the sequence notation $i{:}j = (i, i+1, \ldots, j-1, j)$ and $[n] = (1, 2, \ldots, n-1, n)$. Unless otherwise mentioned, the observation sequence $y_{1:N}$ is fixed throughout. Given the observation sequence, we define smoothing as the off-line estimation of the conditional joint probability density $p(x_{l:m} \mid y_{1:N})$, with $1 \leq l \leq m \leq N$. We will be interested in the case where the target is the full conditional $p(x_{1:N} \mid y_{1:N})$. Smoothing is generally a hard problem due to the high dimensionality of the state space and the spatiotemporal interdependence of the latent states; below we give a brief historical overview, and subsequently detail our contributions.

Sequential Monte Carlo methods form the backbone of most smoothing algorithms. A popular early example is the sequential importance resampling smoother of Kitagawa [1996], which utilises the entire trajectories and final particle weights of a particle filter to generate smoothed estimates. This method suffers from severe particle degeneracy, as the resampling step non-strictly decreases the number of distinct paths available to estimate the joint posterior. A solution was the algorithm of Godsill et al.

As a solution to the issues in high dimension, we propose a novel blocked sampling scheme based on irreversible, continuous-time piecewise deterministic Markov processes. Methods based on this class of stochastic process were originally introduced as event-chain Monte Carlo in the statistical physics literature by Bernard et al. [2009], and have recently been developed further in the computational statistics literature; see for example Bouchard-Côté et al. [2018], Bierkens et al. [2019], Wu and Robert [2018], Power and Goldman [2019].
In practice, the algorithms iterate persistent dynamics of the state variable with jumps in the direction at random event-times. They also only depend on evaluations of the gradient of the log-posterior. Local versions of these samplers, see Bouchard-Côté et al. [2018] and Bierkens et al. [2020], can exploit any additive structure of the log-posterior density to more efficiently update trajectories; however, as discussed above, long-range dependencies of states indicate that sharing of information is desirable to achieve efficient mixing. To allow for sharing of information, we introduce a blocked version of the bouncy particle sampler of Bouchard-Côté et al. [2018] that utilises arbitrarily designed overlapping blocks. The blocking scheme is implementable without any additional assumptions on the target distribution, and is therefore useful for generic target densities, particularly in cases where the associated factor graph is highly dense.

As our second contribution, we introduce an alternative implementation scheme of the blocked sampler that leverages partitions to simultaneously update entire sets of blocks. The number of competing exponential clocks in the resulting sampler is independent of dimension, featuring $O(1)$ clocks for any target, and allows, for the first time for a piecewise-deterministic Monte Carlo algorithm, parallel updates at event-times at the cost of an additional logarithmic factor. The rate of decay of the event-times is, furthermore, logarithmic in the number of blocks, compared to linear in current algorithms.

Our numerical examples indicate that the blocked samplers can achieve noteworthy improvements compared to the bouncy particle sampler, both in terms of mixing time and effective sample size per unit of time, even without the use of parallelization.
In addition, the blocked sampler provides efficient sampling of state space models when particle Gibbs methods, which are widely considered state of the art for state space models, fail due to high spatial dimensions.

In what follows, subscript on a variable $x$ denotes temporal indices, while superscript indicates spatial indices. By $x \sim \mathcal{N}(0, 1)$ we mean that $x$ is distributed as a standard normal variable, whereas by $\mathcal{N}(x; 0, 1)$ we mean the evaluation at $x$ of the standard normal density; this notation is extended to other densities. A generic Poisson process is denoted by $\Pi$, and the associated, possibly inhomogeneous, rate function is the function $t \mapsto \lambda(t)$. Let $M_{m,n}$ be the space of $m \times n$ real-valued matrices, with $m$ referring to rows and $n$ to columns, respectively. We denote by $\star$ the Hadamard product operator. The standard Frobenius norm of a matrix $X \in M_{m,n}$ is denoted $\|X\|_F = \sqrt{\mathrm{tr}(X^T X)} = \sqrt{\sum_i \sum_j x_{i,j}^2}$, and the Frobenius inner product with another matrix $Y \in M_{m,n}$ is subsequently $\langle X, Y \rangle_F = \mathrm{tr}(X^T Y) = \sum_i \sum_j x_{i,j} y_{i,j}$.

State space models

The class of state space models we consider have differentiable transition and observation densities
$$p(x_1) = \exp\{-f_1(x_1)\}, \quad f_1 \in C^1(\mathbb{R}^d \to \mathbb{R}),$$
$$p(x_n \mid x_{n-1}) = \exp\{-f(x_{n-1}, x_n)\}, \quad f \in C^1(\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}),$$
$$p(y_n \mid x_n) = \exp\{-g(x_n, y_n)\}, \quad g \in C^1(\mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}).$$
It is thus natural to work in log-space for the remainder of the paper, and we note in this regard that all probability density functions are assumed to be normalised. The exponential notation is therefore merely a notational convenience to avoid repeated mention of log-densities. We also only require access to derivatives of $f$ and $g$, which may have more convenient expressions than the full probability distribution. The negative log of the joint state density of the entire latent state $x \in M_{d,N}$ is denoted the potential energy $U : M_{d,N} \to \mathbb{R}$, and is given as
$$U(x_{1:N} \mid y_{1:N}) \equiv -\log \pi(x_{1:N} \mid y_{1:N}) = f_1(x_1) + g(x_1, y_1) + \sum_{n=2}^{N} \left[ f(x_{n-1}, x_n) + g(x_n, y_n) \right].$$
To ease notation we will drop the explicit dependence on $y_{1:N}$ when writing the conditional joint state density from now on.
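The additive structure of the potential can be sketched numerically. The helper below is our own; the Gaussian random-walk energies at the bottom are an illustrative assumption (with normalising constants dropped), not the paper's model.

```python
import numpy as np

def potential(x, y, f1, f, g):
    """U(x) = f1(x_1) + g(x_1, y_1) + sum_{n>=2} [f(x_{n-1}, x_n) + g(x_n, y_n)],
    with x, y stored as d x N matrices whose columns are time steps."""
    U = f1(x[:, 0]) + g(x[:, 0], y[:, 0])
    for n in range(1, x.shape[1]):
        U += f(x[:, n - 1], x[:, n]) + g(x[:, n], y[:, n])
    return U

# Illustrative Gaussian random-walk energies (an assumption): standard-normal
# prior, unit-variance transitions and observations, constants dropped.
f1 = lambda x1: 0.5 * x1 @ x1
f = lambda xp, xn: 0.5 * (xn - xp) @ (xn - xp)
g = lambda xn, yn: 0.5 * (yn - xn) @ (yn - xn)
```

Only derivatives of `f1`, `f` and `g` are needed by the samplers; the function above is simply the sum appearing in the display.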
We will often need to refer to the derivative $\partial U / \partial x$, which we denote as the matrix map $\nabla U : M_{d,N} \to M_{d,N}$, where the entry in the $k$'th row and $n$'th column is given by the partial derivative $\nabla U(x)_{k,n} = \partial U(x) / \partial x_n^k$. Again, we remind the reader that subscript on a variable $x$ denotes temporal indices, while superscript indicates spatial indices.

Recall that $[n] = (1, 2, \ldots, n-1, n)$. A blocking strategy $\mathcal{B}$ is a cover of the index set of the latent states $I = [d] \times [N]$, and solely consists of rectangular subsets. A generic block $B$ is thus always of the form $i{:}j \times l{:}m$ with $i < j$, $l < m$, with the coordinates referring to the spatial and temporal dimensions, respectively. The size of a block is the ordered pair $(|i{:}j|, |l{:}m|)$. Blocks are allowed to overlap; we denote by the interior of a block the indices that it does not share with any other block. The neighbourhood set of a block is the blocking substrategy $N(B) = \{B' \in \mathcal{B} \mid B \cap B' \neq \emptyset\}$; it per definition always includes the block itself. A strategy is temporal if each block in the strategy is of the form $1{:}d \times l{:}m$; these are the most natural strategy types to consider for state space models and will be the general focus in the rest of the paper, but the methods presented below work for arbitrary strategies. To improve mixing of blocked samplers in general, it is often necessary to design a blocking strategy such that within-block correlation between variables is large while the correlation with out-of-block variables is small. For state space models, this naturally indicates blocking across time, and in Figure 1 a temporal strategy with overlap $\xi$ and interior $\delta$ is illustrated. We can in this case divide the blocks into even and odd subsets such that each subset consists of non-overlapping blocks; see again Figure 1. As analyzed in Singh et al. [2017] for blocked Gibbs samplers, temporal overlap leads to improved sharing of information across time and subsequently improved mixing. If the spatial dimension is very high, it can be necessary to block in the spatial domain as well; blocking strategies should in this case aim to exploit any spatial decorrelation if possible.
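A temporal blocking strategy of this form can be sketched as index ranges. The helper names (`temporal_blocks`, `restrict`) are our own, and the construction is one plausible way to realise overlapping temporal blocks; it is not the paper's reference implementation.

```python
import numpy as np

def temporal_blocks(d, N, width, overlap):
    """Temporal blocking strategy: blocks of the form 1:d x l:m, with
    consecutive blocks sharing `overlap` time indices (illustrative sketch)."""
    blocks, l = [], 0
    while l < N:
        m = min(l + width, N)
        blocks.append((slice(0, d), slice(l, m)))  # (spatial range, temporal range)
        if m == N:
            break
        l = m - overlap
    return blocks

def restrict(x, block):
    """x_B: the submatrix of x picked out by the block's row/column ranges."""
    rows, cols = block
    return x[rows, cols]
```

With this layout the even and odd sub-strategies are simply `blocks[0::2]` and `blocks[1::2]`, since blocks two apart never share time indices when the overlap is smaller than the interior.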
Figure 1: A temporal blocking strategy, with overlap $\xi$ and interior $\delta$ between blocks highlighted. The strategy will be efficient if the overlap $\xi$ is large enough to incorporate relevant information from neighbours.

Recall that for matrices and spaces of matrices, the notation $M_{m,n}$ indicates $m$ rows and $n$ columns, respectively. The restriction of $x \in M_{d,N}$ to a block $B = i{:}j \times l{:}m$ is the submatrix $x_B \in M_{|i:j|,|l:m|}$ corresponding to deleting all but the rows $i{:}j$ and columns $l{:}m$ of $x$. Similarly, the block restriction of $\nabla U$ is the map $\nabla_B U : M_{d,N} \to M_{|i:j|,|l:m|}$; the entries of the submatrix $\nabla_B U(x)$ are in correspondence with $\nabla U(x)$ via $\nabla_B U(x)_{a,b} = \nabla U(x)_{i+a-1, l+b-1}$.

In this section we derive conditions under which the bouncy particle sampler of Peters et al. [2012], Bouchard-Côté et al. [2018] can be run in blocked fashion; the resulting algorithm therefore applies to any differentiable target distribution $\pi$. If $\mathcal{B}$ consists of a single block of size $1{:}d \times 1{:}N$, the description below reduces to the standard bouncy particle sampler, so we do not describe the latter separately.

The class of piecewise-deterministic Markov processes we consider is a coupling of the solution $x(t)$ of the ordinary differential equation $dx(t)/dt = v(t)$ and a Markov jump process $v(t)$, where both the transition operator $Q(v, dv)$ and the rate process $\lambda(t)$ depend on $x(t)$ as well; $v(t)$ will henceforth be denoted the velocity. The joint process $(x(t), v(t))$ takes values in $M_{d,N} \times M_{d,N}$. Given an initialization $(x(0), v(0))$, the state flows as $(x(t), v(t)) = (x(0) + t \cdot v(0), v(0))$ until an event $\tau$ is generated by an inhomogeneous Poisson process with rate $\lambda(t)$. At this point the velocity jumps via the kernel $Q$, and the process re-initializes at $(x(\tau), v(\tau))$.
To use such a process for Markov chain Monte Carlo, the jump rate $\lambda(t)$ and transition kernel $Q$ of $v(t)$ are chosen such that the marginal stationary distribution of $(x(t))_{t \in [0,\infty)}$ is the target distribution of interest. Exactly as in Metropolis-Hastings algorithms, we want to accept all moves into regions of higher probability, and desire to change direction, by a new choice of velocity vector, as we enter regions of declining probability. This in turn implies that the rate is determined by the directional derivative of the energy $U$ in the direction of $v$, while the transition kernel $Q$ is a deterministic involution or a random velocity change; for general details see Vanetti et al. [2017].

Blocking of this process corresponds to a localization of the rate function and transition kernel such that each block is equipped with its own random clock and corresponding local velocity updating mechanism. Subsequently, only velocities within a single block are changed at an event, while preserving the overall invariant distribution. In comparison with discrete-time blocking, which updates the variables one block at a time while keeping every other variable fixed, in continuous time the block direction is changed while keeping every other direction fixed. For dimensions that lie in multiple blocks, the additional clocks imply an excess number of events compared to the base case of no overlap; the $\phi$ variable introduced below adjusts for this discrepancy by speeding up the velocity of the shared dimensions. Intuitively, as a dimension shared by $k$ blocks will have events $k$ times as often, it should move at $k$ times the speed to compensate. This also aligns exactly with discrete-time blocked sampling, where dimensions shared between two blocks are updated twice as often.

We now present the blocked bouncy particle sampler in detail. We assume that the velocity is distributed such that each $v_n^k \sim \mathcal{N}(0, 1)$ in stationarity. For a blocking strategy $\mathcal{B}$, we introduce the auxiliary variable $\phi \in M_{d,N}$ with entries
$$\phi_n^k = \#\{B \in \mathcal{B} \mid (k, n) \in B\};$$
$\phi_n^k$ counts the number of blocks that include the $k$'th spatial dimension and $n$'th temporal dimension. Given $\phi$, the resulting block-augmented flow of the ordinary differential equation driving $x(t)$ is $t \mapsto x + t \cdot (\phi \star v)$; as mentioned, individual dimensions of $x$ are sped up in proportion to how many blocks include them. With $x \mapsto \{x\}^+ \equiv \max\{0, x\}$, the rate function for the Poisson process $\Pi_B$ associated with block $B$ is
$$\lambda_B(x, v) = \{\langle \nabla_B U(x), v_B \rangle_F\}^+;$$
the associated superposition over all blocks is the Poisson process $\Pi_{\mathcal{B}} = \cup_{B \in \mathcal{B}} \Pi_B$. Events generated by $\Pi_{\mathcal{B}}$ are denoted $\tau^b$, with $b$ referring to a bounce. Note that the inner product corresponds to the directional derivative $\partial U(x + t \cdot v)/\partial t$ restricted to $B$. For the transition kernel, we define $\mathrm{reflect}_x^B(v)$ as the (deterministic) reflection of the velocity $v_B$ in the hyperplane tangent to the block gradient at $x$:
$$\mathrm{reflect}_x^B(v_B) = v_B - 2 \frac{\langle \nabla_B U(x), v_B \rangle_F}{\|\nabla_B U(x)\|_F^2} \nabla_B U(x),$$
which, again, only updates the velocities that correspond to the block. Velocity resampling via an independent homogeneous Poisson process with rate $\gamma$ alleviates issues with irreducibility, see [Bouchard-Côté et al., 2018, Section 2.2]; these event-times are denoted $\tau^r$, with $r$ referring to refreshment. Without writing the refreshment operator, the infinitesimal generator of $(x(t), v(t))_{t \in [0,\infty)}$ is
$$\mathcal{L}_{bBPS} f(x, v) = \langle \nabla_x f(x, v), \phi \star v \rangle_F + \sum_{B \in \mathcal{B}} \lambda_B(x, v) \left[ f(x, \mathrm{reflect}_x^B(v)) - f(x, v) \right], \qquad (1)$$
the sum of the block-augmented linear flow $\phi \star v$ driving $x(t)$ and the sum of Markov jump processes updating the block-restricted velocities $v_B$.

Proposition 1.
Consider a blocking strategy $\mathcal{B}$ and a target density $\pi(x) \propto \exp\{-U(x)\}$. With the generator defined in Equation (1), the blocked bouncy particle sampler has invariant distribution $\pi \otimes p_v$.

Proof. See Section A.1.

The method most closely corresponding to the blocked bouncy particle sampler is the factor algorithm presented in [Bouchard-Côté et al., 2018, Section 3.1]. If the target distribution factorises over a finite set of individual factors $\mathcal{F}$ such that
$$U(x) = \sum_{f \in \mathcal{F}} U_f(x_f),$$
where $x_f$ corresponds to the restriction of $x$ to the components in the factor, the local bouncy particle sampler of Bouchard-Côté et al. [2018] can be applied. This contrasts with the blocked sampler, where blocks are allowed to share variables and can be designed without any particular demands on the sum-structure of the energy. In practice the blocked sampler functions as a hybrid between the Zig-Zag sampler of Bierkens et al. [2019] and the bouncy particle sampler: it incorporates the reflection operator when performing bounces, which allows for updating the velocity vector of multiple dimensions at event-times, but combines this with a more local rate structure akin to that of the Zig-Zag sampler. In particular, if $|B| = 1$ for all $B \in \mathcal{B}$, then $\phi_n^k = 1$ for all $(k, n) \in [d] \times [N]$, and the algorithm reduces to the Zig-Zag sampler, modulo a different stationary distribution for the velocities. In this sense, the Zig-Zag sampler is naturally blocked, but does not allow for sharing of information across dimensions. In Algorithm 1 the blocked bouncy particle sampler is presented in an implementable form.

Due to the simplicity of the flow, the computational challenge of the algorithm is to generate correctly distributed event-times via Poisson thinning.
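The block rate $\lambda_B$ and the reflection operator defined above can be sketched numerically; `rate` and `reflect` are our own helper names, operating on the block-restricted gradient and velocity as NumPy arrays.

```python
import numpy as np

def rate(grad_B, v_B):
    """lambda_B(x, v): positive part of the Frobenius inner product
    <grad_B U(x), v_B>_F, i.e. the block directional derivative of U."""
    return max(np.sum(grad_B * v_B), 0.0)

def reflect(grad_B, v_B):
    """Reflect v_B in the hyperplane orthogonal to the block gradient."""
    return v_B - 2.0 * np.sum(grad_B * v_B) / np.sum(grad_B ** 2) * grad_B
```

Reflection is an involution and preserves the Frobenius norm of $v_B$, which is what keeps the standard normal velocity distribution invariant.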
The thinning procedure of Lewis and Shedler [1979] for simulating inhomogeneous Poisson processes is a two-step procedure: first find a bounding process for which direct simulation is available, and subsequently use rejection sampling to keep correctly distributed event-times.

To employ thinning, a local upper bound $t \mapsto \bar\lambda_B(t)$ for each block needs to be estimated. For some fixed lookahead time $\theta > 0$ and current state $(x, v)$, local bounds satisfy
$$\bar\lambda_B(t) \geq \max_{s \in [0,\theta)} \{\langle \nabla_B U(x + s \cdot v), v_B \rangle_F\}^+, \quad \forall t \in [0, \theta),$$
and after $\theta$ time has passed, the bounds are recomputed at the new location $(x + \theta v, v)$ if no reflection or refreshment has occurred in the interim. In some particular cases, universal global bounds can be derived, but generally these bounds will have to be estimated by evaluating the rate function at some future time-point. If the blocks are individually log-concave densities, evaluating the rate at the lookahead time, $\lambda_B(\theta)$, gives a valid bound until an event occurs. If blocks are overlapping, the local bounds of blocks in $N(B)$ become invalid after a reflection and require updating. The generic process is given in Algorithm 2. Given the global bounding function $\bar\lambda_{\mathcal{B}}(t) = \sum_{B \in \mathcal{B}} \bar\lambda_B(t)$, an event-time $\tau$ is simulated from $\Pi_{\bar\lambda_{\mathcal{B}}}$, a block $B$ is selected with probability proportional to its relative rate $\bar\lambda_B(\tau)/\bar\lambda_{\mathcal{B}}(\tau)$, and finally a reflection is carried out with probability corresponding to the true rate relative to the local bound, $\lambda_B(\tau)/\bar\lambda_B(\tau)$. Given the local rate functions, the dominant cost is the unsorted proportional sampling of a block, which is done in $O(|\mathcal{B}|)$.

As mentioned in the introduction, Singh et al. [2017] show that the even-odd blocking strategy with overlaps improves mixing, and furthermore allows for parallelization of updates in the case of Kalman smoothers or particle filter-based smoothing algorithms.
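The thinning mechanism described above can be sketched in a few lines; the constant-bound interface and the function name are illustrative assumptions.

```python
import numpy as np

def next_event_by_thinning(lam, lam_bar, rng, horizon):
    """Lewis-Shedler thinning with a constant bound lam_bar >= lam(t) on
    [0, horizon): propose event-times from the homogeneous bounding process,
    accept each with probability lam(t)/lam_bar. Returns None once the
    lookahead horizon is exceeded, signalling that bounds must be recomputed."""
    t = 0.0
    while True:
        t += rng.exponential(1.0 / lam_bar)
        if t >= horizon:
            return None
        if rng.uniform() < lam(t) / lam_bar:
            return t
```

Returning `None` mirrors the bound-horizon bookkeeping of the algorithms below: the state is moved forward deterministically and fresh local bounds are estimated.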
Algorithm 1: Blocked Bouncy Particle Sampler
Data: initialize $(x^0, v^0) \sim p(x^0, v^0)$; set overall time $t = 0$, index $j = 0$, runtime $T > 0$, lookahead time $\theta > 0$, bound horizon $\Theta = \theta$.
  $(\bar\lambda_B)_{B \in \mathcal{B}} \leftarrow$ LocalBounds$(x^0, v^0, \theta, \mathcal{B})$   // calculate initial bounds
  while $t \leq T$ do
    $j \leftarrow j + 1$
    $\tau^b \sim \mathrm{Exp}(\sum_B \bar\lambda_B)$   // reflection (bounce) time
    $\tau^r \sim \mathrm{Exp}(\gamma)$   // refreshment time
    $\tau^j \leftarrow \min\{\tau^b, \tau^r\}$
    if $\tau^j + t > \Theta$ then   // valid time for bounds exceeded, reinitialize at $\Theta$
      $x^j \leftarrow x^{j-1} + (\Theta - t) \cdot \phi \star v^{j-1}$;  $v^j \leftarrow v^{j-1}$
      $(\bar\lambda_B)_{B \in \mathcal{B}} \leftarrow$ LocalBounds$(x^j, v^j, \theta, \mathcal{B})$
      $t \leftarrow \Theta$   // update overall time
      $\Theta \leftarrow \Theta + \theta$   // new valid bound time
    else
      $t \leftarrow t + \tau^j$   // update overall time
      $x^j \leftarrow x^{j-1} + \tau^j \cdot \phi \star v^{j-1}$
      if $\tau^j = \tau^b$ then   // select block for reflection
        draw $B \in \mathcal{B}$ with $\mathbb{P}(B = B_i) = \bar\lambda_{B_i}(\tau^j)/\bar\lambda(\tau^j)$
        $u \sim \mathrm{U}[0, 1]$
        if $u < \lambda_B(x^j_B, v^{j-1}_B)/\bar\lambda_B(\tau^j)$ then
          $v^j_B \leftarrow \mathrm{reflect}_x^B(v^{j-1}_B)$
          $(\bar\lambda_{B'})_{B' \in N(B)} \leftarrow$ LocalBounds$(x^j, v^j, \Theta - t, N(B))$   // update bounds for blocks affected by the reflection
        else
          $v^j \leftarrow v^{j-1}$
      else   // refresh all velocities
        $v^j \sim \mathcal{N}(0, I_{dN})$
  return $(x^j, v^j, \tau^j)_{j \geq 1}$

Algorithm 2: LocalBounds$(x, v, \theta, \mathcal{B})$
Data: $(x, v)$, $\theta > 0$, $\mathcal{B}$.
  for $B \in \mathcal{B}$ do
    find a function $t \mapsto \bar\lambda_B(t)$ that on $[0, \theta)$ satisfies $\bar\lambda_B(t) \geq \max_{s \in [0,\theta)} \lambda_B(x + s \cdot v, v_B)$
  return $(\bar\lambda_B)_{B \in \mathcal{B}}$

Conversely, the current crop of piecewise-deterministic Markov process-based samplers are all purely sequential, in the sense that at each event-time only the velocity of a single factor or dimension is updated, and these samplers therefore fail to exploit any conditional independence structure available. We will in this section provide an alternative implementation of the blocked bouncy particle sampler that mimics the even-odd strategy of discrete-time blocked samplers, extends to the fully spatially blocked setting, and allows for parallel implementation of updates at event-times. To utilise this method, we need a partition of the blocking strategy into sub-blocking strategies such that no two blocks in any sub-blocking strategy share any variables. To this end, we capture the no-overlap condition precisely in the following assumption:
Assumption 1.
Consider a blocking strategy $\mathcal{B}$. We assume given a partition $\cup_{k=1}^K \mathcal{B}_k = \mathcal{B}$ of the blocking strategy that satisfies, for each sub-blocking strategy $\mathcal{B}_k$, $k = 1, 2, \ldots, K$, and for all blocks $B, B' \in \mathcal{B}_k$ with $B \neq B'$, that $B \cap B' = \emptyset$.

This assumption also applies to fully spatiotemporal blocking schemes and not just temporal strategies. For illustrative purposes we describe in detail only the simplest even-odd scheme of temporal blocking, which corresponds to $K = 2$ sub-blocking strategies such that no temporally adjacent blocks are in the same sub-blocking strategy. Assigning an integer ordering $k$ to the blocks of $\mathcal{B}$ based on the first temporal index of each block, we partition the strategy into two sets based on whether $k$ is even or odd, and denote the partition $\{\mathcal{B}_{odd}, \mathcal{B}_{even}\}$. See Figure 1 for an illustration valid for any Markovian state space model. Under this partition, one can observe that the largest rate amongst all blocks in a sub-blocking strategy is a valid rate for each of its blocks, and the algorithm will have two exponential clocks with rates
$$\widehat\Lambda_{odd}(t) = \max_{B \in \mathcal{B}_{odd}} \lambda_B(t), \qquad \widehat\Lambda_{even}(t) = \max_{B \in \mathcal{B}_{even}} \lambda_B(t)$$
and their associated Poisson processes $\Pi_{\mathcal{B}_{odd}}$ and $\Pi_{\mathcal{B}_{even}}$. Bounds of $\widehat\Lambda_{odd}$ are denoted $\bar\Lambda_{odd}$. Consider an event generated by their superposition, and say $\Pi_{\mathcal{B}_{odd}}$ generated the event. Then for each $B \in \mathcal{B}_{odd}$, we consider a kernel $Q_x^B(v, dv')$ that updates the velocity of the block:
$$Q_x^B(v, dv') = \mathrm{reflect}_x^B(dv') \frac{\lambda_B(t)}{\widehat\Lambda_{odd}(t)} + \delta_v(dv') \left(1 - \frac{\lambda_B(t)}{\widehat\Lambda_{odd}(t)}\right).$$
By the construction of $\mathcal{B}_{odd}$ and $\mathcal{B}_{even}$, the updated velocities never conflict, as the blocks cannot overlap. We show invariance for the particular case considered above; the result holds in general for any partition satisfying Assumption 1.

Proposition 2.
Let $\{\mathcal{B}_{odd}, \mathcal{B}_{even}\}$ be a temporal strategy for $\pi$ and $\mathcal{B}$ satisfying Assumption 1. Then the Markov process with associated generator
$$\mathcal{L}_{eoBPS} f(x, v) = \langle \nabla_x f(x, v), \phi \star v \rangle_F + \sum_{\kappa \in \{odd, even\}} \widehat\Lambda_\kappa(x, v) \sum_{B \in \mathcal{B}_\kappa} \int \left[ f(x, v') - f(x, v) \right] Q_x^B(v, dv')$$
has invariant distribution $\pi \otimes p_v$.

Proof. See Section A.2.

In contrast to the basic blocked BPS, the generator of Proposition 2 has a single overall event-time, drawn from either the odd or even strategy, but multiple overlapping event-times for the blocks contained in the sub-blocking strategy that generated the event.

As an aside, if the spatial dimension is significant, it will be necessary to also carry out spatial blocking. Under a full spatiotemporal strategy, the above implementation naturally extends to a four-clock system, consisting of alternating even-odd temporal strategies over each 'row' of spatial blocks, such that no blocks from the same substrategy overlap; this in turn guarantees that Assumption 1 is satisfied.

In comparison with the blocked bouncy particle sampler, the even-odd implementation iterates over every block in the sub-blocking strategy that generated the event, updating the velocities of each block with probability proportional to the ratio of the block's rate $\lambda_B(t)$ to the rate of the sub-blocking strategy given by the max-bound. It therefore becomes possible to parallelize the updating step, with multiple processors each allocated a block of the sub-blocking strategy; events are subsequently, in practice, generated by the sub-blocking strategies rather than by individual blocks. Thus, the event rate now depends on the minimum of only $K$ individually stable Poisson processes rather than the minimum of all $|\mathcal{B}|$ clocks at once; the event-time thus decreases at a slower rate as the global dimension grows.
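One velocity update of the even-odd scheme can be sketched as below. The function and helper names are our own, and the update uses the exact max rate $\widehat\Lambda_\kappa$ rather than an upper bound, as a simplification; since each block touches disjoint entries of $v$, the loop could run in parallel.

```python
import numpy as np

def even_odd_update(x, v, blocks_kappa, grad_restrict, rng):
    """At an event attributed to sub-strategy kappa, every block reflects
    its velocity with probability lambda_B / Lambda_hat, where Lambda_hat
    is the maximum block rate. Blocks within kappa are disjoint by
    Assumption 1, so the loop body touches disjoint entries of v."""
    rates = []
    for rows, cols in blocks_kappa:
        gB = grad_restrict(x, (rows, cols))
        rates.append(max(np.sum(gB * v[rows, cols]), 0.0))
    lam_hat = max(rates)
    if lam_hat == 0.0:
        return v
    for (rows, cols), lam_B in zip(blocks_kappa, rates):
        if rng.uniform() < lam_B / lam_hat:
            gB = grad_restrict(x, (rows, cols))
            vB = v[rows, cols]
            v[rows, cols] = vB - 2.0 * np.sum(gB * vB) / np.sum(gB ** 2) * gB
    return v
```

For a block whose rate attains the maximum, the reflection fires with probability one, matching the kernel $Q_x^B$ above.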
In particular, under plausible assumptions on the tail decay of the target distribution, we can bound the expected rate.

Lemma 1.
Assume that for all $B \in \mathcal{B}$, $\mathbb{P}(\lambda_B(x, v) > s) \leq e^{-\alpha s}$ for some $\alpha > 0$. Then both the odd and even sub-blocking strategies, indicated by subscript $\kappa$, satisfy
$$\mathbb{E}_\pi \widehat\Lambda_\kappa(x, v) \leq \frac{e}{\alpha} \log |\mathcal{B}_\kappa|.$$
Proof.
See Section A.3.

The subexponential decay is, for example, verifiable in the Gaussian case, where the rate function is a mixture of a non-centered $\chi$ variable and a point mass at zero.

In Algorithm 3 we present, for completeness, the above implementation in pseudo-code. To elaborate on the computational costs of the samplers, we compare the cost of running the samplers for one sampler second. Lemma 1 and the exponential event-times of Poisson processes indicate that we can expect $O(\log |\mathcal{B}_\kappa|)$ events per time unit (Line 3.4) via $\widehat\Lambda_\kappa$, each costing $O(|\mathcal{B}_\kappa|)$ evaluations of blocks (Line 3.19) per kernel $Q_x^B$. Thus the total cost of the even-odd sampler per sampler second is $O(|\mathcal{B}_\kappa| \log |\mathcal{B}_\kappa|)$. In comparison, the factor bouncy particle sampler has linear event-rate decay of order $O(|\mathcal{F}|)$ by the inequality
$$\mathbb{E}_\pi \Lambda_{\mathcal{F}}(x, v) = \mathbb{E}_\pi \sum_{F \in \mathcal{F}} \lambda_F(x, v) \geq |\mathcal{F}| \min_{F \in \mathcal{F}} \mathbb{E}_\pi \lambda_F(x, v),$$
combined with $O(1)$ costs per event-time, for a total cost of $O(|\mathcal{F}|)$ per sampler second. However, the extra logarithmic factor of the even-odd implementation is mitigated by the fact that, conditional on the sub-blocking strategy that generated the event, each of the $O(|\mathcal{B}_\kappa|)$ evaluations of the blocks can be carried out fully in parallel, as no velocities are shared across ringing blocks. In practical implementations of piecewise-deterministic algorithms, local upper bounds of the rate function are required to apply Poisson thinning of the rate process.

For state space models, the computational cost of estimating bounds will likely be close to equivalent across the different piecewise-deterministic algorithms due to the particular factor structure of the model. Tight bounds are in general necessary to avoid wasteful computation from false events; however, it is simultaneously desirable to have large event-times so as to move as far as possible between estimations of gradients.
The outcome of this trade-off is improved for the even-odd implementation, as the max-type bound used in event-time generation allows for larger lookahead times without compromising the overall bound too much, since it only depends on the largest bound over all blocks; this contrasts with the sum-type bounds of the standard BPS and the basic blocked version, where the overall event-rate declines quickly, as the bound rates of all blocks contribute to the global rate. Overall, this leads to a constant-order cost reduction for the even-odd implementation via fewer calls to LocalBounds.

We consider an autoregressive model of order 1 given by
$$x_n = A x_{n-1} + \eta_n, \quad \eta_n \sim \mathcal{N}(0, I_d),$$
$$y_n = x_n + \epsilon_n, \quad \epsilon_n \sim \mathcal{N}(0, I_d),$$
with $A$ an autoregressive matrix with entries $A_{ij} = \mathrm{kern}(i, j) / \left( \psi + \sum_{l=1}^d \mathrm{kern}(i, l) \right)$, where $\mathrm{kern}(i, j) = \exp\{-\sigma |i - j|\}$, $\psi > 0$, and initial distribution $x_1 \sim \mathcal{N}(0, I_d)$. First, we want to compare the empirical mixing speed of the blocked and factor bouncy particle samplers. We consider a simulated model with $d = 3$, $N = 1000$, $\sigma = 5$, and $\psi = 0.1$. We initialise each sampler at the zero vector, run until $T = 1000$, and thin at every 0.1 sampler seconds. In Table 1 we provide detailed specifications of the setups for the various algorithms and results from a representative run of the algorithms. In Figure 2a, we plot the log of the mean square error as a function of time for increasing block overlap; empirically, the blocked sampler with block width 20 and overlap 10 reaches stationarity around 3 times faster than the factor version. In Figure 2b, we compare the mean squared jumping distance of the first spatial dimension after discarding the first 25% of samples. For the overlapping sections, the exploration is, due to the shared overlap and $\phi$, happening at twice the speed, with, accordingly, four times the mean-square jumping distance compared to the factor algorithm. In terms of effective sample size per second, the blocked and even-odd samplers are about 30-40% and 100% more efficient, respectively, than the factor sampler, without using any parallel implementation. It is observed in general, for any choice of $d$ and $T$, that the benefits of speeding up the dimensions compensate for the increased computational cost due to the overlaps. We also note that for models like this, where the spatial dimension is low, there is not a strong argument to use PDMP-based methods, as particle Gibbs with a basic particle filter will be more than adequate.

Second, we consider the case where $d = 200$ and $T = 100$ to illustrate the benefits of spatial blocking in high-dimensional scenarios. In this case we also include a spatiotemporal blocking strategy, and the details of the example and a representative simulation are provided in Table 2. The model and example parameters are otherwise as described above. The spatiotemporally blocked sampler significantly outperforms the other implementations, with effective sample size per second typically 2-4 times larger, evidenced over multiple runs with random trajectories generated under the model.
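The autoregressive example can be simulated with a short sketch; the function names are ours, and the construction follows the kernel-defined transition matrix given above.

```python
import numpy as np

def ar_matrix(d, sigma, psi):
    """A_ij = kern(i, j) / (psi + sum_l kern(i, l)), kern(i, j) = exp(-sigma*|i-j|)."""
    idx = np.arange(d)
    kern = np.exp(-sigma * np.abs(idx[:, None] - idx[None, :]))
    return kern / (psi + kern.sum(axis=1, keepdims=True))

def simulate(d, N, sigma, psi, rng):
    """Draw a latent trajectory and observations from the AR(1) model."""
    A = ar_matrix(d, sigma, psi)
    x = np.empty((d, N))
    x[:, 0] = rng.standard_normal(d)
    for n in range(1, N):
        x[:, n] = A @ x[:, n - 1] + rng.standard_normal(d)
    y = x + rng.standard_normal((d, N))
    return x, y
```

Since $\psi > 0$, each row of $A$ sums to strictly less than one, so the latent process is stable.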
The even-odd temporally blocked strategy is often still efficient even when the number of dimensions per block is up to 400, but the relative ESS/s is on aggregate lower than for the spatiotemporally blocked version. Furthermore, this discrepancy will only increase under models with even higher spatial dimension. As before, no concurrent implementation was used, indicating that additional improvements in performance are possible for the partitioned blocking schemes when parallelized over multiple processors.

Algorithm 3: Even-odd implementation of the blocked Bouncy Particle Sampler
Data: initialize $(x^0, v^0) \sim p(x^0, v^0)$; set overall time $t = 0$, index $j = 0$, runtime $T > 0$, lookahead time $\theta > 0$, bound horizon $\Theta = \theta$.
  $(\bar\lambda_B)_{B \in \mathcal{B}} \leftarrow$ LocalBounds$(x^0, v^0, \theta, \mathcal{B})$   // calculate initial bounds
  while $t \leq T$ do
    $j \leftarrow j + 1$
    $\tau^b \sim \mathrm{Exp}(\sum_{\kappa \in \{even, odd\}} \bar\Lambda_\kappa)$   // reflection (bounce) time
    $\tau^r \sim \mathrm{Exp}(\gamma)$   // refreshment time
    $\tau^j \leftarrow \min\{\tau^b, \tau^r\}$
    if $\tau^j + t > \Theta$ then   // valid time for bounds exceeded, reinitialize at $\Theta$
      $x^j \leftarrow x^{j-1} + (\Theta - t) \cdot \phi \star v^{j-1}$;  $v^j \leftarrow v^{j-1}$
      $(\bar\lambda_B)_{B \in \mathcal{B}} \leftarrow$ LocalBounds$(x^j, v^j, \theta, \mathcal{B})$
      $t \leftarrow \Theta$   // update overall time
      $\Theta \leftarrow \Theta + \theta$   // new valid bound time
    else
      $t \leftarrow t + \tau^j$   // update overall time
      $x^j \leftarrow x^{j-1} + \tau^j \cdot \phi \star v^{j-1}$
      if $\tau^j = \tau^b$ then   // select blocking strategy subset
        draw $\kappa \in \{even, odd\}$ with $\mathbb{P}(\kappa = i) \propto \bar\Lambda_i(\tau^j)$
        for $B \in \mathcal{B}_\kappa$ do
          $u_B \sim \mathrm{U}[0, 1]$
          if $u_B < \lambda_B(\tau^j)/\bar\Lambda_\kappa(\tau^j)$ then
            $v^j_B \leftarrow \mathrm{reflect}_x^B(v^{j-1}_B)$
            $(\bar\lambda_{B'})_{B' \in N(B)} \leftarrow$ LocalBounds$(x^j, v^j, \Theta - t, N(B))$   // update bounds for affected blocks
          else
            $v^j_B \leftarrow v^{j-1}_B$
      else   // refresh all velocities
        $v^j \sim \mathcal{N}(0, I_{dN})$
        $(\bar\lambda_B)_{B \in \mathcal{B}} \leftarrow$ LocalBounds$(x^j, v^j, \Theta - t, \mathcal{B})$

Algorithm                          Local BPS   Blocked BPS   Even-odd
Dimensions per factor/block        60          60            60
Number of factors/blocks           50          101           101
Number of sub-blocking strategies  -           -             2
Temporal width                     20          20            20
Spatial width                      3           3             3
Temporal overlap                   -           10            10
Spatial overlap                    -           0             0
Relative performance               0.48        0.67          1.00

Table 1: Specification of implementations and results for the autoregressive Gaussian model with T = 1000 and d = 3. Performance is measured in terms of ESS/s relative to the even-odd bBPS.
Algorithm                           Local BPS   Blocked BPS   Even-odd   Spatiotemporal
Dimensions per factor/block         400         400           400        54
Number of factors/blocks            50          99            99         957
Number of sub-blocking strategies   –           –             2          4
Temporal width                      2           2             2          9
Spatial width                       200         200           200        6
Temporal overlap                    –           1             1          3
Spatial overlap                     –           0             0          2
Relative performance                0.36        0.34          0.56       1.00

Table 2: Specification of implementations and results for the autoregressive Gaussian model with T = 100 and d = 200. Performance is measured relative to ESS/s for the spatiotemporal bBPS.

We will in this section consider an example based on a stochastic volatility model of the Dow Jones Industrial Average (DJIA) equity index, to explore the efficiency of the even-odd implementation of the BPS in comparison with two benchmark implementations of particle Gibbs when the spatial dimension is moderate and the length of the time series is long. Stochastic volatility models are widely used in finance and econometrics. They model the volatility of financial assets as a dynamic latent process, to capture the time-varying and persistent nature of changes in asset returns. We will analyse a general model proposed by Ishihara and Omori [2012] that incorporates heavy-tailed observations and leverage effects, see Cont [2001] for an empirical discussion of these effects. To test the blocked algorithms on a reasonably challenging dataset, we attempt to estimate the latent volatility of the 29 continuously available constituents of the DJIA between April 1st 2017 and April 6th 2020, for a total of 29 × 757 = 21953 latent states. This period is characterized both by relatively low volatility and historically high levels of uncertainty due to the COVID-19 pandemic, see WHO [2020] for example.

Let x_n ∈ R^d be an unobserved vector of volatilities, and y_n ∈ R^d be the observed asset log returns.
Figure 2: (a) Mean square error estimate per unit of CPU time for the autoregressive Gaussian model as the overlap varies. (b) Mean square jump distance for the standard bouncy particle sampler and its blocked counterpart with overlaps 9 and 10, showcasing the impact φ has on exploration. In particular, the dips in the overlap-9 case correspond to the variables that are part of a single block only, and consequently are not sped up. We show a subset of 200 time points to enhance detail.

Figure 3: Estimated latent volatility (the posterior mean after discarding the first 250 samples) via the blocked bouncy particle sampler for the 29 continuously available constituents of the Dow Jones Industrial Average 30 index between April 1st 2017 and April 6th 2020.
Figure 4: (a) Trace plot of the log-posterior of the stochastic volatility model for all three samplers: the spatiotemporally blocked bouncy particle sampler, blocked particle Gibbs, and particle Gibbs with ancestor sampling. (b) Autocorrelation of the energy for the blocked bouncy particle sampler after discarding the first 250 samples as burn-in.

The dynamics over a fixed time horizon n = 1, 2, \ldots, N are
\[
x_{n+1} = A x_n + \eta_n, \qquad y_n = \gamma_n^{-1/2} \Lambda_n \epsilon_n, \qquad \Lambda_n = \operatorname{diag}(\exp\{x_n/2\}),
\]
with A = \operatorname{diag}(\alpha_1, \alpha_2, \ldots, \alpha_d), where \alpha_i \in [0, 1) for all i \in \{1, 2, \ldots, d\}. The noise is jointly modelled as
\[
\begin{pmatrix} \eta_n \\ \epsilon_n \end{pmatrix} \sim \mathcal{N}(0, \widehat{\Sigma}), \qquad \text{with} \quad \widehat{\Sigma} = \begin{pmatrix} \Sigma_\eta & \Sigma_\rho \\ \Sigma_\rho^\top & \Sigma_\epsilon \end{pmatrix}
\]
and \widehat{\Sigma} a 2d \times 2d matrix. The off-diagonal block matrices introduce leverage effects in the model if they are negative definite. Finally, for some \nu \in \mathbb{N}, \gamma_n \sim \Gamma(\nu/2, \nu/2) is a memoryless stochastic process independent of (\eta_n, \epsilon_n). The resulting observation noise is multivariate t-distributed with \nu degrees of freedom; details are in Ishihara and Omori [2012]. For the initialization we assume that x_1 \sim \mathcal{N}(0, (I_d - AA^\top)^{-1} \Sigma_\eta). Define y^\gamma_n = \sqrt{\gamma_n}\, y_n as the observations whenever \gamma_n is known; it follows that y^\gamma_n = \Lambda_n \epsilon_n and inference can be carried out with this observation sequence instead. Conditional on \gamma_{1:N}, and using basic properties of multivariate Gaussians, the transition distributions can be written as
\[
p(x_n \mid x_{n-1}, y^\gamma_{n-1}) = \mathcal{N}\big(A x_{n-1} + \Sigma_\rho \Sigma_\epsilon^{-1} \Lambda_{n-1}^{-1} y^\gamma_{n-1},\; \Sigma_\eta - \Sigma_\rho \Sigma_\epsilon^{-1} \Sigma_\rho^\top\big), \qquad p(y^\gamma_n \mid x_n) = \mathcal{N}(0, \Lambda_n \Sigma_\epsilon \Lambda_n),
\]
implying that the distribution has a more complicated dependence structure, as the past observation feeds into the next realized state. Furthermore, the state transition is non-linear in the previous state variable due to the leverage effect.
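To make the generative model concrete, the following Python sketch simulates from these dynamics. It is an illustrative sketch only: the function name and parameterization are our own, and the Gamma(ν/2, ν/2) mixing and exp{x_n/2} volatility scaling follow the model as reconstructed above, not the authors' code.

```python
import numpy as np

def simulate_sv(N, d, alpha, Sigma_eta, Sigma_rho, Sigma_eps, nu, rng):
    """Simulate the heavy-tailed stochastic volatility model with leverage.

    x_{n+1} = A x_n + eta_n,   y_n = gamma_n^{-1/2} Lambda_n eps_n,
    with (eta_n, eps_n) jointly Gaussian and gamma_n ~ Gamma(nu/2, nu/2).
    """
    A = np.diag(alpha)
    # Joint noise covariance [[Sigma_eta, Sigma_rho], [Sigma_rho^T, Sigma_eps]]
    Sigma_hat = np.block([[Sigma_eta, Sigma_rho], [Sigma_rho.T, Sigma_eps]])
    x = np.zeros((N, d))
    y = np.zeros((N, d))
    # Stationary initialization x_1 ~ N(0, (I - A A^T)^{-1} Sigma_eta)
    x[0] = rng.multivariate_normal(
        np.zeros(d), np.linalg.solve(np.eye(d) - A @ A.T, Sigma_eta))
    for n in range(N):
        gamma_n = rng.gamma(nu / 2.0, 2.0 / nu)  # shape nu/2, rate nu/2
        noise = rng.multivariate_normal(np.zeros(2 * d), Sigma_hat)
        eta, eps = noise[:d], noise[d:]
        lam = np.exp(x[n] / 2.0)                 # diagonal of Lambda_n
        y[n] = gamma_n ** -0.5 * lam * eps       # multivariate-t observation noise
        if n + 1 < N:
            x[n + 1] = A @ x[n] + eta
    return x, y
```

A negative definite leverage block `Sigma_rho` couples the volatility innovations to the return innovations, which is what makes the conditional transition density above depend on the previous observation.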
For the blocking strategy, we use a spatiotemporal strategy with blocks 9 time points wide and 7 spatial dimensions high, where each block has temporal overlap 4 and spatial overlap 3, giving a total of 151 × … blocks. For the initial position we used the d × N zero vector, and for the velocity the d × N vector of ones.

Typically, estimation of latent states will be carried out inside a Gibbs sampling algorithm that also estimates parameters, so that prior knowledge of the states is retained; this example instead tests the significantly more difficult case of no prior information on the latent states.

In Figure 4a, we illustrate the posterior energy. The blocked particle Gibbs sampler moves in a wide band of posterior energy, but never reaches levels of higher posterior probability. This is in contrast to the results reported in Singh et al. [2017], where the dimension of the hidden state is much lower and thus the state transition density has better forgetting properties than our higher-dimensional example. Even if this issue could be remedied, see Bunch et al. [2015], implementing particle Gibbs with both temporal and spatial blocking appears non-trivial, in contrast to the ease with which it can be achieved with the BPS. The ancestor sampling-based particle Gibbs sampler similarly does not generate proposals that have high posterior probability. Conversely, the bBPS reaches stationarity in less than 100 samples, and subsequently mixes across the posterior: the autocorrelation function, plotted in Figure 4b, reaches zero around a lag of 20 samples, indicating adequate mixing for a posterior of this dimension. In Figure 5, we plot the correlation matrix of the assets, and also the estimated latent volatility via the posterior mean. It is quite clear that the volatilities show weaker correlation across the assets, but appear to preserve some of the structure seen in the correlation matrix of the log returns.

Acknowledgement
JVG acknowledges financial support from an EPSRC Doctoral Training Award.
A Proofs
A.1 Proof of Proposition 1
Proof.
Invariance follows if \int \mathcal{L}_{bBPS} f(x, v)\, \pi(dx)\, p(dv) = 0, see Ethier and Kurtz [2009]; the proof in essence follows that of Proposition 1 in Bouchard-Côté et al. [2018]. For the first part of the generator, associated with the linear flow, consider first the integral with respect to \pi(x). We have
\[
\int \langle \nabla_x f(x, v), \phi^\star v \rangle_F\, \pi(dx) = \frac{1}{Z} \int \langle \nabla_x f(x, v), \phi^\star v \rangle_F\, e^{-U(x)}\, dx,
\]
and applying integration by parts we immediately get
\[
\frac{1}{Z} \int \langle \nabla_x f(x, v), \phi^\star v \rangle_F\, e^{-U(x)}\, dx
= \frac{1}{Z} \Big[ \langle f(x, v), \phi^\star v \rangle_F\, e^{-U(x)} \Big]_{-\infty}^{\infty}
+ \frac{1}{Z} \int f(x, v)\, \langle \nabla U(x), \phi^\star v \rangle_F\, e^{-U(x)}\, dx
= \int f(x, v)\, \langle \nabla U(x), \phi^\star v \rangle_F\, \pi(dx)
\]
by integrability of the functions in the domain of \mathcal{L}_{bBPS}. For the second part, note that
\[
\sum_{B \in \mathcal{B}} \lambda_B(x, \operatorname{reflect}^B_x v) - \lambda_B(x, v)
= \sum_{B \in \mathcal{B}} \big\{ -\langle \nabla_B U(x), v \rangle_F \big\}^+ - \big\{ \langle \nabla_B U(x), v \rangle_F \big\}^+
= - \sum_{k=1}^{d} \sum_{n=1}^{N} \phi_{kn}\, \nabla U_{kn}(x) \cdot v_{kn}
= - \langle \nabla U(x), \phi^\star v \rangle_F,
\]
where the penultimate equality counts, for each coordinate (k, n), the \phi_{kn} blocks it belongs to. Consider then the integral of the jump-dynamics generator,
\[
\int \int \sum_{B \in \mathcal{B}} \lambda_B(x, v) \big[ f(x, \operatorname{reflect}^B_x v) - f(x, v) \big]\, \pi(dx)\, p(dv).
\]
Using that \operatorname{reflect}^{B, -1}_x = \operatorname{reflect}^B_x by involution, and the norm-preserving property of the restricted reflection operator, we get
\[
\int \lambda_B(x, v)\, f(x, \operatorname{reflect}^B_x v)\, \pi(dx)\, p(dv) = \int \lambda_B(x, \operatorname{reflect}^B_x v)\, f(x, v)\, \pi(dx)\, p(dv),
\]
so we have, using the identity above, that
\[
\int \int \sum_{B \in \mathcal{B}} \lambda_B(x, v) \big[ f(x, \operatorname{reflect}^B_x v) - f(x, v) \big]\, \pi(dx)\, p(dv)
= \int \int \sum_{B \in \mathcal{B}} \big[ \lambda_B(x, \operatorname{reflect}^B_x v) - \lambda_B(x, v) \big] f(x, v)\, \pi(dx)\, p(dv)
= - \int \int \langle \nabla U(x), \phi^\star v \rangle_F\, f(x, v)\, \pi(dx)\, p(dv),
\]
which implies the result.

Figure 5: Left: estimated correlation matrix of the observed returns, from the log-returns over the entire period. Right: estimated correlation matrix of the latent volatility, from the posterior mean estimate from the even-odd bBPS.

A.2 Proof of Proposition 2
Proof.
We will show that the eoBPS is a special case of the blocked bouncy particle sampler; we again suppress dependence on refreshments. Writing out the integral with respect to Q^B_x, we have
\[
\begin{aligned}
\mathcal{L}_{eoBPS} f(x, v)
&= \langle \nabla_x f(x, v), \phi^\star v \rangle_F + \sum_{\kappa \in \{odd, even\}} \widehat{\Lambda}_\kappa(x, v) \sum_{B \in \mathcal{B}_\kappa} \Bigg[ \frac{\lambda_B(x, v)}{\widehat{\Lambda}_\kappa(x, v)}\, f(x, \operatorname{reflect}^B_x v) + \bigg(1 - \frac{\lambda_B(x, v)}{\widehat{\Lambda}_\kappa(x, v)}\bigg) f(x, v) - f(x, v) \Bigg] \\
&= \langle \nabla_x f(x, v), \phi^\star v \rangle + \sum_{\kappa \in \{odd, even\}} \widehat{\Lambda}_\kappa(x, v) \sum_{B \in \mathcal{B}_\kappa} \Bigg[ \frac{\lambda_B(x, v)}{\widehat{\Lambda}_\kappa(x, v)}\, f(x, \operatorname{reflect}^B_x v) - \frac{\lambda_B(x, v)}{\widehat{\Lambda}_\kappa(x, v)}\, f(x, v) \Bigg] \\
&= \langle \nabla_x f(x, v), \phi^\star v \rangle + \sum_{\kappa \in \{odd, even\}} \sum_{B \in \mathcal{B}_\kappa} \big[ \lambda_B(x, v)\, f(x, \operatorname{reflect}^B_x v) - \lambda_B(x, v)\, f(x, v) \big] \\
&= \langle \nabla_x f(x, v), \phi^\star v \rangle + \sum_{B \in \mathcal{B}} \lambda_B(x, v) \big[ f(x, \operatorname{reflect}^B_x v) - f(x, v) \big] = \mathcal{L}_{bBPS} f(x, v),
\end{aligned}
\]
which by Proposition 1 gives the result.

A.3 Proof of Lemma 1
Proof.
We will consider only the odd strategy in the proof; everything translates seamlessly. By Hölder's inequality, we have for any p \in \mathbb{N}
\[
\mathbb{E}_{\pi \otimes p_v}\big(\widehat{\Lambda}_{odd}\big) = \mathbb{E}_{\pi \otimes p_v}\Big( \max_{B \in \mathcal{B}_{odd}} \lambda_B \Big) \le \mathbb{E}_{\pi \otimes p_v}\Big( \max_{B \in \mathcal{B}_{odd}} |\lambda_B|^p \Big)^{1/p} \le |\mathcal{B}_{odd}|^{1/p} \max_{B \in \mathcal{B}_{odd}} \big( \mathbb{E}_{\pi \otimes p_v} |\lambda_B|^p \big)^{1/p}.
\]
For each B \in \mathcal{B}_{odd}, by adapting the proof of Lemma 1.10 in Rigollet and Hütter [2015], we have by the sure positivity of the rate function that
\[
\mathbb{E}_{\pi \otimes p_v} |\lambda_B|^p = \int_{[0, \infty)} \mathbb{P}\big( \lambda_B \ge t^{1/p} \big)\, dt \le \int_{[0, \infty)} e^{-\alpha t^{1/p}}\, dt = \frac{p}{\alpha^p} \int_{[0, \infty)} e^{-u} u^{p-1}\, du = \frac{p!}{\alpha^p},
\]
such that in particular
\[
\mathbb{E}_{\pi \otimes p_v} \widehat{\Lambda}_{odd} \le |\mathcal{B}_{odd}|^{1/p}\, \frac{p}{\alpha}. \tag{2}
\]
Optimizing over p leads to p^\ast = \log |\mathcal{B}_{odd}|, which together with the bound gives
\[
\mathbb{E}_{\pi \otimes p_v} \widehat{\Lambda}_{odd} \le |\mathcal{B}_{odd}|^{1/\log |\mathcal{B}_{odd}|}\, \frac{\log |\mathcal{B}_{odd}|}{\alpha} = \frac{e}{\alpha} \log |\mathcal{B}_{odd}|,
\]
implying the result.

B Parameters of the stochastic volatility model
B.1 Choice of parameters
The daily asset prices (p_n)_{n=1,2,\ldots,N} are collected from eSignal Data Access and transformed to log returns via the relation y_n = \log p_n / p_{n-1}. As our paper is centered on latent state estimation, we have foregone a full Bayesian parameter estimation. Instead, for all unobserved quantities we have used the parameters proposed in Ishihara and Omori [2012], Section 3; these are based on previous empirical studies by the authors and quite closely correspond to what is inferred during their parameter estimation procedure on S&P 500 data. In particular, we set the persistence parameter to \alpha = 0.99 and use \nu = 15 degrees of freedom for the multivariate t-distribution.

For the unobserved volatility covariance matrix \Sigma_\eta, the cross-asset correlation is set at 0.7, and the same standard deviation, 0.2, is assumed for each asset. For the leverage matrix, the intra-asset parameter is set at \Sigma_{\rho,ii} = -0.…, the cross-asset parameter at \Sigma_{\rho,ij} = -0.…, and we estimate \Sigma_\epsilon directly from the observed log returns over the entire period. The values we arrive at from this procedure are again close to what is empirically observed, indicating that these are reasonable parameter values to use for a latent state estimate. If we were to run a spatially blocked scheme, we could for example apply a hierarchical clustering algorithm like that of Ward Jr [1963] to learn the relationship between the assets, and then rearrange the order of the assets to match the order of the clustering procedure. As discussed, this should have a beneficial effect on mixing, as the blocks become more localised.

References
Olivier Cappé, Eric Moulines, and Tobias Rydén. Inference in Hidden Markov Models (Springer Series in Statistics). Springer New York, 2006.

Genshiro Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1):1–25, 1996.

Simon J Godsill, Arnaud Doucet, and Mike West. Monte Carlo smoothing for nonlinear time series. Journal of the American Statistical Association, 99(465):156–168, 2004.

Mark Briers, Arnaud Doucet, and Simon Maskell. Smoothing algorithms for state–space models. Annals of the Institute of Statistical Mathematics, 62(1):61, 2010.

Axel Finke and Sumeetpal S Singh. Approximate smoothing and parameter estimation in high-dimensional state-space models. IEEE Transactions on Signal Processing, 65(22):5982–5994, 2017.

Christophe Andrieu, Arnaud Doucet, and Roman Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342, 2010.

Christophe Andrieu, Anthony Lee, and Matti Vihola. Uniform ergodicity of the iterated conditional SMC and geometric ergodicity of particle Gibbs samplers. arXiv preprint arXiv:1312.6432, 2013.

Nicolas Chopin and Sumeetpal S Singh. On particle Gibbs sampling. Bernoulli, 21(3):1855–1883, 2015.

Pierre E Jacob, Fredrik Lindsten, and Thomas B Schön. Smoothing with couplings of conditional particle filters. Journal of the American Statistical Association, pages 1–20, 2019.

A. Lee, S. S. Singh, and M. Vihola. Coupled conditional backward sampling particle filter. Annals of Statistics, 2019.

Thomas Bengtsson, Peter Bickel, Bo Li, et al. Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems. In Probability and Statistics: Essays in Honor of David A. Freedman, pages 316–334. Institute of Mathematical Statistics, 2008.

Patrick Rebeschini, Ramon Van Handel, et al. Can local particle filters beat the curse of dimensionality? The Annals of Applied Probability, 25(5):2809–2866, 2015.

Alexandros Beskos, Dan Crisan, Ajay Jasra, Kengo Kamatani, and Yan Zhou. A stable particle filter for a class of high-dimensional state-space models. Advances in Applied Probability, 49(1):24–48, 2017.

Pierre Del Moral, Julian Tugaut, et al. On the stability and the uniform propagation of chaos properties of ensemble Kalman–Bucy filters. The Annals of Applied Probability, 28(2):790–850, 2018.

Jana de Wiljes, Sebastian Reich, and Wilhelm Stannat. Long-time stability and accuracy of the ensemble Kalman–Bucy filter for fully observed processes and small measurement noise. SIAM Journal on Applied Dynamical Systems, 17(2):1152–1181, 2018.

Peter Jan Van Leeuwen, Hans R Künsch, Lars Nerger, Roland Potthast, and Sebastian Reich. Particle filters for high-dimensional geoscience applications: A review. Quarterly Journal of the Royal Meteorological Society, 145(723):2335–2365, 2019.

N Whiteley. Discussion on particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72:306–307, 2010.

Fredrik Lindsten, Michael I Jordan, and Thomas B Schön. Particle Gibbs with ancestor sampling. The Journal of Machine Learning Research, 15(1):2145–2184, 2014.

Sumeetpal S Singh, Fredrik Lindsten, and Eric Moulines. Blocking strategies and stability of particle Gibbs samplers. Biometrika, 104(4):953–969, 2017.

Etienne P Bernard, Werner Krauth, and David B Wilson. Event-chain Monte Carlo algorithms for hard-sphere systems. Physical Review E, 80(5):056704, 2009.

Alexandre Bouchard-Côté, Sebastian J Vollmer, and Arnaud Doucet. The bouncy particle sampler: A nonreversible rejection-free Markov chain Monte Carlo method. Journal of the American Statistical Association, 113(522):855–867, 2018.

Joris Bierkens, Paul Fearnhead, Gareth Roberts, et al. The zig-zag process and super-efficient sampling for Bayesian analysis of big data. The Annals of Statistics, 47(3):1288–1320, 2019.

Changye Wu and Christian P Robert. The coordinate sampler: A non-reversible Gibbs-like MCMC sampler. arXiv preprint arXiv:1809.03388, 2018.

Samuel Power and Jacob Vorstrup Goldman. Accelerated sampling on discrete spaces with non-reversible Markov processes. arXiv preprint arXiv:1912.04681, 2019.

Joris Bierkens, Sebastiano Grazzi, Frank van der Meulen, and Moritz Schauer. A piecewise deterministic Monte Carlo method for diffusion bridges. arXiv preprint arXiv:2001.05889, 2020.

Elias AJF Peters et al. Rejection-free Monte Carlo sampling for general potentials. Physical Review E, 85(2):026703, 2012.

Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet. Piecewise deterministic Markov chain Monte Carlo. arXiv preprint arXiv:1707.05296, 2017.

PA W Lewis and Gerald S Shedler. Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics Quarterly, 26(3):403–413, 1979.

Tsunehiro Ishihara and Yasuhiro Omori. Efficient Bayesian estimation of a multivariate stochastic volatility model with cross leverage and heavy-tailed errors. Computational Statistics & Data Analysis, 56(11):3674–3689, 2012.

Rama Cont. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1:223–236, 2001.

WHO. Coronavirus disease 2019 (COVID-19): situation report, 67. 2020.

Pete Bunch, Fredrik Lindsten, and Sumeetpal Singh. Particle Gibbs with refreshed backward simulation. In , pages 4115–4119. IEEE, 2015.

Paul Gustafson. A guided walk Metropolis algorithm. Statistics and Computing, 8(4):357–364, 1998.

Chris Sherlock and Alexandre H Thiery. A discrete bouncy particle sampler. arXiv preprint arXiv:1707.05200, 2017.

Radford M Neal. Slice sampling. Annals of Statistics, pages 705–741, 2003.

Antonietta Mira et al. On Metropolis-Hastings algorithms with delayed rejection. Metron, 59(3-4):231–241, 2001.

Stewart N Ethier and Thomas G Kurtz. Markov Processes: Characterization and Convergence, volume 282. John Wiley & Sons, 2009.

Phillippe Rigollet and Jan-Christian Hütter. High dimensional statistics. 2015.

Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.