On Scalable Particle Markov Chain Monte Carlo
aa r X i v : . [ s t a t . M E ] J u l Efficiently Combining Pseudo Marginal andParticle Gibbs Sampling
David Gunawan, Christopher K. Carter and Robert KohnSchool of EconomicsUniversity of New South Wales
Abstract
Particle Markov Chain Monte Carlo (PMCMC) is a general approach tocarry out Bayesian inference in non-linear and non-Gaussian state space mod-els. Our article shows how to scale up PMCMC in terms of the number ofparameters and number of time points by generating parameters that arehighly correlated with the states with the states integrated out using a pseudomarginal step while the rest of the parameters are generated conditional onthe states using particle Gibbs. We make the PMCMC scalable in the numberof observations by using the same random numbers in the Metropolis-Hastingsratio of the pseudo marginal step. We do so by expressing the target densityof the PMCMC in terms of the basic uniform or standard normal randomnumbers rather than in terms of the particles, as has been done till now, anddevelop a constrained version of conditional sequential Monte Carlo algorithm.We illustrate the methods using a high dimensional factor stochastic volatil-ity having both a large number of parameters and a large number of latentstates and show that our proposed method makes the computation much moreefficient.
Keywords:
Correlated pseudo marginal Metropolis-Hastings; Factor stochasticvolatility model; Particle Gibbs sampler.
Our article considers the problem of statistical inference for both the unobservedlatent states and the parameters in a class of non-linear and non-Gaussian statespace models. We develop an efficient particle Markov chain Monte Carlo (PM-CMC) sampling scheme that converge to the posterior distribution of the unob-1erved latent states and the parameters and extends the PMCMC methods devel-oped by Andrieu et al. (2010), Lindsten and Sch¨on (2012), Lindsten et al. (2014),Olsson and Ryden (2011), and Mendes et al. (2018).In a seminal paper, Andrieu et al. (2010) proposed two PMCMC methods forstate space models. The first is the particle marginal Metropolis-Hastings (PMMH),where the parameters are generated with the unobserved states integrated out. Thesecond is particle Gibbs (PG), which generates the parameters conditional on thestates. The basic idea of PMCMC methods is to define a target distribution onan augmented space that includes all of the parameters and the particles generatedby a sequential Monte Carlo (SMC) algorithm and has the joint posterior densityof the parameters and states as a marginal density. Mendes et al. (2018) proposea new particle MCMC sampler that combines these two approaches and generatesthe parameters that are highly correlated with the states in a PMMH step, while allother parameters are generated in PG step(s), and we call their sampler the PMMH+ PG sampler.In a separate line of research, Deligiannidis et al. (2018) proposed the correlatedpseudo marginal Metropolis Hastings method. This approach correlates the randomnumbers used in constructing the logs of the estimated likelihood at the current andproposed values of the parameters. They show that by inducing a high correlationin successive iterations between those random numbers, it is necessary to increasethe number of particles N in proportion to T k/ ( k +1) , where T is the number of ob-servations and k is the state dimension. The computational complexity of correlatedPMMH is O (cid:16) T k +1 k +1 (cid:17) , up to a logarithmic factor, compared to O ( T ) for the stan-dard PMMH sampler. This shows that the correlated PMMH can be much moreefficient, and thus significantly reduce, the number of particles required by the stan-dard pseudo marginal method proposed by Andrieu et al. (2010) for low values ofthe dimension k .Our article builds on the correlated PMMH sampler of Deligiannidis et al. (2018)and the PMMH +PG sampler of Mendes et al. (2018) and proposes a novel PMCMCsampler for state space models, which we call the Efficient PMMH + PG sampler. Itrelies on a non-trivial and non-standard combination of the correlated PMMH andPG samplers which takes advantage of the strength of both the PG and correlatedPMMH methods. 
We derive the augmented target distribution, which is the invariantdistribution of the Efficient PMMH + PG sampler, and show that it has the posteriordistribution of the states and parameters as its marginal.The important innovations of the Efficient PMMH + PG sampler compared tocorrelated PMMH (Deligiannidis et al., 2018) and the PMMH+ PG sampler (Mendes et al.,2018) are that (a) The augmented target density of the Efficient PMMH + PG sam-2ler involves the basic random numbers (independent uniforms and standard nor-mals) rather than the particles themselves; (b) The new augmented target permitsus to develop a constrained conditional sequential Monte Carlo (CCSMC) algorithmusing multinomial resampling to preserve correlation instead of the conditional se-quential Monte Carlo algorithm; (c) In contrast to the correlated PMMH approach,in the PMMH part we condition on the same set of random numbers that are gener-ated using the CCSMC. This means that the correlation between random numbersused in constructing the logs of the estimated likelihood at the current and proposedvalues of the parameters is one.The proposed sampler is illustrated empirically using a sample of daily US stockreturns to estimate a univariate stochastic volatility model with leverage and a mul-tivariate factor stochastic volatility model with leverage. We note that current ap-proaches to estimate factor SV models often employ MCMC samplers, see for exam-ple, Chib et al. (2006) and Kastner et al. (2017) that are neither exact nor flexible.They use the approach proposed by Kim et al. (1998) to approximate the joint distri-bution of outcome innovations by a suitably constructed seven component mixture ofnormal distributions. Furthermore, Omori et al. (2007) requires approximating thejoint distribution of outcome and volatility innovations by a ten-component mixtureof bivariate normal distributions for a univariate SV model with leverage. Our articleshows that it is unnecessary to make such approximations. We also note that theeffectiveness of the Efficient PMMH + PG sampler will be even more pronouncedif the stochastic volatilities are modeled as continuous time processes observed dis-cretely because it is well known in this case that the parameters are highly correlatedwith the states; see, e.g., Mendes et al. (2018).The rest of the article is organized as follows. Section 2 outlines the basic statespace model, the sequential Monte Carlo algorithm for this model, and the back-ward simulation method. Section 3 introduces the Efficient PMMH + PG sampler,its invariant distribution and the constrained conditional sequential Monte Carlo al-gorithm which is an important component of the Efficient PMMH + PG sampler.Section 4 introduces the factor stochastic volatility model, which is our main applica-tion. Section 5 presents empirical results for both the univariate stochastic volatilityand for the factor stochastic volatility model. This section introduces the state space model and reviews sequential Monte Carlo(SMC) and backward simulation. We use the colon notation for collections of vari-ables, i.e. a r : s = ( a r , ..., a s ) for integers r ≤ s , a r : s is null for r > s , a r : st = ( a rt , ..., a st )3nd for t ≤ u , a t : ur : s = ( a t : ur , ..., a t : us ). We consider the stochastic process { ( X t , Y t ) , t ≥ } with parameter θ . The Y t are theobservations and the X t form a latent process, which we call the state process. 
Thedensity X = x is f θ ( x ) , the density of X t = x t given X t − = x t − , Y t − = y t − is f θt ( x t | x t − , y t − ) ( t ≥ Y t = y t given X t = x t , Y t − = y t − is g θt ( y t | x t ).We assume that the parameter θ ∈ Θ, where Θ is a subset of R d θ and that theuser specifies a prior p ( θ ) for θ . The X t are X valued and the Y t are Y valued and g θt and f θt are densities with respect to dominating measures which we write as dx and dy . The dominating measures are frequently taken to be the Lebesgue measureif X ∈ B (cid:0) R d x (cid:1) and Y ∈ B (cid:0) R d y (cid:1) , where B ( A ) is the Borel σ -algebra generated bythe set A . Usually X = R d x and Y = R d y .The joint posterior density function of ( x T , y T ) is p ( x T , y T | θ ) = f θ ( x ) g θ ( y | x ) T Y t =2 f θt ( x t | x t − , y t − ) g θt ( y t | x t ) . (1)The likelihood of θ for y T is Q Tt =1 Z t ( θ ), where Z ( θ ) = p ( y | θ ) and Z t ( θ ) = p ( y t | y t − , θ ) for t ≥
2. By using Bayes rule, we can express the joint posteriordensity of θ and X T as p ( x T , θ | y T ) = p ( x T , y T | θ ) p ( θ ) Z T , where the marginal likelihood of y T is Z T = ´ Θ Q Tt =1 Z t ( θ ) p ( θ ) dθ = p ( y T ). Example: Univariate stochastic volatilty with leverage
We illustrate the above state space model by considering the univariate stochas-tic volatility model with leverage discussed and motivated by Harvey and Shephard(1996), y t = exp( x t / ǫ t x ∼ N (cid:16) µ, τ − φ (cid:17) x t +1 = µ + φ ( x t − µ ) + η t ( t ≥ ǫ t η t ! ∼ N ! , ρτρτ τ !! (2)4ith the ( ǫ t , η t ) T sequence independent and identically distributed. Hence, theobservation density and the state transition density are given by g θt ( y t | x t ) = N (cid:16) , exp( x t ) (cid:17) , ( t ≥ f θ ( x ) = N (cid:16) µ, τ − φ (cid:17) f θt ( x t +1 | x t , y t ) = N ( µ + φ ( x t − µ ) + ρτ exp( − x t / y t , τ (1 − ρ ) (cid:17) ( t ≥ . (3)We call x t the latent volatility process. To ensure that the model is stationary, werestrict the persistence parameter φ so that | φ | <
1. We also need that the correlation ρ between ǫ t and η t lies between ±
1. If ρ = 0 in Eq. (2), then we obtain the standardvolatility model without leverage. Exact filtering and smoothing are only available for some special cases, such as thelinear Gaussian state space model. For non-linear and non-Gaussian state spacemodels, a sequential Monte Carlo (SMC) algorithm can be used to approximate thejoint filtering densities.Unless stated otherwise, upper case letters indicate random variables and lowercase letters indicate the corresponding values of these random variables. e.g., A jt and a jt , X t and x t .We use SMC to approximate the joint filtering densities { p ( x t | y t , θ ) : t = 1 , , ..., T } sequentially using N particles, i.e., weighted samples (cid:8) x Nt , w Nt (cid:9) , drawn from someproposal densities m θ ( x ) = m ( x | Y = y , θ ) and m θt ( x t | x t − ) = m t ( x t | X t − = x t − , Y t = y t , θ ) for t ≥
2. For t ≥
1, we define π t ( x t | θ ) = p ( x t | y t , θ ), S θt = { x t ∈ χ t : π t ( x t | θ ) > } and Q θt = (cid:8) x t ∈ χ t : π t − ( x t − | θ ) m θt ( x t | x t − ) > (cid:9) .We follow Andrieu et al. (2010) and assume that Assumption 1. S θt ⊆ Q θt for any θ ∈ Θ and t = 1 , ..., T . Assumption 1 ensures that the proposal densities m θt ( x t | x t − ) can be used toapproximate π t ( x t | x t − , θ ) for t ≥
1. If m θt ( · ) is a mixture of some general pro-posal density e m θt ( x t | x t − ) and f θt ( · ), with f θt ( · ) having nonzero weight, and and g θt ( y t | x t ) > θ , then Assumption 1 is satisfied. In particular, if we use thebootstrap filter then m θt ( · ) = f θt ( · ). Furthermore, g θt ( y t | x t ) > θ for theunivariate stochastic volatility model in Section 2.1.5 efinition 1. We define w i = g θ ( y | x i ) f θ ( dx i ) m θ ( dx i ) , w it = g θt ( y t | x it ) f θt ( dx it | x a it − t − , y t − ) m θt ( dx it | x a it − t − ) for t ≥ and w it = w it / N X j =1 w jt . We implement the SMC for t = 2 , . . . , T , by using the resampling scheme M ( a Nt − | w Nt − , x Nt − ), which depends on w Nt − and x Nt − for a given t . The argument a Nt − means that X A it − t − = x a it − t − is the ancestor of X it = x it .We follow Andrieu et al. (2010) and assume that Assumption 2.
For any k = 1 , ..., N and t = 2 , .., T , the resampling scheme M (cid:0) a Nt − | ¯ w Nt − , x Nt − (cid:1) satisfies Pr (cid:0) A kt − = j | ¯ w Nt − (cid:1) = ¯ w jt − . Assumption 2 is used to prove Theorem 1 and is satisfied by all the popularresampling schemes, e.g., multinomial, systematic and residual resampling. We re-fer to Douc and Capp´e (2005) for a comparison between resampling schemes andDoucet et al. (2000); Van Der Merwe et al. (2001); Scharth and Kohn (2016) for thechoice of proposal densities.
Definition 2.
Let V ix,t = v ix,t be the vector random variable used to generate X it given θ and x a it − t − , where a it − is the ancestor index of X it , as defined above. We write, X i = X ( V ix, ; θ, · ) and X it = X ( V ix,t ; θ, x a it − t − ) for t ≥ . (4) Denote the distribution of V ixt as ψ xt ( · ) . Common choices for ψ xt ( · ) are iid U (0 , or iid N (0 , random variables. Definition 3.
For t ≥ , we define V A,t − as the vector of random variables usedto generate the ancestor indices A Nt − using the resampling scheme M (cid:0) ·| ¯ w Nt − , x Nt − (cid:1) and define ψ A,t − ( · ) as the distribution of V A,t − . This defines the mapping A Nt − = A (cid:16) v A,t − ; w Nt − , x Nt − (cid:17) . Common choices for ψ A,t − ( · ) are iid U (0 , or iid N (0 , random variables. We define the joint distribution of (cid:0) V Nx, T , V A, T − (cid:1) as ψ (cid:0) dV Nx, T , dV A, T − (cid:1) = T Y t =1 N Y i =1 ψ xt (cid:0) dV ixt (cid:1) T − Y t =1 ψ At ( dV At ) . (5)The SMC algorithm also provides an unbiased estimate of the likelihood b Z T ( v x, T , v A, T − , θ ) = T Y t =1 N − N X i =1 w it ! , (6)6hich we will often write as b Z T ( θ ) for short.For the PMMH step(s) in the Efficient PMMH + PG sampler to be effectivewe require that the SMC algorithm ensures that the logs of the likelihood estimates b Z ( V x, T , V A, T − , θ ∗ ) and b Z ( V x, T , V A, T − , θ ) are highly correlated, especially when θ ∗ and θ are close, where θ ∗ is the proposed value of the parameters and θ is thecurrent value. This requires implementing the SMC algorithm carefully as a naiveimplementation introduces discontinuities in the logs of the estimated likelihoodsdue to the resampling steps when θ and θ ∗ are even slightly different. Consider,for simplicity, the one dimensional case. In the usual resampling scheme, we firstconstruct an empirical cumulative distribution function b F Nt − ( j ) = P ji =1 w it . whichis based on the index of the particle that we want to sample. We can then samplean index as a it − = min j b F Nt − ( j ) ≥ v iAt − , where v iAt − is a uniform random variableon (0 , a it − , and the selected particle x a it − t − , for i = 1 , ..., N . The problem here is that particles whose indices are close are notnecessarily close themselves. This creates the discontinuity and breaks down thecorrelation between the likelihoods at the current and proposed values. In the onedimensional case, this problem can be solved by first sorting the particles from thesmallest to largest (Malik and Pitt, 2011), which ensures that particles whose indicesare close are actually close to each other.However, this simple sorting method does not extend easily to the multivari-ate case, because we cannot sort multivariate states in this manner and guaran-tee closeness of the particles. We use instead the Hilbert sorting method as inDeligiannidis et al. (2018). The Hilbert curve is a continuous map H : [0 , → [0 , d ,for dimension d >
1. It admits the pseudo inverse h : [0 , d → [0 ,
1] in which H ◦ h ( x ) = x . If the points x and x ∗ are close in [0 , d , then h ( x ) and h ( x ∗ ) tendto be close. Multidimensional particles can then be sorted using their correspondingHilbert index h . The sorting algorithm now becomes a one dimensional operation asfor the univariate state.Algorithm 4 in Appendix B describes the SMC algorithm that we use, whichis similar to that of Deligiannidis et al. (2018). Algorithm 4 uses the multinomialresampling scheme (Algorithm 5) in the appendix. The Efficient PMMH + PG sampler requires sampling from the particulate ap-proximation of p ( x T | y T , θ ). We denote the selected particles and trajectory by x j T T = (cid:0) x j , ..., x j T T (cid:1) and j T , respectively. One way to do so is the backward simu-lation algorithm introduced by Godsill et al. (2004) and used in Olsson and Ryden72011). This approach samples the indices J T , J T − , ..., J sequentially, and differsfrom the ancestral tracing algorithm of Kitagawa (1996) which only samples oneindex and traces back its ancestral lineage. Backward simulation is also more robustto the resampling scheme (multinomial resampling, systematic resampling, residualresampling, or stratified resampling) used in the resampling step of the algorithm(see Chopin and Singh, 2015). Algorithm 6 in Appendix B.1 describes the backwardsimulation algorithm. The key idea of particle Markov chain Monte Carlo (PMCMC) methods is to con-struct a target distribution on an augmented space that includes all the particlesgenerated by the SMC algorithm and has the joint posterior density of the latentstates and parameters p ( θ, x T | y T ) as a marginal.The Efficient PMMH + PG sampler targets the distribution e π N (cid:0) dv Nx, T , dv A, T − , j T , dθ (cid:1) := p (cid:0) dx j T T , dθ | y T (cid:1) N T × ψ (cid:0) dv Nx, T , dv A, T − (cid:1) m θ (cid:0) dx j (cid:1) Q Tt =2 w a jtt − t − m θt (cid:18) dx j t t | x a jtt − t − (cid:19) × T Y t =2 w a jtt − t − f θt (cid:18) x j t t | x a jtt − t − (cid:19)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) , (7)where x j T T := (cid:16) x j , x j , . . . , x j T T (cid:17) . For brevity, in Eq. (7) and below, we write f θt ( x t | x t − , y t − ) as f θt ( x t | x t − ).The following results obtain some properties of the target distribution and areproved in Appendix A. Theorem 1.
The target distribution in Eq. (7) has the marginal distribution e π N (cid:0) dx j T T , j T , dθ (cid:1) = p (cid:0) dx j T T , dθ | y T (cid:1) N T , and hence, with some abuse of notation, we write e π N ( dx T , dθ ) = p ( dx T , dθ | y T ) . heorem 2. The target distribution in Eq. (7) can also be expressed as e π N (cid:0) dv Nx, T , dv A, T − , j T , dθ (cid:1) = p ( dθ ) ψ (cid:0) dv Nx, T , dv A, T − (cid:1) p ( y T ) × T Y t =1 (cid:16) N − N X i =1 w it (cid:17) w j T T T Y t =2 w j t − t − f θt (cid:16) x j t t | x j t − t − (cid:17)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) (8) Corollary 1.
By integrating j T out of the target distribution in Eq. (8) we obtain e π N (cid:0) dv Nx, T , dv A, T − , dθ (cid:1) = p ( dθ ) ψ (cid:0) dv Nx, T , dv A, T − (cid:1) p ( y T ) T Y t =1 N − N X i =1 w it ! , (9) Corollary 2.
The conditional distribution e π N (cid:0) j T | v Nx, T , v A, T − , θ (cid:1) is e π N (cid:0) j T | v Nx, T , v A, T − , θ (cid:1) = w j T T T Y t =2 w j t − t − f θt (cid:16) x j t t | x j t − t − (cid:17)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) . (10) We now outline a sampling scheme for the state space model that allows the user togenerate those parameters that are highly correlated with the states in the PMMHstep(s), i.e., with the states integrated out, while the other parameters can be gen-erated in the particle Gibbs (PG) step(s). Let θ := ( θ , ..., θ p ) be a partition of theparameter vector into p components where each component may be a vector and let0 ≤ p < p . We use the notation θ − i = ( θ , ..., θ i − , θ i +1 , ..., θ p ). Algorithm 1 gen-erates the parameters θ , ..., θ p using PMMH steps and the parameters θ p +1 , ..., θ p using PG steps. We call it the Efficient PMMH + PG sampler and use the notation P M M H ( θ , . . . , θ p ) + P G ( θ p +1 , . . . , θ p ) to indicate which parameters are generatedusing PMMH steps and which parameters are generated using PG steps.We note that the correlated PMMH method of Deligiannidis et al. (2018) cor-relates the random numbers, u and u ∗ used in constructing the estimators of thelikelihood at the current and proposed values of the parameters. This correlationis set very close to 1 to reduce the variance of the difference in the logs of theestimated likelihoods log b Z ( θ ∗ , u ∗ ) − log b Z ( θ, u ) appearing in the MH acceptanceratio. In contrast to the correlated PMMH approach of Deligiannidis et al. (2018),in the PMMH part of our sampling scheme, we condition on the same set of ran-dom numbers (cid:0) V Nx, T , V A, T − (cid:1) that is generated using CCSMC in Part 4 of Al-gorithm 1. That is, in our scheme we deal with log b Z (cid:0) θ ∗ i , θ − i , (cid:0) V Nx, T , V A, T − (cid:1)(cid:1) − b Z (cid:0) θ i , θ − i , (cid:0) V Nx, T , V A, T − (cid:1)(cid:1) which is the difference in the logs of the estimatedlikelihoods at the proposed and current values of the parameters. Algorithm 1
The Efficient PMMH + PG sampler.Given initial values for V Nx, T , V A, T − , J T , and θ Part 1: PMMH sampling. For i = 1 , ..., p (a) Sample θ ∗ i ∼ q i (cid:0) ·| v Nx, T , v A, T − , θ − i , θ i (cid:1) (b) Run the sequential Monte Carlo algorithm and obtain the estimate of thelikelihood b Z (cid:0) v Nx, T , v A, T − , θ ∗ i , θ − i (cid:1) .(c) Accept the proposed values θ ∗ i with probability α (cid:0) θ i , θ ∗ i | v Nx, T , v A, T − , θ − i (cid:1) =1 ∧ b Z (cid:0) v Nx, T , v A, T − , θ ∗ i , θ − i (cid:1) p ( θ ∗ i | θ − i ) b Z (cid:0) v Nx, T , v A, T − , θ i , θ − i (cid:1) p ( θ i | θ − i ) × q i (cid:0) θ i | v Nx, T , v A, T − , θ − i , θ ∗ i (cid:1) q i (cid:0) θ ∗ i | v Nx, T , v A, T − , θ − i , θ i (cid:1) . (11)Part 2: Sample J T = j T ∼ e π N (cid:0) ·| v Nx, T , v A, T − , θ (cid:1) given in Eq. (10) using the back-ward simulation algorithm (Algorithm 6)Part 3: PG sampling. For i = p + 1 , ..., p, (a) Sample θ ∗ i ∼ q i (cid:0) ·| x j T T , j T , θ − i , θ i (cid:1) (b) Accept the proposed values θ ∗ i with probability α (cid:0) θ i , θ ∗ i | x j T T , j T , θ − i (cid:1) =1 ∧ e π N (cid:0) θ ∗ i | x j T T , j T , θ − i (cid:1)e π N (cid:0) θ i | x j T T , j T , θ − i (cid:1) q i (cid:0) θ i | x j T T , j T , θ − i , θ ∗ i (cid:1) q i (cid:0) θ ∗ i | x j T T , j T , θ − i , θ i (cid:1) (12)Part 4: Sample ( V x, T , V A, T − ) from e π N (cid:0) ·| x j T T , j T , θ (cid:1) using the constrained condi-tional sequential Monte Carlo algorithm (Algorithm 2) and obtain the likeli-hood estimate b Z ( v x, T , v A, T − , θ i , θ − i ).We now motivate Algorithm 1. Part 1: From Corollary 1, e π N (cid:0) v Nx, T , v A, T − , θ ∗ i | θ − i (cid:1)e π N (cid:0) v Nx, T , v A, T − , θ i | θ − i (cid:1) = b Z (cid:0) v Nx, T , v A, T − , θ ∗ i , θ − i (cid:1) p ( θ ∗ i | θ − i ) b Z (cid:0) v Nx, T , v A, T − , θ i , θ − i (cid:1) p ( θ i | θ − i )where b Z (cid:0) v Nx, T , v A, T − , θ (cid:1) is the unbiased estimated of the likelihood in Eq. (6). Thisleads to Eq. (11). Part 2 follows from Corollary 2. Part 3 follows from Theorem 1.Part 4 follows from Eq. (7) (the target), Theorem 1 and Definitions 2 and 3.10 .3 Constrained Conditional Sequential Monte Carlo This section discusses the constrained conditional sequential Monte Carlo (CCSMC)algorithm (Algorithm 2 below), which we use in Part 4 of the correlated PMMH +PG sampling scheme (Algorithm 1). The CCSMC algorithm is a sequential MonteCarlo algorithm in which a particle trajectory x j T T = (cid:0) x j , ..., x j T T (cid:1) and the associatedsequence of indices j T are kept unchanged, which means that some elements of V Nx, T and V A, T − are constrained. It is a constrained version of the conditional SMC sam-pler in Andrieu et al. (2010) modified for the target distribution in Olsson and Ryden(2011). 11 lgorithm 2 The constrained conditional sequential Monte Carlo algorithmInputs: N , θ , x j T T , and j T Outputs: x N T , a N T − , w N T ,Optional outputs: V Nx, T , and V A, T − Fix X j T T = x j T T , A J T − = j T − , and J T = j T .1. For t = 1(a) Sample v ix ∼ ψ x ( · ) and set X i = x i = X ( v ix ; θ, · ) for i = 1 , ..., N \ { j } .(b) Obtain v j x such that x j = X (cid:0) v j x ; θ, · (cid:1) .(c) Compute the importance weights w i = f θ ( x i ) g θ ( y | x i ) m θ ( x i ) , for i = 1 , ..., N , andnormalize w i = w i / P Nj =1 w j .2. For t ≥ x t − has dimension d greater than 1, then map the x it − into scalars z i = h (cid:0) x it − (cid:1) for i = 1 , ..., N using the inverse of the Hilbert map. 
Otherwise,set z i = x it − .(b) i. Sort the z i from smallest to largest to obtain e z i , i = 1 , . . . , N .ii. Define the one to one mapping ζ : { , . . . , N } → { , . . . , N } such that e z i = z ζ i , breaking up ties in the z i in some systematic way. Let ζ − be the inverse map of ζ .iii. Define e x it − = x ζ i t − and e w it − = w ζ i t − , for i = 1 , . . . , N .(c) Use a constrained sampling algorithm, for example the constrained multi-nomial sampler (Algorithm 3 below),i. Obtain v NAt − and e A N \ ( j t ) t − ii. Set A Nt − = e A ζ − N t − .(d) Sample v ixt ∼ ψ xt ( · ) for i = 1 , ..., N \ { j t } and obtain v j t xt such that x j t t = X ( v j t xt ; θ, x a jtt − t − )(e) Set x it = X (cid:16) v ixt ; θ, x a it − t − (cid:17) for i = 1 , ..., N \ { j t } (f) Compute the importance weights, w it = f θt (cid:16) x it | x a it − t − , y t − (cid:17) g θt ( y t | x it ) m θt (cid:16) x it | x a it − t − (cid:17) , for i = 1 , ..., N, and normalize the w it . 12 iscussion of the CCSMC algorithm for the SVmodel with leverage The implementation of steps 1(b) and 2(d) of the CCSMC algorithm depends on theproposal densities. We now discuss how they are implemented for the univariate SVmodel with leverage in Section 2.1 using the bootstrap filter.Step (1b): v j x = (cid:18) (cid:0) − φ (cid:1) /τ (cid:19) (cid:0) x j − µ (cid:1) Step (2d) v j t xt = (cid:18) x j t t − µ − φ (cid:18) x a jtt − t − − µ (cid:19) − ρτ exp (cid:18) − x a jtt − t − / (cid:19) y t − (cid:19)(cid:18) τ (1 − ρ ) (cid:19) Algorithm 3
Multinomial Resampling Algorithm for CCSMCInput: e x Nt − , and e w Nt − Output: V NA, T − and e A Nt − .1. Compute the cumulative weights based on the sorted particles ne x Nt − , e w Nt − o , b F Nt − ( j ) = j X i =1 e w it − .
2. Generate N − ,
1) random numbers v iAt − ∼ ψ A,t − ( · ) for i =1 , ..., N, i = j t , and set e A it − = min j b F Nt − ( j ) ≥ v iAt − . For i = j t − , v j t At − ∼ U (cid:16) b F Nt − ( j t − − , b F Nt − ( j t − ) (cid:17) , where b F Nt − ( j t − −
1) = j t − − X i =1 e w it − and b F Nt − ( j t − ) = j t − X i =1 e w it − . The factor SV model is a popular model for parsimoniously modeling a vector ofreturns, (see, for example, Chib et al., 2006; Kastner et al., 2017) and we will use itto illustrate our methodology. Suppose that P t is a S × y t := log P t − log P t − as the log-return of the stocks. We model y t as the13actor SV model y t = βf t + V t ǫ t , ( t = 1 , ..., T ) , (13)where f t is a K × K ≪ S ), β is a S × K factorloading matrix of the unknown parameters. Section S1 of the supplement discussesparametrization and identification issues regarding the factor loading matrix β andthe latent factors f t .We model the ǫ t ∼ N (0 , I ). The error volatility matrix V t is diagonal, withdiagonal elements exp( h st ). We assume that the log volatility processes { h st , t ≥ } are independent for s = 1 , . . . , S , and that each follows a univariate SV model of theform h s ∼ N (cid:16) µ ǫs , τ ǫs − φ ǫs (cid:17) h s,t +1 = µ ǫs + φ ǫs (cid:16) h st − µ ǫs (cid:17) + η ǫst with ǫ st η ǫst ! ∼ N ! ρ ǫs τ ǫs ρ ǫs τ ǫs τ ǫs !! (14)The factors f kt , k = 1 , . . . , K are assumed independent with f t ∼ N ( f t ; 0 , D t ), and D t is a diagonal matrix with k th diagonal element exp( λ kt ). Each log volatility λ kt is assumed to follow a univariate SV model with no leverage, i.e., for k = 1 , . . . , K , λ k ∼ N (cid:16) , τ fk − φ fk (cid:17) , λ k,t +1 = φ fk λ kt + η fkt , η fkt ∼ N (cid:16) , τ fk (cid:17) ( t ≥ . (15) Target density:
Section S2 of the supplement gives the target distribution forthe factor SV model, which is a composite of the target densities of the univariateSV models for the idiosyncratic errors and the factors, together with densities for β and f T . Conditional Independence and sampling in the factor SV model:
Thekey to making the estimation of the factor SV model tractable is that given thevalues of ( y T , f T , β ), the factor model given in Eq. (13) separates into S + K inde-pendent components consisting of K univariate SV models for the latent factors and S univariate SV models with (or without) leverage for the idiosyncratic errors. Thatis, given ( y T , f T , β ), we have S univariate SV models with leverage, with ǫ st the t th ‘observation’ on the s th SV model, and we have K univariate SV models withoutleverage, with f kt the t th observation on the k th univariate SV model. Section S3 ofthe supplement discusses the Efficient PMMH +PG sampler for the factor SV andmakes full use of the conditional independence structure of the model. In addition,Section S4 of the supplement discusses a deep interweaving strategy for the loading14atrix β and the factor f t that helps the sampler mix better. To define our measure of the inefficiency of a sampler that takes computing time intoaccount, we first define the integrated autocorrelation time (IACT) for a univariatefunction ψ ( θ ) of θ is IACT ψ = 1 + 2 ∞ X j =1 ρ j,ψ , where ρ j,ψ is the j th autocorrelation of the iterates of ψ ( θ ) in the MCMC afterthe chain has converged. We use the CODA package of Plummer et al. (2006) toestimate the IACT values of the parameters. A low value of the IACT estimatesuggests that the Markov chain mixes well. Our measure of the inefficiency of asampler for a given parameter θ based on IACT ψ is the time normalised variance(TNV), TNV ψ = IACT ψ × CT , where CT is the computing time in seconds periteration of the MCMC. The estimate of TNV is the estimate of the IACT timesthe computing time. The relative time normalized variance of the sampler for ψ (RTNV ψ ) is the TNV relative to the TNV for the Efficient PMMH + PG sampler.Our approach for determining which parameters to estimate by PMMH and whichto estimate by PG in our sampling scheme is to first run the PG algorithm for allthe parameters to identify which parameters have large IACT’s. We then generatethese parameters in the PMMH step.For a given sampler, let IACT MAX and IACT
MEAN be the maximum and meanof the IACT values over all the parameters in the model.
This section illustrates the Efficient PMMH + PG sampler (Algorithm 1) by applyingit to the univariate stochastic volatility model discussed in Section 2.1 and comparesits performance to the particle Gibbs with backward simulation (PGBS) sampler ofLindsten and Sch¨on (2012). We apply the methods to a sample of daily US food in-dustry stock returns data obtained from the Kenneth French website, using a samplefrom December 11th, 2001 to the 11th of November 2013, a total of 3001 observa-tions. We do not compare to the correlated PMMH sampler of Deligiannidis et al.(2018) because that sampler does not apply to our main application which is thefactor SV model. 15able 1: Univariate SV model with leverage for the two samplers. Sampler I: Effi-cient PMMH ( τ , ρ ) + PG ( µ, φ ), Sampler II: PGBS ( µ, τ , φ, ρ ) for US stock returnsdata with T = 3001 for number of particles N = 20. The table gives estimates ofthe inefficiencies, TNV and RTNV for the parameters, as well as the inefficienciesfor the log-volatilities h T . Param. I II \ IACT µ φ τ ρ h T Min 1.28 1.31Mean 3.91 2.30Max 34.08 33.95 \ IACT
MAX .
08 572 . [ TNV
MAX .
56 40 . \ RTNV
MAX . \ IACT
MEAN .
13 245 . [ TNV
MEAN .
93 17 . \ RTNV
MEAN . .
31 0 . Priors:
We now specify the prior distributions of the parameters. We followKim et al. (1998) and choose the prior for the persistence parameter φ as ( φ + 1) / ∼ Beta ( a , b ), with a = 100 and b = 1 .
5, i.e., p ( φ ) = 12 B ( a , b ) (cid:18) φ (cid:19) a − (cid:18) − φ (cid:19) b − . The prior for τ is the half Cauchy, i.e., p ( τ ) ∝ I ( τ> τ . The prior for p ( µ ) ∝
1. Wereparametrize ρ = tanh( ξ ) and put a flat prior on ξ . We note that because of thelarge sample size, the results are insensitive to these prior choices. Results:
We ran the two sampling schemes for 11000 iterations and discardedthe initial 1000 iterations as warmup. Table 1 shows the IACT, TNV and RTNVestimates for the parameters in the univariate SV model estimated using the particleGibbs with backward simulation PGBS ( µ, τ , ρ, φ ), and the Efficient PMMH ( τ , ρ )+PG ( µ, φ ), sampler, where we generate τ and ρ by PMMH and µ and φ by PG. Thetable also gives the IACT estimates for the volatilities h T which were obtained fromthe backward simulation iterates. Both samplers used N = 20 particles. The tableshows that in this example the Efficient PMMH + PG sampler was more efficient inestimating the parameters than the PGBS sampler as measured by the TNV.16e obtained qualitatively similar relative results for both samplers when we used N = 50 and N = 100 particles and these are reported in Section S5 of the supplement. This section discusses performance of the Efficient PMMH + PG sampler to thefactor SV model discussed in Section 4.
Prior specification
For s = 1 , ..., S and k = 1 , ..., K , we choose the priorsfor the persistence parameters φ ǫs and φ fk , the priors for τ ǫs , τ fk , µ ǫs and ρ ǫs as inSection 5.2. For every unrestricted element of the factor loadings matrix β , we followKastner et al. (2017) and choose independent Gaussian distributions N (0 , Estimation details:
We applied our method to a sample of daily US industrystock returns data. The data, obtained from the website of Kenneth French, consistsof daily returns for S = 26 value weighted industry portfolios, which are listed inSection S7 of the supplementary material, using a sample from December 11th, 2001to the 11th of November, 2013, a total of 3001 observations. We compare PG withbackward simulation (PGBS) and the Efficient PMMH + PG sampler for K = 4factors and N = 100 particles.We do not compare our method to the correlated PMMH approach of Deligiannidis et al.(2018) which generates the parameters with the factors and idiosyncratic latent logvolatilities integrated out for two reasons. First, using PMMH results in a S + K = 30dimensional state space model and Mendes et al. (2018) show that it is very hard topreserve the correlation between the logs of the estimated likelihoods at the currentand proposed values for such a model. Thus the correlated PMMH sampler wouldget stuck unless enough particles are used to ensure that the variance of log of theestimated likelihood is close to 1. Deligiannidis et al. (2018) also discuss the issueof how the correlation diminishes as the dimension increases. Second, the dimensionof the parameter space in the factor SV model is large which makes it difficult toimplement the PMMH sampler efficiently as it is difficult to obtain good proposalsfor the parameters because the first and second derivatives of the estimated likeli-hood with respect to the parameters can only be estimated, while the random walkproposal is easy to implement but is very inefficient in high dimensions. Empirical Results
Table 2 summarizes the results and shows that the factor load-ing matrix β and the means µ are sampled efficiently by both samplers with compa-rable IACT values. However, the Efficient PMMH + PG sampler has much smallerIACT values than the PGBS sampler for the τ and φ parameters and is more ef-ficient that PGBS in terms of TNV. Overall, PGBS is never much better than the17fficient PMMH + PG sampler and is sometimes much worse. The research of Robert Kohn and David Gunawan was partially supported by anARC Center of Excellence grant CE140100049.
This article has an online supplement that contains additional technical details ofthe factor SV model and further empirical results for both the univariate SV modeland the factor SV model. The supplement also contains ergodicity and central limittheorem results for the Efficient PMMH + PG sampler.
A Proofs
We need the following lemma before to prove Theorem 1.
Lemma 1. (i) m θ ( dx i ) = ˆ { v ix : X ( v ix ; θ, · )= x i } ψ ( dv ix ) . (ii) For t ≥ , m θt ( dx it | x a it − t − ) = ˆ (cid:8) v ixt : X (cid:0) v ixt ; θ,x ait − t − (cid:1) = x it (cid:9) ψ ( dv ixt ) (iii) For t ≥ , w jt − = Pr( A kt − = j | w Nt − ) = ˆ (cid:26) v A,t − : (cid:18) A (cid:16) v kA,t − ; w Nt − ,x Nt − (cid:17)(cid:19) k = j (cid:27) ψ ( dv A,t − ) Proof.
The proof of parts (i) and (ii) is straightforward. The proof of part (iii) isstraightforward given Assumption 2.
Proof of Theorem 1.
To prove the lemma we carry out the marginalisation by build-ing on the proof of Theorem 3 of Olsson and Ryden (2011). Let V ( − j T ) x, T = n V ( − j ) x , ..., V ( − j T ) xT o .The marginal of e π N ( dx T , j T , dθ ) is obtained by integrating it over (cid:16) v A, T − , v ( − j T ) x, T (cid:17) .18able 2: Comparing Sampler I: Efficient PMMH (cid:0) τ f , τ ǫ , ρ ǫ (cid:1) + PG( µ ǫ , φ ǫ , φ f , β ) , withSampler II: PGBS (cid:0) µ ǫ , φ ǫ , φ f , β, τ f , τ ǫ (cid:1) in terms of Time Normalised Variance for afactor SV model with leverage for US stock return data with T = 3001, N = 100particles, S = 26, and K = 4. Time denotes the time taken in seconds per iterationof the method. The table shows minimum, mean and maximum IACT values for theparameters and the factor and idiosyncratic log-volatilities. The bottom part of thetable also shows \ IACT
MAX , [ TNV
MAX , ..., \ RTNV
MEAN which refer to the parametersonly. I IIMin Mean Max Min Mean Max β .
80 2 .
22 2 .
94 1 .
72 2 .
20 3 . β .
87 21 .
79 22 .
62 14 .
69 23 .
23 26 . β .
02 41 .
71 57 .
64 10 .
75 37 .
65 53 . β .
50 25 .
35 72 .
44 5 .
27 38 .
85 105 . µ .
06 3 .
51 36 .
74 1 .
06 4 .
36 38 . τ .
44 33 .
92 99 .
00 15 .
36 328 .
36 866 . φ .
91 42 .
10 119 .
37 181 .
97 919 .
65 3591 . ρ .
85 18 .
67 42 .
93 115 .
31 298 .
43 859 . h , T .
29 2 .
05 4 .
91 1 .
35 2 .
16 8 . h , T .
46 2 .
84 11 .
97 1 .
47 2 .
70 14 . h , T .
42 2 .
95 12 .
87 1 .
36 4 .
67 43 . h , T .
16 1 .
57 8 .
19 1 .
16 1 .
78 12 . h , T .
08 1 .
80 43 .
77 1 .
11 2 .
14 69 . λ , T .
46 2 .
27 6 .
79 1 .
49 2 .
31 5 . λ , T .
69 10 .
48 14 .
30 10 .
20 13 .
36 19 . λ , T .
77 22 .
13 35 .
24 12 .
72 20 .
88 34 . λ , T .
68 22 .
25 29 .
73 29 .
80 44 .
50 69 . \ IACT
MAX .
37 3591 . [ TNV
MAX .
11 4884 . \ RTNV
MAX . \ IACT
MEAN .
02 227 . [ TNV
MEAN .
06 309 . \ RTNV
MEAN . .
00 1 . v ( − J T ) x,T to get e π N (cid:0) dv Nx, T − , dv J T x,T , dv A, T − , j T , dθ (cid:1) = p (cid:0) dx j T T , dθ | y T (cid:1) N T × ψ (cid:0) dv Nx, T − , dv j T x,T , dv A, T − (cid:1) m θ (cid:0) dx j (cid:1) Q Tt =2 w a jtt − t − m θt (cid:18) dx j t t | x a jtt − t − (cid:19) T Y t =2 w a jtt − t − f θt (cid:18) x j t t | x a jtt − t − (cid:19)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) . Now, integrate over (cid:26) v j T x,T : X (cid:16) v j T x,T ; θ, x a jTT − T − (cid:17) = x j T T (cid:27) using Part (ii) of Lemma 1 toobtain, e π N (cid:0) dv Nx, T − , dx j T T , dv A, T − , j T , dθ (cid:1) = p (cid:0) dx j T T , dθ | y T (cid:1) N T × ψ (cid:0) dv Nx, T − , dv A, T − (cid:1) m θ (cid:0) dx j (cid:1) Q T − t =2 w a jtt − t − m θt (cid:18) dx j t t | x a jtt − t − (cid:19) w a jTT − T − T Y t =2 w a jtt − t − f θt (cid:18) x j t t | x a jtt − t − (cid:19)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) . Then, integrate over (cid:26) v A,T − : (cid:16) A ( v A,T − ; w NT − , x NT − ) (cid:17) j T = a j T T − (cid:27) using Part (iii)of Lemma 1 and then sum over a j T T − to obtain e π N (cid:0) dv Nx, T − , dx j T T , dv A, T − , j T , dθ (cid:1) = p (cid:0) dx j T T , dθ | y T (cid:1) N T × ψ (cid:0) dv Nx, T − , dv A, T − (cid:1) m θ (cid:0) dx j (cid:1) Q T − t =2 w a jtt − t − m θt (cid:18) dx j t t | x a jtt − t − (cid:19) T − Y t =2 w a jtt − t − f θt (cid:18) x j t t | x a jtt − t − (cid:19)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) . We repeat this for t = T − , ...,
2, to obtain, e π N (cid:0) dv Nx, , dx j T T , j T , dθ (cid:1) = p (cid:0) dx j T T , dθ | y T (cid:1) N T × ψ (cid:0) dv Nx, (cid:1) m θ (cid:0) dx j (cid:1) Finally, integrate over v ( − j ) x, to obtain the result.20 roof of Theorem 2. We have that, p ( dx j T T , dθ | y T ) m θ ( dx j ) Q Tt =2 w a jtt − t − m θt ( dx j t t | x a jtt − t − ) × p ( y T ) p ( dθ ) × T Y t =2 w a jtt − t − f θt (cid:18) x j t t | x a jtt − t − (cid:19)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) = g θ ( y | x j ) f θ ( dx j ) Q Tt =2 g θt ( y t | x j t t ) f θt ( dx j t t | x j t − t − ) m θ ( dx j ) Q Tt =2 w a jtt − t − m θt ( dx j t t | x a jtt − t − ) × T Y t =2 w a jtt − t − f θt (cid:18) x j t t | x a jtt − t − (cid:19)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) = w j T − Y t =1 (cid:16) N X i =1 w it (cid:17) T Y t =2 g θt ( y t | x j t t ) f θt ( dx j t t | x a jtt − t − ) m θt ( dx j t t | x a jtt − t − ) T Y t =2 f θt (cid:16) x j t t | x j t − t − (cid:17)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) = T Y t =1 (cid:16) N X i =1 w it (cid:17) T Y t =2 w j t − t − f θt (cid:16) x j t t | x j t − t − (cid:17)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) w j T T Proof of Corollary 1.
The proof follows from Theorem 2 by summing the terms inthe target distribution Eq. (8) that include j T , i.e., w j T T T Y t =2 w j t − t − f θt (cid:16) x j t t | x j t − t − (cid:17)P Nl =1 w lt − f θt (cid:0) x j t t | x lt − (cid:1) . Proof of Corollary 2.
The proof follows from Theorem 2 and Corollary 1.
B Algorithms
This section describes how to implement the SMC algorithm described in Section 2.2.21 lgorithm 4
The Sequential Monte Carlo algorithmInputs:
N, θ .Optional inputs: V Nx, T and V A, T − .Outputs: x N T , a N T − , w N T .1. For t = 1, Sample V ix ∼ ψ x ( · ) , i = 1 , . . . , N and set X i = x i = X ( v ix ; θ, · ) for i = 1 , . . . , N .2. Compute the importance weights w i = f θ ( x i ) g θ ( y | x i ) m θ ( x i ) , for i = 1 , ..., N. and normalize w i = w i / P Nj =1 w j for i = 1 , ...N .3. For t = 2 , . . . , T ,(a) If x t − has dimension d greater than 1, then map the x it − into the scalars z i = h ( x it − ) for i = 1 , . . . , N using the inverse of the Hilbert map. Oth-erwise, set z i = x it − .(b) i. Sort the z i from smallest to largest to obtain e z i , i = 1 , . . . , N .ii. Define the one to one mapping ζ : { , . . . , N } → { , . . . , N } such that e z i = z ζ i , breaking up ties in the z i in some systematic way. Let ζ − be the inverse map.iii. Define e x it − = x ζ i t − and e w it − = w ζ i t − , for i = 1 , . . . , N .(c) Generate V A,t − ∼ ψ A,t − ( · ) and then obtain e A Nt − = e a Nt − using a re-sampling scheme M ( e a Nt − | e x Nt − , e w it − ) e.g. the multinomial resampling inAlgorithm 5. Now set A it − = e A ζ − i t − for i = 1 , ..., N . This defines themapping A Nt − = A (cid:16) v A,t − ; w Nt − , x Nt − (cid:17) .(d) Generate V ixt ∼ ψ xt ( · ) and set X it = x it = X (cid:16) v ixt ; θ, x a it − t − (cid:17) for i = 1 , ..., N .(e) Compute the importance weights w it = f θt (cid:16) x it | x a it − t − , y t − (cid:17) g θt ( y t | x it ) m θt (cid:16) x it | x a it − t − (cid:17) , for i = 1 , ..., N and normalize to obtain w it for i = 1 , ...N .22 lgorithm 5 Multinomial Resampling AlgorithmInput: v At − , e x Nt − , and e w Nt − Output: e A Nt −
1. Compute the cumulative weights based on the sorted particles ne x Nt − , e w Nt − ob F Nt − ( j ) = j X i =1 e w it − .
2. Set e A it − = min j b F Nt − ( j ) ≥ v iAt − for i = 1 , ...N , and note that e A it − for i =1 , ..., N is the ancestor index based on the sorted particles. B.1 The backward simulation algorithm
Algorithm 6
The Backward simulation algorithm1. Sample J T = j T conditional on (cid:0) V Nx, T , V A, T − , θ (cid:1) , with probability propor-tional to w j T T , and choose x j T T ;2. For t = T − , ...,
1, sample J t = j t , conditional on (cid:16) V x, t , V A, t , j t +1: T , x j t +1 t +1 , ..., x j T T (cid:17) , and choose J t = l with probability propor-tional to w lt f θ (cid:16) x j t +1 t +1 | x lt (cid:17) . 23 nline Supplementary material We use the following notation in the supplement. Eq. (1), Algorithm 1, and SamplingScheme 1, etc, refer to the main paper, while Eq. (S1), Algorithm S1, and SamplingScheme S1, etc, refer to the supplement.
S1 The factor loading matrix and the latent fac-tors
In this section we discuss the parametrization of the factor loading matrix and thefactors, as well as how they are sampled.To identify the parameters of the factor loading matrix β , it is necessary toimpose some further constraints. Usually, the factor loading matrix β is assumed tobe lower triangular, i.e., β sk = 0 for k > s and furthermore, one of two constraintsare used. i) The first is that the f kt have unit variance (Geweke and Zhou, 1996);or, alternatively ii) assume that β ss = 1, for s = 1 , ..., S , and the variance of f t isdiagonal but unconstrained. The main drawback of the lower triangular assumptionon β is that the resulting inference can depend on the order in which the componentsof y t are chosen (Chan et al., 2017). We use the following approach for K = m factors to obtain an appropriate ordering of the returns that does not conflict withthe data. We follow Conti et al. (2014); Kastner et al. (2017) and run and post-process the draws from the unrestricted sampler by choosing from column 1 thestock i = i with the largest value of | β i, | . We repeat this for column 2, except thatnow we seek that i = 2 , . . . , S, i = i maximizing | β i, | . We proceed similarly forcolumns 3 to m . By an unrestricted sampler we mean that we do not restrict β to belower triangular. Furthermore, as noted by Kastner et al. (2017), the second set ofconstraints impose that the first K variables are leading the factors, and making thevariable ordering dependence stronger. We follow Kastner et al. (2017) and leave thediagonal elements β ss unrestricted and set the level µ k of the factor log-volatilities λ kt to zero for k = 1 , ..., K .Let k s denote the number of unrestricted elements in row s of β and define F s = f · · · f k s ... ... f T · · · f k s T , e V s = exp ( h s ) · · ·
00 . . . ...0 · · · exp ( h sT ) . Then, the factor loadings β s,. = ( β s , ..., β sk s ) T for s = 1 , ..., S , are sampled indepen-S1ently for each s by performing a Gibbs-update using β T s,. | f T , y s, T , h s,. ∼ N k s ( a sT , b sT ) , (S1)where b pT = (cid:16)(cid:16) F TS e V − S F S (cid:17) + I k s (cid:17) − and a sT = b sT F Ts (cid:16) e V − s y s, T (cid:17) . Sampling { f t } | y, { h t } , { λ t } , β After some algebra, we can show that { f t } canbe sampled from { f t } | y, { h t } , { λ t } , β ∼ N ( a t , b t ) , (S2)where b t = (cid:0) β T V − t β + D − t (cid:1) − and a t = b t β T (cid:0) V − t y t (cid:1) . S2 Target Distributions for the factor SV model
This section provides an appropriate target density for the factor SV model in Sec-tion 4. The target density includes all the random variables produced by K + S uni-variate SMC methods that generate the factor log-volatilities λ k, T for k = 1 , ..., K and the idiosyncratic log-volatilities h s, T for s = 1 , ..., S , as well as the latent fac-tors f T , the parameters of the individual idiosyncratic error SV’s θ ǫ, S , the pa-rameters of the factor SV’s θ f, K , and the factor loading matrix β . We define θ = ( f T , θ ǫ, S , θ f, K , β ). We use equations Eq. (15) to specify the univariate particlefilters that generate the factor log-volatilities λ k, T for k = 1 , ..., K without leverageand Eq. (14) to specify the particle filters for the idiosyncratic SV log-volatilitieswith leverage h s, T for s = 1 , ..., S . We denote the N weighted samples at time t forthe factor log volatilities by (cid:0) λ Nkt , w Nfkt (cid:1) and (cid:0) h Nst , w Nǫst (cid:1) for the idiosyncratic errorslog volatilities. The corresponding proposal densities are m θ fk fk ( λ k ), m θ fk fkt ( λ kt | λ kt − ), m θ ǫs ǫs ( h s ), and m θ ǫs ǫst ( h st | h st − ) for t = 2 , ..., T and are the same as in the univariatecase.We define V iǫst as a random variable used to generate h ist from the proposal m ǫst ( h s ) and m ǫst ( h st | h st − ) for t ≥
2, and write h ist = X ( v iǫst ; θ ǫs , h s,t − ) for s = 1 , ..., S . The distribution of v iǫst is denoted as ψ ǫst ( · ) and is the standard nor-mal distribution N (0 , V ifkt and its distribution ψ fkt ( · ) aredefined similarly.We define V iAǫst − for s = 1 , ..., S , as the vector of random variable used togenerate the ancestor indices A iǫst − for i = 1 , ..., N , such that A Nǫst − is gener-ated using M ǫ (cid:0) a Nǫst − | w Nǫst − , h Nǫst − (cid:1) , where each ancestor index a iǫst − = j indexesa particle in (cid:0) h Nst − , w Nǫst − (cid:1) and is sampled with probability w jǫst − . We write themapping A Nǫst − = A (cid:0) V Aǫst − ; w Nǫst − , h Nst − (cid:1) and denote the distribution V iAǫst asS2 Aǫst ( · ) (which is standard U (0 , V iAfkt − , its distribu-tion ψ Afkt ( · ), the resampling scheme M f (cid:0) a Nfkt − | w Nfkt − , λ Nfkt − (cid:1) and the mapping A Nfkt − = A (cid:0) V Afkt − ; w Nfkt − , λ Nkt − (cid:1) are defined similarly for k = 1 , ..., K .The joint distribution of the random variables (cid:0) V Nǫs, T , V Aǫs, T − (cid:1) is ψ (cid:0) dV Nǫs, T , dV Aǫs, T − (cid:1) = T Y t =1 N Y i =1 ψ ǫst (cid:0) dV iǫst (cid:1) T − Y t =1 ψ Aǫst ( dV Aǫst ) , (S3)for s = 1 , ..., S and the joint distribution of the random variables (cid:0) V Nfk, T , V Afk, T − (cid:1) is ψ (cid:0) dV Nfk, T , dV Afk, T − (cid:1) = T Y t =1 N Y i =1 ψ fkt (cid:0) dV ifkt (cid:1) T − Y t =1 ψ Afkt ( dV Afkt ) , (S4)for k = 1 , ..., K .We next define the indices J ǫs, T for s = 1 , ..., S , the selected particle trajectories h j ǫs, T s, T = (cid:0) h j ǫs s , ..., h j ǫsT sT (cid:1) , the indices J fk, T for k = 1 , ..., K and the selected particletrajectories λ j fk, T k, T = (cid:16) λ j fk k , ..., λ j fkT kT (cid:17) .The augmented target density in this case consists of all of the particle filter vari-ables (cid:0) V Nǫs, T , V Aǫs, T − (cid:1) and (cid:0) V Nfk, T , V Afk, T − (cid:1) , the sampled trajectory (cid:16) λ j fk, T k, T , h j ǫs, T s, T (cid:17) and indices ( J fk, T , J ǫs, T ) for all s = 1 , ..., S and for k = 1 , .., K and is e π N (cid:0) dV Nǫ, S, T , dV Aǫ, S, T − , J ǫ, S, T , dV Nf, K, T , dV Af, K, T − , J f, K, T , dθ (cid:1) := π (cid:16) dλ J f, T K, T , dh J S, T S, T , dθ (cid:17) N T ( S + K ) × S Y s =1 ψ (cid:0) dV Nǫs, T , dV Aǫs, T − (cid:1) m θǫs (cid:0) dh j ǫs s (cid:1) Q Tt =2 w a jǫstǫst − ǫst − m θǫst (cid:18) dh j ǫst st | h a jǫstǫst − st − (cid:19) × T Y t =2 w a jǫstǫst − ǫst − f θst (cid:18) h j ǫst st | h a jǫstǫst − st − (cid:19)P Nl =1 w lǫst − f θst (cid:0) h j ǫst st | h lst − (cid:1) × K Y k =1 ψ (cid:0) dV Nfk, T , dV Afk, T − (cid:1) m θfk (cid:16) dλ j fk fk (cid:17) Q Tt =2 w a jfktfkt − fkt − m θft (cid:18) dλ j fkt fkt | λ a jfktfkt − fkt − (cid:19) × T Y t =2 w a jfktfkt − fkt − f θkt (cid:18) λ j fkt fkt | λ a jfktfkt − fkt − (cid:19)P Nl =1 w lfkt − f θkt (cid:16) λ j fkt fkt | λ lfkt − (cid:17) . (S5)Similarly to the proof of Theorem 1, we can show that the marginal distribu-tion of e π N (cid:0) dV Nǫ, S, T , dV Aǫ, S, T − , J ǫ, S, T , dV Nf, K, T , dV Af, K, T − , J f, K, T , dθ (cid:1) is (cid:16) N T ( S + K ) (cid:17) − π (cid:16) dλ J f, T K, T , dh J S, T S, T , dθ (cid:17) S3 We illustrate our methods using the Efficient PMMH (cid:0) τ f , τ ǫ , ρ ǫ (cid:1) + PG( µ ǫ , φ ǫ , φ f , β )for the factor SV with leverage, which we found to give good performance in theempirical studies in Section 5. 
It is straightforward to modify the sampling schemefor other choices of which parameters to sample with a PG step and which parametersto sample with a PMMH step. Algorithm S1 outlines the sampling scheme.S4 lgorithm S1 The Efficient PMMH + PG sampler for the factor SV model withleverage: Efficient PMMH (cid:0) τ f , τ ǫ , ρ ǫ (cid:1) +PG( µ ǫ , φ ǫ , φ f , β )Given initial values for V Nǫ, S, T , V Aǫ, S, T − , V Nf, K, T , V Af, K, T − , J ǫ T , J f T , and θ Part 1: PMMH sampling, For k = 1 , ..., K (a) Sample τ ∗ fk ∼ q τ fk (cid:16) ·| V Nfk, T , V Afk, T − , θ − τ fk , τ fk (cid:17) (b) Run the SMC algorithm and obtain b Z (cid:16) V fk, T , V Afk, T − , τ ∗ fk , θ − τ fk (cid:17) .(c) Accept the proposed values τ ∗ fk with probability1 ∧ b Z (cid:16) V fk, T , V Afk, T − , τ ∗ fk , θ − τ fk (cid:17) p (cid:0) τ ∗ fk (cid:1)b Z (cid:16) V fk, T , V Afk, T − , τ fk , θ − τ fk (cid:17) p (cid:0) τ fk (cid:1) × q τ fk (cid:16) τ fk | V Nfk, T , V Afk, T − , θ − τ fk , τ ∗ fk (cid:17) q τ fk (cid:16) τ ∗ fk | V Nfk, T , V Afk, T − , θ − τ fk , τ fk (cid:17) . For s = 1 , ...., S ,(a) Sample ( τ ∗ ǫs , ρ ∗ ǫs ) ∼ q τ ǫs ,ρ ǫs (cid:0) ·| V Nǫs, T , V Aǫs, T − , τ ǫs , ρ ǫs , θ − τ ǫs ,ρ ǫs (cid:1) (b) Run the SMC algorithm and obtain b Z (cid:0) V ǫs, T , V Aǫs, T − , τ ∗ ǫs , ρ ∗ ǫs , θ − τ ǫs ,ρ ǫs (cid:1) .(c) Accept the proposed values ( τ ∗ ǫs , ρ ∗ ǫs ) with probability1 ∧ b Z (cid:0) V ǫs, T , V Aǫs, T − , τ ∗ ǫs , ρ ∗ ǫs , θ − τ ǫs ,ρ ǫs (cid:1) p ( τ ∗ ǫs , ρ ∗ ǫs ) b Z (cid:0) V ǫs, T , V Aǫs, T − , τ ǫs , ρ ǫs , θ − τ ǫs ,ρ ǫs (cid:1) p ( τ ǫs , ρ ǫs ) . × q τ ǫs ,ρ ǫs (cid:0) τ ǫs , ρ ǫs | V Nǫs, T , V Aǫs, T − , θ − τ ǫs ,ρ ǫs , τ ∗ ǫs , ρ ∗ ǫs (cid:1) q τ ǫs ,ρ ǫs (cid:0) τ ∗ ǫs , ρ ∗ ǫs | V Nǫs, T , V Aǫs, T − , θ − τ ǫs ,ρ ǫs , τ ǫs , ρ ǫs (cid:1) Part 2: Sample J ǫ, S, T ∼ e π N (cid:0) ·| V Nǫ, S, T , V Aǫ, S, T − , θ (cid:1) and sample J f, K, T ∼ e π N (cid:0) ·| V Nf, K, T , V Af, K, T − , θ (cid:1) Part 3: PG sampling.(a) Sample β | λ J f, T T , J f, T , h J ǫ, T T , J ǫ, T , θ − β , y T using Eq. (S1).(b) Redraw the diagonal elements of β through the deep interweaving proce-dure described in Section S4(c) Sample f T | λ J f, T T , J f, T , h J ǫ, T T , J ǫ, T , θ − f T , y T using Eq. (S2).(d) Sample φ fk | λ j fk T k T , j fk T , θ − φ fk for k = 1 , ..., K (e) Sample φ ǫs | h j ǫs T s T , j ǫs T , θ − φ ǫs and sample µ ǫs | h j ǫs T s T , j ǫs T , θ − µ ǫs for s =1 , ..., S S5his is a continuation of Algorithm S1Part 4: Sample ( V fk, T , V Afk, T − ) from e π N (cid:16) ·| λ j fk T k T , j fk T , θ (cid:17) using the CCSMC al-gorithm (Algorithm 2) and obtain b Z (cid:16) V fk, T , V Afk, T − , τ fk , θ − τ fk (cid:17) for k =1 , ..., K Part 5: Sample ( V ǫs, T , V Aǫs, T − ) from e π N (cid:0) ·| h j ǫs T s T , j ǫs T , θ (cid:1) using the CCSMC algo-rithm (Algorithm 2) and obtain b Z (cid:0) V ǫs, T , V Aǫs, T − , τ ǫs , ρ ǫs , θ − τ ǫs ,ρ ǫs (cid:1) for s =1 , ..., S . S6 urther discussion of Part 3, (d) and (e) of Algorithm S1 For k =1 , ..., K , sample the autoregressive coefficient φ fk from e π N (cid:16) ·| λ j fk T k T , j fk T , θ − φ fk (cid:17) .We draw a proposed value φ ∗ fk from N (cid:16) µ φ fk , σ φ fk (cid:17) truncated within ( − , µ φ fk = σ φ fk τ fk T X t =2 λ kt λ kt − , σ φ fk = τ fk P T − t =2 λ kt . (S6)The candidate is accepted with probabilitymin , p (cid:0) φ ∗ fk (cid:1) q − φ ∗ fk p ( φ fk ) q − φ fk . 
(S7)For s = 1 , ..., S , sample µ ǫs from N (cid:0) µ µ ǫs , σ µ ǫs (cid:1) , where µ µ ǫs = σ µ ǫs h s (1 − φ ǫs ) (1 − ρ ǫs ) + (1 − φ ǫs ) P Tt =2 h st − φh st − − ρ ǫs τ ǫs ǫ ∗ st − τ ǫs (1 − ρ ǫs ) , (S8) ǫ ∗ st − = ( y st − − β s f t − ) exp ( − h st − /
2) and σ µ ǫs = τ ǫs (1 − ρ ǫs )(1 − φ ǫs ) (1 − ρ ǫs ) + ( T −
1) (1 − φ ǫs ) . (S9)For s = 1 , ..., S , sample φ ǫs , by drawing a proposed value φ ∗ ǫs from N (cid:0) µ φ ǫs , σ φ ǫs (cid:1) truncated within ( − , µ φ ǫs = P Tt =2 ( h s,t − µ ǫs ) ( h st − − µ ǫs ) − ρ ǫs τ ǫs ( h st − − µ ǫs ) ǫ ∗ st − P Tt =2 ( h st − − µ ǫs ) − ( h s − µ ǫs ) (1 − ρ ǫs ) , (S10)and σ φ ǫs = τ ǫs (1 − ρ ǫs ) P Tt =2 ( h st − − µ ǫs ) − ( h s − µ ǫs ) (1 − ρ ǫs ) . The candidate is accepted with probabilitymin ( p ( φ ∗ ǫs ) p − φ ∗ ǫs p ( φ ǫs ) p − φ ǫs , ) . (S11)In all the examples, the PMMH step uses the bootstrap filter to sample the particlesand the adaptive random walk as the proposal density for the parameters.S7 It is well-known that sampling the factor loading matrix β conditional on { f t } andthen sampling { f t } conditional on β is very inefficient and leads to extremely slowconvergence and poor mixing. Our article employs a simple approach based on anancillarity-sufficiency interweaving strategy (ASIS), in particular the deep interweav-ing strategy, introduced by Kastner et al. (2017), that we now briefly describe. Theparameterisation underlying deep interweaving is given by y t = β ∗ f ∗ t + V t ε t , f ∗ t | λ ∗ ∼ N K (cid:0) , diag (cid:0) e λ ∗ t , ..., e λ ∗ Kt (cid:1)(cid:1) , (S12)with a lower triangular factor loading matrix β ∗ , where β ∗ = 1 , ..., β ∗ KK = 1. Thefactor model can be reparameterised in Eq. (S12) using a simple linear transformation f ∗ t = Df t , β ∗ = βD − , where D = diag ( β , ..., β KK ), for t = 1 , .., T . The K latent factor volatilities λ ∗ kt follow the following univariate SV models having levels µ fk = log β kk , rather thanzero, as in the factor SV model. The transformed factor volatilities are given by λ ∗ kt = λ kt + log β kk , t = 0 , ..., T, k = 1 , ..., K. We add the following deep interweaving algorithm in between sampling the factorloading matrix and sampling the latent factors and perform these steps independentlyfor each k = 1 , .., K , • Determine the vector β ∗ .,k , where β ∗ sk = β oldsk /β oldkk in the k th column of thetransformed factor loading matrix β ∗ . • Define λ ∗ k, T = λ oldk, T +2 log | β oldkk | and sample β newkk from p (cid:0) β kk | β ∗ .,k , λ ∗ k,. , φ fk , τ fk (cid:1) for factor log-volatilites follows SV process. • Update β .,k = β newkk β oldkk β old.,k , f k,. = β oldkk β newkk f oldk,. , and λ k, T = λ oldk, T + 2 log | β oldkk β newkk | .In the deep interweaving representation, we sample the scaling parameter β kk indirectly through µ fk , k = 1 , ..., K . The implied prior p ( µ fk ) ∝ exp ( µ fk / − exp ( µ fk ) / p (cid:0) β ∗ .,k | µ fk (cid:1) ∼ N (0 , exp ( − µ fk ) I k l ) so that p (cid:0) µ fk | β ∗ .,k , λ ∗ k,. , φ fk , τ fk (cid:1) ∝ p (cid:0) λ ∗ fk,. | µ fk , φ fk , τ fk (cid:1) p (cid:0) β ∗ .,k | µ fk (cid:1) p ( µ fk ) , which is not in an easily recognisable form from which to sample. Instead, weS8raw the proposal µ propfk from N ( A, B ), where A = P T − t =2 λ ∗ kt + ( λ ∗ kT − φ fk λ k ) / (1 − φ fk ) T − /B , B = τ fk / (1 − φ fk ) T − /B . 
Denoting the current value of $\mu_{fk}$ by $\mu^{\mathrm{old}}_{fk}$, the proposed value $\mu^{\mathrm{prop}}_{fk}$ is accepted with probability $\min(1, R)$, where
$$R = \frac{p\big(\mu^{\mathrm{prop}}_{fk}\big)\, p\big(\lambda^*_{k,\cdot} \mid \mu^{\mathrm{prop}}_{fk}, \phi_{fk}, \tau_{fk}\big)\, p\big(\beta^*_{\cdot,k} \mid \mu^{\mathrm{prop}}_{fk}\big)}{p\big(\mu^{\mathrm{old}}_{fk}\big)\, p\big(\lambda^*_{k,\cdot} \mid \mu^{\mathrm{old}}_{fk}, \phi_{fk}, \tau_{fk}\big)\, p\big(\beta^*_{\cdot,k} \mid \mu^{\mathrm{old}}_{fk}\big)} \times \frac{p_{\mathrm{aux}}\big(\mu^{\mathrm{old}}_{fk} \mid \phi_{fk}, \tau_{fk}\big)}{p_{\mathrm{aux}}\big(\mu^{\mathrm{prop}}_{fk} \mid \phi_{fk}, \tau_{fk}\big)},$$
and $p_{\mathrm{aux}}\big(\mu_{fk} \mid \phi_{fk}, \tau_{fk}\big)$ is the $N\big(0, B\tau^2_{fk}/(1 - \phi_{fk})^2\big)$ density. We follow Kastner et al. (2017) and set the constant $B$ to the large value 10.
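The following is a minimal sketch of this deep interweaving move for a single column $k$. It assumes a standard normal prior on $\beta_{kk}$ (so the implied prior has the form stated above), uses our reconstruction of the proposal moments $A$ and $V$, and all function and argument names are ours rather than the paper's.

```python
import numpy as np

def log_lik_ar1(lam_star, mu, phi, tau2):
    """AR(1) log-density of the transformed volatility path with level mu."""
    r = lam_star[1:] - mu - phi * (lam_star[:-1] - mu)
    ll = -0.5 * np.sum(r ** 2) / tau2
    # Stationary term for the initial state.
    ll += 0.5 * np.log1p(-phi ** 2) \
        - 0.5 * (1 - phi ** 2) * (lam_star[0] - mu) ** 2 / tau2
    return ll

def deep_interweave_column(beta_col, f_row, lam, k, phi, tau2, B, rng):
    """One deep interweaving move on column k: redraw beta_kk via mu = log beta_kk^2."""
    b_old = beta_col[k]
    beta_star = beta_col / b_old                   # beta*_{s,k} = beta_sk / beta_kk
    lam_star = lam + 2.0 * np.log(abs(b_old))      # lambda* = lambda + log beta_kk^2
    free = np.ones(len(beta_col), dtype=bool)      # beta*_kk = 1 is fixed, rest are free
    free[k] = False
    b_free = beta_star[free]
    T = len(lam_star)
    # Proposal N(A, V) as reconstructed in the text, with aux-prior constant B.
    A = (np.sum(lam_star[1:T-1])
         + (lam_star[-1] - phi * lam_star[0]) / (1 - phi)) / (T - 1 + 1 / B)
    V = tau2 / ((1 - phi) ** 2 * (T - 1 + 1 / B))
    mu_old = np.log(b_old ** 2)
    mu_prop = rng.normal(A, np.sqrt(V))

    def log_target(mu):
        lp = 0.5 * mu - 0.5 * np.exp(mu)           # implied prior, assuming beta_kk ~ N(0, 1)
        # beta* | mu ~ N(0, exp(-mu) I) on the free loadings.
        lp += 0.5 * b_free.size * mu - 0.5 * np.exp(mu) * np.sum(b_free ** 2)
        return lp + log_lik_ar1(lam_star, mu, phi, tau2)

    def log_aux(mu):                               # p_aux = N(0, B tau^2 / (1 - phi)^2)
        return -0.5 * mu ** 2 * (1 - phi) ** 2 / (B * tau2)

    log_R = log_target(mu_prop) - log_target(mu_old) + log_aux(mu_old) - log_aux(mu_prop)
    if np.log(rng.uniform()) < log_R:
        b_new = np.sign(b_old) * np.exp(0.5 * mu_prop)
        beta_col = (b_new / b_old) * beta_col      # rescale column k of beta
        f_row = (b_old / b_new) * f_row            # compensating rescale of the factors
        lam = lam_star - 2.0 * np.log(abs(b_new))  # lambda = lambda_old + 2 log|b_old/b_new|
    return beta_col, f_row, lam
```

Because the move rescales $\beta_{\cdot,k}$, $f_{k,\cdot}$ and the volatility level together, it updates $\beta_{kk}$ jointly with the whole transformed path, which is what breaks the slow mixing of the plain conditional draws.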
S5 Further empirical results for the univariate SV model with leverage

This section gives results for N = 50 and N = 100 particles, similar to the results in Section 5.2 for N = 20 particles.

Table S1: Univariate SV model with leverage for the two samplers. Sampler I: Efficient PMMH$(\tau, \rho)$ + PG$(\mu, \phi)$, and Sampler II: PGBS$(\mu, \tau, \phi, \rho)$, for US stock returns data with T = 3001 and N = 50 particles. The table gives estimates of the inefficiencies, TNV and RTNV for the parameters, as well as the inefficiencies for the log-volatilities $h^T$. For the log-volatilities, the estimated IACTs for Samplers I and II are: minimum 1.16 and 1.07, mean 2.32 and 2.10, and maximum 24.69 and 32.08, respectively.
Table S2: Univariate SV model with leverage for the two samplers. Sampler I: Efficient PMMH$(\tau, \rho)$ + PG$(\mu, \phi)$, and Sampler II: PGBS$(\mu, \tau, \phi, \rho)$, for US stock returns data with T = 3001 and N = 100 particles. The table gives estimates of the inefficiencies, TNV and RTNV for the parameters, as well as the inefficiencies for the log-volatilities $h^T$. For the log-volatilities, the estimated IACTs for Samplers I and II are: minimum 1.03 and 1.00, mean 1.81 and 1.74, and maximum 22.18 and 18.54, respectively.
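For reference, the sketch below shows one way to compute the summaries reported in these tables: an IACT estimate from empirical autocorrelations, and TNV as IACT times computing time per iteration, with RTNV the ratio of TNVs across samplers. The truncation heuristic and these conventions are our assumptions, chosen to match the captions; the reference list includes CODA (Plummer et al., 2006), a standard tool for such diagnostics.

```python
import numpy as np

def iact(chain, c=5.0):
    """Integrated autocorrelation time: 1 + 2 * sum of autocorrelations.

    Uses an adaptive truncation (sum lags j < c * current estimate),
    a common heuristic; the paper may use a different window.
    """
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    tau = 1.0
    for j in range(1, n):
        tau += 2.0 * rho[j]
        if j >= c * tau:          # adaptive truncation of the lag window
            break
    return max(tau, 1.0)

def tnv(chain, time_per_iter):
    """Time normalised variance: IACT times computing time per iteration."""
    return iact(chain) * time_per_iter

# RTNV of Sampler II relative to Sampler I for one parameter:
# rtnv = tnv(chain_II, time_II) / tnv(chain_I, time_I)
```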
This section gives results for N = 20, N = 50, and N = 200 particles, similar to the ones in Section 5.3 for N = 100 particles.

Table S3: Comparing Sampler I: Efficient PMMH$(\tau_f, \tau_\epsilon, \rho_\epsilon)$ + PG$(\mu_\epsilon, \phi_\epsilon, \phi_f, \beta)$ with Sampler II: PGBS$(\mu_\epsilon, \phi_\epsilon, \phi_f, \beta, \tau_f, \tau_\epsilon, \rho_\epsilon)$ in terms of Time Normalised Variance for a factor SV model with leverage for US stock return data with T = 3001, N = 20 particles, S = 26, and K = 4. Time denotes the time taken in seconds per iteration of the method. The table shows minimum, mean and maximum IACT values for the parameters ($\beta$, $\mu_\epsilon$, $\tau_\epsilon$, $\phi_\epsilon$, $\rho_\epsilon$) and for the factor and idiosyncratic log-volatilities; its bottom part also shows $\widehat{\mathrm{IACT}}_{\mathrm{MAX}}$, $\widehat{\mathrm{TNV}}_{\mathrm{MAX}}$, ..., $\widehat{\mathrm{RTNV}}_{\mathrm{MEAN}}$, which refer to the parameters only.
48 1 . (cid:0) τ f , τ ǫ , ρ ǫ (cid:1) + PG( µ ǫ , φ ǫ , φ f , β ) ,with Sampler II: PGBS (cid:0) µ ǫ , φ ǫ , φ f , β, τ f , τ ǫ , ρ ǫ (cid:1) in terms of Time Normalised Vari-ance for a factor SV model with leverage for US stock return data with T = 3001, N = 50 particles, S = 26, and K = 4. Time denotes the time taken in seconds periteration of the method. The table shows minimum, mean and maximum IACT val-ues for the parameters and the factor and idiosyncratic log volatilities. The bottompart of the table also shows \ IACT
MAX , [ TNV
MAX , ..., \ RTNV
MEAN which refer to theparameters only. I IIMin Mean Max Min Mean Max β .
68 1 .
95 2 .
43 3 .
44 3 .
69 5 . β .
51 22 .
88 24 .
34 26 .
33 40 .
69 42 . β .
56 43 .
12 56 .
25 18 .
41 89 .
23 130 . β .
03 38 .
56 114 .
51 13 .
50 54 .
11 115 . µ .
10 3 .
77 37 .
46 2 .
00 6 .
35 52 . τ .
29 53 .
89 175 .
50 19 .
12 581 .
91 1545 . φ .
50 43 .
57 165 .
24 302 .
60 1708 .
74 4069 . ρ .
95 22 .
49 56 .
73 196 .
02 499 .
45 1267 . h , T .
45 2 .
26 5 .
42 2 .
87 4 .
29 9 . h , T .
75 3 .
37 10 .
00 3 .
08 5 .
28 10 . h , T .
83 3 .
65 17 .
36 3 .
06 6 .
70 45 . h , T .
21 1 .
74 7 .
75 2 .
40 3 .
49 23 . h , T .
24 2 .
01 32 .
62 2 .
35 4 .
95 109 . λ , T .
62 2 .
70 6 .
17 3 .
06 5 .
39 12 . λ , T .
69 11 .
96 16 .
00 15 .
95 21 .
42 34 . λ , T .
01 24 .
98 36 .
24 31 .
21 51 .
42 87 . λ , T .
73 41 .
86 60 .
02 27 .
70 50 .
80 79 . \ IACT
MAX .
50 4069 . [ TNV
MAX .
58 5087 . \ RTNV
MAX . \ IACT
MEAN .
29 411 . [ TNV
MEAN .
20 514 . \ RTNV
MEAN . .
67 1 . (cid:0) τ f , τ ǫ , ρ ǫ (cid:1) + PG( µ ǫ , φ ǫ , φ f , β ), withSampler II: PGBS (cid:0) µ ǫ , φ ǫ , φ f , β, τ f , τ ǫ , ρ ǫ (cid:1) in terms of Time Normalised Variance fora factor SV model with leverage for US stock return data with T = 3001, N = 200particles, S = 26, and K = 4. Time denotes the time taken in seconds per iterationof the method. The table shows minimum, mean and maximum IACT values for theparameters and the factor and idiosyncratic log volatilities. The bottom part of thetable also shows \ IACT
MAX , [ TNV
MAX , ..., \ RTNV
MEAN which refer to the parametersonly. I IIMin Mean Max Min Mean Max β .
74 2 .
21 3 .
46 1 .
75 2 .
18 3 . β .
73 20 .
86 21 .
60 20 .
82 27 .
25 32 . β .
37 46 .
40 60 .
40 10 .
56 35 .
45 46 . β .
81 30 .
40 82 .
74 6 .
19 32 .
22 79 . µ .
05 2 .
92 26 .
82 1 .
06 3 .
53 29 . τ .
31 31 .
83 98 .
20 17 .
38 323 .
40 833 . φ .
50 40 .
97 108 .
26 174 .
71 859 .
31 1956 . ρ .
35 17 .
97 70 .
88 120 .
49 301 .
36 1043 . h , T .
24 1 .
87 3 .
49 1 .
30 1 .
99 5 . h , T .
30 2 .
46 8 .
17 1 .
32 3 .
02 18 . h , T .
27 2 .
67 14 .
57 1 .
23 2 .
40 14 . h , T .
08 1 .
49 7 .
25 1 .
07 1 .
51 6 . h , T .
98 1 .
74 43 .
23 1 .
03 1 .
95 39 . λ , T .
32 1 .
97 4 .
15 1 .
32 2 .
11 5 . λ , T .
16 9 .
93 12 .
68 10 .
87 15 .
45 23 . λ , T .
58 20 .
54 32 .
43 10 .
63 16 .
32 30 . λ , T .
34 32 .
43 47 .
92 21 .
11 33 .
15 51 . \ IACT
MAX .
26 1956 . [ TNV
MAX .
98 2993 . \ RTNV
MAX . \ IACT
MEAN .
38 217 . [ TNV
MEAN .
42 333 . \ RTNV
MEAN . .
75 1 . Table S6: The list of industry portfoliosStocks1 Coal2 Health Care and Equipment3 Retail4 Tobacco5 Steel Works6 Food Products7 Recreation8 Printing and Publishing9 Consumer Goods10 Apparel11 Chemicals12 Textiles13 Fabricated Products14 Electrical Equipment15 Automobiles and Trucks16 Aircraft, ships, and Railroad Equipment17 Industrial Mining18 Petroleum and Natural Gas19 Utilities20 Telecommunication21 Personal and Business Services22 Business Equipment23 Transportation24 Wholesale25 Restaurants, Hotels, and Motels26 Banking, Insurance, Real Estate
S8 Ergodicity of the Efficient PMMH + PG sampler
This section discusses the ergodicity of the Efficient PMMH + PG sampler under conditions that hold for our applications. Define
$$w^1_\theta(x_1) := \frac{g^1_\theta(y_1 \mid x_1)\, f^1_\theta(x_1)}{m^1_\theta(x_1)} \quad \text{and} \quad w^t_\theta(x_t, x_{t-1}) := \frac{g^t_\theta(y_t \mid x_t)\, f^t_\theta(x_t \mid x_{t-1})}{m^t_\theta(x_t \mid x_{t-1})} \quad \text{for } t \geq 2,$$
and assume:

Assumption S1. (i) $0 < w^1_\theta(x_1) < \infty$ and $0 < w^t_\theta(x_t, x_{t-1}) < \infty$ for $t \geq 2$, for all $\theta \in \Theta$ and $x_t, x_{t-1} \in \mathcal{X}$.
(ii) For $i = 1, \dots, p_1$, $0 < q_i\big(\theta_i \mid v^N_{x,T}, v^A_{1:T-1}, \theta_{-i}, \theta^*_i\big) < \infty$ for all $(\theta_i, \theta_{-i}), (\theta^*_i, \theta_{-i}) \in \Theta$, and for all $v^N_{x,T}, v^A_{1:T-1}$.
(iii) For $i = p_1 + 1, \dots, p$, $0 < q_i\big(\theta_i \mid x^T, j^T, \theta_{-i}, \theta^*_i\big) < \infty$ for all $(\theta_i, \theta_{-i}), (\theta^*_i, \theta_{-i}) \in \Theta$, and for all $x^T \in \mathcal{X}^T$ and $j^T \in \{1, \dots, N\}^T$.

Proposition S1 shows that the Efficient PMMH + PG sampler converges to $\widetilde{\pi}^N$ in total variation norm if Assumption S1 holds, and obtains a consistency result for functions of $\theta$. We note that we can relax the assumptions needed for the proposition to hold at the cost of a more complicated proof (see, for example, Andrieu et al., 2010, Theorems 4 and 5). However, the conditions of Assumption S1 hold for most applications, and in particular for the applications in our article.
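To see what Assumption S1(i) requires in practice, the sketch below implements one bootstrap-filter step in which the proposal $m^t_\theta$ equals the state transition $f^t_\theta$, so the incremental weight reduces to the observation density $g^t_\theta(y_t \mid x_t)$, which must be positive and finite for every particle. The AR(1) state and Gaussian observation density are illustrative stand-ins, not the paper's model.

```python
import numpy as np

def bootstrap_step(particles, y_t, transition, log_g, rng):
    """One bootstrap-filter step: propagate, weight, resample.

    With m_theta^t = f_theta^t, the incremental weight w_theta^t reduces
    to g_theta^t(y_t | x_t), which Assumption S1(i) requires to be
    positive and finite.
    """
    x = transition(particles, rng)             # propose from f_theta^t
    logw = log_g(y_t, x)                       # log incremental weights
    m = logw.max()
    w = np.exp(logw - m)                       # stabilised unnormalised weights
    log_z_incr = m + np.log(w.mean())          # increment of log Z-hat
    p = w / w.sum()
    idx = rng.choice(len(x), size=len(x), p=p) # multinomial resampling
    return x[idx], log_z_incr

# Illustrative inputs: AR(1) state transition, Gaussian observation density.
rng = np.random.default_rng(0)
transition = lambda x, rng: 0.9 * x + rng.standard_normal(x.shape)
log_g = lambda y, x: -0.5 * (y - x) ** 2 - 0.5 * np.log(2 * np.pi)
particles, log_z = bootstrap_step(rng.standard_normal(100), 0.3, transition, log_g, rng)
```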
Proposition S1. Suppose that Assumptions 1, 2 and S1 hold.
(i) The Efficient PMMH + PG sampler (Algorithm 1) converges to the target distribution $\widetilde{\pi}^N$ (Eq. (7)) in total variation norm.
(ii) Suppose that $\psi(\theta)$ is a scalar function of $\theta \in \Theta$ such that $E_\pi(|\psi|) < \infty$. Then
$$\widehat{E}_\pi(\psi) := \frac{1}{M}\sum_{i=1}^{M} \psi\big(\theta^{[i]}\big) \to E_\pi(\psi) \quad \text{almost surely } (\pi),$$
where the $\theta^{[i]}$ are the iterates of $\theta$ from Algorithm 1.

Proof. Proof of Part (i). Without loss of generality we take $p_1 = 1$ and $p = 2$. We can then consider the transition kernel $P$ of Algorithm 1 as a composition of four kernels, corresponding to Parts 1 to 4 of Algorithm 1. The first kernel is $P_1\big(\theta_1; d\theta^*_1 \mid v^N_{x,T}, v^A_{1:T-1}, \theta_2\big)$. The second kernel is $P_2\big(j^T \mid v^N_{x,T}, v^A_{1:T-1}, \theta\big)$. The third kernel is $P_3\big(\theta_2; d\theta^*_2 \mid x^T, j^T, \theta_1\big)$. The fourth kernel is $P_4\big(dv^N_{x,T}, v^A_{1:T-1} \mid x^{j^T, T}, j^T, \theta\big)$. Let $K_1, \dots, K_4$ be the corresponding substochastic kernels; then $K_2 = P_2$ and $K_4 = P_4$. We show that all four substochastic kernels are positive under Assumption S1. It then follows that $P = P_1 P_2 P_3 P_4$ is irreducible and aperiodic, and hence Algorithm 1 converges to $\widetilde{\pi}^N$ in total variation norm by Theorem 1 of Tierney (1994). It is clear from Corollary 2 that $K_4$ is positive, as the weights $w^i_t$ are positive and bounded by assumption. It is also clear that $K_2$ is positive because, in the constrained conditional sequential Monte Carlo algorithm (Algorithm 2), the normalised weights are positive and bounded. We now consider $K_1$, where
$$K_1 = \alpha\big(\theta_1, \theta^*_1 \mid V^N_{x,T}, V^A_{1:T-1}, \theta_{-1}\big)\, q_1\big(\theta^*_1 \mid V^N_{x,T}, V^A_{1:T-1}, \theta_{-1}, \theta_1\big),$$
with $\alpha\big(\theta_1, \theta^*_1 \mid V^N_{x,T}, V^A_{1:T-1}, \theta_{-1}\big)$ given by Eq. (11). It is evident from Assumption S1 that $K_1$ is positive. Finally, we consider $K_3$, where
$$K_3 = \alpha\big(\theta_2, \theta^*_2 \mid x^{j^T, T}, j^T, \theta_1\big)\, q_2\big(\theta_2; d\theta^*_2 \mid x^{j^T, T}, j^T, \theta_1\big),$$
with $\alpha\big(\theta_2, \theta^*_2 \mid x^{j^T, T}, j^T, \theta_1\big)$ given by Eq. (12). It is clear from Lemma 1 and Assumption S1 that $K_3$ is positive. Finally, we note that $P$ is also Harris recurrent, from the above and Theorem 12 of Roberts and Rosenthal (2006).
Part (ii) follows from Theorem 3 of Tierney (1994).

S8.1 Ergodicity of the Efficient PMMH + PG sampler for the factor SV model
We first note that, by construction, Sampling Scheme S1 in Section S3 has the stationary distribution given by Eq. (S5). The transition kernel of Sampling Scheme S1 is a composition of the transition kernels discussed in the proof of Proposition S1 together with the transition kernels for $\beta$ and $f^T$. The proof of Proposition S1 shows that the transition kernels involved in Algorithm 3 are positive, and it is clear that the transition kernels for $\beta$ and $f^T$ are also positive. Therefore, Sampling Scheme S1 is ergodic and all the results in Proposition S1 hold. The results in Proposition S2 below also hold for random scan and mixture versions of the sampler for the basic factor SV model.

S9 Central Limit theorem for Algorithm 1
Let $P_1, \dots, P_{p+2}$ be the $p + 2$ transition kernels in Algorithm 1, and let $P_{\text{rand scan}}$ and $P_{\text{mixture}}$ be random scan and mixture versions of Algorithm 1. That is, $P_{\text{rand scan}}$ is obtained by composing a random permutation of $P_1, \dots, P_{p+2}$, and $P_{\text{mixture}}$ is the mixture $\big(P_1 + \cdots + P_{p+2}\big)/(p + 2)$. We then have Proposition S2.
Suppose that Assumption S1 holds. Then the following results hold for the $P_{\text{rand scan}}$ and $P_{\text{mixture}}$ based MCMC algorithms.
(i) The mixture MCMC and random scan algorithms converge to the target distribution $\widetilde{\pi}^N$ (Eq. (7)) in total variation norm.
(ii) Let $\psi(\theta)$ be a scalar function of $\theta \in \Theta$ such that $E_\pi(|\psi|) < \infty$. Then, for both algorithms,
$$\widehat{E}_\pi(\psi) := \frac{1}{M}\sum_{i=1}^{M}\psi\big(\theta^{[i]}\big) \to E_\pi(\psi) \quad \text{almost surely } (\pi),$$
where the $\theta^{[i]}$ are the iterates of $\theta$ from Algorithm 1.
(iii) For both the $P_{\text{rand scan}}$ and $P_{\text{mixture}}$ based algorithms, let $\Gamma^\psi_j = \operatorname{Cov}\big(\psi(\theta^{[i]}), \psi(\theta^{[i+j]})\big)$ be the $j$th autocovariance of the $\psi(\theta^{[i]})$ iterates after the sampler has converged. Suppose that $V_\pi(\psi) = \Gamma^\psi_0 < \infty$ and
$$\mathrm{IF}_\psi := \Big(\Gamma^\psi_0 + 2\sum_{j=1}^{\infty}\Gamma^\psi_j\Big)\Big/\Gamma^\psi_0 < \infty.$$
Then
$$\sqrt{M}\,\big(\widehat{E}_\pi(\psi) - E_\pi(\psi)\big) \to N\big(0, V_\pi(\psi)\,\mathrm{IF}_\psi\big) \quad \text{as } M \to \infty.$$

Proof. Parts (i) and (ii) follow from the proof of Proposition S1. To prove Part (iii), we note that both the $P_{\text{rand scan}}$ and $P_{\text{mixture}}$ kernels are reversible. The result then follows from Theorem 27 of Roberts and Rosenthal (2004).
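As an illustration of the constructions in Proposition S2, the sketch below builds the sweep, random scan, and mixture kernels from a list of component kernels, here toy Gibbs updates for a bivariate Gaussian; all names are ours, and the components merely stand in for $P_1, \dots, P_{p+2}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def sweep(kernels):
    """Deterministic composition P = P_1 P_2 ... P_m (Algorithm 1's order)."""
    def kernel(state):
        for P in kernels:
            state = P(state)
        return state
    return kernel

def rand_scan(kernels):
    """Compose the kernels in a uniformly random order each iteration."""
    def kernel(state):
        for i in rng.permutation(len(kernels)):
            state = kernels[i](state)
        return state
    return kernel

def mixture(kernels):
    """Apply one kernel chosen uniformly at random: (P_1 + ... + P_m)/m."""
    def kernel(state):
        i = rng.integers(len(kernels))
        return kernels[i](state)
    return kernel

# Toy components: each updates one coordinate of a bivariate Gaussian target
# from its full conditional, so every construction leaves the target invariant.
def gibbs_coord(j, rho=0.8):
    def P(state):
        x = state.copy()
        x[j] = rho * x[1 - j] + np.sqrt(1 - rho ** 2) * rng.standard_normal()
        return x
    return P

components = [gibbs_coord(0), gibbs_coord(1)]
P_sweep, P_rs, P_mix = sweep(components), rand_scan(components), mixture(components)
state = np.zeros(2)
for _ in range(1000):
    state = P_rs(state)   # e.g., run the random scan chain
```

The random scan and mixture constructions are reversible when built from reversible components, which is the property used in the proof of Part (iii); the deterministic sweep is not reversible in general.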
References

Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B, 72:269-342.

Chan, J., Gonzalez, R. L., and Strachan, R. W. (2017). Invariant inference and efficient computation in the static factor model. Journal of the American Statistical Association.

Chib, S., Nardari, F., and Shephard, N. (2006). Analysis of high dimensional multivariate stochastic volatility models. Journal of Econometrics, 134:341-371.

Chopin, N. and Singh, S. S. (2015). On the particle Gibbs sampler. Bernoulli, 21(3):1855-1883.

Conti, G., Frühwirth-Schnatter, S., Heckman, J. J., and Piatek, R. (2014). Bayesian exploratory factor analysis. Journal of Econometrics, 183(1):31-57.

Deligiannidis, G., Doucet, A., and Pitt, M. (2018). The correlated pseudo-marginal method. To appear in Journal of the Royal Statistical Society, Series B.

Douc, R. and Cappé, O. (2005). Comparison of resampling schemes for particle filtering. In Image and Signal Processing and Analysis, 2005. ISPA 2005. Proceedings of the 4th International Symposium on, pages 64-69. IEEE.

Doucet, A., Godsill, S., and Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3):197-208.

Geweke, J. F. and Zhou, G. (1996). Measuring the pricing error of the arbitrage pricing theory. Review of Financial Studies, 9:557-587.

Godsill, S., Doucet, A., and West, M. (2004). Monte Carlo smoothing for nonlinear time series. Journal of the American Statistical Association, 99(465):156-168.

Harvey, A. C. and Shephard, N. (1996). The estimation of an asymmetric stochastic volatility model for asset returns. Journal of Business and Economic Statistics, 14:429-434.

Kastner, G., Frühwirth-Schnatter, S., and Lopes, H. F. (2017). Efficient Bayesian inference for multivariate factor stochastic volatility models. Journal of Computational and Graphical Statistics, 0(0):1-13.

Kim, S., Shephard, N., and Chib, S. (1998). Stochastic volatility: Likelihood inference and comparison with ARCH models. The Review of Economic Studies, 65(3):361-393.

Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1):1-25.

Lindsten, F., Jordan, M. I., and Schön, T. B. (2014). Particle Gibbs with ancestor sampling. Journal of Machine Learning Research, 15:2145-2184.

Lindsten, F. and Schön, T. B. (2012). On the use of backward simulation in the particle Gibbs sampler. In Proceedings of the 37th International Conference on Acoustics, Speech, and Signal Processing, pages 3845-3848. ICASSP.

Malik, S. and Pitt, M. (2011). Particle filters for continuous likelihood evaluation and maximisation. Journal of Econometrics, 165:190-209.

Mendes, E. F., Carter, C. K., Gunawan, D., and Kohn, R. (2018). Flexible particle Markov chain Monte Carlo methods with an application to a factor stochastic volatility model. arXiv preprint arXiv:1401.1667.

Olsson, J. and Ryden, T. (2011). Rao-Blackwellization of particle Markov chain Monte Carlo methods using forward filtering backward sampling. Signal Processing, IEEE Transactions on, 59(10):4606-4619.

Omori, Y., Chib, S., Shephard, N., and Nakajima, J. (2007). Stochastic volatility with leverage: Fast and efficient likelihood inference. Journal of Econometrics, 140:425-449.

Plummer, M., Best, N., Cowles, K., and Vines, K. (2006). CODA: Convergence Diagnosis and Output Analysis of MCMC. R News, 6(1):7-11.

Roberts, G. O. and Rosenthal, J. S. (2004). General state space Markov chains and MCMC algorithms. Probability Surveys, 1:20-71.

Roberts, G. O. and Rosenthal, J. S. (2006). Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov chains. The Annals of Applied Probability, 16(4):2123-2139.

Scharth, M. and Kohn, R. (2016). Particle efficient importance sampling. Journal of Econometrics, 190(1):133-147.

Tierney, L. (1994). Markov chains for exploring posterior distributions. The Annals of Statistics, 22(4):1701-1728.

Van Der Merwe, R., Doucet, A., De Freitas, N., and Wan, E. (2001). The unscented particle filter. In Advances in Neural Information Processing Systems, pages 584-590.