Globally-centered autocovariances in MCMC
Medha Agarwal, Dept. of Mathematics and Statistics, IIT Kanpur, [email protected]
Dootika Vats∗, Dept. of Mathematics and Statistics, IIT Kanpur, [email protected]

∗ Corresponding author
September 4, 2020
Abstract
Autocovariances are a fundamental quantity of interest in Markov chain Monte Carlo (MCMC) simulations, with autocorrelation function (ACF) plots being an integral visualization tool for performance assessment. Unfortunately, for slow mixing Markov chains, the empirical autocovariance can severely underestimate the truth. For multiple-chain MCMC sampling, we propose a globally-centered estimator of the autocovariance function (G-ACvF) that exhibits significant theoretical and empirical improvements. We show that the bias of the G-ACvF estimator is smaller than the bias of the current state-of-the-art. The impact of this improved estimator is evident in three critical output analysis applications: (1) ACF plots, (2) estimates of the Monte Carlo asymptotic covariance matrix, and (3) estimates of the effective sample size. Under weak conditions, we establish strong consistency of our improved asymptotic covariance estimator, and obtain its large-sample bias and variance. The performance of the new estimators is demonstrated through various examples.
Advancements in modern personal computing have made it easy to run parallel Markov chain Monte Carlo (MCMC) implementations. This is particularly useful for slow mixing Markov chains, where the starting points of the chains are spread over the state-space in order to more accurately capture characteristics of the target distribution. The output from $m$ parallel chains is then summarized, visually and quantitatively, to assess the empirical mixing properties of the chains and the quality of Monte Carlo estimators.

A key quantity that drives MCMC output analysis is the autocovariance function (ACvF). Estimators of the ACvF are only available for single-chain implementations, with a parallel-chain version obtained by naive averaging. As we will demonstrate, this defeats the purpose of running parallel Markov chains from dispersed starting values.

Let $F$ be the target distribution with mean $\mu$, defined on a space $\mathcal{X} \subseteq \mathbb{R}^d$ equipped with a countably generated $\sigma$-field $\mathcal{B}(\mathcal{X})$. For $s = 1, \ldots, m$, let $\{X_{st}\}_{t \ge 1}$ be the $s$th Harris ergodic $F$-invariant Markov chain (see Meyn and Tweedie, 2009, for definitions) employed to learn characteristics of $F$. The process is covariance stationary, so that the ACvF at lag $k$ is defined as

$$\Gamma(k) = \mathrm{Cov}_F\left(X_{s1}, X_{s(1+k)}\right) = E_F\left[(X_{s1} - \mu)(X_{s(1+k)} - \mu)^T\right].$$

Estimating $\Gamma(k)$ is critical to assessing the quality of the sampler and the reliability of Monte Carlo estimators. Let $\bar{X}_s = n^{-1}\sum_{t=1}^{n} X_{st}$ denote the Monte Carlo estimator of $\mu$ from the $s$th chain. The standard estimator of $\Gamma(k)$ is the sample autocovariance matrix at lag $k \ge 0$,

$$\hat{\Gamma}_s(k) = \frac{1}{n}\sum_{t=1}^{n-k}\left(X_{st} - \bar{X}_s\right)\left(X_{s(t+k)} - \bar{X}_s\right)^T, \qquad (1)$$

and for $k < 0$,
$\hat{\Gamma}_s(k) = \hat{\Gamma}_s(-k)^T$. For a single-chain MCMC run, the estimator $\hat{\Gamma}_s(k)$ is used to construct ACF plots, to estimate the long-run variance of Monte Carlo estimators (Hannan, 1970; Damerdji, 1991), and to estimate the effective sample size (ESS) (Kass et al., 1998; Gong and Flegal, 2016; Vats et al., 2019). However, there is no unified approach to constructing estimators of $\Gamma(k)$ for parallel-chain implementations. Specifically, parallel chains are often spread across the state space so as to adequately capture high density areas. For slow mixing chains, the chains take time to traverse the space, so that the $\bar{X}_s$ are all vastly different. Consequently, $\hat{\Gamma}_s(k)$ typically underestimates $\Gamma(k)$, leading to a false sense of security about the quality of the process.

We propose a globally-centered estimator of the ACvF (G-ACvF) that centers the Markov chains around the global mean from all $m$ chains. We show that the bias of the G-ACvF estimator is lower than that of $\hat{\Gamma}_s(k)$ and, through various examples, demonstrate improved estimation. We employ the G-ACvF estimators to construct ACF plots, and a demonstrative example is at the end of this section.

Estimators of ACvFs are used to estimate the long-run variance of Monte Carlo averages. Specifically, spectral variance (SV) estimators are used to estimate $\Sigma = \sum_{k=-\infty}^{\infty}\Gamma(k)$ (Andrews, 1991; Damerdji, 1991; Flegal and Jones, 2010). We replace $\hat{\Gamma}_s(k)$ with the G-ACvF in SV estimators to obtain a globally-centered SV (G-SV) estimator and demonstrate strong consistency under weak conditions. In the spirit of Andrews (1991), we also obtain the large-sample bias and variance of the resulting estimator. SV estimators can be prohibitively slow to compute (Liu and Flegal, 2018). To relieve the computational burden, we adapt the fast algorithm of Heberle and Sattarhoff (2017) to the G-SV estimator, which dramatically reduces computation time. The G-SV estimator is employed in the computation of ESS. We will show that using the G-SV estimator for estimating $\Sigma$ safeguards users against early termination of the MCMC process.

We use ACF plots to demonstrate the striking difference in the estimation of $\Gamma(k)$. Consider a random-walk Metropolis-Hastings sampler for a univariate mixture of Gaussians target density. Let

$$f(x) = \omega_1\, f(x;\, \mu_1,\, 1) + \omega_2\, f(x;\, 5,\, \sigma_2^2)$$
be the target density, where $f(x; a, b)$ is the density of a normal distribution with mean $a$ and variance $b$, and the two mixture components are well separated. We set $m = 2$ with starting values distributed to the two modes. The trace plots in Figure 1 indicate that over the shorter run the chains do not jump modes, so that the two Markov chains yield significantly different estimates of the population mean $\mu$. By the larger sample size, both Markov chains have traversed the state space reasonably and yield similar estimates of $\mu$.

In Figure 1, for the shorter run, we present the ACF plots using $\hat{\Gamma}_s(k)$ and our proposed G-ACvF estimator. The blue curves are the respective estimates at the longer run length. Once the chains have similar means, the G-ACF and the locally-centered ACF are equivalent. However, for the shorter run, $\hat{\Gamma}_s(k)$ critically underestimates the correlation, giving users a misleading visual of the quality of the Markov chain. The G-ACF estimator accounts for the discrepancy in sample means between the two chains, leading to a far improved quality of estimation.

Figure 1: Top: Target density and trace plots for two chains at the shorter (left) and longer (right) run lengths. Bottom: ACF plots for the first chain using local centering (left) and global centering (right) at the shorter run length, with the blue curves being estimates from the longer run.
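For concreteness, this kind of two-chain experiment can be sketched in a few lines of R. This is a minimal sketch only: the mixture weights, component means, and run length below are illustrative stand-ins, not the exact values used to produce Figure 1.

```r
# Random-walk Metropolis-Hastings for a bimodal Gaussian mixture target.
# Mixture parameters here are placeholders for the paper's exact values.
log_f <- function(x) log(0.5 * dnorm(x, -5, 1) + 0.5 * dnorm(x, 5, 1))

rwmh <- function(n, x0, step = 1) {
  out <- numeric(n)
  cur <- x0
  for (t in seq_len(n)) {
    prop <- cur + step * rnorm(1)
    if (log(runif(1)) < log_f(prop) - log_f(cur)) cur <- prop  # MH accept/reject
    out[t] <- cur
  }
  out
}

set.seed(1)
chains <- list(rwmh(1e3, x0 = -5), rwmh(1e3, x0 = 5))  # one chain started near each mode
```

With a short run, neither chain jumps modes and the two chain means disagree sharply, which is exactly the regime in which local centering misleads.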
Let $P$ denote the Markov transition kernel that uniquely determines the ACvF under stationarity. Recall that the sample mean of the $s$th Markov chain is $\bar{X}_s$, and denote the global mean vector by $\bar{\bar{X}} = m^{-1}\sum_{s=1}^m \bar{X}_s$. The global mean is a naturally superior estimator of $\mu$ compared to $\bar{X}_s$. For $k \ge 0$, we define the globally-centered ACvF (G-ACvF) estimator for the $s$th Markov chain as

$$\hat{\Gamma}_{G,s}(k) = \frac{1}{n}\sum_{t=1}^{n-k}\left(X_{st} - \bar{\bar{X}}\right)\left(X_{s(t+k)} - \bar{\bar{X}}\right)^T, \qquad (2)$$

with $\hat{\Gamma}_{G,s}(k) = \hat{\Gamma}_{G,s}(-k)^T$ for $k < 0$.
In the event that all $m$ Markov chains have been run long enough, $\bar{X}_s \approx \bar{\bar{X}}$, and hence $\hat{\Gamma}_s(k) \approx \hat{\Gamma}_{G,s}(k)$. However, for shorter runs or for slow mixing chains, $\Gamma(k)$ is more appropriately estimated by $\hat{\Gamma}_{G,s}$, as it utilizes information from all chains and accounts for the disparity between estimates of $\mu$.
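The estimator in (2) is straightforward to compute. Below is a minimal R sketch that returns either the locally-centered estimator (1) or the globally-centered estimator (2), so the two can be compared directly; `chains` is assumed to be a list of $m$ matrices, one row per iterate.

```r
# Lag-k autocovariance for chain s: locally centered (1) or globally centered (2).
acvf <- function(chains, s, k, global = TRUE) {
  X <- chains[[s]]
  n <- nrow(X)
  center <- if (global) {
    Reduce(`+`, lapply(chains, colMeans)) / length(chains)  # global mean over all m chains
  } else {
    colMeans(X)                                             # chain-specific mean
  }
  Xc <- sweep(X, 2, center)
  crossprod(Xc[1:(n - k), , drop = FALSE], Xc[(1 + k):n, , drop = FALSE]) / n
}
```

For the univariate example above, each chain can be stored as a one-column matrix, e.g. `acvf(lapply(chains, as.matrix), s = 1, k = 5)`.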
Let

$$\Phi^{(q)} = \sum_{k=-\infty}^{\infty} |k|^q\, \Gamma(k),$$

and denote $\Phi^{(1)}$ by $\Phi$. Let $\|\cdot\|$ denote the Euclidean norm. The proof of the following theorem is in Appendix A.2.

Theorem 1. Let $E_F\|X\|^{2+\delta} < \infty$ for some $\delta > 0$. If $P$ is polynomially ergodic of order $\xi > (2 + \epsilon)/(1 + 2/\delta)$ for some $\epsilon > 0$, then

$$E_F\left[\hat{\Gamma}_{G,s}(k)\right] = \left(1 - \frac{|k|}{n}\right)\left(\Gamma(k) - \frac{\Sigma}{mn} - \frac{\Phi}{mn^2}\right) + o\left(n^{-1}\right).$$

Remark 1. Polynomial ergodicity and the moment conditions are required to ensure $\Phi$ and $\Sigma$ are finite. The above result can be stated more generally for $\alpha$-mixing processes, but we limit our attention to Markov chains.

When $m = 1$, $\hat{\Gamma}_{G,s}(k) = \hat{\Gamma}_s(k)$, the bias results for which can be found in Priestley (1981). Since lag-covariances are typically positive in MCMC, Theorem 1 implies that the G-ACvF estimators are asymptotically unbiased and exhibit reduced bias in finite samples compared to $\hat{\Gamma}_s(k)$. The consequences of this are particularly pertinent for the diagonals of $\Gamma(k)$.

Let $\Gamma^{ii}(k)$ and $\hat{\Gamma}^{ii}_{G,s}(k)$ be the $i$th diagonal components of $\Gamma(k)$ and $\hat{\Gamma}_{G,s}(k)$, respectively. For component $i$, the autocorrelation is defined as

$$\rho^{(i)}(k) = \frac{\Gamma^{ii}(k)}{\Gamma^{ii}(0)},$$

and is instrumental in visualizing the serial correlation in the components of the Markov chain. A standard estimator of the autocorrelation is constructed from $\hat{\Gamma}_s(k)$. Instead, we advocate using the G-ACvF estimates, so that

$$\hat{\rho}^{(i)}_{G,s}(k) = \frac{\hat{\Gamma}^{ii}_{G,s}(k)}{\hat{\Gamma}^{ii}_{G,s}(0)}. \qquad (3)$$

The globally-centered autocorrelation provides a far more realistic assessment of the correlation structure of the marginal components of the chain, as evidenced in Figure 1.

We end this section by noting that we can obtain an average G-ACvF and G-ACF over all $m$ chains as

$$\hat{\Gamma}_G(k) = \frac{1}{m}\sum_{s=1}^m \hat{\Gamma}_{G,s}(k) \quad \text{and} \quad \hat{\rho}^{(i)}_G(k) = \frac{1}{m}\sum_{s=1}^m \hat{\rho}^{(i)}_{G,s}(k).$$

The averaged estimators present a measure of the overall correlation structure induced by the Markov transition kernel $P$.

A critical need for autocovariances is in the assessment of the Monte Carlo variability of estimators. Let $g: \mathcal{X} \to \mathbb{R}^p$ be an $F$-integrable function so that interest is in estimating $\mu_g = E_F[g(X)]$. Set $\{Y_{st}\}_{t\ge 1} = \{g(X_{st})\}_{t\ge 1}$ for $s = 1, \ldots, m$. Let $\bar{Y}_s = n^{-1}\sum_{t=1}^n Y_{st}$ and $\bar{\bar{Y}} = m^{-1}\sum_{s=1}^m \bar{Y}_s$. By Birkhoff's ergodic theorem, $\bar{\bar{Y}} \to \mu_g$ with probability 1 as $n \to \infty$. An asymptotic sampling distribution may be available via a Markov chain central limit theorem (CLT) if there exists a $p \times p$ positive-definite matrix $\Sigma$ such that $\sqrt{mn}\,(\bar{\bar{Y}} - \mu_g) \xrightarrow{d} N(0, \Sigma)$, where

$$\Sigma = \sum_{k=-\infty}^{\infty} \mathrm{Cov}_F\left(Y_{11}, Y_{1(1+k)}\right) := \sum_{k=-\infty}^{\infty} \Upsilon(k).$$

The goal in output analysis for MCMC is to estimate $\Sigma$ in order to assess the variability in $\bar{\bar{Y}}$ (Flegal et al., 2008; Roy, 2019; Vats et al., 2020). There is a rich literature on estimating $\Sigma$ for single-chain MCMC implementations. The most common are SV estimators (Andrews, 1991; Vats et al., 2018) and batch means estimators (Chen and Seila, 1987; Vats et al., 2019). Recently, Gupta and Vats (2020) constructed a replicated batch means estimator for estimating $\Sigma$ from parallel Markov chains. Batch means estimators are computationally more efficient than SV estimators, whereas SV estimators are more reliable (Damerdji, 1995; Flegal and Jones, 2010). Here, we utilize the G-ACvF estimators to construct a globally-centered SV (G-SV) estimator of $\Sigma$. Using the method of Heberle and Sattarhoff (2017), we also provide a computationally efficient implementation of the G-SV estimator.
For $k \ge 0$, the locally and globally-centered estimators of $\Upsilon(k)$ are

$$\hat{\Upsilon}_s(k) = \frac{1}{n}\sum_{t=1}^{n-k}\left(Y_{st} - \bar{Y}_s\right)\left(Y_{s(t+k)} - \bar{Y}_s\right)^T \quad \text{and} \quad \hat{\Upsilon}_{G,s}(k) = \frac{1}{n}\sum_{t=1}^{n-k}\left(Y_{st} - \bar{\bar{Y}}\right)\left(Y_{s(t+k)} - \bar{\bar{Y}}\right)^T,$$

respectively. Let $\hat{\Upsilon}_G(k) = m^{-1}\sum_{s=1}^m \hat{\Upsilon}_{G,s}(k)$. SV estimators are constructed as weighted and truncated sums of estimated ACvFs.
For some $c \ge 1$, let $w: \mathbb{R} \to [-c, c]$ be a lag window function and $b_n \in \mathbb{N}$ be a truncation point. The (locally-centered) SV estimator of $\Sigma$ for chain $s$ is

$$\hat{\Sigma}_s = \sum_{k=-b_n+1}^{b_n-1} w\!\left(\frac{k}{b_n}\right)\hat{\Upsilon}_s(k). \qquad (4)$$

Large-sample properties of $\hat{\Sigma}_s$ have been widely studied. Vats et al. (2018) provide conditions for strong consistency, while Flegal and Jones (2010) and Hannan (1970) obtain bias and variance. These results also extend naturally to an average SV (ASV) estimator defined as $\hat{\Sigma}_A := m^{-1}\sum_{s=1}^m \hat{\Sigma}_s$. We define the G-SV estimator as the weighted and truncated sum of G-ACvFs,

$$\hat{\Sigma}_G = \sum_{k=-b_n+1}^{b_n-1} w\!\left(\frac{k}{b_n}\right)\hat{\Upsilon}_G(k). \qquad (5)$$

Assumption 1. The lag window $w(x)$ is continuous at all but a finite number of points, is a bounded and even function with $w(0) = 1$, $\int_{-\infty}^{\infty} w(x)^2\,dx < \infty$, and $\int_{-\infty}^{\infty} |w(x)|\,dx < \infty$.

Assumption 1 is standard (see Anderson, 1971) and is satisfied by most lag windows. We will use the popular Bartlett lag window in our simulations, for which $w(x) = 1 - |x|$ for $|x| \le 1$ and $w(x) = 0$ otherwise.

First, we provide conditions for strong consistency. Strong consistency is particularly important to ensure that sequential stopping rules in MCMC yield correct coverage at termination (Flegal and Gong, 2015; Glynn and Whitt, 1992). A critical assumption is that of a strong invariance principle, which the following theorem establishes. Let $B(n)$ denote a standard $p$-dimensional Brownian motion.

Theorem 2 (Kuelbs and Philipp (1980); Vats et al. (2018)). Let $E_F\|Y\|^{2+\delta} < \infty$ for some $\delta > 0$ and let $P$ be polynomially ergodic of order $\xi > (q + 1 + \epsilon)/(1 + 2/\delta)$ for some $q \ge 1$. Then there exists a $p \times p$ lower triangular matrix $L$ with $LL^T = \Sigma$, a non-negative function $\psi(n) = n^{1/2 - \lambda}$ for some $\lambda > 0$, a finite random variable $D$, and a sufficiently rich probability space $\Omega$ such that for all $n > n_0$,

$$\left\|\sum_{t=1}^{n} Y_t - n\mu_g - LB(n)\right\| < D\,\psi(n) \quad \text{with probability } 1.$$

The proof of the result below is in Appendix A.3.
Theorem 3.
Let the assumptions of Theorem 2 hold with $q = 1$. If $\hat{\Sigma}_s \xrightarrow{a.s.} \Sigma$ for all $s$, and $n^{-1} b_n \log\log n \to 0$ as $n \to \infty$, then $\hat{\Sigma}_G \xrightarrow{a.s.} \Sigma$ as $n \to \infty$.

Conditions for strong consistency of $\hat{\Sigma}_s$ for polynomially ergodic Markov chains are in Vats et al. (2018). Typically, $b_n = \lfloor n^{\nu} \rfloor$ for some $0 < \nu < 1$, in which case $n^{-1} b_n \log\log n \to 0$; thus Theorem 3 presents no added conditions for strong consistency. Our next two results establish the large-sample bias and variance of the G-SV estimator and mimic those for $\hat{\Sigma}_s$, which can be found in Hannan (1970). Let $\Sigma^{ij}$ and $\hat{\Sigma}^{ij}_G$ denote the $ij$th elements of the matrices $\Sigma$ and $\hat{\Sigma}_G$, respectively. The proofs of the results below can be found in Appendix A.4 and Appendix A.5, respectively.

Theorem 4.
Let the assumptions of Theorem 2 hold with $q$ such that $\lim_{x \to 0} \frac{1 - w(x)}{|x|^q} = k_q < \infty$ and $b_n^{q+1}/n \to 0$ as $n \to \infty$. Then

$$\lim_{n \to \infty} b_n^q\, E\left[\hat{\Sigma}_G - \Sigma\right] = -k_q\, \Phi^{(q)}.$$

Hannan (1970) proved a similar bias result assuming $\mu_g$ is known, with the rate $b_n^q/n \to 0$ as $n \to \infty$. Similar to Anderson (1971) for the univariate case, when $\mu_g$ is replaced by $\bar{\bar{Y}}$, we require $b_n^{q+1}/n \to 0$ as $n \to \infty$ for the multivariate case.

Theorem 5.
Let the assumptions of Theorem 2 hold, and let $E[D^4] < \infty$ and $E_F\|Y\|^4 < \infty$. Then

$$\lim_{n \to \infty} \frac{n}{b_n}\, \mathrm{Var}\left(\hat{\Sigma}^{ij}_G\right) = \left[\Sigma^{ii}\Sigma^{jj} + \left(\Sigma^{ij}\right)^2\right]\int_{-\infty}^{\infty} w(x)^2\,dx.$$

For the Bartlett lag window, $q = 1$ with $k_q = 1$ and $\int w(x)^2\,dx = 2/3$. For other popular lag windows, see Anderson (1971).
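A direct R sketch of the G-SV estimator (5) with the Bartlett lag window follows. Here `chains` is a list of $m$ matrices of the $g$-transformed draws and `bn` is the truncation point; how `bn` is chosen (e.g., the mcmcse defaults used later) is left to the user.

```r
# Globally-centered spectral variance estimator (5), Bartlett window.
g_sv <- function(chains, bn) {
  m <- length(chains); n <- nrow(chains[[1]]); p <- ncol(chains[[1]])
  gmean <- Reduce(`+`, lapply(chains, colMeans)) / m
  centered <- lapply(chains, function(Y) sweep(Y, 2, gmean))
  Sig <- matrix(0, p, p)
  for (k in 0:(bn - 1)) {
    w <- 1 - k / bn                                # Bartlett weight w(k / b_n)
    # globally-centered lag-k ACvF, averaged over the m chains
    Uk <- Reduce(`+`, lapply(centered, function(Yc)
      crossprod(Yc[1:(n - k), , drop = FALSE], Yc[(1 + k):n, , drop = FALSE]) / n)) / m
    Sig <- Sig + if (k == 0) w * Uk else w * (Uk + t(Uk))  # lags k and -k
  }
  Sig
}
```

Since $\hat{\Upsilon}_G(-k) = \hat{\Upsilon}_G(k)^T$, the loop needs only non-negative lags. This direct form touches every lag up to $b_n$; the computational burden that creates is addressed next.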
The SV estimator, despite having good statistical properties, poses limitations due to slow computation. The complexity of the SV estimator is $O(b_n n p^2)$. For slow mixing Markov chains, $n$ and $b_n$ can be prohibitively large, limiting the use of SV estimators. We adapt the fast Fourier transform based algorithm of Heberle and Sattarhoff (2017) to calculate the G-SV estimator.

Let $w_k = w(k/b_n)$ and let $T(w)$ be the $n \times n$ Toeplitz matrix with first column $(1, w_1, w_2, \ldots, w_{n-1})^T$. Notice an alternate formulation of $\hat{\Sigma}_s$:

$$\hat{\Sigma}_s = \frac{1}{n} A_s^T\, T(w)\, A_s, \quad \text{where} \quad A_s = \left(Y_{s1} - \bar{Y}_s, \;\ldots,\; Y_{sn} - \bar{Y}_s\right)^T.$$

Let $w^* = (1, w_1, w_2, \ldots, w_{n-1}, 0, w_{n-1}, \ldots, w_1)^T$ and set $C(w^*)$ to be the symmetric circulant matrix whose leading $n \times n$ truncation satisfies $C(w^*)_{n,n} = T(w)$. Let $M^{(j)}$ denote the $j$th column of a matrix $M$ and $v^{(i)}$ denote the $i$th element of a vector $v$. With inputs $C(w^*)$ and $A_s$, Algorithm 1 produces $\hat{\Sigma}_s$ exactly. For more details, see Heberle and Sattarhoff (2017).
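The circulant embedding can be written compactly with R's built-in fft; a hedged sketch follows, with Algorithm 1 below giving the formal statement. Here `A` plays the role of $A_s$ (or of $B_s$ for the G-SV version) and `w` is the length-$n$ vector $(1, w_1, \ldots, w_{n-1})$ with $w_k = 0$ for $k \ge b_n$.

```r
# Compute (1/n) A^T T(w) A via a 2n circulant embedding and the FFT.
fast_sv <- function(A, w) {
  n <- nrow(A)
  cvec <- c(w, 0, rev(w[-1]))        # first column of C(w*), length 2n
  lam  <- fft(cvec)                  # eigenvalues of the circulant matrix
  Apad <- rbind(A, matrix(0, n, ncol(A)))
  CA   <- Re(mvfft(mvfft(Apad) * lam, inverse = TRUE)) / (2 * n)
  crossprod(A, CA[1:n, , drop = FALSE]) / n  # premultiply by A^T, divide by n
}

# Truncated Bartlett weights: w_k = 1 - k/bn for k < bn, and 0 otherwise
bartlett <- function(n, bn) pmax(1 - (0:(n - 1)) / bn, 0)
```

The cost is dominated by the $p$ length-$2n$ FFTs, so the bandwidth plays essentially no role in the computation time.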
Algorithm 1: Heberle and Sattarhoff (2017) algorithm
Input: $C(w^*)$ and $A_s$.
1. Compute the eigenvalues $\lambda_i$ of $C(w^*)$ from its first column $C(w^*)^{(1)}$ using a discrete Fourier transform, $i = 1, \ldots, 2n$.
2. Construct the $2n \times p$ matrix $A^*_s = (A_s^T, \, 0_{n \times p}^T)^T$.
3. for $j = 1, 2, \ldots, p$ do
4. Calculate $V^* A^{*(j)}_s$ by a DFT of $A^{*(j)}_s$.
5. Multiply $(V^* A^{*(j)}_s)^{(i)}$ with the eigenvalue $\lambda_i$ for all $i = 1, \ldots, 2n$ to construct $\Lambda V^* A^{*(j)}_s$.
6. Calculate $C(w^*) A^{*(j)}_s = V \Lambda V^* A^{*(j)}_s$ by an inverse FFT of $\Lambda V^* A^{*(j)}_s$.
7. end
8. Select the first $n$ rows of $C(w^*) A^*_s$ to form $T(w) A_s$. Premultiply by $A_s^T$ and divide by $n$.
Output: $\hat{\Sigma}_s$.

We observe that a similar decomposition is possible for the G-SV estimator. Setting $B_s = (Y_{s1} - \bar{\bar{Y}}, \, \ldots, \, Y_{sn} - \bar{\bar{Y}})^T$ and calling Algorithm 1 with inputs $C(w^*)$ and $B_s$ yields $\hat{\Sigma}_{G,s}$. Algorithm 1 has complexity $O(p^2\, n \log n)$ and is thus orders of magnitude faster. Of particular importance is the fact that the bandwidth $b_n$ has close to no impact on the computation time.

A useful method of assessing the reliability of $\bar{\bar{Y}}$ in estimating $\mu_g$ is the effective sample size (ESS). The ESS is the number of independent and identically distributed samples that would yield the same Monte Carlo variability in $\bar{\bar{Y}}$ as the correlated sample at hand. Let $|\cdot|$ denote determinant. A multiple-chain version of the ESS as defined by Vats et al. (2019) is

$$\mathrm{ESS} = mn\left(\frac{|\Upsilon(0)|}{|\Sigma|}\right)^{1/p}.$$

The simulation terminates when the ESS is greater than a pre-specified, theoretically motivated lower bound $W_p$. In our setting of $m$ parallel chains of $n$ samples each, we estimate ESS with

$$\widehat{\mathrm{ESS}}_G = mn\left(\frac{|\hat{\Upsilon}(0)|}{|\hat{\Sigma}_G|}\right)^{1/p}.$$

We use the locally-centered $\hat{\Upsilon}(0)$ to estimate $\Upsilon(0)$ instead of $\hat{\Upsilon}_G(0)$ when calculating ESS. Both $\hat{\Upsilon}(0)$ and $\hat{\Upsilon}_G(0)$ are consistent for $\Upsilon(0)$; however, the typical underestimation witnessed in $\hat{\Upsilon}(0)$ provides a safeguard against early termination. For comparison, $\widehat{\mathrm{ESS}}_A$ is constructed similarly, using $\hat{\Sigma}_A$ instead of $\hat{\Sigma}_G$ to estimate $\Sigma$.
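Once $\hat{\Sigma}_G$ is available, $\widehat{\mathrm{ESS}}_G$ is a one-liner; the sketch below reuses the `g_sv` function from the earlier sketch.

```r
# Globally-centered effective sample size for m chains of length n.
ess_g <- function(chains, bn) {
  m <- length(chains); n <- nrow(chains[[1]]); p <- ncol(chains[[1]])
  # locally-centered lag-0 covariance, averaged over the m chains
  U0 <- Reduce(`+`, lapply(chains, function(Y) cov(Y) * (n - 1) / n)) / m
  SigG <- g_sv(chains, bn)  # G-SV estimate of Sigma (sketched earlier)
  m * n * (det(U0) / det(SigG))^(1 / p)
}
```

Because $\hat{\Sigma}_A$ tends to be too small, the analogous $\widehat{\mathrm{ESS}}_A$ tends to be too large, which is what makes the G-SV version safer for stopping rules.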
For three different target distributions, we sample $m$ parallel Markov chains to assess the performance of our proposed estimators. We make the following three comparisons: (1) the locally-centered ACF vs the G-ACF estimator, (2) the A-SV vs the G-SV estimator, and (3) $\widehat{\mathrm{ESS}}_A$ vs $\widehat{\mathrm{ESS}}_G$. The quality of estimation of $\Sigma$ is studied by coverage probabilities of a 95% Wald confidence region when the true mean $\mu_g$ is known. The convergence of the local and global estimators of $\Sigma$ and ESS as $n$ increases is studied through two types of running plots: (1) the logarithm of the Frobenius norm of the estimated $\Sigma$, and (2) the logarithm of the estimated ESS$/mn$. In all three examples, we estimate the mean of the stationary distribution, so $g$ is the identity function. The truncation points $b_n$ are the defaults available in the R package mcmcse (Flegal et al., 2020).

Our first example is one where $\Upsilon$ and $\Sigma$ are available in closed form, allowing a comparison of the quality of estimation. Consider a $p$-dimensional VAR(1) process $\{X_t\}_{t \ge 1}$ such that

$$X_t = \Phi X_{t-1} + \epsilon_t,$$

where $X_t \in \mathbb{R}^p$, $\Phi$ is a $p \times p$ matrix, $\epsilon_t \overset{iid}{\sim} N(0, \Omega)$, and $\Omega$ is a positive-definite $p \times p$ matrix. We fix $\Omega$ to be an AR(1) correlation matrix with parameter 0.9. The invariant distribution for this Markov chain is $N(0, \Psi)$, where $\mathrm{vec}(\Psi) = (I_{p^2} - \Phi \otimes \Phi)^{-1}\mathrm{vec}(\Omega)$. For $k \ge 0$, the lag-$k$ autocovariance is $\Upsilon(k) = \Gamma(k) = \Phi^k \Psi$. The process satisfies a CLT if the spectral norm of $\Phi$ is less than 1 (Tjøstheim, 1990), and the limiting covariance $\Sigma$ is known in closed form (Dai and Jones, 2017). We set $p = 2$ and set $\Phi$ to have eigenvalues 0.999 and 0.001. We further set $m = 5$ with starting values dispersed across the state space.

We compare the locally and globally-centered autocorrelations against the truth. In Figure 2 are the estimated ACF plots for the first component of the second chain against the truth in red. At the shorter run length (top row), the commonly used locally-centered ACF underestimates the true correlation, giving a false sense of security about the mixing of the chain. The G-ACF estimator, on the other hand, is far more accurate. This difference is negligible at the larger run length, when each of the 5 chains has sufficiently explored the state space.

Since $\mu$ is known, we assess the performance of the A-SV and G-SV estimators via the coverage probabilities of 95% Wald confidence regions over 1000 replications. Table 1 shows that, irrespective of sample size, $\hat{\Sigma}_G$ results in close to nominal coverage probability, whereas $\hat{\Sigma}_A$ yields critically low coverage. The low coverage is a consequence of underestimating the autocovariances. It is only at the largest sample size that $\hat{\Sigma}_A$ yields close to nominal coverage.

The quality of estimation of $\Sigma$ and ESS is assessed by running plots from 50 replications of run length 50000.
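The VAR(1) setup is easy to reproduce; a sketch follows. The construction of $\Phi$ from its eigenvalues (via a random rotation) and the spread of the starting values are our assumptions, since the paper does not report them.

```r
set.seed(1)
p <- 2
Q   <- qr.Q(qr(matrix(rnorm(p^2), p)))        # random rotation (an assumption)
Phi <- Q %*% diag(c(0.999, 0.001)) %*% t(Q)   # eigenvalues 0.999 and 0.001
Omega <- 0.9^abs(outer(1:p, 1:p, "-"))        # AR(1) correlation matrix, parameter 0.9
# stationary covariance: vec(Psi) = (I - Phi (x) Phi)^{-1} vec(Omega)
Psi <- matrix(solve(diag(p^2) - kronecker(Phi, Phi), c(Omega)), p)
# long-run covariance: sum of Gamma(k) = Phi^k Psi over k >= 0, plus transposes
Sigma <- solve(diag(p) - Phi, Psi) + Psi %*% solve(diag(p) - t(Phi)) - Psi

var1_chain <- function(n, start) {
  R <- chol(Omega); X <- matrix(0, n, p); x <- start
  for (t in 1:n) { x <- drop(Phi %*% x) + drop(crossprod(R, rnorm(p))); X[t, ] <- x }
  X
}
chains <- lapply(1:5, function(s) var1_chain(5e4, start = rnorm(p, sd = 20)))
```

With `chains` in hand, `g_sv(chains, bn)` and `ess_g(chains, bn)` from the earlier sketches can be compared against the closed-form `Sigma`.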
Figure 2: VAR. ACF (left) and G-ACF (right) for the second chain with $m = 5$, at the shorter run length (top) and the longer run length (bottom). The red line is the true ACF.

Table 1: VAR. Coverage probabilities at 95% nominal level. Replications = 1000.

In Figure 3, we present running plots of $\log(\|\hat{\Sigma}\|_F)$ for both $\hat{\Sigma}_G$ and $\hat{\Sigma}_A$, and running plots of $\log(\widehat{\mathrm{ESS}}/mn)$ for both $\widehat{\mathrm{ESS}}_G$ and $\widehat{\mathrm{ESS}}_A$. It is evident that $\hat{\Sigma}_A$ severely underestimates the truth, leading to an overestimation of the ESS. The G-SV estimator is able to estimate $\Sigma$ more accurately early on, safeguarding against early termination using ESS.
Figure 3: VAR. (Left) Running plot of the logarithm of the Frobenius norm of the A-SV and G-SV estimators. (Right) Running plot of $\log(\widehat{\mathrm{ESS}}/mn)$ using the A-SV and G-SV estimators.
Consider the following family of bimodal bivariate distributions introduced by Gelman and Meng (1991), which we term the boomerang distribution. For $A \ge 0$ and $B, C \in \mathbb{R}$, the target density is

$$f(x, y) \propto \exp\left(-\frac{1}{2}\left[A x^2 y^2 + x^2 + y^2 - 2Bxy - 2Cx - 2Cy\right]\right).$$

We run a deterministic-scan Gibbs sampler to sample from this target, using the following full conditional densities:

$$x \mid y \sim N\left(\frac{By + C}{Ay^2 + 1},\; \frac{1}{Ay^2 + 1}\right), \qquad y \mid x \sim N\left(\frac{Bx + C}{Ax^2 + 1},\; \frac{1}{Ax^2 + 1}\right).$$

We consider two settings. In setting 1, $A = 1$, $B = 3$, $C = 8$, which results in well-separated modes. In setting 2, we let $A = 1$, $B = 10$, $C = 7$, which yields a boomerang shape for the contours. The contour plots for these two settings are on the left in Figure 4, overlaid with scatter plots of two parallel runs of the Gibbs sampler. Setting 2 is chosen specifically to illustrate that the locally and globally-centered ACvFs perform similarly when the Markov chain moves freely in the state space.
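The sampler is a direct transcription of the full conditionals above; a minimal R sketch:

```r
# Deterministic-scan Gibbs sampler for the boomerang target.
boomerang_gibbs <- function(n, A, B, C, x0, y0) {
  out <- matrix(0, n, 2); x <- x0; y <- y0
  for (t in 1:n) {
    x <- rnorm(1, mean = (B * y + C) / (A * y^2 + 1), sd = 1 / sqrt(A * y^2 + 1))
    y <- rnorm(1, mean = (B * x + C) / (A * x^2 + 1), sd = 1 / sqrt(A * x^2 + 1))
    out[t, ] <- c(x, y)
  }
  out
}
draws <- boomerang_gibbs(1e4, A = 1, B = 3, C = 8, x0 = 0, y0 = 0)  # setting 1
```

The starting values passed to `x0` and `y0` are illustrative; in the experiments below they are spread evenly across the state space.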
Figure 4: Boomerang. Top row is setting 1 and bottom row is setting 2. Contour plots of the target distributions overlaid with scatter plots for two chains (left); locally-centered ACF (middle) and G-ACF (right) plots for one chain at $n = 1000$, with the blue line being the ACF at the longer run length.

We run $m = 5$ parallel Markov chains with starting points evenly distributed across the state space. For setting 1, when the Markov chains have not been able to jump modes, the locally-centered ACF severely underestimates the autocorrelation. This is seen in the top-middle plot of Figure 4, where the locally-centered autocorrelations at $n = 1000$ are drastically different from those at the longer run length. Somewhere past $n = 1000$, the Markov chains jump modes, and it is only then that the locally-centered ACFs provide better estimates. The G-ACFs, on the other hand, produce similar ACF estimates at both run lengths, by measuring deviations about the global mean. For setting 2, at $n = 1000$, both methods yield similar ACFs, reinforcing our claim that there is much to gain by using the G-ACvF and nothing to lose.

The true mean of the target distribution can be obtained using numerical approximation. Using the A-SV and G-SV estimators, we construct 95% confidence regions and report coverage probabilities over 1000 replications, for both $m = 2$ and $m = 5$. Tables 2 and 3 report all results. In setting 1, the G-SV estimator systematically yields far superior coverage to the A-SV estimator for all values of $n$, whereas for setting 2 the results are almost identical, indicating the equivalence of the A-SV and G-SV estimators for fast mixing Markov chains.

Table 2: Setting 1: Coverage probabilities from 1000 replications.

n      | A-SV (m=2) | G-SV (m=2) | A-SV (m=5) | G-SV (m=5)
5000   | 0.595      | 0.700      | 0.402      | 0.640
10000  | 0.563      | 0.665      | 0.590      | 0.739
50000  | 0.775      | 0.814      | 0.807      | 0.864
100000 | 0.847      | 0.864      | 0.884      | 0.902

Table 3: Setting 2: Coverage probabilities from 1000 replications.

n      | A-SV (m=2) | G-SV (m=2) | A-SV (m=5) | G-SV (m=5)
1000   | 0.856      | 0.868      | 0.895      | 0.910
5000   | 0.921      | 0.925      | 0.910      | 0.915
10000  | 0.928      | 0.930      | 0.919      | 0.926
50000  | 0.943      | 0.944      | 0.951      | 0.952

In Figure 5, we present running plots of estimates of $\log(\mathrm{ESS}/mn)$ for both setting 1 and setting 2. As the sample size increases, $\widehat{\mathrm{ESS}}_G$ and $\widehat{\mathrm{ESS}}_A$ become closer, but early on in setting 1 the $\widehat{\mathrm{ESS}}_G$ estimates are much smaller, safeguarding users against early termination.
Figure 5: Boomerang. Running plots of $\log(\widehat{\mathrm{ESS}}_A)/mn$ and $\log(\widehat{\mathrm{ESS}}_G)/mn$ with $m = 5$, for setting 1 (left) and setting 2 (right).

Consider the sensor network localization problem of Ihler et al. (2005), where the goal is to identify unknown sensor locations using noisy distance data. We use the data and the model as specified by Tak et al. (2018). There are four sensors scattered on a planar region, where $x_i = (x_{i1}, x_{i2})^T$ denotes the location of the $i$th sensor. Let $y_{ij}$ denote the distance between sensors $x_i$ and $x_j$; if the distance is observed, $z_{ij} = 1$, and otherwise $z_{ij} = 0$. The complete model is then

$$z_{ij} \mid x_1, \ldots, x_4 \sim \text{Bernoulli}\left(\exp\left(-\frac{\|x_i - x_j\|^2}{2R^2}\right)\right), \qquad y_{ij} \mid z_{ij} = 1, x_i, x_j \sim N\left(\|x_i - x_j\|,\; \sigma^2\right).$$
Tak et al. (2018) fix $R$, set $\sigma = 0.02$, and use independent $N(0, I)$ priors on the locations. The distance $y_{ij}$ is specified only if $z_{ij} = 1$. The 8-dimensional posterior of $(x_1, x_2, x_3, x_4)$ is intractable with unknown full conditionals. A Metropolis-within-Gibbs type sampler is implemented, with each full conditional employing the repelling-attracting Metropolis (RAM) algorithm of Tak et al. (2018). The RAM algorithm runs Markov chains with a higher jumping frequency between the modes.

We run $m = 5$ parallel Markov chains with well-separated starting points. Coverage probabilities are not estimable, since the true posterior mean is unknown. The trace plot of the first coordinate of $x_1$ for two Markov chains is shown in Figure 6, which also plots the locally and globally-centered ACFs. As seen in previous examples, early estimates of the locally-centered ACFs are much smaller than later estimates. The globally-centered ACFs, on the other hand, are similar at small and large sample sizes. Figure 7 presents running plots of $\log\|\hat{\Sigma}\|_F$ and $\log(\mathrm{ESS}/mn)$ using the A-SV and G-SV estimators, along with standard errors from 10 replications. In both plots, the G-SV estimator and $\widehat{\mathrm{ESS}}_G/mn$ reach stability significantly earlier than the A-SV estimator and $\widehat{\mathrm{ESS}}_A/mn$.

For slow mixing Markov chains, a naive average of locally-centered ACvFs can dramatically underestimate the truth. This has a severe impact on ACF plots, Monte Carlo variance estimation, and the stopping time of MCMC algorithms. We provide a globally-centered estimate of the ACvF that leads to improvements in all three aspects of MCMC output analysis. All ACF plots in this manuscript have been constructed using the R package multichainACF (https://github.com/medhaaga/multichainACF).

Figure 6: Sensor. Trace plot of the first coordinate of $x_1$ (left) for two parallel chains. Average locally-centered ACF (solid orange) and G-ACF (solid blue) at $n = 5000$ (middle) and $n = 50000$ (right). Dashed lines are individual chain estimates.
Figure 7: Sensor. Running plots of $\log(\|\hat{\Sigma}\|_F)$ (left) and $\log(\mathrm{ESS})/mn$ (right) estimated using A-SV and G-SV, along with standard errors from 10 replications.

Another class of estimators of $\Sigma$ for reversible Markov chains is the multivariate initial sequence (mIS) estimators (Dai and Jones, 2017). Similar to SV estimators, ACvFs are a critical part of mIS estimators, and underestimation in the ACvFs yields underestimates of $\Sigma$. It is easy to show that $\hat{\Upsilon}_{G,s}(k)$ is a strongly consistent estimator of $\Upsilon(k)$. Replacing $\hat{\Upsilon}_s(k)$ with $\hat{\Upsilon}_{G,s}(k)$ and following (Dai and Jones, 2017, Theorem 2) will yield consistent overestimation of the generalized variance $|\Sigma|$, resulting in a globally-centered version of the mIS estimator. A valuable line of research would be to study the theoretical and empirical properties of the globally-centered mIS estimators.

The authors are thankful to Haema Nilakanta for sharing useful R code.

A Appendix
A.1 Preliminaries
Let $B^{(i)}$ denote the $i$th component of the $p$-dimensional standard Brownian motion $B$.

Lemma 1 (Csörgő and Révész (1981)). Let $b_n$ be a sequence such that $b_n \to \infty$ and $n/b_n \to \infty$ as $n \to \infty$. Then for all $\epsilon > 0$ and for almost all sample paths, there exists $n_0(\epsilon)$ such that for all $n \ge n_0(\epsilon)$,

$$\sup_{0 \le t \le n - b_n}\; \sup_{0 \le s \le b_n} \left| B^{(i)}(t+s) - B^{(i)}(t) \right| < (1+\epsilon)\left(2 b_n\left(\log\frac{n}{b_n} + \log\log n\right)\right)^{1/2},$$

$$\sup_{0 \le s \le b_n} \left| B^{(i)}(n) - B^{(i)}(n-s) \right| < (1+\epsilon)\left(2 b_n\left(\log\frac{n}{b_n} + \log\log n\right)\right)^{1/2},$$

and

$$\left| B^{(i)}(n) \right| < (1+\epsilon)\sqrt{2 n \log\log n}.$$

In order to generalize summations over positive and negative lags in the following proofs, we define the sets $I_k$, $J_k^1$, and $J_k^2$ as

$$I_k := \{1, \ldots, n-k\}, \quad J_k^1 := \{n-k+1, \ldots, n\}, \quad J_k^2 := \{1, \ldots, k\} \quad \text{for } k > 0,$$
$$I_k := \{1-k, \ldots, n\}, \quad J_k^1 := \{1, \ldots, -k\}, \quad J_k^2 := \{n+k+1, \ldots, n\} \quad \text{for } k < 0.$$

For $k = 0$, $J_k^1$ and $J_k^2$ are empty sets and $I_k := \{1, \ldots, n\}$. Notice that the empirical autocovariance estimator at lag $k$ requires a summation over $I_k$.

A.2 Proof of Theorem 1

We can break $\hat{\Gamma}_{G,s}(k)$ into four parts. For all $k$,

$$\begin{aligned}
\hat{\Gamma}_{G,s}(k) &= \frac{1}{n}\sum_{t\in I_k}\left(X_{st} - \bar{\bar{X}}\right)\left(X_{s(t+k)} - \bar{\bar{X}}\right)^T \\
&= \frac{1}{n}\sum_{t\in I_k}\left(X_{st} - \bar{X}_s\right)\left(X_{s(t+k)} - \bar{X}_s\right)^T + \frac{1}{n}\sum_{t\in J_k^1}\left(\bar{X}_s - X_{st}\right)\left(\bar{X}_s - \bar{\bar{X}}\right)^T \\
&\quad + \frac{1}{n}\sum_{t\in J_k^2}\left(\bar{X}_s - \bar{\bar{X}}\right)\left(\bar{X}_s - X_{st}\right)^T + \frac{n-|k|}{n}\left(\bar{X}_s - \bar{\bar{X}}\right)\left(\bar{X}_s - \bar{\bar{X}}\right)^T \\
&= \hat{\Gamma}_s(k) - \frac{1}{n}\sum_{t\in J_k^1} A_{st} - \frac{1}{n}\sum_{t\in J_k^2} A_{st}^T + \frac{n-|k|}{n}\left(\bar{X}_s - \bar{\bar{X}}\right)\left(\bar{X}_s - \bar{\bar{X}}\right)^T, \qquad (6)
\end{aligned}$$

where $A_{st} = (X_{st} - \bar{X}_s)(\bar{X}_s - \bar{\bar{X}})^T$. Under the assumption of stationarity, we study the expectations of each of the above terms. Without loss of generality, consider $A_{11}$. Since $\bar{\bar{X}} = m^{-1}\sum_{s=1}^m \bar{X}_s$, the chains are independent, and each chain has the same distribution,

$$\begin{aligned}
E[A_{11}] &= E\left[\left(X_{11} - \bar{X}_1\right)\left(\bar{X}_1 - \bar{\bar{X}}\right)^T\right] \\
&= \frac{m-1}{m}\left(E\left[X_{11}\bar{X}_1^T\right] - E\left[X_{11}\bar{X}_2^T\right] + E\left[\bar{X}_1\bar{X}_2^T\right] - E\left[\bar{X}_1\bar{X}_1^T\right]\right) \\
&= \frac{m-1}{m}\left(\frac{1}{n}\sum_{k=0}^{n-1}\Gamma(k) - \mathrm{Var}\left(\bar{X}_1\right)\right). \qquad (7)
\end{aligned}$$

Similarly,

$$E\left[A_{11}^T\right] = E[A_{11}]^T = \frac{m-1}{m}\left(\frac{1}{n}\sum_{k=0}^{n-1}\Gamma(k)^T - \mathrm{Var}\left(\bar{X}_1\right)\right). \qquad (8)$$

Further, since $\mathrm{Var}(\bar{\bar{X}}) = \mathrm{Cov}(\bar{X}_1, \bar{\bar{X}}) = m^{-1}\mathrm{Var}(\bar{X}_1)$,

$$E\left[\left(\bar{X}_1 - \bar{\bar{X}}\right)\left(\bar{X}_1 - \bar{\bar{X}}\right)^T\right] = \mathrm{Var}\left(\bar{X}_1\right) - \mathrm{Var}\left(\bar{\bar{X}}\right) = \frac{m-1}{m}\mathrm{Var}\left(\bar{X}_1\right). \qquad (9)$$

Additionally, $\hat{\Gamma}_s(k)$ exhibits the following expectation (from Priestley (1981)):

$$E\left[\hat{\Gamma}_s(k)\right] = \left(1 - \frac{|k|}{n}\right)\left(\Gamma(k) - \mathrm{Var}\left(\bar{X}_s\right)\right). \qquad (10)$$

Using (7), (8), (9), and (10) in (6),

$$\begin{aligned}
E\left[\hat{\Gamma}_{G,s}(k)\right] &= E\left[\hat{\Gamma}_s(k)\right] - \frac{1}{n}\left(\sum_{t\in J_k^1} E[A_{1t}] + \sum_{t\in J_k^2} E\left[A_{1t}^T\right]\right) + \left(1 - \frac{|k|}{n}\right)\left(1 - \frac{1}{m}\right)\mathrm{Var}\left(\bar{X}_1\right) \\
&= E\left[\hat{\Gamma}_s(k)\right] - \frac{|k|}{n}\left(1 - \frac{1}{m}\right)\left(\frac{1}{n}\sum_{h=0}^{n-1}\Gamma(h) + \frac{1}{n}\sum_{h=0}^{n-1}\Gamma(h)^T - 2\,\mathrm{Var}\left(\bar{X}_1\right)\right) + \left(1 - \frac{|k|}{n}\right)\left(1 - \frac{1}{m}\right)\mathrm{Var}\left(\bar{X}_1\right) \\
&= \left(1 - \frac{|k|}{n}\right)\left(\Gamma(k) - \frac{\mathrm{Var}\left(\bar{X}_1\right)}{m}\right) + o\left(n^{-1}\right).
\end{aligned}$$

By (Song and Schmeiser, 1995, Proposition 1),

$$\mathrm{Var}\left(\bar{X}_s\right) = \frac{\Sigma}{n} + \frac{\Phi}{n^2} + o\left(n^{-2}\right).$$

As a consequence, we get the result

$$E\left[\hat{\Gamma}_{G,s}(k)\right] = \left(1 - \frac{|k|}{n}\right)\left(\Gamma(k) - \frac{\Sigma}{mn} - \frac{\Phi}{mn^2}\right) + o\left(n^{-1}\right).$$

Although this completes the proof of Theorem 1, we will also require the following decomposition:

$$E\left[\hat{\Gamma}_{G,s}(k)\right] = \left(1 - \frac{|k|}{n}\right)\Gamma(k) + O_1 + O_2, \qquad (11)$$

where

$$O_1 = -\frac{|k|}{n}\left(1 - \frac{1}{m}\right)\left(\frac{1}{n}\sum_{h=0}^{n-1}\Gamma(h)^T + \frac{1}{n}\sum_{h=0}^{n-1}\Gamma(h) - 2\left(\frac{\Sigma}{n} + \frac{\Phi}{n^2}\right)\right) + o\left(n^{-1}\right), \qquad O_2 = -\frac{1}{m}\left(\frac{\Sigma}{n} + \frac{\Phi}{n^2}\right) + o\left(n^{-1}\right).$$

A.3 Strong consistency of the G-SV estimator

Consider pseudo autocovariance and spectral variance estimators for the $s$th chain, denoted by $\tilde{\Upsilon}_s(k)$ and $\tilde{\Sigma}_s$, that use data centered around the unobserved true mean $\mu_g$:

$$\tilde{\Upsilon}_s(k) = \frac{1}{n}\sum_{t\in I_k}\left(Y_{st} - \mu_g\right)\left(Y_{s(t+k)} - \mu_g\right)^T \quad \text{and} \quad \tilde{\Sigma}_s = \sum_{k=-b_n+1}^{b_n-1} w\!\left(\frac{k}{b_n}\right)\tilde{\Upsilon}_s(k).$$

The average pseudo spectral variance estimator is $\tilde{\Sigma}_A = m^{-1}\sum_{s=1}^m \tilde{\Sigma}_s$. Further, let

$$M_1 = \frac{1}{m}\sum_{s=1}^m \sum_{k=-b_n+1}^{b_n-1} w\!\left(\frac{k}{b_n}\right)\sum_{t\in I_k}\frac{1}{n}\left[\left(Y_{st} - \mu_g\right)^i\left(\mu_g - \bar{\bar{Y}}\right)^j + \left(\mu_g - \bar{\bar{Y}}\right)^i\left(Y_{s(t+k)} - \mu_g\right)^j\right],$$

$$M_2 = \left(\mu_g - \bar{\bar{Y}}\right)^i\left(\mu_g - \bar{\bar{Y}}\right)^j \sum_{k=-b_n+1}^{b_n-1}\left(1 - \frac{|k|}{n}\right)w\!\left(\frac{k}{b_n}\right).$$

Lemma 2. Let Assumption 1 hold. Then for the G-SV estimator, $\hat{\Sigma}^{ij}_G = \tilde{\Sigma}^{ij}_A + M_1 + M_2$ and

$$|M_1 + M_2| \le D^2 g_1(n) + D g_2(n) + g_3(n),$$

where, for some constant $C$,

$$g_1(n) = \left[(4c + C)\,b_n - 2c\right]\frac{\psi(n)^2}{n^2} \to 0,$$
$$g_2(n) = 2\sqrt{2}\,\|L\|\,p^{1/2}(1+\epsilon)\left[(4c + C)\,b_n - 2c\right]\frac{\psi(n)\sqrt{n\log\log n}}{n^2} \to 0,$$
$$g_3(n) = 2\,\|L\|^2\,p\,(1+\epsilon)^2\left[(4c + C)\,b_n - 2c\right]\frac{\log\log n}{n} \to 0$$

as $n \to \infty$.

Proof. The proof follows from standard algebraic calculations and is presented here for completeness. Adding and subtracting $\mu_g$ in each factor of $\hat{\Sigma}^{ij}_G$ gives $\hat{\Sigma}^{ij}_G = \tilde{\Sigma}^{ij}_A + M_1 + M_2$, so that $|\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A| = |M_1 + M_2| \le |M_1| + |M_2|$.

We first present a result which will be useful later. For any chain $s$, by Theorem 2 and Lemma 1,

$$\|\bar{Y}_s - \mu_g\|_\infty \le \frac{1}{n}\left\|\sum_{t=1}^n Y_{st} - n\mu_g - LB(n)\right\| + \frac{\|LB(n)\|}{n} < \frac{D\psi(n)}{n} + \frac{1}{n}\|L\|\,p^{1/2}(1+\epsilon)\sqrt{2n\log\log n}. \qquad (12)$$

Similarly,

$$\|\bar{\bar{Y}} - \mu_g\|_\infty < \frac{D\psi(n)}{n} + \frac{1}{n}\|L\|\,p^{1/2}(1+\epsilon)\sqrt{2n\log\log n}. \qquad (13)$$

Now consider $|M_1|$. Writing each sum over $I_k$ as a full sum minus a sum over $J_k^1$ or $J_k^2$, using $|w(\cdot)| \le c$, and applying the strong invariance principle to the sums of $|k|$ terms,

$$|M_1| < 2c\,(2b_n - 1)\,\|\bar{\bar{Y}} - \mu_g\|_\infty\left[\frac{D\psi(n)}{n} + \|L\|\,p^{1/2}(1+\epsilon)\frac{\sqrt{2n\log\log n}}{n} + 2\,\|\bar{Y}_h - \mu_g\|_\infty\right] \le 6c\,(2b_n - 1)\left[\frac{D\psi(n)}{n} + \|L\|\,p^{1/2}(1+\epsilon)\frac{\sqrt{2n\log\log n}}{n}\right]^2 \qquad (14)$$

for some $h \in \{1, \ldots, m\}$, by (12) and (13). For $M_2$,

$$|M_2| \le \|\bar{\bar{Y}} - \mu_g\|_\infty^2 \sum_{k=-b_n+1}^{b_n-1}\left|w\!\left(\frac{k}{b_n}\right)\right| \le C\,b_n\left[\frac{D\psi(n)}{n} + \|L\|\,p^{1/2}(1+\epsilon)\frac{\sqrt{2n\log\log n}}{n}\right]^2 \qquad (15)$$

for some constant $C$, by (13) and since $b_n^{-1}\sum_k |w(k/b_n)| \to \int_{-\infty}^{\infty}|w(x)|\,dx < \infty$. Using (14) and (15),

$$|M_1 + M_2| \le |M_1| + |M_2| = D^2 g_1(n) + D g_2(n) + g_3(n),$$

where $g_1$, $g_2$, and $g_3$ collect the $D^2$, $D$, and constant terms, and are of the form given in the statement of the lemma. Under the assumptions of Theorem 2, $\psi(n) = o(n^{1/2})$. Using the law of the iterated logarithm (LIL), a tighter bound for $\psi(n)$ is given by Strassen (1964) as $o(\sqrt{n\log\log n})$. Since $b_n \log\log n / n \to 0$ as $n \to \infty$, consequently $b_n\psi(n)^2/n^2 \to 0$, $\psi(n)^2/n^2 \to 0$, $b_n\psi(n)\sqrt{n\log\log n}/n^2 \to 0$, and $\psi(n)\sqrt{n\log\log n}/n^2 \to 0$. Thus, $g_1(n)$, $g_2(n)$, and $g_3(n) \to 0$ as $n \to \infty$.

Proof of Theorem 3. Adding and subtracting $\bar{Y}_s$ yields the decomposition

$$\tilde{\Sigma}^{ij}_A = \frac{1}{m}\sum_{s=1}^m \sum_{k=-b_n+1}^{b_n-1} w\!\left(\frac{k}{b_n}\right)\frac{1}{n}\sum_{t\in I_k}\left(Y_{st} \pm \bar{Y}_s - \mu_g\right)^i\left(Y_{s(t+k)} \pm \bar{Y}_s - \mu_g\right)^j = \hat{\Sigma}^{ij}_A + N_1 + N_2,$$

where

$$N_1 = \frac{1}{m}\sum_{s=1}^m \sum_{k=-b_n+1}^{b_n-1} w\!\left(\frac{k}{b_n}\right)\frac{1}{n}\sum_{t\in I_k}\left[\left(Y_{st} - \bar{Y}_s\right)^i\left(\bar{Y}_s - \mu_g\right)^j + \left(\bar{Y}_s - \mu_g\right)^i\left(Y_{s(t+k)} - \bar{Y}_s\right)^j\right],$$

$$N_2 = \frac{1}{m}\sum_{s=1}^m \left(\bar{Y}_s - \mu_g\right)^i\left(\bar{Y}_s - \mu_g\right)^j \sum_{k=-b_n+1}^{b_n-1}\left(1 - \frac{|k|}{n}\right)w\!\left(\frac{k}{b_n}\right).$$

Using the above decomposition and Lemma 2,

$$\left|\hat{\Sigma}^{ij}_G - \Sigma^{ij}\right| = \left|\hat{\Sigma}^{ij}_A - \Sigma^{ij} + N_1 + N_2 + M_1 + M_2\right| \le \left|\hat{\Sigma}^{ij}_A - \Sigma^{ij}\right| + |N_1| + |N_2| + |M_1 + M_2|. \qquad (16)$$

By the strong consistency of the single-chain SV estimators, the first term goes to 0 with probability 1, and by Lemma 2, the last term goes to 0 with probability 1 as $n \to \infty$. It is left to show that $|N_1| \to 0$ and $|N_2| \to 0$. By the triangle inequality, $|N_1|$ is bounded by two analogous terms; we show that the first goes to 0, and the proof for the second is similar. Consider

$$\left|\frac{1}{m}\sum_{s=1}^m \sum_{k=-b_n+1}^{b_n-1} w\!\left(\frac{k}{b_n}\right)\frac{1}{n}\sum_{t\in I_k}\left(Y_{st} - \bar{Y}_s\right)^i\left(\bar{Y}_s - \mu_g\right)^j\right| \le \frac{c}{m}\sum_{s=1}^m \|\bar{Y}_s - \mu_g\|_\infty \sum_{k=-b_n+1}^{b_n-1}\frac{1}{n}\left\|\sum_{t\in J_k^1}\left(Y_{st} - \mu_g\right)\right\|_\infty + \frac{c}{m}\sum_{s=1}^m \|\bar{Y}_s - \mu_g\|_\infty^2 \sum_{k=-b_n+1}^{b_n-1}\frac{|k|}{n},$$

using $\sum_{t\in I_k}(Y_{st} - \bar{Y}_s) = \sum_{t\in J_k^1}(\mu_g - Y_{st}) + |k|(\bar{Y}_s - \mu_g)$. Applying the strong invariance principle to the sums of $|k|$ terms and using (12),

$$\left|\,\cdot\,\right| < c\,(2b_n - 1)\,\frac{1}{m}\sum_{s=1}^m \|\bar{Y}_s - \mu_g\|_\infty\left[\frac{D\psi(n)}{n} + \|L\|\,p^{1/2}(1+\epsilon)\frac{\sqrt{2n\log\log n}}{n}\right] + \frac{c\,b_n(b_n - 1)}{n}\,\frac{1}{m}\sum_{s=1}^m \|\bar{Y}_s - \mu_g\|_\infty^2 \to 0,$$

since $b_n\psi(n)/n \to 0$, $b_n\sqrt{\log\log n / n} \to 0$, and $b_n^2\log\log n/n^2 \to 0$. Similarly, the second part of $N_1 \to 0$, and

$$|N_2| \le C\,b_n\left[\frac{D\psi(n)}{n} + \|L\|\,p^{1/2}(1+\epsilon)\frac{\sqrt{2n\log\log n}}{n}\right]^2 \to 0.$$

Thus, in (16), every term goes to 0 and $\hat{\Sigma}^{ij}_G \to \Sigma^{ij}$ with probability 1 as $n \to \infty$.

A.4 Proof of Theorem 4

By Equation (11),

$$E\left[\hat{\Upsilon}_G(k)\right] = \left(1 - \frac{|k|}{n}\right)\Upsilon(k) + O_1 + O_2,$$

where $O_1 = (|k|/n)\,O(n^{-1})$ and $O_2 = O(n^{-1})$ are the small-order terms. By our assumptions, $\sum_{k=-\infty}^{\infty}\Upsilon(k) < \infty$. For a truncation point $b_n$, $w(k/b_n) = 0$ for all $|k| \ge b_n$; therefore, the SV estimator can be written as a weighted sum of estimated autocovariances from lag $-n+1$ to $n-1$. Consider the G-SV estimator:

$$E\left[\hat{\Sigma}_G - \Sigma\right] = \sum_{k=-n+1}^{n-1} w\!\left(\frac{k}{b_n}\right)E\left[\hat{\Upsilon}_G(k)\right] - \sum_{k=-\infty}^{\infty}\Upsilon(k) = \sum_{k=-n+1}^{n-1}\left[w\!\left(\frac{k}{b_n}\right)\left(1 - \frac{|k|}{n}\right)\Upsilon(k)\right] - \sum_{k=-\infty}^{\infty}\Upsilon(k) + \sum_{k=-n+1}^{n-1}\left[w\!\left(\frac{k}{b_n}\right)(O_1 + O_2)\right] = P_1 + P_2, \qquad (17)$$

where

$$P_1 = \sum_{k=-n+1}^{n-1}\left[w\!\left(\frac{k}{b_n}\right)\left(1 - \frac{|k|}{n}\right)\Upsilon(k)\right] - \sum_{k=-\infty}^{\infty}\Upsilon(k) \quad \text{and} \quad P_2 = \sum_{k=-n+1}^{n-1}\left[w\!\left(\frac{k}{b_n}\right)(O_1 + O_2)\right].$$

Similar to Hannan (1970), we break $P_1$ into three parts. Note that the notation $A = o(z)$ for a matrix $A$ implies $A^{ij} = o(z)$ for every $(i,j)$th element of $A$. Consider

$$P_1 = -\sum_{|k|\ge n}\Upsilon(k) - \sum_{k=-n+1}^{n-1} w\!\left(\frac{k}{b_n}\right)\frac{|k|}{n}\Upsilon(k) - \sum_{k=-n+1}^{n-1}\left(1 - w\!\left(\frac{k}{b_n}\right)\right)\Upsilon(k). \qquad (18)$$

We deal with the three subterms of $P_1$ individually. First,

$$-\sum_{|k|\ge n}\Upsilon(k) \le \sum_{|k|\ge n}\left|\frac{k}{b_n}\right|^q\Upsilon(k) = \frac{1}{b_n^q}\sum_{|k|\ge n}|k|^q\,\Upsilon(k) = o\!\left(\frac{1}{b_n^q}\right), \qquad (19)$$

since $\sum_{|k|\ge n}|k|^q\,\Upsilon(k) \to 0$. Next,

$$\sum_{k=-n+1}^{n-1} w\!\left(\frac{k}{b_n}\right)\frac{|k|}{n}\Upsilon(k) \le \frac{c}{n}\sum_{k=-n+1}^{n-1}|k|\,\Upsilon(k).$$

For $q \ge 1$,

$$\frac{c}{n}\sum_{k=-n+1}^{n-1}|k|\,\Upsilon(k) \le \frac{c}{n}\sum_{k=-n+1}^{n-1}|k|^q\,\Upsilon(k) = \frac{1}{b_n^q}\cdot\frac{b_n^q}{n}\,c\sum_{k=-n+1}^{n-1}|k|^q\,\Upsilon(k) = o\!\left(\frac{1}{b_n^q}\right),$$

and for $q < 1$,

$$\frac{c}{n}\sum_{k=-n+1}^{n-1}|k|\,\Upsilon(k) \le c\sum_{k=-n+1}^{n-1}\left|\frac{k}{n}\right|^q\Upsilon(k) = \frac{1}{b_n^q}\cdot\frac{b_n^q}{n^q}\,c\sum_{k=-n+1}^{n-1}|k|^q\,\Upsilon(k) = o\!\left(\frac{1}{b_n^q}\right).$$

So,

$$\sum_{k=-n+1}^{n-1} w\!\left(\frac{k}{b_n}\right)\frac{|k|}{n}\Upsilon(k) = o\!\left(\frac{1}{b_n^q}\right). \qquad (20)$$

Lastly, by our assumptions, as $x \to 0$, $1 - w(x) = k_q|x|^q + o(|x|^q)$. For $x = k/b_n$, $|k/b_n|^{-q}\left(1 - w(k/b_n)\right)$ converges boundedly to $k_q$ for each $k$. So,

$$\sum_{k=-n+1}^{n-1}\left(1 - w\!\left(\frac{k}{b_n}\right)\right)\Upsilon(k) = \frac{1}{b_n^q}\sum_{k=-n+1}^{n-1}\left(\frac{|k|}{b_n}\right)^{-q}\left(1 - w\!\left(\frac{k}{b_n}\right)\right)|k|^q\,\Upsilon(k) = \frac{1}{b_n^q}\sum_{k=-n+1}^{n-1}\left[k_q + o(1)\right]|k|^q\,\Upsilon(k) = \frac{k_q\,\Phi^{(q)}}{b_n^q} + o\!\left(\frac{1}{b_n^q}\right). \qquad (21)$$

Finally, we bound $P_2$. Note that $O_1 = (|k|/n)\,O(1/n)$ and $O_2 = O(1/n)$, so

$$P_2 \le \sum_{k=-n+1}^{n-1}\left|w\!\left(\frac{k}{b_n}\right)O_1\right| + \sum_{k=-n+1}^{n-1}\left|w\!\left(\frac{k}{b_n}\right)O_2\right| = O\!\left(\frac{1}{n}\right)\sum_{k=-b_n+1}^{b_n-1}\frac{|k|}{n}\left|w\!\left(\frac{k}{b_n}\right)\right| + O\!\left(\frac{1}{n}\right)\sum_{k=-b_n+1}^{b_n-1}\left|w\!\left(\frac{k}{b_n}\right)\right| = O\!\left(\frac{b_n^2}{n^2}\right) + O\!\left(\frac{b_n}{n}\right) = o\!\left(\frac{1}{b_n^q}\right), \qquad (22)$$

since $b_n^{q+1}/n \to 0$ and $b_n^{-1}\sum_k |w(k/b_n)| \to \int|w(x)|\,dx < \infty$. Using (18), (19), (20), (21), and (22) in (17),

$$E\left[\hat{\Sigma}_G - \Sigma\right] = -\frac{k_q\,\Phi^{(q)}}{b_n^q} + o\!\left(\frac{1}{b_n^q}\right),$$

which completes the result.

A.5 Proof of Theorem 5

By Lemma 2,

$$\left|\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right| = |M_1 + M_2| < D^2 g_1(n) + D g_2(n) + g_3(n),$$

where $g_1(n), g_2(n), g_3(n) \to 0$ as $n \to \infty$, as defined in Lemma 2. Then, for any fixed $N$,

$$\left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right)^2 = \left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right)^2 I(0 \le n \le N) + \left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right)^2 I(n > N) \le \left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right)^2 I(0 \le n \le N) + \left(D^2 g_1(n) + D g_2(n) + g_3(n)\right)^2 I(n > N) := g^*_n\left(Y_{11}, \ldots, Y_{1n}, \ldots, Y_{m1}, \ldots, Y_{mn}\right).$$

But since by assumption $E[D^4] < \infty$ and the fourth moment of $Y$ is finite,

$$E\left|g^*_n\right| \le E\left[\left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right)^2\right] + E\left[\left(D^2 g_1(n) + D g_2(n) + g_3(n)\right)^2\right] < \infty.$$

So $E|g^*_n| < \infty$ and, further, as $n \to \infty$, since $g_1, g_2, g_3 \to 0$, $E[g^*_n] \to 0$. By the majorized convergence theorem (Zeidler, 2013), as $n \to \infty$,

$$E\left[\left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right)^2\right] \to 0. \qquad (23)$$

We will use (23) to show that the variances are equivalent. Define

$$\xi\left(\hat{\Sigma}^{ij}_G, \tilde{\Sigma}^{ij}_A\right) = \mathrm{Var}\left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right) + 2\,E\left[\left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right)\left(\tilde{\Sigma}^{ij}_A - E\left[\tilde{\Sigma}^{ij}_A\right]\right)\right].$$

We will show that the above is $o(1)$. Using the Cauchy-Schwarz inequality followed by (23),

$$\left|\xi\left(\hat{\Sigma}^{ij}_G, \tilde{\Sigma}^{ij}_A\right)\right| \le E\left[\left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right)^2\right] + 2\left(E\left[\left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right)^2\right]\mathrm{Var}\left(\tilde{\Sigma}^{ij}_A\right)\right)^{1/2} = o(1) + 2\,o(1)\,O\!\left(\left(\frac{b_n}{n}\right)^{1/2}\right) = o(1).$$

Finally, expanding $\hat{\Sigma}^{ij}_G = \left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right) + \tilde{\Sigma}^{ij}_A$,

$$\mathrm{Var}\left(\hat{\Sigma}^{ij}_G\right) = \mathrm{Var}\left(\tilde{\Sigma}^{ij}_A\right) + \mathrm{Var}\left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right) + 2\,E\left[\left(\hat{\Sigma}^{ij}_G - \tilde{\Sigma}^{ij}_A\right)\left(\tilde{\Sigma}^{ij}_A - E\left[\tilde{\Sigma}^{ij}_A\right]\right)\right] = \mathrm{Var}\left(\tilde{\Sigma}^{ij}_A\right) + o(1).$$

Hannan (1970) gives the variance of $\tilde{\Sigma}_A$ as

$$\frac{n}{b_n}\,\mathrm{Var}\left(\tilde{\Sigma}^{ij}_A\right) = \left[\Sigma^{ii}\Sigma^{jj} + \left(\Sigma^{ij}\right)^2\right]\int_{-\infty}^{\infty} w(x)^2\,dx + o(1). \qquad (24)$$

Plugging (24) into the variance of $\hat{\Sigma}_G$ gives the result of the theorem.

B Additional Examples
We present two additional examples to illustrate the advantage of the G-ACF estimator.
B.1 Bayesian Poisson Change Point Model
Consider the militarized interstate dispute (MID) data of Martin et al. (2011), which describes the annual number of military conflicts in the United States. In order to detect the number and timings of the cyclic phases in international conflicts, we fit a Bayesian Poisson change-point model. Following Martin et al. (2011), we will use
MCMCpoissonChange from
MCMCpack to fit the model with six change-points, which samples the latent states based on the algorithm in Chib (1998). The Poisson change-point model in
MCMCpoissonChange uses conjugate priors and is the following:

$$y_t \sim \text{Poisson}(\lambda_i), \quad i = 1, \ldots, k,$$
$$\lambda_i \sim \text{Gamma}(c_0, d_0), \quad i = 1, \ldots, k,$$
$$p_{ii} \sim \text{Beta}(\alpha, \beta), \quad i = 1, \ldots, k.$$

This yields a 7-dimensional posterior distribution in $\lambda = (\lambda_1, \ldots, \lambda_7)^T$. Figure 8 shows the evolution of two chains started from random points, for the second and third components. We report the ACF plots for the second component only; however, similar behavior is observed in the ACF plots of the other components as well.

Figure 8: Change point. Trace plots for the second (left) and third (right) components.

Figure 9: Change point. ACF plots using local (left) and global (right) centering for m = 2 parallel Markov chains. The individual chains are shown through dashed lines and the average over the m chains is the solid line. The chain lengths are n = 1000 (top row) and n = 10000 (bottom row).

Figure 9 demonstrates a striking advantage of the G-ACF against the locally-centered ACF in estimating the autocorrelations. For a chain length of n = 1000, the locally-centered ACF gives a false sense of security about the Markov chains whereas, in reality, the process is highly correlated. The G-ACF gives a more accurate estimation of the autocorrelations even at short run lengths.
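A hedged sketch of the fitting call follows. The data vector `conflicts` is a stand-in for the annual MID counts (not reproduced here), the prior values shown are placeholders, and the argument names reflect our reading of the MCMCpack documentation and may differ across package versions.

```r
library(MCMCpack)
# `conflicts` is assumed to hold the annual MID counts (not shown here)
fit <- MCMCpoissonChange(conflicts ~ 1,
                         m = 6,               # six change-points, i.e., seven regimes
                         c0 = 13, d0 = 1,     # conjugate Gamma(c0, d0) prior (placeholder values)
                         burnin = 10000, mcmc = 100000)
# posterior draws of the regime means lambda_1, ..., lambda_7
# (column naming of the output may vary by version)
lambda <- fit[, grep("lambda", colnames(fit))]
```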
B.2 Network crawling

The faux.magnolia.high dataset available in the ergm R package represents a simulated friendship network based on the Add Health data (Resnick et al., 1997). The school communities represented by the network data are located in the southern United States. Each node represents a student and each edge represents a friendship between the nodes it connects.

The goal is to draw nodes uniformly from the network by using a network crawler. Nilakanta et al. (2019) modified the data by removing 1,022 of its 1,461 nodes to obtain a well-connected graph. The resulting social network has 439 nodes and 573 edges. We use a Metropolis-Hastings algorithm with the simple random-walk proposal suggested by Gjoka et al. (2011). Each node is associated with five features: degree, clustering coefficient, grade, a binary sex indicator (1 for female, 0 for male), and a binary race indicator (1 for white, 0 for others). We sample two parallel Markov chains starting from two students belonging to different races.

Figure 10: Crawling: ACF plots using local (left) and global (right) centering for $m = 2$ parallel Markov chains. The individual chains are shown through dashed lines and the average over the $m$ chains is the solid line. The chain lengths are $n = 100$ (top row) and $n = 1000$ (bottom row).

Figure 10 shows the ACF plots for the grade feature at two different simulation sizes for $m = 2$ parallel Markov chains. For the shorter chain length, the two chains have explored different clusters and, as a consequence, the local and global means do not agree. Regardless, G-ACF displays a clear advantage over the locally-centered ACF: it is close to the truth already at a chain length of $n = 100$ samples, whereas the latter takes $n = 1000$ samples to reach it.
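The sampler itself admits a compact description. Below is a minimal sketch of a random-walk Metropolis-Hastings node sampler in the style of Gjoka et al. (2011): propose a uniformly chosen neighbor and accept with probability min{1, deg(current)/deg(proposal)}, which leaves the uniform distribution over nodes invariant. The function mh_crawl and the adjacency-list input adj are our own illustrative names; in practice adj would be built from the faux.magnolia.high network object in ergm.

    # Minimal sketch: Metropolis-Hastings random walk over a graph,
    # targeting the uniform distribution on nodes (Gjoka et al., 2011 style).
    # 'adj' is a list mapping each node index to its neighbors (assumption).
    mh_crawl <- function(adj, start, n) {
      chain <- integer(n)
      current <- start
      for (t in seq_len(n)) {
        nbrs <- adj[[current]]
        proposal <- nbrs[sample.int(length(nbrs), 1)]  # uniform neighbor
        # Metropolis correction: accept w.p. min(1, deg(current)/deg(proposal))
        if (runif(1) < length(nbrs) / length(adj[[proposal]])) {
          current <- proposal
        }
        chain[t] <- current
      }
      chain
    }

The grade traces in Figure 10 then correspond to indexing the students' grade feature by the sampled node sequence, with the two parallel chains started from the two chosen students.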
References

Anderson, T. W. (1971). The Statistical Analysis of Time Series. John Wiley & Sons, New York.

Andrews, D. W. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59:817-858.

Chen, D.-F. R. and Seila, A. F. (1987). Multivariate inference in stationary simulation using batch means. In Proceedings of the 19th Conference on Winter Simulation, pages 302-304. ACM.

Chib, S. (1998). Estimation and comparison of multiple change-point models. Journal of Econometrics, 86(2):221-241.

Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Probability and Mathematical Statistics: A Series of Monographs and Textbooks. Academic Press.

Dai, N. and Jones, G. L. (2017). Multivariate initial sequence estimators in Markov chain Monte Carlo. Journal of Multivariate Analysis, 159:184-199.

Damerdji, H. (1991). Strong consistency and other properties of the spectral variance estimator. Management Science, 37:1424-1440.

Damerdji, H. (1995). Mean-square consistency of the variance estimator in steady-state simulation output analysis. Operations Research, 43(2):282-291.

Flegal, J. M. and Gong, L. (2015). Relative fixed-width stopping rules for Markov chain Monte Carlo simulations. Statistica Sinica, 25:655-676.

Flegal, J. M., Haran, M., and Jones, G. L. (2008). Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science, 23:250-260.

Flegal, J. M., Hughes, J., Vats, D., Dai, N., and Maji, U. (2020). mcmcse: Monte Carlo Standard Errors for MCMC. Riverside, CA, Denver, CO, Coventry, UK, and Minneapolis, MN. R package version 1.4-1.

Flegal, J. M. and Jones, G. L. (2010). Batch means and spectral variance estimators in Markov chain Monte Carlo. The Annals of Statistics, 38:1034-1070.

Gelman, A. and Meng, X.-L. (1991). A note on bivariate distributions that are conditionally normal. The American Statistician, 45(2):125-126.

Gjoka, M., Kurant, M., Butts, C. T., and Markopoulou, A. (2011). Practical recommendations on crawling online social networks. IEEE Journal on Selected Areas in Communications, 29(9):1872-1892.

Glynn, P. W. and Whitt, W. (1992). The asymptotic validity of sequential stopping rules for stochastic simulations. The Annals of Applied Probability, 2:180-198.

Gong, L. and Flegal, J. M. (2016). A practical sequential stopping rule for high-dimensional Markov chain Monte Carlo. Journal of Computational and Graphical Statistics, 25:684-700.

Gupta, K. and Vats, D. (2020). Estimating Monte Carlo variance from multiple Markov chains. arXiv preprint arXiv:2007.04229.

Hannan, E. J. (1970). Multiple Time Series. Wiley Series in Probability and Mathematical Statistics. Wiley, New York.

Heberle, J. and Sattarhoff, C. (2017). A fast algorithm for the computation of HAC covariance matrix estimators. Econometrics, 5(1):9.

Ihler, A. T., Fisher, J. W., Moses, R. L., and Willsky, A. S. (2005). Nonparametric belief propagation for self-localization of sensor networks. IEEE Journal on Selected Areas in Communications, 23(4):809-819.

Kass, R. E., Carlin, B. P., Gelman, A., and Neal, R. M. (1998). Markov chain Monte Carlo in practice: a roundtable discussion. The American Statistician, 52:93-100.

Kuelbs, J. and Philipp, W. (1980). Almost sure invariance principles for partial sums of mixing B-valued random variables. The Annals of Probability, 8:1003-1036.

Liu, Y. and Flegal, J. M. (2018). Weighted batch means estimators in Markov chain Monte Carlo. Electronic Journal of Statistics, 12:3397-3442.

Martin, A. D., Quinn, K. M., and Park, J. H. (2011). MCMCpack: Markov chain Monte Carlo in R. Journal of Statistical Software, 42(9):1-21.

Meyn, S. P. and Tweedie, R. L. (2009). Markov Chains and Stochastic Stability. Cambridge University Press.

Nilakanta, H., Almquist, Z. W., and Jones, G. L. (2019). Ensuring reliable Monte Carlo estimates of network properties. arXiv preprint arXiv:1911.08682.

Priestley, M. B. (1981). Spectral Analysis and Time Series. Probability and Mathematical Statistics. Academic Press.

Resnick, M. D., Bearman, P. S., Blum, R. W., Bauman, K. E., Harris, K. M., Jones, J., Tabor, J., Beuhring, T., Sieving, R. E., Shew, M., et al. (1997). Protecting adolescents from harm: findings from the national longitudinal study on adolescent health. JAMA, 278(10):823-832.

Roy, V. (2019). Convergence diagnostics for Markov chain Monte Carlo. Annual Review of Statistics and Its Application, 7.

Song, W. T. and Schmeiser, B. W. (1995). Optimal mean-squared-error batch sizes. Management Science, 41(1):110-123.

Strassen, V. (1964). An invariance principle for the law of the iterated logarithm. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 3:211-226.

Tak, H., Meng, X.-L., and van Dyk, D. A. (2018). A repelling-attracting Metropolis algorithm for multimodality. Journal of Computational and Graphical Statistics, 27(3):479-490.

Tjøstheim, D. (1990). Non-linear time series and Markov chains. Advances in Applied Probability, 22:587-611.

Vats, D., Flegal, J. M., and Jones, G. L. (2018). Strong consistency of multivariate spectral variance estimators in Markov chain Monte Carlo. Bernoulli, 24:1860-1909.

Vats, D., Flegal, J. M., and Jones, G. L. (2019). Multivariate output analysis for Markov chain Monte Carlo. Biometrika, 106:321-337.

Vats, D., Robertson, N., Flegal, J. M., and Jones, G. L. (2020). Analyzing Markov chain Monte Carlo output. Wiley Interdisciplinary Reviews: Computational Statistics, 12:e1501.

Zeidler, E. (2013).