On the auxiliary particle filter
R. Douc
École Polytechnique, Paris, France.

É. Moulines and J. Olsson

École Nationale Supérieure des Télécommunications, Paris, France.
Summary. In this article we study asymptotic properties of weighted samples produced by the auxiliary particle filter (APF) proposed by Pitt and Shephard (1999a). Besides establishing a central limit theorem (CLT) for smoothed particle estimates, we also derive bounds on the $L_p$ error and bias of the same for a finite particle sample size. By examining the recursive formula for the asymptotic variance of the CLT we identify first-stage importance weights for which the increase of asymptotic variance at a single iteration of the algorithm is minimal. In the light of these findings, we discuss and demonstrate on several examples how the APF algorithm can be improved.
1. Introduction
In this paper we consider a state space model where a sequence $Y \triangleq \{Y_k\}_{k=0}^\infty$ is modelled as a noisy observation of a Markov chain $X \triangleq \{X_k\}_{k=0}^\infty$, called the state sequence, which is hidden. The observed values of $Y$ are conditionally independent given the hidden states $X$, and the corresponding conditional distribution of $Y_k$ depends on $X_k$ only. When operating on a model of this form, the joint smoothing distribution, that is, the joint distribution of $(X_0, \ldots, X_n)$ given $(Y_0, \ldots, Y_n)$, and its marginals will be of interest. Of particular interest is the filter distribution, defined as the marginal of this law with respect to the last component $X_n$. Computing these posterior distributions is the key issue when filtering the hidden states as well as when performing inference on unknown model parameters. The posterior distribution can be recursively updated as new observations become available, making single-sweep processing of the data possible, by means of the so-called smoothing recursion. However, in general this recursion cannot be applied directly, since it involves the evaluation of complicated high-dimensional integrals. In fact, closed form solutions are obtainable only for linear/Gaussian models (where the solutions are acquired using the disturbance smoother) and models where the state space of the latent Markov chain is finite.

Sequential Monte Carlo (SMC) methods, often alternatively termed particle filters, provide a helpful tool for computing approximate solutions to the smoothing recursion for general state space models, and the field has seen a drastic increase in interest over recent years. These methods are based on the principle of, recursively in time, approximating the smoothing distribution with the empirical measure associated with a weighted sample of particles. At present there are various techniques for producing and updating such a particle sample (see Fearnhead, 1998; Doucet et al., 2001; Liu, 2001). For a comprehensive treatment of the theoretical aspects of SMC methods we refer to the work by Del Moral (2004).

In this article we analyse the auxiliary particle filter (APF) proposed by Pitt and Shephard (1999a), which has proved to be one of the most useful and widely adopted implementations of the SMC methodology.
Unlike the traditional bootstrap particle filter (Gordon et al., 1993), the APF enables the user to affect the particle sample allocation by freely designing a set of first-stage importance weights involved in the selection procedure. Prevalently, this has been used for assigning large weight to particles whose offspring are likely to land in zones of the state space having high posterior probability. Despite its obvious appeal, it is however not clear how to optimally exploit this additional degree of freedom.

In order to better understand this issue, we present an asymptotic analysis of the algorithm, being a continuation of (Olsson et al., 2006) and based on recent results by (Chopin, 2004; Künsch, 2005; Douc and Moulines, 2005) on weighted systems of particles. More specifically, we establish CLTs (Theorems 3.1 and 3.2), with explicit expressions of the asymptotic variances, for two different versions of the algorithm (differentiated by the absence or presence of a concluding resampling pass at the end of each loop) under general model specifications. The convergence bears upon an increasing number of particles, and a recent result in the same spirit has, independently of (Olsson et al., 2006), been stated in the manuscript (Doucet and Johansen, 2007). Using these results, we also identify first-stage importance weights which are asymptotically most efficient; this is the main contribution of the paper. This result provides important insights into optimal sample allocation for particle filters in general, and we also give an interpretation of the finding in terms of variance reduction for stratified sampling.

In addition, we prove (utilising a decomposition of the Monte Carlo error proposed by Del Moral (2004) and refined by Olsson et al. (2005)) time uniform convergence in $L_p$ (Theorem 3.3) under more stringent assumptions of ergodicity of the conditional hidden chain. With support of this stability result and the asymptotic analysis, we conclude that inserting a final selection step at the end of each loop is superfluous, at least as long as the numbers of particles used in the two stages agree, since such an operation exclusively increases the asymptotic variance.

Finally, in the implementation section (Section 5) several heuristics, derived from the obtained results, for designing efficient first-stage weights are discussed, and the improvement implied by approximating the asymptotically optimal first-stage weights is demonstrated on several examples.
2. Notation and basic concepts
We denote by $(\mathsf{X}, \mathcal{X})$, $Q$, and $\nu$ the state space, transition kernel, and initial distribution of $X$, respectively, and assume that all random variables are defined on a common probability space $(\Omega, \mathcal{A}, \mathbb{P})$. In addition, we denote by $(\mathsf{Y}, \mathcal{Y})$ the state space of $Y$ and suppose that there exists a measure $\lambda$ and, for all $x \in \mathsf{X}$, a non-negative function $y \mapsto g(y \,|\, x)$ such that, for $k \geq 0$,
$$\mathbb{P}(Y_k \in A \,|\, X_k = x) = \int_A g(y \,|\, x)\, \lambda(\mathrm{d}y)\,, \quad A \in \mathcal{Y}\,.$$
Introduce, for $i \leq j$, the vector notation $X_{i:j} \triangleq (X_i, \ldots, X_j)$; similar notation will be used for other quantities. The joint smoothing distribution is denoted by $\phi_n(A) \triangleq \mathbb{P}(X_{0:n} \in A \,|\, Y_{0:n} = y_{0:n})$, $A \in \mathcal{X}^{\otimes(n+1)}$, and a straightforward application of Bayes's formula shows that
$$\phi_{k+1}(A) = \frac{\iint_A g(y_{k+1} \,|\, x_{k+1})\, Q(x_k, \mathrm{d}x_{k+1})\, \phi_k(\mathrm{d}x_{0:k})}{\iint_{\mathsf{X}^{k+2}} g(y_{k+1} \,|\, x'_{k+1})\, Q(x'_k, \mathrm{d}x'_{k+1})\, \phi_k(\mathrm{d}x'_{0:k})} \qquad (2.1)$$
for sets $A \in \mathcal{X}^{\otimes(k+2)}$. We will throughout this paper assume that we are given a sequence $\{y_k;\, k \geq 0\}$ of fixed observations, and write, for $x \in \mathsf{X}$, $g_k(x) \triangleq g(y_k \,|\, x)$. Moreover, from now on we let the dependence on these observations of all other quantities be implicit and, since the coming analysis is made exclusively conditionally on the given observed record, denote by $\mathbb{P}$ and $\mathbb{E}$ the conditional probability measure and expectation with respect to these observations.

Let us recall the APF algorithm by Pitt and Shephard (1999a). Assume that we at time $k$ have a particle sample $\{(\xi_k^{N,i}, \omega_k^{N,i})\}_{i=1}^N$ (each random variable $\xi_k^{N,i}$ taking values in $\mathsf{X}^{k+1}$) providing an approximation $\sum_{i=1}^N \omega_k^{N,i}\, \delta_{\xi_k^{N,i}} / \Omega_k^N$ of the joint smoothing distribution $\phi_k$, where $\Omega_k^N \triangleq \sum_{i=1}^N \omega_k^{N,i}$ and $\omega_k^{N,i} \geq 0$, $1 \leq i \leq N$. Then, when the observation $y_{k+1}$ becomes available, an approximation of $\phi_{k+1}$ is obtained by plugging the empirical measure $\phi_k^N$ into the recursion (2.1), yielding, for $A \in \mathcal{X}^{\otimes(k+2)}$,
$$\bar\phi_{k+1}^N(A) \triangleq \sum_{i=1}^N \frac{\omega_k^{N,i}\, H_k^u(\xi_k^{N,i}, \mathsf{X}^{k+2})}{\sum_{j=1}^N \omega_k^{N,j}\, H_k^u(\xi_k^{N,j}, \mathsf{X}^{k+2})}\, H_k(\xi_k^{N,i}, A)\,.$$
Here we have introduced, for $x_{0:k} \in \mathsf{X}^{k+1}$ and $A \in \mathcal{X}^{\otimes(k+2)}$, the unnormalised kernels
$$H_k^u(x_{0:k}, A) \triangleq \int_A g_{k+1}(x'_{k+1})\, \delta_{x_{0:k}}(\mathrm{d}x'_{0:k})\, Q(x'_k, \mathrm{d}x'_{k+1})$$
and $H_k(x_{0:k}, A) \triangleq H_k^u(x_{0:k}, A)/H_k^u(x_{0:k}, \mathsf{X}^{k+2})$. Simulating from $H_k(x_{0:k}, \cdot)$ consists in extending the trajectory $x_{0:k} \in \mathsf{X}^{k+1}$ with an additional component distributed according to the optimal kernel, that is, the distribution of $X_{k+1}$ conditional on $X_k = x_k$ and the observation $Y_{k+1} = y_{k+1}$. Now, since we want to form a new weighted sample approximating $\phi_{k+1}$, we need a convenient mechanism for sampling from $\bar\phi_{k+1}^N$ given $\{(\xi_k^{N,i}, \omega_k^{N,i})\}_{i=1}^N$. In most cases it is possible, but generally computationally expensive, to simulate from $\bar\phi_{k+1}^N$ directly using auxiliary accept-reject sampling (see Hürzeler and Künsch, 1998; Künsch, 2005). A computationally cheaper solution (see Künsch, 2005, p. 1988, for a discussion of the acceptance probability associated with the auxiliary accept-reject sampling approach) consists in producing a weighted sample approximating $\bar\phi_{k+1}^N$ by sampling from the importance sampling distribution
$$\rho_{k+1}^N(A) \triangleq \sum_{i=1}^N \frac{\omega_k^{N,i}\, \tau_k^{N,i}}{\sum_{j=1}^N \omega_k^{N,j}\, \tau_k^{N,j}}\, R_k^p(\xi_k^{N,i}, A)\,, \quad A \in \mathcal{X}^{\otimes(k+2)}\,.$$
Here $\tau_k^{N,i}$, $1 \leq i \leq N$, are positive numbers referred to as first-stage weights (Pitt and Shephard, 1999a, use the term adjustment multiplier weights), and in this article we consider first-stage weights of type
$$\tau_k^{N,i} = t_k(\xi_k^{N,i}) \qquad (2.2)$$
for some function $t_k : \mathsf{X}^{k+1} \to \mathbb{R}^+$. Moreover, the pathwise proposal kernel $R_k^p$ is, for $x_{0:k} \in \mathsf{X}^{k+1}$ and $A \in \mathcal{X}^{\otimes(k+2)}$, of the form
$$R_k^p(x_{0:k}, A) = \int_A \delta_{x_{0:k}}(\mathrm{d}x'_{0:k})\, R_k(x'_k, \mathrm{d}x'_{k+1})$$
with $R_k$ being such that $Q(x, \cdot) \ll R_k(x, \cdot)$ for all $x \in \mathsf{X}$. Thus, a draw from $R_k^p(x_{0:k}, \cdot)$ is produced by extending the trajectory $x_{0:k} \in \mathsf{X}^{k+1}$ with an additional component obtained by simulating from $R_k(x_k, \cdot)$. It is easily checked that, for $x_{0:k+1} \in \mathsf{X}^{k+2}$,
$$\frac{\mathrm{d}\bar\phi_{k+1}^N}{\mathrm{d}\rho_{k+1}^N}(x_{0:k+1}) \propto w_{k+1}(x_{0:k+1}) \triangleq \sum_{i=1}^N \mathbb{1}_{\{\xi_k^{N,i}\}}(x_{0:k})\, \frac{g_{k+1}(x_{k+1})}{\tau_k^{N,i}}\, \frac{\mathrm{d}Q(x_k, \cdot)}{\mathrm{d}R_k(x_k, \cdot)}(x_{k+1})\,. \qquad (2.3)$$
An updated weighted particle sample $\{(\tilde\xi_{k+1}^{N,i}, \tilde\omega_{k+1}^{N,i})\}_{i=1}^{M_N}$ targeting $\bar\phi_{k+1}^N$ is hence generated by simulating $M_N$ particles $\tilde\xi_{k+1}^{N,i}$, $1 \leq i \leq M_N$, from the proposal $\rho_{k+1}^N$ and associating with these particles the second-stage weights $\tilde\omega_{k+1}^{N,i} \triangleq w_{k+1}(\tilde\xi_{k+1}^{N,i})$, $1 \leq i \leq M_N$. By the indicator function in (2.3), only a single term of the sum contributes to the second-stage weight of a particle.

Finally, in an optional second-stage resampling pass a uniformly weighted particle sample $\{(\xi_{k+1}^{N,i}, 1)\}_{i=1}^N$, still targeting $\bar\phi_{k+1}^N$, is obtained by resampling $N$ of the particles $\tilde\xi_{k+1}^{N,i}$, $1 \leq i \leq M_N$, according to the normalised second-stage weights. Note that the numbers of particles in the last two samples, $M_N$ and $N$, may be different. The procedure is then repeated recursively (with $\omega_{k+1}^{N,i} \equiv 1$, $1 \leq i \leq N$) and is initialised by drawing $\xi_0^{N,i}$, $1 \leq i \leq N$, independently from $\varsigma$, where $\nu \ll \varsigma$, yielding $\omega_0^{N,i} = w_0(\xi_0^{N,i})$ with $w_0(x) \triangleq g_0(x)\, \mathrm{d}\nu/\mathrm{d}\varsigma(x)$, $x \in \mathsf{X}$. To summarise, we obtain, depending on whether second-stage resampling is performed or not, the procedures described in Algorithms 1 and 2.

Algorithm 1
Two-Stage Sampling Particle Filter (TSSPF)
Ensure: $\{(\xi_k^{N,i}, \omega_k^{N,i})\}_{i=1}^N$ approximates $\phi_k$.
for $i = 1, \ldots, M_N$ do ⊲ First stage
  draw an index $I_k^{N,i}$ from $\{1, \ldots, N\}$ multinomially with respect to the normalised weights $\omega_k^{N,j}\tau_k^{N,j}/\sum_{\ell=1}^N \omega_k^{N,\ell}\tau_k^{N,\ell}$, $1 \leq j \leq N$;
  simulate $\tilde\xi_{k+1}^{N,i}(k+1) \sim R_k[\xi_k^{N,I_k^{N,i}}(k), \cdot\,]$, and set $\tilde\xi_{k+1}^{N,i} \triangleq [\xi_k^{N,I_k^{N,i}}, \tilde\xi_{k+1}^{N,i}(k+1)]$ and $\tilde\omega_{k+1}^{N,i} \triangleq w_{k+1}(\tilde\xi_{k+1}^{N,i})$.
end for
for $i = 1, \ldots, N$ do ⊲ Second stage
  draw an index $J_{k+1}^{N,i}$ from $\{1, \ldots, M_N\}$ multinomially with respect to the normalised weights $\tilde\omega_{k+1}^{N,j}/\sum_{\ell=1}^{M_N} \tilde\omega_{k+1}^{N,\ell}$, $1 \leq j \leq M_N$, and set $\xi_{k+1}^{N,i} \triangleq \tilde\xi_{k+1}^{N,J_{k+1}^{N,i}}$; finally, reset the weights: $\omega_{k+1}^{N,i} = 1$.
end for
Take $\{(\xi_{k+1}^{N,i}, 1)\}_{i=1}^N$ as an approximation of $\phi_{k+1}$.

We will use the term APF as a family name for both these algorithms, and refer to them separately as the two-stage sampling particle filter (TSSPF) and the single-stage auxiliary particle filter (SSAPF). Note that by letting $\tau_k^{N,i} \equiv 1$, $1 \leq i \leq N$, in Algorithm 2 we obtain the bootstrap particle filter suggested by Gordon et al. (1993).

The resampling steps of the APF can of course be implemented using techniques different from multinomial resampling (e.g., residual or systematic resampling), leading to straightforward adaptations not discussed here. We believe, however, that the results of the coming analysis are generally applicable and extendable to a large class of selection schemes.

Algorithm 2
Single-Stage Auxiliary Particle Filter (SSAPF)
Ensure: $\{(\xi_k^{N,i}, \omega_k^{N,i})\}_{i=1}^N$ approximates $\phi_k$.
for $i = 1, \ldots, N$ do
  draw an index $I_k^{N,i}$ from $\{1, \ldots, N\}$ multinomially with respect to the normalised weights $\omega_k^{N,j}\tau_k^{N,j}/\sum_{\ell=1}^N \omega_k^{N,\ell}\tau_k^{N,\ell}$, $1 \leq j \leq N$;
  simulate $\tilde\xi_{k+1}^{N,i}(k+1) \sim R_k[\xi_k^{N,I_k^{N,i}}(k), \cdot\,]$, and set $\tilde\xi_{k+1}^{N,i} \triangleq [\xi_k^{N,I_k^{N,i}}, \tilde\xi_{k+1}^{N,i}(k+1)]$ and $\tilde\omega_{k+1}^{N,i} \triangleq w_{k+1}(\tilde\xi_{k+1}^{N,i})$.
end for
Take $\{(\tilde\xi_{k+1}^{N,i}, \tilde\omega_{k+1}^{N,i})\}_{i=1}^N$ as an approximation of $\phi_{k+1}$.

The issue of whether second-stage resampling should be performed has been treated by several authors, and the theoretical results on particle approximation stability and asymptotic variance presented in the next section will indicate that the second-stage selection pass should, at least in the case $M_N = N$, be cancelled, since it exclusively increases the sampling variance. Thus, the idea that the second-stage resampling pass is necessary for preventing the particle approximation from degenerating apparently does not hold. Recently, a similar conclusion was reached in the manuscript (Doucet and Johansen, 2007).

The advantages of the APF not possessed by standard SMC methods are the possibility of, firstly, choosing the first-stage weights $\tau_k^{N,i}$ arbitrarily and, secondly, letting $N$ and $M_N$ be different (TSSPF only). Appealing to common sense, SMC methods work efficiently when the particle weights are well-balanced, and Pitt and Shephard (1999a) propose several strategies for achieving this by adapting the first-stage weights. In some cases it is possible to fully adapt the filter to the model (see Section 5), providing exactly equal importance weights; otherwise, Pitt and Shephard (1999a) suggest, in the case $R_k \equiv Q$ and $\mathsf{X} = \mathbb{R}^d$, the generic first-stage importance weight function
$$t_k^{\mathrm{P\&S}}(x_{0:k}) \triangleq g_{k+1}\!\left[\int_{\mathbb{R}^d} x'\, Q(x_k, \mathrm{d}x')\right], \quad x_{0:k} \in \mathsf{X}^{k+1}\,,$$
that is, the likelihood evaluated at the predicted mean state. The analysis that follows will however show that this way of adapting the first-stage weights is not necessarily good in terms of asymptotic (as $N$ tends to infinity) sample variance; indeed, using first-stage weights given by $t_k^{\mathrm{P\&S}}$ can even be detrimental for some models.
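To make the sampling mechanics concrete, the following Python sketch implements one iteration of the SSAPF for a scalar state space. It is a minimal illustration under assumptions of our own (the arguments t, sample_R, dQ_dR, and g are hypothetical user-supplied stand-ins for $t_k$, $R_k$, $\mathrm{d}Q/\mathrm{d}R_k$, and $g_{k+1}$, all acting on the last state coordinate only); it is not code accompanying the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ssapf_step(x, w, y_next, t, sample_R, dQ_dR, g):
    """One SSAPF (Algorithm 2) iteration on a scalar state space.

    x, w     : current particle positions and unnormalised weights
    y_next   : the new observation y_{k+1}
    t        : first-stage weight function t_k
    sample_R : draws x' ~ R_k(x, .) componentwise
    dQ_dR    : derivative dQ(x, .)/dR_k(x, .) evaluated at x'
    g        : local likelihood g(y | x')
    """
    tau = t(x)                                   # first-stage weights
    p = w * tau
    idx = rng.choice(len(x), size=len(x), p=p / p.sum())
    x_sel = x[idx]                               # first-stage selection
    x_new = sample_R(x_sel)                      # mutation through R_k
    # second-stage weights, cf. (2.3): only the term belonging to the
    # selected ancestor contributes to the sum
    w_new = g(y_next, x_new) * dQ_dR(x_sel, x_new) / tau[idx]
    return x_new, w_new
```

Taking t constantly equal to one and letting sample_R draw from $Q$ (so that dQ_dR is constant) recovers the bootstrap particle filter, in line with the remark following Algorithm 1.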
3. Bounds and asymptotics for produced approximations
Introduce, for any probability measure $\mu$ on some measurable space $(\mathsf{E}, \mathcal{E})$ and $\mu$-measurable function $f$ satisfying $\int_{\mathsf{E}} |f(x)|\, \mu(\mathrm{d}x) < \infty$, the notation $\mu f \triangleq \int_{\mathsf{E}} f(x)\, \mu(\mathrm{d}x)$. Moreover, for any two transition kernels $K$ and $T$ from $(\mathsf{E}_1, \mathcal{E}_1)$ to $(\mathsf{E}_2, \mathcal{E}_2)$ and from $(\mathsf{E}_2, \mathcal{E}_2)$ to $(\mathsf{E}_3, \mathcal{E}_3)$, respectively, we define the product transition kernel $KT(x, A) \triangleq \int_{\mathsf{E}_2} T(z, A)\, K(x, \mathrm{d}z)$ for $x \in \mathsf{E}_1$ and $A \in \mathcal{E}_3$. A set $\mathsf{C}$ of real-valued functions on $\mathsf{X}^m$ is said to be proper if the following conditions hold: i) $\mathsf{C}$ is a linear space; ii) if $g \in \mathsf{C}$ and $f$ is measurable with $|f| \leq |g|$, then $|f| \in \mathsf{C}$; iii) for all $c \in \mathbb{R}$, the constant function $f \equiv c$ belongs to $\mathsf{C}$. From (Douc and Moulines, 2005) we adapt the following definitions.

Definition 3.1 (Consistency).
A weighted sample $\{(\xi_m^{N,i}, \omega_m^{N,i})\}_{i=1}^{M_N}$ on $\mathsf{X}^{m+1}$ is said to be consistent for the probability measure $\mu$ and the (proper) set $\mathsf{C} \subseteq L^1(\mathsf{X}^{m+1}, \mu)$ if, for any $f \in \mathsf{C}$, as $N \to \infty$,
$$(\Omega_m^N)^{-1} \sum_{i=1}^{M_N} \omega_m^{N,i} f(\xi_m^{N,i}) \stackrel{\mathbb{P}}{\longrightarrow} \mu f\,, \qquad (\Omega_m^N)^{-1} \max_{1 \leq i \leq M_N} \omega_m^{N,i} \stackrel{\mathbb{P}}{\longrightarrow} 0\,.$$

Definition 3.2 (Asymptotic normality).
A weighted sample $\{(\xi_m^{N,i}, \omega_m^{N,i})\}_{i=1}^{M_N}$ on $\mathsf{X}^{m+1}$ is called asymptotically normal (abbreviated a.n.) for $(\mu, \mathsf{A}, \mathsf{W}, \sigma, \gamma, \{a_N\}_{N=1}^\infty)$ if, as $N \to \infty$,
$$a_N (\Omega_m^N)^{-1} \sum_{i=1}^{M_N} \omega_m^{N,i} [f(\xi_m^{N,i}) - \mu f] \stackrel{\mathcal{D}}{\longrightarrow} \mathrm{N}[0, \sigma^2(f)] \quad \text{for any } f \in \mathsf{A}\,,$$
$$a_N^2 (\Omega_m^N)^{-2} \sum_{i=1}^{M_N} (\omega_m^{N,i})^2 f(\xi_m^{N,i}) \stackrel{\mathbb{P}}{\longrightarrow} \gamma f \quad \text{for any } f \in \mathsf{W}\,,$$
$$a_N (\Omega_m^N)^{-1} \max_{1 \leq i \leq M_N} \omega_m^{N,i} \stackrel{\mathbb{P}}{\longrightarrow} 0\,.$$

The main contribution of this section is the following results, which establish consistency and asymptotic normality of weighted samples produced by the TSSPF and SSAPF algorithms.
For all $k \geq 0$, we define a transformation $\Phi_k$ on the set of $\phi_k$-integrable functions by
$$\Phi_k[f](x_{0:k}) \triangleq f(x_{0:k}) - \phi_k f\,, \quad x_{0:k} \in \mathsf{X}^{k+1}\,. \qquad (3.1)$$
In addition, we impose the following assumptions.

(A1) For all $k \geq 0$, $t_k \in L^1(\mathsf{X}^{k+1}, \phi_k)$ and $w_k \in L^1(\mathsf{X}^{k+1}, \phi_k)$, where $t_k$ and $w_k$ are defined in (2.2) and (2.3), respectively.

(A2) i) $\mathsf{A}_0 \subseteq L^2(\mathsf{X}, \phi_0)$ is a proper set and $\sigma_0 : \mathsf{A}_0 \to \mathbb{R}^+$ is a function satisfying, for all $f \in \mathsf{A}_0$ and $a \in \mathbb{R}$, $\sigma_0(af) = |a|\sigma_0(f)$.
ii) The initial sample $\{(\xi_0^{N,i}, \omega_0^{N,i})\}_{i=1}^N$ is consistent for $[L^1(\mathsf{X}, \phi_0), \phi_0]$ and a.n. for $[\phi_0, \mathsf{A}_0, \mathsf{W}_0, \sigma_0, \gamma_0, \{\sqrt{N}\}_{N=1}^\infty]$.

Theorem 3.1.
Assume (A1) and (A2) with $(\mathsf{W}_0, \gamma_0) = [L^1(\mathsf{X}, \phi_0), \phi_0]$. In the setting of Algorithm 1, suppose that the limit $\beta \triangleq \lim_{N\to\infty} N/M_N$ exists, where $\beta \in [0, \infty)$. Define recursively the family $\{\mathsf{A}_k\}_{k=1}^\infty$ by
$$\mathsf{A}_{k+1} \triangleq \big\{ f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|)\, H_k^u(\cdot, |f|) \in L^1(\mathsf{X}^{k+1}, \phi_k)\,,\ H_k^u(\cdot, |f|) \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k)\,,\ w_{k+1} f^2 \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) \big\}\,. \qquad (3.2)$$
Furthermore, define recursively the family $\{\sigma_k\}_{k=1}^\infty$ of functionals $\sigma_k : \mathsf{A}_k \to \mathbb{R}^+$ by
$$\sigma_{k+1}^2(f) \triangleq \phi_{k+1}\Phi_{k+1}^2[f] + \frac{\sigma_k^2\{H_k^u(\cdot, \Phi_{k+1}[f])\} + \beta\, \phi_k\{t_k R_k^p(\cdot, w_{k+1}^2\Phi_{k+1}^2[f])\}\, \phi_k t_k}{[\phi_k H_k^u(\mathsf{X}^{k+2})]^2}\,. \qquad (3.3)$$
Then $\mathsf{A}_k$ is a proper set for all $k \geq 0$. Moreover, each sample $\{(\xi_k^{N,i}, 1)\}_{i=1}^N$ produced by Algorithm 1 is consistent for $[L^1(\mathsf{X}^{k+1}, \phi_k), \phi_k]$ and asymptotically normal for $[\phi_k, \mathsf{A}_k, L^1(\mathsf{X}^{k+1}, \phi_k), \sigma_k, \phi_k, \{\sqrt{N}\}_{N=1}^\infty]$.

The proof is found in Appendix A, and as a by-product a similar result for the SSAPF (Algorithm 2) is obtained.
Theorem 3.2.
Assume (A1) and (A2). Define the families $\{\tilde{\mathsf{W}}_k\}_{k=0}^\infty$ and $\{\tilde{\mathsf{A}}_k\}_{k=0}^\infty$ by
$$\tilde{\mathsf{W}}_k \triangleq \big\{ f \in L^1(\mathsf{X}^{k+1}, \phi_k) : w_k f \in L^1(\mathsf{X}^{k+1}, \phi_k) \big\}\,, \quad k \geq 1\,, \qquad \tilde{\mathsf{W}}_0 \triangleq \mathsf{W}_0\,,$$
and, with $\tilde{\mathsf{A}}_0 \triangleq \mathsf{A}_0$,
$$\tilde{\mathsf{A}}_{k+1} \triangleq \big\{ f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|)\, H_k^u(\cdot, |f|) \in L^1(\mathsf{X}^{k+1}, \phi_k)\,,\ H_k^u(\cdot, |f|) \in \tilde{\mathsf{A}}_k\,,\ [H_k^u(\cdot, |f|)]^2 \in \tilde{\mathsf{W}}_k\,,\ w_{k+1} f^2 \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) \big\}\,. \qquad (3.4)$$
Furthermore, define recursively the family $\{\tilde\sigma_k\}_{k=0}^\infty$ of functionals $\tilde\sigma_k : \tilde{\mathsf{A}}_k \to \mathbb{R}^+$ by
$$\tilde\sigma_{k+1}^2(f) \triangleq \frac{\tilde\sigma_k^2\{H_k^u(\cdot, \Phi_{k+1}[f])\} + \phi_k\{t_k R_k^p(\cdot, w_{k+1}^2\Phi_{k+1}^2[f])\}\, \phi_k t_k}{[\phi_k H_k^u(\mathsf{X}^{k+2})]^2}\,, \qquad \tilde\sigma_0 \triangleq \sigma_0\,, \qquad (3.5)$$
and the measures $\{\tilde\gamma_k\}_{k=1}^\infty$ by
$$\tilde\gamma_{k+1} f \triangleq \frac{\phi_{k+1}(w_{k+1} f)\, \phi_k t_k}{\phi_k H_k^u(\mathsf{X}^{k+2})}\,, \quad f \in \tilde{\mathsf{W}}_{k+1}\,.$$
Then $\tilde{\mathsf{A}}_k$ is a proper set for all $k \geq 0$. Moreover, each sample $\{(\tilde\xi_k^{N,i}, \tilde\omega_k^{N,i})\}_{i=1}^N$ produced by Algorithm 2 is consistent for $[L^1(\mathsf{X}^{k+1}, \phi_k), \phi_k]$ and asymptotically normal for $[\phi_k, \tilde{\mathsf{A}}_k, \tilde{\mathsf{W}}_k, \tilde\sigma_k, \tilde\gamma_k, \{\sqrt{N}\}_{N=1}^\infty]$.

Under the assumption of bounded likelihood and second-stage importance weight functions $g_k$ and $w_k$, one can show that the CLTs stated in Theorems 3.1 and 3.2 indeed cover all functions having finite second moments with respect to the joint smoothing distributions; that is, under these assumptions the supplementary constraints defining the sets (3.2) and (3.4) are automatically fulfilled. This is the content of the statement below.

(A3) For all $k \geq 0$, $\|g_k\|_{\mathsf{X},\infty} < \infty$ and $\|w_k\|_{\mathsf{X}^{k+1},\infty} < \infty$.

Corollary 3.1.
Assume (A3) and let $\{\mathsf{A}_k\}_{k=0}^\infty$ and $\{\tilde{\mathsf{A}}_k\}_{k=0}^\infty$ be defined by (3.2) and (3.4), respectively, with $\tilde{\mathsf{A}}_0 = \mathsf{A}_0 \triangleq L^2(\mathsf{X}, \phi_0)$. Then, for all $k \geq 0$, $\mathsf{A}_k = L^2(\mathsf{X}^{k+1}, \phi_k)$ and $L^2(\mathsf{X}^{k+1}, \phi_k) \subseteq \tilde{\mathsf{A}}_k$.

For a proof, see Section A.2. Interestingly, the expressions of $\tilde\sigma_{k+1}^2(f)$ and $\sigma_{k+1}^2(f)$ differ, for $\beta = 1$, only in the additive term $\phi_{k+1}\Phi_{k+1}^2[f]$, that is, the variance of $f$ under $\phi_{k+1}$. This quantity represents the cost of introducing the second-stage resampling pass, which was proposed as a means of preventing the particle approximation from degenerating. In Section 3.2 we will however show that the approximations produced by the SSAPF are already stable over a finite time horizon, and that additional resampling is superfluous. Thus, there are indeed reasons for strongly questioning whether second-stage resampling should be performed at all, at least when the same number of particles is used in the two stages.

3.2. $L_p$ error and bias

In this part we examine, under suitable regularity conditions and for a finite particle population, the errors of the approximations obtained by the APF in terms of $L_p$ bounds and bounds on the bias. We preface our main result with some definitions and assumptions. Denote by $\mathcal{B}_b(\mathsf{X}^m)$ the space of bounded measurable functions on $\mathsf{X}^m$ furnished with the supremum norm $\|f\|_{\mathsf{X}^m,\infty} \triangleq \sup_{x \in \mathsf{X}^m} |f(x)|$. Let, for $f \in \mathcal{B}_b(\mathsf{X}^m)$, the oscillation semi-norm (alternatively termed the global modulus of continuity) be defined by $\mathrm{osc}(f) \triangleq \sup_{(x,x') \in \mathsf{X}^m \times \mathsf{X}^m} |f(x) - f(x')|$. Furthermore, the $L_p$ norm of a random variable $X$ is denoted by $\|X\|_p \triangleq \mathbb{E}^{1/p}[|X|^p]$. When considering sums, we will use the standard convention $\sum_{k=a}^b c_k = 0$ if $b < a$.

In the following we will assume that all measures $Q(x, \cdot)$, $x \in \mathsf{X}$, have densities $q(x, \cdot)$ with respect to a common dominating measure $\mu$ on $(\mathsf{X}, \mathcal{X})$. Moreover, we suppose that the following holds.

(A4) i) $\epsilon_- \triangleq \inf_{(x,x') \in \mathsf{X}^2} q(x, x') > 0$ and $\epsilon_+ \triangleq \sup_{(x,x') \in \mathsf{X}^2} q(x, x') < \infty$.
ii) For all $y \in \mathsf{Y}$, $\int_{\mathsf{X}} g(y \,|\, x)\, \mu(\mathrm{d}x) > 0$.

Under (A4) we define
$$\rho \triangleq 1 - \epsilon_-/\epsilon_+\,. \qquad (3.6)$$

(A5) For all $k \geq 0$, $\|t_k\|_{\mathsf{X}^{k+1},\infty} < \infty$.

Assumption (A4) is by now standard and is typically satisfied when the state space $\mathsf{X}$ is compact; it implies that the hidden chain, when evolving conditionally on the observations, is geometrically ergodic with a mixing rate given by $\rho < 1$. For comprehensive treatments of such stability properties within the framework of state space models we refer to Del Moral (2004). Finally, let $\mathsf{C}_i(\mathsf{X}^{n+1})$ be the set of bounded measurable functions $f$ on $\mathsf{X}^{n+1}$ of type $f(x_{0:n}) = \bar{f}(x_{i:n})$ for some function $\bar{f} : \mathsf{X}^{n-i+1} \to \mathbb{R}$. In this setting we have the following result, which is proved in Section A.3.

Theorem 3.3.
Assume (A3), (A4), and (A5), and let $f_i \in \mathsf{C}_i(\mathsf{X}^{n+1})$ for some $0 \leq i \leq n$. Let $\{(\tilde\xi_k^{N,i}, \tilde\omega_k^{N,i})\}_{i=1}^{R_N(r)}$ be a weighted particle sample produced by Algorithm $r$, $r \in \{1, 2\}$, with $R_N(r) \triangleq \mathbb{1}\{r=1\}\, M_N + \mathbb{1}\{r=2\}\, N$. Then the following holds true for all $N \geq 1$ and $r \in \{1, 2\}$.

i) For all $p \geq 2$,
$$\Bigg\| (\tilde\Omega_n^N)^{-1} \sum_{j=1}^{R_N(r)} \tilde\omega_n^{N,j} f_i(\tilde\xi_n^{N,j}) - \phi_n f_i \Bigg\|_p \leq \frac{B_p\, \mathrm{osc}(f_i)}{1-\rho} \Bigg[ \frac{\epsilon_-^{-1}}{\sqrt{R_N(r)}} \sum_{k=1}^n \frac{\|w_k\|_{\mathsf{X}^{k+1},\infty}\, \|t_{k-1}\|_{\mathsf{X}^k,\infty}}{\mu g_k}\, \rho^{0 \vee (i-k)} + \frac{\mathbb{1}\{r=1\}}{\sqrt{N}} \left( \frac{\rho}{1-\rho} + n - i \right) + \frac{\|w_0\|_{\mathsf{X},\infty}}{\nu g_0\, \sqrt{N}}\, \rho^i \Bigg]\,.$$

ii)
$$\Bigg| \mathbb{E}\Bigg[ (\tilde\Omega_n^N)^{-1} \sum_{j=1}^{R_N(r)} \tilde\omega_n^{N,j} f_i(\tilde\xi_n^{N,j}) \Bigg] - \phi_n f_i \Bigg| \leq \frac{B\, \mathrm{osc}(f_i)}{(1-\rho)^2} \Bigg[ \frac{\epsilon_-^{-2}}{R_N(r)} \sum_{k=1}^n \frac{\|w_k\|_{\mathsf{X}^{k+1},\infty}^2\, \|t_{k-1}\|_{\mathsf{X}^k,\infty}^2}{(\mu g_k)^2}\, \rho^{0 \vee (i-k)} + \frac{\mathbb{1}\{r=1\}}{N} \left( \frac{\rho}{1-\rho} + n - i \right) + \frac{\|w_0\|_{\mathsf{X},\infty}^2}{N\, (\nu g_0)^2}\, \rho^i \Bigg]\,.$$

Here $\rho$ is defined in (3.6), and $B_p$ and $B$ are universal constants such that $B_p$ depends on $p$ only.

In particular, applying Theorem 3.3 for $i = n$ under the assumption that all fractions $\|w_k\|_{\mathsf{X}^{k+1},\infty}\, \|t_{k-1}\|_{\mathsf{X}^k,\infty}/\mu g_k$ are uniformly bounded in $k$ yields error bounds on the approximate filter distribution which are uniformly bounded in $n$. From this it is obvious that the first-stage resampling pass is enough to preserve the sample stability. Indeed, by avoiding second-stage selection according to Algorithm 2 we can obtain even tighter control of the $L_p$ error for a fixed number of particles, since the middle terms in the bounds above then vanish.
4. Identifying asymptotically optimal first-stage weights
The formulas (3.3) and (3.5) for the asymptotic variances of the TSSPF and SSAPF may look complicated at first sight, but by carefully examining them we will obtain important knowledge of how to choose the first-stage importance weight functions $t_k$ in order to robustify the APF.

Assume that we have run the APF up to time $k$ and are about to design suitable first-stage weights for the next iteration. In this setting, we call a first-stage weight function $t'_k[f]$, possibly depending on the target function $f \in \mathsf{A}_{k+1}$ and satisfying (A1), optimal (at time $k$) if it provides a minimal increase of asymptotic variance at a single iteration of the APF algorithm, that is, if $\sigma_{k+1}^2\{t'_k[f]\}(f) \leq \sigma_{k+1}^2\{t\}(f)$ (or $\tilde\sigma_{k+1}^2\{t'_k[f]\}(f) \leq \tilde\sigma_{k+1}^2\{t\}(f)$) for all other measurable and positive weight functions $t$. Here $\sigma_{k+1}^2\{t\}(f)$ denotes the asymptotic variance induced by $t$. Define, for $x_{0:k} \in \mathsf{X}^{k+1}$,
$$t_k^*[f](x_{0:k}) \triangleq \sqrt{\int_{\mathsf{X}} g_{k+1}^2(x_{k+1}) \left[\frac{\mathrm{d}Q(x_k,\cdot)}{\mathrm{d}R_k(x_k,\cdot)}(x_{k+1})\right]^2 \Phi_{k+1}^2[f](x_{0:k}, x_{k+1})\, R_k(x_k, \mathrm{d}x_{k+1})}\,, \qquad (4.1)$$
and let $w_{k+1}^*[f]$ denote the second-stage importance weight function induced by $t_k^*[f]$ according to (2.3). We are now ready to state the main result of this section; the proof is found in Section A.4.

Theorem 4.1.
Let $k \geq 0$ and define $t_k^*$ by (4.1). Then the following holds.

i) Let the assumptions of Theorem 3.1 hold and suppose that $f \in \{f' \in \mathsf{A}_{k+1} : t_k^*[f'] \in L^1(\mathsf{X}^{k+1}, \phi_k),\ w_{k+1}^*[f'] \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\}$. Then $t_k^*$ is optimal for Algorithm 1, and the corresponding minimal variance is given by
$$\sigma_{k+1}^2\{t_k^*\}(f) = \phi_{k+1}\Phi_{k+1}^2[f] + \frac{\sigma_k^2[H_k^u(\cdot, \Phi_{k+1}[f])] + \beta\, (\phi_k t_k^*[f])^2}{[\phi_k H_k^u(\mathsf{X}^{k+2})]^2}\,.$$

ii) Let the assumptions of Theorem 3.2 hold and suppose that $f \in \{f' \in \tilde{\mathsf{A}}_{k+1} : t_k^*[f'] \in L^1(\mathsf{X}^{k+1}, \phi_k),\ w_{k+1}^*[f'] \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\}$. Then $t_k^*$ is optimal for Algorithm 2, and the corresponding minimal variance is given by
$$\tilde\sigma_{k+1}^2\{t_k^*\}(f) = \frac{\tilde\sigma_k^2[H_k^u(\cdot, \Phi_{k+1}[f])] + (\phi_k t_k^*[f])^2}{[\phi_k H_k^u(\mathsf{X}^{k+2})]^2}\,.$$

The functions $t_k^*$ have a natural interpretation in terms of optimal sample allocation for stratified sampling. Consider the mixture $\pi = \sum_{i=1}^d w_i \mu_i$, each $\mu_i$ being a probability measure on some measurable space $(\mathsf{E}, \mathcal{E})$ and $\sum_{i=1}^d w_i = 1$, and the problem of estimating, for some given $\pi$-integrable target function $f$, the expectation $\pi f$. In order to relate this to the particle filtering paradigm, we will make use of Algorithm 3; in other words, we perform Monte Carlo estimation of $\pi f$ by means of sampling from a proposal mixture $\sum_{j=1}^d \tau_j \nu_j$ and forming a self-normalised estimate (cf. the technique applied in Section 2.2 for sampling from $\bar\phi_{k+1}^N$).

Algorithm 3
Stratified importance sampling

for $i = 1, \ldots, N$ do
  draw an index $J_i$ multinomially with respect to $\tau_j$, $1 \leq j \leq d$, where $\sum_{j=1}^d \tau_j = 1$;
  simulate $\xi_i \sim \nu_{J_i}$ and compute the weight $\omega_i \triangleq \frac{w_j}{\tau_j} \frac{\mathrm{d}\mu_j}{\mathrm{d}\nu_j}(\xi_i) \big|_{j=J_i}$.
end for
Take $\{(\xi_i, \omega_i)\}_{i=1}^N$ as an approximation of $\pi$.

In this setting, the following CLT can be established under weak assumptions:
$$\sqrt{N} \Bigg[ \sum_{i=1}^N \frac{\omega_i}{\sum_{\ell=1}^N \omega_\ell} f(\xi_i) - \pi f \Bigg] \stackrel{\mathcal{D}}{\longrightarrow} \mathrm{N}\Bigg( 0, \sum_{j=1}^d \frac{w_j^2\, \alpha_j(f)}{\tau_j} \Bigg)\,,$$
with, for $x \in \mathsf{E}$,
$$\alpha_i(f) \triangleq \int_{\mathsf{E}} \left[ \frac{\mathrm{d}\mu_i}{\mathrm{d}\nu_i}(x) \right]^2 \Pi^2[f](x)\, \nu_i(\mathrm{d}x) \quad \text{and} \quad \Pi[f](x) \triangleq f(x) - \pi f\,.$$
Minimising the asymptotic variance $\sum_{i=1}^d w_i^2 \alpha_i(f)/\tau_i$ with respect to $\tau_i$, $1 \leq i \leq d$, e.g., by means of the Lagrange multiplier method (the details are simple), yields the optimal weights
$$\tau_i^* \propto w_i \sqrt{\alpha_i(f)} = w_i \sqrt{\int_{\mathsf{E}} \left[ \frac{\mathrm{d}\mu_i}{\mathrm{d}\nu_i}(x) \right]^2 \Pi^2[f](x)\, \nu_i(\mathrm{d}x)}\,,$$
and the similarity between this expression and that of the optimal first-stage importance weight functions $t_k^*$ is striking. This strongly supports the idea of interpreting optimal sample allocation for particle filters in terms of variance reduction for stratified sampling.
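The variance formula above is easy to check numerically. The following toy Python sketch (our own construction; the two-component Gaussian mixture, the choice $\nu_i = \mu_i$ so that $\mathrm{d}\mu_i/\mathrm{d}\nu_i \equiv 1$, and the target $f(x) = x$ are arbitrary illustrative choices) compares proportional allocation $\tau_i = w_i$ with the optimal allocation $\tau_i^* \propto w_i\sqrt{\alpha_i(f)}$:

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.array([0.3, 0.7])          # mixture weights w_i
means = np.array([-2.0, 3.0])     # component means; unit variances
f = lambda x: x                   # target function f(x) = x
pi_f = np.dot(w, means)           # exact value of pi(f)

# With nu_i = mu_i, dmu_i/dnu_i = 1 and alpha_i(f) = E_{mu_i}[(f - pi f)^2]
alpha = 1.0 + (means - pi_f) ** 2

def estimate(tau, N=2000):
    """Self-normalised estimate of pi(f) under allocation tau (Algorithm 3)."""
    J = rng.choice(2, size=N, p=tau)              # stratum indices
    xi = rng.normal(means[J], 1.0)                # xi_i ~ nu_{J_i}
    omega = w[J] / tau[J]                         # importance weights
    return np.sum(omega * f(xi)) / np.sum(omega)

for tau in (w, w * np.sqrt(alpha) / np.dot(w, np.sqrt(alpha))):
    reps = np.array([estimate(tau) for _ in range(500)])
    print(tau, reps.var())
```

Under these assumptions the asymptotic variances are $\sum_i w_i \alpha_i(f) = 6.25$ for the proportional allocation and $(\sum_i w_i\sqrt{\alpha_i(f)})^2 \approx 5.54$ for the optimal one, and the empirical variances printed by the script should approximately reproduce this ordering.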
5. Implementations
As shown in the previous section, the utilisation of the optimal weights (4.1) provides, for a given sequence $\{R_k\}_{k=0}^\infty$ of proposal kernels, the most efficient of all particle filters belonging to the large class covered by Algorithm 2 (including the standard bootstrap filter and any fully adapted particle filter). However, exact computation of the optimal weights is in general infeasible for two reasons: firstly, they depend (via $\Phi_{k+1}[f]$) on the expectation $\phi_{k+1}f$, that is, the very quantity that we aim to estimate, and, secondly, they involve the evaluation of a complicated integral. A comprehensive treatment of the important issue of how to approximate the optimal weights is beyond the scope of this paper, but in the following three examples we discuss some possible heuristics for doing this.

5.1. A first order autoregressive model observed in noise

In order to form an initial idea of the performance of the optimal SSAPF in practice, we apply the method to a first order (possibly nonlinear) autoregressive model observed in noise:
$$X_{k+1} = m(X_k) + \sigma_w(X_k)\, W_{k+1}\,, \qquad Y_k = X_k + \sigma_v V_k\,, \qquad (5.1)$$
with $\{W_k\}_{k=1}^\infty$ and $\{V_k\}_{k=0}^\infty$ being mutually independent sequences of standard normal variables such that $W_{k+1}$ is independent of $(X_i, Y_i)$, $0 \leq i \leq k$, and $V_k$ is independent of $X_k$ and $(X_i, Y_i)$, $0 \leq i \leq k - 1$. Here the functions $\sigma_w : \mathbb{R} \to \mathbb{R}^+$ and $m : \mathbb{R} \to \mathbb{R}$ are measurable, and $\mathsf{X} = \mathbb{R}$. As observed by Pitt and Shephard (1999a), it is, for all models of form (5.1), possible to propose new particles using the optimal kernel directly, yielding $R_k^p = H_k$ and, for $(x, x') \in \mathbb{R}^2$,
$$r_k(x, x') = \frac{1}{\tilde\sigma_k(x)\sqrt{2\pi}} \exp\left\{ -\frac{[x' - \tilde m_k(x)]^2}{2\tilde\sigma_k^2(x)} \right\}\,, \qquad (5.2)$$
with $r_k$ denoting the density of $R_k$ with respect to Lebesgue measure, and
$$\tilde m_k(x) \triangleq \left[ \frac{y_{k+1}}{\sigma_v^2} + \frac{m(x)}{\sigma_w^2(x)} \right] \tilde\sigma_k^2(x)\,, \qquad \tilde\sigma_k^2(x) \triangleq \frac{\sigma_v^2\, \sigma_w^2(x)}{\sigma_v^2 + \sigma_w^2(x)}\,. \qquad (5.3)$$
For the proposal (5.2) it holds, for $x_{0:k+1} \in \mathbb{R}^{k+2}$, that
$$g_{k+1}(x_{k+1})\, \frac{\mathrm{d}Q(x_k,\cdot)}{\mathrm{d}R_k(x_k,\cdot)}(x_{k+1}) \propto h_k(x_k) \triangleq \frac{\tilde\sigma_k(x_k)}{\sigma_w(x_k)} \exp\left[ \frac{\tilde m_k^2(x_k)}{2\tilde\sigma_k^2(x_k)} - \frac{m^2(x_k)}{2\sigma_w^2(x_k)} \right]\,, \qquad (5.4)$$
and since the right hand side does not depend on $x_{k+1}$ we can, by letting $t_k(x_{0:k}) = h_k(x_k)$, $x_{0:k} \in \mathbb{R}^{k+1}$, obtain second-stage weights that are indeed constant (providing a sample of genuinely $\bar\phi_{k+1}^N$-distributed particles). When this is achieved, Pitt and Shephard (1999a) call the particle filter fully adapted. There is however nothing in the previous theoretical analysis that supports the idea that aiming at evenly distributed second-stage weights is always convenient, and this will also be illustrated in the simulations below. On the other hand, it is possible to find cases where the fully adapted particle filter is very close to being optimal; see again the following discussion. A code sketch of this fully adapting proposal is given next, followed by the two special cases of (5.1) that we study.
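For concreteness, the following minimal Python sketch computes the optimal kernel moments (5.3) and the fully adapting first-stage weight (5.4); it is our own illustration (the callables m and sigma_w, and the scalar sigma_v, stand for $m$, $\sigma_w$, and $\sigma_v$ in (5.1)), not code from the paper.

```python
import numpy as np

def optimal_kernel_params(x, y_next, m, sigma_w, sigma_v):
    """Mean and variance (5.3) of the optimal kernel (5.2)."""
    s2w = sigma_w(x) ** 2
    s2t = sigma_v ** 2 * s2w / (sigma_v ** 2 + s2w)      # tilde sigma_k^2
    mt = (y_next / sigma_v ** 2 + m(x) / s2w) * s2t      # tilde m_k
    return mt, s2t

def h_k(x, y_next, m, sigma_w, sigma_v):
    """Fully adapting first-stage weight (5.4), up to a constant factor."""
    mt, s2t = optimal_kernel_params(x, y_next, m, sigma_w, sigma_v)
    return np.sqrt(s2t) / sigma_w(x) * np.exp(
        mt ** 2 / (2.0 * s2t) - m(x) ** 2 / (2.0 * sigma_w(x) ** 2))
```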
• $m(X_k) \equiv \phi X_k$ and $\sigma_w(X_k) \equiv \sigma$.

For a linear/Gaussian model of this kind, exact expressions of the optimal weights can be obtained using the Kalman filter. Here $X_0 \sim \mathrm{N}[0, \sigma^2/(1-\phi^2)]$, the stationary distribution of the state process. In this setting, we simulated, for $\sigma = \sigma_v = 0.1$, a record $y$ of observations and estimated the filter posterior means (corresponding to the projection target functions $\pi_k(x_{0:k}) \triangleq x_k$, $x_{0:k} \in \mathbb{R}^{k+1}$) along this trajectory by applying (1) the SSAPF based on the true optimal weights, (2) the SSAPF based on the generic weights $t_k^{\mathrm{P\&S}}$ of Pitt and Shephard (1999a), and (3) the standard bootstrap particle filter (that is, the SSAPF with $t_k \equiv 1$). The prior kernel $Q$ was taken as proposal in all cases, and since the optimal weights are derived using asymptotic arguments we used as many as 100,000 particles for all algorithms. The result is displayed in Figure 1(a), and it is clear that operating with the true optimal allocation weights improves, as expected, the MSE performance in comparison with the other methods.

The main motivation of Pitt and Shephard (1999a) for introducing auxiliary particle filtering was to robustify the particle approximation against outliers. Thus, we mimic Cappé et al. (2005, Example 7.2.3) and repeat the experiment above for a short observation record $y_{0:5}$, with $\sigma_v = 1$, $\sigma = 0.1$, and the smaller particle sample size $N = 10{,}000$; the final observation of the record is an outlier, in this case located at a distance of 20 standard deviations from the mean of the stationary distribution. The outcome is plotted in Figure 1(b), from which it is evident that the particle filter based on the optimal weights is the most efficient also in this case; moreover, the performance of the standard auxiliary particle filter is improved in comparison with the bootstrap filter. Figure 2 displays a plot of the weight functions $t_4^*$ and $t_4^{\mathrm{P\&S}}$ for the same observation record. It is clear that $t_4^{\mathrm{P\&S}}$ is not too far away from the optimal weight function (which is close to symmetric in this extreme situation) in this case, even if the distance between the functions as measured by the supremum norm is still significant.

Finally, we implement the fully adapted filter (with proposal kernel and first-stage weights given by (5.2) and (5.4), respectively) and compare it with the SSAPF based on the same proposal and optimal first-stage weights, the latter being given by, for $x_{0:k} \in \mathbb{R}^{k+1}$ and $h_k$ defined in (5.4),
$$t_k^*[\pi_{k+1}](x_{0:k}) \propto h_k(x_k) \sqrt{\int_{\mathbb{R}} \Phi_{k+1}^2[\pi_{k+1}](x_{k+1})\, R_k(x_k, \mathrm{d}x_{k+1})} = h_k(x_k) \sqrt{\tilde\sigma_k^2(x_k) + \tilde m_k^2(x_k) - 2\tilde m_k(x_k)\, \phi_{k+1}\pi_{k+1} + (\phi_{k+1}\pi_{k+1})^2} \qquad (5.5)$$
in this case. We note that $h_k$, that is, the first-stage weight function of the fully adapted filter, enters as a factor in the optimal weight function (5.5). Moreover, recall the definitions (5.3) of $\tilde m_k$ and $\tilde\sigma_k^2$; in the case of very informative observations, corresponding to $\sigma_v \ll \sigma$, it holds that $\tilde\sigma_k^2(x) \approx \sigma_v^2$ and $\tilde m_k(x) \approx y_{k+1}$ with good precision for moderate values of $x \in \mathbb{R}$ (that is, values not too far away from the mean of the stationary distribution of $X$). Thus, the factor beside $h_k$ in (5.5) is more or less constant in this case, implying that the fully adapted and optimal first-stage weight filters are close to equivalent. This observation is perfectly confirmed in Figure 3(a), which presents MSE performances for $\sigma_v = 0.1$, $\sigma = 1$, and $N = 10{,}000$.
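A sketch of the (approximate) optimal first-stage weight (5.5), reusing optimal_kernel_params and h_k from the previous sketch; the argument post_mean_est is our own hypothetical name for a plug-in estimate of $\phi_{k+1}\pi_{k+1}$ (exact for the linear/Gaussian case via the Kalman filter, or crude otherwise):

```python
import numpy as np

def t_star(x, y_next, m, sigma_w, sigma_v, post_mean_est):
    """Approximate optimal first-stage weight (5.5) for the filter mean."""
    mt, s2t = optimal_kernel_params(x, y_next, m, sigma_w, sigma_v)
    # integral of Phi_{k+1}^2[pi_{k+1}] under the optimal kernel N(mt, s2t)
    return h_k(x, y_next, m, sigma_w, sigma_v) * np.sqrt(
        s2t + (mt - post_mean_est) ** 2)
```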
[Figure 1 about here. Panels (a) and (b): time index versus mean square error (log scale).]

Fig. 1. Plot of MSE performances (on log scale) of the bootstrap particle filter (∗), the SSAPF based on optimal weights (□), and the SSAPF based on the generic weights $t_k^{\mathrm{P\&S}}$ of Pitt and Shephard (1999a) (◦). The MSE values are based on 100,000 particles and 400 runs of each algorithm.
[Figure 2 about here: particle position versus first-stage weight.]

Fig. 2. Plot of the first-stage importance weight functions $t_4^*$ (unbroken line) and $t_4^{\mathrm{P\&S}}$ (dashed line) in the presence of an outlier.
[Figure 3 about here. Panels (a) and (b): time index versus mean square error (log scale).]

Fig. 3. Plot of MSE performances (on log scale) of the bootstrap particle filter (∗), the SSAPF based on optimal weights (□), the SSAPF based on the generic weights $t_k^{\mathrm{P\&S}}$ (◦), and the fully adapted SSAPF (×) for the linear/Gaussian model in Section 5.1. The MSE values are computed using 10,000 particles and 400 runs of each algorithm.

In the opposite case of non-informative observations, $\sigma_v \gg \sigma$, we note that $\tilde\sigma_k^2(x) \approx \sigma^2$ and $\tilde m_k(x) \approx \phi x$, and conclude that the optimal kernel is close to the prior kernel $Q$. In addition, the exponent of $h_k$ vanishes, implying uniform first-stage weights for the fully adapted particle filter. Thus, the fully adapted filter will be close to the bootstrap filter in this case, and Figure 3(b) seems to confirm this remark. Moreover, the optimal first-stage weight filter does clearly better than the others in terms of MSE performance.

• $m(X_k) \equiv 0$ and $\sigma_w(X_k) \equiv \sqrt{\beta_0 + \beta_1 X_k^2}$.

Here we deal with the classical Gaussian autoregressive conditional heteroscedasticity (ARCH) model (see Bollerslev et al., 1994) observed in noise. Since the nonlinear state equation precludes exact computation of the filtered means, implementing the optimal first-stage weight SSAPF is considerably more challenging in this case. The problem can however be tackled by means of an introductory zero-stage simulation pass, based on $R \ll N$ particles, in which a crude estimate of $\phi_{k+1}f$ is obtained. For instance, this can be achieved by applying the standard bootstrap filter with multinomial resampling; a sketch of such a prefatory pass is given below. Using this approach, we computed again MSE values for the bootstrap filter, the standard SSAPF based on generic weights, the fully adapted SSAPF, and the (approximate) optimal first-stage weight SSAPF, the latter using the optimal proposal kernel.
Each algorithm used 10,000 particles, and the number of particles in the prefatory pass was set to $R = N/10 = 1{,}000$, implying only minor additional computational work. An imitation of the true filter means was obtained by running the bootstrap filter with as many as 500,000 particles. In compliance with the foregoing, we considered the case of informative (Figure 4(a)) as well as non-informative (Figure 4(b)) observations, corresponding to the parameter sets $(\beta_0, \beta_1, \sigma_v) = (9, \cdot\,, \cdot)$ and $(\beta_0, \beta_1, \sigma_v) = (0.\cdot\,, \cdot\,, \cdot)$, respectively.
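A minimal sketch of such a zero-stage pass for this ARCH model (our own illustration under the assumption of a uniformly weighted current particle cloud; the function name and signature are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def prefatory_estimate(x_cloud, y_next, beta0, beta1, sigma_v, R=1000):
    """One bootstrap step on R << N particles giving a crude filter-mean
    estimate of phi_{k+1} pi_{k+1}, to be plugged into (5.5)."""
    x = rng.choice(x_cloud, size=R)               # subsample current cloud
    sigma_w = np.sqrt(beta0 + beta1 * x ** 2)     # ARCH state noise scale
    x_new = rng.normal(0.0, sigma_w)              # prior mutation, m = 0
    w = np.exp(-0.5 * (y_next - x_new) ** 2 / sigma_v ** 2)
    return np.sum(w * x_new) / np.sum(w)
```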
[Figure 4 about here. Panels (a) and (b): time index versus mean square error (log scale).]

Fig. 4.
Plot of MSE performances (on log scale) of the bootstrap particle filter (∗), the SSAPF based on optimal weights (□), the SSAPF based on the generic weights $t_k^{\mathrm{P\&S}}$ (◦), and the fully adapted SSAPF (×) for the ARCH model in Section 5.1. The MSE values are computed using 10,000 particles and 400 runs of each algorithm.

Since $\tilde\sigma_k^2(x) \approx \sigma_v^2$ and $\tilde m_k(x) \approx y_{k+1}$ in the latter case, we should, in accordance with the previous discussion, again expect the fully adapted filter to be close to that based on optimal first-stage weights. This is also confirmed in the plot. For the former parameter set, the fully adapted SSAPF exhibits an MSE performance close to that of the bootstrap filter, while the optimal first-stage weight SSAPF is clearly superior.

5.2. A stochastic volatility model

As a final example we consider the canonical discrete-time stochastic volatility (SV) model (Hull and White, 1987) given by
$$X_{k+1} = \phi X_k + \sigma W_{k+1}\,, \qquad Y_k = \beta \exp(X_k/2)\, V_k\,,$$
where $\mathsf{X} = \mathbb{R}$, and $\{W_k\}_{k=1}^\infty$ and $\{V_k\}_{k=0}^\infty$ are as in Section 5.1. Here $X$ and $Y$ are log-volatility and log-returns, respectively, where the former is assumed to be stationary. Also this model was treated by Pitt and Shephard (1999a), who discussed approximate full adaptation of the particle filter by means of a second order Taylor approximation of the concave function $x' \mapsto \log g_{k+1}(x')$. More specifically, by multiplying the approximate observation density obtained in this way with $q(x, x')$, $(x, x') \in \mathbb{R}^2$, yielding a Gaussian approximation of the optimal kernel density, nearly even second-stage weights can be obtained. We proceed in the same spirit, approximating however directly the (log-concave) function $x' \mapsto g_{k+1}(x')\, q(x, x')$ by means of a second order Taylor expansion of $x' \mapsto \log[g_{k+1}(x')\, q(x, x')]$ around the mode $\bar m_k(x)$ (obtained using Newton iterations) of the same:
$$g_{k+1}(x')\, q(x, x') \approx r_k^u(x, x') \triangleq g_{k+1}[\bar m_k(x)]\, q[x, \bar m_k(x)]\, \exp\left\{ -\frac{1}{2\bar\sigma_k^2(x)} [x' - \bar m_k(x)]^2 \right\}\,,$$
with (we refer to Cappé et al., 2005, pp. 225–228, for details) $\bar\sigma_k^2(x)$ being the inverted negative of the second order derivative, evaluated at $\bar m_k(x)$, of $x' \mapsto \log[g_{k+1}(x')\, q(x, x')]$. Thus, by letting, for $(x, x') \in \mathbb{R}^2$, $r_k(x, x') = r_k^u(x, x') / \int_{\mathbb{R}} r_k^u(x, x'')\, \mathrm{d}x''$, we obtain
$$g_{k+1}(x_{k+1})\, \frac{\mathrm{d}Q(x_k,\cdot)}{\mathrm{d}R_k(x_k,\cdot)}(x_{k+1}) \approx \int_{\mathbb{R}} r_k^u(x_k, x')\, \mathrm{d}x' \propto \bar\sigma_k(x_k)\, g_{k+1}[\bar m_k(x_k)]\, q[x_k, \bar m_k(x_k)]\,, \qquad (5.6)$$
and letting, for $x_{0:k} \in \mathbb{R}^{k+1}$, $t_k(x_{0:k}) = \bar\sigma_k(x_k)\, g_{k+1}[\bar m_k(x_k)]\, q[x_k, \bar m_k(x_k)]$ will imply a nearly fully adapted particle filter. Moreover, by applying the approximate relation (5.6) to the expression (4.1) of the optimal weights, we get (cf. (5.5))
$$t_k^*[\pi_{k+1}](x_{0:k}) \approx \int_{\mathbb{R}} r_k^u(x_k, x')\, \mathrm{d}x' \sqrt{\int_{\mathbb{R}} \Phi_{k+1}^2[\pi_{k+1}](x)\, R_k(x_k, \mathrm{d}x)} \propto \bar\sigma_k(x_k)\, g_{k+1}[\bar m_k(x_k)]\, q[x_k, \bar m_k(x_k)]\, \sqrt{\bar\sigma_k^2(x_k) + \bar m_k^2(x_k) - 2\bar m_k(x_k)\, \phi_{k+1}\pi_{k+1} + (\phi_{k+1}\pi_{k+1})^2}\,. \qquad (5.7)$$
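A sketch of the mode-finding step (our own code; for this SV model, $\log[g_{k+1}(x')\, q(x, x')]$ equals $-x'/2 - y_{k+1}^2 e^{-x'}/(2\beta^2) - (x' - \phi x)^2/(2\sigma^2)$ up to a constant, which is strictly concave, so the Newton iteration is well behaved):

```python
import numpy as np

def sv_mode_and_curvature(x, y_next, phi, beta, sigma, n_iter=10):
    """Newton search for the mode bar_m_k(x) of log[g_{k+1}(x') q(x, x')]
    and the curvature-based variance bar_sigma_k^2(x)."""
    c = y_next ** 2 / (2.0 * beta ** 2)
    m = phi * x                       # start from the prior mean
    for _ in range(n_iter):
        grad = -0.5 + c * np.exp(-m) - (m - phi * x) / sigma ** 2
        hess = -c * np.exp(-m) - 1.0 / sigma ** 2
        m = m - grad / hess           # Newton step; the target is concave
    return m, -1.0 / hess             # bar_m_k(x), bar_sigma_k^2(x)
```

The returned pair determines the Gaussian approximation $r_k^u$ above and thereby, via (5.6) and (5.7), both the nearly fully adapting and the approximate optimal first-stage weights.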
In this setting, we conducted a numerical experiment where the two filters above were, again together with the bootstrap filter and the auxiliary filter based on the generic weights $t_k^{\mathrm{P\&S}}$, run for the parameters $(\phi, \beta, \sigma) = (0.\cdot\,, 0.\cdot\,, 0.\cdot)$ on a record $y$ of observations arising from the initial state $x_0 = 2.19$, being above the 2% quantile of the stationary distribution of $X$ and implying a sequence of relatively impetuously fluctuating log-returns. The number of particles was set to $N = 5{,}000$ for all filters, and the number of particles used in the prefatory filtering pass (in which a rough approximation of $\phi_{k+1}\pi_{k+1}$ in (5.7) was computed using the bootstrap filter) of the SSAPF based on optimal first-stage weights was set, as in the previous example, to $R = N/10$. The outcome is displayed in Figure 5.

[Figure 5 about here: time index versus mean square error (log scale).]

Fig. 5. Plot of MSE performances (on log scale) of the bootstrap particle filter (∗), the SSAPF based on optimal weights (□), the SSAPF based on the generic weights $t_k^{\mathrm{P\&S}}$ (◦), and the fully adapted SSAPF (×) for the SV model in Section 5.2. The MSE values are computed using 5,000 particles and 400 runs of each algorithm.
A. Proofs

A.1. Proof of Theorem 3.1
Let us recall the updating scheme described in Algorithm 1 and formulate it in the following four isolated steps:
$$\{(\xi_k^{N,i}, 1)\}_{i=1}^N \xrightarrow{\text{I: Weighting}} \{(\xi_k^{N,i}, \tau_k^{N,i})\}_{i=1}^N \xrightarrow{\text{II: Resampling (1st stage)}} \{(\hat\xi_k^{N,i}, 1)\}_{i=1}^{M_N} \xrightarrow{\text{III: Mutation}} \{(\tilde\xi_{k+1}^{N,i}, \tilde\omega_{k+1}^{N,i})\}_{i=1}^{M_N} \xrightarrow{\text{IV: Resampling (2nd stage)}} \{(\xi_{k+1}^{N,i}, 1)\}_{i=1}^N\,, \qquad (\mathrm{A.1})$$
where we have set $\hat\xi_k^{N,i} \triangleq \xi_k^{N,I_k^{N,i}}$, $1 \leq i \leq M_N$. Now, the asymptotic properties stated in Theorem 3.1 are established by a chain of applications of (Douc and Moulines, 2005, Theorems 1–4). We will proceed by induction: assume that the uniformly weighted particle sample $\{(\xi_k^{N,i}, 1)\}_{i=1}^N$ is consistent for $[L^1(\mathsf{X}^{k+1}, \phi_k), \phi_k]$ and asymptotically normal
for $[\phi_k, \mathsf{A}_k, L^1(\mathsf{X}^{k+1}, \phi_k), \sigma_k, \phi_k, \{\sqrt{N}\}_{N=1}^\infty]$, with $\mathsf{A}_k$ being a proper set and $\sigma_k$ such that $\sigma_k(af) = |a|\sigma_k(f)$, $f \in \mathsf{A}_k$, $a \in \mathbb{R}$. We prove, by analysing each of the steps (I)–(IV), that this property is preserved through one iteration of the algorithm.

(I). Define the measure
$$\mu_k(A) \triangleq \frac{\phi_k(t_k \mathbb{1}_A)}{\phi_k t_k}\,, \quad A \in \mathcal{X}^{\otimes(k+1)}\,.$$
By applying (Douc and Moulines, 2005, Theorem 1) with $R(x_{0:k}, \cdot) = \delta_{x_{0:k}}(\cdot)$, $L(x_{0:k}, \cdot) = t_k(x_{0:k})\, \delta_{x_{0:k}}(\cdot)$, $\mu = \mu_k$, and $\nu = \phi_k$, we conclude that the weighted sample $\{(\xi_k^{N,i}, \tau_k^{N,i})\}_{i=1}^N$ is consistent for
$$[\{f \in L^1(\mathsf{X}^{k+1}, \mu_k) : t_k|f| \in L^1(\mathsf{X}^{k+1}, \phi_k)\}, \mu_k] = [L^1(\mathsf{X}^{k+1}, \mu_k), \mu_k]\,.$$
Here the equality is based on the fact that $\phi_k(t_k|f|) = \mu_k|f|\, \phi_k t_k$, where the second factor on the right hand side is bounded by Assumption (A1). In addition, by applying (Douc and Moulines, 2005, Theorem 2) we conclude that $\{(\xi_k^{N,i}, \tau_k^{N,i})\}_{i=1}^N$ is asymptotically normal for $(\mu_k, \mathsf{A}_{\mathrm{I},k}, \mathsf{W}_{\mathrm{I},k}, \sigma_{\mathrm{I},k}, \gamma_{\mathrm{I},k}, \{\sqrt{N}\}_{N=1}^\infty)$, where
$$\mathsf{A}_{\mathrm{I},k} \triangleq \{f \in L^2(\mathsf{X}^{k+1}, \mu_k) : t_k|f| \in \mathsf{A}_k\,,\ t_k f \in L^2(\mathsf{X}^{k+1}, \phi_k)\} = \{f \in L^2(\mathsf{X}^{k+1}, \mu_k) : t_k f \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k)\}\,,$$
$$\mathsf{W}_{\mathrm{I},k} \triangleq \{f \in L^1(\mathsf{X}^{k+1}, \mu_k) : t_k^2 |f| \in L^1(\mathsf{X}^{k+1}, \phi_k)\}$$
are proper sets, and
$$\sigma_{\mathrm{I},k}^2(f) \triangleq \sigma_k^2\left[\frac{t_k(f - \mu_k f)}{\phi_k t_k}\right] = \frac{\sigma_k^2[t_k(f - \mu_k f)]}{(\phi_k t_k)^2}\,, \quad f \in \mathsf{A}_{\mathrm{I},k}\,, \qquad \gamma_{\mathrm{I},k} f \triangleq \frac{\phi_k(t_k^2 f)}{(\phi_k t_k)^2}\,, \quad f \in \mathsf{W}_{\mathrm{I},k}\,.$$

(II). By using (Douc and Moulines, 2005, Theorems 3 and 4) we deduce that $\{(\hat\xi_k^{N,i}, 1)\}_{i=1}^{M_N}$ is consistent for $[L^1(\mathsf{X}^{k+1}, \mu_k), \mu_k]$ and a.n. for $[\mu_k, \mathsf{A}_{\mathrm{II},k}, L^1(\mathsf{X}^{k+1}, \mu_k), \sigma_{\mathrm{II},k}, \beta\mu_k, \{\sqrt{N}\}_{N=1}^\infty]$, where
$$\mathsf{A}_{\mathrm{II},k} \triangleq \{f \in \mathsf{A}_{\mathrm{I},k} : f^2 \in L^1(\mathsf{X}^{k+1}, \mu_k)\} = \{f \in L^2(\mathsf{X}^{k+1}, \mu_k) : t_k f \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k)\}$$
is a proper set, and
$$\sigma_{\mathrm{II},k}^2(f) \triangleq \beta\mu_k[(f - \mu_k f)^2] + \sigma_{\mathrm{I},k}^2(f) = \beta\mu_k[(f - \mu_k f)^2] + \frac{\sigma_k^2[t_k(f - \mu_k f)]}{(\phi_k t_k)^2}\,, \quad f \in \mathsf{A}_{\mathrm{II},k}\,.$$

(III). We argue as in step (I), but this time with $\nu = \mu_k$, $R = R_k^p$, and $L(\cdot, A) = R_k^p(\cdot, w_{k+1}\mathbb{1}_A)$, $A \in \mathcal{X}^{\otimes(k+2)}$, providing the target distribution
$$\mu(A) = \frac{\mu_k R_k^p(w_{k+1}\mathbb{1}_A)}{\mu_k R_k^p w_{k+1}} = \frac{\phi_k H_k^u(A)}{\phi_k H_k^u(\mathsf{X}^{k+2})} = \phi_{k+1}(A)\,, \quad A \in \mathcal{X}^{\otimes(k+2)}\,. \qquad (\mathrm{A.2})$$
This yields, applying (Douc and Moulines, 2005, Theorems 1 and 2), that $\{(\tilde\xi_{k+1}^{N,i}, \tilde\omega_{k+1}^{N,i})\}_{i=1}^{M_N}$ is consistent for
$$[\{f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|) \in L^1(\mathsf{X}^{k+1}, \mu_k)\}, \phi_{k+1}] = [L^1(\mathsf{X}^{k+2}, \phi_{k+1}), \phi_{k+1}]\,, \qquad (\mathrm{A.3})$$
where (A.3) follows, since $\mu_k R_k^p(w_{k+1}|f|)\, \phi_k t_k = \phi_k H_k^u(\mathsf{X}^{k+2})\, \phi_{k+1}|f|$, from (A1), and a.n. for $(\phi_{k+1}, \mathsf{A}_{\mathrm{III},k+1}, \mathsf{W}_{\mathrm{III},k+1}, \sigma_{\mathrm{III},k+1}, \gamma_{\mathrm{III},k+1}, \{\sqrt{N}\}_{N=1}^\infty)$.
Here
$$\mathsf{A}_{\mathrm{III},k+1} \triangleq \{f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|) \in \mathsf{A}_{\mathrm{II},k}\,,\ R_k^p(\cdot, w_{k+1}^2 f^2) \in L^1(\mathsf{X}^{k+1}, \mu_k)\}$$
$$= \{f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|) \in L^2(\mathsf{X}^{k+1}, \mu_k)\,,\ t_k R_k^p(\cdot, w_{k+1}|f|) \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k)\,,\ R_k^p(\cdot, w_{k+1}^2 f^2) \in L^1(\mathsf{X}^{k+1}, \mu_k)\}$$
$$= \{f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|)\, H_k^u(\cdot, |f|) \in L^1(\mathsf{X}^{k+1}, \phi_k)\,,\ H_k^u(\cdot, |f|) \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k)\,,\ w_{k+1}f^2 \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\}$$
and
$$\mathsf{W}_{\mathrm{III},k+1} \triangleq \{f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}^2|f|) \in L^1(\mathsf{X}^{k+1}, \mu_k)\} = \{f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) : w_{k+1} f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\}$$
are proper sets. In addition, from the identity (A.2) we obtain that $\mu_k R_k^p(w_{k+1}\Phi_{k+1}[f]) = 0$, where $\Phi_{k+1}$ is defined in (3.1), yielding
$$\sigma_{\mathrm{III},k+1}^2(f) \triangleq \sigma_{\mathrm{II},k}^2\left\{\frac{R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])}{\mu_k R_k^p w_{k+1}}\right\} + \beta\, \frac{\mu_k R_k^p(\{w_{k+1}\Phi_{k+1}[f] - R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}^2)}{(\mu_k R_k^p w_{k+1})^2}$$
$$= \beta\, \frac{\mu_k(\{R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}^2)}{(\mu_k R_k^p w_{k+1})^2} + \frac{\sigma_k^2\{t_k R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}}{(\phi_k t_k)^2\, (\mu_k R_k^p w_{k+1})^2} + \beta\, \frac{\mu_k R_k^p(\{w_{k+1}\Phi_{k+1}[f] - R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}^2)}{(\mu_k R_k^p w_{k+1})^2}\,, \quad f \in \mathsf{A}_{\mathrm{III},k+1}\,.$$
Now, applying the equality
$$\mu_k(\{R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}^2) + \mu_k R_k^p(\{w_{k+1}\Phi_{k+1}[f] - R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}^2) = \mu_k R_k^p(w_{k+1}^2\Phi_{k+1}^2[f])$$
provides the variance
$$\sigma_{\mathrm{III},k+1}^2(f) = \frac{\beta\, \phi_k\{t_k R_k^p(\cdot, w_{k+1}^2\Phi_{k+1}^2[f])\}\, \phi_k t_k + \sigma_k^2\{H_k^u(\cdot, \Phi_{k+1}[f])\}}{[\phi_k H_k^u(\mathsf{X}^{k+2})]^2}\,, \quad f \in \mathsf{A}_{\mathrm{III},k+1}\,. \qquad (\mathrm{A.4})$$
Finally, for $f \in \mathsf{W}_{\mathrm{III},k+1}$,
$$\gamma_{\mathrm{III},k+1} f \triangleq \beta\, \frac{\mu_k R_k^p(w_{k+1}^2 f)}{(\mu_k R_k^p w_{k+1})^2} = \beta\, \frac{\phi_{k+1}(w_{k+1}f)\, \phi_k t_k}{\phi_k H_k^u(\mathsf{X}^{k+2})}\,.$$

(IV). The consistency for $[L^1(\mathsf{X}^{k+2}, \phi_{k+1}), \phi_{k+1}]$ of the uniformly weighted particle sample $\{(\xi_{k+1}^{N,i}, 1)\}_{i=1}^N$ follows from (Douc and Moulines, 2005, Theorem 3). In addition, applying (Douc and Moulines, 2005, Theorem 4) yields that the same sample is a.n. for $[\phi_{k+1}, \mathsf{A}_{\mathrm{IV},k+1}, L^1(\mathsf{X}^{k+2}, \phi_{k+1}), \sigma_{\mathrm{IV},k+1}, \phi_{k+1}, \{\sqrt{N}\}_{N=1}^\infty]$, with
$$\mathsf{A}_{\mathrm{IV},k+1} \triangleq \{f \in \mathsf{A}_{\mathrm{III},k+1} : f^2 \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\} = \{f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|)\, H_k^u(\cdot, |f|) \in L^1(\mathsf{X}^{k+1}, \phi_k)\,,\ H_k^u(\cdot, |f|) \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k)\,,\ w_{k+1}f^2 \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\}$$
being a proper set, and, for $f \in \mathsf{A}_{\mathrm{IV},k+1}$,
$$\sigma_{\mathrm{IV},k+1}^2(f) \triangleq \phi_{k+1}\Phi_{k+1}^2[f] + \sigma_{\mathrm{III},k+1}^2(f)\,,$$
with $\sigma_{\mathrm{III},k+1}^2(f)$ being defined by (A.4). This concludes the proof of the theorem.
A.2. Proof of Corollary 3.1
We pick $f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1})$ and prove that the constraints of the set $\mathsf{A}_{k+1}$ defined in (3.2) are satisfied under Assumption (A3). Firstly, by Jensen's inequality,
$$\phi_k[R_k^p(\cdot, w_{k+1}|f|)\, H_k^u(\cdot, |f|)] = \phi_k\{t_k [R_k^p(\cdot, w_{k+1}|f|)]^2\} \leq \phi_k[t_k R_k^p(\cdot, w_{k+1}^2 f^2)] = \phi_k H_k^u(w_{k+1} f^2) \leq \|w_{k+1}\|_{\mathsf{X}^{k+2},\infty}\, \phi_k H_k^u(\mathsf{X}^{k+2})\, \phi_{k+1}(f^2) < \infty\,,$$
and similarly,
$$\phi_k\{[H_k^u(\cdot, |f|)]^2\} \leq \|g_{k+1}\|_{\mathsf{X},\infty}\, \phi_k H_k^u(\mathsf{X}^{k+2})\, \phi_{k+1}(f^2) < \infty\,.$$
From this, together with the bound $\phi_{k+1}(w_{k+1} f^2) \leq \|w_{k+1}\|_{\mathsf{X}^{k+2},\infty}\, \phi_{k+1}(f^2) < \infty$, we conclude that $\mathsf{A}_{k+1} = L^2(\mathsf{X}^{k+2}, \phi_{k+1})$.

To prove $L^2(\mathsf{X}^{k+1}, \phi_k) \subseteq \tilde{\mathsf{A}}_k$, note that Assumption (A3) implies $\tilde{\mathsf{W}}_k = L^1(\mathsf{X}^{k+1}, \phi_k)$ and repeat the arguments above.

A.3. Proof of Theorem 3.3
Define, for $r \in \{1, 2\}$ and $R_N(r)$ as defined in Theorem 3.3, the particle measures
$$\phi_k^N(A) \triangleq N^{-1} \sum_{i=1}^N \delta_{\xi_k^{N,i}}(A) \quad \text{and} \quad \tilde\phi_k^N(A) \triangleq (\tilde\Omega_k^N)^{-1} \sum_{i=1}^{R_N(r)} \tilde\omega_k^{N,i}\, \delta_{\tilde\xi_k^{N,i}}(A)\,, \quad A \in \mathcal{X}^{\otimes(k+1)}\,,$$
playing the role of approximations of the smoothing distribution $\phi_k$. Let $\mathcal{F}_0 \triangleq \sigma(\xi_0^{N,i};\, 1 \leq i \leq N)$; then the particle history up to the different steps of loop $m+1$, $m \geq 0$, of Algorithm $r$, $r \in \{1, 2\}$, is modeled by the filtrations $\hat{\mathcal{F}}_m \triangleq \mathcal{F}_m \vee \sigma[I_m^{N,i};\, 1 \leq i \leq R_N(r)]$, $\tilde{\mathcal{F}}_{m+1} \triangleq \hat{\mathcal{F}}_m \vee \sigma[\tilde\xi_{m+1}^{N,i};\, 1 \leq i \leq R_N(r)]$, and
$$\mathcal{F}_{m+1} \triangleq \begin{cases} \tilde{\mathcal{F}}_{m+1} \vee \sigma(J_{m+1}^{N,i};\, 1 \leq i \leq N)\,, & \text{for } r = 1\,, \\ \tilde{\mathcal{F}}_{m+1}\,, & \text{for } r = 2\,, \end{cases}$$
respectively. In the coming proof we will describe one iteration of the APF algorithm by the following two operations:
$$\{(\xi_k^{N,i}, \omega_k^{N,i})\}_{i=1}^N \xrightarrow{\text{sampling from } \varphi_{k+1}^N} \{(\tilde\xi_{k+1}^{N,i}, \tilde\omega_{k+1}^{N,i})\}_{i=1}^{R_N(r)} \xrightarrow{r\, =\, 1:\ \text{sampling from } \tilde\phi_{k+1}^N} \{(\xi_{k+1}^{N,i}, 1)\}_{i=1}^N\,,$$
where, for $A \in \mathcal{X}^{\otimes(k+2)}$,
$$\varphi_{k+1}^N(A) \triangleq \mathbb{P}\big( \tilde\xi_{k+1}^{N,i} \in A \,\big|\, \mathcal{F}_k \big) = \sum_{j=1}^N \frac{\omega_k^{N,j}\tau_k^{N,j}}{\sum_{\ell=1}^N \omega_k^{N,\ell}\tau_k^{N,\ell}}\, R_k^p(\xi_k^{N,j}, A) = \frac{\phi_k^N[t_k R_k^p(\cdot, A)]}{\phi_k^N t_k} \qquad (\mathrm{A.5})$$
for any index $i \in \{1, \ldots, R_N(r)\}$ (given $\mathcal{F}_k$, the particles $\tilde\xi_{k+1}^{N,i}$, $1 \leq i \leq R_N(r)$, are i.i.d.). Here the initial weights $\{\omega_k^{N,i}\}_{i=1}^N$ are all equal to one for $r = 1$. The second operation is valid since, for any $i \in \{1, \ldots, N\}$,
$$\mathbb{P}\big( \xi_{k+1}^{N,i} \in A \,\big|\, \tilde{\mathcal{F}}_{k+1} \big) = \sum_{j=1}^{R_N(r)} \frac{\tilde\omega_{k+1}^{N,j}}{\tilde\Omega_{k+1}^N}\, \delta_{\tilde\xi_{k+1}^{N,j}}(A) = \tilde\phi_{k+1}^N(A)\,, \quad A \in \mathcal{X}^{\otimes(k+2)}\,.$$
The fact that the evolution of the particles can be described by two Monte Carlo operations involving conditionally i.i.d. variables makes it possible to analyse the error using the Marcinkiewicz–Zygmund inequality (see Petrov, 1995, p. 62).

Using this, set, for $1 \leq k \leq n$,
$$\alpha_k^N(A) \triangleq \int_A \frac{\mathrm{d}\alpha_k^N}{\mathrm{d}\varphi_k^N}(x_{0:k})\, \varphi_k^N(\mathrm{d}x_{0:k})\,, \quad A \in \mathcal{X}^{\otimes(k+1)}\,, \qquad (\mathrm{A.6})$$
with, for $x_{0:k} \in \mathsf{X}^{k+1}$,
$$\frac{\mathrm{d}\alpha_k^N}{\mathrm{d}\varphi_k^N}(x_{0:k}) \triangleq \frac{w_k(x_{0:k})\, H_k^u \cdots H_{n-1}^u(x_{0:k}, \mathsf{X}^{n+1})\, \phi_{k-1}^N t_{k-1}}{\phi_{k-1}^N H_{k-1}^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})}\,.$$
Here we apply the standard convention $H_\ell^u \cdots H_m^u \triangleq \mathrm{Id}$ if $m < \ell$. For $k = 0$ we define
$$\alpha_0(A) \triangleq \int_A \frac{\mathrm{d}\alpha_0}{\mathrm{d}\varsigma}(x_0)\, \varsigma(\mathrm{d}x_0)\,, \quad A \in \mathcal{X}\,,$$
with, for $x_0 \in \mathsf{X}$,
$$\frac{\mathrm{d}\alpha_0}{\mathrm{d}\varsigma}(x_0) \triangleq \frac{w_0(x_0)\, H_0^u \cdots H_{n-1}^u(x_0, \mathsf{X}^{n+1})}{\nu[g_0\, H_0^u \cdots H_{n-1}^u(\cdot, \mathsf{X}^{n+1})]}\,.$$
Similarly, put, for $0 \leq k \leq n - 1$,
$$\beta_k^N(A) \triangleq \int_A \frac{\mathrm{d}\beta_k^N}{\mathrm{d}\tilde\phi_k^N}(x_{0:k})\, \tilde\phi_k^N(\mathrm{d}x_{0:k})\,, \quad A \in \mathcal{X}^{\otimes(k+1)}\,, \qquad (\mathrm{A.7})$$
where, for $x_{0:k} \in \mathsf{X}^{k+1}$,
$$\frac{\mathrm{d}\beta_k^N}{\mathrm{d}\tilde\phi_k^N}(x_{0:k}) \triangleq \frac{H_k^u \cdots H_{n-1}^u(x_{0:k}, \mathsf{X}^{n+1})}{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})}\,.$$
The following powerful decomposition is an adaption of a similar one derived by Olsson et al. (2005, Lemma 7.2) (the standard SISR case), being in turn a refinement of a decomposition originally presented by Del Moral (2004).
Lemma A.1.
Let $n \geq 0$. Then, for all $f \in \mathcal{B}_b(\mathsf{X}^{n+1})$, $N \geq 1$, and $r \in \{1, 2\}$,
$$\tilde\phi_n^N f - \phi_n f = \sum_{k=1}^n A_k^N(f) + \mathbb{1}\{r=1\} \sum_{k=0}^{n-1} B_k^N(f) + C^N(f)\,, \qquad (\mathrm{A.8})$$
where
$$A_k^N(f) \triangleq \frac{\sum_{i=1}^{R_N(r)} \frac{\mathrm{d}\alpha_k^N}{\mathrm{d}\varphi_k^N}(\tilde\xi_k^{N,i})\, \Psi_{k:n}[f](\tilde\xi_k^{N,i})}{\sum_{j=1}^{R_N(r)} \frac{\mathrm{d}\alpha_k^N}{\mathrm{d}\varphi_k^N}(\tilde\xi_k^{N,j})} - \alpha_k^N \Psi_{k:n}[f]\,,$$
$$B_k^N(f) \triangleq \frac{\sum_{i=1}^N \frac{\mathrm{d}\beta_k^N}{\mathrm{d}\tilde\phi_k^N}(\xi_k^{N,i})\, \Psi_{k:n}[f](\xi_k^{N,i})}{\sum_{j=1}^N \frac{\mathrm{d}\beta_k^N}{\mathrm{d}\tilde\phi_k^N}(\xi_k^{N,j})} - \beta_k^N \Psi_{k:n}[f]\,,$$
$$C^N(f) \triangleq \frac{\sum_{i=1}^N \frac{\mathrm{d}\alpha_0}{\mathrm{d}\varsigma}(\xi_0^{N,i})\, \Psi_{0:n}[f](\xi_0^{N,i})}{\sum_{j=1}^N \frac{\mathrm{d}\alpha_0}{\mathrm{d}\varsigma}(\xi_0^{N,j})} - \phi_n \Psi_{0:n}[f]\,,$$
and the operators $\Psi_{k:n} : \mathcal{B}_b(\mathsf{X}^{n+1}) \to \mathcal{B}_b(\mathsf{X}^{k+1})$, $0 \leq k \leq n$, are, for some fixed points $\hat x_{0:k} \in \mathsf{X}^{k+1}$, defined by
$$\Psi_{k:n}[f] : x_{0:k} \mapsto \frac{H_k^u \cdots H_{n-1}^u f(x_{0:k})}{H_k^u \cdots H_{n-1}^u(x_{0:k}, \mathsf{X}^{n+1})} - \frac{H_k^u \cdots H_{n-1}^u f(\hat x_{0:k})}{H_k^u \cdots H_{n-1}^u(\hat x_{0:k}, \mathsf{X}^{n+1})}\,.$$
Consider the decomposition
$$\tilde\phi_n^N f - \phi_n f = \sum_{k=1}^n \left[ \frac{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u f}{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} - \frac{\phi_{k-1}^N H_{k-1}^u \cdots H_{n-1}^u f}{\phi_{k-1}^N H_{k-1}^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} \right] + \mathbb{1}\{r=1\} \sum_{k=0}^{n-1} \left[ \frac{\phi_k^N H_k^u \cdots H_{n-1}^u f}{\phi_k^N H_k^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} - \frac{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u f}{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} \right] + \frac{\tilde\phi_0^N H_0^u \cdots H_{n-1}^u f}{\tilde\phi_0^N H_0^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} - \phi_n f\,.$$
We will show that the three parts of this decomposition are identical to the three parts of (A.8). For $k \geq 1$, using the definitions (A.5) and (A.6) of $\varphi_k^N$ and $\alpha_k^N$, respectively, and following the lines of Olsson et al. (2005, Lemma 7.2),
$$\frac{\phi_{k-1}^N H_{k-1}^u \cdots H_{n-1}^u f}{\phi_{k-1}^N H_{k-1}^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} = \varphi_k^N\left[ \frac{w_k(\cdot)\, H_k^u \cdots H_{n-1}^u f(\cdot)\, (\phi_{k-1}^N t_{k-1})}{\phi_{k-1}^N H_{k-1}^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} \right]$$
$$= \varphi_k^N\left[ \frac{w_k(\cdot)\, H_k^u \cdots H_{n-1}^u(\cdot, \mathsf{X}^{n+1})\, (\phi_{k-1}^N t_{k-1})}{\phi_{k-1}^N H_{k-1}^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} \left\{ \Psi_{k:n}[f](\cdot) + \frac{H_k^u \cdots H_{n-1}^u f(\hat x_{0:k})}{H_k^u \cdots H_{n-1}^u(\hat x_{0:k}, \mathsf{X}^{n+1})} \right\} \right]$$
$$= \alpha_k^N\left[ \Psi_{k:n}[f](\cdot) + \frac{H_k^u \cdots H_{n-1}^u f(\hat x_{0:k})}{H_k^u \cdots H_{n-1}^u(\hat x_{0:k}, \mathsf{X}^{n+1})} \right] = \alpha_k^N \Psi_{k:n}[f] + \frac{H_k^u \cdots H_{n-1}^u f(\hat x_{0:k})}{H_k^u \cdots H_{n-1}^u(\hat x_{0:k}, \mathsf{X}^{n+1})}\,.$$
Moreover, by definition,
$$\frac{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u f}{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} = \frac{\sum_{i=1}^{R_N(r)} \frac{\mathrm{d}\alpha_k^N}{\mathrm{d}\varphi_k^N}(\tilde\xi_k^{N,i})\, \Psi_{k:n}[f](\tilde\xi_k^{N,i})}{\sum_{j=1}^{R_N(r)} \frac{\mathrm{d}\alpha_k^N}{\mathrm{d}\varphi_k^N}(\tilde\xi_k^{N,j})} + \frac{H_k^u \cdots H_{n-1}^u f(\hat x_{0:k})}{H_k^u \cdots H_{n-1}^u(\hat x_{0:k}, \mathsf{X}^{n+1})}\,,$$
yielding
$$\frac{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u f}{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} - \frac{\phi_{k-1}^N H_{k-1}^u \cdots H_{n-1}^u f}{\phi_{k-1}^N H_{k-1}^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} \equiv A_k^N(f)\,.$$
Similarly, for $r = 1$, using the definition (A.7) of $\beta_k^N$,
$$\frac{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u f}{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} = \beta_k^N\left[ \frac{H_k^u \cdots H_{n-1}^u f(\cdot)}{H_k^u \cdots H_{n-1}^u(\cdot, \mathsf{X}^{n+1})} \right] = \beta_k^N\left[ \Psi_{k:n}[f](\cdot) + \frac{H_k^u \cdots H_{n-1}^u f(\hat x_{0:k})}{H_k^u \cdots H_{n-1}^u(\hat x_{0:k}, \mathsf{X}^{n+1})} \right] = \beta_k^N \Psi_{k:n}[f] + \frac{H_k^u \cdots H_{n-1}^u f(\hat x_{0:k})}{H_k^u \cdots H_{n-1}^u(\hat x_{0:k}, \mathsf{X}^{n+1})}\,,$$
and applying the obvious relation
$$\frac{\phi_k^N H_k^u \cdots H_{n-1}^u f}{\phi_k^N H_k^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} = \frac{\sum_{i=1}^N \frac{\mathrm{d}\beta_k^N}{\mathrm{d}\tilde\phi_k^N}(\xi_k^{N,i})\, \Psi_{k:n}[f](\xi_k^{N,i})}{\sum_{j=1}^N \frac{\mathrm{d}\beta_k^N}{\mathrm{d}\tilde\phi_k^N}(\xi_k^{N,j})} + \frac{H_k^u \cdots H_{n-1}^u f(\hat x_{0:k})}{H_k^u \cdots H_{n-1}^u(\hat x_{0:k}, \mathsf{X}^{n+1})}\,,$$
we obtain the identity
$$\frac{\phi_k^N H_k^u \cdots H_{n-1}^u f}{\phi_k^N H_k^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} - \frac{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u f}{\tilde\phi_k^N H_k^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} \equiv B_k^N(f)\,.$$
The equality
$$\frac{\tilde\phi_0^N H_0^u \cdots H_{n-1}^u f}{\tilde\phi_0^N H_0^u \cdots H_{n-1}^u(\mathsf{X}^{n+1})} - \phi_n f \equiv C^N(f)$$
follows analogously. This completes the proof of the lemma. ✷
From here the proof is a straightforward extension of (Olsson et al., 2005, Proposition 7.1). To establish part (i), observe the following.

• A trivial adaption of (Olsson et al., 2005, Lemmas 7.3 and 7.4) gives
$$\|\Psi_{k:n}[f_i]\|_{\mathsf{X}^{k+1},\infty} \leq \mathrm{osc}(f_i)\, \rho^{0 \vee (i-k)}\,, \qquad \left\|\frac{\mathrm{d}\alpha_k^N}{\mathrm{d}\varphi_k^N}\right\|_{\mathsf{X}^{k+1},\infty} \leq \frac{\|w_k\|_{\mathsf{X}^{k+1},\infty}\, \|t_{k-1}\|_{\mathsf{X}^k,\infty}}{\mu g_k\, (1-\rho)\, \epsilon_-}\,. \qquad (\mathrm{A.9})$$

• By mimicking the proof of (Olsson et al., 2005, Proposition 7.1(i)), that is, applying the identity $a/b - c = (a/b)(1-b) + a - c$ to each $A_k^N(f_i)$ and using the Marcinkiewicz–Zygmund inequality twice together with the bounds (A.9), we obtain the bound
$$\sqrt{R_N(r)}\, \|A_k^N(f_i)\|_p \leq B_p\, \mathrm{osc}(f_i)\, \frac{\|w_k\|_{\mathsf{X}^{k+1},\infty}\, \|t_{k-1}\|_{\mathsf{X}^k,\infty}}{\mu g_k\, (1-\rho)\, \epsilon_-}\, \rho^{0 \vee (i-k)}\,,$$
where $B_p$ is a constant depending on $p$ only. We refer to (Olsson et al., 2005, Proposition 7.1) for details.

• For $r = 1$, inspecting the proof of (Olsson et al., 2005, Lemma 7.4) immediately yields
$$\left\|\frac{\mathrm{d}\beta_k^N}{\mathrm{d}\tilde\phi_k^N}\right\|_{\mathsf{X}^{k+1},\infty} \leq \frac{1}{1-\rho}\,,$$
and repeating the arguments of the previous item for $B_k^N(f_i)$ yields
$$\sqrt{N}\, \|B_k^N(f_i)\|_p \leq B_p\, \frac{\mathrm{osc}(f_i)}{1-\rho}\, \rho^{0 \vee (i-k)}\,.$$

• The arguments above apply directly to $C^N(f_i)$, providing
$$\sqrt{N}\, \|C^N(f_i)\|_p \leq B_p\, \mathrm{osc}(f_i)\, \frac{\|w_0\|_{\mathsf{X},\infty}}{\nu g_0\, (1-\rho)}\, \rho^i\,.$$

We conclude the proof of (i) by summing up.

The proof of (ii) (which mimics that of (Olsson et al., 2005, Proposition 7.1(ii))) follows analogous lines; indeed, repeating the arguments of (i) above for the decomposition $a/b - c = (a/b)(1-b)^2 + (a-c)(1-b) + c(1-b) + a - c$ gives us the bounds
$$R_N(r)\, |\mathbb{E}[A_k^N(f_i)]| \leq B\, \mathrm{osc}(f_i)\, \frac{\|w_k\|_{\mathsf{X}^{k+1},\infty}^2\, \|t_{k-1}\|_{\mathsf{X}^k,\infty}^2}{(\mu g_k)^2\, (1-\rho)^2\, \epsilon_-^2}\, \rho^{0 \vee (i-k)}\,,$$
$$N\, |\mathbb{E}[B_k^N(f_i)]| \leq B\, \frac{\mathrm{osc}(f_i)}{(1-\rho)^2}\, \rho^{0 \vee (i-k)}\,, \qquad N\, |\mathbb{E}[C^N(f_i)]| \leq B\, \mathrm{osc}(f_i)\, \frac{\|w_0\|_{\mathsf{X},\infty}^2}{(\nu g_0)^2\, (1-\rho)^2}\, \rho^i\,.$$
We again refer to (Olsson et al., 2005, Proposition 7.1(ii)) for details, and summing up concludes the proof. ✷

A.4. Proof of Theorem 4.1
The statement is a direct implication of Hölder's inequality. Indeed, let $t_k$ be any first-stage importance weight function and write
$$(\phi_k t_k^*[f])^2 = \{\phi_k(t_k^{1/2}\, t_k^{-1/2}\, t_k^*[f])\}^2 \leq \phi_k t_k\, \phi_k\{t_k^{-1}(t_k^*[f])^2\}\,. \qquad (\mathrm{A.10})$$
Now the result follows from formula (3.3), the identity
$$\phi_k\{t_k^{-1}(t_k^*[f])^2\} = \phi_k\{t_k R_k^p(\cdot, w_{k+1}^2\Phi_{k+1}^2[f])\}\,,$$
and the fact that we have equality in (A.10) for $t_k = t_k^*[f]$.
Acknowledgment. The authors are grateful to Olivier Cappé, who provided sensible comments on our results that improved the presentation of the paper.
References
Bollerslev, T., Engle, R. F., and Nelson, D. B. (1994) ARCH models. In The Handbook of Econometrics, 4 (eds R. F. Engle and D. McFadden), pp. 2959–3038. Amsterdam: North-Holland.

Cappé, O., Moulines, É., and Rydén, T. (2005) Inference in Hidden Markov Models. New York: Springer.

Chopin, N. (2004) Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Stat., 32, pp. 2385–2411.

Del Moral, P. (2004) Feynman-Kac Formulae. Genealogical and Interacting Particle Systems with Applications. New York: Springer.

Douc, R., and Moulines, É. (2005) Limit theorems for weighted samples with applications to sequential Monte Carlo methods. To appear in Ann. Stat.

Doucet, A., de Freitas, N., and Gordon, N. (2001) Sequential Monte Carlo Methods in Practice. New York: Springer.

Doucet, A., and Johansen, A. (2007) A note on the auxiliary particle filter. Work in progress.

Fearnhead, P. (1998) Sequential Monte Carlo Methods in Filter Theory. Doctoral thesis, University of Oxford.

Gordon, N. J., Salmond, D. J., and Smith, A. F. M. (1993) Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F, Radar Signal Process., 140, pp. 107–113.

Hull, J., and White, A. (1987) The pricing of options on assets with stochastic volatilities. J. Finance, 42, pp. 281–300.

Hürzeler, M., and Künsch, H. R. (1998) Monte Carlo approximations for general state space models. J. Comput. Graph. Statist., 7, pp. 175–193.

Künsch, H. R. (2005) Recursive Monte Carlo filters: algorithms and theoretical analysis. Ann. Stat., 33, pp. 1983–2021.

Liu, J. (2001) Monte Carlo Strategies in Scientific Computing. New York: Springer.

Olsson, J., Cappé, O., Douc, R., and Moulines, É. (2006) Sequential Monte Carlo smoothing with application to parameter estimation in non-linear state space models. Technical report, Lund University. To appear in Bernoulli.

Olsson, J., Douc, R., and Moulines, É. (2006) Improving the two-stage sampling algorithm: a statistical perspective. In On Bounds and Asymptotics of Sequential Monte Carlo Methods for Filtering, Smoothing, and Maximum Likelihood Estimation in State Space Models, pp. 143–181. Doctoral thesis, Lund University.

Petrov, V. V. (1995) Limit Theorems of Probability Theory. New York: Springer.

Pitt, M. K., and Shephard, N. (1999a) Filtering via simulation: Auxiliary particle filters. J. Am. Statist. Assoc., 94, pp. 493–499.

Pitt, M. K., and Shephard, N. (1999b) Time varying covariances: A factor stochastic volatility approach (with discussion). In Bayesian Statistics 6 (eds J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith). Oxford: Oxford University Press.