The prelimit generator comparison approach of Stein's method

Abstract

This paper uses the generator comparison approach of Stein's method to analyze the gap between steady-state distributions of Markov chains and diffusion processes. The "standard" generator comparison approach starts with the Poisson equation for the diffusion, and the main technical difficulty is to establish derivative bounds for the solution to the Poisson equation, known as Stein factor bounds. In this paper we propose starting with the Markov chain Poisson equation; we term this the prelimit approach. Although Stein factor bounds still must be established, they now correspond to the finite differences of the Markov chain Poisson equation solution rather than the derivatives of the solution to the diffusion Poisson equation. In certain cases, finite difference bounds are easier to obtain, for example, when the drift of the diffusion is not everywhere differentiable or in the presence of a reflecting boundary condition. We use the M/M/1 model as a simple working example to illustrate our approach. In a companion paper, we apply the prelimit approach to the join-the-shortest-queue model, which is a considerably more involved multidimensional model with a state-space collapse component.


arXiv [math.PR], Feb

The prelimit generator comparison approach of Stein's method

Anton Braverman

Kellogg School of Management, Northwestern University, Evanston, IL 60208, [email protected]

Key words: Stein method; generator comparison; Markov chain; prelimit; convergence rate; diffusion approximation

1. Introduction

Recent years have seen growing use of the generator comparison approach of Stein's method to establish rates of convergence for steady-state diffusion approximations of Markov chains. One very active area has been the study of queueing and service systems, e.g., Stolyar (2015), Gurvich (2014a), Braverman and Dai (2017), Braverman et al. (2016), Ying (2016, 2017), Dai and Shi (2017), Huang and Gurvich (2018), Feng and Shi (2018), Liu and Ying (2019), Braverman et al. (2020b), Braverman (2020), Braverman et al. (2020a). In the typical setup, one considers a parametric family of continuous-time Markov chains (CTMCs) $\{X(t)\}$ taking values in some discrete state space. This family is often termed the prelimit sequence. As the parameters tend to some asymptotic limit, the prelimit sequence converges to a limiting diffusion process $\{Y(t)\}$. In queueing, for example, the CTMC parameters are usually the arrival rate, number of servers, and service rate, and one common asymptotic regime is where the system utilization approaches one, also known as the heavy-traffic regime. Letting $X$ and $Y$ denote vectors having the stationary distribution of the CTMC and diffusion, respectively, the generator approach of Stein's method has been used to study the rates of convergence of $X$ to $Y$. The generator approach is attributed to Barbour (1988, 1990) and Götze (1991), which were the first papers to connect Stein's method to generators of diffusions and CTMCs.

[Running head: Braverman, Prelimit generator comparison approach. Article submitted to Stochastic Systems.]

The limiting factor in the generator comparison approach is the curse of dimensionality, because the distance between $X$ and $Y$ depends on the derivatives of the solution to the Poisson equation of the diffusion. When the diffusion is multidimensional, the Poisson equation is a second-order partial differential equation (PDE), and obtaining derivative bounds, also known as Stein factor bounds, becomes a challenge. The present paper is concerned with expanding the technical toolbox for getting multidimensional Stein factor bounds. Before discussing our contribution, let us examine this problem in detail.

Assume that $X$ takes values on the lattice $\delta\mathbb{Z}^d = \{\delta k : k \in \mathbb{Z}^d\}$ for some $\delta > 0$ and that $Y \in \mathbb{R}^d$. Let $G_X$ and $G_Y$ be the infinitesimal generators of the CTMC and diffusion, respectively. Suppose $G_X$ has the form

$$G_X f(\delta k) = \sum_{k' \in \mathbb{Z}^d} q_{k,k'} \big(f(\delta k') - f(\delta k)\big), \quad k \in \mathbb{Z}^d, \tag{1}$$

where $q_{k,k'}$ are the transition rates from state $x_k = \delta k$ to $x_{k'} = \delta k'$.
Further suppose the diffusion generator has the form

$$G_Y f(x) = \sum_{i=1}^{d} b_i(x) \frac{\partial}{\partial x_i} f(x) + \frac{1}{2} \sum_{i,j=1}^{d} a_{ij}(x) \frac{\partial^2}{\partial x_i \partial x_j} f(x), \quad x \in \mathbb{R}^d,$$

where $f : \mathbb{R}^d \to \mathbb{R}$ is a twice continuously differentiable function and $b(x) = (b_1(x), \ldots, b_d(x))$ and $a(x) = (a_{ij}(x))_{i,j=1}^{d}$ are known as the drift and diffusion coefficient, respectively.

The generator approach works as follows. First, we choose a test function $h^* : \mathbb{R}^d \to \mathbb{R}$ and consider the Poisson equation

$$G_Y f_{h^*}(x) = \mathbb{E} h^*(Y) - h^*(x), \quad x \in \mathbb{R}^d. \tag{2}$$

We use the star superscript above to emphasize that the functions are defined on all of $\mathbb{R}^d$. The solution $f_{h^*}(x)$ is unique up to a constant and satisfies $\mathbb{E} G_X f_{h^*}(X) = 0$ under some mild conditions. Hence, we can take expected values with respect to $X$ in (2) to conclude that

$$\mathbb{E} h^*(Y) - \mathbb{E} h^*(X) = \mathbb{E}\big(G_Y f_{h^*}(X) - G_X f_{h^*}(X)\big). \tag{3}$$

In practice, the chosen $h^*(x)$ frequently belongs to

$$\text{Lip}(1) = \{h^* : \mathbb{R}^d \to \mathbb{R} : |h^*(x) - h^*(y)| \leq |x - y| \text{ for all } x, y \in \mathbb{R}^d\},$$

and

$$d_{\text{Lip}(1)}(X, Y) = \sup_{h^* \in \text{Lip}(1)} \big|\mathbb{E} h^*(X) - \mathbb{E} h^*(Y)\big|$$

is known as the Wasserstein distance. The class Lip(1) is chosen both because it is simple to work with and because it is convergence determining; i.e., convergence in the Wasserstein distance implies convergence in distribution (see, for instance, Gibbs and Su (2002)).

Bounding the error on the right-hand side of (3) using Stein's method requires bounds on the derivatives of $f_{h^*}(x)$, and depending on the transition structure of the CTMC, it may also depend on moments of $X$. We refer to the former as "derivative bounds." Usually, the approximation $Y$ is such that (3) converges to zero at a rate of $\delta$, and to prove this it suffices to bound the second and third derivatives of $f_{h^*}(x)$. However, when one seeks approximations $Y$ with convergence rates faster than $\delta$, as was done in Braverman et al. (2020a), for example, one needs to bound fourth- and higher-order derivatives.

When $d = 1$, the explicit form of $f_{h^*}(x)$ is known and can be leveraged to get the derivative bounds via a brute-force approach. When $d > 1$, the Poisson equation is a second-order PDE, and the same kind of brute-force analysis cannot be carried out. Instead, one has to rely on the fact that

$$f_{h^*}(x) = \int_0^\infty \big(\mathbb{E}_{Y(0)=x} h^*(Y(t)) - \mathbb{E} h^*(Y)\big)\, dt, \quad x \in \mathbb{R}^d, \tag{4}$$

solves the Poisson equation, provided the quantity above is finite. See any one of Barbour (1990), Götze (1991), Gurvich (2014a), Mackey and Gorham (2016) for a proof of (4). One can then leverage (4) together with finite difference approximations to get the derivative bounds needed. For instance,

$$\frac{\partial}{\partial x_i} f_{h^*}(x) \approx \frac{f_{h^*}(x + \varepsilon e^{(i)}) - f_{h^*}(x)}{\varepsilon} = \frac{1}{\varepsilon} \int_0^\infty \big(\mathbb{E}_{Y(0)=x+\varepsilon e^{(i)}} h^*(Y(t)) - \mathbb{E}_{Y(0)=x} h^*(Y(t))\big)\, dt. \tag{5}$$

There are a few ways to bound (5). Sometimes, the transient distribution of $Y(t)$ is known as a function of $Y(0)$, as in Barbour (1990), Götze (1991), Gan et al. (2017), Gan and Ross (2019), and Chen et al. (2019), but this is true only for a handful of special cases.

A more general approach is to use synchronous couplings of the diffusion: that is, initialize one diffusion process at $x$ and another process sharing the same Brownian motion, but starting at $x + \varepsilon e^{(i)}$. The bound then depends on the coupling time of the two diffusions. This idea was exploited heavily in Mackey and Gorham (2016) to study derivative bounds for overdamped Langevin diffusions with strongly concave drifts and later in Gorham et al. (2019), where the authors used a combination of synchronous couplings and reflection couplings studied in Eberle (2016) and Wang (2016) to establish derivative bounds for a class of fast-coupling diffusions. However, the results of both papers require nontrivial assumptions on the diffusion. For example, both papers required the drift to be $k$-strongly concave and everywhere differentiable, and their results fail to hold for a Lipschitz drift with only a single point of non-differentiability, such as the piecewise Ornstein-Uhlenbeck process used for approximating the many-server queue in Braverman et al. (2016). Apart from the differentiability assumption on the drift, the results of Mackey and Gorham (2016) and Gorham et al. (2019) hold only for diffusions on the entire space $\mathbb{R}^d$.
This excludes diffusions with reflecting boundary conditions, such as reflecting Brownian motions that appear as heavy-traffic limits for networks of single-server queueing systems.

A second approach to getting derivative bounds was proposed in Gurvich (2014a), where the author used a priori Schauder estimates from PDE theory to bound the derivatives of $f_{h^*}(x)$ in terms of $f_{h^*}(x)$ and $h^*(x)$. He then bounded $f_{h^*}(x)$ by a Lyapunov function satisfying an exponential ergodicity condition for the diffusion. This approach requires finding a Lyapunov function satisfying an exponential ergodicity condition, which typically requires significant effort, e.g., Dieker and Gao (2013), Gurvich (2014a). Furthermore, in the case of a diffusion with a reflecting boundary, the complexity of the PDE machinery used makes it nontrivial to trace how the a priori Schauder estimates depend on the primitives of the diffusion process.

Most recently, another approach to getting derivative bounds based on Bismut's formula from Malliavin calculus was proposed in Fang et al. (2018). The authors required the diffusion coefficient to be constant, and the assumptions imposed on the drift were similar to those in Mackey and Gorham (2016). While each of the four approaches discussed above has merits, each also has drawbacks, and none is universally applicable. For this reason, the problem of getting derivative bounds is often the bottleneck of the generator comparison approach. In this paper, we present a new approach to bounding the left-hand side of (3). Let us informally illustrate its main steps.

Fix a test function $h : \delta\mathbb{Z}^d \to \mathbb{R}$, defined only on the lattice $\delta\mathbb{Z}^d$ as opposed to $\mathbb{R}^d$ as before. Now, instead of (2), we consider the Poisson equation of the prelimit,

$$G_X f_h(\delta k) = \mathbb{E} h(X) - h(\delta k), \quad k \in \mathbb{Z}^d. \tag{6}$$

The solution to (6) is unique up to a constant. Furthermore, when adapted to continuous time, Proposition 7.1 of Asmussen (2003) states that a solution to (6) exists provided $\mathbb{E}|h(X)| < \infty$. We are tempted to proceed analogously to (3) by taking expected values with respect to $Y$, but we cannot do so because $G_X f_h(\delta k)$ is not defined on $\mathbb{R}^d \setminus \delta\mathbb{Z}^d$. We get around this by interpolating the discrete Poisson equation.
Namely, we introduce a spline $A$, which interpolates functions $f : \delta\mathbb{Z}^d \to \mathbb{R}$ and results in extended functions $Af : \mathbb{R}^d \to \mathbb{R}$. By applying $A$ to both sides of (6), we obtain the interpolated Poisson equation

$$A G_X f_h(x) = \mathbb{E} h(X) - Ah(x), \quad x \in \mathbb{R}^d.$$

We may now take expected values with respect to $Y$ to arrive at

$$\mathbb{E} h(X) - \mathbb{E} Ah(Y) = \mathbb{E} A G_X f_h(Y) = \mathbb{E} A G_X f_h(Y) - \mathbb{E} G_Y A f_h(Y). \tag{7}$$

We will see later that $\mathbb{E} G_Y A f_h(Y) = 0$ follows from Itô's lemma provided that $Af_h(x)$ satisfies some mild conditions. To ensure that the convergence of (7) to zero implies the convergence of $X$ to $Y$, we again need to ensure that $h(\delta k)$ belongs to a rich-enough class of functions. We describe some convergence-determining classes of grid-restricted test functions in Section 2.2. Lastly, to make the right-hand side of (7) comparable to (3), we want to interchange $A$ and $G_X$. We will see that this interchange is possible but results in some error; i.e., $A G_X f_h(x) = G_X A f_h(x) + \text{error}$.

After this interchange, the right-hand side of (7) becomes analogous to (3) in the sense that the derivatives of $f_{h^*}(x)$ that appear in (3) are replaced by corresponding derivatives of $Af_h(x)$. Our choice of $A$ will be such that the derivatives of $Af_h(x)$ correspond to finite differences of $f_h(\delta k)$, meaning that the problem of establishing derivative bounds is replaced by an analogous problem of bounding finite differences. We can bound the finite differences of $f_h(\delta k)$ by relying on the fact that

$$f_h(\delta k) = \int_0^\infty \big(\mathbb{E}_{X(0)=\delta k} h(X(t)) - \mathbb{E} h(X)\big)\, dt, \quad k \in \mathbb{Z}^d \tag{8}$$

is one solution to the Poisson equation and constructing synchronous couplings of the CTMC in a manner similar to the diffusion synchronous couplings used in Mackey and Gorham (2016) and Gorham et al. (2019). We discuss in Section 3 several ways to verify that (8) is well-defined.

For ease of reference, we refer to our approach as the prelimit generator comparison approach, or simply the prelimit approach, and to the traditional approach based on (2) as the diffusion approach. The prelimit and diffusion approaches are in some sense parallel approaches with many conceptual similarities. If we choose $h^*(x)$ in (3) to equal $Ah(x)$ from (7), we see that the right-hand sides of (3) and (7) are equal.
This means that any bound established via the diffusion approach should, in theory, be attainable via the prelimit approach, and vice versa. In practice, technical differences can make the prelimit approach more attractive for some models.

First, when working with models that have state-space collapse, i.e., when the dimension of the CTMC is higher than that of the diffusion, the prelimit approach does not require one to bound the so-called $\mathbb{E}|X_\perp|$, which is the distance between the stationary distribution of the CTMC and its projection onto the state-space collapse manifold. This is illustrated in more detail in Braverman (2021), a companion paper in which the prelimit approach is applied to the join-the-shortest-queue model. Second, the diffusion approach can suffer from what we call "misalignment of synchronous couplings," which can complicate the process of getting derivative bounds via synchronous couplings. We illustrate this issue in Section 4 using a simple example.

Apart from showing how fast $X$ converges to $Y$, the prelimit Poisson equation (6) can also be used to prove tightness of the family of steady-state CTMC distributions in various asymptotic regimes, such as heavy traffic in queueing. Tightness has become an important property since the seminal work of Gamarnik and Zeevi (2006), which initiated a wave of research into justifying steady-state diffusion approximations of queueing systems; see, for instance, Dai et al. (2014), Budhiraja and Lee (2009), Zhang and Zwart (2008), Katsuda (2010), Ye and Yao (2012), Tezcan (2008), Gamarnik and Stolyar (2012), and Gurvich (2014b). Roughly speaking, process-level convergence of the CTMC to a diffusion combined with tightness of the CTMC stationary distributions enables one to perform a limit-interchange argument to conclude convergence of steady-state distributions. The bottleneck is usually proving tightness, which has become synonymous with steady-state convergence.

We can use (6) to prove tightness as follows. If $x_\infty$ is the fluid equilibrium of the CTMC, we may choose $h(\delta k) = |\delta k - x_\infty|$ in the Poisson equation and evaluate (6) at $x_\infty$, where $h$ vanishes, to see that

$$\mathbb{E}|X - x_\infty| = G_X f_h(x_\infty).$$

The right-hand side corresponds to the transition rates in the CTMC and typically contains differences of $f_h(\delta k)$ up to the second order. We give an example in Section 3.2. Proving tightness is therefore equivalent to getting first- and second-order difference bounds at the single point $x_\infty$.
In contrast, bounding the approximation error of $Y$ requires third-order difference bounds on the entire support of $Y$. This perspective highlights the extra work required to go from merely proving the fact of convergence to having convergence rates.

The idea of interpolating the discrete Poisson equation can be applied more broadly to the problem of comparing discrete and continuous distributions using Stein's method. To the authors' knowledge, anytime Stein's method has been invoked for a discrete-versus-continuous random variable comparison, the starting point has always been the differential equation for the continuous random variable. Furthermore, in most applications of the method, the starting point has been the differential/difference equation for the limiting distribution, whereas we start with the prelimit.

To summarize, we make two main technical contributions. The first is that we establish the existence of an interpolator $A$ that satisfies certain convenient properties. Theorem 1 contains the one-dimensional result, which is generalized to multiple dimensions in Theorem 3. The second contribution is that we describe the interchange error of $A$ with $G_X$, which is a necessary step to compare $G_X$ to a diffusion generator. Propositions 1 and 2 contain the one-dimensional results, while Propositions 3 and 4 are multidimensional generalizations. After presenting the general framework, we illustrate it using the single-server queueing system as an example. This paper is meant to be a gentle introduction to the prelimit approach as well as a chance to illustrate several techniques for obtaining difference bounds.

It is important to add that using CTMC synchronous couplings dates back to Barbour (1988), which was the first paper to connect Stein's method to Markov chains (although the author of that paper did not use the term "synchronous coupling"). In that work, the author viewed the Poisson distribution as the steady-state distribution of the infinite-server queue. Later, the application of Stein's method to birth-death processes received a thorough treatment in Brown and Xia (2001). A more recent example of using CTMC synchronous couplings can be found in Barbour et al. (2018a,b).

The remainder of the paper is structured as follows. In Section 2 we introduce the technical components of the prelimit approach. We then apply the prelimit approach to the M/M/1 model.

For any set $B \subset \mathbb{R}^d$, we let $\text{Conv}(B)$ denote its convex hull. For any integer $k \geq 0$ and set $B \subset \mathbb{R}^d$, we let $C^k(B)$ be the set of all $k$-times continuously differentiable functions $f : B \to \mathbb{R}$. We use $D = D([0, \infty), \mathbb{R})$ to denote the space of right-continuous functions with left limits mapping $[0, \infty)$ to $\mathbb{R}$. Given a stochastic process $\{Z(t)\} \in D$ and a functional $f : D \to \mathbb{R}$, we write $\mathbb{E}_x(f(Z))$ to denote $\mathbb{E}(f(Z) \mid Z(0) = x)$. We let $e \in \mathbb{R}^d$ be the vector whose elements all equal 1 and let $e^{(i)}$ be the element with 1 in the $i$th entry and zeros otherwise. We use $\mathbb{Z}$ to denote the set of integers and let $\mathbb{N} = \{0, 1, 2, \ldots\}$. For any $\delta > 0$ and integer $d > 0$, we let $\delta\mathbb{Z}^d = \{\delta k : k \in \mathbb{Z}^d\}$ and define $\delta\mathbb{N}^d$ similarly. For any function $f : \delta\mathbb{Z}^d \to \mathbb{R}$, we define the forward difference operator in the $i$th direction as

$$\Delta_i f(\delta k) = f\big(\delta(k + e^{(i)})\big) - f(\delta k), \quad k \in \mathbb{Z}^d,\ 1 \leq i \leq d,$$

and for $j \geq 0$, we define

$$\Delta_i^{j+1} f(\delta k) = \Delta_i^{j} f\big(\delta(k + e^{(i)})\big) - \Delta_i^{j} f(\delta k), \tag{9}$$

with the convention that $\Delta_i^0 f(\delta k) = f(\delta k)$. For a vector $a \in \mathbb{N}^d$, we also let

$$\Delta^a f(\delta k) = \Delta_1^{a_1} \cdots \Delta_d^{a_d} f(\delta k),$$

and if $f : \mathbb{R}^d \to \mathbb{R}$, then

$$\frac{\partial^a}{\partial x^a} f(x) = \frac{\partial^{a_1}}{\partial x_1^{a_1}} \cdots \frac{\partial^{a_d}}{\partial x_d^{a_d}} f(x),$$

and we adopt the convention that $\frac{\partial^0}{\partial x^0} f(x) = f(x)$. For any $x \in \mathbb{R}^d$, we define $\|x\|_1 = \sum_{i=1}^{d} |x_i|$ and write $|x|$ to denote the Euclidean norm. Throughout the paper we will often use $C$ to denote a generic positive constant that may change from line to line and that will generally be independent of any parameters not explicitly specified. For a random variable $X$, we write $\text{supp}(X)$ to denote the support of $X$.
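
As a quick illustration of the recursion (9) in one dimension, the following Python sketch computes $\Delta^j f(\delta k)$ recursively and compares it with the standard binomial expansion of an iterated forward difference. The function names and the cubic test function are illustrative choices, not part of the paper.

```python
from math import comb


def forward_difference(f, k, j, delta):
    """Return Delta^j f(delta*k) using the recursion (9), with Delta^0 f = f."""
    if j == 0:
        return f(delta * k)
    return (forward_difference(f, k + 1, j - 1, delta)
            - forward_difference(f, k, j - 1, delta))


def forward_difference_binomial(f, k, j, delta):
    """Same quantity via the binomial expansion of the iterated difference."""
    return sum((-1) ** (j - m) * comb(j, m) * f(delta * (k + m))
               for m in range(j + 1))


def cube(x):
    """Illustrative test function f(x) = x^3."""
    return x ** 3


delta, k = 0.1, 5
third = forward_difference(cube, k, 3, delta)
# The three printed numbers agree up to floating-point rounding.
print(third, forward_difference_binomial(cube, k, 3, delta), 6 * delta ** 3)
```

For a cubic, the third difference equals $6\delta^3$, that is, $\delta^3$ times the third derivative, and the fourth difference vanishes. This is the scaling one should keep in mind when comparing $a$th-order differences on the grid with $a$th-order derivatives.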

2. The Prelimit Generator Comparison Approach

In this section, we work out the technical details of the prelimit approach. We begin by introducing the interpolation operator $A$ in Section 2.1. We follow this with a discussion of convergence-determining classes in Section 2.2. Then, we write the form of $A G_X f(x)$ in a manner that easily lends itself to analysis. Informally, we refer to this as interchanging $A$ with $G_X$. The interchange is not perfect and results in some error that depends on the domain of the CTMC. We treat unbounded domains in Section 2.3 and bounded domains in Section 2.4. To minimize notational burden, we restrict our discussion in Sections 2.1, 2.3, and 2.4 to one-dimensional CTMCs. In multiple dimensions, the results are analogous from a technical perspective, but may be harder to parse at first read. We therefore postpone the multidimensional discussion to the appendix: multidimensional interpolation is discussed in Appendix A, and multidimensional interchange is left to Appendix B.

The objective of this section is to state Theorem 1. Fix $\delta > 0$, and for $x \in \mathbb{R}$ define $k(x) = \lfloor x/\delta \rfloor$. Let $K \subset \mathbb{R}$ be a possibly unbounded interval and define

$$K_4 = \{x \in K \cap \delta\mathbb{Z} : (x + 4\delta) \in K \cap \delta\mathbb{Z}\}.$$

For example, if $K = (-\infty, \infty)$, then $K_4 = \delta\mathbb{Z}$. Let $f : K \cap \delta\mathbb{Z} \to \mathbb{R}$ be the function we want to extend to the continuum. One may be familiar with the cubic spline, a staple of numerical analysis that can certainly extend the function. However, a cubic spline is insufficient for our purposes because we want the extension to be four-times differentiable almost everywhere. Instead, we use a spline composed of degree-7 polynomials. Define $Af(x) = P_{k(x)}(x)$, where

$$P_k(x) = \sum_{i=0}^{4} \alpha^{k}_{k+i}(x) f(\delta(k+i)), \quad x \in \mathbb{R}.$$

Each $P_k(x)$ is a degree-7 polynomial and is best understood as a weighted sum of $f(\delta k), \ldots, f(\delta(k+4))$ with weights $\alpha^{k}_{k}(x), \ldots, \alpha^{k}_{k+4}(x)$. The precise form of $P_k(x)$ is distracting, so we state it in Appendix A. The following result summarizes the key properties we will need of $Af(x)$ and the weights $\alpha^{k}_{k+i}(x)$.

Theorem 1.

Given $f : K \cap \delta\mathbb{Z} \to \mathbb{R}$, the function

$$Af(x) = \sum_{i=0}^{4} \alpha^{k(x)}_{k(x)+i}(x) f(\delta(k(x)+i)), \quad x \in \text{Conv}(K_4) \tag{10}$$

belongs to $C^3(\text{Conv}(K_4))$ and is infinitely differentiable on $\text{Conv}(K_4) \setminus K_4$. Furthermore,

$$Af(\delta k) = f(\delta k), \quad \delta k \in K_4, \tag{11}$$

and the derivatives of $Af(x)$ are bounded by the corresponding finite differences of $f(\delta k)$. Namely, there exists $C > 0$ independent of $x$, $f(\cdot)$ and $\delta$ such that

$$\Big|\frac{\partial^a}{\partial x^a} Af(x)\Big| \leq C\delta^{-a} \max_{0 \leq i \leq 4-a} \big|\Delta^a f(\delta(k(x)+i))\big|, \quad x \in \text{Conv}(K_4),\ 0 \leq a \leq 3, \tag{12}$$

and (12) also holds for $x \in \text{Conv}(K_4) \setminus K_4$ when $a = 4$. Additionally, the weights $\{\alpha^{k}_{k+i} : \mathbb{R} \to \mathbb{R} : k \in \mathbb{Z},\ i = 0, 1, 2, 3, 4\}$ are degree-7 polynomials in $(x - \delta k)/\delta$ whose coefficients do not depend on $k$ or $\delta$. They satisfy

$$\alpha^{k}_{k}(\delta k) = 1, \quad \text{and} \quad \alpha^{k}_{k+i}(\delta k) = 0, \quad k \in \mathbb{Z},\ i = 1, 2, 3, 4, \tag{13}$$

$$\sum_{i=0}^{4} \alpha^{k}_{k+i}(x) = 1, \quad k \in \mathbb{Z},\ x \in \mathbb{R}, \tag{14}$$

and also the following translational invariance property:

$$\alpha^{k+j}_{k+j+i}(x + \delta j) = \alpha^{k}_{k+i}(x), \quad i, j, k \in \mathbb{Z},\ x \in \mathbb{R}. \tag{15}$$

Theorem 1 is proved in Appendix A and follows directly from the form of $P_k(x)$ stated there. From (12) we see that the reason $P_k(x)$ depends on $f(\delta k), \ldots, f(\delta(k+4))$, as opposed to also depending on $f(\delta(k+5))$, is that we want $\frac{\partial^a}{\partial x^a} Af(x)$ to be related to $\Delta^a f(\delta k(x))$ for $0 \leq a \leq 4$, and we do not care what happens beyond the fourth derivative. In theory, one can make $Af(x)$ as differentiable as is needed by using a higher-degree polynomial $P_k(x)$.

We mentioned in the introduction that when one uses the diffusion approach, Lip(1) is a commonly used convergence-determining class. In this section we discuss two convergence-determining classes of grid-valued functions that can be used with the prelimit approach. Lemma 1 below presents the main result of this section.

Recall our convention of using a star superscript to emphasize that a function is defined on the continuum. Given two random variables $U, V \in \mathbb{R}^d$ and a class of functions $\mathcal{H} = \{h^* : \mathbb{R}^d \to \mathbb{R}\}$, we define

$$d_{\mathcal{H}}(U, V) = \sup_{h^* \in \mathcal{H}} \big|\mathbb{E} h^*(U) - \mathbb{E} h^*(V)\big|.$$

We already said that Lip(1) is a convergence-determining class because $d_{\text{Lip}(1)}(U, V) \to 0$ implies that $U$ converges to $V$ in distribution. There are, of course, other convergence-determining classes. For instance, it was shown in Lemma 2.2 of Mackey and Gorham (2016) that if

$$\mathcal{H} = \mathcal{M} = \Big\{h^* : \mathbb{R}^d \to \mathbb{R} : \Big|\frac{\partial^a}{\partial x^a} h^*(x)\Big| \leq 1,\ 1 \leq \|a\|_1 \leq 3\Big\},$$

then $d_{\mathcal{M}}(U, V) \to 0$ also implies convergence in distribution. Both Lip(1) and $\mathcal{M}$ are classes of functions defined on $\mathbb{R}^d$, but in the prelimit approach the function $h(\delta k)$ in (6) is defined only on the grid. To mimic the two classes, we define

$$\text{dLip}(1) = \{h : \delta\mathbb{Z}^d \to \mathbb{R} : |\Delta_j h(\delta k)| \leq \delta,\ 1 \leq j \leq d,\ k \in \mathbb{Z}^d\},$$

$$\mathcal{M}_{disc}(C) = \{h : \delta\mathbb{Z}^d \to \mathbb{R} : |\Delta^a h(\delta k)| \leq C\delta^{\|a\|_1},\ 1 \leq \|a\|_1 \leq 3,\ k \in \mathbb{Z}^d\}.$$

The following lemma relates $d_{\text{Lip}(1)}(U, V)$ and $d_{\mathcal{M}}(U, V)$ to their grid-restricted counterparts. The lemma involves the multidimensional interpolator, which we have not yet formally introduced. However, that does not preclude an understanding of the lemma, which is proved in Section C.1.
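
To make $d_{\text{Lip}(1)}$ concrete for a grid-valued $U$ and a continuous $V$, the sketch below uses the one-dimensional fact that the Wasserstein distance equals the integral of the absolute difference between the two distribution functions. The pairing chosen here, a geometric random variable scaled by $\delta = 1 - \rho$ (the steady-state M/M/1 customer count) against a unit-rate exponential, is an illustrative assumption rather than a computation taken from the paper, and the function name and truncation level are hypothetical.

```python
import math


def wasserstein_geometric_vs_exponential(rho, n_cells=None):
    """Integrate |F_U - F_V| cell by cell, where U = delta * N with
    P(N = k) = (1 - rho) * rho**k, delta = 1 - rho, and V ~ Exp(1)."""
    delta = 1.0 - rho
    if n_cells is None:
        # Truncate once both tails are negligible (roughly exp(-60)).
        n_cells = int(60.0 / delta) + 1
    total = 0.0
    for k in range(n_cells):
        left, right = delta * k, delta * (k + 1)
        tail_u = rho ** (k + 1)  # P(U > x) for x in [left, right)
        # On this cell exp(-x) >= exp(-right) > rho**(k + 1), so the
        # integrand exp(-x) - tail_u does not change sign.
        cell = (math.exp(-left) - math.exp(-right)) - delta * tail_u
        total += abs(cell)
    return total


for rho in (0.5, 0.9, 0.99):
    print(rho, wasserstein_geometric_vs_exponential(rho))
```

In this particular example the exponential stochastically dominates the scaled geometric, so the distance reduces to the difference of the means, $1 - \rho = \delta$. The gap therefore vanishes at rate $\delta$ as $\rho \to 1$.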

Lemma 1.

Let $U \in \delta\mathbb{Z}^d$ and $V \in \mathbb{R}^d$ be two random vectors. For any $h^* : \mathbb{R}^d \to \mathbb{R}$, let $h : \delta\mathbb{Z}^d \to \mathbb{R}$ be the restriction of $h^*(x)$ to $\delta\mathbb{Z}^d$. Let $Ah : \mathbb{R}^d \to \mathbb{R}$ be defined by (10) when $d = 1$, and by (51) when $d > 1$. Then there exists a constant $C > 0$ such that

$$|\mathbb{E} h^*(U) - \mathbb{E} h^*(V)| \leq |\mathbb{E} h(U) - \mathbb{E} Ah(V)| + C\delta \sup_{\substack{1 \leq j \leq d \\ x \in \mathbb{R}^d}} \Big|\frac{\partial}{\partial x_j} h^*(x)\Big|.$$

As a consequence, there exists a constant $C' > 0$ such that

$$d_{\text{Lip}(1)}(U, V) \leq \sup_{h \in \text{dLip}(1)} |\mathbb{E} h(U) - \mathbb{E} Ah(V)| + C\delta,$$

$$d_{\mathcal{M}}(U, V) \leq \sup_{h \in \mathcal{M}_{disc}(C')} |\mathbb{E} h(U) - \mathbb{E} Ah(V)| + C\delta.$$

Assume $G_X f(\delta k)$ is defined for all $k \in \mathbb{Z}$; i.e., the CTMC lives on $\delta\mathbb{Z}$. The interchange result for $A$ and $G_X$ is given in Proposition 1 below. We then apply this result to characterize the approximation error between $X$ and its diffusion approximation $Y$ in (22).

Recall that $q_{\delta k, \delta k'}$ are the transition rates of our CTMC and define $\beta_\ell(\delta k) = q_{\delta k, \delta(k+\ell)}$ for $k, \ell \in \mathbb{Z}$. Then

$$G_X f(\delta k) = \sum_{k' \in \mathbb{Z}} q_{\delta k, \delta k'} \big(f(\delta k') - f(\delta k)\big) = \sum_{\ell \in \mathbb{Z}} \beta_\ell(\delta k) \big(f(\delta(k+\ell)) - f(\delta k)\big), \quad k \in \mathbb{Z}.$$

Fix $h : \delta\mathbb{Z} \to \mathbb{R}$ such that

$$G_X f_h(\delta k) = \mathbb{E} h(X) - h(\delta k), \quad k \in \mathbb{Z}$$

has a solution $f_h(\delta k)$. Since $A$ is a linear operator, we have

$$A G_X f_h(x) = A(\mathbb{E} h(X) - h)(x) = \mathbb{E} Ah(X) - Ah(x).$$

The following result says $A G_X f(x) = G_X A f(x) + \text{error}(x)$ and characterizes the error term. We prove it in Section B.1 by proving the multidimensional version, Proposition 3, there.

Proposition 1.

Fix $f : \delta\mathbb{Z} \to \mathbb{R}$ and assume that $G_X f(\delta k)$ is defined on all of $\delta\mathbb{Z}$. Assume also that

$$\sum_{\ell \in \mathbb{Z}} \big|\beta_\ell(\delta k)\big(f(\delta(k+\ell)) - f(\delta k)\big)\big| < \infty, \quad k \in \mathbb{Z}, \tag{16}$$

which is trivially satisfied when the number of possible transitions from each state is finite. Then for any $x \in \mathbb{R}$,

$$A G_X f(x) = \sum_{\ell \in \mathbb{Z}} A\beta_\ell(x) \big(Af(x + \delta\ell) - Af(x)\big) + \varepsilon(x). \tag{17}$$

The error satisfies

$$\varepsilon(x) = \sum_{\ell \in \mathbb{Z}} \sum_{i=0}^{4} \alpha^{k(x)}_{k(x)+i}(x) \Big(\beta_\ell\big(\delta(k(x)+i)\big) - A\beta_\ell(x)\Big) \times \Big(1(\ell > 0) \sum_{j=0}^{i-1} \sum_{m=0}^{\ell-1} \Delta^2 f\big(\delta(k(x)+m+j)\big) - 1(\ell < 0) \sum_{j=0}^{i-1} \sum_{m=\ell}^{-1} \Delta^2 f\big(\delta(k(x)+m+j)\big)\Big). \tag{18}$$

Let us assume that our CTMC is such that $f_h(\delta k)$ satisfies (16). With the help of Proposition 1, we may derive the diffusion approximation $Y$ and characterize the approximation error. First, we apply Taylor expansion to $\big(Af_h(x + \delta\ell) - Af_h(x)\big)$ to get

$$A G_X f_h(x) = (Af_h)'(x)\, \delta \sum_{\ell \in \mathbb{Z}} \ell A\beta_\ell(x) + \frac{1}{2} (Af_h)''(x)\, \delta^2 \sum_{\ell \in \mathbb{Z}} \ell^2 A\beta_\ell(x) + \frac{1}{6} \delta^3 \sum_{\ell \in \mathbb{Z}} \ell^3 A\beta_\ell(x) (Af_h)'''(\xi_\ell(x)) + \varepsilon(x),$$

where $\xi_\ell(x)$ is some number between $x$ and $x + \delta\ell$. To approximate $X$, we set

$$b(x) = \delta \sum_{\ell \in \mathbb{Z}} \ell A\beta_\ell(x), \quad \text{and} \quad a(x) = \delta^2 \sum_{\ell \in \mathbb{Z}} \ell^2 A\beta_\ell(x), \quad x \in \mathbb{R},$$

and consider the random variable $Y$ whose density is given by

$$p(x) = \frac{\kappa}{a(x)} \exp\Big(\int_0^x \frac{2b(y)}{a(y)}\, dy\Big), \quad x \in \mathbb{R},$$

where $\kappa$ is a normalizing constant that we assume to be finite. One may verify using integration by parts that

$$\mathbb{E}\, b(Y) f(Y) + \frac{1}{2} \mathbb{E}\, a(Y) f'(Y) = 0 \tag{19}$$

for any $f(x)$ for which the expectations above exist and for which $\lim_{x \to \pm\infty} f(x) \exp\big(\int_0^x \frac{2b(y)}{a(y)}\, dy\big) = 0$. Another way to view $Y$ is as the stationary distribution of the diffusion process

$$Y(t) = Y(0) + \int_0^t b(Y(s))\, ds + \int_0^t \sqrt{a(Y(s))}\, dW(s), \tag{20}$$

where $\{W(t)\}$ is standard Brownian motion. The process above has generator

$$G_Y f(x) = b(x) f'(x) + \frac{1}{2} a(x) f''(x),$$

and Itô's lemma tells us that for any $f \in C^2(\mathbb{R})$,

$$\mathbb{E}_x f(Y(t)) - \mathbb{E}_x f(Y(0)) = \mathbb{E}_x \int_0^t G_Y f(Y(s))\, ds, \quad t > 0.$$

Provided that $\mathbb{E} f(Y)$ is finite, we can initialize $Y(0) \stackrel{d}{=} Y$ to get

$$0 = \mathbb{E}\Big[\int_0^t G_Y f(Y(s))\, ds \,\Big|\, Y(0) \stackrel{d}{=} Y\Big].$$

If we further assume that $\mathbb{E}|G_Y f(Y)| < \infty$, then we can apply the Fubini-Tonelli theorem to interchange the integral and expectation above and conclude that

$$0 = \mathbb{E}\, G_Y f(Y), \tag{21}$$

which is just a restatement of (19), with $f'$ playing the role of $f$ there. Now, provided that (21) holds with $Af_h(x)$ in place of $f(x)$ there, we get

$$\mathbb{E} Ah(X) - \mathbb{E} Ah(Y) = \mathbb{E} A G_X f_h(Y) - \mathbb{E} G_Y A f_h(Y) = \frac{1}{6} \delta^3\, \mathbb{E} \sum_{\ell \in \mathbb{Z}} \ell^3 A\beta_\ell(Y) (Af_h)'''(\xi_\ell(Y)) + \mathbb{E}\, \varepsilon(Y). \tag{22}$$

The bounds on $(Af_h)'''(x)$ from Theorem 1 imply

$$\frac{1}{6} \delta^3 \Big|\mathbb{E} \sum_{\ell \in \mathbb{Z}} \ell^3 A\beta_\ell(Y) (Af_h)'''(\xi_\ell(Y))\Big| \leq C \Big|\mathbb{E} \sum_{\ell \in \mathbb{Z}} \ell^3 A\beta_\ell(Y) \max_{0 \leq i \leq 1} \big|\Delta^3 f_h(\delta(k(\xi_\ell(Y)) + i))\big|\Big|.$$

In other words, the term above depends on $\ell^3 A\beta_\ell(Y)$ and third-order differences of $f_h(\delta k)$. The second term in (22) is $\mathbb{E}\, \varepsilon(Y)$. We recall $\varepsilon(x)$ below for convenience:

$$\varepsilon(x) = \sum_{\ell \in \mathbb{Z}} \sum_{i=0}^{4} \alpha^{k(x)}_{k(x)+i}(x) \Big(\beta_\ell\big(\delta(k(x)+i)\big) - A\beta_\ell(x)\Big) \times \Big(1(\ell > 0) \sum_{j=0}^{i-1} \sum_{m=0}^{\ell-1} \Delta^2 f\big(\delta(k(x)+m+j)\big) - 1(\ell < 0) \sum_{j=0}^{i-1} \sum_{m=\ell}^{-1} \Delta^2 f\big(\delta(k(x)+m+j)\big)\Big).$$

First, the fact that $\alpha^{k}_{k+i}(x)$ is a polynomial in $(x - \delta k)/\delta$ implies that $\sup_{x \in \mathbb{R}} \big|\alpha^{k(x)}_{k(x)+i}(x)\big|$ is bounded by a constant independent of $\delta$, $x$, or any other parameters. Second, the fact that $0 \leq i \leq 4$ implies

$$\big|\beta_\ell\big(\delta(k(x)+i)\big) - A\beta_\ell(x)\big| = \big|A\beta_\ell\big(\delta(k(x)+i)\big) - A\beta_\ell(x)\big| \leq 4\delta\, \big|(A\beta_\ell)'(\xi'_\ell(x))\big|.$$
Therefore, provided that the transition rates $\beta_\ell(\cdot)$ of the CTMC do not vary too much, e.g., they are Lipschitz, the term above can be controlled, so bounding (22) comes down to bounding $\Delta^2 f_h(\delta k)$ and $\Delta^3 f_h(\delta k)$.
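The stationarity identity (21) can be sanity-checked numerically. The sketch below tabulates a density proportional to $a(x)^{-1} \exp\big(\int 2b(y)/a(y)\,dy\big)$ on a grid and integrates $G_Y f$ against it. The drift $b(x) = -x$, the diffusion coefficient $a(x) = 2$, the truncation to $[-10, 10]$, the test functions and all function names are illustrative assumptions and are not taken from the paper.

```python
import math


def trapezoid(values, step):
    """Composite trapezoid rule for samples on a uniform grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def stationary_density(b, a, lo, hi, n):
    """Tabulate p(x) = kappa / a(x) * exp(int 2 b / a) on n + 1 grid points."""
    step = (hi - lo) / n
    xs = [lo + i * step for i in range(n + 1)]
    rate = [2.0 * b(x) / a(x) for x in xs]
    # Cumulative trapezoid of 2b/a. Starting the integral at lo instead of 0
    # only rescales p, and that factor is absorbed by kappa.
    exponent = [0.0]
    for i in range(1, n + 1):
        exponent.append(exponent[-1] + 0.5 * step * (rate[i - 1] + rate[i]))
    top = max(exponent)  # subtract the maximum to avoid overflow
    weights = [math.exp(e - top) / a(x) for e, x in zip(exponent, xs)]
    kappa = 1.0 / trapezoid(weights, step)
    return xs, [kappa * w for w in weights], step


def expected_generator(b, a, df, d2f, lo=-10.0, hi=10.0, n=4000):
    """Approximate E[b(Y) f'(Y) + a(Y) f''(Y) / 2] under the density p."""
    xs, p, step = stationary_density(b, a, lo, hi, n)
    integrand = [(b(x) * df(x) + 0.5 * a(x) * d2f(x)) * q
                 for x, q in zip(xs, p)]
    return trapezoid(integrand, step)


# Illustrative choice: b(x) = -x and a(x) = 2, so Y is standard normal.
drift = lambda x: -x
diffusion = lambda x: 2.0

# f(x) = x^3 + x^2, so f'(x) = 3x^2 + 2x and f''(x) = 6x + 2.
residual_poly = expected_generator(drift, diffusion,
                                   lambda x: 3 * x * x + 2 * x,
                                   lambda x: 6 * x + 2)
# f(x) = exp(x / 2), so f'(x) = exp(x / 2) / 2 and f''(x) = exp(x / 2) / 4.
residual_exp = expected_generator(drift, diffusion,
                                  lambda x: 0.5 * math.exp(0.5 * x),
                                  lambda x: 0.25 * math.exp(0.5 * x))
print(residual_poly, residual_exp)
```

Both residuals should vanish up to quadrature error, which is what (21) asserts for test functions with enough integrability.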

In this section we present Proposition 2, which is an interchange result when $G_X f(\delta k)$ is defined only for $k \in \mathbb{N}$, i.e., a domain with a left boundary. Domains with both left and right boundaries can be handled similarly.

To see why Proposition 1 is inadequate in the case of a bounded domain, consider the birth-death process defined by its generator

$$G_X f(\delta k) = \lambda \Delta f(\delta k) - \mu 1(k > 0) \Delta f(\delta(k-1)), \quad k \in \mathbb{N}. \tag{23}$$

This generator corresponds to the customer count, scaled by $\delta$, in a single-server queue where customers arrive according to a Poisson process with rate $\lambda$ and service times are exponentially distributed with rate $\mu$. Such a system is also known as the M/M/1 system, and $\rho = \lambda/\mu$ is the system load. In steady state, the customer count is geometrically distributed provided that $\rho < 1$. It is also well known that as $\rho \to 1$, the customer count can be approximated by an exponential random variable. This section is focused on obtaining an analog of (22) for the M/M/1 system.

Fix $f : \delta\mathbb{N} \to \mathbb{R}$ and consider $A G_X f(x)$, which is defined for $x \in [0, \infty)$. By repeating the proof of Proposition 1, one may check that

$$A G_X f(x) = \lambda \big( Af(x + \delta) - Af(x) \big) + \mu \big( Af(x - \delta) - Af(x) \big) + \varepsilon(x)$$

for $x \geq \delta$. However, contrary to Proposition 1, the above equality fails for $x \in [0, \delta)$ due to the fact that $Af(x) = \sum_{i=0}^{4} \alpha^{k(x)}_{k(x)+i}(x) f(\delta(k(x) + i))$ is not defined for $x < 0$ because $f(\delta k)$ is undefined for $k < 0$. Our restriction of $f(\delta k)$ to $k \geq 0$ is natural because for $f(\delta k)$ we intend to use the solution to the Poisson equation for the M/M/1 system, which lives on $\delta\mathbb{N}$.

We now describe an alternative to Proposition 1 for a general class of CTMCs defined on $\delta\mathbb{N}$. Consider the CTMC with generator

$$G_X f(\delta k) = \sum_{\ell \in \mathbb{Z}} \beta_\ell(\delta k) \big( f(\delta(k + \ell)) - f(\delta k) \big), \quad k \in \mathbb{N}$$

and let $X$ be distributed according to its stationary distribution. Fix $h : \delta\mathbb{N} \to \mathbb{R}$ with $\mathbb{E}|h(X)| < \infty$. As discussed in the introduction, the equation

$$G_X f_h(\delta k) = \mathbb{E} h(X) - h(\delta k), \quad k \in \mathbb{N}$$

has a finite solution $f_h(\delta k)$. We first extrapolate $f_h : \delta\mathbb{N} \to \mathbb{R}$ to all of $\delta\mathbb{Z}$ by letting

$$\widehat{f}_h(\delta k) = \sum_{i=0}^{4} \alpha^{k \vee 0}_{(k \vee 0) + i}(\delta k)\, f_h\big(\delta((k \vee 0) + i)\big), \quad k \in \mathbb{Z}, \tag{24}$$

where the weights $\alpha^{k}_{k+i}(x)$ are as in Section 2.1. Next, we assume there exists some $L \in \mathbb{N}$ such that $\beta_\ell(\delta k) = 0$ for all $k \in \mathbb{N}$ if $\ell < -L$. In other words, we assume that the largest jump to the left is, at most, of size $L$. Let us define

$$\widetilde{A}\beta_\ell(x) = \sum_{i=0}^{4} \alpha^{k(x) \vee L}_{k(x) \vee L + i}(x)\, \beta_\ell\big(\delta(k(x) \vee L + i)\big), \quad x \in \mathbb{R},\ \ell \in \mathbb{Z}.$$

Note that $\widetilde{A}\beta_\ell(x) = A\beta_\ell(x)$ for $x \geq \delta L$, and for $x < \delta L$ it is again the extrapolation of the transition rates $\beta_\ell(\delta k)$ based on their values when $k \geq L$. In our M/M/1 example, $L = 1$, and the fact that $\sum_{i=0}^{4} \alpha^{k}_{k+i}(x) = 1$ implies $\widetilde{A}\beta_1(x) = \lambda$ and $\widetilde{A}\beta_{-1}(x) = \mu$ for all $x \in \mathbb{R}$ because $\beta_1(\delta k)$, $\beta_{-1}(\delta k)$ are constant for all $k \geq 1$.

Proposition 2.

Assume that X ℓ ∈ Z | β ℓ ( δk )( f h ( δ ( k + ℓ )) − f h ( δk )) | < ∞ , k ∈ N , (25) which is trivially satisﬁed when the number of possible transitions from each state is ﬁnite.Then AG X f h ( x ) = X ℓ ∈ Z f Aβ ℓ ( x )( A b f h ( x + δℓ ) − A b f h ( x )) + e ε ( x ) + ε h ( x ) + ε f ( x ) , x ≥ where e ε ( x ) = X ℓ ∈ Z X i =0 α k ( x ) ∨ Lk ( x ) ∨ L + i ( x ) (cid:16) β ℓ (cid:0) δ ( k ( x ) ∨ L + i ) (cid:1) − f Aβ ℓ ( x ) (cid:17) × (cid:16) ℓ > i − X j =0 ℓ − X m =0 ∆ f h (cid:0) δ ( k ( x ) ∨ L + m + i ) (cid:1) raverman: Prelimit generator comparison approach Article submitted to

Stochastic Systems ; manuscript no. − ℓ < i − X j =0 − X m = ℓ ∆ f h (cid:0) δ ( k ( x ) ∨ L + m + i ) (cid:1)(cid:17) , | ε h ( x ) | ≤ (cid:0) x ∈ [0 , δL ) (cid:1) C ( L ) max ≤ m ≤ L (cid:12)(cid:12) ∆ h ( δm ) (cid:12)(cid:12) , | ε f ( x ) | ≤ (cid:0) x ∈ [0 , δL ) (cid:1) C ( L ) X ℓ ∈ Z (cid:12)(cid:12) f Aβ ℓ ( x ) (cid:12)(cid:12)(cid:16) max ≤ m ≤ L (cid:12)(cid:12) ∆ f h ( δm ) (cid:12)(cid:12) + max ≤ m ≤ L + ℓ +8 (cid:12)(cid:12) ∆ f h ( δm ) (cid:12)(cid:12) (cid:17) . Proposition 2 is proved in Section B.2, where we state and prove the multidimensionalversion. The error term e ε ( x ) resembles ε ( x ) from Proposition 1, while ε h ( x ) and ε f ( x )are new. The bounds on | ε f ( x ) | and | e ε ( x ) | are similar in that both depend on the ﬁnitediﬀerences of f h ( δk ). The bound on | ε h ( x ) | depends on the ﬁnite diﬀerences of h ( δk ) andcan be made small by assuming h ∈ M disc ( C ′ ) for some C ′ >

0, which we know by Lemma 1to be a convergence-determining class of functions.To conclude this section, we derive a diﬀusion approximation for the

M/M/ G X be the M/M/ f h ( δk ) solve the corresponding Poisson equation. Then Proposition 2 statesthat for x ≥ AG X f h ( x ) = λ ( A b f h ( x + δ ) − A b f h ( x )) + µ ( A b f h ( x − δ ) − A b f h ( x ))+ e ε ( x ) + ε h ( x ) + ε f ( x )= δ ( λ − µ )( Af h ) ′ ( x ) + 12 δ ( λ + µ )( Af h ) ′′ ( x )+ 16 δ (cid:0) λ ( Af h ) ′′′ ( ξ ( x )) + µ ( A b f h ) ′′′ ( ξ − ( x )) (cid:1) + e ε ( x ) + ε h ( x ) + ε f ( x ) (26)In the second equality, we used the fact that A b f h ( x ) = Af h ( x ) for x ≥

0. The only termthat we are not equipped to bound yet is ( A b f h ) ′′′ ( x ), which by Theorem 1 we know isbounded by third-order diﬀerences of b f h ( δk ). To bound it, we need the following auxiliaryresult. Lemma 2.

Let b f h ( δk ) be as in (24) . There exists a constant C > independent of anyparameters such that for a = 0 , , , , , (cid:12)(cid:12) ∆ a b f h ( δk ) (cid:12)(cid:12) ≤ C (cid:0) | k ∧ | (cid:1) max ≤ i ≤ (cid:12)(cid:12) ∆ a f h (cid:0) δ (( k ∨

0) + i )) (cid:1)(cid:12)(cid:12) , k ∈ Z . raverman: Prelimit generator comparison approach


Stochastic Systems ; manuscript no. The multidimensional version of the lemma is restated in Section B.2 and proved in Sec-tion B.2.2. To derive the diﬀusion approximation, we use the ﬁrst line on the right-handside of (26). Since the

M/M/ Y ( t ) = Y (0) + δ ( λ − µ ) t + δ p λ + µW ( t ) + R ( t ) , (27)where R ( t ) is the unique, continuous, and non-decreasing process such that Y ( t ) ≥ R (0) = 0 and R ( t ) increases only at those times t when Y ( t ) = 0. Theorem 2 inHarrison and Reiman (1981) provides a version of Itˆo’s lemma for RBMs. Namely, for any f ∈ C (2) ( R + ), E x f ( Y ( t )) − E x f ( Y (0)) = E x h Z t (cid:0) δ ( λ − µ ) f ′ ( Y ( s )) + 12 δ ( λ + µ ) f ′′ ( Y ( s )) (cid:1) ds + f ′ (0) R ( t ) i . Let Y be a random variable having the stationary distribution of this RBM. It is well knownthat Y is exponentially distributed with mean δ ( λ + µ )2( λ − µ ) . Picking f ( x ) = x and initializing Y (0) d = Y above yields E (cid:0) R (1) | Y (0) d = Y (cid:1) = δ ( µ − λ ) , so for any function f ( x ) such that E f ( Y ) < ∞ and E (cid:12)(cid:12) δ ( λ − µ ) f ′ ( Y ) + 12 δ ( λ + µ ) f ′′ ( Y ) (cid:12)(cid:12) < ∞ , we can invoke the Fubini-Tonelli theorem to conclude that E (cid:0) δ ( λ − µ ) f ′ ( Y ) + 12 δ ( λ + µ ) f ′′ ( Y ) (cid:1) + f ′ (0) δ ( µ − λ ) = 0 . (28)Assume we know (28) is satisﬁed when f ( x ) = Af h ( x ), a fact that will be veriﬁed byLemma 4 of the following section. Taking expected values with respect to Y in (26), andusing the fact that AG X f h ( x ) = E h ( X ) − Ah ( x ), we arrive at E h ( X ) − E Ah ( Y )= E AG X f h ( Y ) − (cid:16) E (cid:0) δ ( λ − µ )( Af h ) ′ ( Y ) + 12 δ ( λ + µ )( Af h ) ′′ ( Y ) (cid:1) + ( Af h ) ′ (0) δ ( µ − λ ) (cid:17) = 16 δ E (cid:0) λ ( Af h ) ′′′ ( ξ ( Y )) + µ ( A b f h ) ′′′ ( ξ − ( Y )) (cid:1) + E (cid:0)e ε ( Y ) + ε h ( Y ) + ε f ( Y ) (cid:1) − ( Af h ) ′ (0) δ ( µ − λ ) . (29) raverman: Prelimit generator comparison approach Article submitted to


We see that just as in the case of a CTMC with an unbounded domain, bounding theright-hand side requires bounds on the ﬁnite diﬀerences of f h ( δk ).
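For the M/M/1 example, the relationship (28) and the gap between $\mathbb{E} X$ and $\mathbb{E} Y$ can be checked numerically. The sketch below assumes that $Y$ is exponential with mean $\delta(\lambda+\mu)/(2(\mu-\lambda))$, which is positive because $\lambda < \mu$, and that $X/\delta$ is geometric with mean $\rho/(1-\rho)$. The parameter values, the test function $f(x) = \sin x$, the truncation of the integral and the function names are illustrative assumptions.

```python
import math


def simpson(func, lo, hi, n):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3.0


def rbm_identity_residual(lam, mu, delta, f1, f2, n=20000):
    """Left-hand side of (28), given the derivatives f1 = f' and f2 = f''."""
    mean_y = delta * (lam + mu) / (2.0 * (mu - lam))
    rate = 1.0 / mean_y
    density = lambda y: rate * math.exp(-rate * y)
    integrand = lambda y: (delta * (lam - mu) * f1(y)
                           + 0.5 * delta ** 2 * (lam + mu) * f2(y)) * density(y)
    # Truncate at 60 means; the neglected tail mass is about exp(-60).
    interior = simpson(integrand, 0.0, 60.0 * mean_y, n)
    boundary = f1(0.0) * delta * (mu - lam)
    return interior + boundary


def mean_gap(lam, mu, delta):
    """E Y - E X for the exponential and the scaled geometric laws."""
    rho = lam / mu
    mean_x = delta * rho / (1.0 - rho)
    mean_y = delta * (lam + mu) / (2.0 * (mu - lam))
    return mean_y - mean_x


lam, mu, delta = 0.9, 1.0, 0.1
# f(x) = sin(x): f'(x) = cos(x) and f''(x) = -sin(x).
residual = rbm_identity_residual(lam, mu, delta,
                                 math.cos, lambda y: -math.sin(y))
gap = mean_gap(lam, mu, delta)
print(residual, gap)
```

The residual should vanish up to quadrature error, and the gap equals $\delta/2$ for every $\rho < 1$, which is why the error in the means does not grow as $\rho \to 1$.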

3. Difference Bounds for the M/M/1 system

In this section we discuss various ways to establish bounds on the finite differences of the solution to the M/M/1 Poisson equation. Earlier we claimed that $f_h(\delta k) = \int_0^\infty \big( \mathbb{E}_{X(0) = \delta k} h(X(t)) - \mathbb{E} h(X) \big)\, dt$ solves the Poisson equation, so we now verify this fact.

Lemma 3.

Consider a CTMC taking values on a set E ⊂ δ Z d with generator G X givenin (1) , and assume that Z ∞ (cid:0) E δk h ( X ( t )) − E h ( X ) (cid:1) dt is ﬁnite for all δk ∈ E. (30) Then f h ( δk ) = R ∞ (cid:0) E δk h ( X ( t )) − E h ( X ) (cid:1) dt solves G X f h ( δk ) = E h ( X ) − h ( δk ) , δk ∈ E. Lemma 3 is proved by performing a ﬁrst-step analysis on R ∞ ε (cid:0) E X (0)= δk h ( X ( t )) − E h ( X ) (cid:1) dt for small values of ε . It is relegated to Section C.2.In practice, there are several ways to verify that (30) holds. One way is by showing that { X ( t ) } is h -exponentially ergodic; i.e., | E δk h ( X ( t )) − E h ( X ) | ≤ c e − c t for some c , c > E is ﬁnite but when E is inﬁnite, theusual way to prove this would be to ﬁnd a Lyapunov function V ( δk ) such that G X V ( δk ) ≤− cV ( δk ) + ¯ c k ∈ K ) for some compact set K and some constants c, ¯ c >

0. We refer thereader to Meyn and Tweedie (1993) for more on exponential ergodicity. raverman:


Stochastic Systems ; manuscript no. There is another way to verify (30) that is much closer to the spirit of this paper becauseit relies on ﬁnite diﬀerence bounds. First, note that (cid:12)(cid:12)(cid:12) Z ∞ (cid:0) E δk h ( X ( t )) − E h ( X ) (cid:1) dt (cid:12)(cid:12)(cid:12) ≤ Z ∞ X j ∈ Z d P ( X = δj ) (cid:12)(cid:12) E δk h ( X ( t )) − E δj h ( X ( t )) (cid:12)(cid:12) dt = X j ∈ Z d P ( X = δj ) Z ∞ (cid:12)(cid:12) E δk h ( X ( t )) − E δj h ( X ( t )) (cid:12)(cid:12) , where the last equality follows from by Fubini-Tonelli. It is possible to use synchronouscouplings to prove the right-hand side is ﬁnite. Let us illustrate this for the M/M/ { X ( t ) } be the CTMC and X have the station-ary distribution associated with the M/M/ G X that we introduced in (23) ofSection 2.4. Similarly, we let Y have the stationary distribution of the RBM in (27) of thesame section. Suppose we prove that for h ∈ dLip(1), Z ∞ (cid:12)(cid:12) E δ ( k +1) h ( X ( t )) − E δk h ( X ( t )) (cid:12)(cid:12) ≤ δ ( k + 1) µ − λ , k ∈ N . (31)We can then use a telescoping sum and the triangle inequality to see that Z ∞ (cid:12)(cid:12) E δk h ( X ( t )) − E δj h ( X ( t )) (cid:12)(cid:12) dt ≤ k ∨ j − X i = k ∧ j Z ∞ (cid:12)(cid:12) E δ ( i +1) h ( X ( t )) − E δi h ( X ( t )) (cid:12)(cid:12) dt ≤ k ∨ j − X i = k ∧ j δ ( i + 1) µ − λ ≤ δ ( k + j + 1) µ − λ ( k + j ) , and so (cid:12)(cid:12)(cid:12) Z ∞ (cid:0) E δk h ( X ( t )) − E h ( X ) (cid:1) dt (cid:12)(cid:12)(cid:12) ≤ X j ∈ Z P ( X = δj ) δ ( k + j + 1) µ − λ ( k + j ) . The right-hand side is ﬁnite because E X < ∞ . Thus, (30) is satisﬁed for our M/M/ f h ( δk ) with the helpof synchronous couplings. The following result summarizes our bounds. Lemma 4.

For any h ∈ dLip(1) , (31) holds and consequently, the function f h ( δk ) in (8) is well deﬁned. Furthermore, | ∆ f h ( δk ) | ≤ δ ( k + 1) µ − λ , (cid:12)(cid:12) ∆ f h ( δk ) (cid:12)(cid:12) ≤ δµ − λ , and (cid:12)(cid:12) ∆ f h ( δk ) (cid:12)(cid:12) ≤ δλ , k ∈ N . (32) raverman: Prelimit generator comparison approach Article submitted to


We prove the ﬁrst claim and establish the bound on | ∆ f h ( δk ) | in Section 3.1. The remainingtwo bounds are proved in Section 3.3, with the help of the discussion from Section 3.2. Letus now show how Lemma 4 can be combined with the theory developed in Section 2 tobound the approximation error between Y and X . Theorem 2.

There exists a constant

C > such that for ρ < and h ∈ dLip(1) , | E h ( X ) − E Ah ( Y ) | ≤ Cδ (cid:16) ρ (cid:17) . Before proving the theorem, let us comment on the possible choices of δ . It is well knownthat X is well approximated by Y when ρ → E X = δρ/ (1 − ρ ). Choosing δ = 1,we see that even though E X → ∞ as ρ →

1, the approximation error $|\mathbb{E} X - \mathbb{E} Y|$ does not grow. However, because both random variables diverge, we cannot conclude that $X$ converges to $Y$. To ensure convergence of $X$ to $Y$, we recall that $\mathbb{E} Y = \frac{\delta(\lambda + \mu)}{2(\mu - \lambda)}$. Choosing $\delta = (1 - \rho)$ ensures that $\{X\}_{\rho < 1}$ and $\{Y\}_{\rho < 1}$ are tight. Lemma 1 and Theorem 2 then imply that $X$ converges to $Y$ in distribution as $\rho \to 1$. As discussed in the introduction, tightness of the prelimit sequence is a sought-after property because, when combined with process-level convergence to some diffusion limit, tightness implies convergence of stationary distributions as well. We will discuss in Section 3.2.1 below how one can use the Poisson equation to establish tightness.

Proof of Theorem 2

Since Y is exponentially distributed, a consequence of Lemma 4is that E (cid:0) δ ( λ − µ )( Af h ) ′ ( Y ) + 12 δ ( λ + µ )( Af h ) ′′ ( Y ) (cid:1) + ( Af h ) ′ (0) δ ( µ − λ ) = 0 , which is true because (28) is satisﬁed with f ( x ) = Af h ( x ) there. Consequently, (29) holds,which we recall below: E h ( X ) − E h ( Y ) = 16 δ E (cid:0) λ ( Af h ) ′′′ ( ξ ( Y )) + µ ( A b f h ) ′′′ ( ξ − ( Y )) (cid:1) + E (cid:0)e ε ( Y ) + ε h ( Y ) + ε f ( Y ) (cid:1) − ( Af h ) ′ (0) δ ( µ − λ ) . We now bound each term on the right side above. Using (12) from Theorem 1 and Lemma 4,16 δ λ | ( Af h ) ′′′ ( ξ ( Y )) | ≤ Cλ max ≤ i ≤ (cid:12)(cid:12) ∆ f h (cid:0) δ ( k ( ξ ( Y )) + i ) (cid:1)(cid:12)(cid:12) ≤ Cδ, | ( Af h ) ′ (0) δ ( µ − λ ) | ≤ Cδ , raverman: Prelimit generator comparison approach


Stochastic Systems ; manuscript no. where k ( x ) = ⌊ x/δ ⌋ . Similarly, but using also Lemma 2 and the fact that ξ − ( Y ) ≥ − δ ,16 δ µ (cid:12)(cid:12)(cid:12) ( A b f h ) ′′′ ( ξ − ( Y )) (cid:12)(cid:12)(cid:12) ≤ Cµ max ≤ i ≤ (cid:12)(cid:12)(cid:12) ∆ b f h (cid:0) δ ( k ( ξ − ( Y )) + i ) (cid:1)(cid:12)(cid:12)(cid:12) ≤ Cδ ρ . Note that e ε ( Y ) = 0 because the transition rates β ( δk ) = λ and β − ( δk ) = µ k >

0) areconstant for k ≥

1. Furthermore, E ε h ( Y ) ≤ E (cid:16) (cid:0) Y ∈ [0 , δ ) (cid:1) C max ≤ m ≤ (cid:12)(cid:12) ∆ h ( δm ) (cid:12)(cid:12) (cid:17) ≤ Cδ, where in the last inequality we used the fact that ∆ h ( δm ) can be written in terms of∆ h ( δm + i ) for i = 0 , , ,

3, and that h ∈ dLip(1). Lastly, | ε f ( Y ) | ≤ Cδ ρ follows from the bound on | ε f ( x ) | from Proposition 2 together with the fact that f Aβ ( x ) = λ , f Aβ − ( x ) = µ , | ∆ f h ( δk ) | ≤ | ∆ f h ( δk ) | + | ∆ f h ( δ ( k + 1)) | ≤ δ/λ . (cid:3) We now discuss three ways to bound the ﬁnite diﬀerences of f h ( δk ) and prove Lemma 4. Synchronous couplings provide a way to bound ∆ a f h ( δk ) that generalizes well even if theCTMC is multidimensional. We now illustrate them for the M/M/

Recall that X ( t ) is the number of customers in the systemat time t ≥

0, scaled by δ . Let { X (0) ( t ) } be a copy of { X ( t ) } , and let { X (1) ( t ) } be anotherCTMC whose transitions we now deﬁne. We refer to { X ( i ) ( t ) } as system i . We set X (1) (0) = X (0) (0) + 1 and deﬁne the joint evolution of the two systems via the following table oftransition rates. Table 1 Transitions of the joint process { ( X (1) ( t ) , X (0) ( t )) } in state ( x (1) , x (0) ) . λ ( x (1) + δ, x (0) + δ )2 µ x (0) >

0) ( x (1) − δ, x (0) − δ )3 µ x (1) > , x (0) = 0) ( x (1) − δ, x (0) ) raverman: Prelimit generator comparison approach Article submitted to


Let us describe the intuition behind this joint construction. Note that system 1 is also an

M/M/ t = 0, and all newly arriving customers,have an identical counterpart in system 1. The only diﬀerence between the two systemsis the extra customer initially present in system 1. One may think of this customer as alow-priority customer that gets served (in system 1) only when all other customers havebeen cleared. The two systems couple once this extra customer is served. We refer tosystems 1 and 0 as a synchronous coupling because the two systems are driven by the sameunderlying stochastic processes, i.e., arrivals and services. We now bound ∆ f h ( δk ).For k ∈ N , deﬁne τ ( i ) ( δk ) = inf t ≥ { X ( i ) ( t ) = δk } . From the discussion above, we have∆ f h ( δk ) = Z ∞ (cid:0) E δ ( k +1) h ( X ( t )) − E δk h ( X ( t )) (cid:1) dt = Z ∞ E X (0) (0)= δk (cid:16) h ( X (1) ( t )) − h ( X (0) ( t )) (cid:17) dt = Z ∞ E X (0) (0)= δk (cid:20) t ≤ τ (1) (0)) (cid:16) h ( X (0) ( t ) + 1) − h ( X (0) ( t )) (cid:17)(cid:21) dt. (33)We emphasize that the last equality above is true because systems 0 and 1 always maintaina constant gap of a single customer until they couple. We bound E X (0) (0)= δk τ (1) (0) bycombining the Lyapunov function V ( δk ) = k with Dynkin’s formula. Observe that V ( δk )satisﬁes G X V ( δk ) = λ − µ < k >

0, which means that E X (0) (0)= δk τ (1) (0) = E X (1) (0)= δ ( k +1) (cid:0) V ( X (1) (0)) − V ( X (1) ( τ (1) (0))) (cid:1) µ − λ = k + 1 µ − λ . (34)To see why the equality above is true, we refer the reader to the proof of Theorem 4.3.i ofMeyn and Tweedie (1993) (which is a direct application of Dynkin’s formula). Combining(34) and the fact that h ∈ dLip(1) with (33) proves | ∆ f h ( δk ) | ≤ δ ( k + 1) µ − λ , k ∈ N . In fact, we have proved the stronger statement (31). raverman:


Stochastic Systems ; manuscript no. It is straightforward to extend the coupling we con-structed in the previous section to bound∆ f h ( δk ) = Z ∞ (cid:0) E δ ( k +2) h ( X ( t )) − E δ ( k +1) h ( X ( t )) + E δk h ( X ( t )) (cid:1) dt. In addition to systems 0 and 1, we let { X (2) ( t ) } represent system 2, which is an identicalcopy of system 1 with one additional low-priority customer. The relationship between thethree systems is visualized in Figure 3.1.2, where we note that X (2) (0) = X (1) (0) + δ = X (0) (0) + 2 δ . The transitions of the joint chain are deﬁned in Table 2 below. X (0) X (1) X (2) Figure 1 The initial state of systems 0,1,2 when system 0 starts with 4 customers. Circles arecustomers common to all systems. The diamonds and squares represent the extra customers.Table 2 Transitions of the joint process { ( X (2) ( t ) , X (1) ( t ) , X (0) ( t )) } in state ( x (2) , x (1) , x (0) ) . λ ( x (2) + δ, x (1) + δ, x (0) + δ )2 µ x (0) >

0) ( x (2) − δ, x (1) − δ, x (0) − δ )3 µ x (1) > , x (0) = 0) ( x (2) − δ, x (1) − δ, x (0) )4 µ x (2) > , x (1) = 0) ( x (2) − δ, x (1) , x (0) )Observe that systems 0 and 1 are identical for all t ≥ τ (1) (0). Proceeding similarly to(33), we see that∆ f h ( δk ) = Z ∞ E X (0) (0)= δk (cid:16) h ( X (2) ( t )) − h ( X (1) ( t )) − h ( X (0) ( t )) (cid:17) dt = Z ∞ E X (0) (0)= δk (cid:20) t ≤ τ (1) (0))∆ h ( X (0) ( t )) (cid:21) dt + Z ∞ E X (1) (0)=0 (cid:16) h ( X (2) ( t )) − h ( X (1) ( t )) (cid:17) dt = Z ∞ E X (0) (0)= δk (cid:20) t ≤ τ (1) (0))∆ h ( X (0) ( t )) (cid:21) dt + ∆ f h (0) , (35) raverman: Prelimit generator comparison approach Article submitted to

Stochastic Systems ; manuscript no. where the second equality follows from the strong Markov property. Provided that h ∈ dLip(1) and | ∆ h ( δk ) | ≤ δ , we combine the above with (34) and our bound on | ∆ f h ( δk ) | to conclude that (cid:12)(cid:12) ∆ f h ( δk ) (cid:12)(cid:12) ≤ δ ( k + 1) µ − λ + δ µ − λ . (36)Observe that the bound above is not the same as the bound on | ∆ f h ( δk ) | in Lemma 4.Furthermore, going from (35) to (36) requires the additional assumption that | ∆ h ( δk ) | ≤ δ , whereas Lemma 4 only assumes h ∈ dLip(1). In fact, the synchronous coupling approachtypically requires stronger assumptions on h ( δk ) than are necessary. The remaining boundsin Lemma 4 are proved using approaches that we discuss in Sections 3.2 and 3.3. The jump from the second to the third diﬀerence is almostidentical to the jump from the ﬁrst to the second. We deﬁne system 3 as a copy of system2 with yet another low-priority customer. We omit the transition rate table and simplyillustrate the relationship of systems 0–3 in the ﬁgure below. It follows that X (0) X (1) X (2) X (3) Figure 2 The initial state of systems 0,1,2,3 when system 0 starts with 4 customers. The diamonds,squares, and stars represent the extra customers. ∆ f h ( δk ) = Z ∞ E X (0) (0)= δk (cid:16) h ( X (3) ( t )) − h ( X (2) ( t )) + 3 h ( X (1) ( t )) − h ( X (0) ( t )) (cid:17) dt = Z ∞ E X (0) (0)= δk (cid:20) t ≤ τ (1) (0))∆ h ( X (0) ( t )) (cid:21) dt + Z ∞ E X (1) (0)=0 (cid:16) h ( X (3) ( t )) − h ( X (2) ( t )) + 2 h ( X (1) ( t )) (cid:17) dt = Z ∞ E X (0) (0)= δk (cid:20) t ≤ τ (1) (0))∆ h ( X (0) ( t )) (cid:21) dt + ∆ f h (0) − ∆ f h (0)= Z ∞ E X (0) (0)= δk (cid:20) t ≤ τ (1) (0))∆ h ( X (0) ( t )) (cid:21) dt + Z ∞ E X (0) (0)=0 (cid:20) t ≤ τ (1) (0))∆ h ( X (0) ( t )) (cid:21) dt. (37) raverman: Prelimit generator comparison approach

The last equality follows from (35). Provided that $h \in \text{dLip}(1)$, $|\Delta^2 h(\delta k)| \leq \delta^2$, and $|\Delta^3 h(\delta k)| \leq \delta^3$, we apply (34) to conclude that

$$\big| \Delta^3 f_h(\delta k) \big| \leq \frac{\delta^3 (k+1)}{\mu - \lambda} + \frac{\delta^2}{\mu - \lambda}.$$

In the following section we discuss how the finite differences can be bounded by using the Poisson equation.
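The two-system coupling of Table 1 is easy to simulate, which gives an informal check of the coupling-time formula (34). The sketch below works in customer counts rather than $\delta$-scaled states; the seed, the parameter values, the number of runs and the function name are illustrative assumptions, and the Monte Carlo estimate is only an approximation of $(k+1)/(\mu-\lambda)$.

```python
import random


def coupling_time(k, lam, mu, rng):
    """Simulate Table 1 from (k + 1, k) customers and return the coupling time."""
    n1, n0 = k + 1, k
    clock = 0.0
    while n1 > 0:
        # Before coupling, system 1 is never empty, so the total rate is lam + mu.
        clock += rng.expovariate(lam + mu)
        if rng.random() < lam / (lam + mu):
            n1, n0 = n1 + 1, n0 + 1      # row 1: common arrival
        elif n0 > 0:
            n1, n0 = n1 - 1, n0 - 1      # row 2: common service completion
        else:
            n1 -= 1                      # row 3: the extra customer is served
        # The gap is exactly one customer until the two systems couple.
        assert n1 - n0 == (1 if n1 > 0 else 0)
    return clock


rng = random.Random(0)
k, lam, mu = 2, 1.0, 2.0
runs = 20000
estimate = sum(coupling_time(k, lam, mu, rng) for _ in range(runs)) / runs
print(estimate, (k + 1) / (mu - lam))
```

Because the gap between the two systems stays at one customer until they couple, any $h \in \text{dLip}(1)$ changes by at most $\delta$ across the gap, and the bound on $|\Delta f_h(\delta k)|$ is $\delta$ times the expected coupling time.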

Perhaps the most obvious way to access the values of ∆ a f h ( δk ) is through the Poissonequation. Recalling the M/M/

M/M/ E h ( X ) − h ( δk ) = λ ∆ f h ( δk ) − µ ∆ f h ( δ ( k − λ ∆ f h ( δ ( k − − ( µ − λ )∆ f h ( δ ( k − , k ≥ . Rearranging terms, we get∆ f h ( δk ) = 1 λ (cid:0) E h ( X ) − h ( δ ( k + 1)) (cid:1) + µ − λλ ∆ f h ( δk ) , ∆ f h ( δk ) = 1 λ (cid:0) h ( δ ( k + 1)) − h ( δ ( k + 2)) (cid:1) + µ − λλ ∆ f h ( δk ) , k ≥ . (38)If h (0) = 0, we replace h ( δk ) by h ( δk ) − h (0), which has no eﬀect on the solution f h ( δk ).Therefore, h (0) = 0 without loss of generality. Furthermore, if h ∈ dLip(1), then the aboveequations give automatic bounds on | ∆ f h ( δk ) | and | ∆ f h ( δk ) | provided that we can bound | E h ( X ) | ≤ E | X | and | ∆ f h ( δk ) | . From our discussion on synchronous couplings, we knowthat | ∆ f h ( δk ) | ≤ δ k +1 µ − λ . Furthermore, it is well known that X/δ is geometrically distributedwith mean ρ/ (1 − ρ ) = λµ − λ , where ρ = λ/µ , so (cid:12)(cid:12) ∆ f h ( δk ) (cid:12)(cid:12) ≤ δµ − λ + 2 δ ( k + 1) λ , and (cid:12)(cid:12) ∆ f h ( δk ) (cid:12)(cid:12) ≤ δλ + µ − λλ (cid:12)(cid:12) ∆ f h ( δk ) (cid:12)(cid:12) . (39)Observe that the bounds above only require that h ∈ dLip(1), compared to the additionalassumptions of the synchronous coupling bounds.The problem with using the Poisson equation is that it requires the CTMC to be one-dimensional. When the CTMC is multidimensional, there is more than one second-order raverman: Prelimit generator comparison approach Article submitted to

Stochastic Systems ; manuscript no. diﬀerence ∆ i ∆ j f h ( · ) present in the Poisson equation and it is not possible to isolate a singlesecond-order diﬀerence, say ∆ f ( δk ), in terms of ﬁrst-order diﬀerences only. Therefore, onits own, the Poisson equation will not yield all the necessary high-order diﬀerence bounds.However, when combined with the synchronous coupling approach, it becomes a useful toolbecause it relates the high-order diﬀerences to each other. This idea is used in Braverman(2021). To bound ∆ f h ( δk ) via (38) we used a bound on E | X | . In thecase of the M/M/ X is known.However, obtaining a useful upper bound on E | X | is harder for more complicated systemsand usually involves using some kind of Lyapunov function. The Poisson equation providesanother route to bound E | X | . Picking h ( δk ) = | δk | , the Poisson equation at k = 0 becomes λ (cid:0) f h ( δ ) − f h (0) (cid:1) = λ ∆ f h (0) = E | X | = E X, so (33) and (34) imply that E X = δλ/ ( µ − λ ) = δρ/ (1 − ρ ). Choosing δ to be any constantmultiple of 1 − ρ ensures that { X } ρ< is tight.The main takeaway is that the problem of tightness is equivalent to bounding G X f h ( δk )at a single point that corresponds to the ﬂuid equilibrium of the CTMC. At the ﬂuidequilibrium, G X f h ( δk ) typically consists of second-order diﬀerences of f h ( δk ), unless theﬂuid equilibrium lies on the boundary of supp( X ) as in our M/M/

There is a simple trick based on the strong Markov property that lets us bound ∆ a f h ( δk )for a > X (1) ( t ) and X (0) ( t ) deﬁned in Section 3.1 and assume h ∈ dLip(1). The strongMarkov property implies that for any k ≥ f h ( δk )= Z ∞ E X (0) (0)= δk (cid:20) (cid:16) t ≤ τ (0) ( δ ( k − (cid:17)(cid:16) h ( X (1) ( t )) − h ( X (0) ( t )) (cid:17)(cid:21) dt + ∆ f h ( δ ( k − . (40) raverman: Prelimit generator comparison approach


Stochastic Systems ; manuscript no. We bring ∆ f h ( δ ( k − X (1) ( t ) − X (0) ( t ) = δ for t ≤ τ (0) ( δ ( k − (cid:12)(cid:12) ∆ f h ( δ ( k − (cid:12)(cid:12) ≤ δ E X (0) (0)= δk τ (0) ( δ ( k − . Recall that the Lyapunov function V ( δk ) = k satisﬁes G X V ( δk ) = ( λ − µ )1( k > E X (0) (0)= δk τ (0) ( δ ( k − V ( δk ) − E X (0) (0)= δk V ( X (0) ( τ (0) ( δ ( k − µ − λ = 1 µ − λ . The equalities above follow from Dynkin’s formula, just as in (34). We have thus provedthe second diﬀerence bound in Lemma 4. We point out that the argument above does notassume that | ∆ h ( δk ) | ≤ δ like the synchronous coupling approach does, and it also doesnot require a bound on E h ( X ) as was needed when we rearranged the Poisson equation.We can bound ∆ f h ( δk ) by repeating (40) with ∆ f h ( δk ) instead of ∆ f h ( δk ). However,instead we recall (38) and arrive at (cid:12)(cid:12) ∆ f h ( δk ) (cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12) λ (cid:0) h ( δ ( k + 1)) − h ( δ ( k + 2)) (cid:1) + µ − λλ ∆ f h ( δk ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ δλ , which completes the proof of Lemma 4. We conclude this section by pointing out thatin practice, one often uses a hybrid approach involving all the methods discussed in Sec-tions 3.1, 3.2, and 3.3 to get the best possible bounds.
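The difference bounds can also be checked numerically for a concrete test function. The sketch below relies on an identity that follows from multiplying the M/M/1 Poisson equation by the geometric stationary probabilities $\pi_j$ and summing over $j \leq k$: $\lambda \pi_k \Delta f_h(\delta k) = \sum_{j \leq k} \pi_j \big( \mathbb{E} h(X) - h(\delta j) \big)$. The test function $h(x) = |x - 1|$, the parameter values, the truncation level and the function names are illustrative assumptions. The third-difference constant $2\delta/\lambda$ used in the check is the one obtained by combining (38) with the second-difference bound $\delta/(\mu-\lambda)$.

```python
def poisson_first_differences(h, lam, mu, delta, kmax, truncation=2000):
    """First differences of the M/M/1 Poisson solution for k = 0, ..., kmax."""
    rho = lam / mu
    pi = [(1.0 - rho) * rho ** j for j in range(truncation)]
    # Truncated sum; the neglected geometric tail is negligible here.
    mean_h = sum(p * h(delta * j) for j, p in enumerate(pi))
    first, partial = [], 0.0
    for k in range(kmax + 1):
        partial += pi[k] * (mean_h - h(delta * k))
        first.append(partial / (lam * pi[k]))
    return first


def forward_difference(values):
    """Forward differences of a list of numbers."""
    return [b - a for a, b in zip(values, values[1:])]


lam, mu, delta, kmax = 0.8, 1.0, 0.1, 30
h = lambda x: abs(x - 1.0)          # 1-Lipschitz, hence in dLip(1)

first = poisson_first_differences(h, lam, mu, delta, kmax)
second = forward_difference(first)
third = forward_difference(second)

slack = 1e-8                        # allowance for floating-point error
bounds_hold = (
    all(abs(v) <= delta * (k + 1) / (mu - lam) + slack
        for k, v in enumerate(first))
    and all(abs(v) <= delta / (mu - lam) + slack for v in second)
    and all(abs(v) <= 2.0 * delta / lam + slack for v in third)
)

# For h(x) = x the coupling argument gives equality in the first bound.
linear = poisson_first_differences(lambda x: x, lam, mu, delta, kmax)
max_linear_error = max(abs(v - delta * (k + 1) / (mu - lam))
                       for k, v in enumerate(linear))
print(bounds_hold, max_linear_error)
```

The identity-function case shows that the first-difference bound cannot be improved in general, while the second and third differences stay bounded uniformly in $k$.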

4. Misalignment of Diﬀusion Synchronous Couplings

We have presented the prelimit approach as a parallel to the diffusion approach for the purposes of bounding $|\mathbb{E} h(X) - \mathbb{E} h(Y)|$. As we have seen, the main challenge with each approach is to bound the differences/derivatives of the respective Poisson equation solution. In theory, any bound achievable using one approach should be achievable with the other. In practice, there are slight technical differences between working with a discrete-valued CTMC and a diffusion living on a continuum. In this section we illustrate one technical nuance that arises when we use synchronous couplings to bound the third derivatives of $f_{h^*}(x)$ in the diffusion approach. We term this the "misalignment of synchronous couplings".

Let us recall the generic diﬀusion process { Y ( t ) } living on R that we deﬁned in (20).We assume for simplicity that the diﬀusion coeﬃcient a ( x ) = a for all x ∈ R and deﬁne thesynchronous couplings Y ( i ) ( t ) = Y (0) (0) + iε + Z t b ( Y ( i ) ( s )) ds + Z t √ adW ( s ) , i = 0 , , , . The four couplings start at diﬀerent initial conditions but share the same Brownian motion.Since f h ∗ ( x ) is given by (4), it follows that ∂ ∂x f h ∗ ( x )= lim ε → ε Z ∞ E Y (0) (0)= x (cid:16) h ∗ ( Y (3) ( t )) − h ∗ ( Y (2) ( t )) + 3 h ∗ ( Y (1) ( t )) − h ∗ ( Y (0) ( t )) (cid:17) dt. (41)To show the integral on the right-hand side is ﬁnite, one must show that the synchronouscouplings converge to one another and characterize the speed at which it happens. Fur-thermore, the integral must be of order ε for the limit to exist. Let us consider this lastpoint further.Given a suﬃciently diﬀerentiable function g : R → R , we know that its derivatives canbe approximated by ﬁnite diﬀerences. For instance, Taylor expansion tells us that g ′′′ ( x ) ≈ (cid:0) g ( x ′′′ ) − g ( x ′′ ) + 3 g ( x ′ ) − g ( x ) (cid:1) ε (42)when x ′′′ = x + 3 ε , x ′′ = x + 2 ε , and x ′ = x + ε . The precise spacing of x, x ′ , x ′′ , x ′′′ relativeto each other is essential for the limit (as ε →

0) of the right-hand side in (42) to exist.For example, if x ′′′ = x + 4 ε , then the numerator is now of order ε instead of ε , and theright-hand side diverges to ∞ as ε →

0. Therefore, one way to show that the integral in(41) is of order ε is to prove that the diﬀusion couplings maintain the appropriate spacingrelative to each other so that the integrand is of order ε for each t ≥ h ( x ) is smooth enough, the drift b ( x ) is four-times continuously diﬀerentiableand k -strongly concave, then (cid:12)(cid:12) h ∗ ( Y (3) ( t )) − h ∗ ( Y (2) ( t )) + 3 h ∗ ( Y (1) ( t )) − h ∗ ( Y (0) ( t )) (cid:12)(cid:12) ≤ ε Ce − kt/ (43) raverman: Prelimit generator comparison approach


Stochastic Systems ; manuscript no. almost surely, where the constant C depends on k , h ( x ) and b ( x ). The above inequalitythen implies thatlim ε → ε Z ∞ E Y (0) (0)= x (cid:12)(cid:12)(cid:12) h ∗ ( Y (3) ( t )) − h ∗ ( Y (2) ( t )) + 3 h ∗ ( Y (1) ( t )) − h ∗ ( Y (0) ( t )) (cid:12)(cid:12)(cid:12) dt ≤ C/k. (44)Similarly, (43) also holds for d -dimensional diﬀusions with constant diﬀusion coeﬃcients.Unfortunately, if the assumptions on the drift are violated, e.g. the drift is only Lipschitz-continuous or the diﬀusion has a reﬂecting boundary, then (43) no longer holds becausethe diﬀusion couplings become misaligned. This misalignment complicates the problem ofbounding (41) because one cannot use (43) anymore.As an example, we now illustrate how this misalignment occurs in the RBM that approx-imates the M/M/ Y ( t ) = Y (0) + δ ( λ − µ ) t + δ p ( λ + µ ) W ( t ) + R ( t ) , t ≥ , and let Y be the random variable having its stationary distribution. It was shown inHarrison and Reiman (1981) that R ( t ) = − inf ≤ s ≤ t n Y (0) + δ ( λ − µ ) s + δ p ( λ + µ ) W ( s ) o . We wish to bound the third derivative of f h ∗ ( x ) = Z ∞ (cid:0) E Y (0)= x h ∗ ( Y ( t )) − E h ∗ ( Y ) (cid:1) dt, x ≥ . For simplicity, we choose h ∗ ( x ) = x . Let us deﬁne the four coupled processes Y ( i ) ( t ) = Y (0) (0) + iε + δ ( λ − µ ) t + δ p ( λ + µ ) W ( t ) + R ( i ) ( t ) , where R ( i ) ( t ) = − inf ≤ s ≤ t n Y ( i ) (0) + δ ( λ − µ ) s + δ p ( λ + µ ) W ( s ) o , i = 0 , , , . (45)We also deﬁne D ( t ) = Y (3) ( t ) − Y (2) ( t ) + 3 Y (1) ( t ) − Y (0) ( t ). It follows that ∂ ∂x f h ∗ ( x ) = lim ε → ε Z ∞ E Y (0) (0)= x D ( t ) dt. (46) raverman: Prelimit generator comparison approach Article submitted to


We define

$$\gamma_{1} = \inf_{t \geq 0} \{ Y^{(1)}(t) = 3\varepsilon/4 \}, \quad \gamma_{2} = \inf_{t \geq 0} \{ Y^{(1)}(t) = \varepsilon/4 \}.$$

We will prove at the end of this section that

$$D(t) \leq -\varepsilon/4, \quad \text{for } t \in [\gamma_{1}, \gamma_{2}]. \tag{47}$$

We see that (47) violates (43). Furthermore, the expected hitting time of a fixed level by a Brownian motion with drift is well known and implies that $\mathbb{E}(\gamma_{2} - \gamma_{1}) = \varepsilon/(2\delta(\mu - \lambda))$. Therefore, the integral in (46) equals

$$\frac{1}{\varepsilon^{3}} \int_{0}^{\infty} \mathbb{E}_{Y^{(0)}(0)=x} \big( D(t) 1(t \in [\gamma_{1}, \gamma_{2}]) \big)\, dt + \frac{1}{\varepsilon^{3}} \int_{0}^{\infty} \mathbb{E}_{Y^{(0)}(0)=x} \big( D(t) 1(t \notin [\gamma_{1}, \gamma_{2}]) \big)\, dt, \tag{48}$$

and the first term is bounded from above by $-(8\delta(\mu - \lambda)\varepsilon)^{-1}$, which diverges as $\varepsilon \to 0$. Consequently, one cannot simply bound $|D(t)|$ and take the limit as $\varepsilon \to 0$. Nevertheless, $\frac{\partial^{3}}{\partial x^{3}} f_{h^{*}}(x)$ is well defined and the right-hand side of (46) exists. By applying the strong Markov property trick, one can show that the second integral in (48) contains a positive term of order $1/\varepsilon$ that cancels out the first integral.

The main takeaway is that the misaligned synchronous couplings added extra complexity to the problem. In contrast, the analogous analysis using the prelimit approach in Section 3.1.3 was cleaner because the CTMC is restricted to the grid.

In the remainder of this section, we verify (47). By definition,

$$Y^{(i+1)}(t) - Y^{(i)}(t) = \varepsilon + R^{(i+1)}(t) - R^{(i)}(t), \quad i = 0, 1, 2, \quad t \geq 0.$$

Now $R^{(i)}(t) = 0$ for $i = 1, 2, 3$ and $t < \inf_{s \geq 0} \{ Y^{(1)}(s) = 0 \}$. This implies in particular that

$$Y^{(3)}(t) - Y^{(2)}(t) = Y^{(2)}(t) - Y^{(1)}(t) = \varepsilon, \quad t \in [\gamma_{1}, \gamma_{2}],$$

because $\gamma_{2} < \inf_{s \geq 0} \{ Y^{(1)}(s) = 0 \}$. Thus,

$$D(t) = -\varepsilon + Y^{(1)}(t) - Y^{(0)}(t) = R^{(1)}(t) - R^{(0)}(t) = -R^{(0)}(t), \quad t \in [\gamma_{1}, \gamma_{2}].$$

One can check that $R^{(0)}(\gamma_{1}) = \varepsilon/4$ from the definition of $R^{(i)}(t)$ in (45). Similarly, $R^{(0)}(\gamma_{2}) = 3\varepsilon/4$. Since $R^{(0)}(t)$ is non-decreasing, we have $R^{(0)}(t) \in [\varepsilon/4, 3\varepsilon/4]$ when $t \in [\gamma_{1}, \gamma_{2}]$, which proves (47).
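The verification of (47) can be illustrated on a single deterministic path. The sketch below is only an illustration under stated assumptions: the driving path is a hypothetical straight line rather than a Brownian path, the starting point is $x = 0$ with $\varepsilon = 1$, and the regulator is taken to be the nonnegative part $\max(0, -\min_{s \leq t}(\cdot))$, which is consistent with the statement above that $R^{(i)}(t) = 0$ for $i = 1, 2, 3$ before $Y^{(1)}$ reaches zero. The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction


def regulator(start, free_path):
    """R(t) = max(0, -min_{s<=t}(start + free_path(s))) on a time grid (assumed form)."""
    values, running_min = [], start + free_path[0]
    for b in free_path:
        running_min = min(running_min, start + b)
        values.append(max(Fraction(0), -running_min))
    return values


def coupled_paths(x, eps, free_path):
    """Four synchronously coupled reflected paths started at x + i*eps, i = 0..3."""
    R = [regulator(x + i * eps, free_path) for i in range(4)]
    Y = [[x + i * eps + b + r for b, r in zip(free_path, R[i])] for i in range(4)]
    # Third-order difference D(t) = Y3 - 3*Y2 + 3*Y1 - Y0 at every grid time.
    D = [y3 - 3 * y2 + 3 * y1 - y0 for y0, y1, y2, y3 in zip(*Y)]
    return Y, R, D


eps, x = Fraction(1), Fraction(0)
times = [Fraction(n, 20) for n in range(21)]
free_path = [-t for t in times]  # hypothetical driving path, not a Brownian sample
Y, R, D = coupled_paths(x, eps, free_path)

# Grid indices at which Y^(1) first reaches 3*eps/4 and eps/4.
gamma1 = next(n for n, y in enumerate(Y[1]) if y <= 3 * eps / 4)
gamma2 = next(n for n, y in enumerate(Y[1]) if y <= eps / 4)
window = range(gamma1, gamma2 + 1)
```

On this path only $Y^{(0)}$ is being reflected during the window, so $D(t) = -R^{(0)}(t)$ there, which is the misalignment behind (47).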


5. Conclusion

In this paper we introduced the prelimit generator comparison approach and used the

M/M/

$U, U'$, the Kolmogorov distance is defined as

$$d_{K}(U, U') = \sup_{z \in \mathbb{R}} \big| \mathbb{E}\big(1(U \geq z)\big) - \mathbb{E}\big(1(U' \geq z)\big) \big|.$$

It is well known (e.g., Braverman et al. (2016)) that the discontinuity in the test functions $1(\cdot \geq z)$ makes working with the Kolmogorov distance more difficult than the Wasserstein. Even though we deal with discrete functions and their interpolations, the issue with the discontinuity in $1(\cdot \geq z)$ will still come up in the difference bounds on $f_{h}(x)$. However, the authors believe that, with minor tweaks, the prelimit approach can be applied in the Kolmogorov distance setting.

Appendix A: The Polynomial $P_{k}(x)$ and Interpolation in Multiple Dimensions

In this section we prove Theorem 1. We then state and prove Theorem 3, which is a generalization of the theorem to multiple dimensions.

Proof of Theorem 1

Given $f : K \to \mathbb{R}$, for each $k \in \mathbb{Z}$ such that $\delta k \in K_{4}$ we define

$$\begin{aligned}
P_{k}(x) ={}& f(\delta k) + \Big(\frac{x - \delta k}{\delta}\Big)\Big(\Delta - \frac{1}{2}\Delta^{2} + \frac{1}{3}\Delta^{3}\Big) f(\delta k) \\
&+ \frac{1}{2}\Big(\frac{x - \delta k}{\delta}\Big)^{2}\big(\Delta^{2} - \Delta^{3}\big) f(\delta k) + \frac{1}{6}\Big(\frac{x - \delta k}{\delta}\Big)^{3}\Delta^{3} f(\delta k) \\
&- \frac{23}{3}\Big(\frac{x - \delta k}{\delta}\Big)^{4}\Delta^{4} f(\delta k) + \frac{41}{2}\Big(\frac{x - \delta k}{\delta}\Big)^{5}\Delta^{4} f(\delta k) \\
&- \frac{55}{3}\Big(\frac{x - \delta k}{\delta}\Big)^{6}\Delta^{4} f(\delta k) + \frac{11}{2}\Big(\frac{x - \delta k}{\delta}\Big)^{7}\Delta^{4} f(\delta k), \quad x \in \mathbb{R},
\end{aligned} \tag{49}$$

where $\Delta f(\delta k) = f(\delta(k+1)) - f(\delta k)$. It is clear from (49) that $P_{k}(\delta k) = f(\delta k)$, implying (11). Furthermore, it is straightforward to verify that

$$\frac{\partial^{a}}{\partial x^{a}} P_{k-1}(x) \Big|_{x = \delta k} = \frac{\partial^{a}}{\partial x^{a}} P_{k}(x) \Big|_{x = \delta k}, \quad \text{for } a = 0, 1, 2, 3. \tag{50}$$

The property above implies $Af(x) = P_{k(x)}(x) \in C^{3}(\mathrm{Conv}(K_{4}))$. Since $P_{k}(x) \in C^{\infty}(\mathbb{R})$, we know $Af(x)$ is infinitely differentiable on $\mathrm{Conv}(K_{4}) \setminus K_{4}$. The weights $\alpha^{k}_{k+i}(x)$ can be read off by combining the coefficients corresponding to $f(\delta(k+i))$ in (49). For example,

$$\begin{aligned}
\alpha^{k}_{k}(x) ={}& 1 - \frac{11}{6}\Big(\frac{x - \delta k}{\delta}\Big) + \Big(\frac{x - \delta k}{\delta}\Big)^{2} - \frac{1}{6}\Big(\frac{x - \delta k}{\delta}\Big)^{3} \\
&- \frac{23}{3}\Big(\frac{x - \delta k}{\delta}\Big)^{4} + \frac{41}{2}\Big(\frac{x - \delta k}{\delta}\Big)^{5} - \frac{55}{3}\Big(\frac{x - \delta k}{\delta}\Big)^{6} + \frac{11}{2}\Big(\frac{x - \delta k}{\delta}\Big)^{7}.
\end{aligned}$$

It is straightforward to check that

$$\sum_{i=0}^{4} \alpha^{k}_{k+i}(x) = 1, \quad \alpha^{k}_{k}(\delta k) = 1, \quad \text{and} \quad \alpha^{k}_{k+i}(\delta k) = 0 \text{ for } i = 1, \ldots, 4.$$

It is also clear that the weights are degree-7 polynomials in $(x - \delta k)/\delta$ whose coefficients do not depend on $k$ or $\delta$, i.e.,

$$\alpha^{k}_{k+i}(x) = J_{i}\Big(\frac{x - \delta k}{\delta}\Big)$$

for some polynomial $J_{i}(\cdot)$. A consequence of this is that for any $x \in \mathbb{R}$,

$$\alpha^{k+j}_{k+j+i}(x + \delta j) = J_{i}\Big(\frac{x + \delta j - \delta(k+j)}{\delta}\Big) = J_{i}\Big(\frac{x - \delta k}{\delta}\Big) = \alpha^{k}_{k+i}(x), \quad j, k \in \mathbb{Z}, \ 0 \leq i \leq 4. \qquad \square$$

We now generalize Theorem 1 and define an interpolation operator that can interpolate any function defined on $K \cap \delta\mathbb{Z}^{d}$ where $K \subset \mathbb{R}^{d}$ is convex. The interpolator is based on forward differences, but one could also use central or backward differences to accommodate different domain shapes. The following theorem summarizes the key properties we want from it.

Theorem 3.

Let $\{\alpha^{k}_{k+i} : \mathbb{R} \to \mathbb{R} : k \in \mathbb{Z},\ i = 0, 1, 2, 3, 4\}$ be as in Theorem 1 and suppose we are given a convex set $K \subset \mathbb{R}^{d}$ and a function $f : K \cap \delta\mathbb{Z}^{d} \to \mathbb{R}$. Letting $i = (i_{1}, \ldots, i_{d}) \in \mathbb{Z}^{d}$, we use the weights to define

$$\begin{aligned}
Af(x) &= \sum_{i_{d}=0}^{4} \alpha^{k_{d}(x)}_{k_{d}(x)+i_{d}}(x_{d}) \cdots \sum_{i_{1}=0}^{4} \alpha^{k_{1}(x)}_{k_{1}(x)+i_{1}}(x_{1})\, f(\delta(k(x)+i)) \\
&= \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}(x)}_{k_{j}(x)+i_{j}}(x_{j}) \bigg) f(\delta(k(x)+i)), \quad x \in \mathrm{Conv}(K_{4}),
\end{aligned} \tag{51}$$

where $k(x) \in \mathbb{Z}^{d}$ is defined by $k_{i}(x) = \lfloor x_{i}/\delta \rfloor$, and

$$K_{4} = \{ x \in K \cap \delta\mathbb{Z}^{d} : \delta(k(x)+i) \in K \cap \delta\mathbb{Z}^{d} \text{ for all } 0 \leq i \leq 4e \}.$$

Then $Af(x) \in C^{3}(\mathrm{Conv}(K_{4}))$ and is infinitely differentiable almost everywhere on $\mathrm{Conv}(K_{4})$. Additionally,

$$Af(\delta k) = f(\delta k), \quad \delta k \in K_{4} \cap \delta\mathbb{Z}^{d}, \tag{52}$$

and there exists a constant $C > 0$ independent of $f(\cdot)$, $x$, and $\delta$, such that

$$\Big| \frac{\partial^{a}}{\partial x^{a}} Af(x) \Big| \leq C\delta^{-\|a\|_{1}} \max_{\substack{0 \leq i_{j} \leq 4 - a_{j} \\ j = 1, \ldots, d}} \big| \Delta_{1}^{a_{1}} \cdots \Delta_{d}^{a_{d}} f(\delta(k(x)+i)) \big|, \quad x \in \mathrm{Conv}(K_{4}), \tag{53}$$

for $0 \leq \|a\|_{1} \leq 3$, and (53) also holds when $\|a\|_{1} = 4$ for almost all $x \in \mathrm{Conv}(K_{4})$.

Note that for any $J \subset \{1, \ldots, d\}$ and $J^{c} = \{1, \ldots, d\} \setminus J$, we may rewrite (51) as

$$Af(x) = \sum_{\substack{i_{j}=0 \\ j \in J^{c}}}^{4} \bigg( \prod_{j \in J^{c}} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \bigg) \Bigg( \sum_{\substack{i_{j}=0 \\ j \in J}}^{4} \bigg( \prod_{j \in J} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \bigg) f(\delta(k+i)) \Bigg). \tag{54}$$

The representation in (54) will come in handy in a later section. Let us construct the multidimensional analog of $P_{k}(x)$ from (49) by defining

$$F_{k}(x) = \sum_{i_{d}=0}^{4} \alpha^{k_{d}}_{k_{d}+i_{d}}(x_{d}) \cdots \sum_{i_{1}=0}^{4} \alpha^{k_{1}}_{k_{1}+i_{1}}(x_{1})\, f(\delta(k+i)), \quad x \in \mathbb{R}^{d}, \ k \in K_{4}. \tag{55}$$

Note that $Af(x)$ defined in (51) satisfies $Af(x) = F_{k(x)}(x)$ for $x \in \mathrm{Conv}(K_{4})$. Furthermore, (13) of Theorem 1 implies (52). To prove Theorem 3, it remains to verify the smoothness of $Af(x)$ and (53).

For any $x \in \mathbb{R}^{d}$ and any set $J \subset \{1, \ldots, d\}$, we write $x_{J}$ to denote the vector whose $i$th element equals $x_{i} 1(i \in J)$. The following result is the multidimensional analog of (50) and is proved at the end of this section.

Lemma 5.

Fix $k \in K_{4}$, and for any $u \in [0,1]^{d}$, let $\Theta(u) = \{ i : u_{i} = 1 \}$ and $\Theta(u)^{c} = \{1, \ldots, d\} \setminus \Theta(u)$. Then for any $0 \leq \|a\|_{1} \leq 3$,

$$\frac{\partial^{a}}{\partial x^{a}} F_{k}(x) \Big|_{x = \delta(k+u)} = \frac{\partial^{a}}{\partial x^{a}} F_{k + e_{\Theta(u)}}(x) \Big|_{x = \delta(k+u)}. \tag{56}$$

Furthermore, there exists a constant $C > 0$ independent of $f(\cdot)$, $k$, and $\delta$ such that for all $0 \leq \|a\|_{1} \leq 4$ and all $x \in \mathrm{Conv}(K_{4})$,

$$\Big| \frac{\partial^{a}}{\partial x^{a}} F_{k}(x) \Big| \leq C\delta^{-\|a\|_{1}} \bigg( \prod_{j=1}^{d} \Big( 1 + \Big| \frac{x_{j} - \delta k_{j}}{\delta} \Big| \Big)^{7 - a_{j}} \bigg) \max_{\substack{0 \leq i_{j} \leq 4 - a_{j} \\ j = 1, \ldots, d}} \big| \Delta_{1}^{a_{1}} \cdots \Delta_{d}^{a_{d}} f(\delta(k+i)) \big|. \tag{57}$$
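When $d = 1$, the gluing identity (56) reduces to (50), and it can be checked with exact rational arithmetic. The sketch below is only an illustration: it takes the coefficients of $P_{k}$ from (49), works in the rescaled variable $t = (x - \delta k)/\delta$ (so that every $x$-derivative differs from the $t$-derivative by the same factor $\delta^{-a}$ on both sides), and uses arbitrary test values for $f$. The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction as F
from math import comb, factorial


def forward_diff(f, k, order):
    """order-th forward difference of the sequence f at index k."""
    return sum((-1) ** (order - j) * comb(order, j) * f[k + j] for j in range(order + 1))


def poly_coeffs(f, k):
    """Coefficients of P_k in powers of t = (x - delta*k)/delta, read off from (49)."""
    d1, d2, d3, d4 = (forward_diff(f, k, n) for n in (1, 2, 3, 4))
    return [F(f[k]), d1 - F(1, 2) * d2 + F(1, 3) * d3, F(1, 2) * (d2 - d3), F(1, 6) * d3,
            F(-23, 3) * d4, F(41, 2) * d4, F(-55, 3) * d4, F(11, 2) * d4]


def derivative_at(coeffs, a, t):
    """a-th derivative in t of sum_n coeffs[n] * t**n, evaluated at t."""
    t = F(t)
    return sum(c * (factorial(n) // factorial(n - a)) * t ** (n - a)
               for n, c in enumerate(coeffs) if n >= a)


f = [3, -1, 4, 1, -5, 9]  # arbitrary test values f(delta*0), ..., f(delta*5)
left, right = poly_coeffs(f, 0), poly_coeffs(f, 1)
# P_0 at t = 1 and P_1 at t = 0 are the same point x = delta.
glued = [derivative_at(left, a, 1) == derivative_at(right, a, 0) for a in range(4)]


def unit(i):
    """Indicator sequence of index i, so that P_0 built from it is the weight alpha^0_i."""
    return [1 if j == i else 0 for j in range(5)]


weights = [derivative_at(poly_coeffs(unit(i), 0), 0, F(1, 3)) for i in range(5)]
```

The list `glued` records whether the derivatives of orders 0 through 3 agree at the grid point, and `weights` holds the five weights evaluated at $t = 1/3$, which should sum to one.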


The above lemma proves Theorem 3. Indeed, (56) implies $Af(x) \in C^{3}(\mathrm{Conv}(K_{4}))$, and since the weights $\alpha^{k}_{k+i}(x)$ belong to $C^{\infty}(\mathbb{R})$, we know $Af(x)$ is infinitely differentiable everywhere except at the points where the different $F_{k}(x)$ are glued together, i.e., on the set $\{ x \in \mathrm{Conv}(K_{4}) : x_{i} \in \delta\mathbb{Z} \text{ for some } i \in \{1, \ldots, d\} \}$, which has Lebesgue measure zero. Furthermore, (53) follows directly from (57). We now prove Lemma 5.

Proof of Lemma 5

Fix $k \in K_{4}$. We first prove (56). Let $j'$ be an element of $\Theta(u)$. From (54) it follows that

$$\begin{aligned}
\frac{\partial^{a}}{\partial x^{a}} F_{k}(x) \Big|_{x = \delta(k+u)} ={}& \sum_{\substack{i_{j}=0 \\ j \in \{1,\ldots,d\} \setminus \{j'\}}}^{4} \bigg( \prod_{j \in \{1,\ldots,d\} \setminus \{j'\}} \frac{\partial^{a_{j}}}{\partial x_{j}^{a_{j}}} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \Big|_{x_{j} = \delta(k_{j}+u_{j})} \bigg) \\
&\times \Bigg( \sum_{i_{j'}=0}^{4} \bigg( \frac{\partial^{a_{j'}}}{\partial x_{j'}^{a_{j'}}} \alpha^{k_{j'}}_{k_{j'}+i_{j'}}(x_{j'}) \Big|_{x_{j'} = \delta(k_{j'}+1)} \bigg) f(\delta(k+i)) \Bigg).
\end{aligned}$$

Combining (50) with the weighted sum representation of $P_{k}(x)$ implies

$$\begin{aligned}
&\sum_{i_{j'}=0}^{4} \bigg( \frac{\partial^{a_{j'}}}{\partial x_{j'}^{a_{j'}}} \alpha^{k_{j'}}_{k_{j'}+i_{j'}}(x_{j'}) \Big|_{x_{j'} = \delta(k_{j'}+1)} \bigg) f(\delta(k+i)) \\
&\quad = \sum_{i_{j'}=0}^{4} \bigg( \frac{\partial^{a_{j'}}}{\partial x_{j'}^{a_{j'}}} \alpha^{k_{j'}+1}_{k_{j'}+1+i_{j'}}(x_{j'}) \Big|_{x_{j'} = \delta(k_{j'}+1)} \bigg) f(\delta(k+i+e_{j'})).
\end{aligned}$$

Repeating the above procedure for all other elements of $\Theta(u)$, we see that

$$\begin{aligned}
\frac{\partial^{a}}{\partial x^{a}} F_{k}(x) \Big|_{x = \delta(k+u)} ={}& \sum_{\substack{i_{j}=0 \\ j \in \Theta(u)^{c}}}^{4} \bigg( \prod_{j \in \Theta(u)^{c}} \frac{\partial^{a_{j}}}{\partial x_{j}^{a_{j}}} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \Big|_{x_{j} = \delta(k_{j}+u_{j})} \bigg) \\
&\times \Bigg( \sum_{\substack{i_{j}=0 \\ j \in \Theta(u)}}^{4} \bigg( \prod_{j \in \Theta(u)} \frac{\partial^{a_{j}}}{\partial x_{j}^{a_{j}}} \alpha^{k_{j}+1}_{k_{j}+1+i_{j}}(x_{j}) \Big|_{x_{j} = \delta(k_{j}+1)} \bigg) f(\delta(k+i+e_{\Theta(u)})) \Bigg) \\
={}& \frac{\partial^{a}}{\partial x^{a}} F_{k+e_{\Theta(u)}}(x) \Big|_{x = \delta(k+u)},
\end{aligned}$$

which proves (56). It remains to prove the bound on $\big| \frac{\partial^{a}}{\partial x^{a}} F_{k}(x) \big|$ in (57). We know

$$\frac{\partial^{a}}{\partial x^{a}} F_{k}(x) = \sum_{i_{1}=0}^{4} \frac{\partial^{a_{1}}}{\partial x_{1}^{a_{1}}} \alpha^{k_{1}}_{k_{1}+i_{1}}(x_{1}) \cdots \sum_{i_{d}=0}^{4} \frac{\partial^{a_{d}}}{\partial x_{d}^{a_{d}}} \alpha^{k_{d}}_{k_{d}+i_{d}}(x_{d})\, f(\delta(k+i)).$$

By inspecting the form of the one-dimensional $P_{k}(\cdot)$ in (49), one can check that

$$\sum_{i_{d}=0}^{4} \frac{\partial^{a_{d}}}{\partial x_{d}^{a_{d}}} \alpha^{k_{d}}_{k_{d}+i_{d}}(x_{d})\, f(\delta(k+i)) = \delta^{-a_{d}} Q^{(d)}\Big( \frac{x_{d} - \delta k_{d}}{\delta} \Big),$$

where $Q^{(d)}(\cdot)$ is a $(7 - a_{d})$th order polynomial whose coefficients are independent of $\delta$ and depend on $f(\delta(k+i))$ only through

$$\Delta_{d}^{a_{d}} f\big(\delta(k + (i_{1}, \ldots, i_{d-1}, 0))\big), \quad \Delta_{d}^{a_{d}} f\big(\delta(k + (i_{1}, \ldots, i_{d-1}, 1))\big), \quad \ldots, \quad \Delta_{d}^{a_{d}} f\big(\delta(k + (i_{1}, \ldots, i_{d-1}, 4 - a_{d}))\big).$$

This implies in particular that

$$\Big| \sum_{i_{d}=0}^{4} \frac{\partial^{a_{d}}}{\partial x_{d}^{a_{d}}} \alpha^{k_{d}}_{k_{d}+i_{d}}(x_{d})\, f(\delta(k+i)) \Big| \leq C\delta^{-a_{d}} \Big( 1 + \Big| \frac{x_{d} - \delta k_{d}}{\delta} \Big| \Big)^{7 - a_{d}} \max_{0 \leq i_{d} \leq 4 - a_{d}} \big| \Delta_{d}^{a_{d}} f(\delta(k+i)) \big|.$$

We now consider

$$\sum_{i_{d-1}=0}^{4} \frac{\partial^{a_{d-1}}}{\partial x_{d-1}^{a_{d-1}}} \alpha^{k_{d-1}}_{k_{d-1}+i_{d-1}}(x_{d-1}) \bigg( \sum_{i_{d}=0}^{4} \frac{\partial^{a_{d}}}{\partial x_{d}^{a_{d}}} \alpha^{k_{d}}_{k_{d}+i_{d}}(x_{d})\, f(\delta(k+i)) \bigg).$$

When viewed as a one-dimensional function of $x_{d-1}$, the above is again a polynomial of order $(7 - a_{d-1})$ that depends on the quantity inside the parentheses only through

$$\Delta_{d-1}^{a_{d-1}} \bigg( \sum_{i_{d}=0}^{4} \frac{\partial^{a_{d}}}{\partial x_{d}^{a_{d}}} \alpha^{k_{d}}_{k_{d}+i_{d}}(x_{d})\, f(\delta(k+i)) \bigg),$$

with $i_{d-1} = 0, \ldots, 4 - a_{d-1}$. Hence,

$$\begin{aligned}
&\bigg| \sum_{i_{d-1}=0}^{4} \frac{\partial^{a_{d-1}}}{\partial x_{d-1}^{a_{d-1}}} \alpha^{k_{d-1}}_{k_{d-1}+i_{d-1}}(x_{d-1}) \bigg( \sum_{i_{d}=0}^{4} \frac{\partial^{a_{d}}}{\partial x_{d}^{a_{d}}} \alpha^{k_{d}}_{k_{d}+i_{d}}(x_{d})\, f(\delta(k+i)) \bigg) \bigg| \\
&\quad \leq C\delta^{-a_{d-1} - a_{d}} \Big( 1 + \Big| \frac{x_{d-1} - \delta k_{d-1}}{\delta} \Big| \Big)^{7 - a_{d-1}} \Big( 1 + \Big| \frac{x_{d} - \delta k_{d}}{\delta} \Big| \Big)^{7 - a_{d}} \max_{\substack{0 \leq i_{d} \leq 4 - a_{d} \\ 0 \leq i_{d-1} \leq 4 - a_{d-1}}} \big| \Delta_{d-1}^{a_{d-1}} \Delta_{d}^{a_{d}} f(\delta(k+i)) \big|.
\end{aligned}$$

Repeating this argument along each of the remaining $d - 2$ dimensions proves (57). $\square$

Appendix B: Interchange in Multiple Dimensions

In this section we generalize the interchange results to multiple dimensions. Section B.1 contains the result for unbounded domains, while Section B.2 considers the case when the domain is $\delta\mathbb{N}^{d}$.


B.1. Unbounded Domains

In this section we prove Proposition 1 by proving the more general Proposition 3 stated below. Consider a CTMC living on $\mathbb{Z}^{d}$ with generator given by (1). Just as we did in Section 2.3, we define $\beta_{\ell}(\delta k) = q_{\delta k, \delta(k+\ell)}$, but this time, $k, \ell \in \mathbb{Z}^{d}$. Then

$$G_{X} f(\delta k) = \sum_{\ell \in \mathbb{Z}^{d}} \beta_{\ell}(\delta k) \big( f(\delta(k+\ell)) - f(\delta k) \big), \quad k \in \mathbb{Z}^{d}.$$

Proposition 3.

Fix $f : \delta\mathbb{Z}^{d} \to \mathbb{R}$ and assume that $G_{X} f(\delta k)$ is defined on all of $\delta\mathbb{Z}^{d}$. Assume also that

$$\sum_{\ell \in \mathbb{Z}^{d}} \big| \beta_{\ell}(\delta k) \big( f(\delta(k+\ell)) - f(\delta k) \big) \big| < \infty, \quad k \in \mathbb{Z}^{d}, \tag{58}$$

which is trivially satisfied when the number of possible transitions from each state is finite. For $x \in \mathbb{R}^{d}$ define $k(x) \in \mathbb{Z}^{d}$ by $k_{i}(x) = \lfloor x_{i}/\delta \rfloor$. Then

$$A G_{X} f(x) = \sum_{\ell \in \mathbb{Z}^{d}} A\beta_{\ell}(x) \big( Af(x + \delta\ell) - Af(x) \big) + \varepsilon(x), \quad x \in \mathbb{R}^{d}, \tag{59}$$

where

$$\begin{aligned}
\varepsilon(x) ={}& \sum_{\ell \in \mathbb{Z}^{d}} \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}(x)}_{k_{j}(x)+i_{j}}(x_{j}) \bigg) \Big( \beta_{\ell}\big(\delta(k(x)+i)\big) - A\beta_{\ell}(x) \Big) \\
&\times \Big( f\big(\delta(k(x)+\ell+i)\big) - f\big(\delta(k(x)+i)\big) - \big( f\big(\delta(k(x)+\ell)\big) - f(\delta k(x)) \big) \Big).
\end{aligned} \tag{60}$$

Before we prove Proposition 3, let us reconcile the forms of $\varepsilon(x)$ in (60) above and in (18) of Proposition 1. When $d = 1$, (60) equals

$$\sum_{\ell \in \mathbb{Z}} \sum_{i=0}^{4} \alpha^{k(x)}_{k(x)+i}(x) \Big( \beta_{\ell}\big(\delta(k(x)+i)\big) - A\beta_{\ell}(x) \Big) \Big( f\big(\delta(k(x)+\ell+i)\big) - f\big(\delta(k(x)+i)\big) - \big( f\big(\delta(k(x)+\ell)\big) - f\big(\delta k(x)\big) \big) \Big).$$

Using a telescoping series, we see that if $\ell > 0$, then

$$\begin{aligned}
&f\big(\delta(k(x)+\ell+i)\big) - f\big(\delta(k(x)+i)\big) - \big( f\big(\delta(k(x)+\ell)\big) - f\big(\delta k(x)\big) \big) \\
&\quad = \sum_{j=0}^{i-1} \big( \Delta f(\delta(k(x)+j+\ell)) - \Delta f(\delta(k(x)+j)) \big) = \sum_{j=0}^{i-1} \sum_{m=0}^{\ell-1} \Delta^{2} f(\delta(k(x)+m+j)).
\end{aligned}$$

If $\ell < 0$, then

$$f\big(\delta(k(x)+\ell+i)\big) - f\big(\delta(k(x)+i)\big) - \big( f\big(\delta(k(x)+\ell)\big) - f\big(\delta k(x)\big) \big) = -\sum_{j=0}^{i-1} \sum_{m=\ell}^{-1} \Delta^{2} f(\delta(k(x)+m+j)).$$

Therefore, Proposition 3 implies Proposition 1. When $d > 1$, it is also possible to write (60) as a telescoping series of second-order differences of $f(\delta k)$. We leave this as an exercise in algebra for the interested reader, because it is notationally messy.

Proof of Proposition 3

Fix $x \in \mathbb{R}^{d}$. We will write $k$ instead of $k(x)$ for convenience. Recall from Theorem 3 that for any function $f : \delta\mathbb{Z}^{d} \to \mathbb{R}$,

$$Af(x) = \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \bigg) f\big(\delta(k+i)\big).$$

It follows that

$$\begin{aligned}
A G_{X} f(x) ={}& \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \bigg) \sum_{\ell \in \mathbb{Z}^{d}} \beta_{\ell}\big(\delta(k+i)\big) \Big( f\big(\delta(k+\ell+i)\big) - f\big(\delta(k+i)\big) \Big) \\
={}& \sum_{\ell \in \mathbb{Z}^{d}} A\beta_{\ell}(x) \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \bigg) \Big( f\big(\delta(k+\ell+i)\big) - f\big(\delta(k+i)\big) \Big) && (61) \\
&+ \sum_{\ell \in \mathbb{Z}^{d}} \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \bigg) \Big( \beta_{\ell}\big(\delta(k+i)\big) - A\beta_{\ell}(x) \Big) \Big( f\big(\delta(k+\ell+i)\big) - f\big(\delta(k+i)\big) \Big). && (62)
\end{aligned}$$

In the second equality, interchanging the summations is allowed by our assumption that $\sum_{\ell \in \mathbb{Z}^{d}} | \beta_{\ell}(\delta k)( f(\delta(k+\ell)) - f(\delta k) ) | < \infty$ and the Fubini-Tonelli theorem. Looking at (61), observe that for each $\ell \in \mathbb{Z}^{d}$,

$$\begin{aligned}
&\sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \bigg) \Big( f\big(\delta(k+\ell+i)\big) - f\big(\delta(k+i)\big) \Big) \\
&\quad = \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \bigg) f\big(\delta(k+\ell+i)\big) - Af(x) \\
&\quad = \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}+\ell_{j}}_{k_{j}+\ell_{j}+i_{j}}(x_{j} + \delta\ell_{j}) \bigg) f\big(\delta(k+\ell+i)\big) - Af(x) \\
&\quad = Af(x + \delta\ell) - Af(x),
\end{aligned}$$

where in the second equality we used the translation invariance property of the weights stated in (15) of Theorem 1. Moving on, we see that (62) equals

$$\begin{aligned}
&\sum_{\ell \in \mathbb{Z}^{d}} \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \bigg) \Big( \beta_{\ell}\big(\delta(k+i)\big) - A\beta_{\ell}(x) \Big) \Big( f\big(\delta(k+\ell+i)\big) - f\big(\delta(k+i)\big) - \big( f\big(\delta(k+\ell)\big) - f(\delta k) \big) \Big) \\
&\quad + \sum_{\ell \in \mathbb{Z}^{d}} \Big( f\big(\delta(k+\ell)\big) - f(\delta k) \Big) \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \bigg) \Big( \beta_{\ell}\big(\delta(k+i)\big) - A\beta_{\ell}(x) \Big).
\end{aligned}$$

The second line above equals zero because (14) of Theorem 1 implies $\sum_{i_{1},\ldots,i_{d}=0}^{4} \big( \prod_{j=1}^{d} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \big) = 1$ and because $A\beta_{\ell}(x) = \sum_{i_{1},\ldots,i_{d}=0}^{4} \big( \prod_{j=1}^{d} \alpha^{k_{j}}_{k_{j}+i_{j}}(x_{j}) \big) \beta_{\ell}\big(\delta(k+i)\big)$ by definition. Therefore, (62) equals $\varepsilon(x)$. $\square$

B.2. A Bounded Domain

In this section, we prove Proposition 2 by proving the more general Proposition 4 stated below. We focus on the case when the CTMC takes values in $\delta\mathbb{N}^{d}$. Assume that our CTMC has generator

$$G_{X} f(\delta k) = \sum_{\ell \in \mathbb{Z}^{d}} \beta_{\ell}(\delta k) \big( f(\delta(k+\ell)) - f(\delta k) \big), \quad k \in \mathbb{N}^{d}.$$

Fix $h : \delta\mathbb{N}^{d} \to \mathbb{R}$ such that

$$G_{X} f_{h}(\delta k) = \mathbb{E} h(X) - h(\delta k), \quad k \in \mathbb{N}^{d},$$

has a finite solution $f_{h}(\delta k)$. Assume there exists some vector $L \in \mathbb{N}^{d}$ such that $\beta_{\ell}(\delta k) = 0$ for all $k \in \mathbb{N}^{d}$ if $\ell_{j} < -L_{j}$ for some $1 \leq j \leq d$. The definition of $L$ means that $\ell \geq -L$ is a necessary condition for $\beta_{\ell}(\delta k)$ to be non-zero for some $k$, so

$$G_{X} f(\delta k) = \sum_{\ell \geq -L} \beta_{\ell}(\delta k) \big( f(\delta(k+\ell)) - f(\delta k) \big), \quad k \in \mathbb{N}^{d}. \tag{63}$$

Define $k(x)$ by $k_{i}(x) = \lfloor x_{i}/\delta \rfloor$. Recalling the form of the multidimensional interpolator from (51) of Theorem 3, we define

$$\widetilde{A}\beta_{\ell}(x) = \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j}(x) \vee L_{j}}_{k_{j}(x) \vee L_{j} + i_{j}}(x_{j}) \bigg) \beta_{\ell}\big(\delta(k(x) \vee L + i)\big), \quad x \in \mathbb{R}^{d}, \ \ell \geq -L,$$

where $k(x) \vee L$ is understood to be the element-wise maximum. Furthermore, define

$$\widehat{f}_{h}(\delta k) = \sum_{i_{1},\ldots,i_{d}=0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_{j} \vee 0}_{k_{j} \vee 0 + i_{j}}(\delta k_{j}) \bigg) f_{h}\big(\delta(k \vee 0 + i)\big), \quad k \in \mathbb{Z}^{d},$$

to be the extension of $f_{h}(\cdot)$ to all of $\delta\mathbb{Z}^{d}$. Let $J(k) = \{ j : k_{j} < L_{j} \}$ and $J(k)^{c} = \{1, \ldots, d\} \setminus J(k)$. Recall that $e$ is the vector of ones and that for any $x \in \mathbb{R}^{d}$ and any set $J \subset \{1, \ldots, d\}$, we write $x_{J}$ to denote the vector whose $i$th element equals $x_{i} 1(i \in J)$. The following generalizes Proposition 2.

Proposition 4.

Consider the CTMC deﬁned by the generator in (63) . Assume that X ℓ ≥− L | β ℓ ( δk )( f h ( δ ( k + ℓ )) − f h ( δk )) | < ∞ , k ∈ N d , (64) which is trivially satisﬁed when the number of possible transitions from each state is ﬁnite. Then AG X f h ( x ) = X ℓ ≥− L f Aβ ℓ ( x )( A b f h ( x + δℓ ) − A b f h ( x )) + e ε ( x ) + ε h ( x ) + ε f ( x ) , x ∈ R d + . Letting k = k ( x ) , the error terms satisfy e ε ( x ) = X ℓ ≥− L X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19)(cid:16) β ℓ (cid:0) δ ( k ∨ L + i ) (cid:1) − f Aβ ℓ ( x ) (cid:17) × (cid:16) f h (cid:0) δ ( k ∨ L + ℓ + i ) (cid:1) − f h (cid:0) δ ( k ∨ L + i ) (cid:1) − (cid:0) f h (cid:0) δ ( k ∨ L + ℓ ) (cid:1) − f h ( δ ( k ∨ L )) (cid:1)(cid:17) , | ε h ( x ) | ≤ J ( k ) = ∅ ) C ( L, d ) max ≤ i ≤ ek ≤ m ≤ Lj ∈ J ( k ) (cid:12)(cid:12) ∆ j h ( δ ( k J ( k ) c + i + m J ( k ) )) (cid:12)(cid:12) | ε f ( x ) | ≤ J ( k ) = ∅ ) C ( L, d ) X ℓ ≥− L (cid:12)(cid:12) f Aβ ℓ ( x ) (cid:12)(cid:12)(cid:16) max ≤ i ≤ ek ≤ m ≤ Lj ∈ J ( k ) (cid:12)(cid:12) ∆ j f h ( δ ( k J ( k ) c + i + m J ( k ) )) (cid:12)(cid:12) + max ≤ i ≤ ek + ℓ ≤ m ≤ L + ℓj ∈ J ( k ) (cid:12)(cid:12)(cid:12) ∆ j f h (cid:16) δ ( k + ℓ ) J ( k ) c + δm J ( k ) ∨ i (cid:17)(cid:12)(cid:12)(cid:12) (cid:17) . To see why Proposition 4 implies Proposition 2, note that { J ( k ) = ∅} = { ≤ k < L } when d = 1, sothe bounds on | ε h ( x ) | and | ε f ( x ) | in Proposition 4 imply the corresponding bounds in Proposition 2.To prove Proposition 4 we need the following two auxiliary results. They are proved in Sections B.2.1and B.2.2, respectively. Lemma 6.

Fix L ′ ∈ N d and for k ′ ∈ Z d deﬁne J ′ ( k ′ ) = { j : k ′ j < L ′ j } . There exists a constant C ( L ′ , d ) > such that for any function f : δ Z d → R , any x ∈ R d , and any k ′ ∈ Z d with k ′ ≥ − L ′ , (cid:12)(cid:12)(cid:12)(cid:12) X i ,...,i d =0 (cid:18) d Y j =1 α k ′ j ∨ L ′ j k ′ j ∨ L ′ j + i j ( x j ) (cid:19) f (cid:0) δ ( k ′ ∨ L ′ + i ) (cid:1) − X i ,...,i d =0 (cid:18) d Y j =1 α k ′ j k ′ j + i j ( x j ) (cid:19) f (cid:0) δ ( k ′ + i ) (cid:1)(cid:12)(cid:12)(cid:12)(cid:12) ≤ J ′ ( k ′ ) = ∅ ) C ( L ′ , d ) (cid:0) | x − δ ( k ′ ∨ L ′ ) | /δ (cid:1) max ≤ i ≤ ek ′ ≤ m ≤ L ′ j ∈ J ′ ( k ′ ) (cid:12)(cid:12) ∆ j f ( δ ( k ′ J ′ ( k ′ ) c ( x ) + i + m J ′ ( k ′ ) )) (cid:12)(cid:12) . (65) Lemma 7.

Given f : δ N d → R , deﬁne b f ( δk ) = X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ k j ∨ i j ( δk j ) (cid:19) f ( δ ( k ∨ i )) , k ∈ Z d . Then for any a ∈ N d with ≤ k a k ≤ , (cid:12)(cid:12) ∆ a b f ( δk ) (cid:12)(cid:12) ≤ C (cid:0) | k ∧ | (cid:1) max ≤ i ≤ e (cid:12)(cid:12) ∆ a f (cid:0) δ (( k ∨

0) + i )) (cid:1)(cid:12)(cid:12) , k ∈ Z d . (66) raverman: Prelimit generator comparison approach Article submitted to


Proof of Proposition 4

Throughout the proof we write k instead of k ( x ) for notational conve-nience. Assume that x ≥ δL or, equivalently, J ( k ) = ∅ . Then AG X f h ( x ) = X i ,...,i d =0 (cid:18) d Y j =1 α k j k j + i j ( x j ) (cid:19) X ℓ ≥− L β ℓ (cid:0) δ ( k + i ) (cid:1)(cid:16) f h (cid:0) δ ( k + i + ℓ ) (cid:1) − f h (cid:0) δ ( k + i ) (cid:1)(cid:17) . The proof of Proposition 3 can be repeated to show that AG X f h ( x ) = X ℓ ≥− L Aβ ℓ ( x )( Af h ( x + δℓ ) − Af h ( x )) + ε ( x ) , where ε ( x ) is as in Proposition 3. From the deﬁnitions of e ε ( x ), f Aβ ℓ ( x ), A b f h ( x ), and A b f h ( x + δℓ ), itfollows that they equal ε ( x ), Aβ ℓ ( x ), Af h ( x ), and Af h ( x + δℓ ), respectively. Therefore, AG X f h ( x ) = X ℓ ≥− L f Aβ ℓ ( x )( A b f h ( x + δℓ ) − A b f h ( x )) + e ε ( x ) , J ( k ) = ∅ . We now handle the more involved case when J ( k ) = ∅ . Recall that Ah ( x ) = X i ,...,i d =0 (cid:18) d Y j =1 α k j k j + i j ( x j ) (cid:19) h (cid:0) δ ( k + i ) (cid:1) and deﬁne f Ah ( x ) = X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19) h (cid:0) δ ( k ∨ L + i ) (cid:1) , x ∈ R d + . (67)Setting ε h ( x ) = f Ah ( x ) − Ah ( x ), we have AG X f h ( x ) = E h ( X ) − Ah ( x ) = E h ( X ) − f Ah ( x ) + ε h ( x ) . Using Lemma 6 with L ′ and k ′ there being equal to L and k , respectively, we get | ε h ( x ) | ≤ J ( k ) = ∅ ) C ( L, d ) (cid:0) | x − δ ( k ∨ L ) | /δ (cid:1) max ≤ i ≤ e ≤ m ≤ Lj ∈ J ( k ) (cid:12)(cid:12) ∆ j h ( δ ( k J ( k ) c + i + m J ( k ) )) (cid:12)(cid:12) ≤ J ( k ) = ∅ ) C ( L, d ) max ≤ i ≤ ek ≤ m ≤ Lj ∈ J ( k ) (cid:12)(cid:12) ∆ j h ( δ ( k J ( k ) c + i + m J ( k ) )) (cid:12)(cid:12) . The second inequality follows from the facts that | x j − δk j | /δ = | x j − δk j ( x ) | /δ ≤ j suchthat k j ≥ L j , and | x j − δL j | /δ ≤ L j for those j where k j < L j because 0 ≤ x j < δL j . This proves thebound on | ε h ( x ) | from the statement of the proposition. 
Next, we observe that E h ( X ) − f Ah ( x )= X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19)(cid:16) E h ( X ) − h (cid:0) δ ( k ∨ L + i ) (cid:1)(cid:17) raverman: Prelimit generator comparison approach

Article submitted to

Stochastic Systems ; manuscript no. X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19) G X f h (cid:0) δ ( k ∨ L + i ) (cid:1) = X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19) X ℓ ≥− L β ℓ (cid:0) δ ( k ∨ L + i ) (cid:1)(cid:16) f h (cid:0) δ ( k ∨ L + ℓ + i ) (cid:1) − f h (cid:0) δ ( k ∨ L + i ) (cid:1)(cid:17) , where in the ﬁrst equality we used (14) of Theorem 1; i.e., the weights sum to one. The above equals X ℓ ≥− L f Aβ ℓ ( x ) X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19)(cid:16) f h (cid:0) δ ( k ∨ L + ℓ + i ) (cid:1) − f h (cid:0) δ ( k ∨ L + i ) (cid:1)(cid:17) (68)+ X ℓ ≥− L X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19)(cid:16) β ℓ (cid:0) δ ( k ∨ L + i ) (cid:1) − f Aβ ℓ ( x ) (cid:17) × (cid:16) f h (cid:0) δ ( k ∨ L + ℓ + i ) (cid:1) − f h (cid:0) δ ( k ∨ L + i ) (cid:1)(cid:17) . (69)By repeating the argument from the proof of Proposition 3 that we used to show that (62) equals ε ( x ), one can check that (69) equals e ε ( x ). Lastly, we deﬁne ε f ( x ) = X ℓ ≥− L f Aβ ℓ ( x ) (cid:18) X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19) f h (cid:0) δ ( k ∨ L + ℓ + i ) (cid:1) − A b f h (cid:0) x + δℓ (cid:1)(cid:19) + X ℓ ≥− L f Aβ ℓ ( x ) (cid:18) X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19) f h (cid:0) δ ( k ∨ L + i ) (cid:1) − A b f h ( x ) (cid:19) , and arrive at AG X f h ( x ) = X ℓ ∈ Z d f Aβ ℓ ( x )( A b f h ( x + δℓ ) − A b f h ( x )) + e ε ( x ) + ε h ( x ) + ε f ( x ) , x ∈ R d + . It remains to verify the bound on | ε f ( x ) | . Note that A b f ( x ) = Af ( x ) when x ≥

0, so (cid:12)(cid:12)(cid:12)(cid:12) X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19) f h (cid:0) δ ( k ∨ L + i ) (cid:1) − A b f h ( x ) (cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12) X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19) f h (cid:0) δ ( k ∨ L + i ) (cid:1) − X i ,...,i d =0 (cid:18) d Y j =1 α k j k j + i j ( x j ) (cid:19) f h ( δ ( k + i )) (cid:12)(cid:12)(cid:12)(cid:12) ≤ J ( k ) = ∅ ) C ( L, d ) max ≤ i ≤ ek ≤ m ≤ Lj ∈ J ( k ) (cid:12)(cid:12) ∆ j f h ( δ ( k J ( k ) c + i + m J ( k ) )) (cid:12)(cid:12) , where the inequality follows from using Lemma 6 with L ′ = L and k ′ = k . Similarly, for any ℓ ≥ − L , (cid:12)(cid:12)(cid:12)(cid:12) X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19) f h (cid:0) δ ( k ∨ L + ℓ + i ) (cid:1) − A b f h (cid:0) x + δℓ (cid:1)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12) X i ,...,i d =0 (cid:18) d Y j =1 α ( k j + ℓ j ) ∨ ( L j + ℓ j )( k j + ℓ j ) ∨ ( L j + ℓ j )+ i j ( x j + δℓ j ) (cid:19) b f h (cid:0) δ (( k + ℓ ) ∨ ( L + ℓ ) + i ) (cid:1) − X i ,...,i d =0 (cid:18) d Y j =1 α k j + ℓ j k j + ℓ j + i j ( x j + δℓ j ) (cid:19) b f h ( δ ( k + ℓ + i )) (cid:12)(cid:12)(cid:12)(cid:12) , raverman: Prelimit generator comparison approach Article submitted to

Stochastic Systems ; manuscript no. where the equality follows from the translational invariance property (15) of Theorem 1 and thefact that b f h ( δk ) = f h ( δk ) for k ≥

0. Using Lemma 6 with L ′ = L + ℓ ≥ k ′ = k + ℓ ≥ − L ′ , andnoting that J ′ ( k ′ ) = { j : k j + ℓ j < L j + ℓ j } = { j : k j < L j } = J ( k ), we see that the quantity above isbounded by 1( J ( k ) = ∅ ) C ( L, d ) max ≤ i ≤ ek + ℓ ≤ m ≤ L + ℓj ∈ J ( k ) (cid:12)(cid:12)(cid:12) ∆ j b f h ( δ (( k + ℓ ) J ( k ) c + i + m J ( k ) )) (cid:12)(cid:12)(cid:12) . Using Lemma 7, we bound the above by1( J ( k ) = ∅ ) C ( L, d ) max ≤ i,i ′ ≤ ek + ℓ ≤ m ≤ L + ℓj ∈ J ( k ) (cid:12)(cid:12)(cid:12) ∆ j f h (cid:16) δ (cid:0) ( k + ℓ ) J ( k ) c + i + m J ( k ) (cid:1) ∨ i ′ (cid:17)(cid:12)(cid:12)(cid:12) ≤ J ( k ) = ∅ ) C ( L, d ) max ≤ i ≤ ek + ℓ ≤ m ≤ L + ℓj ∈ J ( k ) (cid:12)(cid:12)(cid:12) ∆ j f h (cid:16) δ ( k + ℓ ) J ( k ) c + δm J ( k ) ∨ i (cid:17)(cid:12)(cid:12)(cid:12) . The inequality is true because ( k + ℓ ) J ( k ) c ≥ J ( k ). (cid:3) B.2.1. Proof of Lemma 6

Proving Lemma 6 requires yet another technical lemma.

Lemma 8.

Suppose $f : \delta\mathbb{Z} \to \mathbb{R}$ and recall that we introduced $P_{k}(x) = \sum_{i=0}^{4} \alpha^{k}_{k+i}(x) f\big(\delta(k+i)\big)$ in Section 2.1. For any $u, \bar{u} \in \mathbb{Z}$ with $u < \bar{u}$, there exists a constant $C = C(\bar{u} - u) > 0$ such that

$$| P_{\bar{u}}(x) - P_{u}(x) | \leq C \sum_{j=0}^{\bar{u} - u} \big| \Delta^{4} f(\delta(u+j)) \big|, \quad x \in [\delta u, \delta\bar{u}]. \tag{70}$$

We now prove Lemma 6 and then prove Lemma 8.

Proof of Lemma 6

We used L ′ , k ′ , and J ′ ( k ′ ) in the statement of Lemma 6 to distinguish thevariables from L , k , and J ( k ) in that section. To keep this proof cleaner, we drop the primes anduse L , k , and J ( k ) to represent L ′ , k ′ and J ′ ( k ′ ), respectively. When d = 1, the bound followsimmediately from Lemma 8. Indeed, if − L ≤ k < L , then (cid:12)(cid:12)(cid:12) X i =0 α LL + i ( x ) f (cid:0) δ ( L + i ) (cid:1) − X i =0 α kk + i ( x ) h (cid:0) δ ( k + i ) (cid:1)(cid:12)(cid:12)(cid:12) ≤ C L − k X j =0 (cid:12)(cid:12) ∆ f ( δ ( k + j )) (cid:12)(cid:12) ≤ C ( L ) max k ≤ m ≤ L (cid:12)(cid:12) ∆ f ( δm ) (cid:12)(cid:12) follows from Lemma 8 with k in place of u and L in place of ¯ u .Proving the result when d > j ∈ J ( k ), and theonly added complication is the notational bookkeeping involved. To begin, note that X i ,...,i d =0 (cid:18) d Y j =1 α k j ∨ L j k j ∨ L j + i j ( x j ) (cid:19) f (cid:0) δ ( k ∨ L + i ) (cid:1) − X i ,...,i d =0 (cid:18) d Y j =1 α k j k j + i j ( x j ) (cid:19) f (cid:0) δ ( k + i ) (cid:1) raverman: Prelimit generator comparison approach

Article submitted to

Stochastic Systems ; manuscript no. X i j =0 j ∈ J ( k ) c (cid:18) Y j ∈ J ( k ) c α k j k j + i j ( x j ) (cid:19) × X i j =0 j ∈ J ( k ) (cid:18)(cid:16) Y j ∈ J ( k ) α L j L j + i j ( x j ) (cid:17) f (cid:0) δ ( k ∨ L + i ) (cid:1) − (cid:16) Y j ∈ J ( k ) α k j k j + i j ( x j ) (cid:17) f (cid:0) δ ( k + i ) (cid:1)(cid:19) . (71)Theorem 3 states that the weights α k j k j + i j ( x j ) are degree-7 polynomials in ( x j − δk j ) /δ and thereforethere exists a constant C > (cid:12)(cid:12)(cid:12) α k j k j + i j ( x j ) (cid:12)(cid:12)(cid:12) ≤ C (cid:0) | x j − δk j | /δ (cid:1) = C (cid:0) | x j − δ ( k j ∨ L j ) | /δ (cid:1) , j ∈ J ( k ) c , (72)We bound the interior sum in (71) by writing it as a telescoping series so that Lemma 8 can beapplied to each term in the series. First, we ﬁx those elements i j for which j ∈ J ( k ) c . Next, we let J = | J ( k ) | be the size of J ( k ), and let η (1) < η (2) < . . . < η ( J ) ∈ { , . . . , d } be the elements of J ( k ).Deﬁne u (0) = k ∨ L, u ( J ) = k , and for 0 < ℓ < J , deﬁne u ( ℓ ) = ( u ( ℓ ) , . . . , u d ( ℓ )) by u j ( ℓ ) =  k j , j J ( k ) ,k j , j ∈ J ( k ) and j ≤ η ( ℓ ) ,L j , j ∈ J ( k ) and j > η ( ℓ ) . It follows that X i j =0 j ∈ J ( k ) (cid:18)(cid:16) Y j ∈ J ( k ) α L j L j + i j ( x j ) (cid:17) f (cid:0) δ ( k ∨ L + i ) (cid:1) − (cid:16) Y j ∈ J ( k ) α k j k j + i j ( x j ) (cid:17) f (cid:0) δ ( k + i ) (cid:1)(cid:19) = X i j =0 j ∈ J ( k ) (cid:18)(cid:16) Y j ∈ J ( k ) α u j (0) u j (0)+ i j ( x j ) (cid:17) f (cid:0) δ ( u (0) + i ) (cid:1) − (cid:16) Y j ∈ J ( k ) α u j ( J ) u j ( J )+ i j ( x j ) (cid:17) f (cid:0) δ ( u ( J ) + i ) (cid:1)(cid:19) = J X ℓ =1 4 X i j =0 j ∈ J ( k ) (cid:18)(cid:16) Y j ∈ J ( k ) α u j ( ℓ − u j ( ℓ − i j ( x j ) (cid:17) f (cid:0) δ ( u ( ℓ −

1) + i ) (cid:1) − (cid:16) Y j ∈ J ( k ) α u j ( ℓ ) u j ( ℓ )+ i j ( x j ) (cid:17) f (cid:0) δ ( u ( ℓ ) + i ) (cid:1)(cid:19) . (73)Now ﬁx some ℓ between 1 and J , and consider X i j =0 j ∈ J ( k ) (cid:18)(cid:16) Y j ∈ J ( k ) α u j ( ℓ − u j ( ℓ − i j ( x j ) (cid:17) f (cid:0) δ ( u ( ℓ −

1) + i ) (cid:1) − (cid:16) Y j ∈ J ( k ) α u j ( ℓ ) u j ( ℓ )+ i j ( x j ) (cid:17) f (cid:0) δ ( u ( ℓ ) + i ) (cid:1)(cid:19) = X i j =0 j ∈ J ( k ) (cid:18)(cid:16) Y j ∈ J ( k ) j = η ( ℓ ) α u j ( ℓ − u j ( ℓ − i j ( x j ) (cid:17) α u η ( ℓ ) ( ℓ − u η ( ℓ ) ( ℓ − i η ( ℓ ) ( x η ( ℓ ) ) f (cid:0) δ ( u ( ℓ −

1) + i ) (cid:1) − (cid:16) Y j ∈ J ( k ) j = η ( ℓ ) α u j ( ℓ ) u j ( ℓ )+ i j ( x j ) (cid:17) α u η ( ℓ ) ( ℓ ) u η ( ℓ ) ( ℓ )+ i η ( ℓ ) ( x η ( ℓ ) ) f (cid:0) δ ( u ( ℓ ) + i ) (cid:1)(cid:19) . raverman: Prelimit generator comparison approach Article submitted to


By definition, $u_j(\ell-1) = u_j(\ell)$ for all $j \neq \eta(\ell)$. Also, $u_{\eta(\ell)}(\ell-1) = L_{\eta(\ell)}$, and $u_{\eta(\ell)}(\ell) = k_{\eta(\ell)}$. Therefore, the term above equals

$$
\begin{aligned}
&\sum_{\substack{i_j = 0 \\ j \in J(k),\ j \neq \eta(\ell)}}^{4} \Big( \prod_{\substack{j \in J(k) \\ j \neq \eta(\ell)}} \alpha^{u_j(\ell)}_{u_j(\ell)+i_j}(x_j) \Big) \\
&\quad \times \bigg( \sum_{i_{\eta(\ell)}=0}^{4} \alpha^{L_{\eta(\ell)}}_{L_{\eta(\ell)}+i_{\eta(\ell)}}(x_{\eta(\ell)}) f\big(\delta(u(\ell-1)+i)\big) - \sum_{i_{\eta(\ell)}=0}^{4} \alpha^{k_{\eta(\ell)}}_{k_{\eta(\ell)}+i_{\eta(\ell)}}(x_{\eta(\ell)}) f\big(\delta(u(\ell)+i)\big) \bigg).
\end{aligned} \tag{74}
$$

Since $\alpha^{u_j(\ell)}_{u_j(\ell)+i_j}(x_j)$ are degree-7 polynomials in $(x_j - \delta u_j(\ell))/\delta$, and $-L_j \le k_j < L_j$ for all $j \in J(k)$, there exists a constant $C(L) > 0$ such that for all $j \in J(k)$,

$$
\Big| \alpha^{u_j(\ell)}_{u_j(\ell)+i_j}(x_j) \Big| \le C \big( 1 + |x_j - \delta u_j(\ell)|/\delta \big)^{7} \le C(L) \big( 1 + |x_j - \delta L_j|/\delta \big)^{7} = C(L) \big( 1 + |x_j - \delta (k_j \vee L_j)|/\delta \big)^{7}. \tag{75}
$$

To bound the interior sum in (74), we apply Lemma 8, with $k_{\eta(\ell)}$ in place of $u$ and $L_{\eta(\ell)}$ in place of $\bar u$, to get

$$
\begin{aligned}
&\bigg| \sum_{i_{\eta(\ell)}=0}^{4} \alpha^{L_{\eta(\ell)}}_{L_{\eta(\ell)}+i_{\eta(\ell)}}(x_{\eta(\ell)}) f\big(\delta(u(\ell-1)+i)\big) - \sum_{i_{\eta(\ell)}=0}^{4} \alpha^{k_{\eta(\ell)}}_{k_{\eta(\ell)}+i_{\eta(\ell)}}(x_{\eta(\ell)}) f\big(\delta(u(\ell)+i)\big) \bigg| \\
&\quad \le C(L) \sum_{m=0}^{L_{\eta(\ell)} - k_{\eta(\ell)}} \Big| \Delta^{4}_{\eta(\ell)} f\big( \delta( u(\ell) + i_{\{1,\ldots,d\} \setminus \{\eta(\ell)\}} + m e_{\{\eta(\ell)\}} ) \big) \Big| \\
&\quad \le C(L) \max_{\substack{0 \le i \le 4e \\ k \le m \le L \\ j \in J(k)}} \Big| \Delta^{4}_{j} f\big( \delta( k_{J(k)^c} + i + m_{J(k)} ) \big) \Big|.
\end{aligned}
$$

The last inequality is there to make things simpler by having a uniform upper bound that does not depend on $\ell$. Combining the upper bound above with (72) and (75) proves the result. $\square$

Proof of Lemma 8

Let us first assume that $\bar u = u + 1$. We have already shown in (50) of the proof of Theorem 1 that

$$
\frac{\partial^{v}}{\partial x^{v}} P_{\bar u}(x) \Big|_{x = \delta \bar u} = \frac{\partial^{v}}{\partial x^{v}} P_{u}(x) \Big|_{x = \delta \bar u}, \quad \text{for } v = 0, 1, 2, 3.
$$

By performing a fourth-order Taylor expansion around the point $x = \delta \bar u$, we see that

$$
| P_{\bar u}(x) - P_{u}(x) | \le C \delta^{4} \bigg( \sup_{x \in [\delta u, \delta \bar u]} \Big| \frac{\partial^{4}}{\partial x^{4}} P_{\bar u}(x) \Big| + \sup_{x \in [\delta u, \delta \bar u]} \Big| \frac{\partial^{4}}{\partial x^{4}} P_{u}(x) \Big| \bigg), \quad x \in [\delta u, \delta \bar u].
$$

Since $\alpha^{k}_{k+i}(x)$ are degree-7 polynomials in $(x - \delta k)/\delta$ whose coefficients do not depend on $k$ or $\delta$, it follows that

$$
\Big| \frac{\partial^{4}}{\partial x^{4}} P_{\bar u}(x) \Big| \le C \delta^{-4} \Big( 1 + \Big| \frac{x - \delta \bar u}{\delta} \Big| \Big)^{3} \big| \Delta^{4} f(\delta \bar u) \big|, \qquad
\Big| \frac{\partial^{4}}{\partial x^{4}} P_{u}(x) \Big| \le C \delta^{-4} \Big( 1 + \Big| \frac{x - \delta u}{\delta} \Big| \Big)^{3} \big| \Delta^{4} f(\delta u) \big|,
$$

so that

$$
| P_{\bar u}(x) - P_{u}(x) | \le C \Big( \big| \Delta^{4} f(\delta \bar u) \big| + \big| \Delta^{4} f(\delta u) \big| \Big), \quad x \in [\delta u, \delta \bar u].
$$

The general case $\bar u > u + 1$ follows from the triangle inequality:

$$
| P_{\bar u}(x) - P_{u}(x) | \le \sum_{j=1}^{\bar u - u} \bigg| \sum_{i=0}^{4} \alpha^{u+j}_{u+j+i}(x) f\big(\delta(u+j+i)\big) - \sum_{i=0}^{4} \alpha^{u+j-1}_{u+j-1+i}(x) f\big(\delta(u+j-1+i)\big) \bigg| \le C \sum_{j=0}^{\bar u - u} \big| \Delta^{4} f(\delta(u+j)) \big|. \qquad \square
$$

B.2.2. Proof of Lemma 7
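The proof of Lemma 7 rests on writing a finite difference as a weighted integral of the corresponding derivative, as in (77). The following is a minimal numerical sketch of that idea for the second difference only, using the elementary identity $\Delta^2 g(x) = \int_0^{2\delta} (\delta - |u - \delta|)\, g''(x+u)\, du$, whose hat-shaped kernel is bounded by $\delta$. The function names, the test function $\sin$, and the midpoint quadrature are illustrative assumptions and are not taken from the paper.

```python
import math


def second_difference(g, x, delta):
    """Second forward difference of g on a grid with spacing delta."""
    return g(x + 2 * delta) - 2 * g(x + delta) + g(x)


def hat_kernel_integral(g_second, x, delta, panels=2000):
    """Midpoint-rule value of int_0^{2 delta} (delta - |u - delta|) g''(x + u) du.

    An even panel count places the kernel's kink at u = delta on a panel edge,
    so every panel sees a smooth integrand.
    """
    width = 2 * delta / panels
    total = 0.0
    for m in range(panels):
        u = (m + 0.5) * width
        total += (delta - abs(u - delta)) * g_second(x + u)
    return total * width


if __name__ == "__main__":
    x, delta = 0.3, 0.1
    lhs = second_difference(math.sin, x, delta)
    rhs = hat_kernel_integral(lambda y: -math.sin(y), x, delta)
    print(abs(lhs - rhs))  # small, limited only by the quadrature error
```

The kernel here is the convolution of two indicator functions of $[0, \delta]$, which is what one obtains by applying the fundamental theorem of calculus twice; higher-order differences lead to smoother kernels of the same type.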

Proof of Lemma 7

Note that

$$
\hat f(\delta k) = \sum_{i_1, \ldots, i_d = 0}^{4} \bigg( \prod_{j=1}^{d} \alpha^{k_j \vee 0}_{(k_j \vee 0) + i_j}(\delta k_j) \bigg) f\big(\delta((k \vee 0) + i)\big), \quad k \in \mathbb{Z}^d,
$$

can be interpreted as the restriction of $A \hat f(x)$ to $x \in \delta \mathbb{Z}^d$. Since $A \hat f(x) = F_{k(x) \vee 0}(x)$, where $F_{k(x) \vee 0}(x)$ is as in (55), the bound in (57) of Lemma 5 implies

$$
\big| \hat f(\delta k) \big| \le C \big( 1 + |k \wedge 0| \big)^{7d} \max_{0 \le i \le 4e} \big| f(\delta((k \vee 0) + i)) \big|.
$$

Now assume $0 < \|a\|_1 \le 4$. We claim that

$$
\big| \Delta^{a} \hat f(\delta k) \big| \le C \delta^{\|a\|_1} \int_{[\delta k, \delta(k+a)]} \Big| \frac{\partial^{a}}{\partial x^{a}} A \hat f(x) \Big| \, dx. \tag{76}
$$

Then we can again use the fact that $A \hat f(x) = F_{k(x) \vee 0}(x)$ together with (57) of Lemma 5 to conclude that the quantity above is bounded by

$$
\begin{aligned}
&\sup_{x \in [\delta k, \delta(k+a)]} C \bigg( \prod_{j=1}^{d} \Big( 1 + \Big| \frac{x_j - \delta(k_j(x) \vee 0)}{\delta} \Big| \Big)^{7 - a_j} \bigg) \max_{\substack{0 \le i_j \le 4 - a_j \\ j = 1, \ldots, d}} \big| \Delta^{a} f(\delta((k(x) \vee 0) + i)) \big| \\
&\quad \le C \bigg( \prod_{j=1}^{d} \big( 1 + |k_j \wedge 0| \big)^{7 - a_j} \bigg) \max_{\substack{0 \le i \le 4e - a \\ k \le m \le k + a}} \big| \Delta^{a} f(\delta((m \vee 0) + i)) \big| \\
&\quad \le C \big( 1 + |k \wedge 0| \big)^{7d} \max_{0 \le i \le 4e} \big| \Delta^{a} f(\delta((k \vee 0) + i)) \big|.
\end{aligned}
$$

Let us now verify (76). Suppose $g : \mathbb{R} \to \mathbb{R}$ is three times continuously differentiable with an absolutely continuous third derivative, and let $\hat g : \delta \mathbb{Z} \to \mathbb{R}$ be the restriction of $g(x)$ to $\delta \mathbb{Z}$. We first prove that for $1 \le v \le 4$,

$$
\Delta^{v} \hat g(\delta k) = \delta^{v} \int_{0}^{v\delta} c_v(u) \frac{\partial^{v}}{\partial x^{v}} g(\delta k + u) \, du, \tag{77}
$$

where $c_v(x)$ is a function such that $\sup_{x \in [0, \delta v]} |c_v(x)| \le C$ for some constant $C > 0$ independent of $k$, $g(x)$, and $\delta$. Suppose $v = 4$. Using Taylor expansion, we have

$$
\begin{aligned}
\Delta^{4} \hat g(\delta k) &= \Delta^{3} \big( g(\delta(k+1)) - g(\delta k) \big) \\
&= \Delta^{3} \Big( \delta g'(\delta k) + \frac{1}{2} \delta^{2} g''(\delta k) + \frac{1}{6} \delta^{3} g'''(\delta k) + \frac{1}{6} \int_{0}^{\delta} g^{(4)}(\delta k + u)(\delta - u)^{3} \, du \Big) \\
&= \Delta^{3} \Big( \delta g'(\delta k) + \frac{1}{2} \delta^{2} g''(\delta k) + \frac{1}{6} \delta^{3} g'''(\delta k) \Big) \\
&\quad + \frac{1}{6} \Delta^{2} \Big( \int_{0}^{\delta} g^{(4)}(\delta(k+1) + u)(\delta - u)^{3} \, du - \int_{0}^{\delta} g^{(4)}(\delta k + u)(\delta - u)^{3} \, du \Big).
\end{aligned}
$$

One may continue to manipulate the right-hand side in a similar manner to reach the desired form in (77). For $1 \le v \le 3$, (77) is verified similarly. Now since $\Delta^{a} \hat f(\delta k) = \Delta_d^{a_d} \cdots \Delta_1^{a_1} \hat f(\delta k)$, we can apply (77) along each dimension $j$ where $a_j > 0$ to obtain (76). $\square$

Appendix C: Proofs of Miscellaneous Technical Lemmas

C.1. Proof of Lemma 1

Proof of Lemma 1

Since $h^{*}(X) = h(X)$, the triangle inequality implies that

$$
\big| \mathbb{E} h^{*}(X) - \mathbb{E} h^{*}(Y) \big| \le \big| \mathbb{E} h(X) - \mathbb{E} Ah(Y) \big| + \big| \mathbb{E} Ah(Y) - \mathbb{E} h^{*}(Y) \big|.
$$

For $x \in \mathbb{R}^d$ let $k(x) \in \mathbb{Z}^d$ be defined by $k_i(x) = \lfloor x_i / \delta \rfloor$. Since $h^{*}(\delta k(x)) = h(\delta k(x)) = Ah(\delta k(x))$ and $|x_i - \delta k_i(x)| \le \delta$,

$$
\begin{aligned}
\big| \mathbb{E} Ah(Y) - \mathbb{E} h^{*}(Y) \big| &\le \big| \mathbb{E} Ah(Y) - \mathbb{E} h^{*}(\delta k(Y)) \big| + \big| \mathbb{E} h^{*}(\delta k(Y)) - \mathbb{E} h^{*}(Y) \big| \\
&= \big| \mathbb{E} Ah(Y) - \mathbb{E} Ah(\delta k(Y)) \big| + \big| \mathbb{E} h^{*}(\delta k(Y)) - \mathbb{E} h^{*}(Y) \big| \\
&\le C \delta \sup_{\substack{1 \le j \le d \\ x \in \mathbb{R}^d}} \Big| \frac{\partial}{\partial x_j} Ah(x) \Big| + C \delta \sup_{\substack{1 \le j \le d \\ x \in \mathbb{R}^d}} \Big| \frac{\partial}{\partial x_j} h^{*}(x) \Big|.
\end{aligned}
$$

Using the bound in (53) from Theorem 3, it follows that

$$
\sup_{\substack{1 \le j \le d \\ x \in \mathbb{R}^d}} \Big| \frac{\partial}{\partial x_j} Ah(x) \Big| \le C \delta^{-1} \sup_{\substack{1 \le j \le d \\ x \in \mathbb{R}^d}} | \Delta_j h(\delta k(x)) | = C \delta^{-1} \sup_{\substack{1 \le j \le d \\ x \in \mathbb{R}^d}} | \Delta_j h^{*}(\delta k(x)) | \le C \sup_{\substack{1 \le j \le d \\ x \in \mathbb{R}^d}} \Big| \frac{\partial}{\partial x_j} h^{*}(x) \Big|.
$$

This proves the first claim. The other two claims follow by observing that if $h^{*} \in \text{Lip}(1)$, then the mean-value theorem implies $h \in \text{dLip}(1)$, and if $h^{*} \in \mathcal{M}$, then (77) can be used to show that $h \in \mathcal{M}_{disc}(C')$ for some $C' > 0$. $\square$

C.2. Proof of Lemma 3
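The statement proved in this section, that $g(\delta k) = \int_0^\infty (\mathbb{E}_{\delta k} h(X(t)) - \mathbb{E} h(X))\, dt$ solves the Poisson equation $G_X g = \mathbb{E} h(X) - h$, can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not the paper's construction: it uses an M/M/1 chain truncated to finitely many states (arrivals blocked in the top state), the test function $h(n) = n$, and the uniformization identity $\int_0^\infty e^{tQ} f \, dt = \Lambda^{-1} \sum_{n \ge 0} (I + Q/\Lambda)^n f$ for centered $f$. All names and parameter values are hypothetical.

```python
def poisson_solution(lam=1.0, mu=2.0, n_states=12, n_terms=5000):
    """Approximate g = int_0^inf (E_x h(X(t)) - E h(X)) dt for a truncated M/M/1 chain.

    Returns g and the largest violation of the Poisson equation Q g = E h(X) - h.
    """
    # Stationary distribution of the truncated birth-death chain: pi_n ~ rho^n.
    rho = lam / mu
    weights = [rho ** n for n in range(n_states)]
    norm = sum(weights)
    pi = [w / norm for w in weights]

    h = [float(n) for n in range(n_states)]  # assumed test function h(n) = n
    pi_h = sum(p * v for p, v in zip(pi, h))

    def apply_generator(f):
        """(Q f)(n) = lam (f(n+1) - f(n)) + mu (f(n-1) - f(n)), with boundary cuts."""
        out = []
        for n in range(n_states):
            up = lam * (f[n + 1] - f[n]) if n + 1 < n_states else 0.0
            down = mu * (f[n - 1] - f[n]) if n > 0 else 0.0
            out.append(up + down)
        return out

    # Uniformization: with P = I + Q / big_rate, the time integral of the
    # centered semigroup equals (1 / big_rate) * sum_n P^n (h - pi_h).
    big_rate = lam + mu
    term = [v - pi_h for v in h]  # P^0 applied to the centered function
    g = [0.0] * n_states
    for _ in range(n_terms):
        g = [a + t / big_rate for a, t in zip(g, term)]
        q_term = apply_generator(term)
        term = [t + q / big_rate for t, q in zip(term, q_term)]  # next power of P

    residual = max(abs(q - (pi_h - v)) for q, v in zip(apply_generator(g), h))
    return g, residual


if __name__ == "__main__":
    g, residual = poisson_solution()
    print(residual)  # close to zero when the series has converged
```

The series converges geometrically because the uniformized chain is aperiodic (the boundary states have self-loops), so a few thousand terms are ample for this state space.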

Proof of Lemma 3

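For convenience, define

$$
g(\delta k) = \int_{0}^{\infty} \big( \mathbb{E}_{\delta k} h(X(t)) - \mathbb{E} h(X) \big) \, dt, \quad k \in \mathbb{Z}^d.
$$

We now show that $g(\delta k)$ solves the Poisson equation. For $\varepsilon > 0$ and $k \in \mathbb{Z}^d$, let $J_\varepsilon(k)$ be the number of jumps made by $\{X(t)\}$ in the interval $[0, \varepsilon]$ given $X(0) = \delta k$. Also, let $r = \sum_{k' \in \mathbb{Z}^d} q_{k,k'}$. Since the inter-jump times of $\{X(t)\}$ are exponentially distributed, it follows that

$$
\mathbb{P}(J_\varepsilon(k) = j) =
\begin{cases}
1 - r\varepsilon + o(\varepsilon), & j = 0, \\
r\varepsilon + o(\varepsilon), & j = 1, \\
o(\varepsilon), & j > 1,
\end{cases}
$$

where $o(\varepsilon)$ is a quantity such that $o(\varepsilon)/\varepsilon \to 0$ as $\varepsilon \to 0$. By considering the jumps made on $[0, \varepsilon]$, we see that

$$
\begin{aligned}
\int_{\varepsilon}^{\infty} \big( \mathbb{E}_{X(0) = \delta k} h(X(t)) - \mathbb{E} h(X) \big) \, dt
&= (1 - r\varepsilon) \int_{\varepsilon}^{\infty} \big( \mathbb{E}_{X(\varepsilon) = \delta k} h(X(t)) - \mathbb{E} h(X) \big) \, dt \\
&\quad + \varepsilon \sum_{k' \in \mathbb{Z}^d} q_{k,k'} \int_{\varepsilon}^{\infty} \big( \mathbb{E}_{X(\varepsilon) = \delta k'} h(X(t)) - \mathbb{E} h(X) \big) \, dt + o(\varepsilon).
\end{aligned}
$$

Therefore,

$$
\begin{aligned}
g(\delta k) &= \int_{0}^{\varepsilon} \big( \mathbb{E}_{\delta k} h(X(t)) - \mathbb{E} h(X) \big) \, dt + \int_{\varepsilon}^{\infty} \big( \mathbb{E}_{X(0) = \delta k} h(X(t)) - \mathbb{E} h(X) \big) \, dt \\
&= \int_{0}^{\varepsilon} \big( \mathbb{E}_{\delta k} h(X(t)) - \mathbb{E} h(X) \big) \, dt + (1 - r\varepsilon) g(\delta k) + \varepsilon \sum_{k' \in \mathbb{Z}^d} q_{k,k'} g(\delta k') + o(\varepsilon).
\end{aligned}
$$

Dividing both sides by $\varepsilon$ and letting $\varepsilon \to 0$, we conclude that

$$
G_X g(\delta k) = \mathbb{E} h(X) - h(\delta k). \qquad \square
$$

Acknowledgments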

The author would like to thank Han Liang Gan for stimulating discussions during early stages of this work, as well as Robert Bray and Shane Henderson for providing feedback on early drafts.
