On Wormald's differential equation method

Lutz Warnke*

May 21, 2019; revised June 10, 2019
Abstract

This note contains a short and simple proof of Wormald's differential equation method (that yields slightly improved approximation guarantees and error probabilities). This powerful method uses differential equations to approximate the time-evolution/dynamics of random processes and algorithms.
1 Introduction

Oftentimes it is natural and useful to approximate the trajectories of a random process by the solutions to differential equations (whose deterministic behaviour is easier to understand). This widely-used approach has a long history: it was pioneered around 1970 by Kurtz [11, 12] for continuous-time Markov chains, and introduced to the computer science community in the early 1980s by Karp and Sipser [10]. In combinatorics this approach was popularized in the 1990s by Wormald [18, 19, 20, 15, 6]: he developed a general framework for applying the so-called differential equation method to discrete-time randomized algorithms and random combinatorial structures, which in the late 2000s has undergone some further technical advances by Bohman and others [1, 3, 16, 13, 2, 7, 4]. To date, this powerful method remains the state-of-the-art for the analysis of many important randomized combinatorial algorithms (see, e.g., [1, 3, 2, 7, 4, 5]).

In this note we provide a short and simple proof of Wormald's differential equation method [18, 19], obtaining slightly improved approximation guarantees and error probabilities. The organization is as follows. In the next two subsections we illustrate the flavour of the method, and motivate the core proof idea in a simpler toy setting. In Section 2 we then state and prove Wormald's theorem (Theorem 2 and Section 2.1), and also discuss some useful extensions (Section 2.2). The final Section 3 contains some brief concluding remarks.
1.1 The flavour of the differential equation method

The basic goal of the differential equation method is to 'track' a collection of variables $(Y_k(i))_{1 \le k \le a}$ associated to some discrete-time random process (e.g., in some $n$-vertex random graph process, $Y_k(i)$ might denote the number of vertices of degree $k$ after $i$ steps), and it provides a framework for showing that these random variables closely 'follow' the solution $(y_k(t))_{1 \le k \le a}$ of a corresponding system of differential equations. The flavour of applications is roughly as follows: if the one-step changes of these variables satisfy

• $\mathbb{E}\bigl(Y_k(i+1) - Y_k(i) \mid Y_1(i), \ldots, Y_a(i)\bigr) = F_k\bigl(i/n, Y_1(i)/n, \ldots, Y_a(i)/n\bigr) + o(1)$, where the functions $F_k$ are 'well-behaved' (i.e., sufficiently smooth), and

• $\bigl|Y_k(i+1) - Y_k(i)\bigr|$ is never 'too big' (in the worst case),

then the heuristic conclusion of the differential equation method (see Theorem 2) is that

• with high probability $Y_k(tn) = y_k(t)\,n + o(n)$, where the deterministic functions $(y_k(t))_{1 \le k \le a}$ are the unique solution to $y'_k(t) = F_k\bigl(t, y_1(t), \ldots, y_a(t)\bigr)$ with $y_k(0) = Y_k(0)/n$.

In concrete words, this says that if we interpret the expected one-step difference equations of the random variables as differential equations, then the values of the rescaled random variables $Y_k(tn)/n$ typically stay close to the deterministic solutions $y_k(t)$ of the corresponding system of differential equations (so the $y_k(t)$ are the deterministic 'limiting objects' of the $Y_k(tn)/n$). This also establishes a form of dynamic concentration, since the variables $(Y_k(i))_{1 \le k \le a}$ are sharply concentrated around their expected trajectories in each step.

* School of Mathematics, Georgia Institute of Technology, Atlanta GA 30332, USA. E-mail: [email protected]. Research partially supported by NSF Grant DMS-1703516 and a Sloan Research Fellowship.
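To make this flavour concrete, the following minimal simulation sketch tracks a single variable in a toy 'empty bins' process (an illustrative example chosen here, not one taken from this note; all function and variable names are ad hoc). Throwing balls uniformly at random into $n$ bins, the number $Y(i)$ of still-empty bins satisfies $\mathbb{E}\bigl(Y(i+1) - Y(i) \mid \mathcal{F}_i\bigr) = -Y(i)/n$, so the heuristic above suggests $Y(tn)/n \approx y(t) = e^{-t}$, the solution of $y'(t) = -y(t)$ with $y(0) = 1$.

```python
import math
import random

def empty_bins_process(n, steps, seed=0):
    """Throw `steps` balls uniformly into n bins; record Y(i) = #empty bins after i throws."""
    rng = random.Random(seed)
    occupied = [False] * n
    y = n
    trajectory = [y]
    for _ in range(steps):
        b = rng.randrange(n)
        if not occupied[b]:
            occupied[b] = True
            y -= 1
        trajectory.append(y)
    return trajectory

n = 100_000
T = 2.0
traj = empty_bins_process(n, int(T * n))

# One-step change: Y(i+1) - Y(i) = -1 with probability Y(i)/n, else 0, so
# E[Y(i+1) - Y(i) | F_i] = -Y(i)/n, i.e. F(t, y) = -y.  The ODE y'(t) = -y(t)
# with y(0) = 1 has solution y(t) = e^{-t}; compare Y(tn)/n against it.
max_err = max(abs(traj[i] / n - math.exp(-i / n)) for i in range(len(traj)))
print(f"max deviation |Y(tn)/n - e^(-t)| on [0, {T}]: {max_err:.4f}")
```

For $n$ of this size the maximal deviation is typically of order $n^{-1/2}$, in line with the dynamic concentration discussed above.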
1.2 Motivation: stability of differential equations

It turns out that the statement and proof of the differential equation method (Theorem 2) can be motivated by 'stability properties' of differential equations with Lipschitz properties. The relevant toy question is: how much can two collections of functions $(y_k(t))_{1 \le k \le a}$ and $(z_k(t))_{1 \le k \le a}$ differ if they have similar derivatives and initial values? To be more precise, assume that for some small 'perturbations' $\lambda, \delta \ge 0$ we have

\[
y_k(0) = \hat{y}_k \quad\text{and}\quad y'_k(t) = F_k\bigl(t, y_1(t), \ldots, y_a(t)\bigr), \tag{1}
\]
\[
|z_k(0) - \hat{y}_k| \le \lambda \quad\text{and}\quad \bigl|z'_k(t) - F_k\bigl(t, z_1(t), \ldots, z_a(t)\bigr)\bigr| \le \delta, \tag{2}
\]

where the functions $F_k$ are $L$-Lipschitz-continuous on some bounded domain $\mathcal{D} \subseteq \mathbb{R}^{a+1}$ (i.e., $\mathcal{D}$ is a connected open subset that is bounded). In stability theory of differential equations it is standard to compare such functions via Gronwall's inequality (see Appendix A for its simple proof).

Lemma 1 (Gronwall's inequality). Given a continuous function $x(t)$ defined on $[0,T]$, assume that there are $C, L \ge 0$ such that $x(t) \le C + L \int_0^t x(s)\,\mathrm{d}s$ for $t \in [0,T)$. Then $x(t) \le C e^{Lt}$ for $t \in [0,T]$.

Indeed, integrating the derivatives $z'_k(t)$ and $y'_k(t)$, after taking absolute values it follows that

\[
|z_k(t) - y_k(t)| \le |z_k(0) - y_k(0)| + \int_0^t |z'_k(s) - y'_k(s)|\,\mathrm{d}s \le \lambda + \delta t + \int_0^t \bigl|F_k\bigl(s, z_1(s), \ldots, z_a(s)\bigr) - F_k\bigl(s, y_1(s), \ldots, y_a(s)\bigr)\bigr|\,\mathrm{d}s. \tag{3}
\]

Assuming for the moment that $(t, z_1(t), \ldots, z_a(t))$ and $(t, y_1(t), \ldots, y_a(t))$ are both still inside the domain $\mathcal{D}$ for all $t \in [0,T)$, using that the functions $F_k$ are $L$-Lipschitz-continuous on $\mathcal{D}$ it follows that

\[
\max_{1 \le k \le a} |z_k(t) - y_k(t)| \le (\lambda + \delta T) + L \int_0^t \max_{1 \le k \le a} |z_k(s) - y_k(s)|\,\mathrm{d}s, \tag{4}
\]

which by Gronwall's inequality (Lemma 1) then implies for all $t \in [0,T]$ the bound

\[
\max_{1 \le k \le a} |z_k(t) - y_k(t)| \le (\lambda + \delta T)\,e^{LT}. \tag{5}
\]

The punchline of the above argument is as follows: in order to understand the 'approximate' solutions $z_k(t)$ to the differential equation (1), it essentially suffices to understand the 'exact' solutions $y_k(t)$ on the domain $\mathcal{D}$. One snag is that, due to the error-term in (5), it can happen that $(t, z_1(t), \ldots, z_a(t)) \notin \mathcal{D}$ despite $(t, y_1(t), \ldots, y_a(t)) \in \mathcal{D}$. To overcome this obstacle, we intuitively remove all points from $\mathcal{D}$ which are 'too close' to the boundary, so that (5) ensures $(t, z_1(t), \ldots, z_a(t)) \in \mathcal{D}$ in the remainder. The crux of this technical idea is that, by choosing $\sigma \in [0,T]$ suitably, we can then ensure that the bound (5) holds for all $t \in [0,\sigma]$.

Perhaps surprisingly, our upcoming proof of the differential equation method does little more than adapting the above comparison argument to the random setting (using concentration inequalities and a discrete variant of Gronwall's inequality). We hope that this viewpoint also clarifies the role of the somewhat technical parameter $\sigma$ in Theorem 2 below, which simply handles complications near the boundary of the domain $\mathcal{D}$.

2 The differential equation method

We now state a non-asymptotic version of the differential equation method which (together with Remark 3 and Lemma 11) is slightly stronger than Wormald's original formulation [18, 19]. The key difference lies in the (exponentially small) probability with which the conclusion (6) fails: in [19, Theorem 5.1] it goes to zero when $n\lambda^3/\beta^3 \to \infty$, whereas in Theorem 2 below the weaker assumption $n\lambda^2/\beta^2 \to \infty$ suffices (usually $\lambda = o(1)$ and $\beta = \Omega(1)$ hold).
Besides better probability bounds, this also enables smaller 'approximation errors' of the form $O(\lambda n)$ in the conclusion (6) below. (For example, if we are as in [14, Section 4.2] aiming at error probabilities of form $n^{-\omega(1)}$ when $\beta = \Theta(1)$ and $\delta = O(n^{-1})$, then [19, Theorem 5.1] requires $\lambda n = \omega(1) \cdot n^{2/3} (\log n)^{1/3}$, whereas $\lambda n = \omega(1) \cdot n^{1/2} \sqrt{\log n}$ suffices in Theorem 2 below.)

In applications usually $(\mathcal{F}_i)_{i \ge 0}$ denotes the natural filtration of the underlying random process, and then one can simply think of $\mathcal{F}_i$ as the 'history' which contains all information available during the first $i$ steps. Throughout, a function $f$ is said to be $L$-Lipschitz-continuous on $D \subseteq \mathbb{R}^{\ell}$ if $|f(x) - f(x')| \le L \cdot \max_{1 \le k \le \ell} |x_k - x'_k|$ holds for all points $x = (x_1, \ldots, x_\ell)$ and $x' = (x'_1, \ldots, x'_\ell)$ in $D$, where $\max_{1 \le k \le \ell} |x_k - x'_k|$ is the $\ell^\infty$-distance between $x$ and $x'$.

Theorem 2 (Differential equation method). Given integers $a, n \ge 1$, a bounded domain $\mathcal{D} \subseteq \mathbb{R}^{a+1}$, functions $(F_k)_{1 \le k \le a}$ with $F_k : \mathcal{D} \to \mathbb{R}$, and $\sigma$-fields $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots$, suppose that the random variables $(Y_k(i))_{1 \le k \le a}$ are $\mathcal{F}_i$-measurable for $i \ge 0$. Furthermore, assume that, for all $i \ge 0$ and $1 \le k \le a$, the following conditions hold whenever $(i/n, Y_1(i)/n, \ldots, Y_a(i)/n) \in \mathcal{D}$:

(i) $\bigl|\mathbb{E}\bigl(Y_k(i+1) - Y_k(i) \mid \mathcal{F}_i\bigr) - F_k\bigl(i/n, Y_1(i)/n, \ldots, Y_a(i)/n\bigr)\bigr| \le \delta$, where the function $F_k$ is $L$-Lipschitz-continuous on $\mathcal{D}$ (the 'Trend hypothesis' and 'Lipschitz hypothesis'),

(ii) $\bigl|Y_k(i+1) - Y_k(i)\bigr| \le \beta$ (the 'Boundedness hypothesis'),

and that the following condition holds initially:

(iii) $\max_{1 \le k \le a} \bigl|Y_k(0) - \hat{y}_k n\bigr| \le \lambda n$ for some $(0, \hat{y}_1, \ldots, \hat{y}_a) \in \mathcal{D}$ (the 'Initial condition').

Then there are $R = R(\mathcal{D}, (F_k)_{1 \le k \le a}, L) \in [1, \infty)$ and $T = T(\mathcal{D}) \in (0, \infty)$ such that, whenever $\lambda \ge \delta \min\{T, L^{-1}\} + R/n$, with probability at least $1 - 2a\,e^{-n\lambda^2/(8T\beta^2)}$ we have

\[
\max_{0 \le i \le \sigma n}\; \max_{1 \le k \le a} \bigl|Y_k(i) - y_k\bigl(\tfrac{i}{n}\bigr)\,n\bigr| < 3 e^{LT} \lambda n, \tag{6}
\]

where $(y_k(t))_{1 \le k \le a}$ is the unique solution to the system of differential equations

\[
y'_k(t) = F_k\bigl(t, y_1(t), \ldots, y_a(t)\bigr) \quad\text{with}\quad y_k(0) = \hat{y}_k \quad\text{for } 1 \le k \le a, \tag{7}
\]

and $\sigma = \sigma(\hat{y}_1, \ldots, \hat{y}_a) \in [0,T]$ is any choice of $\sigma \ge 0$ with the property that $(t, y_1(t), \ldots, y_a(t))$ has $\ell^\infty$-distance at least $3 e^{LT} \lambda$ from the boundary of $\mathcal{D}$ for all $t \in [0, \sigma)$.

Remark 3. The deterministic 'Initial condition' (iii) can be relaxed: the proof shows $\mathbb{P}(\neg \mathcal{G}_\lambda) \le 2a \cdot e^{-n\lambda^2/(8T\beta^2)}$, where $\mathcal{G}_\lambda$ is the event that (6) holds for all $(0, \hat{y}_1, \ldots, \hat{y}_a) \in \mathcal{D}$ with $\max_{1 \le k \le a} |Y_k(0) - \hat{y}_k n| \le \lambda n$.

The surveys [19, 6] contain numerous examples that illustrate how to apply this powerful result (some technical extensions of Theorem 2 are discussed in Section 2.2, which, e.g., allow for larger one-step changes). We point out that (6) only gives a 'good' approximation as long as $|y_k(i/n)| \gg e^{LT}\lambda$, which for many natural choices of $\mathcal{D}$ and $\sigma$ means that the condition $i/n \le \sigma$ in (6) is not very restrictive; see also Section 2.2.3.

Remark 4. Standard results for differential equations (see, e.g., [9, Theorem 11 in Chapter 2.5]) guarantee that (7) has a unique solution $(y_k(t))_{1 \le k \le a}$ which extends arbitrarily close to the boundary of $\mathcal{D}$.

Remark 5. The proof of Theorem 2 in fact works for any choice of $R \in [1, \infty)$ and $T \in (0, \infty)$ which satisfy $t \le T$ and $\max_{1 \le k \le a} |F_k(x)| \le R$ for all $x = (t, y_1, \ldots, y_a) \in \mathcal{D}$ with $t \ge 0$.

2.1 Proof of Theorem 2

The below proof of Theorem 2 essentially mimics the deterministic Gronwall-type argument from Section 1.2 in the present random setting (this strategy differs slightly from the original proof given by Wormald [18, 19], and resembles more some earlier arguments of Kurtz [11, 12] for continuous-time Markov chains).
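Before turning to the proof, the deterministic stability bound (5) that it mimics can be sanity-checked numerically in a toy case with $a = 1$ (the specific choices of $F$, $\lambda$, $\delta$ below are arbitrary illustrations, not taken from this note):

```python
import math

# Toy instance of the stability bound (5) with a = 1 and the 1-Lipschitz choice
# F(t, y) = -y (so L = 1).  Closed-form solutions:
#   exact:     y(0) = 1,        y'(t) = -y(t)          =>  y(t) = e^{-t},
#   perturbed: z(0) = 1 + lam,  z'(t) = -z(t) + delta  =>  z(t) = delta + (1 + lam - delta) e^{-t},
# so |z(0) - y(0)| <= lam and |z'(t) - F(t, z(t))| = delta, matching (2).
lam, delta, L, T = 0.01, 0.005, 1.0, 1.0

def y(t):
    return math.exp(-t)

def z(t):
    return delta + (1 + lam - delta) * math.exp(-t)

grid = [i * T / 1000 for i in range(1001)]
worst = max(abs(z(t) - y(t)) for t in grid)
bound = (lam + delta * T) * math.exp(L * T)  # right-hand side of (5)
print(f"max |z - y| = {worst:.5f}  <=  (lam + delta*T) e^(L*T) = {bound:.5f}")
```

Here $|z(t) - y(t)| = \lambda e^{-t} + \delta(1 - e^{-t})$ is maximal at $t = 0$ (as $\delta < \lambda$), comfortably below the Gronwall-type bound $(\lambda + \delta T)e^{LT}$.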
Why is Theorem 2 intuitively true?
First we reduce to a deterministic setting: combining the Azuma–Hoeffding inequality (Lemma 7 below) with the 'Boundedness hypothesis' (ii) and the 'Trend hypothesis' (i), it turns out that, with sufficiently high probability, for all relevant $j$ the random variables $Y_k(j)$ approximately satisfy

\[
Y_k(j) - Y_k(0) \approx \sum_{0 \le i < j} F_k\bigl(i/n, Y_1(i)/n, \ldots, Y_a(i)/n\bigr). \tag{8}
\]

Since the solution of (7) similarly satisfies $y_k(j/n)n - y_k(0)n \approx \sum_{0 \le i < j} F_k\bigl(i/n, y_1(i/n), \ldots, y_a(i/n)\bigr)$, the heart of the matter is then a deterministic question: how much can the two 'trajectories' $Y_k(j)$ and $y_k(j/n)n$ differ? This we answer by mimicking the comparison argument from Section 1.2, using the following discrete variant of Gronwall's inequality (see Appendix A for the routine proofs of Lemmas 6 and 7).

Lemma 6 (Discrete variant of Gronwall's inequality). Given $b, c, L \ge 0$, assume that the reals $(x_j)_{0 \le j \le m}$ satisfy $0 \le x_j \le c + \sum_{0 \le i < j} (b + L x_i)/n$ for all $0 \le j \le m$. Then $x_j \le \bigl(c + b \min\{j/n, L^{-1}\}\bigr) e^{Lj/n}$ for all $0 \le j \le m$.

Lemma 7 (Azuma–Hoeffding inequality). Given $\sigma$-fields $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots$, assume that the $\mathcal{F}_i$-measurable random variables $(M_i)_{0 \le i \le m}$ satisfy $M_0 = 0$, as well as $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = 0$ and $|M_{i+1} - M_i| \le c$ for $0 \le i < m$. Then $\mathbb{P}\bigl(\max_{0 \le j \le m} |M_j| > t\bigr) \le 2 e^{-t^2/(2mc^2)}$ for all $t \ge 0$.

Proof of Theorem 2. We may choose $R$ and $T$ as in Remark 5. Let $I_{\mathcal{D}}$ denote the minimum of $\lfloor Tn \rfloor$ and the smallest integer $i \ge 0$ where $(i/n, Y_1(i)/n, \ldots, Y_a(i)/n) \notin \mathcal{D}$ holds. Define the 'stopped' one-step changes $\Delta Y_k(i) := \mathbb{1}_{\{i < I_{\mathcal{D}}\}}\bigl[Y_k(i+1) - Y_k(i)\bigr]$, and set $M_k(j) := \sum_{0 \le i < j} \bigl[\Delta Y_k(i) - \mathbb{E}(\Delta Y_k(i) \mid \mathcal{F}_i)\bigr]$.

We first record the probabilistic part of the argument. Since $\{i < I_{\mathcal{D}}\} \in \mathcal{F}_i$ and the $Y_k(i)$ are $\mathcal{F}_i$-measurable, for all $i \ge 0$ the 'tower property' of conditional expectations implies $\mathbb{E}(M_k(i+1) - M_k(i) \mid \mathcal{F}_i) = 0$, and the 'Boundedness hypothesis' (ii) implies $|M_k(i+1) - M_k(i)| \le 2\beta$. Hence the Azuma–Hoeffding inequality (Lemma 7), applied with $c = 2\beta$, $m = \lfloor Tn \rfloor$ and $t = \lambda n$, together with the union bound over $1 \le k \le a$, shows that with probability at least $1 - 2a\,e^{-n\lambda^2/(8T\beta^2)}$ we have

\[
\max_{0 \le j \le \lfloor Tn \rfloor}\; \max_{1 \le k \le a} |M_k(j)| < \lambda n. \tag{9}
\]

We now turn to the deterministic part of the argument, where we assume that the maximum bound (9) holds. We verify by induction on $0 \le m \le \sigma n$ that

\[
\max_{1 \le k \le a} \bigl|Y_k(j) - y_k\bigl(\tfrac{j}{n}\bigr)\,n\bigr| < 3 e^{Lj/n} \lambda n \quad\text{for all } 0 \le j \le m. \tag{10}
\]

The base case $m = 0$ holds by the 'Initial condition' (iii). For the induction step we may assume (10) for all $0 \le j \le m - 1$. By the defining property of $\sigma$, the point $(j/n, y_1(j/n), \ldots, y_a(j/n))$ has $\ell^\infty$-distance at least $3e^{LT}\lambda$ from the boundary of $\mathcal{D}$ for $j \le m-1 < \sigma n$, so the induction hypothesis ensures $(j/n, Y_1(j)/n, \ldots, Y_a(j)/n) \in \mathcal{D}$ for $0 \le j \le m-1$, and thus $m - 1 < I_{\mathcal{D}} \le \lfloor Tn \rfloor$ (using $\sigma \le T$). In particular, for $0 \le j \le m$ we have

\[
Y_k(j) - Y_k(0) = M_k(j) + \sum_{0 \le i < j} \mathbb{E}\bigl(\Delta Y_k(i) \mid \mathcal{F}_i\bigr), \tag{11}
\]

where, by the 'Trend hypothesis' (i), $\bigl|\mathbb{E}(\Delta Y_k(i) \mid \mathcal{F}_i) - F_k\bigl(i/n, Y_1(i)/n, \ldots, Y_a(i)/n\bigr)\bigr| \le \delta$ for $0 \le i < I_{\mathcal{D}}$. Furthermore, writing $y_k(j/n)n - \hat{y}_k n = \sum_{0 \le i < j} n \int_{i/n}^{(i+1)/n} F_k\bigl(s, y_1(s), \ldots, y_a(s)\bigr)\,\mathrm{d}s$, the Lipschitz continuity of $F_k$ together with $\max_k |y'_k| \le R$ and $R \ge 1$ gives

\[
\Bigl| n \int_{i/n}^{(i+1)/n} F_k\bigl(s, y_1(s), \ldots, y_a(s)\bigr)\,\mathrm{d}s - F_k\bigl(i/n, y_1(i/n), \ldots, y_a(i/n)\bigr) \Bigr| \le LR/n. \tag{12}
\]

Combining (11)–(12) with the 'Initial condition' (iii), the maximum bound (9), and the Lipschitz continuity of $F_k$ (comparing $F_k$ at the points $(i/n, Y_1(i)/n, \ldots, Y_a(i)/n)$ and $(i/n, y_1(i/n), \ldots, y_a(i/n))$, both of which lie in $\mathcal{D}$), it follows for $0 \le j \le m$ that

\[
\max_{1 \le k \le a} \bigl|Y_k(j) - y_k\bigl(\tfrac{j}{n}\bigr)\,n\bigr| < 2\lambda n + \sum_{0 \le i < j} \frac{(\delta n + LR) + L \max_{1 \le k \le a} |Y_k(i) - y_k(i/n)\,n|}{n}. \tag{13}
\]

Applying the discrete Gronwall inequality (Lemma 6) with $c = 2\lambda n$ and $b = \delta n + LR$, and then using the assumption $\lambda \ge \delta \min\{T, L^{-1}\} + R/n$, we deduce for $0 \le j \le m$ that

\[
\max_{1 \le k \le a} \bigl|Y_k(j) - y_k\bigl(\tfrac{j}{n}\bigr)\,n\bigr| < \bigl(2\lambda n + \delta n \min\{T, L^{-1}\} + R\bigr) e^{Lj/n} \le 3 e^{Lj/n} \lambda n,
\]

which completes the induction step. Since $e^{Lj/n} \le e^{LT}$ for $j \le \sigma n \le Tn$, this establishes (6), completing the proof of Theorem 2. □

2.2 Extensions

We now discuss some useful extensions and variants of Theorem 2, which are based on minor modifications of the above proof.

2.2.1 Exploiting the proof structure

As a first example, when verifying the conditions of Theorem 2 we may always assume that the approximation (6) did not yet fail.

Lemma 8 (Exploiting the proof structure). In the verification of conditions (i) and (ii) we may additionally assume that $i < \lfloor Tn \rfloor$ and $\max_{1 \le k \le a} \bigl|Y_k(i) - y_k\bigl(\tfrac{i}{n}\bigr)\,n\bigr| < 3e^{LT}\lambda n$ hold.

Proof. In the probabilistic part of the argument we redefine $I_{\mathcal{D}}$ as the minimum of $\lfloor Tn \rfloor$ and the smallest integer $i \ge 0$ where $(i/n, Y_1(i)/n, \ldots, Y_a(i)/n) \notin \mathcal{D}$ or $\max_{1 \le k \le a} \bigl|Y_k(i) - y_k\bigl(\tfrac{i}{n}\bigr)\,n\bigr| \ge 3e^{LT}\lambda n$ holds. The deterministic part then carries over, since the induction hypothesis again ensures $m - 1 < I_{\mathcal{D}} \le \lfloor Tn \rfloor$. □

Lemma 9 (Using additional events). Given $\mathcal{F}_i$-measurable events $(\mathcal{E}_i)_{i \ge 0}$, assume that we relax conditions (i) and (ii) so that they only need to hold on the event $\mathcal{E}_i$. Then $\mathbb{P}(\neg\mathcal{G}_\lambda \cap \mathcal{E}) \le 2a \cdot e^{-n\lambda^2/(8T\beta^2)}$ for the event $\mathcal{E} := \bigcap_{0 \le i < \lfloor Tn \rfloor} \mathcal{E}_i$.

Proof. In the probabilistic part of the argument we redefine $I_{\mathcal{D}}$ as the minimum of $\lfloor Tn \rfloor$ and the smallest integer $i \ge 0$ where $(i/n, Y_1(i)/n, \ldots, Y_a(i)/n) \notin \mathcal{D}$ or $\neg\mathcal{E}_i$ holds. On the event $\mathcal{E}$ the deterministic part then carries over unchanged. □
2.2.2 Larger one-step changes

As a second example, by using average one-step bounds or truncation arguments, we can often handle (via refined error probabilities) much larger one-step changes in the 'Boundedness hypothesis' (ii) of Theorem 2.

Lemma 10 (Using average one-step bounds). Assume that we add the bound $\mathbb{E}\bigl(|Y_k(i+1) - Y_k(i)| \mid \mathcal{F}_i\bigr) \le b$ to condition (ii). Then $\mathbb{P}(\neg\mathcal{G}_\lambda) \le 2a \cdot e^{-\min\{n\lambda^2/(4T\beta b),\, n\lambda/(4\beta)\}}$.

Proof. We shall only modify the probabilistic part of the argument, replacing the Azuma–Hoeffding inequality by a more advanced (martingale) concentration inequality that can be traced back to Freedman [8]. Using standard variance properties, the crux is that the modified 'Boundedness hypothesis' (ii) implies

\[
\mathrm{Var}\bigl(M_k(i+1) - M_k(i) \mid \mathcal{F}_i\bigr) = \mathrm{Var}\bigl(\Delta Y_k(i) \mid \mathcal{F}_i\bigr) \le \mathbb{E}\bigl(|\Delta Y_k(i)|^2 \mid \mathcal{F}_i\bigr) \le \beta\,\mathbb{E}\bigl(|\Delta Y_k(i)| \mid \mathcal{F}_i\bigr) \le \beta b, \tag{14}
\]

so that an application of a Freedman-type inequality (with variance parameter $\lfloor Tn \rfloor \beta b$ and one-step bound $2\beta$) in place of Lemma 7 yields the claimed probability estimate. □

Lemma 11 (Using truncation arguments). Assume that we replace condition (ii) by $\mathbb{P}\bigl(|Y_k(i+1) - Y_k(i)| > \beta \mid \mathcal{F}_i\bigr) \le \gamma$ and $\bigl|Y_k(i+1) - Y_k(i)\bigr| \le B$. Then, for all $x \ge 0$ and $\lambda \ge (\delta + \gamma B)\min\{T, L^{-1}\} + (R + xB)/n$, we have $\mathbb{P}(\neg\mathcal{G}_\lambda) \le 2a \cdot e^{-n\lambda^2/(8T\beta^2)} + a \cdot \mathbb{P}(Z \ge \lfloor x+1 \rfloor)$ with $Z \sim \mathrm{Bin}(\lfloor Tn \rfloor, \gamma)$.

Remark 12. Note that $\mathbb{P}(Z \ge 1) \le Tn\gamma$, and $\mathbb{P}(Z \ge \lfloor x+1 \rfloor) \le (eTn\gamma/\lceil x \rceil)^{\lceil x \rceil}$ for $x > 0$.

Proof. Writing $Z_k$ for the number of $0 \le i < I_{\mathcal{D}}$ with $|Y_k(i+1) - Y_k(i)| > \beta$, using the modified 'Boundedness hypothesis' (ii) it easily follows that $Z_k$ is stochastically dominated by $Z$. Defining $\mathcal{B}_x := \{\max_{1 \le k \le a} Z_k \le x\}$, we infer $\mathbb{P}(\neg\mathcal{B}_x) \le a \cdot \mathbb{P}(Z \ge \lfloor x+1 \rfloor)$. In the probabilistic part of the argument we redefine $\Delta Y_k(i) := \mathbb{1}_{\{i < I_{\mathcal{D}}\}} \mathbb{1}_{\{|Y_k(i+1) - Y_k(i)| \le \beta\}} \bigl[Y_k(i+1) - Y_k(i)\bigr]$, so that Lemma 7 applies as before. On the event $\mathcal{B}_x$ the at most $x$ discarded steps change each $Y_k(j)$ by at most $xB$, and the truncation shifts each conditional expectation by at most $\gamma B$; hence the deterministic part carries over after replacing $\delta$ by $\delta + \gamma B$ and $R$ by $R + xB$, which explains the assumption $\lambda \ge (\delta + \gamma B)\min\{T, L^{-1}\} + (R + xB)/n$. □

2.2.3 The parameter σ and the range of the y_k

The following observations often allow us to conveniently determine the range of the functions $y_k(t)$ and a valid choice of the parameter $\sigma$ appearing in Theorem 2.

Lemma 13 (Range of the functions y_k). Suppose that $Y_k(i)/n \in [A_k, B_k]$ holds for all $0 \le i \le \sigma n$. If the approximation (6) holds and $3e^{LT}\lambda + R/n \le \varepsilon$, then $y_k(t) \in (A_k - \varepsilon, B_k + \varepsilon)$ for all $t \in [0, \sigma]$.

Proof. For $0 \le i \le \sigma n$ the approximation (6) implies $|y_k(i/n) - Y_k(i)/n| < 3e^{LT}\lambda$, and by Remark 5 we have $|y_k(t) - y_k(i/n)| \le R/n$ for $t \in [i/n, (i+1)/n]$. It follows that $y_k(t) \in (A_k - \varepsilon, B_k + \varepsilon)$ for all $t \in [0, \sigma]$, completing the proof. □

Example 14 (Choice of σ). Suppose that there are constants $A_k, B_k$ such that $Y_k(i)/n \in [A_k, B_k]$ holds for all $0 \le i \le \sigma n$. Fixing $\sigma > \varepsilon > 0$, suppose that the $F_k$ are $L$-Lipschitz on the domain $\mathcal{D} = \mathcal{D}_\varepsilon$ which contains all $(t, y_1, \ldots, y_a) \in \mathbb{R}^{a+1}$ satisfying $t \in (-\varepsilon, \sigma + \varepsilon)$ and $y_k \in (A_k - \varepsilon, B_k + \varepsilon)$. The natural assumption $\lambda = o(1)$ as $n \to \infty$ then ensures via Lemma 13 (applied with $\varepsilon/2$, say) that the conclusion (6) of the differential equation method implies $y_k(t) \in (A_k - \varepsilon/2, B_k + \varepsilon/2)$ for all $t \in [0, \sigma]$. Hence no $y_k(t)$ can come $3e^{LT}\lambda = o(\varepsilon)$ close to the boundary of $\mathcal{D}_\varepsilon$ for $t \in [0, \sigma]$, which shows that $\sigma$ is a valid choice in Theorem 2 (since $\sigma + 3e^{LT}\lambda < \sigma + \varepsilon/2$, say).

Remark 15 (Using additional events). In the setting of Lemma 9 one can of course again argue about the parameter $\sigma$ and the range of the functions $y_k(t)$ as in Example 14 and Lemma 13 above, provided that the additional event $\mathcal{E}$ implies the relevant bounds $Y_k(i)/n \in [A_k, B_k]$ for all $i \le \sigma n$, say.

3 Concluding remarks

In this note we have given a conceptually simple proof of Wormald's differential equation method that might be suitable for teaching in class (we tried to keep the entry-level low by avoiding 'martingale jargon'). Our slightly stronger conclusion is also useful for applications requiring small approximation errors, see [14].

We believe that the differential equations perspective taken in Section 1.2 facilitates the development of new proof approaches. Indeed, inequalities developed in that deterministic toy setting can sometimes be lifted to the random setting by adding martingale error terms to the argument, as exemplified by Section 2.1 (for Gronwall's inequality). As a further illustration, suppose that we replace the second inequality of (2) by the following stronger approximation assumption: if $|z_k(t) - y_k(t)| \le \xi_k(t)$ holds for all $1 \le k \le a$, then

\[
\bigl|z'_k(t) - F_k\bigl(t, y_1(t), \ldots, y_a(t)\bigr)\bigr| \le \delta_k(t). \tag{15}
\]

If, in addition, the integral inequality $\xi_k(t) \ge \lambda + \int_0^t \delta_k(s)\,\mathrm{d}s$ holds (a kind of 'consistency equation' for the error terms), then a comparison argument along the lines of (3)–(5) yields the bound

\[
|z_k(t) - y_k(t)| \le \lambda + \int_0^t \delta_k(s)\,\mathrm{d}s \le \xi_k(t).
\]
Interestingly, it turns out that an adaptation of this idea to the discrete random setting naturally leads to an approach that is more or less equivalent to the one developed by Bohman [1, 3, 16].

Acknowledgements. The proof contained in this note (and its motivation) was presented in graduate courses at Cambridge (2013), Georgia Tech (2017), and a Fields Institute summer school (2017). I thank Patrick Bennett, Tom Bohman, Catherine Greenhill, Tamás Makai, Oliver Riordan, Greg Sorkin, Joel Spencer, and Nick Wormald for helpful comments and discussions.

References

[1] T. Bohman. The triangle-free process. Adv. Math. (2009), 1653–1677.
[2] T. Bohman, A. Frieze, and E. Lubetzky. Random triangle removal. Adv. Math. (2015), 379–438.
[3] T. Bohman and P. Keevash. The early evolution of the H-free process. Invent. Math. (2010), 291–336.
[4] T. Bohman and P. Keevash. Dynamic concentration of the triangle-free process. Preprint (2013). arXiv:1302.5963.
[5] T. Bohman and L. Warnke. Large girth approximate Steiner triple systems. Preprint (2018). arXiv:1808.01065.
[6] J. Díaz and D. Mitsche. The cook-book approach to the differential equation method. Computer Science Review (2010), 129–151.
[7] G. Fiz Pontiveros, S. Griffiths, and R. Morris. The triangle-free process and R(3, k). Memoirs of the Am. Math. Soc., to appear. arXiv:1302.6279.
[8] D. Freedman. On tail probabilities for martingales. Ann. Probability (1975), 100–118.
[9] W. Hurewicz. Lectures on ordinary differential equations. M.I.T. Press, Cambridge Massachusetts (1958).
[10] R. Karp and M. Sipser. Maximum matching in sparse random graphs. In Proceedings of the 22nd Annual Symposium on Foundations of Computer Science, IEEE Comput. Soc. Press (1981), pp. 364–375.
[11] T. Kurtz. Solutions of ordinary differential equations as limits of pure jump Markov processes. J. Appl. Probability (1970), 49–58.
[12] T. Kurtz. Approximation of Population Processes. SIAM, Philadelphia (1981).
[13] O. Riordan and L. Warnke. Convergence of Achlioptas processes via differential equations with unique solutions. Combin. Probab. Comput. (2016), 154–171.
[14] O. Riordan and L. Warnke. The phase transition in bounded-size Achlioptas processes. Preprint (2017). arXiv:1704.08714.
[15] T. Seierstad. A central limit theorem via differential equations. Ann. Appl. Probab. (2009), 661–675.
[16] L. Warnke. When does the K_ℓ-free process stop? Rand. Struct. Algor. (2014), 355–397.
[17] L. Warnke. On the method of typical bounded differences. Combin. Probab. Comput. (2016), 269–299.
[18] N. Wormald. Differential equations for random processes and random graphs. Ann. Appl. Probab. (1995), 1217–1235.
[19] N. Wormald. The differential equation method for random graph processes and greedy algorithms. In Lectures on Approximation and Randomized Algorithms, PWN, Warsaw (1999), pp. 73–155.
[20] N. Wormald. Analysis of greedy algorithms on graphs with bounded degrees. Discrete Math. (2003), 235–260.

A Appendix: deferred routine proofs

Proof of Lemma 1. Let $y(t) := L \int_0^t x(s)\,\mathrm{d}s$. Noting $(y(t)e^{-Lt})' = [y'(t) - Ly(t)]e^{-Lt} = L[x(t) - y(t)]e^{-Lt} \le LCe^{-Lt}$, integration gives $y(t)e^{-Lt} \le -Ce^{-Lt} + C$, and thus $x(t) \le C + y(t) \le Ce^{Lt}$. □

Proof of Lemma 6. Assuming $b = 0$, for $0 \le j \le m$ it is routine to verify by induction that $x_j \le c\,e^{Lj/n}$: indeed, using the induction hypothesis and the monotonicity of $s \mapsto e^{Ls}$, we infer $x_j \le c + \sum_{0 \le i < j} (L/n)\,c\,e^{Li/n} \le c + cL \int_0^{j/n} e^{Ls}\,\mathrm{d}s = c\,e^{Lj/n}$. The case $b > 0$ reduces to the case $b = 0$: the shifted sequence $\tilde{x}_j := x_j + b/L$ satisfies $\tilde{x}_j \le (c + b/L) + \sum_{0 \le i < j} L\tilde{x}_i/n$, so $x_j \le (c + b/L)e^{Lj/n} - b/L \le \bigl(c + b\min\{j/n, L^{-1}\}\bigr)e^{Lj/n}$, using $e^z - 1 \le \min\{z, 1\}\,e^z$ with $z = Lj/n$. □

Proof of Lemma 7. Define $\mathcal{E}_i := \{\max_{0 \le j \le i} M_j > t\}$ and $\lambda := t/(mc^2)$. Setting $S_0 := M_0 = 0$, define $S_i := \mathbb{1}_{\neg\mathcal{E}_{i-1}} M_i + \mathbb{1}_{\mathcal{E}_{i-1}} S_{i-1}$ and $X_i := S_{i+1} - S_i = \mathbb{1}_{\neg\mathcal{E}_i}[M_{i+1} - M_i]$. Since $x \mapsto e^{\lambda x}$ is a convex function and $(e^y + e^{-y})/2 \le e^{y^2/2}$ holds (by comparing Taylor series), it follows that

\[
e^{\lambda x} \le \frac{x + c}{2c}\,e^{\lambda c} + \frac{c - x}{2c}\,e^{-\lambda c} \le \frac{x}{2c}\bigl(e^{\lambda c} - e^{-\lambda c}\bigr) + e^{(\lambda c)^2/2} \quad\text{for } -c \le x \le c. \tag{16}
\]

We have $\mathbb{E}(X_i \mid \mathcal{F}_i) = \mathbb{1}_{\neg\mathcal{E}_i}\,\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = 0$ and $|X_i| \le |M_{i+1} - M_i| \le c$, so $\mathbb{E}(e^{\lambda X_i} \mid \mathcal{F}_i) \le e^{(\lambda c)^2/2}$ by (16). Iterating $\mathbb{E}(e^{\lambda S_{i+1}}) = \mathbb{E}\bigl(e^{\lambda S_i}\,\mathbb{E}(e^{\lambda X_i} \mid \mathcal{F}_i)\bigr) \le \mathbb{E}(e^{\lambda S_i}) \cdot e^{(\lambda c)^2/2}$ yields $\mathbb{E}(e^{\lambda S_m}) \le e^{m(\lambda c)^2/2}$.
Noting that $\mathcal{E}_m$ implies $S_m > t$, Markov's inequality and $\lambda = t/(mc^2)$ thus give

\[
\mathbb{P}\Bigl(\max_{0 \le j \le m} M_j > t\Bigr) = \mathbb{P}(\mathcal{E}_m) \le \mathbb{P}\bigl(e^{\lambda S_m} > e^{\lambda t}\bigr) \le \mathbb{E}(e^{\lambda S_m})\,e^{-\lambda t} \le e^{m(c\lambda)^2/2 - \lambda t} = e^{-t^2/(2mc^2)}. \tag{17}
\]

Since $\min_{0 \le j \le m} M_j < -t$ implies $\max_{0 \le j \le m}(-M_j) > t$, now a further application of (17) completes the proof (as $M'_i := -M_i$ also satisfies $M'_0 = 0$, $\mathbb{E}(M'_{i+1} - M'_i \mid \mathcal{F}_i) = 0$, and $|M'_{i+1} - M'_i| \le c$). □
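As an empirical sanity check of the maximal inequality (17), the following sketch estimates the exceedance probability for a $\pm c$ random walk, which is a martingale with $M_0 = 0$ and $|M_{j+1} - M_j| \le c$ (the walk and all parameters below are illustrative choices, not taken from this note):

```python
import math
import random

# Empirical sanity check of the maximal tail bound (17):
#   P(max_{0<=j<=m} |M_j| > t) <= 2 e^{-t^2/(2 m c^2)}
# for a +-c random walk, which is a martingale with M_0 = 0 and
# |M_{j+1} - M_j| <= c.  All parameters below are arbitrary illustrations.
rng = random.Random(1)
m, c, t, trials = 400, 1.0, 60.0, 2000

exceed = 0
for _ in range(trials):
    M = 0.0
    for _ in range(m):
        M += c if rng.random() < 0.5 else -c
        if abs(M) > t:
            exceed += 1
            break

empirical = exceed / trials
bound = 2 * math.exp(-t ** 2 / (2 * m * c ** 2))
print(f"empirical exceedance {empirical:.4f}  <=  bound {bound:.4f}")
```

The empirical frequency should fall below the bound $2e^{-t^2/(2mc^2)}$; the bound is not tight in this regime (reflection-principle heuristics suggest the true probability is several times smaller).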