Rough semimartingales and p -variation estimates for martingale transforms
Peter Friz and Pavel Zorin-Kranich
arXiv preprint [math.PR]
Abstract.
We establish a new scale of $p$-variation estimates for martingale paraproducts, martingale transforms, and Itô integrals, of relevance in rough paths theory, stochastic, and harmonic analysis. As an application, we introduce rough semimartingales, a common generalization of classical semimartingales and (controlled) rough paths, and their integration theory.

Contents
1. Statement of main results
  1.1. Itô integral
  1.2. Rough integrators
  1.3. Rough semimartingales
  1.4. Differential equations
2. Vector-valued estimates in discrete time
  2.1. Davis decomposition
  2.2. Vector-valued BDG inequality
  2.3. Vector-valued maximal paraproduct estimate
  2.4. Branched rough paths
3. Variational estimates in discrete time
  3.1. Stopping time construction
  3.2. Sewing lemma
  3.3. Discrete sums corresponding to Itô integrals
  3.4. Discrete sums arising in Itô integration of branched rough paths
4. Estimates for the Itô integral
  4.1. Itô integral
  4.2. Mesh convergence
5. Quadratic covariation of a controlled process and a martingale
  5.1. Variation norm estimate
  5.2. Discretization of quadratic covariation
  5.3. Integration by parts
  5.4. Quadratic covariation of two martingales
6. Consistency of rough and stochastic integration
Appendix A. Hölder estimates for martingale transforms
References

1. Statement of main results
Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ be a filtered probability space. An adapted partition $\pi$ is an increasing sequence of stopping times $(\pi_n)_{n \in \mathbb{N}}$ such that $\pi_0 = 0$ and $\lim_{n \to \infty} \pi_n = \infty$. The set of adapted partitions is a directed set with respect to the inclusion relation
$$\pi' \subseteq \pi :\iff \{ \pi'_n \mid n \in \mathbb{N} \} \subseteq \{ \pi_n \mid n \in \mathbb{N} \}.$$
For a two-parameter process $\Pi = (\Pi_{t,t'})_{0 \leq t \leq t' < \infty}$ and $p \in (0, \infty)$, the $p$-variation is defined by
$$V^p \Pi := \sup_{l_{\max},\, u_0 \leq \dots \leq u_{l_{\max}}} \Big( \sum_{l=1}^{l_{\max}} |\Pi_{u_{l-1},u_l}|^p \Big)^{1/p}, \tag{1.1}$$
with the $\ell^p$ norm replaced by the $\ell^\infty$ norm in the case $p = \infty$. For a one-parameter process $f = (f_t)_{t \geq 0}$, the $p$-variation is defined by
$$V^p f := V^p(\delta f), \quad (\delta f)_{t,t'} := f_{t'} - f_t.$$
The $p$-variation is a monotonically decreasing function of $p$. A classical result about $p$-variation is Lépingle's inequality [Lep76], which states that, for a càdlàg martingale $g = (g_t)_{t \geq 0}$, we have
$$\| V^p g \|_{L^q(\Omega)} \lesssim \| V^\infty g \|_{L^q(\Omega)}, \quad 2 < p \leq \infty, \quad 1 \leq q < \infty. \tag{1.2}$$
Here, $V^\infty g = \sup_{0 \leq t \leq t' < \infty} |g_{t'} - g_t|$.
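The definition of the $p$-variation of a one-parameter process can be made concrete for a path sampled at finitely many times. The following is a minimal sketch, not taken from the paper: the helper name `p_variation` is hypothetical, and the supremum over partitions is computed by a simple quadratic-time dynamic programme over the sample points.

```python
import math
from typing import Sequence


def p_variation(path: Sequence[float], p: float) -> float:
    """p-variation of a finitely sampled path (illustrative helper).

    Computes the supremum over all subsequences u_0 <= ... <= u_l of
    (sum_l |path[u_l] - path[u_{l-1}]|^p)^(1/p), with the l^p norm
    replaced by the l^infinity norm for p = infinity.
    """
    n = len(path)
    if n < 2:
        return 0.0
    if math.isinf(p):
        # The largest single increment is attained between the extremes.
        return float(max(path) - min(path))
    # best[j] = largest sum of p-th powers over chains ending at index j.
    best = [0.0] * n
    for j in range(1, n):
        best[j] = max(best[i] + abs(path[j] - path[i]) ** p for i in range(j))
    return max(best) ** (1.0 / p)


zigzag = [0, 1, 0, 1]
v1 = p_variation(zigzag, 1)         # total variation of the zigzag
v2 = p_variation(zigzag, 2)
vinf = p_variation(zigzag, math.inf)
```

On this zigzag path the values satisfy `v1 >= v2 >= vinf`, in line with the monotonicity of the $p$-variation in $p$. For a monotone path such as `[0, 1, 2]` and $p = 2$ the supremum is attained by the coarsest partition, which shows why all subsequences have to be considered rather than only the finest one.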
1.1. Itô integral.
Our first main result extends the estimate (1.7) to the case of Itô integrals with integrands whose variation exponent is $p_1 \geq 2$. The pathwise estimate (1.7) becomes false in this regime, and we have to substitute it with a moment estimate (which follows directly from (1.7), Hölder's, and Lépingle's inequalities in the case $p_1 < 2$). Moreover, we replace the increment process $\delta f$ by a general two-parameter process $F$; the motivation for doing so is explained below.

Theorem 1.1.
Let < q ≤ ∞ , ≤ q < ∞ , and < r, p ≤ ∞ . Suppose (1.8) /r < /p + 1 / /p i, + 1 /p i, , /q = 1 /q + 1 /q . Let ( F s,t ) s ≤ t be a càdlàg adapted process and ( g t ) a càdlàg martingale. Suppose thatthere exist càdlàg adapted processes F i , ˜ F i , i ∈ { , . . . , i max } , i max ∈ N , such that (1.9) F s,u − F t,u = i max X i =1 F is,t ˜ F it,u , s ≤ t ≤ u. Then, the following holds. (1)
For every adapted partition π , we have the estimate (cid:13)(cid:13) V r Π π ( F, g ) (cid:13)(cid:13) L q . k V p F ( π ) k L q k V ∞ g k L q + X i k V p i, F i, ( π ) · V p i, Π π ( ˜ F i , g ) k L q . (1.10)(2) For every i , let /q = 1 /q i, + 1 /q i, , and suppose that F i = lim π F i, ( π ) in L q i, ( V p i, ) , (1.11) Π( ˜ F i , g ) = lim π Π π ( ˜ F i , g ) exists in L q i, ( V p i, ) , (1.12) and ˜ F i ∈ L q ( V ∞ ) . Suppose that the right-hand side of (1.14) is finite. Then (1.13) Π( F, g ) := lim π Π π ( F, g ) exists as the limit of a Cauchy net in L q (Ω , V r ) , satisfies the bound (cid:13)(cid:13) V r Π( F, g ) (cid:13)(cid:13) L q . k V p F k L q k V ∞ g k L q + X i k V p i, F i · V p i, Π( ˜ F i , g ) k L q , (1.14) and, for any ≤ t ≤ t ′ ≤ t ′′ < ∞ , Chen’s relation(1.15) Π( F, g ) t,t ′′ = Π( F, g ) t,t ′ + Π( F, g ) t ′ ,t ′′ + X i F it,t ′ Π( ˜ F i , g ) t ′ ,t ′′ . The limit (1.13) is the Itô integral, which can also be denoted by(1.16) Π( F, g ) t,t ′ = Z ( t,t ′ ] F t,u − dg u . The hypothesis (1.11) is easily verfied if F i satisfies a structural hypothesis similar to(1.9) for F , see Lemma 4.1. The hypothesis (1.12) is typically obtained by recursiveapplication of Theorem 1.1.1.1.1. Relation to previous works.
In the case $F \equiv 1$, we have $\Pi^\pi(F, g) = \delta g$ for any adapted partition $\pi$. Moreover, the right-hand side of (1.9) is an empty sum in this case, so that Theorem 1.1 boils down to Lépingle's inequality (1.2). Our argument has its roots in the approach to Lépingle's inequality given in [Bou89; PX88]; we also refer to [Zor20] for a short self-contained exposition of this case.

If $F = \delta f$ are the differences of a càdlàg process $f$, then
$$F_{s,u} - F_{t,u} = (\delta f)_{s,t} \cdot 1 = F^1_{s,t} \cdot \tilde{F}^1_{t,u}$$
with $F^1 = F$ and $\tilde{F}^1 \equiv 1$. The convergence hypotheses (1.11) and (1.12) are witnessed by the stopping construction in Lemma 4.1. Since $\Pi(\tilde{F}^1, g) = \delta g$ and by Lépingle's inequality (1.2) for $g$, the estimate (1.14) becomes
$$\| V^r \Pi(\delta f, g) \|_{L^q} \lesssim \| V^{p_1}(\delta f) \|_{L^{q_1}} \| V^\infty g \|_{L^{q_0}}. \tag{1.17}$$
If $f$ is also a martingale, $1 \leq q_1 < \infty$, and $r > 1$, then, taking $p_1 = 2+$ and using Lépingle's inequality (1.2) for $f$, the estimate (1.17) implies
$$\| V^r \Pi(\delta f, g) \|_{L^q} \lesssim \| V^\infty f \|_{L^{q_1}} \| V^\infty g \|_{L^{q_0}}. \tag{1.18}$$
In this case, the object $\Pi(\delta f, g)$ is analogous to so-called paraproducts in harmonic analysis. For paraproducts, an estimate of the form (1.18) was first proved in [DMT12], motivated by an application of rough path theory in time-frequency analysis [DMT17, Corollary 1.2].

The estimate (1.18) is of interest because it shows that, for a (multidimensional) martingale $X$, the pair $(X, \Pi(X, X))$ is almost surely a rough path. For continuous martingales, the estimate (1.18) was proved in [FV06] (in the diagonal case $q_0 = q_1$). For càdlàg martingales, the estimate (1.18) was proved in [CF19] (in the diagonal case $q_0 = q_1$) and in [KZ19] (for general $q_0, q_1 > 1$).

For non-martingale integrands $f$, the estimate (1.17) is new. One of the motivations for considering this case is the construction of joint rough path lifts of rough paths and martingales, see Theorem 1.2 below, which underlies our notion of rough semimartingale. Another motivation, see e.g. [CL05] and [FV10a, Ch.14], is the analytic stability of Itô integrals of the form $\int \varphi(f) \, dg$, with sufficiently regular $\varphi$, as a function of $f$. A weaker version of the estimate (1.17), which does not respect the Hölder scaling condition on $q$, was proved in the case $q_0 = q_1 = 2$ in [DOP19, Proposition 3.13] and used to establish invariance principles of random walks in random environments in rough path topology.

Although of no direct interest in rough paths, we note that the case $p_1 = \infty$, $r = 2+$ of (1.17) is a consequence of Lépingle's inequality applied to the martingales $(\int_0^t f_{u-} \, dg_u)_t$ and $g$.
However, the approach via Theorem 1.1 is still preferable inthis case, since it provides a construction of the Itô integral R f u − d g u that natu-rally comes with variation norm estimates. We further elaborate on this point ofview in Section 4.2, where we deduce the classical convergence results for discreteapproximations to the Itô integral with respect to càdlàg local martingales ( M loc )from Theorem 1.1. At this point, the ability to take q = 1 , missing in [KZ19], isimportant, see Lemma 4.4.The estimate (1.14) for processes F that are not of the increment form is usefulfor the construction of Itô branched rough paths , see Section 3.4. For instance, if f ∈ L q ( V p ) with p ≥ , then the information R δf − d g is not sufficient for roughpath theory, and more stochastic building blocks have to be included. Theorem 1.1shows, for instance, that R ( δf − ) d g has variational exponent r = 1 / (2 /p + 1 / − .Note that one can choose r < iff p < which, in that case, reflects redundancy of R ( δf − ) d g from a rough integration perspective. In harmonic analysis, analogues ofsuch integrals are known as multilinear paraproducts , see e.g. [MTT02; Mus14].Another setting in which two-parameter integrands F are useful is that of con-trolled rough integration, introduced in [Gub04]. The easiest situation is as follows.Let X, Y, Y ′ be càdlàg adapted processes and g a càdlàg martingale. We interpret Y ′ as the Gubinelli derivative of Y with respect to X , so that the remainder term isgiven by(1.19) R ≡ δY − Y ′ δX : ⇐⇒ R s,t ≡ δY s,t − Y ′ s δX s,t . Then(1.20) R s,u − R t,u = δY ′ s,t δX t,u + R s,t · , and Theorem 1.1 implies the estimate (cid:13)(cid:13) V r Π( R, g ) (cid:13)(cid:13) q . (cid:13)(cid:13) V r Y ′ · V / (1 /r +1 / Π( δX, g ) (cid:13)(cid:13) q + (cid:13)(cid:13) V / (1 /r +1 /r ) R (cid:13)(cid:13) q k Sg k q . 
When the $\ell^r$ norm implicit in the left-hand side of this estimate is computed for a given partition $\pi$, this estimate can be interpreted as a bound for the error in a discrete approximation of the controlled integral $\int Y \, dg$.

Such integrands also appear in stochastic numerics, see e.g. [KP92, Ch.5], [GL97, Lem.4.2.], or [KN07].
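The algebra behind the controlled remainder (1.19), the identity (1.20), and the resulting instance of Chen's relation (1.15) can be checked numerically in discrete time. The sketch below is an illustration under stated assumptions, not the paper's construction: all paths are scalar random walks on an integer time grid, the discrete paraproduct is taken to be the left-point sum $\sum_{s < j \leq t} F_{s,j-1} (g_j - g_{j-1})$, and the helper names are hypothetical. The identities are purely algebraic, so the martingale property of $g$ plays no role here.

```python
import random


def delta(path, s, t):
    """Increment (delta path)_{s,t} = path[t] - path[s]."""
    return path[t] - path[s]


def remainder(Y, Yp, X, s, t):
    """R_{s,t} = (delta Y)_{s,t} - Y'_s (delta X)_{s,t}, cf. (1.19)."""
    return delta(Y, s, t) - Yp[s] * delta(X, s, t)


def left_point_sum(F, g, s, t):
    """Assumed discrete paraproduct: sum over s < j <= t of F(s, j-1) dg_j."""
    return sum(F(s, j - 1) * (g[j] - g[j - 1]) for j in range(s + 1, t + 1))


def random_walk(n, rng):
    """Simple +-1 random walk with n steps, started at 0."""
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.choice((-1.0, 1.0)))
    return path


rng = random.Random(0)
n = 12
X, Y, Yp, g = (random_walk(n, rng) for _ in range(4))


def R(s, t):
    return remainder(Y, Yp, X, s, t)


def dX(s, t):
    return delta(X, s, t)


def triples(n):
    """All index triples s <= t <= u on the grid 0..n."""
    return ((s, t, u) for s in range(n + 1)
            for t in range(s, n + 1) for u in range(t, n + 1))


def max_defect_identity():
    """Largest violation of R_{s,u} - R_{t,u} = dY'_{s,t} dX_{t,u} + R_{s,t}."""
    return max(abs(R(s, u) - R(t, u) - delta(Yp, s, t) * dX(t, u) - R(s, t))
               for s, t, u in triples(n))


def max_defect_chen():
    """Largest violation of the Chen-type relation for the sums of R against g."""
    worst = 0.0
    for s, t, u in triples(n):
        lhs = left_point_sum(R, g, s, u)
        rhs = (left_point_sum(R, g, s, t) + left_point_sum(R, g, t, u)
               + delta(Yp, s, t) * left_point_sum(dX, g, t, u)
               + R(s, t) * delta(g, t, u))
        worst = max(worst, abs(lhs - rhs))
    return worst
```

Both `max_defect_identity()` and `max_defect_chen()` should vanish up to rounding. The second check uses the decomposition with $F^1 = \delta Y'$, $\tilde F^1 = \delta X$ and $F^2 = R$, $\tilde F^2 \equiv 1$, for which the sum of $\tilde F^2$ against $g$ reduces to the increment of $g$.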
1.1.2. Further variants.
Theorem 1.1 continues to hold with all processes being Hilbert space valued, upon replacing all products by tensor products, and the bounds do not depend on the dimensions of the Hilbert spaces.

The limiting variational estimate (1.14) has a precise analogue in Hölder topology, given in Appendix A, which extends and quantifies some previous constructions, notably Diehl et al. [DOR15] and [FH20, Ch.13] (with $g$ taken as Brownian motion). To wit, in these references the Hölder regularity is obtained by some variation of Kolmogorov's criterion (or Besov-Hölder embedding); the resulting $(1/q)^+$-loss on the Hölder exponent (integrability parameter $q$) is avoided in Theorem A.1.

1.2. Rough integrators.
Second main result concerns integrals formally given by Π( g, Y) t,t ′ ≡ Z ( t,t ′ ] ( δg ) t,u − dY u , where g is a martingale and Y is a suitable (rough) càdlàg process. When Y hasfinite p -variation sample paths, for p < , use Young’s inequality pathwise, with p > such that /p + 1 /p > max(1 , /r ) , followed by Hölder’s inequality (with q, q , q as in Theorem 1.1) and Lépingle’s estimate (applied to || V p g || L q ) to see(1.21) (cid:13)(cid:13) V r Π( g, Y) (cid:13)(cid:13) L q (Ω) . k V p Y k L q (Ω) k V ∞ g k L q (Ω) . When p ≥ , pathwise arguments fail, and this includes the case when Y is anothercàdlàg martingale (hence dealt with by Theorem 1.1), which requires stochastic ar-guments. For any partition π , of [0 , T ] say, integration by parts for sums gives ( Y T − Y )( g T − g ) − Π π ( Y, g ) ,T = Π π ( g, Y ) ,T + X π j Y, g ] π . We give an example where [ Y, g ] , hence Π( g, Y) , does not exist. Example . Let g = B , a standard Brownian motion, and Y t := R t ( t − s ) H − / d B ,so that Y = B H is a fractional Brownian motion (fBm) of Hurst parameter H ,of finite p variation, any p > /H (and no better). Take T = 1 and compute k P π j < ( B Hπ j +1 − − B Hπ j )( B π j +1 − B π j ) k L (Ω) ∼ mesh( π ) H +1 / − , as seen by Itô isom-etry, hence divergent in the rough regime H ∈ (0 , / . (This implies that theItô integral R B H d B has infinite Itô-Stratonovich correction, cf. [FH20, Ch.14,15]for discussion of this example from KPZ type renormalisation perspective.) As aconsequence, R B d B H = lim π Π π ( B, B H ) does not exist.The problem in this example is correlation , and taking Y = X deterministic (orindependent of g ) is a way of ruling out such situations. (This example explains whyindependence of components is a standard assumption for Gaussian rough paths[FV10b]. ) A flexible structural assumption to overcome this problem is to assumefor the (adapted) process Y to be (analytically) close to a deterministic referencepath X . Theorem 1.2. 
Let q, q , q be as in Theorem 1.1, < r ≤ ∞ , and < ˆ p < ≤ p ≤ ∞ with /r < / /p . Let X be a deterministic càdlàg path, Y = ( Y, Y ′ ) acàdlàg adapted process, and g a càdlàg martingale. Assume that V ∞ g ∈ L q , M Y ′ := sup t | Y ′ t | ∈ L q , X ∈ V p , V ˆ p R Y ∈ L q , where (1.22) R Y s,t := R Y ,Xs,t := Y t − Y s − Y ′ s ( X t − X s ) , ≤ s ≤ t < ∞ . Then, there exists a process (Π( g, Y) t,t ′ ) ≤ t ≤ t ′ < ∞ with the following properties. For an independent Brownian B ⊥ , existence of R B ⊥ d B H = lim Π( B ⊥ , B H ) π holds in L (Ω) . P. FRIZ AND P. ZORIN-KRANICH (1) Along deterministic partitions π , we have existence of (1.23) Π( g, Y) ,T = u . c . p . -lim mesh( π ) → Π π ( g, Y ) ,T =: Z T ( δg ) ,t − dY t . (2) We have Chen’s relation (1.24) Π( g, Y) t,t ′′ = Π( g, Y) t,t ′ + Π( g, Y) t ′ ,t ′′ + ( g t ′ − g t )( Y t ′′ − Y t ′ ) . (3) We have the bound (1.25) (cid:13)(cid:13) V r Π( g, Y) (cid:13)(cid:13) L q (Ω) . (cid:16) V p X k M Y ′ k L q (Ω) + k V ˆ p R Y k L q (Ω) (cid:17) k V ∞ g k L q (Ω) . Theorem 1.2 is proved in Section 5.3. The construction of Π( g, Y) is based onthe aforementioned integration by parts identity in combination with constructingquadratic covariation, given as (u.c.p.) limit of [ Y, g ] π (see Definition 5.2), for everylocal martingale g , identified explicitly in Theorem 5.4 as(1.26) X s ≤ t ∆ X s Y ′ s − ∆ g s + X s ≤ t ∆ R Y s ∆ g s =: [Y , g ] t , were our notation tracks Y = ( Y, Y ′ ) ↔ ( Y ′ , R Y ) , with X fixed. (Note that, ingeneral, [ Y, Y ] π does not converge.) Again, several remarks are in order. • The exponent p quantifies the variational regularity of both X and Y . Theassumption p ≥ is not essential. Indeed, as noted above, when p < onecan use (pathwise) Young, Hölder, and Lépingle to get the estimate (1.21),from which (1.25), if so desired, is an easy consequence. 
• The assumption $\hat{p} < 2$ reflects the "length" of the expansion $Y_t \approx Y_s + Y'_s (X_t - X_s)$, familiar from controlled rough path theory (think: $\hat{p} = p/2$), although we do not need to control any variation norm of $Y'$ here: Theorem 1.2 is a stochastic result, and not based on pathwise (sewing) arguments. It is then clear that the condition on $\hat{p}$ could be relaxed by suitable higher order "controlledness" assumptions, but we have not pursued this further.
• The special case of deterministic $X$ corresponds to $(Y, Y') = (X, 1)$, $R^{\mathbf{Y}} = 0$. Take $q_1 = \infty$ and $1 \leq q_0 = q < \infty$, so that (1.25) simplifies to
$$\| V^r \Pi(g, X) \|_{L^q(\Omega)} \lesssim (V^p X) \| V^\infty g \|_{L^q(\Omega)}. \tag{1.27}$$
In case of random $X$, but independent of $g$, this estimate can be used upon conditioning on $X$, and immediately gives
$$\| V^r \Pi(g, X) \|_{L^q(\Omega)} \lesssim \| V^p X \|_{L^q(\Omega)} \| V^\infty g \|_{L^q(\Omega)}.$$
The better integrability of the left-hand side, compared to (1.25), is a consequence of independence.
• U.c.p. convergence as $\operatorname{mesh}(\pi) \to 0$ in (1.23) fails in general for the two-parameter processes $\Pi^\pi(g, \mathbf{Y})_{t,t'}$. In fact, it already fails in the simpler situation of Corollary 4.5, which deals with mesh convergence of discrete approximations to Itô integrals.

1.3. Rough semimartingales.
Recall that a classical semimartingale $Z = g + Y$, possibly vector valued, is the sum of a càdlàg local martingale $g$ and a càdlàg adapted $Y \in V^1$. This was generalised, at least in the continuous setting, to Dirichlet processes [Föl81], where the finite variation condition on $Y$ is replaced by vanishing quadratic variation. In a similar spirit, we can define Young semimartingales (YSM) as processes $Z = g + Y$, as above, but now with $Y \in V^{2-}_{\mathrm{loc}}$, meaning $V^p_{\mathrm{loc}}$ for some $p \in [1, 2)$.
Although this decomposition need not be unique, the paraproduct $\Pi(Z, \bar{Z})_{t,t'} = \int (\delta Z)_{t,u-} \, d\bar{Z}_u$ is easily seen to be well-defined, essentially as a consequence of Itô and Young integration, with pathwise estimates obtained by combining Young and Lépingle, exactly as was done for (1.21). Examples of suitable $V^{2-}_{\mathrm{loc}}$ processes include fractional Brownian motion with Hurst parameter $H > 1/2$ and $\alpha$-stable Lévy processes, $\alpha < 2$, see [JM83; Man04] for some general results.

Both Dirichlet processes and Young semimartingales face a seemingly fundamental barrier at $p = 2$. Yet, Theorems 1.1 and 1.2 provide us with a way of going beyond: the key idea is to postulate a deterministic reference path $X$. (This assumption appears naturally, e.g. under partial conditioning of driving noise, cf. Corollary 1.9.)

Definition 1.3. Let $p \in [2, 3)$. Let $X$ be a càdlàg adapted process, with values in some Hilbert space $\tilde{H}$ and $X \in V^p_{\mathrm{loc}}$ almost surely. We call a pair of càdlàg adapted processes $\mathbf{Y} = (Y, Y')$ with values in some Hilbert space $H$ and in the operator space $L(\tilde{H}, H)$, respectively, an $X$-controlled $p$-rough process if $Y, Y' \in V^p_{\mathrm{loc}}$ and $R^{\mathbf{Y},X} \in V^{p/2}$, almost surely.

Definition 1.4. Let $p \in [2, 3)$ and $X \in V^p_{\mathrm{loc}}$ be a càdlàg deterministic path. We define an $X$-controlled $p$-rough semimartingale (RSM) to be a càdlàg adapted process of the form
$$(g + Y, Y') : \Omega \times [0, \infty) \to H \oplus L(\tilde{H}, H),$$
where $g$ is a càdlàg local martingale and $\mathbf{Y} = (Y, Y')$ is an $X$-controlled $p$-rough càdlàg adapted process.

A trivial example of an $X$-controlled $p$-RSM is given by $(g + X, \mathrm{Id})$ for some deterministic càdlàg path $X \in V^p_{\mathrm{loc}}$, $p < 3$, as may be supplied by a typical realization of another martingale. The following can be seen as an RSM version of the Doob–Meyer decomposition for special semimartingales.

Theorem 1.5. Let $(g + Y, Y')$ be a RSM. Assume $Y = Y(t, \omega)$ is previsible, $Y(0, \omega) = 0$. Then the decomposition is unique.

Proof.
From (1.26), using crucially the existence of the reference path X , the qua-dratic covariation, given as (u.c.p.) limit of [ Y, ¯ g ] π , exists and vanishes for every continuous local martingale ¯ g . (This shows that g + Y is a weak Dirichlet process inthe sense of [ER03; Coq+06]). Consider now two decompositions g + Y = g + Y with Y i previsible. Then Y − Y =: ¯ g is a previsible local martingale, hence acontinuous local martingale. But then [ Y − Y , Y − Y ] π = [ Y , ¯ g ] π − [ Y , ¯ g ] π and both terms on the right-hand side vanish upon refinement of π . This shows that Y − Y is a continuous martingale with vanishing quadratic variation, starting atzero, hence identically equal to zero. (cid:3) Similar to controlled rough paths, the notion of RSM is most fruitful when pairedwith rough paths . Recall [Lyo98; FS17], see also [Wil01] and [Che+19] for a recentreview (with applications to homogenization), that a càdlàg p -rough path with p ∈ (2 , can be viewed as a pair of càdlàg processes X = ( X, X ) = (( X t ) , ( X s,t )) with val-ues in a Banach space B and a tensor product space B ⊗ B , with V p X, V p/ X (locallyin time) finite and subject to Chen relation X t,t ′′ = X t,t ′ + X t ′ ,t ′′ + ( δX ) t,t ′ ( δX ) t ′ ,t ′′ .Recall further that càdlàg X -controlled p -rough paths can be integrated against X and, more generally, other càdlàg X -controlled p -rough paths, Z (0 ,T ] δ Y d ¯Y = lim mesh( π ) → Π π (Y , ¯Y) ,T , Π π (Y , ¯Y) T,T ′ = X π j ≤ T δY ,π j δ ¯ Y π j ,π j +1 ∧ T + Y ′ π j ¯ Y ′ π j X π j ,π j +1 ∧ T . (1.28)The statement with mesh convergence above is from [FZ18, Proposition 2.6]; theproof in fact also shows that the convergence is locally uniform in T . Convergenceof càdlàg rough integrals in the net sense was proved in [FS17, Theorem 34] (with ¯ Y = X , ¯ Y ′ = 1 ; see [FH20, Remark 4.12] for the general case), extending the Höldercontinuous case in [Gub04]. P. FRIZ AND P. ZORIN-KRANICH Theorem 1.6. 
Let p ∈ [2 , , X = ( X, X ) be a càdlàg p -rough path and W =( g + Y, Y ′ ) , ¯W = (¯ g + ¯ Y , ¯ Y ′ ) be two rough semimartingales. Then the paraproduct Π(W , ¯W) t,t ′ := Z ( t,t ′ ] δ ( g + Y ) t,u − d¯ g u + Z ( t,t ′ ] ( δg ) t,u − d ¯Y u + Z ( t,t ′ ] ( δ Y) t,u − d ¯Y u is well-defined, as sum of Itô integral, then R δg d ¯Y := Π( g, ¯Y) , and at the far righta rough integral, with quantitative estimates provided respectively by Theorem 1.1,Theorem 1.2 and (càdlàg) rough integration theory. The enhanced paraproduct (Π(W , ¯W) ,t , δ ( g + Y ) ,t Y ′ t ) defines another rough semimartingale, with local martingale component given by theItô integral R (0 ,t ] δ ( g + Y ) ,u − d¯ g u . Furthermore, a càdlàg p -rough path is given by W s,t := ( δ ( g + Y ) s,t , Π(W , W) s,t ) . Proof. By Corollary 4.5, Theorem 5.4, and (1.28), we have Π(W , ¯W) , · = u . c . p . -lim mesh( π ) → (cid:16) X π j Let X = ( X, X ) be a càdlàg p -rough path over R m , p ∈ (2 , , and g an R n -valued martingale with V ∞ g ∈ L q , for some ≤ q < ∞ . Then, a.s., themap (1.29) J : ( X , g ( ω )) (cid:18)(cid:18) Xg (cid:19) , (cid:18) X Π( g, X )Π( X, g ) Π( g, g ) (cid:19)(cid:19) = ( X g ( ω ) , X g ( ω )) . takes values in the space of càdlàg p -rough paths over R m + n , with q -integrable ho-mogeneous rough path norm, given by V p hom X g := V p X g + ( V p/ X g ) / ∈ L q . Moreover, J is locally Lipschitz continuous in the sense that (cid:13)(cid:13) V p ( X g − X g ) (cid:13)(cid:13) L q . V p ( X − X ) + k V ∞ ( g − g ) k L q , OUGH SEMIMARTINGALES 9 and k V p/ (Π( X , g ) − Π( X , g )) k L q + k V p/ (Π( g , X ) − Π( g , X )) k L q . ( V p X ) k V ∞ ( g − g ) k L q + V p ( X − X ) k V ∞ g k L q , k V p/ (Π( g , g ) − Π( g , g )) k L q / . 
( k V ∞ g k L q + k V ∞ g k L q ) k V ∞ ( g − g ) k L q In particular, the map ( X , g ) 7→ J ( X , g ) =: ¯ X = ( ¯ X, ¯ X ) is continuous (and uni-formly so on bounded sets), with respect to homogeneous L q rough paths metric k V p hom ( ¯ X − ¯ X ) k L q ≍ k V p ( ¯ X − ¯ X ) k L q + k ( V p/ ( ¯ X − ¯ X )) / k L q . Differential equations. In Theorem 1.6, we gave a canonical construction ofa (random) p -rough path W associated to any rough semimartingale W = ( g + Y, Y ′ ) in sense of Definition 1.4. The parameter p ∈ (2 , and the reference path X arekept fixed. In particular, rough semimartingales can drive differential equations,(1.30) dZ = σ ( Z − ) dW : ⇐⇒ dZ = σ ( Z − ) d W , understood for a.e. realization of W = W ( ω ) as rough differential equation (bynature, multidimensional). This should be contrasted with SDEs driven by weakDirichlet processes [CR07], essentially restricted to scalar drivers. Standard resultsin (deterministic) rough path theory provide a unique solution Z = Z ( W , Z ) of theinitial value problem for (1.30) provided that σ ∈ Lip p + . The construction assuresthat Z t = Z t ( W ( ω ) , Z ( ω )) defines an adapted (càdlàg) process provided that theinitial datum Z is F -measurable. When ( Y, Y ′ ) = 0 , W is nothing but the Itôrough path lift of the càdlàg local martingale g , as previously constructed in [CF19],and yields (a robust version of) the classical Itô solution, as found in textbooks onstochastic differential equations. We find it instructive to replace σ by ( σ, µ ) andspecialise to(1.31) dZ = σ ( Z − ) d X + µ ( Z − ) d g : ⇐⇒ dZ = ( σ, µ )( Z − ) d J ( X , g ) . Many authors (e.g. [GN08]) have consider the situation where g = B , a multidi-mensional Brownian motion, and X replaced by an independent fractional Brownian B H motion with H > / . In this case, the left-hand side of (1.31) makes sensein mixed Young Itô sense (and could accordingly be phrased in terms of Youngsemimartingales). 
From the perspective of [FV10b], it suffices to construct ( B H , B ) jointly as Gaussian rough paths, which is possible for H > / . Equation (1.31),in case when g is a Brownian motion B and X a geometric Hölder rough path, wastreated in [Cri+13] as flow transformed Itô SDE, in [DOR15; DFS17], in the right-hand side sense of (1.31). (In absence of jumps, the situations is much simplified inthat that ( X , B ) is construction by a Kolmogorov type criterion for rough paths; see[FH20, Ch.12] for a review.) Still in a continuous setting, forthcoming work [FHL20]employes stochastic sewing arguments.Back to the case of càdlàg g ∈ M loc , with càdlàg p -rough X , the (formal) left-handof (1.31) suggests that Z is a rough semimartingale with local martingale componentgiven by the (well-defined) Itô integral R µ ( Z − ) d g . However, from a rough pathperspective, Z is constructed as an ( X, g ) -controlled rough path. Knowing only X ,this is insufficient to define Y ? = R σ ( Z − ) d X by (purely analytic) rough integration.The next theorem shows that the left-hand side of (1.31) has, thanks to stochasticcancellations, a bona-fide integral meaning after all. Theorem 1.8 (cf. Theorem 6.3) . Let σ, µ ∈ Lip p + , so that (1.31) admits a uniquesolution process in RDE sense, given by (1.32) Z t ( ω ) := Z t ( X , Z ( ω ); ω ) := Z t ( J ( X , g )( ω ) , Z ( ω )) , This restriction is easy to understand since every deterministic continuous path is a weakDirichlet process. In general, this is not sufficient to drive a differential equations (the raison d’êtreof rough path theory). adapted for F -measurable Z . Then ( Z, σ ( Z )) is a rough semimartingale with de-composition Z = M + Y with local martingale component M = R · µ ( Z − ) d g and Y given by Y t = u . c . p . -lim d-mesh( π ) → X j : π j RDE theory yields a solution ( Z, ( σ, µ )( Z )) as ( X, g ) -controlled p -rough pro-cess. 
By Theorem 6.3, we see that ( Z, σ ( Z )) is an X -controlled p -RSM, as is ( σ ( Z ) , Dσ ( Z ) ◦ σ ( Z )) by Corollary 6.4. To see the stated decomposition into lo-cal martingal and rough drift part, we write the RDE solution as integral equation,obtained as mesh-limit of local approximations given by δZ s,t ∼ = f ( Z s )( δX ) s,t + f ( Z s ) X s,t + f ( Z s )( δg ) s,t + f ( Z s )Π( X, g ) s,t + f ( Z s )Π( g, X ) s,t + f ( Z s )Π( g, g ) s,t where f = σ, f = µ, f = Dσ ◦ σ and so on. (Our assumptions on σ, µ imply thatall the f ij ’s are bounded.) It follows from Lemma 6.1 and 6.2 that converges stilltakes place when f , f , f are set to zero, provided we restrict ourselves to themesh limit of deterministc partitions. What remains are Itô left-point sums, with f -terms, and u.c.p. Itô limit M = R µ ( Z − ) d g . All these entails convergence ofsum with the remaining terms ( f and f ), as given in the statement. Alternatively,though equivalently, we can see as R σ ( Z − ) d X as integral of a rough semimartingaleagainst (0 + X, Id) , trivially another X -controlled rough semimartingale, hence relyon Theorem 1.6. (cid:3) The next result asserts, loosely speaking, that an Itô SDE solution, conditioned on(an independent) part of the driving noise, is a.s. a rough semimartingale. (This canbe seen as major extension of the rather trivial fact B ( ω ) + X is a rough semimartin-gale (in ω ) for a.e. typical realization of X = B ⊥ ( ω ′ ) , for independent Brownianmotions B, B ⊥ .) Corollary 1.9. Assume g = g ( ω ) and X = X ( ω ′ ) are independent local martingales,defined on some filtered product space ( ¯Ω , ¯ F ) = (Ω , F ) × (Ω ′ , F ′ ) . Let σ, µ be as inTheorem 1.8 and write ˜ Z ( Z ; ω, ω ′ ) for the unique ¯ F -adapted solution of the Itô SDE (1.33) d ˜ Z = σ ( ˜ Z − ) d X + µ ( ˜ Z − ) d g with ¯ F -measurable initial data Z = Z ( ω, ω ′ ) . 
With the Itô rough path lift of X , X ( ω ′ ) = ( X, X )( ω ′ ) = ( X ( ω ′ ) , Π( X, X )( ω ′ ) and rough semimartingale Z as in (1.32) we have, for a.e. ω and a.e. ω ′ , (1.34) ˜ Z ( Z ; ω, ω ′ ) = Z ( X ( ω ′ ) , Z ( ω, ω ′ ); ω ) . Proof. In view of uniqueness of the Itô solution, it suffices to show that the right-hand side of (1.34) is an Itô solution of (1.33). By Theorem 1.8, it suffices to showthat u . c . p . -lim d-mesh( π ) → X j : π j Z, σ ( ˜ Z )) given X isexpressed terms of the distribution of the rough semimartingale ( Z, σ ( Z )) . OUGH SEMIMARTINGALES 11 Vector-valued estimates in discrete time The main result of this section, Theorem 2.6, is a bound for discrete time versionsof the Itô integral. Its main advantage over the previous result [KZ19, Proposition3.1] is that the integrands F ( k ) are allowed to be arbitrary two-parameter processes,rather than martingale differences. The connection of Theorem 2.6 with variationnorm estimates will be established in Corollary 3.4.We begin this section by recalling several known results. We abbreviate k·k q := k·k L q (Ω) .2.1. Davis decomposition. For a scalar-valued process ( f n ) , we denote the mar-tingale maximal function and its stopped version by M f := sup n | f n | , M t f := sup n ≤ t | f n | , and the martingale square function and its stopped version by Sf := ℓ n | df n | , S t f := ℓ n ≤ t | df n | . Here and later, dg j := g j − g j − . We denote ℓ p norms by ℓ pk a k := ( X k ∈ N | a k | p ) /p . In order to simplify notation, we only consider martingales g with g = 0 . Theorem 2.1 (Davis decomposition [Dav70], see e.g. [Hyt+16, Theorem 3.4.3]) . Let ( f n ) ∞ n =0 be a martingale with values in a Banach space X . Suppose that f = 0 and f n ∈ L (Ω → X, F n ) for all n . 
Then there is a decomposition f n = f pred n + f bv n intomartingales adapted to the same filtration with f pred0 = 0 such that the differences of f pred have predictable majorants: (2.1) k df pred n k X ≤ n ′ Let ≤ q < ∞ , X be a Banach function space, elements of which are R -valued maps x ( · ) , and ( f n ) a martingale with values in X . Then for f pred givenby Theorem 2.1 we have kk Sf pred k X k L q . q kk Sf k X k L q , where the square function is given by k Sf k X := k ℓ n ( df n ( · )) k X Remark . We will apply this with X = ℓ r , i.e. r -summable series, viewed as mapsfrom N → R , with the usual Banach structure. Proof. Using (2.2) we estimate kk Sf pred k X k L q ≤ kk Sf k X k L q + kk Sf bv k X k L q ≤ kk Sf k X k L q + kk X n | d n f bv |k X k L q ≤ kk Sf k X k L q + k X n k d n f bv k X k L q ≤ kk Sf k X k L q + C k sup n k d n f k X k L q . kk Sf k X k L q . (cid:3) Vector-valued BDG inequality. We recall the weighted Burkholder–Davis–Gundy inequality. Lemma 2.3 ([Os¸e17]) . Let ( f n ) be a martingale and ( w ) a positive random variable.Then E ( M f · w ) ≤ √ E ( Sf · M w ) , where M w = sup n E ( w | F n ) .Remark . The proof of Lemma 2.3 given in [Os¸e17] also works for martingales withvalues in a real Hilbert space. Lemma 2.4. Let h ( k ) be martingales with respect to some fixed filtration. Let ≤ q < ∞ and ≤ r < ∞ . Then we have (2.3) (cid:13)(cid:13) M h ( k ) (cid:13)(cid:13) L q ( ℓ rk ) . q,r (cid:13)(cid:13) Sh ( k ) (cid:13)(cid:13) L q ( ℓ rk ) . Proof. First we consider the case < q < ∞ .Take positive functions with k w ( k ) k L q ′ ( ℓ r ′ k ) = 1 . Then, by Lemma 2.3, we have E (cid:18)X k ( M h ( k ) ) w ( k ) (cid:19) . X k E (cid:18) Sh ( k ) M w ( k ) (cid:19) ≤ (cid:13)(cid:13) Sh ( k ) (cid:13)(cid:13) L q ( ℓ rk ) (cid:13)(cid:13) M w ( k ) (cid:13)(cid:13) L q ′ ( ℓ r ′ k ) . By the vector-valued Doob’s inequality [Hyt+16, Theorem 3.2.7], we have (cid:13)(cid:13) M w ( k ) (cid:13)(cid:13) L q ′ ( ℓ r ′ k ) . 
(cid:13)(cid:13) w ( k ) (cid:13)(cid:13) L q ′ ( ℓ r ′ k ) = 1 . Taking the supremum over w ( k ) , we obtain the claim.Now we consider q = 1 . The case r = 1 follows from the usual BDG inequality, sowe may assume < r < ∞ .Decompose ~h = ~h pred + ~h bv as in Theorem 2.1 with X = ℓ r . For λ > , define thestopping time τ := inf { t | k S t h pred k ℓ r > λ or k S t h k ℓ r > λ } . We claim that(2.4) k Sh pred τ k ℓ r ≤ k Sh pred k ℓ r ∧ Cλ. Indeed, the first bound is trivial, and the second bound is only non-void if < τ < ∞ .In the latter case, by (2.1), we have k Sh pred τ k ℓ r ≤ k Sh pred τ − k ℓ r + k h pred τ − h pred τ − k ℓ r ≤ λ + 4 sup n ′ <τ k h n ′ − h n ′ − k ℓ r ≤ λ. Also, {k M h pred k ℓ r > λ } ⊆ {k M h pred τ k ℓ r > λ } ∪ { τ < ∞}⊆ {k M h pred τ k ℓ r > λ } ∪ {k Sh k ℓ r > λ } ∪ {k Sh pred k ℓ r > λ } By the layer cake formula, k M h pred k L ( ℓ r ) = Z ∞ P {k M h pred k ℓ r > λ } d λ ≤ Z ∞ P {k M h pred τ k ℓ r > λ } d λ + Z ∞ P {k Sh pred k ℓ r > λ } d λ + Z ∞ P {k Sh k ℓ r > λ } d λ =: I + II + III. The term III is the claimed right-hand side of the estimate (2.3), again by the layercake formula. By Lemma 2.2, we have II = kk Sh pred k ℓ r k L . kk Sh k ℓ r k L . OUGH SEMIMARTINGALES 13 Using the already known L r ( ℓ r ) case of Lemma 2.4 and (2.4), we bound the firstterm by I . Z ∞ λ − r k Sh pred τ k rL r ( ℓ r ) d λ ≤ Z ∞ λ − r kk Sh pred k ℓ r ∧ Cλ k rL r d λ = E Z ∞ min (cid:0) λ − r k Sh pred k rℓ r , (cid:1) d λ . E k Sh pred k ℓ r = II, and we reuse the previously established estimate for II . (cid:3) Remark . Lépingle’s inequality (1.2) can be obtained from Lemma 2.4 and Corol-lary 3.2. In fact, Corollary 3.2 simplifies for processes Π that are of difference form,see [Zor20, Corollary 2.4], so that the vector-valued bound (2.3) is not necessary toshow (1.2).2.3. Vector-valued maximal paraproduct estimate. 
We call a two-parameterprocess ( F s,t ) s ≤ t adapted if F s,t is F t -measurable for every s ≤ t .For an adapted process ( F s,t ) and a martingale ( g n ) , we define(2.5) Π s,t ( F, g ) := X s Let < q, q ≤ ∞ , ≤ q , r, r < ∞ , ≤ r ≤ ∞ . Assume /q = 1 /q + 1 /q and /r = 1 /r + 1 /r . Then, for any martingales ( g ( k ) n ) n , anyadapted sequences ( F ( k ) s,t ) s ≤ t , and any stopping times τ ′ k ≤ τ k with k ∈ Z , we have (2.6) (cid:13)(cid:13) ℓ rk sup τ ′ k ≤ t ≤ τ k | Π( F ( k ) , g ( k ) ) τ ′ k ,t | (cid:13)(cid:13) q ≤ C q ,q ,r ,r (cid:13)(cid:13) ℓ r k sup τ ′ k ≤ t<τ k | F ( k ) τ ′ k ,t | (cid:13)(cid:13) q k ℓ r k Sg ( k ) τ ′ k ,τ k k q , where Sg s,t := (cid:0)P tj = s +1 | dg j | (cid:1) / .Proof of Proposition 2.5. We may replace each g ( k ) by the martingale(2.7) ˜ g ( k ) n := g ( k ) n ∧ τ k − g ( k ) n ∧ τ ′ k without changing the value of either side of (2.6).Consider first q ≥ . For each k , the sequence h ( k ) t := ( , t < τ ′ k , Π( F ( k ) , g ( k ) ) τ ′ k ,t , t ≥ τ ′ k , is a martingale. We may also assume F τ ′ k ,t = 0 if t [ τ ′ k , τ k ) . By Lemma 2.4, we canestimate LHS (2.6) . (cid:13)(cid:13) ℓ rk | Sh ( k ) | (cid:13)(cid:13) q = (cid:13)(cid:13) ℓ rk ℓ j | F ( k ) τ ′ k ,j − dg ( k ) j | (cid:13)(cid:13) q ≤ (cid:13)(cid:13) ℓ rk M F ( k ) ℓ j | dg ( k ) j | (cid:13)(cid:13) q ≤ k ℓ r k M F ( k ) k q (cid:13)(cid:13) ℓ r k Sg ( k ) (cid:13)(cid:13) q . Here and later, we abbreviate M F := sup j | F τ ′ k ,j | .Consider now q < . By homogeneity, we may assume(2.8) (cid:13)(cid:13) ℓ r k M F ( k ) (cid:13)(cid:13) q = (cid:13)(cid:13) ℓ r k Sg ( k ) (cid:13)(cid:13) q = 1 , and we have to show (cid:13)(cid:13) ℓ rk sup τ ′ k ≤ t ≤ τ k | Π( F ( k ) , g ( k ) ) τ ′ k ,t | (cid:13)(cid:13) q . . We use the Davis decomposition g = g pred + g bv (Theorem 2.1 with X = ℓ r ). 
Thecontribution of the bounded variation part is estimated as follows: k ℓ rk sup τ ′ k ≤ t ≤ τ k | Π( F ( k ) , g ( k ) , bv ) τ ′ k ,t |k q ≤ k ℓ rk X j | F ( k ) τ ′ k ,j − | · | dg ( k ) , bv j |k q ≤ k ℓ r k M F ( k ) k q k ℓ r k (cid:16)X j | dg ( k ) , bv j | (cid:17) k q ≤ k ℓ r k M F ( k ) k q k X j ℓ r k | dg ( k ) , bv j |k q . k ℓ r k M F ( k ) k q k sup j ℓ r k | dg ( k ) j |k q ≤ k ℓ r k M F ( k ) k q k ℓ r k Sg ( k ) k q , where we used (2.2) in the penultimate step.It remains to consider the part g pred with predictable bounds for jumps. By thelayer cake formula, we have(2.9) (cid:13)(cid:13) ℓ rk sup τ ′ k ≤ t ≤ τ k | Π( F ( k ) , g ( k ) , pred ) τ ′ k ,t | (cid:13)(cid:13) qq = Z ∞ P { ℓ rk sup τ ′ k ≤ t ≤ τ k | Π( F ( k ) , g ( k ) , pred ) τ ′ k ,t | > λ /q } d λ. Fix some λ > and define a stopping time(2.10) τ := inf n t (cid:12)(cid:12)(cid:12) ℓ r k Sg ( k ) t > cλ /q or ℓ r k Sg ( k ) , pred t > cλ /q or ℓ r k sup This estimate no longer depends on the stopping time τ . Integrating the right-handside of (2.12) in λ , we obtain Z ∞ λ − ˜ q/q k ℓ r k Sg ( k ) , pred ∧ λ /q k ˜ q ˜ q d λ = E Z ∞ (cid:0) λ − ˜ q/q ( ℓ r k Sg ( k ) , pred ) ˜ q ∧ (cid:1) d λ ∼ E ( ℓ r k Sg ( k ) , pred ) q ∼ , where we used ˜ q > q , Lemma 2.2 with X = ℓ r , and the assumption (2.8). (cid:3) Next, we deduce a version of Proposition 2.5 that involves a two-parameter supre-mum of the kind that appears in Corollary 3.2. Recall the definition of second orderincrements of a two-parameter process ( F s,t ) :(2.13) ( δF ) s,t,u := F s,u − F s,t − F t,u , s < t < u. For a fixed s , we define(2.14) ( δ s F ) t,u := F s,u − F t,u , s < t < u. Theorem 2.6. In the situation of Proposition 2.5, we have (cid:13)(cid:13) ℓ rk sup τ ′ k ≤ s Let q, q , q , r, r be as in Proposition 2.5 with r = 2 . Let ( F s,t ) bean adapted process such that (2.17) δ s F t,u = X i F is,t ˜ F it,u with adapted processes F i , ˜ F i , g a martingale, and ( τ k ) an adapted partition. 
Then,we have (cid:13)(cid:13) ℓ rk sup τ k − ≤ s In this section, we iterate Corollary 2.7 by applyingit recursively to each term Π( ˜ F i , g ) on the right-hand side of (2.18). The algebraicframework for this iteration is provided by the theory of branched rough paths in-troduced in [Gub10], see also [HK15]. We recall the relevant notation from [Gub10].We fix a finite set of label L . The set of trees with vertices labeled by the elementsof L is denoted by T L . A forest is a finite unordered tuple of trees in T L , in whichrepetition is allowed. The set of all forests is denoted by F L . The free commutative R -algebra generated by the trees T L is denoted by AT L . It can be identified with thefree R -vector space generated by F L .A branched rough path is an algebra homomorphism F : AT L → C , where C is the algebra of càdlàg functions on the simplex { ( s, t ) | s < t } , that satisfiesthe generalized Chen relation(2.19) δF f = F ∆( f ) − ⊗ f − f ⊗ , f ∈ AT L . On the right-hand side, we use the extension of F to an algebra homomorphism AT L ⊗ AT L → C defined by F f ⊗ f ′ = F f F f ′ , where we use the product C × C → C given by ( F G ) stu = F st G tu . The coproduct ∆ : AT L → AT L ⊗ AT L is an algebrahomomorphism acting on forests by(2.20) ∆( f ) = X ( b , r ) ∈ Cut f b ⊗ r , where the sum goes over the multiset of all admissible cuts , that is, partitions of treesin the forest f into (possibly empty) initial trees collected in the forest r (for “roots”)and final trees collected in the forest b (for “branches”). Our convention for cuts isdifferent from [Gub10, eq. (3)], in that we allow roots and branches to be empty. Theorem 2.8. Let q ∈ (0 , ∞ ) , q ∈ [1 , ∞ ) , and, for each tree t ∈ T L , let q t ∈ (0 , ∞ ] .Let r ∈ [1 , ∞ ) and, for each tree t ∈ T L , let r t ∈ [1 , ∞ ] . Let f ∈ F L be a forest andlet F be the set of all forests f ′ that are the disjoint unions of arbitrary partitions oftrees in f into subtrees. 
Assume that, for each f ′ ∈ F , we have /q = 1 /q + X t ∈ f ′ /q t , /r = 1 / X t ∈ f ′ /r t . Let F be an adapted family of branched rough paths and g a martingale. Then, wehave (2.21) (cid:13)(cid:13) ℓ rk sup τ k − ≤ s We induct on the degree of the forest f , that is, the total number of verticesin its trees. Let f be given and suppose that the claim is known for all forests withstrictly smaller degree. By the generalized Chen relation (2.19) and the definition ofthe coproduct (2.20), we have(2.22) δ s F f t,u = X ( b , r ) ∈ Cut( f ) , b =0 F b s,t F r t,u . We apply Corollary 2.7 with r = r f , q = q f , where /r f = P t ∈ f /r t and /q f = P t ∈ f /q t . Then the second term on the right-hand side of (2.18) corresponds to thesummand f ′ = f in (2.21).It remains to estimate the first term on the right-hand side of (2.18), for a fixedcut ( b , r ) , we have (cid:13)(cid:13) ℓ rk sup τ k − ≤ s OUGH SEMIMARTINGALES 17 where / ˜ q = 1 /q − X t ′ ∈ b /q t ′ , / ˜ r = 1 /r − X t ′ ∈ b /r t ′ . The latter norm can be estimated by the inductive hypothesis, since deg r < deg f . (cid:3) Example . The vector-valued BDG inequality 2.4is the case of the empty forest f in Theorem 2.8. In this case, we have F f ≡ , sothat Π( F f , g ) = δg. Therefore, the estimate (2.21) becomes (2.3). Example . Suppose that F = δf . This corresponds to the forest f consisting of the single tree a . In this case, F = { f } , and Theorem 2.8 gives (cid:13)(cid:13) ℓ rk sup τ k − ≤ s In this section, we will estimate V r Π( F, g ) in open ranges r > ρ . There is adichotomy depending on the value of the threshold ρ . For ρ < , we will use thesewing lemma, see Section 3.2. The main new results of this article are in the range ρ ≥ . In this range, pathwise estimates are insufficient, and we have to rely on thecancellation provided by the martingale g . By the construction in Section 3.1, vari-ation norm estimates in this range follow directly from the vector-valued estimatesin Section 2. 
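The objects estimated in this section can be made concrete numerically. The following is a minimal Python sketch, not part of the article's argument. It assumes the discrete paraproduct has the form Π_{s,t}(F, g) = Σ_{s<j≤t} F_{s,j−1} dg_j with F = δf, which is the form implied by the square function computation in the proof of Proposition 2.5. The function names `paraproduct` and `p_variation`, the random walk used for g, and the sample sizes are illustrative choices. The sketch computes the r-variation of a two-parameter process by dynamic programming and checks the Chen relation Π_{s,u} − Π_{s,t} − Π_{t,u} = δf_{s,t} δg_{t,u} on one triple.

```python
import random


def paraproduct(f, g):
    """Pi[s][t] = sum over s < j <= t of (f[j-1] - f[s]) * (g[j] - g[j-1])."""
    n = len(f)
    Pi = [[0.0] * n for _ in range(n)]
    for s in range(n):
        acc = 0.0
        for t in range(s + 1, n):
            acc += (f[t - 1] - f[s]) * (g[t] - g[t - 1])
            Pi[s][t] = acc
    return Pi


def p_variation(Pi, p):
    """Sup over chains u_0 < ... < u_L of (sum |Pi[u_{l-1}][u_l]|^p)^(1/p).

    best[j] is the largest p-th power sum over chains ending at index j;
    the empty chain contributes 0, so the recursion costs O(n^2).
    """
    n = len(Pi)
    best = [0.0] * n
    for j in range(n):
        for i in range(j):
            cand = best[i] + abs(Pi[i][j]) ** p
            if cand > best[j]:
                best[j] = cand
    return max(best) ** (1.0 / p)


# Illustrative data (assumption): g is a simple random walk, f an arbitrary path.
random.seed(0)
n = 200
f, g = [0.0], [0.0]
for _ in range(n - 1):
    g.append(g[-1] + random.choice((-1.0, 1.0)))
    f.append(f[-1] + random.uniform(-1.0, 1.0))

Pi = paraproduct(f, g)

# Chen relation for F = delta f on one triple s < t < u.
s, t, u = 10, 50, 120
chen_gap = Pi[s][u] - Pi[s][t] - Pi[t][u] - (f[t] - f[s]) * (g[u] - g[t])

# The r-variation is non-increasing in r.
v3 = p_variation(Pi, 3.0)
v4 = p_variation(Pi, 4.0)
```

Here `chen_gap` vanishes up to rounding, and `v4 <= v3` reflects the monotonicity of the variation norm in the exponent. The quadratic cost of the dynamic programme is acceptable for a few hundred time steps, which is all this illustration needs.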
Stopping time construction. In this section, we will bound r -variation bysquare function-like objects. For Lépingle’s inequality, this idea was introduced in[Bou89; PX88]. It was first applied to a (real variable) paraproduct in [DMT12].The stopping time argument in [Bou89; PX88] involves a real interpolation step thatwas made increasingly more explicit in [JSW08; MSZ20]. We use different stoppingtimes, which better capture the structure of the process at hand and avoid the realinterpolation step. For Lépingle’s inequality, similar stopping times were introducedin [Zor20]. One of the advantages of the present construction is that it allows us toremove a restriction on the integrability parameters ( q > ) from [KZ19].For an adapted process (Π s,t ) s ≤ t , let Π ∗ n ′′ := sup ≤ n For any discrete time adapted process (Π s,t ) s For m ∈ N , define stopping times τ ( m )0 := 0 , and then, for j ≥ , allowing a priori values in N ∪ {∞} ,(3.2) τ ( m ) j +1 := min n t > τ ( m ) j (cid:12)(cid:12)(cid:12) sup τ ( m ) j ≤ t ′ Let j be maximal with τ ( m ) j ≤ u l − . Since l ∈ L ( m ) , by definition(3.4), we have | Π u l − ,u l | > − m − Π ∗ u l . By the definition of stopping times (3.2), we obtain τ ( m ) j +1 ≤ u l . (cid:3) Fix m . For each l ∈ L ′ ( m ) , let j ( l ) be the largest j such that τ ( m ) j ∈ ( u l − , u l ] .Then all j ( l ) are distinct, and, since l = max L ( m ) , the claim shows that τ ( m ) j ( l )+1 < ∞ .Furthermore, by (3.4), the monotonicity of t Π ∗ t , and the definition (3.2) ofstopping times, we have(3.6) | Π u l − ,u l | ≤ − m Π ∗ u l ≤ − m Π ∗ τ ( m ) j ( l )+1 ≤ τ ( m ) j ( l ) ≤ t ′ <τ ( m ) j ( l )+1 | Π t ′ ,τ ( m ) j ( l )+1 | OUGH SEMIMARTINGALES 19 by the definition of τ ( m ) j ( l ) . Since all j ( l ) are distinct, this implies X l ∈ L ′ ( m ) | Π u l − ,u l | ρ ≤ ρ ∞ X j =1 sup τ ( m ) j − ≤ t ′ <τ ( m ) j | Π t ′ ,τ ( m ) j | ρ . Substituting this into (3.5), we conclude the proof of Lemma 3.1. (cid:3) Corollary 3.2. 
Let (Π s,t ) s ≤ t be an adapted process with Π t,t = 0 for all t . Then,for every < ρ < r < ∞ and q ∈ (0 , ∞ ] , we have (3.7) k V r Π k L q . sup τ (cid:13)(cid:13)(cid:13)(cid:16) ∞ X j =1 (cid:0) sup τ j − ≤ t In this section, we apply the sewing lemma to the processes Π( F, g ) . Lemma 3.3. Let F s,t be a two-parameter process such that F s,s = 0 and g t a one-parameter process. Suppose that (2.17) holds. Let ρ < and /ρ = 1 /p i, + 1 /p i, for every i . Then, we have (3.8) V ρ Π( F, g ) . X i V p i, F i · V p i, Π( ˜ F i , g ) . Proof. We will use the sewing lemma [FZ18, Theorem 2.5] with Ξ s,t := Π( F, g ) s,t . By definition (2.5) and the hypothesis F s,s = 0 , we have Ξ j,j +1 = 0 , so that Π( F, g ) s,t = Ξ s,t − t − X j = s Ξ j,j +1 . Moreover, from Chen’s relation (2.16), we obtain ( δ Ξ) s,t,u = X t ≤ j
Discrete sums corresponding to Itô integrals. Here, we combine the re-sults in Sections 3.1 and 3.2 into a statement that holds for arbitrary variationalexponents r . Corollary 3.4. Let < q ≤ ∞ , ≤ q < ∞ , and < r, p ≤ ∞ . Let /q =1 /q + 1 /q and assume /r < /p + 1 / . Let ( F s,t ) be an adapted process suchthat (2.17) holds, g a martingale, and ( τ k ) an adapted partition. Assume that /r < /p i, + 1 /p i, for every i . Then, we have (cid:13)(cid:13) V r Π( F, g ) (cid:13)(cid:13) q . X i (cid:13)(cid:13) V p i, F i · V p i, Π( ˜ F i , g ) (cid:13)(cid:13) q + (cid:13)(cid:13) V p F (cid:13)(cid:13) q k Sg k q . (3.9) Proof. Define ρ by /ρ = 1 /p + 1 / . Consider first the case ρ ≥ . By Corollary 3.2with ≤ ρ < r < ∞ , it suffices to estimate the terms k ℓ ρj sup τ j − ≤ t Onecan obtain estimates for Π( F, g ) , with F being a component of a branched roughpath, by iterating Corollary 3.4. However, this would involve potentially applyingCorollary 3.2 at every step of the iteration, resulting in unnecessary losses. It is infact more efficient to iterate vector-valued, rather than variational, estimates, whichwe have already done in Theorem 2.8. Here, we indicate the consequences thatTheorem 2.8 has for variation norm estimates. Corollary 3.5. Let q ∈ (0 , ∞ ) , q ∈ [1 , ∞ ) , and, for each tree t ∈ T L , let q t ∈ (0 , ∞ ] .Let ρ ∈ (0 , ∞ ) and, for each tree t ∈ T L , let r t ∈ [1 , ∞ ] . Let f ∈ F L be a forest andlet F be the set of all forests f ′ that are the disjoint unions of arbitrary partitions oftrees in f into subtrees. Assume that, for each f ′ ∈ F , we have /q = 1 /q + X t ∈ f ′ /q t , /ρ = 1 / X t ∈ f ′ /r t . Let F be an adapted family of branched rough paths and g a martingale. Then, forevery r > ρ , we have (3.10) (cid:13)(cid:13) V r Π( F f , g ) (cid:13)(cid:13) q . X f ′ ∈ F (cid:16)Y t ∈ f ′ (cid:13)(cid:13) V r t F t (cid:13)(cid:13) q t (cid:17) k Sg k q . Proof. Consider first the case ρ ≥ . 
By Corollary 3.2, it suffices to estimate(3.11) k ℓ ρk sup τ k − ≤ t Itô integral. Proof of Theorem 1.1, part 1. Since Π π ( F, g ) t,t ′ is càdlàg in both t and t ′ , we have V r Π π ( F, g ) = lim n →∞ sup l max ,u < ···
Let F, F i , ˜ F i be càdlàg processes such that (1.9) holds and F it,t = 0 for all i, t . Suppose that V p F ∈ L q and V ∞ ˜ F i ∈ L q for every i . Then, for every ˜ p ∈ ( p , ∞ ) ∪ {∞} , we have lim π k V ˜ p ( F − F ( π ) ) k L q = 0 . Proof. Since V p F ( π ) ≤ V p F and by Hölder’s inequality, it suffices to consider ˜ p = ∞ .Let ǫ > and define a sequence of stopping times recursively, starting with π := 0 ,by π j +1 := min n t > π j (cid:12)(cid:12)(cid:12) sup s ≤ π j | F s,t − F s,π j | ≥ ǫ or sup π j ≤ s ′ ≤ t max i | F is ′ ,t | ≥ ǫ o . Then, by (1.9), for any adapted partition π ′ ⊇ π and s ≤ t , we have | F s,t − F ( π ′ ) s,t | ≤ | F s,t − F ⌊ s,π ′ ⌋ ,t | + | F ⌊ s,π ′ ⌋ ,t − F ⌊ s,π ′ ⌋ , ⌊ t,π ′ ⌋ |≤ X i | F i ⌊ s,π ′ ⌋ ,s || ˜ F is,t | + | F ⌊ s,π ′ ⌋ ,t − F ⌊ s,π ′ ⌋ , ⌊ t,π ′ ⌋ | . ≤ X i ǫ · V ∞ ˜ F i + ǫ. (cid:3) Remark . Some structural condition on the two-parameter process F is necessaryin Lemma 4.1. Even if F is deterministic, continuous, and vanishes on the diagonal, F ( π ) does not necessarily converge to F uniformly. To see this, let φ : R → [0 , bea smooth function such that φ = 0 on ( −∞ , and φ = 1 on (1 , ∞ ) . Let F ( s, t ) := φ ( st ) φ ( t − s ) . Then, for any partition π , for s < π , we have F ( s, t ) − F (0 , t ) → as t → ∞ .In the above example, F is not uniformly continuous. Convergence can also fail foruniformly continuous in time processes if their samples are not equicontinuous. Tosee this, let Ω = (0 , with the Lebesgue measure, F t the trivial σ -algebra for t < / and the Lebesgue σ -algebra for t ≥ / . Let F ( s, t ) := φ (2 sφ (3 t − /ω ) φ (3( t − s )) ,where ω ∈ Ω and ≤ s ≤ t ≤ . For any ≤ s ≤ t ≤ / , we have F ( s, t ) = 0 ,so this process is indeed measurable with respect to the given filtration. For anyadapted partition π , there is an < s ≤ / such that s ≤ π ( ω ) for a.e. ω ∈ Ω .Let < s < s and t ≥ / . Then F ( s, t ) − F (0 , t ) = φ (2 s/ω ) − φ (0) = 1 for ω < s, so that k V ∞ ( F − F ( π ) ) k L ∞ = 1 . 
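The first counterexample in the remark above can be checked numerically. The following is a minimal Python sketch under stated assumptions: φ is taken to be a standard smooth step that vanishes on (−∞, 0] and equals 1 on [1, ∞), the partition is the uniform grid hℕ, and the discretization is F^{(π)}_{s,t} = F_{⌊s,π⌋,⌊t,π⌋}, as used in the proof of Lemma 4.1. The names `phi`, `F`, `F_disc` and `gap` are hypothetical helpers introduced only for this illustration.

```python
import math


def phi(x):
    """Smooth step (assumed form): 0 for x <= 0, 1 for x >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    a = math.exp(-1.0 / x)
    b = math.exp(-1.0 / (1.0 - x))
    return a / (a + b)


def F(s, t):
    """Two-parameter process F(s, t) = phi(s * t) * phi(t - s)."""
    return phi(s * t) * phi(t - s)


def floor_to_grid(x, h):
    """Largest point of the uniform partition h * N that is <= x."""
    return math.floor(x / h) * h


def F_disc(s, t, h):
    """Discretization of F along the uniform partition of mesh h."""
    return F(floor_to_grid(s, h), floor_to_grid(t, h))


def gap(h):
    """|F - F_disc| at a point with 0 < s < h and t large."""
    s = h / 2.0
    t = 2.0 / h + 1.0 + s  # guarantees s * t >= 1 and t - s >= 1
    return abs(F(s, t) - F_disc(s, t, h))


# Each gap equals 1 regardless of the mesh, so there is no uniform convergence.
gaps = [gap(h) for h in (1.0, 0.1, 0.01, 0.001)]
```

Since ⌊s, π⌋ = 0 for 0 < s < h, the discretized process vanishes at the chosen points while F itself equals 1 there, so refining the mesh does not reduce the uniform distance.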
Proof of Theorem 1.1, part 2. By definition of a Cauchy net, the existence of the limit (1.13) will follow if we can show that
(4.3) $\lim_{\pi} \sup_{\pi' \supseteq \pi} \big\| V^{r}\big(\Pi^{\pi}(F,g) - \Pi^{\pi'}(F,g)\big) \big\|_{L^{q}} = 0.$
However, by (4.1), we have
$\Pi^{\pi}(F,g) - \Pi^{\pi'}(F,g) = \Pi^{\pi'}(F^{(\pi)} - F^{(\pi')}, g).$
It follows from (4.2) that
$(F^{(\pi)}_{s,u} - F^{(\pi')}_{s,u}) - (F^{(\pi)}_{t,u} - F^{(\pi')}_{t,u})$
$= \sum_i F^{i,(\pi)}_{s,t} \tilde F^{i,(\pi)}_{t,u} - \sum_i F^{i,(\pi')}_{s,t} \tilde F^{i,(\pi')}_{t,u}$
$= \sum_i (F^{i,(\pi)}_{s,t} - F^{i,(\pi')}_{s,t}) \tilde F^{i,(\pi)}_{t,u} + \sum_i F^{i,(\pi')}_{s,t} (\tilde F^{i,(\pi)}_{t,u} - \tilde F^{i,(\pi')}_{t,u}).$
Therefore, we can estimate the norm in (4.3) using part 1 of Theorem 1.1 with some $\tilde p \in (p, \infty) \cup \{p\}$ in place of $p$, which is possible because (1.8) is an open condition. The bound that we obtain converges to $0$ by the hypothesis and Lemma 4.1.
The Chen relation (1.15) follows from the corresponding relation (2.16) for the discrete paraproduct. □

Mesh convergence. Theorem 1.1 can be used to recover the classical results about discrete approximations to the Itô integral. We begin with the simpler case of continuous integrands.

Corollary 4.2. In the situation of part 2 of Theorem 1.1, suppose that $F = \delta f$, q , q < ∞ , and the process $f$ has a.s. continuous paths. Then convergence in (1.13) holds in the stronger sense that
(4.4) $\Pi(\delta f, g) = \lim_{\operatorname{mesh}(\pi) \to 0} \Pi^{\pi}(\delta f, g)$ in $L^{q}(V^{p})$,
where $\pi$ ranges over adapted partitions.

Proof. In view of the uniform bound in part 1 of Theorem 1.1, it suffices to consider a bounded time interval. On such an interval, the paths of $f$ are uniformly continuous. Therefore, $F^{(\pi)} \to F$ uniformly as $\operatorname{mesh}(\pi) \to 0$. Since the $F^{(\pi)}$ are also uniformly bounded in $L^{q}(V^{p})$, we have $F^{(\pi)} \to F$ in $L^{q}(V^{\tilde p})$ for any $\tilde p \in (p, \infty) \cup \{\infty\}$. We can choose $\tilde p$ such that $1/r < 1/\tilde p + 1/2$. It remains to apply the estimate (1.14) with $p$ replaced by $\tilde p$ to $\Pi(F,g) - \Pi^{\pi}(F,g) = \Pi(F - F^{(\pi)}, g)$. 
□

Next, we recover the convergence result for discrete approximations to the Itô integral in the presence of jumps. First, let us recall the sense in which the Itô integral is usually defined.

Definition 4.3. Suppose that, for every adapted partition $\pi$, we are given a one-parameter process $(f^{\pi}_t)_t$. We say that the family $f^{\pi}$ converges to a process $(f_t)_t$ in the mesh u.c.p. (uniform on compacts in probability) sense if
(4.5) $(\forall T > 0)\ (\forall \epsilon > 0)\ (\exists \delta > 0)\ (\forall \pi : \operatorname{mesh}(\pi) < \delta)\quad P\{ \sup_{0 \le t' \le T} |f^{\pi}_{t'} - f_{t'}| > \epsilon \} < \epsilon.$
We denote this mode of convergence by
(4.6) $\operatorname{u.c.p.-lim}_{\operatorname{mesh}(\pi) \to 0} f^{\pi} = f.$
If $\pi$ is only allowed to range over deterministic partitions, we denote this by $\operatorname{d-mesh}(\pi) \to 0$.

Lemma 4.4. Let $g$ be a càdlàg local martingale. Then there exists a localizing sequence $(\tau_k)$ such that, for every $k$, we have $g^{(\tau_k)} \in L^{1}(V^{\infty})$.

Proof. Without loss of generality, $g_0 = 0$. Let $(\tilde\tau_k)$ be a localizing sequence such that $g^{(\tilde\tau_k)}_t \in L^{1}$ for each $k, t$ and $(g^{(\tilde\tau_k)}_t)_t$ is a martingale for each $k$. Define
$\tau_k := \tilde\tau_k \wedge k \wedge \inf\{ t \mid |g_t| \ge k \}.$
Then $V^{\infty} g^{(\tau_k)} \le 2k + |g_{\tau_k}|$. The first summand is in $L^{\infty} \subset L^{1}$. For the second summand, we have
$E|g_{\tau_k}| = E|g^{(\tilde\tau_k)}_{\tau_k}| \le E|g^{(\tilde\tau_k)}_{k}| < \infty.$ □

Now, we can recover the existence of Itô integrals.

Corollary 4.5. Let $f$ be a càdlàg adapted process and $g$ a càdlàg local martingale. Then, there exists the limit
(4.7) $\Pi(f,g)_{0,\cdot} = \operatorname{u.c.p.-lim}_{\operatorname{mesh}(\pi) \to 0} \Pi^{\pi}(f,g)_{0,\cdot}.$

Note that the two-parameter supremum
$\sup_{0 \le t \le t' \le T} |\Pi^{\pi}(f,g)_{t,t'} - \Pi(f,g)_{t,t'}|$
does not converge to $0$ if $f$ has jumps. Indeed, by Chen’s relation, it is bounded below by a multiple of
$\sup_{0 \le t \le T} |\delta(f - f^{(\pi)})_{0,t}\, \delta g_{t,T}| = \sup_{0 \le t \le T} |(f_t - f_{\lfloor t,\pi \rfloor})\, \delta g_{t,T}|,$
and the difference $(f_t - f_{\lfloor t,\pi \rfloor})$ does not converge to $0$ if $f$ has jumps.

Proof of Corollary 4.5. We may assume without loss of generality that $f_0 = 0$ and $g_0 = 0$. 
Let (˜ τ k ) be a localizing sequence for g given by Lemma 4.4. Then τ k := ˜ τ k ∧ inf { t | | f t | > k } is also a localizing sequence. Fix T > and ǫ > . For a sufficiently large k , we willhave P { τ k ≤ T } < ǫ/ . Replacing g by ( g t ∧ τ k ) t and f by ( f t ∧ τ k − ) t , we may assume that g ∈ L ( V ∞ ) and f ∈ L ∞ ( V ∞ ) .By part 2 of Theorem 1.1 with q = 1 and any r > , there exists an adaptedpartition π ◦ such that, for every adapted partition π ′ ⊇ π ◦ , we have (cid:13)(cid:13)(cid:13) V r (Π π ′ ( f, g ) − Π( f, g )) (cid:13)(cid:13)(cid:13) L q (Ω) < ( ǫ/ /q . In particular, for every adapted partition π ′ ⊇ π ◦ , we have P Ω π ′ < ǫ/ , Ω π ′ := { sup ≤ t ≤ T | Π π ′ ( f, g ) ,t − Π( f, g ) ,t | > ǫ/ } . Since V ∞ f is finite a.s., there exists A < ∞ such that P Ω < ǫ/ , Ω := { sup t ≤ T | f t | > A } < ǫ/ . Since lim j →∞ π ◦ j = ∞ a.s., there exists J ∈ N such that P Ω < ǫ/ , Ω := { π ◦ J < T } . Since g t is right continuous in t and measurable on Ω , there exists δ > such that P Ω < ǫ/ , Ω := { sup j ≤ J sup ≤ s ≤ δ | g π ◦ j + s − g π ◦ j | > ǫ/ (10 AJ ) } and P Ω < ǫ/ , Ω := { min j ≤ J | π ◦ j +1 − π ◦ j | ≤ δ } . We will show that this δ works for (4.7).Let π be an adapted partition with mesh( π ) < δ . Let π ′ := π ∪ π ◦ , this is anotheradapted partition. For every π ′ l ∈ π ◦ \ π and π ′ l < t ′ , we will use the identity(4.8) f π ′ l − ( g π ′ l ∧ t ′ − g π ′ l − ) + f π ′ l ( g π ′ l +1 ∧ t ′ − g π ′ l )= f π ′ l − ( g π ′ l +1 ∧ t ′ − g π ′ l − ) + ( f π ′ l − f π ′ l − )( g π ′ l +1 ∧ t ′ − g π ′ l ) . Now, if ω ∈ Ω \ Ω , then π ′ l − , π ′ l +1 π ◦ in the situation of (4.8). Therefore, the firstterm on the right-hand side of (4.8) appears in Π π . Therefore, for every t ′ ≤ T , wehave | Π π ′ ( f, g ) ,t ′ − Π π ( f, g ) ,t ′ | = (cid:12)(cid:12)(cid:12) X l : π ′ l ∈ π ◦ \ π and π ′ l Variation norm estimate. 
The main difficulty in defining [ Y, g ] for an X -controlled process Y and a martingale g is to handle the contribution of the jumpsof X . This is done by the following result. Recall OUGH SEMIMARTINGALES 25 Theorem 5.1. Let < q, q ≤ ∞ , ≤ q < ∞ with /q = 1 /q + 1 /q . Let ( g t ) t ≥ be a càdlàg martingale and ( Y ′ ) t ≥ a càdlàg adapted process. Let I ⊂ (0 , ∞ ) be acountable subset and (∆ t ) t ∈ I a (deterministic) sequence. Consider the process (5.1) B t,t ′ := X j ∈ I ∩ ( t,t ′ ] Y ′ j − ∆ j δg j − ,j . Then, for every p ∈ [2 , ∞ ] and /r < / /p , with M Y ′ = sup t | Y ′ t | , (5.2) k V r B k L q . k M Y ′ k L q (cid:0)X j ∈ I | ∆ j | p (cid:1) /p k (cid:0)X j ∈ I | δg j − ,j | (cid:1) / k L q . Proof. We will first show that the estimate (5.2) holds for finite sets I . This willimmediately imply that the series (5.1) converges unconditionally in L q ( V r ) and thatits limit also satisfies the estimate (5.2).When I is finite, we may assume that we are in discrete time, which correspondsto the case I = { , . . . , N } and Y ′ , g being constant on intervals [ n, n + 1) for n ∈ N .By Corollary 3.2, it suffices to estimate the L q norm of(5.3) (cid:13)(cid:13)(cid:13) sup τ k − ≤ t Discretization of quadratic covariation.Definition 5.2. Let g = ( g t ) t ≥ be a càdlàg local martingale. For adapted càdlàgprocesses Y, Z and a deterministic partition π , define (5.6) Z • [ Y, g ] πT := X π j Let ≤ ˆ p , p ≤ ∞ . Let X ∈ V p loc be a deterministic càdlàg path. Let Y = ( Y, Y ′ ) be a càdlàg adapted process such that Y ∈ V p loc and R Y ,X ∈ V ˆ p loc almostsurely and Y ′ ∈ L ∞ . Then, there exists a localizing sequence ( τ k ) such that, for every k , the process ˜Y = ( ˜ Y , ˜ Y ′ ) , defined by ˜ Y t = Y t ∧ τ j − , ˜ Y ′ t = ( Y ′ t if t < τ j , if t ≥ τ j , , satisfies ˜ Y ∈ L ∞ ( V p ) , M Y ′ ∈ L ∞ , and R ˜Y , ˜ X ∈ L ∞ ( V ˆ p ) , where ˜ X t := X t ∧ k .Proof. Without loss of generality, | Y ′ | ≤ / . 
Let τ k := k ∧ min { t | max( V p [0 ,t ] Y, sup s ∈ [0 ,t ] | Y ′ s | , V ˆ p [0 ,t ] R Y ,X ) ≥ k } . At this point, we have used the fact that the functions t V p [0 ,t ] Y and t V ˆ p [0 ,t ] R Y ,X is right continuous if X, Y, Y ′ are càdlàg, so that the above minimum in fact exists.For the former function, this is verified e.g. in [FZ18, Lemma 7.1]; the argument forthe latter function is similar.Then, for any t ≤ t ′ , we have(5.7) R ˜Y , ˜ Xt,t ′ = R Y ,Xt,t ′ if t ≤ t ′ < τ k , if τ k ≤ t ≤ t ′ ,δY t,τ k − − Y ′ t δX t,t ′ ∧ k , if t < τ k ≤ t ′ . The latter case can only appear once in any ℓ ˆ p norm in the definition of V ˆ p R ˜Y , ˜ X .Therefore, V ˆ p R ˜Y , ˜ X ≤ V ˆ p [0 ,τ k ) R Y ,X + 2 k + kV ∞ [0 ,k ] X is a bounded function. (cid:3) Theorem 5.4. Let ˆ p < ≤ p and X ∈ V p loc a deterministic càdlàg path. Supposethat Y = ( Y, Y ′ ) and Z are càdlàg adapted processes, g a càdlàg local martingale,and R Y ,X ∈ V ˆ p loc almost surely. Then (5.8) Z • [Y , g ] := u . c . p . -lim d-mesh( π ) → Z • [ Y, g ] π exists, and we have (5.9) Z • [Y , g ] t = X s ≤ t Z s − ∆ X s Y ′ s − ∆ g s + X s ≤ t Z s − ∆ R Y s ∆ g s , where ∆ g s := δg s − ,s and ∆ R Y s := R Y s − ,s . Moreover, for any /r < / /p , wehave Z • [Y , g ] ∈ V r loc .Remark . The case needed for the construction of the square bracket in Theorem 1.2is Z ≡ . General processes Z are needed in the consisteny result, Theorem 6.5. Proof. Since (5.6) and (5.9) are linear in Y , we may assume | Y ′ | ≤ upon replacing Y by Y / max(1 , | Y ′ | ) . Similarly, we may assume | Z | ≤ .Using the localizing sequence τ k = min { t | | Z t | > k } and replacing Z by ( Z t ∧ τ k − ) t ,we may assume that Z is uniformly bounded. Using the localizing sequence givenby Lemma 4.4, we may assume g ∈ L ( V ∞ ) . 
Using the localizing sequence given byLemma 5.3, we may assume that X ∈ V p , Y ∈ L ∞ ( V p ) , and R Y ,X ∈ L ∞ ( V ˆ p ) .Overall, we may assume(5.10) g ∈ L V ∞ , X ∈ V p , M Y ′ , M Z ∈ L ∞ , R Y ,X ∈ L ∞ ( V ˆ p ) . Assuming (5.10), the first sum in (5.9) now makes sense by Theorem 5.1 and is in V r loc for any /r < / /p . The second sum in (5.9) almost surely converges OUGH SEMIMARTINGALES 27 absolutely for every t , and in particular defines a process with almost surely V paths.Now, still assuming (5.10), we will show that the limit (5.8) exists and coincideswith (5.9).Fix T > . Let A ≥ be such that sup t ≤ T | X t | < A and the set Ω := { sup t ≤ T ( | Y t | ∨ | Y ′ t | ∨ | g t | ∨ | Z t | ) < A } has probability ≥ − ǫ .Let J X := { s | | ∆ X s | > ǫ/ (2 A ) } and J Y ( ω ) := { s | | ∆ Y s | > ǫ/ } . Let N < ∞ besuch that | J X | ≤ N and Ω := {| J Y | < N } has probability ≥ − ǫ .Let δ be such that sup t ≤ t ′ ≤ T : | t ′ − t |≤ δ, ( t,t ′ ] ∩ J X = ∅ | δX t,t ′ | < ǫ/A, sup t ∈ ( J X ∪ J Y ) ∩ [0 ,T ] sup These estimates are uniform in T , so we obtain sup T ≤ T | X π j The following estimate will be used for boundary terms. Lemma 5.5. Let < q , q ≤ ∞ and /q = 1 /q + 1 /q . Let < p , p ≤ ∞ and /r < /p + 1 /p . Let f, g be càdlàg adapted processes. Then (cid:13)(cid:13)(cid:13) V r (cid:0) δf t,t ′ δg t,t ′ (cid:1)(cid:13)(cid:13)(cid:13) L q ≤ sup τ k sup τ k − ≤ t<τ k | f t − f τ k |k L q ( ℓ p ) k sup τ k − ≤ t<τ k | g t − g τ k |k L q ( ℓ p ) . where the supremum is taken over adapted partitions τ .Proof. This is a direct consequence of Corollary 3.2 with /r < /ρ = 1 /p + 1 /p and Hölder’s inequality. (cid:3) Corollary 5.6. Let ≤ q < ∞ , < q ≤ ∞ , and /q = 1 /q + 1 /q . Let < p ≤ ∞ and /r < / /p . Let f be a càdlàg adapted process and g a càdlàgmartingale. Then (5.11) (cid:13)(cid:13)(cid:13) V r (cid:0) δf t,t ′ δg t,t ′ (cid:1)(cid:13)(cid:13)(cid:13) L q . k V p f k L q k V ∞ g k L q . Proof. We apply Lemma 5.5 with p = 2 . 
The resulting L q ( ℓ ) norm can be estimated, after discretization, using first the vector-valued and then the scalar-valued BDG inequality. □

Proof of Theorem 1.2. For any adapted partition $\pi$ and any càdlàg processes $f, g$, we have the summation by parts identity
$\Pi^{\pi}(f,g)_{0,T} + \Pi^{\pi}(g,f)_{0,T} + [f,g]^{\pi}_{T} = (f_T - f_0)(g_T - g_0).$
Define
(5.12) $\Pi(g, \mathbf{Y}) := \delta g\, \delta Y - \Pi(Y, g) - \delta[\mathbf{Y}, g].$
Convergence (1.23) then follows from Corollary 4.5 and Theorem 5.4. Chen’s relation (1.24) follows from Chen’s relation (1.15) for $\Pi(Y, g)$. The variation norm bound (1.25) follows from Corollary 5.6, part 2 of Theorem 1.1, and Theorem 5.1 applied to the respective terms. □

Quadratic covariation of two martingales. In this section, we recall a few facts about quadratic covariation needed in Section 6 and explain how they fit into the approach to Itô integration provided by Theorem 1.1.

Let $f, g$ be càdlàg martingales. The quadratic covariation process of $f, g$ is defined by
$[f,g]_t := \delta f_{0,t}\, \delta g_{0,t} - \Pi(f,g)_{0,t} - \Pi(g,f)_{0,t}.$
One can verify that the discrete brackets introduced in (5.6) satisfy
$\delta f_{0,t}\, \delta g_{0,t} - \Pi^{\pi}(f,g)_{0,t} - \Pi^{\pi}(g,f)_{0,t} = [f,g]^{\pi}_{0,t}.$
Therefore, Corollary 4.5 recovers the existence of the limit that is usually used to define the quadratic covariation:
$[f,g]_t = \operatorname{u.c.p.-lim}_{\operatorname{mesh}(\pi) \to 0}\ \delta f_{0,t}\, \delta g_{0,t} - \Pi^{\pi}(f,g)_{0,t} - \Pi^{\pi}(g,f)_{0,t}.$
In particular, in the case $g = f$, the function $t \mapsto [g]_t := [g,g]_t$ is a.s. monotonically increasing and locally bounded. Passing to the limit in the vector-valued BDG inequality, Lemma 2.4, we obtain the estimate (5.13) (cid:13)(cid:13) V ∞ h ( k ) (cid:13)(cid:13) L q ( ℓ rk ) . 
q,r (cid:13)(cid:13) [ h ( k ) ] / (cid:13)(cid:13) L q ( ℓ rk ) , where h ( k ) are càdlàg martingales, [ h ] = [ h ] ∞ = lim t →∞ [ h, h ] t , and the hypotheseson the exponents q, r are the same as in Lemma 2.4.Finally, we recall the (almost sure, pathwise) Itô isometry(5.14) [Π( f, g ) s, · ] t = Z ( s,t ] | f u − − f s | d[ g ] u , where the integral is taken in the Riemann–Stieltjes sense.6. Consistency of rough and stochastic integration Let g be a càdlàg local martingale and g = ( g, Π( g, g )) the p -rough path lift (with p ∈ (2 , ) provided by Theorem 1.1 with F = δg . It is well-known that, for any g -controlled p -rough adapted process A = ( A, A ′ ) , the Itô integral and the roughintegral coincide almost surely:(6.1) Z A u − d g u = Z A u − dg u , see e.g. [FH20, Proposition 5.1] for the case of Brownian motion and references giventhere for historical information. We begin with a generalization of this fact, in whichone of the copies of g is replaced by a further process Y and Z plays the role of A ′ . Lemma 6.1. Let g be a càdlàg local martingale and Y, Z càdlàg adapted processes.Then, along adapted partitions π , we have (6.2) u . c . p . -lim mesh( π ) → (cid:16) X π j Y, Z , is the main ingredient in showing consistencyresults such as (6.1). Indeed, the difference between the discrete approximations ofthe two sides of (6.1) is precisely the sum in (6.2). More generally, one can replace therough lift g by a rough semimartingale g + ˜ g , where ˜ g is independent from g , and thecontrolled process A by another process that is a g -controlled rough semimartingaleconditionally on each path of g . OUGH SEMIMARTINGALES 31 Proof of Lemma 6.1. Without loss of generality, Y = 0 . Multiplying Z by an F -measurable time-independent function, we may also assume | Z | ≤ . Similarly to(5.10), we may assume g ∈ L V ∞ , M Y, M Z ∈ L ∞ . By the BDG inequality and Itô isometry (5.14), we have E sup T (cid:12)(cid:12)(cid:12) X π j Let ˆ p < ≤ p . 
Let X ∈ V p loc be a deterministic càdlàg path. Supposethat Y = ( Y, Y ′ ) is a càdlàg adapted process, Z a càdlàg adapted process, g a càdlàglocal martingale, R Y ,X ∈ V ˆ p loc a.s.. Then u . c . p . -lim d-mesh( π ) → (cid:16) X π j By definition (5.12), we have X π j The first term on the right-hand side is, by Definition 5.2, equal to Z • [ Y, g ] π . ByThoerem 5.4, it converges to Z • [Y , g ] .The middle term equals Z ( π ) • [Y , g ] . This also converges to Z • [Y , g ] as mesh( π ) → by an argument similar to (6.4). (cid:3) If ( g + Y, Y ′ ) is an X -controlled p -RSM, p ∈ (2 , , then Z = ( Z, Z ′ ) with(6.5) Z = g + Y, Z ′ t ( δX, δg ) = Y ′ t δX + δg is easily seen to be an ( X, g ) -controlled p -rough process. Indeed, g ∈ V p loc almostsurely by Lemma 4.4 and Lépingle’s inequality (1.2). It remains to observe that R Z s,t = δZ s,t − Z ′ t ( δX s,t , δg s,t )= δg s,t + δY s,t − Y ′ t δX s,t − δg s,t = R Y s,t . The converse implication is more subtle, because the g component of the Gubinelliderivative of a ( X, g ) -controlled process need not be the identity. Theorem 6.3. Let p ∈ (2 , and X ∈ V p loc be a deterministic càdlàg path. Let g be a càdlàg local martinagle. Let Z = ( Z, Z ′ ) be an adapted càdlàg ( X, g ) -controlled p -rough process.Then ( Z, Z ′ ( · , is an X -controlled p -RSM: ( Z, Z ′ ( · , g + ˜ Y , ˜ Y ′ ) , with the local martingale part given by (6.6) ˜ g T := Π( Z ′ (0 , · ) , g ) ,T and Gubinelli derivative (6.7) ˜ Y ′ T := Z ′ T ( · , . Proof of Theorem 6.3. With the local martingale component defined by (6.6), thecontrolled rough component will be defined by ˜ Y T := Z T − ˜ g T . It follows from Lépingle’s inequality (1.2) and localization, Lemma 4.4, that ˜ Y ∈ V p loc almost surely. It remains to show that R ˜Y ,X ∈ V p/ almost surely. 
To this end, with s < t , we write R ˜Y ,Xs,t = ˜ Y t − ˜ Y s − Z ′ s ( X t − X s , Z t − Z s − Π( Z ′ (0 , · ) , g ) ,t + Π( Z ′ (0 , · ) , g ) ,s − Z ′ s ( X t − X s , (cid:16) Z t − Z s − Z ′ s ( X t − X s , g t − g s ) (cid:17) − Π( Z ′ (0 , · ) , g ) ,t + Π( Z ′ (0 , · ) , g ) ,s + Z ′ s (0 , g t − g s )= R Z , ( X,g ) s,t − Π( Z ′ (0 , · ) , g ) s,t . (6.8)The former term is in V p/ by the hypothesis. The latter term is in V p/ by Theo-rem 1.1 and localization similar to Lemma 5.3. (cid:3) Corollary 6.4. Let p ∈ (2 , If ( g + Y, Y ′ ) is an X -controlled, p -rough semimartin-gale and σ ∈ C , then ( σ ( g + Y ) , Dσ ◦ Y ′ ) is also an X -controlled p -rough semi-martingale.Proof. By (6.5), g + Y can be lifted to an ( X, g ) -controlled p -rough process. Thecomposition of this process with σ is again an ( X, g ) -controlled p -rough path, seee.g. [FZ18, Remark 4.15], to which we can apply Theorem 6.3. (cid:3) Remark . Theorem 6.3 has an analog for classical semimartingales. Let g be a càdlàglocal martingale and Z = ( Z, Z ′ ) a càdlàg adapted process such that R Z ,g ∈ V and Z ′ ∈ V . Then Z must be a semimartingale. Indeed, let ˜ g T := Π( Z ′ , g ) T , Y T := Z T − ˜ g T , Y ′ T := 0 . OUGH SEMIMARTINGALES 33 Then, by the same calculation as in (6.8), we have δY s,t = R Y , s,t = − Π( Z ′ , g ) s,t . It follows from the ℓ -valued estimate in Corollary 2.7 that Y ∈ V , so that Z is asemimartingale. Theorem 6.5. Let p ∈ (2 , and X = ( X, X ) be a deterministic càdlàg p -roughpath. Let g be a càdlàg local martingale. Let Z = ( Z, Z ′ ) be an adapted càdlàg ( X, g ) -controlled p -rough process. Then Z Z d J (X , g ) = Π( Z, ( X, g )) . where the left-hand side is the pathwise rough integral and the right-hand side is theRSM integral.Proof. The right-hand side makes sense by Theorem 6.3. Expaniding the defini-tions, we see that the difference between the two sides vanishes by Lemma 6.1 andLemma 6.2. (cid:3) Appendix A. 
Hölder estimates for martingale transforms

For a two-parameter process $\Pi = (\Pi_{t,t'})_{0 \le t\,\ldots}$ …

In the situation of Theorem 1.1, part 2, suppose that all processes have a.s. continuous paths and restrict the time parameter to a finite interval, $t \in [0,1]$. Let $0 \le \gamma < \alpha + \beta = \alpha_i + \beta_i$ with $\alpha, \beta, \alpha_i, \beta_i \ge 0$. Then, we have
\[
\bigl\| H^{\gamma} \Pi(F, g) \bigr\|_{L^q} \lesssim \| H^{\beta} f \|_{L^q} \, \| H^{\alpha}(Sg) \|_{L^q} + \sum_i \bigl\| H^{\alpha_i} F_i \cdot H^{\beta_i} \Pi(\tilde F_i, g) \bigr\|_{L^q}.
\]

Proof. We abbreviate $X := \Pi(F, g)$. Consider the deterministic partitions
\[
\tau^{(n)} = 2^{-n}\mathbb{N}, \qquad \tilde\tau^{(n)} = \{0, 1\} \cup (2^{-n}\mathbb{N} + 2^{-n-1}).
\]
Let
\[
K_n := \sup_{j \in \mathbb{N}} \ \sup_{\tau^{(n)}_{j-1} \le t \le t' \le \tau^{(n)}_j} |X_{t,t'}|,
\]
and define $\tilde K_n$ analogously with $\tilde\tau^{(n)}$ in place of $\tau^{(n)}$. Then, we have
\[
\sup_{|t - t'| \le 2^{-n-1}} |X_{t,t'}| \le K_n + \tilde K_n, \qquad \sup_{|t - t'| \le 1} |X_{t,t'}| \le K_0.
\]
It follows that
\[
\sup_{|t - t'| \le 2^{-n-1}} |t - t'|^{-\gamma} |X_{t,t'}| \lesssim 2^{n\gamma} K_n + 2^{n\gamma} \tilde K_n.
\]
Therefore,
\[
H^{\gamma} X \lesssim \max_{n \in \mathbb{N}} 2^{\gamma n} (K_n + \tilde K_n).
\]
It follows that
\[
\| H^{\gamma} X \|_{L^q}^q \lesssim \sum_{n=0}^{\infty} \bigl( 2^{\gamma n} \| K_n \|_{L^q} \bigr)^q + \sum_{n=0}^{\infty} \bigl( 2^{\gamma n} \| \tilde K_n \|_{L^q} \bigr)^q.
\]
The two sums are similar, so we only consider the first one. Let $\ldots < r < \infty$ be such that $\gamma + 1/r < \alpha + \beta$. By Theorem 2.6, which passes to the continuous time case, we have
\[
2^{\gamma n} \| K_n \|_{L^q} \le 2^{\gamma n} \Bigl\| \ell^r_j \sup_{\tau^{(n)}_{j-1} \le t \le t' \le \tau^{(n)}_j} |X_{t,t'}| \Bigr\|_{L^q} \lesssim 2^{\gamma n} \sum_i \Bigl\| \ell^r_k \bigl( \sup_{\tau^{(n)}_{k-1} \le s\,\ldots} \ \ldots
\]
…

… Fundamentals of stochastic filtering. Vol. 60. Stochastic Modelling and Applied Probability. Springer, New York, 2009, pp. xiv+390. mr: (cit. on p. 10).
[Bou89] J. Bourgain. “Pointwise ergodic theorems for arithmetic sets”. In: Inst. Hautes Études Sci. Publ. Math. 69 (1989). With an appendix by the author, Harry Furstenberg, Yitzhak Katznelson and Donald S. Ornstein, pp. 5–45. mr: (cit. on pp. 3, 18).
[CF19] I. Chevyrev and P. K. Friz. “Canonical RDEs and general semimartingales as rough paths”. In: Ann. Probab. mr: (cit. on pp. 4, 9, 30).
[Che+19] I.
Chevyrev, P. K. Friz, A. Korepanov, I. Melbourne, and H. Zhang. “Multiscale systems, homogenization, and rough paths”. In: Probability and analysis in interacting physical systems, In Honor of S.R.S. Varadhan, Berlin, August, 2016. Vol. 283. Springer Proc. Math. Stat. Springer, Cham, 2019, pp. 17–48. arXiv: . mr: (cit. on p. 7).
[CL05] L. Coutin and A. Lejay. “Semi-martingales and rough paths theory”. In: Electron. J. Probab. 10 (2005), no. 23, 761–785. mr: (cit. on p. 4).
[Coq+06] F. Coquet, A. Jakubowski, J. Mémin, and L. Słomiński. “Natural decomposition of processes and weak Dirichlet processes”. In: In memoriam Paul-André Meyer: Séminaire de Probabilités XXXIX. Vol. 1874. Lecture Notes in Math. Springer, Berlin, 2006, pp. 81–116. arXiv: math/0403461. mr: (cit. on p. 7).
[CR07] R. Coviello and F. Russo. “Nonsemimartingales: stochastic differential equations and weak Dirichlet processes”. In: Ann. Probab. arXiv: math/0602384. mr: (cit. on p. 9).
[Cri+13] D. Crisan, J. Diehl, P. K. Friz, and H. Oberhauser. “Robust filtering: correlated noise and multidimensional observation”. In: Ann. Appl. Probab. mr: (cit. on pp. 9, 10).
[Dav11] M. H. A. Davis. “Pathwise nonlinear filtering with correlated noise”. In: The Oxford handbook of nonlinear filtering. Oxford Univ. Press, Oxford, 2011, pp. 403–424. mr: (cit. on p. 10).
[Dav70] B. Davis. “On the integrability of the martingale square function”. In: Israel J. Math. mr: (cit. on p. 11).
[DFS17] J. Diehl, P. K. Friz, and W. Stannat. “Stochastic partial differential equations: a rough paths view on weak solutions via Feynman-Kac”. In: Ann. Fac. Sci. Toulouse Math. (6). mr: (cit. on p. 9).
[DMT12] Y. Do, C. Muscalu, and C. Thiele. “Variational estimates for paraproducts”. In: Rev. Mat. Iberoam. mr: (cit. on pp. 4, 18).
[DMT17] Y. Do, C. Muscalu, and C. Thiele. “Variational estimates for the bilinear iterated Fourier integral”. In: J. Funct. Anal. mr: (cit. on p. 4).
[DOP19] J.-D. Deuschel, T. Orenshtein, and N. Perkowski.
“Additive functionals as rough paths”. 2019. arXiv: (cit. on p. 4).
[DOR15] J. Diehl, H. Oberhauser, and S. Riedel. “A Lévy area between Brownian motion and rough paths with applications to robust nonlinear filtering and rough partial differential equations”. In: Stochastic Process. Appl. mr: (cit. on pp. 5, 9).
[ER03] M. Errami and F. Russo. “$n$-covariation, generalized Dirichlet processes and calculus with respect to finite cubic variation processes”. In: Stochastic Process. Appl. mr: (cit. on p. 7).
[FH20] P. K. Friz and M. Hairer. A course on rough paths. With an introduction to regularity structures. 2nd ed. Universitext. Springer, 2020 (cit. on pp. 5, 7, 9, 30).
[FHL20] P. Friz, A. Hocquet, and K. Lê. “Rough Markov diffusions and stochastic differential equations”. In preparation. 2020 (cit. on p. 9).
[Föl81] H. Föllmer. “Dirichlet processes”. In: Stochastic integrals (Proc. Sympos., Univ. Durham, Durham, 1980). Vol. 851. Lecture Notes in Math. Springer, Berlin, 1981, pp. 476–478. mr: (cit. on p. 6).
[FS17] P. K. Friz and A. Shekhar. “General rough integration, Lévy rough paths and a Lévy-Kintchine-type formula”. In: Ann. Probab. mr: (cit. on p. 7).
[FV06] P. Friz and N. Victoir. “The Burkholder-Davis-Gundy inequality for enhanced martingales”. In: Séminaire de probabilités XLI. Vol. 1934. Lecture Notes in Math. Springer, Berlin, 2006, pp. 421–438. arXiv: math/0608783. mr: (cit. on p. 4).
[FV10a] P. K. Friz and N. B. Victoir. Multidimensional stochastic processes as rough paths. Vol. 120. Cambridge Studies in Advanced Mathematics. Theory and applications. Cambridge University Press, Cambridge, 2010, pp. xiv+656. mr: (cit. on p. 4).
[FV10b] P. Friz and N. Victoir. “Differential equations driven by Gaussian signals”. In: Ann. Inst. Henri Poincaré Probab. Stat. mr: (cit. on pp. 5, 9).
[FZ18] P. K. Friz and H. Zhang. “Differential equations driven by rough paths with jumps”. In: J. Differential Equations. mr: (cit. on pp. 7, 19, 26, 32).
[GL97] J.
G. Gaines and T. J. Lyons. “Variable step size control in the numerical solution of stochastic differential equations”. In: SIAM J. Appl. Math. mr: (cit. on p. 4).
[GN08] J. Guerra and D. Nualart. “Stochastic differential equations driven by fractional Brownian motion and standard Brownian motion”. In: Stoch. Anal. Appl. mr: (cit. on p. 9).
[Gub04] M. Gubinelli. “Controlling rough paths”. In: J. Funct. Anal. arXiv: math/0306433. mr: (cit. on pp. 4, 7).
[Gub10] M. Gubinelli. “Ramification of rough paths”. In: J. Differential Equations. arXiv: math/0610300. mr: (cit. on pp. 15, 16).
[HK15] M. Hairer and D. Kelly. “Geometric versus non-geometric rough paths”. In: Ann. Inst. Henri Poincaré Probab. Stat. mr: (cit. on p. 16).
[Hyt+16] T. Hytönen, J. van Neerven, M. Veraar, and L. Weis. Analysis in Banach spaces. Vol. I: Martingales and Littlewood-Paley theory. Cham: Springer, 2016, pp. xvi+614. mr: (cit. on pp. 11, 12).
[JM83] N. C. Jain and D. Monrad. “Gaussian measures in $B_p$”. In: Ann. Probab. mr: (cit. on p. 6).
[JSW08] R. L. Jones, A. Seeger, and J. Wright. “Strong variational and jump inequalities in harmonic analysis”. In: Trans. Amer. Math. Soc. mr: (cit. on p. 18).
[KN07] P. E. Kloeden and A. Neuenkirch. “The pathwise convergence of approximation schemes for stochastic differential equations”. In: LMS J. Comput. Math. 10 (2007), pp. 235–253. mr: (cit. on p. 4).
[KP92] P. E. Kloeden and E. Platen. Numerical solution of stochastic differential equations. Vol. 23. Applications of Mathematics (New York). Springer-Verlag, Berlin, 1992, pp. xxxvi+632. mr: (cit. on p. 4).
[KZ19] V. Kovač and P. Zorin-Kranich. “Variational estimates for martingale paraproducts”. In: Electron. Commun. Probab. 24 (2019), Paper No. 48, 14. arXiv: . mr: (cit. on pp. 4, 11, 18).
[Lep76] D. Lepingle. “La variation d’ordre $p$ des semi-martingales”. In: Z. Wahrscheinlichkeitstheorie und Verw. Gebiete. mr: (cit. on p. 2).
[Lyo98] T. J. Lyons. “Differential equations driven by rough signals”. In: Rev. Mat.
Iberoamericana. mr: (cit. on p. 7).
[Man04] M. Manstavičius. “$p$-variation of strong Markov processes”. In: Ann. Probab. arXiv: math/0410106. mr: (cit. on p. 6).
[MSZ20] M. Mirek, E. M. Stein, and P. Zorin-Kranich. “Jump inequalities via real interpolation”. In: Math. Ann. mr: (cit. on p. 18).
[MTT02] C. Muscalu, T. Tao, and C. Thiele. “Uniform estimates on paraproducts”. In: J. Anal. Math. 87 (2002). Dedicated to the memory of Thomas H. Wolff, pp. 369–384. arXiv: math/0106092. mr: (cit. on p. 4).
[Mus14] C. Muscalu. “Calderón commutators and the Cauchy integral on Lipschitz curves revisited II. The Cauchy integral and its generalizations”. In: Rev. Mat. Iberoam. mr: (cit. on p. 4).
[Osę17] A. Osękowski. “A Fefferman-Stein inequality for the martingale square and maximal functions”. In: Statist. Probab. Lett. 129 (2017), pp. 81–85. mr: (cit. on p. 12).
[PX88] G. Pisier and Q. H. Xu. “The strong $p$-variation of martingales and orthogonal series”. In: Probab. Theory Related Fields. mr: (cit. on pp. 3, 18).
[Wil01] D. R. E. Williams. “Path-wise solutions of stochastic differential equations driven by Lévy processes”. In: Rev. Mat. Iberoamericana. mr: (cit. on p. 7).
[You36] L. C. Young. “An inequality of the Hölder type, connected with Stieltjes integration”. In: Acta Math. mr: (cit. on p. 2).
[Zor20] P. Zorin-Kranich. “Weighted Lépingle inequality”. In: Bernoulli (cit. on pp. 3, 13, 18).

(PF) Institut für Mathematik, TU Berlin
(PF) Weierstraß–Institut für Angewandte Analysis und Stochastik
E-mail address: [email protected]
(PZK) Mathematical Institute, University of Bonn
E-mail address:

… $\delta\}$, $\Omega := \bigl\{ \sup_{t \le t' \le T :\ |t' - t| \le \delta,\ (t,t'] \cap J_Y = \emptyset} |\delta Y_{t,t'}| < \epsilon \bigr\}$, have probability $\ge 1 - \epsilon$. Let $\pi$ be a deterministic partition with $\operatorname{mesh}(\pi) < \delta$.

The basic idea to handle the main term is the following. Suppose $\omega \in \Omega \cap \cdots \cap \Omega$ and $s \in J_X \cup J_Y(\omega)$. Suppose $\pi_j < s \le \pi_{j+1} \wedge T$.
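Then
\begin{align*}
&\bigl| Z_{\pi_j}\, \delta Y_{\pi_j, \pi_{j+1} \wedge T}\, \delta g_{\pi_j, \pi_{j+1} \wedge T} - Z_{s-}\, \Delta Y_s\, \Delta g_s \bigr| \\
&\quad \le |Z_{\pi_j} - Z_{s-}| \cdot |\delta Y_{\pi_j, \pi_{j+1} \wedge T}\, \delta g_{\pi_j, \pi_{j+1} \wedge T}| \\
&\qquad + |Z_{s-}| \cdot |\delta Y_{\pi_j, \pi_{j+1} \wedge T} - \Delta Y_s| \cdot |\delta g_{\pi_j, \pi_{j+1} \wedge T}| \\
&\qquad + |Z_{s-}\, \Delta Y_s| \cdot |\delta g_{\pi_j, \pi_{j+1} \wedge T} - \Delta g_s| \\
&\quad = |Z_{\pi_j} - Z_{s-}| \cdot |\delta Y_{\pi_j, \pi_{j+1} \wedge T}\, \delta g_{\pi_j, \pi_{j+1} \wedge T}| \\
&\qquad + |Z_{s-}| \cdot |\delta Y_{\pi_j, s-} + \delta Y_{s, \pi_{j+1} \wedge T}| \cdot |\delta g_{\pi_j, \pi_{j+1} \wedge T}| \\
&\qquad + |Z_{s-}\, \Delta Y_s| \cdot |\delta g_{\pi_j, s-} + \delta g_{s, \pi_{j+1} \wedge T}| \\
&\quad \le \ldots \cdot (2A) \cdot \epsilon / (100 A N) \le \epsilon / (4N).
\end{align*}
In case $s \in J_Y(\omega) \setminus J_X$, we similarly estimate
\begin{align*}
&\bigl| Z_{s-}\, Y'_{s-}\, \Delta X_s\, \Delta g_s - Z_{\pi_j}\, Y'_{\pi_j}\, \delta X_{\pi_j, \pi_{j+1} \wedge T}\, \delta g_{\pi_j, \pi_{j+1} \wedge T} \bigr| \\
&\quad \le |Z_{s-} - Z_{\pi_j}| \cdot |Y'_{s-}\, \Delta X_s\, \Delta g_s| \\
&\qquad + |Z_{\pi_j}| \cdot |Y'_{s-} - Y'_{\pi_j}| \cdot |\Delta X_s\, \Delta g_s| \\
&\qquad + |Z_{\pi_j}\, Y'_{\pi_j}| \cdot |\Delta X_s - \delta X_{\pi_j, \pi_{j+1} \wedge T}| \cdot |\Delta g_s| \\
&\qquad + |Z_{\pi_j}\, Y'_{\pi_j}\, \delta X_{\pi_j, \pi_{j+1} \wedge T}| \cdot |\Delta g_s - \delta g_{\pi_j, \pi_{j+1} \wedge T}| \\
&\quad \lesssim \epsilon / N.
\end{align*}
Since $\omega \in \Omega$, these errors contribute $O(\epsilon)$ to the sum over $j$. Hence, we obtain $\bigl| \sum_{\pi_j\,\ldots}$ …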