Time fractional gradient flows: Theory and numerics

Abstract

We develop the theory of fractional gradient flows: an evolution aimed at the minimization of a convex, l.s.c. energy, with memory effects. This memory is characterized by the fact that the negative of the (sub)gradient of the energy equals the so-called Caputo derivative of the state. We introduce the notion of energy solutions, for which we provide existence, uniqueness and certain regularizing effects. We also consider Lipschitz perturbations of this energy. For these problems we provide an a posteriori error estimate and show its reliability. This estimate depends only on the problem data, and imposes no constraints between consecutive time-steps. On the basis of this estimate we provide an a priori error analysis that makes no assumptions on the smoothness of the solution.


WENBO LI AND ABNER J. SALGADO

1. Introduction

In recent times problems involving fractional derivatives have garnered considerable attention, as it is claimed that they better describe certain fundamental relations between the processes of interest; see, for instance, [29, 15, 46]. In these, and many other references, the models considered are linear. However, it is well known that real world phenomena are not linear, not even smooth. It is only natural then to consider nonlinear/nonsmooth models with fractional derivatives.

The purpose of this work is to develop the theory and numerical analysis of so-called time-fractional gradient flows: an evolution equation aimed at the minimization of a convex and lower semicontinuous (l.s.c.) energy, but where the evolution has memory effects. This memory is characterized by the fact that the negative of the (sub)gradient of the energy equals the so-called Caputo derivative of the state.

The Caputo derivative, introduced in [11], is one of the existing models of fractional derivatives. It is defined, for $\alpha \in (0,1)$, by
\[
D_c^\alpha w(t) = \frac{1}{\Gamma(1-\alpha)} \int_0^t \frac{\dot w(r)}{(t-r)^\alpha} \, dr, \tag{1.1}
\]
where $\Gamma$ denotes the Gamma function. This definition, from the outset, seems unnatural. To define a derivative of a fractional order, it seems necessary for the function to be at least differentiable. Below we briefly describe several attempts at circumventing this issue. We focus, in particular, on the results developed in a series of papers by Li and Liu, see [25, 28, 26, 27], where they developed a distributional theory for this derivative; see also [16]. The authors of these works also constructed, in [26], so-called deconvolution schemes that aim at discretizing this derivative. With the help of this definition and the schemes that they develop the authors were able to study several classes of equations, in particular time fractional gradient flows.

Let us be precise in what we mean by this term. Let $T > 0$, $\mathcal H$ be a separable Hilbert space, and $\Phi : \mathcal H \to \mathbb R \cup \{+\infty\}$ be a convex and l.s.c. functional, which we will call the energy.
Given $u_0 \in \mathcal H$ and $f : (0,T] \to \mathcal H$, we seek a function $u : [0,T] \to \mathcal H$ that satisfies
\[
\begin{cases}
D_c^\alpha u(t) + \partial\Phi(u(t)) \ni f(t), & t \in (0,T], \\
u(0) = u_0,
\end{cases}
\tag{1.2}
\]
where by $\partial\Phi$ we denote the subdifferential of $\Phi$. Our objectives in this work can be stated as follows: We will introduce the notion of "energy solutions" of (1.2), and we will refine the results regarding existence, uniqueness, and regularizing effects provided in [28]. This will be done by generalizing, to non-uniform time steps, the "deconvolution" schemes of [26, 28], and developing a sort of "fractional minimizing movements" scheme. We will also provide an a priori error estimate that seems optimal in light of the regularizing effects proved above. We also develop an a posteriori error estimate, in the spirit of [30], and show its reliability.

We comment, in passing, that nonlinear evolution problems with fractional time derivative have been considered in other works. From a modeling point of view, their advantages have been observed in [15, 12]. Some other types of nonlinear problems have been studied in [8, 40, 2, 24, 23, 39, 45] and [31, 38] where, for a particular type of nonlinear problem, other "energy dissipation inequalities" than those we obtain are derived. Regularity properties for nonlinear problems with fractional time derivatives have been obtained in [22, 14, 21, 1, 44, 43, 42, 41]. Of particular interest to us are [28], which we described above, and [3], which also considers time fractional gradient flows. The assumptions on the data, however, are slightly different than ours. As such, some of the results in [3] are stronger, and some weaker than ours; in particular, we conduct a numerical analysis of this problem. Nevertheless, we refer to this reference for a nice historical account and particular applications to PDEs.

Our presentation will be organized as follows. We will establish notation and the framework we will adopt in Section 2. Here, in particular, we will study several properties of a particular space, which we denote by $L^p_\alpha(0,T;\mathcal H)$, and that will be used to characterize the requirements on the right hand side $f$ of (1.2). In addition, we also review the various proposed generalizations of the classical definition of Caputo derivatives, with particular attention to that of [25, 28, 27], since this is the one we shall adopt. In Section 3 we generalize the deconvolution schemes of [26, 28] and their properties to the case of nonuniform time stepping. Many of the simple properties of these schemes are lost in this case, but we retain enough of them for our purposes. Section 4 introduces the notion of energy solutions for (1.2) and shows existence and uniqueness of these. This is accomplished by introducing, on the basis of our generalized deconvolution formulas, a fractional minimizing movements scheme, and showing that the discrete solutions have enough compactness to pass to the limit in the size of the partition. In Section 5 we provide an error analysis of the fractional minimizing movements scheme. First, we show how an error estimate follows as a side result from the existence proof. Then, in the spirit of [30], we provide an a posteriori error estimator for our scheme and show its reliability. This estimator is then used to independently show rates of convergence. This section is concluded with some particular instances in which the rate of convergence can be improved. Section 6 is dedicated to the case in which we allow a Lipschitz perturbation of the subdifferential. We extend the existence, uniqueness, a priori, and a posteriori approximation results of the fractional gradient flow. Finally, Section 7 presents some simple numerical experiments that illustrate, explore, and expand our theory.

Date: Draft version of January 5, 2021.

Key words and phrases: Caputo derivative, gradient flows, a posteriori error estimate, variable time stepping.

2. Notation and preliminaries

Let us begin by presenting the main notation and assumptions we shall operate under. We will denote by $T \in (0,\infty)$ our final (positive) time. By $\mathcal H$ we will always denote a separable Hilbert space with scalar product $\langle \cdot, \cdot \rangle$ and norm $\| \cdot \|$. As is by now customary, by $C$ we will denote a nonessential constant whose value may change at each occurrence.

2.1. Convex energies.

The energy will be a convex, l.s.c. functional $\Phi : \mathcal H \to \mathbb R \cup \{+\infty\}$ with nonempty effective domain of definition, that is
\[
D(\Phi) = \{ w \in \mathcal H : \Phi(w) < +\infty \} \neq \emptyset.
\]
We will always assume that our energy is bounded from below, that is
\[
\Phi_{\inf} = \inf_{u \in \mathcal H} \Phi(u) > -\infty.
\]

As we are not assuming smoothness in our energy beyond convexity, a useful substitute for its derivative is the subdifferential, that is,
\[
\partial\Phi(w) = \{ \xi \in \mathcal H : \langle \xi, v - w \rangle \le \Phi(v) - \Phi(w) \ \ \forall v \in \mathcal H \}.
\]
The effective domain of the subdifferential is $D(\partial\Phi) = \{ w \in \mathcal H : \partial\Phi(w) \neq \emptyset \}$. Recall that, in our setting, we always have that $\overline{D(\partial\Phi)} = \overline{D(\Phi)}$. We refer the reader to [13, 33] for basic facts on convex analysis.

In applications, it is sometimes useful to obtain error estimates on (semi)norms stronger than those of the ambient space, and that are dictated by the structure of the energy. For this reason, we introduce the following coercivity modulus of $\Phi$, see [30, Definition 2.3].

Definition 2.1 (coercivity modulus). For every $w_1 \in D(\Phi)$ and $w_2 \in D(\partial\Phi)$, let $\sigma(w_2; w_1) \ge 0$ be
\[
\sigma(w_2; w_1) = \Phi(w_1) - \Phi(w_2) - \sup_{\xi \in \partial\Phi(w_2)} \langle \xi, w_1 - w_2 \rangle.
\]
Then for every $w_1, w_2 \in D(\partial\Phi)$ we define
\[
\rho(w_1, w_2) = \sigma(w_1; w_2) + \sigma(w_2; w_1) = \inf_{\xi_1 \in \partial\Phi(w_1),\ \xi_2 \in \partial\Phi(w_2)} \langle \xi_1 - \xi_2, w_1 - w_2 \rangle.
\]

We comment that, by definition, $\rho(\cdot,\cdot)$ is symmetric, whereas $\sigma(\cdot;\cdot)$ might not be. Furthermore, the separability of $\mathcal H$ guarantees that $\sigma$ and $\rho$ are both Borel measurable [30, Remark 2.4]. One may also refer to [30, Section 2.3] for discussions and properties of $\sigma$ and $\rho$ for certain choices of $\Phi$. Definition 2.1 enables us to write
\[
\xi \in \partial\Phi(w) \iff \langle \xi, v - w \rangle + \sigma(w; v) \le \Phi(v) - \Phi(w), \quad \forall v \in \mathcal H. \tag{2.1}
\]

2.2. Vector valued time dependent functions.

We will follow standard notation regarding Bochner spaces of vector valued functions, see [32, Section 1.5]. For any $w \in L^1(0,T;\mathcal H)$ and $E \subset [0,T]$ that is measurable, we define the average by
\[
\fint_E w(t) \, dt = \frac{1}{|E|} \int_E w(t) \, dt,
\]
where $|E|$ denotes the Lebesgue measure of $E$.

Since eventually we will have to deal with time discretization, we also introduce notation for time-discrete vector valued functions. Let $\mathcal P$ be a partition of the time interval $[0,T]$
\[
\mathcal P = \{ 0 = t_0 < t_1 < \ldots < t_{N-1} < t_N = T \}, \tag{2.2}
\]
with variable steps $\tau_n = t_n - t_{n-1}$ and $\tau = \max\{ \tau_n : n \in \{1, \ldots, N\} \}$. We will always denote by $N$ the size of a partition. For $t \in [0,T]$ we define
\[
\lfloor t \rfloor_{\mathcal P} = \max\{ r \in \mathcal P : r < t \}, \qquad \lceil t \rceil_{\mathcal P} = \min\{ r \in \mathcal P : t \le r \},
\]
and $n(t)$ to be the index of $\lceil t \rceil_{\mathcal P}$, so that $t \in (\lfloor t \rfloor_{\mathcal P}, \lceil t \rceil_{\mathcal P}] = (t_{n(t)-1}, t_{n(t)}]$. Given a partition $\mathcal P$, for $\mathbf W = \{ W_i \}_{i=1}^N \subset \mathcal H^N$ we define its piecewise constant interpolant with respect to $\mathcal P$ to be the function $W_{\mathcal P} \in L^\infty(0,T;\mathcal H)$ defined by
\[
W_{\mathcal P}(t) = W_{n(t)}. \tag{2.3}
\]

The space $L^p_\alpha(0,T;\mathcal H)$. To quantify the assumptions we need on the right hand side $f$ of (1.2) we introduce the following space.

Definition 2.2 (space $L^p_\alpha(0,T;\mathcal H)$). Let $p \in [1,\infty)$ and $\alpha \in (0,1)$. We say that the function $w : [0,T] \to \mathcal H$ belongs to the space $L^p_\alpha(0,T;\mathcal H)$ iff
\[
\| w \|_{L^p_\alpha(0,T;\mathcal H)} = \sup_{t \in [0,T]} \left( \int_0^t (t-s)^{\alpha-1} \| w(s) \|^p \, ds \right)^{1/p} < \infty. \tag{2.4}
\]

Proposition 2.3 below collects some basic embedding results about this space.
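To make the notation above concrete, the following minimal Python sketch evaluates $n(t)$, the piecewise constant interpolant (2.3), and the weighted integral appearing inside (2.4) for piecewise constant data. The function names, the scalar setting $\mathcal H = \mathbb R$, and the convention $n(0) = 1$ are assumptions made for illustration only.

```python
import bisect


def n_of_t(partition, t):
    """Index n(t) such that t lies in (t_{n-1}, t_n]; partition = [t_0, ..., t_N]."""
    if not partition[0] <= t <= partition[-1]:
        raise ValueError("t must lie in [0, T]")
    # bisect_left returns the first index i with partition[i] >= t, i.e. the index of ceil(t).
    # Assumption: t = t_0 is assigned to the first cell.
    return max(1, bisect.bisect_left(partition, t))


def interpolant(partition, W, t):
    """Piecewise constant interpolant (2.3) for scalar data W = [W_1, ..., W_N]."""
    return W[n_of_t(partition, t) - 1]


def weighted_integral(partition, W, t, alpha, p):
    """Exact value of int_0^t (t-s)^(alpha-1) |W_P(s)|^p ds for piecewise constant W_P."""
    total = 0.0
    for k in range(1, len(partition)):
        a, b = partition[k - 1], min(partition[k], t)
        if a >= t:
            break
        # int_a^b (t-s)^(alpha-1) ds = ((t-a)^alpha - (t-b)^alpha) / alpha
        total += abs(W[k - 1]) ** p * ((t - a) ** alpha - (t - b) ** alpha) / alpha
    return total


partition = [0.0, 0.1, 0.4, 1.0]   # a nonuniform partition of [0, 1]
values = [1.0, 1.0, 1.0]           # constant data, so the integral equals t^alpha / alpha
print(n_of_t(partition, 0.25), weighted_integral(partition, values, 1.0, 0.5, 2))
```

The norm (2.4) would additionally require a supremum over $t$, which in such a sketch could only be approximated by sampling.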


Proposition 2.3 (embedding). Let $p \in [1,\infty)$, $\alpha \in (0,1)$, and $q > p/\alpha$. Then we have that
\[
L^q(0,T;\mathcal H) \hookrightarrow L^p_\alpha(0,T;\mathcal H) \hookrightarrow L^p(0,T;\mathcal H).
\]

Proof. The second embedding is immediate. For any $t \in (0,T]$
\[
\int_0^t \| w(s) \|^p \, ds \le \sup_{s \in [0,t]} (t-s)^{1-\alpha} \int_0^t (t-s)^{\alpha-1} \| w(s) \|^p \, ds \le T^{1-\alpha} \| w \|^p_{L^p_\alpha(0,T;\mathcal H)},
\]
where we used that $1 - \alpha > 0$. For the first embedding, Hölder's inequality with exponents $q/p$ and $q/(q-p)$ yields
\[
\left( \int_0^t (t-s)^{\alpha-1} \| w(s) \|^p \, ds \right)^{1/p} \le \left( \frac{q-p}{q\alpha - p} \right)^{\frac{q-p}{pq}} t^{\frac{\alpha}{p} - \frac{1}{q}} \| w \|_{L^q(0,t;\mathcal H)},
\]
and hence
\[
\| w \|_{L^p_\alpha(0,T;\mathcal H)} \le \left( \frac{q-p}{q\alpha - p} \right)^{\frac{q-p}{pq}} T^{\frac{\alpha}{p} - \frac{1}{q}} \| w \|_{L^q(0,T;\mathcal H)}, \tag{2.5}
\]
as we intended to show. $\square$

When dealing with discretization we will approximate the right hand side $f$ of (1.2) by its local averages over a partition $\mathcal P$. Thus, we must provide a bound on this operation that is independent of the partition.

Lemma 2.4 (continuity of averaging). Let $p \in [1,\infty)$, $\alpha \in (0,1)$, $f \in L^p_\alpha(0,T;\mathcal H)$, and $\mathcal P$ be a partition of $[0,T]$ as in (2.2). Define $\mathbf F = \{ \fint_{t_{n-1}}^{t_n} f(t) \, dt \}_{n=1}^N \subset \mathcal H^N$ and let $F_{\mathcal P}$ be defined as in (2.3). Then, there exists a constant $C > 0$ only depending on $p$ and $\alpha$ such that
\[
\| F_{\mathcal P} \|_{L^p_\alpha(0,T;\mathcal H)} \le C \| f \|_{L^p_\alpha(0,T;\mathcal H)}.
\]

Proof. Let $p \in (1,\infty)$. We first, for $n \in \{1, \ldots, N\}$, bound the integral
\[
\int_0^{t_n} (t_n - s)^{\alpha-1} \| F_{\mathcal P}(s) \|^p \, ds.
\]
To achieve this, we decompose this integral as
\[
\int_0^{t_n} (t_n - s)^{\alpha-1} \| F_{\mathcal P}(s) \|^p \, ds = \sum_{k=1}^n \int_{t_{k-1}}^{t_k} (t_n - s)^{\alpha-1} \| F_{\mathcal P}(s) \|^p \, ds = \sum_{k=1}^n \| F_k \|^p \int_{t_{k-1}}^{t_k} (t_n - s)^{\alpha-1} \, ds. \tag{2.6}
\]
We use Hölder's inequality in the definition of $F_k$ to obtain that
\[
\| F_k \|^p = \left\| \fint_{t_{k-1}}^{t_k} f(s) \, ds \right\|^p \le \fint_{t_{k-1}}^{t_k} (t_n - s)^{\alpha-1} \| f(s) \|^p \, ds \left( \fint_{t_{k-1}}^{t_k} (t_n - s)^{\frac{1-\alpha}{p-1}} \, ds \right)^{p-1}. \tag{2.7}
\]
Since, for every $p \in (1,\infty)$, the function $s \mapsto s^{\alpha-1}$ belongs to the Muckenhoupt class $A_p(\mathbb R_+)$, see [20, Example 7.1.7], there exists a constant $C_{p,\alpha}$ that only depends on $p$ and $\alpha$ such that
\[
\fint_a^b s^{\alpha-1} \, ds \left( \fint_a^b s^{\frac{1-\alpha}{p-1}} \, ds \right)^{p-1} \le C_{p,\alpha}, \quad \forall\, 0 \le a < b.
\]
Therefore, for any $k$, we have
\[
\fint_{t_{k-1}}^{t_k} (t_n - s)^{\alpha-1} \, ds \left[ \fint_{t_{k-1}}^{t_k} (t_n - s)^{\frac{1-\alpha}{p-1}} \, ds \right]^{p-1} = \fint_{t_n - t_k}^{t_n - t_{k-1}} s^{\alpha-1} \, ds \left[ \fint_{t_n - t_k}^{t_n - t_{k-1}} s^{\frac{1-\alpha}{p-1}} \, ds \right]^{p-1} \le C_{p,\alpha}. \tag{2.8}
\]
Substituting (2.7) and (2.8) into (2.6) we get
\[
\int_0^{t_n} (t_n - s)^{\alpha-1} \| F_{\mathcal P}(s) \|^p \, ds \le \sum_{k=1}^n C_{p,\alpha} \int_{t_{k-1}}^{t_k} (t_n - s)^{\alpha-1} \| f(s) \|^p \, ds = C_{p,\alpha} \int_0^{t_n} (t_n - s)^{\alpha-1} \| f(s) \|^p \, ds \le C_{p,\alpha} \| f \|^p_{L^p_\alpha(0,T;\mathcal H)}.
\]
Now consider $t \in [0,T]$. Taking advantage of the estimate we obtained above we write
\[
\begin{aligned}
\int_0^t (t-s)^{\alpha-1} \| F_{\mathcal P}(s) \|^p \, ds
&= \int_0^{\lfloor t \rfloor_{\mathcal P}} (t-s)^{\alpha-1} \| F_{\mathcal P}(s) \|^p \, ds + \int_{\lfloor t \rfloor_{\mathcal P}}^t (t-s)^{\alpha-1} \| F_{\mathcal P}(s) \|^p \, ds \\
&= \int_0^{\lfloor t \rfloor_{\mathcal P}} (t-s)^{\alpha-1} \| F_{\mathcal P}(s) \|^p \, ds + \| F_{\mathcal P}(\lceil t \rceil_{\mathcal P}) \|^p \int_{\lfloor t \rfloor_{\mathcal P}}^t (t-s)^{\alpha-1} \, ds \\
&\le \int_0^{\lfloor t \rfloor_{\mathcal P}} (\lfloor t \rfloor_{\mathcal P} - s)^{\alpha-1} \| F_{\mathcal P}(s) \|^p \, ds + \| F_{\mathcal P}(\lceil t \rceil_{\mathcal P}) \|^p \int_{\lfloor t \rfloor_{\mathcal P}}^{\lceil t \rceil_{\mathcal P}} (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} \, ds \\
&\le C_{p,\alpha} \| f \|^p_{L^p_\alpha(0,T;\mathcal H)} + \int_0^{\lceil t \rceil_{\mathcal P}} (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} \| F_{\mathcal P}(s) \|^p \, ds \le 2 C_{p,\alpha} \| f \|^p_{L^p_\alpha(0,T;\mathcal H)}.
\end{aligned}
\tag{2.9}
\]
Therefore, by taking the supremum over $t \in [0,T]$ and setting $C = (2 C_{p,\alpha})^{1/p}$, we finish the proof of this lemma for $p > 1$.

For $p = 1$, the proof proceeds almost the same way as before. The only difference worth noting is that, instead of (2.7), we have
\[
\| F_k \| = \left\| \fint_{t_{k-1}}^{t_k} f(s) \, ds \right\| \le \fint_{t_{k-1}}^{t_k} (t_n - s)^{\alpha-1} \| f(s) \| \, ds \, \sup_{s \in [t_{k-1}, t_k]} \frac{1}{(t_n - s)^{\alpha-1}}.
\]
Next, we observe that, since $\alpha - 1 \in (-1, 0)$, the function $s \mapsto s^{\alpha-1}$ belongs to the Muckenhoupt class $A_1(\mathbb R_+)$. Thus,
\[
\sup_{s \in [a,b]} \frac{1}{s^{\alpha-1}} \fint_a^b s^{\alpha-1} \, ds \le C_\alpha, \quad \forall\, 0 \le a < b.
\]
With this information, the proof proceeds without change. $\square$
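The key ingredient of the proof is the uniform bound (2.8) on a product of averages. As a sanity check, the sketch below evaluates that product in closed form for the weight $s^{\alpha-1}$ over a range of intervals $[a,b]$. Since the product depends only on the ratio $a/b$, we fix $b = 1$. For $p = 2$ an elementary computation based on the arithmetic-geometric mean inequality (ours, not taken from the text) indicates that the product never exceeds its value at $a = 0$, namely $1/(\alpha(2-\alpha))$. The function name and the sampling grid are illustrative assumptions.

```python
def ap_product(a, b, alpha, p):
    """Product of averages in (2.8) for the weight s^(alpha-1) on [a, b], 0 <= a < b, p > 1."""
    dual_exp = (1.0 - alpha) / (p - 1.0)      # exponent of the dual weight s^((1-alpha)/(p-1))
    avg_w = (b ** alpha - a ** alpha) / (alpha * (b - a))
    avg_dual = (b ** (dual_exp + 1.0) - a ** (dual_exp + 1.0)) / ((dual_exp + 1.0) * (b - a))
    return avg_w * avg_dual ** (p - 1.0)


alpha, p = 0.5, 2.0
# Ratios a/b ranging from 0 (most singular case) to nearly 1 (almost constant weight).
ratios = [0.0] + [10.0 ** (-k) for k in range(8, 0, -1)] + [0.25, 0.5, 0.9, 0.99]
values = [ap_product(r, 1.0, alpha, p) for r in ratios]
bound_at_zero = 1.0 / (alpha * (2.0 - alpha))   # value of the product at a = 0 when p = 2
print(max(values), bound_at_zero)
```

By Hölder's inequality the product is always at least one, so the computed values should lie between $1$ and the value at $a = 0$.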

It turns out that averaging is not only continuous, but possesses suitable approximation properties in this space. Namely, we have a control on the difference between fractional integrals of $f \in L^p_\alpha(0,T;\mathcal H)$ and its averages.

Lemma 2.5 (approximation). Let $p \in [1,\infty)$, $\alpha \in (0,1)$, $f \in L^p_\alpha(0,T;\mathcal H)$, and $\mathcal P$ be a partition of $[0,T]$ as in (2.2). Let $p'$ be the Hölder conjugate of $p$, $\mathbf F = \{ \fint_{t_{n-1}}^{t_n} f(t) \, dt \}_{n=1}^N \subset \mathcal H^N$, and let $F_{\mathcal P}$ be defined as in (2.3). Then we have
\[
\sup_{t \in [0,T]} \left\| \int_0^t (t-s)^{\alpha-1} \big( f(s) - F_{\mathcal P}(s) \big) \, ds \right\| \le C \tau^{\alpha/p'} \| f - F_{\mathcal P} \|_{L^p_\alpha(0,T;\mathcal H)} \le C' \tau^{\alpha/p'} \| f \|_{L^p_\alpha(0,T;\mathcal H)}, \tag{2.10}
\]
where the constants $C, C'$ depend only on $p$ and $\alpha$. In addition, for any $\beta \in (0,1)$ we also have
\[
\sup_{r \in [0,T]} \int_0^r (r-t)^{\alpha-1} \left\| \int_0^t (t-s)^{\beta-1} \big( f(s) - F_{\mathcal P}(s) \big) \, ds \right\|^p dt \le C_1 \tau^{p\beta} \| f - F_{\mathcal P} \|^p_{L^p_\alpha(0,T;\mathcal H)} \le C_1' \tau^{p\beta} \| f \|^p_{L^p_\alpha(0,T;\mathcal H)}, \tag{2.11}
\]
where the constants $C_1, C_1'$ depend on $p$, $\alpha$, and $\beta$. As usual, when $p = 1$, we have $p' = \infty$ and $1/p'$ is treated as $0$.

Proof. We first notice that the second inequalities in both (2.10) and (2.11) follow directly from Lemma 2.4 and the triangle inequality.

To show the first inequality in (2.10), given $\mathcal P$ we consider $t \in [0,T]$. Using that $f - F_{\mathcal P}$ has zero mean on each subinterval of the partition, we can write
\[
\begin{aligned}
\int_0^t (t-s)^{\alpha-1} \big( f(s) - F_{\mathcal P}(s) \big) \, ds
&= \int_{\lfloor t \rfloor_{\mathcal P}}^t (t-s)^{\alpha-1} \big( f(s) - F_{\mathcal P}(s) \big) \, ds + \sum_{k=1}^{n(t)-1} \int_{t_{k-1}}^{t_k} (t-s)^{\alpha-1} \big( f(s) - F_{\mathcal P}(s) \big) \, ds \\
&= \int_{\lfloor t \rfloor_{\mathcal P}}^t (t-s)^{\alpha-1} \big( f(s) - F_{\mathcal P}(s) \big) \, ds + \sum_{k=1}^{n(t)-1} \int_{t_{k-1}}^{t_k} \big( (t-s)^{\alpha-1} - (t - t_{k-1})^{\alpha-1} \big) \big( f(s) - F_{\mathcal P}(s) \big) \, ds \\
&= I_1(t) + I_2(t).
\end{aligned}
\tag{2.12}
\]
For the first term, denoted $I_1(t)$, we have
\[
\| I_1(t) \| \le \left( \int_{\lfloor t \rfloor_{\mathcal P}}^t (t-s)^{\alpha-1} \| f(s) - F_{\mathcal P}(s) \|^p \, ds \right)^{1/p} \left( \int_{\lfloor t \rfloor_{\mathcal P}}^t (t-s)^{\alpha-1} \, ds \right)^{1/p'} \le \| f - F_{\mathcal P} \|_{L^p_\alpha(0,T;\mathcal H)} \left( \frac{1}{\alpha} (t - \lfloor t \rfloor_{\mathcal P})^\alpha \right)^{1/p'} \le C \tau^{\alpha/p'} \| f - F_{\mathcal P} \|_{L^p_\alpha(0,T;\mathcal H)},
\]
where $C$ only depends on $p$ and $\alpha$. For the second term, noticing that $t - t_{k-1} < t - s + \tau$ for $s \in (t_{k-1}, t_k)$, we have
\[
\begin{aligned}
\| I_2(t) \| &\le \int_0^{\lfloor t \rfloor_{\mathcal P}} \big( (t-s)^{\alpha-1} - (t-s+\tau)^{\alpha-1} \big) \| f(s) - F_{\mathcal P}(s) \| \, ds \\
&\le \left[ \int_0^{\lfloor t \rfloor_{\mathcal P}} (t-s)^{\alpha-1} \| f(s) - F_{\mathcal P}(s) \|^p \, ds \right]^{1/p} \left[ \int_0^{\lfloor t \rfloor_{\mathcal P}} (t-s)^{\alpha-1} \left[ 1 - \left( \frac{t-s+\tau}{t-s} \right)^{\alpha-1} \right]^{p'} ds \right]^{1/p'} \\
&\le \| f - F_{\mathcal P} \|_{L^p_\alpha(0,T;\mathcal H)} \left( \int_0^{\lfloor t \rfloor_{\mathcal P}} (t-s)^{\alpha-1} - (t-s+\tau)^{\alpha-1} \, ds \right)^{1/p'}.
\end{aligned}
\]
Since
\[
\int_0^{\lfloor t \rfloor_{\mathcal P}} (t-s)^{\alpha-1} - (t-s+\tau)^{\alpha-1} \, ds = \frac{1}{\alpha} \big( t^\alpha - (t - \lfloor t \rfloor_{\mathcal P})^\alpha - (t+\tau)^\alpha + (t - \lfloor t \rfloor_{\mathcal P} + \tau)^\alpha \big) \le \frac{1}{\alpha} \big( (t - \lfloor t \rfloor_{\mathcal P} + \tau)^\alpha - (t - \lfloor t \rfloor_{\mathcal P})^\alpha \big) \le \frac{\tau^\alpha}{\alpha},
\]
we obtain
\[
\| I_2(t) \| \le C \tau^{\alpha/p'} \| f - F_{\mathcal P} \|_{L^p_\alpha(0,T;\mathcal H)},
\]
and (2.10) follows after combining the bounds for $I_1(t)$ and $I_2(t)$ that we have obtained.

To prove (2.11) we apply the Hölder inequality to (2.12), with $\alpha$ replaced by $\beta$, to get
\[
\left\| \int_0^t (t-s)^{\beta-1} \big( f(s) - F_{\mathcal P}(s) \big) \, ds \right\|^p \le \mathrm{II}_1(t)^{p-1} \cdot \big( \mathrm{II}_2(t) + \mathrm{II}_3(t) \big),
\]
where
\[
\begin{aligned}
\mathrm{II}_1(t) &= \int_{\lfloor t \rfloor_{\mathcal P}}^t (t-s)^{\beta-1} \, ds + \sum_{k=1}^{n(t)-1} \int_{t_{k-1}}^{t_k} \big[ (t-s)^{\beta-1} - (t - t_{k-1})^{\beta-1} \big] \, ds, \\
\mathrm{II}_2(t) &= \int_{\lfloor t \rfloor_{\mathcal P}}^t (t-s)^{\beta-1} \| f(s) - F_{\mathcal P}(s) \|^p \, ds, \\
\mathrm{II}_3(t) &= \sum_{k=1}^{n(t)-1} \int_{t_{k-1}}^{t_k} \big( (t-s)^{\beta-1} - (t - t_{k-1})^{\beta-1} \big) \| f(s) - F_{\mathcal P}(s) \|^p \, ds.
\end{aligned}
\]
Arguing as in the bound for $I_2(t)$,
\[
\mathrm{II}_1(t) \le \frac{1}{\beta} (t - \lfloor t \rfloor_{\mathcal P})^\beta + \int_0^{\lfloor t \rfloor_{\mathcal P}} \big[ (t-s)^{\beta-1} - (t-s+\tau)^{\beta-1} \big] \, ds \le \frac{2}{\beta} \tau^\beta.
\]
Thus, to obtain (2.11) it suffices to show that, for every $r \in [0,T]$,
\[
\int_0^r (r-t)^{\alpha-1} \big( \mathrm{II}_2(t) + \mathrm{II}_3(t) \big) \, dt \le C \tau^\beta \| f - F_{\mathcal P} \|^p_{L^p_\alpha(0,T;\mathcal H)}
\]
with some constant $C$ only depending on $p$, $\alpha$, and $\beta$. To estimate the fractional integral of $\mathrm{II}_2$, by Fubini's theorem we have
\[
\int_0^r (r-t)^{\alpha-1} \mathrm{II}_2(t) \, dt = \int_0^r \| f(s) - F_{\mathcal P}(s) \|^p \int_s^{\lceil s \rceil_{\mathcal P} \wedge r} (r-t)^{\alpha-1} (t-s)^{\beta-1} \, dt \, ds, \tag{2.13}
\]
where we set $a \wedge b = \min\{a, b\}$. We claim that there exists a constant $C$ depending on $\alpha$ and $\beta$ such that
\[
\int_s^{\lceil s \rceil_{\mathcal P} \wedge r} (r-t)^{\alpha-1} (t-s)^{\beta-1} \, dt \le C (r-s)^{\alpha-1} \tau^\beta. \tag{2.14}
\]
On the one hand, for $r - s \le 2\tau$, we simply have
\[
\int_s^{\lceil s \rceil_{\mathcal P} \wedge r} (r-t)^{\alpha-1} (t-s)^{\beta-1} \, dt \le \int_s^r (r-t)^{\alpha-1} (t-s)^{\beta-1} \, dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} (r-s)^{\alpha+\beta-1} \le \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} (r-s)^{\alpha-1} (2\tau)^\beta.
\]
On the other hand, if $r - s > 2\tau$, then
\[
\int_s^{\lceil s \rceil_{\mathcal P} \wedge r} (r-t)^{\alpha-1} (t-s)^{\beta-1} \, dt \le \int_s^{s+\tau} (r-t)^{\alpha-1} (t-s)^{\beta-1} \, dt \le \int_s^{s+\tau} \left( \frac{r-s}{2} \right)^{\alpha-1} (t-s)^{\beta-1} \, dt = \frac{2^{1-\alpha}}{\beta} (r-s)^{\alpha-1} \tau^\beta.
\]
Therefore (2.14) is proved, and thus (2.13) and (2.14) imply that
\[
\int_0^r (r-t)^{\alpha-1} \mathrm{II}_2(t) \, dt \le C \tau^\beta \int_0^r (r-s)^{\alpha-1} \| f(s) - F_{\mathcal P}(s) \|^p \, ds \le C \tau^\beta \| f - F_{\mathcal P} \|^p_{L^p_\alpha(0,T;\mathcal H)}.
\]
For $\mathrm{II}_3(t)$, we again apply Fubini's theorem, together with the kernel bound used for $I_2(t)$, to obtain
\[
\int_0^r (r-t)^{\alpha-1} \mathrm{II}_3(t) \, dt \le \int_0^r \| f(s) - F_{\mathcal P}(s) \|^p \int_s^r (r-t)^{\alpha-1} \big( (t-s)^{\beta-1} - (t-s+\tau)^{\beta-1} \big) \, dt \, ds.
\]
To conclude, we claim that
\[
A = \int_s^r (r-t)^{\alpha-1} \big( (t-s)^{\beta-1} - (t-s+\tau)^{\beta-1} \big) \, dt \le C \tau^\beta (r-s)^{\alpha-1}, \tag{2.15}
\]
for a constant $C$ depending on $\alpha$ and $\beta$. Indeed, if this is the case, we have
\[
\int_0^r (r-t)^{\alpha-1} \mathrm{II}_3(t) \, dt \le C \tau^\beta \int_0^r (r-s)^{\alpha-1} \| f(s) - F_{\mathcal P}(s) \|^p \, ds \le C \tau^\beta \| f - F_{\mathcal P} \|^p_{L^p_\alpha(0,T;\mathcal H)},
\]
and we combine the estimates for $\mathrm{II}_2(t)$ and $\mathrm{II}_3(t)$ to conclude the proof of (2.11).

Let us now turn to the proof of (2.15). First, if $r - s \le \tau$ then it suffices to observe that
\[
A \le \int_s^r (r-t)^{\alpha-1} (t-s)^{\beta-1} \, dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} (r-s)^{\alpha+\beta-1} \le \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \tau^\beta (r-s)^{\alpha-1}.
\]
Now, if $r - s > \tau$, we estimate as
\[
\begin{aligned}
A &= \int_s^r (r-t)^{\alpha-1} (t-s)^{\beta-1} \, dt - \int_s^r (r-t)^{\alpha-1} (t-s+\tau)^{\beta-1} \, dt \\
&= \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} (r-s)^{\alpha+\beta-1} - \int_{-\tau}^{r-s} (t+\tau)^{\beta-1} (r-t-s)^{\alpha-1} \, dt + \int_0^\tau (r-s-t+\tau)^{\alpha-1} t^{\beta-1} \, dt \\
&= \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \big( (r-s)^{\alpha+\beta-1} - (r-s+\tau)^{\alpha+\beta-1} \big) + \int_0^\tau (r-s-t+\tau)^{\alpha-1} t^{\beta-1} \, dt.
\end{aligned}
\]
The first term can be bounded, using that $r - s > \tau$, as follows
\[
(r-s)^{\alpha+\beta-1} - (r-s+\tau)^{\alpha+\beta-1} \le \max\{ 1 - \alpha - \beta, 0 \} \, \tau \, (r-s)^{\alpha+\beta-2} \le \tau^\beta (r-s)^{\alpha-1}.
\]
On the other hand, since for $t \in (0,\tau)$ we have that $r - s + \tau - t \ge r - s$, the second term can be estimated as
\[
\int_0^\tau (r-s-t+\tau)^{\alpha-1} t^{\beta-1} \, dt \le (r-s)^{\alpha-1} \int_0^\tau t^{\beta-1} \, dt = \frac{1}{\beta} (r-s)^{\alpha-1} \tau^\beta.
\]
This concludes the proof. $\square$
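Estimate (2.10) can be observed numerically. The sketch below takes the scalar function $f(s) = s$ on $[0,1]$, an assumption chosen so that every integral is available in closed form, computes the left hand side of (2.10) on uniform partitions by sampling $t$, and compares a coarse and a fine partition. All function names are hypothetical. For such a smooth $f$ one should expect a decay that is faster than the rate guaranteed by the lemma.

```python
def kernel_moment(t, a, b, alpha, c):
    """Exact value of int_a^b (t-s)^(alpha-1) (s - c) ds for a <= b <= t."""
    # With u = t - s this is int_{t-b}^{t-a} u^(alpha-1) (t - c - u) du.
    lo, hi = t - b, t - a
    return ((t - c) * (hi ** alpha - lo ** alpha) / alpha
            - (hi ** (alpha + 1.0) - lo ** (alpha + 1.0)) / (alpha + 1.0))


def averaging_defect(partition, t, alpha):
    """int_0^t (t-s)^(alpha-1) (f(s) - F_P(s)) ds for the assumed data f(s) = s."""
    total = 0.0
    for k in range(1, len(partition)):
        a, b = partition[k - 1], partition[k]
        if a >= t:
            break
        cell_mean = 0.5 * (a + b)          # average of f(s) = s over the full cell (a, b]
        total += kernel_moment(t, a, min(b, t), alpha, cell_mean)
    return total


def sup_defect(partition, alpha, samples_per_cell=8):
    """Approximate the supremum over t in (2.10) by sampling every cell."""
    worst = 0.0
    for k in range(1, len(partition)):
        a, b = partition[k - 1], partition[k]
        for j in range(1, samples_per_cell + 1):
            t = min(b, a + (b - a) * j / samples_per_cell)
            worst = max(worst, abs(averaging_defect(partition, t, alpha)))
    return worst


def uniform(n):
    """Uniform partition of [0, 1] with n cells."""
    return [i / n for i in range(n + 1)]


alpha = 0.5
coarse = sup_defect(uniform(8), alpha)
fine = sup_defect(uniform(64), alpha)
print(coarse, fine)
```

The same routine accepts nonuniform partitions, since nothing in it relies on equal step sizes.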

We refer the reader to [27, Section 4] for further results concerning the space $L^p_\alpha(0,T;\mathcal H)$.

2.3. The Caputo derivative.

As we mentioned in the Introduction, the definition of the Caputo derivative given in (1.1) seems unnatural: smoothness of higher order is needed to define a fractional derivative. Several attempts at resolving this discrepancy have been proposed in the literature and we here quickly describe a few of them.

First, one of the main reasons that motivate practitioners to use, among the many possible definitions, the Caputo derivative (1.1) is that it remains meaningful after extending the function by its initial value, $w(t) = w(0)$ for $t \le 0$. With this extension,
\[
\begin{aligned}
D_c^\alpha w(t) &= \frac{1}{\Gamma(1-\alpha)} \int_{-\infty}^t \frac{\dot w(r)}{(t-r)^\alpha} \, dr = \frac{1}{\Gamma(1-\alpha)} \int_{-\infty}^t \frac{\frac{d}{dr}\big( w(r) - w(t) \big)}{(t-r)^\alpha} \, dr \\
&= \frac{1}{\Gamma(-\alpha)} \int_{-\infty}^t \frac{w(r) - w(t)}{(t-r)^{\alpha+1}} \, dr = D_m^\alpha w(t),
\end{aligned}
\tag{2.16}
\]
where, in the last step, we integrated by parts. The expression $D_m^\alpha w(t)$ is known as the Marchaud derivative of order $\alpha$ of the function $w$. This is the way that the Caputo derivative has been understood, for instance, in [6, 5, 7, 4]. We comment, in passing, that owing to [9] this fractional derivative satisfies an extension problem similar to the (by now) classical Caffarelli-Silvestre extension [10, 34] for the fractional Laplacian.
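The agreement between the Caputo form (1.1) and the Marchaud form in (2.16) can be checked numerically for a smooth function. The sketch below is only an illustration under stated assumptions: the test function $w(t) = t^2$, the quadrature (a change of variables that removes the weak singularity, followed by the midpoint rule), and all function names are our choices. The reference value uses the standard formula $D_c^\alpha t^2 = \Gamma(3)\, t^{2-\alpha} / \Gamma(3-\alpha)$.

```python
import math


def singular_integral(g, t, alpha, n=4000):
    """Approximate int_0^t g(u) u^(-alpha) du for bounded g and 0 < alpha < 1.

    The substitution u = v^(1/(1-alpha)) turns the integral into
    (1/(1-alpha)) * int_0^{t^(1-alpha)} g(v^(1/(1-alpha))) dv,
    which is approximated with the composite midpoint rule.
    """
    top = t ** (1.0 - alpha)
    h = top / n
    power = 1.0 / (1.0 - alpha)
    total = sum(g(((j + 0.5) * h) ** power) for j in range(n))
    return total * h / (1.0 - alpha)


def caputo(dw, t, alpha):
    """Definition (1.1) written with u = t - r."""
    return singular_integral(lambda u: dw(t - u), t, alpha) / math.gamma(1.0 - alpha)


def marchaud(w, t, alpha):
    """Last expression in (2.16), with w extended by w(0) on the negative half line."""
    # Contribution of r < 0, where w(r) = w(0): (w(0) - w(t)) * t^(-alpha) / alpha.
    tail = (w(0.0) - w(t)) * t ** (-alpha) / alpha
    # Contribution of 0 < r < t, written with u = t - r.
    inner = singular_integral(lambda u: (w(t - u) - w(t)) / u, t, alpha)
    return (tail + inner) / math.gamma(-alpha)


def w(s):
    return s * s          # sample function (assumption)


def dw(s):
    return 2.0 * s        # its derivative


t, alpha = 1.0, 0.5
exact = math.gamma(3.0) / math.gamma(3.0 - alpha) * t ** (2.0 - alpha)
print(caputo(dw, t, alpha), marchaud(w, t, alpha), exact)
```

Note that the Marchaud form only evaluates $w$ itself, never its derivative, which is the point of the reformulation.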

Another approach, and the one we shall adopt here, is to notice that (1.1) can be converted, for sufficiently smooth functions, into a Volterra type equation
\[
w(t) = w(0) + \frac{1}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} D_c^\alpha w(s) \, ds, \quad \forall t \in [0,T]. \tag{2.17}
\]
This identity is the beginning of the theory developed in [25] to extend the notion of Caputo derivative. To be more specific, [25] considers the set of distributions
\[
\mathcal E_T = \{ w \in \mathcal D'(\mathbb R; \mathcal H) : \exists M_w \in (-\infty, T), \ \mathrm{supp}(w) \subset [-M_w, T) \}
\]
for a fixed time $T > 0$. Then the modified Riemann-Liouville derivative for any distribution $w \in \mathcal E_T$ is defined, following classical references like [18, Section 1.5.5], as
\[
D_{rl}^\alpha w = w * g_{-\alpha} \in \mathcal E_T,
\]
where $g_{-\alpha}(t) = \frac{1}{\Gamma(1-\alpha)} D\big( \theta(t) t^{-\alpha} \big)$, with $\theta$ being the Heaviside function, is a distribution supported in $[0,\infty)$, and the convolution is understood in the generalized sense between distributions. Here $D$ denotes the distributional derivative. Reference [25] then uses this to define the generalized Caputo derivative of $w \in L^1_{\mathrm{loc}}([0,T);\mathcal H)$ associated with $w_0$ by
\[
D_c^\alpha w = D_{rl}^\alpha (w - w_0).
\]
If there exists $w(0) \in \mathcal H$ such that $\lim_{t \downarrow 0} \fint_0^t \| w(s) - w(0) \| \, ds = 0$, then we always impose $w_0 = w(0)$ in this definition. It is shown in [25, Theorem 3.7] that for such a function $w$, (2.17) holds for Lebesgue a.e. $t \in (0,T)$ provided that the generalized Caputo derivative $D_c^\alpha w \in L^1_{\mathrm{loc}}([0,T);\mathcal H)$.

We also comment that [25, Proposition 3.11(ii)] implies that for every function $w \in L^2(0,T;\mathcal H)$ with $D_c^\alpha w \in L^2(0,T;\mathcal H)$ we have
\[
\frac12 D_c^\alpha \| w \|^2(t) \le \langle D_c^\alpha w(t), w(t) \rangle. \tag{2.18}
\]

Finally, we recall that the Mittag-Leffler function of order $\alpha \in (0,1)$ is defined via
\[
E_\alpha(z) = \sum_{k=0}^\infty \frac{z^k}{\Gamma(\alpha k + 1)}.
\]
We refer the reader to [19] for an extensive treatise on this function. Here we just mention that this function satisfies, for any $\lambda \in \mathbb R$, the identity
\[
D_c^\alpha E_\alpha(\lambda t^\alpha) = \lambda E_\alpha(\lambda t^\alpha), \qquad E_\alpha(0) = 1. \tag{2.19}
\]

An auxiliary estimate.

Having defined the Caputo derivative of a function, we present an auxiliary result. Namely, an estimate on functions whose Caputo derivative is piecewise constant over some partition $\mathcal P$.

Lemma 2.6 (continuity). Let $p \in [1,\infty)$; $\mathcal P$ be a partition, as in (2.2), of $[0,T]$; and $w \in L^1(0,T;\mathcal H)$ be such that its generalized Caputo derivative $D_c^\alpha w \in L^p_\alpha(0,T;\mathcal H)$, and it is piecewise constant over $\mathcal P$. Then we have
\[
\sup_{r \in [0,T]} \int_0^r (r-t)^{\alpha-1} \| w(\lceil t \rceil_{\mathcal P}) - w(t) \|^p \, dt \le C \tau^{p\alpha} \| D_c^\alpha w \|^p_{L^p_\alpha(0,T;\mathcal H)}, \tag{2.20}
\]
where the constant $C$ depends only on $p$ and $\alpha$.

Proof. The representation (2.17) allows us to write
\[
w(\lceil t \rceil_{\mathcal P}) - w(t) = \frac{1}{\Gamma(\alpha)} \left[ \int_0^t D_c^\alpha w(s) \big( (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} - (t-s)^{\alpha-1} \big) \, ds + \int_t^{\lceil t \rceil_{\mathcal P}} D_c^\alpha w(s) (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} \, ds \right].
\]
Therefore, by Hölder's inequality, we have
\[
\begin{aligned}
\| w(\lceil t \rceil_{\mathcal P}) - w(t) \|^p
&\le \frac{1}{\Gamma^p(\alpha)} \left( \int_0^t \big| (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} - (t-s)^{\alpha-1} \big| \, ds + \int_t^{\lceil t \rceil_{\mathcal P}} (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} \, ds \right)^{p-1} \\
&\qquad \times \left( \int_0^t \| D_c^\alpha w(s) \|^p \big| (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} - (t-s)^{\alpha-1} \big| \, ds + \int_t^{\lceil t \rceil_{\mathcal P}} \| D_c^\alpha w(s) \|^p (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} \, ds \right) \\
&\le C \tau^{\alpha(p-1)} \left( \int_0^t \| D_c^\alpha w(s) \|^p \big| (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} - (t-s)^{\alpha-1} \big| \, ds + \frac{(\lceil t \rceil_{\mathcal P} - t)^\alpha}{\alpha} \| D_c^\alpha w(t) \|^p \right) \\
&\le C_1 \tau^{\alpha(p-1)} \int_0^t \| D_c^\alpha w(s) \|^p \big| (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} - (t-s)^{\alpha-1} \big| \, ds + C_2 \tau^{p\alpha} \| D_c^\alpha w(t) \|^p = I_1(t) + I_2(t),
\end{aligned}
\]
where the constants $C$, $C_1$, and $C_2$ depend only on $p$ and $\alpha$. Here we used that $D_c^\alpha w$ is constant on $(t, \lceil t \rceil_{\mathcal P}]$.

For $I_2(t)$, we simply have
\[
\int_0^r (r-t)^{\alpha-1} I_2(t) \, dt \le C \tau^{p\alpha} \| D_c^\alpha w \|^p_{L^p_\alpha(0,T;\mathcal H)}.
\]
Now, to bound the integral for $I_1(t)$, we use Fubini's theorem to get
\[
\int_0^r (r-t)^{\alpha-1} I_1(t) \, dt = C_1 \tau^{(p-1)\alpha} \int_0^r \| D_c^\alpha w(s) \|^p \int_s^r (r-t)^{\alpha-1} \big| (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} - (t-s)^{\alpha-1} \big| \, dt \, ds.
\]
We claim that
\[
\int_s^r (r-t)^{\alpha-1} \big| (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} - (t-s)^{\alpha-1} \big| \, dt \le C (r-s)^{\alpha-1} \tau^\alpha, \tag{2.21}
\]
where $C$ only depends on $\alpha$. If this is true, then we have
\[
\int_0^r (r-t)^{\alpha-1} I_1(t) \, dt \le C \tau^{p\alpha} \int_0^r \| D_c^\alpha w(s) \|^p (r-s)^{\alpha-1} \, ds \le C \tau^{p\alpha} \| D_c^\alpha w \|^p_{L^p_\alpha(0,T;\mathcal H)}.
\]
The proof of (2.21) proceeds as the one for (2.15). For brevity we skip the details. $\square$
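The representation used at the start of the proof is easy to evaluate when $D_c^\alpha w$ is piecewise constant. The sketch below does so for scalar data and compares the sampled values of $|w(\lceil t \rceil_{\mathcal P}) - w(t)|$ with a crude pointwise bound, $2 \max_n |V_n| \, \tau^\alpha / \Gamma(\alpha+1)$, which follows from the same representation by bounding both integrals directly. That bound is our own illustration and is not the weighted estimate (2.20); the data `V`, the partition, and the function names are likewise assumptions.

```python
import math


def w_from_derivative(partition, V, w0, t, alpha):
    """Evaluate (2.17) when D_c^alpha w equals V[n-1] on (t_{n-1}, t_n] (scalar case)."""
    total = 0.0
    for n in range(1, len(partition)):
        a, b = partition[n - 1], min(partition[n], t)
        if a >= t:
            break
        # int_a^b (t-s)^(alpha-1) ds = ((t-a)^alpha - (t-b)^alpha) / alpha
        total += V[n - 1] * ((t - a) ** alpha - (t - b) ** alpha)
    return w0 + total / math.gamma(alpha + 1.0)


def max_jump_to_node(partition, V, w0, alpha, samples_per_cell=16):
    """Largest sampled value of |w(ceil(t)) - w(t)| over the partition."""
    worst = 0.0
    for n in range(1, len(partition)):
        a, b = partition[n - 1], partition[n]
        w_node = w_from_derivative(partition, V, w0, b, alpha)
        for j in range(samples_per_cell):
            t = a + (b - a) * (j + 0.5) / samples_per_cell   # strictly inside (a, b)
            w_t = w_from_derivative(partition, V, w0, t, alpha)
            worst = max(worst, abs(w_node - w_t))
    return worst


partition = [0.0, 0.05, 0.2, 0.5, 1.0]     # nonuniform partition (assumption)
V = [1.0, -2.0, 0.5, 3.0]                  # piecewise constant Caputo derivative (assumption)
alpha = 0.6
tau = max(b - a for a, b in zip(partition, partition[1:]))
pointwise_bound = 2.0 * max(abs(v) for v in V) * tau ** alpha / math.gamma(alpha + 1.0)
print(max_jump_to_node(partition, V, 0.0, alpha), pointwise_bound)
```

With all entries of `V` equal to one, the routine reduces to $w(t) = w_0 + t^\alpha / \Gamma(\alpha+1)$, which gives a quick consistency check.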

Some comparison estimates.

As a final preparatory step we present some auxiliary results that shall be repeatedly used and are related to differential inequalities involving the Caputo derivative, and a Grönwall-like lemma.

First, we present a comparison principle which is similar to [17, Proposition 4.2]. The proof can be done easily by contradiction, and therefore it is omitted here.

Lemma 2.7 (comparison). Let $g_1, g_2 : [0,T] \times \mathbb R \to \mathbb R$ be both nondecreasing in their second argument and $g_2$ be measurable. Assume that $v, w \in C([0,T];\mathbb R)$ satisfy $v(0) < w(0)$, and there is some $\alpha \in (0,1)$ for which
\[
\begin{aligned}
v(t) &\le g_1(t, v(t)) + \frac{1}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} g_2(s, v(s)) \, ds, \\
w(t) &> g_1(t, w(t)) + \frac{1}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} g_2(s, w(s)) \, ds,
\end{aligned}
\]
for every $t \in [0,T]$. Then we have $v < w$ on $[0,T]$.

We now present a result that can be interpreted as an extension of [30, Lemma 3.7] to the fractional case. However, unlike the classical case, here we have the restriction that $\lambda \ge 0$.

Lemma 2.8 (fractional Grönwall). Let $a \in C([0,T];\mathbb R)$ with $D_c^\alpha a^2 \in L^1_{\mathrm{loc}}([0,T);\mathbb R)$, let $b, c, d : [0,T] \to [0,+\infty]$ be measurable functions, and $\lambda \ge 0$. If the following differential inequality is satisfied
\[
D_c^\alpha a^2(t) + b(t) \le 2\lambda a^2(t) + c(t) + 2 d(t) a(t), \quad \text{a.e. } t \in (0,T), \tag{2.22}
\]
then we have
\[
\left( \sup_{t \in [0,T]} a^2(t) + \frac{1}{\Gamma(\alpha)} \| b \|_{L^1_\alpha(0,T;\mathbb R)} \right)^{1/2} \le 2 \widetilde D(T) E_\alpha(2\lambda T^\alpha) + \sqrt{a^2(0) + \widetilde C(T)} \, \sqrt{E_\alpha(2\lambda T^\alpha)},
\]
where
\[
\widetilde C(t) = \frac{1}{\Gamma(\alpha)} \| c \|_{L^1_\alpha(0,t;\mathbb R)}, \qquad \widetilde D(t) = \frac{1}{\Gamma(\alpha)} \| d \|_{L^1_\alpha(0,t;\mathbb R)}. \tag{2.23}
\]

Proof. From (2.22) we obtain that
\[
\begin{aligned}
a^2(t) + \frac{1}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} b(s) \, ds
&\le a^2(0) + \frac{1}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} \big[ c(s) + 2 d(s) a(s) + 2\lambda a^2(s) \big] \, ds \\
&\le a^2(0) + \widetilde C(t) + 2 \widetilde a(t) \widetilde D(t) + \frac{2\lambda}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} \widetilde a^2(s) \, ds,
\end{aligned}
\tag{2.24}
\]
where $\widetilde a(t) = \max_{0 \le s \le t} a(s)$ and the functions $\widetilde C, \widetilde D$ are defined in (2.23). This immediately implies that
\[
\widetilde a^2(t) \le a^2(0) + \widetilde C(t) + 2 \widetilde a(t) \widetilde D(t) + \frac{2\lambda}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} \widetilde a^2(s) \, ds.
\]
In order to bound $\widetilde a$, we construct a barrier function $e(t) = K \sqrt{E_\alpha(2\lambda t^\alpha)}$, where the constant $K$ is chosen so that
\[
e^2(t) > a^2(0) + \widetilde C(t) + 2 e(t) \widetilde D(t) + \frac{2\lambda}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} e^2(s) \, ds, \quad \forall t \in (0,T).
\]
Indeed, owing to (2.19) we see that
\[
\frac{2\lambda}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} E_\alpha(2\lambda s^\alpha) \, ds = E_\alpha(2\lambda t^\alpha) - E_\alpha(0) = E_\alpha(2\lambda t^\alpha) - 1.
\]
Hence
\[
\begin{aligned}
a^2(0) + \widetilde C(t) + 2 e(t) \widetilde D(t) + \frac{2\lambda}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} e^2(s) \, ds
&= a^2(0) + \widetilde C(t) + 2 K \sqrt{E_\alpha(2\lambda t^\alpha)} \, \widetilde D(t) + K^2 \big( E_\alpha(2\lambda t^\alpha) - 1 \big) \\
&< K^2 E_\alpha(2\lambda t^\alpha) = e^2(t),
\end{aligned}
\]
for every $t \in (0,T)$ provided that
\[
K > \widetilde D(T) \sqrt{E_\alpha(2\lambda T^\alpha)} + \sqrt{a^2(0) + \widetilde C(T) + \widetilde D^2(T) E_\alpha(2\lambda T^\alpha)}. \tag{2.25}
\]
Applying Lemma 2.7 we obtain that
\[
\widetilde a(t) \le e(t) = K \sqrt{E_\alpha(2\lambda t^\alpha)}.
\]
Plugging this back into (2.24) and noticing that this holds for any $K$ satisfying (2.25) we obtain that
\[
\begin{aligned}
\sup_{t \in [0,T]} a^2(t) + \frac{1}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} b(s) \, ds
&\le \left( \widetilde D(T) \sqrt{E_\alpha(2\lambda T^\alpha)} + \sqrt{a^2(0) + \widetilde C(T) + \widetilde D^2(T) E_\alpha(2\lambda T^\alpha)} \right)^2 E_\alpha(2\lambda T^\alpha) \\
&\le \left( 2 \widetilde D(T) E_\alpha(2\lambda T^\alpha) + \sqrt{a^2(0) + \widetilde C(T)} \, \sqrt{E_\alpha(2\lambda T^\alpha)} \right)^2,
\end{aligned}
\]
which is the desired result. $\square$

3. Deconvolutional discretization of the Caputo derivative

To discretize the Caputo fractional derivative, references [26, 28] consider a so-called deconvolutional scheme on uniform time grids and prove some properties of this discretization. In this section, we generalize this deconvolutional scheme to the variable time step setting, and prove properties that will be useful in deriving a posteriori error estimates later, in Section 5.2.

3.1. The discrete Caputo derivative.

Let $\mathcal{P}$ be a partition as in (2.2). To motivate this discretization, let us assume that $w : [0,T] \to \mathcal{H}$ is such that $D_c^\alpha w(t)$ is piecewise constant on the partition $\mathcal{P}$, with $D_c^\alpha w(t) = V_{n(t)}$. Then formally by (2.17), we have

$$ w(t_n) = w(0) + \frac{1}{\Gamma(\alpha)} \int_0^{t_n} (t_n - s)^{\alpha-1} D_c^\alpha w(s)\,\mathrm{d}s = w(0) + \frac{1}{\Gamma(\alpha+1)} \sum_{i=1}^{n} \big( (t_n - t_{i-1})^\alpha - (t_n - t_i)^\alpha \big) V_i, \quad n \in \{1,\dots,N\}. \tag{3.1} $$

Let $K_{\mathcal{P}} \in \mathbb{R}^{N \times N}$ be the matrix induced by the partition $\mathcal{P}$, which is defined as

$$ K_{\mathcal{P},ni} = \begin{cases} \dfrac{1}{\Gamma(\alpha+1)} \big( (t_n - t_{i-1})^\alpha - (t_n - t_i)^\alpha \big), & 1 \le i \le n \le N, \\ 0, & 1 \le n < i \le N. \end{cases} \tag{3.2} $$

Then we can rewrite (3.1) in matrix form as

$$ \mathbf{W} = \mathbf{W}_0 + K_{\mathcal{P}} \mathbf{V}, $$

where $\mathbf{V}, \mathbf{W}, \mathbf{W}_0 \in \mathcal{H}^N$ with $\mathbf{V}_n = V_n$, $\mathbf{W}_n = w(t_n)$, and $(\mathbf{W}_0)_n = w(0)$. Notice that $K_{\mathcal{P}}$ is lower triangular and all the elements on and below the main diagonal are positive. Therefore $K_{\mathcal{P}}$ is invertible and its inverse is also lower triangular. Thus, the previous identity is equivalent to $\mathbf{V} = K_{\mathcal{P}}^{-1}(\mathbf{W} - \mathbf{W}_0)$, in other words

$$ V_n = \sum_{i=1}^{n} K^{-1}_{\mathcal{P},ni} (W_i - W_0) = K^{-1}_{\mathcal{P},n0} W_0 + \sum_{i=1}^{n} K^{-1}_{\mathcal{P},ni} W_i, $$

where we set $K^{-1}_{\mathcal{P},n0} = -\sum_{j=1}^{n} K^{-1}_{\mathcal{P},nj}$. This motivates the following approximation of the Caputo derivative provided $\mathbf{W} \in \mathcal{H}^N$ and $W_0 \in \mathcal{H}$ are given. For $n \in \{1,\dots,N\}$ we set

$$ (D_{\mathcal{P}}^\alpha \mathbf{W})_n = \sum_{i=1}^{n} K^{-1}_{\mathcal{P},ni} (W_i - W_0) = \sum_{i=0}^{n} K^{-1}_{\mathcal{P},ni} W_i = \sum_{i=0}^{n-1} K^{-1}_{\mathcal{P},ni} (W_i - W_n). \tag{3.3} $$

3.2. Properties of $K_{\mathcal{P}}^{-1}$.

We note that, when the partition is uniform, both $K_{\mathcal{P}}$ and its inverse will be Toeplitz matrices, and hence the product $K_{\mathcal{P}} \mathbf{V}$ can be interpreted as the convolution of sequences. Consequently, multiplication by $K_{\mathcal{P}}^{-1}$ is equivalent to taking a sequence deconvolution. This motivates the name of this scheme and enables [28] to apply techniques for the deconvolution of a completely monotone sequence and prove properties of $K_{\mathcal{P}}^{-1}$.

We were not successful in extending, to a general partition $\mathcal{P}$, all the properties of $K_{\mathcal{P}}^{-1}$ presented in [28] for the case when the partition is uniform. This is mainly because their techniques are based on ideas that rely on completely monotone sequences, which do not easily extend to a general $\mathcal{P}$. Nevertheless, we have obtained properties that are sufficient for our purposes. The following result is the counterpart to [28, Proposition 3.2(1)].
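Before turning to that result, the construction of $K_{\mathcal{P}}$ in (3.2) and the discrete derivative (3.3) can be sketched numerically. The following Python fragment is a minimal illustration, not the authors' implementation: it assumes $\mathcal{H} = \mathbb{R}$ and $t_0 = 0$, the function names `caputo_matrix` and `discrete_caputo` are hypothetical, and the graded grid is an arbitrary choice. Instead of forming $K_{\mathcal{P}}^{-1}$ explicitly, it solves the lower triangular system $K_{\mathcal{P}} \mathbf{V} = \mathbf{W} - \mathbf{W}_0$ by forward substitution.

```python
import math


def caputo_matrix(t, alpha):
    """Entries of K_P from (3.2) for the grid t = [t_0, ..., t_N] (0-based storage)."""
    N = len(t) - 1
    c = 1.0 / math.gamma(alpha + 1.0)
    K = [[0.0] * N for _ in range(N)]
    for n in range(1, N + 1):
        for i in range(1, n + 1):
            K[n - 1][i - 1] = c * ((t[n] - t[i - 1]) ** alpha - (t[n] - t[i]) ** alpha)
    return K


def discrete_caputo(t, alpha, W0, W):
    """Discrete Caputo derivative (3.3): solve K_P V = W - W_0 by forward substitution."""
    K = caputo_matrix(t, alpha)
    V = []
    for n in range(len(W)):
        rhs = W[n] - W0 - sum(K[n][i] * V[i] for i in range(n))
        V.append(rhs / K[n][n])
    return V


# Illustrative data (assumptions): a graded grid on [0, 1] and alpha = 0.6.
alpha, N = 0.6, 8
t = [(k / N) ** 2 for k in range(N + 1)]
# w(t) = t^alpha / Gamma(alpha + 1) has Caputo derivative identically 1,
# so (3.1) holds exactly for it on any partition.
W = [tn ** alpha / math.gamma(alpha + 1.0) for tn in t[1:]]
V = discrete_caputo(t, alpha, 0.0, W)
print(V)
```

Since the Caputo derivative of this particular $w$ is constant, hence piecewise constant on every partition, the computed `V` should equal one at each node up to rounding, regardless of how the grid is graded.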

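Proposition 3.1 (properties of $K_{\mathcal{P}}^{-1}$). Let $\mathcal{P}$ be a partition as in (2.2), and $K_{\mathcal{P}}$ be defined in (3.2). The matrix $K_{\mathcal{P}}$ is invertible, and its inverse satisfies:

$$ K^{-1}_{\mathcal{P},n0} = -\sum_{j=1}^{n} K^{-1}_{\mathcal{P},nj} < 0, \qquad n \in \{1,\dots,N\}, \tag{3.4} $$

$$ K^{-1}_{\mathcal{P},ii} > 0, \quad i \in \{1,\dots,N\}, \qquad K^{-1}_{\mathcal{P},ni} < 0, \quad 1 \le i < n \le N. \tag{3.5} $$

Proof.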

We already showed that $K_{\mathcal{P}}$ is nonsingular. We prove (3.4) and (3.5) separately.

First, we prove that $K^{-1}_{\mathcal{P},n0} < 0$. For this, it suffices to show that, for the vector $\mathbf{W} \in \mathbb{R}^N$ with $W_i = 1$ for all $i \ge 1$, the vector $\mathbf{F} = K_{\mathcal{P}}^{-1} \mathbf{W}$ satisfies

$$ F_n > 0 \quad \forall n \ge 1. $$

We prove this by induction on $n$. For $n = 1$, clearly $F_1 = W_1 / K_{\mathcal{P},11} = 1 / K_{\mathcal{P},11} > 0$. Suppose that $F_j > 0$ for $1 \le j \le k$; we want to show that $F_{k+1} > 0$. Since

$$ 1 = W_k = \sum_{j=1}^{k} K_{\mathcal{P},k,j} F_j, \qquad 1 = W_{k+1} = \sum_{j=1}^{k+1} K_{\mathcal{P},k+1,j} F_j, $$

taking the difference we have

$$ 0 = \sum_{j=1}^{k+1} K_{\mathcal{P},k+1,j} F_j - \sum_{j=1}^{k} K_{\mathcal{P},k,j} F_j = K_{\mathcal{P},k+1,k+1} F_{k+1} + \sum_{j=1}^{k} \big( K_{\mathcal{P},k+1,j} - K_{\mathcal{P},k,j} \big) F_j. \tag{3.6} $$

We claim that $K_{\mathcal{P},k+1,j} - K_{\mathcal{P},k,j} < 0$ for any $j \in \{1,\dots,k\}$. In fact, this can be seen through the definition of the entries of $K_{\mathcal{P}}$:

$$ K_{\mathcal{P},k+1,j} - K_{\mathcal{P},k,j} < 0 \iff (t_{k+1} - t_{j-1})^\alpha - (t_{k+1} - t_j)^\alpha < (t_k - t_{j-1})^\alpha - (t_k - t_j)^\alpha \iff \int_0^{t_j - t_{j-1}} (t_{k+1} - t_j + s)^{\alpha-1}\,\mathrm{d}s < \int_0^{t_j - t_{j-1}} (t_k - t_j + s)^{\alpha-1}\,\mathrm{d}s, $$

and the last inequality holds because $\alpha - 1 < 0$ and $t_{k+1} > t_k$. Using $K_{\mathcal{P},k+1,j} - K_{\mathcal{P},k,j} < 0$ and $F_j > 0$ for $j \in \{1,\dots,k\}$ in (3.6), we see that $K_{\mathcal{P},k+1,k+1} F_{k+1} > 0$ and hence $F_{k+1} > 0$. Therefore, by induction, we proved that $K^{-1}_{\mathcal{P},n0} < 0$ for $n \ge 1$.

Next we prove $K^{-1}_{\mathcal{P},ii} > 0$ and $K^{-1}_{\mathcal{P},ni} < 0$ for $n > i$. Consider a vector $\mathbf{W} \in \mathbb{R}^N$ such that $W_i = 1$ and $W_j = 0$ for $j \ne i$. It suffices to prove that, for $\mathbf{F} = K_{\mathcal{P}}^{-1} \mathbf{W}$, we have $F_i > 0$ and, for $n > i$,

$$ F_n < 0. \tag{3.7} $$

Since $K_{\mathcal{P}}^{-1}$ is lower triangular, we know $F_j = 0$ for $j \in \{1,\dots,i-1\}$. From $K_{\mathcal{P}} \mathbf{F} = \mathbf{W}$, we see that

$$ 1 = W_i = (K_{\mathcal{P}} \mathbf{F})_i = \sum_{j=1}^{i} K_{\mathcal{P},ij} F_j = K_{\mathcal{P},ii} F_i $$

and thus $F_i = 1 / K_{\mathcal{P},ii} > 0$. Now we prove by induction that (3.7) holds. First, when $n = i+1$, we have

$$ 0 = W_{i+1} = (K_{\mathcal{P}} \mathbf{F})_{i+1} = K_{\mathcal{P},i+1,i} F_i + K_{\mathcal{P},i+1,i+1} F_{i+1} $$

and hence

$$ F_{i+1} = -\frac{K_{\mathcal{P},i+1,i} F_i}{K_{\mathcal{P},i+1,i+1}} < 0. $$

This shows that (3.7) is true for $n = i+1$. Now suppose that we have already shown that $F_n < 0$ for all $n \in \{i+1,\dots,k\}$; we want to prove $F_{k+1} < 0$. To this aim, notice that

$$ 0 = W_{k+1} = (K_{\mathcal{P}} \mathbf{F})_{k+1} = \sum_{j=i}^{k} K_{\mathcal{P},k+1,j} F_j + K_{\mathcal{P},k+1,k+1} F_{k+1}, $$

therefore we only need to show $\sum_{j=i}^{k} K_{\mathcal{P},k+1,j} F_j > 0$. Recall that

$$ 0 = W_k = (K_{\mathcal{P}} \mathbf{F})_k = \sum_{j=i}^{k} K_{\mathcal{P},k,j} F_j, $$

and thus, since $K_{\mathcal{P},k,i} > 0$, we get

$$ \sum_{j=i}^{k} K_{\mathcal{P},k+1,j} F_j = \sum_{j=i}^{k} K_{\mathcal{P},k+1,j} F_j - \frac{K_{\mathcal{P},k+1,i}}{K_{\mathcal{P},k,i}} \sum_{j=i}^{k} K_{\mathcal{P},k,j} F_j = \sum_{j=i+1}^{k} \left( K_{\mathcal{P},k+1,j} - \frac{K_{\mathcal{P},k+1,i}}{K_{\mathcal{P},k,i}} K_{\mathcal{P},k,j} \right) F_j. $$

Since by the induction hypothesis $F_j < 0$ for $j \in \{i+1,\dots,k\}$, it only remains to show that

$$ K_{\mathcal{P},k+1,j} - \frac{K_{\mathcal{P},k+1,i}}{K_{\mathcal{P},k,i}} K_{\mathcal{P},k,j} < 0 \iff \frac{K_{\mathcal{P},k+1,i}}{K_{\mathcal{P},k,i}} > \frac{K_{\mathcal{P},k+1,j}}{K_{\mathcal{P},k,j}}. $$

Applying Cauchy's mean value theorem, there exists $\eta \in (t_k - t_i, t_k - t_{i-1})$ such that

$$ \frac{K_{\mathcal{P},k+1,i}}{K_{\mathcal{P},k,i}} = \frac{(t_{k+1} - t_{i-1})^\alpha - (t_{k+1} - t_i)^\alpha}{(t_k - t_{i-1})^\alpha - (t_k - t_i)^\alpha} = \frac{\alpha (\eta + \tau_{k+1})^{\alpha-1}}{\alpha \eta^{\alpha-1}} = \left( \frac{\eta + \tau_{k+1}}{\eta} \right)^{\alpha-1}. $$

Similarly, there exists $\xi \in (t_k - t_j, t_k - t_{j-1})$ such that

$$ \frac{K_{\mathcal{P},k+1,j}}{K_{\mathcal{P},k,j}} = \left( \frac{\xi + \tau_{k+1}}{\xi} \right)^{\alpha-1}. $$

Due to $j > i$, we have $\xi < \eta$ and hence

$$ \frac{K_{\mathcal{P},k+1,j}}{K_{\mathcal{P},k,j}} = \left( \frac{\xi + \tau_{k+1}}{\xi} \right)^{\alpha-1} < \left( \frac{\eta + \tau_{k+1}}{\eta} \right)^{\alpha-1} = \frac{K_{\mathcal{P},k+1,i}}{K_{\mathcal{P},k,i}}. $$

Therefore, from the arguments above, we see that $F_{k+1} < 0$, and by induction $K^{-1}_{\mathcal{P},ni} < 0$ for any $n > i$. $\square$

Remark 3.2 (generalization). The discretization of the Caputo derivative, described in (3.3), and its properties presented in Proposition 3.1 can be extended to more general kernels. Indeed, for a general convolutional kernel $g \in L^1(0,T;\mathbb{R})$ the entries of the matrix $K_{\mathcal{P}}$ will be

$$ K_{\mathcal{P},ni} = \int_{t_n - t_i}^{t_n - t_{i-1}} g(t)\,\mathrm{d}t. $$

The proof of (3.4) follows verbatim provided $g'(t) < 0$, as the reader can readily verify. The proof of (3.5) only requires that the function $G(t) = \ln(g(t))$ satisfies $G''(t) > 0$.

For a uniform time grid $\mathcal{P}$, [26, Theorem 2.3] proves that, for every $i$, the sequence $\{-K^{-1}_{\mathcal{P},n+i,i}\}_{n \ge 1}$ is completely monotone. The following result holds for a general partition $\mathcal{P}$, and is a direct consequence of [26, Theorem 2.3] for uniform time stepping.

Proposition 3.3 (monotonicity). Let $\mathcal{P}$ be a partition of $[0,T]$ as in (2.2), and $K_{\mathcal{P}}$ be defined as in (3.2). Then, its inverse satisfies:

1. For $n \in \{1,\dots,N-1\}$,

$$ -\sum_{j=1}^{n} K^{-1}_{\mathcal{P},nj} = K^{-1}_{\mathcal{P},n0} < K^{-1}_{\mathcal{P},n+1,0} = -\sum_{j=1}^{n+1} K^{-1}_{\mathcal{P},n+1,j}. \tag{3.8} $$

2. For $1 \le i < n < N$,

$$ K^{-1}_{\mathcal{P},ni} < K^{-1}_{\mathcal{P},n+1,i}. \tag{3.9} $$

Proof.

Throughout this proof we drop the common factor $1/\Gamma(\alpha+1)$ from the entries of $K_{\mathcal{P}}$ and take $t_0 = 0$; the rescaling multiplies $\mathbf{F}$ by a positive constant and does not affect any of the inequalities below.

To prove (3.8) it suffices to show that, for the vector $\mathbf{W} \in \mathbb{R}^N$ with $W_i = 1$ for all $i \ge 1$, the vector $\mathbf{F} = K_{\mathcal{P}}^{-1} \mathbf{W}$ satisfies

$$ F_n > F_{n+1} \quad \forall n \ge 1. $$

We prove this by induction on $n$. For $n = 1$,

$$ 1 = W_1 = (K_{\mathcal{P}} \mathbf{F})_1 = K_{\mathcal{P},11} F_1, \qquad 1 = W_2 = (K_{\mathcal{P}} \mathbf{F})_2 = K_{\mathcal{P},21} F_1 + K_{\mathcal{P},22} F_2 = (K_{\mathcal{P},21} + K_{\mathcal{P},22}) F_1 + K_{\mathcal{P},22} (F_2 - F_1). $$

Clearly, $F_1 > 0$ and $K_{\mathcal{P},11} = (t_1 - t_0)^\alpha < (t_2 - t_0)^\alpha = K_{\mathcal{P},21} + K_{\mathcal{P},22}$. Hence we have

$$ K_{\mathcal{P},22} (F_2 - F_1) = 1 - (K_{\mathcal{P},21} + K_{\mathcal{P},22}) F_1 < 1 - K_{\mathcal{P},11} F_1 = 0, $$

which, since $K_{\mathcal{P},22} > 0$, implies that $F_2 - F_1 < 0$, i.e. $F_1 > F_2$. So the claim holds for $n = 1$.

Suppose $F_{j+1} < F_j$ for all $1 \le j < k$; we want to show that $F_{k+1} < F_k$ as well. Notice that

$$ 1 = W_k = \sum_{i=1}^{k} K_{\mathcal{P},ki} F_i = \sum_{i=0}^{k-1} \Big( \sum_{j=i+1}^{k} K_{\mathcal{P},kj} \Big) (F_{i+1} - F_i) = \sum_{i=0}^{k-1} (t_k - t_i)^\alpha (F_{i+1} - F_i), $$

$$ 1 = W_{k+1} = \sum_{i=1}^{k+1} K_{\mathcal{P},k+1,i} F_i = \sum_{i=0}^{k} (t_{k+1} - t_i)^\alpha (F_{i+1} - F_i), $$

where we set $F_0 = 0$ in the equations above. Isolating the term $i = k$ in the second identity, which equals $(t_{k+1} - t_k)^\alpha (F_{k+1} - F_k)$, we see that to show $F_{k+1} < F_k$ we only need to prove that

$$ 0 < \sum_{i=0}^{k-1} (t_{k+1} - t_i)^\alpha (F_{i+1} - F_i) - 1 = \sum_{i=0}^{k-1} (t_{k+1} - t_i)^\alpha (F_{i+1} - F_i) - \sum_{i=0}^{k-1} (t_k - t_i)^\alpha (F_{i+1} - F_i) = \sum_{i=0}^{k-1} \big( (t_{k+1} - t_i)^\alpha - (t_k - t_i)^\alpha \big) (F_{i+1} - F_i). \tag{3.10} $$

Since we also have

$$ 1 = W_{k-1} = \sum_{i=1}^{k-1} K_{\mathcal{P},k-1,i} F_i = \sum_{i=0}^{k-2} (t_{k-1} - t_i)^\alpha (F_{i+1} - F_i) = \sum_{i=0}^{k-1} (t_{k-1} - t_i)^\alpha (F_{i+1} - F_i), $$

taking the difference between the equation above and the one for $W_k$, we obtain that

$$ 0 = W_k - W_{k-1} = \sum_{i=0}^{k-1} (t_k - t_i)^\alpha (F_{i+1} - F_i) - \sum_{i=0}^{k-1} (t_{k-1} - t_i)^\alpha (F_{i+1} - F_i) = \sum_{i=0}^{k-1} \big( (t_k - t_i)^\alpha - (t_{k-1} - t_i)^\alpha \big) (F_{i+1} - F_i). $$

In light of this identity, we claim that to obtain (3.10) it suffices to show that

$$ \frac{t_{k+1}^\alpha - t_k^\alpha}{t_k^\alpha - t_{k-1}^\alpha} = \frac{(t_{k+1} - t_0)^\alpha - (t_k - t_0)^\alpha}{(t_k - t_0)^\alpha - (t_{k-1} - t_0)^\alpha} > \frac{(t_{k+1} - t_i)^\alpha - (t_k - t_i)^\alpha}{(t_k - t_i)^\alpha - (t_{k-1} - t_i)^\alpha}, \quad i \in \{1,\dots,k-1\}. \tag{3.11} $$

If this is true, letting $c = (t_{k+1}^\alpha - t_k^\alpha) / (t_k^\alpha - t_{k-1}^\alpha)$ we have:

$$ \sum_{i=0}^{k-1} \big( (t_{k+1} - t_i)^\alpha - (t_k - t_i)^\alpha \big) (F_{i+1} - F_i) = \sum_{i=0}^{k-1} \Big( \big( (t_{k+1} - t_i)^\alpha - (t_k - t_i)^\alpha \big) - c \big( (t_k - t_i)^\alpha - (t_{k-1} - t_i)^\alpha \big) \Big) (F_{i+1} - F_i) = \sum_{i=1}^{k-1} d_i (F_{i+1} - F_i), $$

where the term $i = 0$ vanishes by the choice of $c$, and

$$ d_i = \big( (t_{k+1} - t_i)^\alpha - (t_k - t_i)^\alpha \big) - c \big( (t_k - t_i)^\alpha - (t_{k-1} - t_i)^\alpha \big) < 0 $$

by (3.11). By the induction hypothesis $F_{i+1} - F_i < 0$ for $1 \le i \le k-1$, so the equation above implies (3.10), and hence $F_{k+1} < F_k$ is proved.

To finish the proof, we focus on (3.11), fix $i$ and define $c_1 = t_{k-1} - t_i$, $c_2 = t_k - t_i$, $c_3 = t_{k+1} - t_i$ and the function

$$ h(x) = \frac{(x + c_3)^\alpha - (x + c_2)^\alpha}{(x + c_2)^\alpha - (x + c_1)^\alpha}. $$

Then (3.11) is equivalent to $h(t_i - t_0) > h(0)$, and it remains to show that $h(x)$ is strictly increasing for $x > 0$. We observe that

$$ \frac{\mathrm{d}}{\mathrm{d}x} \ln(h(x)) = \alpha \left[ \frac{(x + c_3)^{\alpha-1} - (x + c_2)^{\alpha-1}}{(x + c_3)^\alpha - (x + c_2)^\alpha} - \frac{(x + c_2)^{\alpha-1} - (x + c_1)^{\alpha-1}}{(x + c_2)^\alpha - (x + c_1)^\alpha} \right]. $$

Applying Cauchy's mean-value theorem to the two fractions above, we know there exist $\eta \in (x + c_2, x + c_3)$ and $\xi \in (x + c_1, x + c_2)$ such that

$$ \frac{\mathrm{d}}{\mathrm{d}x} \ln(h(x)) = \alpha \left[ \frac{(\alpha-1) \eta^{\alpha-2}}{\alpha \eta^{\alpha-1}} - \frac{(\alpha-1) \xi^{\alpha-2}}{\alpha \xi^{\alpha-1}} \right] = (\alpha - 1) \big( \eta^{-1} - \xi^{-1} \big) > 0, $$

where the last inequality holds because $\alpha < 1$ and $\xi < x + c_2 < \eta$. This shows the monotonicity of the function $h$ and confirms (3.11). This concludes the inductive step and proves (3.8).

The proof of (3.9) is obtained similarly. For convenience we only write the proof for $i = 1$, but the extension to general $i$ is straightforward. Consider a vector $\mathbf{W} \in \mathbb{R}^N$ such that $W_j = 1$ if $j = 1$ and $W_j = 0$ if $j \ne 1$; then it suffices to prove that the vector $\mathbf{F} = K_{\mathcal{P}}^{-1} \mathbf{W}$ satisfies

$$ F_n < F_{n+1} \quad \text{for } n \in \{2,\dots,N-1\}. \tag{3.12} $$

We prove (3.12) by induction on $n$. For $n = 2$, recalling from the proof of (3.8) that, with $F_0 = 0$,

$$ W_k = \sum_{j=0}^{k-1} (t_k - t_j)^\alpha (F_{j+1} - F_j), $$

we have

$$ 1 = W_1 = (t_1 - t_0)^\alpha (F_1 - F_0), $$
$$ 0 = W_2 = (t_2 - t_0)^\alpha (F_1 - F_0) + (t_2 - t_1)^\alpha (F_2 - F_1), $$
$$ 0 = W_3 = (t_3 - t_0)^\alpha (F_1 - F_0) + (t_3 - t_1)^\alpha (F_2 - F_1) + (t_3 - t_2)^\alpha (F_3 - F_2). $$

From the first and second equations above, we see that $F_1 > 0$ and $F_2 - F_1 < 0$. Combining the second and the third equations we deduce that

$$ 0 = W_3 - \frac{t_3^\alpha}{t_2^\alpha} W_2 = \left[ (t_3 - t_1)^\alpha - (t_2 - t_1)^\alpha \frac{t_3^\alpha}{t_2^\alpha} \right] (F_2 - F_1) + (t_3 - t_2)^\alpha (F_3 - F_2). $$

Since $(t_3 - t_1)^\alpha - (t_2 - t_1)^\alpha (t_3/t_2)^\alpha = (t_3 - t_1)^\alpha - (t_3 - t_1 t_3 / t_2)^\alpha > 0$, we obtain that $F_3 - F_2 > 0$, which is (3.12) for $n = 2$.

It remains to prove that when (3.12) holds for $n \in \{2,\dots,k-1\}$, then it also holds for $n = k$, i.e. $F_k < F_{k+1}$, provided that $k < N$. To this aim, we first see that

$$ 0 = W_{k+1} - \frac{t_{k+1}^\alpha}{t_k^\alpha} W_k = \sum_{j=1}^{k} \left( (t_{k+1} - t_j)^\alpha - (t_k - t_j)^\alpha \frac{t_{k+1}^\alpha}{t_k^\alpha} \right) (F_{j+1} - F_j). \tag{3.13} $$

Therefore, in order to prove $F_k < F_{k+1}$, we only need to show that

$$ \sum_{j=1}^{k-1} \left( (t_{k+1} - t_j)^\alpha - (t_k - t_j)^\alpha \frac{t_{k+1}^\alpha}{t_k^\alpha} \right) (F_{j+1} - F_j) < 0. \tag{3.14} $$

Similar to (3.13) we also have

$$ 0 = W_k - \frac{t_k^\alpha}{t_{k-1}^\alpha} W_{k-1} = \sum_{j=1}^{k-1} \left( (t_k - t_j)^\alpha - (t_{k-1} - t_j)^\alpha \frac{t_k^\alpha}{t_{k-1}^\alpha} \right) (F_{j+1} - F_j). $$

Thanks to the inductive hypothesis, we know that $F_{j+1} - F_j < 0$ for $j = 1$ and $F_{j+1} - F_j > 0$ for $j \in \{2,\dots,k-1\}$. Therefore, using an argument similar to the one in the proof of (3.8), to prove (3.14) we only need to show

$$ \frac{(t_{k+1} - t_1)^\alpha - (t_k - t_1)^\alpha (t_{k+1}/t_k)^\alpha}{(t_k - t_1)^\alpha - (t_{k-1} - t_1)^\alpha (t_k/t_{k-1})^\alpha} > \frac{(t_{k+1} - t_j)^\alpha - (t_k - t_j)^\alpha (t_{k+1}/t_k)^\alpha}{(t_k - t_j)^\alpha - (t_{k-1} - t_j)^\alpha (t_k/t_{k-1})^\alpha}, \quad j \in \{2,\dots,k-1\}, \tag{3.15} $$

which is similar to (3.11). We rewrite the inequality above as

$$ \frac{(1 - t_1/t_{k+1})^\alpha - (1 - t_1/t_k)^\alpha}{(1 - t_1/t_k)^\alpha - (1 - t_1/t_{k-1})^\alpha} > \frac{(1 - t_j/t_{k+1})^\alpha - (1 - t_j/t_k)^\alpha}{(1 - t_j/t_k)^\alpha - (1 - t_j/t_{k-1})^\alpha}, \quad j \in \{2,\dots,k-1\}, $$

and define the function

$$ h(x) = \frac{(1 - x/t_{k+1})^\alpha - (1 - x/t_k)^\alpha}{(1 - x/t_k)^\alpha - (1 - x/t_{k-1})^\alpha}; $$

then it suffices to show that $h'(x) < 0$ for $0 < x < t_{k-1}$. Observe that

$$ \frac{\mathrm{d}}{\mathrm{d}x} \ln(h(x)) = -\frac{\alpha}{x} \left[ \frac{(x/t_{k+1})(1 - x/t_{k+1})^{\alpha-1} - (x/t_k)(1 - x/t_k)^{\alpha-1}}{(1 - x/t_{k+1})^\alpha - (1 - x/t_k)^\alpha} - \frac{(x/t_k)(1 - x/t_k)^{\alpha-1} - (x/t_{k-1})(1 - x/t_{k-1})^{\alpha-1}}{(1 - x/t_k)^\alpha - (1 - x/t_{k-1})^\alpha} \right]. $$

Letting $h_1(x) = (1 - x) x^{\alpha-1}$ and $h_2(x) = x^\alpha$, by Cauchy's mean-value theorem there exist $\eta \in (1 - x/t_k, 1 - x/t_{k+1})$ and $\xi \in (1 - x/t_{k-1}, 1 - x/t_k)$ such that

$$ \frac{\mathrm{d}}{\mathrm{d}x} \ln(h(x)) = -\frac{\alpha}{x} \left( \frac{h_1'(\eta)}{h_2'(\eta)} - \frac{h_1'(\xi)}{h_2'(\xi)} \right) = -\frac{\alpha}{x} \left( \Big( \frac{\alpha-1}{\alpha \eta} - 1 \Big) - \Big( \frac{\alpha-1}{\alpha \xi} - 1 \Big) \right) < 0, $$

because $0 < \xi < \eta$. This implies that $h'(x) < 0$ for $0 < x < t_{k-1}$ and finishes the inductive step. Hence (3.9) is proved. $\square$

Remark 3.4 (generalization). Notice that, for a general kernel $g$, property (3.8) remains valid provided $G(t) = \ln(g(t))$ satisfies $G''(t) > 0$.

Figure 1 (three panels, one for each value of $\alpha$).

Given a partition $\mathcal{P}$, the figure shows the nonlocal basis functions $\{\varphi_{\mathcal{P},i}\}_{i=0}^{N}$ for different values of $\alpha$. Every function whose Caputo derivative is piecewise constant can be written as a linear combination of these functions. Notice that, at any partition point, $\varphi_{\mathcal{P},i}(t_j) = \delta_{ij}$. In addition, Proposition 3.5 shows that these functions form a partition of unity.

3.3. A continuous interpolant.

Given a partition $\mathcal{P}$, a sequence $\mathbf{W} \in \mathcal{H}^N$, and $W_0 \in \mathcal{H}$, we defined the discrete Caputo derivative $(D_{\mathcal{P}}^\alpha \mathbf{W})_n$ via (3.3). Motivated by the Volterra type equation (2.17) between a continuous function $w$ and its Caputo derivative $D_c^\alpha w$, it is possible, following [28], to define, over $\mathcal{P}$, a natural continuous interpolant of $W_n$ by

$$ \widehat{W}_{\mathcal{P}}(t) = W_0 + \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} V_{\mathcal{P}}(s)\,\mathrm{d}s, \tag{3.16} $$

where $V_{\mathcal{P}}$ is defined by

$$ V_{\mathcal{P}}(t) = (D_{\mathcal{P}}^\alpha \mathbf{W})_{n(t)}. \tag{3.17} $$

By definition, we have that $\widehat{W}_{\mathcal{P}}(t_n) = W_n$. Moreover, writing $n = n(t)$,

$$ \widehat{W}_{\mathcal{P}}(t) = W_0 + \frac{1}{\Gamma(\alpha+1)} \left[ \sum_{j=1}^{n-1} \big( (t - t_{j-1})^\alpha - (t - t_j)^\alpha \big) (D_{\mathcal{P}}^\alpha \mathbf{W})_j + (t - t_{n-1})^\alpha (D_{\mathcal{P}}^\alpha \mathbf{W})_n \right] = \sum_{i=0}^{n(t)} W_i \varphi_{\mathcal{P},i}(t), \tag{3.18} $$

where we defined

$$ \varphi_{\mathcal{P},0}(t) = 1 + \frac{1}{\Gamma(\alpha+1)} \left[ \sum_{j=1}^{n(t)-1} \big( (t - t_{j-1})^\alpha - (t - t_j)^\alpha \big) K^{-1}_{\mathcal{P},j0} + (t - t_{n-1})^\alpha K^{-1}_{\mathcal{P},n0} \right], $$
$$ \varphi_{\mathcal{P},i}(t) = \frac{1}{\Gamma(\alpha+1)} \left[ \sum_{j=i}^{n(t)-1} \big( (t - t_{j-1})^\alpha - (t - t_j)^\alpha \big) K^{-1}_{\mathcal{P},ji} + (t - t_{n-1})^\alpha K^{-1}_{\mathcal{P},ni} \right], \quad i \in \{1,\dots,N\}. \tag{3.19} $$

Here the last term in each bracket is the contribution of the interval $(t_{n-1}, t]$, on which $V_{\mathcal{P}}$ equals $(D_{\mathcal{P}}^\alpha \mathbf{W})_n$.

The functions $\{\varphi_{\mathcal{P},i}\}_{i=0}^{N}$ play the role, in this context, of the standard "hat" basis functions used for piecewise linear interpolation over a partition $\mathcal{P}$. Indeed, they are such that any function with piecewise constant (Caputo) derivative can be written as a linear combination of them. Figure 1 illustrates the behavior of these functions. As expected, and in contrast to the hat basis functions, these functions are nonlocal, in the sense that they have global support. It is also worth noticing that the figure seems to indicate that, as $\alpha \downarrow 0$, the functions resemble piecewise constants and, in contrast, when $\alpha \uparrow 1$ they resemble the standard hat functions. Moreover, for any $t \in [0,T]$ we have $\sum_{i=0}^{n(t)} \varphi_{\mathcal{P},i}(t) = 1$. The following result shows that $\varphi_{\mathcal{P},i}(t) \ge 0$. Thus, for any $t \in [0,T]$, $\widehat{W}_{\mathcal{P}}(t)$ is a convex combination of its nodal values $\{W_j\}_{j=0}^{N}$. This observation will be crucial to derive an a posteriori error estimate in Section 5.2.

Proposition 3.5 (positivity). Let $\mathcal{P}$ be a partition defined as in (2.2). Let the functions $\{\varphi_{\mathcal{P},i}\}_{i=0}^{N}$ be defined as in (3.19). Then, for any $i \in \{0,\dots,N\}$ and $t \in [0,T]$, we have $\varphi_{\mathcal{P},i}(t) \ge 0$. In addition, for $t \notin \mathcal{P}$ and $i \in \{0,\dots,n(t)\}$ we have $\varphi_{\mathcal{P},i}(t) > 0$.

Proof. By definition, for $t = t_n$, we have $\varphi_{\mathcal{P},n}(t_n) = 1$ and $\varphi_{\mathcal{P},i}(t_n) = 0$ for any $i \ne n$. Also, for $i > n(t)$, we see that $\varphi_{\mathcal{P},i}(t) = 0$, and hence it only remains to show that $\varphi_{\mathcal{P},i}(t) > 0$ for $t \notin \mathcal{P}$ and $i \le n(t)$. To show this, consider $W_i = 1$ and $W_j = 0$ for $j \ne i$, the piecewise constant $V_{\mathcal{P}}$ and its interpolation $\widehat{W}_{\mathcal{P}}$ defined in (3.16) and (3.17). Then our goal is to show that $\widehat{W}_{\mathcal{P}}(t) > 0$.

If $i = n(t) > 0$, then it is easy to check by definition that $(D_{\mathcal{P}}^\alpha \mathbf{W})_n > 0$ and $(D_{\mathcal{P}}^\alpha \mathbf{W})_j = 0$ for $j \in \{1,\dots,i-1\}$. Therefore we obtain

$$ \widehat{W}_{\mathcal{P}}(t) = \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} V_{\mathcal{P}}(s)\,\mathrm{d}s = \frac{(t - t_{n-1})^\alpha}{\Gamma(\alpha+1)} (D_{\mathcal{P}}^\alpha \mathbf{W})_n > 0. $$

If $i < n(t)$, the proof is not that straightforward. The trick is to insert the time $t$, which is not on the partition $\mathcal{P}$, to get a new partition $\mathcal{P}' = \mathcal{P} \cup \{t\}$ and then apply Propositions 3.1 and 3.3 in an appropriate way. Let us now work out the details. Let $\mathcal{P}' = \{t'_k\}_{k=0}^{N+1}$ and notice that

$$ t'_{n(t)} = t, \qquad t'_{n(t)+1} = t_{n(t)}. $$

On the basis of this partition we define the vector $\mathbf{W}' \in \mathcal{H}^{N+1}$ via $W'_j = \widehat{W}_{\mathcal{P}}(t'_j)$. Then, since $V_{\mathcal{P}}$ is constant on $(t'_{n(t)-1}, t'_{n(t)+1}] = (t_{n(t)-1}, t_{n(t)}]$, we have

$$ (D_{\mathcal{P}'}^\alpha \mathbf{W}')_{n(t)} = (D_{\mathcal{P}'}^\alpha \mathbf{W}')_{n(t)+1}. $$

Since the only possible nonzero components of $\mathbf{W}'$ are $W'_i = W_i = 1$ and $W'_{n(t)} = \widehat{W}_{\mathcal{P}}(t)$, we deduce from the equality above that

$$ K^{-1}_{\mathcal{P}',n(t),i} W'_i + K^{-1}_{\mathcal{P}',n(t),n(t)} W'_{n(t)} = (D_{\mathcal{P}'}^\alpha \mathbf{W}')_{n(t)} = (D_{\mathcal{P}'}^\alpha \mathbf{W}')_{n(t)+1} = K^{-1}_{\mathcal{P}',n(t)+1,i} W'_i + K^{-1}_{\mathcal{P}',n(t)+1,n(t)} W'_{n(t)}, $$

which can be rearranged as

$$ K^{-1}_{\mathcal{P}',n(t)+1,i} - K^{-1}_{\mathcal{P}',n(t),i} = \widehat{W}_{\mathcal{P}}(t) \Big( K^{-1}_{\mathcal{P}',n(t),n(t)} - K^{-1}_{\mathcal{P}',n(t)+1,n(t)} \Big). $$

From Proposition 3.3 we see that $K^{-1}_{\mathcal{P}',n(t)+1,i} - K^{-1}_{\mathcal{P}',n(t),i} > 0$, and Proposition 3.1 gives $K^{-1}_{\mathcal{P}',n(t),n(t)} - K^{-1}_{\mathcal{P}',n(t)+1,n(t)} > 0$ since $K^{-1}_{\mathcal{P}',n(t),n(t)} > 0$ and $K^{-1}_{\mathcal{P}',n(t)+1,n(t)} < 0$. This leads to the fact that $\widehat{W}_{\mathcal{P}}(t) > 0$. $\square$

4. Time fractional gradient flow: Theory

We have now set the stage for the study of time fractional gradient flows, which were formally described in (1.2). Throughout the remainder of our discussion we shall assume that the initial condition satisfies $u_0 \in D(\Phi)$ and that $f \in L^2_\alpha(0,T;\mathcal{H})$. We begin by commenting that the case $f = 0$ was already studied in [28, Section 5], where the authors studied so-called strong solutions; see [28, Definition 5.4]. Here we trivially extend their definition to the case $f \ne 0$.

Definition 4.1 (strong solution). A function $u \in L^1_{loc}([0,T);\mathcal{H})$ is a strong solution to (1.2) if
(i) (Initial condition) $\lim_{t \downarrow 0} \frac{1}{t} \int_0^t \| u(s) - u_0 \|\,\mathrm{d}s = 0$.
(ii) (Regularity) $D_c^\alpha u \in L^1_{loc}([0,T);\mathcal{H})$.
(iii) (Evolution) For almost every $t \in [0,T)$, we have $f(t) - D_c^\alpha u(t) \in \partial\Phi(u(t))$.

4.1. Energy solutions.

Since $\mathcal{H}$ is a Hilbert space, we will mimic the theory for classical gradient flows and introduce the notion of energy solutions for (1.2). To motivate it, suppose that at some $t \in (0,T)$

$$ f(t) - D_c^\alpha u(t) \in \partial\Phi(u(t)); $$

then, by definition of the subdifferential, this is equivalent to the evolution variational inequality (EVI)

$$ \langle D_c^\alpha u(t), u(t) - w \rangle + \Phi(u(t)) - \Phi(w) \le \langle f(t), u(t) - w \rangle, \quad \forall w \in \mathcal{H}. \tag{4.1} $$

Definition 4.2 (energy solution). The function $u \in L^2(0,T;\mathcal{H})$ is an energy solution to (1.2) if
(i) (Initial condition) $\lim_{t \downarrow 0} \frac{1}{t} \int_0^t \| u(s) - u_0 \|^2\,\mathrm{d}s = 0$.
(ii) (Regularity) $D_c^\alpha u \in L^2(0,T;\mathcal{H})$.
(iii) (EVI) For any $w \in L^2(0,T;\mathcal{H})$

$$ \int_0^T \big[ \langle D_c^\alpha u(t), u(t) - w(t) \rangle + \Phi(u(t)) - \Phi(w(t)) \big]\,\mathrm{d}t \le \int_0^T \langle f(t), u(t) - w(t) \rangle\,\mathrm{d}t. \tag{4.2} $$

Notice that, provided $u_0 \in D(\Phi)$, we can set $w(t) = u_0$ in (4.2) and obtain that $\int_0^T \Phi(u(t))\,\mathrm{d}t < \infty$, which motivates the name for this notion of solution. In addition, as the following result shows, any energy solution is a strong solution.

Proposition 4.3 (energy vs. strong). An energy solution of (1.2) is also a strong solution.

Proof. Evidently, it suffices to prove that $f(t) - D_c^\alpha u(t) \in \partial\Phi(u(t))$ for almost every $t \in (0,T)$. Let $w \in \mathcal{H}$, $t_0 \in (0,T)$, and choose $h > 0$ sufficiently small so that $(t_0 - h, t_0 + h) \subset (0,T)$. Define

$$ \tilde{w}(t) = u(t) - \chi_{(t_0 - h, t_0 + h)}(t) \big( u(t) - w \big) \in L^2(0,T;\mathcal{H}), $$

where by $\chi_S$ we denote the characteristic function of the set $S$. Using $\tilde{w}$ as test function in (4.2) and dividing by $2h$ gives

$$ \frac{1}{2h} \int_{t_0 - h}^{t_0 + h} \langle D_c^\alpha u(t) - f(t), u(t) - w \rangle\,\mathrm{d}t + \frac{1}{2h} \int_{t_0 - h}^{t_0 + h} \big( \Phi(u(t)) - \Phi(w) \big)\,\mathrm{d}t \le 0. $$

The assumptions of an energy solution guarantee that all terms inside the integrals belong to $L^1(0,T;\mathbb{R})$, so that for almost every $t_0$ we have, as $h \downarrow 0$, that

$$ \langle D_c^\alpha u(t_0) - f(t_0), u(t_0) - w \rangle + \Phi(u(t_0)) - \Phi(w) \le 0, $$

which is (4.1) and, as we intended to show, is equivalent to the claim. $\square$

Remark 4.4 (coercivity). By introducing the coercivity modulus of Definition 2.1 one realizes that an energy solution $u$ satisfies, instead of (4.1) and (4.2), the stronger inequalities

$$ \langle D_c^\alpha u(t), u(t) - w \rangle + \Phi(u(t)) - \Phi(w) + \sigma(u(t); w) \le \langle f(t), u(t) - w \rangle, \quad \forall w \in \mathcal{H}, \tag{4.3} $$

and, for any $w \in L^2(0,T;\mathcal{H})$,

$$ \int_0^T \big[ \langle D_c^\alpha u(t), u(t) - w(t) \rangle + \Phi(u(t)) - \Phi(w(t)) + \sigma(u(t); w(t)) \big]\,\mathrm{d}t \le \int_0^T \langle f(t), u(t) - w(t) \rangle\,\mathrm{d}t. \tag{4.4} $$

4.2. Existence and uniqueness.

In this section we prove existence and uniqueness of energy solutions to (1.2), in the sense of Definition 4.2. The main result reads as follows.

Theorem 4.5 (well posedness). Assume that the energy $\Phi$ is convex, l.s.c., and with nonempty effective domain. Let $u_0 \in D(\Phi)$ and $f \in L^2_\alpha(0,T;\mathcal{H})$. In this setting, the fractional gradient flow problem (1.2) has a unique energy solution $u$, in the sense of Definition 4.2. For almost every $t \in (0,T)$, the solution $u$ satisfies $f(t) - D_c^\alpha u(t) \in \partial\Phi(u(t))$, and for any $t \in [0,T]$ we have

$$ u(t) = u_0 + \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} D_c^\alpha u(s)\,\mathrm{d}s. \tag{4.5} $$

In addition, $u \in C^{0,\alpha/2}([0,T];\mathcal{H})$ with modulus of continuity

$$ \| u(t_1) - u(t_2) \| \le C |t_1 - t_2|^{\alpha/2} \Big( \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})} + \Phi(u_0) - \Phi_{\inf} \Big)^{1/2}, \quad \forall t_1, t_2 \in [0,T], \tag{4.6} $$

where the constant $C$ depends only on $\alpha$.

We point out that our assumptions are weaker than those in [28, Theorem 5.10]. First, we allow for a nonzero right hand side. In addition, we do not require [28, Assumption 5.9], which is a sort of weak-strong continuity of subdifferentials.

The remainder of this section will be dedicated to the proof of Theorem 4.5. To accomplish this, we follow a similar approach to [28, Section 5]. To show existence of solutions, we consider a sort of fractional minimizing movements scheme. We introduce a partition $\mathcal{P}$ with maximal time step $\tau$ and compute the sequence $\mathbf{U} = \{U_n\}_{n=0}^{N} \subset \mathcal{H}$ as follows. Assume $U_0 \in D(\Phi)$ is given; the $n$-th iterate, for $n \in \{1,\dots,N\}$, is defined recursively via

$$ F_n - (D_{\mathcal{P}}^\alpha \mathbf{U})_n \in \partial\Phi(U_n), \tag{4.7} $$

where

$$ F_n = \frac{1}{t_n - t_{n-1}} \int_{t_{n-1}}^{t_n} f(t)\,\mathrm{d}t. \tag{4.8} $$

We will usually choose $U_0 = u_0$, but other choices of $U_0 \in D(\Phi)$ are also allowed.

From the approximation scheme (4.7) and the expression of the discrete Caputo derivative $(D_{\mathcal{P}}^\alpha \mathbf{U})_n$ given in (3.3), it is clear that

$$ U_n = \arg\min_{w \in \mathcal{H}} \left( \Phi(w) - \langle F_n, w \rangle - \frac{1}{2} \sum_{i=0}^{n-1} K^{-1}_{\mathcal{P},ni} \| w - U_i \|^2 \right). \tag{4.9} $$

Thanks to Proposition 3.1, for $i = 0,\dots,n-1$, we have that $K^{-1}_{\mathcal{P},ni} < 0$, so that the functional in (4.9) is strictly convex and coercive, and hence $U_n$ is well-defined.

Now, in order to define a continuous in time function from $\mathbf{U}$, we use the interpolation introduced in (3.16). Let $V_{\mathcal{P}}(t) = (D_{\mathcal{P}}^\alpha \mathbf{U})_{n(t)}$. Then we have

$$ \widehat{U}_{\mathcal{P}}(t) = U_0 + \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} V_{\mathcal{P}}(s)\,\mathrm{d}s. \tag{4.10} $$

Recall that $\overline{F}_{\mathcal{P}}$ can be defined from $\{F_n\}_{n=1}^{N}$ using (2.3) and that Lemma 2.4 showed that $\overline{F}_{\mathcal{P}} \in L^2_\alpha(0,T;\mathcal{H})$ with a norm bounded independently of $\mathcal{P}$. We now obtain some suitable bounds for $\widehat{U}_{\mathcal{P}}$ and $V_{\mathcal{P}}$.

Lemma 4.6 (a priori bounds). Let $\mathcal{P}$ be any partition. The functions $\widehat{U}_{\mathcal{P}}$ and $V_{\mathcal{P}}$ satisfy

$$ \sup_{t \in [0,T]} \Phi(\widehat{U}_{\mathcal{P}}(t)) \le \Phi(U_0) + \frac{1}{4\Gamma(\alpha)} \| \overline{F}_{\mathcal{P}} \|^2_{L^2_\alpha(0,T;\mathcal{H})} \le \Phi(U_0) + C \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})}, $$
$$ \| V_{\mathcal{P}} \|^2_{L^2_\alpha(0,T;\mathcal{H})} = \sup_{t \in [0,T]} \int_0^t (t - s)^{\alpha-1} \| V_{\mathcal{P}}(s) \|^2\,\mathrm{d}s \le C \Big( \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})} + \Phi(U_0) - \Phi_{\inf} \Big), \tag{4.11} $$

where the constant $C$ only depends on $\alpha$.

Proof. Since $F_n - (D_{\mathcal{P}}^\alpha \mathbf{U})_n \in \partial\Phi(U_n)$, one has

$$ \Phi(U_n) - \Phi(U_i) \le \langle F_n - (D_{\mathcal{P}}^\alpha \mathbf{U})_n, U_n - U_i \rangle. $$

Therefore, noticing that $K^{-1}_{\mathcal{P},ni} < 0$ for $i \in \{0,\dots,n-1\}$, we get

$$ (D_{\mathcal{P}}^\alpha \Phi(\mathbf{U}))_n = -\sum_{i=0}^{n-1} K^{-1}_{\mathcal{P},ni} \big( \Phi(U_n) - \Phi(U_i) \big) \le -\sum_{i=0}^{n-1} K^{-1}_{\mathcal{P},ni} \langle F_n - (D_{\mathcal{P}}^\alpha \mathbf{U})_n, U_n - U_i \rangle = \langle F_n - (D_{\mathcal{P}}^\alpha \mathbf{U})_n, (D_{\mathcal{P}}^\alpha \mathbf{U})_n \rangle, \tag{4.12} $$

where we denoted $\Phi(\mathbf{U}) = \{\Phi(U_n)\}_{n=0}^{N}$.

We can now proceed to obtain the claimed estimates. To prove the first one, we use that

$$ (D_{\mathcal{P}}^\alpha \Phi(\mathbf{U}))_n \le \langle F_n - (D_{\mathcal{P}}^\alpha \mathbf{U})_n, (D_{\mathcal{P}}^\alpha \mathbf{U})_n \rangle \le \frac{1}{4} \| F_n \|^2 $$

to obtain that for any $n$,

$$ \Phi(U_n) = \Phi(U_0) + \sum_{i=1}^{n} K_{\mathcal{P},ni} (D_{\mathcal{P}}^\alpha \Phi(\mathbf{U}))_i \le \Phi(U_0) + \frac{1}{4} \sum_{i=1}^{n} K_{\mathcal{P},ni} \| F_i \|^2 = \Phi(U_0) + \frac{1}{4\Gamma(\alpha)} \int_0^{t_n} (t_n - s)^{\alpha-1} \| \overline{F}_{\mathcal{P}}(s) \|^2\,\mathrm{d}s \le \Phi(U_0) + C \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})}, $$

where the constant $C$ depends only on $\alpha$. Now, since Proposition 3.5 has shown that $\widehat{U}_{\mathcal{P}}$ is a convex combination of the values $U_n$, we have

$$ \Phi(\widehat{U}_{\mathcal{P}}(t)) = \Phi\Big( \sum_{i=0}^{N} \varphi_{\mathcal{P},i}(t) U_i \Big) \le \sum_{i=0}^{N} \varphi_{\mathcal{P},i}(t) \Phi(U_i) \le \max_n \Phi(U_n) \le \Phi(U_0) + C \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})}, $$

which finishes the proof of the first claim.

We now proceed to prove the second claim. Using (4.12) we get

$$ \Phi_{\inf} \le \Phi(\widehat{U}_{\mathcal{P}}(t)) \le \Phi(U_0) + \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} \langle \overline{F}_{\mathcal{P}}(s) - V_{\mathcal{P}}(s), V_{\mathcal{P}}(s) \rangle\,\mathrm{d}s $$
$$ \le \Phi(U_0) + \frac{1}{\Gamma(\alpha)} \Big( \int_0^t (t - s)^{\alpha-1} \| \overline{F}_{\mathcal{P}}(s) \|^2\,\mathrm{d}s \Big)^{1/2} \Big( \int_0^t (t - s)^{\alpha-1} \| V_{\mathcal{P}}(s) \|^2\,\mathrm{d}s \Big)^{1/2} - \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} \| V_{\mathcal{P}}(s) \|^2\,\mathrm{d}s, $$

for any $t \in [0,T]$. This implies that

$$ \int_0^t (t - s)^{\alpha-1} \| V_{\mathcal{P}}(s) \|^2\,\mathrm{d}s \le \| \overline{F}_{\mathcal{P}} \|^2_{L^2_\alpha(0,T;\mathcal{H})} + 2\Gamma(\alpha) \big( \Phi(U_0) - \Phi_{\inf} \big), $$

which, using Lemma 2.4, implies the result. $\square$
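To make the scheme (4.7) and the bounds of Lemma 4.6 concrete, the following sketch runs the fractional minimizing movements scheme in the simplest possible setting. Everything specific here is an assumption made for illustration: $\mathcal{H} = \mathbb{R}$, the quadratic energy $\Phi(w) = \frac{\lambda}{2} w^2$ (so that $\partial\Phi(w) = \{\lambda w\}$ and $\Phi_{\inf} = 0$), $f = 0$, the graded grid, and the hypothetical function names. In this case (4.7) reads $F_n - V_n = \lambda U_n$ with $V_n = (D_{\mathcal{P}}^\alpha \mathbf{U})_n$, and combining it with $\mathbf{U} = \mathbf{U}_0 + K_{\mathcal{P}} \mathbf{V}$ gives an explicit formula for $U_n$.

```python
import math


def caputo_matrix(t, alpha):
    """Entries of K_P from (3.2) for the grid t = [t_0, ..., t_N] (0-based storage)."""
    N = len(t) - 1
    c = 1.0 / math.gamma(alpha + 1.0)
    return [[c * ((t[n] - t[i - 1]) ** alpha - (t[n] - t[i]) ** alpha) if i <= n else 0.0
             for i in range(1, N + 1)] for n in range(1, N + 1)]


def minimizing_movements(t, alpha, lam, U0, F):
    """Scheme (4.7) for Phi(w) = lam * w**2 / 2 on the real line.

    Uses U_n = U_0 + sum_{i<=n} K_ni V_i together with V_n = F_n - lam * U_n.
    """
    K = caputo_matrix(t, alpha)
    U, V = [U0], []
    for n in range(len(t) - 1):
        history = U0 + sum(K[n][i] * V[i] for i in range(n))
        Un = (history + K[n][n] * F[n]) / (1.0 + lam * K[n][n])
        U.append(Un)
        V.append(F[n] - lam * Un)
    return U, V


def discrete_caputo(K, W0, W):
    """Discrete Caputo derivative (3.3), by forward substitution on K V = W - W_0."""
    V = []
    for n in range(len(W)):
        V.append((W[n] - W0 - sum(K[n][i] * V[i] for i in range(n))) / K[n][n])
    return V


# Illustrative data (assumptions): graded grid on [0, 1], alpha = 0.5, f = 0.
alpha, lam, N = 0.5, 1.0, 16
t = [(k / N) ** 1.5 for k in range(N + 1)]
U, V = minimizing_movements(t, alpha, lam, 1.0, [0.0] * N)
energy = [0.5 * lam * u * u for u in U]
# Left-hand side of (4.12): discrete Caputo derivative of the energies.
dE = discrete_caputo(caputo_matrix(t, alpha), energy[0], energy[1:])
print(max(energy) <= energy[0])
```

With $f = 0$ the first bound in (4.11) says that the energy never exceeds $\Phi(U_0)$, and (4.12) reduces to $(D_{\mathcal{P}}^\alpha \Phi(\mathbf{U}))_n \le -V_n^2$; both can be read off from `energy`, `dE` and `V`.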

Remark 4.7 (the function $\widehat{\Phi}$). Notice that, during the course of the proof of the first estimate in (4.11) we also showed that, if we define $\widehat{\Phi}_{\mathcal{P}}(t) = \sum_{i=0}^{N} \varphi_{\mathcal{P},i}(t) \Phi(U_i)$, then $\widehat{\Phi}_{\mathcal{P}}$ is the interpolation of $\Phi(\mathbf{U})$ with piecewise constant Caputo derivative. Moreover,

$$ D_c^\alpha \widehat{\Phi}_{\mathcal{P}}(t) \le \frac{1}{4} \big\| \overline{F}_{\mathcal{P}}(t) \big\|^2. $$

These estimates immediately yield a modulus of continuity estimate on the interpolant $\widehat{U}_{\mathcal{P}}$ which is independent of the partition $\mathcal{P}$.

Lemma 4.8 (Hölder continuity). Let $\mathcal{P}$ be any partition and $\mathbf{U} \in \mathcal{H}^N$ be the solution to (4.7) associated to this partition. For $t_1, t_2 \in [0,T]$ the interpolant $\widehat{U}_{\mathcal{P}}$, defined in (3.16), satisfies

$$ \| \widehat{U}_{\mathcal{P}}(t_1) - \widehat{U}_{\mathcal{P}}(t_2) \| \le C |t_1 - t_2|^{\alpha/2} \Big( \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})} + \Phi(U_0) - \Phi_{\inf} \Big)^{1/2}, $$

where the constant $C$ depends only on $\alpha$.

Proof. As proved in [28, Lemma 5.8], $D_c^\alpha w \in L^2_\alpha(0,T;\mathcal{H})$ guarantees $w \in C^{0,\alpha/2}([0,T];\mathcal{H})$. Therefore, using $D_c^\alpha \widehat{U}_{\mathcal{P}} = V_{\mathcal{P}} \in L^2_\alpha(0,T;\mathcal{H})$ and the estimate from Lemma 4.6, we obtain the result. $\square$

Next we control the difference between discrete solutions corresponding to different partitions.

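Lemma 4.9 (equicontinuity). Let, for $i = 1, 2$, $\mathcal{P}_i$ be partitions of $[0,T]$ with maximal step size $\tau_i$, respectively, and denote by $\mathbf{U}^{(i)}$ the associated solutions to (4.7). Let $\widehat{U}_i$ be their interpolations, defined by (4.10), and $\overline{U}_i$ be their piecewise constant interpolations as in (2.3). Assuming that $U^{(i)}_0 = U_0$ we have

$$ \big\| \widehat{U}_1 - \widehat{U}_2 \big\|_{L^\infty(0,T;\mathcal{H})} \le C \big( \tau_1^{\alpha/2} + \tau_2^{\alpha/2} \big) \Big( \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})} + \Phi(U_0) - \Phi_{\inf} \Big)^{1/2}, \tag{4.13} $$

$$ \sup_{t \in [0,T]} \int_0^t (t - s)^{\alpha-1} \rho(\overline{U}_1(s), \overline{U}_2(s))\,\mathrm{d}s \le C \big( \tau_1^\alpha + \tau_2^\alpha \big) \Big( \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})} + \Phi(U_0) - \Phi_{\inf} \Big), \tag{4.14} $$

where the constant $C$ only depends on $\alpha$.

Proof. For almost every $t \in [0,T]$, we have that

$$ \big\langle D_c^\alpha (\widehat{U}_1 - \widehat{U}_2), \widehat{U}_1 - \widehat{U}_2 \big\rangle = \mathrm{I} + \mathrm{II} + \mathrm{III}, \tag{4.15} $$

where

$$ \mathrm{I} = \big\langle (\overline{F}_1 - D_c^\alpha \widehat{U}_1) - (\overline{F}_2 - D_c^\alpha \widehat{U}_2), \overline{U}_2 - \overline{U}_1 \big\rangle \le -\rho(\overline{U}_1, \overline{U}_2), $$
$$ \mathrm{II} = \big\langle (\overline{F}_1 - D_c^\alpha \widehat{U}_1) - (\overline{F}_2 - D_c^\alpha \widehat{U}_2), (\widehat{U}_2 - \overline{U}_2) - (\widehat{U}_1 - \overline{U}_1) \big\rangle, $$
$$ \mathrm{III} = \big\langle \overline{F}_1 - \overline{F}_2, \widehat{U}_1 - \widehat{U}_2 \big\rangle, $$

and to bound $\mathrm{I}$ we used that $\overline{F}_i(t) - D_c^\alpha \widehat{U}_i(t) \in \partial\Phi(\overline{U}_i(t))$ and Definition 2.1. Define now

$$ G(t) = \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} \big( \overline{F}_1(s) - \overline{F}_2(s) \big)\,\mathrm{d}s = \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} \big( \overline{F}_1(s) - f(s) \big)\,\mathrm{d}s - \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} \big( \overline{F}_2(s) - f(s) \big)\,\mathrm{d}s, $$

so that $D_c^\alpha G(t) = \overline{F}_1(t) - \overline{F}_2(t)$ and, by (2.10) of Lemma 2.5, one further has

$$ \| G \|_{L^\infty(0,T;\mathcal{H})} \le C \big( \tau_1^{\alpha/2} + \tau_2^{\alpha/2} \big) \| f \|_{L^2_\alpha(0,T;\mathcal{H})}, \tag{4.16} $$

where $C$ is a constant that depends only on $\alpha$. Since $\mathrm{III} = \langle D_c^\alpha G, \widehat{U}_1 - \widehat{U}_2 \rangle$, from (4.15) we deduce that

$$ \big\langle D_c^\alpha (\widehat{U}_1 - \widehat{U}_2 - G), \widehat{U}_1 - \widehat{U}_2 - G \big\rangle + \rho(\overline{U}_1, \overline{U}_2) \le \mathrm{II} - \big\langle D_c^\alpha (\widehat{U}_1 - \widehat{U}_2 - G), G \big\rangle. \tag{4.17} $$

Set $w = \widehat{U}_1 - \widehat{U}_2 - G$. By (2.18) we have that

$$ \frac{1}{2} D_c^\alpha \| w(t) \|^2 + \rho(\overline{U}_1, \overline{U}_2) \le \mathrm{II} - \langle D_c^\alpha w, G \rangle, $$

and, using (2.17), the fact that $w(0) = 0$, the inequality $\| \widehat{U}_1 - \widehat{U}_2 \|^2 \le 2 \| w \|^2 + 2 \| G \|^2$, and (4.16), we then conclude

$$ \frac{1}{2} \| \widehat{U}_1(t) - \widehat{U}_2(t) \|^2 + \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} \rho(\overline{U}_1(s), \overline{U}_2(s))\,\mathrm{d}s \le \frac{2}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} \big( \mathrm{II}(s) - \langle D_c^\alpha w(s), G(s) \rangle \big)\,\mathrm{d}s + C \big( \tau_1^{\alpha/2} + \tau_2^{\alpha/2} \big)^2 \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})}. $$

It remains then to estimate the fractional integral on the right hand side. We estimate each term separately.

First, owing to Lemma 2.4 and Lemma 4.6 we have, for $i = 1, 2$, that

$$ \big\| \overline{F}_i - D_c^\alpha \widehat{U}_i \big\|_{L^2_\alpha(0,T;\mathcal{H})} \le C \Big( \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})} + \Phi(U_0) - \Phi_{\inf} \Big)^{1/2}. $$

Therefore, using the Cauchy-Schwarz inequality, for any $t \in [0,T]$, we have

$$ \int_0^t (t - s)^{\alpha-1} | \mathrm{II}(s) |\,\mathrm{d}s \le C \Big( \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})} + \Phi(U_0) - \Phi_{\inf} \Big)^{1/2} \sum_{i=1}^{2} \big\| \widehat{U}_i - \overline{U}_i \big\|_{L^2_\alpha(0,T;\mathcal{H})}. $$

Recalling that $\overline{U}_i(t) = \widehat{U}_i(\lceil t \rceil_i)$ we can invoke Lemma 2.6 and, again, Lemma 4.6 to arrive at

$$ \int_0^t (t - s)^{\alpha-1} | \mathrm{II}(s) |\,\mathrm{d}s \le C \big( \tau_1^{\alpha/2} + \tau_2^{\alpha/2} \big)^2 \Big( \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})} + \Phi(U_0) - \Phi_{\inf} \Big). $$

Finally, for the remaining term, we use the Cauchy-Schwarz inequality and get

$$ \int_0^t (t - s)^{\alpha-1} | \langle D_c^\alpha w, G \rangle (s) |\,\mathrm{d}s \le \Big( \int_0^t (t - s)^{\alpha-1} \| D_c^\alpha w(s) \|^2\,\mathrm{d}s \Big)^{1/2} \Big( \int_0^t (t - s)^{\alpha-1} \| G(s) \|^2\,\mathrm{d}s \Big)^{1/2} \le \| D_c^\alpha w \|_{L^2_\alpha(0,T;\mathcal{H})} \| G \|_{L^2_\alpha(0,T;\mathcal{H})}. $$

To estimate the norm of $G$ we apply (2.11) from Lemma 2.5 with $\beta = \alpha$ to obtain

$$ \| G \|_{L^2_\alpha(0,T;\mathcal{H})} \le C \big( \tau_1^\alpha + \tau_2^\alpha \big) \| f \|_{L^2_\alpha(0,T;\mathcal{H})}. $$

Furthermore, Lemma 2.4 and Lemma 4.6 guarantee that

$$ \| D_c^\alpha w \|_{L^2_\alpha(0,T;\mathcal{H})} \le C \Big( \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})} + \Phi(U_0) - \Phi_{\inf} \Big)^{1/2}. $$

Combining all estimates proves the desired result. $\square$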
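Estimate (4.13) compares interpolants built on two unrelated partitions. The following sketch carries out such a comparison for a scalar toy problem; the setting ($\mathcal{H} = \mathbb{R}$, $\Phi(w) = \frac{\lambda}{2} w^2$, $f = 0$, $U_0 = 1$), the two grids and all function names are assumptions chosen for illustration. The function `interpolant` evaluates (3.16) in closed form for a piecewise constant $V_{\mathcal{P}}$, which is what allows two discrete solutions to be compared at arbitrary times.

```python
import math


def kernel_weight(t_eval, left, right, alpha):
    """(1/Gamma(alpha)) * integral of (t_eval - s)^(alpha-1) over (left, right)."""
    return ((t_eval - left) ** alpha - (t_eval - right) ** alpha) / math.gamma(alpha + 1.0)


def minimizing_movements(t, alpha, lam, U0):
    """Scheme (4.7) with f = 0 for Phi(w) = lam * w**2 / 2; returns nodal U and V."""
    U, V = [U0], []
    for n in range(1, len(t)):
        history = U0 + sum(kernel_weight(t[n], t[j - 1], t[j], alpha) * V[j - 1]
                           for j in range(1, n))
        Knn = kernel_weight(t[n], t[n - 1], t[n], alpha)
        Un = history / (1.0 + lam * Knn)
        U.append(Un)
        V.append(-lam * Un)
    return U, V


def interpolant(t, alpha, U0, V, s):
    """Evaluate (3.16) at time s, with V[j-1] the value of V_P on (t_{j-1}, t_j]."""
    total = U0
    for j in range(1, len(t)):
        if t[j - 1] >= s:
            break
        total += kernel_weight(s, t[j - 1], min(t[j], s), alpha) * V[j - 1]
    return total


def sup_difference(alpha, lam, grid_a, grid_b, samples=200):
    """Sampled approximation of the L-infinity distance between two interpolants."""
    _, Va = minimizing_movements(grid_a, alpha, lam, 1.0)
    _, Vb = minimizing_movements(grid_b, alpha, lam, 1.0)
    T = grid_a[-1]
    times = [T * m / samples for m in range(samples + 1)]
    return max(abs(interpolant(grid_a, alpha, 1.0, Va, s)
                   - interpolant(grid_b, alpha, 1.0, Vb, s)) for s in times)


def uniform(N):
    return [k / N for k in range(N + 1)]


def graded(N):
    return [(k / N) ** 2 for k in range(N + 1)]


for N in (8, 16, 32):
    print(N, sup_difference(0.5, 1.0, uniform(N), graded(N)))
```

The printed values are only a sampled proxy for the left hand side of (4.13); the lemma itself bounds the true supremum by a multiple of $\tau_1^{\alpha/2} + \tau_2^{\alpha/2}$.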

We are finally able to prove Theorem 4.5. We will follow the same approach as in [28, Theorem 5.10]; we will pass to the limit $\tau_i \downarrow 0$ in the interpolants $\widehat{U}_i$.

Proof of Theorem 4.5. Let us first prove uniqueness of energy solutions. Suppose that we have two energy solutions $u_1, u_2$ to (1.2). Let $t \in (0,T)$ be arbitrary and $h > 0$ be sufficiently small so that $(t - h, t + h) \subset [0,T]$. Setting as test function, in the EVI that characterizes $u_1$, the function $w = u_1 - \chi_{(t-h,t+h)} (u_1 - u_2)$ and vice versa, and adding the ensuing inequalities we obtain

$$ \int_{t-h}^{t+h} \langle D_c^\alpha u_1(s) - D_c^\alpha u_2(s), u_1(s) - u_2(s) \rangle\,\mathrm{d}s \le 0, $$

meaning that $\langle D_c^\alpha u_1(t) - D_c^\alpha u_2(t), u_1(t) - u_2(t) \rangle \le 0$ for almost every $t \in [0,T]$.

Define $d(t) = \| u_1(t) - u_2(t) \|^2$. Since $u_1, u_2 \in L^2(0,T;\mathcal{H})$ we clearly have $d \in L^1(0,T;\mathbb{R})$. Furthermore,

$$ \frac{1}{t} \int_0^t | d(s) |\,\mathrm{d}s \le \frac{2}{t} \int_0^t \big( \| u_1(s) - u_0 \|^2 + \| u_2(s) - u_0 \|^2 \big)\,\mathrm{d}s \to 0, $$

as $t \downarrow 0$, from Definition 4.2. Using (2.18) we then have

$$ \frac{1}{2} D_c^\alpha d(t) \le \langle D_c^\alpha u_1(t) - D_c^\alpha u_2(t), u_1(t) - u_2(t) \rangle \le 0. $$

Since $d \ge 0$ and $\frac{1}{t} \int_0^t | d(s) |\,\mathrm{d}s \to 0$ as $t \downarrow 0$, this implies $d(t) = 0$. This proves the uniqueness.

We now turn our attention to existence. Let $\{\mathcal{P}_k\}_{k=1}^{\infty}$ be a sequence of partitions such that $\tau_k \downarrow 0$ as $k \to \infty$. We denote by $\mathbf{U}^{(k)}$ the discrete solution, on partition $\mathcal{P}_k$, given by (4.7) with $U^{(k)}_0 = u_0$. The symbols $\widehat{U}_k$, $V_k$ and $\overline{F}_k$ carry analogous meaning. Owing to Lemma 4.9 there exists $u \in C([0,T];\mathcal{H})$ such that $\widehat{U}_k$ converges to $u$ in $C([0,T];\mathcal{H})$.

The embedding of Proposition 2.3 and an application of Lemma 4.6 show that there is a subsequence for which $V_{k_j} \rightharpoonup v$ in $L^2(0,T;\mathcal{H})$ as $j \to \infty$. Moreover, we can again appeal to Lemma 4.6 to see that, for every $t \in [0,T]$, the sequence

$$ (t - \cdot)^{\frac{\alpha-1}{2}} V_{k_j}(\cdot) $$

is uniformly bounded in $L^2(0,t;\mathcal{H})$ so that, by passing to a further, not retagged, subsequence,

$$ (t - \cdot)^{\frac{\alpha-1}{2}} V_{k_j}(\cdot) \rightharpoonup (t - \cdot)^{\frac{\alpha-1}{2}} v(\cdot) \quad \text{in } L^2(0,t;\mathcal{H}) \tag{4.18} $$

for any $t \in [0,T]$. This, in addition, shows that $v \in L^2_\alpha(0,T;\mathcal{H})$ so that if we define

$$ \tilde{u}(t) = u_0 + \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} v(s)\,\mathrm{d}s \tag{4.19} $$

then $D_c^\alpha \tilde{u} = v$.

Recall that for any $j \in \mathbb{N}$ and any $t \in [0,T]$ we have that

$$ \widehat{U}_{k_j}(t) = u_0 + \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} V_{k_j}(s)\,\mathrm{d}s. $$

Since, for an arbitrary $w \in \mathcal{H}$, we have that $(t - \cdot)^{\frac{\alpha-1}{2}} w$ is in $L^2(0,t;\mathcal{H})$, we can use (4.18) to obtain that

$$ \lim_{j \to \infty} \langle \widehat{U}_{k_j}(t), w \rangle = \lim_{j \to \infty} \Big\langle u_0 + \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} V_{k_j}(s)\,\mathrm{d}s, w \Big\rangle = \Big\langle u_0 + \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha-1} v(s)\,\mathrm{d}s, w \Big\rangle = \langle \tilde{u}(t), w \rangle. $$

The statement above holds for any $w \in \mathcal{H}$ and all $t \in [0,T]$. Thus,

$$ \widehat{U}_{k_j}(t) \rightharpoonup \tilde{u}(t) \quad \text{in } \mathcal{H}. \tag{4.20} $$

However, this implies that $\tilde{u} = u$, as $\widehat{U}_{k_j}$ converges to $u$ in $C([0,T];\mathcal{H})$. Therefore $D_c^\alpha u = v \in L^2_\alpha(0,T;\mathcal{H})$ and, by Lemma 4.6, we have the estimate

$$ \| v \|_{L^2_\alpha(0,T;\mathcal{H})} \le C \Big( \| f \|^2_{L^2_\alpha(0,T;\mathcal{H})} + \Phi(U_0) - \Phi_{\inf} \Big)^{1/2}, $$

for some constant $C$ depending on $\alpha$ (recall that here $U_0 = u_0$). As in the proof of Lemma 4.8 this implies that (4.6) holds. From this, we also see that the initial condition is attained in the required sense.

It remains to show that the EVI (4.2) holds for $u$. From the construction of discrete solutions, one derives that for any $w \in L^2(0,T;\mathcal{H})$

$$ \int_0^T \Big( \Phi(\widehat{U}_{k_j}(t)) - \Phi(w(t)) \Big)\,\mathrm{d}t \le \int_0^T \langle \overline{F}_{k_j}(t) - V_{k_j}(t), \widehat{U}_{k_j}(t) - w(t) \rangle\,\mathrm{d}t. \tag{4.21} $$

We will pass to the limit in this inequality. For the right hand side, it suffices to observe that $\widehat{U}_{k_j} \to u$ in $C([0,T];\mathcal{H})$, $V_{k_j} \rightharpoonup v$ in $L^2(0,T;\mathcal{H})$ and $\overline{F}_{k_j} \to f$ in $L^2(0,T;\mathcal{H})$. Thus,

$$ \int_0^T \langle \overline{F}_{k_j}(t) - V_{k_j}(t), \widehat{U}_{k_j}(t) - w(t) \rangle\,\mathrm{d}t \to \int_0^T \langle f(t) - v(t), u(t) - w(t) \rangle\,\mathrm{d}t. $$

For the left hand side, the uniform convergence of $\widehat{U}_{k_j}$ and the lower semicontinuity of $\Phi$ give

$$ \Phi(u(t)) \le \liminf_{j \to \infty} \Phi\big( \widehat{U}_{k_j}(t) \big), $$

and hence

$$ \int_0^T \big( \Phi(u(t)) - \Phi(w(t)) \big)\,\mathrm{d}t \le \int_0^T \langle f(t) - v(t), u(t) - w(t) \rangle\,\mathrm{d}t. $$

It remains to recall that $D_c^\alpha u = v \in L^2(0,T;\mathcal{H})$ to conclude that, according to Definition 4.2, $u$ is an energy solution. $\square$

Remark 4.10 (other notion of solution). The choice of $u \in L^2(0,T;\mathcal{H})$ and $D_c^\alpha u \in L^2(0,T;\mathcal{H})$ in Definition 4.2 is to guarantee that (4.2) makes sense. It is also necessary in the proof of uniqueness. However, other choices of spaces are also possible. For example, one could consider the following definition instead of Definition 4.2: $u \in L^\infty(0,T;\mathcal{H})$ is a solution to (1.2) if:
(i) $\lim_{t \downarrow 0} \frac{1}{t} \int_0^t \| u(s) - u_0 \|\,\mathrm{d}s = 0$;
(ii) $D_c^\alpha u \in L^1(0,T;\mathcal{H})$; and
(iii) for any $w \in L^\infty(0,T;\mathcal{H})$,

$$ \int_0^T \big[ \langle D_c^\alpha u(t), u(t) - w(t) \rangle + \Phi(u(t)) - \Phi(w(t)) \big]\,\mathrm{d}t \le \int_0^T \langle f(t), u(t) - w(t) \rangle\,\mathrm{d}t. \tag{4.22} $$

Theorem 4.5 also holds for this new definition. However, at least with our techniques, the requirements on the data $u_0 \in D(\Phi)$ and $f \in L^2_\alpha(0,T;\mathcal{H})$ do not change.

5. Fractional gradient flows: Numerics

Since the existence of an energy solution was proved by a rather constructive approach, namely a fractional minimizing movements scheme, it makes sense to provide error analyses for this scheme. We will provide an a priori error estimate which, in light of the smoothness $u \in C^{0,\alpha/2}([0,T];\mathcal H)$ proved in Theorem 4.5, is optimal. In addition, in the spirit of [30] we will provide an a posteriori error analysis.

5.1. A priori error analysis.

The a priori error estimate reads as follows. We comment that this result gives us a better rate compared to [28, Theorem 5.10].

Theorem 5.1 (a priori I). Let $u$ be the energy solution of (1.2). Given a partition $\mathcal P$, of maximal step size $\tau$, let $\mathbf U \in \mathcal H^N$ be the discrete solution defined by (4.7) starting from $U_0 \in D(\Phi)$. Let $\widehat U_{\mathcal P}$ and $\bar U_{\mathcal P}$ be defined as in (4.10) and (2.3), respectively. Then we have
\[ \left\| u - \widehat U_{\mathcal P} \right\|_{L^\infty(0,T;\mathcal H)} \le \|u_0 - U_0\| + C \tau^{\alpha/2} \left( \|f\|^2_{L^2_\alpha(0,T;\mathcal H)} + \Phi_0 - \Phi_{\inf} \right)^{1/2}, \tag{5.1} \]
\[ \sup_{t \in [0,T]} \int_0^t (t-s)^{\alpha-1} \rho(u(s), \bar U_{\mathcal P}(s)) \, ds \le \|u_0 - U_0\|^2 + C \tau^{\alpha} \left( \|f\|^2_{L^2_\alpha(0,T;\mathcal H)} + \Phi_0 - \Phi_{\inf} \right), \tag{5.2} \]
where $\Phi_0 = \max\{\Phi(U_0), \Phi(u_0)\}$, and the constant $C$ depends only on $\alpha$.

Proof. The proof can be obtained by following the same procedure employed in the proof of Lemma 4.9. In the current situation, however, instead of comparing two discrete solutions we compare the exact and discrete ones. The only difference is that we allow $U_0 \neq u_0$ here, but this presents no essential difficulty. For brevity, we skip the details. $\square$

5.2. A posteriori error analysis.

Let us now provide an a posteriori error estimate between the discretization in (4.7) and the solution of (1.2). We will also show how, from this a posteriori error estimator, an a priori error estimate can be derived. Let us first introduce the a posteriori error estimator.

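Definition 5.2 (error estimator). Let $\mathcal P$ be a partition of $[0,T]$ as in (2.2), and let $\mathbf U \in \mathcal H^N$ denote the discrete solution given by (4.7). We define the error estimator function as
\[ \mathcal E_{\mathcal P}(t) = \mathcal E_{\mathcal P,1}(t) + \mathcal E_{\mathcal P,2}(t), \tag{5.3} \]
where
\[ \mathcal E_{\mathcal P,1}(t) = \langle D_c^\alpha \widehat U_{\mathcal P}(t) - \bar F_{\mathcal P}(t), \widehat U_{\mathcal P}(t) - \bar U_{\mathcal P}(t) \rangle, \qquad \mathcal E_{\mathcal P,2}(t) = \Phi(\widehat U_{\mathcal P}(t)) - \Phi(\bar U_{\mathcal P}(t)). \]

Notice that the quantity $\mathcal E_{\mathcal P}(t)$ is nonnegative because $\bar F_{\mathcal P}(t) - D_c^\alpha \widehat U_{\mathcal P}(t) = F_{n(t)} - (D^\alpha_{\mathcal P} \mathbf U)_{n(t)} \in \partial\Phi(U_{n(t)}) = \partial\Phi(\bar U_{\mathcal P}(t))$. It is also, in principle, computable since it only depends on data and the discrete solution $\mathbf U$. It is then a suitable candidate for an a posteriori error estimator.

The derivation of an a posteriori error estimate begins with the observation that, for any $w \in \mathcal H$, we have
\begin{align*}
\langle D_c^\alpha \widehat U_{\mathcal P}(t) - f(t), \widehat U_{\mathcal P}(t) - w \rangle + \Phi(\widehat U_{\mathcal P}(t)) - \Phi(w)
&= \mathcal E_{\mathcal P}(t) + \langle \bar F_{\mathcal P}(t) - D_c^\alpha \widehat U_{\mathcal P}(t), w - \bar U_{\mathcal P}(t) \rangle + \Phi(\bar U_{\mathcal P}(t)) - \Phi(w) + \langle f(t) - \bar F_{\mathcal P}(t), w - \widehat U_{\mathcal P}(t) \rangle \\
&\le \mathcal E_{\mathcal P}(t) + \langle f(t) - \bar F_{\mathcal P}(t), w - \widehat U_{\mathcal P}(t) \rangle - \sigma(\bar U_{\mathcal P}(t); w). \tag{5.4}
\end{align*}
In other words, the function $\widehat U_{\mathcal P}$ solves an EVI similar to (4.3) but with additional terms on the right hand side. We can then compare the EVIs by a now standard approach, that is, set $w = u(t)$ in (5.4) and $w = \widehat U_{\mathcal P}(t)$ in (4.3), respectively, to see that
\[ \left\langle D_c^\alpha \big( \widehat U_{\mathcal P} - u \big)(t), \widehat U_{\mathcal P}(t) - u(t) \right\rangle + \sigma(\bar U_{\mathcal P}(t); u(t)) + \sigma(u(t); \widehat U_{\mathcal P}(t)) \le \mathcal E_{\mathcal P}(t) + \langle f(t) - \bar F_{\mathcal P}(t), u(t) - \widehat U_{\mathcal P}(t) \rangle \tag{5.5} \]
for almost every $t \in [0,T]$. Consider the following notions of error:
\begin{align*}
E &= \Big( \sup_{t \in [0,T]} \big\{ E_{\mathcal H}^2(t) + E_\sigma^2(t) \big\} \Big)^{1/2}, \qquad E_{\mathcal H}(t) = \|u(t) - \widehat U_{\mathcal P}(t)\|, \\
E_\sigma(t) &= \Big( \frac{2}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} \big[ \sigma(u(s); \widehat U_{\mathcal P}(s)) + \sigma(\bar U_{\mathcal P}(s); u(s)) \big] \, ds \Big)^{1/2}. \tag{5.6}
\end{align*}
We have the following error estimate for $E$.

Theorem 5.3 (a posteriori). Let $u$ be the energy solution of (1.2). Let $\mathcal P$ be a partition of $[0,T]$ defined as in (2.2) and let $\mathbf U \in \mathcal H^N$ be the discrete solution given by (4.7) starting from $U_0 \in D(\Phi)$. Let $E$ and $\mathcal E_{\mathcal P}$ be defined in (5.6) and (5.3), respectively. The following a posteriori error estimate holds
\[ E \le \Big( \|u_0 - U_0\|^2 + \frac{2}{\Gamma(\alpha)} \|\mathcal E_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \Big)^{1/2} + \frac{2}{\Gamma(\alpha)} \|f - \bar F_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)}. \tag{5.7} \]

Proof. From (2.18) we infer
\[ \frac12 D_c^\alpha \|\widehat U_{\mathcal P} - u\|^2(t) \le \left\langle D_c^\alpha \big( \widehat U_{\mathcal P} - u \big)(t), \widehat U_{\mathcal P}(t) - u(t) \right\rangle \le \mathcal E_{\mathcal P}(t) + \langle f(t) - \bar F_{\mathcal P}(t), u(t) - \widehat U_{\mathcal P}(t) \rangle - \sigma(\bar U_{\mathcal P}(t); u(t)) - \sigma(u(t); \widehat U_{\mathcal P}(t)). \]
The claimed a posteriori error estimate (5.7) follows from Lemma 2.8 by setting $\lambda = 0$,
\[ a(t) = \|(\widehat U_{\mathcal P} - u)(t)\|^2, \quad b(t) = 2 \big( \sigma(\bar U_{\mathcal P}(t); u(t)) + \sigma(u(t); \widehat U_{\mathcal P}(t)) \big), \quad c(t) = 2 \mathcal E_{\mathcal P}(t), \quad d(t) = \|(f - \bar F_{\mathcal P})(t)\|. \qquad \square \]

5.3. Rate of convergence.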

Although we have already established an optimal a priori rate of convergence for our scheme in Theorem 5.1, in this section we study the sharpness of the a posteriori error estimator $\mathcal E_{\mathcal P}$ by obtaining the same convergence rates through it. We comment that neither in Theorem 5.1 nor in our discussion here do we require any relation between time steps. We will also consider some cases when the rate of convergence can be improved.

5.3.1. Rate of convergence for energy solutions.

Let us now use the estimator $\mathcal E_{\mathcal P}$ to derive a convergence rate of order $O(\tau^{\alpha/2})$ for the error $E$, defined in (5.6), when $f \in L^2_\alpha(0,T;\mathcal H)$. Notice that such regularity a priori does not give any order of convergence for $\|f - \bar F_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)}$ in (5.7). Observe also that the rate that we obtain is consistent with classical gradient flow theories, where an order $O(\tau^{1/2})$ is proved provided that $u_0 \in D(\Phi)$ and $f \in L^2(0,T;\mathcal H)$; see [30, Sec 3.2].

We first bound $\|\mathcal E_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)}$.

Theorem 5.4 (bound on $\|\mathcal E_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)}$). Under the assumption that $U_0 \in D(\Phi)$, the estimator $\mathcal E_{\mathcal P}$, defined in (5.3), satisfies
\[ \|\mathcal E_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \le C \tau^\alpha \left( \|f\|^2_{L^2_\alpha(0,T;\mathcal H)} + \Phi(U_0) - \Phi_{\inf} \right), \tag{5.8} \]
where the constant $C$ depends only on $\alpha$.

Proof. We bound the contributions $\mathcal E_{\mathcal P,1}$ and $\mathcal E_{\mathcal P,2}$ separately. The bound of $\mathcal E_{\mathcal P,1}$ follows, without change, that of the term II of (4.15) in Lemma 4.9. Thus,
\[ \|\mathcal E_{\mathcal P,1}\|_{L^1_\alpha(0,T;\mathcal H)} \le C \tau^\alpha \left( \|f\|^2_{L^2_\alpha(0,T;\mathcal H)} + \Phi(U_0) - \Phi_{\inf} \right). \tag{5.9} \]
To bound $\mathcal E_{\mathcal P,2}$, we recall the function $\widehat\Phi_{\mathcal P}$, defined in Remark 4.7, and its properties. Define also $\bar\Phi_{\mathcal P}(t) = \Phi(\bar U_{\mathcal P}(t))$. We have
\begin{align*}
\mathcal E_{\mathcal P,2}(t) &= \Phi\big( \widehat U_{\mathcal P}(t) \big) - \Phi(\bar U_{\mathcal P}(t)) \le \widehat\Phi_{\mathcal P}(t) - \bar\Phi_{\mathcal P}(t) \\
&= \frac{1}{\Gamma(\alpha)} \left( \int_0^t (t-s)^{\alpha-1} D_c^\alpha \widehat\Phi_{\mathcal P}(s) \, ds - \int_0^{\lceil t \rceil_{\mathcal P}} (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} D_c^\alpha \widehat\Phi_{\mathcal P}(s) \, ds \right) \\
&= \frac{1}{\Gamma(\alpha)} \left( \int_0^t \big[ (t-s)^{\alpha-1} - (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} \big] D_c^\alpha \widehat\Phi_{\mathcal P}(s) \, ds - \int_t^{\lceil t \rceil_{\mathcal P}} (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} D_c^\alpha \widehat\Phi_{\mathcal P}(s) \, ds \right) \\
&\le \frac{1}{4\Gamma(\alpha)} \int_0^t \big[ (t-s)^{\alpha-1} - (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} \big] \|\bar F_{\mathcal P}(s)\|^2 \, ds - \frac{1}{\Gamma(\alpha)} \int_t^{\lceil t \rceil_{\mathcal P}} (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} D_c^\alpha \widehat\Phi_{\mathcal P}(s) \, ds \\
&= \frac{1}{4\Gamma(\alpha)} \int_0^t \big[ (t-s)^{\alpha-1} - (\lceil t \rceil_{\mathcal P} - s)^{\alpha-1} \big] \|\bar F_{\mathcal P}(s)\|^2 \, ds - \frac{1}{\Gamma(\alpha+1)} (\lceil t \rceil_{\mathcal P} - t)^\alpha D_c^\alpha \widehat\Phi_{\mathcal P}(t) \\
&= I_1(t) - I_2(t).
\end{align*}
On the one hand, proceeding as in the proof of Lemma 2.6 we obtain
\[ \sup_{r \in [0,T]} \int_0^r (r-t)^{\alpha-1} I_1(t) \, dt \le C \tau^\alpha \|\bar F_{\mathcal P}\|^2_{L^2_\alpha(0,T;\mathcal H)}. \]
On the other hand, using
\[ -I_2(t) \le -\frac{1}{\Gamma(\alpha+1)} (\lceil t \rceil_{\mathcal P} - t)^\alpha \Big( D_c^\alpha \widehat\Phi_{\mathcal P}(t) - \frac14 \|\bar F_{\mathcal P}(t)\|^2 \Big) \le \frac{\tau^\alpha}{\Gamma(\alpha+1)} \Big( \frac14 \|\bar F_{\mathcal P}(t)\|^2 - D_c^\alpha \widehat\Phi_{\mathcal P}(t) \Big) \]
we have for any $r \in [0,T]$ that
\begin{align*}
-\int_0^r (r-t)^{\alpha-1} I_2(t) \, dt &\le \frac{\tau^\alpha}{\Gamma(\alpha+1)} \int_0^r (r-t)^{\alpha-1} \Big( \frac14 \|\bar F_{\mathcal P}(t)\|^2 - D_c^\alpha \widehat\Phi_{\mathcal P}(t) \Big) \, dt \\
&= \frac{\tau^\alpha}{4\Gamma(\alpha+1)} \int_0^r (r-t)^{\alpha-1} \|\bar F_{\mathcal P}(t)\|^2 \, dt - \frac{\tau^\alpha}{\alpha} \big( \widehat\Phi_{\mathcal P}(r) - \Phi(U_0) \big) \\
&\le \frac{\tau^\alpha}{4\Gamma(\alpha+1)} \|\bar F_{\mathcal P}\|^2_{L^2_\alpha(0,T;\mathcal H)} + \frac{\tau^\alpha}{\alpha} \left( \Phi(U_0) - \Phi_{\inf} \right).
\end{align*}
Therefore, combining the estimates for $I_1$ and $I_2$ we have proved that
\[ \sup_{r \in [0,T]} \int_0^r (r-t)^{\alpha-1} \mathcal E_{\mathcal P,2}(t) \, dt \le C \tau^\alpha \left( \|f\|^2_{L^2_\alpha(0,T;\mathcal H)} + \Phi(U_0) - \Phi_{\inf} \right), \]
which together with (5.9) proves (5.8) because $\mathcal E_{\mathcal P}$ is nonnegative. $\square$

We next take advantage of Lemma 2.5, and derive a rate for $E$ without additional smoothness assumptions on the right hand side $f$.

Theorem 5.5 (a priori II). Let $u$ be the energy solution of (1.2). Let $\mathcal P$ be a partition of $[0,T]$ defined as in (2.2) and $\mathbf U \in \mathcal H^N$ be the discrete solution given by (4.7) starting from $U_0 \in D(\Phi)$. Let $E$ be defined in (5.6). Then we have
\[ E \le \|u_0 - U_0\| + C \tau^{\alpha/2} \left( \|f\|^2_{L^2_\alpha(0,T;\mathcal H)} + \Phi(U_0) - \Phi_{\inf} \right)^{1/2}, \]
where the constant $C$ depends only on $\alpha$.

Proof. We follow closely the approach and notation in Lemma 4.9. Define
\[ G(t) = \frac{1}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} \big( f(s) - \bar F_{\mathcal P}(s) \big) \, ds \]
and note that, by Lemma 2.5, $G$ satisfies
\[ \tau^{\alpha/2} \|G\|_{L^\infty(0,T;\mathcal H)} + \|G\|_{L^2_\alpha(0,T;\mathcal H)} \le C \tau^\alpha \|f\|_{L^2_\alpha(0,T;\mathcal H)}, \tag{5.10} \]
where the constant depends only on $\alpha$. Set $e = u - \widehat U_{\mathcal P}$ and note that (5.5) can be rewritten as
\[ \langle D_c^\alpha (e - G)(t), (e - G)(t) \rangle + \sigma(\bar U_{\mathcal P}(t); u(t)) + \sigma(u(t); \widehat U_{\mathcal P}(t)) \le \mathcal E_{\mathcal P}(t) - \langle D_c^\alpha (e - G)(t), G(t) \rangle. \]
Notice the resemblance with (4.17). We can thus proceed as in Lemma 4.9, and use Theorem 5.4, to deduce that, for some constant $C$ depending only on $\alpha$,
\[ \|u - \widehat U_{\mathcal P} - G\|^2(t) + E_\sigma^2(t) \le \|u_0 - U_0\|^2 + C \tau^\alpha \left( \|f\|^2_{L^2_\alpha(0,T;\mathcal H)} + \Phi(U_0) - \Phi_{\inf} \right). \]
Estimate (5.10) then implies the result. $\square$
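The estimates above are stated in weighted norms which, judging from the displays of this section, have the form $\|w\|^p_{L^p_\alpha(0,T;\mathcal H)} = \sup_{t} \int_0^t (t-s)^{\alpha-1} \|w(s)\|^p \, ds$. For a piecewise constant function the inner integral can be evaluated in closed form, since $\int_a^b (t-s)^{\alpha-1} ds = ((t-a)^\alpha - (t-b)^\alpha)/\alpha$. The following Python sketch illustrates this under stated assumptions: the function names are hypothetical, the input is the list of magnitudes $\|w\|$ on each subinterval, and the supremum is only sampled at the partition nodes, so the second function returns a lower bound of the true supremum in general.

```python
def weighted_integral(nodes, values, t, alpha, p=1):
    """Integral of (t - s)^(alpha - 1) * |w(s)|^p over (0, t) for piecewise constant w.

    nodes  : partition 0 = t_0 < t_1 < ... < t_N
    values : values[i] is the magnitude of w on (t_i, t_{i+1}]
    """
    total = 0.0
    for i, v in enumerate(values):
        a = nodes[i]
        if a >= t:
            break
        b = min(nodes[i + 1], t)
        # closed form of the weakly singular kernel integrated over (a, b)
        total += abs(v) ** p * ((t - a) ** alpha - (t - b) ** alpha) / alpha
    return total


def weighted_norm_p(nodes, values, alpha, p=1):
    """Maximum over the nodes of the weighted integral (sampled supremum)."""
    return max(weighted_integral(nodes, values, t, alpha, p) for t in nodes[1:])
```

For $w \equiv 1$ on $[0,T]$ the integral equals $t^\alpha/\alpha$, which gives a quick sanity check of the formula.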

5.3.2. Rate of convergence for smooth energies.

Let us show that, at least for smoother energies, it is possible to obtain a better rate of convergence. We will, essentially, assume that the energy is locally $C^{1,\beta}$ for $\beta \in (0,1]$. More precisely, we assume that there is $\beta \in (0,1]$ such that for every $R > 0$, there is a constant $C_{\beta,R} > 0$ for which
\[ \Phi(w_1) - \Phi(w_2) - \langle \xi, w_1 - w_2 \rangle \le C_{\beta,R} \|w_1 - w_2\|^{1+\beta}, \quad \forall w_1, w_2 \in B_R, \ \xi \in \partial\Phi(w_2), \tag{5.11} \]
where $B_R$ denotes the ball of radius $R$ in $\mathcal H$. Notice that, by Lemma 4.8, all the discrete solutions $\widehat U_{\mathcal P}$ are uniformly bounded in $C([0,T];\mathcal H)$. Thus, we can fix $\bar R > 0$ such that, for all partitions $\mathcal P$ and all $t \in [0,T]$, $\widehat U_{\mathcal P}(t) \in B_{\bar R}$. Therefore, (5.11) implies that
\[ \Phi(w_1) - \Phi(w_2) - \langle \xi, w_1 - w_2 \rangle \le C_\beta \|w_1 - w_2\|^{1+\beta}, \quad \forall w_1, w_2 \in \widehat U_{\mathcal P}([0,T]), \ \xi \in \partial\Phi(w_2), \tag{5.12} \]
for some constant $C_\beta = C_{\beta,\bar R}$.

A particular example to which this situation applies is the following. Let $\mathcal H = \mathbb R^d$ and $\Phi(w) = \frac1p |w|^p$ with $p > 1$. In this case, (5.12) holds with $\beta = 1$ for $p \ge 2$ and $\beta = p - 1$ for $p \in (1,2)$. When $p < 2$, to reach $\beta = 1$, we must assume that $u$ and $\widehat U_{\mathcal P}$ stay uniformly away from zero. This example can, of course, be generalized.

In this setting, we have the following improved estimate for $\|\mathcal E_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)}$.

Theorem 5.6 (improved bound). Assume that the energy $\Phi$ satisfies (5.12). Let $u$ be the energy solution to (1.2), and denote by $\mathcal P$ a partition of $[0,T]$ defined as in (2.2). Denote by $\widehat U_{\mathcal P}$ the solution of (4.7) starting from $U_0 \in D(\Phi)$. In this setting, the estimator $\mathcal E_{\mathcal P}$ defined in (5.3) satisfies
\[ \|\mathcal E_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \le C T^{\alpha(1-\beta)/2} \tau^{\alpha(\beta+1)} \left( \|f\|^2_{L^2_\alpha(0,T;\mathcal H)} + \Phi(U_0) - \Phi_{\inf} \right)^{(\beta+1)/2}, \tag{5.13} \]
for some constant $C$ that depends on $\alpha$, $\beta$, and the problem data.

Proof. Owing to (5.12), the estimator $\mathcal E_{\mathcal P}$ can be bounded from above by
\[ \mathcal E_{\mathcal P}(t) = \langle D_c^\alpha \widehat U_{\mathcal P}(t) - \bar F_{\mathcal P}(t), \widehat U_{\mathcal P}(t) - \bar U_{\mathcal P}(t) \rangle + \Phi(\widehat U_{\mathcal P}(t)) - \Phi(\bar U_{\mathcal P}(t)) \le C_\beta \|\widehat U_{\mathcal P}(t) - \bar U_{\mathcal P}(t)\|^{1+\beta}. \]
Applying Lemma 2.6 with $p = 1 + \beta$ we have
\[ \|\mathcal E_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \le \sup_{r \in [0,T]} C_\beta \int_0^r (r-t)^{\alpha-1} \|\widehat U_{\mathcal P}(t) - \bar U_{\mathcal P}(t)\|^{1+\beta} \, dt \le C \tau^{\alpha(1+\beta)} \left\| D_c^\alpha \widehat U_{\mathcal P} \right\|^{1+\beta}_{L^{1+\beta}_\alpha(0,T;\mathcal H)}, \]
for some constant $C$ that depends on $\alpha$, $\beta$ and the problem data. Since $1 + \beta \in (1,2]$, Lemma 4.6 and the inequality
\[ \|w\|_{L^{1+\beta}_\alpha(0,T;\mathcal H)} \le \|w\|_{L^2_\alpha(0,T;\mathcal H)} \left( \frac{T^\alpha}{\alpha} \right)^{(1-\beta)/(2(1+\beta))} \]
imply that
\[ \left\| D_c^\alpha \widehat U_{\mathcal P} \right\|^{1+\beta}_{L^{1+\beta}_\alpha(0,T;\mathcal H)} \le C T^{\alpha(1-\beta)/2} \left( \|f\|^2_{L^2_\alpha(0,T;\mathcal H)} + \Phi(U_0) - \Phi_{\inf} \right)^{(1+\beta)/2}, \]
and this implies the claim. $\square$

Now, in order to obtain a convergence rate using (5.7), we still need to control $\|f - \bar F_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)}$. To do so, we invoke inequality (2.5) and see that
\[ \|f - \bar F_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \le \left( \frac{q-1}{q\alpha - 1} \right)^{(q-1)/q} T^{\alpha - 1/q} \|f - \bar F_{\mathcal P}\|_{L^q(0,T;\mathcal H)} \]
for $q > 1/\alpha$. Thus, if $f \in W^{\alpha(1+\beta)/2,q}(0,T;\mathcal H)$, then we have
\[ \|f - \bar F_{\mathcal P}\|_{L^q(0,T;\mathcal H)} \le C \tau^{\alpha(1+\beta)/2} |f|_{W^{\alpha(1+\beta)/2,q}(0,T;\mathcal H)} \]
and hence
\[ \|f - \bar F_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \le C T^{\alpha - 1/q} \tau^{\alpha(1+\beta)/2} |f|_{W^{\alpha(1+\beta)/2,q}(0,T;\mathcal H)} \tag{5.14} \]
for some constant $C$ that depends on $\alpha$ and $q$. Combining this with Theorem 5.6, the following convergence rate is a direct consequence of Theorem 5.3.

Theorem 5.7 (improved rate: smooth energies). Assume that the energy $\Phi$ satisfies (5.12). Let $u$ be the energy solution to (1.2), and denote by $\mathcal P$ a partition of $[0,T]$ defined as in (2.2). Denote by $\widehat U_{\mathcal P}$ the solution of (4.7) starting from $U_0 \in D(\Phi)$. In this setting, if there is $q > 1/\alpha$ for which $f \in W^{\alpha(\beta+1)/2,q}(0,T;\mathcal H)$ then the error $E$, defined in (5.6), satisfies
\[ E \le \|u_0 - U_0\| + C \tau^{\alpha(\beta+1)/2} \left[ \left( \|f\|^2_{L^2_\alpha(0,T;\mathcal H)} + \Phi(U_0) - \Phi_{\inf} \right)^{(\beta+1)/4} + |f|_{W^{\alpha(\beta+1)/2,q}(0,T;\mathcal H)} \right], \]
where the constant $C$ depends on $\alpha$, $\beta$, $q$, $T$, and the problem data.
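For $\beta = 1$ the theorem above predicts an error of order $O(\tau^\alpha)$. A common way to check such a prediction is to compute observed orders $\log(e_{i-1}/e_i) / \log(\tau_{i-1}/\tau_i)$ from errors on successively refined uniform partitions. The Python sketch below does this for the first four entries of the $\alpha = 0.5$ column of Table 1 in Section 7.2; the function name is an illustrative choice, and because the tabulated errors are rounded to four digits the computed orders agree with the tabulated rates only approximately.

```python
import math


def observed_orders(taus, errors):
    """Observed convergence orders between consecutive (tau, error) pairs."""
    return [
        math.log(errors[i - 1] / errors[i]) / math.log(taus[i - 1] / taus[i])
        for i in range(1, len(taus))
    ]


# First rows of Table 1 (alpha = 0.5): step sizes and errors |u(1) - U_N|.
taus = [5.000e-02, 2.500e-02, 1.250e-02, 6.250e-03]
errors_alpha_05 = [2.829e-04, 1.996e-04, 1.409e-04, 9.954e-05]

orders = observed_orders(taus, errors_alpha_05)
print(orders)  # values close to alpha = 0.5
```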

5.3.3. Rate of convergence for linear problems.

Let us now show how for certain classes of linear problems an improved rate of convergence can be obtained. We first assume that we have a Gelfand triple, $\mathcal V \hookrightarrow \mathcal H \hookrightarrow \mathcal V'$, and that
\[ \Phi(w) = \begin{cases} \frac12 a(w,w), & w \in \mathcal V, \\ +\infty, & w \notin \mathcal V, \end{cases} \tag{5.15} \]
where $a : \mathcal V \times \mathcal V \to \mathbb R$ is a nonnegative, symmetric, bounded, and semicoercive bilinear form. In this setting, (4.1) becomes
\[ \langle D_c^\alpha u, w \rangle + a(u,w) = \langle f, w \rangle, \quad \forall w \in \mathcal V. \]
Notice that the bilinear form induces an operator $A : \mathcal V \to \mathcal V'$ given by
\[ \langle A v, w \rangle_{\mathcal V, \mathcal V'} = a(v,w), \quad \forall v, w \in \mathcal V, \]
which implies that, for almost every $t \in (0,T)$, we have a problem in $\mathcal V'$ which reads
\[ D_c^\alpha u(t) + A u(t) = f(t), \]
so that $u \in D(\partial\Phi)$ is equivalent to $A u \in \mathcal H$. The bilinear form $a$ also induces a semi-norm on $\mathcal V$
\[ [w]_{\mathcal V} = a(w,w)^{1/2}. \]
We further assume that $f \in L^2_\alpha(0,T;[\cdot]_{\mathcal V})$. More essentially we also require $u_0 \in D(\partial\Phi)$.

The motivation for an improved rate of convergence is then the following, at this stage formal, calculation. From (2.18) we have
\begin{align*}
\frac12 D_c^\alpha \|A u(t)\|^2 &\le \langle D_c^\alpha A u(t), A u(t) \rangle = \langle A u(t), A D_c^\alpha u(t) \rangle = \langle f(t) - D_c^\alpha u(t), A D_c^\alpha u(t) \rangle \\
&= a(f(t), D_c^\alpha u(t)) - [D_c^\alpha u(t)]^2_{\mathcal V} \le [f(t)]_{\mathcal V} [D_c^\alpha u(t)]_{\mathcal V} - [D_c^\alpha u(t)]^2_{\mathcal V},
\end{align*}
which then shows via (2.17) that
\[ \frac{\Gamma(\alpha)}{2} \|A u(t)\|^2 + \int_0^t (t-s)^{\alpha-1} [D_c^\alpha u(s)]^2_{\mathcal V} \, ds \le \frac{\Gamma(\alpha)}{2} \|A u_0\|^2 + \left( \int_0^t (t-s)^{\alpha-1} [f(s)]^2_{\mathcal V} \, ds \right)^{1/2} \left( \int_0^t (t-s)^{\alpha-1} [D_c^\alpha u(s)]^2_{\mathcal V} \, ds \right)^{1/2}. \]
This implies that
\[ [D_c^\alpha u]^2_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})} = \sup_{t \in [0,T]} \int_0^t (t-s)^{\alpha-1} [D_c^\alpha u(s)]^2_{\mathcal V} \, ds \le \Gamma(\alpha) \|A u_0\|^2 + \|f\|^2_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})}, \]
which says that $D_c^\alpha u$ is uniformly bounded in $L^2_\alpha(0,T;[\cdot]_{\mathcal V})$.

To make these considerations rigorous, we consider the discrete problem (4.7), which in this case reduces to
\[ (D^\alpha_{\mathcal P} \mathbf U)_n + A U_n = F_n. \]
Then the computations can be followed verbatim to obtain that
\[ \frac{\Gamma(\alpha)}{2} \|A \widehat U_{\mathcal P}(t)\|^2 + \int_0^t (t-s)^{\alpha-1} \big[ D_c^\alpha \widehat U_{\mathcal P}(s) \big]^2_{\mathcal V} \, ds \le \frac{\Gamma(\alpha)}{2} \|A U_0\|^2 + \left( \int_0^t (t-s)^{\alpha-1} \big[ D_c^\alpha \widehat U_{\mathcal P}(s) \big]^2_{\mathcal V} \, ds \right)^{1/2} \left( \int_0^t (t-s)^{\alpha-1} \big[ \bar F_{\mathcal P}(s) \big]^2_{\mathcal V} \, ds \right)^{1/2} \]
and
\[ \big[ D_c^\alpha \widehat U_{\mathcal P} \big]^2_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})} = \sup_{t \in [0,T]} \int_0^t (t-s)^{\alpha-1} \big[ D_c^\alpha \widehat U_{\mathcal P}(s) \big]^2_{\mathcal V} \, ds \le \Gamma(\alpha) \|A U_0\|^2 + \|\bar F_{\mathcal P}\|^2_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})}. \tag{5.16} \]
Similar to Lemma 2.4, we know that $\|\bar F_{\mathcal P}\|_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})} \le C \|f\|_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})}$ and hence $D_c^\alpha \widehat U_{\mathcal P}$ is uniformly bounded in $L^2_\alpha(0,T;[\cdot]_{\mathcal V})$.

With this additional regularity, we can obtain an improved rate of convergence. To see this, we will use that $\Phi$ is, essentially, quadratic to observe that in this case the error estimator, defined in (5.3), reduces to
\[ \mathcal E_{\mathcal P} = \frac12 a(\widehat U_{\mathcal P} - \bar U_{\mathcal P}, \widehat U_{\mathcal P} - \bar U_{\mathcal P}) = \frac12 \big[ \widehat U_{\mathcal P} - \bar U_{\mathcal P} \big]^2_{\mathcal V}. \tag{5.17} \]
These ingredients together give us the following improved estimate.

Theorem 5.8 (improved rate: linear problems). Assume that the energy $\Phi$ is given by (5.15), that the initial data satisfies $A u_0 \in \mathcal H$, and that $f \in L^2_\alpha(0,T;[\cdot]_{\mathcal V})$. Let $u$ be the energy solution to (1.2), and denote by $\mathcal P$ a partition of $[0,T]$ defined as in (2.2). Denote by $\widehat U_{\mathcal P}$ the solution to (4.7) starting from $U_0 \in \mathcal H$, such that $A U_0 \in \mathcal H$. In this setting, we have that
\[ \|\mathcal E_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \le C \tau^{2\alpha} \left( \|A U_0\|^2 + \|f\|^2_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})} \right), \tag{5.18} \]
where the constant $C$ depends only on $\alpha$. This, immediately, implies that
\[ E \le \|u_0 - U_0\| + C \tau^\alpha \left( \|A U_0\| + \|f\|_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})} \right) + \frac{2}{\Gamma(\alpha)} \|f - \bar F_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)}, \]
so that if, in addition, we further have $f \in W^{\alpha,q}(0,T;\mathcal H)$ for some $q > 1/\alpha$, then
\[ E \le \|u_0 - U_0\| + C \tau^\alpha \left( \|A U_0\| + \|f\|_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})} + |f|_{W^{\alpha,q}(0,T;\mathcal H)} \right) \tag{5.19} \]
where the constant $C$ depends only on $\alpha$, $q$ and $T$.

Proof. Owing to Theorem 5.3 and equation (5.14), the convergence rate (5.19) follows directly from (5.18) in the same way as Theorem 5.7. We only need to prove (5.18), that is, to bound $\|\mathcal E_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)}$. Using (5.17), for every $r \in (0,T]$ we have
\[ 2 \int_0^r (r-t)^{\alpha-1} \mathcal E_{\mathcal P}(t) \, dt = \int_0^r (r-t)^{\alpha-1} \big[ \widehat U_{\mathcal P} - \bar U_{\mathcal P} \big]^2_{\mathcal V}(t) \, dt. \]
Now, we invoke Lemma 2.6 with $p = 2$ and the semi-norm $[\cdot]_{\mathcal V}$ to obtain that
\[ \int_0^r (r-t)^{\alpha-1} \big[ \widehat U_{\mathcal P} - \bar U_{\mathcal P} \big]^2_{\mathcal V}(t) \, dt \le C \tau^{2\alpha} \big[ D_c^\alpha \widehat U_{\mathcal P} \big]^2_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})}. \]
By (5.16), we have that $D_c^\alpha \widehat U_{\mathcal P} \in L^2_\alpha(0,T;[\cdot]_{\mathcal V})$ uniformly in $\mathcal P$ and thus arrive at
\[ \int_0^r (r-t)^{\alpha-1} \big[ \widehat U_{\mathcal P} - \bar U_{\mathcal P} \big]^2_{\mathcal V}(t) \, dt \le C \tau^{2\alpha} \left( \|A U_0\|^2 + \|f\|^2_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})} \right). \]
This implies the desired bound (5.18) for $\|\mathcal E_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)}$ and finishes the proof. $\square$

6. Lipschitz perturbations

In this section, inspired by the results of [3], we consider the analysis and approximation of a fractional gradient flow with a Lipschitz perturbation. Namely, we consider the following problem
\[ \begin{cases} D_c^\alpha u(t) + \partial\Phi(u(t)) + \Psi(t, u(t)) \ni f(t), & t \in (0,T], \\ u(0) = u_0. \end{cases} \tag{6.1} \]
We assume that the perturbation function $\Psi : (0,T] \times \mathcal H \to \mathcal H$ satisfies

1. (Carathéodory) For every $w \in \mathcal H$ the mapping $t \mapsto \Psi(t,w)$ is strongly measurable on $(0,T)$ with values in $\mathcal H$. Moreover, there exists $L > 0$ such that for almost every $t \in (0,T)$ and every $w_1, w_2 \in \mathcal H$ we have
\[ \|\Psi(t,w_1) - \Psi(t,w_2)\| \le L \|w_1 - w_2\|. \]

2. (Integrability) There is $w_0 \in L^2_\alpha(0,T;\mathcal H)$ for which $t \mapsto \Psi(t, w_0(t)) \in L^2_\alpha(0,T;\mathcal H)$.

We immediately comment that our assumptions can fit the case where $\Phi$ is merely $\lambda$-convex. Moreover, these assumptions also guarantee the existence of $\psi \in L^2_\alpha(0,T;\mathbb R)$ for which
\[ \|\Psi(t,w)\| \le \psi(t) + L \|w\|, \quad \forall w \in \mathcal H. \]
Consequently $w \mapsto \Psi(\cdot, w(\cdot))$ is Lipschitz continuous in $L^2_\alpha(0,T;\mathcal H)$.

We introduce the notion of energy solution of (6.1).

Definition 6.1 (energy solution). A function $u \in L^2(0,T;\mathcal H)$ is an energy solution to (6.1) if
(i) (Initial condition) $\lim_{t \downarrow 0} \fint_0^t \|u(s) - u_0\|^2 \, ds = 0$.
(ii) (Regularity) $D_c^\alpha u \in L^2(0,T;\mathcal H)$.
(iii) (Evolution) For almost every $t \in (0,T)$ we have $D_c^\alpha u(t) + \partial\Phi(u(t)) + \Psi(t,u(t)) \ni f(t)$.

Evidently, an energy solution to (6.1) satisfies, for almost every $t \in (0,T)$ and all $w \in \mathcal H$, the EVI
\[ \langle D_c^\alpha u(t), u(t) - w \rangle + \langle \Psi(t,u(t)), u(t) - w \rangle + \Phi(u(t)) - \Phi(w) \le \langle f(t), u(t) - w \rangle. \tag{6.2} \]

6.1. Existence, uniqueness, and stability.

Our main result in this direction is the following.

Theorem 6.2 (well posedness). Assume that the energy $\Phi$ is convex, l.s.c., and with nonempty effective domain. Assume that the mapping $\Psi$ satisfies conditions 1 and 2 stated above. Let $u_0 \in D(\Phi)$ and $f \in L^2_\alpha(0,T;\mathcal H)$, then there is a unique energy solution to (6.1) in the sense of Definition 6.1. Moreover, we have that this solution satisfies
\[ \|D_c^\alpha u\|_{L^2_\alpha(0,T;\mathcal H)} \le C, \]
where the constant depends only on the problem data $\alpha$, $T$, $u_0$, $f$, $\Phi$, and $\Psi$.

Proof. We begin by proving existence. We essentially follow the idea used for the classical ODEs. A similar argument was also used in the proof of [25, Theorem 4.4].

For $w \in L^2_\alpha(0,T;\mathcal H)$ we denote by $S(w) \in L^2_\alpha(0,T;\mathcal H)$ the energy solution to
\[ D_c^\alpha u(t) + \partial\Phi(u(t)) \ni f(t) - \Psi(t,w(t)), \quad \text{a.e. } t \in (0,T], \qquad u(0) = u_0. \]
Our assumptions and the results of Theorem 4.5 guarantee that this mapping is well defined, and moreover, $S(w) \in L^\infty(0,T;\mathcal H)$. We want to show that there exists a fixed point $w$ such that $S(w) = w$. If $u_i = S(w_i)$ for $i = 1, 2$, then for almost every $t$ we have
\[ \frac12 D_c^\alpha \|u_1(t) - u_2(t)\|^2 \le -\langle \Psi(t,w_1(t)) - \Psi(t,w_2(t)), u_1(t) - u_2(t) \rangle. \]
This readily implies that
\[ \|u_1(t) - u_2(t)\|^2 \le \frac{L}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} \|w_1(s) - w_2(s)\| \|u_1(s) - u_2(s)\| \, ds \le \frac{L \|u_1 - u_2\|_{L^\infty(0,t;\mathcal H)}}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} \|w_1(s) - w_2(s)\| \, ds, \]
which as a consequence yields that, for every $t \in [0,T]$,
\[ \|u_1 - u_2\|_{L^\infty(0,t;\mathcal H)} \le \frac{L}{\Gamma(\alpha)} \|w_1 - w_2\|_{L^1_\alpha(0,t;\mathcal H)}. \]
We claim that, by induction, we can further obtain the following stability result
\[ \|S^n(w_1) - S^n(w_2)\|_{L^\infty(0,t;\mathcal H)} \le \frac{L^n t^{\alpha n}}{\Gamma(\alpha n + 1)} \|w_1 - w_2\|_{L^\infty(0,t;\mathcal H)} \tag{6.3} \]
for any $t \in [0,T]$ and positive integer $n$. In fact, for $n = 1$, we simply have
\[ \|u_1 - u_2\|_{L^\infty(0,t;\mathcal H)} \le \frac{L}{\Gamma(\alpha)} \|w_1 - w_2\|_{L^1_\alpha(0,t;\mathcal H)} \le \frac{L t^\alpha}{\Gamma(\alpha+1)} \|w_1 - w_2\|_{L^\infty(0,t;\mathcal H)}. \]
Furthermore, if (6.3) holds for $n = k$, then for $n = k+1$
\begin{align*}
\|S^{k+1}(w_1) - S^{k+1}(w_2)\|_{L^\infty(0,t;\mathcal H)} &\le \frac{L}{\Gamma(\alpha)} \|S^k(w_1) - S^k(w_2)\|_{L^1_\alpha(0,t;\mathcal H)} \\
&\le \frac{L}{\Gamma(\alpha)} \sup_{0 \le r \le t} \int_0^r (r-s)^{\alpha-1} \frac{L^k s^{\alpha k}}{\Gamma(\alpha k + 1)} \|w_1 - w_2\|_{L^\infty(0,t;\mathcal H)} \, ds \\
&= \frac{L^{k+1} t^{\alpha(k+1)}}{\Gamma(\alpha(k+1) + 1)} \|w_1 - w_2\|_{L^\infty(0,t;\mathcal H)},
\end{align*}
which proves (6.3). Now consider $w_0 \in L^2_\alpha(0,T;\mathcal H)$ and the sequence of functions defined via $w_n = S^n(w_0)$. It is easy to see that, for $n \ge 1$, we have $w_n \in L^\infty(0,T;\mathcal H)$, and $\sum_{n=1}^\infty \|w_n - w_{n+1}\|_{L^\infty(0,T;\mathcal H)}$ converges because
\[ \sum_{n=0}^\infty \frac{L^n t^{\alpha n}}{\Gamma(\alpha n + 1)} = E_\alpha(L t^\alpha). \]
This shows that $w_n \to u$ in $L^\infty(0,T;\mathcal H)$ for some $u$. Since $w_{n+1} = S(w_n)$, it follows immediately that $u = S(u)$. This proves the existence of solutions.

As for uniqueness, assume that we have two solutions $u_1$ and $u_2$. For almost every $t$, we have
\[ \frac12 D_c^\alpha \|u_1(t) - u_2(t)\|^2 \le -\langle \Psi(t,u_1(t)) - \Psi(t,u_2(t)), u_1(t) - u_2(t) \rangle \le L \|u_1(t) - u_2(t)\|^2. \]
Combining with the fact that $u_1(0) = u_2(0) = u_0$, one obtains that $\|u_1(t) - u_2(t)\| = 0$ for almost every $t$, which proves uniqueness.

Finally, the estimate on the Caputo derivative trivially follows from the iteration scheme. We skip the details. $\square$

For diversity in our arguments, we present an alternative proof. The arguments here are inspired by those of [3, Theorem 5.1].

Alternative proof of Theorem 6.2.

Let us, for $\mu > L^{1/\alpha}$, define
\[ \|w\|^2_\mu = \sup_{t \in [0,T]} e^{-\mu t} \int_0^t (t-s)^{\alpha-1} \|w(s)\|^2 \, ds, \]
which, by the obvious inequalities $e^{-\mu T} \le e^{-\mu t} \le 1$, defines an equivalent norm in $L^2_\alpha(0,T;\mathcal H)$. Let $S : L^2_\alpha(0,T;\mathcal H) \to L^2_\alpha(0,T;\mathcal H)$ be as before. As shown, if $u_i = S(w_i)$ for $i = 1, 2$, then for every $t$ we have
\[ \|u_1(t) - u_2(t)\|^2 \le \frac{L}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} \|w_1(s) - w_2(s)\| \|u_1(s) - u_2(s)\| \, ds, \]
which as a consequence yields that, for every $r \in [0,T]$,
\[ e^{-\mu r} \int_0^r (r-t)^{\alpha-1} \|u_1(t) - u_2(t)\|^2 \, dt \le \frac{L e^{-\mu r}}{\Gamma(\alpha)} \mathrm{I}(r), \]
where
\[ \mathrm{I}(r) = \int_0^r (r-t)^{\alpha-1} \int_0^t (t-s)^{\alpha-1} \|w_1(s) - w_2(s)\| \|u_1(s) - u_2(s)\| \, ds \, dt. \]
Obvious manipulations then yield
\[ \mathrm{I}(r) \le \|u_1 - u_2\|_\mu \|w_1 - w_2\|_\mu \int_0^r (r-t)^{\alpha-1} e^{\mu t} \, dt, \]
which implies
\[ e^{-\mu r} \int_0^r (r-t)^{\alpha-1} \|u_1(t) - u_2(t)\|^2 \, dt \le \frac{L}{\Gamma(\alpha)} \|u_1 - u_2\|_\mu \|w_1 - w_2\|_\mu \int_0^r (r-t)^{\alpha-1} e^{-\mu(r-t)} \, dt \le \frac{L}{\mu^\alpha} \|u_1 - u_2\|_\mu \|w_1 - w_2\|_\mu. \]
Taking the supremum over $r$ gives $\|u_1 - u_2\|_\mu \le \frac{L}{\mu^\alpha} \|w_1 - w_2\|_\mu$ with $\frac{L}{\mu^\alpha} < 1$, so that $S$ is a contraction with respect to the norm $\|\cdot\|_\mu$. We conclude then by invoking the contraction mapping principle. This unique fixed point, evidently, is an energy solution in the sense of Definition 6.1.

Uniqueness and stability follow as before. $\square$

6.2. Discretization.

Let us now present the numerical scheme for problem (6.1). We follow the previous notations and conventions regarding discretization so that, for any partition $\mathcal P$ of $[0,T]$ defined as in (2.2), we can also consider the discrete solution defined recursively via
\[ F_n - (D^\alpha_{\mathcal P} \mathbf U)_n - \Psi_n(U_n) \in \partial\Phi(U_n), \tag{6.4} \]
where $F_n$ is defined in (4.8) and $\Psi_n : \mathcal H \to \mathcal H$ is defined by
\[ \Psi_n(w) = \fint_{t_{n-1}}^{t_n} \Psi(t,w) \, dt. \]
Clearly, for every $n$, $\Psi_n$ is Lipschitz continuous with Lipschitz constant $L$. Using the definition of $D^\alpha_{\mathcal P}$ in (3.3) and $\mathbf K^{-1}_{\mathcal P,nn} = (\mathbf K_{\mathcal P,nn})^{-1} = \Gamma(\alpha+1) \tau_n^{-\alpha}$, we can rewrite (6.4) as
\[ \Gamma(\alpha+1) \tau_n^{-\alpha} U_n + \Psi_n(U_n) + \partial\Phi(U_n) \ni F_n - \sum_{i=0}^{n-1} \mathbf K^{-1}_{\mathcal P,ni} U_i. \]
Hence the discrete scheme can be recursively well-defined provided $L \tau^\alpha < \Gamma(\alpha+1)$. For this reason, moving forward, we will implicitly operate under this assumption.

It is possible to show that the discrete solutions in (6.4) satisfy
\[ \|D_c^\alpha \widehat U_{\mathcal P}\|_{L^2_\alpha(0,T;\mathcal H)} \le C, \tag{6.5} \]
with a constant that depends on problem data but is independent of the partition $\mathcal P$. To see this, we follow the arguments of either proof of Theorem 6.2, and realize that while the operator $S$ may depend on $\mathcal P$, the estimates that we obtain do not.

6.3. Error estimates.

Let us now show how to derive error estimates for the problem with Lipschitz perturbation (6.1). We recall that the energy solution $u$ to this problem satisfies (6.2). In addition, for simplicity, we will operate under the assumption that the perturbation does not depend explicitly on time, i.e., $\Psi(t,w) = \Psi(w)$ for all $w \in \mathcal H$. The general case only lengthens the discussion but brings nothing substantive to it, as the additional terms that appear can be controlled via arguments used to control terms of the form $f(t) - \bar F_{\mathcal P}(t)$.

Similar to the discussion before, we define the error estimator
\[ \mathcal E_{\mathcal P,L}(t) = \mathcal E_{\mathcal P}(t) + \langle \Psi(\bar U_{\mathcal P}(t)), \widehat U_{\mathcal P}(t) - \bar U_{\mathcal P}(t) \rangle, \tag{6.6} \]
which, as before, is nonnegative. In addition, for any $w \in \mathcal H$ we have
\begin{align*}
&\langle D_c^\alpha \widehat U_{\mathcal P}(t) + \Psi(\widehat U_{\mathcal P}(t)) - f(t), \widehat U_{\mathcal P}(t) - w \rangle + \Phi(\widehat U_{\mathcal P}(t)) - \Phi(w) \\
&\quad = \mathcal E_{\mathcal P,L}(t) + \langle \bar F_{\mathcal P}(t) - \Psi(\bar U_{\mathcal P}(t)) - D_c^\alpha \widehat U_{\mathcal P}(t), w - \bar U_{\mathcal P}(t) \rangle + \Phi(\bar U_{\mathcal P}(t)) - \Phi(w) \\
&\qquad + \langle \Psi(\bar U_{\mathcal P}(t)) - \Psi(\widehat U_{\mathcal P}(t)) + f(t) - \bar F_{\mathcal P}(t), w - \widehat U_{\mathcal P}(t) \rangle \\
&\quad \le \mathcal E_{\mathcal P,L}(t) + \langle \Psi(\bar U_{\mathcal P}(t)) - \Psi(\widehat U_{\mathcal P}(t)) + f(t) - \bar F_{\mathcal P}(t), w - \widehat U_{\mathcal P}(t) \rangle - \sigma(\bar U_{\mathcal P}(t); w).
\end{align*}
Setting $w = u(t)$ in the inequality above and setting $w = \widehat U_{\mathcal P}(t)$ in (6.2) leads to
\begin{align*}
&\left\langle D_c^\alpha \big( \widehat U_{\mathcal P} - u \big)(t), \widehat U_{\mathcal P}(t) - u(t) \right\rangle + \sigma(\bar U_{\mathcal P}(t); u(t)) + \sigma(u(t); \widehat U_{\mathcal P}(t)) \\
&\quad \le \mathcal E_{\mathcal P,L}(t) + \langle \Psi(\bar U_{\mathcal P}(t)) - \Psi(\widehat U_{\mathcal P}(t)) + f(t) - \bar F_{\mathcal P}(t), u(t) - \widehat U_{\mathcal P}(t) \rangle + \langle \Psi(\widehat U_{\mathcal P}(t)) - \Psi(u(t)), u(t) - \widehat U_{\mathcal P}(t) \rangle \tag{6.7}
\end{align*}
for almost every $t \in (0,T)$. This implies the following error estimates.

Theorem 6.3 (a posteriori: Lipschitz perturbations). Let $u$ be the unique energy solution of (6.1). Let $\mathcal P$ be a partition of $[0,T]$ defined as in (2.2) and let $\mathbf U \in \mathcal H^N$ be the discrete solution given by (6.4) starting from $U_0 \in D(\Phi)$. Let $E$ and $\mathcal E_{\mathcal P,L}$ be defined in (5.6) and (6.6), respectively. The following a posteriori error estimate holds
\[ E \le \Big( \|u_0 - U_0\|^2 + \frac{2}{\Gamma(\alpha)} \|\mathcal E_{\mathcal P,L}\|_{L^1_\alpha(0,T;\mathcal H)} \Big)^{1/2} \left( E_\alpha(2 L T^\alpha) \right)^{1/2} + \frac{2}{\Gamma(\alpha)} \left( \|f - \bar F_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} + L \|\bar U_{\mathcal P} - \widehat U_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \right) E_\alpha(2 L T^\alpha). \tag{6.8} \]

Proof. We argue as in the proof of Theorem 5.3. To make formulas shorter we omit the coercivity terms. From (2.18) and (6.7) we infer
\begin{align*}
\frac12 D_c^\alpha \|\widehat U_{\mathcal P} - u\|^2(t) &\le \left\langle D_c^\alpha \big( \widehat U_{\mathcal P} - u \big)(t), \widehat U_{\mathcal P}(t) - u(t) \right\rangle \\
&\le \mathcal E_{\mathcal P,L}(t) + \langle \Psi(t, \bar U_{\mathcal P}(t)) - \Psi(t, \widehat U_{\mathcal P}(t)) + f(t) - \bar F_{\mathcal P}(t), u(t) - \widehat U_{\mathcal P}(t) \rangle + L \|\widehat U_{\mathcal P}(t) - u(t)\|^2 \\
&\le \mathcal E_{\mathcal P,L}(t) + \left( L \|\bar U_{\mathcal P}(t) - \widehat U_{\mathcal P}(t)\| + \|f(t) - \bar F_{\mathcal P}(t)\| \right) \|\widehat U_{\mathcal P}(t) - u(t)\| + L \|\widehat U_{\mathcal P}(t) - u(t)\|^2. \tag{6.9}
\end{align*}
Then the error estimate (6.8) follows from Lemma 2.8 with $\lambda = L$,
\[ a(t) = \|(\widehat U_{\mathcal P} - u)(t)\|^2, \quad b = 0, \quad c = 2 \mathcal E_{\mathcal P,L}(t), \quad d(t) = L \|\bar U_{\mathcal P}(t) - \widehat U_{\mathcal P}(t)\| + \|(f - \bar F_{\mathcal P})(t)\|. \qquad \square \]

We also comment here that by Lemma 2.6
\[ \|\bar U_{\mathcal P} - \widehat U_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \le C \tau^\alpha \|D_c^\alpha \widehat U_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \le \frac{C T^{\alpha/2}}{\alpha^{1/2}} \tau^\alpha \|D_c^\alpha \widehat U_{\mathcal P}\|_{L^2_\alpha(0,T;\mathcal H)}, \]
where the constant $C$ only depends on $\alpha$. In addition, the norm on the right hand side is bounded independently of the partition $\mathcal P$; see (6.5). Hence the convergence rates proved in Theorems 5.5 and 5.7 also hold for problems with a Lipschitz perturbation. Since the proofs are almost identical, we only state the theorems below without proofs.

Theorem 6.4 (convergence rate: Lipschitz perturbations). Let $u$ be the energy solution of (6.1). Let $\mathcal P$ be a partition of $[0,T]$ defined as in (2.2) and $\mathbf U \in \mathcal H^N$ be the discrete solution given by (6.4) starting from $U_0 \in D(\Phi)$. Let $E$ be defined in (5.6). Then we have
\[ E \le \|u_0 - U_0\| \left( E_\alpha(2 L T^\alpha) \right)^{1/2} + C \tau^{\alpha/2} \left( \|f\|_{L^2_\alpha(0,T;\mathcal H)} + \|D_c^\alpha \widehat U_{\mathcal P}\|_{L^2_\alpha(0,T;\mathcal H)} \right), \]
where the constant $C$ depends only on $\alpha$, $L$ and $T$, but not on $\mathcal P$.
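The stability factor $E_\alpha(2 L T^\alpha)$ in (6.8) and in the theorem above is a value of the Mittag-Leffler function, $E_\alpha(z) = \sum_{n \ge 0} z^n / \Gamma(\alpha n + 1)$. As a rough illustration of how it can be evaluated, the Python sketch below sums the truncated power series for real arguments. This is only a minimal sketch: the function name, tolerance and term limit are arbitrary choices, and the plain series is adequate only for moderate $|z|$ (for large negative $z$ it suffers from cancellation, and for large positive $z$ the terms may overflow).

```python
import math


def mittag_leffler(alpha, z, tol=1e-16, max_terms=1000):
    """Truncated series E_alpha(z) = sum_{n>=0} z^n / Gamma(alpha*n + 1), real z."""
    if z == 0.0:
        return 1.0
    log_abs_z = math.log(abs(z))
    total = 0.0
    for n in range(max_terms):
        # |z|^n / Gamma(alpha*n + 1), computed through logarithms
        magnitude = math.exp(n * log_abs_z - math.lgamma(alpha * n + 1.0))
        total += magnitude if (z > 0 or n % 2 == 0) else -magnitude
        if n > 0 and magnitude < tol * max(1.0, abs(total)):
            break
    return total


# Hypothetical data L = 1, T = 1, alpha = 0.5: factor multiplying the estimator.
stability_factor = mittag_leffler(0.5, 2.0 * 1.0 * 1.0 ** 0.5)
```

The same routine can be used to tabulate a reference solution of the form $E_\alpha(-\lambda t^\alpha)$ for moderate values of $\lambda t^\alpha$.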

Theorem 6.5 (improved rate: smooth energies and Lipschitz perturbations). Assume that the energy $\Phi$ satisfies (5.12). Let $u$ be the energy solution to (6.1), and denote by $\mathcal P$ a partition of $[0,T]$ defined as in (2.2). Denote by $\widehat U_{\mathcal P}$ the solution of (6.4) starting from $U_0 \in D(\Phi)$. In this setting, if there is $q > 1/\alpha$ for which $f \in W^{\alpha(\beta+1)/2,q}(0,T;\mathcal H)$ then the error $E$, defined in (5.6), satisfies
\begin{align*}
E \le{}& \|u_0 - U_0\| \left( E_\alpha(2 L T^\alpha) \right)^{1/2} + C_1 \tau^\alpha \|D_c^\alpha \widehat U_{\mathcal P}\|_{L^2_\alpha(0,T;\mathcal H)} \\
&+ C_2 \tau^{\alpha(\beta+1)/2} \left[ \left( \|f\|_{L^2_\alpha(0,T;\mathcal H)} + \|D_c^\alpha \widehat U_{\mathcal P}\|_{L^2_\alpha(0,T;\mathcal H)} \right)^{(\beta+1)/2} + |f|_{W^{\alpha(\beta+1)/2,q}(0,T;\mathcal H)} \right], \tag{6.10}
\end{align*}
where the constants $C_1$ and $C_2$ depend only on $\alpha$, $\beta$, $q$, $L$, $T$, and the problem data, but are independent of $\mathcal P$.

Finally we consider the setting of Section 5.3.3 with a Lipschitz perturbation. Similar to (6.5), we can show that $\|D_c^\alpha \widehat U_{\mathcal P}\|_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})}$ is bounded uniformly with respect to the partition $\mathcal P$. For this reason, an improved error estimate analogous to Theorem 5.8 can be proved in this case.

Theorem 6.6 (improved rate: quadratic energies and Lipschitz perturbations). Assume that the energy $\Phi$ is given by (5.15), that the initial data satisfies $A u_0 \in \mathcal H$, and that $f \in L^2_\alpha(0,T;[\cdot]_{\mathcal V})$. Let $u$ be the energy solution to (6.1), and denote by $\mathcal P$ a partition of $[0,T]$ defined as in (2.2). Denote by $\widehat U_{\mathcal P}$ the solution to (6.4) starting from $U_0 \in \mathcal H$, such that $A U_0 \in \mathcal H$. In this setting, we have that
\begin{align*}
E \le{}& \|u_0 - U_0\| \left( E_\alpha(2 L T^\alpha) \right)^{1/2} + C \|f - \bar F_{\mathcal P}\|_{L^1_\alpha(0,T;\mathcal H)} \\
&+ C \tau^\alpha \left( \|A U_0\| + \|f\|_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})} + \|D_c^\alpha \widehat U_{\mathcal P}\|_{L^2_\alpha(0,T;[\cdot]_{\mathcal V})} + \|D_c^\alpha \widehat U_{\mathcal P}\|_{L^2_\alpha(0,T;\mathcal H)} \right) \tag{6.11}
\end{align*}
where the constant $C$ depends only on $\alpha$, $L$ and $T$.

7. Numerical illustrations

In this section we present some simple numerical examples aimed at illustrating, and extending, our theory. All the computations were done with an in-house code that was written in MATLAB©.

7.1. Practical a posteriori estimators.

We begin by commenting that, unlike the a posteriori estimators for the classical gradient flow proposed in [30], our a posteriori estimator $\mathcal E_{\mathcal P}$ is not constant on each subinterval of our partition $\mathcal P$; see (5.3). Here we mention more computationally friendly alternatives, and their properties.

First, we define an estimator that is piecewise constant in time via
\[ \mathcal D_{\mathcal P}(t) = \max_{s \in [\lfloor t \rfloor_{\mathcal P}, \lceil t \rceil_{\mathcal P}]} \left\{ \langle D_c^\alpha \widehat U_{\mathcal P}(s) - \bar F_{\mathcal P}(s), \widehat U_{\mathcal P}(s) - \bar U_{\mathcal P}(s) \rangle + \Phi(\widehat U_{\mathcal P}(s)) - \Phi(\bar U_{\mathcal P}(s)) \right\}. \]
This is clearly an upper bound for $\mathcal E_{\mathcal P}(t)$.

One may also consider the simpler indicator
\[ \widetilde{\mathcal E}_{\mathcal P,n} = \langle (D^\alpha_{\mathcal P} \mathbf U)_n - F_n, U_{n-1} - U_n \rangle + \Phi(U_{n-1}) - \Phi(U_n), \quad n = 1, \ldots, N. \tag{7.1} \]
Although it is not always true that $\mathcal E_{\mathcal P}(t) \le \widetilde{\mathcal E}_{\mathcal P,n(t)}$, this indicator is convenient to use in practice and gives reasonable results. In fact, this is the one that we implemented in the numerical examples of Section 7.3 below.

7.2. A linear one dimensional example.

As a first simple example we consider the one dimensional fractional ODE
\[
D_c^{\alpha} u + \lambda u = 0, \qquad u(0) = 1, \tag{7.2}
\]
with $\lambda > 0$. From (2.19) we have $u(t) = E_\alpha(-\lambda t^\alpha)$. This, obviously, fits our framework with $H = \mathbb{R}$ and $\Phi(w) = \frac{\lambda}{2} |w|^2$. Notice also that all the assumptions of Section 5.3.2 are satisfied with $\beta = 1$. Thus, we expect a rate of order $O(\tau^\alpha)$ when using (4.7) to approximate the solution over a uniform partition with time step $\tau$.

Table 1 shows, for $\lambda = 0.001$ and different values of $\alpha$, the difference $|u(1) - U_N|$ which we use as a proxy for the error $E_H$ of (5.6). The rate of convergence is verified.

| $\tau$ | $\lvert u(1) - U_N \rvert$, $\alpha = 0.3$ | rate | $\lvert u(1) - U_N \rvert$, $\alpha = 0.5$ | rate | $\lvert u(1) - U_N \rvert$, $\alpha = 0.7$ | rate |
|---|---|---|---|---|---|---|
| 5.000e-02 | 4.563e-04 | - | 2.829e-04 | - | 1.235e-04 | - |
| 2.500e-02 | 3.702e-04 | 0.301417 | 1.996e-04 | 0.503051 | 7.571e-05 | 0.705417 |
| 1.250e-02 | 3.005e-04 | 0.300979 | 1.409e-04 | 0.502309 | 4.646e-05 | 0.704620 |
| 6.250e-03 | 2.440e-04 | 0.300664 | 9.954e-05 | 0.501710 | 2.852e-05 | 0.703871 |
| 3.125e-03 | 1.981e-04 | 0.300445 | 7.032e-05 | 0.501248 | 1.752e-05 | 0.703207 |
| 1.563e-03 | 1.609e-04 | 0.300297 | 4.969e-05 | 0.500902 | 1.076e-05 | 0.702638 |
| 7.813e-04 | 1.307e-04 | 0.300199 | 3.512e-05 | 0.500648 | 6.616e-06 | 0.702160 |
| 3.906e-04 | 1.061e-04 | 0.300133 | 2.483e-05 | 0.500463 | 4.068e-06 | 0.701764 |
| 1.953e-04 | 8.619e-05 | 0.300090 | 1.755e-05 | 0.500330 | 2.502e-06 | 0.701437 |
| 9.766e-05 | 7.001e-05 | 0.300062 | 1.241e-05 | 0.500235 | 1.539e-06 | 0.701170 |
| 4.883e-05 | 5.686e-05 | 0.300043 | 8.773e-06 | 0.500166 | 9.465e-07 | 0.700952 |
| 2.441e-05 | 4.619e-05 | 0.300030 | 6.203e-06 | 0.500118 | 5.823e-07 | 0.700774 |

Table 1. Convergence rate for the approximation of (7.2) using scheme (4.7) over a uniform partition of size $\tau$. As predicted by Section 5.3.2, the rate is $O(\tau^\alpha)$.

Figure 2. Adaptive time stepping for problem (7.2) with T = 1, λ = 1, α = is used to achieve a tolerance of ε = 10 − . The adaptive solver uses 8,747 time intervals with minimum time step 6. × − and max time step 5. × − .

7.3. Adaptive time stepping.
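A step-size search of the kind used in this subsection can be sketched as follows. This is only an illustration under stated assumptions: `indicator` is a hypothetical callback that returns the (already scaled) value of the local indicator for a trial step, `threshold` is the value it must not exceed, and the halving strategy with a lower bound `tau_min` is an assumption of this sketch, not necessarily what the MATLAB code used for the experiments does.

```python
def select_step(indicator, tau_start, threshold, shrink=0.5, tau_min=1e-14):
    """Shrink a trial step until the scaled local indicator meets the threshold.

    indicator : hypothetical callable, tau -> scaled indicator value for that step
    tau_start : first trial step
    threshold : value the indicator must not exceed
    """
    tau = tau_start
    while indicator(tau) > threshold:
        tau *= shrink  # reject the trial step and try a smaller one
        if tau < tau_min:
            raise RuntimeError("no admissible step above tau_min")
    return tau
```

With a toy indicator `lambda tau: tau ** 0.5` and `threshold=0.1`, the loop halves the step from `1.0` until the indicator drops below the threshold. In an actual solver each call to `indicator` would involve one implicit solve with the trial step.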

We now illustrate the use of the a posteriori error estimator $\mathcal{E}_{\mathcal{P}}$ given in (5.3) to drive the selection of the size of the time step. For a given tolerance $\varepsilon$ we, at every step, choose the local time step $\tau_n$ to guarantee that
\[
\frac{2 T^{\alpha}}{\Gamma(\alpha + 1)} \widetilde{\mathcal{E}}_{\mathcal{P},n} \le \varepsilon,
\]
where $\widetilde{\mathcal{E}}_{\mathcal{P},n}$ is given in (7.1). Then, by Theorem 5.3, we expect that
\[
\| u - \widehat{U}_{\mathcal{P}} \|_{L^\infty(0,T;H)} \le \varepsilon,
\]
provided the approximation error ‖f − F P‖ L α (0,T;H) is negligible. Notice that to drive the process we are using the simpler estimator $\widetilde{\mathcal{E}}_{\mathcal{P}}$; see the discussion in Section 7.1.

We consider the linear problem (7.2) with λ = 1 and α = and set ε = 10 − . Figure 2 shows the local time step $\tau(t)$ for $t \in [0, T]$. As expected, due to the weak singularity of $u$ at $t = 0$ the time step must be rather small for small times. For larger times, however, the solution is smoother and larger local time steps can be taken. With this process we obtain that ‖u − Û P‖ L ∞ (0,T;H) ≈ . × − , and this requires N = 8,747 time subintervals. For comparison, choosing a uniform time step of τ = 6. × − we require N = 163,840 time intervals. This achieves an error of ε = 4. × − , which is slightly higher than that obtained with our adaptive procedure. This clearly shows the advantages and possibilities for this strategy.

| $\tau$ | $\lvert U(N_k) - U(N_{k-1}) \rvert$ | rate |
|---|---|---|
| 7.813e-04 | - | - |
| 3.906e-04 | 1.256e-06 | - |
| 1.953e-04 | 6.276e-07 | 1.001307 |
| 9.766e-05 | 3.135e-07 | 1.001298 |
| 4.883e-05 | 1.568e-07 | 0.999272 |
| 2.441e-05 | 7.827e-08 | 1.002774 |
| 1.221e-05 | 3.924e-08 | 0.996178 |

Table 2. Convergence rate for α = 0. , p = 1.5, and λ = 1 in Example 1 of Section 7.4. The rate seems to be of order $O(\tau)$, which is better than what the theory predicts.

7.4. Some nonlinear one dimensional examples.
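The observed rates reported for the examples of this subsection are obtained by comparing approximations computed with successively halved step sizes. A minimal sketch of that bookkeeping is given below, assuming the values $U(N_k)$ are already available in a list ordered by decreasing step size and that base-2 logarithms are used; the function name `observed_rates` is hypothetical.

```python
import math


def observed_rates(values):
    """Observed rates from approximations computed with step sizes tau_k = 2**(-k).

    values[k] plays the role of U(N_k); the rate compares two consecutive
    differences, log2|U(N_{k-1}) - U(N_{k-2})| - log2|U(N_k) - U(N_{k-1})|.
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return [math.log2(d_prev) - math.log2(d_next)
            for d_prev, d_next in zip(diffs, diffs[1:])]
```

For synthetic data of the form $U(N_k) = 1 + \tau_k^{\alpha}$ the consecutive differences shrink by the factor $2^{\alpha}$, so every entry returned equals $\alpha$ up to rounding. Note that two refinements are needed before the first rate can be formed, which is why the first two rows of the rate column in such tables are empty.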

We now, while staying in one dimension, depart from the linear theory and illustrate the performance of our method in a series of nonlinear examples of increasing difficulty. In all the examples we set $H = \mathbb{R}$ and $f = 0$. Thus, we will only specify the energy and initial condition in each case.

In all the examples, since the exact solution is not known, we compare the solutions at different time levels. Specifically, we let $\tau_k = 2^{-k}$ and, upon denoting by $U(N_k)$ the approximate solution at $T = 1$ computed with step size $\tau_k$, we compute
\[
\mathrm{rate}_k = \log_2 \left( |U(N_{k-1}) - U(N_{k-2})| \right) - \log_2 \left( |U(N_k) - U(N_{k-1})| \right).
\]

7.4.1. Example 1.
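Each step of this example requires the solution of a scalar equation of the form $U + c\,U|U|^{p-2} - W = 0$ with $p \in (1,2)$, where $c > 0$ is the step-dependent constant and $W$ is known. A minimal Newton sketch is given below. It assumes the initial guess $U = W$ and that $c$ is small relative to $|W|^{2-p}$, so that the first iterate keeps the sign of $W$; the function name `solve_power_step` and the stopping rule are illustrative and are not taken from the MATLAB code used for the experiments.

```python
import math


def solve_power_step(W, c, p, tol=1e-14, max_iter=100):
    """Newton iteration for g(U) = U + c*U*|U|**(p-2) - W = 0 with 1 < p < 2."""
    if W == 0.0:
        return 0.0  # U = 0 solves the equation when W = 0
    U = W  # assumed initial guess; adequate when c is small (small time steps)
    for _ in range(max_iter):
        g = U + c * math.copysign(abs(U) ** (p - 1.0), U) - W
        dg = 1.0 + c * (p - 1.0) * abs(U) ** (p - 2.0)  # singular at U = 0
        step = g / dg
        U -= step
        if abs(step) <= tol * max(1.0, abs(U)):
            return U
    raise RuntimeError("Newton iteration did not converge")
```

For $p = 3/2$ the equation is a quadratic in $\sqrt{U}$, which gives a closed form to compare against: with $W > 0$, $U = s^2$ where $s = \tfrac{1}{2}\bigl(-c + \sqrt{c^2 + 4W}\bigr)$. The derivative of $g$ blows up at $U = 0$, which is one reason the iteration is only expected to behave well when the step-dependent constant is small.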

We let $p \in (1, 2)$ and set
\[
\Phi(w) = \frac{\lambda}{p} |w|^p, \qquad u_0 = \frac{1}{10}.
\]
Notice that this example fits the framework of Section 5.3.2 with $\beta = p - 1$. However, as mentioned there, it is not expected that the solution reaches zero in finite time, so we do not expect a reduced rate.

To compute the discrete solution, at every time step, we need to solve a nonlinear equation of the form
\[
U_n + c\, U_n |U_n|^{p-2} - W_n = 0, \qquad c = \frac{\lambda \tau^{\alpha}}{\Gamma(\alpha + 1)},
\]
where $W_n$ is known. We found the solution to this problem using Newton's method, which works for small values of $\tau$.

Table 2 shows the results for α = 0. , p = 1.5, and λ = 1. These clearly indicate a rate of $O(\tau)$.

7.4.2. Example 2.
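Each step of this example requires the solution of a scalar equation of the form $U + c \ln(U) - W = 0$ on $U > 0$. A minimal Newton sketch is given below, assuming a very small positive initial guess that lies to the left of the root. Since $g(U) = U + c\ln U - W$ is increasing and concave on $(0,\infty)$, Newton iterates started to the left of the root increase monotonically towards it and therefore remain positive. The function name `solve_log_step`, the default guess `1e-12` and the stopping rule are illustrative and are not taken from the MATLAB code used for the experiments.

```python
import math


def solve_log_step(W, c, U_init=1e-12, tol=1e-14, max_iter=200):
    """Newton iteration for g(U) = U + c*log(U) - W = 0 on U > 0.

    U_init is assumed to lie to the left of the root (g(U_init) < 0).
    """
    U = U_init
    for _ in range(max_iter):
        g = U + c * math.log(U) - W
        dg = 1.0 + c / U
        step = g / dg
        U -= step
        if abs(step) <= tol * max(1.0, U):
            return U
    raise RuntimeError("Newton iteration did not converge")
```

Starting from a guess to the right of the root may instead produce a negative iterate, for which the logarithm is undefined. This is consistent with the difficulties at the initial time step mentioned in the text.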

We set
\[
\Phi(w) = \lambda (w \ln w - w), \qquad u_0 = 0,
\]
with $\lambda > 0$, so that $D(\Phi) = [0, \infty)$. Notice that $u_0 \in D(\Phi) \setminus D(\partial \Phi)$.

At each time step one needs to solve a problem of the form
\[
U_n + c \ln(U_n) - W_n = 0, \qquad c = \frac{\lambda \tau^{\alpha}}{\Gamma(\alpha + 1)},
\]
and $W_n$ is known. This is solved with a Newton scheme, which runs into difficulties at the initial time step. We go around this issue by using as initial value for the iteration a very small positive number.

Table 3 presents the results for α = 0. and λ = 10 − . These indicate that the convergence rate is $O(\tau^\alpha)$. Similar results for other choices of $\alpha$ and $\lambda$ were obtained.

| $\tau$ | $\lvert U(N_k) - U(N_{k-1}) \rvert$ | rate |
|---|---|---|
| 5.000e-02 | - | - |
| 2.500e-02 | 6.761e-07 | - |
| 1.250e-02 | 5.330e-07 | 0.342991 |
| 6.250e-03 | 4.088e-07 | 0.382916 |
| 3.125e-03 | 3.077e-07 | 0.409583 |
| 1.563e-03 | 2.290e-07 | 0.426328 |
| 7.813e-04 | 1.689e-07 | 0.439295 |
| 3.906e-04 | 1.238e-07 | 0.447818 |
| 1.953e-04 | 9.039e-08 | 0.453936 |
| 9.766e-05 | 6.579e-08 | 0.458399 |
| 4.883e-05 | 4.777e-08 | 0.461709 |
| 2.441e-05 | 3.463e-08 | 0.464209 |
| 1.221e-05 | 2.507e-08 | 0.466136 |
| 6.104e-06 | 1.813e-08 | 0.467656 |
| 3.052e-06 | 1.310e-08 | 0.468883 |

Table 3. Convergence rate for α = 0. , λ = 10 − in Example 2 of Section 7.4. The rate seems to be of order $O(\tau^\alpha)$, which is better than what the theory predicts.

7.4.3. Example 3.

As a final example we consider Φ( w ) = − λ q − (1 − u ) , u = 0 . Notice that $D(\Phi) = [0, \infty)$ and, once again, $u_0 \in D(\Phi) \setminus D(\partial \Phi)$. Table 4 presents the results for α = 0. and λ = 10 − . We, again, seem to get a rate that is better than what the theory predicts.

| $\tau$ | $\lvert U(N_k) - U(N_{k-1}) \rvert$ | rate |
|---|---|---|
| 5.000e-02 | - | - |
| 2.500e-02 | 3.370e-07 | - |
| 1.250e-02 | 1.881e-07 | 0.840996 |
| 6.250e-03 | 1.033e-07 | 0.864944 |
| 3.125e-03 | 5.607e-08 | 0.881286 |
| 1.563e-03 | 3.019e-08 | 0.893168 |
| 7.813e-04 | 1.615e-08 | 0.902281 |
| 3.906e-04 | 8.599e-09 | 0.909574 |
| 1.953e-04 | 4.559e-09 | 0.915606 |
| 9.766e-05 | 2.408e-09 | 0.920723 |
| 4.883e-05 | 1.268e-09 | 0.925149 |
| 2.441e-05 | 6.660e-10 | 0.929039 |
| 1.221e-05 | 3.490e-10 | 0.932497 |
| 6.104e-06 | 1.825e-10 | 0.935603 |

Table 4. Convergence rate for α = 0. , λ = 10 − in Example 3 of Section 7.4. The rate seems to be of order $O(\tau)$, which is better than what the theory predicts.

Acknowledgement

AJS is partially supported by NSF grant DMS-1720213.


References

1. E. Affili and E. Valdinoci, Decay estimates for evolution equations with classical and fractional time-derivatives, J. Differential Equations (2019), no. 7, 4027–4060. MR 3912710
2. R.P. Agarwal and B. Ahmad, Existence theory for anti-periodic boundary value problems of fractional differential equations and inclusions, Comput. Math. Appl. (2011), no. 3, 1200–1214. MR 2824708
3. G. Akagi, Fractional flows driven by subdifferentials in Hilbert spaces, Israel J. Math. (2019), no. 2, 809–862. MR 4040846
4. M. Allen, Hölder regularity for nondivergence nonlocal parabolic equations, Calc. Var. Partial Differential Equations (2018), no. 4, Paper No. 110, 29. MR 3826717
5. M. Allen, A nondivergence parabolic problem with a fractional time derivative, Differential Integral Equations (2018), no. 3-4, 215–230. MR 3738196
6. M. Allen, L. Caffarelli, and A. Vasseur, A parabolic problem with a fractional time derivative, Arch. Ration. Mech. Anal. (2016), no. 2, 603–630. MR 3488533
7. M. Allen, L. Caffarelli, and A. Vasseur, Porous medium flow with both a fractional potential pressure and fractional time derivative, Chin. Ann. Math. Ser. B (2017), no. 1, 45–82. MR 3592156
8. I. Benedetti, V. Obukhovskii, and V. Taddei, On noncompact fractional order differential inclusions with generalized boundary condition and impulses in a Banach space, J. Funct. Spaces (2015), Art. ID 651359, 10. MR 3335453
9. A. Bernardis, F.J. Martín-Reyes, P.R. Stinga, and J.L. Torrea, Maximum principles, extension problem and inversion for nonlocal one-sided equations, J. Differential Equations (2016), no. 7, 6333–6362. MR 3456835
10. L. Caffarelli and L. Silvestre, An extension problem related to the fractional Laplacian, Comm. Partial Differential Equations (2007), no. 7-9, 1245–1260. MR 2354493
11. M. Caputo, Linear models of dissipation whose q is almost frequency independent-ii, Geophysical Journal of the Royal Astronomical Society (1967), no. 5, 529–539.
12. A. Cernea, On a fractional differential inclusion arising from real estate asset securitization and HIV models, Ann. Univ. Buchar. Math. Ser. (2013), no. 2, 447–453. MR 3164777
13. F.H. Clarke, Optimization and nonsmooth analysis, second ed., Classics in Applied Mathematics, vol. 5, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990. MR 1058436
14. B. de Andrade and T.S. Cruz, Regularity theory for a nonlinear fractional reaction-diffusion equation, Nonlinear Anal. (2020), 111705, 14. MR 4080675
15. D. del Castillo-Negrete, Fractional diffusion models of nonlocal transport, Phys. Plasmas (2006), no. 8, 082308, 16. MR 2249732
16. X. Feng and M. Sutton, A new theory of fractional differential calculus, arXiv:2007.10244, 2020.
17. Y. Feng, L. Li, J.-G. Liu, and X. Xu, Continuous and discrete one dimensional autonomous fractional ODEs, Discrete Contin. Dyn. Syst. Ser. B (2018), no. 8, 3109–3135. MR 3848192
18. I.M. Gel'fand and G.E. Šilov, Obobshchennye funksii i deĭstviya iad nimi, Obobščennye funkcii, Vypusk 1., Gosudarstv. Izdat. Fiz.-Mat. Lit., Moscow, 1958, (In Russian). MR 0097715
19. R. Gorenflo, A.A. Kilbas, F. Mainardi, and S.V. Rogosin, Mittag-Leffler functions, related topics and applications, Springer Monographs in Mathematics, Springer, Heidelberg, 2014. MR 3244285
20. L. Grafakos, Classical Fourier analysis, third ed., Graduate Texts in Mathematics, vol. 249, Springer, New York, 2014. MR 3243734
21. T.D. Ke, N.N. Thang, and L. Tran P. Thuy, Regularity and stability analysis for a class of semilinear nonlocal differential equations in Hilbert spaces, J. Math. Anal. Appl. (2020), no. 2, 123655, 23. MR 4037586
22. J. Kemppainen, J. Siljander, V. Vergara, and R. Zacher, Decay estimates for time-fractional and other non-local in time subdiffusion equations in R^d, Math. Ann. (2016), no. 3-4, 941–979. MR 3563229
23. M. Krasnoschok, V. Pata, S.V. Siryk, and N. Vasylyeva, A subdiffusive Navier-Stokes-Voigt system, Phys. D (2020), 132503, 13. MR 4087352
24. M. Krasnoschok, V. Pata, and N. Vasylyeva, Semilinear subdiffusion with memory in multidimensional domains, Math. Nachr. (2019), no. 7, 1490–1513. MR 3982325
25. L. Li and J.-G. Liu, A generalized definition of Caputo derivatives and its application to fractional ODEs, SIAM J. Math. Anal. (2018), no. 3, 2867–2900. MR 3809535
26. L. Li and J.-G. Liu, A note on deconvolution with completely monotone sequences and discrete fractional calculus, Quart. Appl. Math. (2018), no. 1, 189–198. MR 3733099
27. L. Li and J.-G. Liu, Some compactness criteria for weak solutions of time fractional PDEs, SIAM J. Math. Anal. (2018), no. 4, 3963–3995. MR 3828856
28. L. Li and J.-G. Liu, A discretization of Caputo derivatives with application to time fractional SDEs and gradient flows, SIAM J. Numer. Anal. (2019), no. 5, 2095–2120. MR 4000219
29. Y. Lin, X. Li, and C. Xu, Finite difference/spectral approximations for the fractional cable equation, Math. Comp. (2011), no. 275, 1369–1396. MR 2785462
30. R.H. Nochetto, G. Savaré, and C. Verdi, A posteriori error estimates for variable time-step discretizations of nonlinear evolution equations, Comm. Pure Appl. Math. (2000), no. 5, 525–589. MR 1737503
31. C. Quan, T. Tang, and J. Yang, How to define dissipation-preserving energy for time-fractional phase-field equations, arXiv:2007.14855, 2020.
32. T. Roubíček, Nonlinear partial differential equations with applications, second ed., International Series of Numerical Mathematics, vol. 153, Birkhäuser/Springer Basel AG, Basel, 2013. MR 3014456
33. W. Schirotzek, Nonsmooth analysis, Universitext, Springer, Berlin, 2007. MR 2330778
34. P.R. Stinga and J.L. Torrea, Extension problem and Harnack's inequality for some fractional operators, Comm. Partial Differential Equations (2010), no. 11, 2092–2122. MR 2754080
35. M. Stynes, Too much regularity may force too much uniqueness, Fract. Calc. Appl. Anal. (2016), no. 6, 1554–1562. MR 3589365
36. M. Stynes, Fractional-order derivatives defined by continuous kernels are too restrictive, Appl. Math. Lett. (2018), 22–26. MR 3820275
37. M. Stynes, Singularities, Handbook of fractional calculus with applications. Vol. 3, De Gruyter, Berlin, 2019, pp. 287–305. MR 3966570
38. T. Tang, H. Yu, and T. Zhou, On energy dissipation theory and numerical stability for time-fractional phase-field equations, SIAM J. Sci. Comput. (2019), no. 6, A3757–A3778. MR 4036095
39. V. Vergara and R. Zacher, Lyapunov functions and convergence to steady state for differential equations of fractional order, Math. Z. (2008), no. 2, 287–309. MR 2390082
40. A.N. Vityuk, Existence of solutions of a differential inclusion of fractional order with an upper-semicontinuous right-hand side, Ukraïn. Mat. Zh. (1999), no. 11, 1562–1565. MR 1744336
41. R. Zacher, A weak Harnack inequality for fractional differential equations, J. Integral Equations Appl. (2007), no. 2, 209–232. MR 2355009
42. R. Zacher, Global strong solvability of a quasilinear subdiffusion problem, J. Evol. Equ. (2012), no. 4, 813–831. MR 3000457
43. R. Zacher, A De Giorgi–Nash type theorem for time fractional diffusion equations, Math. Ann. (2013), no. 1, 99–146. MR 3038123
44. R. Zacher, A weak Harnack inequality for fractional evolution equations with discontinuous coefficients, Ann. Sc. Norm. Super. Pisa Cl. Sci. (5) (2013), no. 4, 903–940. MR 3184573
45. R. Zacher, Time fractional diffusion equations: solution concepts, regularity, and long-time behavior, Handbook of fractional calculus with applications. Vol. 2, De Gruyter, Berlin, 2019, pp. 159–179. MR 3965393
46. Y. Zhang, Numerical treatment of the modified time fractional Fokker-Planck equation, Abstr. Appl. Anal. (2014), Art. ID 282190, 10. MR 3191030

Email address , W. Li: [email protected]

Email address , A.J. Salgado: [email protected]@utk.edu