A WAVELET-IN-TIME, FINITE ELEMENT-IN-SPACE ADAPTIVE METHOD FOR PARABOLIC EVOLUTION EQUATIONS

ROB STEVENSON, RAYMOND VAN VENETIË, JAN WESTERDIEP

ABSTRACT. In this work, an $r$-linearly converging adaptive solver is constructed for parabolic evolution equations in a simultaneous space-time variational formulation. Exploiting the product structure of the space-time cylinder, the family of trial spaces that we consider are given as the spans of wavelets-in-time and (locally refined) finite element spaces-in-space. Numerical results illustrate our theoretical findings.
1. INTRODUCTION

This paper is about the adaptive numerical solution of parabolic evolution equations written in a simultaneous space-time variational formulation. In comparison to the usually applied time-marching schemes, simultaneous space-time solvers offer the following potential advantages:
• local, adaptive refinements simultaneously in space and time ([SY18, RS18, GS19]),
• quasi-best approximation from the selected trial space ('Céa's lemma') ([And13, LM17, SW20b]), being a necessary requirement for proving optimal rates for adaptive routines ([CS11, KSU16, RS18]),
• superior parallel performance ([DGVdZ16, NS19, HLNS19, vVW20]),
• using the product structure of the space-time cylinder, sparse tensor product approximation ([GO07, CS11, KSU16, RS18]), which allows one to solve the whole time evolution at a complexity of solving the corresponding stationary problem.
Other relevant publications on space-time solvers include [Ste15, LMN16, SZ20, Dev20, DS20].

In any case without applying sparse tensor product approximation, a disadvantage of the space-time approach is the larger memory consumption: instead of solving a sequence of PDEs on a $d$-dimensional space, one has to solve one PDE on a $(d+1)$-dimensional space. This disadvantage, however, disappears when one needs the whole time evolution simultaneously, as for example with problems of optimal control ([GK11, BRU20]) or data assimilation ([DSW20]).

Date: January 12, 2021.
2010 Mathematics Subject Classification.
Key words and phrases. Space-time variational formulations of parabolic PDEs, quasi-best approximations, least squares methods, adaptive approximation, tensor product approximation, optimal preconditioners.
The second and third authors have been supported by the Netherlands Organization for Scientific Research (NWO) under contract no. 613.001.652.
1.1. Parabolic problem in a simultaneous space-time variational formulation. For some separable Hilbert spaces $V \hookrightarrow H$ with dense embedding (e.g. $H_0^1(\Omega)$ and $L_2(\Omega)$ for the model problem of the heat equation on a spatial domain $\Omega \subset \mathbb{R}^d$), and a boundedly invertible $A(t) = A(t)' \colon V \to V'$ with $(A(t)\,\cdot)(\cdot) \eqsim \|\cdot\|_V^2$ (e.g. $(A(t)\eta)(\zeta) = \int_\Omega \nabla\eta \cdot \nabla\zeta \,dx$), we consider
$$\begin{cases} \frac{du}{dt}(t) + A(t)u(t) = g(t) & (t \in (0,T)),\\ u(0) = u_0.\end{cases}$$
An application of a variational formulation of the PDE over space and time leads to an equation
(1.1) $\begin{bmatrix} B \\ \gamma_0 \end{bmatrix} u = \begin{bmatrix} g \\ u_0 \end{bmatrix}$,
where, with $X := L_2(I;V) \cap H^1(I;V')$ and $Y := L_2(I;V)$, the operator at the left-hand side is boundedly invertible $X \to Y' \times H$.
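As a concrete finite-dimensional illustration of the spatial operator $(A(t)\eta)(\zeta) = \int_\Omega \nabla\eta\cdot\nabla\zeta\,dx$, the following toy sketch (our own, not code from the paper; 1D domain $\Omega = (0,1)$, uniform mesh, P1 elements) assembles the stiffness matrix and checks that it is symmetric positive definite, mirroring $(A(t)\cdot)(\cdot) \eqsim \|\cdot\|_V^2$ for $V = H_0^1(0,1)$:

```python
import numpy as np

# Toy sketch (not from the paper): P1 stiffness matrix of the 1D Laplacian
# on a uniform mesh of (0,1), i.e. the matrix of (A(t)η)(ζ) = ∫ η'ζ' dx.

def stiffness_p1(n):
    """Stiffness matrix for n interior nodes, mesh width h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h
    return A

A = stiffness_p1(50)
eigs = np.linalg.eigvalsh(A)
print(np.allclose(A, A.T), eigs.min() > 0)   # prints: True True
```

Symmetry and positive definiteness are exactly the discrete counterparts of the symmetry and coercivity assumptions imposed on $a(t;\cdot,\cdot)$ below.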
1.2. Our previous work. In [CS11, RS18] we equipped $X$ and $Y$ with Riesz bases being tensor products of wavelet bases in space and time, and $H$ with some spatial Riesz basis. Consequently, the equation (1.1) obtained an equivalent formulation as a bi-infinite well-posed matrix-vector equation $\begin{bmatrix} \mathbf{B} \\ \boldsymbol{\gamma}_0 \end{bmatrix} \mathbf{u} = \begin{bmatrix} \mathbf{g} \\ \mathbf{u}_0 \end{bmatrix}$ (actually, in [RS18] we considered a formulation of first order, and in [CS11] we used a variational formulation with essentially interchanged roles of $X$ and $Y$, which however is irrelevant for the current discussion). To get a coercive bilinear form we formed normal equations, to which we applied an adaptive wavelet scheme ([CDD01]). With such a scheme the norm of a sufficiently accurate approximation of the (infinite) residual vector of a current approximation is used as an a posteriori error estimator. The coefficients in modulus of this vector are applied as local error indicators in a bulk chasing (or Dörfler) marking procedure. The resulting adaptive algorithm converges at the best possible rate in linear computational complexity.

The goal of the current work is to investigate to what extent similar optimal theoretical results can be shown for finite element discretizations, whilst realizing a quantitatively superior implementation.
1.3. Least squares minimization. Without having Riesz bases for $X$ and $Y$, already the step of first discretizing and then forming normal equations does not apply, and we reverse their order. A problem equivalent to (1.1) is to compute
(1.2) $u = \operatorname*{argmin}_{w \in X} \|Bw - g\|_{Y'}^2 + \|\gamma_0 w - u_0\|_H^2$.
An obvious approach for the numerical approximation is to consider the minimization over finite dimensional subspaces $X^\delta$ of $X$, which however is not feasible because of the presence of the dual norm.

For trial spaces $X^\delta$ that are 'full' (or 'sparse') tensor products of finite element spaces in space and time, in [And13] it was shown how to construct corresponding test spaces $Y^\delta \subset Y$ of similar type and dimension, such that $(X^\delta, Y^\delta)$ is uniformly inf-sup stable, meaning that when the continuous dual norm $\|\cdot\|_{Y'}$ is replaced by the discrete dual norm $\|\cdot\|_{Y^{\delta'}}$, a minimization over $X^\delta$ yields a quasi-best approximation to $u$ from $X^\delta$. Such a family of trial spaces however does not allow one to create a nested sequence of trial spaces by adaptive local refinements.
1.4. Family of inf-sup stable pairs of trial and test spaces. To construct an alternative, essentially larger family, let $\Sigma$ be a wavelet Riesz basis for $L_2(0,T)$ that, after renormalization, is also a Riesz basis for $H^1(0,T)$. We equip this basis with a tree structure where every wavelet that is not on the coarsest level has a parent on the next coarser level. In space, we consider the collection of all linear finite element spaces that can be generated by conforming newest vertex bisection starting from an initial conforming partition of a polytopal $\Omega$ into $d$-simplices. The restriction to linear finite elements is not essential and is made for simplicity only.

Now we consider trial spaces $X^\delta$ that are spanned by a number of wavelets, each of them tensorized with a finite element space from the aforementioned collection. In order to be able to apply the arising system matrices in linear complexity ([KS14, vVW21]), we impose the condition that if a wavelet tensorized with a finite element space is in the spanning set, then so is its parent wavelet tensorized with a finite element space that includes the former one.

The infinite collection of finite element spaces can be associated to a hierarchical 'basis' that can be equipped with a tree structure. Each hierarchical basis function, except those on the coarsest level, is associated to a node $\nu$ that was inserted as the midpoint of an edge connecting two nodes on the next coarser level, which nodes we call the parents of $\nu$. With this definition there is a one-to-one correspondence between the finite element spaces from our collection and the spans of the sets of hierarchical basis functions that form trees. Consequently, our collection of trial spaces $X^\delta$ consists of the spans of sets of tensor products of wavelets-in-time and hierarchical basis functions-in-space, which sets are downward closed, also known as lower, in the sense that if a pair of a wavelet and a hierarchical basis function is in the set, then so are all its parents in time and space.
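The downward-closedness of the index sets just described can be checked mechanically. The following minimal sketch (not the authors' implementation; the parent maps are hypothetical stand-ins for the real wavelet-tree and mesh data structures) tests whether a set of (wavelet, node) pairs is lower:

```python
# Sketch of the "downward closed" (lower) index sets described above.
# Indices are pairs (wavelet, node); each index has parents in time (the
# parent wavelet) and in space (the parent nodes of the hierarchical basis
# function). The parent maps below are hypothetical toy data.

def is_downward_closed(index_set, parents_time, parents_space):
    """True iff for every (wavelet, node) pair all its parents in time and
    in space belong to the index set as well."""
    for (lam, nu) in index_set:
        for plam in parents_time.get(lam, []):     # parents of the wavelet
            if (plam, nu) not in index_set:
                return False
        for pnu in parents_space.get(nu, []):      # parents of the node
            if (lam, pnu) not in index_set:
                return False
    return True

parents_time = {1: [0]}          # wavelet 1 has parent wavelet 0
parents_space = {"b": ["a"]}     # node b was inserted on an edge at node a

lower = {(0, "a"), (0, "b"), (1, "a")}
not_lower = {(1, "b")}           # parents (0,"b") and (1,"a") are missing

print(is_downward_closed(lower, parents_time, parents_space))      # True
print(is_downward_closed(not_lower, parents_time, parents_space))  # False
```

In an actual implementation each node would have up to two parents (the endpoints of the bisected edge); one parent per node suffices for the illustration.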
Spaces from this collection can be 'locally' expanded by adding the span of a tensor product of a wavelet and a hierarchical basis function one-by-one.

For this family of spaces $X^\delta$ we construct a corresponding family of spaces $Y^\delta \subset Y$ of similar type such that each pair $(X^\delta, Y^\delta)$ is uniformly inf-sup stable, with the dimension of $Y^\delta$ being proportional to that of $X^\delta$. Furthermore, using the properties of the wavelets in time and by applying multigrid preconditioners in space, we construct optimal preconditioners at $X$- and $Y$-side which allow a fast solution of the discrete problems $\operatorname*{argmin}_{w \in X^\delta} \|Bw - g\|_{Y^{\delta'}}^2 + \|\gamma_0 w - u_0\|_H^2$.
1.5. Adaptive algorithm. Having fixed the family of trial spaces, it remains to develop an algorithm that selects a suitable, preferably quasi-optimal, nested sequence of spaces from the family adapted to the solution $u$ of (1.2). The theory about adaptive (Ritz-)Galerkin approximations for such quadratic minimization problems is in a mature state. As noticed before, however, Galerkin approximations for (1.2) are not computable.

Therefore, given $X^\delta$, let $X^{\bar\delta} \supset X^\delta$ be such that saturation holds, i.e., for some constant $\zeta < 1$, it holds that $\inf_{w \in X^{\bar\delta}} \|u - w\|_X \le \zeta \inf_{w \in X^\delta} \|u - w\|_X$. We now replace problem (1.2) by
(1.3) $u^{\bar\delta}_{\bar\delta} = \operatorname*{argmin}_{w \in X^{\bar\delta}} \|Bw - g\|_{Y^{\bar\delta'}}^2 + \|\gamma_0 w - u_0\|_H^2$,
where in the notation $u^{\bar\delta}_{\bar\delta}$ the first instance of $\bar\delta$ refers to the space $Y^{\bar\delta}$ and the second to the space $X^{\bar\delta}$. Its (computable) Galerkin approximation from $X^\delta$ is given by $u^{\bar\delta}_\delta = \operatorname*{argmin}_{w \in X^\delta} \|Bw - g\|_{Y^{\bar\delta'}}^2 + \|\gamma_0 w - u_0\|_H^2$.

By a standard adaptive procedure, described below, we expand $X^\delta$ to some $X^{\tilde\delta} \subseteq X^{\bar\delta}$ such that $u^{\bar\delta}_{\tilde\delta}$ is closer to $u^{\bar\delta}_{\bar\delta}$ than $u^{\bar\delta}_\delta$. Next, we replace $Y^{\bar\delta}$ by $Y^{\bar{\tilde\delta}}$ (being the test space corresponding to $X^{\bar{\tilde\delta}}$) and repeat (i.e. consider (1.3) with $(\bar\delta, \bar\delta)$ reading as $(\bar{\tilde\delta}, \bar{\tilde\delta})$, and improve its Galerkin approximation $u^{\bar{\tilde\delta}}_{\tilde\delta}$ from $X^{\tilde\delta}$ by an adaptive enlargement of the latter space).

The adaptive expansion of the trial space $X^\delta$ to $X^{\tilde\delta}$ will be by the application of the usual solve-estimate-mark-refine paradigm, where the error indicators are the coefficients of the residual vector w.r.t. (modified) tensor product basis functions that were added to $X^\delta$ to create $X^{\bar\delta}$. In order for this collection of additional tensor product basis functions to be stable in $X$-norm, for this step we modify the hierarchical basis functions such that they get a vanishing moment, and therefore become closer to 'real' wavelets.

Under the aforementioned saturation assumption, we prove that the overall adaptive procedure produces an $r$-linearly converging sequence to the solution.
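The solve-estimate-mark-refine loop just described can be imitated in a toy setting. In the sketch below (our own construction, not the authors' code; Euclidean norm, coordinate "spaces" with a binary-tree parent structure) "solve" is coordinate projection of a decaying target vector, saturation adds one layer of children, and bulk chasing marks a set of added indices carrying a fixed fraction of the indicator mass:

```python
import numpy as np

# Toy analogue of the adaptive loop: trial "space" = set of active indices.

rng = np.random.default_rng(0)
n = 2**10
u = rng.standard_normal(n) * 0.9 ** np.arange(n)    # decaying "solution"

def children(i):
    return [j for j in (2 * i + 1, 2 * i + 2) if j < n]

def doerfler(indicators, theta):
    """Mark indices, largest first, until theta^2 of the mass is reached."""
    order = sorted(indicators, key=lambda i: -abs(indicators[i]))
    total = sum(v * v for v in indicators.values())
    acc, marked = 0.0, []
    for i in order:
        marked.append(i)
        acc += indicators[i] ** 2
        if acc >= theta**2 * total:
            break
    return marked

X, errors = {0}, []
for _ in range(15):
    u_X = np.where(np.isin(np.arange(n), list(X)), u, 0.0)  # "solve"
    errors.append(np.linalg.norm(u - u_X))
    layer = {c for i in X for c in children(i)} - X         # saturation layer
    indicators = {i: u[i] for i in layer}                   # "estimate"
    X |= set(doerfler(indicators, theta=0.5))               # "mark"+"refine"

print(errors[0], errors[-1])   # the error decreases monotonically
```

In the paper the "solve" step is of course the discrete least squares problem with the enlarged test space, and the indicators are residual coefficients rather than exact coefficients of $u$; the toy only illustrates the control flow.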
1.6. Numerical results. We have tested the adaptive algorithm in several examples with a two-dimensional spatial domain. In all but one case, we observed a convergence rate equal to 1/2, being the best that can be expected in view of the piecewise polynomial degree of the trial functions and the tensor product construction, and for non-smooth solutions improving upon usual non-adaptive approximation. Only for the case of a solution with a discontinuity between the initial and boundary conditions did we observe a reduced rate.

For comparison we also consider a least squares formulation in which the residuals are measured in $L_2$-norms. So, other than with (1.2), there is no need to discretize a dual norm, and so to guarantee an inf-sup condition. Minimization over any conforming trial space yields a quasi-best approximation from that space in the corresponding 'energy norm'. This norm, however, is stronger than the norm on $X$. For the aforementioned example of a discontinuity between initial and boundary conditions, and with the application of continuous piecewise linear finite elements w.r.t. tetrahedral meshes of the space-time cylinder, it results in a convergence rate of 0.07 for uniform refinements, which is not visibly improved using adaptive refinements.
1.7. Organization. This paper is organized as follows: In Sect. 2 the well-posed space-time variational formulation of the parabolic problem is discussed, and in Sect. 3 we discuss its inf-sup stable discretization. The adaptive solution procedure is presented in Sect. 4, and its convergence is proven. The construction of the trial and test spaces is detailed in Sect. 5, and optimal preconditioners are presented. In Sect. 6, the definition of the enlarged space $X^{\bar\delta}$ is given, and the construction of a stable basis of a stable complement space of $X^\delta$ in $X^{\bar\delta}$ is outlined. Numerical results are presented in Sect. 7, and a conclusion is formulated in Sect. 8.
1.8. Notations. In this work, by $C \lesssim D$ we will mean that $C$ can be bounded by a multiple of $D$, independently of parameters on which $C$ and $D$ may depend. Obviously, $C \gtrsim D$ is defined as $D \lesssim C$, and $C \eqsim D$ as $C \lesssim D$ and $C \gtrsim D$.

For normed linear spaces $E$ and $F$, by $\mathcal{L}(E,F)$ we will denote the normed linear space of bounded linear mappings $E \to F$, and by $\mathcal{L}\mathrm{is}(E,F)$ its subset of boundedly invertible linear mappings $E \to F$. We write $E \hookrightarrow F$ to denote that $E$ is continuously embedded into $F$. For simplicity only, we exclusively consider linear spaces over the scalar field $\mathbb{R}$.

2. SPACE-TIME FORMULATIONS OF A PARABOLIC EVOLUTION PROBLEM
Let $V, H$ be separable Hilbert spaces of functions on some "spatial domain" such that $V \hookrightarrow H$ with dense embedding. Identifying $H$ with its dual, we obtain the Gelfand triple $V \hookrightarrow H \simeq H' \hookrightarrow V'$.

For a.e. $t \in I := (0,T)$, let $a(t;\cdot,\cdot)$ denote a bilinear form on $V \times V$ such that for any $\eta, \zeta \in V$, $t \mapsto a(t;\eta,\zeta)$ is measurable on $I$, and such that for some $\varrho \in \mathbb{R}$, for a.e. $t \in I$,
(2.1) $|a(t;\eta,\zeta)| \lesssim \|\eta\|_V \|\zeta\|_V \quad (\eta, \zeta \in V)$ (boundedness),
(2.2) $a(t;\eta,\eta) + \varrho\,\langle \eta,\eta\rangle_H \gtrsim \|\eta\|_V^2 \quad (\eta \in V)$ (Gårding inequality).
With $A(t) \in \mathcal{L}\mathrm{is}(V,V')$ being defined by $(A(t)\eta)(\zeta) = a(t;\eta,\zeta)$, given a forcing function $g$ and an initial value $u_0$, we are interested in solving the parabolic initial value problem of finding $u$ such that
(2.3) $\begin{cases} \frac{du}{dt}(t) + A(t)u(t) = g(t) & (t \in I),\\ u(0) = u_0.\end{cases}$
In a simultaneous space-time variational formulation, the parabolic PDE reads as finding $u$ from a suitable space of functions of time and space such that
$(Bw)(v) := \int_I \langle \tfrac{dw}{dt}(t), v(t)\rangle + a(t; w(t), v(t))\,dt = \int_I \langle g(t), v(t)\rangle\,dt =: g(v)$
for all $v$ from another suitable space of functions of time and space. One possibility to enforce the initial condition is by testing it against additional test functions. A proof of the following result can be found in [SS09], cf. [DL92, Ch. XVIII, §3] and [Wlo82, Ch. IV, §26] for slightly different statements.

Theorem 2.1.
With $X := L_2(I;V) \cap H^1(I;V')$, $Y := L_2(I;V)$, under conditions (2.1) and (2.2) it holds that
$$\begin{bmatrix} B \\ \gamma_0 \end{bmatrix} \in \mathcal{L}\mathrm{is}(X, Y' \times H),$$
where for $t \in \bar I$, $\gamma_t \colon u \mapsto u(t,\cdot)$ denotes the trace map. That is, assuming $g \in Y'$ and $u_0 \in H$, finding $u \in X$ such that
(2.4) $\begin{bmatrix} B \\ \gamma_0 \end{bmatrix} u = \begin{bmatrix} g \\ u_0 \end{bmatrix}$
is a well-posed simultaneous space-time variational formulation of (2.3).

With $\tilde u(t) := u(t)e^{-\varrho t}$, (2.3) is equivalent to $\frac{d\tilde u}{dt}(t) + (A(t) + \varrho\,\mathrm{Id})\tilde u(t) = g(t)e^{-\varrho t}$ $(t \in I)$, $\tilde u(0) = u_0$. Since $((A(t) + \varrho\,\mathrm{Id})\eta)(\eta) \gtrsim \|\eta\|_V^2$, w.l.o.g. we assume that (2.2) is valid for $\varrho = 0$, i.e., $a(t;\cdot,\cdot)$ is coercive uniformly for a.e. $t \in I$.

For simplicity, cf. the discussion in Remark 3.5, additionally we assume that $a(t;\cdot,\cdot)$ is symmetric, and define $A = A' \in \mathcal{L}\mathrm{is}(Y,Y')$ by $(Aw)(v) = \int_I (A(t)w(t))(v(t))\,dt$. Because $\begin{bmatrix} A & 0 \\ 0 & \mathrm{Id} \end{bmatrix} \in \mathcal{L}\mathrm{is}(Y \times H, Y' \times H)$, an equivalent formulation of (2.4) as a self-adjoint saddle point equation reads as finding $(\mu, \sigma, u) \in Y \times H \times X$ (where $\mu = 0 = \sigma$) such that
(2.5) $\begin{bmatrix} A & 0 & B \\ 0 & \mathrm{Id} & \gamma_0 \\ B' & \gamma_0' & 0 \end{bmatrix} \begin{bmatrix} \mu \\ \sigma \\ u \end{bmatrix} = \begin{bmatrix} g \\ u_0 \\ 0 \end{bmatrix},$
or equivalently
(2.6) $\begin{bmatrix} A & B \\ B' & -\gamma_0'\gamma_0 \end{bmatrix} \begin{bmatrix} \mu \\ u \end{bmatrix} = \begin{bmatrix} g \\ -\gamma_0' u_0 \end{bmatrix},$
or
(2.7) $\underbrace{(B'A^{-1}B + \gamma_0'\gamma_0)}_{S :=}\, u = \underbrace{B'A^{-1}g + \gamma_0'u_0}_{f :=}.$
We equip $Y$ and $X$ with 'energy' norms
$\|\cdot\|_Y^2 := (A\,\cdot)(\cdot), \qquad \|\cdot\|_X^2 := \|\cdot\|_Y^2 + \|\partial_t \cdot\|_{Y'}^2 + \|\gamma_T \cdot\|_H^2,$
which are equivalent to the canonical norms on $Y$ and $X$. Notice that (2.5)-(2.7) are the Euler-Lagrange equations that result from the minimization problem $u = \operatorname*{argmin}_{w \in X} \|Bw - g\|_{Y'}^2 + \|\gamma_0 w - u_0\|_H^2$.

Lemma 2.2.
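The equivalence of the block system (2.6) and the Schur complement equation (2.7) is easy to verify in a finite-dimensional analogue. The sketch below (toy random matrices standing in for $A$, $B$ and $\gamma_0$; not an actual discretization from the paper) checks that eliminating $\mu$ from the block system reproduces the solution of $Su = f$:

```python
import numpy as np

# Finite-dimensional analogue of (2.6) and (2.7) with toy matrices.

rng = np.random.default_rng(1)
ny, nx = 8, 5
Mmat = rng.standard_normal((ny, ny))
A = Mmat @ Mmat.T + ny * np.eye(ny)    # SPD stand-in for the Riesz map A
B = rng.standard_normal((ny, nx))      # stand-in for B: X -> Y'
g0 = rng.standard_normal((1, nx))      # stand-in for the trace map γ₀
g, u0 = rng.standard_normal(ny), rng.standard_normal(1)

# Schur complement (2.7): S = B'A⁻¹B + γ₀'γ₀, f = B'A⁻¹g + γ₀'u₀
S = B.T @ np.linalg.solve(A, B) + g0.T @ g0
f = B.T @ np.linalg.solve(A, g) + (g0.T @ u0).ravel()
u_schur = np.linalg.solve(S, f)

# Block system (2.6): [[A, B], [B', -γ₀'γ₀]] [μ; u] = [g; -γ₀'u₀]
K = np.block([[A, B], [B.T, -(g0.T @ g0)]])
rhs = np.concatenate([g, -(g0.T @ u0).ravel()])
mu_u = np.linalg.solve(K, rhs)

print(np.allclose(mu_u[ny:], u_schur))   # True: same u either way
```

Note that $S$ is symmetric positive definite here, in line with Lemma 2.2, which identifies $(S\cdot)(\cdot)$ with the squared $X$-norm.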
We have $\|\cdot\|_X^2 = (S\,\cdot)(\cdot)$.

Proof. It holds that
$\|w\|_X^2 = \sup_{0 \ne v \in Y} \frac{(Bw)(v)^2}{\|v\|_Y^2} + \|\gamma_0 w\|_H^2 = \sup_{0 \ne (v,v_0) \in Y \times H} \frac{\big((Bw)(v) + \langle \gamma_0 w, v_0\rangle_H\big)^2}{\|v\|_Y^2 + \|v_0\|_H^2} = (Sw)(w),$
where the first equality can be found in e.g. [ESV17, Thm. 2.1], and, when realising that $S$ is the Schur complement of the operator in (2.5), the last one in e.g. [KS08, Lemma 2.2]. □
3. DISCRETIZATIONS
3.1. Galerkin discretization of the Schur complement equation. Let $(X^\delta)_{\delta \in \Delta}$ be a collection of closed, e.g. finite dimensional, subspaces of $X$, equipped with $\|\cdot\|_X$. We will specify such a family in Sects. 5-6.1. We define a partial order on $\Delta$ by $\delta \preceq \tilde\delta \iff X^\delta \subseteq X^{\tilde\delta}$. For $\delta \in \Delta$, let $u_\delta \in X^\delta$ denote the Galerkin approximation to the solution $u$ of (2.7), i.e., the solution of
(3.1) $(Su_\delta)(v) = f(v) \quad (v \in X^\delta),$
being the best approximation to $u$ from $X^\delta$ w.r.t. $\|\cdot\|_X$.

For proving convergence of an adaptive solution routine, as well as for a posteriori error estimation, we shall make the following assumption.

Assumption 3.1 (Saturation). There exists a collection of subspaces $(\delta_G \times \delta_U)_{\delta \in \Delta} \subseteq Y' \times H$, a mapping $\bar\cdot \colon \Delta \to \Delta \colon \delta \mapsto \bar\delta$ where $\bar\delta \succeq \delta$, and some fixed constant $\zeta < 1$ such that for all $\delta \in \Delta$, assuming that $(g, u_0) \in \delta_G \times \delta_U$,
(3.2) $\|u - u_{\bar\delta}\|_X \le \zeta \|u - u_\delta\|_X.$

Remark 3.2. Notice that the above assumption cannot be valid without a restriction on the right-hand side $f = B'A^{-1}g + \gamma_0'u_0 \in X'$. Indeed, given any $X^\delta \subset X^{\bar\delta} \subsetneq X$, consider a non-zero $f \in X'$ that vanishes on $X^{\bar\delta}$. Then $u_\delta = u_{\bar\delta} = 0 \ne u$, meaning that (3.2) does not hold.

For the time being we will operate under the restrictive assumption that whenever we apply (3.2) (visible by the appearance of the constant $\zeta$) we simply assume that $(g, u_0) \in \delta_G \times \delta_U$. Later, in Sect. 4.3, we will remove this assumption.

The discretized problem from (3.1) only serves theoretical purposes. Indeed, since the Schur complement operator $S$ contains the inverse of $A$, there is no way to determine $u_\delta$ exactly. The reason to introduce (3.1) is that $S$ is an elliptic operator, so that for $\delta \preceq \tilde\delta$ we can make use of $\|u - u_{\tilde\delta}\|_X^2 = \|u - u_\delta\|_X^2 - \|u_{\tilde\delta} - u_\delta\|_X^2$, being a crucial tool for proving convergence of adaptive algorithms.

3.2. Uniformly stable Galerkin discretization of the saddle-point formulation.
Our numerical approximations will be based on Galerkin discretizations of the saddle-point formulation (2.6). Let $(Y^\delta)_{\delta \in \Delta}$ be a collection of closed subspaces of $Y$, equipped with $\|\cdot\|_Y$, such that
(3.3) $X^\delta \subseteq Y^\delta \quad (\delta \in \Delta),$
and
(3.4) $1 \ge \gamma_\Delta := \inf_{\delta \in \Delta}\; \inf_{0 \ne w \in X^\delta}\; \sup_{0 \ne v \in Y^\delta} \frac{(\partial_t w)(v)}{\|\partial_t w\|_{Y'} \|v\|_Y} > 0.$
The quantity $1 - \gamma_\Delta$ can be made arbitrarily small by selecting, for each $\delta \in \Delta$, $Y^\delta$ sufficiently large in relation to $X^\delta$.
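Inf-sup constants like $\gamma_\Delta$ in (3.4) can be computed in a finite-dimensional toy setting as a generalized eigenvalue problem. The sketch below (Euclidean stand-ins for the $Y$- and $Y'$-norms, a matrix $D$ playing the role of $\partial_t$ applied to a basis of $X^\delta$; not the paper's function spaces) also illustrates the remark above: once the test space contains the range of $D$, the constant equals 1.

```python
import numpy as np

# γ = inf_w sup_v (Dw)'v / (||Dw|| ||v||) over a test space range(Q),
# computed from the generalized eigenproblem (D'QQ'D) w = γ² (D'D) w.

rng = np.random.default_rng(2)
n, k = 20, 4
D = rng.standard_normal((n, k))

def inf_sup(D, Q):
    L = np.linalg.cholesky(D.T @ D)          # Gram matrix of the D-images
    Linv = np.linalg.inv(L)
    M = Linv @ (D.T @ Q @ Q.T @ D) @ Linv.T
    return np.sqrt(max(np.linalg.eigvalsh(M).min(), 0.0))

Q_small, _ = np.linalg.qr(rng.standard_normal((n, 6)))   # generic test space
Q_big, _ = np.linalg.qr(np.hstack([D, rng.standard_normal((n, 6))]))

g1, g2 = inf_sup(D, Q_small), inf_sup(D, Q_big)
print(g1 <= g2, round(g2, 8))   # enlarged test space drives γ up to 1
```

By Cauchy-Schwarz the constant is always at most 1, matching the upper bound in (3.4).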
For $\delta, \hat\delta \in \Delta$ with $Y^{\hat\delta} \supseteq Y^\delta$, and $E_Y^{\hat\delta}, E_X^\delta$ denoting the embeddings $Y^{\hat\delta} \to Y$, $X^\delta \to X$, let $(\mu^{\hat\delta}_\delta, u^{\hat\delta}_\delta) \in Y^{\hat\delta} \times X^\delta$ be the solution of
(3.5) $\begin{bmatrix} E_Y^{\hat\delta'} A E_Y^{\hat\delta} & E_Y^{\hat\delta'} B E_X^\delta \\ E_X^{\delta'} B' E_Y^{\hat\delta} & -E_X^{\delta'} \gamma_0' \gamma_0 E_X^\delta \end{bmatrix} \begin{bmatrix} \mu^{\hat\delta}_\delta \\ u^{\hat\delta}_\delta \end{bmatrix} = \begin{bmatrix} E_Y^{\hat\delta'} g \\ -E_X^{\delta'} \gamma_0' u_0 \end{bmatrix},$
or, equivalently,
(3.6) $\underbrace{E_X^{\delta'}\big(B' E_Y^{\hat\delta} (E_Y^{\hat\delta'} A E_Y^{\hat\delta})^{-1} E_Y^{\hat\delta'} B + \gamma_0'\gamma_0\big) E_X^\delta}_{S^{\hat\delta}_\delta :=}\, u^{\hat\delta}_\delta = \underbrace{E_X^{\delta'}\big(B' E_Y^{\hat\delta} (E_Y^{\hat\delta'} A E_Y^{\hat\delta})^{-1} E_Y^{\hat\delta'} g + \gamma_0' u_0\big)}_{f^{\hat\delta}_\delta :=}.$
Below we will see that (3.5)-(3.6) are uniquely solvable. Formulated in 'operator language', (3.5) is the Galerkin discretization of (2.6) on the closed subspace $Y^{\hat\delta} \times X^\delta \subseteq Y \times X$. Unless $Y^{\hat\delta} = Y$, it holds that $S^{\hat\delta}_\delta \ne E_X^{\delta'} S E_X^\delta$ and $f^{\hat\delta}_\delta \ne E_X^{\delta'} f$, and so generally $u^{\hat\delta}_\delta \ne u_\delta$.

As we will see, however, for $Y^\delta$, and thus $Y^{\hat\delta}$, 'large' in relation to $X^\delta$, $u^{\hat\delta}_\delta$ will be 'close' to $u_\delta$. This will allow us to show that ($r$-linear) convergence of a sequence of Galerkin solutions $u_\delta$ of (3.1) implies ($r$-linear) convergence of the corresponding sequence $u^{\hat\delta}_\delta$.

We equip $X^\delta$ with a family of 'energy' norms
$\|w\|_{X^{\hat\delta}_\delta}^2 := \|w\|_Y^2 + \sup_{0 \ne v \in Y^{\hat\delta}} \frac{(\partial_t w)(v)^2}{\|v\|_Y^2} + \|\gamma_T w\|_H^2.$
By definition of $\gamma_\Delta$ it holds that
(3.7) $\gamma_\Delta \|\cdot\|_X \le \|\cdot\|_{X^{\hat\delta}_\delta} \le \|\cdot\|_X \quad \text{on } X^\delta.$
As follows from [SW20b, Lemma 3.3], similar to Lemma 2.2 we have the following result.

Lemma 3.3.
Thanks to (3.3) (and $Y^\delta \subseteq Y^{\hat\delta}$), for $w \in X^\delta$ it holds that
$\|w\|_{X^{\hat\delta}_\delta}^2 = (S^{\hat\delta}_\delta w)(w) = \sup_{0 \ne v \in Y^{\hat\delta}} \frac{(Bw)(v)^2}{\|v\|_Y^2} + \|\gamma_0 w\|_H^2.$
By using additionally (3.4), this result shows that $(S^{\hat\delta}_\delta \cdot)(\cdot)$ is coercive on $X^\delta \times X^\delta$, so that (3.6), and thus (3.5), has a unique solution.

Moreover, we have the following result.

Theorem 3.4 ([SW20b, Thm. 3.7]). Thanks to (3.3) (and $Y^\delta \subseteq Y^{\hat\delta}$) and (3.4), it holds that
(3.8) $\|u - u_\delta\|_X \le \|u - u^{\hat\delta}_\delta\|_X \le \gamma_\Delta^{-1} \|u - u_\delta\|_X.$

Remark 3.5. Without the assumption of $a(t;\cdot,\cdot)$ being symmetric, the operator $A$ in (2.5), (2.6), (2.7), (3.5), (3.6), and in the definition of $\|\cdot\|_Y$ should be replaced by $A_s := \frac12(A + A')$, whereas $\partial_t$ in the definitions of $\|\cdot\|_X$, $\gamma_\Delta$ in (3.8), and $\|\cdot\|_{X^{\hat\delta}_\delta}$ should be replaced by $\partial_t + A_a$, where $A_a := \frac12(A - A')$. Then, as shown in [SW20a, Thm. 6.1], it holds that $\|u - u^{\hat\delta}_\delta\|_X \le \gamma_\Delta^{-1} \|u - u_\delta\|_X$.

Still without assuming that $a(t;\cdot,\cdot)$ is symmetric, it is interesting that under the original, easier to demonstrate inf-sup condition (3.4) in terms of $\partial_t$, a quasi-optimality result similar to (3.8) can be shown, where then the upper bound for $\|u - u^{\hat\delta}_\delta\|_X / \|u - u_\delta\|_X$ depends on $\|A_a\|_{\mathcal{L}(Y,Y')}$, and cannot be driven to 1 by taking $Y^\delta$ sufficiently large in relation to $X^\delta$. The latter, however, will be essential for the analysis in the current work, being the reason why we consider only symmetric $a(t;\cdot,\cdot)$.

3.3. Modified discretized saddle-point.
In view of obtaining an efficient implementation, in the definition of $(\mu^{\hat\delta}_\delta, u^{\hat\delta}_\delta)$ in (3.5), and so in that of $S^{\hat\delta}_\delta$ and $f^{\hat\delta}_\delta$ in (3.6), we replace $(E_Y^{\hat\delta'} A E_Y^{\hat\delta})^{-1}$ by some $K_Y^{\hat\delta} = K_Y^{\hat\delta'} \in \mathcal{L}\mathrm{is}(Y^{\hat\delta'}, Y^{\hat\delta})$ for which, for some constant $\kappa_\Delta \ge 1$,
$\frac{((K_Y^{\hat\delta})^{-1} v)(v)}{(Av)(v)} \in [\kappa_\Delta^{-1}, \kappa_\Delta] \quad (\delta \in \Delta,\ v \in Y^{\hat\delta})$
(i.e. $K_Y^{\hat\delta}$ is an optimal (self-adjoint and coercive) preconditioner for $E_Y^{\hat\delta'} A E_Y^{\hat\delta}$), and which can be applied at linear cost. The resulting system (3.6) is now amenable to the application of the (preconditioned) conjugate residuals iteration.

Despite this modification, we keep using the old notations for $\mu^{\hat\delta}_\delta$, $u^{\hat\delta}_\delta$, $S^{\hat\delta}_\delta$, $\|\cdot\|_{X^{\hat\delta}_\delta}^2 := (S^{\hat\delta}_\delta \cdot)(\cdot)$, and $f^{\hat\delta}_\delta$. As shown in [SW20b, Remark 3.8], instead of (3.8) it now holds that
(3.10) $\|u - u_\delta\|_X \le \|u - u^{\hat\delta}_\delta\|_X \le \tfrac{\kappa_\Delta}{\gamma_\Delta} \|u - u_\delta\|_X,$
whereas one deduces that (3.7) now should be read as
(3.11) $\tfrac{\gamma_\Delta}{\sqrt{\kappa_\Delta}} \|\cdot\|_X \le \|\cdot\|_{X^{\hat\delta}_\delta} \le \sqrt{\kappa_\Delta} \|\cdot\|_X \quad \text{on } X^\delta.$
For our forthcoming analysis, we will need $\tfrac{\kappa_\Delta}{\gamma_\Delta} - 1$ to be sufficiently small.

Remark 3.6. Later, in the proof of Proposition 4.5, temporarily we will consider the system (3.5) with $\hat\delta = \delta$ (i.e. $Y^{\hat\delta} = Y^\delta$), but with $X^\delta$ replaced by $X$, and, as we do in the current subsection, $E_Y^{\delta'} A E_Y^\delta$ replaced by $(K_Y^\delta)^{-1}$. The resulting Schur operator $B' E_Y^\delta K_Y^\delta E_Y^{\delta'} B + \gamma_0'\gamma_0$ will be denoted as $S^\delta_\infty$. Notice that the exact solution $u$ solves $S^\delta_\infty u = B' E_Y^\delta K_Y^\delta E_Y^{\delta'} g + \gamma_0' u_0$. Observing that $S^\delta_\delta = E_X^{\delta'} S^\delta_\infty E_X^\delta$, we have the Galerkin orthogonality $(S^\delta_\infty(u - u^\delta_\delta))(X^\delta) = 0$. The expression
$\|\cdot\|_{X^\delta_\infty} := \sqrt{(S^\delta_\infty \cdot)(\cdot)} = \sqrt{\sup_{0 \ne v \in Y^\delta} \frac{((B\,\cdot)(v))^2}{((K_Y^\delta)^{-1}v)(v)} + \|\gamma_0 \cdot\|_H^2} = \sqrt{(E_Y^{\delta'} B\,\cdot)(K_Y^\delta E_Y^{\delta'} B\,\cdot) + \|\gamma_0 \cdot\|_H^2}$
is only a semi-norm on $X$, which is equal to $\|\cdot\|_{X^\delta_\delta}$ on $X^\delta$, and
(3.12) $\|\cdot\|_{X^\delta_\infty} \le \sqrt{\kappa_\Delta} \|\cdot\|_X \quad \text{on } X.$
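The conjugate residuals iteration mentioned in the subsection above can be sketched as follows (a minimal, unpreconditioned, dense version on a toy symmetric positive definite matrix; this is an illustration of the Krylov method, not the authors' solver):

```python
import numpy as np

# Conjugate residual iteration for an SPD system S u = f; it minimizes the
# residual ||f - S u|| over the growing Krylov space in each step.

def conjugate_residuals(S, f, tol=1e-10, maxit=500):
    x = np.zeros_like(f)
    r = f - S @ x
    Sr = S @ r
    p, Sp = r.copy(), Sr.copy()
    rSr = r @ Sr
    for _ in range(maxit):
        alpha = rSr / (Sp @ Sp)
        x += alpha * p
        r -= alpha * Sp
        if np.linalg.norm(r) < tol * np.linalg.norm(f):
            break
        Sr = S @ r
        rSr_new = r @ Sr
        beta = rSr_new / rSr
        p = r + beta * p
        Sp = Sr + beta * Sp          # keeps S @ p without an extra product
        rSr = rSr_new
    return x

rng = np.random.default_rng(3)
n = 30
Mmat = rng.standard_normal((n, n))
S = Mmat @ Mmat.T + n * np.eye(n)    # SPD stand-in for S^δ̂_δ
f = rng.standard_normal(n)
u = conjugate_residuals(S, f)
print(np.linalg.norm(S @ u - f) < 1e-8 * np.linalg.norm(f))   # True
```

In the setting of the paper each application of $S^{\hat\delta}_\delta$ involves one application of $K_Y^{\hat\delta}$, which is why the preconditioner must be applicable at linear cost.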
4. CONVERGENT ADAPTIVE SOLUTION METHOD

4.1. Preliminaries.
For $\delta \in \Delta$, we consider the modified discretized saddle point problem (i.e. (3.6) with $(E_Y^{\hat\delta'} A E_Y^{\hat\delta})^{-1}$ replaced by $K_Y^{\hat\delta}$), taking $\hat\delta := \bar\delta$ from Assumption 3.1. So for a given 'trial space' $X^\delta$, we employ $Y^{\bar\delta}$ as 'test space', which is known to be sufficiently large to give stability even when employed with trial space $X^{\bar\delta} \supsetneq X^\delta$. We will use this room to (adaptively) expand $X^\delta$ to some $X^{\tilde\delta} \subset X^{\bar\delta}$ while keeping $Y^{\bar\delta}$ fixed. Then in a second step we adapt the test space to the new trial space, i.e., replace $Y^{\bar\delta}$ by $Y^{\bar{\tilde\delta}}$. By doing so we will construct a sequence $(\delta_i)_i \subseteq \Delta$ with $\delta_i \preceq \delta_{i+1}$ such that $(u^{\bar\delta_i}_{\delta_i})_i$ converges $r$-linearly to $u$.

As a first step, in the next lemma it is shown that if one constructs from $w \in X^\delta$ a $v \in X^{\bar\delta}$ that is closer to the best approximation $u_{\bar\delta}$ to $u$ from $X^{\bar\delta}$, then, thanks to Assumption 3.1, $v$ is also closer to $u$.

Lemma 4.1.
Let $w \in X^\delta$, $v \in X^{\bar\delta}$ be such that for some $\rho \le 1$, $\|u_{\bar\delta} - v\|_X \le \rho \|u_{\bar\delta} - w\|_X$. Then
$\|u - v\|_X \le \sqrt{\zeta^2 + \rho^2(1 - \zeta^2)}\; \|u - w\|_X.$

Proof. Using $u - u_{\bar\delta} \perp_X X^{\bar\delta}$ twice, we obtain
$\|u - v\|_X^2 = \|u - u_{\bar\delta}\|_X^2 + \|u_{\bar\delta} - v\|_X^2 \le \|u - u_{\bar\delta}\|_X^2 + \rho^2 \|u_{\bar\delta} - w\|_X^2 = \|u - u_{\bar\delta}\|_X^2 + \rho^2\big(\|u - w\|_X^2 - \|u - u_{\bar\delta}\|_X^2\big) = (1 - \rho^2)\|u - u_{\bar\delta}\|_X^2 + \rho^2\|u - w\|_X^2 \le \big(\zeta^2(1 - \rho^2) + \rho^2\big)\|u - w\|_X^2,$
where we used Assumption 3.1 and $\|u - u_\delta\|_X \le \|u - w\|_X$. □

Notice that $u^{\bar\delta}_\delta$ is the Galerkin approximation from $X^\delta$ to the solution $u^{\bar\delta}_{\bar\delta} \in X^{\bar\delta}$ of the system $S^{\bar\delta}_{\bar\delta} u^{\bar\delta}_{\bar\delta} = f^{\bar\delta}_{\bar\delta}$, i.e., it is its best approximation from $X^\delta$ w.r.t. $\|\cdot\|_{X^{\bar\delta}_{\bar\delta}}$. In the next proposition it is shown that an improved Galerkin approximation from an intermediate space $X^{\bar\delta} \supseteq X^{\tilde\delta} \supseteq X^\delta$, i.e., the function $u^{\bar\delta}_{\tilde\delta}$, is, for $\tfrac{\kappa_\Delta}{\gamma_\Delta} - 1$ sufficiently small, also closer to $u$, and furthermore that this holds true also for $u^{\bar{\tilde\delta}}_{\tilde\delta}$. The latter function will be the successor of $u^{\bar\delta}_\delta$ in our converging sequence.

Proposition 4.2.
Let $\delta \preceq \tilde\delta \preceq \bar\delta$ be such that
(4.1) $\|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_{\tilde\delta}\|_{X^{\bar\delta}_{\bar\delta}} \le \rho \|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_\delta\|_{X^{\bar\delta}_{\bar\delta}}.$
Then it holds that
$\|u - u^{\bar{\tilde\delta}}_{\tilde\delta}\|_X \le \underbrace{\tfrac{\kappa_\Delta}{\gamma_\Delta}\sqrt{\zeta^2 + \hat\rho^2(1 - \zeta^2)}}_{\bar\rho :=}\; \|u - u^{\bar\delta}_\delta\|_X,$
where
$\hat\rho := \Big(1 + \rho\tfrac{\sqrt{\kappa_\Delta}}{\gamma_\Delta}\Big)\sqrt{\tfrac{\kappa_\Delta^2}{\gamma_\Delta^2} - 1}\,\sqrt{\tfrac{\zeta^2}{1 - \zeta^2}} + \rho\tfrac{\sqrt{\kappa_\Delta}}{\gamma_\Delta}.$
Notice that $\hat\rho$ and $\bar\rho$ are $< 1$ when $\rho < 1$ and $\tfrac{\kappa_\Delta}{\gamma_\Delta} - 1$ is sufficiently small dependent on $\rho$, with $\tfrac{\kappa_\Delta}{\gamma_\Delta} - 1 \downarrow 0$ when $\rho \uparrow 1$.
Proof. Using that $u - u_{\bar\delta} \perp_X X^{\bar\delta}$, it follows that $\|u - u^{\bar\delta}_{\bar\delta}\|_X \le \tfrac{\kappa_\Delta}{\gamma_\Delta}\|u - u_{\bar\delta}\|_X$ (from (3.10)) is equivalent to $\|u_{\bar\delta} - u^{\bar\delta}_{\bar\delta}\|_X \le \sqrt{\tfrac{\kappa_\Delta^2}{\gamma_\Delta^2} - 1}\,\|u - u_{\bar\delta}\|_X$. Similarly, Assumption 3.1 is equivalent to $\|u - u_{\bar\delta}\|_X \le \sqrt{\tfrac{\zeta^2}{1 - \zeta^2}}\,\|u_{\bar\delta} - w\|_X$ for any $w \in X^\delta$. Additionally using (3.11), we infer that
$\|u_{\bar\delta} - u^{\bar\delta}_{\tilde\delta}\|_X \le \|u_{\bar\delta} - u^{\bar\delta}_{\bar\delta}\|_X + \|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_{\tilde\delta}\|_X \le \|u_{\bar\delta} - u^{\bar\delta}_{\bar\delta}\|_X + \tfrac{\sqrt{\kappa_\Delta}}{\gamma_\Delta}\|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_{\tilde\delta}\|_{X^{\bar\delta}_{\bar\delta}} \le \|u_{\bar\delta} - u^{\bar\delta}_{\bar\delta}\|_X + \rho\tfrac{\sqrt{\kappa_\Delta}}{\gamma_\Delta}\|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_\delta\|_X \le \Big(1 + \rho\tfrac{\sqrt{\kappa_\Delta}}{\gamma_\Delta}\Big)\|u_{\bar\delta} - u^{\bar\delta}_{\bar\delta}\|_X + \rho\tfrac{\sqrt{\kappa_\Delta}}{\gamma_\Delta}\|u_{\bar\delta} - u^{\bar\delta}_\delta\|_X \le \Big[\Big(1 + \rho\tfrac{\sqrt{\kappa_\Delta}}{\gamma_\Delta}\Big)\sqrt{\tfrac{\kappa_\Delta^2}{\gamma_\Delta^2} - 1}\,\sqrt{\tfrac{\zeta^2}{1 - \zeta^2}} + \rho\tfrac{\sqrt{\kappa_\Delta}}{\gamma_\Delta}\Big]\|u_{\bar\delta} - u^{\bar\delta}_\delta\|_X.$
From Lemma 4.1 we conclude that $\|u - u^{\bar\delta}_{\tilde\delta}\|_X \le \sqrt{\zeta^2 + \hat\rho^2(1 - \zeta^2)}\,\|u - u^{\bar\delta}_\delta\|_X$. Thanks to (3.10), it holds that $\|u - u^{\bar{\tilde\delta}}_{\tilde\delta}\|_X \le \tfrac{\kappa_\Delta}{\gamma_\Delta}\|u - u_{\tilde\delta}\|_X \le \tfrac{\kappa_\Delta}{\gamma_\Delta}\|u - u^{\bar\delta}_{\tilde\delta}\|_X$, which completes the proof. □
To realize (4.1), i.e., to con-struct from the Galerkin approximation u ¯ δδ to u ¯ δ ¯ δ an improved Galerkin approx-imation u ¯ δ ˜ δ , we apply the concept of bulk chasing, also known as Dörfler mark-ing, on a collection of a posteriori error indicators that constitute an efficient andreliable error estimator. We will apply an estimator of ‘hierarchical basis’ type([ZMD + Θ δ = { θ λ : λ ∈ J δ } ⊆ X ¯ δ be such that X δ + span Θ δ = X ¯ δ and, for someconstants 0 < m ≤ M , for all δ ∈ ∆ , z ∈ X δ and c : = ( c λ ) λ ∈ J δ ⊂ R .(4.2) m (cid:107) z + c (cid:62) Θ δ (cid:107) X ≤ (cid:107) z (cid:107) X + (cid:107) c (cid:107) ≤ M (cid:107) z + c (cid:62) Θ δ (cid:107) X .A suitable collection Θ δ will be constructed in Sect. 6.1. Proposition 4.3.
Assume (4.2). Let $r^{\bar\delta}_\delta := (f^{\bar\delta}_{\bar\delta} - S^{\bar\delta}_{\bar\delta} u^{\bar\delta}_\delta)(\Theta^\delta)$, being the residual vector of $u^{\bar\delta}_\delta$. Let $J \subseteq J_\delta$ be such that for some constant $\vartheta \in (0,1]$, $\|r^{\bar\delta}_\delta|_J\| \ge \vartheta \|r^{\bar\delta}_\delta\|$, and, for some $\tilde\delta \preceq \bar\delta$, let $X^\delta + \operatorname{span}\Theta^\delta|_J \subseteq X^{\tilde\delta}$. Then with $\rho := \sqrt{1 - \big(\tfrac{m}{M}\tfrac{\gamma_\Delta}{\kappa_\Delta}\vartheta\big)^2}$, (4.1) is valid, i.e.,
(4.3) $\|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_{\tilde\delta}\|_{X^{\bar\delta}_{\bar\delta}} \le \rho \|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_\delta\|_{X^{\bar\delta}_{\bar\delta}};$
and so, when $\tfrac{\kappa_\Delta}{\gamma_\Delta} - 1$ is sufficiently small dependent on $\vartheta$, with $\tfrac{\kappa_\Delta}{\gamma_\Delta} - 1 \downarrow 0$ when $\vartheta \downarrow 0$, for some constant $\rho < \bar\rho < 1$,
$\|u - u^{\bar{\tilde\delta}}_{\tilde\delta}\|_X \le \bar\rho \|u - u^{\bar\delta}_\delta\|_X.$

Proof.
As a consequence of (4.2) and (3.11), we have
$\sqrt{\tfrac{\gamma_\Delta}{\kappa_\Delta}}\, m\, \|z + c^\top\Theta^\delta\|_{X^{\bar\delta}_{\bar\delta}} \le \sqrt{\|z\|_{X^{\bar\delta}_{\bar\delta}}^2 + \|c\|^2} \le \sqrt{\tfrac{\kappa_\Delta}{\gamma_\Delta}}\, M\, \|z + c^\top\Theta^\delta\|_{X^{\bar\delta}_{\bar\delta}}.$
We infer that
(4.4) $\|u^{\bar\delta}_{\tilde\delta} - u^{\bar\delta}_\delta\|_{X^{\bar\delta}_{\bar\delta}} = \sup_{0 \ne (z,c) \in X^\delta \times \mathbb{R}^{J_\delta}} \frac{(S^{\bar\delta}_{\bar\delta}(u^{\bar\delta}_{\tilde\delta} - u^{\bar\delta}_\delta))(z + c^\top\Theta^\delta)}{\|z + c^\top\Theta^\delta\|_{X^{\bar\delta}_{\bar\delta}}} \ge m\sqrt{\tfrac{\gamma_\Delta}{\kappa_\Delta}} \sup_{0 \ne (z,c)} \frac{(S^{\bar\delta}_{\bar\delta}(u^{\bar\delta}_{\tilde\delta} - u^{\bar\delta}_\delta))(c^\top\Theta^\delta)}{\sqrt{\|z\|_{X^{\bar\delta}_{\bar\delta}}^2 + \|c\|^2}} \ge m\sqrt{\tfrac{\gamma_\Delta}{\kappa_\Delta}} \sup_{0 \ne c} \frac{\langle c|_J, (f^{\bar\delta}_{\bar\delta} - S^{\bar\delta}_{\bar\delta} u^{\bar\delta}_\delta)(\Theta^\delta|_J)\rangle}{\|c|_J\|} = m\sqrt{\tfrac{\gamma_\Delta}{\kappa_\Delta}}\, \|r^{\bar\delta}_\delta|_J\| \ge m\sqrt{\tfrac{\gamma_\Delta}{\kappa_\Delta}}\, \vartheta\, \|r^{\bar\delta}_\delta\| = m\sqrt{\tfrac{\gamma_\Delta}{\kappa_\Delta}}\, \vartheta \sup_{0 \ne (z,c)} \frac{(S^{\bar\delta}_{\bar\delta}(u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_\delta))(c^\top\Theta^\delta)}{\sqrt{\|z\|_{X^{\bar\delta}_{\bar\delta}}^2 + \|c\|^2}} \ge \tfrac{m}{M}\tfrac{\gamma_\Delta}{\kappa_\Delta}\, \vartheta\, \|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_\delta\|_{X^{\bar\delta}_{\bar\delta}},$
so that
$\|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_{\tilde\delta}\|_{X^{\bar\delta}_{\bar\delta}}^2 = \|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_\delta\|_{X^{\bar\delta}_{\bar\delta}}^2 - \|u^{\bar\delta}_{\tilde\delta} - u^{\bar\delta}_\delta\|_{X^{\bar\delta}_{\bar\delta}}^2 \le \Big(1 - \big(\tfrac{m}{M}\tfrac{\gamma_\Delta}{\kappa_\Delta}\vartheta\big)^2\Big) \|u^{\bar\delta}_{\bar\delta} - u^{\bar\delta}_\delta\|_{X^{\bar\delta}_{\bar\delta}}^2,$
which completes the proof of (4.3). The final statement follows from an application of Proposition 4.2. □

Additionally we have that $\|r^{\bar\delta}_\delta\|$ provides an efficient and reliable a posteriori estimator for $\|u - u^{\bar\delta}_\delta\|_X$:

Proposition 4.4.
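The marking step used in Proposition 4.3 is easy to implement. The sketch below (our own minimal version, not the authors' code) selects a smallest set $J$ of indices of the residual vector $r$ with $\|r|_J\| \ge \vartheta\|r\|$ by taking the largest entries in modulus first:

```python
import numpy as np

# Bulk chasing / Dörfler marking: smallest J with ||r|_J||² >= θ²||r||².

def doerfler_mark(r, theta):
    order = np.argsort(-np.abs(r))              # largest |r_i| first
    cumulative = np.cumsum(r[order] ** 2)
    k = int(np.searchsorted(cumulative, theta**2 * (r @ r))) + 1
    return order[:k]

r = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
J = doerfler_mark(r, theta=0.9)
print(sorted(J.tolist()))                                 # prints: [1, 3]
print(np.linalg.norm(r[J]) >= 0.9 * np.linalg.norm(r))    # prints: True
```

Sorting makes the cost $O(\#J_\delta \log \#J_\delta)$; an approximate binning strategy would bring this down to linear complexity, as required for the optimality claims of adaptive methods.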
Assume (4.2). Recalling that ζ < 1, let ζ√(κ_Δ/γ_Δ − 1) < 1. Then for δ ∈ ∆,

(m√(γ_Δ/κ_Δ))/(1 + ζ√(κ_Δ/γ_Δ − 1)) ‖r^δ̄_δ‖ ≤ ‖u − u^δ̄_δ‖_X ≤ (M√(κ_Δ/γ_Δ))/(√(1 − ζ²)(1 − ζ√(κ_Δ/γ_Δ − 1))) ‖r^δ̄_δ‖.

Proof.
Assumption 3.1 gives ‖u − u^δ̄‖_X ≤ ζ‖u − u^δ̄_δ‖_X, which by u − u^δ̄ ⊥_X X^δ̄ yields

(4.5)  ‖u^δ̄ − u^δ̄_δ‖_X ≤ ‖u − u^δ̄_δ‖_X ≤ (1 − ζ²)^{−1/2} ‖u^δ̄ − u^δ̄_δ‖_X.

As we already have noted in the proof of Proposition 4.2, (3.10) is equivalent to ‖u^δ̄ − u^δ̄_δ̄‖_X ≤ √(κ_Δ/γ_Δ − 1) ‖u − u^δ̄‖_X. Together with Assumption 3.1, it gives

|‖u^δ̄ − u^δ̄_δ‖_X − ‖u^δ̄_δ̄ − u^δ̄_δ‖_X| ≤ ζ√(κ_Δ/γ_Δ − 1) ‖u − u^δ̄_δ‖_X,

which in combination with (4.5) and ζ√(κ_Δ/γ_Δ − 1) < 1 shows

(1 + ζ√(κ_Δ/γ_Δ − 1))^{−1} ‖u^δ̄_δ̄ − u^δ̄_δ‖_X ≤ ‖u − u^δ̄_δ‖_X ≤ (√(1 − ζ²)(1 − ζ√(κ_Δ/γ_Δ − 1)))^{−1} ‖u^δ̄_δ̄ − u^δ̄_δ‖_X.

The proof is completed by (3.11), and m√(γ_Δ/κ_Δ)‖r^δ̄_δ‖ ≤ ‖u^δ̄_δ̄ − u^δ̄_δ‖_{X^δ̄_δ̄} ≤ M√(κ_Δ/γ_Δ)‖r^δ̄_δ‖, where the latter inequalities were shown in (4.4) when reading (J, ϑ, δ̃) as (J_δ, 1, δ̄). □

Next we present an alternative a posteriori error estimator that does not rely on (4.2), that we expect to be more accurate, and that can be computed at the cost of one additional inner product.
Proposition 4.5.
Let ζ²κ_Δ/γ_Δ < 1, and for v ∈ X^δ, define

E_δ(v) = E_δ(v; g, u_0) := √((E^δ̄_{Y′}(g − Bv))(K^δ̄_Y E^δ̄_{Y′}(g − Bv)) + ‖u_0 − γ_0 v‖²_H).

Then

((√γ_Δ − ζ√κ_Δ)/√κ_Δ) ‖u − v‖_X ≤ E_δ(v) ≤ √(κ_Δ(ζ² + (1 + ζ√(κ_Δ/γ_Δ))²)) ‖u − v‖_X.

Proof.
From Remark 3.6, recall that the semi-norm ‖·‖_{X^δ̄_∞} on X equals ‖·‖_{X^δ̄_δ̄} on X^δ̄, and (S^δ̄_∞(u − u^δ̄_δ̄))(X^δ̄) = 0, which implies

(4.6)  ‖u − w‖²_{X^δ̄_∞} = ‖u − u^δ̄_δ̄‖²_{X^δ̄_∞} + ‖u^δ̄_δ̄ − w‖²_{X^δ̄_∞}  (w ∈ X^δ̄),

and ‖·‖_{X^δ̄_∞} ≤ √κ_Δ ‖·‖_X ((3.12)). From (3.10) and Assumption 3.1, we have for v ∈ X^δ

(4.7)  ‖u − u^δ̄_δ̄‖_X ≤ √(κ_Δ/γ_Δ) ‖u − u^δ̄‖_X ≤ ζ√(κ_Δ/γ_Δ) ‖u − v‖_X.

From (4.7), the triangle inequality, (3.11), and (4.6) we obtain for v ∈ X^δ

‖u − v‖_X ≤ (1 − ζ√(κ_Δ/γ_Δ))^{−1} ‖u^δ̄_δ̄ − v‖_X ≤ (√κ_Δ/(√γ_Δ − ζ√κ_Δ)) ‖u^δ̄_δ̄ − v‖_{X^δ̄_δ̄} ≤ (√κ_Δ/(√γ_Δ − ζ√κ_Δ)) ‖u − v‖_{X^δ̄_∞}.

Conversely, we have

‖u − v‖²_{X^δ̄_∞} = ‖u − u^δ̄_δ̄‖²_{X^δ̄_∞} + ‖u^δ̄_δ̄ − v‖²_{X^δ̄_∞}  (by (4.6))
 ≤ ‖u − u^δ̄‖²_{X^δ̄_∞} + κ_Δ ‖u^δ̄_δ̄ − v‖²_X  (by (4.6), (3.12))
 ≤ κ_Δ ‖u − u^δ̄‖²_X + κ_Δ(1 + ζ√(κ_Δ/γ_Δ))² ‖u − v‖²_X  (by (3.12), (4.7))
 ≤ κ_Δ(ζ² + (1 + ζ√(κ_Δ/γ_Δ))²) ‖u − v‖²_X,

by again applying Assumption 3.1. Noting that ‖u − v‖²_{X^δ̄_∞} := (E^δ̄_{Y′}(g − Bv))(K^δ̄_Y E^δ̄_{Y′}(g − Bv)) + ‖u_0 − γ_0 v‖²_H, the proof is completed. □

Notice that the estimator of ‖u − v‖_X from Proposition 4.5 is exact when ζ = 0 and κ_Δ = 1 = γ_Δ, whereas the one from Proposition 4.4, for v = u^δ̄_δ, is exact only when additionally m = 1 = M.

Data oscillation.
In view of the discussion following Assumption 3.1, notice that all results obtained so far that depend on the 'saturation constant' ζ, i.e., Lemma 4.1 and Propositions 4.2, 4.3, and 4.4, are only valid under the condition that (g, u_0) ∈ ^δG × ^δU.

Let us now consider the situation that the solutions and residuals in these statements refer to solutions and residuals with the true data (g, u_0) ∈ Y′ × H being replaced by an approximation (^δg, ^δu_0) ∈ ^δG × ^δU. In the following we denote such solutions and residuals with an additional left superscript δ, or more generally δ̃ when a right-hand side (^δ̃g, ^δ̃u_0) ∈ ^δ̃G × ^δ̃U has been used for their computation.

Proposition 4.6.
Assume (4.2), and let ϑ ∈ (0, 1] be a constant. Then for κ_Δ/γ_Δ − 1 and a constant ω̂ > 0 both being sufficiently small dependent on ϑ, with max(κ_Δ/γ_Δ − 1, ω̂) ↓ 0 when ϑ ↓ 0, there exists a constant ρ̌ < 1 such that for J ⊆ J_δ with ‖^δr^δ̄_δ|_J‖ ≥ ϑ‖^δr^δ̄_δ‖, and X^δ + span Θ^δ|_J ⊆ X^δ̃, and

max(‖g − ^δg‖_{Y′} + ‖u_0 − ^δu_0‖_H, ‖g − ^δ̃g‖_{Y′} + ‖u_0 − ^δ̃u_0‖_H) ≤ ω̂ ‖^δr^δ̄_δ‖,

it holds that

‖^δ̃u − ^δ̃u^{δ̃̄}_{δ̃}‖_X ≤ ρ̌ ‖^δu − ^δu^δ̄_δ‖_X.

Proof.
In the newly introduced notations, the statements of Propositions 4.3 and 4.4 read as

‖^δu − ^δu^{δ̃̄}_{δ̃}‖_X ≤ ρ̄ ‖^δu − ^δu^δ̄_δ‖_X,

and

(4.8)  ‖^δr^δ̄_δ‖ ≂ ‖^δu − ^δu^δ̄_δ‖_X.

The proof is easily completed by

‖^δ̃u − ^δu‖_X, ‖^δ̃u^{δ̃̄}_{δ̃} − ^δu^{δ̃̄}_{δ̃}‖_X ≲ ‖^δ̃g − ^δg‖_{Y′} + ‖^δ̃u_0 − ^δu_0‖_H ≤ 2ω̂ ‖^δr^δ̄_δ‖ ≂ ω̂ ‖^δu − ^δu^δ̄_δ‖_X. □

In view of the latter proposition, we make the following assumption.
Assumption 4.7.
We assume to have maps of the following types available:
• ∆ → Y′ × H : δ ↦ (^δg, ^δu_0) ∈ ^δG × ^δU,
• η : ∆ → ℝ such that ‖g − ^δg‖_{Y′} + ‖u_0 − ^δu_0‖_H ≤ η(δ), and η(δ̃) ≤ η(δ) when δ̃ ⪰ δ,
• ℝ_{>0} → ∆ : ε ↦ δ(ε) such that η(δ(ε)) ≤ ε.
Notice that this in particular means that for any ε > 0 there exist δ ∈ ∆ and (^δg, ^δu_0) ∈ ^δG × ^δU with ‖g − ^δg‖_{Y′} + ‖u_0 − ^δu_0‖_H ≤ ε. A specification of a suitable family (^δG, ^δU)_{δ∈∆} will be given in Sect. 6.4.

Given a δ ∈ ∆, and thinking of (^δg, ^δu_0) as being a quasi-best approximation to (g, u_0) from ^δG × ^δU, the difference (g, u_0) − (^δg, ^δu_0) is often referred to as data oscillation.
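The interface demanded by Assumption 4.7 can be sketched in a few lines of Python. The class and method names below are hypothetical, and the geometric decay of the data-error bound under refinement is an assumption made purely for this illustration; any monotone η with a matching ε ↦ δ(ε) map would do.

```python
# Hypothetical sketch of Assumption 4.7: each "delta" indexes a nested family
# of data spaces, eta(delta) bounds the data error
# ||g - g^delta||_{Y'} + ||u_0 - u_0^delta||_H, and eta is non-increasing
# under refinement (here it halves per level, an assumption for illustration).
class DataApprox:
    def __init__(self, eta0=1.0):
        self.eta0 = eta0

    def eta(self, delta):
        # monotone error bound: refining (larger delta) never increases it
        return self.eta0 * 2.0 ** (-delta)

    def delta_for_tolerance(self, eps):
        # the map eps -> delta(eps) with eta(delta(eps)) <= eps
        delta = 0
        while self.eta(delta) > eps:
            delta += 1
        return delta

approx = DataApprox()
d = approx.delta_for_tolerance(0.1)
assert approx.eta(d) <= 0.1
assert approx.eta(d + 1) <= approx.eta(d)  # monotonicity under refinement
```

The adaptive loop only ever consults these two maps, so any concrete data-approximation scheme (e.g. interpolation of g and u_0 on nested grids) can be plugged in behind this interface.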
A convergent algorithm.
In view of the statement from Proposition 4.6, in the following we will use the short-hand notations

u_δ = ^δu^δ̄_δ,  r_δ = ^δr^δ̄_δ,

i.e., u_δ is the solution of

(4.9)  S^δ̄_δ u_δ = f_δ,  where S^δ̄_δ := E^δ_{X′}(B′E^δ̄_Y K^δ̄_Y E^δ̄_{Y′} B + γ_0′γ_0)E^δ_X,  f_δ = ^δf^δ̄_δ := E^δ_{X′}(B′E^δ̄_Y K^δ̄_Y E^δ̄_{Y′} ^δg + γ_0′ ^δu_0)

(cf. (3.6) and Sect. 3.3), and

(4.10)  r_δ = E^δ̄_{X′}[B′E^δ̄_Y K^δ̄_Y E^δ̄_{Y′}(^δg − Bu_δ) + γ_0′(^δu_0 − γ_0 u_δ)](Θ^δ).

Instead of solving (4.9) exactly, we will allow it to be solved approximately with a sufficiently small relative tolerance by the application of an iterative method. To that end, we assume to have available a K^δ_X = (K^δ_X)′ ∈ L_is(X^δ′, X^δ) for which both

(4.11)  ((K^δ_X)^{−1}w)(w) ≂ ‖w‖²_X  (w ∈ X^δ)

(i.e., K^δ_X is an optimal (self-adjoint and coercive) preconditioner for S^δ̄_δ), and which can be applied at linear cost. Besides for an efficient iterative solving of (4.9), we will use this preconditioner to compute a quantity that is equivalent to the X-norm of the (algebraic) error in any approximation from X^δ to u_δ. We denote such an approximate solution of (4.9) by ũ_δ ∈ X^δ, with corresponding residual vector r̃_δ defined as in (4.10) by replacing u_δ by ũ_δ.

Algorithm 4.8.
Let ω > 0, ϑ ∈ (0, 1], 0 < ξ < 1, ε > 0, δ := δ_init ∈ ∆, t_δ ≂ ‖g‖_{Y′} + ‖u_0‖_H.
do
  do
    compute ũ_δ ∈ X^δ with t̃_δ := √((f_δ − S^δ̄_δ ũ_δ)(K^δ_X(f_δ − S^δ̄_δ ũ_δ))) ≤ t_δ/2; t_δ := t̃_δ
    if e_δ := ‖r̃_δ‖ + η(δ) + t_δ ≤ ε then stop endif
  until t_δ ≤ ξ e_δ
  if η(δ) > ω‖r̃_δ‖ then
    select δ̃ ∈ ∆ s.t. X^δ̃ ⊇ X^δ is (a near-smallest) space such that η(δ̃) ≤ η(δ)/2
  else
    determine δ ⪯ δ̃ ⪯ δ̄ s.t. X^δ̃ is (a near-smallest) space that for a J ⊆ I^δ̄_δ contains X^δ + span Θ^δ|_J, where ‖r̃_δ|_J‖ ≥ ϑ‖r̃_δ‖
  endif
  t_δ̃ := e_δ, δ := δ̃
enddo
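The control flow of Algorithm 4.8 can be sketched as runnable Python. All problem-specific ingredients (the inexact Galerkin solve, the residual norm ‖r̃_δ‖, the data-error bound η(δ), and the two refinement routines) are passed in as callables; the concrete stand-ins used in the demo are artificial, chosen only so that the loop visibly terminates, and do not come from the paper.

```python
def adaptive_solve(solve_to_tol, resid_norm, eta, refine_data, refine_space,
                   delta, t, eps, omega=0.5, theta=0.5, xi=0.1):
    """Sketch of Algorithm 4.8: the inner loop drives the algebraic error
    bound t_delta below xi * e_delta; then either the data or the trial
    space is refined, depending on which error source dominates."""
    while True:
        while True:
            u, t = solve_to_tol(delta, t / 2)    # inexact solve, t_delta := t~_delta
            e = resid_norm(delta, u) + eta(delta) + t
            if e <= eps:
                return u, delta                  # total error indicator below eps
            if t <= xi * e:
                break                            # algebraic error small enough
        if eta(delta) > omega * resid_norm(delta, u):
            delta = refine_data(delta)           # halve the data-oscillation bound
        else:
            delta = refine_space(delta, u, theta)  # bulk-chase marked residual entries
        t = e                                    # new start tolerance t_delta~ := e_delta

# toy stand-ins: residual and data error halve per refinement (illustration only)
demo = adaptive_solve(
    solve_to_tol=lambda d, tol: (None, tol),
    resid_norm=lambda d, u: 2.0 ** (-d),
    eta=lambda d: 2.0 ** (-d),
    refine_data=lambda d: d + 1,
    refine_space=lambda d, u, th: d + 1,
    delta=0, t=1.0, eps=1e-2)
```

With these stand-ins the loop refines until 2^(1−δ) plus the algebraic tolerance drops below ε, mirroring how the real algorithm balances data oscillation, discretization error, and algebraic error.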
Assume (4.2), and let the constants γ_Δ and κ_Δ be as defined in (3.4) and (3.9), respectively. For constants ϑ, ω/ϑ, ξ/ϑ, (κ_Δ/γ_Δ − 1)/ω that are sufficiently small, with additionally ω and κ_Δ/γ_Δ − 1 sufficiently small dependent on ϑ with max(κ_Δ/γ_Δ − 1, ω) ↓ 0 when ϑ ↓ 0, there exists a constant ρ̆ < 1 such that between any two successive passings of the until-clause the value of ϑ‖u − ũ_δ‖_X + η(δ) decreases with at least a factor ρ̆. For any ε > 0, Algorithm 4.8 terminates, and at termination it holds that ‖u − ũ_δ‖_X + η(δ) ≲ ε.

Remark. Minor adaptations to the proof show that the statement remains true when one takes e_δ := E_δ(ũ_δ; ^δg, ^δu_0) + η(δ) in Algorithm 4.8. Having to compute r̃_δ anyway, the additional cost of computing this e_δ is small, and it can be expected to be closer to ‖u − ũ_δ‖_X + η(δ).

Proof.
By replacing w by K^δ_X S^δ̄_δ w in (4.11), one infers that (S^δ̄_δ v)(K^δ_X S^δ̄_δ v) ≂ ‖v‖²_X for v ∈ X^δ, and so

(4.12)  ‖u_δ − ũ_δ‖²_X ≂ (f_δ − S^δ̄_δ ũ_δ)(K^δ_X(f_δ − S^δ̄_δ ũ_δ)).

From (3.11) and (4.2), we have

(4.13)  ‖r_δ − r̃_δ‖ = sup_{0≠c∈ℝ^{J_δ}} (S^δ̄_δ̄(ũ_δ − u_δ))(c^⊤Θ^δ)/‖c‖ ≤ (1/m) ‖u_δ − ũ_δ‖_{X^δ̄_δ̄} ≤ (√κ_Δ/m) ‖u_δ − ũ_δ‖_X.

If the algorithm stops, then

‖u − ũ_δ‖_X ≤ ‖u − ^δu‖_X + ‖^δu − u_δ‖_X + ‖u_δ − ũ_δ‖_X
 ≲ η(δ) + ‖^δu − u_δ‖_X + ‖u_δ − ũ_δ‖_X    (by Thm. 2.1 & Ass. 4.7)
 ≂ η(δ) + ‖r_δ‖ + ‖u_δ − ũ_δ‖_X    ((4.8))
 ≤ η(δ) + ‖r̃_δ‖ + ‖r_δ − r̃_δ‖ + ‖u_δ − ũ_δ‖_X
 ≲ η(δ) + ‖r̃_δ‖ + ‖u_δ − ũ_δ‖_X    ((4.13))
 ≲ η(δ) + ‖r̃_δ‖ + t_δ ≤ ε.    ((4.12))

The inner do-loop always terminates either by passing the until-clause or by the stop-statement. Indeed, inside this loop the value of t_δ is driven to 0, so that ‖r̃_δ‖ + η(δ) tends to ‖r_δ‖ + η(δ). So if ‖r_δ‖ + η(δ) ≠ 0, then at some moment t_δ ≤ ξ(‖r̃_δ‖ + η(δ) + t_δ), whereas if ‖r_δ‖ + η(δ) = 0, then e_δ ≤ ε.

When passing the until-clause, it holds that t_δ ≤ ξ(‖r̃_δ‖ + η(δ) + t_δ), and so, by using ξ < 1 and kicking back t_δ,

(4.14)  t_δ ≲ ξ(‖r̃_δ‖ + η(δ)) ≤ ξ(‖r_δ‖ + ‖r̃_δ − r_δ‖ + η(δ)) ≲ ξ(‖^δu − u_δ‖_X + ‖ũ_δ − u_δ‖_X + η(δ)) ≲ ξ(‖u − u_δ‖_X + t_δ + η(δ)).

Taking ξ small enough and kicking back t_δ, we obtain t_δ ≲ ξ(‖u − u_δ‖_X + η(δ)), and similarly

(4.15)  t_δ ≲ ξ(‖u − ũ_δ‖_X + η(δ)),
(4.16)  t_δ ≲ ξ(‖^δu − u_δ‖_X + η(δ)).

When passing the until-clause, furthermore we have

(4.17)  ‖u − ũ_δ‖_X ≲ t_δ + ‖u − u_δ‖_X ≲ t_δ + ‖^δu − u_δ‖_X + η(δ) ≂ t_δ + ‖r_δ‖ + η(δ) ≤ t_δ + ‖r_δ − r̃_δ‖ + ‖r̃_δ‖ + η(δ) ≲ t_δ + ‖u_δ − ũ_δ‖_X + ‖r̃_δ‖ + η(δ) ≲ t_δ + ‖r̃_δ‖ + η(δ) ≲ ‖r̃_δ‖ + η(δ),

where the last step used (4.14).
Denoting δ at the subsequent passing of the until-clause as δ̃, we have

‖u − ũ_δ̃‖_X ≲ t_δ̃ + ‖^δ̃u − u_δ̃‖_X + η(δ̃) ≤ t_δ̃ + √(κ_Δ/γ_Δ) ‖^δ̃u − ũ_δ‖_X + η(δ̃) ≤ t_δ̃ + √(κ_Δ/γ_Δ) ‖u − ũ_δ‖_X + O(η(δ̃)),

by (3.10) and δ ⪯ δ̃. By using t_δ̃ ≲ ξ(‖u − ũ_δ̃‖_X + η(δ̃)) ((4.15)) and kicking back ‖u − ũ_δ̃‖_X, we infer that for ξ sufficiently small,

(4.18)  ‖u − ũ_δ̃‖_X ≤ (√(κ_Δ/γ_Δ) + O(ξ)) ‖u − ũ_δ‖_X + O(η(δ̃)).

In the case that

(4.19)  η(δ) > ω‖r̃_δ‖,

it holds that η(δ̃) ≤ η(δ)/2, and so, thanks to (4.17) and (4.19),

(4.20)  ‖u − ũ_δ‖_X ≲ η(δ)/ω.

For any constant ρ < 1, using (4.18) and (4.20) we have

‖u − ũ_δ̃‖_X ≤ ρ‖u − ũ_δ‖_X + ((√(κ_Δ/γ_Δ) + O(ξ) − ρ)/ω) ω‖u − ũ_δ‖_X + O(η(δ̃)) ≤ ρ‖u − ũ_δ‖_X + ((√(κ_Δ/γ_Δ) + O(ξ) − ρ)/ω + 1) C η(δ),

for some constant C > 0, and so

ϑ‖u − ũ_δ̃‖_X + η(δ̃) ≤ ρϑ‖u − ũ_δ‖_X + (ϑ((√(κ_Δ/γ_Δ) − ρ + O(ξ))/ω + 1)C + 1/2) η(δ).

Now let ϑ > 0 be such that 2ϑC + 1/2 < 1. Given a constant ω (which later will be selected such that ω/ϑ is sufficiently small), let (κ_Δ/γ_Δ − 1)/ω, (1 − ρ)/ω, and ξ/ω be sufficiently small such that (√(κ_Δ/γ_Δ) − ρ + O(ξ))/ω ≤ 1. We conclude that in this case ϑ‖u − ũ_δ‖_X + η(δ) is reduced by at least a factor max(ρ, 2ϑC + 1/2) < 1.

Next consider the case that

(4.21)  η(δ) ≤ ω‖r̃_δ‖,

so that

(4.22)  ‖r̃_δ|_J‖ ≥ ϑ‖r̃_δ‖.

We have

(4.23)  ‖r_δ − r̃_δ‖ ≲ ‖u_δ − ũ_δ‖_X ≲ t_δ ≲ ξ(‖r̃_δ‖ + η(δ)) ≲ ξ‖r̃_δ‖ ≤ ξ(‖r_δ‖ + ‖r_δ − r̃_δ‖),

by (4.14) and (4.21), and so, by taking ξ sufficiently small and kicking back ‖r_δ − r̃_δ‖, also

(4.24)  ‖r_δ − r̃_δ‖ ≲ ξ‖r_δ‖,

which together with (4.21) implies that

(4.25)  η(δ) ≲ ω‖r_δ‖.

From (4.22)-(4.24) we infer that

‖r_δ‖ ≲ ‖r̃_δ‖ ≲ ‖r̃_δ|_J‖ ≤ ‖r_δ|_J‖ + ‖r_δ − r̃_δ‖ ≲ ‖r_δ|_J‖ + ξ‖r_δ‖.

By taking ξ sufficiently small and kicking back ‖r_δ‖, we conclude that there exists a constant ϑ̃ > 0 with ‖r_δ|_J‖ ≥ ϑ̃‖r_δ‖. Assuming that κ_Δ/γ_Δ − 1 and ω are small enough, using (4.25) an application of Proposition 4.6 shows that there exists a constant ρ̌ < 1 with

(4.26)  ‖^δ̃u − u_δ̃‖_X ≤ ρ̌ ‖^δu − u_δ‖_X.

Furthermore we have

‖r̃_δ‖ ≤ ‖r_δ‖ + ‖r̃_δ − r_δ‖ ≲ ‖^δu − u_δ‖_X + t_δ ≲ ‖u − ũ_δ‖_X + η(δ) + t_δ ≲ ‖u − ũ_δ‖_X + η(δ),

using (4.15), and so from (4.21), by kicking back η(δ) and taking ω sufficiently small,

(4.27)  η(δ) ≲ ω‖u − ũ_δ‖_X.

We conclude that

ϑ‖u − ũ_δ̃‖_X + η(δ̃) ≤ ϑ‖^δ̃u − u_δ̃‖_X + O(η(δ̃) + t_δ̃)
 ≤ (ϑ + O(ξ)) ‖^δ̃u − u_δ̃‖_X + O(η(δ̃))    ((4.16))
 ≤ (ϑ + O(ξ)) ρ̌ ‖^δu − u_δ‖_X + O(η(δ̃))    ((4.26))
 ≤ (ϑ + O(ξ)) ρ̌ [‖u − ũ_δ‖_X + O(η(δ)) + t_δ] + O(η(δ̃))
 ≤ (ϑ + O(ξ)) ρ̌ [(1 + O(ξ))‖u − ũ_δ‖_X + O(η(δ))] + O(η(δ))    ((4.15))
 ≤ [(ϑ + O(ξ)) ρ̌ (1 + O(ξ)) + O(ω)] ‖u − ũ_δ‖_X    ((4.27))
 = [ρ̌ + O((ω + ξ)/ϑ)] ϑ‖u − ũ_δ‖_X.

So for ω/ϑ and ξ/ϑ sufficiently small, also in this case we established a reduction of ϑ‖u − ũ_δ̃‖_X + η(δ̃) by at least a constant factor less than 1.

What is left to show is that the algorithm terminates. We have shown that the value of ϑ‖u − ũ_δ‖_X + η(δ) at passing the until-clause is r-linearly converging. We consider the corresponding value of e_δ = ‖r̃_δ‖ + η(δ) + t_δ. Arguments that we have used multiple times show that ‖r̃_δ‖ ≲ ‖u − ũ_δ‖_X + η(δ) + t_δ, and so e_δ ≲ ‖u − ũ_δ‖_X + η(δ) + t_δ. Using that t_δ ≤ ξ(‖r̃_δ‖ + η(δ)) ≤ ξe_δ, for ξ sufficiently small kicking back e_δ shows that e_δ ≲ ‖u − ũ_δ‖_X + η(δ), and so e_δ ≲ ϑ^{−1}(ϑ‖u − ũ_δ‖_X + η(δ)).
This last statement implies that at some moment e_δ ≤ ε, meaning that the algorithm stops. □
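Theorem 4.9 asserts that ϑ‖u − ũ_δ‖_X + η(δ) decays r-linearly, i.e. like C ρ̆^k over the iterations k. In numerical experiments one can estimate the observed reduction factor by a least-squares fit of a line through the logarithms of the recorded totals; the helper below is a generic sketch of that check, not part of the paper's algorithm, and the error history used in the demo is synthetic.

```python
import math

def rlinear_factor(errors):
    """Least-squares estimate of the factor rho in errors[k] ~ C * rho**k,
    i.e. exp of the slope of log(errors) against the iteration index k."""
    n = len(errors)
    xs = range(n)
    ys = [math.log(e) for e in errors]
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return math.exp(slope)

# a synthetic error history decaying with factor 0.7 (illustration only)
hist = [10.0 * 0.7 ** k for k in range(12)]
rho = rlinear_factor(hist)
assert abs(rho - 0.7) < 1e-9
```

On genuinely r-linear data the fit recovers the factor up to the noise in the early, pre-asymptotic iterations, which one may discard before fitting.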
5. WAVELETS-IN-TIME TENSORIZED WITH FINITE-ELEMENTS-IN-SPACE
We specify the parabolic problem at hand, as well as the type of families (X^δ)_{δ∈∆} and (Y^δ)_{δ∈∆} of 'trial' and 'test' spaces. A likely harmless minor further restriction on these families, which will be needed for the construction of an X-stable collection Θ^δ that spans an X-stable complement space of X^δ in X^δ̄, specifically condition (4.2), will be postponed to Sect. 6.1.

5.1. Continuous problem.
For some bounded domain Ω ⊂ ℝ^d, we take H = L_2(Ω) and, for some closed ∅ ⊆ Γ_D ⊆ ∂Ω,

V = H¹_{0,Γ_D}(Ω) := clos_{H¹(Ω)} {u ∈ C^∞(Ω) ∩ H¹(Ω) : u|_{Γ_D} = 0},

and

a(t; η, ζ) := ∫_Ω K ∇η · ∇ζ + c ηζ dx,

where K = K^⊤ ∈ L_∞(I × Ω)^{d×d} with K(·) ≂ Id a.e., and c ∈ L_∞(I × Ω). W.l.o.g. we take T = 1.

5.2. Wavelets in time.
We will construct the trial and test spaces as the span of wavelets-in-time tensorized with finite element spaces-in-space. In this subsection we collect some assumptions on the wavelets.

At the 'trial side' we consider a countable collection Σ = {σ_λ : λ ∈ ∨_Σ} of functions I → ℝ known as wavelets. To each λ ∈ ∨_Σ we associate a value |λ| ∈ ℕ_0, called the level of λ. We assume that the wavelets are locally supported, meaning that sup_{n∈ℕ_0, ℓ∈ℕ_0} #{λ ∈ ∨_Σ : |λ| = ℓ, |supp σ_λ ∩ 2^{−ℓ}(n + [0, 1])| > 0} < ∞ and diam supp σ_λ ≲ 2^{−|λ|}. To each λ ∈ ∨_Σ with |λ| > 0, we associate one or more λ̃ ∈ ∨_Σ with |λ̃| = |λ| − 1 and |supp σ_λ ∩ supp σ_λ̃| > 0, called parent(s) of λ. We denote this relation between a parent λ̃ and a child λ by λ̃ ◁_Σ λ. The definitions of parents and children give rise to obvious notions of ancestors and descendants.

To each λ ∈ ∨_Σ we associate some neighbourhood S_Σ(λ) of supp σ_λ with diam S_Σ(λ) ≲ 2^{−|λ|} and λ̃ ◁_Σ λ ⟹ S_Σ(λ̃) ⊇ S_Σ(λ). For some wavelet bases, e.g. Alpert wavelets ([Alp93]), it suffices to take S_Σ(λ) = supp σ_λ. With C_Σ := sup_{λ∈∨_Σ} 2^{|λ|} diam supp σ_λ, a neighbourhood that in any case is sufficiently large is {t ∈ I : dist(t, supp σ_λ) ≤ C_Σ 2^{1−|λ|}}. Indeed, if with this definition t ∈ S_Σ(λ) and λ̃ ◁_Σ λ, then dist(t, supp σ_λ̃) ≤ dist(t, supp σ_λ) + diam(supp σ_λ) ≤ C_Σ 2^{1−|λ̃|}, i.e. t ∈ S_Σ(λ̃).

We assume that Σ is a Riesz basis for L_2(I), and, when renormalized in H¹(I)-norm, that it is a Riesz basis for H¹(I). Although not essential, thinking of wavelets as being (essentially) constructed by means of dilation, we assume that ‖σ_λ‖_{H¹(I)} ≂ 2^{|λ|}.

At the 'test side' we consider a similar collection Ψ = {ψ_μ : μ ∈ ∨_Ψ} of wavelets, with the difference though that this one has to be even an orthonormal basis for L_2(I), whilst, renormalized in H¹(I)-norm, it does not need to be a Riesz basis for H¹(I).

We will assume that Σ and Ψ are selected such that for any ℓ ∈ ℕ_0,

span {σ_λ : |λ| ≤ ℓ} ∪ span {σ′_λ : |λ| ≤ ℓ} ⊆ span {ψ_μ : |μ| ≤ ℓ},

so that in particular

(5.1)  |μ| > |λ| ⟹ ⟨σ_λ, ψ_μ⟩_{L_2(I)} = 0 = ⟨σ′_λ, ψ_μ⟩_{L_2(I)}.

5.3. Uniform stability.
In the following proposition, we further specify the typeof families of trial and test spaces that we consider, and formulate sufficient con-ditions for the requirements (3.3)-(3.4), which implied uniform stability of theGalerkin discretizations of our saddle-point problem (2.6).
Proposition 5.1.
For δ ∈ ∆, let

X^δ = ∑_{λ∈∨_Σ} σ_λ ⊗ W^δ_λ,  Y^δ = ∑_{μ∈∨_Ψ} ψ_μ ⊗ V^δ_μ

for subspaces W^δ_λ, V^δ_μ ⊆ V of which finitely many are non-zero. Let

(5.2)  ⟨σ_λ, ψ_μ⟩_{L_2(I)} ≠ 0 ⟹ V^δ_μ ⊇ W^δ_λ,

and, for some constant γ_Δ > 0, for any μ ∈ ∨_Ψ,

(5.3)  inf_{0≠w∈∑_{{λ∈∨_Σ : ⟨σ′_λ,ψ_μ⟩_{L_2(I)}≠0}} W^δ_λ} sup_{0≠v∈V^δ_μ} w(v)/(‖w‖_{V′}‖v‖_V) ≥ γ_Δ.

Then X^δ ⊆ Y^δ and

inf_{0≠w∈X^δ} sup_{0≠v∈Y^δ} (∂_t w)(v)/(‖∂_t w‖_{Y′}‖v‖_Y) ≥ γ_Δ,

i.e., the conditions (3.3)-(3.4) for uniform stability are satisfied.

Proof. For w_λ ∈ W^δ_λ and w := ∑_{λ∈∨_Σ} σ_λ ⊗ w_λ ∈ X^δ, writing σ_λ = ∑_{μ∈∨_Ψ} ⟨σ_λ, ψ_μ⟩_{L_2(I)} ψ_μ shows that w = ∑_{μ∈∨_Ψ} ψ_μ ⊗ ∑_{λ∈∨_Σ} ⟨σ_λ, ψ_μ⟩_{L_2(I)} w_λ ∈ Y^δ by the first assumption.

Similarly ∂_t w = ∑_{μ∈∨_Ψ} ψ_μ ⊗ ṽ_μ, where ṽ_μ := ∑_{λ∈∨_Σ} ⟨σ′_λ, ψ_μ⟩_{L_2(I)} w_λ. For any ε > 0 and μ ∈ ∨_Ψ, there exists a v_μ ∈ V^δ_μ with ṽ_μ(v_μ) ≥ (γ_Δ − ε)‖ṽ_μ‖_{V′}‖v_μ‖_V and ‖ṽ_μ‖_{V′} = ‖v_μ‖_V. With v := ∑_{μ∈∨_Ψ} ψ_μ ⊗ v_μ ∈ Y^δ, we infer that (∂_t w)(v) = ∑_{μ∈∨_Ψ} ṽ_μ(v_μ) ≥ (γ_Δ − ε)‖∂_t w‖_{Y′}‖v‖_Y. □

In order to be able to apply at linear cost the arising linear operators in (4.9)-(4.10), we will restrict the type of trial spaces X^δ = ∑_{λ∈∨_Σ} σ_λ ⊗ W^δ_λ by imposing the following tree condition:

(5.4)  λ̃ ◁_Σ λ ⟹ W^δ_λ̃ ⊇ W^δ_λ.

For the same reason the analogous condition will be needed for Y^δ. For X^δ satisfying (5.4), below the latter will be verified, and sufficient, more easily verifiable conditions for (5.2)-(5.3) are derived.
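When each space W^δ_λ is represented concretely, e.g. by the set of spatial degrees of freedom it is spanned by, the tree condition (5.4) becomes a mechanical set-inclusion check along the parent relation ◁_Σ. The data layout below (string labels, integer dof indices) is hypothetical and serves only to illustrate the check.

```python
# Hypothetical representation: W maps each temporal wavelet index lam to the
# set of spatial dofs spanning W^delta_lam; `parents` encodes lam~ <|_Sigma lam.
# Condition (5.4) requires every parent's space to contain the child's.
def satisfies_tree_condition(W, parents):
    return all(W[p] >= W[lam]          # set inclusion: W^delta_{lam~} ⊇ W^delta_lam
               for lam, ps in parents.items() for p in ps)

W = {"root": {0, 1, 2, 3}, "l1a": {0, 1, 2}, "l2a": {0, 1}}
parents = {"l1a": ["root"], "l2a": ["l1a"]}
assert satisfies_tree_condition(W, parents)

W["l2a"] = {0, 4}   # dof 4 is missing from the parent space: (5.4) violated
assert not satisfies_tree_condition(W, parents)
```

In an implementation this invariant is typically maintained by construction (refining a child forces refinement of its ancestors) rather than checked after the fact; the predicate is then useful as a debug assertion.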
Proposition 5.2.
Let X^δ = ∑_{λ∈∨_Σ} σ_λ ⊗ W^δ_λ satisfy (5.4). For μ ∈ ∨_Ψ, set

Ŵ^δ_μ := ∑_{{λ∈∨_Σ : |λ|=|μ|, |S_Ψ(μ)∩S_Σ(λ)|>0}} W^δ_λ.

Build Y^δ = ∑_{μ∈∨_Ψ} ψ_μ ⊗ V^δ_μ by taking V^δ_μ = {0} when Ŵ^δ_μ = {0}, and otherwise V^δ_μ ⊇ Ŵ^δ_μ where

(5.5)  inf_{0≠w∈Ŵ^δ_μ} sup_{0≠v∈V^δ_μ} w(v)/(‖w‖_{V′}‖v‖_V) ≥ γ_Δ

for some constant γ_Δ > 0. Then the conditions (5.2) and (5.3) from Proposition 5.1 for uniform stability are satisfied.

When dim V^δ_μ ≲ dim Ŵ^δ_μ, then dim Y^δ ≲ dim X^δ, and under the natural condition that a larger Ŵ^δ_μ gives rise to a larger (more precisely, not smaller) V^δ_μ, the constructed Y^δ satisfies the tree condition

(5.6)  μ̃ ◁_Ψ μ ⟹ V^δ_μ̃ ⊇ V^δ_μ.

Proof.
Let ⟨σ_λ, ψ_μ⟩_{L_2(I)} ≠ 0 or ⟨σ′_λ, ψ_μ⟩_{L_2(I)} ≠ 0. Then |S_Σ(λ) ∩ S_Ψ(μ)| > 0 and |λ| ≥ |μ| by (5.1). When |λ| > |μ|, λ has an ancestor λ̃ with |λ̃| = |μ|, W^δ_λ̃ ⊇ W^δ_λ, and S_Σ(λ̃) ⊇ S_Σ(λ), and thus |S_Σ(λ̃) ∩ S_Ψ(μ)| > 0. We conclude that both ∑_{{λ∈∨_Σ : ⟨σ_λ,ψ_μ⟩_{L_2(I)}≠0}} W^δ_λ and ∑_{{λ∈∨_Σ : ⟨σ′_λ,ψ_μ⟩_{L_2(I)}≠0}} W^δ_λ are included in Ŵ^δ_μ, so that (5.2) and (5.3) are guaranteed by the selection of V^δ_μ.

The statement dim Y^δ ≲ dim X^δ when dim V^δ_μ ≲ dim Ŵ^δ_μ follows from dim Ŵ^δ_μ ≤ ∑_{{λ∈∨_Σ : |λ|=|μ|, |S_Ψ(μ)∩S_Σ(λ)|>0}} dim W^δ_λ, and the fact that for any λ ∈ ∨_Σ, the number of μ ∈ ∨_Ψ with |μ| = |λ| and |S_Ψ(μ) ∩ S_Σ(λ)| > 0 is uniformly bounded.

Let μ̃ ◁_Ψ μ, and so S_Ψ(μ̃) ⊇ S_Ψ(μ). For each λ ∈ ∨_Σ with |λ| = |μ| and |S_Ψ(μ) ∩ S_Σ(λ)| > 0, there exists a λ̃ ◁_Σ λ, thus with S_Σ(λ̃) ⊇ S_Σ(λ), and W^δ_λ̃ ⊇ W^δ_λ by (5.4). We conclude that Ŵ^δ_μ̃ ⊇ Ŵ^δ_μ, which completes the proof of (5.6). □

As follows from [DSW20, Thm. 3.10] (taking B there to be the Riesz map H → H′), condition (5.5) has the following equivalent formulation.

Proposition 5.3.
Condition (5.5) is equivalent to the existence of a projector Q ∈ L(V, V) with ran Q ⊆ V^δ_μ, ran Q* ⊇ Ŵ^δ_μ, and ‖Q‖_{L(V,V)} ≤ γ_Δ^{−1}.

5.4. Selection of the spatial approximation spaces as finite element spaces.
We will select the spaces W^δ_λ from a collection O of finite element spaces in V, which collection is closed under taking (finite) sums, and for which

(5.7)  inf_{W∈O} inf_{0≠w∈W} sup_{0≠v∈W} w(v)/(‖w‖_{V′}‖v‖_V) ≥ γ_Δ > 0,

and we take

(5.8)  V^δ_μ := Ŵ^δ_μ ∈ O.

As follows from Proposition 5.3, (5.7) is equivalent to uniform boundedness w.r.t. the norm on V of the H-orthogonal projector onto W ∈ O. It is well-known that an example of such a collection O is given by the set of all finite element spaces w.r.t. quasi-uniform, uniformly shape regular conforming partitions of Ω into, say, d-simplices.

It is known that the uniform boundedness w.r.t. the V-norm of the H-orthogonal projector holds true also for finite element spaces w.r.t. locally refined partitions, as long as the grading of the partitions is sufficiently mild. In [GHS16] it has been shown that for d = 2, newest vertex bisection (NVB), starting from a fixed conforming initial partition T_⊥ with an assignment of the newest vertices that satisfies a so-called matching condition, is sufficiently mildly graded in the above sense. Since the overlay of two conforming NVB partitions is a conforming NVB partition, this collection is closed under taking (finite) sums. In other words, with this collection of finite element spaces, which we will employ in our experiments, again the choice (5.8) guarantees uniform stability.

In [Car04] a result similar to that from [GHS16] has been shown for red-blue-green refinement and lowest order finite elements, again for d = 2. Unfortunately, for d > 2 we are not aware of such results.

Remark (γ_Δ close to 1). We discussed uniform boundedness w.r.t. the V-norm of the H-orthogonal projectors onto a family of finite element spaces, which, by taking V^δ_μ := Ŵ^δ_μ in Proposition 5.2, yields the uniform inf-sup condition (3.4) for some value γ_Δ > 0, and so uniform stability of the Galerkin discretizations of the saddle-point (2.6). For proving convergence of our adaptive routine Algorithm 4.8, however, we needed a value of γ_Δ close to 1; with the choice V^δ_μ = Ŵ^δ_μ, there is no guarantee that 1 − γ_Δ is sufficiently small.

Restricting to quasi-uniform partitions, below we show that 1 − γ_Δ can be made arbitrarily small by taking the mesh underlying V^δ_μ to be a sufficiently deep, but fixed, refinement of the mesh underlying Ŵ^δ_μ. One may conjecture that the same result holds true for sufficiently mildly graded locally refined meshes.

Let the diameters of any d-simplex in the partitions underlying Ŵ^δ_μ and V^δ_μ be proportional to h_c and h_f, respectively. For s ∈ [0, 1], let H^s := [H, V]_{s,2}. In any case when Ω is a Lipschitz domain, it is known that there exists an s ∈ (0, 1] such that the solution u ∈ V of ⟨u, v⟩_V = f(v) (v ∈ V) satisfies ‖u‖_{H^{1+s}(Ω)} ≲ ‖f‖_{(H^{1−s})′}, assuming the right-hand side is bounded. From this, the Aubin-Nitsche duality argument shows that the V-orthogonal projector P_μ onto V^δ_μ satisfies ‖Id − P_μ‖_{L(V, H^{1−s})} ≲ h_f^s. On the other hand, on Ŵ^δ_μ we have the following inverse inequality: ‖·‖_{(H^{1−s})′} ≲ h_c^{−s}‖·‖_{V′} (e.g. [SvV19, (5.14)]).

Given w ∈ Ŵ^δ_μ, for any ε > 0, there exists a v ∈ V with w(v) ≥ (1 − ε)‖w‖_{V′}‖v‖_V. We infer that, for some constant C > 0,

w(P_μ v) = w(v) + w((Id − P_μ)v) ≥ (1 − ε)‖w‖_{V′}‖v‖_V − ‖w‖_{(H^{1−s})′}‖(Id − P_μ)v‖_{H^{1−s}} ≥ (1 − (ε + C(h_f/h_c)^s))‖w‖_{V′}‖v‖_V ≥ (1 − (ε + C(h_f/h_c)^s))‖w‖_{V′}‖P_μ v‖_V.

Since ε > 0 was arbitrary, we conclude that γ_Δ ≥ 1 − C(h_f/h_c)^s, which proves our assertion.

5.5. Best possible rates.
Although so far we have not proved it, we expect that the sequence of approximations generated by our adaptive Algorithm 4.8 is not only r-linearly converging, but, ignoring data oscillation, that it is a sequence of approximations from a sequence of spaces from the family (X^δ)_{δ∈∆} that converges with the best possible rate. In this subsection, we show that with our selection of the (X^δ)_{δ∈∆}, under some (mild) smoothness conditions on the solution u this best possible rate equals the rate of best approximation to the solution of the corresponding stationary problem from the spatial finite element spaces w.r.t. the V-norm.

Consider a family of spaces X^δ = ∑_{λ∈∨_Σ} σ_λ ⊗ W^δ_λ that satisfies (5.4), with the W^δ_λ selected from a collection of finite element spaces O that in any case contains all such spaces that correspond to uniform refinements of some initial partition of Ω. Let Σ be a collection of wavelets of order d_t, and assume that the finite element spaces are of order d_x. When for each X^δ, the space Y^δ is selected as in Proposition 5.2, then the combination of (3.10) and the analysis from [SS09, Sect. 7.1] shows that if the exact solution u of our parabolic problem satisfies the mixed regularity condition u ∈ H^{d_t}(I) ⊗ H^{d_x}(Ω), then a suitable (non-adaptive) choice of the spaces W^δ_λ yields a sequence of solutions u^δ̂_δ ∈ X^δ (for arbitrary Y^δ̂ ⊃ Y^δ) of the modified discretized saddle-point from Sect. 3.3, for which

‖u − u^δ̂_δ‖_X ≲ (dim X^δ)^{−min(d_t−1, (d_x−1)/d)}.

Note that for d_t − 1 ≥ (d_x − 1)/d, the rate (d_x − 1)/d equals the best rate in the V-norm that can be expected when the finite element spaces are employed for solving the corresponding stationary problem, which is posed on a d-dimensional domain instead of over the (d + 1)-dimensional space-time cylinder. With an adaptive choice of the W^δ_λ as finite element spaces w.r.t. a sufficiently 'rich' collection of locally refined partitions, as the collection of all conforming NVB partitions, it can be expected that the rate min(d_t − 1, (d_x − 1)/d) is realized under much milder regularity conditions on u.

When instead of being finite element spaces, the spaces W^δ_λ can be selected as the spans of some wavelets from a Riesz basis for V of order d_x, and additionally the tree condition (5.4) is dropped, a precise characterization of those u that can be approximated at a rate s < min(d_t − 1, (d_x − 1)/d) in terms of tensor products of Besov spaces can be deduced from [Nit06, SU09]. The collection of finite element spaces w.r.t. locally refined meshes, as those generated by NVB, is very resemblant to the collection of spans of sets of such wavelets when on these sets a tree condition is imposed similar to the tree constraint (5.4) that we imposed in the temporal direction. In other words, the collection of spaces X^δ that we consider is similar to the collection of spans of sets of tensor products of temporal and spatial wavelets when these sets satisfy a 'double-tree' constraint. In view of results from [BDDP02], we do not expect that this constraint makes the resulting approximation classes much smaller.

5.6. Preconditioners.
Our adaptive solution method for the parabolic problem requires optimal preconditioners for $E_Y^{\delta\,\prime} A E_Y^\delta$ and $S^{\bar\delta}_\delta$, i.e., for both $Z = Y$ and $Z = X$ and $\delta \in \Delta$, we need operators $K_Z^\delta = (K_Z^\delta)' \in \mathcal{L}(Z^{\delta\,\prime}, Z^\delta)$ with $(K_Z^\delta h)(h) \eqsim \|h\|^2_{Z^{\delta\,\prime}}$ ($h \in Z^{\delta\,\prime}$), which moreover should be applicable at linear cost.

To construct these preconditioners, for $Z \in \{Y, X\}$ we will select a symmetric, bounded, and coercive bilinear form on $Z \times Z$, and, after selecting some basis for $Z^\delta$, we will construct a matrix $\mathbf{K}_Z^\delta = (\mathbf{K}_Z^\delta)^\top$ that can be applied in linear complexity, and that is uniformly spectrally equivalent to the inverse of the stiffness matrix corresponding to this bilinear form (being the matrix representation of the linear mapping $Z^\delta \to Z^{\delta\,\prime}$ defined by the bilinear form w.r.t. the chosen basis for $Z^\delta$ and the corresponding dual basis for $Z^{\delta\,\prime}$). Then $K_Z^\delta \in \mathcal{L}(Z^{\delta\,\prime}, Z^\delta)$, defined as the operator whose matrix representation is $\mathbf{K}_Z^\delta$ w.r.t. the aforementioned bases of $Z^{\delta\,\prime}$ and $Z^\delta$, is the preconditioner that satisfies our needs.

Notice that the choice of the basis for $Z^\delta$ is irrelevant. Indeed, denoting the aforementioned stiffness matrix as $\mathbf{C}_Z^\delta$ with corresponding operator $C_Z^\delta = (C_Z^\delta)' \in \mathcal{L}\mathrm{is}(Z^\delta, Z^{\delta\,\prime})$, one may verify that
$$\|K_Z^\delta\|_{\mathcal{L}(Z^{\delta\,\prime}, Z^\delta)}\, \|(K_Z^\delta)^{-1}\|_{\mathcal{L}(Z^\delta, Z^{\delta\,\prime})} \eqsim \|K_Z^\delta C_Z^\delta\|_{\mathcal{L}(Z^\delta, Z^\delta)}\, \|(K_Z^\delta C_Z^\delta)^{-1}\|_{\mathcal{L}(Z^\delta, Z^\delta)} = \frac{\lambda_{\max}(\mathbf{K}_Z^\delta \mathbf{C}_Z^\delta)}{\lambda_{\min}(\mathbf{K}_Z^\delta \mathbf{C}_Z^\delta)}.$$

5.6.1. Preconditioner at the 'test side'.
Let $Z = Y$. Since $\Psi$ is an orthonormal basis for $L_2(I)$, any $y \in Y$ is of the form $\sum_{\mu \in \vee_\Psi} \psi_\mu \otimes v_\mu$ for some $v_\mu \in V$ with $\sum_{\mu \in \vee_\Psi} \|v_\mu\|_V^2 < \infty$. Taking as bilinear form on $Y \times Y$ simply the scalar product on $Y \times Y$, we have
$$\Big\langle \sum_{\mu \in \vee_\Psi} \psi_\mu \otimes v_\mu^{(1)},\ \sum_{\mu \in \vee_\Psi} \psi_\mu \otimes v_\mu^{(2)} \Big\rangle_Y = \sum_{\mu \in \vee_\Psi} \langle v_\mu^{(1)}, v_\mu^{(2)} \rangle_V.$$
Equipping $Y^\delta = \sum_{\mu \in \vee_\Psi} \psi_\mu \otimes V^\delta_\mu$ with a basis of type $\cup_{\mu \in \vee_\Psi} \psi_\mu \otimes \Phi^\delta_\mu$, the resulting stiffness matrix reads as $\mathrm{blockdiag}[\mathbf{A}^\delta_\mu]_{\mu \in \vee_\Psi}$, where $\mathbf{A}^\delta_\mu = \langle \Phi^\delta_\mu, \Phi^\delta_\mu \rangle_V$ is the stiffness matrix of $\langle\cdot,\cdot\rangle_V$ w.r.t. $\Phi^\delta_\mu$. Selecting $\mathbf{K}^\delta_\mu \eqsim (\mathbf{A}^\delta_\mu)^{-1}$, the matrix representation of the optimal preconditioner reads as $\mathbf{K}_Y^\delta = \mathrm{blockdiag}[\mathbf{K}^\delta_\mu]_{\mu \in \vee_\Psi}$.

It is well-known that when $V^\delta_\mu$ is a finite element space, possibly w.r.t. a locally refined partition, suitable $\mathbf{K}^\delta_\mu$ of multi-grid type are available. These $\mathbf{K}^\delta_\mu$ can be applied in linear complexity, and so can $\mathbf{K}_Y^\delta$.

To show, in Theorem 4.9, that our adaptive Algorithm 4.8 is r-linearly converging, we imposed the condition that $\|(E_Y^{\delta\,\prime} A E_Y^\delta)^{-1} - K_Y^\delta\|_{\mathcal{L}(Y^{\delta\,\prime}, Y^\delta)}$ or, equivalently, $\|\mathrm{Id} - K_Y^\delta E_Y^{\delta\,\prime} A E_Y^\delta\|_{\mathcal{L}(Y^\delta, Y^\delta)}$ is sufficiently small, i.e., that the eigenvalues of $K_Y^\delta E_Y^{\delta\,\prime} A E_Y^\delta$ are sufficiently close to 1. Given an initial optimal, self-adjoint, and coercive preconditioner $K_Y^\delta$, and some upper and lower bounds on the spectrum of the preconditioned system, one can satisfy the latter condition by polynomial acceleration using Chebyshev polynomials of sufficiently high degree. In our numerical experiments, it turned out that it was not needed to apply this 'acceleration'.
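To make this polynomial acceleration concrete, the following minimal sketch (an assumption-laden toy model: the spectral bounds `lam_min`, `lam_max` of the preconditioned system are taken as known inputs, and we only track how the degree-k Chebyshev residual polynomial moves each eigenvalue) shows that the spectrum of the accelerated system clusters around 1 at the classical Chebyshev rate:

```python
import math

def chebyshev(k: int, x: float) -> float:
    """Chebyshev polynomial T_k evaluated at any real x."""
    if abs(x) <= 1.0:
        return math.cos(k * math.acos(x))
    # For |x| > 1 use the hyperbolic form; T_k(-x) = (-1)^k T_k(x).
    t = math.cosh(k * math.acosh(abs(x)))
    return t if x > 0 or k % 2 == 0 else -t

def accelerated_eigenvalue(lam: float, lam_min: float, lam_max: float, k: int) -> float:
    """Eigenvalue of the Chebyshev-accelerated preconditioned system.

    If the eigenvalues of the initially preconditioned operator lie in
    [lam_min, lam_max], acceleration with the degree-k residual polynomial
        p(lam) = T_k((lam_max+lam_min-2 lam)/(lam_max-lam_min))
                 / T_k((lam_max+lam_min)/(lam_max-lam_min))
    maps each eigenvalue lam to 1 - p(lam), i.e. within 1/T_k(...) of 1.
    """
    width = lam_max - lam_min
    p = (chebyshev(k, (lam_max + lam_min - 2.0 * lam) / width)
         / chebyshev(k, (lam_max + lam_min) / width))
    return 1.0 - p

# Hypothetical spectral bounds of the initially preconditioned system:
lam_min, lam_max, k = 0.5, 2.0, 4
spread = 1.0 / chebyshev(k, (lam_max + lam_min) / (lam_max - lam_min))
eigs = [accelerated_eigenvalue(0.5 + 0.05 * i, lam_min, lam_max, k) for i in range(31)]
assert all(abs(mu - 1.0) <= spread + 1e-12 for mu in eigs)
```

With these (hypothetical) bounds, already degree 4 contracts the distance of the spectrum to 1 to below 0.03, which indicates why a modest polynomial degree suffices when the initial preconditioner is of good quality.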
5.6.2. Preconditioner at the 'trial side'.
The preconditioner presented in this section is inspired by constructions of preconditioners in [And16, NS19] for parabolic problems discretized on a tensor product of temporal and spatial spaces.

Thanks to $\Sigma$ and $\{2^{-|\lambda|}\sigma_\lambda : \lambda \in \vee_\Sigma\}$ being Riesz bases for $L_2(I)$ and $H^1(I)$, any $x \in X$ is of the form $\sum_{\lambda \in \vee_\Sigma} \sigma_\lambda \otimes w_\lambda$ for some $w_\lambda \in V$ with $\sum_{\lambda \in \vee_\Sigma} \|w_\lambda\|_V^2 + 4^{|\lambda|} \|w_\lambda\|_{V'}^2 < \infty$, and
$$\Big\langle \sum_{\lambda \in \vee_\Sigma} \sigma_\lambda \otimes w_\lambda^{(1)},\ \sum_{\lambda \in \vee_\Sigma} \sigma_\lambda \otimes w_\lambda^{(2)} \Big\rangle := \sum_{\lambda \in \vee_\Sigma} \langle w_\lambda^{(1)}, w_\lambda^{(2)} \rangle_V + 4^{|\lambda|} \langle w_\lambda^{(1)}, w_\lambda^{(2)} \rangle_{V'}$$
is a symmetric, bounded, and coercive bilinear form on $X \times X$. Equipping $X^\delta = \sum_{\lambda \in \vee_\Sigma} \sigma_\lambda \otimes W^\delta_\lambda$ with a basis of type $\cup_{\lambda \in \vee_\Sigma} \sigma_\lambda \otimes \Phi^\delta_\lambda$, the resulting stiffness matrix reads as $\mathrm{blockdiag}\big[\mathbf{A}^\delta_\lambda + 4^{|\lambda|} \langle \Phi^\delta_\lambda, \Phi^\delta_\lambda \rangle_{V'}\big]_{\lambda \in \vee_\Sigma}$, where $\mathbf{A}^\delta_\lambda = \langle \Phi^\delta_\lambda, \Phi^\delta_\lambda \rangle_V$.

Thanks to our assumption (5.7), for $u \in W^\delta_\lambda$ it holds that $\|u\|_{V'} \lesssim \sup_{0 \ne w \in W^\delta_\lambda} \frac{\langle u, w \rangle}{\|w\|_V} \le \|u\|_{V'}$. With $\mathbf{u}$ denoting the representation of $u$ w.r.t. $\Phi^\delta_\lambda$, we have
$$\sup_{0 \ne w \in W^\delta_\lambda} \frac{\langle u, w \rangle}{\|w\|_V} = \|(\mathbf{A}^\delta_\lambda)^{-1/2} \mathbf{M}^\delta_\lambda \mathbf{u}\|,$$
where $\mathbf{M}^\delta_\lambda = \langle \Phi^\delta_\lambda, \Phi^\delta_\lambda \rangle$, so that $\langle \Phi^\delta_\lambda, \Phi^\delta_\lambda \rangle_{V'} \lesssim \mathbf{M}^\delta_\lambda (\mathbf{A}^\delta_\lambda)^{-1} \mathbf{M}^\delta_\lambda \le \langle \Phi^\delta_\lambda, \Phi^\delta_\lambda \rangle_{V'}$.

Since both $\mathbf{A}^\delta_\lambda$ and $\mathbf{M}^\delta_\lambda$ are symmetric and positive definite, [PW12, Thm. 4] shows that
$$\tfrac12\big(\mathbf{A}^\delta_\lambda + 4^{|\lambda|} \mathbf{M}^\delta_\lambda (\mathbf{A}^\delta_\lambda)^{-1} \mathbf{M}^\delta_\lambda\big) \le \big(\mathbf{A}^\delta_\lambda + 2^{|\lambda|} \mathbf{M}^\delta_\lambda\big)(\mathbf{A}^\delta_\lambda)^{-1}\big(\mathbf{A}^\delta_\lambda + 2^{|\lambda|} \mathbf{M}^\delta_\lambda\big) \le 2\big(\mathbf{A}^\delta_\lambda + 4^{|\lambda|} \mathbf{M}^\delta_\lambda (\mathbf{A}^\delta_\lambda)^{-1} \mathbf{M}^\delta_\lambda\big).$$
Now assuming that
(5.9) $\mathbf{K}^\delta_\lambda \eqsim (\mathbf{A}^\delta_\lambda + 2^{|\lambda|} \mathbf{M}^\delta_\lambda)^{-1}$,
we infer that $\mathbf{K}_X^\delta = \mathrm{blockdiag}\big[\mathbf{K}^\delta_\lambda \mathbf{A}^\delta_\lambda \mathbf{K}^\delta_\lambda\big]_\lambda$ is the matrix representation of an optimal preconditioner.

Notice that (5.9) requires an optimal preconditioner for a discretized reaction-diffusion equation that is robust w.r.t. the size of the (constant) reaction term. In [OR00] it was shown that, under a 'full-regularity' assumption, for quasi-uniform meshes multiplicative multi-grid yields such a preconditioner, moreover one whose application can be performed at linear cost. Although we expect that using the theory of subspace correction methods the full-regularity assumption can be avoided, and furthermore that the optimality, robustness, and linear complexity result extends to locally refined meshes, proofs of such extensions seem not to be available.
1. ˜ ν , ˜ ν (cid:47) N ν , and T and its refinement T d + (for d = CONCRETE REALIZATION
6.1. The collection $\mathcal O$ of finite element spaces, and the mapping $\delta \to \bar\delta$. We further specify the collection $\mathcal O$ of finite element spaces, construct a linearly independent set in $H^1_{\Gamma_D}(\Omega)$, known as the hierarchical basis, and equip it with a tree structure such that there exists a 1-1 correspondence between the finite element spaces in $\mathcal O$ and the spans of subsets of the hierarchical basis that form trees.

With this specification of $\mathcal O$, there will be a 1-1 correspondence between the spaces $X^\delta = \sum_{\lambda \in \vee_\Sigma} \sigma_\lambda \otimes W^\delta_\lambda$ with $W^\delta_\lambda \in \mathcal O$ that satisfy (5.4), and the spans of collections of tensor products of wavelets $\sigma_\lambda$ and hierarchical basis functions whose sets of index pairs are lower, also known as downward closed. Given such an $X^\delta$, we will define $X^{\bar\delta}$ by a certain enlargement of the lower set.

For $d \ge$
2, let $\mathbb T$ be the family of all conforming partitions of a polytope $\Omega \subset \mathbb R^d$ into (closed) $d$-simplices that can be created by NVB starting from some given conforming initial partition $T_\perp$ with an assignment of the newest vertices that satisfies the matching condition, see [Ste08]. We define a partial order on $\mathbb T$ by writing $T \preceq \tilde T$ when $\tilde T$ is a refinement of $T$.

With some small adaptations that we leave to the reader, in the following the case $d = 1$ can be included by taking $\mathbb T$ to be the family of all partitions of $\Omega$ into (closed) subintervals that can be constructed by bisections from $T_\perp = \{\Omega\}$ such that the generations of any two neighbouring subintervals in any $T \in \mathbb T$ differ by not more than one.

The collection $\mathcal O$ that we will consider is formed by the spaces $W = W_T$ of continuous piecewise linears w.r.t. $T \in \mathbb T$, zero on a possible Dirichlet boundary $\Gamma_D$ being the union of $\partial T \cap \partial\Omega$ for some $T \in T_\perp$. We expect that generalizations to finite element spaces of higher order do not impose essential difficulties.

For $T \in \cup_{T \in \mathbb T}\{T : T \in T\}$, we set $\mathrm{gen}(T)$ to be the number of bisections needed to create $T$ from its 'ancestor' $T' \in T_\perp$. With $N$ being the set of all vertices (or nodes) of all $T \in \mathbb T$, for $\nu \in N$ we set $\mathrm{gen}(\nu) := \min\{\mathrm{gen}(T) : T \in \mathbb T,\ \nu \in T\}$. Any $\nu \in N$ with $\mathrm{gen}(\nu) > 0$ is the midpoint of an edge of some $T \in \mathbb T$ with $\mathrm{gen}(T) = \mathrm{gen}(\nu) - 1$. The vertices $\tilde\nu$ of these $T$ with $\mathrm{gen}(\tilde\nu) = \mathrm{gen}(\nu) - 1$ are called the parents of $\nu$. We denote this relation between a parent $\tilde\nu$ and a child $\nu$ by $\tilde\nu \lhd_N \nu$, see Figure 1. Vertices $\nu \in N$ with $\mathrm{gen}(\nu) = 0$ have no parents. A partition $T$ of $\Omega$ is in $\mathbb T$ if and only if the set $N_T$ of vertices of all $T \in T$ forms a tree, meaning that it contains all $\nu \in N$ with $\mathrm{gen}(\nu) = 0$, as well as the parents of any $\nu \in N_T$ with $\mathrm{gen}(\nu) > 0$, cf. [DKS15] for the case $d = 2$.
Definition 6.1.
For any
$T \in \mathbb T$, we define $T^{d+} \in \mathbb T$ (denoted as $T^{++}$ in [DKS15] for the case $d = 2$) as the partition created from $T \in \mathbb T$ by replacing each of its elements by its $2^d$ 'descendants' of the $d$th generation, see Figure 1. Since this refinement adds exactly one vertex at the midpoint of any edge of all $T \in T$, one infers that indeed $T^{d+} \in \mathbb T$. The corresponding tree $N_{T^{d+}}$ is created from $N_T$ by the addition of all descendants up to generation $d$ of all $\nu \in N_T$.

For $\nu \in N$, we set $\phi_\nu$ as the continuous piecewise linear function w.r.t. the uniform partition $\{T \in \mathbb T : \mathrm{gen}(T) = \mathrm{gen}(\nu)\} \in \mathbb T$ that is 1 at $\nu$ and 0 at all other vertices of this partition. Setting $N_0 := N \setminus \Gamma_D$ and, for any $T \in \mathbb T$, $N_{T,0} := N_T \setminus \Gamma_D$, the collection $\{\phi_\nu : \nu \in N_0\}$ is known as the hierarchical basis, and for any $T \in \mathbb T$ it holds that $W_T = \mathrm{span}\{\phi_\nu : \nu \in N_{T,0}\}$.

With the above specification of the collection $\mathcal O$ of finite element spaces, there exists a 1-1 correspondence between the spaces $\sum_{\lambda \in \vee_\Sigma} \sigma_\lambda \otimes W^\delta_\lambda$ with $W^\delta_\lambda \in \mathcal O$ that satisfy (5.4), and the spaces of the form
(6.1) $X^\delta = \mathrm{span}\{\sigma_\lambda \otimes \phi_\nu : (\lambda,\nu) \in I_{\delta,0} := I_\delta \setminus (\vee_\Sigma \times \Gamma_D)\}$
for some finite $I_\delta \subset \vee_\Sigma \times N$ being a lower set in the sense that
(6.2) $(\lambda,\nu) \in I_\delta$ and $\begin{cases} \tilde\lambda \lhd_\Sigma \lambda \implies (\tilde\lambda, \nu) \in I_\delta, \\ \tilde\nu \lhd_N \nu \text{ or } \mathrm{gen}(\tilde\nu) = 0 \implies (\lambda, \tilde\nu) \in I_\delta. \end{cases}$

For the above specification of $X^\delta$, from Proposition 5.2 with the specification (5.8) one infers that the corresponding space
(6.3) $Y^\delta = \mathrm{span}\{\psi_\mu \otimes \phi_\nu : (\mu,\nu) \in I^Y_{\delta,0}\}$,
where
(6.4) $I^Y_{\delta,0} := \{(\mu,\nu) : \exists (\lambda,\nu) \in I_{\delta,0},\ \mu \in \vee_\Psi,\ |\mu| = |\lambda|,\ |S_\Psi(\mu) \cap S_\Sigma(\lambda)| > 0\}$,
which index set is a lower set.

Remark 6.2.
The fact that the index sets of the bases for $X^\delta$ and $Y^\delta$ are lower sets is the key to why it is possible to compute residuals of the system $S^{\bar\delta}_\delta \mathbf{u}^\delta = \mathbf{f}^\delta$ ((4.9)) in $O(\dim X^\delta)$ operations. Indeed, when one has a bilinear form that is 'local' and equals the tensor product of bilinear forms in time and space, and two spaces spanned by tensor product multi-level bases corresponding to lower sets, then the resulting generalized system matrix w.r.t. both bases can be applied in a number of operations that is proportional to the sum of the dimensions of both spaces. The algorithm that realizes this complexity makes clever use of multi- to single-scale transformations alternately in time and space. In a 'uniform' sparse-grid setting, i.e., without 'local refinements', this algorithm was introduced in [BZ96], and it was later extended to general lower sets in [KS14]. The definition of a lower set in [KS14], there called a multi-tree, is more restrictive than our current definition that allows more localized refinements; details about the matrix-vector multiplication and a proof of its optimal computational complexity can be found in [vVW21].

Definition 6.3.
Given $X^\delta = \mathrm{span}\{\sigma_\lambda \otimes \hat\phi_\nu : (\lambda,\nu) \in I_{\delta,0}\}$ for some lower set $I_\delta \subset \vee_\Sigma \times N$, we define the lower set $I_{\bar\delta}$, and with that $X^{\bar\delta}$, by adding, for each $(\lambda,\nu) \in I_\delta$, any child $\tilde\lambda$ of $\lambda$, and any descendant $\tilde\nu$ of $\nu$ up to generation $d$, all pairs $(\tilde\lambda, \nu)$ and $(\lambda, \tilde\nu)$ to $I_\delta$. (The addition of only all children of all $\nu \in N_T$ yields a tree only if $T$ is a uniform partition.)

6.2. The collection $\Theta^\delta$ such that $X^{\bar\delta} = X^\delta \oplus \Theta^\delta$. Recall that for the bulk chasing process we need an '$X$-stable' basis $\Theta^\delta$ that spans an '$X$-stable' complement space of $X^\delta$ in $X^{\bar\delta}$, i.e., a collection that satisfies (4.2). For that goal we define a modified hierarchical basis $\{\hat\phi_\nu : \nu \in N\}$ by $\hat\phi_\nu := \phi_\nu$ when $\mathrm{gen}(\nu) = 0$, and
$$\hat\phi_\nu := \phi_\nu - \frac{1}{\#\{\tilde\nu \in N : \tilde\nu \lhd_N \nu\}} \sum_{\{\tilde\nu \in N : \tilde\nu \lhd_N \nu\}} \frac{\int_\Omega \phi_\nu\,dx}{\int_\Omega \phi_{\tilde\nu}\,dx}\,\phi_{\tilde\nu}$$
otherwise. Notice that for those $\nu$ with $\mathrm{gen}(\nu) > 0$ whose parents are not located on $\Gamma_D$ it holds that $\int_\Omega \hat\phi_\nu\,dx = 0$, i.e., $\hat\phi_\nu$ has a vanishing moment.

For any $T \in \mathbb T$, it holds that $W_T = \mathrm{span}\{\phi_\nu : \nu \in N_{T,0}\} = \mathrm{span}\{\hat\phi_\nu : \nu \in N_{T,0}\}$, and thus for any lower set $I_\delta \subset \vee_\Sigma \times N$,
$$X^\delta = \mathrm{span}\{\sigma_\lambda \otimes \hat\phi_\nu : (\lambda,\nu) \in I_{\delta,0}\} = \mathrm{span}\{\sigma_\lambda \otimes \phi_\nu : (\lambda,\nu) \in I_{\delta,0}\}.$$
Moreover, for any $T \in \mathbb T$, the basis transformation from the modified to the unmodified hierarchical basis for $W_T$ can be applied in linear complexity by traversing from the leaves to the roots.

Given $\delta$, the collection $\Theta^\delta$ will be the set of properly normalized functions $\sigma_\lambda \otimes \hat\phi_\nu$ for $(\lambda,\nu) \in I_{\bar\delta,0} \setminus I_{\delta,0}$. In order to demonstrate (4.2), we have to impose some gradedness assumption on the lower sets $I_\delta$.

Definition 6.4.
The gradedness constant of a lower set $I_\delta \subset \vee_\Sigma \times N$ is the smallest $L_\delta \in \mathbb N$ such that for all $(\lambda,\nu) \in I_\delta$ for which $\nu$ has an ancestor $\tilde\nu \in N$ with $\mathrm{gen}(\nu) - \mathrm{gen}(\tilde\nu) = L_\delta$, it holds that $(\breve\lambda, \tilde\nu) \in I_\delta$ for any child $\breve\lambda \in \vee_\Sigma$ of $\lambda$.

Remark 6.5. Under the (unproven) assumption that our adaptive method creates a sequence of spaces $X^\delta$ that are quasi-optimal for the approximation of the solution of the parabolic PDE, one may hope that these spaces have a uniformly bounded gradedness constant, unless (locally) the solution $u$ is much smoother as a function of $t$ than as a function of the spatial variables.

To see this, consider the non-adaptive sparse-grid index sets of the form $\{(\lambda,\nu) \in \vee_\Sigma \times N : \tilde L |\lambda| + \mathrm{gen}(\nu) \le N\}$ for some constant $\tilde L$ and $N \in \mathbb N$, which are appropriate when the behaviour of $u$ as a function of $t$ on the one hand, and as a function of the spatial variables on the other, is globally similar. Then for $\tilde L \le L$, the gradedness constant of this index set is $\le L$, where the smallest spatial resolution in the 'sparse-grid mesh' equals the smallest temporal resolution in this mesh to the power $\tilde L / d$. So only when a polynomial decay of the spatial resolution as a function of the temporal resolution does not suffice for a proper approximation of $u$, one cannot expect a gradedness constant that is uniformly bounded.

Proposition 6.6.
For $(\lambda,\nu) \in \vee_\Sigma \times N_0$, let $e_{\lambda\nu} := \sqrt{2^{(2/d-1)\mathrm{gen}(\nu)} + 4^{|\lambda|}\, 2^{(-2/d-1)\mathrm{gen}(\nu)}}$ and $\theta_{\lambda\nu} := e_{\lambda\nu}^{-1}\, \sigma_\lambda \otimes \hat\phi_\nu$. For any $\delta \in \Delta$, let $\Theta^\delta := \{\theta_{\lambda\nu} : (\lambda,\nu) \in J_\delta := I_{\bar\delta,0} \setminus I_{\delta,0}\}$. Then $X^\delta \oplus \mathrm{span}\,\Theta^\delta = X^{\bar\delta}$, and there exist constants $0 < m_\delta \le M_\delta$, only dependent on the gradedness constant $L_\delta$, such that for any $z \in X^\delta$ and $c = (c_{\lambda\nu})_{(\lambda,\nu) \in I_{\bar\delta,0} \setminus I_{\delta,0}} \subset \mathbb R$,
$$m_\delta\big(\|z\|_X^2 + \|c\|^2\big) \le \|z + c^\top \Theta^\delta\|_X^2 \le M_\delta\big(\|z\|_X^2 + \|c\|^2\big).$$

So under the mild assumption that the gradedness constants of the sets $I_\delta$ that we encounter are uniformly bounded, we have shown that the condition (4.2) is satisfied.

Proof.
Setting $c_{\lambda\nu} := 0$ for $(\lambda,\nu) \notin I_{\bar\delta,0} \setminus I_{\delta,0}$, and writing $z = \sum_{\lambda \in \vee_\Sigma} \sigma_\lambda \otimes w_\lambda$ where $w_\lambda \in \mathrm{span}\{\hat\phi_\nu : (\lambda,\nu) \in I_{\delta,0}\}$, from $\Sigma$ and $\{2^{-|\lambda|}\sigma_\lambda : \lambda \in \vee_\Sigma\}$ being Riesz bases for $L_2(I)$ and $H^1(I)$, and $c^\top \Theta^\delta = \sum_\lambda \sigma_\lambda \otimes \sum_\nu e_{\lambda\nu}^{-1} c_{\lambda\nu} \hat\phi_\nu$, an application of Lemma 6.7 given below shows that
$$\begin{aligned} \|z + c^\top\Theta^\delta\|_X^2 &\eqsim \sum_\lambda \Big\{ \big\|w_\lambda + \sum_\nu e_{\lambda\nu}^{-1} c_{\lambda\nu} \hat\phi_\nu\big\|^2_{H^1(\Omega)} + 4^{|\lambda|} \big\|w_\lambda + \sum_\nu e_{\lambda\nu}^{-1} c_{\lambda\nu} \hat\phi_\nu\big\|^2_{H^1_{\Gamma_D}(\Omega)'} \Big\} \\ &\eqsim \sum_\lambda \Big\{ \|w_\lambda\|^2_{H^1(\Omega)} + 4^{|\lambda|} \|w_\lambda\|^2_{H^1_{\Gamma_D}(\Omega)'} + \sum_\nu \big(2^{(2/d-1)\mathrm{gen}(\nu)} + 4^{|\lambda|}\, 2^{(-2/d-1)\mathrm{gen}(\nu)}\big) |e_{\lambda\nu}^{-1} c_{\lambda\nu}|^2 \Big\} \\ &= \sum_\lambda \Big\{ \|w_\lambda\|^2_{H^1(\Omega)} + 4^{|\lambda|} \|w_\lambda\|^2_{H^1_{\Gamma_D}(\Omega)'} \Big\} + \sum_{\lambda,\nu} |c_{\lambda\nu}|^2 \eqsim \|z\|_X^2 + \|c\|^2, \end{aligned}$$
with the $\eqsim$-symbol in the second line dependent on the gradedness constant. $\square$

Lemma 6.7.
For $\tilde T \in \mathbb T$, and either $\mathbb T \ni T \preceq \tilde T$ and $v \in W_T$, or $T = \emptyset$, $N_{T,0} := \emptyset$, and $v = 0$, and scalars $(d_\nu)_{\nu \in N_{\tilde T,0} \setminus N_{T,0}}$, it holds that
(6.5) $\big\|v + \sum_\nu d_\nu \hat\phi_\nu\big\|^2_{H^1(\Omega)} \eqsim \|v\|^2_{H^1(\Omega)} + \sum_\nu 2^{(2/d-1)\mathrm{gen}(\nu)} |d_\nu|^2$,
(6.6) $\big\|v + \sum_\nu d_\nu \hat\phi_\nu\big\|^2_{H^1_{\Gamma_D}(\Omega)'} \eqsim \|v\|^2_{H^1_{\Gamma_D}(\Omega)'} + \sum_\nu 2^{(-2/d-1)\mathrm{gen}(\nu)} |d_\nu|^2$,
with the constants hidden in the $\eqsim$-symbols only dependent on $M_{\tilde T T} := \max\{\mathrm{gen}(\tilde T) - \mathrm{gen}(T) : \tilde T \ni \tilde T \subset T \in T\}$, or $M_{\tilde T T} := \max\{\mathrm{gen}(\tilde T) : \tilde T \in \tilde T\}$ for $T = \emptyset$.

Proof. Once the equivalences are shown uniformly in any $T \preceq \tilde T$ for which $M_{\tilde T T} = 1$, a repeated application of these equivalences shows them for the general case, with constants that are only dependent on $M_{\tilde T T}$. So in the following, it suffices to consider the case that $M_{\tilde T T} = 1$. The case $T = \emptyset$ is easy, so we will consider the case that $T \in \mathbb T$.

Let $\Phi_{\tilde T} = \{\phi_{\tilde T,\nu} : \nu \in N_{\tilde T,0}\}$ denote the standard nodal basis for $W_{\tilde T}$. For any weight function $0 < w_{\tilde T} \in \prod_{T \in \tilde T} P_0(T)$, with $\|\cdot\|_{L_{2,w_{\tilde T}}(\Omega)} := \|w_{\tilde T}\,\cdot\|_{L_2(\Omega)}$, it holds that $\|\sum_\nu c_\nu \phi_{\tilde T,\nu}\|^2_{L_{2,w_{\tilde T}}(\Omega)} \eqsim \sum_\nu |c_\nu|^2 \|\phi_{\tilde T,\nu}\|^2_{L_{2,w_{\tilde T}}(\Omega)}$, only dependent on the spectrum of the element mass matrix on a reference element, i.e., on the space dimension $d$, so independent of the weight function $w_{\tilde T}$. We refer to this equivalence by saying that $\Phi_{\tilde T}$ is (uniformly) stable w.r.t. $\|\cdot\|_{L_{2,w_{\tilde T}}(\Omega)}$.

Notice that for $\nu \in N_{\tilde T,0} \setminus N_{T,0}$, it holds that $\phi_{\tilde T,\nu} = \phi_\nu$. W.r.t. the splitting $N_{\tilde T,0} = N_{T,0} \cup (N_{\tilde T,0} \setminus N_{T,0})$, the basis transformation from $\Phi_T \cup \{\hat\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$ to $\Phi_T \cup \{\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$ is of the form $\big[\begin{smallmatrix} \mathrm{Id} & \ast \\ 0 & \mathrm{Id} \end{smallmatrix}\big]$, and the basis transformation from the latter basis to $\Phi_{\tilde T}$ is of the form $\big[\begin{smallmatrix} \mathrm{Id} & 0 \\ \ast & \mathrm{Id} \end{smallmatrix}\big]$. The entries in both non-zero off-diagonal blocks are uniformly bounded, where non-zeros can only occur for index pairs $(\nu, \tilde\nu)$ that are vertices of the same $\tilde T \in \tilde T$. Consequently, for a family of weight functions $(w_{\tilde T})_{\tilde T \in \mathbb T}$ that have uniformly bounded jumps in the sense that
(6.7) $\sup_{\tilde T \in \mathbb T}\ \sup_{\{T, T' \in \tilde T : T \cap T' \ne \emptyset\}} \frac{w_{\tilde T}|_T}{w_{\tilde T}|_{T'}} < \infty$,
all basis transformations between the $L_{2,w_{\tilde T}}(\Omega)$-normalized bases $\Phi_T \cup \{\hat\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$, $\Phi_T \cup \{\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$, and $\Phi_{\tilde T}$ are uniformly bounded. Since, as we have seen, $\Phi_{\tilde T}$ is (uniformly) stable w.r.t. $\|\cdot\|_{L_{2,w_{\tilde T}}(\Omega)}$, we conclude that also $\Phi_T \cup \{\hat\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$ and $\Phi_T \cup \{\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$ are (uniformly) stable w.r.t. $\|\cdot\|_{L_{2,w_{\tilde T}}(\Omega)}$.
Because of the uniform K-mesh property of $\tilde T \in \mathbb T$, examples of families of weights that satisfy (6.7) are given by $(h^s_{\tilde T})_{\tilde T \in \mathbb T}$ for any $s \in \mathbb R$, where $h_{\tilde T}|_T := 2^{-\mathrm{gen}(T)/d}$ ($\eqsim |T|^{1/d}$) ($T \in \tilde T$).

For showing (6.5), let $P_T : W_{\tilde T} \to W_T$ be the projector with $\mathrm{ran}\,P_T = W_T$ and $\mathrm{ran}(\mathrm{Id} - P_T) = \mathrm{span}\{\hat\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$. Using the form of the basis transformation from $\Phi_T \cup \{\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$ to $\Phi_T \cup \{\hat\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$, one infers that
$$\mathrm{Id} - P_T = J_T \circ (\mathrm{Id} - I_T),$$
where $I_T$ is the nodal interpolator onto $W_T$, and $J_T$ is defined by $J_T \phi_\nu = \hat\phi_\nu$. Since both $\{\hat\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$ and $\{\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$ are uniformly stable w.r.t. $\|h_{\tilde T}^{-1}\cdot\|_{L_2(\Omega)}$, and $\|h_{\tilde T}^{-1}\hat\phi_\nu\|_{L_2(\Omega)} \eqsim \|h_{\tilde T}^{-1}\phi_\nu\|_{L_2(\Omega)}$, it follows that $J_T$ is uniformly bounded w.r.t. $\|h_{\tilde T}^{-1}\cdot\|_{L_2(\Omega)}$, and so
$$\|h_{\tilde T}^{-1}(\mathrm{Id} - P_T)v\|_{L_2(\Omega)} \lesssim \|h_{\tilde T}^{-1}(\mathrm{Id} - I_T)v\|_{L_2(\Omega)} \lesssim |v|_{H^1(\Omega)} \quad (v \in W_{\tilde T}).$$
Using the common inverse inequality $\|\cdot\|_{H^1(\Omega)} \lesssim \|h_{\tilde T}^{-1}\cdot\|_{L_2(\Omega)}$ on $W_{\tilde T}$, we infer that $\mathrm{Id} - P_T$ is uniformly bounded in the $H^1(\Omega)$-norm, and that $\|\cdot\|_{H^1(\Omega)} \eqsim \|h_{\tilde T}^{-1}\cdot\|_{L_2(\Omega)}$ on $\mathrm{ran}(\mathrm{Id} - P_T)$. The proof of (6.5) is completed by the uniform stability of $\{\hat\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$ w.r.t. $\|h_{\tilde T}^{-1}\cdot\|_{L_2(\Omega)}$, and the fact that $\|h_{\tilde T}^{-1}\hat\phi_\nu\|^2_{L_2(\Omega)} \eqsim 2^{(2/d-1)\mathrm{gen}(\nu)}$.

Moving to (6.6), either by $\int_\Omega \hat\phi_\nu\,dx =$
0, or otherwise using the proximity of the Dirichlet boundary $\Gamma_D$ by an application of Poincaré's inequality, it holds that
$$|\langle \hat\phi_\nu, v\rangle_{L_2(\Omega)}| \lesssim 2^{-\mathrm{gen}(\nu)/d}\, \|\hat\phi_\nu\|_{L_2(\Omega)}\, |v|_{H^1(\mathrm{supp}\,\hat\phi_\nu)} \quad (\nu \in N_0 \setminus N_{T_\perp,0}).$$
By using that for $T \in \tilde T$ the number of $\nu \in N_{\tilde T,0} \setminus N_{T,0}$ for which $\mathrm{supp}\,\hat\phi_\nu$ has non-empty intersection with $T$ is uniformly bounded, and furthermore that $\Phi_T \cup \{\hat\phi_\nu : \nu \in N_{\tilde T,0} \setminus N_{T,0}\}$ is uniformly stable w.r.t. $\|h_{\tilde T}\cdot\|_{L_2(\Omega)}$, we infer that for any $z = \sum_{\nu \in N_{T,0}} z_\nu \phi_{T,\nu} \in W_T$ it holds that
$$\begin{aligned} \Big\|\sum_{\nu \in N_{\tilde T,0} \setminus N_{T,0}} d_\nu \hat\phi_\nu\Big\|_{H^1_{\Gamma_D}(\Omega)'} &= \sup_{0 \ne v \in H^1_{\Gamma_D}(\Omega)} \frac{\big\langle \sum_{\nu \in N_{\tilde T,0} \setminus N_{T,0}} d_\nu \hat\phi_\nu, v \big\rangle_{L_2(\Omega)}}{\|v\|_{H^1(\Omega)}} \lesssim \sqrt{\sum_{\nu \in N_{\tilde T,0} \setminus N_{T,0}} |d_\nu|^2\, \|h_{\tilde T}\hat\phi_\nu\|^2_{L_2(\Omega)}} \\ &\le \sqrt{\sum_{\nu \in N_{T,0}} |z_\nu|^2\, \|h_{\tilde T}\phi_{T,\nu}\|^2_{L_2(\Omega)} + \sum_{\nu \in N_{\tilde T,0} \setminus N_{T,0}} |d_\nu|^2\, \|h_{\tilde T}\hat\phi_\nu\|^2_{L_2(\Omega)}} \\ &\eqsim \Big\|h_{\tilde T}\Big(z + \sum_{\nu \in N_{\tilde T,0} \setminus N_{T,0}} d_\nu \hat\phi_\nu\Big)\Big\|_{L_2(\Omega)} \lesssim \Big\|z + \sum_{\nu \in N_{\tilde T,0} \setminus N_{T,0}} d_\nu \hat\phi_\nu\Big\|_{H^1_{\Gamma_D}(\Omega)'}, \end{aligned} \tag{6.8}$$
the last inequality by application of a less common inverse inequality, whose proof can be found in [SvV19, Lemma 3.4] for general dimensions $d$. From (6.8) it follows that $\mathrm{Id} - P_T$ is uniformly bounded in the $H^1_{\Gamma_D}(\Omega)'$-norm, and also that $\|\sum_{\nu \in N_{\tilde T,0} \setminus N_{T,0}} d_\nu \hat\phi_\nu\|^2_{H^1_{\Gamma_D}(\Omega)'} \eqsim \sum_{\nu \in N_{\tilde T,0} \setminus N_{T,0}} |d_\nu|^2\, 2^{(-2/d-1)\mathrm{gen}(\nu)}$, where we used that $\|h_{\tilde T}\hat\phi_\nu\|^2_{L_2(\Omega)} \eqsim 2^{(-2/d-1)\mathrm{gen}(\nu)}$. The proof of (6.6) is completed. $\square$

Figure 2. Three-point hierarchical basis $\Sigma$. On level 0 there are two wavelets, and on level 1 there is one wavelet, whose parents are both wavelets on level 0. On each level $\ell > 1$ there are $2^{\ell-1}$ wavelets, among them one boundary-adapted wavelet near each boundary, where each wavelet has one parent, being the wavelet on level $\ell - 1$ closest to it ($S_\Sigma(\lambda)$ can be taken equal to $\mathrm{supp}\,\sigma_\lambda$). All but one of the wavelets have vanishing moments: one for the boundary-adapted wavelets, two for the others.

6.3. The wavelet collections $\Sigma$ and $\Psi$. As wavelet basis $\Sigma = \{\sigma_\lambda : \lambda \in \vee_\Sigma\}$ we select the three-point hierarchical basis illustrated in Figure 2. This basis is known to be a Riesz basis for $L_2(I)$ and, after re-normalization, for $H^1(I)$ (see [Ste96]). It also satisfies the other assumptions made in Sect. 5.2.
The wavelets up to level $\ell$ span the space of continuous piecewise linear functions on $I$ w.r.t. the uniform partition into $2^\ell$ subintervals.

As wavelet basis $\Psi = \{\psi_\mu : \mu \in \vee_\Psi\}$ we take the orthonormal (discontinuous) piecewise linear wavelets, see Figure 3. The wavelets up to level $\ell$ span the space of (discontinuous) piecewise linear functions on $I$ w.r.t. the uniform partition into $2^\ell$ subintervals.
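The spanning statements above can be cross-checked by a dimension count. The toy sketch below makes this explicit; the per-level wavelet counts are assumptions taken from the figure captions as reconstructed here (for $\Sigma$: 2 wavelets on level 0, 1 on level 1, $2^{\ell-1}$ on each level $\ell > 1$; for $\Psi$: 2 on level 0, $2^\ell$ on each level $\ell \ge 1$):

```python
def count_sigma(level: int) -> int:
    """Assumed number of three-point basis functions per level
    (2 on level 0, 1 on level 1, 2^(l-1) on each level l > 1)."""
    return 2 if level == 0 else 2 ** (level - 1)

def count_psi(level: int) -> int:
    """Assumed number of orthonormal pw. linear wavelets per level
    (2 on level 0, 2^l on each level l >= 1)."""
    return 2 if level == 0 else 2 ** level

for ell in range(12):
    # Continuous pw. linears on 2^ell subintervals: dimension 2^ell + 1.
    assert sum(count_sigma(l) for l in range(ell + 1)) == 2 ** ell + 1
    # Discontinuous pw. linears on 2^ell subintervals: dimension 2 * 2^ell.
    assert sum(count_psi(l) for l in range(ell + 1)) == 2 * 2 ** ell
```

The cumulative counts match the dimensions of the spaces that the wavelets up to level $\ell$ are claimed to span, which is consistent with the reconstructed captions.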
Figure 3. $L_2(I)$-orthonormal (discontinuous) piecewise linear wavelet basis $\Psi$. On level $\ell = 0$ there are two wavelets; on each level $\ell \ge 1$ there are $2^\ell$ wavelets of two types, each of them having 2 parents, being the wavelets on level $\ell - 1$ closest to them ($S_\Psi(\mu)$ can be taken equal to $\mathrm{supp}\,\psi_\mu$). The wavelets on level 0 have either 0 or 1 vanishing moment; all other wavelets have two vanishing moments.

Figure 4. Index set $\vee_\Sigma$ with parent-child relations, and the one-dimensional hierarchical basis.

6.4. The family $({}^\delta G, {}^\delta U)_{\delta \in \Delta}$. The index set $\vee_\Sigma$ is naturally identified with the set of 'nodal dyadic' points, see Figure 4, which is the natural index set for the one-dimensional hierarchical basis that we denote by $\{\phi_\lambda : \lambda \in \vee_\Sigma\}$. Recalling that for $\delta \in \Delta$, $X^\delta = \mathrm{span}\{\sigma_\lambda \otimes \phi_\nu : (\lambda,\nu) \in I_{\delta,0} = I_\delta \setminus (\vee_\Sigma \times \Gamma_D)\}$ for some lower set $I_\delta \subset \vee_\Sigma \times N$, we define
$${}^\delta G := \mathrm{span}\{\phi_\lambda \otimes \phi_\nu : (\lambda,\nu) \in I_\delta\}, \qquad {}^\delta U := \mathrm{span}\{\phi_\nu : (\lambda,\nu) \in I_\delta,\ \phi_\lambda(0) \ne 0\}.$$
Since the level of resolution of these spaces is comparable to that of $X^\delta$, based on our experiences with wavelet and finite element methods we expect that with this choice of $({}^\delta G, {}^\delta U)$ and the definition of $X^{\bar\delta}_\delta$, saturation holds, i.e., that Assumption 3.1 is valid.

Given $g \in Y'$ and $u_0 \in L_2(\Omega)$, it remains to define their approximations $({}^\delta g, {}^\delta u_0) \in ({}^\delta G, {}^\delta U)$. In general, the construction of these approximations depends on the data at hand. Below we give a construction that applies to general continuous $g$ and $u_0$, and that avoids quadrature issues.

For $\nu \in N$ with $\mathrm{gen}(\nu) =$
0, let $\tilde\phi_\nu := \delta_\nu$. Each $\nu \in N$ with $\mathrm{gen}(\nu) > 0$ is the midpoint of an edge of some $T \in \mathbb T$ with $\mathrm{gen}(T) = \mathrm{gen}(\nu) - 1$. Denoting the endpoints of this edge as $\nu_1, \nu_2 \in N$, let $\tilde\phi_\nu := \delta_\nu - \tfrac12(\delta_{\nu_1} + \delta_{\nu_2})$. Then $\{\tilde\phi_\nu : \nu \in N\} \subset C(\overline\Omega)'$ is biorthogonal to $\{\phi_\nu : \nu \in N\}$. With $\{\tilde\phi_\lambda : \lambda \in \vee_\Sigma\} \subset C(\overline I)'$ defined analogously for the one-dimensional case, for $g \in C(\overline{I \times \Omega})$ and $u_0 \in C(\overline\Omega)$ we define the interpolants
$${}^\delta g := \sum_{(\lambda,\nu) \in I_\delta} (\tilde\phi_\lambda \otimes \tilde\phi_\nu)(g)\, \phi_\lambda \otimes \phi_\nu, \qquad {}^\delta u_0 := \sum_{\{\nu : (\lambda,\nu) \in I_\delta,\ \phi_\lambda(0) \ne 0\}} \tilde\phi_\nu(u_0)\, \phi_\nu.$$
Since we expect that for sufficiently smooth $g$ and $u_0$ the errors $\|g - {}^\delta g\|_{Y'}$ and $\|u_0 - {}^\delta u_0\|_{L_2(\Omega)}$ are of higher order than the approximation error $\inf_{w \in X^\delta} \|u - w\|_X$, for our convenience in the adaptive Algorithm 4.8 we ignore errors caused by data oscillation by setting $\eta(\cdot) \equiv 0$.

The computation of $\mathbf{u}^\delta$ requires computing the vectors
$$\big[\langle {}^\delta g, \psi_\mu \otimes \phi_\nu \rangle_{L_2(I \times \Omega)}\big]_{(\mu,\nu) \in I^Y_{\bar\delta,0}}, \qquad \big[\langle {}^\delta u_0, \phi_\nu \rangle_{L_2(\Omega)}\big]_{\{\nu : (\lambda,\nu) \in I_{\delta,0},\ \sigma_\lambda(0) \ne 0\}},$$
which can be performed in $O(\dim X^\delta)$ operations because $I_\delta$ and $I^Y_{\bar\delta,0}$ are lower sets (and $\#I^Y_{\bar\delta,0} \lesssim \#I_\delta$).

7. NUMERICAL RESULTS
We test our algorithm on the heat equation, i.e., the parabolic problem with $a(t; \eta, \zeta) = \int_\Omega \nabla\eta \cdot \nabla\zeta\,dx$, posed on a two-dimensional polygonal spatial domain $\Omega$, with Dirichlet boundary $\Gamma_D = \partial\Omega$. Recall from §6.3 the three-point continuous piecewise linear temporal wavelet basis $\Sigma$, the orthonormal discontinuous piecewise linear temporal wavelet basis $\Psi$, and the hierarchical continuous piecewise linear spatial basis $\Xi := \{\phi_\nu : \nu \in N\}$.

We consider 'trial' spaces $X^\delta$ which are spanned by finite subsets of $\Sigma \otimes \Xi$ whose index sets are lower sets (more precisely, satisfy (6.1)-(6.2)), and corresponding 'test' spaces $Y^\delta$ spanned by finite subsets of $\Psi \otimes \Xi$ as defined in (6.3)-(6.4). We construct the enlarged trial space $X^{\bar\delta}$ as defined in Def. 6.3, with corresponding test space $Y^{\bar\delta}$.

For a given level $N \in \mathbb N$, $\mathrm{span}\{\sigma_\lambda : |\lambda| \le N\}$ coincides with the span of the continuous piecewise linears on an $N$-times recursive dyadic refinement of $I$, and $\mathrm{span}\{\phi_\nu \in \Xi : \mathrm{gen}(\nu) \le 2N\}$ coincides with that of the continuous piecewise linears, zero at $\partial\Omega$, on a $2N$-times recursive bisection refinement of an initial partition $T_\perp$. Therefore, the span of the 'full' tensor product $\{\sigma_\lambda : |\lambda| \le N\} \otimes \{\phi_\nu : \mathrm{gen}(\nu) \le 2N\}$ equals a space of lowest order continuous finite elements w.r.t. a quasi-uniform shape regular product mesh into prismatic elements. Taking only those index pairs $(\lambda,\nu)$ for which $2|\lambda| + \mathrm{gen}(\nu) \le 2N$ produces a 'sparse' tensor product on level $N$. Sparse tensor products allow one to overcome the curse of dimensionality in the sense that for smooth solutions they achieve a rate in the $X$-norm that is equal to the best rate in the $H^1(\Omega)$-norm that can be expected for the corresponding stationary problem on the spatial domain, here the Poisson equation; see also Sect. 5.5.

We run our adaptive Algorithm 4.8 with $\theta = \xi$, computing ${}^\delta g$ and ${}^\delta u_0$ as in Sect. 6.4.
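To make the lower-set ('double-tree') constraint (6.2) and the closure used in the refinement step concrete, the following sketch is a deliberately simplified toy model (its assumptions: index pairs are encoded as integer pairs, and each coordinate has the single parent obtained by decrementing it, so the actual temporal wavelet tree and NVB vertex tree are both replaced by chains). It checks lower-ness and computes the smallest lower set containing a set of marked indices:

```python
from typing import Set, Tuple

Index = Tuple[int, int]  # toy encoding of (temporal index, spatial vertex)

def parents(idx: Index):
    """Parents of an index pair in the toy model: one step down in either
    coordinate (0 has no parent). In the actual method, the temporal
    wavelet tree and the NVB vertex tree would be consulted here."""
    lam, nu = idx
    if lam > 0:
        yield (lam - 1, nu)
    if nu > 0:
        yield (lam, nu - 1)

def is_lower_set(indices: Set[Index]) -> bool:
    """A set is lower ('double-tree') if it contains all parents of its members."""
    return all(p in indices for idx in indices for p in parents(idx))

def lower_completion(marked: Set[Index]) -> Set[Index]:
    """Smallest lower set containing 'marked': close under taking parents."""
    result, stack = set(marked), list(marked)
    while stack:
        for p in parents(stack.pop()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

completed = lower_completion({(2, 3)})
assert is_lower_set(completed)
assert completed == {(l, n) for l in range(3) for n in range(4)}
```

The same closure idea underlies choosing $I_{\tilde\delta}$ as the smallest lower set containing the union of the marked set and $I_\delta$: marking a single deep index pair forces the addition of all its ancestors, which is why refinement can add index pairs outside the marked set.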
Since we envisage that in our experiments data-oscillation errors are not dominant, for our convenience we took $\omega = \infty$. We solve the arising linear system (4.9) using Preconditioned CG, using the previous solution as initial guess. We then perform Dörfler marking on the residual, yielding a minimal set $J$, and finally choose $I_{\tilde\delta}$ as the smallest lower set containing $J \cup I_\delta$. Due to this constraint we generally add index pairs outside of the marked set, i.e., $I_{\tilde\delta} \setminus I_\delta \supsetneq J$. Still, in our experiments we observe $\#(I_{\tilde\delta} \setminus I_\delta) \lesssim \#J$ with a moderate constant.

Remark 7.1. We would rather have applied an algorithm that produces an $I_{\tilde\delta}$ such that $I_{\tilde\delta} \setminus I_\delta$ is guaranteed to have, up to a multiplicative factor, the smallest cardinality among all lower sets $I_{\tilde\delta} \supset I_\delta$ that realize the bulk criterion. Such an algorithm was introduced in [BD04, BFV19] for 'single-tree' approximation, but seems not to be available for the 'double-tree' (i.e., lower set) constraint that we need here.

We compare adaptive refinement with non-adaptive full and sparse tensor products, and monitor the error estimator $E^\delta(\tilde u^\delta)$ from Proposition 4.5, the residual error estimator from Proposition 4.3, and the $L_2(\Omega)$ trace error at $t = 0$.

7.1. Condition numbers of the preconditioners.
For the calibration of our precondi-tioners, we consider Ω : = [
0, 1 ] , and compare uniformly refined space-time mesheswith locally refined meshes with refinements towards { } × ∂ Ω .The replacement of the nonlocal operator ( E ¯ δ Y (cid:48) AE ¯ δ Y ) − in the forward applica-tion of S ¯ δδ by the block-diagonal preconditioner K ¯ δ Y from Sect. 5.6.2 is only guar-anteed to result in a convergent algorithm when the eigenvalues of K ¯ δ Y E ¯ δ Y (cid:48) AE ¯ δ Y aresufficiently close to one.In Table 1, we investigate the values κ δ : = max { λ max ( K ¯ δ Y A ¯ δ Y ) , 1/ λ min ( K ¯ δ Y A ¯ δ Y ) } with A ¯ δ Y the matrix representation of E ¯ δ Y (cid:48) AE ¯ δ Y , and K ¯ δ Y built from spatial multigridpreconditioners K ¯ δµ corresponding to n V-cycles. In each V-cycle we applied onepre- and one post Gauss-Seidel smoother. In case of a locally refined spatial mesh,on each level these Gauss-Seidel updates were restricted to the vertices whose gen-eration is equal to that level as well as both endpoints of the edge on which thesevertices were inserted ([WZ17]). We see that for both uniform and locally refinedspace-time meshes, κ δ converges to 1 rapidly in n , and is essentially independentof dim X δ . In our examples, κ δ is sufficiently close to one already for n = n = S ¯ δδ , we want to precondition S ¯ δδ itself as well. Following Sect. 5.6.2, we build a block-diagonal preconditioner tak-ing K δλ to correspond to m V-cycles of the aforementioned multigrid method nowapplied to A δλ + | λ | M δλ with A δλ and M δλ being stiffness- or mass-matrices. Table 2dim X δ n = n = n = n = n = n = uniform
dim X^δ     n=1    n=2    n=3    n=4    n=5    n=6
uniform:
729         1.343  1.070  1.017  1.004  1.001  1.000
35937       1.360  1.075  1.019  1.004  1.001  1.000
2146689     1.365  1.077  1.019  1.004  1.001  1.000
local:
766         1.306  1.058  1.013  1.003  1.001  1.000
30151       1.307  1.058  1.013  1.003  1.001  1.000
1964797     1.307  1.058  1.013  1.003  1.001  1.000

Table 1. Values κδ := max{λ_max(K^δ̄_Y A^δ̄_Y), 1/λ_min(K^δ̄_Y A^δ̄_Y)} using spatial multigrid with n V-cycles.
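Given a candidate preconditioner, the quantity κδ of Table 1 can be computed directly from the spectrum of the preconditioned matrix. A minimal sketch, with a 2D Laplacian and a Jacobi preconditioner as hypothetical stand-ins for A^δ̄_Y and K^δ̄_Y:

```python
import numpy as np

def kappa(KA):
    """kappa := max{lambda_max(K·A), 1/lambda_min(K·A)}. For SPD A and SPD K,
    K·A is similar to K^(1/2)·A·K^(1/2), so its spectrum is real, positive."""
    lam = np.sort(np.linalg.eigvals(KA).real)
    return max(lam[-1], 1.0 / lam[0])

# Hypothetical stand-ins: a 2D Laplacian for A, a Jacobi preconditioner for K.
n = 20
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = np.kron(T, np.eye(n)) + np.kron(np.eye(n), T)   # 2D Laplacian, SPD
K = np.diag(1.0 / np.diag(A))                       # Jacobi: diag(A)^(-1)
print(kappa(K @ A))      # > 1 for Jacobi; equals 1 for the exact inverse
assert abs(kappa(np.linalg.inv(A) @ A) - 1.0) < 1e-8
```

For large problems one would of course estimate the extreme eigenvalues iteratively (e.g. by Lanczos) rather than form the dense spectrum as in this toy example.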
Table 2 shows the condition numbers of the preconditioned matrix. We again see fast stabilization in m as well as in dim X^δ; we fix a small m in the remaining experiments.

7.2. Smooth problem.
We consider the square domain Ω := [0,1]² and prescribe

u(t, x, y) := (1 + t) x(1 − x) y(1 − y),

with derived data u₀ and g. For this smooth solution, full and sparse tensor products are expected to yield the best possible error decays, proportional to (dim X^δ)^{−1/3} and (dim X^δ)^{−1/2} respectively.

The left side of Figure 5 shows the error progressions for the smooth problem. We plot the error estimator Eδ(ũδ) := Eδ(ũδ; g^δ, u₀^δ) ≂ ‖u^δ − ũδ‖_X from Proposition 4.5, the residual error estimator ‖rδ‖, and the trace error ‖γ_T(u^δ − ũδ)‖_{L²(Ω)}. We see that the error progressions are as expected. For this solution, adaptive refinement yields no advantage over sparse grid refinement. We observe a higher order of convergence for the trace at t = T in L²(Ω).

7.3. Moving peak problem.
We consider a square domain Ω := [0,1]² and select

u(t, x, y) := x(1 − x) y(1 − y) exp(−c[(x − t)² + (y − t)²])

for a large constant c. We took this example from [LS20]. The solution is smooth, and almost zero everywhere except on a small strip near the diagonal from (0, 0, 0) to (1, 1, 1) of the space-time cylinder. As u is smooth, we expect sparse grid refinements to asymptotically yield the optimal error decay proportional to (dim X^δ)^{−1/2}, albeit with a terrible constant. Adaptive refinement should be able to achieve the same rate at quantitatively smaller double-trees.

From the right of Figure 5, we see that the sparse grid rate is not (yet) optimal, while our adaptive routine is able to find the optimal rate from moderate dim X^δ onwards. Figure 6 shows the number of basis functions σλ ⊗ φν whose supports intersect given points in the space-time cylinder. We see the adaptation to the moving peak.
Table 2. Spectral condition numbers of K^δ_X S^δ̄_δδ, using spatial multigrid with m V-cycles.

Figure 5. Error progressions for (left) the smooth problem and (right) the moving peak problem. Shown: estimated X-norm error (solid line), residual norm (dashed), and the L²(Ω) trace error at t = T, versus dim X^δ, for adaptive (black), sparse grid (red), and full grid refinement (orange).

7.4. Cylinder problem.
Selecting the L-shaped domain Ω := [−1,1]² \ [−1,0]² with data u₀ ≡ 0 and g(t, x, y) := t · 𝟙_{x²+y²<ρ²} for some ρ > 0, the true solution is known to be singular at the re-entrant corner and at the wall of the cylinder {(t, x, y) : x² + y² = ρ²}. We took this example from [FK19]. The left side of Figure 7 shows the error progression for this cylinder problem. We see that the full grid error decay is improved by considering sparse grids. Adaptive refinement, however, achieves the best possible error decay proportional to (dim X^δ)^{−1/2}, recovering the rate for a smooth solution.

7.5. Singular problem.
We again select the L-shaped domain Ω := [−1,1]² \ [−1,0]², now with data u₀ ≡ 1 and g ≡ 0. The solution has a strong singularity along {0} × ∂Ω due to the incompatibility of initial and boundary conditions, in addition to the singularity at the re-entrant corner (0, 0). At the right of Figure 7, for uniform refinement, we see the extremely slow error decay already found in [FK19]. Interestingly, sparse grid refinement offers no rate improvement over full grid refinement. The adaptive algorithm yields a much better error decay. We observed that neither a larger Dörfler marking parameter nor θ smaller than 0.5 improved this rate further. In Figure 8 we visualize the refinements near the singularities {0} × ∂Ω and I × {(0, 0)}, and observe basis functions σλ ⊗ φν that span X^δ whose barycenters lie at extremely small times t.

7.6. Gradedness and error reduction.
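The gradedness discussed below concerns the lower sets Iδ produced by the adaptive loop; recall from Sect. 7.1 that after Dörfler marking, the marked set is completed to the smallest containing lower set. A minimal sketch of both steps, with a hypothetical two-coordinate level index in place of the actual (wavelet, element) index pairs:

```python
from itertools import product

def doerfler(contrib, theta):
    """Smallest set J whose squared contributions sum to >= theta^2 * total."""
    total = sum(c * c for c in contrib.values())
    J, acc = set(), 0.0
    for idx, c in sorted(contrib.items(), key=lambda kv: -kv[1] ** 2):
        if acc >= theta ** 2 * total:
            break
        J.add(idx)
        acc += c * c
    return J

def parents(idx):
    """Parents of a (time-level, space-level) pair; a set of pairs is 'lower'
    (a double-tree) when it is closed under taking these parents."""
    lt, lx = idx
    return [(lt - 1, lx)] * (lt > 0) + [(lt, lx - 1)] * (lx > 0)

def lower_completion(I, J):
    """Smallest lower set containing I ∪ J: add ancestors until closed."""
    out, stack = set(I), list(J)
    while stack:
        idx = stack.pop()
        if idx not in out:
            out.add(idx)
            stack.extend(parents(idx))
    return out

# Toy residual contributions on level pairs (hypothetical values):
contrib = {(lt, lx): 2.0 ** -(lt + lx) for lt, lx in product(range(4), range(4))}
J = doerfler(contrib, theta=0.9)
I_new = lower_completion({(0, 0)}, J)
# The completed set is lower: every parent of a member is a member.
assert all(p in I_new for idx in I_new for p in parents(idx))
assert J <= I_new
```

The completion may add pairs outside the marked set; as noted in Sect. 7.1, this set is not guaranteed to be of minimal cardinality among all admissible lower sets.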
In Sect. 4 we used (4.2) to demonstrate proportionality of ‖rδ‖ and ‖u − uδ‖_X, as well as a constant error reduction in each iteration of the adaptive algorithm. In Proposition 6.6, we showed that (4.2) holds when the gradedness Lδ of Definition 6.4 is uniformly bounded.

Figure 6. Moving peak problem, adaptive lower set with dim X^δ = 89 401. Shown: #{(λ, ν) ∈ Iδ : (t, x, y) ∈ supp σλ ⊗ φν} for a selection of times t ∈ {0, 0.1, 0.2, 0.3, 0.4, 0.5}.

Figure 7. Error progressions for (left) the cylinder problem and (right) the singular problem. Shown: estimated X-norm error (solid line), residual norm (dashed), and the L²(Ω) trace error at t = T, versus dim X^δ, for adaptive (black), sparse grid (red), and full grid refinement (orange).

In the left picture of Figure 9, we see a larger than expected increase in gradedness, where in particular for the singular problem we observe a logarithmic increase in terms of dim X^δ. This, however, turns out not to be a problem in practice: Figures 5 and 7 demonstrate that the residual error ‖rδ‖ and the estimated X-norm error Eδ(ũδ) remain very close, and converge even for the singular problem.
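The convergence rates and reduction factors quoted in these experiments can be extracted from the sequence of (dim X^δ, error) pairs by a log-log least-squares fit; a small self-contained sketch on synthetic data (the values are hypothetical, not measurements from our solver):

```python
import numpy as np

def empirical_rate(dims, errors):
    """Least-squares slope s in error ~ C * dim^(-s), from a log-log fit."""
    s, _ = np.polyfit(np.log(dims), np.log(errors), 1)
    return -s

def error_reduction(errors):
    """Mean per-iteration error reduction: geometric mean of e_{k+1}/e_k."""
    e = np.asarray(errors, dtype=float)
    return (e[-1] / e[0]) ** (1.0 / (len(e) - 1))

# Synthetic data mimicking an error decay proportional to dim^(-1/2):
dims = np.array([1.0e3, 4.0e3, 1.6e4, 6.4e4])
errs = 5.0 * dims ** -0.5
print(empirical_rate(dims, errs))   # ~ 0.5
print(error_reduction(errs))        # ~ 0.5 (dim quadruples, error halves)
```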
Figure 8. Barycenters of supports of basis functions σλ ⊗ φν spanning X^δ generated by Algorithm 4.8, of dimension 81 074, for the singular problem. Left: a top-down view, with a 10× zoom to the origin; right: centers in space-time, logarithmic in time.
Figure 9. Gradedness and estimated X-norm error at every iteration of the adaptive loop, for the four different model problems under consideration.

Moreover, in the right picture of Figure 9, we see a roughly constant error reduction ρ̆ in each iteration.

7.7. Total runtime and memory consumption.
Figure 10 shows the total runtime and peak memory consumption after every iteration of the adaptive algorithm. The top row shows absolute values, and the bottom row values relative to dim X^δ. The left of the figure shows that the adaptive algorithm runs in optimal linear time in the dimension of the current trial space.

Figure 10. Total runtime and peak memory consumption as a function of dim X^δ, measured after every iteration of the adaptive loop, for the four different model problems.

The right of the figure shows that the peak memory consumption is linear as well, stabilizing at around 15 kB per degree of freedom. This is relatively high, mainly because our implementation uses trees rather than hash maps to represent vectors, to ensure a linear-time implementation of the matrix-vector products (cf. Rem. 6.2).

8. Conclusion
We have constructed an adaptive solver for a space-time variational formulation of parabolic evolution problems. The collection of trial spaces is given by the spans of sets of tensor products of wavelets-in-time and hierarchical basis functions-in-space. Compared to our previous works [CS11, RS18], where we employed 'true' wavelets also in space, the theoretical results are weaker: we have demonstrated r-linear convergence of the adaptive routine, but have not shown optimal rates at linear complexity. On the other hand, the runtimes that we obtained with the current approach are much better.

References

[Alp93] B.K. Alpert. A class of bases in L² for the sparse representation of integral operators. SIAM J. Math. Anal., 24:246–262, 1993.
[And13] R. Andreev. Stability of sparse space-time finite element discretizations of linear parabolic evolution equations.
IMA J. Numer. Anal. , 33(1):242–260, 2013.[And16] R. Andreev. Wavelet-in-time multigrid-in-space preconditioning of parabolic evolutionequations.
SIAM J. Sci. Comput. , 38(1):A216–A242, 2016.[BD04] P. Binev and R. DeVore. Fast computation in adaptive tree approximation.
Numer. Math., 97(2):193–217, 2004.
[BDDP02] P. Binev, W. Dahmen, R. DeVore, and P. Petrushev. Approximation classes for adaptive methods.
Serdica Math. J. , 28:391–416, 2002.[BFV19] P. Binev, F. Fierro, and A. Veeser. Near-best adaptive approximation on conformingmeshes, 2019.[BRU20] N. Beranek, M.A. Reinhold, and K. Urban. A space-time variational method for optimalcontrol problems, 2020.[BZ96] R. Balder and Ch. Zenger. The solution of multidimensional real Helmholtz equations onsparse grids.
SIAM J. Sci. Comput., 17(3):631–646, 1996.
[Car04] C. Carstensen. An adaptive mesh-refining algorithm allowing for an H¹-stable L²-projection onto Courant finite element spaces. Constr. Approx., 20(4):549–564, 2004.
[CDD01] A. Cohen, W. Dahmen, and R. DeVore. Adaptive wavelet methods for elliptic operator equations – Convergence rates.
Math. Comp., 70:27–75, 2001.
[CS11] N.G. Chegini and R.P. Stevenson. Adaptive wavelet schemes for parabolic problems: Sparse matrices and numerical results.
SIAM J. Numer. Anal., 49(1):182–212, 2011.
[Dev20] D. Devaud. Petrov-Galerkin space-time hp-approximation of parabolic equations in H^{1/2}. IMA J. Numer. Anal., 40(4):2717–2745, 2020.
[DGVdZ16] R. Dyja, B. Ganapathysubramanian, and K.G. Van der Zee. Massively parallel-in-space-time, adaptive finite element framework for non-linear parabolic equations, 2016. arXiv:1608.08066.
[DKS15] L. Diening, Ch. Kreuzer, and R.P. Stevenson. Instance optimality of the adaptive maximum strategy.
Found. Comput. Math. , pages 1–36, 2015. 10.1007/s10208-014-9236-6.[DL92] R. Dautray and J.-L. Lions.
Mathematical analysis and numerical methods for science and technology. Vol. 5. Springer-Verlag, Berlin, 1992. Evolution problems I.
[DS20] L. Diening and J. Storn. A space-time DPG method for the heat equation, 2020.
[DSW20] W. Dahmen, R.P. Stevenson, and J. Westerdiep. Accuracy controlled data assimilation for parabolic problems. Technical report, 2020. Submitted.
[ESV17] A. Ern, I. Smears, and M. Vohralík. Guaranteed, locally space-time efficient, and polynomial-degree robust a posteriori error estimates for high-order discretizations of parabolic problems.
SIAM J. Numer. Anal., 55(6):2811–2834, 2017.
[FK19] T. Fuehrer and M. Karkulik. Space-time least-squares finite elements for parabolic equations. Technical report, 2019. arXiv:1911.01942.
[GHS16] F. D. Gaspoz, C.-J. Heine, and K. G. Siebert. Optimal grading of the newest vertex bisection and H¹-stability of the L²-projection. IMA J. Numer. Anal., 36(3):1217–1241, 2016.
[GK11] M.D. Gunzburger and A. Kunoth. Space-time adaptive wavelet methods for control problems constrained by parabolic evolution equations.
SIAM J. Contr. Optim. , 49(3):1150–1170,2011.[GO07] M. Griebel and D. Oeltz. A sparse grid space-time discretization scheme for parabolicproblems.
Computing , 81(1):1–34, 2007.[GS19] H. Gimperlein and J. Stocek. Space-time adaptive finite elements for nonlocal parabolicvariational inequalities.
Comput. Methods Appl. Mech. Engrg. , 352:137–171, 2019.[GS21] G. Gantner and R.P. Stevenson. Further results on a space-time FOSLS formulation ofparabolic PDEs.
ESAIM Math. Model. Numer. Anal. , 2021. arXiv:2005.11000.[HLNS19] Ch. Hofer, U. Langer, M. Neumüller, and R. Schneckenleitner. Parallel and robust precon-ditioning for space-time isogeometric analysis of parabolic evolution problems.
SIAM J.Sci. Comput. , 41(3):A1793–A1821, 2019.[KS08] Y. Kondratyuk and R.P. Stevenson. An optimal adaptive finite element method for theStokes problem.
SIAM J. Numer. Anal. , 46(2):747–775, 2008.
[KS14] S. Kestler and R.P. Stevenson. Fast evaluation of system matrices w.r.t. multi-tree collections of tensor product refinable basis functions.
J. Comput. Appl. Math. , 260:103–116, 2014.[KSU16] S. Kestler, K. Steih, and K. Urban. An efficient space-time adaptive waveletGalerkin method for time-periodic parabolic partial differential equations.
Math. Comp. ,85(299):1309–1333, 2016.[LM17] S. Larsson and M. Molteni. Numerical solution of parabolic problems based on a weakspace-time formulation.
Comput. Methods Appl. Math. , 17(1):65–84, 2017.[LMN16] U. Langer, S.E. Moore, and M. Neumüller. Space-time isogeometric analysis of parabolicevolution problems.
Comput. Methods Appl. Mech. Engrg. , 306:342–363, 2016.[LS20] U. Langer and A. Schafelner. Adaptive Space-Time Finite Element Methods for Non-autonomous Parabolic Problems with Distributional Sources.
Comput. Methods Appl.Math. , 20(4):677–693, 2020.[Nit06] P.-A. Nitsche. Best N -term approximation spaces for tensor product wavelet bases. Constr.Approx. , 24(1):49–70, 2006.[NS19] M. Neumüller and I. Smears. Time-parallel iterative solvers for parabolic evolution equa-tions.
SIAM J. Sci. Comput. , 41(1):C28–C51, 2019.[OR00] M. A. Olshanskii and A. Reusken. On the convergence of a multigrid method for linearreaction-diffusion problems.
Computing , 65(3):193–202, 2000.[PW12] J. W. Pearson and A. J. Wathen. A new approximation of the Schur complement in pre-conditioners for PDE-constrained optimization.
Numer. Linear Algebra Appl. , 19(5):816–829,2012.[RS18] N. Rekatsinas and R. Stevenson. An optimal adaptive tensor product wavelet solver of aspace-time FOSLS formulation of parabolic evolution problems.
Adv. Comput. Math. , 2018.[SS09] Ch. Schwab and R.P. Stevenson. A space-time adaptive wavelet method for parabolic evo-lution problems.
Math. Comp. , 78:1293–1318, 2009.[Ste96] R.P. Stevenson. The frequency decomposition multi-level method: A robust additive hier-archical basis preconditioner.
Math. Comp. , 65(215):983–997, July 1996.[Ste08] R.P. Stevenson. The completion of locally refined simplicial partitions created by bisection.
Math. Comp. , 77:227–241, 2008.[Ste15] O. Steinbach. Space-Time Finite Element Methods for Parabolic Problems.
Comput. Meth-ods Appl. Math. , 15(4):551–566, 2015.[SU09] W. Sickel and T. Ullrich. Tensor products of Sobolev-Besov spaces and applications toapproximation from the hyperbolic cross.
J. Approx. Theory , 161:748–786, 2009.[SvV19] R.P. Stevenson and R. van Venetië. Uniform preconditioners for problems of negative or-der.
Math. Comp. , 2019.[SW20a] R.P. Stevenson and J. Westerdiep. Minimal residual space-time discretizations of parabolicequations: Asymmetric spatial operators. Technical report, Korteweg-de Vries Institute,2020. In preparation.[SW20b] R.P. Stevenson and J. Westerdiep. Stability of Galerkin discretizations of a mixed space-time variational formulation of parabolic evolution equations.
IMA J. Numer. Anal. , 2020.[SY18] O. Steinbach and H. Yang. Comparison of algebraic multigrid methods for an adaptivespace-time finite-element discretization of the heat equation in 3D and 4D.
Numer. LinearAlgebra Appl. , 25(3):e2143, 17, 2018.[SZ20] O. Steinbach and M. Zank. Coercive space-time finite element methods for initial bound-ary value problems.
Electron. Trans. Numer. Anal. , 52:154–194, 2020.[vVW20] R. van Venetië and J. Westerdiep. A scalable algorithm for solving linear parabolic evolu-tion equations, 2020.[vVW21] R. van Venetië and J. Westerdiep. Technical report, Korteweg-de Vries Institute, 2021. Inpreparation.[Wlo82] J. Wloka.
Partielle Differentialgleichungen . B. G. Teubner, Stuttgart, 1982. Sobolevräume undRandwertaufgaben.[WZ17] J. Wu and H. Zheng. Uniform convergence of multigrid methods for adaptive meshes.
Appl. Numer. Math., 113:109–123, 2017.
[ZMD+11] J. Zitelli, I. Muga, L. Demkowicz, J. Gopalakrishnan, D. Pardo, and V. M. Calo. A class of discontinuous Petrov-Galerkin methods. Part IV: the optimal test norm and time-harmonic wave propagation in 1D.
J. Comput. Phys., 230(7):2406–2432, 2011.

Korteweg-de Vries (KdV) Institute for Mathematics, University of Amsterdam, Amsterdam, The Netherlands.