A path-following inexact Newton method for PDE-constrained optimal control in BV

D. Hafemeyer · F. Mannel

Received: date / Accepted: date

Abstract

We study a PDE-constrained optimal control problem that involves functions of bounded variation as controls and includes the TV seminorm of the control in the objective. We apply a path-following inexact Newton method to the problems that arise from smoothing the TV seminorm and adding an $H^1$ regularization. We prove in an infinite-dimensional setting that, first, the solutions of these auxiliary problems converge to the solution of the original problem and, second, that an inexact Newton method enjoys fast local convergence when applied to a reformulation of the optimality system in which the control appears as an implicit function of the adjoint state. We show convergence of a Finite Element approximation, provide a globalized preconditioned inexact Newton method as solver for the discretized auxiliary problems, and embed it into an inexact path-following scheme. We construct a two-dimensional test problem with fully explicit solution and present numerical results to illustrate the accuracy and robustness of the approach.

Keywords optimal control · partial differential equations · TV seminorm · functions of bounded variation · path-following Newton method

Mathematics Subject Classification (2010) · · · · · · ·

Dominik Hafemeyer
TU München, Lehrstuhl für Optimalsteuerung, Department of Mathematics, Boltzmannstr. 3, 85748 Garching b. München, Germany
E-mail: [email protected]

F. Mannel
University of Graz, Institute of Mathematics and Scientific Computing, Heinrichstr. 36, 8010 Graz, Austria
E-mail: [email protected]

arXiv [math.OC], Oct

Problem setting and introduction

This work is concerned with the optimal control problem
\[
\min_{(y,u) \in H_0^1(\Omega) \times \mathrm{BV}(\Omega)} \; \underbrace{\tfrac12 \|y - y_\Omega\|_{L^2(\Omega)}^2 + \beta |u|_{\mathrm{BV}(\Omega)}}_{=: J(y,u)} \quad \text{s.t.} \quad Ay = u, \tag{OC}
\]
where throughout $\Omega \subset \mathbb{R}^N$ is a bounded $C^{1,1}$ domain and $N \in \{1,2,3\}$. The control $u$ belongs to the space of functions of bounded variation $\mathrm{BV}(\Omega)$, the state $y$ lives in $Y := H_0^1(\Omega)$, the parameter $\beta$ is positive, and $Ay = u$ is a partial differential equation of the form
\[
\begin{cases} \mathcal{A} y + c_0 y = u & \text{in } \Omega, \\ y = 0 & \text{on } \partial\Omega \end{cases}
\]
with a non-negative function $c_0 \in L^\infty(\Omega)$ and a linear and uniformly elliptic operator of second order in divergence form $\mathcal{A}: H_0^1(\Omega) \to H^{-1}(\Omega)$, $\mathcal{A}y(\varphi) = \int_\Omega \sum_{i,j=1}^N a_{ij}\, \partial_i y\, \partial_j \varphi \, dx$, whose coefficients satisfy $a_{ij} = a_{ji} \in C^{0,1}(\Omega)$ for all $i,j \in \{1,\dots,N\}$. The specific feature of (OC) is the appearance of the BV seminorm $|u|_{\mathrm{BV}(\Omega)}$ in the cost functional, which favors piecewise constant controls and has recently attracted considerable interest in PDE-constrained optimal control, cf. [7,12,13,15,22,23,24,27,28,30] and the earlier works [16,17]. The majority of these contributions focuses on deriving optimality conditions and studying Finite Element approximations. In contrast, the main focus of this work is on a path-following method.
Specifically,
– we propose to smooth the TV seminorm in $J$ and add an $H^1$ regularization, and we show in an infinite-dimensional setting that the solutions of the resulting auxiliary problems converge to the solution of (OC);
– we present a non-standard reformulation of the optimality conditions of the auxiliary problems and show local convergence of an infinite-dimensional inexact Newton method when applied to this reformulation;
– we derive a practical path-following method that yields accurate solutions for (OC) and illustrate its capabilities in numerical examples for $\Omega \subset \mathbb{R}^2$.

To the best of our knowledge, these aspects have only been investigated partially for optimal control problems that involve the TV seminorm in the objective. In particular, there are few works that address the numerical solution when the measure $\nabla u$ is supported in a two-dimensional set. In fact, we are only aware of [22], where a doubly-regularized version of the Fenchel predual of (OC) is solved for fixed regularization parameters, but path-following is not applied. We stress that in our numerical experience the two-dimensional case is significantly more challenging than the one-dimensional case. A FEniCS implementation of our path-following method is available at https://imsc.uni-graz.at/mannel/publications.php. It includes all the features that we discuss in section 6, e.g., a preconditioner for the Newton systems, a non-monotone line search globalization, and inexact path-following.

A further contribution of this work is that
– we provide an example of (OC) for $N = 2$ with fully explicit solution. For the case that $\nabla u$ is defined in an interval ($N = 1$) such examples are available, e.g. [13,30], but for $N = 2$ this is new.

Let us briefly address three difficulties associated with (OC). First, the fact that (OC) is posed in the non-reflexive space $\mathrm{BV}(\Omega)$ complicates the proof of existence of optimal solutions.
By now it is, however, well understood how to deal with this issue also in more complicated situations, cf. e.g. [13,15].

Second, we notice that $u \mapsto |u|_{\mathrm{BV}(\Omega)}$ is not differentiable. We will cope with this by replacing $|u|_{\mathrm{BV}(\Omega)}$ with a smoothed functional $\psi_\delta$, $\delta \ge 0$, that satisfies $\psi_0(\cdot) = |\cdot|_{\mathrm{BV}(\Omega)}$. The functional $\psi_\delta$ that we use for this purpose is well-known, particularly in the imaging community, e.g. [1,20]. However, in most of the existing works the smoothing parameter $\delta > 0$ is fixed, whereas we drive $\delta$ to zero. We will also add the regularizer $\frac{\gamma}{2}\|u\|_{H^1(\Omega)}^2$, $\gamma \ge 0$, to $J$ and drive $\gamma$ to zero. For fixed $\gamma, \delta > 0$ the optimal control $\bar u_{\gamma,\delta}$ of the smoothed and regularized auxiliary problem turns out to be a $C^{1,\alpha}$ function for some $\alpha > 0$, whereas for $\gamma = 0$ only $\bar u_{0,\delta} \in \mathrm{BV}(\Omega)$ can be expected.

Third, numerical experiments show that for standard formulations of the optimality system, e.g. those that result from reduction to the control, path-following Newton methods are not able to sufficiently reduce the smoothing parameter $\delta$. In fact, we have consistently encountered this phenomenon in our previous work [13,21,27,28,30] involving the TV seminorm. As a remedy we propose to eliminate the control from the optimality system by regarding it as an implicit function of the adjoint state, an approach that may be of interest in its own right. Since the control depends nonlinearly on the adjoint state, this increases the computational costs in comparison to standard formulations of the optimality system, e.g. reduction to the control or an all-at-once approach, where the dependencies are linear. On the other hand, the implicit approach enables us to reduce $\delta$ far below the levels that we achieved with standard formulations of the optimality system. In addition, we provide measures that lower the computational burden of this approach.

Let us set our work in perspective with the available literature. We regard it as one of the main contributions that we show on the infinite-dimensional level that the solutions of the auxiliary problems converge to the solution of (OC), cf. section 2.5. The asymptotic convergence for vanishing $H^1$ seminorm regularization is analyzed in [15, Section 6] for a more general problem than (OC), but the fact that our setting is less general allows us to prove convergence in stronger norms than the corresponding [15, Theorem 10].
The asymptotic convergence for a doubly-regularized version of the predual of (OC) is established in [22, Appendix A], but one of the regularizations is left untouched, so convergence is towards the solution of a regularized problem, not towards the solution of (OC). Next, we demonstrate that an infinite-dimensional inexact Newton method, applied to the aforementioned non-standard reformulation of the optimality system, converges locally for the auxiliary problems. This is non-trivial to prove because the implicit control and the adjoint state are coupled by a quasilinear PDE. A related result is [22, Theorem 3.5], where local q-superlinear convergence of a semismooth Newton method is shown for the doubly-regularized Fenchel predual for fixed regularization parameters. Yet, since we work with a different optimality system, the overlap is rather small.

Turning to the discrete level we provide a Finite Element approximation and demonstrate that the Finite Element solutions of the auxiliary problems converge to the corresponding true solutions. Finite Element approximations for optimal control in BV involving the TV seminorm have also been studied in [7,12,13,15,23,24,27,28,30], but in our assessment the regularization of (OC) that we propose is not covered by these studies.

The BV-term in (OC) favors sparsity in the gradient of the control. Other sparsity promoting control terms that have been studied during recent years are measure norms and $L^1$-type functionals, e.g., [2,10,11,14,18,19,31,34,45]. TV-regularization is also of significant importance in imaging problems and its usefulness for, e.g., noise removal has long been known [41]. However, the character of imaging problems is substantially different from optimal control problems, for instance because the forward operator in imaging problems is usually cheap to evaluate and non-compact.

This paper is organized as follows.
After some preliminaries in section 1, we consider existence, optimality conditions and convergence of solutions in section 2. In section 3 we establish differentiability of the adjoint-to-control mapping, which paves the way for proving local convergence of an inexact Newton method in section 4. Section 5 addresses the Finite Element approximation and its convergence, while section 6 provides the path-following method. Numerical experiments are presented in section 7, including for the test problem with explicit solution. Several technical results such as Hölder continuity of solutions to quasilinear PDEs are deferred to the appendix.

We recall facts about the space $\mathrm{BV}(\Omega)$, introduce an index $s$, and collect properties of the solution operator of the PDE in (OC).

1.1 Functions of bounded variation

The following statements about $\mathrm{BV}(\Omega)$ can be found in [4, Chapter 3] unless stated otherwise. The space of functions of bounded variation is defined as
\[
\mathrm{BV}(\Omega) := \Bigl\{ u \in L^1(\Omega) : \sup_{v \in C_0^1(\Omega)^N,\ \| |v| \|_\infty \le 1} \int_\Omega u \operatorname{div} v \, dx < \infty \Bigr\}.
\]
Here and throughout, $|\cdot|$ denotes the Euclidean norm, so we are using the isotropic total variation. It can be shown that $u \in \mathrm{BV}(\Omega)$ iff there exists a vector measure $(\partial_{x_1} u, \dots, \partial_{x_N} u)^T = \nabla u \in \mathcal{M}(\Omega)^N$ such that for all $i \in \{1,\dots,N\}$ there holds
\[
\int_\Omega \partial_{x_i} u \, v \, dx = - \int_\Omega u \, \partial_{x_i} v \, dx \qquad \forall v \in C_0^\infty(\Omega),
\]
where $\mathcal{M}(\Omega)$ denotes the linear space of regular Borel measures, e.g. [42, Chapter 2]. The BV seminorm (also called TV seminorm) is given by
\[
|u|_{\mathrm{BV}(\Omega)} := \sup_{v \in C_0^1(\Omega)^N,\ \| |v| \|_\infty \le 1} \int_\Omega u \operatorname{div} v \, dx.
\]
We endow $\mathrm{BV}(\Omega)$ with the norm $\|\cdot\|_{\mathrm{BV}(\Omega)} := \|\cdot\|_{L^1(\Omega)} + |\cdot|_{\mathrm{BV}(\Omega)}$ and recall from [5, Thm. 10.1.1] that this makes $\mathrm{BV}(\Omega)$ a Banach space. Obviously, we have the inclusion $W^{1,1}(\Omega) \subset \mathrm{BV}(\Omega)$.
Moreover, $\mathrm{BV}(\Omega)$ embeds continuously (compactly) into $L^r(\Omega)$ for $r \in [1, \frac{N}{N-1}]$ ($r \in [1, \frac{N}{N-1})$), see, e.g., [4, Cor. 3.49 and Prop. 3.21]. We use the convention that $\frac{N}{N-1} = \infty$ for $N = 1$. Also important is strict convergence, e.g. [4,5].

Definition 1
For $r \in [1, \frac{N}{N-1}]$ the metric $d_{\mathrm{BV},r}$ is given by
\[
d_{\mathrm{BV},r}: \mathrm{BV}(\Omega) \times \mathrm{BV}(\Omega) \to \mathbb{R}, \qquad (u,v) \mapsto \|u - v\|_{L^r(\Omega)} + \bigl| |u|_{\mathrm{BV}(\Omega)} - |v|_{\mathrm{BV}(\Omega)} \bigr|.
\]
Convergence with respect to $d_{\mathrm{BV},1}$ is called strict convergence.

Remark 1
The embedding $\mathrm{BV}(\Omega) \hookrightarrow L^r(\Omega)$, for $r \in [1, \frac{N}{N-1}]$, implies that $d_{\mathrm{BV},r}$ is well-defined and continuous with respect to $\|\cdot\|_{\mathrm{BV}(\Omega)}$.

We will also use the following density property.

Lemma 1
$C^\infty(\bar\Omega)$ is dense in $(\mathrm{BV}(\Omega) \cap L^r(\Omega), d_{\mathrm{BV},r})$ for $r \in [1, \frac{N}{N-1}]$.

Proof
By straightforward modifications the proof for the special case $r = 1$, [5, Thm. 10.1.2], can be extended, using that the sequence of mollifiers constructed in the proof converges in $L^r$, see [5, Prop. 2.2.4]. □

We smooth the BV seminorm by means of the functional $\psi_\delta: \mathrm{BV}(\Omega) \to [0,\infty)$, $\delta \ge 0$, given by
\[
\psi_\delta(u) := \sup \Bigl\{ \int_\Omega u \operatorname{div} v + \sqrt{\delta \, (1 - |v|^2)} \, dx \; : \; v \in C_0^1(\Omega)^N,\ \| |v| \|_{L^\infty(\Omega)} \le 1 \Bigr\}.
\]
It has the following properties.

Lemma 2
The following statements are true for all $\delta \ge 0$.
1. For any $u \in \mathrm{BV}(\Omega)$ there holds $|u|_{\mathrm{BV}(\Omega)} = \psi_0(u) \le \psi_\delta(u) \le |u|_{\mathrm{BV}(\Omega)} + \sqrt{\delta}\, |\Omega|$.
2. $\psi_\delta$ is lower semi-continuous with respect to the $L^1(\Omega)$-topology.
3. $\psi_\delta$ is convex.
4. For all $u \in W^{1,1}(\Omega)$ we have $\psi_\delta(u) = \int_\Omega \sqrt{\delta + |\nabla u|^2} \, dx$.
5. The function $\psi_\delta|_{H^1(\Omega)}$ is Lipschitz with respect to $\|\cdot\|_{H^1(\Omega)}$.

Proof
The first four statements are from [1, Section 2] and the last one follows from $H^1(\Omega) \hookrightarrow W^{1,1}(\Omega)$, statement 4, and the Lipschitz continuity of $r \mapsto \sqrt{\delta + r^2}$. □

Remark 2
The smoothing function $\psi_\delta$ for the TV seminorm is frequently used in imaging problems, e.g. [1,20].

1.3 The index s

For the remainder of this work we fix a number $s = s(N) \in (1, \frac{N}{N-1})$ with $\mathrm{BV}(\Omega) \hookrightarrow\hookrightarrow L^s(\Omega) \hookrightarrow H^{-1}(\Omega)$.

Remark 3
Consider, for instance, $N = 2$ and any $r \in (1,2)$. Then $\mathrm{BV}(\Omega) \hookrightarrow\hookrightarrow L^r(\Omega)$ and $H_0^1(\Omega) \hookrightarrow L^{\frac{r}{r-1}}(\Omega)$, so that any $s \in (1,2)$ can be used.

1.4 The solution operator of the state equation

Lemma 3
For every $u \in H^{-1}(\Omega)$ the operator equation $Ay = u$ in (OC) has a unique solution $y = y(u) \in Y$. The solution operator
\[
S: H^{-1}(\Omega) \to Y, \qquad u \mapsto y(u)
\]
is linear, continuous, and bijective. In particular, $S$ is $L^s$-$L^2$ continuous. Moreover, for given $q \in (1,\infty)$ there is a constant $C > 0$ such that
\[
\|Su\|_{W^{2,q}(\Omega)} \le C \|u\|_{L^q(\Omega)}
\]
is satisfied for all $u \in L^q(\Omega)$.

Proof
Except for the estimate all statements follow from the Lax-Milgram theorem. The estimate is a consequence of [26, Lemma 2.4.2.1, Theorem 2.4.2.5]. □
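The properties of $S$ collected in Lemma 3 can be illustrated on a toy discretization. The following is a minimal sketch, assuming the one-dimensional model case $\mathcal{A} y = -y''$ on $(0,1)$ with a constant coefficient and a standard finite-difference grid; the names `build_operator` and `apply_S` are hypothetical, and this is not the Finite Element discretization used later in the paper.

```python
import numpy as np

def build_operator(n, c=0.0):
    """Finite-difference matrix for -y'' + c*y on (0, 1) with y(0) = y(1) = 0.

    n is the number of interior grid points and c >= 0 a constant
    zeroth-order coefficient (both are illustrative assumptions).
    """
    h = 1.0 / (n + 1)
    main = np.full(n, 2.0 / h**2 + c)
    off = np.full(n - 1, -1.0 / h**2)
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

def apply_S(A, u):
    """Discrete solution operator: solve A y = u for the state y."""
    return np.linalg.solve(A, u)

n = 99
h = 1.0 / (n + 1)
x = np.linspace(0.0, 1.0, n + 2)[1:-1]  # interior grid points
A = build_operator(n)

# Manufactured solution: -y'' = pi^2 sin(pi x) is solved by y = sin(pi x).
u = np.pi**2 * np.sin(np.pi * x)
y = apply_S(A, u)
max_error = np.max(np.abs(y - np.sin(np.pi * x)))

# Discrete analogue of S* = S: (S u, v) = (u, S v) in the grid L^2 inner product.
v = x * (1.0 - x)
lhs = h * np.dot(apply_S(A, u), v)
rhs = h * np.dot(u, apply_S(A, v))
print(max_error, abs(lhs - rhs))
```

Because the matrix is symmetric, the discrete solution operator is self-adjoint, which mirrors the identity $S^* = S$ used below for the adjoint state.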

Remark 4
From $\mathrm{BV}(\Omega) \hookrightarrow L^s(\Omega) \hookrightarrow H^{-1}(\Omega)$ and Lemma 3 we obtain that (OC) has a nonempty feasible set.

In this section we prove existence of solutions for (OC) and the associated regularized problems, characterize the solutions by optimality conditions, and show their convergence in appropriate function spaces.

2.1 The original problem: Existence of solutions

To establish the existence of a solution for (OC) we use the reduced problem
\[
\min_{u \in \mathrm{BV}(\Omega)} \; \underbrace{\tfrac12 \|Su - y_\Omega\|_{L^2(\Omega)}^2 + \beta |u|_{\mathrm{BV}(\Omega)}}_{=: j(u)}. \tag{ROC}
\]

Lemma 4
The function $j: \mathrm{BV}(\Omega) \to \mathbb{R}$ is well-defined, strictly convex, and continuous with respect to $d_{\mathrm{BV},s}$.

Proof
The term $\tfrac12 \|Su - y_\Omega\|_{L^2(\Omega)}^2$ is well-defined by Remark 4 and strictly convex in $u$ due to the injectivity of $S$. Since $|\cdot|_{\mathrm{BV}(\Omega)}$ is convex, the strict convexity of $j$ follows. The continuity holds because $S$ is $L^s$-$L^2$ continuous. □

The strict convexity implies that $j$ has at most one (local=global) minimizer.

Theorem 1
The problem (ROC) has a unique solution $\bar u \in \mathrm{BV}(\Omega)$.

Proof
The proof is included in the proof of Theorem 2. □

As usual, the optimal state $\bar y$ and the optimal adjoint state $\bar p$ are given by
\[
\bar y := S \bar u \in Y \cap W^{2,r_N}(\Omega) \qquad \text{and} \qquad \bar p := S^*(\bar y - y_\Omega),
\]
where, due to $\mathrm{BV}(\Omega) \hookrightarrow L^{\frac{N}{N-1}}(\Omega)$ and Lemma 3, we have $r_N = \frac{N}{N-1}$ for $N \in \{2,3\}$, respectively, $r_N \ge 1$ arbitrary for $N = 1$. Moreover, $S^*$ is the adjoint operator of $S$ with respect to the $L^2$ inner product. Since $S^* = S$ and $\bar y - y_\Omega \in L^2(\Omega)$, Lemma 3 yields $\bar p \in P$ for
\[
P := H^2(\Omega) \cap H_0^1(\Omega).
\]
It is standard to show that $\bar p$ is the unique weak solution of
\[
\begin{cases} \mathcal{A} p + c_0 p = \bar y - y_\Omega & \text{in } \Omega, \\ p = 0 & \text{on } \partial\Omega. \end{cases}
\]
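As a brief reminder of why the adjoint state takes this form, the following standard computation differentiates the tracking term of $j$. It is a sketch that is not quoted from the text above; the name $g$ and the restriction to directions $v \in L^2(\Omega)$ are assumptions made here for illustration.

```latex
\[
g(u) := \tfrac12 \|Su - y_\Omega\|_{L^2(\Omega)}^2,
\qquad
g'(u)v = (Su - y_\Omega,\, Sv)_{L^2(\Omega)}
       = \bigl(S^*(Su - y_\Omega),\, v\bigr)_{L^2(\Omega)}
\quad \text{for } v \in L^2(\Omega).
\]
\[
\text{At } u = \bar u: \qquad g'(\bar u)v = \bigl(S^*(\bar y - y_\Omega),\, v\bigr)_{L^2(\Omega)} = (\bar p, v)_{L^2(\Omega)}.
\]
```

In other words, $\bar p$ is the $L^2$ gradient of the smooth part of $j$ at $\bar u$, which is the role the adjoint state plays in the optimality conditions.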

Replacing $|\cdot|_{\mathrm{BV}(\Omega)}$ by $\psi_\delta$ and adding an $H^1$ regularization to $j$ yields
\[
\min_{u \in \mathrm{BV}(\Omega)} \; \underbrace{\tfrac12 \|Su - y_\Omega\|_{L^2(\Omega)}^2 + \beta \psi_\delta(u) + \tfrac{\gamma}{2} \|u\|_{H^1(\Omega)}^2}_{=: j_{\gamma,\delta}(u)}, \tag{ROC$_{\gamma,\delta}$}
\]
where we set $j_{\gamma,\delta}(u) := +\infty$ for $u \in \mathrm{BV}(\Omega) \setminus H^1(\Omega)$ if $\gamma > 0$.

Lemma 5
For any $\gamma, \delta \ge 0$ the function $j_{\gamma,\delta}: \mathrm{BV}(\Omega) \to \mathbb{R} \cup \{+\infty\}$ is well-defined and strictly convex, and the function $j_{\gamma,\delta}|_{H^1(\Omega)}$ is $H^1$ continuous.

Proof
The well-definition and strict convexity of $j_{\gamma,\delta}$ follow similarly as for $j$ in Lemma 4. The continuity follows term by term. For the first term it is enough to recall from Lemma 3 the $L^2$-$L^2$ continuity of $S$. The second term is Lipschitz in $H^1$ by Lemma 2. The continuity of the third term is clear. □

To prove existence of solutions for (ROC$_{\gamma,\delta}$) we use an auxiliary result.

Lemma 6
Let $(u_k)_{k \in \mathbb{N}} \subset \mathrm{BV}(\Omega)$ be such that $(j(u_k))_{k \in \mathbb{N}}$ is bounded. Then $(\|u_k\|_{\mathrm{BV}(\Omega)})_{k \in \mathbb{N}}$ is bounded.

Proof
We denote by $C > 0$ a generic constant. The sequence $(|u_k|_{\mathrm{BV}(\Omega)})_{k \in \mathbb{N}}$ is bounded because for each $k \in \mathbb{N}$ we have
\[
|u_k|_{\mathrm{BV}(\Omega)} \le \frac{j(u_k)}{\beta} \le C.
\]
The Poincaré inequality holds in $\mathrm{BV}(\Omega)$, see [40, Theorem 4.10], hence
\[
\|u_k - \hat u_k\|_{L^s(\Omega)} \le C |u_k|_{\mathrm{BV}(\Omega)} \le C \qquad \forall k \in \mathbb{N}, \tag{1}
\]
where $\hat u_k := |\Omega|^{-1} \int_\Omega u_k \, dx$ denotes the integral mean of $u_k$. From
\[
\|Su_k\|_{L^2(\Omega)} - \|y_\Omega\|_{L^2(\Omega)} \le \|Su_k - y_\Omega\|_{L^2(\Omega)} \le \sqrt{2 j(u_k)} \le C,
\]
it follows that $(\|Su_k\|_{L^2(\Omega)})_{k \in \mathbb{N}}$ is bounded. Together with the $L^s$-$L^2$ continuity of $S$ and (1) this gives
\[
|\hat u_k| \, \|S1\|_{L^2(\Omega)} = \|S \hat u_k\|_{L^2(\Omega)} \le \|Su_k - S\hat u_k\|_{L^2(\Omega)} + \|Su_k\|_{L^2(\Omega)} \le C.
\]
The injectivity of $S$ yields $S1 \ne 0$, so $(|\hat u_k|)_{k \in \mathbb{N}}$ is bounded, which implies boundedness of $(\|u_k\|_{L^s(\Omega)})$ by (1) and thus also of $(\|u_k\|_{L^1(\Omega)})$. □

Theorem 2
For any $\gamma, \delta \ge 0$, (ROC$_{\gamma,\delta}$) has a unique solution $\bar u_{\gamma,\delta} \in \mathrm{BV}(\Omega)$. For $\gamma > 0$ we have $\bar u_{\gamma,\delta} \in H^1(\Omega)$.

Proof
For $\gamma > 0$ the unique existence of $\bar u_{\gamma,\delta} \in H^1(\Omega)$ follows from standard arguments since $j_{\gamma,\delta}|_{H^1(\Omega)}$ is strongly convex and $H^1$ continuous by Lemma 5. It remains to argue for $\gamma = 0$. Let $\delta \ge 0$. There is a sequence $(u_k)_{k \in \mathbb{N}} \subset \mathrm{BV}(\Omega)$ such that
\[
\lim_{k \to \infty} j_{\gamma,\delta}(u_k) = \inf_{u \in \mathrm{BV}(\Omega)} j_{\gamma,\delta}(u).
\]
Moreover, there exists $C \in \mathbb{R}$ such that for all $k \in \mathbb{N}$
\[
j(u_k) = \tfrac12 \|Su_k - y_\Omega\|_{L^2(\Omega)}^2 + \beta |u_k|_{\mathrm{BV}(\Omega)} \le \tfrac12 \|Su_k - y_\Omega\|_{L^2(\Omega)}^2 + \beta \psi_\delta(u_k) \le j_{\gamma,\delta}(u_k) \le C,
\]
where we used $|u_k|_{\mathrm{BV}(\Omega)} \le \psi_\delta(u_k)$ from Lemma 2. By Lemma 6 we have that $(\|u_k\|_{\mathrm{BV}(\Omega)})_{k \in \mathbb{N}}$ is bounded. Since $\mathrm{BV}(\Omega)$ is compactly embedded in $L^s(\Omega)$, there is a subsequence of $(u_k)_{k \in \mathbb{N}}$, denoted the same way, such that $\lim_{k \to \infty} \|u_k - \bar u_{\gamma,\delta}\|_{L^s(\Omega)} = 0$ for some $\bar u_{\gamma,\delta} \in L^s(\Omega)$. As $S$ is $L^s$-$L^2$ continuous and $\psi_\delta$ is $L^1$ lower semi-continuous by Lemma 2, we obtain
\[
j_{\gamma,\delta}(\bar u_{\gamma,\delta}) \le \liminf_{k \to \infty} j_{\gamma,\delta}(u_k) = \inf_{u \in \mathrm{BV}(\Omega)} j_{\gamma,\delta}(u).
\]
This implies $|\bar u_{\gamma,\delta}|_{\mathrm{BV}(\Omega)} \le \psi_\delta(\bar u_{\gamma,\delta}) \le j_{\gamma,\delta}(\bar u_{\gamma,\delta})/\beta < \infty$, so $\bar u_{\gamma,\delta} \in \mathrm{BV}(\Omega)$ is a minimizer of (ROC$_{\gamma,\delta}$). As $j_{\gamma,\delta}$ is strictly convex, the minimizer is unique. □

Optimal state $\bar y_{\gamma,\delta}$ and optimal adjoint state $\bar p_{\gamma,\delta}$ for (ROC$_{\gamma,\delta}$) are given by
\[
\bar y_{\gamma,\delta} := S \bar u_{\gamma,\delta} \in Y \cap W^{2,r_N}(\Omega) \qquad \text{and} \qquad \bar p_{\gamma,\delta} := S^*\bigl(\bar y_{\gamma,\delta} - y_\Omega\bigr) \in P,
\]
where $r_N = \frac{N}{N-1}$ for $N \in \{2,3\}$, respectively, $r_N \ge 1$ arbitrary for $N = 1$. In particular, $\bar p_{\gamma,\delta}$ is the unique weak solution of
\[
\begin{cases} \mathcal{A} p + c_0 p = \bar y_{\gamma,\delta} - y_\Omega & \text{in } \Omega, \\ p = 0 & \text{on } \partial\Omega. \end{cases}
\]

The functional $j_{\gamma,\delta}$ has the following differentiability properties.

Lemma 7
For $\gamma, \delta > 0$ the functional $j_{\gamma,\delta}: H^1(\Omega) \to \mathbb{R}$ is Lipschitz continuously Fréchet differentiable and twice Gâteaux differentiable. Its first derivative is
\[
j_{\gamma,\delta}'(u)v = \bigl(S^*(Su - y_\Omega), v\bigr)_{L^2(\Omega)} + \beta \psi_\delta'(u)v + \gamma (u,v)_{H^1(\Omega)} \qquad \forall v \in H^1(\Omega),
\]
where
\[
\psi_\delta'(u)v = \int_\Omega \frac{(\nabla u, \nabla v)}{\sqrt{\delta + |\nabla u|^2}} \, dx \qquad \forall v \in H^1(\Omega).
\]

Proof
It suffices to establish the claim for $\psi_\delta$, which is done in Lemma 17. □

For differentiable convex functions a vanishing derivative is both necessary and sufficient for a global minimizer. This yields the following result.

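Theorem 3
For $\gamma, \delta > 0$ the control $\bar u_{\gamma,\delta} \in H^1(\Omega)$ is the solution of (ROC$_{\gamma,\delta}$) iff
\[
j_{\gamma,\delta}'(\bar u_{\gamma,\delta}) v = 0 \qquad \forall v \in H^1(\Omega),
\]
which is the nonlinear Neumann problem
\[
\gamma (\bar u_{\gamma,\delta}, v)_{H^1(\Omega)} + \beta \int_\Omega \frac{(\nabla \bar u_{\gamma,\delta}, \nabla v)}{\sqrt{\delta + |\nabla \bar u_{\gamma,\delta}|^2}} \, dx = -(\bar p_{\gamma,\delta}, v)_{L^2(\Omega)} \qquad \forall v \in H^1(\Omega). \tag{2}
\]

2.5 Convergence of the path of solutions

We prove that $(\bar u_{\gamma,\delta}, \bar y_{\gamma,\delta}, \bar p_{\gamma,\delta})$ converges to $(\bar u, \bar y, \bar p)$ for $\gamma, \delta \to 0$. As a first step we show convergence of the objective values.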

Lemma 8
We have
\[
j_{\gamma,\delta}(\bar u_{\gamma,\delta}) \longrightarrow j(\bar u) \qquad \text{as } \mathbb{R}^2_{\ge 0} \ni (\gamma,\delta) \to (0,0).
\]

Proof
Let $\epsilon > 0$ and let $((\gamma_k, \delta_k))_{k \in \mathbb{N}} \subset \mathbb{R}^2_{\ge 0}$ converge to $(0,0)$. We have
\[
0 \le j_{\gamma_k,\delta_k}(\bar u_{\gamma_k,\delta_k}) - j(\bar u) = \bigl[ j_{\gamma_k,\delta_k}(\bar u_{\gamma_k,\delta_k}) - j_{\gamma_k,0}(\bar u_{\gamma_k,0}) \bigr] + \bigl[ j_{\gamma_k,0}(\bar u_{\gamma_k,0}) - j(\bar u) \bigr],
\]
where we used $j(\bar u) \le j(\bar u_{\gamma_k,\delta_k}) \le j_{\gamma_k,\delta_k}(\bar u_{\gamma_k,\delta_k})$. The first term in brackets satisfies
\[
j_{\gamma_k,\delta_k}(\bar u_{\gamma_k,\delta_k}) - j_{\gamma_k,0}(\bar u_{\gamma_k,0}) \le j_{\gamma_k,\delta_k}(\bar u_{\gamma_k,0}) - j_{\gamma_k,0}(\bar u_{\gamma_k,0}) = \beta \psi_{\delta_k}(\bar u_{\gamma_k,0}) - \beta |\bar u_{\gamma_k,0}|_{\mathrm{BV}(\Omega)} \le \beta \sqrt{\delta_k}\, |\Omega|,
\]
where the last inequality follows from Lemma 2. For the second term in brackets we deduce from Lemma 1 and the $d_{\mathrm{BV},s}$ continuity of $j$ established in Lemma 4 that there is $u_\epsilon \in C^\infty(\bar\Omega)$ such that $|j(\bar u) - j(u_\epsilon)| < \epsilon$. This yields
\[
j_{\gamma_k,0}(\bar u_{\gamma_k,0}) - j(\bar u) \le j_{\gamma_k,0}(u_\epsilon) - j(\bar u) = j(u_\epsilon) + \tfrac{\gamma_k}{2} \|u_\epsilon\|_{H^1(\Omega)}^2 - j(\bar u) \le \epsilon + \tfrac{\gamma_k}{2} \|u_\epsilon\|_{H^1(\Omega)}^2.
\]
Putting the estimates for the two terms together shows
\[
|j_{\gamma_k,\delta_k}(\bar u_{\gamma_k,\delta_k}) - j(\bar u)| \le \beta \sqrt{\delta_k}\, |\Omega| + \epsilon + \tfrac{\gamma_k}{2} \|u_\epsilon\|_{H^1(\Omega)}^2.
\]
For $k \to \infty$ this implies the claim since
\[
0 \le \liminf_{k \to \infty} |j_{\gamma_k,\delta_k}(\bar u_{\gamma_k,\delta_k}) - j(\bar u)| \le \limsup_{k \to \infty} |j_{\gamma_k,\delta_k}(\bar u_{\gamma_k,\delta_k}) - j(\bar u)| \le \epsilon. \qquad □
\]

We infer that the optimal controls $\bar u_{\gamma,\delta}$ converge to $\bar u$ in $L^r$ for suitable $r$.

Lemma 9
For any $r \in [1, \frac{N}{N-1})$ we have $\|\bar u_{\gamma,\delta} - \bar u\|_{L^r(\Omega)} \to 0$ as $(\gamma,\delta) \to (0,0)$.

Proof
Let $((\gamma_k, \delta_k))_{k \in \mathbb{N}} \subset \mathbb{R}^2_{\ge 0}$ converge to $(0,0)$ and let $C > 0$ be so large that $\gamma_k, \delta_k \le C$ for all $k$. The optimality of $\bar u_{\gamma_k,\delta_k}$ and Lemma 2 yield for each $k \in \mathbb{N}$
\[
j(\bar u_{\gamma_k,\delta_k}) \le j_{\gamma_k,\delta_k}(\bar u_{\gamma_k,\delta_k}) \le j_{\gamma_k,\delta_k}(0) \le j_{C,C}(0).
\]
Lemma 6 and the compact embedding of $\mathrm{BV}(\Omega)$ into $L^r(\Omega)$, $r \in [s, \frac{N}{N-1})$, imply that there exists $\tilde u \in L^r(\Omega)$ such that a subsequence of $(\bar u_{\gamma_k,\delta_k})_{k \in \mathbb{N}}$, denoted in the same way, converges to $\tilde u$ in $L^r(\Omega)$. It is therefore enough to show $\tilde u = \bar u$. Since $j$ is lower semi-continuous in the $L^s$ topology, we have
\[
j(\tilde u) \le \liminf_{k \to \infty} j(\bar u_{\gamma_k,\delta_k}) \le \liminf_{k \to \infty} j_{\gamma_k,\delta_k}(\bar u_{\gamma_k,\delta_k}) = j(\bar u),
\]
where we used Lemma 8 to obtain the last equality. This shows $\tilde u \in \mathrm{BV}(\Omega)$, hence Theorem 1 implies $\tilde u = \bar u$. □

In fact, the convergence of $\bar u_{\gamma,\delta}$ to $\bar u$ is stronger.

Theorem 4
For any $r \in [1, \frac{N}{N-1})$ we have $d_{\mathrm{BV},r}(\bar u_{\gamma,\delta}, \bar u) \to 0$ as $(\gamma,\delta) \to (0,0)$.

Proof
For any $\gamma, \delta \ge 0$ there holds $j(\bar u) \le j(\bar u_{\gamma,\delta}) \le j_{\gamma,\delta}(\bar u_{\gamma,\delta})$, so Lemma 8 yields $\lim_{(\gamma,\delta) \to (0,0)} j(\bar u_{\gamma,\delta}) = j(\bar u)$. Furthermore, there holds
\[
\beta \Bigl| |\bar u|_{\mathrm{BV}(\Omega)} - |\bar u_{\gamma,\delta}|_{\mathrm{BV}(\Omega)} \Bigr| \le \bigl| j(\bar u) - j(\bar u_{\gamma,\delta}) \bigr| + \tfrac12 \Bigl| \|S\bar u - y_\Omega\|_{L^2(\Omega)}^2 - \|S\bar u_{\gamma,\delta} - y_\Omega\|_{L^2(\Omega)}^2 \Bigr|.
\]
By Lemma 9 and the continuity of $S$ from $L^s(\Omega)$ to $L^2(\Omega)$ we thus find $|\bar u_{\gamma,\delta}|_{\mathrm{BV}(\Omega)} \to |\bar u|_{\mathrm{BV}(\Omega)}$ as $(\gamma,\delta) \to (0,0)$. Together with Lemma 9 this proves the claim. □
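The estimate $0 \le \psi_\delta(u) - |u|_{\mathrm{BV}(\Omega)} \le \sqrt{\delta}\,|\Omega|$ from Lemma 2, which controls the first bracket in the proof of Lemma 8, can be observed numerically. The sketch below is an illustration under stated assumptions: it uses the integral formula of Lemma 2 for a smooth sample function on the unit square, forward differences on a uniform grid, and hypothetical names (`smoothed_tv`, the test function `u`); it is not the discretization used in the paper.

```python
import numpy as np

def smoothed_tv(u, h, delta):
    """Riemann sum for the integral of sqrt(delta + |grad u|^2) over the unit square.

    u holds nodal values on a uniform (n+1) x (n+1) grid with spacing h;
    gradients are forward differences, one value per grid cell.
    """
    gx = (u[1:, :-1] - u[:-1, :-1]) / h
    gy = (u[:-1, 1:] - u[:-1, :-1]) / h
    return float(np.sum(np.sqrt(delta + gx**2 + gy**2)) * h**2)

n = 64
h = 1.0 / n
grid = np.arange(n + 1) * h
X, Y = np.meshgrid(grid, grid, indexing="ij")

# Sample function with a steep transition (an arbitrary illustrative choice).
u = np.tanh(20.0 * (X - 0.5)) + 0.5 * np.sin(2.0 * np.pi * Y)

area = 1.0                      # |Omega| for the unit square
tv = smoothed_tv(u, h, 0.0)     # delta = 0 gives the discrete TV seminorm
deltas = [1e-1, 1e-2, 1e-4, 1e-6]
gaps = [smoothed_tv(u, h, d) - tv for d in deltas]

for d, g in zip(deltas, gaps):
    print(f"delta={d:.0e}  gap={g:.3e}  bound={np.sqrt(d) * area:.3e}")
```

The printed gap stays between zero and $\sqrt{\delta}\,|\Omega|$ and shrinks as $\delta$ decreases, which is the mechanism behind the term $\beta\sqrt{\delta_k}\,|\Omega|$ in Lemma 8.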

We conclude this section with the convergence of $(\bar y_{\gamma,\delta}, \bar p_{\gamma,\delta})$ to $(\bar y, \bar p)$.

Theorem 5
For any $r \in [1, \frac{N}{N-1})$ and any $r' \in [1,\infty)$ we have
\[
\lim_{(\gamma,\delta) \to (0,0)} \|\bar y_{\gamma,\delta} - \bar y\|_{W^{2,r}(\Omega)} = 0 \qquad \text{and} \qquad \lim_{(\gamma,\delta) \to (0,0)} \|\bar p_{\gamma,\delta} - \bar p\|_{W^{2,r'}(\Omega)} = 0.
\]

Proof
The continuity of $S$ from $L^q$ to $W^{2,q}$ for any $q \in (1,\infty)$, see Lemma 3, implies with Lemma 9 that $\lim_{(\gamma,\delta) \to (0,0)} \|\bar y_{\gamma,\delta} - \bar y\|_{W^{2,r}(\Omega)} = 0$ for any $r \in [1, \frac{N}{N-1})$. Since for any $r' \in (1,\infty)$ there is $r \in [1, \frac{N}{N-1})$ such that $W^{2,r}(\Omega) \hookrightarrow L^{r'}(\Omega)$ is satisfied, we can use the $L^{r'}$-$W^{2,r'}$ continuity of $S^* = S$ to find $\lim_{(\gamma,\delta) \to (0,0)} \|\bar p_{\gamma,\delta} - \bar p\|_{W^{2,r'}(\Omega)} = \lim_{(\gamma,\delta) \to (0,0)} \|S^*(\bar y_{\gamma,\delta} - \bar y)\|_{W^{2,r'}(\Omega)} = 0$. □

Remark 5
The results of section 2 can also be established for nonsmooth domains $\Omega$, but $\bar y, \bar p, \bar y_{\gamma,\delta}, \bar p_{\gamma,\delta}$ may be less regular since $S$ may not provide the regularity stated in Lemma 3. A careful inspection reveals that only Theorem 5 has to be modified. If, for instance, $\Omega \subset \mathbb{R}^N$, $N \in \{2,3\}$, is a bounded Lipschitz domain, then [43, Theorem 3] implies that Theorem 5 holds if $W^{2,r}$ and $W^{2,r'}$ are both replaced by $H^r$, where $r \in [1, \frac32)$ is arbitrary. If $\Omega$ is convex, then [26, Theorem 3.2.1.2] further yields that $W^{2,r'}$ can be replaced by $H^2$.

The main goal of this section is to show that the PDE
\[
\begin{cases}
-\operatorname{div}\Bigl( \Bigl[ \gamma + \dfrac{\beta}{\sqrt{\delta + |\nabla u|^2}} \Bigr] \nabla u \Bigr) + \gamma u = p & \text{in } \Omega, \\[2ex]
\Bigl( \Bigl[ \gamma + \dfrac{\beta}{\sqrt{\delta + |\nabla u|^2}} \Bigr] \nabla u, \nu \Bigr) = 0 & \text{on } \partial\Omega
\end{cases} \tag{3}
\]
has a unique weak solution $u = u(p) \in C^{1,\alpha}(\Omega)$ for every right-hand side $p \in L^\infty(\Omega)$, and that $p \mapsto u(p)$ is Lipschitz continuously Fréchet differentiable in any open ball, having a Lipschitz constant that is independent of $\gamma$ and $\delta$, provided $\gamma > 0$ and $\delta > 0$ are bounded away from zero and bounded from above. We suppress the dependency on $\gamma, \delta$ in $u = u(p; \gamma, \delta)$.

Assumption 6
We are given constants $0 < \gamma_0 \le \gamma^0$, $0 < \delta_0 \le \delta^0$ and $b^0 > 0$. We denote $I := [\gamma_0, \gamma^0] \times [\delta_0, \delta^0]$ and write $B \subset L^\infty(\Omega)$ for the open ball of radius $b^0 > 0$ centered at the origin in $L^\infty(\Omega)$.

Let us first establish well-definition of $p \mapsto u(p)$ and a Lipschitz estimate.

Lemma 10

Let Assumption 6 hold. Then there exist $L > 0$ and $\alpha \in (0,1)$ such that for each $(\gamma,\delta) \in I$ and all $p_1, p_2 \in B$ the PDE (3) has unique weak solutions $u_1 = u(p_1) \in C^{1,\alpha}(\Omega)$ and $u_2 = u(p_2) \in C^{1,\alpha}(\Omega)$ that satisfy
\[
\|u_1 - u_2\|_{C^{1,\alpha}(\Omega)} \le L \|p_1 - p_2\|_{L^\infty(\Omega)}.
\]
In particular, we have the stability estimate
\[
\|u_1\|_{C^{1,\alpha}(\Omega)} \le L \|p_1\|_{L^\infty(\Omega)}.
\]

Proof
Unique existence and the first estimate are established in Theorem 13 in the appendix. The second estimate follows from the first for $p_2 = 0$ because $u(0) = 0$. □

We introduce
\[
f: \mathbb{R}^N \to \mathbb{R}^N, \qquad f(v) := \beta \frac{v}{\sqrt{\delta + |v|^2}},
\]
so that (3) reads
\[
-\operatorname{div}\bigl( \gamma \nabla u + f(\nabla u) \bigr) + \gamma u = p \qquad \text{in } H^1(\Omega)^*. \tag{4}
\]
We now show that the adjoint-to-control mapping is differentiable.

Lemma 11

Let Assumption 6 hold and let $\alpha \in (0,1)$ be the constant from Lemma 10. For each $(\gamma,\delta) \in I$ the mapping $B \ni p \mapsto u(p) \in C^{1,\alpha}(\Omega)$ is Fréchet differentiable. Its derivative $z = u'(p)d \in C^{1,\alpha}(\Omega)$ in direction $d \in L^\infty(\Omega)$ is the unique weak solution of the linear PDE
\[
\begin{cases}
-\operatorname{div}\Bigl( \bigl[ \gamma I + f'\bigl(\nabla u(p)\bigr) \bigr] \nabla z \Bigr) + \gamma z = d & \text{in } \Omega, \\[1ex]
\Bigl( \bigl[ \gamma I + f'\bigl(\nabla u(p)\bigr) \bigr] \nabla z, \nu \Bigr) = 0 & \text{on } \partial\Omega,
\end{cases} \tag{5}
\]
and there exists $C > 0$ such that for all $(\gamma,\delta) \in I$, all $p \in B$, and all $d \in L^\infty(\Omega)$ we have
\[
\|z\|_{C^{1,\alpha}(\Omega)} \le C \|d\|_{L^\infty(\Omega)}.
\]

Proof
Let $p \in B$ and $d \in L^\infty(\Omega)$ be such that $p + d \in B$. From Lemma 10 we obtain $u(p) \in C^{1,\alpha}(\Omega)$ and $\|u(p)\|_{C^{1,\alpha}(\Omega)} \le C \|p\|_{L^\infty(\Omega)}$, where $C$ is independent of $\gamma, \delta, p$. Combining this with Lemma 18 implies
\[
f'(\nabla u(p)) = \beta \Bigl[ \frac{I}{\sqrt{\delta + |\nabla u(p)|^2}} - \frac{\nabla u(p) \nabla u(p)^T}{\bigl( \delta + |\nabla u(p)|^2 \bigr)^{3/2}} \Bigr] \in C^{0,\alpha}(\Omega, \mathbb{R}^{N \times N}) \tag{6}
\]
and the estimate $\|A\|_{C^{0,\alpha}(\Omega)} \le a$ for $A := \gamma I + f'(\nabla u(p))$ with a constant $a$ that does not depend on $\gamma, \delta, p$. This shows that Theorem 12 is applicable. Thus, it follows that the PDE (5) has a unique weak solution $z \in C^{1,\alpha}(\Omega)$ that satisfies the claimed estimate. Concerning the Fréchet differentiability we obtain for $r := u(p+d) - u(p) - z \in C^{1,\alpha}(\Omega)$
\[
\begin{aligned}
-\operatorname{div}\Bigl( \bigl[ \gamma I + f'\bigl(\nabla u(p)\bigr) \bigr] \nabla r \Bigr) + \gamma r
&= -\operatorname{div}\bigl( \gamma \nabla u(p+d) \bigr) + \gamma u(p+d) + \operatorname{div}\bigl( \gamma \nabla u(p) \bigr) - \gamma u(p) \\
&\quad + \operatorname{div}\Bigl( \bigl[ \gamma I + f'\bigl(\nabla u(p)\bigr) \bigr] \nabla z \Bigr) - \gamma z - \operatorname{div}\Bigl( f'\bigl(\nabla u(p)\bigr) w \Bigr) \\
&= \operatorname{div}\Bigl( f\bigl(\nabla u(p+d)\bigr) - f\bigl(\nabla u(p)\bigr) - f'\bigl(\nabla u(p)\bigr) w \Bigr),
\end{aligned}
\]
where we set $w := w(p,d) := \nabla u(p+d) - \nabla u(p)$. Theorem 12 implies that there is $C > 0$, independent of $d$, such that
\[
\|r\|_{C^{1,\alpha}(\Omega)} \le C \Bigl\| f\bigl(\nabla u(p+d)\bigr) - f\bigl(\nabla u(p)\bigr) - f'\bigl(\nabla u(p)\bigr) w \Bigr\|_{C^{0,\alpha}(\Omega)}.
\]
The expression in the norm on the right-hand side satisfies the following pointwise in $\Omega$:
\[
f\bigl(\nabla u(p+d)\bigr) - f\bigl(\nabla u(p)\bigr) - f'\bigl(\nabla u(p)\bigr) w = \Bigl( \int_0^1 f'\bigl(\nabla u(p) + t w\bigr) - f'\bigl(\nabla u(p)\bigr) \, dt \Bigr) w = \Bigl( \int_0^1 \int_0^1 f''\bigl(\nabla u(p) + \tau t w\bigr) \, d\tau \, t \, dt \Bigr) [w, w].
\]
Lemma 18 yields
\[
\|r\|_{C^{1,\alpha}(\Omega)} \le C \int_0^1 \int_0^1 \Bigl\| f''\bigl(\nabla u(p) + \tau t w\bigr) \Bigr\|_{C^{0,\alpha}(\Omega)} \, d\tau \, dt \; \bigl\| u(p+d) - u(p) \bigr\|_{C^{1,\alpha}(\Omega)}^2.
\]
As $f \in C^3(\mathbb{R}^N, \mathbb{R}^N)$ with bounded derivatives we have that $f''$ is Lipschitz continuous and bounded. We infer from Lemma 18 and Lemma 10 that
\[
\|r\|_{C^{1,\alpha}(\Omega)} \le C \|d\|_{L^\infty(\Omega)}^2,
\]
which shows $\|r\|_{C^{1,\alpha}(\Omega)} = o\bigl( \|d\|_{L^\infty(\Omega)} \bigr)$ since $C$ is independent of $d$. □

Theorem 7

Let Assumption 6 hold and let $\alpha \in (0,1)$ be the constant from Lemma 10. Then the mapping $u' : B \to \mathcal{L}(L^\infty(\Omega), C^{1,\alpha}(\Omega))$ is Lipschitz continuous and the Lipschitz constant does not depend on $(\gamma,\delta)$, but only on $\Omega$, $N$, $\gamma_0$, $\gamma^0$, $\delta_0$, $\delta^0$ and $b$.

Proof. Let $p, q \in B$ and $d \in L^\infty(\Omega)$. Set $z_p := \nabla\bigl(u'(p)d\bigr)$ and $z_q := \nabla\bigl(u'(q)d\bigr)$. Then
\[ -\operatorname{div}\Bigl(\gamma\,[z_p - z_q] + f'\bigl(\nabla u(p)\bigr) z_p - f'\bigl(\nabla u(q)\bigr) z_q\Bigr) + \gamma\,\bigl[u'(p)d - u'(q)d\bigr] = 0 \]
holds in $H^1(\Omega)^*$. Thus, the difference $r := u'(p)d - u'(q)d$ satisfies
\[ -\operatorname{div}(\gamma\nabla r) + \gamma r = \operatorname{div}\Bigl(f'\bigl(\nabla u(p)\bigr) z_p - f'\bigl(\nabla u(q)\bigr) z_q\Bigr) = \operatorname{div}\Bigl(f'\bigl(\nabla u(p)\bigr)\nabla r\Bigr) + \operatorname{div}\Bigl(\bigl[f'\bigl(\nabla u(p)\bigr) - f'\bigl(\nabla u(q)\bigr)\bigr] z_q\Bigr), \]
from which we infer that
\[ -\operatorname{div}\Bigl(\bigl[\gamma I + f'\bigl(\nabla u(p)\bigr)\bigr]\nabla r\Bigr) + \gamma r = \operatorname{div}\Bigl(\bigl[f'\bigl(\nabla u(p)\bigr) - f'\bigl(\nabla u(q)\bigr)\bigr] z_q\Bigr) \]
in $H^1(\Omega)^*$. By the same arguments as below (6), $A := \gamma I + f'\bigl(\nabla u(p)\bigr)$ satisfies $\|A\|_{C^{0,\alpha}(\Omega)} \le a$ with a constant $a$ that does not depend on $\gamma, \delta, p, q$. Moreover, $A$ is clearly elliptic with constant $\gamma$. By Theorem 12 this yields
\[ \|r\|_{C^{1,\alpha}(\Omega)} \le C\,\Bigl\|\bigl[f'\bigl(\nabla u(p)\bigr) - f'\bigl(\nabla u(q)\bigr)\bigr] z_q\Bigr\|_{C^{0,\alpha}(\Omega)}. \]
Here, $C > 0$ does not depend on $p, q$, but only on the desired quantities. From Lemma 18 and Lemma 11 we infer that
\[ \|r\|_{C^{1,\alpha}(\Omega)} \le C\,\bigl\|f'\bigl(\nabla u(p)\bigr) - f'\bigl(\nabla u(q)\bigr)\bigr\|_{C^{0,\alpha}(\Omega)}\,\|d\|_{L^\infty(\Omega)}. \]
Lemma 18 and Lemma 10 therefore imply
\[ \|u'(p) - u'(q)\|_{\mathcal{L}(L^\infty(\Omega), C^{1,\alpha}(\Omega))} \le C\,\Bigl\|\int_0^1 f''\bigl(\nabla u(q) + t\,[\nabla u(p) - \nabla u(q)]\bigr)\,\mathrm{d}t\Bigr\|_{C^{0,\alpha}(\Omega)}\,\|\nabla u(p) - \nabla u(q)\|_{C^{0,\alpha}(\Omega)} \le C \int_0^1 \Bigl\|f''\bigl(\nabla u(q) + t\,[\nabla u(p) - \nabla u(q)]\bigr)\Bigr\|_{C^{0,\alpha}(\Omega)}\,\mathrm{d}t\;\|p - q\|_{L^\infty(\Omega)}. \]
The first factor is bounded since $f''$ is bounded and Lipschitz. This demonstrates the asserted Lipschitz continuity. □

Remark 6

Theorem 7 stays valid if $\Omega$ is of class $C^{1,\alpha'}$ for some $\alpha' > 0$.

In this section we introduce the formulation of the optimality system of (ROC$_{\gamma,\delta}$) on which our numerical method is based, and we show that the application of an inexact Newton method to this formulation yields local convergence. We use the following assumption.

Assumption 8

We are given constants $0 < \gamma_0 \le \gamma^0$, $0 < \delta_0 \le \delta^0$ and $b \ge 0$. We denote $I := [\gamma_0, \gamma^0] \times [\delta_0, \delta^0]$ and fix $(\gamma,\delta) \in I$.

Introducing
\[ F : Y \times P \to Y^* \times L^2(\Omega), \qquad F(y,p) := \begin{pmatrix} Ay - u(-p) \\ y - y_\Omega - A^* p \end{pmatrix} \tag{7} \]
the optimality conditions from Theorem 3 are given by $F(\bar y_{\gamma,\delta}, \bar p_{\gamma,\delta}) = 0$, and the pair $(\bar y_{\gamma,\delta}, \bar p_{\gamma,\delta})$ is the unique root of $F$. We suppress the dependency of $u = u(p;\gamma,\delta)$ and $F = F(y,p;\gamma,\delta)$ on $\gamma, \delta$. By standard Sobolev embeddings we have $P \subset H^2(\Omega) \hookrightarrow L^\infty(\Omega)$, hence $u(-p) \in C^{1,\alpha}(\Omega)$ for some $\alpha > 0$, so that $F$ is well-defined. We mention the work [44], where a Newton system with a somewhat similar structure is considered.

The next two lemmas yield convergence of an inexact Newton method.

Lemma 12

Let Assumption 8 hold. Then $F$ defined in (7) is locally Lipschitz continuously Fréchet differentiable. Its derivative at $(y,p) \in Y \times P$ is given by
\[ F'(y,p) : Y \times P \to Y^* \times L^2(\Omega), \qquad (\delta y, \delta p) \mapsto \begin{pmatrix} A & u'(-p) \\ I & -A^* \end{pmatrix} \begin{pmatrix} \delta y \\ \delta p \end{pmatrix}. \]

Proof. Only $p \mapsto u(-p)$ is nonlinear, so the claims follow from Theorem 7. □

Lemma 13

Let Assumption 8 hold. Then $F'(y,p)$ is invertible for all $(y,p) \in Y \times P$.

Proof. The proof consists of two parts. First we show that $F'(y,p)$ is injective and second that it is a Fredholm operator of index 0, see [32, Chapter IV, Section 5]. These two facts imply the bijectivity of $F'(y,p)$.

For the injectivity let $(\delta y, \delta p) \in Y \times P$ with $F'(y,p)(\delta y, \delta p) = 0 \in Y^* \times L^2(\Omega)$, i.e.
\[ 0 = A\,\delta y + u'(-p)\,\delta p \in Y^* \quad\text{and}\quad 0 = \delta y - A^*\delta p \in L^2(\Omega), \tag{8} \]
and therefore
\[ \|\delta y\|_{L^2(\Omega)}^2 = (A^*\delta p, \delta y)_{L^2(\Omega)} = -\bigl(u'(-p)\,\delta p, \delta p\bigr)_{L^2(\Omega)}. \]
The representation of $z := u'(-p)\,\delta p$ from Lemma 11 yields
\[ -\|\delta y\|_{L^2(\Omega)}^2 = \Bigl(\bigl[\gamma I + f'\bigl(\nabla u(-p)\bigr)\bigr]\nabla z, \nabla z\Bigr)_{L^2(\Omega)} + \gamma\,(z,z)_{L^2(\Omega)} \ge \Bigl(f'\bigl(\nabla u(-p)\bigr)\nabla z, \nabla z\Bigr)_{L^2(\Omega)}. \tag{9} \]
Since $f'$ is positive semi-definite, we find $\|\delta y\|_{L^2(\Omega)}^2 \le 0$. This shows $\delta y = 0$. By (8) this yields $A^*\delta p = 0$ in $L^2(\Omega)$, hence $\delta p = 0$, which proves the injectivity.

To apply Fredholm theory we decompose $F'(y,p)$ into the two operators
\[ F'(y,p) = \begin{pmatrix} A & 0 \\ 0 & -A^* \end{pmatrix} + \begin{pmatrix} 0 & u'(-p) \\ I & 0 \end{pmatrix}. \]
We want to use [32, Chapter IV, Theorem 5.26], which states: If the first operator is a Fredholm operator of index 0 and the second operator is compact with respect to the first operator (see [32, Chapter IV, Introduction to Section 3]), then their sum $F'(y,p)$ is also a Fredholm operator of index 0. By the injectivity of $F'(y,p)$ this implies its bijectivity.

The operators $A : Y \to Y^*$ and $A^* : P \to L^2(\Omega)$ are invertible by Lemma 3, and thus
\[ Y \times P \to Y^* \times L^2(\Omega), \qquad (\delta y, \delta p) \mapsto \begin{pmatrix} A & 0 \\ 0 & -A^* \end{pmatrix} \begin{pmatrix} \delta y \\ \delta p \end{pmatrix} \]
is invertible and in particular a Fredholm operator of index 0. It remains to show that
\[ Y \times P \to Y^* \times L^2(\Omega), \qquad (\delta y, \delta p) \mapsto \begin{pmatrix} 0 & u'(-p) \\ I & 0 \end{pmatrix} \begin{pmatrix} \delta y \\ \delta p \end{pmatrix} \]
is compact with respect to the first operator. Thus, we have to establish that for any sequence $\bigl((\delta y_n, \delta p_n)\bigr)_{n\in\mathbb{N}} \subset Y \times P$ such that there exists a $C > 0$ with
\[ \bigl(\|\delta y_n\|_Y + \|\delta p_n\|_P\bigr) + \bigl(\|A\,\delta y_n\|_{Y^*} + \|A^*\delta p_n\|_{L^2(\Omega)}\bigr) \le C \quad \forall n \in \mathbb{N}, \tag{10} \]
the sequence $\bigl((u'(-p)\,\delta p_n, \delta y_n)\bigr)_{n\in\mathbb{N}} \subset Y^* \times L^2(\Omega)$ contains a convergent subsequence. By (10) we have that $(\|\delta y_n\|_Y)_{n\in\mathbb{N}}$ is bounded. The compact embedding $Y \hookrightarrow\hookrightarrow L^2(\Omega)$ therefore implies the existence of a point $\hat y \in L^2(\Omega)$ and a subsequence, denoted in the same way, such that $\|\delta y_n - \hat y\|_{L^2(\Omega)} \to 0$. We also have that $(\|\delta p_n\|_P)_{n\in\mathbb{N}}$ is bounded. In particular $\|\delta p_n\|_{L^\infty(\Omega)} \le b$ for all $n \in \mathbb{N}$ for some $b > 0$. By Lemma 11 this implies that $(u'(-p)\,\delta p_n)_{n\in\mathbb{N}}$ is bounded in $C^{1,\alpha}(\Omega)$. Since $C^{1,\alpha}(\Omega) \hookrightarrow\hookrightarrow Y^*$, the proof is complete. □

We consider the following inexact Newton method to find the root of $F$ given by (7). The norm that appears is that of $Y^* \times P^*$.

Algorithm 1:

An inexact Newton method for (ROC$_{\gamma,\delta}$)

Input: $(y_0, p_0) \in Y \times P$, $(\gamma,\delta) \in \mathbb{R}^2_{>0}$, $\eta \in [0,\infty)$
for $k = 0, 1, 2, \ldots$ do
    if $F(y_k,p_k) = 0$ then set $(y^*,p^*) := (y_k,p_k)$; stop
    Compute $(\delta y_k, \delta p_k)$ such that $\|F(y_k,p_k) + F'(y_k,p_k)(\delta y_k,\delta p_k)\| \le \eta_k \|F(y_k,p_k)\|$, where $\eta_k \in [0,\eta]$
    Set $(y_{k+1},p_{k+1}) = (y_k,p_k) + (\delta y_k,\delta p_k)$
end
Output: $(y^*,p^*)$

It is well-known that the properties established in Lemma 12 and Lemma 13 are sufficient for local linear/q-superlinear/q-quadratic convergence of the inexact Newton method if the residual in iteration $k$ is of appropriate order, e.g. [33, Theorem 6.1.4]. Thus, we obtain the following result.

Theorem 9

Let Assumption 8 hold. If ( y , p ) ∈ Y × P is suﬃciently close to (¯ y γ,δ , ¯ p γ,δ ) , then Algorithm 1 either terminates after ﬁnitely many iterations withoutput ( y ∗ , p ∗ ) = (¯ y γ,δ , ¯ p γ,δ ) or it generates a sequence ( y k , p k ) that convergesr-linearly [q-linearly/q-superlinearly/with q-order ω ] to (¯ y γ,δ , ¯ p γ,δ ) , provided η < [ η is suﬃciently small/ η k → / η k = O ( (cid:107) F ( y k , p k ) (cid:107) ω ) ]. Here, ω ∈ (0 , is arbitrary; for ω = 1 this means q-quadratic convergence.Remark 7 The same rates of convergence can also be established if an inexactNewton method is applied to standard formulations of the optimality systemof (ROC γ,δ ). We focus on the implicit formulation (7) since in our numericalexperiments this was the only approach that proved capable of suﬃcientlyreducing γ, δ for Ω ⊂ R . For instance, with a control-based formulation andfor a ﬁxed coupling δ = 0 . γ we could not reduce γ below roughly 10 − . Also,the algorithm was quite sensitive, e.g., a slight variation in the initial datacould lead to a much higher number of total iterations. Both observations arewell in line with our previous experience [13,21,27,28,30] on PDE-constrainedoptimal control problems involving the TV seminorm. In contrast, working with(7) is much more stable and enabled us to reduce γ to levels below 10 − , asthe numerical results in section 7 show. We point out that the homotopy path( γ, δ ) (cid:55)→ (¯ u γ,δ , ¯ y γ,δ , ¯ p γ,δ ) is not aﬀected by the reformulation of the optimalitysystem, so it is appropriate to compare the ﬁnal values of γ . In this section we provide a discretization scheme for (ROC γ,δ ) and prove itsconvergence. Throughout, we work with a ﬁxed pair ( γ, δ ) ∈ R > .5.1 DiscretizationWe use Finite Elements for the discretization of (ROC γ,δ ). Control, stateand adjoint state are discretized by piecewise linear and globally continuouselements on a triangular grid. 
We point out that discretizing the control by piecewise constant Finite Elements will not ensure convergence to the optimal control $\bar u_{\gamma,\delta}$, in general; cf. [6, Section 4].

For all $h \in (0,h_0]$ and a suitable $h_0 > 0$ let $\mathcal{T}_h$ denote a collection of open triangular cells $T \subset \Omega$ with $h = \max_{T\in\mathcal{T}_h} \operatorname{diam}(T)$. We write $\Omega_h := \operatorname{int}\bigl(\cup_{T\in\mathcal{T}_h} \bar T\bigr)$. We assume that there are constants $C > 0$ and $c > \tfrac12$ such that
\[ \operatorname{dist}(\partial\Omega_h, \partial\Omega) \le C h^c, \qquad |\Omega \setminus \Omega_h| \xrightarrow{h\to 0} 0, \qquad |\partial\Omega_h| \le C. \tag{11} \]
We further assume $(\mathcal{T}_h)_{h\in(0,h_0]}$ to be quasi-uniform and $\Omega_h \subset \Omega_{h'}$ for $h' \le h$. The assumptions in (11) are rather mild and in part implied if, for example, $\Omega$ and $(\Omega_h)_{h>0}$ are a family of uniform Lipschitz domains, cf. [29, Sections 4.1.2 & 4.1.3]. We also utilize the function spaces
\[ V_h := \bigl\{ v_h \in C(\bar\Omega_h) : v_h|_T \text{ is affine linear } \forall T \in \mathcal{T}_h \bigr\}, \qquad Y_h := V_h \cap H^1_0(\Omega_h). \]
Because $V_h \hookrightarrow H^1(\Omega_h)$ it follows that $Y_h$ contains precisely those functions of $V_h$ that vanish on $\partial\Omega_h$. We use the standard nodal basis $\varphi_1, \varphi_2, \ldots, \varphi_{\dim(V_h)}$ in $V_h$ and assume that it is ordered in such a way that $\varphi_1, \varphi_2, \ldots, \varphi_{\dim(Y_h)}$ is a basis of $Y_h$. For every $u \in L^2(\Omega_h)$ there is a unique $y_h \in Y_h$ that satisfies
\[ \int_{\Omega_h} \Bigl( \sum_{i,j=1}^N a_{ij}\,\partial_i y_h\,\partial_j \varphi_h \Bigr) + c\, y_h \varphi_h \,\mathrm{d}x = \int_{\Omega_h} u \varphi_h \,\mathrm{d}x \quad \forall \varphi_h \in Y_h \]
and by defining $S_h u := y_h$ we obtain the discrete solution operator $S_h : L^2(\Omega_h) \to Y_h$ to the PDE in (OC). The discretized version of (ROC$_{\gamma,\delta}$) is given by
\[ \min_{u \in V_h}\; \underbrace{\tfrac12 \|S_h u - y_{\Omega_h}\|_{L^2(\Omega_h)}^2 + \beta \psi_\delta(u) + \tfrac{\gamma}{2} \|u\|_{H^1(\Omega_h)}^2}_{=:\, j_{\gamma,\delta,h}(u)}, \tag{ROC$_{\gamma,\delta,h}$} \]
where $y_{\Omega_h}$ represents the restriction of $y_\Omega$ to $\Omega_h$. By standard arguments this problem has a unique optimal solution $\bar u_{\gamma,\delta,h}$. Based on $\bar u_{\gamma,\delta,h}$ we define $\bar y_{\gamma,\delta,h} := S_h \bar u_{\gamma,\delta,h}$ and $\bar p_{\gamma,\delta,h} := S_h^*(S_h \bar u_{\gamma,\delta,h} - y_{\Omega_h})$.
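The discrete solution operator $S_h$ can be illustrated by a minimal sketch in one space dimension. Everything specific here is an assumption made for illustration and is not taken from the paper: the domain is the interval $(0,1)$ with a uniform grid, the operator is $-y'' + c\,y$ with a constant $c \ge 0$, and the helper names `p1_matrices` and `solve_state` are hypothetical. The control is represented by its nodal values in $V_h$, and the state is sought in $Y_h$, i.e. with zero boundary values.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve


def p1_matrices(n):
    """P1 stiffness and mass matrices on a uniform grid of (0, 1) with n cells.

    Both matrices act on all n + 1 nodal values, i.e. on the space V_h.
    """
    h = 1.0 / n
    main_k = np.full(n + 1, 2.0 / h)
    main_k[[0, -1]] = 1.0 / h
    off_k = np.full(n, -1.0 / h)
    K = sp.diags([off_k, main_k, off_k], [-1, 0, 1], format="csr")

    main_m = np.full(n + 1, 2.0 * h / 3.0)
    main_m[[0, -1]] = h / 3.0
    off_m = np.full(n, h / 6.0)
    M = sp.diags([off_m, main_m, off_m], [-1, 0, 1], format="csr")
    return K, M


def solve_state(u_nodal, c=0.0):
    """Illustrative S_h: solve (K + c M) y = M u on the interior nodes, y = 0 on the boundary."""
    n = len(u_nodal) - 1
    K, M = p1_matrices(n)
    rhs = (M @ u_nodal)[1:-1]               # load vector, tested with the basis of Y_h only
    lhs = (K + c * M)[1:-1, 1:-1].tocsc()   # restriction to the basis functions of Y_h
    y = np.zeros(n + 1)
    y[1:-1] = spsolve(lhs, rhs)
    return y
```

For the constant control $u \equiv 1$ and $c = 0$ the exact state is $y(x) = x(1-x)/2$, which gives a quick plausibility check of the assembled system at the grid nodes.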
For $h \to 0^+$ the triple $(\bar u_{\gamma,\delta,h}, \bar y_{\gamma,\delta,h}, \bar p_{\gamma,\delta,h})$ converges to the continuous optimal triple $(\bar u_{\gamma,\delta}, \bar y_{\gamma,\delta}, \bar p_{\gamma,\delta})$ in an appropriate sense, as we show next.

In the following we use that extension by zero makes every $v \in Y_h \subset H^1_0(\Omega_h)$ a function in $H^1_0(\Omega)$. Also, we need the following density result.

Lemma 14

Let (11) hold. For each $\varphi \in C^\infty(\bar\Omega) \cap H^1_0(\Omega)$ there exists a sequence $(\varphi_h)_h$ with $\varphi_h \in Y_h$ such that $\lim_{h\to 0^+} \|\varphi_h - \varphi\|_{H^1(\Omega_h)} = 0$.

Proof. Given $\varphi$ and a sufficiently small $h$ we define $\varphi_h \in Y_h$ on the inner nodes $x_1, x_2, \ldots, x_{\dim(Y_h)}$ of $\bar\Omega_h \setminus \partial\Omega_h$ as $\varphi$ and as zero on the nodes $x_{\dim(Y_h)+1}, \ldots, x_{\dim(V_h)}$ on the boundary. That is, we set
\[ \varphi_h(x_i) := \begin{cases} \varphi(x_i) & \text{if } x_i \in \Omega_h, \\ 0 & \text{if } x_i \in \partial\Omega_h. \end{cases} \]
Inserting the nodal interpolant $I_h\varphi \in V_h$ and utilizing an inverse inequality, e.g. [25, Corollary 1.141], we find
\[ \|\varphi - \varphi_h\|_{H^1(\Omega_h)} \le \|\varphi - I_h\varphi\|_{H^1(\Omega_h)} + C h^{-1} \|\varphi_h - I_h\varphi\|_{L^2(\Omega_h)}, \]
where $C > 0$ does not depend on $h$. An interpolation error estimate shows that the first term on the right-hand side converges to 0 for $h \to 0^+$, see for example [25, Theorem 1.103]. Owing to the definition of $\varphi_h$ the second term only involves elements near the boundary $\partial\Omega_h$, hence
\[ h^{-2} \|\varphi_h - I_h\varphi\|_{L^2(\Omega_h)}^2 = h^{-2} \sum_{\substack{T\in\mathcal{T}_h \\ \bar T \cap \partial\Omega_h \ne \emptyset}} \|\varphi_h - I_h\varphi\|_{L^2(T)}^2 \le C h^{N-2} \sum_{\substack{T\in\mathcal{T}_h \\ \bar T \cap \partial\Omega_h \ne \emptyset}} \|\varphi_h - I_h\varphi\|_{L^\infty(T)}^2. \]
By (11) and the quasi-uniformity we find that the number of boundary triangles is proportional to $h^{-(N-1)}$, hence
\[ h^{-2} \|\varphi_h - I_h\varphi\|_{L^2(\Omega_h)}^2 \le C h^{-1} \max_{x_i \in \partial\Omega_h} \bigl|\varphi_h(x_i) - I_h\varphi(x_i)\bigr|^2 = C h^{-1} \max_{x_i \in \partial\Omega_h} \bigl|\varphi(x_i)\bigr|^2. \]
We find due to $\varphi = 0$ on $\partial\Omega$, $\operatorname{dist}(\partial\Omega_h, \partial\Omega) \le C h^c$ and the boundedness of $\nabla\varphi$ that the term on the right-hand side is bounded by $C h^{2c-1}$. After taking square roots this concludes the proof as $c > \tfrac12$. □

Theorem 10

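Let (11) hold. We have
\[ \lim_{h\to 0^+} \bigl\|(\bar u_{\gamma,\delta,h}, \bar y_{\gamma,\delta,h}, \bar p_{\gamma,\delta,h}) - (\bar u_{\gamma,\delta}, \bar y_{\gamma,\delta}, \bar p_{\gamma,\delta})\bigr\|_{L^2(\Omega)} = 0, \]
where $\bar u_{\gamma,\delta,h}$, $\bar y_{\gamma,\delta,h}$ and $\bar p_{\gamma,\delta,h}$ are extended by zero to $\Omega$.

Proof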

By the optimality of the function values it is easy to see that thereexists a constant

C >

0, independent of h , such that (cid:107) ¯ u γ,δ,h (cid:107) H ( Ω h ) ≤ C . Usingextension by zero we now ﬁnd that (cid:107) ¯ y γ,δ,h (cid:107) H ( Ω ) ≤ C .Let ( h n ) n ∈ N be a zero sequence. After taking a subsequence, not rela-beled, we may assume that it is monotonically decreasing. From the compactembedding of H ( Ω ) into L ( Ω ) and the reﬂexivity of H ( Ω ) we obtain asubsequence and a ˆ y ∈ H ( Ω ) such that ¯ y γ,δ,h n n →∞ −−−−→ ˆ y strongly in L ( Ω ) andweakly in H ( Ω ). Extending ¯ u γ,δ,h by 0 to Ω and using the fact that L ( Ω )is a Hilbert space we obtain on a subsequence, denoted the same way, that¯ u γ,δ,h n · Ω hn n →∞ −−−−→ ˆ u weakly in L ( Ω ) for some ˆ u ∈ L ( Ω ). Let ϕ ∈ C ∞ c ( Ω )and ϕ h n be deﬁned as in Lemma 14. We then have0 = A (¯ y γ,δ,h n ) ϕ h n − (¯ u γ,δ,h n , ϕ h n ) L ( Ω h ) n →∞ −−−−→ A (ˆ y ) ϕ − (ˆ u, ϕ ) L ( Ω ) . Thus ˆ y = S ˆ u by the density of C ∞ c ( Ω ) in H ( Ω ). The analogous argumentsshow that the adjoints converge in the same way to some ˆ p ∈ H ( Ω ) withˆ p = S ∗ (ˆ y − y Ω ). It therefore remains to show that (ˆ u, ˆ y, ˆ p ) is the unique optimaltriple to (ROC γ,δ ). We will use Theorem 3 for that. Let u ∈ H ( Ω ) ∩ C ∞ ( ¯ Ω )and I h u ∈ H ( Ω h ) denote the usual nodal interpolant. Then it is well-known,e.g. [25, Theorem 1.103], that (cid:107) u − I h n u (cid:107) H ( Ω hn ) n →∞ −−−−→

0. Moreover, it isstraightforward to see that ˆ u and ˆ p satisfy (2) iﬀ ˆ u minimizes H ( Ω ) (cid:51) u (cid:55)→ γ (cid:107) u (cid:107) H ( Ω ) + β (cid:90) Ω (cid:112) δ + |∇ u | d x + (ˆ p, u ) L ( Ω ) =: G ( u ) . By (13) we also have that ¯ u γ,δ,h minimizes G h , which is deﬁned analogouslyto G with Ω h and ¯ p γ,δ,h . Let ˆ n ∈ N be arbitrary and n ≥ ˆ n . We therefore ﬁndby h ˆ n ≥ h n that Ω h ˆ n ⊂ Ω h n and G h ˆ n (¯ u γ,δ,h n ) ≤ G h n (¯ u γ,δ,h n ) ≤ G h n ( I h n u ) . By (cid:107) u − I h n u (cid:107) H ( Ω hn ) n →∞ −−−−→ | Ω \ Ω h n | n →∞ −−−−→ p γ,δ,h n n →∞ −−−−→ ˆ p in L ( Ω ) we obtain lim sup n →∞ G h ˆ n (¯ u γ,δ,h n ) ≤ G ( u ) . As in previous arguments we have 1 Ω hn ¯ u γ,δ,h n n →∞ −−−−→ Ω hn ˆ u weakly in H ( Ω h ˆ n )and strongly in L ( Ω h ˆ n ). Since u (cid:55)→ (cid:82) Ω h ˆ n (cid:112) δ + |∇ u | d x is weakly lower semi-continuous with respect to the L ( Ω h ˆ n ) norm, cf. [1, Section 2], we obtain G h ˆ n (ˆ u ) ≤ lim inf n →∞ G h ˆ n (¯ u γ,δ,h n ) ≤ lim sup n →∞ G h ˆ n (¯ u γ,δ,h n ) ≤ G ( u ) . Sending ˆ n → ∞ shows that ˆ u is a minimizer of G , hence G (cid:48) (ˆ u ) = 0, whichimplies the condition of Theorem 3. Together with ˆ y = S ˆ u and ˆ p = S ∗ (ˆ y − y Ω )this demonstrates (ˆ u, ˆ y, ˆ p ) = (¯ u γ,δ , ¯ y γ,δ , ¯ p γ,δ ), thereby concluding the proof. (cid:117)(cid:116) Corollary 1

Let (11) hold. We have
\[ \lim_{h\to 0^+} \bigl\|(\bar y_{\gamma,\delta,h}, \bar p_{\gamma,\delta,h}) - (\bar y_{\gamma,\delta}, \bar p_{\gamma,\delta})\bigr\|_{H^1(\Omega)} = 0, \]
where $\bar y_{\gamma,\delta,h}$ and $\bar p_{\gamma,\delta,h}$ are extended by zero to $\Omega$.

Proof. Let $R_h \bar y \in Y_h$ denote the Ritz projection with respect to $A$. Extending $\bar y_{\gamma,\delta,h} \in Y_h$ and $R_h \bar y$ by zero to $\Omega$ we clearly have
\[ \|\bar y_{\gamma,\delta,h} - \bar y\|_{H^1(\Omega)} \le \|\bar y_{\gamma,\delta,h} - R_h \bar y\|_{H^1(\Omega_h)} + \|R_h \bar y - \bar y\|_{H^1(\Omega)}. \]
By definition, $\bar y_{\gamma,\delta,h} - R_h \bar y$ satisfies
\[ A(\bar y_{\gamma,\delta,h} - R_h \bar y)(\varphi_h) = (\bar u_{\gamma,\delta,h} - \bar u, \varphi_h)_{L^2(\Omega_h)} \quad \forall \varphi_h \in Y_h. \]
Thus, choosing $\varphi_h = \bar y_{\gamma,\delta,h} - R_h \bar y$ and using the ellipticity of $A$ and $c \ge 0$ in $\Omega$ together with the Poincaré inequality in $\Omega$ yields a constant $C > 0$, independent of $h$, such that
\[ \|\bar y_{\gamma,\delta,h} - R_h \bar y\|_{H^1(\Omega)} \le C \|\bar u_{\gamma,\delta,h} - \bar u\|_{L^2(\Omega)} \xrightarrow{h\to 0^+} 0, \]
where we also used extension by zero and Theorem 10. Since $R_h \bar y \to \bar y$ in $Y$ for $h \to 0^+$, the $H^1(\Omega)$ convergence $\bar y_{\gamma,\delta,h} \to \bar y$ follows. The proof for $\bar p_{\gamma,\delta,h} - \bar p$ is analogous. □

Based on the Finite Element approximation from section 5 we now study an inexact Newton method to compute the discrete solution $(\bar y_{\gamma,\delta,h}, \bar p_{\gamma,\delta,h}, \bar u_{\gamma,\delta,h})$ and we embed it into a practical path-following method.

6.1 A preconditioned inexact Newton method for the discrete problems

In this subsection we prove local convergence of an inexact Newton method when applied to a discretized version of (7) for fixed $(\gamma,\delta) \in \mathbb{R}^2_{>0}$. To this end, let us introduce the discrete adjoint-to-control mapping $u_h$. (We recall that the constant $h_0 > 0$ was introduced in section 5.1.)

Lemma 15

Let $h \in (0,h_0]$. For every $p \in L^2(\Omega_h)$ there exists a unique $u_h = u_h(p) \in V_h$ that satisfies the following discrete version of (3):
\[ \bigl(\gamma \nabla u_h + f(\nabla u_h), \nabla\varphi_h\bigr)_{L^2(\Omega_h)} + \gamma\,(u_h, \varphi_h)_{L^2(\Omega_h)} = (p, \varphi_h)_{L^2(\Omega_h)} \quad \forall \varphi_h \in V_h. \tag{12} \]
The associated solution operator $u_h : L^2(\Omega_h) \to V_h$ is Lipschitz continuously Fréchet differentiable. Its derivative $u_h'(p) \in \mathcal{L}(L^2(\Omega_h), V_h)$ at $p \in L^2(\Omega_h)$ in direction $d \in L^2(\Omega_h)$ is given by $z_h = u_h'(p)d \in V_h$, where $z_h$ is the unique solution to
\[ \Bigl(\bigl[\gamma I + f'\bigl(\nabla u_h(p)\bigr)\bigr]\nabla z_h, \nabla\varphi_h\Bigr)_{L^2(\Omega_h)} + \gamma\,(z_h, \varphi_h)_{L^2(\Omega_h)} = (d, \varphi_h)_{L^2(\Omega_h)} \quad \forall \varphi_h \in V_h. \tag{13} \]

Proof

The proof is similar to the continuous case, but easier, so we omit it. (cid:117)(cid:116)
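To make the discrete adjoint-to-control mapping concrete, here is a minimal one-dimensional sketch of a Newton iteration for (12) whose linear systems have the structure of (13). Several ingredients are assumptions for illustration only: the domain is $(0,1)$ with a uniform grid, the nonlinearity is taken as $f(v) = \beta v / \sqrt{\delta + v^2}$ (the derivative of the smoothed TV integrand scaled by $\beta$; $f$ itself is defined outside this section), the simple backtracking on the residual norm is a stand-in for the globalization used by the authors, and the function name `adjoint_to_control` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve


def adjoint_to_control(p_nodal, gamma, delta, beta, tol=1e-10, max_iter=50):
    """Newton iteration for a 1D analogue of (12) with P1 elements on (0, 1)."""
    n = len(p_nodal) - 1
    h = 1.0 / n
    # Cellwise derivative of a P1 function: (D u)_k = (u_{k+1} - u_k) / h.
    D = sp.diags([-np.ones(n), np.ones(n)], [0, 1], shape=(n, n + 1), format="csr") / h
    main = np.full(n + 1, 2.0 * h / 3.0)
    main[[0, -1]] = h / 3.0
    off = np.full(n, h / 6.0)
    M = sp.diags([off, main, off], [-1, 0, 1], format="csr")

    # Assumed smoothed-TV nonlinearity and its derivative (scalar in 1D).
    def f(g):
        return beta * g / np.sqrt(delta + g**2)

    def df(g):
        return beta * delta / (delta + g**2) ** 1.5

    def residual(u):
        g = D @ u
        return D.T @ (h * (gamma * g + f(g))) + gamma * (M @ u) - M @ p_nodal

    u = np.zeros(n + 1)
    for _ in range(max_iter):
        r = residual(u)
        r_norm = np.linalg.norm(r)
        if r_norm <= tol:
            break
        g = D @ u
        # Jacobian with the structure of (13): [gamma I + f'(grad u)] stiffness + gamma mass.
        J = D.T @ sp.diags(h * (gamma + df(g))) @ D + gamma * M
        step = spsolve(J.tocsc(), -r)
        lam = 1.0
        # Backtracking until the residual norm decreases sufficiently.
        while np.linalg.norm(residual(u + lam * step)) > (1.0 - 1e-4 * lam) * r_norm and lam > 1e-8:
            lam *= 0.5
        u = u + lam * step
    return u
```

A simple consistency check under these assumptions: for a constant $p$ the gradient terms vanish, so (12) reduces to $\gamma u_h = p$ and the sketch should return the constant $p/\gamma$ after one Newton step.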

With $u_h$ at hand we can discretize (7) by
\[ F_h : Y_h \times Y_h \to Y_h^* \times Y_h^*, \qquad F_h(y,p) := \begin{pmatrix} Ay - u_h(-p) \\ y - y_{\Omega_h} - A^* p \end{pmatrix}. \]
The same $F_h$ is obtained if we consider the optimality conditions of (ROC$_{\gamma,\delta,h}$) and express them in terms of $(y,p)$. Moreover, $(\bar y_{\gamma,\delta,h}, \bar p_{\gamma,\delta,h})$ is the unique root of $F_h$ and the properties of $F$ from Lemma 12 and Lemma 13 carry over to $F_h$.

Lemma 16

Let $h \in (0,h_0]$. The map $F_h : Y_h \times Y_h \to Y_h^* \times Y_h^*$ is Lipschitz continuously Fréchet differentiable. Its derivative at $(y,p) \in Y_h \times Y_h$ is given by
\[ F_h'(y,p) : Y_h \times Y_h \to Y_h^* \times Y_h^*, \qquad (\delta y, \delta p) \mapsto \begin{pmatrix} A & u_h'(-p) \\ I & -A^* \end{pmatrix} \begin{pmatrix} \delta y \\ \delta p \end{pmatrix}. \]
Moreover, $F_h'(y,p)$ is invertible for every $(y,p) \in Y_h \times Y_h$.

Proof. The differentiability follows from Lemma 15. Since $\dim(Y_h \times Y_h) = \dim(Y_h^* \times Y_h^*)$, it is sufficient to show that $F_h'(y,p)$ is injective. This can be done exactly as in Lemma 13. □

Similar to Theorem 9 we have the following result.

Theorem 11

Let $h \in (0,h_0]$ and $\eta \in [0,\infty)$. Then there is a neighborhood $N \subset Y_h \times Y_h$ of $(\bar y_{\gamma,\delta,h}, \bar p_{\gamma,\delta,h})$ such that for any $(y_0,p_0) \in N$ any sequence $(y_k,p_k)$ that is generated according to $(y_{k+1},p_{k+1}) = (y_k,p_k) + (\delta y_k,\delta p_k)$, where $(\delta y_k,\delta p_k) \in Y_h \times Y_h$ satisfies for all $k \ge 0$
\[ \|F_h(y_k,p_k) + F_h'(y_k,p_k)(\delta y_k,\delta p_k)\| \le \eta_k \|F_h(y_k,p_k)\| \]
with $(\eta_k) \subset [0,\eta]$, converges r-linearly [q-linearly/q-superlinearly/with q-order $1+\omega$] to $(\bar y_{\gamma,\delta,h}, \bar p_{\gamma,\delta,h})$, provided $\eta < 1$ [$\eta$ is sufficiently small/$\eta_k \to 0$/$\eta_k = O(\|F_h(y_k,p_k)\|^\omega)$]. Here, $\omega \in (0,1]$ is arbitrary.

As a preconditioner for the fully discrete Newton system
\[ F_h'(y,p) = \begin{pmatrix} A & u_h'(-p) \\ M & -A^T \end{pmatrix} \]
one can use
\[ \hat P := \begin{pmatrix} \operatorname{diag}(A) & 0 \\ M & -\operatorname{diag}(A^T) \end{pmatrix}. \tag{14} \]
It is sparse, cheaply invertible, and it does not change for fixed discretization. In [47] it is shown that diagonal preconditioning has a favorable effect on the distribution of the eigenvalues for Galerkin matrices. Our numerical experiments suggest that it can also be sensible to employ better approximations than $\operatorname{diag}(A)$ in $\hat P$, e.g., a (modified incomplete) LU factorization of $A$.

Algorithm 2:

Inexact path-following inexact Newton method

Input: (ˆ y , ˆ p ) ∈ Y h × Y h , ( γ , δ ) ∈ R > , κ > for i = 0 , , , . . . do set ( y , p ) := (ˆ y i , ˆ p i ) for k = 0 , , , . . . do if (cid:107) F h ( y k , p k ) (cid:107) ≤ ρ ( γ i , δ i ) then set (ˆ y i +1 , ˆ p i +1 ) := ( y k , p k ) go to line 11 end choose η k > gmres to determine ( δy k , δp k ) suchthat (cid:107) r k (cid:107) ≤ η k (cid:107) F h ( y k , p k ) (cid:107) call Algorithm 3, input w k := ( y k , p k ), δw k := ( δy k , δp k ); output: λ k set ( y k +1 , p k +1 ) := ( y k , p k ) + λ k ( δy k , δp k ) end select σ i ∈ (0 , if (cid:107) (ˆ y ι +1 , β − ˆ p ι +1 ) − (ˆ y ι , β − ˆ p ι ) (cid:107) H ≤ (1 − σ ι ) κ (cid:107) (ˆ y i +1 , β − ˆ p i +1 ) (cid:107) H for ι = i, i − then set ( y ∗ , p ∗ ) := (ˆ y i +1 , ˆ p i +1 ); stop set ( γ i +1 , δ i +1 ) := ( σ i γ i , σ i δ i ) endOutput: ( y ∗ , p ∗ ) γ,δ,h ). We use the residual r k := F h ( y k , p k ) + F (cid:48) h ( y k , p k )( δy k , δp k ).The function ρ : R > → R > prescribes how small the Newton residualshould be for ﬁxed γ, δ . In the implementation we use ρ ( γ, δ ) = max { − , γ } ,which may be viewed as inexact path-following. For the forcing term η k we usethe two choices η k = ¯ η k := 10 − and η k = ˆ η k := max { − , min { − k − , √ δ i }} ,where k = k ( i ). For ¯ η k we have ¯ η k ≤ (cid:107) F h ( y k , p k ) (cid:107) since we terminate theinner loop if (cid:107) F h ( y k , p k ) (cid:107) < − . Theorem 11 therefore suggests quadraticconvergence for the choice η k = ¯ η k and this can indeed be observed. Similarly, η k = ˆ η k corresponds to superlinear convergence. For both choices, however, wefound in the numerical experiments that it is more eﬃcient to also terminate gmres if the Euclidean norm of r k drops below ¯ η k , respectively, ˆ η k althoughthis can prevent quadratic, respectively, superlinear convergence.The control u h ( − p k ) is computed with a globalized Newton method. 
Themethod terminates when the Newton residual falls below a threshold thatdecreases with ( γ i , δ i ). The linear systems are solved using SciPy’s sparse directsolver spsolve . As an alternative we experimented with a preconditionedconjugate gradients method (PCG). The results were mixed: While the use ofPCG diminished the total runtime of Algorithm 2 if all went well, we observedon several instances that it broke down for smaller values of ( γ i , δ i ).We choose σ i based on the number of Newton steps that are needed tocompute the implicit controls { u h ( − p k ) } k in outer iteration i . If this numbersurpasses a predeﬁned m ∈ N , then we choose σ i > σ i − . If it belongs to[0 , . m ], then we choose σ i < σ i − . Otherwise, we let σ i = σ i − . In addition,we respect the bound σ i ≥ .

25 for all i , since we found in the numerical exper- iments, cf. Table 3 below, that choosing σ i too small can prevent convergencein some cases. The weighing 1 /β in the termination criterion is made since theamplitude of the adjoint state is roughly of order β in comparison to the state.In all experiments we use κ = 10 − .Algorithm 3 augments Algorithm 2 by a non-monotone line search global-ization introduced in [35]. The non-monotonicity allows to always accept theinexact Newton step and yields potentially larger step sizes than descent-basedstrategies. The intention is to keep the number of trial step sizes low sinceevery trial step size requires the evaluation of F h and hence a recomputationof u h ( − p k ). Assuming for simplicity that u h ( − p k ) is determined exactly foreach k , it is possible to show convergence of ( y k , p k ) from arbitrary startingpoints and to prove that eventually step size 1 will be accepted, which in turnensures that the convergence rates of Theorem 11 are available for every ﬁxed( γ i , δ i ). In the numerical experiments we use τ = 10 − and we observe that inthe vast majority of iterations full steps are taken.All norms without index in Algorithm 2 and 3 are L ( Ω h ) norms. Algorithm 3:

Computation of step size Input: ( w k , δw k ) , τ > for l = 0 , , , . . . do if (cid:107) F h ( w k + 2 − l δw k ) (cid:107) ≤ (cid:16) l +1) (cid:17) (cid:107) F h ( w k ) (cid:107) − τ (cid:107) − l δw k (cid:107) then set λ k := 2 − l ; stop endOutput: λ k We provide numerical results for two examples. Our main goal is to illustratethat Algorithm 2 can robustly compute accurate solutions of (OC). The resultsare obtained from a Python implementation of Algorithm 2 using DOLFIN[38,39], which is part of FEniCS [3,37]. The code for the second example isavailable at https://imsc.uni-graz.at/mannel/publications.php .7.1 Example 1: An example with explicit solutionThe ﬁrst example has an explicit solution and satisﬁes the assumptions usedin this work. We consider (OC) for an arbitrary β > C ∞ domain Ω = B π (0) \ B π (0) in R , A = − ∆ and c ≡

0. The desired state is y Ω ( r ) = β r (cid:16) (1 + r ) sin( r ) − − (2 r −

1) cos( r ) (cid:17) + ¯ y path-following inexact Newton method for optimal control in BV 25(a) ¯ u h (b) ¯ u h with circles of radii jπ , j ∈ { , , } (c) ¯ y h (d) − ¯ p h Fig. 1: Numerically computed optimal solutions for Example 1where r ( x, y ) = (cid:112) x + y , and the optimal state ¯ y is¯ y ( r ) = (cid:40) − r + A ln( r/ (4 π )) + B if r ∈ (2 π, π ) ,C ln( r/ (4 π )) if r ∈ (3 π, π )with constants A, B, C whose values are contained in appendix D. The optimalcontrol is ¯ u ( r ) = 1 (2 π, π ) ( r ) . The optimal value is j (¯ u ) ≈ . β + 59 . β . In appendix D we provide detailson the construction of this example and verify that (¯ y, ¯ u ) is indeed the optimalsolution of (OC) If not stated otherwise, then β = 10 − is employed.We use unstructured triangulations that approximate ∂Ω increasingly betteras the meshes become ﬁner, cf. (11). Figure 1 depicts the optimal control ¯ u h ,optimal state ¯ y h and negative optimal adjoint state − ¯ p h , which were computedby Algorithm 2 on a grid with 1553207 degrees of freedom (DOF).We begin by studying convergence on several grids. We use the ﬁxed ratio( γ i /δ i ) ≡ and apply Algorithm 2 with ( γ , δ ) = (1 , .

01) and (ˆ y , ˆ p ) =(0 , η k = 10 − =: ¯ η k for all k (and all i ) and Table 1: Example 1: Number of Newton steps and errors for several meshes; theﬁrst value is for the forcing term ¯ η k , the second for ˆ η k (only shown if diﬀerent) DOF γ ﬁnal u E j E u E y E p . × − / . × − . × − . . . × − . / . × − . × − . . . × − . / . × − . × − . .

50 1 . × − . / . × − . × − . .

22 5 . × − . / . × − . × − . .

09 2 . × − Table 2: Example 1: Course of Algorithm 2 i γ i σ i ( i , iu ) E ij E iu E iy E ip τ i τ iu . / . / .

18 0 . / . / .

37 (0 ,

0) 575 155 38 . . × − . × − .

33 (1 ,

1) 5 . . . .

34 2 . × − .

30 (1 ,

2) 2 . . .

48 959 0 . . × − .

32 (3 ,

18) 1 . × − . .

23 8 . × − . . . × − .

31 (3 ,

20) 8 . × − . .

23 6 . × − . . . × − .

28 (3 ,

18) 5 . × − . .

22 6 . × − .

84 0 . . × − .

26 (5 ,

20) 3 . × − . .

22 5 . × − .

32 0 . . × − .

25 (3 ,

15) 3 . × − . .

22 5 . × − .

13 8 . × −

17 1 . × − .

25 (3 ,

16) 3 . × − . .

22 5 . × − .

081 2 . × −

18 4 . × − .

25 (3 ,

15) 3 . × − . .

22 5 . × − .

060 7 . × −

19 9 . × − .

25 (3 ,

20) 3 . × − . .

22 5 . × − .

045 3 . × −

20 2 . × − .

25 (3 ,

22) 3 . × − . .

22 5 . × − .

028 2 . × −

21 6 . × − — (3 ,

33) 3 . × − . .

22 5 . × − .

017 1 . × − η k = max { − , min { − k − , √ δ i }} =: ˆ η k . Table 1 shows y, p ), and u , which is thetotal number of Newton steps used to compute the implicit function u . Table 1also contains the errors E j := (cid:12)(cid:12) j γ final ,δ final ,h − ¯ j (cid:12)(cid:12) , E u := (cid:107) ˆ u ﬁnal − ¯ u (cid:107) L ( Ω ∗ ) , as well as E y := (cid:107) ˆ y ﬁnal − ¯ y (cid:107) H ( Ω ∗ ) , E p := (cid:107) ˆ p ﬁnal − ¯ p (cid:107) H ( Ω ∗ ) . where Ω ∗ represents a reference grid with DOF = 1553207. To evaluate theerrors, ˆ u ﬁnal , ˆ y ﬁnal and ˆ p ﬁnal are extended to Ω ∗ using extrapolation. Table 2provides details for the run from Table 1 with DOF = 97643 and η k = ˆ η k .Table 2 includes τ i := (cid:107) (ˆ y i +1 , β − ˆ p i +1 ) − (ˆ y i , β − ˆ p i ) (cid:107) H ( Ω h ) , which appears inthe termination criterion of Algorithm 2, and also τ iu := (cid:107) u (ˆ p i +1 ) − u (ˆ p i ) (cid:107) L ( Ω h ) .Table 1 indicates convergence of the computed solutions (ˆ u ﬁnal , ˆ y ﬁnal , ˆ p ﬁnal )to (¯ u, ¯ y, ¯ p ) and of the objective value j γ final ,δ final ,h to ¯ j . It also suggests thatconvergence takes place at certain rates with respect to h . Moreover, thetotal number of Newton steps both for ( y, p ) and for u stays bounded as DOFincreases, which may suggest mesh independence. The choice η k = ¯ η k frequentlyyields lower numbers of Newton steps for ( y, p ) and for u , yet the runtime(not depicted) is consistently higher than for η k = ˆ η k since more iterations path-following inexact Newton method for optimal control in BV 27 Table 3: Example 1: Results for ﬁxed values ( σ i ) ≡ σ ; the ﬁrst value is for theforcing term ¯ η k , the second for ˆ η k (only shown if diﬀerent) σ γ ﬁnal u E j E u E y E p . × − . × − . .

50 1 . × − . × − . × − . .

50 1 . × − . / . × − . / . × − . .

50 1 . × − Table 4: Example 1: Results for a sequence of nested grids

DOF γ ﬁnal u E j E u E y E p . × − . × −

30 1 . . × − . × −

13 175 7 . × − . .

51 1 . × − . × −

16 117 5 . × − . .

23 5 . × − . × −

34 387 1 . × − . .

09 3 . × − of gmres are required to compute the step for ( y, p ). Speciﬁcally, using ˆ η k saves between 5% and 36% of runtime, with 36% being the saving on the ﬁnestgrid. In the vast majority of iterations, step size 1 is accepted for ( y k , p k ). Forinstance, all of the 52 iterations required for DOF = 97643 and η k = ˆ η k usefull steps; for DOF = 6251 and η k = ¯ η k , 86 of the 87 iterations use step size 1.Table 3 displays the eﬀect of ﬁxing ( σ i ) ≡ σ in Algorithm 2. The mesh usesDOF = 24443 and is the same as in Table 1.For both forcing terms, σ = 0 . σ i that we employ requires about 6% more runtime. For σ = 0 . γ = 10 − is reached because u h ( − p k ) could not be computed to suﬃcient accuracy withinthe 200 iterations that we allow for this process. Together with Table 3 thisshows that small values of σ i can increase the number of steps required for u and even prevent convergence. We therefore let σ i ≥ .

25 for all i in allexperiments, although this diminshes the eﬃcacy of Algorithm 2 in some cases.Table 4 shows results for η k = ˆ η k and a sequence of nested grids, where thegrids are reﬁned once γ i < − , γ i < − and γ i < − , respectively.We note that the errors E j , E u , E y and E p in the last line of Table 4 are ofsimilar size as their counterparts in the last line of Table 1. Since the iterationnumbers in these lines are similar as well, the variant on the ﬁxed grid somewhatsurprisingly requires a lower runtime than the nested variant. The reason is thatthe computation of u h ( − p ) after the last grid reﬁnement at γ i = 8 . × − requires 200 iterations. We leave the issue of reducing this large number (andcorrespondingly the runtime) as a future topic and mention that, in contrast,in example 2 the usage of nested grids is clearly advantageous. Table 5: Example 1: Results for various values of β ; the ﬁrst line is for thechoice ( γ i /δ i ) ≡ , the second for ( γ i /δ i ) ≡ β − − − − ( u ) / E u (28 , /

21 (37 , / . , / . , / . u ) / E u (40 , /

21 (78 , / . , / . , / . Table 6: Example 1: Iteration numbers and errors for several ratios γ i /δ i ; thecomputations for γ i /δ i ∈ { − , − } use a lower accuracy γ i δ i δ ﬁnal u E j E u E y E p − . × −

71 288 9 . × − . .

50 1 . × − − . × −

58 246 9 . × − . .

50 1 . × − . × −

102 469 9 . × − . .

50 1 . × − . × −

80 396 9 . × − . .

50 1 . × − . × −

70 454 9 . × − . .

50 1 . × − . × −

59 475 9 . × − . .

50 1 . × −

We now turn to the robustness of Algorithm 2. We emphasize that in our numerical experience the robustness of algorithms for optimal control problems involving the TV seminorm in the objective is a delicate issue. Table 5 displays the iteration numbers required by Algorithm 2 for different values of β on the mesh with DOF = 24443 along with the error E j for η k = ˆ η k for the two choices ( γ i /δ i ) ≡ and ( γ i /δ i ) ≡

1. The omitted values for β = 10 − and ( γ i /δ i ) ≡ are identical to those from Table 1 for DOF = 24443 and η k = ˆ η k . Table 6 provides iteration numbers and errors for various fixed choices of ( γ i /δ i ) on the mesh with DOF = 24443 for β = 10 − , η k = ¯ η k and ( σ i ) ≡ . − and 10 − we increased κ from 10 − to 5 · − to obtain convergence. Since our goal is to demonstrate robustness, no further changes are made although this would lower the iteration numbers.

Tables 5 and 6 suggest that Algorithm 2 is able to handle a range of parameter values without modification of its internal parameters.

7.2 Example 2

From section 3 onward we have required Ω to be of class C , . To show that Algorithm 2 can still solve (OC) if Ω is only Lipschitz, we now consider an example from [22, section 4.2] on the square Ω = [ − , . We have A = − ∆ , c ≡ β = 10 − and y Ω = 1 D , where D = ( − . , . . We use uniform triangulations throughout this example and denote by n + 1 the number of nodes in each coordinate direction. Figure 1 depicts the optimal control ¯ u h , optimal state ¯ y h and negative optimal adjoint state − ¯ p h , which were computed with n = 1024. Apparently, ¯ u h is piecewise constant.

Fig. 2: Numerically computed optimal solutions for Example 2. Panels: (a) ¯ u h , (b) ¯ u h (top view), (c) ¯ y h , (d) − ¯ p h .

Table 7: Example 2: Number of Newton steps and errors for several meshes
n γ final u E j E u E y E p

32 1 . × −

43 321 1 . × − . .

75 9 . × −

64 1 . × −

48 551 9 . × − . .

37 4 . × −

128 3 . × −

46 902 4 . × − . .

19 2 . × −

256 3 . × −

50 1212 2 . × − . .

081 1 . × −

512 5 . × −

58 2868 7 . × − .

42 0 .

031 4 . × − Throughout, we use the ﬁxed ratio ( γ i /δ i ) ≡ − and apply Algorithm 2with ( γ , δ ) = (0 . ,

1) and (ˆ y , ˆ p ) = (0 , γ i /δ i can be employed as well. We only provide results for ¯ η k since the forcing term ˆ η k does not yield lower runtimes in this example; both forcing terms produce the same errors, though. Table 7 displays iteration numbers and errors for different grids, while Table 8 shows details for n = 256.

Table 7 hints at possible mesh independence for ( y, p ), but suggests that the number of Newton steps for u increases with n . The depicted errors are

Table 8: Example 2: Course of Algorithm 2
i γ i σ i ( i , iu ) E ij E iu E iy E ip τ i τ iu . / . . . . / . . . (0 ,

0) 0 .

42 34 3 . . × − . × − .

27 (2 ,

7) 6 . × −

27 1 . . × −

498 7 .

36 1 . × − .

26 (2 ,

37) 3 . × −

23 1 . . × −

212 3 . . × − .

55 (2 ,

59) 3 . × −

12 0 .

50 4 . × − . .

612 1 . × − .

55 (2 ,

59) 1 . × − . .

39 3 . × − . .

613 9 . × − .

53 (2 ,

56) 4 . × − . .

29 2 . × − . .

414 4 . × − .

53 (3 ,

70) 4 . × − . .

21 1 . × − . . . × − .

52 (3 ,

64) 2 . × − . .

084 1 . × − .

62 0 . . × − .

47 (2 ,

23) 2 . × − . .

083 1 . × − .

38 0 . . × − .

45 (2 ,

56) 2 . × − . .

081 1 . × − .

28 0 . . × − .

59 (2 ,

80) 2 . × − . .

081 1 . × − .

15 0 . . × − .

53 (2 ,

15) 2 . × − . .

081 1 . × − .

062 0 . . × − — (2 ,

17) 2 . × − . .

081 1 . × − .

042 0 . Table 9: Example 2: Results for a sequence of nested grids n γ ﬁnal u E j E u E y E p

64 4 . × − . × −

25 1 . . × −

128 4 . × −

12 260 1 . × −

14 0 .

64 6 . × −

256 6 . × −

19 481 1 . × − . .

10 1 . × −

512 5 . × −

20 792 7 . × − .

42 0 .

031 4 . × −

Table 10: Example 2: Results for various values of β . A sequence of nested grids is used and the displayed iteration numbers are for the finest grid only
β − − − × − ( u ) / E j (12 , / . × − (22 , / . × − (70 , / . × − (104 , / . × −

computed by use of a reference solution that is obtained by Algorithm 2 with η k = ¯ η k on the mesh with n = 1024. As in the first example it seems that convergence with respect to h takes place at certain rates. The majority of iterations use full Newton steps for ( y, p ). For instance, all but one of the 50 iterations for n = 256 use step length one.

Table 9 shows the outcome of Algorithm 2 if a sequence of nested grids is used, where the grids are refined once γ i < − , γ i < − and γ i < − , respectively. This simple strategy reduces the runtime by about 57% while providing the same accuracy as a run for n = 512, cf. the last line of Table 7.

Table 10 addresses the robustness of Algorithm 2 with respect to β . The computations are carried out on nested grids and the displayed iteration numbers are those for the finest grid, which has n = 128. The reference solution is computed for n = 256. The final grid change happens once γ i < − . Table 10 indicates that Algorithm 2 is robust with respect to β . As in example 1 it is possible to achieve lower iteration numbers through manipulation of the algorithmic parameters. For instance, if the final grid change for β = 10 − happens once γ i < − instead of γ i < − , then only (41 , ,

We have studied an optimal control problem with controls from BV in which the control costs are given by the TV seminorm. By smoothing the TV seminorm and adding an $H^1$ regularization term we obtained a family of auxiliary problems whose solutions converge to the optimal solution of the original problem in appropriate function spaces.
For fixed smoothing and regularization parameters we showed local convergence of an infinite-dimensional inexact Newton method applied to a reformulation of the optimality system that involves the control as an implicit function of the adjoint state. Based on a convergent Finite Element approximation a practical algorithm was derived and it was demonstrated that the algorithm is able to robustly compute the optimal solution of the control problem with considerable accuracy. To verify this, a two-dimensional test problem with known solution was constructed.

A Differentiability of $\psi_\delta$

Lemma 17

Let $\delta > 0$, $N \in \mathbb{N}$ and let $\Omega \subset \mathbb{R}^N$ be open. The functional
\[ \psi_\delta : H^1(\Omega) \to \mathbb{R}, \qquad u \mapsto \int_\Omega \sqrt{\delta + |\nabla u|^2} \, dx \]
is Lipschitz continuously Fréchet differentiable and twice Gâteaux differentiable. Its first derivative at $u$ in direction $v$ and its second derivative at $u$ in directions $v, w$ are given by
\[ \psi_\delta'(u) v = \int_\Omega \frac{(\nabla u, \nabla v)}{\sqrt{\delta + |\nabla u|^2}} \, dx \]
and
\[ \psi_\delta''(u)[v, w] = \int_\Omega \frac{(\nabla v, \nabla w)}{\sqrt{\delta + |\nabla u|^2}} - \frac{(\nabla u, \nabla v)(\nabla u, \nabla w)}{(\delta + |\nabla u|^2)^{3/2}} \, dx. \]

Proof

First Gâteaux derivative

Let $u, v \in H^1(\Omega)$. As $s \mapsto \sqrt{\delta + s}$ is Lipschitz on $[0, \infty)$ with constant $\frac{1}{2\sqrt{\delta}}$, we obtain for all $t \in [-1, 1]$, $t \neq 0$,
\[ \left| \frac{\sqrt{\delta + |\nabla u + t \nabla v|^2} - \sqrt{\delta + |\nabla u|^2}}{t} \right| \le \frac{|\nabla v| \cdot (2 |\nabla u| + |\nabla v|)}{2 \sqrt{\delta}} \quad \text{a.e. in } \Omega. \tag{15} \]
Thus, we can apply the theorem of dominated convergence, which yields
\[ \lim_{t \to 0} \frac{\psi_\delta(u + t v) - \psi_\delta(u)}{t} = \int_\Omega \lim_{t \to 0} \frac{\sqrt{\delta + |\nabla u + t \nabla v|^2} - \sqrt{\delta + |\nabla u|^2}}{t} \, dx = \int_\Omega \frac{(\nabla u, \nabla v)}{\sqrt{\delta + |\nabla u|^2}} \, dx. \]
From
\[ \left| \int_\Omega \frac{(\nabla u, \nabla v)}{\sqrt{\delta + |\nabla u|^2}} \, dx \right| \le \int_\Omega \left| \frac{(\nabla u, \nabla v)}{\sqrt{\delta + |\nabla u|^2}} \right| dx \le \frac{\|\nabla u\|_{L^2(\Omega)} \|\nabla v\|_{L^2(\Omega)}}{\sqrt{\delta}} \le \frac{\|u\|_{H^1(\Omega)} \|v\|_{H^1(\Omega)}}{\sqrt{\delta}} \]
we infer that $v \mapsto \psi_\delta'(u) v$ is linear and continuous.

Second Gâteaux derivative

Let $u, v, w \in H^1(\Omega)$. Since $g : \mathbb{R}^N \to \mathbb{R}$, $g(y) := \frac{(y, z)}{\sqrt{\delta + |y|^2}}$, with $z \in \mathbb{R}^N$ fixed, is Lipschitz continuous on $\mathbb{R}^N$ with constant $\frac{2}{\sqrt{\delta}} |z|$, we obtain for all $t \in \mathbb{R}$, $t \neq 0$,
\[ \left| \frac{1}{t} \right| \left| \frac{(\nabla u + t \nabla w, \nabla v)}{\sqrt{\delta + |\nabla u + t \nabla w|^2}} - \frac{(\nabla u, \nabla v)}{\sqrt{\delta + |\nabla u|^2}} \right| \le \frac{2}{\sqrt{\delta}} |\nabla v| |\nabla w| \quad \text{a.e. in } \Omega. \tag{16} \]
Dominated convergence yields
\[ \lim_{t \to 0} \frac{\psi_\delta'(u + t w) v - \psi_\delta'(u) v}{t} = \int_\Omega \lim_{t \to 0} \frac{1}{t} \left( \frac{(\nabla u + t \nabla w, \nabla v)}{\sqrt{\delta + |\nabla u + t \nabla w|^2}} - \frac{(\nabla u, \nabla v)}{\sqrt{\delta + |\nabla u|^2}} \right) dx = \int_\Omega \frac{(\nabla v, \nabla w)}{\sqrt{\delta + |\nabla u|^2}} - \frac{(\nabla u, \nabla v)(\nabla u, \nabla w)}{(\delta + |\nabla u|^2)^{3/2}} \, dx, \]
where we used the directional derivative of $g$ to derive the last equality. From (16) we deduce the boundedness of the bilinear mapping $(v, w) \mapsto \psi_\delta''(u)[v, w]$ by
\[ \left| \int_\Omega \frac{(\nabla v, \nabla w)}{\sqrt{\delta + |\nabla u|^2}} - \frac{(\nabla u, \nabla v)(\nabla u, \nabla w)}{(\delta + |\nabla u|^2)^{3/2}} \, dx \right| \le \frac{2}{\sqrt{\delta}} \|v\|_{H^1(\Omega)} \|w\|_{H^1(\Omega)}. \tag{17} \]

Lipschitz continuous Fréchet differentiability

From (17) we infer that $\sup_{u \in H^1(\Omega)} \|\psi_\delta''(u)\|_{H^1(\Omega)^{**}} \le \frac{2}{\sqrt{\delta}}$, which implies that $u \mapsto \psi_\delta'(u)$ is Lipschitz with constant $\frac{2}{\sqrt{\delta}}$, hence $u \mapsto \psi_\delta(u)$ is Fréchet differentiable. □

B Hölder continuity for quasilinear partial differential equations

To prove results on the Hölder continuity of solutions to quasilinear elliptic PDEs, we first discuss linear elliptic PDEs.

Theorem 12

Let $\alpha \in (0, 1)$ and let $\Omega$ be a bounded $C^{1,\alpha}$ domain. Let $\gamma_0, \mu > 0$ be given. Let $A \in C^{0,\alpha}(\Omega, \mathbb{R}^{N \times N})$ be a uniformly elliptic matrix with ellipticity constant $\mu$ and let $\gamma \ge \gamma_0$. Let $a > 0$ be such that $\gamma, \|A\|_{C^{0,\alpha}(\Omega)} \le a$. Then there is a constant $C > 0$ depending only on $\alpha$, $\Omega$, $N$, $\mu$, $a$ and $\gamma_0$ such that for any $p \in L^\infty(\Omega)$ and any $f \in C^{0,\alpha}(\Omega, \mathbb{R}^N)$ the unique weak solution $u$ to
\[ \begin{cases} -\operatorname{div}(A \nabla u) + \gamma u = p - \operatorname{div}(f) & \text{in } \Omega, \\ \partial_{\nu_A} u = 0 & \text{on } \Gamma, \end{cases} \tag{18} \]
satisfies $u \in C^{1,\alpha}(\Omega)$ and
\[ \|u\|_{C^{1,\alpha}(\Omega)} \le C \left( \|p\|_{L^\infty(\Omega)} + \|f\|_{C^{0,\alpha}(\Omega)} \right). \]

Proof

A standard ellipticity argument delivers unique existence and $\|u\|_{H^1(\Omega)} \le C \|p\|_{L^\infty(\Omega)}$, where $C$ only depends on the claimed quantities. Moreover, by [46, Theorem 3.16(iii)]
\[ \|u\|_{L^{2,N+2\alpha}(\Omega)} + \|\nabla u\|_{L^{2,N+2\alpha}(\Omega)} \le C \left( \|p\|_{L^{2,(N+2\alpha-2)^+}(\Omega)} + \|f\|_{L^{2,N+2\alpha}(\Omega)} + \|u\|_{H^1(\Omega)} \right). \]
Here, $C$ depends on all of the claimed quantities except $\gamma_0$, and $L^{2,\lambda}(\Omega)$ denotes a Campanato space; for details see [46, Chapter 1.4]. The definition of Campanato spaces implies $\|p\|_{L^{2,(N+2\alpha-2)^+}(\Omega)} \le C \|p\|_{L^\infty(\Omega)}$. Using the isomorphism between $L^{2,N+2\alpha}(\Omega)$ and $C^{0,\alpha}(\Omega)$ from [46, Theorem 1.17 (ii)] we obtain
\[ \|u\|_{C^{1,\alpha}(\Omega)} \le C \left( \|p\|_{L^\infty(\Omega)} + \|f\|_{C^{0,\alpha}(\Omega)} + \|u\|_{H^1(\Omega)} \right). \]
The earlier ellipticity estimate concludes the proof. □

The next result follows directly from [36, Theorem 2] and requires no proof.

Theorem 13

Let $\Omega$ be a bounded $C^{1,\alpha'}$ domain for some $\alpha' \in (0, 1]$. Let $A : \Omega \times \mathbb{R} \times \mathbb{R}^N \to \mathbb{R}^N$, $B : \Omega \times \mathbb{R} \times \mathbb{R}^N \to \mathbb{R}$, $M > 0$ and $0 < \lambda \le \Lambda$. Let $\kappa, m \ge 0$ and suppose that
\[ \sum_{i,j=1}^N \partial_{\eta_i} A_j(x, u, \eta) \xi_i \xi_j \ge \lambda (\kappa + |\eta|)^m |\xi|^2, \quad \text{(ellipticity)} \tag{19} \]
\[ \sum_{i,j=1}^N |\partial_{\eta_i} A_j(x, u, \eta)| \le \Lambda (\kappa + |\eta|)^m, \quad \text{(boundedness of } A\text{)} \tag{20} \]
\[ |B(x, u, \eta)| \le \Lambda (1 + |\eta|)^{m+2}, \quad \text{(boundedness of } B\text{)} \tag{21} \]
as well as the Hölder continuity property
\[ |A(x_1, u_1, \eta) - A(x_2, u_2, \eta)| \le \Lambda (1 + |\eta|)^{m+1} \left( |x_1 - x_2|^{\alpha'} + |u_1 - u_2|^{\alpha'} \right) \tag{22} \]
are satisfied for all $x, x_1, x_2 \in \Omega$, $u, u_1, u_2 \in [-M, M]$ and $\eta, \xi \in \mathbb{R}^N$. Then there exist constants $\alpha \in (0, 1)$ and $C > 0$ such that each solution $u \in H^1(\Omega)$ of
\[ \int_\Omega A(x, u, \nabla u)^T \nabla \varphi \, dx = \int_\Omega B(x, u, \nabla u) \varphi \, dx \quad \forall \varphi \in H^1(\Omega) \]
satisfies
\[ \|u\|_{C^{1,\alpha}(\Omega)} \le C. \]
Here, $C > 0$ only depends on $\alpha'$, $\Omega$, $N$, $\Lambda/\lambda$, $m$, and $M$, while $\alpha \in (0, 1)$ only depends on $\alpha'$, $N$, $\Lambda/\lambda$ and $m$.

We collect elementary estimates for Hölder continuous functions.

Lemma 18

Let $\Omega \subset \mathbb{R}^N$ be nonempty, let $\alpha > 0$, and let $f, g \in C^{0,\alpha}(\Omega)$. Then:
• $\|f g\|_{C^{0,\alpha}(\Omega)} \le \|f\|_{C^{0,\alpha}(\Omega)} \|g\|_{C^{0,\alpha}(\Omega)}$.
• $\|\sqrt{\epsilon + f^2}\|_{C^{0,\alpha}(\Omega)} \le \sqrt{\epsilon} + \|f\|_{C^{0,\alpha}(\Omega)}$ for all $\epsilon > 0$.
• If $|f| \ge \epsilon > 0$ on $\Omega$ for some constant $\epsilon > 0$, then there holds $\|1/f\|_{C^{0,\alpha}(\Omega)} \le \epsilon^{-2} \|f\|_{C^{0,\alpha}(\Omega)} + \epsilon^{-1}$.
• $\| |h| \|_{C^{0,\alpha}(\Omega)} \le \|h\|_{C^{0,\alpha}(\Omega, \mathbb{R}^N)}$ for all $h \in C^{0,\alpha}(\Omega, \mathbb{R}^N)$.
• Let $N_i \in \mathbb{N}$ and let $U_i \subset \mathbb{R}^{N_i}$ be nonempty, $1 \le i \le 4$. For $\varphi \in C^{0,1}(U_2, U_3)$, $h \in C^{0,\alpha}(U_1, U_2)$ and $H \in C^{0,\alpha}(U_3, U_4)$ there hold
\[ \|\varphi \circ h\|_{C^{0,\alpha}(U_1, U_3)} \le |\varphi|_{C^{0,1}(U_2, U_3)} |h|_{C^{0,\alpha}(U_1, U_2)} + \|\varphi\|_{L^\infty(U_2, U_3)} \]
and
\[ \|H \circ \varphi\|_{C^{0,\alpha}(U_2, U_4)} \le |\varphi|^{\alpha}_{C^{0,1}(U_2, U_3)} |H|_{C^{0,\alpha}(U_3, U_4)} + \|H\|_{L^\infty(U_3, U_4)}. \]

Proof

First claim:

Because of
\[ |f(x) g(x) - f(y) g(y)| \le |f(x)| \, |g(x) - g(y)| + |g(y)| \, |f(x) - f(y)| \le \left( \|f\|_{L^\infty(\Omega)} |g|_{C^{0,\alpha}(\Omega)} + \|g\|_{L^\infty(\Omega)} |f|_{C^{0,\alpha}(\Omega)} \right) |x - y|^\alpha \]
for all $x, y \in \Omega$, we infer $|f g|_{C^{0,\alpha}(\Omega)} \le \|f\|_{L^\infty(\Omega)} |g|_{C^{0,\alpha}(\Omega)} + \|g\|_{L^\infty(\Omega)} |f|_{C^{0,\alpha}(\Omega)}$. Together with $\|f g\|_{L^\infty(\Omega)} \le \|f\|_{L^\infty(\Omega)} \|g\|_{L^\infty(\Omega)}$ this implies the first claim.

Second claim:

Since $\varphi(t) := \sqrt{\epsilon + t^2}$ is Lipschitz continuous with constant 1 in $\mathbb{R}$, the assertion follows from the fifth claim by use of $\|\sqrt{\epsilon + f^2}\|_{L^\infty(\Omega)} \le \sqrt{\epsilon} + \|f\|_{L^\infty(\Omega)}$.

Third claim:

Since $\varphi(t) := |t|^{-1}$ is Lipschitz continuous with constant $\epsilon^{-2}$ in $\mathbb{R} \setminus (-\epsilon, \epsilon)$, the assertion follows from the fifth claim, applied with $U_2 := \{f(x) : x \in \Omega\}$, by use of $\|\varphi\|_{L^\infty(U_2, U_3)} = \| |f|^{-1} \|_{L^\infty(\Omega)} \le \epsilon^{-1}$.

Fourth claim:

The assertion follows from the fifth claim.
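For completeness, the reduction to the fifth claim can be spelled out as follows. This is only a sketch under two assumptions that the text does not state explicitly: the norm convention $\|h\|_{C^{0,\alpha}} = \|h\|_{L^\infty} + |h|_{C^{0,\alpha}}$ (as suggested by the proof of the first claim) and the Euclidean norm $|\cdot|$ on $\mathbb{R}^N$. Choose $U_1 = \Omega$, $U_2 = \{h(x) : x \in \Omega\}$, $U_3 = \mathbb{R}$ and $\varphi(y) := |y|$. Then
\[
\begin{aligned}
&|\varphi(y) - \varphi(z)| = \bigl| |y| - |z| \bigr| \le |y - z|
  \quad \Longrightarrow \quad |\varphi|_{C^{0,1}(U_2, U_3)} \le 1, \\
&\|\varphi\|_{L^\infty(U_2, U_3)} = \sup_{x \in \Omega} |h(x)| = \|h\|_{L^\infty(\Omega, \mathbb{R}^N)}, \\
&\| |h| \|_{C^{0,\alpha}(\Omega)} = \|\varphi \circ h\|_{C^{0,\alpha}(U_1, U_3)}
  \le |h|_{C^{0,\alpha}(\Omega, \mathbb{R}^N)} + \|h\|_{L^\infty(\Omega, \mathbb{R}^N)}
  = \|h\|_{C^{0,\alpha}(\Omega, \mathbb{R}^N)}.
\end{aligned}
\]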

Fifth claim:

For $x, y \in U_1$ we have
\[ |\varphi(h(x)) - \varphi(h(y))| \le |\varphi|_{C^{0,1}(U_2, U_3)} |h(x) - h(y)| \le |\varphi|_{C^{0,1}(U_2, U_3)} |h|_{C^{0,\alpha}(U_1, U_2)} |x - y|^\alpha. \]
Together with $|\varphi(h(x))| \le \sup_{y \in U_2} |\varphi(y)| = \|\varphi\|_{L^\infty(U_2, U_3)}$ for all $x \in U_1$ we obtain the assertion for $\varphi \circ h$. The assertion for $H \circ \varphi$ can be established analogously. □

Theorem 14

Let Ω ⊂ R N be a bounded C ,α (cid:48) domain for some α (cid:48) ∈ (0 , . Let β > and γ ≥ γ ≥ γ > and δ ≥ δ ≥ δ > . By u = u ( p ) ∈ H ( Ω ) we denote for each p ∈ L ∞ ( Ω ) the unique weak solution of  − div (cid:32)(cid:34) γ + β (cid:112) δ + |∇ u | (cid:35) ∇ u (cid:33) + γu = p in Ω, (cid:32)(cid:34) γ + β (cid:112) δ + |∇ u | (cid:35) ∇ u, ν (cid:33) = 0 on Γ. (23) Then for every b > there exists α ∈ (0 , such that u : B b → C ,α ( Ω ) is well-deﬁned and Lipschitz continuous, i.e. (cid:107) u ( p ) − u ( p ) (cid:107) C ,α ( Ω ) ≤ L (cid:107) p − p (cid:107) L ∞ ( Ω ) for all p , p ∈ B b ⊂ L ∞ ( Ω ) and some L > . The constants L and α are independent of γ and δ ,but may depend on α (cid:48) , Ω , N , β , b , γ , γ , δ and δ .Proof Let b > p , p ∈ L ∞ ( Ω ) with (cid:107) p (cid:107) L ∞ ( Ω ) , (cid:107) p (cid:107) L ∞ ( Ω ) < b . Part 1: Showing existence of u , u ∈ H ( Ω ) . For i = 1 , F i : H ( Ω ) → R ,v (cid:55)→ γ (cid:107) v (cid:107) H ( Ω ) + β (cid:90) Ω (cid:113) δ + |∇ v | d x − ( p i , v ) L ( Ω ) . Invoking the convexity of ψ δ , cf. Lemma 2, we obtain that F i is strongly convex, whichimplies the existence of a unique minimizer u i ∈ H ( Ω ). Since F i is Fr´echet diﬀerentiable byLemma 17, we have F (cid:48) ( u i ) = 0 in H ( Ω ) ∗ , which is equivalent to (23). Part 2: Showing u , u ∈ L ∞ ( Ω ) and an estimate for (cid:107) u (cid:107) L ∞ ( Ω ) and (cid:107) u (cid:107) L ∞ ( Ω ) Fix

M > γ − b and let u i,M := min( M, max( − M, u i )), i = 1 ,

2. For any N (cid:51) p ≥ ∇ (cid:0) u p − i,M (cid:1) = (2 p − u p − i ∇ u i · {− M γ − b by assumption we conclude that (cid:107) u i (cid:107) L ∞ ( Ω ) ≤ γ − b for i = 1 , . (24) path-following inexact Newton method for optimal control in BV 35 Part 3: Obtaining C ,α regularity of u , u We use Theorem 13 to establish the C ,α ( Ω )-regularity of u and u . We apply it with m = 0, A ( x, u, η ) = γη + βη/ (cid:112) δ + | η | , B ( x, u, η ) = p i ( x ) for i = 1 , κ = 0, identical valuesfor α (cid:48) , λ = γ , Λ = max { b , γ N + δ − / β ( N + N ) } and M = γ − b , cf. (24). Since A isindependent of ( x, u ) and continuously diﬀerentiable, it is easy to see that the requirementsof Theorem 13 are met. This shows u , u ∈ C ,α ( Ω ) for some α > (cid:107) u i (cid:107) C ,α ( Ω ) ≤ C, (25)where C > α ∈ (0 ,

1) depend only on the quantities α (cid:48) , Ω , N , Λ/λ = γ − Λ and M = γ − b . Part 4: Lipschitz continuity of p (cid:55)→ u ( p )Taking the diﬀerence of the weak formulations supplies (cid:90) Ω ∇ ϕ T (cid:32) γ ∇ ˜ u + β ∇ u (cid:112) δ + |∇ u | − β ∇ u (cid:112) δ + |∇ u | (cid:33) + γϕ ˜ u d x = (cid:90) Ω ϕ ˜ p d x ∀ ϕ ∈ H ( Ω ) , (26)where we abbreviated ˜ u := u − u and ˜ p := p − p . The function H : R N → R given by H ( v ) := (cid:112) δ + | v | is convex. Let t ∈ [0 ,

1] and denote by u τ : Ω → R the C ,α ( Ω ) function u τ ( x ) := u ( x ) + τ ˜ u ( x ). For every x ∈ Ω it holds that ∇ u ( x ) (cid:112) δ + |∇ u ( x ) | − ∇ u ( x ) (cid:112) δ + |∇ u ( x ) | = ∇ H (cid:0) ∇ u ( x ) (cid:1) − ∇ H (cid:0) ∇ u ( x ) (cid:1) = (cid:90) ∇ H ( ∇ u τ ( x )) d τ ∇ ˜ u ( x ) , where the integral is understood componentwise. Together with (26) we infer that ˜ u satisﬁes  − div (cid:16) ˜ A ∇ ˜ u (cid:17) + γ ˜ u = ˜ p in Ω,∂ ν ˜ A ˜ u = 0 on Γ, where ˜ A : Ω → R N × N is given by˜ A ( x ) := γI + β (cid:90) ∇ H ( ∇ u τ ( x )) d τ. In order to apply Theorem 12 to this PDE, we show ˜ A ∈ C ,α ( Ω, R N × N ). The convexity of H implies that ∇ H is positive semi-deﬁnite. Thus we ﬁnd for any v ∈ R N and any x ∈ Ωv T ˜ A ( x ) v ≥ γ | v | ≥ γ | v | . For x ∈ Ω and 1 ≤ i, j ≤ N it holds that | ˜ A ij ( x ) | ≤ γ + β (cid:90) (cid:12)(cid:12)(cid:12)(cid:2) ∇ H ( ∇ u τ ( x )) (cid:3) ij (cid:12)(cid:12)(cid:12) d τ ≤ γ + β sup τ ∈ [0 , (cid:13)(cid:13)(cid:13)(cid:2) ∇ H ( ∇ u τ ) (cid:3) ij (cid:13)(cid:13)(cid:13) L ∞ ( Ω ) . We also have for all x, y ∈ Ω (cid:12)(cid:12)(cid:12) ˜ A ij ( x ) − ˜ A ij ( y ) (cid:12)(cid:12)(cid:12) ≤ β (cid:90) (cid:12)(cid:12)(cid:12)(cid:12)(cid:104) ∇ H ( ∇ u τ ( x )) − ∇ H ( ∇ u τ ( y )) (cid:105) ij (cid:12)(cid:12)(cid:12)(cid:12) d τ ≤ β sup τ ∈ [0 , (cid:12)(cid:12)(cid:12)(cid:12)(cid:104) ∇ H ( ∇ u τ ( x )) − ∇ H ( ∇ u τ ( y )) (cid:105) ij (cid:12)(cid:12)(cid:12)(cid:12) ≤ β sup τ ∈ [0 , (cid:12)(cid:12)(cid:12)(cid:12)(cid:104) ∇ H ( ∇ u τ ) (cid:105) ij (cid:12)(cid:12)(cid:12)(cid:12) C ,α ( Ω ) | x − y | α , | ˜ A ij | C ,α ( Ω ) ≤ β sup τ ∈ [0 , | [ ∇ H ( u τ )] ij | C ,α ( Ω ) . Together, we infer that (cid:13)(cid:13) ˜ A ij (cid:13)(cid:13) C ,α ( Ω ) ≤ γ + 2 β sup τ ∈ [0 , (cid:13)(cid:13)(cid:13)(cid:2) ∇ H ( ∇ u τ ) (cid:3) ij (cid:13)(cid:13)(cid:13) C ,α ( Ω ) (27)for all 1 ≤ i, j ≤ N . 
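The mean value identity used above, $\nabla H(b) - \nabla H(a) = \int_0^1 \nabla^2 H(a + \tau (b - a)) \, d\tau \, (b - a)$ for $H(v) = \sqrt{\delta + |v|^2}$, together with the positive semi-definiteness of the averaged Hessian, can be checked numerically at a single point. The following is a minimal sketch and not part of the paper: the function names, the sample vectors `a` and `b` (standing in for $\nabla u_1(x)$ and $\nabla u_2(x)$), the value of `delta` and the midpoint quadrature are all illustrative assumptions.

```python
import math


def grad_H(v, delta):
    """Gradient of H(v) = sqrt(delta + |v|^2), i.e. v / sqrt(delta + |v|^2)."""
    s = math.sqrt(delta + sum(x * x for x in v))
    return [x / s for x in v]


def hess_H(v, delta):
    """Hessian of H: I / s - v v^T / s^3 with s = sqrt(delta + |v|^2)."""
    n = len(v)
    s = math.sqrt(delta + sum(x * x for x in v))
    return [[(1.0 if i == j else 0.0) / s - v[i] * v[j] / s ** 3
             for j in range(n)] for i in range(n)]


def averaged_hessian(a, b, delta, m=4000):
    """Midpoint-rule approximation of int_0^1 hess_H(a + tau (b - a)) dtau."""
    n = len(a)
    acc = [[0.0] * n for _ in range(n)]
    for k in range(m):
        tau = (k + 0.5) / m
        v = [a[i] + tau * (b[i] - a[i]) for i in range(n)]
        h = hess_H(v, delta)
        for i in range(n):
            for j in range(n):
                acc[i][j] += h[i][j] / m
    return acc


def matvec(M, x):
    """Matrix-vector product for nested lists."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


# Illustrative data (assumptions, not taken from the paper).
delta = 0.1
a = [0.3, -1.2]   # plays the role of grad u_1(x)
b = [-0.7, 0.4]   # plays the role of grad u_2(x)
d = [b[i] - a[i] for i in range(2)]

lhs = [gb - ga for gb, ga in zip(grad_H(b, delta), grad_H(a, delta))]
avg = averaged_hessian(a, b, delta)
rhs = matvec(avg, d)

# For a symmetric 2x2 matrix, positive trace and determinant imply
# positive definiteness.
trace = avg[0][0] + avg[1][1]
det = avg[0][0] * avg[1][1] - avg[0][1] * avg[1][0]
```

In this sketch, adding $\gamma I$ to $\beta$ times the averaged Hessian gives the pointwise analogue of the matrix $\tilde A(x)$, which is why its ellipticity constant is at least $\gamma$.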
From Lemma 18 we obtain for every ﬁxed 1 ≤ i, j ≤ N (cid:13)(cid:13)(cid:13)(cid:2) ∇ H ( ∇ u τ ) (cid:3) ij (cid:13)(cid:13)(cid:13) C ,α ( Ω ) ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:112) δ + |∇ u τ | (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) C ,α ( Ω ) + (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∂ x i u τ ∂ x j u τ (cid:112) δ + |∇ u τ | (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) C ,α ( Ω ) ≤ C (cid:18) (cid:13)(cid:13)(cid:13)(cid:113) δ + |∇ u τ | (cid:13)(cid:13)(cid:13) C ,α ( Ω ) + (cid:13)(cid:13) ∇ u τ (cid:13)(cid:13) C ,α ( Ω ) (cid:13)(cid:13)(cid:13)(cid:0) δ + |∇ u τ | (cid:1) − (cid:13)(cid:13)(cid:13) C ,α ( Ω ) (cid:19) , where C only depends on δ . Since (cid:107)∇ u (cid:107) C ,α ( Ω ) , (cid:107)∇ u (cid:107) C ,α ( Ω ) ≤ C by (25), there holds (cid:107)∇ u τ (cid:107) C ,α ( Ω ) ≤ C with the same C >

0. This C only depends on α (cid:48) , Ω , N , β , b , γ , γ and δ . This and Lemma 18 show (cid:13)(cid:13)(cid:13)(cid:2) ∇ H ( ∇ u τ ) (cid:3) ij (cid:13)(cid:13)(cid:13) C ,α ( Ω ) ≤ C (cid:16) (cid:13)(cid:13)(cid:13)(cid:113) δ + |∇ u τ | (cid:13)(cid:13)(cid:13) C ,α ( Ω ) + (cid:13)(cid:13)(cid:13)(cid:113) δ + |∇ u τ | (cid:13)(cid:13)(cid:13) C ,α ( Ω ) (cid:17) ≤ C (cid:16) (cid:16) √ δ + (cid:107)∇ u τ (cid:107) C ,α ( Ω ) (cid:17) (cid:17) ≤ C, where C > τ and only depends on the quantities stated in the theorem.Hence, with the same C there holdssup τ ∈ [0 , (cid:13)(cid:13)(cid:13)(cid:2) ∇ H ( ∇ u τ ) (cid:3) ij (cid:13)(cid:13)(cid:13) C ,α ( Ω ) ≤ C ∀ ≤ i, j ≤ N. Inserting this into (27) yields ˜ A ∈ C ,α ( Ω, R N × N ) with (cid:107) ˜ A (cid:107) C ,α ( Ω ) ≤ γ + 2 βC , soTheorem 12 is applicable. We obtain (cid:107) ˜ u (cid:107) C ,α ( Ω ) ≤ C (cid:107) ˜ p (cid:107) L ∞ ( Ω ) , where C only depends onthe claimed quantities. This proves the asserted Lipschitz continuity of p (cid:55)→ u ( p ). (cid:117)(cid:116) C The original problem: Optimality conditions

The first order optimality conditions of (ROC) can be obtained by use of [8]. The space $W^q(\operatorname{div}; \Omega)$, $q \in [1, \infty)$, that appears in the following is defined in [8, Definition 10].

Theorem 15

Let $\Omega \subset \mathbb{R}^N$, $N \in \{1, 2, 3\}$, be a bounded Lipschitz domain and let $r_N = \frac{N}{N-1}$ if $N > 1$, respectively, $r_N \in [1, \infty)$ if $N = 1$. Then we have: The function $\bar u \in BV(\Omega)$ is the solution of (ROC) iff there is $\bar h \in L^\infty(\Omega, \mathbb{R}^N) \cap W^{r_N}(\operatorname{div}; \Omega)$ that satisfies $\| |\bar h| \|_{L^\infty(\Omega)} \le \beta$ and $\operatorname{div} \bar h = \bar p$, where $\bar p$ is defined as in section 2.1, as well as
\[
\begin{aligned}
\bar h &= \beta \frac{\nabla \bar u_a}{|\nabla \bar u_a|} && \mathcal{L}^N\text{-a.e. in } \Omega \setminus \{x : \nabla \bar u_a(x) = 0\}, \\
T \bar h &= \beta \frac{\bar u^+(x) - \bar u^-(x)}{|\bar u^+(x) - \bar u^-(x)|} \nu_{\bar u} && \mathcal{H}^{N-1}\text{-a.e. in } J_{\bar u}, \\
T \bar h &= \beta \sigma_{C_{\bar u}} && |\nabla \bar u_c|\text{-a.e.}
\end{aligned}
\]
Here, the first, second and third equation correspond to the absolutely continuous part, the jump part, respectively, the Cantor part of the vector measure $\nabla \bar u$. Also, $\sigma_{C_{\bar u}}$ is the Radon-Nikodym density of $\nabla \bar u_c$ with respect to $|\nabla \bar u_c|$, cf. e.g. [9, Theorem 9.1]. Moreover, $\nu_{\bar u}$ is the jump direction of $\bar u$ and $J_{\bar u}$ denotes the discontinuity set of $\bar u$ in the sense of [4, Definition 3.63]. Further, $\mathcal{H}^{N-1}$ is the Hausdorff measure of $J_{\bar u}$. The operator $T : \operatorname{dom}(T) \subset W^{\operatorname{div},q}(\Omega) \cap L^\infty(\Omega, \mathbb{R}^N) \to L^1(\Omega, \mathbb{R}^N, |\nabla u|)$ is called the full trace operator and is introduced in [8, Definition 12]. We emphasize that $\bar h \in \operatorname{dom}(T)$.

Proof

The well-known optimality condition $0 \in \partial j(\bar u)$ from convex analysis can be expressed as $-\frac{\bar p}{\beta} \in \partial |\bar u|_{BV(\Omega)}$, so the claim follows from [8, Proposition 8]. □

Remark 8

Theorem 15 implies the sparsity relation $\{x : \nabla \bar u_a(x) \neq 0\} \subset \{x : |\bar h(x)| = \beta\}$. Since $\{x : |\bar h(x)| = \beta\}$ typically has small Lebesgue measure (often: measure 0), $\bar u$ is usually constant a.e. in large parts (often: all) of $\Omega$; cf. also the example in section D.

D An example with explicit solution

Using rotational symmetry we construct an example for (OC) for N = 2 with an explicitsolution. We let A = − ∆ and c ≡ h : [0 , ∞ ) → R ,ˆ h ( r ) := β (cos( πR r ) −

1) and Ω := B R (0) \ B R (0), where the parameters R > β > r ( x, y ) := (cid:112) x + y , ¯ h ( x, y ) := ˆ h ( r ( x, y )) ∇ r ( x, y ) and ¯ u ( x, y ) := 1 ( R, R ) ( r ( x, y )) , all of which are deﬁned on Ω . The problem data is given by¯ p := div ¯ h, ¯ y := S ¯ u and y Ω := ∆ ¯ p + ¯ y. We now show that these quantities satisfy the properties of Theorem 15. By construction ¯ y and¯ p are the state and adjoint state associated to ¯ u and we have ¯ p = div ¯ h . We check the propertiesof ¯ h . Since |∇ r | = 1 for ( x, y ) ∈ Ω , we obtain | ¯ h ( x, y ) | = | ˆ h ( r ( x, y )) | ≤ β β . We also seethat ¯ h is C in ¯ Ω and satisﬁes ¯ h = 0 on ∂Ω so that ¯ h ∈ L ∞ ( Ω, R N ) ∩ W q (div; Ω ) for any q ∈ [1 , ∞ ). By [8, Proposition 6] we have T ¯ h = ¯ h . As ∇ ¯ u ( x, y ) = −∇ r ( x, y ) H ∂B R (0) ( x, y ),we ﬁnd that ∇ ¯ u has no Cantor part and no parts that are absolutely continuous withrespect to the Lebesgue measure. Thus, the ﬁrst and third condition on ¯ h in Theorem 15 aretrivially satisﬁed. For ( x, y ) ∈ ∂B R (0) = J ¯ u we have ¯ h ( x, y ) = − β ∇ r ( x, y ) = − βν ¯ u and¯ u + ( x ) = 0, ¯ u − ( x ) = 1 for x ∈ J ¯ u , hence the second condition on ¯ h in Theorem 15 holds. Letus conﬁrm that ¯ p satisﬁes the homogeneous Dirichlet boundary conditions. From ∆r = r − and |∇ r | = 1 we obtain¯ p = div ¯ h = ∇ ˆ h ( r ) T ∇ r + ˆ h ( r ) ∆r = ˆ h (cid:48) ( r ) |∇ r | + r − ˆ h ( r ) = ˆ h (cid:48) ( r ) + r − ˆ h ( r ) . Thus, ¯ p satisﬁes the boundary conditions. Let us conﬁrm that ¯ y satisﬁes the boundaryconditions. The Ansatz ¯ y ( x, y ) = ˆ y ( r ( x, y )), with ˆ y : Ω → R to be determined, yields − ( R, R/ ( r ) = − ¯ u ( x, y ) = ∆ ¯ y ( x, y ) = div(ˆ y (cid:48) ( r ) ∇ r ) = ˆ y (cid:48)(cid:48) ( r ) + r − ˆ y (cid:48) ( r ) . 
This leads to ˆ y ( r ) = (cid:40) − r + A ln( r/ (2 R )) + B if r ∈ ( R, R/ ,C ln( r/ (2 R )) if r ∈ (3 R/ , R ) , and it is straightforward to check that ¯ y satisﬁes the boundary conditions and is continuouslydiﬀerentiable for the parameters A = R ·

18 ln(3 / − / , B = 9 R (cid:18) − ln(3 / (cid:19) and C = R ·

18 ln(3 / − / , All in all, the optimality conditions of Theorem 15 are satisﬁed. Moreover, the optimal valuein this example is given by j (¯ u ) = 12 (cid:107) ¯ y − y Ω (cid:107) L ( Ω ) + β | ¯ u | BV( Ω ) = 12 (cid:107) ∆ ¯ p (cid:107) L ( Ω ) + β | ¯ u | BV( Ω ) , which for R = 2 π results in j (¯ u ) = β π (cid:18) π + ln(8) + 154 Ci(2 π ) −

274 Ci(4 π ) + 3 Ci(8 π ) (cid:19) + 6 π β ≈ . β + 59 . β with Ci( t ) := − (cid:82) ∞ t cos ττ d τ . Acknowledgements

Dominik Hafemeyer acknowledges support from the graduate program TopMath of the Elite Network of Bavaria and the TopMath Graduate Center of TUM Graduate School at Technische Universität München. He received a scholarship from the Studienstiftung des deutschen Volkes. He receives support from the IGDK Munich-Graz. Funded by the Deutsche Forschungsgemeinschaft, grant no 188264188/GRK1754.

References

1. Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Probl. (6), 1217–1229 (1994). DOI 10.1088/0266-5611/10/6/003
2. Allendes, A., Fuica, F., Otárola, E.: Adaptive finite element methods for sparse PDE-constrained optimization. IMA Journal of Numerical Analysis (3), 2106–2142 (2019). DOI 10.1093/imanum/drz025
3. Alnæs, M.S., Blechta, J., Hake, J., Johansson, A., Kehlet, B., Logg, A., Richardson, C., Ring, J., Rognes, M.E., Wells, G.N.: The FEniCS project version 1.5. Archive of Numerical Software (100), 9–23 (2015). DOI 10.11588/ans.2015.100.20553
4. Ambrosio, L., Fusco, N., Pallara, D.: Functions of bounded variation and free discontinuity problems. Oxford Mathematical Monographs. The Clarendon Press, Oxford University Press (2000)
5. Attouch, H., Buttazzo, G., Michaille, G.: Variational analysis in Sobolev and BV spaces. Applications to PDEs and optimization. 2nd revised ed., MPS/SIAM Series on Optimization, vol. 6. SIAM (2014). DOI 10.1137/1.9781611973488
6. Bartels, S.: Total variation minimization with finite elements: convergence and iterative solution. SIAM J. Numer. Anal. (3), 1162–1180 (2012). DOI 10.1137/11083277X
7. Bergounioux, M., Bonnefond, X., Haberkorn, T., Privat, Y.: An optimal control problem in photoacoustic tomography. Math. Models Methods Appl. Sci. (12), 2525–2548 (2014). DOI 10.1142/S0218202514500286
8. Bredies, K., Holler, M.: A pointwise characterization of the subdifferential of the total variation functional (2012). Preprint, IGDK1754
9. Brokate, M., Kersting, G.: Measure and integral. Cham: Birkhäuser/Springer (2015). DOI 10.1007/978-3-319-15365-0
10. Casas, E., Clason, C., Kunisch, K.: Approximation of elliptic control problems in measure spaces with sparse solutions. SIAM J. Control Optim. (4), 1735–1752 (2012). DOI 10.1137/110843216
11. Casas, E., Clason, C., Kunisch, K.: Parabolic control problems in measure spaces with sparse solutions. SIAM J. Control Optim. (1), 28–63 (2013)
12. Casas, E., Kogut, P.I., Leugering, G.: Approximation of optimal control problems in the coefficient for the p-Laplace equation. I: Convergence result. SIAM J. Control Optim. (3), 1406–1422 (2016). DOI 10.1137/15M1028108
13. Casas, E., Kruse, F., Kunisch, K.: Optimal control of semilinear parabolic equations by BV-functions. SIAM J. Control Optim. (3), 1752–1788 (2017). DOI 10.1137/16M1056511
14. Casas, E., Kunisch, K.: Optimal control of semilinear elliptic equations in measure spaces. SIAM J. Control Optim. (1), 339–364 (2014). DOI 10.1137/13092188X
15. Casas, E., Kunisch, K.: Analysis of optimal control problems of semilinear elliptic equations by BV-functions. Set-Valued Var. Anal. (2), 355–379 (2019). DOI 10.1007/s11228-018-0482-7
16. Casas, E., Kunisch, K., Pola, C.: Some applications of BV functions in optimal control and calculus of variations. ESAIM, Proc. , 83–96 (1998). DOI 10.1051/proc:1998022
17. Casas, E., Kunisch, K., Pola, C.: Regularization by functions of bounded variation and applications to image enhancement. Appl. Math. Optim. (2), 229–257 (1999). DOI 10.1007/s002459900124
18. Casas, E., Ryll, C., Tröltzsch, F.: Sparse optimal control of the Schlögl and Fitzhugh-Nagumo systems. Comput. Methods Appl. Math. (4), 415–442 (2013). DOI 10.1515/cmam-2013-0016
19. Casas, E., Vexler, B., Zuazua, E.: Sparse initial data identification for parabolic PDE and its finite element approximations. Math. Control Relat. Fields (3), 377–399 (2015). DOI 10.3934/mcrf.2015.5.377
20. Chan, T.F., Zhou, H.M., Chan, R.H.: Continuation method for total variation denoising problems. In: F.T. Luk (ed.) Advanced Signal Processing Algorithms, vol. 2563, pp. 314–325. International Society for Optics and Photonics, SPIE (1995). DOI 10.1117/12.211408
21. Clason, C., Kruse, F., Kunisch, K.: Total variation regularization of multi-material topology optimization. ESAIM Math. Model. Numer. Anal. (1), 275–303 (2018). DOI 10.1051/m2an/2017061
22. Clason, C., Kunisch, K.: A duality-based approach to elliptic control problems in non-reflexive Banach spaces. ESAIM Control Optim. Calc. Var. (1), 243–266 (2011). DOI 10.1051/cocv/2010003
23. Engel, S., Kunisch, K.: Optimal control of the linear wave equation by time-depending BV-controls: A semi-smooth Newton approach. Math. Control Relat. Fields (3), 591–622 (2020). DOI 10.3934/mcrf.2020012
24. Engel, S., Vexler, B., Trautmann, P.: Optimal finite element error estimates for an optimal control problem governed by the wave equation with controls of bounded variation. IMA Journal of Numerical Analysis (2020). DOI 10.1093/imanum/draa032
25. Ern, A., Guermond, J.L.: Theory and practice of finite elements, Applied Mathematical Sciences, vol. 159. Springer (2004). DOI 10.1007/978-1-4757-4355-5
26. Grisvard, P.: Elliptic problems in nonsmooth domains, vol. 69, reprint of the 1985 hardback edn. SIAM (2011). DOI 10.1137/1.9781611972030
27. Hafemeyer, D.: Optimale Steuerung von Differentialgleichungen mit BV-Funktionen. Bachelor's thesis (2016)
28. Hafemeyer, D.: Regularization and Discretization of a BV-controlled Elliptic Problem: A Completely Adaptive Approach. Master's thesis (2017)
29. Hafemeyer, D.: Optimal control of parabolic obstacle problems - optimality conditions and numerical analysis. Dissertation, Technische Universität München, München (2020)
30. Hafemeyer, D., Mannel, F., Neitzel, I., Vexler, B.: Finite element error estimates for one-dimensional elliptic optimal control by BV functions. Math. Control Relat. Fields (2), 333–363 (2020). DOI 10.3934/mcrf.2019041
31. Herzog, R., Stadler, G., Wachsmuth, G.: Directional sparsity in optimal control of partial differential equations. SIAM J. Control Optim. (2), 943–963 (2012). DOI 10.1137/100815037
32. Kato, T.: Perturbation theory for linear operators. Die Grundlehren der mathematischen Wissenschaften, Band 132. Springer (1966). DOI 10.1007/978-3-662-12678-3
33. Kelley, C.T.: Iterative methods for linear and nonlinear equations, vol. 16. SIAM (1995). DOI 10.1137/1.9781611970944
34. Li, C., Stadler, G.: Sparse solutions in optimal control of PDEs with uncertain parameters: the linear case. SIAM J. Control Optim. (1), 633–658 (2019). DOI 10.1137/18M1181419
35. Li, D., Fukushima, M.: A derivative-free line search and global convergence of Broyden-like method for nonlinear equations. Optim. Methods Softw. (3), 181–201 (2000). DOI 10.1080/10556780008805782
36. Lieberman, G.M.: Boundary regularity for solutions of degenerate elliptic equations. Nonlinear Anal. (11), 1203–1219 (1988). DOI 10.1016/0362-546X(88)90053-3
37. Logg, A., Mardal, K.A., Wells, G.N., et al.: Automated Solution of Differential Equations by the Finite Element Method. Springer (2012). DOI 10.1007/978-3-642-23099-8
38. Logg, A., Wells, G.N.: Dolfin: Automated finite element computing. ACM Transactions on Mathematical Software (2) (2010). DOI 10.1145/1731022.1731030
39. Logg, A., Wells, G.N., Hake, J.: DOLFIN: a C++/Python Finite Element Library, chap. 10. Springer (2012). DOI 10.1007/978-3-642-23099-8_10
40. Meyers, N.G., Ziemer, W.P.: Integral Inequalities of Poincaré and Wirtinger Type for BV Functions. American Journal of Mathematics (6), 1345–1360 (1977). DOI 10.2307/2374028
41. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D , 259–268 (1992). DOI 10.1016/0167-2789(92)90242-F
42. Rudin, W.: Real and complex analysis, third edn. McGraw-Hill Book Co. (1987)
43. Savaré, G.: Regularity results for elliptic equations in Lipschitz domains. J. Funct. Anal. (1), 176–201 (1998). DOI 10.1006/jfan.1997.3158
44. Schiela, A.: An interior point method in function space for the efficient solution of state constrained optimal control problems. Math. Program. (1-2 (A)), 83–114 (2013). DOI 10.1007/s10107-012-0595-y
45. Stadler, G.: Elliptic optimal control problems with $L^1$-control cost and applications for the placement of control devices. Comput. Optim. Appl. (2), 159–181 (2009). DOI 10.1007/s10589-007-9150-9
46. Troianiello, G.M.: Elliptic differential equations and obstacle problems. The University Series in Mathematics. Plenum Press, New York (1987). DOI 10.1007/978-1-4899-3614-1
47. Wathen, A.J.: Realistic eigenvalue bounds for the Galerkin mass matrix. IMA J. Numer. Anal. 7