(Non)uniqueness of critical points in variational data assimilation
Graham Cox
Abstract.
In this paper we apply the 4D-Var data assimilation scheme to the initialization problem for a family of quasilinear evolution equations. The resulting variational problem is non-convex, so it need not have a unique minimizer. We comment on the implications of non-uniqueness in numerical applications, then prove uniqueness results in the following situations: 1) the observational times are all sufficiently small; 2) the prior covariance is sufficiently small. We also give an example of a data set for which the cost functional has a critical point of arbitrarily large Morse index.
Keywords.
Variational data assimilation; Inverse problems; Quasilinear evolution equations.

1. Introduction
An important problem in data assimilation is to estimate the initial state of a physical system when only given access to noisy, incomplete observations of the state at later times. To make this more precise, suppose $y(t)$ solves an evolution equation $y_t = F(y)$ in some function space $V$, and the observations of the state are given by a bounded linear operator $H \colon V \to \mathbb{R}^q$. Then given observations $z_1, \ldots, z_N \in \mathbb{R}^q$ at times $t_1 < \cdots < t_N$, one would like to find the initial condition $u = y(0)$ that best matches the empirical data.

Of course it is important to carefully formulate what is meant by the "best" initial condition, to ensure that the problem is well-posed and has a physically meaningful solution. One approach is to minimize the negative log-likelihood
$$\sum_{i=1}^N \left| R^{-1/2}\big(Hy(t_i) - z_i\big) \right|^2$$
over the set of all possible initial conditions $u$, where $R$ is the observational covariance matrix, and $y(t_i)$ is the solution to the evolution equation with initial condition $u$, evaluated at time $t_i$. However, the resulting variational problem turns out to be ill-posed, in the sense that it does not necessarily possess a minimizer in $V$. One possible resolution is to add a regularization term to the cost functional, of the form
$$J(u) := \frac12 \sum_{i=1}^N \left| R^{-1/2}\big(Hy(t_i) - z_i\big) \right|^2 + \frac{1}{2\sigma^2}\|u - u_0\|_V^2 \tag{1}$$
for some fixed $u_0 \in V$ and $\sigma > 0$.
The analytic motivation for this is clear: the cost functional is now coercive over $V$, and hence can be shown through standard variational methods to admit a minimizer (see [8] and [9] for details). From a data analysis point of view, there is a Bayesian interpretation of (1) in which the regularization term corresponds to a Gaussian prior distribution with covariance proportional to $\sigma^2$.

It is common practice (see, for instance, [2, 3, 6, 7]) to solve a suitable discretization of the regularized variational problem using a gradient-based algorithm. Implicit in the implementation of such an algorithm is the assumption of a unique minimizer for the variational problem: gradient descent methods are local, and so cannot distinguish between local and global minima. The problem of uniqueness has so far received little attention in the literature. A short-time uniqueness result for Burgers' equation appeared in [9] under the assumption of continuous-in-time observations, using the cost functional
$$\int_0^T |Hy(t) - z(t)|^2\,dt + \frac{1}{2\sigma^2}\|u - u_0\|_V^2.$$
There it was shown that the variational problem admits a unique minimizer when the maximal observation time, $T$, is sufficiently small.

The discrete problem was investigated numerically in [1], where a unique minimizer was observed as long as $\sigma > 0$.
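To fix ideas, here is a minimal numerical sketch (our illustration, not code from the paper or from [1]) of the discretized problem: an explicit finite-difference forward solver for the evolution equation, and an evaluation of the cost functional (1). The explicit Euler step is stable only for dt on the order of dx^2/2 or smaller, and all names and discretization choices are assumptions of the sketch.

```python
import numpy as np

def forward_solve(u, f, r, dt, n_steps):
    """Explicit finite-difference solver for y_t + f(y)_x = y_xx + r(y) on (0,1)
    with homogeneous Dirichlet data; u holds the m interior grid values.
    Stable only for dt <~ dx**2 / 2."""
    m = u.size
    dx = 1.0 / (m + 1)
    y = u.copy()
    states = [y.copy()]
    for _ in range(n_steps):
        ypad = np.concatenate(([0.0], y, [0.0]))          # Dirichlet padding
        y_xx = (ypad[2:] - 2.0 * ypad[1:-1] + ypad[:-2]) / dx**2
        f_x = (f(ypad[2:]) - f(ypad[:-2])) / (2.0 * dx)   # centered d/dx of f(y)
        y = y + dt * (y_xx - f_x + r(y))
        states.append(y.copy())
    return states

def cost_J(u, u0, data, obs_steps, H, R_inv_sqrt, sigma, f, r, dt):
    """Discrete analogue of the cost functional (1); u0 is the prior mean."""
    states = forward_solve(u, f, r, dt, max(obs_steps))
    misfit = sum(np.sum((R_inv_sqrt @ (H @ states[k] - z)) ** 2)
                 for k, z in zip(obs_steps, data))
    dx = 1.0 / (u.size + 1)
    du = np.diff(np.concatenate(([0.0], u - u0, [0.0]))) / dx
    reg = np.sum(du**2) * dx                              # ||u - u0||_V^2
    return 0.5 * misfit + reg / (2.0 * sigma**2)
```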
For the non-regularized $\sigma = 0$ case (corresponding to an improper prior in the Bayesian formulation) multiple minimizers were found numerically.

The goal of this paper is to give a rigorous Bayesian formulation of the variational problem for a family of quasilinear evolution equations (which includes reaction-diffusion equations and viscous conservation laws) and determine sufficient conditions to guarantee unimodality of the resulting posterior distribution.

1.1. Some notation and conventions.
Throughout we denote the $L^2(0,1)$ norm and inner product by $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$, respectively. We let $V = H_0^1(0,1)$, with norm $\|u\|_V := \|u_x\|$. This is clearly equivalent to the standard $H^1$ norm, because $\pi\|u\| \le \|u_x\|$ for any $u \in H_0^1(0,1)$. We also recall the Sobolev embedding $H_0^1(0,1) \subset L^\infty(0,1)$, with $\|u\|_\infty \le \|u_x\|$. We will frequently make use of the inequality between the arithmetic and geometric means,
$$2ab \le \lambda a^2 + \lambda^{-1} b^2 \tag{2}$$
for any positive $a$, $b$ and $\lambda$, which we refer to as the AM--GM inequality.
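For completeness (our addition, since (2) is used repeatedly below), the inequality is simply completing the square:
$$0 \le \left(\sqrt{\lambda}\,a - \lambda^{-1/2}\,b\right)^2 = \lambda a^2 - 2ab + \lambda^{-1} b^2.$$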
2. Statement of results

For the remainder of the paper we consider a quasilinear parabolic equation
$$y_t + f(y)_x = y_{xx} + r(y) \tag{3}$$
on the interval $[0,1]$, where $f$ and $r$ are both of class $C^2$. This is more than sufficient to guarantee that the initial value problem for (3) is well-posed, as will be seen in Proposition 3. The additional regularity is needed in computing the first and second variations of the cost functional. We also need to ensure that the initial value problem admits a global (in time) solution for any initial value, so that $J$ is well-defined on all of $H_0^1$. This will be the case if
$$\int_{-\infty}^0 \frac{dy}{|r(y)| + 1} = \int_0^\infty \frac{dy}{|r(y)| + 1} = \infty. \tag{4}$$
If this condition is not satisfied, there may exist initial conditions for which the solution blows up in a finite amount of time.
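To see what (4) rules out, take (our example, not from the text) $r(y) = y^2$. Then
$$\int_0^\infty \frac{dy}{|r(y)| + 1} = \int_0^\infty \frac{dy}{y^2 + 1} = \frac{\pi}{2} < \infty,$$
and correspondingly the comparison function $\psi_+$ of Lemma 1 below, which solves $\psi_+' = \psi_+^2$ with $\psi_+(0) = A > 0$, is $\psi_+(t) = A/(1 - At)$ and blows up at time $t = 1/A$. By contrast, any linearly bounded $r$ satisfies (4).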
We also assume that the observation operator $H$ is bounded on $L^2$, and hence has a bounded adjoint $H^* \colon \mathbb{R}^q \to L^2$.

Our first result is that the problem has a natural Bayesian formulation with respect to a Gaussian prior distribution, the significance of which will be discussed in Section 5. This requires a further assumption on $r$ and $f$ that will not be needed elsewhere in the paper.

Theorem 1.
Let $\mu$ denote the Gaussian measure on $L^2(0,1)$ with covariance $C = -\sigma^2\Delta^{-1}$ and mean $u_0$, and suppose that $r(y)$ and $f'(y)$ are uniformly Lipschitz. Then there is a well-defined posterior measure $\mu^z$, with Radon--Nikodym derivative
$$\frac{d\mu^z}{d\mu}(u) \propto \exp\left\{ -\frac12 \sum_{i=1}^N \left| R^{-1/2}\big(Hy(t_i) - z_i\big) \right|^2 \right\}. \tag{5}$$
Moreover, the mean and covariance of the posterior distribution are continuous functions of the data, $z = \{z_i\}$.

In fact, one has that the posterior measure is Lipschitz with respect to the Hellinger metric; the reader is referred to [8] for further details. The nontriviality of this result is due to the infinite-dimensional setting of the problem. Because there is no analog of the Lebesgue measure for infinite-dimensional spaces, one cannot define the posterior measure using the exponential of $J$ as a density, as is done in finite dimensions. Thus it is necessary to define the posterior relative to the prior distribution, and care must be taken to ensure that this density, given by (5), is in fact $\mu$-integrable and hence can be normalized. This normalizability will follow from estimates on solutions to the nonlinear evolution equation.

There is thus a Bayesian formulation of the regularized variational problem, for which the MAP (Maximum A Posteriori) estimators are precisely the global minima of the cost functional (1). With this framework in mind, we study the uniqueness and non-uniqueness of minima for $J(u)$.

We assume throughout that the data are uniformly bounded, with
$$|z_i| \le D \tag{6}$$
for all $i$. Our first result is that $J$ has a unique minimizer when all of the observational times are sufficiently small.

Theorem 2.
There is a constant $T$, depending on $N$, $D$, $\|u_0\|_V$ and $\sigma$, such that (1) has a unique global minimum in $V$ if $t_N < T$.

The time $T$ also depends on the observation operator $H$ and the observational covariance $R$, but we consider these to be fixed throughout, and hence will not explicitly note this dependence. We will similarly not mention any dependence of constants on the functions $f$ and $r$ in (3), though this dependence can easily be deduced from the proofs if desired.

The theorem is proved in Section 6 by first observing that all minimizers are contained in a fixed ball $B \subset V$, then showing that the cost functional is convex over $B$ as long as the observational times are small enough that nonlinear effects are not yet dominant. This differs from the uniqueness result in [9] because in the discrete-time case there are non-vanishing contributions to the cost functional even as $t_N \to 0$,
whereas in the continuous case the observational term
$$\int_0^T |Hy(t) - z(t)|^2\,dt$$
vanishes in the $T \to 0$ limit. For this reason we need to consider the second variation of the cost functional. In the continuous case the Euler--Lagrange equation can be expressed as a fixed-point equation for a nonlinear map that is a contraction for small $T$, but this contractive property is easily seen to fail in the discrete case, even for linear equations.

We next show that it is possible to obtain a uniqueness result for any set of observational times, provided the observational covariance is sufficiently small.

Theorem 3.
There is a constant $\sigma_0 > 0$, depending on $N$, $D$, $\|u_0\|_V$ and $t_N$, such that (1) has a unique global minimum in $V$ if $\sigma < \sigma_0$.

We will see explicitly in (15) how $f''$ and $r''$ can lead to nonconvexity in $J$. The general idea behind the preceding uniqueness theorems is thus to determine under what conditions these nonlinear effects can be dominated by the linear term coming from the Gaussian prior distribution.

It will be seen in the proofs below that Theorems 2 and 3 (as well as the uniqueness result in [9]) in fact establish the stronger result that the cost functional has a unique critical point in a closed subset of $H_0^1$ that necessarily contains any global minima. This observation could be useful in implementing a gradient descent method, because it says that one can avoid spurious local minima by ensuring that the algorithm starts in the bounded region given by Lemma 6, where $J$ is known to be convex; a sketch of such a safeguarded iteration follows.
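The following sketch is our illustration of the safeguarded iteration just described (it is not an algorithm from the paper): `grad` is any implementation of the gradient (12) below, and `A` the radius from Lemma 6; each step is projected back into the ball where $J$ is known to be convex.

```python
import numpy as np

def project(u, A, dx):
    """Rescale u into the ball ||u||_V <= A, with ||u||_V = ||u_x|| computed
    by finite differences (homogeneous Dirichlet boundary conditions)."""
    du = np.diff(np.concatenate(([0.0], u, [0.0]))) / dx
    norm_V = np.sqrt(np.sum(du**2) * dx)
    return u if norm_V <= A else u * (A / norm_V)

def safeguarded_descent(u_init, grad, A, dx, step=1e-2, tol=1e-8, max_iter=10000):
    """Gradient descent kept inside the region of Lemma 6, where J is convex
    under the hypotheses of Theorems 2 and 3, so no spurious minima are met."""
    u = project(u_init, A, dx)
    for _ in range(max_iter):
        u_new = project(u - step * grad(u), A, dx)
        if np.linalg.norm(u_new - u) < tol:
            break
        u = u_new
    return u
```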
Our final result shows that the behavior of $J$ can be rather complicated in general.

Theorem 4.

Consider a reaction-diffusion equation $y_t = y_{xx} + r(y)$ where $r(0) = r'(0) = 0$ and $r''(0) \neq 0$. Let $H$ denote projection onto the first Fourier coefficient, and set $R = 1$. Then for any positive integer $q$ and times $t_1 < \cdots < t_N$ there exist data $\{z_i\}$ and a prior $u_0$ such that $u \equiv 0$ is a critical point of $J$ with Morse index greater than or equal to $q$.

Thus there are cases in which $J$ is not globally convex, and has at least two critical points (since it is already known to have a minimizer).

3. The variational framework
We start our investigation by deriving the Euler--Lagrange equation for the variational problem (1) in the space $V$. We also compute the second variation of the cost functional, as it will be needed in proving the uniqueness theorems.

We first recall that $y$ denotes the unique solution to (3) with Dirichlet boundary conditions and $y(0) = u$. The variation of $y$ with respect to the initial value $u$, in the $v$-direction, is denoted $\eta := Dy(u)v$, and satisfies the initial value problem
$$\eta_t + [f'(y)\eta]_x = \eta_{xx} + r'(y)\eta \tag{7}$$
$$\eta(0) = v.$$
Similarly, the second variation of $y$ is denoted $\omega := D^2 y(u)(v,v)$ and satisfies
$$\omega_t + \left[ f'(y)\omega + f''(y)\eta^2 \right]_x = \omega_{xx} + r'(y)\omega + r''(y)\eta^2 \tag{8}$$
$$\omega(0) = 0.$$
We observe that $\omega \equiv 0$ if $f''$ and $r''$ vanish, which happens precisely when the forward equation is linear.
We let $p$ denote the solution to the adjoint equation
$$-p_t - f'(y)p_x = p_{xx} + r'(y)p \tag{9}$$
$$p(t_N) = 0$$
with a discontinuous jump
$$p(t_i^+) - p(t_i^-) = H^* R^{-1}\big(Hy(t_i) - z_i\big) \tag{10}$$
prescribed at each observation time, $t_i$. The definition is such that
$$\langle p_t, \eta \rangle + \langle p, \eta_t \rangle = 0 \tag{11}$$
for all $t \neq t_i$.

The first variation of the cost functional (1) can thus be written
$$DJ(u)(v) = \sum_{i=1}^N \left\langle \eta(t_i),\, H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\rangle + \frac{1}{\sigma^2} \langle v, u - u_0 \rangle_V,$$
and from the definition of $p$ we obtain
$$\sum_{i=1}^N \left\langle \eta(t_i),\, H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\rangle = \sum_{i=1}^N \left\langle \eta(t_i),\, p(t_i^+) - p(t_i^-) \right\rangle = -\langle \eta(0), p(0) \rangle - \sum_{i=1}^N \left[ \left\langle \eta(t_i), p(t_i^-) \right\rangle - \left\langle \eta(t_{i-1}), p(t_{i-1}^+) \right\rangle \right],$$
where we have set $t_0 = 0$. Then (11) implies
$$\left\langle \eta(t_i), p(t_i^-) \right\rangle - \left\langle \eta(t_{i-1}), p(t_{i-1}^+) \right\rangle = \int_{t_{i-1}}^{t_i} \frac{d}{dt} \langle p, \eta \rangle\, dt = 0$$
for each $i$, hence
$$DJ(u)v = -\langle v, p(0) \rangle + \sigma^{-2} \langle v, u - u_0 \rangle_V.$$
Integrating the first term by parts, we arrive at the following.
Proposition 1.
The $V$-gradient of $J$ is given by
$$DJ(u) = \Delta^{-1} p(0) + \sigma^{-2}(u - u_0). \tag{12}$$

To better understand this result, it is worth recalling that the solution to the adjoint equation depends on $y$, and hence on the initial condition $u$. With this dependence explicitly written as $p[y(u)]$, the Euler--Lagrange equation can be viewed as a fixed-point equation for the map
$$u \mapsto u_0 - \sigma^2 \Delta^{-1} p[y(u)](0). \tag{13}$$
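In a discrete implementation, (12) translates into the usual forward/adjoint sweep. The sketch below is our illustration, not the paper's code: it reuses `forward_solve` from the earlier snippet, `fp` and `rp` denote $f'$ and $r'$, and the grid and sign conventions are assumptions of the sketch.

```python
import numpy as np

def grad_J(u, u0, data, obs_steps, H, R_inv, sigma, f, fp, rp, r, dt):
    """V-gradient of J following (12): forward solve of (3), backward solve of
    the adjoint (9) with the jumps (10) inserted at observation times, then an
    application of the inverse Dirichlet Laplacian."""
    m = u.size
    dx = 1.0 / (m + 1)
    K = max(obs_steps)
    states = forward_solve(u, f, r, dt, K)
    jump = {k: H.T @ (R_inv @ (H @ states[k] - z))
            for k, z in zip(obs_steps, data)}
    p = np.zeros(m)
    for k in range(K, 0, -1):
        if k in jump:                       # (10): p(t-) = p(t+) - H* R^{-1}(Hy - z)
            p = p - jump[k]
        y = states[k]
        ppad = np.concatenate(([0.0], p, [0.0]))
        p_xx = (ppad[2:] - 2.0 * ppad[1:-1] + ppad[:-2]) / dx**2
        p_x = (ppad[2:] - ppad[:-2]) / (2.0 * dx)
        p = p + dt * (p_xx + fp(y) * p_x + rp(y) * p)   # backward-in-time step of (9)
    # Delta^{-1} p(0) with Dirichlet BCs: solve the (dense, for simplicity) system
    lap = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
           + np.diag(np.ones(m - 1), -1)) / dx**2
    return np.linalg.solve(lap, p) + (u - u0) / sigma**2
```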
Proceeding similarly for the second variation, we find
$$D^2 J(u)(v,v) = \sum_{i=1}^N \left[ \left\langle \omega(t_i),\, H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\rangle + \left| R^{-1/2} H \eta(t_i) \right|^2 \right] + \frac{1}{\sigma^2} \|v\|_V^2 \tag{14}$$
and
$$\sum_{i=1}^N \left\langle \omega(t_i),\, H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\rangle = -\sum_{i=1}^N \left[ \left\langle \omega(t_i), p(t_i^-) \right\rangle - \left\langle \omega(t_{i-1}), p(t_{i-1}^+) \right\rangle \right]$$
because $\omega(0) = 0$. Using (8) and (9) and integrating by parts, we find that
$$\langle p_t, \omega \rangle + \langle p, \omega_t \rangle = \left\langle r''(y)\eta^2, p \right\rangle + \left\langle f''(y)\eta^2, p_x \right\rangle,$$
with the following consequence.

Proposition 2.

The Hessian of $J$ is given by
$$D^2 J(u)(v,v) = \int_0^{t_N} \left[ \left\langle r''(y)\eta^2, p \right\rangle + \left\langle f''(y)\eta^2, p_x \right\rangle \right] dt + \sum_{i=1}^N \left| R^{-1/2} H \eta(t_i) \right|^2 + \frac{1}{\sigma^2} \|v\|_V^2. \tag{15}$$

We observe that this term is positive definite if $f'' = r'' = 0$, but the Euler--Lagrange map given in (13) may fail to be a contraction in that case.
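Formulas like (12) are easy to get wrong in code, so it is standard practice to test an adjoint gradient against a finite difference of the cost. A small sketch of ours, assuming `J` and `grad` are callables like those above:

```python
import numpy as np

def inner_V(g, v, dx):
    """Discrete V-inner product <g, v>_V = int_0^1 g_x v_x dx (Dirichlet BCs)."""
    dg = np.diff(np.concatenate(([0.0], g, [0.0])))
    dv = np.diff(np.concatenate(([0.0], v, [0.0])))
    return np.sum(dg * dv) / dx

def check_gradient(J, grad, u, v, dx, eps=1e-6):
    """Centered difference of J along v versus <grad J(u), v>_V; the two
    numbers should agree to O(eps^2) if (12) is implemented correctly."""
    fd = (J(u + eps * v) - J(u - eps * v)) / (2.0 * eps)
    return fd, inner_V(grad(u), v, dx)
```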
4. Analytic preliminaries

In this section we review some analysis for quasilinear evolution equations. We restrict our attention to equations that admit global solutions for all initial conditions, as this guarantees the cost functional is defined on all of $V$.
Suppose $r$ and $f'$ are locally Lipschitz, and (4) is satisfied. Then for any initial condition $u \in H_0^1$, (3) has a unique classical solution on $[0,1] \times (0,\infty)$.

This result is a direct consequence of the material in Chapter 3 of [4]; for the sake of completeness we verify some of the necessary details here.
Proof.
The local existence and uniqueness follows from Theorem 3.3.3 of [4], because the map $y \mapsto r(y) - f'(y)y_x$ is locally Lipschitz from $H_0^1$ to $L^2$. This yields a solution $y \in C\big([0,T); H_0^1\big)$, with $y(t) \in H_0^1 \cap H^2$ for all $t \in (0,T)$. Moreover, from Theorem 3.5.2 of [4], the map $t \mapsto y_t \in H^1$ is locally Hölder continuous. In particular this implies $y$ and $y_t$ are continuous in both $x$ and $t$. We also know that $y \in H^2$, so by the Sobolev embedding theorem $y_x \in H^1$ is Hölder continuous. We then have for each fixed $t$ that
$$y_{xx} = y_t + f(y)_x - r(y)$$
is in $C^\delta[0,1]$ for some $\delta > 0$, hence by elliptic regularity $y(t) \in C^{2,\delta}[0,1]$ and $y$ is a classical solution of (3).

The long-time existence claim follows from Corollary 3.3.5 of [4] together with the pointwise bound from Lemma 1 below. □

We now gather some estimates on $y$, $\eta$ and $p$ that will be needed in proving the uniqueness theorems. The first of these shows that $y$ is uniformly bounded on any finite time interval.
Lemma 1.
Suppose $y(t)$ solves (3) for $t \in (0,T)$, with $\|y(0)\|_\infty \le A$. Then
$$\|y(t)\|_\infty \le B \tag{16}$$
for any $t < T$, where $B$ depends on $A$ and $T$.

Proof. Let $\psi_+$ and $\psi_-$ solve the initial value problems
$$\psi_+' = |r(\psi_+)|, \qquad \psi_+(0) = A,$$
$$\psi_-' = -|r(\psi_-)|, \qquad \psi_-(0) = -A.$$
Then the parabolic maximum principle (cf. Theorem 1 of [5]) guarantees that
$$\psi_-(t) \le y(x,t) \le \psi_+(t).$$
From (4) we know that $\psi_\pm$ exist and are continuous for all $t \ge 0$, so we complete the proof by setting
$$B := \max_{0 \le t \le T} \max\{-\psi_-(t), \psi_+(t)\}. \qquad \square$$

Since $f$ and $r$ are of class $C^2$, for any given value of $A$ and $k \in \{0,1,2\}$ the quantities
$$R_k(A) := \sup_{|y| \le B} |r^{(k)}(y)| \tag{17}$$
$$F_k(A) := \sup_{|y| \le B} |f^{(k)}(y)| \tag{18}$$
are well defined, with $B$ as in the proof of Lemma 1.

We next derive an estimate on the $L^4$ norm of $\eta$. To simplify notation, we observe that $\|\eta\|_{L^4(0,1)} = \|\eta^2\|^{1/2}$.

Lemma 2.
Suppose $\eta(t)$ solves (7) for $t \in (0,T)$, and $\|y(0)\|_\infty \le A$. Then
$$\|\eta(t)^2\| \le \|v^2\|\, e^{\alpha t} \tag{19}$$
for any $t < T$, where $\alpha$ depends on $A$ and $T$.

Proof. We differentiate and then integrate by parts to obtain
$$\frac14 \frac{d}{dt} \int \eta^4\,dx = \int \eta^3 \left( \eta_{xx} - [f'(y)\eta]_x + r'(y)\eta \right) dx = \int \left( -3\eta^2\eta_x^2 + 3f'(y)\eta^3\eta_x + r'(y)\eta^4 \right) dx.$$
Then by the AM--GM inequality
$$\left| \eta^3 \eta_x \right| \le \frac{F_1(A)}{4}\eta^4 + \frac{1}{F_1(A)}\eta^2\eta_x^2,$$
so we find that
$$\frac{d}{dt} \int \eta^4\,dx \le \left( 4R_1(A) + 3F_1(A)^2 \right) \int \eta^4\,dx.$$
The result follows from Gronwall's inequality. □
We finally turn to the adjoint equation. Invoking linearity, the solution can be expressed as $p = p^1 + \cdots + p^N$, where $p^i$ satisfies (9) with terminal condition $p^i(t_N) = 0$ and a single jump
$$p^i(t_i^+) - p^i(t_i^-) = H^* R^{-1}\big(Hy(t_i) - z_i\big). \tag{20}$$
Therefore it suffices to bound each $p^i$ individually, then sum the resulting estimates.

Lemma 3.
Suppose $p(t)$ solves (9), and $\|y(0)\|_\infty \le A$. Then
$$\|p(t)\| \le Ce^{\beta(t_N - t)} \tag{21}$$
for any $t \le t_N$, and
$$\int_0^{t_N} \|p_x(t)\|\,dt \le C\sqrt{t_N}\, e^{2\beta t_N}, \tag{22}$$
where $\beta$ depends on $A$ and $t_N$, and $C$ depends on $N$, $D$, $A$ and $t_N$.

Proof. Differentiating and applying the AM--GM inequality, as in the proof of Lemma 2, we have
$$-\frac12 \frac{d}{dt}\|p\|^2 = \langle p,\ p_{xx} + r'(y)p + f'(y)p_x \rangle \le -\|p_x\|^2 + R_1(A)\|p\|^2 + F_1(A)\|p\|\|p_x\| \le \left( R_1(A) + \frac{F_1(A)^2}{4} \right) \|p\|^2,$$
and so an application of Gronwall's inequality to the function $\|p(t_i - t)\|^2$ yields
$$\|p^i(t)\| \le \left\| H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\| e^{\beta(t_i - t)}$$
for any $t \le t_i$, with $\beta = R_1(A) + F_1(A)^2/4$. We next recall that $y(t)$ is bounded uniformly (and hence in $L^2$), so
$$\left\| H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\| \le C',$$
where $C'$ depends on $A$ and $t_N$ (through Lemma 1) and $D$. To complete the proof of (21) we simply note that $p^i(t) = 0$ for $t > t_i$, then let $C = NC'$.

With a different choice of constants in the AM--GM inequality, we obtain
$$-\frac{d}{dt}\|p\|^2 \le -\|p_x\|^2 + \left( 2R_1(A) + F_1(A)^2 \right)\|p\|^2,$$
and subsequently, letting $\gamma = 2R_1(A) + F_1(A)^2$,
$$\|p_x\|^2 \le \frac{d}{dt}\left( e^{\gamma t}\|p\|^2 \right).$$
Integrating, we find
$$\int_0^{t_i} \|p_x^i(t)\|^2\,dt \le e^{\gamma t_i}\|p^i(t_i)\|^2 - \|p^i(0)\|^2 \le e^{\gamma t_i}(C')^2$$
for each $i$. Now from the Cauchy--Schwarz inequality,
$$\int_0^{t_N} \|p_x(t)\|\,dt \le \sum_{i=1}^N \left( t_i \int_0^{t_i} \|p_x^i(t)\|^2\,dt \right)^{1/2} \le NC'\sqrt{t_N}\, e^{2\beta t_N},$$
where we have used the fact that $\gamma \le 4\beta$. □

5. The Bayesian formulation
Before proving Theorem 1 we elaborate on the meaning of the Gaussian prior measure $\mu = N\big(u_0, -\sigma^2\Delta^{-1}\big)$, following throughout the presentation of [8].

We first note that the covariance operator $C = -\sigma^2\Delta^{-1}$ has eigenvalues $\gamma_n = (\sigma/n\pi)^2$, with normalized eigenfunctions $\varphi_n(x) = \sqrt{2}\sin(n\pi x)$. Then we can express a random variable $u \sim \mu$ using the Karhunen--Loève expansion:
$$u = u_0 + \sqrt{2} \sum_{n=1}^\infty \frac{\sigma \xi_n}{n\pi} \sin(n\pi x), \tag{23}$$
where $\{\xi_n\}$ is an i.i.d. sequence of $N(0,1)$ random variables.
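Expansion (23) is also how one samples from this prior in practice. A short sketch of ours, truncating the series at `n_modes` terms:

```python
import numpy as np

def sample_prior(u0, sigma, n_modes=200, rng=None):
    """Draw one sample from mu = N(u0, -sigma^2 * Delta^{-1}) on (0,1) using the
    truncated Karhunen-Loeve expansion (23); u0 is the prior mean on the grid."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(0.0, 1.0, u0.size)
    xi = rng.standard_normal(n_modes)                 # i.i.d. N(0,1)
    n = np.arange(1, n_modes + 1)
    fluct = np.sqrt(2.0) * (sigma * xi / (np.pi * n)) @ np.sin(np.pi * np.outer(n, x))
    return u0 + fluct
```

Averaging $\|u - u_0\|^2$ over many such draws reproduces the value $\sigma^2/6$ computed below.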
This means the $n$th Fourier coefficient of $u$ is distributed according to $N\big(a_n, (\sigma/n\pi)^2\big)$, where $a_n$ is the $n$th Fourier coefficient of the prior mean, $u_0$. It follows that
$$\|u - u_0\|^2 = \sum_{n=1}^\infty \frac{\sigma^2 \xi_n^2}{(n\pi)^2}$$
and so $\mathbb{E}\|u - u_0\|^2 = \sigma^2/6$. Thus $\sigma^2$ measures the expected squared distance of $u$ from the prior mean.

We now observe that Theorem 1 follows from Corollary 4.4 of [8]. To do so we must show that:

(i) $L^2(0,1)$ has full measure under $\mu$;
(ii) for every $\epsilon > 0$ there exists $M \in \mathbb{R}$ such that
$$\sum_{i=1}^N \left| R^{-1/2} H y(t_i) \right|^2 \le \exp\big( \epsilon \|y(0)\|^2 + M \big)$$
whenever $y(t)$ is a solution to (3);

(iii) for every $\rho > 0$ there exists $L \in \mathbb{R}$ such that
$$\sum_{i=1}^N \left| R^{-1/2} H \big( y^1(t_i) - y^2(t_i) \big) \right|^2 \le L \left\| y^1(0) - y^2(0) \right\|$$
whenever $y^1(t)$ and $y^2(t)$ satisfy (3) with $\max\{\|y^1(0)\|, \|y^2(0)\|\} < \rho$.

To establish (i) we use Lemma 6.25 of [8], which says that any function $u \sim \mu$ is almost surely $\alpha$-Hölder continuous for any $\alpha < 1/2$. In particular, this implies $u$ is almost surely contained in $L^2(0,1)$, i.e. $\mu[L^2(0,1)] = 1$. Since $H$ is bounded on $L^2$, (ii) and (iii) will follow from Lemmas 4 and 5 below.

Lemma 4.
Suppose $r(y)$ is uniformly Lipschitz. Then there exist positive constants $a$ and $b$ so that
$$\|y(t)\|^2 \le e^{at}\left[ \|y(0)\|^2 + bt \right] \tag{24}$$
and
$$2\int_0^t \|y_x(s)\|^2\,ds \le \|y(0)\|^2 - \|y(t)\|^2 + a\int_0^t \|y(s)\|^2\,ds + bt \tag{25}$$
for all $t \ge 0$ and any solution $y(t)$ to (3).

Proof. Differentiating, we have
$$\frac12 \frac{d}{dt}\|y\|^2 = \langle y,\ y_{xx} + r(y) - f(y)_x \rangle.$$
Letting $g(y)$ be an antiderivative of $yf'(y)$, we find that
$$\langle y, f(y)_x \rangle = \int_0^1 g(y)_x\,dx$$
vanishes by the fundamental theorem of calculus, so
$$\frac{d}{dt}\|y\|^2 \le -2\|y_x\|^2 + 2\langle y, r(y) \rangle.$$
It follows immediately from the Lipschitz condition that $|y\,r(y)| \le K|y|^2 + |r(0)||y|$, which implies $2|y\,r(y)| \le a|y|^2 + b$ for some $a$ and $b$. Then (24) is a consequence of Gronwall's inequality, and (25) is obtained by integrating from $0$ to $t$. □

To see how this implies (ii), we first observe that it suffices to prove
$$\|y(t)\|^2 \le \exp\big( \epsilon\|u\|^2 + M \big)$$
for any $t \le t_N$. Thus fixing $\epsilon > 0$ and letting
$$e^M = \max\left\{ bt_N e^{at_N},\ \epsilon^{-1} e^{at_N} \right\},$$
we have from Lemma 4 that
$$\|y(t)\|^2 \le e^M \big( 1 + \epsilon\|u\|^2 \big) \le e^M e^{\epsilon\|u\|^2},$$
as required.

Lemma 5.
Suppose $r(y)$ and $f'(y)$ are uniformly Lipschitz. Then for any $\rho > 0$ there exists a positive constant $L$ so that
$$\|y^1(t) - y^2(t)\| \le L\|u^1 - u^2\|$$
for all $t \le t_N$, provided $y^1(t)$ and $y^2(t)$ satisfy (3) with $\max\{\|y^1(0)\|, \|y^2(0)\|\} < \rho$.

Proof. For convenience we let $K_r$ and $K_f$ denote the Lipschitz constants of $r$ and $f'$, respectively. Differentiating and then integrating by parts as in the proof of Lemma 4, we have
$$\frac12 \frac{d}{dt}\|y^1 - y^2\|^2 = \left\langle y^1 - y^2,\ (y^1 - y^2)_{xx} + r(y^1) - r(y^2) - f(y^1)_x + f(y^2)_x \right\rangle \le -\|(y^1 - y^2)_x\|^2 + K_r\|y^1 - y^2\|^2 + \left\langle y^1 - y^2,\ f(y^2)_x - f(y^1)_x \right\rangle.$$
For the final term we write
$$f(y^1)_x - f(y^2)_x = f'(y^1)(y^1 - y^2)_x + \left[ f'(y^1) - f'(y^2) \right] y_x^2,$$
and thus obtain
$$\left| \left\langle y^1 - y^2,\ f(y^1)_x - f(y^2)_x \right\rangle \right| \le \left[ K_f\big( \|y_x^1\| + \|y_x^2\| \big) + |f'(0)| \right] \|y^1 - y^2\| \|(y^1 - y^2)_x\|.$$
Then after an application of the AM--GM inequality, we find that
$$\frac12 \frac{d}{dt}\|y^1 - y^2\|^2 \le \left( K_r + \frac14 \left[ K_f\big( \|y_x^1\| + \|y_x^2\| \big) + |f'(0)| \right]^2 \right) \|y^1 - y^2\|^2 = \alpha(t)\|y^1 - y^2\|^2,$$
where we have defined $\alpha(t)$ to be the term in parentheses on the right-hand side. From (25) we know that $\int_0^t \alpha(s)\,ds$ is bounded above by a constant depending only on $\|y^1(0)\|$, $\|y^2(0)\|$ and $t_N$, so the result follows from Gronwall's inequality. □

6. The uniqueness theorems
Our main tool for proving Theorems 2 and 3 will be the second variation formula (15), together with the following a priori estimate for minimizers of $J$.

Lemma 6.
Let $u^*$ achieve the infimum of the cost functional (1). Then
$$\|u^*\|_V \le A, \tag{26}$$
where $A$ depends on $N$, $D$, $t_N$, $\|u_0\|_V$ and $\sigma$.

It is clear from the proof that $A$ can be assumed to be nondecreasing with respect to $\sigma$.
Since u ∗ is a minimizer it satisfies J ( u ∗ ) ≤ J (0). Letting y ( t ) solve (3) with y (0) = 0, Lemma 1 implies that y ( t ) is uniformly bounded for t ≤ t N . Therefore k u ∗ k V ≤ σ J ( u ∗ ) ≤ σ N X i =1 (cid:12)(cid:12)(cid:12) R − / ( Hy ( t i ) − z i ) (cid:12)(cid:12)(cid:12) + k u k V is bounded above as claimed. (cid:3) We now use the estimates of Section 4 to prove that, under the conditions ofTheorems 2 and 3, J is convex on the ball k u ∗ k V ≤ A . Discarding nonnegativeterms in (15), it suffices to show that Z t N (cid:12)(cid:12)(cid:10) r ′′ ( y ) η , p (cid:11) + (cid:10) f ′′ ( y ) η , p x (cid:11)(cid:12)(cid:12) dt < σ k v k V . From Lemmas 2 and 3 we have (cid:12)(cid:12)(cid:10) r ′′ ( y ) η , p (cid:11)(cid:12)(cid:12) ≤ CR k v k e αt + β ( t N − t ) and Z t N (cid:12)(cid:12)(cid:10) f ′′ ( y ) η , p x (cid:11)(cid:12)(cid:12) dt ≤ CF k v k√ t N e ( α +2 β ) t N . Combining these estimates, we find that Z t N (cid:12)(cid:12)(cid:10) r ′′ ( y ) η , p (cid:11) + (cid:10) f ′′ ( y ) η , p x (cid:11)(cid:12)(cid:12) dt ≤ Γ k v k V √ t N , (27)where Γ depends on A (from Lemma 6), t N , N and D .It is clear that the constant Γ in (27) remains bounded as t N →
0. Thereforein proving Theorem 2 it suffices to choose t N sufficiently small that Γ √ t N < σ − .Similarly for Theorem 3, we observe that Γ remains bounded as σ → σ small enough that σ − > Γ √ t N . The non-uniqueness theorem
Turning now to the proof of Theorem 4, we must establish that u = 0 is a criticalpoint of J , and the Hessian D J (0) has at least q negative eigenvalues. The keyto the proof is the observation that the Euler–Lagrange equation depends on boththe data and the prior, whereas the Hessian is independent of the prior. Thuswe can first construct data to ensure D J (0) has the required number of negativeeigenvalues, and then choose the prior term to ensure that 0 is in fact a criticalpoint of J .The hypothesis r (0) = 0 ensures that y ( t ) = 0 is the unique solution of (3) with y (0) = 0. Then because r ′ (0) = 0, the linearized equation reduces to the heatequation, η t = η xx , and the adjoint equation becomes the backward heat equation, − p t = p xx .We compute the Hessian of J in the direction of the first q Fourier modes, setting v n = sin( nπx ) for 1 ≤ n ≤ q , so k v n k V = 1 /
2. The corresponding solution to thelinearized forward equation is η ( x, t ) = e − n π t sin( nπx )and so N X i =1 (cid:12)(cid:12)(cid:12) R − / Hη ( t i ) (cid:12)(cid:12)(cid:12) ≤ N X i =1 e − π t i For each observation z i ∈ R we have H ∗ z i = z i sin( πx ) , hence the solution to the adjoint equation is given by p ( x, t ) = X { i : t 0, the Hessian will also be negative for all v n with1 ≤ n ≤ q . NON)UNIQUENESS OF CRITICAL POINTS IN VARIATIONAL DATA ASSIMILATION 13 To complete the proof, we choose the prior u := σ ∆ − p (0) . It follows immediately from (12) that u = 0 is a critical point of J . Acknowledgments The author would like to thank Damon McDougall for numerous enlighteningconversations throughout the preparation of this work. This research has beensupported by the Office of Naval Research under the MURI grant N00014-11-1-0087. References [1] Amit Apte, Didier Auroux, and Mythily Ramaswamy. Variational data assimilation for dis-crete Burgers equation. In Proceedings of the Eighth Mississippi State-UAB Conference onDifferential Equations and Computational Simulations , volume 19 of Electron. J. Differ. Equ.Conf. , pages 15–30, San Marcos, TX, 2010. Southwest Texas State Univ.[2] Carlos Castro, Francisco Palacios, and Enrique Zuazua. An alternating descent method forthe optimal control of the inviscid Burgers equation in the presence of shocks. Math. ModelsMethods Appl. Sci. , 18(3):369–416, 2008.[3] Fran¸cois-Xavier Le Dimet and Olivier Talagrand. Variational algorithms for analysis and as-similation of meteorological observations: theoretical aspects. Tellus A , 38A(2):97–110, 1986.[4] Daniel Henry. Geometric theory of semilinear parabolic equations , volume 840 of Lecture Notesin Mathematics . Springer-Verlag, Berlin, 1981.[5] Stanley Kaplan. On the growth of solutions of quasi-linear parabolic equations. Comm. PureAppl. Math. , 16:305–330, 1963.[6] J. Lundvall, V. Kozlov, and P. Weinerfelt. Iterative methods for data assimilation for Burgers’equation. J. Inverse Ill-Posed Probl. , 14(5):505–535, 2006.[7] Antje Noack and Andrea Walther. Adjoint concepts for the optimal control of Burgers equation. Comput. Optim. Appl. , 36(1):109–133, 2007.[8] A. M. Stuart. Inverse problems: a Bayesian perspective. Acta Numer. , 19:451–559, 2010.[9] Luther W. White. A study of uniqueness for the initialization problem for Burgers’ equation. J. Math. Anal. Appl. , 172(2):412–431, 1993. E-mail address : [email protected]@email.unc.edu