(Non)uniqueness of critical points in variational data assimilation
Graham Cox
Abstract.
In this paper we apply the 4D-Var data assimilation scheme to the initialization problem for a family of quasilinear evolution equations. The resulting variational problem is non-convex, so it need not have a unique minimizer. We comment on the implications of non-uniqueness in numerical applications, then prove uniqueness results in the following situations: 1) the observational times are all sufficiently small; 2) the prior covariance is sufficiently small. We also give an example of a data set for which the cost functional has a critical point of arbitrarily large Morse index.
Keywords.
Variational data assimilation; Inverse problems; Quasilinear evolution equations.

1. Introduction
An important problem in data assimilation is to estimate the initial state of a physical system when only given access to noisy, incomplete observations of the state at later times. To make this more precise, suppose $y(t)$ solves an evolution equation $y_t = F(y)$ in some function space $V$, and the observations of the state are given by a bounded linear operator $H \colon V \to \mathbb{R}^q$. Then given observations $z_1, \ldots, z_N \in \mathbb{R}^q$ at times $t_1 < \cdots < t_N$, one would like to find the initial condition $u = y(0)$ that best matches the empirical data.

Of course it is important to carefully formulate what is meant by the "best" initial condition, to ensure that the problem is well-posed and has a physically meaningful solution. One approach is to minimize the negative log-likelihood
$$\sum_{i=1}^N \left| R^{-1/2}\big(Hy(t_i) - z_i\big) \right|^2$$
over the set of all possible initial conditions $u$, where $R$ is the observational covariance matrix, and $y(t_i)$ is the solution to the evolution equation with initial condition $u$, evaluated at time $t_i$. However, the resulting variational problem turns out to be ill-posed, in the sense that it does not necessarily possess a minimizer in $V$. One possible resolution is to add a regularization term to the cost functional, of the form
$$J(u) := \frac12 \sum_{i=1}^N \left| R^{-1/2}\big(Hy(t_i) - z_i\big) \right|^2 + \frac{1}{2\sigma^2}\|u - u_0\|_V^2 \tag{1}$$
for some fixed $u_0 \in V$ and $\sigma > 0$.
The analytic motivation for this is clear: the cost functional is now coercive over $V$, and hence can be shown through standard variational methods to admit a minimizer (see [8] and [9] for details). From a data analysis point of view, there is a Bayesian interpretation of (1) in which the regularization term corresponds to a Gaussian prior distribution with covariance proportional to $\sigma^2$.

It is common practice (see, for instance, [2, 3, 6, 7]) to solve a suitable discretization of the regularized variational problem using a gradient-based algorithm. Implicit in the implementation of such an algorithm is the assumption of a unique minimizer for the variational problem: gradient descent methods are local, and so cannot distinguish between local and global minima. The problem of uniqueness has so far received little attention in the literature. A short-time uniqueness result for Burgers' equation appeared in [9] under the assumption of continuous-in-time observations, using the cost functional
$$\int_0^T |Hy(t) - z(t)|^2\,dt + \frac{1}{2\sigma^2}\|u - u_0\|_V^2.$$
There it was shown that the variational problem admits a unique minimizer when the maximal observation time, $T$, is sufficiently small.

The discrete problem was investigated numerically in [1], where a unique minimizer was observed as long as $\sigma > 0$.
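To fix ideas, here is a minimal numerical sketch (our illustration, not code from the paper or from [1]) of the discretized problem: an explicit finite-difference forward solver for the evolution equation, and an evaluation of the cost functional (1). The explicit Euler step is stable only for dt on the order of dx^2/2 or smaller, and all names and discretization choices are assumptions of the sketch.

```python
import numpy as np

def forward_solve(u, f, r, dt, n_steps):
    """Explicit finite-difference solver for y_t + f(y)_x = y_xx + r(y) on (0,1)
    with homogeneous Dirichlet data; u holds the m interior grid values.
    Stable only for dt <~ dx**2 / 2."""
    m = u.size
    dx = 1.0 / (m + 1)
    y = u.copy()
    states = [y.copy()]
    for _ in range(n_steps):
        ypad = np.concatenate(([0.0], y, [0.0]))          # Dirichlet padding
        y_xx = (ypad[2:] - 2.0 * ypad[1:-1] + ypad[:-2]) / dx**2
        f_x = (f(ypad[2:]) - f(ypad[:-2])) / (2.0 * dx)   # centered d/dx of f(y)
        y = y + dt * (y_xx - f_x + r(y))
        states.append(y.copy())
    return states

def cost_J(u, u0, data, obs_steps, H, R_inv_sqrt, sigma, f, r, dt):
    """Discrete analogue of the cost functional (1); u0 is the prior mean."""
    states = forward_solve(u, f, r, dt, max(obs_steps))
    misfit = sum(np.sum((R_inv_sqrt @ (H @ states[k] - z)) ** 2)
                 for k, z in zip(obs_steps, data))
    dx = 1.0 / (u.size + 1)
    du = np.diff(np.concatenate(([0.0], u - u0, [0.0]))) / dx
    reg = np.sum(du**2) * dx                              # ||u - u0||_V^2
    return 0.5 * misfit + reg / (2.0 * sigma**2)
```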
For the non-regularized $\sigma = 0$ case (corresponding to an improper prior in the Bayesian formulation) multiple minimizers were found numerically.

The goal of this paper is to give a rigorous Bayesian formulation of the variational problem for a family of quasilinear evolution equations (which includes reaction-diffusion equations and viscous conservation laws) and determine sufficient conditions to guarantee unimodality of the resulting posterior distribution.

1.1. Some notation and conventions.
Throughout we denote the $L^2(0,1)$ norm and inner product by $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$, respectively. We let $V = H_0^1(0,1)$, with norm $\|u\|_V := \|u_x\|$. This is clearly equivalent to the standard $H^1$ norm, because $\pi\|u\| \le \|u_x\|$ for any $u \in H_0^1(0,1)$. We also recall the Sobolev embedding $H_0^1(0,1) \subset L^\infty(0,1)$, with $\|u\|_\infty \le \|u_x\|$. We will frequently make use of the inequality between the arithmetic and geometric means,
$$2ab \le \lambda a^2 + \lambda^{-1} b^2 \tag{2}$$
for any positive $a$, $b$ and $\lambda$, which we refer to as the AM--GM inequality.
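For completeness (our addition, since (2) is used repeatedly below), the inequality is simply completing the square:
$$0 \le \left(\sqrt{\lambda}\,a - \lambda^{-1/2}\,b\right)^2 = \lambda a^2 - 2ab + \lambda^{-1} b^2.$$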
2. Statement of results

For the remainder of the paper we consider a quasilinear parabolic equation
$$y_t + f(y)_x = y_{xx} + r(y) \tag{3}$$
on the interval $[0,1]$, where $f$ and $r$ are both of class $C^2$. This is more than sufficient to guarantee that the initial value problem for (3) is well-posed, as will be seen in Proposition 3. The additional regularity is needed in computing the first and second variations of the cost functional. We also need to ensure that the initial value problem admits a global (in time) solution for any initial value, so that $J$ is well-defined on all of $H_0^1$. This will be the case if
$$\int_{-\infty}^0 \frac{dy}{|r(y)| + 1} = \int_0^\infty \frac{dy}{|r(y)| + 1} = \infty. \tag{4}$$
If this condition is not satisfied, there may exist initial conditions for which the solution blows up in a finite amount of time.
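To see what (4) rules out, take (our example, not from the text) $r(y) = y^2$. Then
$$\int_0^\infty \frac{dy}{|r(y)| + 1} = \int_0^\infty \frac{dy}{y^2 + 1} = \frac{\pi}{2} < \infty,$$
and correspondingly the comparison function $\psi_+$ of Lemma 1 below, which solves $\psi_+' = \psi_+^2$ with $\psi_+(0) = A > 0$, is $\psi_+(t) = A/(1 - At)$ and blows up at time $t = 1/A$. By contrast, any linearly bounded $r$ satisfies (4).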
We also assume that the observation operator $H$ is bounded on $L^2$, and hence has a bounded adjoint $H^* \colon \mathbb{R}^q \to L^2$.

Our first result is that the problem has a natural Bayesian formulation with respect to a Gaussian prior distribution, the significance of which will be discussed in Section 5. This requires a further assumption on $r$ and $f$ that will not be needed elsewhere in the paper.

Theorem 1.
Let $\mu$ denote the Gaussian measure on $L^2(0,1)$ with covariance $C = -\sigma^2\Delta^{-1}$ and mean $u_0$, and suppose that $r(y)$ and $f'(y)$ are uniformly Lipschitz. Then there is a well-defined posterior measure $\mu^z$, with Radon--Nikodym derivative
$$\frac{d\mu^z}{d\mu}(u) \propto \exp\left\{ -\frac12 \sum_{i=1}^N \left| R^{-1/2}\big(Hy(t_i) - z_i\big) \right|^2 \right\}. \tag{5}$$
Moreover, the mean and covariance of the posterior distribution are continuous functions of the data, $z = \{z_i\}$.

In fact, one has that the posterior measure is Lipschitz with respect to the Hellinger metric; the reader is referred to [8] for further details. The nontriviality of this result is due to the infinite-dimensional setting of the problem. Because there is no analog of the Lebesgue measure for infinite-dimensional spaces, one cannot define the posterior measure using the exponential of $J$ as a density, as is done in finite dimensions. Thus it is necessary to define the posterior relative to the prior distribution, and care must be taken to ensure that this density, given by (5), is in fact $\mu$-integrable and hence can be normalized. This normalizability will follow from estimates on solutions to the nonlinear evolution equation.

There is thus a Bayesian formulation of the regularized variational problem, for which the MAP (Maximum A Posteriori) estimators are precisely the global minima of the cost functional (1). With this framework in mind, we study the uniqueness and non-uniqueness of minima for $J(u)$.

We assume throughout that the data are uniformly bounded, with
$$|z_i| \le D \tag{6}$$
for all $i$. Our first result is that $J$ has a unique minimizer when all of the observational times are sufficiently small.

Theorem 2.
There is a constant $T$, depending on $N$, $D$, $\|u_0\|_V$ and $\sigma$, such that (1) has a unique global minimum in $V$ if $t_N < T$.

The time $T$ also depends on the observation operator $H$ and the observational covariance $R$, but we consider these to be fixed throughout, and hence will not explicitly note this dependence. We will similarly not mention any dependence of constants on the functions $f$ and $r$ in (3), though this dependence can easily be deduced from the proofs if desired.

The theorem is proved in Section 6 by first observing that all minimizers are contained in a fixed ball $B \subset V$, then showing that the cost functional is convex over $B$ as long as the observational times are small enough that nonlinear effects are not yet dominant. This differs from the uniqueness result in [9] because in the discrete-time case there are non-vanishing contributions to the cost functional even as $t_N \to 0$,
whereas in the continuous case the observational term
$$\int_0^T |Hy(t) - z(t)|^2\,dt$$
vanishes in the $T \to 0$ limit. For this reason we need to consider the second variation of the cost functional. In the continuous case the Euler--Lagrange equation can be expressed as a fixed-point equation for a nonlinear map that is a contraction for small $T$, but this contractive property is easily seen to fail in the discrete case, even for linear equations.

We next show that it is possible to obtain a uniqueness result for any set of observational times, provided the observational covariance is sufficiently small.

Theorem 3.
There is a constant $\sigma_0 > 0$, depending on $N$, $D$, $\|u_0\|_V$ and $t_N$, such that (1) has a unique global minimum in $V$ if $\sigma < \sigma_0$.

We will see explicitly in (15) how $f''$ and $r''$ can lead to nonconvexity in $J$. The general idea behind the preceding uniqueness theorems is thus to determine under what conditions these nonlinear effects can be dominated by the linear term coming from the Gaussian prior distribution.

It will be seen in the proofs below that Theorems 2 and 3 (as well as the uniqueness result in [9]) in fact establish the stronger result that the cost functional has a unique critical point in a closed subset of $H_0^1$ that necessarily contains any global minima. This observation could be useful in implementing a gradient descent method, because it says that one can avoid spurious local minima by ensuring that the algorithm starts in the bounded region given by Lemma 6, where $J$ is known to be convex; a sketch of such a safeguarded iteration follows.
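The following sketch is our illustration of the safeguarded iteration just described (it is not an algorithm from the paper): `grad` is any implementation of the gradient (12) below, and `A` the radius from Lemma 6; each step is projected back into the ball where $J$ is known to be convex.

```python
import numpy as np

def project(u, A, dx):
    """Rescale u into the ball ||u||_V <= A, with ||u||_V = ||u_x|| computed
    by finite differences (homogeneous Dirichlet boundary conditions)."""
    du = np.diff(np.concatenate(([0.0], u, [0.0]))) / dx
    norm_V = np.sqrt(np.sum(du**2) * dx)
    return u if norm_V <= A else u * (A / norm_V)

def safeguarded_descent(u_init, grad, A, dx, step=1e-2, tol=1e-8, max_iter=10000):
    """Gradient descent kept inside the region of Lemma 6, where J is convex
    under the hypotheses of Theorems 2 and 3, so no spurious minima are met."""
    u = project(u_init, A, dx)
    for _ in range(max_iter):
        u_new = project(u - step * grad(u), A, dx)
        if np.linalg.norm(u_new - u) < tol:
            break
        u = u_new
    return u
```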
Our final result shows that the behavior of $J$ can be rather complicated in general.

Theorem 4.

Consider a reaction-diffusion equation $y_t = y_{xx} + r(y)$ where $r(0) = r'(0) = 0$ and $r''(0) \neq 0$. Let $H$ denote projection onto the first Fourier coefficient, and set $R = 1$. Then for any positive integer $q$ and times $t_1 < \cdots < t_N$ there exist data $\{z_i\}$ and a prior $u_0$ such that $u \equiv 0$ is a critical point of $J$ with Morse index greater than or equal to $q$.

Thus there are cases in which $J$ is not globally convex, and has at least two critical points (since it is already known to have a minimizer).

3. The variational framework
We start our investigation by deriving the Euler--Lagrange equation for the variational problem (1) in the space $V$. We also compute the second variation of the cost functional, as it will be needed in proving the uniqueness theorems.

We first recall that $y$ denotes the unique solution to (3) with Dirichlet boundary conditions and $y(0) = u$. The variation of $y$ with respect to the initial value $u$, in the $v$-direction, is denoted $\eta := Dy(u)v$, and satisfies the initial value problem
$$\eta_t + [f'(y)\eta]_x = \eta_{xx} + r'(y)\eta \tag{7}$$
$$\eta(0) = v.$$
Similarly, the second variation of $y$ is denoted $\omega := D^2 y(u)(v,v)$ and satisfies
$$\omega_t + \left[ f'(y)\omega + f''(y)\eta^2 \right]_x = \omega_{xx} + r'(y)\omega + r''(y)\eta^2 \tag{8}$$
$$\omega(0) = 0.$$
We observe that $\omega \equiv 0$ if $f''$ and $r''$ vanish, which happens precisely when the forward equation is linear.
We let $p$ denote the solution to the adjoint equation
$$-p_t - f'(y)p_x = p_{xx} + r'(y)p \tag{9}$$
$$p(t_N) = 0$$
with a discontinuous jump
$$p(t_i^+) - p(t_i^-) = H^* R^{-1}\big(Hy(t_i) - z_i\big) \tag{10}$$
prescribed at each observation time, $t_i$. The definition is such that
$$\langle p_t, \eta \rangle + \langle p, \eta_t \rangle = 0 \tag{11}$$
for all $t \neq t_i$.

The first variation of the cost functional (1) can thus be written
$$DJ(u)(v) = \sum_{i=1}^N \left\langle \eta(t_i),\, H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\rangle + \frac{1}{\sigma^2} \langle v, u - u_0 \rangle_V,$$
and from the definition of $p$ we obtain
$$\sum_{i=1}^N \left\langle \eta(t_i),\, H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\rangle = \sum_{i=1}^N \left\langle \eta(t_i),\, p(t_i^+) - p(t_i^-) \right\rangle = -\langle \eta(0), p(0) \rangle - \sum_{i=1}^N \left[ \left\langle \eta(t_i), p(t_i^-) \right\rangle - \left\langle \eta(t_{i-1}), p(t_{i-1}^+) \right\rangle \right],$$
where we have set $t_0 = 0$. Then (11) implies
$$\left\langle \eta(t_i), p(t_i^-) \right\rangle - \left\langle \eta(t_{i-1}), p(t_{i-1}^+) \right\rangle = \int_{t_{i-1}}^{t_i} \frac{d}{dt} \langle p, \eta \rangle\, dt = 0$$
for each $i$, hence
$$DJ(u)v = -\langle v, p(0) \rangle + \sigma^{-2} \langle v, u - u_0 \rangle_V.$$
Integrating the first term by parts, we arrive at the following.
Proposition 1.
The $V$-gradient of $J$ is given by
$$DJ(u) = \Delta^{-1} p(0) + \sigma^{-2}(u - u_0). \tag{12}$$

To better understand this result, it is worth recalling that the solution to the adjoint equation depends on $y$, and hence on the initial condition $u$. With this dependence explicitly written as $p[y(u)]$, the Euler--Lagrange equation can be viewed as a fixed-point equation for the map
$$u \mapsto u_0 - \sigma^2 \Delta^{-1} p[y(u)](0). \tag{13}$$
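In a discrete implementation, (12) translates into the usual forward/adjoint sweep. The sketch below is our illustration, not the paper's code: it reuses `forward_solve` from the earlier snippet, `fp` and `rp` denote $f'$ and $r'$, and the grid and sign conventions are assumptions of the sketch.

```python
import numpy as np

def grad_J(u, u0, data, obs_steps, H, R_inv, sigma, f, fp, rp, r, dt):
    """V-gradient of J following (12): forward solve of (3), backward solve of
    the adjoint (9) with the jumps (10) inserted at observation times, then an
    application of the inverse Dirichlet Laplacian."""
    m = u.size
    dx = 1.0 / (m + 1)
    K = max(obs_steps)
    states = forward_solve(u, f, r, dt, K)
    jump = {k: H.T @ (R_inv @ (H @ states[k] - z))
            for k, z in zip(obs_steps, data)}
    p = np.zeros(m)
    for k in range(K, 0, -1):
        if k in jump:                       # (10): p(t-) = p(t+) - H* R^{-1}(Hy - z)
            p = p - jump[k]
        y = states[k]
        ppad = np.concatenate(([0.0], p, [0.0]))
        p_xx = (ppad[2:] - 2.0 * ppad[1:-1] + ppad[:-2]) / dx**2
        p_x = (ppad[2:] - ppad[:-2]) / (2.0 * dx)
        p = p + dt * (p_xx + fp(y) * p_x + rp(y) * p)   # backward-in-time step of (9)
    # Delta^{-1} p(0) with Dirichlet BCs: solve the (dense, for simplicity) system
    lap = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
           + np.diag(np.ones(m - 1), -1)) / dx**2
    return np.linalg.solve(lap, p) + (u - u0) / sigma**2
```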
Proceeding similarly for the second variation, we find
$$D^2 J(u)(v,v) = \sum_{i=1}^N \left[ \left\langle \omega(t_i),\, H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\rangle + \left| R^{-1/2} H \eta(t_i) \right|^2 \right] + \frac{1}{\sigma^2} \|v\|_V^2 \tag{14}$$
and
$$\sum_{i=1}^N \left\langle \omega(t_i),\, H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\rangle = -\sum_{i=1}^N \left[ \left\langle \omega(t_i), p(t_i^-) \right\rangle - \left\langle \omega(t_{i-1}), p(t_{i-1}^+) \right\rangle \right]$$
because $\omega(0) = 0$. Using (8) and (9) and integrating by parts, we find that
$$\langle p_t, \omega \rangle + \langle p, \omega_t \rangle = \left\langle r''(y)\eta^2, p \right\rangle + \left\langle f''(y)\eta^2, p_x \right\rangle,$$
with the following consequence.

Proposition 2.

The Hessian of $J$ is given by
$$D^2 J(u)(v,v) = \int_0^{t_N} \left[ \left\langle r''(y)\eta^2, p \right\rangle + \left\langle f''(y)\eta^2, p_x \right\rangle \right] dt + \sum_{i=1}^N \left| R^{-1/2} H \eta(t_i) \right|^2 + \frac{1}{\sigma^2} \|v\|_V^2. \tag{15}$$

We observe that this term is positive definite if $f'' = r'' = 0$, but the Euler--Lagrange map given in (13) may fail to be a contraction in that case.
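Formulas like (12) are easy to get wrong in code, so it is standard practice to test an adjoint gradient against a finite difference of the cost. A small sketch of ours, assuming `J` and `grad` are callables like those above:

```python
import numpy as np

def inner_V(g, v, dx):
    """Discrete V-inner product <g, v>_V = int_0^1 g_x v_x dx (Dirichlet BCs)."""
    dg = np.diff(np.concatenate(([0.0], g, [0.0])))
    dv = np.diff(np.concatenate(([0.0], v, [0.0])))
    return np.sum(dg * dv) / dx

def check_gradient(J, grad, u, v, dx, eps=1e-6):
    """Centered difference of J along v versus <grad J(u), v>_V; the two
    numbers should agree to O(eps^2) if (12) is implemented correctly."""
    fd = (J(u + eps * v) - J(u - eps * v)) / (2.0 * eps)
    return fd, inner_V(grad(u), v, dx)
```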
4. Analytic preliminaries

In this section we review some analysis for quasilinear evolution equations. We restrict our attention to equations that admit global solutions for all initial conditions, as this guarantees the cost functional is defined on all of $V$.
Suppose $r$ and $f'$ are locally Lipschitz, and (4) is satisfied. Then for any initial condition $u \in H_0^1$, (3) has a unique classical solution on $[0,1] \times (0,\infty)$.

This result is a direct consequence of the material in Chapter 3 of [4]; for the sake of completeness we verify some of the necessary details here.
Proof.
The local existence and uniqueness follows from Theorem 3.3.3 of [4], because the map $y \mapsto r(y) - f'(y)y_x$ is locally Lipschitz from $H_0^1$ to $L^2$. This yields a solution $y \in C\big([0,T); H_0^1\big)$, with $y(t) \in H_0^1 \cap H^2$ for all $t \in (0,T)$. Moreover, from Theorem 3.5.2 of [4], the map $t \mapsto y_t \in H^1$ is locally Hölder continuous. In particular this implies $y$ and $y_t$ are continuous in both $x$ and $t$. We also know that $y \in H^2$, so by the Sobolev embedding theorem $y_x \in H^1$ is Hölder continuous. We then have for each fixed $t$ that
$$y_{xx} = y_t + f(y)_x - r(y)$$
is in $C^\delta[0,1]$ for some $\delta > 0$, hence by elliptic regularity $y(t) \in C^{2,\delta}[0,1]$ and $y$ is a classical solution of (3).

The long-time existence claim follows from Corollary 3.3.5 of [4] together with the pointwise bound from Lemma 1 below. □

We now gather some estimates on $y$, $\eta$ and $p$ that will be needed in proving the uniqueness theorems. The first of these shows that $y$ is uniformly bounded on any finite time interval.
Lemma 1.
Suppose $y(t)$ solves (3) for $t \in (0,T)$, with $\|y(0)\|_\infty \le A$. Then
$$\|y(t)\|_\infty \le B \tag{16}$$
for any $t < T$, where $B$ depends on $A$ and $T$.

Proof. Let $\psi_+$ and $\psi_-$ solve the initial value problems
$$\psi_+' = |r(\psi_+)|, \qquad \psi_+(0) = A,$$
$$\psi_-' = -|r(\psi_-)|, \qquad \psi_-(0) = -A.$$
Then the parabolic maximum principle (cf. Theorem 1 of [5]) guarantees that
$$\psi_-(t) \le y(x,t) \le \psi_+(t).$$
From (4) we know that $\psi_\pm$ exist and are continuous for all $t \ge 0$, so we complete the proof by setting
$$B := \max_{0 \le t \le T} \max\{-\psi_-(t), \psi_+(t)\}. \qquad \square$$

Since $f$ and $r$ are of class $C^2$, for any given value of $A$ and $k \in \{0,1,2\}$ the quantities
$$R_k(A) := \sup_{|y| \le B} |r^{(k)}(y)| \tag{17}$$
$$F_k(A) := \sup_{|y| \le B} |f^{(k)}(y)| \tag{18}$$
are well defined, with $B$ as in the proof of Lemma 1.

We next derive an estimate on the $L^4$ norm of $\eta$. To simplify notation, we observe that $\|\eta\|_{L^4(0,1)} = \|\eta^2\|^{1/2}$.

Lemma 2.
Suppose $\eta(t)$ solves (7) for $t \in (0,T)$, and $\|y(0)\|_\infty \le A$. Then
$$\|\eta(t)^2\| \le \|v^2\|\, e^{\alpha t} \tag{19}$$
for any $t < T$, where $\alpha$ depends on $A$ and $T$.

Proof. We differentiate and then integrate by parts to obtain
$$\frac14 \frac{d}{dt} \int \eta^4\,dx = \int \eta^3 \left( \eta_{xx} - [f'(y)\eta]_x + r'(y)\eta \right) dx = \int \left( -3\eta^2\eta_x^2 + 3f'(y)\eta^3\eta_x + r'(y)\eta^4 \right) dx.$$
Then by the AM--GM inequality
$$\left| \eta^3 \eta_x \right| \le \frac{F_1(A)}{4}\eta^4 + \frac{1}{F_1(A)}\eta^2\eta_x^2,$$
so we find that
$$\frac{d}{dt} \int \eta^4\,dx \le \left( 4R_1(A) + 3F_1(A)^2 \right) \int \eta^4\,dx.$$
The result follows from Gronwall's inequality. □
We finally turn to the adjoint equation. Invoking linearity, the solution can be expressed as $p = p^1 + \cdots + p^N$, where $p^i$ satisfies (9) with terminal condition $p^i(t_N) = 0$ and a single jump
$$p^i(t_i^+) - p^i(t_i^-) = H^* R^{-1}\big(Hy(t_i) - z_i\big). \tag{20}$$
Therefore it suffices to bound each $p^i$ individually, then sum the resulting estimates.

Lemma 3.
Suppose $p(t)$ solves (9), and $\|y(0)\|_\infty \le A$. Then
$$\|p(t)\| \le Ce^{\beta(t_N - t)} \tag{21}$$
for any $t \le t_N$, and
$$\int_0^{t_N} \|p_x(t)\|\,dt \le C\sqrt{t_N}\, e^{2\beta t_N}, \tag{22}$$
where $\beta$ depends on $A$ and $t_N$, and $C$ depends on $N$, $D$, $A$ and $t_N$.

Proof. Differentiating and applying the AM--GM inequality, as in the proof of Lemma 2, we have
$$-\frac12 \frac{d}{dt}\|p\|^2 = \langle p,\ p_{xx} + r'(y)p + f'(y)p_x \rangle \le -\|p_x\|^2 + R_1(A)\|p\|^2 + F_1(A)\|p\|\|p_x\| \le \left( R_1(A) + \frac{F_1(A)^2}{4} \right) \|p\|^2,$$
and so an application of Gronwall's inequality to the function $\|p(t_i - t)\|^2$ yields
$$\|p^i(t)\| \le \left\| H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\| e^{\beta(t_i - t)}$$
for any $t \le t_i$, with $\beta = R_1(A) + F_1(A)^2/4$. We next recall that $y(t)$ is bounded uniformly (and hence in $L^2$), so
$$\left\| H^* R^{-1}\big(Hy(t_i) - z_i\big) \right\| \le C',$$
where $C'$ depends on $A$ and $t_N$ (through Lemma 1) and $D$. To complete the proof of (21) we simply note that $p^i(t) = 0$ for $t > t_i$, then let $C = NC'$.

With a different choice of constants in the AM--GM inequality, we obtain
$$-\frac{d}{dt}\|p\|^2 \le -\|p_x\|^2 + \left( 2R_1(A) + F_1(A)^2 \right)\|p\|^2,$$
and subsequently, letting $\gamma = 2R_1(A) + F_1(A)^2$,
$$\|p_x\|^2 \le \frac{d}{dt}\left( e^{\gamma t}\|p\|^2 \right).$$
Integrating, we find
$$\int_0^{t_i} \|p_x^i(t)\|^2\,dt \le e^{\gamma t_i}\|p^i(t_i)\|^2 - \|p^i(0)\|^2 \le e^{\gamma t_i}(C')^2$$
for each $i$. Now from the Cauchy--Schwarz inequality,
$$\int_0^{t_N} \|p_x(t)\|\,dt \le \sum_{i=1}^N \left( t_i \int_0^{t_i} \|p_x^i(t)\|^2\,dt \right)^{1/2} \le NC'\sqrt{t_N}\, e^{2\beta t_N},$$
where we have used the fact that $\gamma \le 4\beta$. □

5. The Bayesian formulation
Before proving Theorem 1 we elaborate on the meaning of the Gaussian prior measure $\mu = N\big(u_0, -\sigma^2\Delta^{-1}\big)$, following throughout the presentation of [8].

We first note that the covariance operator $C = -\sigma^2\Delta^{-1}$ has eigenvalues $\gamma_n = (\sigma/n\pi)^2$, with normalized eigenfunctions $\varphi_n(x) = \sqrt{2}\sin(n\pi x)$. Then we can express a random variable $u \sim \mu$ using the Karhunen--Loève expansion:
$$u = u_0 + \sqrt{2} \sum_{n=1}^\infty \frac{\sigma \xi_n}{n\pi} \sin(n\pi x), \tag{23}$$
where $\{\xi_n\}$ is an i.i.d. sequence of $N(0,1)$ random variables.
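Expansion (23) is also how one samples from this prior in practice. A short sketch of ours, truncating the series at `n_modes` terms:

```python
import numpy as np

def sample_prior(u0, sigma, n_modes=200, rng=None):
    """Draw one sample from mu = N(u0, -sigma^2 * Delta^{-1}) on (0,1) using the
    truncated Karhunen-Loeve expansion (23); u0 is the prior mean on the grid."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(0.0, 1.0, u0.size)
    xi = rng.standard_normal(n_modes)                 # i.i.d. N(0,1)
    n = np.arange(1, n_modes + 1)
    fluct = np.sqrt(2.0) * (sigma * xi / (np.pi * n)) @ np.sin(np.pi * np.outer(n, x))
    return u0 + fluct
```

Averaging $\|u - u_0\|^2$ over many such draws reproduces the value $\sigma^2/6$ computed below.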
This means the $n$th Fourier coefficient of $u$ is distributed according to $N\big(a_n, (\sigma/n\pi)^2\big)$, where $a_n$ is the $n$th Fourier coefficient of the prior mean, $u_0$. It follows that
$$\|u - u_0\|^2 = \sum_{n=1}^\infty \frac{\sigma^2 \xi_n^2}{(n\pi)^2}$$
and so $\mathbb{E}\|u - u_0\|^2 = \sigma^2/6$. Thus $\sigma^2$ measures the expected squared distance of $u$ from the prior mean.

We now observe that Theorem 1 follows from Corollary 4.4 of [8]. To do so we must show that:

(i) $L^2(0,1)$ has full measure under $\mu$;
(ii) for every $\epsilon > 0$ there exists $M \in \mathbb{R}$ such that
$$\sum_{i=1}^N \left| R^{-1/2} H y(t_i) \right|^2 \le \exp\big( \epsilon \|y(0)\|^2 + M \big)$$
whenever $y(t)$ is a solution to (3);

(iii) for every $\rho > 0$ there exists $L \in \mathbb{R}$ such that
$$\sum_{i=1}^N \left| R^{-1/2} H \big( y^1(t_i) - y^2(t_i) \big) \right|^2 \le L \left\| y^1(0) - y^2(0) \right\|$$
whenever $y^1(t)$ and $y^2(t)$ satisfy (3) with $\max\{\|y^1(0)\|, \|y^2(0)\|\} < \rho$.

To establish (i) we use Lemma 6.25 of [8], which says that any function $u \sim \mu$ is almost surely $\alpha$-Hölder continuous for any $\alpha < 1/2$. In particular, this implies $u$ is almost surely contained in $L^2(0,1)$, i.e. $\mu[L^2(0,1)] = 1$. Since $H$ is bounded on $L^2$, (ii) and (iii) will follow from Lemmas 4 and 5 below.

Lemma 4.
Suppose $r(y)$ is uniformly Lipschitz. Then there exist positive constants $a$ and $b$ so that
$$\|y(t)\|^2 \le e^{at}\left[ \|y(0)\|^2 + bt \right] \tag{24}$$
and
$$2\int_0^t \|y_x(s)\|^2\,ds \le \|y(0)\|^2 - \|y(t)\|^2 + a\int_0^t \|y(s)\|^2\,ds + bt \tag{25}$$
for all $t \ge 0$ and any solution $y(t)$ to (3).

Proof. Differentiating, we have
$$\frac12 \frac{d}{dt}\|y\|^2 = \langle y,\ y_{xx} + r(y) - f(y)_x \rangle.$$
Letting $g(y)$ be an antiderivative of $yf'(y)$, we find that
$$\langle y, f(y)_x \rangle = \int_0^1 g(y)_x\,dx$$
vanishes by the fundamental theorem of calculus, so
$$\frac{d}{dt}\|y\|^2 \le -2\|y_x\|^2 + 2\langle y, r(y) \rangle.$$
It follows immediately from the Lipschitz condition that $|y\,r(y)| \le K|y|^2 + |r(0)||y|$, which implies $2|y\,r(y)| \le a|y|^2 + b$ for some $a$ and $b$. Then (24) is a consequence of Gronwall's inequality, and (25) is obtained by integrating from $0$ to $t$. □

To see how this implies (ii), we first observe that it suffices to prove
$$\|y(t)\|^2 \le \exp\big( \epsilon\|u\|^2 + M \big)$$
for any $t \le t_N$. Thus fixing $\epsilon > 0$ and letting
$$e^M = \max\left\{ bt_N e^{at_N},\ \epsilon^{-1} e^{at_N} \right\},$$
we have from Lemma 4 that
$$\|y(t)\|^2 \le e^M \big( 1 + \epsilon\|u\|^2 \big) \le e^M e^{\epsilon\|u\|^2},$$
as required.

Lemma 5.
Suppose $r(y)$ and $f'(y)$ are uniformly Lipschitz. Then for any $\rho > 0$ there exists a positive constant $L$ so that
$$\|y^1(t) - y^2(t)\| \le L\|u^1 - u^2\|$$
for all $t \le t_N$, provided $y^1(t)$ and $y^2(t)$ satisfy (3) with $\max\{\|y^1(0)\|, \|y^2(0)\|\} < \rho$.

Proof. For convenience we let $K_r$ and $K_f$ denote the Lipschitz constants of $r$ and $f'$, respectively. Differentiating and then integrating by parts as in the proof of Lemma 4, we have
$$\frac12 \frac{d}{dt}\|y^1 - y^2\|^2 = \left\langle y^1 - y^2,\ (y^1 - y^2)_{xx} + r(y^1) - r(y^2) - f(y^1)_x + f(y^2)_x \right\rangle \le -\|(y^1 - y^2)_x\|^2 + K_r\|y^1 - y^2\|^2 + \left\langle y^1 - y^2,\ f(y^2)_x - f(y^1)_x \right\rangle.$$
For the final term we write
$$f(y^1)_x - f(y^2)_x = f'(y^1)(y^1 - y^2)_x + \left[ f'(y^1) - f'(y^2) \right] y_x^2,$$
and thus obtain
$$\left| \left\langle y^1 - y^2,\ f(y^1)_x - f(y^2)_x \right\rangle \right| \le \left[ K_f\big( \|y_x^1\| + \|y_x^2\| \big) + |f'(0)| \right] \|y^1 - y^2\| \|(y^1 - y^2)_x\|.$$
Then after an application of the AM--GM inequality, we find that
$$\frac12 \frac{d}{dt}\|y^1 - y^2\|^2 \le \left( K_r + \frac14 \left[ K_f\big( \|y_x^1\| + \|y_x^2\| \big) + |f'(0)| \right]^2 \right) \|y^1 - y^2\|^2 = \alpha(t)\|y^1 - y^2\|^2,$$
where we have defined $\alpha(t)$ to be the term in parentheses on the right-hand side. From (25) we know that $\int_0^t \alpha(s)\,ds$ is bounded above by a constant depending only on $\|y^1(0)\|$, $\|y^2(0)\|$ and $t_N$, so the result follows from Gronwall's inequality. □

6. The uniqueness theorems
Our main tool for proving Theorems 2 and 3 will be the second variation formula (15), together with the following a priori estimate for minimizers of $J$.

Lemma 6.
Let $u^*$ achieve the infimum of the cost functional (1). Then
$$\|u^*\|_V \le A, \tag{26}$$
where $A$ depends on $N$, $D$, $t_N$, $\|u_0\|_V$ and $\sigma$.

It is clear from the proof that $A$ can be assumed to be nondecreasing with respect to $\sigma$.
Since u ∗ is a minimizer it satisfies J ( u ∗ ) ≤ J (0). Letting y ( t ) solve (3) with y (0) = 0, Lemma 1 implies that y ( t ) is uniformly bounded for t ≤ t N . Therefore k u ∗ k V ≤ σ J ( u ∗ ) ≤ σ N X i =1 (cid:12)(cid:12)(cid:12) R − / ( Hy ( t i ) − z i ) (cid:12)(cid:12)(cid:12) + k u k V is bounded above as claimed. (cid:3) We now use the estimates of Section 4 to prove that, under the conditions ofTheorems 2 and 3, J is convex on the ball k u ∗ k V ≤ A . Discarding nonnegativeterms in (15), it suffices to show that Z t N (cid:12)(cid:12)(cid:10) r ′′ ( y ) η , p (cid:11) + (cid:10) f ′′ ( y ) η , p x (cid:11)(cid:12)(cid:12) dt < σ k v k V . From Lemmas 2 and 3 we have (cid:12)(cid:12)(cid:10) r ′′ ( y ) η , p (cid:11)(cid:12)(cid:12) ≤ CR k v k e αt + β ( t N − t ) and Z t N (cid:12)(cid:12)(cid:10) f ′′ ( y ) η , p x (cid:11)(cid:12)(cid:12) dt ≤ CF k v k√ t N e ( α +2 β ) t N . Combining these estimates, we find that Z t N (cid:12)(cid:12)(cid:10) r ′′ ( y ) η , p (cid:11) + (cid:10) f ′′ ( y ) η , p x (cid:11)(cid:12)(cid:12) dt ≤ Γ k v k V √ t N , (27)where Γ depends on A (from Lemma 6), t N , N and D .It is clear that the constant Γ in (27) remains bounded as t N →
0. Thereforein proving Theorem 2 it suffices to choose t N sufficiently small that Γ √ t N < σ − .Similarly for Theorem 3, we observe that Γ remains bounded as σ → σ small enough that σ − > Γ √ t N . The non-uniqueness theorem
Turning now to the proof of Theorem 4, we must establish that u = 0 is a criticalpoint of J , and the Hessian D J (0) has at least q negative eigenvalues. The keyto the proof is the observation that the Euler–Lagrange equation depends on boththe data and the prior, whereas the Hessian is independent of the prior. Thuswe can first construct data to ensure D J (0) has the required number of negativeeigenvalues, and then choose the prior term to ensure that 0 is in fact a criticalpoint of J .The hypothesis r (0) = 0 ensures that y ( t ) = 0 is the unique solution of (3) with y (0) = 0. Then because r ′ (0) = 0, the linearized equation reduces to the heatequation, η t = η xx , and the adjoint equation becomes the backward heat equation, − p t = p xx .We compute the Hessian of J in the direction of the first q Fourier modes, setting v n = sin( nπx ) for 1 ≤ n ≤ q , so k v n k V = 1 /
2. The corresponding solution to thelinearized forward equation is η ( x, t ) = e − n π t sin( nπx )and so N X i =1 (cid:12)(cid:12)(cid:12) R − / Hη ( t i ) (cid:12)(cid:12)(cid:12) ≤ N X i =1 e − π t i For each observation z i ∈ R we have H ∗ z i = z i sin( πx ) , hence the solution to the adjoint equation is given by p ( x, t ) = X { i : t 0, the Hessian will also be negative for all v n with1 ≤ n ≤ q . NON)UNIQUENESS OF CRITICAL POINTS IN VARIATIONAL DATA ASSIMILATION 13 To complete the proof, we choose the prior u := σ ∆ − p (0) . It follows immediately from (12) that u = 0 is a critical point of J . Acknowledgments The author would like to thank Damon McDougall for numerous enlighteningconversations throughout the preparation of this work. This research has beensupported by the Office of Naval Research under the MURI grant N00014-11-1-0087. References [1] Amit Apte, Didier Auroux, and Mythily Ramaswamy. Variational data assimilation for dis-crete Burgers equation. In Proceedings of the Eighth Mississippi State-UAB Conference onDifferential Equations and Computational Simulations , volume 19 of Electron. J. Differ. Equ.Conf. , pages 15–30, San Marcos, TX, 2010. Southwest Texas State Univ.[2] Carlos Castro, Francisco Palacios, and Enrique Zuazua. An alternating descent method forthe optimal control of the inviscid Burgers equation in the presence of shocks. Math. ModelsMethods Appl. Sci. , 18(3):369–416, 2008.[3] Fran¸cois-Xavier Le Dimet and Olivier Talagrand. Variational algorithms for analysis and as-similation of meteorological observations: theoretical aspects. Tellus A , 38A(2):97–110, 1986.[4] Daniel Henry. Geometric theory of semilinear parabolic equations , volume 840 of Lecture Notesin Mathematics . Springer-Verlag, Berlin, 1981.[5] Stanley Kaplan. On the growth of solutions of quasi-linear parabolic equations. Comm. PureAppl. Math. , 16:305–330, 1963.[6] J. Lundvall, V. Kozlov, and P. Weinerfelt. Iterative methods for data assimilation for Burgers’equation. J. Inverse Ill-Posed Probl. , 14(5):505–535, 2006.[7] Antje Noack and Andrea Walther. Adjoint concepts for the optimal control of Burgers equation. Comput. Optim. Appl. , 36(1):109–133, 2007.[8] A. M. Stuart. Inverse problems: a Bayesian perspective. Acta Numer. , 19:451–559, 2010.[9] Luther W. White. A study of uniqueness for the initialization problem for Burgers’ equation. J. Math. Anal. Appl. , 172(2):412–431, 1993. E-mail address : [email protected]@email.unc.edu