Nonlinear optimization in Hilbert space using Sobolev gradients with applications
P. Kazemi∗, R. J. Renka†
October 30, 2018
Abstract
The problem of finding roots or solutions of a nonlinear partial differential equation may be formulated as the problem of minimizing a sum of squared residuals. One then defines an evolution equation so that in the asymptotic limit a minimizer, and often a solution of the PDE, is obtained. The corresponding discretized nonlinear least squares problem is frequently encountered in numerical optimization, and a wide variety of methods exist for solving such problems. We review here Newton's method from nonlinear optimization in both a discrete and a continuous setting and present results of a similar nature for the Levenberg-Marquardt method. We apply these results to the Ginzburg-Landau model of superconductivity.
Consider the problem of finding a solution to the PDE
$$F(Du) = 0, \qquad (1)$$
where, for $u$ in the Sobolev space $H := H^{1,2}(\Omega)$, $Du = \{D^\alpha u : |\alpha| \le 1\}$. Here $\Omega$ is assumed to be a bounded domain of dimension $n$ with smooth boundary, and $F$ is a function from $\mathbb{R}^{n+1}$ to $\mathbb{R}^m$, commonly referred to as a Nemytskii operator. Let $L := L^2(\Omega)$. For $F \circ D : H \to [L]^m$ to be Fréchet differentiable, it is sufficient that $F$ be $C^1$ and satisfy the growth bound $|F'(x)| \le c\,|x|$ for $x \in \mathbb{R}^{n+1}$ and some $c > 0$. We seek a minimizer of the sum of squared residuals
$$E(u) = \frac{1}{2}\|F(Du)\|_L^2. \qquad (2)$$

∗ Ulm, DE.
† Department of Computer Science & Engineering, University of North Texas, Denton, TX 76203-1366.

The proposed method would ideally yield an existence result, proving that a minimizer exists and is a solution of the PDE, and it should give a recipe for computing such a solution numerically, since closed-form solutions are not available for most nonlinear problems. To be effective, the numerical method should emulate an iteration in the infinite-dimensional Sobolev space in which the PDE is formulated. This formulation follows naturally from Neuberger's theory of Sobolev gradients. The Fréchet derivative $E'(u)$, which is a bounded linear functional on $H$, is represented by an element of $H$. This element is the Sobolev gradient of $E$ at $u$ and is denoted by $\nabla_H E(u)$:
$$E'(u)h = \langle h, \nabla_H E(u)\rangle_H, \qquad h \in H.$$
Note that the gradient depends on the inner product attached to $H$. One considers the evolution equation
$$z(0) = z_0 \in H \quad\text{and}\quad z'(t) = -\nabla_H E(z(t)), \quad t \ge 0. \qquad (3)$$
The energy $E$ is non-increasing on the trajectory $z$, since $\frac{d}{dt}E(z(t)) = E'(z(t))z'(t) = -\|\nabla_H E(z(t))\|_H^2 \le 0$. Existence, uniqueness, and asymptotic convergence to a critical point are established by the following two theorems taken from [9, Chapter 4].

Theorem 1.
Suppose that $E$ is a non-negative $C^1$ real-valued function on a Hilbert space $H$ with a locally Lipschitz continuous Sobolev gradient. Then for each $z_0 \in H$ there is a unique global solution of (3).

Definition 1.
The energy functional $E$ satisfies a gradient inequality on $K \subseteq H$ if there exist $\theta \in (0,1)$ and $m > 0$ such that for all $x \in K$
$$\|\nabla_H E(x)\|_H \ge m\,E(x)^{\theta}.$$

Theorem 2.
Suppose that $E$ is a non-negative $C^1$ functional on $H$ with a locally Lipschitz continuous gradient, $z$ is the unique global solution of (3), and $E$ satisfies a gradient inequality on the range of $z$. Then $\lim_{t\to\infty} z(t)$ exists and is a zero of the gradient, where the limit is taken in the $H$-norm. By the gradient inequality, the limit is also a zero of $E$.

The above theorems provide a firm theoretical basis for the numerical treatment of a system of nonlinear PDEs by a gradient descent method that emulates (3); i.e., discretization in time and space results in the method of steepest descent with a discretized Sobolev gradient. Note that the Sobolev gradient method differs from methods based on the calculus of variations, in which the Euler-Lagrange equation is solved. Forming the Euler-Lagrange equation requires integration by parts to obtain the element that represents $E'(u)$ in the $L^2$ inner product. This $L^2$ gradient is usually defined only on a Sobolev space of higher order than $H$. Hence, unlike the Sobolev gradient, the $L^2$ gradient is only densely defined on the domain of $E$. For gradient flows involving the $L^2$ gradient, existence and uniqueness results similar to Theorems 1 and 2 may be proved, but only under stricter assumptions.

2 Newton and Variable Newton methods
In order to solve the minimization problem (2), we take the first variation to obtain
$$E'(u)h = \langle F'(Du)Dh, F(Du)\rangle_L. \qquad (4)$$
We seek an evolution equation whose solution converges to a zero as time goes to infinity. A natural setting is a gradient system as defined in [2]. In particular, let $G(u) := F(Du)$ and assume first that $G'(u)$ is invertible for each $u$. Then Newton's method,
$$u(0) = u_0 \quad\text{and}\quad u'(t) = -(G'(u))^{-1}G(u),$$
is the gradient system associated with the inner product
$$\langle v, w\rangle_{g_N(u)} = \langle G'(u)v, G'(u)w\rangle_L. \qquad (5)$$
This follows from
$$E'(u)h = \langle G'(u)h, G'(u)(G'(u))^{-1}G(u)\rangle_L = \langle h, (G'(u))^{-1}G(u)\rangle_{g_N(u)}.$$
Continuous Newton's method gives an infinite-dimensional method for finding solutions of (1); see, e.g., [4], [1], and [10] for zero-finding results of Nash-Moser type ([8]). In [5] Newton's method is discussed in relation to gradient descent methods. It is shown there that, while the method of steepest descent is locally optimal in terms of the descent direction for a fixed metric, Newton's method is optimal (in a sense made precise in [5]) in terms of both the direction and the inner product in a variable metric method. When Newton's method is available, its quadratic rate of convergence to a solution makes it ideal in a numerical setting.

For many partial differential equations $G'(u)$ is not invertible for all $u$, so Newton's method cannot be applied in the infinite-dimensional setting. In the finite-dimensional setting, the initial guess must be close to the solution to obtain convergence. Another option is to minimize (2) using a variable metric method.
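The identity above, and the remark that a gradient depends on the chosen inner product, can be checked in a small finite-dimensional sketch. The residual $G(u) = Au + 0.1u^3 - b$ is an illustrative assumption, not taken from the paper; $A$ is $4I$ plus a skew-symmetric matrix, so the symmetric part of $G'(u)$ is $4I + \mathrm{diag}(0.3u^2)$ and $G'(u)$ stays invertible. Both the Euclidean gradient $G'(u)^T G(u)$ and the Newton direction $(G'(u))^{-1}G(u)$ represent the same derivative (4), each in its own inner product, and forward Euler with time step 1 on the continuous Newton flow is the classical Newton iteration.

```python
import numpy as np

# Illustrative square system (an assumption, not from the paper):
# G(u) = A u + 0.1 u^3 - b with A = 4 I + (skew-symmetric part).
# The symmetric part of G'(u) is 4 I + diag(0.3 u^2), so G'(u) is invertible.
rng = np.random.default_rng(0)
n = 5
W = rng.standard_normal((n, n))
A = 4.0 * np.eye(n) + 0.5 * (W - W.T)
b = rng.standard_normal(n)


def G(u):
    return A @ u + 0.1 * u**3 - b


def G_prime(u):
    return A + np.diag(0.3 * u**2)


def newton_direction(u):
    """Gradient of E(u) = 0.5 |G(u)|^2 with respect to the metric g_N(u) of (5)."""
    return np.linalg.solve(G_prime(u), G(u))


def directional_derivatives(u, h):
    """Return E'(u)h computed three ways; all three should agree."""
    J = G_prime(u)
    direct = (J @ h) @ G(u)                         # formula (4)
    euclidean = h @ (J.T @ G(u))                    # <h, J^T G> in R^n
    newton = (J @ h) @ (J @ newton_direction(u))    # <h, J^{-1} G>_{g_N(u)}
    return direct, euclidean, newton


def newton_iteration(u0, steps=20):
    """Forward Euler with time step 1 on u' = -(G'(u))^{-1} G(u)."""
    u = u0.copy()
    for _ in range(steps):
        u = u - newton_direction(u)
    return u


u_test, h_test = rng.standard_normal(n), rng.standard_normal(n)
print(directional_derivatives(u_test, h_test))
print(np.linalg.norm(G(newton_iteration(np.zeros(n)))))
```

The three numbers printed first coincide up to rounding, which is exactly the statement that the Newton direction is the gradient for the metric (5).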
We now describe one such variable metric method which, when discretized, gives a variation of the Levenberg-Marquardt method. The results are taken from [7].

For $u \in H = H^{1,2}(\Omega)$ consider the bilinear form on $H$ defined by
$$\langle v, w\rangle_u = \langle v, w\rangle_H + \frac{1}{\lambda(u)}\langle G'(u)v, G'(u)w\rangle_L, \qquad (6)$$
where $\lambda(u)$ is a positive damping parameter. By our assumption that $G'(u) \in \mathcal{L}(H, L)$, there exists a constant $c = c(u)$ such that $\|G'(u)v\|_L^2 \le c\,\|v\|_H^2$ for all $v \in H$. Hence
$$\|v\|_u^2 = \|v\|_H^2 + \frac{1}{\lambda(u)}\|G'(u)v\|_L^2 \quad\text{and}\quad \|v\|_H \le \|v\|_u \le \sqrt{1 + c(u)/\lambda(u)}\;\|v\|_H,$$
so that, for each $u \in H$, (6) defines a norm that is equivalent to the standard Sobolev norm on $H$. The gradient of $E$ with respect to $\langle\cdot,\cdot\rangle_u$ is defined to be the unique element $\nabla_u E(u)$ such that
$$E'(u)h = \langle h, \nabla_u E(u)\rangle_u \qquad \forall h \in H. \qquad (7)$$
Consider the gradient flow
$$z(0) = u_0 \in H \quad\text{and}\quad z'(t) = -\nabla_{z(t)} E(z(t)), \quad t \ge 0. \qquad (8)$$
We seek a $C^1$ solution $z$ of this flow such that $u_f = \lim_{t\to\infty} z(t)$ exists and $E'(u_f) = 0$. If a gradient inequality is satisfied, a zero of the derivative of $E$ is also a zero of $E$ and hence a solution of (1). The key to obtaining global existence and asymptotic convergence of the flow (8) is an expression for the abstract gradient. We obtain this expression by considering a family of orthogonal projections onto the graphs of closed, densely defined operators. In particular, let $S_u : H \to [L]^{n+1}$ be given by
$$S_u = \begin{pmatrix} T \\ T_u \end{pmatrix},$$
where $Th = \{D^\alpha h : |\alpha| = 1\}$ and $T_u h = (1/\sqrt{\lambda(u)})\,G'(u)h$. Since the domain of $S_u$ is all of $H$, $S_u$ can be viewed as a densely defined operator on $L$.
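A minimal finite-dimensional sketch can make the inner product (6), the gradient (7), and the flow (8) concrete. Everything specific below is an assumption for illustration: $\Omega = (0,1)$ on a uniform grid, Euclidean sums in place of $L^2$ integrals, a constant damping parameter `lam`, and the toy residual $F(u, u') = u' + u^3 - 1$. The hypothetical helpers `residual`, `jacobian`, `metric`, and `variable_metric_gradient` are not from the paper. The gradient is obtained by solving the discrete Riesz equation (7), and the flow is advanced by forward Euler.

```python
import numpy as np

# Assumptions (illustration only): Omega = (0, 1), uniform grid, Euclidean sums
# instead of L2 integrals, constant lam, toy residual F(u, u') = u' + u^3 - 1.
N = 50                                                  # number of grid cells
dx = 1.0 / N
T = (np.eye(N, N + 1, k=1) - np.eye(N, N + 1)) / dx     # forward differences
Avg = 0.5 * (np.eye(N, N + 1, k=1) + np.eye(N, N + 1))  # midpoint averages
K = np.eye(N + 1) + T.T @ T      # Gram matrix of <.,.>_H, i.e. D^T D with D = (I; T)


def residual(u):
    """G(u) = F(Du) evaluated at cell midpoints."""
    return T @ u + (Avg @ u) ** 3 - 1.0


def jacobian(u):
    """G'(u) for the toy residual."""
    return T + 3.0 * np.diag((Avg @ u) ** 2) @ Avg


def energy(u):
    g = residual(u)
    return 0.5 * g @ g


def metric(u, lam):
    """Gram matrix of <v, w>_u = <v, w>_H + (1/lam) <G'(u)v, G'(u)w>, as in (6)."""
    J = jacobian(u)
    return K + (J.T @ J) / lam


def variable_metric_gradient(u, lam):
    """Solve (7): E'(u)h = <h, grad>_u for all h, i.e. metric(u) grad = J^T G."""
    return np.linalg.solve(metric(u, lam), jacobian(u).T @ residual(u))


def flow(u0, lam=1.0, dt=1.0, steps=200):
    """Forward Euler steps u <- u - dt * grad_u E(u) for the flow (8)."""
    u = u0.copy()
    for _ in range(steps):
        u = u - dt * variable_metric_gradient(u, lam)
    return u


u0 = np.zeros(N + 1)
u_final = flow(u0)
print(energy(u0), energy(u_final))
```

Holding $\lambda$ fixed is a simplification made here; Theorem 4 allows $\lambda$ to depend on $u$, provided it is locally Lipschitz continuous and bounded below by a positive constant.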
The operator $S_u$ is also bounded as a linear operator from $H$ to $[L]^{n+1}$. Note that $S_u$ need not be bounded when viewed as an operator from $L$ to $[L]^{n+1}$. It also follows that the graph of $S_u$ is a closed subspace of $[L]^{n+2}$. By a theorem of von Neumann ([13]) there exists a unique orthogonal projection from $[L]^{n+2}$ onto the graph of $S_u$, and the projection is given by
$$P_u = \begin{pmatrix} (I + S_u^*S_u)^{-1} & S_u^*(I + S_uS_u^*)^{-1} \\ S_u(I + S_u^*S_u)^{-1} & I - (I + S_uS_u^*)^{-1} \end{pmatrix}.$$
This result can also be found in [9, Theorem 5.2]. Here $S_u^*$ is the adjoint of $S_u$ with $S_u$ treated as a closed and densely defined operator on $L$, and hence $S_u^*$ is also closed and densely defined in $[L]^{n+1}$. Note also that $(I + S_u^*S_u)^{-1}$ and $S_u^*(I + S_uS_u^*)^{-1}$ are everywhere defined on $L$ and $[L]^{n+1}$, respectively, and are bounded as operators from $L$ to $L$ and from $[L]^{n+1}$ to $L$, respectively ([11, Sec. 118]).

We now obtain an expression for the gradient given in (7). The graph of $S_u$ is
$$\left\{ \begin{pmatrix} Dh \\ T_u h \end{pmatrix} : h \in H \right\}.$$
Since $P_u$ is the orthogonal projection of $[L]^{n+2}$ onto the graph of $S_u$, it is the identity on that graph, and thus $P_u\begin{pmatrix} Dh \\ T_u h\end{pmatrix} = \begin{pmatrix} Dh \\ T_u h\end{pmatrix}$ for all $h \in H$. Using the symmetry of $P_u$, we have
$$\begin{aligned}
E'(u)h &= \langle G'(u)h, G(u)\rangle_L
= \sqrt{\lambda(u)}\left\langle \begin{pmatrix} Dh \\ (1/\sqrt{\lambda(u)})\,G'(u)h \end{pmatrix}, \begin{pmatrix} 0 \\ G(u) \end{pmatrix}\right\rangle_{[L]^{n+2}} \\
&= \sqrt{\lambda(u)}\left\langle P_u\begin{pmatrix} Dh \\ (1/\sqrt{\lambda(u)})\,G'(u)h \end{pmatrix}, \begin{pmatrix} 0 \\ G(u) \end{pmatrix}\right\rangle_{[L]^{n+2}} \\
&= \sqrt{\lambda(u)}\left\langle \begin{pmatrix} Dh \\ (1/\sqrt{\lambda(u)})\,G'(u)h \end{pmatrix}, P_u\begin{pmatrix} 0 \\ G(u) \end{pmatrix}\right\rangle_{[L]^{n+2}}
= \sqrt{\lambda(u)}\left\langle h, \Pi P_u\begin{pmatrix} 0 \\ G(u) \end{pmatrix}\right\rangle_u,
\end{aligned}$$
where the zero block lies in $[L]^{n+1}$ and $\Pi\begin{pmatrix} x \\ y\end{pmatrix} = x$. The last equality holds because $P_u\begin{pmatrix} 0 \\ G(u)\end{pmatrix}$ lies in the graph of $S_u$, and for two elements of the graph the $[L]^{n+2}$ inner product coincides with $\langle\cdot,\cdot\rangle_u$ by (6).
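The block formula for $P_u$ and the derivation above can be checked numerically in finite dimensions. In this sketch the grid, the Euclidean inner products, and the random matrix `J` standing in for $G'(u)$ (with random `g` standing in for $G(u)$) are assumptions for illustration; `graph_projection` is a hypothetical helper. The check confirms that $P_u$ is symmetric, idempotent, and the identity on the graph of $S_u$, and that $\sqrt{\lambda(u)}\,\Pi P_u(0, G(u))$ agrees with the gradient obtained directly from the Riesz representation (7).

```python
import numpy as np

# Assumptions (illustration only): small 1-D grid (n = 1), Euclidean inner
# products, random stand-ins for G'(u) and G(u).
rng = np.random.default_rng(2)
N, lam = 10, 0.5
dx = 1.0 / N
T = (np.eye(N, N + 1, k=1) - np.eye(N, N + 1)) / dx   # h -> h' on cell midpoints
J = rng.standard_normal((N, N + 1))                    # stand-in for G'(u)
g = rng.standard_normal(N)                             # stand-in for G(u)
S = np.vstack([T, J / np.sqrt(lam)])                   # S_u = (T; T_u)


def graph_projection(S):
    """Orthogonal projection onto the graph {(h, S h)} of a k x n matrix S."""
    k, n = S.shape
    A = np.linalg.inv(np.eye(n) + S.T @ S)
    B = np.linalg.inv(np.eye(k) + S @ S.T)
    return np.block([[A, S.T @ B], [S @ A, np.eye(k) - B]])


P = graph_projection(S)
n_h = N + 1                                            # size of the h-slot

# Gradient from the derivation: sqrt(lam) * Pi P_u (0, G(u)); Pi keeps the h-slot.
rhs = np.concatenate([np.zeros(n_h + N), g])           # zeros in h- and T-slots
grad_projection = np.sqrt(lam) * (P @ rhs)[:n_h]

# Same gradient from the Riesz representation (7) with the inner product (6).
K_u = np.eye(n_h) + T.T @ T + (J.T @ J) / lam
grad_riesz = np.linalg.solve(K_u, J.T @ g)
print(np.max(np.abs(grad_projection - grad_riesz)))
```

The printed maximum difference is at rounding level, so in this discrete setting the projection route and the direct solve give the same gradient.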
Consequently, for each $u \in H$ we have
$$\nabla_u E(u) = \sqrt{\lambda(u)}\,\Pi P_u\begin{pmatrix} 0 \\ G(u)\end{pmatrix} = \sqrt{\lambda(u)}\,S_u^*(I + S_uS_u^*)^{-1}\begin{pmatrix} 0 \\ G(u)\end{pmatrix}. \qquad (9)$$

Theorem 3. For each $u \in H$,
$$\nabla_u E(u) = \sqrt{\lambda(u)}\,M T_u^*(I + T_u M T_u^*)^{-1} F(Du), \qquad (10)$$
where $T_u^*$ is the adjoint of $T_u$ when viewed as a closed and densely defined operator on $L$, and $M = (I + T^*T)^{-1} = (D^*D)^{-1}$ is a smoothing operator.

A steepest descent iteration with a discretization of this gradient is a generalized Levenberg-Marquardt iteration in which the identity or a diagonal matrix is replaced by the positive definite operator $M^{-1} = D^*D$. The generalized Levenberg-Marquardt method is given by
$$u_{k+1} = u_k - \lambda(u_k)\left(\lambda(u_k)\,D^*D + G'(u_k)^*G'(u_k)\right)^{-1}G'(u_k)^*F(Du_k),$$
which is a forward Euler discretization of (8) with time step 1. This expression for the gradient was used to obtain the following results.

Theorem 4.
Suppose $E$ is as defined in (2) with $F \circ D$ a $C^1$ function defined on $H$ with range in $L$, and suppose that $\lambda : H \to \mathbb{R}$ is locally Lipschitz continuous and bounded below by a positive constant. Then the gradient system (8) has a unique global solution $z \in C^1([0,\infty), H)$.

Theorem 5.
Suppose that there exists $\xi \in (0,1)$ such that if $u \in H$, there is $\gamma(u) > \|F(Du)\|_L^{\xi}$ such that for each $g$ in the domain of $G'(u)^*$ with $\|g\|_L = 1$ the linear PDE $G'(u)x = g$ has a solution $x \in H$ with $\|x\|_H \le \gamma(u)$. Then a gradient inequality is satisfied. Here $G'(u)^*$ denotes the adjoint of $G'(u)$ when viewed as a densely defined operator on $L$.

Theorem 6.
Suppose that the hypotheses of Theorem 4 are satisfied, so that (8) has a unique global solution $z$, and suppose that the hypotheses of Theorem 5 are satisfied on an open region containing the range of $z$. Then $u = \lim_{t\to\infty} z(t)$ exists and $F(Du) = 0$.

In [6], we applied a variation of the above formulation to study the Ginzburg-Landau energy. The Ginzburg-Landau model postulates that the behavior of the superconducting electrons in a material can be described by a complex-valued wave function, in which case the Ginzburg-Landau (Gibbs free) energy is given by
$$E(u, A) = \int_\Omega \frac{1}{2}|\nabla u - iAu|^2 + \frac{1}{2}|\nabla \times A - H|^2 + \frac{\kappa^2}{4}\left(|u|^2 - 1\right)^2 \,dx \qquad (11)$$
in nondimensionalized form. Here $u$ is the complex-valued wave function, whose squared modulus gives the density of the superconducting electrons, $A$ is the induced magnetic vector potential, $H$ is the external magnetic field, and $\kappa$ is the Ginzburg-Landau coefficient, which characterizes the type of superconducting sample (type I or type II). The central Ginzburg-Landau problem is to find a minimizer of the Ginzburg-Landau energy. Note that this energy can be written in the form (2) with
$$F(D(u, A)) = \begin{pmatrix} r_x + as \\ s_x - ar \\ r_y + bs \\ s_y - br \\ b_x - a_y - H \\ \frac{\kappa}{\sqrt{2}}\,(r^2 + s^2 - 1) \end{pmatrix} \qquad (12)$$
for $u = \begin{pmatrix} r \\ s \end{pmatrix}$ (that is, $u = r + is$) and $A = \begin{pmatrix} a \\ b \end{pmatrix}$. One can check that $F \circ D$ is $C^1$ from $H = [H^{1,2}(\Omega)]^4$ to $L = [L^2(\Omega)]^6$. Thus the gradient system (8) has a unique global solution when $\lambda$ satisfies the hypotheses of Theorem 4. We studied the rate of convergence of this flow to a minimizer using a trust-region method in [6]. In future work, it would be of considerable interest to prove convergence by establishing a gradient inequality, for instance by verifying the condition of Theorem 5. Since for the Ginzburg-Landau energy a minimizer does not correspond to a zero of the energy, the definition of the gradient inequality is altered to the following definition taken from [2].

Definition 2.
Suppose $E$ is as in (2) and that $E$ achieves a local minimum at $u_m$. Then $E$ is said to satisfy a gradient inequality in a neighborhood of $u_m$ if there exist a ball $B$ containing $u_m$, $\xi \in (0,1)$, and $c > 0$ such that for all $v \in B$
$$|E(v) - E(u_m)|^{\xi} \le c\,\|\nabla_v E(v)\|_v.$$

This formulation was used to obtain a stabilization result for the Ginzburg-Landau equations in [3] and [12]. In Figure 1, we give contour plots of minimizers for various parameters.
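The residual form (12) can be evaluated on a grid and compared against the complex form (11) as a consistency check. This is a sketch under stated assumptions: a square two-dimensional domain, arrays indexed `[y, x]`, central differences from `numpy.gradient`, and a rectangle-rule quadrature; the helper names and the sample fields are hypothetical. With $E = \frac{1}{2}\|F\|^2$ the two forms agree term by term, since $|u_x - iau|^2 = (r_x + as)^2 + (s_x - ar)^2$ and $\frac{1}{2}\cdot\frac{\kappa^2}{2}(r^2+s^2-1)^2 = \frac{\kappa^2}{4}(|u|^2-1)^2$.

```python
import numpy as np

# Assumptions (illustration only): square 2-D grid, arrays indexed [y, x],
# central differences via np.gradient, rectangle-rule quadrature.


def d_dx(f, h):
    """Derivative along x (axis 1)."""
    return np.gradient(f, h, axis=1)


def d_dy(f, h):
    """Derivative along y (axis 0)."""
    return np.gradient(f, h, axis=0)


def gl_residual(r, s, a, b, H, kappa, h):
    """The six components of F(D(u, A)) in (12) for u = r + i s, A = (a, b)."""
    return [
        d_dx(r, h) + a * s,
        d_dx(s, h) - a * r,
        d_dy(r, h) + b * s,
        d_dy(s, h) - b * r,
        d_dx(b, h) - d_dy(a, h) - H,
        kappa / np.sqrt(2.0) * (r**2 + s**2 - 1.0),
    ]


def gl_energy(r, s, a, b, H, kappa, h):
    """E = 1/2 * (sum of squared residual components), as in (2)."""
    F = gl_residual(r, s, a, b, H, kappa, h)
    return 0.5 * sum(np.sum(Fi**2) for Fi in F) * h**2


def gl_energy_complex(r, s, a, b, H, kappa, h):
    """The same discrete energy written in the complex form (11)."""
    u = r + 1j * s
    ux = d_dx(r, h) + 1j * d_dx(s, h)
    uy = d_dy(r, h) + 1j * d_dy(s, h)
    curl = d_dx(b, h) - d_dy(a, h)
    density = (0.5 * (np.abs(ux - 1j * a * u) ** 2 + np.abs(uy - 1j * b * u) ** 2)
               + 0.5 * (curl - H) ** 2
               + 0.25 * kappa**2 * (np.abs(u) ** 2 - 1.0) ** 2)
    return np.sum(density) * h**2


n = 33
h = 1.0 / (n - 1)
x, y = np.meshgrid(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, n))
r, s = np.cos(3 * x) * np.exp(-y), np.sin(2 * y)     # sample wave function
a, b = -0.5 * y, 0.5 * x                              # sample potential, curl A = 1
print(gl_energy(r, s, a, b, 4.0, 4.0, h), gl_energy_complex(r, s, a, b, 4.0, 4.0, h))
```

The residual list returned by `gl_residual` is what a discretized Levenberg-Marquardt step would differentiate; the complex form is kept only as a cross-check.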
We extended the theory of Sobolev gradients to include gradients associated with a variable inner product, and we described a generalized Levenberg-Marquardt method as a gradient flow in an infinite-dimensional Sobolev space. We presented conditions under which the flow is guaranteed to converge to a zero of a residual representing a solution of a nonlinear partial differential equation. The conditions include smoothness of the residual and satisfaction of a gradient inequality.

Figure 1: Vortex configurations corresponding to a density plot of the minimizer for κ = 4 and several values of H, beginning with H = 4.

Acknowledgements
The first author would like to thank the Institute of Applied Analysis and the Institute of Quantum Physics at Ulm University for their hospitality during the author's stay in Ulm.

References