Nonlinear optimization in Hilbert space using Sobolev gradients with applications

Abstract

The problem of finding roots or solutions of a nonlinear partial differential equation may be formulated as the problem of minimizing a sum of squared residuals. One then defines an evolution equation so that in the asymptotic limit a minimizer, and often a solution of the PDE, is obtained. The corresponding discretized nonlinear least squares problem is a common problem in the field of numerical optimization, and thus there exists a wide variety of methods for solving such problems. We review here Newton's method from nonlinear optimization both in a discrete and continuous setting and present results of a similar nature for the Levenberg-Marquardt method. We apply these results to the Ginzburg-Landau model of superconductivity.


P. Kazemi∗, R. J. Renka†

October 30, 2018


∗ Ulm, DE.
† Department of Computer Science & Engineering, University of North Texas, Denton, TX 76203-1366.
arXiv [math.AP]

Consider the problem of finding a solution to the PDE

\[ F(Du) = 0, \tag{1} \]

where, for $u$ in the Sobolev space $H := H^{1,2}(\Omega)$, $Du = \{ D^\alpha u : |\alpha| \le 1 \}$. The domain $\Omega$ is assumed to be bounded, of dimension $n$, with smooth boundary. $F$ is a function from $\mathbb{R}^{n+1}$ to $\mathbb{R}^m$ which is commonly referred to as a Nemytskii operator. Let $L := L^2(\Omega)$. In order that $F \circ D : H \to [L]^m$ be Fréchet differentiable it is sufficient that $F$ be $C^1$ and satisfy the growth bound $|F'(x)| \le c|x|$ for $x \in \mathbb{R}^{n+1}$ and some $c > 0$. Define the energy functional

\[ E(u) = \tfrac{1}{2} \| F(Du) \|_L^2. \tag{2} \]

Ideally, the proposed method would yield an existence result, proving that a minimizer exists and is a solution of the PDE, and it should give a recipe for computing such a solution numerically, since closed-form solutions are not available for most nonlinear problems. In order to be effective, the numerical method should emulate an iteration in the infinite-dimensional Sobolev space in which the PDE is formulated. This formulation follows naturally from Neuberger's theory of Sobolev gradients. The Fréchet derivative $E'(u)$, which is a bounded linear functional on $H$, is represented by an element of $H$ (by the Riesz representation theorem). This element is the Sobolev gradient of $E$ at $u$ and is denoted by $\nabla_H E(u)$:

\[ E'(u)h = \langle h, \nabla_H E(u) \rangle_H, \quad h \in H. \]

Note that the gradient depends on the inner product attached to $H$. One considers the evolution equation

\[ z(0) = z_0 \in H \quad \text{and} \quad z'(t) = -\nabla_H E(z(t)), \quad t \ge 0. \tag{3} \]

The energy $E$ is non-increasing on the trajectory $z$. Existence, uniqueness, and asymptotic convergence to a critical point are established by the following two theorems taken from [9, Chapter 4].

Theorem 1.

Suppose that $E$ is a non-negative $C^1$ real-valued function on a Hilbert space $H$ with a locally Lipschitz continuous Sobolev gradient. Then for each $z_0 \in H$ there is a unique global solution of (3).

Definition 1.

The energy functional $E$ satisfies a gradient inequality on $K \subseteq H$ if there exist $\theta \in (0,1)$ and $m > 0$ so that for all $x \in K$

\[ \| \nabla_H E(x) \|_H \ge m E(x)^\theta. \]

Theorem 2.

Suppose that $E$ is a non-negative $C^1$ functional on $H$ with a locally Lipschitz continuous gradient, $z$ is the unique global solution of (3), and $E$ satisfies a gradient inequality on the range of $z$. Then $\lim_{t \to \infty} z(t)$ exists and is a zero of the gradient, where the limit is defined by the $H$-norm. By the gradient inequality, the limit is also a zero of $E$.

The above theorems provide a firm theoretical basis for the numerical treatment of a system of nonlinear PDEs by a gradient descent method that emulates (3); i.e., discretization in time and space results in the method of steepest descent with a discretized Sobolev gradient. Note that the Sobolev gradient method differs from methods based on the calculus of variations in which the Euler-Lagrange equation is solved. Forming the Euler-Lagrange equation requires integration by parts to obtain the element that represents $E'(u)$ in the $L^2$ inner product. This $L^2$ gradient is usually only defined on a Sobolev space of higher order than that of $H$. Hence, unlike the Sobolev gradient, the $L^2$ gradient is only densely defined on the domain of $E$. For gradient flows involving the $L^2$ gradient, existence and uniqueness results similar to those of Theorems 1 and 2 may be proved, but only under stricter assumptions.

2 Newton and Variable Newton methods

In order to solve the minimization problem (2), we take the first variation to obtain

\[ E'(u)h = \langle F'(Du)Dh, F(Du) \rangle_L. \tag{4} \]

We seek an evolution equation so that in the limit as time goes to infinity, we can find a zero. A natural setting would be if such an evolution came from a gradient system as defined in [2]. In particular, first assume that $G(u) := F(Du)$ and $G'(u)$ is invertible for each $u$. Then Newton's method

\[ u(0) = u_0 \quad \text{and} \quad u'(t) = -(G'(u))^{-1} G(u) \]

is the gradient system associated with the inner product

\[ \langle v, w \rangle_{g_N(u)} = \langle G'(u)v, G'(u)w \rangle_L. \tag{5} \]

This is achieved by noting that

\[ E'(u)h = \langle G'(u)h, G'(u)(G'(u))^{-1} G(u) \rangle_L = \langle h, (G'(u))^{-1} G(u) \rangle_{g_N(u)}. \]

Continuous Newton's method gives an infinite-dimensional method for finding solutions of (1); see, e.g., [4], [1], and [10] for zero-finding results of Nash-Moser type ([8]). In [5] Newton's method is discussed in relation to gradient descent methods. It is shown that, while the method of steepest descent is locally optimal in terms of the descent direction for a fixed metric, Newton's method is optimal (in a sense which is made precise) in terms of both the direction and the inner product in a variable metric method. When Newton's method is available, the quadratic rate of convergence to a solution makes this method ideal in a numerical setting.

For many partial differential equations $G'(u)$ may not be invertible for all $u$, and thus Newton's method cannot be applied in the infinite-dimensional setting. In the finite-dimensional setting, the initial condition must be close to the solution to obtain convergence. Another option is to minimize (2) using a variable metric method.
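The relation between the continuous Newton flow and the classical Newton iteration can be illustrated in finite dimensions. The sketch below is only an illustration under stated assumptions: the map `G` on $\mathbb{R}^2$, the starting point, and the function names are hypothetical and not taken from the paper. A forward Euler step of size `dt` is applied to $u' = -(G'(u))^{-1} G(u)$; with `dt = 1` this is the classical Newton iteration, while for small `dt` the energy $E = \tfrac12 |G|^2$ decays approximately like $e^{-2t}$, since $G(u(t)) = e^{-t} G(u_0)$ along the exact flow.

```python
import math

def G(u):
    """Hypothetical test map on R^2 with a root at (sqrt(2), sqrt(2))."""
    x, y = u
    return (x * x + y * y - 4.0, x - y)

def newton_direction(u):
    """Solve G'(u) d = G(u) for the 2x2 Jacobian by Cramer's rule."""
    x, y = u
    a, b, c, d = 2.0 * x, 2.0 * y, 1.0, -1.0   # Jacobian [[a, b], [c, d]]
    det = a * d - b * c
    if det == 0.0:
        raise ZeroDivisionError("G'(u) is singular; Newton's method is unavailable")
    g1, g2 = G(u)
    return ((d * g1 - b * g2) / det, (a * g2 - c * g1) / det)

def energy(u):
    """E(u) = 1/2 |G(u)|^2, the finite-dimensional analogue of (2)."""
    g1, g2 = G(u)
    return 0.5 * (g1 * g1 + g2 * g2)

def newton_flow(u0, dt, steps):
    """Forward Euler for u' = -(G'(u))^{-1} G(u); dt = 1 is classical Newton."""
    u = u0
    for _ in range(steps):
        d = newton_direction(u)
        u = (u[0] - dt * d[0], u[1] - dt * d[1])
    return u

if __name__ == "__main__":
    print(newton_flow((3.0, 1.0), 1.0, 10))          # classical Newton iterates
    v = newton_flow((3.0, 1.0), 0.001, 1000)         # flow up to t = 1
    print(energy(v) / energy((3.0, 1.0)), math.exp(-2.0))
```

The singular-Jacobian check mirrors the remark above that Newton's method is unavailable where $G'(u)$ fails to be invertible.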
We give here a description of one such method which, when discretized, gives a variation of the Levenberg-Marquardt method. The results are taken from [7].

For $u \in H = H^{1,2}(\Omega)$ consider the bilinear form on $H$ defined by

\[ \langle v, w \rangle_u = \langle v, w \rangle_H + (1/\lambda(u)) \langle G'(u)v, G'(u)w \rangle_L, \tag{6} \]

where $\lambda(u)$ is a positive damping parameter. By our assumption that $G'(u) \in \mathcal{L}(H, L)$, there exists a constant $c = c(u)$ so that for all $v \in H$, $\| G'(u)v \|_L \le c \| v \|_H$. Hence

\[ \| v \|_u^2 = \| v \|_H^2 + (1/\lambda(u)) \| G'(u)v \|_L^2, \quad \text{and} \quad \| v \|_H \le \| v \|_u \le \sqrt{1 + c(u)^2/\lambda(u)} \, \| v \|_H, \]

so that, for each $u \in H$, (6) defines a norm that is equivalent to the standard Sobolev norm on $H$. The gradient of $E$ with respect to $\langle \cdot, \cdot \rangle_u$ is defined to be the unique element $\nabla_u E(u)$ so that

\[ E'(u)h = \langle h, \nabla_u E(u) \rangle_u \quad \forall h \in H. \tag{7} \]

Consider the gradient flow

\[ z(0) = u_0 \in H \quad \text{and} \quad z'(t) = -\nabla_{z(t)} E(z(t)), \quad t \ge 0. \tag{8} \]

We seek a $C^1$ solution $z$ of this flow so that $u_f = \lim_{t \to \infty} z(t)$ exists and $E'(u_f) = 0$. In the case that a gradient inequality is satisfied, a zero of the derivative of $E$ is also a zero of $E$ and hence a solution of (1). The key idea in obtaining global existence and asymptotic convergence of the flow (8) is to obtain an expression for the abstract gradient. We obtain this expression by considering a family of orthogonal projections onto the graphs of closed densely defined operators. In particular, let $S_u : H \to [L]^{n+1}$ be given by $S_u = \binom{T}{T_u}$, where $Th = \{ D^\alpha h : |\alpha| = 1 \}$ and $T_u h = (1/\sqrt{\lambda(u)}) \, G'(u)h$. Since the domain of $S_u$ is all of $H$, $S_u$ can be viewed as a densely defined operator on $L$.
It is also the case that $S_u$ is a bounded linear operator from $H$ to $[L]^{n+1}$. Note that $S_u$ need not be bounded when viewed as an operator from $L$ to $[L]^{n+1}$. It also follows that the graph of $S_u$ is a closed subspace of $[L]^{n+2}$. By a theorem of von Neumann ([13]) there exists a unique orthogonal projection from $[L]^{n+2}$ onto the graph of $S_u$, and the projection is given by

\[ P_u = \begin{pmatrix} (I + S_u^* S_u)^{-1} & S_u^* (I + S_u S_u^*)^{-1} \\ S_u (I + S_u^* S_u)^{-1} & I - (I + S_u S_u^*)^{-1} \end{pmatrix}. \]

This result can also be found in [9, Theorem 5.2]. Here $S_u^*$ is the adjoint of $S_u$ with $S_u$ treated as a closed and densely defined operator on $L$, and hence $S_u^*$ is also closed and densely defined on its domain in $[L]^{n+1}$. Note also that $(I + S_u^* S_u)^{-1}$ and $S_u^* (I + S_u S_u^*)^{-1}$ are everywhere defined on $L$ and $[L]^{n+1}$, respectively, and are bounded as operators from $L$ to $L$ and $[L]^{n+1}$ to $L$, respectively ([11, Sec. 118]).

We will obtain an expression for the gradient given in (7). The graph of $S_u$ is $\left\{ \binom{Dh}{T_u h} : h \in H \right\}$. Since $P_u$ is the unique orthogonal projection of $[L]^{n+2}$ onto the graph of $S_u$, $P_u$ is the identity on the graph of $S_u$, and thus $P_u \binom{Dh}{T_u h} = \binom{Dh}{T_u h}$ for all $h \in H$. Using symmetry of $P_u$, we have

\[
\begin{aligned}
E'(u)h &= \langle G'(u)h, G(u) \rangle_L \\
&= \sqrt{\lambda(u)} \left\langle \begin{pmatrix} Dh \\ (1/\sqrt{\lambda(u)}) \, G'(u)h \end{pmatrix}, \begin{pmatrix} 0 \\ G(u) \end{pmatrix} \right\rangle_{[L]^{n+2}} \\
&= \sqrt{\lambda(u)} \left\langle P_u \begin{pmatrix} Dh \\ (1/\sqrt{\lambda(u)}) \, G'(u)h \end{pmatrix}, \begin{pmatrix} 0 \\ G(u) \end{pmatrix} \right\rangle_{[L]^{n+2}} \\
&= \sqrt{\lambda(u)} \left\langle \begin{pmatrix} Dh \\ (1/\sqrt{\lambda(u)}) \, G'(u)h \end{pmatrix}, P_u \begin{pmatrix} 0 \\ G(u) \end{pmatrix} \right\rangle_{[L]^{n+2}} \\
&= \sqrt{\lambda(u)} \left\langle h, \Pi P_u \begin{pmatrix} 0 \\ G(u) \end{pmatrix} \right\rangle_u,
\end{aligned}
\]

where $0$ denotes the zero element of $[L]^{n+1}$ and $\Pi \binom{x}{y} = x$.
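The projection formula can be checked by hand in the simplest possible setting. The following sketch assumes a real scalar operator $S x = s x$ on $\mathbb{R}$ (a hypothetical stand-in for $S_u$, so that $S^* = s$ and every inverse is a scalar); the function names are illustrative only. In this case the graph of $S$ is the line $y = s x$ in $\mathbb{R}^2$, and the block formula reduces to the familiar orthogonal projection onto that line.

```python
def graph_projection(s):
    """Von Neumann's block formula for the orthogonal projection onto the
    graph of x -> s*x, specialised to a real scalar s."""
    a = 1.0 / (1.0 + s * s)        # (I + S*S)^{-1} = (I + S S*)^{-1}
    return [[a, s * a],            # [(I + S*S)^{-1},    S*(I + S S*)^{-1}]
            [s * a, 1.0 - a]]      # [S (I + S*S)^{-1},  I - (I + S S*)^{-1}]

def apply_matrix(P, v):
    """Multiply a 2x2 matrix by a vector of length 2."""
    return [P[0][0] * v[0] + P[0][1] * v[1],
            P[1][0] * v[0] + P[1][1] * v[1]]

if __name__ == "__main__":
    P = graph_projection(2.0)
    print(apply_matrix(P, [3.0, 6.0]))    # a point on the graph is left fixed
    print(apply_matrix(P, [-2.0, 1.0]))   # a normal vector is sent to zero
```

In this scalar analogue the first component of $P \binom{0}{g}$ equals $s g / (1 + s^2)$, which is the scalar form of the expression $S^* (I + S S^*)^{-1}$ applied to $g$ in the upper right block.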
For each $u \in H$ we have a gradient

\[ \nabla_u E(u) = \sqrt{\lambda(u)} \, \Pi P_u \begin{pmatrix} 0 \\ G(u) \end{pmatrix} = \sqrt{\lambda(u)} \, S_u^* (I + S_u S_u^*)^{-1} \begin{pmatrix} 0 \\ G(u) \end{pmatrix}, \tag{9} \]

where in the last expression $0$ denotes the zero element of $[L]^n$.

Theorem 3.

\[ \nabla_u E(u) = \sqrt{\lambda(u)} \, M T_u^* (I + T_u M T_u^*)^{-1} F(Du), \tag{10} \]

where $T_u^*$ is the adjoint of $T_u$ when viewed as a closed and densely defined operator on $L$, and $M = (I + T^* T)^{-1} = (D^* D)^{-1}$ is a smoothing operator.

A steepest descent iteration with a discretization of this gradient is a generalized Levenberg-Marquardt iteration in which the identity or a diagonal matrix is replaced by the positive definite operator $M^{-1} = D^* D$.

The generalized Levenberg-Marquardt method is given by

\[ u_{n+1} = u_n - \lambda(u_n) \left( \lambda(u_n) D^* D + (G'(u_n))^* G'(u_n) \right)^{-1} (G'(u_n))^* F(Du_n), \]

which is a forward Euler discretization of (8) with time step 1. The expression of the gradient was used to obtain the following results.

Theorem 4.

Suppose $E$ is as defined in (2) with $F \circ D$ a $C^2$ function defined on $H$ with range in $L$, and suppose that $\lambda : H \to \mathbb{R}$ is locally Lipschitz continuous and bounded below by a positive constant. Then the gradient system (8) has a unique global solution $z \in C^1([0, \infty), H)$.

Theorem 5.

Suppose that there exists $\xi \in (0,1)$ so that if $u \in H$, there is $\gamma(u) > \| F(Du) \|_L^{\xi}$ such that for each $g$ in the domain of $G'(u)^*$ with $\| g \|_L = 1$ the linear PDE

\[ G'(u)x = g \]

has a solution $x \in H$ with $\| x \|_H \le \gamma(u)$. Then a gradient inequality is satisfied. Here $G'(u)^*$ denotes the adjoint of $G'(u)$ when viewed as a densely defined operator on $L$.

Theorem 6.

Suppose that the hypotheses of Theorem 4 are satisfied so that (8) has a unique global solution $z$, and suppose that the hypotheses of Theorem 5 are satisfied on an open region containing the range of $z$. Then $u = \lim_{t \to \infty} z(t)$ exists and $F(Du) = 0$.

In [6], we applied a variation of the above formulation to study the Ginzburg-Landau energy. The Ginzburg-Landau model postulates that the behavior of the superconducting electrons in materials can be described by a complex-valued wave function, in which case the Ginzburg-Landau (Gibbs free) energy is given by

\[ E(u, A) = \frac{1}{2} \int_\Omega |\nabla u - iAu|^2 + |\nabla \times A - H_0|^2 + \frac{\kappa^2}{2} \left( |u|^2 - 1 \right)^2 \tag{11} \]

in nondimensionalized form. Here $u$ is the complex-valued wave function which gives the probability density of the superconducting electrons, $A$ is the induced magnetic vector potential, $H_0$ is the external magnetic field, and $\kappa$ is the Ginzburg-Landau coefficient which characterizes the type of superconducting sample (type I or type II). The central Ginzburg-Landau problem is to find a minimizer of the Ginzburg-Landau energy. Note that this energy can be written in the form (2) with

\[ F(D(u, A)) = \begin{pmatrix} r_1 + a s \\ s_1 - a r \\ r_2 + b s \\ s_2 - b r \\ b_1 - a_2 - H_0 \\ \frac{\kappa}{\sqrt{2}} \left( r^2 + s^2 - 1 \right) \end{pmatrix} \tag{12} \]

for $u = \binom{r}{s}$ and $A = \binom{a}{b}$, where the subscripts 1 and 2 denote partial derivatives with respect to the first and second coordinates. One can check that $F \circ D$ is $C^2$ from $H = [H^{1,2}(\Omega)]^4$ to $L = [L^2(\Omega)]^6$. Thus the gradient system (8) has a unique global solution when $\lambda$ satisfies the properties of Theorem 4. We studied the rate of convergence of this flow to a minimizer using a trust-region method in [6]. In future work, it would be valuable to obtain a gradient inequality, and hence prove convergence, by verifying the condition of Theorem 5. Since for the Ginzburg-Landau energy a minimizer does not correspond to a zero of the energy, the definition of the gradient inequality is altered to the following definition taken from [2].

Definition 2.

Suppose $E$ is as in (2) and that $E$ achieves a local minimum at $u_m$. Then $E$ is said to satisfy a gradient inequality in a neighborhood of $u_m$ if there exist a ball $B$ containing $u_m$ and constants $\xi \in (0,1)$, $c > 0$ such that for all $v \in B$

\[ | E(v) - E(u_m) |^{\xi} \le c \, \| \nabla_v E(v) \|_v. \]

This formulation was used to obtain a stabilization result for the Ginzburg-Landau equations in [3] and [12]. In Figure 1, we give contour plots of minimizers for various parameters.
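As a finite-dimensional illustration of the generalized Levenberg-Marquardt iteration of Section 2, the following sketch applies the update $u_{n+1} = u_n - \lambda (\lambda D^* D + J^T J)^{-1} J^T G(u_n)$ to a small model problem. Everything specific here is an assumption made for illustration and is not taken from the paper or from [6]: the model residual $F(Du) = u' + u^2 - 1$ on $(0,1)$, the forward-difference discretization, the constant damping parameter `lam`, the unweighted Euclidean sums used in place of $L^2$ inner products, and all function names. The matrix returned by `sobolev_matrix` plays the role of $D^* D = I + T^* T$.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def residual(u, h):
    """Hypothetical model residual u' + u^2 - 1 with forward differences."""
    return [(u[i + 1] - u[i]) / h + u[i] ** 2 - 1.0 for i in range(len(u) - 1)]

def jacobian(u, h):
    """Jacobian of the residual: a bidiagonal (n-1) x n matrix."""
    n = len(u)
    J = [[0.0] * n for _ in range(n - 1)]
    for i in range(n - 1):
        J[i][i] = -1.0 / h + 2.0 * u[i]
        J[i][i + 1] = 1.0 / h
    return J

def sobolev_matrix(n, h):
    """Discrete D*D = I + Delta^T Delta, Delta the forward-difference matrix."""
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    w = 1.0 / (h * h)
    for i in range(n - 1):             # add the outer product of difference row i
        A[i][i] += w
        A[i + 1][i + 1] += w
        A[i][i + 1] -= w
        A[i + 1][i] -= w
    return A

def energy(u, h):
    """Discrete analogue of (2) with an unweighted sum of squares."""
    return 0.5 * sum(r * r for r in residual(u, h))

def lm_step(u, h, lam):
    """One step u - lam * (lam * D*D + J^T J)^{-1} J^T G(u)."""
    n = len(u)
    J, g, A = jacobian(u, h), residual(u, h), sobolev_matrix(n, h)
    m = len(g)
    K = [[lam * A[i][j] + sum(J[k][i] * J[k][j] for k in range(m))
          for j in range(n)] for i in range(n)]
    Jtg = [sum(J[k][i] * g[k] for k in range(m)) for i in range(n)]
    d = solve(K, Jtg)
    return [u[i] - lam * d[i] for i in range(n)]

def run(n=11, lam=1.0, iters=200):
    """Iterate from the constant initial guess u = 0.5 on a uniform grid."""
    h = 1.0 / (n - 1)
    u = [0.5] * n
    for _ in range(iters):
        u = lm_step(u, h, lam)
    return u, energy(u, h)

if __name__ == "__main__":
    u, e = run()
    print(e)
```

Replacing `sobolev_matrix` by the identity matrix would give a classical Levenberg-Marquardt step, so the only change relative to the classical method in this sketch is the choice of the positive definite matrix that multiplies `lam`.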

We extended the theory of Sobolev gradients to include gradients associated with a variable inner product, and we described a generalized Levenberg-Marquardt method as a gradient flow in an infinite-dimensional Sobolev space. We presented conditions under which the flow is guaranteed to converge to a zero of a residual representing a solution of a nonlinear partial differential equation. The conditions include smoothness of the residual and satisfaction of a gradient inequality.

Figure 1: Vortex configurations corresponding to a density plot of the minimizer for $\kappa = 4$ and $H_0 = 4, \ldots$

Acknowledgements

The first author would like to thank the Institute of Applied Analysis and the Institute of Quantum Physics at Ulm University for their hospitality during the author's time in Ulm.

References