Taylor approximation and variance reduction for PDE-constrained optimal control under uncertainty
PENG CHEN∗, UMBERTO VILLA†, AND OMAR GHATTAS†

Abstract.
In this work we develop a scalable computational framework for the solution of PDE-constrained optimal control under high-dimensional uncertainty. Specifically, we consider a mean-variance formulation of the control objective and employ a Taylor expansion with respect to the uncertain parameter either to directly approximate the control objective or as a control variate for variance reduction. The expressions for the mean and variance of the Taylor approximation are known analytically, although their evaluation requires efficient computation of the trace of the (preconditioned) Hessian of the control objective. We propose to estimate this trace by solving a generalized eigenvalue problem using a randomized algorithm that only requires the action of the Hessian on a small number of random directions. Then, the computational work does not depend on the nominal dimension of the uncertain parameter, but depends only on the effective dimension (i.e., the rank of the preconditioned Hessian), thus ensuring scalability to high-dimensional problems. Moreover, to increase the estimation accuracy of the mean and variance of the control objective by the Taylor approximation, we use it as a control variate for variance reduction, which results in considerable computational savings (several orders of magnitude) compared to a plain Monte Carlo method. In summary, our approach amounts to solving an optimal control problem constrained by the original PDE and a set of linearized PDEs, which arise from the computation of the gradient and Hessian of the control objective with respect to the uncertain parameter. We use the Lagrangian formalism to derive expressions for the gradient with respect to the control and apply a gradient-based optimization method to solve the problem.
We demonstrate the accuracy, efficiency, and scalability of the proposed computational method for two examples with high-dimensional uncertain parameters: subsurface flow in a porous medium modeled as an elliptic PDE, and turbulent jet flow modeled by the Reynolds-averaged Navier–Stokes equations coupled with a nonlinear advection-diffusion equation characterizing model uncertainty. In particular, for the latter more challenging example we show scalability of our algorithm up to one million parameters resulting from discretization of the uncertain parameter field.
Key words.
PDE-constrained optimal control, Taylor approximation, variance reduction, Monte Carlo integration, scalability, uncertainty quantification, high dimensionality
AMS subject classifications.
1. Introduction.
Optimal control problems (and more general optimization problems) constrained by partial differential equations (PDEs) are ubiquitous in many applications of practical relevance for science and engineering. The objective of such problems is to seek an optimal control that minimizes a cost functional, which often consists of a control objective and a penalty term. The control objective is often given as a distance between the solution of the PDE and a desired state, or more generally formulated as a quantity of interest related to the solution, while the penalty term reflects the cost of the control or imposes regularity on the control. In practical applications, uncertainties in the PDE model are inevitable and can arise from various sources, such as the PDE coefficients, initial or boundary conditions, external sources, or geometries. Different realizations of these uncertain parameters may lead to significant differences in the optimal control. Mathematical theories and computational methods have been developed for several decades to deal with deterministic PDE-constrained optimal

∗ Institute for Computational Engineering & Sciences, The University of Texas at Austin, Austin, TX 78712 ([email protected], [email protected]).
† Institute for Computational Engineering & Sciences, Department of Mechanical Engineering, and Department of Geological Sciences, The University of Texas at Austin, Austin, TX 78712 ([email protected]).
control [56, 39, 40, 14, 44, 75, 15]. More recently, PDE-constrained optimal control under uncertainty has become an important area of research and has drawn increasing attention [16, 71, 45, 41, 67, 49, 74, 28, 54, 27, 51, 52, 61, 29, 53, 50, 11, 3, 4]. Different formulations of the cost functional have been studied to incorporate uncertainty into the PDE-constrained optimal control. A straightforward choice is to optimize the mean of the control objective, i.e., the integral of the control objective with respect to the probability measure of the uncertain parameter [16, 45, 41, 49, 27]. However, this formulation does not control the variability of the control objective. To avoid a possible and undesired large variation of the control objective, which may represent the risk of a system failure, one can include the variance or higher moments of the control objective in the cost functional [67, 11, 74]. An alternative approach is to use the Value-at-Risk (VaR) or conditional VaR [50], which measure the (conditional) probability of the control objective surpassing a certain critical value. An extreme choice is robust optimization (also referred to as min-max optimization), where one seeks a control that yields the optimal value of the objective in the worst-case scenario, i.e., the extreme value of the control objective over the range of the uncertain parameter [13, 54]. A common challenge in all of these formulations is to compute the statistics of the control objective (i.e., mean, variance, higher moments, or conditional probabilities); this involves the integration of the control objective or related quantities with respect to the probability measure of the uncertain parameter. This challenge becomes more severe for a high-dimensional uncertain parameter space, which is often the case when the parameter is an uncertain spatially correlated field.
A straightforward approach is to apply a Monte Carlo method to compute the statistics by taking the average of the control objective or related quantities at a set of samples randomly drawn according to the probability measure of the parameter. However, the convergence rate of the Monte Carlo estimator is just O(M^{-1/2}), where M is the number of samples. This may lead to an extremely large number of samples to achieve a target accuracy, and therefore may not be feasible when the state equation is a complex nonlinear PDE. Quasi Monte Carlo methods using low-discrepancy sequences improve the convergence rate to O((log(M))^d / M) for a d-dimensional parameter [34], and are thus attractive for low- to medium-dimensional problems. Stochastic Galerkin and stochastic collocation methods have also been used to compute the statistical moments in the cost functional [45, 41, 74, 67, 49, 28, 54, 27, 52, 53], provided that a suitable finite-dimensional parametrization of the uncertain parameter, such as a truncated Karhunen–Loève expansion, is applicable. These methods achieve fast convergence when the control objective depends smoothly on the low-dimensional parameter, but suffer from the so-called curse of dimensionality, i.e., the convergence rate quickly deteriorates as the dimension of the parameter increases. Recent advances in adaptive and anisotropic sparse quadrature [72, 31] and high-order quasi Monte Carlo methods [35] have been shown to achieve a dimension-independent convergence rate N^{-s} with s potentially much larger than 1/2. Here s is independent of the nominal dimension of the uncertain parameter and depends only on the choice of parametrization and on the regularity of the control objective with respect to the parameter, thus mitigating or even breaking the curse of dimensionality. However, the convergence may still be slow if the choice of the parametrization does not correctly capture the effective parameter dimensions. Another computational challenge is that, in applications of practical relevance, the PDE may lead, after discretization, to possibly nonlinear large-scale systems that are extremely expensive to solve. Thus, only a limited number of high-fidelity PDE solves can be afforded.
The Taylor approximation may not be sufficiently accurate and introduces bias in the estimation of the mean and variance of the control objective. Specifically, the norm of the remainder of the Taylor expansion scales as O(tr(C)^{(r+1)/2}), where C is the covariance of the uncertain parameter and r is the order of the truncated Taylor expansion, e.g., r = 1 and 2 for linear and quadratic approximations, respectively [3]. To address this issue, we propose to use the Taylor approximation as a control variate to reduce the variance of the Monte Carlo estimator for the mean and variance of the control objective, or, viewed differently, to apply a Monte Carlo correction to the remainder of the Taylor expansion. In cases when the Taylor approximation is highly correlated with the exact control objective, the reduction in computational cost can be dramatic, up to several orders of magnitude in our numerical examples. We consider two examples of PDE-constrained optimal control under uncertainty to numerically illustrate the scalability, accuracy, and efficiency of our computational framework. The first is a problem of optimal control of subsurface flow in a porous medium, where the control is the injection rate at given well locations, the state equation is an elliptic PDE representing single-phase flow with uncertain permeability, and the objective is to drive the pressure field to a desired state. The second is an optimal boundary control problem for a turbulent jet flow, where the control is the inlet velocity profile, the state equations are the Reynolds-averaged Navier–Stokes equations coupled with a nonlinear stochastic advection-diffusion equation representing model uncertainty, and the objective is to maximize the jet width at a cross-section a certain distance from the inlet boundary.
In both cases, the uncertain parameter is infinite-dimensional in the continuous setting and becomes high-dimensional after discretization. For the solution of the PDE-constrained optimal control problem with Taylor approximation and variance reduction we employ a gradient-based method, specifically the limited-memory BFGS method [62]. The cost functional involves the analytical expressions for the mean and variance of the Taylor approximation and the Monte Carlo correction when using the Taylor approximation for variance reduction; the constraint involves the (possibly nonlinear) state PDE and a set of linearized PDEs used in the computation of the gradient and Hessian of the control objective with respect to the uncertain parameter. We use the Lagrangian formalism to derive the gradient of the cost functional with respect to the control. Specifically, we compute mixed high-order derivatives of the Lagrangian functional with respect to the state, the adjoint, the uncertain parameter, and the control by symbolic differentiation in a variational setting before discretization. This allows us to efficiently apply our framework to rather complex PDE models such as the turbulent jet example, which features coupled nonlinear PDEs, nonlinear numerical stabilization, and weak imposition of Dirichlet boundary conditions. We demonstrate numerically that our method scales independently of the uncertain parameter dimension, in that the number of required PDE solves does not depend on the discretization of the parameter field, and instead depends only on the effective or intrinsic dimension of the problem, i.e., the number of parameter dimensions to which the control objective is sensitive. The rest of the paper is organized as follows: We present the basic setting for PDE-constrained optimal control under uncertainty in Section 2, using a general notation for the uncertain parameter, the PDE model, the cost functional, and the optimal control problem.
We also present a sample average approximation of the mean and variance of the control objective in this section. Section 3 is devoted to the development of the Taylor approximation and variance reduction for the evaluation of the mean and variance of the control objective. We also present the computation of the gradient and Hessian of the control objective with respect to the uncertain parameter.
2. PDE-constrained optimal control under uncertainty.
In this section we present the PDE-constrained optimal control under uncertainty in an abstract setting. To this aim, let us first introduce the notation used throughout the paper. Let X be a separable Banach space and X′ its dual space; then ⟨·,·⟩_{X,X′} denotes the duality pairing between the spaces X and X′. For ease of notation, we will omit the subscripts X and X′ and simply write ⟨·,·⟩ when the spaces can be inferred from the context without ambiguity. Given two separable Banach spaces X and Y and a map f : X × Y → R, ∂_x f(x, y) ∈ X′ denotes the Fréchet derivative of f(x, y) with respect to x evaluated at (x, y), which satisfies

(2.1)  lim_{x̃ → 0} |f(x + x̃, y) − f(x, y) − ⟨x̃, ∂_x f(x, y)⟩| / ‖x̃‖_X = 0.

Let ∂_{xy} f(x, y) : Y → X′ denote the Fréchet derivative of ∂_x f(x, y) with respect to y evaluated at (x, y), i.e., the second-order (mixed) Fréchet derivative of f(x, y) with respect to x and y evaluated at (x, y). Similarly, ∂_{yx} f(x, y) : X → Y′ denotes the Fréchet derivative of ∂_y f(x, y) with respect to x evaluated at (x, y), which is the adjoint operator of ∂_{xy} f(x, y) and satisfies

(2.2)  ⟨x̃, ∂_{xy} f ŷ⟩ := ⟨x̃, ∂_{xy} f ŷ⟩_{X,X′} = ⟨ŷ, ∂_{yx} f x̃⟩_{Y,Y′} =: ⟨ŷ, ∂_{yx} f x̃⟩,  ∀ x̃ ∈ X, ŷ ∈ Y,

where we have omitted the argument (x, y) for simplicity. Finally, let us introduce the separable Banach spaces U, V, M, and Z in which the state u, the adjoint v, the uncertain parameter m, and the deterministic control z live, respectively. The exact definition of these spaces is problem specific and is specified in Sec. 5. We assume that the uncertain parameter m ∈ M has mean m̄ ∈ M and covariance C : M → M′. The computational framework developed in this work is applicable as long as this assumption on the uncertain parameter is satisfied; the parameter can be either finite or infinite dimensional.
In particular, to demonstrate the scalability of our computational methods, we consider an infinite-dimensional Gaussian random field with distribution µ = N(m̄, C) for the uncertain parameter, which is one of the most important building blocks in modern spatial statistical modeling. A general covariance structure is the Matérn covariance, which is equivalent to prescribing that the uncertain parameter m is given as the solution of a linear fractional stochastic PDE [55]. In this work we consider this type of covariance, i.e.,

(2.3)  C = (−α₁ ∇·(Θ∇) + α₂ I)^{−α₃},

where ∇·, ∇, and I are the divergence, gradient, and identity operators, respectively; the coefficients α₁, α₂ > 0, the exponent α₃ > d/2 for spatial dimension d = 1, 2, 3, and the symmetric positive definite matrix Θ ∈ R^{d×d} enables spatial anisotropy.
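For intuition, a covariance of the form (2.3) can be realized directly in one spatial dimension. The following NumPy sketch is illustrative only (the grid size, the coefficients a1, a2, a3, and the isotropic finite difference Laplacian standing in for ∇·(Θ∇) are assumed choices, not values from the paper); it forms C by spectral calculus and draws one sample from N(m̄, C):

```python
# Sketch: a Matern-type covariance C = (-a1 * Laplacian + a2 * I)^(-a3)
# on a 1-D grid, and a draw from N(m_bar, C). All names/values illustrative.
import numpy as np

n = 200                       # grid points on [0, 1]
h = 1.0 / (n - 1)
a1, a2, a3 = 1e-2, 1.0, 2.0   # assumed coefficients; a3 > d/2 = 1/2 for d = 1

# second-difference Laplacian (homogeneous Dirichlet closure at the ends)
L = np.zeros((n, n))
for i in range(n):
    L[i, i] = -2.0
    if i > 0:
        L[i, i - 1] = 1.0
    if i < n - 1:
        L[i, i + 1] = 1.0
L /= h ** 2

A = -a1 * L + a2 * np.eye(n)          # SPD elliptic operator
lam, V = np.linalg.eigh(A)            # A = V diag(lam) V^T
C = (V * lam ** (-a3)) @ V.T          # C = A^{-a3} via spectral calculus

rng = np.random.default_rng(0)
m_bar = np.zeros(n)
C_half = (V * lam ** (-a3 / 2)) @ V.T # matrix square root of C
m = m_bar + C_half @ rng.standard_normal(n)   # m ~ N(m_bar, C)
```

In large-scale practice one would of course never form C densely; the fractional power is applied matrix-free, e.g., with sparse elliptic solves.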
We consider a state problem modeled by the following PDE. Given a realization of the uncertain parameter m ∈ M and a deterministic control z ∈ Z, find the state u ∈ U such that

(2.4)  R(u, m, z) = 0 in V′,

where R(·, m, z) : U → V′ denotes a (possibly nonlinear) operator from U to the dual of V. We then define the weak form associated to the state equation (2.4) by means of the duality pairing, and write

(2.5)  r(u, v, m, z) := ⟨v, R(u, m, z)⟩ = 0  ∀ v ∈ V.

Note that r(u, v, m, z) is linear with respect to the adjoint v and possibly nonlinear with respect to the state u. As for the control objective, we consider a real-valued, possibly nonlinear, functional Q of the state u ∈ U,

(2.6)  Q : U → R.

For simplicity, we assume that Q depends on m and z only implicitly through the solution of the state equation (2.5). The general case where Q has an explicit dependence on m and z can be treated with slight modifications in the derivation of the cost functional and its gradient, as we address along the presentation in Sec. 3 and Sec. 4. Since the control objective Q (implicitly) depends on the uncertain parameter m through the state u, we face PDE-constrained optimal control under uncertainty. In this work we consider a mean-variance cost functional of the form

(2.7)  J(z) = E[Q] + β Var[Q] + P(z),

which is often used in stochastic programming [73]. Above, E[Q] and Var[Q] are, respectively, the mean and variance of Q with respect to the probability measure µ of the uncertain parameter m, defined as

(2.8)  E[Q] := ∫_M Q(u(m, z)) dµ(m),

and

(2.9)  Var[Q] := ∫_M (Q(u(m, z)) − E[Q])² dµ(m) = E[Q²] − (E[Q])².

As the control objective Q (implicitly) depends on the control z through the state u, so do E[Q] and Var[Q].
The scaling factor β ≥ 0 weights the variance relative to the mean, and P(z) is a penalty term. Recalling that the state u (implicitly) depends on the uncertain parameter m and the control z through the solution of the PDE (2.5), we formulate the PDE-constrained optimal control problem under uncertainty as: find z* ∈ Z such that

(2.10)  z* = arg min_{z ∈ Z} J(z), subject to the problem (2.5).

To evaluate the mean and variance in the cost functional (2.7), a plain Monte Carlo sample average approximation can be used. We briefly discuss this approach below, while in the following section we introduce more sophisticated approximations that allow for more efficient algorithms by exploiting high-order derivatives. The (plain Monte Carlo) sample average approximation of the mean reads

(2.11)  E[Q] ≈ Q̂ := (1/M₁) Σ_{i=1}^{M₁} Q(m_i),

where m_i, i = 1, …, M₁, are independent identically distributed (i.i.d.) random samples drawn from the probability measure µ. Similarly, the sample average approximation of the variance is given by

(2.12)  Var[Q] ≈ V̂_Q := (1/M₂) Σ_{i=1}^{M₂} Q(m_i)² − ((1/M₂) Σ_{i=1}^{M₂} Q(m_i))²,

where m_i, i = 1, …, M₂, are i.i.d. random samples from µ. Since each evaluation of Q(m_i) involves solving the possibly nonlinear state equation (2.5), it is common practice to choose M₁ = M₂ = M and reuse the same samples m_i to estimate both the mean and variance of Q. However, a more careful choice of M₁ and M₂ relies on estimating the variances of Q and Q², and aims at minimizing the mean square error of Q̂ + β V̂_Q. Note that a large number of samples is required to obtain a relatively accurate approximation of E[Q] and Var[Q] due to the slow convergence of Monte Carlo sampling, thus rendering this approach computationally prohibitive for practical applications due to the large number of (nonlinear) PDE solves at each optimization step.
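The estimators (2.11)–(2.12) can be sketched in a few lines of NumPy. The closed-form function Q below is an illustrative stand-in (not from the paper); in the PDE setting each evaluation of Q(m_i) would require a full state solve:

```python
# Sketch of the plain Monte Carlo estimators (2.11)-(2.12) for E[Q], Var[Q].
import numpy as np

rng = np.random.default_rng(1)

def Q(m):
    # illustrative stand-in for Q(u(m, z)); in practice each evaluation
    # entails solving the (possibly nonlinear) state PDE (2.5)
    return np.sin(m[0]) + 0.1 * np.sum(m ** 2)

M = 10_000
samples = [Q(rng.standard_normal(3)) for _ in range(M)]    # Q(m_i), m_i ~ mu
Q_hat = np.mean(samples)                                   # (2.11)
V_hat = np.mean(np.square(samples)) - Q_hat ** 2           # (2.12)
```

With the same samples reused for both estimators, this corresponds to the common choice M₁ = M₂ = M discussed above.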
3. Taylor approximation and variance reduction.
We develop the main approximation methods in this section, which include the approximation by (low-order) Taylor expansion of the control objective, the computation of the gradient and Hessian of the control objective with respect to the uncertain parameter used in the Taylor expansion, the randomized algorithm to solve a generalized eigenvalue problem for trace estimation, and the Monte Carlo correction for the remainder of the Taylor expansion.
In this section we derive expressions for the Taylor expansion of the control objective Q(u(m, z)) with respect to the uncertain parameter m for a fixed value of the control z. In what follows, we denote with the subscript m the total derivative of Q(u(m, z)) with respect to m, and we assume sufficient smoothness so that all Fréchet derivatives are well-defined. For ease of notation, we also denote with ⟨·,·⟩ the duality pairing between M and M′, and, with a slight abuse of notation, we write Q(m) to indicate the control objective Q(u(m, z)) evaluated at a fixed value of z. A formal Taylor expansion of the control objective evaluated at m̄ ∈ M is given by

(3.1)  Q(m) = Q(m̄) + ⟨m − m̄, Q_m(m̄)⟩ + ½ ⟨m − m̄, Q_mm(m̄)(m − m̄)⟩ + ⋯,

where Q_m(m̄) ∈ M′ and Q_mm(m̄) : M → M′ are the first- and second-order Fréchet derivatives of Q with respect to m evaluated at the nominal m̄ ∈ M. In what follows, we refer to these derivatives as the gradient and Hessian of Q, respectively. Based on the Taylor expansion, we can define the linear approximation of Q as

(3.2)  Q_lin(m) = Q(m̄) + ⟨m − m̄, Q_m(m̄)⟩,
and the quadratic approximation of Q as

(3.3)  Q_quad(m) = Q(m̄) + ⟨m − m̄, Q_m(m̄)⟩ + ½ ⟨m − m̄, Q_mm(m̄)(m − m̄)⟩.

We remark that Q_lin(m) and Q_quad(m) are polynomials with respect to the uncertain parameter m, and depend on the control z through the solution of the state equation (2.5) at m = m̄. For the linear approximation, given the mean m̄ and covariance C of the uncertain parameter m, we have

(3.4)  E[Q_lin] = Q(m̄), and Var[Q_lin] = ⟨Q_m(m̄), C Q_m(m̄)⟩.

In fact, Q_lin(m) is Gaussian if the uncertain parameter m ∼ N(m̄, C). For the quadratic approximation, we have

(3.5)  E[Q_quad] = E[Q_lin] + ½ tr(H), and Var[Q_quad] = Var[Q_lin] + ½ tr(H²),

where H = C^{1/2} Q_mm(m̄) C^{1/2} is the covariance-preconditioned Hessian, and tr(·) denotes the trace operator; see [3] for details. Efficient algorithms to accurately estimate the traces of H and H² will be discussed in Sec. 3.3. Finally, following [3], we approximate the cost functional J(z) in (2.7) by replacing E[Q] and Var[Q] with the moments of the linear or quadratic Taylor expansions of the control objective. In this section we use the Lagrangian formalism to derive the expressions for the gradient Q_m and for the action of the Hessian Q_mm on a given direction m̂, evaluated at m̄. In what follows, for ease of notation, we use the overline to indicate quantities evaluated at m = m̄ and define

(3.6)  r̄ := r(u, v, m̄, z), and Q̄ := Q(u(m̄, z)).

We define the Lagrangian functional

(3.7)  L(u, v, m, z) = Q(u) + r(u, v, m, z),

where the adjoint v is regarded as the Lagrange multiplier of the state equation. By requiring the first-order variation of (3.7) at m̄ with respect to the adjoint v to vanish, we obtain the state problem: find u ∈ U such that

(3.8)  ⟨ṽ, ∂_v r̄⟩ = 0, ∀ ṽ ∈ V.

Similarly, by setting the first-order variation of (3.7) at m̄ with respect to the state u to zero, we obtain the adjoint problem: find v ∈ V such that

(3.9)  ⟨ũ, ∂_u r̄⟩ = −⟨ũ, ∂_u Q̄⟩, ∀ ũ ∈ U.

Then, the gradient of Q at m̄ acting in the direction m̃ = m − m̄ is given by

(3.10)  ⟨m̃, Q_m(m̄)⟩ = ⟨m̃, ∂_m r̄⟩,

where u solves the state problem (3.8) and v solves the adjoint problem (3.9). To compute the action of the Hessian of Q on m̂, we consider the Lagrangian functional

(3.11)  L^H(u, v, m, z; û, v̂, m̂) = ⟨m̂, ∂_m r̄⟩ + ⟨v̂, ∂_v r̄⟩ + ⟨û, ∂_u r̄ + ∂_u Q̄⟩,

where û and v̂ denote the incremental state and incremental adjoint, respectively.
By taking variations of (3.11) with respect to the adjoint v and using (2.2), we obtain the incremental state problem: find û ∈ U such that

(3.12)  ⟨ṽ, ∂_{vu} r̄ û⟩ = −⟨ṽ, ∂_{vm} r̄ m̂⟩, ∀ ṽ ∈ V,

where the derivatives ∂_{vu} r̄ : U → V′ and ∂_{vm} r̄ : M → V′ are linear operators. The incremental adjoint problem, obtained by taking variations of (3.11) with respect to the state u and using (2.2), reads: find v̂ ∈ V such that

(3.13)  ⟨ũ, ∂_{uv} r̄ v̂⟩ = −⟨ũ, ∂_{uu} r̄ û + ∂_{uu} Q̄ û + ∂_{um} r̄ m̂⟩, ∀ ũ ∈ U,

where ∂_{uv} r̄ : V → U′ is the adjoint of ∂_{vu} r̄ : U → V′ in the sense of (2.2). We remark that an extra term −⟨ũ, ∂_{um} Q̄ m̂⟩ is added on the right-hand side if Q explicitly depends on m. The Hessian of Q at m̄ acting on m̂ can then be computed by taking variations of (3.11) with respect to m and using (2.2) as

(3.14)  ⟨m̃, Q_mm(m̄) m̂⟩ = ⟨m̃, ∂_{mv} r̄ v̂ + ∂_{mu} r̄ û + ∂_{mm} r̄ m̂⟩,

where the incremental state û and the incremental adjoint v̂ solve (3.12) and (3.13), respectively. Note that ⟨m̃, ∂_{mu} Q̄ û + ∂_{mm} Q̄ m̂⟩ is also added if Q explicitly depends on m. We summarize the computation of the gradient and Hessian of the control objective Q with respect to the uncertain parameter m in Algorithm 1.

Algorithm 1
Computation of the gradient and Hessian of Q with respect to m.
Input: control objective Q, PDE r(u, v, m, z), mean m̄, direction m̂.
Output: the gradient Q_m(m̄) and the Hessian Q_mm(m̄) acting on m̂.
1. Solve the state problem (3.8) for u.
2. Solve the adjoint problem (3.9) for v.
3. Compute the gradient Q_m(m̄) by (3.10).
4. Solve the incremental state problem (3.12) for û.
5. Solve the incremental adjoint problem (3.13) for v̂.
6. Compute the Hessian Q_mm(m̄) acting on m̂ by (3.14).

The first and second moments of the quadratic approximation Q_quad in equation (3.5) involve the trace of the covariance-preconditioned Hessian H. H is formally a large-scale dense operator, and it is only implicitly defined through its action on a given direction as in (3.14). The standard approach to estimate the trace of an implicitly defined operator is based on a Monte Carlo method, where one computes the action of the operator on N random directions sampled from a suitable distribution. For example, using the Gaussian trace estimator, one can approximate tr(H) and tr(H²) as

(3.15)  tr(H) ≈ T̂(H) := (1/N) Σ_{j=1}^{N} ⟨m̂_j, Q_mm(m̄) m̂_j⟩,

and

(3.16)  tr(H²) ≈ T̂(H²) := (1/N) Σ_{j=1}^{N} ⟨Q_mm(m̄) m̂_j, C Q_mm(m̄) m̂_j⟩,

where m̂_j, j = 1, …, N, are i.i.d. random samples from the Gaussian distribution N(0, C). Lower bounds on the number of samples N needed to obtain a guaranteed probabilistic error bound for the trace are discussed in [5]. Such bounds hold for any symmetric positive definite operator, regardless of size and spectral properties.
However, the number of samples N necessary to obtain an accurate approximation may be prohibitively large and impractical: e.g., guaranteeing even a modest relative error in the estimation of the trace requires a very large number of samples by the bounds in [5]. Improved bounds are demonstrated in [66] and show that the Gaussian estimator becomes very inefficient if the stable rank of the operator is small, and that it allows for small N only if the eigenvalues are all of approximately the same size. This means that the randomized Gaussian estimator is not a viable solution to estimate the trace of H, since it has been observed numerically or proven analytically that for many problems the Hessian operator is either nearly low-rank or its eigenvalues exhibit fast decay [8, 36, 19, 21, 20, 18, 23, 30, 2, 3, 1, 33, 64, 46, 58, 22, 24]. In this work, we propose to estimate the traces of H and H² by means of a randomized approximate eigendecomposition. Specifically, we approximate tr(H) and tr(H²) by

(3.17)  tr(H) ≈ T̂(H) := Σ_{j=1}^{N} λ_j(H), and tr(H²) ≈ T̂(H²) := Σ_{j=1}^{N} λ_j(H)²,

where λ_j(H), j = 1, …, N, are the dominant eigenvalues of H, obtained by solving the generalized eigenvalue problem

(3.18)  ⟨φ, Q_mm(m̄) ψ_j⟩ = λ_j ⟨φ, C⁻¹ ψ_j⟩, ∀ φ ∈ M, j = 1, …, N.

Above, the eigenvectors ψ_j, j = 1, …, N, are C⁻¹-orthonormal, that is,

(3.19)  ⟨ψ_j, C⁻¹ ψ_i⟩ = δ_ij, i, j = 1, …, N,

where δ_ij is the Kronecker delta. Note that N is independent of the nominal parameter dimension and, as shown numerically in Sec. 5, it is relatively small when H is nearly low-rank or its spectrum exhibits fast decay. After suitable discretization, e.g., by a finite element method, the generalized eigenvalue problem (3.18) results in an algebraic eigenproblem of the form Aψ = λBψ with A, B ∈ R^{n×n} and ψ ∈ R^n.
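A dense NumPy sketch of one such randomized solve of Aψ = λBψ, together with the truncated-spectrum trace estimates of (3.17), is given below. The synthetic operators, sizes, and decay rate are illustrative assumptions; B is taken as the identity for brevity, whereas in the PDE setting A is the matrix-free Hessian and B the discretized prior precision C⁻¹:

```python
# Sketch: randomized double-pass approximation of the dominant eigenpairs
# of A psi = lambda B psi, and the trace estimates (3.17). Illustrative only.
import numpy as np

rng = np.random.default_rng(2)
n, k, p = 100, 10, 5

# synthetic SPD test operator with fast (geometric) eigenvalue decay
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U * (2.0 ** -np.arange(n))) @ U.T   # eigenvalues 1, 1/2, 1/4, ...
B = np.eye(n)                            # identity stand-in for C^{-1}

Omega = rng.standard_normal((n, k + p))  # 1. Gaussian test matrix
Y = np.linalg.solve(B, A @ Omega)        # 2. Y = B^{-1} (A Omega)
# 3. B-orthonormalize: Q^T B Q = I (here B = I, so ordinary QR suffices)
Qm, _ = np.linalg.qr(Y)
T = Qm.T @ A @ Qm                        # 4. small (k+p)x(k+p) projected problem
lam, S = np.linalg.eigh(T)
idx = np.argsort(lam)[::-1][:k]          # 5. keep the k dominant Ritz pairs
lam_k = lam[idx]
Psi_k = Qm @ S[:, idx]

tr_H = np.sum(lam_k)          # (3.17): tr(H)   ~ sum of dominant eigenvalues
tr_H2 = np.sum(lam_k ** 2)    # (3.17): tr(H^2) ~ sum of their squares
```

Because the spectrum here decays geometrically, the truncation error of the trace estimates is bounded by the small tail of the discarded eigenvalues.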
Algorithm 2 summarizes the so-called double pass randomized algorithm to solve the algebraic generalized eigenvalue problem (see [42, 70] for details of the algorithm and [76] for its implementation). There are multiple advantages in using Algorithm 2 to approximate the trace; see, e.g., [42, 69]. In terms of accuracy, the approximation error is bounded by the sum of the remaining eigenvalues, so that the error is small if the eigenvalues decay fast or if the Hessian H has low rank; see [42, 69] for more details. In terms of computational efficiency, the 2(k + p) Hessian matrix-vector products, which entail the solution of a pair of linearized state/adjoint equations (as shown in Sec. 3.2) and therefore dominate the computational cost of the algorithm, can be computed independently and, therefore, asynchronously across the random directions.

Algorithm 2
Randomized algorithm for the generalized eigenvalue problem (A, B).
Input: matrices A, B, the number of eigenpairs k, an oversampling factor p.
Output: eigenpairs (Λ_k, Ψ_k) with Λ_k = diag(λ₁, …, λ_k) and Ψ_k = (ψ₁, …, ψ_k).
1. Draw a Gaussian random matrix Ω ∈ R^{n×(k+p)}.
2. Compute Y = B⁻¹(AΩ).
3. Compute the QR-factorization Y = QR such that Qᵀ B Q = I_{k+p}.
4. Form T = Qᵀ A Q and compute the eigendecomposition T = S Λ Sᵀ.
5. Extract Λ_k = Λ(1:k, 1:k) and Ψ_k = Q S_k with S_k = S(:, 1:k).

Using the moments of the truncated Taylor expansions Q_lin and Q_quad as approximations of E[Q] and Var[Q] in the cost functional (2.7) may not be sufficiently accurate. In this circumstance, we can use the Taylor approximation as a control variate to reduce the variance of the Monte Carlo estimator. The basic idea is to correct the moments of Q_lin and Q_quad by applying a Monte Carlo method to estimate the mean (and variance) of the difference between Q and its Taylor approximation. Specifically, recalling (3.4) and (3.2), the Monte Carlo correction for the mean of the linear approximation reads

(3.20)  E[Q] = E[Q_lin] + E[Q − Q_lin] ≈ Q̂_lin := Q(m̄) + (1/M₁) Σ_{i=1}^{M₁} (Q(m_i) − Q(m̄) − ⟨m_i − m̄, Q_m(m̄)⟩).

Similarly, recalling that the variance is invariant with respect to deterministic translations, we have

(3.21)  Var[Q] = Var[Q − Q(m̄)] = E[(Q − Q(m̄))²] − (E[Q − Q(m̄)])²
        = E[(Q_lin − Q(m̄))²] + E[(Q − Q(m̄))² − (Q_lin − Q(m̄))²]
        − (E[Q_lin − Q(m̄)] + E[(Q − Q(m̄)) − (Q_lin − Q(m̄))])²,

and therefore the Monte Carlo correction of the variance of the linear approximation is given by

(3.22)  V̂_Q^lin := ⟨C Q_m(m̄), Q_m(m̄)⟩ + (1/M₂) Σ_{i=1}^{M₂} ((Q(m_i) − Q(m̄))² − ⟨m_i − m̄, Q_m(m̄)⟩²)
        − ((1/M₂) Σ_{i=1}^{M₂} (Q(m_i) − Q(m̄) − ⟨m_i − m̄, Q_m(m̄)⟩))².

Note that, similar to the sample average approximation in Sec.
2.4, for simplicity we can use the same M_1 = M_2 = M random samples for the approximation of both the mean and the variance. A more careful choice of M_1 and M_2 would rely on the variances of Q^{(1)} = Q - Q_{\rm lin} and Q^{(2)} = (Q - Q(\bar m))^2 - (Q_{\rm lin} - Q(\bar m))^2, in order to balance the errors in the approximation of the mean and of the variance. For the quadratic approximation, a combination of (3.5) and (3.3) leads to

(3.23)   \mathbb{E}[Q] = \mathbb{E}[Q_{\rm quad}] + \mathbb{E}[Q - Q_{\rm quad}] \approx \widehat{Q}_{\rm quad} := Q(\bar m) + \frac{1}{2}\mathrm{tr}(H) + \frac{1}{M_1} \sum_{i=1}^{M_1} \Big( Q(m_i) - Q(\bar m) - \langle m_i - \bar m,\, Q_m(\bar m) \rangle - \frac{1}{2}\langle m_i - \bar m,\, Q_{mm}(\bar m)(m_i - \bar m) \rangle \Big).

For the variance, akin to the linear approximation, we obtain

(3.24)   \mathrm{Var}[Q] = \mathbb{E}[(Q_{\rm quad} - Q(\bar m))^2] + \mathbb{E}[(Q - Q(\bar m))^2 - (Q_{\rm quad} - Q(\bar m))^2]
           - \big( \mathbb{E}[Q_{\rm quad} - Q(\bar m)] + \mathbb{E}[(Q - Q(\bar m)) - (Q_{\rm quad} - Q(\bar m))] \big)^2,

which can be approximated by

(3.25)   \widehat{V}^{\rm quad}_Q := \langle \mathcal{C} Q_m(\bar m), Q_m(\bar m) \rangle + \frac{1}{4}(\mathrm{tr}(H))^2 + \frac{1}{2}\mathrm{tr}(H^2)
         + \frac{1}{M_2} \sum_{i=1}^{M_2} \Big( (Q(m_i) - Q(\bar m))^2 - \big( \langle m_i - \bar m,\, Q_m(\bar m) \rangle + \frac{1}{2}\langle m_i - \bar m,\, Q_{mm}(\bar m)(m_i - \bar m) \rangle \big)^2 \Big)
         - \bigg( \frac{1}{2}\mathrm{tr}(H) + \frac{1}{M_2} \sum_{i=1}^{M_2} \big( Q(m_i) - Q(\bar m) - \langle m_i - \bar m,\, Q_m(\bar m) \rangle - \frac{1}{2}\langle m_i - \bar m,\, Q_{mm}(\bar m)(m_i - \bar m) \rangle \big) \bigg)^2.

Note that, to achieve a given approximation accuracy, the number of Monte Carlo samples depends on the variance of the integrand. If the approximations Q_lin and Q_quad are highly correlated with Q, the variances of Q - Q_lin and Q - Q_quad will be small, so that only a small number of Monte Carlo samples is required. This use of the Taylor approximation (of the control objective with respect to the uncertain parameter) can be viewed as a kind of "multifidelity Monte Carlo method" [63], in which the Taylor approximation plays the role of a reduced model.
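A toy sketch may help make the control-variate mechanics concrete. The snippet below is purely illustrative: `Q` is a hypothetical scalar stand-in for the PDE-based control objective (not the paper's), and `Q_lin` is its first-order Taylor surrogate about the nominal parameter, whose mean is known exactly; only the estimator structure follows (3.20).

```python
import numpy as np

# Control-variate correction of the mean, in the spirit of (3.20):
# E[Q] = E[Q_lin] + E[Q - Q_lin], with E[Q_lin] known analytically.
rng = np.random.default_rng(0)

def Q(m):
    return np.exp(0.3 * m)          # hypothetical nonlinear objective

def Q_lin(m):
    return 1.0 + 0.3 * m            # Taylor surrogate; E[Q_lin] = 1 exactly

M = 200
m = rng.standard_normal(M)          # samples of the uncertain parameter
plain_mc = Q(m).mean()                          # plain Monte Carlo estimate
corrected = 1.0 + (Q(m) - Q_lin(m)).mean()      # E[Q_lin] + MC correction
```

Because Q - Q_lin has much smaller variance than Q itself, the corrected estimator reaches a given mean-square error with far fewer samples, which is exactly the rationale above.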
4. Gradient-based optimization.
In this section, we present a gradient-based optimization method to solve problem (2.10) using the different approximations introduced in Sec. 3, namely the sample average approximation, the first and second order Taylor approximations, and variance reduction using the Taylor approximation as a control variate. These approximate formulations feature different levels of accuracy and computational complexity, and could be combined within a multifidelity framework to obtain an efficient computational procedure while preserving high accuracy. We use the Lagrangian formalism to derive expressions for the gradient of the approximate cost functionals with respect to the control z, which are needed to perform the gradient-based optimization. In the following, to distinguish the gradient of the cost functional with respect to the control from the gradient of the control objective with respect to the uncertain parameter, we refer to the former as the z-gradient.

4.1. z-gradient for the sample average approximation. With the sample average approximation presented in Sec. 2.4, the cost functional becomes

(4.1)   J^{\rm MC}(z) = \widehat{Q} + \beta \widehat{V}_Q + P(z),

subject to the state problem (2.5) at m_i, i = 1, \dots, \widetilde{M}, where \widetilde{M} = M if the same samples are used for the mean and the variance, or \widetilde{M} = M_1 + M_2 otherwise. To compute the gradient of the cost functional with respect to the control z (z-gradient for short), we consider the Lagrangian functional

(4.2)   \mathcal{L}^{\rm MC}(\{u_i\}, \{v_i\}, z) = J^{\rm MC}(z) + \sum_{i=1}^{\widetilde{M}} r(u_i, v_i, m_i, z),

where \{v_i\} are the Lagrange multipliers (adjoints) of the state problems: given m_i \in \mathcal{M} and z \in \mathcal{Z}, find u_i \in \mathcal{U} such that

(4.3)   r(u_i, v_i, m_i, z) = 0, \quad \forall v_i \in \mathcal{V}, \; i = 1, \dots, \widetilde{M}.

The i-th adjoint problem (i = 1, \dots, \widetilde{M}) is obtained by setting to zero the first variation of the Lagrangian functional with respect to the state u_i.
The i-th adjoint v_i then solves: given m_i \in \mathcal{M}, z \in \mathcal{Z}, and u_i \in \mathcal{U} the solution of (4.3), find v_i \in \mathcal{V} such that

(4.4)   \langle \tilde{u}_i, \partial_u r(u_i, v_i, m_i, z) \rangle = -\frac{1}{\widetilde{M}} \langle \tilde{u}_i, \partial_u Q(m_i) \rangle - \frac{2\beta}{\widetilde{M}} Q(m_i) \langle \tilde{u}_i, \partial_u Q(m_i) \rangle + \frac{2\beta}{\widetilde{M}^2} \sum_{j=1}^{\widetilde{M}} Q(m_j) \langle \tilde{u}_i, \partial_u Q(m_i) \rangle, \quad \forall \tilde{u}_i \in \mathcal{U}.

Then we can compute the z-gradient as

(4.5)   \langle \tilde{z}, D_z J^{\rm MC}(z) \rangle = \langle \tilde{z}, \partial_z \mathcal{L}^{\rm MC}(\{u_i\}, \{v_i\}, z) \rangle = \langle \tilde{z}, D_z P(z) \rangle + \sum_{i=1}^{\widetilde{M}} \langle \tilde{z}, \partial_z r(u_i, v_i, m_i, z) \rangle,

where \{u_i\} and \{v_i\} denote the sets of solutions of the state problem (4.3) and the adjoint problem (4.4) at the samples \{m_i\}, respectively. Therefore, each gradient evaluation requires \widetilde{M} (nonlinear) state PDE solves and \widetilde{M} linearized PDE solves. Note that, in the general case in which Q explicitly depends on z, extra terms involving \langle \tilde{z}, \partial_z Q(m_i) \rangle need to be included in the gradient computation.

4.2. z-gradient for the Taylor approximation. We first consider the linear approximation (3.4), by which the cost functional (2.7) becomes

(4.6)   J_{\rm lin}(z) = \bar{Q} + \beta \langle \mathcal{C} Q_m(\bar m), Q_m(\bar m) \rangle + P(z),

where \bar{Q} and the gradient Q_m(\bar m) are defined in (3.6) and (3.10), respectively. To compute the z-gradient of (4.6), we introduce the Lagrangian functional

(4.7)   \mathcal{L}_{\rm lin}(u, v, u^*, v^*, z) = J_{\rm lin}(z) + \langle v^*, \partial_v \bar{r} \rangle + \langle u^*, \partial_u \bar{r} + \partial_u \bar{Q} \rangle,

where the adjoints v^* \in \mathcal{V} and u^* \in \mathcal{U} are the Lagrange multipliers of the state problem (3.8) and the adjoint problem (3.9). By setting to zero the first variation of the Lagrangian (4.7) with respect to v and using (2.2), we have: find u^* \in \mathcal{U} such that

(4.8)   \langle \tilde{v}, \partial_{vu} \bar{r}\, u^* \rangle = -2\beta \langle \tilde{v}, \partial_{vm} \bar{r} (\mathcal{C} \partial_m \bar{r}) \rangle, \quad \forall \tilde{v} \in \mathcal{V},

while setting to zero the first variation with respect to u and using (2.2), we have: find v^* \in \mathcal{V} such that

(4.9)   \langle \tilde{u}, \partial_{uv} \bar{r}\, v^* \rangle = -\langle \tilde{u}, \partial_u \bar{Q} \rangle - 2\beta \langle \tilde{u}, \partial_{um} \bar{r} (\mathcal{C} \partial_m \bar{r}) \rangle - \langle \tilde{u}, \partial_{uu} \bar{r}\, u^* + \partial_{uu} \bar{Q}\, u^* \rangle, \quad \forall \tilde{u} \in \mathcal{U}.
Finally, the z-gradient of J_lin in a direction \tilde{z} \in \mathcal{Z} is computed as

(4.10)   \langle \tilde{z}, D_z J_{\rm lin}(z) \rangle = \langle \tilde{z}, \partial_z \mathcal{L}_{\rm lin}(u, v, u^*, v^*, z) \rangle = 2\beta \langle \tilde{z}, \partial_{zm} \bar{r} (\mathcal{C} \partial_m \bar{r}) \rangle + \langle \tilde{z}, \partial_z P(z) \rangle + \langle \tilde{z}, \partial_{zv} \bar{r}\, v^* + \partial_{zu} \bar{r}\, u^* \rangle.

In summary, evaluation of the cost functional with the linear approximation requires solution of the state problem (3.8) and the adjoint problem (3.9); computation of the z-gradient requires, in addition, solution of the two linear problems (4.8) and (4.9).

For the quadratic approximation (3.3), thanks to the relation (3.5) and the approximation (3.16), we obtain the approximate cost functional

(4.11)   J_{\rm quad}(z) = \bar{Q} + \frac{1}{2} \sum_{j=1}^N \lambda_j + \beta \Big( \langle \mathcal{C} Q_m(\bar m), Q_m(\bar m) \rangle + \frac{1}{2} \sum_{j=1}^N \lambda_j^2 \Big) + P(z),

subject to the state and adjoint problems (3.8) and (3.9) for the evaluation of Q and its gradient at \bar m, the generalized eigenvalue problem (3.18) for the computation of the eigenvalues, and the incremental state and adjoint problems (3.12) and (3.13) for the Hessian action in (3.18). Correspondingly, we form the Lagrangian

(4.12)   \mathcal{L}_{\rm quad}\big(u, v, \{\lambda_j\}, \{\psi_j\}, \{\hat{u}_j\}, \{\hat{v}_j\}, u^*, v^*, \{\lambda^*_j\}, \{\psi^*_j\}, \{\hat{u}^*_j\}, \{\hat{v}^*_j\}, z\big)
         = J_{\rm quad}(z) + \langle v^*, \partial_v \bar{r} \rangle + \langle u^*, \partial_u \bar{r} + \partial_u \bar{Q} \rangle
         + \sum_{j=1}^N \langle \psi^*_j, (Q_{mm}(\bar m) - \lambda_j \mathcal{C}^{-1}) \psi_j \rangle + \sum_{j=1}^N \lambda^*_j \big( \langle \psi_j, \mathcal{C}^{-1} \psi_j \rangle - 1 \big)
         + \sum_{j=1}^N \langle \hat{v}^*_j, \partial_{vu} \bar{r}\, \hat{u}_j + \partial_{vm} \bar{r}\, \psi_j \rangle + \sum_{j=1}^N \langle \hat{u}^*_j, \partial_{uv} \bar{r}\, \hat{v}_j + \partial_{uu} \bar{r}\, \hat{u}_j + \partial_{uu} \bar{Q}\, \hat{u}_j + \partial_{um} \bar{r}\, \psi_j \rangle.

Here we assume that the dominant eigenvalues are not repeated (thanks to their rapid decay, as shown in Sec. 5), so that the constraints \langle \psi_j, \mathcal{C}^{-1} \psi_j \rangle = 1, j = 1, \dots, N, are sufficient to guarantee the orthonormality \langle \psi_i, \mathcal{C}^{-1} \psi_j \rangle = \delta_{ij} for all i, j = 1, \dots, N. In fact, for any i \neq j, by definition we have

(4.13)   \langle \psi_j, Q_{mm} \psi_i \rangle = \lambda_i \langle \psi_j, \mathcal{C}^{-1} \psi_i \rangle \quad \text{and} \quad \langle \psi_i, Q_{mm} \psi_j \rangle = \lambda_j \langle \psi_i, \mathcal{C}^{-1} \psi_j \rangle,

which, by the symmetry of Q_{mm} and \mathcal{C}^{-1}, leads to

(4.14)   (\lambda_i - \lambda_j) \langle \psi_i, \mathcal{C}^{-1} \psi_j \rangle = 0,

so that \langle \psi_i, \mathcal{C}^{-1} \psi_j \rangle = 0 if \lambda_i \neq \lambda_j. By setting to zero the variation of \mathcal{L}_{\rm quad} with respect to \lambda_j, we obtain

(4.15)   \psi^*_j = \frac{1 + 2\beta\lambda_j}{2} \psi_j, \quad j = 1, \dots, N.

Subsequently, for each j = 1, \dots, N, setting to zero the variation of \mathcal{L}_{\rm quad} with respect to \hat{v}_j, and using the Hessian action equation (3.14) and (2.2), we have: find \hat{u}^*_j \in \mathcal{U} such that

(4.16)   \langle \tilde{v}, \partial_{vu} \bar{r}\, \hat{u}^*_j \rangle = -\langle \tilde{v}, \partial_{vm} \bar{r}\, \psi^*_j \rangle, \quad \forall \tilde{v} \in \mathcal{V},

which has the same form as the incremental forward problem (3.12). Therefore, from (4.15), we have

(4.17)   \hat{u}^*_j = \frac{1 + 2\beta\lambda_j}{2} \hat{u}_j, \quad j = 1, \dots, N.

Setting the variation with respect to \hat{u}_j to zero and using (2.2), we have: find \hat{v}^*_j \in \mathcal{V} such that

(4.18)   \langle \tilde{u}, \partial_{uv} \bar{r}\, \hat{v}^*_j \rangle = -\langle \tilde{u}, \partial_{uu} \bar{r}\, \hat{u}^*_j + \partial_{uu} \bar{Q}\, \hat{u}^*_j + \partial_{um} \bar{r}\, \psi^*_j \rangle, \quad \forall \tilde{u} \in \mathcal{U},

which has the same form as the incremental adjoint problem (3.13). Therefore, by (4.17) and (4.15), we obtain

(4.19)   \hat{v}^*_j = \frac{1 + 2\beta\lambda_j}{2} \hat{v}_j, \quad j = 1, \dots, N.

By setting to zero the variation with respect to v and using (2.2), we have: find u^* \in \mathcal{U} such that

(4.20)   \langle \tilde{v}, \partial_{vu} \bar{r}\, u^* \rangle = -2\beta \langle \tilde{v}, \partial_{vm} \bar{r} (\mathcal{C} \partial_m \bar{r}) \rangle - \sum_{j=1}^N \langle \tilde{v}, \partial_{vmu} \bar{r}\, \hat{u}_j \psi^*_j + \partial_{vmm} \bar{r}\, \psi_j \psi^*_j \rangle - \sum_{j=1}^N \langle \tilde{v}, \partial_{vuu} \bar{r}\, \hat{u}_j \hat{u}^*_j + \partial_{vum} \bar{r}\, \psi_j \hat{u}^*_j \rangle, \quad \forall \tilde{v} \in \mathcal{V}.
Lastly, by setting to zero the variation with respect to u and using (2.2), we have: find v^* \in \mathcal{V} such that

(4.21)   \langle \tilde{u}, \partial_{uv} \bar{r}\, v^* \rangle = -\langle \tilde{u}, \partial_u \bar{Q} \rangle - 2\beta \langle \tilde{u}, \partial_{um} \bar{r} (\mathcal{C} \partial_m \bar{r}) \rangle - \langle \tilde{u}, \partial_{uu} \bar{r}\, u^* + \partial_{uu} \bar{Q}\, u^* \rangle
         - \sum_{j=1}^N \langle \tilde{u}, \partial_{umv} \bar{r}\, \hat{v}_j \psi^*_j + \partial_{umu} \bar{r}\, \hat{u}_j \psi^*_j + \partial_{uum} \bar{r}\, \psi_j \psi^*_j \rangle
         - \sum_{j=1}^N \langle \tilde{u}, \partial_{uvu} \bar{r}\, \hat{u}_j \hat{v}^*_j + \partial_{uvm} \bar{r}\, \psi_j \hat{v}^*_j \rangle
         - \sum_{j=1}^N \langle \tilde{u}, \partial_{uuv} \bar{r}\, \hat{v}_j \hat{u}^*_j + \partial_{uuu} \bar{r}\, \hat{u}_j \hat{u}^*_j + \partial_{uuu} \bar{Q}\, \hat{u}_j \hat{u}^*_j + \partial_{uum} \bar{r}\, \psi_j \hat{u}^*_j \rangle, \quad \forall \tilde{u} \in \mathcal{U},

where the third-order derivatives take the form of bilinear maps \partial_{xyz} f : \mathcal{Z} \times \mathcal{Y} \to \mathcal{X}'. Then the z-gradient of the cost functional in a direction \tilde{z} \in \mathcal{Z} can be computed as

(4.22)   \langle \tilde{z}, D_z J_{\rm quad}(z) \rangle = \langle \tilde{z}, \partial_z \mathcal{L}_{\rm quad}(\text{states}, \text{adjoints}, z) \rangle,

where "states" and "adjoints" are shorthand for the state and adjoint variables. Note that we do not need to solve for \lambda^*_j, since the terms involving \lambda^*_j are independent of z. In summary, to obtain these states and adjoints we need to solve 1 (nonlinear) state problem and 2N + 3 linear problems (1 for v, N for \{\hat{u}_j\}, N for \{\hat{v}_j\}, 1 for u^*, and 1 for v^*), where the linear operators are the same as those in (3.12) and (3.13). Moreover, to compute the eigenvalues by Algorithm 2, which requires the action of the Hessian on N + p random directions twice, we need to solve 4(N + p) linear problems. We summarize the evaluation of the cost functional and its z-gradient with the quadratic approximation as follows.
Algorithm 3
Computation of the cost functional J_quad and its z-gradient.
Input: cost functional J(z) and PDE residual r(u, v, m, z).
Output: the approximate cost functional J_quad and its z-gradient D_z J_quad.
1. Solve the state problem (3.8) for u at \bar m and evaluate \bar{Q} = Q(u(\bar m)).
2. Solve the adjoint problem (3.9) for v and evaluate Q_m(\bar m) by (3.10).
3. Use Algorithms 1 and 2 to compute the generalized eigenpairs (\lambda_j, \psi_j)_{j=1}^N.
4. Compute the approximate cost functional J_quad by (4.11).
5. Solve the linear problem (4.20) for u^*.
6. Solve the linear problem (4.21) for v^*.
7. Compute the z-gradient D_z J_quad by (4.22).

4.3. z-gradient for the Monte Carlo correction. The approximations (4.6) and (4.11) introduced by the Taylor expansion may not be sufficiently accurate and introduce bias in the evaluation of the mean E[Q] and the variance Var[Q]. Recalling (3.20) and (3.22), we obtain an MC-corrected, unbiased linear approximation of the cost functional, given by

(4.23)   J^{\rm MC}_{\rm lin}(z) = \widehat{Q}_{\rm lin} + \beta \widehat{V}^{\rm lin}_Q + P(z),

subject to the state problem (3.8) at \bar m, the adjoint problem (3.9), and the state problems (2.5) at m_i, i = 1, \dots, \widetilde{M}, where \widetilde{M} = M if the same samples are used for the mean and the variance, or \widetilde{M} = M_1 + M_2 otherwise. We form the corresponding Lagrangian functional

(4.24)   \mathcal{L}^{\rm MC}_{\rm lin}\big(u, v, \{u_i\}, u^*, v^*, \{v_i\}, z\big) = J^{\rm MC}_{\rm lin}(z) + \langle v^*, \partial_v \bar{r} \rangle + \langle u^*, \partial_u \bar{r} + \partial_u \bar{Q} \rangle + \sum_{i=1}^{\widetilde{M}} r(u_i, v_i, m_i, z).

Setting the variations with respect to v^*, v_i, and u^* to zero, we obtain u, u_i, and v by solving 1 + \widetilde{M} state problems (2.5) and 1 linear adjoint problem (3.9). In a similar way, setting the variations with respect to v, u_i, and u to zero, we obtain u^*, v_i, and v^* by solving 2 + \widetilde{M} linear problems.
Thus, 2 + \widetilde{M} additional linear problems need to be solved to compute the z-gradient

(4.25)   \langle \tilde{z}, D_z J^{\rm MC}_{\rm lin}(z) \rangle = \langle \tilde{z}, \partial_z \mathcal{L}^{\rm MC}_{\rm lin}\big(u, v, \{u_i\}, u^*, v^*, \{v_i\}, z\big) \rangle.

As for the quadratic approximation (3.3), by the Monte Carlo corrections (3.23) and (3.25) we obtain an unbiased and more accurate evaluation of the cost functional,

(4.26)   J^{\rm MC}_{\rm quad}(z) = \widehat{Q}_{\rm quad} + \beta \widehat{V}^{\rm quad}_Q + P(z),

subject to the state problems (2.5) and (3.8), the adjoint problem (3.9), and the incremental state and adjoint problems (3.12) and (3.13). We form the Lagrangian, with adjoint variables (u^*, v^*, \{v_i\}, \{\lambda^*_j\}, \{\psi^*_j\}, \{\hat{u}^*_j\}, \{\hat{v}^*_j\}, \{\hat{u}^*_i\}, \{\hat{v}^*_i\}), as

(4.27)   \mathcal{L}^{\rm MC}_{\rm quad}\big(u, v, \{u_i\}, \{\lambda_j\}, \{\psi_j\}, \{\hat{u}_j\}, \{\hat{v}_j\}, \{\hat{u}_i\}, \{\hat{v}_i\}, u^*, v^*, \{v_i\}, \{\lambda^*_j\}, \{\psi^*_j\}, \{\hat{u}^*_j\}, \{\hat{v}^*_j\}, \{\hat{u}^*_i\}, \{\hat{v}^*_i\}, z\big)
         = J^{\rm MC}_{\rm quad} + \langle v^*, \partial_v \bar{r} \rangle + \langle u^*, \partial_u \bar{r} + \partial_u \bar{Q} \rangle + \sum_{i=1}^{\widetilde{M}} r(u_i, v_i, m_i, z)
         + \sum_{j=1}^N \langle \psi^*_j, (Q_{mm}(\bar m) - \lambda_j \mathcal{C}^{-1}) \psi_j \rangle + \sum_{j=1}^N \lambda^*_j \big( \langle \psi_j, \mathcal{C}^{-1} \psi_j \rangle - 1 \big)
         + \sum_{j=1}^N \langle \hat{v}^*_j, \partial_{vu} \bar{r}\, \hat{u}_j + \partial_{vm} \bar{r}\, \psi_j \rangle + \sum_{j=1}^N \langle \hat{u}^*_j, \partial_{uv} \bar{r}\, \hat{v}_j + \partial_{uu} \bar{r}\, \hat{u}_j + \partial_{uu} \bar{Q}\, \hat{u}_j + \partial_{um} \bar{r}\, \psi_j \rangle
         + \sum_{i=1}^{\widetilde{M}} \langle \hat{v}^*_i, \partial_{vu} \bar{r}\, \hat{u}_i + \partial_{vm} \bar{r}\, (m_i - \bar m) \rangle + \sum_{i=1}^{\widetilde{M}} \langle \hat{u}^*_i, \partial_{uv} \bar{r}\, \hat{v}_i + \partial_{uu} \bar{r}\, \hat{u}_i + \partial_{uu} \bar{Q}\, \hat{u}_i + \partial_{um} \bar{r}\, (m_i - \bar m) \rangle.

By setting the variations with respect to v^*, u^*, v_i, \hat{v}^*_j, \hat{u}^*_j, \hat{v}^*_i, \hat{u}^*_i to zero, we obtain the states u, v, u_i, \hat{u}_j, \hat{v}_j, \hat{u}_i, \hat{v}_i by solving 1 + \widetilde{M} (nonlinear) state problems and 1 + 2N + 2\widetilde{M} linear problems, i = 1, \dots, \widetilde{M}, j = 1, \dots, N. For the eigenpairs, 4(N + p) linear problems need to be solved.
By setting the variations with respect to \lambda_j, \hat{v}_j, and \hat{u}_j to zero, we obtain expressions analogous to (4.15), (4.17), and (4.19) for the adjoint variables \psi^*_j, \hat{u}^*_j, and \hat{v}^*_j, where the constant (1 + 2\beta\lambda_j)/2 is replaced by (1 + 2\beta\lambda_j)/2 - \beta c, with c given by

(4.28)   c = \frac{1}{\widetilde{M}} \sum_{i=1}^{\widetilde{M}} \Big( Q(m_i) - Q(\bar m) - \langle m_i - \bar m,\, Q_m(\bar m) \rangle - \frac{1}{2} \langle m_i - \bar m,\, Q_{mm}(\bar m)(m_i - \bar m) \rangle \Big).

By setting the variations with respect to \hat{v}_i, \hat{u}_i, u_i, v, and u to zero, we obtain the adjoints \hat{u}^*_i, \hat{v}^*_i, v_i, u^*, v^* by solving 2 + 3\widetilde{M} linear problems, which are defined similarly to those for the quadratic approximation but with additional correction terms in both the cost functional and the constraints. In total, 1 + \widetilde{M} (nonlinear) state problems and 3 + 6N + 4p + 5\widetilde{M} linear problems are required to evaluate the z-gradient

(4.29)   \langle \tilde{z}, D_z J^{\rm MC}_{\rm quad}(z) \rangle = \langle \tilde{z}, \partial_z \mathcal{L}^{\rm MC}_{\rm quad}(\text{states}, \text{adjoints}, z) \rangle,

where "states" and "adjoints" are shorthand for the state and adjoint variables.

4.4. z-gradient-based optimization method. To solve the PDE-constrained optimal control problem we use a gradient-based optimization method, since it only requires the ability to evaluate the cost functional (2.7) and its z-gradient; in particular, we use a BFGS method [62]. The five approximations described in this section differ in accuracy and computational cost. Table 1 summarizes the cost, measured in numbers of (nonlinear) state and linear PDE solves, of evaluating the cost functional and its z-gradient for each approximation. The linear approximation (4.6) is the cheapest to compute but also the crudest, while the MC-corrected approximations (4.23) and (4.26) are the most accurate but also the most expensive. The quadratic approximation (4.11) is, in many applications, more accurate than the linear approximation and, for highly nonlinear problems, much cheaper to compute than the MC-corrected approximations.
In fact, all the linear PDE solves in the trace approximation share the same left-hand-side operator \partial_{vu} \bar{r} (or its adjoint \partial_{uv} \bar{r}), and therefore one can amortize the cost of computing an expensive preconditioner or sparse factorization over several solutions of the linearized PDE. It is worth noting that these different approximations can be exploited in a multifidelity framework [63], in which the optimal solution of a cheaper approximation is used as the initial guess for a more accurate (and more expensive) approximation. For example, in the numerical results section the optimal solutions for the MC-corrected formulations are obtained by first solving the linear approximation (4.6), then the quadratic approximation (4.11), and finally (4.23) or (4.26).
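Adjoint-based z-gradients of the kind derived above are easy to get subtly wrong, so it is standard practice to validate them against finite differences on a small model problem. The sketch below does this for a linear "state equation" A u = B z with a quadratic objective; all matrices, names, and the objective are illustrative stand-ins, not the paper's PDE operators.

```python
import numpy as np

# Adjoint computation of the z-gradient for
#   J(z) = 0.5*||u(z) - u_d||^2 + 0.5*beta*||z||^2   subject to   A u = B z,
# checked against a central finite-difference quotient.
rng = np.random.default_rng(3)
n, nz, beta = 8, 3, 0.1
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # stand-in state operator
B = rng.standard_normal((n, nz))                    # stand-in control operator
u_d = rng.standard_normal(n)                        # target state

def cost(z):
    u = np.linalg.solve(A, B @ z)                   # state solve
    return 0.5 * np.sum((u - u_d) ** 2) + 0.5 * beta * (z @ z)

def z_gradient(z):
    u = np.linalg.solve(A, B @ z)                   # state solve
    v = np.linalg.solve(A.T, -(u - u_d))            # adjoint solve
    return beta * z - B.T @ v                       # partial_z of the Lagrangian

z0 = rng.standard_normal(nz)
g_adj = z_gradient(z0)
eps = 1e-6
g_fd = np.array([(cost(z0 + eps * e) - cost(z0 - eps * e)) / (2 * eps)
                 for e in np.eye(nz)])
```

A single adjoint solve delivers the full gradient regardless of the control dimension, which is what makes the Lagrangian approach scale.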
Table 1
The computational cost for the evaluation of the cost functional J and its z-gradient D_z J, in terms of the number of PDE solves, for the five approximations of E[Q] and Var[Q]. N is the number of samples/bases for the trace estimator; p is the oversampling factor in Algorithm 2; M is the number of samples used in the sample average approximation or the Monte Carlo correction.

Computational cost      MC   Linear   Quadratic   Linear+MC   Quadratic+MC
J (state PDE)           M    1        1           1+M         1+M
J (linear PDE)          0    1        1+4N+4p     1           1+4N+4p+2M
D_z J (linear PDE)      M    2        2+2N        2+M         2+2N+3M
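The remark above about sharing the same left-hand side across all incremental solves can be made concrete: one factorization serves every right-hand side. The dense matrix below is a random stand-in for the discretized linearized forward operator; in the PDE setting, sparse factorizations or preconditioners play the same role.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# All incremental solves share one left-hand-side operator:
# factor once, back-substitute many times.
rng = np.random.default_rng(7)
n, nrhs = 200, 50
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))  # stand-in linearized operator
lu, piv = lu_factor(A)                  # one O(n^3) factorization
RHS = rng.standard_normal((n, nrhs))    # e.g. Hessian actions on random directions
X = lu_solve((lu, piv), RHS)            # nrhs cheap O(n^2) back-substitutions
```

The amortized cost per solve is quadratic rather than cubic in the problem size, which is why the 4(N + p) Hessian-action solves in Table 1 are affordable.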
5. Numerical experiments.
We consider two examples of PDE-constrained optimal control under uncertainty in which the uncertain parameter is the discretization of an infinite-dimensional random field. These examples aim to demonstrate the scalability, computational efficiency, and accuracy of the Taylor-based approximations and of their MC-corrected counterparts. In the first example we consider fluid flow in a porous medium: the control is the fluid injection rate into the subsurface at specific well locations, the uncertain parameter is the permeability field (modeled as a log-normal random field), and the objective is to drive the pressures measured at the production wells to given target values. In the second example we seek an optimal velocity profile on the inlet boundary of a turbulent jet flow such that the jet width in a given cross-section is maximized; the governing state equations are the Reynolds-averaged Navier–Stokes (RANS) equations with an algebraic closure model, and an ad hoc nonlinear advection-diffusion equation with uncertain coefficients is introduced to model the intermittency of turbulence at the edges of the jet. The numerical results presented in this section are obtained using hIPPYlib [76], an extensible software framework for large-scale PDE-based deterministic and Bayesian inverse problems. hIPPYlib builds on
FEniCS [57] (a parallel finite element library) for the discretization of the PDEs and PETSc [7] for scalable and efficient linear algebra operations and solvers. Finally, we use the implementation of the L-BFGS algorithm in the Python library SciPy, specifically the fmin_l_bfgs_b routine (limited-memory BFGS with box constraints), to solve the optimization problem.
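The SciPy call has the following shape; the quadratic objective and the bounds below are placeholders for the actual J(z) and admissible set Z, meant only to show the interface (a callable returning the cost and its z-gradient, plus box constraints).

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Minimal usage sketch of fmin_l_bfgs_b with an analytic gradient.
target = np.array([0.5, 3.0])          # hypothetical unconstrained minimizer

def J_and_grad(z):
    r = z - target
    return 0.5 * (r @ r), r            # returns J(z) and D_z J(z)

z0 = np.zeros(2)
z_opt, J_opt, info = fmin_l_bfgs_b(J_and_grad, z0, bounds=[(0.0, 2.0)] * 2)
# The second component is clipped to its upper bound: z_opt is near (0.5, 2.0).
```

The routine returns the minimizer, the final cost, and a dictionary with convergence diagnostics (e.g. the `warnflag` field).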
We consider an optimal source control problem (originally presented in [3]) motivated by flow in a subsurface porous medium, where the state problem is the linear elliptic equation

(5.1)   -\nabla \cdot (e^m \nabla u) = F_z \quad \text{in } D,
        u = g \quad \text{on } \Gamma_D,
        e^m \nabla u \cdot n = 0 \quad \text{on } \Gamma_N,

where D \subset \mathbb{R}^d (d = 2, 3) is an open and bounded physical domain with Lipschitz boundary \partial D, and \Gamma_D, \Gamma_N \subset \partial D (with \Gamma_D \cup \Gamma_N = \partial D and \Gamma_D \cap \Gamma_N = \emptyset) denote the Dirichlet and Neumann portions of \partial D, respectively. The state u represents the pressure; m is an uncertain parameter representing the logarithm of the permeability field, which is assumed to be Gaussian, N(\bar m, \mathcal{C}), with mean \bar m and covariance \mathcal{C}; and F is a map from the control z to the source term F_z, defined as

(5.2)   F_z = \sum_{i=1}^{n_c} z_i f_i,

where f_i, i = 1, \dots, n_c, are mollified Dirac functions located at the n_c injection wells, and z = (z_1, \dots, z_{n_c})^\top \in \mathbb{R}^{n_c} is a control that represents the fluid injection rates. The control z takes admissible values in Z = [z_{\min}, z_{\max}]^{n_c}, where z_{\min} and z_{\max} denote the minimum and maximum injection rates. Let R_g denote a lifting of the boundary data g such that R_g \in H^1(D) and R_g|_{\Gamma_D} = g. The weak formulation of problem (5.1) reads: given m \in \mathcal{M} = L^2(D) and z \in Z, find \mathring{u} = u - R_g \in \mathcal{U} := \{v \in H^1(D) : v|_{\Gamma_D} = 0\} such that

(5.3)   r(\mathring{u}, v, m, z) = 0, \quad \forall v \in \mathcal{V},

where \mathcal{V} = \mathcal{U} and

(5.4)   r(\mathring{u}, v, m, z) = \int_D e^m \nabla(\mathring{u} + R_g) \cdot \nabla v \, dx - \int_D F_z v \, dx.
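A one-dimensional finite-difference analogue of the state equation (5.1) shows how the log-permeability m enters the solve. This is a toy discretization for illustration only; the experiments in the text use a 2-D finite element discretization instead.

```python
import numpy as np

# Toy 1-D analogue of (5.1): -(e^m u')' = f on (0,1), u(0) = u(1) = 0,
# centered finite differences with face-averaged coefficients.
def solve_state(m, f):
    """m and f are sampled at the n interior nodes of a uniform grid."""
    n = m.size
    h = 1.0 / (n + 1)
    k = np.exp(m)                          # permeability e^m at the nodes
    kf = np.empty(n + 1)                   # coefficients at the cell faces
    kf[1:-1] = 0.5 * (k[:-1] + k[1:])
    kf[0], kf[-1] = k[0], k[-1]
    A = (np.diag(kf[:-1] + kf[1:])
         - np.diag(kf[1:-1], 1) - np.diag(kf[1:-1], -1)) / h**2
    return np.linalg.solve(A, f)

# Manufactured-solution check with m = 0: u(x) = sin(pi x).
n = 200
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
u = solve_state(np.zeros(n), np.pi**2 * np.sin(np.pi * x))
```

With m = 0 the equation reduces to -u'' = f, so the computed u should match sin(pi x) up to the O(h^2) discretization error.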
The control objective measures the difference between the pressure (computed by solving the state problem (5.3)) and a target pressure distribution \bar u = (\bar u_1, \dots, \bar u_{n_p}) at the production wells x_1, \dots, x_{n_p} \in D. Specifically,

(5.5)   Q(\mathring{u}) = \| \mathcal{B} \mathring{u} - \bar u \|^2,

where \mathcal{B} : \mathcal{U} \to \mathbb{R}^{n_p} is a pointwise observation operator at the locations x_1, \dots, x_{n_p}. Here, we set D = (0, 2) \times (0, 1), \Gamma_D = \{0, 2\} \times [0, 1], and \Gamma_N = (0, 2) \times \{0, 1\}, and we impose the Dirichlet data g = 2 - x_1. The mean field \bar m is shown in the top part of Fig. 1, together with the locations of the 20 injection wells and the 12 production wells. For the covariance \mathcal{C} defined in (2.3), we specify \Theta = [1, 0; 0, \,\cdot\,], \alpha_1 = 2, \alpha_2 = 0.1, and \alpha_3 = 20. Three random samples of the Gaussian measure N(\bar m, \mathcal{C}) are shown in the bottom part of Fig. 1. In the cost functional (2.7), we set \beta = 1 and use the control cost P(z) = 10^{-\cdot} \|z\|^2. We use a finite element discretization on a structured uniform triangular mesh with 65 \times 33 vertices, with piecewise linear elements for the discrete approximation of the state u, the adjoint v, and the uncertain parameter m, which results in 2145 dimensions for m. We specify the desired pressure \bar u_i = 3 - x_{i,1} - \cdots at the production wells x_i = (x_{i,1}, x_{i,2}), and set z_{\min} = 0 and z_{\max} = 32 for the control, i.e., z \in [0, 32]^{n_c}.
Fig. 1. Top: mean \bar m of the Gaussian uncertain parameter m \sim N(\bar m, \mathcal{C}). The red diamonds represent the locations of the injection wells, and the black squares the locations of the production wells. Bottom: random samples drawn from the Gaussian measure N(\bar m, \mathcal{C}).

We consider the five different approximations of the cost functional presented in Sec. 3 and use the limited-memory BFGS method with box constraints to solve the optimization problem, where the z-gradient of the cost functional is computed as in Sec. 4. We chose z_0 = (16, \dots,
16) as the initial guess for the control. We used M = 100 random samples from N(\bar m, \mathcal{C}) to estimate the mean and variance of the control objective, both in the sample average approximation presented in Sec. 2.4 and in the variance reduction of Sec. 3.4. For the computation of the trace appearing in the quadratic approximations (3.5), (3.23), and (3.25), we use the trace estimator \widehat{T}_2(\cdot) in (3.17), obtained by solving the generalized eigenvalue problem (3.18) with Algorithm 2, with k = 100 and p = 10. We also test the Gaussian trace estimator \widehat{T}_1(\cdot) in (3.15). In Fig. 2 we plot the errors of the two trace approximations, denoted error_1 and error_2, against the reference value \sum_{j=1}^k \lambda_j with k = 140, at both the initial control z = z_0 and the optimal control z = z^{\rm MC}_{\rm quad} obtained by the quadratic approximation with Monte Carlo correction. We observe that in both cases the error of \widehat{T}_2(H) decays much faster than that of \widehat{T}_1(H), as a result of the fast decay of the eigenvalues; at the optimal control, in particular, more than two orders of magnitude improvement in accuracy is observed for N = 100. We remark that the slow decay of the error of \widehat{T}_1(\cdot) may demand a very large N if relatively high accuracy of the trace estimator is required, which leads to a large number of PDE solves at each step of the optimization procedure, thus diminishing the computational advantage of the Taylor approximation. In the following tests, we use \widehat{T}_2(\cdot) with N = 100. Fig.
3 shows the relative errors at 100 random samples of the uncertain parameter m for the linear and quadratic approximations of Q in (5.5), used for the computation of E[Q], as well as of q = (Q - \bar Q)^2 in (3.22) and (3.25), used for the computation of Var[Q]. We note that, for both quantities, the quadratic approximation is more accurate than the linear approximation for most of the realizations.

Since the quadratic approximation is very close and strongly correlated to the true value, we may expect a large variance reduction when using it as a control variate. In fact, the correlation coefficient between the quadratic approximation and the true value is very close to one: 0.996 for Q and 0.998 for q, evaluated at the 100 samples. On the contrary, the linear approximation is poorly correlated to the true value (with correlation coefficients -0.211 for Q and 0.718 for q), thus leading to negligible variance reduction.

Fig. 2. The decay of the generalized eigenvalues (\lambda^+ for the positive and \lambda^- for the negative eigenvalues) and the errors, denoted error_1 and error_2, of the trace estimators \widehat{T}_1 and \widehat{T}_2 with respect to the number N in (3.15) and (3.17). Left: at the initial control z = z_0; right: at the optimal control z = z^{\rm MC}_{\rm quad} obtained by the quadratic approximation with Monte Carlo correction.

Fig. 3. Relative errors at 100 random samples of the uncertain parameter for the linear and quadratic approximations of Q, used in the computation of the mean E[Q] (left), and of q = (Q - \bar Q)^2, used for the variance Var[Q] (right). The errors are shown at the optimal control z = z^{\rm MC}_{\rm quad} obtained by the quadratic approximation with Monte Carlo correction.
In Tables 2 and 3 we report the effect of the variance reduction by the linear and quadratic approximations for the computation of the mean E[Q] and the variance Var[Q], respectively. This reduction is measured by the mean-square error (MSE), which is defined (e.g., for Q) as

(5.6)   \mathrm{MSE}(Q) = \frac{\mathrm{Var}_p[Q]}{M},

where \mathrm{Var}_p[Q] is the population variance of Q at the M random samples. In all cases, the quadratic approximation achieves a more significant variance reduction than the linear approximation. This means that the number of samples needed to achieve a target MSE is significantly smaller for the quadratic approximation than for the linear one. For example, at the initial control z_0, the quadratic approximation results in a speed-up (defined as the ratio between the MSE of the Monte Carlo estimator and the MSE of the correction) of order 1000X and 100X in the estimation of E[Q] and Var[Q], respectively. On the other hand, the linear
Table 2
Variance reduction by the linear and quadratic approximations at the initial control z_0 and the optimal controls z^{MC}_{lin} and z^{MC}_{quad}. The mean-square error (MSE), with different numbers M of random samples, for the evaluation of the mean E[Q] is reported.

control        M    E_MC[Q]    MSE(Q)    MSE(Q - Q_lin)   MSE(Q - Q_quad)
z_0            10   2.42e+01   1.10e+00  7.94e-03         1.52e-04
               100  2.54e+01   2.54e-01  4.16e-03         7.40e-05
z^{MC}_{lin}   10   3.01e-01   1.10e-03  1.31e-03         1.69e-05
               100  2.89e-01   9.68e-05  7.49e-05         9.00e-07
z^{MC}_{quad}  10   3.16e-01   1.11e-03  8.02e-04         1.05e-05
               100  2.98e-01   1.10e-04  9.98e-05         1.12e-06
Table 3
Variance reduction by the linear and quadratic approximations at the initial control z_0 and the optimal controls z^{MC}_{lin} and z^{MC}_{quad}. The MSE for the evaluation of the variance Var[Q] is reported, where q = (Q - \bar Q)^2, q_lin = (Q_lin - \bar Q)^2, and q_quad = (Q_quad - \bar Q)^2.

control        M    E_MC[q]    MSE(q)    MSE(q - q_lin)   MSE(q - q_quad)
z_0            10   1.32e+01   1.71e+01  9.38e-01         6.05e-03
               100  2.54e+01   1.35e+01  2.61e+00         3.09e-02
z^{MC}_{lin}   10   1.99e-02   1.00e-04  9.35e-05         3.92e-06
               100  1.65e-02   1.32e-05  1.22e-05         2.44e-07
z^{MC}_{quad}  10   2.38e-02   1.82e-04  1.79e-04         8.07e-07
               100  1.99e-02   1.66e-05  1.65e-05         3.81e-07

approximation provides a speed-up of only order 100X and 10X, respectively. In addition, at the optimal controls z^{MC}_{lin} and z^{MC}_{quad}, the quadratic approximation still achieves a considerable speed-up.

We consider a turbulent jet flow modeled by the Reynolds-averaged Navier–Stokes equations coupled with a nonlinear stochastic advection-diffusion equation that serves as a notional model of uncertainty in the turbulent viscosity \nu_t. Specifically, the governing equations read

(5.7)   -\nabla \cdot \big( (\nu + \nu_t)(\nabla u + \nabla u^\top) \big) + (u \cdot \nabla) u + \nabla p = 0 \quad \text{in } D,
        \nabla \cdot u = 0 \quad \text{in } D,
        -\nabla \cdot \big( (\nu + (\gamma + e^m) \nu_{t,0}) \nabla \gamma \big) + u \cdot \nabla \gamma - \frac{u \cdot e_1}{x_1 + b} \gamma = 0 \quad \text{in } D,
10) is the computational domain (the symmetric upper half ofthe physical domain for the jet flow) as shown in the left part of Fig. 4. e = (1 , τ and n are the unit tangential vector and unit normal vector pointing outside thedomain D along the boundary ∂D , σ n ( u ) = ( ν + ν t ) (cid:0) ∇ u + ∇ u ⊤ (cid:1) · n is the tractionvector, and σ γn ( γ ) = ( ν +( γ + e m ) ν t, ) ∇ γ · n is the normal flux of γ . The control portionof the inflow boundary is denoted by the indicator function χ W with χ W ( x ) = 1 if x ∈ [0 , W ] and χ W ( x ) = 0 if x ∈ ( W, W = 1 .
5. The state variables u = ( u , u ) and p are the time-averaged velocity and pressure, and ν t = γν t, is theturbulent viscosity, where ν t, represents an algebraic closure model and γ is theindicator function that is close to one near the centerline Γ C and zero outside the jetregion. Based on a dimensional analysis of the turbulent planar jet, the turbulentviscosity along the jet centerline is given by(5.9) ν t, = C √ M ( x + aW ) / , where M = R Γ I || u || ds and W are momentum and jet width at the inlet boundary, C and a are two adimensional parameters that need to be calibrated against eitherexperimental or direct Navier-Stokes simulation (DNS) data. The indicator func-tion γ satisfies the nonlinear stochastic advection-diffusion equation in (5.7) with theboundary data(5.10) γ = 0 . − . (cid:18) (cid:18) − x (cid:19) ( x − − . x ) (cid:19) . The uncertain parameter m is a Gaussian random field with distribution N ( ¯ m, C ), forwhich we set the mean ¯ m as a constant field and the covariance as in (2.3) with α = 2, α = α = 0 .
5, and Θ = [15 ,
0; 0 , N (0 , C ) are shownin Fig. 5. In general, the parameters ( C, a, ¯ m, b ) are also random variables whosedistributions can be inferred from experimental or DNS data. Here, for simplicity,we consider only the uncertainty in m and take deterministic values for ( C, a, ¯ m, b ) =(0 . , . , − . , . The function φ ( z ) in (5.8) is the horizontal component of the velocity profile onthe inlet boundary Γ I , which is represented by B-splines as (we use t for x here)(5.11) φ ( z )( t ) = K X k =1 z k B k,n ( t ) , where the coefficient vector z := ( z , . . . , z K ) ∈ [0 , K is the control and B k,n ( t ) areB-spline basis functions recursively defined as(5.12) B k,n ( t ) = t − t k t k + n − t k B k,n − ( t ) + t k + n +1 − tt k + n +1 − t k +1 B k +1 ,n − ( t ) , These values are the maximum a posteriori estimates obtained by solving a Bayesian calibrationproblem where the misfit functional is the L distance between the velocity of (5.7) and the DNSdata provided in [48]. Taylor approximation and variance reduction for optimal control under uncertainty φ ( z ) initial controlB-spline knots Fig. 4 . Left: sketch of the physical domain of the turbulent jet flow, with inlet boundary Γ I ,outlet boundary Γ O , top and bottom wall Γ W , center axis Γ C , and the cross-section Γ . The compu-tational domain D is the symmetric top half of the physical domain. Right: B-spline interpolationof the inlet velocity profile used as an initial guess, and the distribution of the B-spline knots. Fig. 5 . Two random samples drawn from the Gaussian distribution N (0 , C ) with C given in (2.3) , where α = 2 , α = α = 0 . , Θ = [15 ,
0; 0 , . where(5.13) B k, ( t ) = (cid:26) t k ≤ t < t k +1 ;0 otherwise . Here we take K = 11 and use the B-spline of order n = 3 with 15 knots (4 repetitionsat each endpoint), as depicted on the right of Fig. 4. A B-spline representation of theinflow velocity profile used for the initial control is also shown.The control objective is the jet width Q along the cross-section Γ at x = 20shown in Fig. 4, and is defined as(5.14) Q ( u ) = 1 δ ( u ) Z Γ u · e dx , where δ ( u ) = u ( x ) · e with x = (20 , Q , i.e., to minimize E [ − Q ] subject to the state problem (5.7)and the constraint that the inlet momentum M is prescribed. Specifically, we considerthe cost functional(5.15) J ( z ) = E [ − Q ] + β Var[ Q ] + P ( z ) . Here we set β = 10 to balance the mean and variance, and define the penalty term P ( z ) as(5.16) P ( z ) = β (cid:18)Z Γ I ( φ ( z )) dx − M (cid:19) + β Z Γ I |∇ φ ( z ) | dx, . Chen, U. Villa, and O. Ghattas β = 10 ),and the second term controls the regularity of the inlet velocity ( β = 1).The weak formulation of the turbulence model is given by: Given the realization m ∈ M = L ( D ) and control z ∈ Z = R K , find u ∈ U such that(5.17) r ( u, v, m, z ) = 0 , ∀ v ∈ V , where we denote the state with u = (˚ u , p, ˚ γ ) ∈ U and adjoint with v = ( v , q, η ) ∈V . The space for the state and adjoint variables are U = V = U × P × V , where U = { v ∈ ( H ( D )) : v · n = 0 on Γ C and v · τ = 0 on Γ O ∪ Γ W } , P = L ( D ), and V = { γ ∈ H ( D ) : γ | Γ W ∪ Γ I = 0 } . Let R be the lifting of the Dirichlet boundarydata for velocity on Γ C ∪ Γ O ∪ Γ W , we have u = ˚ u + R . Let R γ be the lifting of theboundary data γ on Γ I ∪ Γ W , we have γ = ˚ γ + R γ .Specifically, the weak form in (5.17) consists of three terms and reads(5.18) r ( u, v, m, z ) = Model( u, v, m ) + Stablization( u, v ) + Nitsche( u, v, m, z ) . 
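Before turning to the three terms of the weak form, note that the Cox–de Boor recursion (5.12)–(5.13) used for the control parametrization is straightforward to prototype. The sketch below reproduces the clamped-knot setup described above (K = 11 basis functions of order n = 3 on 15 knots with 4-fold endpoints); the knot positions are a uniform guess, since the actual knot distribution is only shown in Fig. 4.

```python
import numpy as np

# B-spline basis via the recursion (5.12)-(5.13) and the inlet profile
# phi(z)(t) = sum_k z_k B_{k,n}(t) from (5.11).
def bspline_basis(k, n, t, knots):
    if n == 0:
        return np.where((knots[k] <= t) & (t < knots[k + 1]), 1.0, 0.0)
    out = np.zeros_like(t, dtype=float)
    d1 = knots[k + n] - knots[k]
    if d1 > 0:                             # skip 0/0 terms at repeated knots
        out += (t - knots[k]) / d1 * bspline_basis(k, n - 1, t, knots)
    d2 = knots[k + n + 1] - knots[k + 1]
    if d2 > 0:
        out += (knots[k + n + 1] - t) / d2 * bspline_basis(k + 1, n - 1, t, knots)
    return out

def phi(z, n, t, knots):
    return sum(z[k] * bspline_basis(k, n, t, knots) for k in range(len(z)))

# Clamped knot vector: K + n + 1 = 15 knots, endpoints repeated 4 times.
K, order = 11, 3
knots = np.concatenate([np.zeros(3), np.linspace(0.0, 1.0, 9), np.ones(3)])
```

A quick sanity check is the partition-of-unity property: with all coefficients z_k = 1, the profile phi is identically one on the parameter interval.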
The first term represents the PDE model in weak form:

(5.19)  Model(u, v, m) = \int_D 2(\nu + \nu_t) S(u) \cdot S(v) \, dx + \int_D [(u \cdot \nabla) u] \cdot v \, dx - \int_D p \, \nabla \cdot v \, dx + \int_D q \, \nabla \cdot u \, dx + \int_D (\nu + (\gamma + e^m) \nu_{t,0}) \nabla \gamma \cdot \nabla \eta \, dx + \int_D [u \cdot \nabla \gamma] \eta \, dx - \int_D \frac{u \cdot e_1}{x_1 + b} \gamma \eta \, dx,

where S(u) = (\nabla u + \nabla u^\top)/2 is the strain-rate tensor. The second term collects the stabilization contributions,

(5.20)  Stabilization(u, v) = \int_D \tau_1 L(u) \cdot D_u L(u)(v) \, dx + \int_D \tau_2 (\nabla \cdot u)(\nabla \cdot v) \, dx + \int_D \tau_3 (u \cdot \nabla \gamma)(u \cdot \nabla \eta) \, dx,

where L(u) represents the strong form of the residual of the momentum equation (line 1) of (5.7), and \tau_1, \tau_2, and \tau_3 are properly chosen stabilization parameters associated with the local Peclet number.

The third term of (5.17) represents the weak imposition of the inlet velocity profile (i.e., the control) by Nitsche's method [9] and reads

(5.21)  Nitsche(u, v, m, z) = C_d \int_{\Gamma_I} h^{-1} (\nu + \nu_t)(u \cdot n + \chi_W \phi(z))(v \cdot n) \, ds - \int_{\Gamma_I} (\sigma_n(u) \cdot n)(v \cdot n) + (\sigma_n(v) \cdot n)(u \cdot n + \chi_W \phi(z)) \, ds,

where the first term guarantees the coercivity of the form and the second term arises from integration by parts. The constant C_d is taken as C_d = 10, and h is the local element edge length along the Dirichlet boundaries. This weak imposition of the inlet velocity is necessary to reveal the boundary control in the optimization formulation.

We discretize the problem using finite elements with piecewise polynomials (P2, P1, P1) for the state u and P1 for the uncertain parameter m. We triangulate the computational domain using an anisotropic mesh, locally refined in the jet region, with 14,400 triangular elements (60 elements along the x-direction and 120 along the y-direction). After discretization, we obtain an optimization problem in which the dimensions of the control, state, and uncertain parameter are 11, 73,084, and 7,381, respectively.
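The Nitsche terms in (5.21) have a simple one-dimensional analogue. The sketch below imposes u(0) = g0 weakly for a 1D Poisson problem with P1 elements; it is a toy analogue of (5.21) with invented data f, g0 and penalty C, not the paper's solver:

```python
import numpy as np

n = 200                          # elements on (0, 1)
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
g0, C = 0.7, 10.0                # Dirichlet datum at x = 0, penalty constant

# P1 stiffness matrix for -u'' = f
A = np.zeros((n + 1, n + 1))
for e in range(n):
    A[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h

f = np.pi ** 2 * np.sin(np.pi * x)   # manufactured so u = g0*(1-x) + sin(pi*x)
L = h * f
L[0] *= 0.5
L[-1] *= 0.5                          # lumped load vector

# symmetric Nitsche terms at x = 0 (outward normal n = -1, so du/dn = -u'(0)):
#   a(u,v) += u'(0) v(0) + v'(0) u(0) + (C/h) u(0) v(0)
#   L(v)   += v'(0) g0 + (C/h) g0 v(0)
A[0, 0] += -2.0 / h + C / h
A[0, 1] += 1.0 / h
A[1, 0] += 1.0 / h
L[0] += (C - 1.0) * g0 / h
L[1] += g0 / h

# strong Dirichlet condition u(1) = 0 at the right endpoint
A[-1, :] = 0.0
A[-1, -1] = 1.0
L[-1] = 0.0

u = np.linalg.solve(A, L)
u_exact = g0 * (1.0 - x) + np.sin(np.pi * x)
```

The penalty term plays the role of the coercivity term in (5.21), while the consistency terms (the discrete u'(0), v'(0) contributions) keep the method exact for the manufactured solution; the boundary datum enters only through the right-hand side, which is what exposes it as a control.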
As in the previous example, we use a BFGS method with bound constraints to solve the optimization problem (2.10) with the cost functional defined in (5.15) and the constraint defined by the state problem (5.17). At each BFGS iteration, we solve the fully coupled nonlinear state problem using Newton's method with an LU factorization of the Jacobian operator. To compute all the derivatives involved in the z-gradient presented in Sec. 4 we exploit the symbolic differentiation capabilities of FEniCS [57].

For the computation of the trace in the quadratic approximation we use both the Gaussian trace estimator in (3.15) and the trace estimator in (3.17) obtained by solving the generalized eigenvalue problem (3.18). Fig. 6 displays the decay of the eigenvalues as well as the approximation errors of the two estimators, for both the initial control z_0 and the optimal control z^{MC}_{quad}. The rapid decay of the eigenvalues of the covariance-preconditioned Hessian (four orders of magnitude reduction over the first 100 of the 7,381 eigenvalues) reflects the intrinsic low-dimensionality of the map from the uncertain parameter field m to the control objective Q, despite the complexity and nonlinearity of the state equations. The reference value for the trace tr(H) was computed as the sum of the first 140 dominant eigenvalues of H using Algorithm 2 with k = 140 and p = 10. It is evident that the second trace estimator achieves much faster error decay than the Gaussian trace estimator, with a gain in accuracy of about three orders of magnitude when using N = 100 samples/eigenvalues. In the numerical tests we use the second trace estimator with N = 20, which already provides sufficiently accurate results. It is worth noting that, in contrast, the Gaussian trace estimator would have required orders of magnitude more samples to achieve the same accuracy.

Fig. 6. Decay of the generalized eigenvalues (\lambda^+ for positive and \lambda^- for negative eigenvalues) and the errors of the two trace estimators in (3.15) and (3.17) with respect to the number N. Left: the initial control z = z_0; right: the optimal control z = z^{MC}_{quad} obtained by quadratic approximation with Monte Carlo correction.
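The qualitative behavior in Fig. 6 can be reproduced on a small synthetic problem: a Hutchinson-type Gaussian trace estimator versus summing the dominant eigenvalues recovered from a randomized range sketch [42]. The matrix, its spectral decay rate, and all sizes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 500
# synthetic symmetric matrix with fast spectral decay, mimicking the
# covariance-preconditioned Hessian (decay rate chosen for illustration)
lam = 10.0 ** (-0.2 * np.arange(d))
Qm, _ = np.linalg.qr(rng.standard_normal((d, d)))
H = (Qm * lam) @ Qm.T
true_trace = lam.sum()

N = 20
# (a) Gaussian (Hutchinson) estimator: average of w^T H w over random probes
W = rng.standard_normal((d, N))
t_gauss = np.mean(np.einsum("ij,ij->j", W, H @ W))

# (b) randomized eigenvalue estimator: sum the N dominant eigenvalues
# obtained from a randomized range sketch with oversampling p
p = 10
Y = H @ rng.standard_normal((d, N + p))
Qy, _ = np.linalg.qr(Y)
T = Qy.T @ H @ Qy
t_eig = np.sort(np.linalg.eigvalsh(T))[::-1][:N].sum()

err_gauss = abs(t_gauss - true_trace) / true_trace
err_eig = abs(t_eig - true_trace) / true_trace
```

With this decay the truncated-eigenvalue estimate is accurate to roughly the size of the neglected spectral tail, while the Hutchinson estimate with the same N carries an O(N^{-1/2}) statistical error, mirroring the gap observed in Fig. 6.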
Fig. 8 shows the relative difference between the true quantity of interest and its linear and quadratic approximations, evaluated at 10 random samples, for both Q and q = (Q - \bar{Q})^2. The quadratic approximation is clearly more accurate than the linear approximation for both quantities. Nevertheless, both approximations lead to a drastic reduction in the variance of the Monte Carlo correction with respect to the plain Monte Carlo estimator, as demonstrated by the MSE reported in Tables 4 and 5. This variance reduction translates to a greater than 100X speedup in the computation relative to the plain Monte Carlo estimator for the same accuracy (MSE).

Fig. 7. The velocity field of the turbulent jet flow obtained by solving the state model for the initial (left) and optimal (right) inlet velocity profiles. The optimal profile was obtained using the quadratic approximation with Monte Carlo correction.

Fig. 8. Relative errors, at 10 random samples, of the linear and quadratic approximations of Q (for the computation of the mean E[Q], left) and of q = (Q - \bar{Q})^2 (for the variance Var[Q], right). The errors are shown at the optimal control z = z^{MC}_{quad}.
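The mechanism behind Tables 4 and 5 is a control variate: the mean of the Taylor approximation is known analytically, and Monte Carlo is applied only to the small remainder. A self-contained sketch with an invented objective Q (its gradient g, Hessian H, and the sinusoidal remainder are illustrative, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50                                      # illustrative parameter dimension
g = rng.standard_normal(d) / np.sqrt(d)     # "gradient" of Q at m = 0
S = rng.standard_normal((d, d))
H = (S + S.T) / (2 * d)                     # "Hessian" of Q at m = 0

def Q(m):
    # quadratic part plus a small non-quadratic remainder
    return g @ m + 0.5 * m @ H @ m + 0.1 * np.sin(g @ m)

Q_lin = lambda m: g @ m                     # linear Taylor approximation
Q_quad = lambda m: g @ m + 0.5 * m @ H @ m  # quadratic Taylor approximation

N = 2000
M = rng.standard_normal((N, d))             # samples of m ~ N(0, I)
q = np.array([Q(m) for m in M])
r_lin = q - np.array([Q_lin(m) for m in M])    # remainder Q - Q_lin
r_quad = q - np.array([Q_quad(m) for m in M])  # remainder Q - Q_quad

# control variate estimator of E[Q]: analytic mean of Q_quad (= tr(H)/2 for
# m ~ N(0, I)) plus the Monte Carlo mean of the remainder
e_cv = 0.5 * np.trace(H) + r_quad.mean()

# the MSE of the corrected estimators is driven by the remainder variance,
# which sits orders of magnitude below Var[Q], as in Tables 4 and 5
var_ratio = q.var() / r_quad.var()
```

The same construction applies to the variance estimate by replacing Q with q = (Q - \bar{Q})^2; the quadratic remainder has smaller variance than the linear one, matching the ordering of the MSE columns in the tables.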
Table 4
Variance reduction by the linear and quadratic approximations at the optimal control z_lin for different uncertain parameter dimensions. The mean-square error (MSE), computed with 10 random samples, for the evaluation of the mean E[Q] is reported.

parameter dimension | E_MC[-Q] | MSE(Q) | MSE(Q - Q_lin) | MSE(Q - Q_quad)
1,891 | -1.71e+00 | 7.40e-06 | 2.68e-08 | 1.81e-09
7,381 | -1.59e+00 | 7.94e-06 | 1.57e-07 | 1.46e-08
29,161 | -1.44e+00 | 3.82e-06 | 7.23e-08 | 1.66e-08
115,921 | -1.42e+00 | 9.47e-06 | 6.91e-08 | 3.06e-08
462,241 | -1.41e+00 | 8.85e-06 | 9.00e-08 | 1.78e-08
Table 5
Variance reduction by the linear and quadratic approximations at the optimal control z_lin for different uncertain parameter dimensions. The MSE, computed with 10 test random samples, for the evaluation of the variance is reported, where q = (Q - \bar{Q})^2, q_lin = (Q_lin - \bar{Q})^2, q_quad = (Q_quad - \bar{Q})^2.

parameter dimension | E_MC[q] | MSE(q) | MSE(q - q_lin) | MSE(q - q_quad)
1,891 | 8.05e-05 | 9.37e-10 | 1.76e-11 | 8.77e-13
7,381 | 8.13e-05 | 1.15e-09 | 8.87e-12 | 1.48e-12
29,161 | 5.60e-05 | 6.59e-10 | 3.39e-11 | 4.04e-12
115,921 | 1.07e-04 | 1.82e-09 | 7.04e-11 | 2.08e-11
462,241 | 8.96e-05 | 1.37e-09 | 7.78e-12 | 1.33e-11

To investigate the scalability of the proposed algorithm, we consider up to six levels of uniform mesh refinement, starting from a 30 x 60 mesh; the dimension of the uncertain parameter m ranges from 1,891 on the coarsest mesh to 1,038,961 on the finest. For the linear and quadratic approximations, we solve the optimization problem using the BFGS algorithm, with convergence defined by an absolute tolerance on the \ell^\infty norm of the projected gradient. We also use a mesh continuation technique to define initial guesses for the control: on the coarsest mesh we use the velocity profile in Fig. 4 (right), and on the finer meshes we use the optimal control obtained on the previous (coarser) mesh. For the linear approximation with Monte Carlo correction, we use the solution of the linear approximation (at the same mesh resolution) as the initial guess.

The scalability of our approach with respect to the uncertain parameter dimension depends on three properties, namely the dimension-independent behavior of (i) the spectrum of the covariance-preconditioned Hessian, (ii) the convergence rate of BFGS, and (iii) the variance of the estimators for the Monte Carlo correction of the objective function. The top-left plot of Fig. 9 demonstrates (i): the eigenvalues of the covariance-preconditioned Hessian of the control objective (i.e., the generalized eigenvalues of problem (3.18)) at the optimal control obtained by linear approximation exhibit the same fast decay independent of the uncertain parameter dimension (ranging from a thousand to a million), thus indicating that approximating the trace by a randomized eigensolver is scalable with respect to the uncertain parameter dimension. The other three plots of Fig. 9 provide numerical evidence of the mesh-independent convergence of the BFGS algorithm, property (ii), for this particular problem.
In the case of the linear and quadratic approximations with mesh continuation, the norm of the z-gradient drops below the prescribed tolerance within 12 iterations for all mesh resolutions except the coarsest one (which represents a pre-asymptotic result); in the case of the linear approximation with Monte Carlo correction, BFGS converges within 19 iterations for all mesh resolutions when using the solution of the linear approximation as the initial guess. This implies that the gradient-based BFGS algorithm is also scalable with respect to the uncertain parameter dimension. Finally, Tables 4 and 5 indicate that the variance of the Monte Carlo correction, i.e., property (iii), using the Taylor approximations as control variates is independent of the uncertain parameter dimension and, furthermore, allows a greater than 100X speedup with respect to the plain Monte Carlo estimator to achieve the same accuracy (MSE). Combining the three dimension-independent properties, we conclude that the Taylor approximation and variance reduction stochastic optimization method is overall scalable with respect to the uncertain parameter dimension.

Fig. 9. Top-left: decay of the eigenvalues of the covariance-preconditioned Hessian (i.e., the generalized eigenvalues, in absolute value, of (3.18)) for different uncertain parameter dimensions (from 1,891 to 1,038,961) at the optimal control obtained by linear approximation. Remaining plots: decay of the norm of the z-gradient with respect to the number of BFGS iterations for the different parameter dimensions.
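The warm-starting effect of mesh continuation can be demonstrated on a toy bound-constrained problem, where projected gradient stands in for the bound-constrained BFGS and all sizes and data are invented:

```python
import numpy as np

def solve(n, z0, lr=0.5, tol=1e-8, maxit=10000):
    """Projected gradient for min_z 0.5||z - b||^2 s.t. 0 <= z <= 1,
    a stand-in for the bound-constrained solve on one mesh level."""
    b = np.sin(np.pi * np.linspace(0.0, 1.0, n))  # target control profile
    z = z0.copy()
    for it in range(1, maxit + 1):
        z_new = np.clip(z - lr * (z - b), 0.0, 1.0)
        if np.linalg.norm(z_new - z) < tol:       # projected-gradient stationarity
            return z_new, it
        z = z_new
    return z, maxit

n_coarse, n_fine = 17, 257
z_c, it_c = solve(n_coarse, np.zeros(n_coarse))
# mesh continuation: prolong the coarse optimizer to the fine grid by interpolation
z0_fine = np.interp(np.linspace(0, 1, n_fine), np.linspace(0, 1, n_coarse), z_c)
_, it_warm = solve(n_fine, z0_fine)
_, it_cold = solve(n_fine, np.zeros(n_fine))
```

Because the interpolated coarse optimizer already sits close to the fine-grid optimum, the warm-started iteration count is smaller than the cold-started one, which is the effect exploited across the mesh hierarchy in Fig. 9.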
6. Conclusions.
In this work we proposed a scalable computational framework for the solution of PDE-constrained optimal control under uncertainty, in which the cost functional involves the mean and variance of the control objective and the state equation is governed by a (nonlinear) PDE with a high-dimensional uncertain parameter. Specifically, we advocate the use of (low-order) Taylor expansions of the control objective with respect to the uncertain parameter, either to directly approximate the moments of the control objective or as control variates to reduce the variance of the Monte Carlo estimator. The mean and variance expressions of the (linear and quadratic) Taylor expansions involve the trace of the covariance-preconditioned Hessian of the control objective with respect to the uncertain parameter. Randomized algorithms for the solution of generalized eigenvalue problems [76, 70, 69] allow accurate and efficient approximation of this trace and require computing the action of the covariance-preconditioned Hessian only on a number of random directions that depends on its numerical rank. As shown in the numerical results, this approach is more efficient and accurate than the Gaussian trace estimator when the eigenvalues of the covariance-preconditioned Hessian exhibit fast decay, which holds when the control objective is sensitive to only a limited number of directions in the uncertain parameter space; this is often the case regardless of the complexity of the state problem. Moreover, when the (possibly biased) linear or quadratic Taylor approximation is not sufficiently accurate, we can use it as a control variate and correct the mean and variance by a Monte Carlo estimator of the remainder of the Taylor expansion, which was shown to achieve considerable computational cost reduction (by several orders of magnitude) compared to a plain Monte Carlo estimator.

We have demonstrated the scalability of our method with respect to the uncertain parameter dimension in three aspects: the decay of the eigenvalues of the covariance-preconditioned Hessian of the control objective, the variance of the remainder of the Taylor expansion, and the number of optimization iterations. All three properties depend not on the nominal dimension of the uncertain parameter but only on its intrinsic dimension, i.e., on the number of directions to which the control objective is sensitive. Moreover, by using the randomized eigensolver for the trace estimation, the Taylor approximation as a control variate for variance reduction, and a quasi-Newton method with a Lagrangian formalism for the optimization, we showed that the complexity, measured in number of PDE solves, scales linearly with respect to the intrinsic dimensionality while achieving a target (high) accuracy several orders of magnitude better than that of a plain Monte Carlo estimator. This was illustrated numerically by solving an optimal boundary control problem governed by a nonlinear PDE model of turbulent jet flow with an infinite-dimensional uncertain parameter field that characterizes the turbulent viscosity. Hence, the proposed computational framework for PDE-constrained optimal control under uncertainty is demonstrated to be scalable, accurate, and efficient.

Several extensions and further developments of the proposed computational framework are worth pursuing.
First, we intend to investigate higher-order Taylor expansions beyond the quadratic approximation when the latter becomes poor. Second, another research direction is to pursue more general risk measures than the mean-variance measure, such as VaR and conditional VaR [50, 73]. Third, further computational savings can be achieved in combination with model reduction for the state PDE and the linearized PDEs [77, 29]. Fourth, one can replace the Monte Carlo estimator in the variance reduction by a sparse quadrature [31] to possibly achieve faster convergence of the approximation error. Fifth, application of the Taylor approximation and variance reduction in a multifidelity framework [63] may lead to additional computational savings. Finally, one can apply the scalable computational framework to other challenging control and design problems, such as the optimal design of metamaterials under uncertainty [26, 24].
Acknowledgement.
This work was supported by DARPA contract W911NF-15-2-0121, NSF grants CBET-1508713 and ACI-1550593, and DOE grants DE-SC0010518 and DE-SC0009286. We thank Robert Moser, Todd Oliver, and Myoungkyu Lee for the construction of the notional turbulent jet flow optimal boundary control under uncertainty problem.
REFERENCES

[1] Alen Alexanderian, Noemi Petra, Georg Stadler, and Omar Ghattas. A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized \ell_0-sparsification. SIAM Journal on Scientific Computing, 36(5):A2122-A2148, 2014.
[2] Alen Alexanderian, Noemi Petra, Georg Stadler, and Omar Ghattas. A fast and scalable method for A-optimal design of experiments for infinite-dimensional Bayesian nonlinear inverse problems.
SIAM Journal on Scientific Computing, 38(1):A243-A272, 2016.
[3] Alen Alexanderian, Noemi Petra, Georg Stadler, and Omar Ghattas. Mean-variance risk-averse optimal control of systems governed by PDEs with random parameter fields using quadratic approximations. SIAM/ASA Journal on Uncertainty Quantification, 5(1):1166-1192, 2017. arXiv preprint arXiv:1602.07592.
[4] Ahmad Ahmad Ali, Elisabeth Ullmann, and Michael Hinze. Multilevel Monte Carlo analysis for optimal control of elliptic PDEs with random coefficients.
SIAM/ASA Journal onUncertainty Quantification , 5(1):466–492, 2017.[5] H. Avron and S. Toledo. Randomized algorithms for estimating the trace of an implicit sym-metric positive semi-definite matrix.
Journal of the ACM (JACM) , 58(2):8, 2011.[6] E. Bader, M. K¨archer, M.A. Grepl, and K. Veroy. Certified reduced basis methods forparametrized distributed elliptic optimal control problems with control constraints.
SIAMJournal on Scientific Computing , 38(6):A3921–A3946, 2016.[7] Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschel-man, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G.Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, Stefano Zampini, and HongZhang. PETSc users manual. Technical Report ANL-95/11 - Revision 3.6, Argonne Na-tional Laboratory, 2015.[8] O. Bashir, K. Willcox, O. Ghattas, B. van Bloemen Waanders, and J. Hill. Hessian-based modelreduction for large-scale systems with initial condition inputs.
International Journal forNumerical Methods in Engineering , 73:844–868, 2008.[9] Y. Bazilevs and T.J.R. Hughes. Weak imposition of Dirichlet boundary conditions in fluidmechanics.
Computers & Fluids , 36(1):12–26, 2007.[10] P. Benner, S. Gugercin, and K. Willcox. A survey of projection-based model reduction methodsfor parametric dynamical systems.
SIAM review , 57(4):483–531, 2015.[11] P. Benner, A. Onwunta, and M. Stoll. Block-diagonal preconditioning for optimal controlproblems constrained by PDEs with uncertain inputs.
SIAM Journal on Matrix Analysisand Applications , 37(2):491–518, 2016.[12] P. Benner, E. Sachs, and S. Volkwein. Model order reduction for PDE constrained optimization.In
Trends in PDE Constrained Optimization , pages 303–326. Springer, 2014.[13] D. Bertsimas, D.B. Brown, and C. Caramanis. Theory and applications of robust optimization.
SIAM review , 53(3):464–501, 2011.[14] L. T. Biegler, O. Ghattas, M. Heinkenschloss, and B. van Bloemen Waanders, editors.
Large-Scale PDE-Constrained Optimization . Lecture Notes in Computational Science and Engi-neering, Vol. 30. Springer-Verlag, Heidelberg, 2003.[15] A. Borz`ı and V. Schulz.
Computational optimization of systems governed by partial differentialequations , volume 8. SIAM, 2011.[16] A. Borz`ı, V. Schulz, C. Schillings, and G. Von Winckel. On the treatment of distributeduncertainties in PDE-constrained optimization.
GAMM-Mitteilungen , 33(2):230–246, 2010.[17] T. Bui-Thanh, K. Willcox, O. Ghattas, and B. van Bloemen Waanders. Goal-oriented, model-constrained optimization for reduction of large-scale systems.
Journal of ComputationalPhysics , 224(2):880–896, 2007.[18] Tan Bui-Thanh, Carsten Burstedde, Omar Ghattas, James Martin, Georg Stadler, and Lu-cas C. Wilcox. Extreme-scale UQ for Bayesian inverse problems governed by PDEs. In
SC12: Proceedings of the International Conference for High Performance Computing,Networking, Storage and Analysis , 2012. Gordon Bell Prize finalist.[19] Tan Bui-Thanh and Omar Ghattas. Analysis of the Hessian for inverse scattering problems.Part I: Inverse shape scattering of acoustic waves.
Inverse Problems , 28(5):055001, 2012.[20] Tan Bui-Thanh and Omar Ghattas. Analysis of the Hessian for inverse scattering problems.Part II: Inverse medium scattering of acoustic waves.
Inverse Problems , 28(5):055002,2012.[21] Tan Bui-Thanh and Omar Ghattas. Analysis of the Hessian for inverse scattering problems. PartIII: Inverse medium scattering of electromagnetic waves.
Inverse Problems and Imaging ,7(4):1139–1155, 2013.[22] Tan Bui-Thanh and Omar Ghattas. A scalable MAP solver for Bayesian inverse problems withBesov priors.
Inverse Problems and Imaging , 9(1):27–54, 2015.[23] Tan Bui-Thanh, Omar Ghattas, James Martin, and Georg Stadler. A computational frameworkfor infinite-dimensional Bayesian inverse problems Part I: The linearized case, with appli-cation to global seismic inversion.
SIAM Journal on Scientific Computing, 35(6):A2494-A2523, 2013.
[24] P. Chen and O. Ghattas. Analysis of the Hessian for optimization under uncertainty: Application to optimal design of acoustic metamaterial. In preparation, 2018.
[25] P. Chen and O. Ghattas. Hessian-based sampling for goal-oriented model reduction with high-dimensional parameter. Preprint, 2018.
[26] P. Chen and O. Ghattas. Scalable computational methods for optimal design of acoustic metamaterials under uncertainty. In preparation, 2018.
[27] P. Chen and A. Quarteroni. Weighted reduced basis method for stochastic optimal control problems with elliptic PDE constraints.
SIAM/ASA J. Uncertainty Quantification , 2(1):364–396, 2014.[28] P. Chen, A. Quarteroni, and G. Rozza. Stochastic optimal Robin boundary control problems ofadvection-dominated elliptic equations.
SIAM Journal on Numerical Analysis , 51(5):2700– 2722, 2013.[29] P. Chen, A. Quarteroni, and G. Rozza. Multilevel and weighted reduced basis method forstochastic optimal control problems constrained by Stokes equations.
Numerische Mathe-matik , 133(1):67–102, 2016.[30] P. Chen, U. Villa, and O. Ghattas. Hessian-based adaptive sparse quadrature for infinite-dimensional Bayesian inverse problems.
Computer Methods in Applied Mechanics andEngineering , 327:147–172, 2017.[31] Peng Chen. Sparse quadrature for high-dimensional integration with Gaussian measure.
ESAIM: Mathematical Modelling and Numerical Analysis , 52(2):631–657, 2018.[32] Peng Chen, Alfio Quarteroni, and Gianluigi Rozza. Reduced basis methods for uncertaintyquantification.
SIAM/ASA Journal on Uncertainty Quantification , 5(1):813–869, 2017.[33] Benjamin Crestel, Alen Alexanderian, Georg Stadler, and Omar Ghattas. A-optimal encodingweights for nonlinear inverse problems, with application to the Helmholtz inverse problem.
Inverse Problems , 33(7):074008, 2017.[34] J. Dick, F.Y. Kuo, and I.H. Sloan. High-dimensional integration–the Quasi-Monte Carlo way.
Acta Numerica , 22:133–288, 2013.[35] J. Dick, Q.T. Le Gia, and Ch. Schwab. Higher order quasi–Monte Carlo integration for holomor-phic, parametric operator equations.
SIAM/ASA Journal on Uncertainty Quantification ,4(1):48–79, 2016.[36] Pearl H. Flath, Lucas C. Wilcox, Volkan Ak¸celik, Judy Hill, Bart van Bloemen Waanders,and Omar Ghattas. Fast algorithms for Bayesian uncertainty quantification in large-scalelinear inverse problems based on low-rank partial Hessian approximations.
SIAM Journalon Scientific Computing , 33(1):407–432, 2011.[37] D. Ghate and M.B. Giles. Efficient Hessian calculation using automatic differentiation.
AIAApaper , 4059:2007, 2007.[38] O. Ghattas and J.-H. Bark. Optimal control of two- and three-dimensional Navier-Stokes flow.
Journal of Computational Physics , 136:231–244, 1997.[39] R. Glowinski and J.L. Lions.
Exact and approximate controllability for distributed parametersystems . Cambridge University Press, Cambridge, UK, 1996.[40] M.D. Gunzburger.
Perspectives in flow control and optimization , volume 5. SIAM, 2003.[41] M.D. Gunzburger, H.C. Lee, and J. Lee. Error estimates of stochastic optimal neumann bound-ary control problems.
SIAM Journal on Numerical Analysis , 49:1532–1552, 2011.[42] N. Halko, P-G Martinsson, and J.A. Tropp. Finding structure with randomness: Probabilisticalgorithms for constructing approximate matrix decompositions.
SIAM review , 53(2):217–288, 2011.[43] J.S. Hesthaven, G. Rozza, and B. Stamm. Certified reduced basis methods for parametrizedpartial differential equations.
SpringerBriefs in Mathematics , 2015.[44] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich.
Optimization with PDE constraints , vol-ume 23. Springer Science & Business Media, 2008.[45] L.S. Hou, J. Lee, and H. Manouzi. Finite element approximations of stochastic optimal controlproblems constrained by stochastic elliptic PDEs.
Journal of Mathematical Analysis andApplications , 384(1):87–103, 2011.[46] Tobin Isaac, Noemi Petra, Georg Stadler, and Omar Ghattas. Scalable and efficient algorithmsfor the propagation of uncertainty from data through inference to prediction for large-scaleproblems, with application to flow of the Antarctic ice sheet.
Journal of ComputationalPhysics , 296:348–368, September 2015.[47] M. K¨archer and M.A. Grepl. A posteriori error estimation for reduced order solutions ofparametrized parabolic optimal control problems.
ESAIM: Mathematical Modelling andNumerical Analysis , 48(6):1615–1638, 2014.[48] M Klein, A Sadiki, and J Janicka. Investigation of the influence of the reynolds number on aplane jet using direct numerical simulation.
International Journal of Heat and Fluid Flow ,24(6):785–794, 2003.[49] D.P. Kouri, D. Heinkenschloos, M. Ridzal, and B.G. Van Bloemen Waanders. A trust-regionalgorithm with adaptive stochastic collocation for PDE optimization under uncertainty.
SIAM Journal on Scientific Computing, 35(4):1847-1879, 2012.
[50] D.P. Kouri and T.M. Surowiec. Risk-averse PDE-constrained optimization using the conditional Value-At-Risk.
SIAM Journal on Optimization , 26(1):365–396, 2016.[51] Drew P. Kouri, Matthias Heinkenschloss, Denis Ridzal, and Bart van Bloemen Waanders.Inexact objective function evaluations in a trust-region algorithm for PDE-constrainedoptimization under uncertainty.
SIAM Journal on Scientific Computing , 36(6):A3011–A3029, 2014.[52] A. Kunoth and Ch. Schwab. Analytic regularity and GPC approximation for control problemsconstrained by linear parametric elliptic and parabolic PDEs.
SIAM Journal on Controland Optimization , 51(3):2442–2471, 2013.[53] A. Kunoth and Ch. Schwab. Sparse adaptive tensor Galerkin approximations of stochasticPDE-constrained control problems.
SIAM/ASA Journal on Uncertainty Quantification ,4(1):1034–1059, 2016.[54] T. Lassila, A. Manzoni, A. Quarteroni, and G. Rozza. Boundary control and shape optimizationfor the robust design of bypass anastomoses under uncertainty.
ESAIM: MathematicalModelling and Numerical Analysis , 47(4):1107–1131, 2013.[55] Finn Lindgren, H˚avard Rue, and Johan Lindstr¨om. An explicit link between Gaussian fieldsand Gaussian Markov random fields: the stochastic partial differential equation approach.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 73(4):423–498,2011.[56] J.L. Lions.
Optimal control of systems governed by partial differential equations . Springer-Verlag, Berlin, 1971.[57] A. Logg, K.A. Mardal, and G. Wells.
Automated solution of differential equations by the finiteelement method: The FEniCS book , volume 84. Springer Science & Business Media, 2012.[58] James Martin, Lucas C. Wilcox, Carsten Burstedde, and Omar Ghattas. A stochastic New-ton MCMC method for large-scale statistical inverse problems with application to seismicinversion.
SIAM Journal on Scientific Computing , 34(3):A1460–A1487, 2012.[59] F. Negri, A. Manzoni, and G. Rozza. Reduced basis approximation of parametrized optimal flowcontrol problems for the Stokes equations.
Computers & Mathematics with Applications ,69(4):319–336, 2015.[60] F. Negri, G. Rozza, A. Manzoni, and A. Quarteroni. Reduced basis method for parametrizedelliptic optimal control problems.
SIAM Journal on Scientific Computing , 35(5):A2316–A2340, 2013.[61] L.W.T. Ng and K.E. Willcox. Multifidelity approaches for optimization under uncertainty.
International Journal for numerical methods in Engineering , 100(10):746–772, 2014.[62] J. Nocedal and S. Wright.
Numerical optimization . Springer Science & Business Media, 2006.[63] B. Peherstorfer, K. Willcox, and M. Gunzburger. Survey of multifidelity methods in uncertaintypropagation, inference, and optimization.
SIAM Review , 60(3):550–591, 2018.[64] Noemi Petra, James Martin, Georg Stadler, and Omar Ghattas. A computational frameworkfor infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMCwith application to ice sheet inverse problems.
SIAM Journal on Scientific Computing ,36(4):A1525–A1555, 2014.[65] A. Quarteroni, A. Manzoni, and F. Negri.
Reduced Basis Methods for Partial DifferentialEquations: An Introduction , volume 92. Springer, 2015.[66] Farbod Roosta-Khorasani and Uri Ascher. Improved bounds on sample size for implicit matrixtrace estimators.
Foundations of Computational Mathematics , 15(5):1187–1212, 2015.[67] E. Rosseel and G.N. Wells. Optimal control with stochastic PDE constraints and uncertaincontrols.
Computer Methods in Applied Mechanics and Engineering , 213 - 216(0):152 –167, 2012.[68] G. Rozza, D.B.P. Huynh, and A.T. Patera. Reduced basis approximation and a posteriori errorestimation for affinely parametrized elliptic coercive partial differential equations.
Archivesof Computational Methods in Engineering , 15(3):229–275, 2008.[69] A.K. Saibaba, A. Alexanderian, and I.C.F. Ipsen. Randomized matrix-free trace and log-determinant estimators.
Numerische Mathematik , pages 1–43, 2016.[70] A.K. Saibaba, J. Lee, and P.K. Kitanidis. Randomized algorithms for generalized Hermitianeigenvalue problems with application to computing Karhunen–Lo`eve expansion.
NumericalLinear Algebra with Applications , 23(2):314–339, 2016.[71] C. Schillings, S. Schmidt, and V. Schulz. Efficient shape optimization for certain and uncertainaerodynamic design.
Computers & Fluids , 46(1):78–87, 2011.[72] C. Schillings and Ch. Schwab. Sparse, adaptive Smolyak quadratures for Bayesian inverseproblems.
Inverse Problems , 29(6), 2013.[73] Alexander Shapiro, Darinka Dentcheva, and Andrezj Ruszczynski.
Lectures on Stochastic Programming: Modeling and Theory. Society for Industrial and Applied Mathematics, 2009.
[74] H. Tiesler, R.M. Kirby, D. Xiu, and T. Preusser. Stochastic collocation for optimal control problems with stochastic PDE constraints.
SIAM Journal on Control and Optimization ,50(5):2659–2682, 2012.[75] F. Tr¨oltzsch.
Optimal control of partial differential equations: theory, methods, and applications, volume 112. American Mathematical Society, Providence, RI, 2010.
[76] U. Villa, N. Petra, and O. Ghattas. hIPPYlib: An extensible software framework for large-scale deterministic and linearized Bayesian inversion, 2016. http://hippylib.github.io.
[77] M.J. Zahr and C. Farhat. Progressive construction of a parametric reduced-order model for PDE-constrained optimization.