Taylor approximation and variance reduction for PDE-constrained optimal control under uncertainty
PENG CHEN∗, UMBERTO VILLA†, AND OMAR GHATTAS†

Abstract.
In this work we develop a scalable computational framework for the solution of PDE-constrained optimal control under high-dimensional uncertainty. Specifically, we consider a mean-variance formulation of the control objective and employ a Taylor expansion with respect to the uncertain parameter either to directly approximate the control objective or as a control variate for variance reduction. The expressions for the mean and variance of the Taylor approximation are known analytically, although their evaluation requires efficient computation of the trace of the (preconditioned) Hessian of the control objective. We propose to estimate this trace by solving a generalized eigenvalue problem using a randomized algorithm that only requires the action of the Hessian on a small number of random directions. Then, the computational work does not depend on the nominal dimension of the uncertain parameter, but depends only on the effective dimension (i.e., the rank of the preconditioned Hessian), thus ensuring scalability to high-dimensional problems. Moreover, to increase the estimation accuracy of the mean and variance of the control objective by the Taylor approximation, we use it as a control variate for variance reduction, which results in considerable computational savings (several orders of magnitude) compared to a plain Monte Carlo method. In summary, our approach amounts to solving an optimal control problem constrained by the original PDE and a set of linearized PDEs, which arise from the computation of the gradient and Hessian of the control objective with respect to the uncertain parameter. We use the Lagrangian formalism to derive expressions for the gradient with respect to the control and apply a gradient-based optimization method to solve the problem.
We demonstrate the accuracy, efficiency, and scalability of the proposed computational method for two examples with high-dimensional uncertain parameters: subsurface flow in a porous medium modeled as an elliptic PDE, and turbulent jet flow modeled by the Reynolds-averaged Navier–Stokes equations coupled with a nonlinear advection-diffusion equation characterizing model uncertainty. In particular, for the latter more challenging example we show scalability of our algorithm up to one million parameters resulting from discretization of the uncertain parameter field.
Key words.
PDE-constrained optimal control, Taylor approximation, variance reduction, Monte Carlo integration, scalability, uncertainty quantification, high dimensionality
AMS subject classifications.
1. Introduction.
Optimal control problems (and more general optimization problems) constrained by partial differential equations (PDEs) are ubiquitous in many applications of practical relevance for science and engineering. The objective of such problems is to seek an optimal control that minimizes a cost functional, which often consists of a control objective and a penalty term. The control objective is often given as a distance between the solution of the PDE and a desired state, or more generally formulated as a quantity of interest related to the solution, while the penalty term reflects the cost of the control or imposes regularity on the control. In practical applications, uncertainties in the PDE model are inevitable and can arise from various sources, such as the PDE coefficients, initial or boundary conditions, external sources, or geometries. Different realizations of these uncertain parameters may lead to significant differences in the optimal control. Mathematical theories and computational methods have been developed for several decades to deal with deterministic PDE-constrained optimal

∗ Institute for Computational Engineering & Sciences, The University of Texas at Austin, Austin, TX 78712 ([email protected], [email protected]).
† Institute for Computational Engineering & Sciences, Department of Mechanical Engineering, and Department of Geological Sciences, The University of Texas at Austin, Austin, TX 78712 ([email protected]).
control [56, 39, 40, 14, 44, 75, 15]. More recently, PDE-constrained optimal control under uncertainty has become an important area of research and has drawn increasing attention [16, 71, 45, 41, 67, 49, 74, 28, 54, 27, 51, 52, 61, 29, 53, 50, 11, 3, 4]. Different formulations of the cost functional have been studied to incorporate uncertainty into the PDE-constrained optimal control. A straightforward choice is to optimize the mean of the control objective, i.e., the integral of the control objective with respect to the probability measure of the uncertain parameter [16, 45, 41, 49, 27]. However, this formulation does not control the variability of the control objective. To avoid a possible and undesired large variation of the control objective, which may represent the risk of a system failure, one can include the variance or higher moments of the control objective in the cost functional [67, 11, 74]. An alternative approach is to use the Value-at-Risk (VaR) or conditional VaR [50], which measure the (conditional) probability of the control objective surpassing a certain critical value. An extreme choice is robust optimization (also referred to as min-max optimization), where one seeks a control that yields the optimal value of the objective in the worst-case scenario, i.e., the extreme value of the control objective over the range of the uncertain parameter [13, 54]. A common challenge in all of these formulations is to compute the statistics of the control objective (i.e., mean, variance, higher moments, or conditional probabilities); this involves the integration of the control objective or related quantities with respect to the probability measure of the uncertain parameter. This challenge becomes more severe for a high-dimensional uncertain parameter space, which is often the case when the parameter is an uncertain spatially correlated field.
A straightforward approach is to apply a Monte Carlo method to compute the statistics by taking the average of the control objective or related quantities at a set of samples randomly drawn according to the probability measure of the parameter. However, the convergence rate of the Monte Carlo estimator is just O(M^{-1/2}), where M is the number of samples. This may lead to an extremely large number of samples to achieve a target accuracy, and therefore may not be feasible when the state equation is a complex nonlinear PDE. Quasi Monte Carlo methods using low-discrepancy sequences improve the convergence rate to O((log(M))^d / M) for a d-dimensional parameter [34], and are thus attractive for low- to medium-dimensional problems. Stochastic Galerkin and stochastic collocation methods have also been used to compute the statistical moments in the cost functional [45, 41, 74, 67, 49, 28, 54, 27, 52, 53], provided that a suitable finite-dimensional parametrization of the uncertain parameter, such as a truncated Karhunen–Loève expansion, is applicable. These methods achieve fast convergence when the control objective depends smoothly on the low-dimensional parameter, but suffer from the so-called curse of dimensionality, i.e., the convergence rate quickly deteriorates as the dimension of the parameter increases. Recent advances in adaptive and anisotropic sparse quadrature [72, 31] and high-order quasi Monte Carlo methods [35] have been shown to achieve a dimension-independent convergence rate N^{-s} with s potentially much larger than 1/2. Here s is independent of the nominal dimension of the uncertain parameter and depends only on the choice of parametrization and on the regularity of the control objective with respect to the parameter, thus mitigating or even breaking the curse of dimensionality. However, the convergence may still be slow if the choice of the parametrization does not correctly capture the effective parameter dimensions. Another computational challenge is that, in applications of practical relevance, the PDE may lead, after discretization, to possibly nonlinear large-scale systems that are extremely expensive to solve. Thus, only a limited number of high-fidelity PDE solves can be afforded.
The Taylor approximation may not be sufficiently accurate and introduces bias in the estimation of the mean and variance of the control objective. Specifically, the norm of the remainder of the Taylor expansion scales as O(tr(C)^{(r+1)/2}), where C is the covariance of the uncertain parameter and r is the order of the truncated Taylor expansion, e.g., r = 1 and 2 for linear and quadratic approximations, respectively [3]. To address this issue, we propose to use the Taylor approximation as a control variate to reduce the variance of the Monte Carlo estimator for the mean and variance of the control objective, or, viewed differently, to apply a Monte Carlo correction to the remainder of the Taylor expansion. In cases when the Taylor approximation is highly correlated with the exact control objective, the reduction in computational cost can be dramatic, up to several orders of magnitude in our numerical examples. We consider two examples of PDE-constrained optimal control under uncertainty to numerically illustrate the scalability, accuracy, and efficiency of our computational framework. The first is a problem of optimal control of subsurface flow in a porous medium, where the control is the injection rate at given well locations, the state equation is an elliptic PDE representing single-phase flow with uncertain permeability, and the objective is to drive the pressure field to a desired state. The second is an optimal boundary control problem for a turbulent jet flow, where the control is the inlet velocity profile, the state equations are the Reynolds-averaged Navier–Stokes equations coupled with a nonlinear stochastic advection-diffusion equation representing model uncertainty, and the objective is to maximize the jet width at a cross-section a certain distance from the inlet boundary.
In both cases, the uncertain parameter is infinite-dimensional in the continuous setting and becomes high-dimensional after discretization. For the solution of the PDE-constrained optimal control problem with Taylor approximation and variance reduction we employ a gradient-based method, specifically the limited-memory BFGS method [62]. The cost functional involves the analytical expressions for the mean and variance of the Taylor approximation and the Monte Carlo correction when using the Taylor approximation for variance reduction; the constraint involves the (possibly nonlinear) state PDE and a set of linearized PDEs used in the computation of the gradient and Hessian of the control objective with respect to the uncertain parameter. We use the Lagrangian formalism to derive the gradient of the cost functional with respect to the control. Specifically, we compute mixed high-order derivatives of the Lagrangian functional with respect to the state, the adjoint, the uncertain parameter, and the control by symbolic differentiation in a variational setting before discretization. This allows us to efficiently apply our framework to rather complex PDE models such as the turbulent jet example, which features coupled nonlinear PDEs, nonlinear numerical stabilization, and weak imposition of Dirichlet boundary conditions. We demonstrate numerically that our method scales independently of the uncertain parameter dimension, in that the number of required PDE solves does not depend on the discretization of the parameter field, and instead depends only on the effective or intrinsic dimension of the problem, i.e., the number of parameter dimensions to which the control objective is sensitive. The rest of the paper is organized as follows: We present the basic setting for PDE-constrained optimal control under uncertainty in Section 2, using a general notation for the uncertain parameter, the PDE model, the cost functional, and the optimal control problem.
We also present a sample average approximation of the mean and variance of the control objective in this section. Section 3 is devoted to the development of the Taylor approximation and variance reduction for the evaluation of the mean and variance of the control objective. We also present the computation of the gradient and Hessian of the control objective with respect to the uncertain parameter.
2. PDE-constrained optimal control under uncertainty.
In this section we present the PDE-constrained optimal control under uncertainty in an abstract setting. To this aim, let us first introduce the notation used throughout the paper. Let X be a separable Banach space and X′ its dual space; then ⟨·,·⟩_{X,X′} denotes the duality pairing between the spaces X and X′. For ease of notation, we will omit the subscripts X and X′ and simply write ⟨·,·⟩ when the spaces can be inferred from the context without ambiguity. Given two separable Banach spaces X and Y and a map f : X × Y → R, ∂_x f(x, y) ∈ X′ denotes the Fréchet derivative of f(x, y) with respect to x evaluated at (x, y), which satisfies

(2.1)  lim_{x̃ → 0} |f(x + x̃, y) − f(x, y) − ⟨x̃, ∂_x f(x, y)⟩| / ‖x̃‖_X = 0.

Let ∂_{xy} f(x, y) : Y → X′ denote the Fréchet derivative of ∂_x f(x, y) with respect to y evaluated at (x, y), i.e., the second-order (mixed) Fréchet derivative of f(x, y) with respect to x and y evaluated at (x, y). Similarly, ∂_{yx} f(x, y) : X → Y′ denotes the Fréchet derivative of ∂_y f(x, y) with respect to x evaluated at (x, y), which is the adjoint operator of ∂_{xy} f(x, y) and satisfies

(2.2)  ⟨x̃, ∂_{xy} f ŷ⟩ := ⟨x̃, ∂_{xy} f ŷ⟩_{X,X′} = ⟨ŷ, ∂_{yx} f x̃⟩_{Y,Y′} =: ⟨ŷ, ∂_{yx} f x̃⟩,  ∀ x̃ ∈ X, ŷ ∈ Y,

where we have omitted the argument (x, y) for simplicity. Finally, let us introduce the separable Banach spaces U, V, M, and Z in which the state u, the adjoint v, the uncertain parameter m, and the deterministic control z live, respectively. The exact definition of these spaces is problem specific and is specified in Sec. 5. We assume that the uncertain parameter m ∈ M has mean m̄ ∈ M and covariance C : M → M′. The computational framework developed in this work is applicable as long as this assumption on the uncertain parameter is satisfied; the parameter can be either finite or infinite dimensional.
In particular, to demonstrate the scalability of our computational methods, we consider an infinite-dimensional Gaussian random field with distribution µ = N(m̄, C) for the uncertain parameter, which is one of the most important building blocks in modern spatial statistical modeling. A general covariance structure is the Matérn covariance, which is equivalent to prescribing that the uncertain parameter m is given as the solution of a linear fractional stochastic PDE [55]. In this work we consider this type of covariance, i.e.,

(2.3)  C = (−α₁ ∇·(Θ∇) + α₂ I)^{−α₃},

where ∇·, ∇, and I are the divergence, gradient, and identity operators, respectively; the coefficients α₁, α₂ > 0, the exponent α₃ > d/2 for spatial dimension d = 1, 2, 3, and the symmetric positive definite matrix Θ ∈ R^{d×d} enables spatial anisotropy.
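For intuition, a covariance of the form (2.3) can be realized directly in one spatial dimension. The following NumPy sketch is illustrative only (the grid size, the coefficients a1, a2, a3, and the isotropic finite difference Laplacian standing in for ∇·(Θ∇) are assumed choices, not values from the paper); it forms C by spectral calculus and draws one sample from N(m̄, C):

```python
# Sketch: a Matern-type covariance C = (-a1 * Laplacian + a2 * I)^(-a3)
# on a 1-D grid, and a draw from N(m_bar, C). All names/values illustrative.
import numpy as np

n = 200                       # grid points on [0, 1]
h = 1.0 / (n - 1)
a1, a2, a3 = 1e-2, 1.0, 2.0   # assumed coefficients; a3 > d/2 = 1/2 for d = 1

# second-difference Laplacian (homogeneous Dirichlet closure at the ends)
L = np.zeros((n, n))
for i in range(n):
    L[i, i] = -2.0
    if i > 0:
        L[i, i - 1] = 1.0
    if i < n - 1:
        L[i, i + 1] = 1.0
L /= h ** 2

A = -a1 * L + a2 * np.eye(n)          # SPD elliptic operator
lam, V = np.linalg.eigh(A)            # A = V diag(lam) V^T
C = (V * lam ** (-a3)) @ V.T          # C = A^{-a3} via spectral calculus

rng = np.random.default_rng(0)
m_bar = np.zeros(n)
C_half = (V * lam ** (-a3 / 2)) @ V.T # matrix square root of C
m = m_bar + C_half @ rng.standard_normal(n)   # m ~ N(m_bar, C)
```

In large-scale practice one would of course never form C densely; the fractional power is applied matrix-free, e.g., with sparse elliptic solves.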
We consider a state problem modeled by the following PDE. Given a realization of the uncertain parameter m ∈ M and a deterministic control z ∈ Z, find the state u ∈ U such that

(2.4)  R(u, m, z) = 0 in V′,

where R(·, m, z) : U → V′ denotes a (possibly nonlinear) operator from U to the dual of V. We then define the weak form associated to the state equation (2.4) by means of the duality pairing, and write

(2.5)  r(u, v, m, z) := ⟨v, R(u, m, z)⟩ = 0  ∀ v ∈ V.

Note that r(u, v, m, z) is linear with respect to the adjoint v and possibly nonlinear with respect to the state u. As for the control objective, we consider a real-valued, possibly nonlinear, functional Q of the state u ∈ U,

(2.6)  Q : U → R.

For simplicity, we assume that Q depends on m and z only implicitly through the solution of the state equation (2.5). The general case where Q has an explicit dependence on m and z can be treated with slight modifications in the derivation of the cost functional and its gradient, as we address along the presentation in Sec. 3 and Sec. 4. Since the control objective Q (implicitly) depends on the uncertain parameter m through the state u, we face PDE-constrained optimal control under uncertainty. In this work we consider a mean-variance cost functional of the form

(2.7)  J(z) = E[Q] + β Var[Q] + P(z),

which is often used in stochastic programming [73]. Above, E[Q] and Var[Q] are, respectively, the mean and variance of Q with respect to the probability measure µ of the uncertain parameter m, defined as

(2.8)  E[Q] := ∫_M Q(u(m, z)) dµ(m),

and

(2.9)  Var[Q] := ∫_M (Q(u(m, z)) − E[Q])² dµ(m) = E[Q²] − (E[Q])².

As the control objective Q (implicitly) depends on the control z through the state u, so do E[Q] and Var[Q].
The scaling factor β ≥ 0 weights the variance relative to the mean, and P(z) is a penalty term. Recalling that the state u (implicitly) depends on the uncertain parameter m and the control z through the solution of the PDE (2.5), we formulate the PDE-constrained optimal control problem under uncertainty as: find z* ∈ Z such that

(2.10)  z* = arg min_{z ∈ Z} J(z), subject to the problem (2.5).

To evaluate the mean and variance in the cost functional (2.7), a plain Monte Carlo sample average approximation can be used. We briefly discuss this approach below, while in the following section we introduce more sophisticated approximations that allow for more efficient algorithms by exploiting high-order derivatives. The (plain Monte Carlo) sample average approximation of the mean reads

(2.11)  E[Q] ≈ Q̂ := (1/M₁) Σ_{i=1}^{M₁} Q(m_i),

where m_i, i = 1, …, M₁, are independent identically distributed (i.i.d.) random samples drawn from the probability measure µ. Similarly, the sample average approximation of the variance is given by

(2.12)  Var[Q] ≈ V̂_Q := (1/M₂) Σ_{i=1}^{M₂} Q(m_i)² − ((1/M₂) Σ_{i=1}^{M₂} Q(m_i))²,

where m_i, i = 1, …, M₂, are i.i.d. random samples from µ. Since each evaluation of Q(m_i) involves solving the possibly nonlinear state equation (2.5), it is common practice to choose M₁ = M₂ = M and reuse the same samples m_i to estimate both the mean and variance of Q. However, a more careful choice of M₁ and M₂ relies on estimating the variances of Q and Q², and aims at minimizing the mean square error of Q̂ + β V̂_Q. Note that a large number of samples is required to obtain a relatively accurate approximation of E[Q] and Var[Q] due to the slow convergence of Monte Carlo sampling, thus rendering this approach computationally prohibitive for practical applications due to the large number of (nonlinear) PDE solves at each optimization step.
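The estimators (2.11)–(2.12) can be sketched in a few lines of NumPy. The closed-form function Q below is an illustrative stand-in (not from the paper); in the PDE setting each evaluation of Q(m_i) would require a full state solve:

```python
# Sketch of the plain Monte Carlo estimators (2.11)-(2.12) for E[Q], Var[Q].
import numpy as np

rng = np.random.default_rng(1)

def Q(m):
    # illustrative stand-in for Q(u(m, z)); in practice each evaluation
    # entails solving the (possibly nonlinear) state PDE (2.5)
    return np.sin(m[0]) + 0.1 * np.sum(m ** 2)

M = 10_000
samples = [Q(rng.standard_normal(3)) for _ in range(M)]    # Q(m_i), m_i ~ mu
Q_hat = np.mean(samples)                                   # (2.11)
V_hat = np.mean(np.square(samples)) - Q_hat ** 2           # (2.12)
```

With the same samples reused for both estimators, this corresponds to the common choice M₁ = M₂ = M discussed above.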
3. Taylor approximation and variance reduction.
We develop the main approximation methods in this section, which include the approximation by (low-order) Taylor expansion of the control objective, the computation of the gradient and Hessian of the control objective with respect to the uncertain parameter used in the Taylor expansion, the randomized algorithm to solve a generalized eigenvalue problem for trace estimation, and the Monte Carlo correction for the remainder of the Taylor expansion.
In this section we derive expressions for the Taylor expansion of the control objective Q(u(m, z)) with respect to the uncertain parameter m for a fixed value of the control z. In what follows, we denote with the subscript m the total derivative of Q(u(m, z)) with respect to m, and we assume sufficient smoothness so that all Fréchet derivatives are well-defined. For ease of notation, we also denote with ⟨·,·⟩ the duality pairing between M and M′, and, with a slight abuse of notation, we write Q(m) to indicate the control objective Q(u(m, z)) evaluated at a fixed value of z. A formal Taylor expansion of the control objective evaluated at m̄ ∈ M is given by

(3.1)  Q(m) = Q(m̄) + ⟨m − m̄, Q_m(m̄)⟩ + ½ ⟨m − m̄, Q_mm(m̄)(m − m̄)⟩ + ⋯,

where Q_m(m̄) ∈ M′ and Q_mm(m̄) : M → M′ are the first- and second-order Fréchet derivatives of Q with respect to m evaluated at the nominal m̄ ∈ M. In what follows, we refer to these derivatives as the gradient and Hessian of Q, respectively. Based on the Taylor expansion, we can define the linear approximation of Q as

(3.2)  Q_lin(m) = Q(m̄) + ⟨m − m̄, Q_m(m̄)⟩,
and the quadratic approximation of Q as

(3.3)  Q_quad(m) = Q(m̄) + ⟨m − m̄, Q_m(m̄)⟩ + ½ ⟨m − m̄, Q_mm(m̄)(m − m̄)⟩.

We remark that Q_lin(m) and Q_quad(m) are polynomials with respect to the uncertain parameter m, and depend on the control z through the solution of the state equation (2.5) at m = m̄. For the linear approximation, given the mean m̄ and covariance C of the uncertain parameter m, we have

(3.4)  E[Q_lin] = Q(m̄), and Var[Q_lin] = ⟨Q_m(m̄), C Q_m(m̄)⟩.

In fact, Q_lin(m) is Gaussian if the uncertain parameter m ∼ N(m̄, C). For the quadratic approximation, we have

(3.5)  E[Q_quad] = E[Q_lin] + ½ tr(H), and Var[Q_quad] = Var[Q_lin] + ½ tr(H²),

where H = C^{1/2} Q_mm(m̄) C^{1/2} is the covariance-preconditioned Hessian, and tr(·) denotes the trace operator; see [3] for details. Efficient algorithms to accurately estimate the traces of H and H² will be discussed in Sec. 3.3. Finally, following [3], we approximate the cost functional J(z) in (2.7) by replacing E[Q] and Var[Q] with the moments of the linear or quadratic Taylor expansions of the control objective. In this section we use the Lagrangian formalism to derive the expressions for the gradient Q_m and for the action of the Hessian Q_mm on a given direction m̂, evaluated at m̄. In what follows, for ease of notation, we use the overline to indicate quantities evaluated at m = m̄ and define

(3.6)  r̄ := r(u, v, m̄, z), and Q̄ := Q(u(m̄, z)).

We define the Lagrangian functional

(3.7)  L(u, v, m, z) = Q(u) + r(u, v, m, z),

where the adjoint v is regarded as the Lagrange multiplier of the state equation. By requiring the first-order variation of (3.7) at m̄ with respect to the adjoint v to vanish, we obtain the state problem: find u ∈ U such that

(3.8)  ⟨ṽ, ∂_v r̄⟩ = 0, ∀ ṽ ∈ V.

Similarly, by setting the first-order variation of (3.7) at m̄ with respect to the state u to zero, we obtain the adjoint problem: find v ∈ V such that

(3.9)  ⟨ũ, ∂_u r̄⟩ = −⟨ũ, ∂_u Q̄⟩, ∀ ũ ∈ U.

Then, the gradient of Q at m̄ acting in the direction m̃ = m − m̄ is given by

(3.10)  ⟨m̃, Q_m(m̄)⟩ = ⟨m̃, ∂_m r̄⟩,

where u solves the state problem (3.8) and v solves the adjoint problem (3.9). To compute the action of the Hessian of Q on m̂, we consider the Lagrangian functional

(3.11)  L^H(u, v, m, z; û, v̂, m̂) = ⟨m̂, ∂_m r̄⟩ + ⟨v̂, ∂_v r̄⟩ + ⟨û, ∂_u r̄ + ∂_u Q̄⟩,

where û and v̂ denote the incremental state and incremental adjoint, respectively.
By taking variations of (3.11) with respect to the adjoint v and using (2.2), we obtain the incremental state problem: find û ∈ U such that

(3.12)  ⟨ṽ, ∂_{vu} r̄ û⟩ = −⟨ṽ, ∂_{vm} r̄ m̂⟩, ∀ ṽ ∈ V,

where the derivatives ∂_{vu} r̄ : U → V′ and ∂_{vm} r̄ : M → V′ are linear operators. The incremental adjoint problem, obtained by taking variations of (3.11) with respect to the state u and using (2.2), reads: find v̂ ∈ V such that

(3.13)  ⟨ũ, ∂_{uv} r̄ v̂⟩ = −⟨ũ, ∂_{uu} r̄ û + ∂_{uu} Q̄ û + ∂_{um} r̄ m̂⟩, ∀ ũ ∈ U,

where ∂_{uv} r̄ : V → U′ is the adjoint of ∂_{vu} r̄ : U → V′ in the sense of (2.2). We remark that an extra term −⟨ũ, ∂_{um} Q̄ m̂⟩ is added on the right-hand side if Q explicitly depends on m. The Hessian of Q at m̄ acting on m̂ can then be computed by taking variations of (3.11) with respect to m and using (2.2) as

(3.14)  ⟨m̃, Q_mm(m̄) m̂⟩ = ⟨m̃, ∂_{mv} r̄ v̂ + ∂_{mu} r̄ û + ∂_{mm} r̄ m̂⟩,

where the incremental state û and the incremental adjoint v̂ solve (3.12) and (3.13), respectively. Note that ⟨m̃, ∂_{mu} Q̄ û + ∂_{mm} Q̄ m̂⟩ is also added if Q explicitly depends on m. We summarize the computation of the gradient and Hessian of the control objective Q with respect to the uncertain parameter m in Algorithm 1.

Algorithm 1
Computation of the gradient and Hessian of Q with respect to m.
Input: control objective Q, PDE r(u, v, m, z), mean m̄, direction m̂.
Output: the gradient Q_m(m̄) and the Hessian Q_mm(m̄) acting on m̂.
1. Solve the state problem (3.8) for u.
2. Solve the adjoint problem (3.9) for v.
3. Compute the gradient Q_m(m̄) by (3.10).
4. Solve the incremental state problem (3.12) for û.
5. Solve the incremental adjoint problem (3.13) for v̂.
6. Compute the Hessian Q_mm(m̄) acting on m̂ by (3.14).

The first and second moments of the quadratic approximation Q_quad in equation (3.5) involve the trace of the covariance-preconditioned Hessian H. H is formally a large-scale dense operator, and it is only implicitly defined through its action on a given direction as in (3.14). The standard approach to estimate the trace of an implicitly defined operator is based on a Monte Carlo method, where one computes the action of the operator on N random directions sampled from a suitable distribution. For example, using the Gaussian trace estimator, one can approximate tr(H) and tr(H²) as

(3.15)  tr(H) ≈ T̂(H) := (1/N) Σ_{j=1}^{N} ⟨m̂_j, Q_mm(m̄) m̂_j⟩,

and

(3.16)  tr(H²) ≈ T̂(H²) := (1/N) Σ_{j=1}^{N} ⟨Q_mm(m̄) m̂_j, C Q_mm(m̄) m̂_j⟩,

where m̂_j, j = 1, …, N, are i.i.d. random samples from the Gaussian distribution N(0, C). Lower bounds on the number of samples N needed to obtain a guaranteed probabilistic error bound for the trace are discussed in [5]. Such bounds hold for any symmetric positive definite operator, regardless of size and spectral properties.
However, the number of samples N necessary to obtain an accurate approximation may be prohibitively large and impractical: e.g., guaranteeing even a modest relative error in the estimation of the trace requires a very large number of samples by the bounds in [5]. Improved bounds are demonstrated in [66] and show that the Gaussian estimator becomes very inefficient if the stable rank of the operator is small, and that it allows for small N only if the eigenvalues are all of approximately the same size. This means that the randomized Gaussian estimator is not a viable solution to estimate the trace of H, since it has been observed numerically or proven analytically that for many problems the Hessian operator is either nearly low-rank or its eigenvalues exhibit fast decay [8, 36, 19, 21, 20, 18, 23, 30, 2, 3, 1, 33, 64, 46, 58, 22, 24]. In this work, we propose to estimate the traces of H and H² by means of a randomized approximate eigendecomposition. Specifically, we approximate tr(H) and tr(H²) by

(3.17)  tr(H) ≈ T̂(H) := Σ_{j=1}^{N} λ_j(H), and tr(H²) ≈ T̂(H²) := Σ_{j=1}^{N} λ_j(H)²,

where λ_j(H), j = 1, …, N, are the dominant eigenvalues of H, obtained by solving the generalized eigenvalue problem

(3.18)  ⟨φ, Q_mm(m̄) ψ_j⟩ = λ_j ⟨φ, C⁻¹ ψ_j⟩, ∀ φ ∈ M, j = 1, …, N.

Above, the eigenvectors ψ_j, j = 1, …, N, are C⁻¹-orthonormal, that is,

(3.19)  ⟨ψ_j, C⁻¹ ψ_i⟩ = δ_ij, i, j = 1, …, N,

where δ_ij is the Kronecker delta. Note that N is independent of the nominal parameter dimension and, as shown numerically in Sec. 5, it is relatively small when H is nearly low-rank or its spectrum exhibits fast decay. After suitable discretization, e.g., by a finite element method, the generalized eigenvalue problem (3.18) results in an algebraic eigenproblem of the form Aψ = λBψ with A, B ∈ R^{n×n} and ψ ∈ R^n.
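A dense NumPy sketch of one such randomized solve of Aψ = λBψ, together with the truncated-spectrum trace estimates of (3.17), is given below. The synthetic operators, sizes, and decay rate are illustrative assumptions; B is taken as the identity for brevity, whereas in the PDE setting A is the matrix-free Hessian and B the discretized prior precision C⁻¹:

```python
# Sketch: randomized double-pass approximation of the dominant eigenpairs
# of A psi = lambda B psi, and the trace estimates (3.17). Illustrative only.
import numpy as np

rng = np.random.default_rng(2)
n, k, p = 100, 10, 5

# synthetic SPD test operator with fast (geometric) eigenvalue decay
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U * (2.0 ** -np.arange(n))) @ U.T   # eigenvalues 1, 1/2, 1/4, ...
B = np.eye(n)                            # identity stand-in for C^{-1}

Omega = rng.standard_normal((n, k + p))  # 1. Gaussian test matrix
Y = np.linalg.solve(B, A @ Omega)        # 2. Y = B^{-1} (A Omega)
# 3. B-orthonormalize: Q^T B Q = I (here B = I, so ordinary QR suffices)
Qm, _ = np.linalg.qr(Y)
T = Qm.T @ A @ Qm                        # 4. small (k+p)x(k+p) projected problem
lam, S = np.linalg.eigh(T)
idx = np.argsort(lam)[::-1][:k]          # 5. keep the k dominant Ritz pairs
lam_k = lam[idx]
Psi_k = Qm @ S[:, idx]

tr_H = np.sum(lam_k)          # (3.17): tr(H)   ~ sum of dominant eigenvalues
tr_H2 = np.sum(lam_k ** 2)    # (3.17): tr(H^2) ~ sum of their squares
```

Because the spectrum here decays geometrically, the truncation error of the trace estimates is bounded by the small tail of the discarded eigenvalues.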
Algorithm 2 summarizes the so-called double pass randomized algorithm to solve the algebraic generalized eigenvalue problem (see [42, 70] for details of the algorithm and [76] for its implementation). There are multiple advantages in using Algorithm 2 to approximate the trace; see, e.g., [42, 69]. In terms of accuracy, the approximation error is bounded by the sum of the remaining eigenvalues, so that the error is small if the eigenvalues decay fast or if the Hessian H has low rank; see [42, 69] for more details. In terms of computational efficiency, the 2(k + p) Hessian matrix-vector products, which entail the solution of a pair of linearized state/adjoint equations (as shown in Sec. 3.2) and therefore dominate the computational cost of the algorithm, can be computed independently and, therefore, asynchronously across the random directions.

Algorithm 2
Randomized algorithm for the generalized eigenvalue problem (A, B).
Input: matrices A, B, the number of eigenpairs k, an oversampling factor p.
Output: eigenpairs (Λ_k, Ψ_k) with Λ_k = diag(λ₁, …, λ_k) and Ψ_k = (ψ₁, …, ψ_k).
1. Draw a Gaussian random matrix Ω ∈ R^{n×(k+p)}.
2. Compute Y = B⁻¹(AΩ).
3. Compute the QR-factorization Y = QR such that Qᵀ B Q = I_{k+p}.
4. Form T = Qᵀ A Q and compute the eigendecomposition T = S Λ Sᵀ.
5. Extract Λ_k = Λ(1:k, 1:k) and Ψ_k = Q S_k with S_k = S(:, 1:k).

Using the moments of the truncated Taylor expansions Q_lin and Q_quad as approximations of E[Q] and Var[Q] in the cost functional (2.7) may not be sufficiently accurate. In this circumstance, we can use the Taylor approximation as a control variate to reduce the variance of the Monte Carlo estimator. The basic idea is to correct the moments of Q_lin and Q_quad by applying a Monte Carlo method to estimate the mean (and variance) of the difference between Q and its Taylor approximation. Specifically, recalling (3.4) and (3.2), the Monte Carlo correction for the mean of the linear approximation reads

(3.20)  E[Q] = E[Q_lin] + E[Q − Q_lin] ≈ Q̂_lin := Q(m̄) + (1/M₁) Σ_{i=1}^{M₁} (Q(m_i) − Q(m̄) − ⟨m_i − m̄, Q_m(m̄)⟩).

Similarly, recalling that the variance is invariant with respect to deterministic translations, we have

(3.21)  Var[Q] = Var[Q − Q(m̄)] = E[(Q − Q(m̄))²] − (E[Q − Q(m̄)])²
        = E[(Q_lin − Q(m̄))²] + E[(Q − Q(m̄))² − (Q_lin − Q(m̄))²]
        − (E[Q_lin − Q(m̄)] + E[(Q − Q(m̄)) − (Q_lin − Q(m̄))])²,

and therefore the Monte Carlo correction of the variance of the linear approximation is given by

(3.22)  V̂_Q^lin := ⟨C Q_m(m̄), Q_m(m̄)⟩ + (1/M₂) Σ_{i=1}^{M₂} ((Q(m_i) − Q(m̄))² − ⟨m_i − m̄, Q_m(m̄)⟩²)
        − ((1/M₂) Σ_{i=1}^{M₂} (Q(m_i) − Q(m̄) − ⟨m_i − m̄, Q_m(m̄)⟩))².

Note that, similar to the sample average approximation in Sec.
2.4, for simplicity we can use the same M_1 = M_2 = M random samples for the approximation of both the mean and the variance. A more careful choice of M_1 and M_2 would rely on the variances of Q^{(1)} = Q - Q_{\rm lin} and Q^{(2)} = (Q - Q(\bar m))^2 - (Q_{\rm lin} - Q(\bar m))^2, in order to balance the errors in the approximation of the mean and of the variance. For the quadratic approximation, a combination of (3.5) and (3.3) leads to

(3.23)   \mathbb{E}[Q] = \mathbb{E}[Q_{\rm quad}] + \mathbb{E}[Q - Q_{\rm quad}] \approx \widehat{Q}_{\rm quad} := Q(\bar m) + \frac{1}{2}\mathrm{tr}(H) + \frac{1}{M_1} \sum_{i=1}^{M_1} \Big( Q(m_i) - Q(\bar m) - \langle m_i - \bar m,\, Q_m(\bar m) \rangle - \frac{1}{2}\langle m_i - \bar m,\, Q_{mm}(\bar m)(m_i - \bar m) \rangle \Big).

For the variance, akin to the linear approximation, we obtain

(3.24)   \mathrm{Var}[Q] = \mathbb{E}[(Q_{\rm quad} - Q(\bar m))^2] + \mathbb{E}[(Q - Q(\bar m))^2 - (Q_{\rm quad} - Q(\bar m))^2]
           - \big( \mathbb{E}[Q_{\rm quad} - Q(\bar m)] + \mathbb{E}[(Q - Q(\bar m)) - (Q_{\rm quad} - Q(\bar m))] \big)^2,

which can be approximated by

(3.25)   \widehat{V}^{\rm quad}_Q := \langle \mathcal{C} Q_m(\bar m), Q_m(\bar m) \rangle + \frac{1}{4}(\mathrm{tr}(H))^2 + \frac{1}{2}\mathrm{tr}(H^2)
         + \frac{1}{M_2} \sum_{i=1}^{M_2} \Big( (Q(m_i) - Q(\bar m))^2 - \big( \langle m_i - \bar m,\, Q_m(\bar m) \rangle + \frac{1}{2}\langle m_i - \bar m,\, Q_{mm}(\bar m)(m_i - \bar m) \rangle \big)^2 \Big)
         - \bigg( \frac{1}{2}\mathrm{tr}(H) + \frac{1}{M_2} \sum_{i=1}^{M_2} \big( Q(m_i) - Q(\bar m) - \langle m_i - \bar m,\, Q_m(\bar m) \rangle - \frac{1}{2}\langle m_i - \bar m,\, Q_{mm}(\bar m)(m_i - \bar m) \rangle \big) \bigg)^2.

Note that, to achieve a given approximation accuracy, the number of Monte Carlo samples depends on the variance of the integrand. If the approximations Q_lin and Q_quad are highly correlated with Q, the variances of Q - Q_lin and Q - Q_quad will be small, so that only a small number of Monte Carlo samples is required. This use of the Taylor approximation (of the control objective with respect to the uncertain parameter) can be viewed as a kind of "multifidelity Monte Carlo method" [63], in which the Taylor approximation plays the role of a reduced model.
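A toy sketch may help make the control-variate mechanics concrete. The snippet below is purely illustrative: `Q` is a hypothetical scalar stand-in for the PDE-based control objective (not the paper's), and `Q_lin` is its first-order Taylor surrogate about the nominal parameter, whose mean is known exactly; only the estimator structure follows (3.20).

```python
import numpy as np

# Control-variate correction of the mean, in the spirit of (3.20):
# E[Q] = E[Q_lin] + E[Q - Q_lin], with E[Q_lin] known analytically.
rng = np.random.default_rng(0)

def Q(m):
    return np.exp(0.3 * m)          # hypothetical nonlinear objective

def Q_lin(m):
    return 1.0 + 0.3 * m            # Taylor surrogate; E[Q_lin] = 1 exactly

M = 200
m = rng.standard_normal(M)          # samples of the uncertain parameter
plain_mc = Q(m).mean()                          # plain Monte Carlo estimate
corrected = 1.0 + (Q(m) - Q_lin(m)).mean()      # E[Q_lin] + MC correction
```

Because Q - Q_lin has much smaller variance than Q itself, the corrected estimator reaches a given mean-square error with far fewer samples, which is exactly the rationale above.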
4. Gradient-based optimization.
In this section, we present a gradient-based optimization method to solve problem (2.10) using the different approximations introduced in Sec. 3, namely the sample average approximation, the first and second order Taylor approximations, and variance reduction using the Taylor approximation as a control variate. These approximate formulations feature different levels of accuracy and computational complexity, and could be combined within a multifidelity framework to obtain an efficient computational procedure while preserving high accuracy. We use the Lagrangian formalism to derive expressions for the gradient of the approximate cost functionals with respect to the control z, which are needed to perform the gradient-based optimization. In the following, to distinguish the gradient of the cost functional with respect to the control from the gradient of the control objective with respect to the uncertain parameter, we refer to the former as the z-gradient.

4.1. z-gradient for the sample average approximation. With the sample average approximation presented in Sec. 2.4, the cost functional becomes

(4.1)   J^{\rm MC}(z) = \widehat{Q} + \beta \widehat{V}_Q + P(z),

subject to the state problem (2.5) at m_i, i = 1, \dots, \widetilde{M}, where \widetilde{M} = M if the same samples are used for the mean and the variance, or \widetilde{M} = M_1 + M_2 otherwise. To compute the gradient of the cost functional with respect to the control z (z-gradient for short), we consider the Lagrangian functional

(4.2)   \mathcal{L}^{\rm MC}(\{u_i\}, \{v_i\}, z) = J^{\rm MC}(z) + \sum_{i=1}^{\widetilde{M}} r(u_i, v_i, m_i, z),

where \{v_i\} are the Lagrange multipliers (adjoints) of the state problems: given m_i \in \mathcal{M} and z \in \mathcal{Z}, find u_i \in \mathcal{U} such that

(4.3)   r(u_i, v_i, m_i, z) = 0, \quad \forall v_i \in \mathcal{V}, \; i = 1, \dots, \widetilde{M}.

The i-th adjoint problem (i = 1, \dots, \widetilde{M}) is obtained by setting to zero the first variation of the Lagrangian functional with respect to the state u_i.
The i-th adjoint v_i then solves: given m_i \in \mathcal{M}, z \in \mathcal{Z}, and u_i \in \mathcal{U} the solution of (4.3), find v_i \in \mathcal{V} such that

(4.4)   \langle \tilde{u}_i, \partial_u r(u_i, v_i, m_i, z) \rangle = -\frac{1}{\widetilde{M}} \langle \tilde{u}_i, \partial_u Q(m_i) \rangle - \frac{2\beta}{\widetilde{M}} Q(m_i) \langle \tilde{u}_i, \partial_u Q(m_i) \rangle + \frac{2\beta}{\widetilde{M}^2} \sum_{j=1}^{\widetilde{M}} Q(m_j) \langle \tilde{u}_i, \partial_u Q(m_i) \rangle, \quad \forall \tilde{u}_i \in \mathcal{U}.

Then we can compute the z-gradient as

(4.5)   \langle \tilde{z}, D_z J^{\rm MC}(z) \rangle = \langle \tilde{z}, \partial_z \mathcal{L}^{\rm MC}(\{u_i\}, \{v_i\}, z) \rangle = \langle \tilde{z}, D_z P(z) \rangle + \sum_{i=1}^{\widetilde{M}} \langle \tilde{z}, \partial_z r(u_i, v_i, m_i, z) \rangle,

where \{u_i\} and \{v_i\} denote the sets of solutions of the state problem (4.3) and the adjoint problem (4.4) at the samples \{m_i\}, respectively. Therefore, each gradient evaluation requires \widetilde{M} (nonlinear) state PDE solves and \widetilde{M} linearized PDE solves. Note that, in the general case in which Q explicitly depends on z, extra terms involving \langle \tilde{z}, \partial_z Q(m_i) \rangle need to be included in the gradient computation.

4.2. z-gradient for the Taylor approximation. We first consider the linear approximation (3.4), by which the cost functional (2.7) becomes

(4.6)   J_{\rm lin}(z) = \bar{Q} + \beta \langle \mathcal{C} Q_m(\bar m), Q_m(\bar m) \rangle + P(z),

where \bar{Q} and the gradient Q_m(\bar m) are defined in (3.6) and (3.10), respectively. To compute the z-gradient of (4.6), we introduce the Lagrangian functional

(4.7)   \mathcal{L}_{\rm lin}(u, v, u^*, v^*, z) = J_{\rm lin}(z) + \langle v^*, \partial_v \bar{r} \rangle + \langle u^*, \partial_u \bar{r} + \partial_u \bar{Q} \rangle,

where the adjoints v^* \in \mathcal{V} and u^* \in \mathcal{U} are the Lagrange multipliers of the state problem (3.8) and the adjoint problem (3.9). By setting to zero the first variation of the Lagrangian (4.7) with respect to v and using (2.2), we have: find u^* \in \mathcal{U} such that

(4.8)   \langle \tilde{v}, \partial_{vu} \bar{r}\, u^* \rangle = -2\beta \langle \tilde{v}, \partial_{vm} \bar{r} (\mathcal{C} \partial_m \bar{r}) \rangle, \quad \forall \tilde{v} \in \mathcal{V},

while setting to zero the first variation with respect to u and using (2.2), we have: find v^* \in \mathcal{V} such that

(4.9)   \langle \tilde{u}, \partial_{uv} \bar{r}\, v^* \rangle = -\langle \tilde{u}, \partial_u \bar{Q} \rangle - 2\beta \langle \tilde{u}, \partial_{um} \bar{r} (\mathcal{C} \partial_m \bar{r}) \rangle - \langle \tilde{u}, \partial_{uu} \bar{r}\, u^* + \partial_{uu} \bar{Q}\, u^* \rangle, \quad \forall \tilde{u} \in \mathcal{U}.
Finally, the z-gradient of J_lin in a direction \tilde{z} \in \mathcal{Z} is computed as

(4.10)   \langle \tilde{z}, D_z J_{\rm lin}(z) \rangle = \langle \tilde{z}, \partial_z \mathcal{L}_{\rm lin}(u, v, u^*, v^*, z) \rangle = 2\beta \langle \tilde{z}, \partial_{zm} \bar{r} (\mathcal{C} \partial_m \bar{r}) \rangle + \langle \tilde{z}, \partial_z P(z) \rangle + \langle \tilde{z}, \partial_{zv} \bar{r}\, v^* + \partial_{zu} \bar{r}\, u^* \rangle.

In summary, evaluation of the cost functional with the linear approximation requires solution of the state problem (3.8) and the adjoint problem (3.9); computation of the z-gradient requires, in addition, solution of the two linear problems (4.8) and (4.9).

For the quadratic approximation (3.3), thanks to the relation (3.5) and the approximation (3.16), we obtain the approximate cost functional

(4.11)   J_{\rm quad}(z) = \bar{Q} + \frac{1}{2} \sum_{j=1}^N \lambda_j + \beta \Big( \langle \mathcal{C} Q_m(\bar m), Q_m(\bar m) \rangle + \frac{1}{2} \sum_{j=1}^N \lambda_j^2 \Big) + P(z),

subject to the state and adjoint problems (3.8) and (3.9) for the evaluation of Q and its gradient at \bar m, the generalized eigenvalue problem (3.18) for the computation of the eigenvalues, and the incremental state and adjoint problems (3.12) and (3.13) for the Hessian action in (3.18). Correspondingly, we form the Lagrangian

(4.12)   \mathcal{L}_{\rm quad}\big(u, v, \{\lambda_j\}, \{\psi_j\}, \{\hat{u}_j\}, \{\hat{v}_j\}, u^*, v^*, \{\lambda^*_j\}, \{\psi^*_j\}, \{\hat{u}^*_j\}, \{\hat{v}^*_j\}, z\big)
         = J_{\rm quad}(z) + \langle v^*, \partial_v \bar{r} \rangle + \langle u^*, \partial_u \bar{r} + \partial_u \bar{Q} \rangle
         + \sum_{j=1}^N \langle \psi^*_j, (Q_{mm}(\bar m) - \lambda_j \mathcal{C}^{-1}) \psi_j \rangle + \sum_{j=1}^N \lambda^*_j \big( \langle \psi_j, \mathcal{C}^{-1} \psi_j \rangle - 1 \big)
         + \sum_{j=1}^N \langle \hat{v}^*_j, \partial_{vu} \bar{r}\, \hat{u}_j + \partial_{vm} \bar{r}\, \psi_j \rangle + \sum_{j=1}^N \langle \hat{u}^*_j, \partial_{uv} \bar{r}\, \hat{v}_j + \partial_{uu} \bar{r}\, \hat{u}_j + \partial_{uu} \bar{Q}\, \hat{u}_j + \partial_{um} \bar{r}\, \psi_j \rangle.

Here we assume that the dominant eigenvalues are not repeated (thanks to their rapid decay, as shown in Sec. 5), so that the constraints \langle \psi_j, \mathcal{C}^{-1} \psi_j \rangle = 1, j = 1, \dots, N, are sufficient to guarantee the orthonormality \langle \psi_i, \mathcal{C}^{-1} \psi_j \rangle = \delta_{ij} for all i, j = 1, \dots, N. In fact, for any i \neq j, by definition we have

(4.13)   \langle \psi_j, Q_{mm} \psi_i \rangle = \lambda_i \langle \psi_j, \mathcal{C}^{-1} \psi_i \rangle \quad \text{and} \quad \langle \psi_i, Q_{mm} \psi_j \rangle = \lambda_j \langle \psi_i, \mathcal{C}^{-1} \psi_j \rangle,

which, by the symmetry of Q_{mm} and \mathcal{C}^{-1}, leads to

(4.14)   (\lambda_i - \lambda_j) \langle \psi_i, \mathcal{C}^{-1} \psi_j \rangle = 0,

so that \langle \psi_i, \mathcal{C}^{-1} \psi_j \rangle = 0 if \lambda_i \neq \lambda_j. By setting to zero the variation of \mathcal{L}_{\rm quad} with respect to \lambda_j, we obtain

(4.15)   \psi^*_j = \frac{1 + 2\beta\lambda_j}{2} \psi_j, \quad j = 1, \dots, N.

Subsequently, for each j = 1, \dots, N, setting to zero the variation of \mathcal{L}_{\rm quad} with respect to \hat{v}_j, and using the Hessian action equation (3.14) and (2.2), we have: find \hat{u}^*_j \in \mathcal{U} such that

(4.16)   \langle \tilde{v}, \partial_{vu} \bar{r}\, \hat{u}^*_j \rangle = -\langle \tilde{v}, \partial_{vm} \bar{r}\, \psi^*_j \rangle, \quad \forall \tilde{v} \in \mathcal{V},

which has the same form as the incremental forward problem (3.12). Therefore, from (4.15), we have

(4.17)   \hat{u}^*_j = \frac{1 + 2\beta\lambda_j}{2} \hat{u}_j, \quad j = 1, \dots, N.

Setting the variation with respect to \hat{u}_j to zero and using (2.2), we have: find \hat{v}^*_j \in \mathcal{V} such that

(4.18)   \langle \tilde{u}, \partial_{uv} \bar{r}\, \hat{v}^*_j \rangle = -\langle \tilde{u}, \partial_{uu} \bar{r}\, \hat{u}^*_j + \partial_{uu} \bar{Q}\, \hat{u}^*_j + \partial_{um} \bar{r}\, \psi^*_j \rangle, \quad \forall \tilde{u} \in \mathcal{U},

which has the same form as the incremental adjoint problem (3.13). Therefore, by (4.17) and (4.15), we obtain

(4.19)   \hat{v}^*_j = \frac{1 + 2\beta\lambda_j}{2} \hat{v}_j, \quad j = 1, \dots, N.

By setting to zero the variation with respect to v and using (2.2), we have: find u^* \in \mathcal{U} such that

(4.20)   \langle \tilde{v}, \partial_{vu} \bar{r}\, u^* \rangle = -2\beta \langle \tilde{v}, \partial_{vm} \bar{r} (\mathcal{C} \partial_m \bar{r}) \rangle - \sum_{j=1}^N \langle \tilde{v}, \partial_{vmu} \bar{r}\, \hat{u}_j \psi^*_j + \partial_{vmm} \bar{r}\, \psi_j \psi^*_j \rangle - \sum_{j=1}^N \langle \tilde{v}, \partial_{vuu} \bar{r}\, \hat{u}_j \hat{u}^*_j + \partial_{vum} \bar{r}\, \psi_j \hat{u}^*_j \rangle, \quad \forall \tilde{v} \in \mathcal{V}.
Lastly, by setting to zero the variation with respect to u and using (2.2), we have: find v^* \in \mathcal{V} such that

(4.21)   \langle \tilde{u}, \partial_{uv} \bar{r}\, v^* \rangle = -\langle \tilde{u}, \partial_u \bar{Q} \rangle - 2\beta \langle \tilde{u}, \partial_{um} \bar{r} (\mathcal{C} \partial_m \bar{r}) \rangle - \langle \tilde{u}, \partial_{uu} \bar{r}\, u^* + \partial_{uu} \bar{Q}\, u^* \rangle
         - \sum_{j=1}^N \langle \tilde{u}, \partial_{umv} \bar{r}\, \hat{v}_j \psi^*_j + \partial_{umu} \bar{r}\, \hat{u}_j \psi^*_j + \partial_{uum} \bar{r}\, \psi_j \psi^*_j \rangle
         - \sum_{j=1}^N \langle \tilde{u}, \partial_{uvu} \bar{r}\, \hat{u}_j \hat{v}^*_j + \partial_{uvm} \bar{r}\, \psi_j \hat{v}^*_j \rangle
         - \sum_{j=1}^N \langle \tilde{u}, \partial_{uuv} \bar{r}\, \hat{v}_j \hat{u}^*_j + \partial_{uuu} \bar{r}\, \hat{u}_j \hat{u}^*_j + \partial_{uuu} \bar{Q}\, \hat{u}_j \hat{u}^*_j + \partial_{uum} \bar{r}\, \psi_j \hat{u}^*_j \rangle, \quad \forall \tilde{u} \in \mathcal{U},

where the third-order derivatives take the form of bilinear maps \partial_{xyz} f : \mathcal{Z} \times \mathcal{Y} \to \mathcal{X}'. Then the z-gradient of the cost functional in a direction \tilde{z} \in \mathcal{Z} can be computed as

(4.22)   \langle \tilde{z}, D_z J_{\rm quad}(z) \rangle = \langle \tilde{z}, \partial_z \mathcal{L}_{\rm quad}(\text{states}, \text{adjoints}, z) \rangle,

where "states" and "adjoints" are shorthand for the state and adjoint variables. Note that we do not need to solve for \lambda^*_j, since the terms involving \lambda^*_j are independent of z. In summary, to obtain these states and adjoints we need to solve 1 (nonlinear) state problem and 2N + 3 linear problems (1 for v, N for \{\hat{u}_j\}, N for \{\hat{v}_j\}, 1 for u^*, and 1 for v^*), where the linear operators are the same as those in (3.12) and (3.13). Moreover, to compute the eigenvalues by Algorithm 2, which requires the action of the Hessian on N + p random directions twice, we need to solve 4(N + p) linear problems. We summarize the evaluation of the cost functional and its z-gradient with the quadratic approximation as follows.
Algorithm 3
Computation of the cost functional J_quad and its z-gradient.
Input: cost functional J(z) and PDE residual r(u, v, m, z).
Output: the approximate cost functional J_quad and its z-gradient D_z J_quad.
1. Solve the state problem (3.8) for u at \bar m and evaluate \bar{Q} = Q(u(\bar m)).
2. Solve the adjoint problem (3.9) for v and evaluate Q_m(\bar m) by (3.10).
3. Use Algorithms 1 and 2 to compute the generalized eigenpairs (\lambda_j, \psi_j)_{j=1}^N.
4. Compute the approximate cost functional J_quad by (4.11).
5. Solve the linear problem (4.20) for u^*.
6. Solve the linear problem (4.21) for v^*.
7. Compute the z-gradient D_z J_quad by (4.22).

4.3. z-gradient for the Monte Carlo correction. The approximations (4.6) and (4.11) introduced by the Taylor expansion may not be sufficiently accurate and introduce bias in the evaluation of the mean E[Q] and the variance Var[Q]. Recalling (3.20) and (3.22), we obtain an MC-corrected, unbiased linear approximation of the cost functional, given by

(4.23)   J^{\rm MC}_{\rm lin}(z) = \widehat{Q}_{\rm lin} + \beta \widehat{V}^{\rm lin}_Q + P(z),

subject to the state problem (3.8) at \bar m, the adjoint problem (3.9), and the state problems (2.5) at m_i, i = 1, \dots, \widetilde{M}, where \widetilde{M} = M if the same samples are used for the mean and the variance, or \widetilde{M} = M_1 + M_2 otherwise. We form the corresponding Lagrangian functional

(4.24)   \mathcal{L}^{\rm MC}_{\rm lin}\big(u, v, \{u_i\}, u^*, v^*, \{v_i\}, z\big) = J^{\rm MC}_{\rm lin}(z) + \langle v^*, \partial_v \bar{r} \rangle + \langle u^*, \partial_u \bar{r} + \partial_u \bar{Q} \rangle + \sum_{i=1}^{\widetilde{M}} r(u_i, v_i, m_i, z).

Setting the variations with respect to v^*, v_i, and u^* to zero, we obtain u, u_i, and v by solving 1 + \widetilde{M} state problems (2.5) and 1 linear adjoint problem (3.9). In a similar way, setting the variations with respect to v, u_i, and u to zero, we obtain u^*, v_i, and v^* by solving 2 + \widetilde{M} linear problems.
Thus, 2 + \widetilde{M} additional linear problems need to be solved to compute the z-gradient

(4.25)   \langle \tilde{z}, D_z J^{\rm MC}_{\rm lin}(z) \rangle = \langle \tilde{z}, \partial_z \mathcal{L}^{\rm MC}_{\rm lin}\big(u, v, \{u_i\}, u^*, v^*, \{v_i\}, z\big) \rangle.

As for the quadratic approximation (3.3), by the Monte Carlo corrections (3.23) and (3.25) we obtain an unbiased and more accurate evaluation of the cost functional,

(4.26)   J^{\rm MC}_{\rm quad}(z) = \widehat{Q}_{\rm quad} + \beta \widehat{V}^{\rm quad}_Q + P(z),

subject to the state problems (2.5) and (3.8), the adjoint problem (3.9), and the incremental state and adjoint problems (3.12) and (3.13). We form the Lagrangian, with adjoint variables (u^*, v^*, \{v_i\}, \{\lambda^*_j\}, \{\psi^*_j\}, \{\hat{u}^*_j\}, \{\hat{v}^*_j\}, \{\hat{u}^*_i\}, \{\hat{v}^*_i\}), as

(4.27)   \mathcal{L}^{\rm MC}_{\rm quad}\big(u, v, \{u_i\}, \{\lambda_j\}, \{\psi_j\}, \{\hat{u}_j\}, \{\hat{v}_j\}, \{\hat{u}_i\}, \{\hat{v}_i\}, u^*, v^*, \{v_i\}, \{\lambda^*_j\}, \{\psi^*_j\}, \{\hat{u}^*_j\}, \{\hat{v}^*_j\}, \{\hat{u}^*_i\}, \{\hat{v}^*_i\}, z\big)
         = J^{\rm MC}_{\rm quad} + \langle v^*, \partial_v \bar{r} \rangle + \langle u^*, \partial_u \bar{r} + \partial_u \bar{Q} \rangle + \sum_{i=1}^{\widetilde{M}} r(u_i, v_i, m_i, z)
         + \sum_{j=1}^N \langle \psi^*_j, (Q_{mm}(\bar m) - \lambda_j \mathcal{C}^{-1}) \psi_j \rangle + \sum_{j=1}^N \lambda^*_j \big( \langle \psi_j, \mathcal{C}^{-1} \psi_j \rangle - 1 \big)
         + \sum_{j=1}^N \langle \hat{v}^*_j, \partial_{vu} \bar{r}\, \hat{u}_j + \partial_{vm} \bar{r}\, \psi_j \rangle + \sum_{j=1}^N \langle \hat{u}^*_j, \partial_{uv} \bar{r}\, \hat{v}_j + \partial_{uu} \bar{r}\, \hat{u}_j + \partial_{uu} \bar{Q}\, \hat{u}_j + \partial_{um} \bar{r}\, \psi_j \rangle
         + \sum_{i=1}^{\widetilde{M}} \langle \hat{v}^*_i, \partial_{vu} \bar{r}\, \hat{u}_i + \partial_{vm} \bar{r}\, (m_i - \bar m) \rangle + \sum_{i=1}^{\widetilde{M}} \langle \hat{u}^*_i, \partial_{uv} \bar{r}\, \hat{v}_i + \partial_{uu} \bar{r}\, \hat{u}_i + \partial_{uu} \bar{Q}\, \hat{u}_i + \partial_{um} \bar{r}\, (m_i - \bar m) \rangle.

By setting the variations with respect to v^*, u^*, v_i, \hat{v}^*_j, \hat{u}^*_j, \hat{v}^*_i, \hat{u}^*_i to zero, we obtain the states u, v, u_i, \hat{u}_j, \hat{v}_j, \hat{u}_i, \hat{v}_i by solving 1 + \widetilde{M} (nonlinear) state problems and 1 + 2N + 2\widetilde{M} linear problems, i = 1, \dots, \widetilde{M}, j = 1, \dots, N. For the eigenpairs, 4(N + p) linear problems need to be solved.
By setting the variations with respect to \lambda_j, \hat{v}_j, and \hat{u}_j to zero, we obtain expressions analogous to (4.15), (4.17), and (4.19) for the adjoint variables \psi^*_j, \hat{u}^*_j, and \hat{v}^*_j, where the constant (1 + 2\beta\lambda_j)/2 is replaced by (1 + 2\beta\lambda_j)/2 - \beta c, with c given by

(4.28)   c = \frac{1}{\widetilde{M}} \sum_{i=1}^{\widetilde{M}} \Big( Q(m_i) - Q(\bar m) - \langle m_i - \bar m,\, Q_m(\bar m) \rangle - \frac{1}{2} \langle m_i - \bar m,\, Q_{mm}(\bar m)(m_i - \bar m) \rangle \Big).

By setting the variations with respect to \hat{v}_i, \hat{u}_i, u_i, v, and u to zero, we obtain the adjoints \hat{u}^*_i, \hat{v}^*_i, v_i, u^*, v^* by solving 2 + 3\widetilde{M} linear problems, which are defined similarly to those for the quadratic approximation but with additional correction terms in both the cost functional and the constraints. In total, 1 + \widetilde{M} (nonlinear) state problems and 3 + 6N + 4p + 5\widetilde{M} linear problems are required to evaluate the z-gradient

(4.29)   \langle \tilde{z}, D_z J^{\rm MC}_{\rm quad}(z) \rangle = \langle \tilde{z}, \partial_z \mathcal{L}^{\rm MC}_{\rm quad}(\text{states}, \text{adjoints}, z) \rangle,

where "states" and "adjoints" are shorthand for the state and adjoint variables.

4.4. z-gradient-based optimization method. To solve the PDE-constrained optimal control problem we use a gradient-based optimization method, since it only requires the ability to evaluate the cost functional (2.7) and its z-gradient; in particular, we use a BFGS method [62]. The five approximations described in this section differ in accuracy and computational cost. Table 1 summarizes the cost, measured in numbers of (nonlinear) state and linear PDE solves, of evaluating the cost functional and its z-gradient for each approximation. The linear approximation (4.6) is the cheapest to compute but also the crudest, while the MC-corrected approximations (4.23) and (4.26) are the most accurate but also the most expensive. The quadratic approximation (4.11) is, in many applications, more accurate than the linear approximation and, for highly nonlinear problems, much cheaper to compute than the MC-corrected approximations.
In fact, all the linear PDE solves in the trace approximation share the same left-hand-side operator \partial_{vu} \bar{r} (or its adjoint \partial_{uv} \bar{r}), and therefore one can amortize the cost of computing an expensive preconditioner or sparse factorization over several solutions of the linearized PDE. It is worth noting that these different approximations can be exploited in a multifidelity framework [63], in which the optimal solution of a cheaper approximation is used as the initial guess for a more accurate (and more expensive) approximation. For example, in the numerical results section the optimal solutions for the MC-corrected formulations are obtained by first solving the linear approximation (4.6), then the quadratic approximation (4.11), and finally (4.23) or (4.26).
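Adjoint-based z-gradients of the kind derived above are easy to get subtly wrong, so it is standard practice to validate them against finite differences on a small model problem. The sketch below does this for a linear "state equation" A u = B z with a quadratic objective; all matrices, names, and the objective are illustrative stand-ins, not the paper's PDE operators.

```python
import numpy as np

# Adjoint computation of the z-gradient for
#   J(z) = 0.5*||u(z) - u_d||^2 + 0.5*beta*||z||^2   subject to   A u = B z,
# checked against a central finite-difference quotient.
rng = np.random.default_rng(3)
n, nz, beta = 8, 3, 0.1
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # stand-in state operator
B = rng.standard_normal((n, nz))                    # stand-in control operator
u_d = rng.standard_normal(n)                        # target state

def cost(z):
    u = np.linalg.solve(A, B @ z)                   # state solve
    return 0.5 * np.sum((u - u_d) ** 2) + 0.5 * beta * (z @ z)

def z_gradient(z):
    u = np.linalg.solve(A, B @ z)                   # state solve
    v = np.linalg.solve(A.T, -(u - u_d))            # adjoint solve
    return beta * z - B.T @ v                       # partial_z of the Lagrangian

z0 = rng.standard_normal(nz)
g_adj = z_gradient(z0)
eps = 1e-6
g_fd = np.array([(cost(z0 + eps * e) - cost(z0 - eps * e)) / (2 * eps)
                 for e in np.eye(nz)])
```

A single adjoint solve delivers the full gradient regardless of the control dimension, which is what makes the Lagrangian approach scale.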
Table 1
The computational cost for the evaluation of the cost functional J and its z-gradient D_z J, in terms of the number of PDE solves, for the five approximations of E[Q] and Var[Q]. N is the number of samples/bases for the trace estimator; p is the oversampling factor in Algorithm 2; M is the number of samples used in the sample average approximation or the Monte Carlo correction.

Computational cost      MC   Linear   Quadratic   Linear+MC   Quadratic+MC
J (state PDE)           M    1        1           1+M         1+M
J (linear PDE)          0    1        1+4N+4p     1           1+4N+4p+2M
D_z J (linear PDE)      M    2        2+2N        2+M         2+2N+3M
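The remark above about sharing the same left-hand side across all incremental solves can be made concrete: one factorization serves every right-hand side. The dense matrix below is a random stand-in for the discretized linearized forward operator; in the PDE setting, sparse factorizations or preconditioners play the same role.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# All incremental solves share one left-hand-side operator:
# factor once, back-substitute many times.
rng = np.random.default_rng(7)
n, nrhs = 200, 50
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))  # stand-in linearized operator
lu, piv = lu_factor(A)                  # one O(n^3) factorization
RHS = rng.standard_normal((n, nrhs))    # e.g. Hessian actions on random directions
X = lu_solve((lu, piv), RHS)            # nrhs cheap O(n^2) back-substitutions
```

The amortized cost per solve is quadratic rather than cubic in the problem size, which is why the 4(N + p) Hessian-action solves in Table 1 are affordable.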
5. Numerical experiments.
We consider two examples of PDE-constrained optimal control under uncertainty in which the uncertain parameter is the discretization of an infinite-dimensional random field. These examples aim to demonstrate the scalability, computational efficiency, and accuracy of the Taylor-based approximations and of their MC-corrected counterparts. In the first example we consider fluid flow in a porous medium: the control is the fluid injection rate into the subsurface at specific well locations, the uncertain parameter is the permeability field (modeled as a log-normal random field), and the objective is to drive the pressures measured at the production wells to given target values. In the second example we seek an optimal velocity profile on the inlet boundary of a turbulent jet flow such that the jet width in a given cross-section is maximized; the governing state equations are the Reynolds-averaged Navier–Stokes (RANS) equations with an algebraic closure model, and an ad hoc nonlinear advection-diffusion equation with uncertain coefficients is introduced to model the intermittency of turbulence at the edges of the jet. The numerical results presented in this section are obtained using hIPPYlib [76], an extensible software framework for large-scale PDE-based deterministic and Bayesian inverse problems. hIPPYlib builds on
FEniCS [57] (a parallel finite element library) for the discretization of the PDEs and PETSc [7] for scalable and efficient linear algebra operations and solvers. Finally, we use the implementation of the L-BFGS algorithm in the Python library SciPy, specifically the fmin_l_bfgs_b routine (limited-memory BFGS with box constraints), to solve the optimization problem.
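The SciPy call has the following shape; the quadratic objective and the bounds below are placeholders for the actual J(z) and admissible set Z, meant only to show the interface (a callable returning the cost and its z-gradient, plus box constraints).

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Minimal usage sketch of fmin_l_bfgs_b with an analytic gradient.
target = np.array([0.5, 3.0])          # hypothetical unconstrained minimizer

def J_and_grad(z):
    r = z - target
    return 0.5 * (r @ r), r            # returns J(z) and D_z J(z)

z0 = np.zeros(2)
z_opt, J_opt, info = fmin_l_bfgs_b(J_and_grad, z0, bounds=[(0.0, 2.0)] * 2)
# The second component is clipped to its upper bound: z_opt is near (0.5, 2.0).
```

The routine returns the minimizer, the final cost, and a dictionary with convergence diagnostics (e.g. the `warnflag` field).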
We consider an optimal source control problem (originally presented in [3]) motivated by flow in a subsurface porous medium, where the state problem is the linear elliptic equation

(5.1)   -\nabla \cdot (e^m \nabla u) = F_z \quad \text{in } D,
        u = g \quad \text{on } \Gamma_D,
        e^m \nabla u \cdot n = 0 \quad \text{on } \Gamma_N,

where D \subset \mathbb{R}^d (d = 2, 3) is an open and bounded physical domain with Lipschitz boundary \partial D, and \Gamma_D, \Gamma_N \subset \partial D (with \Gamma_D \cup \Gamma_N = \partial D and \Gamma_D \cap \Gamma_N = \emptyset) denote the Dirichlet and Neumann portions of \partial D, respectively. The state u represents the pressure; m is an uncertain parameter representing the logarithm of the permeability field, which is assumed to be Gaussian, N(\bar m, \mathcal{C}), with mean \bar m and covariance \mathcal{C}; and F is a map from the control z to the source term F_z, defined as

(5.2)   F_z = \sum_{i=1}^{n_c} z_i f_i,

where f_i, i = 1, \dots, n_c, are mollified Dirac functions located at the n_c injection wells, and z = (z_1, \dots, z_{n_c})^\top \in \mathbb{R}^{n_c} is a control that represents the fluid injection rates. The control z takes admissible values in Z = [z_{\min}, z_{\max}]^{n_c}, where z_{\min} and z_{\max} denote the minimum and maximum injection rates. Let R_g denote a lifting of the boundary data g such that R_g \in H^1(D) and R_g|_{\Gamma_D} = g. The weak formulation of problem (5.1) reads: given m \in \mathcal{M} = L^2(D) and z \in Z, find \mathring{u} = u - R_g \in \mathcal{U} := \{v \in H^1(D) : v|_{\Gamma_D} = 0\} such that

(5.3)   r(\mathring{u}, v, m, z) = 0, \quad \forall v \in \mathcal{V},

where \mathcal{V} = \mathcal{U} and

(5.4)   r(\mathring{u}, v, m, z) = \int_D e^m \nabla(\mathring{u} + R_g) \cdot \nabla v \, dx - \int_D F_z v \, dx.
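A one-dimensional finite-difference analogue of the state equation (5.1) shows how the log-permeability m enters the solve. This is a toy discretization for illustration only; the experiments in the text use a 2-D finite element discretization instead.

```python
import numpy as np

# Toy 1-D analogue of (5.1): -(e^m u')' = f on (0,1), u(0) = u(1) = 0,
# centered finite differences with face-averaged coefficients.
def solve_state(m, f):
    """m and f are sampled at the n interior nodes of a uniform grid."""
    n = m.size
    h = 1.0 / (n + 1)
    k = np.exp(m)                          # permeability e^m at the nodes
    kf = np.empty(n + 1)                   # coefficients at the cell faces
    kf[1:-1] = 0.5 * (k[:-1] + k[1:])
    kf[0], kf[-1] = k[0], k[-1]
    A = (np.diag(kf[:-1] + kf[1:])
         - np.diag(kf[1:-1], 1) - np.diag(kf[1:-1], -1)) / h**2
    return np.linalg.solve(A, f)

# Manufactured-solution check with m = 0: u(x) = sin(pi x).
n = 200
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
u = solve_state(np.zeros(n), np.pi**2 * np.sin(np.pi * x))
```

With m = 0 the equation reduces to -u'' = f, so the computed u should match sin(pi x) up to the O(h^2) discretization error.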
The control objective measures the difference between the pressure (computed by solving the state problem (5.3)) and a target pressure distribution \bar u = (\bar u_1, \dots, \bar u_{n_p}) at the production wells x_1, \dots, x_{n_p} \in D. Specifically,

(5.5)   Q(\mathring{u}) = \| \mathcal{B} \mathring{u} - \bar u \|^2,

where \mathcal{B} : \mathcal{U} \to \mathbb{R}^{n_p} is a pointwise observation operator at the locations x_1, \dots, x_{n_p}. Here, we set D = (0, 2) \times (0, 1), \Gamma_D = \{0, 2\} \times [0, 1], and \Gamma_N = (0, 2) \times \{0, 1\}, and we impose the Dirichlet data g = 2 - x_1. The mean field \bar m is shown in the top part of Fig. 1, together with the locations of the 20 injection wells and the 12 production wells. For the covariance \mathcal{C} defined in (2.3), we specify \Theta = [1, 0; 0, \,\cdot\,], \alpha_1 = 2, \alpha_2 = 0.1, and \alpha_3 = 20. Three random samples of the Gaussian measure N(\bar m, \mathcal{C}) are shown in the bottom part of Fig. 1. In the cost functional (2.7), we set \beta = 1 and use the control cost P(z) = 10^{-\cdot} \|z\|^2. We use a finite element discretization on a structured uniform triangular mesh with 65 \times 33 vertices, with piecewise linear elements for the discrete approximation of the state u, the adjoint v, and the uncertain parameter m, which results in 2145 dimensions for m. We specify the desired pressure \bar u_i = 3 - x_{i,1} - \cdots at the production wells x_i = (x_{i,1}, x_{i,2}), and set z_{\min} = 0 and z_{\max} = 32 for the control, i.e., z \in [0, 32]^{n_c}.
Fig. 1. Top: mean \bar m of the Gaussian uncertain parameter m \sim N(\bar m, \mathcal{C}). The red diamonds represent the locations of the injection wells, and the black squares the locations of the production wells. Bottom: random samples drawn from the Gaussian measure N(\bar m, \mathcal{C}).

We consider the five different approximations of the cost functional presented in Sec. 3 and use the limited-memory BFGS method with box constraints to solve the optimization problem, where the z-gradient of the cost functional is computed as in Sec. 4. We chose z_0 = (16, \dots,
16) as the initial guess for the control. We used M = 100 random samples from N(\bar m, \mathcal{C}) to estimate the mean and variance of the control objective, both in the sample average approximation presented in Sec. 2.4 and in the variance reduction of Sec. 3.4. For the computation of the trace appearing in the quadratic approximations (3.5), (3.23), and (3.25), we use the trace estimator \widehat{T}_2(\cdot) in (3.17), obtained by solving the generalized eigenvalue problem (3.18) with Algorithm 2, with k = 100 and p = 10. We also test the Gaussian trace estimator \widehat{T}_1(\cdot) in (3.15). In Fig. 2 we plot the errors of the two trace approximations, denoted error_1 and error_2, against the reference value \sum_{j=1}^k \lambda_j with k = 140, at both the initial control z = z_0 and the optimal control z = z^{\rm MC}_{\rm quad} obtained by the quadratic approximation with Monte Carlo correction. We observe that in both cases the error of \widehat{T}_2(H) decays much faster than that of \widehat{T}_1(H), as a result of the fast decay of the eigenvalues; at the optimal control, in particular, more than two orders of magnitude improvement in accuracy is observed for N = 100. We remark that the slow decay of the error of \widehat{T}_1(\cdot) may demand a very large N if relatively high accuracy of the trace estimator is required, which leads to a large number of PDE solves at each step of the optimization procedure, thus diminishing the computational advantage of the Taylor approximation. In the following tests, we use \widehat{T}_2(\cdot) with N = 100. Fig.
3 shows the relative errors at 100 random samples of the uncertain parameter m for the linear and quadratic approximations of Q in (5.5), used for the computation of E[Q], as well as of q = (Q - \bar Q)^2 in (3.22) and (3.25), used for the computation of Var[Q]. We note that, for both quantities, the quadratic approximation is more accurate than the linear approximation for most of the realizations.

Since the quadratic approximation is very close and strongly correlated to the true value, we may expect a large variance reduction when using it as a control variate. In fact, the correlation coefficient between the quadratic approximation and the true value is very close to one: 0.996 for Q and 0.998 for q, evaluated at the 100 samples. On the contrary, the linear approximation is poorly correlated to the true value (with correlation coefficients -0.211 for Q and 0.718 for q), thus leading to negligible variance reduction.

Fig. 2. The decay of the generalized eigenvalues (\lambda^+ for the positive and \lambda^- for the negative eigenvalues) and the errors, denoted error_1 and error_2, of the trace estimators \widehat{T}_1 and \widehat{T}_2 with respect to the number N in (3.15) and (3.17). Left: at the initial control z = z_0; right: at the optimal control z = z^{\rm MC}_{\rm quad} obtained by the quadratic approximation with Monte Carlo correction.

Fig. 3. Relative errors at 100 random samples of the uncertain parameter for the linear and quadratic approximations of Q, used in the computation of the mean E[Q] (left), and of q = (Q - \bar Q)^2, used for the variance Var[Q] (right). The errors are shown at the optimal control z = z^{\rm MC}_{\rm quad} obtained by the quadratic approximation with Monte Carlo correction.
In Tables 2 and 3 we report the effect of the variance reduction by the linear and quadratic approximations for the computation of the mean E[Q] and the variance Var[Q], respectively. This reduction is measured by the mean-square error (MSE), which is defined (e.g., for Q) as

(5.6)   \mathrm{MSE}(Q) = \frac{\mathrm{Var}_p[Q]}{M},

where \mathrm{Var}_p[Q] is the population variance of Q at the M random samples. In all cases, the quadratic approximation achieves a more significant variance reduction than the linear approximation. This means that the number of samples needed to achieve a target MSE is significantly smaller for the quadratic approximation than for the linear one. For example, at the initial control z_0, the quadratic approximation results in a speed-up (defined as the ratio between the MSE of the Monte Carlo estimator and the MSE of the correction) of order 1000X and 100X in the estimation of E[Q] and Var[Q], respectively. On the other hand, the linear
Table 2
Variance reduction by the linear and quadratic approximations at the initial control z_0 and the optimal controls z^{MC}_{lin} and z^{MC}_{quad}. The mean-square error (MSE), with different numbers M of random samples, for the evaluation of the mean E[Q] is reported.

control        M    E_MC[Q]    MSE(Q)    MSE(Q - Q_lin)   MSE(Q - Q_quad)
z_0            10   2.42e+01   1.10e+00  7.94e-03         1.52e-04
               100  2.54e+01   2.54e-01  4.16e-03         7.40e-05
z^{MC}_{lin}   10   3.01e-01   1.10e-03  1.31e-03         1.69e-05
               100  2.89e-01   9.68e-05  7.49e-05         9.00e-07
z^{MC}_{quad}  10   3.16e-01   1.11e-03  8.02e-04         1.05e-05
               100  2.98e-01   1.10e-04  9.98e-05         1.12e-06
Table 3
Variance reduction by the linear and quadratic approximations at the initial control z_0 and the optimal controls z^{MC}_{lin} and z^{MC}_{quad}. The MSE for the evaluation of the variance Var[Q] is reported, where q = (Q - \bar Q)^2, q_lin = (Q_lin - \bar Q)^2, and q_quad = (Q_quad - \bar Q)^2.

control        M    E_MC[q]    MSE(q)    MSE(q - q_lin)   MSE(q - q_quad)
z_0            10   1.32e+01   1.71e+01  9.38e-01         6.05e-03
               100  2.54e+01   1.35e+01  2.61e+00         3.09e-02
z^{MC}_{lin}   10   1.99e-02   1.00e-04  9.35e-05         3.92e-06
               100  1.65e-02   1.32e-05  1.22e-05         2.44e-07
z^{MC}_{quad}  10   2.38e-02   1.82e-04  1.79e-04         8.07e-07
               100  1.99e-02   1.66e-05  1.65e-05         3.81e-07

approximation provides a speed-up of only order 100X and 10X, respectively. In addition, at the optimal controls z^{MC}_{lin} and z^{MC}_{quad}, the quadratic approximation still achieves a considerable speed-up.

We consider a turbulent jet flow modeled by the Reynolds-averaged Navier–Stokes equations coupled with a nonlinear stochastic advection-diffusion equation that serves as a notional model of uncertainty in the turbulent viscosity \nu_t. Specifically, the governing equations read

(5.7)   -\nabla \cdot \big( (\nu + \nu_t)(\nabla u + \nabla u^\top) \big) + (u \cdot \nabla) u + \nabla p = 0 \quad \text{in } D,
        \nabla \cdot u = 0 \quad \text{in } D,
        -\nabla \cdot \big( (\nu + (\gamma + e^m) \nu_{t,0}) \nabla \gamma \big) + u \cdot \nabla \gamma - \frac{u \cdot e_1}{x_1 + b} \gamma = 0 \quad \text{in } D,
10) is the computational domain (the symmetric upper half ofthe physical domain for the jet flow) as shown in the left part of Fig. 4. e = (1 , τ and n are the unit tangential vector and unit normal vector pointing outside thedomain D along the boundary ∂D , σ n ( u ) = ( ν + ν t ) (cid:0) ∇ u + ∇ u ⊤ (cid:1) · n is the tractionvector, and σ γn ( γ ) = ( ν +( γ + e m ) ν t, ) ∇ γ · n is the normal flux of γ . The control portionof the inflow boundary is denoted by the indicator function χ W with χ W ( x ) = 1 if x ∈ [0 , W ] and χ W ( x ) = 0 if x ∈ ( W, W = 1 .
5. The state variables u = ( u , u ) and p are the time-averaged velocity and pressure, and ν t = γν t, is theturbulent viscosity, where ν t, represents an algebraic closure model and γ is theindicator function that is close to one near the centerline Γ C and zero outside the jetregion. Based on a dimensional analysis of the turbulent planar jet, the turbulentviscosity along the jet centerline is given by(5.9) ν t, = C √ M ( x + aW ) / , where M = R Γ I || u || ds and W are momentum and jet width at the inlet boundary, C and a are two adimensional parameters that need to be calibrated against eitherexperimental or direct Navier-Stokes simulation (DNS) data. The indicator func-tion γ satisfies the nonlinear stochastic advection-diffusion equation in (5.7) with theboundary data(5.10) γ = 0 . − . (cid:18) (cid:18) − x (cid:19) ( x − − . x ) (cid:19) . The uncertain parameter m is a Gaussian random field with distribution N ( ¯ m, C ), forwhich we set the mean ¯ m as a constant field and the covariance as in (2.3) with α = 2, α = α = 0 .
5, and Θ = [15 ,
0; 0 , N (0 , C ) are shownin Fig. 5. In general, the parameters ( C, a, ¯ m, b ) are also random variables whosedistributions can be inferred from experimental or DNS data. Here, for simplicity,we consider only the uncertainty in m and take deterministic values for ( C, a, ¯ m, b ) =(0 . , . , − . , . The function φ ( z ) in (5.8) is the horizontal component of the velocity profile onthe inlet boundary Γ I , which is represented by B-splines as (we use t for x here)(5.11) φ ( z )( t ) = K X k =1 z k B k,n ( t ) , where the coefficient vector z := ( z , . . . , z K ) ∈ [0 , K is the control and B k,n ( t ) areB-spline basis functions recursively defined as(5.12) B k,n ( t ) = t − t k t k + n − t k B k,n − ( t ) + t k + n +1 − tt k + n +1 − t k +1 B k +1 ,n − ( t ) , These values are the maximum a posteriori estimates obtained by solving a Bayesian calibrationproblem where the misfit functional is the L distance between the velocity of (5.7) and the DNSdata provided in [48]. Taylor approximation and variance reduction for optimal control under uncertainty φ ( z ) initial controlB-spline knots Fig. 4 . Left: sketch of the physical domain of the turbulent jet flow, with inlet boundary Γ I ,outlet boundary Γ O , top and bottom wall Γ W , center axis Γ C , and the cross-section Γ . The compu-tational domain D is the symmetric top half of the physical domain. Right: B-spline interpolationof the inlet velocity profile used as an initial guess, and the distribution of the B-spline knots. Fig. 5 . Two random samples drawn from the Gaussian distribution N (0 , C ) with C given in (2.3) , where α = 2 , α = α = 0 . , Θ = [15 ,
0; 0 , . where(5.13) B k, ( t ) = (cid:26) t k ≤ t < t k +1 ;0 otherwise . Here we take K = 11 and use the B-spline of order n = 3 with 15 knots (4 repetitionsat each endpoint), as depicted on the right of Fig. 4. A B-spline representation of theinflow velocity profile used for the initial control is also shown.The control objective is the jet width Q along the cross-section Γ at x = 20shown in Fig. 4, and is defined as(5.14) Q ( u ) = 1 δ ( u ) Z Γ u · e dx , where δ ( u ) = u ( x ) · e with x = (20 , Q , i.e., to minimize E [ − Q ] subject to the state problem (5.7)and the constraint that the inlet momentum M is prescribed. Specifically, we considerthe cost functional(5.15) J ( z ) = E [ − Q ] + β Var[ Q ] + P ( z ) . Here we set β = 10 to balance the mean and variance, and define the penalty term P ( z ) as(5.16) P ( z ) = β (cid:18)Z Γ I ( φ ( z )) dx − M (cid:19) + β Z Γ I |∇ φ ( z ) | dx, . Chen, U. Villa, and O. Ghattas β = 10 ),and the second term controls the regularity of the inlet velocity ( β = 1).The weak formulation of the turbulence model is given by: Given the realization m ∈ M = L ( D ) and control z ∈ Z = R K , find u ∈ U such that(5.17) r ( u, v, m, z ) = 0 , ∀ v ∈ V , where we denote the state with u = (˚ u , p, ˚ γ ) ∈ U and adjoint with v = ( v , q, η ) ∈V . The space for the state and adjoint variables are U = V = U × P × V , where U = { v ∈ ( H ( D )) : v · n = 0 on Γ C and v · τ = 0 on Γ O ∪ Γ W } , P = L ( D ), and V = { γ ∈ H ( D ) : γ | Γ W ∪ Γ I = 0 } . Let R be the lifting of the Dirichlet boundarydata for velocity on Γ C ∪ Γ O ∪ Γ W , we have u = ˚ u + R . Let R γ be the lifting of theboundary data γ on Γ I ∪ Γ W , we have γ = ˚ γ + R γ .Specifically, the weak form in (5.17) consists of three terms and reads(5.18) r ( u, v, m, z ) = Model( u, v, m ) + Stablization( u, v ) + Nitsche( u, v, m, z ) . 
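Before turning to the three terms of the weak form, note that the Cox–de Boor recursion (5.12)–(5.13) used for the control parametrization is straightforward to prototype. The sketch below reproduces the clamped-knot setup described above (K = 11 basis functions of order n = 3 on 15 knots with 4-fold endpoints); the knot positions are a uniform guess, since the actual knot distribution is only shown in Fig. 4.

```python
import numpy as np

# B-spline basis via the recursion (5.12)-(5.13) and the inlet profile
# phi(z)(t) = sum_k z_k B_{k,n}(t) from (5.11).
def bspline_basis(k, n, t, knots):
    if n == 0:
        return np.where((knots[k] <= t) & (t < knots[k + 1]), 1.0, 0.0)
    out = np.zeros_like(t, dtype=float)
    d1 = knots[k + n] - knots[k]
    if d1 > 0:                             # skip 0/0 terms at repeated knots
        out += (t - knots[k]) / d1 * bspline_basis(k, n - 1, t, knots)
    d2 = knots[k + n + 1] - knots[k + 1]
    if d2 > 0:
        out += (knots[k + n + 1] - t) / d2 * bspline_basis(k + 1, n - 1, t, knots)
    return out

def phi(z, n, t, knots):
    return sum(z[k] * bspline_basis(k, n, t, knots) for k in range(len(z)))

# Clamped knot vector: K + n + 1 = 15 knots, endpoints repeated 4 times.
K, order = 11, 3
knots = np.concatenate([np.zeros(3), np.linspace(0.0, 1.0, 9), np.ones(3)])
```

A quick sanity check is the partition-of-unity property: with all coefficients z_k = 1, the profile phi is identically one on the parameter interval.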
The first term represents the PDE model in weak form:

(5.19)  Model(u, v, m) = \int_D 2(\nu + \nu_t) S(u) \cdot S(v) \, dx + \int_D [(u \cdot \nabla) u] \cdot v \, dx - \int_D p \, \nabla \cdot v \, dx + \int_D q \, \nabla \cdot u \, dx + \int_D (\nu + (\gamma + e^m) \nu_{t,0}) \nabla \gamma \cdot \nabla \eta \, dx + \int_D [u \cdot \nabla \gamma] \eta \, dx - \int_D \frac{u \cdot e_1}{x_1 + b} \gamma \eta \, dx,

where S(u) = (\nabla u + \nabla u^\top)/2 is the strain-rate tensor. The second term collects the stabilization contributions,

(5.20)  Stabilization(u, v) = \int_D \tau_1 L(u) \cdot D_u L(u)(v) \, dx + \int_D \tau_2 (\nabla \cdot u)(\nabla \cdot v) \, dx + \int_D \tau_3 (u \cdot \nabla \gamma)(u \cdot \nabla \eta) \, dx,

where L(u) represents the strong form of the residual of the momentum equation (line 1) of (5.7), and \tau_1, \tau_2, and \tau_3 are properly chosen stabilization parameters associated with the local Peclet number.

The third term of (5.17) represents the weak imposition of the inlet velocity profile (i.e., the control) by Nitsche's method [9] and reads

(5.21)  Nitsche(u, v, m, z) = C_d \int_{\Gamma_I} h^{-1} (\nu + \nu_t)(u \cdot n + \chi_W \phi(z))(v \cdot n) \, ds - \int_{\Gamma_I} (\sigma_n(u) \cdot n)(v \cdot n) + (\sigma_n(v) \cdot n)(u \cdot n + \chi_W \phi(z)) \, ds,

where the first term guarantees the coercivity of the form and the second term arises from integration by parts. The constant C_d is taken as C_d = 10, and h is the local element edge length along the Dirichlet boundaries. This weak imposition of the inlet velocity is necessary to reveal the boundary control in the optimization formulation.

We discretize the problem using finite elements with piecewise polynomials (P2, P1, P1) for the state u and P1 for the uncertain parameter m. We triangulate the computational domain using an anisotropic mesh, locally refined in the jet region, with 14,400 triangular elements (60 elements along the x-direction and 120 along the y-direction). After discretization, we obtain an optimization problem in which the dimensions of the control, state, and uncertain parameter are 11, 73,084, and 7,381, respectively.
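The Nitsche terms in (5.21) have a simple one-dimensional analogue. The sketch below imposes u(0) = g0 weakly for a 1D Poisson problem with P1 elements; it is a toy analogue of (5.21) with invented data f, g0 and penalty C, not the paper's solver:

```python
import numpy as np

n = 200                          # elements on (0, 1)
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
g0, C = 0.7, 10.0                # Dirichlet datum at x = 0, penalty constant

# P1 stiffness matrix for -u'' = f
A = np.zeros((n + 1, n + 1))
for e in range(n):
    A[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h

f = np.pi ** 2 * np.sin(np.pi * x)   # manufactured so u = g0*(1-x) + sin(pi*x)
L = h * f
L[0] *= 0.5
L[-1] *= 0.5                          # lumped load vector

# symmetric Nitsche terms at x = 0 (outward normal n = -1, so du/dn = -u'(0)):
#   a(u,v) += u'(0) v(0) + v'(0) u(0) + (C/h) u(0) v(0)
#   L(v)   += v'(0) g0 + (C/h) g0 v(0)
A[0, 0] += -2.0 / h + C / h
A[0, 1] += 1.0 / h
A[1, 0] += 1.0 / h
L[0] += (C - 1.0) * g0 / h
L[1] += g0 / h

# strong Dirichlet condition u(1) = 0 at the right endpoint
A[-1, :] = 0.0
A[-1, -1] = 1.0
L[-1] = 0.0

u = np.linalg.solve(A, L)
u_exact = g0 * (1.0 - x) + np.sin(np.pi * x)
```

The penalty term plays the role of the coercivity term in (5.21), while the consistency terms (the discrete u'(0), v'(0) contributions) keep the method exact for the manufactured solution; the boundary datum enters only through the right-hand side, which is what exposes it as a control.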
As in the previous example, we use a BFGS method with bound constraints to solve the optimization problem (2.10) with the cost functional defined in (5.15) and the constraint defined by the state problem (5.17). At each BFGS iteration, we solve the fully coupled nonlinear state problem using Newton's method with an LU factorization of the Jacobian operator. To compute all the derivatives involved in the z-gradient presented in Sec. 4 we exploit the symbolic differentiation capabilities of FEniCS [57].

For the computation of the trace in the quadratic approximation we use both the Gaussian trace estimator in (3.15) and the trace estimator in (3.17) obtained by solving the generalized eigenvalue problem (3.18). Fig. 6 displays the decay of the eigenvalues as well as the approximation errors of the two estimators, for both the initial control z_0 and the optimal control z^{MC}_{quad}. The rapid decay of the eigenvalues of the covariance-preconditioned Hessian (four orders of magnitude reduction over the first 100 of the 7,381 eigenvalues) reflects the intrinsic low-dimensionality of the map from the uncertain parameter field m to the control objective Q, despite the complexity and nonlinearity of the state equations. The reference value for the trace tr(H) was computed as the sum of the first 140 dominant eigenvalues of H using Algorithm 2 with k = 140 and p = 10. It is evident that the second trace estimator achieves much faster error decay than the Gaussian trace estimator, with a gain in accuracy of about three orders of magnitude when using N = 100 samples/eigenvalues. In the numerical tests we use the second trace estimator with N = 20, which already provides sufficiently accurate results. It is worth noting that, in contrast, the Gaussian trace estimator would have required orders of magnitude more samples to achieve the same accuracy.

Fig. 6. Decay of the generalized eigenvalues (\lambda^+ for positive and \lambda^- for negative eigenvalues) and the errors of the two trace estimators in (3.15) and (3.17) with respect to the number N. Left: the initial control z = z_0; right: the optimal control z = z^{MC}_{quad} obtained by quadratic approximation with Monte Carlo correction.
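The qualitative behavior in Fig. 6 can be reproduced on a small synthetic problem: a Hutchinson-type Gaussian trace estimator versus summing the dominant eigenvalues recovered from a randomized range sketch [42]. The matrix, its spectral decay rate, and all sizes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 500
# synthetic symmetric matrix with fast spectral decay, mimicking the
# covariance-preconditioned Hessian (decay rate chosen for illustration)
lam = 10.0 ** (-0.2 * np.arange(d))
Qm, _ = np.linalg.qr(rng.standard_normal((d, d)))
H = (Qm * lam) @ Qm.T
true_trace = lam.sum()

N = 20
# (a) Gaussian (Hutchinson) estimator: average of w^T H w over random probes
W = rng.standard_normal((d, N))
t_gauss = np.mean(np.einsum("ij,ij->j", W, H @ W))

# (b) randomized eigenvalue estimator: sum the N dominant eigenvalues
# obtained from a randomized range sketch with oversampling p
p = 10
Y = H @ rng.standard_normal((d, N + p))
Qy, _ = np.linalg.qr(Y)
T = Qy.T @ H @ Qy
t_eig = np.sort(np.linalg.eigvalsh(T))[::-1][:N].sum()

err_gauss = abs(t_gauss - true_trace) / true_trace
err_eig = abs(t_eig - true_trace) / true_trace
```

With this decay the truncated-eigenvalue estimate is accurate to roughly the size of the neglected spectral tail, while the Hutchinson estimate with the same N carries an O(N^{-1/2}) statistical error, mirroring the gap observed in Fig. 6.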
Fig. 8 shows the relative difference between the true quantity of interest and its linear and quadratic approximations, evaluated at 10 random samples, for both Q and q = (Q - \bar{Q})^2. The quadratic approximation is clearly more accurate than the linear approximation for both quantities. Nevertheless, both approximations lead to a drastic reduction in the variance of the Monte Carlo correction with respect to the plain Monte Carlo estimator, as demonstrated by the MSE reported in Tables 4 and 5. This variance reduction translates to a greater than 100X speedup in the computation relative to the plain Monte Carlo estimator for the same accuracy (MSE).

Fig. 7. The velocity field of the turbulent jet flow obtained by solving the state model for the initial (left) and optimal (right) inlet velocity profiles. The optimal profile was obtained using the quadratic approximation with Monte Carlo correction.

Fig. 8. Relative errors, at 10 random samples, of the linear and quadratic approximations of Q (for the computation of the mean E[Q], left) and of q = (Q - \bar{Q})^2 (for the variance Var[Q], right). The errors are shown at the optimal control z = z^{MC}_{quad}.
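The mechanism behind Tables 4 and 5 is a control variate: the mean of the Taylor approximation is known analytically, and Monte Carlo is applied only to the small remainder. A self-contained sketch with an invented objective Q (its gradient g, Hessian H, and the sinusoidal remainder are illustrative, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50                                      # illustrative parameter dimension
g = rng.standard_normal(d) / np.sqrt(d)     # "gradient" of Q at m = 0
S = rng.standard_normal((d, d))
H = (S + S.T) / (2 * d)                     # "Hessian" of Q at m = 0

def Q(m):
    # quadratic part plus a small non-quadratic remainder
    return g @ m + 0.5 * m @ H @ m + 0.1 * np.sin(g @ m)

Q_lin = lambda m: g @ m                     # linear Taylor approximation
Q_quad = lambda m: g @ m + 0.5 * m @ H @ m  # quadratic Taylor approximation

N = 2000
M = rng.standard_normal((N, d))             # samples of m ~ N(0, I)
q = np.array([Q(m) for m in M])
r_lin = q - np.array([Q_lin(m) for m in M])    # remainder Q - Q_lin
r_quad = q - np.array([Q_quad(m) for m in M])  # remainder Q - Q_quad

# control variate estimator of E[Q]: analytic mean of Q_quad (= tr(H)/2 for
# m ~ N(0, I)) plus the Monte Carlo mean of the remainder
e_cv = 0.5 * np.trace(H) + r_quad.mean()

# the MSE of the corrected estimators is driven by the remainder variance,
# which sits orders of magnitude below Var[Q], as in Tables 4 and 5
var_ratio = q.var() / r_quad.var()
```

The same construction applies to the variance estimate by replacing Q with q = (Q - \bar{Q})^2; the quadratic remainder has smaller variance than the linear one, matching the ordering of the MSE columns in the tables.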
Table 4
Variance reduction by the linear and quadratic approximations at the optimal control z_lin for different uncertain parameter dimensions. The mean-square error (MSE), computed with 10 random samples, for the evaluation of the mean E[Q] is reported.

parameter dimension | E_MC[-Q] | MSE(Q) | MSE(Q - Q_lin) | MSE(Q - Q_quad)
1,891 | -1.71e+00 | 7.40e-06 | 2.68e-08 | 1.81e-09
7,381 | -1.59e+00 | 7.94e-06 | 1.57e-07 | 1.46e-08
29,161 | -1.44e+00 | 3.82e-06 | 7.23e-08 | 1.66e-08
115,921 | -1.42e+00 | 9.47e-06 | 6.91e-08 | 3.06e-08
462,241 | -1.41e+00 | 8.85e-06 | 9.00e-08 | 1.78e-08
Table 5
Variance reduction by the linear and quadratic approximations at the optimal control z_lin for different uncertain parameter dimensions. The MSE, computed with 10 test random samples, for the evaluation of the variance is reported, where q = (Q - \bar{Q})^2, q_lin = (Q_lin - \bar{Q})^2, q_quad = (Q_quad - \bar{Q})^2.

parameter dimension | E_MC[q] | MSE(q) | MSE(q - q_lin) | MSE(q - q_quad)
1,891 | 8.05e-05 | 9.37e-10 | 1.76e-11 | 8.77e-13
7,381 | 8.13e-05 | 1.15e-09 | 8.87e-12 | 1.48e-12
29,161 | 5.60e-05 | 6.59e-10 | 3.39e-11 | 4.04e-12
115,921 | 1.07e-04 | 1.82e-09 | 7.04e-11 | 2.08e-11
462,241 | 8.96e-05 | 1.37e-09 | 7.78e-12 | 1.33e-11

To investigate the scalability of the proposed algorithm, we consider up to six levels of uniform mesh refinement, starting from a 30 x 60 mesh; the dimension of the uncertain parameter m ranges from 1,891 on the coarsest mesh to 1,038,961 on the finest. For the linear and quadratic approximations, we solve the optimization problem using the BFGS algorithm, with convergence defined by an absolute tolerance on the \ell^\infty norm of the projected gradient. We also use a mesh continuation technique to define initial guesses for the control: on the coarsest mesh we use the velocity profile in Fig. 4 (right), and on the finer meshes we use the optimal control obtained on the previous (coarser) mesh. For the linear approximation with Monte Carlo correction, we use the solution of the linear approximation (at the same mesh resolution) as the initial guess.

The scalability of our approach with respect to the uncertain parameter dimension depends on three properties, namely the dimension-independent behavior of (i) the spectrum of the covariance-preconditioned Hessian, (ii) the convergence rate of BFGS, and (iii) the variance of the estimators for the Monte Carlo correction of the objective function. The top-left plot of Fig. 9 demonstrates (i): the eigenvalues of the covariance-preconditioned Hessian of the control objective (i.e., the generalized eigenvalues of problem (3.18)) at the optimal control obtained by linear approximation exhibit the same fast decay independent of the uncertain parameter dimension (ranging from a thousand to a million), thus indicating that approximating the trace by a randomized eigensolver is scalable with respect to the uncertain parameter dimension. The other three plots of Fig. 9 provide numerical evidence of the mesh-independent convergence of the BFGS algorithm, property (ii), for this particular problem.
In the case of the linear and quadratic approximations with mesh continuation, the norm of the z-gradient drops below the prescribed tolerance within 12 iterations for all mesh resolutions except the coarsest one (which represents a pre-asymptotic result); in the case of the linear approximation with Monte Carlo correction, BFGS converges within 19 iterations for all mesh resolutions when using the solution of the linear approximation as the initial guess. This implies that the gradient-based BFGS algorithm is also scalable with respect to the uncertain parameter dimension. Finally, Tables 4 and 5 indicate that the variance of the Monte Carlo correction, i.e., property (iii), using the Taylor approximations as control variates is independent of the uncertain parameter dimension and, furthermore, allows a greater than 100X speedup with respect to the plain Monte Carlo estimator to achieve the same accuracy (MSE). Combining the three dimension-independent properties, we conclude that the Taylor approximation and variance reduction stochastic optimization method is overall scalable with respect to the uncertain parameter dimension.

Fig. 9. Top-left: decay of the eigenvalues of the covariance-preconditioned Hessian (i.e., the generalized eigenvalues, in absolute value, of (3.18)) for different uncertain parameter dimensions (from 1,891 to 1,038,961) at the optimal control obtained by linear approximation. Remaining plots: decay of the norm of the z-gradient with respect to the number of BFGS iterations for the different parameter dimensions.
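The warm-starting effect of mesh continuation can be demonstrated on a toy bound-constrained problem, where projected gradient stands in for the bound-constrained BFGS and all sizes and data are invented:

```python
import numpy as np

def solve(n, z0, lr=0.5, tol=1e-8, maxit=10000):
    """Projected gradient for min_z 0.5||z - b||^2 s.t. 0 <= z <= 1,
    a stand-in for the bound-constrained solve on one mesh level."""
    b = np.sin(np.pi * np.linspace(0.0, 1.0, n))  # target control profile
    z = z0.copy()
    for it in range(1, maxit + 1):
        z_new = np.clip(z - lr * (z - b), 0.0, 1.0)
        if np.linalg.norm(z_new - z) < tol:       # projected-gradient stationarity
            return z_new, it
        z = z_new
    return z, maxit

n_coarse, n_fine = 17, 257
z_c, it_c = solve(n_coarse, np.zeros(n_coarse))
# mesh continuation: prolong the coarse optimizer to the fine grid by interpolation
z0_fine = np.interp(np.linspace(0, 1, n_fine), np.linspace(0, 1, n_coarse), z_c)
_, it_warm = solve(n_fine, z0_fine)
_, it_cold = solve(n_fine, np.zeros(n_fine))
```

Because the interpolated coarse optimizer already sits close to the fine-grid optimum, the warm-started iteration count is smaller than the cold-started one, which is the effect exploited across the mesh hierarchy in Fig. 9.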
6. Conclusions.
In this work we proposed a scalable computational framework for the solution of PDE-constrained optimal control under uncertainty, in which the cost functional involves the mean and variance of the control objective and the state equation is governed by a (nonlinear) PDE with a high-dimensional uncertain parameter. Specifically, we advocate the use of (low-order) Taylor expansions of the control objective with respect to the uncertain parameter, either to directly approximate the moments of the control objective or as control variates to reduce the variance of the Monte Carlo estimator. The mean and variance expressions of the (linear and quadratic) Taylor expansions involve the trace of the covariance-preconditioned Hessian of the control objective with respect to the uncertain parameter. Randomized algorithms for the solution of generalized eigenvalue problems [76, 70, 69] allow accurate and efficient approximation of this trace and require computing the action of the covariance-preconditioned Hessian only on a number of random directions that depends on its numerical rank. As shown in the numerical results, this approach is more efficient and accurate than the Gaussian trace estimator when the eigenvalues of the covariance-preconditioned Hessian exhibit fast decay, which holds when the control objective is sensitive to only a limited number of directions in the uncertain parameter space; this is often the case regardless of the complexity of the state problem. Moreover, when the (possibly biased) linear or quadratic Taylor approximation is not sufficiently accurate, we can use it as a control variate and correct the mean and variance by a Monte Carlo estimator of the remainder of the Taylor expansion, which was shown to achieve considerable computational cost reduction (by several orders of magnitude) compared to a plain Monte Carlo estimator.

We have demonstrated the scalability of our method with respect to the uncertain parameter dimension in three aspects: the decay of the eigenvalues of the covariance-preconditioned Hessian of the control objective, the variance of the remainder of the Taylor expansion, and the number of optimization iterations. All three properties depend not on the nominal dimension of the uncertain parameter but only on its intrinsic dimension, i.e., on the number of directions to which the control objective is sensitive. Moreover, by using the randomized eigensolver for the trace estimation, the Taylor approximation as a control variate for variance reduction, and a quasi-Newton method with a Lagrangian formalism for the optimization, we showed that the complexity, measured in number of PDE solves, scales linearly with respect to the intrinsic dimensionality while achieving a target (high) accuracy several orders of magnitude better than that of a plain Monte Carlo estimator. This was illustrated numerically by solving an optimal boundary control problem governed by a nonlinear PDE model of turbulent jet flow with an infinite-dimensional uncertain parameter field that characterizes the turbulent viscosity. Hence, the proposed computational framework for PDE-constrained optimal control under uncertainty is demonstrated to be scalable, accurate, and efficient.

Several extensions and further developments of the proposed computational framework are worth pursuing.
First, we intend to investigate higher-order Taylor expansions beyond the quadratic approximation when the latter becomes poor. Second, another research direction is to pursue more general risk measures than the mean-variance measure, such as VaR and conditional VaR [50, 73]. Third, further computational savings can be achieved in combination with model reduction for the state PDE and the linearized PDEs [77, 29]. Fourth, one can replace the Monte Carlo estimator in the variance reduction by a sparse quadrature [31] to possibly achieve faster convergence of the approximation error. Fifth, application of the Taylor approximation and variance reduction in a multifidelity framework [63] may lead to additional computational savings. Finally, one can apply the scalable computational framework to other challenging control and design problems, such as the optimal design of metamaterials under uncertainty [26, 24].
Acknowledgement.
This work was supported by DARPA contract W911NF-15-2-0121, NSF grants CBET-1508713 and ACI-1550593, and DOE grants DE-SC0010518 and DE-SC0009286. We thank Robert Moser, Todd Oliver, and Myoungkyu Lee for the construction of the notional turbulent jet flow optimal boundary control under uncertainty problem.
REFERENCES

[1] Alen Alexanderian, Noemi Petra, Georg Stadler, and Omar Ghattas. A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized \ell_0-sparsification. SIAM Journal on Scientific Computing, 36(5):A2122-A2148, 2014.
[2] Alen Alexanderian, Noemi Petra, Georg Stadler, and Omar Ghattas. A fast and scalable method for A-optimal design of experiments for infinite-dimensional Bayesian nonlinear inverse problems.
SIAM Journal on Scientific Computing, 38(1):A243-A272, 2016.
[3] Alen Alexanderian, Noemi Petra, Georg Stadler, and Omar Ghattas. Mean-variance risk-averse optimal control of systems governed by PDEs with random parameter fields using quadratic approximations. SIAM/ASA Journal on Uncertainty Quantification, 5(1):1166-1192, 2017. arXiv preprint arXiv:1602.07592.
[4] Ahmad Ahmad Ali, Elisabeth Ullmann, and Michael Hinze. Multilevel Monte Carlo analysis for optimal control of elliptic PDEs with random coefficients.
SIAM/ASA Journal onUncertainty Quantification , 5(1):466–492, 2017.[5] H. Avron and S. Toledo. Randomized algorithms for estimating the trace of an implicit sym-metric positive semi-definite matrix.
Journal of the ACM (JACM) , 58(2):8, 2011.[6] E. Bader, M. K¨archer, M.A. Grepl, and K. Veroy. Certified reduced basis methods forparametrized distributed elliptic optimal control problems with control constraints.
SIAMJournal on Scientific Computing , 38(6):A3921–A3946, 2016.[7] Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschel-man, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G.Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, Stefano Zampini, and HongZhang. PETSc users manual. Technical Report ANL-95/11 - Revision 3.6, Argonne Na-tional Laboratory, 2015.[8] O. Bashir, K. Willcox, O. Ghattas, B. van Bloemen Waanders, and J. Hill. Hessian-based modelreduction for large-scale systems with initial condition inputs.
International Journal forNumerical Methods in Engineering , 73:844–868, 2008.[9] Y. Bazilevs and T.J.R. Hughes. Weak imposition of Dirichlet boundary conditions in fluidmechanics.
Computers & Fluids , 36(1):12–26, 2007.[10] P. Benner, S. Gugercin, and K. Willcox. A survey of projection-based model reduction methodsfor parametric dynamical systems.
SIAM review , 57(4):483–531, 2015.[11] P. Benner, A. Onwunta, and M. Stoll. Block-diagonal preconditioning for optimal controlproblems constrained by PDEs with uncertain inputs.
SIAM Journal on Matrix Analysisand Applications , 37(2):491–518, 2016.[12] P. Benner, E. Sachs, and S. Volkwein. Model order reduction for PDE constrained optimization.In
Trends in PDE Constrained Optimization , pages 303–326. Springer, 2014.[13] D. Bertsimas, D.B. Brown, and C. Caramanis. Theory and applications of robust optimization.
SIAM review , 53(3):464–501, 2011.[14] L. T. Biegler, O. Ghattas, M. Heinkenschloss, and B. van Bloemen Waanders, editors.
Large-Scale PDE-Constrained Optimization . Lecture Notes in Computational Science and Engi-neering, Vol. 30. Springer-Verlag, Heidelberg, 2003.[15] A. Borz`ı and V. Schulz.
Computational optimization of systems governed by partial differentialequations , volume 8. SIAM, 2011.[16] A. Borz`ı, V. Schulz, C. Schillings, and G. Von Winckel. On the treatment of distributeduncertainties in PDE-constrained optimization.
GAMM-Mitteilungen , 33(2):230–246, 2010.[17] T. Bui-Thanh, K. Willcox, O. Ghattas, and B. van Bloemen Waanders. Goal-oriented, model-constrained optimization for reduction of large-scale systems.
Journal of ComputationalPhysics , 224(2):880–896, 2007.[18] Tan Bui-Thanh, Carsten Burstedde, Omar Ghattas, James Martin, Georg Stadler, and Lu-cas C. Wilcox. Extreme-scale UQ for Bayesian inverse problems governed by PDEs. In
SC12: Proceedings of the International Conference for High Performance Computing,Networking, Storage and Analysis , 2012. Gordon Bell Prize finalist.[19] Tan Bui-Thanh and Omar Ghattas. Analysis of the Hessian for inverse scattering problems.Part I: Inverse shape scattering of acoustic waves.
Inverse Problems , 28(5):055001, 2012.[20] Tan Bui-Thanh and Omar Ghattas. Analysis of the Hessian for inverse scattering problems.Part II: Inverse medium scattering of acoustic waves.
Inverse Problems , 28(5):055002,2012.[21] Tan Bui-Thanh and Omar Ghattas. Analysis of the Hessian for inverse scattering problems. PartIII: Inverse medium scattering of electromagnetic waves.
Inverse Problems and Imaging ,7(4):1139–1155, 2013.[22] Tan Bui-Thanh and Omar Ghattas. A scalable MAP solver for Bayesian inverse problems withBesov priors.
Inverse Problems and Imaging , 9(1):27–54, 2015.[23] Tan Bui-Thanh, Omar Ghattas, James Martin, and Georg Stadler. A computational frameworkfor infinite-dimensional Bayesian inverse problems Part I: The linearized case, with appli-cation to global seismic inversion.
SIAM Journal on Scientific Computing, 35(6):A2494-A2523, 2013.
[24] P. Chen and O. Ghattas. Analysis of the Hessian for optimization under uncertainty: Application to optimal design of acoustic metamaterial. In preparation, 2018.
[25] P. Chen and O. Ghattas. Hessian-based sampling for goal-oriented model reduction with high-dimensional parameter. Preprint, 2018.
[26] P. Chen and O. Ghattas. Scalable computational methods for optimal design of acoustic metamaterials under uncertainty. In preparation, 2018.
[27] P. Chen and A. Quarteroni. Weighted reduced basis method for stochastic optimal control problems with elliptic PDE constraints.
SIAM/ASA J. Uncertainty Quantification , 2(1):364–396, 2014.[28] P. Chen, A. Quarteroni, and G. Rozza. Stochastic optimal Robin boundary control problems ofadvection-dominated elliptic equations.
SIAM Journal on Numerical Analysis , 51(5):2700– 2722, 2013.[29] P. Chen, A. Quarteroni, and G. Rozza. Multilevel and weighted reduced basis method forstochastic optimal control problems constrained by Stokes equations.
Numerische Mathe-matik , 133(1):67–102, 2016.[30] P. Chen, U. Villa, and O. Ghattas. Hessian-based adaptive sparse quadrature for infinite-dimensional Bayesian inverse problems.
Computer Methods in Applied Mechanics andEngineering , 327:147–172, 2017.[31] Peng Chen. Sparse quadrature for high-dimensional integration with Gaussian measure.
ESAIM: Mathematical Modelling and Numerical Analysis , 52(2):631–657, 2018.[32] Peng Chen, Alfio Quarteroni, and Gianluigi Rozza. Reduced basis methods for uncertaintyquantification.
SIAM/ASA Journal on Uncertainty Quantification , 5(1):813–869, 2017.[33] Benjamin Crestel, Alen Alexanderian, Georg Stadler, and Omar Ghattas. A-optimal encodingweights for nonlinear inverse problems, with application to the Helmholtz inverse problem.
Inverse Problems , 33(7):074008, 2017.[34] J. Dick, F.Y. Kuo, and I.H. Sloan. High-dimensional integration–the Quasi-Monte Carlo way.
Acta Numerica , 22:133–288, 2013.[35] J. Dick, Q.T. Le Gia, and Ch. Schwab. Higher order quasi–Monte Carlo integration for holomor-phic, parametric operator equations.
SIAM/ASA Journal on Uncertainty Quantification ,4(1):48–79, 2016.[36] Pearl H. Flath, Lucas C. Wilcox, Volkan Ak¸celik, Judy Hill, Bart van Bloemen Waanders,and Omar Ghattas. Fast algorithms for Bayesian uncertainty quantification in large-scalelinear inverse problems based on low-rank partial Hessian approximations.
SIAM Journalon Scientific Computing , 33(1):407–432, 2011.[37] D. Ghate and M.B. Giles. Efficient Hessian calculation using automatic differentiation.
AIAApaper , 4059:2007, 2007.[38] O. Ghattas and J.-H. Bark. Optimal control of two- and three-dimensional Navier-Stokes flow.
Journal of Computational Physics , 136:231–244, 1997.[39] R. Glowinski and J.L. Lions.
Exact and approximate controllability for distributed parametersystems . Cambridge University Press, Cambridge, UK, 1996.[40] M.D. Gunzburger.
Perspectives in flow control and optimization , volume 5. SIAM, 2003.[41] M.D. Gunzburger, H.C. Lee, and J. Lee. Error estimates of stochastic optimal neumann bound-ary control problems.
SIAM Journal on Numerical Analysis , 49:1532–1552, 2011.[42] N. Halko, P-G Martinsson, and J.A. Tropp. Finding structure with randomness: Probabilisticalgorithms for constructing approximate matrix decompositions.
SIAM review , 53(2):217–288, 2011.[43] J.S. Hesthaven, G. Rozza, and B. Stamm. Certified reduced basis methods for parametrizedpartial differential equations.
SpringerBriefs in Mathematics , 2015.[44] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich.
Optimization with PDE constraints , vol-ume 23. Springer Science & Business Media, 2008.[45] L.S. Hou, J. Lee, and H. Manouzi. Finite element approximations of stochastic optimal controlproblems constrained by stochastic elliptic PDEs.
Journal of Mathematical Analysis andApplications , 384(1):87–103, 2011.[46] Tobin Isaac, Noemi Petra, Georg Stadler, and Omar Ghattas. Scalable and efficient algorithmsfor the propagation of uncertainty from data through inference to prediction for large-scaleproblems, with application to flow of the Antarctic ice sheet.
Journal of ComputationalPhysics , 296:348–368, September 2015.[47] M. K¨archer and M.A. Grepl. A posteriori error estimation for reduced order solutions ofparametrized parabolic optimal control problems.
ESAIM: Mathematical Modelling andNumerical Analysis , 48(6):1615–1638, 2014.[48] M Klein, A Sadiki, and J Janicka. Investigation of the influence of the reynolds number on aplane jet using direct numerical simulation.
International Journal of Heat and Fluid Flow ,24(6):785–794, 2003.[49] D.P. Kouri, D. Heinkenschloos, M. Ridzal, and B.G. Van Bloemen Waanders. A trust-regionalgorithm with adaptive stochastic collocation for PDE optimization under uncertainty.
SIAM Journal on Scientific Computing, 35(4):1847-1879, 2012.
[50] D.P. Kouri and T.M. Surowiec. Risk-averse PDE-constrained optimization using the conditional Value-At-Risk.
SIAM Journal on Optimization , 26(1):365–396, 2016.[51] Drew P. Kouri, Matthias Heinkenschloss, Denis Ridzal, and Bart van Bloemen Waanders.Inexact objective function evaluations in a trust-region algorithm for PDE-constrainedoptimization under uncertainty.
SIAM Journal on Scientific Computing , 36(6):A3011–A3029, 2014.[52] A. Kunoth and Ch. Schwab. Analytic regularity and GPC approximation for control problemsconstrained by linear parametric elliptic and parabolic PDEs.
SIAM Journal on Controland Optimization , 51(3):2442–2471, 2013.[53] A. Kunoth and Ch. Schwab. Sparse adaptive tensor Galerkin approximations of stochasticPDE-constrained control problems.
SIAM/ASA Journal on Uncertainty Quantification ,4(1):1034–1059, 2016.[54] T. Lassila, A. Manzoni, A. Quarteroni, and G. Rozza. Boundary control and shape optimizationfor the robust design of bypass anastomoses under uncertainty.
ESAIM: MathematicalModelling and Numerical Analysis , 47(4):1107–1131, 2013.[55] Finn Lindgren, H˚avard Rue, and Johan Lindstr¨om. An explicit link between Gaussian fieldsand Gaussian Markov random fields: the stochastic partial differential equation approach.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 73(4):423–498,2011.[56] J.L. Lions.
Optimal control of systems governed by partial differential equations . Springer-Verlag, Berlin, 1971.[57] A. Logg, K.A. Mardal, and G. Wells.
Automated solution of differential equations by the finiteelement method: The FEniCS book , volume 84. Springer Science & Business Media, 2012.[58] James Martin, Lucas C. Wilcox, Carsten Burstedde, and Omar Ghattas. A stochastic New-ton MCMC method for large-scale statistical inverse problems with application to seismicinversion.
SIAM Journal on Scientific Computing , 34(3):A1460–A1487, 2012.[59] F. Negri, A. Manzoni, and G. Rozza. Reduced basis approximation of parametrized optimal flowcontrol problems for the Stokes equations.
Computers & Mathematics with Applications ,69(4):319–336, 2015.[60] F. Negri, G. Rozza, A. Manzoni, and A. Quarteroni. Reduced basis method for parametrizedelliptic optimal control problems.
SIAM Journal on Scientific Computing , 35(5):A2316–A2340, 2013.[61] L.W.T. Ng and K.E. Willcox. Multifidelity approaches for optimization under uncertainty.
International Journal for numerical methods in Engineering , 100(10):746–772, 2014.[62] J. Nocedal and S. Wright.
Numerical optimization . Springer Science & Business Media, 2006.[63] B. Peherstorfer, K. Willcox, and M. Gunzburger. Survey of multifidelity methods in uncertaintypropagation, inference, and optimization.
SIAM Review , 60(3):550–591, 2018.[64] Noemi Petra, James Martin, Georg Stadler, and Omar Ghattas. A computational frameworkfor infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMCwith application to ice sheet inverse problems.
SIAM Journal on Scientific Computing ,36(4):A1525–A1555, 2014.[65] A. Quarteroni, A. Manzoni, and F. Negri.
Reduced Basis Methods for Partial DifferentialEquations: An Introduction , volume 92. Springer, 2015.[66] Farbod Roosta-Khorasani and Uri Ascher. Improved bounds on sample size for implicit matrixtrace estimators.
Foundations of Computational Mathematics , 15(5):1187–1212, 2015.[67] E. Rosseel and G.N. Wells. Optimal control with stochastic PDE constraints and uncertaincontrols.
Computer Methods in Applied Mechanics and Engineering , 213 - 216(0):152 –167, 2012.[68] G. Rozza, D.B.P. Huynh, and A.T. Patera. Reduced basis approximation and a posteriori errorestimation for affinely parametrized elliptic coercive partial differential equations.
Archivesof Computational Methods in Engineering , 15(3):229–275, 2008.[69] A.K. Saibaba, A. Alexanderian, and I.C.F. Ipsen. Randomized matrix-free trace and log-determinant estimators.
Numerische Mathematik , pages 1–43, 2016.[70] A.K. Saibaba, J. Lee, and P.K. Kitanidis. Randomized algorithms for generalized Hermitianeigenvalue problems with application to computing Karhunen–Lo`eve expansion.
NumericalLinear Algebra with Applications , 23(2):314–339, 2016.[71] C. Schillings, S. Schmidt, and V. Schulz. Efficient shape optimization for certain and uncertainaerodynamic design.
Computers & Fluids , 46(1):78–87, 2011.[72] C. Schillings and Ch. Schwab. Sparse, adaptive Smolyak quadratures for Bayesian inverseproblems.
Inverse Problems , 29(6), 2013.[73] Alexander Shapiro, Darinka Dentcheva, and Andrezj Ruszczynski.
Lectures on Stochastic Programming: Modeling and Theory. Society for Industrial and Applied Mathematics, 2009.
[74] H. Tiesler, R.M. Kirby, D. Xiu, and T. Preusser. Stochastic collocation for optimal control problems with stochastic PDE constraints.
SIAM Journal on Control and Optimization ,50(5):2659–2682, 2012.[75] F. Tr¨oltzsch.
Optimal control of partial differential equations: theory, methods, and applications, volume 112. American Mathematical Society, Providence, RI, 2010.
[76] U. Villa, N. Petra, and O. Ghattas. hIPPYlib: An extensible software framework for large-scale deterministic and linearized Bayesian inversion, 2016. http://hippylib.github.io.
[77] M.J. Zahr and C. Farhat. Progressive construction of a parametric reduced-order model for PDE-constrained optimization.