A fast and scalable computational framework for goal-oriented linear Bayesian optimal experimental design: Application to optimal sensor placement
Keyi Wu, Peng Chen, and Omar Ghattas
Department of Mathematics, University of Texas at Austin, Austin, USA
Oden Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, USA

This research was partially funded by the National Science Foundation, Division of Mathematical Sciences under award DMS-2012453; the Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Mathematical Multifaceted Integrated Capability Centers (MMICCS) program under award DE-SC0019303; and the Simons Foundation under award 560651.
Abstract
Optimal experimental design (OED) is a principled framework for maximizing the information gained from limited data in inverse problems. Unfortunately, conventional methods for OED are prohibitive when applied to expensive models with high-dimensional parameters, as we target here. We develop a fast and scalable computational framework for goal-oriented OED of large-scale Bayesian linear inverse problems that finds sensor locations to maximize the expected information gain (EIG) for a predicted quantity of interest. By employing low-rank approximations of appropriate operators, an offline-online decomposition, and a new swapping greedy algorithm, we are able to maximize the EIG at a cost, measured in model solutions, that is independent of the problem dimensions. We demonstrate the efficiency, accuracy, and both data- and parameter-dimension independence of the proposed algorithm for a contaminant transport inverse problem with an infinite-dimensional parameter field.
1 Introduction

Optimizing the acquisition of data—e.g., what, where, and when to measure, and what experiments to run—in order to maximize the information gained from the data is a fundamental and ubiquitous problem across the natural and social sciences, engineering, medicine, and technology. Three important examples are optimal observing system design for ocean climate data [37], optimal sensor placement for early warning of tsunami waves [25], and optimal experimental design to accelerate MRI imaging [9]. Bayesian optimal experimental design (BOED)—including formulations as active learning, Bayesian optimization, and sensor placement—provides a probabilistic framework to maximize the expected information gain (EIG) or mutual information (MI) for uncertain parameters or related quantities of interest [15]. However, evaluating the EIG remains prohibitive for large-scale, complex models, due to the need to compute double integrals with respect to both the parameter and data distributions. Recently,
advances in efficiently evaluating the EIG and optimizing the design have been achieved using methods based on Laplace-approximation-based EIG estimation [36], myopic posterior sampling for adaptive goal-oriented BOED [34], EIG estimation by variational inference for BOED [27], BOED for implicit models by neural EIG estimation [35], and sequential BOED with variable cost structure [45].

Interest has intensified in extending BOED to experiments on, or observations of, complex physical systems, since these can be very expensive (e.g., satellite trajectories, subsurface wells, ocean-bottom acoustic sensors). Such physical systems are typically modeled by partial differential equations (PDEs), which are expensive to solve and often involve infinite-dimensional parameter fields and large numbers of design variables. This presents fundamental challenges to conventional BOED methods, which require prohibitively large numbers of (PDE) model solves. Several classes of methods have been developed to tackle these computational challenges by exploiting (1) sparsity, via polynomial chaos approximation of parameter-to-observation maps [29, 30, 31]; (2) intrinsic low dimensionality, via low-rank approximation of (prior-preconditioned and data-informed) operators [2, 1, 3, 41, 23, 8]; and (3) decomposability, via an offline (model-constrained EIG approximation)–online (design optimization) decomposition [43].

Here, we focus on goal-oriented optimal experimental design (GOOED) for large-scale Bayesian inverse problems, in the context of optimal sensor placement. That is, we seek sensor locations that maximize the information gained from the sensors, not about the model parameters, but (of greater practical interest) about a posterior model-predictive goal. In particular, we consider linear parameter-to-observable (PtO) maps governed by expensive models (e.g., PDEs) with high-dimensional uncertain parameters (infinite-dimensional before discretization). In [8], a gradient-based optimization method is developed to solve the linear GOOED problem for the optimal sensor locations. However, each of the possibly very large number of iterations requires many model evaluations, which makes the algorithm prohibitive when each model evaluation (e.g., a PDE solve) is very expensive.

Contributions. We propose a fast and scalable computational framework for high-dimensional Bayesian GOOED problems governed by large-scale, expensive-to-solve models. To overcome the curse of dimensionality with respect to both parameter and data dimensions, we exploit the intrinsic low dimensionality of the PtO map, and propose and analyze low-rank approximations of appropriate data- and parameter-informed operators and of the EIG. These properties are revealed by Jacobians and Hessians of the PtO map, as has been done for model reduction for sampling and deep learning [10, 17, 6, 38], Bayesian inference [13, 14, 16, 21, 22, 18], optimization under uncertainty [4, 20, 19], and BOED [2, 3, 23, 41, 8, 43].
We use a randomized algorithm for the low-rank approximation, which requires only a small and dimension-independent number of large-scale model evaluations. Moreover, we develop an efficient offline-online decomposition scheme to evaluate the goal-oriented EIG, in which the model-constrained low-rank approximations are performed just once in the offline stage, while the design optimization in the online stage is free of model evaluations. Furthermore, for the design optimization we introduce a new swapping greedy algorithm that first constructs an initial set of sensors using leverage scores, and then swaps the chosen sensors with other candidates until certain convergence criteria are met. Finally, we demonstrate the efficiency, accuracy, and dimension independence (with respect to both data and parameters) of the proposed algorithm for a contaminant transport inverse problem with an infinite-dimensional parameter field.

We present background on BOED in Section 2, propose our computational framework for GOOED in Section 3, and report results of numerical experiments in Section 4.
2 Background
2.1 Bayesian linear inverse problems

We consider a general linear model
$y = Fm + \epsilon$,   (1)
where $y \in \mathbb{R}^{d_y}$ is a $d_y$-dimensional observational data vector corrupted by additive Gaussian noise $\epsilon \sim \mathcal{N}(0, \Gamma_n)$ with zero mean and covariance $\Gamma_n \in \mathbb{R}^{d_y \times d_y}$, $m \in \mathbb{R}^{d_m}$ is a $d_m$-dimensional uncertain parameter vector, and $F: \mathbb{R}^{d_m} \to \mathbb{R}^{d_y}$ is a linear PtO map. As a specific case, $m$ is a discretization (e.g., by the finite element method) of an infinite-dimensional parameter field in a model described by PDEs, while $F$ is implicitly given by solving the PDE model. In this case, the parameter dimension is typically very high in practical applications.

We assume a Gaussian prior $m \sim \mathcal{N}(m_{\rm pr}, \Gamma_{\rm pr})$ with mean $m_{\rm pr}$ and covariance $\Gamma_{\rm pr}$ for the parameter $m$, with density
$\pi_{\rm pr}(m) \propto \exp\left(-\tfrac{1}{2}\|m - m_{\rm pr}\|^2_{\Gamma_{\rm pr}^{-1}}\right)$,   (2)
where $\|m - m_{\rm pr}\|^2_{\Gamma_{\rm pr}^{-1}} := (m - m_{\rm pr})^T \Gamma_{\rm pr}^{-1}(m - m_{\rm pr})$. Then, by Bayes' rule, the posterior density of $m$ satisfies
$\pi_{\rm post}(m \mid y) \propto \pi_{\rm like}(y \mid m)\,\pi_{\rm pr}(m)$.   (3)
Here $\pi_{\rm like}(y \mid m)$ is the likelihood function, which under Gaussian noise $\epsilon \sim \mathcal{N}(0, \Gamma_n)$ satisfies
$\pi_{\rm like}(y \mid m) \propto \exp(-\Phi(m, y))$   (4)
with the potential
$\Phi(m, y) := \tfrac{1}{2}\|Fm - y\|^2_{\Gamma_n^{-1}}$.   (5)
Under the assumption of Gaussian prior and Gaussian noise, the posterior of $m$ is also Gaussian, $\mathcal{N}(m_{\rm post}, \Gamma_{\rm post})$, with mean $m_{\rm post} = \Gamma_{\rm post}(F^*\Gamma_n^{-1}y + \Gamma_{\rm pr}^{-1}m_{\rm pr})$ (which coincides with the MAP point) and covariance $\Gamma_{\rm post} = (H_m + \Gamma_{\rm pr}^{-1})^{-1}$, where
$H_m = F^*\Gamma_n^{-1}F$   (6)
is the (data-misfit) Hessian of the potential $\Phi(m, y)$, and $F^*$ is the adjoint of $F$, obtained, e.g., by solving the adjoint PDE model.

2.2 Optimal experimental design

2.2.1 Expected information gain

The expected information gain (EIG) is defined as the expected (with respect to the data) Kullback–Leibler (KL) divergence between the posterior and the prior distributions,
$\Psi := \mathbb{E}_y\left[D_{\rm KL}(\pi_{\rm post}(\cdot \mid y)\,\|\,\pi_{\rm pr})\right]$,   (7)
where the KL divergence is defined as
$D_{\rm KL}(\pi_{\rm post}\,\|\,\pi_{\rm pr}) := \int \ln\left(\frac{d\pi_{\rm post}}{d\pi_{\rm pr}}\right)d\pi_{\rm post}$.   (8)
For a Bayesian linear inverse problem as formulated in Section 2.1, the EIG $\Psi$ admits the closed form [1]
$\Psi = \tfrac{1}{2}\log\det\left(I_m + \widetilde{H}_m\right)$,   (9)
where $I_m$ is the identity matrix of size $d_m \times d_m$, and $\widetilde{H}_m := \Gamma_{\rm pr}^{1/2}H_m\Gamma_{\rm pr}^{1/2}$ is the prior-preconditioned Hessian, which includes both data and prior information.

2.2.2 Optimal sensor placement

We consider an optimal sensor placement problem. Assume we have a collection of $d$ candidate sensors $\{s_i\}_{i=1}^d$. Due to a limited budget or physical constraints, we can choose only a much smaller number $r < d$ of sensors at which data are collected. The OED problem seeks the best sensor combination from the candidates. We use a Boolean design matrix $W \in \mathcal{W} \subset \mathbb{R}^{r \times d}$ to represent the sensor placement, such that $W_{ij} = 1$ if the $i$-th sensor is placed at the $j$-th candidate location, i.e.,
$W_{ij} \in \{0, 1\}, \quad \sum_{j=1}^d W_{ij} = 1, \quad \sum_{i=1}^r W_{ij} \in \{0, 1\}$.   (10)
We assume that the observational noise for the $d$ candidate sensors is uncorrelated, with covariance
$\Gamma_n^d = \mathrm{diag}(\sigma_1^2, \dots, \sigma_d^2)$.   (11)
As a result, for any design $W$, the covariance of the observation noise $\epsilon$ is $\Gamma_n(W) = W\Gamma_n^dW^T$, and we have
$\Gamma_n^{-1}(W) = W(\Gamma_n^d)^{-1}W^T$.   (12)
Denoting by $F_d$ the PtO map using all $d$ candidate sensors, we have the design-specific PtO map
$F(W) = WF_d$,   (13)
with adjoint $F^*(W) = F_d^*W^T$. We can now state the OED problem: find an optimal design $W^\star \in \mathcal{W}$ such that
$W^\star = \arg\max_{W \in \mathcal{W}} \Psi(W)$.   (14)
The classical OED problem seeks a design that maximizes the information gain for the parameter vector $m$.
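As a concrete illustration of these formulas, the following sketch (ours, not from the paper's code; the PtO map and prior are random dense stand-ins) assembles the Gaussian posterior of (3) and checks the EIG closed form (9) against the equivalent expression $\tfrac12(\log\det\Gamma_{\rm pr} - \log\det\Gamma_{\rm post})$:

```python
# Linear-Gaussian posterior and EIG closed form (6)-(9) on a small synthetic problem.
import numpy as np

rng = np.random.default_rng(0)
d_m, d_y = 50, 20
F = rng.standard_normal((d_y, d_m)) / np.sqrt(d_m)   # stand-in for the PtO map
Gamma_n = 0.01 * np.eye(d_y)                         # noise covariance
x = np.linspace(0.0, 1.0, d_m)                       # a smooth SPD prior covariance
Gamma_pr = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / 0.2**2) + 1e-2 * np.eye(d_m)

# Data-misfit Hessian (6) and prior-preconditioned Hessian in (9)
H_m = F.T @ np.linalg.solve(Gamma_n, F)
w, V = np.linalg.eigh(Gamma_pr)
sqrt_pr = (V * np.sqrt(w)) @ V.T                     # symmetric square root of Gamma_pr
H_tilde = sqrt_pr @ H_m @ sqrt_pr

# EIG closed form (9), checked against the posterior/prior determinant ratio
EIG = 0.5 * np.linalg.slogdet(np.eye(d_m) + H_tilde)[1]
Gamma_post = np.linalg.inv(H_m + np.linalg.inv(Gamma_pr))
EIG_check = 0.5 * (np.linalg.slogdet(Gamma_pr)[1] - np.linalg.slogdet(Gamma_post)[1])
assert np.isclose(EIG, EIG_check)
print(f"EIG = {EIG:.4f}")
```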
3 Goal-oriented optimal experimental design

In this work, we consider a goal-oriented optimal experimental design (GOOED) problem that maximizes the information gain for a predicted quantity of interest (QoI) $\rho \in \mathbb{R}^{d_\rho}$, which is assumed to be a linear function of the parameter $m$,
$\rho = Pm$,   (15)
where $P: \mathbb{R}^{d_m} \to \mathbb{R}^{d_\rho}$ is a linear map that typically involves model evaluation (e.g., solving PDEs). Due to linearity, the prior distribution of $\rho$ is Gaussian, $\mathcal{N}(\rho_{\rm pr}, \Sigma_{\rm pr})$, with mean $\rho_{\rm pr} = Pm_{\rm pr}$ and covariance $\Sigma_{\rm pr} = P\Gamma_{\rm pr}P^*$, where $P^*$ is the adjoint of $P$. Moreover, the posterior distribution of $\rho$ is also Gaussian, $\mathcal{N}(\rho_{\rm post}, \Sigma_{\rm post})$, with mean $\rho_{\rm post} = Pm_{\rm post}$ and covariance $\Sigma_{\rm post} = P\Gamma_{\rm post}P^*$.

3.1 Expected information gain for GOOED

To construct an expression for the EIG for GOOED, we first introduce Proposition 3.1 [42], which relates the observational data $y$ and the QoI $\rho$.
Proposition 3.1. Model (1) and QoI (15) lead to
$y = FP^\dagger\rho + \eta$,   (16)
where $P^\dagger := \Gamma_{\rm pr}P^*\Sigma_{\rm pr}^{-1}$, and $\eta \sim \mathcal{N}(0, \Gamma_\eta)$ with
$\Gamma_\eta := \Gamma_n + F(\Gamma_{\rm pr} - \Gamma_{\rm pr}P^*\Sigma_{\rm pr}^{-1}P\Gamma_{\rm pr})F^*$,   (17)
or equivalently $\Gamma_\eta = \mathrm{Cov}[\epsilon] + \mathrm{Cov}[F(I_m - P^\dagger P)m]$, with Cov denoting covariance. Moreover, $\rho$ and $\eta$ are independent.

Thus, the EIG for $\rho$ can be obtained analogously to (9),
$\Psi_\rho(W) = \tfrac{1}{2}\log\det\left(I_\rho + \widetilde{H}^\rho_m(W)\right)$,   (18)
where $I_\rho$ is the identity matrix of size $d_\rho \times d_\rho$, and $\widetilde{H}^\rho_m(W) = \Sigma_{\rm pr}^{1/2}H^\rho_m(W)\Sigma_{\rm pr}^{1/2}$, with $H^\rho_m(W)$ given by
$H^\rho_m(W) = (F(W)P^\dagger)^*\,\Gamma_\eta^{-1}(W)\,F(W)P^\dagger$.   (19)

3.2 Offline-online decomposition

The EIG $\Psi_\rho(W)$ depends on $W$ through $F(W)$ given in (13), which involves expensive model evaluations (e.g., PDE solutions). Since $\Psi_\rho(W)$ must be evaluated repeatedly in the course of maximizing the EIG, these repeated model evaluations would be prohibitive. To circumvent this problem, we propose an offline-online decomposition scheme, in which the model-constrained computation of quantities that are independent of $W$ is performed offline a single time, while the online design optimization is free of any model evaluations. The key result permitting this decomposition is given in the following theorem, whose proof is given in Appendix A.
Theorem 3.2. For each design $W \in \mathcal{W}$, the goal-oriented EIG $\Psi_\rho(W)$ given in (18) can be computed as
$\Psi_\rho(W) = \tfrac{1}{2}\log\det\left(I_r + L^TWH^\rho_dW^TL\right)$,   (20)
where $I_r$ is the identity matrix of size $r \times r$, $H^\rho_d$ is given by
$H^\rho_d := F_d\Gamma_{\rm pr}P^*\Sigma_{\rm pr}^{-1}P\Gamma_{\rm pr}F_d^*$,   (21)
and $L$ is given by the Cholesky factorization $\Gamma_\eta^{-1} = LL^T$.

Note that $\Gamma_\eta$ defined in (17) can be equivalently written as
$\Gamma_\eta(W) = W\left(\Gamma_n^d + H_d - H^\rho_d\right)W^T$,   (22)
where $H_d := F_d\Gamma_{\rm pr}F_d^*$. Hence the evaluation of $\Psi_\rho(W)$ can be decomposed as follows: (1) construct the model-constrained matrices $H_d$ and $H^\rho_d$ offline, just once; and (2) for each $W$ in the online optimization process, assemble the small ($r \times r$) matrix $\Gamma_\eta(W)$ by (22), compute the Cholesky factorization $\Gamma_\eta^{-1} = LL^T$, and assemble $\Psi_\rho(W)$ by (20), all of which are free of expensive model evaluations.
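The following is a small numerical sanity check of Theorem 3.2 (ours; all operators are random dense stand-ins, whereas in the paper $F_d$ is available only through PDE solves): the direct evaluation of (18) and the offline-online evaluation via (20) and (22) agree to machine precision.

```python
import numpy as np
rng = np.random.default_rng(0)

d_m, d, d_rho, r = 60, 15, 2, 4
F_d = rng.standard_normal((d, d_m)) / np.sqrt(d_m)    # PtO map, all d candidate sensors
P = rng.standard_normal((d_rho, d_m)) / np.sqrt(d_m)  # parameter-to-QoI map
B = rng.standard_normal((d_m, d_m))
Gamma_pr = B @ B.T / d_m + np.eye(d_m)                # SPD prior covariance
Gamma_n_d = 0.01 * np.eye(d)                          # noise covariance, all candidates
W = np.zeros((r, d)); W[np.arange(r), [0, 3, 7, 12]] = 1.0  # a Boolean design (10)

Sigma_pr = P @ Gamma_pr @ P.T
P_dag = Gamma_pr @ P.T @ np.linalg.inv(Sigma_pr)      # P-dagger from Proposition 3.1

# Direct evaluation of (18)-(19)
F_W = W @ F_d
Gamma_eta = (W @ Gamma_n_d @ W.T
             + F_W @ (Gamma_pr - Gamma_pr @ P.T @ np.linalg.solve(Sigma_pr, P @ Gamma_pr)) @ F_W.T)
G = F_W @ P_dag
H_rho_m = G.T @ np.linalg.solve(Gamma_eta, G)
w, V = np.linalg.eigh(Sigma_pr)
S_half = (V * np.sqrt(w)) @ V.T                       # symmetric square root of Sigma_pr
Psi_direct = 0.5 * np.linalg.slogdet(np.eye(d_rho) + S_half @ H_rho_m @ S_half)[1]

# Offline stage: d x d matrices built once (21)-(22)
H_d = F_d @ Gamma_pr @ F_d.T
H_rho_d = F_d @ Gamma_pr @ P.T @ np.linalg.solve(Sigma_pr, P @ Gamma_pr @ F_d.T)

# Online stage: small r x r algebra only, no model evaluations (Theorem 3.2)
Gamma_eta_W = W @ (Gamma_n_d + H_d - H_rho_d) @ W.T
L = np.linalg.cholesky(np.linalg.inv(Gamma_eta_W))
Psi_fast = 0.5 * np.linalg.slogdet(np.eye(r) + L.T @ W @ H_rho_d @ W.T @ L)[1]

assert np.isclose(Psi_direct, Psi_fast)
```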
Note that $H_d \in \mathbb{R}^{d \times d}$ and $H^\rho_d \in \mathbb{R}^{d \times d}$ are large matrices when the number of candidate sensors is large, $d \gg 1$. Moreover, their construction involves expensive model evaluations when the parameters are high-dimensional, $d_m \gg 1$,
e.g., when PDE solves are required. Therefore, it is not computationally practical to directly compute and store these matrices. Fortunately, the intrinsic ill-posedness of high-dimensional Bayesian inverse problems—the data inform only a low-dimensional subspace of the parameter space, e.g., [32, 13, 39, 26, 7]—suggests that these matrices are likely low rank or exhibit rapid spectral decay. We exploit this property and construct low-rank approximations of $H^\rho_d$ and $H_d - H^\rho_d$ in the next section.

3.3 Low-rank approximation

Let $\Delta H_d := H_d - H^\rho_d$, where $H^\rho_d$ and $H_d$ are given in (21) and (22) and integrate data, parameter, and QoI information. Noting that $H^\rho_d$ and $\Delta H_d$ are both symmetric, we compute their low-rank approximations for given tolerances $\epsilon_\zeta, \epsilon_\lambda > 0$,
$\hat H^\rho_d = U_kZ_kU_k^T \quad \text{and} \quad \Delta\hat H_d = V_l\Lambda_lV_l^T$,   (23)
where $(U_k, Z_k)$ are the $k$ dominant eigenpairs of $H^\rho_d$ with $Z_k = \mathrm{diag}(\zeta_1, \dots, \zeta_k)$ such that
$\zeta_1 \ge \zeta_2 \ge \cdots \ge \zeta_k \ge \epsilon_\zeta \ge \zeta_{k+1} \ge \cdots \ge \zeta_d$,   (24)
and $(V_l, \Lambda_l)$ are the $l$ dominant eigenpairs of $\Delta H_d$ with $\Lambda_l = \mathrm{diag}(\lambda_1, \dots, \lambda_l)$ such that
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_l \ge \epsilon_\lambda \ge \lambda_{l+1} \ge \cdots \ge \lambda_d$.   (25)
For the low-rank approximation we employ a randomized SVD algorithm [28], which requires only $O(k)$ and $O(l)$ model evaluations, respectively; in practice, $k, l \ll d$. More details on the algorithm, applied to the example problem of Section 4, can be found in Appendix C.

With $\hat\Gamma_\eta(W) := W(\Gamma_n^d + \Delta\hat H_d)W^T$ as an approximation of $\Gamma_\eta(W)$ in (17), we compute the Cholesky factorization $\hat\Gamma_\eta^{-1} = \hat L\hat L^T$. We can then define the approximate EIG
$\hat\Psi_\rho(W) := \tfrac{1}{2}\log\det\left(I_r + \hat L^TW\hat H^\rho_dW^T\hat L\right)$.   (26)
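The following sketch (ours) illustrates (26) on a synthetic problem with prescribed, rapidly decaying spectra: it forms the approximate EIG from the $k$ and $l$ dominant eigenpairs and compares the resulting error with the truncation bound (27) stated in Theorem 3.3 below.

```python
import numpy as np
rng = np.random.default_rng(0)

d, r, sigma2 = 40, 5, 1e-2
Q1, _ = np.linalg.qr(rng.standard_normal((d, d)))
Q2, _ = np.linalg.qr(rng.standard_normal((d, d)))
zeta = 10.0 ** (-np.arange(d, dtype=float))       # eigenvalues of H_d^rho (stand-in)
lam = 5.0 * 10.0 ** (-np.arange(d, dtype=float))  # eigenvalues of Delta H_d (stand-in)
H_rho_d = (Q1 * zeta) @ Q1.T
Delta_H = (Q2 * lam) @ Q2.T
Gamma_n_d = sigma2 * np.eye(d)
W = np.eye(d)[::2][:r]                            # a Boolean design: sensors 0,2,4,6,8

def truncate(Q, vals, m):                         # rank-m eigen-truncation as in (23)
    return (Q[:, :m] * vals[:m]) @ Q[:, :m].T

def psi(H_rho, Delta):                            # EIG formula (20) / (26)
    Geta = W @ (Gamma_n_d + Delta) @ W.T
    L = np.linalg.cholesky(np.linalg.inv(Geta))
    return 0.5 * np.linalg.slogdet(np.eye(r) + L.T @ W @ H_rho @ W.T @ L)[1]

k, l = 8, 5
err = abs(psi(H_rho_d, Delta_H) - psi(truncate(Q1, zeta, k), truncate(Q2, lam, l)))
bound = 0.5 * np.sum(np.log1p(zeta[k:] / sigma2)) \
      + 0.5 * np.sum(np.log1p(lam[l:k] * zeta[0] / sigma2**2))
print(f"error {err:.2e} <= bound {bound:.2e}")    # the error should sit below the bound
```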
The following theorem quantifies the approximation error; its proof can be found in Appendix B.

Theorem 3.3. For any design $W \in \mathcal{W}$, the error in approximating the goal-oriented EIG $\Psi_\rho(W)$ in (20) by $\hat\Psi_\rho(W)$ in (26) is bounded by
$|\Psi_\rho(W) - \hat\Psi_\rho(W)| \le \frac{1}{2}\sum_{i=k+1}^d \log(1 + \zeta_i/\sigma_{\min}^2) + \frac{1}{2}\sum_{i=l+1}^k \log(1 + \lambda_i\zeta_1/\sigma_{\min}^4)$,   (27)
where $\sigma_{\min}^2 := \min(\sigma_1^2, \dots, \sigma_d^2)$, as defined in (11).

We remark that with rapid decay of the eigenvalues $(\zeta_k)_{k \ge 1}$ of $\hat H^\rho_d$ and $(\lambda_l)_{l \ge 1}$ of $\Delta\hat H_d$, the error bound in (27) becomes very small. Moreover, the decay rates are often independent of the (candidate) data dimension $d$ and the parameter dimension $d_m$, as demonstrated in Section 4.3. This means that an arbitrarily accurate EIG approximation can be constructed with a small number, $O(k + l)$, of model solves.

3.4 Swapping greedy optimization

Once the low-rank approximations of $H^\rho_d$ and $\Delta H_d$ are constructed per (23), we obtain a fast method for evaluating the approximate EIG (26), with certified approximation error given by Theorem 3.3. We emphasize that this fast computation does not involve expensive model evaluations (e.g., large-scale PDE solves) for any given design $W$. We now turn to the (combinatorial) optimization problem of finding the optimal design matrix,
$W^\star = \arg\max_{W \in \mathcal{W}} \hat\Psi_\rho(W)$.   (28)
We introduce a swapping greedy algorithm to solve this problem that requires only evaluations of $\hat\Psi_\rho(W)$.

In contrast to classical greedy algorithms that sequentially find the optimal sensors one by one (or batch by batch) [11, 33], we extend a swapping greedy algorithm developed for BOED in [43] to the GOOED problem. Given a current sensor set, it swaps sensors with the remaining candidates to maximize the approximate EIG $\hat\Psi_\rho(W)$ until convergence. To initialize the chosen sensor set, we take advantage of the low-rank approximation $\hat H^\rho_d$ in (23), which contains information from the data (through $F_d$), the parameter (through $\Gamma_{\rm pr}$), and the QoI (through $P$), as can be seen from (21). In particular, the most informative sensors are revealed by the rows of $U_k$ with the largest norms, i.e., the leverage scores of $H^\rho_d$ [12]. More specifically, given a budget of $r$ sensors selected from $d$ candidate locations, we initialize the chosen set $S_r = \{s_1, \dots, s_r\}$ such that $s_i$, $i = 1, \dots, r$, is the row index corresponding to the $i$-th largest row norm of $U_k$, i.e.,
$s_i = \arg\max_{s \in S\setminus S_{i-1}} \|U_k(s, :)\|, \quad i = 1, \dots, r$,   (29)
where $U_k(s, :)$ is the $s$-th row of $U_k$, $\|\cdot\|$ is the Euclidean norm, $S$ is the full candidate set, and $S_{i-1} = \{s_1, \dots, s_{i-1}\}$ for $i = 2, 3, \dots$, with $S_0 = \emptyset$. Then, at each step $t = 1, \dots, r$ of a loop, we swap the sensor $s_t$ in the currently chosen set $S_{t-1}$ with one from the candidate set such that the approximate EIG $\hat\Psi_\rho(W)$ evaluated as in (26) is maximized, i.e., we choose $s^*$ such that
$s^* = \arg\max_{s \in \{s_t\}\cup(S\setminus S_{t-1})} \hat\Psi_\rho(W_s)$,   (30)
where $W_s$ is the design matrix corresponding to the sensor choice $S_{t-1}\setminus\{s_t\}\cup\{s\}$. We repeat the loop until a convergence criterion is met, e.g., the chosen set does not change, or the change in the approximate EIG is smaller than a given tolerance. We summarize the swapping greedy method in Algorithm 1.
Algorithm 1: A swapping greedy algorithm for GOOED

Input: the low-rank approximations (23); the set S = {1, ..., d} of d candidate sensors; a budget of r sensors to be placed.
Output: the optimal sensor set S* with r sensors.

1. Initialize S* = {s_1, ..., s_r} ⊂ S according to (29). Set S_0 = ∅.
2. while S* ≠ S_0 do
3.     S_0 ← S*.
4.     for t = 1, ..., r do
5.         Choose s* according to (30).
6.         Update S_t ← (S_{t-1} \ {s_t}) ∪ {s*}.
7.     end for
8.     Update S* ← S_r.
9. end while
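A compact sketch of Algorithm 1 follows (ours, slightly simplified; the function `psi_hat` and the toy matrices are hypothetical stand-ins for the offline quantities in (23)):

```python
import numpy as np

def swapping_greedy(psi_hat, U_k, d, r, max_loops=20):
    # Initialization (29): the r rows of U_k with largest norms (leverage scores)
    S = list(np.argsort(-np.linalg.norm(U_k, axis=1))[:r])
    for _ in range(max_loops):
        swapped = False
        for t in range(r):
            # Swap step (30): best replacement for S[t] among unused candidates
            candidates = [s for s in range(d) if s not in S] + [S[t]]
            best = max(candidates, key=lambda s: psi_hat(S[:t] + [s] + S[t + 1:]))
            if best != S[t]:
                S[t], swapped = best, True
        if not swapped:              # convergence: a full loop with no swaps
            break
    return S

# Toy usage with random PSD stand-ins for the offline matrices:
rng = np.random.default_rng(0)
d, r, k = 30, 4, 5
A = rng.standard_normal((d, k)); H_rho_d = A @ A.T         # rank-k, PSD
B = rng.standard_normal((d, d)); Delta_H = B @ B.T / d     # PSD
Gamma_n_d = 0.01 * np.eye(d)
U_k = np.linalg.eigh(H_rho_d)[1][:, ::-1][:, :k]           # dominant eigenvectors

def psi_hat(S):
    # For Boolean W, W M W^T is just the submatrix of M at the selected indices
    idx = np.ix_(S, S)
    L = np.linalg.cholesky(np.linalg.inv(Gamma_n_d[idx] + Delta_H[idx]))
    return 0.5 * np.linalg.slogdet(np.eye(len(S)) + L.T @ H_rho_d[idx] @ L)[1]

print(swapping_greedy(psi_hat, U_k, d, r))
```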
4 Numerical experiments

In this section, we present results of numerical experiments for GOOED governed by a linear dynamical PDE model with an infinite-dimensional parameter field and varying numbers of candidate sensors. This problem features the key challenges of (1) expensive model evaluations and (2) high-dimensional parameters and data. The code is available at https://github.com/cpempire/GOOED.

4.1 Contaminant transport problem

We consider sensor placement for Bayesian inversion of a contaminant source, with the goal of maximizing the information gain for the contaminant concentration on some building surfaces. The transport of the contaminant is modeled by the time-dependent advection-diffusion equation with homogeneous Neumann boundary condition,
$u_t - k\Delta u + v\cdot\nabla u = 0 \text{ in } D\times(0,T), \quad u(\cdot, 0) = m \text{ in } D, \quad k\nabla u\cdot n = 0 \text{ on } \partial D\times(0,T)$,   (31)
where $k = 0.001$ is the diffusion coefficient and $T > 0$. The domain $D \subset \mathbb{R}^2$ is open and bounded, with boundary $\partial D$, as depicted in Figure 1. The initial condition $m$ is an infinite-dimensional random parameter field in $D$, which is to be inferred. The velocity field $v \in \mathbb{R}^2$ is obtained as the solution of the steady-state Navier–Stokes equations with Dirichlet boundary condition,
$-\frac{1}{\mathrm{Re}}\Delta v + \nabla q + v\cdot\nabla v = 0 \text{ in } D, \quad \nabla\cdot v = 0 \text{ in } D, \quad v = g \text{ on } \partial D$,   (32)
where $q$ is the pressure field and the Reynolds number is Re = 50. The Dirichlet boundary data $g \in \mathbb{R}^2$ are prescribed as $g = (0, 1)$ on the left wall of the domain, $g = (0, -1)$ on the right wall, and $g = (0, 0)$ elsewhere.

Figure 1: The domain $D = [0,1]^2$ with two rectangular buildings. Left: velocity field $v$. Right: true parameter field $m_{\rm true}$.

We consider a Gaussian prior for the parameter, $m \sim \mathcal{N}(m_{\rm pr}, \mathcal{C}_{\rm pr})$, with mean $m_{\rm pr}$ and covariance operator $\mathcal{C}_{\rm pr} = \mathcal{A}^{-2}$, where the elliptic operator $\mathcal{A} = -\gamma\Delta + \delta I$ (with Laplacian $\Delta$ and identity $I$) is equipped with the Robin boundary condition $\gamma\nabla m\cdot n + \beta m$ on $\partial D$. Here $\gamma, \delta > 0$ control the correlation length and variance of $m$, and $\beta$ mitigates boundary artifacts [24]. In our numerical test, we set a constant prior mean $m_{\rm pr}$, with $\gamma = 1$ and $\delta = 8$. We synthesize a "true" initial condition $m_{\rm true} = \min\left(0.5,\ \exp(-\|x - x_0\|^2)\right)$, a bump centered at a point $x_0$, as the contaminant source (Figure 1). To solve the PDE model, we use an implicit Euler method for temporal discretization with $N_t$ time steps, and a finite element method for spatial discretization, resulting in a $d_m$-dimensional discrete parameter $m \sim \mathcal{N}(m_{\rm pr}, \Gamma_{\rm pr})$, with $m_{\rm pr}, \Gamma_{\rm pr}$ denoting the finite element discretizations of $m_{\rm pr}$ and $\mathcal{C}_{\rm pr}$, respectively.
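To illustrate the structure of the forward model, the following is a minimal finite-difference sketch (ours) of (31) with implicit Euler time stepping; the rotational velocity field, the grid, and the initial bump are synthetic stand-ins, whereas the paper uses finite elements and the Navier–Stokes velocity of (32) on the domain of Figure 1.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, kappa, dt, nt = 50, 1e-3, 0.02, 40       # grid per side, diffusion, time step, steps
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
vx, vy = Y - 0.5, 0.5 - X                   # rigid rotation about the center, div-free

I = sp.identity(n, format="csr")
D1 = sp.diags([-1.0, 1.0], [-1, 1], shape=(n, n)).tolil()   # centered 1st derivative
D1[0, :2] = [-2.0, 2.0]; D1[-1, -2:] = [-2.0, 2.0]          # one-sided at the walls
D1 = D1.tocsr() / (2.0 * h)
D2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)).tolil()
D2[0, :2] = [-2.0, 2.0]; D2[-1, -2:] = [2.0, -2.0]          # Neumann via ghost points
D2 = D2.tocsr() / h**2

Lap = sp.kron(D2, I) + sp.kron(I, D2)       # 2D Laplacian
Dx, Dy = sp.kron(D1, I), sp.kron(I, D1)     # 2D first derivatives
Adv = sp.diags(vx.ravel()) @ Dx + sp.diags(vy.ravel()) @ Dy
A = sp.identity(n * n) + dt * (Adv - kappa * Lap)   # (I + dt (v.grad - k Lap))

m = np.exp(-100.0 * ((X - 0.35)**2 + (Y - 0.7)**2))  # a synthetic bump as u(.,0) = m
u = m.ravel()
solve = spla.factorized(A.tocsc())          # factor once, reuse every time step
for _ in range(nt):
    u = solve(u)                            # implicit Euler: (I + dt L) u^{j+1} = u^j
```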
Figure 2: Contaminant concentration at time $T = 0.8$, obtained as the solution of (31) for the initial condition shown in Figure 1. The QoI maps ($P_1$, $P_2$, $P_3$) average the solution within lines along the left, the right, and both buildings. Candidate sensor locations are shown as circles (9 left and 75 right).

The solution of the PDE for $d_m = 2023$ and $N_t = 40$ is shown in Figure 2 at the observation time $T = 0.8$. The $d$ candidate sensor locations, at which we observe the contaminant concentration $u$, are also shown in Figure 2. The linear map $F$ is defined by the predicted data, i.e., the concentrations at the selected sensors. Finally, we take the QoI as the averaged contaminant concentration at time $t_{\rm pred}$ within a distance $\delta = 0.02$ from the boundaries of either the left, the right, or both buildings, with corresponding QoI maps denoted by $P_1$, $P_2$, $P_3$ (see Figure 2).

4.2 Optimal sensor placement

We first consider the case of a small number of candidate sensors, for which we can use exhaustive search to find the optimal sensor combination and compare it with the sensors chosen by the standard and swapping greedy algorithms. Specifically, we use a $3\times 3$ grid of $d = 9$ candidate locations $\{s_i\}_{i=1}^9$, shown in Figure 2 (left), with the goal of choosing $r = 2, 3, \dots, 8$ sensors for prediction time $t_{\rm pred} = 1.0$. We compute the matrices $H^\rho_d$ and $\Delta H_d$ (of size $9\times 9$).
Figure 3: Approximate EIG $\hat\Psi_\rho$ at the $r$ sensors chosen by the standard and swapping greedy algorithms, together with the distribution of $\hat\Psi_\rho$ over all possible combinations of the 9 candidate sensors. The three plots correspond to the QoI maps $P_1$, $P_2$, and $P_3$.
4, only swapping greedy finds the optimal design. Moreover, an increasein r leads to diminishing returns, as the gain in information about the QoI from additional sensorssaturates. We see that ∼ r optimal sensors, r = 5 , , , , , , , ,
60, from among the 75 candidates. Results are shown in Figure 4.
Figure 4: Approximate EIG $\hat\Psi_\rho$ for $r$ out of 75 sensors, found by the standard and swapping greedy algorithms, compared with the distribution of $\hat\Psi_\rho$ for 200 randomly chosen sets of $r$ sensors. The three plots correspond to the QoI maps $P_1$, $P_2$, and $P_3$.

We see that both greedy algorithms find designs with larger EIG than all random choices. Moreover, for small $r$, the swapping greedy algorithm finds better designs than the standard greedy algorithm; for large $r$, both greedy algorithms find designs with similar EIG. In fact, multiple designs with similar EIG become more likely as $r$ grows.

To demonstrate the reduction of computational cost achieved by the offline-online decomposition, we report in Table 1 the number of swaps performed by the swapping greedy algorithm (Algorithm 1) for 75 candidate sensors and different target numbers of sensors. We observe that the number of swapping loops at convergence is mostly 3, that most of the swaps take place in the first loop, followed by a smaller number of swaps in the second loop resulting in slight sensor adjustments, and that there are no swaps in the last loop, which we require as a convergence criterion. As a result of the offline-online decomposition (Theorem 3.2), which relieves the (thousands of) EIG evaluations of expensive PDE solves once the low-rank approximation (23) is built, we achieve over 1000X speedup. This is because the PDE solves overwhelmingly dominate the overall cost, and because the offline stage is computed at a cost comparable to a single direct EIG evaluation by (18).

Table 1: Number of swaps performed by Algorithm 1 for $r$ selected sensors out of 75 candidates, for the goal $P_3$.
r        5    10   15   20   25   30   40   50   60
swaps    41   73   124  164  190  194  235  199  119
Figure 5 illustrates the effect of the goal on the optimally placed sensors. Specifically, for the parameter-to-QoI maps $P_1$, $P_2$, $P_3$, which quantify the average contaminant concentration at time $t_{\rm pred} = 1$ around the left, the right, and both buildings, the goal-oriented OED finds the sensors depicted in the first row of Figure 5. For $P_3$ at prediction times $t_{\rm pred} = 1, 2, 4, 8$, we see in the bottom row of Figure 5 that the optimal sensors are no longer placed in the immediate vicinity of the buildings, but instead are increasingly dispersed to better detect the now more diffused field. Finally, the ability of GOOED to reduce the posterior variance of the initial condition field is depicted in Figure 6 for the different goals $P_1$, $P_2$, $P_3$. Compared to a random design (lower right), the three optimal designs lead to lower variance in the regions surrounding the goals of interest.

Figure 5: Sensor locations chosen by the swapping greedy algorithm for 10 out of 75 candidates, for the parameter-to-QoI maps $P_1$, $P_2$, $P_3$ at time $t_{\rm pred} = 1$ (top row), and for $P_3$ at times $t_{\rm pred} = 2, 4, 8$ (bottom row).

Figure 6: Pointwise posterior variance of the parameter at the optimal designs for goals $P_1$, $P_2$, $P_3$, compared to a random design, for 10 sensors. The darker regions represent lower variance.

4.3 Eigenvalue decay and dimension independence

Here we demonstrate the fast decay of the eigenvalues of $H^\rho_d$ and $\Delta H_d$ with respect to the parameter and data dimensions, as exploited by the algorithms of Section 3.3. For $H^\rho_d$ defined in (21), we have $\mathrm{rank}(H^\rho_d) \le \min(d_\rho, d)$ with QoI dimension $d_\rho$ and data dimension $d$. In practice, the QoI is often an averaged quantity with small $d_\rho$, so the rank of $H^\rho_d$ is also small; in our tests, $\mathrm{rank}(H^\rho_d) = d_\rho = 1$. For $\Delta H_d = H_d - H^\rho_d$ with $H_d = F_d\Gamma_{\rm pr}F_d^*$, the spectrum of $\Delta H_d$ depends on that of $H_d$, which typically exhibits fast decay due to the ill-posedness of the inverse problem. As observed in the left plot of Figure 7, the eigenvalues of $\Delta H_d$ decay very rapidly and independently of the parameter dimension, which implies that the required number of PDE solves is small and independent of the parameter dimension while achieving the same absolute accuracy of the approximate EIG by Theorem 3.3. The right plot of Figure 7 likewise shows rapid decay of the eigenvalues, with diminishing returns as the number of candidate sensors increases, suggesting that the number of PDE solves is asymptotically independent of the data dimension for the same relative accuracy of the approximate EIG. These plots suggest that $O(100)$ PDE solves are required to accurately capture the information gained about the parameter field and QoI from the data, regardless of the parameter or sensor dimensions, when using the randomized SVD (Algorithm 2).

Figure 7: Decay of the eigenvalues of $\Delta H_d$ with increasing parameter dimension, $d_m = 248, 398, 897, 1525, 2685, 5895$ (left), and data (candidate sensor) dimension, $d = 7, 13, 32, 58, 124, 232, 506, 907, 1566$ (right).
5 Conclusions

We have developed a fast and scalable computational framework for goal-oriented linear Bayesian optimal experimental design governed by expensive models. Repeated fast evaluation of an (arbitrarily accurate) approximate EIG, free of model evaluations, is made possible by an offline-online decomposition and by low-rank approximation of certain operators informed by the parameter, the data, and the predictive goals of interest. Scalability, as measured by parameter- and data-dimension independence of the number of model evaluations, is achieved by exploiting the GOOED problem's intrinsic low dimensionality, manifested in the rapid spectral decay of several critical operators. To justify the low-rank approximation of these operators in computing the EIG, we proved an upper bound on the approximation error in terms of the operators' truncated eigenvalues. Moreover, we proposed a new swapping greedy algorithm that is demonstrated to be more effective than the standard greedy algorithm in our experiments. Numerical experiments on optimal sensor placement for Bayesian inference of the initial condition of an advection-diffusion PDE demonstrated over 1000X speedups (measured in PDE solves). Future work includes extension to nonlinear Bayesian GOOED problems with nonlinear parameter-to-observable and parameter-to-QoI maps.

References

[1] Alen Alexanderian, Philip J. Gloor, and Omar Ghattas. On Bayesian A- and D-optimal experimental designs in infinite dimensions. Bayesian Analysis, 11(3):671–695, 2016.
[2] Alen Alexanderian, Noemi Petra, Georg Stadler, and Omar Ghattas. A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized ℓ0-sparsification. SIAM Journal on Scientific Computing, 36(5):A2122–A2148, 2014.
[3] Alen Alexanderian, Noemi Petra, Georg Stadler, and Omar Ghattas. A fast and scalable method for A-optimal design of experiments for infinite-dimensional Bayesian nonlinear inverse problems. SIAM Journal on Scientific Computing, 38(1):A243–A272, 2016.
[4] Alen Alexanderian, Noemi Petra, Georg Stadler, and Omar Ghattas. Mean-variance risk-averse optimal control of systems governed by PDEs with random parameter fields using quadratic approximations. SIAM/ASA Journal on Uncertainty Quantification, 5(1):1166–1192, 2017.
[5] Alen Alexanderian and Arvind K. Saibaba. Efficient D-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems. SIAM Journal on Scientific Computing, 40(5):A2956–A2985, 2018.
[6] Nick Alger, Peng Chen, and Omar Ghattas. Tensor train construction from tensor actions, with application to compression of large high order derivative tensors. SIAM Journal on Scientific Computing, 42(5):A3516–A3539, 2020.
[7] Ilona Ambartsumyan, Wajih Boukaram, Tan Bui-Thanh, Omar Ghattas, David Keyes, Georg Stadler, George Turkiyyah, and Stefano Zampini. Hierarchical matrix approximations of Hessians arising in inverse problems governed by PDEs. SIAM Journal on Scientific Computing, 42(5):A3397–A3426, 2020.
[8] Ahmed Attia, Alen Alexanderian, and Arvind K. Saibaba. Goal-oriented optimal design of experiments for large-scale Bayesian linear inverse problems. Inverse Problems, 34(9):095009, 2018.
[9] Tim Bakker, Herke van Hoof, and Max Welling. Experimental design for MRI by greedy policy search. Advances in Neural Information Processing Systems, 33, 2020.
[10] O. Bashir, K. Willcox, O. Ghattas, B. van Bloemen Waanders, and J. Hill. Hessian-based model reduction for large-scale systems with initial condition inputs. International Journal for Numerical Methods in Engineering, 73:844–868, 2008.
[11] Andrew An Bian, Joachim M. Buhmann, Andreas Krause, and Sebastian Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications. In Proceedings of the 34th International Conference on Machine Learning, pages 498–507. JMLR.org, 2017.
[12] Christos Boutsidis, Michael W. Mahoney, and Petros Drineas. An improved approximation algorithm for the column subset selection problem. CoRR, abs/0812.4293, 2008.
[13] Tan Bui-Thanh, Carsten Burstedde, Omar Ghattas, James Martin, Georg Stadler, and Lucas C. Wilcox. Extreme-scale UQ for Bayesian inverse problems governed by PDEs. In SC12: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2012.
[14] Tan Bui-Thanh, Omar Ghattas, James Martin, and Georg Stadler. A computational framework for infinite-dimensional Bayesian inverse problems Part I: The linearized case, with application to global seismic inversion. SIAM Journal on Scientific Computing, 35(6):A2494–A2523, 2013.
[15] Kathryn Chaloner and Isabella Verdinelli. Bayesian experimental design: A review. Statistical Science, 10(3):273–304, 1995.
[16] P. Chen, U. Villa, and O. Ghattas. Hessian-based adaptive sparse quadrature for infinite-dimensional Bayesian inverse problems. Computer Methods in Applied Mechanics and Engineering, 327:147–172, 2017.
[17] Peng Chen and Omar Ghattas. Hessian-based sampling for high-dimensional model reduction. International Journal for Uncertainty Quantification, 9(2), 2019.
[18] Peng Chen and Omar Ghattas. Projected Stein variational gradient descent. In Advances in Neural Information Processing Systems, 2020.
[19] Peng Chen, Michael R. Haberman, and Omar Ghattas. Optimal design of acoustic metamaterial cloaks under uncertainty. Journal of Computational Physics, page 110114, 2021.
[20] Peng Chen, Umberto Villa, and Omar Ghattas. Taylor approximation and variance reduction for PDE-constrained optimal control under uncertainty. Journal of Computational Physics, 385:163–186, 2019.
[21] Peng Chen, Keyi Wu, Joshua Chen, Thomas O'Leary-Roseberry, and Omar Ghattas. Projected Stein variational Newton: A fast and scalable Bayesian inference method in high dimensions. Advances in Neural Information Processing Systems, 2019.
[22] Peng Chen, Keyi Wu, and Omar Ghattas. Bayesian inference of heterogeneous epidemic models: Application to COVID-19 spread accounting for long-term care facilities. arXiv preprint arXiv:2011.01058, 2020.
[23] Benjamin Crestel, Alen Alexanderian, Georg Stadler, and Omar Ghattas. A-optimal encoding weights for nonlinear inverse problems, with application to the Helmholtz inverse problem. Inverse Problems, 33(7):074008, 2017.
[24] Yair Daon and Georg Stadler. Mitigating the influence of boundary conditions on covariance operators derived from elliptic PDEs. Inverse Problems and Imaging, 12(5):1083–1102, 2018.
[25] Angelie R. Ferrolino, Jose Ernie C. Lope, and Renier G. Mendoza. Optimal location of sensors for early detection of tsunami waves. In International Conference on Computational Science, pages 562–575. Springer, 2020.
[26] Pearl H. Flath, Lucas C. Wilcox, Volkan Akçelik, Judy Hill, Bart van Bloemen Waanders, and Omar Ghattas. Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations. SIAM Journal on Scientific Computing, 33(1):407–432, 2011.
[27] Adam Foster, Martin Jankowiak, Elias Bingham, Paul Horsfall, Yee Whye Teh, Thomas Rainforth, and Noah Goodman. Variational Bayesian optimal experimental design. In Advances in Neural Information Processing Systems, volume 32, pages 14036–14047. Curran Associates, Inc., 2019.
[28] Nathan Halko, Per Gunnar Martinsson, and Joel A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
[29] Xun Huan and Youssef M. Marzouk. Simulation-based optimal Bayesian experimental design for nonlinear systems. Journal of Computational Physics, 232(1):288–317, 2013.
[30] Xun Huan and Youssef M. Marzouk. Gradient-based stochastic optimization methods in Bayesian experimental design. International Journal for Uncertainty Quantification, 4(6):479–510, 2014.
[31] Xun Huan and Youssef M. Marzouk. Sequential Bayesian optimal experimental design via approximate dynamic programming. arXiv preprint arXiv:1604.08320, 2016.
[32] Tobin Isaac, Noemi Petra, Georg Stadler, and Omar Ghattas. Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet. Journal of Computational Physics, 296:348–368, September 2015.
[33] Jayanth Jagalur-Mohan and Youssef Marzouk. Batch greedy maximization of non-submodular functions: Guarantees and applications to experimental design. arXiv preprint arXiv:2006.04554, 2020.
[34] Kirthevasan Kandasamy, Willie Neiswanger, Reed Zhang, Akshay Krishnamurthy, Jeff Schneider, and Barnabas Poczos. Myopic posterior sampling for adaptive goal oriented design of experiments. In International Conference on Machine Learning, pages 3222–3232. PMLR, 2019.
[35] Steven Kleinegesse and Michael U. Gutmann. Bayesian experimental design for implicit models by mutual information neural estimation. In International Conference on Machine Learning, pages 5316–5326. PMLR, 2020.
[36] Quan Long, Marco Scavino, Raúl Tempone, and Suojin Wang. Fast estimation of expected information gains for Bayesian experimental designs based on Laplace approximations. Computer Methods in Applied Mechanics and Engineering, 259:24–39, 2013.
[37] Nora Loose and Patrick Heimbach. Leveraging uncertainty quantification to design ocean climate observing systems. Journal of Advances in Modeling Earth Systems, pages 1–29, January 2021.
[38] T. O'Leary-Roseberry, U. Villa, P. Chen, and O. Ghattas. Derivative-informed projected neural networks for high-dimensional parametric maps governed by PDEs. https://arxiv.org/abs/2011.15110, 2020.
[39] Noemi Petra, James Martin, Georg Stadler, and Omar Ghattas. A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet flow inverse problems. SIAM Journal on Scientific Computing, 36(4):A1525–A1555, 2014.
[40] Constantine Pozrikidis. An Introduction to Grids, Graphs, and Networks. Oxford University Press, 2014.
[41] Arvind K. Saibaba, Alen Alexanderian, and Ilse C. F. Ipsen. Randomized matrix-free trace and log-determinant estimators. Numerische Mathematik, 137(2):353–395, 2017.
[42] A. Spantini, T. Cui, K. Willcox, L. Tenorio, and Y. M. Marzouk. Goal-oriented optimal approximations of Bayesian linear inverse problems. SIAM Journal on Scientific Computing, 39(5):S167–S196, 2017.
[43] Keyi Wu, Peng Chen, and Omar Ghattas. A fast and scalable computational framework for large-scale and high-dimensional Bayesian optimal experimental design. arXiv preprint arXiv:2010.15196, 2020.
[44] Man-Chung Yue. A matrix generalization of the Hardy–Littlewood–Pólya rearrangement inequality and its applications. arXiv preprint arXiv:2006.08144, 2020.
[45] Sue Zheng, David Hayden, Jason Pacheco, and John W. Fisher III. Sequential Bayesian experimental design with variable cost structure. Advances in Neural Information Processing Systems, 33, 2020.

Appendix A: Proof of Theorem 3.2
We begin with the Weinstein–Aronszajn identity, Proposition 1, which is proven in [40].
Proposition 1. Let $A$ and $B$ be matrices of size $m\times n$ and $n\times m$, respectively. Then
$\det(I_{n\times n} + BA) = \det(I_{m\times m} + AB)$.   (33)
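A quick numerical check of (33) on random rectangular matrices (ours, purely illustrative):

```python
import numpy as np
rng = np.random.default_rng(1)
m, n = 4, 7
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))
# det(I_n + BA) = det(I_m + AB), Proposition 1
assert np.isclose(np.linalg.det(np.eye(n) + B @ A),
                  np.linalg.det(np.eye(m) + A @ B))
```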
Proof of Theorem 3.2. Consider the design problem defined in Section 2.2.2. For each design with design matrix $W$, we have
$F(W) = WF_d, \quad \Gamma_n(W) = W\Gamma_n^dW^T$.   (34)
We can then reformulate $\Psi_\rho$ from the definition in (18) as
$\Psi_\rho(W) = \tfrac12\log\det(I + \widetilde H^\rho_m) = \tfrac12\log\det\left(I + \Sigma_{\rm pr}^{1/2}(WF_dP^\dagger)^*\Gamma_\eta^{-1}(W)(WF_dP^\dagger)\Sigma_{\rm pr}^{1/2}\right)$,   (35)
where
$\Gamma_\eta(W) = \Gamma_n(W) + F(W)(\Gamma_{\rm pr} - \Gamma_{\rm pr}P^*\Sigma_{\rm pr}^{-1}P\Gamma_{\rm pr})F^*(W) = W\Gamma_n^dW^T + WF_d(\Gamma_{\rm pr} - \Gamma_{\rm pr}P^*\Sigma_{\rm pr}^{-1}P\Gamma_{\rm pr})F_d^*W^T = W\big(\Gamma_n^d + \underbrace{F_d\Gamma_{\rm pr}F_d^*}_{:=H_d} - \underbrace{F_d\Gamma_{\rm pr}P^*\Sigma_{\rm pr}^{-1}P\Gamma_{\rm pr}F_d^*}_{:=H^\rho_d}\big)W^T = W\big(\Gamma_n^d + \underbrace{H_d - H^\rho_d}_{:=\Delta H_d}\big)W^T$.   (36)
It follows that
$\Psi_\rho(W) = \tfrac12\log\det\left(I + \Sigma_{\rm pr}^{1/2}(WF_dP^\dagger)^*LL^T(WF_dP^\dagger)\Sigma_{\rm pr}^{1/2}\right) = \tfrac12\log\det\left(I + L^T(WF_dP^\dagger)\Sigma_{\rm pr}(WF_dP^\dagger)^*L\right) = \tfrac12\log\det\left(I + L^TWF_d\Gamma_{\rm pr}P^*\Sigma_{\rm pr}^{-1}\Sigma_{\rm pr}\Sigma_{\rm pr}^{-1}P\Gamma_{\rm pr}F_d^*W^TL\right) = \tfrac12\log\det\left(I + L^TWH^\rho_dW^TL\right)$,   (37)
where we use the Cholesky factorization $\Gamma_\eta^{-1} = LL^T$ in the first equality, Proposition 1 in the second, the definition of $P^\dagger$ from (16) in the third, and the definition of $H^\rho_d$ from (21) in the last. ∎

Appendix B: Proof of Theorem 3.3

We first collect necessary properties that are proven in [43] (Proposition 2), [5] (Proposition 3), and [44] (Proposition 4).
Proposition 2. Let $A$ and $B$ be matrices of size $m\times n$ and $n\times m$, respectively. Then $AB$ and $BA$ have the same non-zero eigenvalues.

Proposition 3. Let $A, B \in \mathbb{C}^{n\times n}$ be Hermitian positive semidefinite with $A \succeq B$ (i.e., $A - B$ is Hermitian positive semidefinite). Then
$0 \le \log\det(I + A) - \log\det(I + B) \le \log\det(I + A - B)$.   (38)

Proposition 4. Let $f: \mathbb{R}_+ \to \mathbb{R}$ be a continuous function that is differentiable on $\mathbb{R}_+$ (with $x \ge 0$ for $x \in \mathbb{R}_+$), such that $x \mapsto xf'(x)$ is monotonically increasing on $\mathbb{R}_+$. Then for any matrices $A, B \in \mathbb{R}^{n\times m}$, it holds that
$\sum_{i=1}^n f(\upsilon_i(AB^T)) \le \sum_{i=1}^n f(\upsilon_i(A)\upsilon_i(B))$,   (39)
where $\upsilon_i(\cdot)$ denotes the singular values of a matrix, sorted in non-increasing order.

Lemma 5. Let $A \in \mathbb{R}^{n\times m}$ and $B \in \mathbb{R}^{m\times m}$, with $A^TA$ and $B$ Hermitian positive semidefinite. Then
$\log\det(I + ABA^T) \le \sum_{i=1}^m \log(1 + \upsilon_i(A^TA)\upsilon_i(B))$.   (40)

Proof. Since $\log\det(I + ABA^T) = \sum_{i=1}^n \log(1 + \upsilon_i(ABA^T)) = \sum_{i=1}^n \log(1 + \upsilon_i(AB^{1/2})^2)$, applying Proposition 4 with $f(x) = \log(1 + x^2)$, which satisfies its assumptions, gives
$\sum_{i=1}^n \log(1 + \upsilon_i(AB^{1/2})^2) \le \sum_{i=1}^n \log(1 + \upsilon_i(A)^2\upsilon_i(B^{1/2})^2) = \sum_{i=1}^m \log(1 + \upsilon_i(A^TA)\upsilon_i(B))$.   (41)
∎

We are now in a position to prove Theorem 3.3.
Proof. Denote the eigenvalue decompositions of $H^\rho_d$ and $\Delta H_d$ as
$H^\rho_d = U_kZ_kU_k^T + U_\perp Z_\perp U_\perp^T, \quad \Delta H_d = V_l\Lambda_lV_l^T + V_\perp\Lambda_\perp V_\perp^T$,   (42)
where $(U_k, Z_k)$, $(V_l, \Lambda_l)$ are the dominant eigenpairs and $(U_\perp, Z_\perp)$, $(V_\perp, \Lambda_\perp)$ the remaining ones. By the triangle inequality,
$|\Psi_\rho(W) - \hat\Psi_\rho(W)| \le \underbrace{\left|\tfrac12\log\det(I_r + L^TWH^\rho_dW^TL) - \tfrac12\log\det(I_r + L^TW\hat H^\rho_dW^TL)\right|}_{(a)} + \underbrace{\left|\tfrac12\log\det(I_r + L^TW\hat H^\rho_dW^TL) - \tfrac12\log\det(I_r + \hat L^TW\hat H^\rho_dW^T\hat L)\right|}_{(b)}$.   (43)
We first bound (a). By Proposition 3, noting that $H^\rho_d - \hat H^\rho_d = U_\perp Z_\perp U_\perp^T$ is Hermitian positive semidefinite, we have
$(a) \le \tfrac12\log\det\left(I_r + L^TW(H^\rho_d - \hat H^\rho_d)W^TL\right) = \tfrac12\log\det\left(I_r + L^TWU_\perp Z_\perp U_\perp^TW^TL\right)$.   (44)
Applying Proposition 1,
$(a) \le \tfrac12\log\det\left(I_{d-k} + Z_\perp^{1/2}U_\perp^TW^TLL^TWU_\perp Z_\perp^{1/2}\right) = \tfrac12\log\det\left(I_{d-k} + Z_\perp^{1/2}U_\perp^TW^T\big(W(\Gamma_n^d + \Delta H_d)W^T\big)^{-1}WU_\perp Z_\perp^{1/2}\right)$.   (45)
Applying Lemma 5 with $A = Z_\perp^{1/2}U_\perp^TW^T$ and $B = \big(W(\Gamma_n^d + \Delta H_d)W^T\big)^{-1}$,
$(a) \le \tfrac12\sum_i \log\left(1 + \upsilon_i(WU_\perp Z_\perp U_\perp^TW^T)\,\upsilon_i\big((W(\Gamma_n^d + \Delta H_d)W^T)^{-1}\big)\right)$.   (46)
By Proposition 3.1, $\Delta H_d = \mathrm{Cov}[F_d(I - P^\dagger P)m]$ is a covariance matrix and hence positive semidefinite, so the smallest eigenvalue of $W(\Gamma_n^d + \Delta H_d)W^T$ is no smaller than the smallest eigenvalue of $\Gamma_n^d$, i.e., $\upsilon_i\big((W(\Gamma_n^d + \Delta H_d)W^T)^{-1}\big) \le 1/\sigma_{\min}^2$. Moreover, $\upsilon_i(WU_\perp Z_\perp U_\perp^TW^T) \le \upsilon_i(U_\perp Z_\perp U_\perp^T) = \zeta_{k+i}$. Thus
$(a) \le \frac12\sum_{i=k+1}^d \log(1 + \zeta_i/\sigma_{\min}^2)$.   (47)
We then turn to the second part (b). With Propositions 1 and 3,
$(b) = \left|\tfrac12\log\det\big(I_k + Z_k^{1/2}U_k^TW^T\hat L\hat L^TWU_kZ_k^{1/2}\big) - \tfrac12\log\det\big(I_k + Z_k^{1/2}U_k^TW^TLL^TWU_kZ_k^{1/2}\big)\right| \le \tfrac12\log\det\Big(I_k + Z_k^{1/2}U_k^TW^T\underbrace{(\hat L\hat L^T - LL^T)}_{(c)}WU_kZ_k^{1/2}\Big)$,   (48)
where $(c) = \big(W(\Gamma_n^d + \Delta\hat H_d)W^T\big)^{-1} - \big(W(\Gamma_n^d + \Delta H_d)W^T\big)^{-1}$. Using the identity $(A + B)^{-1} = A^{-1} - A^{-1}B(A + B)^{-1}$ with $A = W(\Gamma_n^d + \Delta\hat H_d)W^T$ and $B = W(\Delta H_d - \Delta\hat H_d)W^T = WV_\perp\Lambda_\perp V_\perp^TW^T$, we obtain
$(c) = \big(W(\Gamma_n^d + \Delta\hat H_d)W^T\big)^{-1}\,WV_\perp\Lambda_\perp V_\perp^TW^T\,\big(W(\Gamma_n^d + \Delta H_d)W^T\big)^{-1}$.   (49)
Hence, by Proposition 1,
$(b) \le \tfrac12\log\det\left(I_{d-l} + \Lambda_\perp^{1/2}V_\perp^TW^T\big(W(\Gamma_n^d + \Delta\hat H_d)W^T\big)^{-1}WU_kZ_kU_k^TW^T\big(W(\Gamma_n^d + \Delta H_d)W^T\big)^{-1}WV_\perp\Lambda_\perp^{1/2}\right)$.   (50)
Applying Lemma 5,
$(b) \le \tfrac12\sum_i \log\left(1 + \upsilon_i(WV_\perp\Lambda_\perp V_\perp^TW^T)\,\upsilon_i\big((W(\Gamma_n^d + \Delta\hat H_d)W^T)^{-1}WU_kZ_kU_k^TW^T(W(\Gamma_n^d + \Delta H_d)W^T)^{-1}\big)\right) \le \frac12\sum_{i=l+1}^k \log(1 + \lambda_i\zeta_1/\sigma_{\min}^4)$,   (51)
where in the last inequality we used, for $i \le k$,
$\upsilon_i\big((W(\Gamma_n^d + \Delta\hat H_d)W^T)^{-1}WU_kZ_kU_k^TW^T(W(\Gamma_n^d + \Delta H_d)W^T)^{-1}\big) \le \upsilon_1\big((W(\Gamma_n^d + \Delta\hat H_d)W^T)^{-1}\big)\,\upsilon_1(WU_kZ_kU_k^TW^T)\,\upsilon_1\big((W(\Gamma_n^d + \Delta H_d)W^T)^{-1}\big) \le \zeta_1/\sigma_{\min}^4$,   (52)
while the terms vanish for $i > k$ since $Z_k$ has rank at most $k$. Combining (47) and (51),
$|\Psi_\rho(W) - \hat\Psi_\rho(W)| \le (a) + (b) \le \frac12\sum_{i=k+1}^d \log(1 + \zeta_i/\sigma_{\min}^2) + \frac12\sum_{i=l+1}^k \log(1 + \lambda_i\zeta_1/\sigma_{\min}^4)$.   (53)
∎

Appendix C: Low-rank approximation
To compute the low-rank approximations of $\Delta H_d$ and $H^\rho_d$ described in Section 3.3, we present the randomized SVD algorithm applied to these two quantities. Recall the explicit forms
$H^\rho_d = F_d\Gamma_{\rm pr}P^*\Sigma_{\rm pr}^{-1}P\Gamma_{\rm pr}F_d^*, \quad \Delta H_d = F_d\Gamma_{\rm pr}F_d^* - F_d\Gamma_{\rm pr}P^*\Sigma_{\rm pr}^{-1}P\Gamma_{\rm pr}F_d^*$.   (54)
Algorithm 2: Randomized SVD to compute the $k$ dominant eigenpairs of a symmetric matrix $H$

1. Generate an i.i.d. Gaussian matrix $\Omega \in \mathbb{R}^{d\times(k+p)}$ with a small oversampling parameter $p$ (e.g., $p = 10$).
2. Compute $Y = H\Omega$.
3. Compute the QR factorization $Y = QR$ with $Q^TQ = I$.
4. Compute $B = Q^THQ$.
5. Solve the eigenvalue problem $B = Z\Sigma Z^T$.
6. Form $U_k = QZ[:, 1{:}k]$ and $\Sigma_k = \Sigma[1{:}k, 1{:}k]$.

We see that this is a matrix-free eigensolver: steps 2 and 4 require the action of $\Delta H_d$ on $O(2(l + p))$ vectors and of $H^\rho_d$ on $O(2(k + p))$ vectors. In terms of total actions, this requires $2(2l + k + p)$ applications of the forward operator $F$, $2(l + k + p)$ of its adjoint $F^*$, and $2(k + l + p)$ applications of the prediction operator $P$ and of its adjoint $P^*$.
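A sketch of Algorithm 2 follows (ours), written matrix-free in the style of [28]; `matvec` stands in for the operator actions, which in the paper require forward and adjoint PDE solves.

```python
import numpy as np

def randomized_eig(matvec, d, k, p=10, rng=np.random.default_rng(0)):
    Omega = rng.standard_normal((d, k + p))   # step 1: Gaussian test matrix
    Y = matvec(Omega)                         # step 2: sample the range of H
    Q, _ = np.linalg.qr(Y)                    # step 3: orthonormal range basis
    B = Q.T @ matvec(Q)                       # step 4: small projected matrix
    lam, Z = np.linalg.eigh(B)                # step 5: dense eigensolve
    lam, Z = lam[::-1], Z[:, ::-1]            # descending order
    return lam[:k], Q @ Z[:, :k]              # step 6: k dominant eigenpairs

# Usage on a synthetic symmetric matrix with rapidly decaying spectrum:
rng = np.random.default_rng(1)
d, k = 500, 20
Qd, _ = np.linalg.qr(rng.standard_normal((d, d)))
H = (Qd * 2.0 ** (-np.arange(d, dtype=float))) @ Qd.T
lam, U = randomized_eig(lambda X: H @ X, d, k)
```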
For the contaminant problem of Section 4.1, the concentration field $u(x, t)$ is governed by
$u_t - k\Delta u + v\cdot\nabla u = 0 \text{ in } D\times(0,T), \quad u(\cdot, 0) = m \text{ in } D, \quad k\nabla u\cdot n = 0 \text{ on } \partial D\times(0,T)$,   (55)
and we can form the parameter-to-observable map $Fm$ as the discretized value of $\mathcal{B}u(m)$, where $\mathcal{B}$ is the pointwise observation operator. The adjoint problem is a terminal value problem, which can be solved backwards in time,
$-p_t - \nabla\cdot(pv) - k\Delta p = \mathcal{B}^*y \text{ in } D\times(0,T), \quad p(\cdot, T) = 0 \text{ in } D, \quad (pv + k\nabla p)\cdot n = 0 \text{ on } \partial D\times(0,T)$.   (56)
We can then define the adjoint of the parameter-to-observable map, $F^*y$, as the discretized value of $p(x, 0)$ for any $y$.