Domain-decomposition least-squares Petrov-Galerkin (DD-LSPG) nonlinear model reduction
DDomain-decomposition least-squares Petrov–Galerkin (DD-LSPG)nonlinear model reduction
Chi Hoang ∗ Youngsoo Choi † Kevin Carlberg ‡ Abstract
While reduced-order models have demonstrated success in many applications across computationalscience and engineering, they encounter challenges when applied both to truly extreme-scale modelsdue to the prohibitive cost of generating requisite training data, and to decomposable systems due tomany-query problems (e.g., design) often requiring repeated reconfigurations of system components.To address these challenges, we propose the domain-decomposition least-squares Petrov–Galerkin (DD-LSPG) model-reduction method applicable to parameterized systems of nonlinear algebraic equations(e.g., arising from discretizing a parameterized partial-differential-equations problem). In contrast withprevious works, we adopt an algebraically non-overlapping decomposition strategy rather than a spatial-decomposition strategy, which facilitates application to different spatial-discretization schemes. Ratherthan constructing a low-dimensional subspace for the entire state space in a monolithic fashion—whichwould be infeasible for extreme-scale systems and decomposable models—the methodology constructsseparate subspaces for the different subdomains/components characterizing the original model. Duringthe offline stage, the method constructs low-dimensional bases for the interior and interface of subdo-mains/components. During the online stage, the approach constructs an LSPG reduced-order model foreach subdomain/component (equipped with hyper-reduction in the case of nonlinear operators), and en-forces strong or weak compatibility on the ‘ports’ connecting them. We propose several different strategiesfor defining the ingredients characterizing the methodology: (i) four different ways to construct reducedbases on the interface/ports of subdomains, and (ii) different ways to enforce compatibility across con-necting ports. In particular, we show that the appropriate compatibility-constraint strategy dependsstrongly on the basis choice. In addition, we derive a posteriori and a priori error bounds for the DD-LSPG solutions. Numerical results performed on nonlinear benchmark problems in heat transfer andfluid dynamics that employ both finite-element and finite-difference spatial discretizations demonstratethat the proposed method performs well in terms of both accuracy and (parallel) computational cost,with different choices of basis and compatibility constraints yielding different performance profiles.
Keywords : domain decomposition, substructuring, model reduction, least-squares Petrov–Galerkinprojection, error bounds
Many tasks in computational science and engineering are many query in nature, as they require the repeatedsimulation of a parameterized large-scale computational model. Model reduction has become a popularapproach to make such tasks tractable. Most of such techniques first perform an “offline” training stage thatsimulates the computational model for multiple input-parameter instances; then, during an “online” deployed ∗ Extreme-scale Data Science and Analytics Department, Sandia National Laboratories, Livermore, CA 94550 ([email protected]). Sandia National Laboratories is a multimission laboratory managed and operated by National Technologyand Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Departmentof Energy’s National Nuclear Security Administration under contract DE-NA-0003525. † Lawrence Livermore National Laboratory, Livermore, CA 94550 ([email protected]). Lawrence Livermore National Labora-tory is operated by Lawrence Livermore National Security, LLC, for the U.S. Department of Energy, National Nuclear SecurityAdministration under Contract DE-AC52-07NA27344 (LLNL-JRNL-812648). ‡ Mechanical Engineering and Applied Mathematics, University of Washington, Seattle, WA 98195 ([email protected]). a r X i v : . [ m a t h . NA ] J u l tage, these techniques reduce the dimensionality and complexity of the original computational model atarbitrary input-parameter instances by performing a projection process of the original computational modelonto a low-dimensional subspace or manifold.While such reduced-order models (ROMs) have demonstrated success in many applications, challengesarise when applying model reduction either to extreme-scale models or to decomposable systems , i.e., systemscomposed of well-defined components. In the former case, the extreme-scale nature of the original compu-tational model renders the offline training simulations infeasible. In the latter case, the many-query taskoften involves design, wherein components are swapped or their interconnecting topology is modified; in thiscase, the state space characterizing the original computational model changes substantially between queries,rendering training simulations (which assume a fixed state space) challenging.To date, researchers have developed several methods to enable model reduction for decomposable systems.During the offline stage, these approaches construct a unique reduced basis for each component; duringthe online stage, they formulate a reduced-order model for the full system using domain-decompositionapproaches that enforce solution compatibility along component interfaces. Most approaches to date havebeen developed for parameterized linear partial differential equations (PDEs).Reduced basis element (RBE) methods, which comprise a family of domain-decomposition reduced-order model (DDROM) techniques, are applicable to linear PDEs [1, 2, 3, 4, 5, 6]. Maday et al. [1, 2]proposed the very first work of this family; this approach combines the reduced-basis (RB) method withdomain decomposition (DD), using full-subdomain bases and “gluing” the subdomain interfaces weakly viaLagrange multipliers. The full-subdomain bases are built in the offline stage, while in the online stage asaddle point problem [7] is solved to compute the solution for any input-parameter instance. The reducedbasis hybrid method (RBHM), which was proposed later by Iapichino and coworkers [3], modifies the RBEby including the finite element (FE) coarse solutions in the reduced bases (in the online stage) to recoverthe nonzero normal stress component of the final solution. The RBE and RBHM were employed to solve thesteady Stokes problem with applications in cardiovascular networks [1], [3]. In the reduced-basis–domain-decomposition–finite-element (RDF) method [6], the same authors proposed to separate the global DOFsinto all subdomain interior DOFs and “skeleton” DOFs, then approximate all subdomain interior DOFsby RB method. The unknowns in the final reduced linear system comprise the generalized coordinatesassociated with all subdomain interiors and FE degrees of freedom on the skeleton. Similar in concept, thestatic condensation reduced basis element (SCRBE) method proposed by Huynh et al. [4, 8] decomposesthe “skeleton” DOFs further into “port” DOFs on each subdomain, where a subdomain can have multiplenonoverlapping ports. SCRBE employs a primal-Schur domain-decomposition method to assemble and solvethe resulting system. In particular, Ref. [4] carefully constructs interface bases to represent all possiblevariations of the solution on the skeleton of the global domain. While this is a robust and comprehensiveapproach to compute the skeleton solution, it also incurs a high computational cost: the dimension of theSchur-complement system is equal to the number of FE degrees of freedom across all ports, which canremain large scale for fine spatial discretizations. To address this, Ref. [5] applies “adaptive port reduction”to reduce the number of port degrees of freedom and hence the dimensionality and cost of solving theSchur-complement system.Besides the RBE family mentioned above, researchers have developed other DDROM methods to solveparameterized linear PDEs in the context of multiscale heterogeneous materials analysis. These methodsinclude the multiscale reduced basis method (MsRBM) [9], FE -based model order reduction method [10],the localized reduced basis multiscale method (LRBMS) [11, 12], the reduced basis localized orthogonaldecomposition method (RB-LOD) [13], the reduced basis method for heterogeneous domain decomposition(RBHDD) [14] and recently the ArbiLoMod method [15]. In addition, we are also aware of the use ofDDROM in the work of graphic community, for example (not a comprehensive list), [16, 17] deal withnonlinear problems while [18, 19] handle linear problems. The work [20] solves nonlinear problems using aFOM-ROM hybrid approach that will be described in next paragraph.While some DDROM techniques have been applied to nonlinear PDEs, most of these techniques are Note that “full-subdomain bases” here include all degrees of freedom (DOFs) of a subdomain: both interior and interfaceDOFs. subset of the physical domain, and apply thehigh-fidelity model elsewhere; compatibility between the ROM and high-fidelity-model solutions is enforcedusing non-overlapping domain decomposition methods and some multiscale homogenization assumptions[10]. For example, in the work by Buffoni and coworkers [21], the authors implemented the overlappingclassical Schwarz method (using Dirichlet–Neumann iterations [22]) and divided the computational domaininto two subdomains. The high-fidelity-model subdomain is discretized using a standard method (e.g., fi-nite difference, finite element), while the ROM subdomain employes a snapshot-based proper orthogonaldecomposition (POD) technique [23] with subdomain bases. Solution compatibility on the interface holdsweakly through the enforcement of continuity of normal derivatives of the trace of the solutions on theinterface. In another work by Kerfriden et al. [24], the authors used a primal-Schur domain-decompositionmethod combined with a snapshot-based POD ROM subdomain to solve nonlinear fracture-mechanics prob-lems. In particular, the approach approximates the interior DOFs of linear subdomains with snapshot-POD(further reduction with the hyper-reduction technique DEIM [25] due to nonaffine parameter dependence)and use a full-order model (FOM) on nonlinear damaged subdomains. The Schur-complement system isformed by enforcing strong (i.e., node pairwise) compatibility between ROM and FOM subdomains andcondenses out only the generalized coordinates characterizing the ROM subdomains, rendering the Schur-complement system high-dimensional. With similar FOM/ROM hybrid idea, the DD-POD method [26] usesthe Gravouil–Combescure domain-decomposition approach [27] to solve elastic–plastic structural dynamicsproblems. The method divides the domain of interest into subdomains; during the online stage, a plasticcheck is performed on each subdomain to determine whether ROM or FOM approximations will be imple-mented in that subdomain. Again, full-subdomain bases are used in the linear-elastic subdomains and weakcompatibility constraints are used on the interface. Baiges and coworkers [28] used a primal-dual monolithicapproach to solve incompressible Navier–Stokes equations with overlapping domain decomposition. The ap-proach also comprises a FOM/ROM hybrid wherein the physical domain is decomposed into FOM, ROM andoverlapping subdomains. The ROM subdomains use full-subdomain bases and are further hyper-reduced bya discrete variant [29] of the best point interpolation method, while the overlapping/interface regions enforcevelocity continuity, which corresponds to a weak compatibility constraint.This work aims to overcome several shortcomings of existing works. First, the only available DDROMmethods for nonlinear PDEs employed a hybrid ROM/FOM approach; a “complete ROM” methodologyappears to be missing for nonlinear problems. Second, most previously developed DDROM methods were ap-plied to self-adjoint problems and thus constrained optimization problems could be derived from a Galerkin-projection perspective; the extension of many methods to non-self-adjoint problems is unclear. Finally, mostof the above approaches (with the exception of SCRBE [4, 8]) employ “full-subdomain” bases with supportover both interior and interface degrees of freedom. Such bases only are generally compatible only with weakconstraints (see, e.g., [1, 3, 21, 26]), which precludes an equivalent global solution due to non-uniqueness ofthe solution on the interfaces. To address these shortcomings, this work is characterized by the followingnovel features: • We consider parameterized systems of nonlinear algebraic equations, and adopt an algebraically non-overlapping decomposition strategy rather than a spatial-decomposition strategy, which facilitatesapplication to models derived using different discretization methods. • We develop a “complete ROM” approach that applies model reduction to all degrees of freedom char-acterizing the nonlinear algebraic system; thus it is not a ROM/FOM hybrid. • We formulate a constrained optimization problem for the global problem by equipping the least-squares Petrov–Galerkin (LSPG) [30, 31, 32, 33] projection (with hyper-reduction [34]) with interface-compatibility constraints. We employ a sequential quadrating programming (SQP) method to solvethe resulting optimization problem. Critically, this formulation is valid for both self-adjoint and non-self-adjoint problems. • We propose four different subdomain basis types, including the classical “full-subdomain” type andthree “interface/boundary” types: port, skeleton, and full-interface. Consequently, the characterization3f the solution on the interfaces has much greater flexibility than in previous contributions. • Support for both strong and weak compatibility constraints on the interfaces for all basis types. Inparticular, we show that the best choice for compatibility constraints is strongly dependent on thesubdomain-basis type (i.e., weak compatibility is best for full-subdomain and full-interface bases; strongcompatibility is best for port and skeleton bases). • Both a posteriori and a priori error bounds for the method, which illustrate how the error on eachsubdomain and port can be bounded using global quantities. • Numerical experiments on benchmark problems in heat transfer and fluid dynamics that employ bothfinite-element and finite-difference discretizations that systematically assess the effect of all methodparameters on accuracy and computational cost, lending deep insights into the performance aspects ofthe proposed methodology.The paper is structured as follows. Section 2 formulates the full-order model and algebraically non-overlapping decomposition that characterizes our domain-decomposition strategy. Section 3 describes theproposed DD-LSPG framework, including the two proposed choices for subdomain reduced bases (Sections 3.1and 3.2), and strong vs. weak compatibility constraints (Section 3.3). Section 4 describes the proposed SQPsolver used to numerically solve the constrained optimization problem characterizing DD-LSPG projection,its particularization to the two types of subdomain reduced bases (Sections 4.1 and 4.2) and its serial/parallelcosts (Section 5). Section 6 describes the offline algorithms for constructing interior/boundary bases (Section6.1) and full-subdomain bases (Section 6.2). Section 7 derives a posteriori and a priori error bounds for themethod. Section 8 reports numerical experiments on a benchmark problem in heat transfer that employsa finite-element discretization (Section 8.1) and a benchmark problem in fluid dynamics that employs afinite-difference discretization (Section 8.2). Finally, Section 9 concludes the paper.
We consider the (high-fidelity) full-order model to be expressed as a parameterized system of nonlinearalgebraic equations r ( x ; µ ) = , (2.1)where the residual r : R n × D → R n is nonlinear in (at least) its first argument, µ ∈ D ⊆ R n µ denotesthe parameters, and x : D → R n denotes the state, which is implicitly defined as the solution to Eq. (2.1)given an instance of the parameters. Such problems arise, for example, after applying spatial discretizationto a stationary PDE problem; because we take Eq. (2.1) to be our full-order model, our methodology is spatial-discretization agnostic . For notational simplicity, we suppress all dependence on the parameters µ until needed in Section 6.We consider an algebraic decomposition of this problem into n Ω ( ≤ n ) ‘subdomains’ such that the residualsatisfies r : w (cid:55)→ n Ω (cid:88) i =1 [ P r i ] T r i ( P Ω i w , P Γ i w ) , ∀ w ∈ R n . (2.2)Here, r i : R n Ω i × R n Γ i → R n r i with r i : ( w Ω i , w Γ i ) (cid:55)→ r i ( w Ω i , w Γ i ) denotes the i th subdomain residual, P r i ∈ { , } n r i × n denotes i th the residual sampling matrix, P Ω i ∈ { , } n Ω i × n denotes the i th interior-statesampling matrix, and P Γ i ∈ { , } n Γ i × n denotes the i th interface-state sampling matrix; each samplingmatrix comprises selected rows of the n × n identity matrix. The residual sampling matrix is such that thedecomposition is algebraically non-overlapping , i.e., P r i [ P r j ] T = for i (cid:54) = j and (cid:80) n Ω i =1 n r i = n . Further, theinterior-state sampling matrix satisfies P Ω i [ P Ω j ] T = for i (cid:54) = j ; this implies that there is no overlap betweenthe interior states associated with different subdomains. Thus, the operators P Ω i and P Γ i , i = 1 , . . . , n Ω aredetermined from the sparsity patterns of the sampled Jacobians P r i ∂ r ∂ w , i = 1 , . . . , n Ω . We define the totalnumber of degrees of freedom for each subdomain as n i := n Ω i + n Γ i ; note that n i ≥ n r i .4f we set x Ω i := P Ω i x ∈ R n Ω i and x Γ i := P Γ i x ∈ R n Γ i , then from Eqs. (2.1)–(2.2), the solution for eachsubdomain x i := ( x Ω i , x Γ i ) satisfies r i ( x Ω i , x Γ i ) = , i = 1 , . . . , n Ω (2.3)along with compatibility conditions that enforce consistency across the boundary states for different subdo-mains. To reason about these compatibility conditions, we define a set of n p ‘ports’; the j th port is charac-terized by n pj ≤ n states that are shared across a fixed set of subdomains denoted by P ( j ) ⊆ { , . . . , n Ω } .Then, the compatibility conditions can be expressed as P ji x Γ i = P j(cid:96) x Γ (cid:96) , i, (cid:96) ∈ P ( j ) , j = 1 , . . . , n p , (2.4)where the port sampling matrix P ji ∈ { , } n pj × n Γ i comprises selected rows of the n Γ i × n Γ i identity matrix.For a given subdomain i , we require the ports to be non-overlapping such that P ji [ P (cid:96)i ] T = for j, (cid:96) ∈ Q ( i )and j (cid:54) = (cid:96) and (cid:80) j ∈ Q ( i ) n pj = n Γ i , where we have defined the set of ports associated with subdomain i as Q ( i ) := { j | i ∈ P ( j ) } ⊆ { , . . . , n p } . We note that for a given port j , although the number of total pairwisecompatibility conditions arising from Eq. (3.3) is (cid:0) k (cid:1) , the number of unique pairwise compatibility conditionsis only n pair j := | P ( j ) | −
1. Using this formulation, the full-order model (2.1) can be recast in decomposedform as r i ( x Ω i , x Γ i ) = , i = 1 , . . . , n Ω n Ω (cid:88) i =1 ¯ A i x Γ i = , (2.5)where ¯ A i ∈ {− , , } n ¯ A × n Γ i with n ¯ A = (cid:80) n p j =1 n pair j n pj denote the constraint matrices associated with port-compatibility conditions (3.3). Note that Eqs. (2.5) comprise (cid:80) n Ω i =1 n r i + n ¯ A = n + n ¯ A equations in (cid:80) n Ω i =1 n Ω i + (cid:80) n Ω i =1 n Γ i unknowns; as there exists a unique solution to these equations, we have n + n ¯ A ≥ (cid:80) n Ω i =1 n Ω i + (cid:80) n Ω i =1 n Γ i . For illustration, Figure 1 shows an example of a decomposition using n Ω = 4 subdomains and n p = 5global ports for the case of a full-order model derived from discretizing a PDE in two spatial dimensionsusing a residual operator with a 9-point stencil. Figure 2 shows the degrees of freedom and residual elementsassociated with subdomain Ω . We now consider applying least-squares Petrov–Galerkin (LSPG) model reduction [30, 31, 32, 33] in thedomain-decomposition setting presented in Section 2.
Assume we have constructed interior reduced bases Φ Ω i ∈ R n Ω i × p Ω i (cid:63) with p Ω i ≤ n Ω i , i = 1 , . . . , n Ω and interfacereduced bases Φ Γ i ∈ R n Γ i × p Γ i (cid:63) with p Γ i ≤ n Γ i , i = 1 , . . . , n Ω , where R m × n(cid:63) denotes the non-compact Stiefelmanifold: the set of full-column-rank m × n real-valued matrices; Section 6 described proposed approachesfor constructing these bases. We then approximate the solution on the i th subdomain in the associated p i -dimensional trial subspace with p i = p Ω i + p Γ i as x i ≈ ˜ x i ≡ (˜ x Ω i , ˜ x Γ i ) = ( Φ Ω i ˆ x Ω i , Φ Γ i ˆ x Γ i ) ∈ Ran( Φ Ω i ) × Ran( Φ Γ i ) ⊆ R n Ω i × R n Γ i . We formulate the domain-decomposition LSPG (DD-LSPG) reduced-order model by minimizing the sumof squared residual norms over these trial subspaces subject to (possibly weak) port-compatibility conditions,5 Ω Ω Ω Γ Γ Γ Γ Ω P P P P P corner node of Ω corner node of Ω corner node of Ω corner node of Ω Figure 1: Domain-decomposition example: full-order model derived from discretizing a PDE in two spatialdimensions using a residual operator with a 9-point stencil. (Left) Residual and corner nodes. (Right)Each colored point is associated with a residual for the associated subdomain, n Ω = 4 subdomains, n p = 5ports, P (1) = { , } , P (2) = { , } , P (3) = { , } , P (4) = { , } , P (5) = { , , , } ; and Q (1) = { , , } , Q (2) = { , , } , Q (3) = { , , } , Q (4) = { , , } . Ports P , P , P , P are two-components ports, and port P is four-components port.i.e., we compute (ˆ x Ω i , ˆ x Γ i ), i = 1 , . . . , n Ω as the solution to the optimization problemminimize ( ˆ w Ω i , ˆ w Γ i ) , i =1 ,...,n Ω n Ω (cid:88) i =1 (cid:107) B i r i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) (cid:107) subject to n Ω (cid:88) i =1 A i Φ Γ i ˆ w Γ i = . (3.1)Here, A i ∈ R n A × n Γ i , i = 1 , . . . , n Ω denote constraint matrices (see Section 3.3 for how this can be derivedfrom the constraint matrices ¯ A i , i = 1 , . . . , n Ω ) with n A ≤ n ¯ A denotes the number of constraints incurredby port compatibility, and B i ∈ R n Bi × n r i , i = 1 , . . . , n Ω with n Bi ≤ n r i denotes a matrix that enables thesubdomain residuals to be minimized in any weighted (cid:96) -(semi)norm.In particular, we focus on three choices of matrix B i : B i = I as in “standard” LSPG, B i = Z i in thecase of collocation hyper-reduction [31, 35, 36], and B i = ( Z i Φ ri ) + Z i in the case of gappy POD hyper-reduction [32, 34, 37]. Here, Z i := [ e ξ i . . . e ξ nzii ] T ∈ { , } n zi × n r i comprises selected rows of the n r i × n r i identity matrix, e i denotes the i th Kronecker vector, and { ξ i , . . . , ξ n zi i } ⊆ { , . . . , n r i } denotes the indicesof the residual elements sampled by the operator. On the other hand, Φ ri ∈ R n r i × p ri (cid:63) , i = 1 , . . . , n Ω denotereduced bases for the residual and the superscript + denotes the Moore–Penrose pseudoinverse. For thepseudoinverses to correspond to left inverses, the matrices Z i Φ ri , i = 1 , . . . , n Ω must have full column rank,6 x Ω i x p x p x p r i x Γ i x i Figure 2: Domain-decomposition example: full-order model derived from discretizing a PDE in two spatialdimensions using a residual operator with a 9-point stencil. This figure considers the bottom-left subdomainΩ of Fig. 1: interior DOFs (blue circles), interface DOFs (nodes included in yellow rectangle), residualDOFs (nodes included in green rectangle) and subdomain DOFs (nodes included in purple rectangle). Theremaining boundary nodes correspond to Dirichlet boundary conditions.which in turn necessitates n zi ≥ p ri , i = 1 , . . . , n Ω . Note that n Bi = n zi for collocation hyper-reduction and n Bi = p ri in the case of gappy POD hyper-reduction. We note that hyper-reduction is required to ensure thatcomputing the ROM solution incurs an n -independent operation count. We also consider a variation on this formulation corresponding to the case of classical full-subdomain reducedbases [1, 3, 21, 26]. In this case, each subdomain is equipped with a single reduced basis Φ i ∈ R n i × p i (cid:63) whosecolumns can have support over both interior and interface degrees of freedom such that Φ Ω i = P Ω i Φ i ∈ R n Ω i × p Ω i and Φ Γ i = P Γ i Φ i ∈ R n Γ i × p Γ i with p Ω i = p Γ i = p i ; note that the reduced bases Φ Ω i and Φ Γ i neednot have full column rank individually. Approximating the solution on the i th subdomain as x i ≈ ˜ x i ≡ (˜ x Ω i , ˜ x Γ i ) = ( Φ Ω i ˆ x i , Φ Γ i ˆ x i ), the resulting DD-LSPG model computes solutions ˆ x i , i = 1 , . . . , n Ω as the solutionto the optimization problem minimize ˆ w i , i =1 ,...,n Ω n Ω (cid:88) i =1 (cid:107) B i r i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) (cid:107) subject to n Ω (cid:88) i =1 A i Φ Γ i ˆ w i = . (3.2)We again consider the choices of B i = I , B i = Z i , and B i = ( Z i Φ ri ) + Z i .For both Problems (3.1) and (3.2), the effective number of degrees of freedom in the resulting ROMcorresponds to p = (cid:80) n Ω i =1 p i − rank( A ). Here, we have defined the reduced constraint matrix as A :=[ A Φ Γ1 · · · A n Ω Φ Γ n Ω ]. This result holds because the null space of the operator A defines the effectivesubspace over which unconstrained minimization takes place.7 .3 Strong versus weak compatibility constraints Recall that the constraint matrices ¯ A i , i = 1 , . . . , n Ω are derived by enforcing the degrees of freedom on eachport to be consistent across shared subdomains according to Eq. (3.3). We can effectively reduce the numberof constraints by weakening this notion of consistency through enforcing a zero inner product between thedifference between port solutions and a collection of prescribed test functions, i.e., C j P ji x Γ i = C j P j(cid:96) x Γ (cid:96) , i, (cid:96) ∈ P ( j ) , j = 1 , . . . , n p , (3.3)where C j ∈ R n cj × n pj with n cj ≤ n pj and rank( C j ) = n cj denotes the matrix of test functions. If we assemble aconstraint matrix from (weak) compatibility conditions (3.3) in the same manner that the original constraintmatrices ¯ A i , i = 1 , . . . , n Ω were assembled from (strong) compatibility conditions (3.3), we obtain theconstraint matrices A i ∈ R n A × n Γ i , i = 1 , . . . , n Ω with n A = (cid:80) n p j =1 n pair j n cj that have the structure A i = C ¯ A i for some C ∈ R n A × n ¯ A , with C = I in the case of strong compatibility constraints (3.3). Remark 1.
Critically, the case of weak compatibility constraints (i.e., n cj < n pj ) admits discrepancies be-tween the restrictions of subdomain solutions to the j th port, i.e., Eq. (3.3) will not hold in general. Thisphenomenon precludes the existence of a ‘global solution’, as ports may not have a uniquely computed solu-tion. While this may appear to be deleterious to the accuracy of the computed DD-LSPG solution, we showin the numerical experiments that this relaxation is critical to obtain accurate solutions when neighboringcomponents have incompatible bases on the associated port; this occurs in particular for interface and full-subdomain bases. For such bases, enforcing strong compatibility constraints effectively causes the subdomainsto generate the trivial solution on the associated ports, yielding poor overall solution accuracy, even if thesolution on the ports is uniquely defined. Problems (3.1)–(3.2) can be classified as a nonlinear least-squares problems with linear equality constraints .As such, they are well-suited to solution with a sequential quadratic programming (SQP) method, which inthis case is equivalent to applying Newton’s method to the Karush–Kuhn–Tucker (KKT) necessary conditionsfor optimality. This section describes this solution approach.
We begin by defining the Lagrangian associated with problem (3.1) L : ( ˆ w Ω1 , ˆ w Γ1 , . . . , ˆ w Ω n Ω , ˆ w Γ n Ω , γ ) (cid:55)→ n Ω (cid:88) i =1 (cid:107) B i r i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) (cid:107) + n Ω (cid:88) i =1 γ T A i Φ Γ i ˆ w Γ i , (4.1)where γ ∈ R n A denotes Lagrange multipliers. The KKT conditions arise from stationarity of the Lagrangian,i.e., the DD-LSPG ROM solution (ˆ x Ω1 , ˆ x Γ1 , . . . , ˆ x Ω n Ω , ˆ x Γ n Ω , λ ) satisfies ∂L∂ ˆ w Ω i (ˆ x Ω1 , ˆ x Γ1 , . . . , ˆ x Ω n Ω , ˆ x Γ n Ω , λ ) = ˆ r Ω i (ˆ x Ω i , ˆ x Γ i ) = , i = 1 , . . . , n Ω ∂L∂ ˆ w Γ i (ˆ x Ω1 , ˆ x Γ1 , . . . , ˆ x Ω n Ω , ˆ x Γ n Ω , λ ) = ˆ r Γ i (ˆ x Ω i , ˆ x Γ i , λ ) = , i = 1 , . . . , n Ω ∂L∂ γ (ˆ x Ω1 , ˆ x Γ1 , . . . , ˆ x Ω n Ω , ˆ x Γ n Ω , λ ) = n Ω (cid:88) i =1 A i Φ Γ i ˆ x Γ i = , (4.2)8here we have definedˆ r Ω i : ( ˆ w Ω i , ˆ w Γ i ) (cid:55)→ [ Φ Ω i ] T ∂ r i ∂ x Ω i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) T [ B i ] T B i r i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i )ˆ r Γ i : ( ˆ w Ω i , ˆ w Γ i , γ ) (cid:55)→ [ Φ Γ i ] T ∂ r i ∂ x Γ i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) T [ B i ] T B i r i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) + [ Φ Γ i ] T A Ti γ (4.3)for i = 1 , . . . , n Ω . Applying Newton’s method with a Gauss–Newton Hessian approximation to solve thesystem of nonlinear algebraic equations (4.2) yields the SQP iterations for k = 0 , . . . , K H ΩΩ1 (ˆ x Ω( k )1 , ˆ x Γ( k )1 ) H ΩΓ1 (ˆ x Ω( k )1 , ˆ x Γ( k )1 ) . . . H ΓΩ1 (ˆ x Ω( k )1 , ˆ x Γ( k )1 ) H ΓΓ1 (ˆ x Ω( k )1 , ˆ x Γ( k )1 ) . . . [ Φ Γ1 ] T A T ... ... . . . ... ... ... . . . H ΩΩ n Ω (ˆ x Ω( k ) n Ω , ˆ x Γ( k ) n Ω ) H ΩΓ n Ω (ˆ x Ω( k ) n Ω , ˆ x Γ( k ) n Ω )
00 0 . . . H ΓΩ n Ω (ˆ x Ω( k ) n Ω , ˆ x Γ( k ) n Ω ) H ΓΓ n Ω (ˆ x Ω( k ) n Ω , ˆ x Γ( k ) n Ω ) [ Φ Γ n Ω ] T A Tn Ω A Φ Γ1 . . . A n Ω Φ Γ n Ω p Ω( k )1 p Γ( k )1 ... p Ω( k ) n Ω p Γ( k ) n Ω p λ ( k ) = − ˆ r Ω1 (ˆ x Ω( k )1 , ˆ x Γ( k )1 )ˆ r Γ1 (ˆ x Ω( k )1 , ˆ x Γ( k )1 )...ˆ r Ω n Ω (ˆ x Ω( k ) n Ω , ˆ x Γ( k ) n Ω )ˆ r Γ n Ω (ˆ x Ω( k ) n Ω , ˆ x Γ( k ) n Ω ) (cid:80) n Ω i =1 A i Φ Γ i ˆ x Γ i (4.4)where H ΩΩ i : ( ˆ w Ω i , ˆ w Γ i ) (cid:55)→ [ Φ Ω i ] T ∂ r i ∂ w Ω i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) T [ B i ] T B i ∂ r i ∂ w Ω i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) Φ Ω i H ΩΓ i : ( ˆ w Ω i , ˆ w Γ i ) (cid:55)→ [ Φ Ω i ] T ∂ r i ∂ w Ω i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) T [ B i ] T B i ∂ r i ∂ w Γ i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) Φ Γ i H ΓΩ i : ( ˆ w Ω i , ˆ w Γ i ) (cid:55)→ [ Φ Γ i ] T ∂ r i ∂ w Γ i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) T [ B i ] T B i ∂ r i ∂ w Ω i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) Φ Ω i H ΓΓ i : ( ˆ w Ω i , ˆ w Γ i ) (cid:55)→ [ Φ Γ i ] T ∂ r i ∂ w Γ i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) T [ B i ] T B i ∂ r i ∂ w Γ i ( Φ Ω i ˆ w Ω i , Φ Γ i ˆ w Γ i ) Φ Γ i (4.5)for i = 1 , . . . , n Ω . We can then update the solution asˆ x Ω( k +1) i = ˆ x Ω( k ) i + α ( k ) p Ω( k ) i , i = 1 , . . . , n Ω ˆ x Γ( k +1) i = ˆ x Γ( k ) i + α ( k ) p Γ( k ) i , i = 1 , . . . , n Ω λ ( k +1) = λ ( k ) + α ( k ) p λ ( k ) , (4.6)where α ( k ) is a step length that can be computed, e.g., via line search. We note that the block sparsestructure of SQP iterations (4.4) admit interesting parallel-solution strategies, which is the subject of futurework. Analogously to Section 4.1, the Lagrangian associated with problem (3.2) is defined as L : ( ˆ w , . . . , ˆ w n Ω , γ ) (cid:55)→ n Ω (cid:88) i =1 (cid:107) B i r i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) (cid:107) + n Ω (cid:88) i =1 γ T A i Φ Γ i ˆ w i , (4.7)9here the KKT system can be derived from stationarity of the Lagrangian such that the DD-LSPG ROMsolution (ˆ x , . . . , ˆ x n Ω , λ ) satisfies ∂L∂ ˆ w i (ˆ x , . . . , ˆ x n Ω , λ ) = ˆ r i (ˆ x i , λ ) = , i = 1 , . . . , n Ω ∂L∂ λ (ˆ x , . . . , ˆ x n Ω , λ ) = n Ω (cid:88) i =1 A i Φ Γ i ˆ x Γ i = , (4.8)where we have definedˆ r i : ( ˆ w i , γ ) (cid:55)→ [ Φ Ω i ] T ∂ r i ∂ w Ω i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) T [ B i ] T B i r i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i )+ [ Φ Γ i ] T ∂ r i ∂ w Γ i ( Φ Ω i ˆ x i , Φ Γ i ˆ x i ) T [ B i ] T B i r i ( Φ Ω i ˆ x i , Φ Γ i ˆ x i ) + ( Φ Γ i ) T A Ti γ (4.9)for i = 1 , . . . , n Ω . Applying Newton’s method with a Gauss–Newton Hessian approximation to solve thesystem of nonlinear algebraic equations (4.8) yields the SQP iterations for k = 0 , . . . , K H (ˆ x ( k )1 ) . . . ( Φ Γ1 ) T A T ... . . . ... ... . . . H n Ω (ˆ x ( k ) n Ω ) ( Φ Γ n Ω ) T A Tn Ω A Φ Γ1 . . . A n Ω Φ Γ n Ω p ( k )1 ... p ( k ) n Ω p λ = − ˆ r (ˆ x ( k )1 )...ˆ r n Ω (ˆ x ( k ) n Ω ) (cid:80) n Ω i =1 A i Φ Γ i ˆ x ( k ) i , (4.10)where H i : ˆ w i (cid:55)→ [ Φ Ω i ] T ∂ r i ∂ w Ω i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) T [ B i ] T B i ∂ r i ∂ w Ω i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) Φ Ω i + [ Φ Ω i ] T ∂ r i ∂ w Ω i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) T [ B i ] T B i ∂ r i ∂ w Γ i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) Φ Γ i + [ Φ Γ i ] T ∂ r i ∂ w Γ i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) T [ B i ] T B i ∂ r i ∂ w Ω i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) Φ Ω i + [ Φ Γ i ] T ∂ r i ∂ w Γ i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) T [ B i ] T B i ∂ r i ∂ w Γ i ( Φ Ω i ˆ w i , Φ Γ i ˆ w i ) Φ Γ i (4.11)for i = 1 , . . . , n Ω . We can then update the solution asˆ x ( k ) i = ˆ x ( k ) i + α ( k ) p ( k ) i , i = 1 , . . . , n Ω λ ( k +1) = λ ( k ) + α ( k ) p λ ( k ) , (4.12)where α ( k ) is a step length that can be computed, e.g., via line search. We now describe the computational cost of executing the online stage. To make this precise, we first introducethe sampling operators Z ri ∈ { , } n z,ri × n i , Z Ω i ∈ { , } n z, Ω i × n Ω i , and Z Γ i ∈ { , } n z, Γ i × n Γ i , i = 1 , . . . , n Ω , whichcomprise selected rows of the n i × n i , n Ω i × n Ω i , and n Γ i × n Γ i identity matrices, respectively. These matricesare those of the prescribed structure that satisfy B i [ Z ri ] T Z ri r i ([ Z Ω i ] T Z Ω i w Ω i , [ Z Γ i ] T Z Γ i w Γ i ) = B i r i ( w Ω i , w Γ i ) , ∀ w Ω i ∈ R n Ω i , w Γ i ∈ R n Γ i . (5.1)10ith the fewest number of rows. In particular, Z Ω i and Z Γ i sample the degrees of freedom associated withnonzero columns of the Jacobians B i ∂ r i ∂ w Ω i and B i ∂ r i ∂ w Γ i , respectively.Note that for standard LSPG (i.e., B i = I ), we have simply Z ri = I , Z Ω i = I , and Z Γ i = I with n z,ri = n i , n z, Ω i = n Ω i , and n z, Γ i = n Γ i . For collocation (i.e., B i = Z i ) and gappy POD (i.e., B i = ( Z i Φ ri ) + Z i ), we have Z ri = Z i , n z,ri = n zi , n z, Ω i (cid:28) n Ω i and n z, Γ i (cid:28) n Γ i if n zi (cid:28) n i and the Jacobians ∂ r i ∂ w Ω i and ∂ r i ∂ w Γ i are sparse.Algorithms 1 and 2 describe the assembly and solve steps required within each SQP iteration, respectively,while Tables 1 and 2 report the associated floating-point operation counts.Algorithm 1 and Table 1 show that Steps 1–3 of the online assembly can be parallelized across the subdo-mains, while Step 4 requires a reduction across subdomains. Further, these illuminate that n z, Ω i , n z, Γ i , n z,ri (cid:28) n i are necessary in order to achieve an n -independent online operation count; this is precisely what is pro-vided by hyper-reduction. Here, c ri and c Ji denote the average number of floating point operations requiredto evaluate one entry of the i th residual and one row of the i th Jacobian matrix, respectively, while w Ω i and w Γ i denote the average number of non-zeros per row of the i th Jacobian for the interior and interface,respectively. Note that the operation counts for assembly are identical for the interior/boundary bases andfull-subdomain bases cases.Algorithm 2 and Table 2 report the steps and associated computational costs associated with the solveand update for each SQP iteration. Importantly, we see that the online solve and update depend only onthe dimensions of the reduced bases and constraint matrices; as such, they are independent of the quantities n z, Ω i , n z, Γ i , and n z,ri and are thus unaffected by hyper-reduction. Also, here we observe noticeable differencesin the operation counts associated with the interior/boundary bases and full-subdomain bases cases. In thecase of interior/boundary bases, using a specialized Cholesky-based LDL T factorization [38], the solve costfor system (4.4) is ( s dM ) with system dimension s dM = (cid:80) n Ω i =1 p Ω i + (cid:80) n Ω i =1 p Γ i + n A . In contrast, in the caseof full-subdomain bases, the solve cost for system (4.10) is ( s sdM ) with dimension system s sdM = n Ω p i + n A .Thus, we expect the solve cost to be less expensive for the full-subdomain cases when similar reduced-basisdimensions are employed. Algorithm 1:
Interior/boundary bases: Online assembly at each SQP iteration1:
Parallel : Compute the required elements of the state( Z Ω i ˜ x Ω( k ) i , Z Γ i ˜ x Γ( k ) i ) = ( Z Ω i Φ Ω i ˆ x Ω( k ) i , Z Γ i Φ Γ i ˆ x Γ i ) for i = 1 , . . . , n Ω ;2: Parallel : Compute residual B i r i ( Z Ω i ˜ x Ω( k ) i , Z Γ i ˜ x Γ( k ) i ), Jacobians B i ∂ r i ∂ w Ω i ( Z Ω i ˜ x Ω( k ) i , Z Γ i ˜ x Γ( k ) i ) Φ Ω i , B i ∂ r i ∂ w Γ i ( Z Ω i ˜ x Ω( k ) i , Z Γ i ˜ x Γ( k ) i ) Φ Γ i , constraint A i Φ Γ i ˆ x Γ i , and [ Φ Γ i ] T [ A i ] T λ ( k ) for i = 1 , . . . , n Ω ;3: Parallel : Using these quantities, compute terms that appear in the SQP system; • Interior/boundary bases : compute ˆ r Ω i (ˆ x Ω( k ) i , ˆ x Γ( k ) i ), ˆ r Γ i (ˆ x Ω( k ) i , ˆ x Γ( k ) i , λ ( k ) ), H ΩΩ i (ˆ x Ω( k ) i , ˆ x Γ( k ) i ), H ΩΓ i (ˆ x Ω( k ) i , ˆ x Γ( k ) i ), H ΓΩ i (ˆ x Ω( k ) i , ˆ x Γ( k ) i ), and H ΓΓ i (ˆ x Ω( k ) i , ˆ x Γ( k ) i ) for i = 1 , . . . , n Ω . • Full-subdomain basis : compute ˆ r i (ˆ x Ω( k ) i , ˆ x Γ( k ) i ), H i (ˆ x Ω( k ) i , ˆ x Γ( k ) i ), for i = 1 , . . . , n Ω ;4: Serial : Reduce constraints over subdomains (cid:80) n Ω i =1 A i Φ Γ i ˆ x Γ i ; Algorithm 2:
Interior/boundary bases: Online solve and update at each SQP iteration5:
Serial : Solve the SQP system (4.4) or (4.10);6:
Serial : Update the boundary and interface states via Eq. (4.6) or (4.12);7:
Serial : Update the Lagrange multipliers via Eq. (4.6) or (4.12); For reference, a better solver based on antitriangular factorization for saddle point matrices that was proposed recently[39, 40] has the solving cost of only 8 mn − m ( n + m/
3) flops where n = (cid:80) n Ω i =1 ( p Ω i + p Γ i ) and m = n A . n z, Ω i p Ω i + 2 n z, Γ i p Γ i for the i th subdomain2 Parallel n z,ri c ri + n z,ri c Ji + 2 n z,ri w Ω i p Ω i + 2 n z,ri w Γ i p Γ i + 4 n A p Γ i + is dense ( B i ) (cid:0) n Bi n z,ri (1 + p Ω i + p Γ i ) (cid:1) for the i th subdomain3 Parallel 2 p Ω i n Bi + 2 p Γ i n Bi + p Γ i + ( n Bi ) (cid:16)(cid:0) p Γ i (cid:1) + 2 p Γ i p Ω i + (cid:0) p Ω i (cid:1) (cid:17) for the i th subdomain4 Serial 2 n Ω n A Table 2: Algorithm 2 operation countAlgorithm 2 step Parallel/serial Floating point operation count5 Serial
Interior/boundary bases : (cid:0) n Ω p Ω i + n Ω p Γ i + n A (cid:1) Full-subdomain bases : ( n Ω p i + n A ) n Ω ( p Ω i + p Γ i ) for the i th subdomain7 Serial 2 n A This section describes how the different proposed reduced bases can be constructed assuming that a full-system state-snapshot matrix X := (cid:2) x ( µ ) · · · x ( µ n train train ) (cid:3) ∈ R n × n train with { µ j train } n train j =1 ⊆ D has beenprecomputed during an “offline” training stage. Future work will consider constructing these snapshots fromsubdomain/component training snapshots alone, which will be necessary for truly extreme-scale models anddecomposable systems. Algorithm 3 lists the widely-used proper orthogonal decomposition (POD) algorithmthat we employ to construct all proposed reduced bases in this work. Throughout, υ ∈ [0 ,
1] is the “energycriterion” used to determine the applied truncation.
Algorithm 3:
POD : Proper orthogonal decomposition
Input:
Snapshots X ∈ R n × m , energy criterion υ ∈ [0 , Output:
Reduced-basis matrix Φ ∈ R n × p Compute (thin) singular value decomposition: X = U Σ V T , Set Φ = [ u · · · u p ], where p = min i ∈ Γ( υ ) , Γ( υ ) := { i | (cid:80) ij =1 σ j / (cid:80) mk =1 σ k ≥ − υ } . Here, U ≡ [ u · · · , u m ] and Σ ≡ diag( σ , . . . , σ m ). We first describe various approaches to constructing interior/boundary bases. • Interior bases . To compute interior bases Φ Ω i ∈ R n Ω i × p Ω i (cid:63) , i = 1 , . . . , n Ω , we simply execute Algorithm3 with snapshots isolated to subdomain interiors such that Φ Ω i = POD ( P Ω i X , υ ), i = 1 , . . . , n Ω . • (Boundary) Full-interface bases . Analogously, we compute full-interface bases by executing Algo-rithm 3 with snapshots isolated to subdomain interfaces such that Φ Γ i = POD ( P Γ i X , υ ), i = 1 , . . . , n Ω . • (Boundary) Port bases . To compute port bases, we first compute reduced bases for each port by exe-cuting Algorithm 3 with snapshots P j(cid:96) P Γ (cid:96) X , j = 1 , . . . , n p with any (cid:96) ∈ P ( j ), and compute the resulting12nterface bases by assembling the appropriate port bases as Φ Γ i = [ POD ( P q i i P Γ i X , υ ) · · · POD ( P q | Q ( i ) | i i P Γ i X , υ )], i = 1 , . . . , n Ω , where Q ( i ) ≡ { q ji } j . • (Boundary) Skeleton bases . This approach first computes a reduced basis for the “skeleton”,which is the union of subdomain interfaces, and subsequently isolates that basis to each subdomain’sinterface while ensuring full column rank on that interface. More precisely, we compute ¯ Φ Γ i = POD (( I − (cid:80) n Ω i =1 [ P Ω i ] T P Ω i ) X , υ ) followed by a rank-revealing QR factorization with column pivoting ¯ Φ Γ i P = Q i R i , and finally set Φ Γ i = (cid:104) q i · · · q p Γ i i (cid:105) , where Q i ≡ [ q i · · · q n train i ] and rank( ¯ Φ Γ i ) = p Γ i ( ≤ n train ).We note that skeleton bases require full-system snapshots and thus are not generally practical for eitherextreme-scale systems nor for decomposable systems, as both of these scenarios in practice precludethe ability to collect full-system snapshots; nevertheless, because this work does not directly considersubsystem/component-based training, we include this approach in the present work. The full-subdomain-basis approach computes reduced bases that have support over all degrees of freedomfor their respective subdomains, and subsequently isolates this basis to the subdomain interior and interface.That is, we compute Φ i = POD (cid:18)(cid:20) P Ω i P Γ i (cid:21) X , υ (cid:19) and set Φ Ω i = (cid:2) I n Ω i n Γ i (cid:3) Φ i and Φ Γ i = (cid:2) n Ω i × n Ω i I n Γ i × n Γ i (cid:3) Φ i ,where I n and n denote the n × n identity and zero matrices, respectively. A posteriori and a priori error bounds
For notational simplicity, this section omits explicit parameter dependence; results can be interpreted asholding for any arbtrary parameter instance µ ∈ D . We begin by stating assumptions that will be employedin subsequent analysis.A1 The DD-LSPG ROM employs strong constraints, i.e., A i = ¯ A i , i = 1 , . . . , n Ω .Under Assumption A1, the DD-LSPG ROMs can be converted to unconstrained minimization problems.First, we introduce the null-space matrix ¯ N ∈ R (cid:80) n Ω i =1 p Γ i × p null with p null := (cid:80) n Ω i =1 p Γ i − rank( ¯ A ), whichsatisfies ¯ A ¯ N = with ¯ A := [ ¯ A Φ Γ1 · · · ¯ A n Ω Φ Γ n Ω ]. Because strong constraints enforce a global solution suchthat Eq. (3.3) holds (e.g., see Remark 1), DD-LSPG yields a ‘global’ solution ˜ x ∈ R n satisfying˜ x Ω i = P Ω i ˜ x , ˜ x Γ i = P Γ i ˜ x , i = 1 , . . . , n Ω . (7.1)Now, the interior/boundary-basis problem (3.1) is equivalent to the unconstrained minimization problemwherein ˆ x Ω i , i = 1 , . . . , n Ω and ˆ x null comprise the solution to the problemminimize ( ˆ w Ω i ) , i =1 ,...,n Ω , ˆ w null n Ω (cid:88) i =1 (cid:107) B i r i ( Φ Ω i ˆ w Ω i , Φ Γ i ¯ N i ˆ w null ) (cid:107) , (7.2)with x i ≈ ˜ x i ≡ (˜ x Ω i , ˜ x Γ i ) = ( Φ Ω i ˆ x Ω i , Φ Γ i ¯ N i ˆ x null ) , (7.3)where ¯ N i ∈ R p Γ i × p null denotes the i th row block of ¯ N .Note that Problem (7.2) can be expressed equivalently as computing ˜ x that satisfiesminimize w ∈S I/B n Ω (cid:88) i =1 (cid:107) B i r i ( P Ω i w , P Γ i w ) (cid:107) , (7.4)13here the trial subspace S I/B ⊆ R n is defined as S I/B := { w ∈ R n | ∃ ˆ w Ω i ∈ R p Ω i , i = 1 , . . . , n Ω , and ˆ w null ∈ R p null s.t. P Ω i w = Φ Ω i ˆ w Ω i , P Γ i w = Φ Γ i ¯ N i ˆ w null , i = 1 , . . . , n Ω } ⊆ R n . (7.5)Similarly, A1 admits conversion of the full-subdomain-basis problem (3.2) to an unconstrained minimiza-tion problem wherein ˆ x null comprises the solution to the problemminimize ˆ w null n Ω (cid:88) i =1 (cid:107) B i r i ( Φ Ω i ¯ N i ˆ w null , Φ Γ i ¯ N i ˆ w null ) (cid:107) . (7.6)with x i ≈ ˜ x i ≡ (˜ x Ω i , ˜ x Γ i ) = ( Φ Ω i ¯ N i ˆ x null , Φ Γ i ¯ N i ˆ x null ) , (7.7)which can be expressed equivalently as computing ˜ x that satisfiesminimize w ∈S F n Ω (cid:88) i =1 (cid:107) B i r i ( P Ω i w , P Γ i w ) (cid:107) , (7.8)where the trial subspace S F ⊆ R n is defined as S F := { w ∈ R n | ∃ ˆ w null ∈ R p null s.t. P Ω i w = Φ Ω i ¯ N i ˆ w null , P Γ i w = Φ Γ i ¯ N i ˆ w null , i = 1 , . . . , n Ω } ⊆ R n , (7.9)We now introduce two more assumptions needed for the error bounds.A2 The residual is inverse Lipschitz continuous in the (cid:96) -norm, i.e., there exists κ (cid:96) > (cid:32) n Ω (cid:88) i =1 (cid:107) r i ( w Ω i , w Γ i ) − r i ( y Ω i , y Γ i ) (cid:107) (cid:33) / ≥ κ (cid:96) (cid:107) w − y (cid:107) , ∀ w , y ∈ R n (7.10)with w Ω i := Z Ω i w , w Γ i := Z Γ i w , y Ω i := Z Ω i y , and y Γ i := Z Γ i y , i = 1 , . . . , n Ω .A3 The B T B -norm and the (cid:96) -norm of the residual are equivalent over all elements of the trial subspacesuch that there exists P > (cid:32) n Ω (cid:88) i =1 (cid:107) B i r i ( P Ω i w , P Γ i w ) (cid:107) (cid:33) / ≥ P (cid:32) n Ω (cid:88) i =1 (cid:107) r i ( P Ω i w , P Γ i w ) (cid:107) (cid:33) / , ∀ w ∈ S ⊆ R n , (7.11)where S = S I/B in the case of interior/boundary bases and S = S F in the case of full-domain bases. Proposition 1 ( A posteriori error bound) . Under Assumptions A1–A3, the error in the DD-LSPG ROMapproximate solution for both interior/boundary bases and full-subdomain bases can be bounded as max( max i ∈{ ,...,n Ω } (cid:107) x Ω i − ˜ x Ω i (cid:107) , max j ∈{ ,...,n p } (cid:107) P j x − P j ˜ x (cid:107) ) ≤ (cid:107) x − ˜ x (cid:107) ≤ P κ (cid:96) (cid:32) n Ω (cid:88) i =1 (cid:107) B i r i (˜ x Ω i , ˜ x Γ i ) (cid:107) (cid:33) / , (7.12) where ˜ x ∈ R n is the ‘global’ solution satisfying Eq. (7.1) and where P j ∈ { , } n pj × n , j = 1 , . . . , n p comprisesselected rows of the identity matrix that extract the global degrees of freedom associated with the j th port.Proof. Leveraging the norm-equivalence relation (cid:107) · (cid:107) ∞ ≤ (cid:107) · (cid:107) , the ‘global’ solution relations (7.1), sequen-tially invoking A2 and A3, and noting that r i ( x Ω i , x Γ i ) = 0, i = 1 , . . . , n Ω yieldsmax i ∈{ ,...,n Ω } (cid:107) x Ω i − ˜ x Ω i (cid:107) ≤ (cid:107) x − ˜ x (cid:107) ≤ κ (cid:96) (cid:32) n Ω (cid:88) i =1 (cid:107) r i (˜ x Ω i , ˜ x Γ i ) (cid:107) (cid:33) / ≤ P κ (cid:96) (cid:32) n Ω (cid:88) i =1 (cid:107) B i r i (˜ x Ω i , ˜ x Γ i ) (cid:107) (cid:33) / , (7.13)14hich is valid for both interior/boundary bases and full-subdomain bases according to A3. On the invocationof the (cid:96) ∞ -norm, we have decomposed the state vector into segments: one for each group of interior degreesof freedom, and one for each port.We now introduce another assumption that will be employed to derive a priori error bounds.A4 The residual is Lipschitz continuous in the B T B -norm, i.e., there exists κ u > (cid:32) n Ω (cid:88) i =1 (cid:107) B i r i ( w Ω i , w Γ i ) − B i r i ( y Ω i , y Γ i ) (cid:107) (cid:33) / ≤ κ u (cid:107) w − y (cid:107) , ∀ w , y ∈ R n . (7.14) Proposition 2 ( A priori error bound with respect to the (cid:96) -optimal approximation error) . Under Assump-tions A1–A4, the error in the DD-LSPG ROM approximate solution for both interior/boundary bases andfull-subdomain bases can be bounded in terms of the (cid:96) -optimal approximation error as (cid:107) x − ˜ x (cid:107) ≤ κ u P κ (cid:96) min w ∈S (cid:107) x − w (cid:107) (7.15) where S = S I/B in the case of interior/boundary bases and S = S F in the case of full-subdomain bases.Proof. We have from Eq. (7.3), and Problems (7.4) and (7.8), and Inequality (7.12) that1
P κ (cid:96) (cid:32) n Ω (cid:88) i =1 (cid:107) B i r i (˜ x Ω i , ˜ x Γ i ) (cid:107) (cid:33) / = 1 P κ (cid:96) min w ∈S (cid:32) n Ω (cid:88) i =1 (cid:107) B i r i ( P Ω i w , P Γ i w ) (cid:107) (cid:33) / ≤ P κ (cid:96) (cid:32) n Ω (cid:88) i =1 (cid:107) B i r i ( P Ω i x (cid:63), , P Γ i x (cid:63), ) (cid:107) (cid:33) / ≤ κ u P κ (cid:96) (cid:107) x − x (cid:63), (cid:107) . (7.16)Defining x (cid:63), := arg min w ∈S (cid:107) x − w (cid:107) and combining this with Proposition 1 yields the desired result. Proposition 3 ( A priori error bound with respect to the (cid:96) ∞ -optimal approximation error) . Under Assump-tions A1–A4, max( max i ∈{ ,...,n Ω } (cid:107) x Ω i − ˜ x Ω i (cid:107) , max j ∈{ ,...,n p } (cid:107) P j x − P j ˜ x (cid:107) ) ≤ κ u √ n Ω + n p P κ (cid:96) min w ∈S max( max i ∈{ ,...,n Ω } (cid:107) x Ω i − w Ω i (cid:107) , max j ∈{ ,...,n p } (cid:107) P j x − P j w (cid:107) ) , (7.17) where S = S I/B in the case of interior/boundary bases and S = S F in the case of full-subdomain bases.Proof. Analogously to the proof of Proposition 2 we have from Eq. (7.3), and Problems (7.4) and (7.8), andInequality (7.12) that1
P κ (cid:96) (cid:32) n Ω (cid:88) i =1 (cid:107) B i r i (˜ x Ω i , ˜ x Γ i ) (cid:107) (cid:33) / = 1 P κ (cid:96) min w ∈S (cid:32) n Ω (cid:88) i =1 (cid:107) B i r i ( P Ω i w , P Γ i w ) (cid:107) (cid:33) / ≤ P κ (cid:96) (cid:32) n Ω (cid:88) i =1 (cid:107) B i r i ( P Ω i x (cid:63), ∞ , P Γ i x (cid:63), ∞ ) (cid:107) (cid:33) / ≤ κ u P κ (cid:96) (cid:107) x − x (cid:63), ∞ (cid:107) ≤ κ u √ n Ω + n p P κ (cid:96) max( max i ∈{ ,...,n Ω } (cid:107) x Ω i − x Ω ,(cid:63), ∞ i (cid:107) , max j ∈{ ,...,n p } (cid:107) P j x − P j x (cid:63), ∞ (cid:107) ) (7.18)where, on the invocation of the (cid:96) ∞ -norm, we have decomposed the state vector into segments: one for eachgroup of interior degrees of freedom, and one for each port. Defining x (cid:63), ∞ as satisfying the minimizationproblem minimize w ∈S max( max i ∈{ ,...,n Ω } (cid:107) x Ω i − w Ω i (cid:107) , max j ∈{ ,...,n p } (cid:107) P j x − P j w (cid:107) ) (7.19)and combining with Proposition 1 yields the desired result.15 Numerical experiments
This section reports numerical experiments that assess the performance of the proposed DD-LSPG methodon two benchmark problems. In particular, we focus on the effect of key method parameters (i.e., constrainttype, basis type, hyper-reduction, truncation levels) on accuracy and speed in both weak and strong scaling.We compare the following methods: • FOM . This model corresponds to the full-order model, i.e., the solution satisfying Eq. (2.1) (equiv.Eq. (2.5)). • DD-LSPG . This model corresponds to the unweighted LSPG ROM, i.e., the solution satisfies (3.1)–(3.2) with B i = I , i = 1 , . . . , n Ω . • DD-GNAT . This model corresponds to the GNAT ROM, i.e., the solution satisfying (3.1)–(3.2)with B i = ( Z i Φ ri ) + Z i , i = 1 , . . . , n Ω . Algorithm 4 is used to construct the sampling matrices Z i , i = 1 , . . . , n Ω .We assess the accuracy of any ROM solution ˜ x ( µ ) as followsrelative error = (cid:118)(cid:117)(cid:117)(cid:116) n Ω n Ω (cid:88) i =1 (cid:107) ˜ x i ( µ ) − x i ( µ ) (cid:107) (cid:107) x i ( µ ) (cid:107) , (8.1)and we measure its computational cost in terms of the wall time incurred by the ROM simulation relativeto that incurred by the FOM simulation; the speedup is the reciprocal of the relative wall time. All timingsare obtained by performing calculations in Matlab R2018b on an 2x6-Core Intel Xeon 2.93GHz with 64 GBRAM of memory. Reported timings comprise the average over five simulations. (a) “Coarse” mesh: 40x40 elements (b) “Fine” mesh: 80x80 elements Figure 3: Heat equation, two global meshes used for discretization.16 a) µ = (1 ,
1) (b) µ = (10 , Figure 4: Heat equation, FOM solutions for different µ using the fine mesh.Table 3: Heat equation, parameters of global FOM discretization“Coarse” mesh “Fine” meshNumber of elements 1600 6400Number of nodes 1681 6561Number of degrees of freedom n u ( x , µ ) with x ≡ ( x , x ) ∈ Ω = [0 , and µ ≡ ( µ , µ ) ∈ D = [0 . , and homogeneous Dirichlet boundary condition on Γ ≡ ∂ Ω satisfying − ∇ u + µ µ ( e µ u −
1) = 100 sin(2 π x ) sin(2 π x ) . (8.2)This model can be interpreted as a 2D stationary diffusion problem with a nonlinear interior heat source.The resulting solution exhibits a strongly nonlinear dependence on the parameters µ .For spatial discretization, we apply the finite-element method using two meshes (which will be used toassess strong and weak scaling): a “coase” mesh and a “fine” mesh, characterized by 1600 (40 ×
40) and 6400(80 ×
80) bilinear quadrilateral (Q1) elements, respectively. Figure 3 depicts these meshes, while Table 3reports the corresponding parameters. Figure 4 plots the FOM reference solutions on the “fine” mesh withtwo different parameter values µ = (1 ,
1) and µ = (10 , After applying the finite-element discretization, we introduce the algebraically non-overlapping decomposi-tion of the problem described in Section 2. For this problem, the chosen algebraic decomposition correspondsto a spatial domain decomposition in space. In particular, we employ decompositions into both 2 × n Ω = 4) and 4 × n Ω = 16) configurations as depicted in Figure 5; note that these localsubdomains have one layer of elements overlapping (as explained in Figure 1). We apply the 2 × × × × × × × × × n Ω n ¯ A
172 1092 332Number of ports n p n Ω1
441 441 1681weak scaling strong scalingTable 5: Heat equation, parameters on each Ω i of the 2 × n p = 5 total ports with n p = 76, n p = 76, n p = 78, n p = 78, n p = 4.Ω Ω Ω Ω n r i n Ω i n Γ i
156 158 158 160 n i (= n Ω i + n Γ i ) 1600 1640 1640 1681Number of subdomain ports | Q ( i ) | Ω Ω Ω Ω (a) 2 × Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω (b) 4 × Figure 5: Heat equation, two domain decomposition configurations based on the finite-element mesh.and 4 × i , i = 1 , . . . , n Ω of the 2 × To generate the reduced bases required for the reduced-order models, we solve the FOM (2.1) for µ ∈{ µ j train } n train j =1 ⊂ D . In our case, we define the training-parameter set { µ j train } n train j =1 via a 20 ×
20 equispacedsampling of the parameter domain D , yielding n train = 400 samples. We apply the methods describedin Section 6 to create port, skeleton, full-interface, and full-subdomain bases from these training data.However, we recall that skeleton bases require full-system snapshots and thus are not generally practical fordecomposable systems that demand “bottom-up” training; we still include this approach for comparativepurposes. At each iteration of the Newton–Raphson algorithm used to solve the FOM equations (2.1), the18esidual vector is saved; the resulting residual snapshots are employed to generate the residual bases Φ ri , i = 1 , . . . , n Ω employed by DD-GNAT via POD. Lastly, the GNAT offline algorithm 4 is performed to createsample meshes s i for all subdomains Ω i .Table 6: Heat equation, 2 × µ = (5 . , . / ∈{ µ j train } n train j =1 for one online computation. Recall from Section 6 that υ ∈ [0 ,
1] denotes the energy criterionemployed by POD. constraint strongbasis port skeleton full-interface subdomainmethod DD-LSPG DD-GNAT DD-LSPG DD-GNAT DD-LSPG DD-GNAT DD-LSPG DD-GNAT υ for state 1 − − − − − − − − − − − − − − − − υ for residual 1 − − − − − − − − n zi /p ri Table 7: Heat equation, 2 × i resulting from Table 6.basis port skeleton full-interface subdomainΩ Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω n A
18 9 12 12 p Ω i p Γ i p p p p p p n zi
102 310 310 104 102 310 310 104 102 310 310 104 102 310 310 104 p ri
51 155 155 52 51 155 155 52 51 155 155 52 51 155 155 52We now compare the DD-LSPG and DD-GNAT methods for fixed values of their parameters, and for asingle randomly selected online point µ = (5 . , . / ∈ { µ j train } n train j =1 ; results at other online points arequalitatively similar. Table 6 reports the chosen input parameters and associated performance of the meth-ods, while the resulting ROM parameters over each subdomain Ω i , i = 1 , . . . , n Ω are listed on Table 7. Theresults in Table 6 confirm the comments in Remark 1, which suggested that enforcing strong compatibilitycan yield poor results for full-interface and full-subdomain bases, and that only port and (generally imprac-tical) skeleton bases are well-suited for strong compatibility constraints. Figure 6 visualizes DD-LSPG andDD-GNAT solutions for the port-bases case: it shows that DD-LSPG and DD-GNAT yield accurate resultsfor port bases with strong constraints as anticipated. Because assessing a given method’s performance for a single instance of its parameters does not lend insightinto the model’s complete error–cost performance tradeoff, this section subjects each of the proposed ROMsto a parameter study wherein each model parameter is varied between limits specified in Table 8.For each weak compatibility constraint case, we generate five sets of random matrices C j , j = 1 , . . . , n p ,which in turn yields five different sets of constraint matrices A i , i = 1 . . . , n Ω as described in Section 3.3.We use each set to perform one ROM simulation, and record the associated timing and relative error of thatsimulation. The reported timing and relative error comprises the average obtained over these five simulations.For the strong-constraint case, we perform only one simulation as A i = ¯ A i defined uniquely. The recorded19 a) global FEM solution (b) DD-LSPG solution (c) DD-LSPG error(d) Sample mesh (e) DD-GNAT solution (f) DD-GNAT error Figure 6: Heat equation, 2 × υ ∈ [0 ,
1] denotes the energy criterion employedby POD. method DD-LSPG DD-GNAT υ on Ω i for interior/boundary bases { − − , − − } { − − , − − } υ on Γ i for interior/boundary bases { − − , − − } { − − , − − } υ for full-subdomain bases { − − , − − , { − − , − − , − − , − − } − − , − − } υ for r i { − − , − − , − − , − − } n zi /p ri {
1, 1.5, 2, 4 } constraint type {
1, 2, 3, 4, 5, strong } {
1, 2, 3, 4, 5, strong } basis types { port, skel., intf., subdom. } { port, skel., intf., subdom. } wall time for any parallel step is set to the largest wall time incurred by any subdomain, while that for anyserial step is simply set to overall wall time incurred by the step (see Algorithms 1–2).From these results, we then construct a Pareto front to characterize the error–cost for each method. ThisPareto front is characterized by the collection of method-parameter instances that yield simulation resultsthat are not dominated in both relative error or wall time by any other method-parameter instance. Figure 7reports these Pareto fronts for the three configurations considered. Figure 8 plots the average relative errorversus number of constraints per port for the 4 × a) Heat 2 × × × × × × × × × Figure 7: Heat equation, Pareto front plots for wall-all (normalized with respect global FEM wall-all timing),wall-assemble and wall-solve timing of three different configurations for varying model parameters reportedin Table 8.We first analyze the solve wall time, and thus consider the subfigures in the rightmost column of Figure 7.These plots yield the following observations:(i) For a given basis type, the solve costs for DD-LSPG and DD-GNAT are nearly the same, which issensible because they yield SQP systems of the same dimension and structure.(ii) The 4 × n Ω assuming fixed basis dimensions (seeTables2).(iii) For both DD-LSPG and DD-GNAT, the solve times for both the 2 × × a) port, υ = 10 − on Ω i , υ = 10 − on Γ i (b) port, υ = 10 − on Ω i , υ = 10 − on Γ i (c) port, υ = 10 − on Ω i , υ = 10 − on Γ i (d) skeleton, υ = 10 − on Ω i , υ =10 − on Γ i (e) skeleton, υ = 10 − on Ω i , υ =10 − on Γ i (f) skeleton, υ = 10 − on Ω i , υ =10 − on Γ i (g) full-interface, υ = 10 − on Ω i , υ = 10 − on Γ i (h) full-interface, υ = 10 − on Ω i , υ = 10 − on Γ i (i) full-interface, υ = 10 − on Ω i , υ = 10 − on Γ i (j) subdomain, υ = 10 − (k) subdomain, υ = 10 − (l) subdomain, υ = 10 − Figure 8: Heat equation, 4 × x, y ) implies the DD-GNAT model with n zi /p ri = x and υ = 10 − y for r i .22iv) For a fixed error, the port basis type incurred the largest solve wall time compared with the other threebasis times; this is sensible because—on average—it has a larger basis dimension compared with theother three types. The skeleton, full-interface and subdomain types yield roughly the same solve costfor a fixed error.We now analyze assembly wall time and thus consider the middle column of Figure 7. These figuresillustrate the following trends:(i) For DD-LSPG, the 2 × n Ω i and subdomain boundary n Γ i is the largest in this case, and n z,ri = n z, Ω i = n i and n z, Γ i = n Γ i for DD-LSPG (see Table 4). On the other hand, the assembly walltime for DD-LSPG is similar in the 2 × × × × × > × wall-timespeedups relative to the FOM in this case, with DD-LSPG yielding only modest wall-time speedups, whichare < × in both the 2 × × compatible bases on shared ports, so weak compatibility constraints lead to nobenefit; see Remark 1. On the other hand, the subfigures in the two last rows of Figure 8 show that weakconstraint case with only one constraint per port yield the best accuracy for full-interface and full-subdomainbasis types. As discussed in Remark 1, this is result is expected because neighboring components generallyhave incompatible bases on shared ports for these basis types The next section will lend additional insightinto the behavior of the full-interface and full-subdomain basis types. For illustrative purposes, we investigate further the effects of weak versus strong compatibility constraintsfor the full-interface basis type. Due to its similarity, only full-interface bases is discussed here. We considerthe parameters for DD-LSPG simulation as follows: 4 × µ = (5 . , . υ = 1 − − on Ω i , υ = 1 − − on Γ i , and weak constraint cases with both one and fourconstraints per port. Figures 9 and 10 visualize the corresponding solutions of these two cases. These figuresverify visually our observations in the previous section and the comments in Remark 1: enforcing weakcompatibility with only one constraint per port yields better global solutions despite a larger discrepancy23 a) DD-LSPG sol. on Ω (b) DD-LSPG error on Ω (c) Absolute discrepancy on Ω(d) DD-LSPG sol. on Ω (e) DD-LSPG sol. on Ω (f) DD-LSPG sol. on Ω Figure 9: Heat equation, 4 × Table 9: Burgers equation, parameters for the exact solutionParameter ν x a a a a a λ Value 0.1 1 a ∈ [1 , ∈ [5 , n a) DD-LSPG sol. on Ω (b) DD-LSPG error on Ω (c) Absolute discrepancy on Ω(d) DD-LSPG sol. on Ω (e) DD-LSPG sol. on Ω (f) DD-LSPG sol. on Ω Figure 10: Heat equation, 4 × -1 -0.5 0 0.5 100.010.020.030.040.05 (a) “Coarse” 120x12 elements -1 -0.5 0 0.5 100.010.020.030.040.05 (b) “Fine” 240x12 elements Figure 11: Burgers’ equation. Two global FD mesh used for discretization.consists of computing the velocity field u ≡ ( u , u ) that satisfies u · ∇ u = ν ∇ u , (8.3)where ν is the viscosity coefficient, x = ( x , x ) ∈ Ω = [ − , × [0 , . ≡ ∂ Ω) for the numerical solutions are taken directly from the exact solution that is defined25 a) u -component (b) u -component Figure 12: Burgers’ equation, global FD solution for µ = (7692 . , . u = − ν (cid:104) a + a x + λa (cid:16) e λ ( x − x ) + e − λ ( x − x ) (cid:17) cos( λ x ) (cid:105) / Φ ,u = − ν (cid:104) a + a x − λa (cid:16) e λ ( x − x ) + e − λ ( x − x ) (cid:17) sin( λ x ) (cid:105) / Φ , (8.4)where Φ = a + a x + a x + a x x + a (cid:16) e λ ( x − x ) + e − λ ( x − x ) (cid:17) cos( λ x ), and a i , i = 1 , . . . , λ and x aregiven scalars. To parameterize the problem, these parameters are given on Table 9, and the input parameter µ is defined as µ ≡ ( µ , µ ) = ( a , λ ) ∈ D = [1 , × [5 , ×
12) and 2880(240 ×
12) quadrilateral elements, respectively. Figure 11 depicts these meshes, while Table 10 reports thecorresponding parameters. We emphasize that this problem is characterized by two degrees of freedom pernode as opposed to the previous example. Figure 12 plots the FD reference solution on the “fine” mesh with µ = (7692 . , . µ = ( a , λ ), where a relates to the distance of the shock from the left edge and λ relates to the steepnessof the shock, respectively. Table 11: Burgers equation, parameters used for three FOM configurations4x2 “coarse” 8x2 “fine” 4x2 “fine” n Ω n ¯ A
704 1488 1184 n p
13 29 13
434 434 854
217 217 427weak scaling strong scaling26 Ω Ω Ω Ω Ω Ω Ω Ω (a) 4x2 configuration -1 -0.5 0 0.5 100.010.020.030.040.05 Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω (b) 8x2 configuration Figure 13: Burgers’ equation, two domain-decomposition configurations based on the finite-difference mesh.Table 12: Burgers equation, parameters on each Ω i for the 4x2 “fine” configuration. In this case, there are n p = 13 total ports with n p = 16, n p = 232, n p = 20, n p = 16, n p = 232, n p = 20, n p = 16, n p = 232, n p = 20, n p = 236, n p = 8, n p = 8, n p = 8. Ω Ω Ω Ω Ω Ω Ω Ω n r i
590 708 600 720 600 720 600 720 n Ω i
464 580 464 580 464 580 472 590 n Γ i
256 260 280 288 280 288 260 264 n i (= n Ω i + n Γ i ) 720 840 744 868 744 868 732 854Number of subdomain ports | Q ( i ) | × n Ω = 8) and 8 × n Ω = 16) configurations as depicted in Figure 13.Table 11 lists the parameters used for each of these configurations. The pairwise comparison of the 4 × × × × i , i = 1 , . . . , n Ω of the 4 × We apply the same procedure described in Section 8.1.3 to generate the reduced bases required for thereduced-order models. In particular, we solve the FOM (2.1) for µ ∈ { µ j train } n train j =1 ⊂ D , where we againdefine the training-parameter set { µ j train } n train j =1 according to a 20 ×
20 equispaced sampling of the parameterdomain D , yielding n train = 400 samples. We then apply the methods described in Section 6 to createport, skeleton, full-interface, and full-subdomain bases from these training data. At each iteration of theNewton–Raphson algorithm used to solve the FOM equations (2.1), the residual vector is saved, and theresulting residual snapshots are employed to generate the residual bases Φ ri , i = 1 , . . . , n Ω that are used byDD-GNAT via POD. Lastly, the GNAT offline algorithm 4 is performed to create sample meshes s i for allsubdomains Ω i . 27able 13: Burgers 4x2 “fine” configuration, ROM methods performance at point µ = (7692 . , . / ∈{ µ j train } n train j =1 for one online computation. Recall from Section 6 that υ ∈ [0 ,
1] denotes the energy criterionemployed by POD. constraint strongbasis port skeleton full-interface subdomainmethod DD-LSPG DD-GNAT DD-LSPG DD-GNAT DD-LSPG DD-GNAT DD-LSPG DD-GNAT υ for state 1 − − − − − − − − − − − − − − − − υ for residual 1 − − − − − − − − n zi /p ri We now compare the methods DD-LSPG and GNAT for fixed values of their parameters, and for therandomly selected online point µ = (7692 . , . / ∈ { µ j train } n train j =1 . Table 13 reports the chosen inputparameters and associated performance of the methods, while Figure 14 visualizes the DD-LSPG and DD-GNAT solutions for the port-bases case. Again, the results on Table 13 confirm the comments in Remark1, which suggested that enforcing strong compatibility can yield poor results for full-interface and full-subdomain bases, and that only port and skeleton bases are well-suited for strong compatibility constraints.Figure 14 shows DD-LSPG and DD-GNAT yield accurate results for port bases with strong constraints asanticipated. Table 14: Burgers equation, ROM-method parameters limits for parameter study (skel.=skeleton, intf.=full-interface, subdom.=subdomain). Recall from Section 6 that υ ∈ [0 ,
1] denotes the energy criterion employedby POD. method DD-LSPG GNAT υ on Ω i for interior/boundary bases { − − , − − } { − − , − − } υ on Γ i for interior/boundary bases { − − , − − } { − − , − − } υ for full-subdomain bases { − − , − − , { − − , − − , − − , − − } − − , − − } υ for r i { − − , − − , − − , − − } n zi /p ri {
1, 1.5, 2, 4 } number of constraints {
1, 2, 3, 4, 5, strong } {
1, 2, 3, 4, 5, strong } basis types { port, skel., intf., subdom. } { port, skel., intf., subdom. } We again compare the performance of the ROM methods across a wide variation of all method parameters.Table 14 reports the tested parameter values for each method. We employ the same approach to reportingwall times as previously described in Section 8.1.4. Again, as described previously in Section 8.1.4, we thenconstruct a Pareto front for each method. Figure 15 reports these Pareto fronts, while Figure 16 plots theaverage relative error versus number of constraints per port for the 4x2 “fine” configuration.Comparing Figures 15 and 7 illustrates that nearly identical overall trends are apparent for the twoexamples; we thus refer to the discussion in Section 8.1.4 to provide the primary interpretations for the28 a) FOM solution ( u ) (b) DD-LSPG solution ( u ) (c) DD-LSPG error ( u )(d) Sample mesh ( u ) (e) DD-GNAT solution ( u ) (f) DD-GNAT error ( u )(g) FOM solution ( u ) (h) DD-LSPG solution ( u ) (i) DD-LSPG error ( u )(j) Sample mesh ( u ) (k) DD-GNAT solution ( u ) (l) DD-GNAT error ( u ) Figure 14: Burgers equation, 4x2 “fine” configuration, port bases in Table 13, solutions visualized on Ω.current case. The primary difference between the previous example and the current one is that the full-subdomain bases outperform the full-interface bases in this case and thus yield the best overall performance;further, the skeleton basis yields worse wall-time performance for smaller errors compared with the port bases29 a) Burger 4x2 “coarse” (b) Burger 4x2 “coarse” (c) Burger 4x2 “coarse”(d) Burger 8x2 “fine” (e) Burger 8x2 “fine” (f) Burger 8x2 “fine”(g) Burger 4x2 “fine” (h) Burger 4x2 “fine” (i) Burger 4x2 “fine”
Figure 15: Burgers’ equation, Pareto front plots for wall-all (normalized with respect global FOM wall-alltiming), wall-assemble and wall-solve timing of three different configurations for parameters reported inTable 14.in the present example. We emphasize that—as in the previous example—DD-GNAT yields the best overallperformance, achieving > × speedup with <
1% relative error, and performs best with the full-subdomainbasis in this case.Lastly, Figure 16 reports the average relative error as a function of the number of constraints per portwith parameters reported in Table 14 for the 4x2 “fine” configuration. Again, comparing Figures 16 and8 illuminate that nearly identical overall trends are observed in this example as in the previous one. Inparticular, the subfigures in the top two rows of Figure 16 imply that strong compatibility constraints yieldbetter accuracy than weak compatibility for both port and skeleton basis types, while the two last rows showthat weak constraint case with a small number of constraints per port yield the best accuracy for full-interfaceand full-subdomain basis types. The discussion in Remark 1 accounts for this behavior: approaches thatensure neighboring components have compatible bases on shared ports perform best with strong compatibilityconstraints, while approaches that allow for neighboring components to have incompatible bases on shared30orts perform best with weak compatibility constraints.
This work proposed the domain-decomposition least-squares Petrov–Galerkin (DD-LSPG) model-reductionmethod applicable to parameterized systems of nonlinear algebraic equations. In contrast to previous works,we adopt an algebraically non-overlapping decomposition strategy, allowing it to be applicable to multiplediscretization techniques in the case of parameterized PDEs; further, in constrast with previous DDROMmethods for nonlinear systems, it is a “complete ROM” approach rather than a hybrid ROM/FOM tech-nique. We equipped DD-LSPG with hyper-reduction, four different strategies for constructing subdomainbases, supported both strong and weak compatibility constraints, and proposed an SQP solver that ex-poses parallelization. Further, we developed both a posteriori and a priori error bounds for the technique.Numerical experiments revealed several interesting performance attributes of the DD-LSPG methodology:1. The best type of compatibility constraint is strongly dependent on the type of subdomain bases; inparticular, subdomain bases that admit basis incompatibilities on shared interfaces (i.e., full-interfaceand full-subdomain bases) require weak compatibility constraints to avoid trivial interface solutions,while subdomain bases that guarantee shared-interface compatibility (i.e., port and skeleton bases)perform well with strong constraints.2. Hyper-reduction is essential to keep assembly costs low when the number of DOFs per subdomain islarge; this is evidenced by the substantial performance gains of DD-GNAT over DD-LSPG for suchcases.3. The best overall performance was achieved by full-subdomain and full-interface bases that employedweak compatibility constraints, with the worst performance obtained by port bases, as the latter casegenerally yields a large number of interface DOFs compared with the other approaches. Skeleton basesgenerally yielded intermediate performance, but are impractical for truly extreme-scale problems ordecomposable systems, as they require full-system snapshots to be constructed.Future work will consider application to truly large-scale problems, alternative parallel numerical solversfor DD-LSPG, “bottom-up” training strategies that don’t require full-system snapshots and thus makethe approach directly amenable to extreme-scale and decomposable systems, considering time-dependentproblems, and supporting nonlinear trial manifolds [43] for subdomains rather than strictly linear subspacesspanned by reduced bases.
Acknowledgements
This paper describes objective technical results and analysis. Any subjective views or opinions that might beexpressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the UnitedStates Government. Sandia National Laboratories is a multimission laboratory managed and operated byNational Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of HoneywellInternational, Inc., for the U.S. Department of Energys National Nuclear Security Administration undercontract DE-NA-0003525.
A Offline computational procedure for GNAT
A.1 Formation of residual bases
Given a residual-snapshot matrix X r g = (cid:2) r ( x ; µ ) , . . . , r k ( x ; µ ) , . . . , r ( x ; µ n train train ) , . . . , r k n train ( x ; µ n train train ) (cid:3) ∈ R n × (cid:80) n train j =1 k j , where k j is the number of Newton iterations associated with parameter µ j train , ≤ j ≤ n train .We will build the residual bases on each subdomain as: Φ ri = POD ( P r i X r g , υ ), i = 1 , . . . , n Ω .31 .2 Greedy algorithm Algorithm 4:
Greedy algorithm to construct spatial sample sets of all subdomains Ω i Input:
On each subdomain Ω i : residual basis Φ ri ∈ R n r i × p ri , desired number of sample nodes n zi ,number of working columns of Φ ri denoted by n ci ≤ min( p ri , γn zi ), where γ denotes the number ofunknowns at a node. Output:
Spatial sample set s i on each subdomain Ω i , i = 1 , . . . , n Ω for i = 1 : n Ω { drop subscript i from this line for legibility } do s ← { corner nodes } Compute the additional number of nodes to sample: n a = n z − | s | Initialize counter for the number of working basis vectors used: n b ← Set the number of greedy iterations to perform: n it = min( n c , n a ) Compute the maximum number of right-hand sides in the least squares problems: n RHS = ceil( n c /n a ) Compute the minimum number of working basis vectors per iteration: n cj, min = floor( n c /n it ) Compute the minimum number of sample nodes to add per iteration: n aj, min = floor( n a n RHS /n c ) for j = 1 , . . . , n it { greedy iteration loop } do Compute the number of working basis vectors for this iteration: n cj ← n cj, min If ( j ≤ n c mod n it ), then n cj ← n cj + 1 Compute the number of sample nodes to add during this iteration: n aj ← n aj, min If ( n RHS = 1) and ( j ≤ n a mod n c ), then n aj ← n aj + 1 if j = 1 then [ R . . . R n cj ] ← [ φ r . . . φ n cj r ] else for q = 1 , . . . , n cj { basis vector loop } R q ← φ n b + qr − [ φ r . . . φ n b r ] α , with α = argmin γ ∈ R nb (cid:107) [ Z φ r . . . Z φ n b r ] γ − Z φ n b + qr (cid:107) end for end if for k = 1 , . . . , n aj { sample node loop } do Choose node with largest average error: n ← arg max l/ ∈ s (cid:80) n cj q =1 (cid:16)(cid:80) j ∈ δ ( l ) ( R qj ) (cid:17) , where δ ( l )denotes the degrees of freedom associated with node l . s ← s ∪ { n } end for n b ← n b + n cj end for end for We adopt and adjust the original greedy algorithm developed earlier [34] to build the sample mesh foreach subdomain Ω i , i = 1 , . . . , n Ω . Algorithm 4 presents the modified greedy algorithm in which we dropthe subscript i of subdomains for avoiding cumbersomeness, and note that [ φ r . . . φ p ri r ] = Φ ri is the residualbases and Z = Z i is the sampling matrix of each subdomain Ω i .In comparison to the original greedy algorithm [34], Algorithm 4 has two modifications: (i) there is anouter “for loop” that loops over all subdomains (line 1, algorithm 4), and (ii) we include the “corner” nodes(i.e., any interface node) into the sample mesh before the first greedy iteration (line 2, algorithm 4). The lattermodification ensures that there is at least one interface node be included in the sample mesh, as otherwiseˆ x Γ i (and hence ˜ x Γ i ) will not be updated through Newton iterations (since p Γ( k ) i is always zero in (4.6)).Namely, there may have no connection between one subdomain with surrounding neighbor subdomains, orthat subdomain is completely isolated. This phenomenon is called “digraph connecting condition” [44] (see32gure 1 for an example of corner nodes of subdomains). We also note that the offline GNAT procedure isperformed only once as it only depends on the residual bases of each subdomain, and completely do not depend on basis types, constraint types and solver types of the DD-LSPG problem. References [1] Maday, Y. and Rønquist, E. M., “A reduced-basis element method,” Journal of scientific computing,Vol. 17, No. 1-4, 2002, pp. 447–459.[2] Maday, Y. and Ronquist, E. M., “The reduced basis element method: application to a thermal finproblem,” SIAM Journal on Scientific Computing, Vol. 26, No. 1, 2004, pp. 240–258.[3] Iapichino, L., Quarteroni, A., and Rozza, G., “A reduced basis hybrid method for the coupling ofparametrized domains represented by fluidic networks,” Computer Methods in Applied Mechanics andEngineering, Vol. 221, 2012, pp. 63–82.[4] Huynh, D. B. P., Knezevic, D. J., and Patera, A. T., “A static condensation reduced basis elementmethod: approximation and a posteriori error estimation,” ESAIM: Mathematical Modelling andNumerical Analysis, Vol. 47, No. 1, 2013, pp. 213–251.[5] Eftang, J., Huynh, D., Knezevic, D., Ronquist, E., and Patera, A., “Adaptive port reduction in staticcondensation,” IFAC Proceedings Volumes, Vol. 45, No. 2, 2012, pp. 695–699.[6] Iapichino, L., Quarteroni, A., and Rozza, G., “Reduced basis method and domain decomposition forelliptic problems in networks and complex parametrized geometries,” Computers & Mathematics withApplications, Vol. 71, No. 1, 2016, pp. 408–430.[7] Benzi, M., Golub, G. H., and Liesen, J., “Numerical solution of saddle point problems,” Acta numerica,Vol. 14, 2005, pp. 1–137.[8] Huynh, D. B. P., Knezevic, D. J., and Patera, A. T., “A static condensation reduced basis elementmethod: Complex problems,” Computer Methods in Applied Mechanics and Engineering, Vol. 259,2013, pp. 197–216.[9] Nguyen, N. C., “A multiscale reduced-basis method for parametrized elliptic partial differential equa-tions with multiple scales,” Journal of Computational Physics, Vol. 227, No. 23, 2008, pp. 9807–9822.[10] He, W., Avery, P., and Farhat, C., “In-situ adaptive reduction of nonlinear multiscale structural dy-namics models,” arXiv preprint arXiv:2004.00153, 2020.[11] Kaulmann, S., Ohlberger, M., and Haasdonk, B., “A new local reduced basis discontinuous Galerkinapproach for heterogeneous multiscale problems,” Comptes Rendus Mathematique, Vol. 349, No. 23-24,2011, pp. 1233–1238.[12] Ohlberger, M. and Schindler, F., “Error control for the localized reduced basis multiscale method withadaptive on-line enrichment,” SIAM Journal on Scientific Computing, Vol. 37, No. 6, 2015, pp. A2865–A2895.[13] Abdulle, A. and Henning, P., “A reduced basis localized orthogonal decomposition,” Journal ofComputational Physics, Vol. 295, 2015, pp. 379–401.[14] Martini, I., Rozza, G., and Haasdonk, B., “Reduced basis approximation and a-posteriori error estima-tion for the coupled Stokes-Darcy system,” Advances in Computational Mathematics, Vol. 41, No. 5,2015, pp. 1131–1157. 3315] Buhr, A., Engwer, C., Ohlberger, M., and Rave, S., “ArbiLoMod, a simulation technique designed forarbitrary local modifications,” SIAM Journal on Scientific Computing, Vol. 39, No. 4, 2017, pp. A1435–A1465.[16] Barbiˇc, J. and Zhao, Y., “Real-time large-deformation substructuring,” ACM transactions on graphics(TOG), Vol. 30, No. 4, 2011, pp. 1–8.[17] Kim, T. and James, D. L., “Physics-based character skinning using multidomain subspace deforma-tions,” IEEE transactions on visualization and computer graphics, Vol. 18, No. 8, 2012, pp. 1228–1240.[18] Yang, Y., Xu, W., Guo, X., Zhou, K., and Guo, B., “Boundary-aware multidomain subspace deforma-tion,” IEEE transactions on visualization and computer graphics, Vol. 19, No. 10, 2013, pp. 1633–1645.[19] Peiret, A., Andrews, S., K¨ovecses, J., Kry, P. G., and Teichmann, M., “Schur complement-based sub-structuring of stiff multibody systems with contact,” ACM Transactions on Graphics (TOG), Vol. 38,No. 5, 2019, pp. 1–17.[20] Teng, Y., Meyer, M., DeRose, T., and Kim, T., “Subspace condensation: full space adaptivity forsubspace deformations,” ACM Transactions on Graphics (TOG), Vol. 34, No. 4, 2015, pp. 1–9.[21] Buffoni, M., Telib, H., and Iollo, A., “Iterative methods for model reduction by domain decomposition,”Computers & Fluids, Vol. 38, No. 6, 2009, pp. 1160–1167.[22] Toselli, A. and Widlund, O., Domain decomposition methods - algorithms and theory, Vol. 34, SpringerScience & Business Media, 2006.[23] Sirovich, L., “Turbulence and the dynamics of coherent structures. Part I: Coherent structures,”Quarterly of applied mathematics, Vol. 45, No. 3, 1987, pp. 561–571.[24] Kerfriden, P., Goury, O., Rabczuk, T., and Bordas, S.-A., “A partitioned model order reduction ap-proach to rationalise computational expenses in nonlinear fracture mechanics,” Computer Methods inApplied Mechanics and Engineering, Vol. 200, No. 5, 2012, pp. 850–866.[25] Chaturantabut, S. and Sorensen, D. C., “Nonlinear model reduction via discrete empirical interpola-tion,” SIAM Journal on Scientific Computing, Vol. 32, No. 5, 2010, pp. 2737–2764.[26] Corigliano, A., Dossi, M., and Mariani, S., “Model Order Reduction and domain decomposition strate-gies for the solution of the dynamic elastic–plastic structural problem,” Computer Methods in AppliedMechanics and Engineering, Vol. 290, 2015, pp. 127–155.[27] Gravouil, A. and Combescure, A., “Multi-time-step explicit–implicit method for non-linear structuraldynamics,” International Journal for Numerical Methods in Engineering, Vol. 50, No. 1, 2001, pp. 199–225.[28] Baiges, J., Codina, R., and Idelsohn, S., “A domain decomposition strategy for reduced order models.Application to the incompressible Navier–Stokes equations,” Computer Methods in Applied Mechanicsand Engineering, Vol. 267, 2013, pp. 23–42.[29] Baiges, J., Codina, R., and Idelsohn, S., “Explicit reduced-order models for the stabilized finite elementapproximation of the incompressible Navier–Stokes equations,” International Journal for NumericalMethods in Fluids, Vol. 72, No. 12, 2013, pp. 1219–1243.[30] Bui-Thanh, T., Willcox, K., and Ghattas, O., “Model reduction for large-scale systems with high-dimensional parametric input space,” SIAM Journal on Scientific Computing, Vol. 30, No. 6, 2008,pp. 3270–3288.[31] LeGresley, P. A., Application of proper orthogonal decomposition (POD) to design decomposition methods,Stanford University, 2006. 3432] Carlberg, K., Bou-Mosleh, C., and Farhat, C., “Efficient non-linear model reduction via a least-squares Petrov–Galerkin projection and compressive tensor approximations,” International Journal forNumerical Methods in Engineering, Vol. 86, No. 2, 2011, pp. 155–181.[33] Carlberg, K., Barone, M., and Antil, H., “Galerkin v. least-squares Petrov–Galerkin projection innonlinear model reduction,” Journal of Computational Physics, Vol. 330, 2017, pp. 693–734.[34] Carlberg, K., Farhat, C., Cortial, J., and Amsallem, D., “The GNAT method for nonlinear modelreduction: effective implementation and application to computational fluid dynamics and turbulentflows,” Journal of Computational Physics, Vol. 242, 2013, pp. 623–647.[35] Astrid, P., Weiland, S., Willcox, K., and Backx, T., “Missing point estimation in models described byproper orthogonal decomposition,” IEEE Transactions on Automatic Control, Vol. 53, No. 10, 2008,pp. 2237–2251.[36] Ryckelynck, D., “A priori hyperreduction method: an adaptive approach,” Journal of computationalphysics, Vol. 202, No. 1, 2005, pp. 346–366.[37] Everson, R. and Sirovich, L., “Karhunen–Lo`eve procedure for gappy data,” Journal of the OpticalSociety of America A, Vol. 12, No. 8, 1995, pp. 1657–1664.[38] Lubin, M., Petra, C. G., and Anitescu, M., “The parallel solution of dense saddle-point linear systemsarising in stochastic programming,” Optimization Methods and Software, Vol. 27, No. 4-5, 2012, pp. 845–864.[39] Pestana, J. and Wathen, A. J., “The antitriangular factorization of saddle point matrices,” SIAMJournal on Matrix Analysis and Applications, Vol. 35, No. 2, 2014, pp. 339–353.[40] Rees, T. and Scott, J., “A comparative study of null-space factorizations for sparse symmetric saddlepoint systems,” Numerical Linear Algebra with Applications, Vol. 25, No. 1, 2018, pp. e2103.[41] Grepl, M. A., Maday, Y., Nguyen, N. C., and Patera, A. T., “Efficient reduced-basis treatment ofnonaffine and nonlinear partial differential equations,” ESAIM: Mathematical Modelling and NumericalAnalysis, Vol. 41, No. 03, 2007, pp. 575–605.[42] Masoudi, S. and Romano, F., “CFD Vienna blog,” http://cfdblogvienna.blogspot.co.at/p/computational-fluid-dynamics.html , Nov. 2015.[43] Lee, K. and Carlberg, K. T., “Model reduction of dynamical systems on nonlinear manifolds using deepconvolutional autoencoders,” Journal of Computational Physics, Vol. 404, 2020, pp. 108973.[44] GeeksforGeeks, “Check if a directed graph is connected or not,” , Dec. 2018.35 a) port, υ = 10 − on Ω i , υ = 10 − on Γ i (b) port, υ = 10 − on Ω i , υ = 10 − on Γ i (c) port, υ = 10 − on Ω i , υ = 10 − on Γ i (d) skeleton, υ = 10 − on Ω i , υ =10 − on Γ i (e) skeleton, υ = 10 − on Ω i , υ =10 − on Γ i (f) skeleton, υ = 10 − on Ω i , υ =10 − on Γ i (g) full-interface, υ = 10 − on Ω i , υ = 10 − on Γ i (h) full-interface, υ = 10 − on Ω i , υ = 10 − on Γ i (i) full-interface, υ = 10 − on Ω i , υ = 10 − on Γ i (j) subdomain, υ = 10 − (k) subdomain, υ = 10 − (l) subdom, υ = 10 − Figure 16: Burgers equation, 4x2 “fine” configuration, average relative error (DD-LSPG and GNAT) versusnumber of constraint per port for parameters reported in Table 14. In the legend GNAT( x, y ) implies theGNAT model with n zi /p ri = x and υ = 10 − y for r ii