[PDF] A benchmark of basis-adaptive sparse polynomial chaos expansions for engineering regression problems

Abstract

Sparse polynomial chaos expansions (PCE) are an efficient and widely used surrogate modeling method in uncertainty quantification for engineering problems with computationally expensive models. To make use of the available information in the most efficient way, several approaches for so-called basis-adaptive sparse PCE have been proposed to determine the set of polynomial regressors ("basis") for PCE adaptively. We describe three state-of-the-art basis-adaptive approaches from the recent sparse PCE literature and extensively benchmark them in terms of global approximation accuracy on a large set of computational models representative of a wide range of engineering problems. Investigating the synergies between sparse regression solvers and basis adaptivity schemes, we find that virtually all basis-adaptive schemes outperform a static choice of basis. Three sparse solvers, namely Bayesian compressive sensing and two variants of subspace pursuit, perform especially well. Aggregating our results by model dimensionality and experimental design size, we identify combinations of methods that are most promising for the specific problem class. Additionally, we introduce a novel solver and basis adaptivity selection scheme guided by cross-validation error. We demonstrate that this meta-selection procedure provides close-to-optimal results in terms of accuracy, and significantly more robust solutions, while being more general than the case-by-case recommendations obtained by the benchmark.

Full PDF

SSparse Polynomial Chaos Expansions: Solvers, Basis Adaptivityand Meta-selection

Nora Lüthen , Stefano Marelli , and Bruno Sudret Chair of Risk, Safety and Uncertainty Quantiﬁcation, ETH Zürich, Stefano-Franscini-Platz 5, 8093Zürich, Switzerland

September 11, 2020

Abstract

Sparse polynomial chaos expansions (PCEs) are an eﬃcient and widely used surrogatemodeling method in uncertainty quantiﬁcation. Among the many contributions aiming atcomputing an accurate sparse PCE while using as few model evaluations as possible, basisadaptivity is a particularly interesting approach. It consists in starting from an expansionwith a small number of polynomial terms, and then parsimoniously adding and removing ba-sis functions in an iterative fashion. We describe several state-of-the-art approaches from therecent literature and extensively benchmark them on a large set of computational modelsrepresentative of a wide range of engineering problems. We investigate the synergies be-tween sparse regression solvers and basis adaptivity schemes, and provide recommendationson which of them are most promising for speciﬁc classes of problems. Furthermore, we ex-plore the performance of a novel cross-validation-based solver and basis adaptivity selectionscheme, which consistently provides close-to-optimal results.

Surrogate modeling techniques are a popular tool in applied sciences and engineering, becausethey can signiﬁcantly reduce the computational cost of uncertainty quantiﬁcation analysis forcostly real-world computational models. Here, the computational model is approximated by acheaper-to-evaluate function, which is created based on a small number of model evaluations,the so-called experimental design. One well-known and popular surrogate modeling technique ispolynomial chaos expansion (PCE), which approximates the output of a computational modelby a spectral expansion in terms of an orthonormal polynomial basis in the input randomvariables (Xiu and Karniadakis, 2002). PCE is particularly well suited for surrogating smoothmodels in low to medium dimension, and for the eﬃcient computation of moments and Sobol’indices (Sudret, 2008; Le Gratiet et al., 2017). Sparse PCE techniques, which aim to computean expansion involving only few terms, have proven especially powerful and cost-eﬃcient forreal-world engineering problems such as, among many others, surrogate-assisted robust design1 a r X i v : . [ s t a t . C O ] S e p ptimization (Chatterjee et al., 2019), hybrid simulation for earthquake engineering (Abbiatiet al., 2021), dam engineering (Guo et al., 2019; Harari-Ardebili and Sudret, 2020), and windturbine design (Slot et al., 2020).In the last decade, a large amount of literature on sparse PCE has been published, proposingmethods that make sparse PCE more accurate, eﬃcient and applicable to high-dimensionalproblems. However, it is often not obvious how well these methods perform when compared toand combined with each other, especially on real-world engineering problems. In an attempt toﬁll this gap, the authors recently conducted a literature survey on sparse PCE and a classiﬁcationof the available methods into a general framework, as well as a benchmark of selected methodson a broad class of computational models (Lüthen et al., 2020). This benchmark extensivelycompared sparse regression solvers and experimental design sampling schemes, using a ﬁxedtruncation scheme for the polynomial basis.The goal of the present paper is to complement this benchmark by exploring the promisingﬁeld of basis-adaptive sparse PCE , in which the basis is iteratively augmented and reﬁned. Wedescribe several basis-adaptive schemes from the sparse PCE literature in detail, namely, degreeand q-norm adaptivity (Blatman and Sudret, 2011; Marelli and Sudret, 2019), forward neighbordegree adaptivity (Jakeman et al., 2015), and anisotropic degree basis adaptivity (Hampton andDoostan, 2018). The performance and synergies of combinations of sparse regression solversand basis adaptivity schemes are then evaluated in terms of validation error on a library of11 computational models of varying complexity, representative of a broad class of engineeringmodels. These range from three- to 100-dimensional and include both analytical and numericalmodels. Furthermore, we explore the idea of performing an additional selection of sparse PCEsamong several candidate PCEs computed by diﬀerent combinations of methods on the sameexperimental design, using a cross-validation estimate for the generalization error.The paper is structured as follows. In Section 2, we recall the deﬁnition of (sparse) PCE andthe computing framework introduced by Lüthen et al. (2020). We discuss various estimatorsfor the generalization error, sparse regression solvers, and basis adaptivity, and introduce meta-selection. The associated benchmark results for basis adaptivity and meta-selection are presentedin Section 3. Finally, a discussion and summary of our results is provided in Section 4. Additionalinformation and results can be found in the Appendix. Let X be a d -dimensional random vector with mutually independent components and probabilitydensity function f X . Denote by D X the domain of the random vector X . Deﬁne the space L f X ( D X ) = { h : D X → R | Var X [ h ( X )] < + ∞} of all scalar valued models with ﬁnite varianceunder f X . Under very general conditions on the input distribution f X , there exists a polynomialbasis { ψ α : α ∈ N d } of L f X ( D X ) (Xiu and Karniadakis, 2002; Ernst et al., 2012). Since thecomponents of X are mutually independent, each polynomial basis function can be built as aproduct of univariate polynomials in X , . . . , X d and characterized by the multi-index α ∈ N d M ∈ L f X ( D X ), let Y = M ( X ) denote the output random variable. Y can be cast as the following spectral expansion: Y = M ( X ) = X α ∈ N d c α ψ α ( X ) . (1)In practice, a ﬁnite, truncated polynomial chaos expansion Y = M ( X ) ≈ M PCE ( X ) = X α ∈A c α ψ α ( X ) (2)is computed, where A ⊂ N d is the truncation set deﬁning the basis elements used in the expan-sion. The accuracy of the expansion depends on A and the coeﬃcient values ( c α ) α ∈A =: c ∈ R P ,with P = card( A ).Among several methods for computing the coeﬃcient vector c for a given truncation set, sparseregression is a particularly powerful and eﬃcient method (Doostan and Owhadi, 2011; Blat-man and Sudret, 2011). In this approach, the model is evaluated at a number of points X = { x (1) , . . . , x ( N ) } ⊂ D X called the experimental design (ED), yielding the vector ofmodel responses y = ( M ( x (1) ) , . . . , M ( x ( N ) )) T . Let { α j } Pj =1 be an arbitrary ordering of themulti-indices in the truncation set and deﬁne the regression matrix Ψ ∈ R N × P with entriesΨ ij = ψ α j ( x ( i ) ). Sparse regression methods determine a vector c that minimizes the residualnorm k Ψ c − y k under the constraint that it has only few nonzero entries i.e., it is sparse . Thisis usually achieved by regularized regression, resulting e.g. in the LASSO formulationˆ c = min c ∈ R P k Ψ c − y k s.t. k c k ≤ τ, (3)where τ is a parameter regulating the sparsity of c . A PCE with a sparse coeﬃcient vector c iscalled sparse PCE . Provided that the regression matrix fulﬁlls certain properties (Candès andWakin, 2008; Bruckstein et al., 2009; Candès and Plan, 2011), sparse regression can ﬁnd robustsolutions to underdetermined systems of linear equations, which means that the experimentaldesign can be smaller than the number of unknown coeﬃcients.The quality of the solution depends on the choice of the basis A , on the experimental design X , and on the method used for computing the coeﬃcients. For each of these components,many diﬀerent methods have been proposed in recent years, including iterative methods whichadaptively determine A , or the experimental design X . These methods were recently surveyedand classiﬁed into the framework shown in Fig. 1 (modiﬁed from Lüthen et al. (2020)). In thepresent contribution, we focus on the question of how to determine a suitable basis A . To thisend, we benchmark several basis-adaptive approaches and explore their interplay with sparseregression solvers. Model selection is applied on several levels in the sparse PCE framework:3 xperimental designSparse regression ED enrichmentEvaluation of model selection criterionCandidate basis sparse PCE solution Basis adaptivity

Figure 1:

Framework for computing sparse PCE introduced by Lüthen et al. (2020), whoconducted a literature survey and a benchmark for the central components “Experimental design”and “Sparse regression” (adapted from Lüthen et al. (2020)). In the present work, we discuss andbenchmark the components marked in orange. In particular, we discuss the component “Basisadaptivity” and explore its relationship to sparse regression solvers. • Within the sparse solver to select its hyperparameter (see Section 2.2) • Within the basis adaptivity scheme to select a basis (see Section 2.3) • Finally, between solvers and basis adaptivity schemes, to automatically select a combina-tion that is close to best (see Section 2.4).Our main quantity of interest is the generalization error, in the form of the relative mean squarederror normalized by the model variance E gen = E X h ( M ( X ) − M PCE ( X )) i Var X [ M ( X )] . (4)It can be approximated by the validation error in the form of the relative mean squared error(MSE) RelMSE = P N val i =1 (cid:16) M ( x ( i )val ) − M PCE ( x ( i )val ) (cid:17) P N val i =1 (cid:16) M ( x ( i )val ) − N val P N val j =1 M ( x ( j )val ) (cid:17) (5)computed on a large validation set { x ( i )val } N val i =1 ∼ i.i.d. f X . We only consider model selection criteriathat are estimates of the generalization error.To get an accurate estimate of the generalization error, without using an additional validationset, a widely used method is cross-validation (Vapnik, 1995). Here, the available data is repeat-edly divided into two disjoint sets, one for computing the solution (training) and the other forevaluating the error (validation). Aggregating the error estimates from the diﬀerent splits, weget a cross-validation estimate of the generalization error.One cross-validation strategy is to divide the data randomly into k disjoint, roughly equally largeparts, and use each of the parts in turn as validation set. This is called k -fold cross-validation .4f k is chosen to be equal to the size of the experimental design, the strategy is known as leave-one-out cross-validation (LOO). This is closest to using all data points to compute a solution,since in each iteration, only one point is left out from the training set.In general, LOO can be quite costly, since for an experimental design of size N , the methodunder consideration has to be applied N times. A cheap approximation to the LOO error,which requires only one application of the method, is available for certain sparse regressionsolvers, namely for those which in their last step recompute the solution with ordinary least-squares (OLS) on the set of regressors with nonzero coeﬃcient (called active basis ) (Blatmanand Sudret, 2010a). In particular, this is the case for the solvers hybrid least angle regression(LARS) (Blatman and Sudret, 2011), orthogonal matching pursuit (OMP) (Pati et al., 1993;Jakeman et al., 2015), and subspace pursuit (SP) (Diaz et al., 2018) (see Section 2.2). For thesesolvers, the OLS-based LOO estimate coincides with the true LOO error if the following holds:regardless of which experimental design point is left out, the sparse regression solver consistentlyyields the same active basis.Since the repeated use of LOO for model selection often results in the generalization error beingunderestimated, especially on small experimental designs, Blatman and Sudret (2011) proposedto use a modiﬁcation factor originally developed for the empirical error (Chapelle et al., 2002),deﬁned by T = NN − P active (cid:16) Ψ T active Ψ active ) − ) (cid:17) , (6)where P active denotes the number of nonzero coeﬃcients (the corresponding basis functions arecalled active ), and Ψ active denotes the regression matrix containing only the active regressors.The product of the modiﬁcation factor T with the LOO error is called modiﬁed LOO error . Various sparse regression solvers available for solving sparse PCE were described in detail byLüthen et al. (2020). We give here only a short overview of the solvers used in our benchmark,and refer to Lüthen et al. (2020) for further details. These solvers are common choices in thesparse PCE literature. • Hybrid least angle regression (LARS) (Blatman and Sudret, 2011): adding regressorsone-by-one, following a least-angle strategy. The hybrid approach recomputes the ﬁnalcoeﬃcient values by ordinary least squares (OLS) on the selected regressors. • Orthogonal matching pursuit (OMP) (Pati et al., 1993; Jakeman et al., 2015): greedilyadding orthogonal regressors one-by-one, computing their coeﬃcients by OLS. • Subspace pursuit (SP) (Diaz et al., 2018): searching iteratively for a solution with acertain ‘ -norm by adding and removing regressors from the active set. Coeﬃcients arecomputed by OLS. We include two variants of SP in this benchmark, one which determinesits hyperparameter by 4-fold cross-validation as implemented by Diaz (2018), which we5enote by SP k =4 , and one where it is determined by OLS-based LOO, introduced in Lüthenet al. (2020) and denoted by SP LOO . • Bayesian compressive sensing (BCS) (Babacan et al., 2010): using a Bayesian frameworkto enforce sparsity of the coeﬃcients. • SPGL1 (van den Berg and Friedlander, 2008): a convex optimization solver following thePareto front of the residual-sparsity trade-oﬀ.Each of the solvers features at least one hyperparameter whose value needs to be determinedvia cross-validation in order to get a good solution. For LARS, OMP, SP k =4 , and SP LOO ,this hyperparameter is the number of active basis functions (nonzero coeﬃcients) of the ﬁnalsolution. For BCS and SPGL1, it is the bound on the residual norm in the sparse regressionformulation.The benchmark by Lüthen et al. (2020) of sparse regression solvers on a non-adaptive polynomialbasis showed that BCS and SP

LOO generally outperform other sparse solvers for low-dimensionalmodels, while for high-dimensional models, BCS is by far the best sparse solver.

The sparse solver and the experimental design, which were benchmarked in Lüthen et al. (2020),are not the only ingredients to a sparse PCE. The choice of the candidate basis , from which thesparse solver determines the active basis (i.e., the set of regressors with nonzero coeﬃcient), isanother important ingredient: if the candidate basis is chosen too small, important terms mightbe missing, which might lead to a large model error. On the other hand, if the candidate basisis too large, the ratio of the number of experimental design points to the number of coeﬃcientsis small, which causes some properties of the regression matrix to deteriorate and can result ina large approximation error.Of course, it is not possible to know a-priori the best choice of the candidate basis. Basis-adaptiveschemes start with an initial candidate basis and adapt it iteratively, adding or removing basisfunctions in each iteration according to various heuristics. The performance of the bases in thediﬀerent iterations is evaluated using a model selection criterion, typically an estimate of thegeneralization error.Several procedures for basis-adaptive sparse PCE have been proposed in the literature. Wediscuss three approaches in detail, namely • degree and q-norm (“p&q”) basis adaptivity, as implemented in UQLab (Marelli and Su-dret, 2019), see Section 2.3.1; • forward neighbor basis adaptivity (Jakeman et al., 2015), see Section 2.3.2; and • anisotropic degree basis adaptivity (Hampton and Doostan, 2018), from the softwareBASE_PC (Hampton and Doostan, 2017), see Section 2.3.3.We also brieﬂy mention several other approaches found in the literature.6 .3.1 Degree and q-norm (“p&q”) adaptivity A typical choice for a PCE basis is the basis of total degree p deﬁned by the set of multi-indices A p = { α ∈ N d : d X i =1 α i ≤ p } . (7)Furthermore, hyperbolic truncation (Blatman and Sudret, 2011) uses the q-(quasi-)norm k x k q = d X i =1 | x i | q ! q (8)with q ∈ (0 ,

1] to truncate a total-degree basis further: A p,q = { α ∈ N d : k α k q ≤ p } . (9)Hyperbolic truncation has the eﬀect of excluding some of the basis functions with high degreeand high interaction order. This is particularly eﬀective for high-dimensional problems.A simple basis-adaptive scheme is degree adaptivity (Blatman and Sudret, 2011), which computesa number of PCEs on a sequence of total-degree candidate bases of increasing size, and returnsthe PCE minimizing a certain error estimate as ﬁnal solution. Analogously, a q-norm adaptivescheme can be developed, and easily be combined with degree adaptivity (Marelli and Sudret,2019), yielding degree and q-norm (p&q) adaptivity . Degree and q-norm adaptivity is solution-agnostic, i.e., it does not take the solution computed in the previous iteration into account. Jakeman et al. (2015) suggest a basis-adaptive algorithm based on a graph view of the PCEregressors (see also Gerstner and Griebel (2003); Narayan and Jakeman (2014)).Since the input random vector is assumed to have independent components, the basis functionshave tensor product structure. The basis functions can be considered nodes of a directed acyclicgraph constructed as follows: two regressors are considered neighbors if their multi-index ofdegrees diﬀers only in one dimension by 1, i.e., there is a directed edge from ψ α to ψ β iﬀ γ := β − α is a multi-index with γ i = 1 for one i ∈ { , . . . , d } and γ j = 0 , j = i . Forwardneighbors of a regressor are regressors reachable by an outgoing edge, and backwards neighbors are regressors connected by an incoming edge.In the context of basis-adaptive PCE, this construction is used to determine a number of candi-date regressors to be added to the currently active basis, starting from an initial basis of smalltotal degree. Assume that an important high-degree or high-order regressor is not yet part of thecandidate basis but some of its backwards neighbors are. The fact that it is missing should bevisible in the coeﬃcients of its backwards neighbors, which can be expected to have a signiﬁcantnonzero coeﬃcient to compensate for the fact that the important high-degree regressor is notyet part of the basis. 7his heuristic is the foundation of the algorithm whose pseudocode is summarized in Algo-rithm 1. In each iteration, the current set of active basis functions is determined (restrictionstep). All forward neighbors of these active basis functions are surveyed and added to the can-didate basis if they are admissible , i.e., if all of their backwards neighbors are in the active set(expansion step). Jakeman et al. (2015) employ several expansion steps to prevent prematuretermination, and use cross-validation error estimates. We call this algorithm forward neighborbasis-adaptive . Algorithm 1

Forward neighbor basis adaptivity (Jakeman et al., 2015) Initial PCE (basis chosen a-priori, typically small total-degree basis) Restriction: retain only the active regressors A active Expansion: let A (0) = A active . For t = 1 , . . . , T , obtain the set A ( t ) by augmenting A ( t − by all its admissible forward neighbors. Compute a PCE and its error estimate for each augmented basis A (1) , . . . , A ( T ) . Choose the PCE with the lowest error estimate among the T candidates. Stop if this errorestimate is larger than the previously obtained best error estimate. Else, continue iterationwith restriction step (Step 2)This algorithm is implemented in the software Dakota (Adams et al., 2014). We use our ownMatlab-based implementation of the algorithm. Consistently with (Jakeman et al., 2015), weset T = 3. Hampton and Doostan (2018) propose an algorithm called BASE-PC, which combines a basis-adaptive scheme with sequential enrichment of a coherence-optimal experimental design. Thetwo steps are executed alternatingly: based on the current ED, a suitable basis is chosen; andaccording to the currently active basis functions, the ED is enriched and the weights are updated.In the present work, we solely consider the basis adaptivity part of the BASE-PC algorithm.The main idea of this algorithm is to use an anisotropic degree basis , deﬁned by a degree vector p ∈ N d . A related idea was explored earlier by Blatman and Sudret (2009). Similarly to atotal-order basis, an anisotropic degree basis is deﬁned by A p = { α ∈ N d : d X i =1 α i p i ≤ } . (10)If all entries of p are the same, i.e., p = p = . . . = p d = p , this deﬁnition reduces to a total-orderbasis of degree p . The equation P di =1 α i p i = 1 deﬁnes a hyperplane that cuts the i -th coordinateaxis at α i = p i .In each iteration, the algorithm determines the current anisotropic degree vector based on thecurrently active basis. A new, larger candidate basis is then constructed by considering theanisotropic degree basis corresponding to a uniformly increased anisotropic degree vector.8e use our own, slightly modiﬁed implementation of the basis adaptivity part of BASE-PC, assummarized in Algorithm 2. The most costly operation is the computation of the anisotropic-degree basis (line 6). Hampton and Doostan (2018) developed a specialized eﬃcient algorithm togenerate the multi-indices of an anisotropic degree basis, which we utilize in our implementation.We call Algorithm 2 anisotropic degree basis-adaptive . Algorithm 2

Anisotropic degree basis adaptivity (Hampton and Doostan, 2018) Initial PCE (ﬁxed basis of low order) for o = 1 , . . . , do . outer loop Restriction: denote by A active the set of active regressors of the last selected PCE for i = 1 , . . . , do . inner loop Additional restriction: remove i − regressors from the set A active starting with theones with smallest-in-magnitude coeﬃcients, obtaining a set A i ⊆ A active Expansion: compute the dimension-wise maximal degree of the regressors in A i ,denoted by degree vector p max . Compute p new = ( p max1 + 1 , p max2 + 1 , . . . , p max d + 1) and theassociated anisotropic-degree basis A i, new Compute a PCE on the basis A i, new and its error estimator e i If e i ≥ e i − , increase the so-called strike counter by 1. Break from the inner loop ifthe strike counter is ≥ end for From the 10 candidate PCEs, select the PCE with the lowest error estimate

Break from the outer loop if the error estimate has increased three times in subsequentiterations (ExpansionEarlyStop; same idea as inner loop strike counter) end for

Return the PCE with the lowest error estimate among all PCEs selected in the outer loop.

We brieﬂy summarize other approaches for basis adaptivity for sparse PCE. These approacheswill not be investigated in our benchmark.Many algorithms which use stepwise regression to build up a sparse basis regressor-by-regressorcan be classiﬁed both as sparse regression solvers (as done in Lüthen et al. (2020)) and as basis-adaptive algorithms. As an example, the approach by Blatman and Sudret (2010a) adds andremoves regressors one-by-one based on the induced change in LOO error, thereby building upa small set of active basis functions which is sparse in a larger total-degree basis. In this case,the candidate basis coincides with the active basis. Mai and Sudret (2015) use the “principle ofheredity” together with LARS to determine additional bivariate interaction terms to be added tothe model, once a univariate term is identiﬁed as relevant. We do not consider these approaches,since we are interested in algorithms modifying the candidate basis not only one regressor at atime, but at a larger scale.Alemazkoor and Meidani (2017) propose a basis-adaptive algorithm relying on sparsity and step-9y-step augmentation of the basis. In each step, the candidate basis is a total-order basis fora subset of input random variables, while the remaining input random variables are consideredconstant. The initial basis consists only of the constant term. In each iteration, either one of theconstant dimensions is activated, or the total degree of the candidate basis is increased by one(active dimensions only), depending on the resulting error estimate or sparsity of the solution.The coeﬃcients are solved by OLS until a pre-deﬁned threshold of residual norm is reached.Then, the sparse regression solver SPGL1 is used, which identiﬁes the sparsest solution whoseresidual norm is smaller than the given threshold. We do not consider this method due to itshigh computational cost and its strong tie to the solver SPGL1, making it less eﬀective whenpaired with other sparse regression solvers.Loukrezis and De Gersem (2019) propose a basis-adaptive algorithm for interpolating PCEon Leja-based sparse grids. Their algorithm is a more cautious version of forward neighborbasis adaptivity (Section 2.3.2): after adding all admissible forward neighbors to the basis andcomputing their coeﬃcients, all of the recently added regressors are removed again, except forthe one that achieved the largest-in-magnitude coeﬃcient. Thapa et al. (2020) suggest a basis-adaptive procedure that relies on total-order bases of increasing degree. In contrast with degreeadaptivity (Section 2.3.1), the basis functions of degree p + 1 are not added all at once, but inchunks of a certain size dependent on the dimension and the degree p . After adding a chunk ofbasis functions, the PCE is recomputed and regressors with a coeﬃcient smaller than a certainthreshold are removed from the basis. We do not consider these two approaches because theyare similar to the previously presented approaches while being more costly. For realistic simulators used in engineering and applied sciences, evaluating the computationalmodel is the expensive part of the surrogate modeling process. Once the model evaluations areobtained, all further computations are post-processing steps, and are computationally cheap incomparison to the model evaluations. Thus, it is feasible to apply several adaptive sparse PCEmethods and choose the best one.We therefore propose the use of an additional layer of selection ( meta-selection , also known as bagging or ensemble modeling ). For a given experimental design, we compute several sparsePCEs through diﬀerent combinations of sparse solvers and basis adaptivity schemes. From theresulting PCE solutions, we choose one based on the value of a suitable estimate of generalizationerror as model selection criterion. There are several possibilities. One possible choice is theestimator already used for selecting the hyperparameter of the solver and for basis adaptivity,i.e., (modiﬁed) LOO for LARS, OMP and SP LOO , and k -fold cross-validation for SP k =4 , BCSand SPGL1. However, this estimator might not be consistent between solvers, which is howevernecessary for such a meta-selection. Furthermore, this estimate might be biased due to itsrepeated use.A second class of estimators is given by the so-called hybrid cross-validation error estimators .10he word hybrid is a reference to Efron et al. (2004), who created the hybrid version of least-angle regression (LARS) which uses LARS only to identify the set of active basis functions,and then recomputes the coeﬃcients by ordinary least-squares (OLS) on this active set. Tocompute the hybrid leave-one-out (LOO) or hybrid k -fold cross-validation error estimate forany PCE solution, the same procedure is used: ﬁrst, the active basis functions are identiﬁedusing the whole experimental design. Then, the coeﬃcients are recomputed several times usingOLS, following the chosen cross-validation framework. This requires solving a linear system ofequations k times in the case of k -fold cross-validation. In case of LOO, only one linear systemof equations needs to be solved (Blatman and Sudret, 2010a). Furthermore, we can make use ofthe modiﬁcation factor of Eq. (6) to compute the hybrid modiﬁed LOO error estimate.As baseline cases, we will also select a solution 1) randomly from a set of generally well-performing methods, and 2) corresponding to the best combination identiﬁed in the benchmarkof basis adaptivity schemes (Section 3.1). In this benchmark, we compare the performance of various combinations of sparse regressionsolvers with basis-adaptive schemes. We use the following methods and associated implementa-tions. The sparse solvers (wrapped to ﬁt into our benchmark framework) are: • Hybrid-LARS: UQLab (Marelli and Sudret, 2019) • OMP: UQLab (Marelli and Sudret, 2019) • SP k =4 : DOPT_PCE (Diaz et al., 2018; Diaz, 2018) • SP LOO : own adaptation of Diaz et al. (2018); Diaz (2018) • BCS: FastLaplace (Babacan et al., 2010; Babacan, 2011) • SPGL1: SPGL1 (van den Berg and Friedlander, 2008; Van den Berg and Friedlander,2015)The basis-adaptive schemes are: • p&q adaptivity: UQLab (Marelli and Sudret, 2014) • forward neighbor basis adaptivity: own implementation of the algorithm based on thedescription by Jakeman et al. (2015) • anisotropic degree basis adaptivity: own implementation of an algorithm adapted fromthe basis-adaptive part of BASE-PC (Hampton and Doostan, 2018) (see Section 2.3.3)To reduce the complexity of our benchmark, we choose Latin hypercube sampling with max-imin distance optimization to generate the experimental design (ED) since Lüthen et al. (2020)11emonstrated that the choices of sparse solver and sampling scheme are mostly independentfrom each other.We use the following model selection methods: • For the selection of the hyperparameters of the sparse regression solvers, we use – modiﬁed OLS-based LOO for the solvers LARS, OMP and SP LOO – k -fold CV for the solvers SP k =4 , BCS and SPGL1 • The basis adaptivity schemes use the same criterion as the respective solver uses. • We investigate in Section 3.2 which criterion is suited best for the ﬁnal model selection.For our benchmark, we use 11 benchmark models ranging from low-dimensional analytical mod-els to high-dimensional diﬀerential equations. All of these models have previously been used asnumerical examples for surrogate modeling or reliability analysis. Table 1 provides an overviewof the benchmark models. For a more detailed presentation, we refer the interested reader tothe respective publications (see column "Reference" of Table 1).

We benchmark the sparse solvers LARS, OMP, SP k =4 , SP LOO , BCS, and SPGL1 combined withthe basis-adaptive schemes described in Section 2.3:1. degree and q-norm (p&q) adaptivity (abbreviation: PQ)2. forward neighbor basis adaptivity (FN)3. anisotropic degree basis adaptivity (AD)As a base case, we include a static basis following the rule P ≈ N (abbreviation: ST), wherewe use hyperbolic truncation with q = 0 . d ≥ R = 30 times to account for the stochasticityof the sampling method. Due to their relatively high computational cost, we omit SPGL1and anisotropic degree basis adaptivity from the benchmark for high-dimensional models. Moredetails on the settings for the basis adaptivity schemes (e.g., initial basis and investigated degreeranges) can be found in Appendix A.The aggregated rankings based on the relative MSE are shown in Figs. 2a–2d in the form ofstacked bar plots. We analyze the results for low- and high-dimensional models as well as smalland large ED sizes separately, resulting in 2 × able 1: Overview of the 11 computational models used in our benchmark. Finite elementmodels are marked in italic font, all other models are analytical. The column “Reference” pro-vides the literature in which the models and their input are described in detail. The column “EDsizes” contains the two sizes of experimental design (small and large) used in the basis adaptivitybenchmark.

Model Dimension Distributions Reference ED sizes(small, large)Ishigami function 3 uniform Blatman andSudret (2011) 50, 150Undamped oscillator 6 Gaussian Echard et al.(2013) 60, 120Borehole function 8 Gaussian, lognormal,uniform Morris et al.(1993) 100, 250Damped oscillator 8 lognormal Dubourg (2011) 150, 350Wingweight function 10 uniform Forrester et al.(2008) 100, 250

Truss model

10 lognormal, Gumbel Blatman andSudret (2011) 100, 250Morris function 20 uniform Blatman andSudret (2010b) 150, 350

Structural frame model

21 lognormal, Gaussian;dependent input variables Blatman andSudret (2010a) 150, 350

53 Gaussian Konakli andSudret (2016) 100, 400

62 Gaussian Fajraoui et al.(2017) 100, 400High-dim. function 100 uniform UQLab example 400, 1200 × × X - Y - Z , where X denotes the percentage of runs where therespective combination turned out best (i.e., the length of the bright yellow bar), Y denotesthe percentage of runs that were within a factor of two of the smallest error on that ED (redtriangle), and Z denotes the percentage of runs that were within 1 order of magnitude of thesmallest error (light red triangle). The numbers are rounded to integer values. Italic numbersindicate that this is the best value achieved among all combinations. The enumeration tags referto the panels in Figure 2.(2a) Low-dimensional models, small ED : BCS together with forward neighbor basis adaptivity( - -74) achieves the smallest error more often than any other combination. However,this combination is not the most robust: in 100 −

74 = 26% of runs, its error is more thanone magnitude worse than the best error. SP

LOO together with p&q adaptivity (8-36- )is most robust.(2b) Low-dimensional models, large ED : by far the best combination in all three categories isSP

LOO together with forward neighbor basis adaptivity ( - - ).(2c) High-dimensional models, small ED : there are two combinations that outperform the oth-ers: BCS with p&q adaptivity ( -69- ) and BCS with a static basis (2- - ).(2d) High-dimensional models, large ED : the two best combinations are SP k =4 with forward14 Ranking percentage

BCS & PQSP(LOO) & PQLARS & ADSP(LOO) & FNBCS & ADBCS & FN Rank 1 Rank 12 Rank 24 (a) low-dimensional models, small ED

Ranking percentage

SP(LOO) & ADSP(k=4) & FNSP(LOO) & PQSP(k=4) & ADLARS & ADSP(LOO) & FN Rank 1 Rank 12 Rank 24 (b) low-dimensional models, large ED

Ranking percentage

LARS & STLARS & PQBCS & PQSP(k=4) & FNBCS & FNBCS & ST Rank 1 Rank 7 Rank 15 (c) high-dimensional models, small ED

Ranking percentage

BCS & PQBCS & STLARS & FNSP(LOO) & FNBCS & FNSP(k=4) & FN Rank 1 Rank 7 Rank 15 (d) high-dimensional models, large ED

Ranking percentage

OMP & FNOMP & PQOMP & STSP(LOO) & STSP(k=4) & STSP(k=4) & PQLARS & STLARS & FNSP(LOO) & PQLARS & PQBCS & STBCS & PQBCS & FNSP(k=4) & FNSP(LOO) & FN1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15.within 2*bestwithin 5*bestwithin 10*best (e)

Aggregated results for all models and ED sizes

Figure 2:

Visualization of rankings for the combinations of sparse solvers and basis adaptivityschemes. For each model, each ED size and each replication, the ranking of the combinations isdetermined based on the relative MSE. The rankings are aggregated for low- and high-dimensionalmodels as well as small and large EDs separately (top) or over all models and ED sizes, respec-tively (panel 2e). The color scale of the stacked bars, ranging from yellow to blue, visualizeshow often each combination attained the respective ranking. The triangles in hues of red denotethe percentage of results that were within a factor ( { , , } ) of the best relative MSE for thespeciﬁc ED. - -81) and SP LOO with forward neighbor basis adaptivity(26-61- ).We observe that from the list of six solvers, only BCS, SP k =4 and SP LOO are found among thebest combinations. Considering the best six combinations (based on Y ) as in Fig. 2, only LARSis joining the list. OMP does not perform well in any of the cases (see also Appendix, Figs. 4–7).This is likely due to its tendency to severely underestimate the generalization error, which mightmake the comparison between error estimates of diﬀerent bases unreliable. Likewise, SPGL1 isnever among the best six combinations for low-dimensional models.Regarding basis adaptivity schemes, the static basis is not among the best six combinations(based on Y ) for low-dimensional models. However for high-dimensional models, especially forthe small ED case, it is among the six best combinations several times and performs well.To get a complete picture of solver and basis adaptivity scheme performance, we display in Fig. 2ethe relative performance of all combinations of solvers and basis adaptivity schemes, averagedover all models and experimental designs regardless of their dimensionality and size. We excludecombinations involving SPGL1 and anisotropic degree basis adaptivity (AD), which were onlytested for low-dimensional models. Therefore, Fig. 2e contains 3 × LOO together with forward neighbor basis adaptivity ( - - ) performs best inall three categories. SP k =4 together with forward neighbor basis adaptivity (22-53-81) is on thesecond place in terms of the ﬁrst two criteria ( X and Y ). However, SP LOO together with p&qbasis adaptivity (7-31-85) is the second-best solver in terms of achieving an error within oneorder of magnitude of the best error most often ( Z ). BCS together with any basis adaptivityscheme performs well, while all combinations involving OMP are on the bottom of the list.Combinations with a static basis are found more towards the end of the list, and those withforward neighbor basis adaptivity are found more towards the beginning of the list.From these plots, we see clearly that there is no single combination of sparse solver and basisadaptivity scheme that always outperforms all others. While SP LOO together with forwardneighbor basis adaptivity shows superior performance when averaged over all models and EDsizes, we identify better-performing combinations when we diﬀerentiate by model dimension andED size. In some cases, we are faced with a trade-oﬀ between accuracy and robustness: e.g., forhigh-dimensional models and large EDs, the choice of SP k =4 & FN has a higher probability ofyielding a close-to-optimal solution, while the choice of SP LOO & FN has a higher probabilityof not being too far oﬀ from the optimal one. The same holds for low-dimensional models andsmall EDs for the combinations BCS & FN vs. SP

LOO & PQ.

As we saw in the previous section, there is no single best-performing combination of sparse solverand basis adaptivity scheme. In this section, we investigate whether it is possible to select awell-performing combination from a number of candidate combinations using a model selectioncriterion ( meta-selection ). 16or each model, ED size and repetition, the combination of basis adaptivity scheme and sparsesolver that minimizes the considered selection index is chosen to be the ﬁnal meta-model. Dueto the performance of the methods in the benchmark in the previous sections, we restrict ourinvestigation to the solvers SP k =4 , SP LOO and BCS, and to the basis adaptivity schemes • p&q adaptivity (PQ), forward neighbor adaptivity (FN), and anisotropic degree adaptivity(AD) for low-dimensional models • static basis (static), p&q adaptivity (PQ), and forward neighbor adaptivity (FN) for high-dimensional modelsresulting in 9 possible solutions for each model, ED size and repetition.We use the following model selection criteria (see also Section 2.4):1. the oracle solution, i.e, the smallest relative MSE attained among the 9 combinations oneach ED realization, as a lower bound. Of course, this information is not available inpractice.2. the criterion used for basis and hyperparameter selection by the respective solver (seeSection 2.1)3. hybrid LOO, computed by OLS on the active basis functions only4. hybrid modiﬁed LOO, computed by OLS on the active basis functions only, and using thecorrection factor from Eq. (6)5. hybrid 10-fold cross-validation error, computed by OLS on the active basis functions only6. A ﬁxed rule dependent on dimensionality and ED size, according to the ﬁndings in Section3.1, choosing the solver that was ranked ﬁrst most often (bright yellow bar): • low-dimensional models, small ED: BCS & FN • low-dimensional models, large ED: SP LOO & FN • high-dimensional models, small ED: BCS & PQ • high-dimensional models, large ED: SP k =4 & FN7. A randomly picked combination of solver and basis adaptivity scheme as upper bound:any sensible model selection criterion must perform better than this.The results are presented in Figure 3. Note that we do not display which sparse solver and basisadaptivity scheme was chosen – we only show how close the chosen solution comes to the bestpossible solution. • By construction, the oracle selection performs best in all cases. • The random selection is by far the worst selection criterion, which shows that meta-selection provides a statistically signiﬁcant improvement over random selection.17

Ranking percentage randomfixed rulemin hybrid k-foldmin hybrid mod LOOmin hybrid LOOmin CV erroracle 1. 2. 3. 4. 5. 6. 7. within 2*bestwithin 5*bestwithin 10*best (a) low-dimensional models, small ED

Ranking percentage randomfixed rulemin hybrid k-foldmin hybrid mod LOOmin hybrid LOOmin CV erroracle 1. 2. 3. 4. 5. 6. 7. within 2*bestwithin 5*bestwithin 10*best (b) low-dimensional models, large ED

Ranking percentage randomfixed rulemin hybrid k-foldmin hybrid mod LOOmin hybrid LOOmin CV erroracle 1. 2. 3. 4. 5. 6. 7. within 2*bestwithin 5*bestwithin 10*best (c) high-dimensional models, small ED

Ranking percentage randomfixed rulemin hybrid k-foldmin hybrid mod LOOmin hybrid LOOmin CV erroracle 1. 2. 3. 4. 5. 6. 7. within 2*bestwithin 5*bestwithin 10*best (d) high-dimensional models, large ED

Ranking percentage randomfixed rulemin hybrid k-foldmin hybrid mod LOOmin hybrid LOOmin CV erroracle 1. 2. 3. 4. 5. 6. 7. within 2*bestwithin 5*bestwithin 10*best (e)

All dimensionalities and ED sizes in one

Figure 3:

Testing diﬀerent model selection strategies. As before, the stacked bar plot visualizesthe percentage of runs where the respective strategy ranked ﬁrst, second, etc., among the sevenselection strategies. The triangles in hues of red illustrate how many of the runs yielded a resultthat was within a factor of { , , } of the best error. The ﬁxed rule is overall less robust than the hybrid CV selection criteria (light red triangle).Furthermore, it does not ﬁnd the best or second-best solution more often than the hybridCV selection criteria. • The criterion used for basis and hyperparameter selection, i.e., k-fold CV for SP k =4 andBCS, and LOO for SP LOO , performs slightly worse than the three hybrid selection criteria– except for high-dimensional models with a large ED, for which this criterion performsbest by a large margin (Fig. 3d). Overall, this results in an almost identical performanceof this and the hybrid criteria when all models and ED sizes are combined (Fig. 3e). Therelatively poor performance of this selection criterion for low-dimensional models and smallexperimental designs might be explained by selection bias: since the criterion was alreadyused twice for selection, it is likely that it underestimates the generalization error. Anotherreason might be that diﬀerent solvers use diﬀerent criteria which might not be comparableamong each other (inconsistency). • The three other model selection criteria hybrid LOO, hybrid modiﬁed LOO, and hybrid 10-fold CV perform almost identically, especially when averaging the results over all modelsand ED sizes. Note that all three attain in almost all cases an error which is withinone order of magnitude of the oracle solution, which is signiﬁcantly better than any ﬁxedcombination of methods (compare Figs. 2e and 3e).We conclude that meta-selection, i.e., using cross-validation to choose a PCE solution froma number of candidate solutions computed with diﬀerent methods on the same experimentaldesign, can achieve more robust and slightly more accurate results than a ﬁxed rule on whichcombination of solver and basis adaptivity scheme to use, even if this rule is based on a thoroughbenchmark such as the one in Section 3.1.

We investigated several approaches for computing sparse PCE with the goal of surrogate mod-eling, using the relative mean squared error on a validation set as the main performance metric.In particular, we studied the performance of combinations of basis adaptivity schemes andsparse solvers in order to identify combinations that yield the smallest generalization error. Ourinvestigations are based on 11 analytical and numerical benchmark models of varying inputdimension and complexity, representative of a wide range of engineering problems. We con-sidered the sparse solvers least angle regression (LARS), orthogonal matching pursuit (OMP),subspace pursuit based on k -fold cross-validation (SP k =4 ), subspace pursuit based on LOO(SP LOO ), Bayesian compressive sensing/FastLaplace (BCS), and spectral projected gradient- ‘ (SPGL1). The basis adaptivity schemes we compared were a ﬁxed basis truncation schemefollowing the rule N ≈ P , degree- and q-norm adaptivity, forward neighbor basis adaptivity,and anisotropic degree basis adaptivity. We made a distinction between four cases, namely low-and high-dimensional models as well as small and large experimental design sizes.19he benchmark revealed that the diﬀerence in generalization error between diﬀerent combina-tions of methods can be large, even more than an order of magnitude. In each of the four cases,we were able to ﬁnd combinations that generally perform well. These combinations always in-volved the solvers SP LOO , SP k =4 , or BCS, but never SPGL1 or OMP. For the basis-adaptiveschemes, the picture is less clear, except that the static basis (i.e., no basis adaptivity) was innearly all cases outperformed by basis-adaptive schemes. There is no combination of solver andbasis-adaptive scheme that was consistently best across all models and experimental design sizes,although averaging over all four cases, SP LOO together with forward neighbor basis adaptivityoutperformed all other combinations.Since no clear best combination of solver and basis adaptivity scheme emerged, we introduced anautomatic meta-selection step. Based on a suitable error estimate, meta-selection chooses onePCE solution out of several candidate solutions computed by various combinations of solvers andbasis adaptivity schemes. We found that meta-selection using any hybrid cross-validation errorestimate performs better than the ﬁxed rules obtained from the basis adaptivity benchmark bothin terms of attained relative MSE and robustness, and far better than a random selection of solverand basis adaptivity scheme. An additional advantage of meta-selection is that it is automaticand independent of model dimension or size of the available experimental design, unlike theproposed ﬁxed rules, which rely on the somewhat arbitrary (albeit simple) classiﬁcation weapplied (low/high dimension and small/large experimental design).These ﬁndings demonstrate that when building a sparse PCE for an expensive black-box com-putational model, it is worth it to carefully select a sparse solver, and to apply a basis-adaptivescheme, because the diﬀerence in relative MSE between diﬀerent combinations of methods on thesame experimental design can be larger than one order of magnitude. While we could identify anumber of methods that generally perform well, and others that should be avoided, a superiorstrategy is to compute several PCE solutions and perform a ﬁnal model selection using one ofthe presented hybrid cross-validation error estimators.Further research is needed to investigate the use of true cross-validation for meta-selectioninstead of the hybrid estimates which we used here. Also, it might be possible to identify otherproblem characteristics besides its dimension and the size of the available experimental designto guide the choice of methods in the sparse PCE framework. A promising class of methodscombines basis adaptivity with the sequential enrichment of the experimental design, adaptedto the current candidate or active basis (Fajraoui et al., 2017; Diaz et al., 2018; Hampton andDoostan, 2018), which might be able to further improve on the results obtained here.

Acknowledgments

We thank John Jakeman (Sandia National Laboratories), Negin Alemazkoor (University of Mas-sachusetts Dartmouth), Hadi Meidani (University of Illinois at Urbana-Champaign), and JerradHampton (University of Colorado Boulder) for generously providing their code and explanations.We thank John Jakeman for his useful hints to relevant literature on basis adaptivity.20 eferences

Abbiati, G., S. Marelli, N. Tsokanas, B. Sudret, and B. Stojadinovic (2021). A global sensitivityanalysis framework for hybrid simulation.

Mechanical Systems and Signal Processing 146 ,106997.Adams, B. M., L. E. Bauman, W. J. Bohnhoﬀ, K. R. Dalbey, M. S. Ebeida, J. P. Eddy, M. S.Eldred, P. D. Hough, K. T. Hu, J. D. Jakeman, J. A. Stephens, L. P. Swiler, D. M. Vigil, , andT. M. Wildey (2014).

Dakota, A Multilevel Parallel Object-Oriented Framework for DesignOptimization, Parameter Estimation, Uncertainty Quantiﬁcation, and Sensitivity Analysis:Version 6.0 User’s Manual . Sandia National Laboratories. Technical Report SAND2014-4633(Updated November 2015 (Version 6.3)).Alemazkoor, N. and H. Meidani (2017). Divide and conquer: An incremental sparsity promotingcompressive sampling approach for polynomial chaos expansions.

Comput. Methods Appl.Mech. Engrg. 318 , 937–956.Babacan, D. (2011). MATLAB code for the TIP 2009 paper "Bayesian Compressive Sensingusing Laplace Priors". . [Online; accessed 28-August-2019].Babacan, S., R. Molina, and A. Katsaggelos (2010). Bayesian compressive sensing using Laplacepriors.

IEEE Trans. Image Process. 19 (1), 53–63.Blatman, G. and B. Sudret (2009). Anisotropic parcimonious polynomial chaos expansionsbased on the sparsity-of-eﬀects principle. In H. Furuta, D. Frangopol, and M. Shinozuka(Eds.),

Proc. 10th Int. Conf. Struct. Safety and Reliability (ICOSSAR’2009), Osaka, Japan .Blatman, G. and B. Sudret (2010a). An adaptive algorithm to build up sparse polynomial chaosexpansions for stochastic ﬁnite element analysis.

Prob. Eng. Mech. 25 , 183–197.Blatman, G. and B. Sudret (2010b). Eﬃcient computation of global sensitivity indices usingsparse polynomial chaos expansions.

Reliab. Eng. Sys. Safety 95 , 1216–1229.Blatman, G. and B. Sudret (2011). Adaptive sparse polynomial chaos expansion based on LeastAngle Regression.

J. Comput. Phys 230 , 2345–2367.Bruckstein, A. M., D. L. Donoho, and M. Elad (2009). From sparse solutions of systems ofequations to sparse modeling of signals and images.

SIAM review 51 (1), 34–81.Candès, E. J. and Y. Plan (2011). A probabilistic and RIPless theory of compressed sensing.

IEEE Trans. Inform. Theory 57 (11), 7235–7254.Candès, E. J. and M. B. Wakin (2008). An introduction to compressive sampling: A sens-ing/sampling paradigm that goes against the common knowledge in data acquisition.

IEEESignal Process. Mag. 25 (2), 21–30. 21hapelle, O., V. Vapnik, and Y. Bengio (2002). Model selection for small sample regression.

Machine Learning 48 (1), 9–23.Chatterjee, T., S. Chakraborty, and R. Chowdhury (2019). A critical review of surrogate assistedrobust design optimization.

Arch. Comput. Method. E. 26 (1), 245–274.Diaz, P. (2018). DOPT_PCE. https://github.com/CU-UQ/DOPT_PCE . [Online; accessed 14-May-2020].Diaz, P., A. Doostan, and J. Hampton (2018). Sparse polynomial chaos expansions via com-pressed sensing and D-optimal design.

Comput. Methods Appl. Mech. Engrg. 336 , 640–666.Doostan, A. and H. Owhadi (2011). A non-adapted sparse approximation of PDEs with stochas-tic inputs.

J. Comput. Phys. 230 (8), 3015–3034.Dubourg, V. (2011).

Adaptive surrogate models for reliability analysis and reliability-based designoptimization . Ph. D. thesis, Université Blaise Pascal, Clermont-Ferrand, France.Echard, B., N. Gayton, M. Lemaire, and N. Relun (2013). A combined importance samplingand Kriging reliability method for small failure probabilities with time-demanding numericalmodels.

Reliab. Eng. Syst. Safety 111 , 232–240.Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani (2004). Least angle regression.

Ann.Stat. 32 , 407–499.Ernst, O., A. Mugler, H.-J. Starkloﬀ, and E. Ullmann (2012). On the convergence of generalizedpolynomial chaos expansions.

ESAIM: Math. Model. Numer. Anal. 46 (02), 317–339.Fajraoui, N., S. Marelli, and B. Sudret (2017). Sequential design of experiment for sparsepolynomial chaos expansions.

SIAM/ASA J. Unc. Quant. 5 (1), 1061–1085.Forrester, A., A. Sobester, and A. Keane (2008).

Engineering design via surrogate modelling: apractical guide . Wiley.Gerstner, T. and M. Griebel (2003). Dimension-adaptive tensor-product quadrature.

Comput-ing 71 , 65–87.Guo, X., D. Dias, and Q. Pan (2019). Probabilistic stability analysis of an embankment damconsidering soil spatial variability.

Comput. Geotech. 113 , 103093.Hampton, J. and A. Doostan (2017). BASE_PC. https://github.com/CU-UQ/BASE_PC . [On-line; accessed 14-May-2020].Hampton, J. and A. Doostan (2018). Basis adaptive sample eﬃcient polynomial chaos (BASE-PC).

J. Comput. Phys. 371 , 20–49.Harari-Ardebili, M. and B. Sudret (2020). Polynomial chaos expansion for uncertainty quantiﬁ-cation of dam engineering problems.

Engineering Structures 203 .22akeman, J. D., M. S. Eldred, and K. Sargsyan (2015). Enhancing ‘ -minimization estimates ofpolynomial chaos expansions using basis selection. J. Comput. Phys. 289 , 18–34.Konakli, K. and B. Sudret (2016). Global sensitivity analysis using low-rank tensor approxima-tions.

Reliab. Eng. Sys. Safety 156 , 64–83.Le Gratiet, L., S. Marelli, and B. Sudret (2017).

Metamodel-based sensitivity analysis: polyno-mial chaos expansions and Gaussian processes , Chapter 8, Handbook on Uncertainty Quan-tiﬁcation (Ghanem, R. and Higdon, D. and Owhadi, H. (Eds.). Springer.Loukrezis, D. and H. De Gersem (2019). Adaptive sparse polynomial chaos expansions via Lejainterpolation. arXiv preprint arXiv:1911.08312 .Lüthen, N., S. Marelli, and B. Sudret (2020). Sparse polynomial chaos expansions: Literaturesurvey and benchmark.

SIAM/ASA J. Unc. Quant. . (submitted).Mai, C. V. and B. Sudret (2015). Hierarchical adaptive polynomial chaos expansions. In M. Pa-padrakakis, V. Papadopoulos, and G. Stefanou (Eds.), , Creta, Greece.Marelli, S. and B. Sudret (2014). UQLab: A framework for uncertainty quantiﬁcation in Matlab.In

Vulnerability, Uncertainty, and Risk (Proc. 2nd Int. Conf. on Vulnerability, Risk Analysisand Management (ICVRAM2014), Liverpool, United Kingdom) , pp. 2554–2563.Marelli, S. and B. Sudret (2019). UQLab user manual – Polynomial chaos expansions. Techni-cal report, Chair of Risk, Safety and Uncertainty Quantiﬁcation, ETH Zurich, Switzerland.Report

Technometrics 35 , 243–255.Narayan, A. and J. D. Jakeman (2014). Adaptive Leja sparse grid constructions for stochastic col-location and high-dimensional approximation.

SIAM Journal on Scientiﬁc Computing 36 (6),A2952–A2983.Pati, Y. C., R. Rezaiifar, and P. S. Krishnaprasad (1993). Orthogonal matching pursuit: Re-cursive function approximation with applications to wavelet decomposition. In

Proc. of 27thAsilomar Conf. on signals, systems and computers , pp. 40–44. IEEE.Slot, R., J. D. Sorensen, B. Sudret, L. Svenningsen, and M. Thogersen (2020). Surrogate modeluncertainty in wind turbine reliability assessment.

Renewable Energy 151 , 1150–1162.Sudret, B. (2008). Global sensitivity analysis using polynomial chaos expansions.

Reliab. Eng.Sys. Safety 93 , 964–979.Thapa, M., S. B. Mulani, and R. W. Walters (2020). Adaptive weighted least-squares polynomialchaos expansion with basis adaptivity and sequential adaptive sampling.

Comput. Meth. Appl.Mech. Eng. 360 , 112759. 23an den Berg, E. and M. P. Friedlander (2008). Probing the Pareto frontier for basis pursuitsolutions.

SIAM J. Sci. Comput. 31 (2), 890–912.Van den Berg, E. and M. P. Friedlander (2015). SPGL1 - A Matlab solver for sparse optimization. https://friedlander.io/software/spgl1/ . [Online; accessed 28-August-2019].Vapnik, V. N. (1995).

The Nature of Statistical Learning Theory . Springer-Verlag, New York.Xiu, D. and G. E. Karniadakis (2002). The Wiener-Askey polynomial chaos for stochasticdiﬀerential equations.

SIAM J. Sci. Comput. 24 (2), 619–644.24

Details on the settings for the basis adaptivity schemes inthe benchmark

In Table 2, we list the basis adaptivity settings for each of the 11 models in our benchmark. • The rule for the static basis (ST) is to choose a total-degree basis with p so that thenumber of basis functions P is closest to N , where N is the number of ED points. Here,for low-dimensional models, the q-norm is chosen as q = 1, while for high-dimensionalmodels, we use q = 0 . • For degree and q-norm basis adaptivity (PQ), we choose the ranges for degree and q-normas large as possible while still keeping the number of basis functions computationallymanageable (rule of thumb (cid:46) ). • For forward neighbor degree adaptivity (FN), the degree of the initial basis is chosen sothat the size of the basis is closest to 10 N (as recommended in (Jakeman et al., 2015)),while we set the q-norm to the maximum of the q-norm-range for PQ basis adaptivity. • Finally, for anisotropic degree basis adaptivity, which is only used for low-dimensionalmodels, we use q = 1 and p = d p max e , where p max is the maximum of the degree range forPQ basis adaptivity. B Additional results

B.1 Full plots of rankings

In Figs. 4-7, we display the aggregated results for all combinations of solvers and samplingschemes (while in Section 3.1, we only displayed the six best combinations sorted according tohow often they achieved an error within two times the best relMSE (red triangle). For a detaileddescription of how to read this plot, we refer to Section 3.1.

B.2 Basis adaptivity benchmark: raw data

Figs. 8 and 9 show the raw data of the basis adaptivity benchmark: for each model and EDsize, we display the boxplots of resulting relative MSE for each combination of sparse solver andbasis-adaptive scheme. 25 able 2:

Details on the initial bases and degree and q-norm ranges for the various basis adap-tivity schemes

Model dim d static basis PQ range FN initial AD initialIshigami function 3 p = 8 (small ED)/ p = 12 (large), q = 1 p ∈ [1 , . . . , q ∈ [0 . , . , . . . , p = 12 (small ED)/ p = 19 (large) p = 13Undamped oscilla-tor 6 p = 4 / p = 4 , q = 1 p ∈ [1 , . . . , q ∈ [0 . , . , . . . , p = 5 / p = 6 p = 5Borehole function 8 p = 4 / p = 4 , q = 1 p ∈ [1 , . . . , q ∈ [0 . , . , . . . , p = 5 / p = 6 p = 5Damped oscillator 8 p = 4 / p = 5 , q = 1 p ∈ [1 , . . . , q ∈ [0 . , . , . . . , p = 5 / p = 6 p = 4Wingweight func-tion 10 p = 3 / p = 4 , q = 1 p ∈ [1 , . . . , q ∈ [0 . , . , . . . , p = 4 / p = 5 p = 4 Truss model p = 3 / p = 4 , q = 1 p ∈ [1 , . . . , q ∈ [0 . , . , . . . , p = 4 / p = 5 p = 3Morris function 20 p = 6 / p = 8 , q = 0 . p ∈ [1 , . . . , q ∈ [0 . , . , . p = 6 / p = 8, q = 0 . Structural framemodel p = 5 / p = 8 , q = 0 . p ∈ [1 , . . . , q ∈ [0 . , . , . p = 6 / p = 8, q = 0 . p = 3 / p = 4 , q = 0 . p ∈ [1 , . . . , q ∈ [0 . , . , . p = 4 / p = 5, q = 0 . p = 3 / p = 4 , q = 0 . p ∈ [1 , . . . , q ∈ [0 . , . , . p = 3 / p = 4, q = 0 . p = 3 / p = 4 , q = 0 . p ∈ [1 , . . . , q = 0 . p = 4 / p = 5, q = 0 . Ranking percentage

OMP & FNOMP & STSP(LOO) & STOMP & ADSP(k=4) & STOMP & PQSPGL1 & STLARS & STSPGL1 & FNLARS & FNSPGL1 & ADSP(k=4) & PQBCS & STSP(k=4) & ADLARS & PQSPGL1 & PQSP(LOO) & ADSP(k=4) & FNBCS & PQSP(LOO) & PQLARS & ADSP(LOO) & FNBCS & ADBCS & FN Rank 1 Rank 12 Rank 24 within 2*bestwithin 5*bestwithin 10*best

Figure 4: Small ED sizes . Aggregated results for low-dim models. Ranking percentage

OMP & FNOMP & STOMP & PQOMP & ADBCS & FNLARS & STSP(LOO) & STSP(k=4) & STBCS & STLARS & FNSPGL1 & FNSPGL1 & STLARS & PQSPGL1 & PQBCS & PQBCS & ADSPGL1 & ADSP(k=4) & PQSP(LOO) & ADSP(k=4) & FNSP(LOO) & PQSP(k=4) & ADLARS & ADSP(LOO) & FN Rank 1 Rank 12 Rank 24 within 2*bestwithin 5*bestwithin 10*best

Figure 5: Large ED sizes . Aggregated results for low-dim models.

Ranking percentage

OMP & FNOMP & PQSP(LOO) & PQSP(LOO) & STSP(k=4) & STSP(k=4) & PQSP(LOO) & FNOMP & STLARS & FNLARS & STLARS & PQBCS & PQSP(k=4) & FNBCS & FNBCS & ST Rank 1 Rank 7 Rank 15 within 2*bestwithin 5*bestwithin 10*best

Figure 6: Small ED sizes . Aggregated results for high-dim models. Ranking percentage

OMP & STOMP & PQSP(LOO) & STSP(LOO) & PQOMP & FNSP(k=4) & PQSP(k=4) & STLARS & STLARS & PQBCS & PQBCS & STLARS & FNSP(LOO) & FNBCS & FNSP(k=4) & FN Rank 1 Rank 7 Rank 15 within 2*bestwithin 5*bestwithin 10*best

Figure 7:

Aggregated results for high-dim models.

Large ED sizes . a) Ishigami function, N = 50 (b) Ishigami function, N = 150 (c) Undamped oscillator, N = 60 (d) Undamped oscillator, N = 120 (e) Borehole function, N = 100 (f) Borehole function, N = 250 (g) Damped oscillator, N = 150 (h) Damped oscillator, N = 350 Figure 8:

Comparison of diﬀerent basis adaptivity schemes.

Low-dim models . Left: smallED; right: large ED. Diﬀerent colors denote diﬀerent solvers (LARS, OMP, SP k =4 , SP LOO ,BCS and SPGL1). Continued in next ﬁgure. i) Truss model, N = 100 (j) Truss model, N = 250 (k) Wingweight function, N = 100 (l) Wingweight function, N = 250 Figure 8:

Continued. Comparison of diﬀerent basis adaptivity schemes.

Low-dim models .Left: small ED; right: large ED. Diﬀerent colors denote diﬀerent solvers (LARS, OMP, SP k =4 ,SP LOO , BCS and SPGL1). (a)

Morris function, N = 150 (b) Morris function, N = 350 Figure 9:

Comparison of diﬀerent basis adaptivity schemes.

High-dim models . Left: smallED; right: large ED. Diﬀerent colors denote diﬀerent solvers (LARS, OMP, SP k =4 , SP LOO , andBCS). c) Structural frame, N = 150 (d) Structural frame, N = 350 (e) Diﬀusion 2D, N = 100 (f) Diﬀusion 2D, N = 400 (g) Diﬀusion 1D, N = 100 (h) Diﬀusion 1D, N = 400 (i) High-dim function, N = 400 (j) High-dim function, N = 1200 Figure 9:

Continued. Comparison of diﬀerent basis adaptivity schemes.

High-dim models .Left: small ED; right: large ED. Diﬀerent colors denote diﬀerent solvers (LARS, OMP, SP k =4 ,SP LOO , and BCS)., and BCS).