The ℓ1-ℓ2 minimization with rotation for sparse approximation in uncertainty quantification

Mengqi Hu¹, Yifei Lou¹, and Xiu Yang²*

¹ Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
² Department of Industrial and Systems Engineering, Lehigh University, PA 18015, USA

* Corresponding author: [email protected]
Abstract
This paper proposes a combination of rotational compressive sensing with the ℓ1-ℓ2 minimization to estimate coefficients of generalized polynomial chaos (gPC) used in uncertainty quantification. In particular, we aim to identify a rotation matrix such that the gPC expansion of a set of random variables after the rotation has a sparser representation. However, this rotational approach alters the underlying linear system to be solved, which makes finding the sparse coefficients much more difficult than in the case without rotation. We therefore adopt the ℓ1-ℓ2 minimization, which is more suited for such ill-posed problems in compressive sensing (CS) than the classic ℓ1 approach. We conduct extensive experiments on standard gPC problem settings, showing superior performance of the proposed combination of rotation and ℓ1-ℓ2 minimization over the ones without rotation and with rotation but using the ℓ1 minimization.

Keywords
Generalized polynomial chaos, uncertainty quantification, iterative rotations, the ℓ1-ℓ2 minimization

1 Introduction

Surrogate models (also known as "response surfaces") play an important role in uncertainty quantification (UQ), as they can efficiently evaluate the quantity of interest (QoI) of a system given a set of inputs. Specifically, in parametric uncertainty studies, the input usually refers to a set of parameters of the system, while the QoI can be observables such as mass, density, pressure, velocity, or even a trajectory of a dynamical system. The uncertainty in the system's parameters typically originates from the lack of physical knowledge, inaccurate measurements, etc. Therefore, it is common to treat these parameters as random variables, and statistics, e.g., the mean, variance, and probability density function (PDF) of the QoI with respect to such random parameters, are crucial in understanding the behavior of the system.

A widely used surrogate model in applied mathematics and engineering studies is the generalized polynomial chaos (gPC) expansion [23, 61], which uses orthogonal polynomials associated with the measures of the aforementioned random variables. Under some conditions, the gPC expansion converges to the QoI in a Hilbert space as the number of polynomials increases [61, 9, 42, 21]. Similar to the Fourier expansion, the basis functions are given, and the task is to identify the gPC coefficients. Both intrusive methods (e.g., stochastic Galerkin) and non-intrusive methods (e.g., the probabilistic collocation method) have been developed to compute the gPC coefficients [23, 61, 55, 60, 4]. The latter are particularly desirable for studying complex systems, as they do not require modifying the computational models or simulation codes. In particular, the gPC coefficients can be calculated based on input samples and corresponding outputs using least squares fitting, the probabilistic collocation method, etc. [23, 61, 55, 60, 4].

In many practical problems, only a limited number of output samples are affordable, and identifying the gPC coefficients requires solving an under-determined linear system with a measurement matrix Ψ of size M × N with M < N (or even M ≪ N), where M is the number of available output samples and N is the number of basis functions used in the gPC expansion. When the solution to the under-determined system is sparse, compressive sensing (CS) techniques [11, 19, 10, 8] are effective. Recent studies have shown some success in applying CS to UQ problems [20, 64, 68, 33, 63, 51, 44, 32, 2]. For example, sampling strategies [48, 26, 3, 29] can improve the properties of Ψ to guarantee sparse recovery via the ℓ1 minimization. Computationally, the weighted ℓ1 minimization [13, 68, 43, 49, 1] assigns larger weights to smaller components (in magnitude) of the solution, and hence minimizing the weighted ℓ1 norm leads to a sparser solution than the vanilla ℓ1 minimization does. Besides, adaptive basis selection [28, 16, 6, 3, 27] as well as dimension reduction techniques [71, 58] can be adopted to reduce the number of unknowns, thus improving computational efficiency.

In this paper, we focus on a sparsity-enhancing approach, i.e., iterative rotation [33, 69, 70, 72], that intrinsically changes the structure of the surrogate model to make the gPC coefficients sparser. However, this method tends to deteriorate properties of Ψ that are favorable to CS algorithms, e.g., low coherence, which may counteract the benefit of the enhanced sparsity. Since the polynomials associated with the random variables may not be orthogonal after the rotation, the coherence of Ψ does not converge to zero asymptotically, leading to an amplified coherence after the rotation.
To remedy this drawback, we combine the iterative rotation technique with an ℓ1-ℓ2 minimization algorithm [36, 74, 65] in order to improve the efficiency of CS-based UQ methods. Specifically, our new approach uses rotations to increase the sparsity while taking advantage of the ℓ1-ℓ2 formalism for dealing with a matrix Ψ that has a large mutual coherence. In this way, we leverage the advantages of both methods to exploit information from limited samples of the QoI more efficiently and to construct gPC expansions more accurately.

The rest of the paper is organized as follows. We briefly review gPC, CS, and rotational CS in Section 2. We describe the combination of rotational CS and the ℓ1-ℓ2 minimization in Section 3, followed by a theoretical analysis of an upper bound of the change in coherence for a special case in Section 4. Section 5 is devoted to extensive experimental results, showing that the proposed approach significantly outperforms the state-of-the-art. Finally, conclusions are given in Section 6.

2 Background

In this section, we briefly review gPC expansions, some useful concepts in CS, and a prior work on the rotational CS for the gPC method.
2.1 Generalized polynomial chaos

Given a set of N deterministic coefficient functions {c_n} and multivariate polynomials {ψ_n}, a gPC expansion takes the following general form:

    u(x, t; ξ) = Σ_{n=1}^{N} c_n(x, t) ψ_n(ξ) + ε(x, t; ξ),    (1)

where u is the QoI depending on the location x, time t, and a set of random variables ξ, and ε denotes the truncation error. Of particular interest is the case when {ψ_n}_{n=1}^{N} are orthonormal with respect to the measure of ξ, i.e.,

    ∫_{R^d} ψ_i(x) ψ_j(x) ρ_ξ(x) dx = δ_ij,    (2)

where ρ_ξ(x) is the PDF of ξ and δ_ij is the Kronecker delta. In this paper, we study systems relying on d-dimensional i.i.d. random variables ξ = (ξ_1, . . . , ξ_d), and the gPC basis functions are constructed by tensor products of univariate orthonormal polynomials. Specifically, for a multi-index α = (α_1, . . . , α_d) with each α_i ∈ N ∪ {0}, we set

    ψ_α(ξ) = ψ_{α_1}(ξ_1) ψ_{α_2}(ξ_2) · · · ψ_{α_d}(ξ_d),    (3)

where ψ_{α_i} is a univariate orthonormal polynomial of degree α_i. For two different multi-indices α = (α_1, · · · , α_d) and β = (β_1, · · · , β_d), we have

    ∫_{R^d} ψ_α(x) ψ_β(x) ρ_ξ(x) dx = δ_αβ = δ_{α_1 β_1} δ_{α_2 β_2} · · · δ_{α_d β_d},    (4)

where

    ρ_ξ(x) = ρ_{ξ_1}(x_1) ρ_{ξ_2}(x_2) · · · ρ_{ξ_d}(x_d).    (5)

Typically, a p-th order gPC expansion involves all polynomials ψ_α satisfying |α| ≤ p, where |α| = Σ_{i=1}^{d} α_i. This condition indicates that a total number of N = (p+d choose d) polynomials are used in the expansion. For simplicity, we reorder the multi-indices α in such a way that we index ψ_α by ψ_n, which is consistent with Eq. (1).

In this paper, we focus on time-independent problems, in which the gPC expansion at a fixed location x is given by

    u(ξ_q) = Σ_{n=1}^{N} c_n ψ_n(ξ_q) + ε(ξ_q),  q = 1, 2, · · · , M.    (6)

We rewrite the above expansion in matrix-vector notation, i.e.,

    Ψ c = u − ε,    (7)

where u = (u_1, . . . , u_M)^T is a vector of output samples, c = (c_1, . . . , c_N)^T is a vector of gPC coefficients, the measurement matrix Ψ is an M × N matrix with Ψ_ij = ψ_j(ξ_i), and ε = (ε_1, . . . , ε_M)^T is a vector of error samples with ε_q = ε(ξ_q). We consider solving an under-determined system (7) with M < N, and we are specifically interested in identifying sparse coefficients c = (c_1, . . . , c_N)^T, which is the focus of compressive sensing [17, 12, 22].

2.2 Compressive sensing

We review the concept of sparsity, which plays an important role in error estimation for solving the under-determined system (7). The number of non-zero entries of a vector x = (x_1, . . . , x_N) is denoted by ‖x‖_0 [17, 11, 8]. Note that ‖·‖_0 is named the "ℓ0 norm" in [17], although it is neither a norm nor a semi-norm. The vector x is called s-sparse if ‖x‖_0 ≤ s, and it is considered a sparse vector if s ≪ N. Few practical systems have truly sparse gPC coefficients; rather, they are compressible, i.e., only a few entries contribute significantly to its ℓ2 norm. To this end, we define a vector x_s as the best s-sparse approximation of x, obtained by setting all but the s largest entries (in magnitude) of x to zero. Subsequently, we call x sparse or compressible if ‖x − x_s‖_2 is small for s ≪ N.

In order to find a sparse vector c from (7), one formulates the following problem,

    ĉ = arg min_c (1/2) ‖Ψ c − u‖_2^2 + λ ‖c‖_0,    (8)

where λ is a positive parameter to be tuned such that ‖Ψ ĉ − u‖_2 ≤ ǫ. As the ℓ0 minimization (8) is NP-hard to solve [41], one often replaces the ℓ0 norm with the convex ℓ1 norm, i.e.,

    ĉ = arg min_c (1/2) ‖Ψ c − u‖_2^2 + λ ‖c‖_1.    (9)
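For concreteness, the following minimal sketch assembles the measurement matrix Ψ in (6)-(7) for the normalized (probabilists') Hermite basis of total order p, assuming i.i.d. standard normal ξ; the function names are ours, not from the paper:

```python
import numpy as np
from itertools import product
from math import factorial

def hermite_eval(x, k):
    # normalized probabilists' Hermite polynomial He_k(x)/sqrt(k!), so E[psi_k(xi)^2] = 1
    coef = np.zeros(k + 1)
    coef[k] = 1.0
    return np.polynomial.hermite_e.hermeval(x, coef) / np.sqrt(factorial(k))

def total_degree_indices(d, p):
    # all multi-indices alpha in N^d with |alpha| <= p, cf. (3); there are C(p+d, d) of them
    return [a for a in product(range(p + 1), repeat=d) if sum(a) <= p]

def measurement_matrix(xi, p):
    # xi: M x d array of input samples; returns Psi with Psi[q, n] = psi_n(xi_q), cf. (6)-(7)
    M, d = xi.shape
    alphas = total_degree_indices(d, p)
    Psi = np.ones((M, len(alphas)))
    for n, alpha in enumerate(alphas):
        for i, ai in enumerate(alpha):  # tensor product over dimensions, cf. (3)
            if ai > 0:
                Psi[:, n] *= hermite_eval(xi[:, i], ai)
    return Psi
```

With d = 12 and p = 3, this yields N = C(15, 3) = 455 columns, matching the ridge-function example in Section 5.1.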
A sufficient condition for the ℓ1 minimization to exactly recover the sparse signal was proved based on the restricted isometry property (RIP) [11]. Unfortunately, the RIP is numerically unverifiable for a given matrix [5, 56]. Instead, a computable condition for exact recovery by ℓ1 is the coherence, which is defined as

    µ(Ψ) = max_{i ≠ j} |⟨ψ_i, ψ_j⟩| / (‖ψ_i‖_2 ‖ψ_j‖_2),  with Ψ = [ψ_1, · · · , ψ_N],    (10)

where ψ_i denotes the i-th column of Ψ. Donoho and Elad [18] and Gribonval and Nielsen [24] proved independently that if

    ‖ĉ‖_0 < (1/2) (1 + 1/µ(Ψ)),    (11)

then ĉ is indeed the sparsest solution to (9). If the samples ξ_q are drawn independently, µ(Ψ) for the gPC expansion converges to zero as M → ∞ according to the distribution of ξ. However, the inequality (11) implies that the ℓ1 minimization may not work well when the coherence of the matrix Ψ is large: e.g., for µ(Ψ) ≈ 1, (11) can only guarantee the recovery of a 1-sparse vector via the ℓ1 minimization (9). Empirically speaking, the ℓ1 minimization does not work very well when the matrix is coherent (i.e., µ is large). There are many nonconvex alternatives to approximate the ℓ0 norm, including ℓp [14, 62, 31], capped ℓ1 [77, 52, 37], transformed ℓ1 [38, 75, 76], and ℓ1/ℓ2 [47, 59, 54]. In this work, we adopt the ℓ1-ℓ2 functional [73, 36, 35] due to its outstanding performance in the coherent regime. Specifically, a recent work [25] reported that ℓ1-ℓ2 is better than the weighted ℓ1 minimization [13] for solving the ℓ0 minimization.
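The coherence (10) can be computed directly from the Gram matrix of the columns of Ψ; a minimal sketch:

```python
import numpy as np

def mutual_coherence(Psi):
    # mu(Psi) in (10): largest normalized inner product between two distinct columns
    G = Psi.T @ Psi
    norms = np.sqrt(np.diag(G))
    C = np.abs(G) / np.outer(norms, norms)
    np.fill_diagonal(C, 0.0)
    return C.max()
```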
2.3 Rotational compressive sensing

To further exploit sparsity, we aim to find a linear map A : R^d → R^d such that a new set of random variables η, given by

    η = A ξ,  η = (η_1, η_2, · · · , η_d)^T,    (12)

leads to a sparser polynomial expansion than ξ does. We consider A as an orthogonal matrix, i.e., A A^T = I with the identity matrix I, such that the linear map from ξ to η can be regarded as a rotation in R^d. Therefore, the new polynomial expansion for u is expressed as

    u(ξ) ≈ u_g(ξ) = Σ_{n=1}^{N} c̃_n ψ_n(A ξ) = Σ_{n=1}^{N} c̃_n ψ_n(η) = v_g(η).    (13)

Here u_g(ξ) can be understood as a polynomial u_g(x) evaluated at the random variables ξ, and the same for v_g. Ideally, c̃ is sparser than c. In the previous works [33, 69], it is assumed that ξ ∼ N(0, I), so η ∼ N(0, I). For general cases where {ξ_i}_{i=1}^{d} are not i.i.d. Gaussian, {η_i}_{i=1}^{d} are not necessarily independent. Moreover, {ψ_n}_{n=1}^{N} are not necessarily orthogonal to each other with respect to ρ_η. Therefore, v_g(η) may not be a standard gPC expansion of v(η), but rather a polynomial equivalent to u_g(ξ) with potentially sparser coefficients [72].

We can identify A using the gradient information of u based on the framework of active subspaces [50, 15]. In particular, we define

    W = (1/√M) [∇u(ξ_1), ∇u(ξ_2), · · · , ∇u(ξ_M)].    (14)

Note that W is a d × M matrix, and we consider M ≥ d in this work. The singular value decomposition (SVD), or principal component analysis (PCA), of W yields

    W = U Σ V^T,    (15)

where U is a d × d orthogonal matrix, Σ is a d × M matrix whose diagonal consists of the singular values σ_1 ≥ · · · ≥ σ_d ≥ 0, and V is an M × M orthogonal matrix. We set the rotation matrix as A = U^T. As such, the rotation projects ξ onto the directions of the principal components of ∇u. Of note, we do not use the information of the orthogonal matrix V.

Unfortunately, u is unknown and samples of ∇u are not available. Instead of u, we approximate Eq. (14) by a computed solution u_g that can be obtained by the standard ℓ1 minimization [69]. In other words, we have

    W ≈ W_g = (1/√M) [∇u_g(ξ_1), ∇u_g(ξ_2), · · · , ∇u_g(ξ_M)],    (16)

and the rotation matrix is constructed based on the SVD of W_g:

    W_g = U_g Σ_g V_g^T,  A = U_g^T.    (17)

Defining η = A ξ, we compute the corresponding input samples as η_q = A ξ_q and construct a new measurement matrix Ψ(η) as (Ψ(η))_ij = ψ_j(η_i). We then solve the minimization problem in Eq. (9) to obtain c̃. If some singular values of W are much larger than the others, we can expect to obtain a sparser representation of u with respect to η, which is dominated by the eigenspace associated with these larger singular values. On the other hand, if all the singular values σ_i are of the same order, the rotation does not enhance the sparsity. In practice, this method can be designed as an iterative algorithm, in which A and c̃ are updated separately in each iteration in an alternating direction manner.

It is worth noting that the idea of using a linear map is also used in sliced inverse regression (SIR) [34], active subspaces [50, 15], basis adaptation [57], etc., but with different manners of computing the matrix. In contrast to these methods, the iterative rotation approach does not truncate the dimension, in the sense that A is a square matrix. As an initial guess may not be sufficiently accurate, reducing the dimension before the iterations terminate may lead to suboptimal results. The dimension reduction was integrated with the iterative method in [66], while another iterative rotation method preceded by SIR-based dimension reduction was proposed in [71]. We refer interested readers to the respective literature.
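Once gradient samples are available (in practice, from the computed surrogate u_g as in (16)), the construction (14)-(17) is a few lines of linear algebra; a minimal sketch:

```python
import numpy as np

def rotation_from_gradients(grads):
    # grads: M x d array whose q-th row approximates grad u(xi_q)
    W = grads.T / np.sqrt(grads.shape[0])  # the d x M matrix in (14)/(16)
    U, sigma, _ = np.linalg.svd(W)         # (15)/(17); V is not used
    return U.T, sigma                      # rotation A = U^T and the singular values
```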
3 Rotational CS with the ℓ1-ℓ2 minimization

When applying the rotational CS technique, the measurement matrix Ψ(η) may become more coherent compared with Ψ. This is because popular choices of ψ_i used in the gPC method, e.g., Legendre and Laguerre polynomials, are not orthogonal with respect to the measure of η, so µ(Ψ(η)) converges to a positive number instead of zero as µ(Ψ) does. In such a coherent regime, we adopt the ℓ1-ℓ2 minimization to identify the sparse coefficients c̃.

To start with, we generate input samples {ξ_q}_{q=1}^{M} based on the distribution of ξ and select the gPC basis functions {ψ_j}_{j=1}^{N} associated with ξ in order to generate the measurement matrix Ψ by setting Ψ_ij = ψ_j(ξ_i), where we initialize A^(1) = I and η^(1) = ξ. Then we propose an alternating direction method (ADM) that combines the ℓ1-ℓ2 minimization and the rotation matrix estimation. Specifically, given {ξ_q}_{q=1}^{M}, {ψ_j}_{j=1}^{N}, and u := {u_q = u(ξ_q)}_{q=1}^{M}, we formulate the following minimization problem,

    arg min_{c,A} (1/2) ‖Ψ c − u‖_2^2 + λ (‖c‖_1 − θ ‖c‖_2),  s.t.  A A^T = I and Ψ_ij = ψ_j(A ξ_i).    (18)

If θ = 0, it reduces to the ℓ1 approach [69], while we propose to use θ = 1 for the ℓ1-ℓ2 model in this paper. We can minimize Eq. (18) with respect to c and A in an alternating direction manner. Specifically, when A is fixed, we minimize the ℓ1-ℓ2 functional to identify c, as detailed in Section 3.1. When c is fixed, optimizing A is computationally expensive, unless a dimension reduction technique is used, e.g., as in [58]. Instead, we use the rotation estimation introduced in Section 2.3 to compute A. Admittedly, this way of estimating A may not be optimal, but it promotes the sparsity of c, thus improving the accuracy of the sparse approximation of the gPC expansion. In addition, the sparsity structure is problem-dependent (see examples in Section 5), and more iterations do not grant significant improvements for many practical problems.

3.1 The ℓ1-ℓ2 minimization

We focus on the c-subproblem in (18), whose objective function is defined as

    F(c) = (1/2) ‖Ψ c − u‖_2^2 + λ (‖c‖_1 − θ ‖c‖_2).

We adopt the difference of convex algorithm (DCA) [45, 46] that decomposes F(c) = G(c) − H(c) into

    G(c) = (1/2) ‖Ψ c − u‖_2^2 + λ ‖c‖_1,
    H(c) = λ θ ‖c‖_2.
An iterative scheme for minimizing F(c) relies on a linearization of H(c) at the current step c^(n) to advance to the next one, i.e.,

    c^(n+1) = arg min_{c ∈ R^N} (1/2) ‖Ψ c − u‖_2^2 + λ ‖c‖_1 − ⟨c, λ θ c^(n)/‖c^(n)‖_2⟩.    (19)

Note that (19) is an ℓ1-type problem, which can be minimized by the alternating direction method of multipliers (ADMM) [7]. Denote v = λ θ c^(n)/‖c^(n)‖_2. We introduce an auxiliary variable y and rewrite (19) as an equivalent problem,

    min_{c,y} λ ‖c‖_1 + (1/2) ‖Ψ y − u‖_2^2 − ⟨y, v⟩,  s.t.  c = y.    (20)

This new formulation (20) makes the objective function separable with respect to the two variables c and y. Specifically, the augmented Lagrangian corresponding to (20) can be expressed as

    L_ρ(c, y; w) = λ ‖c‖_1 + (1/2) ‖Ψ y − u‖_2^2 − ⟨y, v⟩ + w^T (c − y) + (ρ/2) ‖c − y‖_2^2,    (21)

where w is a Lagrangian multiplier and ρ is a positive parameter. Then ADMM consists of three steps:

    c^(n)_{k+1} = arg min_c λ ‖c‖_1 + (ρ/2) ‖c − y^(n)_k + w^(n)_k/ρ‖_2^2,    (22)
    y^(n)_{k+1} = arg min_y (1/2) ‖Ψ y − u‖_2^2 − ⟨y, v⟩ + (ρ/2) ‖c^(n)_{k+1} − y + w^(n)_k/ρ‖_2^2,    (23)
    w^(n)_{k+1} = w^(n)_k + ρ (c^(n)_{k+1} − y^(n)_{k+1}).    (24)

Here we use the subscript k to indicate the inner ADMM iteration, as opposed to the superscript n for the outer DCA loop. The c-update (22) has a closed-form solution given by

    c^(n)_{k+1} = shrink(y^(n)_k − w^(n)_k/ρ, λ/ρ),    (25)

where the soft thresholding (or shrinkage) operator is defined component-wise as

    shrink(v, µ) = { v − µ if v > µ;  0 if |v| ≤ µ;  v + µ if v < −µ }.    (26)

As for the y-update (23), we set the gradient of L_ρ with respect to y to zero, thus leading to

    Ψ^T (Ψ y − u) − v + ρ (y − c^(n)_{k+1} − w^(n)_k/ρ) = 0.    (27)

Therefore, the update for y is given by

    y^(n)_{k+1} = (Ψ^T Ψ + ρ I)^{-1} (Ψ^T u + v + ρ c^(n)_{k+1} + w^(n)_k).    (28)

Note that Ψ^T Ψ + ρ I is a positive definite matrix, and there are many efficient numerical algorithms for matrix inversion. Since Ψ has more columns than rows in our case, we further use the Woodbury formula to speed up the computation,

    ρ (Ψ^T Ψ + ρ I)^{-1} = I − Ψ^T (Ψ Ψ^T + ρ I)^{-1} Ψ,    (29)

as Ψ Ψ^T has a smaller dimension than Ψ^T Ψ to be inverted.

In summary, the overall ℓ1-ℓ2 minimization algorithm is described in Algorithm 1.

Algorithm 1 ℓ1-ℓ2 minimization via DCA
  Input: measurement matrix Ψ and observed data u.
  Parameters: λ, ρ, ǫ ∈ R_+ and n_max, k_max ∈ Z_+.
  Initialize: c, y, v, w, and n, k = 0.
  while n ≤ n_max and ‖c^(n) − c^(n−1)‖_2 > ǫ do
    v = λ θ c^(n) / (‖c^(n)‖_2 + ǫ)   (the small ǫ avoids division by zero when c^(n) = 0)
    while k ≤ k_max and ‖c_k − c_{k−1}‖_2 > ǫ do
      c_{k+1} = shrink(y_k − w_k/ρ, λ/ρ)
      y_{k+1} = (Ψ^T Ψ + ρ I)^{-1} (Ψ^T u + v + ρ c_{k+1} + w_k)
      w_{k+1} = w_k + ρ (c_{k+1} − y_{k+1})
      k = k + 1
    end while
    c^(n+1) = c_k
    n = n + 1 and k = 0
  end while
  return c = c^(n+1).
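The following is a compact sketch of Algorithm 1 (DCA outer loop, ADMM inner loop, with the Woodbury solve (29) precomputed once); the default parameter values and names are ours, and a fixed inner iteration count stands in for the inner stopping test:

```python
import numpy as np

def l1_l2_dca(Psi, u, lam, rho, theta=1.0, n_max=10, k_max=200, tol=1e-8):
    # minimize 0.5*||Psi c - u||_2^2 + lam*(||c||_1 - theta*||c||_2), cf. (18)-(29)
    M, N = Psi.shape
    c = np.zeros(N); y = np.zeros(N); w = np.zeros(N)
    inv_small = np.linalg.inv(Psi @ Psi.T + rho * np.eye(M))  # for the Woodbury formula (29)
    Ptu = Psi.T @ u
    for _ in range(n_max):                                    # outer DCA loop
        c_old = c.copy()
        nc = np.linalg.norm(c)
        v = lam * theta * c / nc if nc > 0 else np.zeros(N)   # linearization of H(c), cf. (19)
        for _ in range(k_max):                                # inner ADMM loop (22)-(24)
            z = y - w / rho
            c = np.sign(z) * np.maximum(np.abs(z) - lam / rho, 0.0)  # shrink (25)-(26)
            rhs = Ptu + v + rho * c + w                              # right-hand side of (28)
            y = (rhs - Psi.T @ (inv_small @ (Psi @ rhs))) / rho      # Woodbury solve (29)
            w = w + rho * (c - y)                                    # dual update (24)
        if np.linalg.norm(c - c_old) < tol:
            break
    return c
```

For the rotated problem, the same routine is simply called with the rotated measurement matrix Ψ(η) in place of Ψ.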
3.2 Updating the rotation matrix

Suppose we obtain the gPC coefficients c̃^(l) at the l-th iteration via Algorithm 1 with l ≥ 1. Given v_g^(l)(η) with input samples {(η^(l))_q}_{q=1}^{M} for (η^(l))_q = A^(l) ξ_q, we collect the gradients of v_g^(l), denoted by

    W_g^(l) = (1/√M) [∇_ξ v_g^(l)((η^(l))_1), · · · , ∇_ξ v_g^(l)((η^(l))_M)],    (30)

where ∇_ξ · = (∂·/∂ξ_1, ∂·/∂ξ_2, · · · , ∂·/∂ξ_d)^T. It is straightforward to evaluate ∇ψ_n at (η^(l))_q, as we construct ψ_n using tensor products of univariate polynomials (3), and the derivatives of widely used orthogonal polynomials, e.g., Hermite, Laguerre, Legendre, and Chebyshev, are well studied in the UQ literature. Here, we can analytically compute the gradient of v_g^(l) with respect to ξ using the chain rule,

    ∇_ξ v_g^(l)((η^(l))_q) = ∇_ξ v_g^(l)(A^(l) ξ_q) = (A^(l))^T ∇v_g^(l)(x) |_{x = A^(l) ξ_q}
                          = (A^(l))^T Σ_{n=1}^{N} c̃_n^(l) ∇ψ_n(x) |_{x = A^(l) ξ_q}.    (31)

Then, we have an update A^(l+1) = (U_g^(l))^T, where U_g^(l) is from the SVD of

    W_g^(l) = U_g^(l) Σ_g^(l) (V_g^(l))^T.    (32)

Now we can define a new set of random variables as η^(l+1) = A^(l+1) ξ and compute their samples accordingly: (η^(l+1))_q = A^(l+1) ξ_q. These samples are used to construct a new measurement matrix Ψ^(l+1) with Ψ^(l+1)_ij = ψ_j((η^(l+1))_i), which is fed into the ℓ1-ℓ2 minimization to obtain c̃^(l+1).

We summarize the entire procedure in Algorithm 2. The stopping criterion we use is l ≤ l_max, where l_max can be taken as 2 or 3 empirically for practical problems. Alternatively, it can be based on the difference between two successive rotation matrices A^(l) and A^(l+1).

Algorithm 2 Alternating direction method for minimizing (18)
  Generate input samples {ξ_q}_{q=1}^{M} based on the distribution of ξ.
  Generate corresponding output samples u := {u_q = u(ξ_q)}_{q=1}^{M} by solving the complete model, e.g., running simulations, solvers, etc.
  Select gPC basis functions {ψ_n}_{n=1}^{N} associated with ξ.
  Initialize A^(1) = I and η^(1) = ξ, and generate the measurement matrix Ψ^(1) by setting Ψ^(1)_ij = ψ_j(ξ_i).
  for l = 1, 2, · · · , l_max do
    if l > 1 then
      Construct W_g^(l−1) in (30) with v_g^(l−1) = Σ_{n=1}^{N} c̃_n^(l−1) ψ_n(η^(l−1)).
      Compute the SVD of W_g^(l−1): W_g^(l−1) = U_g^(l−1) Σ_g^(l−1) (V_g^(l−1))^T.
      Set A^(l) = (U_g^(l−1))^T and η^(l) = A^(l) ξ.
      Construct the new measurement matrix Ψ^(l) with Ψ^(l)_ij = ψ_j((η^(l))_i).
    end if
    Solve the ℓ1-ℓ2 minimization via DCA (Algorithm 1):
      c̃^(l) = arg min_c (1/2) ‖Ψ^(l) c − u‖_2^2 + λ (‖c‖_1 − θ ‖c‖_2).
  end for
  Construct the gPC expansion as u(ξ) ≈ u_g(ξ) = v_g^(l_max)(η^(l_max)) = Σ_{n=1}^{N} c̃_n^(l_max) ψ_n(A^(l_max) ξ).
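A sketch of one rotation update (30)-(32) for the normalized Hermite basis, reusing hermite_eval and the multi-index list from the earlier sketch; the derivative rule ψ_k'(x) = √k ψ_{k−1}(x) is specific to this normalization:

```python
import numpy as np

def update_rotation(xi, A, c, alphas, p):
    # assemble W_g via the chain rule (31) and return the new A from its SVD (32);
    # xi: M x d samples, A: current rotation, c: current gPC coefficients,
    # alphas: multi-index list from total_degree_indices (earlier sketch)
    M, d = xi.shape
    eta = xi @ A.T                                            # samples of eta = A xi
    vals = np.array([[hermite_eval(eta[:, j], k) for j in range(d)]
                     for k in range(p + 1)])                  # vals[k, j] = psi_k at eta[:, j]
    W = np.zeros((d, M))
    for cn, alpha in zip(c, alphas):
        if cn == 0.0:
            continue
        for i in range(d):                                    # partial derivative in direction i
            if alpha[i] == 0:
                continue                                      # constant factor, zero derivative
            g = np.full(M, cn)
            for j, aj in enumerate(alpha):
                if j == i:                                    # psi_aj'(x) = sqrt(aj)*psi_{aj-1}(x)
                    g *= np.sqrt(aj) * vals[aj - 1, j]
                else:
                    g *= vals[aj, j]
            W[i] += g
    W = (A.T @ W) / np.sqrt(M)                                # chain rule (31), scaling in (30)
    U, _, _ = np.linalg.svd(W)                                # (32)
    return U.T                                                # new rotation matrix
```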
4 Analysis of the coherence

Generally speaking, the rotated basis functions {ψ_n(A x)}_{n=1}^{N} are not orthogonal with respect to the measure of ξ [72], i.e.,

    ∫_{R^d} ψ_i(A x) ψ_j(A x) ρ_ξ(x) dx ≠ δ_ij.    (33)

As a result, the mutual coherence of the matrix Ψ(η) does not converge to zero as the number of samples increases. Moreover, µ(Ψ(η)) can even be larger than µ(Ψ). An exception is the normalized Hermite polynomials associated with standard normal random variables N(0, I). Rotating a sample drawn from N(0, I) still yields a sample of a standard normal random variable, so the expectation of the mutual coherence remains the same, i.e., E{µ(Ψ)} = E{µ(Ψ(η))}. In practice, we draw a sample from the standard normal distribution, then accept this sample as ξ_i only if it is within a ball B_R(0) with a radius parameter R > 0.
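This truncation step is plain rejection sampling; a minimal sketch:

```python
import numpy as np

def sample_truncated_normal(M, d, R, seed=0):
    # draw M samples of N(0, I_d) and keep only those inside the ball B_R(0)
    rng = np.random.default_rng(seed)
    samples = []
    while len(samples) < M:
        x = rng.standard_normal(d)
        if np.linalg.norm(x) < R:
            samples.append(x)
    return np.array(samples)
```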
As the samples ξ_i are drawn from a truncated normal distribution, the Hermite polynomials are not necessarily orthonormal with respect to the measure of ξ. Theorem 4.3 provides an upper bound for the change of µ in the case of Hermite polynomials. To prove Theorem 4.3, we introduce Lemma 4.1, which can be proved in the same way as in [20].
Lemma 4.1 Let ξ_1, . . . , ξ_M be independent samples drawn from the truncated normal distribution, i.e., ξ ∼ N(0, I) conditioned on ‖ξ‖_2 < R. Let {ψ_j}_{j=1}^{N} be normalized multivariate Hermite polynomials constructed as in Eq. (3) with maximum order p, and let the matrix Ψ be constructed as Ψ_ij = ψ_j(ξ_i). Denote τ_jk = E{ψ_j(ξ) ψ_k(ξ)} (for j ≠ k), τ_jj = 1 − E{ψ_j(ξ)^2}, and τ = max_{1 ≤ j,k ≤ N} |τ_jk|. Then for any t > 0,

    Prob{ µ(Ψ) ≥ (τ + t)/(1 − τ − t) } < 3 N^2 exp(− M t^2 / (C_1 R^{4p})),    (34)

where C_1 is a constant depending only on p.

Proof 4.2 It is clear that

    E{ (1/M) Σ_{i=1}^{M} ψ_j(ξ_i) ψ_k(ξ_i) } = E{ψ_j(ξ) ψ_k(ξ)} = τ_jk,  j ≠ k,

and

    E{ (1/M) Σ_{i=1}^{M} ψ_j(ξ_i)^2 } = E{ψ_j(ξ)^2} = 1 − τ_jj.

Using McDiarmid's inequality [40], we obtain that for t > 0,

    Prob{ | (1/M) Σ_{i=1}^{M} ψ_j(ξ_i) ψ_k(ξ_i) − τ_jk | ≥ t } ≤ 2 exp(− M t^2 / (2 H(R, p)^2)),    (35)

where H(R, p) = sup_{x ∈ B_R(0)} |ψ_j(x) ψ_k(x)| ≤ C_0 R^{2p} with a constant C_0 that only depends on p. The inequality (35) indicates that

    Prob{ | (1/M) Σ_{i=1}^{M} ψ_j(ξ_i) ψ_k(ξ_i) | ≥ τ + t } ≤ 2 exp(− M t^2 / (2 H(R, p)^2)).

Similarly, we get

    Prob{ (1/M) Σ_{i=1}^{M} ψ_j(ξ_i)^2 ≤ 1 − τ − t } ≤ 2 exp(− M t^2 / (2 H(R, p)^2)).

The above two probability estimates yield that

    Prob{ | (1/M) Σ_{i=1}^{M} ψ_j(ξ_i) ψ_k(ξ_i) | / [ ((1/M) Σ_{i=1}^{M} ψ_j(ξ_i)^2)^{1/2} ((1/M) Σ_{i=1}^{M} ψ_k(ξ_i)^2)^{1/2} ] ≥ (τ + t)/(1 − τ − t) } ≤ 6 exp(− M t^2 / (C_1 R^{4p})).

Noticing that

    µ(Ψ) = max_{1 ≤ j,k ≤ N, j ≠ k} | (1/M) Σ_{i=1}^{M} ψ_j(ξ_i) ψ_k(ξ_i) | / [ ((1/M) Σ_{i=1}^{M} ψ_j(ξ_i)^2)^{1/2} ((1/M) Σ_{i=1}^{M} ψ_k(ξ_i)^2)^{1/2} ],

a union bound over the (N choose 2) pairs gives

    Prob{ µ(Ψ) ≥ (τ + t)/(1 − τ − t) } ≤ 6 (N choose 2) exp(− M t^2 / (C_1 R^{4p})) < 3 N^2 exp(− M t^2 / (C_1 R^{4p})).

Since ‖A ξ_i‖_2 = ‖ξ_i‖_2 for an orthogonal matrix A, Lemma 4.1 still holds when replacing ξ_i with A ξ_i. Therefore, we have the following theorem:

Theorem 4.3
Let {ξ_i}_{i=1}^{M} be independent samples of the normal random variable ξ ∼ N(0, I) truncated to ‖ξ_i‖_2 < R, let A be an orthogonal matrix, and construct Ψ and Ψ̃ by Ψ_ij = ψ_j(ξ_i) and Ψ̃_ij = ψ_j(A ξ_i), where {ψ_j}_{j=1}^{N} are normalized multivariate Hermite polynomials with maximum order p. Then for any t > 0,

    Prob{ |µ(Ψ̃) − µ(Ψ)| ≥ (τ + t)/(1 − τ − t) } < 6 N^2 exp(− M t^2 / (C_1 R^{4p})).    (36)

Proof 4.4
Apply Lemma 4.1 to µ(Ψ) and µ(Ψ̃), and use the fact that µ(Ψ), µ(Ψ̃) ≥ 0, so that |µ(Ψ̃) − µ(Ψ)| ≤ max{µ(Ψ), µ(Ψ̃)}; a union bound over the two events in Lemma 4.1 gives (36).
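The rotation invariance of the coherence in the Hermite case is easy to probe empirically by combining the sketches above; the sizes below are arbitrary test values of ours:

```python
import numpy as np

# reuses sample_truncated_normal, measurement_matrix, and mutual_coherence from the earlier sketches
rng = np.random.default_rng(1)
d, p, M, R = 4, 3, 200, 6.0
xi = sample_truncated_normal(M, d, R)
A, _ = np.linalg.qr(rng.standard_normal((d, d)))   # a random orthogonal (rotation) matrix
mu_plain = mutual_coherence(measurement_matrix(xi, p))
mu_rotated = mutual_coherence(measurement_matrix(xi @ A.T, p))
print(mu_plain, mu_rotated)  # comparable values, consistent with Theorem 4.3 for the Hermite case
```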
5 Numerical examples

In this section, we conduct extensive experiments using three different types of polynomial expansions to demonstrate the performance of the proposed method in comparison with the one without rotation (obtained by setting l_max = 1 in Algorithm 2) and the ℓ1 approach (obtained by setting θ = 0 in (18)). The performance is evaluated in terms of the relative error (RE), defined as ‖u − u_g‖_2/‖u‖_2, where u is the exact solution, u_g is the gPC approximation of u, and the integral in computing the norm ‖·‖_2 is approximated by a high-level sparse grid method based on one-dimensional Gaussian quadrature and the Smolyak structure [53].

The parameters λ, ρ for both ℓ1 and ℓ1-ℓ2 are tuned to achieve the smallest RE after 9 rotations for demonstration purposes. In particular, we start with a coarse grid of powers of ten. Once we find the optimal order of magnitude, we multiply it by the finer grid 0.5, 1, · · · , 9.5 and select the pair (λ̄, ρ̄) that provides the smallest RE.
Note that (λ̄, ρ̄) may not be optimal for other numbers of rotations except for 9. We specify the optimal parameter pair (λ̄, ρ̄) for each test case in the corresponding section. In practice, the parameters can be determined by cross-validation or following the work of [65].

5.1 Ridge function

Consider the following ridge function:

    u(ξ) = Σ_{i=1}^{d} ξ_i + 0.25 (Σ_{i=1}^{d} ξ_i)^2 + 0.025 (Σ_{i=1}^{d} ξ_i)^3.    (37)

As {ξ_i}_{i=1}^{d} are equally important in this example, adaptive methods [39, 67, 78] that build surrogate models hierarchically based on the importance of ξ_i may not be effective. We consider a rotation matrix of the form

    A = [ d^{-1/2}  d^{-1/2}  · · ·  d^{-1/2} ; Ã ],    (38)

where the first row has identical entries d^{-1/2} and Ã is a (d − 1) × d matrix designed to guarantee the orthogonality of A; e.g., A can be obtained by the Gram-Schmidt process.
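One way to realize (38) is to seed a QR factorization with the constant direction, which performs the Gram-Schmidt step mentioned above; a sketch:

```python
import numpy as np

def ridge_rotation(d):
    # build an orthogonal A whose first row is d^{-1/2}(1, ..., 1), cf. (38)
    B = np.eye(d)
    B[:, 0] = np.ones(d) / np.sqrt(d)   # first column: the constant direction
    Q, _ = np.linalg.qr(B)              # orthonormalize the remaining columns against it
    if Q[0, 0] < 0:                     # fix the sign convention of the factorization
        Q[:, 0] = -Q[:, 0]
    return Q.T                          # rows of A = columns of Q
```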
With this choice of A, we have η_1 = d^{-1/2} Σ_{i=1}^{d} ξ_i, and u can be represented as

    u(ξ) = v(η) = √d η_1 + 0.25 d η_1^2 + 0.025 d^{3/2} η_1^3.    (39)

In the expansion u(ξ) = Σ_{n=1}^{N} c̃_n ψ_n(A ξ) = Σ_{n=1}^{N} c̃_n ψ_n(η), all of the polynomials that are not related to η_1 make no contribution, which guarantees the sparsity of c̃ = (c̃_1, . . . , c̃_N).

By setting d = 12 (hence, the number of gPC basis functions is N = 455 for p = 3), we compare the accuracy of computed gPC expansions using the ℓ1 and ℓ1-ℓ2 minimizations with and without rotations. We consider a gPC expansion using Legendre polynomials (assuming ξ_i are i.i.d. uniform random variables), a Hermite polynomial expansion (assuming ξ_i are i.i.d. Gaussian random variables), and a Laguerre polynomial expansion (assuming ξ_i are i.i.d. exponential random variables), respectively. The number of samples M ranges from 100 to 180 in our tests. Each experiment is repeated 100 times, with 9 rotations for each case, to compute the average RE. We choose the optimal parameters (λ̄, ρ̄) to be (3 × 10^{−·}, · × 10^{−·}), (10^{−·}, 10^{−·}), and (5 × 10^{−·}, 1)
for the ℓ1-ℓ2 method, when considering Legendre, Hermite, and Laguerre polynomials, respectively. As for ℓ1, the optimal parameters (λ̄, ρ̄) are set as (6 × 10^{−·}, · × 10^{−·}), (10^{−·}, 10^{−·}), and (5 × 10^{−·}, ·) for the corresponding polynomials.

Figure 1 presents the relative errors corresponding to different ratios M/N. It shows that the standard ℓ1 minimization without rotation is not effective, as the relative error is close to 50% even when M/N approaches 0.4. On the other hand, the ℓ1-ℓ2 approach does not improve much upon ℓ1 in this case, but our iterative rotation demonstrates much higher accuracy, especially when M is large.
Figure 1: (Ridge function) Relative error of the Legendre, Hermite, and Laguerre polynomial expansions for the ridge function against the ratio M/N. Solid lines are for the ℓ1-ℓ2 minimization and dashed lines are for the ℓ1 minimization; the marker "◦" is used for results without rotation and "♦" for results with 9 rotations.
The reason can be partially explained by Figure 2. Specifically, Figure 2 (a), (b), and (c) plot the absolute values of the exact coefficients |c_n| of the Legendre, Hermite, and Laguerre polynomials before rotation using 120 samples (randomly chosen from the 100 independent experiments), while Figure 2 (d), (e), and (f) show the corresponding coefficients |c̃_n| after 9 iterations. In Figure 2, we do not include any c̃_n whose absolute value is more than two orders of magnitude smaller than the dominating ones, since they are sufficiently small. As demonstrated in Figure 2, the iterative updates of the rotation matrix significantly sparsify the representation of u; as a result, the efficiency of compressive sensing methods is substantially enhanced.

Figure 2: (Ridge function) Absolute values of the exact coefficients c_n (first row) and the coefficients c̃_n after 9 rotations (second row) using 120 samples.

5.2 Korteweg-de Vries equation

As an example of a more complicated and nonlinear differential equation, we consider the Korteweg-de Vries (KdV) equation with time-dependent additive noise,

    u_t(x, t; ξ) − 6 u(x, t; ξ) u_x(x, t; ξ) + u_xxx(x, t; ξ) = f(t; ξ),  x ∈ (−∞, ∞),
    u(x, 0; ξ) = −2 sech^2(x).    (40)

Here f(t; ξ) is modeled as a random field represented by the Karhunen-Loève expansion:

    f(t; ξ) = σ Σ_{i=1}^{d} √λ_i φ_i(t) ξ_i,    (41)

where σ is a constant and {λ_i, φ_i(t)}_{i=1}^{d} are the eigenvalues/eigenfunctions (in descending order) of the exponential covariance kernel:

    C(x, x') = exp(− |x − x'| / l_c).    (42)

The values of {λ_i}_{i=1}^{d} and analytical expressions for {φ_i(x)}_{i=1}^{d} are discussed in [30].
We set l_c = 0.25 and d = 10 in (42), such that the first d eigenvalues capture the dominant part of Σ_{i=1}^{∞} λ_i. Under this setting, we have an analytical solution given by

    u(x, t; ξ) = σ Σ_{i=1}^{d} √λ_i ξ_i ∫_0^t φ_i(y) dy − 2 sech^2( x − 4t + 6σ Σ_{i=1}^{d} √λ_i ξ_i ∫_0^t ∫_0^z φ_i(y) dy dz ).    (43)

We choose the QoI to be u(x, t; ξ) at x = 6, t = 1, and σ = 0.4. Thanks to the analytical expressions of φ_i(x), we can compute the integrals in (43) with high accuracy. Denoting

    A_i = √λ_i ∫_0^1 φ_i(y) dy  and  B_i = √λ_i ∫_0^1 ∫_0^z φ_i(y) dy dz,  i = 1, 2, · · · , d,    (44)

the analytical solution can be written as

    u(x, t; ξ) |_{x=6, t=1} = σ Σ_{i=1}^{d} A_i ξ_i − 2 sech^2( 2 + 6σ Σ_{i=1}^{d} B_i ξ_i ).    (45)
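Given vectors of the integrals A_i and B_i in (44), precomputed from the analytical eigenfunctions φ_i, the QoI (45) is a one-line evaluation; a sketch under that assumption:

```python
import numpy as np

def kdv_qoi(xi, A_coef, B_coef, sigma=0.4):
    # evaluate (45) at x = 6, t = 1, where the soliton phase offset x - 4t equals 2;
    # xi: M x d samples; A_coef, B_coef: length-d vectors of the integrals in (44)
    linear = sigma * (xi @ A_coef)
    phase = 2.0 + 6.0 * sigma * (xi @ B_coef)
    return linear - 2.0 / np.cosh(phase) ** 2   # -2 sech^2(.)
```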
We use a fourth-order gPC expansion to approximate the solution, i.e., p = 4, and the number of gPC basis functions is N = 1001. The experiment is repeated 50 times, with 3 rotations for each gPC expansion, to compute the average relative error. For both polynomial types, we fix λ̄ = 1 × 10^{−·} and ρ̄ = 1 × 10^{−·} in ℓ1-ℓ2, while choosing (10^{−·}, 10^{−·}) and (10^{−·}, · × 10^{−·}) in ℓ1 for Legendre and Hermite, respectively. The relative errors of the Legendre and Hermite polynomial expansions are presented in Figure 3, which illustrates that the combined method of iterative rotation and ℓ1-ℓ2 minimization outperforms the plain ℓ1 and ℓ1-ℓ2 approaches. In the Hermite polynomial case, when the sample size is small, ℓ1-ℓ2 alone can outperform the combination of ℓ1 and the iterative rotation method. This indicates that the impact of the increased coherence after rotation on the ℓ1 minimization is more significant than the potential enhancement of sparsity. Similarly, in the Legendre case, ℓ1-ℓ2 without rotation is close to ℓ1 combined with rotation for the smallest ratio M/N considered.

Figure 3: (KdV equation) Relative error of the Legendre and Hermite polynomial expansions for the KdV equation against the ratio M/N. Solid lines are for the ℓ1-ℓ2 minimization and dashed lines are for the ℓ1 minimization; "◦" is the marker for results without rotation and "♦" is the marker for results with 3 rotations.

5.3 High-dimensional function

We illustrate the potential capability of the proposed approach for dealing with higher-dimensional problems, referred to as the HD function. Specifically, we select a function similar to the one in Section 5.1 but with a much higher dimension,

    u(ξ) = Σ_{i=1}^{d} ξ_i + 0.25 (Σ_{i=1}^{d} ξ_i/√i)^2,  d = 100.    (46)
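The HD function (46) is cheap to sample, which is what makes the d = 100 study feasible; a sketch:

```python
import numpy as np

def hd_function(xi):
    # evaluate (46): xi is an M x d array with d = 100
    s1 = xi.sum(axis=1)
    weights = 1.0 / np.sqrt(np.arange(1, xi.shape[1] + 1))
    s2 = xi @ weights
    return s1 + 0.25 * s2 ** 2
```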
The total number of basis functions for this example is N = 5151. The experiment is repeated 20 times, with 3 rotations for each polynomial type, to compute the average relative error. We choose (λ̄, ρ̄) to be (5 × 10^{−·}, · × 10^{−·}) and (10^{−·}, · × 10^{−·}) in ℓ1-ℓ2, as well as (1.·, · × 10^{−·}) and (10^{−·}, 10^{−·}) in ℓ1, for Legendre and Hermite, respectively. The results are presented in Figure 4. As in the previous examples, ℓ1-ℓ2 combined with rotation yields the best results.

Figure 4: (High-dimensional function) Relative error of the Legendre and Hermite polynomial expansions for the high-dimensional function. Solid lines are for the ℓ1-ℓ2 minimization and dashed lines are for the ℓ1 minimization; "◦" is the marker for results without rotation and "♦" is the marker for results with 3 rotations.

5.4 Discussion

We now discuss the effects of the coherence and the number of rotations on the performance of the ℓ1 and ℓ1-ℓ2 approaches. As reported in [74, 36], ℓ1-ℓ2 performs particularly well for coherent matrices. The coherence values are reported in Tables 1 and 2 for the ridge function and the KdV/HD functions, respectively. Both tables confirm that more rotations increase the coherence level of the sensing matrix Ψ, except for the Hermite polynomial case. For relatively incoherent scenarios (e.g., Legendre and Hermite for the ridge function), we observe significant improvements of ℓ1-ℓ2 over ℓ1 in the numerical simulations (see Figure 1). On the other hand, ℓ1-ℓ2 is only slightly better than ℓ1 when the coherence values are close to 1 (e.g., Laguerre for the ridge function), a regime that seems difficult for any sparse recovery algorithm.

Table 1: Average coherence of the matrix Ψ (size 160 × 455) for the ridge function.

Table 2: Average coherence of Ψ for the KdV equation (matrix size 160 × 1001) and the HD function (matrix size 160 × 5151).

Figure 5 shows the relative error as a function of the number of rotations. The ridge function has an exact low-dimensional structure, as it is a function of the single rotated variable η_1 = d^{-1/2} Σ_{i=1}^{d} ξ_i, while the KdV equation does not have this property. Also, there is no truncation error ε(ξ) when using the gPC expansion to represent the ridge function, as we use a third-order expansion, while ε(ξ) exists for the KdV equation. In most practical problems, the truncation error exists, and the linear transform may not yield a very good low-dimensional structure with sparse gPC coefficients. Therefore, we empirically set a maximum number of rotations l_max to terminate the iterations in the algorithm.

Figure 5: Relative error vs. the number of rotations. The size of Ψ is 160 × 455 in the ridge problem and 160 × 1001 in the KdV problem.

Finally, we present the computation time in Table 3. All the experiments are performed on an AMD Ryzen 5 3600 machine with 16 GB RAM, running Windows 10 and Matlab 2018b. The major computation comes from two components: one is the ℓ1-ℓ2 minimization and the other is finding the rotation matrix A. The computational complexity of every iteration of the ℓ1-ℓ2 algorithm is O(M^3 + M^2 N), which reduces to O(M^2 N) as we assume M ≪ N. In practice, we choose the maximum outer/inner iteration numbers in Algorithm 1 as n_max = 10 and k_max = 2N, respectively; hence the complexity of the ℓ1-ℓ2 algorithm is O(M^2 N^2). To find the rotation matrix A, one constructs the matrix W_g using (30) and computes its SVD, both of which are cheap compared with the ℓ1-ℓ2 minimization since d ≤ M ≪ N. Therefore, the total complexity of our approach is O(M^2 N^2) per rotation. We divide the computation time of the Legendre polynomial cases reported in Table 3 by l_max M^2 N^2 for each example; as the resulting ratios are of the same order, the empirical results are consistent with the complexity analysis.

Table 3: Computation time (sec.) per random realization, averaged over 20 trials. (N/A means a certain case is not available.)

                     dimension     rotations   Legendre   Hermite   Laguerre
    Ridge function   160 × 455     9           6.53       4.31      16.19
    KdV equation     160 × 1001    3           …          …         N/A
    HD function      … × 5151      3           …          …         N/A
6 Conclusions

In this work, we proposed an alternating direction method that identifies a rotation matrix iteratively in order to enhance the sparsity of the gPC expansion, followed by the ℓ1-ℓ2 minimization to efficiently identify the sparse coefficients. As such, it improves the accuracy of the compressive sensing method for constructing gPC expansions from a small amount of data. In particular, the rotation is determined by seeking the directions of maximum variation of the QoI through the SVD of gradients at different points in the parameter space. The linear system after rotation becomes more ill-posed, specifically more coherent, which motivated us to choose the ℓ1-ℓ2 method for sparse recovery. We conducted extensive simulations under various scenarios, including the ridge function, the KdV equation, and the HD function with Legendre, Hermite, and Laguerre polynomials, all of which are widely used in practice. Our experimental results demonstrate that the proposed combination of rotation and ℓ1-ℓ2 significantly outperforms the standard ℓ1 minimization (without rotation) and the rotational CS with the ℓ1 approach.

Acknowledgement

This work was partially supported by NSF CAREER 1846690. Xiu Yang was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research (ASCR) as part of Multifaceted Mathematics for Rare, Extreme Events in Complex Energy and Environment Systems (MACSER).
References

[1] B. Adcock, Infinite-dimensional compressed sensing and function interpolation, Found. Comput. Math., (2017), pp. 1–41.
[2] B. Adcock, S. Brugiapaglia, and C. G. Webster, Compressed sensing approaches for polynomial approximation of high-dimensional functions, in Compressed Sensing and its Applications, Springer, 2017, pp. 93–124.
[3] N. Alemazkoor and H. Meidani, Divide and conquer: An incremental sparsity promoting compressive sampling approach for polynomial chaos expansions, Comput. Methods Appl. Mech. Eng., 318 (2017), pp. 937–956.
[4] I. Babuška, F. Nobile, and R. Tempone, A stochastic collocation method for elliptic partial differential equations with random input data, SIAM Rev., 52 (2010), pp. 317–355.
[5] A. S. Bandeira, E. Dobriban, D. G. Mixon, and W. F. Sawin, Certifying the restricted isometry property is hard, IEEE Trans. Inf. Theory, 59 (2013), pp. 3448–3450.
[6] G. Blatman and B. Sudret, Adaptive sparse polynomial chaos expansion based on least angle regression, J. Comput. Phys., 230 (2011), pp. 2345–2367.
[7] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3 (2011), pp. 1–122.
[8] A. M. Bruckstein, D. L. Donoho, and M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev., 51 (2009), pp. 34–81.
[9] R. H. Cameron and W. T. Martin, The orthogonal development of non-linear functionals in series of Fourier-Hermite functionals, Ann. Math., (1947), pp. 385–392.
[10] E. J. Candès, The restricted isometry property and its implications for compressed sensing, C. R. Math., 346 (2008), pp. 589–592.
[11] E. J. Candès, J. K. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math., 59 (2006), pp. 1207–1223.
[12] E. J. Candès and M. B. Wakin, An introduction to compressive sampling, IEEE Signal Process. Mag., 25 (2008), pp. 21–30.
[13] E. J. Candès, M. B. Wakin, and S. P. Boyd, Enhancing sparsity by reweighted l1 minimization, J. Fourier Anal. Appl., 14 (2008), pp. 877–905.
[14] R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Process. Lett., 14 (2007), pp. 707–710.
[15] P. G. Constantine, E. Dow, and Q. Wang, Active subspace methods in theory and practice: Applications to kriging surfaces, SIAM J. Sci. Comput., 36 (2014), pp. A1500–A1524.
[16] W. Dai and O. Milenkovic, Subspace pursuit for compressive sensing: Closing the gap between performance and complexity, tech. rep., University of Illinois at Urbana-Champaign, 2008.
[17] D. L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory, 52 (2006), pp. 1289–1306.
[18] D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization, Proc. Nat. Acad. Sci. USA, 100 (2003), pp. 2197–2202.
[19] D. L. Donoho, M. Elad, and V. N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IEEE Trans. Inf. Theory, 52 (2006), pp. 6–18.
[20] A. Doostan and H. Owhadi, A non-adapted sparse approximation of PDEs with stochastic inputs, J. Comput. Phys., 230 (2011), pp. 3015–3034.
[21] O. G. Ernst, A. Mugler, H.-J. Starkloff, and E. Ullmann, On the convergence of generalized polynomial chaos expansions, ESAIM: Math. Model. Numer. Anal., 46 (2012), pp. 317–339.
[22] S. Foucart and H. Rauhut, A mathematical introduction to compressive sensing, vol. 1, Birkhäuser Basel, 2013.
[23] R. G. Ghanem and P. D. Spanos, Stochastic finite elements: a spectral approach, Springer-Verlag, New York, 1991.
[24] R. Gribonval and M. Nielsen, Sparse representations in unions of bases, IEEE Trans. Inf. Theory, 49 (2003), pp. 3320–3325.
[25] W. Guo, Y. Lou, J. Qin, and M. Yan, A novel regularization based on the error function for sparse recovery, arXiv preprint arXiv:2007.02784, (2020).
[26] J. Hampton and A. Doostan, Compressive sampling of polynomial chaos expansions: Convergence analysis and sampling strategies, J. Comput. Phys., 280 (2015), pp. 363–386.
[27] J. Hampton and A. Doostan, Basis adaptive sample efficient polynomial chaos (BASE-PC), J. Comput. Phys., 371 (2018), pp. 20–49.
[28] J. D. Jakeman, M. S. Eldred, and K. Sargsyan, Enhancing ℓ1-minimization estimates of polynomial chaos expansions using basis selection, J. Comput. Phys., 289 (2015), pp. 18–34.
[29] J. D. Jakeman, A. Narayan, and T. Zhou, A generalized sampling and preconditioning scheme for sparse approximation of polynomial chaos expansions, SIAM J. Sci. Comput., 39 (2017), pp. A1114–A1144.
[30] M. Jardak, C.-H. Su, and G. E. Karniadakis, Spectral polynomial chaos solutions of the stochastic advection equation, J. Sci. Comput., 17 (2002), pp. 319–338.
[31] M.-J. Lai, Y. Xu, and W. Yin, Improved iteratively reweighted least squares for unconstrained smoothed ℓq minimization, SIAM J. Numer. Anal., 51 (2013), pp. 927–957.
[32] H. Lei, X. Yang, Z. Li, and G. E. Karniadakis, Systematic parameter inference in stochastic mesoscopic modeling, J. Comput. Phys., 330 (2017), pp. 571–593.
[33] H. Lei, X. Yang, B. Zheng, G. Lin, and N. A. Baker, Constructing surrogate models of complex systems with enhanced sparsity: quantifying the influence of conformational uncertainty in biomolecular solvation, SIAM Multiscale Model. Simul., 13 (2015), pp. 1327–1353.
[34] K.-C. Li, Sliced inverse regression for dimension reduction, J. Am. Stat. Assoc., 86 (1991), pp. 316–327.
[35] Y. Lou and M. Yan, Fast l1-l2 minimization via a proximal operator, J. Sci. Comput., 74 (2018), pp. 767–785.
[36] Y. Lou, P. Yin, Q. He, and J. Xin, Computing sparse representation in a highly coherent dictionary based on difference of L1 and L2, J. Sci. Comput., 64 (2015), pp. 178–196.
[37] Y. Lou, P. Yin, and J. Xin, Point source super-resolution via non-convex l1 based methods, J. Sci. Comput., 68 (2016), pp. 1082–1100.
[38] J. Lv, Y. Fan, et al., A unified approach to model selection and sparse recovery using regularized least squares, Ann. Stat., 37 (2009), pp. 3498–3528.
[39] X. Ma and N. Zabaras, An adaptive high-dimensional stochastic model representation technique for the solution of stochastic partial differential equations, J. Comput. Phys., 229 (2010), pp. 3884–3915.
[40] C. McDiarmid, On the method of bounded differences, Surveys in Combinatorics, 141 (1989), pp. 148–188.
[41] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput., 24 (1995), pp. 227–234.
[42] H. Ogura, Orthogonal functionals of the Poisson process, IEEE Trans. Inf. Theory, 18 (1972), pp. 473–481.
[43] J. Peng, J. Hampton, and A. Doostan, A weighted ℓ1-minimization approach for sparse polynomial chaos expansions, J. Comput. Phys., 267 (2014), pp. 92–111.
[44] J. Peng, J. Hampton, and A. Doostan, On polynomial chaos expansion via gradient-enhanced ℓ1-minimization, J. Comput. Phys., 310 (2016), pp. 440–458.
[45] T. Pham-Dinh and H. A. Le-Thi, A D.C. optimization algorithm for solving the trust-region subproblem, SIAM J. Optim., 8 (1998), pp. 476–505.
[46] T. Pham-Dinh and H. A. Le-Thi, The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems, Ann. Oper. Res., 133 (2005), pp. 23–46.
[47] Y. Rahimi, C. Wang, H. Dong, and Y. Lou, A scale invariant approach for sparse signal recovery, SIAM J. Sci. Comput., 41 (2019), pp. A3649–A3672.
[48] H. Rauhut and R. Ward, Sparse Legendre expansions via ℓ1-minimization, J. Approx. Theory, 164 (2012), pp. 517–533.
[49] H. Rauhut and R. Ward, Interpolation via weighted ℓ1 minimization, Appl. Comput. Harmon. Anal., 40 (2016), pp. 321–351.
[50] T. M. Russi, Uncertainty quantification with experimental data and complex system models, PhD thesis, UC Berkeley, 2010.
[51] K. Sargsyan, C. Safta, H. N. Najm, B. J. Debusschere, D. Ricciuto, and P. Thornton, Dimensionality reduction for complex models via Bayesian compressive sensing, Int. J. Uncertain. Quantif., 4 (2014).
[52] X. Shen, W. Pan, and Y. Zhu, Likelihood-based selection and sharp parameter estimation, J. Am. Stat. Assoc., 107 (2012), pp. 223–232.
[53] S. Smolyak, Quadrature and interpolation formulas for tensor products of certain classes of functions, Sov. Math. Dokl., 4 (1963), pp. 240–243.
[54] M. Tao and Y. Lou, Minimization of L1 over L2 for sparse signal recovery with convergence guarantee, Oct 2020.
[55] M. A. Tatang, W. Pan, R. G. Prinn, and G. J. McRae, An efficient method for parametric uncertainty analysis of numerical geophysical models, J. Geophys. Res.-Atmos. (1984–2012), 102 (1997), pp. 21925–21932.
[56] A. M. Tillmann and M. E. Pfetsch, The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing, IEEE Trans. Inf. Theory, 60 (2014), pp. 1248–1259.
[57] R. Tipireddy and R. Ghanem, Basis adaptation in homogeneous chaos spaces, J. Comput. Phys., 259 (2014), pp. 304–317.
[58] P. Tsilifis, X. Huan, C. Safta, K. Sargsyan, G. Lacaze, J. C. Oefelein, H. N. Najm, and R. G. Ghanem, Compressive sensing adaptation for polynomial chaos expansions, J. Comput. Phys., 380 (2019), pp. 29–47.
[59] C. Wang, M. Yan, Y. Rahimi, and Y. Lou, Accelerated schemes for the L1/L2 minimization, IEEE Trans. Signal Process., 68 (2020), pp. 2660–2669.
[60] D. Xiu and J. S. Hesthaven, High-order collocation methods for differential equations with random inputs, SIAM J. Sci. Comput., 27 (2005), pp. 1118–1139.
[61] D. Xiu and G. E. Karniadakis, The Wiener-Askey polynomial chaos for stochastic differential equations, SIAM J. Sci. Comput., 24 (2002), pp. 619–644.
[62] Z. Xu, X. Chang, F. Xu, and H. Zhang, L1/2 regularization: A thresholding representation theory and a fast solver, IEEE Trans. Neural Netw. Learn. Syst., 23 (2012), pp. 1013–1027.
[63] Z. Xu and T. Zhou, On sparse interpolation and the design of deterministic interpolation points, SIAM J. Sci. Comput., 36 (2014), pp. A1752–A1769.
[64] L. Yan, L. Guo, and D. Xiu, Stochastic collocation algorithms using ℓ1-minimization, Int. J. Uncertain. Quantif., 2 (2012), pp. 279–293.
[65] L. Yan, Y. Shin, and D. Xiu, Sparse approximation using ℓ1-ℓ2 minimization and its application to stochastic collocation, SIAM J. Sci. Comput., 39 (2017), pp. A229–A254.
[66] X. Yang, D. A. Barajas-Solano, W. S. Rosenthal, and A. M. Tartakovsky, PDF estimation for power grid systems via sparse regression, arXiv preprint arXiv:1708.08378, (2017).
[67] X. Yang, M. Choi, G. Lin, and G. E. Karniadakis, Adaptive ANOVA decomposition of stochastic incompressible and compressible flows, J. Comput. Phys., 231 (2012), pp. 1587–1614.
[68] X. Yang and G. E. Karniadakis, Reweighted ℓ1 minimization method for stochastic elliptic differential equations, J. Comput. Phys., 248 (2013), pp. 87–108.
[69] X. Yang, H. Lei, N. Baker, and G. Lin, Enhancing sparsity of Hermite polynomial expansions by iterative rotations, J. Comput. Phys., 307 (2016), pp. 94–109.
[70] X. Yang, H. Lei, P. Gao, D. G. Thomas, D. L. Mobley, and N. A. Baker, Atomic radius and charge parameter uncertainty in biomolecular solvation energy calculations, J. Chem. Theory Comput., 14 (2018), pp. 759–767.
[71] X. Yang, W. Li, and A. Tartakovsky, Sliced-inverse-regression-aided rotated compressive sensing method for uncertainty quantification, SIAM/ASA J. Uncertain. Quantif., 6 (2018), pp. 1532–1554.
[72] X. Yang, X. Wan, L. Lin, and H. Lei, A general framework for enhancing sparsity of generalized polynomial chaos expansions, Int. J. Uncertain. Quantif., 9 (2019).
[73] P. Yin, E. Esser, and J. Xin, Ratio and difference of l1 and l2 norms and sparse representation with coherent dictionaries, Comm. Inf. Syst., 14 (2014), pp. 87–109.
[74] P. Yin, Y. Lou, Q. He, and J. Xin, Minimization of ℓ1−2 for compressed sensing, SIAM J. Sci. Comput., 37 (2015), pp. A536–A563.
[75] S. Zhang and J. Xin, Minimization of transformed L1 penalty: Closed form representation and iterative thresholding algorithms, Comm. Math. Sci., 15 (2017), pp. 511–537.
[76] S. Zhang and J. Xin, Minimization of transformed L1 penalty: theory, difference of convex function algorithm, and robust application in compressed sensing, Math. Program., 169 (2018), pp. 307–336.
[77] T. Zhang, Multi-stage convex relaxation for learning with sparse regularization, in Adv. Neural Inf. Process. Syst. (NIPS), 2009, pp. 1929–1936.
[78]