Compressive Sampling of Polynomial Chaos Expansions: Convergence Analysis and Sampling Strategies
Jerrad Hampton, Alireza Doostan
Aerospace Engineering Sciences Department, University of Colorado, Boulder, CO 80309, USA
Abstract
Sampling orthogonal polynomial bases via Monte Carlo is of interest for uncertainty quantification of models with high-dimensional random inputs, using Polynomial Chaos (PC) expansions. It is known that bounding a probabilistic parameter, referred to as coherence, yields a bound on the number of samples necessary to identify coefficients in a sparse PC expansion via solution to an ℓ1-minimization problem. Utilizing results for orthogonal polynomials, we bound the coherence parameter for polynomials of Hermite and Legendre type under their respective natural sampling distributions. In both polynomial bases we identify an importance sampling distribution which yields a bound with weaker dependence on the order of the approximation. For more general orthonormal bases, we propose coherence-optimal sampling: a Markov chain Monte Carlo sampling which directly uses the basis functions under consideration to achieve a statistical optimality among all sampling schemes with identical support. We demonstrate these different sampling strategies numerically in both high-order and high-dimensional, manufactured PC expansions. In addition, the quality of each sampling method is compared in the identification of solutions to two differential equations, one with a high-dimensional random input and the other with a high-order PC expansion. In both cases the coherence-optimal sampling scheme leads to similar or considerably improved accuracy.

Preprint submitted to Unknown September 25, 2014
Keywords:
Compressive Sampling, Polynomial Chaos, Sparse Approximation, ℓ1-minimization, Markov Chain Monte Carlo, Hermite Polynomials, Legendre Polynomials, Stochastic PDEs, Uncertainty Quantification
1. Introduction
A precise approach to analyzing modern, sophisticated engineering systems requires understanding how various Quantities of Interest (QoI) behave as functions of uncertain system inputs. An ineffective understanding may give unfounded confidence in the QoI, or may suggest unnecessary restrictions on the system inputs due to unwarranted incredulity concerning the QoI. This process of Uncertainty Quantification (UQ) has received much recent study [1, 2, 3].

Probability is a natural framework for modeling uncertain inputs: we assume the input depends on a d-dimensional random vector Ξ := (Ξ_1, ..., Ξ_d) with some joint probability density function f(ξ). In this manner we model the scalar QoI, denoted by u(Ξ), as an unknown function of the input, which we seek to approximate. In this work we approximate u(Ξ), assumed to have finite variance, using an expansion in multivariate orthogonal polynomials, each of which we denote by ψ_k(Ξ), yielding a Polynomial Chaos (PC) expansion [1, 4],

u(Ξ) = Σ_{k=0}^∞ c_k ψ_k(Ξ)                  (1)
     ≈ Σ_{k∈C} c_k ψ_k(Ξ).

Under conditions discussed in Section 2.1, the index set C may have few elements, allowing us to accurately reconstruct u from a relatively small number of basis polynomials; i.e., there exists a sparse representation of u as a linear combination of orthogonal polynomials in Ξ. For computation we truncate the expansion in (1) so that we have c = (c_1, ..., c_P)^T and

u(Ξ) ≈ Σ_{k=1}^P c_k ψ_k(Ξ),                  (2)

where the error introduced by this truncation to a finite number of terms is referred to as truncation error. The polynomials ψ_k(Ξ) are naturally selected to be orthogonal with respect to the measure f(ξ) of the inputs Ξ, [4, 5]. For instance, when Ξ follows a jointly uniform or Gaussian distribution (with independent components), the ψ_k(Ξ) are multivariate Legendre or Hermite polynomials, respectively.
For the interest of analysis, we assume that the ψ_k(Ξ) are normalized such that E[ψ_k²(Ξ)] = 1, where E denotes the mathematical expectation operator. If we can accurately identify the coefficients c_k = E[u(Ξ)ψ_k(Ξ)] for our approximation, then as P → ∞ the PC approximation converges to u in the mean-square sense.

To identify c we consider non-intrusive, i.e., sampling-based, methods, where we do not require changes to deterministic solvers for u as we generate realizations of Ξ to identify u(Ξ). We denote these realizations by ξ^(i) and u(ξ^(i)), respectively. We let i = 1:N, so that N is the number of independent samples considered, and define

u := (u(ξ^(1)), ..., u(ξ^(N)))^T;                  (3)
Ψ(i, j) := ψ_j(ξ^(i)).

These definitions imply the matrix equality Ψc = u. We also introduce a diagonal positive-definite matrix W such that W(i, i) is a function of ξ^(i) that depends on our sampling strategy and is described in Sections 3 and 4. To approximate c we use Basis Pursuit Denoising (BPDN), [6, 7, 8, 9]. This involves solving either the ℓ1-minimization problem

arg min_c ‖c‖_1 subject to ‖Wu − WΨc‖_2 ≤ δ,                  (4)

where δ is a tolerance of solution inaccuracy due to the truncation error, or the closely related

arg min_c ‖Wu − WΨc‖_2² + λ‖c‖_1,                  (5)

where λ is a regularization parameter. The solutions to these problems are closely related to the solution of either

arg min_c ‖c‖_0 subject to ‖Wu − WΨc‖_2 ≤ δ,

which is similar to (4), or the closely related

arg min_c ‖Wu − WΨc‖_2² + λ‖c‖_0,

which is similar to (5). Here, ‖c‖_0 := #{k : c_k ≠ 0} is the number of non-zero entries of c. Solutions to these problems are of great practical interest for sparse approximation and have received significant study in the field of Compressive Sampling/Compressed Sensing, see, e.g., [10, 8, 11, 12], and more recently in UQ, [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23].

1.1.
Contributions of This Work

This work is concerned with convergence analysis and sampling strategies to recover a sparse stochastic function in both Hermite and Legendre PC expansions from the ℓ1-minimization problem (4). As an extension of our previous work in [13, 14, 20], the main contributions of this study are three-fold.

Firstly, we utilize properties of these polynomials, in conjunction with the analysis of sparse function recovery in [24, 25], to give a framework which admits a bound on the number of samples sufficient for a successful solution of (4). To the best of our knowledge, the Hermite results are the first of their type, and the Legendre recovery bounds, while here obtained by different techniques, are similar to those in [25].

Secondly, we provide a contribution of particular practical interest in that we analyze sampling Hermite polynomials uniformly over a d-dimensional ball – with a radius depending on the order of approximation – instead of sampling from the standard Gaussian measure. This sampling arises in a context similar to that of the Chebyshev distribution as a sampling for Legendre polynomials. Interestingly, as explained in Section 4.2.1, this sampling of a Hermite polynomial expansion is analogous to a Hermite function expansion, [26], of an appropriately weighted solution of interest. We provide analytic and numeric results justifying the use of this importance sampling distribution for the recovery of sparse Hermite PC expansions.

Finally, we analytically identify an importance sampling distribution with a statistical optimality, in terms of the coherence of the PC basis as a key recovery parameter of the method, and identify a Markov chain Monte Carlo sampler for which we provide associated numeric results.
This approach, here referred to as coherence-optimal sampling, provides a general sampling scheme for the reconstruction of sparse Hermite and Legendre PC expansions, and may be extended to other types of orthogonal bases.

The motivation to design a sampling strategy based on the coherence is similar to that of [25, 27], but we utilize a pre-conditioning different from that of [27], consider unbounded bases and asymptotic scenarios, and provide a procedure for generating samples.

The presentation of this work is as follows. Section 2 states the problem. Section 3 provides key background information and motivates our approach, while Section 4 describes our sampling methods and provides key theoretical results. Section 5 demonstrates the performance of the sampling methods, and Section 6 presents the proofs of the theorems from Section 4.
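As a concrete illustration of the projection identity c_k = E[u(Ξ)ψ_k(Ξ)], the following minimal sketch (ours, not part of this paper's numerical studies) estimates the first Hermite PC coefficients of a toy QoI u(ξ) = ξ² by Monte Carlo, assuming probabilists' Hermite polynomials normalized so that E[ψ_k²(Ξ)] = 1:

```python
import numpy as np

# Probabilists' Hermite polynomials, normalized so E[psi_k(Xi)^2] = 1
# for Xi ~ N(0,1): psi_0 = 1, psi_1 = xi, psi_2 = (xi^2 - 1)/sqrt(2).
def psi(k, xi):
    if k == 0:
        return np.ones_like(xi)
    if k == 1:
        return xi
    if k == 2:
        return (xi**2 - 1.0) / np.sqrt(2.0)
    raise NotImplementedError

rng = np.random.default_rng(0)
xi = rng.standard_normal(200_000)   # standard (Gaussian) sampling of Xi
u = xi**2                           # toy QoI; exactly psi_0 + sqrt(2)*psi_2

# Orthogonal-projection identity c_k = E[u(Xi) psi_k(Xi)], by Monte Carlo.
c = np.array([np.mean(u * psi(k, xi)) for k in range(3)])
print(c)  # approximately [1.0, 0.0, 1.414]
```

For u(ξ) = ξ² the exact expansion is u = ψ_0 + √2·ψ_2, so the estimate recovers (1, 0, √2) up to Monte Carlo error. The methods of this paper instead target the regime where u is expensive to evaluate and only N ≪ P samples are available, making such full projections impractical.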
2. Problem Statement and Solution Approach
We first describe the random inputs to the system, letting the random vector Ξ, defined on the probability space (Ω, F, P), represent the input uncertainties to the physical problem under consideration. We assume that (Ω, F, P) is formed by the product of d probability spaces (R, B(R), P_i) associated with each Ξ_i, where B denotes the Borel σ-algebra. We note that this implies that F = B(R^d), the d-dimensional Borel σ-algebra, and Ω = R^d. Further implied is that P is Lebesgue measurable and that the Ξ_i are independent random variables. For convenience, we assume that the Ξ_i are identically distributed with density f(ξ), and abuse this notation by allowing that Ξ is distributed according to f(ξ), noting that the two densities may be differentiated by the presence of a scalar or vector function argument.

We consider the physical system through which the input uncertainty Ξ propagates to be given by operators defined on a bounded Lipschitz continuous domain D ⊂ R^D for D ∈ {1, 2, 3}, with a boundary denoted by ∂D. Letting the operators L, B and I depend on the physics of the problem being considered, we assume that a solution u satisfies

L(x, t, Ξ; u(t, x, Ξ)) = 0,   x ∈ D,
B(x, t, Ξ; u(t, x, Ξ)) = 0,   x ∈ ∂D,
I(x, 0, Ξ; u(0, x, Ξ)) = 0,   x ∈ D.

We note that the problems considered in Section 5 depend only on space or time, but the methods considered here are independent of the underlying physical problem. We assume that, conditioned on the i-th independent sample of Ξ, denoted by ξ^(i), a numerical solution to this problem may be identified by a fixed solver; we utilize FEniCS [28] for the examples in the present work. For any fixed x ∈ D and t >
0, our objective is to reconstruct u(x, t, Ξ) via problem (4) using information obtained from {u(x, t, ξ^(i))}_{i=1}^N, that is, N independent realizations of the QoI. For a cleaner presentation, we suppress the dependence of u on x and t.

Here we discuss the polynomials in (2) as utilized in this work, and the key sparsity assumption which this approximation frequently facilitates. We consider an arbitrary number of input dimensions, denoted by d, and the set of orthogonal polynomials in any mixture of these coordinates of total order less than or equal to p. To explain the total order, let k = (k_1, ..., k_d) be a d-dimensional multi-index, where k_i ∈ N ∪ {0} represents the order of the univariate polynomial ψ_{k_i}(Ξ_i), orthogonal with respect to the measure of Ξ_i. The d-dimensional polynomials ψ_k(Ξ) are constructed by the tensorization of the ψ_{k_i}(Ξ_i),

ψ_k(Ξ) = Π_{i=1}^d ψ_{k_i}(Ξ_i).

A total order of p implies that we consider all polynomials satisfying

‖k‖_1 ≤ p,   k_i ∈ N ∪ {0} ∀ i.

We note that a direct combinatorial count implies that a d-dimensional approximation of total order p has P = C(p + d, d) = (p + d)!/(p! d!) basis polynomials. This total-order basis facilitates a polynomial approximation to the general function that favors lower-order polynomials. If the coefficients have a sufficiently rapid decay, or if certain dimensions are dominant in an accurate reconstruction, then we have

u(Ξ) ≈ Σ_{k∈C} c_k ψ_k(Ξ),

where s := |C| ≪ P is the operative sparsity of the approximation that leads to stable and convergent approximations of c from (4), using N < P random samples of u(Ξ). To demonstrate this, we rely primarily on existing theorems from [24], as presented in Section 3. Subsequently, in Section 4, we adapt these results to the case of sparse PC expansions, for which we utilize basic properties of orthogonal polynomials, and further present our main results on the choices of random sampling of Ξ and, hence, u(Ξ).

Notation:
In the sequel we occasionally use a multi-index notation for polynomials, but also find it convenient to index polynomials by a scalar, e.g., k, from 1 to P.

3. Definitions and Background

To contextualize the results from [24], presented in Section 3.3, we first introduce two main definitions that are used both in these results and in constructing our sampling methods.
We first consider the set of polynomials, {ψ_k(ξ)}_{k=1}^P, as defined in Section 2, and define B(ξ) to be

B(ξ) := max_{k=1:P} |ψ_k(ξ)|.                  (6)

This represents a uniformly least upper bound on the basis polynomials of interest. In addition, we consider

G(ξ) ≥ B(ξ)   ∀ ξ ∈ Ω.                  (7)

Here, G(ξ) represents an upper bound on the tight envelope B(ξ) for all ξ ∈ Ω, where Ω is the sample space of potential values of ξ as a realization of Ξ. We note that for several orthonormal polynomials of interest a bound on B(ξ) may be attained, [25, 26, 29, 30, 31, 32]. In this case, we have that |ψ_k(ξ)|/G(ξ) ≤
1. It follows that for any set
S ⊆ Ω,

c = (∫_S f(ξ) G²(ξ) dξ)^{−1/2}                  (8)

is such that c² ∫_S f(ξ) G²(ξ) dξ = 1, and hence

f_Y(ξ) := c² f(ξ) G²(ξ)                  (9)

is a probability density supported on S, which we consider as the density for Y. Let δ_{i,j} denote the Kronecker delta, such that δ_{i,j} = 1 if i = j and 0 if i ≠ j. Note that for i, j = 1:P,

|∫_S (ψ_i(ξ)/(cG(ξ))) (ψ_j(ξ)/(cG(ξ))) c² f(ξ) G²(ξ) dξ − δ_{i,j}| ≤ ε_{i,j},                  (10)

and we may select S such that ε_{i,j} is as small as needed; e.g., if we take S = Ω, then ε_{i,j} = 0. For this purpose we employ the heuristic of selecting S to encompass the largest values of f(ξ) until S is large enough to satisfy the conditions of (13), discussed in Section 3.2. The justification for this is that in unbounded domains, e.g., for Hermite polynomials, regions of small f(ξ) typically correspond to larger sup_{k=1:P} |ψ_k(ξ)| as p grows [26, 30, 29].

While this formulation is useful for identifying distributions for Y, unfortunately, we may no longer guarantee that E[ψ_i(Y)ψ_j(Y)] − δ_{i,j} is small. Fortunately, from (10), if we let

w(Y) := 1/(cG(Y)),                  (11)

then |E[w²(Y)ψ_i(Y)ψ_j(Y)] − δ_{i,j}| ≤ ε_{i,j}. In this way we consider w(Y) to be a weight function, so that {w(Y)ψ_k(Y)}_{k=1}^P are approximately orthonormal random variables. This function defines the diagonal positive-definite matrix W from (4) as

W(i, i) = w(ξ^(i)),

where ξ^(i) is the i-th realization of Y. For a notational symmetry with the conceptual connection, we refer to all realized random vectors by ξ regardless of the sampling distribution for ξ, noting that the weight function, w, depends on that distribution. Additionally, we note that for simulation we are not interested in the normalizing constant, c, associated with describing our sampling distribution. Consider realizations of w(Y)ψ_k(Y) for k = 1:P.
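To make these definitions concrete, the following minimal sketch (ours; the function names are hypothetical) constructs a small total-order probabilists' Hermite basis, evaluates the envelope B(ξ) of (6) point-wise, and forms the weight w = 1/B up to the normalizing constant:

```python
import math
from itertools import product
import numpy as np

def total_order_multi_indices(d, p):
    """All multi-indices k with k_i >= 0 and k_1 + ... + k_d <= p."""
    return [k for k in product(range(p + 1), repeat=d) if sum(k) <= p]

def hermite_1d(n, x):
    """Probabilists' Hermite polynomial, normalized so E[psi_n(Xi)^2] = 1
    for Xi ~ N(0,1); three-term recurrence He_{n+1} = x*He_n - n*He_{n-1}."""
    h_prev, h = np.ones_like(x), x
    if n == 0:
        return h_prev
    for m in range(1, n):
        h_prev, h = h, x * h - m * h_prev
    return h / math.sqrt(math.factorial(n))

d, p = 2, 3
indices = total_order_multi_indices(d, p)
assert len(indices) == math.comb(p + d, d)  # P = C(p + d, d)

def B(xi):
    """Envelope B(xi) = max_k |psi_k(xi)| over the total-order basis."""
    return max(abs(np.prod([hermite_1d(k_i, np.array(x_i))
                            for k_i, x_i in zip(k, xi)])) for k in indices)

xi = np.array([0.5, -1.0])
w = 1.0 / B(xi)  # weight w = 1/B, up to the normalizing constant c
```

Iterating over each k = 1:P in this way is exactly the point-wise evaluation of B(ξ) that the coherence-optimal sampling of Section 4.3 relies on.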
We investigate the coherence parameter, defined as in [24] by

µ(Y) := sup_{k=1:P, ξ∈Ω} |w(ξ)ψ_k(ξ)|².                  (12)

This is a conceptually simple parameter that, as we will see, allows us to bound the number of samples necessary to accurately recover c via a solution to (4). From (11) and (12) we are motivated to take G(ξ) to be B(ξ) as defined in (6), and as we shall show in Section 4.3, this choice leads to an optimally minimal coherence. Fortunately, asymptotic results give us approximations to the distribution f_Y(ξ) of Y in certain cases. These approximations also lead to easier simulation of Y when f_Y(ξ) corresponds to the choice G(ξ) = B(ξ), as described in Section 4.3.1.

We utilize the definition in (12) when analyzing Legendre polynomials, which are bounded on the domain [−1, 1]^d. However, we note that (12) is not useful when sup_{k=1:P, ξ∈Ω} |w(ξ)ψ_k(ξ)| is infinite, such as when the ψ_k(ξ) are Hermite polynomials and w(ξ) = 1. If N is the number of samples of Y which we will take, following [24], we consider a truncation of Ω to some appropriate S and let

µ(Y) := min_S { sup_{k=1:P, ξ∈S} |w(ξ)ψ_k(ξ)|²,                  (13)
        subject to P(S^c) < (NP)^{−1};  Σ_{k=1}^P E[|w(Y)ψ_k(Y)|² 1_{S^c}] ≤ P^{−1/2} },

where S is a subset of the support of f, a superscript c denotes a set complement, and 1 is the indicator function. While (12) highlights the quantity that we seek to bound, the conditions in (13) ensure that the truncation from Ω to S has a limited effect on the orthogonality of the set of random variables {w(Y)ψ_k(Y)}_{k=1}^P. As normal random variables are unbounded, we use (13) in the analysis of Hermite polynomials, with a truncation S that captures the essential behavior of w(ξ)ψ_k(ξ). These definitions are compatible, in that either definition may be used for the following theorems.
The following theorems use the coherence parameter in either (12) or (13) to bound the number of samples necessary to recover a sparse signal with high probability.
Theorem 3.1. [24] Let c be a fixed arbitrary vector in R^P with at most s non-zero elements such that Ψc = u, where Ψ is defined as in (3). With probability at least 1 − 5/P − e^{−β}, and C an absolute constant, if

N ≥ C(1 + β)µ(Y) s log(P),                  (14)

then

c = arg min_{c̄} {‖c̄‖_1 : WΨc̄ = Wu}.

When allowing for truncation error and considering a regularized version of this ℓ1-minimization problem as in (5), a similar result may be stated. Following [24], we require the condition that ‖Ψ^T W z‖_∞ ≤ ν for some 0 ≤ ν < ∞, where z is the truncation error associated with the model u = Ψc + z for an arbitrary solution vector c. Additionally, we denote by σ_w the standard deviation of the weighted truncation error w(Y)z(Y).

Theorem 3.2. [24] Let c be a fixed arbitrary vector in R^P, and let c_s be a vector such that c_s(i) = c(i) for the s largest |c(i)|, and c_s(i) = 0 otherwise. For some s̄, let

N ≥ C(1 + β)µ(Y) s̄ log(P).

With probability at least 1 − 5/P − e^{−β}, and C an absolute constant, the solution

ĉ = arg min_{c̄∈R^P} ‖WΨc̄ − Wu‖_2² + λσ_w ‖c̄‖_1,

with λ = 10√(log(P)/N), obeys, for any c,

‖ĉ − c‖_2 ≤ min_{1≤s≤s̄} C(1 + α) [ ‖c − c_s‖_1/√s + σ_w √(s log(P)/N) ];
‖ĉ − c‖_1 ≤ min_{1≤s≤s̄} C(1 + α) [ ‖c − c_s‖_1 + σ_w s √(log(P)/N) ],

where α := √((1 + β) s log(P)/N).

We note that when ‖Ψ^T W z‖_∞ cannot be bounded by a ν, we may be interested in a subset S of Ω that will be sampled with sufficiently high probability and admits a bound on ‖Ψ^T W z 1_{(Y_1,...,Y_N)∈S}‖_∞. This may be related to the truncation of Ω to S in the conditions of (13).

These results show how a bound on µ(Y) translates into a bound on the number of samples needed to recover a solution vector, and provide a theoretical justification for identifying distributions for Y which yield a smaller bound on µ(Y).
With these bounds we may utilize Theorems 3.1 and 3.2 to bound the number of samples required to recover solutions of any particular sparsity.
4. Sampling Methods
Here we describe the sampling methods that we consider in this work, and present theorems related to recovery when using them. We first consider sampling according to the random variables defined by the orthogonality measure in Section 4.1. Such a sampling, dubbed here standard sampling, is commonly used in PC regression, [33, 2, 14, 16]. Second, we consider sampling from a distribution related to an asymptotic analysis of the orthogonal polynomials ψ_k(Ξ) in Section 4.2, and refer to it as asymptotic sampling. Finally, in Section 4.3, we introduce the coherence-optimal sampling, which corresponds to minimizing the coherence parameters defined in Section 3.2.

Here we consider sampling ξ according to f(ξ), the distribution with respect to which the PC bases are naturally orthogonal. This implies taking w(ξ) = 1. For the d-dimensional Legendre polynomials, the standard method corresponds to sampling from the uniform distribution on [−1, 1]^d, while for d-dimensional Hermite polynomials it corresponds to sampling from a multivariate normal distribution such that each of the d coordinates is an independent standard normal random variable.

A standard sampling of Hermite polynomials leads to a coherence bounded as in Theorem 4.1, while a standard sampling of Legendre polynomials leads to a coherence bounded as in Theorem 4.2. We note that these results hold for any number of dimensions d and a set of orthogonal polynomials of arbitrary total order p as defined in Section 2.1.

Theorem 4.1.
Assume that d = o(p); that is, d is asymptotically dominated by p. Additionally, let N = O(P^k) for some k > 0; that is, the number of samples does not grow faster than a polynomial in the number of basis polynomials considered. For d-dimensional Hermite polynomials of total order p ≥ 1, the coherence in (13) is bounded by

µ(Ξ) ≤ C_p · η_p^p,                  (15)

for some constants C_p, η_p depending on p. For d = o(p), and as p → ∞, we may take C_p and η_p to be larger than, but arbitrarily close to, 1 and exp(2 − log(2)) ≈ 3.69, respectively.

We note that, together with Theorems 3.1 and 3.2, this implies that with high probability the number of samples required for recovery with Hermite polynomials grows exponentially with the total order of approximation. The following theorem for Legendre polynomials is analogous to previous results in [25], and provides a similar result for the number of samples required for signal recovery.
Remark:
When sampling Hermite polynomials, we have the technical requirement that N = O(P^k) for some finite k, and we note that this condition is satisfied here, as N < P is the case of interest in compressive sampling.
Theorem 4.2.
A standard sampling of the d-dimensional Legendre polynomials of total order p gives a coherence of

µ(Ξ) ≤ exp(2p).                  (16)

As we shall see in Section 6.5, for the case of p < d the bound in (16) may be improved to µ(Ξ) ≤ 3^p ≈ exp(1.1p). Additionally, for p > d the bound in (16) is loose, but a sharper dimension-dependent bound is given by (2p/d + 1)^d.

Here we consider taking G(ξ) to approximate or coincide with the asymptotic (in order) envelope for the polynomials as the order p goes to infinity. Specifically, for the case of Hermite polynomials we consider a relatively simple envelope function over a significant range of ξ, corresponding to a uniform sampling, though this envelope does not coincide with B(ξ) and is loose compared to the known behavior of Hermite polynomials at high orders, [34]. The uniform approximation is, however, both simple to simulate and simple to analyze. For the case of Legendre polynomials, we take G(ξ) to be B(ξ) for asymptotically large order p, which corresponds to Chebyshev sampling. In both cases, sampling with this choice of G(ξ) leads to coherence parameters with weaker dependence on p, as compared to the standard sampling.

4.2.1. Asymptotic Sampling Method

For d-dimensional Hermite polynomials, we sample uniformly from within the d-dimensional ball of radius √2·√(2p + 1), which corresponds to G(ξ) = exp(‖ξ‖_2²/4) on this ball. This choice of uniform sampling and radius is motivated by the analysis of Section 6.1. For completeness, we outline one algorithm for sampling uniformly from the d-dimensional ball of radius r. First, let Z := (Z_1, ..., Z_d) be a vector of d independent normally distributed random variables with zero mean and the same variance. If U is another independent random variable that is uniformly distributed on [0, 1], then

Y := r U^{1/d} Z/‖Z‖_2

represents a random sample uniformly distributed within the d-dimensional ball of radius r. This may be verified as Z/‖Z‖_2 is uniformly distributed on the d-dimensional hypersphere, while r U^{1/d} is the distribution for the radius of a realization within the ball that coincides with a uniform sampling within the ball. Additionally, this leads to a weight function given by

w(ξ) := exp(−‖ξ‖_2²/4).

Remark (Connection with Hermite function expansion). We highlight that the application of the weight function w(ξ) = exp(−‖ξ‖_2²/
4) to the Hermite polynomials ψ_k(ξ) leads to the so-called Hermite functions, i.e., exp(−‖ξ‖_2²/4)ψ_k(ξ), that are orthogonal with respect to the uniform measure, [26]. This implies that the Hermite polynomial expansion with asymptotic sampling is analogous to a Hermite function expansion of w(Ξ)u(Ξ). Notice that in a standard Hermite function expansion, the solution of interest, u(Ξ), is expanded in the basis {exp(−‖Ξ‖_2²/4)ψ_k(Ξ)}. The only computational difference between solving for a Hermite polynomial expansion under this sampling and a Hermite function expansion is whether, during computation of the coefficients, the realized u(ξ) are multiplied by w(ξ) or not.

For the d-dimensional Legendre polynomials, asymptotic sampling corresponds to sampling from the Chebyshev distribution on [−1, 1]^d, [25]; that is, the distribution in each of the d coordinates is f_Y(ξ) := 1/(π√(1 − ξ²)) for ξ ∈ [−1, 1], and may be simulated as cos(πU), where U is uniformly distributed on [0, 1]. The associated weight function is

w(ξ) := Π_{i=1}^d (1 − ξ_i²)^{1/4}.

Analysis of the Hermite and Legendre polynomials sampled according to these alternative distributions leads to a coherence with a weaker asymptotic dependence on p. In Theorem 4.3 and Theorem 4.4 we quantify such a dependence.

Theorem 4.3.
Assume that N = O(P^k) for some k > 0; that is, the number of samples does not grow faster than a polynomial in the number of basis polynomials considered. We note that this includes the important and common case that N ≤ P. Let V(r, d) = (r√π)^d / Γ(d/2 + 1) denote the volume inside the hypersphere of radius r in dimension d. For the sampling of Hermite polynomials, sampling uniformly from the d-dimensional ball of radius √((2 + ε_p)(2p + 1)), and weighting realized ψ_k(ξ^(i)) on this ball by w(ξ^(i)) = exp(−‖ξ^(i)‖_2²/4), gives

µ(Y) = O(π^{−d/2} V(√(2p), d)) = O((2p)^{d/2}/Γ(d/2 + 1)).                  (17)

Here, we note that ε_p → 0 if d = o(p), and that the radius of the sampling is a factor of √2 larger than the radius of the volume in the coherence, due to a normalization explained in Section 6.

In the uniform sampling in this work we set ε_p in Theorem 4.3 to zero, leaving as an open problem the determination of an optimal ε_p, and hence sampling radius, for uniform sampling of Hermite polynomials. Additionally, this theorem is applicable to sampling Hermite functions with a standard, i.e., uniform, sampling, as w(ξ)ψ_k(ξ) is a Hermite function. In the case of Legendre polynomials sampled by the Chebyshev distribution, we have complete independence of the order of approximation, which agrees with previous results in [25].

Theorem 4.4.
For the sampling of d -dimensional Legendre polynomials ac-cording to the d -dimensional Chebyshev distribution and weight ψ k ( ξ ) pro-portional to w ( ξ ) = Q di =1 (1 − ξ i ) / , regardless of the relationship between d and p , we have that µ ( Y ) ≤ d . (18)It is worthwhile highlighting that the combination of Theorems 4.2 and4.4 suggests sampling Legendre polynomials by uniform distribution when d > p and Chebyshev distribution when d < p . A similar observation hasbeen made in [17]. 19 .3. Coherence-optimal Sampling Here we consider taking G ( ξ ) = B ( ξ ) in (9), which implies sampling ξ according to the distribution f Y ( ξ ) = c f ( ξ ) B ( ξ ) , (19)with some appropriate normalizing constant c . Corresponding to this sam-pling, we apply the weight function w ( ξ ) = 1 B ( ξ ) . Notice that in (19), f ( ξ ) is the measure with respect to which the polynomials ψ k ( ξ ) are naturally orthogonal. While B ( ξ ), as defined in (6), is straightforward to evaluate for a fixed ξ by iterating over each k = 1 : P , the quantity is difficult to evaluate overa range of ξ , thus making it difficult to accurately compute the normalizingconstant c in (19). This motivates sampling Ξ from (19) via a Monte CarloMarkov Chain (MCMC) approach, specifically using the Metropolis-Hastingssampler, [35]. The MCMC method uses the computable point-wise evaluationof B ( ξ ), and does not require an identification of c necessary to normalizeto a probability distribution. Additionally, this sampling distribution allowsthe easy evaluation of w ( ξ ) using only the realized sample.The MCMC sampler requires a proposal, or candidate, distribution andwhen p > d we suggest those obtained from Section 4.2, giving a uniformsampling on a d -dimensional ball for Hermite polynomials, and d -dimensionalChebyshev sampling for Legendre polynomials. 
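Both of these proposal samplers can be sketched as follows; this is our illustration of the constructions of Section 4.2.1, with the order-dependent Hermite radius taken as we read it from that section:

```python
import numpy as np

def uniform_ball(rng, d, r, n):
    """n samples uniform in the d-dimensional ball of radius r:
    direction Z/||Z|| uniform on the sphere, radius r*U^(1/d)."""
    z = rng.standard_normal((n, d))
    direction = z / np.linalg.norm(z, axis=1, keepdims=True)
    radius = r * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return radius * direction

def chebyshev(rng, d, n):
    """n samples from the d-dimensional Chebyshev distribution on [-1,1]^d,
    drawn coordinate-wise as cos(pi*U) with U uniform on [0,1]."""
    return np.cos(np.pi * rng.uniform(size=(n, d)))

rng = np.random.default_rng(1)
p, d = 8, 2
r = np.sqrt(2.0) * np.sqrt(2 * p + 1)   # order-dependent radius (Section 4.2.1)
xi_ball = uniform_ball(rng, d, r, 1000)
xi_cheb = chebyshev(rng, d, 1000)
```

These draws serve either directly for asymptotic sampling, with the weights exp(−‖ξ‖_2²/4) and Π(1 − ξ_i²)^{1/4} applied afterwards, or as MCMC proposals for the coherence-optimal sampling of Section 4.3.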
Similarly, when p ≤ d we suggest those obtained from Section 4.1, giving standard normal sampling for Hermite polynomials, and uniform sampling on [−1, 1]^d for Legendre polynomials. We follow these proposal distributions for the sampling in this work. Note that each proposal distribution covers the entire domain S, and if the proposal and target distributions approximately match, then the acceptance rate is high and few burn-in samples are needed to approximately draw from the desired distribution for Y. There is interest in identifying better proposal distributions, to be studied further. One caveat which we note is that the proofs of Theorems 3.1 and 3.2 require independent sampling, so that it is proper to restart a chain after each accepted sample; a more practical method is to discard intermediate samples so that the serial dependence is small, [36]. We note that in applications where evaluation of the QoI is expensive, the generation of the samples {ξ^(i)}_{i=1}^N is typically not a bottleneck, so that the extra cost of MCMC sampling is frequently acceptable in practice.

Theorem 4.5 justifies the intuition that taking the G(ξ) associated with sampling to be the envelope function B(ξ) leads to a minimal µ(Y).

Theorem 4.5.
Let S be a set chosen to satisfy the conditions of (13), such that no subset S_s of S with µ(S_s) < µ(S) satisfies the conditions of (13). Let B(ξ) be as in (6). If we sample from the distribution proportional to f(ξ)B²(ξ) and weight the ψ_k(ξ) proportionally to w(ξ) = 1/B(ξ), then the coherence parameter achieves a minimum over all sampling schemes of ψ_k(ξ), k = 1:P, and distributions supported on S.
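As a minimal sketch of the Metropolis-Hastings construction (ours; it assumes a small two-dimensional Hermite basis and, since p ≤ d here, the standard normal independence proposal suggested above, for which the acceptance ratio for the target density proportional to f(ξ)B²(ξ) reduces to B²(y)/B²(x)):

```python
import math
import numpy as np

def hermite_1d(n, x):
    """Probabilists' Hermite polynomial, normalized so E[psi_n^2] = 1."""
    h_prev, h = 1.0, x
    if n == 0:
        return 1.0
    for m in range(1, n):
        h_prev, h = h, x * h - m * h_prev
    return h / math.sqrt(math.factorial(n))

def B(xi, indices):
    """Envelope B(xi) = max_k |psi_k(xi)| over the basis."""
    return max(abs(np.prod([hermite_1d(k_i, x_i) for k_i, x_i in zip(k, xi)]))
               for k in indices)

def coherence_optimal_samples(rng, indices, d, n, thin=20):
    """Metropolis-Hastings targeting a density proportional to f(xi)*B(xi)^2,
    with the standard normal f as an independence proposal, so the
    acceptance ratio is B(y)^2 / B(x)^2."""
    x = rng.standard_normal(d)
    x_b = B(x, indices)
    out = []
    while len(out) < n:
        for _ in range(thin):  # discard intermediates to reduce serial dependence
            y = rng.standard_normal(d)
            y_b = B(y, indices)
            if rng.uniform() < (y_b / x_b) ** 2:
                x, x_b = y, y_b
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(0)
d, p = 2, 2
indices = [(i, j) for i in range(p + 1) for j in range(p + 1) if i + j <= p]
xi = coherence_optimal_samples(rng, indices, d, n=50)
w = np.array([1.0 / B(x, indices) for x in xi])  # weights w = 1/B (up to c)
```

Restarting the chain after each accepted sample would give the independent draws assumed by Theorems 3.1 and 3.2; thinning, as here, is the practical compromise noted above.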
In the next section we explore how the sampling distributions associated with these results perform when used to approximate sparse functions in the appropriate PC basis.
5. Numerical Examples
Here we numerically investigate the different sampling schemes discussed in Section 4, considering the coherence parameter in Section 5.1, randomly generated manufactured sparse functions in Section 5.2, the solution to an elliptic PDE with a random coefficient in Section 5.3, and the amount of reaction at a given time in an adsorption model from [37] in Section 5.4.
The coherence parameter of Section 3.2 can be estimated from a large sample of realized points. Doing so leads to the results in Figure 1 for Hermite polynomials, and those in Figure 2 for Legendre polynomials. We consider three sampling schemes: the standard scheme, where we sample from the underlying distribution of the random variables in question, as in Section 4.1.1; an asymptotically motivated method, designed to ensure a coherence with weaker dependence on the order p, as in Section 4.2.1; and a coherence-optimal sampling based on the distribution proportional to the envelope of the basis functions, as in Section 4.3.1. We see that standard sampling tends to perform poorly at high orders, while asymptotic sampling tends to perform poorly for high-dimensional problems. Additionally, coherence-optimal sampling performs well in all regimes. These observations are consistent with the theoretical results presented in Section 4.
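A sample-based estimate of this kind can be sketched as follows (our illustration for a one-dimensional Hermite basis, comparing standard Gaussian sampling against uniform/asymptotic sampling; the sample sizes are ours):

```python
import math
import numpy as np

def hermite_1d(n, x):
    """Probabilists' Hermite polynomial, normalized so E[psi_n^2] = 1."""
    h_prev, h = np.ones_like(x), x
    if n == 0:
        return h_prev
    for m in range(1, n):
        h_prev, h = h, x * h - m * h_prev
    return h / math.sqrt(math.factorial(n))

def estimate_coherence(samples, w, orders):
    """Sample-based estimate of mu(Y) = sup_{k, xi} |w(xi) psi_k(xi)|^2."""
    return max(np.max((w(samples) * hermite_1d(n, samples)) ** 2)
               for n in orders)

rng = np.random.default_rng(0)
p = 5
xi = rng.standard_normal(100_000)          # standard sampling, w = 1
mu_std = estimate_coherence(xi, lambda x: np.ones_like(x), range(p + 1))

r = np.sqrt(2.0) * np.sqrt(2 * p + 1)      # asymptotic sampling radius
xi_u = rng.uniform(-r, r, 100_000)         # uniform (d = 1 "ball" = interval)
mu_asym = estimate_coherence(xi_u, lambda x: np.exp(-x**2 / 4), range(p + 1))
```

Even at this moderate order, the estimated coherence under the weighted uniform sampling is markedly smaller than under standard Gaussian sampling, consistent with the discussion above.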
Figure 1: Computed µ(Y) for different sampling methods of Hermite polynomials, for different d and p, under Gaussian (standard), uniform (asymptotic), and coherence-optimal sampling with d ∈ {1, 2, 4}.

In this section, we investigate the reconstruction accuracy of the competing sampling schemes on randomly generated sparse solution vectors, c, such that Ψc = u. Here, c is chosen to have a uniformly selected random support, with independent standard normal random variables as the values of the supported coordinates. We measure reconstruction accuracy as a function of the sparsity, denoted by s, and the number of independent samples of Y, denoted by N. We declare ĉ to be a successful recovery of c if ‖ĉ − c‖₂/‖c‖₂ ≤ 0.01, where ĉ is a solution to (4), computed in this work using the ℓ₁-minimization solver of SparseLab [38], which is based on a primal-dual interior-point method.

Figure 2: Computed µ(Y) for different sampling methods of Legendre polynomials, for different d and p, under uniform (standard), Chebyshev (asymptotic), and coherence-optimal sampling with d ∈ {1, 2, 4}.

Each success probability is calculated from 2500 independent realizations of Ψ and c. For a more comparable presentation, we normalize the number of samples by the number of basis functions considered, N/P, and the sparsity by the number of samples, s/N, and evaluate the probability of successful recovery on a 90 × 90 uniform grid in (N/P, s/N), for different (d, p) pairs as well as for different distributions of Y. The results are presented in Figures 3 and 4 for Hermite and Legendre polynomials, respectively, where we consider the three sampling schemes of Sections 4.1.1, 4.2.1, and 4.3.1. For the coherence-optimal sampling, in conjunction with Theorem 4.5, we use Metropolis-Hastings sampling to generate realizations from the appropriate distribution, discarding 99 samples for every one kept, which both provides a burn-in effect and reduces the serial correlation between samples.

The results in Figures 3 and 4 identify a phase transition, [39], in the ability of ℓ₁-minimization to recover c. For a given number of solution realizations, N/P, the method succeeds, with probability one, in reconstructing solutions that are sufficiently sparse, i.e., with small enough s/N, and fails to do so when s/N is too large. Between these two phases, the method recovers the solution with probability smaller than one. Here, we observe differentiation in the quality of solution recovery in the transition region based on how Ψ is sampled. In particular, we highlight the following notable observations: recovery differs most visibly between schemes in the high-order case (d, p) = (2, 30) and the high-dimensional case (d, p) = (30, 2), while for (d, p) = (5, 5) the two sampling methods lead to similar performance. In all cases, the MCMC sampling leads to recovery that is similar to that of the other two sampling strategies or provides considerable improvements.
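The thinned Metropolis-Hastings step described above can be sketched as follows. This is an illustrative one-dimensional stand-in, not the authors' implementation: the proposal scale and the `envelope_sq` helper are our own assumptions, with the target density proportional to f(ξ)B²(ξ) for the Hermite basis, and one state kept per 100 to mimic the 99-sample discard:

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def envelope_sq(xi, p):
    """B^2(xi): the largest squared orthonormal Hermite value, k <= p."""
    best = 0.0
    for k in range(p + 1):
        c = np.zeros(k + 1)
        c[k] = 1.0
        best = max(best, float(hermeval(xi, c)) ** 2 / math.factorial(k))
    return best

def coherence_optimal_draws(p, n, step=1.0, thin=100, seed=0):
    """Metropolis-Hastings draws whose target density is proportional to
    exp(-xi^2/2) * B^2(xi); only every `thin`-th state is kept."""
    rng = np.random.default_rng(seed)
    x = 0.0
    log_target = lambda xi: -0.5 * xi ** 2 + math.log(envelope_sq(xi, p))
    lx = log_target(x)
    draws = []
    while len(draws) < n:
        for _ in range(thin):
            y = x + step * rng.standard_normal()
            ly = log_target(y)
            if math.log(rng.random() + 1e-300) < ly - lx:
                x, lx = y, ly
        draws.append(x)
    return np.array(draws)
```

The kept draws are then used to assemble the rows of Ψ, with the matching weights w(ξ) = 1/B(ξ) applied to the basis evaluations.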
Remark: Though our results following from [24] do not necessarily imply uniform recovery over all functions of a certain sparsity, the results in this section are appropriately interpreted in the context of uniform recovery. For a more detailed definition of uniform recovery, we refer the interested reader to [40].

Figure 3: Hermite Recovery Phase Diagrams: The rows correspond to differing dimension and total order, while the columns correspond to the different sampling schemes. The color of each square represents a probability of successful function recovery.

Figure 4: Legendre Recovery Phase Diagrams: The rows correspond to differing dimension and total order, while the columns correspond to the different sampling schemes. The color of each square represents a probability of successful function recovery.

5.3. An Elliptic PDE with Random Input

As an application of Legendre PC expansions, we next consider the solution of the linear elliptic PDE

∇ · (a(x, Ξ)∇u(x, Ξ)) = 1,  x ∈ D,
u(x, Ξ) = 0,  x ∈ ∂D,

on the unit square D = (0, 1) × (0, 1) with boundary ∂D. The diffusion coefficient a is considered random and is modeled by

a(x, Ξ) = ā + σ_a Σ_{k=1}^d √(ζ_k) φ_k(x) Ξ_k,  (20)

in which the random variables {Ξ_k}_{k=1}^d, d = 20, are independent draws from the U([−1, 1]) distribution, ā and σ_a are fixed constants, and {ζ_k}_{k=1}^d are the d largest eigenvalues associated with {φ_k}_{k=1}^d, the L²(D)-orthonormalized eigenfunctions of

C_aa(x, y) = exp[−(x₁ − y₁)²/l₁² − (x₂ − y₂)²/l₂²],  (21)

with fixed correlation lengths l₁ and l₂. For any realization of Ξ, we use the finite element solver FEniCS [28] to compute an approximate solution u(Ξ), the value of the solution at a fixed spatial point of D.

To identify u(Ξ) as a function of the random inputs Ξ, we use a Legendre PC expansion of total order p = 4, which for this d = 20 dimensional stochastic problem yields P = 10,626 basis functions. We note that the root-mean-squared error is considered here as the primary measure of recovery. We investigate the ability to recover u(Ξ) via (4), using each of the three sampling schemes considered for Legendre polynomials. For this elliptic problem we further improve the quality of the MCMC sampling through an initial burn-in of 1,000 discarded samples [36]. We provide bootstrapped estimates of the various moment-based measures from a pool of samples generated beforehand. Specifically, samples for each realization are drawn from a pool of 50,000 previously generated samples, which are used to calculate bootstrap estimates of averages and standard deviations.

To identify the solution of (4) we use the SPGL1 package, [41, 42], with a truncation error in (4), denoted by δ, determined for each set of samples by two-fold, also known as hold-out, cross-validation, [43]. Specifically, we calculate this δ from N, an even number of available samples, by splitting the available samples into two equally sized sets, one a training set and the other a validation set. This process, as we have implemented it, is summarized by the following algorithm:

1. For a number of candidate δ, construct solutions, c_δ, from the training set and the solution of (4). We use a set of potential δ that is logarithmically spaced (powers of 10).
2. For each c_δ, use the validation set to identify the reconstruction error ǫ^{(1)}_δ := ‖Ψ c_δ − u‖₂.
3. Repeat with the training and validation sets swapped to obtain ǫ^{(2)}_δ.
4. Identify the δ that minimizes ǫ^{(1)}_δ + ǫ^{(2)}_δ.
5. Set the truncation error to δ⋆ = √2 δ, rescaling for the recombined sample set, and identify a solution vector via the combined N samples from both the training and validation sets.

We utilize this method of cross-validation to calibrate the truncation error for each realized sample of the calculated solution to (4).
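The five steps above can be sketched as follows. Since SPGL1 itself is not reproduced here, the sketch accepts any solver with the signature `solver(Psi, u, delta)`; the ridge stand-in below is our own placeholder for the ℓ₁ (basis pursuit denoise) solve, used only to make the cross-validation mechanics runnable:

```python
import numpy as np

def ridge_stand_in(Psi, u, delta):
    """Placeholder for the BPDN solve min ||c||_1 s.t. ||Psi c - u||_2 <= delta;
    here a ridge solve whose regularization grows with delta."""
    P = Psi.shape[1]
    return np.linalg.solve(Psi.T @ Psi + delta * np.eye(P), Psi.T @ u)

def crossval_delta(Psi, u, deltas, solver=ridge_stand_in):
    """Two-fold (hold-out) cross-validation of the truncation error delta."""
    N = Psi.shape[0] // 2
    folds = [(slice(0, N), slice(N, 2 * N)), (slice(N, 2 * N), slice(0, N))]
    total_err = []
    for delta in deltas:
        err = 0.0
        for train, valid in folds:
            c_delta = solver(Psi[train], u[train], delta)  # step 1
            err += np.linalg.norm(Psi[valid] @ c_delta - u[valid])  # steps 2-3
        total_err.append(err)
    delta_min = deltas[int(np.argmin(total_err))]  # step 4
    # Step 5: rescale for the recombined (doubled) sample set.
    return np.sqrt(2.0) * delta_min
```

In practice the selected δ⋆ is then passed to the full solver together with all N samples.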
Here, a lower cross-validated truncation error suggests a computed solution vector with a more accurate recovery. In Figure 5 we plot computed moments of the distribution of the relative root-mean-squared error between the computed and reference solutions, obtained from 100 independent replications for each sample size, N. In addition, Figure 6 presents similar plots for the truncation error computed with each sampling. We note here that the cross-validated computation of δ provides an available estimate of the anticipated root-mean-squared error for additional independent samples.

We note that the standard and coherence-optimal sampling offer significant improvements over asymptotic, i.e., Chebyshev, sampling at similar sample sizes, N, both in terms of accuracy and robustness to differing realized samples. These observations are compatible with the theoretical results of Section 4, which demonstrate a smaller coherence for uniform sampling, as compared to Chebyshev sampling, in the case d > p. The coherence-optimal sampling by construction leads to the smallest coherence. We also notice that at particularly low sample sizes any given sampling method prefers to recover a particular, but ultimately poor, approximation. As the number of samples increases, the recovery can improve, but the variability in the solution recovery will appear to increase first, as this preferred solution is recovered less frequently.
Figure 5: Plots for the moments of the root-mean-squared error of independent residuals for the various sampling methods, as a function of the number of samples.
Figure 6: Plots for the moments of the cross-validated estimates of the tolerance for the various sampling methods, as a function of the number of samples.

5.4. Surface Reaction Model

Another problem of interest in this work is to quantify the uncertainty in the solution ρ of the non-linear evolution equation

dρ/dt = α(1 − ρ) − γρ − κ(1 − ρ)²ρ,  ρ(t = 0) = 0.9,  (22)

modeling the surface coverage of a certain chemical species, as examined in [37, 44]. We consider uncertainty in the adsorption, α, and desorption, γ, coefficients, and model them as shifted log-normal variables. Specifically, we assume

α = 0.1 + exp(0.05 Ξ₁),  γ = 0.001 + 0.01 exp(0.05 Ξ₂),

where Ξ₁, Ξ₂ are independent standard normal random variables; hence, the dimension of our random input is d = 2. The reaction rate constant κ in (22) is assumed to be deterministic and is set to κ = 10.

Our QoI is ρ_c := ρ(t = 4, Ξ₁, Ξ₂), and to approximate this, we consider a Hermite PC expansion of total order p = 32, giving P = 561 basis functions. This high-order approximation is necessary due to the large gradient of ρ_c in terms of the random variables, as evidenced by the relatively slow decay of the coefficients in the reference solution presented in Figure 7. This reference is computed using a Gauss-Hermite quadrature approximation of the PC coefficients.

We utilize the same computational process as in Section 5.3 to identify approximate solutions. In Figure 8, we plot moments of the relative root-mean-squared error, between the reference and ℓ₁-minimization solutions, as a function of the number of samples, N. These moments are obtained from 200 independent replications for each N. We find that the standard sampling fails to converge, while the uniform and coherence-optimal samplings lead to converged solutions as N is increased. Figure 9 presents plots for the truncation error computed with each sampling via cross-validation. One interesting fact to notice is that recovery for standard sampling appears to get worse for larger sample sizes. This may be an effect of the poor numerical conditioning of the high-order p = 32 Hermite polynomials under standard sampling, where very rare events with very large realized |ψ_k(ξ)| are necessary to capture the orthogonality of the polynomials. It further affirms the results of Figure 1 and Theorem 4.1: standard sampling of Hermite polynomials is not suited for high-order problems.
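To make the model concrete, the following sketch integrates (22) with a classical fourth-order Runge-Kutta step; it is our own illustration (the function name and step size are assumptions), and the numerical constants follow the common version of this benchmark and should be checked against [37, 44]:

```python
import math

def surface_coverage(t_end, xi1, xi2, dt=1e-3):
    """Integrate d(rho)/dt = alpha(1-rho) - gamma*rho - kappa*(1-rho)^2*rho
    from rho(0) = 0.9 with classical RK4."""
    alpha = 0.1 + math.exp(0.05 * xi1)           # adsorption coefficient
    gamma = 0.001 + 0.01 * math.exp(0.05 * xi2)  # desorption coefficient
    kappa = 10.0                                 # deterministic reaction rate
    f = lambda r: alpha * (1.0 - r) - gamma * r - kappa * (1.0 - r) ** 2 * r
    rho = 0.9
    for _ in range(int(round(t_end / dt))):
        k1 = f(rho)
        k2 = f(rho + 0.5 * dt * k1)
        k3 = f(rho + 0.5 * dt * k2)
        k4 = f(rho + dt * k3)
        rho += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return rho
```

The QoI is then ρ_c = surface_coverage(4.0, Ξ₁, Ξ₂) evaluated at standard normal draws of Ξ₁ and Ξ₂.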
6. Proofs
Here we present proofs for the theorems in Section 4. These proofs, except for that of Theorem 4.5, rely on an analysis of the appropriate orthonormal polynomials. We first work toward proofs of Theorems 4.1 and 4.3, which rely on understanding Hermite polynomials. To do so we require a few technical lemmas concerning the broad behavior of the polynomials, asymptotically in order.

This analysis is focused on three domains for one-dimensional polynomials. The first region coincides nearly with the so-named oscillatory region of ψ_p(ξ), [30], within which all zeros of ψ_k(ξ) are found for k ≤ p. A second region, referred to as the monotonic region, [30], is the complementary region where polynomials tend to increase monotonically in magnitude, and in this region we focus on bounding the extreme values of ψ_k(ξ). The third region of importance is the boundary between the monotonic and oscillatory regions, referred to as the boundary region.

Figure 7: PC coefficients of the reference solution for the QoI ρ_c := ρ(t = 4, Ξ₁, Ξ₂) in the surface reaction model.

For multidimensional Hermite polynomials, we identify a domain S which fully contains the multidimensional analogue of the oscillatory and boundary regions, and partially contains the monotonic region. The size of the included portion of the monotonic region is determined so as to satisfy the conditions of (13), while admitting a useful bound on the extreme values of |ψ_k(ξ)| for ξ ∈ S. The method for our selection of S is to include the ξ corresponding to the largest values of the density function, f(ξ), until S is verified to satisfy (13). In the case of Hermite polynomials, this heuristic is justified, as Hermite polynomials tend to relate inversely with f(ξ) for ξ within the monotonic region, [26].

Figure 8: Moments of the root-mean-squared error between the reference and ℓ₁-minimization solutions for the various sampling methods, as a function of the number of samples.

Figure 9: Plots for the moments of the cross-validated estimates of the tolerance for the various sampling methods, as a function of the number of samples.

The selection of S determines our radius for sampling uniformly from the d-dimensional ball, and we will show that this involves taking S = {ξ : ‖ξ‖₂ ≤ r_p} for an r_p that grows asymptotically like 2√p.

For convenience with the cited literature, we prove our results using the orthonormalized physicists' Hermite polynomials (orthonormal polynomials with respect to f(ξ) := π^{−d/2} exp(−‖ξ‖₂²)). We note that our results in Section 4 are in terms of the probabilists' polynomials (orthogonal with respect to f(ξ) := (2π)^{−d/2} exp(−‖ξ‖₂²/2)). If {ψ_k(ξ)} denotes the orthonormalized physicists' polynomials and {ψ′_k(ξ)} represents the orthonormalized probabilists' polynomials, then for each k, ψ_k(ξ/√2) = ψ′_k(ξ). We point the reader to Section 5.5 of [26] for a derivation of this key relation. The effect on the results of the proof is that the probabilists' polynomials require a sampling radius that is larger by a factor of √2, with the corresponding weights π^{−d/2} exp(−‖ξ‖₂²) vs. (2π)^{−d/2} exp(−‖ξ‖₂²/2).

We bound (13) for the d-dimensional Hermite polynomials as follows. Let ξ be a d × 1 vector of inputs and k a d × 1 multi-index, so that ψ_k(ξ) is an orthonormal polynomial whose order in the i-th dimension is given by k_i := k(i), and whose total order is at most p. As the total order is at most p, ‖k‖₁ ≤ p, and as the weight function is formed by a tensor product of one-dimensional weight functions, ψ_k is a tensor product of univariate orthogonal polynomials.
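The physicists'-probabilists' rescaling above is easy to verify numerically. The sketch below (our own check; the helper names are assumptions) builds both orthonormal families from NumPy's Hermite modules and confirms ψ_k(ξ/√2) = ψ′_k(ξ):

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval      # physicists' H_k
from numpy.polynomial.hermite_e import hermeval   # probabilists' He_k

def psi_phys(k, x):
    """Orthonormal w.r.t. pi^{-1/2} exp(-x^2): H_k(x)/sqrt(2^k k!)."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return hermval(x, c) / math.sqrt(2.0 ** k * math.factorial(k))

def psi_prob(k, x):
    """Orthonormal w.r.t. (2 pi)^{-1/2} exp(-x^2/2): He_k(x)/sqrt(k!)."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return hermeval(x, c) / math.sqrt(math.factorial(k))

# psi_phys(k, x / sqrt(2)) agrees with psi_prob(k, x) for every k.
```

This follows from the classical identity H_k(x) = 2^{k/2} He_k(x√2), divided through by the respective normalization constants.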
In this way the bounds in arbitrary dimension are tensor products of one-dimensional bounds, which are more easily derived. As mentioned previously, the behavior of the Hermite polynomials in the monotonic region, and the radially symmetric concentration of the weight function π^{−d/2} exp(−‖ξ‖₂²), suggests the candidate set S := {‖ξ‖₂ ≤ r_p}, with r_p chosen to satisfy the conditions of (13). We recall that the minimum over admissible S yields a coherence parameter less than that of any given choice of S, so that this selection of S leads to an upper bound on a minimal µ(Y).

Range for ξ | Bound for |ψ_k(ξ)|
0 ≤ |ξ| ≤ n^{1/2} − n^{−1/6} | C n^{−1/4} (n^{1/2} − |ξ|)^{−1/4} exp(ξ²/2)
n^{1/2} − n^{−1/6} ≤ |ξ| ≤ n^{1/2} + n^{−1/6} | C n^{−1/12} exp(ξ²/2)

Table 1: Bounds in the oscillatory and boundary regions of Hermite polynomials from [30]. Here, C is some positive constant and n := 2k + 1.

Being of classical and modern importance, several classes of one-dimensional orthogonal polynomials (e.g., Hermite, Jacobi, Legendre, Laguerre) have received much analysis, and key results are available in the literature, [25, 26, 29, 30, 31, 32]. In particular, for our interest in Hermite polynomials, a direct consequence of bounds from [30] gives the bounds in Table 1 for some positive constant C and n := 2k + 1. The key conclusion is that we may bound exp(−ξ²/2)|ψ_k(ξ)| for ξ in each of these regions.

The bounds in Table 1 are sufficient both for our uses within the oscillatory region and on the boundary of the oscillatory and monotonic regions. We derive a bound within the monotonic region using results in [29], which we first summarize as follows. Let σ_k(ξ) := √(ξ² − 2k) for |ξ| ≥ √(2k) + ǫ_k, where ǫ_k → 0 as k → ∞. We note that our analysis does not address how rapidly we may take ǫ_k to 0, and for our purposes it is more convenient to redefine ǫ_k such that |ξ| ≥ √((2 + ǫ_k)k + 1), again letting ǫ_k → 0. The unspecified rate of ǫ_k implies a lack of effective analysis for derived quantities, and all results are guaranteed to hold in an asymptotic sense, without an analysis of how rapidly the convergence occurs. We do refer the reader to [29] for some analysis of how ǫ_k may be taken to zero. As a matter of notation, while σ_k(ξ) depends on both k and ξ, in what follows we suppress the dependence on ξ.

Following [29], we may approximate ψ_k(ξ) when |ξ| ≥ √((2 + ǫ_k)k + 1) by

(c′_k/C_k) exp((ξ² − σ_k ξ − k)/2) (σ_k + ξ)^k √(ξ/σ_k) ≤ ψ_k(ξ) ≤ (c_k/C_k) exp((ξ² − σ_k ξ − k)/2) (σ_k + ξ)^k √(ξ/σ_k),  (23)

where both c′_k, c_k → 1 as k → ∞, and C_k = √(2^k k!) is the appropriate constant so that the {ψ_k} are orthonormal with respect to the weight f(ξ) = π^{−1/2} exp(−ξ²). For smaller |ξ| the polynomials are effectively oscillatory and more technically troublesome to work with. Thankfully, as we are chiefly concerned with approximating key integrals whose values over the real line we know (1 or 0 by orthonormality), we do not need to delve closely into the analysis for small |ξ|; understanding the behavior for large |ξ| is sufficient.

The key technical results used to bound ψ_k in the monotonic region are presented in the following lemma, where the motivating idea is to show that ψ_k(ξ) is tightly bounded by an envelope with a well-behaved exponential parameter, denoted by η_k(ξ). Due to its length, we delay the proof to Appendix A.

Lemma 6.1. Let C_k and σ_k be as in (23), and define the function η_k(ξ) implicitly by

C_k^{−1} exp((ξ² − σ_k ξ − k)/2) (σ_k + ξ)^k √(ξ/σ_k) = exp(η_k(ξ) ξ²).  (24)

That is, we approximate |ψ_k(ξ)| by exp(η_k(ξ) ξ²), with the exponent η_k(ξ) implicitly defined by the approximation in (23). It follows that:

1. For ǫ > 0, lim_{k→∞} η_k(√((2 + ǫ)k + 1)) = 1/2 − log(2)/(2(2 + ǫ)).
2. For a sequence of ǫ_k > 0 such that ǫ_k → 0 as k → ∞, and for ξ₂ > ξ₁ ≥ √((2 + ǫ_k)k + 1), η_k(ξ₂) < η_k(ξ₁).
3. For a sequence of ǫ_k such that ǫ_k → 0, some finite K, and k₂ ≥ K with k₁ < k₂, and for ξ ≥ √((2 + ǫ_{k₂})k₂ + 1), it follows that η_{k₁}(ξ) < η_{k₂}(ξ).

Using Lemma 6.1, we show the following result, which is useful for a direct bound on the coherence parameter.
Lemma 6.2.
For some choice of ǫ_p such that ǫ_p → 0, and for p ≥ p₀ with some p₀, it follows that for r_p ≥ √((2 + ǫ_p)p + 1),

sup_{k ≤ p} ∫_{|ξ| > r_p} ψ_k²(ξ) (e^{−ξ²}/√π) dξ ≤ (1 + δ_p) erfc(√(1 − 2η_p(r_p)) r_p) / √(1 − 2η_p(r_p)),  (25)

where erfc(·) is the complement of the error function and δ_p → 0. Considering multidimensional polynomials and letting δ_p → 0,

sup_{‖ξ‖₂ ≤ r_p, ‖k‖₁ ≤ p} |ψ_k(ξ)| ≤ (1 + δ_p) exp(η_p(r_p) r_p²).  (26)

Here, η_p is defined implicitly as in (24), or equivalently, explicitly as in (31).

Proof. To show the first point, note from Lemma 6.1 that for |ξ| ≥ r_p, |ψ_k(ξ)| ≤ c_k exp(η_k(ξ) ξ²). By the second point of Lemma 6.1, η_k(ξ) decreases as ξ increases, yielding the bound on the integral in (25), and by the third point of that lemma, η_p(ξ) ≥ η_k(ξ) for all k ≤ p. The δ_p accounts for the approximation in (23) being inaccurate for finite p, though we do not address how quickly δ_p converges to zero.

To show (26), note that for column vectors η and ξ², with coordinates representing the coordinates of η and the squared coordinates of ξ in each dimension, |η^T ξ²| ≤ ‖η‖_∞ ‖ξ²‖₁, with equality holding if ‖η‖_∞ is achieved at the one coordinate on which ξ is supported. Further noting that ‖ξ²‖₁ = ‖ξ‖₂², it follows from the third point of Lemma 6.1 that, for large enough p,

sup_{‖ξ‖₂ ≤ r_p, ‖k‖₁ ≤ p} |ψ_k(ξ)| ≤ (1 + δ_p) |ψ_p(r_p)|,

where the bound on ψ_p from (23) shows (26).

With Lemma 6.2 we are prepared to prove Theorem 4.1 for the case of Hermite polynomials. Let S = {ξ : ‖ξ‖₂ ≤ r_p}, where r_p is as in Lemma 6.2, and we show that the conditions for (13) are satisfied. Let the total number of polynomials be given by P = (p + d)!/(p! d!), where p is the total order of the approximation and d the number of dimensions. Recall that the number of samples from the orthogonal polynomial basis is N.
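Before continuing, the smallness of the tail mass controlled by (25) can be examined numerically. The sketch below is our own check, with an arbitrarily chosen order and radii: it integrates ψ_k²(ξ) e^{−ξ²}/√π beyond a radius r and confirms that the tail shrinks rapidly once r exceeds √((2 + ǫ)k + 1) with ǫ > 0:

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_tail(k, r, width=10.0, m=50001):
    """Numerically integrate psi_k(xi)^2 exp(-xi^2)/sqrt(pi) over xi > r
    (doubled for the symmetric tail), with psi_k the orthonormal
    physicists' Hermite polynomial H_k / sqrt(2^k k!)."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    xi = np.linspace(r, r + width, m)
    psi = hermval(xi, c) / math.sqrt(2.0 ** k * math.factorial(k))
    integrand = psi ** 2 * np.exp(-xi ** 2) / math.sqrt(math.pi)
    dx = xi[1] - xi[0]
    # Trapezoid rule, written out to avoid version-specific helpers.
    return 2.0 * float(np.sum((integrand[:-1] + integrand[1:]) * 0.5 * dx))
```

For k = 8, the turning point is √(2k + 1) ≈ 4.12; the tail beyond r = 5 is orders of magnitude smaller than the tail beyond r = 4, consistent with the erfc-type decay used above.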
We show that

P(S^c) = erfc(r_p) < 1/(NP);
Σ_{k=1}^P E[ψ_k²(Z) 1_{S^c}] ≤ P (1 + δ_p) erfc(√(1 − 2η_p(r_p)) r_p)/√(1 − 2η_p(r_p)) < 1/√P,

where Z is normally distributed with variance 1/2, and we recall that substituting Z′ = √2 Z rescales the physicists' polynomials to the probabilists' polynomials. By Lemma 6.2 these are satisfied whenever

erfc(r_p) < 1/(NP);
(1 + δ_p) erfc(√(1 − 2η_p(r_p)) r_p)/√(1 − 2η_p(r_p)) < P^{−3/2}.

Noting that δ_p → 0, and that erfc(r_p) = O(e^{−r_p²}/r_p) = O(exp(−(2 + ǫ_p)p)/√((2 + ǫ_p)p)), [45], it follows that the first inequality is satisfied for r_p = √((2 + ǫ_p)p + 1) if NP = o(√((2 + ǫ_p)p) exp((2 + ǫ_p)p)). Recall that we assume N = O(P^k), and it remains to show that P^{k+1} = o(√p exp((2 + ǫ_p)p)), which we address shortly. From the first point of Lemma 6.1, we see for large p that 1 − 2η_p(√((2 + ǫ_p)p + 1)) ≥ c₁ for a positive constant c₁. The second inequality is then satisfied for r_p if P^{3/2} = o(√p exp(c_ǫ p)) for an appropriate constant c_ǫ > 0 depending on ǫ_p.

It remains to ensure that both of these bounds allow ǫ_p to go to zero. If d is fixed this holds, as P = O(p^d) = o(exp(δp)) for any δ > 0. We next consider the case where d = c · p for some c > 0. Using Stirling's approximation we have that

P = (p + d)!/(p! d!) = ((c + 1)p)!/(p! (cp)!) ≈ √((c + 1)/(2πcp)) β^p,

where β = (c + 1)^{c+1}/c^c, and the approximation holds with arbitrarily high accuracy as p, d → ∞. Note that β goes to 1 as c → 0. It follows in the limit that

P^k ≈ (√((c + 1)/(2πcp)))^k exp(αkp),

where α = log(β) → 0 as c → 0. As c = d/p with d = o(p), it follows that P^k = o(√p exp(δp)) for any fixed k and δ > 0, establishing both inequalities needed for the conditions of (13) when d = o(p) and N = O(P^k).

Having shown that S is acceptable, we now bound µ(Ξ) with this choice of S. By Lemma 6.2 and the definition of η_k therein, together with the bounds in Table 1, we have that

µ(Ξ) ≤ exp(η_p(r_p) r_p²) = exp([1 − log(2)/(2 + ǫ_p) + o(1)] ǫ_p p/2) exp([1 − log(2)/(2 + ǫ_p) + o(1)] p).

Letting

C_p := exp([1 − log(2)/(2 + ǫ_p) + o(1)] ǫ_p p/2);  η̄_p := exp(2[1 − log(2)/(2 + ǫ_p) + o(1)]),

it follows that µ(Ξ) ≤ C_p η̄_p^{p/2}. As ǫ_p → 0 sufficiently quickly (so that ǫ_p p → 0), it follows that C_p → 1 and η̄_p → exp(2 − log(2)) ≈ 3.69.

We now turn to the proof of Theorem 4.3. Here, we consider a transformation such that φ_k(ξ) = ψ_k(ξ)/G(ξ), with G(ξ) > 0 chosen so that |φ_k(ξ)| ≤ C uniformly in k and ξ for some constant C. We may then use ψ_k(ξ) = φ_k(ξ)G(ξ) to identify ψ_k(ξ) and satisfy the conditions of (13). We note that this approach corresponds to a weight function w(ξ) = 1/G(ξ). In this framework, we sample {ψ_k(ξ)} from a distribution proportional to f(ξ)G²(ξ), where f(ξ) is the distribution with respect to which the ψ_k(ξ) are orthogonal, and use that the {φ_k(ξ)} form a bounded and approximately orthogonal system. By (26) of Lemma 6.2 and the bounds in Table 1, for ‖ξ‖₂ ≤ √((2 + ǫ_p)p + 1) and ‖k‖₁ ≤ p,

|ψ_k(ξ) exp(−‖ξ‖₂²/2)| ≤ C,  (27)

which suggests taking G(ξ) = exp(‖ξ‖₂²/2).

Remark.
Notice that the function on the left side of the inequality in (27) is referred to as the Hermite function, whose upper bound C is explicitly known, for instance, from [46].

From the argument in the proof of Theorem 4.1, for the case of one-dimensional polynomials,

|(1/√π) ∫_{|ξ| ≤ √((2+ǫ_p)p+1)} (ψ_i(ξ) exp(−ξ²/2)) (ψ_j(ξ) exp(−ξ²/2)) dξ − δ_{i,j}| ≤ ǫ_{i,j},

where ǫ_{i,j} is small enough to ensure that the conditions of (13) hold. Considering a corresponding change for multidimensional polynomials, let

φ_k(ξ) = π^{−d/4} exp(−‖ξ‖₂²/2) V^{1/2}(√((2 + ǫ_p)p + 1), d) ψ_k(ξ),

where V(r, d) = (r√π)^d/Γ(d/2 + 1) is the volume of the d-dimensional ball of radius r. If we instead consider a draw from the uniform distribution on the ball of radius √((2 + ǫ_p)p + 1), then for a d-dimensional ξ,

|∫_{‖ξ‖₂ ≤ √((2+ǫ_p)p+1)} φ_i(ξ) φ_j(ξ) / V(√((2 + ǫ_p)p + 1), d) dξ − δ_{i,j}| ≤ ǫ_{i,j},

and we have that |φ_k(ξ)| is bounded, of order π^{−d/4} V^{1/2}(√((2 + ǫ_p)p + 1), d), giving a bound on the coherence parameter of order π^{−d/2} V(r_p, d) with r_p ~ √(2p).

A key technical simplification is present when working with Legendre polynomials: we may fix S to be [−1, 1]^d, as a finite number of polynomials on a bounded domain are necessarily bounded. The technical results we require are presented in the following lemma.

Lemma 6.3.
For the one-dimensional Legendre polynomials,

sup_{ξ ∈ [−1,1]} |ψ_k(ξ)| = √(2k + 1).  (28)

Further, for k ≥ 1,

sup_{ξ ∈ [−1,1]} √(π/2) (1 − ξ²)^{1/4} |ψ_k(ξ)| ≤ √((2k + 1)/k) ≤ √3.  (29)

Proof.
These are classical results, with (28) following from Theorem 7.32.1 of [26]. We note that a direct application of these theorems does require normalizing the polynomials to be orthonormal. Similarly, (29) follows from Theorem 7.3.3 of [26] and is a direct restatement of Lemma 5.1 of [25].
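Both bounds in Lemma 6.3 are easy to confirm numerically. The sketch below (our own check; the helper name is an assumption) evaluates the orthonormal Legendre polynomials √(2k+1) P_k, orthonormal with respect to the uniform density on [−1, 1], on a fine grid:

```python
import math
import numpy as np
from numpy.polynomial.legendre import legval

def psi_legendre(k, x):
    """Legendre polynomials orthonormal w.r.t. the U[-1,1] density."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return math.sqrt(2.0 * k + 1.0) * legval(x, c)

xs = np.linspace(-1.0, 1.0, 4001)
for k in range(1, 9):
    vals = np.abs(psi_legendre(k, xs))
    # (28): the supremum sqrt(2k+1) is attained at the endpoints.
    assert np.isclose(vals.max(), math.sqrt(2.0 * k + 1.0))
    # (29): the Chebyshev-weighted envelope stays below sqrt(3).
    weighted = math.sqrt(math.pi / 2.0) * (1.0 - xs ** 2) ** 0.25 * vals
    assert weighted.max() <= math.sqrt(3.0)
```

The weighted quantity in the second check is exactly the one whose square, taken per dimension, yields the 3^d coherence bound under Chebyshev sampling below.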
To show Theorem 4.2, we note that when p ≤ d, (16) follows from

µ(Y) ≤ max_{‖k‖₁ ≤ p} ∏_{i=1}^d ‖ψ_{k_i}‖²_∞ ≤ 3^p ≤ exp(2p),

where we note that at most p of the d dimensions can be non-constant polynomials. Similarly, when p > d,

µ(Y) ≤ max_{‖k‖₁ ≤ p} ∏_{i=1}^d ‖ψ_{k_i}‖²_∞ ≤ (2p/d + 1)^d ≤ exp(2p),

where the last bound is loose for small d. To show (18), note that (29) implies that, when sampling from the Chebyshev distribution and independently of p,

µ(Y) ≤ max_{‖k‖₁ ≤ p} ∏_{i=1}^d ‖√w ψ_{k_i}‖²_∞ ≤ ∏_{i=1}^d (2k_i + 1)/k_i ≤ 3^d,

with the factor for any constant dimension (k_i = 0) interpreted as at most π/2.

The proof of Theorem 4.5 follows a similar logic to the other proofs, but is approachable in a more general measure-theoretic setting. By the definition of B(ξ) in (6), we have for all ξ ∈ S that

sup_{k=1:P} |ψ_k(ξ)|/B(ξ) = 1.

This shows that sampling Y according to B(ξ) gives a µ(Y) which is achieved uniformly over all values of ξ. Let

c = (∫_S f(ξ) B²(ξ) dξ)^{−1/2};

that is, c² normalizes f(ξ)B²(ξ) to a probability distribution on S. Then for i, j = 1 : P,

∫_S (ψ_i(ξ)/(cB(ξ))) (ψ_j(ξ)/(cB(ξ))) c² f(ξ) B²(ξ) dξ = ∫_S ψ_i(ξ) ψ_j(ξ) f(ξ) dξ ≈ δ_{i,j},

and we assume that S is chosen so that the approximation holds within the satisfaction of the requirements in (13). As

sup_{k=1:P} |ψ_k(ξ)|/(cB(ξ)) = c^{−1},  (30)

for all ξ ∈ S, it follows that the coherence parameter for the scheme associated with sampling from the distribution c² f(ξ) B²(ξ) on ξ ∈ S is c^{−2}.

We define the measure ν on Lebesgue measurable subsets of S, denoted by A, via

ν(A) := ∫_A f(ξ) dλ(ξ),

where λ(ξ) is the Lebesgue measure, and f is the distribution with respect to which the {ψ_k(ξ)} are orthogonal. Let B̂ be a function differing from B on a set of non-zero ν-measure, so that the sampling scheme corresponding to B̂ differs on a set of non-zero ν-measure.
By (13), no subset S_s of S with µ(S_s) < µ(S) satisfies the conditions of (13), implying that B̂ may not be infinite (corresponding to applying a weight of zero) on any set of positive measure and still satisfy these conditions. As (30) is achieved for all ξ ∈ S, it follows that for the sampling scheme associated with B̂ (with normalizing constant ĉ) there is a set A⋆ with ν(A⋆) > 0 such that

∫_{A⋆} sup_{k=1:P} (|ψ_k(ξ)|/(ĉ B̂(ξ)))² dλ(ξ) > λ(A⋆) c^{−2},

and it follows by the Mean Value Theorem for integrals that

sup_{ξ ∈ A⋆} sup_{k=1:P} (|ψ_k(ξ)|/(ĉ B̂(ξ)))² > c^{−2}.

This implies that

sup_{ξ ∈ S} sup_{k=1:P} (|ψ_k(ξ)|/(ĉ B̂(ξ)))² ≥ sup_{ξ ∈ A⋆} sup_{k=1:P} (|ψ_k(ξ)|/(ĉ B̂(ξ)))² > c^{−2}.

It follows, by the definition of µ(Y) given in (13) for the selected S, that the coherence parameter µ(Y) for the sampling scheme associated with B̂ is larger than c^{−2}.

7. Conclusions

We provided an analysis of Hermite and Legendre polynomials which allowed us to bound a coherence parameter and generate recovery guarantees for sparse polynomial chaos expansions obtained via ℓ₁-minimization. We also identified alternative random sampling schemes which provide sharper guarantees and demonstrate improved polynomial chaos reconstructions relative to random sampling from the orthogonality measure of these bases. These sampling methods were derived based on the properties of Hermite and Legendre polynomials. Furthermore, we presented a Markov Chain Monte Carlo method for generating samples that minimize the coherence parameter, thereby achieving an optimality in the number of random solution realizations.
Such a sampling was referred to as coherence-optimal sampling. The sampling methods were compared on randomly manufactured stochastic functions, and the different sampling strategies were tested in identifying the solution of a 20-dimensional elliptic boundary value problem, where positive results were attained for the coherence-optimal sampling method. Similarly positive results were observed when computing the solution of a non-linear ordinary differential equation, where a high-order Hermite polynomial chaos expansion was needed for an accurate solution approximation.

Acknowledgements
The authors would like to gratefully thank Prof. Akil Narayan (UMass Dartmouth) for his constructive feedback on this manuscript. This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, under Award Number DE-SC0006402, as well as the National Science Foundation under grants DMS-1228359 and CMMI-1201207.
Appendix A: Proof of Lemma 6.1
Proof.
We may rewrite (24) as

η_k(ξ) = 1/2 − σ_k/(2ξ) − log(C_k)/ξ² − k/(2ξ²) + (k/ξ²) log(σ_k + ξ) + (1/(2ξ²)) log((σ_k + ξ)/σ_k). (31)

A straightforward, but lengthy, algebraic substitution for ξ yields that for any ǫ > 0, as k → ∞,

η_k(√((2 + ǫ)k)) = 1/2 − (1/2)√(ǫ/(2 + ǫ)) + log(√ǫ + √(2 + ǫ))/(2 + ǫ) − log(2)/(2(2 + ǫ)) + o(1). (32)

To show the second point of the Lemma, note that σ_k = √(ξ² − 2k) implies that ∂σ_k/∂ξ = ξ/σ_k, and differentiating the expression (31) with respect to ξ gives

∂η_k(ξ)/∂ξ = σ_k/(2ξ²) + 2 log(C_k)/ξ³ + k/ξ³ + k/(ξ²σ_k) − 1/(2σ_k) − (2k log(σ_k + ξ) + log((σ_k + ξ)/σ_k))/ξ³ − (ξ − σ_k)/(2ξ²σ_k²). (33)

Using that σ_k² = ξ² − 2k, the above may be rewritten as

(2σ_k²ξ³) ∂η_k(ξ)/∂ξ = ξ²(2k + 4 log(C_k) − 1) + ξσ_k − (2σ_k²(2k log(σ_k + ξ) + log((σ_k + ξ)/σ_k)) + 4k(k + 2 log(C_k))).

It follows that ∂η_k(ξ)/∂ξ < 0 if and only if

2σ_k²(2k log(σ_k + ξ) + log((σ_k + ξ)/σ_k)) + 4k(k + 2 log(C_k)) > ξ²(2k + 4 log(C_k) − 1) + ξσ_k.

Dividing through by 2σ_k², and substituting σ_k² = ξ² − 2k for ξ and k, gives that this condition is equivalent to

(ξ²/(2k)) (2k log(σ_k + ξ) + log((σ_k + ξ)/σ_k) − k − 2 log(C_k) + 1/2 − σ_k/(2ξ)) > 2k log(σ_k + ξ) + log((σ_k + ξ)/σ_k) − k − 2 log(C_k).

From this form, and using that σ_k/ξ < 1, it follows that the derivative is negative for large enough ξ. More precisely, let

X_ξ := 2k log(σ_k + ξ) + log((σ_k + ξ)/σ_k) − k − 2 log(C_k); Y_ξ := 1/2 − σ_k/(2ξ); Z_ξ := ξ²/(2k),

where we wish to identify ξ such that Z_ξ(X_ξ + Y_ξ) > X_ξ, which is equivalent to Z_ξY_ξ > (1 − Z_ξ)X_ξ. Let ǫ_k ≥ 0, and note that if ξ ≥ √((2 + ǫ_k)k + 1) then σ_k/ξ < 1, so that Y_ξ > 0. Further, for ξ ≥ √((2 + ǫ_k)k + 1) it follows that Z_ξ > 1. We now identify an ǫ_k such that for ξ ≥ √((2 + ǫ_k)k + 1) we may verify that X_ξ > 0, from which it follows that Z_ξY_ξ > 0 > (1 − Z_ξ)X_ξ. Note that ǫ_k ≥ 0 implies that σ_k ≥ 1, so that for ξ ≥ √((2 + ǫ_k)k + 1),

X_ξ ≥ 2k log(√((2 + ǫ_k)k)) − k − 2 log(C_k).

Recalling that C_k = √(2^k k!), we conclude from properties of the Log Gamma function [47] that

log(k!) = (k + 1) log(k + 1) − (k + 1) − (1/2) log((k + 1)/(2π)) + O(k⁻¹);
2 log(C_k) = k log(2) + (k + 1) log(k + 1) − (k + 1) − (1/2) log((k + 1)/(2π)) + O(k⁻¹).

Applying this to the lower bound for X_ξ, it follows that for some C > 0,

X_ξ ≥ k log((2 + ǫ_k)k/(2(k + 1))) + (1 − log(2π)/2) − log(√(k + 1)) − C/k.

Noting that 1 > log(2π)/2, it follows that we may guarantee that X_ξ is positive for some sequence of ǫ_k > 0 with ǫ_k → 0. It follows that the derivative of η_k(ξ) with respect to ξ is negative for ξ ≥ √((2 + ǫ_k)k + 1), and we thus conclude the second point of this Lemma.

To show the third point, we utilize a differential-difference equation [26] for the orthonormal Hermite polynomials,

√(2(k + 1)) ψ_{k+1}(ξ) = 2ξψ_k(ξ) − ψ′_k(ξ). (34)

We note here that the analysis in [29] giving the approximation (24) for ψ_k extends to the derivative ψ′_k, so that for sufficiently large k the approximation in (24) is arbitrarily accurate for ψ_k and, when differentiated, for ψ′_k. Differentiating with the use of the chain rule gives that

∂/∂ξ exp(η_k(ξ)ξ²) = exp(η_k(ξ)ξ²) (ξ² ∂η_k(ξ)/∂ξ + 2ξη_k(ξ)). (35)

Plugging (31) and (33) into (35), and in turn into (34), we have that

ψ_{k+1}(ξ) ≈ (2ξ exp(η_k(ξ)ξ²) − ∂/∂ξ exp(η_k(ξ)ξ²)) / √(2(k + 1)) = (exp(η_k(ξ)ξ²)/√(2(k + 1))) (ξ + σ_k + (ξ − σ_k)/(2σ_k²)).

It follows by (24) that

ψ_{k+1}(ξ)/ψ_k(ξ) ≈ exp(η_{k+1}(ξ)ξ²)/exp(η_k(ξ)ξ²) ≈ (1/√(2(k + 1))) (ξ + σ_k + (ξ − σ_k)/(2σ_k²)). (36)

Letting k go to infinity, it follows that for some ǫ_k → 0 and |ξ| > √((2 + ǫ_{k+1})(k + 1) + 1), both the function and derivative approximations considered, of exp(η_k(ξ)ξ²) to ψ_k(ξ) and of ∂ exp(η_k(ξ)ξ²)/∂ξ to ψ′_k(ξ), are arbitrarily accurate [29]. If the right-hand side of (36) is larger than 1, and k is large enough so that the approximation is sufficiently accurate for |ξ| = √((2 + ǫ_{k+1})(k + 1) + 1), then we will show that (36) implies that η_{k+1}(ξ) > η_k(ξ).

We note that larger ǫ_k, ǫ_{k+1} complicate the proof, but small enough ǫ_k, ǫ_{k+1} do not affect the comparisons made, and thus for brevity of presentation we set ǫ_k = ǫ_{k+1} = 0 for the remainder of this proof. Note that the ratio ψ_{k+1}(ξ)/ψ_k(ξ) is monotonically increasing for |ξ| > √(2(k + 1) + 1), as ∂σ_k/∂ξ = ξ/σ_k > 1, and so it suffices to check when |ξ| = √(2(k + 1) + 1), and thus σ_k = √3. In this case the ratio in (36) satisfies

ψ_{k+1}(ξ)/ψ_k(ξ) ≈ √((2k + 3)/(2k + 2)) + √(3/(2k + 2)) + k/(3√(2k + 2)(√(2k + 3) + √3)) (37)
> 1, (38)

as the first term is always larger than 1 and all terms are positive, with the last term increasing in k. It follows that the lemma is established when both k and k + 1 are larger than some K which ensures both that the approximation in (24) is sufficiently accurate and that the gap in (37) is sufficiently large. For k′ < K, note that σ_{k′}(√(2k + 1)) = √(2(k − k′) + 1) satisfies

σ_{k′}(√(2k + 1))/√(2k + 1) = √(2(k − k′) + 1)/√(2k + 1) → 1, as k/k′ → ∞,

while σ_k(√(2k + 1)) = 1 remains fixed. From the expression for η_k(ξ) in (31), it follows for any k′, a large enough K, and k ≥ K, that η_{k′}(√(2k + 1)) < η_k(√(2k + 1)), showing the third point of this Lemma.

References

[1] R. Ghanem, P. Spanos, Stochastic Finite Elements: A Spectral Approach, Springer Verlag, 1991.

[2] O. Le Maitre, O. Knio, Spectral Methods for Uncertainty Quantification with Applications to Computational Fluid Dynamics, Springer, 2010.

[3] D. Xiu, Numerical Methods for Stochastic Computations: A Spectral Method Approach, Princeton University Press, 2010.

[4] D. Xiu, G. Karniadakis, The Wiener-Askey polynomial chaos for stochastic differential equations, SIAM Journal on Scientific Computing 24 (2) (2002) 619–644.

[5] C. Soize, R. Ghanem, Physical systems with random uncertainties: Chaos representations with arbitrary probability measure, SIAM Journal of Scientific Computing 26 (2) (2005) 395–410.

[6] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1998) 33–61.

[7] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM Rev. 43 (1) (2001) 129–159.

[8] D. Donoho, Compressed sensing, IEEE Transactions on Information Theory 52 (4) (2006) 1289–1306.

[9] A. Bruckstein, D. Donoho, M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Review 51 (1) (2009) 34–81.

[10] E. Candès, J. Romberg, T.
Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory 52 (2) (2006) 489–509.

[11] M. Elad, Sparse and redundant representations: from theory to applications in signal and image processing, Springer, 2010.

[12] Y. Eldar, G. Kutyniok, Compressed sensing: theory and applications, Cambridge University Press, 2012.

[13] A. Doostan, H. Owhadi, A. Lashgari, G. Iaccarino, Non-adapted sparse approximation of PDEs with stochastic inputs, Tech. Rep. Annual Research Brief, Center for Turbulence Research, Stanford University (2009).

[14] A. Doostan, H. Owhadi, A non-adapted sparse approximation of PDEs with stochastic inputs, Journal of Computational Physics 230 (2011) 3015–3034.

[15] G. Blatman, B. Sudret, Adaptive sparse polynomial chaos expansion based on least angle regression, Journal of Computational Physics 230 (2011) 2345–2367.

[16] L. Mathelin, K. Gallivan, A compressed sensing approach for partial differential equations with random input data, Commun. Comput. Phys. 12 (2012) 919–954.

[17] L. Yan, L. Guo, D. Xiu, Stochastic collocation algorithms using ℓ₁-minimization, International Journal for Uncertainty Quantification 2 (3).

[18] X. Yang, G. E. Karniadakis, Reweighted ℓ₁ minimization method for stochastic elliptic differential equations, Journal of Computational Physics 248 (2013) 87–108.

[19] G. Karagiannis, G. Lin, Selection of polynomial chaos bases via Bayesian model uncertainty methods with applications to sparse approximation of PDEs with stochastic inputs, Journal of Computational Physics 259 (2014) 114–134.

[20] J. Peng, J. Hampton, A. Doostan, A weighted ℓ₁-minimization approach for sparse polynomial chaos expansions, Journal of Computational Physics 267 (2014) 92–111.

[21] D. Schiavazzi, A. Doostan, G. Iaccarino, Sparse multiresolution regression for uncertainty propagation, International Journal for Uncertainty Quantification, doi:10.1615/Int.J.UncertaintyQuantification.2014010147.

[22] K. Sargsyan, C. Safta, H. Najm, B. Debusschere, D. Ricciuto, P. Thornton, Dimensionality reduction for complex models via Bayesian compressive sensing, International Journal for Uncertainty Quantification 4 (2013) 63–93.

[23] B. A. Jones, N. Parrish, A. Doostan, Post-maneuver collision probability estimation using sparse polynomial chaos expansions, Journal of Guidance, Control, and Dynamics, Under Review.
URL http://ccar.colorado.edu/bajones/files/jones_2014a.pdf