Time and Frequency Domain Methods for Basis Selection in Random Linear Dynamical Systems
John D. Jakeman∗, Roland Pulch†
September 5, 2018
Abstract
Polynomial chaos methods have been used extensively to analyze systems in uncertainty quantification. Furthermore, several approaches exist to determine a low-dimensional approximation (or sparse approximation) of some quantity of interest in a model, where just a few orthogonal basis polynomials are required. We consider linear dynamical systems consisting of ordinary differential equations with random variables. The aim of this paper is to explore further methods for producing low-dimensional approximations of the quantity of interest. We investigate two numerical techniques to compute a low-dimensional representation, both of which fit the approximation to a set of samples in the time domain. On the one hand, a frequency domain analysis of a stochastic Galerkin system yields the selection of the basis polynomials, followed by a linear least squares problem for the coefficients. On the other hand, a sparse minimization yields the choice of the basis polynomials using information from the time domain only. An orthogonal matching pursuit produces an approximate solution of the minimization problem. We compare the two approaches using a test example from a mechanical application.
Keywords. linear dynamical system, random variable, orthogonal basis, polynomial chaos, stochastic Galerkin method, least squares problem, orthogonal matching pursuit, uncertainty quantification
We consider linear dynamical systems in the form of ordinary differential equations (ODEs), which include physical parameters. A quantity of interest (QoI) is defined as an output of the system. Uncertainties may be present in the parameters. In uncertainty quantification, a common approach is to interpret the parameters as random variables, see [1, 2].

The state variables as well as the QoI can be expanded into a series with given orthogonal basis polynomials depending on the random variables and a priori unknown time-dependent coefficient functions. This spectral approach is called a (generalized) polynomial chaos expansion, see [3, 4, 1, 2]. Several types of numerical methods exist to compute an approximation of the coefficient functions. On the one hand, the stochastic Galerkin method projects the random linear dynamical system to a larger deterministic linear system of ODEs, whose solution yields the approximation, see [5, 6]. On the other hand, the approximation can be fitted to random samples in a least squares regression, see [7].

The number of basis polynomials up to a given total degree becomes huge in the case of a large number of random variables. Our task is to identify a low-dimensional approximation of the random QoI, which is sometimes called a sparse representation in the literature. Therein, just a small subset of basis polynomials is required for a sufficiently accurate approximation. Numerical methods for this problem were based on least angle regression [8], sparse grid quadrature [9], compressed sensing [10, 11], and reduced basis techniques [12, 13], for example.
Dimension-adaptive ANOVA decompositions [14, 15, 16] and low-rank tensor approximations [17, 18] can also be used to address the curse of dimensionality.

In this paper, the aim is to further explore numerical methods for low-dimensional representations. We investigate and compare two techniques to construct a low-dimensional approximation with q basis polynomials, where q is a given integer. Both approaches fit their approximations to a set of samples from the QoI in the time domain. First, a frequency domain analysis is performed for the transfer function of the stochastic Galerkin system, which was derived in the previous work [19]. The minimization of an error bound identifies a subset of q basis polynomials. We apply this subset and identify the accompanying time-dependent coefficient functions by a least squares problem, which was not considered in [19]. In addition, we examine an ℓ₀-minimization under a constraint purely in the time domain, which also yields a subset of q basis polynomials. Orthogonal matching pursuit (OMP) [20] generates a numerical approximation to the minimization problem.

We apply both techniques to a test example, which models a mass-spring-damper system with random parameters. A comparison of the errors for the low-dimensional approximations is presented. Moreover, we examine the conformance of both approaches, i.e., whether the techniques predominantly identify the same basis polynomials.

Disclosure: see acknowledgements. Corresponding author: Roland Pulch ([email protected]).
∗ Center for Computing Research, Sandia National Laboratories, P.O. Box 5800, MS 1318, Albuquerque, NM 87185-1320, United States.
† Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Straße 47, D-17489 Greifswald, Germany.

The problem of the basis selection is specified in this section. We denote the sets of real numbers and complex numbers by R and C, respectively.
In this section we investigate linear dynamical systems of the form

    E(p) ẋ(t, p) = A(p) x(t, p) + B(p) u(t)
    y(t, p) = C(p) x(t, p),                                                  (1)

where the matrices A, E ∈ R^{n×n}, B ∈ R^{n×n_in} and C ∈ R^{n_out×n} depend on physical parameters p ∈ Π ⊆ R^{n_par}. An input u : [0, ∞) → R^{n_in} is given and an output y : [0, ∞) × Π → R^{n_out} is provided. Without loss of generality, we restrict the investigations to the case of single-input-single-output (SISO), i.e., n_in = n_out = 1. We assume that the mass matrix E is always non-singular. Consequently, a system of ODEs (1) is given with the state variables x : [0, ∞) × Π → R^n. In our examinations, initial value problems (IVPs) x(0, p) = 0 are predetermined for all p ∈ Π. Furthermore, we assume that the systems (1) are asymptotically stable for all p ∈ Π, i.e., all eigenvalues Σ(p) ⊂ C of the matrix pencil λE(p) − A(p) have a negative real part.

The input-output mapping of the system (1) can be specified by a transfer function H : (C \ Σ(p)) → C in the frequency domain, see [21, p. 65]. The transfer function of the system (1) becomes

    H(s, p) := C(p) (sE(p) − A(p))^{-1} B(p)   for s ∈ C \ Σ(p),             (2)

which represents a rational function in the frequency variable s. The input-output mapping reads as Y(s, p) = H(s, p) U(s), where U, Y denote the Laplace transforms of the input and the output, respectively.
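The transfer function (2) is best evaluated with a single linear solve rather than an explicit matrix inverse. A minimal sketch in Python/NumPy; the scalar matrices E, A, B, C below are illustrative stand-ins, not the paper's test example:

```python
import numpy as np

def transfer_function(s, E, A, B, C):
    """Evaluate H(s) = C (sE - A)^{-1} B for a SISO system via one linear solve."""
    return (C @ np.linalg.solve(s * E - A, B)).item()

# Scalar example: E = 1, A = -2, B = C = 1 gives H(s) = 1/(s + 2).
E = np.array([[1.0]]); A = np.array([[-2.0]])
B = np.array([[1.0]]); C = np.array([[1.0]])
print(transfer_function(0.0, E, A, B, C))  # 1/(0 + 2) = 0.5
```

For the large Galerkin system later in the paper, the same solve would be realized with an LU decomposition of sÊ − Â, as the text describes.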
In the dynamical system (1), the parameters p ∈ Π are replaced by independent random variables p : Ω → Π on some probability space (Ω, A, μ) with event space Ω, sigma-algebra A and probability measure μ. We suppose the existence of a joint probability density function ρ : Π → R. Given a measurable function f : Π → R, the expected value reads as

    E[f] := ∫_Ω f(p(ω)) dμ(ω) = ∫_Π f(p) ρ(p) dp

provided that the integral is finite. The associated Hilbert space

    L²(Π, ρ) := { f : Π → R : f measurable and E[f²] < ∞ }

features the inner product

    ⟨f, g⟩ := E[f g] = ∫_Π f(p) g(p) ρ(p) dp   for f, g ∈ L²(Π, ρ).          (3)

The norm of the Hilbert space reads as

    ‖f‖_{L²(Π,ρ)} := √⟨f, f⟩.                                                (4)

In the system (1), we assume that x_1(t, ·), . . . , x_n(t, ·), y(t, ·) ∈ L²(Π, ρ) point-wise for t ∈ [0, ∞).

We consider an orthonormal system (Φ_i)_{i∈N} ⊂ L²(Π, ρ) consisting of polynomials. The theory of the generalized polynomial chaos (gPC) applies in this situation, see [22, 2]. Hence the basis functions satisfy the orthogonality relations

    ⟨Φ_i, Φ_j⟩ = 1 for i = j,   and   ⟨Φ_i, Φ_j⟩ = 0 for i ≠ j.              (5)

Such an orthonormal system exists under the above assumptions. However, the system is not always complete, cf. [4]. Since we investigate finite approximations, completeness is not required in the following. Finite approximations are obtained by

    x^{(m)}(t, p) = Σ_{i=1}^{m} v_i(t) Φ_i(p)   and   y^{(m)}(t, p) = Σ_{i=1}^{m} w_i(t) Φ_i(p)      (6)

including m basis polynomials. Due to the orthogonalities (5), the transient coefficient functions v_i : [0, ∞) → R^n and w_i : [0, ∞) → R given by

    v_{i,j}(t) = ⟨x_j(t, ·), Φ_i(·)⟩   and   w_i(t) = ⟨y(t, ·), Φ_i(·)⟩      (7)

represent the best approximation in the subspace spanned by {Φ_1, . . . , Φ_m} with respect to the norm (4).
If the orthonormal system is complete, then the convergence

    lim_{m→∞} ‖y(t, ·) − y^{(m)}(t, ·)‖_{L²(Π,ρ)} = 0   for each t

is guaranteed for the output (QoI) and, likewise, for the state variables.

We assume that the orthonormal system (Φ_i)_{i∈N} is ordered in ascending total degrees, i.e., degree(Φ_i) ≤ degree(Φ_j) for i ≤ j. Thus Φ_1 is the constant polynomial of degree zero. The number of polynomials up to total degree d reads as, see [2, p. 65],

    m(d) = (n_par + d)! / (n_par! d!)                                        (8)

for a fixed number n_par of random parameters. This number becomes large for high dimensions n_par of the random space, even though the total degree d may be moderate.

We investigate the input-output behavior of a stochastic Galerkin system.
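The count (8) is the binomial coefficient C(n_par + d, d), so its growth is easy to tabulate; a quick sketch:

```python
from math import comb

def num_basis_polynomials(n_par: int, d: int) -> int:
    """Number of total-degree basis polynomials, m(d) = (n_par + d)! / (n_par! d!)."""
    return comb(n_par + d, d)

# Growth for 14 random parameters, as in the later test example:
for d in range(1, 4):
    print(d, num_basis_polynomials(14, d))  # 1 15, 2 120, 3 680
```

The value 680 for d = 3 matches the basis size used in the numerical experiments below.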
An approximation of the coefficient functions (7) belonging to the solution of the random dynamical system (1) can be obtained by a stochastic Galerkin approach. This technique yields a larger coupled system

    Ê v̂'(t) = Â v̂(t) + B̂ u(t)
    ŵ(t) = Ĉ v̂(t)                                                           (9)

with matrices Â, Ê ∈ R^{mn×mn}, B̂ ∈ R^{mn} and Ĉ ∈ R^{m×mn}. Thus the system is single-input-multiple-output (SIMO). The matrix Ê is non-singular in most cases and thus (9) is a system of deterministic ODEs. Although the stochastic Galerkin system may be unstable, see [23], this loss of stability happens rather seldom. We assume that the system (9) is asymptotically stable. IVPs v̂(0) = 0 are imposed. More details on the stochastic Galerkin approach for random linear dynamical systems can be found in [5, 6, 19].
The outputs ŵ = (ŵ_1, . . . , ŵ_m)^⊤ of the system (9) represent an approximation of the exact coefficients (7). The associated approximation of the QoI in the random dynamical system (1) reads as

    ŷ^{(m)}(t, p) = Σ_{i=1}^{m} ŵ_i(t) Φ_i(p).                               (10)

The Galerkin-projected system (9) exhibits its own input-output behavior described by a complex-valued transfer function Ĥ : C \ Σ̂ → C^m, Ĥ = (Ĥ_1, . . . , Ĥ_m)^⊤ with a finite set of poles Σ̂, see [24, 25]. It holds that

    Ĥ(s) = Ĉ (sÊ − Â)^{-1} B̂   for s ∈ C \ Σ̂.                               (11)

Again the transfer function represents a rational function. The linear dynamical system (9) is asymptotically stable if and only if the set of poles Σ̂ is located in the left half of the complex plane. Moreover, the transfer function is always strictly proper for systems of ODEs, which is defined by the condition

    lim_{s→∞} Ĥ(s) = 0                                                       (12)

in each component. The computational effort for an evaluation of (11) is dominated by a single LU-decomposition of the matrix sÊ − Â.

In the frequency domain, Hardy norms characterize the magnitude of a transfer function, see [21]. These norms allow for error estimates. We will use the H₂-norm component-wise, i.e.,

    ‖Ĥ_i‖_{H₂} = √( (1/2π) ∫_{−∞}^{+∞} |Ĥ_i(iω)|² dω ) = √( (1/π) ∫_{0}^{+∞} |Ĥ_i(iω)|² dω )      (13)

for i = 1, . . . , m with the imaginary unit i = √−1. Alternatively, the H∞-norm can be considered component-wise, i.e.,

    ‖Ĥ_i‖_{H∞} = sup_{ω∈R} |Ĥ_i(iω)| = sup_{ω≥0} |Ĥ_i(iω)|                  (14)

for i = 1, . . . , m. In [19], both the H₂- and the H∞-norm were investigated in this context. We apply only the H₂-norm in the following, because the results are qualitatively the same in both cases.

There are several possibilities to compute the H₂-norm of a linear dynamical system, cf. [21]. We use our own technique based on quadrature, where the computational effort is nearly independent of the number m of basis polynomials. An approximation of the H₂-norms (13) is obtained by a quadrature with a (single) rectangular rule in [0, ω_min] and a composite trapezoidal rule in [ω_min, ω_max], given 0 < ω_min < ω_max. We apply a grid of the form ω_min = ω_1 < ω_2 < · · · < ω_{ν−1} < ω_ν = ω_max. The approximation of (13) reads as

    ∫_0^∞ |Ĥ_i(iω)|² dω ≈ ω_1 |Ĥ_i(iω_1)|² + Σ_{j=2}^{ν} ((ω_j − ω_{j−1})/2) ( |Ĥ_i(iω_{j−1})|² + |Ĥ_i(iω_j)|² )

for i = 1, . . . , m. Since the transfer function (11) is continuous at ω = 0, the quadrature error within [0, ω_min] is negligible for ω_min sufficiently close to zero. The property (12) guarantees that the truncation error, which arises by discarding the interval (ω_max, ∞), is negligible for sufficiently large ω_max. In each node of the quadrature, the evaluation of (11) requires mainly an LU-decomposition of the matrix iω_j Ê − Â independent of the output matrix Ĉ, provided that m is not extremely large.

If the mass matrix E(p) is singular, then the linear dynamical system (1) consists of differential-algebraic equations (DAEs).
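The rectangle-plus-trapezoid quadrature described above can be sketched in a few lines. For illustration we use a scalar transfer function H(s) = 1/(s + 1), an assumption chosen because its H₂-type integral ∫₀^∞ |H(iω)|² dω = π/2 is known in closed form; this is not the Galerkin system of the paper:

```python
import numpy as np

def h2_integral(H_abs2, omega):
    """Approximate the integral in (13): a rectangular rule on [0, omega[0]]
    plus the composite trapezoidal rule on the grid omega[0] < ... < omega[-1].
    H_abs2 holds the samples |H(i omega_j)|^2 on that grid."""
    rectangle = omega[0] * H_abs2[0]
    trapezoid = np.sum(0.5 * (H_abs2[1:] + H_abs2[:-1]) * np.diff(omega))
    return rectangle + trapezoid

# H(s) = 1/(s + 1): |H(i omega)|^2 = 1/(1 + omega^2), exact integral pi/2.
omega = np.logspace(-6, 4, 2000)     # logarithmically spaced grid, as in Section 4
approx = h2_integral(1.0 / (1.0 + omega**2), omega)
print(abs(approx - np.pi / 2) < 1e-3)  # True
```

The strict properness (12) is what makes the truncation at ω_max harmless here: the integrand decays like 1/ω².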
In most cases, the mass matrix Ê of the stochastic Galerkin system (9) also becomes singular. We assume an asymptotically stable system (9) again, where the set of poles Σ̂ is situated in the left half of the complex plane. It follows that the associated matrix pencil is regular. We outline the potential to obtain the frequency domain information from Section 2.3.1, which is required for the basis selection in Section 2.5.

A linear system of DAEs is characterized by its (nilpotency) index κ ∈ N (κ ≥ 1). Let κ be the index of the stochastic Galerkin system (9). Consequently, the transfer function of (9) reads as

    Ĥ(s) = Ĥ_SP(s) + P̂(s)                                                   (15)

with a strictly proper vector-valued rational function Ĥ_SP satisfying (12) and a vector-valued polynomial P̂ of degree at most κ − 1. The formula (15) is valid for DAEs in general, see [27]. If the polynomial part P̂ vanishes, then the H₂-norms (13) exist for all components. Hence the strategy performs as in the case of ODEs. If the polynomial part does not vanish, then some or all H₂-norms do not exist. If the index of the stochastic Galerkin system (9) is one, then the polynomial part P̂ becomes constant (degree zero). In this case, the existence of the H∞-norms (14) is guaranteed for all components. The frequency domain analysis can then be done using the norms (14).

Let an initial approximation (6) be given with a large number (8) of basis polynomials up to total degree d. Let I_d := {1, . . . , m(d)} be the respective index set. In general, nearly all associated coefficient functions are non-zero. However, the coefficients typically exhibit different orders of magnitude and a fast decay for increasing degree. Our aim is to identify a low-dimensional representation

    ỹ^{(I)}(t, p) = Σ_{i∈I} w̃_i(t) Φ_i(p)                                    (16)

with an index set I ⊂ I_d satisfying q := |I| ≪ m(d). Thus the selected basis functions from (16) represent a subset of the basis functions in (6). Yet the difference y − ỹ^{(I)} with y from (1), or (at least) ŷ^{(m(d))} − ỹ^{(I)} with ŷ from (10), should be sufficiently small in the L²(Π, ρ)-norm point-wise in time. The approximation (16) is also called a q-sparse representation, cf. [8] for a motivation. Hence the problem consists in the identification of both the index set I and associated approximations w̃_i of the coefficient functions for a desired number q. If a transient solution of the stochastic Galerkin system (9) is available, then we can simply set w̃_i := ŵ_i for i ∈ I.

In [19], the error ŷ^{(m(d))} − ỹ^{(I)} between the approximation (10) from the stochastic Galerkin method and the low-dimensional approximation (16) was analyzed using the frequency domain.
The H₂-norms (13) of the transfer function belonging to the Galerkin-projected system (9) are considered. Let I ⊂ I_d be any index set. Theorem 1 in [19] demonstrates the error estimate

    sup_{t≥0} ‖ŷ^{(m(d))}(t, ·) − ỹ^{(I)}(t, ·)‖_{L²(Π,ρ)} ≤ √( Σ_{i∈I_d\I} ‖Ĥ_i‖²_{H₂} ) ‖u‖_{L²[0,∞)}      (17)

with w̃_i = ŵ_i for all i ∈ I. The usual L²[0, ∞)-norm is used for the time-dependent input. An advantage is that the error is bounded uniformly for all times. However, this error bound is not sharp.

For a desired cardinality q = |I|, we determine a minimum upper bound in the right-hand side of (17). An optimal index set satisfies, see [19, p. 7],

    I_q = argmin_{I⊆I_d} { Σ_{i∈I_d\I} ‖Ĥ_i‖²_{H₂} : |I| = q }.              (18)

In practice, the norms will be pairwise different. Hence the unique index set I_q includes the components with the q largest H₂-norms. Furthermore, the minimization of the upper bound is independent of the choice of the input u. Approximations of the H₂-norms, which occur in (18), can be computed numerically as described in Section 2.3.2.

This frequency domain analysis can be extended straightforwardly to the case of multiple outputs (n_out > 1) in (1). Either a large output matrix Ĉ ∈ R^{mn_out×mn} in (9) or n_out separate systems (9) with output matrices Ĉ_j ∈ R^{m×mn} for j = 1, . . . , n_out can be arranged. We discuss the latter approach. In the transfer function (11), the part

    F̂(s) := (sÊ − Â)^{-1} B̂

is identical for any output. An evaluation of F̂(iω) can be reused for all systems in a quadrature to approximate the integrals (13). Thus the computational effort for a frequency domain analysis is nearly independent of the number of outputs (provided that n_out ≪ mn). A separate low-dimensional basis can be computed for each output.

In this section, we discuss numerical techniques for a computation of a low-dimensional representation and its coefficient functions in the time domain. Specifically, we consider two approaches: (i) sparsity inducing regression and (ii) least squares regression using the sparse index set consisting of the q largest Hardy norms identified by the frequency domain analysis.

Our numerical methods apply the information from k samples of the output (QoI) of the linear dynamical system (1) for realizations p_1, . . . , p_k of the random variables. We collect the samples in a vector-valued time-dependent function

    ȳ(t) := (y(t, p_1), . . . , y(t, p_k))^⊤.                                (19)

Let a basis {Φ_1, . . . , Φ_{m(d)}} be given with all orthonormal polynomials up to a total degree d. We arrange the Vandermonde matrix V ∈ R^{k×m},

    V = (v_ij),   v_ij := Φ_j(p_i)                                           (20)

for i = 1, . . . , k and j = 1, . . . , m. This matrix is dense. Now let v_1, . . . , v_m ∈ R^k be the columns of the matrix (20) and let I_q ⊆ I_d with I_q = {j_1, . . . , j_q} be an arbitrary subset of indices. We use the notation

    V_{I_q} := (v_{j_1}, . . . , v_{j_q}) ∈ R^{k×q}                          (21)

for the matrix which consists of the associated subset of columns from the Vandermonde matrix (20).
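Since the optimal index set (18) simply collects the q components with the largest H₂-norms, computing it is a sort, and extracting V_{I_q} as in (21) is column slicing. A sketch with hypothetical norm values (the numbers are made up for illustration):

```python
import numpy as np

def select_basis_by_h2(h2_norms, q):
    """Optimal index set (18): keep the q components with the largest H2-norms."""
    return np.argsort(h2_norms)[::-1][:q]

def column_subset(V, index_set):
    """V_{I_q}: the columns of the Vandermonde matrix picked by the index set (21)."""
    return V[:, index_set]

h2 = np.array([3.0, 0.1, 2.5, 0.01, 0.7])   # hypothetical H2-norms of 5 components
I_q = select_basis_by_h2(h2, 3)
print(sorted(I_q.tolist()))                 # the 3 largest norms: [0, 2, 4]
print(column_subset(np.eye(5), I_q).shape)  # (5, 3)
```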
The cardinality m(d) of the total-degree basis (8) grows quickly with the number of random parameters n_par and the degree d. Thus, in the following, we assume k < m(d), because for computationally expensive linear dynamical systems we cannot afford to run the model exhaustively. We consider the determination of a sparse representation in the time domain now.
Let a fixed time point t > 0 be given. OMP determines a q-sparse representation at this time point by greedily minimizing the ℓ₀-"norm"

    ‖w‖₀ := |{ i : w_i ≠ 0 }|   for w ∈ R^m,

which counts the number of non-zero entries in a vector. Specifically, the OMP approach finds an approximation to the problem

    w̃(t) = argmin ‖w̃(t)‖₀   such that   ‖V w̃(t) − ȳ(t)‖₂ ≤ ε               (22)

with the Euclidean norm ‖·‖₂ and a tolerance ε ≥ 0 for the coefficient vector w̃(t). At each iteration step, a least squares problem is solved using a subset of active columns of V. At the q-th iteration step, OMP updates an active index set I_q, in a greedy fashion, such that the inactive column index j ∉ I_{q−1} with the highest correlation (inner product) with the current residual is added to the active set. This update reads as

    I_q = I_{q−1} ∪ {j}   with   j = argmax_{i∈I\I_{q−1}} |r_{q−1}(t)^⊤ v_i| / ‖v_i‖₂.      (23)

OMP then solves the least squares problem

    argmin_{w̃^{(q)}(t)∈R^q} ‖r_q(t)‖₂,                                      (24)

where r_q(t) = V_{I_q} w̃^{(q)}(t) − ȳ(t). Setting ε = 0 in (22), the algorithm yields a finite sequence of q-sparse approximations (16) for q = 1, 2, . . . , q_max, where q_max ≤ min(k, m), thus removing the need to estimate ε. In the following, we will assume that q_max ≤ k < m, which always implies over-determined linear systems. For a given number q, the solution obtained using OMP is a greedy attempt to find a solution with q non-zero coefficients such that the residual

    R(t) := ‖V w̃(t) − ȳ(t)‖₂                                                (25)

is minimal. We will employ this approach in the time domain. Unlike the least squares approach in Section 3.3, OMP must be applied at each time point separately. Here we use an OMP implementation based upon rank-one updates of a QR-factorization of the matrix V_{I_q} to form V_{I_{q+1}}, see [28, p. 334]. The total complexity of the q-th iteration is O(km + kq).
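A compact sketch of (22)–(24) at a single time point, using plain least squares instead of the rank-one QR updates mentioned above (the QR machinery changes only the cost, not the result). The matrix and sparse vector are synthetic; orthonormal columns are chosen so that exact recovery is guaranteed:

```python
import numpy as np

def omp(V, y, q_max):
    """Greedy OMP: add the column most correlated with the residual (23),
    then re-solve the least squares problem (24) on the active set."""
    k, m = V.shape
    active, residual = [], y.copy()
    coeffs = np.zeros(m)
    for _ in range(q_max):
        inactive = [i for i in range(m) if i not in active]
        corr = [abs(residual @ V[:, i]) / np.linalg.norm(V[:, i]) for i in inactive]
        active.append(inactive[int(np.argmax(corr))])
        sol, *_ = np.linalg.lstsq(V[:, active], y, rcond=None)
        residual = y - V[:, active] @ sol
    coeffs[active] = sol
    return coeffs

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((30, 10)))  # orthonormal columns
w_true = np.zeros(10)
w_true[[2, 7]] = [1.5, -2.0]                        # 2-sparse target
w = omp(V, V @ w_true, q_max=2)
print(np.allclose(w, w_true))  # True
```

With orthonormal columns the correlations in (23) isolate the support exactly, so two greedy steps recover the 2-sparse vector; with a general Vandermonde matrix recovery is only probabilistic, as discussed below.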
As shown in [29], the total complexity of our OMP for a single time point is

    q_max k m + m Σ_{q=1}^{q_max} q = m q_max ( k + (q_max + 1)/2 ).

OMP does not require the construction of the full Vandermonde matrix. If memory is limited, one can simply compute each basis vector v_i for i ∈ I \ I_{q−1} in each iteration q = 1, . . . , q_max. The construction of the basis vector v_i is necessary to compute the inner products r_{q−1}(t)^⊤ v_i and the norms ‖v_i‖₂ in (23). Although the full Vandermonde matrix does not have to be stored, the QR-factorization of the matrix V_{I_q} must be stored. Only if q = m will V_{I_q} be the full Vandermonde matrix. However, we suppose q ≪ m. Note that avoiding the storage of the full Vandermonde matrix requires repeated computation of the basis vectors v_i, which affects the complexity analysis. For the problem sizes considered in this paper memory was not an issue, so the full Vandermonde matrix was stored to avoid the redundant computation of the basis vectors.

OMP is directly related to ℓ₁-minimization [30], which finds the dominant gPC coefficients by solving

    w̃(t) = argmin ‖w̃(t)‖₁   such that   ‖V w̃(t) − ȳ(t)‖₂ ≤ ε               (26)

with the norm ‖w‖₁ = |w_1| + · · · + |w_m| for w ∈ R^m. This ℓ₁-minimization problem is often referred to as Basis Pursuit Denoising. The problem obtained by setting ε = 0, to enforce interpolation, is termed Basis Pursuit (BP).

Let {Φ_1, . . . , Φ_m} be an orthonormal polynomial basis with respect to the probability measure of the random variables p, i.e., the relations (5) are satisfied. Furthermore, in (26), let V ∈ R^{k×m} be the matrix obtained by randomly sampling the basis under the probability measure. If the number of samples k satisfies

    k ≥ C L_m² q log(q) log(m)   with   L_m = max_{i=1,...,m} ‖Φ_i‖_∞,

then every q-sparse vector can be recovered by (26) (for ε = 0) with probability at least 1 − m^{−γ log q} for constants C, γ > 0, see [31]. The maximum norm ‖·‖_∞ is taken over the support of the probability measure.

OMP requires stronger theoretical conditions than BP, see [32]. For example, for a sufficiently small ratio of the number of samples to sparsity, BP can guarantee recovery of all q-sparse polynomials with high probability, whereas, for a fixed set of samples, OMP can guarantee recovery of at least one sparse polynomial but not all, see [33]. However, despite this theoretical difference, in practice OMP can still obtain comparable accuracy to algorithms such as BP [33]. Moreover, OMP has a much faster execution speed than most algorithms, which in conjunction with its iterative nature makes OMP more amenable to assessing the effect of sparsity on the accuracy of our approximations for dynamical systems. We remark that although we focus on linear problems, sparse approximation strategies can also be applied to nonlinear dynamical systems, for example [7].

Orthogonal matching pursuit greedily builds up a sparse representation of the output (QoI) of a linear dynamical system (1) and uses least squares to solve an over-determined linear system. Alternatively, we determine a q-sparse representation applying a basis specified by the frequency domain analysis in Section 2.5, i.e., for the index set (18). Let V_{I_q} be the matrix (21) for the index set I_q. Likewise, let w̃^{(q)}(t) ∈ R^q be approximations of the coefficient functions (7) associated to the basis functions determined by the index set I_q. Assuming q < k, we obtain a least squares problem

    min_{w̃^{(q)}(t)∈R^q} ‖V_{I_q} w̃^{(q)}(t) − ȳ(t)‖₂                       (27)

with the Euclidean norm ‖·‖₂ point-wise in time. Since the matrix V_{I_q} exhibits full rank, a unique solution w̆^{(q)} exists. The solution w̆^{(q)} directly yields a low-dimensional approximation (16). Moreover, the matrix V_{I_q} is time-invariant and thus a QR-factorization, see [28, p. 246], of this matrix can be reused to solve the least squares problems at all time points. This property makes the technique advantageous also in the case of q ≈ k (q < k), if k is not too large. Using Householder transformations, the QR-factorization requires about 2kq² − (2/3)q³ operations. The numerical solution of the problem (27) with a given QR-decomposition requires just about 2kq + q² operations. Proceeding for q = 1, . . . , q_max, rank-one updates can be used in the QR-decompositions, see [28, p. 334], to keep the computational complexity small.

The goal of this paper is to explore methods for producing sparse representations of the output of linear dynamical systems. Specifically, we consider time-frequency analysis and compressive sensing to produce sparse representations. Other approaches exist for reducing the complexity of a polynomial approximation of a function. For example, low-rank approaches [34, 17] can be used to find low-rank approximations of the coefficients of a tensor-product polynomial basis. For tensor-train representations [35], the number of unknowns grows only linearly with dimension and quadratically with rank. The coefficients of a tensor-product basis are often dense. However, there have been some attempts to combine low-rank approximation with sparsity inducing methods [36, 18]. The cardinality (8) of the total-degree polynomial basis grows rapidly with the number of variables and, consequently, can become intractable in high dimensions. Often only a subset of variables and interactions between variables contribute significantly to variations in a function. The (adaptive) ANOVA decomposition can be used to identify and exploit such effective dimensionality [14, 15, 16]. A method that exploits both sparsity and adaptive ANOVA was developed in [37]. The ability to recover sparse, and even low-rank, approximations is dependent on the method used to select samples at which to evaluate the function being approximated.
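The QR-reuse argument above can be sketched directly: factorize the time-invariant V_{I_q} once, then each time point costs only a small matrix-vector product and a triangular solve. The sizes and data below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
k, q, r = 200, 5, 3                      # samples, basis size, number of time points
V_Iq = rng.standard_normal((k, q))       # placeholder for the matrix (21)
Q, R = np.linalg.qr(V_Iq)                # one-time (reduced) QR factorization

y_bar = rng.standard_normal((k, r))      # placeholder samples, one column per time point
W = np.linalg.solve(R, Q.T @ y_bar)      # least squares coefficients for all time points

# Agrees with solving each least squares problem (27) from scratch:
W_ref = np.linalg.lstsq(V_Iq, y_bar, rcond=None)[0]
print(np.allclose(W, W_ref))  # True
```

For a full-rank V_{I_q} the least squares solution is R⁻¹Qᵀȳ, so the factorization amortizes over all r time points, which is exactly the advantage claimed in the text.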
For sparse approximation using a Legendre basis, randomly sampling from the uniform probability measure, as we do in Section 4, is adequate. However, for other orthogonal polynomials, sampling from the probability measure can dramatically decrease the ability to recover sparse approximations. Changes of measure have been used successfully to improve sparse recovery [38, 39, 40]. Algorithms for generating samples with a well-conditioned Vandermonde matrix via subsampling of tensor-product quadrature have also been used for sparse approximation [41] as well as regression, interpolation and low-rank approximation [42, 43, 44, 45].

We assume that the solution of the stochastic Galerkin system (9) yields a sufficiently accurate approximation (10) of the QoI by the selection of a sufficiently large total degree d. Thus the truncation error as well as the error of the Galerkin approach are negligible in comparison to a sparsification error. A low-dimensional approximation (16) is defined by the index set I and its coefficient functions w̃_i for i ∈ I. We embed these coefficients in a vector w̃ ∈ R^m by specifying w̃_i = 0 for i ∉ I. In the time domain, the L²(Π, ρ)-norm (4) of the error becomes

    ‖ŷ^{(m(d))}(t, ·) − ỹ^{(I)}(t, ·)‖_{L²(Π,ρ)} = ‖ŵ(t) − w̃(t)‖₂   for t ≥ 0

with the Euclidean norm ‖·‖₂ due to Parseval's theorem. We examine the relative error in L²(Π, ρ), i.e.,

    E_{L²}(t) := ‖ŵ(t) − w̃(t)‖₂ / ‖ŵ(t)‖₂                                   (28)

point-wise for t ≥ 0. In addition, we consider the residual (25), which is based on the k samples (19) of the QoI. Both the minimization problem from Section 3.2 and the least squares problem from Section 3.3 feature the property that the residual (25) decreases monotonically for increasing dimension q of the subspace at fixed time.
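The error measure (28) and the sampling estimate (29) reduce to a few vector norms; a sketch with hypothetical coefficient vectors (the numbers are illustrative, not from the test example):

```python
import numpy as np

def relative_l2_error(w_hat, w_tilde):
    """Relative error (28): by Parseval, the L2(Pi, rho)-error equals the
    Euclidean distance of the gPC coefficient vectors."""
    return np.linalg.norm(w_hat - w_tilde) / np.linalg.norm(w_hat)

def residual_estimate(V, w_tilde, y_bar):
    """Sampling estimate (29): the residual (25) scaled by 1/sqrt(k)."""
    return np.linalg.norm(V @ w_tilde - y_bar) / np.sqrt(len(y_bar))

w_hat = np.array([2.0, 0.5, 0.1, 0.05])    # hypothetical reference coefficients
w_tilde = np.array([2.0, 0.5, 0.0, 0.0])   # 2-sparse truncation
print(relative_l2_error(w_hat, w_tilde))
```

The decay of the reference coefficients mirrors the observation above that low-degree coefficients dominate, so a small index set already yields a small relative error.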
The residual (25) also yields a rough estimate of the L²(Π, ρ)-norm of the difference between the exact QoI and the low-dimensional representation, i.e.,

    ‖y(t, ·) − ỹ^{(I)}(t, ·)‖_{L²(Π,ρ)} ≈ (1/√k) ‖V w̃(t) − ȳ(t)‖₂           (29)

for each t. The estimate (29) converges for k, m → ∞ in probability (provided that the orthonormal basis is complete). The error measures (28) and (25) are given point-wise in time. Global measures can be obtained by taking (integral) mean values on a finite time interval. In practice, the problems are solved for a finite set of time points and (arithmetic) means can be applied.

We use a mass-spring-damper system from [46] depicted in Figure 1. This configuration consists of 4 masses, 6 springs and 4 dampers, i.e., n_par = 14 parameters. The input is the excitation at the bottom mass, whereas the position of the top mass represents the output. The mathematical modeling yields a system (1) of ODEs with dimension n = 8. Transient simulations are performed in a time interval [0, T]. We supply a harmonic oscillation as input signal

    u(t) = sin(ω₀ t)  for 0 ≤ t ≤ T,   u(t) = 0  otherwise,                  (30)

with frequency ω₀. The compact support [0, T] guarantees u ∈ L²[0, ∞). We choose T = 500. The time integrations are performed with small relative and absolute tolerances ε_rel and ε_abs. These high accuracy requirements imply that the errors of the time integrations are negligible in comparison to the other error sources. In the stochastic modeling, we choose the means of the parameters as the values given in [46]. Independent uniformly distributed random variables are introduced with 5% variation around the mean values.

Figure 1: Mass-spring-damper configuration.
Figure 2: Expected value (left) and standard deviation (right) of the random output for the periodic input signal in the mass-spring-damper system.

The gPC expansions (6) thus include the multivariate Legendre polynomials. We incorporate all polynomials up to total degree d = 3, which results in m = 680 basis functions: 1 of degree zero, 14 of degree one, 105 of degree two and 560 of degree three. Now we arrange the stochastic Galerkin system (9) of dimension mn = 5440. The spectral abscissa (largest real part of the eigenvalues of the matrix pencil λÊ − Â) is negative, so the system is asymptotically stable. We compute the H₂-norms (13) as described in Section 2.3.2. We choose an interval [ω_min, ω_max] with a logarithmically spaced grid of ν = 200 points. Broader intervals [ω_min, ω_max] or higher numbers of grid points do not considerably change the results. The computed norms are shown in Figure 3. We observe that the norms exhibit different orders of magnitude and a rapid decay. The H₂-norms allow for the computation of the error bounds in the right-hand side of (17) using the optimal index sets (18) for increasing dimensions. Figure 4 depicts the error estimates for a normalized input, i.e., ‖u‖_{L²[0,∞)} = 1. Even though the error bounds are not sharp, they decrease exponentially.

Furthermore, we generate k = 500 samples of the QoI by solving IVPs of the linear dynamical system (1) with input (30) in the interval [0, T]. We determine low-dimensional representations (16) in equidistant time points t_j = jT/r for j = 1, . . . , r using r = 1000. The presented results are arithmetic mean values of the quantities in these time points, which can be seen as approximations of an integral mean.

Figure 3: H₂-norms of the output components in the stochastic Galerkin system: for each component (left), where dashed lines separate the coefficients for polynomial degree zero/one, two and three, and in descending order (right).
Figure 4: Optimal error bounds (17) from the H₂-norms for an input with unit norm, for different dimensions of the subspaces (numbers of basis polynomials).

Figure 5: Condition numbers of the matrices in the linear least squares problems (27) for different dimensions.
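The frequency-domain norms can be approximated numerically in the same spirit; below is a sketch of an H₂-norm computation by quadrature on a logarithmically spaced grid. The trapezoidal rule and the scalar test system are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def h2_norm_on_grid(transfer, w_min=1e-4, w_max=1e4, nu=200):
    """Approximate the H2-norm of a scalar transfer function by trapezoidal
    quadrature on a logarithmically spaced frequency grid (a sketch; the
    quadrature rule here is an assumption)."""
    w = np.logspace(np.log10(w_min), np.log10(w_max), nu)
    integrand = np.abs(transfer(1j * w)) ** 2
    # ||H||_{H2}^2 = (1/pi) * integral over [0, inf) of |H(i w)|^2 dw
    integral = np.sum((integrand[1:] + integrand[:-1]) * np.diff(w)) / 2.0
    return np.sqrt(integral / np.pi)

# First-order test system H(s) = 1/(s + 1); its exact H2-norm is 1/sqrt(2)
print(h2_norm_on_grid(lambda s: 1.0 / (s + 1.0)))
```

Widening the interval [ω_min, ω_max] or refining the grid changes the value only marginally here, mirroring the observation in the text.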
These means can be seen as approximations of an integral mean.

Figure 6: Relative L²-errors (28) (left) and residuals (25) (right) of the low-dimensional representations obtained by the least squares problem and the OMP approach.

We compute q-sparse approximations (16) for q = 1, 2, . . . . The selected index set is time-invariant in this approach. Figure 5 shows the condition numbers (with respect to the spectral norm) of the matrices in the least squares problems (27). We observe that all problems are well-conditioned. On the other hand, the minimization with OMP from Section 3.2 is applied, where the basis selection takes place separately for each time point. Concerning the L²-error, the transient solution of the Galerkin-projected system (9) serves as the reference solution at the r time points. Figure 6 shows both the L²-errors (28) and the residuals (25). The residuals decrease monotonically in both methods, which is guaranteed point-wise in time by construction. The L²-errors of the least squares problem also decrease monotonically for increasing dimensions of the subspaces. In the OMP technique, this error decreases rapidly at first and then increases slightly. The reason is that the underlying minimization problem (22) with ε = 0 tends toward an interpolation of the samples for higher dimensions of the subspaces.

For a fixed dimension q, we obtain index sets I_q^lsp and I_q^OMP from the two techniques, which are associated with subsets of basis polynomials. It holds that |I_q^lsp| = |I_q^OMP| = q. We investigate the agreement of the two sets. The ratio of basis functions in the intersection reads as

θ(q) := |I_q^lsp ∩ I_q^OMP| / q.

A ratio θ(q) = 0 implies that the two sets are disjoint, whereas a ratio θ(q) = 1 represents identical sets. The larger the ratio, the more the two basis selections agree.
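The OMP selection itself is a simple greedy loop; here is a self-contained sketch on synthetic data (the matrix A and the sparsity level are illustrative, not the paper's setup):

```python
import numpy as np

def omp(A, b, q):
    """Orthogonal matching pursuit, as in Section 3.2 (minimal sketch):
    greedily select q columns of A, refitting by least squares each step."""
    residual, index_set = b.copy(), []
    for _ in range(q):
        # Pick the column most correlated with the current residual
        correlations = np.abs(A.T @ residual)
        correlations[index_set] = 0.0       # never pick a column twice
        index_set.append(int(np.argmax(correlations)))
        # Refit on the selected subset and update the residual, which
        # therefore decreases monotonically by construction
        coeffs, *_ = np.linalg.lstsq(A[:, index_set], b, rcond=None)
        residual = b - A[:, index_set] @ coeffs
    return index_set, coeffs

# Toy usage: recover a 3-sparse coefficient vector from noiseless samples
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 50))
x_true = np.zeros(50)
x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
I, c = omp(A, A @ x_true, q=3)
print(sorted(I))  # the true support {3, 17, 42} is recovered here
```

The monotone decrease of the residuals noted above follows from the refit step: each added column can only reduce the least squares residual.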
Figure 7: Ratio of basis polynomials in the intersection of the index sets from the least squares problem in Section 3.3 and the sparse minimization problem in Section 3.2.
Figure 8: Ratios of the polynomials of different degrees in the subspaces identified by the least squares problem from Section 3.3 (left) and the sparse minimization problem from Section 3.2 (right).

The value 1 − θ(q) is the ratio for the differences I_q^lsp \ I_q^OMP as well as I_q^OMP \ I_q^lsp. Figure 7 depicts these ratios, which again represent mean values in time. We observe that about 60% or more of the basis functions lie in both index sets for all dimensions. However, the relative L²-error, see Figure 6, is nearly the same in both methods. The minimum errors are around 0.01, which indicates a good low-dimensional approximation. We conclude that the basis polynomials in the intersection of both methods are the most important contributors to the approximation.

In addition, we examine the degrees of the polynomials within the index sets for the two methods separately. If q basis polynomials have been chosen in a method, then let q_j be the number of included polynomials of degree j (0 ≤ q_j ≤ q). We obtain the ratios 0 ≤ q_j/q ≤ 1 for j = 0, 1, 2, 3. The sum of the four ratios is equal to one. Figure 8 shows that these ratios are similar in both approaches. The polynomial of degree zero is included in nearly all computed bases. The two techniques are able to identify the dominant basis polynomials among all degrees.
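Both agreement measures reduce to a few lines of set arithmetic. A small illustrative helper, where the index sets and the degree map are made up for the example rather than computed from the model:

```python
def intersection_ratio(I_lsp, I_omp):
    """Agreement theta(q) = |I_lsp ∩ I_omp| / q for two index sets of size q."""
    assert len(I_lsp) == len(I_omp)
    return len(set(I_lsp) & set(I_omp)) / len(I_lsp)

def degree_ratios(index_set, degrees, max_degree=3):
    """Ratios q_j / q of selected basis polynomials with total degree j;
    `degrees` maps each basis index to its total degree."""
    q = len(index_set)
    return [sum(1 for i in index_set if degrees[i] == j) / q
            for j in range(max_degree + 1)]

# Toy check: two sets of size 5 sharing 3 indices give theta = 0.6
print(intersection_ratio({0, 1, 2, 3, 4}, {0, 1, 2, 7, 9}))  # -> 0.6
# Hypothetical degree map for basis indices 0..3 (degree 0 listed first)
print(degree_ratios([0, 1, 2], [0, 1, 1, 2]))  # fractions per degree 0..3
```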
We compared two numerical techniques for basis selection to achieve low-dimensional approximations in polynomial chaos expansions. In the first approach, the basis is identified by an analysis in the frequency domain, followed by a least squares problem for the coefficients in the time domain. The second approach solves a sparse minimization problem using information from the time domain only.

We tested the methods on a linear dynamical system modeling a mechanical configuration. The numerical results demonstrate that both techniques identify low-dimensional approximations of the same quality, i.e., with a sufficiently small relative L²-error. Although the identification of the basis polynomials uses different approaches, most of the selected basis functions (60% or more) coincide in both methods at all times. Moreover, the number of basis polynomials for each degree separately is nearly the same.

Acknowledgements
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government. John Jakeman's work was supported by DARPA EQUiPS.

The authors are indebted to Prof. Dr. Akil Narayan (University of Utah) for helpful discussions.
References

[1] Sullivan, T.J., Introduction to Uncertainty Quantification, Springer, 2015.
[2] Xiu, D., Numerical Methods for Stochastic Computations: A Spectral Method Approach, Princeton University Press, 2010.
[3] Augustin, F., Gilg, A., Paffrath, M., Rentrop, P., and Wever, U., Polynomial chaos for the approximation of uncertainties: chances and limits, Euro. Jnl. of Applied Mathematics, pp. 149–190, 2008.
[4] Ernst, O.G., Mugler, A., Starkloff, H.J., and Ullmann, E., On the convergence of generalized polynomial chaos expansions, ESAIM: Mathematical Modelling and Numerical Analysis, pp. 317–339, 2012.
[5] Pulch, R., Polynomial chaos for linear differential algebraic equations with random parameters, Int. J. Uncertain. Quantif., (3), pp. 223–240, 2011.
[6] Pulch, R., Stochastic collocation and stochastic Galerkin methods for linear differential algebraic equations, J. Comput. Appl. Math., pp. 281–291, 2014.
[7] Hampton, J. and Doostan, A., Coherence motivated sampling and convergence analysis of least squares polynomial regression, Comput. Methods Appl. Mech. Eng., pp. 73–97, 2015.
[8] Blatman, G. and Sudret, B., Adaptive sparse polynomial chaos expansion based on least angle regression, J. Comput. Phys., pp. 2345–2367, 2011.
[9] Conrad, P.R. and Marzouk, Y.M., Adaptive Smolyak pseudospectral approximations, SIAM J. Sci. Comput., (6), pp. A2643–A2670, 2013.
[10] Doostan, A. and Owhadi, H., A non-adapted sparse approximation of PDEs with stochastic inputs, J. Comput. Phys., pp. 3015–3034, 2011.
[11] Yan, L., Guo, L., and Xiu, D., Stochastic collocation algorithms using ℓ₁-minimization, Int. J. Uncertain. Quantif., (3), pp. 279–293, 2012.
[12] Nair, P.B. and Keane, A.J., Stochastic reduced basis methods, AIAA J., pp. 1653–1664, 2002.
[13] Sachdeva, S.K., Nair, P.B., and Keane, A.J., Hybridization of stochastic reduced basis methods with polynomial chaos expansions, Probabilistic Eng. Mech., pp. 182–192, 2006.
[14] Prasad, A.K. and Roy, S., Accurate reduced dimensional polynomial chaos for efficient uncertainty quantification of microwave/RF networks, IEEE Transactions on Microwave Theory and Techniques, (10), pp. 3697–3708, 2017.
[15] Yang, X., Choi, M., Lin, G., and Karniadakis, G.E., Adaptive ANOVA decomposition of stochastic incompressible and compressible flows, J. Comput. Phys., (4), pp. 1587–1614, 2012.
[16] Zhang, Z., Yang, X., Oseledets, I.V., Karniadakis, G.E., and Daniel, L., Enabling high-dimensional hierarchical uncertainty quantification by ANOVA and tensor-train decomposition, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, (1), pp. 63–76, 2015.
[17] Oseledets, I.V., Constructive representation of functions in low-rank tensor formats, Constr. Approx., (1), pp. 1–18, 2013.
[18] Zhang, Z., Weng, T.W., and Daniel, L., Big-data tensor recovery for high-dimensional uncertainty quantification of process variations, IEEE Transactions on Components, Packaging and Manufacturing Technology, (5), pp. 687–697, 2017.
[19] Pulch, R., Model order reduction and low-dimensional representations for random linear dynamical systems, Math. Comput. Simulat., pp. 1–20, 2018.
[20] Pati, Y., Rezaiifar, R., and Krishnaprasad, P., Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, in Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers, pp. 40–44, 1993.
[21] Antoulas, A., Approximation of Large-Scale Dynamical Systems, SIAM Publications, 2005.
[22] Xiu, D. and Karniadakis, G.E., The Wiener–Askey polynomial chaos for stochastic differential equations, SIAM J. Sci. Comput., (2), pp. 619–644, 2002.
[23] Pulch, R. and Augustin, F., Stability preservation in stochastic Galerkin projections of dynamical systems, arXiv:1708.00958, 2017.
[24] Pulch, R., ter Maten, E.J.W., and Augustin, F., Sensitivity analysis and model order reduction for random linear dynamical systems, Math. Comput. Simulat., pp. 80–95, 2015.
[25] Pulch, R. and ter Maten, E.J.W., Stochastic Galerkin methods and model order reduction for linear dynamical systems, Int. J. Uncertain. Quantif., (3), pp. 255–273, 2015.
[26] Hairer, E. and Wanner, G., Solving Ordinary Differential Equations. Vol. 2: Stiff and Differential-Algebraic Equations, 2nd ed., Springer, 1996.
[27] Benner, P. and Stykel, T., Model order reduction of differential-algebraic equations: a survey, in Surveys in Differential-Algebraic Equations IV, A. Ilchmann and T. Reis, Eds., Differential-Algebraic Equations Forum, Springer, pp. 107–160, 2017.
[28] Golub, G.H. and van Loan, C.F., Matrix Computations, 4th ed., Johns Hopkins University Press, 2013.
[29] Sturm, B.L. and Christensen, M.G., Comparison of orthogonal matching pursuit implementations, in Proceedings of the 20th European Signal Processing Conference (EUSIPCO), pp. 220–224, 2012.
[30] Candes, E.J., Romberg, J.K., and Tao, T., Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math., (8), pp. 1207–1223, 2006.
[31] Rauhut, H., Compressive sensing and structured random matrices, in Theoretical Foundations and Numerical Methods for Sparse Recovery, M. Fornasier, Ed., De Gruyter, pp. 1–92, 2010.
[32] Cai, T.T. and Wang, L., Orthogonal matching pursuit for sparse signal recovery with noise, IEEE Trans. Inf. Theory, (7), pp. 4680–4688, 2011.
[33] Kunis, S. and Rauhut, H., Random sampling of sparse trigonometric polynomials, II. Orthogonal matching pursuit versus basis pursuit, Found. Comput. Math., (6), pp. 737–763, 2008.
[34] Doostan, A., Validi, A., and Iaccarino, G., Non-intrusive low-rank separated approximation of high-dimensional stochastic models, Comput. Methods Appl. Mech. Eng., pp. 42–55, 2013.
[35] Gorodetsky, A.A. and Jakeman, J.J., Gradient-based optimization for regression in the functional tensor-train format, arXiv:1801.00885v2, accepted, J. Comput. Phys., 2018.
[36] Mathelin, L., Quantification of uncertainty from high-dimensional scattered data via polynomial approximation,