A Block Circulant Embedding Method for Simulation of Stationary Gaussian Random Fields on Block-regular Grids
M. Park∗ and M.V. Tretyakov∗

March 29, 2018
Abstract
We propose a new method for sampling from a stationary Gaussian random field on a grid which is not regular but has a regular block structure, which is often the case in applications. The introduced block circulant embedding method (BCEM) can outperform the classical circulant embedding method (CEM), which requires a regularization of the irregular grid before its application. Comparison of BCEM vs CEM is performed on some typical model problems.
Keywords:
Stationary Gaussian random field, irregular grids, sampling techniques, circulant embedding method, symmetric block-Toeplitz matrices, block fast Fourier transform.
1 Introduction

Uncertainties are often modeled using stationary Gaussian fields [4, 10, 16, 18, 24]. Efficient generation of samples from stationary Gaussian fields is crucial for using Monte Carlo techniques, which are the backbone of uncertainty quantification simulations, in studying the behavior of systems subject to uncertainties. There are a few numerical techniques for sampling Gaussian random fields on a grid. For instance, one can find a square-root of the corresponding covariance matrix using Cholesky's decomposition and then multiply the square-root by a vector of independent Gaussian random variables to simulate a sample. This is an exact method but it is rarely used in applications due to the high cost of Cholesky's decomposition in high dimensions. Another possibility is the Karhunen-Loeve expansion (see, e.g. [11, 15]), which requires knowledge of eigenvalues and eigenfunctions of the covariance operator for the Gaussian random field. In many cases of practical interest the eigenvalue problem has to be solved numerically, which can be expensive, especially when eigenvalues decay slowly. Also, this method is not exact. The fast and exact method of generating large samples from stationary Gaussian fields on regular grids is the circulant embedding method (CEM) [22, 6, 23], which is widely used in various uncertainty quantification (UQ) applications such as groundwater flow simulation [11, 17], weather field forecasting [10], and liquid composite molding processes [21]. The two main drawbacks of CEM are (i) the requirement imposed on the grid to be regular, while irregular grids of a block structure naturally appear in many applications (see three typical examples below), and (ii) the need to deal with non-positive definiteness of circulant embedding matrices, which often occurs in practical applications. The remedies for the latter were considered in [19, 12, 14]; here we deal with the first deficiency of CEM. To this end, we propose a new block circulant embedding method (BCEM).
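The Cholesky-based sampler mentioned above can be sketched in a few lines (a minimal illustration of the exact-but-expensive baseline only; the exponential covariance and the small 1D grid are arbitrary choices, not part of the paper):

```python
import numpy as np

def cholesky_sample(points, cov, rng):
    """Draw one exact sample of a zero-mean stationary Gaussian field at the
    given points via a Cholesky square-root of the covariance matrix.
    The O(n^3) factorization cost is why this is impractical for large n."""
    n = len(points)
    # Dense covariance matrix R[i, j] = r(|x_i - x_j|).
    R = cov(np.abs(points[:, None] - points[None, :]))
    L = np.linalg.cholesky(R + 1e-12 * np.eye(n))  # tiny jitter for safety
    return L @ rng.standard_normal(n)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
z = cholesky_sample(x, lambda d: np.exp(-d / 0.3), rng)
```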
Let us clarify the matter using the following three examples, which come from sampling a random permeability field in groundwater flow simulations.
Example 1.1.
Triangular finite element with a quadrature point located at the barycentre of the triangle.
Consider generation of a stationary log-normal random permeability field to be used in simulations based on triangular finite elements and the Gaussian quadrature rule of degree 1 within a rectangular domain. The barycentres of the triangles do not form a regular grid, but in each rectangular cell they repeat the same pattern, so the grid is block-regular; regularizing it for CEM requires a uniform sub-grid in every cell (cf. Example 4.3).

∗School of Mathematical Sciences, University of Nottingham, Nottingham, NG7 2RD, UK (emails: [email protected], [email protected]).

Example 1.2.
Cell-centered finite volume discretization in multilevel Monte Carlo (MLMC) computation.
The multilevel Monte Carlo (MLMC) method is a Monte Carlo technique which can give a substantial reduction of computational complexity in comparison with the standard Monte Carlo method thanks to making use of hierarchical sampling [2, 1]. In the MLMC algorithm, when computing the difference of quantities on two consecutive grids with mesh sizes h and 2h, the pair of fine and coarse random samples must come from the same realization of the random field. In the cell-centered finite volume discretization, which uses permeability values at the centers of cells, the locations of the coarse random field do not coincide with nodes on the fine grid (see Fig. 1.2). In this case there exists a uniform grid with the mesh size h/2 which contains both the fine and the coarse sampling points, but applying CEM on this refined uniform grid requires sampling at many nodes that are not used by the discretization. Sampling directly on the block-regular grid can result in substantial savings of both computational time and memory in comparison with applying CEM.

Figure 1.2: Left: location of sampling points on the fine grid (black) of size h and the coarse grid (hollow) of size 2h using the cell-centered finite volume discretization in 2D. Right: uniform grid (gray) which contains both black and hollow points. Note that the nodes on the right and top sides of the rectangles belong to the neighboring elements.

Example 1.3.
Conditional random field generation on block regular grids.
The conditional random field generation based on CEM was considered in [5]. In this approach one builds a symmetric matrix of the form

R = [ R₁₁  R₂₁ᵀ
      R₂₁  R₂₂ ],  (1.1)

where R₁₁ ∈ ℝ^{n₁×n₁} is a (block) circulant matrix and R₂₂ ∈ ℝ^{n₂×n₂} is a covariance matrix of field values at locations of measurements, and generates random vectors using its square root

R^{1/2} = [ (1/√n₁) F Λ^{1/2}  0
            K                  L ],  (1.2)

where K = (1/√n₁) R₂₁ F Λ^{−1/2} and L is a matrix such that L Lᵀ = R₂₂ − K K^H. Here F denotes a discrete Fourier transform matrix, and Λ is a diagonal matrix whose entries are eigenvalues of R₁₁. The computational costs of forming Λ, K, and K K^H are O(n₁ log n₁) flops, O(n₂ n₁ log n₁) flops, and O(n₂² n₁) flops, respectively.

BCEM can also be used for generation of random fields conditioned on observations. As with the unconditional sampling discussed above, applications of conditional sampling often deal with grids which are not regular but have a regular block structure (see, e.g. conditional MLMC simulation in [17]). Since BCEM requires a smaller n₁ value than CEM, as it does in the unconditional case, BCEM in the conditional random field setting can outperform CEM.

BCEM also has the remarkable feature that it is parallelizable in contrast to the standard CEM, which is a serial algorithm (of course, CEM can exploit parallelism of the fast Fourier transform (FFT), but BCEM's main ingredient is also FFT and it can benefit from FFT parallelism as well), i.e., BCEM has a further significant advantage over CEM.

The rest of the paper is organized as follows. In Section 2 we illustrate the idea of BCEM in the case of one-dimensional space. In Section 3 we present a multi-dimensional BCEM. Computational complexity of BCEM is discussed in Section 4, where some numerical experiments comparing BCEM with the standard CEM show that already in 2D BCEM can be three times faster in sample generation than CEM.

2 Illustration of the idea
To illustrate the idea of BCEM, we start with presenting it in the 1D case.

Figure 2.1: 1D uniform grid Ω_r = {x_0, ..., x_N} ⊂ Ω = [x_0, x_N]. Black circles represent the locations of points s_j^(i) ∈ Ω_s, and the combination of black and grey circles corresponds to the uniform grid Ω̃_s.

Consider a uniform grid Ω_r = {x_0, ..., x_N} on the interval Ω = [x_0, x_N] with a grid size h = (x_N − x_0)/N, and sets of points S_i = {s_1^(i), ..., s_ℓ^(i)} ⊂ Ω_i = [x_i, x_{i+1}] with s_j^(i) = x_i + δ_j, where 0 ≤ δ_1 < δ_2 < ··· < δ_ℓ < h (see Figure 2.1). Note that the δ_j are independent of the index i. The grid Ω_s := ∪_{i=0}^{N−1} S_i is, in general, non-uniform (it is uniform if ℓ = 1) but it is block-uniform, i.e., the distribution of points in each sub-interval (in other words, block) Ω_i is the same.

Let Z(x), x ∈ ℝ, be a stationary Gaussian random field with zero mean and covariance function r(x). Our aim is to sample from Z(x) on the grid Ω_s. If Ω_s is not a grid of equispaced points, then the covariance matrix of the field Z(x) on Ω_s is not Toeplitz. In this case the standard CEM [22, 6, 23] cannot be applied to this covariance matrix in order to perform highly efficient computing of its square-root with subsequent generation of the required Gaussian field samples. The simplest remedy is to extend the non-uniform grid Ω_s to the uniform grid Ω̃_s by adding points (see Figure 2.1) and then apply the standard circulant embedding method, but this approach results in a substantial increase of computational costs.
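The block-uniform grid Ω_s described above can be generated as follows (a small sketch; the offsets δ_j are illustrative values assumed only for this example):

```python
import numpy as np

def block_uniform_grid(x0, h, N, deltas):
    """Points s_j^(i) = x_i + delta_j, i = 0..N-1, with the same offset
    pattern 0 <= delta_1 < ... < delta_ell < h repeated in every block."""
    deltas = np.asarray(deltas, dtype=float)
    assert np.all((0.0 <= deltas) & (deltas < h)) and np.all(np.diff(deltas) > 0)
    blocks = x0 + h * np.arange(N)                      # left endpoints x_i
    return (blocks[:, None] + deltas[None, :]).ravel()  # N * ell points

# N = 8 blocks of size h = 0.25 with ell = 2 points per block.
grid = block_uniform_grid(0.0, 0.25, 8, [0.05, 0.15])
```

The grid has Nℓ points and is uniform only if the δ_j are themselves equispaced with spacing h/ℓ; this non-uniformity is exactly why the covariance matrix below is block-Toeplitz rather than Toeplitz.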
In this paper, we propose a different approach which does not need adding points to Ω_s and which is cheaper than the use of the standard CEM on the extended uniform grid Ω̃_s.

Consider the covariance matrix R of the random vector Z(s_j^(i)), s_j^(i) ∈ Ω_s, written in the block matrix form:

R = [ R_{0,0}   R_{0,1}   R_{0,2}   ···  R_{0,N−1}
      R_{1,0}   R_{1,1}   R_{1,2}   ···  R_{1,N−1}
      R_{2,0}   R_{2,1}   R_{2,2}   ···  R_{2,N−1}
      ···
      R_{N−1,0} R_{N−1,1} R_{N−1,2} ···  R_{N−1,N−1} ] ∈ ℝ^{Nℓ×Nℓ},  (2.1)

where each block matrix R_{i,k} is defined as

R_{i,k} = [ r(|s_j^(i) − s_l^(k)|) ]_{1≤j,l≤ℓ}.  (2.2)

Now note that, by construction,

R_{i,j} = R_{k,l} if j − i = l − k, and R_{i,j} = R_{k,l}^T if j − i = k − l.  (2.3)

Property (2.3) implies that the covariance matrix R from (2.1) is uniquely determined by its first block row, and hence it is symmetric and block Toeplitz, having identical blocks along diagonals. Then R can be rewritten as

R = [ R_{0,0}     R_{0,1}     R_{0,2}     ···  R_{0,N−1}
      R_{0,1}^T   R_{0,0}     R_{0,1}     ···  R_{0,N−2}
      R_{0,2}^T   R_{0,1}^T   R_{0,0}     ···  R_{0,N−3}
      ···
      R_{0,N−1}^T R_{0,N−2}^T R_{0,N−3}^T ···  R_{0,0} ].  (2.4)

We now illustrate how CEM [22, 6, 23] can be extended so that its new version, BCEM, is applicable to the symmetric block Toeplitz matrix R from (2.4). To this end, we embed R in the mℓ × mℓ symmetric block Toeplitz matrix C for some even integer m ≥ N:

C = [ C_0     C_1 ···  C_{m−1}
      C_{m−1} C_0 ···  C_{m−2}
      ···
      C_1     C_2 ···  C_0 ],  (2.5)

where

C_k = [ r(g(|s_i^(0) − s_j^(k)|)) ]_{1≤i,j≤ℓ}  (2.6)

and

g(x) = { x,      if x < mh/2,
         mh − x, if x ≥ mh/2.  (2.7)

Note that C_i = R_{0,i} for 0 ≤ i ≤ N − 1. C is the covariance matrix for Z(x) defined in the circular manner on the grid Ω_s^E = ∪_{i=0}^{m−1} S_i ⊂ Ω^E = [x_0, x_m], where x_m = x_0 + mh and the S_i are defined in the same way as before.

It is not difficult to see that the matrix C has the following properties:

C = C^T,  (2.8)

C_k = C_{m−k}^T, for 1 ≤ k ≤ m − 1.  (2.9)

The properties (2.8) and (2.9) imply that C is a symmetric block circulant matrix.

Let F_B be the tensor product of a one-dimensional discrete Fourier matrix F_m of order m and an identity matrix I_ℓ of size ℓ × ℓ:

F_B = F_m ⊗ I_ℓ = [ I_ℓ  I_ℓ           I_ℓ            ···  I_ℓ
                    I_ℓ  ω I_ℓ         ω² I_ℓ         ···  ω^{m−1} I_ℓ
                    I_ℓ  ω² I_ℓ        ω⁴ I_ℓ         ···  ω^{2(m−1)} I_ℓ
                    ···
                    I_ℓ  ω^{m−1} I_ℓ   ω^{2(m−1)} I_ℓ ···  ω^{(m−1)²} I_ℓ ],

where ω = exp(2πi/m). The matrix C is unitarily block diagonalizable by F_B [3, 23], i.e., there exist ℓ × ℓ matrices Λ_k, k = 0, 1, ..., m − 1, such that

C = (1/m) F_B Λ F_B^H, where Λ = diag(Λ_0, Λ_1, ..., Λ_{m−1}).  (2.10)

Here H denotes the conjugate transpose. Similarly to the eigenvalue decomposition of a symmetric circulant matrix, whose eigenvalues can be calculated by performing a discrete Fourier transform of its first row (or column), the block matrices on the diagonal of Λ can be computed as

[ C_0 C_1 ··· C_{m−1} ] F_B = [ Λ_0 Λ_1 ··· Λ_{m−1} ]  (2.11)

or in the component-wise form:

[ C_0^{i,j} C_1^{i,j} ··· C_{m−1}^{i,j} ] F_m = [ Λ_0^{i,j} Λ_1^{i,j} ··· Λ_{m−1}^{i,j} ], where 1 ≤ i, j ≤ ℓ.  (2.12)

Since the block circulant matrix C is real and symmetric, the Λ_k are Hermitian. Furthermore, all the diagonal elements of each Λ_k are equal. Therefore, only ℓ(ℓ + 1)/2 − (ℓ − 1) applications of F_m are required for computing Λ.

Remark 2.1.
Consider a uniform grid, i.e., a block-regular grid with the size of the regular grid being a multiple of the number of blocks or, in other words, the points in each S_i being uniformly located. Then BCEM is applicable on the uniform grid (recall that CEM works on regular grids only). Since the covariance depends on the distance between points only, the block circulant matrix C on the uniform grid satisfies the relationship

C_k^{a,b} = C_k^{c,d} if |a − b| = |c − d|.

Consequently, the Λ_k in (2.11) are Toeplitz, and the number of 1D FFTs F_m needed to compute the distinctive values of Λ is equal to ℓ. Thus, in BCEM the block circulant matrix can be diagonalized by using ℓ FFTs of order m followed by a Cholesky decomposition of the block diagonal matrix Λ, whose block entries are of size ℓ × ℓ. For small ℓ, the overall computational cost is dominated by O(ℓm log m). On the other hand, the complexity of CEM is dominated by FFTs of order mℓ, which gives the overall cost O(mℓ log mℓ). Hence, BCEM can outperform CEM on the uniform grid, where both CEM and BCEM use the same covariance matrix (see also Remark 4.2).

The symmetry of C also guarantees the spectral decomposition

Λ_k = U_k D_k U_k^H,  (2.13)

where U_k is unitary and D_k is a real-valued diagonal matrix. The following proposition implies that Λ from (2.10) can be decomposed with m/2 + 1 spectral decompositions (2.13).

Proposition 2.2.
The block diagonal matrix Λ from (2.10) has the property

Λ_k = Λ̄_{m−k}, for 1 ≤ k ≤ m − 1,  (2.14)

where the bar denotes the matrix with complex-conjugate entries.

Proof. Let ω_n = exp(2πn i/m) be a root of unity. Then (see (2.9) and (2.12)):

Λ_{m−k} = C_0 + ω_{m−k} C_1 + ··· + ω_{m−k}^{m−1} C_{m−1} = C_0 + ω̄_k C_1 + ··· + ω̄_k^{m−1} C_{m−1} = Λ̄_k.

It follows from (2.10) and (2.13) that C has the eigenvalue decomposition

C = (1/m) (F_B U) D (F_B U)^H,  (2.15)

where the unitary block-diagonal matrix U and the diagonal matrix D are of the form

U = diag(U_0, U_1, ..., U_{m−1}) and D = diag(D_0, D_1, ..., D_{m−1}).

We note that C is non-negative definite if and only if D_k^{i,i} ≥ 0 for all 0 ≤ k ≤ m − 1, 1 ≤ i ≤ ℓ.

Assume for the moment that all the eigenvalues of C are non-negative. Let two independent random vectors ξ_1 and ξ_2, each of size mℓ, be normally distributed N(O, I_{mℓ}), i.e., E[ξ_i ξ_j^T] = δ_{ij} I_{mℓ}, where δ_{ij} denotes the Kronecker delta. Set η = U (D/m)^{1/2} (ξ_1 + iξ_2). Then the real and imaginary parts of the vector ζ := F_B η give two independent random vectors ζ_1 and ζ_2 that are both distributed as N(O, C). Since R is embedded in C, the corresponding parts of ζ_1 and ζ_2 are distributed as N(O, R). Note that the matrix-vector multiplication F_B η can be calculated component-wise by ℓ applications of F_m.

The algorithm described above depends on the non-negative definiteness of the symmetric block circulant matrix C. Sufficient conditions for symmetric circulant matrices to have all non-negative eigenvalues were developed for the 1D case in [6] and [22]. Here we extend these conditions to the symmetric block circulant matrix C from (2.5). To this end, introduce a uniform grid Ω̃_s such that Ω_s ⊂ Ω̃_s and consider the covariance matrix R̃ defined on Ω̃_s. As Ω_s is a subset of Ω̃_s, R is a sub-matrix of the matrix R̃. Let a uniform grid Ω̃_s^E contain all points of Ω_s^E = ∪_{i=0}^{m−1} S_i.
Then the symmetric block circulant matrix C is a sub-matrix of a symmetric circulant matrix C̃:

C̃_{i,j} = [ r(g(|x_i − x_j|)) ],  (2.16)

where the function g(x) is as in (2.7) and x_i, x_j ∈ Ω̃_s^E. Therefore, there exists an injection matrix P^T such that

C = P^T C̃ P.  (2.17)

An injection matrix can be built by eliminating the rows of the identity matrix which correspond to points not in Ω_s^E. For instance, the matrix obtained from the 5 × 5 identity matrix by deleting the rows corresponding to the omitted points is an injection matrix from a five-point grid {x_0, ..., x_4} onto a three-point subset such as {x_0, x_2, x_4}. The relationship (2.17) leads to the following proposition.

Proposition 2.3. If C̃ is non-negative definite, then so is C.

When the circulant matrix C̃ fails to be non-negative definite, Wood and Chan [22] suggested to increase the size of C̃ until it becomes non-negative definite (the so-called padding technique). From the relationship (2.17) between C and C̃, the same strategy can be used for the matrix C. That is, increase m until C becomes non-negative definite. Therefore, the number of blocks m which is required for C to be non-negative definite depends on the grid size of the uniform grid Ω̃_s^E, not on the number of points in Ω_s. Thus, the number of paddings needed for BCEM is the same as for CEM (see also Remark 3.2).

3 Multi-dimensional BCEM

In the previous section we illustrated the idea of BCEM in the simpler setting of 1D space. In this section we present multi-dimensional BCEM, whose computational complexity is discussed in the next section. We start with introducing the notation, which largely follows [23]. Let Z_+^d be the set of d-vectors with non-negative integer components, and let 0 and 1 be the d-dimensional vectors all of whose components equal 0 and 1, respectively. For i = (i[1], ..., i[d])^T, j = (j[1], ..., j[d])^T ∈ Z_+^d, we define addition in Z_+^d:

i + j = (i[1] + j[1], ..., i[d] + j[d])^T

and also the product of the elements of i:

ī = ∏_{k=1}^d i[k].
For any j ∈ Z_+^d all components of which are strictly positive, we define the set I(j):

I(j) = { i ∈ Z_+^d : 0 ≤ i[k] ≤ j[k] − 1, 1 ≤ k ≤ d }.  (3.1)

Figure 3.1: The 2D uniform grid Ω_r^E = {x_{j_0} = x_0, x_{j_1}, ..., x_{j_{(m+1)‾−1}} = x_m} on the rectangle Ω^E = [x_0[1], x_m[1]] × [x_0[2], x_m[2]] for m = (8, ·)^T. Nodes s_j^{(j_k)} of the block-regular grid Ω_s^E are represented by black circles. The shaded rectangle corresponds to the computational domain (i.e., the domain of interest for the problem at hand) Ω = [x_0[1], x_N[1]] × [x_0[2], x_N[2]] for N = (4, ·)^T.

Note that the cardinality of I(j) is equal to j̄. Introduce a d-dimensional rectangular parallelepiped

Ω = ∏_{i=1}^d [x_0[i], x_N[i]] ⊂ ℝ^d,  (3.2)

where N = (N[1], ..., N[d])^T ∈ Z_+^d, and the vector h with the components h[i] = (x_N[i] − x_0[i])/N[i]. Further, points x_{i_k} = (x_{i_k}[1], ..., x_{i_k}[d])^T with x_{i_k}[j] = x_0[j] + i_k[j] h[j] form a regular grid Ω_r = {x_{i_0}, ..., x_{i_{(N+1)‾−1}}} on the rectangular parallelepiped Ω (see Fig. 3.1). The domain Ω in (3.2) can be divided into d-dimensional rectangular parallelepipeds as

Ω = ∪_{j_k ∈ I(N)} Ω_{j_k},  (3.3)

where

Ω_{j_k} = ∏_{i=1}^d [x_{j_k}[i], x_{j_k}[i] + h[i]], j_k ∈ I(N).  (3.4)

For the purpose of algorithm development, we use a lexicographic ordering of the j_k with respect to k, i.e., row after row and layer after layer (see Fig. 3.1).

Consider a stationary Gaussian random field Z(x), x ∈ ℝ^d, with zero mean and covariance function r(x). We assume that the problem at our hands is such that we need to sample Z(x) at the nodes s_j^{(j_k)} defined as follows (see the examples in the Introduction and also Fig. 3.1):

s_j^{(j_k)} = x_{j_k} + δ_j, j = 1, ..., ℓ,  (3.5)

where δ_j = (δ_j[1], ..., δ_j[d])^T with 0 ≤ δ_j[i] < h[i] for 1 ≤ i ≤ d. Here ℓ is the number of sampling points in each subdomain Ω_{j_k}. That is, in each Ω_{j_k} the points from the set S_{j_k} = {s_1^{(j_k)}, ..., s_ℓ^{(j_k)}} are distributed according to the same pattern for all j_k ∈ I(N). Denote the grid

Ω_s = ∪_{j_k ∈ I(N)} S_{j_k}.

Note that the δ_j are independent of the index vector j_k. The covariance matrix R of Z(s_i^{(j_k)}), s_i^{(j_k)} ∈ Ω_s, is block-Toeplitz. In the one-dimensional case it consists of non-Toeplitz blocks of order ℓ (see Section 2). In the d-dimensional case with d > 1, it consists of blocks which have all the properties of a correlation matrix in the (d − 1)-dimensional case. In either case R is not Toeplitz and hence CEM is not directly applicable here.

Analogously to CEM, in order to build a block-circulant matrix, we consider an extended domain Ω^E = ∏_{i=1}^d [x_0[i], x_m[i]], where m = (m[1], ..., m[d])^T with m[i] ≥ N[i] and x_m[i] = x_0[i] + m[i] h[i] for 1 ≤ i ≤ d. Figure 3.1 shows an example of the computational domain Ω with N = (4, ·)^T and the extended domain Ω^E with m = (8, ·)^T. Vectors j_k ∈ I(m + 1) form the extended regular grid

Ω_r^E = { x_{j_k} = (x_{j_k}[1], ..., x_{j_k}[d])^T | j_k ∈ I(m + 1) } ⊂ Ω^E.

There are (m+1)‾ regular grid points in the set Ω_r^E. The parallelepiped Ω^E can be divided into d-dimensional small parallelepipeds as (see also Fig. 3.1)

Ω^E = ∪_{j_k ∈ I(m)} Ω_{j_k},  (3.6)

where the Ω_{j_k}, j_k ∈ I(m), are as in (3.4).

We now describe BCEM in the d-dimensional case, which is applicable to our block-Toeplitz covariance matrix R. In contrast to the 1D setting, where the covariance function is always even because of its symmetry, further classification of covariance functions is needed in the higher-dimensional case. We say that r is component-wise even in the i-th coordinate if

r(x[1], ..., −x[i], ..., x[d]) = r(x[1], ..., x[i], ..., x[d])  (3.7)

for all x ∈ ℝ^d; otherwise, we say that r is component-wise uneven in some coordinates. For simplicity of the exposition, let us assume for now that r(x) is component-wise even in all coordinates (we will discuss a modification of BCEM in the uneven case in Remark 3.1).

We first build the block circulant embedding of the block-Toeplitz matrix R. Consider the first row of the block circulant matrix C, which is an ℓ × ℓm̄ matrix C_f of the form

C_f = [ C_{j_0} C_{j_1} ··· C_{j_{m̄−1}} ],  (3.8)

where the (i, j)-th element of C_{j_k} is equal to

C_{j_k}^{i,j} = r(g_m(s_i^{(j_0)} − s_j^{(j_k)}))  (3.9)

and the vector function g_m = (g_m^1, ..., g_m^d)^T is defined by

g_m^i(x) = { x[i],               if |x[i]| < m[i] h[i]/2,
             m[i] h[i] − |x[i]|, otherwise.  (3.10)

The block circulant matrix C is generated by its first row C_f in the usual way. Also note that

R = [ C_{j_k} ]_{j_k ∈ I(N)}.  (3.11)

The block circulant matrix C is block diagonalizable by a block discrete Fourier transform (BDFT) matrix F_B = F_m^d ⊗ I_ℓ, where F_m^d is a d-dimensional DFT matrix. That is, we have C = (1/m̄) F_B Λ F_B^H, where Λ = diag(Λ_0, ..., Λ_{m̄−1}). The blocks on the diagonal of Λ can be found by simply taking the BDFT of the first block row [3, 23]. Furthermore, using the fact that F_B is the tensor product involving the identity matrix, we derive the following component-wise computation:

[ Λ_0^{i,j} ··· Λ_{m̄−1}^{i,j} ] = FFT^d([ C_0^{i,j} ··· C_{m̄−1}^{i,j} ]),  (3.12)

where 1 ≤ i, j ≤ ℓ and FFT^d is the d-dimensional FFT. Note that instead of FFT^d we will simply write FFT where no confusion can arise. Due to the fact that each Λ_k is Hermitian and all diagonal entries of Λ_k are the same, the required number of FFT^d of size m (which is equivalent to FFT of order m̄) in (3.12) is ℓ(ℓ + 1)/2 − (ℓ − 1).

Assume that C is non-negative definite, so that the Cholesky decomposition Λ = LL^H exists, where L is a block diagonal matrix with each block being a lower triangular matrix. Then we obtain the decomposition C = (1/m̄) F_B L (F_B L)^H. As in the one-dimensional case (see Section 2), let ξ = ξ_1 + iξ_2 be a complex-valued random vector of order m̄ℓ with ξ_1 and ξ_2 being real, normal random vectors such that E[ξ_i] = 0 and E[ξ_i ξ_j^T] = δ_{ij} I. Set L̃ = (1/m̄)^{1/2} L and η = L̃ξ. Multiplying the square root of C by ξ, we obtain the complex-valued vector

F_B η = ζ = ζ_1 + iζ_2,  (3.13)

with the properties: E[ζ_1 ζ_1^T] = E[ζ_2 ζ_2^T] = C, and ζ_1 and ζ_2 are independent. Using the tensor-product properties of F_B, ζ can be computed in the component-wise manner:

[ ζ[i] ζ[i + ℓ] ··· ζ[i + (m̄ − 1)ℓ] ] = FFT^d([ η[i] η[i + ℓ] ··· η[i + (m̄ − 1)ℓ] ])  (3.14)

for 1 ≤ i ≤ ℓ.

To summarize, the new BCEM can be presented in the algorithmic form as follows.

Algorithm 3.1 Block circulant embedding method (BCEM)

Given N ∈ Z_+^d, x_0 ∈ Ω, and a strictly positive valued vector h ∈ ℝ^d:

Step 1. Choose a vector m ∈ Z_+^d such that m[i] ≥ N[i] for all 1 ≤ i ≤ d.

Step 2. Compute the first block row of the circulant matrix C as described in (3.8)-(3.10).

Step 3. Compute the block diagonal matrix Λ = diag(Λ_0, ..., Λ_{m̄−1}) using (3.12).

Step 4. Compute the square-root of Λ by applying Cholesky decompositions to the diagonal blocks of Λ:

Λ = LL^H,  (3.15)

where L is a block diagonal matrix with lower triangular blocks of order ℓ.

Step 5. If the Cholesky decomposition fails in Step 4, increase m[i] by one or more and go to Step 2.

Step 6. Compute L̃ = (1/m̄)^{1/2} L.

Step 7. Generate a random complex vector of dimension m̄ℓ, ξ = ξ_1 + iξ_2, with two independent vectors ξ_1 and ξ_2 being N(0, I_{m̄ℓ}). Compute η = L̃ξ.

Step 8. Compute ζ = (ζ[1], ..., ζ[m̄ℓ])^T by applying FFT ℓ times as in (3.14).

Note that if ℓ = 1, then Ω_s is regular, C is circulant, Λ becomes diagonal instead of block diagonal, and Algorithm 3.1 degenerates to the standard CEM.

Remark 3.1.
Algorithm 3.1 is applicable when the covariance function is component-wise even (see (3.7)). Although the covariance function is even by definition, i.e., r(−x) = r(x), it could be component-wise uneven; e.g., r(x) = exp(−x^T A x), where A is a symmetric positive definite 2 × 2 matrix with negative off-diagonal entries, x ∈ ℝ², is uneven. In this case, the matrix C defined by the vector function g_m in (3.10) usually does not have block circulant structure because of conflicting definitions at the points x with x[i] = m[i] h[i]/2, if r is uneven in the i-th coordinate. Two adjustments to make CEM work in uneven cases were suggested in [22], which can be applied to BCEM by modifying Algorithm 3.1 as follows. If r is uneven in the i-th coordinate, either

(a) choose m[i] to be an odd integer, e.g., a power of three; or

(b) still choose m[i] to be an even integer and define C_{j_k} using (3.9) and (3.10), except put r(x) = 0 for all x such that |x[i]| = m[i] h[i]/2.

In either case, the resulting matrix C has a block circulant structure and thus Algorithm 3.1 can be seamlessly extended to the uneven case with the aforementioned modifications.

Remark 3.2.
As mentioned earlier, the matrix C often fails to be non-negative definite in practical applications of CEM and BCEM. Following [22], we increase the size of the matrix C in Algorithm 3.1 (see its Step 5) until it becomes non-negative definite (the padding technique). The padding technique is universal and usually efficient when the correlation length of a random field is in a range from small to medium relative to the size of the computational domain and the field is not too smooth. Otherwise, the use of the padding technique could be very expensive. There are two recently developed alternatives to padding (a cut-off of the circulant matrix [19, 12] and smoothing window circulant embedding [14]), which can deal with the problem of negative definiteness of circulant matrices effectively. The techniques from [19, 12, 14] are applicable to BCEM as they are for CEM.

The equispaced FFT is highly parallelizable in high dimensions, and highly scalable implementations of it are proposed in [8, 7]. This could be beneficial in the standard CEM because its computation is dominated by the FFT. Besides being equipped with the parallelism of the FFT, BCEM can be further parallelized in a natural way because the applications of FFT^d in Step 3 and Step 8 of Algorithm 3.1, as well as the block-wise Cholesky decompositions in Step 4, can be performed independently of each other.

4 Computational complexity

In this section we analyze the computational complexity of BCEM. To this end, we use the same convention as in Golub and Van Loan [13] for counting the number of floating point operations: 5m log m flops for an FFT of size m and n³/3 flops for a Cholesky decomposition of order n.

In Step 3 of Algorithm 3.1, we diagonalize C by taking the BDFT of its first block row, which can be computed using the ordinary DFT in (3.12) at the cost

cost = ( ℓ(ℓ + 1)/2 − (ℓ − 1) ) (5 m̄ log m̄) flops.  (4.1)

Here we took into account that each Λ_k is Hermitian and its diagonal elements have the same value.

In Step 4, the square-root operation on the block diagonal matrix Λ with m̄ blocks of order ℓ can be performed on each block separately using the Cholesky decomposition method. In Proposition 2.2, we proved in the one-dimensional case that Λ has pairs of complex-conjugate blocks, Λ_k and Λ_{m−k}, which allows us to compute the square-root of Λ_k and use its complex conjugate as a square-root of its complex-conjugate pair Λ_{m−k}. This is based on the periodicity and conjugate symmetry of the FFT; hence, Proposition 2.2 can be extended to the higher-dimensional case. Then the matrix Λ can be decomposed at the cost

cost = ∏_{i=1}^d ( m[i]/2 + 1 ) ℓ³/3 flops.  (4.2)

Remark 4.1.
Note that if the nodes of S_{j_k} are regularly (uniformly) distributed in Ω_{j_k} for all j_k ∈ I(m), which is often the case in applications (see, e.g., Example 1.1), then all blocks on the diagonal of Λ are (block) Toeplitz. Toeplitz and block Toeplitz matrices can be decomposed using Schur's algorithm [20] and the block Schur algorithm [9], respectively, which have O(ℓ²) complexity as opposed to O(ℓ³) for the standard Cholesky decomposition. Making use of Schur's algorithms can reduce the cost of Algorithm 3.1.

In Step 8, computing a realization of ζ requires the block diagonal matrix-vector multiplication L̃ξ and ℓ applications of the FFT of order m̄ in (3.14), at the costs

cost = ℓ² m̄ flops  (4.3)

and

cost = ℓ (5 m̄ log m̄) flops,  (4.4)

respectively.

To conclude, the cost of BCEM is O(ℓ² m̄ + ℓ m̄ log m̄) flops. In practical applications of BCEM (see, e.g., the examples in the Introduction) the size of blocks ℓ is relatively small while the number of blocks m̄ is large. Recall that BCEM is designed for block-regular grids Ω_s. Its main computational advantage in comparison with CEM (which is designed for regular grids) comes from the fact that the use of CEM in the case of simulations on a block-regular grid Ω_s requires regularization of Ω_s, i.e., adding a significant number of extra nodes which BCEM does not need. Hence BCEM works on a grid with a smaller number of nodes than CEM and needs to generate random vectors ζ of smaller size than CEM (and hence makes a smaller number of calls to a random number generator to sample ξ).

Remark 4.2. It can be shown that the use of BCEM on a regular grid split into blocks of a size ℓ can be more effective in sampling the random field than CEM, but it is computationally more expensive in the matrix decomposition than CEM. The latter can be overcome by exploiting the fact that BCEM is parallelizable in comparison with CEM. Thus BCEM can be more effective than CEM even in the case of regular grids for which CEM is designed.

We now compare the computational complexity of BCEM and CEM using the first two examples from the Introduction and the following exponential covariance function (cf. (3.9)):

r(x) = σ² exp( −‖x‖₂ / λ ),  (4.5)

where ‖·‖₂ denotes the L²-norm and λ > 0 is the correlation length. We note that the circulant matrix C (cf. (3.8), (3.9)) of the size m = 2N formed by (4.5) is always positive definite (see, e.g. [6]). This means, in particular, that Step 5 (i.e., padding) of Algorithm 3.1 is not needed in this case. For simplicity, we consider the domain Ω to be the unit square in the examples.
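As a concrete companion to these counts, here is a compact 1D implementation of Algorithm 3.1 (a sketch under stated assumptions: the exponential covariance keeps the embedding with m = 2N positive definite, so the padding loop of Step 5 is omitted; all variable names are ours):

```python
import numpy as np

def bcem_sample_1d(r, h, N, deltas, rng):
    """Sketch of Algorithm 3.1 in 1D: sample a zero-mean stationary Gaussian
    field at the block-regular points s_j^(i) = i*h + delta_j, 0 <= i < N.
    Assumes the embedding with m = 2N blocks is positive definite (true for
    the exponential covariance used below); Step 5 (padding) is omitted."""
    deltas = np.asarray(deltas, float)
    ell, m = len(deltas), 2 * N                    # Step 1: m >= N blocks
    # Step 2: first block row C_0, ..., C_{m-1}; (C_k)_{ij} is
    # r(g(|s_i^(0) - s_j^(k)|)) with the wrap-around function g of (2.7).
    lags = np.abs(deltas[:, None] - deltas[None, :]
                  - h * np.arange(m)[:, None, None])
    wrapped = np.where(lags < m * h / 2, lags, m * h - lags)
    C = r(wrapped)                                 # shape (m, ell, ell)
    # Step 3: block diagonalization; one FFT along the block index per (i, j).
    Lam = np.fft.fft(C, axis=0)                    # Lambda_k, cf. (2.12)
    Lam = (Lam + np.conj(np.swapaxes(Lam, 1, 2))) / 2  # enforce Hermitian blocks
    # Step 4: stacked Cholesky of every block; raises if C is not PD.
    L = np.linalg.cholesky(Lam + 1e-10 * np.eye(ell))
    # Steps 6-7: eta = (1/m)^(1/2) L xi with complex standard normal xi.
    xi = rng.standard_normal((m, ell)) + 1j * rng.standard_normal((m, ell))
    eta = np.einsum('kab,kb->ka', L, xi) / np.sqrt(m)
    # Step 8: zeta = F_B eta, i.e. ell FFTs of order m as in (3.14).
    zeta = np.fft.fft(eta, axis=0)
    # Restricted to the first N blocks, the real and imaginary parts are two
    # independent samples with covariance R.
    return zeta.real[:N].ravel(), zeta.imag[:N].ravel()

# Two independent realizations at N*ell = 8 points of [0, 1.2]:
z1, z2 = bcem_sample_1d(lambda d: np.exp(-d / 0.5), 0.3, 4, [0.0, 0.1],
                        np.random.default_rng(0))
```

The empirical covariance of such samples matches the exact matrix R built directly from the covariance function, which is a convenient way to sanity-check an implementation.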
Figure 4.1: The floating point operations required for the matrix decomposition and random field generationstages of CEM and BCEM in Example 4.3.
Example 4.3.
Triangular finite element with a quadrature point located at the barycentre of the triangle.
In Example 1.1 (see Figure 1.1), each rectangular block contains 9 uniform grid nodes. Hence the order of the circulant matrix used by CEM is 9m̄, where m = 2N and m̄ is the number of rectangular blocks in the extended domain Ω^E. Then the matrix decomposition cost for CEM is 45 m̄ log m̄ flops, and generation of each realization of the random field requires another 45 m̄ log m̄ flops.

Here BCEM uses only two points in each rectangular block, so the order of the block-circulant matrix is 2m̄. Substituting ℓ = 2 into (4.1) and (4.2), the total matrix decomposition cost for BCEM is 10 m̄ log m̄ + (m[1]/2 + 1)(m[2]/2 + 1) · 8/3 flops. Each realization of the random field is generated at the cost of 4m̄ + 10 m̄ log m̄ flops (see (4.3) and (4.4)).

Figure 4.1 shows the floating point operations required by the two algorithms. One can see that BCEM is more effective in both procedures and that the complexity of BCEM grows at roughly the same rate as for CEM. Compared to CEM, BCEM reduces the matrix decomposition cost and the generation cost approximately 4.5 times.
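The operation counts (4.1)-(4.4) can be wrapped in a small helper (our own convenience code, not from the paper; it uses base-2 logarithms and the 5n log n FFT / n³/3 Cholesky conventions quoted from [13] above):

```python
import numpy as np

def bcem_flops(ell, m):
    """Flop counts (4.1)+(4.2) for decomposition and (4.3)+(4.4) for
    generation, for BCEM with ell points per block and block counts m."""
    m = np.asarray(m, dtype=int)
    mbar = int(np.prod(m))
    n_ffts = ell * (ell + 1) // 2 - (ell - 1)      # distinct FFTs in (4.1)
    decomp = (n_ffts * 5 * mbar * np.log2(mbar)
              + np.prod(m // 2 + 1) * ell**3 / 3)  # (4.1) + (4.2)
    gen = ell**2 * mbar + ell * 5 * mbar * np.log2(mbar)  # (4.3) + (4.4)
    return decomp, gen

def cem_flops(pts_per_block, m):
    """CEM on the regularized grid: one FFT of order pts_per_block * mbar
    for the decomposition and one more per realization."""
    n = pts_per_block * int(np.prod(m))
    return 5 * n * np.log2(n), 5 * n * np.log2(n)

# Example 4.3: ell = 2 for BCEM vs 9 regularized points per block for CEM.
d_b, g_b = bcem_flops(2, (512, 512))
d_c, g_c = cem_flops(9, (512, 512))
```

With N = (256, 256) and m = 2N this gives cost ratios of roughly 5, consistent with the reduction quoted above (the small excess over 4.5 comes from the log(9m̄) versus log m̄ terms).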
CPU Time (s)

  N     BCEM      CEM   speed-up
  32     0.60     2.71      4.5
  64     2.32    11.39      4.9
 128    11.34    48.55      4.3
 256    49.37   212.21      4.3

Figure 4.2 & Table 4.1: Average time required to simulate a single realization of the random field in Example 4.3, computed using 1000 independent samples.

To compare the performance of BCEM and CEM further, we generated samples of the random field by these two methods on an Intel Xeon E5-2450, 96GB RAM computer using MATLAB R2014a. Figure 4.2 shows the average computational time for generating a single realization of the random field by both methods and how it increases with increasing N. Table 4.1 gives the CPU times and the speed-up of generating a random vector using BCEM against using CEM. For both methods, the CPU time increases with increase of N at about the same rate as the theoretical rate shown in Figure 4.1 (right). We see that BCEM is about 4.3–4.9 times faster than CEM.
Figure 4.3: The floating point operations required for the matrix decomposition and random field generationstages of CEM and BCEM in Example 4.4.
Example 4.4.
Cell-centered finite volume discretization in multilevel Monte Carlo (MLMC) computation.
In Example 1.2, BCEM uses 5 out of the 16 uniform nodes required by CEM in each individual block to generate the random variables located at the centers of both the fine and coarse cells. That is, CEM has to generate random variables at the 11 extra nodes that are not used in the finite volume discretization and are not needed by BCEM. The memory requirements for CEM and BCEM are then 16m and 5m, respectively, which makes BCEM more attractive when the number of blocks is large.

Whereas the matrix decomposition and sampling costs in CEM both require 80m log₂ m flops, the computational costs of the matrix decomposition and sampling in BCEM are 55m log₂ m flops plus a lower-order term of O(m) flops and 25m + 25m log₂ m flops, respectively (see (4.1)-(4.4) with ℓ = 5). Note that the total costs are dominated by the m log₂ m terms. Hence, for large m, the ratio of the matrix decomposition cost in CEM to the one in BCEM is close to 80/55 ≈ 1.45. For the sample generation cost, the ratio is close to 80/25 = 3.2. These theoretical computational costs are shown in Figure 4.3.
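The structural fact behind these counts is that a block-circulant matrix is block-diagonalized by a DFT taken across the block index, which reduces the decomposition stage to m independent ℓ×ℓ problems. The sketch below verifies this numerically; it is illustrative Python/NumPy, not the paper's code, and the blocks are arbitrary random data rather than covariance blocks.

```python
import numpy as np

def block_circulant(blocks):
    """Assemble the block-circulant matrix whose (j, k) block is
    blocks[(k - j) % m], for ell x ell blocks B_0, ..., B_{m-1}."""
    m, ell = len(blocks), blocks[0].shape[0]
    C = np.zeros((m * ell, m * ell), dtype=complex)
    for j in range(m):
        for k in range(m):
            C[j*ell:(j+1)*ell, k*ell:(k+1)*ell] = blocks[(k - j) % m]
    return C

rng = np.random.default_rng(0)
m, ell = 8, 3
blocks = [rng.standard_normal((ell, ell)) for _ in range(m)]

# ell^2 FFTs of length m across the block index give the ell x ell
# "eigen-blocks" Lambda_p = sum_k B_k exp(-2*pi*i*p*k/m):
Lam = np.fft.fft(np.stack(blocks, axis=0), axis=0)

# Check the block diagonalization: with F the m x m DFT matrix,
# (F^H kron I) C (F kron I) / m is block diagonal with blocks Lambda_p.
F = np.fft.fft(np.eye(m))
P = np.kron(F, np.eye(ell))
D = P.conj().T @ block_circulant(blocks) @ P / m
```

After this transformation each eigen-block Λ_p can be factorized independently of the others (an ℓ×ℓ problem), which is also why the decomposition stage of BCEM parallelizes so readily.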
CPU Time (s)

  N     BCEM      CEM   speed-up
  32     0.94     3.08      3.3
  64     4.62    13.71      3.0
 128    23.37    68.28      2.9
 256   101.82   301.24      3.0

Figure 4.4 & Table 4.2: Average time required to simulate a single realization of the random field in Example 4.4, computed using 1000 independent samples.

Figure 4.4 gives the CPU times for the random field generation. We see that the actual computational cost increases with increase of N similarly to the theoretical one in Figure 4.3. Table 4.2 demonstrates that BCEM is nearly 3 times faster than CEM, as we expected from Table 1.1 and Figure 4.3. Also note that BCEM is highly parallelizable, so the computation cost can be further reduced using parallel algorithms.

We have compared BCEM and CEM on 2D examples here. It is not difficult to see (cf. Table 1.1) that in 3D cases BCEM can outperform CEM even more dramatically.

Remark 4.5.
The MATLAB codes for BCEM used for Examples 4.3 and 4.4 are available at .

Acknowledgements
This work was partially supported by the EPSRC grant EP/K031430/1.
References [1]
A. Barth, C. Schwab, and N. Zollinger, Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients, Numerische Mathematik, 119 (2011), pp. 123–161.[2]
K.A. Cliffe, M.B. Giles, R. Scheichl, and A.L. Teckentrup, Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients, Comput. Visual. Sci., 14 (2011), pp. 3–15.[3]
P.J. Davis , Circulant Matrices, AMS Chelsea Publishing, 2nd ed., 1979.[4]
J.P. Delhomme, Spatial variability and uncertainty in groundwater flow parameters, a geostatistical approach, Water Resour. Res., (1979), pp. 269–280.[5]
C.R. Dietrich and G.N. Newsam, A fast and exact method for multidimensional Gaussian stochastic simulations: Extension to realizations conditioned on direct and indirect measurements, Water Resour. Res., 32(6) (1996), pp. 1643–1652.[6] C.R. Dietrich and G.N. Newsam, Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix, SIAM J. Sci. Comput., 18 (1997), pp. 1088–1107.[7]
M. Eleftheriou, B. Fitch, A. Rayshubskiy, T. Ward, and R. Germain, Performance measurements of the 3D FFT on the Blue Gene/L supercomputer, in Euro-Par 2005 Parallel Processing: 11th International Euro-Par Conference, Lisbon, Portugal, August 30 - September 2, 2005, J. Cunha and P. Medeiros, eds., vol. 3648 of Lecture Notes in Computer Science, Springer-Verlag, 2005, pp. 795–803.[8]
M. Frigo and S.G. Johnson, FFTW: An adaptive software architecture for the FFT, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 3, 1998, pp. 1381–1384.[9]
K.A. Gallivan and S. Thirumalai, High performance algorithm for Toeplitz and block Toeplitz matrices, Linear Algebra and its Applications, 241-243 (1996), pp. 343–388.[10]
Y. Gel, A.E. Raftery, T. Gneiting, C. Tebaldi, D. Nychka, W. Briggs, M.S. Roulston, and V.J. Berrocal, Calibrated probabilistic mesoscale weather field forecasting: The geostatistical output perturbation method, Journal of the American Statistical Association, 99 (2004), pp. 575–590.[11]
R.G. Ghanem and P.D. Spanos, Stochastic finite elements: A spectral approach, Springer, New York, 1991.[12]
T. Gneiting, H. Ševčíková, D.B. Percival, M. Schlather, and Y. Jiang, Fast and exact simulation of large Gaussian lattice systems in R²: Exploring the limits, J. Comput. Graph. Stat., 15 (2006), pp. 1–19.[13] G.H. Golub and C.F. van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, 3rd ed., 1996.[14]
H. Helgason, V. Pipiras, and P. Abry, Smoothing windows for the synthesis of Gaussian stationary random fields using circulant matrix embedding, J. Comput. Graph. Stat., 23 (2014), pp. 616–635.[15]
O. Le Maître and O. Knio, Spectral Methods for Uncertainty Quantification: With Applications to Computational Fluid Dynamics, Springer, 2010.[16]
N. Marheineke and R. Wegener, Fiber dynamics in turbulent flows: General modeling framework, SIAM J. Appl. Math., 66 (2006), pp. 1703–1726.[17]
M. Park and K.A. Cliffe, Conditional multilevel Monte Carlo simulation of groundwater flow in the Culebra Dolomite at the Waste Isolation Pilot Plant (WIPP) site, arXiv:1402.5257, (2014).[18]
A.A. Skordos and M.P.F. Sutcliffe, Stochastic simulation of woven composites forming, Composites Science and Technology, 68 (2008), pp. 283–296.[19]
M. Stein, Fast and exact simulation of fractional Brownian surfaces, J. Comput. Graph. Stat., 11 (2002), pp. 587–599.[20]
M. Stewart, Cholesky factorization of semidefinite Toeplitz matrices, Linear Algebra and its Applications, 254 (1997), pp. 497–525.[21]
B. Verleye, D. Nuyens, A. Walbran, and M. Gan, Uncertainty quantification in liquid composite moulding processes, in Proceedings of the FPCM11 conference, Auckland, July 2012, pp. 265–271.[22]