Discretization of transfer operators using a sparse hierarchical tensor basis - the Sparse Ulam method
Oliver Junge and Péter Koltai
Faculty for Mathematics, Technische Universität München
October 24, 2018
Abstract
The global macroscopic behaviour of a dynamical system is encoded in the eigenfunctions of a certain transfer operator associated to it. For systems with low dimensional long term dynamics, efficient techniques exist for a numerical approximation of the most important eigenfunctions, cf. [7]. They are based on a projection of the operator onto a space of piecewise constant functions supported on a neighborhood of the attractor – Ulam's method.

In this paper we develop a numerical technique which makes Ulam's approach applicable to systems with higher dimensional long term dynamics. It is based on ideas for the treatment of higher dimensional partial differential equations using sparse grids [31, 2]. We develop the technique, establish statements about its complexity and convergence and present two numerical examples.
1 Introduction

Recently, numerical techniques have been developed which enable a coarse grained, yet global statistical analysis of the long term behaviour of certain dynamical systems. The basic algorithmic approach is to construct a box covering of some set of interest in phase space (e.g. the attractor of the system) [4, 5]. The cells in this covering then constitute the states of a finite Markov chain. The transition matrix of this chain (i.e. the matrix of transition probabilities between the boxes) can be viewed as a finite approximation to the transfer (or Frobenius–Perron) operator of the system. This operator describes how probability distributions on phase space evolve according to the dynamical system under consideration. In certain cases and in the appropriate functional analytic setting, eigenmodes of this operator can be used to characterize the long term behaviour of the dynamics. Certain stationary distributions of the operator characterize how frequently typical trajectories visit certain parts of phase space. Eigenmodes at roots of unity enable the detection of macroscopic cycles in the dynamics, and eigenmodes at real eigenvalues close to one yield a decomposition of phase space into almost invariant sets, i.e. sets for which the probability for a typical point to be mapped back into the set is large [7]. The latter concept has e.g. been used in order to detect and compute biomolecular conformations, cf. [9, 26, 28, 27, 10].

Formally, the construction of the Markov chain can be viewed as projecting the transfer operator onto the space of functions which are piecewise constant on the elements of the box covering. Ulam conjectured [30] that for maps on the interval, the stationary distribution of the chain converges to an invariant density (i.e. a stationary distribution) of the map. This has been proved for certain expanding maps by Li [25] and since then for various special classes of maps or stochastic processes also in higher dimensions [12, 11, 15, 13, 16, 7].

Ulam's method in combination with the subdivision approach from [4, 5] for the computation of the box covering works fine for systems with a low dimensional attractor, cf. also [3, 8]. For systems with higher dimensional long term dynamics the approach becomes inefficient due to the curse of dimension: the number of boxes in the covering scales exponentially in the dimension of the attractor. Adaptive approaches to the construction of the box covering [6, 21] do not remedy this fact.

In this paper we propose to attack this discretization task using ideas from sparse grids [29, 31, 2]. In this approach, which is e.g. being used in the numerical solution of partial differential equations on higher dimensional domains, a basis on [0,1]^d is built from a hierarchical basis on
[0,1] via a tensor product construction. The entire basis can be decomposed into subspaces which are spanned by basis functions of the same level of the 1d hierarchy in each factor. To each subspace one can associate its approximation benefit and its cost (which is typically given by its dimension). The idea of the sparse grid approach is to assemble a finite dimensional approximation space by choosing only those subspaces with the highest benefit to cost ratio.

In order to discretize the Frobenius–Perron operator, we employ a piecewise constant sparse hierarchical tensor basis (i.e. using the Haar system as the underlying 1d basis). This basis provides an approximation error of O(n^{-1} · (log √n)^{d-1}) for functions with bounded first derivatives, requiring a computational effort of O(n · (log √n)^{d-1}) (where n denotes the number of degrees of freedom in one coordinate direction and d is the dimension of phase space). In comparison, the standard Ulam basis requires O(n^d) basis functions in order to obtain an approximation error of O(n^{-1}).

The paper is structured as follows: in Section 2.1 we collect relevant basic concepts from dynamical systems theory, in particular Ulam's method. In Section 3 we develop the Sparse Ulam method by constructing the hierarchical tensor basis, deriving approximation properties, outlining the construction of the optimal approximation subspace and comparing cost and accuracy of the new method with the standard Ulam approach. The section closes with statements about the convergence properties. In Section 4 we collect considerations concerning an efficient implementation of our approach. In particular, we derive estimates on the computational effort as a function of the required accuracy. Section 5 presents two numerical examples: a comparison with Ulam's method for a three dimensional map with a smooth invariant density and a computation of the leading eigenfunctions of the transfer operator for a four-dimensional map, constructed via a tensor product from two two-dimensional standard maps.

Our implementation of the Sparse Ulam method as well as the code for the example computations is freely available from the homepage of the authors.

2 Preliminaries

2.1 Transfer operators

Let S : X → X, X ⊂ R^d, be a discrete dynamical system which is measurable w.r.t. the Borel σ-algebra B on X. Let M_C be the set of all bounded complex valued measures on (X, B) and M ⊂ M_C be the subset of probability measures. The Frobenius–Perron operator (or transfer operator) P : M_C → M_C,

    Pμ = μ ∘ S^{-1},    (2.1)

describes how (probability) measures on phase space evolve according to the dynamics defined by S. A measure is called invariant if it is a fixed point of P. A set A ⊂ X is called invariant if A = S^{-1}(A). An invariant probability measure μ is ergodic if every invariant set has either full or zero μ-measure. Birkhoff's ergodic theorem [1] states that ergodic measures characterize the long time behaviour of the system: Let μ be ergodic and φ : X → R be a μ-integrable observable, then

    lim_{n→∞} (1/n) ∑_{k=0}^{n-1} φ(S^k(x)) = ∫_X φ dμ    (2.2)

for μ-almost all x ∈ X.
Definition 2.1: A probability measure μ is called SRB measure or natural invariant measure if (2.2) holds for continuous observables φ and all points x ∈ U in a set U ⊂ X with positive Lebesgue measure.

SRB measures are defined via a property which we would like them to have. But how does one see whether a measure is SRB? After all, equation (2.2) is not easy to check in general. On the other hand, if μ is an ergodic measure which is absolutely continuous w.r.t. the Lebesgue measure m, i.e. if there is a density f with μ(A) = ∫_A f dm for all measurable A, then μ is SRB.

Using (2.1), we can directly define the Frobenius–Perron operator on Lebesgue integrable functions f : X → C:

    ∫_A Pf dm = ∫_{S^{-1}(A)} f dm    for all A ∈ B.    (2.3)

If S is differentiable, we obtain the explicit expression

    Pf(x) = ∑_{y ∈ S^{-1}(x)} f(y) / |det DS(y)|.
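For a concrete feel of this formula, here is a minimal sketch (ours, not from the paper): it evaluates Pf for the tent map S(x) = 1 − 2|x − 1/2|, whose two preimage branches of a point x are x/2 and 1 − x/2 with |S′| = 2, and confirms numerically that the Lebesgue density f ≡ 1 is invariant.

```python
import numpy as np

def tent(x):
    """Tent map S(x) = 1 - 2|x - 1/2| on [0, 1]."""
    return 1.0 - 2.0 * np.abs(x - 0.5)

def transfer_tent(f, x):
    """Frobenius-Perron operator of the tent map:
    (Pf)(x) = sum over preimages y of f(y)/|S'(y)|, with |S'| = 2 on both branches."""
    y1 = x / 2.0          # preimage on the increasing branch
    y2 = 1.0 - x / 2.0    # preimage on the decreasing branch
    return 0.5 * (f(y1) + f(y2))

x = np.linspace(0.0, 1.0, 11)
lebesgue = lambda x: np.ones_like(x)
print(transfer_tent(lebesgue, x))   # constant 1: the Lebesgue density is invariant
```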
2.2 Almost invariance

Invariant measures (or densities) are fixed points of the Frobenius–Perron operator, i.e. eigenmeasures resp. eigenfunctions at the eigenvalue 1. Eigenvectors at eigenvalues close to one are related to almost invariant sets: Intuitively, an almost invariant set of S is a subset A ⊂ X such that the invariance ratio

    ρ_m(A) := m(S^{-1}(A) ∩ A) / m(A)

is close to 1, i.e. a point which is chosen randomly from A with respect to the measure m maps into A with high probability. More precisely, we say that:

Definition 2.2: A subset A ⊂ X is ϱ-almost-invariant w.r.t. the probability measure μ if μ(A) ≠ 0 and

    ρ_μ(A) = ϱ.    (2.4)

Let ν ∈ M_C be an eigenmeasure of P at an eigenvalue λ ≠ 1. Since λν(X) = (Pν)(X) = ν(S^{-1}(X)) = ν(X), it follows that ν(X) = 0. In particular, if λ < 1 and ν are real, then there are two positive real measures ν^+, ν^- with disjoint supports such that ν = ν^+ − ν^- (Hahn–Jordan decomposition). The following theorem relates the invariance ratios of the supports of ν^+ and ν^- to the eigenvalue λ.

Theorem 2.3: [7] Let ν be a normalized eigenmeasure of P (i.e. |ν|(X) = 1) at the real eigenvalue λ < 1. Then

    ρ_{|ν|}(A^+) + ρ_{|ν|}(A^-) = λ + 1,    (2.5)

where A^+ = supp ν^+ and A^- = supp ν^-.

2.3 Ulam's method

In order to approximate the (most important) eigenfunctions of the Frobenius–Perron operator, we have to discretize the corresponding infinite dimensional eigenproblem. Ulam [30] proposed to project the L¹ eigenvalue problem Pf = λf onto a finite dimensional subspace of piecewise constant functions: Let (V_n)_{n∈N} be a sequence of approximation subspaces of L¹ with dim V_n = n and let Q_n : L¹ → V_n be corresponding projections onto V_n. The sequences (V_n) and (Q_n) should be chosen such that Q_n converges pointwise to the identity on L¹. We define the discretised Frobenius–Perron operator as P_n := Q_n P|_{V_n}.

We now choose the approximation spaces to be spanned by piecewise constant functions. To this end, let X_n = {X_1, ..., X_n} be a disjoint partition of X with m(X_i) → 0 as n → ∞. Define V_n := span{χ_1, ..., χ_n}, where χ_i denotes the characteristic function of X_i. Further, let

    Q_n h := ∑_{i=1}^n c_i χ_i    with    c_i := (1/m(X_i)) ∫_{X_i} h dm.

One checks that P_n Δ_n^+ ⊆ Δ_n^+ and P_n Δ_n ⊆ Δ_n, where Δ_n := {h ∈ V_n : ∫ |h| dm = 1} and Δ_n^+ := {h ∈ Δ_n : h ≥ 0}. Due to Brouwer's fixed point theorem there always exists an approximate invariant density f_n = P_n f_n ∈ Δ_n^+. The matrix representation of the linear map P_n : Δ_n → Δ_n w.r.t. the basis of characteristic functions is given by the transition matrix with entries

    p_ij = m(X_j ∩ S^{-1}(X_i)) / m(X_j).    (2.6)

Ulam conjectured [30] that if P has a unique stationary density f, then the sequence (f_n)_{n∈N} converges to f in L¹. It is still an open question under which conditions on S this is true in general. Li [25] proved the conjecture for expanding, piecewise continuous interval maps, Ding and Zhou [13] for the corresponding multidimensional case. In [7], Ulam's method was applied to a small random perturbation of S which might be chosen such that the corresponding transfer operator is compact. In this case, perturbation results [23] (Section IV.3.5) for the spectrum of compact operators imply convergence.

2.4 Monte-Carlo quadrature

The computation of one matrix entry (2.6) requires a d-dimensional quadrature. A standard approach to this is Monte-Carlo quadrature (also cf. [20]), i.e.

    p_ij ≈ (1/K) ∑_{k=1}^K χ_i(S(x_k)),    (2.7)

where the points x_1, ..., x_K are chosen i.i.d. from X_j according to a uniform distribution. In [19], a recursive exhaustion technique has been developed in order to compute the entries to a prescribed accuracy. However, this approach relies on the availability of local Lipschitz estimates on S which might not be cheaply computable in the case that S is given as the time-T-map of a differential equation.

For the Monte-Carlo technique, consider a uniform partition of the unit cube into M^d congruent cubes of edge length 1/M.
Let P_M denote the transition matrix for this partition and let P̃_M be its Monte-Carlo approximation. According to the central limit theorem (and its error estimate, the Berry–Esséen theorem [14]) we have

    |p̃_ij − p_ij| ≲ 1/√K    (2.8)

for the absolute error of the entries of P̃_M. As a consequence, we need

    ϱ ≳ M^d / TOL²    (2.9)

sample points in total in order to achieve an absolute error of less than TOL for all the entries p_ij. Note that the accuracy of the entries of P̃_M imposes a restriction on the achievable accuracy of the eigenvectors of P_M. Here and in the following, we write a(K) ≲ b(K) if there is a constant c > 0, independent of K, such that a(K) ≤ c·b(K); a(K) ≳ b(K) is defined analogously.
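As an illustration of (2.6) and (2.7) (our sketch, not the authors' implementation), the following fragment assembles a Monte-Carlo estimate of the Ulam transition matrix on the uniform partition of [0,1]^d into M^d boxes; the example map at the end is an arbitrary choice.

```python
import numpy as np

def ulam_matrix(S, M, K, d=2, rng=np.random.default_rng(0)):
    """Monte-Carlo estimate of the Ulam transition matrix (2.6) on the
    uniform partition of [0,1]^d into M^d congruent boxes, using (2.7)."""
    n = M ** d
    P = np.zeros((n, n))
    for j in range(n):
        # lower corner of box j (row-major ordering of the boxes)
        corner = np.array(np.unravel_index(j, (M,) * d)) / M
        x = corner + rng.random((K, d)) / M          # K uniform samples in box j
        y = S(x) % 1.0                               # map the samples
        i = np.ravel_multi_index(tuple(np.minimum((y.T * M).astype(int), M - 1)),
                                 (M,) * d)           # indices of the image boxes
        np.add.at(P, (i, j), 1.0 / K)                # p_ij ~ fraction mapped into box i
    return P

# usage with a simple expanding map on the unit square
S = lambda x: np.column_stack((2 * x[:, 0], 3 * x[:, 1]))
P = ulam_matrix(S, M=16, K=200)
print(P.sum(axis=0)[:5])   # column sums equal 1 (column-stochastic matrix)
```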
3 The Sparse Ulam method

A naive application of Ulam's method to higher dimensional systems suffers from the curse of dimension: in order to achieve an L¹-accuracy of O(ε) one needs an approximation space of dimension O(ε^{-d}) – translating into a prohibitively large computational effort for higher dimensional systems. There is a remedy to this problem for systems with low dimensional long term dynamics [5, 7]: the idea is to first compute a covering of the attractor of the system. On this (low dimensional) covering, Ulam's method can successfully be applied.

To avoid the exponential growth of complexity in the system (or attractor) dimension, we now follow an idea which was originally developed for quadrature problems [29] and used for the treatment of higher dimensional partial differential equations, cf. for example [31, 2]: sparse grids. In fact, we change from the standard Ulam basis to a sparse hierarchical one in order to obtain a better cost/accuracy relation. In the following, we discuss the chosen basis in detail, as well as its advantages and disadvantages.

We describe the Haar basis on the d-dimensional unit cube [0,1]^d, deriving the multidimensional basis functions from the one dimensional ones, see e.g. [18]. Let

    f_Haar(x) = −sign(x) · (|x| ≤ 1),    (3.1)
where (|x| ≤ 1) equals 1 if the inequality is true and 0 otherwise. A basis function of the Haar basis is defined by the two parameters level i and center (point) j:

    f_{i,j}(x) := 1                                    if i = 0,
    f_{i,j}(x) := 2^{(i-1)/2} · f_Haar(2^i (x − x_{i,j}))    if i ≥ 1,    (3.2)

where

    x_{i,j} := (2j+1)/2^i,    j ∈ {0, ..., 2^{i-1} − 1}.    (3.3)

A d-dimensional basis function is constructed from the one dimensional ones using a tensor product construction:

    φ_{ℓ,j}(x) := ∏_{i=1}^d f_{ℓ_i, j_i}(x_i),    (3.4)

for x = (x_1, ..., x_d) ∈ [0,1]^d. Here ℓ = (ℓ_1, ..., ℓ_d), ℓ_i ∈ {0, 1, 2, ...}, denotes the level of the basis function and j = (j_1, ..., j_d), j_i ∈ {0, ..., 2^{ℓ_i−1} − 1}, its center.

Theorem 3.1 (Haar basis): The set H = {f_{i,j} | i ∈ N_0, j ∈ {0, ..., 2^{i-1} − 1}} is an orthonormal basis of L²([0,1]), the Haar basis. Similarly, the set H^d = {φ_{ℓ,j} | ℓ ∈ N_0^d, j_i ∈ {0, ..., 2^{ℓ_i−1} − 1}} is an orthonormal basis of L²([0,1]^d).
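The construction (3.1)–(3.4) translates directly into code; the following sketch (ours) evaluates the one dimensional functions f_{i,j} and a tensor product function φ_{ℓ,j}, and checks the L²-normalization of one of them by Monte-Carlo integration.

```python
import numpy as np

def f_haar(x):
    """Mother function (3.1): +1 on [-1,0), -1 on (0,1], 0 outside."""
    return -np.sign(x) * (np.abs(x) <= 1)

def f_1d(i, j, x):
    """One dimensional Haar basis function f_{i,j} of level i and center index j, cf. (3.2)-(3.3)."""
    if i == 0:
        return np.ones_like(x, dtype=float)
    center = (2 * j + 1) / 2 ** i
    return 2 ** ((i - 1) / 2) * f_haar(2 ** i * (x - center))

def phi(levels, centers, x):
    """Tensor product basis function (3.4); x has shape (npoints, d)."""
    vals = np.ones(x.shape[0])
    for axis, (li, ji) in enumerate(zip(levels, centers)):
        vals *= f_1d(li, ji, x[:, axis])
    return vals

# L2-normalization check of a 2d basis function by Monte-Carlo integration
x = np.random.default_rng(1).random((200000, 2))
print(np.mean(phi((2, 1), (1, 0), x) ** 2))   # approximately 1
```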
Figure 1: First three levels of the 1D Haar basis.

Figure 1 shows the basis functions of the first three levels of the one dimensional Haar basis. It will prove useful to collect all basis functions of one level in one subspace:

    W_ℓ := span{φ_{ℓ,j} | j_i ∈ {0, ..., 2^{ℓ_i−1} − 1}},    ℓ ∈ N_0^d.    (3.5)

Consequently, L² = L²([0,1]^d) can be written as the infinite direct sum of the subspaces W_ℓ,

    L² = ⊕_{ℓ ∈ N_0^d} W_ℓ.    (3.6)

In fact, the analogous decomposition holds for L¹ = L¹([0,1]^d) as well. To see this, note that ⊕_{ℓ ∈ I} W_ℓ with I = {ℓ | ‖ℓ‖_∞ ≤ n} is the space spanned by the characteristic functions of the uniform decomposition of the unit cube into 2^n subcubes in every direction. Moreover, we have

    dim W_ℓ = ∏_{i=1}^d max{1, 2^{ℓ_i − 1}} = 2^{∑_{ℓ_i ≠ 0} (ℓ_i − 1)}.    (3.7)

In order to get a finite dimensional approximation space most appropriate for our purposes, we are going to choose an optimal finite subset of the basis functions φ_{ℓ,j}. Since in general we do not have any a priori information about the function to be approximated, and since all basis functions in one subspace W_ℓ deliver the same contribution to the approximation error, we will use either all or none of them. In other words, the choice of the approximation space is transferred to the level of the subspaces W_ℓ.

The choice of the optimal set of subspaces W_ℓ relies on the contribution of each of these to the approximation error. The following statements give estimates on this.
Lemma 3.2: Let f ∈ C¹([0,1]) and let c_{i,j} be its coefficients with respect to the Haar basis, i.e. f = ∑_{i,j} c_{i,j} f_{i,j}. Then for i > 0 and all j,

    |c_{i,j}| ≤ 2^{−(3i+1)/2} ‖f′‖_∞.

For f ∈ C¹([0,1]^d) we analogously have, for ℓ ≠ 0 and all j,

    |c_{ℓ,j}| ≤ 2^{−(∑_{ℓ_i ≠ 0} (3ℓ_i+1))/2} ∏_{ℓ_i ≠ 0} ‖∂_i f‖_∞.

Proof.
For i ≥ 1 we have

    2^{−(i−1)/2} c_{i,j} = ∫_{x_{i,j} − 2^{−i}}^{x_{i,j}} f − ∫_{x_{i,j}}^{x_{i,j} + 2^{−i}} f
                        = ∫_{x_{i,j} − 2^{−i}}^{x_{i,j}} ( f(x_{i,j}) + ∫_{x_{i,j}}^{x} f′ ) dx − ∫_{x_{i,j}}^{x_{i,j} + 2^{−i}} ( f(x_{i,j}) + ∫_{x_{i,j}}^{x} f′ ) dx

and thus

    2^{−(i−1)/2} |c_{i,j}| ≤ 2 ‖f′‖_∞ ∫_0^{2^{−i}} x dx,

which yields the claimed estimate for the 1D case. The bound in the d-dimensional case follows similarly. □

Using this bound on the contribution of a single basis function to the approximation of a given function f, we can derive a bound on the total contribution of a subspace W_ℓ. For the component f_ℓ ∈ W_ℓ of f,

    ‖f_ℓ‖_{L¹} ≤ 2^{−∑_{ℓ_i≠0} (ℓ_i+1)} ∏_{ℓ_i≠0} ‖∂_i f‖_∞,    (3.8)
    ‖f_ℓ‖_{L²} ≤ 2^{−∑_{ℓ_i≠0} (ℓ_i+3)/2} ∏_{ℓ_i≠0} ‖∂_i f‖_∞.    (3.9)

The main idea of the sparse grid approach is to choose cost and (approximation) benefit of the approximation subspace in an optimal way. We briefly sketch this idea here; for a detailed exposition see [31, 2]. For a set I ⊂ N_0^d of multiindices we define W_I = ⊕_{ℓ ∈ I} W_ℓ. Correspondingly, for f ∈ L¹, let f_I = ∑_{ℓ ∈ I} f_ℓ, where f_ℓ is the orthogonal projection of f onto W_ℓ. We define the cost C(ℓ) of a subspace W_ℓ as its dimension,

    C(ℓ) = dim W_ℓ = 2^{∑_{ℓ_i≠0} (ℓ_i−1)}.

Since

    ‖f − f_I‖ ≤ ∑_{ℓ ∉ I} ‖f_ℓ‖ = ∑_{ℓ ∈ N_0^d} ‖f_ℓ‖ − ∑_{ℓ ∈ I} ‖f_ℓ‖,    (3.10)

the guaranteed increase in accuracy is bounded by the contribution of a subspace W_ℓ which we add to the approximation space. We therefore define the benefit B(ℓ) of W_ℓ as the upper bound on its L¹-contribution as derived above,

    B(ℓ) = 2^{−∑_{ℓ_i≠0} (ℓ_i+1)}.    (3.11)

Note that we omitted the factor involving derivatives of f. The reason is that it does not affect the solution of the optimization problem (3.12).

Let C(I) = ∑_{ℓ ∈ I} C(ℓ) and B(I) = ∑_{ℓ ∈ I} B(ℓ) be the total cost and the total benefit of the approximation space W_I. In order to find the optimal approximation space we now solve the following optimization problem: Given a bound c > 0, find W_I which solves

    max_{C(I) ≤ c} B(I).    (3.12)

One can show (cf. [2]) that I ⊂ N_0^d is an optimal solution to (3.12) iff

    C(ℓ)/B(ℓ) = const    for ℓ ∈ ∂I,    (3.13)

where the boundary ∂I is given by ∂I = {ℓ ∈ I | ℓ′ ∈ I, ℓ′ ≥ ℓ ⇒ ℓ′ = ℓ} (here ℓ′ ≥ ℓ is meant componentwise). Using the definitions for cost and benefit as introduced above, we obtain

    C(ℓ)/B(ℓ) = 2^{∑_{ℓ_i≠0} (ℓ_i−1)} · 2^{∑_{ℓ_i≠0} (ℓ_i+1)} = 2^{2 ∑_{ℓ_i≠0} ℓ_i} = 2^{2|ℓ|_1},    (3.14)

where |ℓ|_1 denotes the 1-norm of the vector ℓ. The optimality condition (3.13) thus translates into the simple condition

    |ℓ|_1 = const    for ℓ ∈ ∂I.    (3.15)

As a result, the optimal approximation space is W_{I(N)} with

    I(N) = {ℓ ∈ N_0^d : |ℓ|_1 ≤ N},    (3.16)

where the level N = N(c) ∈ N depends on the chosen cost bound c. Figure 2 schematically shows the basis functions of the optimal subspace in 2D for N = 3.
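For illustration (our sketch), the following lines enumerate the optimal index set I(N) from (3.16), sum the subspace dimensions (3.7), and compare the result with the dimension 2^{Nd} of the full Ulam space of the same resolution.

```python
from itertools import product

def dim_W(level):
    """Dimension of the subspace W_l, cf. (3.7)."""
    return 2 ** sum(li - 1 for li in level if li != 0)

def sparse_index_set(N, d):
    """Optimal index set I(N) = { l in N_0^d : |l|_1 <= N }, cf. (3.16)."""
    return [l for l in product(range(N + 1), repeat=d) if sum(l) <= N]

N, d = 6, 3
I = sparse_index_set(N, d)
sparse_dim = sum(dim_W(l) for l in I)
full_dim = 2 ** (N * d)                 # Ulam space with 2^N boxes per direction
print(len(I), sparse_dim, full_dim)     # 84 subspaces, 688 vs 262144 degrees of freedom
```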
Remark 3.3:
Because of the orthogonality of the Haar basis in L², one can take the squared contribution as the benefit in the L²-case (resulting in equality in (3.10)). In this case we obtain the optimality condition

    ∑_{ℓ_i≠0} (ℓ_i + 1) = const    for ℓ ∈ ∂I    (3.17)

and correspondingly W_{I(N)} with

    I(N) = {ℓ ∈ N_0^d : ∑_{ℓ_i≠0} (ℓ_i + 1) ≤ N},    (3.18)

N = N(c), as the optimal approximation space.

Figure 2: 3rd level sparse basis in 2D. Shaded means value 1, white means value −1.

Having chosen the optimal approximation space V_N = W_{I(N)}, we now build the corresponding discretized Frobenius–Perron operator P_N. Since the sparse basis

    B_N := {φ_{ℓ,j} | |ℓ|_1 ≤ N, j_i ∈ {0, ..., 2^{ℓ_i−1} − 1}}    (3.19)

is an L²-orthogonal basis of V_N, the natural projection Q_N : L² → V_N is given by

    Q_N f = ∑_{φ ∈ B_N} ( ∫ f φ ) φ.    (3.20)

All basis functions φ ∈ B_N are piecewise constant and have compact support, so Q_N is well defined on L¹ as well. Choosing an arbitrary enumeration, the (transition) matrix of the discretized Frobenius–Perron operator P_N = Q_N ∘ P with respect to B_N has entries

    p_ij = ∫ φ_i P φ_j.    (3.21)

Writing φ_i = φ_i^+ − φ_i^- = |φ_i| · (χ_i^+ − χ_i^-), where |φ_i| is the (constant) absolute value of the function over its support and χ_i^+ and χ_i^- are the characteristic functions of the supports of the positive and negative parts of φ_i, we obtain

    p_ij = |φ_i||φ_j| ( ∫ χ_i^+ P χ_j^+ − ∫ χ_i^- P χ_j^+ − ∫ χ_i^+ P χ_j^- + ∫ χ_i^- P χ_j^- ),    (3.22)

which is, by (2.6),

    p_ij = |φ_i||φ_j| ∑_± ± m(X_j^± ∩ S^{-1}(X_i^±)),    (3.23)

where X_i^± = supp φ_i^± and the four summands carry the signs as in (3.22). These can be computed in the same way as presented in Section 2.
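To make (3.23) concrete, here is a small sketch (ours) that assembles one entry p_ij from the four box-transition measures; the helper transition_measure, which estimates m(X_j ∩ S^{-1}(X_i)) by Monte-Carlo sampling of X_j, is an assumption of this sketch and not part of the paper's code.

```python
import numpy as np

def transition_measure(Xj, Xi, S, n_samples=10000, rng=np.random.default_rng(2)):
    """Monte-Carlo estimate of m(Xj ∩ S^{-1}(Xi)) for axis-parallel boxes
    Xj, Xi given as arrays of shape (d, 2) of lower/upper bounds."""
    lo, hi = Xj[:, 0], Xj[:, 1]
    x = lo + rng.random((n_samples, len(lo))) * (hi - lo)
    y = S(x)
    inside = np.all((y >= Xi[:, 0]) & (y <= Xi[:, 1]), axis=1)
    return np.prod(hi - lo) * inside.mean()

def sparse_entry(phi_i_abs, phi_j_abs, Xi_plus, Xi_minus, Xj_plus, Xj_minus, S):
    """Entry p_ij of the sparse transition matrix according to (3.23)."""
    return phi_i_abs * phi_j_abs * (
        + transition_measure(Xj_plus,  Xi_plus,  S)
        - transition_measure(Xj_plus,  Xi_minus, S)
        - transition_measure(Xj_minus, Xi_plus,  S)
        + transition_measure(Xj_minus, Xi_minus, S))
```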
Remark 3.4: We note that

(a) If the i-th basis function is the one corresponding to ℓ = (0, ..., 0) (i.e. the constant function), then p_ij = δ_ij.

(b) The entries of P_N are bounded via

    |p_ij| ≤ √( m(X_j) / m(X_i) ) ≤ 2^{N/2}.

(c) If P_N x = λx with λ ≠ 1, then x_i = 0 if the i-th basis function is the one corresponding to ℓ = (0, ..., 0). Indeed, by (a),

    x_i = (e_i^⊤ P_N) x = (P_N x)_i = λ x_i,    (3.24)

and λ ≠ 1 implies x_i = 0. It is straightforward to show that this property is shared by every Ulam type projection method with a constant function as element of the basis of the approximation space. This observation is useful for the reliable computation of an eigenvector at an eigenvalue close to one (since it is badly conditioned): (3.24) allows us to reduce the eigenproblem to the subspace orthogonal to the constant function.

With the given change in (c), properties (a)–(c) are valid for the numerical realisation as well.

As has been pointed out in the Introduction and in Section 2.3, statements about the convergence of Ulam's method exist in certain cases. Note that for N = kd, k = 0, 1, 2, ..., the approximation space W_{I(N)} contains the Ulam approximation space ⊕_{ℓ ≤ (k,...,k)} W_ℓ on the uniform partition into 2^k subcubes per direction, and thus we obtain convergence of the Sparse Ulam method in these cases as a corollary to the convergence of Ulam's method, using the following lemma (which can be proved by standard arguments). An open question is whether, in general, the convergence of Ulam's method implies convergence of Sparse Ulam.

Lemma 3.5: ‖Q_N f − f‖_{L¹} → 0 as N → ∞ for every f ∈ L¹.

4 Complexity

In this section, we collect basic statements about the complexity of both methods.

4.1 Cost and accuracy
We defined the total cost of an approximation space as its dimension and its accuracy via its contribution or benefit, see (3.11). In this section we derive a recurrence formula for these numbers, depending on the level of the optimal subspaces and the system dimension.

Let C(N,d) be the dimension of W_{I(N)} in phase space dimension d. Then

    C(N,d) = C(N, d−1) + ∑_{k=1}^N C(N−k, d−1) · 2^{k−1},    (4.1)

since if ℓ = (∗, ..., ∗, 0), the number of basis functions with such ℓ's is C(N, d−1), and if ℓ = (∗, ..., ∗, ℓ_d) with ℓ_d > 0, then the number of basis functions with such ℓ's is C(N−ℓ_d, d−1) · 2^{ℓ_d−1}, because there are 2^{ℓ_d−1} one-dimensional basis functions of level ℓ_d possible for the tensor product in the last dimension. For d = 1 we simply deal with the standard Haar basis, so C(N, 1) = 2^N.

Lemma 4.1:

    C(N,d) ≐ N^{d−1} 2^{N−d+1} / (d−1)!,    (4.2)

where ≐ means equality of the leading order term in N.

Proof.
By induction on d. The claim clearly holds for d = 1. Assume it holds for d−1; moreover, C(N,d) = p(N) 2^N, where p is a polynomial of degree at most d−1. Consequently,

    C(N,d) ≐ N^{d−2} 2^{N−d+2}/(d−2)! + ∑_{k=1}^N (N−k)^{d−2} 2^{N−k−d+2}/(d−2)! · 2^{k−1}
           = N^{d−2} 2^{N−d+2}/(d−2)! + 2^{N−d+1}/(d−2)! · ∑_{k=1}^N (N−k)^{d−2}
           ≐ N^{d−2} 2^{N−d+2}/(d−2)! + 2^{N−d+1}/(d−2)! · N^{d−1}/(d−1)
           ≐ N^{d−1} 2^{N−d+1}/(d−1)!. □

The approximation error ‖f − f_I‖ is bounded by ∑_{ℓ∉I} ‖f_ℓ‖, i.e.

    ‖f − f_I‖ ≤ ∑_{|ℓ|_1 > N} ‖f_ℓ‖,

if we use the optimal approximation space W_{I(N)}. By (3.8) this means

    ‖f − f_I‖ ≤ ∑_{|ℓ|_1 > N} [ 2^{−∑_{ℓ_i≠0}(ℓ_i+1)} ∏_{ℓ_i≠0} ‖∂_i f‖_∞ ].

The factors ∏_{ℓ_i≠0} ‖∂_i f‖_∞ only depend on the function to be approximated. Thus, without a priori knowledge about f we need to assume that they can be bounded by some common constant and accordingly define the discretization error of the N-th level sparse basis as

    E(N,d) = ∑_{|ℓ|_1 > N} 2^{−∑_{ℓ_i≠0}(ℓ_i+1)}.    (4.3)

For n ∈ N, n > 0, let E(−n,d) denote the corresponding sum taken over all ℓ ∈ N_0^d, and write ℓ = (ℓ̃, ℓ_d) with ℓ̃ ∈ N_0^{d−1}. Then

    E(N,d) = ∑_{|ℓ|_1 > N} 2^{−∑_{ℓ_i≠0}(ℓ_i+1)}
           = ∑_{ℓ_d=0}^∞ 2^{−(ℓ_d+1)(ℓ_d≠0)} ∑_{|ℓ̃|_1 > N−ℓ_d} 2^{−∑_{ℓ̃_i≠0}(ℓ̃_i+1)}
           = ∑_{ℓ_d=0}^∞ 2^{−(ℓ_d+1)(ℓ_d≠0)} E(N−ℓ_d, d−1),

where the expression (ℓ_i ≠ 0) has the value 1 if it is true and 0 otherwise. This leads, by splitting the sum, to the recurrence formula

    E(N,d) = E(N, d−1) + ∑_{k=1}^N E(N−k, d−1) 2^{−k−1} + ∑_{k=N+1}^∞ 2^{−k−1} E(−1, d−1),    (4.4)

where the last sum equals 2^{−N−1} E(−1, d−1). We easily compute that E(N,1) = 2^{−N−1} for N ≥ 0 and E(−1,d) = (3/2)^d.
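The recurrences (4.1) and (4.4) are straightforward to evaluate; the following sketch (ours) does so and can be used to check the leading order behaviour stated in Lemma 4.1 above and Lemma 4.2 below.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def C(N, d):
    """Dimension of the optimal approximation space W_{I(N)}, recurrence (4.1)."""
    if d == 1:
        return 2 ** N
    return C(N, d - 1) + sum(C(N - k, d - 1) * 2 ** (k - 1) for k in range(1, N + 1))

@lru_cache(maxsize=None)
def E(N, d):
    """Discretization error bound (4.3) via the recurrence (4.4)."""
    if N < 0:
        return 1.5 ** d                      # E(-n, d) = (3/2)^d
    if d == 1:
        return 2.0 ** (-N - 1)
    return (E(N, d - 1)
            + sum(E(N - k, d - 1) * 2.0 ** (-k - 1) for k in range(1, N + 1))
            + 2.0 ** (-N - 1) * E(-1, d - 1))

for N in (4, 8, 12):
    print(N, C(N, 3), E(N, 3))   # leading terms: N^2 2^{N-2}/2 and N^2 2^{-N-3}/2 for d = 3
```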
Lemma 4.2:

    E(N,d) ≐ N^{d−1} 2^{−N−d} / (d−1)!,    (4.5)

where, again, ≐ means equality of the leading order term in N.

Proof. By induction on d. The claim holds for d = 1; assume it holds for d−1. Then

    E(N,d) ≐ N^{d−2} 2^{−N−d+1}/(d−2)! + ∑_{k=1}^N (N−k)^{d−2} 2^{−N+k−d+1}/(d−2)! · 2^{−k−1} + (3/2)^{d−1} 2^{−N−1}
           ≐ N^{d−2} 2^{−N−d+1}/(d−2)! + 2^{−N−d}/(d−2)! · ∑_{k=1}^N (N−k)^{d−2}
           ≐ 2^{−N−d}/(d−2)! · N^{d−1}/(d−1)
           = N^{d−1} 2^{−N−d}/(d−1)!. □

Comparison with Ulam's method. We now compare the expressions for the asymptotic behaviour of cost and discretization error in dependence of the discretization level N and the problem dimension d in Lemmas 4.1 and 4.2 to the corresponding expressions for the standard Ulam basis, i.e. the span of the characteristic functions on a uniform partition of the unit cube into cubes of edge length 2^{−M} in each coordinate direction – this is ⊕_{‖ℓ‖_∞ ≤ M} W_ℓ. This space consists of (2^M)^d basis functions, the discretization error is O(2^{−M}).

We thus have – up to constants – the following asymptotic expressions for cost and error of the sparse and the standard basis:

                      cost                      error
    sparse basis      (N/2)^{d−1} · 2^N         (N/2)^{d−1} · 2^{−N}
    standard basis    2^{dM}                    2^{−M}

To highlight the main difference, consider the following simple computation: The expressions for the errors are equal if

    M = N + d − 1 − (d−1) log₂ N.

Using this value for M in the cost expressions, the sparse basis is cheaper than the standard one, i.e. N^{d−1} 2^{N−d+1} < 2^{dN + d(d−1) − d(d−1) log₂ N}, iff

    N/(d+1) + 1 > log₂ N,

which is satisfied for all sufficiently large N.

4.2 Numerical realisation

When we use Monte-Carlo quadrature in order to approximate the entries of the transition matrix in both methods, the overall computation breaks down into the following three steps:

1. mapping the sample points,
2. constructing the transition matrix,
3. solving the eigenproblem.

While steps 1 and 3 are identical for both methods, step 2 differs significantly. This is due to the fact that, in contrast to Ulam's method, the basis functions of the sparse hierarchical tensor basis have global and non-disjoint supports.
4.2.1 Number of sample points

Applying Monte-Carlo approximation to (3.22), we obtain

    p̃_ij = |φ_i||φ_j| [ m(X_j^+) (1/K_j^+) ∑_{k=1}^{K_j^+} ( χ_i^+(S(x_k^+)) − χ_i^-(S(x_k^+)) )    (4.7)
                      − m(X_j^-) (1/K_j^-) ∑_{k=1}^{K_j^-} ( χ_i^+(S(x_k^-)) − χ_i^-(S(x_k^-)) ) ],    (4.8)

where the sample points x_k^± are chosen i.i.d. from a uniform distribution on X_j^+ and X_j^-, respectively. In fact, since the union of the supports of the basis functions in one subspace W_ℓ covers all of X, we can reuse the same set of ϱ sample points and their images for each of the subspaces W_ℓ (i.e. (N+d choose d) times). Note that the number K_j^± of test points chosen in X_j^± now varies with j, since the supports of the various basis functions are of different size: on average, K_j^± = ϱ m(X_j^±). Accordingly, for the absolute error of p̃_ij we get

    |p̃_ij − p_ij| ∼ m(X_j) / ( √(m(X_i) m(X_j)) · √(ϱ m(X_j)) ) = 1/√(ϱ m(X_i)),    (4.9)

where we used that m(X_i^±) ∼ m(X_i). In the worst case we thus get |p̃_ij − p_ij| ∼ 2^{N/2}/√ϱ, which implies

    ϱ ≳ 2^N / TOL²    (4.10)

for the total number of test points required in order to achieve an accuracy of TOL in the entries of the transition matrix.
Comparison with Ulam’s method.
Aiming at a final accuracy of ε > 0, we choose M and N accordingly. Assuming that the corresponding eigenproblems are well conditioned, TOL = ε is a reasonable choice for the required accuracy of the entries. This implies a number of

    ϱ ≳ ε^{−(d+2)}

sample points for the standard realisation of Ulam's method (cf. Section 2.4), and yields, since 2^N ≲ ε^{−1} (log(ε^{−1}))^{d−1},

    ϱ ≳ ε^{−3} (log(ε^{−1}))^{d−1}

sample points for the Sparse Ulam method. Note that for d ≥ 2, the Sparse Ulam method requires fewer sample points than Ulam's method in order to achieve a comparable accuracy in the eigenvector approximation.
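Reusing one global sample set across all subspaces hinges on locating, for a point x and a level ℓ, the index j of the basis function φ_{ℓ,j} whose support contains x, together with the sign of φ_{ℓ,j} at x; this is what the sums in (4.7)–(4.8) consume. The following index computation is our sketch of this step.

```python
import numpy as np

def index_and_sign(level, x):
    """For a point x in [0,1)^d and a level vector l, return the center index j
    of the basis function phi_{l,j} whose support contains x, and the sign of
    phi_{l,j}(x).  (Our sketch; cf. (3.2)-(3.4).)"""
    j, sign = [], 1
    for li, xi in zip(level, x):
        if li == 0:
            j.append(0)                       # constant factor, always +1
            continue
        cell = int(xi * 2 ** li)              # dyadic cell of width 2^{-li}
        j.append(cell // 2)                   # support index of f_{li, j}
        sign *= 1 if cell % 2 == 0 else -1    # left half of the support +, right half -
    return tuple(j), sign

x = np.array([0.30, 0.70])
print(index_and_sign((2, 1), x))   # ((0, 0), 1)
```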
4.2.2 Number of index computations

While in Ulam's method each sample point is used in the computation of one entry of the transition matrix only, this is not the case in the Sparse Ulam method. In fact, each sample point (and its image) is used in the computation of many matrix entries, one entry for each pair (W_ℓ, W_m) of subspaces. Correspondingly, for each sample point x (and its image) and for each ℓ ∈ I(N), we have to compute the index j of the basis function φ_{ℓ,j} ∈ W_ℓ whose support contains x. Since (cf. the previous section) the required number of sample points is O(2^N/TOL²) and |I(N)| = (N+d choose d), this leads to

    ϱ · |I(N)| = (2^N/TOL²) (N+d choose d) ≲ (1/TOL²) (N^d/d!) 2^N ≐ (2^{d−1} N / d) · (1/TOL²) · dim V_N

of these computations (for reasonable d). In contrast, in Ulam's method, the corresponding number is

    ϱ · 1 = 2^{dM}/TOL² = (1/TOL²) dim V_M.

Note that for the Sparse Ulam method the number of index computations does not stay proportional to the dimension of the approximation space. However, it still scales much more mildly with d than for Ulam's method.

4.2.3 Matrix occupancy

The matrix which represents the discretized transfer operator in Ulam's method is sparse: the supports of the basis functions are disjoint, and thus p_ij ≠ 0 only if S(X_j) ∩ X_i ≠ ∅. Hence, for a sufficiently fine partition, the number of partition elements X_i which are intersected by the image S(X_j) is determined by the local expansion of S. This is a fixed number related to a Lipschitz estimate on S, and so the matrix of the discretized transfer operator with respect to the standard Ulam basis is sparse for sufficiently large n. Unfortunately, this property is not shared by the matrix with respect to the sparse basis, as the following considerations show.

The main reason for this is that the supports of the basis functions in the sparse basis are not localised, cf. the thin and long supports of the basis of W_ℓ for ℓ = (N, 0, ..., 0); the occupancy therefore depends on the global behaviour of the dynamical system S. Let B_ℓ := {φ_{ℓ,j} | j_i ∈ {0, ..., 2^{ℓ_i−1} − 1}} denote the basis of W_ℓ and let

    nnz(k, ℓ) = |{(i,j) | S(supp(φ_i)) ∩ supp(φ_j) ≠ ∅, φ_i ∈ B_k, φ_j ∈ B_ℓ}|

be the number of nonzero matrix entries which arise from the interaction of the basis functions from the subspaces W_k and W_ℓ if W_k is mapped. We define the matrix occupancy of a basis B_I = ∪_{ℓ ∈ I} B_ℓ as

    nnz(B_I) = ∑_{k,ℓ ∈ I} nnz(k, ℓ).    (4.11)

In order to estimate nnz(k, ℓ) we employ upper bounds L_i, i = 1, ..., d, for the Lipschitz constants of S, cf. Figure 3. We obtain

Proposition 4.3:

    nnz(k, ℓ) ≤ |B_k| ∏_{i=1}^d ⌈ L_i · 2^{−k_i+1−(k_i=0)} / 2^{−ℓ_i+1−(ℓ_i=0)} ⌉.    (4.12)
Proof. Since we have used upper bounds for the Lipschitz constants, one mapped box has at most the extension L_i · 2^{−k_i+1−(k_i=0)} in the i-th dimension. Consequently, it intersects with at most ⌈ L_i · 2^{−k_i+1−(k_i=0)} / 2^{−ℓ_i+1−(ℓ_i=0)} ⌉ supports of basis functions from W_ℓ in that direction. □

Figure 3: Model for the matrix occupancy in 2D. Shaded and colorless (white) regions show the function values (±|φ|), thicker black lines the support boundaries.
Remark 4.4: Numerical experiments suggest that the above bound approximates the matrix occupancy pretty well. However, it could be improved: (3.21) shows that a matrix entry could still be zero even if supp(φ_i) and supp(Pφ_j) intersect. This is e.g. the case if supp(Pφ_j) is contained in a subset of supp(φ_i) on which φ_i is constant (i.e. does not change sign). There, the property ‖Pf‖_{L¹} = ‖f‖_{L¹} for f ≥ 0 and the linearity of P imply p_ij = 0, since ‖φ_j^+‖_{L¹} = ‖φ_j^-‖_{L¹}.
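The bound (4.12) is easy to evaluate; the following sketch (ours) computes it for a pair of level vectors and given Lipschitz bounds L_i.

```python
import math

def dim_B(level):
    """Number of basis functions |B_l| in the subspace W_l, cf. (3.7)."""
    return 2 ** sum(li - 1 for li in level if li != 0)

def nnz_bound(k, l, L):
    """Upper bound (4.12) on the number of nonzero entries coupling W_k and W_l."""
    bound = dim_B(k)
    for ki, li, Li in zip(k, l, L):
        width_k = 2.0 ** (-ki + 1 - (ki == 0))   # support width of a W_k function in direction i
        width_l = 2.0 ** (-li + 1 - (li == 0))   # support width of a W_l function in direction i
        bound *= math.ceil(Li * width_k / width_l)
    return bound

print(nnz_bound((0, 4), (4, 0), (1.0, 1.0)))   # the "long thin slab" pair: 8 * 8 * 1 = 64
```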
An asymptotic estimate. Let us examine nnz(k, ℓ) for k = (0, ..., 0, N) and ℓ = (N, 0, ..., 0). With L_i = 1 we get nnz(k, ℓ) ≳ 2^{2N}: here |B_k| = 2^{N−1}, and the image of each basis function from B_k intersects with the support of each basis function from B_ℓ. Since |B_N| ≈ N^{d−1} 2^N, we get

    2^{2N} ≲ nnz(B_N) ≲ N^{2(d−1)} 2^{2N}.    (4.13)

The exponential term dominates the polynomial one for large N, so asymptotically we will not get a sparse matrix.

Does this affect the calculations regarding efficiency made above? As already mentioned, the error of Ulam's method is ε = O(2^{−M}) while its cost is 2^{dM} = O(ε^{−d}). Assuming that the Sparse Ulam method has the same error ε = O(N^{d−1} 2^{−N}), its worst-case cost is

    O(N^{2(d−1)} 2^{2N}) ≲ ε^{−2} N^{4(d−1)} ≲ ε^{−2} (log(ε^{−1}))^{4(d−1)},

where we used N^{d−1} 2^{−N} ≲ 2^{−N/2} for large N, which leads to N ≲ 2 log₂(ε^{−1}). Clearly, this means – similarly to Subsection 4.1 – partially overcoming the curse of dimensionality. Even in the most optimistic case, i.e. if the occupancy is of O(2^{2N}), we have costs of at least O(ε^{−2} N^{2(d−1)}), so with respect to the matrix occupancy the Sparse Ulam method is more efficient than Ulam's method only if d ≥ 3.

5 Numerical examples

We compare both methods by approximating the invariant density of a simple three dimensional map.
Let S_i : [0,1] → [0,1], i = 1, 2, 3, be given by

    S_1(x) = 1 − 2|x − 1/2|,

    S_2(x) = 2x/(1−x)    if x < 1/3,       (1−x)/(2x)    else,

    S_3(x) = 2x/(1−x²)   if x < √2 − 1,    (1−x²)/(2x)   else,

and let S : [0,1]³ → [0,1]³ be the tensor product map

    S(x) = (S_1(x_1), S_2(x_2), S_3(x_3))^⊤,

where x = (x_1, x_2, x_3)^⊤. This map is expanding and its unique invariant density is given by

    h(x) = 8 / ( π (1 + x_2)² (1 + x_3²) )

(cf. [13]).
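For reference, here is a direct transcription of the test map and its invariant density (our sketch); it can be combined, for example, with the Monte-Carlo Ulam fragment shown after (2.9).

```python
import numpy as np

def S1(x):                                   # tent map, invariant density 1
    return 1.0 - 2.0 * np.abs(x - 0.5)

def S2(x):                                   # invariant density 2/(1+x)^2
    return np.piecewise(x, [x < 1.0 / 3.0, x >= 1.0 / 3.0],
                        [lambda t: 2.0 * t / (1.0 - t), lambda t: (1.0 - t) / (2.0 * t)])

def S3(x):                                   # invariant density 4/(pi (1+x^2))
    c = np.sqrt(2.0) - 1.0
    return np.piecewise(x, [x < c, x >= c],
                        [lambda t: 2.0 * t / (1.0 - t ** 2), lambda t: (1.0 - t ** 2) / (2.0 * t)])

def S(x):
    """Tensor product map on [0,1]^3; x has shape (npoints, 3)."""
    return np.column_stack((S1(x[:, 0]), S2(x[:, 1]), S3(x[:, 2])))

def h(x):
    """Invariant density of S."""
    return 8.0 / (np.pi * (1.0 + x[:, 1]) ** 2 * (1.0 + x[:, 2] ** 2))
```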
We approximate h by Ulam's method on an equipartition of 2^{3M} boxes for M = 4, 5, 6, as well as by the Sparse Ulam method for N = 4, 5, 6. Figure 4 shows the L¹-error for both methods in dependence of the number of sample points (left) as well as the number of index computations along these curves (right). While the Sparse Ulam method requires almost three orders of magnitude fewer sample points than Ulam's method, the number of index computations is roughly comparable. This is in good agreement with our theoretical considerations in Sections 4.2.1 and 4.2.2.

Figure 4: Left: L¹-error of the approximate invariant density in dependence on the number of sample points for levels N, M = 4, 5, 6. Right: Corresponding number of index computations.

In Figure 5 we show the dependence of the L¹-error on the number of nonzeros in the transition matrices for levels M, N = 3, ..., 6. Again, the Sparse Ulam method is ahead of Ulam's method by almost an order of magnitude.
Figure 5: L¹-error of the approximate invariant densities in dependence on the number of nonzeros in the transition matrices.

In a second numerical experiment, we approximate a few dominant eigenfunctions of the transfer operator for an area preserving map. Since the information on almost invariant sets does not change [17] (but the eigenproblem becomes easier to solve), we here consider the symmetrized transition matrix (P + P^⊤)/2, cf. also [22].

Consider the so called standard map S_ρ : [0,1]² → [0,1]²,

    (x_1, x_2)^⊤ ↦ (x_1 + x_2 + ρ sin(2πx_1) + c, x_2 + ρ sin(2πx_1))^⊤ mod 1,

with a fixed shift c and a parameter ρ > 0. The map is area preserving, i.e. the Lebesgue measure is invariant w.r.t. S_ρ. Figure 6 shows approximations of the eigenfunctions at the second largest eigenvalue of S_ρ for two parameter values ρ_1 and ρ_2 = 0.6, computed by Ulam's method on 2^{12} boxes (i.e. for M = 6).

Figure 6: Eigenfunction of the symmetrized transition matrix at the second largest eigenvalue for the standard map. Left: ρ = ρ_1, λ = 0.97; right: ρ = ρ_2.

We now consider the four-dimensional map S : [0,1]⁴ → [0,1]⁴ given by S = S_{ρ_1} ⊗ S_{ρ_2}. Note that the eigenfunctions of S are tensor products of the eigenfunctions of the S_{ρ_i}. This is reflected in Figures 7 and 8, where we show the eigenfunctions at the two largest eigenvalues, computed by the Sparse Ulam method on level N = 8. Clearly, each of these two is a tensor product of the (2d) eigenfunction at the second largest eigenvalue with the (2d) invariant (i.e. constant) density.

Figure 7: Approximate eigenfunction at λ = 0.97. Left: v(·, ·, x_3, x_4) for fixed x_3, x_4; right: v(x_1, x_2, ·, ·) for fixed x_1, x_2.

Figure 8: Approximate eigenfunction at the second of the two leading eigenvalues.

Figure 9 shows an eigenfunction for which both factors of the tensor product are non-constant. The resolution of this eigenfunction seems worse than for those with one constant factor. In fact, for an approximation of an eigenfunction which is constant with respect to, say, x_3 and x_4, it suffices to consider subspaces W_ℓ with ℓ = (ℓ_1, ℓ_2, 0, 0), so that all of the available resolution can be spent on the x_1 and x_2 directions.

Figure 9: Approximate eigenfunction for which both tensor factors are non-constant.
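The building blocks of this second experiment are equally short to code; the following sketch (ours) implements the standard map, the four-dimensional tensor product map and the symmetrization of a transition matrix. The shift c and the parameter values are placeholders here.

```python
import numpy as np

def standard_map(x, rho, c=0.0):
    """Area preserving standard map on [0,1]^2; x has shape (npoints, 2).
    The shift c is a placeholder for the offset used in the paper."""
    kick = rho * np.sin(2.0 * np.pi * x[:, 0])
    x1 = np.mod(x[:, 0] + x[:, 1] + kick + c, 1.0)
    x2 = np.mod(x[:, 1] + kick, 1.0)
    return np.column_stack((x1, x2))

def tensor_map(x, rho1, rho2, c=0.0):
    """S = S_{rho1} (x) S_{rho2} on [0,1]^4."""
    return np.hstack((standard_map(x[:, :2], rho1, c), standard_map(x[:, 2:], rho2, c)))

def symmetrize(P):
    """Symmetrized transition matrix (P + P^T)/2 used for the eigenproblem."""
    return 0.5 * (P + P.T)
```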
References

[1] G. D. Birkhoff. Proof of the ergodic theorem. Proc. Nat. Acad. Sci. U.S.A., 17:650–660, 1931.

[2] H.-J. Bungartz and M. Griebel. Sparse grids. Acta Numerica, 13:1–123, 2004.

[3] M. Dellnitz, G. Froyland, and O. Junge. The algorithms behind GAIO – set oriented numerical methods for dynamical systems. In Ergodic Theory, Analysis, and Efficient Simulation of Dynamical Systems, pages 145–174, 805–807. Springer, Berlin, 2001.

[4] M. Dellnitz and A. Hohmann. The computation of unstable manifolds using subdivision and continuation. In H. Broer, S. van Gils, I. Hoveijn, and F. Takens, editors, Nonlinear Dynamical Systems and Chaos, pages 449–459. Birkhäuser, PNLDE 19, 1996.

[5] M. Dellnitz and A. Hohmann. A subdivision algorithm for the computation of unstable manifolds and global attractors. Numer. Math., 75(3):293–317, 1997.

[6] M. Dellnitz and O. Junge. An adaptive subdivision technique for the approximation of attractors and invariant measures. Comput. Vis. Sci., 1(2):63–68, 1998.

[7] M. Dellnitz and O. Junge. On the approximation of complicated dynamical behavior. SIAM J. Numer. Anal., 36:491–515, 1999.

[8] M. Dellnitz and O. Junge. Set oriented numerical methods for dynamical systems. In Handbook of Dynamical Systems, Vol. 2, pages 221–264. North-Holland, Amsterdam, 2002.

[9] P. Deuflhard, M. Dellnitz, O. Junge, and C. Schütte. Computation of essential molecular dynamics by subdivision techniques. In P. Deuflhard et al., editors, Computational Molecular Dynamics: Challenges, Methods, Ideas, Lect. Notes Comput. Sci. Eng. 4, pages 98–115. Springer, 1999.

[10] P. Deuflhard and C. Schütte. Molecular conformation dynamics and computational drug design. In Applied Mathematics Entering the 21st Century, pages 91–119. SIAM, Philadelphia, PA, 2004.

[11] J. Ding, Q. Du, and T. Y. Li. High order approximation of the Frobenius–Perron operator. Appl. Math. Comp., 53:151–171, 1993.

[12] J. Ding and T.-Y. Li. Markov finite approximation of the Frobenius–Perron operator. Nonlin. Anal., Theory, Meth. & Appl., 17:759–772, 1991.

[13] J. Ding and A. Zhou. Finite approximations of Frobenius–Perron operators. A solution of Ulam's conjecture on multi-dimensional transformations. Physica D, 92:61–68, 1996.

[14] W. Feller. An Introduction to Probability Theory and Its Applications, volume 2. Wiley, 2nd edition, 1971.

[15] G. Froyland. Finite approximation of Sinai–Bowen–Ruelle measures for Anosov systems in two dimensions. Random Comput. Dynam., 3(4):251–263, 1995.

[16] G. Froyland. Approximating physical invariant measures of mixing dynamical systems in higher dimensions. Nonlinear Analysis, Theory, Methods & Applications, 32(7):831–860, 1998.

[17] G. Froyland. Statistically optimal almost-invariant sets. Phys. D, 200(3–4):205–219, 2005.

[18] M. Griebel, P. Oswald, and T. Schiekofer. Sparse grids for boundary integral equations. Numerische Mathematik, 83(2):279–312, 1999.

[19] R. Guder, M. Dellnitz, and E. Kreuzer. An adaptive method for the approximation of the generalized cell mapping. Chaos, Solitons and Fractals, (4):525–534, 1997.

[20] F. Y. Hunt. A Monte Carlo approach to the approximation of invariant measures. Random Comput. Dynam., 2(1):111–133, 1994.

[21] O. Junge. An adaptive subdivision technique for the approximation of attractors and invariant measures: proof of convergence. Dyn. Syst., 16(3):213–222, 2001.

[22] O. Junge, J. Marsden, and I. Mezic. Uncertainty in the dynamics of conservative maps. In Proceedings of the 43rd IEEE CDC, 2004.

[23] T. Kato. Perturbation Theory for Linear Operators. Springer, 2nd edition, 1984.

[24] A. Lasota and M. C. Mackey. Chaos, Fractals, and Noise. Springer, 2nd edition, 1994.

[25] T.-Y. Li. Finite approximation for the Frobenius–Perron operator. A solution to Ulam's conjecture. J. Approx. Theory, 17:177–186, 1976.

[26] C. Schütte. Conformational dynamics: Modelling, theory, algorithm, and application to biomolecules. Habilitation thesis, Free University Berlin, 1999.

[27] C. Schütte and W. Huisinga. Biomolecular conformations can be identified as metastable sets of molecular dynamics. In Handbook of Numerical Analysis, Vol. X, pages 699–744. North-Holland, Amsterdam, 2003.

[28] C. Schütte, W. Huisinga, and P. Deuflhard. Transfer operator approach to conformational dynamics in biomolecular systems. In B. Fiedler, editor, Ergodic Theory, Analysis, and Efficient Simulation of Dynamical Systems, pages 191–223. Springer, 2001.

[29] S. Smolyak. Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl. Akad. Nauk SSSR, 148:1042–1045, 1963.

[30] S. M. Ulam. A Collection of Mathematical Problems. Interscience Publishers, New York, 1960.

[31] C. Zenger. Sparse grids. In Parallel Algorithms for Partial Differential Equations (Kiel, 1990), volume 31 of Notes Numer. Fluid Mech., pages 241–251. Vieweg, Braunschweig, 1991.