TIME INTEGRATION OF TENSOR TRAINS

CHRISTIAN LUBICH†, IVAN V. OSELEDETS‡¶, AND BART VANDEREYCKEN§

† Mathematisches Institut, Universität Tübingen, Auf der Morgenstelle 10, D-72076 Tübingen, Germany ([email protected])
‡ Skolkovo Institute of Science and Technology, Novaya St. 100, Skolkovo, Odintsovsky district, 143025 Moscow Region, Russia ([email protected])
¶ Institute of Numerical Mathematics, Gubkina St. 8, 119333 Moscow, Russia
§ Department of Mathematics, Princeton University, Fine Hall, Princeton NJ 08544, USA ([email protected])

Abstract.
A robust and efficient time integrator for dynamical tensor approximation in the tensor train or matrix product state format is presented. The method is based on splitting the projector onto the tangent space of the tensor manifold. The algorithm can be used for updating time-dependent tensors in the given data-sparse tensor train / matrix product state format and for computing an approximate solution to high-dimensional tensor differential equations within this data-sparse format. The formulation, implementation and theoretical properties of the proposed integrator are studied, and numerical experiments with problems from quantum molecular dynamics and with iterative processes in the tensor train format are included.
Key words. Tensor train, matrix product state, low-rank approximation, time-varying tensors, tensor differential equations, splitting integrator.
AMS subject classifications.
1. Introduction.
There has been much interest lately in the development of data-sparse tensor formats for high-dimensional problems ranging from quantum mechanics to information retrieval; see, e.g., the monograph [5] and references therein. A very promising tensor format is provided by tensor trains (TT) [26, 27], which are also known as matrix product states (MPS) in the theoretical physics literature [29].

In the present paper we deal with the problem of computing an approximation to a time-dependent large tensor $A(t)$, $t_0 \le t \le T$, within the TT/MPS format. This includes the situation where the tensor $A(t)$ is known explicitly but in a less data-sparse format and we require an approximation of lower complexity. Alternatively, the tensor $A(t)$ could also be defined implicitly as the solution of a tensor differential equation $\dot A = F(t, A)$, where the dot denotes $d/dt$. Such a situation typically arises from a space discretization of a high-dimensional evolutionary partial differential equation.

In both situations, such an approximation can be obtained by the principle of dynamical low-rank approximation: Given an approximation manifold $\mathcal{M}$, the desired time-dependent approximation $Y(t) \in \mathcal{M}$ is computed as

    $\|\dot Y(t) - \dot A(t)\| = \min$   or   $\|\dot Y(t) - F(t, Y(t))\| = \min$,

where $\dot A$ and $F$ are given. This is known as the Dirac–Frenkel time-dependent variational principle in physics; see [17, 18]. In our case, $\mathcal{M}$ consists of TT/MPS tensors of fixed rank, and its manifold structure and tangent space were studied in [12]. For $\|\cdot\|$ the Euclidean norm, the minimizations from above lead to the following differential equations on $\mathcal{M}$:

    $\dot Y(t) = P_{Y(t)} \dot A(t)$   and   $\dot Y(t) = P_{Y(t)} F(t, Y(t))$,   (1.1)

where $P_Y$ is the orthogonal projection onto the tangent space of $\mathcal{M}$ at $Y$ (see §3 for an explicit expression). This time-dependent variational principle on fixed-rank TT/MPS manifolds is studied in [20], where the explicit differential equations are derived and their approximation properties are analyzed. We further refer to [7] for a discussion of time-dependent matrix product state approximations in the physical literature.

A conceptually related, but technically simpler situation arises in the dynamical low-rank approximation of matrices [15]. There, the time-dependent variational principle is applied on manifolds of matrices of a fixed rank, in order to update low-rank approximations to time-dependent large data matrices or to approximate solutions to matrix differential equations by low-rank matrices. The arising differential equations for the low-rank factorization need to be solved numerically, which becomes a challenge in the (often occurring) presence of small singular values in the approximation. While standard numerical integrators such as explicit or implicit Runge–Kutta methods then perform poorly, a novel splitting integrator proposed and studied in [19] shows robustness properties under ill-conditioning that are not shared by any standard numerical integrator.
The integrator of [19] is based on splitting the orthogonal projector onto the tangent space of the low-rank matrix manifold. It provides a simple, computationally efficient update of the low-rank factorization in every time step. In the present paper we extend the projector-splitting integrator of [19] from the matrix case to the TT/MPS case in the time-dependent approximation (1.1).
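To fix ideas, the following NumPy sketch shows one first-order step of the matrix projector-splitting (KSL) integrator in the spirit of [19]. It is our own illustration under the standard formulation of that method (function and variable names are ours), not the authors' code:

```python
import numpy as np

def ksl_step(U, S, Vt, dA):
    """One first-order projector-splitting (KSL) step for the matrix case:
    update a rank-r factorization Y = U @ S @ Vt by the increment
    dA = A(t1) - A(t0). A sketch in the spirit of [19]."""
    # K-step: update K = U @ S with the increment projected onto range(V).
    K = U @ S + dA @ Vt.T
    U1, S_hat = np.linalg.qr(K)
    # S-step: subtract the doubly projected increment (note the minus sign).
    S_tilde = S_hat - U1.T @ dA @ Vt.T
    # L-step: update L = V @ S^T with the increment projected onto range(U1).
    L = Vt.T @ S_tilde.T + dA.T @ U1
    V1, S1t = np.linalg.qr(L)
    return U1, S1t.T, V1.T  # new factors with Y1 = U1 @ S1 @ V1t
```

The TT/MPS integrator developed below reduces to this scheme for $d = 2$.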
After collecting the necessary prerequisites on tensor trains / matrix product states in §2, we study the orthogonal projection onto the tangent space of the fixed-rank TT/MPS manifold in §3. We show that the projector admits an additive decomposition of a simple structure. In §4 this decomposition is used to derive the projector-splitting integrator, whose substeps turn out to be solvable in closed form, and to formulate an efficient sweeping algorithm. In §5 we prove an exactness property of the integrator, and in §6 we discuss the numerical implementation and present numerical experiments.
2. Tensor trains / matrix product states: prerequisites.
We present the tensor train or matrix product state formats, together with the normalized representations that we will use throughout the paper. Although our presentation is self-contained, its content is not original and can be found in, e.g., [26, 12].
Norm and inner product of tensors.
The norm of a tensor $X \in \mathbb{R}^{n_1 \times \cdots \times n_d}$, as considered here, is the Euclidean norm of the vector $x$ that carries the entries $X(\ell_1, \dots, \ell_d)$ of $X$. The inner product $\langle X, Y \rangle$ of two tensors $X, Y \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is the Euclidean inner product of the two corresponding vectors $x$ and $y$.
Unfolding and reconstruction. The $i$th unfolding of a tensor $X \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is the matrix $X^{\langle i \rangle} \in \mathbb{R}^{(n_1 \cdots n_i) \times (n_{i+1} \cdots n_d)}$ that aligns all entries $X(\ell_1, \dots, \ell_d)$ with fixed $\ell_1, \dots, \ell_i$ in a row of $X^{\langle i \rangle}$, where rows and columns are ordered colexicographically. The inverse of unfolding is reconstruction, which we denote as $X = \mathrm{Ten}_i(X^{\langle i \rangle})$; that is, the tensor $X \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ has the $i$th unfolding $X^{\langle i \rangle} \in \mathbb{R}^{(n_1 \cdots n_i) \times (n_{i+1} \cdots n_d)}$.
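Since all orderings are colexicographic, unfolding and reconstruction correspond to Fortran-order reshapes. A minimal NumPy sketch (our illustration):

```python
import numpy as np

def unfold(X, i):
    """i-th unfolding X^<i>: rows indexed by (l1,...,li), columns by the rest.
    Colexicographic ordering corresponds to Fortran-order reshaping."""
    n = X.shape
    return np.reshape(X, (int(np.prod(n[:i], dtype=int)), -1), order="F")

def ten(Z, shape):
    """Reconstruction Ten_i: inverse of unfold for a tensor of given shape."""
    return np.reshape(Z, shape, order="F")

X = np.random.rand(2, 3, 4, 5)
assert np.allclose(ten(unfold(X, 2), X.shape), X)
```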
TT/MPS format. A tensor $X \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is in the TT/MPS format if there exist core tensors $C_i \in \mathbb{R}^{r_{i-1} \times n_i \times r_i}$ with $r_0 = r_d = 1$ such that

    $X(\ell_1, \dots, \ell_d) = \sum_{j_1=1}^{r_1} \cdots \sum_{j_{d-1}=1}^{r_{d-1}} C_1(1, \ell_1, j_1)\, C_2(j_1, \ell_2, j_2) \cdots C_d(j_{d-1}, \ell_d, 1)$

for $\ell_i = 1, \dots, n_i$ and $i = 1, \dots, d$. Equivalently, we have

    $X(\ell_1, \dots, \ell_d) = C_1(\ell_1) \cdots C_d(\ell_d),$

where the $r_{i-1} \times r_i$ matrices $C_i(\ell_i)$ are defined as the slices $C_i(:, \ell_i, :)$. Observe that $X$ can be parametrized by $\sum_{i=1}^d n_i r_{i-1} r_i \le d N R^2$ degrees of freedom, where $N = \max\{n_i\}$ and $R = \max\{r_i\}$. In high-dimensional applications where TT/MPS tensors are practically relevant, $R$ is constant or only mildly dependent on $d$. Hence for large $d$, one obtains a considerable reduction in the degrees of freedom compared to the $N^d$ entries of a general tensor.
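The matrix-product form can be illustrated by a short NumPy sketch (ours) that evaluates a single entry of a TT/MPS tensor from its cores:

```python
import numpy as np

def tt_entry(cores, idx):
    """Evaluate X(l1,...,ld) = C1(l1) @ C2(l2) @ ... @ Cd(ld)
    for cores C[i] of shape (r_{i-1}, n_i, r_i) with r_0 = r_d = 1."""
    v = np.ones((1, 1))
    for C, l in zip(cores, idx):
        v = v @ C[:, l, :]      # multiply by the slice C_i(:, l_i, :)
    return v[0, 0]

# example: a random TT/MPS tensor in R^{4x5x6} with ranks (1, 2, 3, 1)
cores = [np.random.rand(1, 4, 2), np.random.rand(2, 5, 3), np.random.rand(3, 6, 1)]
print(tt_entry(cores, (0, 1, 2)))
```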
Left and right unfoldings. For any core tensor $C_i \in \mathbb{R}^{r_{i-1} \times n_i \times r_i}$, we denote by

    $C_i^< = \begin{bmatrix} C_i(:, 1, :) \\ \vdots \\ C_i(:, n_i, :) \end{bmatrix} \in \mathbb{R}^{(r_{i-1} n_i) \times r_i}$

the left unfolding, obtained by stacking the slices $C_i(:, \ell_i, :)$ vertically, and by

    $C_i^> = \begin{bmatrix} C_i(:, :, 1)^\top \\ \vdots \\ C_i(:, :, r_i)^\top \end{bmatrix} \in \mathbb{R}^{(r_i n_i) \times r_{i-1}}$

the right unfolding.
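Under the colexicographic conventions above, both unfoldings are plain reshapes; a NumPy sketch (our reading of the conventions):

```python
import numpy as np

def left_unfolding(C):
    """C^< of shape (r_{i-1}*n_i, r_i): row index (j_{i-1}, l_i), j fastest."""
    r0, n, r1 = C.shape
    return np.reshape(C, (r0 * n, r1), order="F")

def right_unfolding(C):
    """C^> of shape (r_i*n_i, r_{i-1}): row index (l_i, j_i), l fastest."""
    r0, n, r1 = C.shape
    return np.reshape(np.transpose(C, (1, 2, 0)), (n * r1, r0), order="F")
```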
TT/MPS rank. We call a vector $r = (1, r_1, \dots, r_{d-1}, 1)$ the TT/MPS rank of a tensor $X \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ if

    $\mathrm{rank}\, X^{\langle i \rangle} = r_i, \qquad i = 1, \dots, d-1.$

In case $r_i \le \min\{\prod_{j=1}^i n_j, \prod_{j=i+1}^d n_j\}$, this implies that $X$ can be represented in the TT/MPS format with core tensors $C_i \in \mathbb{R}^{r_{i-1} \times n_i \times r_i}$ of full multilinear rank, that is,

    $\mathrm{rank}\, C_i^< = r_i \quad \text{and} \quad \mathrm{rank}\, C_i^> = r_{i-1}, \qquad i = 1, \dots, d.$

In addition, it is known (see [12, Lem. 4]) that for fixed $r$ such a full-rank condition on the core tensors implies that the set

    $\mathcal{M} = \{ X \in \mathbb{R}^{n_1 \times \cdots \times n_d} : \text{TT/MPS rank of } X \text{ is } r \}$   (2.1)

is a smooth embedded submanifold of $\mathbb{R}^{n_1 \times \cdots \times n_d}$.
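For small tensors, the TT/MPS rank can be computed directly from the ranks of the unfoldings, as the following NumPy sketch (ours) illustrates; the tolerance parameter is our assumption for the numerical rank decision:

```python
import numpy as np

def tt_rank(X, tol=1e-12):
    """TT/MPS rank (1, r_1, ..., r_{d-1}, 1) via ranks of the unfoldings."""
    d, n = X.ndim, X.shape
    ranks = [1]
    for i in range(1, d):
        Xi = np.reshape(X, (int(np.prod(n[:i], dtype=int)), -1), order="F")
        ranks.append(int(np.linalg.matrix_rank(Xi, tol=tol)))
    ranks.append(1)
    return ranks
```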
Partial products. Define the left partial product $X_{\le i} \in \mathbb{R}^{n_1 \times \cdots \times n_i \times r_i}$ as

    $X_{\le i}(\ell_1, \dots, \ell_i, :) = C_1(\ell_1) \cdots C_i(\ell_i)$

and the right partial product $X_{\ge i+1} \in \mathbb{R}^{r_i \times n_{i+1} \times \cdots \times n_d}$ as

    $X_{\ge i+1}(:, \ell_{i+1}, \dots, \ell_d) = C_{i+1}(\ell_{i+1}) \cdots C_d(\ell_d).$

See also Fig. 2.1(a) for their graphical representation in terms of a tensor network. Let particular unfoldings of these partial products be denoted as

    $X_{\le i} \in \mathbb{R}^{(n_1 \cdots n_i) \times r_i}, \qquad X_{\ge i+1} \in \mathbb{R}^{(n_{i+1} \cdots n_d) \times r_i}.$

[Fig. 2.1: A 5-dimensional TT/MPS tensor $X$. Panel (a) indicates specific left and right partial products of $X$. Panel (b) depicts the third recursive SVD of $X$. Left- and right-orthogonalized cores are marked with distinct symbols.]

The elementwise relation $X(\ell_1, \dots, \ell_d) = X_{\le i}(\ell_1, \dots, \ell_i, :)\, X_{\ge i+1}(:, \ell_{i+1}, \dots, \ell_d)$ then translates into

    $X^{\langle i \rangle} = X_{\le i}\, X_{\ge i+1}^\top.$
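A NumPy sketch (ours) of the unfoldings of the partial products, built with the recursions stated next; the final lines check the relation $X^{\langle i \rangle} = X_{\le i} X_{\ge i+1}^\top$ on a small example:

```python
import numpy as np

def left_partial(cores, i):
    """Unfolding X_<=i, shape (n1*...*ni, r_i)."""
    X = np.ones((1, 1))
    for C in cores[:i]:
        m, _ = X.shape
        _, n, r1 = C.shape
        X = np.reshape(np.einsum("ma,anb->mnb", X, C), (m * n, r1), order="F")
    return X

def right_partial(cores, i):
    """Unfolding X_>=i, shape (ni*...*nd, r_{i-1})."""
    X = np.ones((1, 1))
    for C in reversed(cores[i - 1:]):
        m, _ = X.shape
        r0, n, _ = C.shape
        X = np.reshape(np.einsum("anb,mb->nma", C, X), (n * m, r0), order="F")
    return X

cores = [np.random.rand(1, 3, 2), np.random.rand(2, 4, 3), np.random.rand(3, 5, 1)]
i = 2
Xi = left_partial(cores, i) @ right_partial(cores, i + 1).T  # equals X^<i>
```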
Recursive construction. We note the recurrence relations

    $X_{\le i} = (I_{n_i} \otimes X_{\le i-1})\, C_i^<, \qquad i = 1, \dots, d,$   (2.2)
    $X_{\ge i} = (X_{\ge i+1} \otimes I_{n_i})\, C_i^>, \qquad i = 1, \dots, d,$   (2.3)

with $X_{\le 0} = 1$ and $X_{\ge d+1} = 1$. Here $\otimes$ denotes the standard Kronecker product. Combining the above formulas we note

    $X^{\langle i-1 \rangle} = X_{\le i-1}\, C_i^{>\top}\, (X_{\ge i+1} \otimes I_{n_i})^\top$   (2.4)

and

    $X^{\langle i \rangle} = (I_{n_i} \otimes X_{\le i-1})\, C_i^<\, X_{\ge i+1}^\top,$   (2.5)

which allow us to pass from the $(i-1)$th to the $i$th unfolding.

Thanks to the recursive relations (2.2) and (2.3), it is possible to compute QR decompositions of the matrices $X_{\le i}$ and $X_{\ge i}$ efficiently. Let us explain the case for $X_{\le i}$ in detail. First, compute a QR factorization (the superscript $<$ in $Q_1^<$ is just notational for now; it indicates that this matrix will play the role of a left unfolding)

    $X_{\le 1} = C_1^< = Q_1^< R_1, \qquad Q_1^{<\top} Q_1^< = I_{r_1}, \quad Q_1^< \in \mathbb{R}^{n_1 \times r_1}, \quad R_1 \in \mathbb{R}^{r_1 \times r_1},$

and insert it into the recurrence relation (2.2) to obtain

    $X_{\le 2} = (I_{n_2} \otimes Q_1^< R_1)\, C_2^< = (I_{n_2} \otimes Q_1^<)\, (I_{n_2} \otimes R_1)\, C_2^<.$

Next, make another QR decomposition

    $(I_{n_2} \otimes R_1)\, C_2^< = Q_2^< R_2, \qquad Q_2^{<\top} Q_2^< = I_{r_2}, \quad Q_2^< \in \mathbb{R}^{(r_1 n_2) \times r_2}, \quad R_2 \in \mathbb{R}^{r_2 \times r_2},$

so that we have obtained a QR decomposition of $X_{\le 2} = Q_{\le 2} R_2$ with $Q_{\le 2} = (I_{n_2} \otimes Q_1^<)\, Q_2^<$. Continuing in this way for $i = 2, 3, \dots$, and putting $Q_{\le 0} = 1$, we obtain for each $i = 1, \dots, d$ the QR decompositions

    $X_{\le i} = Q_{\le i} R_i \qquad \text{with} \qquad Q_{\le i} = (I_{n_i} \otimes Q_{\le i-1})\, Q_i^<.$

Likewise, starting from a QR factorization $C_d^> = Q_d^> R_d$, we can use (2.3) to obtain the QR decompositions

    $X_{\ge i} = Q_{\ge i} R_i \qquad \text{with} \qquad Q_{\ge i} = (Q_{\ge i+1} \otimes I_{n_i})\, Q_i^>,$   (2.6)

where the matrices $Q_i^> \in \mathbb{R}^{(r_i n_i) \times r_{i-1}}$ and $R_i \in \mathbb{R}^{r_{i-1} \times r_{i-1}}$ are recursively obtained from $(R_{i+1} \otimes I_{n_i})\, C_i^> = Q_i^> R_i$. We remark that these $R_i$ are in general different from those obtained while orthogonalizing from the left.

Observe that when $X_{\le i}$ is left-orthogonalized (that is, $X_{\le i} = Q_{\le i}$ has orthonormal columns), then so is $X_{\le j}$ for any $j < i$. Since $X_{\le d} = X^{\langle d \rangle}$, we call $X$ left orthogonal if $X_{\le d}$ is left-orthogonalized. As is evident from Fig. 2.1, such a left orthogonal $X$ is recursively computed by modifying the cores $C_i$ from left to right during a so-called forward sweep. Likewise, we call $X$ right orthogonal if $X_{\ge 1} = X^{\langle 0 \rangle \top}$ is right-orthogonalized, which is obtained by a backward sweep from right to left.

Suppose that $X_{\le i} = Q_{\le i} R_i$ and $X_{\ge i+1} = Q_{\ge i+1} R_{i+1}$ are QR decompositions obtained from left and right orthogonalizations; we then have the following SVD-like decomposition:

    $X^{\langle i \rangle} = Q_{\le i}\, S_i\, Q_{\ge i+1}^\top, \qquad S_i = R_i R_{i+1}^\top \in \mathbb{R}^{r_i \times r_i}.$   (2.7)

The matrix $S_i$ can be chosen diagonal, although we do not insist that it is. Since the orthonormal matrices $Q_{\le i}$ and $Q_{\ge i+1}$ satisfy the recursive relations explained before, we call (2.7) a recursive SVD of $X^{\langle i \rangle}$, or the $i$th recursive SVD of $X$. The graphical representation of such a recursive SVD is depicted in Fig. 2.1(b).

This recursiveness can be used for the SVD of $X^{\langle i+1 \rangle}$. By (2.6), we can write

    $X^{\langle i \rangle} = (Q_{\le i} S_i)\, Q_{i+1}^{>\top}\, (Q_{\ge i+2} \otimes I_{n_{i+1}})^\top.$   (2.8)

To obtain a decomposition of $X^{\langle i+1 \rangle}$ by means of the relations (2.4) and (2.5), we identify (2.8) with (2.4) (hence, $Q_{\le i} S_i$ takes the role of $X_{\le i}$ and $Q_{i+1}^{>\top}$ that of $C_{i+1}^{>\top}$). The corresponding expression for (2.5) then becomes

    $X^{\langle i+1 \rangle} = (I_{n_{i+1}} \otimes Q_{\le i} S_i)\, Q_{i+1}^<\, Q_{\ge i+2}^\top$   (2.9)
    $\phantom{X^{\langle i+1 \rangle}} = (I_{n_{i+1}} \otimes Q_{\le i})\, (I_{n_{i+1}} \otimes S_i)\, Q_{i+1}^<\, Q_{\ge i+2}^\top.$   (2.10)

Similarly, in the opposite direction, writing $K_i^> = Q_i^> S_i^\top$ we can write

    $X^{\langle i-1 \rangle} = Q_{\le i-1}\, S_{i-1}\, Q_{\ge i}^\top \qquad \text{where} \qquad Q_{\ge i} = (Q_{\ge i+1} \otimes I_{n_i})\, Q_i^>.$   (2.13)
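The forward orthogonalization sweep can be sketched as follows (our illustration of the recursion (2.2), not the authors' code); each core is replaced by one with an orthonormal left unfolding, and the triangular factor is passed on to the next core:

```python
import numpy as np

def left_unfolding(C):
    r0, n, r1 = C.shape
    return np.reshape(C, (r0 * n, r1), order="F")

def fold_left(M, r0, n, r1):
    return np.reshape(M, (r0, n, r1), order="F")

def left_orthogonalize(cores):
    """Forward sweep making X left orthogonal: each returned core has an
    orthonormal left unfolding; the R factor is absorbed into the next core.
    A sketch consistent with the recursion (2.2)."""
    out = []
    R = np.ones((1, 1))
    for C in cores:
        _, n, r1 = C.shape
        C = np.einsum("ab,bnc->anc", R, C)   # absorb R: slices R @ C(:, l, :)
        r0 = C.shape[0]
        Q, R = np.linalg.qr(left_unfolding(C))
        out.append(fold_left(Q, r0, n, Q.shape[1]))
    return out, R   # final R is 1x1 since r_d = 1; it carries the norm of X
```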
3. Orthogonal projection onto the tangent space.
Let $\mathcal{M}$ be the embedded manifold of tensors of a given TT/MPS rank $r$; see (2.1). In this section, we derive an explicit formula for the orthogonal projection onto the tangent space $T_X \mathcal{M}$ at $X \in \mathcal{M}$,

    $P_X : \mathbb{R}^{n_1 \times \cdots \times n_d} \to T_X \mathcal{M}.$

With the Euclidean inner product, the projection $P_X(Z)$ for arbitrary $Z \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ has the following equivalent variational definition:

    $\langle P_X(Z), \delta X \rangle = \langle Z, \delta X \rangle \qquad \forall\, \delta X \in T_X \mathcal{M}.$

Before we state the theorem, we recall a useful parametrization of $T_X \mathcal{M}$ as introduced in [12]. Let $X \in \mathcal{M}$ be left orthogonal, that is, in the decompositions

    $X^{\langle i \rangle} = (I_{n_i} \otimes X_{\le i-1})\, C_i^<\, X_{\ge i+1}^\top, \qquad i = 1, \dots, d,$   (3.1)

the left unfoldings $C_i^<$ have orthonormal columns for $i = 1, \dots, d-1$. A tangent tensor $\delta X \in T_X \mathcal{M}$ can then be written as a sum $\delta X = \sum_{j=1}^d \delta X_j$ with $\delta X_j \in \mathcal{V}_j$, where $\delta X_j$ is obtained by replacing the core $C_j$ in $X$ by a variation $\delta C_j$, subject to the gauge conditions $\delta C_j^{<\top} C_j^< = 0$ for $j = 1, \dots, d-1$.
Theorem 3.1. Let $\mathcal{M}$ be the manifold of fixed-rank TT/MPS tensors. Then the orthogonal projection onto the tangent space of $\mathcal{M}$ at $X \in \mathcal{M}$ is given by

    $P_X(Z) = \sum_{i=1}^{d-1} \mathrm{Ten}_i\big[ (I_{n_i} \otimes P_{\le i-1})\, Z^{\langle i \rangle}\, P_{\ge i+1} - P_{\le i}\, Z^{\langle i \rangle}\, P_{\ge i+1} \big] + \mathrm{Ten}_d\big[ (I_{n_d} \otimes P_{\le d-1})\, Z^{\langle d \rangle} \big]$

for any $Z \in \mathbb{R}^{n_1 \times \cdots \times n_d}$, where $P_{\le i} = Q_{\le i} Q_{\le i}^\top$ and $P_{\ge i} = Q_{\ge i} Q_{\ge i}^\top$ are the orthogonal projections onto the ranges of $X_{\le i}$ and $X_{\ge i}$, respectively.

Proof. We assume that $X$ is given by (3.1). For given $Z \in \mathbb{R}^{n_1 \times \cdots \times n_d}$, we aim to determine $\delta U = P_X(Z) \in T_X \mathcal{M}$ such that

    $\langle \delta U, \delta X \rangle = \langle Z, \delta X \rangle \qquad \forall\, \delta X \in T_X \mathcal{M}.$   (3.2)

Writing $\delta U = \sum_{j=1}^d \delta U_j$ with $\delta U_j \in \mathcal{V}_j$, this means that we need to determine the matrices $\delta B_j$ parametrizing the $\delta U_j$. Testing (3.2) with all admissible variations $\delta C_j$ and working out the resulting conditions for each $j$ separately yields the stated formula.
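For small full tensors, the formula of Theorem 3.1 can be applied directly; the following NumPy sketch (ours, with the orthonormal bases obtained from SVDs of the unfoldings) is meant for testing, not for efficiency:

```python
import numpy as np

def unfold(X, i):
    return np.reshape(X, (int(np.prod(X.shape[:i], dtype=int)), -1), order="F")

def ten(M, shape):
    return np.reshape(M, shape, order="F")

def bases(X, i, ri):
    """Orthonormal bases for the row/column spaces of X^<i> (rank ri)."""
    U, _, Vt = np.linalg.svd(unfold(X, i), full_matrices=False)
    return U[:, :ri], Vt[:ri, :].T

def tangent_projection(X, Z, ranks):
    """P_X(Z) as in Theorem 3.1 for a small full tensor X of TT/MPS rank
    ranks = (1, r_1, ..., r_{d-1}, 1)."""
    d = X.ndim
    P_le = {0: np.ones((1, 1))}
    P_ge = {d + 1: np.ones((1, 1))}
    for i in range(1, d):
        Qle, Qge = bases(X, i, ranks[i])
        P_le[i] = Qle @ Qle.T        # projector onto range(X_<=i)
        P_ge[i + 1] = Qge @ Qge.T    # projector onto range(X_>=i+1)
    out = np.zeros_like(Z)
    for i in range(1, d):
        Zi = unfold(Z, i)
        term = np.kron(np.eye(X.shape[i - 1]), P_le[i - 1]) @ Zi @ P_ge[i + 1] \
             - P_le[i] @ Zi @ P_ge[i + 1]
        out += ten(term, X.shape)
    Zd = unfold(Z, d)
    out += ten(np.kron(np.eye(X.shape[d - 1]), P_le[d - 1]) @ Zd, X.shape)
    return out
```

On random examples one can verify numerically that this map is idempotent and symmetric, as an orthogonal projection must be.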
Corollary 3.2. For $i = 0, \dots, d+1$, define the orthogonal projectors

    $\mathrm{P}_{\le i} : \mathbb{R}^{n_1 \times \cdots \times n_d} \to \mathbb{R}^{n_1 \times \cdots \times n_d}, \quad Z \mapsto \mathrm{Ten}_i\big( P_{\le i}\, Z^{\langle i \rangle} \big),$
    $\mathrm{P}_{\ge i} : \mathbb{R}^{n_1 \times \cdots \times n_d} \to \mathbb{R}^{n_1 \times \cdots \times n_d}, \quad Z \mapsto \mathrm{Ten}_{i-1}\big( Z^{\langle i-1 \rangle}\, P_{\ge i} \big),$

where upright $\mathrm{P}$ denotes an operator on tensor space and italic $P$ the corresponding projection matrix. Then the projector $P_X$ in Theorem 3.1 satisfies

    $P_X = \sum_{i=1}^{d-1} \big( \mathrm{P}_{\le i-1}\, \mathrm{P}_{\ge i+1} - \mathrm{P}_{\le i}\, \mathrm{P}_{\ge i+1} \big) + \mathrm{P}_{\le d-1}\, \mathrm{P}_{\ge d+1}.$

In addition, $\mathrm{P}_{\le i}$ and $\mathrm{P}_{\ge j}$ commute for $i < j$.

Proof. The fact that $\mathrm{P}_{\le i}$ commutes with $\mathrm{P}_{\ge j}$ follows from the observation that, for any $Z \in \mathbb{R}^{n_1 \times \cdots \times n_d}$, $\mathrm{P}_{\le i}(Z)$ acts on the rows of $Z^{\langle i \rangle}$, and hence also on the rows of $Z^{\langle j \rangle}$, while $\mathrm{P}_{\ge j}(Z)$ acts on the columns of $Z^{\langle j \rangle}$.

To write $P_X$ using the new notation, we need to work out the term

    $\mathrm{P}_{\ge i+1}\big( \mathrm{P}_{\le i-1}(Z) \big) = \mathrm{P}_{\ge i+1}\big[ \mathrm{Ten}_{i-1}( P_{\le i-1}\, Z^{\langle i-1 \rangle} ) \big] = \mathrm{Ten}_i\big[ Y^{\langle i \rangle}\, P_{\ge i+1} \big],$

with $Y = \mathrm{Ten}_{i-1}( P_{\le i-1}\, Z^{\langle i-1 \rangle} )$. Denote the mode-1 matricization of a tensor by $\cdot_{(1)}$; see [16]. We regard $Z$ and $Y$ as three-dimensional tensors $\widehat Z$ and $\widehat Y$, both of size $(n_1 \cdots n_{i-1}) \times n_i \times (n_{i+1} \cdots n_d)$, such that $\widehat Z_{(1)} = Z^{\langle i-1 \rangle}$ and $\widehat Y_{(1)} = Y^{\langle i-1 \rangle}$. In addition, let $\times_1$ denote the mode-1 multilinear product of a tensor with a matrix; see [16]. Then

    $(\widehat Z \times_1 P_{\le i-1})_{(1)} = P_{\le i-1}\, \widehat Z_{(1)} = P_{\le i-1}\, Z^{\langle i-1 \rangle} = Y^{\langle i-1 \rangle} = \widehat Y_{(1)},$

hence $\widehat Y = \widehat Z \times_1 P_{\le i-1}$. Using the notation $\cdot_{(1,2)} = \cdot_{(3)}^\top$ (see again [16]), we get

    $\widehat Y_{(1,2)} = (\widehat Z \times_1 P_{\le i-1})_{(1,2)} = (I_{n_i} \otimes P_{\le i-1})\, \widehat Z_{(1,2)}.$

Now, observe that because of the colexicographical ordering of unfoldings and matricizations, we have $\widehat Z_{(1,2)} = Z^{\langle i \rangle}$ and $\widehat Y_{(1,2)} = Y^{\langle i \rangle}$, and this gives

    $\mathrm{P}_{\ge i+1}\big( \mathrm{P}_{\le i-1}(Z) \big) = \mathrm{Ten}_i\big[ Y^{\langle i \rangle}\, P_{\ge i+1} \big] = \mathrm{Ten}_i\big[ (I_{n_i} \otimes P_{\le i-1})\, Z^{\langle i \rangle}\, P_{\ge i+1} \big].$

The term $\mathrm{P}_{\le i}\, \mathrm{P}_{\ge i+1}$ is straightforward, and this finishes the proof.

4. Projector-splitting integrator. We now consider the main topic of this paper: a numerical integrator for the dynamical TT/MPS approximation

    $\dot Y(t) = P_{Y(t)}\big( \dot A(t) \big), \qquad Y(t_0) = Y^0 \in \mathcal{M},$   (4.1)

of a given time-dependent tensor $A(t) \in \mathbb{R}^{n_1 \times \cdots \times n_d}$.

Our integrator is a Lie–Trotter splitting of the vector field $P_Y(\dot A)$. The splitting itself is suggested by the sum in Corollary 3.2: using $Y$ in the role of $X$, we can write

    $P_Y(\dot A) = P_1^+(\dot A) - P_1^-(\dot A) + P_2^+(\dot A) - P_2^-(\dot A) + \cdots - P_{d-1}^-(\dot A) + P_d^+(\dot A)$

with the orthogonal projectors

    $P_i^+(Z) = \mathrm{P}_{\le i-1}\, \mathrm{P}_{\ge i+1}(Z) = \mathrm{Ten}_i\big[ (I_{n_i} \otimes P_{\le i-1})\, Z^{\langle i \rangle}\, P_{\ge i+1} \big], \qquad 1 \le i \le d,$   (4.2)
    $P_i^-(Z) = \mathrm{P}_{\le i}\, \mathrm{P}_{\ge i+1}(Z) = \mathrm{Ten}_i\big[ P_{\le i}\, Z^{\langle i \rangle}\, P_{\ge i+1} \big], \qquad 1 \le i \le d-1.$   (4.3)

By standard theory (see, e.g., [8, II.5]), any splitting of this sum results in a first-order integrator, and composing it with its adjoint gives a second-order integrator, also known as the Strang splitting. Somewhat remarkably, we shall show in Thm. 4.1 that these split differential equations can be solved in closed form. Furthermore, if they are solved from left to right (or from right to left), the whole scheme can be implemented very efficiently.

Let $t_1 - t_0 > 0$ be the step size. Starting from $Y^0$, the first-order scheme solves the following substeps consecutively on $[t_0, t_1]$:
    $\dot Y_1^+ = + P_1^+(\dot A), \quad Y_1^+(t_0) = Y^0;$
    $\dot Y_1^- = - P_1^-(\dot A), \quad Y_1^-(t_0) = Y_1^+(t_1);$
    $\quad \vdots$
    $\dot Y_i^+ = + P_i^+(\dot A), \quad Y_i^+(t_0) = Y_{i-1}^-(t_1);$
    $\dot Y_i^- = - P_i^-(\dot A), \quad Y_i^-(t_0) = Y_i^+(t_1);$
    $\quad \vdots$
    $\dot Y_d^+ = + P_d^+(\dot A), \quad Y_d^+(t_0) = Y_{d-1}^-(t_1).$

Here, $Y^0 = Y(t_0)$ is the initial value of (4.1) and $Y_d^+(t_1)$ is the final approximation for $Y(t_1)$. Observe that one full step consists of $2d - 1$ substeps.

We remark that the projectors $P_i^+$, $P_i^-$ depend on the current value of $Y_i^+(t)$ or $Y_i^-(t)$; hence, they are in general time-dependent. For notational convenience, we do not denote this dependence explicitly, since the following result states that we can actually take them to be time-independent as long as they are updated after every substep. In addition, it shows how these substeps can be solved in closed form.

Theorem 4.1. Let $\Delta A = A(t_1) - A(t_0)$. The initial value problems from above satisfy

    $Y_i^+(t_1) = Y_i^+(t_0) + P_i^+(\Delta A) \qquad \text{and} \qquad Y_i^-(t_1) = Y_i^-(t_0) - P_i^-(\Delta A),$

where $P_i^+$ and $P_i^-$ are the projectors at $Y_i^+(t_0)$ and $Y_i^-(t_0)$, respectively.

In particular, if $Y_i^+(t_0)$ has the $i$th recursive SVD

    $[Y_i^+(t_0)]^{\langle i \rangle} = Q_{\le i}\, S_i\, Q_{\ge i+1}^\top \qquad \text{with} \qquad Q_{\le i} = (I_{n_i} \otimes Q_{\le i-1})\, Q_i^<,$

then we have

    $[Y_i^+(t_1)]^{\langle i-1 \rangle} = Q_{\le i-1}\, \big\{ S_{i-1}\, Q_i^{>\top} + Q_{\le i-1}^\top\, [\Delta A]^{\langle i-1 \rangle}\, (Q_{\ge i+1} \otimes I_{n_i}) \big\}\, (Q_{\ge i+1}^\top \otimes I_{n_i}).$   (4.6)

Theorem 4.1 can be turned into an efficient scheme by updating the cores of the tensor $Y(t)$ from left to right. Our explanation will be high level, focusing only on pointing out which cores stay constant and which need to be updated throughout the sweep. A graphical depiction of the resulting procedure using tensor networks is given in Fig. 4.1. More detailed implementation issues are deferred to §6.

Preparation of $Y^0$. Before solving the substeps, we prepare the starting value $Y^0$ as follows. Write $Y = Y^0$ for notational convenience and suppose

    $Y^{\langle 1 \rangle}(t_0) = Y_{\le 1}(t_0)\, Y_{\ge 2}^\top(t_0).$

By orthogonalization from the right we decompose $Y_{\ge 2}(t_0) = Q_{\ge 2}(t_0)\, R(t_0)$, so that we obtain the right-orthogonalized factorization

    $Y^{\langle 1 \rangle}(t_0) = K_1^<(t_0)\, Q_{\ge 2}^\top(t_0)$

with $K_1^<(t_0) = Y_{\le 1}(t_0)\, R^\top(t_0) \in \mathbb{R}^{n_1 \times r_1}$ the first core of $Y(t_0)$.

Computation of $Y_1^+$. Denote $Y = Y_1^+$. Since $Q_{\ge 2}^\top(t_0)\, Q_{\ge 2}(t_0) = I_{r_1}$, we have that $P_{\ge 2}(t_0) = Q_{\ge 2}(t_0)\, Q_{\ge 2}^\top(t_0)$. Applying Theorem 4.1 gives

    $Y^{\langle 1 \rangle}(t_1) = K_1^<(t_1)\, Q_{\ge 2}^\top(t_0), \qquad K_1^<(t_1) = K_1^<(t_0) + \big( A^{\langle 1 \rangle}(t_1) - A^{\langle 1 \rangle}(t_0) \big)\, Q_{\ge 2}(t_0).$

Observe that compared to $Y(t_0)$ only the first core $K_1(t_1)$ of $Y(t_1)$ is changed, while all the others (that is, those that make up $Q_{\ge 2}(t_0)$) stay constant. Hence, after computing the QR decomposition

    $K_1^<(t_1) = Q_1^<(t_1)\, R_1(t_1),$

we obtain a recursive SVD for $Y(t_1) = Y_1^+(t_1)$,

    $Y^{\langle 1 \rangle}(t_1) = Q_{\le 1}(t_1)\, R_1(t_1)\, Q_{\ge 2}^\top(t_0) \qquad \text{with} \qquad Q_{\le 1}(t_1) = Q_1^<(t_1).$

Computation of $Y_i^-$ with $i = 1, \dots, d-1$. The computation for $Y_1^-$ follows the same pattern as for arbitrary $Y_i^-$, so we explain it directly for $Y_i^-$. We require that the initial value $Y_i^-(t_0) = Y_i^+(t_1)$ is available as a recursive SVD in node $i$. This is obviously true for $Y_1^+(t_1)$, and one can verify by induction that it is also true for $Y_i^+(t_1)$ with $i > 1$, whose computation is explained below.
Denoting $Y = Y_i^-$, we have in particular

    $Y^{\langle i \rangle}(t_0) = Q_{\le i}(t_1)\, R_i(t_1)\, Q_{\ge i+1}^\top(t_0),$

with $Q_{\le i}^\top(t_1)\, Q_{\le i}(t_1) = I_{r_i} = Q_{\ge i+1}^\top(t_0)\, Q_{\ge i+1}(t_0)$. This means we can directly apply Theorem 4.1 for the computation of $Y(t_1)$ and obtain

    $Y^{\langle i \rangle}(t_1) = Q_{\le i}(t_1)\, S_i(t_1)\, Q_{\ge i+1}^\top(t_0),$   (4.7)

where $S_i(t_1) \in \mathbb{R}^{r_i \times r_i}$ is given as

    $S_i(t_1) = R_i(t_1) - Q_{\le i}^\top(t_1)\, \big( A^{\langle i \rangle}(t_1) - A^{\langle i \rangle}(t_0) \big)\, Q_{\ge i+1}(t_0).$   (4.8)

Observe that we maintain a recursive SVD in $i$ for $Y_i^-(t_1)$ without having to orthogonalize the matrices $Q_{\le i}(t_1)$ or $Q_{\ge i+1}(t_0)$.

Computation of $Y_i^+$ with $i = 2, \dots, d$. In this case, the initial value $Y_i^+(t_0) = Y_{i-1}^-(t_1)$ is available as a recursive SVD in node $i - 1$. Denoting $Y = Y_i^+$, it is easily verified by induction that

    $Y^{\langle i-1 \rangle}(t_0) = Q_{\le i-1}(t_1)\, S_{i-1}(t_1)\, Q_{\ge i}^\top(t_0),$

with $Q_{\le i-1}^\top(t_1)\, Q_{\le i-1}(t_1) = I_{r_{i-1}} = Q_{\ge i}^\top(t_0)\, Q_{\ge i}(t_0)$. Recalling the relations (2.8) and (2.10), we can transform this $(i-1)$th unfolding into the $i$th unfolding,

    $Y^{\langle i \rangle}(t_0) = (I_{n_i} \otimes Q_{\le i-1}(t_1))\, K_i^<(t_0)\, Q_{\ge i+1}^\top(t_0),$

where the core $K_i^<(t_0)$ absorbs $S_{i-1}(t_1)$ into the orthogonalized core $Q_i(t_0)$. Applying Theorem 4.1 to update $K_i^<$ and computing the QR decomposition $K_i^<(t_1) = Q_i^<(t_1)\, R_i(t_1)$ then
suffices to obtain a recursive SVD of $Y_i^+(t_1) = Y(t_1)$ at node $i$,

    $Y^{\langle i \rangle}(t_1) = Q_{\le i}(t_1)\, R_i(t_1)\, Q_{\ge i+1}^\top(t_0), \qquad Q_{\le i}(t_1) = (I_{n_i} \otimes Q_{\le i-1}(t_1))\, Q_i^<(t_1).$

[Fig. 4.1: The two sweeping algorithms update the cores selectively throughout the time-stepping computations; shown for the forward sweep when computing $Y_i^+$ and $Y_i^-$.]
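Before discussing the second-order variant, the following NumPy sketch (ours) carries out one full first-order step on a small full tensor by solving the $2d - 1$ substeps in closed form as in Theorem 4.1; the efficient implementation updates only cores, as described above:

```python
import numpy as np

def unfold(X, i):
    return np.reshape(X, (int(np.prod(X.shape[:i], dtype=int)), -1), order="F")

def ten(M, shape):
    return np.reshape(M, shape, order="F")

def splitting_step(Y, dA, ranks):
    """One first-order projector-splitting step Y(t0) -> Y(t1) on full
    (small) tensors, re-evaluating the projectors at the current iterate
    after each substep; a sketch for illustration only."""
    d, shape = Y.ndim, Y.shape

    def Q_le(i):   # orthonormal basis of range(Y^<i>), rank ranks[i]
        return np.linalg.svd(unfold(Y, i), full_matrices=False)[0][:, :ranks[i]]

    def Q_ge(i):   # orthonormal basis of range of the transpose of Y^<i-1>
        return np.linalg.svd(unfold(Y, i - 1), full_matrices=False)[2][:ranks[i - 1], :].T

    for i in range(1, d + 1):
        Zi = unfold(dA, i)                      # plus substep, cf. (4.2)
        if i > 1:
            Q = Q_le(i - 1)
            Zi = np.kron(np.eye(shape[i - 1]), Q @ Q.T) @ Zi
        if i < d:
            Q = Q_ge(i + 1)
            Zi = Zi @ (Q @ Q.T)
        Y = Y + ten(Zi, shape)
        if i < d:                               # minus substep, cf. (4.3)
            Ql, Qr = Q_le(i), Q_ge(i + 1)
            Y = Y - ten((Ql @ Ql.T) @ unfold(dA, i) @ (Qr @ Qr.T), shape)
    return Y
```

For an increment dA that keeps the tensor of constant TT/MPS rank, this step reproduces the exactness behavior established in §5.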
The final step $Y_d^+(t_1)$ will be an approximation to $Y(t_1)$ and consists of a left-orthogonal $Q_{\le d}(t_1)$. If we now want to continue with the time stepper to approximate $Y(t_2)$ for $t_2 > t_1$, we need to apply the scheme again using $Y_d^+(t_1)$ as initial value. This requires a new orthogonalization procedure from right to left, since the initial value for the sweep has to be right orthogonalized.

In many cases, it is advisable to compose the scheme from above with its adjoint instead of only orthogonalizing and continuing with the next step. In particular, the Strang splitting consists of first computing the original splitting scheme on $t \in [t_0, t_{1/2}]$ with $t_{1/2} = (t_0 + t_1)/2$, followed by the adjoint scheme on $t \in [t_{1/2}, t_1]$. The result will be a symmetric time stepper of order two; see, e.g., [8, II.5]. For our splitting, the adjoint step is simply solving the split differential equations in reverse order. Since Theorem 4.1 is independent of the ordering of the differential equations, we can again use its closed-form solutions to derive an efficient sweeping algorithm for this adjoint step. We briefly explain the first steps and refer to Algorithm 1 for the full second-order scheme. Observe that this scheme can be seen as a full back-and-forth sweep.

Denote the final step of the forward sweep on $t \in [t_0, t_{1/2}]$ by $\widehat Y = Y_d^+(t_{1/2})$. It satisfies (recall that $t_{1/2}$ takes the role of $t_1$ in the derivations above)

    $\widehat Y^{\langle d \rangle}(t_{1/2}) = (I_{n_d} \otimes Q_{\le d-1}(t_{1/2}))\, K_d^<(t_{1/2}).$

Algorithm 1: Step of the split projector integrator of second order
Data: $K_1(t_0) \in \mathbb{R}^{r_0 \times n_1 \times r_1}$; $Q_i(t_0) \in \mathbb{R}^{r_{i-1} \times n_i \times r_i}$ with $Q_i^{>\top}(t_0)\, Q_i^>(t_0) = I_{r_{i-1}}$ for $i = 2, \dots, d$; $t_0$, $t_1$
Result: $K_1(t_1) \in \mathbb{R}^{r_0 \times n_1 \times r_1}$; $Q_i(t_1) \in \mathbb{R}^{r_{i-1} \times n_i \times r_i}$ with $Q_i^{>\top}(t_1)\, Q_i^>(t_1) = I_{r_{i-1}}$ for $i = 2, \dots, d$
begin
  set $t_{1/2} = (t_0 + t_1)/2$
  (Initialization) set $\Delta_L = A(t_{1/2}) - A(t_0)$ and $\Delta_R = A(t_1) - A(t_{1/2})$
  (Forward sweep)
  set $K_1^<(t_{1/2}) = K_1^<(t_0) + \Delta_L^{\langle 1 \rangle}\, Q_{\ge 2}(t_0)$
  compute QR factorization $K_1^<(t_{1/2}) = Q_1^<(t_{1/2})\, R_1(t_{1/2})$
  set $Q_{\le 1}(t_{1/2}) = Q_1^<(t_{1/2})$
  set $S_1(t_{1/2}) = R_1(t_{1/2}) - Q_{\le 1}^\top(t_{1/2})\, \Delta_L^{\langle 1 \rangle}\, Q_{\ge 2}(t_0)$
  for $i = 2$ to $d-1$ do
    set $K_i^<(t_{1/2}) = (I_{n_i} \otimes S_{i-1}(t_{1/2}))\, Q_i^<(t_0) + (I_{n_i} \otimes Q_{\le i-1}^\top(t_{1/2}))\, \Delta_L^{\langle i \rangle}\, Q_{\ge i+1}(t_0)$
    compute QR factorization $K_i^<(t_{1/2}) = Q_i^<(t_{1/2})\, R_i(t_{1/2})$
    set $Q_{\le i}(t_{1/2}) = (I_{n_i} \otimes Q_{\le i-1}(t_{1/2}))\, Q_i^<(t_{1/2})$
    set $S_i(t_{1/2}) = R_i(t_{1/2}) - Q_{\le i}^\top(t_{1/2})\, \Delta_L^{\langle i \rangle}\, Q_{\ge i+1}(t_0)$
  (Backward sweep)
  set $K_d^<(t_1) = (I_{n_d} \otimes S_{d-1}(t_{1/2}))\, Q_d^<(t_0) + (I_{n_d} \otimes Q_{\le d-1}^\top(t_{1/2}))\, (\Delta_L + \Delta_R)^{\langle d \rangle}$
  compute QR factorization $K_d^>(t_1) = Q_d^>(t_1)\, R_{d-1}(t_1)$
  set $Q_{\ge d}(t_1) = Q_d^>(t_1)$
  set $S_{d-1}(t_1) = R_{d-1}(t_1) - Q_{\ge d}^\top(t_1)\, (\Delta_R^{\langle d-1 \rangle})^\top\, Q_{\le d-1}(t_{1/2})$
  for $i = d-1$ down to $2$ do
    set $K_i^>(t_{1/2}) = (S_i^\top(t_1) \otimes I_{n_i})\, Q_i^>(t_{1/2})$
    set $K_i^>(t_1) = K_i^>(t_{1/2}) + (Q_{\ge i+1}^\top(t_1) \otimes I_{n_i})\, (\Delta_R^{\langle i-1 \rangle})^\top\, Q_{\le i-1}(t_{1/2})$
    compute QR factorization $K_i^>(t_1) = Q_i^>(t_1)\, R_{i-1}(t_1)$
    set $Q_{\ge i}(t_1) = (Q_{\ge i+1}(t_1) \otimes I_{n_i})\, Q_i^>(t_1)$
    set $S_{i-1}(t_1) = R_{i-1}(t_1) - Q_{\ge i}^\top(t_1)\, (\Delta_R^{\langle i-1 \rangle})^\top\, Q_{\le i-1}(t_{1/2})$
  set $K_1^>(t_{1/2}) = (S_1^\top(t_1) \otimes I_{n_1})\, Q_1^>(t_{1/2})$
  set $K_1^>(t_1) = K_1^>(t_{1/2}) + (Q_{\ge 2}^\top(t_1) \otimes I_{n_1})\, (\Delta_R^{\langle 0 \rangle})^\top$
end

5. Exactness property of the integrator. We show that the splitting integrator is exact when $A(t)$ is a tensor of constant TT/MPS rank $r$. This is similar to Theorem 4.1 in [19] for the matrix case, except that in our case we require the rank of $A(t)$ to be exactly $r$ and not merely bounded by $r$. Note, however, that the positive singular values of unfoldings of $A(t)$ can be arbitrarily small.

Theorem 5.1. Suppose $A(t) \in \mathcal{M}$ for $t \in [t_0, t_1]$.
Then, for sufficiently small $t_1 - t_0 > 0$, the splitting integrators of orders one and two are exact when started from $Y^0 = A(t_0)$. In particular, $Y_d^+(t_1) = A(t_1)$ for the first-order integrator.

The proof of this theorem follows directly from the following lemma.

Lemma 5.2. Suppose $A(t) \in \mathcal{M}$ for $t \in [t_0, t_1]$ with recursive SVDs

    $[A(t)]^{\langle i \rangle} = Q_{\le i}(t)\, S_i(t)\, Q_{\ge i+1}^\top(t), \qquad i = 0, 1, \dots, d.$

Let $Y^0 = A(t_0)$; then, for sufficiently small $t_1 - t_0 > 0$, the consecutive steps in the splitting integrator of §4 satisfy

    $Y_i^+(t_1) = \mathrm{P}^{(0)}_{\ge i+1} A(t_1) \quad \text{and} \quad Y_i^-(t_1) = \mathrm{P}^{(1)}_{\le i} A(t_0), \qquad i = 1, 2, \dots, d,$

where

    $\mathrm{P}^{(0)}_{\ge i+1} Z = \mathrm{Ten}_i\big( Z^{\langle i \rangle}\, Q_{\ge i+1}(t_0)\, Q_{\ge i+1}^\top(t_0) \big), \qquad \mathrm{P}^{(1)}_{\le i} Z = \mathrm{Ten}_i\big( Q_{\le i}(t_1)\, Q_{\le i}^\top(t_1)\, Z^{\langle i \rangle} \big).$

Before proving this lemma, we point out that the assumption of a sufficiently small $t_1 - t_0$ is needed only because the matrices $Q_{\le i-1}^\top(t_1)\, Q_{\le i-1}(t_0)$ and $Q_{\ge i}^\top(t_1)\, Q_{\ge i}(t_0)$ need to be invertible. Since the full column-rank matrices $Q_{\le i}(t)$ and $Q_{\ge i}(t)$ can be chosen as continuous functions of $t$, this is always satisfied for $t_1 - t_0$ sufficiently small. It may, however, also hold for larger values of $t_1 - t_0$.

Proof. The proof proceeds by induction on $i$ from left to right. Since $Y_1^+(t_0) = A(t_0)$, we can include the case $Y_1^+$ in the argument below for general $i$ by putting $Y_0^-(t_1) = Y_1^+(t_0)$ and $\mathrm{P}^{(1)}_{\le 0} = 1$.

Now, suppose the statement to be true for $i - 1$. Then $Y_i^+(t_0) = Y_{i-1}^-(t_1) = \mathrm{P}^{(1)}_{\le i-1} A(t_0)$, which gives

    $[Y_i^+(t_0)]^{\langle i-1 \rangle} = Q_{\le i-1}(t_1)\, Q_{\le i-1}^\top(t_1)\, Q_{\le i-1}(t_0)\, S_{i-1}(t_0)\, Q_{\ge i}^\top(t_0) = Q_{\le i-1}(t_1)\, S_{i-1}^+\, Q_{\ge i}^\top(t_0).$

Observe that $Y_i^+(t_0) \in \mathcal{M}$ since $S_{i-1}^+ = Q_{\le i-1}^\top(t_1)\, Q_{\le i-1}(t_0)\, S_{i-1}(t_0)$ is of full rank for $t_1 - t_0$ sufficiently small. Hence, from (2.8)–(2.10) we obtain

    $[Y_i^+(t_0)]^{\langle i \rangle} = (I_{n_i} \otimes Q_{\le i-1}(t_1))\, (I_{n_i} \otimes S_{i-1}^+)\, Q_i^<(t_0)\, Q_{\ge i+1}^\top(t_0).$

Comparing to (4.2), we see that the projector onto the tangent space at $Y_i^+(t_0)$ equals $P_i^+ = \mathrm{P}^{(1)}_{\le i-1}\, \mathrm{P}^{(0)}_{\ge i+1} = \mathrm{P}^{(0)}_{\ge i+1}\, \mathrm{P}^{(1)}_{\le i-1}$. The previous identities give with Theorem 4.1 that

    $Y_i^+(t_1) = Y_i^+(t_0) + P_i^+ A(t_1) - P_i^+ A(t_0) = \mathrm{P}^{(1)}_{\le i-1} A(t_0) + \mathrm{P}^{(0)}_{\ge i+1} \mathrm{P}^{(1)}_{\le i-1} A(t_1) - \mathrm{P}^{(1)}_{\le i-1} \mathrm{P}^{(0)}_{\ge i+1} A(t_0) = \mathrm{P}^{(0)}_{\ge i+1} A(t_1),$

where we used $\mathrm{P}^{(0)}_{\ge i+1} A(t_0) = A(t_0)$ and $\mathrm{P}^{(1)}_{\le i-1} A(t_1) = A(t_1)$.

Continuing with $Y_i^-(t_0) = Y_i^+(t_1) = \mathrm{P}^{(0)}_{\ge i+1} A(t_1)$, we have

    $[Y_i^-(t_0)]^{\langle i \rangle} = Q_{\le i}(t_1)\, S_i(t_1)\, Q_{\ge i+1}^\top(t_1)\, Q_{\ge i+1}(t_0)\, Q_{\ge i+1}^\top(t_0) = Q_{\le i}(t_1)\, S_i^-\, Q_{\ge i+1}^\top(t_0).$

This is again a recursive SVD with full-rank $S_i^- = S_i(t_1)\, Q_{\ge i+1}^\top(t_1)\, Q_{\ge i+1}(t_0)$. Comparing to (4.3), we have $P_i^- = \mathrm{P}^{(1)}_{\le i}\, \mathrm{P}^{(0)}_{\ge i+1} = \mathrm{P}^{(0)}_{\ge i+1}\, \mathrm{P}^{(1)}_{\le i}$ and Theorem 4.1 gives

    $Y_i^-(t_1) = Y_i^-(t_0) - P_i^- A(t_1) + P_i^- A(t_0) = \mathrm{P}^{(0)}_{\ge i+1} A(t_1) - \mathrm{P}^{(0)}_{\ge i+1} \mathrm{P}^{(1)}_{\le i} A(t_1) + \mathrm{P}^{(1)}_{\le i} \mathrm{P}^{(0)}_{\ge i+1} A(t_0) = \mathrm{P}^{(1)}_{\le i} A(t_0),$

where we used $\mathrm{P}^{(1)}_{\le i} A(t_1) = A(t_1)$ and $\mathrm{P}^{(0)}_{\ge i+1} A(t_0) = A(t_0)$. This concludes the proof.

Now, Theorem 5.1 is a simple corollary.

Proof of Theorem 5.1. For the forward sweep (that is, the first-order scheme), Lemma 5.2 immediately gives exactness since $Y_d^+(t_1) = \mathrm{P}^{(0)}_{\ge d+1} A(t_1) = A(t_1)$ with $Q_{\ge d+1}(t_0) = 1$.
The second-order scheme composes this forward sweep with a backward sweep involving the same substeps. It is not difficult to prove the analogous version of Lemma 5.2 for such a backward ordering, so that exactness is established for the second-order scheme too.

6. Numerical implementation and experiments. We consider two numerical experiments. First, we use the splitting integrator for the integration of a time-dependent molecular Schrödinger equation with a model potential. In the second experiment, we use one step of the splitting integrator as a retraction on the manifold of TT/MPS tensors and perform a Newton–Schulz iteration for approximate matrix inversion.

As explained in §4, the splitting integrator updates the cores $K_i$ and the matrices $S_i$ in a forward and, possibly, backward ordering. Except for the (relatively cheap) orthogonalizations of the cores, the most computationally intensive part of the algorithm is computing these updates. For example, in the forward sweep, we need to compute the contractions (see Fig. 4.1)

    $\Delta_i^+ = (I \otimes Q_{\le i-1}^\top(t_1))\, [A(t_1) - A(t_0)]^{\langle i \rangle}\, Q_{\ge i+1}(t_0), \qquad \Delta_i^- = Q_{\le i}^\top(t_1)\, [A(t_1) - A(t_0)]^{\langle i \rangle}\, Q_{\ge i+1}(t_0).$

It is highly recommended to avoid constructing the matrices $Q_{\le i}$ and $Q_{\ge i}$ explicitly when computing $\Delta_i^+$ and $\Delta_i^-$, and instead to exploit their TT/MPS structure. How this can be done depends mostly on the structure of the increments $A(t_1) - A(t_0)$. In particular, the contractions are computed inexpensively if $A(t)$ is itself a linear combination of TT/MPS tensors, possibly having different rank than $Y$, and a sparse tensor.

The computation of $K_i$ and $S_i$ changes when the tensor $A(t)$ is not given explicitly, but determined as the solution of a tensor differential equation

    $\dot A(t) = F(t, A(t)).$

In the case of a forward sweep, $Y_i^+(t_1)$ is obtained as the evaluation at $t = t_1$ of

    $Y_i^+(t) = (I \otimes Q_{\le i-1}(t_1))\, K_i^<(t)\, Q_{\ge i+1}^\top(t_0),$

where the small core $K_i^<(t)$ solves the differential equation obtained by projecting $F$ onto the corresponding subspace; the substeps for $S_i$ are treated analogously.

For the first experiment, we integrate the time-dependent molecular Schrödinger equation $i\dot\psi = H\psi$ with the model Hamiltonian
    $H = -\frac{1}{2}\Delta + \underbrace{\frac{1}{2}\sum_{k=1}^{f} q_k^2 + \overbrace{\lambda \sum_{k=1}^{f-1}\Big( q_k^2\, q_{k+1} - \frac{1}{3}\, q_{k+1}^3 \Big)}^{\text{anharmonic part}}}_{\text{Henon--Heiles potential } V(q_1, \dots, q_f)}$   (6.2)

with $\lambda = 0.111803$. For the initial value $\psi_0$, we choose a product of shifted Gaussians,

    $\psi_0 = \prod_{i=1}^{f} \exp\Big( -\frac{(q_i - 2)^2}{2} \Big).$

The correct discretization of such problems is delicate. A standard approach is to use a Discrete Variable Representation (DVR), specifically the Sine-DVR scheme from [3]. In addition, since the problem is defined over the whole space, appropriate boundary conditions are required. We use complex absorbing potentials (CAP) of the form (see, for example, [22])

    $W(q) = i\eta \sum_{i=1}^{f} \Big( (q_i - q_i^{(r)})_+^{b_r} + (q_i - q_i^{(l)})_-^{b_l} \Big),$

where

    $z_+ = \begin{cases} z & \text{if } z \ge 0, \\ 0 & \text{otherwise}, \end{cases} \qquad z_- = \begin{cases} z & \text{if } z \le 0, \\ 0 & \text{otherwise}. \end{cases}$

The parameters $q_i^{(r)}$ and $q_i^{(l)}$ specify the effective boundary of the domain. The CAP reduces the reflection from the boundary back into the domain, but the system is no longer conservative. For the Henon–Heiles example from above we have chosen a small damping strength $\eta$ and $q_i^{(l)} = -6$, $q_i^{(r)} = 6$, $b_r = b_l = 3$.

We compute the dynamics using the second-order splitting integrator, where the (linear) local problems for $K_i$, $S_i$ are integrated using the Expokit package [30] with a tight relative accuracy tolerance.

In order to evaluate the accuracy and efficiency of our proposed splitting integrator, we performed a preliminary comparison with the multi-configuration time-dependent Hartree (MCTDH) package [32]. The MCTDH method [22] is the de-facto standard for high-dimensional quantum molecular dynamics simulations. For a detailed description of MCTDH, we refer to [22, 23, 2, 21].

As numerical experiment, we run MCTDH for the 10-dimensional Henon–Heiles problem from above with mode-folding. This can be considered as a first step of the hierarchical Tucker format (in this context called the multilayer MCTDH decomposition): with 32 basis functions in each mode, the resulting function was approximated by a 5-dimensional tensor with mode sizes equal to 18. The final time was $T = 60$. Our splitting integrator solved the same Henon–Heiles problem, but now using the second-order splitting integrator with a fixed time step $h = 0.01$. Except that we use the TT/MPS manifold for our scheme instead of a Tucker-type manifold as in MCTDH, all other computational parameters are the same.

In Fig. 6.1 we see the vibrational spectrum of a molecule, which is obtained as follows. After the dynamical low-rank approximation $\psi(t)$ is computed, we evaluate the autocorrelation function

    $a(t) = \langle \psi(t), \psi(0) \rangle$

and compute its Fourier transform $\widehat a(\xi)$. The absolute value of $\widehat a(\xi)$ gives information about the energy spectrum of the operator. If the dynamics is approximated sufficiently accurately, the function $\widehat a(\xi)$ is approximated by a sum of delta functions located at the eigenvalues of $H$. This can be considered as a method to approximate many eigenvalues of $H$ using only one solution of the dynamical problem, which is not typical in standard numerical analysis, but often used in chemistry.

We see in Fig. 6.1 that the computed spectra are very similar, but the MCTDH computation took 54 354 seconds, whereas the splitting integrator scheme took only 4 425 seconds.
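The spectrum computation from the sampled autocorrelation function can be sketched in a few lines (our illustration; normalization conventions vary):

```python
import numpy as np

def vibrational_spectrum(a, dt):
    """|a_hat(xi)| from samples a_k = <psi(t_k), psi(0)> of the
    autocorrelation function on a uniform time grid with spacing dt."""
    xi = np.fft.fftshift(np.fft.fftfreq(len(a), d=dt)) * 2.0 * np.pi
    a_hat = np.fft.fftshift(np.fft.fft(a)) * dt
    return xi, np.abs(a_hat)
```

Peaks of the returned magnitude then approximate the locations of eigenvalues of H, as described above.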
A detailed comparison of the splitting scheme and MCTDH for quantum molecular dynamics will be presented elsewhere. This will include different benchmark problems and a comparison with the multilayer version of MCTDH.

[Fig. 6.1: Spectrum computed by the second-order splitting integrator (TT-KSL, rank = 18) and by the MCTDH package.]

Optimization on low-rank tensor manifolds is another promising application of the splitting integrator scheme and can be rather easily incorporated. Consider some iterative process of the form

    $Y_{k+1} = Y_k + \Delta_k, \qquad k = 0, 1, \dots,$   (6.3)

where $\Delta_k$ is the update. In order to obtain approximations $Z_k \in \mathcal{M}$ of $Y_k$ in the TT/MPS format, one typically retracts the new iterate back to $\mathcal{M}$,

    $Z_{k+1} = P_r( Z_k + \Delta_k ),$

with $P_r : \mathbb{R}^{n_1 \times \cdots \times n_d} \to \mathcal{M}$ a retraction; see [1]. A widely used choice for $P_r$ is the quasi-optimal projection computed by TT-SVD [26]. Instead, we propose the cheaper alternative of one step of Algorithm 1 with $A(t_1) - A(t_0) = \Delta_k$ as $P_r$. In practice, the intermediate quantities in Algorithm 1 have to be computed without forming $\Delta_k$ explicitly. This can be done, for example, when $\Delta_k$ is a TT/MPS tensor of low rank, as explained earlier in this section.

As a concrete example, we consider the Newton–Schulz iteration for approximating the inverse of a matrix $A$,

    $Y_{k+1} = 2 Y_k - Y_k A Y_k, \qquad k = 0, 1, \dots.$   (6.4)

It is well known that iteration (6.4) converges quadratically provided that $\rho(I - A Y_0) < 1$, where $\rho(\cdot)$ is the spectral radius of the matrix. The matrix $A$ is supposed to have low TT/MPS rank when seen as a tensor. This typically arises from a discretization of a high-dimensional operator on a tensor grid. In our numerical experiments we have taken the $M$-dimensional Laplace operator with Dirichlet boundary conditions, discretized by the usual second-order central finite difference on a uniform grid with $2^d$ points in each mode.
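As a dense-matrix illustration of the retracted iteration (ours): the experiments below use the QTT format and one step of Algorithm 1 as the retraction, for which a truncated SVD is a simple stand-in here; the callable `retract` is a hypothetical interface of our own choosing:

```python
import numpy as np

def newton_schulz_inverse(A, Y0, retract, n_iter=30):
    """Newton-Schulz iteration Y_{k+1} = 2 Y_k - Y_k A Y_k, cf. (6.4),
    with every iterate retracted to the low-rank manifold."""
    Y = Y0
    for _ in range(n_iter):
        delta = Y - Y @ A @ Y          # the update Delta_k in (6.3)
        Y = retract(Y, delta)
    return Y

def svd_retract(rank):
    """Stand-in retraction: rank truncation by SVD (TT-SVD analogue)."""
    def retract(Z, delta):
        U, s, Vt = np.linalg.svd(Z + delta, full_matrices=False)
        return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    return retract

# demo: 1D Laplacian; starting guess alpha*I with rho(I - alpha*A) < 1
n, rank = 64, 30
A = 2.0 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
alpha = 1.0 / np.linalg.norm(A, 2)
Y0 = alpha * np.eye(n)
Y = newton_schulz_inverse(A, Y0, svd_retract(rank))
print(np.linalg.norm(A @ Y - np.eye(n)) / np.linalg.norm(A @ Y0 - np.eye(n)))
```

As in the experiments reported below, the residual of such a truncated iteration decreases until the chosen rank can no longer represent the inverse well, and then stalls.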
The inverse is sought in the QTT format [14, 25], that is, the $Md$-dimensional TT/MPS format with all dimensions $n_i = 2$. It is known [13] that in this format the matrix $A$ is represented with QTT ranks bounded by 4. Since $A$ is symmetric positive definite, as an initial guess we choose $Y_0 = \alpha I$ with a sufficiently small $\alpha$. The splitting integrator is applied with $\Delta_k = Y_k - Y_k A Y_k$. It requires a certain amount of technical work to implement all the operations involved in the QTT format, but the final complexity is linear in the dimension of the tensor (and, of course, polynomial in the rank). Since the initial guess has rank 1, to put it onto the rank-$r$ manifold we artificially add a zero tensor and formally apply the splitting integrator.

As a first numerical result, we compare the projector-splitting scheme to the standard approach where after each step of the Newton–Schulz iteration we project onto a manifold of tensors with bounded TT/MPS ranks $r$ using the TT-SVD,

    $Y_{k+1} = P_r( 2 Y_k - Y_k A Y_k ).$

The parameters are set as $M = 2$, $d = 7$, $r = 20$, with a small starting value $\alpha$. The convergence of the relative residual $\|A Y_k - I\| / \|A Y_0 - I\|$ in the Frobenius norm for the two methods is presented in Fig. 6.2. The splitting method has slightly better accuracy and, more importantly, is significantly faster (6.3 seconds versus 287.2 seconds).

[Fig. 6.2: Convergence of the split projector method and the SVD-based projection method for $M = 2$, $d = 7$, $r = 20$.]

During the numerical experiments we observed that the residual always decreases until the point when the manifold is insufficient to hold a good approximation to the inverse, after which it either stabilizes or diverges. The exact explanation of this behavior is out of the scope of the current paper, but it could probably be remedied using a proper line search on $\mathcal{M}$ as in [1]. Fig. 6.3 shows the convergence behavior for different $M$ and $r$, with $d$ and $\alpha$ fixed. Fig. 6.4 shows the convergence behavior with respect to different $\alpha$ and $d$. Finally, Fig. 6.5 shows that the code scales well with $d$ and $M$.

[Fig. 6.3: Relative residual vs. iteration number for the approximate inversion using TT/MPS ranks $r = 1, 10, 20, 30, 40$ of the $M$-dimensional Laplace operator ($M = 1, 2, 3, 10$) on a uniform grid with $2^7 = 128$ points in each dimension; fixed starting guess $Y_0 = \alpha I$.]

[Fig. 6.4: Relative residual vs. iteration number for the approximate inversion for $d = 7, 8, 9, 10$ and several values of $\alpha$, on a uniform grid with $2^d$ points in each dimension; starting guesses $Y_0 = \alpha I$.]

[Fig. 6.5: Time in log-log scale as a function of the total dimension $Md$ of the tensor, for $r = 1, 10, 20, 30, 40$, showing linear scaling.]

7. Conclusion. We have presented and studied a robust and computationally efficient integrator for updating tensors in the tensor train or matrix product state format and for approximately solving tensor differential equations with the approximations retaining the data-sparse tensor train format. Quantum dynamics and tensor optimization appear as promising application areas.

It appears possible to extend this approach to the manifold of hierarchical Tucker tensors of fixed rank [31] and its dynamical approximation [20]. This will be reported elsewhere. In addition, the integrator bears a close resemblance to alternating least squares (ALS) or one-site DMRG (see, e.g., [11, 4] and, for a geometric analysis, [28]) when the time step goes to infinity. This requires further investigation.

Acknowledgement. We thank Jutho Haegeman and Frank Verstraete (Gent) for helpful discussions regarding matrix product states and the splitting integrator, and Hans-Dieter Meyer (Heidelberg) for explaining the basic concepts behind quantum molecular dynamics simulations and for his help with the MCTDH package. We thank the two referees as well as Emil Kieri (Uppsala) and Hanna Walach (Tübingen) for pointing out numerous typos in a previous version and for suggesting improvements of the presentation.

The work of C.L. was supported by DFG through SPP 1324 and GRK 1838. The work of I.O. was supported by Russian Science Foundation grant 14-11-00659.

REFERENCES

[1] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, NJ, 2008.
[2] M. H. Beck, A. Jäckle, G. A. Worth, and H.-D. Meyer, The multiconfiguration time-dependent Hartree method: A highly efficient algorithm for propagating wavepackets, Phys. Rep., 324 (2000), pp. 1–105.
[3] D. T. Colbert and W. H. Miller, A novel discrete variable representation for quantum mechanical reactive scattering via the S-matrix Kohn method, J. Chem. Phys., 96 (1992), pp. 1982–1991.
[4] S. V. Dolgov and I. V. Oseledets, Solution of linear systems and matrix inversion in the TT-format, SIAM J. Sci. Comput., 34 (2012), pp. A2718–A2739.
[5] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer, Berlin, 2012.
[6] W. Hackbusch, B. Khoromskij, and E. Tyrtyshnikov, Approximate iterations for structured matrices, Numer. Math., 109 (2008), pp. 365–383.
[7] J. Haegeman, T. J. Osborne, and F. Verstraete, Post-matrix product state methods: To tangent space and beyond, Phys. Rev. B, 88 (2013), p. 075133.
[8] E. Hairer, C. Lubich, and G. Wanner, Geometric Numerical Integration, Springer-Verlag, Berlin, second ed., 2006.
[9] M. Hochbruck and C. Lubich, On Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 34 (1997), pp. 1911–1925.
[10] M. Hochbruck and A. Ostermann, Exponential integrators, Acta Numerica, 19 (2010), pp. 209–286.
[11] S. Holtz, T. Rohwedder, and R. Schneider, The alternating linear scheme for tensor optimisation in the TT format, SIAM J. Sci. Comput., 34 (2012), pp. A683–A713.
[12] S. Holtz, T. Rohwedder, and R. Schneider, On manifolds of tensors of fixed TT-rank, Numer. Math., 120 (2012), pp. 701–731.
[13] V. A. Kazeev and B. N. Khoromskij, Low-rank explicit QTT representation of the Laplace operator and its inverse, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 742–758.
[14] B. N. Khoromskij, O(d log N)-Quantics approximation of N-d tensors in high-dimensional numerical modeling, Constr. Approx., 34 (2011), pp. 257–280.
[15] O. Koch and C. Lubich, Dynamical low-rank approximation, SIAM J. Matrix Anal. Appl., 29 (2007), pp. 434–454.
[16] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review, 51 (2009), pp. 455–500.
[17] P. Kramer and M. Saraceno, Geometry of the Time-Dependent Variational Principle in Quantum Mechanics, vol. 140 of Lecture Notes in Physics, Springer-Verlag, Berlin-New York, 1981.
[18] C. Lubich, From Quantum to Classical Molecular Dynamics: Reduced Models and Numerical Analysis, Zurich Lectures in Advanced Mathematics, European Mathematical Society (EMS), Zürich, 2008.
[19] C. Lubich and I. Oseledets, A projector-splitting integrator for dynamical low-rank approximation, BIT, 54 (2014), pp. 171–188.
[20] C. Lubich, T. Rohwedder, R. Schneider, and B. Vandereycken, Dynamical approximation of hierarchical Tucker and tensor-train tensors, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 470–494.
[21] U. Manthe, H.-D. Meyer, and L. S. Cederbaum, Wave-packet dynamics within the multiconfiguration Hartree framework: General aspects and application to NOCl, J. Chem. Phys., 97 (1992), pp. 3199–3213.
[22] H.-D. Meyer, F. Gatti, and G. A. Worth, eds., Multidimensional Quantum Dynamics: MCTDH Theory and Applications, Wiley-VCH, Weinheim, 2009.
[23] H.-D. Meyer and G. A. Worth, Quantum molecular dynamics: Propagating wavepackets and density operators using the multiconfiguration time-dependent Hartree (MCTDH) method, Theor. Chem. Acc., 109 (2003), pp. 251–267.
[24] M. Nest and H.-D. Meyer, Benchmark calculations on high-dimensional Henon-Heiles potentials with the multi-configuration time dependent Hartree (MCTDH) method, J. Chem. Phys., 117 (2002), p. 10499.
[25] I. V. Oseledets, Approximation of 2^d x 2^d matrices using tensor decomposition, SIAM J. Matrix Anal. Appl., 31 (2010), pp. 2130–2145.
[26] I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput., 33 (2011), pp. 2295–2317.
[27] I. V. Oseledets and E. E. Tyrtyshnikov, Breaking the curse of dimensionality, or how to use SVD in many dimensions, SIAM J. Sci. Comput., 31 (2009), pp. 3744–3759.
[28] T. Rohwedder and A. Uschmajew, On local convergence of alternating schemes for optimization of convex problems in the tensor train format, SIAM J. Numer. Anal., 51 (2013), pp. 1134–1162.
[29] U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Annals of Physics, 326 (2011), pp. 96–192.
[30] R. B. Sidje, Expokit: a software package for computing matrix exponentials, ACM Transactions on Mathematical Software (TOMS), 24 (1998), pp. 130–156.
[31] A. Uschmajew and B. Vandereycken, The geometry of algorithms using hierarchical tensors, Linear Algebra Appl., 439 (2013), pp. 133–166.
[32] G. A. Worth, M. H. Beck, A. Jäckle, and H.-D. Meyer, The MCTDH package, see http://mctdh.uni-hd.de.