[PDF] Computation of extreme eigenvalues in higher dimensions using block tensor train format

Abstract

We consider an approximate computation of several minimal eigenpairs of large Hermitian matrices which come from high--dimensional problems. We use the tensor train format (TT) for vectors and matrices to overcome the curse of dimensionality and make storage and computational cost feasible. Applying a block version of the TT format to several vectors simultaneously, we compute the low--lying eigenstates of a system by minimization of a block Rayleigh quotient performed in an alternating fashion for all dimensions. For several numerical examples, we compare the proposed method with the deflation approach when the low--lying eigenstates are computed one-by-one, and also with the variational algorithms used in quantum physics.

Full PDF

aa r X i v : . [ m a t h . NA ] J un Computation of extreme eigenvalues in higherdimensions using block tensor train format ∗ S. V. Dolgov † , B. N. Khoromskij † , I. V. Oseledets ‡ , D. V. Savostyanov ‡ § June 09, 2013

Abstract

We consider an approximate computation of several minimal eigenpairsof large Hermitian matrices which come from high–dimensional problems.We use the tensor train format (TT) for vectors and matrices to overcome thecurse of dimensionality and make storage and computational cost feasible.Applying a block version of the TT format to several vectors simultaneously,we compute the low–lying eigenstates of a system by minimization of a blockRayleigh quotient performed in an alternating fashion for all dimensions. Forseveral numerical examples, we compare the proposed method with the de-ﬂation approach when the low–lying eigenstates are computed one-by-one,and also with the variational algorithms used in quantum physics.

Keywords: high–dimensional problems, DMRG, MPS, tensor train format,low–lying eigenstates.

High-dimensional problems are notoriously diﬃcult to solve by standard numeri-cal techniques due to the curse of dimensionality — the complexity grows exponen-tially with the number of degrees of freedom. The problems of such kind arise inmany diﬀerent applications in physics, chemistry, biology and engineering, buttheir study in numerical linear algebra has begun quite recently.There are not many techniques capable of solving high–dimensional problemseﬃciently. The most prominent among them are Monte Carlo and quasi MonteCarlo methods, best N-term approximations, and advanced discretization meth-ods such as sparse grids and radial basis functions. However, all of these methodshave their own disadvantages. For example, it is diﬃcult to achieve high accuracy ∗ Partially supported by RFBR grants 11-01-00549-a, 12-01-33013, 12-01-00546-a, 12-01-91333-nnio-a, Rus. Fed. Gov. project 16.740.12.0727 at the Institute of Numerical Mathematics RAS andEPSRC grant EP/H003789/1 at the University of Southampton. † Max-Planck Institute for Mathematics in the Sciences, Inselstrasse 22, Leipzig 04103, Germany( [email protected], [email protected] ) ‡ Institute of Numerical Mathematics of Russian Academy of Sciences, Gubkina 8, Moscow119333, Russia ( [email protected] ) § University of Southampton, School of Chemistry, Highﬁeld Campus, Southampton SO17 1BJ,United Kingdom ( [email protected] ) d & One of the most fruitful ideas for solving high-dimensional problems is theidea of separation of variables . For two variables it boils down to the celebratedSchmidt decomposition, which is known on a discrete level as the singular valuedecomposition (SVD), a particular low–rank decomposition of a matrix. Diﬀerentgeneralizations of this idea to higher dimensions, most notable are the canoni-cal (CP) and Tucker formats, have been studied motivated by the applications indata analysis, particularly chemometrics, see the review [27]. These classical for-mats have their drawbacks as well, e.g. the canonical format is in general notstable to perturbations, and the Tucker format suﬀers from the curse of dimen-sionality. Nevertheless, in many applications the canonical representation can becomputed eﬃciently using, e.g. greedy algorithms [1, 28, 43] or by a multigrid ac-celerated reduced higher order SVD combined with the Tucker format [23], andfor the Tucker format a quasioptimal approximation can be computed using theSVD algorithm [3].Eﬃcient methods for quantum many-body systems are based on low–parametrictensor product formats. One of the most successful approaches, the density ma-trix renormalization group (DMRG) [45, 46] is an optimization technique that usesthe matrix product state (MPS) representation, see the review [41]. The MPS andDMRG are described in a problem–speciﬁc language, and despite they became themethods of choice for many applications in the solid state physics and quantumchemistry, they were unknown in numerical analysis.Looking for more eﬃcient dimensionality reduction schemes, two groups inthe numerical linear algebra community have re-discovered independently suc-cessful tensor formats under diﬀerent names, namely the Tree-Tucker [35], tensortrain (TT) [33] and hierarchical Tucker (HT) [15, 12] formats. The equivalence ofthe TT and MPS format has been shortly discovered and reported in [16]. Thisconnection is very beneﬁcial and fruitful: the idea of the DMRG algorithm hasbeen applied to diﬀerent kinds of problem in numerical analysis: approximatesolution of linear systems [16, 8], solution of eigenvalue problems [25], dynam-ics [6], cross interpolation [40]. At the same time, new tensor formats have beenproposed, e.g. the quantized tensor train (QTT) [21, 31], and the QTT-Tucker [4],and several new results have been obtained for eigenproblems [29], solution of lin-ear systems [9, 10], multidimensional convolution [20, 13, 17], multidimensionalFourier transform [7], interpolation [39].The use of diﬀerent tensor product formats for quantum chemistry problemshas been considered in [22, 11, 23, 14, 24, 19]. This paper can be considered as a nat-ural extension of the work [25], where the computation of a single extreme eigen-value using the (quantized) tensor train format was revisited. Instead of comput-ing one eigenvalue we want to compute approximations to several extreme eigen-pairs simultaneously. Such problems appear, for example, in the computationof excited states in physics and chemistry. This generalization is not straightfor-ward, taking into account the special structure of the manifold where the solutionis sought. Our approach consists in two key components. First, all eigenstates arerepresented in the TT-format in a simultaneous manner (so-called block TT-format ),and the minimization problem is formulated in a rigorous way as a minimization2f the Block Rayleigh quotient. The optimization problem is then solved by meansof alternating optimization with a small but very important trick: instead of usingone-and-the same representation for all steps, the auxiliary index correspondingto the eigenstate number is always associated with the currently optimized TTcore. The local problem at each iteration step becomes a linear block eigenvalueproblem which can be treated by standard iterative techniques.At the moment the draft of this paper was written, Frank Verstraete and IztokPizorn kindly informed us about the work [36] in the MPS community, which alsoaddresses solution of the block Rayleigh minimization problem in the alternatingframework, but the methods described in [36] diﬀer from the algorithm proposedin this paper.Our algorithm has a better asymptotic complexity with respect to the modesize, but adapts the TT ranks during the iterations like in the DMRG scheme. Thiscan be particularly important for solving high–dimensional problems with largemode sizes.The paper is organized as follows. In Section 2 deﬁnitions of the tensor trainformat are introduced. In Section 3 the algorithm is presented, and the complex-ity is analyzed and compared to the algorithms used in quantum physics. Sec-tions 4, 5, 6 contain numerical experiments for the particle in a box, the Henon–Heiles potential, and the Heisenberg model, including the comparison of the com-putational speed with the publicly available DMRG implementations.

We consider the eigenproblem AX = XΛ, with the Hermitian matrix A = A ∗ , and we are interested in B extreme eigenvalues λ b and their eigenvectors x b , for b =

1, . . . , B.

This problem is equivalent to the minimization of the block Rayleighquotient trace ( X ∗ AX ) → min , s.t. X ∗ X = I, (1)where X = [ x b ] Bb = contains the orthogonal eigenvectors.We assume that the problem has a tensor-product structure, i.e. all eigenvec-tors involved can be associated with d -dimensional tensors. Speciﬁcally, the el-ements of a vector x = [ x ( i )] Ni = can be enumerated with d mode indices i , . . . , i d by a linear map i = i . . . i d . The mode indices run through i k =

1, . . . , n k , where n k are referred to as the mode sizes for k =

1, . . . , d.

Naturally, N = n . . . n d , andif all mode sizes are of the same order n k ∼ n, the number of unknowns growsexponentially with the dimension, N ∼ n d . To make the problem tractable, we usethe tensor train (TT) format [33], deﬁned as follows, x ( i ) = x ( i . . . i d ) = X ( ) ( i ) . . . X ( d ) ( i d )= X α X ( ) α ( i ) . . . X ( k − ) α k − ,α k − ( i k − ) X ( k ) α k − ,α k ( i k ) X ( k + ) α k ,α k + ( i k + ) . . . X ( d ) α d − ( i d ) . (2)Here and later we write equations in the elementwise notation , i.e. assume theyhold for all possible values of all free indices. The summation over α = ( α , . . . , α d − ) assumes the summation over all possible values of all auxiliary (or bond) indices α k =

1, . . . , r k , where numbers r , . . . , r d − are referred to as the tensor train ranks (TT–ranks). Each X ( k ) ( i k ) is an r k − × r k matrix, i.e. each entry of a vector x = [ x ( i )] ( k ) X ( k + ) X ( d ) X ( k − ) X ( ) X ( ) α α α k − α k − α k α k + α d i i i k − i k i k + i d A ( k ) A ( k + ) A ( d ) A ( k − ) A ( ) A ( ) α α α k − α k − α k α k + α d i i i k − i k i k + i d j j j k − j k j k + j d Figure 1: Above: tensor train format (2) shown as a linear tensor network. Below:tensor train format (9) for a matrix in higher dimensions. The boxes are tensorswith lines (legs) denoting indices. Each bond between two tensors assumes a sum-mation over the joint index.is represented by a product of d matrices in the right–hand side. The three–dimensional arrays X ( k ) = [ X ( k ) α k − ,α k ( i k )] of size r k − × n k × r k are referred to asthe TT–cores . In the representation (2), the vector x is seen as a vectorization of a d –tensor, which explains the name of the format. The tensor train format is a lineartensor network , and can be illustrated as a graph, see Fig. 1.In this paper we represent all eigenvectors of interest simultaneously by the block tensor train format. Deﬁnition 1 (block TT-format) . The vectors x b ( i ) are said to be in the block–TTformat, if x b ( i ) = x b ( i . . . i d ) = X ( ) ( i ) . . . X ( p − ) ( i p − ) ^ X ( p ) ( i p , b ) X ( p + ) ( i p + ) . . . X ( d ) ( i d ) . (3)The choice of the mode p which carries the index b is not ﬁxed — we will moveit back and forth during the optimization. When the position p is chosen, it meansthat the matrix ^ X ( p ) ( i p , b ) is additionally parametrized by the index b. The ‘block’TT–core ^ X ( p ) = [ ^ X ( p ) α p − ,α p ( i p , b )] is now a tensor with four indices.Following [37], we deﬁne the interfaces X k ofsize r k × n k + . . . n d as follows X k ( β k , i k + . . . i d − i d ) = X α k + ...α d − X ( k + ) β k ,α k + ( i k + ) . . . X ( d − ) α d − ,α d − ( i d − ) X ( d ) α d − ( i d ) . (4)Using the interfaces we introduce the frame matrix (see Fig. 2) as follows X = k = X k (cid:1) ⊤ , X = k ( i . . . i d , β k − j k β k )= X α ,...,α k − α k + ...α d X ( ) α ( i ) . . . X ( k − ) α k − ,β k − ( i k − ) δ ( i k , j k ) X ( k + ) β k ,α k + ( i k + ) . . . X ( d ) α d − ( i d ) . (5)4 X ( k ) X ( k + ) X ( d ) X ( k − ) X ( ) X ( ) α α α k − β k − β k α k + α d i i i k − j k i k i k + i d Figure 2: The frame matrix (5) maps a TT core (above) into a large vector (below).The tensor train format (2) is polylinear with respect to the TT–cores, and inparticular the block-TT format (3) is linear w.r.t. the block core ^ X ( p ) . We can writeit using the frame matrix as follows, x = X = k ^ x ( k ) , X = X = p ^ X ( p ) , (6)where the vector x ( k ) of size r k − n k r k is a vectorization of the TT–core X ( k ) , x ( k ) = vec X ( k ) , x ( k ) ( β k − j k β k ) = X ( k ) β k − ,β k ( j k ) , and the block TT–core ^ X ( p ) = [^ x ( p ) b ] is seen as a r p − n p r p × B matrix, which encap-sulates vectorizations ^ x ( p ) b = vec ^ X ( p ) ( b ) as columns.The TT–core ^ X ( p ) can be also reshaped into a r p − n p × Br p matrix by arrangingthe indices in the form ^ X ( p ) = [ ^ X ( p ) ( α p − i p , bα p )] , where the overline shows thatthe indices α p − and i p are combined in a single multi-index, as well as b and α p . We can write the low–rank decomposition of this matrix ^ X ( p ) α p − ,α p ( i p , b ) = ^ X ( p ) ( α p − i p , bα p ) = r ′ p X α ′ p = X ( p ) α p − ,α ′ p ( i p ) G α ′ p ,α p ( b ) , (7)where the new matrix G ( b ) of size r ′ p × r p appears, which carries the index b. Substituting (7) into (3) as follows, x b ( i ) = x b ( i . . . i d ) = X ( ) ( i ) . . . X ( p ) ( i p ) G ( b ) X ( p + ) ( i p + ) . . . X ( d ) ( i d ) , (8)we obtain the regular TT/MPS format for X = [ x b ( i )] , where the index i is deﬁnedby d mode indices i , . . . , i d and b is considered as the additional index which isplaced between i p and i p + . This format with G ( b ) in the rightmost position hasbeen applied for the eigenproblem solution in [29], and is also considered in [41].Using the decomposition (7), we can move the index b back and forth the ten-sor train. Indeed, if we multiply G ( b ) by X ( p + ) and reshape the result, we obtain ^ X ( p + ) b and recover (3) with the ‘block’ TT–core at the position p + We see thatafter this multiplication the TT–rank r p is replaced by r ′ p and in general r ′ p = r p . This means that the TT–ranks depend on the position p of the TT–core which car-ries the index b. The natural bound for the considered example is r ′ p r p B, whichtechnically allows the ranks to grow exponentially during the iterations. For theoptimization algorithm to be eﬃcient, special measures should be taken to keepthe TT–ranks moderate. 5 ( k ) ^ X ( k ) ^ X ( k ) A ( k + ) A ( d ) A ( k − ) A ( ) A ( ) X ( k + ) X ( d ) X ( k − ) X ( ) X ( ) X ( k + ) X ( d ) X ( k − ) X ( ) X ( ) i i i k − i k i k + i d j j j k − j k j k + j d γ γ γ k − γ k − γ k γ k + γ d α α α k − α k − α k α k + α d β β β k − β k − β k β k + β d bb Figure 3: Tensor network corresponding to the block Rayleigh quotient (1) withthe matrix A and vectors X = [ x b ] given in the tensor train format.To reduce (truncate) the TT–rank, we can compute the low–rank decompo-sition (7) approximately. The approximation of the lowest rank within the pre-scribed accuracy level can be computed by the singular value decomposition (SVD).The perturbation introduced to ^ X ( p ) results in the perturbation to the whole ma-trix of eigenstates X, which can be ampliﬁed by the norm of the frame matrix X = p due to the linearity (6). If X ∗6 = p X = p = I, the Frobenius norms of the local and theglobal perturbations are the same. For a given tensor the orthogonality of theframe matrix can be achieved using the non–uniqueness of the TT format, by im-plying the left–orthogonality (left–normalization) constrains to X ( ) , . . . , X ( p − ) andthe right–orthogonality (right–normalization) to X ( p + ) , . . . , X ( d ) . Deﬁnition 2 (TT–orthogonality) . The TT–core X ( k ) is called left– or right–orthogonal ,if, respectively, X i k ( X ( k ) ( i k )) ∗ X ( k ) ( i k ) = I, X i k X ( k ) ( i k )( X ( k ) ( i k )) ∗ = I. It can be shown [33, 41] that if X ( ) , . . . , X ( p − ) are left-orthogonal, then the in-terface matrix X

p are or-thonormal. Together these conditions provide the orthogonality of the frame ma-trix X = p deﬁned by (5), which allows the full accuracy control in the rank trunca-tion step.For a given TT format the TT–orthogonality can be implied constructively bythe subsequent orthogonalization of the cores, see [33, 41]. The procedure is verycheap and requires only O ( nr ) operations for each TT–core to be normalized.This operation is never a bottleneck in our algorithms, so we always assume thatfor a chosen TT–core ^ X ( p ) the frame matrix is orthogonal, without going into theminor detail.The TT format for a matrix is written as follows, A ( i, j ) = A ( i . . . i d , j . . . j d ) = A ( ) ( i , j ) . . . A ( d ) ( i d , j d ) . (9)6 lgorithm 1 ALS optimization for the block Rayleigh quotient in the TT format

Require:

Matrix A and initial guess X in the TT–format (9) and (3), resp. Ensure:

Improved eigenvectors X = [ x b ] in the TT–format (3) while stopping criterion is not satisﬁed do Move index b to ^ X ( ) ( i , b ) and make X ( ) , . . . , X ( d ) right–orthogonal for p =

1, . . . , d − do {Left–to–right half–sweep} Solve one–dimensional problem (10) and replace ^ X ( p ) := ^ X ( p ) ⋆ Decompose ^ X ( p ) by (7) with accuracy ε and replace r p := r ′ p Ensure the left–orthogonality of X ( p ) In (8), merge blocks G ( b ) X ( p + ) ( i p + ) = ^ X ( p + ) ( i p + , b ) end for Perform right–to–left half–sweep in the same way end while

The multiplication of a matrix by a vector in the TT format has been discussed indetail in [42, 32]. The tensor network which represents the quadratic function in (1)is shown in Fig. 3.

We formulate the eigenproblem via the Rayleigh quotient optimization (1), whichis restrictively large when d increases. To make the problem tractable, the mini-mization over the whole space is substituted by the optimization over the mani-fold of tensors in the TT-format. The most natural approach is the alternating leastsquares (ALS)-type algorithm. Let all the TT-cores except X ( p ) ( i p ) be “frozen”.Since the frame matrix is assumed to be orthogonal, the minimization problem(1) is reduced to a smaller problem ^ X ( p ) ⋆ = arg min ^ X ( p ) trace (( ^ X ( p ) ) ∗ X ∗6 = p A X = p ^ X ( p ) ) , s.t. ( ^ X ( p ) ) ∗ ^ X ( p ) = I, (10)which is in fact equivalent to the standard block eigenproblem for a small matrix.This local problem is often called one–dimensional (one–site), since it correspondsto the particular mode p , and the number of unknowns is linear in the mode size n p . It is natural to organize the optimization process into sweeps , see Alg. 1. In thisway, the orthogonalization of the TT-cores is very cheap.The matrix-by-vector product for the local problem scales as O ( r r ) matvec ( n ) and O ( r nr A ) additional operations , where matvec ( n p ) is a cost of each multipli-cation of a n p × n p matrix by a vector, y ( i p ) = n X j p = A ( p ) γ p − ,γ p ( i p , j p ) x ( j p ) . For applications in quantum chemistry and physics the mode size is usually notlarge, and O ( n ) complexity is acceptable. In applications of numerical linear al- In complexity estimates we assume that all mode sizes are O ( n ) , TT–ranks of vector X are O ( r ) and TT–ranks of A are O ( r A )

7T notation DMRG literaturemode size n local dimension d dimension d number of sites n TT–ranks r bond dimension D B M Table 1: Correspondence between notation of our paper and notation used inquantum physics, e.g. [36]gebra and scientiﬁc computing n can grow up to thousands, and the use of spar-sity, whether possible, is essential to reduce the complexity to linear in n. The B extreme eigenvectors can be found using the block Krylov techniques (e.g. thenotable LOBPCG method [26]) using O ( B ) local matvecs and O ( B ) orthogonal-izations of vectors x b . The low–rank approximation (7) can be done by the SVDin O ( Br n min ( B, n )) operations, which can be the slowest part of the algorithm if n and B are both large. To speed up this step, we can use the cross interpolationof matrices [44] (for more robust implementation see [34, Alg. 3]), or incompleteCholesky method applied to the Gram matrix, see e.g. [38]. Using this methods,we reduce the complexity to O ( Br n ) , that is to the level of other steps of the al-gorithm.Summarizing the above, the overall storage and complexity of Alg. 1 aremem = O (( d + B ) nr ) , work = O ( dBr r A n ) + O ( dBr r ) matvec ( n ) | {z } local matvecs + O ( dB r n ) | {z } orthogonalization + O ( dBr n ) | {z } truncation . (11)Typically the TT–rank of X grows with B at least as r = O ( B ) , since the TT–format (3) has to represent B vectors with possibly diﬀerent structures simulta-neously. In this case the ﬁrst term is more likely to dominate asymptotically forlarge B and r and we have work = O ( dBr r A n ) . The complexity of the proposed method can be compared with the one ofthe algorithms used in quantum physics community, e.g. the DMRG algorithmwith targeting [47] and the variational numerical renormalization group (NRG) [36].In the notation of this paper (see Table 1), the complexity of the DMRG reportedin [36] is O ( dBr n ) and the complexity of the variational NRG is O ( Br + dr n ) . Note that the complexity of the DMRG method is cubic in n, since the optimizationover two modes is used in each step. This is a usual way to couple the subspacescorresponding to diﬀerent modes (sites) and increase (or decrease) the TT–rank(bond dimension) between the blocks. In our method the adaptation of the TT–rank r k is done by the approximate decomposition (7) of the TT–block ^ X ( p ) whichis seen as a r p − n p × Br p + matrix.The truncation step introduces the perturbation to the local vector ^ X ( p ) . The or-thogonality of the interface X = p is essential in this step to control the accuracy ofthe vectors in X. The orthogonality between x b may be also perturbed, but canbe easily recovered by the reorthogonalization of G ( b ) = [ G α ′ p ,α p ( b )] as a col-lection of n vectors of size r ′ p r p . In practice this step is not necessary, since thenon–orthogonal G ( p ) is accumulated into ^ X ( p + ) , which is then replaced by the8evel k E k − E multiplicity of X k example of eigenvector from X k ⊗ u ⊗ u ⊗ u ⊗ . . . ⊗ u µ − µ d u ⊗ u ⊗ u ⊗ u ⊗ . . . ⊗ u ( µ − µ ) d ( d − ) /2 u ⊗ u ⊗ u ⊗ u ⊗ . . . ⊗ u µ − µ d u ⊗ u ⊗ u ⊗ u ⊗ . . . ⊗ u ( µ − µ ) d ( d − )( d − ) /6 u ⊗ u ⊗ u ⊗ u ⊗ . . . ⊗ u Table 2: Low–lying eigenvalues and dimensions of the corresponding invariantsubspaces, Laplace example (12)orthogonal set of the extreme eigenvectors of the local problem in the next step ofAlg. 1.

We consider the eigenstates of a particle in a d –dimensional box and discretizethis problem on a uniform tensor product grid with n elements in each directionas follows − ∆X = XΛ, ∆ = D ⊗ I ⊗ · · · ⊗ I + · · · + I ⊗ · · · ⊗ I ⊗ D, (12)where I is the identity n × n matrix, and D = tridiag ( −

2, 1 ) is the standard ﬁnitediﬀerence discretization of the one-dimensional Laplace operator. This problemis a nice sanity test for eigensolvers, because the analytical solution of the eigen-problem is available. The eigenpairs { µ b , u b } of the matrix (− D ) read µ b = sin (cid:18) π ( b + ) ( n + ) (cid:19) , u b ( j ) = sin (cid:18) π ( b + )( j + ) n + (cid:19) , b, j =

0, . . . , n − and for the high–dimensional problem (12) we obtain λ σ ( b ,...,b d ) = µ b + · · · + µ b d , x σ ( b ,...,b d ) = u b ⊗ · · · ⊗ u b d , (13)where the order function σ sorts the eigenvalues from low to high. Alternatively,we can enumerate the energy levels by multiindices as follows: E = λ σ ( ) , E = λ σ ( ) = . . . = λ σ ( ) , and so on, and deﬁne the invariant eigenspaces X k , which consist of the eigen-vectors corresponding to the same eigenvalues. The excited eigenstates possess astrong multiplicity, which we show in Table 2, where λ = dµ denotes the groundstate energy. The multiplicity (or degeneracy ) of the energy levels is a known issuefor the convergence of eigensolvers, which makes this example particularly in-structive to consider.The matrix ∆ is represented in the TT format with TT–ranks not larger thantwo, see [18]. Each eigenvector x σ ( b ,...,b d ) has a rank-one decomposition by (13),and therefore B eigenstates are represented simultaneously in the block–TT for-mat (3) with TT–ranks not larger than B. Since { u b } are orthogonal, the maximalTT–rank grows linearly with B (for small B it is easy to check this directly, but ingeneral one would obtain a complicated combinatorial formula). Summarizing9 − − − − − − − − log ( error ) vs. log ( ε ) | λ − λ ⋆ | /λ ⋆ ∠ ( X , X ⋆ ) | λ − λ ⋆ | /λ ⋆ ∠ ( X , X ⋆ ) | λ − λ ⋆ | /λ ⋆ ∠ ( X , X ⋆ ) | λ − λ ⋆ | /λ ⋆ ∠ ( X , X ⋆ ) | λ − λ ⋆ | /λ ⋆ ∠ ( X , X ⋆ ) − − − − − − − − − log ( error ) vs. log ( ε ) Figure 4: Errors in eigenvalues and vectors vs. the thruncation threshold for theLaplace example (12). Left: block-TT Alg. 1. Right: deﬂation method. The refer-ence values λ ⋆ k and X ⋆ k are computed by (13). Parameters: dimension d = modesize n = the above, we see that Alg. 1 can be applied to compute lower–lying eigenstatesof (12) with B . We compare the proposed method with the deﬂation technique, which com-putes the eigenstates one–by–one using the standard DMRG algorithm. When B − eigenvectors X = [ x b ] are already computed, the Rayleigh quotient x ∗ Ax is minimized over the normalized vectors orthogonal to X. All vectors are repre-sented simultaneously in the block–TT format (3), and the local problem is writtenas a deﬂated eigenvalue problem with B − orthogonality constrains. We couldexpect the deﬂation method to be more eﬃcient since the size of the variationalproblem is smaller. However, in contrary with the ordinary case, the DMRG com-bined with the deﬂation may converge to a wrong eigenpair, corresponding to alarger energy level.We apply both methods to ﬁnd B = low–lying eigenstates of (12) in dimen-sion d = This should be enough for the subspaces X , . . . , X , and ﬁnd severalvectors from X . We vary the truncation parameter ε and measure the accuracyof the computed eigenvalues comparing them with the analytical values (13). For X k , we track the angle between the computed and analytical eigenspace, ∠ ( X, Y ) = max ∠ ( x, y ) , x ∈ span X, y ∈ span Y, using the formula cos ∠ ( X, Y ) = σ min ( X ∗ Y ) , where σ min is the smallest singularvalue.The results are shown in Fig. 4. We see that all errors in the block method dropdown to the machine precision rapidly . The eigenvectors are represented exactlyin the TT format as soon as the rank is large enough, even for very rough trunca-tion threshold ε. We see that the block DMRG method computes all B = eigen-states correctly. The deﬂation method determines accurately the ground state X and two vectors from the ﬁrst excited subspace X . All further eigenvectors belong Note that the minimal possible error for the angle is the square root of the machine precisiondue to the acos function − log ( time, sec ) vs. log ( d ) B = = = = = − − − − − log ( time, sec ) vs. log ( ε ) dmrg, d = eigb, d = dmrg, d = eigb, d = Figure 5: CPU time vs. dimension d (left) and truncation threshold ε (right) forthe Henon-Heiles example (14). Parameters: ε = − (left), B = (right), n = .to the subspaces higher than X , and information on the multiplicity and other in-termediate states is not computed correctly. This shows that Alg. 1 is reliable inthe case of large multiplicities, and therefore can be also applied to compute theeigenstates with small spectral gaps. We consider a particle in a d –dimensional Henon–Heiles potential which is usedas a benchmark for high-dimensional quantum molecular dynamics computa-tions [30]. The wavefunction ψ = ψ ( q , . . . , q d ) satisﬁes a stationary Schrödingerequation Hψ = Eψ, where the Hamiltonian is deﬁned as follows H = harmonic part z }| { −

12 ∆ + d X k = q + anharmonic part z }| { λ d − X k = (cid:18) q q k + −

13 q + (cid:19) | {z } Henon-Heiles potential V ( q ,...,q d ) , (14)where parameter λ = controls the anharmonic contribution to the po-tential.The principal part of the Henon-Heiles operator describes a harmonic oscil-lator, whose eigenstates are products of a Gaussian by Hermite polynomials andhave tensor rank . Therefore, for moderate anharmonicity λ the rank- basis ofeigenfunctions of the harmonic oscillator is a good choice for the discretizationof (14). The Galerkin discretization scheme results not only in dense stiﬀness andmass matrices, but also in a dense matrix that describes the action of the potential V . The DVR(discrete variable representation) scheme uses a collocation of V and ψ on the nodes of the Hermite polynomials and is known to provide the sameorder of accuracy for the eigenproblem as the Galerkin method.11 − − − − − − log ( | λ ⋆ − λ | ) vs. n ε = − ε = − ε = − ε = − ε = − ε = − − log ( time, sec ) vs. log ( n ) Figure 6: Eigenvalue errors (left) and CPU times (right) vs. mode size n and trun-cation threshold ε for the Henon-Heiles example (14). The reference value λ ⋆ iscomputed by the same algorithm with ε = − , n = Parameters: B = = The one-dimensional Laplace operator D = − d dq in the Hermite-DVR ap-proach is discretized (see [2]) as follows D ij = (cid:14) ( − − ) , i = j, (− ) ( i − j ) ( ( t i − t j ) − − ) , i = j, where t , . . . , t n are the roots of the n -th Hermite polynomial, i.e. the Hermite mesh .The discretization of the d -dimensional Laplace operator is written in the sameway as in (12) and has TT–ranks not larger than two. The potential V is constructedas a sum of rank-1 monomials and can be represented by the TT format with TT–ranks not larger than see [25]. Therefore, Alg. 1 can be applied to ﬁnd the low–lying eigenstates of (14).As an initial guess, we take a random TT tensor of rank B in the block form(3). We test the block TT algorithm for diﬀerent dimensions d , Frobenius-normtruncation tolerances for eigenvalues ε and number of eigenstates B . Numericalexperiments show that the internal convergence of the eigenvalues is quadraticw.r.t. the truncation threshold ε, as expected from the perturbation theory. Theerrors in the eigenvalues do not grow with the dimension, so the method can beexploited for higher–dimensional systems. From Fig. 5 (left), we observe that thecost grows mildly with the dimension and B , which allows to compute energylevels with high accuracy.In Fig. 5 (right) we compare our block-TT method (Alg. 1, referenced as ‘eigb’)with the DMRG technique from [25], intended for searching one lowest eigenpaironly. Alg. 1 computes at least B = eigenstates, and the TT-ranks of the block–TTformat which represents two eigenstates are larger (sometimes signiﬁcantly) thanthe TT-ranks of the ground state returned by the DMRG method. We see howlarge ranks of the excited state manifest themselves for high accuracies in Fig. 5.Despite the lower cost w.r.t. the mode size n , the ‘eigb’ method requires the sameCPU time as the DMRG due to larger TT-ranks, which leads to a higher complexityof the local eigenproblem (10). 12inally we investigate the performance of Alg. 1 w.r.t. the mode size (numberof Hermite polynomials) n , see Fig. 6. The error decreases exponentially with n due to the Hermite-DVR scheme, but only until the level O ( ε ) governed by thetensor truncation threshold. The CPU time growth rate is balanced between O ( n ) (truncation step) and O ( n ) (dense matrix-by-vector multiplication). The one-dimensional Heisenberg model is one of the classical applications of theMPS/DMRG algorithms, so it is worth to test our approximate block eigenvaluesolver on it, and compare it with the established software from the MPS commu-nity. This model describes the interaction of spins on a one-dimensional lattice.The Hamiltonian for the antiferromagnetic case is written in the following form H = d − X i = S i S i + , (15)where S i is the spin operator. The product S i S i + can be written in terms of thespin components as follows S i S i + = ( S x ) i ( S x ) i + + ( S y ) i ( S y ) i + + ( S z ) i ( S z ) i + . (16)Since the operator ( S x ) i aﬀects only the i -th spin, it has the form ( S x ) i = I ⊗ . . . ⊗ S x ⊗ . . . ⊗ I, (17)and similarly for ( S y ) i , ( S z ) i . The “elementary” spin operators S x , S y and S z arethe Pauli matrices, deﬁned as follows S x = (cid:18) (cid:19) , S y = (cid:18) − ii 0 (cid:19) , S z = (cid:18) − (cid:19) . (18)Using the equations (15),(16),(17) and (18), it is very easy to construct the TT rep-resentation of the Heisenberg Hamiltonian. Moreover, it is also not diﬃcult toshow that its TT ranks are not larger than see [5].For diﬀerent lengths of the spin chain we compute the lowest eigenvalues withthe help of our block eigenvalue solver, and compare it with two packages con-taining the DMRG: ALPS (Algorithms and Libraries for Physics Simulations) andITensor . The ALPS allows to compute several excited states, whereas the ITen-sor is devised for targeting the ground state only. Using Alg. 1 we can compute B > eigenstates, so the comparison with ITensor and ALPS with B = assumesthat our algorithm is applied with B = and the excited state is computed but notused. To introduce a fair amount of optimization, all software was compiled usingIntel C/Fortran compiler 2013 and linked with the optimized Lapack / Blas pack-ages provided in the MKL library. The experiments have been computed usingone Intel Xeon processor with cores at GHz. http://alps.comp-phys.org/ , Release 2.1.1 http://itensor.org/ , downloaded on May 20, 2013 (no particular release) .8 1 1.2 1.4 1.6 − − log ( time, sec ) vs. log ( d ) eigbalpsitensor − log ( time, sec ) vs. log ( B ) alps, ε = − eigb, ε = − alps, ε = − eigb, ε = − Figure 7: CPU times of diﬀerent methods vs. d (left), B (right), Heisenberg exam-ple. Parameters: ε = − , B = (left), d = (right). − − − − − − log ( time, sec ) vs. log ( ε ) eigbalpsitensor Figure 8: CPU times of diﬀerent methods vs. ε , Heisenberg example. Parameters: B = , d = .The computational times of Alg. 1 (‘eigb’), ALPS and ITensor are presented inFigs 7 and 8. We observe that all methods manifest the polynomial complexityscaling in the number of spins d and targeting eigenstates B . The dependence on B is of particular interest. Recalling the complexity estimate (11), we may expectwork = O ( B ) , if the TT–ranks behave as r ∼ B. In Fig. 7 we see even mildergrowth of the CPU time, time = O ( B β ) , β ≈ and the same phenomenon isobserved for the DMRG method from the ALPS. It can be explained by the non-uniform distribution of the TT–ranks, which makes the complexity estimate (11)larger than the actual computation time.Considering the memory usage, we observe that the ALPS package failed tocompute more than targets due to the Out-of-Memory exception, while ourimplementation of Alg. 1 goes readily beyond eigenstates.The accuracy parameter ε requires an additional comment, since it is useddiﬀerently in diﬀerent methods. The block TT method performs the Frobenius-14 − − − − − − − − − − log (cid:18)(cid:12)(cid:12)(cid:12)(cid:12) λ eigb b − λ alps b (cid:12)(cid:12)(cid:12)(cid:12)(cid:19) vs. log ( ε ) λ λ λ λ λ − − − − − − − log (cid:18)(cid:12)(cid:12)(cid:12)(cid:12) λ eigb − λ itensor (cid:12)(cid:12)(cid:12)(cid:12)(cid:19) vs. log ( d ) ε = − ε = − ε = − ε = − ε = − Figure 9: Discrepancies between eigenvalues in diﬀerent methods vs. eigenvaluenumber (left) and d (right), Heisenberg example. Parameters: d =

30, B = (left), B = (right).norm ε -approximation of the eigenvectors, whereas the other two packages aredesigned to keep the accuracy of the eigenvalue, and hence are parametrized with ε as a threshold.We see that the performances of our block TT technique and the method fromITensor are comparable, and the algorithm from ALPS is signiﬁcantly slower. Tak-ing into account that our approach allows to compute several eigenstates (whichseems to be not the case for ITensor), we may conclude that it overcomes the cur-rent state-of-the-art software. To be sure that the correctness is maintained, weshow the diﬀerence between the eigenvalues computed by the eigb and the othertwo methods in Fig. 9, which demonstrates that the O ( ε ) accuracy of the eigen-values is achieved. We propose the eﬃcient one–site rank–adaptive algorithm for the computation ofseveral extreme eigenvalues of a high–dimensional Hermitian matrix. The eigen-states are represented simultaneously by the block tensor train format, which isﬂexible and has a particular form in each variational step of the algorithm. Thecomplexity of the proposed method is linear with respect to the mode size forsparse matrices and quadratic for dense matrices, which is better than the com-plexity of the DMRG algorithm. The proposed method has the same asymptoticin the number of eigenstates, the TT–ranks and the dimension, as the DMRG andNRG algorithms used in quantum physics.The algorithm is veriﬁed on a number of representative problems. First, it istested on the high–dimensional Laplace matrix, which describes the eigenstates ofthe particle in a box, and it is demonstrated that the algorithm is capable of ﬁnd-ing the eigenstates with high multiplicity. Second, it is applied to the stationarySchrödinger equation with the Henon-Heiles potential, and it is shown that the15roposed method can compute the ground state faster than the DMRG algorithm.Finally, we apply the algorithm to the Heisenberg spin chain and show that it iscompetitive with the state of the art publicly available DMRG implementations.The framework developed in this paper can be applied to a wider class of prob-lems, which are formulated via the subspace optimization.

Software implementation

We have implemented the main Algorithm 1 in Fortran with interfaces to Pythonand MATLAB. The codes are available online at • http://github.com/oseledets/tt-fort • http://github.com/oseledets/ttpy • http://github.com/oseledets/TT-Toolbox References [1]

A. Ammar, B. Mokdad, F. Chinesta, and R. Keunings , A new family of solversfor some classes of multidimensional partial diﬀerential equations encoun-tered in kinetic theory modeling of complex ﬂuids,

Journal of Non-NewtonianFluid Mechanics , 139(3):153 – 176, 2006. doi: 10.1016/j.jnnfm.2006.07.007.[2]

D. Baye and P.-H. Heenen , Generalised meshes for quantum mechan-ical problems,

J. Phys. A: Math. Gen. , 19(11):2041–2059, 1986. doi:10.1088/0305-4470/19/11/013.[3]

L. de Lathauwer, B. de Moor, and J. Vandewalle , A multilinear singularvalue decomposition,

SIAM J. Matrix Anal. Appl. , 21:1253–1278, 2000. doi:10.1137/s0895479896305696.[4]

S. Dolgov and B. Khoromskij , Two-level QTT-Tucker format for optimizedtensor calculus,

SIAM J. on Matrix An. Appl. , 34(2):593–623, 2013. doi:10.1137/120882597.[5]

S. V. Dolgov and B. N. Khoromskij , Tensor-product ap-proach to global time-space-parametric discretization ofchemical master equation, Preprint 68, MPI MIS, 2012. .[6]

S. V. Dolgov, B. N. Khoromskij, and I. V. Oseledets , Fast solution of multi-dimensional parabolic problems in the tensor train/quantized tensor train–format with initial application to the Fokker-Planck equation,

SIAM J. Sci.Comput. , 34(6):A3016–A3038, 2012. doi: 10.1137/120864210.[7]

S. V. Dolgov, B. N. Khoromskij, and D. V. Savostyanov , Superfast Fouriertransform using QTT approximation,

J. Fourier Anal. Appl. , 18(5):915–953,2012. doi: 10.1007/s00041-012-9227-4.168]

S. V. Dolgov and I. V. Oseledets , Solution of linear systems and matrix in-version in the TT-format,

SIAM J. Sci. Comput. , 34(5):A2718–A2739, 2012. doi:10.1137/110833142.[9]

S. V. Dolgov and D. V. Savostyanov , Alternating minimal energy methodsfor linear systems in higher dimensions. Part I: SPD systems, arXiv preprint1301.6068, 2013. http://arxiv.org/abs/1301.6068 .[10] , Alternating minimal energy methods for linear systems in higher di-mensions. Part II: Faster algorithm and application to nonsymmetric systems,arXiv preprint 1304.1222, 2013. http://arxiv.org/abs/1304.1222 .[11]

H.-J. Flad, B. N. Khoromskij, D. V. Savostyanov, and E. E. Tyrtyshnikov , Ver-iﬁcation of the cross 3D algorithm on quantum chemistry data,

Rus. J. Numer.Anal. Math. Model. , 23(4):329–344, 2008. doi: 10.1515/RJNAMM.2008.020.[12]

L. Grasedyck , Hierarchical singular value decomposition of tensors,

SIAMJ. Matrix Anal. Appl. , 31(4):2029–2054, 2010.[13]

W. Hackbusch , Tensorisation of vectors and their eﬃcient convolution,

Nu-mer. Math. , 119(3):465–488, 2011. doi: 10.1007/s00211-011-0393-0.[14]

W. Hackbusch, B. N. Khoromskij, S. A. Sauter, and E. E. Tyrtyshnikov , Useof tensor formats in elliptic eigenvalue problems,

Numer. Linear Algebra Appl. ,19(1):133–151, 2012. doi: 10.1002/nla.793.[15]

W. Hackbusch and S. Kühn , A new scheme for the tensor representation,

J.Fourier Anal. Appl. , 15(5):706–722, 2009.[16]

S. Holtz, T. Rohwedder, and R. Schneider , The alternating linear schemefor tensor optimization in the tensor train format,

SIAM J. Sci. Comput. ,34(2):A683–A713, 2012. doi: 10.1137/100818893.[17]

V. Kazeev, B. N. Khoromskij, and E. E. Tyrtyshnikov , MultilevelToeplitz matrices generated by tensor-structured vectors and convolu-tion with logarithmic complexity, Tech. Rep. 36, MPI MIS, Leipzig, 2011. .[18]

V. A. Kazeev and B. N. Khoromskij , Low-rank explicit QTT representationof the Laplace operator and its inverse,

SIAM J. Matrix Anal. Appl. , 33(3):742–758, 2012. doi: 10.1137/100820479.[19]

V. Khoromskaia, B. N. Khoromskij, and R. Schneider , Tensor-structured fac-torized calculation of two-electron integrals in a general basis,

SIAM J SciComp , 35(2):A987–A1010, 2013. doi: 10.1137/120884067.[20]

B. N. Khoromskij , Fast and accurate tensor approximation of multivari-ate convolution with linear scaling in dimension,

J. Comp. Appl. Math. ,234(11):3122–3139, 2010. doi: 10.1016/j.cam.2010.02.004.[21] , O ( d log n ) –Quantics approximation of N – d tensors in high-dimensional numerical modeling, Constr. Appr. , 34(2):257–280, 2011.doi: 10.1007/s00365-011-9131-1. 1722]

B. N. Khoromskij and V. Khoromskaia , Low rank Tucker-type tensor ap-proximation to classical potentials,

Central European journal of mathematics ,5(3):523–550, 2007. doi: 10.2478/s11533-007-0018-0.[23] , Multigrid accelerated tensor approximation of function related mul-tidimensional arrays,

SIAM J. Sci. Comput. , 31(4):3002–3026, 2009. doi:10.1137/080730408.[24]

B. N. Khoromskij, V. Khoromskaia, and H.-J. Flad. , Numerical solution ofthe Hartree–Fock equation in multilevel tensor-structured format,

SIAM J.Sci. Comput. , 33(1):45–65, 2011. doi: 10.1137/090777372.[25]

B. N. Khoromskij and I. V. Oseledets , DMRG+QTT ap-proach to computation of the ground state for the molecu-lar Schrödinger operator, Preprint 69, MPI MIS, Leipzig, 2010. .[26]

A. V. Knyazev , Toward the optimal preconditioned eigensolver: locally opti-mal block preconditioned conjugate gradient method,

SIAM J. Sci. Comput. ,23(2):517–541, 2001. doi: 10.1137/S1064827500366124.[27]

T. G. Kolda and B. W. Bader , Tensor decompositions and applications,

SIAMReview , 51(3):455–500, 2009. doi: 10.1137/07070111X.[28]

C. Le Bris, T. Leliévre, and Y. Maday , Results and questions on a nonlin-ear approximation approach for solving high-dimensional partial diﬀerentialequations,

Constr. Approx. , 30:621–651, 2009. doi: 10.1007/s00365-009-9071-1.[29]

O. S. Lebedeva , Tensor conjugate-gradient-type method for Rayleigh quo-tient minimization in block QTT-format,

Russ. J. Numer. Anal. Math. Mod-elling , 26(5):465–489, 2011. doi: 10.1515/rjnamm.2011.026.[30]

M. Nest and H.-D. Meyer , Benchmark calculations on high-dimensionalHenon-Heiles potentials with the multi-conﬁguration time dependentHartree (MCTDH) method,

J. Chem. Phys. , 117(23):10499, 2002. doi:10.1063/1.1521129.[31]

I. V. Oseledets , Approximation of d × d matrices using tensor de-composition, SIAM J. Matrix Anal. Appl. , 31(4):2130–2145, 2010. doi:10.1137/090757861.[32] , DMRG approach to fast linear algebra in the TT–format,

Comput. Meth.Appl. Math , 11(3):382–393, 2011.[33] , Tensor-train decomposition,

SIAM J. Sci. Comput. , 33(5):2295–2317,2011. doi: 10.1137/090752286.[34]

I. V. Oseledets, D. V. Savostyanov, and E. E. Tyrtyshnikov , Cross approxi-mation in tensor electron density computations,

Numer. Linear Algebra Appl. ,17(6):935–952, 2010. doi: 10.1002/nla.682.1835]

I. V. Oseledets and E. E. Tyrtyshnikov , Breaking the curse of dimensionality,or how to use SVD in many dimensions,

SIAM J. Sci. Comput. , 31(5):3744–3759, 2009. doi: 10.1137/090748330.[36]

I. Pižorn and F. Verstraete , Variational Numerical Renormalization Group:Bridging the gap between NRG and Density Matrix Renormalization Group,

Phys. Rev. Lett. , 108(067202), 2012. doi: 10.1103/PhysRevLett.108067202.[37]

T. Rohwedder and A. Uschmajew , Local convergence of alternating schemesfor optimization of convex problems in the TT format,

SIAM J Num. Anal. ,51(2):1134–1162, 2013. doi: 10.1137/110857520.[38]

D. V. Savostyanov , Fast revealing of mode ranks of tensor in canon-ical format,

Numer. Math. Theor. Meth. Appl. , 2(4):439–444, 2009. doi:10.4208/nmtma.2009.m9006s.[39] , Quasioptimality of maximum–volume cross interpolation of tensors,arXiv preprint 1305.1818, 2013. http://arxiv.org/abs/1305.1818 .[40]

D. V. Savostyanov and I. V. Oseledets , Fast adaptive interpolation of multi-dimensional arrays in tensor train format, in Proceedings of 7th Interna-tional Workshop on Multidimensional Systems (nDS), IEEE, 2011. doi:10.1109/nDS.2011.6076873.[41]

U. Schollwöck , The density-matrix renormalization group in the ageof matrix product states,

Annals of Physics , 326(1):96–192, 2011. doi:10.1016/j.aop.2010.09.012.[42]

U. Schollwöck , The density-matrix renormalization group in the age of ma-trix product states,

Ann. Phys , 326(1):96–192, 2011.[43]

V. Temlyakov , Greedy Approximation, Cambridge University Press, 2011.[44]

E. E. Tyrtyshnikov , Incomplete cross approximation in the mosaic–skeletonmethod,

Computing , 64(4):367–380, 2000. doi: 10.1007/s006070070031.[45]

S. R. White , Density matrix formulation for quantum renor-malization groups,

Phys. Rev. Lett. , 69(19):2863–2866, 1992. doi:10.1103/PhysRevLett.69.2863.[46] , Density-matrix algorithms for quantum renormalization groups,

Phys.Rev. B , 48(14):10345–10356, 1993. doi: 10.1103/PhysRevB.48.10345.[47]

S. R. White and D. Huse , Numerical renormalization-group study of low-lying eigenstates of the antiferromagnetic S = Heisenberg chain,