Randomized Alternating Least Squares for Canonical Tensor Decompositions: Application to a PDE with Random Data
MATTHEW J. REYNOLDS, ALIREZA DOOSTAN, AND GREGORY BEYLKIN

Abstract.
This paper introduces a randomized variation of the alternating least squares (ALS) algorithm for rank reduction of canonical tensor formats. The aim is to address the potential numerical ill-conditioning of least squares matrices at each ALS iteration. The proposed algorithm, dubbed randomized ALS, mitigates large condition numbers via projections onto random tensors, a technique inspired by well-established randomized projection methods for solving overdetermined least squares problems in a matrix setting. A probabilistic bound on the condition numbers of the randomized ALS matrices is provided, demonstrating reductions relative to their standard counterparts. Additionally, results are provided that guarantee comparable accuracy of the randomized ALS solution at each iteration. The performance of the randomized algorithm is studied with three examples, including manufactured tensors and an elliptic PDE with random inputs. In particular, for the latter, tests illustrate not only improvements in condition numbers, but also improved accuracy of the iterative solver for the PDE solution represented in a canonical tensor format.

1. Introduction
The approximation of multivariate functions is an essential tool for numerous applications, including computational chemistry [1, 24], data mining [2, 9, 24], and, recently, uncertainty quantification [11, 23]. For seemingly reasonable numbers of variables $d$, e.g. $O(10)$, reconstructing a function, for instance from its sampled values, requires computational costs that may be prohibitive. This is related to the so-called "curse of dimensionality." To mitigate this phenomenon, we require such functions to have special structure that can be exploited by carefully crafted algorithms. One such structure is that the function of interest $u(z_1, z_2, \dots, z_d)$, depending on variables $z_1, z_2, \dots, z_d$, admits a separated representation [3, 4, 24] of the form

(1.1)  $u(z_1, z_2, \dots, z_d) = \sum_{l=1}^{r} \sigma_l \, u_1^l(z_1)\, u_2^l(z_2) \cdots u_d^l(z_d).$

The number of terms, $r$, is called the separation rank of $u$ and is assumed to be small.
Key words and phrases. Separated representations; tensor decomposition; randomized projection; alternating least squares; canonical tensors; stochastic PDE.

This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, under Award Number DE-SC0006402, and NSF grants DMS-1228359, DMS-1320919, and CMMI-1454601.

Any discretization of the univariate functions $u_j^l(z_j)$ in (1.1), with $u_{i_j j}^l = u_j^l\big(z_j^{i_j}\big)$, $i_j = 1, \dots, M_j$ and $j = 1, \dots, d$, leads to a Canonical Tensor Decomposition, or CTD,

(1.2)  $\mathbf{U} = U(i_1, \dots, i_d) = \sum_{l=1}^{r} \sigma_l\, u_{i_1}^l u_{i_2}^l \cdots u_{i_d}^l.$

The functions $u_j^l(z_j)$ in (1.1) and the corresponding vectors $u_{i_j}^l$ in (1.2) are normalized to unit norm so that the magnitude of the terms is carried by their positive s-values, $\sigma_l$. It is well understood that when the separation rank $r$ is independent of $d$, the computation costs and storage requirements of standard algebraic operations in separated representations scale linearly in $d$ [4]. For this reason, such representations are widely used for approximating high-dimensional functions. To keep the computation of CTDs manageable, it is crucial to maintain the separation rank as small as possible. Common operations involving CTDs, e.g. summations, lead to CTDs with separation ranks that may be larger than necessary. Therefore, a standard practice is to reduce the separation rank of a given CTD without sacrificing much accuracy, for which the workhorse algorithm is Alternating Least Squares (ALS); see, e.g., [3, 4, 6, 7, 21, 24, 31]. This algorithm optimizes the separated representation (in Frobenius norm) one direction at a time by solving least squares problems for each direction. The linear systems for each direction are obtained as normal equations by contracting over all tensor indices, $i = 1, \dots, d$, except those in the direction of optimization $k$.

It is well known that forming normal equations increases the condition number of the least squares problem; see, e.g., [16]. In this paper we investigate the behavior of the condition numbers of the linear systems that arise in the ALS algorithm, and propose an alternative formulation in order to avoid potential ill-conditioning. As we shall see later, the normal equations in the ALS algorithm are formed via the Hadamard (entry-wise) product of matrices for individual directions. We show that in order for the resulting matrix to be ill-conditioned, the matrices for all directions have to be ill-conditioned, and we obtain estimates of these condition numbers. To improve the conditioning of the linear systems, we propose a randomized version of ALS, called randomized ALS, where instead of contracting a tensor with itself (in all directions but one), we contract it with a tensor composed of random entries. We show that this random projection improves the conditioning of the linear systems. However, its straightforward use does not ensure monotonicity in error reduction, unlike in standard ALS. In order to restore monotonicity, we simply accept only random projections that do not increase the error.

Our interest in using CTDs stems from the efficiency of such representations in tackling the curse of dimensionality arising from the solution of PDEs with random data, as studied in the context of Uncertainty Quantification (UQ). In the probabilistic framework, uncertainties are represented via a finite number of random variables $z_j$ specified using, for example, available experimental data or expert opinion.
An important task is then to quantify the dependence of quantities of interest $u(z_1, \dots, z_d)$ on these random inputs. For this purpose, approximation techniques based on separated representations have recently been studied in [12, 25, 26, 11, 27, 13, 18, 23, 10, 17, 19] and proven effective in alleviating the curse of dimensionality.
The paper is organized as follows. In Section 2, we introduce our notation and provide background information on tensors, the standard ALS algorithm, and the random matrix theory used in this paper. In Section 3, we introduce randomized ALS and provide analysis of the algorithm's convergence and of the conditioning of the matrices used. Section 4 contains demonstrations of randomized ALS and comparisons with standard ALS on three examples. The most important of these examples provides background on uncertainty quantification and demonstrates the application of randomized ALS-based reduction as a step in finding the fixed point solution of a stochastic PDE. We conclude with a discussion of our new algorithm and future work in Section 5.
2. Notation and Background
2.1. Notation.
Our notation for tensors, i.e. $d$-directional arrays of numbers, is boldfaced uppercase letters, e.g. $\mathbf{F} \in \mathbb{R}^{M_1 \times \cdots \times M_d}$. These tensors are assumed to be in the CTD format,
$$\mathbf{F} = \sum_{l=1}^{r_F} s_l^F\, F_1^l \circ \cdots \circ F_d^l,$$
where the factors $F_k^l \in \mathbb{R}^{M_k}$ are vectors with a subscript denoting the directional index and a superscript the rank index, and $\circ$ denotes the standard vector outer product. We write operators in dimension $d$ as $\mathbb{A} = A(j_1, j_1'; \dots; j_d, j_d')$, while for standard matrices we use uppercase letters, e.g. $A \in \mathbb{R}^{N \times M}$. Vectors are represented using boldfaced lowercase letters, e.g. $\mathbf{c} \in \mathbb{R}^N$, while scalars are represented by lowercase letters. We perform three operations on CTDs: addition, inner product, and the application of a $d$-dimensional operator.

• When two CTDs are added together, all terms are joined into a single list and simply re-indexed. In such a case the nominal separation rank is the sum of the ranks of the components, i.e., if the CTDs are of ranks $\tilde r$ and $\hat r$, the output has rank $\tilde r + \hat r$.

• The inner product of two tensors in CTD format, $\tilde{\mathbf{F}}$ and $\hat{\mathbf{F}}$, is defined as
$$\big\langle \tilde{\mathbf{F}}, \hat{\mathbf{F}} \big\rangle = \sum_{\tilde l=1}^{\tilde r} \sum_{\hat l=1}^{\hat r} \tilde s_{\tilde l}\, \hat s_{\hat l}\, \big\langle \tilde F_1^{\tilde l}, \hat F_1^{\hat l} \big\rangle \cdots \big\langle \tilde F_d^{\tilde l}, \hat F_d^{\hat l} \big\rangle,$$
where the inner product $\langle \cdot, \cdot \rangle$ operating on vectors is the standard vector dot product.

• When applying a $d$-dimensional operator to a tensor in CTD format, we have
$$\mathbb{A}\mathbf{F} = \sum_{\hat l=1}^{r_{\mathbb{A}}} \sum_{\tilde l=1}^{r_F} s_{\hat l}^{\mathbb{A}}\, s_{\tilde l}^F\, \big(A_1^{\hat l} F_1^{\tilde l}\big) \circ \cdots \circ \big(A_d^{\hat l} F_d^{\tilde l}\big).$$

We use the symbol $\|\cdot\|$ to denote the standard spectral norm for matrices, as well as the Frobenius norm for tensors, $\|\mathbf{F}\|^2 = \langle \mathbf{F}, \mathbf{F}\rangle$, and $\|\cdot\|_1$ and $\|\cdot\|_2$ to denote the standard Euclidean $\ell_1$ and $\ell_2$ vector norms.

For analysis involving matrices we use three different types of multiplication in addition to the standard matrix multiplication. The Hadamard, or entry-wise, product of two matrices $A$ and $B$ is denoted by $A * B$. The Kronecker product of two matrices $A \in \mathbb{R}^{N_A \times M_A}$ and $B \in \mathbb{R}^{N_B \times M_B}$ is denoted by $A \otimes B$,
$$A \otimes B = \begin{pmatrix} A(1,1)B & \dots & A(1, M_A)B \\ \vdots & \ddots & \vdots \\ A(N_A, 1)B & \dots & A(N_A, M_A)B \end{pmatrix}.$$
The final type of matrix product we use, the Khatri-Rao product of two matrices $A \in \mathbb{R}^{N_A \times M}$ and $B \in \mathbb{R}^{N_B \times M}$, is denoted by $A \odot B$,
$$A \odot B = \big[\, A(:,1) \otimes B(:,1) \;\; A(:,2) \otimes B(:,2) \;\; \dots \;\; A(:,M) \otimes B(:,M) \,\big].$$
We also frequently use the maximal and minimal (non-zero) singular values of a matrix, denoted $\sigma_{\max}$ and $\sigma_{\min}$, respectively.
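To make the inner-product formula above concrete, the following minimal NumPy sketch (ours, not from the paper) evaluates $\langle \tilde{\mathbf{F}}, \hat{\mathbf{F}}\rangle$ by accumulating the entry-wise product of directional Gram matrices. It assumes, as a storage convention for illustration only, that the factors of each direction are collected as columns of a matrix.

```python
import numpy as np

def ctd_inner(s_F, F, s_G, G):
    """Inner product of two tensors in CTD format.

    s_F, s_G : 1-D arrays of s-values (lengths r_F, r_G).
    F, G     : lists of factor matrices; F[k] has shape (M_k, r_F),
               with column l holding the direction-k factor of term l.
    Uses <F, G> = sum_{l,m} s_F[l] s_G[m] prod_k <F_k^l, G_k^m>.
    """
    W = np.ones((len(s_F), len(s_G)))   # accumulates prod_k <F_k^l, G_k^m>
    for Fk, Gk in zip(F, G):
        W *= Fk.T @ Gk                  # Gram matrix between direction-k factors
    return s_F @ W @ s_G

def ctd_norm(s_F, F):
    """Frobenius norm of a CTD, ||F|| = sqrt(<F, F>)."""
    return np.sqrt(ctd_inner(s_F, F, s_F, F))
```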
2.2. ALS algorithm. Operations on tensors in CTD format lead to an increase of the nominal separation rank. This separation rank is not necessarily the smallest possible rank needed to represent the resulting tensor for a given accuracy. The ALS algorithm attempts to find an approximation to the tensor with minimal (or near minimal) separation rank. Specifically, given a tensor $\mathbf{G}$ in CTD format with separation rank $r_G$,
$$\mathbf{G} = \sum_{l=1}^{r_G} s_l^G\, G_1^l \circ \cdots \circ G_d^l,$$
and an acceptable error $\epsilon$, we attempt to find a representation
$$\mathbf{F} = \sum_{\tilde l=1}^{r_F} s_{\tilde l}^F\, F_1^{\tilde l} \circ \cdots \circ F_d^{\tilde l}$$
with lower separation rank, $r_F < r_G$, such that $\|\mathbf{F} - \mathbf{G}\| / \|\mathbf{G}\| < \epsilon$.

The standard ALS algorithm starts from an initial guess $\mathbf{F}$ with a small separation rank, e.g. $r_F = 1$. A sequence of least squares problems, one per direction, is then constructed and solved to update the representation. Given a direction $k$, we freeze the factors in all other directions to produce a least squares problem for the factors in direction $k$. This process is then repeated for all directions $k$. One cycle through all $k$ is called an ALS sweep. These ALS sweeps continue until the improvement in the residual $\|\mathbf{F} - \mathbf{G}\| / \|\mathbf{G}\|$ either drops below a certain threshold or reaches the desired accuracy, i.e. $\|\mathbf{F} - \mathbf{G}\| / \|\mathbf{G}\| < \epsilon$. If the residual is still above the target accuracy $\epsilon$, the separation rank $r_F$ is increased and we repeat the previous steps for constructing the representation with the new separation rank.

Specifically, as discussed in [4], the construction of the normal equations for direction $k$ can be thought of as taking the derivatives of the Frobenius norm $\|\mathbf{F} - \mathbf{G}\|$ with respect to the factors $F_k^{\tilde l}$, $\tilde l = 1, \dots, r_F$, and setting these derivatives to zero. This yields the normal equations

(2.1)  $B_k\, \mathbf{c}_{j_k} = \mathbf{b}_{j_k},$

where $j_k$ corresponds to the $j$-th entry of $F_k^{\tilde l}$ and $\mathbf{c}_{j_k} = c_{j_k}(\tilde l)$ is a vector indexed by $\tilde l$. Alternatively, the normal system (2.1) can be obtained by contracting all directions except the optimization direction $k$, so that the entries of the matrix $B_k$ are the Hadamard product of Gram matrices,

(2.2)  $B_k(\hat l, \tilde l) = \prod_{i \ne k} \big\langle F_i^{\tilde l}, F_i^{\hat l} \big\rangle,$

and, accordingly, the right-hand side is
$$\mathbf{b}_{j_k}(\hat l) = \sum_{l=1}^{r_G} s_l^G\, G_k^l(j_k) \prod_{i \ne k} \big\langle G_i^l, F_i^{\hat l} \big\rangle.$$
We solve (2.1) for $\mathbf{c}_{j_k}$ and use the solution to update $F_k^{\tilde l}$. Pseudocode for the ALS algorithm is provided in Algorithm 1, where max_rank and max_iter denote the maximum separation rank and the limit on the number of iterations. The threshold $\delta$ is used to decide whether the separation rank needs to be increased.
Algorithm 1: Alternating least squares algorithm for rank reduction

input: $\epsilon > 0$, $\delta > 0$, $\mathbf{G}$ with rank $r_G$, max_rank, max_iter
initialize a rank $r_F = 1$ tensor $\mathbf{F} = F_1^1 \circ \cdots \circ F_d^1$ with randomly generated $F_k^1$
while $r_F \le$ max_rank do
    iter = 1
    if $r_F > 1$ then
        add a random rank-one contribution to $\mathbf{F}$: $\mathbf{F} = \mathbf{F} + F_1^{r_F} \circ \cdots \circ F_d^{r_F}$
    end
    res $= \|\mathbf{F} - \mathbf{G}\| / \|\mathbf{G}\|$
    while iter $\le$ max_iter do
        res_old = res
        for $k = 1, \dots, d$ do
            solve $B_k \mathbf{c}_{j_k} = \mathbf{b}_{j_k}$ for every $j_k$ in direction $k$
            define $\mathbf{v}^{\tilde l} = \big(c_1(\tilde l), \dots, c_{M_k}(\tilde l)\big)$ for $\tilde l = 1, \dots, r_F$
            $s_{\tilde l}^F = \|\mathbf{v}^{\tilde l}\|_2$ for $\tilde l = 1, \dots, r_F$
            $F_k^{\tilde l}(j_k) = c_{j_k}(\tilde l) / s_{\tilde l}^F$ for $\tilde l = 1, \dots, r_F$
        end
        res $= \|\mathbf{F} - \mathbf{G}\| / \|\mathbf{G}\|$
        if res $< \epsilon$ then
            return $\mathbf{F}$
        else if $|$res $-$ res_old$| < \delta$ then
            break
        else
            iter = iter + 1
        end
    end
    $r_F = r_F + 1$
end
return $\mathbf{F}$

A potential pitfall of the ALS algorithm is poor conditioning of the matrix $B_k$, since the construction of normal equations squares the condition number, as is well known for matrix problems. An alternative that avoids the normal equations is mentioned in the review paper [24], but it is not feasible for problems with even moderately large dimension (e.g. $d = 5$).
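For illustration, a minimal NumPy sketch of one ALS sweep (the inner loop over $k$ in Algorithm 1) follows. It is ours, not the paper's implementation; it uses the factor-matrix storage convention of the earlier sketch and omits the rank-increase and stopping logic.

```python
import numpy as np

def als_sweep(s_F, F, s_G, G):
    """One standard ALS sweep via the normal equations (2.1)-(2.2).

    F[k], G[k] have shapes (M_k, r_F) and (M_k, r_G); s_F, s_G are s-value arrays.
    Updates s_F and F in place, one direction at a time.
    """
    d = len(F)
    for k in range(d):
        r_F = len(s_F)
        # B_k: Hadamard product of Gram matrices over all directions i != k.
        B = np.ones((r_F, r_F))
        # P(l, lhat) accumulates prod_{i != k} <G_i^l, F_i^lhat> for the RHS.
        P = np.ones((len(s_G), r_F))
        for i in range(d):
            if i == k:
                continue
            B *= F[i].T @ F[i]
            P *= G[i].T @ F[i]
        b = G[k] @ (s_G[:, None] * P)       # b[j_k, lhat], cf. (2.2) and below
        C = np.linalg.solve(B, b.T).T       # solve B_k c_{j_k} = b_{j_k} for all j_k
        s_F[:] = np.linalg.norm(C, axis=0)  # new s-values
        F[k] = C / s_F                      # normalized direction-k factors
    return s_F, F
```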
2.3. Estimate of condition numbers of least squares matrices.
It is an empirical observation that the condition number of the matrices $B_k$ is sometimes significantly better than the condition numbers of some of the Gram matrices comprising the Hadamard product in (2.2). In fact, we have the following lemma.
Lemma 1. Let $A$ and $B$ be Gram matrices with all diagonal entries equal to $1$. Then we have
$$\sigma_{\min}(B) \le \sigma_{\min}(A * B) \le \sigma_{\max}(A * B) \le \sigma_{\max}(B).$$
If the matrix $B$ is positive definite, then $\kappa(A * B) \le \kappa(B)$.

Since Gram matrices are symmetric non-negative definite, the proof of Lemma 1 follows directly from [22, Theorem 5.3.4]. This estimate implies that it is sufficient for only one of the matrices to be well conditioned to assure that the Hadamard product is also well conditioned. In other words, it is necessary for all directional Gram matrices to be ill-conditioned to cause ill-conditioning of the Hadamard product. Clearly, this situation can occur, and we address it in this paper.
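A small self-contained numerical illustration of Lemma 1 (ours, not from the paper) follows: an extremely ill-conditioned unit-diagonal Gram matrix $A$ is multiplied entry-wise with a well-conditioned one $B$, and the condition number of the product stays bounded by that of $B$.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_correlation(n, spread):
    # Gram matrix of n nearly collinear (small spread) or generic vectors,
    # normalized so that all diagonal entries equal 1.
    X = np.ones((n, n)) + spread * rng.standard_normal((n, n))
    G = X.T @ X
    dinv = 1.0 / np.sqrt(np.diag(G))
    return dinv[:, None] * G * dinv[None, :]

A = random_correlation(8, 1e-4)   # nearly rank one: very ill-conditioned
B = random_correlation(8, 1.0)    # generic: well-conditioned

# Consistent with Lemma 1, cond(A*B) does not exceed cond(B),
# even though cond(A) is huge.
print(np.linalg.cond(A), np.linalg.cond(B), np.linalg.cond(A * B))
```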
2.4. Modification of normal equations: motivation for randomized methods.
We motivate our approach by first considering an alternative to forming normal equations for ordinary matrices (excluding the QR factorization, which can easily be used for matrices). Given a matrix $A \in \mathbb{R}^{N \times n}$, $N \ge n$, we can multiply $A\mathbf{x} = \mathbf{b}$ by a matrix $R \in \mathbb{R}^{n' \times N}$ with independent random entries and solve

(2.3)  $RA\mathbf{x} = R\mathbf{b}$

instead (see, e.g., [20, 29, 30, 33]). The solution of this system, given that $R$ is of appropriate size (i.e., $n'$ is large enough), will be close to the least squares solution [29, Lemma 2]. In [29], (2.3) is used to form a preconditioner and an initial guess for solving $\min \|A\mathbf{x} - \mathbf{b}\|_2$ via a preconditioned conjugate gradient method. However, for our application we are interested in using equations of the form (2.3) in the Hadamard product in (2.2). We observe that $RA$ typically has a smaller condition number than $A^T A$. To see why, recall that for full-rank, square matrices $A$ and $B$, a bound on the condition number is $\kappa(AB) \le \kappa(A)\,\kappa(B)$. However, for rectangular full-rank matrices $A \in \mathbb{R}^{r' \times N}$ and $B \in \mathbb{R}^{N \times r}$, $r \le r' \le N$, this inequality does not necessarily hold. Instead, we have the inequality

(2.4)  $\kappa(AB) \le \kappa(A)\, \dfrac{\sigma_{\max}(B)}{\sigma_{\min}\big(P_{A^T}(B)\big)},$

where $P_{A^T}(B)$ is the projection of $B$ onto the row space of $A$ (for a proof of this inequality, see Appendix A). If $A$ has a small condition number (for example, when $A$ is a Gaussian random matrix; see [8, 14, 15]) and we were to assume $\sigma_{\min}(P_{A^T}(B))$ is close to $\sigma_{\min}(B)$, we obtain condition numbers smaller than $\kappa(B)^2$. The assumption that $\sigma_{\min}(P_{A^T}(B))$ is close to $\sigma_{\min}(B)$ is the same as assuming the columns of $B$ lie within the subspace spanned by the rows of $A$. This is achieved by choosing $r'$ to be larger than $r$ when $A$ is a randomized matrix.
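The following minimal sketch (ours) compares the conditioning of the normal-equations matrix $A^T A$ with that of a sketched system $RA$ as in (2.3), using a signed Bernoulli $R$ with $n'$ a small multiple of $n$; the specific sizes and the factor of 3 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

N, n = 2000, 50
# Build A with rapidly decaying singular values so that A is ill-conditioned.
U, _ = np.linalg.qr(rng.standard_normal((N, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -6, n)) @ V.T

n_prime = 3 * n
R = rng.choice([-1.0, 1.0], size=(n_prime, N))   # signed Bernoulli sketch

print("cond(A)     =", np.linalg.cond(A))
print("cond(A^T A) =", np.linalg.cond(A.T @ A))   # ~ cond(A)**2
print("cond(R A)   =", np.linalg.cond(R @ A))     # typically close to cond(A)
```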
2.5. Definitions and random matrix theory.
The main advantage of our approach is an improved condition number of the linear system solved at every step of the ALS algorithm. We use a particular type of random matrix to derive bounds on the condition number: the rows are independently distributed random vectors, but the columns are not (instead of the standard case where all entries are i.i.d.). Such matrices were studied extensively by Vershynin [32], and we rely heavily on this work for our estimates. To proceed, we need the following definitions from [32].
Remark. Definitions involving random variables, and vectors composed of random variables, are not consistent with the notation in the rest of the paper, outlined in Section 2.1.
Definition 3 ([32, Definition 5.7]). Let $\mathbb{P}\{\cdot\}$ denote the probability of an event and $\mathbb{E}$ the mathematical expectation operator. Also, let $X$ be a random variable that satisfies one of the three following equivalent properties:

1. $\mathbb{P}\{|X| > t\} \le \exp\big(1 - t^2/K_1^2\big)$ for all $t \ge 0$;
2. $\big(\mathbb{E}|X|^p\big)^{1/p} \le K_2\sqrt{p}$ for all $p \ge 1$;
3. $\mathbb{E}\exp\big(X^2/K_3^2\big) \le e$,

where the constants $K_i$, $i = 1, 2, 3$, differ from each other by at most an absolute constant factor (see [32, Lemma 5.5] for a proof of the equivalence of these properties). Then $X$ is called a sub-Gaussian random variable. The sub-Gaussian norm of $X$ is defined as the smallest $K_2$ in property 2, i.e.,
$$\|X\|_{\psi_2} = \sup_{p \ge 1} p^{-1/2}\big(\mathbb{E}|X|^p\big)^{1/p}.$$
Examples of sub-Gaussian random variables include Gaussian and Bernoulli random variables. We also present definitions for sub-Gaussian random vectors and their norm.
Definition 4 ([32, Definition 5.7]). A random vector $X \in \mathbb{R}^n$ is called a sub-Gaussian random vector if $\langle X, \mathbf{x}\rangle$ is a sub-Gaussian random variable for all $\mathbf{x} \in \mathbb{R}^n$. The sub-Gaussian norm of $X$ is subsequently defined as
$$\|X\|_{\psi_2} = \sup_{\mathbf{x} \in \mathcal{S}^{n-1}} \|\langle X, \mathbf{x}\rangle\|_{\psi_2},$$
where $\mathcal{S}^{n-1}$ is the unit Euclidean sphere.

Definition 5 ([32, Definition 5.19]). A random vector $X \in \mathbb{R}^n$ is called isotropic if its second moment matrix, $\Sigma = \Sigma(X) = \mathbb{E}\big[XX^T\big]$, is equal to the identity, i.e. $\Sigma(X) = I$. This definition is equivalent to $\mathbb{E}\langle X, \mathbf{x}\rangle^2 = \|\mathbf{x}\|_2^2$ for all $\mathbf{x} \in \mathbb{R}^n$.

The following theorem from [32] provides bounds on the condition numbers of matrices whose rows are independent sub-Gaussian isotropic random vectors.
Theorem 6. [32, Theorem 5.38]
Let $A$ be an $N \times n$ matrix whose rows $A(i,:)$ are independent, sub-Gaussian isotropic random vectors in $\mathbb{R}^n$. Then for every $t \ge 0$, with probability at least $1 - 2\exp(-ct^2)$, one has

(2.5)  $\sqrt{N} - C\sqrt{n} - t \le \sigma_{\min}(A) \le \sigma_{\max}(A) \le \sqrt{N} + C\sqrt{n} + t.$

Here $C = C_K$ and $c = c_K > 0$ depend only on the sub-Gaussian norm $K = \max_i \|A(i,:)\|_{\psi_2}$ of the rows.
An outline of the proof of Theorem 6 will be useful for deriving our own results, so we provide a sketch in Appendix A. The following lemma is used to prove Theorem 6 and will also be useful later in the paper. We later modify it to prove a version of Theorem 6 that works for sub-Gaussian, non-isotropic random vectors.
Lemma 7. [32, Lemma 5.36]
Consider a matrix $B$ that satisfies
$$\big\|B^T B - I\big\| < \max\big(\delta, \delta^2\big)$$
for some $\delta > 0$. Then

(2.6)  $1 - \delta \le \sigma_{\min}(B) \le \sigma_{\max}(B) \le 1 + \delta.$

Conversely, if $B$ satisfies (2.6) for some $\delta > 0$, then $\big\|B^T B - I\big\| < 3\max\big(\delta, \delta^2\big)$.

3. Randomized ALS algorithm
3.1. Alternating least squares algorithm using random matrices.
We propose the following alternative to using the normal equations in ALS algorithms: instead of (2.2), define the entries of $B_k$ via randomized projections,

(3.1)  $B_k(\hat l, \tilde l) = \prod_{i \ne k} \big\langle F_i^{\tilde l}, R_i^{\hat l} \big\rangle,$

where $R_i^{\hat l}$ is the $\hat l$-th column of a matrix $R_i \in \mathbb{R}^{M_i \times r'}$, $r' > r$, with random entries corresponding to direction $i$. The choice of $r' > r$ is made to reduce the condition number of $B_k$. As shown in Section 3.3, as $r/r' \to 0$ the bound on $\kappa(B_k)$ goes to $\kappa(B_k) \le \kappa\big((B_k^{\mathrm{ALS}})^{1/2}\big)$, where $B_k^{\mathrm{ALS}}$ is the $B_k$ matrix for standard ALS, i.e. (2.2). In this paper we consider independent signed Bernoulli random variables, i.e., $R_i(j_k, \hat l)$ is either $-1$ or $1$, each with probability $1/2$. We have also had some success using standard Gaussian random variables in our experiments. The proposed change also alters the right-hand side of the normal equations (2.1),

(3.2)  $\mathbf{b}_{j_k}(\hat l) = \sum_{l=1}^{r_G} s_l^G\, G_k^l(j_k) \prod_{i \ne k} \big\langle G_i^l, R_i^{\hat l} \big\rangle.$

Equivalently, $B_k$ may be written as $B_k = \prod_{i \ne k} R_i^T F_i$, where the product is understood entry-wise (Hadamard). Looking ahead, we choose the random matrices $R_i$ such that $B_k$ is a tall, rectangular matrix. Solving the linear system (2.1) with rectangular $B_k$ requires a pseudo-inverse, computed via either the singular value decomposition (SVD) or a QR algorithm.

To further contrast the randomized ALS algorithm with the standard ALS algorithm, we highlight two differences. First, randomized ALS trades the monotonic reduction of approximation error (a property of the standard ALS algorithm) for better conditioning. To adjust, we use a simple tactic: if a randomized ALS sweep (over all directions) decreases the error, we keep the resulting approximation; otherwise, we discard the sweep, generate independent random matrices $R_i$, and rerun the sweep. Second, the randomized ALS algorithm can be more computationally expensive than the standard one. This is due to the rejection scheme outlined above and to the fact that $B_k$ in the randomized algorithm has a larger number of rows than its standard counterpart, i.e., $r' > r$. Pseudocode of our new algorithm is presented in Algorithm 2; a small code sketch of the randomized construction is given immediately below.
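The sketch below (ours, not the paper's implementation) shows one randomized sweep built from (3.1)-(3.2); it reuses the factor-storage convention of the earlier sketches, and the multiple c_mult = 25 mirrors the $(25\,r_F) \times r_F$ matrices used in the experiments of Section 4. In Algorithm 2 such a sweep is accepted only if it does not increase $\|\mathbf{F} - \mathbf{G}\|/\|\mathbf{G}\|$; otherwise new random matrices $R_i$ are drawn and the sweep is rerun.

```python
import numpy as np

def randomized_als_sweep(s_F, F, s_G, G, c_mult=25, rng=None):
    """One randomized ALS sweep: Gram matrices are replaced by projections
    onto signed Bernoulli matrices R_i with r' = c_mult * r_F columns."""
    if rng is None:
        rng = np.random.default_rng()
    d, r_F = len(F), len(s_F)
    r_prime = c_mult * r_F
    # One random matrix per direction, shared across the whole sweep.
    R = [rng.choice([-1.0, 1.0], size=(Fk.shape[0], r_prime)) for Fk in F]
    for k in range(d):
        B = np.ones((r_prime, r_F))
        P = np.ones((len(s_G), r_prime))
        for i in range(d):
            if i == k:
                continue
            B *= R[i].T @ F[i]          # entry-wise product, cf. (3.1)
            P *= G[i].T @ R[i]
        b = G[k] @ (s_G[:, None] * P)   # right-hand sides, cf. (3.2)
        # Rectangular B (r' x r_F): pseudo-inverse via least squares.
        C = np.linalg.lstsq(B, b.T, rcond=None)[0].T
        s_F[:] = np.linalg.norm(C, axis=0)
        F[k] = C / s_F
    return s_F, F
```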
Algorithm 2: Randomized alternating least squares algorithm for rank reduction

input: $\epsilon > 0$, $\mathbf{G}$ with rank $r_G$, max_tries, max_rank, max_iter
initialize a rank $r_F = 1$ tensor $\mathbf{F} = F_1^1 \circ \cdots \circ F_d^1$ with randomly generated $F_k^1$
while $r_F \le$ max_rank do
    tries = 1
    iter = 1
    construct the randomized tensor $\mathbf{R}$
    if $r_F > 1$ then
        add a random rank-one contribution to $\mathbf{F}$: $\mathbf{F} = \mathbf{F} + F_1^{r_F} \circ \cdots \circ F_d^{r_F}$
    end
    while iter $\le$ max_iter and tries $\le$ max_tries do
        $\mathbf{F}_{\mathrm{old}} = \mathbf{F}$
        for $k = 1, \dots, d$ do
            construct $B_k$ using (3.1)
            solve $B_k \mathbf{c}_{j_k} = \mathbf{b}_{j_k}$ for every $j_k$ in direction $k$
            define $\mathbf{v}^{\tilde l} = \big(c_1(\tilde l), \dots, c_{M_k}(\tilde l)\big)$ for $\tilde l = 1, \dots, r_F$
            $s_{\tilde l}^F = \|\mathbf{v}^{\tilde l}\|_2$ for $\tilde l = 1, \dots, r_F$
            $F_k^{\tilde l}(j_k) = c_{j_k}(\tilde l) / s_{\tilde l}^F$ for $\tilde l = 1, \dots, r_F$
        end
        if $\|\mathbf{F} - \mathbf{G}\| / \|\mathbf{G}\| < \epsilon$ then
            return $\mathbf{F}$
        else if $\|\mathbf{F}_{\mathrm{old}} - \mathbf{G}\| / \|\mathbf{G}\| < \|\mathbf{F} - \mathbf{G}\| / \|\mathbf{G}\|$ then
            $\mathbf{F} = \mathbf{F}_{\mathrm{old}}$
            tries = tries + 1
            iter = iter + 1
        else
            tries = 1
            iter = iter + 1
        end
    end
    $r_F = r_F + 1$
end
return $\mathbf{F}$

Remark. We have explored an alternative approach using projections onto random tensors, different from Algorithm 2. Instead of using $B_k$ in (3.1) to solve for $\mathbf{c}_{j_k}$, we use the QR factorization of $B_k$ to form a preconditioner matrix, similar to the approach of [29] for solving overdetermined least squares problems in a matrix setting. This preconditioner is used to improve the condition number of $B_k$ in (2.2). The approach is different from Algorithm 2: we solve the same equations as the standard ALS algorithm, but in a better conditioned manner. Solving the same equations preserves the monotone error reduction property of standard ALS.
With Algorithm 2 the equations we solve are different but, as shown in Section 3.2, the solutions of each least squares problem are close to those obtained by the standard ALS algorithm.

We provide convergence results and theoretical bounds on the condition number of $B_k$ with entries (3.1) in Sections 3.2 and 3.3, respectively. Additionally, in Section 4, we empirically demonstrate the superior conditioning properties of $B_k$ defined in (3.1) relative to those given by the standard ALS in (2.2).

3.2. Convergence of the randomized ALS algorithm.
Before deriving bounds on the condition number of (3.1), it is important to discuss the convergence properties of our algorithm. To do so for our tensor algorithm, we derive a convergence result similar to [33, Lemma 4.8]. In this analysis, we flatten our tensors into large matrices and use results from random matrix theory to show convergence. First, we construct the large matrices used in this section from (3.1). Writing the inner product as a sum allows us to group all the summations together; we have
$$B_k(\hat l, \tilde l) = \prod_{i \ne k} \sum_{j_i=1}^{M_i} F_i^{\tilde l}(j_i)\, R_i^{\hat l}(j_i)
= \sum_{j_1=1}^{M_1} \cdots \sum_{j_{k-1}=1}^{M_{k-1}} \sum_{j_{k+1}=1}^{M_{k+1}} \cdots \sum_{j_d=1}^{M_d} \big( F_1^{\tilde l}(j_1) \cdots \big)\big( R_1^{\hat l}(j_1) \cdots \big),$$
where we have expanded the product to obtain the sum of the products of individual entries. Introducing a multi-index $j = (j_1, \dots, j_{k-1}, j_{k+1}, \dots, j_d)$, we define two matrices, $A_k \in \mathbb{R}^{M \times r}$ and $R_k \in \mathbb{R}^{M \times r'}$, where $M = \prod_{i \ne k} M_i$ is large, i.e., we write
$$A_k(j, \tilde l) = F_1^{\tilde l}(j_1) \cdots F_{k-1}^{\tilde l}(j_{k-1})\, F_{k+1}^{\tilde l}(j_{k+1}) \cdots F_d^{\tilde l}(j_d),$$
$$R_k(j, \hat l) = R_1^{\hat l}(j_1) \cdots R_{k-1}^{\hat l}(j_{k-1})\, R_{k+1}^{\hat l}(j_{k+1}) \cdots R_d^{\hat l}(j_d).$$
We note that these matrices can also be written as Khatri-Rao products,

(3.3)  $A_k = F_1 \odot \cdots \odot F_{k-1} \odot F_{k+1} \odot \cdots \odot F_d, \qquad R_k = R_1 \odot \cdots \odot R_{k-1} \odot R_{k+1} \odot \cdots \odot R_d.$

Since $M$ is large, $M \gg r' > r$, both $A_k$ and $R_k$ are rectangular matrices. Similarly, we rewrite the vector $\mathbf{b}$ in (3.2),
$$\mathbf{b}_{j_k}(\hat l) = \sum_{l=1}^{r_G} s_l^G\, G_k^l(j_k) \sum_{j_1=1}^{M_1} \cdots \sum_{j_{k-1}=1}^{M_{k-1}} \sum_{j_{k+1}=1}^{M_{k+1}} \cdots \sum_{j_d=1}^{M_d} \big( G_1^l(j_1) \cdots \big)\big( R_1^{\hat l}(j_1) \cdots \big),$$
using the multi-index $j$, as
$$\mathbf{b}_k(j) = \sum_{l=1}^{r_G} s_l^G\, G_k^l(j_k)\, \big( G_1^l(j_1) \cdots G_{k-1}^l(j_{k-1})\, G_{k+1}^l(j_{k+1}) \cdots G_d^l(j_d) \big).$$
Using the introduced notation, $A_k$, $R_k$, and $\mathbf{b}_k$, we rewrite the normal equations (2.1) for direction $k$ and coordinate $j_k$ as

(3.4)  $A_k^T A_k \mathbf{c}_k = A_k^T \mathbf{b},$

and the randomized version of those equations as

(3.5)  $R_k^T A_k \mathbf{c}_k = R_k^T \mathbf{b}_k.$

We highlight a notable difference between the random matrix $R_k$ above and those found in the usual matrix settings, for instance in randomized least squares regression [20, 29, 30, 33]. Specifically, in the former, the entries of $R_k$ are not statistically independent and are products of random variables, whereas in the latter the entries are often i.i.d. realizations of single random variables. In the present work we utilize signed Bernoulli random variables, for which the entries of $R_k$ are dependent but still follow a signed Bernoulli distribution. Whether there exists an optimal choice of distribution for setting the entries of $R_k$ in the tensor case requires a careful examination that is beyond the scope of this paper.

Next, we present a convergence result showing that the solution to the least squares problem at each iteration of randomized ALS is close to the solution we would obtain using standard ALS. For ease of notation, we drop the subscript $k$ from $A$ and $R$.

Lemma 9.
Given $A \in \mathbb{R}^{M \times r}$ and $R \in \mathbb{R}^{M \times r'}$, where $r \le r' \le M$, and assuming that $\mathbf{x} \in \mathbb{R}^r$ is the solution that minimizes $\|R^T A\hat{\mathbf{x}} - R^T\mathbf{b}\|_2$ and $\mathbf{y} \in \mathbb{R}^r$ is the solution that minimizes $\|A\hat{\mathbf{y}} - \mathbf{b}\|_2$, then

(3.6)  $\|A\mathbf{x} - \mathbf{b}\|_2 \le \kappa\big(R^T Q\big)\, \|A\mathbf{y} - \mathbf{b}\|_2,$

where $Q \in \mathbb{R}^{M \times r_Q}$, $r_Q \le r + 1$, is a matrix with orthonormal columns from the QR factorization of the augmented matrix $[A \,|\, \mathbf{b}]$, and where $R^T Q$ is assumed to have full rank.

Proof. We form the augmented matrix $[A \,|\, \mathbf{b}]$ and find its QR decomposition, $[A \,|\, \mathbf{b}] = QT$, where $T = [T_A \,|\, T_b]$, $T_A \in \mathbb{R}^{r_Q \times r}$ and $T_b \in \mathbb{R}^{r_Q}$, and $Q \in \mathbb{R}^{M \times r_Q}$ has orthonormal columns. Therefore, we have $A = QT_A$ and $\mathbf{b} = QT_b$. Using these decompositions of $A$ and $\mathbf{b}$, we define a matrix $\Theta$ such that $\Theta R^T A = A$ and $\Theta R^T \mathbf{b} = \mathbf{b}$, and arrive at
$$\Theta = Q\big((R^T Q)^T (R^T Q)\big)^{-1} (R^T Q)^T,$$
assuming that $(R^T Q)^T (R^T Q)$ is invertible. We address this additional assumption when discussing the bounds on the extreme singular values of $R^T Q$ in Theorem 12. Starting from the left-hand side of (3.6), we have
$$\|A\mathbf{x} - \mathbf{b}\|_2 = \big\|\Theta R^T A\mathbf{x} - \Theta R^T\mathbf{b}\big\|_2 \le \|\Theta\|\, \big\|R^T A\mathbf{x} - R^T\mathbf{b}\big\|_2 \le \|\Theta\|\, \big\|R^T A\mathbf{y} - R^T\mathbf{b}\big\|_2.$$
Since multiplication by $A$ maps a vector to the column space of $A$, there exists $T_y \in \mathbb{R}^{r_Q}$ such that $A\mathbf{y} = QT_y$. Hence, we obtain
$$\|A\mathbf{x} - \mathbf{b}\|_2 \le \|\Theta\|\, \big\|R^T Q T_y - R^T Q T_b\big\|_2 \le \|\Theta\|\, \big\|R^T Q\big\|\, \|T_y - T_b\|_2 \le \|\Theta\|\, \big\|R^T Q\big\|\, \|A\mathbf{y} - \mathbf{b}\|_2,$$
where in the last step we used the orthonormality of the columns of $Q$.

Next we estimate the norms $\|\Theta\|$ and $\|R^T Q\|$. First, we decompose $R^T Q$ using the singular value decomposition, $R^T Q = U\Sigma V^T$. From the definition of the spectral norm we know $\|R^T Q\| = \sigma_{\max}(R^T Q)$. Using the SVD of $R^T Q$ and the definition of $\Theta$, along with our assumption of invertibility of $(R^T Q)^T(R^T Q)$, we write
$$\|\Theta\| = \Big\|Q\big((R^T Q)^T(R^T Q)\big)^{-1}(R^T Q)^T\Big\| = \big\|(V\Sigma^2 V^T)^{-1} V\Sigma U^T\big\| = \big\|V\Sigma^{-2}V^T V\Sigma U^T\big\| = \big\|V\Sigma^{-1}U^T\big\|.$$
Hence $\|\Theta\| = 1/\sigma_{\min}(R^T Q)$, and the bound is
$$\|A\mathbf{x} - \mathbf{b}\|_2 \le \frac{\sigma_{\max}(R^T Q)}{\sigma_{\min}(R^T Q)}\, \|A\mathbf{y} - \mathbf{b}\|_2 = \kappa\big(R^T Q\big)\, \|A\mathbf{y} - \mathbf{b}\|_2. \qquad \square$$

We later use results from [32] to bound $\kappa(R^T Q)$, since $R^T Q$ is a random matrix whose rows are independent from one another but whose columns are not. To use this machinery, specifically Theorem 6, we require the following lemma.

Lemma 10. $R^T Q$ is a random matrix with isotropic rows.

Proof. Using the second moment matrix, we show that the rows of $R^T Q$ are isotropic. Given a row of $R^T Q$ written in column form, $\big(R(:,\hat l)^T Q\big)^T = Q^T R(:,\hat l)$, we form the second moment matrix,
$$\mathbb{E}\big[Q^T R(:,\hat l)\, R(:,\hat l)^T Q\big] = Q^T\, \mathbb{E}\big[R(:,\hat l)\, R(:,\hat l)^T\big]\, Q,$$
and show that $\mathbb{E}\big[R(:,\hat l)\, R(:,\hat l)^T\big] = I_{M \times M}$. Hence $\mathbb{E}\big[Q^T R(:,\hat l)\, R(:,\hat l)^T Q\big] = Q^T Q = I_{r_Q \times r_Q}$, and the rows of $R^T Q$ are isotropic. From the Khatri-Rao product definition of the matrix $R$ in (3.3), we write a column of $R$ as
$$R(:,\hat l) = \bigotimes_{\substack{i=1 \\ i \ne k}}^{d} R_i(:,\hat l).$$
Therefore, using properties of the Kronecker product (see, e.g., [24, equation (2.2)]), we can switch the order of the regular matrix product and the Kronecker products,
$$R(:,\hat l)\, R(:,\hat l)^T = \bigotimes_{\substack{i=1 \\ i \ne k}}^{d} R_i(:,\hat l)\, R_i(:,\hat l)^T.$$
Taking the expectation and moving it inside the Kronecker product gives
$$\mathbb{E}\big[R(:,\hat l)\, R(:,\hat l)^T\big] = \bigotimes_{\substack{i=1 \\ i \ne k}}^{d} \mathbb{E}\big[R_i(:,\hat l)\, R_i(:,\hat l)^T\big] = \bigotimes_{\substack{i=1 \\ i \ne k}}^{d} I_{M_i \times M_i} = I_{M \times M}. \qquad \square$$

Since $R^T Q$ is a tall rectangular matrix ($R^T Q \in \mathbb{R}^{r' \times r_Q}$) with independent sub-Gaussian isotropic rows, we may use Theorem 5.39 from Vershynin to bound its extreme singular values.

Lemma 11.
For every $t \ge 0$, with probability at least $1 - 2\exp(-ct^2)$, we have

(3.7)  $\kappa\big(R^T Q\big) \le \dfrac{1 + C\sqrt{(r+1)/r'} + t/\sqrt{r'}}{1 - C\sqrt{(r+1)/r'} - t/\sqrt{r'}},$

where $C = C_K$ and $c = c_K > 0$ depend only on the sub-Gaussian norm $K = \max_i \big\|R(:,i)^T Q\big\|_{\psi_2}$ of the rows of $R^T Q$.

Proof. Using Lemma 10 and Theorem 6, we have the following bound on the extreme singular values of $R^T Q \in \mathbb{R}^{r' \times r_Q}$: for every $t \ge 0$, with probability at least $1 - 2\exp(-ct^2)$,
$$\sqrt{r'} - C\sqrt{r_Q} - t \le \sigma_{\min}\big(R^T Q\big) \le \sigma_{\max}\big(R^T Q\big) \le \sqrt{r'} + C\sqrt{r_Q} + t,$$
where $C = C_K$ and $c = c_K > 0$ depend only on the sub-Gaussian norm $K = \max_i \big\|(R^T Q)(i,:)\big\|_{\psi_2}$ of the rows of $R^T Q$. Since $r_Q \le r + 1$, we have
$$\sqrt{r'} - C\sqrt{r+1} - t \le \sigma_{\min}\big(R^T Q\big) \le \sigma_{\max}\big(R^T Q\big) \le \sqrt{r'} + C\sqrt{r+1} + t,$$
with the same probability. $\square$

We now state the convergence result.

Theorem 12.
Given $A \in \mathbb{R}^{M \times r}$ and $R \in \mathbb{R}^{M \times r'}$, where $r \le r' \le M$, and assuming that $\mathbf{x} \in \mathbb{R}^r$ is the solution that minimizes $\|R^T A\hat{\mathbf{x}} - R^T\mathbf{b}\|_2$ and $\mathbf{y} \in \mathbb{R}^r$ is the solution that minimizes $\|A\hat{\mathbf{y}} - \mathbf{b}\|_2$, then for every $t \ge 0$, with probability at least $1 - 2\exp(-ct^2)$, we have
$$\|A\mathbf{x} - \mathbf{b}\|_2 \le \frac{1 + C\sqrt{(r+1)/r'} + t/\sqrt{r'}}{1 - C\sqrt{(r+1)/r'} - t/\sqrt{r'}}\, \|A\mathbf{y} - \mathbf{b}\|_2,$$
where $C = C_K$ and $c = c_K > 0$ depend only on the sub-Gaussian norm $K = \max_i \big\|R(:,i)^T Q\big\|_{\psi_2}$. The matrix $Q \in \mathbb{R}^{M \times r_Q}$, $r_Q \le r + 1$, is composed of orthonormal columns from the QR factorization of the augmented matrix $[A \,|\, \mathbf{b}]$, and $R^T Q$ is assumed to have full rank.

Proof. The proof follows from bounding $\kappa(R^T Q)$ in Lemma 9 with Lemma 11 via Lemma 10. $\square$

3.3. Bounding the condition number of $B_k$. To bound the condition number of $B_k$, we use a modified version of Theorem 6. If the rows of $B_k$ were isotropic, then we could use Theorem 6 directly. However, unlike for $R^T Q$ in Lemma 10, this is not the case for $B_k$. While the second moment matrix $\Sigma$ is not the identity, it plays a special role in the bound of the condition number, since it is the matrix $B_k$ from the standard ALS algorithm. To see this, we take the $\hat l$-th row of $B_k$,
$$B_k(\hat l, :) = \Big\{\prod_{i \ne k} \big\langle F_i^{\tilde l}, R_i^{\hat l}\big\rangle\Big\}_{\tilde l = 1, \dots, n},$$
and form its second moment matrix,
$$\Sigma(l, l') = \mathbb{E}\Big[\prod_{i \ne k} \big\langle F_i^{l}, R_i^{\hat l}\big\rangle \big\langle F_i^{l'}, R_i^{\hat l}\big\rangle\Big]
= \prod_{i \ne k} \mathbb{E}\Big[\big(F_i^{l}\big)^T R_i^{\hat l}\big(R_i^{\hat l}\big)^T F_i^{l'}\Big]
= \prod_{i \ne k} \big(F_i^{l}\big)^T\, \mathbb{E}\Big[R_i^{\hat l}\big(R_i^{\hat l}\big)^T\Big]\, F_i^{l'}.$$
Since $R_i^{\hat l}$ is a vector composed of either signed Bernoulli or standard Gaussian random variables, $\mathbb{E}\big[R_i^{\hat l}(R_i^{\hat l})^T\big] = I$. Therefore, we are left with $\Sigma(l, l') = \prod_{i \ne k} (F_i^{l})^T F_i^{l'}$, which is exactly (2.2).

We need to modify Theorem 6 for matrices that have independent, non-isotropic rows. In [32, Remark 5.40] it is noted, for a random matrix $A \in \mathbb{R}^{N \times n}$ with non-isotropic rows, that we can apply Theorem 6 to $A\Sigma^{-1/2}$ instead of $A$. The matrix $A\Sigma^{-1/2}$ has isotropic rows and, thus, we obtain the following inequality that holds with probability at least $1 - 2\exp(-ct^2)$:

(3.8)  $\Big\|\dfrac{1}{N}A^T A - \Sigma\Big\| \le \max\big(\delta, \delta^2\big)\,\|\Sigma\|, \qquad \delta = C\sqrt{\dfrac{n}{N}} + \dfrac{t}{\sqrt{N}},$

and $C = C_K$, $c = c_K > 0$.

To clarify how (3.8) changes the bounds on the singular values $\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ of a matrix $A$ with non-isotropic rows, we modify Lemma 7.

Lemma 13.
Consider matrices $B \in \mathbb{R}^{N \times n}$ and $\Sigma^{-1/2} \in \mathbb{R}^{n \times n}$ (non-singular) that satisfy

(3.9)  $1 - \delta \le \sigma_{\min}\big(B\Sigma^{-1/2}\big) \le \sigma_{\max}\big(B\Sigma^{-1/2}\big) \le 1 + \delta$

for some $\delta > 0$. Then we have the following bounds on the extreme singular values of $B$:
$$\sigma_{\min}\big(\Sigma^{1/2}\big)\,(1 - \delta) \le \sigma_{\min}(B) \le \sigma_{\max}(B) \le \sigma_{\max}\big(\Sigma^{1/2}\big)\,(1 + \delta).$$
The proof of Lemma 13 is included in Appendix A. Using Lemma 13, we observe that the bound on the condition number of a matrix $B$ satisfying (3.9) has the following form:
$$\kappa(B) \le \frac{1 + \delta}{1 - \delta}\, \kappa\big(\Sigma^{1/2}\big).$$
Using Lemma 13, we prove an extension of Theorem 6 for matrices with non-isotropic rows.
Theorem 14.
Let $A$ be an $N \times n$ matrix whose rows $A(i,:)$ are independent, sub-Gaussian random vectors in $\mathbb{R}^n$. Then for every $t \ge 0$, with probability at least $1 - 2\exp(-ct^2)$, one has
$$\sigma_{\min}\big(\Sigma^{1/2}\big)\big(\sqrt{N} - C\sqrt{n} - t\big) \le \sigma_{\min}(A) \le \sigma_{\max}(A) \le \sigma_{\max}\big(\Sigma^{1/2}\big)\big(\sqrt{N} + C\sqrt{n} + t\big).$$
Here $C = C_K$ and $c = c_K > 0$ depend only on the sub-Gaussian norm $K = \max_i \|A(i,:)\|_{\psi_2}$ and on the norm of $\Sigma^{-1/2}$.

Proof. We form the second moment matrix $\Sigma$ using the rows $A(i,:)$ and apply Theorem 6 to the matrix $A\Sigma^{-1/2}$, which has isotropic rows. Therefore, for every $t \ge 0$, with probability at least $1 - 2\exp(-ct^2)$, we have

(3.10)  $\sqrt{N} - C\sqrt{n} - t \le \sigma_{\min}\big(A\Sigma^{-1/2}\big) \le \sigma_{\max}\big(A\Sigma^{-1/2}\big) \le \sqrt{N} + C\sqrt{n} + t,$

where $C = C_{\tilde K}$ and $c = c_{\tilde K} > 0$ depend only on the sub-Gaussian norm $\tilde K = \max_i \big\|\Sigma^{-1/2}A(i,:)^T\big\|_{\psi_2}$. Applying Lemma 13 to (3.10), with $B = A/\sqrt{N}$ and $\delta = C\sqrt{n/N} + t/\sqrt{N}$, results in the bound
$$\sigma_{\min}\big(\Sigma^{1/2}\big)\big(\sqrt{N} - C\sqrt{n} - t\big) \le \sigma_{\min}(A) \le \sigma_{\max}(A) \le \sigma_{\max}\big(\Sigma^{1/2}\big)\big(\sqrt{N} + C\sqrt{n} + t\big),$$
with the same probability as (3.10).

To move $\Sigma^{-1/2}$ outside the sub-Gaussian norm, we bound $\tilde K$ from above using the sub-Gaussian norm of $A$, $K = \max_i \|A(i,:)\|_{\psi_2}$:
$$\big\|\Sigma^{-1/2}A(i,:)^T\big\|_{\psi_2} = \sup_{\mathbf{x} \in \mathcal{S}^{n-1}} \big\|\big\langle \Sigma^{-1/2}A(i,:)^T, \mathbf{x}\big\rangle\big\|_{\psi_2}
= \sup_{\mathbf{x} \in \mathcal{S}^{n-1}} \big\|\big\langle A(i,:)^T, \Sigma^{-1/2}\mathbf{x}\big\rangle\big\|_{\psi_2}\, \frac{\|\Sigma^{-1/2}\mathbf{x}\|_2}{\|\Sigma^{-1/2}\mathbf{x}\|_2}
\le \sup_{\mathbf{y} \in \mathcal{S}^{n-1}} \big\|\big\langle A(i,:)^T, \mathbf{y}\big\rangle\big\|_{\psi_2} \sup_{\mathbf{x} \in \mathcal{S}^{n-1}} \big\|\Sigma^{-1/2}\mathbf{x}\big\|_2
= \big\|A(i,:)^T\big\|_{\psi_2}\, \big\|\Sigma^{-1/2}\big\|,$$
hence $\tilde K \le K\,\|\Sigma^{-1/2}\|$. Using this inequality, we bound the probability in (6.4) for the case of Theorem 6 applied to $A\Sigma^{-1/2}$:
$$\mathbb{P}\Big\{\max_{\mathbf{x} \in \mathcal{N}} \Big|\frac{1}{N}\big\|A\Sigma^{-1/2}\mathbf{x}\big\|_2^2 - 1\Big| \ge \epsilon\Big\}
\le 9^n \cdot 2\exp\Big[-\frac{c_1}{\tilde K^4}\big(C^2 n + t^2\big)\Big]
\le 9^n \cdot 2\exp\Big[-\frac{c_1}{K^4\|\Sigma^{-1/2}\|^4}\big(C^2 n + t^2\big)\Big]
\le 2\exp\Big[-\frac{c_1 t^2}{K^4\|\Sigma^{-1/2}\|^4}\Big].$$
The last step, as in the proof of Theorem 6, comes from choosing $C$ large enough, for example $C = K^2\|\Sigma^{-1/2}\|^2\sqrt{\ln(9)/c_1}$. $\square$

The combination of Lemma 13, the fact that $\Sigma$ for (3.1) is the same matrix as (2.2) (denoted below as $B_k^{\mathrm{ALS}}$), and Theorem 14 leads to our bound on the condition number of $B_k$ in (3.1): for every $t \ge 0$, with probability at least $1 - 2\exp(-ct^2)$,

(3.11)  $\kappa(B_k) \le \dfrac{1 + C\sqrt{r/r'} + t/\sqrt{r'}}{1 - C\sqrt{r/r'} - t/\sqrt{r'}}\; \kappa\Big(\big(B_k^{\mathrm{ALS}}\big)^{1/2}\Big),$

where the definitions of $C$ and $c$ are the same as in Theorem 14.

Remark. In both (3.7) and (3.11) the ratios $r/r'$ and $t/\sqrt{r'}$ are present.
As both ratios go to zero, our bound on the condition number of $B_k$ goes to $\kappa(B_k) \le \kappa\big((B_k^{\mathrm{ALS}})^{1/2}\big)$, and the bound on the condition number of $R^T Q$ goes to $\kappa(R^T Q) \le 1$. These properties explain our choice to set $r'$ as a constant multiple of $r$ in the randomized ALS algorithm. As with similar bounds for randomized matrix algorithms, these bounds are pessimistic; hence $r'$ does not have to be very large with respect to $r$ in order to get acceptable results.

Another reason to choose $r'$ as a constant multiple of $r$ is the construction of $R_k \in \mathbb{R}^{M \times r'}$ in (3.3) and our choice to use Bernoulli random numbers. If $r'$ is too small, there is a danger of the matrix $R_k$ becoming singular due to the size of $M$ and repeated rows. Choosing the constant multiple that defines $r'$ large enough helps mitigate this problem.

4. Examples
4.1. Sine function.
Our first test of the randomized ALS algorithm is to reduce a CTD generated from samples of the multivariate function $\sin(z_1 + \cdots + z_d)$. This reduction problem was studied in [4], where the output of the standard ALS algorithm suggested a new trigonometric identity yielding a rank $d$ separated representation of $\sin(z_1 + \cdots + z_d)$. As input, we use standard trigonometric identities to produce an initial CTD.

We ran tests using both standard ALS and the new randomized algorithm to reduce the separation rank of a CTD of samples of $\sin(z_1 + \cdots + z_d)$. The tests differed in that each one had a different random initial guess with separation rank $r_F = 1$. In this example we chose $d = 5$ and sampled each variable $z_i$, $i = 1, \dots, d$, with $M = 64$ equispaced samples in the interval $[0, \pi]$. Our input CTD for both algorithms was generated via a standard trigonometric identity. The reduction tolerance $\epsilon$ and the maximum number of iterations per rank, i.e. max_iter in Algorithms 1 and 2, were the same for both algorithms; for tests involving the standard ALS algorithm we additionally used a stuck tolerance $\delta$. To test the randomized ALS algorithm we used $B_k$ matrices of size $(25\,r_F) \times r_F$ and a fixed value of max_tries in Algorithm 2.

According to Lemma 2.4 in [4], there exist exact rank-$d$ separated representations of $\sin(z_1 + \cdots + z_d)$. We were able to find rank-$d$ approximations with both standard ALS and our randomized ALS whose relative errors were less than the requested $\epsilon$ (for a histogram of the residuals of the tests, see Figure 4.1). Due to the random initial guess $\mathbf{F}$ and our choices of parameters (in particular the stuck tolerance and max_tries), both algorithms had a small number of runs that did not find rank-$d$ approximations with the requested tolerance $\epsilon$. The randomized ALS algorithm produced fewer of these outcomes than standard ALS.

Large differences in the maximum condition numbers of the ALS solves are illustrated in Figure 4.2, where we compare tests of the standard and randomized ALS algorithms. We observe that the maximum condition numbers produced by the randomized ALS algorithm are much smaller than those from the standard ALS algorithm. This is consistent with our theory.

Furthermore, as shown in Figure 4.3, the number of iterations required for randomized ALS to converge was smaller than the number required by standard ALS. It is important to remember that the number of iterations required by the standard ALS algorithm to reduce a CTD can be optimized by adjusting the tolerance, stuck tolerance, and maximum number of iterations per rank. In these experiments we chose the stuck tolerance and maximum number of iterations to reduce the number of tests of the standard ALS algorithm that did not meet the requested tolerance $\epsilon$.
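As an illustration of how such an input CTD can be formed, the following sketch (ours, not necessarily the construction used in the paper) expands $\sin(z_1 + \cdots + z_d)$ via the angle-addition formula into a separated representation with $2^{d-1}$ terms and spot-checks it against direct evaluation.

```python
import numpy as np
from itertools import combinations

d, M = 5, 64
z = [np.linspace(0.0, np.pi, M) for _ in range(d)]

# sin(z1+...+zd) = Im prod_k exp(i z_k)
#   = sum over odd-size subsets S of (-1)^((|S|-1)/2)
#       * prod_{k in S} sin(z_k) * prod_{k not in S} cos(z_k),
# a separated representation with 2^(d-1) terms.
s_vals, factors = [], []            # factors[l] is a list of d direction vectors
for m in range(1, d + 1, 2):        # odd subset sizes
    for S in combinations(range(d), m):
        s_vals.append((-1.0) ** ((m - 1) // 2))
        factors.append([np.sin(z[k]) if k in S else np.cos(z[k]) for k in range(d)])

# Spot-check the CTD at a few random grid points.
rng = np.random.default_rng(2)
for _ in range(5):
    idx = rng.integers(0, M, size=d)
    ctd_val = sum(s * np.prod([f[k][idx[k]] for k in range(d)])
                  for s, f in zip(s_vals, factors))
    print(abs(ctd_val - np.sin(sum(z[k][idx[k]] for k in range(d)))))
```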
4.2. A manufactured tensor. Our next test is to compare the performance of the standard and randomized ALS algorithms on a manufactured random tensor example. To construct this example we generate factors by drawing $M = 128$ random samples from the standard Gaussian distribution. We chose $d = 10$ and set the separation rank of the input tensor to $r = 50$. We then normalized the factors and set the s-values of the tensor equal to $s_l = e^{-l}$, $l = 0, \dots, r - 1$, where $r$ was predetermined such that $s_{r-1}$ is small.

Similar to the sine example, we ran 500 experiments and requested the same accuracy $\epsilon$ from both algorithms, with a prescribed maximum number of iterations and, for standard ALS, a stuck tolerance. We used the following parameters for the randomized ALS algorithm: the $B_k$ matrices were of size $(25\,r_F) \times r_F$, and the repetition parameter max_tries in Algorithm 2 was fixed. We started all tests from randomized guesses with rank $r_F = 9$. This value was chosen because in all previous runs the reduced separation rank never fell below $r_F = 10$. Such an experiment allows us to compare how the algorithms perform when the initial approximation has rank greater than one.
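A sketch of the stated construction (ours) follows: $d = 10$ directions, $M = 128$ Gaussian samples per factor, separation rank $r = 50$, unit-norm factors, and s-values $s_l = e^{-l}$.

```python
import numpy as np

rng = np.random.default_rng(3)
d, M, r = 10, 128, 50

factors = []
for _ in range(d):
    Fk = rng.standard_normal((M, r))
    Fk /= np.linalg.norm(Fk, axis=0)            # unit-norm columns
    factors.append(Fk)
s_vals = np.exp(-np.arange(r, dtype=float))     # s_0 = 1, ..., s_{r-1} ~ 5e-22

# Frobenius norm via the Gram-matrix formula used earlier.
W = np.ones((r, r))
for Fk in factors:
    W *= Fk.T @ Fk
print(np.sqrt(s_vals @ W @ s_vals))
```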
Figure 4.1.
Histograms displaying ALS reduction residuals, in log scale, for reducing the length of a CTD of samples of $\sin(z_1 + \cdots + z_5)$. The experiments shown in (a) used randomized ALS, whereas the experiments shown in (b) used standard ALS. We note that both algorithms produced a small number of results with approximation errors worse than the requested tolerance. However, the randomized ALS method produced fewer results that did not meet our requested tolerance.

We show in Figure 4.4 the output separation ranks from tests of both the randomized and standard ALS algorithms. The CTD outputs from randomized ALS had, on average, lower separation ranks than those from standard ALS. Furthermore, as seen in Figure 4.4, some of the output CTDs from the standard ALS algorithm had separation rank above $r_F = 35$. In these instances, standard ALS failed to reduce the separation rank of the input CTD, because simple truncation to $r_F = 35$ would have given double precision accuracy. These failures did not occur with the randomized ALS algorithm. We can also see the contrast in performance in Figure 4.5: all tests of the randomized ALS algorithm produced CTDs with reduced separation rank whose relative reduction errors were less than the accuracy $\epsilon$. Also in Figure 4.5, we observe instances where the standard ALS algorithm failed to output a reduced separation rank CTD with relative error less than $\epsilon$.
Figure 4.2.
Histogram showing the maximum condition numbers from our experiments reducing the length of a CTD of samples of $\sin(z_1 + \cdots + z_5)$. The condition numbers are shown in log scale; the solid gray pattern represents condition numbers from standard ALS, while the hatch pattern represents condition numbers from the randomized ALS algorithm.

Figure 4.3.
Histogram showing the number of iterations required by randomized ALS (hatch pattern) and the standard ALS algorithm (gray pattern) to reduce the length of a CTD of samples of $\sin(z_1 + \cdots + z_5)$.
There was a significant difference in the maximum condition numbers of the matrices used in the two algorithms. In Figure 4.6, we see that matrices produced by standard ALS had much larger condition numbers than their counterparts in the randomized ALS algorithm. Such large condition numbers may explain the failures of the standard ALS algorithm to output reduced separation rank CTDs with relative errors less than $\epsilon$.

From Figure 4.7, we see that in many cases standard ALS required fewer iterations than the randomized ALS algorithm to converge. However, relative to randomized ALS, standard ALS required a large number of iterations for a considerable fraction of the experiments.

4.3. Elliptic PDE with random coefficient.
As the key application of the randomized ALS algorithm, we consider the separated representation of the solution $u(\mathbf{x}, \mathbf{z})$ of the linear elliptic PDE

(4.1)  $-\nabla\cdot\big(a(\mathbf{x}, \mathbf{z})\nabla u(\mathbf{x}, \mathbf{z})\big) = 1, \quad \mathbf{x} \in \mathcal{D}, \qquad u(\mathbf{x}, \mathbf{z}) = 0, \quad \mathbf{x} \in \partial\mathcal{D},$

defined on the unit square $\mathcal{D} = (0,1) \times (0,1)$ with boundary $\partial\mathcal{D}$. The diffusion coefficient $a(\mathbf{x}, \mathbf{z})$ is considered random and is modeled by

(4.2)  $a(\mathbf{x}, \mathbf{z}) = \bar a + \sigma_a \sum_{k=1}^{d} \sqrt{\zeta_k}\, \varphi_k(\mathbf{x})\, z_k,$

where $\mathbf{z} = (z_1, \dots, z_d)$ and the random variables $z_k$ are independent and uniformly distributed over the interval $[-1, 1]$; we set $d = 5$ and choose fixed values of $\bar a$ and $\sigma_a$. In (4.2), $\{\zeta_k\}_{k=1}^d$ are the $d$ largest eigenvalues associated with $\{\varphi_k\}_{k=1}^d$, the $L_2(\mathcal{D})$-orthonormalized eigenfunctions of the exponential covariance function

(4.3)  $C_{aa}(\mathbf{x}_1, \mathbf{x}_2) = \exp\Big(-\dfrac{\|\mathbf{x}_1 - \mathbf{x}_2\|_1}{l_c}\Big),$

where $l_c$ denotes the correlation length. Given the choices of the parameters $\bar a$, $\sigma_a$, $d$, and $l_c$, the model in (4.2) leads to strictly positive realizations of $a(\mathbf{x}, \mathbf{z})$.

We discretize (4.1) in the spatial domain $\mathcal{D}$ via triangular finite elements of size $h$. This discretization, along with the affine representation of $a(\mathbf{x}, \mathbf{z})$ in $z_k$, yields the random linear system of equations

(4.4)  $\Big(K_0 + \sum_{k=1}^{d} K_k z_k\Big)\, \mathbf{u}(\mathbf{z}) = \mathbf{f}$

for the approximate vector of nodal solutions $\mathbf{u}(\mathbf{z}) \in \mathbb{R}^N$. The sparse matrices $K_0$ and $K_k$ are obtained from the finite element discretization of the differential operator in (4.1) with $a(\mathbf{x}, \mathbf{z})$ replaced by $\bar a$ and $\sigma_a\sqrt{\zeta_k}\,\varphi_k(\mathbf{x})$, respectively.

To fully discretize (4.4), we consider the representation of $\mathbf{u}(\mathbf{z})$ on a tensor-product grid $\{(z_1(j_1), \dots, z_d(j_d)) : j_k = 1, \dots, M_k\}$, where, for each $k$, the grid points $z_k(j_k)$ are selected to be Gauss-Legendre abscissas. In our numerical experiments we used the same number of abscissas, $M_k = M = 8$, for all $k = 1, \dots, d$. The discrete representation of (4.4) is then given by the tensor system of equations

(4.5)  $\mathbf{K}\mathbf{U} = \mathbf{F},$

where the linear operation $\mathbf{K}\mathbf{U}$ is defined as
$$\mathbf{K}\mathbf{U} = \sum_{\hat l = 0}^{d} \sum_{\tilde l = 1}^{r_U} s_{\tilde l}^U\, \big(K_0^{\hat l} U_0^{\tilde l}\big) \circ \big(K_1^{\hat l} U_1^{\tilde l}\big) \circ \cdots \circ \big(K_d^{\hat l} U_d^{\tilde l}\big), \qquad K_0^{\hat l} = K_{\hat l},$$
and, for $k = 1, \dots, d$,
$$K_k^{\hat l} = \begin{cases} D & \hat l = k \\ I_M & \hat l \ne k, \end{cases} \qquad D = \begin{pmatrix} z_k(1) & & 0 \\ & \ddots & \\ 0 & & z_k(M) \end{pmatrix},$$
where $I_M$ is the $M \times M$ identity matrix. The tensor $\mathbf{F}$ in (4.5) is defined as $\mathbf{F} = \mathbf{f} \circ \mathbf{1}_M \circ \cdots \circ \mathbf{1}_M$, where $\mathbf{1}_M$ is an $M$-vector of ones. We seek to approximate $\mathbf{U}$ in (4.5) with a CTD,

(4.6)  $\mathbf{U} = \sum_{\tilde l=1}^{r_U} s_{\tilde l}^U\, U_0^{\tilde l} \circ U_1^{\tilde l} \circ \cdots \circ U_d^{\tilde l},$

where the separation rank $r_U$ will be determined by a target accuracy. In (4.6), $U_0^{\tilde l} \in \mathbb{R}^N$ and $U_k^{\tilde l} \in \mathbb{R}^M$, $k = 1, \dots, d$. To solve (4.5), we use a fixed point iteration similar to those used for solving matrix equations and recently employed to solve tensor equations in [23]. In detail, the iteration starts with an initial tensor $\mathbf{U}_0$ of the form in (4.6). At each iteration $i$, $\mathbf{U}$ is updated according to $\mathbf{U}_{i+1} = (\mathbf{I} - \mathbf{K})\mathbf{U}_i + \mathbf{F}$, which requires $\|\mathbf{I} - \mathbf{K}\| < 1$. To assure this requirement is satisfied, we solve

(4.7)  $\mathbf{U}_{i+1} = c\,(\mathbf{F} - \mathbf{K}\mathbf{U}_i) + \mathbf{U}_i,$

where $c$ is chosen such that $\|\mathbf{I} - c\mathbf{K}\| < 1$. We compute the operator norm $\|\mathbf{I} - c\mathbf{K}\|$ via the power method; see, e.g., [4, 5].

One aspect of applying such an iteration to a CTD is an increase of the output separation rank. For example, if we take a tensor $\mathbf{U}$ of separation rank $r_U$ and use it as input for (4.7), one iteration would increase the rank to $r_F + (d+2)\,r_U$. Therefore, we require a reduction algorithm to decrease the separation rank as we iterate. This is where either the standard or the randomized ALS algorithm is required: to truncate the separated representation after each iteration. Both ALS methods work with a user-supplied truncation accuracy $\epsilon$, so we denote the reduction operator as $\tau_\epsilon$. Including this operator in our iteration, we have

(4.8)  $\mathbf{U}_{i+1} = \tau_\epsilon\big(c\,(\mathbf{F} - \mathbf{K}\mathbf{U}_i) + \mathbf{U}_i\big).$

Pseudocode for our fixed point algorithm is shown in Algorithm 3.
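A minimal sketch of the iteration (4.8) follows (ours, complementing Algorithm 3 below). The helper routines apply_K, reduce_rank, ctd_add, ctd_scale, and ctd_norm are hypothetical names for the separated operator application, the ALS-based truncation $\tau_\epsilon$ (standard or randomized), and the elementary CTD operations of Section 2.1.

```python
def fixed_point_solve(F, apply_K, reduce_rank, ctd_add, ctd_scale, ctd_norm,
                      c, mu, eps, max_iter=100):
    """Sketch of the fixed point iteration (4.7)-(4.8) for K U = F.

    apply_K(U)       : K U in CTD form (separation rank grows to (d+1)*rank(U)).
    reduce_rank(U, e): truncation operator tau_e based on (randomized) ALS.
    ctd_add/scale/norm : elementary CTD operations.
    c                : scaling chosen so that ||I - c K|| < 1 (power method).
    mu               : target relative residual ||F - K U|| / ||F||.
    """
    U = reduce_rank(F, eps)                          # simple initial guess
    for _ in range(max_iter):
        D = ctd_add(F, ctd_scale(apply_K(U), -1.0))  # residual D = F - K U
        if ctd_norm(D) / ctd_norm(F) < mu:
            break
        # U <- tau_eps( c*D + U ); truncation keeps the separation rank in check.
        U = reduce_rank(ctd_add(ctd_scale(D, c), U), eps)
    return U
```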
Figure 4.4.
Histograms showing output ranks from experiments in reducing the length of CTDs. (a) shows ranks of CTDs output by the randomized ALS algorithm; (b) shows ranks of CTDs output by the standard ALS algorithm. The CTDs output by the randomized ALS method typically have a smaller separation rank. In many examples the standard ALS algorithm required 40 terms, i.e. it failed, since truncation of the input tensor to $r_F = 35$ should give double precision.
Figure 4.5.
Histograms displaying ALS reduction errors, in log scale, for reduced-rank CTDs of the random tensor example. (a) shows that in our 500 tests the randomized ALS method always produced a result that met the required tolerance; (b) shows how the standard ALS algorithm fared with the same problem. Note that the standard ALS algorithm failed to reach the requested tolerance in a significant number of tests.
Figure 4.6.
Histogram showing the maximum condition numbers from experiments in reducing the length of CTDs of the random tensor example. The condition numbers of $B_k$ are shown in log scale; solid gray represents condition numbers from standard ALS, while the hatch pattern represents condition numbers from the randomized ALS algorithm. Similar to the sine example, the condition numbers from randomized ALS are much smaller than those from the standard ALS algorithm.
Figure 4.7.
Histograms showing iterations required to produce reduced-length CTDs for the random tensor example. (a) shows iterations required by randomized ALS, while (b) shows the iterations required by the standard ALS algorithm. As seen in (b), many examples using the standard ALS algorithm required large numbers of iterations to output CTDs.
Remark. In this example, the separation rank of $\mathbf{K}$ is directly related to the problem dimension $d$, i.e. $r_K = d + 1$, which is a consequence of using a Karhunen-Loeve-type expansion for the finite-dimensional noise representation of $a(\mathbf{x}, \mathbf{z})$. This will increase the computational cost of the algorithm to more than linear with respect to $d$, e.g. quadratic in $d$ when an iterative solver is used and $N \gg M$. Alternatively, one can obtain the finite-dimensional noise representation of $a(\mathbf{x}, \mathbf{z})$ by applying the separated rank reduction technique of this study to the stochastic differential operator itself, to possibly achieve $r_K < d$. The interested reader is referred to [3, 4] for more details.
Algorithm 3:
Fixed point iteration algorithm for solving (4.8)

input: $\epsilon > 0$, $\mu > 0$, operator $\mathbf{K}$, $\mathbf{F}$, $c$, max_iter, max_rank, $\delta > 0$ (for standard ALS), max_tries (for randomized ALS)
initialize a rank $r_U = 1$ tensor $\mathbf{U}_0 = U_1^1 \circ \cdots \circ U_d^1$, with either randomly generated $U_k^1$ or $U_k^1$ generated from the solution of the nominal equations
$\mathbf{D}_0 = \mathbf{F} - \mathbf{K}\mathbf{U}_0$
res $= \|\mathbf{D}_0\| / \|\mathbf{F}\|$
iter = 0
while res $> \mu$ do
    iter = iter + 1
    $\mathbf{U}_{\mathrm{iter}} = c\,\mathbf{D}_{\mathrm{iter}-1} + \mathbf{U}_{\mathrm{iter}-1}$
    $\mathbf{U}_{\mathrm{iter}} = \tau_\epsilon(\mathbf{U}_{\mathrm{iter}})$
    $\mathbf{D}_{\mathrm{iter}} = \mathbf{F} - \mathbf{K}\mathbf{U}_{\mathrm{iter}}$
    res $= \|\mathbf{D}_{\mathrm{iter}}\| / \|\mathbf{F}\|$
end
return $\mathbf{U}_{\mathrm{iter}}$

First, we examine the convergence of the iterative algorithm given a fixed ALS reduction tolerance in Figure 4.8. The randomized ALS method converges to more accurate solutions in all of these tests (see Table 1). However, the ranks of the randomized ALS solutions are larger than the ranks required for solutions produced by the standard ALS algorithm.

In Figure 4.9, we observe different behavior in the relative residuals when using fixed ranks instead of fixed accuracies. For these experiments the ALS-based linear solve using the standard algorithm outperforms the randomized version, except in the rank $r = 30$ case (see Table 2). In this case, the standard ALS algorithm has issues reaching the requested ALS reduction tolerance, leading to convergence problems in the iterative linear solve. The randomized ALS algorithm does not have the same difficulty with the rank $r = 30$ example. The difference in performance between standard and randomized ALS for this example corresponds to a significant difference between the maximum condition numbers of $B_k$: for the $r = 30$ case, the maximum condition number of the $B_k$ matrices generated by randomized ALS was substantially smaller than the maximum condition number of the $B_k$ matrices generated by standard ALS.

5. Discussion and conclusions
Figure 4.8. Residual error vs. iteration for the linear solvers. The black lines represent linear solve residuals where standard ALS was used for reduction, while the gray lines represent linear solve residuals where randomized ALS was used for reduction. In the three examples shown, the ALS tolerances for both standard and randomized ALS were set to a different value for the curves labeled (a), (b), and (c).

Figure 4.9. Relative residuals of the linear solves vs. fixed point iteration number. (a) Two linear solves are shown: solves for fixed ranks r = 10 and r = 20. Gray lines are residuals corresponding to reductions with randomized ALS, and black lines correspond to reductions with the standard ALS algorithm. (b) One experiment with r = 30 and the same color scheme as in (a).

ALS type      ALS tol   max κ(B_k)   max rank   rank   residual
standard          ·          ·           ·        ·        ·
                  ·          ·          13       11        ·
                  ·          ·          37       34        ·
randomized        ·          ·           ·        ·        ·
                  ·          ·          22       19        ·
                  ·          ·          57       54        ·

Table 1. Ranks, maximum condition numbers, and final relative residual errors of the experiments with fixed ALS tolerance.

ALS type      ALS tol   max κ(B_k)   rank   residual
standard          ·          ·        10        ·
                  ·          ·        20        ·
                  ·          ·        30        ·
randomized        ·          ·        10        ·
                  ·          ·        20        ·
                  ·          ·        30        ·

Table 2. Maximum condition numbers and final relative residual errors of the experiments with fixed separation ranks.

5. Discussion and conclusions

We have proposed a new ALS algorithm for reducing the rank of tensors in canonical format that relies on projections onto random tensors. Tensor rank reduction is one of the primary operations for approximations with tensors. Additionally, we have presented a general framework for the analysis of this new algorithm. The benefit of using such random projections is the improved conditioning of the matrices associated with the least squares problem at each ALS iteration. While significant reductions of condition numbers may be achieved, unlike in the standard ALS, the application of random projections results in a loss of monotonic error reduction. In order to restore monotonicity, we have employed a simple rejection approach, wherein several random tensors are applied and only those that do not increase the error are accepted. This, however, comes at the expense of additional computational cost as compared to the standard ALS algorithm. Finally, a set of numerical experiments has been studied to illustrate the efficiency of the randomized ALS in improving the numerical properties of its standard counterpart.

The optimal choice of random variables to use in the context of projecting onto random tensors is a question to be addressed in future work. In our examples we have used signed Bernoulli random variables, a choice that worked well with both our numerical experiments and analysis. On the other hand, the limitations of such a construction of random tensors have been discussed, which motivates further investigation. Another topic of interest for future work is the extension of the proposed randomized framework to other tensor formats, including the Tucker [24] and tensor-train [28] formats.

Finally, we have suggested an alternative approach to using projections onto random tensors that merits further examination. This approach uses the QR factorization to construct a preconditioner for the least squares problem at each ALS iteration. Hence it solves the same equations as the standard ALS, but the matrices have better conditioning. Also, because it solves the same equations, the monotonic error reduction property is preserved. This is an important distinction from randomized ALS, which solves different linear systems, but whose solutions are close to the solutions from standard ALS.
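For a generic least squares problem min_x ||Ax − b||, the preconditioning idea can be sketched as follows: rather than solving the normal equations A^T A x = A^T b directly, the R factor of a QR factorization of A is used to precondition them, which drives the condition number of the preconditioned matrix toward one. This is a generic illustration of that idea, not the ALS-specific construction of the paper.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10)) @ np.diag(np.logspace(0, 6, 10))  # ill-conditioned columns
b = rng.standard_normal(200)

B = A.T @ A                                   # normal-equations matrix (analogue of B_k)
print(np.linalg.cond(B))                      # roughly cond(A)^2, very large

Q, R = np.linalg.qr(A)                        # thin QR of the design matrix
Rinv = np.linalg.inv(R)
B_pre = Rinv.T @ B @ Rinv                     # preconditioned matrix, condition number near 1
print(np.linalg.cond(B_pre))

y = np.linalg.solve(B_pre, Rinv.T @ (A.T @ b))
x = Rinv @ y                                  # same least squares solution as the normal equations
print(np.linalg.norm(A.T @ (A @ x - b)) / np.linalg.norm(A.T @ b))  # small relative residual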
6. Appendix A
First, we prove (2.4).
Proof.
To bound the condition number of AB, we bound σ_max(AB) from above and σ_min(AB) from below. The bound on σ_max(AB) is straightforward; it follows from the properties of the 2-norm,

    σ_max(AB) ≤ σ_max(A) σ_max(B).

To bound σ_min(AB), we first note that A A^T is nonsingular and write σ_min(AB) as

    σ_min(AB) = || A A^T (A A^T)^{−1} A B x* ||,

where x*, with ||x*|| = 1, is the vector at which the minimum of the norm is attained (see the minimax definition of singular values, e.g., [22, Theorem 3.1.2]). If we define y = A^T (A A^T)^{−1} A B x*, then

    σ_min(AB) = (||A y|| / ||y||) · ||y|| ≥ σ_min(A) · ||y||,

again from the minimax definition of singular values. To bound ||y||, we observe that A^T (A A^T)^{−1} A B is the projection of B onto the row space of A. Denoting this projection by P_{A^T}(B), we have

    ||y|| = || P_{A^T}(B) x* || ≥ σ_min( P_{A^T}(B) ),

since ||x*|| = 1. Combining the bounds on the largest and smallest singular values gives the bound on the condition number,

    κ(AB) ≤ σ_max(A) σ_max(B) / ( σ_min(A) σ_min(P_{A^T}(B)) ) = κ(A) · σ_max(B) / σ_min(P_{A^T}(B)).   □
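As a quick numerical sanity check of this bound (not part of the original argument), one can compare both sides for random matrices with A of full row rank:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))               # full row rank, so A A^T is nonsingular
B = rng.standard_normal((8, 5))

def cond(M):
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

P = A.T @ np.linalg.solve(A @ A.T, A @ B)     # projection of B onto the row space of A
lhs = cond(A @ B)
rhs = cond(A) * np.linalg.svd(B, compute_uv=False)[0] / np.linalg.svd(P, compute_uv=False)[-1]
print(lhs <= rhs, lhs, rhs)                   # the bound (2.4) holds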
The proof of Theorem 6 is broken down into three steps in order to control ||A x|| for all x on the unit sphere: an approximation step, where the unit sphere is covered using a finite epsilon-net N (see [32, Section 5.2.2] for background on nets); a concentration step, where tight bounds are applied to ||A x|| for every x ∈ N; and a final step, where a union bound is taken over all the vectors x ∈ N.

Proof. (of Theorem 6) Vershynin observes that if we set B in Lemma 7 to A/√N, the bounds on the extreme singular values σ_min(A) and σ_max(A) in (2.5) are equivalent to

(6.1)    || (1/N) A^T A − I || ≤ max(δ, δ^2) =: ε,    where δ = C √(n/N) + t/√N.

In the approximation step of the proof, he chooses a 1/4-net N to cover the unit sphere S^{n−1}. Evaluating the operator norm in (6.1) on the net N, it is sufficient to show

    max_{x ∈ N} | (1/N) ||A x||^2 − 1 | < ε/2,

with the required probability, to prove the theorem.

Starting the concentration step, [32] defines Z_i = ⟨A_i, x⟩, where A_i is the i-th row of A and ||x|| = 1. Hence, the vector norm may be written as

(6.2)    ||A x||^2 = Σ_{i=1}^{N} Z_i^2.

Using an exponential deviation inequality to control (6.2), and the fact that K ≥ 1/√2, the following probabilistic bound holds for a fixed x ∈ S^{n−1}:

(6.3)    P{ | (1/N) ||A x||^2 − 1 | ≥ ε/2 } = P{ | (1/N) Σ_{i=1}^{N} Z_i^2 − 1 | ≥ ε/2 }
             ≤ 2 exp[ −(c_1/K^4) min(ε^2, ε) N ] = 2 exp[ −(c_1/K^4) δ^2 N ]
             ≤ 2 exp[ −(c_1/K^4) (C^2 n + t^2) ],

where c_1 is an absolute constant.

Finally, (6.3) is applied to every vector x ∈ N, resulting in the union bound

(6.4)    P{ max_{x ∈ N} | (1/N) ||A x||^2 − 1 | ≥ ε/2 } ≤ 9^n · 2 exp[ −(c_1/K^4) (C^2 n + t^2) ] ≤ 2 exp( −c_1 t^2 / K^4 ),

where we arrive at the second inequality by choosing a sufficiently large C = C_K ([32] gives the example C = K^2 √(ln(9)/c_1)).   □
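Although not used in the proof, the quantity controlled in (6.1) is easy to probe numerically. For a tall matrix with independent ±1 entries, the signed Bernoulli variables used in the paper's experiments, (1/N) A^T A concentrates around the identity as N grows, consistent with δ = C √(n/N) + t/√N:

import numpy as np

rng = np.random.default_rng(0)
n = 20
for N in (100, 1000, 10000, 100000):
    A = rng.choice([-1.0, 1.0], size=(N, n))          # signed Bernoulli entries
    dev = np.linalg.norm(A.T @ A / N - np.eye(n), 2)  # the quantity bounded in (6.1)
    print(N, dev)                                     # decays roughly like sqrt(n/N)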
We now prove Lemma 13.

Proof. To prove this lemma we use the following inequality, derived from (3.9),

    (1 − δ) ≤ σ_min( Σ^{−1/2} B^T B Σ^{−1/2} ) ≤ σ_max( Σ^{−1/2} B^T B Σ^{−1/2} ) ≤ (1 + δ).

First we bound σ_max(B)^2 from above:

    σ_max(B)^2 ≤ || Σ^{1/2} || · || Σ^{−1/2} B^T B Σ^{−1/2} || · || Σ^{1/2} ||
             ≤ || Σ || · σ_max( Σ^{−1/2} B^T B Σ^{−1/2} )
             ≤ σ_max( Σ ) · (1 + δ),

implying σ_max(B) ≤ σ_max( Σ^{1/2} ) · (1 + δ)^{1/2}. Second we bound σ_min(B)^2 from below:

(6.5)    σ_min(B)^2 = σ_min( Σ^{1/2} Σ^{−1/2} B^T B Σ^{−1/2} Σ^{1/2} )
                   ≥ σ_min( Σ ) · σ_min( Σ^{−1/2} B^T B Σ^{−1/2} )
                   ≥ σ_min( Σ ) · (1 − δ),

implying σ_min(B) ≥ σ_min( Σ^{1/2} ) · (1 − δ)^{1/2}. The first inequality in (6.5) is from [22, prob. 3.3.12]. Finally, using properties of singular values, we combine the inequalities:

    σ_min( Σ^{1/2} ) · (1 − δ)^{1/2} ≤ σ_min(B) ≤ σ_max(B) ≤ σ_max( Σ^{1/2} ) · (1 + δ)^{1/2}.   □

References
[1] C.J. Appellof and E.R. Davidson. Strategies for analyzing data from video fluorometric monitoring of liquid chromatographic effluents. Analytical Chemistry, 53(13):2053–2056, 1982.
[2] B.W. Bader, M.W. Berry, and M. Browne. Discussion tracking in Enron email using PARAFAC. In Survey of Text Mining II, pages 147–163. Springer London, 2008.
[3] G. Beylkin and M. J. Mohlenkamp. Numerical operator calculus in higher dimensions. Proc. Natl. Acad. Sci. USA, 99(16):10246–10251, August 2002.
[4] G. Beylkin and M. J. Mohlenkamp. Algorithms for numerical analysis in high dimensions. SIAM J. Sci. Comput., 26(6):2133–2159, July 2005.
[5] D. J. Biagioni, D. Beylkin, and G. Beylkin. Randomized interpolative decomposition of separated representations. Journal of Computational Physics, 281:116–134, 2015.
[6] R. Bro. PARAFAC. Tutorial & applications. In Chemom. Intell. Lab. Syst., Special Issue 2nd Internet Conf. in Chemometrics (INCINC'96), volume 38, pages 149–171, 1997.
[7] J. D. Carroll and J. J. Chang. Analysis of individual differences in multidimensional scaling via an N-way generalization of Eckart-Young decomposition. Psychometrika, 35:283–320, 1970.
[8] Z. Chen and J. Dongarra. Condition number of Gaussian random matrices. SIAM J. Matrix Anal. Appl., 27(3):603–620, 2005.
[9] P.A. Chew, B.W. Bader, T.G. Kolda, and A. Abdelali. Cross-language information retrieval using PARAFAC2. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '07, pages 143–152, New York, NY, USA, 2007. ACM.
[10] F. Chinesta, P. Ladeveze, and E. Cueto. A short review on model order reduction based on proper generalized decomposition. Archives of Computational Methods in Engineering, 18(4):395–404, 2011.
[11] A. Doostan and G. Iaccarino. A least-squares approximation of partial differential equations with high-dimensional random inputs. Journal of Computational Physics, 228(12):4332–4345, 2009.
[12] A. Doostan, G. Iaccarino, and N. Etemadi. A least-squares approximation of high-dimensional uncertain systems. Technical Report Annual Research Brief, Center for Turbulence Research, Stanford University, 2007.
[13] A. Doostan, A. Validi, and G. Iaccarino. Non-intrusive low-rank separated approximation of high-dimensional stochastic models. Comput. Methods Appl. Mech. Engrg., 263:42–55, 2013.
[14] A. Edelman. Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl., 9(4):543–560, October 1988.
[15] A. Edelman. Eigenvalues and condition numbers of random matrices. Ph.D. thesis, Massachusetts Institute of Technology, 1989.
[16] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.
[17] L. Grasedyck, D. Kressner, and C. Tobler. A literature survey of low-rank tensor approximation techniques. CoRR, abs/1302.7121, 2013.
[18] W. Hackbusch. Tensor Spaces and Numerical Tensor Calculus, volume 42. Springer, 2012.
[19] M. Hadigol, A. Doostan, H. Matthies, and R. Niekamp. Partitioned treatment of uncertainty in coupled domain problems: A separated representation approach. Computer Methods in Applied Mechanics and Engineering, 274:103–124, 2014.
[20] N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
[21] R. A. Harshman. Foundations of the PARAFAC procedure: model and conditions for an "explanatory" multi-mode factor analysis. Working Papers in Phonetics 16, UCLA, 1970. http://publish.uwo.ca/~harshman/wpppfac0.pdf.
[22] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge Univ. Press, Cambridge, 1994.
[23] B. Khoromskij and C. Schwab. Tensor-structured Galerkin approximation of parametric and stochastic elliptic PDEs. SIAM Journal on Scientific Computing, 33(1):364–385, 2011.
[24] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
[25] A. Nouy. A generalized spectral decomposition technique to solve a class of linear stochastic partial differential equations. Computer Methods in Applied Mechanics and Engineering, 196(37-40):4521–4537, 2007.
[26] A. Nouy. Generalized spectral decomposition method for solving stochastic finite element equations: Invariant subspace problem and dedicated algorithms. Computer Methods in Applied Mechanics and Engineering, 197:4718–4736, 2008.
[27] A. Nouy. Proper generalized decompositions and separated representations for the numerical solution of high dimensional stochastic problems. Archives of Computational Methods in Engineering, 17:403–434, 2010.
[28] I. V. Oseledets. Tensor-train decomposition. SIAM Journal on Scientific Computing, 33(5):2295–2317, 2011.
[29] V. Rokhlin and M. Tygert. A fast randomized algorithm for overdetermined linear least-squares regression. Proceedings of the National Academy of Sciences, 105(36):13212–13217, September 2008.
[30] T. Sarlos. Improved approximation algorithms for large matrices via random projections. In Foundations of Computer Science, 2006. FOCS '06. 47th Annual IEEE Symposium on, pages 143–152, 2006.
[31] G. Tomasi and R. Bro. A comparison of algorithms for fitting the PARAFAC model. Comput. Statist. Data Anal., 50(7):1700–1734, 2006.
[32] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Y. Eldar and G. Kutyniok, editors, Compressed Sensing, Theory and Applications, chapter 5, pages 210–268. Cambridge University Press, 2012.
[33] F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert. A fast randomized algorithm for the approximation of matrices. Appl. Comput. Harmon. Anal., 25(3):335–366, 2008.
∗ Department of Aerospace Engineering Sciences, 429 UCB, University of Colorado at Boulder, Boulder, CO 80309