On Koopman Mode Decomposition and Tensor Component Analysis
William T. Redman a)
Interdepartmental Graduate Program in Dynamical Neuroscience, University of California Santa Barbara, California 93106, USA
(Dated: 8 February 2021)
a) Electronic mail: [email protected]
Koopman mode decomposition and tensor component analysis (also known as CANDECOMP/PARAFAC or canonical polyadic decomposition) are two popular approaches for decomposing high dimensional data sets into low dimensional modes that capture the most relevant features and/or dynamics. Despite their similar goal, the two methods are largely used by different scientific communities and formulated in distinct mathematical languages. We examine the two together and show that, under a certain (reasonable) condition on the data, the theoretical decomposition given by tensor component analysis is the same as that given by Koopman mode decomposition. This provides a "bridge" with which the two communities should be able to more effectively communicate. When this condition is not met, Koopman mode decomposition still provides a tensor decomposition with an a priori computable error, providing an alternative to the non-convex optimization that tensor component analysis requires. Our work provides new possibilities for algorithmic approaches to Koopman mode decomposition and tensor component analysis, offers a new perspective on the success of tensor component analysis, and builds upon a growing body of work showing that dynamical systems, and Koopman operator theory in particular, can be useful for problems that have historically made use of optimization theory.
Koopman operator theory (KOT) has emerged as a powerful and general framework with which to understand nonlinear dynamical systems in a data-driven manner. Despite its many strengths, there are numerous scientific domains, such as neuroscience and biology, where it remains an unfamiliar tool and is, therefore, underemployed. Because part of this stems from KOT being mathematically formulated in a distinct way from other more commonly used tools, such as principal component analysis (PCA), there is a need for bridging KOT with such methods. Here, we investigate how a major KOT analysis approach differs from tensor component analysis (TCA), an extension of PCA that has quickly been adopted by some of those same fields "hesitant" of KOT. We show that, in certain scenarios, TCA and KOT give the same theoretical decomposition. This not only makes strides in establishing a bridge between TCA and KOT, but also provides new insight into TCA, and new possible algorithms for implementing KOT and TCA.
I. INTRODUCTION
A central goal in all fields of science is to discover the underlying dynamics of observed data. Koopman operator theory (KOT) has emerged as a powerful, data-driven framework with which to do this. KOT lifts the perspective of dynamical systems from the possibly finite and nonlinear state space of the data, to the infinite and linear functional space of observables on the state space. A key finding over the past 15 years is that there are a variety of algorithms that can well approximate the Koopman operator, and hence the underlying dynamics, with sufficient, but reasonably finite, amounts of data. These algorithms include, among others, generalized Laplace analysis, the finite section method (also known as the Galerkin projection), Krylov subspace methods, and dynamic mode decomposition (see Mezić 2020 for a general review). We note that all of these methods were originally developed, at least in part, independently of KOT. It has been the realization of their utility in computing relevant KOT quantities that has allowed KOT to find success in an ever growing number of different scientific domains and problems.

In many cases, researchers applying KOT are interested in the Koopman mode decomposition (KMD). The KMD provides a decomposition of the action of the Koopman operator in such a way that, for autonomous dynamical systems, the time evolution of the system is naturally given as the sum of a number of "modes", each with their own time dependencies (see Sec. II for more details). In some cases, the number of modes necessary for accurate prediction is small compared to the size of the system, allowing for a low dimensional description.
KMD has been successfully used to discover the underlying dynamics in many different dynamical systems, such as fluids, power grids, and neural networks. An active and fruitful area of research has been comparing the KMD modes to the modes obtained from various more commonly used approaches, such as principal component analysis (PCA, also referred to as proper orthogonal decomposition) and independent component analysis (ICA). The two have also been indirectly linked via their connections to the renormalization group. One common goal of this body of work has been to highlight to researchers in fields where PCA is a common tool that KMD can give similar and, in certain scenarios, superior mode representations of data. Despite this, to date KOT remains a niche and not widely adopted tool in some communities, such as neuroscience and biology.

In recent years, a method that can be seen as a generalization of PCA, tensor component analysis (TCA, also known as CANDECOMP/PARAFAC or canonical polyadic decomposition), has gained in popularity. Part of this stems from the fact that TCA can be used on data that is naturally represented as a tensor, such as that obtained via repeated trial experiments. This means it does not require the "flattening" across dimensions that many matrix methods, like PCA, do. Additionally, there exist a number of efficient and robust algorithms that have been developed to perform TCA and have been made freely available. Because of its intuitive connection to PCA, and because it does not require the computed components to be orthogonal to each other (unlike PCA), it has quickly been adopted by some of the same communities where KOT has not, such as neuroscience. While there has been work extending KOT tools to tensors, there has been little work comparing TCA to KMD. Here, by considering a third-order data tensor (i.e. a tensor with elements that are indexed by three independent labels), we show that there exists a correspondence between the computed modes of TCA and KMD. In particular, we show that, under a certain (reasonable) condition on the data, the decomposition obtained from applying TCA to the data from an autonomous dynamical system is equivalent to that obtained via KMD. We believe that this not only provides motivation as to why KMD can be used in fields where it is not yet popular, but also highlights a new class of algorithms for computing the KMD (based on those from the TCA literature) and a new class of algorithms for performing TCA (based on those from the KMD literature). Because existing TCA and KMD algorithms rely on different approaches, understanding which ones work better under different conditions will be a fruitful future direction of study.

This paper is organized as follows. In Secs. II and III, we provide brief reviews of KMD and TCA, respectively. While we imagine many readers will be familiar with KMD, we formulate it so as to account for the tensor nature of the data, which makes its connection with TCA more apparent. We then proceed to lay out the correspondence between the two methods in Sec. IV, and provide a clear description of when the two methods will provide (theoretically) exactly the same decompositions. We provide simple numerical examples of autonomous systems, where the true KMD is known, to illustrate our claims in Sec. V. We end by discussing how our results add insight into TCA, and what new questions this work opens, in Sec. VI.

II. KOOPMAN MODE DECOMPOSITION
The central object of interest in KOT is the Koopman operator, U, an infinite dimensional linear operator that describes the time evolution of observables (i.e. functions of the underlying state space variables) that live in the functional space, F. That is, after time t > 0, the value of an observable f ∈ F, which can be a scalar or a vector valued function, is given by

U^t f(p) = f[T^t(p)],   (1)

where T is the dynamical map evolving the system and p is the initial condition or location in state space. For the remainder of the paper it will be assumed that F is the standard L² space. It was discovered that the action of the Koopman operator on the observable f can be decomposed as

U f(p) = Σ_{r=0}^{∞} λ_r φ_r(p) v_r,   (2)

where the φ_r are eigenfunctions of U, with λ_r ∈ C as their eigenvalues and v_r as their eigenvectors, also called Koopman modes. For certain systems there is an additional term in Eq. 2, arising from the continuous part of the spectrum of the Koopman operator. Not much is known about the contribution of this term. Because the point spectrum of U (the right hand side of Eq. 2) has been found, in many practical cases, to lead to sufficient approximation, for the remainder of this paper it will be assumed that there is no contribution from the continuous part of the spectrum. Decomposing the action of the Koopman operator is powerful because, for a discrete dynamical system, the value of f at time step n ∈ Z⁺ is given simply by

U^n f(p) = f[T^n(p)] = Σ_{r=0}^{∞} λ_r^n φ_r(p) v_r.   (3)

When the dynamical system is continuous, the value of f at time t ∈ R⁺ is given by

U^t f(p) = f[T^t(p)] = Σ_{r=0}^{∞} exp(λ_r t) φ_r(p) v_r.   (4)

Note here that the convention changes from λ_r^n to exp(λ_r t). From Eqs. 3 and 4, we see that the dynamics of the system in the directions v_r, scaled by the φ_r(p), are given by the magnitude of the corresponding λ_r. Assuming that |λ_r| ≤ 1, in the case of a discrete dynamical system (or Re(λ_r) ≤ 0, in the case of a continuous dynamical system), finding the long term behavior of the function amounts to considering only the φ_r(p) v_r whose λ_r ≈ 1. Additionally, we note that the eigenfunctions, φ_r, are the only part of Eqs. 3 and 4 that retain "memory" of the initial condition. While the number of triplets (λ_r, φ_r, v_r) needed to fully capture the action of U is, in principle, infinite, in many applied settings it has been found that a finite number, R, of them allows for a good approximation. That is, in the case of a discrete dynamical system,

U^n f(p) ≈ Σ_{r=0}^{R−1} λ_r^n φ_r(p) v_r.   (5)

In this paper, we consider Eq. 5 to be the definition of KMD.

A. Dynamic Mode Decomposition
One popular way of computing the triplets (λ_r, φ_r, v_r) of Eq. 5 is dynamic mode decomposition (DMD). While there are a number of different approaches that have been developed to perform variants of DMD, we will discuss the approach that is related to the Arnoldi algorithm, because of its clarity in connection with TCA. We start by following the treatment given by Rowley et al. 2009, exactly.

Let X = [x_0, Tx_0, ..., T^m x_0] = [x_0, x_1, ..., x_m] be m + 1 snapshots of the system, starting from the initial condition x_0. Here T is the dynamical map evolving the system. If x_m happens to lie in the span of {x_0, ..., x_{m−1}}, then there exists c^T = (c_0, ..., c_{m−1}) such that

x_m = c_0 x_0 + · · · + c_{m−1} x_{m−1} = Kc,   (6)

where K = [x_0, x_1, ..., x_{m−1}]. Therefore, we have that

TK = KC,   (7)

where

C = ( 0 0 · · · 0 c_0
      1 0 · · · 0 c_1
      0 1 · · · 0 c_2
      ⋮ ⋮  ⋱  ⋮ ⋮
      0 0 · · · 1 c_{m−1} ).

C is called the companion matrix and can be decomposed as

C = W^{−1} Λ W,   (8)

where Λ is a diagonal matrix whose diagonal entries are the eigenvalues of C. If x_m does not lie in the span of {x_0, ..., x_{m−1}}, then there is a residual in approximating x_m by Kc,

r = x_m − Kc,   (9)

which can be minimized (in the sense that its norm can be minimized) by choice of c.

DMD can be viewed as finding the matrices V, Λ, and ˜T such that

X − V Λ ˜T = r ⊗ e   (10)

for a fixed r, where X here denotes the shifted snapshot matrix [x_1, ..., x_m]. Here ⊗ is the vector outer product, e^T = (0, 0, ..., 0, 1) ∈ R^m, V = K W^{−1}, and ˜T is the Vandermonde matrix

˜T = ( 1 λ_0 · · · λ_0^{m−1}
       1 λ_1 · · · λ_1^{m−1}
       ⋮ ⋮  ⋱  ⋮
       1 λ_{m−1} · · · λ_{m−1}^{m−1} ),

with λ_r being the r-th diagonal element of Λ, and therefore the r-th eigenvalue of the Koopman operator. Its corresponding eigenvector (or Koopman mode), v_r, is the r-th column of V. These Koopman modes are scaled by the Koopman eigenfunctions φ_r(p). We can define the unscaled Koopman modes, v′_r, by v_r = v′_r φ_r(p), so long as φ_r(p) ≠ 0. We note that (Λ ˜T)^T is equivalent to

˜S = (Λ ˜T)^T = ( λ_0   λ_1   · · · λ_{m−1}
                  λ_0²  λ_1²  · · · λ_{m−1}²
                  ⋮     ⋮      ⋱  ⋮
                  λ_0^m λ_1^m · · · λ_{m−1}^m ),

which contains the information about the time evolution of each mode in its columns. Eq. 10 can then be re-written as

X − Σ_{r=0}^{m−1} [φ_r(p) v′_r] ⊗ ˜s_r = r ⊗ e,   (11)

where ˜s_r denotes the r-th column of ˜S.

In many cases, DMD is estimated using only a single time series. In such a case, the computed eigenfunctions, φ_r(p), are computed for only a single p. To gain more information on the φ_r, multiple experiments, each with a different initial condition, can be performed. Alternatively, multiple sensors or sources of data (e.g. physical locations in a fluids experiment) can be considered as different initial conditions. These may not be feasible for a given experimental design, especially when analyzing previously recorded data. In general, there exists a way in which to discover more values of the eigenfunctions. For deterministic systems, time delaying a given time series (e.g. considering the location of the system at t_1 to be the initial condition, instead of that at t_0, and so on) provides more initial conditions at which the eigenfunctions can be evaluated. This is, of course, limited by the fact that only the points that fall along the single trajectory that the data came from can be used.

When data from q different initial conditions (obtained via whichever of the methods listed above), {p_0, ..., p_{q−1}}, are considered, the data can be represented by a third-order tensor, 𝓧. To do this, data matrices, as we had before, can be "stacked" along a third dimension. The i-th slice of 𝓧 is therefore given by the data matrix X_{p_i}, which corresponds to the data collected with initial condition p_i. As noted earlier, the dependence on p in Eq. 11 is only in the eigenfunctions, φ_r.
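Before turning to the tensor form, the companion-matrix construction of Eqs. 6-10 can be sketched in a few lines of numpy. This is a minimal illustration only: the function name, the toy linear system, and all parameter values are our own, and production DMD codes typically use more numerically robust (e.g. SVD-based) variants.

```python
import numpy as np

def companion_dmd(X):
    """Companion-matrix DMD, a sketch in the spirit of Eqs. 6-10.

    X : (d, m+1) array of snapshots [x_0, x_1, ..., x_m].
    Returns eigenvalues lam (length m) and Koopman modes V (d x m) such
    that x_n is approximated by sum_r V[:, r] * lam[r]**n, up to the
    least-squares residual of Eq. 9.  Assumes the recovered eigenvalues
    are distinct, so the Vandermonde matrix is invertible.
    """
    K, xm = X[:, :-1], X[:, -1]
    # Coefficients c minimising the residual r = x_m - K c (Eq. 9).
    c, *_ = np.linalg.lstsq(K, xm, rcond=None)
    m = K.shape[1]
    C = np.diag(np.ones(m - 1), -1).astype(complex)  # ones on the sub-diagonal
    C[:, -1] = c                                     # last column holds c (Eq. 7)
    lam = np.linalg.eigvals(C)
    # Vandermonde matrix with rows (1, lam_r, lam_r^2, ...); modes are K T^{-1}.
    T = np.vander(lam, m, increasing=True)
    V = K @ np.linalg.inv(T)
    return lam, V

# Toy check: snapshots of a linear map x_{n+1} = A x_n.
A = np.array([[0.9, -0.4, 0.0], [0.4, 0.9, 0.0], [0.0, 0.0, 0.5]])
x0 = np.array([1.0, 0.0, 1.0])
X = np.column_stack([np.linalg.matrix_power(A, n) @ x0 for n in range(4)])
lam, V = companion_dmd(X)
x2_rec = (V * lam**2).sum(axis=1)   # reconstruct the snapshot x_2 from the modes
```

Because the snapshots in this toy example form a full Krylov basis, the residual r of Eq. 9 vanishes and the modal reconstruction is exact.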
If we define ϕ_r^T = [φ_r(p_0), ..., φ_r(p_{q−1})], then we have an equation (with some unfortunately clunky added notation) for DMD analogous to Eq. 11,

𝓧 − Σ_{r=0}^{m−1} ϕ_r ⊗ v′_r ⊗ ˜s_r = 𝓡 ×₂ e,   (12)

where 𝓡 ∈ C^{m×1×q} is a tensor of the residuals, with the i-th slice being the column vector r from Eq. 11 for X_{p_i}, and ×₂ is the notation for multiplication of a tensor and a vector along the second dimension. While all of this has been shown for 𝓧 being made up of snapshots of the state space vector, the same holds true for the case when the data tensor is instead composed of other, non-identity, observables. There are many cases where this has been found to give better decompositions.

III. TENSOR COMPONENT ANALYSIS
Similar to KOT, TCA was first developed in the early 20th century and has seen a recent resurgence of interest. It has been successfully applied in a number of fields, including neuroscience and signal processing, as well as the first two fields it was developed in, psychometrics and chemometrics. The fact that it was independently arrived upon by several different researchers, at different times, in the context of different fields, makes the TCA literature somewhat confusing. TCA is also called CANDECOMP (for canonical decomposition), PARAFAC (for parallel factorization), and canonical polyadic decomposition. For a comprehensive review of the literature on tensor decompositions, see Kolda and Bader 2009.

Given a tensor 𝓧, which in many applied settings stores data obtained from repeated experiments, the central objective of TCA is to find lower order tensors that combine to give a good approximation to 𝓧. In the case of 𝓧 being a third-order tensor of size K × N × T, TCA finds matrices A ∈ C^{K×R}, B ∈ C^{N×R}, C ∈ C^{T×R}, such that

𝓧 ≈ Σ_{r=0}^{R−1} a_r ⊗ b_r ⊗ c_r,   (13)

where a_r, b_r, and c_r are the r-th column vectors of A, B, and C, respectively, R is the rank of 𝓧 (also known as the number of modes), and ⊗ is the vector outer product. The TCA modes are therefore given by the triplets (a_r, b_r, c_r). A, B, and C are chosen by the minimization problem

min_{A,B,C} || 𝓧 − Σ_{r=0}^{R−1} a_r ⊗ b_r ⊗ c_r ||,   (14)

where || · || is analogous to the Frobenius norm for matrices. That is, for an N-th order tensor 𝓧,

|| 𝓧 || = [ Σ_{i_1=0}^{I_1−1} Σ_{i_2=0}^{I_2−1} · · · Σ_{i_N=0}^{I_N−1} |x_{i_1, i_2, ..., i_N}|² ]^{1/2},

where I_r is the size of the r-th dimension of 𝓧.

Importantly, it has been proven that, when A, B, and C have full column rank, this decomposition is unique, up to rescaling and permutation of the (a_r, b_r, c_r). The non-uniqueness to scaling emerges because, for α ∈ C, (a_r/α, b_r, α c_r) also satisfies Eq. 14. Similarly, if the columns of A, B, and C are shuffled in the same way, the new matrices ˜A, ˜B, and ˜C satisfy Eq. 14. If R is sufficiently small (and therefore, A, B, and C have full column rank), the TCA decomposition will be unique, up to these transformations. This uniqueness is not, in general, true of matrix decompositions, such as PCA.

As has been noted in the literature, there is no perfect way of performing TCA. In practice, various optimization based methods, such as alternating and nonlinear least squares, are used for solving Eq. 14. These methods can take many iterations to converge, and are known to not always be guaranteed to converge to a global minimum. Therefore, the development of methods to better compute the TCA decomposition is an area of active research. More details on these and other algorithms, and their practical considerations, can be found in Kolda and Bader 2009 and Hong, Kolda, and Bader 2020 (which includes discussion on the use of more general objective functions).

Without loss of generality, we can assume that the components of a_r correspond to the dependence of the r-th mode on initial condition (or trial number, in the case of a repeated experiment), that the components of b_r correspond to the amount each part of the system or dimension in phase space "participates" in the r-th mode, and that the components of c_r correspond to the dependence of the r-th mode on time. The n-th column vector of the p-th slice of 𝓧, (X_p)_n, which describes the system with initial condition p at time n, can be reconstructed as

(X_p)_n ≈ Σ_{r=0}^{R−1} (a_r)_p b_r (c_r)_n,   (15)

where (a_r)_p and (c_r)_n are the p-th and n-th elements of a_r and c_r, respectively.

We note that, in view of Eq. 15, TCA has usually been seen as a tool to represent 𝓧. That is, the modes have been used to provide insight into existing data, as opposed to predicting the future state(s) of the system.
IV. CORRESPONDENCE BETWEEN TCA AND KMD
From Eqs. 12 and 14, we have that, given a third-order data tensor 𝓧,

𝓧 − Σ_{r=0}^{m−1} ϕ_r ⊗ v′_r ⊗ ˜s_r = 𝓡 ×₂ e,
𝓧 − Σ_{r=0}^{R−1} a_r ⊗ b_r ⊗ c_r = 𝓔,   (16)

where 𝓔 is the residual tensor corresponding to the A, B, and C that minimize Eq. 14. We will assume that R = m, and that this R is such that the columns of A, B, and C are linearly independent (and thus, the TCA decomposition is unique). From these equations, we immediately have the following Lemma.

Lemma 1. || 𝓡 ×₂ e || ≥ || 𝓔 ||.

Proof. This follows from the TCA minimization problem (Eq. 14).

From Lemma 1, we thus see that, in terms of accuracy in representing the existing data, there always exists a TCA representation that has error less than or equal to that provided by DMD. This is, of course, only guaranteed in theory. Whether this optimal decomposition can be found in practice is another question (see Sec. VI for more discussion on this). Based on observation of Eq. 16, we suggest the following correspondence:

ϕ_r ⟺ a_r,   v′_r ⟺ b_r,   ˜s_r ⟺ c_r.   (17)

When, if ever, will this correspondence be exact? That is, when will the computed DMD triplets (ϕ_r, v′_r, ˜s_r), in theory, be equal to the computed TCA triplets (a_r, b_r, c_r)?

Lemma 2. When 𝓡 of Eq. 12 is equal to 0, the correspondence of Eq. 17 is exact, up to scaling and permutation of labels.

Proof. As with Lemma 1, this follows immediately from considering the TCA minimization problem (Eq. 14) and the fact that the decomposition of TCA is unique up to scaling and permutation. Note that if R is not, as we have assumed, such that the TCA decomposition is unique, Lemma 2 still holds for one of the TCA decompositions that satisfies Eq. 14. Given that DMD is directly connected to the dynamics of the data, in the case of non-uniqueness it would be well motivated to choose, as convention, that this decomposition is the TCA decomposition.

Lemma 2 provides a precise condition for when the two methods will give the same decomposition (at least, in theory). Namely, the relationships in Eq. 17 will be exact when, for each data matrix X_p, the last column vector, (X_p)_N, lies within the span of the previous columns. In applied cases, where the number of time points is (much) larger than the size of the system and/or where the underlying dynamics are truly low-dimensional, we can expect this condition to be reasonable.

Given that the TCA minimization problem (Eq. 14) ignores the data's dynamics, it is somewhat surprising that it is able to extract dynamically relevant modes. This is usually explained by the fact that the data itself comes from a mixture of low dimensional sources, which are hoped to be recoverable by searching for a solution to Eq. 14. However, whether these sources provide the optimal decomposition (especially in the face of realistic experimental conditions), and whether similarly good decompositions could possibly come from dynamically "nonsensical" modes, is unclear. This section provides a new perspective on TCA's success. When 𝓡 = 0, Lemma 2 tells us that the optimal TCA modes are precisely those that decompose the Koopman operator (Eqs. 3 and 4). Because the Koopman operator describes the time evolution of the data (Eq. 1), the optimal TCA modes are inherently connected to the data's dynamics. When 𝓡 ≠ 0, but has a small norm, the optimal TCA modes may still retain significant dynamical information if they are near (in the sense of some appropriate metric) the DMD modes. From this, we may hypothesize that the further the optimal TCA modes are from the DMD modes, the less dynamic information they may contain. Whether this is indeed true will be investigated in future work.

Lastly, we note that the correspondence (Eq. 17) implies the following. For autonomous discrete dynamical systems, the time evolution of the r-th KMD mode is described by integer powers of a single complex number λ_r. The time dependence of the TCA modes, in principle, has no assumed form and can be (possibly) non-exponential functions of n. However, when the condition described above is met and Eq. 17 is exact, the time dependence of the TCA modes will have to be exponential functions with respect to n. To our knowledge, this has not been previously noted. Using this to constrain the minimization search of Eq. 14, even in the case when ||𝓡|| is non-zero but small, may help with convergence and robustness. Indeed, this is similar in spirit to work that found significant success in speeding up TCA, and increasing interpretability of the modes, by using an over-complete library of functions to model the time dependence of each mode.

V. NUMERICAL EXAMPLES OF AUTONOMOUS SYSTEMS
To illustrate the correspondence developed in Section IV, and to show that numerical implementations of TCA do indeed give decompositions that are similar to the true KMD, we examine two simple problems: the linear oscillator and the hopping map. These examples are picked because their data clearly satisfies the condition discussed in the last section for the correspondence to be exact, because the true KMD exists, and because they can be easily interpreted. The TCA modes were computed using Tensorlab 3.0, a MATLAB package that implements various possible TCA decomposition algorithms. We used the nonlinear least squares based method. The code that generated these examples has been made freely available online.

A. Linear oscillator
We consider the linear oscillator

ẋ(t) = Ax,   (18)

where

A = ( −c/m −k/m
       1    0 )

and c is the damping term. Let v′_1, v′_2 be the eigenvectors of A with eigenvalues λ_1, λ_2, respectively. Let w′_1, w′_2 be the eigenvectors of A* with the same eigenvalues. Basic KOT tells us that φ_r(x) = ⟨x, w′_r⟩ is an eigenfunction of U, with eigenvalue λ_r and unscaled Koopman mode v′_r. Here ⟨·, ·⟩ is the inner product. Eq. 4 tells us that the time evolution of each mode will be given approximately by exp(λ_r n ∆t), where ∆t is the time discretization and n is the number of time steps.

Figure 1 compares the two modes (R = 2) obtained using TCA with the two true KMD modes for the damped oscillator: c = 1, with k = m = 1. We rescaled the values of each TCA mode to get them as close to the true KMD mode as possible. The need for rescaling is expected because, as noted earlier, the TCA modes are unique only up to rescaling and permutation.

In general, the different components of each TCA mode triplet were similar to those of the corresponding true KMD. In particular, the Re[(a_r)_x] very accurately approximated Re[φ_r(x)], for a variety of initial conditions x and for both modes (Fig. 1c,d). The vector components of b_r and the time evolution components of c_r were less accurate. Interestingly, it seemed that, in general, the more accurate one TCA mode was in approximating the time evolution of its respective KMD mode, the more error there was in the other mode. Understanding this will be the direction of future work. These trends were also present when data from the ideal linear oscillator, c = 0, was used (not shown).

FIG. 1. Damped linear oscillator. (a) The time evolution of the first true KMD mode, exp(λ_1 n ∆t) (black), and the corresponding TCA mode (blue). (c) The scaling, given by φ_1(x), dependent on the initial state, x, of the first true KMD mode and the corresponding TCA mode. The initial velocity was set to 0 for all numerical experiments, so the x-axis is only position. (e) The magnitude of the "participation" of the position and velocity in the first true KMD mode, given by |v′_1|, and the corresponding TCA mode. (b), (d), (f) are the same as (a), (c), (e), respectively, for the second true KMD mode and the corresponding TCA mode.
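The Koopman construction in this subsection can be checked directly in a few lines of numpy. This is a sketch with our own parameter values and state ordering (the paper's experiments used Tensorlab in MATLAB), verifying that the Koopman triplets of a linear system reproduce its time evolution.

```python
import numpy as np

# Damped linear oscillator x' = Ax (cf. Eq. 18); the k, m, c, dt values are ours.
k, m, c, dt = 1.0, 1.0, 1.0, 0.1
A = np.array([[-c / m, -k / m], [1.0, 0.0]])  # state ordered as (velocity, position)

lam, V = np.linalg.eig(A)       # lam_r: eigenvalues; columns of V: unscaled modes v'_r
W = np.linalg.inv(V).conj().T   # adjoint eigenvectors w'_r, with w'_s^H v'_r = delta_rs
x0 = np.array([0.0, 1.0])       # initial velocity 0, initial position 1
phi = W.conj().T @ x0           # eigenfunctions phi_r(x0) = <x0, w'_r>

n = 25                          # number of time steps
x_kmd = (V * (np.exp(lam * n * dt) * phi)).sum(axis=1)  # Eq. 4 reconstruction

# Independent reference: exp(A n dt) x0 via a plain Taylor series of the
# matrix exponential (convergent here, since ||A n dt|| is modest).
def expm_taylor(M, terms=40):
    E, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for j in range(1, terms):
        term = term @ M / j
        E = E + term
    return E

x_ref = expm_taylor(A * n * dt) @ x0
```

Since the system is linear, the KMD with R = 2 is exact, and x_kmd agrees with x_ref to numerical precision; the TCA comparison in the text additionally requires the rescaling discussed above.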
B. Hopping map
We next considered the hopping map used as Example 1 of Mezić 2005,

T(x) = −x mod [−1, 1].   (19)

Because it maps every element in [0, 1] to [−1, 0], and vice versa, any function that takes values f(x) ∈ C when x ∈ (0, 1] and −f(x) when x ∈ [−1, 0] is an eigenfunction of the Koopman operator with eigenvalue λ = −1. The time dependence of such a mode is then (λ, λ², λ³, ...) = (−1, 1, −1, ...).

Fig. 2 compares one such true mode with that found using TCA. As with the linear oscillator, the modes were rescaled. Because this system has a one-dimensional state space, the Koopman mode is not plotted.

FIG. 2. Hopping map. (a) The time evolution of the computed TCA mode (blue) and the corresponding true KMD mode (black). (b) The scaling dependent on the initial condition of the TCA mode and the corresponding true KMD mode.

The plotted true Koopman eigenfunction (Fig. 2b) was the mean absolute value of the computed a. This is because of the aforementioned fact that there are an infinite number of Koopman eigenfunctions with λ = −1, and we picked the one closest to that computed by TCA. For both the time dependence and initial condition scaling, TCA closely, but not completely, resembled the true KMD values. One potential reason for the inaccuracy is that only a small amount of data can be used (in our case, 30 time steps were used). This is because of the computer's finite precision, which leads all initial conditions to ultimately converge on the trivial fixed point x = 0.

VI. DISCUSSION
In this paper, we examined tensor component analysis (TCA, also known as CANDECOMP/PARAFAC or canonical polyadic decomposition) in relation to Koopman mode decomposition (KMD). This was motivated by the fact that both methods have become popular ways to discover, in an unsupervised manner, the relevant features and/or dynamics of a given dynamical system. Despite their joint aim, the two methods have largely occupied disjoint scientific realms. Therefore, it became our goal to examine the two methods together and see what, if any, connections existed between them, in an effort to "bridge" the different communities. While previous work has compared principal component analysis (PCA) with KOT methods both directly and indirectly, and approaches to do KMD on tensors have been developed, little work has been done on comparing KMD to TCA.

We considered dynamic mode decomposition (DMD), a popular approach for performing KMD, on a data three-tensor, with one dimension being the elements of the state space, one being time, and one being the initial conditions. We proved, in Lemma 1, that it is guaranteed that there exists a TCA decomposition that provides lower, or equal, error in reconstructing the data tensor than DMD. We then formulated Eq. 17, a correspondence between the modes of TCA and the modes of DMD. With this, we were motivated to look for when the two methods would give exactly the same decomposition. We proved in Lemma 2 that, when the last snapshot of each "slice" of the data tensor lies within the span of the previous data, the theoretical decompositions of TCA and KMD are identical, up to scaling and permutation of labels. On two simple autonomous dynamical systems, we showed that a numerical implementation of TCA, while not perfect, gave results similar to those expected from the true KMD.
In particular, even with relatively small amounts of data (as in the case of the hopping map, Sec. V B), the time evolution and the scaling dependent on initial conditions were largely well captured by TCA.

By unifying TCA and KMD, our work opens up a number of new directions. First, with this correspondence in hand, it should become easier for those familiar with Koopman operator theory (KOT) to communicate with those who work in fields where PCA and TCA are "gold standard" tools. These fields include neuroscience and biology, among others, areas where KOT has only recently begun to be applied. Second, to our knowledge, TCA has been seen as an approach for representing existing data. As our correspondence shows, the TCA modes can also contain information on the future time evolution of the system. Whether TCA is a good tool for prediction remains an open question. Additionally, given that KMD takes a specific form in the time dependence of each mode, namely integer powers of λ_r ∈ C, constraining the search of the c_r in the TCA minimization problem (Eq. 14) to fit this form could lead to a more robust and convergent TCA algorithm. A related work indeed found that using an over-complete library of functions to model the time dependence of each mode led to significant benefits. Third, TCA implementations now become another possible tool in KOT's growing numerical toolbox. Most implementations of TCA make use of optimization algorithms, which are largely absent from the KOT literature (although, relatedly, there has been increasing interest in using neural networks to perform KMD). Whether these methods might be advantageous over the current KMD methods in certain scenarios is an important open question. Additionally, because for many systems it does not make physical sense to have elements of the state space take negative values in their participation in each mode, non-negative TCA has been developed. This approach constrains all the modes to have positive b_r.
To our knowledge, no such method exists for KMD. And fourth, whether and how our results change when the underlying dynamical system is non-autonomous, an area of recent active research in the KOT community, will be the direction of future work.

As can be seen from Secs. II and III, the principal difference between TCA and DMD is that, while DMD finds the decomposition that best predicts the last snapshot of the data from all of the prior time points (Eq. 12), TCA fits all data equally (Eq. 14). Because of this, it is perhaps not surprising that we proved in Lemma 1 that TCA can always reconstruct existing data at least as accurately. However, fitting existing data may not always be the appropriate goal, as in the case of prediction, where understanding the future evolution of the system takes priority.

In addition, because TCA methods make use of optimization techniques, such as alternating and nonlinear least squares, there are reasons to think that DMD may, in general, offer advantages. First, as was recently shown when DMD was compared to PCA for prediction, we expect that DMD may be considerably faster than TCA. This is because DMD requires only a few matrix operations, whereas TCA requires (many) sequential iterations before convergence (but see Battaglino et al. 2018 and Erichson et al. 2020 for fast randomized TCA approaches). Indeed, it is known that TCA is a non-convex optimization problem. And second, depending on the problem, there may be many local minima in the objective function landscape. Therefore, while TCA may, in theory, provide a better decomposition, finding it may take many attempts involving numerous different initial conditions. This not only adds to the run-time of TCA, but also implies that each application of TCA may result in different modes, making it hard to interpret the output. Because DMD involves only one set of computations, all of these challenges are avoided.
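As a point of comparison with the matrix operations of DMD, the iterative character of TCA can be seen in a toy alternating least squares (ALS) loop for a rank-R CP decomposition of a three-tensor. This is a bare-bones sketch under our own naming conventions, with no normalization, stopping criterion, or the randomization/acceleration of the cited implementations:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of factor matrices A and B."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cp_als(T, R, n_iter=300, seed=0):
    """Toy CP/ALS sketch: fit T[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r]
    by cycling through least-squares updates of each factor."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    T0 = T.reshape(I, -1)                     # mode-0 unfolding
    T1 = np.moveaxis(T, 1, 0).reshape(J, -1)  # mode-1 unfolding
    T2 = np.moveaxis(T, 2, 0).reshape(K, -1)  # mode-2 unfolding
    for _ in range(n_iter):                   # many sequential iterations
        A = T0 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = T1 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = T2 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Recover the factors of an exactly rank-2 synthetic tensor.
rng = np.random.default_rng(1)
A0 = rng.standard_normal((4, 2))
B0 = rng.standard_normal((5, 2))
C0 = rng.standard_normal((6, 2))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, R=2)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)  # small relative error
```

Even in this best-case setting (exact low rank, random well-conditioned factors), the fit requires hundreds of pseudoinverse solves, whereas the DMD decomposition is obtained from a single SVD and eigendecomposition.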
Whether first applying DMD to the data and then using the DMD modes as the starting guess for the TCA optimization algorithm would lead to faster convergence onto "good" TCA modes is another open question that we plan on addressing. In support of this idea, higher-order singular value decomposition has been found to (sometimes) be a good starting point for the alternating least squares approach to TCA.

Finally, we see this work as being part of a growing body of literature that is bringing attention to the fact that dynamical systems theory, and in particular KOT, can be used for problems that have historically relied on optimization theory. These papers have highlighted the fact that, while optimization theory has its strengths, its de-emphasis on the past history of the system (e.g., gradient descent recomputing the gradient anew at each time step) is a considerable cost that KOT avoids. Here we showed that TCA, which has been formulated as a non-convex optimization problem, can be equivalent to DMD in certain scenarios. This allows the past history of the system (i.e., the dynamics of the collected data) to be used to find an informative, low-dimensional description that can aid in understanding the problem of study.

ACKNOWLEDGMENTS
We would like to thank Prof. Igor Mezić for the introduction to KOT and the ensuing insightful conversations, Dean Huang for insightful input and for catching an error in the equations, Akshunna S. Dogra for his advice on the best framing of the correspondence and on the potential use of DMD as a starting point for TCA, and Cory Brown, as well as the members of the Goard Lab at UCSB, for discussions on neuroscience and KOT. The author is supported in part by a Chancellor's Fellowship from UCSB.
DATA AVAILABILITY
The data and code used for the examples in Sec. V have been made freely and publicly available.

REFERENCES

B. O. Koopman, Hamiltonian systems and transformation in Hilbert space, Proceedings of the National Academy of Sciences: 315 (1931).
B. O. Koopman and J. v. Neumann, Dynamical systems of continuous spectra, Proceedings of the National Academy of Sciences: 255 (1932).
I. Mezić, Spectral properties of dynamical systems, model reduction and decompositions, Nonlinear Dynamics: 309 (2005).
M. Budišić, R. Mohr, and I. Mezić, Applied Koopmanism, Chaos: An Interdisciplinary Journal of Nonlinear Science: 047510 (2012).
I. Mezić, Spectrum of the Koopman operator, spectral expansions in functional spaces, and state-space geometry, Journal of Nonlinear Science, 10.1007/s00332-019-09598-5 (2019).
I. Mezić, On Numerical Approximations of the Koopman Operator, arXiv:2009.05883 (2020).
C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and S. Henningson, Spectral analysis of nonlinear flows, Journal of Fluid Mechanics: 115 (2009).
I. Mezić, Analysis of fluid flows via spectral properties of the Koopman operator, Annual Review of Fluid Mechanics: 357 (2013).
H. Arbabi and I. Mezić, Study of dynamics in post-transient flows using Koopman mode decomposition, Phys. Rev. Fluids: 124402 (2017).
Y. Susuki, I. Mezić, F. Raak, and T. Hikihara, Applied Koopman operator theory for power systems technology, Nonlinear Theory and Its Applications, IEICE: 430 (2016).
M. Korda, Y. Susuki, and I. Mezić, Power grid transient stabilization using Koopman model predictive control, IFAC-PapersOnLine: 297 (2018).
A. S. Dogra and W. T. Redman, Optimizing Neural Networks via Koopman Operator Theory, Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (2020).
I. Manojlović, M. Fonoberova, R. Mohr, A. Andrejčuk, Z. Drmač, Y. Kevrekidis, and I. Mezić, Applications of Koopman Mode Analysis to Neural Networks, arXiv:2006.11765 (2020).
M. E. Tano, G. D. Portwood, and J. C. Ragusa, Accelerating training in artificial neural networks with dynamic mode decomposition, arXiv:2006.14371 (2020).
B. W. Brunton, L. A. Johnson, J. G. Ojemann, and J. N. Kutz, Extracting spatial–temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition, Journal of Neuroscience Methods: 1 (2016).
S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé, Data-driven model reduction and transfer operator approximation, Journal of Nonlinear Science: 985 (2018).
H. Lu and D. M. Tartakovsky, Prediction Accuracy of Dynamic Mode Decomposition, SIAM Journal on Scientific Computing (3): A1639–A1662 (2020).
S. Bradde and W. Bialek, PCA Meets RG, J. Stat. Phys.: 462–475 (2017).
W. T. Redman, Renormalization group as a Koopman operator, Phys. Rev. E: 060104(R) (2020).
N. Marrouch, J. Slawinska, D. Giannakis, and H. Read, Data-driven Koopman operator approach for computational neuroscience, Annals of Mathematics and Artificial Intelligence: 1155–1173 (2020).
S. Balakrishnan, A. Hasnain, N. Boddupalli, D. M. Joshy, R. G. Egbert, and E. Yeung, Prediction of fitness in bacteria with causal jump dynamic mode decomposition, 2020 American Control Conference, 3749–3756 (2020).
J. D. Carroll and J.-J. Chang, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart–Young" decomposition, Psychometrika: 283–319 (1970).
R. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis, UCLA Working Papers in Phonetics (1970).
N. Vervliet, O. Debals, L. Sorber, M. Van Barel, and L. De Lathauwer, Tensorlab 3.0.
T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review: 455–500 (2009).
D. Hong, T. G. Kolda, and J. D. Duersch, Generalized canonical polyadic tensor decomposition, SIAM Review: 133–163 (2020).
S. Rambhatla, X. Li, and J. Haupt, Provable Online CP/PARAFAC Decomposition of a Structured Tensor via Dictionary Learning, Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (2020).
J. B. Kruskal, Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra and its Applications: 95–138 (1977).
E. Martínez-Montes, P. A. Valdés-Sosa, F. Miwakeichi, R. I. Goldman, and M. S. Cohen, Concurrent EEG/fMRI analysis by multiway partial least squares, NeuroImage: 1023–1034 (2004).
F. Miwakeichi, E. Martínez-Montes, P. A. Valdés-Sosa, N. Nishiyama, H. Mizuhara, and Y. Yamaguchi, Decomposing EEG data into space–time–frequency components using parallel factor analysis, NeuroImage: 1035–1045 (2004).
C. Beckmann and S. Smith, Tensorial extensions of independent component analysis for multisubject FMRI analysis, NeuroImage: 294–311 (2005).
E. Acar, C. Aykut-Bingol, H. Bingol, R. Bro, and B. Yener, Multiway analysis of epilepsy tensors, Bioinformatics: i10–i18 (2007).
A. Cichocki, M. De Vos, L. De Lathauwer, B. Vanrumste, S. Van Huffel, and W. Van Paesschen, Canonical decomposition of ictal scalp EEG and accurate source localisation: Principles and simulation study, Computational Intelligence and Neuroscience: 058253 (2007).
M. De Vos, A. Vergult, L. De Lathauwer, W. De Clercq, S. Van Huffel, P. Dupont, A. Palmini, and W. Van Paesschen, Canonical decomposition of ictal scalp EEG reliably detects the seizure onset zone, NeuroImage: 844–854 (2007).
M. Mørup, L. K. Hansen, C. S. Herrmann, J. Parnas, and S. M. Arnfred, Parallel factor analysis as an exploratory tool for wavelet transformed event-related EEG, NeuroImage: 938–947 (2006).
M. Mørup, L. K. Hansen, and S. M. Arnfred, ERPWAVELAB: A toolbox for multi-channel analysis of time–frequency transformed event related potentials, Journal of Neuroscience Methods: 361–368 (2007).
A. H. Williams, T. H. Kim, F. Wang, S. Vyas, S. I. Ryu, K. V. Shenoy, M. Schnitzer, T. G. Kolda, and S. Ganguli, Unsupervised discovery of demixed, low-dimensional neural dynamics across multiple timescales through tensor component analysis, Neuron: 1099–1115 (2018).
C. M. Constantinople, A. T. Piet, P. Bibawi, A. Akrami, C. Kopec, and C. D. Brody, Lateral orbitofrontal cortex promotes trial-by-trial learning of risky, but not spatial, biases, eLife: e49744 (2019).
J. Xia, T. D. Marks, M. J. Goard, and R. Wessel, Diverse co-active neurons encode stimulus-driven and stimulus-independent variables, Journal of Neurophysiology (2020).
Y. Zhu, J. Liu, K. Mathiak, T. Ristaniemi, and F. Cong, Deriving electrophysiological brain network connectivity via tensor component analysis during freely listening to music, IEEE Transactions on Neural Systems and Rehabilitation Engineering: 409–418 (2020).
Y. Zhu, J. Liu, C. Ye, K. Mathiak, P. Astikainen, T. Ristaniemi, and F. Cong, Discovering dynamic task-modulated functional networks with specific spectral modes using MEG, NeuroImage: 116924 (2020).
J. Xia, T. D. Marks, M. J. Goard, and R. Wessel, Stable representation of a naturalistic movie emerges from episodic activity with gain variability, PREPRINT (Version 1) available at Research Square, https://doi.org/10.21203/rs.3.rs-126977/v1 (2021).
S. Klus, P. Gelß, S. Peitz, and C. Schütte, Tensor-based dynamic mode decomposition, Nonlinearity (7): 3359 (2018).
P. Gelß, S. Klus, J. Eisert, and C. Schütte, Multidimensional Approximation of Nonlinear Dynamical Systems, J. Comput. Nonlinear Dynam. (6): 061006 (2019).
B. Lusch, E. C. Chi, and J. N. Kutz, Shape Constrained Tensor Decompositions, 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 287–297 (2019).
P. J. Schmid, Dynamic mode decomposition of numerical and experimental data, Journal of Fluid Mechanics: 5–28 (2010).
K. K. Chen, J. H. Tu, and C. W. Rowley, Variants of dynamic mode decomposition: boundary condition, Koopman, and Fourier analyses, Journal of Nonlinear Science (6): 887–915 (2012).
J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz, On dynamic mode decomposition: Theory and applications, Journal of Computational Dynamics (2): 391–421 (2014).
M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition, Journal of Nonlinear Science: 1307 (2015).
F. Takens, Detecting strange attractors in turbulence, in: D. Rand and L. S. Young (eds), Dynamical Systems and Turbulence, Warwick 1980, Lecture Notes in Mathematics, vol. 898, Springer, Berlin, Heidelberg (1981).
M. Kamb, E. Kaiser, S. L. Brunton, and J. N. Kutz, Time-Delay Observables for Koopman: Theory and Applications, SIAM J. Appl. Dyn. Syst. (2): 886–917 (2020).
F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, Journal of Mathematics and Physics: 164–189 (1927).
F. L. Hitchcock, Multiple invariants and generalized rank of a p-way matrix or tensor, Journal of Mathematics and Physics: 39–79 (1928).
D. Muti and S. Bourennane, Multidimensional filtering based on a tensor approach, Signal Process.: 2338–2353 (2005).
L. De Lathauwer and J. Castaing, Tensor-based techniques for the blind separation of DS-CDMA signals, Signal Process.: 322–336 (2007).
L. De Lathauwer and A. De Baynast, Blind deconvolution of DS-CDMA signals by means of decomposition in rank-(1, L, L) terms, IEEE Trans. Signal Process.: 1562–1571 (2008).
L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika: 279–311 (1966).
C. J. Appellof and E. R. Davidson, Strategies for analyzing data from video fluorometric monitoring of liquid chromatographic effluents, Analytical Chemistry: 2053–2056 (1981).
S. Leurgans and R. T. Ross, Multilinear models: Applications in spectroscopy, Statistical Science: 289–310 (1992).
R. Henrion, Body diagonalization of core matrices in three-way principal components analysis: Theoretical bounds and simulation, J. Chemometrics: 477–494 (1993).
A. K. Smilde, Y. Wang, and B. R. Kowalski, Theory of medium-rank second-order calibration with restricted-Tucker models, J. Chemometrics: 21–36 (1994).
H. A. L. Kiers, A three-step algorithm for CANDECOMP/PARAFAC analysis of large data sets with multicollinearity, J. Chemometrics: 155–171 (1998).
H.-L. Wu, N. B. Gallagher, and E. B. Martin, Application of PARAFAC2 to fault detection and diagnosis in semiconductor etch, J. Chemometrics: 1–26 (1998).
R. Bro, C. A. Andersson, and H. A. L. Kiers, PARAFAC2–Part II. Modeling chromatographic data with retention time shifts, J. Chemometrics: 295–309 (1999).
C. M. Andersson and R. Bro, Practical aspects of PARAFAC modeling of fluorescence excitation-emission data, J. Chemometrics: 200–215 (2003).
A. Smilde, R. Bro, and P. Geladi, Multi-Way Analysis: Applications in the Chemical Sciences, Wiley, West Sussex, England (2004).
https://github.com/william-redman/Koopman-mode-decomposition-and-tensor-component-analysis
N. Takeishi, Y. Kawahara, and T. Yairi, Learning Koopman Invariant Subspaces for Dynamic Mode Decomposition, Advances in Neural Information Processing Systems 30 (NIPS 2017), 1130–1140 (2017).
B. Lusch, J. N. Kutz, and S. L. Brunton, Deep learning for universal linear embeddings of nonlinear dynamics, Nat. Comm.: 4950 (2018).
E. Yeung, S. Kundu, and N. Hodas, Learning Deep Neural Network Representations for Koopman Operators of Nonlinear Dynamical Systems, American Control Conference 2019, 4832–4839 (2019).
P. Paatero and U. Tapper, Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics (2): 111–126 (1994).
D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature: 788–791 (1999).
Y. Qi, P. Comon, and L.-H. Lim, Uniqueness of nonnegative tensor approximations, IEEE Trans. Inf. Theory: 2170–2183 (2016).
S. Maćešić, N. Črnjarić-Žic, and I. Mezić, Koopman Operator Family Spectrum for Nonautonomous Systems, SIAM J. Appl. Dyn. Sys. (4): 2478–2515 (2018).
C. Battaglino, G. Ballard, and T. G. Kolda, A Practical Randomized CP Tensor Decomposition, SIAM J. Matrix Anal. Appl. (2): 876–901 (2018).
N. B. Erichson, K. Manohar, S. L. Brunton, and J. N. Kutz, Randomized CP tensor decomposition, Mach. Learn.: Sci. Technol.: 025012 (2020).
F. Dietrich, T. N. Thiem, and I. G. Kevrekidis, On the Koopman Operator of Algorithms, SIAM J. Appl. Dyn. Syst. (2): 860–885 (2020).