Learning Koopman Eigenfunctions and Invariant Subspaces from Data: Symmetric Subspace Decomposition
Masih Haseli and Jorge Cortés

This work was supported by ONR Award N00014-18-1-2828. A preliminary version of this work appeared as [1] at the IEEE Conference on Decision and Control. Masih Haseli and Jorge Cortés are with the Department of Mechanical and Aerospace Engineering, University of California, San Diego, CA 92093, USA, {mhaseli,cortes}@ucsd.edu.
Abstract—This paper develops data-driven methods to identify eigenfunctions of the Koopman operator associated to a dynamical system and subspaces that are invariant under the operator. We build on Extended Dynamic Mode Decomposition (EDMD), a data-driven method that finds a finite-dimensional approximation of the Koopman operator on the span of a predefined dictionary of functions. We propose a necessary and sufficient condition to identify Koopman eigenfunctions based on the application of EDMD forward and backward in time. Checking this condition requires the comparison of the eigendecompositions of matrices whose size grows with the size of the dictionary. To address this, we propose the Symmetric Subspace Decomposition (SSD) algorithm, which provably identifies the maximal Koopman-invariant subspace and the Koopman eigenfunctions in the span of the dictionary. We also introduce the Streaming Symmetric Subspace Decomposition (SSSD) algorithm, an online method that only requires a small, fixed memory and updates its estimate of the invariant subspace as new data is received. We prove that, given a data set, SSSD and SSD find the same solution.
I. INTRODUCTION
Driven by advances in processing, data storage, cloud services, and algorithms, the world has witnessed in recent years a revolution in data-driven learning, analysis, and control of dynamical phenomena. State-space, probabilistic, and neural network models are among the most popular methods to model dynamical systems. With sufficient a priori information about the dynamics, state-space methods can provide closed-form analytic models that accurately describe the dynamical behavior. Such models, however, are generally nonlinear and their analytical study becomes arduous for moderate to high-dimensional systems. Probabilistic approaches, on the other hand, provide an alternative description that is conducive to dealing with incomplete information about the underlying dynamics. However, under such approaches, deriving mathematical guarantees may be hard, if not impossible. Neural networks can describe the dynamics with high accuracy given enough data, but the models they produce are highly nonlinear and difficult to study analytically. Hence, even though they can be successful in predicting the behavior of the system, they often do not provide deeper insight into the dynamics. These reasons have motivated researchers to seek alternative strategies to capture the dynamics using data with
minimum a priori information, in a computationally efficient way, and resulting in simple yet accurate models. Approximating the Koopman operator associated with a dynamical system is one such strategy. The Koopman operator is a linear but generally infinite-dimensional operator that fully describes the behavior of the underlying dynamical system. Even though the linearity of the Koopman operator makes its spectral properties a powerful tool for analysis, its infinite-dimensional nature prevents the use of conventional linear algebraic tools developed to work with digital computers. One way to circumvent this issue is to identify finite-dimensional subspaces that are invariant under the Koopman operator. This paper develops data-driven methods to identify such subspaces.

Literature Review:
The Koopman operator [2], [3] is a linear but generally infinite-dimensional operator that provides an alternative view of dynamical systems by describing the effect of the dynamics on a functional space. Being a linear operator enables one to use its spectral properties to capture and predict the behavior of nonlinear dynamical systems [4]–[6]. This leads to a wide variety of applications including state estimation [7], [8], system identification [9]–[12], sensor and actuator placement [13], model reduction [14], [15], control [16]–[21], and robotics [22], [23]. Moreover, the eigenfunctions of the Koopman operator play an important role in stability analysis of nonlinear systems [24]. Due to the infinite-dimensional nature of the Koopman operator, the digital implementation of the aforementioned applications is not possible unless one can find a way to represent the effect of the operator on finite-dimensional subspaces. The literature has explored several data-driven methods to find such finite-dimensional approximations, which can be divided into two main categories: projection methods and invariant-subspace methods. Projection methods fit a linear model to the data acquired from the system. The most popular approach in this category is Dynamic Mode Decomposition (DMD), first proposed to capture dynamical information from fluid flows [25]. DMD uses linear algebraic methods to form a linear model from time series data. The work [26] explores the properties of DMD and its connection with the Koopman operator, and [27] generalizes it to work with non-sequential data snapshots. Several extensions work with noisy data [28], [29], promote sparsity [30], and consider time-lagged data snapshots [31]. Extended Dynamic Mode Decomposition (EDMD) [32] is an important variation of DMD that lifts the states of the system to a (generally higher-dimensional) functional space using a predefined dictionary of functions and finds the projection of the Koopman operator on that subspace. The work [33] studies the convergence properties of EDMD to the Koopman operator as the number of data snapshots and dictionary elements go to infinity. EDMD is specifically designed to work with exact data, and experiments and simulations show that it may not work well with noisy data. Our previous work [34] presented a noise-resilient extension of EDMD. Known information about the dynamics can also be incorporated into EDMD to increase the accuracy of the model [35]. The aforementioned methods provide linear higher-dimensional approximations of the underlying dynamics that are, however, not suitable for long-term prediction, since they are generally not exact. This issue can be tackled by finding subspaces that are invariant under the Koopman operator, since the acquired linear models are exact over them. This is the subject of the second group of approaches. The works [36]–[38] provide approaches to find functions that span Koopman-invariant subspaces using neural networks. Moreover, since Koopman eigenfunctions span Koopman-invariant subspaces, one can use the empirical methods provided in [20], [39] to approximate the Koopman eigenfunctions and, consequently, the invariant subspaces. The work in [40] provides theoretical and empirical results based on multi-step predictions to find Koopman eigenfunctions. Interestingly, none of the aforementioned methods provide mathematical guarantees that the identified functions are Koopman eigenfunctions.

Statement of Contributions:
We present data-driven methods to identify Koopman eigenfunctions and Koopman-invariant subspaces associated with a potentially nonlinear dynamical system. First, we study the properties of the standard EDMD method regarding the identification of Koopman eigenfunctions. We prove that EDMD correctly identifies all the Koopman eigenfunctions in the span of the predefined dictionary. This necessary condition, however, is not sufficient, i.e., the functions identified by the EDMD method are not necessarily Koopman eigenfunctions. This motivates our next contribution, which is a necessary and sufficient condition that characterizes the functions that evolve linearly according to the available data snapshots. This condition is based on the application of EDMD forward and backward in time. The identified functions are not necessarily Koopman eigenfunctions, since one can only guarantee that they evolve linearly on the available data (but not necessarily starting anywhere in the state space). However, we prove that under reasonable assumptions on the density of the sampling, the identified functions are Koopman eigenfunctions almost surely. Our next contribution seeks to provide computationally efficient ways of identifying Koopman eigenfunctions and Koopman-invariant subspaces. In fact, checking the aforementioned necessary and sufficient condition requires one to calculate and compare the eigendecompositions of two potentially large matrices, which can be computationally cumbersome. Moreover, even though the subspace spanned by all the eigenfunctions in the span of the original dictionary is Koopman-invariant, it might not be maximal. To address these limitations, we propose the Symmetric Subspace Decomposition (SSD) strategy, an iterative method to find the maximal subspace that remains invariant under the application of the dynamics (and its associated Koopman operator) according to the available data. We prove that SSD also finds all the functions that evolve linearly in time according to the available data. Moreover, we prove that under the same conditions on the sampling density, the SSD strategy identifies the maximal Koopman-invariant subspace in the span of the original dictionary almost surely. Our last contribution is motivated by applications where the data becomes available in an online fashion. In such scenarios, at any given time step, one would need to perform SSD on all the data received up to that time. Performing SSD requires the calculation of several singular value decompositions for matrices that scale with the size of the data, in turn requiring significant memory capabilities. To address these shortcomings, we propose the Streaming Symmetric Subspace Decomposition (SSSD) strategy, which refines the calculated Koopman-invariant subspaces each time it receives new data and deals with matrices of fixed and relatively small size (independent of the size of the data). We prove that the SSSD and SSD methods are equivalent, in the sense that, for a given dataset, they both identify the same maximal Koopman-invariant subspace.
Notation:
We denote by ℕ, ℕ_0, ℝ, ℝ_{≥0}, and ℂ the sets of natural, nonnegative integer, real, nonnegative real, and complex numbers, respectively. For a matrix A ∈ ℂ^{m×n}, we denote the set comprised of its rows by rows(A), its columns by cols(A), the number of its rows by ♯rows(A), and the number of its columns by ♯cols(A), respectively. In addition, we denote its pseudo-inverse, transpose, complex conjugate, conjugate transpose, Frobenius norm, and range space by A†, A^T, Ā, A^H, ‖A‖_F, and R(A), respectively. For 1 ≤ i < k ≤ m, we denote by A_{i:k} the matrix formed with the i-th to k-th rows of A. For a square nonsingular matrix B, we denote its inverse by B^{-1}. Given matrices A ∈ ℂ^{m×n} and B ∈ ℂ^{m×d}, we denote by [A, B] ∈ ℂ^{m×(n+d)} the matrix created by concatenating A and B. The angle between vectors v, w ∈ ℝ^n is ∠(v, w). Given v_1, ..., v_k ∈ ℂ^n, span{v_1, ..., v_k} represents the set comprised of all linear combinations c_1 v_1 + ··· + c_k v_k with c_1, ..., c_k ∈ ℂ. We use j to denote the imaginary unit (the solution of x² + 1 = 0). For v ∈ ℂ^n, we denote its real and imaginary parts by Re(v) and Im(v), and its 2-norm by ‖v‖ := √(v^H v). Given a set A, we denote its complement by A^c. Given sets A and B, A ⊆ B means that A is a subset of B. We denote by A ∩ B and A ∪ B the intersection and union of A and B, and set A \ B := A ∩ B^c. Given a sequence of sets {A_i}_{i=1}^∞, we denote its superior and inferior limits by lim sup_{i→∞} A_i and lim inf_{i→∞} A_i, respectively. We refer to the set consisting of all continuous strictly increasing functions α : ℝ_{≥0} → ℝ_{≥0} with α(0) = 0 as class-K. Given f : B → A and g : C → B, f ∘ g : C → A denotes their composition.

II. PRELIMINARIES
In this section, we review basic concepts on the Koopman operator and Extended Dynamic Mode Decomposition.
A. Koopman Operator
Here, we introduce the (discrete-time) Koopman operator and its spectral properties, following [6]. Consider a nonlinear, time-invariant, continuous map T : M → M on M ⊆ ℝ^n, defining the dynamical system

x⁺ = T(x).    (1)

The dynamics (1) acts on the points in the state space M and generates trajectories of the system. The Koopman operator, on the other hand, provides an alternative approach to analyze (1) based on the evolution of functions (also known as observables) defined on M and taking values in ℂ. Formally, let F be a linear space of functions from M to ℂ which is closed under composition with T, i.e.,

f ∘ T ∈ F,  ∀f ∈ F.    (2)

The Koopman operator K : F → F associated with (1) is K(f) = f ∘ T. A closer look at the definition of the Koopman operator shows that it advances the observables in time, i.e., for g = K(f),

g(x) = f ∘ T(x) = f(x⁺),  ∀x ∈ M.    (3)

This equation shows how the Koopman operator encodes the dynamics on the functional space F. The operator is linear as a direct consequence of linearity in F, i.e., for every f_1, f_2 ∈ F and c_1, c_2 ∈ ℂ,

K(c_1 f_1 + c_2 f_2) = c_1 K(f_1) + c_2 K(f_2).    (4)

Assuming F contains the functions describing the states of the system, g_i(x) = x_i with i ∈ {1, ..., n}, the Koopman operator fully characterizes the global features of the dynamics in a linear fashion. Moreover, the operator might be (and generally is) infinite dimensional, either by choice of F or due to the closedness requirement in (2).

Being linear, one can naturally define its eigendecomposition. A function φ ∈ F is an eigenfunction of K associated with eigenvalue λ ∈ ℂ if

K(φ) = λφ.    (5)

The combination of (3) and (5) leads to a significant property of the Koopman operator: the linear evolution of its eigenfunctions in time. Formally, given an eigenfunction φ,

φ(x⁺) = (φ ∘ T)(x) = K(φ)(x) = λφ(x).

The linear evolution of eigenfunctions, in conjunction with the linearity (4) of the operator, enables us to use spectral properties to analyze the nonlinear system (1). Given a set of eigenpairs {(λ_i, φ_i)}_{i=1}^{N_k} such that K(φ_i) = λ_i φ_i for i ∈ {1, ..., N_k}, one can describe the evolution of every function f in span({φ_i}_{i=1}^{N_k}), i.e., f = Σ_{i=1}^{N_k} c_i φ_i for some {c_i}_{i=1}^{N_k} ⊂ ℂ, as

f(x(k)) = Σ_{i=1}^{N_k} c_i λ_i^k φ_i(x(0)),  ∀k ∈ ℕ.    (6)

The constants {c_i}_{i=1}^{N_k} are called Koopman modes. It is important to note that one might need to use N_k = ∞ to fully describe the behavior of the dynamical system.

Another important notion in the analysis of the Koopman operator is the invariance of subspaces under its application. Formally, a subspace S ⊆ F is Koopman-invariant if for every f ∈ S we have K(f) ∈ S. Furthermore, S is maximal Koopman-invariant in L ⊆ F if it contains every Koopman-invariant subspace in L. Naturally, a set comprised of Koopman eigenfunctions spans a Koopman-invariant subspace.
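For concreteness, the following snippet numerically verifies the linear evolution of an eigenfunction along trajectories. The quadratic map and the eigenfunction φ(x) = x_2 − x_1² are our own illustrative choices (not from the paper), constructed so that the eigenfunction property holds exactly.

```python
import numpy as np

# Toy map (illustrative assumption): with the constants below, phi(x) = x2 - x1^2
# satisfies phi(T(x)) = b * phi(x), i.e., it is a Koopman eigenfunction with
# eigenvalue b; likewise, phi(x) = x1 is an eigenfunction with eigenvalue a.
a, b = 0.9, 0.5

def T(x):
    return np.array([a * x[0], b * x[1] + (a**2 - b) * x[0] ** 2])

def phi(x):
    return x[1] - x[0] ** 2

rng = np.random.default_rng(0)
for x in rng.uniform(-1.0, 1.0, size=(5, 2)):
    # linear evolution in time: phi(x+) = lambda * phi(x) with lambda = b
    assert np.isclose(phi(T(x)), b * phi(x))
```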
B. Extended Dynamic Mode Decomposition

Our exposition here mainly follows [32]. As mentioned earlier, despite its linearity, the infinite-dimensional nature of the Koopman operator obstructs the use of efficient linear algebraic methods. One natural way to overcome this problem is finding finite-dimensional approximations of it. Extended Dynamic Mode Decomposition (EDMD) is a popular data-driven method to perform this task that lifts data snapshots acquired from the dynamical system to a higher-dimensional space using a predefined dictionary of functions. The projection of the action of the operator on the span of the dictionary can then be found by solving a least-squares problem.

Formally, let D : ℝ^n → ℝ^{1×N_d} be a dictionary of N_d functions in F with D(x) = [d_1(x), ..., d_{N_d}(x)]. Moreover, let X, Y ∈ ℝ^{N×n} be matrices comprised of N data snapshots such that y_i = T(x_i) for i ∈ {1, ..., N}, where x_i^T and y_i^T are the i-th rows of X and Y, respectively. For convenience, we define the action of the dictionary on a matrix as

D(X) := [D(x_1)^T, ..., D(x_N)^T]^T.

The EDMD method approximates the projection of the Koopman operator by finding the matrix that best explains the data over the dictionary, i.e.,

minimize_K ‖D(Y) − D(X)K‖_F,

which yields the closed-form solution

K_EDMD(D(X), D(Y)) = D(X)† D(Y).    (7)

Note that the solution depends on the choice of dictionary (when it is clear from the context, we simply use K_EDMD). If the dictionary spans a Koopman-invariant subspace, then ‖D(Y) − D(X)K_EDMD‖_F = 0 and K_EDMD fully captures the evolution of functions in span(D(x)). Otherwise, EDMD loses some information about the dynamics. The work [33] analyzes the convergence properties of EDMD to the Koopman operator as the number of dictionary functions N_d and data snapshots N go to infinity.
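As a minimal computational sketch of (7) (our construction; the monomial dictionary and all variable names are illustrative choices, not prescribed by the paper), K_EDMD can be obtained with a single least-squares solve:

```python
import numpy as np

def edmd(DX, DY):
    """EDMD solution (7): K_EDMD = D(X)^† D(Y), via least squares."""
    # np.linalg.lstsq solves min_K ||D(X) K - D(Y)||_F
    return np.linalg.lstsq(DX, DY, rcond=None)[0]

# Illustrative dictionary: monomials up to degree 2 on a 2D state space.
def D(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack(
        [np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

# Snapshot matrices X, Y with y_i = T(x_i), using the toy map from above.
T = lambda x: np.array([0.9 * x[0], 0.5 * x[1] + (0.9**2 - 0.5) * x[0] ** 2])
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.array([T(x) for x in X])
K_edmd = edmd(D(X), D(Y))   # N_d x N_d finite-dimensional approximation
```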
III. PROBLEM STATEMENT

As described in Section II-B, the EDMD method loses information about the dynamical system when the employed dictionary does not span a Koopman-invariant subspace. As a result, in such cases, the EDMD approximation is not appropriate for long-term prediction of the state evolution. Motivated by this observation, our goal is to find the maximal Koopman-invariant subspace and the Koopman eigenfunctions in the span of a given dictionary.

Formally, given the dynamical system (1) defined by T : M → M, data matrices X and Y comprised of N data snapshots, and a dictionary of functions D, our main goal is two-fold:

(a) find all the Koopman eigenfunctions in span(D(x));
(b) find a basis for the maximal Koopman-invariant subspace in span(D(x)).

Note that (a) and (b) are closely related. The eigenfunctions found by solving (a) span Koopman-invariant subspaces. Those invariant subspaces, however, might not be maximal. This mild difference between (a) and (b) requires the use of different solution approaches. Since we are dealing with finite-dimensional linear subspaces, we aim to use linear algebra instead of optimization-based methods, which are widely used for solving these types of problems. This enables us to directly use the computationally efficient linear algebraic packages that optimization methods rely on.

Throughout this paper, we use the following assumption regarding the dictionary snapshots.

Assumption 3.1: (Full Column Rank Dictionary Matrices):
The matrices D(X) and D(Y) have full column rank. □

Assumption 3.1 is reasonable: in order for it to hold, the dictionary functions must be linearly independent, i.e., the functions must form a basis for span(D(x)). Moreover, the assumption requires the set of initial conditions rows(X) to be diverse enough to capture the important characteristics of the vector field. We should also point out that our treatment here relies on EDMD, which is not specifically designed to work with noisy data. Hence, we assume access to data with high signal-to-noise ratio. In practice, one might need to pre-process the data to use the algorithms proposed here.
IV. EDMD AND KOOPMAN EIGENFUNCTIONS

Here we investigate the capabilities and limitations of the EDMD method regarding the identification of Koopman eigenfunctions. The next result shows that EDMD is able to capture not only Koopman eigenfunctions but also all the functions that evolve linearly according to the available data.
Lemma 4.1: (EDMD Captures the Koopman Eigenfunctions in the Span of the Dictionary):
Suppose Assumption 3.1 holds. Let f(x) ∈ span(D(x)), with f(x) = D(x)v for some v ∈ ℂ^{N_d} \ {0}.

(a) Let f evolve linearly according to the available data, i.e., there exists λ ∈ ℂ such that f(y_i) = λf(x_i) for every i ∈ {1, ..., ♯rows(X)}. Then, the vector v is an eigenvector of K_EDMD with eigenvalue λ;
(b) Let f be an eigenfunction of the Koopman operator with eigenvalue λ. Then, the vector v is an eigenvector of K_EDMD with eigenvalue λ.

Proof: (a) Based on the linear evolution of f, we have

D(Y)v = λD(X)v.

Moreover, using the closed-form solution of EDMD, we have

K_EDMD v = D(X)† D(Y)v = λD(X)† D(X)v = λv,

where the last equality follows from Assumption 3.1.

(b) Based on the definition of Koopman eigenfunction, we have f(x⁺) = λf(x). Since this linear evolution is reflected in the data snapshots, we have f(y_i) = λf(x_i) for every i ∈ {1, ..., ♯rows(X)}, where x_i^T and y_i^T are the i-th rows of X and Y, respectively. The rest follows from (a).

Despite its simplicity, this result provides significant insight into the EDMD method. Lemma 4.1 shows that EDMD can capture eigenfunctions in the span of the dictionary even if the underlying subspace is not Koopman-invariant. Throughout the literature, it is well known that the (E)DMD method can capture physical constraints, conservation laws, and other properties of the underlying system, which actually correspond to Koopman eigenfunctions, see e.g. [32], [41]. Lemma 4.1 explains these observations and is a generalization of [27, Theorem 1] to EDMD when the underlying system is not necessarily linear (or cannot be accurately approximated by a linear system) and the underlying subspace is not Koopman-invariant.

Lemma 4.1 provides a necessary condition for the identification of Koopman eigenfunctions. This condition, however, is not sufficient, see e.g. [1, Example IV.3] for a counterexample. Interestingly, if a function evolves linearly forward in time, it also evolves linearly backward in time. The next result shows that checking this observation provides a necessary and sufficient condition for the identification of functions that evolve linearly in time according to the available data.

Theorem 4.2: (Identification of Linear Evolutions by Forward and Backward EDMD):
Suppose Assumption 3.1 holds. Let f(x) ∈ span(D(x)) with f(x) = D(x)v for some v ∈ ℂ^{N_d} \ {0}. Then f(y_i) = λf(x_i) for some λ ∈ ℂ \ {0} and for all i ∈ {1, ..., ♯rows(X)} if and only if v is an eigenvector of K_f = K_EDMD(D(X), D(Y)) with eigenvalue λ, and an eigenvector of K_b = K_EDMD(D(Y), D(X)) with eigenvalue λ^{-1}.

Proof: (⇐): Using the closed-form solutions of the EDMD problem and Assumption 3.1, one can write

K_f = (D(X)^T D(X))^{-1} D(X)^T D(Y),
K_b = (D(Y)^T D(Y))^{-1} D(Y)^T D(X).

Using these equations along with the definition of eigenpair, we have

λ D(X)^T D(X) v = D(X)^T D(Y) v,    (9a)
λ^{-1} D(Y)^T D(Y) v = D(Y)^T D(X) v.    (9b)

By multiplying (9a) from the left by v^H and using (9b),

λ ‖D(X)v‖² = v^H D(X)^T D(Y) v = λ̄^{-1} ‖D(Y)v‖²,

which implies

|λ|² ‖D(X)v‖² = ‖D(Y)v‖².    (10)

Now, we decompose D(Y)v orthogonally as

D(Y)v = cD(X)v + w,    (11)

with v^H D(X)^T w = 0. Substituting (11) into (9a) and multiplying both sides from the left by v^H yields

λ v^H D(X)^T D(X) v = c v^H D(X)^T D(X) v.

Since v ≠ 0, and under Assumption 3.1, we deduce that c = λ. Substituting the value of c in (11), taking the 2-norm, and using the fact that v^H D(X)^T w = 0, one can write

‖D(Y)v‖² = |λ|² ‖D(X)v‖² + ‖w‖².

Comparing this with (10), one deduces that w = 0 and D(Y)v = λD(X)v. The result follows by looking at this equality in a row-wise manner and noting that f(x) = D(x)v.

(⇒): Based on Lemma 4.1(a), v must be an eigenvector of K_f with eigenvalue λ. Moreover, since λ ≠ 0, one can write f(x_i) = λ^{-1} f(y_i) for every i ∈ {1, ..., ♯rows(X)} and, consequently, using Lemma 4.1(a) once again, we have K_b v = λ^{-1} v, concluding the proof.
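A minimal sketch of this forward-backward test (our construction; the tolerance tol is an arbitrary numerical choice) computes both EDMD matrices and keeps the eigenpairs of K_f whose eigenvectors are also eigenvectors of K_b with the reciprocal eigenvalue:

```python
import numpy as np

def linear_evolutions(DX, DY, tol=1e-8):
    """Return pairs (lambda, v) with D(Y) v = lambda D(X) v, via Theorem 4.2:
    v must be an eigenvector of K_f with eigenvalue lambda and an eigenvector
    of K_b with eigenvalue 1/lambda."""
    Kf = np.linalg.lstsq(DX, DY, rcond=None)[0]   # forward EDMD
    Kb = np.linalg.lstsq(DY, DX, rcond=None)[0]   # backward EDMD
    lams, V = np.linalg.eig(Kf)
    keep = []
    for lam, v in zip(lams, V.T):
        # discard lambda = 0 and check K_b v = (1/lambda) v
        if abs(lam) > tol and np.allclose(Kb @ v, v / lam, atol=tol):
            keep.append((lam, v))
    return keep
```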
If the function f satisfies the conditions provided by Theorem 4.2, then it is guaranteed that f(x⁺) = λf(x) for all x ∈ rows(X). However, Theorem 4.2 does not guarantee that f is an eigenfunction, i.e., there is no guarantee that f(x⁺) = λf(x) for all x ∈ M. To circumvent this issue, we next introduce infinite sampling and make an assumption about its density.

Assumption 4.3: (Almost sure dense sampling from a compact state space): Assume the state space M is compact. Suppose we gather infinitely (countably) many data snapshots. For N ∈ ℕ, the first N data snapshots are represented by matrices X_N and Y_N such that y_i = T(x_i) for all i ∈ {1, ..., N}, where x_i and y_i are the i-th rows of X_N and Y_N, respectively (we refer to the columns of X_N^T as the set S_N of initial conditions). Assume there exists a class-K function α and a sequence {p_N}_{N=1}^∞ ⊂ [0, 1] such that, for every N ∈ ℕ,

∀m ∈ M, ∃x ∈ S_N such that ‖m − x‖ ≤ α(1/N)

holds with probability p_N, and lim_{N→∞} p_N = 1. □

Assumption 4.3 is not restrictive since, in most practical cases, the state space is compact or the analysis is limited to a specific bounded region. Moreover, the data is usually available on a bounded region due to the limited range of sensors. Regarding the sampling density, Assumption 4.3 holds for most standard random samplings.

Noting that our methods presented later require Assumption 3.1 to hold, we provide a definition for dictionary matrices acquired from infinite sampling.
Definition 4.4: (R-rich Sequence of Dictionary Snapshots): Let {X_N}_{N=1}^∞ and {Y_N}_{N=1}^∞ be the sequence of data snapshot matrices acquired from system (1). Given the dictionary D : M → ℝ^{1×N_d}, we say the sequence of dictionary snapshot matrices is R-rich if R = min{M ∈ ℕ | rank(D(X_M)) = rank(D(Y_M)) = N_d} exists (R is called the richness constant). □

In Definition 4.4, if

{M ∈ ℕ | rank(D(X_M)) = rank(D(Y_M)) = N_d} ≠ ∅,

then, based on the well-ordering principle, see e.g. [42, Chapter 0], the minimum of the set exists and the sequence of dictionary snapshot matrices is R-rich. Moreover, given an R-rich sequence of dictionary snapshot matrices D(X_N) and D(Y_N), Assumption 3.1 holds for every N ≥ R. We are now ready to identify the Koopman eigenfunctions in the span of the dictionary using forward-backward EDMD.

Theorem 4.5: (Identification of Koopman Eigenfunctions by Forward and Backward EDMD):
Given an infinite sampling, suppose that the sequence of dictionary snapshot matrices is R-rich. Let K_f^N = K_EDMD(D(X_N), D(Y_N)) and K_b^N = K_EDMD(D(Y_N), D(X_N)). Given v ∈ ℂ^{N_d} \ {0} and λ ∈ ℂ \ {0}, let f(x) = D(x)v. Then,

(a) if f is an eigenfunction of the Koopman operator with eigenvalue λ, then K_f^N v = λv and K_b^N v = λ^{-1}v for every N ≥ R;
(b) conversely, and assuming the dictionary functions are continuous and Assumption 4.3 holds, if K_f^N v = λv and K_b^N v = λ^{-1}v for every N ≥ R, then f is an eigenfunction of the Koopman operator with probability 1.

Proof: (a) Since f is a Koopman eigenfunction, for every i ∈ ℕ we have f(y_i) = λf(x_i). Moreover, for every N ≥ R, D(X_N) and D(Y_N) have full column rank. Therefore, the result follows from Theorem 4.2.

(b) Based on Theorem 4.2, we deduce that, for every N ≥ R,

f(y_i) = λf(x_i),  ∀i ∈ {1, ..., N},    (12)

where x_i^T and y_i^T are the i-th rows of X_N and Y_N, respectively. Now, define h(x) := f ∘ T(x) − λf(x). The function h is continuous since f is a linear combination of continuous functions and T is also continuous. By inspecting h on the data points and using (12) and the fact that y_i = T(x_i) for all i ∈ {1, ..., N}, one can show that h(x_i) = f ∘ T(x_i) − λf(x_i) = f(y_i) − λf(x_i) = 0 for every i ∈ {1, ..., N}. Moreover, note that, based on Assumption 4.3, the set S_∞ = ∪_{i=1}^∞ S_i is dense in M with probability 1 and h(x) = 0 for every x ∈ S_∞. As a result, h(x) = 0 on M with probability 1. This implies that f ∘ T(x) = λf(x) for every x ∈ M almost surely. Consequently, we have K(f) = λf almost surely, and the result follows.

Theorems 4.2 and 4.5 provide conditions to identify Koopman eigenfunctions. The identified eigenfunctions can then span Koopman-invariant subspaces. Unlike [40, Algorithm 1], the methods proposed here do not require access to the system's multi-step trajectories and essentially use least-squares algorithms with closed-form solutions. However, one still needs to compare N_d potentially complex eigenvectors and their corresponding eigenvalues. This procedure can be impractical for large N_d. Moreover, since M ⊆ ℝ^n, the eigenfunctions of the Koopman operator form complex-conjugate pairs. Such pairs can be fully characterized using their real and imaginary parts, which allows one to work with real-valued functions instead. Motivated by these observations, we develop algorithms to directly identify Koopman-invariant subspaces.
V. IDENTIFICATION OF KOOPMAN-INVARIANT SUBSPACES VIA SYMMETRIC SUBSPACE DECOMPOSITION

Here we provide an algorithmic method to identify Koopman-invariant subspaces in the span of a predefined dictionary. We later show how this can be used to find Koopman eigenfunctions. Given a dictionary D : M → ℝ^{1×N_d} comprised of N_d linearly independent functions, we aim to find a dictionary D̃ : M → ℝ^{1×Ñ_d} with Ñ_d linearly independent functions such that the elements of D̃(x) span the maximal Koopman-invariant subspace in the span of D(x). Since span(D̃(x)) is Koopman-invariant, we have

R(D̃(x⁺)) = R(D̃(x)),  ∀x ∈ M.

This equality gets reflected in the data, i.e., given snapshot matrices X and Y, we have

R(D̃(Y)) = R(D̃(X)).    (13)

Moreover, since the elements of D̃(x) are in the span of D(x), there exists a full column rank matrix C such that D̃(x) = D(x)C. Thus, using (13), one can write

R(D(Y)C) = R(D(X)C).    (14)

Hence, we can reformulate the problem as a purely linear-algebraic problem consisting of finding the full column rank matrix C with the maximum number of columns such that (14) holds. To solve this problem, we propose the Symmetric Subspace Decomposition (SSD) method (cf. Algorithm 1). At each iteration, SSD prunes the dictionary and removes functions that do not evolve linearly in time according to the available data.
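Numerically, condition (14) can be checked with a standard rank test; the helper below is one possible sketch (our construction, with an illustrative tolerance).

```python
import numpy as np

def same_range(P, Q, tol=1e-10):
    """Check R(P) = R(Q) by comparing rank(P), rank(Q), and rank([P, Q])."""
    r_joint = np.linalg.matrix_rank(np.hstack([P, Q]), tol=tol)
    return (r_joint == np.linalg.matrix_rank(P, tol=tol)
            == np.linalg.matrix_rank(Q, tol=tol))

# Condition (14) for a candidate matrix C: same_range(DX @ C, DY @ C)
```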
Algorithm 1 Symmetric Subspace Decomposition

 1: Initialization
 2: i ← 1, A_1 ← D(X), B_1 ← D(Y), C_SSD ← I_{N_d}
 3: while 1 do
 4:   [Z_i^A; Z_i^B] ← null([A_i, B_i])    ▷ Basis for the null space
 5:   if null([A_i, B_i]) = ∅ then
 6:     return 0    ▷ The basis does not exist
 7:     break
 8:   end if
 9:   if ♯rows(Z_i^A) ≤ ♯cols(Z_i^A) then
10:     return C_SSD    ▷ The procedure is complete
11:     break
12:   end if
13:   C_SSD ← C_SSD Z_i^A    ▷ Reducing the subspace
14:   A_{i+1} ← A_i Z_i^A, B_{i+1} ← B_i Z_i^A, i ← i + 1
15: end while

In the pseudocode of Algorithm 1, the function null([A_i, B_i]) returns a basis for the null space of [A_i, B_i], and Z_i^A and Z_i^B in Step 4 have the same size.
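The following NumPy sketch implements Algorithm 1 under the stated assumptions (our construction; rank decisions use an illustrative tolerance, and the trivial output 0 is encoded as a matrix with zero columns):

```python
import numpy as np

def null_basis(M, tol=1e-10):
    """Orthonormal basis for the null space of M via the SVD (cf. Step 4)."""
    _, s, Vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vh[rank:].conj().T              # columns span null(M)

def ssd(DX, DY, tol=1e-10):
    """Symmetric Subspace Decomposition (Algorithm 1): returns C with
    R(D(X) C) = R(D(Y) C), maximal in the sense of Theorem 5.1(c)."""
    A, B = DX, DY
    C = np.eye(DX.shape[1])
    while True:
        Z = null_basis(np.hstack([A, B]), tol)   # null([A_i, B_i])
        if Z.shape[1] == 0:                      # the basis does not exist
            return np.zeros((DX.shape[1], 0))
        ZA = Z[: A.shape[1], :]                  # top block Z_i^A
        if ZA.shape[0] <= ZA.shape[1]:           # Step 9: procedure complete
            return C
        C = C @ ZA                               # reduce the subspace
        A, B = A @ ZA, B @ ZA
```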
The next result analyzes the convergence of SSD and characterizes the dimension, maximality, and symmetry of the subspace defined by its output.

Theorem 5.1: (Properties of SSD Output):
Suppose Assumption 3.1 holds. For matrices D(X), D(Y), let C_SSD = SSD(D(X), D(Y)). The SSD algorithm has the following properties:

(a) it stops after at most N_d iterations;
(b) the matrix C_SSD satisfies R(D(X) C_SSD) = R(D(Y) C_SSD); in addition, C_SSD is either 0 or has full column rank;
(c) the subspace R(D(X) C_SSD) is maximal, in the sense that, for any matrix E with R(D(X) E) = R(D(Y) E), we have R(D(X) E) ⊆ R(D(X) C_SSD) and R(E) ⊆ R(C_SSD);
(d) R(SSD(D(X), D(Y))) = R(SSD(D(Y), D(X))).

Proof: (a) First, we use (strong) induction to prove that, at each iteration, Z_i^A, Z_i^B are matrices with full column rank upon existence. By Assumption 3.1, A_1 and B_1 have full column rank. By Lemma A.1, one can then derive that Z_1^A and Z_1^B have full column rank. Now, suppose that the matrices Z_1^A, ..., Z_k^A and Z_1^B, ..., Z_k^B have full column rank. Using Assumption 3.1, one can deduce that A_{k+1} = A_1 Z_1^A ··· Z_k^A and B_{k+1} = B_1 Z_1^A ··· Z_k^A have full column rank, since they are products of matrices with full column rank. Using Lemma A.1, one can conclude that Z_{k+1}^A and Z_{k+1}^B have full column rank. Consequently, we have ♯rows(Z_i^A) ≥ ♯cols(Z_i^A). Hence, Step 9 of the SSD algorithm implies that the algorithm can only move to the next iteration if ♯rows(Z_i^A) > ♯cols(Z_i^A), which means that the number of columns in A_{i+1} and B_{i+1} decreases with respect to A_i and B_i. Hence, the algorithm terminates after at most N_d iterations, since A_1 and B_1 have N_d columns.

(b) The C_SSD = 0 case is trivial. Suppose that the algorithm stops after k iterations with nonzero C_SSD. This means that Z_k^A and Z_k^B are square full rank matrices. Also, by definition we have A_k Z_k^A = −B_k Z_k^B, which means that A_k = −B_k Z_k^B (Z_k^A)^{-1}. Noting that Z_k^B (Z_k^A)^{-1} is a full rank square matrix, one can derive R(A_k) = R(B_k). A closer look at the definitions shows that A_k = D(X) C_SSD and B_k = D(Y) C_SSD. Hence, R(D(X) C_SSD) = R(D(Y) C_SSD). Moreover, C_SSD = Z_1^A ··· Z_{k−1}^A and, considering the fact that Z_1^A, ..., Z_{k−1}^A have full column rank, one can deduce that C_SSD has full column rank.

(c) Suppose that the matrix E satisfies R(D(X) E) = R(D(Y) E). First, we use induction to prove that R(D(X) E) ⊆ R(A_i) ∩ R(B_i) for each iteration i that the algorithm goes through. Let i = 1; then A_1 = D(X) and B_1 = D(Y). Consequently, R(D(X) E) ⊆ R(A_1) and R(D(X) E) = R(D(Y) E) ⊆ R(B_1), based on the definition of E. Hence, R(D(X) E) ⊆ R(A_1) ∩ R(B_1). Now, suppose

R(D(X) E) ⊆ R(A_i) ∩ R(B_i).    (15)

Using Lemma A.1, one can derive R(A_i Z_i^A) = R(A_i) ∩ R(B_i). Combining this with (15), we get

R(D(X) E) ⊆ R(A_i Z_i^A) = R(D(X) Z_1^A ··· Z_i^A).    (16)

Using (16) with Lemma A.2, one can derive R(E) ⊆ R(Z_1^A ··· Z_i^A). Using Lemma A.2 once again, we get

R(D(X) E) = R(D(Y) E) ⊆ R(D(Y) Z_1^A ··· Z_i^A).    (17)
The definitions of A_{i+1} and B_{i+1}, along with (16) and (17), lead to the conclusion that R(D(X) E) ⊆ R(A_{i+1}) ∩ R(B_{i+1}), and the induction is complete.

Now, suppose that the algorithm terminates at iteration k. In the case that C_SSD = 0, we have R(A_k) ∩ R(B_k) = {0}, which means that E = 0 and R(D(X) E) ⊆ R(D(X) C_SSD). In the case that C_SSD ≠ 0, using the facts that R(D(X) E) ⊆ R(A_k) ∩ R(B_k), C_SSD = Z_1^A ··· Z_{k−1}^A, and R(D(X) C_SSD) = R(D(Y) C_SSD), one can deduce that R(D(X) E) ⊆ R(D(X) C_SSD). Moreover, using Assumption 3.1 and Lemma A.2, one can write R(E) ⊆ R(C_SSD).

(d) For convenience, let E_SSD = SSD(D(Y), D(X)). Based on the definitions of C_SSD and E_SSD, one can write

R(D(X) C_SSD) = R(D(Y) C_SSD),
R(D(X) E_SSD) = R(D(Y) E_SSD).

Using these equations in conjunction with the maximality of R(C_SSD) from part (c), we deduce R(E_SSD) ⊆ R(C_SSD). Using a similar argument and invoking the maximality of R(E_SSD), we deduce R(C_SSD) ⊆ R(E_SSD), and the proof follows.

Remark 5.2: (Time and Space Complexity of the SSD Algorithm):
Given N data snapshots and a dictionary with N_d elements, where usually N ≫ N_d, and assuming that operations on scalar elements require time and memory of order O(1), the most time- and memory-consuming operation in the SSD algorithm is Step 4. This step can be done by truncated Singular Value Decomposition (SVD) and finding the perpendicular space to the span of the right singular vectors, with time complexity O(N N_d²) and memory complexity O(N N_d), see e.g. [43]. Since, based on Theorem 5.1(a), the SSD algorithm terminates in at most N_d iterations, the total time complexity is O(N N_d³). However, since at each iteration we can reuse the memory for Step 4, the space complexity of SSD is O(N N_d). □

As mentioned earlier, SSD removes the functions that do not evolve linearly in time according to the available data snapshots. As a consequence, as we gather more data, the identified subspace either remains the same or gets smaller, as stated next.
Lemma 5.3: (Monotonicity of SSD Output with Respect to Data Addition):
Let D(X), D(Y) and D(X̂), D(Ŷ) be two pairs of dictionary snapshot matrices such that

rows([D(X), D(Y)]) ⊆ rows([D(X̂), D(Ŷ)]),    (18)

and for which Assumption 3.1 holds. Then

R(SSD(D(X̂), D(Ŷ))) ⊆ R(SSD(D(X), D(Y))).

Proof:
We use the shorthand notation Ĉ = SSD(D(X̂), D(Ŷ)) and C = SSD(D(X), D(Y)). From (18), we deduce that there exists a matrix E with rows(E) ⊆ rows(I_{♯rows(X̂)}) such that

E D(X̂) = D(X),  E D(Ŷ) = D(Y).    (19)

Moreover, based on the definition of Ĉ and Theorem 5.1(b), we have R(D(X̂)Ĉ) = R(D(Ŷ)Ĉ). Hence, there exists a full rank square matrix K̂ such that

D(Ŷ)Ĉ = D(X̂)Ĉ K̂.

Multiplying both sides from the left by E and using (19) gives D(Y)Ĉ = D(X)Ĉ K̂. Consequently, we have R(D(Y)Ĉ) = R(D(X)Ĉ). Now, the maximality of C (Theorem 5.1(c)) implies R(Ĉ) ⊆ R(C).

Remark 5.4: (Implementing SSD using Finite Precision and Approximation of Koopman-invariant Subspaces): Since SSD is an iterative method, its implementation on finite-precision digital computers leads to small errors that can affect the rank and null space of [A_i, B_i] in Step 4. To circumvent this issue, one can approximate [A_i, B_i] at each iteration by a close (in the Frobenius norm) low-rank matrix. Let σ_1 ≥ ... ≥ σ_{l_i} be the singular values of [A_i, B_i] ∈ ℝ^{N×l_i}. Given a design parameter ε > 0, let k_i be the minimum integer such that

( Σ_{j=k_i}^{l_i} σ_j² / Σ_{j=1}^{l_i} σ_j² )^{1/2} ≤ ε.    (20)

By setting σ_{k_i} = ··· = σ_{l_i} = 0 in the singular value decomposition of [A_i, B_i], one can construct the matrix [Â_i, B̂_i] with lower rank. Also,

‖[A_i, B_i] − [Â_i, B̂_i]‖_F / ‖[A_i, B_i]‖_F ≤ ε.    (21)

Hence, ε tunes the accuracy of the approximation. One can also use the same method to approximate Koopman-invariant subspaces, which is especially convenient when the dictionary does not contain a nontrivial Koopman-invariant subspace. □
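A possible implementation of this truncation (our sketch of (20)-(21), as reconstructed above):

```python
import numpy as np

def truncate(M, eps):
    """Replace M by a nearby lower-rank matrix: zero out the smallest
    singular values so the relative Frobenius error (21) is at most eps."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    # tail[k] = Frobenius norm of the discarded part if sigma_{k+1}.. are zeroed
    tail = np.sqrt(np.append(np.cumsum(s[::-1] ** 2)[::-1], 0.0))
    k = int(np.argmax(tail / np.linalg.norm(s) <= eps))  # minimal k_i in (20)
    s = s.copy()
    s[k:] = 0.0
    return (U * s) @ Vh
```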
If the output of the SSD algorithm is not trivial, C_SSD ≠ 0, we define the invariant dictionary as

D̃(x) := D(x) C_SSD.    (22)

Based on Theorem 5.1(b), we have

R(D̃(X)) = R(D̃(Y)).    (23)

Moreover, D̃(X) and D̃(Y) have full column rank as a result of Assumption 3.1 and Theorem 5.1(b). Hence, there exists a unique nonsingular square matrix K_SSD such that

D̃(Y) = D̃(X) K_SSD.    (24)

One can find K_SSD in closed form as

K_SSD = D̃(X)† D̃(Y) = (D(X) C_SSD)† (D(Y) C_SSD).    (25)

It is important to note that, based on (23), one can find K_SSD based only on partial data, instead of calculating the pseudo-inverse of the complete matrix D(X) C_SSD. Formally, consider full column rank data matrices D(X̂), D(Ŷ) such that rows([D(X̂), D(Ŷ)]) ⊆ rows([D(X), D(Y)]). Then, one can derive

K_SSD = (D(X̂) C_SSD)† (D(Ŷ) C_SSD).
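Computationally, once a nonzero C has been obtained (e.g., with the ssd sketch above), K_SSD in (25) follows from one more least-squares solve; a minimal sketch (our construction):

```python
import numpy as np

def k_ssd(DX, DY, C):
    """Restricted EDMD matrix (25): K_SSD = (D(X) C)^† (D(Y) C)."""
    return np.linalg.lstsq(DX @ C, DY @ C, rcond=None)[0]

# Each eigenpair (lambda, w) of K_SSD yields a function f(x) = D(x) C w
# with f(y_i) = lambda f(x_i) on the data (cf. Theorem 5.5 below).
# lams, W = np.linalg.eig(k_ssd(DX, DY, C))
```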
Next, we show that the eigenvectors of K_SSD fully characterize the functions that evolve linearly in time according to the available data.

Theorem 5.5: (Identification of Linear Evolutions using the SSD Algorithm):
Suppose that Assumption 3.1 holds. Let C_SSD = SSD(D(X), D(Y)) ≠ 0, K_SSD = (D(X) C_SSD)† (D(Y) C_SSD), and f(x) ∈ span(D(x)) written as f(x) = D(x)v with v ∈ ℂ^{N_d} \ {0}. Then f(y_i) = λf(x_i) for some λ ∈ ℂ \ {0} and for all i ∈ {1, ..., ♯rows(X)} if and only if v = C_SSD w with K_SSD w = λw.

Proof: (⇐): Based on the definition of K_SSD, Assumption 3.1, and considering the fact that C_SSD has full column rank (Theorem 5.1(b)), one can use (22)-(24) and the fact that K_SSD w = λw to write D(Y) C_SSD w = λ D(X) C_SSD w. Consequently, using v = C_SSD w, we have

D(Y)v = λD(X)v.

By inspecting the equation above in a row-wise manner, one can deduce that f(y_i) = λf(x_i) for all i ∈ {1, ..., ♯rows(X)}, as claimed.

(⇒): Based on the hypotheses, we have

D(Y)v = λD(X)v.    (26)

Consider first the case when v ∈ ℝ^{N_d}. Then, using (26), we deduce R(D(X)v) = R(D(Y)v). The maximality of C_SSD (Theorem 5.1(c)) implies that R(v) ⊆ R(C_SSD) and, consequently, v = C_SSD w for some w. Replacing v by C_SSD w in (26) and using the definition of K_SSD, one deduces K_SSD w = λw.

Now, suppose that v = v_R + j v_I with v_I ≠ 0. Since D(X) and D(Y) are real matrices, one can use (26) and write D(Y)v̄ = λ̄D(X)v̄. This, together with (26), implies

D(Y)E = D(X)E Λ,    (27)

where E = [v_R, v_I] and
Λ = [ Re(λ)  Im(λ); −Im(λ)  Re(λ) ].

Since Λ is full rank, we have R(D(X)E) = R(D(Y)E) and, using Theorem 5.1(c), one can conclude R(E) ⊆ R(C_SSD). Consequently, there exists a real matrix z such that E = C_SSD z. By replacing this in (27), multiplying both sides from the right by r = [1, j]^T, and defining w = zr, one can conclude that v = Er = C_SSD w and D(Y) C_SSD w = λ D(X) C_SSD w. This, in conjunction with the definition of K_SSD, implies that K_SSD w = λw, concluding the proof.

Using Theorem 5.5, one can identify all the linear evolutions in the span of the original dictionary, thereby establishing an equivalence with the forward-backward EDMD characterization of Section IV.

Corollary 5.6: (Equivalence of Forward-Backward EDMD and SSD in the Identification of Linear Evolutions):
Suppose that Assumption 3.1 holds. Let K_f = K_EDMD(D(X), D(Y)), K_b = K_EDMD(D(Y), D(X)), C_SSD = SSD(D(X), D(Y)) ≠ 0, and K_SSD = (D(X) C_SSD)† (D(Y) C_SSD). Then, K_f v = λv and K_b v = λ^{-1} v for some v ∈ ℂ^{N_d} \ {0} and λ ∈ ℂ \ {0} if and only if there exists a vector w such that v = C_SSD w and K_SSD w = λw.

The proof of this result is a consequence of Theorems 4.2 and 5.5. Note that the linear evolutions identified by SSD might not be Koopman eigenfunctions, since we can only guarantee that they evolve linearly according to the available data snapshots, not starting everywhere in the state space M. The following result uses the equivalence between SSD and the forward-backward EDMD method to provide a guarantee for the identification of Koopman eigenfunctions.

Theorem 5.7: (Identification of Koopman Eigenfunctions by the SSD Algorithm):
Given an infinite sampling, suppose that the sequence of dictionary snapshot matrices is R-rich. For N ≥ R, let C_N^SSD = SSD(D(X_N), D(Y_N)) ≠ 0 and K_N^SSD = (D(X_N) C_N^SSD)† (D(Y_N) C_N^SSD). Given v ∈ ℂ^{N_d} \ {0} and λ ∈ ℂ \ {0}, let f(x) = D(x)v. Then,

(a) if f is an eigenfunction of the Koopman operator with eigenvalue λ, then, for every N ≥ R, there exists w_N such that v = C_N^SSD w_N and K_N^SSD w_N = λw_N;
(b) conversely, and assuming the dictionary functions are continuous and Assumption 4.3 holds, if v ∈ R(C_N^SSD) and there exists w_N such that v = C_N^SSD w_N and K_N^SSD w_N = λw_N for every N ≥ R, then f is an eigenfunction of the Koopman operator with probability 1.

This result is a consequence of Theorem 4.5 and Corollary 5.6. Theorem 5.7 shows that the SSD algorithm finds all the eigenfunctions in the span of the original dictionary almost surely. The identified eigenfunctions span a Koopman-invariant subspace. This subspace, however, is not necessarily the maximal Koopman-invariant subspace in the span of the original dictionary. Next, we show that the SSD method actually identifies the maximal Koopman-invariant subspace in the span of the dictionary.

Theorem 5.8: (SSD Finds the Maximal Koopman-Invariant Subspace as N → ∞): Given an infinite sampling and a dictionary composed of continuous functions, suppose that the sequence of dictionary snapshot matrices is R-rich and Assumption 4.3 holds. Let the columns of C_∞^SSD form a basis for lim_{N→∞} R(C_N^SSD), i.e.,

R(C_∞^SSD) = lim_{N→∞} R(C_N^SSD) = ∩_{N=R}^∞ R(C_N^SSD)    (28)

(note that the sequence {R(C_N^SSD)}_{N=1}^∞ is monotonic, and hence convergent). Then span(D(x) C_∞^SSD) is the maximal Koopman-invariant subspace in the span of the dictionary D with probability 1.

Proof: If C_∞^SSD = 0, considering the fact that, for all N ≥ R, R(C_{N+1}^SSD) ⊆ R(C_N^SSD) (Lemma 5.3), one deduces that there exists m ∈ ℕ such that, for all i ≥ m, C_i^SSD = 0. Hence, based on Theorem 5.1(c), the maximal Koopman-invariant subspace acquired from the data is {0}. Noting that the subspace identified by SSD contains the maximal Koopman-invariant subspace, we deduce that the latter is the zero subspace, which is indeed spanned by D(x) C_∞^SSD.

Now, suppose that C_∞^SSD ≠ 0 and has full column rank. First, we show that

R(D(X_N) C_∞^SSD) = R(D(Y_N) C_∞^SSD),  ∀N ≥ R.    (29)

Considering (28) and the fact that, for all N ≥ R, R(C_{N+1}^SSD) ⊆ R(C_N^SSD), we can write, for all N ≥ R,

R(C_∞^SSD) = ∩_{i=N}^∞ R(C_i^SSD).

Invoking Lemma A.4, we have, for all N ≥ R,

R(D(X_N) C_∞^SSD) = ∩_{i=N}^∞ R(D(X_N) C_i^SSD),    (30a)
R(D(Y_N) C_∞^SSD) = ∩_{i=N}^∞ R(D(Y_N) C_i^SSD).    (30b)

Moreover, for all i ≥ N we have R(D(X_i) C_i^SSD) = R(D(Y_i) C_i^SSD) and hence, by looking at this equality in a row-wise manner, one can write

R(D(X_N) C_i^SSD) = R(D(Y_N) C_i^SSD),  ∀i ≥ N.    (31)

The combination of (30) and (31) yields (29). Based on the latter, the fact that D(X_N) and D(Y_N) have full column rank for every N ≥ R, and the fact that C_∞^SSD has full column rank, there exists a unique nonsingular square matrix K_∞^SSD ∈ ℝ^{♯cols(C_∞^SSD)×♯cols(C_∞^SSD)} such that

D(X_N) C_∞^SSD K_∞^SSD = D(Y_N) C_∞^SSD,  ∀N ≥ R.    (32)

Note that K_∞^SSD does not depend on N.
Next, we aim to prove that, for every function f ∈ span(D(x) C_∞^SSD), K(f) is also in span(D(x) C_∞^SSD) almost surely. Let v ∈ ℝ^{♯cols(C_∞^SSD)} be such that f(x) = D(x) C_∞^SSD v and define

g(x) := D(x) C_∞^SSD K_∞^SSD v.    (33)

We show that g = f ∘ T = K(f) almost surely. Define the function h := g − f ∘ T. Also, let S_∞ = ∪_{N=R}^∞ S_N be the set of initial conditions. Based on (32), (33), and the definition of h, we have

h(x) = 0,  ∀x ∈ S_∞.

Moreover, h is continuous since D and T are continuous. This, together with the fact that S_∞ is dense in M almost surely (Assumption 4.3), implies h ≡ 0 on M almost surely. Therefore, g = K(f) = f ∘ T with probability 1. Noting that g(x) ∈ span(D(x) C_∞^SSD), we have proven that span(D(x) C_∞^SSD) is Koopman-invariant almost surely.

Finally, we prove the maximality of span(D(x) C_∞^SSD). Let L be a Koopman-invariant subspace in span(D(x)). Then there exists a full column rank matrix E such that L = span(D(x) E). Moreover, since the invariance of L is reflected in the data, we have

R(D(X_N) E) = R(D(Y_N) E),  ∀N ≥ R.

As a result, based on Theorem 5.1(c), we have R(E) ⊆ R(C_N^SSD) for all N ≥ R, and hence R(E) ⊆ R(C_∞^SSD). Therefore, by Lemma A.2 we have L = span(D(x) E) ⊆ span(D(x) C_∞^SSD), which completes the proof.

Remark 5.9: (Generalized Koopman Eigenfunctions):
One can also extend the above discussion to generalized Koopman eigenfunctions (see e.g. [6, Remark 11]). Given a generalized eigenvector w of K_SSD, the corresponding generalized Koopman eigenfunction is φ(x) = D(x) C_SSD w. □
VI. STREAMING SYMMETRIC SUBSPACE DECOMPOSITION
In this section, we consider the setup where data becomes available in a streaming fashion. A straightforward algorithmic solution for this scenario would be to re-run, at each timestep, the SSD algorithm with all the data available up to then. However, this approach does not take advantage of the answers computed in previous timesteps, and may become inefficient when the size of the data is large. Instead, here we pursue the design of an online algorithm, termed Streaming Symmetric Subspace Decomposition (SSSD), cf. Algorithm 2, that updates the identified subspaces using the previously computed ones. Note that the SSSD algorithm is not only useful for streaming data sets but also for the case of non-streaming large data sets, for which the execution of SSD requires a significant amount of memory.
Algorithm 2
Streaming Symmetric Subspace Decomposition

 1: Initialization
 2: D_S^X(1) ← [D(X_S); D(x_{S+1})], D_S^Y(1) ← [D(Y_S); D(y_{S+1})]
 3: i ← 1, A_1 ← D_S^X(1), B_1 ← D_S^Y(1), C_0 ← I_{N_d}
 4: while 1 do
 5:   if C_{i−1} = 0 then
 6:     C_i ← 0    ▷ The basis does not exist
 7:     return C_i
 8:     break
 9:   end if
10:   F_i ← SSD(A_i, B_i)
11:   if F_i = 0 then
12:     C_i ← 0    ▷ The basis does not exist
13:     return C_i
14:     break
15:   end if
16:   if ♯rows(F_i) > ♯cols(F_i) then
17:     C_i ← basis(R(C_{i−1} F_i))    ▷ Subspace reduction
18:   else
19:     C_i ← C_{i−1}    ▷ No change
20:   end if
21:   return C_i
22:   i ← i + 1
23:   ▷ Replacing the last data snapshot with the new one
24:   D_S^X(i) ← [D(X_S); D(x_{S+i})], D_S^Y(i) ← [D(Y_S); D(y_{S+i})]
25:   ▷ Calculating the reduced dictionary snapshots
26:   A_i ← D_S^X(i) C_{i−1}, B_i ← D_S^Y(i) C_{i−1}
27: end while

Given the signature snapshot matrices X_S and Y_S, for some S ∈ ℕ, and a dictionary of functions D, the SSSD algorithm proceeds as follows: at each iteration, the algorithm receives a new pair of data snapshots, combines them with the signature data matrices, and applies the latest available dictionary to them. Then, it uses SSD on those dictionary matrices and further prunes the dictionary.
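A sketch of one SSSD iteration in NumPy (our construction; it reuses the ssd helper from the Algorithm 1 sketch, and a QR factorization plays the role of the basis(·) function):

```python
import numpy as np

def sssd_step(DXS, DYS, Dx_new, Dy_new, C_prev, tol=1e-10):
    """One iteration of Algorithm 2. DXS, DYS hold the signature dictionary
    matrices D(X_S), D(Y_S); Dx_new, Dy_new hold the dictionary evaluated at
    the newest snapshot pair; C_prev is the basis from the previous step."""
    if C_prev.shape[1] == 0:                  # C_{i-1} = 0: basis does not exist
        return C_prev
    A = np.vstack([DXS, Dx_new]) @ C_prev     # A_i = D_S^X(i) C_{i-1}
    B = np.vstack([DYS, Dy_new]) @ C_prev     # B_i = D_S^Y(i) C_{i-1}
    F = ssd(A, B, tol)                        # Step 10
    if F.shape[1] == 0:                       # F_i = 0: basis does not exist
        return F
    if F.shape[0] > F.shape[1]:               # Step 17: subspace reduction
        Q, _ = np.linalg.qr(C_prev @ F)       # orthonormal basis(R(C_{i-1} F_i))
        return Q
    return C_prev                             # Step 19: no change
```

Feeding the snapshot pairs one at a time and passing the returned basis back in as C_prev yields, by Theorem 6.3 below, the same subspace SSD would compute on all the data received so far.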
Since the SSD algorithm relies on Assumption 3.1, we make the following assumption on the signature snapshots and the original dictionary.

Assumption 6.1: (Full Rank Signature Dictionary Matrices): We assume that there exists S ∈ ℕ such that the matrices D(X_S) and D(Y_S) have full column rank. □

For a finite number of data snapshots, Assumption 6.1 is equivalent to Assumption 3.1. For an infinite sampling, Assumption 6.1 holds for an R-rich sequence of snapshot matrices.

The next result discusses the basic properties of the SSSD output at each iteration.

Proposition 6.2: (Properties of SSSD Output):
Suppose Assumption 6.1 holds. For i ∈ ℕ, let C_i denote the output of the SSSD algorithm at the i-th iteration. Then, for all i ∈ ℕ,

(a) C_i has full column rank or is equal to zero;
(b) R(C_i) ⊆ R(C_{i−1});
(c) R(D_S^X(i) C_i) = R(D_S^Y(i) C_i).

Proof: (a) We prove the claim by induction. C_0 = I_{N_d} and has full column rank. Now, suppose that C_k has full column rank or is zero. We show the same fact for C_{k+1}. If C_k = 0, then SSSD executes Step 6 and we have C_{k+1} = 0. Now, suppose that C_k has full column rank. Considering the fact that D_S^X(k+1) and D_S^Y(k+1) have full column rank, one can deduce that A_{k+1} and B_{k+1} have full column rank. Consequently, based on Theorem 5.1(b), F_{k+1} has full column rank or is equal to zero. In the former case, the algorithm executes Step 17 or Step 19 and, based on the definition of the basis function and the fact that C_k has full column rank, one deduces that C_{k+1} has full column rank. In the latter case, the algorithm executes Step 12 and C_{k+1} = 0, as claimed.

Now we prove (b). Note that, at iteration i, C_i will be determined by either Step 6, 12, 19, or 17. The proof for the first three cases is trivial. We only need to prove the result for the case when the SSSD algorithm executes Step 17. Based on Theorem 5.1(b), one can deduce that F_i has full column rank. Also, we have R(F_i) ⊆ R(I_{♯cols(C_{i−1})}). Hence, using Step 17 and Lemma A.2, one can write

R(C_i) = R(C_{i−1} F_i) ⊆ R(C_{i−1} I_{♯cols(C_{i−1})}) = R(C_{i−1}),

as claimed.

Next, we prove part (c). If the SSSD algorithm executes Step 6 or Step 12, then the result follows directly. Now, suppose that the algorithm executes Step 17 or Step 19. Note that if the algorithm executes one of these two steps, then F_i ≠ 0, C_{i−1} ≠ 0, and they have full column rank (Theorem 5.1(b)). Hence, ♯rows(F_i) ≥ ♯cols(F_i). As a result, if the algorithm executes Step 19, we have ♯rows(F_i) = ♯cols(F_i) and, consequently, R(F_i) = R(I_{♯cols(C_{i−1})}). Therefore,

R(C_i) = R(C_{i−1}) = R(C_{i−1} I_{♯cols(C_{i−1})}) = R(C_{i−1} F_i).    (34)

Moreover, if the SSSD algorithm executes Step 17, then, using the definition of the basis function, we have

R(C_i) = R(C_{i−1} F_i).    (35)

Also, based on the definition of F_i at Step 10, Theorem 5.1(b), and the fact that A_i = D_S^X(i) C_{i−1} and B_i = D_S^Y(i) C_{i−1},

R(D_S^X(i) C_{i−1} F_i) = R(D_S^Y(i) C_{i−1} F_i).

Using this in conjunction with (34) upon execution of Step 19 and (35) upon execution of Step 17, one deduces R(D_S^X(i) C_i) = R(D_S^Y(i) C_i), concluding the proof.

Next, we show that the SSSD algorithm at each iteration identifies exactly the same subspace as the SSD algorithm given all the data up to that iteration.

Theorem 6.3: (Equivalence of SSD and SSSD):
Suppose Assumption 6.1 holds. For i ∈ ℕ, let C_i denote the output of the SSSD algorithm at the i-th iteration and let C_i^SSD = SSD(D(X_{S+i}), D(Y_{S+i})). Then,

R(C_i) = R(C_i^SSD),  ∀i ∈ ℕ.

Proof: Inclusion R(C_i^SSD) ⊆ R(C_i) for all i ∈ ℕ: We reason by induction. Note that, in the SSSD algorithm, for i = 1 we have F_1 = C_1^SSD. As a result, if F_1 = 0, then, based on Step 12, C_1 = C_1^SSD = 0. If the SSSD algorithm executes Step 17, then, using the fact that C_0 = I_{N_d}, one can write R(C_1) = R(C_1^SSD). Moreover, if the SSSD algorithm executes Step 19, based on Step 16 and Theorem 5.1(b), one can deduce that R(C_1^SSD) = R(C_1) = R(F_1) = R(I_{N_d}). Consequently, in all cases

R(C_1^SSD) = R(C_1).    (36)

Hence, R(C_1^SSD) ⊆ R(C_1). Now, suppose that

R(C_k^SSD) ⊆ R(C_k).    (37)

We need to show that R(C_{k+1}^SSD) ⊆ R(C_{k+1}). If C_{k+1}^SSD = 0, then the proof follows. Now assume that C_{k+1}^SSD ≠ 0; it then has full column rank based on Theorem 5.1(b). By Lemma 5.3, we have

R(C_{k+1}^SSD) ⊆ R(C_k^SSD).    (38)

Using (37) and (38), one can deduce that R(C_{k+1}^SSD) ⊆ R(C_k). Consequently, based on the fact that C_{k+1}^SSD ≠ 0, we have C_k ≠ 0, and hence C_k has full column rank based on Proposition 6.2(a). Moreover, there exists a matrix E_k with full column rank such that

C_{k+1}^SSD = C_k E_k.    (39)

Two cases are possible. In case 1, the SSSD algorithm executes Step 19. In case 2, the algorithm executes Step 12 or Step 17. For case 1, we have C_{k+1} = C_k. Consequently, using (39) and considering the facts that R(E_k) ⊆ R(I_{♯cols(C_k)}) and that C_k has full column rank, one can use Lemma A.2 and conclude

R(C_{k+1}^SSD) = R(C_k E_k) ⊆ R(C_k) = R(C_{k+1}).    (40)

Now, consider case 2. In this case, we have

R(C_{k+1}) = R(C_k F_{k+1}).    (41)

Also, based on the definition of C_{k+1}^SSD and Theorem 5.1(b), one can write

R(D(X_{S+k+1}) C_{k+1}^SSD) = R(D(Y_{S+k+1}) C_{k+1}^SSD).

Looking at this equation in a row-wise manner and considering the fact that

rows([D_S^X(k+1), D_S^Y(k+1)]) ⊆ rows([D(X_{S+k+1}), D(Y_{S+k+1})]),

one can write

R(D_S^X(k+1) C_{k+1}^SSD) = R(D_S^Y(k+1) C_{k+1}^SSD).

Now, using (39), we have R(D_S^X(k+1) C_k E_k) = R(D_S^Y(k+1) C_k E_k). Also, noting the definition of F_{k+1} and the fact that A_{k+1} = D_S^X(k+1) C_k and B_{k+1} = D_S^Y(k+1) C_k, one can use Theorem 5.1(c) to write R(E_k) ⊆ R(F_{k+1}). Since C_k has full column rank, one can use (39), (41), and Lemma A.2 to write

R(C_{k+1}^SSD) = R(C_k E_k) ⊆ R(C_k F_{k+1}) = R(C_{k+1}).    (42)

In both cases, equations (40) and (42) conclude the induction.

Inclusion R(C_i) ⊆ R(C_i^SSD) for all i ∈ ℕ: We reason by induction too. Using (36), we have R(C_1) ⊆ R(C_1^SSD). Now, suppose that

R(C_k) ⊆ R(C_k^SSD).    (43)

We prove the same result for k+1. If C_{k+1} = 0, then the result directly follows. Now, assume that C_{k+1} ≠ 0. Consequently, based on (43), Proposition 6.2(a), and Theorem 5.1(b), we deduce that C_{k+1} and C_{k+1}^SSD have full column rank.

The first part of the proof and (43) imply that R(C_k) = R(C_k^SSD). Consequently, noting the fact that C_k^SSD is the output of the SSD algorithm with D(X_{S+k}) and D(Y_{S+k}), one can use Theorem 5.1(b) to write

R(D(X_{S+k}) C_k) = R(D(Y_{S+k}) C_k).    (44)

Moreover, based on Proposition 6.2(b), we have R(C_{k+1}) ⊆ R(C_k).
Hence, since $C_k$ and $C_{k+1}$ have full column rank, there exists a matrix $G_k$ with full column rank such that
$$C_{k+1} = C_k G_k. \quad (45)$$
Also, based on Proposition 6.2(c) at iteration $k+1$ of the SSSD algorithm,
$$\mathcal{R}\big(D_{XS}(k+1) C_{k+1}\big) = \mathcal{R}\big(D_{YS}(k+1) C_{k+1}\big). \quad (46)$$
Consequently, based on (45) and (46), we have $\mathcal{R}\big(D(X_S) C_k G_k\big) = \mathcal{R}\big(D(Y_S) C_k G_k\big)$. Using this equation together with (44) and Lemma A.3, one can write
$$\mathcal{R}\big(D(X_{S+k}) C_k G_k\big) = \mathcal{R}\big(D(Y_{S+k}) C_k G_k\big).$$
Moreover, using (45), one can write
$$\mathcal{R}\big(D(X_{S+k}) C_{k+1}\big) = \mathcal{R}\big(D(Y_{S+k}) C_{k+1}\big).$$
Hence, there exists a nonsingular square matrix $K^*$ such that
$$D(X_{S+k}) C_{k+1} K^* = D(Y_{S+k}) C_{k+1}. \quad (47)$$
Also, based on (46) and noting that $D_{XS}(k+1)$, $D_{YS}(k+1)$, and $C_{k+1}$ have full column rank, there exists a nonsingular square matrix $K$ such that
$$D_{XS}(k+1) C_{k+1} K = D_{YS}(k+1) C_{k+1}. \quad (48)$$
Using the first $S$ rows of (47) and (48), one can write
$$D(X_S) C_{k+1} K^* = D(Y_S) C_{k+1}, \qquad D(X_S) C_{k+1} K = D(Y_S) C_{k+1}.$$
By subtracting the second equation from the first one, we get $D(X_S) C_{k+1} (K^* - K) = 0$. Moreover, since $D(X_S) C_{k+1}$ has full column rank, we deduce $K^* = K$. Using this in conjunction with (47) and (48) leads to
$$\mathcal{R}\big(D(X_{S+k+1}) C_{k+1}\big) = \mathcal{R}\big(D(Y_{S+k+1}) C_{k+1}\big). \quad (49)$$
Based on this equation, the definition of $C^{\rm SSD}_{k+1}$, and Theorem 5.1(c), we deduce that $\mathcal{R}(C_{k+1}) \subseteq \mathcal{R}(C^{\rm SSD}_{k+1})$, concluding the proof.

Theorem 6.3 establishes the equivalence between the SSSD and SSD algorithms. As a consequence, all results regarding the identification of Koopman-invariant subspaces and eigenfunctions presented in Section V are also valid for the output of the SSSD algorithm.
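Theorem 6.3 also suggests a simple numerical sanity check: run the streaming update and the batch SSD side by side on the same data and compare the ranges of their outputs. Below is a sketch, reusing the hypothetical ssd and sssd_step helpers above and comparing subspaces via principal angles:

```python
import numpy as np
from scipy.linalg import orth, subspace_angles

def same_range(C1, C2, tol=1e-8):
    # R(C1) == R(C2)? Equal-dimensional subspaces coincide iff all their
    # principal angles vanish.
    if C1.shape[1] != C2.shape[1]:
        return False
    if C1.shape[1] == 0:
        return True
    return np.max(subspace_angles(orth(C1), orth(C2))) < tol

# DX, DY: dictionary snapshot matrices; the first S rows are taken as the
# signature snapshots (e.g., S = N_d).
S = DX.shape[1]
C = np.eye(DX.shape[1])                   # C_0 = I_{N_d}
for i in range(1, DX.shape[0] - S + 1):
    new = slice(S + i - 1, S + i)         # the i-th streamed snapshot pair
    C = sssd_step(C, np.vstack([DX[:S], DX[new]]),
                     np.vstack([DY[:S], DY[new]]))
    assert same_range(C, ssd(DX[:S + i], DY[:S + i]))   # Theorem 6.3
```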
Remark 6.4: (Time and Space Complexity of the SSSD Algorithm): Given the first $N$ data snapshots and a dictionary with $N_d$ elements, with $N > S \geq N_d$, and assuming that operations on scalar elements require time and space of order $O(1)$, the most time- and memory-consuming operation in the SSSD algorithm is Step 10, which invokes SSD. In this step, the most time-consuming operation is performing the SVD, with time complexity $O(S N_d^2)$ and space complexity $O(S N_d)$, see e.g. [43]. After the first SVD has been performed, the ensuing ones result in a reduction of the dimension of the subspace. Therefore, the SSSD algorithm performs at most $N - S$ SVDs with no subspace reduction, with overall time complexity $O(N S N_d^2)$, and at most $N_d$ SVD operations with subspace reductions, with overall time complexity $O(S N_d^3)$. Considering the fact that $N \geq N_d$, the complexity of the SSSD algorithm is $O(N S N_d^2)$. Moreover, in many real-world applications $S = O(N_d)$ (in fact, usually $S = N_d$), which reduces the time complexity of SSSD to $O(N N_d^3)$, the same complexity as SSD. Moreover, since we can reuse the space used in Step 10 at each iteration, and considering the fact that the space complexity of this step is $O(S N_d)$, we deduce that the space complexity of SSSD is $O(S N_d)$. This usually reduces to $O(N_d^2)$, since $S = O(N_d)$ in many real-world applications. □
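For concreteness, the operation count in Remark 6.4 can be summarized in one line (using the SVD costs quoted above):
$$\underbrace{(N-S)\cdot O(S N_d^2)}_{\text{SVDs without subspace reduction}} \; + \; \underbrace{N_d \cdot O(S N_d^2)}_{\text{SVDs with subspace reduction}} \; = \; O(N S N_d^2),$$
which becomes $O(N N_d^3)$ when $S = O(N_d)$.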
Remark 6.5: (SSSD is More Stable and Runs Faster than SSD): The SSSD algorithm is computationally more stable than SSD, since it always works with matrices of size at most $(S+1) \times N_d$, while SSD works with matrices of size $N \times N_d$. Moreover, even though SSD and SSSD have the same time complexity, the SSSD algorithm runs faster for two reasons. First, at each iteration of the SSSD algorithm, the dictionary gets smaller, which reduces the cost of computation for the remaining data snapshots. Second, the characterizations in Remarks 5.2 and 6.4 only consider the number of floating-point operations for the time complexity and ignore the amount of time spent loading the data. SSSD needs to load significantly smaller data matrices, which leads to a considerable reduction in run time compared to SSD. □
VII. SIMULATION RESULTS
Here, we illustrate the efficacy of the proposed methods in two examples. We have purposely chosen examples that result in sparse, low-dimensional representations in order to be able to fully detail the identified Koopman eigenvalues and associated subspaces. It is worth pointing out, however, that the results presented here apply without any restriction on the type of dynamical system, its dimension, or the sparsity of the model in the dictionary.
Example 7.1: (Linear System with Non-invariant Dictionary):
Consider the system
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^{+} = \begin{bmatrix} 0.8 & 0.6 \\ -0.6 & 0.8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad (50)$$
with state $x = [x_1, x_2]^T$. We use the non-invariant dictionary $D(x) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_2^3]$ with $N_d = 9$. Moreover, we gather data snapshots uniformly sampled from a square region of the state space centered at the origin.

We use the SSD and SSSD strategies to identify the maximal Koopman-invariant subspace in $\mathrm{span}(D(x))$. In the SSSD method, we use the first 9 data snapshots as signature snapshots and feed the rest of the data to the algorithm in the order in which they appear in the data set. Since we perform our calculations using finite-precision digital computers, we use the strategy explained in Remark 5.4 with a small tolerance $\epsilon$. Both methods find bases (dictionaries) for the subspace spanned by $\{1, x_1, x_2, x_1^2, x_1 x_2, x_2^2\}$, of dimension 6. It is worth mentioning that SSSD performs the calculations 97% faster than SSD. Using either of the calculated dictionaries with (25), one can find $K_{\rm SSD}$. Moreover, based on Theorems 5.5 and 5.7, we use the eigendecomposition of $K_{\rm SSD}$ to find all the Koopman eigenfunctions associated with system (50) in $\mathrm{span}(D(x))$. Table I shows the identified eigenfunctions and eigenvalues (eigenfunctions are reported up to nonzero scaling). One can readily verify that all the identified eigenfunctions evolve linearly in time according to the dynamics.

TABLE I: Identified eigenfunctions and eigenvalues of the Koopman operator associated with system (50).
Eigenfunction                                   Eigenvalue
$\phi_1(x) = 1$                                 $\lambda_1 = 1$
$\phi_2(x) = x_1 - x_2\, j$                     $\lambda_2 = 0.8 + 0.6j$
$\phi_3(x) = x_1 + x_2\, j$                     $\lambda_3 = 0.8 - 0.6j$
$\phi_4(x) = x_1^2 + x_2^2$                     $\lambda_4 = 1$
$\phi_5(x) = (x_1^2 - x_2^2) - 2 x_1 x_2\, j$   $\lambda_5 = 0.28 + 0.96j$
$\phi_6(x) = (x_1^2 - x_2^2) + 2 x_1 x_2\, j$   $\lambda_6 = 0.28 - 0.96j$

To compare the effectiveness of our methods with EDMD for trajectory prediction, we define error functions that quantify the accuracy of the predictions. Let $\tilde{D}(x)$ be the invariant dictionary identified by SSD (or, equivalently, by SSSD). Given a trajectory $\{x(k)\}_{k=0}^{M}$ of length $M$ and initial condition $x(0)$, we define the following relative error functions for EDMD and SSD:
$$E^{\rm EDMD}_{\rm relative}(k) = \frac{\big\| D(x(k)) - D(x(0)) \big(K_{\rm EDMD}\big)^k \big\|}{\| D(x(k)) \|} \times 100, \quad (51a)$$
$$E^{\rm SSD}_{\rm relative}(k) = \frac{\big\| \tilde{D}(x(k)) - \tilde{D}(x(0)) \big(K_{\rm SSD}\big)^k \big\|}{\| \tilde{D}(x(k)) \|} \times 100, \quad (51b)$$
where $D(x(0)) (K_{\rm EDMD})^k$ and $\tilde{D}(x(0)) (K_{\rm SSD})^k$ are the predicted dictionary vectors at time $k$ calculated using EDMD and SSD, respectively. Also, we can use the angle between the predicted and exact dictionary vectors to quantify the error:
$$E^{\rm EDMD}_{\rm angle}(k) = \angle\big( D(x(k)),\, D(x(0)) (K_{\rm EDMD})^k \big), \quad (52a)$$
$$E^{\rm SSD}_{\rm angle}(k) = \angle\big( \tilde{D}(x(k)),\, \tilde{D}(x(0)) (K_{\rm SSD})^k \big). \quad (52b)$$
Figure 1 illustrates the aforementioned errors along a trajectory starting from a random initial condition in the sampling region for 20 time steps. Unlike EDMD, the linear prediction using the SSD algorithm matches the behavior of the system exactly. Even though EDMD captures the Koopman eigenfunctions exactly (Lemma 4.1), the additional functions in the dictionary do not evolve according to the acquired model, which leads to high prediction errors even after one time step.

Fig. 1: Relative (left) and angle (right) prediction errors for EDMD and SSD for system (50) on a trajectory of length $M = 20$.
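The experiment in this example can be reproduced with a few lines of numerical code. Below is a minimal sketch in Python (the sample count, sampling box, and random seed are illustrative choices, and ssd is the hypothetical helper sketched in the previous section); it builds the dictionary snapshots, computes $K_{\rm EDMD}$ and $K_{\rm SSD}$, and evaluates the relative error (51b) along a trajectory:

```python
import numpy as np

def D(X):
    # dictionary of Example 7.1 evaluated row-wise on states X (n x 2)
    x1, x2 = X[:, :1], X[:, 1:]
    return np.hstack([np.ones_like(x1), x1, x2, x1**2, x1*x2, x2**2,
                      x1**3, x1**2*x2, x2**3])

A = np.array([[0.8, 0.6], [-0.6, 0.8]])        # system (50)
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(5000, 2))     # sampled states (illustrative)
Y = X @ A.T                                    # one-step images under (50)
DX, DY = D(X), D(Y)

K_edmd = np.linalg.pinv(DX) @ DY               # EDMD matrix on the full dictionary
C = ssd(DX, DY)                                # basis of the invariant subspace
K_ssd = np.linalg.pinv(DX @ C) @ (DY @ C)      # K_SSD as in (25)

x = rng.uniform(-2.0, 2.0, size=(1, 2))        # random initial condition
pred = D(x) @ C                                # invariant dictionary at time 0
for k in range(1, 21):
    x, pred = x @ A.T, pred @ K_ssd            # true state vs. linear prediction
    err = 100 * np.linalg.norm(D(x) @ C - pred) / np.linalg.norm(D(x) @ C)
    # err stays at the level of numerical round-off, cf. Fig. 1 (left)
```

Running the same loop with K_edmd and the full dictionary $D(x)$ reproduces the rapidly growing EDMD error in Fig. 1.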
Consider the nonlinear system
$$x_1^{+} = 1.2\, x_1, \qquad x_2^{+} = 1.24\, x_2 + 0.01\, x_1^2 + 0.012, \quad (53)$$
with state $x = [x_1, x_2]^T$. System (53) is actually an unstable Polyflow [44], which has a finite-dimensional Koopman-invariant subspace comprised of polynomials. We use the dictionary $D(x) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3]$ with $N_d = 10$. Moreover, we gather data snapshots uniformly sampled from a square region of the state space. We use the SSD and SSSD strategies to identify the maximal Koopman-invariant subspace in $\mathrm{span}(D(x))$. In the SSSD method, we use the first 10 data snapshots as signature snapshots and feed the rest of the data to the algorithm in the order in which they appear in the data set. Similarly to the previous example, we use the strategy explained in Remark 5.4 with a small tolerance $\epsilon$ to overcome errors due to the use of finite-precision machines. Both methods find bases for the 6-dimensional subspace spanned by $\{1, x_1, x_2, x_1^2, x_1 x_2, x_1^3\}$, which is the maximal Koopman-invariant subspace in $\mathrm{span}(D(x))$. The SSSD method, however, performs the calculations 96% faster than SSD. One can find $K_{\rm SSD}$ using either of the identified dictionaries in conjunction with (25). Moreover, based on Theorems 5.5 and 5.7, we use the eigendecomposition of $K_{\rm SSD}$ to find all the Koopman eigenfunctions associated with system (53) in $\mathrm{span}(D(x))$. Table II shows the identified eigenfunctions. One can verify by direct calculation that the identified functions are Koopman eigenfunctions associated with system (53). Note that since $x_1$ and $x_2$ are both in the span of the identified Koopman-invariant subspace, one can fully characterize the behavior of the system linearly using the eigenfunctions and (6), or directly using the identified dictionary and $K_{\rm SSD}$.

TABLE II: Identified eigenfunctions and eigenvalues of the Koopman operator associated with system (53).
Eigenfunction                               Eigenvalue
$\phi_1(x) = 1$                             $\lambda_1 = 1$
$\phi_2(x) = x_1$                           $\lambda_2 = 1.2$
$\phi_3(x) = x_1^2$                         $\lambda_3 = 1.44$
$\phi_4(x) = 20 x_2 - x_1^2 + 1$            $\lambda_4 = 1.24$
$\phi_5(x) = x_1^3$                         $\lambda_5 = 1.728$
$\phi_6(x) = 20 x_1 x_2 - x_1^3 + x_1$      $\lambda_6 = 1.488$

We compare the predictions obtained with $K_{\rm SSD}$ and $K_{\rm EDMD}$ to show the superiority of the proposed method for long-term prediction. Figure 2 illustrates the relative and angle errors along a trajectory starting from a random initial condition for 20 time steps. Unlike EDMD, the linear prediction using the SSD algorithm matches the behavior of the system exactly. Moreover, since the system is unstable, the EDMD errors grow rapidly and, after only a few time steps, the predicted dictionary vector is nearly in the opposite direction of the actual one.

Fig. 2: Relative (left) and angle (right) prediction errors for EDMD and SSD for system (53) on a trajectory of length $M = 20$.
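Since Table II consists of closed-form polynomials, the statement that each identified function evolves linearly can be checked by direct calculation. A short sketch, using the constants of (53) and Table II as given above:

```python
import numpy as np

def T(x):                                   # the map in system (53)
    return np.array([1.2 * x[0],
                     1.24 * x[1] + 0.01 * x[0]**2 + 0.012])

# (eigenfunction, eigenvalue) pairs from Table II
pairs = [(lambda x: 1.0,                             1.0),
         (lambda x: x[0],                            1.2),
         (lambda x: x[0]**2,                         1.44),
         (lambda x: 20*x[1] - x[0]**2 + 1,           1.24),
         (lambda x: x[0]**3,                         1.728),
         (lambda x: 20*x[0]*x[1] - x[0]**3 + x[0],   1.488)]

rng = np.random.default_rng(1)
for x in rng.uniform(-3.0, 3.0, size=(10, 2)):
    for phi, lam in pairs:
        # phi(T(x)) = lam * phi(x): each eigenfunction evolves linearly
        assert np.isclose(phi(T(x)), lam * phi(x))
```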
VIII. CONCLUSIONS
We have studied the characterization of Koopman-invariant subspaces and Koopman eigenfunctions associated to a dynamical system by means of data-driven methods. We have shown that the application of EDMD over a given dictionary forward and backward in time fully characterizes whether a function evolves linearly in time according to the available data. Building on this result, and under dense sampling, we have established that functions satisfying this condition are Koopman eigenfunctions almost surely. Motivated by the impracticality of checking the proposed conditions for large dictionaries, we have developed an alternative approach to identify the maximal Koopman-invariant subspace in the span of the given dictionary, which in turn also serves to identify the Koopman eigenfunctions. We have formally characterized the correctness of the proposed SSD algorithm, paying special attention to the number of iterations for convergence and its complexity. Finally, we have considered scenarios with large and streaming data sets, and developed the online SSSD algorithm, which refines its output as new data becomes available, taking advantage of the solutions computed in previous timesteps. We have shown that, for a given dataset, both SSD and SSSD provide the same solution. Future work will develop parallel and distributed counterparts of the algorithms proposed here over network systems and investigate the design of noise-resilient methods to identify Koopman eigenfunctions and invariant subspaces.
REFERENCES

[1] M. Haseli and J. Cortés, "Efficient identification of linear evolutions in nonlinear vector fields: Koopman invariant subspaces," in IEEE Conf. on Decision and Control, Nice, France, Dec. 2019, pp. 1746–1751.
[2] B. O. Koopman, "Hamiltonian systems and transformation in Hilbert space," Proceedings of the National Academy of Sciences, vol. 17, no. 5, pp. 315–318, 1931.
[3] B. O. Koopman and J. V. Neumann, "Dynamical systems of continuous spectra," Proceedings of the National Academy of Sciences, vol. 18, no. 3, pp. 255–263, 1932.
[4] I. Mezić, "Spectral properties of dynamical systems, model reduction and decompositions," Nonlinear Dynamics, vol. 41, no. 1-3, pp. 309–325, 2005.
[5] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson, "Spectral analysis of nonlinear flows," Journal of Fluid Mechanics, vol. 641, pp. 115–127, 2009.
[6] M. Budišić, R. Mohr, and I. Mezić, "Applied Koopmanism," Chaos, vol. 22, no. 4, p. 047510, 2012.
[7] A. Surana, M. O. Williams, M. Morari, and A. Banaszuk, "Koopman operator framework for constrained state estimation," in IEEE Conf. on Decision and Control, Melbourne, Australia, 2017, pp. 94–101.
[8] M. Netto and L. Mili, "A robust data-driven Koopman Kalman filter for power systems dynamic state estimation," IEEE Transactions on Power Systems, vol. 33, no. 6, pp. 7228–7237, 2018.
[9] A. Mauroy and J. Goncalves, "Linear identification of nonlinear systems: A lifting technique based on the Koopman operator," in IEEE Conf. on Decision and Control, Las Vegas, NV, Dec. 2016, pp. 6500–6505.
[10] ——, "Koopman-based lifting techniques for nonlinear systems identification," IEEE Transactions on Automatic Control, 2019, to appear.
[11] D. Bruder, C. D. Remy, and R. Vasudevan, "Nonlinear system identification of soft robot dynamics using Koopman operator theory," in IEEE Int. Conf. on Robotics and Automation, Montreal, Canada, May 2019, pp. 6244–6250.
[12] B. Kramer, P. Grover, P. Boufounos, S. Nabi, and M. Benosman, "Sparse sensing and DMD-based identification of flow regimes and bifurcations in complex flows," SIAM Journal on Applied Dynamical Systems, vol. 16, no. 2, pp. 1164–1196, 2017.
[13] S. Sinha, U. Vaidya, and R. Rajaram, "Operator theoretic framework for optimal placement of sensors and actuators for control of nonequilibrium dynamics," Journal of Mathematical Analysis and Applications, vol. 440, no. 2, pp. 750–772, 2016.
[14] S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé, "Data-driven model reduction and transfer operator approximation," Journal of Nonlinear Science, vol. 28, no. 3, pp. 985–1010, 2018.
[15] A. Alla and J. N. Kutz, "Nonlinear model order reduction via dynamic mode decomposition," SIAM Journal on Scientific Computing, vol. 39, no. 5, pp. B778–B796, 2017.
[16] M. Korda and I. Mezić, "Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control," Automatica, vol. 93, pp. 149–160, 2018.
[17] H. Arbabi, M. Korda, and I. Mezić, "A data-driven Koopman model predictive control framework for nonlinear flows," arXiv preprint arXiv:1804.05291, 2018.
[18] S. Peitz and S. Klus, "Koopman operator-based model reduction for switched-system control of PDEs," Automatica, vol. 106, pp. 184–191, 2019.
[19] B. Huang, X. Ma, and U. Vaidya, "Feedback stabilization using Koopman operator," in IEEE Conf. on Decision and Control, Miami Beach, FL, Dec. 2018, pp. 6434–6439.
[20] E. Kaiser, J. N. Kutz, and S. L. Brunton, "Data-driven discovery of Koopman eigenfunctions for control," arXiv preprint arXiv:1707.01146, 2017.
[21] A. Sootla and D. Ernst, "Pulse-based control using Koopman operator under parametric uncertainty," IEEE Transactions on Automatic Control, vol. 63, no. 3, pp. 791–796, 2017.
[22] I. Abraham, G. de la Torre, and T. Murphey, "Model-based control using Koopman operators," in Robotics: Science and Systems, Cambridge, Massachusetts, Jul. 2017.
[23] G. Mamakoukas, M. Castano, X. Tan, and T. Murphey, "Local Koopman operators for data-driven control of robotic systems," in Robotics: Science and Systems, Freiburg, Germany, Jun. 2019.
[24] A. Mauroy and I. Mezić, "Global stability analysis using the eigenfunctions of the Koopman operator," IEEE Transactions on Automatic Control, vol. 61, no. 11, pp. 3356–3369, 2016.
[25] P. J. Schmid, "Dynamic mode decomposition of numerical and experimental data," Journal of Fluid Mechanics, vol. 656, pp. 5–28, 2010.
[26] K. K. Chen, J. H. Tu, and C. W. Rowley, "Variants of dynamic mode decomposition: boundary condition, Koopman, and Fourier analyses," Journal of Nonlinear Science, vol. 22, no. 6, pp. 887–915, 2012.
[27] J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz, "On dynamic mode decomposition: theory and applications," Journal of Computational Dynamics, vol. 1, no. 2, pp. 391–421, 2014.
[28] S. T. M. Dawson, M. S. Hemati, M. O. Williams, and C. W. Rowley, "Characterizing and correcting for the effect of sensor noise in the dynamic mode decomposition," Experiments in Fluids, vol. 57, no. 3, p. 42, 2016.
[29] M. S. Hemati, C. W. Rowley, E. A. Deem, and L. N. Cattafesta, "De-biasing the dynamic mode decomposition for applied Koopman spectral analysis of noisy datasets," Theoretical and Computational Fluid Dynamics, vol. 31, no. 4, pp. 349–368, 2017.
[30] M. R. Jovanović, P. J. Schmid, and J. W. Nichols, "Sparsity-promoting dynamic mode decomposition," Physics of Fluids, vol. 26, no. 2, p. 024103, 2014.
[31] S. L. Clainche and J. M. Vega, "Higher order dynamic mode decomposition," SIAM Journal on Applied Dynamical Systems, vol. 16, no. 2, pp. 882–925, 2017.
[32] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, "A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition," Journal of Nonlinear Science, vol. 25, no. 6, pp. 1307–1346, 2015.
[33] M. Korda and I. Mezić, "On convergence of extended dynamic mode decomposition to the Koopman operator," Journal of Nonlinear Science, vol. 28, no. 2, pp. 687–710, 2018.
[34] M. Haseli and J. Cortés, "Approximating the Koopman operator using noisy data: noise-resilient extended dynamic mode decomposition," in American Control Conference, Philadelphia, PA, Jul. 2019, pp. 5499–5504.
[35] E. Qian, B. Kramer, B. Peherstorfer, and K. Willcox, "Lift & learn: Physics-informed machine learning for large-scale nonlinear dynamical systems," arXiv preprint arXiv:1912.08177, 2019.
[36] Q. Li, F. Dietrich, E. M. Bollt, and I. G. Kevrekidis, "Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator," Chaos, vol. 27, no. 10, p. 103111, 2017.
[37] N. Takeishi, Y. Kawahara, and T. Yairi, "Learning Koopman invariant subspaces for dynamic mode decomposition," in Conference on Neural Information Processing Systems, 2017, pp. 1130–1140.
[38] E. Yeung, S. Kundu, and N. Hodas, "Learning deep neural network representations for Koopman operators of nonlinear dynamical systems," in American Control Conference, Philadelphia, PA, Jul. 2019, pp. 4832–4839.
[39] S. L. Brunton, B. W. Brunton, J. L. Proctor, and J. N. Kutz, "Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control," PLOS One, vol. 11, no. 2, pp. 1–19, 2016.
[40] M. Korda and I. Mezić, "Optimal construction of Koopman eigenfunctions for prediction and control," 2019. [Online]. Available: https://hal.archives-ouvertes.fr/hal-02278835
[41] S. Klus, F. Nüske, S. Peitz, J. H. Niemann, C. Clementi, and C. Schütte, "Data-driven approximation of the Koopman generator: Model reduction, system identification, and control," arXiv preprint arXiv:1909.10638, 2019.
[42] G. B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed. New York: Wiley, 1999.
[43] X. Li, S. Wang, and Y. Cai, "Tutorial: Complexity analysis of singular value decomposition and its variants," arXiv preprint arXiv:1906.12085, 2019.
[44] R. M. Jungers and P. Tabuada, "Non-local linearization of nonlinear differential equations via polyflows," in American Control Conference, Philadelphia, PA, 2019, pp. 1906–1911.

APPENDIX
Here we gather some linear algebraic results used throughout the paper.
Lemma A.1: (Intersection of Linear Spaces):
Let $A, B \in \mathbb{R}^{m \times n}$ be matrices with full column rank. Suppose that the columns of $Z = [Z_A^T, Z_B^T]^T \in \mathbb{R}^{2n \times l}$ form a basis for the null space of $[A, B]$, where $Z_A, Z_B \in \mathbb{R}^{n \times l}$. Then,

(a) $\mathcal{R}(A Z_A) = \mathcal{R}(A) \cap \mathcal{R}(B)$;
(b) $Z_A$ and $Z_B$ have full column rank.

Proof: (a) First, note that $\mathcal{R}(A Z_A) \subseteq \mathcal{R}(A)$. Moreover, by hypothesis, $[A, B] Z = 0$, which leads to $\mathcal{R}(A Z_A) = \mathcal{R}(B Z_B) \subseteq \mathcal{R}(B)$. Consequently, $\mathcal{R}(A Z_A) \subseteq \mathcal{R}(A) \cap \mathcal{R}(B)$. Now, suppose that $v \in \mathcal{R}(A) \cap \mathcal{R}(B)$. By definition, there exist vectors $w_1, w_2 \in \mathbb{R}^n$ such that $v = A w_1 = B w_2$. Then,
$$[A, B] \begin{bmatrix} w_1 \\ -w_2 \end{bmatrix} = 0,$$
which means that $[w_1^T, -w_2^T]^T \in \mathcal{R}(Z)$, and hence there exists $r \in \mathbb{R}^l$ such that $w_1 = Z_A r$ and $w_2 = -Z_B r$. Therefore, $v = A w_1 = A Z_A r \in \mathcal{R}(A Z_A)$. Consequently, $\mathcal{R}(A) \cap \mathcal{R}(B) \subseteq \mathcal{R}(A Z_A)$ and the claim follows.

(b) We prove this part by contradiction. Suppose that there exists $v \neq 0$ such that $Z_A v = 0$. Since $[A, B] Z = 0$, one can conclude that $A Z_A v = -B Z_B v = 0$. Since $B$ has full column rank, we have $Z_B v = 0$. Hence $Z v = 0$, which is a contradiction since the columns of $Z$ are linearly independent. A similar reasoning shows that $Z_B$ has full column rank.
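Numerically, Lemma A.1 gives a direct recipe for computing a basis of the intersection of two ranges from a single null-space computation. A small sketch (the helper name is ours):

```python
import numpy as np
from scipy.linalg import null_space

def range_intersection(A, B, rcond=1e-12):
    # Lemma A.1: if the columns of Z = [Z_A; Z_B] span null([A, B]),
    # then the columns of A Z_A span R(A) ∩ R(B).
    Z = null_space(np.hstack([A, B]), rcond=rcond)
    ZA = Z[:A.shape[1], :]        # top block; full column rank by part (b)
    return A @ ZA

# quick check: span{e1, e2} and span{e2, e3} in R^3 intersect along e2
A = np.array([[1., 0.], [0., 1.], [0., 0.]])
B = np.array([[0., 0.], [1., 0.], [0., 1.]])
V = range_intersection(A, B)      # single column, proportional to e2
```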
Lemma A.2: Let $A, C, D$ be matrices of appropriate sizes, with $A$ having full column rank. Then $\mathcal{R}(AC) \subseteq \mathcal{R}(AD)$ if and only if $\mathcal{R}(C) \subseteq \mathcal{R}(D)$.

Proof: $(\Rightarrow)$ Suppose that $v \in \mathcal{R}(C)$; hence there exists $w$ such that $v = C w$. Therefore, $A v = A C w \in \mathcal{R}(AC)$. Since $\mathcal{R}(AC) \subseteq \mathcal{R}(AD)$, one can deduce that there exists $r$ such that $A C w = A D r$, and we get $A(v - D r) = 0$. Since $A$ has full column rank, this leads to $v = D r \in \mathcal{R}(D)$, and hence $\mathcal{R}(C) \subseteq \mathcal{R}(D)$.

$(\Leftarrow)$ Suppose that $v \in \mathcal{R}(AC)$; hence $v = A C w$ for some $w$. Since $\mathcal{R}(C) \subseteq \mathcal{R}(D)$, there exists $r$ such that $C w = D r$. As a result, $v = A C w = A D r$, which leads to the conclusion that $v \in \mathcal{R}(AD)$, and the claim follows.
Lemma A.3: Let $A_1, B_1 \in \mathbb{R}^{m \times n}$, $A_2, B_2 \in \mathbb{R}^{l \times n}$, and $C \in \mathbb{R}^{n \times k}$, with $A_1$, $B_1$, $C$ having full column rank. If
$$\mathcal{R}\left(\begin{bmatrix} A_1 \\ A_2 \end{bmatrix}\right) = \mathcal{R}\left(\begin{bmatrix} B_1 \\ B_2 \end{bmatrix}\right), \quad (54a)$$
$$\mathcal{R}(A_1 C) = \mathcal{R}(B_1 C), \quad (54b)$$
then
$$\mathcal{R}\left(\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} C\right) = \mathcal{R}\left(\begin{bmatrix} B_1 \\ B_2 \end{bmatrix} C\right).$$

Proof: Based on (54a), one can deduce that there exists a nonsingular square matrix $K$ such that
$$A_1 = B_1 K, \quad (55a)$$
$$A_2 = B_2 K. \quad (55b)$$
Multiplying both sides of (55a) by $C$ gives
$$A_1 C = B_1 K C. \quad (56)$$
Moreover, based on (54b), one can deduce that there exists a nonsingular square matrix $K^*$ such that
$$A_1 C = B_1 C K^*. \quad (57)$$
By subtracting (57) from (56) and considering the fact that $B_1$ has full column rank, one can deduce that
$$C K^* = K C. \quad (58)$$
Now, multiplying both sides of (55b) from the right by $C$ and using (58), one can write $A_2 C = B_2 C K^*$, which in conjunction with (57) leads to
$$\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} C = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} C K^*,$$
completing the proof.
Lemma A.4: Let $A$, $\{C_i\}_{i=1}^{\infty}$, and $\hat{C}$ be matrices of appropriate sizes. Assume that $A$ has full column rank and $\mathcal{R}(\hat{C}) = \bigcap_{i=1}^{\infty} \mathcal{R}(C_i)$. Then $\mathcal{R}(A \hat{C}) = \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$.

Proof: First, we prove that $\mathcal{R}(A \hat{C}) \subseteq \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$. Let $v \in \mathcal{R}(A \hat{C})$; then there exists a vector $w$ such that $v = A \hat{C} w = A r$, with $r = \hat{C} w$. Note that $r \in \mathcal{R}(\hat{C})$ and consequently $r \in \mathcal{R}(C_i)$ for all $i \in \mathbb{N}$. Hence, for every $i \in \mathbb{N}$ there exists $z_i$ such that $r = C_i z_i$. Based on the definition of $v$, we have $v = A r = A C_i z_i$ for every $i \in \mathbb{N}$. As a result, $v \in \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$, which concludes the proof of this inclusion.

Now, we prove that $\bigcap_{i=1}^{\infty} \mathcal{R}(A C_i) \subseteq \mathcal{R}(A \hat{C})$. Let $v \in \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$. Then for every $i \in \mathbb{N}$, there exists $w_i$ such that $v = A C_i w_i$. Moreover, $v \in \mathcal{R}(A)$ and, since $A$ has full column rank, there exists a unique $r$ such that $v = A r$. Therefore, for all $i \in \mathbb{N}$, we have $r = C_i w_i$. Thus, $r \in \bigcap_{i=1}^{\infty} \mathcal{R}(C_i)$ and consequently $r \in \mathcal{R}(\hat{C})$. Since $v = A r$, we have $v \in \mathcal{R}(A \hat{C})$, concluding the proof.

Masih Haseli was born in Kermanshah, Iran, in 1991. He received the B.Sc. degree, in 2013, and the M.Sc. degree, in 2015, both in Electrical Engineering from Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran. In 2017, he joined the University of California, San Diego, to pursue the Ph.D. degree in Mechanical and Aerospace Engineering. His research interests include system identification, nonlinear systems, network systems, data-driven modeling and control, and distributed and parallel computing. Mr. Haseli was the recipient of the bronze medal in Iran's national mathematics competition in 2014.