Learning Koopman Eigenfunctions and Invariant Subspaces from Data: Symmetric Subspace Decomposition
Masih Haseli and Jorge Cortés

This work was supported by ONR Award N00014-18-1-2828. A preliminary version of this work appeared as [1] at the IEEE Conference on Decision and Control. Masih Haseli and Jorge Cortés are with the Department of Mechanical and Aerospace Engineering, University of California, San Diego, CA 92093, USA, {mhaseli,cortes}@ucsd.edu.
Abstract—This paper develops data-driven methods to identify eigenfunctions of the Koopman operator associated to a dynamical system and subspaces that are invariant under the operator. We build on Extended Dynamic Mode Decomposition (EDMD), a data-driven method that finds a finite-dimensional approximation of the Koopman operator on the span of a predefined dictionary of functions. We propose a necessary and sufficient condition to identify Koopman eigenfunctions based on the application of EDMD forward and backward in time. Checking this condition requires the comparison of the eigendecompositions of matrices whose size grows with the size of the dictionary. To address this, we propose the Symmetric Subspace Decomposition (SSD) algorithm, which provably identifies the maximal Koopman-invariant subspace and the Koopman eigenfunctions in the span of the dictionary. We also introduce the Streaming Symmetric Subspace Decomposition (SSSD) algorithm, an online method that only requires a small, fixed memory and updates its estimate of the invariant subspace as new data is received. We prove that, given a data set, SSSD and SSD find the same solution.
I. INTRODUCTION
Driven by advances in processing, data storage, cloud services, and algorithms, the world has witnessed in recent years a revolution in data-driven learning, analysis, and control of dynamical phenomena. State-space, probabilistic, and neural network models are among the most popular methods to model dynamical systems. With sufficient a priori information about the dynamics, state-space methods can provide closed-form analytic models that accurately describe the dynamical behavior. Such models, however, are generally nonlinear and their analytical study becomes arduous for moderate to high-dimensional systems. Probabilistic approaches, on the other hand, provide an alternative description that is conducive to dealing with incomplete information about the underlying dynamics. However, under such approaches, deriving mathematical guarantees may be hard, if not impossible. Neural networks can describe the dynamics with high accuracy given enough data, but the models they produce are highly nonlinear and difficult to study analytically. Hence, even though they can be successful in predicting the behavior of the system, they often do not provide deeper insight into the dynamics. These reasons have motivated researchers to seek alternative strategies to capture the dynamics using data with
minimum a priori information, in a computationally efficient way, and resulting in simple yet accurate models. Approximating the Koopman operator associated with a dynamical system is one such strategy. The Koopman operator is a linear but generally infinite-dimensional operator that fully describes the behavior of the underlying dynamical system. Even though the linearity of the Koopman operator makes its spectral properties a powerful tool for analysis, its infinite-dimensional nature prevents the use of conventional linear algebraic tools developed to work with digital computers. One way to circumvent this issue is to identify finite-dimensional subspaces that are invariant under the Koopman operator. This paper develops data-driven methods to identify such subspaces.

Literature Review:
The Koopman operator [2], [3] is a linear but generally infinite-dimensional operator that provides an alternative view of dynamical systems by describing the effect of the dynamics on a functional space. Being a linear operator enables one to use its spectral properties to capture and predict the behavior of nonlinear dynamical systems [4]–[6]. This leads to a wide variety of applications including state estimation [7], [8], system identification [9]–[12], sensor and actuator placement [13], model reduction [14], [15], control [16]–[21], and robotics [22], [23]. Moreover, the eigenfunctions of the Koopman operator play an important role in stability analysis of nonlinear systems [24]. Due to the infinite-dimensional nature of the Koopman operator, the digital implementation of the aforementioned applications is not possible unless one can find a way to represent the effect of the operator on finite-dimensional subspaces. The literature has explored several data-driven methods to find such finite-dimensional approximations, which can be divided into two main categories: projection methods and invariant-subspace methods. Projection methods fit a linear model to the data acquired from the system. The most popular approach in this category is Dynamic Mode Decomposition (DMD), first proposed to capture dynamical information from fluid flows [25]. DMD uses linear algebraic methods to form a linear model from time series data. The work [26] explores the properties of DMD and its connection with the Koopman operator, and [27] generalizes it to work with non-sequential data snapshots. Several extensions work with noisy data [28], [29], promote sparsity [30], and consider time-lagged data snapshots [31]. Extended Dynamic Mode Decomposition (EDMD) [32] is an important variation of DMD that lifts the states of the system to a (generally higher-dimensional) functional space using a predefined dictionary of functions and finds the projection of the Koopman operator on that subspace. The work [33] studies the convergence properties of EDMD to the Koopman operator as the number of data snapshots and dictionary elements go to infinity. EDMD is specifically designed to work with exact data, and experiments and simulations show that it may not work well with noisy data. Our previous work [34] presented a noise-resilient extension of EDMD. Known information about the dynamics can also be incorporated into EDMD to increase the accuracy of the model [35]. The aforementioned methods provide linear higher-dimensional approximations of the underlying dynamics that are, however, not suitable for long-term prediction, since they are generally not exact. This issue can be tackled by finding subspaces that are invariant under the Koopman operator, since the acquired linear models are exact over them. This is the subject of the second group of approaches. The works [36]–[38] provide approaches to find functions that span Koopman-invariant subspaces using neural networks. Moreover, since Koopman eigenfunctions span Koopman-invariant subspaces, one can use the empirical methods provided in [20], [39] to approximate the Koopman eigenfunctions and, consequently, the invariant subspaces. The work in [40] provides theoretical and empirical results based on multi-step predictions to find Koopman eigenfunctions. Interestingly, none of the aforementioned methods provide mathematical guarantees that the identified functions are Koopman eigenfunctions.

Statement of Contributions:
We present data-driven methods to identify Koopman eigenfunctions and Koopman-invariant subspaces associated with a potentially nonlinear dynamical system. First, we study the properties of the standard EDMD method regarding the identification of Koopman eigenfunctions. We prove that EDMD correctly identifies all the Koopman eigenfunctions in the span of the predefined dictionary. This necessary condition, however, is not sufficient, i.e., the functions identified by the EDMD method are not necessarily Koopman eigenfunctions. This motivates our next contribution, which is a necessary and sufficient condition that characterizes the functions that evolve linearly according to the available data snapshots. This condition is based on the application of EDMD forward and backward in time. The identified functions are not necessarily Koopman eigenfunctions, since one can only guarantee that they evolve linearly on the available data (but not necessarily starting anywhere in the state space). However, we prove that under reasonable assumptions on the density of the sampling, the identified functions are Koopman eigenfunctions almost surely. Our next contribution seeks to provide computationally efficient ways of identifying Koopman eigenfunctions and Koopman-invariant subspaces. In fact, checking the aforementioned necessary and sufficient condition requires one to calculate and compare the eigendecompositions of two potentially large matrices, which can be computationally cumbersome. Moreover, even though the subspace spanned by all the eigenfunctions in the span of the original dictionary is Koopman-invariant, it might not be maximal. To address these limitations, we propose the Symmetric Subspace Decomposition (SSD) strategy, an iterative method to find the maximal subspace that remains invariant under the application of the dynamics (and its associated Koopman operator) according to the available data. We prove that SSD also finds all the functions that evolve linearly in time according to the available data. Moreover, we prove that under the same conditions on the sampling density, the SSD strategy identifies the maximal Koopman-invariant subspace in the span of the original dictionary almost surely. Our last contribution is motivated by applications where the data becomes available in an online fashion. In such scenarios, at any given time step, one would need to perform SSD on all the data received up to that time. Performing SSD requires the calculation of several singular value decompositions for matrices that scale with the size of the data, in turn requiring significant memory capabilities. To address these shortcomings, we propose the Streaming Symmetric Subspace Decomposition (SSSD) strategy, which refines the calculated Koopman-invariant subspaces each time it receives new data and deals with matrices of fixed and relatively small size (independent of the size of the data). We prove that the SSSD and SSD methods are equivalent, in the sense that, for a given dataset, they both identify the same maximal Koopman-invariant subspace.
Notation:
We denote by ℕ, ℕ_0, ℝ, ℝ_{≥0}, and ℂ the sets of natural, nonnegative integer, real, nonnegative real, and complex numbers, respectively. For a matrix A ∈ ℂ^{m×n}, we denote the set comprised of its rows by rows(A), its columns by cols(A), the number of its rows by ♯rows(A), and the number of its columns by ♯cols(A), respectively. In addition, we denote its pseudo-inverse, transpose, complex conjugate, conjugate transpose, Frobenius norm, and range space by A†, A^T, Ā, A^H, ‖A‖_F, and R(A), respectively. For 1 ≤ i < k ≤ m, we denote by A_{i:k} the matrix formed with the i-th to k-th rows of A. For a square nonsingular matrix B, we denote its inverse by B^{-1}. Given matrices A ∈ ℂ^{m×n} and B ∈ ℂ^{m×d}, we denote by [A, B] ∈ ℂ^{m×(n+d)} the matrix created by concatenating A and B. The angle between vectors v, w ∈ ℝ^n is ∠(v, w). Given v_1, ..., v_k ∈ ℂ^n, span{v_1, ..., v_k} represents the set comprised of all linear combinations c_1 v_1 + ··· + c_k v_k with c_1, ..., c_k ∈ ℂ. We use j to denote the imaginary unit (the solution of x² + 1 = 0). For v ∈ ℂ^n, we denote its real and imaginary parts by Re(v) and Im(v), and its 2-norm by ‖v‖ := √(v^H v). Given a set A, we denote its complement by A^c. Given sets A and B, A ⊆ B means that A is a subset of B. We denote by A ∩ B and A ∪ B the intersection and union of A and B, and set A \ B := A ∩ B^c. Given a sequence of sets {A_i}_{i=1}^∞, we denote its superior and inferior limits by lim sup_{i→∞} A_i and lim inf_{i→∞} A_i, respectively. We refer to the set consisting of all continuous strictly increasing functions α : ℝ_{≥0} → ℝ_{≥0} with α(0) = 0 as class-K. Given f : B → A and g : C → B, f ∘ g : C → A denotes their composition.

II. PRELIMINARIES
In this section, we review basic concepts on the Koopman operator and Extended Dynamic Mode Decomposition.
A. Koopman Operator
Here, we introduce the (discrete-time) Koopman operator and its spectral properties, following [6]. Consider a nonlinear, time-invariant, continuous map T : M → M on M ⊆ ℝ^n, defining the dynamical system

x⁺ = T(x).    (1)

The dynamics (1) acts on the points in the state space M and generates trajectories of the system. The Koopman operator, on the other hand, provides an alternative approach to analyze (1) based on the evolution of functions (also known as observables) defined on M and taking values in ℂ. Formally, let F be a linear space of functions from M to ℂ which is closed under composition with T, i.e.,

f ∘ T ∈ F,  ∀f ∈ F.    (2)

The Koopman operator K : F → F associated with (1) is K(f) = f ∘ T. A closer look at the definition of the Koopman operator shows that it advances the observables in time, i.e., for g = K(f),

g(x) = f ∘ T(x) = f(x⁺),  ∀x ∈ M.    (3)

This equation shows how the Koopman operator encodes the dynamics on the functional space F. The operator is linear as a direct consequence of linearity in F, i.e., for every f_1, f_2 ∈ F and c_1, c_2 ∈ ℂ,

K(c_1 f_1 + c_2 f_2) = c_1 K(f_1) + c_2 K(f_2).    (4)

Assuming F contains the functions describing the states of the system, g_i(x) = x_i with i ∈ {1, ..., n}, the Koopman operator fully characterizes the global features of the dynamics in a linear fashion. Moreover, the operator might be (and generally is) infinite dimensional, either by choice of F or due to the closedness requirement in (2).

Being linear, one can naturally define its eigendecomposition. A function φ ∈ F is an eigenfunction of K associated with eigenvalue λ ∈ ℂ if

K(φ) = λφ.    (5)

The combination of (3) and (5) leads to a significant property of the Koopman operator: the linear evolution of its eigenfunctions in time. Formally, given an eigenfunction φ,

φ(x⁺) = (φ ∘ T)(x) = K(φ)(x) = λφ(x).

The linear evolution of eigenfunctions, in conjunction with the linearity (4) of the operator, enables us to use spectral properties to analyze the nonlinear system (1). Given a set of eigenpairs {(λ_i, φ_i)}_{i=1}^{N_k} such that K(φ_i) = λ_i φ_i for i ∈ {1, ..., N_k}, one can describe the evolution of every function f in span({φ_i}_{i=1}^{N_k}), i.e., f = Σ_{i=1}^{N_k} c_i φ_i for some {c_i}_{i=1}^{N_k} ⊂ ℂ, as

f(x(k)) = Σ_{i=1}^{N_k} c_i λ_i^k φ_i(x(0)),  ∀k ∈ ℕ.    (6)

The constants {c_i}_{i=1}^{N_k} are called Koopman modes. It is important to note that one might need to use N_k = ∞ to fully describe the behavior of the dynamical system.

Another important notion in the analysis of the Koopman operator is the invariance of subspaces under its application. Formally, a subspace S ⊆ F is Koopman-invariant if for every f ∈ S we have K(f) ∈ S. Furthermore, S is maximal Koopman-invariant in L ⊆ F if it contains every Koopman-invariant subspace in L. Naturally, a set comprised of Koopman eigenfunctions spans a Koopman-invariant subspace.
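For concreteness, the following snippet numerically verifies the linear evolution of an eigenfunction along trajectories. The quadratic map and the eigenfunction φ(x) = x_2 − x_1² are our own illustrative choices (not from the paper), constructed so that the eigenfunction property holds exactly.

```python
import numpy as np

# Toy map (illustrative assumption): with the constants below, phi(x) = x2 - x1^2
# satisfies phi(T(x)) = b * phi(x), i.e., it is a Koopman eigenfunction with
# eigenvalue b; likewise, phi(x) = x1 is an eigenfunction with eigenvalue a.
a, b = 0.9, 0.5

def T(x):
    return np.array([a * x[0], b * x[1] + (a**2 - b) * x[0] ** 2])

def phi(x):
    return x[1] - x[0] ** 2

rng = np.random.default_rng(0)
for x in rng.uniform(-1.0, 1.0, size=(5, 2)):
    # linear evolution in time: phi(x+) = lambda * phi(x) with lambda = b
    assert np.isclose(phi(T(x)), b * phi(x))
```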
B. Extended Dynamic Mode Decomposition

Our exposition here mainly follows [32]. As mentioned earlier, despite its linearity, the infinite-dimensional nature of the Koopman operator obstructs the use of efficient linear algebraic methods. One natural way to overcome this problem is finding finite-dimensional approximations of it. Extended Dynamic Mode Decomposition (EDMD) is a popular data-driven method to perform this task that lifts data snapshots acquired from the dynamical system to a higher-dimensional space using a predefined dictionary of functions. The projection of the action of the operator on the span of the dictionary can then be found by solving a least-squares problem.

Formally, let D : ℝ^n → ℝ^{1×N_d} be a dictionary of N_d functions in F with D(x) = [d_1(x), ..., d_{N_d}(x)]. Moreover, let X, Y ∈ ℝ^{N×n} be matrices comprised of N data snapshots such that y_i = T(x_i) for i ∈ {1, ..., N}, where x_i^T and y_i^T are the i-th rows of X and Y, respectively. For convenience, we define the action of the dictionary on a matrix as

D(X) := [D(x_1)^T, ..., D(x_N)^T]^T.

The EDMD method approximates the projection of the Koopman operator by finding the matrix that best explains the data over the dictionary, i.e.,

minimize_K ‖D(Y) − D(X)K‖_F,

which yields the closed-form solution

K_EDMD(D(X), D(Y)) = D(X)† D(Y).    (7)

Note that the solution depends on the choice of dictionary (when it is clear from the context, we simply use K_EDMD). If the dictionary spans a Koopman-invariant subspace, then ‖D(Y) − D(X)K_EDMD‖_F = 0 and K_EDMD fully captures the evolution of functions in span(D(x)). Otherwise, EDMD loses some information about the dynamics. The work [33] analyzes the convergence properties of EDMD to the Koopman operator as the number of dictionary functions N_d and data snapshots N go to infinity.
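As a minimal computational sketch of (7) (our construction; the monomial dictionary and all variable names are illustrative choices, not prescribed by the paper), K_EDMD can be obtained with a single least-squares solve:

```python
import numpy as np

def edmd(DX, DY):
    """EDMD solution (7): K_EDMD = D(X)^† D(Y), via least squares."""
    # np.linalg.lstsq solves min_K ||D(X) K - D(Y)||_F
    return np.linalg.lstsq(DX, DY, rcond=None)[0]

# Illustrative dictionary: monomials up to degree 2 on a 2D state space.
def D(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack(
        [np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

# Snapshot matrices X, Y with y_i = T(x_i), using the toy map from above.
T = lambda x: np.array([0.9 * x[0], 0.5 * x[1] + (0.9**2 - 0.5) * x[0] ** 2])
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.array([T(x) for x in X])
K_edmd = edmd(D(X), D(Y))   # N_d x N_d finite-dimensional approximation
```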
III. PROBLEM STATEMENT

As described in Section II-B, the EDMD method loses information about the dynamical system when the employed dictionary does not span a Koopman-invariant subspace. As a result, in such cases, the EDMD approximation is not appropriate for long-term prediction of the state evolution. Motivated by this observation, our goal is to find the maximal Koopman-invariant subspace and the Koopman eigenfunctions in the span of a given dictionary.

Formally, given the dynamical system (1) defined by T : M → M, data matrices X and Y comprised of N data snapshots, and a dictionary of functions D, our main goal is two-fold:

(a) find all the Koopman eigenfunctions in span(D(x));
(b) find a basis for the maximal Koopman-invariant subspace in span(D(x)).

Note that (a) and (b) are closely related. The eigenfunctions found by solving (a) span Koopman-invariant subspaces. Those invariant subspaces, however, might not be maximal. This mild difference between (a) and (b) requires the use of different solution approaches. Since we are dealing with finite-dimensional linear subspaces, we aim to use linear algebra instead of optimization-based methods, which are widely used for solving these types of problems. This enables us to directly use the computationally efficient linear algebraic packages that optimization methods rely on.

Throughout this paper, we use the following assumption regarding the dictionary snapshots.

Assumption 3.1: (Full Column Rank Dictionary Matrices):
The matrices D(X) and D(Y) have full column rank. □

Assumption 3.1 is reasonable: in order for it to hold, the dictionary functions must be linearly independent, i.e., the functions must form a basis for span(D(x)). Moreover, the assumption requires the set of initial conditions rows(X) to be diverse enough to capture the important characteristics of the vector field. We should also point out that our treatment here relies on EDMD, which is not specifically designed to work with noisy data. Hence, we assume access to data with high signal-to-noise ratio. In practice, one might need to pre-process the data to use the algorithms proposed here.
IV. EDMD AND KOOPMAN EIGENFUNCTIONS

Here we investigate the capabilities and limitations of the EDMD method regarding the identification of Koopman eigenfunctions. The next result shows that EDMD is able to capture not only Koopman eigenfunctions but also all the functions that evolve linearly according to the available data.
Lemma 4.1: (EDMD Captures the Koopman Eigenfunctions in the Span of the Dictionary):
Suppose Assumption 3.1 holds. Let f(x) ∈ span(D(x)), with f(x) = D(x)v for some v ∈ ℂ^{N_d} \ {0}.

(a) Let f evolve linearly according to the available data, i.e., there exists λ ∈ ℂ such that f(y_i) = λf(x_i) for every i ∈ {1, ..., ♯rows(X)}. Then, the vector v is an eigenvector of K_EDMD with eigenvalue λ;
(b) Let f be an eigenfunction of the Koopman operator with eigenvalue λ. Then, the vector v is an eigenvector of K_EDMD with eigenvalue λ.

Proof: (a) Based on the linear evolution of f, we have

D(Y)v = λD(X)v.

Moreover, using the closed-form solution of EDMD, we have

K_EDMD v = D(X)† D(Y)v = λD(X)† D(X)v = λv,

where the last equality follows from Assumption 3.1.

(b) Based on the definition of Koopman eigenfunction, we have f(x⁺) = λf(x). Since this linear evolution is reflected in the data snapshots, we have f(y_i) = λf(x_i) for every i ∈ {1, ..., ♯rows(X)}, where x_i^T and y_i^T are the i-th rows of X and Y, respectively. The rest follows from (a).

Despite its simplicity, this result provides significant insight into the EDMD method. Lemma 4.1 shows that EDMD can capture eigenfunctions in the span of the dictionary even if the underlying subspace is not Koopman-invariant. Throughout the literature, it is well known that the (E)DMD method can capture physical constraints, conservation laws, and other properties of the underlying system, which actually correspond to Koopman eigenfunctions, see e.g. [32], [41]. Lemma 4.1 explains these observations and is a generalization of [27, Theorem 1] to EDMD when the underlying system is not necessarily linear (or cannot be accurately approximated by a linear system) and the underlying subspace is not Koopman-invariant.

Lemma 4.1 provides a necessary condition for the identification of Koopman eigenfunctions. This condition, however, is not sufficient, see e.g. [1, Example IV.3] for a counterexample. Interestingly, if a function evolves linearly forward in time, it also evolves linearly backward in time. The next result shows that checking this observation provides a necessary and sufficient condition for the identification of functions that evolve linearly in time according to the available data.

Theorem 4.2: (Identification of Linear Evolutions by Forward and Backward EDMD):
Suppose Assumption 3.1 holds. Let f(x) ∈ span(D(x)) with f(x) = D(x)v for some v ∈ ℂ^{N_d} \ {0}. Then f(y_i) = λf(x_i) for some λ ∈ ℂ \ {0} and for all i ∈ {1, ..., ♯rows(X)} if and only if v is an eigenvector of K_f = K_EDMD(D(X), D(Y)) with eigenvalue λ, and an eigenvector of K_b = K_EDMD(D(Y), D(X)) with eigenvalue λ^{-1}.

Proof: (⇐): Using the closed-form solutions of the EDMD problem and Assumption 3.1, one can write

K_f = (D(X)^T D(X))^{-1} D(X)^T D(Y),
K_b = (D(Y)^T D(Y))^{-1} D(Y)^T D(X).

Using these equations along with the definition of eigenpair, we have

λ D(X)^T D(X) v = D(X)^T D(Y) v,    (9a)
λ^{-1} D(Y)^T D(Y) v = D(Y)^T D(X) v.    (9b)

By multiplying (9a) from the left by v^H and using (9b),

λ ‖D(X)v‖² = v^H D(X)^T D(Y) v = λ̄^{-1} ‖D(Y)v‖²,

which implies

|λ|² ‖D(X)v‖² = ‖D(Y)v‖².    (10)

Now, we decompose D(Y)v orthogonally as

D(Y)v = cD(X)v + w,    (11)

with v^H D(X)^T w = 0. Substituting (11) into (9a) and multiplying both sides from the left by v^H yields

λ v^H D(X)^T D(X) v = c v^H D(X)^T D(X) v.

Since v ≠ 0, and under Assumption 3.1, we deduce that c = λ. Substituting the value of c in (11), taking the 2-norm, and using the fact that v^H D(X)^T w = 0, one can write

‖D(Y)v‖² = |λ|² ‖D(X)v‖² + ‖w‖².

Comparing this with (10), one deduces that w = 0 and D(Y)v = λD(X)v. The result follows by looking at this equality in a row-wise manner and noting that f(x) = D(x)v.

(⇒): Based on Lemma 4.1(a), v must be an eigenvector of K_f with eigenvalue λ. Moreover, since λ ≠ 0, one can write f(x_i) = λ^{-1} f(y_i) for every i ∈ {1, ..., ♯rows(X)} and, consequently, using Lemma 4.1(a) once again, we have K_b v = λ^{-1} v, concluding the proof.
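A minimal sketch of this forward-backward test (our construction; the tolerance tol is an arbitrary numerical choice) computes both EDMD matrices and keeps the eigenpairs of K_f whose eigenvectors are also eigenvectors of K_b with the reciprocal eigenvalue:

```python
import numpy as np

def linear_evolutions(DX, DY, tol=1e-8):
    """Return pairs (lambda, v) with D(Y) v = lambda D(X) v, via Theorem 4.2:
    v must be an eigenvector of K_f with eigenvalue lambda and an eigenvector
    of K_b with eigenvalue 1/lambda."""
    Kf = np.linalg.lstsq(DX, DY, rcond=None)[0]   # forward EDMD
    Kb = np.linalg.lstsq(DY, DX, rcond=None)[0]   # backward EDMD
    lams, V = np.linalg.eig(Kf)
    keep = []
    for lam, v in zip(lams, V.T):
        # discard lambda = 0 and check K_b v = (1/lambda) v
        if abs(lam) > tol and np.allclose(Kb @ v, v / lam, atol=tol):
            keep.append((lam, v))
    return keep
```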
If the function f satisfies the conditions provided by Theorem 4.2, then it is guaranteed that f(x⁺) = λf(x) for all x ∈ rows(X). However, Theorem 4.2 does not guarantee that f is an eigenfunction, i.e., there is no guarantee that f(x⁺) = λf(x) for all x ∈ M. To circumvent this issue, we next introduce infinite sampling and make an assumption about its density.

Assumption 4.3: (Almost sure dense sampling from a compact state space): Assume the state space M is compact. Suppose we gather infinitely (countably) many data snapshots. For N ∈ ℕ, the first N data snapshots are represented by matrices X_N and Y_N such that y_i = T(x_i) for all i ∈ {1, ..., N}, where x_i and y_i are the i-th rows of X_N and Y_N, respectively (we refer to the columns of X_N^T as the set S_N of initial conditions). Assume there exists a class-K function α and a sequence {p_N}_{N=1}^∞ ⊂ [0, 1] such that, for every N ∈ ℕ,

∀m ∈ M, ∃x ∈ S_N such that ‖m − x‖ ≤ α(1/N)

holds with probability p_N, and lim_{N→∞} p_N = 1. □

Assumption 4.3 is not restrictive since, in most practical cases, the state space is compact or the analysis is limited to a specific bounded region. Moreover, the data is usually available on a bounded region due to the limited range of sensors. Regarding the sampling density, Assumption 4.3 holds for most standard random samplings.

Noting that our methods presented later require Assumption 3.1 to hold, we provide a definition for dictionary matrices acquired from infinite sampling.
Definition 4.4: (R-rich Sequence of Dictionary Snapshots): Let {X_N}_{N=1}^∞ and {Y_N}_{N=1}^∞ be the sequence of data snapshot matrices acquired from system (1). Given the dictionary D : M → ℝ^{1×N_d}, we say the sequence of dictionary snapshot matrices is R-rich if R = min{M ∈ ℕ | rank(D(X_M)) = rank(D(Y_M)) = N_d} exists (R is called the richness constant). □

In Definition 4.4, if

{M ∈ ℕ | rank(D(X_M)) = rank(D(Y_M)) = N_d} ≠ ∅,

then, based on the well-ordering principle, see e.g. [42, Chapter 0], the minimum of the set exists and the sequence of dictionary snapshot matrices is R-rich. Moreover, given an R-rich sequence of dictionary snapshot matrices D(X_N) and D(Y_N), Assumption 3.1 holds for every N ≥ R. We are now ready to identify the Koopman eigenfunctions in the span of the dictionary using forward-backward EDMD.

Theorem 4.5: (Identification of Koopman Eigenfunctions by Forward and Backward EDMD):
Given an infinite sampling, suppose that the sequence of dictionary snapshot matrices is R-rich. Let K_f^N = K_EDMD(D(X_N), D(Y_N)) and K_b^N = K_EDMD(D(Y_N), D(X_N)). Given v ∈ ℂ^{N_d} \ {0} and λ ∈ ℂ \ {0}, let f(x) = D(x)v. Then,

(a) if f is an eigenfunction of the Koopman operator with eigenvalue λ, then K_f^N v = λv and K_b^N v = λ^{-1}v for every N ≥ R;
(b) conversely, and assuming the dictionary functions are continuous and Assumption 4.3 holds, if K_f^N v = λv and K_b^N v = λ^{-1}v for every N ≥ R, then f is an eigenfunction of the Koopman operator with probability 1.

Proof: (a) Since f is a Koopman eigenfunction, for every i ∈ ℕ we have f(y_i) = λf(x_i). Moreover, for every N ≥ R, D(X_N) and D(Y_N) have full column rank. Therefore, the result follows from Theorem 4.2.

(b) Based on Theorem 4.2, we deduce that, for every N ≥ R,

f(y_i) = λf(x_i),  ∀i ∈ {1, ..., N},    (12)

where x_i^T and y_i^T are the i-th rows of X_N and Y_N, respectively. Now, define h(x) := f ∘ T(x) − λf(x). The function h is continuous since f is a linear combination of continuous functions and T is also continuous. By inspecting h on the data points and using (12) and the fact that y_i = T(x_i) for all i ∈ {1, ..., N}, one can show that h(x_i) = f ∘ T(x_i) − λf(x_i) = f(y_i) − λf(x_i) = 0 for every i ∈ {1, ..., N}. Moreover, note that, based on Assumption 4.3, the set S_∞ = ∪_{i=1}^∞ S_i is dense in M with probability 1 and h(x) = 0 for every x ∈ S_∞. As a result, h(x) = 0 on M with probability 1. This implies that f ∘ T(x) = λf(x) for every x ∈ M almost surely. Consequently, we have K(f) = λf almost surely, and the result follows.

Theorems 4.2 and 4.5 provide conditions to identify Koopman eigenfunctions. The identified eigenfunctions can then span Koopman-invariant subspaces. Unlike [40, Algorithm 1], the methods proposed here do not require access to the system's multi-step trajectories and essentially use least-squares algorithms with closed-form solutions. However, one still needs to compare N_d potentially complex eigenvectors and their corresponding eigenvalues. This procedure can be impractical for large N_d. Moreover, since M ⊆ ℝ^n, the eigenfunctions of the Koopman operator form complex-conjugate pairs. Such pairs can be fully characterized using their real and imaginary parts, which allows one to work with real-valued functions instead. Motivated by these observations, we develop algorithms to directly identify Koopman-invariant subspaces.
V. IDENTIFICATION OF KOOPMAN-INVARIANT SUBSPACES VIA SYMMETRIC SUBSPACE DECOMPOSITION

Here we provide an algorithmic method to identify Koopman-invariant subspaces in the span of a predefined dictionary. We later show how this can be used to find Koopman eigenfunctions. Given a dictionary D : M → ℝ^{1×N_d} comprised of N_d linearly independent functions, we aim to find a dictionary D̃ : M → ℝ^{1×Ñ_d} with Ñ_d linearly independent functions such that the elements of D̃(x) span the maximal Koopman-invariant subspace in the span of D(x). Since span(D̃(x)) is Koopman-invariant, we have

R(D̃(x⁺)) = R(D̃(x)),  ∀x ∈ M.

This equality gets reflected in the data, i.e., given snapshot matrices X and Y, we have

R(D̃(Y)) = R(D̃(X)).    (13)

Moreover, since the elements of D̃(x) are in the span of D(x), there exists a full column rank matrix C such that D̃(x) = D(x)C. Thus, using (13), one can write

R(D(Y)C) = R(D(X)C).    (14)

Hence, we can reformulate the problem as a purely linear-algebraic problem consisting of finding the full column rank matrix C with the maximum number of columns such that (14) holds. To solve this problem, we propose the Symmetric Subspace Decomposition (SSD) method (cf. Algorithm 1). At each iteration, SSD prunes the dictionary and removes functions that do not evolve linearly in time according to the available data.
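Numerically, condition (14) can be checked with a standard rank test; the helper below is one possible sketch (our construction, with an illustrative tolerance).

```python
import numpy as np

def same_range(P, Q, tol=1e-10):
    """Check R(P) = R(Q) by comparing rank(P), rank(Q), and rank([P, Q])."""
    r_joint = np.linalg.matrix_rank(np.hstack([P, Q]), tol=tol)
    return (r_joint == np.linalg.matrix_rank(P, tol=tol)
            == np.linalg.matrix_rank(Q, tol=tol))

# Condition (14) for a candidate matrix C: same_range(DX @ C, DY @ C)
```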
Algorithm 1 Symmetric Subspace Decomposition

 1: Initialization
 2: i ← 1, A_1 ← D(X), B_1 ← D(Y), C_SSD ← I_{N_d}
 3: while 1 do
 4:   [Z_i^A; Z_i^B] ← null([A_i, B_i])    ▷ Basis for the null space
 5:   if null([A_i, B_i]) = ∅ then
 6:     return 0    ▷ The basis does not exist
 7:     break
 8:   end if
 9:   if ♯rows(Z_i^A) ≤ ♯cols(Z_i^A) then
10:     return C_SSD    ▷ The procedure is complete
11:     break
12:   end if
13:   C_SSD ← C_SSD Z_i^A    ▷ Reducing the subspace
14:   A_{i+1} ← A_i Z_i^A, B_{i+1} ← B_i Z_i^A, i ← i + 1
15: end while

In the pseudocode of Algorithm 1, the function null([A_i, B_i]) returns a basis for the null space of [A_i, B_i], and Z_i^A and Z_i^B in Step 4 have the same size.
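The following NumPy sketch implements Algorithm 1 under the stated assumptions (our construction; rank decisions use an illustrative tolerance, and the trivial output 0 is encoded as a matrix with zero columns):

```python
import numpy as np

def null_basis(M, tol=1e-10):
    """Orthonormal basis for the null space of M via the SVD (cf. Step 4)."""
    _, s, Vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vh[rank:].conj().T              # columns span null(M)

def ssd(DX, DY, tol=1e-10):
    """Symmetric Subspace Decomposition (Algorithm 1): returns C with
    R(D(X) C) = R(D(Y) C), maximal in the sense of Theorem 5.1(c)."""
    A, B = DX, DY
    C = np.eye(DX.shape[1])
    while True:
        Z = null_basis(np.hstack([A, B]), tol)   # null([A_i, B_i])
        if Z.shape[1] == 0:                      # the basis does not exist
            return np.zeros((DX.shape[1], 0))
        ZA = Z[: A.shape[1], :]                  # top block Z_i^A
        if ZA.shape[0] <= ZA.shape[1]:           # Step 9: procedure complete
            return C
        C = C @ ZA                               # reduce the subspace
        A, B = A @ ZA, B @ ZA
```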
The next result analyzes the convergence of SSD and characterizes the dimension, maximality, and symmetry of the subspace defined by its output.

Theorem 5.1: (Properties of SSD Output):
Suppose Assumption 3.1 holds. For matrices D(X), D(Y), let C_SSD = SSD(D(X), D(Y)). The SSD algorithm has the following properties:

(a) it stops after at most N_d iterations;
(b) the matrix C_SSD satisfies R(D(X) C_SSD) = R(D(Y) C_SSD); in addition, C_SSD is either 0 or has full column rank;
(c) the subspace R(D(X) C_SSD) is maximal, in the sense that, for any matrix E with R(D(X) E) = R(D(Y) E), we have R(D(X) E) ⊆ R(D(X) C_SSD) and R(E) ⊆ R(C_SSD);
(d) R(SSD(D(X), D(Y))) = R(SSD(D(Y), D(X))).

Proof: (a) First, we use (strong) induction to prove that, at each iteration, Z_i^A, Z_i^B are matrices with full column rank upon existence. By Assumption 3.1, A_1 and B_1 have full column rank. By Lemma A.1, one can then derive that Z_1^A and Z_1^B have full column rank. Now, suppose that the matrices Z_1^A, ..., Z_k^A and Z_1^B, ..., Z_k^B have full column rank. Using Assumption 3.1, one can deduce that A_{k+1} = A_1 Z_1^A ··· Z_k^A and B_{k+1} = B_1 Z_1^A ··· Z_k^A have full column rank, since they are products of matrices with full column rank. Using Lemma A.1, one can conclude that Z_{k+1}^A and Z_{k+1}^B have full column rank. Consequently, we have ♯rows(Z_i^A) ≥ ♯cols(Z_i^A). Hence, Step 9 of the SSD algorithm implies that the algorithm can only move to the next iteration if ♯rows(Z_i^A) > ♯cols(Z_i^A), which means that the number of columns in A_{i+1} and B_{i+1} decreases with respect to A_i and B_i. Hence, the algorithm terminates after at most N_d iterations, since A_1 and B_1 have N_d columns.

(b) The C_SSD = 0 case is trivial. Suppose that the algorithm stops after k iterations with nonzero C_SSD. This means that Z_k^A and Z_k^B are square full rank matrices. Also, by definition we have A_k Z_k^A = −B_k Z_k^B, which means that A_k = −B_k Z_k^B (Z_k^A)^{-1}. Noting that Z_k^B (Z_k^A)^{-1} is a full rank square matrix, one can derive R(A_k) = R(B_k). A closer look at the definitions shows that A_k = D(X) C_SSD and B_k = D(Y) C_SSD. Hence, R(D(X) C_SSD) = R(D(Y) C_SSD). Moreover, C_SSD = Z_1^A ··· Z_{k−1}^A and, considering the fact that Z_1^A, ..., Z_{k−1}^A have full column rank, one can deduce that C_SSD has full column rank.

(c) Suppose that the matrix E satisfies R(D(X) E) = R(D(Y) E). First, we use induction to prove that R(D(X) E) ⊆ R(A_i) ∩ R(B_i) for each iteration i that the algorithm goes through. Let i = 1; then A_1 = D(X) and B_1 = D(Y). Consequently, R(D(X) E) ⊆ R(A_1) and R(D(X) E) = R(D(Y) E) ⊆ R(B_1), based on the definition of E. Hence, R(D(X) E) ⊆ R(A_1) ∩ R(B_1). Now, suppose

R(D(X) E) ⊆ R(A_i) ∩ R(B_i).    (15)

Using Lemma A.1, one can derive R(A_i Z_i^A) = R(A_i) ∩ R(B_i). Combining this with (15), we get

R(D(X) E) ⊆ R(A_i Z_i^A) = R(D(X) Z_1^A ··· Z_i^A).    (16)

Using (16) with Lemma A.2, one can derive R(E) ⊆ R(Z_1^A ··· Z_i^A). Using Lemma A.2 once again, we get

R(D(X) E) = R(D(Y) E) ⊆ R(D(Y) Z_1^A ··· Z_i^A).    (17)
The definitions of A_{i+1} and B_{i+1}, along with (16) and (17), lead to the conclusion that R(D(X) E) ⊆ R(A_{i+1}) ∩ R(B_{i+1}), and the induction is complete.

Now, suppose that the algorithm terminates at iteration k. In the case that C_SSD = 0, we have R(A_k) ∩ R(B_k) = {0}, which means that E = 0 and R(D(X) E) ⊆ R(D(X) C_SSD). In the case that C_SSD ≠ 0, using the facts that R(D(X) E) ⊆ R(A_k) ∩ R(B_k), C_SSD = Z_1^A ··· Z_{k−1}^A, and R(D(X) C_SSD) = R(D(Y) C_SSD), one can deduce that R(D(X) E) ⊆ R(D(X) C_SSD). Moreover, using Assumption 3.1 and Lemma A.2, one can write R(E) ⊆ R(C_SSD).

(d) For convenience, let E_SSD = SSD(D(Y), D(X)). Based on the definitions of C_SSD and E_SSD, one can write

R(D(X) C_SSD) = R(D(Y) C_SSD),
R(D(X) E_SSD) = R(D(Y) E_SSD).

Using these equations in conjunction with the maximality of R(C_SSD) from part (c), we deduce R(E_SSD) ⊆ R(C_SSD). Using a similar argument and invoking the maximality of R(E_SSD), we deduce R(C_SSD) ⊆ R(E_SSD), and the proof follows.

Remark 5.2: (Time and Space Complexity of the SSD Algorithm):
Given N data snapshots and a dictionary with N_d elements, where usually N ≫ N_d, and assuming that operations on scalar elements require time and memory of order O(1), the most time- and memory-consuming operation in the SSD algorithm is Step 4. This step can be done by truncated Singular Value Decomposition (SVD) and finding the perpendicular space to the span of the right singular vectors, with time complexity O(N N_d²) and memory complexity O(N N_d), see e.g. [43]. Since, based on Theorem 5.1(a), the SSD algorithm terminates in at most N_d iterations, the total time complexity is O(N N_d³). However, since at each iteration we can reuse the memory for Step 4, the space complexity of SSD is O(N N_d). □

As mentioned earlier, SSD removes the functions that do not evolve linearly in time according to the available data snapshots. As a consequence, as we gather more data, the identified subspace either remains the same or gets smaller, as stated next.
Lemma 5.3: (Monotonicity of SSD Output with Respect to Data Addition):
Let D(X), D(Y) and D(X̂), D(Ŷ) be two pairs of dictionary snapshot matrices such that

rows([D(X), D(Y)]) ⊆ rows([D(X̂), D(Ŷ)]),    (18)

and for which Assumption 3.1 holds. Then

R(SSD(D(X̂), D(Ŷ))) ⊆ R(SSD(D(X), D(Y))).

Proof:
We use the shorthand notation Ĉ = SSD(D(X̂), D(Ŷ)) and C = SSD(D(X), D(Y)). From (18), we deduce that there exists a matrix E with rows(E) ⊆ rows(I_{♯rows(X̂)}) such that

E D(X̂) = D(X),  E D(Ŷ) = D(Y).    (19)

Moreover, based on the definition of Ĉ and Theorem 5.1(b), we have R(D(X̂)Ĉ) = R(D(Ŷ)Ĉ). Hence, there exists a full rank square matrix K̂ such that

D(Ŷ)Ĉ = D(X̂)Ĉ K̂.

Multiplying both sides from the left by E and using (19) gives D(Y)Ĉ = D(X)Ĉ K̂. Consequently, we have R(D(Y)Ĉ) = R(D(X)Ĉ). Now, the maximality of C (Theorem 5.1(c)) implies R(Ĉ) ⊆ R(C).

Remark 5.4: (Implementing SSD using Finite Precision and Approximation of Koopman-invariant Subspaces): Since SSD is an iterative method, its implementation on finite-precision digital computers leads to small errors that can affect the rank and null space of [A_i, B_i] in Step 4. To circumvent this issue, one can approximate [A_i, B_i] at each iteration by a close (in the Frobenius norm) low-rank matrix. Let σ_1 ≥ ... ≥ σ_{l_i} be the singular values of [A_i, B_i] ∈ ℝ^{N×l_i}. Given a design parameter ε > 0, let k_i be the minimum integer such that

( Σ_{j=k_i}^{l_i} σ_j² / Σ_{j=1}^{l_i} σ_j² )^{1/2} ≤ ε.    (20)

By setting σ_{k_i} = ··· = σ_{l_i} = 0 in the singular value decomposition of [A_i, B_i], one can construct the matrix [Â_i, B̂_i] with lower rank. Also,

‖[A_i, B_i] − [Â_i, B̂_i]‖_F / ‖[A_i, B_i]‖_F ≤ ε.    (21)

Hence, ε tunes the accuracy of the approximation. One can also use the same method to approximate Koopman-invariant subspaces, which is especially convenient when the dictionary does not contain a nontrivial Koopman-invariant subspace. □
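A possible implementation of this truncation (our sketch of (20)-(21), as reconstructed above):

```python
import numpy as np

def truncate(M, eps):
    """Replace M by a nearby lower-rank matrix: zero out the smallest
    singular values so the relative Frobenius error (21) is at most eps."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    # tail[k] = Frobenius norm of the discarded part if sigma_{k+1}.. are zeroed
    tail = np.sqrt(np.append(np.cumsum(s[::-1] ** 2)[::-1], 0.0))
    k = int(np.argmax(tail / np.linalg.norm(s) <= eps))  # minimal k_i in (20)
    s = s.copy()
    s[k:] = 0.0
    return (U * s) @ Vh
```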
If the output of the SSD algorithm is not trivial, C_SSD ≠ 0, we define the invariant dictionary as

D̃(x) := D(x) C_SSD.    (22)

Based on Theorem 5.1(b), we have

R(D̃(X)) = R(D̃(Y)).    (23)

Moreover, D̃(X) and D̃(Y) have full column rank as a result of Assumption 3.1 and Theorem 5.1(b). Hence, there exists a unique nonsingular square matrix K_SSD such that

D̃(Y) = D̃(X) K_SSD.    (24)

One can find K_SSD in closed form as

K_SSD = D̃(X)† D̃(Y) = (D(X) C_SSD)† (D(Y) C_SSD).    (25)

It is important to note that, based on (23), one can find K_SSD based only on partial data, instead of calculating the pseudo-inverse of the complete matrix D(X) C_SSD. Formally, consider full column rank data matrices D(X̂), D(Ŷ) such that rows([D(X̂), D(Ŷ)]) ⊆ rows([D(X), D(Y)]). Then, one can derive

K_SSD = (D(X̂) C_SSD)† (D(Ŷ) C_SSD).
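Computationally, once a nonzero C has been obtained (e.g., with the ssd sketch above), K_SSD in (25) follows from one more least-squares solve; a minimal sketch (our construction):

```python
import numpy as np

def k_ssd(DX, DY, C):
    """Restricted EDMD matrix (25): K_SSD = (D(X) C)^† (D(Y) C)."""
    return np.linalg.lstsq(DX @ C, DY @ C, rcond=None)[0]

# Each eigenpair (lambda, w) of K_SSD yields a function f(x) = D(x) C w
# with f(y_i) = lambda f(x_i) on the data (cf. Theorem 5.5 below).
# lams, W = np.linalg.eig(k_ssd(DX, DY, C))
```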
Next, we show that the eigenvectors of K_SSD fully characterize the functions that evolve linearly in time according to the available data.

Theorem 5.5: (Identification of Linear Evolutions using the SSD Algorithm):
Suppose that Assumption 3.1 holds. Let C_SSD = SSD(D(X), D(Y)) ≠ 0, K_SSD = (D(X) C_SSD)† (D(Y) C_SSD), and f(x) ∈ span(D(x)) written as f(x) = D(x)v with v ∈ ℂ^{N_d} \ {0}. Then f(y_i) = λf(x_i) for some λ ∈ ℂ \ {0} and for all i ∈ {1, ..., ♯rows(X)} if and only if v = C_SSD w with K_SSD w = λw.

Proof: (⇐): Based on the definition of K_SSD, Assumption 3.1, and considering the fact that C_SSD has full column rank (Theorem 5.1(b)), one can use (22)-(24) and the fact that K_SSD w = λw to write D(Y) C_SSD w = λ D(X) C_SSD w. Consequently, using v = C_SSD w, we have

D(Y)v = λD(X)v.

By inspecting the equation above in a row-wise manner, one can deduce that f(y_i) = λf(x_i) for all i ∈ {1, ..., ♯rows(X)}, as claimed.

(⇒): Based on the hypotheses, we have

D(Y)v = λD(X)v.    (26)

Consider first the case when v ∈ ℝ^{N_d}. Then, using (26), we deduce R(D(X)v) = R(D(Y)v). The maximality of C_SSD (Theorem 5.1(c)) implies that R(v) ⊆ R(C_SSD) and, consequently, v = C_SSD w for some w. Replacing v by C_SSD w in (26) and using the definition of K_SSD, one deduces K_SSD w = λw.

Now, suppose that v = v_R + j v_I with v_I ≠ 0. Since D(X) and D(Y) are real matrices, one can use (26) and write D(Y)v̄ = λ̄D(X)v̄. This, together with (26), implies

D(Y)E = D(X)E Λ,    (27)

where E = [v_R, v_I] and
Λ = [ Re(λ)  Im(λ); −Im(λ)  Re(λ) ].

Since Λ is full rank, we have R(D(X)E) = R(D(Y)E) and, using Theorem 5.1(c), one can conclude R(E) ⊆ R(C_SSD). Consequently, there exists a real matrix z such that E = C_SSD z. By replacing this in (27), multiplying both sides from the right by r = [1, j]^T, and defining w = zr, one can conclude that v = Er = C_SSD w and D(Y) C_SSD w = λ D(X) C_SSD w. This, in conjunction with the definition of K_SSD, implies that K_SSD w = λw, concluding the proof.

Using Theorem 5.5, one can identify all the linear evolutions in the span of the original dictionary, thereby establishing an equivalence with the forward-backward EDMD characterization of Section IV.

Corollary 5.6: (Equivalence of Forward-Backward EDMD and SSD in the Identification of Linear Evolutions):
Suppose that Assumption 3.1 holds. Let K_f = K_EDMD(D(X), D(Y)), K_b = K_EDMD(D(Y), D(X)), C_SSD = SSD(D(X), D(Y)) ≠ 0, and K_SSD = (D(X) C_SSD)† (D(Y) C_SSD). Then, K_f v = λv and K_b v = λ^{-1} v for some v ∈ ℂ^{N_d} \ {0} and λ ∈ ℂ \ {0} if and only if there exists a vector w such that v = C_SSD w and K_SSD w = λw.

The proof of this result is a consequence of Theorems 4.2 and 5.5. Note that the linear evolutions identified by SSD might not be Koopman eigenfunctions, since we can only guarantee that they evolve linearly according to the available data snapshots, not starting everywhere in the state space M. The following result uses the equivalence between SSD and the forward-backward EDMD method to provide a guarantee for the identification of Koopman eigenfunctions.

Theorem 5.7: (Identification of Koopman Eigenfunctions by the SSD Algorithm):
Given an infinite sampling, suppose that the sequence of dictionary snapshot matrices is R-rich. For N ≥ R, let C_N^SSD = SSD(D(X_N), D(Y_N)) ≠ 0 and K_N^SSD = (D(X_N) C_N^SSD)† (D(Y_N) C_N^SSD). Given v ∈ ℂ^{N_d} \ {0} and λ ∈ ℂ \ {0}, let f(x) = D(x)v. Then,

(a) if f is an eigenfunction of the Koopman operator with eigenvalue λ, then, for every N ≥ R, there exists w_N such that v = C_N^SSD w_N and K_N^SSD w_N = λw_N;
(b) conversely, and assuming the dictionary functions are continuous and Assumption 4.3 holds, if v ∈ R(C_N^SSD) and there exists w_N such that v = C_N^SSD w_N and K_N^SSD w_N = λw_N for every N ≥ R, then f is an eigenfunction of the Koopman operator with probability 1.

This result is a consequence of Theorem 4.5 and Corollary 5.6. Theorem 5.7 shows that the SSD algorithm finds all the eigenfunctions in the span of the original dictionary almost surely. The identified eigenfunctions span a Koopman-invariant subspace. This subspace, however, is not necessarily the maximal Koopman-invariant subspace in the span of the original dictionary. Next, we show that the SSD method actually identifies the maximal Koopman-invariant subspace in the span of the dictionary.

Theorem 5.8: (SSD Finds the Maximal Koopman-Invariant Subspace as N → ∞): Given an infinite sampling and a dictionary composed of continuous functions, suppose that the sequence of dictionary snapshot matrices is R-rich and Assumption 4.3 holds. Let the columns of C_∞^SSD form a basis for lim_{N→∞} R(C_N^SSD), i.e.,

R(C_∞^SSD) = lim_{N→∞} R(C_N^SSD) = ∩_{N=R}^∞ R(C_N^SSD)    (28)

(note that the sequence {R(C_N^SSD)}_{N=1}^∞ is monotonic, and hence convergent). Then span(D(x) C_∞^SSD) is the maximal Koopman-invariant subspace in the span of the dictionary D with probability 1.

Proof: If C_∞^SSD = 0, considering the fact that, for all N ≥ R, R(C_{N+1}^SSD) ⊆ R(C_N^SSD) (Lemma 5.3), one deduces that there exists m ∈ ℕ such that, for all i ≥ m, C_i^SSD = 0. Hence, based on Theorem 5.1(c), the maximal Koopman-invariant subspace acquired from the data is {0}. Noting that the subspace identified by SSD contains the maximal Koopman-invariant subspace, we deduce that the latter is the zero subspace, which is indeed spanned by D(x) C_∞^SSD.

Now, suppose that C_∞^SSD ≠ 0 and has full column rank. First, we show that

R(D(X_N) C_∞^SSD) = R(D(Y_N) C_∞^SSD),  ∀N ≥ R.    (29)

Considering (28) and the fact that, for all N ≥ R, R(C_{N+1}^SSD) ⊆ R(C_N^SSD), we can write, for all N ≥ R,

R(C_∞^SSD) = ∩_{i=N}^∞ R(C_i^SSD).

Invoking Lemma A.4, we have, for all N ≥ R,

R(D(X_N) C_∞^SSD) = ∩_{i=N}^∞ R(D(X_N) C_i^SSD),    (30a)
R(D(Y_N) C_∞^SSD) = ∩_{i=N}^∞ R(D(Y_N) C_i^SSD).    (30b)

Moreover, for all i ≥ N we have R(D(X_i) C_i^SSD) = R(D(Y_i) C_i^SSD) and hence, by looking at this equality in a row-wise manner, one can write

R(D(X_N) C_i^SSD) = R(D(Y_N) C_i^SSD),  ∀i ≥ N.    (31)

The combination of (30) and (31) yields (29). Based on the latter, the fact that D(X_N) and D(Y_N) have full column rank for every N ≥ R, and the fact that C_∞^SSD has full column rank, there exists a unique nonsingular square matrix K_∞^SSD ∈ ℝ^{♯cols(C_∞^SSD)×♯cols(C_∞^SSD)} such that

D(X_N) C_∞^SSD K_∞^SSD = D(Y_N) C_∞^SSD,  ∀N ≥ R.    (32)

Note that K_∞^SSD does not depend on N.
Next, we aim to prove that, for every function f ∈ span(D(x) C_∞^SSD), K(f) is also in span(D(x) C_∞^SSD) almost surely. Let v ∈ ℝ^{♯cols(C_∞^SSD)} be such that f(x) = D(x) C_∞^SSD v and define

g(x) := D(x) C_∞^SSD K_∞^SSD v.    (33)

We show that g = f ∘ T = K(f) almost surely. Define the function h := g − f ∘ T. Also, let S_∞ = ∪_{N=R}^∞ S_N be the set of initial conditions. Based on (32), (33), and the definition of h, we have

h(x) = 0,  ∀x ∈ S_∞.

Moreover, h is continuous since D and T are continuous. This, together with the fact that S_∞ is dense in M almost surely (Assumption 4.3), implies h ≡ 0 on M almost surely. Therefore, g = K(f) = f ∘ T with probability 1. Noting that g(x) ∈ span(D(x) C_∞^SSD), we have proven that span(D(x) C_∞^SSD) is Koopman-invariant almost surely.

Finally, we prove the maximality of span(D(x) C_∞^SSD). Let L be a Koopman-invariant subspace in span(D(x)). Then there exists a full column rank matrix E such that L = span(D(x) E). Moreover, since the invariance of L is reflected in the data, we have

R(D(X_N) E) = R(D(Y_N) E),  ∀N ≥ R.

As a result, based on Theorem 5.1(c), we have R(E) ⊆ R(C_N^SSD) for all N ≥ R, and hence R(E) ⊆ R(C_∞^SSD). Therefore, by Lemma A.2 we have L = span(D(x) E) ⊆ span(D(x) C_∞^SSD), which completes the proof.

Remark 5.9: (Generalized Koopman Eigenfunctions):
One can also extend the above discussion to generalized Koopman eigenfunctions (see e.g. [6, Remark 11]). Given a generalized eigenvector w of K_SSD, the corresponding generalized Koopman eigenfunction is φ(x) = D(x) C_SSD w. □
VI. STREAMING SYMMETRIC SUBSPACE DECOMPOSITION
In this section, we consider the setup where data becomes available in a streaming fashion. A straightforward algorithmic solution for this scenario would be to re-run, at each timestep, the SSD algorithm with all the data available up to then. However, this approach does not take advantage of the answers computed in previous timesteps, and may become inefficient when the size of the data is large. Instead, here we pursue the design of an online algorithm, termed Streaming Symmetric Subspace Decomposition (SSSD), cf. Algorithm 2, that updates the identified subspaces using the previously computed ones. Note that the SSSD algorithm is not only useful for streaming data sets but also for the case of non-streaming large data sets, for which the execution of SSD requires a significant amount of memory.
Algorithm 2
Streaming Symmetric Subspace Decomposition

 1: Initialization
 2: D_S^X(1) ← [D(X_S); D(x_{S+1})], D_S^Y(1) ← [D(Y_S); D(y_{S+1})]
 3: i ← 1, A_1 ← D_S^X(1), B_1 ← D_S^Y(1), C_0 ← I_{N_d}
 4: while 1 do
 5:   if C_{i−1} = 0 then
 6:     C_i ← 0    ▷ The basis does not exist
 7:     return C_i
 8:     break
 9:   end if
10:   F_i ← SSD(A_i, B_i)
11:   if F_i = 0 then
12:     C_i ← 0    ▷ The basis does not exist
13:     return C_i
14:     break
15:   end if
16:   if ♯rows(F_i) > ♯cols(F_i) then
17:     C_i ← basis(R(C_{i−1} F_i))    ▷ Subspace reduction
18:   else
19:     C_i ← C_{i−1}    ▷ No change
20:   end if
21:   return C_i
22:   i ← i + 1
23:   ▷ Replacing the last data snapshot with the new one
24:   D_S^X(i) ← [D(X_S); D(x_{S+i})], D_S^Y(i) ← [D(Y_S); D(y_{S+i})]
25:   ▷ Calculating the reduced dictionary snapshots
26:   A_i ← D_S^X(i) C_{i−1}, B_i ← D_S^Y(i) C_{i−1}
27: end while

Given the signature snapshot matrices X_S and Y_S, for some S ∈ ℕ, and a dictionary of functions D, the SSSD algorithm proceeds as follows: at each iteration, the algorithm receives a new pair of data snapshots, combines them with the signature data matrices, and applies the latest available dictionary to them. Then, it uses SSD on those dictionary matrices and further prunes the dictionary.
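A sketch of one SSSD iteration in NumPy (our construction; it reuses the ssd helper from the Algorithm 1 sketch, and a QR factorization plays the role of the basis(·) function):

```python
import numpy as np

def sssd_step(DXS, DYS, Dx_new, Dy_new, C_prev, tol=1e-10):
    """One iteration of Algorithm 2. DXS, DYS hold the signature dictionary
    matrices D(X_S), D(Y_S); Dx_new, Dy_new hold the dictionary evaluated at
    the newest snapshot pair; C_prev is the basis from the previous step."""
    if C_prev.shape[1] == 0:                  # C_{i-1} = 0: basis does not exist
        return C_prev
    A = np.vstack([DXS, Dx_new]) @ C_prev     # A_i = D_S^X(i) C_{i-1}
    B = np.vstack([DYS, Dy_new]) @ C_prev     # B_i = D_S^Y(i) C_{i-1}
    F = ssd(A, B, tol)                        # Step 10
    if F.shape[1] == 0:                       # F_i = 0: basis does not exist
        return F
    if F.shape[0] > F.shape[1]:               # Step 17: subspace reduction
        Q, _ = np.linalg.qr(C_prev @ F)       # orthonormal basis(R(C_{i-1} F_i))
        return Q
    return C_prev                             # Step 19: no change
```

Feeding the snapshot pairs one at a time and passing the returned basis back in as C_prev yields, by Theorem 6.3 below, the same subspace SSD would compute on all the data received so far.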
Since the SSD algorithm relies on Assumption 3.1, we make the following assumption on the signature snapshots and the original dictionary.

Assumption 6.1: (Full Rank Signature Dictionary Matrices): We assume that there exists S ∈ ℕ such that the matrices D(X_S) and D(Y_S) have full column rank. □

For a finite number of data snapshots, Assumption 6.1 is equivalent to Assumption 3.1. For an infinite sampling, Assumption 6.1 holds for an R-rich sequence of snapshot matrices.

The next result discusses the basic properties of the SSSD output at each iteration.

Proposition 6.2: (Properties of SSSD Output):
Suppose Assumption 6.1 holds. For i ∈ ℕ, let C_i denote the output of the SSSD algorithm at the i-th iteration. Then, for all i ∈ ℕ,

(a) C_i has full column rank or is equal to zero;
(b) R(C_i) ⊆ R(C_{i−1});
(c) R(D_S^X(i) C_i) = R(D_S^Y(i) C_i).

Proof: (a) We prove the claim by induction. C_0 = I_{N_d} and has full column rank. Now, suppose that C_k has full column rank or is zero. We show the same fact for C_{k+1}. If C_k = 0, then SSSD executes Step 6 and we have C_{k+1} = 0. Now, suppose that C_k has full column rank. Considering the fact that D_S^X(k+1) and D_S^Y(k+1) have full column rank, one can deduce that A_{k+1} and B_{k+1} have full column rank. Consequently, based on Theorem 5.1(b), F_{k+1} has full column rank or is equal to zero. In the former case, the algorithm executes Step 17 or Step 19 and, based on the definition of the basis function and the fact that C_k has full column rank, one deduces that C_{k+1} has full column rank. In the latter case, the algorithm executes Step 12 and C_{k+1} = 0, as claimed.

Now we prove (b). Note that, at iteration i, C_i will be determined by either Step 6, 12, 19, or 17. The proof for the first three cases is trivial. We only need to prove the result for the case when the SSSD algorithm executes Step 17. Based on Theorem 5.1(b), one can deduce that F_i has full column rank. Also, we have R(F_i) ⊆ R(I_{♯cols(C_{i−1})}). Hence, using Step 17 and Lemma A.2, one can write

R(C_i) = R(C_{i−1} F_i) ⊆ R(C_{i−1} I_{♯cols(C_{i−1})}) = R(C_{i−1}),

as claimed.

Next, we prove part (c). If the SSSD algorithm executes Step 6 or Step 12, then the result follows directly. Now, suppose that the algorithm executes Step 17 or Step 19. Note that if the algorithm executes one of these two steps, then F_i ≠ 0, C_{i−1} ≠ 0, and they have full column rank (Theorem 5.1(b)). Hence, ♯rows(F_i) ≥ ♯cols(F_i). As a result, if the algorithm executes Step 19, we have ♯rows(F_i) = ♯cols(F_i) and, consequently, R(F_i) = R(I_{♯cols(C_{i−1})}). Therefore,

R(C_i) = R(C_{i−1}) = R(C_{i−1} I_{♯cols(C_{i−1})}) = R(C_{i−1} F_i).    (34)

Moreover, if the SSSD algorithm executes Step 17, then, using the definition of the basis function, we have

R(C_i) = R(C_{i−1} F_i).    (35)

Also, based on the definition of F_i at Step 10, Theorem 5.1(b), and the fact that A_i = D_S^X(i) C_{i−1} and B_i = D_S^Y(i) C_{i−1},

R(D_S^X(i) C_{i−1} F_i) = R(D_S^Y(i) C_{i−1} F_i).

Using this in conjunction with (34) upon execution of Step 19 and (35) upon execution of Step 17, one deduces R(D_S^X(i) C_i) = R(D_S^Y(i) C_i), concluding the proof.

Next, we show that the SSSD algorithm at each iteration identifies exactly the same subspace as the SSD algorithm given all the data up to that iteration.

Theorem 6.3: (Equivalence of SSD and SSSD):
Suppose Assumption 6.1 holds. For i ∈ ℕ, let C_i denote the output of the SSSD algorithm at the i-th iteration and let C_i^SSD = SSD(D(X_{S+i}), D(Y_{S+i})). Then,

R(C_i) = R(C_i^SSD),  ∀i ∈ ℕ.

Proof: Inclusion R(C_i^SSD) ⊆ R(C_i) for all i ∈ ℕ: We reason by induction. Note that, in the SSSD algorithm, for i = 1 we have F_1 = C_1^SSD. As a result, if F_1 = 0, then, based on Step 12, C_1 = C_1^SSD = 0. If the SSSD algorithm executes Step 17, then, using the fact that C_0 = I_{N_d}, one can write R(C_1) = R(C_1^SSD). Moreover, if the SSSD algorithm executes Step 19, based on Step 16 and Theorem 5.1(b), one can deduce that R(C_1^SSD) = R(C_1) = R(F_1) = R(I_{N_d}). Consequently, in all cases

R(C_1^SSD) = R(C_1).    (36)

Hence, R(C_1^SSD) ⊆ R(C_1). Now, suppose that

R(C_k^SSD) ⊆ R(C_k).    (37)

We need to show that R(C_{k+1}^SSD) ⊆ R(C_{k+1}). If C_{k+1}^SSD = 0, then the proof follows. Now assume that C_{k+1}^SSD ≠ 0; it then has full column rank based on Theorem 5.1(b). By Lemma 5.3, we have

R(C_{k+1}^SSD) ⊆ R(C_k^SSD).    (38)

Using (37) and (38), one can deduce that R(C_{k+1}^SSD) ⊆ R(C_k). Consequently, based on the fact that C_{k+1}^SSD ≠ 0, we have C_k ≠ 0, and hence C_k has full column rank based on Proposition 6.2(a). Moreover, there exists a matrix E_k with full column rank such that

C_{k+1}^SSD = C_k E_k.    (39)

Two cases are possible. In case 1, the SSSD algorithm executes Step 19. In case 2, the algorithm executes Step 12 or Step 17. For case 1, we have C_{k+1} = C_k. Consequently, using (39) and considering the facts that R(E_k) ⊆ R(I_{♯cols(C_k)}) and that C_k has full column rank, one can use Lemma A.2 and conclude

R(C_{k+1}^SSD) = R(C_k E_k) ⊆ R(C_k) = R(C_{k+1}).    (40)

Now, consider case 2. In this case, we have

R(C_{k+1}) = R(C_k F_{k+1}).    (41)

Also, based on the definition of C_{k+1}^SSD and Theorem 5.1(b), one can write

R(D(X_{S+k+1}) C_{k+1}^SSD) = R(D(Y_{S+k+1}) C_{k+1}^SSD).

Looking at this equation in a row-wise manner and considering the fact that

rows([D_S^X(k+1), D_S^Y(k+1)]) ⊆ rows([D(X_{S+k+1}), D(Y_{S+k+1})]),

one can write

R(D_S^X(k+1) C_{k+1}^SSD) = R(D_S^Y(k+1) C_{k+1}^SSD).

Now, using (39), we have R(D_S^X(k+1) C_k E_k) = R(D_S^Y(k+1) C_k E_k). Also, noting the definition of F_{k+1} and the fact that A_{k+1} = D_S^X(k+1) C_k and B_{k+1} = D_S^Y(k+1) C_k, one can use Theorem 5.1(c) to write R(E_k) ⊆ R(F_{k+1}). Since C_k has full column rank, one can use (39), (41), and Lemma A.2 to write

R(C_{k+1}^SSD) = R(C_k E_k) ⊆ R(C_k F_{k+1}) = R(C_{k+1}).    (42)

In both cases, equations (40) and (42) conclude the induction.

Inclusion R(C_i) ⊆ R(C_i^SSD) for all i ∈ ℕ: We reason by induction too. Using (36), we have R(C_1) ⊆ R(C_1^SSD). Now, suppose that

R(C_k) ⊆ R(C_k^SSD).    (43)

We prove the same result for k+1. If C_{k+1} = 0, then the result directly follows. Now, assume that C_{k+1} ≠ 0. Consequently, based on (43), Proposition 6.2(a), and Theorem 5.1(b), we deduce that C_{k+1} and C_{k+1}^SSD have full column rank.

The first part of the proof and (43) imply that R(C_k) = R(C_k^SSD). Consequently, noting the fact that C_k^SSD is the output of the SSD algorithm with D(X_{S+k}) and D(Y_{S+k}), one can use Theorem 5.1(b) to write

R(D(X_{S+k}) C_k) = R(D(Y_{S+k}) C_k).    (44)

Moreover, based on Proposition 6.2(b), we have R(C_{k+1}) ⊆ R(C_k).
Hence, since $C_k$ and $C_{k+1}$ have full column rank, there exists a matrix $G_k$ with full column rank such that
$$C_{k+1} = C_k G_k. \quad (45)$$
Also, based on Proposition 6.2(c) at iteration $k+1$ of the SSSD algorithm,
$$\mathcal{R}\big(D_{XS}(k+1) C_{k+1}\big) = \mathcal{R}\big(D_{YS}(k+1) C_{k+1}\big). \quad (46)$$
Consequently, based on (45) and (46), we have $\mathcal{R}\big(D(X_S) C_k G_k\big) = \mathcal{R}\big(D(Y_S) C_k G_k\big)$. Using this equation together with (44) and Lemma A.3, one can write
$$\mathcal{R}\big(D(X_{S+k}) C_k G_k\big) = \mathcal{R}\big(D(Y_{S+k}) C_k G_k\big).$$
Moreover, using (45), one can write
$$\mathcal{R}\big(D(X_{S+k}) C_{k+1}\big) = \mathcal{R}\big(D(Y_{S+k}) C_{k+1}\big).$$
Hence, there exists a nonsingular square matrix $K^*$ such that
$$D(X_{S+k}) C_{k+1} K^* = D(Y_{S+k}) C_{k+1}. \quad (47)$$
Also, based on (46) and noting that $D_{XS}(k+1)$, $D_{YS}(k+1)$, and $C_{k+1}$ have full column rank, there exists a nonsingular square matrix $K$ such that
$$D_{XS}(k+1) C_{k+1} K = D_{YS}(k+1) C_{k+1}. \quad (48)$$
Using the first $S$ rows of (47) and (48), one can write
$$D(X_S) C_{k+1} K^* = D(Y_S) C_{k+1}, \qquad D(X_S) C_{k+1} K = D(Y_S) C_{k+1}.$$
By subtracting the second equation from the first one, we get $D(X_S) C_{k+1} (K^* - K) = 0$. Moreover, since $D(X_S) C_{k+1}$ has full column rank, we deduce $K^* = K$. Using this in conjunction with (47) and (48) leads to
$$\mathcal{R}\big(D(X_{S+k+1}) C_{k+1}\big) = \mathcal{R}\big(D(Y_{S+k+1}) C_{k+1}\big). \quad (49)$$
Based on this equation, the definition of $C^{\rm SSD}_{k+1}$, and Theorem 5.1(c), we deduce that $\mathcal{R}(C_{k+1}) \subseteq \mathcal{R}(C^{\rm SSD}_{k+1})$, concluding the proof.

Theorem 6.3 establishes the equivalence between the SSSD and SSD algorithms. As a consequence, all results regarding the identification of Koopman-invariant subspaces and eigenfunctions presented in Section V are also valid for the output of the SSSD algorithm.
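Theorem 6.3 also suggests a simple numerical sanity check: run the streaming update and the batch SSD side by side on the same data and compare the ranges of their outputs. Below is a sketch, reusing the hypothetical ssd and sssd_step helpers above and comparing subspaces via principal angles:

```python
import numpy as np
from scipy.linalg import orth, subspace_angles

def same_range(C1, C2, tol=1e-8):
    # R(C1) == R(C2)? Equal-dimensional subspaces coincide iff all their
    # principal angles vanish.
    if C1.shape[1] != C2.shape[1]:
        return False
    if C1.shape[1] == 0:
        return True
    return np.max(subspace_angles(orth(C1), orth(C2))) < tol

# DX, DY: dictionary snapshot matrices; the first S rows are taken as the
# signature snapshots (e.g., S = N_d).
S = DX.shape[1]
C = np.eye(DX.shape[1])                   # C_0 = I_{N_d}
for i in range(1, DX.shape[0] - S + 1):
    new = slice(S + i - 1, S + i)         # the i-th streamed snapshot pair
    C = sssd_step(C, np.vstack([DX[:S], DX[new]]),
                     np.vstack([DY[:S], DY[new]]))
    assert same_range(C, ssd(DX[:S + i], DY[:S + i]))   # Theorem 6.3
```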
Remark 6.4: (Time and Space Complexity of the SSSD Algorithm): Given the first $N$ data snapshots and a dictionary with $N_d$ elements, with $N > S \geq N_d$, and assuming that operations on scalar elements require time and space of order $O(1)$, the most time- and memory-consuming operation in the SSSD algorithm is Step 10, which invokes SSD. In this step, the most time-consuming operation is performing the SVD, with time complexity $O(S N_d^2)$ and space complexity $O(S N_d)$, see e.g. [43]. After the first SVD has been performed, the ensuing ones result in a reduction of the dimension of the subspace. Therefore, the SSSD algorithm performs at most $N - S$ SVDs with no subspace reduction, with overall time complexity $O(N S N_d^2)$, and at most $N_d$ SVD operations with subspace reductions, with overall time complexity $O(S N_d^3)$. Considering the fact that $N \geq N_d$, the complexity of the SSSD algorithm is $O(N S N_d^2)$. Moreover, in many real-world applications $S = O(N_d)$ (in fact, usually $S = N_d$), which reduces the time complexity of SSSD to $O(N N_d^3)$, the same complexity as SSD. Moreover, since we can reuse the space used in Step 10 at each iteration, and considering the fact that the space complexity of this step is $O(S N_d)$, we deduce that the space complexity of SSSD is $O(S N_d)$. This usually reduces to $O(N_d^2)$, since $S = O(N_d)$ in many real-world applications. □
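For concreteness, the operation count in Remark 6.4 can be summarized in one line (using the SVD costs quoted above):
$$\underbrace{(N-S)\cdot O(S N_d^2)}_{\text{SVDs without subspace reduction}} \; + \; \underbrace{N_d \cdot O(S N_d^2)}_{\text{SVDs with subspace reduction}} \; = \; O(N S N_d^2),$$
which becomes $O(N N_d^3)$ when $S = O(N_d)$.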
Remark 6.5: (SSSD is More Stable and Runs Faster than SSD): The SSSD algorithm is computationally more stable than SSD, since it always works with matrices of size at most $(S+1) \times N_d$, while SSD works with matrices of size $N \times N_d$. Moreover, even though SSD and SSSD have the same time complexity, the SSSD algorithm runs faster for two reasons. First, at each iteration of the SSSD algorithm, the dictionary gets smaller, which reduces the cost of computation for the remaining data snapshots. Second, the characterizations in Remarks 5.2 and 6.4 only consider the number of floating-point operations for the time complexity and ignore the amount of time spent loading the data. SSSD needs to load significantly smaller data matrices, which leads to a considerable reduction in run time compared to SSD. □
VII. SIMULATION RESULTS
Here, we illustrate the efficacy of the proposed methods in two examples. We have purposely chosen examples that result in sparse, low-dimensional representations in order to be able to fully detail the identified Koopman eigenvalues and associated subspaces. It is worth pointing out, however, that the results presented here apply without any restriction on the type of dynamical system, its dimension, or the sparsity of the model in the dictionary.
Example 7.1: (Linear System with Non-invariant Dictionary):
Consider the system
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^{+} = \begin{bmatrix} 0.8 & 0.6 \\ -0.6 & 0.8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad (50)$$
with state $x = [x_1, x_2]^T$. We use the non-invariant dictionary $D(x) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_2^3]$ with $N_d = 9$. Moreover, we gather data snapshots uniformly sampled from a square region of the state space centered at the origin.

We use the SSD and SSSD strategies to identify the maximal Koopman-invariant subspace in $\mathrm{span}(D(x))$. In the SSSD method, we use the first 9 data snapshots as signature snapshots and feed the rest of the data to the algorithm in the order in which they appear in the data set. Since we perform our calculations using finite-precision digital computers, we use the strategy explained in Remark 5.4 with a small tolerance $\epsilon$. Both methods find bases (dictionaries) for the subspace spanned by $\{1, x_1, x_2, x_1^2, x_1 x_2, x_2^2\}$, of dimension 6. It is worth mentioning that SSSD performs the calculations 97% faster than SSD. Using either of the calculated dictionaries with (25), one can find $K_{\rm SSD}$. Moreover, based on Theorems 5.5 and 5.7, we use the eigendecomposition of $K_{\rm SSD}$ to find all the Koopman eigenfunctions associated with system (50) in $\mathrm{span}(D(x))$. Table I shows the identified eigenfunctions and eigenvalues (eigenfunctions are reported up to nonzero scaling). One can readily verify that all the identified eigenfunctions evolve linearly in time according to the dynamics.

TABLE I: Identified eigenfunctions and eigenvalues of the Koopman operator associated with system (50).
Eigenfunction                                   Eigenvalue
$\phi_1(x) = 1$                                 $\lambda_1 = 1$
$\phi_2(x) = x_1 - x_2\, j$                     $\lambda_2 = 0.8 + 0.6j$
$\phi_3(x) = x_1 + x_2\, j$                     $\lambda_3 = 0.8 - 0.6j$
$\phi_4(x) = x_1^2 + x_2^2$                     $\lambda_4 = 1$
$\phi_5(x) = (x_1^2 - x_2^2) - 2 x_1 x_2\, j$   $\lambda_5 = 0.28 + 0.96j$
$\phi_6(x) = (x_1^2 - x_2^2) + 2 x_1 x_2\, j$   $\lambda_6 = 0.28 - 0.96j$

To compare the effectiveness of our methods with EDMD for trajectory prediction, we define error functions that quantify the accuracy of the predictions. Let $\tilde{D}(x)$ be the invariant dictionary identified by SSD (or, equivalently, by SSSD). Given a trajectory $\{x(k)\}_{k=0}^{M}$ of length $M$ and initial condition $x(0)$, we define the following relative error functions for EDMD and SSD:
$$E^{\rm EDMD}_{\rm relative}(k) = \frac{\big\| D(x(k)) - D(x(0)) \big(K_{\rm EDMD}\big)^k \big\|}{\| D(x(k)) \|} \times 100, \quad (51a)$$
$$E^{\rm SSD}_{\rm relative}(k) = \frac{\big\| \tilde{D}(x(k)) - \tilde{D}(x(0)) \big(K_{\rm SSD}\big)^k \big\|}{\| \tilde{D}(x(k)) \|} \times 100, \quad (51b)$$
where $D(x(0)) (K_{\rm EDMD})^k$ and $\tilde{D}(x(0)) (K_{\rm SSD})^k$ are the predicted dictionary vectors at time $k$ calculated using EDMD and SSD, respectively. Also, we can use the angle between the predicted and exact dictionary vectors to quantify the error:
$$E^{\rm EDMD}_{\rm angle}(k) = \angle\big( D(x(k)),\, D(x(0)) (K_{\rm EDMD})^k \big), \quad (52a)$$
$$E^{\rm SSD}_{\rm angle}(k) = \angle\big( \tilde{D}(x(k)),\, \tilde{D}(x(0)) (K_{\rm SSD})^k \big). \quad (52b)$$
Figure 1 illustrates the aforementioned errors along a trajectory starting from a random initial condition in the sampling region for 20 time steps. Unlike EDMD, the linear prediction using the SSD algorithm matches the behavior of the system exactly. Even though EDMD captures the Koopman eigenfunctions exactly (Lemma 4.1), the additional functions in the dictionary do not evolve according to the acquired model, which leads to high prediction errors even after one time step.

Fig. 1: Relative (left) and angle (right) prediction errors for EDMD and SSD for system (50) on a trajectory of length $M = 20$.
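The experiment in this example can be reproduced with a few lines of numerical code. Below is a minimal sketch in Python (the sample count, sampling box, and random seed are illustrative choices, and ssd is the hypothetical helper sketched in the previous section); it builds the dictionary snapshots, computes $K_{\rm EDMD}$ and $K_{\rm SSD}$, and evaluates the relative error (51b) along a trajectory:

```python
import numpy as np

def D(X):
    # dictionary of Example 7.1 evaluated row-wise on states X (n x 2)
    x1, x2 = X[:, :1], X[:, 1:]
    return np.hstack([np.ones_like(x1), x1, x2, x1**2, x1*x2, x2**2,
                      x1**3, x1**2*x2, x2**3])

A = np.array([[0.8, 0.6], [-0.6, 0.8]])        # system (50)
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(5000, 2))     # sampled states (illustrative)
Y = X @ A.T                                    # one-step images under (50)
DX, DY = D(X), D(Y)

K_edmd = np.linalg.pinv(DX) @ DY               # EDMD matrix on the full dictionary
C = ssd(DX, DY)                                # basis of the invariant subspace
K_ssd = np.linalg.pinv(DX @ C) @ (DY @ C)      # K_SSD as in (25)

x = rng.uniform(-2.0, 2.0, size=(1, 2))        # random initial condition
pred = D(x) @ C                                # invariant dictionary at time 0
for k in range(1, 21):
    x, pred = x @ A.T, pred @ K_ssd            # true state vs. linear prediction
    err = 100 * np.linalg.norm(D(x) @ C - pred) / np.linalg.norm(D(x) @ C)
    # err stays at the level of numerical round-off, cf. Fig. 1 (left)
```

Running the same loop with K_edmd and the full dictionary $D(x)$ reproduces the rapidly growing EDMD error in Fig. 1.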
Consider the nonlinear system
$$x_1^{+} = 1.2\, x_1, \qquad x_2^{+} = 1.24\, x_2 + 0.01\, x_1^2 + 0.012, \quad (53)$$
with state $x = [x_1, x_2]^T$. System (53) is actually an unstable Polyflow [44], which has a finite-dimensional Koopman-invariant subspace comprised of polynomials. We use the dictionary $D(x) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3]$ with $N_d = 10$. Moreover, we gather data snapshots uniformly sampled from a square region of the state space. We use the SSD and SSSD strategies to identify the maximal Koopman-invariant subspace in $\mathrm{span}(D(x))$. In the SSSD method, we use the first 10 data snapshots as signature snapshots and feed the rest of the data to the algorithm in the order in which they appear in the data set. Similarly to the previous example, we use the strategy explained in Remark 5.4 with a small tolerance $\epsilon$ to overcome errors due to the use of finite-precision machines. Both methods find bases for the 6-dimensional subspace spanned by $\{1, x_1, x_2, x_1^2, x_1 x_2, x_1^3\}$, which is the maximal Koopman-invariant subspace in $\mathrm{span}(D(x))$. The SSSD method, however, performs the calculations 96% faster than SSD. One can find $K_{\rm SSD}$ using either of the identified dictionaries in conjunction with (25). Moreover, based on Theorems 5.5 and 5.7, we use the eigendecomposition of $K_{\rm SSD}$ to find all the Koopman eigenfunctions associated with system (53) in $\mathrm{span}(D(x))$. Table II shows the identified eigenfunctions. One can verify by direct calculation that the identified functions are Koopman eigenfunctions associated with system (53). Note that since $x_1$ and $x_2$ are both in the span of the identified Koopman-invariant subspace, one can fully characterize the behavior of the system linearly using the eigenfunctions and (6), or directly using the identified dictionary and $K_{\rm SSD}$.

TABLE II: Identified eigenfunctions and eigenvalues of the Koopman operator associated with system (53).
Eigenfunction                               Eigenvalue
$\phi_1(x) = 1$                             $\lambda_1 = 1$
$\phi_2(x) = x_1$                           $\lambda_2 = 1.2$
$\phi_3(x) = x_1^2$                         $\lambda_3 = 1.44$
$\phi_4(x) = 20 x_2 - x_1^2 + 1$            $\lambda_4 = 1.24$
$\phi_5(x) = x_1^3$                         $\lambda_5 = 1.728$
$\phi_6(x) = 20 x_1 x_2 - x_1^3 + x_1$      $\lambda_6 = 1.488$

We compare the predictions obtained with $K_{\rm SSD}$ and $K_{\rm EDMD}$ to show the superiority of the proposed method for long-term prediction. Figure 2 illustrates the relative and angle errors along a trajectory starting from a random initial condition for 20 time steps. Unlike EDMD, the linear prediction using the SSD algorithm matches the behavior of the system exactly. Moreover, since the system is unstable, the EDMD errors grow rapidly and, after only a few time steps, the predicted dictionary vector is nearly in the opposite direction of the actual one.

Fig. 2: Relative (left) and angle (right) prediction errors for EDMD and SSD for system (53) on a trajectory of length $M = 20$.
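Since Table II consists of closed-form polynomials, the statement that each identified function evolves linearly can be checked by direct calculation. A short sketch, using the constants of (53) and Table II as given above:

```python
import numpy as np

def T(x):                                   # the map in system (53)
    return np.array([1.2 * x[0],
                     1.24 * x[1] + 0.01 * x[0]**2 + 0.012])

# (eigenfunction, eigenvalue) pairs from Table II
pairs = [(lambda x: 1.0,                             1.0),
         (lambda x: x[0],                            1.2),
         (lambda x: x[0]**2,                         1.44),
         (lambda x: 20*x[1] - x[0]**2 + 1,           1.24),
         (lambda x: x[0]**3,                         1.728),
         (lambda x: 20*x[0]*x[1] - x[0]**3 + x[0],   1.488)]

rng = np.random.default_rng(1)
for x in rng.uniform(-3.0, 3.0, size=(10, 2)):
    for phi, lam in pairs:
        # phi(T(x)) = lam * phi(x): each eigenfunction evolves linearly
        assert np.isclose(phi(T(x)), lam * phi(x))
```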
VIII. CONCLUSIONS
We have studied the characterization of Koopman-invariant subspaces and Koopman eigenfunctions associated to a dynamical system by means of data-driven methods. We have shown that the application of EDMD over a given dictionary forward and backward in time fully characterizes whether a function evolves linearly in time according to the available data. Building on this result, and under dense sampling, we have established that functions satisfying this condition are Koopman eigenfunctions almost surely. Motivated by the impracticality of checking the proposed conditions for large dictionaries, we have developed an alternative approach to identify the maximal Koopman-invariant subspace in the span of the given dictionary, which in turn also serves to identify the Koopman eigenfunctions. We have formally characterized the correctness of the proposed SSD algorithm, paying special attention to the number of iterations for convergence and its complexity. Finally, we have considered scenarios with large and streaming data sets, and developed the online SSSD algorithm, which refines its output as new data becomes available, taking advantage of the solutions computed in previous timesteps. We have shown that, for a given dataset, both SSD and SSSD provide the same solution. Future work will develop parallel and distributed counterparts of the algorithms proposed here over network systems and investigate the design of noise-resilient methods to identify Koopman eigenfunctions and invariant subspaces.
REFERENCES

[1] M. Haseli and J. Cortés, "Efficient identification of linear evolutions in nonlinear vector fields: Koopman invariant subspaces," in IEEE Conf. on Decision and Control, Nice, France, Dec. 2019, pp. 1746–1751.
[2] B. O. Koopman, "Hamiltonian systems and transformation in Hilbert space," Proceedings of the National Academy of Sciences, vol. 17, no. 5, pp. 315–318, 1931.
[3] B. O. Koopman and J. V. Neumann, "Dynamical systems of continuous spectra," Proceedings of the National Academy of Sciences, vol. 18, no. 3, pp. 255–263, 1932.
[4] I. Mezić, "Spectral properties of dynamical systems, model reduction and decompositions," Nonlinear Dynamics, vol. 41, no. 1-3, pp. 309–325, 2005.
[5] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson, "Spectral analysis of nonlinear flows," Journal of Fluid Mechanics, vol. 641, pp. 115–127, 2009.
[6] M. Budišić, R. Mohr, and I. Mezić, "Applied Koopmanism," Chaos, vol. 22, no. 4, p. 047510, 2012.
[7] A. Surana, M. O. Williams, M. Morari, and A. Banaszuk, "Koopman operator framework for constrained state estimation," in IEEE Conf. on Decision and Control, Melbourne, Australia, 2017, pp. 94–101.
[8] M. Netto and L. Mili, "A robust data-driven Koopman Kalman filter for power systems dynamic state estimation," IEEE Transactions on Power Systems, vol. 33, no. 6, pp. 7228–7237, 2018.
[9] A. Mauroy and J. Goncalves, "Linear identification of nonlinear systems: A lifting technique based on the Koopman operator," in IEEE Conf. on Decision and Control, Las Vegas, NV, Dec. 2016, pp. 6500–6505.
[10] ——, "Koopman-based lifting techniques for nonlinear systems identification," IEEE Transactions on Automatic Control, 2019, to appear.
[11] D. Bruder, C. D. Remy, and R. Vasudevan, "Nonlinear system identification of soft robot dynamics using Koopman operator theory," in IEEE Int. Conf. on Robotics and Automation, Montreal, Canada, May 2019, pp. 6244–6250.
[12] B. Kramer, P. Grover, P. Boufounos, S. Nabi, and M. Benosman, "Sparse sensing and DMD-based identification of flow regimes and bifurcations in complex flows," SIAM Journal on Applied Dynamical Systems, vol. 16, no. 2, pp. 1164–1196, 2017.
[13] S. Sinha, U. Vaidya, and R. Rajaram, "Operator theoretic framework for optimal placement of sensors and actuators for control of nonequilibrium dynamics," Journal of Mathematical Analysis and Applications, vol. 440, no. 2, pp. 750–772, 2016.
[14] S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé, "Data-driven model reduction and transfer operator approximation," Journal of Nonlinear Science, vol. 28, no. 3, pp. 985–1010, 2018.
[15] A. Alla and J. N. Kutz, "Nonlinear model order reduction via dynamic mode decomposition," SIAM Journal on Scientific Computing, vol. 39, no. 5, pp. B778–B796, 2017.
[16] M. Korda and I. Mezić, "Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control," Automatica, vol. 93, pp. 149–160, 2018.
[17] H. Arbabi, M. Korda, and I. Mezić, "A data-driven Koopman model predictive control framework for nonlinear flows," arXiv preprint arXiv:1804.05291, 2018.
[18] S. Peitz and S. Klus, "Koopman operator-based model reduction for switched-system control of PDEs," Automatica, vol. 106, pp. 184–191, 2019.
[19] B. Huang, X. Ma, and U. Vaidya, "Feedback stabilization using Koopman operator," in IEEE Conf. on Decision and Control, Miami Beach, FL, Dec. 2018, pp. 6434–6439.
[20] E. Kaiser, J. N. Kutz, and S. L. Brunton, "Data-driven discovery of Koopman eigenfunctions for control," arXiv preprint arXiv:1707.01146, 2017.
[21] A. Sootla and D. Ernst, "Pulse-based control using Koopman operator under parametric uncertainty," IEEE Transactions on Automatic Control, vol. 63, no. 3, pp. 791–796, 2017.
[22] I. Abraham, G. de la Torre, and T. Murphey, "Model-based control using Koopman operators," in Robotics: Science and Systems, Cambridge, Massachusetts, Jul. 2017.
[23] G. Mamakoukas, M. Castano, X. Tan, and T. Murphey, "Local Koopman operators for data-driven control of robotic systems," in Robotics: Science and Systems, Freiburg, Germany, Jun. 2019.
[24] A. Mauroy and I. Mezić, "Global stability analysis using the eigenfunctions of the Koopman operator," IEEE Transactions on Automatic Control, vol. 61, no. 11, pp. 3356–3369, 2016.
[25] P. J. Schmid, "Dynamic mode decomposition of numerical and experimental data," Journal of Fluid Mechanics, vol. 656, pp. 5–28, 2010.
[26] K. K. Chen, J. H. Tu, and C. W. Rowley, "Variants of dynamic mode decomposition: boundary condition, Koopman, and Fourier analyses," Journal of Nonlinear Science, vol. 22, no. 6, pp. 887–915, 2012.
[27] J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz, "On dynamic mode decomposition: theory and applications," Journal of Computational Dynamics, vol. 1, no. 2, pp. 391–421, 2014.
[28] S. T. M. Dawson, M. S. Hemati, M. O. Williams, and C. W. Rowley, "Characterizing and correcting for the effect of sensor noise in the dynamic mode decomposition," Experiments in Fluids, vol. 57, no. 3, p. 42, 2016.
[29] M. S. Hemati, C. W. Rowley, E. A. Deem, and L. N. Cattafesta, "De-biasing the dynamic mode decomposition for applied Koopman spectral analysis of noisy datasets," Theoretical and Computational Fluid Dynamics, vol. 31, no. 4, pp. 349–368, 2017.
[30] M. R. Jovanović, P. J. Schmid, and J. W. Nichols, "Sparsity-promoting dynamic mode decomposition," Physics of Fluids, vol. 26, no. 2, p. 024103, 2014.
[31] S. L. Clainche and J. M. Vega, "Higher order dynamic mode decomposition," SIAM Journal on Applied Dynamical Systems, vol. 16, no. 2, pp. 882–925, 2017.
[32] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, "A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition," Journal of Nonlinear Science, vol. 25, no. 6, pp. 1307–1346, 2015.
[33] M. Korda and I. Mezić, "On convergence of extended dynamic mode decomposition to the Koopman operator," Journal of Nonlinear Science, vol. 28, no. 2, pp. 687–710, 2018.
[34] M. Haseli and J. Cortés, "Approximating the Koopman operator using noisy data: noise-resilient extended dynamic mode decomposition," in American Control Conference, Philadelphia, PA, Jul. 2019, pp. 5499–5504.
[35] E. Qian, B. Kramer, B. Peherstorfer, and K. Willcox, "Lift & learn: Physics-informed machine learning for large-scale nonlinear dynamical systems," arXiv preprint arXiv:1912.08177, 2019.
[36] Q. Li, F. Dietrich, E. M. Bollt, and I. G. Kevrekidis, "Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator," Chaos, vol. 27, no. 10, p. 103111, 2017.
[37] N. Takeishi, Y. Kawahara, and T. Yairi, "Learning Koopman invariant subspaces for dynamic mode decomposition," in Conference on Neural Information Processing Systems, 2017, pp. 1130–1140.
[38] E. Yeung, S. Kundu, and N. Hodas, "Learning deep neural network representations for Koopman operators of nonlinear dynamical systems," in American Control Conference, Philadelphia, PA, Jul. 2019, pp. 4832–4839.
[39] S. L. Brunton, B. W. Brunton, J. L. Proctor, and J. N. Kutz, "Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control," PLOS One, vol. 11, no. 2, pp. 1–19, 2016.
[40] M. Korda and I. Mezić, "Optimal construction of Koopman eigenfunctions for prediction and control," 2019. [Online]. Available: https://hal.archives-ouvertes.fr/hal-02278835
[41] S. Klus, F. Nüske, S. Peitz, J. H. Niemann, C. Clementi, and C. Schütte, "Data-driven approximation of the Koopman generator: Model reduction, system identification, and control," arXiv preprint arXiv:1909.10638, 2019.
[42] G. B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed. New York: Wiley, 1999.
[43] X. Li, S. Wang, and Y. Cai, "Tutorial: Complexity analysis of singular value decomposition and its variants," arXiv preprint arXiv:1906.12085, 2019.
[44] R. M. Jungers and P. Tabuada, "Non-local linearization of nonlinear differential equations via polyflows," in American Control Conference, Philadelphia, PA, 2019, pp. 1906–1911.

APPENDIX
Here we gather some linear algebraic results used throughout the paper.
Lemma A.1: (Intersection of Linear Spaces):
Let $A, B \in \mathbb{R}^{m \times n}$ be matrices with full column rank. Suppose that the columns of $Z = [Z_A^T, Z_B^T]^T \in \mathbb{R}^{2n \times l}$ form a basis for the null space of $[A, B]$, where $Z_A, Z_B \in \mathbb{R}^{n \times l}$. Then,

(a) $\mathcal{R}(A Z_A) = \mathcal{R}(A) \cap \mathcal{R}(B)$;
(b) $Z_A$ and $Z_B$ have full column rank.

Proof: (a) First, note that $\mathcal{R}(A Z_A) \subseteq \mathcal{R}(A)$. Moreover, by hypothesis, $[A, B] Z = 0$, which leads to $\mathcal{R}(A Z_A) = \mathcal{R}(B Z_B) \subseteq \mathcal{R}(B)$. Consequently, $\mathcal{R}(A Z_A) \subseteq \mathcal{R}(A) \cap \mathcal{R}(B)$. Now, suppose that $v \in \mathcal{R}(A) \cap \mathcal{R}(B)$. By definition, there exist vectors $w_1, w_2 \in \mathbb{R}^n$ such that $v = A w_1 = B w_2$. Then,
$$[A, B] \begin{bmatrix} w_1 \\ -w_2 \end{bmatrix} = 0,$$
which means that $[w_1^T, -w_2^T]^T \in \mathcal{R}(Z)$, and hence there exists $r \in \mathbb{R}^l$ such that $w_1 = Z_A r$ and $w_2 = -Z_B r$. Therefore, $v = A w_1 = A Z_A r \in \mathcal{R}(A Z_A)$. Consequently, $\mathcal{R}(A) \cap \mathcal{R}(B) \subseteq \mathcal{R}(A Z_A)$ and the claim follows.

(b) We prove this part by contradiction. Suppose that there exists $v \neq 0$ such that $Z_A v = 0$. Since $[A, B] Z = 0$, one can conclude that $A Z_A v = -B Z_B v = 0$. Since $B$ has full column rank, we have $Z_B v = 0$. Hence $Z v = 0$, which is a contradiction since the columns of $Z$ are linearly independent. A similar reasoning shows that $Z_B$ has full column rank.
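Numerically, Lemma A.1 gives a direct recipe for computing a basis of the intersection of two ranges from a single null-space computation. A small sketch (the helper name is ours):

```python
import numpy as np
from scipy.linalg import null_space

def range_intersection(A, B, rcond=1e-12):
    # Lemma A.1: if the columns of Z = [Z_A; Z_B] span null([A, B]),
    # then the columns of A Z_A span R(A) ∩ R(B).
    Z = null_space(np.hstack([A, B]), rcond=rcond)
    ZA = Z[:A.shape[1], :]        # top block; full column rank by part (b)
    return A @ ZA

# quick check: span{e1, e2} and span{e2, e3} in R^3 intersect along e2
A = np.array([[1., 0.], [0., 1.], [0., 0.]])
B = np.array([[0., 0.], [1., 0.], [0., 1.]])
V = range_intersection(A, B)      # single column, proportional to e2
```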
Lemma A.2: Let $A, C, D$ be matrices of appropriate sizes, with $A$ having full column rank. Then $\mathcal{R}(AC) \subseteq \mathcal{R}(AD)$ if and only if $\mathcal{R}(C) \subseteq \mathcal{R}(D)$.

Proof: $(\Rightarrow)$ Suppose that $v \in \mathcal{R}(C)$; hence there exists $w$ such that $v = C w$. Therefore, $A v = A C w \in \mathcal{R}(AC)$. Since $\mathcal{R}(AC) \subseteq \mathcal{R}(AD)$, one can deduce that there exists $r$ such that $A C w = A D r$, and we get $A(v - D r) = 0$. Since $A$ has full column rank, this leads to $v = D r \in \mathcal{R}(D)$, and hence $\mathcal{R}(C) \subseteq \mathcal{R}(D)$.

$(\Leftarrow)$ Suppose that $v \in \mathcal{R}(AC)$; hence $v = A C w$ for some $w$. Since $\mathcal{R}(C) \subseteq \mathcal{R}(D)$, there exists $r$ such that $C w = D r$. As a result, $v = A C w = A D r$, which leads to the conclusion that $v \in \mathcal{R}(AD)$, and the claim follows.
Lemma A.3: Let $A_1, B_1 \in \mathbb{R}^{m \times n}$, $A_2, B_2 \in \mathbb{R}^{l \times n}$, and $C \in \mathbb{R}^{n \times k}$, with $A_1$, $B_1$, $C$ having full column rank. If
$$\mathcal{R}\left(\begin{bmatrix} A_1 \\ A_2 \end{bmatrix}\right) = \mathcal{R}\left(\begin{bmatrix} B_1 \\ B_2 \end{bmatrix}\right), \quad (54a)$$
$$\mathcal{R}(A_1 C) = \mathcal{R}(B_1 C), \quad (54b)$$
then
$$\mathcal{R}\left(\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} C\right) = \mathcal{R}\left(\begin{bmatrix} B_1 \\ B_2 \end{bmatrix} C\right).$$

Proof: Based on (54a), one can deduce that there exists a nonsingular square matrix $K$ such that
$$A_1 = B_1 K, \quad (55a)$$
$$A_2 = B_2 K. \quad (55b)$$
Multiplying both sides of (55a) by $C$ gives
$$A_1 C = B_1 K C. \quad (56)$$
Moreover, based on (54b), one can deduce that there exists a nonsingular square matrix $K^*$ such that
$$A_1 C = B_1 C K^*. \quad (57)$$
By subtracting (57) from (56) and considering the fact that $B_1$ has full column rank, one can deduce that
$$C K^* = K C. \quad (58)$$
Now, multiplying both sides of (55b) from the right by $C$ and using (58), one can write $A_2 C = B_2 C K^*$, which in conjunction with (57) leads to
$$\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} C = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} C K^*,$$
completing the proof.
Lemma A.4: Let $A$, $\{C_i\}_{i=1}^{\infty}$, and $\hat{C}$ be matrices of appropriate sizes. Assume that $A$ has full column rank and $\mathcal{R}(\hat{C}) = \bigcap_{i=1}^{\infty} \mathcal{R}(C_i)$. Then $\mathcal{R}(A \hat{C}) = \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$.

Proof: First, we prove that $\mathcal{R}(A \hat{C}) \subseteq \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$. Let $v \in \mathcal{R}(A \hat{C})$; then there exists a vector $w$ such that $v = A \hat{C} w = A r$, with $r = \hat{C} w$. Note that $r \in \mathcal{R}(\hat{C})$ and consequently $r \in \mathcal{R}(C_i)$ for all $i \in \mathbb{N}$. Hence, for every $i \in \mathbb{N}$ there exists $z_i$ such that $r = C_i z_i$. Based on the definition of $v$, we have $v = A r = A C_i z_i$ for every $i \in \mathbb{N}$. As a result, $v \in \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$, which concludes the proof of this inclusion.

Now, we prove that $\bigcap_{i=1}^{\infty} \mathcal{R}(A C_i) \subseteq \mathcal{R}(A \hat{C})$. Let $v \in \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$. Then for every $i \in \mathbb{N}$, there exists $w_i$ such that $v = A C_i w_i$. Moreover, $v \in \mathcal{R}(A)$ and, since $A$ has full column rank, there exists a unique $r$ such that $v = A r$. Therefore, for all $i \in \mathbb{N}$, we have $r = C_i w_i$. Thus, $r \in \bigcap_{i=1}^{\infty} \mathcal{R}(C_i)$ and consequently $r \in \mathcal{R}(\hat{C})$. Since $v = A r$, we have $v \in \mathcal{R}(A \hat{C})$, concluding the proof.

Masih Haseli was born in Kermanshah, Iran, in 1991. He received the B.Sc. degree, in 2013, and the M.Sc. degree, in 2015, both in Electrical Engineering from Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran. In 2017, he joined the University of California, San Diego, to pursue the Ph.D. degree in Mechanical and Aerospace Engineering. His research interests include system identification, nonlinear systems, network systems, data-driven modeling and control, and distributed and parallel computing. Mr. Haseli was the recipient of the bronze medal in Iran's national mathematics competition in 2014.