Delay-coordinate maps, coherence, and approximate spectra of evolution operators

Abstract

The problem of data-driven identification of coherent observables of measure-preserving, ergodic dynamical systems is studied using kernel integral operator techniques. An approach is proposed whereby complex-valued observables with approximately cyclical behavior are constructed from a pair of eigenfunctions of integral operators built from delay-coordinate mapped data. It is shown that these observables are $\epsilon$-approximate eigenfunctions of the Koopman evolution operator of the system, with a bound $\epsilon$ controlled by the length of the delay-embedding window, the evolution time, and appropriate spectral gap parameters. In particular, $\epsilon$ can be made arbitrarily small as the embedding window increases so long as the corresponding eigenvalues remain sufficiently isolated in the spectrum of the integral operator. It is also shown that the time-autocorrelation functions of such observables are $\epsilon$-approximate Koopman eigenvalues, exhibiting a well-defined characteristic oscillatory frequency (estimated using the Koopman generator) and a slowly-decaying modulating envelope. The results hold for measure-preserving, ergodic dynamical systems of arbitrary spectral character, including mixing systems with continuous spectrum and no non-constant Koopman eigenfunctions in $L^2$. Numerical examples reveal a coherent observable of the Lorenz 63 system whose autocorrelation function remains above 0.5 in modulus over approximately 10 Lyapunov timescales.


Research in the Mathematical Sciences manuscript No. (will be inserted by the editor)

Delay-coordinate maps, coherence, and approximate spectra of evolution operators

Dimitrios Giannakis


Keywords

Kernel integral operators · Delay-coordinate maps · Koopman operators · Feature extraction · Ergodic dynamical systems

This paper is dedicated to Andrew Majda on the occasion of his 70th birthday.

D. Giannakis
Department of Mathematics and Center for Atmosphere Ocean Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, NY 10012, USA
E-mail: [email protected]

arXiv [math.DS], Jul

In [17, 27], an interpretation of the timescale separation seen in features recovered from delay-coordinate-mapped data was given through a spectral analysis of kernel integral operators and Koopman evolution operators of dynamical systems [4, 24, 36]. Specifically, it was shown that for a measure-preserving ergodic dynamical system, as the number of delays increases, the commutator between kernel integral operators constructed from delay-embedded data (subject to mild requirements) and the Koopman operator converges to zero in operator norm, meaning that these operators acquire common eigenspaces in the infinite-delay limit. Since (i) kernel integral operators associated with sufficiently regular (e.g., continuous) kernels are compact, and thus have finite-dimensional eigenspaces corresponding to nonzero eigenvalues; and (ii) the eigenspaces of Koopman operators of ergodic dynamical systems are one-dimensional, it follows that in the infinite-delay limit, the eigenspaces of the kernel integral operators employed for feature extraction are a finite union of Koopman eigenspaces. The latter are each characterized by a distinct timescale associated with the corresponding eigenvalue of the generator. In applications, it is oftentimes observed that the eigenspaces of kernel integral operators with large numbers of delays are numerically two-dimensional, meaning that they are associated with a single pair of Koopman eigenfrequencies of equal modulus and opposite sign.
Sampled along orbits of the dynamics, such kernel eigenfunctions have the structure of pure sinusoids, which can be thought of as exhibiting an “ideal” form of timescale separation.

A useful aspect of the results in [17, 27] is that they hold for broad classes of measure-preserving, ergodic dynamical systems (including systems with non-smooth attractors) and choices of kernel, and thus provide relevant information about the asymptotic behavior of a variety of feature extraction techniques utilizing delays, including the methods [1, 7, 10, 11, 28, 30, 31, 61] outlined above. Importantly, the integral operators employed can be consistently approximated in a spectral sense from time series data using well-developed theory [42, 58, 59].

1.2 Motivation and contributions of this work

Despite their generally broad applicability, the results in [17, 27] offer limited insight on the behavior of kernel-based feature extraction techniques utilizing delay-coordinate maps for an important class of dynamical systems, namely systems with mixing behavior (or so-called mixed-spectrum systems with both quasiperiodic and mixing components). Indeed, a necessary and sufficient condition for a measure-preserving dynamical system to be mixing is that the generator on the $L^2$ space associated with the invariant measure has a simple eigenvalue at zero, with a constant corresponding eigenfunction, and no other eigenvalues. As a prototypical example, consider the Lorenz 63 (L63) system [41] on $\mathbb{R}^3$, which is rigorously known to possess an ergodic invariant measure $\mu$ supported on the famous “butterfly” attractor with mixing dynamics [43, 60]. According to [17], for such a system the kernel integral operator in the infinite-delay limit acquires an infinite-dimensional nullspace containing all $L^2(\mu)$ observables orthogonal to the constant, allowing features with arbitrarily broad frequency spectra (i.e., no timescale separation or coherence).
Moreover, data-driven spectral approximation results such as [42, 58, 59] do not hold for the potentially infinite-dimensional nullspaces of compact operators.

Yet, as illustrated in Figure 1, the eigenfunctions of integral operators based on a sufficiently long delay embedding window, $T$, exhibit a form of coherence, which can be thought of as a relaxation of the periodic behavior of Koopman eigenfunctions. In particular, for sufficiently large $T$, the time series associated with the kernel eigenfunctions near the top of the spectrum have the structure of amplitude-modulated waves, with a well-defined carrier frequency and a low-frequency modulating envelope. In effect, the pure sinusoids generated by Koopman eigenfunctions can be thought of as special cases of these patterns with constant modulating envelopes. A similar behavior was observed in [52], who found that with increasing number of delays NLSA provides increasingly coherent representations of the El Niño Southern Oscillation of the climate system, as well as other patterns of climate variability.

The main contribution of this work is to provide a characterization of the coherence properties of eigenfunctions of integral operators constructed from delay-embedded observables of measure-preserving, ergodic dynamical systems of arbitrary (quasiperiodic, mixing, or mixed-spectrum) spectral characteristics, underpinning the behavior in Figure 1. We will do so by showing that a class of complex-valued observables $z$, whose real and imaginary parts are eigenfunctions of an integral operator $K_T : L^2(\mu) \to L^2(\mu)$ constructed using a delay-embedding window of length $T$, lie in the $\varepsilon$-approximate point spectrum of the Koopman operator $U^t$ for a bound $\varepsilon$ that decreases at a rate $O(T^{-1/2})$, but increases with the evolution time $t$ at a linear rate, while also being inversely proportional to the corresponding eigenvalues and the gap between them and the rest of the spectrum of $K_T$.
Moreover, we give an explicit characterization of the modulating envelope and carrier frequency through the time-autocorrelation function of $z$ and its derivative at 0, respectively.

For systems possessing non-constant Koopman eigenfunctions, these results imply that at fixed $t$, $\varepsilon$ can be made arbitrarily small by increasing $T$, so long as $K_T$ satisfies certain positivity conditions that depend on the observation map and the form of the kernel, consistent with the results of [17]. On the other hand, for systems with mixing dynamics, the behavior of $\varepsilon$, and thus the coherence of $z$, is influenced by an interplay between the delay-embedding window length (promoting coherence) and the decay of the eigenvalues of $K_T$ with increasing $T$ (inhibiting coherence). Nevertheless, it is possible that $\varepsilon$ is made small by increasing $T$, so long as the eigenvalues associated with $z$ remain sufficiently isolated in the spectrum of $K_T$.

The plan of this paper is as follows. In Section 2, we describe the class of dynamical systems under study, and state our results, including Theorem 1 which is the main theoretical contribution of this work. Section 3 contains a proof of Theorem 1, and Section 4 describes the data-driven formulation of our framework. We illustrate our results with numerical examples for the L63 system in Section 5, and state our conclusions in Section 6. Auxiliary results and definitions on spectral approximation of integral operators are collected in Appendix A.

Fig. 1

Representative eigenfunctions $\phi_{j,T}$ of the integral operator $K_T$ for (a) no delays, $T = 0$; and (b) a delay-embedding window $T$ equal to 8 natural time units, numerically approximated from a dataset consisting of $N =$ [...] $\Delta t = 0.01$. In each set of panels, the first and second rows show the leading two nonconstant eigenfunctions of $K_T$, ordered by decreasing corresponding eigenvalue. The first column from the left shows a scatterplot of $\phi_{j,T}$ on the dataset. The second and third panels show scatterplots of $\phi_{j,T}$ acted upon by the Koopman operator $U^t$ for time $t =$ [...] $\phi_{j,T}$ sampled along a portion of the training trajectory spanning 10 natural time units. The eigenfunctions in (a) exhibit limited dynamical coherence, in the sense that their level sets mix together on times greater than $\gtrsim$ [...] $\phi_{j,T}$ and $U^t \phi_{j,T}$ for $t \in \{$ [...] $\}$. Furthermore, the time series in (b) have the structure of amplitude-modulated waves with a well-defined carrier frequency and slowly varying modulating envelope, while exhibiting a 90° phase difference to a good approximation.

[...] $\Phi^t : \Omega \to \Omega$, $t \in \mathbb{R}$, on a metric space $\Omega$ possessing an invariant, ergodic Borel probability measure $\mu$, supported on a compact set $X \subseteq \Omega$. We assume that the support $X$ of the invariant measure is contained in a forward-invariant, $C^1$ compact manifold $M$ such that $\Phi^t|_M$ is $C^1$, but do not require that $X$ has differentiable structure. The system is observed through a continuous function $F : \Omega \to Y$, where $Y$ is a Banach space, and the restriction of $F$ to $M$ is $C^1$.

This setup encompasses a large class of autonomous dynamical systems encountered in applications. For instance, as a prototypical ODE example with quasiperiodic behavior, one can consider an ergodic rotation $\Phi^t : \mathbb{T}^2 \to \mathbb{T}^2$ on the 2-torus, in which case $\Omega = M = X = \mathbb{T}^2$ and $\mu$ is the Haar measure. The L63 system from Figure 1 is an example of a smooth dissipative flow on $\Omega = \mathbb{R}^3$, with a rigorously known mixing attractor $X \subset \Omega$ [43, 60] and compact absorbing balls $M \supset X$ [40]. The assumptions stated above also hold for classes of dissipative PDE models possessing inertial manifolds [15].

Within this class of models, our goal is as follows: Given time-ordered data $y_0, y_1, \ldots, y_{N-1} \in Y$ with $y_n = F(x_n)$, sampled along a dynamical trajectory $x_n = \Phi^{n\,\Delta t}(x_0)$ at an interval $\Delta t > 0$, identify a collection of functions $\zeta_j : \Omega \to \mathbb{C}$ which evolve coherently under the dynamics. Intuitively, by that we mean that the dynamically evolved functions $\zeta_j \circ \Phi^t$ should be relatable to $\zeta_j$ in a natural way for $t$ lying in a “large” interval containing zero. From the perspective of learning theory, the functions $\zeta_j$ are principal components/features, which are to be identified through an unsupervised learning problem that favors coherence. Note that this approach differs significantly from the classical proper orthogonal decomposition (POD) [3, 34, 39], whose goal is to extract features on the basis of explained variance.

2.2 Pseudospectral criteria for coherence

To establish a mathematically precise notion of dynamical coherence, consider the evolution group of unitary Koopman operators $U^t : L^2(\mu) \to L^2(\mu)$, acting on observables by composition with the flow, $U^t f = f \circ \Phi^t$ [4, 24, 36, 37]. By Stone’s theorem on one-parameter unitary groups [55], the group $\{U^t\}_{t \in \mathbb{R}}$ is generated by a skew-adjoint operator $V : D(V) \to L^2(\mu)$ with a dense domain $D(V) \subset L^2(\mu)$. As an operator, $V$ corresponds to an extension of the directional derivative on $C^1(M)$ functions associated with the vector field $\vec{V}$ generating $\Phi^t$, namely $V f := \vec{V} \cdot \nabla f$. In particular, for any $f \in D(V)$, $t \mapsto U^t f$ is continuously differentiable in $L^2(\mu)$ and

$$\frac{d}{dt} U^t f = V U^t f = U^t V f. \tag{1}$$

It is a standard result from ergodic theory [24] that whenever $V$ possesses an eigenfunction $z \in L^\infty(\mu)$ with $\|z\|_{L^2(\mu)} = 1$, corresponding to an eigenvalue $i\omega$ (where the eigenfrequency $\omega$ is real by skew-adjointness of $V$), then $|z(x)| = 1$ for $\mu$-a.e. $x \in \Omega$. Thus, we have the periodic evolution

$$U^t z = e^{tV} z = e^{i\omega t} z, \tag{2}$$

and at least measure-theoretically, $U^t z$ can be considered to take values on the unit circle. This means, in particular, that for $\mu$-a.e. $x \in \Omega$, the time series $t \mapsto z(\Phi^t(x)) = e^{i\omega t} z(x)$ behaves as a Fourier function on $\mathbb{R}$ with frequency $\omega$.
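The periodic evolution in (2) can be checked numerically on a toy system. The following is a minimal sketch, assuming a rotation on the circle rather than any system studied in the paper; the names `flow`, `koopman`, `omega0`, and `k` are illustrative assumptions, and $z(\theta) = e^{ik\theta}$ plays the role of a Koopman eigenfunction with eigenfrequency $k\,\omega_0$.

```python
import numpy as np

# Hypothetical example system (an assumption, not from the paper):
# rotation on the circle, Phi^t(theta) = theta + omega0 * t (mod 2*pi).
omega0 = 1.3   # assumed rotation rate
k = 2          # integer Fourier index of the candidate eigenfunction


def flow(theta, t):
    """Flow map Phi^t on the circle."""
    return np.mod(theta + omega0 * t, 2.0 * np.pi)


def z(theta):
    """Candidate Koopman eigenfunction z(theta) = exp(i * k * theta)."""
    return np.exp(1j * k * theta)


def koopman(f, t):
    """Koopman operator by composition: (U^t f)(theta) = f(Phi^t(theta))."""
    return lambda theta: f(flow(theta, t))


theta = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
t = 0.7
omega = k * omega0  # eigenfrequency associated with z

lhs = koopman(z, t)(theta)               # U^t z
rhs = np.exp(1j * omega * t) * z(theta)  # e^{i omega t} z

# Largest pointwise deviation from the eigenvalue equation (2).
max_err = np.max(np.abs(lhs - rhs))
# Largest deviation of |z| from 1 (z takes values on the unit circle).
max_mod_dev = np.max(np.abs(np.abs(z(theta)) - 1.0))
```

In this sketch both `max_err` and `max_mod_dev` are at round-off level, which is the "ideal" coherence that the approximate eigenfunctions discussed in this section are meant to relax.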
Due to these facts, we think of Koopman eigenfunctions of measure-preserving ergodic dynamical systems as exhibiting an “ideal” form of coherence. Indeed, starting from work in the late 1990s on data-driven, spectral analysis of Koopman operators [45, 46] and the related transfer operators [21, 22], spectral decomposition of evolution operators has emerged as a popular approach for coherent feature extraction in dynamical systems.

Yet, despite their attractive properties, Koopman eigenfunctions in $L^2(\mu)$ are not an appropriate theoretical paradigm for coherent features of dynamical systems with complex (mixing) behavior. Indeed, a necessary and sufficient condition for a measure-preserving, ergodic flow to be mixing is that the generator $V$ on $L^2(\mu)$ has a simple eigenvalue 0, with a constant corresponding eigenfunction, and no other eigenvalues. Thus, in this case Koopman eigenfunctions only yield the trivial (constant) feature.

Systems with so-called mixed spectra exhibit an intermediate behavior, in the sense that they do exhibit non-constant eigenfunctions satisfying (2), but these eigenfunctions span only a strict subspace of $L^2(\mu)$ and provide no information about the mixing component of the dynamics. Specifically, it is a classical result [33] that $L^2(\mu)$ admits an orthogonal decomposition

$$L^2(\mu) = H_p \oplus H_c \tag{3}$$

into closed, $U^t$-invariant subspaces $H_p$ and $H_c$, such that every observable in $H_p$ is a linear combination of Koopman eigenfunctions (and thus exhibits a quasiperiodic evolution associated with the point spectrum of the generator), whereas $H_c = H_p^\perp$ is a subspace orthogonal to every Koopman eigenfunction, and thus associated with the continuous spectrum of the generator.
In particular, every observable $g \in H_c$ exhibits a form of mixing behavior (called weak-mixing) characterized by a loss of cross-correlation with any observable $f \in L^2(\mu)$, viz.,

$$\lim_{t \to \infty} C_{fg}(t) = 0, \quad \text{where} \quad C_{fg}(t) := \frac{1}{t} \int_0^t |\langle f, U^s g \rangle| \, ds. \tag{4}$$

Here, $\langle \cdot, \cdot \rangle$ denotes the $L^2(\mu)$ inner product, $\langle f, g \rangle = \int_\Omega f^* g \, d\mu$, taken conjugate-linear in the first argument. The issue with feature extraction by pure Koopman eigenfunctions is that the recovered features cannot capture observables in $H_c$ and their mixing behavior.

Here, as a natural relaxation of (2), we seek observables satisfying the Koopman eigenvalue equation in an approximate sense. Specifically, we seek nonzero observables $z \in L^2(\mu)$ satisfying

$$\| U^t z - e^{i\omega t} z \|_{L^2(\mu)} \leq \varepsilon \| z \|_{L^2(\mu)}, \tag{5}$$

for some $\varepsilon > 0$ and $\omega \in \mathbb{R}$. Every such observable $z$ is said to be an $\varepsilon$-approximate eigenfunction of $U^t$, and the complex number $e^{i\omega t}$ is said to lie in the $\varepsilon$-approximate point spectrum of this operator [12]. In addition, we require that the same bound $\varepsilon$ holds for all $t$ in an interval $[0, \tau]$ with $\tau > 0$. Observables satisfying these conditions with $\varepsilon \ll 1$ and $\tau \gg 2\pi/\omega$ then behave to a good approximation as Koopman eigenfunctions of measure-preserving ergodic dynamical systems. Note, in particular, that the two eigenfunctions of $K_T$ depicted in Figure 1(b) are strongly suggestive of this behavior if they are interpreted as the real and imaginary parts of $z$. In the sequel, we will refer to $(e^{i\omega t}, z)$ satisfying (5) as an $\varepsilon$-approximate eigenpair of $U^t$. It can be shown that because $U^t$ is a normal operator, $(e^{i\omega t}, z)$ is an eigenpair if and only if it is an $\varepsilon$-approximate eigenpair for every $\varepsilon > 0$.

[...] $L^2(\mu)$ based on delay-coordinate maps. To construct appropriate such operators, consider first the distance-like function $d : \Omega \times \Omega \to \mathbb{R}_+$ induced by the norm of $Y$ and the observable $F$, $d(x, x') = \| F(x) - F(x') \|_Y$, and for every $T > 0$ the function $d_T : \Omega \times \Omega \to \mathbb{R}$ with

$$d_T^2(x, x') = \frac{1}{T} \int_0^T d^2(\Phi^t(x), \Phi^t(x')) \, dt. \tag{6}$$

The function $d_T$ can be equivalently thought of as being induced from the norm of $Y_T := L^2([0, T]; Y)$ under the continuous-time delay-coordinate mapping $F_T : \Omega \to Y_T$ with $F_T(x)(t) = F(\Phi^t(x))$; that is, $d_T^2(x, x') = \| F_T(x) - F_T(x') \|_{Y_T}^2 / T$. By convention, we set $d_0 = d$.

Using $d_T$ and a positive, $C^1$, bounded shape function $h : \mathbb{R}_+ \to \mathbb{R}_+$ with bounded derivative, we then consider the family of symmetric kernel functions $k_T : \Omega \times \Omega \to \mathbb{R}_+$, such that

$$k_T(x, x') = h(d_T(x, x')). \tag{7}$$

As a concrete example, we will nominally work with the choice $h(u) = e^{-u^2/\sigma}$, where $\sigma$ is a positive bandwidth parameter.
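For sampled data, the time integral in (6) has to be replaced by a finite sum over delays. The following is a minimal sketch of that discretization together with a Gaussian shape function, not the paper's implementation: the function names `delay_distances_sq` and `gaussian_kernel`, the test signal, the median-based bandwidth, and the exact bandwidth normalization are all assumptions made for illustration.

```python
import numpy as np


def delay_distances_sq(y, n_delays):
    """Squared delay-averaged distances between samples of a time series.

    y        : array of shape (N, m) with observations y_n = F(x_n)
    n_delays : number Q of samples in the embedding window (T ~ Q * dt)

    Returns an (N - Q + 1, N - Q + 1) array with entries
    (1 / Q) * sum_q ||y[i + q] - y[j + q]||^2, a Riemann-sum
    surrogate for the time average in (6).
    """
    n_samples = y.shape[0] - n_delays + 1
    sq_norms = np.sum(y * y, axis=1)
    # Pairwise squared distances without delays.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (y @ y.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negative round-off values
    # Entry (i, j) accumulates d2[i + q, j + q] over the window.
    d2_delay = np.zeros((n_samples, n_samples))
    for q in range(n_delays):
        d2_delay += d2[q:q + n_samples, q:q + n_samples]
    return d2_delay / n_delays


def gaussian_kernel(d2, bandwidth):
    """Radial Gaussian kernel exp(-d^2 / bandwidth) (assumed normalization)."""
    return np.exp(-d2 / bandwidth)


# Illustrative two-component signal with a small amount of noise.
rng = np.random.default_rng(0)
t = 0.01 * np.arange(400)
y = np.column_stack([np.cos(t), np.sin(2.0 * t)])
y = y + 0.01 * rng.standard_normal(y.shape)

d2 = delay_distances_sq(y, n_delays=50)
k = gaussian_kernel(d2, bandwidth=np.median(d2))
```

The resulting matrix `k` is symmetric with unit diagonal, and a kernel integral operator can then be approximated by averaging against the sampled data; how the bandwidth is tuned in practice is left open here.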
This leads to the radial Gaussian kernel $k_T(x, x') = e^{-d_T^2(x, x')/\sigma}$, which is a common starting point in manifold learning techniques [6, 14] approximating heat kernels on Riemannian manifolds as $\sigma \to 0$. Although we do not assume that $X$ has manifold structure, which would allow us to use these results, it should be noted that when $Y = \mathbb{R}^m$ Gaussian kernels have an important property that holds irrespective of the regularity of the support of the sampling distribution of the data, namely they are strictly positive-definite [54]. See [26] for additional examples of kernels commonly employed in machine learning applications.

Every kernel from (7) induces an integral operator $K_T : L^2(\mu) \to L^2(\mu)$ such that

$$K_T f = \int_\Omega k_T(\cdot, x) f(x) \, d\mu(x). \tag{8}$$

By symmetry and continuity of $k_T$ and compactness of $X$, $K_T$ is a positive-definite, self-adjoint, Hilbert-Schmidt integral operator with Hilbert-Schmidt norm equal to $\| k_T \|_{L^2(\mu \times \mu)}$. As a result there exists an orthonormal basis $\{ \phi_{0,T}, \phi_{1,T}, \ldots \}$ of $L^2(\mu)$ consisting of eigenfunctions of $K_T$ corresponding to the eigenvalues $\lambda_{0,T} \geq \lambda_{1,T} \geq \cdots \searrow 0$. The latter are all real, and have finite multiplicity whenever nonzero by compactness of $K_T$. In addition, by continuous differentiability of $k_T$ and compactness of $X$, every element in the range of $K_T$ has a representative in $C^1(M)$. In particular, every eigenfunction $\phi_{j,T}$ with nonzero corresponding eigenvalue has the continuous representative

$$\varphi_{j,T} = \frac{1}{\lambda_{j,T}} \int_\Omega k_T(\cdot, x) \phi_{j,T}(x) \, d\mu(x), \tag{9}$$

whose restriction on $M$ is $C^1$. Note that $\varphi_{j,T}$ is an everywhere-defined function on $\Omega$, as opposed to the left-hand side of (8) which is an $L^2(\mu)$-element defined only up to sets of $\mu$-measure zero. We let $\sigma_p(K_T) = \{ \lambda_{0,T}, \lambda_{1,T}, \ldots \}$ denote the point spectrum of $K_T$.

In the following subsection, we will show that appropriate linear combinations of eigenfunctions $\phi_{j,T}$ are $\varepsilon$-approximate eigenfunctions of the Koopman operator, satisfying (5) for a threshold $\varepsilon$ that decreases as $T$ increases, but increases as $\lambda_{j,T}$ decreases. The continuous representatives of these eigenfunctions will then provide the coherent features $\zeta_j$.

Remark 1

In this section, we have opted to work with delay-coordinate maps in continuous time as this will facilitate the derivation of $\varepsilon$-approximate spectral bounds valid for continuous time intervals. We will later pass to the more common discrete-time formulation based on the sampling interval $\Delta t$, which will introduce quadrature errors in (6) that vanish as $\Delta t \to 0$. In addition, aside from the class of radial kernels in (7), our results hold with straightforward modifications to other classes of kernels with $T \to \infty$ limits in $L^2(\mu \times \mu)$. Examples include the covariance kernels employed by SSA (which can be obtained by polarization of (7) using a linear shape function), Markov-normalized kernels [9, 13, 14], and variable-bandwidth kernels [8]. It is also possible to replace the kernel family $k_T$ in (7), which is obtained by an application of a fixed shape function to the $T$-dependent functions $d_T$, by a family $\tilde{k}_T$ obtained by averaging a fixed continuous kernel function $k : \Omega \times \Omega \to \mathbb{R}$, i.e., $\tilde{k}_T(x, x') = \int_0^T k(\Phi^t(x), \Phi^t(x')) \, dt / T$. See [17] for further details.

2.4 Dynamically coherent eigenfunctions

According to the theory of delay-coordinate maps, e.g., [23, 49, 51], for a sufficiently long window, the delay-coordinate map $F_T$ becomes homeomorphic on the compact support $X$ of the invariant measure for a large class of dynamical systems and observation functions $F$, even if $F|_X$ is not injective. This property has been widely employed in techniques for state space reconstruction [48] and forecasting [50]. Our interest here, however, is not so much in topological reconstruction, but rather in the effect of delay-coordinate maps on the spectral properties of kernel integral operators on $L^2(\mu)$, irrespective of the injectivity properties of $F$. To that end, we begin with a proposition that summarizes some of the results on the limiting behavior of operators in the family $K_T$ from (8), reported in [17].

Proposition 1

As $T \to \infty$, the following hold:

(i) The distance-like functions $d_T$ converge in $L^2(\mu \times \mu)$ norm to a function $d_\infty$, which is invariant under the Koopman operator $U^t \otimes U^t$ of the product dynamical system on $\Omega \times \Omega$ for any $t \in \mathbb{R}$. Correspondingly, the kernel functions $k_T$ also converge in $L^2(\mu \times \mu)$ to a $U^t \otimes U^t$-invariant kernel $k_\infty$.

(ii) The sequence of operators $K_T$ converges in $L^2(\mu)$ operator norm to the Hilbert-Schmidt integral operator $K_\infty$ associated with $k_\infty$.

(iii) For every $t \in \mathbb{R}$, $K_\infty$ and the Koopman operator $U^t$ commute.

(iv) The continuous spectrum subspace $H_c$ lies in the nullspace of $K_\infty$.

While we refer the reader to [17] for a proof of this proposition, we note here that Claim (i) follows from the fact that with the definition in (6), $d_T^2$ corresponds to a continuous-time Birkhoff average of the continuous function $d^2 \in C(\Omega \times \Omega)$ under the product dynamical flow $\Phi^t \times \Phi^t$. The existence and $U^t \otimes U^t$-invariance of $d_\infty$ is then a consequence of the pointwise ergodic theorem. The remaining claims of Proposition 1 can then be deduced from the $U^t \otimes U^t$-invariance of $k_\infty$. It is also worthwhile noting that, since $\Phi^t$ is mixing with respect to $\mu$ if and only if $\Phi^t \times \Phi^t$ is ergodic with respect to $\mu \times \mu$, it follows that $d_\infty$ is constant in $L^2(\mu \times \mu)$ sense if and only if the dynamics $\Phi^t$ is $\mu$-mixing. In that case, $d_\infty$ is $\mu \times \mu$-a.e. constant by ergodicity, and thus $K_\infty$ is a kernel integral operator with constant kernel. This implies that the nullspace of $K_\infty$ consists of all $L^2(\mu)$ functions orthogonal to the constant. The latter comprise precisely the subspace $H_c$ under mixing dynamics, and we conclude that $\ker K_\infty = H_c$. This last relationship is a special case of Proposition 1(iv) for mixing systems.

For our purposes, the main corollaries of Proposition 1, which follow from Claims (iii) and (ii), respectively, in conjunction with compactness of $K_T$ and $K_\infty$ are:

Corollary 1

Every eigenspace $E$ of $K_\infty$ corresponding to a nonzero eigenvalue is a finite union of Koopman eigenspaces, and the restriction $V|_E$ of the generator is unitarily diagonalizable. It further follows from skew-adjointness of the generator and ergodicity that $E$ is even-dimensional if and only if it is orthogonal to constant functions (i.e., the nullspace of $V$).

Corollary 2

For every nonzero eigenvalue $\lambda_j$ of $K_\infty$, the sequence of eigenvalues $\lambda_{j,T}$ of $K_T$ satisfies $\lim_{T \to \infty} \lambda_{j,T} = \lambda_j$. Moreover, the orthogonal projections onto the corresponding eigenspaces converge in operator norm. Conversely, if a sequence $\lambda_T$ of eigenvalues of $K_T$ has a nonzero $T \to \infty$ limit $\lambda_\infty$, then $\lambda_\infty$ is necessarily an eigenvalue of $K_\infty$.

Suppose now that $E$ is a two-dimensional eigenspace of $K_\infty$ corresponding to a nonzero eigenvalue $\lambda$, where we have suppressed the $j$ subscript for simplicity of notation. Then, by Corollary 1, $E$ is a union of two Koopman eigenspaces orthogonal to $\ker V$. Let also $\{ \phi, \psi \}$ be an orthonormal basis of $E$, where the eigenfunctions $\phi$ and $\psi$ are real (such a basis can always be found since the kernel $k_\infty$ is real) and $L^2(\mu)$-orthogonal to the constants. Then, it follows by skew-adjointness and reality of $V$ that

$$\langle \phi, V \phi \rangle = \langle \psi, V \psi \rangle = 0,$$

whereas

$$\omega := \langle \phi, V \psi \rangle = -\langle \psi, V \phi \rangle$$

is real. In addition, $\omega$ is nonzero since $E$ is a $V$-invariant subspace of $L^2(\mu)$ orthogonal to $\ker V$. Defining $z = (\phi + i\psi)/\sqrt{2}$, we get

$$V z = \langle \phi, V z \rangle \phi + \langle \psi, V z \rangle \psi = \frac{1}{\sqrt{2}} \left( i\omega \phi - \omega \psi \right) = i\omega z,$$

so we conclude that $z$ is a Koopman eigenfunction corresponding to eigenfrequency $\omega$. By construction, this eigenfunction has unit $L^2(\mu)$ norm, so for any $t \in \mathbb{R}$ we have

$$\alpha_t := \langle z, U^t z \rangle = e^{i\omega t},$$

and if we interpret $\alpha_t$ as an instantaneous autocorrelation function for $z$ (cf. the time-averaged cross-correlation in (4)), it follows that we can recover Koopman eigenvalues from the time-autocorrelation functions of the corresponding eigenfunctions. It also follows from the generator equation (1) that $\omega$ can be determined from the derivative of the autocorrelation function at 0, $i\omega = \dot{\alpha}_t|_{t=0}$.

Our main result, stated in the form of the following theorem, is essentially a generalization of these basic observations to $\varepsilon$-approximate eigenfunctions of $U^t$ constructed from eigenfunctions of $K_T$ with finite delay-embedding window $T$:

Theorem 1

With the assumptions and notation of Sections 2.1–2.3, let $\phi$ and $\psi$ be mutually-orthogonal, unit-norm, real eigenfunctions of $K_T$ corresponding to nonzero eigenvalues $\lambda_T$ and $\nu_T$, respectively, with $\lambda_T \leq \nu_T$. Assume that $\lambda_T$, $\nu_T$ are simple if distinct and twofold-degenerate if equal. Define

$$z = \frac{1}{\sqrt{2}} (\phi + i\psi), \quad \alpha_t = \langle z, U^t z \rangle, \quad \omega = \langle \phi, V \psi \rangle \equiv \frac{1}{i} \langle z, V z \rangle \equiv \frac{1}{i} \dot{\alpha}_t|_{t=0},$$

where $\omega$ is real, and set

$$\gamma_T = \min_{u \in \sigma_p(K_T) \setminus \{ \lambda_T, \nu_T \}} \left\{ \min \{ |\lambda_T - u|, |\nu_T - u| \} \right\}, \quad \delta_T = \frac{1}{\sqrt{2}} (\nu_T - \lambda_T), \quad \tilde{\delta}_T = \frac{\delta_T}{\nu_T}.$$

Then, the following hold for every $t \geq 0$:

(i) The autocorrelation function $\alpha_t$ lies in the $\tilde{\varepsilon}_t$-approximate point spectrum of $U^t$, and $z$ is a corresponding $\tilde{\varepsilon}_t$-approximate eigenfunction for the bound

$$\tilde{\varepsilon}_t = s_t + \sqrt{S_t},$$

where

$$s_t = \frac{1}{\gamma_T} \left( \frac{C_1 t}{T} + \delta_T \right), \quad S_t = \frac{C_2 \| \vec{V} \| \, ([\ldots] + \tilde{\delta}_T)}{\lambda_T} \int_0^t s_u \, du.$$

Here, $\| \vec{V} \|$ is the norm of the dynamical vector field, viewed as a bounded operator $\vec{V} : C^1(M) \to C(M)$, and $C_1$ and $C_2$ are constants that depend only on the observation map $F$. Explicitly, we have

$$C_1 = \| h \|_{C^1(\mathbb{R}_+)} \| d \|_{C(X \times X)}, \quad C_2 = \| h \|_{C^1(\mathbb{R}_+)} \| d \|_{C^1(M \times M)}.$$

(ii) The modulus $|\omega|$ is independent of the choice of real orthonormal basis $\{ \phi, \psi \}$ for the eigenspace(s) corresponding to $\lambda_T$ and $\nu_T$. Moreover, the phase factor $e^{i\omega t}$ is related to the autocorrelation function according to the bound

$$| \alpha_t - e^{i\omega t} | \leq \sqrt{S_t}.$$

Note that $s_t$ and $S_t$ in Theorem 1 are increasing functions of $t \geq 0$. This, in conjunction with the fact that

$$\| U^t z - e^{i\omega t} z \|_{L^2(\mu)} \leq \| U^t z - \alpha_t z \|_{L^2(\mu)} + | \alpha_t - e^{i\omega t} |,$$

leads to the following corollary, which shows how to attain the bound in (5) valid uniformly over a bounded time interval.

Corollary 3

The phase factor e i ω t lies in the ε t -approximate point spectrum of U t ,and z is a corresponding ε t -approximate eigenfunction for the bound ε t = s t + (cid:112) S t . Moreover, for every τ ≥ , ( e i ω t , z ) is an ε τ -approximate eigenpair of U t for all t ∈ [ , τ ] . This eigenpair has the continuous representative ζ ∈ C ( Ω ) given by ζ = √ (cid:90) Ω k T ( · , x ) (cid:18) φ ( x ) λ T + i ψ ( x ) ν T (cid:19) d µ ( x ) , which acts as an everywhere-deﬁned, continuous coherent feature on the state space Ω . Theorem 1 will be proved in Section 3. We now discuss some of the intuitive as-pects of the results. First, it should be noted that the bounds established are not sharp,as there are systems for which one can readily construct integral operators K T withﬁnite embedding windows T and common eigenspaces with the Koopman operator.Examples include operators derived from translation-invariant kernels on tori underquasiperiodic dynamics [18, 27]; e.g., the heat kernel associated with the ﬂat metric.For such kernels, there exist eigenfunctions z which are also Koopman eigenfunc-tions, and the corresponding autocorrelation coefﬁcients α t lie in the ε -approximatepoint spectra of U t for any ε > t ∈ R . Still, even without sharp bounds, The-orem 1 provides useful information on the spectral properties of integral operatorsutilizing delay-coordinate maps that promote or inhibit dynamical coherence, as fol-lows.1. As one might expect, the bounds in Theorem 1 become weaker as the regular-ity of the observation map F and kernel shape function h decrease, in the sense that˜ ε t and ε t are increasing functions of the C norms of d and h .2. 
For fixed $t$, the strength of the bounds is an interplay between the length $T$ of the embedding window, the eigenvalue $\lambda_T$, the gap $\gamma_T$ (measuring the isolation of the eigenspaces corresponding to $\lambda_T$ and $\nu_T$ from the rest of the point spectrum of $K_T$), and the gaps $\delta_T$, $\tilde\delta_T$ (measuring the extent to which $\lambda_T$ and $\nu_T$ fail to be twofold-degenerate). Inspecting the dependence of the functions $s_t$ and $S_t$ on these terms indicates that, in general, the bounds become stronger as the window length $T$ increases and/or the gaps $\delta_T$, $\tilde\delta_T$ decrease, whereas they weaken as $\lambda_T$ and/or the gap $\gamma_T$ decrease. Of course, these terms cannot be independently controlled as $T$ varies, and the expected coherence of $z$ on the basis of Theorem 1 will depend on their combined effect. It should be noted that Theorem 1 does not make an assertion about existence of $T \to \infty$ limits for the $\varepsilon$-approximate eigenpairs $(e^{i\omega t}, z)$, although as we discuss below there are particular cases where such limits exist.

3. Suppose that the eigenvalue sequence $\lambda_T$ has a nonzero $T \to \infty$ limit $\lambda_\infty$. Then, by Proposition 1, $\lambda_\infty$ is a nonzero eigenvalue of the compact operator $K_\infty$. By the same proposition, if the eigenspace $E$ corresponding to $\lambda_\infty$ does not contain constant functions it is even-dimensional, so the gap coefficients $\delta_T$ and $\tilde\delta_T$ converge to 0. If, further, $E$ is two-dimensional, the gap $\gamma_T$ converges to a nonzero value. In such cases, Theorem 1 and Corollary 3 imply that for any $\tau \ge 0$ and $\varepsilon > 0$, there exists $T_* > 0$ such that for all $T > T_*$, (5) holds for all $t \in [0, \tau]$. This implies in turn that for such a sequence $\lambda_T$ there is a subsequence of frequencies $\omega$ converging to an eigenfrequency of the generator (where we consider a subsequence to account for possible sign flips due to the choice of functions $\phi$ and $\psi$ at each $T$). Moreover, the corresponding observables $z$ similarly approximate Koopman eigenfunctions.

4. Suppose now that the dynamics is mixing with respect to the invariant measure $\mu$. Then, all eigenvalues $\lambda_T$ with non-constant corresponding eigenfunctions converge to 0 as $T \to \infty$, and therefore the gaps $\gamma_T$, $\delta_T$, and $\tilde\delta_T$ also converge to 0. In that case, the asymptotic behavior of $\varepsilon_t$ as $T \to \infty$ depends on the behavior of $\eta_T := \gamma_T \lambda_T T$, with $\varepsilon_t$ approaching infinity, and thus failing to provide a useful bound, if $\eta_T$ converges to 0. However, the possibility still remains that the rate of decay of $\gamma_T$ and $\lambda_T$ is slow enough that $\eta_T$ is large, and thus $\varepsilon_t$ small, on a large interval $[0, \tau]$. The numerical results in Figure 1 indicate that suitably constructed kernel integral operators for the L63 system indeed exhibit this behavior, enabling identification of highly coherent observables through their eigenfunctions. An intriguing question (lying outside the scope of this work) is whether there are mixing dynamical systems and integral operators for which $\eta_T$ actually diverges as $T \to \infty$.

Since $z$ and $z^*$ are mutually orthogonal unit vectors in $L^2(\mu)$, and $U^0 = \mathrm{Id}$, we begin by writing down the expansion
$$U^t z = \alpha_t z + \beta_t z^* + r_t, \quad (10)$$
where $\alpha_t = \langle z, U^t z \rangle$ (as in the statement of the theorem), $\beta_t = \langle z^*, U^t z \rangle$, $r_t$ is a residual orthogonal to both $z$ and $z^*$, and
$$|\alpha_t| \le 1, \quad |\beta_t| \le 1, \quad \|r_t\|_{L^2(\mu)} \le 1, \quad \alpha_0 = 1, \quad \beta_0 = \|r_0\|_{L^2(\mu)} = 0.$$
(11)It then follows that (cid:107) U t z − α t z (cid:107) L ( µ ) ≤ | β t | + (cid:107) r t (cid:107) L ( µ ) , (12)and we will prove the ﬁrst claim of the theorem by bounding | β t | and (cid:107) r t (cid:107) L ( µ ) . To that end, note ﬁrst that by skew-symmetry and reality of V , and by deﬁnitionof the L ( µ ) inner product, (cid:104) z ∗ , V z (cid:105) = −(cid:104) V z ∗ , z (cid:105) = −(cid:104) ( V z ) ∗ , z (cid:105) = −(cid:104) V z , z ∗ (cid:105) ∗ = −(cid:104) z ∗ , V z (cid:105) , so (cid:104) z ∗ , V z (cid:105) =

0. Moreover, (cid:104) z , V z (cid:105) ∗ = (cid:104) z ∗ , ( V z ) ∗ (cid:105) = (cid:104) z ∗ , V z ∗ (cid:105) = −(cid:104) V z ∗ , z ∗ (cid:105) = −(cid:104) z ∗ , V z ∗ (cid:105) ∗ = −(cid:104) z , V z (cid:105) , so (cid:104) z , V z (cid:105) and (cid:104) z ∗ , V z ∗ (cid:105) are purely imaginary. In fact, it follows from the deﬁnition of z that (cid:104) z , V z (cid:105) / i = (cid:104) ψ , V φ (cid:105) = ω , (13)and from the deﬁnition of the generator that1 i (cid:104) z , V z (cid:105) = lim t → it (cid:104) z , ( U t − Id ) z (cid:105) = lim t → it ( α t − ) = ˙ α t | t = , so we can use (cid:104) z , V z (cid:105) / i and ˙ α t | t = / i as alternative deﬁnitions of the frequency ω asin the statement of Theorem 1.Using these relationships, the generator equation in (1), and the bound for | β t | in (11), we obtain ddt | β t | = (cid:18) β ∗ t d β t dt (cid:19) = (cid:18) β ∗ t ddt (cid:104) z ∗ , U t z (cid:105) (cid:19) = (cid:0) β ∗ t (cid:104) z ∗ , VU t z (cid:105) (cid:1) = − (cid:0) β ∗ t (cid:104) V z ∗ , U t z (cid:105) (cid:1) = − ( β ∗ t (cid:104) V z ∗ , α t z + β t z ∗ + r t (cid:105) )= − ( β ∗ t (cid:104) V z ∗ , r t (cid:105) ) ≤ | β t ||(cid:104) V z ∗ , r t (cid:105)| ≤ (cid:107) V z (cid:107) L ( µ ) (cid:107) r t (cid:107) L ( µ ) . Therefore, the squared modulus | β t | is bounded by a solution of the differentialinequality ddt | β t | ≤ (cid:107) V z (cid:107) L ( µ ) (cid:107) r t (cid:107) L ( µ ) , β = , (14)where we have used (11) to set the initial conditions Note that we were able to usethe generator equation in order to arrive at this relation since z ∈ ran K T , and everyelement in ran K T has a C ( M ) representative and thus lies in the domain of the gen-erator, D ( V ) .Inspecting (12) and (14) indicates that the norm of the residual (cid:107) r t (cid:107) L ( µ ) bounds (cid:107) U t z − α t z (cid:107) L ( µ ) both directly, in (12), and indirectly by bounding the rate of growthof | β t | , in (14). 
In addition, $\frac{d}{dt}|\beta_t|^2$ depends on the norm $\|Vz\|_{L^2(\mu)}$. The following two lemmas are useful for estimating these terms.

Lemma 1

With the notation and assumptions of Theorem 1, for every $t \ge 0$ and $T > 0$ the commutator $[U^t, K_T]$ satisfies
$$\|[U^t, K_T]\| \le \|h\|_{C^1(\mathbb{R}_+)} \|d\|_{C(X \times X)} \frac{t}{T},$$
where $\|\cdot\|$ denotes $L^2(\mu)$ operator norm.

Proof

The proof follows closely that of Lemma 19 in [17], which established a similar result for discrete-time sampling and $C(X)$ operator norm. In particular, it is a direct consequence of the definition of the delay-coordinate distance $d_T$ in (6) that for any $x, x' \in X$ and $t \ge 0$,
$$d_T(\Phi^t(x), \Phi^t(x')) = \frac{1}{T} \int_0^T d(\Phi^{t+u}(x), \Phi^{t+u}(x')) \, du = d_T(x, x') + \frac{1}{T} \left( \int_T^{T+t} du - \int_0^t du \right) d(\Phi^u(x), \Phi^u(x')).$$
Because $d$ is nonnegative, each of the two integrals on the right-hand side lies in $[0, \|d\|_{C(X \times X)} \, t]$. Therefore,
$$|d_T(\Phi^t(x), \Phi^t(x')) - d_T(x, x')| \le \|d\|_{C(X \times X)} \frac{t}{T},$$
and using the above and the definition of the kernel $k_T$ in (7) we get
$$|k_T(\Phi^t(x), \Phi^t(x')) - k_T(x, x')| = |h(d_T(\Phi^t(x), \Phi^t(x'))) - h(d_T(x, x'))| \le \|h\|_{C^1(\mathbb{R}_+)} |d_T(\Phi^t(x), \Phi^t(x')) - d_T(x, x')| \le \|h\|_{C^1(\mathbb{R}_+)} \|d\|_{C(X \times X)} \frac{t}{T}. \quad (15)$$
It then follows that for any $f \in L^2(\mu)$,
$$\begin{aligned}
\|U^t K_T f - K_T U^t f\|_{L^2(\mu)} &= \left\| \int_\Omega \left( k_T(\Phi^t(\cdot), x) f(x) - k_T(\cdot, x) f(\Phi^t(x)) \right) d\mu(x) \right\|_{L^2(\mu)} \\
&= \left\| \int_\Omega \left( k_T(\Phi^t(\cdot), \Phi^t(x)) - k_T(\cdot, x) \right) U^t f(x) \, d\mu(x) \right\|_{L^2(\mu)} \\
&\le \|k_T(\Phi^t(\cdot), \Phi^t(\cdot)) - k_T\|_{C(X \times X)} \|U^t f\|_{L^2(\mu)} \\
&\le \|k_T(\Phi^t(\cdot), \Phi^t(\cdot)) - k_T\|_{C(X \times X)} \|f\|_{L^2(\mu)}.
\end{aligned}$$
Note that to obtain the second and last lines in the displayed equations above we used the fact that $\mu$ is an invariant probability measure under the flow $\Phi^t$. Using this result and (15), we arrive at
$$\|[U^t, K_T]\| = \|U^t K_T - K_T U^t\| \le \|h\|_{C^1(\mathbb{R}_+)} \|d\|_{C(X \times X)} \frac{t}{T},$$
proving the lemma. □

Lemma 2

With the notation and assumptions of Theorem 1, the family of operators { A T = V K T | T > } is uniformly bounded on L ( µ ) with (cid:107) A T (cid:107) ≤ (cid:107) V (cid:107)(cid:107) h (cid:107) C ( R + ) (cid:107) d (cid:107) C ( M × M ) . Proof

We use the notation V : C ( M × M ) → C ( M × M ) to represent the differentialoperator on C ( M × M ) which acts by the dynamical vector ﬁeld V : C ( M ) → C ( M ) along the ﬁrst coordinate; i.e., V f ( x , x (cid:48) ) = lim t → f ( Φ t ( x ) , x (cid:48) ) t = V f x (cid:48) ( x ) , where f x (cid:48) = f ( · , x (cid:48) ) ∈ C ( M ) . Note that V and V have equal operator norms, (cid:107) V (cid:107) = (cid:107) V (cid:107) . Moreover, V commutes with the induced action by the product dynamical ﬂow Φ t ⊗ Φ t on C ( M × M ) , in the sense that V ( f ◦ Φ t ⊗ Φ t ) = ( V f ) ◦ Φ t ⊗ Φ t , ∀ t ≥ , ∀ f ∈ C ( M × M ) . Using these facts, we obtain (cid:107) V d T (cid:107) C ( X × X ) = (cid:13)(cid:13)(cid:13)(cid:13) T (cid:90) T V ( d ◦ Φ t ⊗ Φ t ) dt (cid:13)(cid:13)(cid:13)(cid:13) C ( X × X ) = (cid:13)(cid:13)(cid:13)(cid:13) T (cid:90) T ( V d ) ◦ Φ t ⊗ Φ t dt (cid:13)(cid:13)(cid:13)(cid:13) C ( X × X ) ≤ (cid:107) V d (cid:107) C ( X × X ) ≤ (cid:107) V (cid:107)(cid:107) d (cid:107) C ( M × M ) = (cid:107) V (cid:107)(cid:107) d (cid:107) C ( M × M ) , and thus (cid:107) V k T (cid:107) C ( X × X ) = (cid:107) V ( h ◦ d T ) (cid:107) C ( X × X ) ≤ (cid:107) h (cid:107) C ( R + ) (cid:107) V d T (cid:107) C ( X × X ) ≤ (cid:107) h (cid:107) C ( R + ) (cid:107) V (cid:107)(cid:107) d (cid:107) C ( M × M ) . (16)Now, because k T lies in C ( M × M ) , for every f ∈ L ( µ ) we have A T f = V K T f = V (cid:90) Ω k T ( · , x ) f ( x ) d µ ( x ) = (cid:90) Ω V k T ( · , x ) f ( x ) d µ ( x ) , so A T is a kernel integral operator on L ( µ ) whose kernel V k T is continuous on X × X . The L ( µ ) operator norm of A T therefore satisﬁes (cid:107) A T (cid:107) ≤ (cid:107) V k T (cid:107) C ( X × X ) , and the claim of the lemma follows from (16). (cid:117)(cid:116) With these results in place, we proceed to bound (cid:107) r t (cid:107) L ( µ ) . 
First, acting with K T on both sides of (10), we obtain K T U t z = α t K T z + β t K T z ∗ + K T r t = α t √ ( λ T φ + i ν T ψ ) + β t √ ( λ T φ − i ν T ψ ) + K T r t = λ T ( α t z + β t z ∗ ) + i δ T ( α t − β t ) ψ + K T r t = λ T U t z + i δ T ( α t − β t ) ψ + ( K T − λ T ) r t = √ U t ( K T φ + i λ T ψ ) + i δ T ( α t − β t ) ψ + ( K T − λ T ) r t elay-coordinate maps, coherence, and approximate spectra of evolution operators 17 = √ U t ( K T φ + iK T ψ ) + i δ T ( α t − β t − U t ) ψ + ( K T − λ T ) r t = U t K T z + i δ T ( α t − β t − U t ) ψ + ( K T − λ T ) r t . Therefore, ( K T − λ T ) r t = − [ U t , K T ] z + i δ T ( U t − α t − β t ) ψ . which, in conjunction with (11), leads to (cid:107) ( K T − λ T ) z (cid:107) L ( µ ) ≤ (cid:107) [ U t , K T ] (cid:107) + δ T . (17)On the other hand, (cid:107) ( K T − λ T ) r t (cid:107) L ( µ ) = ∑ λ j , T ∈ σ p ( K T ) \{ λ T , ν T } ( λ j , T − λ T ) |(cid:104) φ j , T , r t (cid:105)| ≥ ∑ λ j , T ∈ σ p ( K T ) \{ λ T , ν T } γ T |(cid:104) φ j , T , r t (cid:105)| = γ T (cid:107) r t (cid:107) L ( µ ) , (18)and using (17), (18), and Lemma 1, we arrive at the bound (cid:107) r t (cid:107) L ( µ ) ≤ γ T (cid:32) (cid:107) h (cid:107) C ( R + ) (cid:107) d (cid:107) C ( X × X ) tT + δ T (cid:33) = s t , (19)where the function s t deﬁned in the statement of Theorem 1.Next, it follows from Lemma 2 that (cid:107) V z (cid:107) L ( µ ) = √ (cid:107) V ( φ + i ψ ) (cid:107) L ( µ ) = √ (cid:13)(cid:13)(cid:13)(cid:13) V K T (cid:18) φλ T + i ψν T (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) L ( µ ) = λ T (cid:13)(cid:13)(cid:13)(cid:13) A T (cid:18) z + i √ (cid:18) ν T − λ T (cid:19) ψ (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) L ( µ ) ≤ (cid:107) A T (cid:107) ( + ˜ δ T ) λ T = (cid:107) V (cid:107)(cid:107) h (cid:107) C ( R + ) (cid:107) d (cid:107) C ( M × M ) ( + ˜ δ T ) λ T . 
(20)Inserting the estimates for (cid:107) r t (cid:107) L ( µ ) and (cid:107) V z (cid:107) L ( µ ) in (19) and (20), respectively,into (14), and using the deﬁnition of the constant C in the statement of the theorem,then leads to the differential inequality ddt | β t | ≤ C (cid:107) V (cid:107) ( + ˜ δ T ) λ T s t , β = , and integrating we obtain | β t | ≤ C (cid:107) V (cid:107) ( + ˜ δ T ) λ T (cid:90) t s u du = C ( + ˜ δ T ) λ T γ T (cid:18) C t T + δ T t (cid:19) = S t , (21)where the function S t is deﬁned in the statement of the theorem. Substituting (19)and (21) into (12) then leads to (cid:107) U t z − α t z (cid:107) L ( µ ) ≤ s t + √ S t , proving Claim (i) of thetheorem. ω is independent of the choice of mutually orthonormal basisfunctions φ and ψ , it is sufﬁcient to consider the following two cases: – Case I: λ T and ν T are simple eigenvalues. In this case, the claim is obvious sinceany unit-norm eigenvectors φ (cid:48) and ψ (cid:48) corresponding to λ T and ν T , respectively,are related to φ and ψ by φ (cid:48) = c φ φ , ψ (cid:48) = c ψ ψ , where c φ , c ψ ∈ {− , } . – Case II: λ T = ν T are twofold-degenerate eigenvalues. To verify the claim, let { φ (cid:48) , ψ (cid:48) } be any real, orthonormal basis of the corresponding eigenspace, E . Then,there exists a 2 × O such that (cid:18) φ (cid:48) ψ (cid:48) (cid:19) = O (cid:18) φψ (cid:19) , O = (cid:18) O φφ O φψ O ψφ O ψψ (cid:19) . Since (cid:104) φ , V φ (cid:105) = (cid:104) ψ , V ψ (cid:105) = V , in conjunc-tion with reality of φ and ψ ), we have |(cid:104) ψ (cid:48) , V φ (cid:48) (cid:105)| = |(cid:104) O ψφ φ + O ψψ ψ , O φφ V φ + O φψ V ψ (cid:105)| = | ( O ψφ O φψ − O φφ O ψψ ) ω | = | det O || ω | = | ω | , proving that | ω | is independent of the choice of real orthonormal basis of E .Next, to bound | α t − e i ω t | , we follow a differential inequality approach similar tothat used to bound | β t | in Section 3.1. In particular, let a t = α t − e i ω t . 
We have | a t | = | α t | + − ( α t e − i ω t ) , and therefore ddt | a t | ≤ ddt | α t | + (cid:12)(cid:12)(cid:12)(cid:12) Re ddt (cid:0) α t e − i ω t (cid:1)(cid:12)(cid:12)(cid:12)(cid:12) . (22)To place a bound on the ﬁrst term in the right-hand side of (22), observe that ddt | α t | = (cid:18) α ∗ t d α t dt (cid:19) = (cid:0) α ∗ t (cid:104) z , U t V z (cid:105) (cid:1) = − (cid:0) α ∗ t (cid:104) V z , U t z (cid:105) (cid:1) = − ( α ∗ t (cid:104) V z , α t z + β t z ∗ + r t (cid:105) )= − (cid:0) | α t | (cid:104) V z , z (cid:105) + α ∗ t β t (cid:104) V z , z ∗ (cid:105) + α ∗ t (cid:104) V z , r t (cid:105) (cid:1) = − ( α ∗ t (cid:104) V z , r t (cid:105) ) ≤ | α t ||(cid:104) V z , r t (cid:105)| ≤ |(cid:104) V z , r t (cid:105)| . (23)Note that to obtain the equality in the second-to-last line we used the facts that (cid:104) z ∗ , V z (cid:105) and (cid:104) z , V z (cid:105) are vanishing and purely imaginary, respectively (see Section 3.1). More-over, we used the bound | α t | ≤ (cid:104) z ∗ , V z (cid:105) =

0, leads to (cid:12)(cid:12)(cid:12)(cid:12) Re ddt (cid:0) α t e − i ω t (cid:1)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12) Re (cid:0) (cid:104) V z , α t z + β t z ∗ + r t (cid:105) e − i ω t − i ωα t e − i ω t (cid:1)(cid:12)(cid:12) elay-coordinate maps, coherence, and approximate spectra of evolution operators 19 = (cid:12)(cid:12) Re ( (cid:104) V z , r t (cid:105) e − i ω t ) (cid:12)(cid:12) ≤ |(cid:104) V z , r t (cid:105)| , (24)and inserting (23) and (24) into (22), we obtain ddt | a t | ≤ |(cid:104) V z , r t (cid:105)| = C (cid:107) V (cid:107) ( + ˜ δ T ) λ T s t . Integrating this differential inequality subject to the initial condition a = α − = | α t − e i ω t | = | a t | ≤ S t , and the bound in Claim (ii) of Theorem 1 follows. This completes our proof of thetheorem. In this section, we consider how to approximate the eigenvalues and eigenfunctionsof the integral operator K T , as well as the frequency ω and autocorrelation function α t , from the time series data y , . . . , y N − sampled at the interval ∆ t , as described inSection 2.1. Aside from errors associated by approximating continuous-time delay-coordinate maps by (their more familiar) discrete-time analogs, error analyses forthe approximation scheme described below have been performed elsewhere [17, 19,32]. Here, we limit ourselves to a high-level description of the construction and itsconvergence in the large-data limit, relegating technical details to these references.4.1 Construction of the data-driven approximation schemeThe main steps in the construction of the approximation scheme are as follows: Step 1 (Discrete-time delay-coordinate map)

Replace the continuous-time delay-coordinate map F T : Ω → L ([ , T ] ; Y ) by the discrete-time map F Q , ∆ t : Ω → Y Q given by F Q , ∆ t ( x ) = ( F ( x ) , F ( Φ ∆ t ( x )) , . . . , F ( Φ ( Q − ) ∆ t ( x ))) . (25)Here, Q is an integer parameter corresponding to the number of delays. The map F Q , ∆ t with ∆ t = T / Q then induces a continuous distance-like function d T , ∆ t : Ω × Ω → R + , d T , ∆ t ( x , x (cid:48) ) = Q (cid:107) F Q , ∆ t ( x ) − F Q , ∆ t ( x (cid:48) ) (cid:107) Y Q = Q Q − ∑ q = d ( Φ ( q − ) ∆ t ( x ) , Φ ( q − ) ∆ t ( x (cid:48) )) , which is meant to approximate continuous-time function d T from (6). Speciﬁcally,standard properties of quadrature using the rectangle rule [20] lead to the estimates (cid:107) d T − d T , ∆ t (cid:107) C ( M × M ) ≤ (cid:107) d (cid:107) C ( M × M ) ∆ t = (cid:107) d (cid:107) C ( M × M ) T Q , (26) (cid:107) d T − d T , ∆ t (cid:107) C ( M × M ) = o ( ∆ t ) . (27) Similarly, we approximate the continuous-time kernel k T in (7) by k T , ∆ t : = h ◦ d T , ∆ t .Note that (27) merely indicates that as ∆ t → d T , ∆ t converges to d T in C norm.A stronger bound can be obtained if d T has higher than C regularity, e.g., (cid:107) d T − d T , ∆ t (cid:107) C ( M × M ) = O ( ∆ t α ) if it lies in C , α ( M × M ) for some α > Step 2 (Sampling measure)

Replace the Hilbert space $L^2(\mu)$ associated with the invariant measure with the finite-dimensional Hilbert space $L^2(\mu_N)$ associated with the sampling measure $\mu_N = \frac{1}{N} \sum_{n=0}^{N-1} \delta_{x_n}$ on the dynamical trajectory $x_0, \ldots, x_{N-1} \in \Omega$ underlying the data $y_0, \ldots, y_{N-1}$. Here, $\delta_x$ denotes the Dirac measure supported at $x \in \Omega$. The space $L^2(\mu_N)$ consists of equivalence classes of measurable, complex-valued functions on $\Omega$ with common values at the sampled states $x_0, \ldots, x_{N-1}$, and is equipped with the inner product
$$\langle f, g \rangle_N = \int_\Omega f^* g \, d\mu_N = \frac{1}{N} \sum_{n=0}^{N-1} f^*(x_n) g(x_n).$$
For simplicity of exposition, we will assume that all sampled states $x_n$ are distinct (by ergodicity, this will be the case aside from trivial cases), so $L^2(\mu_N)$ is an $N$-dimensional Hilbert space, canonically isomorphic to $\mathbb{C}^N$ equipped with a normalized dot product. Under this isomorphism, an element $f \in L^2(\mu_N)$ is represented by a column vector $\vec f = (f_0, \ldots, f_{N-1})^\top \in \mathbb{C}^N$ such that $f_n = f(x_n)$, and we have $\langle f, g \rangle_N = \vec f \cdot \vec g / N$. Moreover, a linear map $A : L^2(\mu_N) \to L^2(\mu_N)$ is represented by an $N \times N$ matrix $\boldsymbol{A}$ such that $\boldsymbol{A} \vec f$ corresponds to the column vector representation of $A f$. We will also assume without loss of generality that the starting state $x_0$ (and thus the entire sampled dynamical trajectory) lies in the forward-invariant manifold $M$, but note that $x_0$ need not lie on the support $X$ of the invariant measure. In light of these facts, our data-driven schemes can be numerically implemented using standard tools from linear algebra, and as we will see below, their formulation requires few structural modifications of their infinite-dimensional counterparts from Section 2.

Step 3 (Data-driven integral operator)

Approximate the kernel integral operator $K_T : L^2(\mu) \to L^2(\mu)$ by the operator $K_{T,\Delta t,N} : L^2(\mu_N) \to L^2(\mu_N)$, where
$$K_{T,\Delta t,N} f = \int_\Omega k_{T,\Delta t}(\cdot, x) f(x) \, d\mu_N(x) = \frac{1}{N} \sum_{n=0}^{N-1} k_{T,\Delta t}(\cdot, x_n) f(x_n).$$
This operator is self-adjoint, and there exists a real orthonormal basis $\{\phi_{0,T,\Delta t,N}, \ldots, \phi_{N-1,T,\Delta t,N}\}$ of $L^2(\mu_N)$ consisting of its eigenvectors, with corresponding eigenvalues $\lambda_{0,T,\Delta t,N} \ge \lambda_{1,T,\Delta t,N} \ge \cdots \ge \lambda_{N-1,T,\Delta t,N} \ge 0$. The data-driven operator $K_{T,\Delta t,N}$ is understood as an approximation of $K_T$ in the following spectral sense:

– Let $\lambda_{j,T,\Delta t,N}$ be a nonzero eigenvalue of $K_{T,\Delta t,N}$. Then, $\lambda_{j,T,\Delta t,N}$ is employed as an approximation of eigenvalue $\lambda_{j,T}$ of $K_T$.

– Eigenfunction $\phi_{j,T,\Delta t,N} \in L^2(\mu_N)$ has a continuous representative
$$\varphi_{j,T,\Delta t,N} = \frac{1}{\lambda_{j,T,\Delta t,N}} \int_\Omega k_{T,\Delta t}(\cdot, x) \phi_{j,T,\Delta t,N}(x) \, d\mu_N(x), \quad (28)$$
defined everywhere on $\Omega$. The restriction of $\varphi_{j,T,\Delta t,N}$ to $M$ is a continuously differentiable function, employed as an approximation of $\varphi_{j,T}$ from (9).

Numerically, the eigenvalues and eigenvectors of $K_{T,\Delta t,N}$ are computed by solving the eigenvalue problem for the $N \times N$ kernel matrix $\boldsymbol{K} = [k_{T,\Delta t,N}(x_m, x_n)]_{mn} / N$, which is the matrix representation of $K_{T,\Delta t,N}$ according to Step 2 above. For kernels with rapidly decaying shape functions (e.g., the Gaussian kernels employed in Section 5 below), the leading eigenvalues and eigenvectors of $\boldsymbol{K}$ are well approximated by the corresponding eigenvalues and eigenvectors of a sparse matrix obtained by zeroing out small entries of $\boldsymbol{K}$, considerably reducing computational cost. See, e.g., Appendix A in [27], or Appendix B in [17] for further details on numerical implementation.
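As a rough illustration of Steps 1–3, the following Python sketch assembles a kernel matrix from delay-embedded samples and extracts its leading eigenpair. It is not taken from the paper or the cited references. It assumes that $d$ is the squared Euclidean distance between observations, that the shape function is the Gaussian $h(s) = \exp(-s/\sigma^2)$ with a hand-picked bandwidth, and that the data are a toy rotation on the circle. The function names `delay_distance`, `kernel_matrix`, and `leading_eigenpair` are hypothetical. Power iteration is used only to keep the example dependency-free, and it recovers just the top eigenpair; the eigenpairs further down the spectrum that are needed for coherent features would in practice come from a symmetric eigensolver.

```python
import math


def delay_distance(y, m, n, Q):
    """Discrete delay-coordinate distance between states m and n: the mean
    over Q lags of the squared Euclidean distance between y[m+q] and y[n+q]
    (assumed form of d)."""
    total = 0.0
    for q in range(Q):
        total += sum((a - b) ** 2 for a, b in zip(y[m + q], y[n + q]))
    return total / Q


def kernel_matrix(y, Q, sigma):
    """N x N matrix [k(x_m, x_n)] / N with assumed Gaussian shape function
    h(s) = exp(-s / sigma^2)."""
    N = len(y) - Q + 1  # states that have a full window of Q samples
    return [[math.exp(-delay_distance(y, m, n, Q) / sigma ** 2) / N
             for n in range(N)] for m in range(N)]


def leading_eigenpair(K, iters=500):
    """Power iteration for the top eigenpair of a symmetric matrix with
    positive entries. The vector is normalized so that v.v / N = 1, i.e.
    unit norm in the normalized dot product of Step 2."""
    N = len(K)
    v = [1.0] * N
    lam = 0.0
    for _ in range(iters):
        w = [sum(K[m][n] * v[n] for n in range(N)) for m in range(N)]
        lam = sum(a * b for a, b in zip(v, w)) / N  # Rayleigh quotient <v, K v>_N
        norm = math.sqrt(sum(a * a for a in w) / N)
        v = [a / norm for a in w]
    return lam, v


# Toy data (assumption): observations of a rotation on the unit circle.
y = [(math.cos(0.7 * n), math.sin(0.7 * n)) for n in range(64)]
K = kernel_matrix(y, Q=5, sigma=1.0)
lam, v = leading_eigenpair(K)
```

Because every entry of this Gaussian kernel matrix is strictly positive, the top eigenvalue is simple and power iteration converges. The resulting eigenvector has entries of a single sign, so it plays the role of a constant-like mode and is not itself one of the oscillatory eigenfunctions used to build coherent features.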

Step 4 (Shift operator)

For each time $t = q \, \Delta t$, $q \in \mathbb{N}$, approximate the Koopman operator $U^t : L^2(\mu) \to L^2(\mu)$ by the $q$-step shift operator $U^q_N : L^2(\mu_N) \to L^2(\mu_N)$, defined as
$$U^q_N f(x_n) = \begin{cases} f(x_{n+q}), & 0 \le n \le N - 1 - q, \\ 0, & n > N - 1 - q. \end{cases}$$
It should be noted that, unlike $U^t f = f \circ \Phi^t$, the shift operator $U^q_N$ is not a composition operator by the underlying dynamical flow; this is because $\Phi^t$ does not preserve $\mu_N$-null sets, and thus $\circ \, \Phi^t$ does not lift to an operator on equivalence classes of functions in $L^2(\mu_N)$. In fact, while $U^t$ is unitary, $U^q_N$ is a nilpotent operator with $U^N_N = 0$. Still, despite these differences, one can interpret $U^q_N$ as an approximation of the Koopman operator in the following sense:

– Let $U^t : C(M) \to C(M)$, $t \ge 0$, denote the Koopman operator on continuous functions on the forward-invariant manifold $M$. Let also $\iota_N : C(M) \to L^2(\mu_N)$ be the canonical linear operator mapping $C(M)$ functions to their corresponding equivalence classes in $L^2(\mu_N)$. Then, for any fixed $q \in \mathbb{N}$ and continuous function $f \in C(M)$, we have
$$U^q_N \circ \iota_N f = \iota_N \circ U^{q \Delta t} f + r_N, \quad (29)$$
where $r_N \in L^2(\mu_N)$ are residuals whose norm converges to 0, $\lim_{N \to \infty} \|r_N\|_{L^2(\mu_N)} = 0$. In contrast, the Koopman operator on $L^2(\mu)$ satisfies $U^t \circ \iota f = \iota \circ U^t f$ for any (fixed) $t \in \mathbb{R}$, where $\iota : C(M) \to L^2(\mu)$ is the canonical inclusion map.

Step 5 (Finite-difference operator)

Approximate the generator $V : D(V) \to L^2(\mu)$ by the finite-difference operator $V_{\Delta t,N} : L^2(\mu_N) \to L^2(\mu_N)$, where
$$V_{\Delta t,N} = \frac{U^1_N - \mathrm{Id}}{\Delta t}.$$
Explicitly, we have
$$V_{\Delta t,N} f(x_n) = \begin{cases} (f(x_{n+1}) - f(x_n)) / \Delta t, & 0 \le n \le N - 2, \\ -f(x_{N-1}) / \Delta t, & n = N - 1. \end{cases}$$
This operator can be understood as an approximation of the generator in the following sense:

– Let $V_{\Delta t} : C(M) \to C(M)$ be the finite-difference approximation of the dynamical vector field $V : C^1(M) \to C(M)$, given by
$$V_{\Delta t} = \frac{U^{\Delta t} - \mathrm{Id}}{\Delta t}.$$
Then, for any $f \in C(M)$, we have
$$V_{\Delta t,N} \circ \iota_N f = \iota_N \circ V_{\Delta t} f + r_{\Delta t,N},$$
where $\lim_{N \to \infty} \|r_{\Delta t,N}\|_{L^2(\mu_N)} = 0$. If, in addition, $f$ lies in $C^1(M)$, then
$$V_{\Delta t} f = V f + r_{\Delta t}, \quad (30)$$
where the residual $r_{\Delta t}$ converges uniformly to 0 as the sampling interval decreases, $\lim_{\Delta t \to 0^+} \|r_{\Delta t}\|_{C(M)} = 0$. Note that the generator $V$ on $L^2(\mu)$ satisfies $V \circ \iota f = \iota \circ V f$ for any $f \in C^1(M)$.

Step 6 (Coherent features)

In order to construct coherent observables analogously to Theorem 1, pick two consecutive, nonzero, simple eigenvalues of $K_{T,\Delta t,N}$, which we denote $\lambda_{T,\Delta t,N}$ and $\nu_{T,\Delta t,N}$ suppressing $j$ subscripts, and consider corresponding real normalized eigenfunctions $\phi_{T,\Delta t,N}$ and $\psi_{T,\Delta t,N}$, respectively. Alternatively, a single twofold-degenerate nonzero eigenvalue can be used. Then, form the complex unit vector $z_{T,\Delta t,N} = (\phi_{T,\Delta t,N} + i \psi_{T,\Delta t,N}) / \sqrt{2} \in L^2(\mu_N)$, and compute its continuous representative
$$\zeta_{\Delta t,N} = \frac{1}{\sqrt{2}} \int_\Omega k_{T,\Delta t,N}(\cdot, x) \left( \frac{\phi_{\Delta t,N}(x)}{\lambda_{\Delta t,N}} + i \frac{\psi_{\Delta t,N}(x)}{\nu_{\Delta t,N}} \right) d\mu_N(x). \quad (31)$$
The function $\zeta_{\Delta t,N}$ is employed as a data-driven coherent feature, analogous to $\zeta$ in Corollary 3. Note, in particular, that $\zeta_{\Delta t,N}$ is expressible as a finite linear combination of kernel sections $k(\cdot, x_n)$, and thus can be empirically evaluated at any point in $\Omega$. Moreover, we construct data-driven analogs of the autocorrelation function $\alpha_t$ for $t = q \, \Delta t$ and the oscillatory frequency $\omega$ by computing
$$\alpha_{q,\Delta t,N} = \langle z_{\Delta t,N}, U^q_N z_{\Delta t,N} \rangle_N, \quad \omega_{\Delta t,N} = \langle \psi_{\Delta t,N}, V_{\Delta t,N} \phi_{\Delta t,N} \rangle_N, \quad (32)$$
respectively.

4.2 Convergence in the large-data limit

We are interested in establishing convergence of the data-driven coherent observable $\zeta_{\Delta t,N}$, autocorrelation function $\alpha_{q,\Delta t,N}$, and oscillatory frequency $\omega_{\Delta t,N}$ to their counterparts from Section 2 in a limit of large data, $N \to \infty$, and vanishing sampling interval, $\Delta t \to 0$. For that, we follow a similar approach to [17, 32], who employ spectral approximation results for kernel integral operators by Von Luxburg et al. [42]. The principal elements of this approach are as follows.

Operators on continuous functions

Since the operators K T and K T , ∆ t , N act on differ-ent Hilbert spaces, we use the space of continuous functions on the forward-invariantmanifold M as a universal comparison space to establish spectral convergence. Inparticular, since the kernels k T and k T , ∆ t , N are all continuous, one can consider in-tegral operators K T : C ( M ) → C ( M ) and K T , ∆ t , N : C ( M ) → C ( M ) , deﬁned analo-gously to K T : L ( µ ) → L ( µ ) and K T , ∆ t , N : L ( µ N ) → L ( µ N ) , respectively. We thenhave ι ◦ K T = K T ◦ ι and ι N ◦ K T , ∆ t , N = K T , ∆ t , N ◦ ι N , and it is straightforward toverify that λ j , T (resp. λ j , T , ∆ t , N ) is a nonzero eigenvalue of K T (resp. K T , ∆ t , N ) if andonly if it is a nonzero eigenvalue of K T (resp. K T , ∆ t , N ). Moreover, if φ j , T ∈ L ( µ ) (resp. φ j , T , ∆ t , N ∈ L ( µ N ) ) is a corresponding eigenfunction of K T (resp. K T , ∆ t , N ), then ϕ j , T ∈ C ( M ) from (9) (resp. ϕ j , T , ∆ t , N ∈ C ( M ) from (28)) is a corresponding eigen-function of K T (resp. K T , ∆ t , N ). It can further be shown that K T is compact, andclearly K T , ∆ t , N has ﬁnite rank. Ergodicity and physical measures

Let B µ ⊆ Ω be the basin of the ergodic invariantmeasure µ in M , i.e., the set of initial conditions x ∈ Ω such that the correspondingsampling measures µ N weak-converge to µ ,lim N → ∞ E µ N f = E µ f , ∀ f ∈ C b ( Ω ) , (33)for Lebesgue almost every sampling interval ∆ t . Here, E ρ f = (cid:82) Ω f d ρ denotes ex-pectation with respect to a measure ρ , and C b ( Ω ) is the Banach space of continuous,real-valued functions on Ω equipped with the uniform norm. By ergodicity of thedynamical ﬂow Φ t , B µ ∩ X is a dense subset of the support X of µ . Moreover, for aclass of dynamical systems possessing so-called physical measures [63] the basin B µ has positive measure with respect to an ambient probability measure on state space Ω from which initial conditions are drawn, even if X is a null set with respect to thatmeasure. In such situations, the data-driven scheme described in Section 4.1 con-verges from a “large” set of experimentally accessible initial conditions, which neednot lie on the support of µ . Examples include the L63 system, where the the ergodicinvariant measure supported on the Lorenz attractor is a Sinai-Ruelle-Bowen (SRB)measure with a basin of positive Lebesgue measure in Ω = R [60]. For simplicityof exposition, and without loss of generality with regards to asymptotic convergence,we will henceforth assume that the initial state x lies in B µ ∩ M . Moreover, ∆ t → Spectral convergence

Since our approach for coherent feature extraction relies on eigenvalues and eigenvectors of kernel integral operators, it is necessary to ensure that the family $K_{T,\Delta t,N}$ converges to $K_T$ in a sufficiently strong sense so as to imply spectral convergence. Here, we consider the iterated limit of $N \to \infty$ followed by $\Delta t \to 0$; under the former limit, empirical expectation values with respect to the sampling measures converge to expectation values with respect to the invariant measure (according to (33)), and under the latter limit the kernels based on discrete-time delay-coordinate maps converge to their continuous-time counterparts (according to (26)). In particular, we have:

Proposition 2

With notation and assumptions as above, let $\lambda_{j,T}$ be a nonzero eigenvalue of $K_T$, where the ordering $\lambda_{0,T} \ge \lambda_{1,T} \ge \cdots$ is in decreasing order and includes multiplicities. Let $\Pi_{j,T} : C(M) \to C(M)$ be the spectral projection to the corresponding eigenspace. Then, the following hold:

(i) The $j$-th eigenvalues $\lambda_{j,T,\Delta t,N}$ of $K_{T,\Delta t,N}$ (ordered with the same convention as the eigenvalues of $K_T$) converge to $\lambda_{j,T}$, in the sense of the iterated limit
$$\lim_{\Delta t \to 0} \lim_{N \to \infty} \lambda_{j,T,\Delta t,N} = \lambda_{j,T}.$$

(ii) For any neighborhood $\Sigma \subseteq \mathbb{C}$ such that $\sigma(K_T) \cap \Sigma = \{\lambda_{j,T}\}$, the spectral projections $\Pi_{\Sigma,T,\Delta t,N}$ of $K_{T,\Delta t,N}$ onto $\Sigma$ converge strongly to $\Pi_{j,T}$. In particular, for any eigenfunction $\varphi_{j,T} \in C(M)$ of $K_T$ corresponding to eigenvalue $\lambda_{j,T}$ there exist eigenfunctions $\varphi_{j,T,\Delta t,N} \in C(M)$ of $K_{T,\Delta t,N}$ corresponding to $\lambda_{j,T,\Delta t,N}$, such that
$$\lim_{\Delta t \to 0} \lim_{N \to \infty} \|\varphi_{j,T,\Delta t,N} - \varphi_{j,T}\|_{C(M)} = 0.$$

Remark 2

Analogous spectral convergence results to Proposition 2 hold for integral operators with data-dependent kernels $k_{T,\Delta t,N}$, so long as these kernels have well-defined $N \to \infty$ limits in $C(M)$ norm. Examples of such kernels include Markov-normalized kernels [9, 13, 14] and variable-bandwidth Gaussian kernels [8]. See, e.g., Theorem 7 in [32] for a spectral convergence result for data-dependent kernels related to the kernels employed in the numerical experiments in Section 5.

A corollary of Proposition 2 is that the data-driven coherent observable $\zeta_{\Delta t,N}$ from (31) and the corresponding empirical autocorrelation function and oscillatory frequency in (32) converge to their counterparts from Theorem 1, and thus obey the same pseudospectral bounds associated with dynamical coherence.

Corollary 4

Under the assumptions of Proposition 2, the following hold in the large-data limit, ∆ t → after N → ∞ , where ζ , α t , and ω are deﬁned in Theorem 1:(i) ζ ∆ t , N converges to the coherent feature ζ , uniformly on the forward-invariantmanifold M, i.e., lim ∆ t → lim N → ∞ (cid:107) ζ ∆ t , N − ζ (cid:107) C ( M ) = . (ii) For any q ∈ N , the empirical autocorrelation α q , ∆ t , N converges to the auto-correlation function α t at t = q ∆ t.(iii) The empirical oscillatory frequency ω ∆ t , N converges to the frequency ω .Proof The uniform convergence of ζ ∆ t , N to ζ in Claim (i) is a direct consequence ofProposition 2. Claim (ii) follows from the Claim (i), in conjunction with the residualestimate in (29), viz.lim ∆ t → lim N → ∞ α q , ∆ t , N = lim ∆ t → lim N → ∞ (cid:104) z ∆ t , N , U qN z ∆ t , N (cid:105) N = lim ∆ t → lim N → ∞ (cid:104) ι N ζ ∆ t , N , U qN ι N ζ ∆ t , N (cid:105) N = lim ∆ t → lim N → ∞ (cid:104) ι N ζ ∆ t , N , ι N U q ∆ t ζ ∆ t , N + r N (cid:105) N elay-coordinate maps, coherence, and approximate spectra of evolution operators 25 = lim ∆ t → lim N → ∞ (cid:104) ι N ζ ∆ t , N , ι N U q ∆ t ζ ∆ t , N (cid:105) N = lim ∆ t → lim N → ∞ N N − ∑ n = ζ ∆ t , N ( x n ) z ∆ t , N ( x n + q )= (cid:90) Ω ζ ∗ U q ∆ t ζ d µ = (cid:104) ιζ , ι U q ∆ t ζ (cid:105) = (cid:104) z , U q ∆ t z (cid:105) = α q ∆ t . Claim (iii) follows similarly, using a ﬁnite-difference residual estimate in (30), inconjunction with the C -norm convergence of k T , ∆ t to k T as ∆ t → K T , ∆ t , N : L ( µ N ) → L ( µ N ) associated with a family of symmetric, Markov-normalized kernels k T , ∆ t , N constructed using the variable-bandwidth Gaussian ker-nels in conjunction with the bistochastic Markov normalization procedure proposedin [8] and [13], respectively. 
Specifically, to build $k_{T,\Delta t,N}$ we start from a radial Gaussian kernel $\bar k_{T,\Delta t} : \Omega \times \Omega \to \mathbb{R}_+$ on delay-coordinate mapped data,
$$\bar k_{T,\Delta t}(x, x') = \exp\left( -\frac{d^2_{T,\Delta t}(x, x')}{\bar\sigma^2} \right),$$
where $\bar\sigma$ is a positive bandwidth parameter determined numerically from the data (see, e.g., Algorithm 1 in [27]). Using this kernel, we compute the bandwidth function $\rho_{T,\Delta t,N} \in C(\Omega)$ given by
$$\rho_{T,\Delta t,N}(x) = \left( \int_\Omega \bar k_{T,\Delta t}(x, \cdot)\, d\mu_N \right)^{-1/m} = \left( \frac{1}{N} \sum_{n=0}^{N-1} \bar k_{T,\Delta t}(x, x_n) \right)^{-1/m}.$$
Here, $m > 0$ is an estimate of the dimension of the support $X$ of the invariant measure, computed through the same procedure used to tune the kernel bandwidth $\bar\sigma$. We then build the variable-bandwidth kernel $\kappa_{T,\Delta t,N} : \Omega \times \Omega \to \mathbb{R}_+$, where
$$\kappa_{T,\Delta t,N}(x, x') = \exp\left( -\frac{d^2_{T,\Delta t}(x, x')}{\sigma^2\, \rho_{T,\Delta t,N}(x)\, \rho_{T,\Delta t,N}(x')} \right). \tag{34}$$
In the above, $\sigma$ is a positive bandwidth parameter determined automatically in a similar manner as $\bar\sigma$, though note that in general $\sigma$ and $\bar\sigma$ have different values.

By construction, $\kappa_{T,\Delta t,N}$ is continuous, positive, and bounded away from zero on $M \times M$. Intuitively, the function $\rho^{-m}_{T,\Delta t,N}$ can be thought of as a kernel estimate of the "sampling density" of the data relative to an ambient measure. The variable-bandwidth construction in (34) can then be thought of as a data-adaptive adjustment of the bandwidth $\sigma$, such that a data point $x$ is assigned a smaller (larger) bandwidth $\sigma \rho_{T,\Delta t,N}(x)$ when the sampling density is higher (lower), thus reducing sensitivity to sampling errors. This intuition can be made precise if the support $X$ has the structure of a Riemannian manifold and $\mu$ the structure of a smooth volume form. 
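As a rough numerical illustration of the construction leading to (34), the following NumPy sketch evaluates the bandwidth function and the variable-bandwidth kernel on a finite sample. It is a sketch under stated assumptions, not the implementation used for the experiments: the helper names `delay_embed`, `pairwise_sq_dists`, and `variable_bandwidth_kernel` are hypothetical, the delay-coordinate distance is taken to be the plain Euclidean distance between stacked delay vectors, and the Gaussian exponents are assumed to use squared distances and squared bandwidths.

```python
import numpy as np


def delay_embed(x, Q):
    """Stack Q consecutive snapshots of x (shape (N_tilde, d)) into delay vectors.

    Returns an array of shape (N, Q * d) with N = N_tilde - Q + 1.
    Assumption: delays run forward in time from each base point.
    """
    N = x.shape[0] - Q + 1
    return np.hstack([x[q:q + N] for q in range(Q)])


def pairwise_sq_dists(y):
    """Pairwise squared Euclidean distances between the rows of y."""
    sq = np.sum(y * y, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (y @ y.T)
    d2 = np.maximum(d2, 0.0)   # clip small negative round-off
    np.fill_diagonal(d2, 0.0)  # exact zeros on the diagonal
    return d2


def variable_bandwidth_kernel(d2, sigma_bar, sigma, m):
    """Return the bandwidth function rho and the kernel kappa on the sample.

    d2        : (N, N) squared delay-coordinate distances (assumed form)
    sigma_bar : bandwidth of the fixed-bandwidth Gaussian kernel
    sigma     : bandwidth of the variable-bandwidth kernel
    m         : dimension parameter entering the exponent -1/m
    """
    # Fixed-bandwidth radial Gaussian kernel evaluated on the sample.
    k_bar = np.exp(-d2 / sigma_bar**2)
    # rho(x_i) = (sample mean over n of k_bar(x_i, x_n)) ** (-1/m).
    rho = np.mean(k_bar, axis=1) ** (-1.0 / m)
    # Variable-bandwidth kernel: local bandwidths scale with rho(x_i) rho(x_j).
    kappa = np.exp(-d2 / (sigma**2 * np.outer(rho, rho)))
    return rho, kappa
```

In this sketch `sigma_bar`, `sigma`, and `m` are treated as given inputs; in the text they come from the automatic tuning procedure of [27], which is not reproduced here.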
When $X$ and $\mu$ have this Riemannian structure, the variable-bandwidth kernel effects a conformal change of Riemannian metric on the data such that in the new geometry the invariant measure has constant density relative to the Riemannian volume form; see [27] for further details.

Next, we normalize the kernel $\kappa_{T,\Delta t,N}$ to obtain a symmetric Markov kernel $k_{T,\Delta t,N} : \Omega \times \Omega \to \mathbb{R}_+$ by first computing the strictly positive, continuous functions
$$u_{T,\Delta t,N} = \int_\Omega \kappa_{T,\Delta t,N}(\cdot, x)\, d\mu_N(x), \qquad v_{T,\Delta t,N} = \int_\Omega \frac{\kappa_{T,\Delta t,N}(\cdot, x)}{u_{T,\Delta t,N}(x)}\, d\mu_N(x),$$
and then defining
$$k_{T,\Delta t,N}(x, x') = \int_\Omega \frac{\kappa_{T,\Delta t,N}(x, x'')\, \kappa_{T,\Delta t,N}(x'', x')}{u_{T,\Delta t,N}(x)\, v_{T,\Delta t,N}(x'')\, u_{T,\Delta t,N}(x')}\, d\mu_N(x''). \tag{35}$$
It can be readily verified that with this definition $k_{T,\Delta t,N}$ is a symmetric, strictly positive kernel with the Markov property, $\int_\Omega k_{T,\Delta t,N}(x, x')\, d\mu_N(x') =$

1, for all x ∈ Ω .Moreover, k T , ∆ t , N is (strictly) positive-deﬁnite on the support of µ N if κ T , ∆ t , N is(strictly) positive-deﬁnite. It can further be shown [32] that in the large-data limit, ∆ t → N → ∞ , k T , ∆ t , N , converges to an L ( µ ) -Markov, symmetric, continuouskernel k T so an analogous spectral convergence result to Proposition 2 holds for thisclass of kernels (see also Remarks 1 and 2).For the purposes of extraction of coherent observables of measure-preserving,ergodic dynamics, symmetric Markov kernels have the natural property of exhibitinga constant eigenfunction corresponding to the top eigenvalue, λ , T , ∆ t , N = λ , T = K T , ∆ t , N , where explicit formation of the kernel in (35) is avoided throughsingular value decomposition of a non-symmetric kernel matrix. K T , ∆ t , N induced by the L63 system on Ω = R with thestandard parameters,˙ x = V ( x ) , x = ( x , x , x ) ∈ R , V ( x ) = ( V , V , V ) , V = ( x − x ) , V = x − x − x x , V = x x − x / . We generate numerical trajectories x , . . . , x ˜ N − ∈ Ω sampled at an interval ∆ t = . ode45 solver. Numerical integration starts at anarbitrary point ˜ x ∈ R , and we allow the state to settle near the Lorenz attractor over aspinup time of 640 time units before collecting the ﬁrst sample x . In anticipation of elay-coordinate maps, coherence, and approximate spectra of evolution operators 27 the fact that we will be using the delay-coordinate map in (25), we sample a total of˜ N = N + Q − Q is the number of delays, and N is ﬁxed at N = Q = T =

0) and another one with Q =

800 corresponding to a delay-embedding window of T = Q ∆ t = T L = / Λ , where Λ ≈ .

91 [53] is the positive Lyapunov expo-nent of the L63 system. The T = T o = .

8. In bothcases we set the observation map F : Ω → Y to the identity map on R , so the cor-responding delay coordinate map F Q , ∆ t takes values in Y Q = R Q . Note that, afterdelay embedding, each experiment has N = y n = F Q , ∆ t available foranalysis, which corresponds to 800 oscillatory timescales T o .As stated in Section 2.1, this L63 setup rigorously satisﬁes all the assumptionsmade in Theorem 1 [40, 43, 60]. In addition, since ∆ t (cid:28) T o , ( N − ) ∆ t (cid:29) T o , and inthe T = ∆ t (cid:28) T , we expect no signiﬁcant sampling errors to be present inour numerical experiments; in particular, we expect the leading eigenfunctions of thedata-driven integral operators K T , ∆ t , N to be good approximations of the correspondingeigenfunctions of the operators K T from Theorem 1.5.2 Coherent observablesWe now discuss the properties of eigenfunctions of K T , ∆ t , N constructed using theapproach described in Section 4.3, some of which were already shown in Figure 1. Allresults were obtained using the symmetric Markov kernels in (35) with N = ∆ t = .

01, and T = T =

8. For the rest of this section, we suppress ∆ t and N indices from our notation. Moreover, we do not distinguish between eigenfunctions z ∈ L ( µ N ) and their continuous representatives ζ ∈ C ( M ) , as our visualizations willbe restricted to the training dataset { x n } N − n = for which z ( x n ) = ζ ( x n ) .We begin in Figure 2 with a plot of the leading 20 eigenvalues λ j , T of K T for T = T =

8, where both operators have the top eigenvalue λ , T = T = K T has a small spectral gap λ , T − λ , T ≈ . λ j , T ≈ . j =

20. Incontrast, when T = K T exhibits a signiﬁcantly larger spectral gap λ , T − λ , T ≈ . λ , T ≈ . λ , T ≈ . δ T ≈ .

001 and ˜ δ T ≈ . z = ( φ , T + i φ , T ) / √ T = ε -approximate eigenfunction of theKoopman operator with small ε . The scatterplots and time series plots in Figure 1were already suggestive of this behavior, which we now examine in further detail. Asa point of comparison, we consider the corresponding observable z constructed fromthe leading eigenfunctions of K T at T =

0, which were also depicted in Figure 1.Figure 3 shows the evolution of the observables z as a time-parameterized curve t n (cid:55)→ z ( x n ) on the complex plane over a portion of the training data spanning 50natural time units. In effect, these plots correspond to samplings of complex-valued Fig. 2

Leading 20 eigenvalues λ j , T of the integral operators K T for T = T = functions on the Lorenz attractor along dynamical trajectories, akin to the time seriesplots in Figure 1 which (up to a scaling by a factor of √

2) correspond to the real andimaginary parts of z . The T = z (cid:39) z acted upon by the Koopman operator, andcan also be assessed more quantitatively through plots of the time-autocorrelationfunction α t , shown in Figure 4. There, the modulus | α t | is seen to rapidly decay fromits initial value | α | =

1, reaching | α t | ≈ .

05 at t ≈ .

3, and never exceeds 0.4 after (cid:39) z constructed from the eigenfunctions of K T at T = T = z have a 90 ◦ phase differ-ence to a good approximation (at leas when | z | is not too small), and as indicated bythe time series plots in Figure 1, they have a nearly constant characteristic frequency.The coherent dynamical evolution stemming from this behavior is visually evident inthe scatterplots of the real and imaginary parts of U t z in Figure 1, which appear to“resist” mixing of level sets on signiﬁcantly longer timescales than the T = α t of z for T = α t oscillate at a near-constant frequency, and remainphase-locked to a 90 ◦ phase difference at least out to t =

10 natural time units, or (cid:39)

10 Lyapunov times. Meanwhile, the modulus | α t | exhibits a signiﬁcantly slowerdecay than what was observed for T =

0, and remains above 0.4 for all t ∈ [ , ] . elay-coordinate maps, coherence, and approximate spectra of evolution operators 29 Fig. 3

Evolution of the real and imaginary parts of the observable z = ( φ , T + i φ , T ) / √

2, constructedusing the leading two nonconstant eigenfunctions of the integral operator K T for no delays ( T =

0) and T =

8. Here, z is plotted as a time-parameterized curve t n (cid:55)→ z ( x n ) on the complex plane, corresponding toa sampling of its values along an L63 dynamical trajectory at times t n = n ∆ t . For clarity of visualization, t n is restricted to a time interval of length 50 (whereas the full training datasets span 640 natural time units). As a further test of the consistency between the empirical behavior of the T = z and the expected behavior from Theorem 1, in Figure 5 we compare theevolution of the autocorrelation function α t with a pure sinusoid e i ω t with frequency ω determined through the ﬁnite-difference-approximated generator using (32). Thegenerator-based frequency, ω ≈ .

24 (corresponding to a period of 2 π / ω ≈ . α t signal, with a slow build-upof phase decoherence that becomes noticeable by t (cid:39) ω ≈ .

24 frequency identiﬁed here through eigenfunctions of K T is close to an 8 .

18 approximate eigenfrequency identiﬁed in [19] through spectralanalysis of a compact approximation to the generator V constructed using reproduc-ing kernel Hilbert space (RKHS) techniques. The RKHS-based eigenfrequency has acorresponding approximate Koopman eigenfunction, z RKHS , which has a qualitativelysimilar spatial structure on the L63 attractor as the approximate eigenfunction z iden-tiﬁed here (compare Figure 5 in [19] with Figure 1 of this paper). Moreover, both z and z RKHS resemble an observable identiﬁed by Korda et al. [38] through a spectralanalysis technique for Koopman operators utilizing Christoffel-Darboux kernels infrequency space (see Figure 13 in [38]). Having been identiﬁed via three independentdata analysis techniques, it thus appears that the approximate eigenfrequency ω (cid:39) . Fig. 4

Real part, imaginary part, and modulus of the time autocorrelation function α t of the observables z in Figure 3. Fig. 5

A comparison of the real part of the autocorrelation function $\alpha_t$ with a pure cosine wave $\cos \omega t = \operatorname{Re} e^{i\omega t}$ for the $T = 8$ observable $z$. The frequency $\omega$ was computed through (32) using the finite-difference approximation of the generator.

In this paper we have studied how kernel integral operators constructed from delay-coordinate mapped data can identify, through their eigenfunctions, dynamically coherent features of measure-preserving, ergodic dynamical systems. We have shown that a class of eigenfunctions of such operators leads to complex-valued observables with an approximately cyclical evolution, behaving as $\varepsilon$-approximate eigenfunctions of the Koopman operator for a bound $\varepsilon$ that decreases with the length of the embedding window. Such observables encapsulate a natural notion of dynamical coherence, so we have argued, in the sense of having high regularity on the attractor, a well-defined oscillatory frequency, and a slowly decaying time-autocorrelation amplitude. In addition, the spectral bounds were explicitly characterized as functions of the embedding window length, evolution time, and appropriate spectral gap parameters.

These results extend previous work on integral operators approximating the point spectrum of the Koopman operator in the infinite-delay limit [17, 27] to the setting of mixing dynamical systems with continuous Koopman spectra. Thus, they provide a theoretical interpretation of the efficacy of a number of data-driven techniques utilizing delay embeddings, including DMDC [7], HAVOK analysis [11], NLSA [28, 30, 31], and SSA [10, 61], in extracting coherent signals from complex systems. An attractive aspect of these methods is that they are amenable to consistent data-driven approximation from time series data based on techniques originally developed in the context of spectral clustering [42]. 
In particular, the data-driven schemes are rigorously applicable in situations where the invariant measure is supported on non-smooth sets, such as fractal attractors, without requiring addition of stochastic noise to regularize the dynamics.

As a numerical application, we have studied how eigenfunctions of kernel integral operators utilizing delay-coordinate maps identify coherent observables of the L63 model, a system known to have a unique SRB measure with mixing dynamics [43, 60], and thus no non-constant Koopman eigenfunctions in $L^2$. We found that for a sufficiently long embedding window (of approximately 8 Lyapunov times) the kernel-based approach, realized using a symmetric Markov kernel constructed by bistochastic normalization [13] of a variable-bandwidth Gaussian kernel [8], identifies through its two leading non-constant eigenfunctions an observable of the L63 system exhibiting highly coherent dynamical behavior. This observable has an oscillatory period of approximately 0.76 natural time units, and remains coherent at least out to 10 natural time units (approximately 9 Lyapunov timescales) as measured by a 0.4 threshold of its time-autocorrelation function. Spatially, its real and imaginary parts have a structure that could be qualitatively described as a wavenumber-1 azimuthal oscillation about the holes in the two lobes of the attractor, a pattern that resembles observables previously identified through Koopman spectral analysis techniques appropriate for mixing dynamical systems [19, 38].

Possible applied directions stemming from this work include detection of coherence in prototype models for metastable regime behavior in atmospheric dynamics [16], as well as PDE models with intermittency in both space and time [44]. 
On the theoretical side, it would be interesting to explore connections between the spectral results presented here and geometrical characterizations of coherence, including the characterization given in DMDC based on the multiplicative ergodic theorem [7] and the dynamic isoperimetry approach proposed in [25]. It may also be fruitful to employ coherent eigenfunctions of integral operators based on delay-coordinate maps to construct approximation spaces for pointwise and/or spectral approximation of Koopman and transfer operators, including the extended dynamic mode decomposition (EDMD) technique [62] and the RKHS compactification approaches proposed in [19].

Acknowledgements

The author is grateful to Andrew Majda for his guidance and mentorship during a postdoctoral position at the Courant Institute from 2009–2012. He is especially grateful for his friendship and collaboration over the years. This research was supported by NSF grant 1842538, NSF grant DMS-1854383, and ONR YIP grant N00014-16-1-2649.

Conflict of interest

The author declares that he has no conflict of interest.

A Proof of Proposition 2

It is convenient to introduce an intermediate integral operator $K_{T,\Delta t} : C(M) \to C(M)$,
$$K_{T,\Delta t} f = \int_\Omega k_{T,\Delta t}(\cdot, x)\, f(x)\, d\mu(x),$$
which integrates against the invariant measure $\mu$ using the discrete-time delay-coordinate map, and to split the analysis of the spectral convergence of $K_{T,\Delta t,N}$ to $K_T$ into two subproblems involving the convergence of (i) $K_{T,\Delta t,N}$ to $K_{T,\Delta t}$ as $N \to \infty$; and (ii) $K_{T,\Delta t}$ to $K_T$ as $\Delta t \to 0$. We now consider these two subproblems, starting from the second one.

Spectral convergence of $K_{T,\Delta t}$ to $K_T$ as $\Delta t \to 0$. The uniform convergence of the kernels $k_{T,\Delta t}$ to $k_T$, i.e., $\lim_{\Delta t \to 0} \lVert k_{T,\Delta t} - k_T \rVert_{C(M \times M)} = 0$ (see (26)), implies convergence of $K_{T,\Delta t}$ to $K_T$ in $C(M)$ operator norm. It then follows from results on spectral theory of compact operators [2, 12] that the analogous claims to Proposition 2 hold for the eigenvalues and spectral projections, $\lambda_{j,T,\Delta t}$ and $\Pi_{\Sigma,T,\Delta t}$, respectively, of $K_{T,\Delta t}$. That is, we have
$$\lim_{\Delta t \to 0} \lambda_{j,T,\Delta t} = \lambda_{j,T}, \qquad \lim_{\Delta t \to 0} \Pi_{\Sigma,T,\Delta t} f = \Pi_{j,T} f, \quad \forall f \in C(M), \tag{36}$$
where $\Pi_{\Sigma,T,\Delta t} : C(M) \to C(M)$ is the spectral projection of $K_{T,\Delta t}$ onto $\Sigma$.

Spectral convergence of $K_{T,\Delta t,N}$ to $K_{T,\Delta t}$ as $N \to \infty$. Unlike the $K_{T,\Delta t} \to K_T$ case, the operators $K_{T,\Delta t,N}$ need not converge to $K_{T,\Delta t}$ in $C(M)$ operator norm. In essence, this is because the weak convergence of measures in (33) is not uniform with respect to $f$, even upon restriction to functions in $C(M)$. Nevertheless, as shown in [42], the continuity of the kernel $k_T$ is sufficient to ensure that for a fixed $f \in C(M)$, a restricted form of uniform convergence holds, namely
$$\lim_{N \to \infty} \sup_{g \in G} \lvert E_{\mu_N} g - E_\mu g \rvert = 0, \tag{37}$$
where $G \subset C(M)$ is the set of functions given by
$$G = \{ k_T(x, \cdot)\, f(\cdot) \mid x \in M \}.$$
A collection of functions satisfying (37) is known as a Glivenko-Cantelli class.

The Glivenko-Cantelli property turns out to be sufficient to ensure that as $N \to \infty$, the sequence of operators $K_{T,\Delta t,N}$ exhibits a form of convergence to $K_{T,\Delta t}$, called collectively compact convergence, which, despite being weaker than norm convergence, is sufficiently strong to imply the spectral convergence claims in Proposition 2. We state the relevant definitions for collectively compact convergence below, and refer the reader to [12, 42] for additional details.

Definition 1

Let $A_N : E \to E$ be a sequence of bounded linear operators on a Banach space $E$, indexed by $N \in \mathbb{N}$.
– $A_N$ is said to converge compactly to an operator $A : E \to E$ if $A_N$ converges to $A$ strongly, and for every uniformly bounded sequence $f_N \in E$ the sequence $(A - A_N) f_N$ has compact closure.
– $\{A_N\}$ is said to be collectively compact if $\bigcup_{N \in \mathbb{N}} A_N B$ has compact closure in $E$, where $B$ is the unit ball of $E$.
– $A_N$ is said to converge to $A$ collectively compactly if it converges pointwise, and there exists $N_0 \in \mathbb{N}$ such that $\{A_N\}_{N > N_0}$ is collectively compact.

It can be shown that operator norm convergence implies collectively compact convergence, and collectively compact convergence implies compact convergence. The latter is, in turn, sufficient for the following spectral convergence result:

Lemma 3

With the notation of Definition 1, suppose that $A_N$ converges to $A$ compactly. Let $\lambda \in \sigma_p(A)$ be an isolated eigenvalue of $A$ with finite multiplicity $m$, and $\Sigma$ an open neighborhood of $\lambda$ such that $\sigma(A) \cap \Sigma = \{\lambda\}$. Then, the following hold:
(i) There exists $N_0 \in \mathbb{N}$ such that for all $N > N_0$, $\sigma(A_N) \cap \Sigma$ is an isolated subset of the spectrum of $A_N$, containing at most $m$ distinct eigenvalues whose multiplicities sum to $m$. Moreover, as $N \to \infty$, every element of $\sigma(A_N) \cap \Sigma$ converges to $\lambda$.
(ii) As $N \to \infty$, the spectral projections of $A_N$ onto $\sigma(A_N) \cap \Sigma$, defined in the sense of the holomorphic functional calculus, converge strongly to the spectral projection of $A$ onto $\{\lambda\}$.

Using a similar approach as Proposition 13 in [42], which employs, in particular, the Glivenko-Cantelli property in (37), it can be shown that as $N \to \infty$, $K_{T,\Delta t,N}$ converges collectively compactly to $K_{T,\Delta t}$. Then, Lemma 3, in conjunction with the fact that $K_{T,\Delta t}$ is compact (so every nonzero element of its spectrum is an isolated eigenvalue of finite multiplicity), implies that
$$\lim_{N \to \infty} \lambda_{j,T,\Delta t,N} = \lambda_{j,T,\Delta t}, \qquad \lim_{N \to \infty} \Pi_{\Sigma,T,\Delta t,N} f = \Pi_{\Sigma,T,\Delta t} f, \quad \forall \lambda_{j,T,\Delta t} \in \Sigma, \ \forall f \in C(M), \tag{38}$$
where $\Sigma$ is the spectral neighborhood in the statement of the proposition. Proposition 2 is then proved by combining (36) and (38). $\square$

References

1. Arbabi, H., Mezić, I.: Ergodic theory, dynamic mode decomposition and computation of spectral properties of the Koopman operator. SIAM J. Appl. Dyn. Sys. (4), 2096–2126 (2017). DOI 10.1137/17M1125236
2. Atkinson, K.E.: The numerical solution of the eigenvalue problem for compact integral operators. Trans. Amer. Math. Soc. (3) (1967)
3. Aubry, N., Guyonnet, R., Lima, R.: Spatiotemporal analysis of complex signals: Theory and applications. J. Stat. Phys., 683–739 (1991). DOI 10.1007/bf01048312
4. Baladi, V.: Positive transfer operators and decay of correlations, Advanced Series in Nonlinear Dynamics, vol. 16. World Scientific, Singapore (2000)
5. Banisch, R., Koltai, P.: Understanding the geometry of transport: Diffusion maps for Lagrangian trajectory data unravel coherent sets. Chaos, 035804. DOI 10.1063/1.4971788
6. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput., 1373–1396 (2003). DOI 10.1162/089976603321780317
7. Berry, T., Cressman, R., Gregurić-Ferenček, Z., Sauer, T.: Time-scale separation from diffusion-mapped delay coordinates. SIAM J. Appl. Dyn. Sys., 618–649 (2013). DOI 10.1137/12088183x
8. Berry, T., Harlim, J.: Variable bandwidth diffusion kernels. Appl. Comput. Harmon. Anal. (1), 68–96 (2016). DOI 10.1016/j.acha.2015.01.001
9. Berry, T., Sauer, T.: Local kernels and the geometric structure of data. Appl. Comput. Harmon. Anal. (3), 439–469 (2016). DOI 10.1016/j.acha.2015.03.002
10. Broomhead, D.S., King, G.P.: Extracting qualitative dynamics from experimental data. Phys. D (2–3), 217–236 (1986). DOI 10.1016/0167-2789(86)90031-x
11. Brunton, S.L., Brunton, B.W., Proctor, J.L., Kaiser, E., Kutz, J.N.: Chaos as an intermittently forced linear system. Nat. Commun. (19) (2017). DOI 10.1038/s41467-017-00030-8
12. Chatelin, F.: Spectral Approximation of Linear Operators. Classics in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia (2011)
13. Coifman, R., Hirn, M.: Bi-stochastic kernels via asymmetric affinity functions. Appl. Comput. Harmon. Anal. (1), 177–180 (2013). DOI 10.1016/j.acha.2013.01.001
14. Coifman, R.R., Lafon, S.: Diffusion maps. Appl. Comput. Harmon. Anal., 5–30 (2006). DOI 10.1016/j.acha.2006.04.006
15. Constantin, P., Foias, C., Nicolaenko, B., Témam, R.: Integral Manifolds and Inertial Manifolds for Dissipative Partial Differential Equations. Springer, New York (1989). DOI 10.1007/978-1-4612-3506-4
16. Crommelin, D.T., Majda, A.J.: Strategies for model reduction: Comparing different optimal bases. J. Atmos. Sci., 2206–2217 (2004). DOI 10.1175/1520-0469(2004)061
17. … (6), 1107–1145 (2019). DOI 10.1007/s10955-019-02272-w
18. Das, S., Giannakis, D.: Koopman spectra in reproducing kernel Hilbert spaces. Appl. Comput. Harmon. Anal. (2), 573–607 (2020). DOI 10.1016/j.acha.2020.05.008
19. Das, S., Giannakis, D., Slawinska, J.: Reproducing kernel Hilbert space compactification of unitary evolution groups (2018)
20. Davis, P.J., Rabinowitz, P.: Methods of Numerical Integration, 2nd edn. Academic Press, San Diego (1984)
21. Dellnitz, M., Froyland, G.: On the isolated spectrum of the Perron–Frobenius operator. Nonlinearity pp. 1171–1188 (2000). DOI 10.1088/0951-7715/13/4/310
22. Dellnitz, M., Junge, O.: On the approximation of complicated dynamical behavior. SIAM J. Numer. Anal., 491 (1999). DOI 10.1137/S0036142996313002
23. Deyle, E.R., Sugihara, G.: Generalized theorems for nonlinear state space reconstruction. PLoS ONE (3), e18295 (2011). DOI 10.1371/journal.pone.0018295
24. Eisner, T., Farkas, B., Haase, M., Nagel, R.: Operator Theoretic Aspects of Ergodic Theory, Graduate Texts in Mathematics, vol. 272. Springer (2015)
25. Froyland, G.: Dynamic isoperimetry and the geometry of Lagrangian coherent structures. Nonlinearity pp. 3587–3622 (2015). DOI 10.1088/0951-7715/28/10/3587
26. Genton, M.C.: Classes of kernels for machine learning: A statistics perspective. J. Mach. Learn. Res., 299–312 (2001)
27. Giannakis, D.: Data-driven spectral decomposition and forecasting of ergodic dynamical systems. Appl. Comput. Harmon. Anal. (2), 338–396 (2019). DOI 10.1016/j.acha.2017.09.001
28. Giannakis, D., Majda, A.J.: Time series reconstruction via machine learning: Revealing decadal variability and intermittency in the North Pacific sector of a coupled climate model. In: Conference on Intelligent Data Understanding 2011. Mountain View, California (2011)
29. Giannakis, D., Majda, A.J.: Comparing low-frequency and intermittent variability in comprehensive climate models through nonlinear Laplacian spectral analysis. Geophys. Res. Lett., L10710 (2012). DOI 10.1029/2012GL051575
30. Giannakis, D., Majda, A.J.: Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability. Proc. Natl. Acad. Sci. (7), 2222–2227 (2012). DOI 10.1073/pnas.1118984109
31. Giannakis, D., Majda, A.J.: Nonlinear Laplacian spectral analysis: Capturing intermittent and low-frequency spatiotemporal patterns in high-dimensional data. Stat. Anal. Data Min. (3), 180–194 (2013). DOI 10.1002/sam.11171
32. Giannakis, D., Ourmazd, A., Slawinska, J., Zhao, Z.: Spatiotemporal pattern extraction by spectral analysis of vector-valued observables. J. Nonlinear Sci. (5), 2385–2445 (2019). DOI 10.1007/s00332-019-09548-1
33. Halmos, P.R.: Lectures on Ergodic Theory. American Mathematical Society, Providence (1956)
34. Holmes, P., Lumley, J.L., Berkooz, G.: Turbulence, Coherent Structures, Dynamical Systems and Symmetry. Cambridge University Press, Cambridge (1996)
35. Karrasch, D., Keller, J.: A geometric heat-flow theory of Lagrangian coherent structures. J. Nonlinear Sci., 1849–1888 (2020). DOI 10.1007/s00332-020-09626-9
36. Koopman, B.O.: Hamiltonian systems and transformation in Hilbert space. Proc. Natl. Acad. Sci. (5), 315–318 (1931). DOI 10.1073/pnas.17.5.315
37. Koopman, B.O., von Neumann, J.: Dynamical systems of continuous spectra. Proc. Natl. Acad. Sci. (3), 255–263 (1931). DOI 10.1073/pnas.18.3.255
38. Korda, M., Putinar, M., Mezić, I.: Data-driven spectral analysis of the Koopman operator. Appl. Comput. Harmon. Anal. (2), 599–629 (2020). DOI 10.1016/j.acha.2018.08.002
39. Kosambi, D.D.: Statistics in function space. J. Ind. Math. Soc., 76–88 (1943)
40. Law, K., Shukla, A., Stuart, A.M.: Analysis of the 3DVAR filter for the partially observed Lorenz'63 model. Discrete Contin. Dyn. Syst. (3), 1061–1078 (2013). DOI 10.3934/dcds.2014.34.1061
41. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci., 130–141 (1963)
42. von Luxburg, U., Belkin, M., Bousquet, O.: Consistency of spectral clustering. Ann. Stat. (2), 555–586 (2008). DOI 10.1214/009053607000000640
43. Luzzatto, S., Melbourne, I., Paccaut, F.: The Lorenz attractor is mixing. Comm. Math. Phys. (2), 393–401 (2005)
44. Majda, A.J., McLaughlin, D.W., Tabak, E.G.: A one-dimensional model for dispersive wave turbulence. J. Nonlinear Sci., 9–44 (1997). DOI 10.1007/BF02679124
45. Mezić, I.: Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dyn., 309–325 (2005). DOI 10.1007/s11071-005-2824-x
46. Mezić, I., Banaszuk, A.: Comparison of systems with complex behavior: Spectral methods. In: Proceedings of the 39th IEEE Conference on Decision and Control, pp. 1224–1231. IEEE, Sydney, Australia (1999). DOI 10.1109/CDC.2000.912022
47. Mezić, I., Banaszuk, A.: Comparison of systems with complex behavior. Phys. D, 101–133 (2004). DOI 10.1016/j.physd.2004.06.015
48. Packard, N.H., et al.: Geometry from a time series. Phys. Rev. Lett., 712–716 (1980). DOI 10.1103/physrevlett.45.712
49. Robinson, J.C.: A topological delay embedding theorem for infinite-dimensional dynamical systems. Nonlinearity (5), 2135–2143 (2005). DOI 10.1088/0951-7715/18/5/013
50. Sauer, T.: Time series prediction by using delay coordinate embedding. In: A.S. Weigend, N.A. Gershenfeld (eds.) Time Series Prediction: Forecasting the Future and Understanding the Past, SFI Studies in the Sciences of Complexity, vol. 15, pp. 175–193. Addison-Wesley (1993)
51. Sauer, T., Yorke, J.A., Casdagli, M.: Embedology. J. Stat. Phys. (3–4), 579–616 (1991). DOI 10.1007/bf01053745
52. Slawinska, J., Giannakis, D.: Indo-Pacific variability on seasonal to multidecadal time scales. Part I: Intrinsic SST modes in models and observations. J. Climate (14), 5265–5294 (2017). DOI 10.1175/JCLI-D-16-0176.1
53. Sprott, J.C.: Chaos and Time-Series Analysis. Oxford University Press, Oxford (2003)
54. Steinwart, I.: On the influence of the kernel on the consistency of support vector machines. J. Mach. Learn. Res., 67–93 (2001)
55. Stone, M.H.: On one-parameter unitary groups in Hilbert space. Ann. Math (3), 643–648 (1932)
56. Székely, E., Giannakis, D., Majda, A.J.: Extraction and predictability of coherent intraseasonal signals in infrared brightness temperature data. Climate Dyn. (5), 1473–1502 (2016). DOI 10.1007/s00382-015-2658-2
57. Takens, F.: Detecting strange attractors in turbulence. In: Dynamical Systems and Turbulence, Lecture Notes in Mathematics, vol. 898, pp. 366–381. Springer, Berlin (1981). DOI 10.1007/bfb0091924
58. Trillos, N.G., Gerlach, M., Hein, M., Slepčev, D.: Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace–Beltrami operator. Found. Comput. Math. (2019). DOI 10.1007/s10208-019-09436-w. In press
59. Trillos, N.G., Slepčev, D.: A variational approach to the consistency of spectral clustering. Appl. Comput. Harmon. Anal. (2), 239–281 (2018). DOI 10.1016/j.acha.2016.09.003
60. Tucker, W.: The Lorenz attractor exists. C. R. Acad. Sci. Paris, Ser. I, 1197–1202 (1999)
61. Vautard, R., Ghil, M.: Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series. Phys. D, 395–424 (1989). DOI 10.1016/0167-2789(89)90077-8
62. Williams, M.O., Kevrekidis, I.G., Rowley, C.W.: A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. J. Nonlinear Sci. (6), 1307–1346 (2015). DOI 10.1007/s00332-015-9258-5
63. Young, L.S.: What are SRB measures, and which dynamical systems have them? J. Stat. Phys. 108