Cluster-based network modeling -- automated robust modeling of complex dynamical systems
Daniel Fernex, Bernd R. Noack, Richard Semaan*

Institut für Strömungsmechanik, Technische Universität Braunschweig, Hermann-Blenk-Str. 37, 38108 Braunschweig, Germany
Center for Turbulence Control, Harbin Institute of Technology, Shenzhen 518058, People's Republic of China

*To whom correspondence should be addressed; E-mail: [email protected].
We propose a universal method for data-driven modeling of complex nonlinear dynamics from time-resolved snapshot data without prior knowledge. Complex nonlinear dynamics govern many fields of science and engineering. Data-driven dynamic modeling often assumes a low-dimensional subspace or manifold for the state. We liberate ourselves from this assumption by proposing cluster-based network modeling (CNM), bridging machine learning, network science, and statistical physics. CNM only assumes smoothness of the dynamics in the state space, robustly describes short- and long-term behavior, and is fully automatable as it does not rely on application-specific knowledge. CNM is demonstrated for the Lorenz attractor, ECG heartbeat signals, Kolmogorov flow, and a high-dimensional actuated turbulent boundary layer. Even the notoriously difficult modeling benchmark of rare events in the Kolmogorov flow is solved. This automatable, universal, data-driven representation of complex nonlinear dynamics complements and expands network connectivity science and promises new fast-track avenues to understand, estimate, predict, and control complex systems in all scientific fields.
Climate, epidemiology, brain activity, financial markets, and turbulence constitute examples of complex systems. They are characterized by a large range of temporal and spatial scales, intrinsic high dimensionality, and nonlinear dynamics. Dynamic modeling of the long-term features is a key enabler for understanding, state estimation from limited sensor signals, prediction, control, and optimization. Data-driven modeling has made tremendous progress in the last decades, driven by algorithmic advances, access to large data, and hardware speedups. Typically, the modeling is based on a low-dimensional approximation of the state and system identification in that approximation.

The low-dimensional approximation may be achieved with subspace modeling methods, such as proper orthogonal decomposition (POD) models [1, 2], dynamic mode decomposition (DMD) [3], and empirical dynamical modeling [4], to name only a few. Autoencoders [5] represent a general nonlinear dimension reduction to a low-dimensional feature space. The dynamic system identification is significantly simplified in this feature space.

An early breakthrough in system identification was reported by Bongard and Lipson [6] using symbolic regression. The method performs a heuristic search for the best equation that describes the dynamics [7]. Such searches are, however, expensive and not easily scalable to large systems. Recent developments in parsimonious modeling led to the "sparse identification of nonlinear dynamics" (SINDy) algorithm, which identifies accurate parsimonious models from data [8]. Similarly, SINDy is not easily scalable to large problems: the computational expense becomes exorbitant already for moderate-dimensional feature spaces.

This limitation may be bypassed by black-box techniques. These include Volterra series [9], autoregressive models [10] (e.g., ARX, ARMA, and NARMAX), the eigensystem realization algorithm (ERA) [11], and neural network (NN) models [12]. These approaches, however, have limited interpretability and provide little physical insight. Some (e.g., NN) require large volumes of data and long training times, luxuries that are not always at hand.

In this study, we follow a novel modeling paradigm starting with a time-resolved snapshot set. We only assume smoothness of the dynamics in the state space, liberating ourselves from the requirement of a low-dimensional subspace or manifold for the data and of analytical simplicity of the dynamical system. The snapshots are coarse-grained into a small number of centroids with clustering. The dynamics are described by a network model with continuous transitions between the centroids. The resulting cluster-based network modeling (CNM) uses time-delay embedding to identify models with an arbitrary degree of complexity and nonlinearity. The methodology is developed within the network science [13, 14, 15] and statistical physics [16] frameworks. Due to its generic nature, network analysis is increasingly used to investigate complex systems [17, 18]. The proposed method builds on previous work by Kaiser et al. [19], where clustering is used to coarse-grain the data into representative states and the temporal evolution is modeled as a probabilistic Markov model. By construction, the state vector of cluster probabilities converges to a fixed point representing the post-transient attractor, i.e., the dynamics disappear.
A recent improvement [20] models the transition dynamics between the network nodes as straight constant-velocity 'flights' with a travel time directly inferred from the data. The present study expands on these innovations, generalizes the approach to arbitrary high-order chains with time-delay coordinates [21], and introduces a control-oriented extension to include external inputs and control. Besides its accuracy, one major advantage of the method is the ability to control the resolution level through adaptive coarse-graining.

The dynamics of complex systems are often driven by complicated small-scale (sometimes microscopic) interactions (e.g., turbulence, biological signaling) that are either unknown or very expensive to fully resolve [22]. The resolution of cluster-based network modeling can be adapted to match any desired level, even when microscopic details are not known. This universal representation of strongly nonlinear dynamics, enabled by adaptive coarse-graining and a probabilistic foundation, promises to revolutionize our ability to understand, estimate, predict, and control complex systems in all scientific fields. The method is inherently robust and honest to the data. It requires no assumption on the analytical structure of the model and is computationally tractable, even for problems with many degrees of freedom. A code is available at https://github.com/fernexda/cnm.

Robust probability-based data-driven dynamical modeling for complex nonlinear systems has the potential to revolutionize our ability to predict and control these systems. Cluster-based network models (CNM) reproduce the dynamics on a directed network, where the nodes are the coarse-grained states of the system. The transition properties between the nodes are based on high-order direct transition probabilities identified from the data. The modeling methodology is applied to a variety of dynamical systems, from canonical problems such as the Lorenz attractor, to rare events, to high-degree-of-freedom systems such as a boundary layer flow simulation. The general methodology is illustrated in Fig. 1 with the Lorenz system and is detailed in the following.
Fig. 1. Cluster-based network modeling methodology. M consecutive N-dimensional states $\boldsymbol{x}(t) \in \mathbb{R}^N$ are collected at a fixed sampling frequency. Based on their similarity, the states are grouped into K clusters. The network nodes are computed as the cluster centroids $\boldsymbol{c}_i$, and the transition times $\boldsymbol{T}$ and transition probabilities $\boldsymbol{Q}$ between the nodes are identified from the data. The CNM dynamics are propagated as consecutive flights between centroids. Each transition is characterized by its destination, given by $\boldsymbol{Q}$, and its transit time, given by $\boldsymbol{T}$.
Data collection and clustering. The starting point of CNM is the collection of M consecutive discrete N-dimensional states of the system $\boldsymbol{x}(t) \in \mathbb{R}^N$, equally spaced in time by $\Delta t$, such that the state at time $t_m$ is $\boldsymbol{x}(t_m) = \boldsymbol{x}(m\Delta t) = [x_{m1}, \ldots, x_{mN}]$. The discrete states are grouped into K clusters $\mathcal{C}_k$, and the network nodes are identified as the cluster centroids $\boldsymbol{c}_k$, defined as the average of the states in each cluster. In this study, clustering is achieved with the unsupervised k-means++ algorithm [23, 24], which minimizes the inner-cluster variance. Other clustering algorithms are possible; the choice is a problem-dependent option. The vector $\mathcal{K} = [K_1, \ldots, K_I]$, $K_i \in [1, K]$, contains the indices of the consecutively visited clusters over the entire time sequence, such that $K_i$ is the index of the i-th visited cluster. The first and last clusters are $\mathcal{C}_{K_1}$ and $\mathcal{C}_{K_I}$, respectively. The size I of $\mathcal{K}$ equals the number of transitions between centroids over the entire ensemble plus one. We note that two sequential cluster visits are not necessarily equally spaced in time but rather depend on the state's rate of change in their vicinity.
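To make the coarse-graining step concrete, the following minimal Python sketch clusters a snapshot matrix with k-means++ and extracts the cluster sequence $\mathcal{K}$ by dropping repeated consecutive labels. The variable names (`snapshots`, `labels`, `sequence`) are illustrative and not part of the published CNM package.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_snapshots(snapshots, n_clusters=50, seed=0):
    """Coarse-grain M snapshots (rows) into K centroids with k-means++.

    Returns the centroids, the cluster affiliation of each snapshot, and
    the sequence of consecutively visited clusters (the vector K).
    """
    kmeans = KMeans(n_clusters=n_clusters, init="k-means++",
                    n_init=10, random_state=seed)
    labels = kmeans.fit_predict(snapshots)
    # Keep only direct transitions: drop repeated consecutive labels.
    keep = np.r_[True, labels[1:] != labels[:-1]]
    sequence = labels[keep]
    return kmeans.cluster_centers_, labels, sequence

# Example with random placeholder data (M=1000 snapshots, N=3 dimensions).
centroids, labels, sequence = cluster_snapshots(np.random.rand(1000, 3),
                                                n_clusters=10)
```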
Transition properties. Before we detail the transition properties of cluster-based network models [20], we briefly review those of the cluster-based Markov models [19] upon which the current method builds. In cluster-based Markov models, the state variable is the cluster population $\boldsymbol{p} = [p_1, \ldots, p_K]^T$, where $p_i$ represents the probability of being in cluster i and the superscript T denotes the transpose. The transitions between clusters are modeled with a first-order Markov model. The probability to move from cluster $\mathcal{C}_j$ to cluster $\mathcal{C}_k$ is described by the transition matrix $\boldsymbol{P} = (P_{k,j}) \in \mathbb{R}^{K \times K}$ as

$$P_{k,j} = \Pr(K_i = k \mid K_{i-1} = j). \quad (1)$$

The transition matrix $\boldsymbol{P}$ is computed as

$$P_{k,j} = \frac{n_{k,j}}{n_j}, \quad (2)$$

where $n_{k,j}$ is the number of samples that move from $\mathcal{C}_j$ to $\mathcal{C}_k$, and $n_j$ is the number of transitions departing from $\mathcal{C}_j$, regardless of the destination point. In [19], the transition time $\Delta t$ is a user-specified constant. Let $\boldsymbol{p}^l$ be the probability vector at time $t_l = l\Delta t$; then the change in one time step is described by

$$\boldsymbol{p}^{l+1} = \boldsymbol{P}\,\boldsymbol{p}^l. \quad (3)$$

With time evolution, equation (3) converges to the asymptotic probability $\boldsymbol{p}^\infty := \lim_{l\to\infty} \boldsymbol{p}^l$. In a typical case, equation (3) has a single fixed point $\boldsymbol{p}^\infty$.

Conversely, CNM relies on the direct transition matrix $\boldsymbol{Q}$, which ignores the inner-cluster residence probability and only considers inter-cluster transitions. The direct transition probability is inferred from the data as

$$Q_{k,j} = \frac{n_{k,j}}{n_j}, \quad (4)$$

with $Q_{j,j} = \Pr(K_i = j \mid K_{i-1} = j) = 0$, by the very definition of a direct transition. Generalizing to an L-order model, which is equivalent to using time-delay coordinates, the direct transition probability is expressed as $\Pr(K_i \mid K_{i-1}, \ldots, K_{i-L})$. Illustrating for a second-order model, the probability to move to $\mathcal{C}_l$ having previously visited $\mathcal{C}_k$ and $\mathcal{C}_j$ is given by

$$Q_{l,k,j} = \Pr(K_i = l \mid K_{i-1} = k, K_{i-2} = j). \quad (5)$$

Time-delay embedding is a cornerstone of dynamical systems [25]. The optimal Markov chain order L is problem-dependent (see Appendix D). Larger L values are typically necessary for problems with complex phase-space trajectories. In this study, we demonstrate how the benefits of time-delay embedding extend to higher-order cluster-based network models.

The second transition property is the transition time. For Markov models, the time step is a critical user-defined design parameter. If the time step is too small, the cluster-based Markov model idles in each cluster for a stochastic number of steps before transitioning to the next cluster; the model-based transition time may thus significantly deviate from the deterministic data-driven trajectories through the clusters. If the time step is too large, one may miss intermediate clusters. This design parameter is avoided in cluster-based network modeling (CNM). The key idea is to abandon the 'stroboscopic' view and focus on non-trivial transitions, thus avoiding rapid state diffusion. Let $t_n$ and $t_{n+1}$ be the times of the first and last snapshots to enter and, respectively, to leave $\mathcal{C}_k$ at the n-th iteration (Fig. 2). Here, iterations refer to the sequential jumps between the centroids. The residence time $\tau^n = t_{n+1} - t_n$ corresponds to the duration of the state transit through cluster $\mathcal{C}_k$ at this iteration.

Fig. 2. Definition of the transition time between clusters $\mathcal{C}_j$ and $\mathcal{C}_k$. The transit time $\tau^n$ in $\mathcal{C}_k$ at iteration n is the time range spanned by the data entry and exit times in the cluster, $t_n$ and $t_{n+1}$. The individual transition time $\tau^n_{k,j}$ is defined as the average transit time between two neighboring clusters.
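As a concrete illustration of Eq. (4), the sketch below estimates the first-order direct transition matrix $\boldsymbol{Q}$ from the cluster sequence produced in the previous snippet. It is a minimal sketch, not the reference implementation, and the function name `direct_transition_matrix` is ours.

```python
import numpy as np

def direct_transition_matrix(sequence, n_clusters):
    """First-order direct transition probabilities Q_{k,j} = n_{k,j} / n_j.

    `sequence` lists consecutively visited clusters (no repeats), so the
    diagonal of Q is zero by construction, Eq. (4).
    """
    counts = np.zeros((n_clusters, n_clusters))
    for j, k in zip(sequence[:-1], sequence[1:]):
        counts[k, j] += 1                 # transition C_j -> C_k
    departures = counts.sum(axis=0)       # n_j: transitions leaving each cluster
    Q = np.divide(counts, departures, out=np.zeros_like(counts),
                  where=departures > 0)
    return Q

Q = direct_transition_matrix(sequence, n_clusters=10)
assert np.allclose(np.diag(Q), 0.0)       # no inner-cluster transitions
```

Higher-order models replace the matrix by a lookup table keyed on the last L visited clusters, as detailed in Appendix B.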
We define the individual transition time from cluster j to cluster k for one iteration as half the sum of the residence times in both clusters,

$$\tau^n_{k,j} = \frac{\tau^{n-1} + \tau^n}{2} = \frac{t_{n+1} - t_{n-1}}{2}. \quad (6)$$

Averaging all $n_{k,j}$ individual transition times between $\mathcal{C}_j$ and $\mathcal{C}_k$ yields the transition time $T_{k,j} = \frac{1}{n_{k,j}} \sum_{n=1}^{n_{k,j}} \tau^n_{k,j}$. This definition may appear arbitrary but is the least-biased guess consistent with the available data. Like the direct transition matrix $\boldsymbol{Q}$ for an L-order chain, the transition time matrix $\boldsymbol{T} = (T_{k,j}) \in \mathbb{R}^{K \times K}$ also depends on the L previously visited centroids. When L is large, this can yield two storage-intensive (L+1)-dimensional tensors $\boldsymbol{Q}$ and $\boldsymbol{T}$ with $K^{L+1}$ elements. The expensive tensor creation and storage is circumvented by a lookup table (LUT), where only the non-zero entries that correspond to actual transitions are retained. The lookup tables are typically orders of magnitude smaller than the full tensors (see Appendix B).

Propagation. The final step of cluster-based network modeling propagates the state motion. We assume a uniform state propagation between two centroids $\boldsymbol{c}_j$ and $\boldsymbol{c}_k$,

$$\boldsymbol{x}(t) = \alpha_{kj}(t)\,\boldsymbol{c}_k + [1 - \alpha_{kj}(t)]\,\boldsymbol{c}_j, \quad \alpha_{kj}(t) = \frac{t - t_j}{T_{k,j}}, \quad (7)$$

where $t_j$ is the time when centroid $\boldsymbol{c}_j$ is left. The motion between the centroids may be interpolated with splines for smoother trajectories. As CNM is purely data-driven, the model quality is directly related to that of the training data. More specifically, the sampling frequency and total time range must be selected such that all relevant dynamics are captured and are statistically converged. This usually requires a larger amount of data than other data-driven methods, such as ARMA and SINDy.
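The propagation step can be sketched as follows: starting from an initial centroid, the next node is drawn according to $\boldsymbol{Q}$, and the state is interpolated linearly over the transition time, Eq. (7). This is a minimal first-order sketch reusing the `Q` and `centroids` arrays from the previous snippets; the constant transition-time matrix `T` passed in below is a placeholder.

```python
import numpy as np

def propagate_cnm(centroids, Q, T, k0, n_steps=100, dt=0.01, rng=None):
    """Propagate the CNM state as piecewise-linear flights between centroids.

    Eq. (7): x(t) = alpha*c_k + (1 - alpha)*c_j, alpha = (t - t_j)/T_{k,j}.
    """
    rng = np.random.default_rng(rng)
    j, t_j = k0, 0.0
    times, states = [], []
    for _ in range(n_steps):
        k = rng.choice(len(Q), p=Q[:, j])          # next node, drawn from Q
        for t in np.arange(t_j, t_j + T[k, j], dt):
            alpha = (t - t_j) / T[k, j]
            times.append(t)
            states.append(alpha * centroids[k] + (1 - alpha) * centroids[j])
        t_j += T[k, j]                              # arrival time at c_k
        j = k
    return np.array(times), np.array(states)

T = np.full((10, 10), 0.5)                          # placeholder transition times
t, x = propagate_cnm(centroids, Q, T, k0=0)
```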
CNM of the Lorenz system. CNM is applied to the Lorenz system, a widely used canonical chaotic dynamical system [26] defined by three coupled nonlinear differential equations,

$$\frac{dx}{dt} = \sigma(y - x), \quad \frac{dy}{dt} = x(\rho - z) - y, \quad \frac{dz}{dt} = xy - \beta z, \quad (8)$$

where the system parameters are here defined as $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$. The data are clustered with K = 50 centroids, depicted in Fig. 3A. The snapshots are colored by their cluster affiliation. CNM is performed with a chain order L = 22, with modeled transitions covering the same time range as the original data. The optimal K and L values are problem-dependent. They are identified for the Lorenz system through a parametric study in which the root-mean-square error of the autocorrelation function between the reference data and the model is minimized (cf. Appendix D).

Time series obtained with CNM agree very well with the reference data (Fig. 3B and C). The oscillating amplitude growth in both ears, as well as the ear switching, are correctly captured. The cluster probability distribution (CPD) $q_k$, $k = 1, \ldots, K$, provides the probability of the state being in a specific cluster. It indicates whether the modeled trajectories populate the phase space similarly to the reference data (cf. Appendix C). The CPD of both the data and CNM is shown in Fig. 3D. For clarity, $q_k$ is shown with 10 clusters only instead of the full 50. As the figure shows, CNM accurately reproduces the probability distribution. Following Protas et al. [27], the cluster-based network model is validated based on the autocorrelation function of the state vector. This function avoids the problem of comparing two trajectories with finite dynamic prediction horizons due to phase mismatch. The autocorrelation function also yields the fluctuation energy at vanishing delay, R(0), and can be used to infer the spectral behavior (see Appendix C). As Fig. 3E shows, CNM accurately reproduces the fast oscillatory decay, even after dozens of oscillations, as well as the fluctuation energy R(0), which is reproduced with a 2.8% rms error. This performance is in contrast to cluster-based Markov models, where time integration leads to the average flow, and to first-order cluster-based network models [20], where the prediction accuracy is significantly lower. A detailed comparison between the cluster-based Markov model, the first-order cluster-based network model, and the current model is provided in Appendix E.

Fig. 3. Cluster-based network modeling of the Lorenz system. (A) Phase-space representation of the data clustering. The centroids are depicted with black circles, and the small circles are the snapshots, colored by their cluster affiliation. The CNM accuracy is demonstrated by the accurate reproduction of (B)-(C) the time series, (D) the cluster probability distribution, and (E) the autocorrelation function. Black and red coloring denote the reference and CNM data, respectively.
Demonstration on examples. Cluster-based network modeling is applied to numerous examples, ranging from analytical systems to real-life problems with experimental and simulation data. The main results are summarized in Fig. 4. Details on each application are provided in Appendix A. The first two applications are the Lorenz [26] and Rössler [28] attractors, typical candidates for dynamical systems analysis. The two systems are governed by simple equations and exhibit chaotic behavior for specific parameter values. The following two implementations are one-dimensional systems: electrocardiogram measurements (ECG) [29] and the dissipation energy of a Kolmogorov flow [30]. Whereas the ECG exhibits the regular heartbeat pattern, the dissipation energy of the Kolmogorov flow is quasi-random with intermittent bursts. The last CNM application is a high-dimensional large eddy simulation of an actuated turbulent boundary layer for skin-friction reduction [31]. The clustering step for this multi-million-cell simulation is performed on the mode coefficients of a lossless proper orthogonal decomposition. This dimensionality reduction significantly reduces the computational load while yielding the same clustering outcome as the full difference matrix [19]. The boundary layer time series are therefore represented by the mode coefficients.

In each example, both the qualitative and quantitative dynamics are faithfully captured. The reconstructed time series are hardly distinguishable from the original data. Intermittent events such as the peaks in the Rössler z-component and the dissipation energy bursts of the Kolmogorov flow are statistically very well reproduced. The autocorrelation distributions of the reference data and the models match over the entire range, demonstrating both robustness and accuracy. We note that robustness is inherent to CNM, since the modeled state always remains close to the training data.

The CPDs of the data and CNM for the Rössler system, the ECG signal, the Kolmogorov flow dissipation energy, and the actuated turbulent boundary layer are presented in Fig. 5. For all cases, CNM accurately reconstructs the distributions. Remarkably, the probabilities of less visited clusters, corresponding to rare events for the Kolmogorov flow (Fig. 5C) or fast events such as the peaks in the z direction of the Rössler attractor (Fig. 5A) and the heartbeat pulse (Fig. 5B), are very well captured by CNM.

A special characteristic of CNM is its ability to accurately model and predict systems with rare events. This ability is rooted in the probabilistic framework upon which CNM is constructed, where the recurrence properties are the same as those of the reference data. If one cluster is visited multiple times (or seldom) in the data, it will also be a recurrence point of the CNM.
Fig. 4. The cluster-based network modeling implemented on five applications covering a wide range of dynamics.
The first two applications are three-dimensional chaotic systems, the Lorenz and Rössler attractors. The two following examples are one-dimensional: experimental measurements from an electrocardiogram and a numerical simulation of the dissipation energy in a Kolmogorov flow. The final application is a large-eddy simulation of an actuated turbulent boundary layer. The excellent match of the autocorrelation functions for all applications demonstrates the CNM's ability to capture the relevant dynamics for any complex nonlinear system. The modeled time series faithfully reconstruct the data, including the intermittent quasi-random bursts of the Kolmogorov dissipation energy as well as the z-component pulses of the Rössler system.

Fig. 5. Cluster probability distribution (CPD) of the data and CNM for four applications. (A)-(D) CPD of the Rössler system, ECG signal, Kolmogorov flow dissipation energy, and actuated turbulent boundary layer, respectively. For all cases, the data (black) and CNM (red) are in good agreement. The specific features of each dataset, such as the rare events of the Kolmogorov dissipation energy and the fast heartbeat pulses, are probabilistically well reconstructed by CNM.

A generic example of a rare-events problem is the Kolmogorov flow [32], a two-dimensional incompressible flow with sinusoidal forcing. With a sufficiently high forcing wavenumber, the flow becomes unstable and the dissipation energy D exhibits intermittent and spontaneous bursts (cf. Fig. 6A). The dashed line denotes an arbitrary threshold beyond which a peak is considered a rare event. The probability distribution functions (PDF) of the dissipation energy from the data and CNM are compared in Fig. 6B. The main peak centered around zero reflects the stochastic nature of the dissipation energy, whereas the tail depicts rare events whose occurrence probability decreases with their amplitude. As the figure shows, CNM accurately captures the probabilistic behavior of the dissipation energy. Both the main stochastic peak and the rare-event tail of the distribution are well reproduced. Moreover, the total number of bursts in the current sequence is well reproduced, with 58 bursts in the original data compared to 62 for CNM.
Control-oriented cluster-based network modeling (CNMc). To disambiguate the effect of internal dynamics from actuation or external input, we generalize CNM to include control $\boldsymbol{b}$. The transition probabilities $\boldsymbol{Q}(\boldsymbol{b})$ and transition times $\boldsymbol{T}(\boldsymbol{b})$ are first identified for each actuation setting $\boldsymbol{b}$ individually. The three-step procedure for the propagation of a new control command $\hat{\boldsymbol{b}}$, depicted in Fig. 7A, is then performed. At each iteration, (1) a search for the nearest centroids from the two closest actuation test cases is performed, (2) their transition properties are identified, and (3) averaged to determine the transition of the state $\hat{\boldsymbol{x}}$. More details on the CNMc algorithm are provided in Appendix F. CNMc is applied to two systems at new control conditions, the Lorenz attractor and the actuated turbulent boundary layer. The Lorenz system with $\rho = 28$ is interpolated from two test cases with $\rho = 26$ and $\rho = 30$, and the boundary layer with actuation parameters $\lambda^+ = 1000$, $T^+ = 120$, and $A^+ = 30$ is interpolated from cases with $\lambda^+ = 1000$, $T^+ = 120$, $A^+ = 20$ and $\lambda^+ = 1000$, $T^+ = 120$, $A^+ = 40$. The CNMc settings are listed in Table 3. Despite the algorithm's simplicity, the main dynamics are properly captured, as shown by the autocorrelation functions in Fig. 7B and 7C and the time series (Fig. 14). CNMc is cast in the same probabilistic framework as CNM and thereby retains all previously demonstrated advantages. As the dynamics are interpolated from centroids that belong to potentially different trajectories, the resulting motion might be noisier, and a larger number of centroids than for regular CNM is typically required.

Fig. 6. Rare events in the Kolmogorov flow dissipation energy. (A) Time series of the dissipation energy D. The dashed line denotes an arbitrary threshold beyond which the peaks, represented with green filling, are considered a burst. (B) Probability distribution of the data (black) and CNM (red). Both the main peak and the decaying tail of the distribution are accurately reproduced.

Fig. 7. Control-oriented cluster-based network modeling (CNMc). (A) CNMc iteratively propagates the state in the phase space populated with the centroids from the two operating conditions with the closest control parameters. (1) Neighboring centroids to the current state $\hat{\boldsymbol{x}}^n$ at iteration n are first identified. (2) Their transition properties are calculated and then (3) averaged to determine the next state $\hat{\boldsymbol{x}}^{n+1}$. CNMc accuracy is demonstrated by the autocorrelation functions of the data (black) and the predicted case (red) for (B) the Lorenz system and (C) the actuated turbulent boundary layer, respectively.

We propose a universal data-driven methodology for modeling nonlinear dynamical systems. The method builds on prior work in cluster-based Markov modeling and network dynamics. Cluster-based network modeling has several unique and desirable features. (1) It is simple and automatable. Once the various schemes are chosen (e.g., clustering algorithm, transition time definition), only two parameters must be selected: the number of clusters K and the Markov chain order L. Too few centroids might oversimplify the dynamics, whereas too many might lead to a noisy solution. We note that a high Markov chain order L is not always advantageous. Both parameters are problem-dependent and can be automatically optimized. (2) The method does not require any assumption on the analytical structure of the model, only some sense of smoothness.
It is always honest to the data. (3) The offline computational load is low. In fact, the most expensive step in the process is the occasionally required snapshot-based proper orthogonal decomposition (POD) for dimensionality reduction. After the POD computation, the clustering and network modeling require a tiny fraction of the computational operations. (4) The recurrence properties are the same as those of the reference data. If one cluster is visited multiple times (or seldom) in the data, it will also be a recurrence point of the CNM. This feature is what enables the modeling of problems with rare events. (5) Long-term integration will never lead to divergence, unlike, e.g., POD-based models. The simplicity and robustness, however, have a price. On the kinematic side, the simple CNM version cannot extrapolate, e.g., resolve oscillations at higher amplitudes not contained in the data. On the dynamic side, we lose the relationship to first principles: the network model is purely inferred from data, without links to the governing equations. In particular, cluster-based models are not natural frameworks for dynamic instabilities, as the notion of exponential growth and nonlinear saturation is intimately tied to Galerkin expansions. Subsequent generalizations need to overcome these restrictions. (6) The framework is generalizable, allowing control-oriented predictions beyond the training data. A simple interpolation-based control-oriented extension of CNM is proposed and tested. Despite its simplicity, CNMc accurately predicts the state dynamics at new operating conditions over the entire sample record.

CNM is found to have a distinct superiority over cluster-based Markov models, namely a much longer prediction horizon, as evidenced by the autocorrelation function. The modeling and prediction capabilities are demonstrated on a number of examples exhibiting chaos, rare events, and high dimensionality. In all cases, the dynamics are remarkably well represented by CNM; the temporal evolution of the main flow dynamics, the fluctuation level, the autocorrelation function, and the cluster population are all accurately reproduced.

CNM opens a novel automatable avenue for data-driven nonlinear dynamical modeling and real-time control. It holds the potential for a myriad of further research directions. Its probabilistic foundations are naturally extendable to include uncertainty quantification and propagation. One limiting requirement of CNM is the relatively large, statistically converged training dataset it requires compared to other known methods (e.g., ARMA and SINDy). This requirement could be relaxed through explicit coupling to first-principles equations. The control-oriented extension may be further refined and more broadly implemented in other applications.
We are grateful to Themistoklis Sapsis, Steve Brunton, Wolfgang Schröder, and Marian Albers for the stimulating discussions and for providing some of the employed data.
Funding:
The research was funded by the Deutsche Forschungsgemeinschaft (DFG) in the framework of the research project SE 2504/2-1.
Author contributions:
B.R.N. conceptualized the algorithm. B.R.N., R.S., and D.F. performed the investigation, data analysis, and interpretation. R.S. and D.F. wrote the manuscript, and D.F. implemented the software.
Competing interests:
The authors declare no competing interests.
Data and materials availability:
A CNM Python package along with the data used for this study is available in the GitHub repository at github.com/fernexda/cnm.
Appendix A Problem settings
Cluster-based network modeling is applied to numerous examples, ranging from analytical systems to real-life problems with experimental and simulation data. The main results are summarized in Fig. 4. The first two applications are the Lorenz [26] and Rössler [28] attractors, typical candidates for dynamical systems analysis. The following two implementations are one-dimensional systems: electrocardiogram measurements (ECG) [29] and the dissipation energy of a Kolmogorov flow [30]. The last CNM application is a high-dimensional large eddy simulation of an actuated turbulent boundary layer for skin-friction reduction [31].

In this section, we detail the various systems, including the numerical setup and the CNM modeling parameters. CNM is fully parametrized by the number of clusters K and the model order L. Their selection plays an important role in the model accuracy. The values used for the various systems are listed in Table 1. The procedure to select K and L is detailed in Appendix D. The last column in Table 1 lists the normalized time delay $t_L/T$, where T is the fundamental period computed from the dominant frequency identified from the autocorrelation function. For purely random signals with no deterministic component, such as the dissipation energy of the Kolmogorov flow, no characteristic period can be defined.
Table 1: CNM settings for all applications. The number of clusters K and the model order L are listed for the five systems. The last column, $t_L/T$, designates the normalized time delay corresponding to the selected model order L. The fundamental period T is computed from the dominant frequency of the system, when possible.

System          | K   | L  | t_L/T
Lorenz          | 50  | 22 | 1.7
Rössler         | 100 | 2  | 0.6
ECG             | 50  | 23 | 0.14
Kolmogorov flow | 200 | 25 | -
Boundary layer  | 50  | 3  | 0.25

As indicated by the table, the CNM parameters are strongly dependent on the nature of the system dynamics. Physical interpretation of the chosen parameters is provided for each system in the following.
Lorenz system
The Lorenz system [26] is a typical candidate for dynamical system analysis. Despite its low dimension, it exhibits chaotic behavior. The motion is characterized by periodic oscillations of growing amplitude in the 'ears' and random switching between them. The Lorenz system is driven by a set of three coupled nonlinear ordinary differential equations (ODEs),

$$\frac{dx}{dt} = \sigma(y - x), \quad \frac{dy}{dt} = x(\rho - z) - y, \quad \frac{dz}{dt} = xy - \beta z, \quad (9)$$

with the parameters $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$. The simulation is initialized from a fixed initial condition and performed with a constant time step $\Delta t$ for a total of 57,000 samples. The numerical integration is performed with an explicit Runge-Kutta method of 5th order using the scipy library of the Python programming language [33, 34].

The relatively high number of clusters (K = 50) ensures that each wing is resolved by two orbits of centroids (see the phase-space clustering in Fig. 4) and allows reproducing some of the increasing oscillation amplitude. K can be increased (decreased) to resolve more (fewer) orbits in each ear. Due to the complexity of the dynamics, and especially the random ear flipping, the Lorenz system requires a large time delay $t_L$, equivalent to 1.7 rotations. With lower L values, a trajectory that reaches the intersection of the ears becomes more likely to wrongly switch sides.
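As a concrete counterpart to Eq. (9), the following sketch generates Lorenz training data with scipy's explicit 5th-order Runge-Kutta pair (RK45). The time step and initial condition are illustrative placeholders, since the exact values used in the study are not reproduced here.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, q, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system, Eq. (9)."""
    x, y, z = q
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

dt = 0.01                                   # placeholder sampling step
t_eval = np.arange(0.0, 570.0, dt)          # 57,000 equally spaced samples
sol = solve_ivp(lorenz, (0.0, 570.0), [1.0, 1.0, 1.0],  # placeholder initial state
                method="RK45", t_eval=t_eval, rtol=1e-8, atol=1e-10)
snapshots = sol.y.T                         # (M, 3) snapshot matrix for clustering
```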
Rössler system

The Rössler system is three-dimensional and governed by nonlinear ordinary differential equations [28] that read

$$\frac{dx}{dt} = -y - z, \quad \frac{dy}{dt} = x + ay, \quad \frac{dz}{dt} = b + z(x - c), \quad (10)$$

where the parameters are a = 0.1, b = 0.1, and c = 14. The simulation is initialized from a fixed initial condition and performed with a constant time step for a total of 50,000 samples. The Rössler data are also created with the scipy library using an explicit Runge-Kutta method of 5th order. Similar to the Lorenz system, the Rössler system is widely used for dynamical systems analysis. It also yields chaotic behavior for specific parameter combinations. The motion is characterized by rotations of slowly growing amplitude in the x-y plane and intermittent peaks in the z direction.

The Rössler system requires a large number of clusters to ensure sufficient centroid coverage of the peak for an accurate reproduction of this intermittent and fast event. However, since the trajectory itself is relatively simple, a time delay $t_L$ of approximately half the characteristic period is sufficient ($t_L/T = 0.6$).

Electrocardiogram signal
An electrocardiogram (ECG) measures the heart activity over time. Electrodes placed on the person's skin deliver a univariate voltage signal of the cardiac muscle movements. The time series exhibits the typical pulse associated with the heartbeat. The ECG signal used in this study is from the PhysioNet database [35]. The signal time range is 180 seconds, and the sampling frequency is 250 Hz.

Similar to the Rössler system, the ECG requires a large number of clusters K to resolve the quasi-circular phase-space trajectory corresponding to the fast heartbeat pulse. Again, due to the very regular and repetitive nature of the heart activity, a small time delay $t_L$ is sufficient.

Kolmogorov flow

The Kolmogorov flow is a two-dimensional generic flow defined on a square domain $\boldsymbol{q} = (x, y)$ with $0 \le x \le L$ and $0 \le y \le L$, subject to a horizontal sinusoidal forcing $\boldsymbol{f}$ defined by

$$\boldsymbol{f}(\boldsymbol{q}) = \sin(a y)\, \boldsymbol{e}_1, \quad (11)$$

where $\boldsymbol{e}_1 = (1, 0)^T$ is the unit vector in the x direction. The Kolmogorov flow is a test bed for various fluid mechanics and turbulence studies [36]. The temporal evolution of the flow energy E, the dissipation energy D, and the input energy I is defined by

$$E(t) = \frac{1}{2L^2} \int\!\!\int |\boldsymbol{u}(\boldsymbol{q}, t)|^2\, d\boldsymbol{q}, \quad (12)$$

$$D(t) = \frac{\nu}{L^2} \int\!\!\int |\boldsymbol{\omega}(\boldsymbol{q}, t)|^2\, d\boldsymbol{q}, \quad (13)$$

$$I(t) = \frac{1}{L^2} \int\!\!\int \boldsymbol{u}(\boldsymbol{q}, t) \cdot \boldsymbol{f}(\boldsymbol{q})\, d\boldsymbol{q}, \quad (14)$$

where $\nu$ is the fluid viscosity and $\boldsymbol{\omega}$ is the vorticity. The rate of change of the energy equals the input energy minus the dissipation energy, $\dot{E} = I - D$. With increasing forcing wavenumber a, the dissipation energy exhibits intermittent and random bursts. This behavior makes the dissipation energy a good candidate for rare-events modeling. The current data were created and generously shared by Farazmand et al. [37], with a wavenumber a = 4 and a Reynolds number Re = 40. The total time range is 100,000 dimensionless time units with a sampling frequency of 10.

The trajectory in the phase space spanned by D and its temporal derivative $\dot{D}$ (Fig. 4) is particularly complex. The region with higher cluster density in the left part of the phase space corresponds to the random fluctuations, and the region with a sparser centroid distribution describes the intermittent energy bursts. Due to its stochastic nature and the absence of deterministic patterns, the Kolmogorov flow dissipation energy is particularly challenging to model. Remarkably, with sufficiently large K and L, CNM is capable of modeling D with high accuracy.
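The global quantities in Eqs. (12)-(14) are straightforward to evaluate on a uniform periodic grid. The numpy sketch below does so for given velocity and vorticity fields; it is a minimal illustration under assumed array conventions (fields sampled on an n×n grid over a square domain), not code from the reference simulation.

```python
import numpy as np

def global_quantities(u, v, omega, L_dom=2 * np.pi, a=4, nu=1.0 / 40.0):
    """Energy E, dissipation D, and input I for one Kolmogorov-flow snapshot.

    u, v, omega are (n, n) arrays on a uniform grid over [0, L_dom)^2;
    nu = 1/Re under unit scaling (an assumption of this sketch).
    """
    n = u.shape[0]
    dA = (L_dom / n) ** 2                           # uniform-grid area element
    y = np.linspace(0.0, L_dom, n, endpoint=False)[:, None]
    f_x = np.sin(a * y) * np.ones_like(u)           # forcing sin(a y) e_1, Eq. (11)
    E = 0.5 * np.sum(u**2 + v**2) * dA / L_dom**2   # Eq. (12)
    D = nu * np.sum(omega**2) * dA / L_dom**2       # Eq. (13)
    I = np.sum(u * f_x) * dA / L_dom**2             # Eq. (14)
    return E, D, I

E, D, I = global_quantities(*np.random.rand(3, 128, 128))  # placeholder fields
```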
Actuated turbulent boundary layer

The reduction of viscous drag is crucial for many flow applications, such as airplanes and pipelines, as it is a major contributor to the total drag. Many passive [38, 39] and active [40, 41] actuation techniques have been investigated to reduce the skin-friction drag. In this study, skin-friction reduction of a turbulent boundary layer is achieved by means of a spanwise traveling surface wave [31, 42]. The waves are defined by their wavelength $\lambda^+$, period $T^+$, and amplitude $A^+$. The superscript + denotes variables scaled with the friction velocity and the viscosity. Details about the computational setup can be found in Albers et al. [31]. The actuation parameters are $\lambda^+ = 1000$, $T^+ = 120$, $A^+ = 60$. The total time range in + units is 846, and the sampling frequency is 0.5, resulting in 420 snapshots. The velocity field is given by $\boldsymbol{u}(\boldsymbol{q}, t^+)$, where $\boldsymbol{q} = (x^+, y^+, z^+)$ in Cartesian coordinates spans a subvolume of the computational domain.

Clustering of large high-dimensional datasets is costly. The required distance computation between two snapshots $\boldsymbol{u}^m$ and $\boldsymbol{u}^n$,

$$d(\boldsymbol{u}^m, \boldsymbol{u}^n) = \lVert \boldsymbol{u}^m - \boldsymbol{u}^n \rVert_\Omega, \quad (15)$$

is computationally very expensive. Here, the norm is defined as

$$\lVert \boldsymbol{u} \rVert_\Omega = \sqrt{(\boldsymbol{u}, \boldsymbol{u})_\Omega}, \quad (16)$$

and the inner product in the Hilbert space $L^2(\Omega)$ of square-integrable vector fields in the domain $\Omega$ is given by

$$(\boldsymbol{u}, \boldsymbol{v})_\Omega = \int_\Omega \boldsymbol{u}(\boldsymbol{q}) \cdot \boldsymbol{v}(\boldsymbol{q})\, d\boldsymbol{q}. \quad (17)$$

For high-dimensional data such as the boundary layer velocity field, data compression with lossless proper orthogonal decomposition (POD) can reduce the computational cost of clustering. A snapshot $\boldsymbol{u}^m$ is exactly expressed by the POD expansion

$$\boldsymbol{u}(\boldsymbol{q}, t) = \bar{\boldsymbol{u}}(\boldsymbol{q}) + \sum_{i=0}^{M-2} a_i(t)\, \boldsymbol{\Phi}_i(\boldsymbol{q}), \quad (18)$$

where $\bar{\boldsymbol{u}}$ is the mean flow, $\boldsymbol{\Phi}_i$ denote the POD modes, and $a_i(t)$ the corresponding mode coefficients. As shown by Kaiser et al. [19], the distance computation (15) can alternatively be performed with the mode coefficients instead of the snapshots:

$$d(\boldsymbol{u}^m, \boldsymbol{u}^n) = \lVert \boldsymbol{u}^m - \boldsymbol{u}^n \rVert_\Omega \quad (19)$$

$$= \lVert \boldsymbol{a}^m - \boldsymbol{a}^n \rVert. \quad (20)$$

Hence, $\boldsymbol{a}^m = [a^m_0, \ldots, a^m_{M-2}]$ becomes the POD representation of snapshot m at time $t_m = m\Delta t$. Equation (20) is computationally much lighter than Eq. (19); a numerical sketch follows at the end of this subsection. Despite the additional autocorrelation-matrix computation for the POD process, the data compression remains very beneficial for large numerical grids. According to [20], the ratio of the computational costs with and without POD compression amounts to

$$\frac{M + 1}{2\, J \cdot I \cdot K}, \quad (21)$$

where M is the number of snapshots, K the number of clusters, I the number of k-means inner iterations, and J the number of random centroid initializations. For typical values of J, I, and K, the savings amount to one or two orders of magnitude. Furthermore, POD is computed only once for each dataset and benefits all future clusterings performed on that dataset.

The actuated turbulent boundary layer at the investigated actuation settings exhibits synchronization with the actuation wave. The dynamics show quasi-limit-cycle behavior with superimposed wandering. Therefore, a low number of centroids is sufficient to capture the dynamics. If desired, the limit-cycle meandering associated with higher-frequency turbulence can be resolved with a larger set of centroids. The selected value of K = 50 is a compromise between a sufficient resolution of the turbulence scales (64% of the data fluctuation is resolved) and a reasonable model complexity.
The dynamics are well captured with a low model order L, equivalent to a time delay of a quarter of the actuation period.
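The equivalence of Eqs. (19) and (20) can be exploited numerically by computing POD via a singular value decomposition and clustering the mode coefficients. The following numpy sketch illustrates this compression step under assumed array conventions (snapshots as rows); it is our own illustration, not the original implementation.

```python
import numpy as np

def pod_coefficients(snapshots):
    """Lossless POD compression of a snapshot matrix (M snapshots as rows).

    The Euclidean distance between coefficient vectors equals the distance
    between the corresponding mean-subtracted snapshots, Eqs. (19)-(20).
    """
    mean = snapshots.mean(axis=0)
    fluctuations = snapshots - mean
    # Economy-size SVD: rows of (U * S) are the mode coefficients a^m.
    U, S, Vt = np.linalg.svd(fluctuations, full_matrices=False)
    return U * S, Vt, mean          # coefficients, POD modes (rows), mean flow

X = np.random.rand(420, 10_000)     # placeholder: 420 snapshots, 10^4 dofs
A, modes, mean = pod_coefficients(X)
m, n = 3, 7
d_full = np.linalg.norm(X[m] - X[n])   # Eq. (19), distance in physical space
d_pod = np.linalg.norm(A[m] - A[n])    # Eq. (20), distance in coefficient space
assert np.isclose(d_full, d_pod)
```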
Appendix B Cluster-based network modeling methodology

Robust probability-based data-driven dynamical modeling for complex nonlinear systems has the potential to revolutionize our ability to predict and control these systems. Cluster-based network models (CNM) reproduce the dynamics on a directed network [15], where the nodes are the coarse-grained states of the system. The transition properties between the nodes are based on high-order direct transition probabilities identified from the data. The modeling methodology is applied to a variety of dynamical systems, from canonical problems such as the Lorenz attractor, to rare events, to high-degree-of-freedom systems such as a boundary layer flow simulation. The general methodology is illustrated in Fig. 1 with the Lorenz system and is detailed in the following.

The first step is the data collection, where a set of M states $\boldsymbol{x}^m$, $m = 1, \ldots, M$, also called observations or snapshots, is collected from a dynamical system. They are equally spaced in time by $\Delta t$, so that $\boldsymbol{x}^m = \boldsymbol{x}(m\Delta t)$. There is no restriction regarding the type of system or the state dimension.

The second step is the identification of the network nodes using an unsupervised clustering algorithm that groups the snapshots into K clusters $\mathcal{C}_k$, $k = 1, \ldots, K$. In this study, we employ the k-means++ algorithm [43, 44, 23] for its simplicity and its ability to compute physically meaningful and interpretable centroids. The algorithm performs an iterative search for an optimal centroid distribution that increases the inner-cluster similarity by executing the following steps:
Step 1: The initial centroid distribution $\boldsymbol{c}_k$ is randomly generated.
Step 2: Each snapshot is affiliated with its closest centroid, following the cluster affiliation function k defined as

$$k(\boldsymbol{x}^m) = \arg\min_i \lVert \boldsymbol{x}^m - \boldsymbol{c}_i \rVert, \quad (22)$$

where $\lVert \boldsymbol{x} \rVert = \sqrt{\boldsymbol{x} \cdot \boldsymbol{x}}$. The function k maps each state $\boldsymbol{x}^m$ to the index of the closest centroid.

Step 3: The inner-cluster variance J of this centroid distribution is computed as

$$J(\boldsymbol{c}_1, \ldots, \boldsymbol{c}_K) = \sum_{k=1}^{K} \sum_{\boldsymbol{x}^m \in \mathcal{C}_k} \lVert \boldsymbol{x}^m - \boldsymbol{c}_k \rVert^2. \quad (23)$$
Step 4: The centroid positions are updated by averaging the state snapshots within the corresponding cluster,

$$\boldsymbol{c}_k = \frac{1}{n_k} \sum_{\boldsymbol{x}^m \in \mathcal{C}_k} \boldsymbol{x}^m, \quad (24)$$

where $n_k$ is the number of snapshots in cluster $\mathcal{C}_k$.

Steps 2 to 4 are repeated until the inner-cluster variance J is minimized below a specified tolerance. Let the vector $\mathcal{K} = [K_1, \ldots, K_I]$, $K_i \in [1, K]$, contain the indices of all consecutively visited clusters over the entire time sequence, such that $K_i$ is the index of the i-th visited cluster. The first and last clusters are $\mathcal{C}_{K_1}$ and $\mathcal{C}_{K_I}$, respectively. The size I of $\mathcal{K}$ equals the number of transitions between centroids over the entire ensemble plus one. The transition time between sequential clusters is not constant and depends on the state velocity in the phase space. The vector $\mathcal{K}$ constitutes the starting point for identifying the transition properties between the centroids.

Following the identification of the centroids as the network nodes, the third step of the CNM algorithm characterizes the motion along the nodes. The dynamics are constructed as linear transitions between centroids, based on the transition probabilities and the transition times. After a centroid is reached, the next destination is identified using the direct transition probability tensor $\boldsymbol{Q}$. The tensor $\boldsymbol{Q}$ ignores the inner-cluster residence probability and only considers inter-cluster transitions. One novelty of this CNM implementation is to model the direct transition probabilities with an L-order Markov model, defined through the conditional probability $\Pr(K_i \mid K_{i-1}, \ldots, K_{i-L})$. In this context, high-order Markov models are equivalent to time-delay embedding, which is well known in dynamical systems [25]. The benefit of time-delay coordinates is elaborated in Appendix E. For a second-order model, $\boldsymbol{Q} \in \mathbb{R}^{K \times K \times K}$ is a third-order tensor, and the probability to move to $\mathcal{C}_l$, having previously visited $\mathcal{C}_k$ and $\mathcal{C}_j$, is inferred from the data as

$$Q_{l,k,j} = \Pr(K_i = l \mid K_{i-1} = k, K_{i-2} = j) = \frac{n_{l,k,j}}{n_{k,j}}, \quad (25)$$

where $n_{l,k,j}$ designates the number of transitions to $\mathcal{C}_l$ having previously visited $\mathcal{C}_j$ and $\mathcal{C}_k$, and $n_{k,j}$ is the number of transitions departing from $\mathcal{C}_k$ coming from $\mathcal{C}_j$, regardless of the destination. Note that inner-cluster iterations are not possible, by the very definition of a direct transition, such that $Q_{j,j} = \Pr(K_i = j \mid K_{i-1} = j) = 0$.

The transition time designates the time required to travel from one centroid to the next. In this CNM implementation, the transition time is defined as half of the sum of the residence times in two sequential clusters, as illustrated in Fig. 2 for a first-order model.
Let $t_n$ and $t_{n+1}$ be the times of the first and last snapshots to enter and, respectively, leave $\mathcal{C}_k$ at the n-th iteration. The iterations designate the sequential jumps between the centroids. The residence time $\tau^n$ in $\mathcal{C}_k$ is

$$\tau^n := t_{n+1} - t_n. \quad (26)$$

Following this definition, the individual transition time $\tau^n_{k,j}$ from centroid $\boldsymbol{c}_j$ to centroid $\boldsymbol{c}_k$ is given by

$$\tau^n_{k,j} = \frac{t_{n+1} - t_{n-1}}{2} = \frac{\tau^{n-1} + \tau^n}{2}. \quad (27)$$

The transition time $T_{k,j}$ is the average of all transition times $\tau^n_{k,j}$ between centroids $\boldsymbol{c}_j$ and $\boldsymbol{c}_k$:

$$T_{k,j} = \frac{1}{n_{k,j}} \sum_{n=1}^{n_{k,j}} \tau^n_{k,j}. \quad (28)$$

Consistent with the direct transition matrix $\boldsymbol{Q}$ for an L-order chain, the transition time $\boldsymbol{T}$ also depends on the L previously visited centroids.

For large time delays (hence large L), the process could yield two storage-intensive (L+1)-dimensional tensors $\boldsymbol{Q}$ and $\boldsymbol{T}$ with $K^{L+1}$ elements. For instance, for clustering with K = 20 clusters and an order L = 10, the tensors would contain $20^{11} \approx 2 \times 10^{14}$ elements, which exceeds the storage capacities of most computers. The expensive tensor creation and storage is circumvented by a lookup table (LUT), where only the non-zero entries that correspond to actual transitions are retained. Thus, the tensors are replaced by a simple array-indexing operation. The lookup tables are typically orders of magnitude smaller than the full tensors. As an illustration, consider the example in Fig. 1 but with fictional transition properties. Here, all centroids have only one possible destination, except $\boldsymbol{c}_2$, where the state can transit to either $\boldsymbol{c}_3$ or $\boldsymbol{c}_5$ with an assumed equal probability for simplicity ($Q_{3,2,1} = Q_{5,2,1} = 0.5$). The second-order LUT for this example is illustrated in Table 2; the transition times are randomly chosen for this fictional example. If the state is in $\boldsymbol{c}_2$ (k = 2), having visited $\boldsymbol{c}_1$ before (j = 1), the next destination is probabilistically chosen between $Q_{3,2,1}$ and $Q_{5,2,1}$, yielding a transition to either centroid 3 or centroid 5. If the selected destination is $\boldsymbol{c}_5$, the corresponding transition time is read from $T_{5,2,1}$. A dictionary-based sketch of such a lookup table is given below, after Table 2.

The last step of the CNM algorithm propagates the state. The motion between two centroids $\boldsymbol{c}_j$ and $\boldsymbol{c}_k$ is defined as

$$\boldsymbol{x}(t) = \alpha_{kj}(t)\,\boldsymbol{c}_k + [1 - \alpha_{kj}(t)]\,\boldsymbol{c}_j, \quad \alpha_{kj}(t) = \frac{t - t_j}{T_{k,j}}, \quad (29)$$

where $t_j$ is the time when centroid $\boldsymbol{c}_j$ is left and $T_{k,j}$ is the transition time from $\boldsymbol{c}_j$ to $\boldsymbol{c}_k$. The motion between the centroids may be interpolated with splines for smoother trajectories. As CNM is purely data-driven, the model quality is directly related to that of the training data. More specifically, the sampling frequency and total time range must be selected such that all relevant dynamics are captured and are statistically converged.
Table 2: Lookup table (LUT) of the transition properties. The storage-intensive (L+1)-dimensional tensors $\boldsymbol{Q}$ and $\boldsymbol{T}$ with $K^{L+1}$ elements are replaced by a lookup table, where only the non-zero entries that correspond to actual transitions are retained. The storage requirements are orders of magnitude smaller than those of the full tensors.

l   | k   | j   | Q_{l,k,j} | T_{l,k,j}
3   | 2   | 1   | 0.5       | ...
5   | 2   | 1   | 0.5       | ...
... | ... | ... | ...       | ...
Appendix C Validation

This section presents two metrics used to evaluate the model performance, namely the autocorrelation function and the cluster probability distribution. We note the occasional need for additional or different metrics to validate models of certain systems, such as the probability distribution function for a system with rare events.
Autocorrelation function
The direct comparison of time series of complex systems is often pointless, as the trajectories may rapidly diverge even when the dynamical features are preserved. This is especially true for chaotic systems, where a slight change in initial conditions leads to completely different trajectories. Following Protas et al. [27], the cluster-based network model is validated based on the computed and predicted autocorrelation function of the state vector, defined as

$$R(\tau) = \frac{1}{T - \tau} \int_0^{T-\tau} (\boldsymbol{x}(t), \boldsymbol{x}(t+\tau))\, dt, \quad \tau \in [0, T], \quad (30)$$

where $(\cdot\,, \cdot)$ designates the inner product, defined as

$$(\boldsymbol{x}, \boldsymbol{y}) = \boldsymbol{x} \cdot \boldsymbol{y}^T. \quad (31)$$

This function avoids the problem of comparing two trajectories with finite dynamic prediction horizons due to phase mismatch. The autocorrelation function also yields the fluctuation energy at vanishing delay, $R(\tau = 0)$, and can be used to infer the spectral behavior.

In the case of the cluster-based Markov model (CMM) (cf. Appendix E), the time integration quickly leads to the average state and is not indicative of the range of possible initial conditions. Hence, K trajectories are considered, starting sequentially in each of the K clusters, such that $\boldsymbol{p}^k(t=0) = [\delta_{1k}, \ldots, \delta_{Kk}]^T$, where $\delta$ is the Kronecker delta. The autocorrelations are weighted with the cluster probabilities $p^\infty_k$:

$$\hat{R}(\tau) = \sum_{k=1}^{K} \frac{p^\infty_k}{T - \tau} \int_0^{T-\tau} \left( \hat{\boldsymbol{x}}^k(t), \hat{\boldsymbol{x}}^k(t+\tau) \right) dt, \quad \tau \in [0, T], \quad (32)$$

where $\hat{\boldsymbol{x}}^k$ is the CMM-modeled trajectory initialized in centroid k.
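A discrete counterpart of Eq. (30) for uniformly sampled multivariate time series can be written as follows. This is a minimal numpy sketch we provide for illustration, with `traj` as an (M, N) array of snapshots.

```python
import numpy as np

def autocorrelation(traj, max_lag):
    """Discrete version of Eq. (30): time-averaged inner products x(t)·x(t+tau).

    traj: (M, N) array of M uniformly sampled N-dimensional states.
    Returns R for lags 0, 1, ..., max_lag - 1 in units of the sampling step.
    """
    M = traj.shape[0]
    R = np.empty(max_lag)
    for lag in range(max_lag):
        # Average the inner product over the overlapping time range.
        R[lag] = np.mean(np.sum(traj[:M - lag] * traj[lag:], axis=1))
    return R

R_data = autocorrelation(snapshots, max_lag=1000)   # reference data
```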
Cluster probability distribution

The cluster probability distribution $\boldsymbol{p} = [p_1, \ldots, p_K]$ provides the probability of being in a specific cluster. It indicates whether the model trajectories populate the phase space similarly to the reference data. The cluster probability distribution of the reference data is computed as

$$p_k = \frac{n_k}{M}, \quad (33)$$

where $n_k$ is the number of snapshots affiliated with cluster $\mathcal{C}_k$ and M is the total number of snapshots. In CMM, the state is iteratively propagated with time steps $\Delta t$, so that $t_l = l\Delta t$. The asymptotic cluster probability distribution of CMM, $\boldsymbol{p}^\infty$, is determined for $l \to \infty$ by

$$\boldsymbol{p}^\infty = \lim_{l \to \infty} \boldsymbol{P}^l\, \boldsymbol{p}^0, \quad (34)$$

where $\boldsymbol{p}^0$ is the initial condition and $\boldsymbol{P}$ is the transition matrix (cf. Appendix E). For CNM, the asymptotic probability distribution $p^\infty_k$ is obtained for a long-enough time horizon T as the sum of the residence times $\tau_i$ in $\mathcal{C}_k$ divided by the total simulation time,

$$p^\infty_k = \frac{\sum_i \tau_i}{T}. \quad (35)$$

The cluster probability distributions for the data, CMM, and CNM with varying order are presented in Fig. 10 and discussed in Appendix E.
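For a CNM-generated trajectory, Eq. (35) reduces to accumulating residence times per cluster. The short sketch below implements this; it is our own minimal illustration with hypothetical inputs.

```python
import numpy as np

def cluster_probability_distribution(visited, residence, n_clusters):
    """Asymptotic CPD of a CNM trajectory, Eq. (35).

    visited: sequence of visited cluster indices; residence: residence time
    of each visit. The probabilities sum to one.
    """
    p = np.zeros(n_clusters)
    for k, tau in zip(visited, residence):
        p[k] += tau
    return p / residence.sum()

visited = np.array([0, 1, 2, 1, 0])              # placeholder visit sequence
residence = np.array([0.4, 0.3, 0.2, 0.6, 0.5])  # placeholder residence times
p_cnm = cluster_probability_distribution(visited, residence, n_clusters=3)
assert np.isclose(p_cnm.sum(), 1.0)
```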
Appendix D Parameter selection

CNM is parametrized by the number of clusters K and the model order L. Both parameters are problem-dependent and can be optimized to achieve the highest prediction accuracy. This section details the parameter selection process used here and provides guidelines and recommendations for other datasets.

The number of clusters K determines the resolution level. A small number of clusters mostly captures the dominant behavior at the macro level, which is suitable for simple dynamics such as a limit cycle. Few centroids also enable easier interpretation of the results and physical insight. A higher K allows accurately modeling more complex systems, possibly at the micro-scale level, and uncovering a broader range of dynamics, including transitional events and higher-frequency components. Excessively large K values can, however, be detrimental, as adjacent trajectories get clustered separately despite describing nearly the same motion. Furthermore, the trajectory of an over-clustered system might exhibit artificial low-amplitude, high-frequency noise resulting from sequential small jumps between misaligned centroids.

The model order L is synonymous with time-delay embedding in units of past centroids. Increasing L allows modeling more realistic trajectories, especially when centroids are located at the intersection of multiple trajectories, as illustrated in Fig. 8. In this example, two trajectories pass through a shared centroid. With a first-order model, the state at the shared centroid can transit to either trajectory, regardless of the previous path. Such a low-order model is thus associated with an increased risk of selecting the wrong trajectory. With a second-order model, however, the previous centroid is taken into account in the conditional probability, and the transition remains on the correct trajectory, ensuring a more accurate motion. Complex dynamics with multiple intersecting trajectories require a larger order L. For better interpretability, L can be converted into the approximate corresponding time delay $t_L$, defined as

$$t_L = L\,\overline{T}, \quad (36)$$

where $\overline{T}$ is the overall average transition time.
Fig. 8. Improved accuracy with higher model order L. In this example, trajectories 1 and 2 intersect at a shared centroid. With L = 1, the state can leave the trajectory it is following, e.g., trajectory 1, and wrongly transit to trajectory 2 after leaving the shared centroid. A second-order model ensures that the state remains on the correct trajectory.

The values of K and L are problem-dependent. In this work, K and L are selected from a parametric study that minimizes the root-mean-square error (RMSE) between the autocorrelation function $R(\tau)$ of the reference data and that of the model (see Appendix C for details about $R(\tau)$), defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N_R} \sum_{n=1}^{N_R} \left( R_n - \hat{R}_n \right)^2}, \quad (37)$$

where $\hat{R}$ is the modeled autocorrelation and $N_R$ is the maximum lag number. The RMSE distribution for the Lorenz system for varying K and L is presented in Fig. 9A. The results are shown for a range of K and normalized time delays $t_L/T$, where T is the fundamental period of the system. The black dots denote the computed configurations. The RMSE does not change linearly with the number of clusters and the model order; the error distribution exhibits regions of high and low error. A low-K, low-L configuration produces poor dynamics (Fig. 9B), which is expected for a complex system like the Lorenz attractor. As K and L increase, the error generally decreases, despite local maxima (Fig. 9C). The configuration indicated with the red dot (Fig. 9D) is selected for its high accuracy, which is comparable to that of other, more complex configurations (e.g., Fig. 9E). This winning configuration consists of K = 50 clusters and an order L = 22, which corresponds to $t_L/T = 1.7$.
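The selection of K and L can be automated as a simple grid search over the RMSE of Eq. (37). The sketch below outlines this, assuming the `autocorrelation` helper from the previous snippets and a hypothetical `cnm_predict` function that wraps clustering, LUT construction, and propagation.

```python
import numpy as np
from itertools import product

def select_parameters(snapshots, K_grid, L_grid, max_lag=1000):
    """Grid search for (K, L) minimizing the autocorrelation RMSE, Eq. (37)."""
    R_ref = autocorrelation(snapshots, max_lag)
    best, best_err = None, np.inf
    for K, L in product(K_grid, L_grid):
        traj = cnm_predict(snapshots, n_clusters=K, order=L)  # hypothetical model run
        R_mod = autocorrelation(traj, max_lag)
        err = np.sqrt(np.mean((R_ref - R_mod) ** 2))          # Eq. (37)
        if err < best_err:
            best, best_err = (K, L), err
    return best, best_err

(K_opt, L_opt), err = select_parameters(snapshots, K_grid=[10, 25, 50],
                                        L_grid=[1, 5, 10, 22])
```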
Appendix E Comparison between CNM and CMM

The CNM implementation presented in this study builds on two prior cluster-based modeling methods. The first, labelled cluster-based Markov model (CMM), propagates the state using a Markov chain with a constant time step [19]. The second is the initial CNM implementation, which introduced realistic transition times between clusters, yielding more accurate dynamics [45, 20]. The present work extends CNM to high-order chains for both the transition probability and the transition time. The resulting drastic improvements over the two preceding methods are demonstrated in this section using the Lorenz system as an example.

The starting point for CMM and both CNM variants is the clustering of the snapshots into $K$ clusters $\mathcal{C}_k$ (for details on the clustering algorithm, see Appendix B). Clustering reduces and coarse-grains the original, potentially high-dimensional data into a set of centroids $c_k$. Both methodologies are described in the main manuscript and are briefly summarized in the following. In CMM, the state variable is $p = [p_1, \ldots, p_K]^T$, where $p_k$ is the probability of being in cluster $\mathcal{C}_k$. The transition from centroid $c_j$ to $c_i$ is guided by the probability transition matrix $P = (P_{ij}) \in \mathbb{R}^{K \times K}$. The propagation of $p$ in time is performed iteratively in steps of $\Delta t$ and is given at time $t_l = l\,\Delta t$ by

$p^{l+1} = P\, p^l$.   (38)

The state $x$ at time $t_l$ is given by

$x(t_l) = \sum_{k=1}^{K} p_k(t_l)\, c_k$.   (39)

Fig. 9. Selection of the number of centroids $K$ and model order $L$. (A) Root-mean-square error distribution between the autocorrelation function $R(\tau)$ of the reference data and that of the model for the Lorenz system. The results are shown for a range of $K$ (from 10 upward) and of normalized time delays $t_L/T$, where $T$ is the fundamental period of the system. The winning configuration is indicated with a red dot. (B) to (E) Autocorrelation function of the data (black) and CNM (red) for varying $K$ and $L$. A small number of centroids, $K = 10$, produces poor dynamics (B). The distribution presents local minima and maxima (C). The agreement between the reference and modeled $R(\tau)$ increases for larger $K$ and $L$ values ((D) and (E)).

Despite its ability to provide insights into the guiding mechanisms of various physical systems [46, 47], CMM fails at modeling the dynamics. The state vector of cluster probabilities ultimately and unavoidably diffuses to a fixed point representing the post-transient attractor. The initial CNM version addresses this issue by introducing realistic transition times. At each iteration, the state shifts to a subsequent cluster in a data-inferred time. The direct transition probabilities $Q$ and transition times $T$ are $K \times K$ zero-diagonal matrices, which result from the state switching centroids at each iteration. The data-inferred transition times drastically increase accuracy compared to CMM, especially for periodic dynamics. This algorithm is, however, not well suited for complex phase-space trajectories and long time-horizon predictions. The present CNM variant extends the algorithm to high-order Markov models, where both past and current states are jointly considered to determine the next destination cluster and the corresponding transition time. The optimal order is problem-dependent and can be optimized, as detailed in Appendix D. With this latest upgrade, CNM is capable of modeling complex nonlinear dynamical systems.

The three methods are benchmarked on the Lorenz system, which is described in Appendix A.
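As a concrete illustration of Eqs. (38) and (39), a minimal Python sketch of the CMM propagation might read as follows; the column-stochastic convention for $P$ is an assumption of this sketch, not a prescription of the paper.

```python
import numpy as np

def propagate_cmm(P, p0, centroids, n_steps):
    """Iterate p^{l+1} = P p^l (Eq. 38) and reconstruct x(t_l) via Eq. (39).

    P         : (K, K) transition matrix, here assumed column-stochastic,
                i.e., P[i, j] = probability of moving from cluster j to i
    p0        : (K,) initial cluster-probability vector
    centroids : (K, d) array of centroids c_k
    """
    p, states = p0.copy(), []
    for _ in range(n_steps):
        p = P @ p                       # one Markov step of size dt
        states.append(centroids.T @ p)  # x(t_l) = sum_k p_k(t_l) c_k
    return np.array(states)
```

Iterating this map long enough drives $p$ to the stationary distribution of $P$, which is precisely the diffusion to a fixed point discussed above.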
Clustering is performed with $K = 50$ clusters and the model order is set to $L = 22$. The cluster probability distribution, which indicates whether the model trajectories populate the phase space similarly to the data, is depicted for the three methods in Fig. 10A. For clarity of presentation and interpretation, the results are shown for only 10 centroids instead of 50. The coarsening is performed by affiliating the CNM-generated snapshots with the closest of the 10 centroids. The converged probability distribution for CMM matches exactly that of the data, as already reported in the literature [19, 48], whereas both CNM variants yield a very good agreement.

The time series of the reference data and of the three models are shown in Figs. 10B to 10E. The reference time series (Fig. 10B) depicts the growing-amplitude oscillations in both ears of the attractor as well as the random ear switching. The CMM temporal evolution in Fig. 10C quickly asymptotes toward a fixed value after a few oscillations, demonstrating the model's inability to resolve any meaningful dynamics. Conversely, both CNM models properly reproduce the reference time series (Fig. 10D and Fig. 10E).

The benefits of the high-order CNM become apparent when comparing the autocorrelation functions in Fig. 11. As expected, the autocorrelation function from CMM rapidly drops to zero (Fig. 11A). The first-order CNM roughly reproduces the first few oscillations, but both amplitude and phase quickly diverge from those of the data (Fig. 11B). In contrast, the high-order CNM accurately reproduces the oscillation amplitude and phase over the entire range, evidencing the correct modeling of the dynamics (Fig. 11C).
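The coarse-grained probability comparison of Fig. 10A can be reproduced in a few lines. The following is a minimal sketch, assuming snapshots and centroids are stored row-wise as plain arrays; the variable names are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import cdist

def cluster_probabilities(snapshots, centroids):
    """Affiliate each snapshot with its closest centroid and return the
    relative population of every cluster (cf. Fig. 10A)."""
    labels = cdist(snapshots, centroids).argmin(axis=1)
    counts = np.bincount(labels, minlength=len(centroids))
    return counts / counts.sum()

# q_data = cluster_probabilities(data_snapshots, coarse_centroids)  # 10 coarse centroids
# q_cnm  = cluster_probabilities(cnm_snapshots,  coarse_centroids)
```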
Fig. 10. Comparison of the cluster probability distributions and time series. (A) Cluster probability distribution of the data, CMM, and the 1st- and 22nd-order CNM, respectively. The distributions are shown for 10 centroids for clarity. CMM reproduces exactly the probabilities of the reference data, whereas the CNM cluster probability distributions agree very well with the data. (B) to (E) Time series of the data, CMM, and the 1st- and 22nd-order CNM, respectively. (C) The CMM model fails at predicting any dynamics. (D) and (E) Both CNM variants capture the oscillations and ear switching of the reference data.

Appendix F Control-oriented cluster-based network modeling
To disambiguate the effect of internal dynamics from actuation or other external input, we generalize CNM to include a control input $b$. The resulting control-oriented CNM (CNMc) enables predictions beyond the training data for new control parameters $\hat{b}$. The main steps of CNMc are illustrated in Fig. 12 and the algorithm is detailed in Algorithm 1.

Before we detail the algorithm, we introduce the relevant parameters and definitions. Let $I$ be the number of test cases with $I$ different control terms, where the superscript $(i)$ designates the $i$-th test case, $i = 1, \ldots, I$.
Fig. 11. Comparison of the autocorrelation function.
Autocorrelation function of the data and of (A) CMM, (B) the first-order, and (C) the 22nd-order CNM. The superiority of the high-order CNM over the two other methods is clearly visible. (A) CMM yields a flat autocorrelation function, demonstrating that no dynamics are resolved. (B) After a few oscillations, the first-order CNM prediction rapidly deteriorates. (C) The agreement of the 22nd-order model with the data is excellent over the entire time range and confirms the model's long time-horizon prediction capabilities.

We denote the control term for the new, to-be-predicted test case as $\hat{b}$. The distance between two control terms $b_i$ and $b_j$ is defined by

$d(b_i, b_j) = \|b_i - b_j\|$,   (40)

where $\|\cdot\|$ designates the Euclidean norm. Like all machine-learning methods, CNMc is inherently limited in its ability to extrapolate beyond the data range. Specifically, the control parameter $\hat{b}$ must lie within the range of the available control space. Since the method relies on interpolation, two neighboring test cases must be identified. Using the Euclidean distance (40), the two operating conditions closest to $\hat{b}$ are determined, as sketched below. Henceforth, the superscript $(i)$ designates specifically these two neighboring operating conditions, such that $i = 1, 2$. The snapshots of these two operating conditions are separately grouped into $K$ clusters $\mathcal{C}_k^{(i)}$, $k = 1, \ldots, K$. Following CNM for the individual operating conditions (see Appendix B), the transition probabilities $Q^{(i)}$ and transition times $T^{(i)}$ of each operating condition are computed separately.
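The neighbor selection via Eq. (40) reduces to a sort over control-space distances. A minimal sketch, assuming the training control terms are stacked row-wise in an array `b_list`:

```python
import numpy as np

def two_nearest_conditions(b_hat, b_list):
    """Indices of the two training control terms closest to b_hat in the
    Euclidean metric of Eq. (40)."""
    d = np.linalg.norm(np.asarray(b_list) - np.asarray(b_hat), axis=1)
    i1, i2 = np.argsort(d)[:2]
    return int(i1), int(i2)
```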
Fig. 12. Main steps of CNMc. The centroids of two neighboring test cases are identified, and the state $\hat{x}$ is propagated following four steps. Step 1: the nearest centroids $c^{(i),n}_{n_k}$ to the current state $\hat{x}^n$ are determined (red outer circles). Step 2: the transition vector $v^{(i)}_{n_k}$ and transition time $T^{(i)}_{n_k}$ of each nearest centroid are identified. Step 3: the transition to the new position $\hat{x}^{n+1}$ is set by $\hat{v}$ and $\hat{T}$, the averages of the nearest-neighbor transition vectors and transition times, respectively. Step 4: for some systems, an additional position correction is needed to avoid a drift of the trajectory toward one of the two attractors. These steps are repeated until the total simulation time is reached.

For a realistic initialization of the new, to-be-predicted system, the initial state $\hat{x}^0$ is computed as the average of a centroid from test case ($i = 1$) and its nearest neighbor from test case ($i = 2$). Once the two neighboring test cases are identified, the state of the new test case $\hat{b}$ can be propagated. The motion propagation is performed iteratively, following the four steps illustrated in Fig. 12 and detailed in the following.

Step 1:
For each neighboring operating condition $(i)$, the $N_k$ centroids closest to the state $\hat{x}^n$ at iteration $n$ are identified, $c^{(i),n}_{n_k}$, $n_k = 1, \ldots, N_k$. The search is performed with a $k$-d tree, which organizes the data in a tree-like structure to rapidly find nearest neighbors [49]. For low-dimensional data, the $k$-d tree distance metric is typically the Euclidean distance. For the high-dimensional boundary layer data, where the flow is driven by large-scale actuation, an alternative norm that reduces the contribution of high-frequency, low-energy small scales is recommended. One possibility is an $L^p$ norm, $\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$, which for larger $p$ increasingly favors the mode coefficients with large magnitude.
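Step 1 maps directly onto SciPy's $k$-d tree. The sketch below assumes the centroids of one operating condition are stored row-wise; note that `cKDTree` supports only Minkowski $p$-norms, so a large finite `p` can stand in for a norm emphasizing large-magnitude coefficients.

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_centroids(centroids, x_hat, n_k, p=2.0):
    """Indices of the n_k centroids closest to the current state x_hat
    (Step 1); p is the Minkowski norm order used by the k-d tree query."""
    _, idx = cKDTree(centroids).query(x_hat, k=n_k, p=p)
    return np.atleast_1d(idx)
```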
Step 2: The transition properties of the $N_k$ nearest centroids are determined. The appropriate $L-1$ past centroids of each neighboring centroid $c^{(i),n}_{n_k}$ are identified by choosing, from all possible trajectories leading to $c^{(i),n}_{n_k}$, the one most similar to the CNMc trajectory, as illustrated in Fig. 13 for a third-order model and sketched in code below. In this example, the past centroids of $c^{(i),n}_{n_k}$ along trajectory 2 are selected, since trajectory 2 is most aligned with $\hat{x}$ up to this point. The next centroid $c^{(i),n+1}_{n_k}$ and the transition time are identified from the corresponding $Q$ and $T$. The motion direction is based on the transition vectors $v^{(i)}_{n_k}$ that span the transition from $c^{(i),n}_{n_k}$ to $c^{(i),n+1}_{n_k}$:

$v^{(i)}_{n_k} = c^{(i),n+1}_{n_k} - c^{(i),n}_{n_k}$.   (41)

For the first $L-1$ iterations, the CNMc trajectory is not yet long enough to identify the $L-1$ past centroids of the neighbors. In that case, the first $L-1$ transition properties are identified from a first-order CNM model.
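A sketch of the trajectory matching, assuming the translation of the CNMc trajectory has already been applied and each candidate past is given as an array of $L-1$ centroids ordered like the predicted history:

```python
import numpy as np

def best_past_trajectory(x_past, candidates):
    """Pick, among candidate past-centroid sequences ending at the same
    neighboring centroid, the one closest to the translated CNMc
    trajectory (Step 2, cf. Fig. 13).

    x_past     : (L-1, d) translated past predicted states, most recent first
    candidates : list of (L-1, d) arrays of past centroids, ordered alike
    """
    errors = [np.linalg.norm(x_past - c) for c in candidates]
    return int(np.argmin(errors))
```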
Fig. 13. Determination of the appropriate past trajectory.
Example of a neighboring centroid $c^{(i),n}_{n_k}$ to the predicted state $\hat{x}$ with two possible past trajectories. The goal is to determine which of trajectories 1 or 2 is the most similar to the CNMc trajectory. First, the CNMc trajectory is translated so that $\hat{x}^n$ coincides with $c^{(i),n}_{n_k}$. Then, the past predicted states $\hat{x}^{n-l}$ are sequentially compared to the previous centroids $c^{(i),n-l}_{n_k}$ of trajectories 1 and 2, $l = 1, \ldots, L-1$. The trajectory with the smallest difference is selected as the past of $c^{(i),n}_{n_k}$. In this example, trajectory 2 is selected, since it is most aligned with the CNMc trajectory.

Step 3: The transition from $\hat{x}^n$ to $\hat{x}^{n+1}$ is fully characterized by the transition vector $\hat{v}$ and the transition time $\hat{T}$, computed by averaging those of the $N_k$ nearest centroids $c^{(i),n}_{n_k}$. A weighted average can be employed to account for the distance from $\hat{b}$ to $b^{(1)}$ and $b^{(2)}$. In the case where $\hat{b}$ is equidistant from $b^{(1)}$ and $b^{(2)}$, $\hat{T}$ and $\hat{v}$ are computed as

$\hat{T} = \frac{1}{2N_k} \sum_{i=1}^{2} \sum_{n_k=1}^{N_k} T^{(i)}_{n_k}$,   (42)

$\hat{v} = \frac{1}{2N_k} \sum_{i=1}^{2} \sum_{n_k=1}^{N_k} v^{(i)}_{n_k}$.   (43)

The new position is $\hat{x}^{n+1} = \hat{x}^n + \hat{v}$ and is reached in the corresponding time $t^{n+1} = t^n + \hat{T}$.
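For the equidistant case of Eqs. (42) and (43), the averaging is a plain mean over both operating conditions and all nearest neighbors; a weighted variant would replace `mean` by `np.average` with control-space weights. A minimal sketch:

```python
import numpy as np

def average_transition(T_nk, v_nk):
    """Eqs. (42)-(43): T_nk has shape (2, N_k), v_nk shape (2, N_k, d)."""
    T_hat = T_nk.mean()                                   # 1/(2 N_k) double sum
    v_hat = v_nk.reshape(-1, v_nk.shape[-1]).mean(axis=0)
    return T_hat, v_hat

# x_next = x_n + v_hat
# t_next = t_n + T_hat
```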
Step 4: The optional final step is the position correction. Observations have shown that the predicted dynamics sometimes tend to slide toward one of the two neighboring operating conditions. To circumvent this issue, the position correction forces the state to keep a constant relative distance between the two test cases (1) and (2). The distance $d^{(i)}$ between the predicted state $\hat{x}$ and a neighboring test case $(i)$ is defined as the distance between $\hat{x}$ and the closest snapshot $x^{(i)}$ in this test case:

$d^{(i)} = d(\hat{x}, x^{(i)}) = \|\hat{x} - x^{(i)}\|$.   (44)

The correction is formulated to ensure that the ratio $r$ of the distances between the corrected position $\hat{x}^{n+1}_{\mathrm{corr}}$ and the two neighboring test cases equals that of the distances between $\hat{b}$ and the two neighboring control inputs:

$r = \frac{d(\hat{b}, b^{(1)})}{d(\hat{b}, b^{(2)})} = \frac{d(\hat{x}^{n+1}_{\mathrm{corr}}, x^{(1)})}{d(\hat{x}^{n+1}_{\mathrm{corr}}, x^{(2)})}$.   (45)

This is achieved by computing $\hat{x}^{n+1}_{\mathrm{corr}}$ as

$\hat{x}^{n+1}_{\mathrm{corr}} = \frac{d^{(2)}_b x^{(1)} + d^{(1)}_b x^{(2)}}{d^{(1)}_b + d^{(2)}_b}$,   (46)

where $d^{(i)}_b = d(\hat{b}, b^{(i)})$ is given by (40).
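Equation (46) is a convex combination of the two closest snapshots, with the weights swapped relative to the control-space distances so that the ratio of Eq. (45) holds. A minimal sketch:

```python
import numpy as np

def correct_position(x1, x2, db1, db2):
    """Eq. (46): x1, x2 are the snapshots of test cases (1) and (2) closest
    to the uncorrected prediction; db1, db2 are the control-space distances
    d(b_hat, b^(1)) and d(b_hat, b^(2)) from Eq. (40)."""
    return (db2 * np.asarray(x1) + db1 * np.asarray(x2)) / (db1 + db2)
```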
In addition to the two CNM parameters (the number of centroids $K$ and the model order $L$), CNMc requires three more settings: the number of closest centroids in each neighboring test case, the norm for the nearest-neighbor distance, and the optional position correction. These parameters can be optimized for each application separately; one possible approach is the hold-out method, where the RMSE is evaluated on the autocorrelation functions.

Table 3: CNMc settings for the Lorenz system and boundary layer applications.

Application    | Number of clusters K | Model order L | Number of neighboring centroids | Distance norm | Position correction
Lorenz system  | 250                  | 83            | 1                               | Euclidean     | No
Boundary layer | 120                  | 15            | 3                               | L^p           | Yes
Fig. 14. Time series modeling with CNMc.
Time series of the reference data (black) and the predicted cases (red) for (A), (B) the Lorenz system and (C), (D) the actuated turbulent boundary layer. The Lorenz attractor with $\rho = 28$ is interpolated from two test cases with $\rho = 26$ and $\rho = 30$; the boundary layer with actuation parameters $\lambda^+ = 1000$, $T^+ = 120$, and $A^+ = 30$ is interpolated from the cases with $\lambda^+ = 1000$, $T^+ = 120$, $A^+ = 20$ and $\lambda^+ = 1000$, $T^+ = 120$, $A^+ = 40$. For both applications, the main dynamical features are well reconstructed.

The CNMc parameters for the interpolated Lorenz system and for the boundary layer are summarized in Table 3. The corresponding time series for these two applications are presented in Fig. 14.

Algorithm 1: CNMc procedure
    Extract the centroids of the two closest test cases;
    Initialize the state $\hat{x}^{n=0} = \hat{x}^0$;
    Initialize the time $t^{n=0} = t^0 = 0$;
    while $t^n < T_{\max}$ do
        for $i \leftarrow 1, 2$ do
            Find the $N_k$ nearest centroids from test case $(i)$: $c^{(i),n}_{n_k}$, $n_k = 1, \ldots, N_k$;
            for $n_k \leftarrow 1$ to $N_k$ do
                Identify the past trajectory of $c^{(i),n}_{n_k}$;
                Get the transition time $T^{(i)}_{n_k}$;
                Get the next centroid $c^{(i),n+1}_{n_k}$ and the transition vector $v^{(i)}_{n_k} = c^{(i),n+1}_{n_k} - c^{(i),n}_{n_k}$;
            end
        end
        Average transition time $\hat{T} = \frac{1}{2N_k} \sum_{i=1}^{2} \sum_{n_k=1}^{N_k} T^{(i)}_{n_k}$;
        Average transition vector $\hat{v} = \frac{1}{2N_k} \sum_{i=1}^{2} \sum_{n_k=1}^{N_k} v^{(i)}_{n_k}$;
        Update the position $\hat{x}^{n+1} = \hat{x}^n + \hat{v}$;
        Correct the position $\hat{x}^{n+1}_{\mathrm{corr}}$ (optional);
        Update the time $t^{n+1} = t^n + \hat{T}$;
        Update the iteration number $n = n + 1$;
    end
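Putting the four steps together, Algorithm 1 can be rendered as the following Python sketch. The `models` pair is a hypothetical container holding, per neighboring test case, its centroids, the neighbor count `n_k`, and a `transition` lookup built from $Q^{(i)}$ and $T^{(i)}$; `nearest_centroids` and `correct_position` are the sketches given above, and `closest_snapshots` is a hypothetical helper returning the arguments of Eq. (46).

```python
import numpy as np

def propagate_cnmc(x0, models, T_max, use_correction=False):
    """Sketch of Algorithm 1 under the assumptions stated above."""
    x, t = np.asarray(x0, dtype=float), 0.0
    path = [x.copy()]
    while t < T_max:
        T_all, v_all = [], []
        for m in models:                               # the two neighboring cases
            for nk in nearest_centroids(m.centroids, x, m.n_k):
                c_next, T_nk = m.transition(nk, path)  # past trajectory, Q^(i), T^(i)
                v_all.append(c_next - m.centroids[nk])  # Eq. (41)
                T_all.append(T_nk)
        x = x + np.mean(v_all, axis=0)                 # Eqs. (42)-(43)
        t += float(np.mean(T_all))
        if use_correction:                             # Step 4 (optional)
            x = correct_position(*closest_snapshots(x, models))  # hypothetical helper
        path.append(x.copy())
    return np.array(path)
```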
References

[1] P. Holmes, J. L. Lumley, G. Berkooz, C. W. Rowley, Turbulence, Coherent Structures, Dynamical Systems and Symmetry (Cambridge University Press, 2012).
[2] P. Benner, S. Gugercin, K. Willcox, A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Review, 483–531 (2015).
[3] J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, J. N. Kutz, On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics, 391 (2014).
[4] H. Ye, R. J. Beamish, S. M. Glaser, S. C. Grant, C.-h. Hsieh, L. J. Richards, J. T. Schnute, G. Sugihara, Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling. Proceedings of the National Academy of Sciences, E1569–E1576 (2015).
[5] S. L. Brunton, B. R. Noack, P. Koumoutsakos, Machine learning for fluid mechanics. Annual Review of Fluid Mechanics, 477–508 (2020).
[6] J. Bongard, H. Lipson, Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 9943–9948 (2007).
[7] M. Schmidt, H. Lipson, Distilling free-form natural laws from experimental data. Science, 81–85 (2009).
[8] S. L. Brunton, J. L. Proctor, J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 3932–3937 (2016).
[9] F. C. Fu, J. B. Farison, On the Volterra series functional evaluation of the response of non-linear discrete-time systems. International Journal of Control, 553–558 (1973).
[10] C. Chatfield, Time-Series Forecasting (CRC Press, 2000).
[11] J.-N. Juang, Applied System Identification (Prentice-Hall, 1994).
[12] T. Wang, H. Gao, J. Qiu, A combined adaptive neural network and nonlinear model predictive control for multirate networked industrial process control. IEEE Transactions on Neural Networks and Learning Systems, 416–425 (2016).
[13] M. Newman, The physics of networks. Physics Today, 33–38 (2008).
[14] A.-L. Barabási, R. Albert, Emergence of scaling in random networks. Science, 509–512 (1999).
[15] A.-L. Barabási, E. Bonabeau, Scale-free networks. Scientific American, 60–69 (2003).
[16] J. R. Norris, Markov Chains, no. 2 (Cambridge University Press, 1998).
[17] N. Marwan, J. F. Donges, Y. Zou, R. V. Donner, J. Kurths, Complex network approach for recurrence analysis of time series. Physics Letters A, 4246–4254 (2009).
[18] K. Taira, A. G. Nair, S. L. Brunton, Network structure of two-dimensional decaying isotropic turbulence. Journal of Fluid Mechanics (2016).
[19] E. Kaiser, B. R. Noack, L. Cordier, A. Spohn, M. Segond, M. Abel, G. Daviller, J. Östh, S. Krajnović, R. K. Niven, Cluster-based reduced-order modelling of a mixing layer. Journal of Fluid Mechanics, 365–414 (2014).
[20] H. Li, D. Fernex, R. Semaan, J. Tan, M. Morzyński, B. R. Noack, Cluster-based network model. Journal of Fluid Mechanics (in print), see arXiv:2001.02911 (2020).
[21] W.-K. Ching, X. Huang, M. K. Ng, T.-K. Siu, Markov Chains: Models, Algorithms and Applications, International Series in Operations Research & Management Science (Springer, 2013), pp. 141–176.
[22] B. C. Daniels, I. Nemenman, Automated adaptive inference of phenomenological dynamical models. Nature Communications, 8133 (2015).
[23] D. Arthur, S. Vassilvitskii, k-means++: The advantages of careful seeding, Tech. rep., Stanford (2006).
[24] A. K. Jain, M. N. Murty, P. J. Flynn, Data clustering: a review. ACM Computing Surveys, 264–323 (1999).
[25] F. Takens, Dynamical Systems and Turbulence, Warwick 1980, Lecture Notes in Mathematics (Springer, University of Warwick, 1981), pp. 366–381.
[26] E. N. Lorenz, Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 130–141 (1963).
[27] B. Protas, B. R. Noack, J. Östh, Optimal nonlinear eddy viscosity in Galerkin models of turbulent flows. Journal of Fluid Mechanics, 337–367 (2015).
[28] O. E. Rössler, An equation for continuous chaos. Physics Letters A, 397–398 (1976).
[29] S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, J. N. Kutz, Chaos as an intermittently forced linear system. Nature Communications, 19 (2017).
[30] M. Farazmand, T. P. Sapsis, A variational approach to probing extreme events in turbulent dynamical systems. Science Advances, e1701533 (2017).
[31] M. Albers, P. S. Meysonnat, D. Fernex, R. Semaan, B. R. Noack, W. Schröder, Drag reduction and energy saving by spanwise traveling transversal surface waves for flat plate flow. Flow, Turbulence and Combustion, 125–157 (2020).
[32] Z. Y. Wan, P. Vlachas, P. Koumoutsakos, T. Sapsis, Data-assisted reduced-order modeling of extreme events in complex dynamical systems. PLOS ONE, e0197704 (2018).
[33] G. Van Rossum, F. L. Drake, The Python Language Reference Manual (Network Theory Ltd., 2011).
[34] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, I. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 261–272 (2020).
[35] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, H. E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 215–220 (2000).
[36] E. D. Fylladitakis, Kolmogorov flow: Seven decades of history. Journal of Applied Mathematics and Physics, 2227–2263 (2018).
[37] M. Farazmand, T. P. Sapsis, Dynamical indicators for the prediction of bursting phenomena in high-dimensional systems. Physical Review E, 032212 (2016).
[38] D. Bechert, W. Reif, (American Institute of Aeronautics and Astronautics, 1985), p. 546.
[39] M. Luhar, A. S. Sharma, B. J. McKeon, On the design of optimal compliant walls for turbulence control. Journal of Turbulence, 787–806 (2016).
[40] Y. Du, G. E. Karniadakis, Suppressing wall turbulence by means of a transverse traveling wave. Science, 1230–1234 (2000).
[41] M. Quadrio, Drag reduction in turbulent boundary layers by in-plane wall motion. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 1428–1442 (2011).
[42] D. Fernex, R. Semaan, M. Albers, P. S. Meysonnat, W. Schröder, B. R. Noack, Actuation response model from sparse data for wall turbulence drag reduction. Physical Review Fluids, 073901 (2020).
[43] J. MacQueen, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (University of California, 1967), vol. 1.
[44] S. Lloyd, Least squares quantization in PCM. IEEE Transactions on Information Theory, 129–137 (1982).
[45] D. Fernex, R. Semaan, M. Albers, P. S. Meysonnat, W. Schröder, R. Ishar, E. Kaiser, B. R. Noack, Cluster-based network model for drag reduction mechanisms of an actuated turbulent boundary layer. Proceedings in Applied Mathematics and Mechanics (2019).
[46] Y. Cao, E. Kaiser, J. Borée, B. R. Noack, L. Thomas, S. Guilain, Cluster-based analysis of cycle-to-cycle variations: application to internal combustion engines. Experiments in Fluids, 1837 (2014).
[47] R. Ishar, E. Kaiser, M. Morzyński, D. Fernex, R. Semaan, M. Albers, P. S. Meysonnat, W. Schröder, B. R. Noack, Metric for attractor overlap. Journal of Fluid Mechanics, 720–755 (2019).
[48] J. Östh, E. Kaiser, S. Krajnović, B. R. Noack, Cluster-based reduced-order modelling of the flow in the wake of a high speed train. Journal of Wind Engineering and Industrial Aerodynamics, 327–338 (2015).
[49] J. L. Bentley, Multidimensional binary search trees used for associative searching. Communications of the ACM 18 (1975).