On Reduced Input-Output Dynamic Mode Decomposition
Peter Benner∗  Christian Himpe†  Tim Mitchell‡

Abstract
The identification of reduced-order models from high-dimensional data is a challenging task, and even more so if the identified system should not only be suitable for a certain data set, but generally approximate the input-output behavior of the data source. In this work, we consider the input-output dynamic mode decomposition method for system identification. We compare excitation approaches for the data-driven identification process and describe an optimization-based stabilization strategy for the identified systems.
Keywords:
Dynamic Mode Decomposition, Model Reduction, System Identification, Cross Gramian, Optimization
MSC:

∗ Computational Methods in Systems and Control Theory, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg, Germany. ORCID: 0000-0003-3362-4103, [email protected]
† Computational Methods in Systems and Control Theory, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg, Germany. ORCID: 0000-0003-2194-6754, [email protected]
‡ Computational Methods in Systems and Control Theory, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg, Germany. ORCID: 0000-0002-8426-0242, [email protected]

In various applications, it can be difficult (sometimes even impossible) to derive models from first principles. However, the input-output response of a system and data of the inner state may still be available; we refer to such a setup as a graybox. For example, a natural process whose underlying action is not well understood can be considered a graybox, since we may only be able to measure its behavior. In applications such as manufacturing and design, it may be necessary to model a third-party provided subcomponent whose exact or full specifications may not be obtainable because it contains proprietary information. To gain insight into such natural or technical processes, and to derive models that simulate and/or predict their behavior, it is often beneficial and perhaps necessary to create models using measured or generated data. The discipline of system identification investigates methods for the task of obtaining such models from data. One class of methods for data-driven identification is dynamic mode decomposition (DMD), which also provides a modal analysis of the resulting systems.
In this work, we investigate variants of DMD for the class of input-output systems and compare data sampling strategies.

DMD has its roots in the modal decomposition of the Koopman operator [15, 26], which has recently been rediscovered for the spectral analysis of fluid dynamics [19]. The basic DMD method was introduced in [27], and various extensions have been added, such as Exact DMD [28] or Extended DMD [9]. Furthermore, DMD can also be used as a tool for model order reduction: [25] proposed using DMD for flow analysis and control, while DMD has also been combined with Galerkin-projection techniques to model nonlinear systems [1]. For a comprehensive survey of DMD and its variants, see [16].

Our work here builds upon two specific variants of DMD. The first is Dynamic Mode Decomposition with Control (DMDc) [23], and thus by relation, also Koopman with Inputs and Control (KIC) [24]. The second is Input-Output Dynamic Mode Decomposition (ioDMD) [3, 4], which itself is closely related to the Direct Numerical Algorithm for Sub-Space State System Identification (N4SID) [31]. DMDc extends DMD to scenarios where the input of the discrete system approximation is given by a functional, while ioDMD additionally handles the case when the system's output is specified and also a functional.

To generically identify a system without prior knowledge of the relevant input functions, techniques such as persistent excitation [6] have been well known for some time now. We propose an extension to the ioDMD method of [3] with a new excitation variant related to the cross Gramian matrix [11]. Additionally, since DMD-based identification does not guarantee that its resulting models are stable, we propose a post-processing procedure to stabilize ioDMD-derived models, by employing the nonsmooth constrained optimization method of [10] to solve a corresponding stabilization problem.

This document is structured as follows.
In Section 2, an overview of the prerequisite dynamic mode decomposition method and its relevant variants is given. Section 3 describes the new excitation strategy, while our optimization-based stabilization procedure is discussed in Section 4. Finally, two numerical experiments are conducted in Section 5.

Consider the time-invariant ordinary differential equation (ODE):

ẋ(t) = f(x(t)),  (1)

with state x(t) ∈ R^N and vector field f : R^N → R^N. Furthermore, consider for now that (1) is sampled at uniform intervals for times t_0, …, t_K. The most basic version of DMD [27, 16] aims to approximate (1) by constructing a discrete-time linear system

x_{k+1} = A x_k,  (2)

with a linear operator A ∈ R^{N×N}, such that if x_k = x(t_k), then x_{k+1} ≈ x(t_{k+1}) should also hold, for all k = 0, …, K − 1. Starting from an initial state x_0, the sequence defined by (2) corresponds to a trajectory of the state vectors x_k. The corresponding data matrix X ∈ R^{N×(K+1)} is simply the in-order concatenation of these state vectors, that is,

X = [x_0  x_1  …  x_K].

By forming the following two partial trajectories:

X_0 = [x_0  x_1  …  x_{K−1}]  and  X_1 = [x_1  x_2  …  x_K],

where X_0 ∈ R^{N×K} is just the data matrix X with its last data point removed, while X_1 ∈ R^{N×K} is just X with its first data point removed, the idea behind (plain) DMD [27] is to then solve the linear system of equations:

X_1 = A X_0,

which is in fact just (2), where x_{k+1} and x_k have been replaced by X_1 and X_0, respectively. A best-fit solution to this problem is given by the pseudoinverse:

A ≈ X_1 X_0^+,

which is also the solution to minimizing the least-squares error in the Frobenius norm (‖·‖_F):

A = argmin_{Ã ∈ R^{N×N}} ‖X_1 − Ã X_0‖_F.

The DMD modes of (1) are given by the eigenvectors of the matrix A:

A V = V Λ.
Beyond just using a single uniformly-sampled time series, DMD has also been generalized to a method called Exact DMD [28], which can additionally support the concatenation of multiple different time series and/or non-uniform time steppings. This generalization of DMD is made possible by reducing the requirements on X_0 and X_1 to the lesser condition that their columns need only be composed in pairs of data such that X_1(:, k) = f(X_0(:, k)) is fulfilled.

We now give a high-level algorithmic description of DMD identification. The pseudoinverse of the data matrix can be computed using the (rank-revealing) singular value decomposition (SVD), X_0 = U Σ V*, as follows:

X_0^+ = V Σ^{−1} U*.

However, inverting tiny but nonzero singular values in the computed diagonal matrix Σ poses a numerical problem, as these small singular values may be inaccurate. Applying Σ^{−1} could overamplify components of X_0, in particular, the less important ones. To counteract this effect, computing the pseudoinverse via the SVD is done in practice by truncating any singular values that are smaller than some fixed ε ∈ R_+, which is equivalent to adding a regularization term to the least-squares solution for A:

A = argmin_{Ã ∈ R^{N×N}} ‖X_1 − Ã X_0‖_F² + β ‖Ã‖_F²,

for some value of the regularization parameter β ∈ R_+. Following [28], the algorithm for the DMD-based computation of the system matrix A, given a state-space trajectory X and a lower bound ε > 0, is:

1. Assemble the data matrix X = [x_0 … x_K].
2. Assemble the shifted partitions X_0 and X_1.
3. Compute the SVD X_0 = U Σ V*.
4. Truncate Σ: Σ̃_ii = Σ_ii if Σ_ii ≥ ε; 0 otherwise.
5. Identify the approximate discrete system matrix: A := U* X_1 V Σ̃^{−1}.

In this DMD variant, the order (dimension) of the computed matrix A is equal to the number of retained (nontruncated) singular values, but this truncation is done solely for numerical accuracy; the intention is to keep as many components of the dynamics as possible.
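As a concrete illustration, steps 1–5 can be sketched in a few lines of NumPy; the function name, the truncation tolerance, and the toy linear system below are our own illustrative choices, not from the paper:

```python
import numpy as np

def dmd(X, eps=1e-10):
    """Basic DMD of a snapshot matrix X = [x_0 ... x_K] (steps 1-5 above)."""
    X0, X1 = X[:, :-1], X[:, 1:]                  # shifted partitions
    U, s, Vh = np.linalg.svd(X0, full_matrices=False)
    r = int(np.sum(s >= eps))                     # drop singular values < eps
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    A_r = U.conj().T @ X1 @ Vh.conj().T @ np.diag(1.0 / s)
    return A_r, np.linalg.eigvals(A_r)            # system matrix, DMD eigenvalues

# sanity check on a trajectory of a known linear map x_{k+1} = A x_k
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
X = np.empty((2, 11))
X[:, 0] = [1.0, 1.0]
for k in range(10):
    X[:, k + 1] = A_true @ X[:, k]
A_r, lam = dmd(X)                                 # eigenvalues of A_true recovered
```

Since the data here come from an exactly linear map, the identified eigenvalues coincide with those of A_true up to rounding.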
In contrast, model order reduction typically aims to significantly reduce the system down to just a small set of essential dynamics, and accomplishing this goal will be the focus of Section 2.4.

Considering systems whose vector field f depends not just on the state but also on an input function u : R → R^M:

ẋ(t) = f(x(t), u(t)),  (3)

has led to a DMD variant called Dynamic Mode Decomposition with Control (DMDc) [23]. We focus on a specific DMDc variant [23, Sec. 3.3] that aims to approximate (3) by a linear discrete-time control system:

x_{k+1} = A x_k + B u_k,  (4)

where B ∈ R^{N×M} (called the input operator) must also be identified in addition to A, and input u_k = u(t_k) is a discretely-sampled version of the continuous input function u(t) for some sampling times given by t_k. In addition to forming X_0 and X_1 as in plain DMD (Section 2), an analogous construction of matrices for input data is also done. An in-order concatenation of the series of input data vectors u_k is done to obtain the matrix U ∈ R^{M×(K+1)}, while the partial data matrix U_0 ∈ R^{M×K} is simply U without its last column (the last input sample):

U = [u_0  u_1  …  u_K]  and  U_0 = [u_0  u_1  …  u_{K−1}].

Then, A and B can be obtained by computing the approximate solutions to the linear system of equations given by:

X_1 = A X_0 + B U_0 = [A  B] [X_0; U_0],

which is (4) with u_k replaced by U_0, x_k by X_0, and x_{k+1} by X_1, and has the solution:

[A  B] = X_1 [X_0; U_0]^+.

As mentioned in Section 1, DMDc is actually a special case of the KIC method [24]. For KIC, the state of the system is also augmented with the discretized input u_k, which leads the resulting augmented system to have additional operators:

[x_{k+1}; u_{k+1}] = [A  B; B_u  A_u] [x_k; u_k],

where B_u ∈ R^{M×N} and A_u ∈ R^{M×M}.
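A minimal sketch of the DMDc identification above, assuming exact data from a known toy controlled system (the names and system are illustrative):

```python
import numpy as np

def dmdc(X, U):
    """DMDc: recover [A B] from X_1 = [A B] [X_0; U_0] via the pseudoinverse."""
    X0, X1, U0 = X[:, :-1], X[:, 1:], U[:, :-1]
    AB = X1 @ np.linalg.pinv(np.vstack([X0, U0]))
    N = X.shape[0]
    return AB[:, :N], AB[:, N:]                   # A, B

# toy controlled system x_{k+1} = A x_k + B u_k, driven by random inputs
A_true = np.array([[0.5, 0.0],
                   [0.2, 0.7]])
B_true = np.array([[1.0],
                   [0.5]])
rng = np.random.default_rng(0)
U = rng.standard_normal((1, 21))
X = np.zeros((2, 21))
for k in range(20):
    X[:, k + 1] = A_true @ X[:, k] + B_true @ U[:, k]
A, B = dmdc(X, U)                                 # recovers A_true and B_true
```

The random input here plays the role of a persistently exciting signal: it makes the stacked data matrix [X_0; U_0] full rank, so the pseudoinverse recovers the true operators exactly.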
Of course, these two additional operators must now also be identified along with the matrices A and B. However, if one assumes that the input is known and no identification of the associated operators is required, then B_u and A_u are just zero matrices, and it is thus clear that KIC is a generalization of DMDc.

Extending the underlying system once more to now also include an output function y : R → R^Q, with an associated output functional g : R^N × R^M → R^Q that also depends on the state x and the input u, yields the following system:

ẋ(t) = f(x(t), u(t)),
y(t) = g(x(t), u(t)).  (5)

Data-based modeling of systems of the form given by (5) has given rise to a class of DMD methods called Input-Output Dynamic Mode Decomposition (ioDMD) [3]. Similar to previously discussed DMD variants, the ioDMD method also approximates the original system by a discrete-time linear system, but now with input and output:

x_{k+1} = A x_k + B u_k,
y_k = C x_k + D u_k,  (6)

where C ∈ R^{Q×N} and D ∈ R^{Q×M} are the output and feed-through operators, respectively. Since this approximation includes output data, the discrete output instances y_k = y(t_k) ∈ R^Q are also correspondingly arranged into a matrix Y ∈ R^{Q×(K+1)} by in-order concatenation, while Y_0 ∈ R^{Q×K} simply omits the last column (output sample) of Y:

Y = [y_0  y_1  …  y_K]  and  Y_0 = [y_0  y_1  …  y_{K−1}].

The matrices A, B, C, and D are then approximated by solving:

X_1 = A X_0 + B U_0,
Y_0 = C X_0 + D U_0,
or equivalently  [X_1; Y_0] = [A  B; C  D] [X_0; U_0],  (7)

which is (6) but where u_k, x_k, x_{k+1}, and y_k have been replaced by U_0, X_0, X_1, and Y_0, respectively, and has the solution:

[A  B; C  D] = [X_1; Y_0] [X_0; U_0]^+.  (8)

This procedure is equivalent to the Direct N4SID algorithm [30, 14, Ch. 6.6] mentioned in Section 1.

Note that all the DMD variants discussed so far identify the original continuous-time systems by linear discrete-time models. However, one can create a continuous-time model given by {Â, B̂, Ĉ, D̂} that is a first-order approximation to the discrete-time model obtained by ioDMD, using, for example, the standard first-order Euler method with time step h:

x_{k+1} = x_k + h(Â x_k + B̂ u_k)
⇒ A x_k + B u_k = x_k + h Â x_k + h B̂ u_k
⇒ [Â  B̂] = [h^{−1}(A − I)  h^{−1}B].

The output and feed-through operators for the continuous-time model remain the same as in the discrete-time model produced by ioDMD, that is, Ĉ = C, D̂ = D. Finally, it is important to note that (io)DMD-derived models are generally only valid for the time horizon over which the data has been gathered.

To accelerate the computation of ioDMD-derived models, we follow [7, 4] and reduce the order of the possibly high-dimensional state trajectories using a projection-based approach. The data matrices X_0 and X_1 are compressed using a truncated (Galerkin) projection Q ∈ R^{N×n}, n ≪ N, Q*Q = I:

[Q* X_1; Y_0] = [A_r  B_r; C_r  D_r] [Q* X_0; U_0]
⇒ [A_r  B_r; C_r  D_r] = [Q* X_1; Y_0] [Q* X_0; U_0]^+.

The order of the identified system is thus determined by the rank of the projection.

A popular method to compute such a truncating projection Q is proper orthogonal decomposition (POD) [12], which is practically obtained from the left singular vectors (POD modes) of the data matrix X:

X = U D V* (SVD)  →  X_r = U_n^⊤ X,  U_n U_n^⊤ X ≈ X,

where U_n comprises the first n left singular vectors, and one sets Q := U_n.
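Combining (7)–(8) with the POD projection gives a compact identification routine. The sketch below is our own illustrative code (with the reduced order n chosen equal to the full order of a toy system, so the fit is exact and the reduced model reproduces the training outputs):

```python
import numpy as np

def iodmd_pod(X, U, Y, n):
    """Sketch of POD-compressed ioDMD: project states onto n POD modes,
    then solve the projected version of (7) via the pseudoinverse (8)."""
    X0, X1, U0, Y0 = X[:, :-1], X[:, 1:], U[:, :-1], Y[:, :-1]
    Q = np.linalg.svd(X, full_matrices=False)[0][:, :n]   # POD modes of X
    lhs = np.vstack([Q.T @ X1, Y0])
    rhs = np.vstack([Q.T @ X0, U0])
    M = lhs @ np.linalg.pinv(rhs)
    return M[:n, :n], M[:n, n:], M[n:, :n], M[n:, n:], Q  # A_r, B_r, C_r, D_r, Q

# toy discrete-time data from a known 4th-order system (illustrative)
rng = np.random.default_rng(1)
A0 = rng.standard_normal((4, 4))
A0 *= 0.9 / np.max(np.abs(np.linalg.eigvals(A0)))         # scale to be stable
B0 = rng.standard_normal((4, 1))
C0 = rng.standard_normal((1, 4))
D0 = np.zeros((1, 1))
K = 40
U = rng.standard_normal((1, K + 1))
X = np.zeros((4, K + 1)); Y = np.zeros((1, K + 1))
for k in range(K):
    Y[:, k] = C0 @ X[:, k] + D0 @ U[:, k]
    X[:, k + 1] = A0 @ X[:, k] + B0 @ U[:, k]
Y[:, K] = C0 @ X[:, K] + D0 @ U[:, K]
A_r, B_r, C_r, D_r, Q = iodmd_pod(X, U, Y, n=4)

# resimulate the identified reduced model from the projected initial state
xr = Q.T @ X[:, [0]]
Yr = np.zeros_like(Y)
for k in range(K + 1):
    Yr[:, k] = (C_r @ xr + D_r @ U[:, [k]]).ravel()
    xr = A_r @ xr + B_r @ U[:, [k]]
```

With n smaller than the true order, the resimulated output Yr would only approximate Y, with an error governed by the discarded POD modes.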
The number n of retained POD modes depends on the desired projection error, via ‖X − U_n U_n^⊤ X‖_F ≤ (∑_{i=n+1}^{N} D_ii²)^{1/2}, which is a consequence of the Schmidt-Eckart-Young-Mirsky lemma (see for example [5]).

Note again that this data compression scheme has a very different motivation compared to that of the aforementioned dimensionality reduction done when computing the pseudoinverse via the truncated SVD (discussed in Section 2.1). The latter truncation, based on the magnitude of the singular values, is done for reasons of numerical accuracy when computing the pseudoinverse (and applying it in subsequent computations). The projection-based truncation discussed here, using the sum of the squared singular values, allows for the possibility of a much smaller computational burden, as the state space of the models can often be greatly reduced by discarding all but a handful of essential modes while still obtaining a desired approximation error. Other projection-based data-driven model reduction techniques are similarly applicable for the compression of the state trajectory, for example empirical balanced truncation [17].

DMD is a data-driven method; hence, the source of the data used for the system identification needs to be considered. Usually it is possible to identify an input-output system (for provided discrete input, state, and output functions) so that the identified system produces similar outputs given the input used for the identification. To identify a model from data, however, the associated system needs to produce outputs approximating the data source not only for specific data sets, but for a whole class of relevant input functions, and perhaps even arbitrary admissible ones. For such generic linear system identification, the matrices A, B, C, D have to be computed in a manner that (a) does not require a specific data set and (b) allows behaviors of the system being modeled to be predicted for a given initial condition and input function. This can be accomplished, for example, by persistent excitation (PE) [6, 23], which utilizes signals such as step functions or Gaussian noise as training input data. Here, we propose a related approach that also exploits random (Gaussian) sampling, yet is based on the cross operator.

The cross operator [13] W_X : R^N → R^N is a tool for balancing-type model reduction and encodes the input-output coherence of an associated system, which for linear time-invariant systems is the so-called cross Gramian matrix [11]. This operator is defined as the composition of the controllability operator C : L_2 → R^N with the observability operator O : R^N → L_2:

W_X = C ∘ O.

Thus, for a square system with the same input and output space dimension, W_X maps a given initial state x_0 via the observability operator to an output function, which is in turn passed to the controllability operator as an input function, resulting in a final state:

x_0 →(O) y →(C) x(T).

(This composition is also illustrated in [13, Fig. 1].) To generate trajectory data, we modify the cross operator by replacing the controllability operator with the input-to-state map ξ : L_2 → L_2 [5, Ch. 4.3]. This yields an operator W̃_X : R^N → L_2:

x_0 →(O) y →(ξ) x,

which maps, as before, an initial state to an output function, but then maps this output (as an input) function to a state trajectory (instead of to a final state). Compared to PE, the cross(-operator-based) excitation (CE) is a two-stage procedure using perturbations of the initial state to generate the excitation, as opposed to perturbing the input directly.

The cross excitation is related to indirect identification of closed-loop systems [29, 22], which is also a two-stage process. First, an intermediary (open-loop) system is identified, which is then used in the second step to generate a signal that acts as an excitation for the identification of the actual closed-loop system.
A distinct difference of CE compared to the indirect closed-loop identification approach is that the latter exclusively uses input-output data, while the former also uses state trajectory data in addition to the input-output data.

As DMD and its variants are time-domain methods (ioDMD included), it is typically desired to preserve stability in the (reduced) identified discrete-time systems. However, models derived by ioDMD are not guaranteed to be stable. To enforce stability, an additional post-processing step is required. For example, [2] proposed stabilizing models derived using Petrov-Galerkin projections by solving a sequence of semidefinite programs. In this paper, we take a much more direct approach.

A square matrix A is discrete-time stable if its spectral radius is less than one, that is, ρ(A) < 1, where

ρ(A) := max{ |λ| : λ ∈ Λ(A) }.

Although the spectral radius is nonconvex, it is a continuous function with respect to the matrix A, and furthermore, it is continuously differentiable almost everywhere (in the mathematical sense). In other words, the set of points where the spectral radius is nonsmooth has measure 0, and so points chosen randomly will, with probability 1, be outside of this set. Hence, despite the nonsmoothness of the spectral radius, it is still possible to obtain a wealth of information from its gradient, since it is defined on all but a subset of measure 0 of the full space. Thus, if the matrix A from the ioDMD-derived model, that is, from (8), is not stable, we can consider employing a gradient-based optimization method to stabilize it, while hopefully ensuring that the resulting stabilized version of the ioDMD solution still models the original large-scale system. In order to meet these two objectives, we consider solving the following constrained optimization problem:

[A_s  B_s; C_s  D_s] = argmin_{Ã, B̃, C̃, D̃} ‖ [X_1; Y_0] − [Ã  B̃; C̃  D̃] [X_0; U_0] ‖_F
s.t. ρ(Ã) < 1 − τ,  (9)

where Ã ∈ R^{r×r}, B̃ ∈ R^{r×m}, C̃ ∈ R^{p×r}, D̃ ∈ R^{p×m}, and τ ≥ 0 is a stability margin. Note that we do not necessarily need to solve (9) exactly; a few iterations may be all that is needed to accomplish this task.

As an alternative, one could consider solving

min_{Ã, B̃, C̃, D̃} ‖ [A_DMD  B_DMD; C_DMD  D_DMD] − [Ã  B̃; C̃  D̃] ‖_F
s.t. ρ(Ã) < 1 − τ

in lieu of (9), where A_DMD, B_DMD, C_DMD, and D_DMD are the matrices produced by ioDMD.
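For concreteness, the objective and constraint of (9) can be evaluated as follows. This is only a sketch of the two functions a solver such as the BFGS-SQP method of [10] would receive; the function name and the toy unstable model are our own illustrative choices:

```python
import numpy as np

def stabilization_objective(AB_CD, X0, X1, U0, Y0, r, tau=0.0):
    """Evaluate the objective and constraint of problem (9): Frobenius-norm
    data misfit, subject to the spectral radius of the A-block being below
    1 - tau (constraint value < 0 means feasible)."""
    lhs = np.vstack([X1, Y0])
    rhs = np.vstack([X0, U0])
    obj = np.linalg.norm(lhs - AB_CD @ rhs, "fro")           # objective of (9)
    rho = np.max(np.abs(np.linalg.eigvals(AB_CD[:r, :r])))   # spectral radius
    return obj, rho - (1.0 - tau)

# toy unstable ioDMD-style model (rho = 1.2) with exactly consistent data
A_u = np.array([[1.2, 0.0], [0.0, 0.5]])
B_u = np.array([[1.0], [1.0]])
C_u = np.array([[1.0, 0.0]])
D_u = np.zeros((1, 1))
rng = np.random.default_rng(2)
U0 = rng.standard_normal((1, 10))
Xfull = np.zeros((2, 11)); Xfull[:, 0] = [1.0, 1.0]
for k in range(10):
    Xfull[:, k + 1] = A_u @ Xfull[:, k] + B_u @ U0[:, k]
X0, X1 = Xfull[:, :-1], Xfull[:, 1:]
Y0 = C_u @ X0 + D_u @ U0
AB_CD = np.block([[A_u, B_u], [C_u, D_u]])
obj, con = stabilization_objective(AB_CD, X0, X1, U0, Y0, r=2)
# obj is (numerically) zero since the data are consistent; con = 0.2 > 0
# flags the model as infeasible, i.e. unstable, for tau = 0
```

A solver would then decrease `con` below zero while keeping `obj` small, which is exactly the tradeoff described above.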
On the upside, this helps to avoid the problem of underdeterminedness arising in (9), and encourages that a stable solution close to the original ioDMD-derived system is found. However, this modified objective no longer takes any observed data into account. We did evaluate this alternate optimization problem in our experiments, but the performance of the models it produced was sometimes worse. As such, we will only report results for our experiments done using (9) in Section 5.

The second issue when trying to solve (9) is that the spectral radius can be a rather difficult function to optimize, relatively speaking. First, despite being continuously differentiable almost everywhere, the spectral radius is still a nonsmooth function, specifically at matrices which have multiple eigenvalues that attain the maximum modulus, that is, the value of the spectral radius. Typically, minimizers of the spectral radius will be at such matrices (for example, see the plots of spectral configurations post optimization in [10, Section 4.1 and Appendix 9.1]), so optimizing the spectral radius often means that an optimization method must try to converge to a nonsmooth minimizer, a difficult prospect. Worse still is that the spectral radius is also non-locally-Lipschitz at matrices where these multiple eigenvalues attaining the value of the spectral radius coincide (see [8]). Many of the available continuous optimization methods are designed under the assumption that the functions they optimize are smooth, or if not, at least locally Lipschitz. If the functions being optimized do not meet these criteria, these methods typically break down.
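The loss of local Lipschitz continuity is already visible for a 2×2 Jordan block: perturbing the (2,1) entry of [[0, 1], [0, 0]] by t makes the eigenvalues ±√t, so the spectral radius behaves like √|t| near t = 0, which grows faster than any Lipschitz bound. A toy numerical illustration (not from the paper):

```python
import numpy as np

def rho(A):
    """Spectral radius rho(A) = max |lambda| over the eigenvalues of A."""
    return np.max(np.abs(np.linalg.eigvals(A)))

# rho([[0, 1], [t, 0]]) = sqrt(t): a square-root kink at the Jordan block t = 0,
# where the two eigenvalues attaining the spectral radius coincide
for t in [1e-2, 1e-4, 1e-6]:
    A = np.array([[0.0, 1.0],
                  [t,   0.0]])
    assert abs(rho(A) - np.sqrt(t)) < 1e-10
```

For t = 1e-4 the spectral radius is 1e-2, i.e. a perturbation of size t changes the function by √t, so no finite Lipschitz constant works near t = 0.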
Furthermore, the nonconvexity of the spectral radius means that optimization codes may get stuck in local minima that may or may not provide sufficiently acceptable solutions.

Although the inclusion of the spectral radius constraint makes (9) a nonsmooth, nonconvex optimization problem with a non-locally-Lipschitz constraint function, again we do not necessarily need to solve it exactly (though this will remain to be seen in our experiments). Furthermore, while much of the literature on nonsmooth optimization has historically focused on unconstrained problems, there has also been recent interest in addressing problems with nonsmooth constraints. For example, a new algorithm combining quasi-Newton BFGS (Broyden-Fletcher-Goldfarb-Shanno) updating and SQP (sequential quadratic programming) was recently proposed for general nonsmooth, nonconvex, constrained optimization problems [10], where no special knowledge or structure is assumed about the objective or constraint functions. Particularly relevant to our approach here, this BFGS-SQP method was evaluated on 100 different spectral-radius-constrained optimization problems, with promising results relative to other solvers [10, Section 6]. This indicates that it may also be a good solver for our nonsmooth constrained optimization problem, and thus we propose using it to solve (9). Specifically, we use GRANSO: GRadient-based Algorithm for Non-Smooth Optimization [20], an open-source MATLAB code implementing the aforementioned BFGS-SQP method of [10].

We implemented our new ioDMD variant using both the PE and CE sampling strategies (presented in Section 3.1) to collect observations of the original system's behaviors. Furthermore, our software can also optionally stabilize the resulting ioDMD-derived models by using GRANSO [10] on our associated nonsmooth constrained optimization problem.

As discussed in Section 4, (9) is an underdetermined optimization problem in DMD settings.
Since such problems are in a sense quite flat, the norm of the gradient merely being small can be a poor measure of when to terminate; correspondingly, we tightened GRANSO's default termination tolerance to 10⁻⁸ (i.e. opts.opt_tol = 1e-8). Relatedly, convergence can also be slow, so the choice of a starting point can be critical. As our goal, specified by (9), is to stabilize a model while minimizing the tradeoff in any increased approximation error (that may occur due to stabilization), we simply used the unstable ioDMD-derived models as starting points for GRANSO, and used GRANSO's custom termination feature to halt optimization once a model had been found that was both stable (for (9) we used τ := 0) and had an objective value that was less than 1000 times the objective function evaluated at the original ioDMD-derived unstable model. We found that this loose limit on how much the objective function was allowed to increase was more than adequate to retain good output errors. We ran GRANSO using its full-memory BFGS mode (its default behavior and the recommended choice for nonsmooth problems) and kept all other GRANSO options at their default values as well. (For a good introductory reference on many of the optimization techniques referred to in this paper, see [21].)

The numerical experiments were implemented in the MATLAB language and were run under MATLAB R2017a on a workstation computer with an Intel Core i7-6700 (4 cores @ 3.4 GHz) and 8 GB memory.

We demonstrate the effects of different types of excitation used for the ioDMD-based system identification by a numerical example. Specifically, for a given target data set, we identify a discrete-time linear system first using the target data itself, second by persistent excitation (PE), and third by utilizing the herein proposed cross excitation (CE) from Section 3.1.

The considered input-output system is based on the transport equation, with the left boundary of the domain selected as input and the right boundary as output:

∂z(x, t)/∂t = a ∂z(x, t)/∂x,  x ∈ [0, 1],
z(0, t) = u(t),
z(x, 0) = 0,
y(t) = z(1, t).

The partial differential equation is discretized in space using the first-order finite-difference upwind scheme, with a fixed spatial resolution ∆x and transport velocity a = 1.3. The resulting ODE input-output system is then given by:

ẋ(t) = A x(t) + B u(t),
y(t) = C x(t).  (10)

The target data is given by discrete input, state, and output functions from a simulation, which is performed by a first-order implicit Runge-Kutta method. We used a fixed time-step width ∆t and a time horizon of T = 1, with a Gaussian-bell-type input function û(t) = e^{−(t−c)²} for a fixed center c.

For the comparison, a discrete-time linear system was first identified from the 1000 snapshots generated by a simulation of (10) excited by the input û, to obtain a baseline for the modeling performance of ioDMD. For the PE variant, zero-mean, unit-covariance Gaussian noise and a unit step function input were used for the ioDMD-based identification. Finally, for our CE variant, an initial state sampled from a unit Gaussian distribution and a (component-wise) shifted initial state x_{0,i} = 1 were tested. All methods were tested with ioDMD only and then also with stabilization.

Fig. 1 depicts the relative output error for simulations of systems identified using data associated to the target input û, PE, and CE for increasingly accurate state-space data compression (that is, for increasingly smaller amounts of compression). The state-space data dimensionality was reduced using the POD method with a range of prescribed projection errors. In this set of experiments, we did not use stabilization.

[Figure 1, panels: (a) non-regularized identification by noise input or randomly sampled initial state; (b) non-regularized identification by step input or shifted initial state; (c) regularized identification by noise input or randomly sampled initial state; (d) regularized identification by step input or shifted initial state.]
Figure 1: First numerical experiment (Section 5.1.1): Non-stabilized ioDMD-based system identification. The plots show the identified (reduced-order) system's output error compared to the original system's output for varying accuracies of the POD projection-based model reduction.

In Figs. 1a and 1b, the ioDMD procedure was not regularized by truncating singular values, while in Figs. 1c and 1d, singular values below a fixed threshold were truncated as regularization. In Figs. 1a and 1c, system identification was performed using zero-mean Gaussian noise for PE and an initial state sampled from a zero-mean Gaussian distribution for CE, respectively. In Figs. 1b and 1d, respectively, the identification was driven by a step input for PE and a shifted initial state x_{0,i} = 1 for CE.

For this set of experiments, using the target data only produced stable systems for large values of acceptable projection error, while the PE-derived models were always unstable (and thus had very poor performance regardless of the projection error). In contrast, the CE method produced stable systems for all levels of projection error tested. Performance-wise, when comparing to the few target-data-derived models that also happened to be stable, the CE-derived models had errors that were less than one order of magnitude higher (see Figs. 1a and 1c). On the other hand, when using the step input or shifted initial state, both PE and CE produced models with increasing accuracy as the level of acceptable projection error of the data was decreased, as seen in Figs. 1b and 1d. In Fig. 1d, we see that the regularization limited the attainable accuracy for both PE and CE. The target-data-derived system had a constant error independent of the projection error of the data.

In this second set of experiments, Fig. 2 still shows the relative output error for simulations of systems identified using the target data, PE, and CE for increasingly accurate state-space data compression, but now the systems have been post-processed using our optimization-based approach to enforce stability, as necessary. The state-space data dimensionality was again reduced using the POD method, with the same range of prescribed projection errors. The subfigures are arranged as they were in Fig. 1.

The step-function PE and shifted-initial-state CE results are unaffected by our stabilization post-processing phase, as these systems were already stable; thus their plots are the same in Figs. 2b and 2d as they were in Figs. 1b and 1d. In the case of using Gaussian noise or a randomly sampled initial state (Figs. 2a and 2c), which had not yielded stable systems for the target data or PE (either with or without regularization), our optimization-based post-processing procedure now enforced stability.

For the ioDMD-derived models that were unstable, GRANSO was able to stabilize all 24 of them. The average number of iterations needed to find the first stabilized version was 13.5, while the maximum was just 61. Furthermore, for 12 of the problems, our full termination criteria were met in less than 20 iterations, while the average and maximum iteration counts over all 24 problems were respectively 84.2 and 329. This demonstrates that our optimization-based approach is indeed able to stabilize such models reliably and efficiently. Solving (9) via GRANSO also met our secondary goal, that stabilization is achieved without deviating too significantly from the original unstable models.
The largest observed relative change between an initial unstable model and its corresponding GRANSO-derived stabilized version was just 1.44%, while the average observed relative change was merely 0.231%; the relative differences were calculated by comparing vec[A_s, B_s; C_s, D_s], where the matrices are GRANSO's computed stabilized solution to (9), to a similar vec of the original (unstable) ioDMD-derived model.

We now compare the order of the identified systems. The order of the identified system is determined by the projection error selected for the state-space compression in the POD. For each data set (target, Gaussian-noise PE, Gaussian-sample CE, step-input PE, shifted-initial-state CE), the resulting reduced order is plotted for the different prescribed projection errors in Fig. 3a. As we can see, the step-input PE and shifted-initial-state CE methods behave similarly in terms of system dimension, while for the Gaussian-noise PE and Gaussian-sampled initial-state CE, the CE variant resulted in smaller system dimensions.

[Figure 2, panels: (a) non-regularized identification by noise input or randomly sampled initial state; (b) non-regularized identification by step input or shifted initial state; (c) regularized identification by noise input or randomly sampled initial state; (d) regularized identification by step input or shifted initial state.]
Figure 2: Second numerical experiment (Section 5.1.2): Stabilized ioDMD-basedsystem identification. The plots show the identified (reduced order) system’soutput error compared to the original system’s output for varying accuracies ofthe POD projection-based model reduction.In terms of computational cost, only the Target and Gaussian-noise PEvariants required stabilization and as such, it is only for these that we seeincreased runtimes, as shown by the red and green plots (respectively) in Fig. 3b.Otherwise, the runtimes were mostly identical for the other variants in thecomparison.
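The relative-change metric reported earlier in this section can be sketched as follows. The matrices below are illustrative placeholders, and the exact vectorization of [A, B; C, D] (here, row-major flattening of the stacked block matrix) is an assumption about the stacking used.

```python
import numpy as np

def vec_sys(A, B, C, D):
    """Vectorize the stacked system matrix [A, B; C, D]."""
    return np.block([[A, B], [C, D]]).ravel()

def relative_change(sys0, sys1):
    """Relative difference between two (A, B, C, D) realizations,
    measured in the Euclidean norm of the vectorized stack."""
    v0, v1 = vec_sys(*sys0), vec_sys(*sys1)
    return np.linalg.norm(v1 - v0) / np.linalg.norm(v0)

# Illustrative: a uniform 1% perturbation of a small random system
# yields a relative change of exactly 1%.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); B = rng.standard_normal((3, 1))
C = rng.standard_normal((1, 3)); D = rng.standard_normal((1, 1))
perturbed = tuple(1.01 * M for M in (A, B, C, D))
print(f"{100 * relative_change((A, B, C, D), perturbed):.2f}%")  # prints "1.00%"
```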
One potential downside to our optimization-based stabilization procedure is that full-memory BFGS inverse Hessian updating (GRANSO's recommended setting for nonsmooth problems) requires per-iteration work and storage that is quadratic in the number of optimization variables. As the number of optimization variables is (r + m) × (r + p), the running time required to solve (9) could become unacceptably long as r, the reduced order of the model, is increased. Thus, we now also consider whether limited-memory BFGS updating can be effective for solving (9).

[Figure 3: Comparison of reduced identified system orders and stabilization runtimes. (a) The identified systems' reduced orders for the different pre-selected mean L_2 projection errors used in both experiments (see Section 5.1.1). (b) Runtimes in seconds for the stabilization procedure in the second experiment (Section 5.1.2).]

Using limited-memory BFGS has the benefit that the per-iteration work and storage are reduced to a linear amount (again, in the number of optimization variables). However, one of the tradeoffs is that convergence can be quite slow in practice. For smooth problems, full-memory BFGS converges superlinearly, while limited-memory BFGS only does so linearly; on nonsmooth problems, linear convergence is the best one can typically expect. Another potential issue is that while there has been much evidence supporting that full-memory BFGS is very reliable for nonsmooth optimization, the case for using limited-memory BFGS on nonsmooth problems is much less clear; for a good overview of the literature on this topic, see [18, Section 1].

To investigate this question, we reran the experiments from Section 5.1 a second time, but with GRANSO's limited-memory BFGS mode enabled. Specifically, we configured GRANSO to approximate the inverse Hessian at each iteration using only the 10 most recently computed gradients, accomplished by setting opts.limited_mem_size = 10. All other parameters of our experimental setup were kept as described earlier.

In this limited-memory configuration, GRANSO often required significantly more iterations, as one might expect. The average and maximum number of iterations to find the first stable version of a model were respectively 73.6 and 474, about an order of magnitude more iterations than incurred when using full-memory BFGS. On the other hand, for 19 of the 24 problems, stable models were encountered within the first 20 iterations.
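The storage tradeoff discussed above can be made concrete with a small count. The dimensions below are illustrative, and the quadratic-versus-linear figures reflect textbook full-memory BFGS and limited-memory BFGS with k stored update pairs, not GRANSO's exact internal bookkeeping.

```python
def num_variables(r, m, p):
    """Number of optimization variables for a reduced order-r model
    with m inputs and p outputs: the stack [A, B; C, D] has
    (r + m) * (r + p) entries."""
    return (r + m) * (r + p)

def full_bfgs_storage(n):
    """Dense n-by-n inverse Hessian approximation: O(n^2) entries."""
    return n * n

def limited_bfgs_storage(n, k=10):
    """k stored (s, y) update pairs of length n: O(k * n) entries."""
    return 2 * k * n

# Quadratic vs. linear growth for a SISO model (m = p = 1):
for r in (10, 50, 200):
    n = num_variables(r, m=1, p=1)
    print(r, n, full_bfgs_storage(n), limited_bfgs_storage(n))
```

Already at r = 200 (so n = 201 × 201 = 40401 variables), the dense inverse Hessian approximation needs over 1.6 billion entries, while the limited-memory variant needs under a million, which is the motivation for the experiment that follows.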
To meet our full termination criteria, the average and maximum number of iterations were respectively 222.0 and 811, roughly two and a half times more than incurred when using full-memory BFGS. Nevertheless, 12 of the 24 problems were still satisfactorily solved in less than 20 iterations, matching the earlier result when using full-memory BFGS. Despite the large increases in iteration counts, GRANSO's overall runtime was on average 3.83 times faster when limited-memory BFGS was enabled.

With respect to output error in this limited-memory evaluation, the resulting stabilized models still essentially matched the earlier results using full-memory BFGS. There was one notable exception, for the target data using the smallest prescribed projection error, where GRANSO remarkably found a better-performing model when using limited-memory BFGS. However, we did observe that the quality of the stabilized models appeared to be much more sensitive to changes in GRANSO's parameters than when using full-memory BFGS. As a consequence, we still advocate that solving (9) with GRANSO is generally best done using its default full-memory BFGS updating. Nonetheless, if this is simply not feasible computationally, one may still be able to obtain good results using limited-memory BFGS, though perhaps not as reliably or consistently.

As a final clarifying remark on this topic, we note that one cannot necessarily expect good performance on nonsmooth problems when using just any BFGS-based optimization code; generally, it is critical that the chosen software be specifically designed for nonsmooth optimization. Indeed, this is highlighted in the evaluation done in [10, Section 6], where off-the-shelf quasi-Newton-based codes built for smooth optimization perform much worse on a test set of nonsmooth optimization problems than the quasi-Newton-based codes specifically built with nonsmooth optimization in mind.
Conclusion

In this work, we evaluated the approximation quality of ioDMD system identification using a novel excitation scheme and a new optimization-based post-processing procedure to ensure stability of the identified systems. Our new cross excitation strategy, particularly when used with random sampling, often produces better results than persistent excitation, and our experiments indicate that both excitation schemes are useful for efficiently obtaining good models for approximating the target data. Furthermore, we showed that directly solving a nonsmooth constrained optimization problem can indeed be a viable approach for stabilizing ioDMD-derived systems while retaining the salient properties for approximating the output response.
Code Availability

The source code of the presented numerical examples can be obtained from http://runmycode.org/companion/view/2902 and is authored by Christian Himpe and Tim Mitchell.

References

[1] A. Alla and J. N. Kutz. Nonlinear model order reduction via dynamic mode decomposition. SIAM J. Sci. Comput., 39(5):B778–B796, 2017. doi:10.1137/16M1059308.

[2] D. Amsallem and C. Farhat. Stabilization of projection-based reduced-order models. Int. J. Numer. Methods Eng., 91(4):358–377, 2012. doi:10.1002/nme.4274.

[3] J. Annoni, P. Gebraad, and P. Seiler. Wind farm flow modeling using input-output dynamic mode decomposition. In American Control Conference (ACC), pages 506–512, 2016. doi:10.1109/ACC.2016.7524964.

[4] J. Annoni and P. Seiler. A method to construct reduced-order parameter-varying models. Int. J. Robust Nonlinear Control, 27(4):582–597, 2017. doi:10.1002/rnc.3586.

[5] A. C. Antoulas. Approximation of Large-Scale Dynamical Systems, volume 6 of Adv. Des. Control. SIAM Publications, Philadelphia, PA, 2005. doi:10.1137/1.9780898718713.

[6] K. J. Åström and P. Eykhoff. System identification – a survey. Automatica, 7(2):123–162, 1971. doi:10.1016/0005-1098(71)90059-8.

[7] B. W. Brunton, L. A. Johnson, J. G. Ojemann, and J. N. Kutz. Extracting spatial-temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition. J. Neurosci. Methods, 258:1–15, 2016. doi:10.1016/j.jneumeth.2015.10.010.

[8] J. V. Burke and M. L. Overton. Variational analysis of non-Lipschitz spectral functions. Math. Program., 90(2, Ser. A):317–352, 2001. doi:10.1007/s102080010008.

[9] K. K. Chen, J. H. Tu, and C. W. Rowley. Variants of dynamic mode decomposition: Boundary condition, Koopman, and Fourier analyses. J. Nonlinear Sci., 22(6):887–915, 2012. doi:10.1007/s00332-012-9130-9.

[10] F. E. Curtis, T. Mitchell, and M. L. Overton. A BFGS-SQP method for nonsmooth, nonconvex, constrained optimization and its evaluation using relative minimization profiles. Optim. Methods Softw., 32(1):148–181, 2017. doi:10.1080/10556788.2016.1208749.

[11] K. V. Fernando and H. Nicholson. On the structure of balanced and other principal representations of SISO systems. IEEE Trans. Autom. Control, 28(2):228–231, 1983. doi:10.1109/TAC.1983.1103195.

[12] P. Holmes, J. L. Lumley, G. Berkooz, and C. W. Rowley. Turbulence, Coherent Structures, Dynamical Systems and Symmetry. Cambridge Monographs on Mechanics. Cambridge University Press, Cambridge, 2012. doi:10.1017/CBO9780511919701.

[13] T. C. Ionescu, K. Fujimoto, and J. M. A. Scherpen. The cross operator and the singular value analysis for nonlinear symmetric systems. In Proc. Eur. Control Conf., pages 1565–1570, 2009. URL: http://hdl.handle.net/11370/1bcfd3be-c006-49e0-bbbe-5c09ca055d87.

[14] T. Katayama. Subspace Methods for System Identification. Communications and Control Engineering. Springer London, 2005. doi:10.1007/1-84628-158-X.

[15] B. O. Koopman. Hamiltonian systems and transformation in Hilbert space. Proc. Natl. Acad. Sci. USA, 17(5):315–318, 1931.

[16] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2016. doi:10.1137/1.9781611974508.

[17] S. Lall, J. E. Marsden, and S. Glavaški. Empirical model reduction of controlled nonlinear systems. In Proc. of the IFAC World Congress, volume F, pages 473–478, 1999. URL: resolver.caltech.edu/CaltechAUTHORS:20101007-154754737.

[18] A. S. Lewis and M. L. Overton. Nonsmooth optimization via quasi-Newton methods. Math. Program., 141(1–2, Ser. A):135–163, 2013. doi:10.1007/s10107-012-0514-2.

[19] I. Mezic. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics, 41(1):309–325, 2005. doi:10.1007/s11071-005-2824-x.

[20] T. Mitchell. GRANSO: GRadient-based Algorithm for Non-Smooth Optimization. http://timmitchell.com/software/GRANSO. See also [10].

[21] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 1999. doi:10.1007/b98874.

[22] H. Oku and T. Fujii. Direct subspace model identification of LTI systems operating in closed-loop. In Proc. IEEE Conf. Decision and Control (CDC), pages 2219–2224, 2004. doi:10.1109/CDC.2004.143378.

[23] J. L. Proctor, S. L. Brunton, and J. N. Kutz. Dynamic mode decomposition with control. SIAM J. Appl. Dyn. Syst., 15(1):142–161, 2016. doi:10.1137/15M1013857.

[24] J. L. Proctor, S. L. Brunton, and J. N. Kutz. Generalizing Koopman theory to allow for inputs and control. arXiv e-prints 1602.07647, Cornell University, 2016. math.OC. URL: https://arxiv.org/abs/1602.07647.

[25] C. W. Rowley and S. T. M. Dawson. Model reduction for flow analysis and control. Annu. Rev. Fluid Mech., 49:387–417, 2017. doi:10.1146/annurev-fluid-010816-060042.

[26] C. W. Rowley, I. Mezic, S. Bagheri, P. Schlatter, and D. S. Henningson. Spectral analysis of nonlinear flows. J. Fluid Mech., 641:115–127, 2009. doi:10.1017/S0022112009992059.

[27] P. J. Schmid. Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech., 656:5–28, 2010. doi:10.1017/S0022112010001217.

[28] J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz. On dynamic mode decomposition: Theory and applications. J. Comput. Dyn., 1(2):391–421, 2014. doi:10.3934/jcd.2014.1.391.

[29] P. M. J. Van Den Hof and R. J. P. Schrama. An indirect method for transfer function estimation from closed loop data. Automatica, 29(6):1523–1527, 1993. doi:10.1016/0005-1098(93)90015-L.

[30] P. Van Overschee and B. de Moor. N4SID: numerical algorithms for state space subspace system identification. In IFAC Proceedings Volumes, volume 26, pages 55–58, 1993. doi:10.1016/S1474-6670(17)48221-8.

[31] M. Viberg. Subspace-based methods for the identification of linear time-invariant systems. Automatica, 31(12):1835–1851, 1995. doi:10.1016/0005-1098(95)00107-5.