CKNet: A Convolutional Neural Network Based on Koopman Operator for Modeling Latent Dynamics from Pixels
Yongqian Xiao, Xin Xu, and Lilin Qian
Abstract—For systems whose only available observations are pixels, it is difficult to identify the underlying dynamics, especially with a linear operator. In this work, we present a convolutional neural network based on the Koopman operator (CKNet) to identify latent dynamics from raw pixels. CKNet learns an encoder and a decoder that play the roles of the Koopman eigenfunctions and modes, respectively, and the Koopman eigenvalues can be approximated by the eigenvalues of the learned system matrix. We present deterministic and variational approaches to realize the encoder. Because CKNet is trained under the constraints of Koopman theory, the identified dynamics is linear, controllable, and physically interpretable. Besides, the system matrix and control matrix are trained as trainable tensors. To improve performance, we propose an auxiliary weight term for the multi-step linearity and prediction losses. Experiments on two classic forced dynamical systems with continuous action spaces show that the identified 32-dimensional dynamics can predict validly for 120 steps and generate clear images.
Index Terms—Koopman operator, latent dynamics, raw pixels, deep learning.
I. INTRODUCTION

As the identified model is described linearly, model identification with the Koopman operator has attracted considerable attention and achieved great success in recent years. With an identified model, control performance can be improved by predicting future states and evaluating the corresponding losses. Apart from deep learning-based methods, two main Koopman modeling approaches have been proposed in recent years. One is dynamic mode decomposition (DMD) [1], which applies singular value decomposition (SVD) to extract intrinsic features and approximates the Koopman eigenvalues, eigenfunctions, and modes with the eigenvalues and their corresponding right and left eigenvectors. The other is extended dynamic mode decomposition (EDMD) [2] and its kernel variant KDMD [3], which transform modeling into a supervised learning problem solved with least-squares methods. Koopman operator-based approaches have been applied to approximate system dynamics in many fields, such as fluid dynamics [4], power systems [5], [6], molecular conformation analysis [7], and robotic systems [8].

To overcome the limitation that DMD and EDMD are only applicable to unforced systems, extensions for forced systems were designed by treating the state and control input as an augmented matrix [9], [10]. In this way, nonlinear forced systems can be described linearly, so linear control theory can be applied naturally after planning, or directly for systems with explicit reference states [11], [12].

Theoretically, the Koopman operator can accurately describe nonlinear systems globally in an infinite-dimensional invariant subspace [13], [14]. In practice, however, an infinite-dimensional operator cannot be realized. We usually construct a linear operator in a higher-dimensional space, created by lifting the original state space with basis functions, to approximate the Koopman operator. Basis functions have a decisive effect on modeling performance. They can be constructed with kernel functions, e.g., radial basis functions (RBFs), but designing basis functions demands strong experience and lacks theoretical guidance. Besides, when the state dimension and dataset scale are extraordinarily large, it is intractable to load all data into memory and execute the SVD or pseudo-inverse operations, and designing proper basis functions becomes even more complex; DMD, KDMD, EDMD, and kernel EDMD become infeasible.

Deep learning has natural advantages for complex function fitting. For the basis-function design problem, we can train a neural network to take the place of the basis functions instead of manually trying different kernel functions with different hyper-parameters. From this perspective, notable research combining deep learning and the Koopman operator has fueled applications in many fields, such as fluid dynamics [15], power grids [16], vehicle dynamics [17], molecular kinetics [18], atomic-scale dynamics [19], highway traffic dynamics [20], and chaotic systems [21].

Yongqian Xiao and Xin Xu are with the College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China. E-mail: [email protected].
These works usually adopt an auto-encoder (AE) framework in which the encoder approximates the Koopman eigenfunctions. Some works calculate the system and control matrices from the sequence of latent states output by the encoder [22], [23], while others treat the system and control matrices as trainable weights [17], [24].

Previous research mainly focuses on low-dimensional systems. However, in many circumstances, reasons such as the expensive cost of high-precision sensors prevent us from acquiring real-time value-wise intrinsic states, whereas high-dimensional pixel-wise observations can be captured with low-cost cameras. Further, pixel-wise observations, such as images or lidar point clouds, usually include much invalid information and noise, which makes it intractable to control a system from raw images alone. To deal with such situations, learning-based algorithms are usually applied, such as learning-based nonlinear MPC (LB-NMPC) [25] and the family of deep reinforcement learning (DRL) algorithms [26], [27]. This end-to-end style leads to poor efficiency; therefore, encoding-based approaches were proposed that learn a neural network model or construct an encoder to improve learning efficiency, such as MuZero [28], CURL [29], and PlaNet [30]. CURL only extracts features as the state for RL algorithms, without predictive ability, while MuZero and PlaNet learn an end-to-end neural network model that takes the extracted feature vector as the latent state. Unlike these methods, after features are extracted with the encoder, we regard the feature vector as the state of a latent system and adopt EDMD theory to approximate the Koopman operator of this latent system, resulting in interpretable linear dynamics instead of a nonlinear end-to-end neural network model.
In this way, we can predict linearly with the identified dynamics at a small computational cost.

Currently, few works focus on approximating the Koopman operator of dynamical systems that take raw pixels as the state. A DMD-based deep learning framework was constructed for background/foreground extraction and video classification [31]. However, it focused on unforced systems and adopted a hierarchical manner that first trains an AE and then performs DMD. DeepKoCo [32] is a similar work in which a deterministic encoder outputs the Koopman eigenfunctions directly; its system matrix consists of Jordan blocks, and its control matrix is subject to the prediction and reconstruction of the next observation. In this work, we adopt EDMD theory with both deterministic and variational approaches: the encoder outputs basis functions that are linearly correlated with the Koopman eigenfunctions, and the system and control matrices are treated as trainable tensors. Namely, after training we obtain fixed system and control matrices, on which we can perform controllability analysis of the identified latent dynamics.

The main contributions of this work are three-fold:

1) A convolutional neural network based on the Koopman operator is proposed and realized with deterministic and variational approaches for modeling latent dynamics from raw pixels.

2) An auxiliary weight term is proposed for the multi-step linearity and prediction losses to improve prediction performance. Comparison experiments study the influence of different auxiliary weights on different losses.

3) The deterministic and variational approaches are applied to identify two classic physical systems with continuous action spaces. The results show that the proposed method is valid for identifying latent dynamics from raw pixels and that the identified dynamics are controllable.

II. DESIGN OF CKNET

In this section, we detail how to design CKNet to approximate the Koopman operator of discrete-time unforced and forced dynamical systems that take pixel-wise matrices as states. We also give the method for sampling basis functions when the variational encoder is adopted.
A. CKNet for Unforced Systems
Consider an unforced discrete-time system

x_{k+1} = f(x_k)    (1)

where x ∈ R^{c×h×w} ∈ M denotes the state of the dynamical system f in the original high-dimensional space M; it consists of c images of height h and width w. We utilize a convolutional neural network (CNN) to approximate the Koopman eigenfunctions. The Koopman operator K is defined by

(K ϕ)(x_k) = ϕ(f(x_k))    (2)

where ϕ ∈ H are the Koopman eigenfunctions and H is usually an infinite-dimensional space. In this manner, the unforced system f can be described as linear dynamics that evolves linearly in H:

x_{k+p} = Σ_{i=1}^{∞} ζ_i (K^p ϕ_i)(x_k) = Σ_{i=1}^{∞} ζ_i μ_i^p ϕ_i(x_k)    (3)

where K^p denotes p applications of the Koopman operator, μ_i is the i-th Koopman eigenvalue corresponding to the i-th Koopman eigenfunction ϕ_i, and ζ_i is the i-th Koopman mode, which remaps states back to M.

Although the Koopman operator acts in an infinite-dimensional space, it attracts attention because of its linearity. DMD-type methods approximate the Koopman operator by approximating the Koopman eigenvalues, eigenfunctions, and modes. In this work, unlike DMD-type methods, we propose a deep learning framework based on EDMD to approximate the Koopman operator; because of space limitations, readers wanting more details on EDMD may refer to [2], [11]. As shown in Fig. 1, CKNet expands a low-dimensional subspace V via the encoder φ, which extracts intrinsic dynamical features as basis functions to play the role of the Koopman eigenfunctions. Meanwhile, a nonlinear CNN decoder is designed to play the role of the linear Koopman modes, transforming latent states from V back to pixels in the original space. Therefore, the unforced system f can be approximated via CKNet:

φ(x_{k+p}) ≐ φ_K(x_{k+p}) = K^p φ(x_k) = A^p φ(x_k),   x̂_k = φ̃(φ(x_k))    (4)

where K is an approximation of the Koopman operator in V, represented by the square matrix A; φ(x) ∈ R^L and φ_K(x) ∈ R^L denote the latent states acquired via the encoder and the K operator, respectively; and φ̃(φ(x)) ∈ R^{c′×h×w} denotes the output of the decoder, where c′ is a hyper-parameter equal to c or 1. After training, the Koopman eigenvalues are approximated by the eigenvalues of A. Note that the basis functions are linearly correlated with the Koopman eigenfunctions; that is, ϕ_i = a_i^T φ(x), where a_i is the right eigenvector corresponding to the i-th eigenvalue.

With the deterministic approach, the encoder outputs the basis functions φ(x) directly. With the variational approach, the basis functions are sampled from a learned Gaussian distribution. To enable back-propagation, the reparameterization trick is applied to sample the basis functions during training:

φ(x) = μ_φ(x) + exp(ln σ_φ(x)) ⊙ ξ    (5)

where μ_φ and σ_φ are the mean and standard deviation of the learned Gaussian distribution, μ_φ and ln σ_φ are output by the variational encoder, and ξ ~ N(0, I) is a noise vector drawn from a standard normal distribution.

Fig. 1. The framework of CKNet. (a) The encoder of CKNet expands a finite space V as an invariant subspace of H; it outputs basis functions to take the place of the Koopman eigenfunctions. We construct the encoder in two ways: the deterministic approach, shown in (a.1), outputs basis functions directly after the MLP, while the variational approach, shown in (a.2), samples from the learned Gaussian distribution. (b) We adopt a recursive way to realize multi-step training. A high-dimensional nonlinear system can be described as low-dimensional linear dynamics φ(x_{k+1}) = A φ(x_k) + B u_k in V, where the system matrix A and control matrix B are obtained as trainable tensors; CKNet is also applicable to unforced systems, where the input u_k is constantly 0. (c) The decoder has the reverse structure of the encoder and plays the role of the Koopman modes, mapping the latent state from the subspace V back to the original observation space.

B. CKNet for Forced Dynamics
In this section, we focus on approximating forced dynamics with CKNet. Consider a discrete-time forced system

x_{k+1} = f(x_k, u_k)    (6)

where x_k ∈ R^{c×h×w} and u_k ∈ R^n are the state and control input of the system f. There are several methods of extending the Koopman operator to forced systems that take value-wise vectors as states [9], [10], [11]; in this work, we adopt the method in [11]. The Koopman operator of (6) can be described as

(K ϕ)(X_k) = ϕ(f(X_k))    (7)

where X_k = [x_k; u_k] is the extended state of the dynamics. Similarly, CKNet is applicable for approximating the Koopman operator in (7):

φ(x_{k+p}) ≐ φ_K(x_{k+p}) = G^p Ψ(x_k),   x̂_k = φ̃(φ(x_k))    (8)

where G = [A B] is the approximating operator for the forced dynamics and Ψ(x_k) = [φ(x_k); u_k] is the extended state in V.

In EDMD, the system matrix A and control matrix B are solved by constructing a least-squares problem over a dataset of snapshot pairs. However, this is not feasible for large-scale dynamical systems. In this work, we treat A and B as trainable tensors and train them in a mini-batch manner. In particular, we perform controllability analysis of the identified dynamics in the subspace V during the training process. The approximated discrete-time linear dynamics is controllable if the following matrix S has full rank:

S = [B  AB  A²B  ...  A^{L−1}B],   R = Rank(S)    (9)

C. Loss functions for CKNet
CKNet is a general framework based on classic EDMD theory, and it extends the scope of application mainly in three aspects. First, CKNet adopts multi-step loss functions to improve approximation performance, while EDMD adopts a one-step loss. Second, CKNet is applicable to dynamical systems with pixel-wise inputs, while EDMD is only applicable to value-wise systems. Third, CKNet adopts mini-batch training, whereas EDMD is trained by solving a least-squares problem over the whole training set, which is infeasible for high-dimensional systems.

To strengthen the linear accuracy of the identified model in V, we use a linearity loss term to constrain the encoder. A multi-step linearity loss is applied so that the dynamics is identified better from a global perspective, allowing more prediction steps without divergence:

L_linear = (1/p_l) Σ_{i=1}^{p_l} ϱ_i ‖φ(x_{k+i}, θ_e) − φ_K(x_{k+i})‖_F = (1/p_l) Σ_{i=1}^{p_l} ϱ_i ‖φ(x_{k+i}, θ_e) − G^i Ψ(x_k)‖_F    (10)

where the encoder φ is parameterized by trainable weights θ_e, ϱ_i is the weight of the i-th step linear prediction, and G^i denotes i steps of linear recursion from a state φ(x_k, θ_e) with a control sequence u_k, ..., u_{k+i−1} in V, calculated as follows:

G^i Ψ(x_k) = φ_K(x_{k+i}) = A φ_K(x_{k+i−1}) + B u_{k+i−1} = ··· = A^i φ(x_k, θ_e) + Σ_{j=1}^{i} A^{j−1} B u_{k+i−j}    (11)

If we train the encoder, A, and B only with the constraint (10), we can still acquire linear approximated dynamics in V, provided we do not need to recover the corresponding pixels. However, this training scheme gradually drives the encoder, A, and B toward zero, which results in an invalid model. To avoid this problem, a reconstruction loss function is included.
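To make the recursion concrete, the sketch below rolls out (11) step by step in NumPy and checks it against the closed-form expansion. The matrices A, B, the initial latent state, and the controls are small random stand-ins for the trained CKNet tensors, not values from the paper.

```python
import numpy as np

def rollout(A, B, phi0, controls):
    """Linear latent rollout of (11): phi_{i} = A phi_{i-1} + B u_{k+i-1}."""
    phi = phi0
    for u in controls:
        # one step of the identified linear dynamics in the subspace V
        phi = A @ phi + B @ u
    return phi

# tiny stand-in dimensions (the paper uses a 32-dim latent state)
rng = np.random.default_rng(0)
L, n, steps = 4, 1, 3
A = 0.1 * rng.normal(size=(L, L))
B = rng.normal(size=(L, n))
phi0 = rng.normal(size=L)
us = [rng.normal(size=n) for _ in range(steps)]

# closed form of (11): A^i phi0 + sum_{j=1}^{i} A^{j-1} B u_{k+i-j}
closed = np.linalg.matrix_power(A, steps) @ phi0
for j in range(1, steps + 1):
    closed += np.linalg.matrix_power(A, j - 1) @ (B @ us[steps - j])

assert np.allclose(rollout(A, B, phi0, us), closed)
```

The step-by-step form is what is used in training (Fig. 1(b)); the closed form only serves to verify the recursion.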
The reconstruction loss constrains the intrinsic features extracted by the encoder to contain all the information needed for the decoder to retrieve the original pixels:

L_recon = (1/p) Σ_{i=1}^{p} ‖x_{k+i} − φ̃(φ(x_{k+i}, θ_e), θ_d)‖_F    (12)

where φ̃ denotes the decoder, parameterized by θ_d.

Since we need to generate the corresponding images after multi-step prediction in V, a weighted multi-step prediction loss is designed to further constrain the encoder and decoder:

L_pred = (1/p_p) Σ_{i=1}^{p_p} ι_i ‖x_{k+i} − φ̃(G^i Ψ(x_k), θ_d)‖_F    (13)

where ι_i is the weight of the i-th step linear prediction.

Pixel-wise input systems are more complex and harder to approximate than value-wise systems because raw pixel states contain much irrelevant noise. This leads to a problem: after some prediction steps, the generated images retain only background information and lose all key features. To alleviate this problem, we add an auxiliary weight term to increase the importance of the losses at longer prediction steps. ϱ_i in (10) and ι_i in (13) have similar functions in this work, and they are defined by a 'tanh' function as follows:

ϱ_i = 1 + tanh(τ_l i),   ι_i = 1 + tanh(τ_p i)    (14)

where τ_⋆ is a hyper-parameter that influences the importance degree, as shown in Fig. 2. In this way, the weights ι and ϱ are limited to the range [1, 2], so they do not cause gradient explosion.

Fig. 2. The weights of the multi-step loss functions. We can change the importance of the i-th step prediction loss by tuning τ. When the total number of prediction steps p_l or p_p in training is large, we should tune down the value of τ, and increase it vice versa.

In addition, we add an l2 regularization loss on the encoder and decoder to avoid over-fitting:

l_2 = ‖Θ‖²    (15)

where Θ denotes the weights of the encoder, decoder, A, and B. Finally, CKNet can be trained under the loss function

L = α_1 L_linear + α_2 L_recon + α_3 L_pred + α_4 l_2    (16)

where α_1, α_2, α_3, α_4 are the weights of each loss term. We can train CKNet by minimizing the weighted loss L; details are shown in Algorithm 1.

Algorithm 1
The CKNet Algorithm
Require: p, p_l, p_p, τ_p, τ_l, c, c′, ζ, lr, Epoch = 0, Epoch_max, α_i (i = 1, ..., 4), batch size b_s, a small scalar ε > 0. Initialize θ_e, θ_d, A, B.
Ensure: trained θ_e, θ_d, A, B.
while Epoch < Epoch_max do
  Sample a batch of image and control sequences with length ms = max(p, p_l, p_p): x_{1:ms+c}, U_{1:ms}.
  for i = 1 to ms do
    if the deterministic approach is adopted then
      Obtain the latent state φ(x_{i:i+c}, θ_e) directly from the encoder;
    else
      Sample the latent state φ(x_{i:i+c}, θ_e) with (5);
    end if
    Acquire the reconstructed state x̂ = φ̃(φ(x_{i:i+c}, θ_e), θ_d);
  end for
  for i = 1 to ms do
    Compute G^i Ψ(x_c) with (11) and φ̃(G^i Ψ(x_c));
  end for
  Obtain the weighted loss L in (16) with (10), (12), (13), (14), and (15);
  Update θ_e, θ_d, A, and B by minimizing L with an Adam optimizer;
  Epoch = Epoch + 1
end while
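The auxiliary weights of (14) and their use inside the weighted multi-step losses can be sketched as follows. The τ value and the per-step errors below are illustrative stand-ins, not the tuned values from the experiments.

```python
import numpy as np

def aux_weights(tau, steps):
    """Auxiliary weights of (14): w_i = 1 + tanh(tau * i) for i = 1..steps."""
    i = np.arange(1, steps + 1)
    return 1.0 + np.tanh(tau * i)

w = aux_weights(tau=0.1, steps=12)      # illustrative tau, not a tuned value
assert np.all((w >= 1.0) & (w < 2.0))   # bounded in [1, 2]: no gradient explosion
assert np.all(np.diff(w) > 0)           # later prediction steps weigh more

# weighted multi-step loss as in (10)/(13): mean of w_i times per-step errors
per_step_err = np.linspace(1.0, 0.5, 12)  # dummy per-step errors for illustration
loss = float(np.mean(w * per_step_err))
```

Because tanh saturates, very long horizons with a large τ make all late-step weights nearly equal to 2, which is why the paper recommends tuning τ down when p_l or p_p is large.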
TABLE I
INFORMATION OF THE COLLECTED DATASETS

                CartPole        MountainCar
Episodes        250             240
Steps           [200, 300]      [300, 400]
Allocation      [25, 25, 200]   [20, 20, 200]
III. EXPERIMENTS
In this work, we adopt an offline training manner to validate CKNet on two nonlinear pixel-wise systems with continuous action spaces, MountainCar and CartPole. Namely, we first collect the training, testing, and validation datasets, and preprocess the data before training.
Fig. 3. The two forced dynamical systems with continuous action spaces, from the Gym environment, selected for validating CKNet. (a) Modified 'CartPole-v0' task with a continuous action space; (b) 'MountainCarContinuous-v0' task.
A. Data collection

'MountainCarContinuous-v0' and 'CartPole-v0' are two classic tasks for validating reinforcement learning (RL) algorithms. In the Gym library, the MountainCar task provides a version with a continuous action space, while the CartPole task only supports a discrete action space of {−1, 0, 1}. Thus we made a slight modification to support a continuous action space for the CartPole task. In order to obtain comprehensive data over the state space, we use trained RL agents with an added noise term in the controller for data collection. During collection, we record episode data including the current image s_k, the executed control u_k, and the next image s_{k+1}. For CartPole, we collected 250 episodes: 25 for testing, 25 for validation, and the remaining 200 for training. The steps of each episode are in the range [200, 300]. The corresponding information for MountainCar is detailed in Table I.

In preprocessing, we first convert images to grayscale. Then we enhance the images by setting a pixel's grayscale value to 1.0 when it is larger than 0.8. Lastly, we crop and resize the images to an appropriate size so that we can decrease the computational cost while still keeping enough key information.

A single image includes position and angle features but cannot represent velocity information, such as the velocity of the car and the angular velocity of the pole in the CartPole task. Therefore, we concatenate c consecutive images into a multi-channel tensor as the state. Consequently, the state tensor has size c × H × W, where the spatial size H × W is determined by the crop-and-resize step and differs between the CartPole and MountainCar environments.

TABLE II
NEURAL NETWORK STRUCTURE OF THE TWO EXAMPLES

Both encoders stack convolutional layers with ReLU activations (the CartPole encoder has one more convolutional layer because its input images are larger), followed by fully-connected layers of 4860 and 1525 units with ReLU, and a 32-dimensional output layer with either no activation or Tanh.
As shown in Table II, the neural networks have similar structures and are designed simply, without any pooling layers. There is one more convolutional layer for CartPole, considering that its input images are bigger. For the activation function of the encoder's last layer, we tried two styles, a 'Tanh' function and no activation function, and simulation results show that both styles are valid. The decoders have completely reversed structures corresponding to the encoders; their activation functions are 'ReLU', except that the last convolutional layer uses a 'Sigmoid' function.

Hyper-parameters are given in Table III, where lr and bs are the learning rate and batch size, respectively, and c and c′ denote the number of images at the input of the encoder and the output of the decoder. When c′ = c, the decoder is constrained to output the exact images corresponding to the input of the encoder; when c′ = 1, the decoder is only constrained to output the current image, which equals the last image of the encoder's input. From Table III we can see that the hyper-parameters of the two tasks are almost the same except for the learning rate and batch size. Thus, CKNet does not need a deliberately designed network structure or deliberately tuned hyper-parameters for different tasks.

Additionally, we train the networks with PyTorch-Lightning 1.0.7, a framework based on PyTorch that is convenient for multi-GPU training and for synchronizing batch-normalization parameters. We train these networks with four NVIDIA GeForce GTX 2080Ti GPUs with batch normalization.

IV. RESULTS
During the training process, we regularly check the controllability of the identified linear dynamics by recording the rank of S in (9). The change of the rank R is shown in Fig. 4: for the CartPole task, the rank R quickly reaches the dimension of the latent state, and for the MountainCar task, S becomes full rank after around 4.5K training steps. Namely, during training, the identified models of these two tasks become controllable for both the deterministic and variational approaches.

TABLE III
HYPER-PARAMETERS OF THE TWO ENVIRONMENTS

For each environment, the table lists the loss weights α_1 to α_4, τ_p, τ_l, the horizons p_l and p_p, c, c′, the learning rate lr, and the batch size bs; the two columns are almost identical except for the learning rate and batch size.
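The rank check of (9) used during training can be sketched directly in NumPy. The A and B below are hypothetical stand-ins for the trained latent-dynamics tensors: a shift system driven through its first coordinate, which is controllable by construction, plus an uncontrollable counterexample.

```python
import numpy as np

def controllability_matrix(A, B):
    """S = [B, AB, A^2 B, ..., A^(L-1) B] for an L-dimensional latent state."""
    L = A.shape[0]
    blocks = [B]
    for _ in range(L - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def is_controllable(A, B):
    # full rank of S <=> the identified linear dynamics is controllable
    return np.linalg.matrix_rank(controllability_matrix(A, B)) == A.shape[0]

# stand-in pair: a 5-dim shift system (A e_i = e_{i+1}) driven through e_1
A = np.diag(np.ones(4), k=-1)
B = np.zeros((5, 1)); B[0, 0] = 1.0
assert is_controllable(A, B)

# counterexample: with A = I the input never leaves span{B}
assert not is_controllable(np.eye(5), B)
```

In CKNet this check is cheap because it runs on the 32-dimensional latent matrices rather than on the pixel-space dynamics.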
Fig. 4. The rank of the matrix S in (9) during the training process. D-⋆ denotes task ⋆ realized with the deterministic approach, where ⋆ ∈ {MountainCar, CartPole}; V-⋆ denotes the variational approach for task ⋆.

For testing, we perform 120-step predictions on these two tasks to demonstrate the proposed CKNet. We first obtain the initial latent state φ(x_k) from the original state x_k, which consists of c adjacent images, and then acquire the predicted latent states φ_K(x_{k+i}) with a sequence of controls as input, according to the recurrent rule shown in Fig. 1(b).

Because the dimension of the latent state is 32 in this work, for intuitive presentation we adopt the mean absolute error (MAE) of the latent state and of the reconstructed image at each step to evaluate the accuracy of the identified linear dynamics. Generally, there are two ways of utilizing the identified dynamics: when the latent state is used as the input of a controller, we expect a smaller error on the predicted latent states; when the predicted images are used, we desire a smaller error on the generated images. As shown in Fig. 5, we study the influence of the auxiliary weights on prediction performance with the deterministic and variational approaches.

In the MountainCar task, we can see that an appropriate auxiliary weight on the linearity loss significantly decreases the latent-state prediction error with both the deterministic and variational approaches. For image prediction and generation, an appropriate auxiliary weight also yields an obvious improvement when the prediction step is within the range of [0, 80] steps for the deterministic approach and [0, 50] steps for the variational approach.

In the CartPole task, auxiliary weights yield obvious improvements on the prediction of generated images for both the deterministic and variational approaches.
For latent-state prediction, auxiliary weights on L_pred help the deterministic approach more than auxiliary weights put on L_linear. Besides, small auxiliary weights on L_pred are more suitable.

The prediction and image generation results are shown in Fig. 6 and Fig. 7: the identified dynamics not only accurately predicts the intrinsic dynamical state, such as the position, angle, and velocities, but also preserves the fixed information of the environments, i.e., the size of the pole, the shape and slide rail of the cart, and the shape of the mountain.

V. CONCLUSION
This work proposes a deep learning framework with convolutional networks for identifying latent dynamics from raw images. We construct the encoder in two different ways, the deterministic and variational approaches. Besides, auxiliary weights are introduced into the multi-step linearity and prediction losses to improve prediction performance. Since training is performed under the constraints of the Koopman operator, the identified model is linear, controllable, and physically interpretable in the subspace constructed by the encoder. Experiments adopt two classic forced physical systems with continuous action spaces, and the results show that the identified model can accurately predict the latent states and generate clear images over 120 steps of linear prediction.

ACKNOWLEDGMENT
The authors would like to thank...

REFERENCES

[1] P. J. Schmid, "Dynamic mode decomposition of numerical and experimental data," Journal of Fluid Mechanics, vol. 656, pp. 5–28, 2010.
[2] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, "A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition," Journal of Nonlinear Science, vol. 25, no. 6, pp. 1307–1346, 2015.
[3] I. G. Kevrekidis, C. W. Rowley, and M. O. Williams, "A kernel-based method for data-driven Koopman spectral analysis," Journal of Computational Dynamics, vol. 2, no. 2, pp. 247–265, 2016.
[4] I. Mezić, "Analysis of fluid flows via spectral properties of the Koopman operator," Annual Review of Fluid Mechanics, vol. 45, pp. 357–378, 2013.
[5] Y. Susuki, I. Mezic, F. Raak, and T. Hikihara, "Applied Koopman operator theory for power systems technology," Nonlinear Theory and Its Applications, IEICE, vol. 7, no. 4, pp. 430–459, 2016.
[6] M. Netto and L. Mili, "A robust data-driven Koopman Kalman filter for power systems dynamic state estimation," IEEE Transactions on Power Systems, vol. 33, no. 6, pp. 7228–7237, 2018.
[7] S. Klus, A. Bittracher, I. Schuster, and C. Schütte, "A kernel-based approach to molecular conformation analysis," The Journal of Chemical Physics, vol. 149, no. 24, p. 244109, 2018.
[8] G. Mamakoukas, M. L. Castano, X. Tan, and T. Murphey, "Local Koopman operators for data-driven control of robotic systems," in Robotics: Science and Systems, 2019.
[9] J. L. Proctor, S. L. Brunton, and J. N. Kutz, "Dynamic mode decomposition with control," SIAM Journal on Applied Dynamical Systems, vol. 17, no. 1, pp. 142–161, 2018.
Fig. 5. Prediction MAE with the deterministic and variational approaches for the MountainCar and CartPole tasks: (a) deterministic approach for MountainCar; (b) variational approach for MountainCar; (c) deterministic approach for CartPole; (d) variational approach for CartPole. The left and right columns of subfigures respectively show the MAE of the predicted latent states and of the reconstructed images with different auxiliary weights in (14). 'D' denotes that CKNet adopts the deterministic approach for the encoder, while 'V' denotes the variational approach. 'R0⋆W' and 'L0⋆W' indicate that τ_p and τ_l in (14) are equal to 0.1 × ⋆, respectively; otherwise both τ_p and τ_l are equal to zero. The MAE of each curve is calculated over 30 episodes of prediction.
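The per-step MAE curves of Fig. 5 are simple averages of absolute errors over episodes; a minimal sketch with dummy trajectories, where the array shapes (episodes, steps, dimensions) are assumptions matching the setup described above:

```python
import numpy as np

def per_step_mae(pred, true):
    """MAE at each prediction step, averaged over episodes and state dimensions.

    pred, true: arrays of shape (episodes, steps, dim) -- latent states,
    or flattened images when evaluating reconstruction error.
    """
    return np.abs(pred - true).mean(axis=(0, 2))

# dummy data: 30 episodes, 120 steps, 32-dim latent state (as in the paper)
rng = np.random.default_rng(2)
true = rng.normal(size=(30, 120, 32))
pred = true + 0.1 * rng.normal(size=true.shape)
mae = per_step_mae(pred, true)
assert mae.shape == (120,)
```

The same function serves both evaluation modes mentioned above: feed latent trajectories when the controller consumes latent states, or flattened generated images when image quality matters.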
Fig. 6. The prediction of the CartPole task. The solid red line divides the picture into two layers, and each layer has two rows: the upper row is the ground truth, while the lower row shows the images generated via linear prediction in V. The numbers at the upper left denote the prediction steps (10 to 120).

Fig. 7. The corresponding prediction of the MountainCar task (ground truth versus images generated via linear prediction, steps 10 to 120).