Latent Space Subdivision: Stable and Controllable Time Predictions for Fluid Flow
Steffen Wiewel, Byungsoo Kim, Vinicius C. Azevedo, Barbara Solenthaler, Nils Thuerey
Abstract
We propose an end-to-end trained neural network architecture to robustly predict the complex dynamics of fluid flows with high temporal stability. We focus on single-phase smoke simulations in 2D and 3D based on the incompressible Navier-Stokes (NS) equations, which are relevant for a wide range of practical problems. To achieve stable predictions for long-term flow sequences, a convolutional neural network (CNN) is trained for spatial compression in combination with a temporal prediction network that consists of stacked Long Short-Term Memory (LSTM) layers. Our core contribution is a novel latent space subdivision (LSS) to separate the respective input quantities into individual parts of the encoded latent space domain. This allows us to distinctively alter the encoded quantities without interfering with the remaining latent space values and hence maximizes external control. By selectively overwriting parts of the predicted latent space points, our proposed method is capable of robustly predicting long-term sequences of complex physics problems. In addition, we highlight the benefits of a recurrent training on the latent space creation, which is performed by the spatial compression network.
1. Introduction
Computing the dynamics of fluids requires solving a set of complex equations over time. This process is computationally very expensive, especially when considering that the stability requirement poses a constraint on the maximal time step size that can be used in a simulation. Due to the high computational cost, approaches for machine learning based physics simulations have recently gained attention.

Department of Computer Science, Technical University of Munich, Germany; Department of Computer Science, ETH Zurich, Switzerland. Correspondence to: Steffen Wiewel
2. Related Work and Background
Our work concentrates on single-phase flows, which are usually modeled by a pressure-velocity formulation of the incompressible Navier-Stokes equations:

∂u/∂t + u · ∇u = −(1/ρ)∇p + ν∇²u + g,    ∇ · u = 0,    (1)

where p denotes pressure, u is the flow velocity, and ρ, ν, g denote density, kinematic viscosity, and external forces, respectively. The density ρ is passively advected by the velocity u. A complete overview of fluid simulation techniques in computer graphics can be found in (Bridson, 2015).

Data-driven flow modelling encompasses two distinguished and complementary efforts: dimensionality reduction and reduced-order modelling. Dimensionality reduction (e.g., Singular Value Decomposition) scales down the analyzed data into a set of important features in an attempt to increase the sparsity of the representation, while reduced-order modelling (e.g., Galerkin Projection) describes the spatial and temporal dynamics of a system represented by a set of reduced parameters. In computer graphics, the work of Treuille et al. (2006) was the first to use Principal Component Analysis (PCA) for dimensionality reduction coupled with a Galerkin Projection method for subspace simulation. This approach was later extended (Kim & Delaney, 2013) with a cubature approach for enabling Semi-Lagrangian and MacCormack (Selle et al., 2008) advection schemes, while improving the handling of boundary conditions. Reduced-order modelling was also explored to accelerate the pressure projection step in liquid simulations (Ando et al., 2015). Instead of computing reduced representations from pre-simulated velocity fields, alternative basis functions can be used for reduced-order modelling; examples of basis functions include Legendre Polynomials (Gupta & Narasimhan, 2007), modular (Wicke et al., 2009) and spectral (Long & Reinhard, 2009) representations. Also Laplacian Eigenfunctions have been successfully employed for dimensionality reduction, due to their natural sparsity and inherent incompressibility.
De Witt et al. (2012) combined Laplacian Eigenfunctions with a Galerkin Projection method, enabling fast and energy-preserving fluid simulations. The approach was extended to handle arbitrarily-shaped domains (Liu et al., 2015), combined with a Discrete Cosine Transform (DCT) for compression (Jones et al., 2016), and improved for scalability (Cui et al., 2018).

The aforementioned methods for data-driven flow modelling use linear basis functions for dimensionality reduction. This enables the use of Galerkin Projection for subspace integration, but it limits the power of the reconstruction when compared to non-linear embeddings. The latent spaces generated by autoencoders (AE) are non-linear and richly capture the input space with fewer variables (Radford et al., 2016; Wu et al., 2016). In light of that, Wiewel et al. (2019) combined a latent space representation with recurrent neural networks (RNN) to predict the temporal evolution of fluid functions in the latent space domain. Kim et al. (2019) introduced a generative deep neural network for parameterized fluid simulations that only takes a small set of physical parameters as input to very efficiently synthesize points in the learned parameter space. Their method also proposes an extension to latent space integration by training a fully-connected neural network that maps subsequent latent spaces. Our work is related to these two methods, but a main difference is that we use an end-to-end training of both the spatial compression and the temporal prediction. In combination with our latent space subdivision, our predictions are more stable, while previous approaches fail to properly recover long-term integration correspondences due to the lack of autoencoder regularization.

For particle-based fluid simulations, a temporal state prediction using Regression Forests was presented in Ladicky et al. (2015).
Handcrafted features are evaluated in particle neighborhoods and serve as input to the regressor, which then predicts the particle velocity of the next time step. Machine learning has also been used in the context of grid-based (Eulerian) fluid simulations. Tompson et al. (2017) used a convolutional neural network (CNN) to model spatial dependencies in conjunction with an unsupervised loss function formulation to infer pressure fields. A simpler three-layer fully-connected neural network for the same goal was likewise proposed (Yang et al., 2016). As an alternative, learned time evolutions for Koopman operators were proposed (Morton et al., 2018), which however employ a pre-computed dimensionality reduction via PCA. Chu et al. (2017) enhance coarse fluid simulations to generate highly detailed turbulent flows. Individual low-resolution fluid patches were tracked and mapped to high-resolution counterparts via learned descriptors. Xie et al. (2018) extended this approach by using a conditional generative adversarial network with spatio-temporal discriminator supervision. Small-scale splash details in hybrid fluid solvers were targeted with deep learning-based stochastic models (Um et al., 2018). For a review of machine learning applied to fluid mechanics we refer the reader to (Brunton et al., 2019).

In the context of data-driven flow modelling, methods to interpolate between existing data have been presented. Raveendran et al. (2014) presented a technique to smoothly blend between pre-computed liquid simulations, which was later extended with more controllability (Manteaux et al., 2016). Thuerey et al. (2016) used dense space-time deformations represented by grid-based signed-distance functions for interpolation of smoke and liquids, and Sato et al. (2018) interpolated velocity fields by minimizing an energy functional.
3. Method
The central goal of our models is to robustly and accurately predict long-term sequences of flow dynamics. For this, we need an autoencoder to translate high-dimensional physics fields into a compressed representation (latent space) and a temporal prediction network to advance the state of the simulation over time. A key observation is that if these two network components are trained individually, neither component has a holistic view on the underlying problem. The autoencoder, consisting of an encoder E and a decoder D, generates a compressed representation c = E(x), which focuses solely on the reconstruction x̃ = D(c) of the given input x. Hence, the loss function to minimize is given by ‖x − x̃‖. Without considering the aspect of time, the autoencoder's latent space only stores spatial descriptors. Due to the exclusive focus on space, temporally consecutive data points are not necessarily placed close to each other in the latent space domain. This poses substantial challenges for the temporal prediction network.

Therefore, we consider the aspect of time within the training of the autoencoder in order to shape its latent space with respect to temporal information, in addition to the spatial information. Thus, we propose an end-to-end training procedure, where we train our autoencoder and temporal prediction network simultaneously by internally connecting the latter as a recurrent block to the encoding and decoding blocks of the spatial autoencoder. As a result, the latent space domain is aware of temporal changes, and can yield temporally coherent latent space points that are suitable for the time prediction network. By default, our training process includes the combined training of our spatial autoencoder and temporal prediction network as shown in Figure 1. In this figure, the encoder E, decoder D and prediction network P are duplicated for visualization purposes. In the next sections, we describe each individual network in more detail.
The spatial encoding of the data is performed by a regular autoencoder, the network for which is split into an encoding and a decoding part (Kim et al., 2019). The encoder contains convolution layers with skip connections, which connect its internal layers, followed by one fully-connected layer. The decoder consists of a fully-connected layer, which is followed by convolution layers with skip connections. For the fluid simulation dataset used in this work, the input is either 2D or 3D, leading to the usage of 2D or 3D convolutions and a correspondingly sized feature dimension. Furthermore, a data-specific curl layer is appended to the decoder network to enforce zero divergence in the resulting velocity field (Kim et al., 2019), as required by the NS equations (see Equation 1).

Figure 1: Combined training of autoencoder and temporal prediction. The prediction network's input window is set to w = 2. Thus, the count of recurrent predictions is n_i = n − w. The LSS is enforced by applying the split loss on c̄_den and c̆_vel for velocity and density, respectively. A direct AE reconstruction loss is only performed on x̂.

The dimensionality of the latent space c for a given x is defined by the final layer of the encoder and can be freely chosen. We pass a velocity field u as well as a density field ρ to our encoder, i.e., x = [u, ρ]. The velocity field is an active quantity that is used in fluid simulations to advect a passive quantity forward in time. In the case of smoke simulations, which are a specific instance of fluid simulations, the passive density field is advected by the flow velocity. As a result, c contains information about both the active and passive fields, i.e., velocity and density.
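The effect of such a curl layer can be illustrated in 2D with a minimal numpy sketch. The stream-function input and the finite-difference discretization below are assumptions for illustration (the paper's layer operates on the decoder's output tensors); the point is that a velocity field derived as the curl of a scalar field is divergence-free by construction:

```python
import numpy as np

def curl_2d(phi):
    """Velocity from a scalar stream function: (u, v) = (dphi/dy, -dphi/dx).
    A field constructed this way is divergence-free by construction."""
    u = np.gradient(phi, axis=0)     # d(phi)/dy, rows indexed by y
    v = -np.gradient(phi, axis=1)    # -d(phi)/dx, cols indexed by x
    return u, v

def divergence_2d(u, v):
    """Discrete divergence with matching finite differences."""
    return np.gradient(u, axis=1) + np.gradient(v, axis=0)

phi = np.random.default_rng(0).standard_normal((32, 32))
u, v = curl_2d(phi)
div = divergence_2d(u, v)   # zero up to floating-point rounding
```

Because the mixed finite-difference derivatives commute, the discrete divergence vanishes up to rounding error for any input, which is why a decoder ending in such a layer can only emit incompressible velocity fields.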
Since we can accurately advect the passive density quantity with a given velocity field at low computational cost, it makes sense to compute the advection outside of the network and project the new state into the latent space.

In order to be able to alter active and passive fields individually in the compressed representation c, with a given input field x = [x_vel, x_den], where the subscripts vel and den denote the velocity and density part, we subdivide c into separate parts for velocity c_vel and density c_den, respectively. This property of the latent space is needed for projecting the new state of the passive quantity into the latent space domain. Additionally, to exert explicit external control over the prediction, we designate another part of c to contain supervised parameters, called c_sp (Kim et al., 2019). In the case of, e.g., a smoke simulation with a rotating cup filled with smoke, such supervised parameters can be the position of a smoke source or the rotation angle of a solid obstacle. With this subdivision, we increase the stability of our predictions and allow for explicit external control. This subdivision is described in Equation 2, where v, d, and sp denote the indices of the velocity, density, and supervised parameter parts in the latent space domain, respectively:

c = [c_vel ∼ c_den | c_sp],    (2)

where c_vel = [c_1, ..., c_v], c_den = [c_{v+1}, ..., c_d], and c_sp = [c_{d+1}, ..., c_{sp}].

To arrive at the desired subdivision, the split loss L_split (see Equation 3) is used as a loss function in the training process. It is modelled as a soft constraint and thereby does not enforce the parts c_vel and c_den to be strictly disjunct. The loss is defined as

L_split(c, I_s, I_e) = Σ_{i=I_s}^{I_e} ‖c_i‖.    (3)

Since we divide the latent space into three parts, L_split is applied twice.
For the velocity part c_vel, the indices I_s = v+1 and I_e = d are chosen to indicate that the density part must not be used when encoding velocities, i.e., L_split(c, v+1, d). In contrast, for the density part c_den, the velocity part is indicated by choosing the indices I_s = 0 and I_e = v, i.e., L_split(c, 0, v). First, only the velocity part x_vel of input x is encoded (the density part x_den is zero, i.e., x̄ = [x_vel, 0]), yielding c̄_vel. Vice versa, the density part x_den of input x is encoded, whereas the velocity part x_vel is replaced with zeros, i.e., x̆ = [0, x_den], resulting in c̆_den. Therefore, the loss is applied twice, for the c̄ and c̆ encodings, as L_split(c̄, v+1, d) and L_split(c̆, 0, v), respectively.

In order to exert external control over the prediction, c_sp is enforced to contain parameters describing certain attributes of the simulation. While training the network, an additional soft constraint is applied, which forces the encoder to produce the supervised parameters. The soft constraint is implemented as the mean-squared error between the values generated by the encoder ĉ_sp and the ground truth data c_sp and constitutes the supervised loss L_sup as

L_sup(c_sp, ĉ_sp) = ‖c_sp − ĉ_sp‖².    (4)

Additionally, an AE loss L_AE (Equation 5) is applied to the decoded field x̂. It forces the velocity part of the decoded field to be close to the input velocity by applying the mean-absolute error. To take the rate of change of the velocities into consideration as well, the mean-absolute error of the velocities' gradient is added to the formulation. In contrast, the density part is handled by directly applying the mean-squared error on the decoded output density and the input. The AE loss is thereby defined as

L_AE(x, x̂) = ‖x_vel − x̂_vel‖₁ + ‖∇x_vel − ∇x̂_vel‖₁ + ‖x_den − x̂_den‖².    (5)

The prediction network performs a temporal transformation of its input to the temporally consecutive state. The inputs are a series of w consecutive input states. The prediction network block contains two recurrent LSTM layers, followed by two 1D-convolutional layers. In our case of 2D smoke simulations, two consecutive latent space points are used as input. Those are fed to the prediction layers and result in one latent space point, called the residuum Δc_t. Afterwards, the residuum is added to the last input state to arrive at the next consecutive state, i.e., c̃_{t+1} = c_t + Δc_t.

Due to the subdivision capability of our autoencoder, our temporal prediction network supports external influence over the predictions it generates. After each prediction, it is possible to replace or update information without the need of re-encoding the predicted data. Instead, only parts of the predicted latent space point can be replaced, enabling fine-grained control over the flow. For example, in the case of smoke simulations, the passive smoke density quantity can be overwritten with an externally updated version, i.e., the c_den part is replaced by its externally computed counterpart. This allows for adding new smoke sources or modifying the current flow by removing smoke from certain parts of the simulation domain.

Considering the prediction input window w, which can be chosen freely, and the desired internal iteration count n_i = n − w, the additive prediction error is taken into consideration for the prediction network P while training, i.e., it is traversed n_i times. This leads to a combined training loss of AE and P defined as

L = L_AE,direct + L_sup + L_split,vel + L_split,den + Σ_{n=0}^{n_i} L_AE,pred_n,    (6)

where L_AE is applied to the corresponding pairs of the decoded outputs x̃_0 ... x̃_{n_i} of the prediction network P and their corresponding ground truths x_0 ... x_{n_i}.
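The split loss from Equation 3 can be sketched with a toy latent vector. The layout indices V and D and the use of squared magnitudes as the per-entry norm are assumptions for illustration, not values from the paper:

```python
import numpy as np

# Hypothetical latent layout: indices [0, V) hold the velocity part,
# [V, D) the density part, and [D, :) the supervised parameters.
V, D = 8, 14

def split_loss(c, i_s, i_e):
    """L_split: soft penalty on latent entries in the slice [i_s, i_e)."""
    return float(np.sum(c[i_s:i_e] ** 2))

# c_bar: encoding of a velocity-only input -> its density slice is penalized.
c_bar = np.zeros(16); c_bar[:V] = 1.0
# c_breve: encoding of a density-only input -> its velocity slice is penalized.
c_breve = np.zeros(16); c_breve[V:D] = 1.0

loss_vel = split_loss(c_bar, V, D)    # L_split(c_bar, v+1, d)
loss_den = split_loss(c_breve, 0, V)  # L_split(c_breve, 0, v)
```

Both losses are zero exactly when the two encodings keep out of each other's slice, which is the (soft) disjointness the training encourages.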
Thereby, our final loss is the sum of all the previously presented losses. In our combined training approach, both networks update their weights by applying holistic gradient evaluations, i.e., they are trained end-to-end. The benefit of the end-to-end training is that the spatial autoencoder AE also incorporates temporal aspects when updating its weights. In addition, by recurrently applying L_AE on the predictions, the prediction network P is trained to actively minimize accumulating additive errors. To incorporate temporal awareness in the autoencoder, the decoder block is connected to the individual prediction outputs and is thereby reused several times in one network traversal. The recurrent usage of the decoding block is commonly known as weight sharing (Bromley et al., 1993). Furthermore, by applying the prediction losses on the decoded predictions, the spatial autoencoder adapts to the changes induced by the temporal prediction as well, which furthers the focus of the autoencoder on producing latent spaces suitable for temporal predictions. As a result, the prediction network is capable of robustly and accurately predicting long-term sequences of complex fluid flows.
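The residual prediction step and its recurrent rollout during training can be sketched as follows, with a toy callable standing in for the LSTM/convolution block P:

```python
import numpy as np

def rollout(seed, P, steps, w=2):
    """Recurrent latent rollout as used in training: the prediction network P
    maps a window of w latent points to a residual delta_c, and the next
    state is the last input plus that residual (c_{t+1} = c_t + delta_c)."""
    seq = [np.asarray(s, dtype=float) for s in seed]   # w seed states
    for _ in range(steps):
        window = seq[-w:]
        delta_c = P(window)            # stands in for the LSTM + conv block
        seq.append(window[-1] + delta_c)
    return seq

# Toy predictor: extrapolate linearly from the last two states.
P_lin = lambda win: win[-1] - win[-2]
seq = rollout([np.zeros(4), np.ones(4)], P_lin, steps=3)
# seq holds latent points with values 0, 1, 2, 3, 4 in every component
```

Because each prediction is fed back as input to the next step, errors made early in the rollout propagate, which is exactly why applying the loss over all n_i internal iterations during training matters.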
4. Training Datasets
The datasets we used to train our networks contain randomized smoke flows simulated with an open source framework (Thuerey & Pfaff, 2018). In total, three different scene setups were used to capture a wide range of complex physical behavior. The first scene contains a moving smoke inflow source that generates hot smoke continuously, which is rising and producing complex swirls (see Figure 2).

Figure 2: Example sequences of our 2D datasets: moving smoke (top), rotating (center) and moving cup (bottom). The smoke density is shown in black and the cup-shaped obstacle in blue.

The second and third scenes simulate cold smoke in a cup-shaped obstacle. The former rotates the cup randomly around a fixed axis, while the latter additionally applies a translation (see Figure 2). The rising smoke and the rotating cup scene each expose one control parameter, i.e., movement on the x-axis and rotation around the z-axis, whereas the rotating and moving cup scene exposes both of these control parameters. Each of the three datasets in 2D contains 200 randomly generated scenes with 600 consecutive frames. Additionally, the moving smoke as well as the rotating and moving cup dataset was generated in 3D with 100 randomly generated scenes and 600 consecutive frames (see Table 1).

Table 1: Statistics of our datasets.
Scene Type                      Scenes  Frames
Moving Smoke (2D)               200     600
Rotating Cup (2D)               200     600
Rotating and Moving Cup (2D)    200     600
Moving Smoke (3D)               100     600
Rotating and Moving Cup (3D)    100     600
5. Evaluation
In this section, we compare our architecture to the baseline of previous work. We also perform an ablation study on different settings of our proposed architecture to compare their respective influence on the output. We compute the mean peak signal-to-noise ratio (PSNR) for all our comparisons, i.e., larger values are better. For each case, we measure the accuracy of our prediction w.r.t. density and velocity in terms of PSNR for ten simulation setups that were not seen during training.

For a thorough evaluation, we supply two prediction approaches. First, we evaluate a regular prediction approach with no reinjection of physical information (denoted
VelDen) that is in sync with previous work (Kim et al., 2019; Wiewel et al., 2019). This approach is formulated as

c̃_t = P(c̃_{t−1}, c̃_{t−2}),    x̃_t = [x̃_vel^t, x̃_den^t] = D(c̃_t),    (7)

where the previously predicted latent space points c̃_{t−1} and c̃_{t−2} are used to evaluate the next time step c̃_t. Afterwards, c̃_t is decoded via D(c̃_t) and the density part x̃_den^t is directly displayed, i.e., no external physical information about the system state is influencing the output of our VelDen benchmarks.

In the second approach, we make use of our LSS to reinject the advected density into the prediction, to benefit from well-understood physical computations that keep our predictions stable and can be performed in a fast manner. The prediction process utilizing our LSS is denoted as
Vel and is given as

c̃_t = P(ĉ_{t−1}, ĉ_{t−2}),
x̃_t = [x̃_vel^t, x̃_den^t] = D(c̃_t),
ẋ_den^t = Adv(ẋ_den^{t−1}, x̃_vel^t),    (8)
ċ_t = E([x̃_vel^t, ẋ_den^t]),
ĉ_t = [c̃_vel^t, ċ_den^t],
x̂_t = [x̃_vel^t, ẋ_den^t],

where we use the decoded predicted velocity x̃_vel^t to advect the simulation density ẋ_den^{t−1} and reinject its encoded form into our latent space ĉ_t. The new latent space representation ĉ_t is thereby formed by concatenating the newly encoded density ċ_den^t and the predicted encoded velocity field c̃_vel^t. By reinjecting the advected density field ẋ_den^t, we inform the prediction network about boundary conditions as well as other known physical information that is external to the prediction state. In the following we will ablate different aspects of our method to evaluate their respective influence on the final results.

Latent Space Temporal Awareness
The temporal awareness of our spatial autoencoder is evaluated in this section, since it has a significant impact on the performance of our temporal prediction network. In Figure 3 we evaluate three networks trained with different loss functions in terms of the stability of the latent space they generate for sequences. For each of the plots, frames of different smoke simulations were encoded to the latent space domain with an autoencoder. The resulting latent space points were normalized with their respective maximum to the range of [−1, 1] and afterwards transformed to three dimensions with PCA. The supervised part was removed before normalization. For our comparison we chose autoencoders with a fixed latent space dimension.

Figure 3: Spatial encodings of frames of different smoke simulations. The latent space points are normalized to their respective maximum and processed with PCA for visualization purposes. Each color stands for a single simulation and represents a series of frames. (a) No temporal constraints or supervised parameters. (b) No temporal constraints but with supervised parameters. (c) With temporal constraints and supervised parameters (ours).

Figure 4: Long-term prediction: Our method robustly predicts the fluid flow, whereas the regular prediction method (no-split) fails to capture the movement and even produces unphysical behavior. Compare the PSNR values in Table 3.

The results in Figure 3a were generated with a classic AE that was trained to only reproduce its input, i.e., only a direct loss on the output (Equation 5) was applied. For this classic AE no temporal constraints were imposed, and no supervised parameters were added to the latent space. The resulting PCA decomposition shows a very uneven distribution: large distances between consecutive points exist in some situations, whereas a large part of the samples are placed closely together in a cluster.

Figure 5: A circular solid obstacle, unseen during training, is placed in the upper right. Our proposed method (center) predicts stable and realistic motion. The prediction-only approach (top) is not able to capture the obstacle.

When adding the supervised parameter loss (Equation 4) in the second evaluation (Kim et al., 2019), the trajectories become more ordered, as shown in Figure 3b, but still exhibit a noticeably uneven distribution. Thus, the supervised parameter, despite being excluded from the PCA, has a noticeable influence on the latent space construction.

In Figure 3c, the results of the AE trained with our proposed time-aware end-to-end training method are shown. This AE applies the supervised parameter loss (Equation 4) as well as the direct loss on the output (Equation 5) and was trained in combination with the temporal prediction network P as described in Section 3. The visualization of the PCA decomposition shows that a strong temporal coherence of the consecutive data points emerges. This visualization indicates why our prediction networks yield substantial improvements in terms of accuracy: the end-to-end training provides the autoencoder with gradients that guide it towards learning a latent space mapping that is suitable for temporal predictions.

Figure 6: An additional inflow, unseen in training, is placed during prediction. In contrast to our approach (center), the prediction-only approach (top) is not able to capture the second inflow.

Intuitively, changes over time require relatively small and smooth updates, which results in the visually more regular curves shown in Figure 3c.
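The visualization procedure described above, per-point normalization followed by a PCA projection, can be sketched as follows. The exact normalization used for the figures may differ, so treat this as an assumed variant:

```python
import numpy as np

def pca_project(points, k=3):
    """Project latent trajectories onto the k leading principal components
    for plotting, via an SVD of the centered data."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T     # rows of vt are principal directions

# Toy latent trajectory: normalize each point by its maximum magnitude,
# then reduce to three dimensions as in the figure.
rng = np.random.default_rng(1)
latents = rng.standard_normal((100, 16))
normalized = latents / np.abs(latents).max(axis=1, keepdims=True)
proj = pca_project(normalized, k=3)
```

Plotting `proj` per simulation then reveals whether consecutive frames form smooth curves (a temporally coherent latent space) or scattered clusters.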
Simple LS Division vs. LSS
A simple approach to arrive at a latent space with a clear subdivision in terms of input quantities is to use two separate spatial AEs for the individual input quantities. After encoding, the two resulting latent space points c_vel = E_vel(u) and c_den = E_den(ρ) can be concatenated, yielding c_simple = [c_vel, c_den]. In contrast to this simple approach, our LSS directly encodes both quantities with a single AE as c_LSS = E([u, ρ]) and enforces the subdivision with a soft constraint. The combined training with the prediction network is performed identically for both spatial compression versions. It becomes apparent from the results in Table 2b) that the network trained with our soft constraint outperforms the simple split variant, especially when reinjecting the simulation density in the Vel benchmarks, where our method yields better PSNR values for both u and ρ. The reason for this is that the simple split version cannot take advantage of synergistic effects in the input data, since both input quantities are encoded in separate AEs. In contrast, our method uses the synergies of the physically interconnected velocity and density fields and robustly predicts future time steps.

Internal Iterations
We compare the internal iteration count of the prediction network in the training process in Table 2a). By performing multiple recurrent evaluations of our temporal network already in the training process, we minimize the additive error build-up of many consecutive predictions. Fighting the additive prediction errors over a long time horizon is important to arrive at a robust and exact predicted sequence. We compared three internal iteration counts; it becomes apparent that the network trained with the most internal iterations, and thereby the longest prediction horizon, is superior in both evaluations. It should be noted that the predictions with reinjection of the physical density field (Vel) have a lower error on the density than the prediction-only (VelDen) approach. This supports the usefulness of our proposed latent space subdivision, which is needed to reinject external information.

Table 2: Evaluations for Vel and VelDen predictions; rotating and moving cup: a) Internal predictions n_i; b) Simple split vs. LSS; c) LS dimensionality comparison.

                            Vel                 VelDen
                        PSNR u  PSNR ρ      PSNR u  PSNR ρ
b) Simple Split         26.95   16.35
c) |c| = 16             30.28   17.66       29.11   17.12
   |c| = 32             30.40   17.64

Latent Space Dimensionality

The latent space dimensionality has a major impact on the resulting weight count of the autoencoder as well as on the complexity of the temporal predictions, and thereby their difficulty. In the following we compare different latent space sizes (see Table 2c). When it comes to prediction only (VelDen), the PSNR is better for a larger latent space dimensionality. In contrast to this observation, the PSNR value is on the same level for all latent space sizes when the simulation density is reinjected (Vel). For this reason we used a moderate latent space dimensionality for all further comparisons. Due to buoyancy, the velocity and density of our smoke simulations are loosely coupled. Thus, additional weights do not increase the overall performance when the reconstruction of the individual parts of the respective input quantities has already converged.

Latent Space Subdivision vs. No-Split
To evaluate the usefulness of our LSS method, which supports reinjection of external information, we compare it to a classic network setup that does not, and instead performs a regular prediction. When only performing a regular prediction, the step-wise prediction errors accumulate. Without the reinjection of the externally driven density field into our prediction process, the quality of the outcome decreases drastically. Due to the buoyancy-based coupling of density and velocity, this effect intensifies.

In Table 3 we compare the long-term temporal prediction performance of a no-split version and our LSS version over a long horizon of simulation steps. The split number corresponds to the fraction of the latent space designated for the encoded velocity. Since the no-split version contains a classic AE without any latent space constraints, the evaluation can only be performed for the prediction-only (VelDen) benchmark, as the reinjection of density is not possible with the classic AE. Our LSS version clearly outperforms the no-split version in terms of density PSNR. Due to the reinjection capabilities of our method, the resulting prediction remains stable, whereas the classic no-split approach fails to capture the flow throughout the prediction horizon and even produces unphysical behavior (see Figure 4).

Table 3: LSS and no-split comparison; rotating and moving cup.

LS Split Type    Benchmark    PSNR u    PSNR ρ
no-split         VelDen
LSS (ours)       VelDen
LSS (ours)       Vel
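The reinjection used in the Vel benchmarks (Equation 8) can be sketched as a loop over latent states. The encoder, decoder, predictor and advection operator below are stand-in callables, and the split index V is an assumed example value, not the paper's configuration:

```python
import numpy as np

V = 8   # assumed split index: c[:V] is the velocity part, c[V:] the density part

def vel_rollout(c_hist, x_den, E, D, P, Adv, steps):
    """Sketch of the Vel scheme: predict in latent space, decode the velocity,
    advect the density outside the network, then re-encode it and splice it
    back into the predicted latent point (the LSS reinjection)."""
    frames = []
    for _ in range(steps):
        c_t = P(c_hist[-2], c_hist[-1])     # latent prediction
        x_vel, _ = D(c_t)                   # keep only the decoded velocity
        x_den = Adv(x_den, x_vel)           # external advection step
        c_dot = E(x_vel, x_den)             # re-encode with the new density
        c_hat = np.concatenate([c_t[:V], c_dot[V:]])
        c_hist = [c_hist[-1], c_hat]        # feed the corrected state back in
        frames.append((x_vel, x_den))
    return frames

# Stand-in callables with a trivially structured latent space:
E = lambda v, d: np.concatenate([v, d])
D = lambda c: (c[:V], c[V:])
P = lambda c0, c1: c1                        # identity predictor
Adv = lambda d, v: d                         # no-op advection
c0, c1 = np.zeros(16), np.arange(16.0)
frames = vel_rollout([c0, c1], np.ones(8), E, D, P, Adv, steps=3)
```

The key design point visible here is that only the density slice of the predicted latent point is overwritten; the predicted velocity slice is left untouched, so external information enters without re-encoding the whole state.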
6. Results
We demonstrate the effectiveness of the subdivided latent space with several generalization tests. As shown in Figure 5, our method is capable of predicting the fluid motion even when an obstacle is placed in the domain. Due to our split latent space, the obstacle can be passively injected into the prediction process by supplying an encoded density field with a masked-out obstacle region. We replaced the density part of the latent space with its encoded state after advecting it with our predicted velocity field. The prediction without injection of external information does not capture the obstacle and thereby deviates from the ground truth. In Figure 6 we show that our method is capable of predicting the fluid motion even when a new inflow region is added that was not seen during training. Our network performs reasonably well in all tested experiments due to the reinjection capabilities provided by the latent space subdivision, even though the generalization scenes were not part of the training data.

Figure 7: Four different 3D moving smoke predictions. The movement is completely predicted by our proposed method.

Table 4: Average timing of a simulation step computed via regular pressure solve and our Vel prediction scheme. While the former scales with data complexity, ours scales linearly with the domain dimension. Averages over multiple scenes and simulation steps. Measured on an Intel(R) Xeon(R) E5-1650 v3 (3.50 GHz) and an Nvidia GeForce RTX 2070.

Scene           Resolution  Type        Solve [s]  Total [s]
Rot. mov. cup   3D          Simulation  0.891      0.960
Rot. mov. cup   3D          Prediction
Mov. smoke      3D          Simulation  0.472      0.537
Mov. smoke      3D          Prediction
Rot. mov. cup   2D          Simulation  0.041      0.044
Rot. mov. cup   2D          Prediction
Rot. cup        2D          Simulation  0.018      0.019
Rot. cup        2D          Prediction
To demonstrate the capabilities of our method, we trained a 3D version of our network on the moving smoke scene; selected frames are shown in Figure 7. Additionally, we compared the runtime performance of our networks to the regular solver that was used for generating the training data (see Table 4). Even though we need to decode and encode the density field for our reinjection method, and thereby copy it between GPU and CPU memory, we still arrive at a performance measure of . seconds for an average prediction step in our 3D scene. For comparison, a traditional multi-threaded CPU-based solver takes . seconds on average for a simulation step in the same scenes.
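The reinjection that makes these long-term predictions stable amounts to overwriting the density part of the predicted latent code with a freshly encoded density state. A minimal sketch of that overwrite; the latent size, split index, and variable names are illustrative assumptions, not the trained networks:

```python
import numpy as np

LATENT_DIM = 16
SPLIT = 11  # first SPLIT entries hold the velocity part (ratio is an assumption)

def reinject_density(c_pred, c_density):
    """Overwrite the density part of a predicted latent code with a freshly
    encoded density state, keeping the predicted velocity part untouched."""
    c = c_pred.copy()
    c[SPLIT:] = c_density[SPLIT:]
    return c

c_pred = np.zeros(LATENT_DIM)     # latent point from the temporal predictor
c_density = np.ones(LATENT_DIM)   # latent code of the advected density field
c_hat = reinject_density(c_pred, c_density)
print(c_hat[:SPLIT].sum(), c_hat[SPLIT:].sum())  # 0.0 5.0
```

Because the two quantities occupy disjoint parts of the latent space, the overwrite cannot disturb the encoded velocity, which is what makes this kind of external control possible.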
7. Conclusion & Future Work
We have demonstrated an approach for subdividing latent spaces in a controlled manner to improve the generalization and long-term stability of physics predictions. In combination with our time-aware end-to-end training, which learns a reduced representation together with the time prediction, this makes it possible to predict sequences with several hundred roll-out steps. In addition, our trained networks can be evaluated very efficiently and yield significant speed-ups compared to traditional solvers. As future work, we believe it will be interesting to further extend the generalizing capabilities of our network such that it covers a wider range of physical behavior. It will also be interesting to explore different architectures to reduce the hardware requirements for training large 3D models with our approach.
Acknowledgements
This work was supported by the ERC Starting Grant realFlow (StG2015637014) and the Swiss National Science Foundation (grant no. 200021_168997). Source code and video: https://ge.in.tum.de/publications/latent-space-subdivision/

References
Ando, R., Thürey, N., and Wojtan, C. A dimension-reduced pressure solver for liquid simulations. Comput. Graph. Forum, 34(2):473–480, May 2015.
Bridson, R. Fluid Simulation for Computer Graphics. CRC Press, 2015.
Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., and Shah, R. Signature verification using a "siamese" time delay neural network. In Proceedings of the 6th International Conference on Neural Information Processing Systems, NIPS'93, pp. 737–744, 1993.
Brunton, S., Noack, B., and Koumoutsakos, P. Machine learning for fluid mechanics. May 2019. URL http://arxiv.org/abs/1905.11075.
Chu, M. and Thuerey, N. Data-driven synthesis of smoke flows with CNN-based feature descriptors. ACM Trans. Graph., 36(4)(69), 2017.
Cui, Q., Sen, P., and Kim, T. Scalable laplacian eigenfluids. ACM Trans. Graph., 37(4):87:1–87:12, July 2018.
De Witt, T., Lessig, C., and Fiume, E. Fluid simulation using laplacian eigenfunctions. ACM Trans. Graph., 31(1):10:1–10:11, February 2012.
Gupta, M. and Narasimhan, S. G. Legendre fluids: A unified framework for analytic reduced space modeling and rendering of participating media. In Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '07, pp. 17–25. Eurographics Association, 2007.
Jones, A. D., Sen, P., and Kim, T. Compressing fluid subspaces. In Symposium on Computer Animation, pp. 77–84. ACM/Eurographics, 2016.
Kim, B., C. Azevedo, V., Thuerey, N., Kim, T., Gross, M., and Solenthaler, B. Deep Fluids: A generative network for parameterized fluid simulations. Computer Graphics Forum (Proc. Eurographics), 38(2), 2019.
Kim, T. and Delaney, J. Subspace fluid re-simulation. ACM Trans. Graph., 32(4):62:1–62:9, July 2013.
Ladický, L., Jeong, S., Solenthaler, B., Pollefeys, M., and Gross, M. Data-driven fluid simulations using regression forests. ACM Trans. Graph., 34(6):1–9, October 2015.
Liu, B., Mason, G., Hodgson, J., Tong, Y., and Desbrun, M. Model-reduced variational fluid simulation. ACM Trans. Graph., 34(6):244:1–244:12, October 2015.
Long, B. and Reinhard, E. Real-time fluid simulation using discrete sine/cosine transforms. In Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games, I3D '09, pp. 99–106. ACM, 2009.
Manteaux, P.-L., Vimont, U., Wojtan, C., Rohmer, D., and Cani, M.-P. Space-time sculpting of liquid animation. In Proceedings of the 9th International Conference on Motion in Games, MIG '16, pp. 61–71. ACM, 2016.
Morton, J., Jameson, A., Kochenderfer, M. J., and Witherden, F. Deep dynamical modeling and control of unsteady fluid flows. In Proceedings of Neural Information Processing Systems, 2018.
Radford, A., Metz, L., and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. Proc. ICLR, 2016.
Raveendran, K., Wojtan, C., Thuerey, N., and Turk, G. Blending liquids. ACM Trans. Graph., 33(4):137:1–137:10, July 2014.
Sato, S., Dobashi, Y., and Nishita, T. Editing fluid animation using flow interpolation. ACM Trans. Graph., 37(5):173:1–173:12, September 2018.
Selle, A., Fedkiw, R., Kim, B., Liu, Y., and Rossignac, J. An unconditionally stable MacCormack method. J. Sci. Comput., 35(2-3):350–371, June 2008.
Thuerey, N. Interpolations of smoke and liquid simulations. ACM Trans. Graph., 36(1):3:1–3:16, September 2016.
Thuerey, N. and Pfaff, T. MantaFlow, 2018.
Tompson, J., Schlachter, K., Sprechmann, P., and Perlin, K. Accelerating eulerian fluid simulation with convolutional networks. In Proceedings of the 34th ICML, Vol. 70, pp. 3424–3433. JMLR.org, 2017.
Treuille, A., Lewis, A., and Popović, Z. Model reduction for real-time fluids. In ACM SIGGRAPH 2006 Papers, SIGGRAPH '06, pp. 826–834. ACM, 2006.
Um, K., Hu, X., and Thuerey, N. Liquid splash modeling with neural networks. In Computer Graphics Forum, volume 37, pp. 171–182. Wiley Online Library, 2018.
Wicke, M., Stanton, M., and Treuille, A. Modular bases for fluid dynamics. ACM Trans. Graph., 28(3):39, August 2009.
Wiewel, S., Becher, M., and Thuerey, N. Latent space physics: Towards learning the temporal evolution of fluid flow. Computer Graphics Forum, 38(2):71–82, 2019.
Wu, J., Zhang, C., Xue, T., Freeman, B., and Tenenbaum, J. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information Processing Systems, pp. 82–90, 2016.
Xie, Y., Franz, E., Chu, M., and Thuerey, N. tempoGAN: A temporally coherent, volumetric GAN for super-resolution fluid flow. SIGGRAPH, 2018.
Yang, C., Yang, X., and Xiao, X. Data-driven projection method in fluid simulation. Computer Animation and Virtual Worlds, 27(3-4):415–424, May 2016.

Supplemental Document for Latent Space Subdivision: Stable and Controllable Time Predictions for Fluid Flow
A. Evaluation
A.1. Prediction Window Size
The prediction window w describes the number of consecutive time steps that are taken as input by the temporal prediction network. In our comparison we tested a range of window sizes for the consecutive input steps. The results in terms of PSNR values are displayed in Table 5 and Table 6.

Table 5: Prediction window w comparison, Vel.
w | PSNR u | PSNR ρ
Table 6: Prediction window w comparison, VelDen.
w | PSNR u | PSNR ρ

It becomes apparent that the prediction-only approach (VelDen) benefits from a larger input window, whereas the Vel approach with reinjected external information performs best with a smaller input window.
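Assembling the inputs for such a window-based predictor amounts to slicing the encoded sequence into overlapping windows of w consecutive latent codes, each paired with the following step as target. A minimal sketch with hypothetical names (the real inputs are the autoencoder's latent codes):

```python
import numpy as np

def make_windows(latent_seq, w):
    """Stack overlapping windows of w consecutive latent codes.

    latent_seq: array of shape (T, d), one latent code per time step.
    Returns (windows, targets): windows has shape (T - w, w, d); window i
    covers steps i..i+w-1 and is paired with step i+w as prediction target.
    """
    T, d = latent_seq.shape
    windows = np.stack([latent_seq[i:i + w] for i in range(T - w)])
    targets = latent_seq[w:]
    return windows, targets

codes = np.arange(10, dtype=np.float32).reshape(10, 1)  # toy sequence, T=10, d=1
X, y = make_windows(codes, w=3)
print(X.shape, y.shape)  # (7, 3, 1) (7, 1)
```

A larger w gives the recurrent predictor more temporal context per step, at the cost of fewer training pairs per sequence.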
A.2. Latent Space Split Percentage
We evaluated the impact of the latent space split percentage on three of our datasets. To this end, we trained multiple models with different split percentages on the individual datasets. The comparison for our moving smoke scene is shown in Table 7 and Table 8. The latter contains the results of the prediction-only evaluation (denoted VelDen), whereas the former presents the results of our reinjected density approach (denoted Vel). In this experiment all split versions are outperformed by the no-split version in the prediction-only setup, with PSNR values of . and . for velocity and density, respectively.

Table 7: LS split comparison, Vel; moving smoke.
LS Split | PSNR u | PSNR ρ
Table 8: LS split comparison, VelDen; moving smoke.
LS Split | PSNR u | PSNR ρ
Table 9: LS split comparison, Vel; rotating cup.
LS Split | PSNR u | PSNR ρ
Table 10: LS split comparison, VelDen; rotating cup.
LS Split | PSNR u | PSNR ρ
Table 11: LS split comparison, Vel; rotating and moving cup.
LS Split | PSNR u | PSNR ρ
Table 12: LS split comparison, VelDen; rotating and moving cup.
LS Split | PSNR u | PSNR ρ
Table 13: LSS and no-split comparison; rotating cup; time steps.
LS Split | Type | PSNR u | PSNR ρ
(rows: VelDen, VelDen, Vel)
In contrast, the networks trained on the rotating cup dataset behave differently, as shown in Table 9 and Table 10. The classic no-split version is outperformed by all other split versions in terms of density PSNR values in the prediction-only (VelDen) setup. In the reinjected density evaluation (Vel), the benefit of latent space splitting becomes even more apparent when comparing the velocity PSNR of . and density PSNR of . of the . network with the velocity PSNR of . and density PSNR of . of the no-split version.

A.3. Latent Space Subdivision vs. No-Split
In this section we present additional results for our rotating cup dataset. See the main document for a long-term comparison of LSS vs. no-split for our more complicated moving and rotating cup dataset. In Table 13 we compare the temporal prediction performance of a . (no-split) version and our . LSS version over a time horizon of simulation steps. Our LSS . version with a density PSNR value of . clearly outperforms the no-split version with a density PSNR value of . .

A.4. Generalization
Additionally, we show in Figure 8 that our method recovers from the removal of smoke in a sink region and is capable of predicting the fluid motion.
(Figure 8 rows: Prediction only, Reinjection (ours), GT)
Figure 8: A sink is placed in the upper right of our moving smoke scene; this was unseen during training. The prediction by our proposed method remains stable and realistic. In the second row, density reinjection was applied. In the top row, no external information was injected; thus, the sink cannot be processed by the network.
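Supplying such external information works by masking the density field before encoding it, so that the encoded density part of the latent space carries the sink or obstacle into the prediction network. A minimal sketch; the function and names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def mask_density(rho, region_mask):
    """Zero the smoke density inside an obstacle or sink region before
    encoding, so the density part of the latent space reflects it."""
    return np.where(region_mask, 0.0, rho)

rho = np.ones((4, 4))                 # toy density field
mask = np.zeros((4, 4), dtype=bool)
mask[0, 2:] = True                    # e.g. a sink in the upper right corner
masked = mask_density(rho, mask)
print(masked.sum())  # 14.0 (two of sixteen cells zeroed)
```

The masked field is then encoded and its latent density part overwrites the corresponding part of the predicted latent code.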
B. Fluid Flow Data
Our work concentrates on single-phase flows, modelled by a pressure-velocity formulation of the incompressible Navier-Stokes equations as highlighted in Section 2. We apply a classic NS solver to simulate our smoke flows based on Bridson (2015). In addition to Section 4, more information about the simulation procedure is provided in the following.
B.1. Simulation Setup
The linear system for pressure projection is solved with a conjugate gradient (CG) method. The CG solver accuracy is set to ·10⁻ for our moving smoke dataset, whereas an accuracy of ·10⁻ is utilized for the moving cup datasets. We generated all our datasets with a time step of . . Depending on the behavioral requirements of our different experiments with rising, hot and sinking, cold smoke, we use the Boussinesq model with the smoke density in combination with a gravity constant of (0.0, −·10⁻, 0.0) for the moving and rising smoke and (0.0, ·10⁻, 0.0) for the rotating cup dataset. To arrive at a more turbulent flow behavior, the gravity constant was set to (0.0, ·10⁻, 0.0) for our moving and rotating cup dataset. We do not apply other forces or additional viscosity; we purely rely on numerical diffusion to introduce viscosity effects.

In combination with the quantities required by our classic NS setup, namely flow velocity u, pressure p and density ρ, we also need a flag grid f, an obstacle velocity field u_obs and the corresponding obstacle levelset for our obstacle-supporting scenes. Our density ρ is passively advected within the flow velocity u.

To handle the obstacle movement accordingly, we calculate the obstacle velocity field by evaluating the displacement per mesh vertex from the previous to the current time step and applying the interpolated velocities to the corresponding grid cells of the obstacle velocity field. Afterwards, the obstacle velocity field values are averaged to represent a correct discretization.

In Algorithm 1 the simulation procedure of the moving smoke dataset is shown. For our obstacle datasets the procedure in Algorithm 2 is used, with the prediction algorithm given in Algorithm 3. Boundary conditions are abbreviated with BC in these algorithms.

Algorithm 1
Moving smoke simulation
while t → t + 1 do
    ρ ← applyInflowSource(ρ, s)
    ρ ← advect(ρ, u)
    u ← advect(u, u)
    f ← setWallBCs(f, u)
    u ← addBuoyancy(ρ, u, f, g)
    p ← solvePressure(f, u)
    u ← correctVelocity(u, p)
end while

Algorithm 2
Rotating and moving cup
while t → t + 1 do
    ρ ← applyInflowSource(ρ, s)
    ρ ← advect(ρ, u)
    u ← advect(u, u)
    u_obs ← computeObstacleVelocity(obstacle_t, obstacle_{t+1})
    f ← setObstacleFlags(obstacle_t)
    f ← setWallBCs(f, u, obstacle_t, u_obs)
    u ← addBuoyancy(ρ, u, f, g)
    p ← solvePressure(f, u)
    u ← correctVelocity(u, p)
end while

Algorithm 3 Rotating and moving cup network prediction
Vel
while t → t + 1 do
    ρ ← applyInflowSource(ρ, s)
    ρ ← advect(ρ, u)
    u ← advect(u, u)
    u_obs ← computeObstacleVelocity(obstacle_t, obstacle_{t+1})
    f ← setObstacleFlags(obstacle_t)
    f ← setWallBCs(f, u, obstacle_t, u_obs)
    ċᵗ ← encode(ũᵗ, ρᵗ)
    ĉᵗ ← [c̃ᵗ_vel, ċᵗ_den]
    c̃ᵗ⁺¹ ← predict(ĉᵗ⁻, ĉᵗ)
    ũᵗ⁺¹, ρ̃ᵗ⁺¹ ← decode(c̃ᵗ⁺¹)   {ρ̃ᵗ⁺¹ is not used}
    uᵗ⁺¹ ← ũᵗ⁺¹   {overwrite the velocity with the prediction}
end while

B.2. Training Datasets

In the following, multiple simulations contained in our training dataset are displayed.

Figure 9: Four example sequences of our moving smoke dataset. For visualization purposes we display frames to with a step size of for the respective scenes. The smoke density is shown in black.
Figure 10: Three example sequences of our rotating cup dataset. For visualization purposes we display frames to with a step size of for the respective scenes. The cup-shaped obstacle is highlighted in blue, whereas the smoke density is shown in black.
Figure 11: Three example sequences of our rotating and moving cup dataset. For visualization purposes we display frames to with a step size of 20
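The addBuoyancy step used in Algorithms 1–3 follows the Boussinesq approximation: the flow is accelerated proportionally to the smoke density along the (negative) gravity direction. A minimal sketch with hypothetical array shapes and a sign convention chosen so that positive density rises against gravity; the actual implementation is the solver's built-in plugin, not this code:

```python
import numpy as np

def add_buoyancy(u, rho, gravity, dt):
    """Boussinesq buoyancy: accelerate the flow proportionally to the smoke
    density, opposing the gravity direction so that hot smoke rises.

    u:       velocity field, shape (3, Nz, Ny, Nx)
    rho:     smoke density, shape (Nz, Ny, Nx)
    gravity: 3-component gravity constant, e.g. (0.0, -1.0e-3, 0.0)
    dt:      simulation time step
    """
    for axis, g in enumerate(gravity):
        u[axis] -= dt * g * rho  # force opposes gravity where smoke is present
    return u

u = np.zeros((3, 4, 4, 4))
rho = np.ones((4, 4, 4))
u = add_buoyancy(u, rho, gravity=(0.0, -1.0e-3, 0.0), dt=0.5)
print(float(u[1].max()))  # 0.0005
```

Flipping the sign of the y-component of the gravity constant, as done for the cup datasets above, turns the rising hot smoke into sinking cold smoke.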