Latent Space Subdivision: Stable and Controllable Time Predictions for Fluid Flow
Steffen Wiewel, Byungsoo Kim, Vinicius C. Azevedo, Barbara Solenthaler, Nils Thuerey
Abstract
We propose an end-to-end trained neural network architecture to robustly predict the complex dynamics of fluid flows with high temporal stability. We focus on single-phase smoke simulations in 2D and 3D based on the incompressible Navier-Stokes (NS) equations, which are relevant for a wide range of practical problems. To achieve stable predictions for long-term flow sequences, a convolutional neural network (CNN) is trained for spatial compression in combination with a temporal prediction network that consists of stacked Long Short-Term Memory (LSTM) layers. Our core contribution is a novel latent space subdivision (LSS) to separate the respective input quantities into individual parts of the encoded latent space domain. This allows us to distinctively alter the encoded quantities without interfering with the remaining latent space values and hence maximizes external control. By selectively overwriting parts of the predicted latent space points, our proposed method is capable of robustly predicting long-term sequences of complex physics problems. In addition, we highlight the benefits of a recurrent training on the latent space creation, which is performed by the spatial compression network.
1. Introduction
Computing the dynamics of fluids requires solving a set of complex equations over time. This process is computationally very expensive, especially when considering that the stability requirement poses a constraint on the maximal time step size that can be used in a simulation. Due to the high computational cost, approaches for machine learning based physics simulations have recently gained attention.

Department of Computer Science, Technical University of Munich, Germany; Department of Computer Science, ETH Zurich, Switzerland. Correspondence to: Steffen Wiewel
2. Related Work and Background
Our work concentrates on single-phase flows, which are usually modeled by a pressure-velocity formulation of the incompressible Navier-Stokes equations:

∂u/∂t + u · ∇u = −(1/ρ)∇p + ν∇²u + g,    ∇ · u = 0,    (1)

where p denotes pressure, u is the flow velocity, and ρ, ν, g denote density, kinematic viscosity, and external forces, respectively. The density ρ is passively advected by the velocity u. A complete overview of fluid simulation techniques in computer graphics can be found in (Bridson, 2015).

Data-driven flow modelling encompasses two distinguished and complementary efforts: dimensionality reduction and reduced-order modelling. Dimensionality reduction (e.g., Singular Value Decomposition) scales down the analyzed data into a set of important features in an attempt to increase the sparsity of the representation, while reduced-order modelling (e.g., Galerkin Projection) describes the spatial and temporal dynamics of a system represented by a set of reduced parameters. In computer graphics, the work of Treuille et al. (2006) was the first to use Principal Component Analysis (PCA) for dimensionality reduction coupled with a Galerkin Projection method for subspace simulation. This approach was later extended (Kim & Delaney, 2013) with a cubature approach for enabling Semi-Lagrangian and MacCormack (Selle et al., 2008) advection schemes, while improving the handling of boundary conditions. Reduced-order modelling was also explored to accelerate the pressure projection step in liquid simulations (Ando et al., 2015). Instead of computing reduced representations from pre-simulated velocity fields, alternative basis functions can be used for reduced-order modelling; examples of basis functions include Legendre Polynomials (Gupta & Narasimhan, 2007), modular (Wicke et al., 2009) and spectral (Long & Reinhard, 2009) representations. Also Laplacian Eigenfunctions have been successfully employed for dimensionality reduction, due to their natural sparsity and inherent incompressibility.
De Witt et al. (2012) combined Laplacian Eigenfunctions with a Galerkin Projection method, enabling fast and energy-preserving fluid simulations. The approach was extended to handle arbitrarily-shaped domains (Liu et al., 2015), combined with a Discrete Cosine Transform (DCT) for compression (Jones et al., 2016), and improved for scalability (Cui et al., 2018).

The aforementioned methods for data-driven flow modelling use linear basis functions for dimensionality reduction. This enables the use of Galerkin Projection for subspace integration, but it limits the power of the reconstruction when compared to non-linear embeddings. The latent spaces generated by autoencoders (AE) are non-linear and richly capture the input space with fewer variables (Radford et al., 2016; Wu et al., 2016). In light of that, Wiewel et al. (2019) combined a latent space representation with recurrent neural networks (RNN) to predict the temporal evolution of fluid functions in the latent space domain. Kim et al. (2019) introduced a generative deep neural network for parameterized fluid simulations that only takes a small set of physical parameters as input to very efficiently synthesize points in the learned parameter space. Their method also proposes an extension to latent space integration by training a fully-connected neural network that maps subsequent latent spaces. Our work is related to these two methods, but a main difference is that we use an end-to-end training of both the spatial compression and the temporal prediction. In combination with our latent space subdivision, our predictions are more stable, while previous approaches fail to properly recover long-term integration correspondences due to the lack of autoencoder regularization.

For particle-based fluid simulations, a temporal state prediction using Regression Forests was presented in Ladicky et al. (2015).
Handcrafted features are evaluated in particle neighborhoods and serve as input to the regressor, which then predicts the particle velocity of the next time step. Machine learning has also been used in the context of grid-based (Eulerian) fluid simulations. Tompson et al. (2017) used a convolutional neural network (CNN) to model spatial dependencies in conjunction with an unsupervised loss function formulation to infer pressure fields. A simpler three-layer fully-connected neural network for the same goal was likewise proposed (Yang et al., 2016). As an alternative, learned time evolutions for Koopman operators were proposed (Morton et al., 2018), which however employ a pre-computed dimensionality reduction via PCA. Chu et al. (2017) enhance coarse fluid simulations to generate highly detailed turbulent flows. Individual low-resolution fluid patches were tracked and mapped to high-resolution counterparts via learned descriptors. Xie et al. (2018) extended this approach by using a conditional generative adversarial network with spatio-temporal discriminator supervision. Small-scale splash details in hybrid fluid solvers were targeted with deep learning-based stochastic models (Um et al., 2018). For a review of machine learning applied to fluid mechanics we refer the reader to (Brunton et al., 2019).

In the context of data-driven flow modelling, methods to interpolate between existing data have been presented. Raveendran et al. (2014) presented a technique to smoothly blend between pre-computed liquid simulations, which was later extended with more controllability (Manteaux et al., 2016). Thuerey et al. (2016) used dense space-time deformations represented by grid-based signed-distance functions for interpolation of smoke and liquids, and Sato et al. (2018) interpolated velocity fields by minimizing an energy functional.
3. Method
The central goal of our models is to robustly and accurately predict long-term sequences of flow dynamics. For this, we need an autoencoder to translate high-dimensional physics fields into a compressed representation (latent space) and a temporal prediction network to advance the state of the simulation over time. A key observation is that if these two network components are trained individually, neither component has a holistic view on the underlying problem. The autoencoder, consisting of an encoder E and a decoder D, generates a compressed representation c = E(x), which focuses solely on the reconstruction x̃ = D(c) of the given input x. Hence, the loss function to minimize is given by ‖x − x̃‖. Without considering the aspect of time, the autoencoder's latent space only stores spatial descriptors. Due to the exclusive focus on space, temporally consecutive data points are not necessarily placed close to each other in the latent space domain. This poses substantial challenges for the temporal prediction network.

Therefore, we consider the aspect of time within the training of the autoencoder in order to shape its latent space with respect to temporal information, in addition to the spatial information. Thus, we propose an end-to-end training procedure, where we train our autoencoder and temporal prediction network simultaneously by internally connecting the latter as a recurrent block to the encoding and decoding blocks of the spatial autoencoder. As a result, the latent space domain is aware of temporal changes, and can yield temporally coherent latent space points that are suitable for the time prediction network. By default, our training process includes the combined training of our spatial autoencoder and temporal prediction network as shown in Figure 1. In this figure, the encoder E, decoder D and prediction network P are duplicated for visualization purposes. In the next sections, we describe each individual network in more detail.
The spatial encoding of the data is performed by a regular autoencoder, the network for which is split into an encoding and a decoding part (Kim et al., 2019). The encoder contains convolution layers with skip connections, which connect its internal layers, followed by one fully-connected layer. The decoder consists of a fully-connected layer, which is followed by convolution layers with skip connections. For the fluid simulation dataset used in this work, the input is either 2D or 3D, leading to the usage of 2D or 3D convolutions and a correspondingly sized feature dimension. Furthermore, a data-specific curl layer is appended to the decoder network to enforce zero divergence in the resulting velocity field (Kim et al., 2019), as required by the NS equations (see Equation 1).

Figure 1: Combined training of autoencoder and temporal prediction. The prediction network's input window is set to w = 2. Thus, the count of recurrent predictions is n_i = n − w. The LSS is enforced by applying the split loss on c̄_den and c̆_vel for velocity and density, respectively. A direct AE reconstruction loss is only performed on x̂.

The dimensionality of the latent space c for a given x is defined by the final layer of the encoder and can be freely chosen. We pass a velocity field u as well as a density field ρ to our encoder, i.e., x = [u, ρ]. The velocity field is an active quantity that is used in fluid simulations to advect a passive quantity forward in time. In the case of smoke simulations, which are a specific instance of fluid simulations, the passive density field is advected by the flow velocity. As a result, c contains information about both the active and passive fields, i.e., velocity and density.
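The effect of such a curl layer can be illustrated in 2D with a minimal numpy sketch. The stream-function input and the finite-difference discretization below are assumptions for illustration (the paper's layer operates on the decoder's output tensors); the point is that a velocity field derived as the curl of a scalar field is divergence-free by construction:

```python
import numpy as np

def curl_2d(phi):
    """Velocity from a scalar stream function: (u, v) = (dphi/dy, -dphi/dx).
    A field constructed this way is divergence-free by construction."""
    u = np.gradient(phi, axis=0)     # d(phi)/dy, rows indexed by y
    v = -np.gradient(phi, axis=1)    # -d(phi)/dx, cols indexed by x
    return u, v

def divergence_2d(u, v):
    """Discrete divergence with matching finite differences."""
    return np.gradient(u, axis=1) + np.gradient(v, axis=0)

phi = np.random.default_rng(0).standard_normal((32, 32))
u, v = curl_2d(phi)
div = divergence_2d(u, v)   # zero up to floating-point rounding
```

Because the mixed finite-difference derivatives commute, the discrete divergence vanishes up to rounding error for any input, which is why a decoder ending in such a layer can only emit incompressible velocity fields.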
Since we can accurately advect the passive density quantity with a given velocity field at low computational cost, it makes sense to compute the advection outside of the network and project the new state into the latent space.

In order to be able to alter active and passive fields individually in the compressed representation c, with a given input field x = [x_vel, x_den], where the subscripts vel and den denote the velocity and density part, we subdivide c into separate parts for velocity c_vel and density c_den, respectively. This property of the latent space is needed for projecting the new state of the passive quantity into the latent space domain. Additionally, to exert explicit external control over the prediction, we designate another part of c to contain supervised parameters, called c_sp (Kim et al., 2019). In the case of, e.g., a smoke simulation with a rotating cup filled with smoke, such supervised parameters can be the position of a smoke source or the rotation angle of a solid obstacle. With this subdivision, we increase the stability of our predictions and allow for explicit external control. This subdivision is described in Equation 2, where v, d, and sp denote the indices of the velocity, density, and supervised parameter parts in the latent space domain, respectively:

c = [c_vel ∼ c_den | c_sp],    (2)

where c_vel = [c_1, ..., c_v], c_den = [c_{v+1}, ..., c_d], and c_sp = [c_{d+1}, ..., c_{sp}].

To arrive at the desired subdivision, the split loss L_split (see Equation 3) is used as a loss function in the training process. It is modelled as a soft constraint and thereby does not enforce the parts c_vel and c_den to be strictly disjunct. The loss is defined as

L_split(c, I_s, I_e) = Σ_{i=I_s}^{I_e} ‖c_i‖.    (3)

Since we divide the latent space into three parts, L_split is applied twice.
For the velocity part c_vel, the indices I_s = v+1 and I_e = d are chosen to indicate that the density part must not be used when encoding velocities, i.e., L_split(c, v+1, d). In contrast, for the density part c_den, the velocity part is indicated by choosing the indices I_s = 0 and I_e = v, i.e., L_split(c, 0, v). First, only the velocity part x_vel of input x is encoded (the density part x_den is zero, i.e., x̄ = [x_vel, 0]), yielding c̄_vel. Vice versa, the density part x_den of input x is encoded, whereas the velocity part x_vel is replaced with zeros, i.e., x̆ = [0, x_den], resulting in c̆_den. Therefore, the loss is applied twice, for the c̄ and c̆ encodings, as L_split(c̄, v+1, d) and L_split(c̆, 0, v), respectively.

In order to exert external control over the prediction, c_sp is enforced to contain parameters describing certain attributes of the simulation. While training the network, an additional soft constraint is applied, which forces the encoder to produce the supervised parameters. The soft constraint is implemented as the mean-squared error between the values generated by the encoder ĉ_sp and the ground truth data c_sp and constitutes the supervised loss L_sup as

L_sup(c_sp, ĉ_sp) = ‖c_sp − ĉ_sp‖².    (4)

Additionally, an AE loss L_AE (Equation 5) is applied to the decoded field x̂. It forces the velocity part of the decoded field to be close to the input velocity by applying the mean-absolute error. To take the rate of change of the velocities into consideration as well, the mean-absolute error of the velocities' gradient is added to the formulation. In contrast, the density part is handled by directly applying the mean-squared error on the decoded output density and the input. The AE loss is thereby defined as

L_AE(x, x̂) = ‖x_vel − x̂_vel‖₁ + ‖∇x_vel − ∇x̂_vel‖₁ + ‖x_den − x̂_den‖².    (5)

The prediction network performs a temporal transformation of its input to the temporally consecutive state. The inputs are a series of w consecutive input states. The prediction network block contains two recurrent LSTM layers, followed by two 1D-convolutional layers. In our case of 2D smoke simulations, two consecutive latent space points are used as input. Those are fed to the prediction layers and result in one latent space point, called the residuum Δc_t. Afterwards, the residuum is added to the last input state to arrive at the next consecutive state, i.e., c̃_{t+1} = c_t + Δc_t.

Due to the subdivision capability of our autoencoder, our temporal prediction network supports external influence over the predictions it generates. After each prediction, it is possible to replace or update information without the need of re-encoding the predicted data. Instead, only parts of the predicted latent space point can be replaced, enabling fine-grained control over the flow. For example, in the case of smoke simulations, the passive smoke density quantity can be overwritten with an externally updated version, i.e., the c_den part is replaced by its externally computed counterpart. This allows for adding new smoke sources or modifying the current flow by removing smoke from certain parts of the simulation domain.

Considering the prediction input window w, which can be chosen freely, and the desired internal iteration count n_i = n − w, the additive prediction error is taken into consideration for the prediction network P while training, i.e., it is traversed n_i times. This leads to a combined training loss of AE and P defined as

L = L_AE,direct + L_sup + L_split,vel + L_split,den + Σ_{n=0}^{n_i} L_AE,pred_n,    (6)

where L_AE is applied to the corresponding pairs of the decoded outputs x̃_0 ... x̃_{n_i} of the prediction network P and their corresponding ground truths x_0 ... x_{n_i}.
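The split loss from Equation 3 can be sketched with a toy latent vector. The layout indices V and D and the use of squared magnitudes as the per-entry norm are assumptions for illustration, not values from the paper:

```python
import numpy as np

# Hypothetical latent layout: indices [0, V) hold the velocity part,
# [V, D) the density part, and [D, :) the supervised parameters.
V, D = 8, 14

def split_loss(c, i_s, i_e):
    """L_split: soft penalty on latent entries in the slice [i_s, i_e)."""
    return float(np.sum(c[i_s:i_e] ** 2))

# c_bar: encoding of a velocity-only input -> its density slice is penalized.
c_bar = np.zeros(16); c_bar[:V] = 1.0
# c_breve: encoding of a density-only input -> its velocity slice is penalized.
c_breve = np.zeros(16); c_breve[V:D] = 1.0

loss_vel = split_loss(c_bar, V, D)    # L_split(c_bar, v+1, d)
loss_den = split_loss(c_breve, 0, V)  # L_split(c_breve, 0, v)
```

Both losses are zero exactly when the two encodings keep out of each other's slice, which is the (soft) disjointness the training encourages.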
Thereby, our final loss is the sum of all the previously presented losses. In our combined training approach, both networks update their weights by applying holistic gradient evaluations, i.e., they are trained end-to-end. The benefit of the end-to-end training is that the spatial autoencoder AE also incorporates temporal aspects when updating its weights. In addition, by recurrently applying L_AE on the predictions, the prediction network P is trained to actively minimize accumulating additive errors. To incorporate temporal awareness in the autoencoder, the decoder block is connected to the individual prediction outputs and is thereby reused several times in one network traversal. The recurrent usage of the decoding block is commonly known as weight sharing (Bromley et al., 1993). Furthermore, by applying the prediction losses on the decoded predictions, the spatial autoencoder adapts to the changes induced by the temporal prediction as well, which furthers the focus of the autoencoder on producing latent spaces suitable for temporal predictions. As a result, the prediction network is capable of robustly and accurately predicting long-term sequences of complex fluid flows.
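The residual prediction step and its recurrent rollout during training can be sketched as follows, with a toy callable standing in for the LSTM/convolution block P:

```python
import numpy as np

def rollout(seed, P, steps, w=2):
    """Recurrent latent rollout as used in training: the prediction network P
    maps a window of w latent points to a residual delta_c, and the next
    state is the last input plus that residual (c_{t+1} = c_t + delta_c)."""
    seq = [np.asarray(s, dtype=float) for s in seed]   # w seed states
    for _ in range(steps):
        window = seq[-w:]
        delta_c = P(window)            # stands in for the LSTM + conv block
        seq.append(window[-1] + delta_c)
    return seq

# Toy predictor: extrapolate linearly from the last two states.
P_lin = lambda win: win[-1] - win[-2]
seq = rollout([np.zeros(4), np.ones(4)], P_lin, steps=3)
# seq holds latent points with values 0, 1, 2, 3, 4 in every component
```

Because each prediction is fed back as input to the next step, errors made early in the rollout propagate, which is exactly why applying the loss over all n_i internal iterations during training matters.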
4. Training Datasets
The datasets we used to train our networks contain randomized smoke flows simulated with an open source framework (Thuerey & Pfaff, 2018). In total, three different scene setups were used to capture a wide range of complex physical behavior. The first scene contains a moving smoke inflow source that generates hot smoke continuously, which is rising and producing complex swirls (see Figure 2).

Figure 2: Example sequences of our 2D datasets: moving smoke (top), rotating (center) and moving cup (bottom). The smoke density is shown in black and the cup-shaped obstacle in blue.

The second and third scenes simulate cold smoke in a cup-shaped obstacle. The former rotates the cup randomly around a fixed axis, while the latter additionally applies a translation (see Figure 2). The rising smoke and the rotating cup scene each expose one control parameter, i.e., movement on the x-axis and rotation around the z-axis, whereas the rotating and moving cup scene exposes both of these control parameters. Each of the three datasets in 2D contains 200 randomly generated scenes with 600 consecutive frames. Additionally, the moving smoke as well as the rotating and moving cup dataset was generated in 3D with 100 randomly generated scenes and 600 consecutive frames (see Table 1).

Table 1: Statistics of our datasets.
Scene Type                      Scenes  Frames
Moving Smoke (2D)               200     600
Rotating Cup (2D)               200     600
Rotating and Moving Cup (2D)    200     600
Moving Smoke (3D)               100     600
Rotating and Moving Cup (3D)    100     600
5. Evaluation
In this section, we compare our architecture to the baseline of previous work. We also perform an ablation study on different settings of our proposed architecture to compare their respective influence on the output. We compute the mean peak signal-to-noise ratio (PSNR) for all our comparisons, i.e., larger values are better. For each case, we measure the accuracy of our prediction w.r.t. density and velocity in terms of PSNR for ten simulation setups that were not seen during training.

For a thorough evaluation, we supply two prediction approaches. First, we evaluate a regular prediction approach with no reinjection of physical information (denoted
VelDen) that is in sync with previous work (Kim et al., 2019; Wiewel et al., 2019). This approach is formulated as

c̃_t = P(c̃_{t−1}, c̃_{t−2}),    x̃_t = [x̃_vel^t, x̃_den^t] = D(c̃_t),    (7)

where the previously predicted latent space points c̃_{t−1} and c̃_{t−2} are used to evaluate the next time step c̃_t. Afterwards, c̃_t is decoded via D(c̃_t) and the density part x̃_den^t is directly displayed, i.e., no external physical information about the system state is influencing the output of our VelDen benchmarks.

In the second approach, we make use of our LSS to reinject the advected density into the prediction, to benefit from well-understood physical computations that keep our predictions stable and can be performed in a fast manner. The prediction process utilizing our LSS is denoted as
Vel and is given as

c̃_t = P(ĉ_{t−1}, ĉ_{t−2}),
x̃_t = [x̃_vel^t, x̃_den^t] = D(c̃_t),
ẋ_den^t = Adv(ẋ_den^{t−1}, x̃_vel^t),    (8)
ċ_t = E([x̃_vel^t, ẋ_den^t]),
ĉ_t = [c̃_vel^t, ċ_den^t],
x̂_t = [x̃_vel^t, ẋ_den^t],

where we use the decoded predicted velocity x̃_vel^t to advect the simulation density ẋ_den^{t−1} and reinject its encoded form into our latent space ĉ_t. The new latent space representation ĉ_t is thereby formed by concatenating the newly encoded density ċ_den^t and the predicted encoded velocity field c̃_vel^t. By reinjecting the advected density field ẋ_den^t, we inform the prediction network about boundary conditions as well as other known physical information that is external to the prediction state. In the following we will ablate different aspects of our method to evaluate their respective influence on the final results.

Latent Space Temporal Awareness
The temporal awareness of our spatial autoencoder is evaluated in this section, since it has a significant impact on the performance of our temporal prediction network. In Figure 3 we evaluate three networks trained with different loss functions in terms of the stability of the latent space they generate for sequences. For each of the plots, frames of different smoke simulations were encoded to the latent space domain with an autoencoder. The resulting latent space points were normalized with their respective maximum to the range of [−1, 1] and afterwards transformed to three dimensions with PCA. The supervised part was removed before normalization. For our comparison we chose autoencoders with a fixed latent space dimension.

Figure 3: Spatial encodings of frames of different smoke simulations. The latent space points are normalized to their respective maximum and processed with PCA for visualization purposes. Each color stands for a single simulation and represents a series of frames. (a) No temporal constraints or supervised parameters. (b) No temporal constraints but with supervised parameters. (c) With temporal constraints and supervised parameters (ours).

Figure 4: Long-term prediction: Our method robustly predicts the fluid flow, whereas the regular prediction method (no-split) fails to capture the movement and even produces unphysical behavior. Compare the PSNR values in Table 3.

The results in Figure 3a were generated with a classic AE that was trained to only reproduce its input, i.e., only a direct loss on the output (Equation 5) was applied. For this classic AE no temporal constraints were imposed, and no supervised parameters were added to the latent space. The resulting PCA decomposition shows a very uneven distribution: large distances between consecutive points exist in some situations, whereas a large part of the samples are placed closely together in a cluster.

Figure 5: A circular solid obstacle, unseen during training, is placed in the upper right. Our proposed method (center) predicts stable and realistic motion. The prediction-only approach (top) is not able to capture the obstacle.

When adding the supervised parameter loss (Equation 4) in the second evaluation (Kim et al., 2019), the trajectories become more ordered, as shown in Figure 3b, but still exhibit a noticeably uneven distribution. Thus, the supervised parameter, despite being excluded from the PCA, has a noticeable influence on the latent space construction.

In Figure 3c, the results of the AE trained with our proposed time-aware end-to-end training method are shown. This AE applies the supervised parameter loss (Equation 4) as well as the direct loss on the output (Equation 5) and was trained in combination with the temporal prediction network P as described in Section 3. The visualization of the PCA decomposition shows that a strong temporal coherence of the consecutive data points emerges. This visualization indicates why our prediction networks yield substantial improvements in terms of accuracy: the end-to-end training provides the autoencoder with gradients that guide it towards learning a latent space mapping that is suitable for temporal predictions.

Figure 6: An additional inflow, unseen in training, is placed during prediction. In contrast to our approach (center), the prediction-only approach (top) is not able to capture the second inflow.

Intuitively, changes over time require relatively small and smooth updates, which results in the visually more regular curves shown in Figure 3c.
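The visualization procedure described above, per-point normalization followed by a PCA projection, can be sketched as follows. The exact normalization used for the figures may differ, so treat this as an assumed variant:

```python
import numpy as np

def pca_project(points, k=3):
    """Project latent trajectories onto the k leading principal components
    for plotting, via an SVD of the centered data."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T     # rows of vt are principal directions

# Toy latent trajectory: normalize each point by its maximum magnitude,
# then reduce to three dimensions as in the figure.
rng = np.random.default_rng(1)
latents = rng.standard_normal((100, 16))
normalized = latents / np.abs(latents).max(axis=1, keepdims=True)
proj = pca_project(normalized, k=3)
```

Plotting `proj` per simulation then reveals whether consecutive frames form smooth curves (a temporally coherent latent space) or scattered clusters.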
Simple LS Division vs. LSS
A simple approach to arrive at a latent space with a clear subdivision in terms of input quantities is to use two separate spatial AEs for the individual input quantities. After encoding, the two resulting latent space points c_vel = E_vel(u) and c_den = E_den(ρ) can be concatenated, yielding c_simple = [c_vel, c_den]. In contrast to this simple approach, our LSS directly encodes both quantities with a single AE as c_LSS = E([u, ρ]) and enforces the subdivision with a soft constraint. The combined training with the prediction network is performed identically for both spatial compression versions. It becomes apparent from the results in Table 2b) that the network trained with our soft constraint outperforms the simple split variant, especially when reinjecting the simulation density in the Vel benchmarks, where our method yields better PSNR values for both u and ρ. The reason for this is that the simple split version cannot take advantage of synergistic effects in the input data, since both input quantities are encoded in separate AEs. In contrast, our method uses the synergies of the physically interconnected velocity and density fields and robustly predicts future time steps.

Internal Iterations
We compare the internal iteration count of the prediction network in the training process in Table 2a). By performing multiple recurrent evaluations of our temporal network already in the training process, we minimize the additive error build-up of many consecutive predictions. Fighting the additive prediction errors over a long time horizon is important to arrive at a robust and exact predicted sequence. We compared three internal iteration counts; it becomes apparent that the network trained with the most internal iterations, and thereby the longest prediction horizon, is superior in both evaluations. It should be noted that the predictions with reinjection of the physical density field (Vel) have a lower error on the density than the prediction-only (VelDen) approach. This supports the usefulness of our proposed latent space subdivision, which is needed to reinject external information.

Table 2: Evaluations for Vel and VelDen predictions; rotating and moving cup: a) Internal predictions n_i; b) Simple split vs. LSS; c) LS dimensionality comparison.

                            Vel                 VelDen
                        PSNR u  PSNR ρ      PSNR u  PSNR ρ
b) Simple Split         26.95   16.35
c) |c| = 16             30.28   17.66       29.11   17.12
   |c| = 32             30.40   17.64

Latent Space Dimensionality

The latent space dimensionality has a major impact on the resulting weight count of the autoencoder as well as on the complexity of the temporal predictions, and thereby their difficulty. In the following we compare different latent space sizes (see Table 2c). When it comes to prediction only (VelDen), the PSNR is better for a larger latent space dimensionality. In contrast to this observation, the PSNR value is on the same level for all latent space sizes when the simulation density is reinjected (Vel). For this reason we used a moderate latent space dimensionality for all further comparisons. Due to buoyancy, the velocity and density of our smoke simulations are loosely coupled. Thus, additional weights do not increase the overall performance when the reconstruction of the individual parts of the respective input quantities has already converged.

Latent Space Subdivision vs. No-Split
To evaluate the usefulness of our LSS method, which supports reinjection of external information, we compare it to a classic network setup that does not, and instead performs a regular prediction. When only performing a regular prediction, the step-wise prediction errors accumulate. Without the reinjection of the externally driven density field into our prediction process, the quality of the outcome decreases drastically. Due to the buoyancy-based coupling of density and velocity, this effect intensifies.

In Table 3 we compare the long-term temporal prediction performance of a no-split version and our LSS version over a long horizon of simulation steps. The split number corresponds to the fraction of the latent space designated for the encoded velocity. Since the no-split version contains a classic AE without any latent space constraints, the evaluation can only be performed for the prediction-only (VelDen) benchmark, as the reinjection of density is not possible with the classic AE. Our LSS version clearly outperforms the no-split version in terms of density PSNR. Due to the reinjection capabilities of our method, the resulting prediction remains stable, whereas the classic no-split approach fails to capture the flow throughout the prediction horizon and even produces unphysical behavior (see Figure 4).

Table 3: LSS and no-split comparison; rotating and moving cup.

LS Split Type    Benchmark    PSNR u    PSNR ρ
no-split         VelDen
LSS (ours)       VelDen
LSS (ours)       Vel
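The reinjection used in the Vel benchmarks (Equation 8) can be sketched as a loop over latent states. The encoder, decoder, predictor and advection operator below are stand-in callables, and the split index V is an assumed example value, not the paper's configuration:

```python
import numpy as np

V = 8   # assumed split index: c[:V] is the velocity part, c[V:] the density part

def vel_rollout(c_hist, x_den, E, D, P, Adv, steps):
    """Sketch of the Vel scheme: predict in latent space, decode the velocity,
    advect the density outside the network, then re-encode it and splice it
    back into the predicted latent point (the LSS reinjection)."""
    frames = []
    for _ in range(steps):
        c_t = P(c_hist[-2], c_hist[-1])     # latent prediction
        x_vel, _ = D(c_t)                   # keep only the decoded velocity
        x_den = Adv(x_den, x_vel)           # external advection step
        c_dot = E(x_vel, x_den)             # re-encode with the new density
        c_hat = np.concatenate([c_t[:V], c_dot[V:]])
        c_hist = [c_hist[-1], c_hat]        # feed the corrected state back in
        frames.append((x_vel, x_den))
    return frames

# Stand-in callables with a trivially structured latent space:
E = lambda v, d: np.concatenate([v, d])
D = lambda c: (c[:V], c[V:])
P = lambda c0, c1: c1                        # identity predictor
Adv = lambda d, v: d                         # no-op advection
c0, c1 = np.zeros(16), np.arange(16.0)
frames = vel_rollout([c0, c1], np.ones(8), E, D, P, Adv, steps=3)
```

The key design point visible here is that only the density slice of the predicted latent point is overwritten; the predicted velocity slice is left untouched, so external information enters without re-encoding the whole state.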
6. Results
We demonstrate the effectiveness of the subdivided latent space with several generalization tests. As shown in Figure 5, our method is capable of predicting the fluid motion even when an obstacle is placed in the domain. Due to our split latent space, the obstacle can be passively injected into the prediction process by supplying an encoded density field with a masked-out obstacle region. We replaced the density part of the latent space with its encoded state after advecting it with our predicted velocity field. The prediction without injection of external information does not capture the obstacle and thereby deviates from the ground truth. In Figure 6 we show that our method is capable of predicting the fluid motion even when a new inflow region is added that was not seen during training. Our network performs reasonably well in all tested experiments due to the reinjection capabilities provided by the latent space subdivision, even though the generalization scenes were not part of the training data.

Figure 7: Four different 3D moving smoke predictions. The movement is completely predicted by our proposed method.

Table 4: Average timing of a simulation step computed via regular pressure solve and our Vel prediction scheme. While the former scales with data complexity, ours scales linearly with the domain dimension. Averages over multiple scenes and simulation steps. Measured on an Intel(R) Xeon(R) E5-1650 v3 (3.50 GHz) and an Nvidia GeForce RTX 2070.

Scene           Resolution  Type        Solve [s]  Total [s]
Rot. mov. cup   3D          Simulation  0.891      0.960
Rot. mov. cup   3D          Prediction
Mov. smoke      3D          Simulation  0.472      0.537
Mov. smoke      3D          Prediction
Rot. mov. cup   2D          Simulation  0.041      0.044
Rot. mov. cup   2D          Prediction
Rot. cup        2D          Simulation  0.018      0.019
Rot. cup        2D          Prediction
To demonstrate the capabilities of our method, we trained a 3D version of our network on the moving smoke scene; selected frames are shown in Figure 7. Additionally, we compared the runtime performance of our networks to the regular solver that was used for generating the training data (see Table 4). Even though we need to decode and encode the density field for our reinjection method, and thereby copy it between GPU and CPU memory, we still arrive at a performance measure of . seconds for an average prediction step in our 3D scene. For comparison, a traditional multi-threaded CPU-based solver takes . seconds on average for a simulation step in the same scenes.
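The reinjection that makes these long-term predictions stable amounts to overwriting the density part of the predicted latent code with a freshly encoded density state. A minimal sketch of that overwrite; the latent size, split index, and variable names are illustrative assumptions, not the trained networks:

```python
import numpy as np

LATENT_DIM = 16
SPLIT = 11  # first SPLIT entries hold the velocity part (ratio is an assumption)

def reinject_density(c_pred, c_density):
    """Overwrite the density part of a predicted latent code with a freshly
    encoded density state, keeping the predicted velocity part untouched."""
    c = c_pred.copy()
    c[SPLIT:] = c_density[SPLIT:]
    return c

c_pred = np.zeros(LATENT_DIM)     # latent point from the temporal predictor
c_density = np.ones(LATENT_DIM)   # latent code of the advected density field
c_hat = reinject_density(c_pred, c_density)
print(c_hat[:SPLIT].sum(), c_hat[SPLIT:].sum())  # 0.0 5.0
```

Because the two quantities occupy disjoint parts of the latent space, the overwrite cannot disturb the encoded velocity, which is what makes this kind of external control possible.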
7. Conclusion & Future Work
We have demonstrated an approach for subdividing latent spaces in a controlled manner to improve the generalization and long-term stability of physics predictions. In combination with our time-aware end-to-end training, which learns a reduced representation together with the time prediction, this makes it possible to predict sequences with several hundred roll-out steps. In addition, our trained networks can be evaluated very efficiently and yield significant speed-ups compared to traditional solvers. As future work, we believe it will be interesting to further extend the generalizing capabilities of our network such that it covers a wider range of physical behavior. It will also be interesting to explore different architectures to reduce the hardware requirements for training large 3D models with our approach.
Acknowledgements
This work was supported by the ERC Starting Grant realFlow (StG2015637014) and the Swiss National Science Foundation (grant no. 200021_168997). Source code and video: https://ge.in.tum.de/publications/latent-space-subdivision/

References
Ando, R., Thürey, N., and Wojtan, C. A dimension-reduced pressure solver for liquid simulations. Comput. Graph. Forum, 34(2):473–480, May 2015.
Bridson, R. Fluid Simulation for Computer Graphics. CRC Press, 2015.
Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., and Shah, R. Signature verification using a "siamese" time delay neural network. In Proceedings of the 6th International Conference on Neural Information Processing Systems, NIPS'93, pp. 737–744, 1993.
Brunton, S., Noack, B., and Koumoutsakos, P. Machine learning for fluid mechanics. May 2019. URL http://arxiv.org/abs/1905.11075.
Chu, M. and Thuerey, N. Data-driven synthesis of smoke flows with CNN-based feature descriptors. ACM Trans. Graph., 36(4)(69), 2017.
Cui, Q., Sen, P., and Kim, T. Scalable laplacian eigenfluids. ACM Trans. Graph., 37(4):87:1–87:12, July 2018.
De Witt, T., Lessig, C., and Fiume, E. Fluid simulation using laplacian eigenfunctions. ACM Trans. Graph., 31(1):10:1–10:11, February 2012.
Gupta, M. and Narasimhan, S. G. Legendre fluids: A unified framework for analytic reduced space modeling and rendering of participating media. In Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '07, pp. 17–25. Eurographics Association, 2007.
Jones, A. D., Sen, P., and Kim, T. Compressing fluid subspaces. In Symposium on Computer Animation, pp. 77–84. ACM/Eurographics, 2016.
Kim, B., C. Azevedo, V., Thuerey, N., Kim, T., Gross, M., and Solenthaler, B. Deep Fluids: A generative network for parameterized fluid simulations. Computer Graphics Forum (Proc. Eurographics), 38(2), 2019.
Kim, T. and Delaney, J. Subspace fluid re-simulation. ACM Trans. Graph., 32(4):62:1–62:9, July 2013.
Ladický, L., Jeong, S., Solenthaler, B., Pollefeys, M., and Gross, M. Data-driven fluid simulations using regression forests. ACM Trans. Graph., 34(6):1–9, October 2015.
Liu, B., Mason, G., Hodgson, J., Tong, Y., and Desbrun, M. Model-reduced variational fluid simulation. ACM Trans. Graph., 34(6):244:1–244:12, October 2015.
Long, B. and Reinhard, E. Real-time fluid simulation using discrete sine/cosine transforms. In Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games, I3D '09, pp. 99–106. ACM, 2009.
Manteaux, P.-L., Vimont, U., Wojtan, C., Rohmer, D., and Cani, M.-P. Space-time sculpting of liquid animation. In Proceedings of the 9th International Conference on Motion in Games, MIG '16, pp. 61–71. ACM, 2016.
Morton, J., Jameson, A., Kochenderfer, M. J., and Witherden, F. Deep dynamical modeling and control of unsteady fluid flows. In Proceedings of Neural Information Processing Systems, 2018.
Radford, A., Metz, L., and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. Proc. ICLR, 2016.
Raveendran, K., Wojtan, C., Thuerey, N., and Turk, G. Blending liquids. ACM Trans. Graph., 33(4):137:1–137:10, July 2014.
Sato, S., Dobashi, Y., and Nishita, T. Editing fluid animation using flow interpolation. ACM Trans. Graph., 37(5):173:1–173:12, September 2018.
Selle, A., Fedkiw, R., Kim, B., Liu, Y., and Rossignac, J. An unconditionally stable MacCormack method. J. Sci. Comput., 35(2-3):350–371, June 2008.
Thuerey, N. Interpolations of smoke and liquid simulations. ACM Trans. Graph., 36(1):3:1–3:16, September 2016.
Thuerey, N. and Pfaff, T. MantaFlow, 2018.
Tompson, J., Schlachter, K., Sprechmann, P., and Perlin, K. Accelerating eulerian fluid simulation with convolutional networks. In Proceedings of the 34th ICML, Vol. 70, pp. 3424–3433. JMLR.org, 2017.
Treuille, A., Lewis, A., and Popović, Z. Model reduction for real-time fluids. In ACM SIGGRAPH 2006 Papers, SIGGRAPH '06, pp. 826–834. ACM, 2006.
Um, K., Hu, X., and Thuerey, N. Liquid splash modeling with neural networks. In Computer Graphics Forum, volume 37, pp. 171–182. Wiley Online Library, 2018.
Wicke, M., Stanton, M., and Treuille, A. Modular bases for fluid dynamics. ACM Trans. Graph., 28(3):39, August 2009.
Wiewel, S., Becher, M., and Thuerey, N. Latent space physics: Towards learning the temporal evolution of fluid flow. Computer Graphics Forum, 38(2):71–82, 2019.
Wu, J., Zhang, C., Xue, T., Freeman, B., and Tenenbaum, J. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information Processing Systems, pp. 82–90, 2016.
Xie, Y., Franz, E., Chu, M., and Thuerey, N. tempoGAN: A temporally coherent, volumetric GAN for super-resolution fluid flow. SIGGRAPH, 2018.
Yang, C., Yang, X., and Xiao, X. Data-driven projection method in fluid simulation. Computer Animation and Virtual Worlds, 27(3-4):415–424, May 2016.

Supplemental Document for Latent Space Subdivision: Stable and Controllable Time Predictions for Fluid Flow
A. Evaluation
A.1. Prediction Window Size
The prediction window w describes the number of consecutive time steps that are taken as input by the temporal prediction network. In our comparison we tested a range of window sizes for the consecutive input steps. The results in terms of PSNR values are displayed in Table 5 and Table 6.

Table 5: Prediction window w comparison, Vel.
w | PSNR u | PSNR ρ
Table 6: Prediction window w comparison, VelDen.
w | PSNR u | PSNR ρ

It becomes apparent that the prediction-only approach (VelDen) benefits from a larger input window, whereas the Vel approach with reinjected external information performs best with a smaller input window.
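Assembling the inputs for such a window-based predictor amounts to slicing the encoded sequence into overlapping windows of w consecutive latent codes, each paired with the following step as target. A minimal sketch with hypothetical names (the real inputs are the autoencoder's latent codes):

```python
import numpy as np

def make_windows(latent_seq, w):
    """Stack overlapping windows of w consecutive latent codes.

    latent_seq: array of shape (T, d), one latent code per time step.
    Returns (windows, targets): windows has shape (T - w, w, d); window i
    covers steps i..i+w-1 and is paired with step i+w as prediction target.
    """
    T, d = latent_seq.shape
    windows = np.stack([latent_seq[i:i + w] for i in range(T - w)])
    targets = latent_seq[w:]
    return windows, targets

codes = np.arange(10, dtype=np.float32).reshape(10, 1)  # toy sequence, T=10, d=1
X, y = make_windows(codes, w=3)
print(X.shape, y.shape)  # (7, 3, 1) (7, 1)
```

A larger w gives the recurrent predictor more temporal context per step, at the cost of fewer training pairs per sequence.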
A.2. Latent Space Split Percentage
We evaluated the impact of the latent space split percentage on three of our datasets. To this end, we trained multiple models with different split percentages on the individual datasets. The comparison for our moving smoke scene is shown in Table 7 and Table 8. The latter contains the results of the prediction-only evaluation (denoted VelDen), whereas the former presents the results of our reinjected density approach (denoted Vel). In this experiment all split versions are outperformed by the no-split version in the prediction-only setup, with PSNR values of . and . for velocity and density, respectively.

Table 7: LS split comparison, Vel; moving smoke.
LS Split | PSNR u | PSNR ρ
Table 8: LS split comparison, VelDen; moving smoke.
LS Split | PSNR u | PSNR ρ
Table 9: LS split comparison, Vel; rotating cup.
LS Split | PSNR u | PSNR ρ
Table 10: LS split comparison, VelDen; rotating cup.
LS Split | PSNR u | PSNR ρ
Table 11: LS split comparison, Vel; rotating and moving cup.
LS Split | PSNR u | PSNR ρ
Table 12: LS split comparison, VelDen; rotating and moving cup.
LS Split | PSNR u | PSNR ρ
Table 13: LSS and no-split comparison; rotating cup; time steps.
LS Split | Type | PSNR u | PSNR ρ
(rows: VelDen, VelDen, Vel)
In contrast, the networks trained on the rotating cup dataset behave differently, as shown in Table 9 and Table 10. The classic no-split version is outperformed by all other split versions in terms of density PSNR values in the prediction-only (VelDen) setup. In the reinjected density evaluation (Vel), the benefit of latent space splitting becomes even more apparent when comparing the velocity PSNR of . and density PSNR of . of the . network with the velocity PSNR of . and density PSNR of . of the no-split version.

A.3. Latent Space Subdivision vs. No-Split
In this section we present additional results for our rotating cup dataset. See the main document for a long-term comparison of LSS vs. no-split for our more complicated moving and rotating cup dataset. In Table 13 we compare the temporal prediction performance of a . (no-split) version and our . LSS version over a time horizon of simulation steps. Our LSS . version with a density PSNR value of . clearly outperforms the no-split version with a density PSNR value of . .

A.4. Generalization
Additionally, we show in Figure 8 that our method recovers from the removal of smoke in a sink region and is capable of predicting the fluid motion.
(Figure 8 rows: Prediction only, Reinjection (ours), GT)
Figure 8: A sink is placed in the upper right of our moving smoke scene; this was unseen during training. The prediction by our proposed method remains stable and realistic. In the second row, density reinjection was applied. In the top row, no external information was injected; thus, the sink cannot be processed by the network.
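Supplying such external information works by masking the density field before encoding it, so that the encoded density part of the latent space carries the sink or obstacle into the prediction network. A minimal sketch; the function and names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def mask_density(rho, region_mask):
    """Zero the smoke density inside an obstacle or sink region before
    encoding, so the density part of the latent space reflects it."""
    return np.where(region_mask, 0.0, rho)

rho = np.ones((4, 4))                 # toy density field
mask = np.zeros((4, 4), dtype=bool)
mask[0, 2:] = True                    # e.g. a sink in the upper right corner
masked = mask_density(rho, mask)
print(masked.sum())  # 14.0 (two of sixteen cells zeroed)
```

The masked field is then encoded and its latent density part overwrites the corresponding part of the predicted latent code.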
B. Fluid Flow Data
Our work concentrates on single-phase flows, modelled by a pressure-velocity formulation of the incompressible Navier-Stokes equations as highlighted in Section 2. We apply a classic NS solver to simulate our smoke flows based on Bridson (2015). In addition to Section 4, more information about the simulation procedure is provided in the following.
B.1. Simulation Setup
The linear system for pressure projection is solved with a conjugate gradient (CG) method. The CG solver accuracy is set to ·10⁻ for our moving smoke dataset, whereas an accuracy of ·10⁻ is utilized for the moving cup datasets. We generated all our datasets with a time step of . . Depending on the behavioral requirements of our different experiments with rising, hot and sinking, cold smoke, we use the Boussinesq model with the smoke density in combination with a gravity constant of (0.0, −·10⁻, 0.0) for the moving and rising smoke and (0.0, ·10⁻, 0.0) for the rotating cup dataset. To arrive at a more turbulent flow behavior, the gravity constant was set to (0.0, ·10⁻, 0.0) for our moving and rotating cup dataset. We do not apply other forces or additional viscosity; we purely rely on numerical diffusion to introduce viscosity effects.

In combination with the quantities required by our classic NS setup, namely flow velocity u, pressure p and density ρ, we also need a flag grid f, an obstacle velocity field u_obs and the corresponding obstacle levelset for our obstacle-supporting scenes. Our density ρ is passively advected within the flow velocity u.

To handle the obstacle movement accordingly, we calculate the obstacle velocity field by evaluating the displacement per mesh vertex from the previous to the current time step and applying the interpolated velocities to the corresponding grid cells of the obstacle velocity field. Afterwards, the obstacle velocity field values are averaged to represent a correct discretization.

In Algorithm 1 the simulation procedure of the moving smoke dataset is shown. For our obstacle datasets the procedure in Algorithm 2 is used, with the prediction algorithm given in Algorithm 3. Boundary conditions are abbreviated with BC in these algorithms.

Algorithm 1
Moving smoke simulation
while t → t + 1 do
    ρ ← applyInflowSource(ρ, s)
    ρ ← advect(ρ, u)
    u ← advect(u, u)
    f ← setWallBCs(f, u)
    u ← addBuoyancy(ρ, u, f, g)
    p ← solvePressure(f, u)
    u ← correctVelocity(u, p)
end while

Algorithm 2
Rotating and moving cup
while t → t + 1 do
    ρ ← applyInflowSource(ρ, s)
    ρ ← advect(ρ, u)
    u ← advect(u, u)
    u_obs ← computeObstacleVelocity(obstacle_t, obstacle_{t+1})
    f ← setObstacleFlags(obstacle_t)
    f ← setWallBCs(f, u, obstacle_t, u_obs)
    u ← addBuoyancy(ρ, u, f, g)
    p ← solvePressure(f, u)
    u ← correctVelocity(u, p)
end while

Algorithm 3 Rotating and moving cup network prediction
Vel
while t → t + 1 do
    ρ ← applyInflowSource(ρ, s)
    ρ ← advect(ρ, u)
    u ← advect(u, u)
    u_obs ← computeObstacleVelocity(obstacle_t, obstacle_{t+1})
    f ← setObstacleFlags(obstacle_t)
    f ← setWallBCs(f, u, obstacle_t, u_obs)
    ċᵗ ← encode(ũᵗ, ρᵗ)
    ĉᵗ ← [c̃ᵗ_vel, ċᵗ_den]
    c̃ᵗ⁺¹ ← predict(ĉᵗ⁻, ĉᵗ)
    ũᵗ⁺¹, ρ̃ᵗ⁺¹ ← decode(c̃ᵗ⁺¹)   {ρ̃ᵗ⁺¹ is not used}
    uᵗ⁺¹ ← ũᵗ⁺¹   {overwrite the velocity with the prediction}
end while

B.2. Training Datasets

In the following, multiple simulations contained in our training dataset are displayed.

Figure 9: Four example sequences of our moving smoke dataset. For visualization purposes we display frames to with a step size of for the respective scenes. The smoke density is shown in black.
Figure 10: Three example sequences of our rotating cup dataset. For visualization purposes we display frames to with a step size of for the respective scenes. The cup-shaped obstacle is highlighted in blue, whereas the smoke density is shown in black.
Figure 11: Three example sequences of our rotating and moving cup dataset. For visualization purposes we display frames to with a step size of 20
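The addBuoyancy step used in Algorithms 1–3 follows the Boussinesq approximation: the flow is accelerated proportionally to the smoke density along the (negative) gravity direction. A minimal sketch with hypothetical array shapes and a sign convention chosen so that positive density rises against gravity; the actual implementation is the solver's built-in plugin, not this code:

```python
import numpy as np

def add_buoyancy(u, rho, gravity, dt):
    """Boussinesq buoyancy: accelerate the flow proportionally to the smoke
    density, opposing the gravity direction so that hot smoke rises.

    u:       velocity field, shape (3, Nz, Ny, Nx)
    rho:     smoke density, shape (Nz, Ny, Nx)
    gravity: 3-component gravity constant, e.g. (0.0, -1.0e-3, 0.0)
    dt:      simulation time step
    """
    for axis, g in enumerate(gravity):
        u[axis] -= dt * g * rho  # force opposes gravity where smoke is present
    return u

u = np.zeros((3, 4, 4, 4))
rho = np.ones((4, 4, 4))
u = add_buoyancy(u, rho, gravity=(0.0, -1.0e-3, 0.0), dt=0.5)
print(float(u[1].max()))  # 0.0005
```

Flipping the sign of the y-component of the gravity constant, as done for the cup datasets above, turns the rising hot smoke into sinking cold smoke.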