Multi-fidelity Generative Deep Learning Turbulent Flows
Nicholas Geneva a, Nicholas Zabaras a,∗

a Center for Informatics and Computational Science, University of Notre Dame, 311 Cushing Hall, Notre Dame, IN 46556, USA
Abstract
In computational fluid dynamics, there is an inevitable trade-off between accuracy and computational cost. Low-fidelity simulations with coarse discretizations are computationally inexpensive; however, the resulting flow fields are often inaccurate. Alternatively, high-fidelity simulations can yield accurate predictions but at exponentially higher computational cost. In this work, a novel multi-fidelity deep generative model is introduced for the surrogate modeling of high-fidelity turbulent flow fields given the solution of a computationally inexpensive but inaccurate low-fidelity solver. The resulting surrogate is able to generate physically accurate turbulent realizations at a computational cost magnitudes lower than that of a high-fidelity simulation. The deep generative model developed is a conditional invertible neural network, built with normalizing flows, with recurrent LSTM connections that allow for stable training of transient systems with high predictive accuracy. The model is trained with a variational loss that combines both data-driven and physics-constrained learning. This deep generative model is applied to non-trivial high Reynolds number flows governed by the Navier-Stokes equations, including turbulent flow over a backwards facing step at different Reynolds numbers and a turbulent wake behind an array of bluff bodies. For both of these examples, the model is able to generate unique yet physically accurate turbulent fluid flows conditioned on an inexpensive low-fidelity solution.
Keywords:
Physics-Informed Machine Learning, Multi-fidelity Modeling, Invertible Deep Neural Networks, Uncertainty Quantification, Turbulent Fluid Flow

∗ Corresponding author
Email addresses: [email protected] (Nicholas Geneva), [email protected] (Nicholas Zabaras)
URL: https://cics.nd.edu/ (Nicholas Zabaras)
Preprint submitted to Elsevier June 9, 2020

1. Introduction

The numerical simulation and analysis of turbulent fluid flows is of great importance to many scientific and engineering domains. Over the past several decades, computational fluid dynamics (CFD) has become an integral component of academia and industry. However, high-accuracy fluid simulation remains a computationally demanding task, particularly at high Reynolds numbers for which the flow is turbulent. This has led to a hierarchy of simulation models to predict fluid flow, ranging from the fast but typically inaccurate Reynolds-Averaged Navier-Stokes (RANS) to the fully resolved but super-computer demanding direct numerical simulation (DNS) [1]. Large-eddy simulation (LES) has become a workhorse method for scientific and industrial analysis since it can achieve both reasonable accuracy and computational requirements. Often only a section of the entire simulation domain is of interest or requires a greater degree of accuracy. Such examples include boundary layers, turbulent wakes behind suspended or wall-mounted objects, the interface between two fluids, shock boundaries, etc. This principle that different physical scales are of interest in different locations of the simulation domain has led to the development of various multiscale/multilevel methods [2]. Multiscale methods typically combine simulations at different resolutions to increase the accuracy of the simulation with minimal computational overhead.

Multiscale computational fluid dynamic methods constitute a rich and well-developed field that encompasses many different methodologies that approach the multiscale aspect through different philosophies. Of particular interest are adaptive multilevel methods, which focus on resolving different scales based on the complexity of the fluid flow [2].
Such approaches use a hierarchy of grids at various resolutions to resolve particular areas of the simulation domain at various levels of accuracy. This includes methods that use self-adaptive meshes in which the discretization of the simulation domain is evolved to meet specific resolution criteria [3, 4, 5], global hybrid methods such as very large eddy simulation (VLES) [6] or detached eddy simulation (DES) [7], and zonal methods for which prespecified regions of the flow domain are resolved with higher accuracy to capture relevant physics [8, 9, 10]. We take inspiration from these multiscale models to develop a deep learning model that takes advantage of simulations run at multiple scales to predict high-fidelity turbulent fluid flow. This deep learning model replaces costly high-fidelity simulation, enabling us to obtain fast yet accurate turbulent statistics given a coarse simulation.

Machine learning in CFD, specifically the modeling of the Navier-Stokes (N-S) equations, has gained growing interest in recent years with a wide variety of methods ranging from Kalman filters to deep neural networks. These applications can be broken down into several major categories including: RANS turbulence modeling, LES sub-scale grid modeling, flow control and direct flow prediction. Machine learning based turbulence modeling for RANS simulation seeks to approximate the Reynolds-stress term in the RANS equations at an accuracy that is higher than the traditionally used closure models through the incorporation of prior physical knowledge and high-fidelity information [11, 12, 13, 14, 15]. Similarly, machine learning LES models seek to achieve the same goal of providing a sub-scale grid model that predicts the contribution of neglected turbulent length scales at a higher accuracy than the traditional methods [16, 17, 18]. These approaches are both very promising; however, they still rely on pre-existing physical assumptions, approximations and resolutions which fundamentally limit their predictive capability [19].
Another area of interest has been the use of machine learning models to build a controller to yield a particular fluid response [20, 21].

The final category we discuss is direct fluid flow prediction, where the machine learning model is used to predict the state variables of the fluid flow directly. This includes the use of machine learning to approximate fluid flows for graphical simulations [22, 23, 24], prediction of steady-state flows [25, 26], prediction of oscillating/unsteady flows [21, 27, 28, 29], and the super-resolution, compression or reproduction of various fluid systems [30, 31, 32, 33]. While machine learning has become a popular tool to predict the behavior of fluids, we note that the majority of the test cases considered are focused on simple non-turbulent problems. Many works that predict turbulent flows are largely focused on qualitative results (e.g. computer graphics). This is expected due to the sheer complexity of N-S turbulence, which poses a challenging problem for even traditional numerical methods, let alone machine learning models. Given that the vast majority of fluid flows of interest are turbulent in nature, much work is still needed to push the application of machine learning to practical fluid flow problems of engineering concern.

In this work, we accelerate the prediction of high-fidelity turbulent flows given a computationally inexpensive low-fidelity simulation through generative deep learning. Although similar ideas have been presented in past literature, the proposed model differs in several respects. First, we are interested in the prediction of physical turbulent fluid flow governed by the Navier-Stokes equations, differing from the simpler inviscid Euler equations used in computer graphics [22, 23, 24]. Second, in a similar spirit, we are interested in recovering accurate time-averaged and turbulent statistics as opposed to fluid flows that are just visually pleasing.
Third, in this work the input of our model is an inexpensive low-fidelity simulation that provides a coarse yet fairly inaccurate prediction. This contrasts with many works in machine learning for turbulent applications where compressed [30, 31] or sub-sampled [32, 33] fields of the high-fidelity target are used as the input. Some auto-regressive models, such as the deep neural network (DNN) in [29], are in fact even more dependent on a high-fidelity simulation, which is needed to start the time-series prediction. The reliance upon a direct/coarsened high-fidelity field as a model input provides much richer and more accurate information than a low-fidelity simulation, since it is being sampled from a space for which the physics simulated is significantly more precise. While this makes the machine learning problem significantly easier, it also results in models that need an expensive high-fidelity simulation to derive an input for making predictions. Thus the applicability of such models remains questionable. Fourth, we are interested in developing a surrogate model that can be used to predict multiple flows with different boundary conditions as opposed to just learning a single flow, which is essential for justifying the model's training cost. Lastly, in contrast to past deterministic approaches [30, 31, 33], our generative model learns the probability distribution of high-fidelity flow fields conditioned on the low-fidelity simulation, allowing for predictive probabilistic estimates.

This paper makes the following novel contributions to the integration of deep learning with CFD: (a) A multi-fidelity deep generative model is proposed for the prediction of physical high-fidelity fluid flow from a low-fidelity solution.
(b) A novel invertible neural network architecture is proposed to model the distribution of possible high-fidelity fluid flow solutions conditioned on the low-fidelity observation. (c) A backwards Kullback-Leibler (KL) divergence loss is used that allows for physics-constrained and standard data-driven training of the generative model. (d) The model is deployed and evaluated for surrogate modeling of turbulent flows at different Reynolds numbers and varying boundary conditions.

The remainder of this paper is structured as follows. In Section 2, the problem of multi-fidelity generative modeling of turbulent fluid flows is introduced and discussed. In Section 3, the generative invertible neural network architecture is introduced with details of each component of the model. Following in Section 4, the variational training of the generative model is outlined as well as the tuning of the model's hyper-parameters. The first numerical example, in Section 5, investigates the surrogate modeling of turbulent flow over a backwards facing step at different Reynolds numbers. The second numerical example, in Section 6, focuses on the prediction of a turbulent wake behind an array of bluff bodies in varying locations. In Section 7, the computational cost of both training and testing the proposed deep learning model is discussed. Lastly, conclusions and discussion are provided in Section 8. All code, trained models and data used in this work are open-sourced for full reproducibility (code available upon publication).

2. Multi-fidelity Generative Modeling of Fluid Flows

Multiscale fluid simulation methods seek to strike an ideal balance between predictive accuracy and computational requirements. In particular, zonal/hybrid methods couple a low-fidelity simulation with a high-fidelity simulation that is only evaluated in an area of interest. This is most commonly done through the use of RANS or unsteady RANS in the low-fidelity region and LES in the high-fidelity region.
Here, we consider the use of a very large eddy simulation (VLES) (an LES in which the majority of the kinetic energy is unresolved due to a coarse grid) and an LES on a finer mesh for the low- and high-fidelity areas, respectively. As depicted in Fig. 1a, this results in two coupled simulations which are solved simultaneously, with information being passed through the boundary of the high-fidelity simulation domain. The objective in this work is to replace this high-fidelity simulation zone with a fast generative deep learning model which can quickly predict a high-fidelity realization given the low-fidelity simulation, as illustrated in Fig. 1b. We refer to this framework as a multi-fidelity generative model due to the distinctly different physical scales resolved by the input and output. While the scope of the numerical examples explored in this work is focused on the use of low-fidelity and high-fidelity LES simulations, everything discussed in this work can be extended to other multi-fidelity models using different coarse/fine simulation schemes. We also note that there is no limit on the size of the prediction area by the deep learning model, i.e. it can be the entire simulation domain if necessary. However, in this work, we are motivated by engineering needs where such zonal approaches are extremely applicable.

(a) Hybrid VLES-LES [34]. (b) Multi-fidelity deep generative turbulence.
Figure 1: Comparison between traditional hybrid VLES-LES simulation (left) and the proposed multi-fidelity deep generative turbulence model (right) for studying the wake behind a wall-mounted cube.
Simply learning a single solution of a PDE with a deep learning model has little practical benefit when a numerical solver exists, due to the time and computational investment needed to tune and train the model. Thus we are interested in surrogate modeling of turbulence in multiple flows with varying boundary conditions (e.g. obstacle position or inlet velocity). This is of particular interest for various engineering tasks including fluid-structure design/optimization, inverse modeling and uncertainty quantification. To formalize the problem of interest, consider an incompressible flow governed by the Navier-Stokes (N-S) equations:

∂u_j/∂t + u_i ∂u_j/∂x_i = −(1/ρ) ∂p/∂x_j + ν_eff ∂²u_j/(∂x_i ∂x_i),  x ∈ Ω,  t ∈ [0, T],

u_i(x, 0) = u_0(x),  p(x, 0) = p_0(x),  B(u_i, p) = b(x, t),  x ∈ Γ,  (1)

where {u_j, p} are the velocity components and pressure, respectively, being resolved within the spatial domain Ω. ν_eff is the effective kinematic viscosity, which can represent the true viscosity, in the case of DNS, as well as turbulent dissipation from length scales not resolved, in the case of LES and URANS. Γ denotes the boundary of the domain of interest for which the boundary operator B imposes the desired boundary conditions. The initial state of the system is defined by {u_0, p_0}.

As depicted in Fig. 1b, we wish to build a deep generative model to infer from a low-fidelity flow field the corresponding high-fidelity realizations. Due to their past success for modeling physical systems [35, 36, 37], we choose to use a convolution-based generative model with learnable parameters θ. The use of convolutions implies that the data is placed onto a structured Euclidean grid, akin to that of pixels in images. For example, given a two-dimensional incompressible fluid flow field, the prediction of a single time-step n would have a low-fidelity input x^n = {u_l, p_l} ∈ R^{3×d_l×d_l} and a high-fidelity output y^n = {u_h, p_h} ∈ R^{3×d_h×d_h}, both of which span the same domain Ω′ ⊂ Ω as depicted by the dashed boxes in Fig. 1b. Although omitted, this is easily extendable to one- and three-dimensional fluid flows as well. Given that d_l < d_h, this requires the model to predict length scales not recovered by the coarse simulation, making this problem ill-posed and motivating the need for a generative probabilistic model to predict the density p(y^n | x^n) as opposed to a single deterministic solution.

Remark 1.
The inclusion of a low-fidelity simulator as an input to the deep learning surrogate allows for important information regarding the boundary conditions of the flow and approximate flow properties to be provided to the model. This simplifies the learning task significantly by providing a physical coarse estimate of the flow, which is important for the prediction of the solution of the highly non-linear N-S equations at high Reynolds numbers. While we have shown in our past work that deep learning surrogates can successfully model many complex physical systems independently [35, 36, 37], the systems of interest in these works are far less complex than the turbulent N-S equations.

Given that turbulence is a transient phenomenon, predicting a single time-step is not sufficient; thus we wish to predict an entire high-fidelity time-series Y = {y^1, y^2, ..., y^N} given the respective low-fidelity observations X = {x^1, x^2, ..., x^N}. Although extensions can be made to simplify the model presented, we will assume that the time-step size, Δt, of the low-fidelity input and high-fidelity output is the same (i.e. for each input there is one output). The objective for this surrogate is fluid flow applications for which the boundary conditions are stochastic, such that b(x̂, t) ∼ p(b), where p(b) is an empirical, analytically known or unknown probability distribution. This spans problems including the modeling of a flow at different Reynolds numbers, different domain boundary conditions, flow through varying geometries or different initial conditions, making this relevant to a vast number of fluid mechanics research studies. Given that we wish to predict an entire time-series of high-fidelity realizations, Y, we pose the following definition for the generative multi-fidelity surrogate for flows with a stochastic boundary.

Definition 2.1. Generative Surrogate for Flows with Stochastic Boundary Conditions.
Consider low- and high-fidelity simulators that compute fluid flow governed by the N-S equations. For a given finite set of boundary conditions {b(x̂, t)_i}_{i=1}^M ∼ p(b), these simulators are used to collect a training set of low- and high-fidelity simulation data D = {X_i, Y_i}_{i=1}^M in the time interval t ∈ [0, T]. The problem of interest is training a generative surrogate to learn p_θ(Y | X) and compute the predictive conditional density p_θ(Y* | X*, D) of the high-fidelity flow field Y* for any low-fidelity flow time-series X* for a given boundary condition b*(x, t) ∼ p(b).
3. Transient Multi-fidelity Glow
Deep generative models provide a flexible probabilistic framework, with the most fundamental formulation centered around the use of random latent variables, z, in a deep learning model (i.e. a neural network) to allow for the likelihood of the model's output, y, to be expressed as the following marginal:

p_θ(y) = ∫ p_θ(y | z) p_θ(z) dz, (2)

in which θ denotes the model's parameters. In this work, the model's output, y, is the high-fidelity flow field we wish to predict; however, in this particular section y should be interpreted as a much more abstract output encompassing a wide variety of machine learning problems. The latent variables are specifically designed such that their distribution is simple to sample from. However, this marginal is typically not practical to train due to the large number of samples needed from p_θ(z) to approximate the marginalization. Hence, generative models such as variational auto-encoders (VAEs) [38] as well as generative adversarial networks (GANs) [39] approximate this likelihood through variational inference or by a min-max adversarial game, respectively.

In this work, we will utilize normalizing flows, which in recent years have gained increasing attention due to their extension to invertible neural networks (INNs) for tasks such as variational inference and generative modeling [40, 41, 42, 43, 44]. Generative normalizing flows provide a bijective mapping between an unknown likelihood density of the observations p_θ(y) and a known latent density p_θ(z). Typically, p_θ(y) can be viewed as the unknown likelihood of a system for which we have a finite number of observations, i.e. training data. Let us consider a mapping with a tractable Jacobian determinant, henceforth referred to as the Jacobian, which allows for the likelihood to be expressed w.r.t.
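To see why training through the marginal in Eq. (2) directly is impractical, consider a simple Monte Carlo estimate of it. The sketch below is our own illustration (the `toy_likelihood` model is hypothetical, not part of the proposed method): the estimate converges only at the O(1/sqrt(n)) Monte Carlo rate, so many latent samples are needed for each likelihood evaluation.

```python
import numpy as np

def marginal_likelihood_mc(y, likelihood, n_samples, rng):
    """Monte Carlo estimate of Eq. (2): p(y) ~= mean over z of p(y|z), z ~ p(z).

    Here p(z) is taken as a standard normal latent density.
    """
    z = rng.standard_normal(n_samples)
    return np.mean(likelihood(y, z))

# Toy conditional likelihood (illustrative only): p(y|z) = N(y; z, 1).
# With z ~ N(0, 1), the exact marginal is N(y; 0, 2).
def toy_likelihood(y, z):
    return np.exp(-0.5 * (y - z) ** 2) / np.sqrt(2.0 * np.pi)
```

Even for this one-dimensional toy problem, tens of thousands of latent samples are required for a few digits of accuracy, which motivates the variational and flow-based alternatives discussed next.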
the latent density as follows:

p_θ(y) = p_θ(z) |det(∂z/∂y)|, (3)

which is nothing more than the change of variables formula. This implies that the model can be trained by maximizing the likelihood of p_θ(y) (unknown) through the latent variables assigned a simple distribution p_θ(z) a-priori (typically Gaussian). As depicted in Fig. 2a, we use f_θ(·) to denote the learnable function, with a tractable Jacobian, that transforms observations to latent variables. To generate samples y_i ∼ p_θ(y), samples are drawn from the latent distribution, z_i ∼ p_θ(z), which are then transformed using the inverse of the model, f_θ^{-1}(·).

However, the requirement of a tractable Jacobian as well as a function that can be efficiently inverted for sampling is not trivial. Normalizing flows address this challenge by using a series of change of variable transformations [45, 46],

y ←f_θ1→ h_1 ←f_θ2→ h_2 ... ←f_θK→ z, (4)

each of which has a tractable Jacobian and is invertible. This allows for the log of the likelihood to be written as a summation of Jacobians:

log p_θ(y) = log p_θ(z) + Σ_{k=1}^K log |det(∂h_k/∂h_{k−1})|, (5)

in which h_0 ≡ y and h_K ≡ z.

The core ideas of normalizing flows can be extended to constructing generative deep neural network models. A particular subset of flow-based deep learning models we are interested in are coupling layer normalizing flows, first proposed in NICE [40] and Real NVP [41]. A coupling layer is a carefully designed function such that the inverse mapping and the Jacobian can be easily calculated. These layers can then be stacked, just like layers of a neural network, to form an expressive model with a tractable Jacobian and inverse.
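The change of variables formula in Eqs. (3)-(5) can be made concrete with a minimal single-transformation flow. The sketch below is our own toy example (not the TM-Glow implementation): an elementwise affine map z = exp(log s) ⊙ y + t with a standard normal latent density, for which the log-Jacobian reduces to sum(log s), i.e. Eq. (5) with K = 1.

```python
import numpy as np

def affine_flow_log_likelihood(y, log_s, t):
    """Exact log p(y) for the map z = exp(log_s) * y + t with z ~ N(0, I).

    The Jacobian of an elementwise affine map is diagonal, so its
    log-determinant is simply sum(log_s).
    """
    z = np.exp(log_s) * y + t                        # forward pass f_theta(y)
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2.0 * np.pi)
    return log_pz + np.sum(log_s)                    # change of variables

def affine_flow_sample(rng, log_s, t):
    """Draw y ~ p(y): sample z from the latent density, apply f_theta^{-1}."""
    z = rng.standard_normal(log_s.shape)
    return (z - t) / np.exp(log_s)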
To increase the expressive capabilities of normalizing flow models, various transformations have been proposed to envelope coupling layers, such as invertible 1×1 convolutions [42]. Coupling layers have also been extended with conditional variants that model conditional densities, p_θ(y | x), for standard machine learning problems and surrogate modeling of physical systems [36, 47].

(a) INN (b) CINN (c) TM-Glow

Figure 2: Comparison of the forward and backward passes of various INN structures including (left to right) the standard INN, conditional INN (CINN) [36] and transient multi-fidelity Glow (TM-Glow) introduced in Section 3.2.
As discussed in Section 2, we are interested in the prediction of a high-fidelity flow, Y = {y^1, y^2, ..., y^N}, given the corresponding solution of a low-fidelity simulation, X = {x^1, x^2, ..., x^N}. In our past work [37], we formulated a deep convolutional auto-regressive model for modeling the evolution of a transient PDE as a Markov chain. To increase the predictive capability of our model and integrate the low-fidelity observations, in this work we will use a deep recurrent neural network (RNN) formulation, which is a standard approach for time-series predictions in deep learning [48]. While our model will still predict a single time-step at a time, latent information is passed between time-steps that the model can learn. The computational graph of this RNN with recurrent features τ^n is depicted in Fig. 3. The likelihood for the entire time-series can be decomposed as follows:

p_θ(Y | X) = Π_{n=1}^N p_θ(y^n | x^n, ..., x^1) = Π_{n=1}^N p_θ(y^n | x^n, τ^{n−1}), (6)

in which the recurrent features, τ^{n−1}, carry information from the past time-steps x^{n−1}, ..., x^1 [48]. This requires some initialization for τ^0 to be defined. In this work, these states are made random with a known distribution, as discussed in Section 3.3.1; however, there are many alternatives in the RNN literature, such as making them constant (i.e. a delta density function) or making them learnable parameters.

Figure 3: Unfolded computational graph of a recurrent neural network model for which the arrows show functional dependence.
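The factorization in Eq. (6) amounts to accumulating per-step conditional log-likelihoods while threading the recurrent state forward. A minimal sketch of this accumulation, with a hypothetical Gaussian `toy_step` standing in for one TM-Glow forward pass:

```python
import numpy as np

def sequence_log_likelihood(step_log_prob, X, Y, tau0):
    """Accumulate log p(Y|X) = sum_n log p(y^n | x^n, tau^{n-1}), per Eq. (6).

    `step_log_prob(y, x, tau)` returns the per-step conditional log-probability
    and the next recurrent state tau^n.
    """
    total, tau = 0.0, tau0
    for x, y in zip(X, Y):
        log_p, tau = step_log_prob(y, x, tau)
        total += log_p
    return total

# Illustrative step model only: the recurrent state is a running summary of
# past inputs that shifts a unit-variance Gaussian predictive density.
def toy_step(y, x, tau):
    mean = 0.5 * x + 0.5 * tau
    log_p = -0.5 * (y - mean) ** 2 - 0.5 * np.log(2.0 * np.pi)
    return log_p, mean
```

The same loop structure applies whatever per-step density is used; only `step_log_prob` changes when the toy Gaussian is replaced by an invertible network.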
Implementing the RNN framework, our goal is to develop a generative model that can learn the conditional density, p_θ(y^n | x^n, τ^{n−1}), of a single high-fidelity time-step. This poses the following three design requirements: a) the core of our model must be generative, allowing for probabilistic modeling of the likelihood, b) we must formulate a method for encoding the low-fidelity inputs x into features that can condition the generator and c) recurrent connections need to be integrated into the heart of the generative model to condition it on temporal information. To this end, we present a novel Transient Multi-fidelity Glow (TM-Glow) model for probabilistic surrogate modeling of dynamical systems, illustrated in Fig. 4. TM-Glow is built around the Glow model proposed by Kingma et al. [42], which will be the core generative INN for modeling the conditional likelihood. This model is depicted in the right column of Fig. 4a and the blue boxes in Fig. 4b. Glow is designed to provide a multiscale encoding of the high-fidelity fields, y^n, into a set of random latent variables, z^n, represented by the orange boxes in Fig. 4. To address the second design requirement, we use the convolutional conditional encoder proposed by Zhu et al. [36], which conditions the Glow model on the low-fidelity input, x^n, through a set of learnable features. This conditional encoder is shown in the left column of Fig. 4a and the pink boxes in Fig. 4b. Lastly, to allow for temporal conditioning of the Glow model, recurrent connections are integrated into novel LSTM affine coupling blocks discussed in Section 3.3.1. These LSTM-based operations allow for recurrent features to flow in and out of the generator, illustrated by the green boxes in Fig. 4b.

(a) TM-Glow model schematic. (b) Dimensionality representation of TM-Glow with a model depth of k_d = 3.

Figure 4: TM-Glow model.
This model is comprised of a low-fidelity encoder that conditions a generative flow model to produce samples of high-fidelity field snapshots. LSTM affine blocks are introduced to pass information between time-steps using recurrent connections. Boxes with rounded corners in (a) indicate a stack of the elements inside and should not be confused with plate notation. Arrows illustrate the forward pass of the INN. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)
As shown in Fig. 4a, the TM-Glow core component is the multiscale Glow model comprised of squeeze, LSTM affine block and split operations discussed in detail in Section 3.3. Superscript numbers enclosed in parentheses are used to denote variables at different TM-Glow model levels. We emphasize that TM-Glow is an INN; thus this model can evaluate the conditional likelihood exactly through the change of variables:

log p_θ(Y | X) = Σ_{n=1}^N [ log p_θ(z^n | x^n, τ^{n−1}) + Σ_{k=1}^K log |det(∂h_k^n/∂h_{k−1}^n)| ], (7)

in which {h_k^n}_{k=1}^K is used to denote the hidden layers of TM-Glow that are the inputs/outputs of the various invertible operations discussed in Section 3.3 and specifically in Table 1. The forward pass of the model, f_θ(·), encodes the high-fidelity observation y^n into a set of random latent variables z^n = {z^{(1),n}, z^{(2),n}, ..., z^{(k_d),n}, z^{(e),n}}. The backward/inverse pass of the model, f_θ^{-1}(·), generates a sample of y^n by sampling each random latent variable. The novel LSTM affine block contains recurrent connections between time-steps, conditioning the INN on the latent states of previous time-steps τ^{n−1} = {τ^{(1),n−1}, ..., τ^{(k_d),n−1}}. The dense convolutional encoder, detailed in Section 3.4, encodes a low-fidelity input into conditional feature maps, ξ^n = {ξ^{(1),n}, ξ^{(2),n}, ..., ξ^{(k_d),n}}, that are injected into the multiscale Glow model at each dimensional level as depicted in Fig. 4b. The use of the recurrent connections in the LSTM block as well as the conditioning encoder results in the directed graphical representation of the model shown in Fig. 5.

Figure 5: The unrolled computational graph of the TM-Glow model for a model depth of k_d = 3.
Our model is centered around a multiscale structure to promote the discovery of low-dimensional representations of the physics that govern the system. As seen in Fig. 4, a multiscale Glow model originally proposed by Dinh et al. [41] is employed to generate a flow field realization. In Fig. 4a, this is the right column of the model comprised of squeeze, LSTM affine block and split operations, each of which is invertible. We remind the reader that the goal of each of these operations is to provide a computationally efficient but descriptive mapping between the high-fidelity flow field and the random latent variables. As previously discussed in Eq. (4), this is achieved through the series of transformations between the hidden layers {h_k^n}_{k=1}^K. These transformations are precisely the operations discussed in the subsequent sections and listed in Table 1.

3.3.1. LSTM Affine Block

The core component of the generative portion of the TM-Glow model is the LSTM affine block, which is a novel extension of the conditional affine coupling layers [36, 47] designed specifically for transient time-series prediction. The LSTM affine block is comprised of three different sub-components illustrated in Fig. 6: an unnormalized conditional affine block, a stack of conditional affine blocks and a conditional LSTM affine block.
Figure 6: The LSTM affine block used in TM-Glow consisting of k_c affine coupling layers including an unnormalized conditional affine block (UnNorm Block), a stack of conditional affine blocks (Conditional Block) and a conditional LSTM affine block (LSTM Block).

The core component of all these blocks are affine coupling layers [40], a specially designed function that allows for an efficient inversion and Jacobian calculation. As depicted in Fig. 7a, half of the input, h_{k−1}, is modified by the scale and translation parameters, s and t, respectively, calculated from a coupling neural network (coupling NN). As implemented in Zhu et al. [36], this coupling NN is a shallow dense convolutional network with an input of the other half of the original feature map, h_{k−1}, and the conditional input ξ^{(i),n}, which are simply concatenated together. This coupling NN contains the learnable parameters that can be updated using any gradient descent method. As detailed in Table 1, the retention of part of the input to the coupling NN allows for a simple inversion and Jacobian calculation.

This conditional affine coupling layer is further extended with a convolutional LSTM (ConvLSTM), depicted in Fig. 7b, for transient problems. ConvLSTM is a variation of the traditional LSTM structure that employs convolutional operations [49], making it better suited for convolutional models such as TM-Glow. The input to the ConvLSTM is the same as the input of the coupling NN in the conditional coupling layer, which is conditioned on ξ^{(i),n}. Following the standard ConvLSTM formulation, the recurrent features have two states, τ^{(i),n−1} = {a_in^{(i)}, c_in^{(i)}}, which correspond to the LSTM hidden and cell state, respectively. The output of the LSTM, τ^{(i),n} = {a_out^{(i)}, c_out^{(i)}}, is passed to the subsequent time-step and a_out^{(i)} is used as an input to the coupling NN of the affine layer.
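The coupling-layer mechanics can be made concrete with a small sketch, our own simplified stand-in for the conditional affine coupling layer of Fig. 7a (the `coupling_nn` below is a hypothetical toy function, not the paper's shallow convolutional network): the first half of the features passes through unchanged, the second half is scaled and shifted, and the log-Jacobian is sum(log s).

```python
import numpy as np

def coupling_nn(h1, xi):
    # Hypothetical stand-in for the coupling NN: any function of (h1, xi)
    # works, since invertibility never requires inverting this network.
    log_s = 0.1 * np.tanh(h1 + xi)
    t = 0.2 * np.tanh(h1 - xi)
    return log_s, t

def coupling_forward(h, xi):
    h1, h2 = np.split(h, 2)
    log_s, t = coupling_nn(h1, xi)
    h_out = np.concatenate([h1, np.exp(log_s) * h2 + t])
    log_det = np.sum(log_s)  # triangular Jacobian -> determinant from diagonal
    return h_out, log_det

def coupling_inverse(h, xi):
    h1, h2 = np.split(h, 2)          # h1 was passed through unchanged
    log_s, t = coupling_nn(h1, xi)   # recompute the identical scale/shift
    return np.concatenate([h1, (h2 - t) / np.exp(log_s)])
```

Because the scale and shift depend only on the untouched half (and the conditioning features), the inverse recomputes them exactly, which is why the coupling NN itself can be an arbitrary network.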
The initial states of the hidden and cell states at the first time-step are assigned the following densities:

τ^{(i),0} = { a_in^{(i)} ∼ U(−1, 1), c_in^{(i)} ∼ N(0, 1) }. (8)

The resulting coupling layer is conditioned on both the current low-fidelity input as well as past time-step states. In the recent work of Kumar et al. [44], residual connections between generative flow models are also proposed, which are implemented by simply using the previous latent variables as an input to the shallow neural network in the split operation, as detailed in Section 3.3.3. The proposed use of the ConvLSTM affine layer prevents a vanishing gradient and enables much more descriptive recurrent feature maps to be learned.

(a) Conditional coupling layer. (b) Conditional LSTM coupling layer.

Figure 7: The two variants of affine coupling layers used in TM-Glow with input and output denoted as h_{k−1} = {h_{k−1}^1, h_{k−1}^2} and h_k = {h_k^1, h_k^2}, respectively. Time-step superscripts have been omitted for clarity of presentation.

In the coupling layer blocks, ActNorm is used, which was originally proposed by Kingma and Dhariwal [42] as an alternative to batch-normalization. ActNorm applies an invertible normalization to each feature channel, detailed in Table 1, that allows for smaller batch-sizes to be used without the performance degradation seen in traditional batch-normalization. The last essential component of the affine coupling blocks is the invertible 1×1 convolution [42]. Since its kernel size is 1×1, it can be efficiently inverted and has a trivial Jacobian, as detailed in Table 1. The purpose of this convolution is to permute the feature maps between coupling layers. Since the coupling layers used in Fig. 7 only operate on half of the input data, permutation between layers is essential to increase the expressibility of the model.

Table 1: Invertible operations used in the generative normalizing flow method of TM-Glow. Being consistent with the notation in [42], we assume the inputs and outputs of each operation are of dimension h_{k−1}, h_k ∈ R^{c×h×w} with c channels and a feature map size of [h × w]. Indexes over the spatial domain of the feature map are denoted by h(x, y) ∈ R^c. The coupling neural network and convolutional LSTM are abbreviated as NN and
LSTM, respectively. Time-step superscripts have been omitted for clarity of presentation.
Conditional Affine Layer:
  Forward: {h_{k-1,1}, h_{k-1,2}} = h_{k-1}; (log s, t) = NN(h_{k-1,2}, ξ^{(i)}); h_{k,1} = exp(log s) ⊙ h_{k-1,1} + t; h_{k,2} = h_{k-1,2}; h_k = {h_{k,1}, h_{k,2}}
  Inverse: {h_{k,1}, h_{k,2}} = h_k; (log s, t) = NN(h_{k,2}, ξ^{(i)}); h_{k-1,1} = (h_{k,1} - t) / exp(log s); h_{k-1,2} = h_{k,2}; h_{k-1} = {h_{k-1,1}, h_{k-1,2}}
  Log Jacobian: sum(log |s|)

LSTM Affine Layer:
  Forward: {h_{k-1,1}, h_{k-1,2}} = h_{k-1}; (a^{(i)}_out, c^{(i)}_out) = LSTM(h_{k-1,2}, ξ^{(i)}, a^{(i)}_in, c^{(i)}_in); (log s, t) = NN(h_{k-1,2}, ξ^{(i)}, a^{(i)}_out); h_{k,1} = exp(log s) ⊙ h_{k-1,1} + t; h_{k,2} = h_{k-1,2}; h_k = {h_{k,1}, h_{k,2}}
  Inverse: {h_{k,1}, h_{k,2}} = h_k; (a^{(i)}_out, c^{(i)}_out) = LSTM(h_{k,2}, ξ^{(i)}, a^{(i)}_in, c^{(i)}_in); (log s, t) = NN(h_{k,2}, ξ^{(i)}, a^{(i)}_out); h_{k-1,1} = (h_{k,1} - t) / exp(log s); h_{k-1,2} = h_{k,2}; h_{k-1} = {h_{k-1,1}, h_{k-1,2}}
  Log Jacobian: sum(log |s|)

ActNorm:
  Forward: ∀ x, y: h_k(x, y) = s ⊙ h_{k-1}(x, y) + b
  Inverse: ∀ x, y: h_{k-1}(x, y) = (h_k(x, y) - b) / s
  Log Jacobian: h · w · sum(log |s|)

1 × 1 Convolution:
  Forward: ∀ x, y: h_k(x, y) = W h_{k-1}(x, y), W ∈ R^{c×c}
  Inverse: ∀ x, y: h_{k-1}(x, y) = W^{-1} h_k(x, y)
  Log Jacobian: h · w · log(|det W|)

Split:
  Forward: {h_{k-1,1}, h_{k-1,2}} = h_{k-1}; (µ, σ) = NN(h_{k-1,2}); p_θ(z_k) = N(h_{k-1,1} | µ, σ); h_k = h_{k-1,2}
  Inverse: h_{k-1,2} = h_k; (µ, σ) = NN(h_{k-1,2}); h_{k-1,1} ∼ N(µ, σ); h_{k-1} = {h_{k-1,1}, h_{k-1,2}}
  Log Jacobian: N/A
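The ActNorm and 1 × 1 convolution entries of Table 1 are simple enough to sketch directly. The following is a minimal NumPy illustration assuming feature maps of shape (c, h, w); the learnable parameters s, b and W are supplied explicitly here rather than being trained.

```python
import numpy as np

def actnorm_forward(h, s, b):
    # Per-channel invertible affine normalization; h: (c, H, W), s/b: (c,)
    out = s[:, None, None] * h + b[:, None, None]
    log_det = h.shape[1] * h.shape[2] * np.sum(np.log(np.abs(s)))
    return out, log_det

def actnorm_inverse(y, s, b):
    return (y - b[:, None, None]) / s[:, None, None]

def conv1x1_forward(h, W):
    # A 1x1 convolution is a channel-mixing matrix applied at every pixel
    out = np.einsum('ij,jhw->ihw', W, h)
    log_det = h.shape[1] * h.shape[2] * np.linalg.slogdet(W)[1]
    return out, log_det

def conv1x1_inverse(y, W):
    return np.einsum('ij,jhw->ihw', np.linalg.inv(W), y)
```

Both operations invert exactly, and their log-Jacobians are the per-pixel contributions of Table 1 multiplied by the h · w spatial locations.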
As seen in Fig. 7, the affine coupling layer requires two inputs, of which only one is modified, to allow for efficient inversion. To form these two inputs, a squeeze operation is applied to the feature maps, which halves each spatial dimension of the feature map and increases the number of channels by a factor of four. In this work, we use the squeeze method originally proposed by Dinh et al. [41] and also implemented in the Glow model [42]. As depicted in Fig. 8a, the image is separated using a checkerboard pattern, resulting in four sub-sampled versions. Note that this differs from the conditional Glow model of Zhu et al. [36], which implements a chunk-based squeeze where the image is separated into four quadrants. We found that the checkered approach had better performance, likely because the checkerboard squeeze provides a sub-sampled version of the full image, rather than a local quadrant, to the affine coupling layer.

(a) Squeeze operation. (b) Split operation.
Figure 8: Squeeze and split forward operations used to manipulate the dimensionality of the features in TM-Glow. (Left) The squeeze operation compresses the input feature map h_{k-1} using a checkerboard pattern, halving each spatial dimension and increasing the number of channels by a factor of four. (Right) The split operation factors out half of an input h_{k-1}, which is then taken to be the latent random variable z^{(i)}. The remaining features, h_k, are sent deeper into the network. Time-step superscripts have been omitted for clarity of presentation.

Unlike standard convolutional operations, the affine coupling layer is volume preserving, meaning that the number of output elements must be the same as that of the input. Retaining the total input dimensionality through all layers of the model is not ideal for a convolutional model, since this increases its computational and memory cost. Thus we use the multiscale architecture proposed by Dinh et al. [41], which is illustrated in Fig. 4b. This multiscale flow model factors out half of the current feature maps at multiple intervals of the architecture, which are then treated as random latent variables [42, 36]. A single split operation is illustrated in Fig. 8b, in which the density of these latent variables is taken to be a fully-factorizable Gaussian with mean and standard deviation determined from the remaining features using a shallow neural network.

When the split is executed in the inverse direction, these hyper-parameters depend on the features provided from deeper within the model, as seen in Table 1. This dependence on deeper features therefore conditions the random latent variables on both the conditional features representing the coarse simulation input, x^n, as well as the recurrent features τ^{n-1}. As an example to illustrate this point, consider a TM-Glow model with a depth of k_d = 3, as illustrated in Fig. 4b.
Each of the four random latent variables and the high-fidelity output for a single time-step can be described by the following conditional distributions:

z^{(e),n} ∼ p_θ( z^{(e),n} | ξ^{(3),n} ),
z^{(3),n} ∼ p_θ( z^{(3),n} | z^{(e),n} ),
z^{(2),n} ∼ p_θ( z^{(2),n} | z^{(3),n}, ξ^{(3),n}, τ^{(3),n-1} ),
z^{(1),n} ∼ p_θ( z^{(1),n} | z^{(2),n}, ξ^{(2),n}, τ^{(2),n-1} ),
y^n ∼ p_θ( y^n | z^{(1),n}, ξ^{(1),n}, τ^{(1),n-1} ),   (9)

which is clearly a hierarchical model of the distribution y^n ∼ p(y^n | x^n, τ^{n-1}) that TM-Glow was designed to learn.

As discussed by Dinh et al. [41], this multiscale architecture has multiple intrinsic benefits. The first is that it results in the model learning intermediate representations of the output field, with deeper latent variables representing more global characteristics and shallower ones representing finer details. Additionally, this distributes the loss across multiple layers of the network, which can improve training and predictive accuracy. With respect to modeling physical systems, such a multiscale architecture is well suited because a vast number of physical phenomena are multiscale in nature. Specifically in fluids, it is well known that turbulence occurs at multiple length and time scales, making TM-Glow well suited for fluid flow prediction.

Remark 2.
A particularly interesting attribute of the Glow model is the presence of random latent variables at multiple levels in the generator. This characteristic is absent from traditional VAE or GAN models, for which the random latent variables are only present at one level of the model, typically the lowest-dimensional one. This unique architecture arises out of necessity, but allows the generative model to learn probabilistic densities at multiple scales. In the context of physical systems, this could allow the model to learn stochastic phenomena at varying length scales, which lends itself nicely to many multiscale systems.
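The split operation that produces these multi-level latent variables can be sketched in both directions. Below is a simplified NumPy sketch with a hypothetical shallow `prior_nn` standing in for the network that produces the Gaussian parameters (µ, σ) from the retained features; the forward pass scores the factored-out features under the conditional prior, while the inverse pass samples them.

```python
import numpy as np

def prior_nn(h2, W_mu, W_sig):
    # Hypothetical shallow network producing the Gaussian prior parameters
    mu = W_mu @ h2
    sigma = np.exp(np.tanh(W_sig @ h2))  # strictly positive std-dev
    return mu, sigma

def split_forward(h, W_mu, W_sig):
    # Factor out half the features as latent variables z with a
    # fully-factorized Gaussian density conditioned on the remainder.
    z, h2 = np.split(h, 2)
    mu, sigma = prior_nn(h2, W_mu, W_sig)
    log_p = np.sum(-0.5 * ((z - mu) / sigma) ** 2
                   - np.log(sigma) - 0.5 * np.log(2 * np.pi))
    return h2, log_p

def split_inverse(h2, W_mu, W_sig, rng):
    # Sampling direction: draw z from its conditional prior and re-attach
    mu, sigma = prior_nn(h2, W_mu, W_sig)
    z = mu + sigma * rng.standard_normal(mu.shape)
    return np.concatenate([z, h2])
```

Because the prior parameters depend only on the retained features h2, the same (µ, σ) are recovered in both directions, which is what makes the split consistent between likelihood evaluation and sampling.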
To condition the generative model on the low-fidelity fluid field at multiple levels, a densely connected convolutional encoder is used. This convolutional encoder, illustrated on the right side of Fig. 4a, is comprised of encoding and dense blocks, following the approach originally taken by Zhu et al. [36]. Examples of the encoding and dense blocks are illustrated in Fig. 9; such blocks have been used successfully for modeling many physical systems in the past [35, 36, 37, 50]. The encoding blocks down-scale the feature maps, forcing the model to learn low-dimensional representations, while the densely connected blocks increase the predictive accuracy of the model and have better performance than standard residual connections [51]. The feature maps are taken from multiple levels of the convolutional encoder, up-scaled and passed to the affine coupling blocks conditioning the generator. These are denoted by ξ^{(i),n} in Fig. 4, passing detailed high-dimensional features towards the beginning of the encoder and global low-dimensional features towards the end.

Figure 9: Dense block with a growth rate and length of 2. Residual connections between convolutions progressively stack feature maps, resulting in 12 output channels in this schematic. Standard batch-normalization [52] and Rectified Linear Unit (ReLU) activation functions [53] are used in conjunction with the convolutional operations. Convolutions are denoted by the kernel size k, stride s and padding p.
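The channel bookkeeping of such a dense block can be sketched as follows; the `convs` entries below are placeholders for the conv + batch-norm + ReLU units of Fig. 9, each of which sees all preceding feature maps and contributes a fixed number of new channels (the growth rate).

```python
import numpy as np

def dense_block(x, convs):
    """Densely connected block: every unit receives the concatenation
    of all preceding feature maps and appends its own output.

    x: (C, H, W) input features; convs: list of callables mapping
    (C_in, H, W) -> (growth, H, W). These stand in for the
    conv + batch-norm + ReLU units of the actual architecture.
    """
    feats = [x]
    for conv in convs:
        feats.append(conv(np.concatenate(feats, axis=0)))
    return np.concatenate(feats, axis=0)
```

With an 8-channel input, a growth rate of 2 and a length of 2, the output stacks 8 + 2 + 2 = 12 channels, matching the schematic in Fig. 9.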
4. TM-Glow Training
One of the key benefits of using INNs is the ability to calculate the likelihood of the data exactly with respect to the latent variables. This makes data-driven training straightforward, as one can simply pose the optimization as the minimization of the negative log-likelihood in Eq. (5) [40, 41, 42, 54]. However, since this encodes the output of the model to the latent parameters, such training does not allow physical constraints to be imposed on the generated samples of the model. In the work of Zhu et al. [36], in which physics-constrained learning is used in the absence of data, the reverse Kullback-Leibler (KL) divergence is used as the optimization objective. The reverse KL divergence poses the optimization through the generated samples of the INN, which allows physical constraints to be imposed on the produced realizations.

Due to the complex dynamics of the N-S equations at high Reynolds numbers, physics-constrained learning of turbulent fluid flows through a PDE-based loss alone poses a difficult optimization objective. Thus we will use here a semi-supervised extension of the reverse KL-divergence loss that allows both supervision with data as well as additional physics-constrained components. Consider a training set of i.i.d. cases D = {X_d, Y_d}_{d=1}^{D}; the loss is then:

L = arg min_θ Σ_{d=1}^{D} D_KL( p_θ(Y_d | X_d) || p_β(Y_d | X_d) ) = Σ_{d=1}^{D} E_{p_θ}[ log( p_θ(Y_d | X_d) / p_β(Y_d | X_d) ) ],   (10)

in which we have made use of the fact that the KL divergence is additive for independent distributions. p_θ(Y | X) is the density of TM-Glow with parameters θ for a single time-step. p_β(Y | X) is an energy-based density function with a controllable parameter β representing the true high-fidelity targets.
Note that the expectation is calculated using the samples of the generative model y ∼ p_θ, requiring a backward pass of the INN, which is the opposite direction of the standard maximum likelihood approach.

Currently this loss is posed across the entire time-series; however, we desire it to be expressed in terms of single time-steps to make it computationally tractable with TM-Glow. First, we pose the energy-based density as a product of independent distributions at each individual time-step, p_β(Y | X) = Π_{n=1}^{N} p_β(y^n | x^n). This has a similar form as the definition of the model's likelihood in Eq. (6), p_θ(Y | X) = Π_{n=1}^{N} p_θ(y^n | x^n, τ^{n-1}). The loss for a time-series of N time-steps can then be written as:

L = Σ_{d=1}^{D} Σ_{n=1}^{N} E_{p_θ}[ log p_θ( y^n_d | x^n_d, τ^{n-1}_d ) − log p_β( y^n_d | x^n_d ) ].   (11)

The first term, log p_θ( y^n_d | x^n_d, τ^{n-1}_d ), is an entropy-promoting term, encouraging diversity in the model's samples and avoiding mode collapse. A unique advantage of using an INN is that the entropy, H( y^n_d | x^n_d, τ^{n-1}_d ) = −E_{p_θ}[ log p_θ( y^n_d | x^n_d, τ^{n-1}_d ) ], can be evaluated exactly through the change of variables in Eq. (3), as opposed to approximating [55, 56] or learning it [57]. The second term, the negative log energy density, −log p_β(y | x), encourages consistency between the model's generated samples and the specified physical constraints. In this work, we use the Boltzmann distribution to model p_β( y^n_d | x^n_d ), which is standard in energy-based models [58]:

p_β( y^n_d | x^n_d ) = exp( −β V_PDE(·) ) / Z_β,   (12)

in which V_PDE(·) is a PDE-based potential discussed in further detail in Section 4.1. Z_β is a normalizing constant which does not impact the optimization and is thus neglected.
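Combining Eqs. (11)-(12), each time-step contributes the log-density of the generated sample (the negative-entropy term) plus β times the PDE potential, with the constant Z_β dropped. A point-estimate sketch of this accumulation, where one model sample per time-step stands in for the expectation:

```python
def backward_kl_loss(log_probs, potentials, beta):
    """Point-estimate of the per-time-step backward KL loss.

    log_probs: list of log p_theta(y^n | x^n, tau^{n-1}) values, evaluated
        exactly for each generated sample via the change of variables.
    potentials: list of V_PDE values evaluated on the same samples.
    beta: inverse temperature; Z_beta is constant w.r.t. theta and dropped.
    """
    return sum(lp + beta * v for lp, v in zip(log_probs, potentials))
```

Minimizing this quantity simultaneously pushes samples toward low PDE potential (physical consistency) and keeps the sample log-density low (high entropy, sample diversity).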
β is a tunable parameter corresponding to the inverse temperature in the Boltzmann distribution that controls the strength of the potential in the backward KL loss. The resulting form of the reverse KL divergence follows:

L = Σ_{d=1}^{D} Σ_{n=1}^{N} E_{p_θ}[ log p_θ( z^n_d | x^n_d, τ^{n-1}_d ) + Σ_{k=1}^{K} log | det( ∂h^n_{k,d} / ∂h^n_{k-1,d} ) | + β V_PDE(·) ].   (13)

In practice, the expectation in the KL divergence is taken as a point estimate during training. Due to the large number of times this loss is evaluated during the stochastic optimization of our model, the effects of such point estimates have been empirically shown to be minimal [38].

The potential V_PDE represents the physical constraints one wishes to impose on the model's samples. Similar to past physics-constrained literature [36, 37], we will use the governing equations to aid the formulation of this potential. Within this work, we pose V_PDE in terms of the following components:

V_PDE = V_Pres + V_Div + V_L1 + V_RMS,   (14)

V_Pres = (v_c / n_s) || (1/ρ)( ∂²p^n/∂x² + ∂²p^n/∂y² ) + ( ∂u^n_x/∂x )² + 2 ( ∂u^n_x/∂y )( ∂u^n_y/∂x ) + ( ∂u^n_y/∂y )² ||²,   (15)

V_Div = (v_c / n_s) || ∂u^n_x/∂x + ∂u^n_y/∂y ||²,   V_L1 = (1 / n_s) || y^n − y^n_HF ||_1,   (16)

V_RMS = (1 / n_s) || RMS(y') − RMS(y'_HF) ||_1,   (17)

which consists of the residual of the Poisson equation for pressure, the divergence-free constraint for incompressible flow and two L1 supervised learning terms. The first is between the predicted state variables of the model, denoted by y^n, and the observed high-fidelity solution y^n_HF.
The second is between the root-mean-square (RMS) of the fluctuation states predicted by the model, y', and the observed high-fidelity RMS values y'_HF. This term can be interpreted as matching the turbulent intensity between the predicted time-series and the high-fidelity observables. The RMS of the states is defined as follows:

RMS(y') = sqrt( \overline{(y')²} ) = ( (1/T) ∫₀ᵀ ( y(t) − ȳ )² dt )^{1/2},   (18)

which is a time-averaged quantity and thus has no time-step index. n_s is the number of nodes in the predicted high-fidelity spatial domain. Both residual loss terms are scaled by the cell volume, v_c = Δx · Δy, to help balance each loss component. While the potential resembles forms of other data- and PDE-constrained loss functions [22, 33], ours is posed in a probabilistic framework for learning the full distribution of solutions, as opposed to a single deterministic prediction.

The PDE residual terms are evaluated using the model's predictions and constrain the predictions to be physically realizable. To evaluate the gradients we use the same methods successfully used in our past works for various physical systems [36, 37]: efficient finite-difference-based convolutions to approximate first-order gradients,

∂u^n/∂x = (1/(8Δx)) [[−1, 0, 1], [−2, 0, 2], [−1, 0, 1]] ∗ u^n,   ∂u^n/∂y = (1/(8Δy)) [[−1, −2, −1], [0, 0, 0], [1, 2, 1]] ∗ u^n,   (19)

as well as second-order gradients,

∂²p^n/∂x² = (1/(4Δx²)) [[1, −2, 1], [2, −4, 2], [1, −2, 1]] ∗ p^n,   ∂²p^n/∂y² = (1/(4Δy²)) [[1, 2, 1], [−2, −4, −2], [1, 2, 1]] ∗ p^n.   (20)

These smoothed, second-order accurate finite difference approximations are based on image processing filters such as the Sobel filter for 2D convolutions [59], which have been found to improve training stability over pure finite-difference calculations. The convolutional filter approach allows for efficient computation of these gradients during training that directly integrates itself into the computational graph for back-propagation. In this work, since we are predicting a sub-domain for which we do not know the complete boundary conditions, we only compute the PDE constraint terms on the interior nodes of the predicted domain, ignoring the boundary values. The L1 term helps stabilize the PDE-based losses, which can be unstable due to their gradients, as well as encouraging turbulence in the predicted fluid flow. Similar L1 losses are used in GAN models for time-series predictions to increase time-series accuracy and continuity [60, 61]. Pseudocode for the training process is outlined in Algorithm 1 for a single training case, but it easily extends to a full training data-set. Pseudocode for sampling TM-Glow is outlined in Algorithm 2, from which statistics are computed in traditional Monte Carlo fashion.

TM-Glow contains a large set of hyper-parameters including model depth, the number of affine coupling layers, coupling neural network depth, learning rate, mini-batch size, etc., which are all coupled together, making an extensive hyper-parameter search extremely difficult. While automated methods exist to aid this search, we opted to take a simpler approach by empirically finding a reasonable model architecture by varying the model depth k_d and the number of affine coupling layers k_c in Fig. 6.

Algorithm 1:
Training TM-Glow for a single training case.

Input: TM-Glow model f_θ; low-fidelity and high-fidelity time-series data {X, Y} = {x^n, y^n_HF}_{n=1}^N of length N; number of epochs M; back-propagation through time (BPTT) interval p; learning rate η

ȳ ≈ (1/N) Σ_{n=1}^{N} y^n_HF                                  ▷ Approx. mean flow field
for epoch = 1 to M do
    τ^0 ∼ p(τ^0)                                              ▷ Sample initial recurrent state
    for n = 1 to N do
        y^n, τ^n, log p(y^n | x^n, τ^{n-1}) ← f_θ^{-1}(x^n, τ^{n-1})   ▷ Sample TM-Glow
        V_Pres(y^n) = (v_c/n_s) || (1/ρ)Δp^n + ∇·((u^n·∇)u^n) ||²     ▷ Poisson residual
        V_Div(y^n) = (v_c/n_s) || ∇·u^n ||²                    ▷ Divergence residual
        V_L1 = (1/n_s) || y^n − y^n_HF ||_1                    ▷ L1 loss
        L += log p(y^n | x^n, τ^{n-1}) + β (V_Pres + V_Div + V_L1)    ▷ Backward KL
        if Mod(n, p) = 0 then
            y' = y^{n−p:n} − ȳ                                 ▷ Approx. TM-Glow fluctuation fields
            L += (βp/n_s) || RMS(y') − RMS(y'_HF) ||_1         ▷ RMS loss
            ∇_θ ← Backprop(L)                                  ▷ Back-propagation
            θ ← θ − η ∇_θ                                      ▷ Gradient descent
            L = 0                                              ▷ Zero loss
    ȳ = (1/N) Σ_{n=1}^{N} y^n                                  ▷ Update mean flow field estimate

Output: trained TM-Glow model f_θ

Algorithm 2: Sampling TM-Glow high-fidelity time-series.

Input: trained TM-Glow model f_θ; low-fidelity time-series data {X} = {x^n}_{n=1}^N of length N; number of samples M

for m = 1 to M do
    τ^0 ∼ p(τ^0)                                              ▷ Sample initial recurrent state
    for n = 1 to N do
        y^n, τ^n ← f_θ^{-1}(x^n, τ^{n-1})                      ▷ Sample time-step from TM-Glow
    Y_m = {y^1, y^2, ..., y^N}                                 ▷ Store sampled time-series

Output:
High-fidelity flow samples {Y_m}_{m=1}^M

Each model is trained on a small data-set (32 flows from the second numerical example in Section 6) to keep the computational cost of the hyper-parameter search reasonable. To quantify the accuracy of each model, the following time-averaged prediction mean squared errors (MSE) are used for a validation set of n_test = 16 flows:

MSE_Mag = (1/(n_s n_test)) || E_{p_θ}[ |ū| ] − |ū_HF| ||²,   |ū| = (1/T) ∫₀ᵀ sqrt( u_x(t)² + u_y(t)² ) dt,   (21)

MSE_TKE = (1/(n_s n_test)) || E_{p_θ}[ k̄ ] − k̄_HF ||²,   k̄ = (1/2)( \overline{(u'_x)²} + \overline{(u'_y)²} ),   (22)

in which the expected value of the model's prediction is estimated using 20 model samples. The first error assesses the accuracy of the mean flow magnitude, and the second assesses the accuracy of the predicted turbulent kinetic energy (TKE). The test errors of the models considered are plotted in Fig. 10. We find that there is a trade-off between average velocity and turbulent energy accuracy, and that larger models begin to over-fit on this small training data-set. Based on these results, we select a TM-Glow model with k_d = 3 and k_c = 16.

Figure 10: (Left to right) Velocity magnitude MSE and turbulent kinetic energy (TKE) test MSE for TM-Glow models containing k_d · k_c affine coupling layers.

Essential TM-Glow and training hyper-parameters are outlined in Table 2. The initial recurrent state, τ^0, is averaged with the current recurrent state after every BPTT pass. This borrows the idea of using various recurrent features at different timescales in hierarchical RNNs for natural language processing [62, 63]. Although not necessary, this algorithmic heuristic helps prevent the information from the initial state being lost, which was found to improve the model's accuracy and sample diversity. For additional details, we direct the reader to the source code.

Table 2: TM-Glow model and training parameters used for both numerical test cases. For the parameters that vary between test cases, the superscripts † and ‡ denote the numerical examples in Sections 5 and 6, respectively. Hyper-parameter differences are due to memory constraints imposed by the varying predictive domain sizes.

TM-Glow                                      Training
Model Depth, k_d: 3                          Epochs: 400
Affine Coupling Layers, k_c: 16              Mini-batch Size: 8†,‡
Coupling NN Layers: 2                        BPTT: 10 time-steps
ξ^{(i)} Channels: 32                         Weight Decay: 1e−
a^{(i)}_in, c^{(i)}_in Channels: 64          Inverse Temp., β: 500

The inverse temperature, β, in the energy density controls the balance between the model satisfying the physics-based potential and the model's entropy. Given this parameter's close relation to the model's probabilistic nature, reliability diagrams of the model's predictions are used to assess the quality of the predictive uncertainty. Several models with different β values are trained on the same small data-set of 32 flows from the second numerical example in Section 6 used when calibrating the model depth. For a small validation data-set of 16 flows, we compute for each model the empirical density function of each of the model's output fields over all samples, time-steps and validation cases at each spatial location independently. The values of the predicted density function at several quantiles are then compared to the empirical density function of the high-fidelity data, which is then averaged over the spatial domain and plotted in Fig. 11 for each state variable. Interestingly, unlike Zhu et al. [36], the predicted quantiles all match fairly well with the high-fidelity data, with apparently little sensitivity to β. Based on these results, β = 500 is selected for the remainder of this work.
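The reliability computation described above amounts to comparing predicted quantiles against observed frequencies. Below is a sketch under the simplifying assumption that model samples and high-fidelity observations are stored as flat (realization × location) arrays; a perfectly calibrated model lands on the diagonal, where the observed frequency equals the requested quantile.

```python
import numpy as np

def reliability(samples, observed, quantiles):
    """Empirical reliability-diagram values.

    samples: (n_samples, n_locations) model predictions
    observed: (n_obs, n_locations) high-fidelity observations
    For each quantile q, returns the frequency with which the observed
    data falls below the model's predicted q-th quantile, averaged over
    observations and spatial locations.
    """
    freqs = []
    for q in quantiles:
        pred_q = np.quantile(samples, q, axis=0)      # per-location quantile
        below = (observed <= pred_q[None, :]).mean()  # avg over obs and space
        freqs.append(below)
    return np.array(freqs)
```

When the model's predictive distribution matches the data distribution, the returned frequencies track the requested quantiles, which is the dashed diagonal in Fig. 11.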
For each numerical example, additional fine-tuning is certainly possible to obtain the highest level of accuracy. However, in this work we will not perform any case-specific tuning, to demonstrate that decent results can be obtained using TM-Glow for multiple problems of different nature and dimensionality.
Figure 11: Reliability diagrams of the x-velocity, y-velocity and pressure fields predicted with TM-Glow, evaluated over 12000 model predictions. The black dashed line indicates matching empirical distributions between the model's samples and the observed validation data.
An ablation study is performed to investigate the impact each component of the loss has on the model's predictive accuracy. Additionally, the model is also trained using the standard maximum likelihood approach for INNs, by maximizing Eq. (7), to act as the traditional baseline. The same training/validation data-set used for the accuracy and uncertainty calibration studies is also used here. As listed in Table 3, we train several models using variants of the proposed backward KL loss and compute the mean squared error of various flow-field quantities across the validation data-set. Again, 20 model samples are used to compute the expected value of each predicted flow quantity from which the error is computed.

First, we note that training the model through the traditional maximum likelihood approach generally yields worse results than the backward KL losses, with the exception of some of the time-averaged mean flow quantities. Additionally, the large residual errors for the maximum likelihood training indicate that the instantaneous flow fields are non-physical. Interestingly, the proposed loss does not produce the most accurate mean flow or turbulent statistics. This appears to be due to the inclusion of the Poisson pressure residual loss, which enforces physical coupling of the output fields. Without this PDE loss, the model has more freedom and can achieve greater accuracy of the flow statistics. However, this comes at the cost of having non-physical instantaneous flow field realizations, which is indicated by the increase in the time-averaged pressure residual. Given that we are interested in predicting physical fluid flow, we believe that inclusion of the Poisson residual is essential even at the sacrifice of the time-averaged statistics.
Table 3: Ablation study of the impact of different parts of the backward KL loss. As a baseline we also train TM-Glow using the standard maximum likelihood estimation (MLE) approach. The mean squared error (MSE) of various flow field quantities for each loss formulation is listed. The lowest values for each error are bolded.
Loss configurations (components used):

MLE | V_Pres | V_Div | V_L1 | V_RMS
 ✓  |   ✗    |   ✗   |  ✗   |  ✗
 ✗  |   ✓    |   ✓   |  ✓   |  ✓
 ✗  |   ✗    |   ✓   |  ✓   |  ✓
 ✗  |   ✗    |   ✗   |  ✓   |  ✓
 ✗  |   ✗    |   ✗   |  ✓   |  ✗

Errors reported for each configuration: MSE(ū_x), MSE(ū_y), MSE(p̄), MSE(√\overline{(u'_x)²}), MSE(√\overline{(u'_y)²}), MSE(√\overline{(p')²}), and the time-averaged residuals V_Div and V_Pres.
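The Sobel-style finite-difference filters of Eqs. (19)-(20), which produce the V_Pres and V_Div residuals compared in Table 3, can be sketched with a direct "valid" convolution over interior nodes. This NumPy sketch shows the first-order kernels only; the second-order kernels follow analogously with the smoothed [1, −2, 1] stencil and 1/(4Δx²) normalization.

```python
import numpy as np

# Smoothed (Sobel-style) central-difference kernels; the transverse
# smoothing weights [1, 2, 1] produce the 1/8 normalization.
KX = np.array([[-1., 0., 1.],
               [-2., 0., 2.],
               [-1., 0., 1.]]) / 8.0
KY = KX.T  # [[-1, -2, -1], [0, 0, 0], [1, 2, 1]] / 8

def conv2d_valid(u, k):
    # Direct 'valid' 2D cross-correlation: interior nodes only,
    # matching the choice to ignore boundary values.
    kh, kw = k.shape
    H, W = u.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(u[i:i + kh, j:j + kw] * k)
    return out

def ddx(u, dx):
    return conv2d_valid(u, KX) / dx

def ddy(u, dy):
    return conv2d_valid(u, KY) / dy
```

On a linear field the filters are exact, which is a quick sanity check of the kernel orientation and normalization; in a training loop one would use a framework's convolution primitive instead so the gradients flow through the computational graph.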
5. Turbulent Flow over a Backwards Step
We first apply the proposed model to surrogate modeling of turbulent flow over a backward-facing step at different Reynolds numbers, a classical benchmark problem in computational fluids. As illustrated in Fig. 12, the feature of interest is the flow separation that occurs following the step. Such phenomena can be found in a surprisingly large number of systems including heat exchangers, flow around buildings, combustion engines and aerodynamic elements [65, 66]. The Reynolds number of the flow is governed by the inlet velocity u_0, the viscosity ν, and the step height h = 1. In this benchmark, the inlet boundary condition varies in magnitude, thus varying the Reynolds number of the flow. Here we are interested in predicting the recirculation region, marked by the green box in Fig. 12, for different Reynolds numbers. This region is the typical area of study for this flow due to the presence of flow separation, the Kelvin-Helmholtz instability and turbulent flow with various eddy formations.

Figure 12: Flow over a backward-facing step. The green region indicates the recirculation region TM-Glow will be used to predict. All domain boundaries are no-slip with the exceptions of the uniform inlet and zero-gradient outlet. The total outlet simulation length is made to be double that of the prediction range to negate effects of the boundary condition on this zone.

The low-fidelity simulator that provides the input of the model has a mesh characteristic resolution of l_c = h/12, and the target high-fidelity field has a resolution of l_c = h/32, as shown in Fig. 13. The time-step size is Δt = 0.5. The resulting model input for a single time-step is x^n = {u^n_l, p^n_l}, with an output y^n = {u^n_h, p^n_h}. The full training data set consists of fluid flows evenly distributed between Reynolds numbers 5000 and 50000, each consisting of 80 time-steps. Simulations were performed using the OpenFOAM finite volume solver with the standard Smagorinsky LES sub-grid scale model [67]. During training we augment these time-series by splitting them in half into two time-series of 40 time-steps each, to artificially create more training flows. Training input and output data are normalized to a standard unit Gaussian. Further details on the computational cost of the low-fidelity and high-fidelity simulations, along with the training of TM-Glow, are discussed in Section 7.

(a) Low-fidelity (b) High-fidelity

Figure 13: Computational mesh around the backward-facing step used for the low- and high-fidelity CFD simulations solved with OpenFOAM [67].
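The unit-Gaussian normalization mentioned above is standard per-channel standardization over the training set: each field channel is shifted and scaled by its training mean and standard deviation, and the same statistics are reused at test time. A sketch assuming data stored as (samples, channels, height, width):

```python
import numpy as np

def standardize(train, eps=1e-8):
    """Normalize each channel of the training set to zero mean, unit
    variance; `eps` guards against division by a zero-variance channel.

    train: (N, C, H, W) array. Returns the standardized array and the
    per-channel statistics for reuse on test data.
    """
    mean = train.mean(axis=(0, 2, 3), keepdims=True)
    std = train.std(axis=(0, 2, 3), keepdims=True) + eps
    return (train - mean) / std, mean, std
```

At test time the low-fidelity inputs would be transformed with the stored `mean` and `std`, and model outputs mapped back by the inverse transform.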
A test set of 17 flows with evenly spaced Reynolds numbers in [7500, 47500] is used to evaluate the model.

Figure 14: (Left to right) Flow over backward-facing step velocity magnitude and turbulent kinetic energy (TKE) error during training of TM-Glow on different data set sizes. Error values were averaged over five model samples.

Table 4: Backward-facing step test error of various normalized time-averaged flow field quantities for the low-fidelity solution interpolated to the high-fidelity mesh and for TM-Glow trained on various training data set sizes. Lower is better. TM-Glow errors were averaged over 20 samples from the model. The training time for each data set size is also listed.
              MSE(ū_x/u_0)  MSE(ū_y/u_0)  MSE(p̄/u_0²)  MSE(√\overline{(u'_x)²}/u_0)  MSE(√\overline{(u'_y)²}/u_0)  MSE(√\overline{(p')²}/u_0²)  GPU Hrs.
Low-Fidelity  0.1285        0.0265        0.0227        0.0241                        0.0187                        0.0145                        -
8 Flows       0.0160        0.0036        0.0040        0.0069                        0.0053                        0.0052                        2.4
16 Flows      0.0173        0.0044        0.0032        0.0049                        0.0046                        0.0042                        4.3
32 Flows      0.0135        0.0032        0.0023        0.0030                        0.0032                        0.0020                        8.4
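The TM-Glow errors in Table 4 are Monte Carlo estimates: the expectation E_{p_θ}[·] is replaced by a sample mean over model realizations before computing the MSE against the high-fidelity target. A sketch:

```python
import numpy as np

def mc_mse(model_samples, target):
    """Monte Carlo estimate of a prediction MSE.

    model_samples: (n_samples, ...) array of time-averaged field samples
        drawn from the generative model (Algorithm 2).
    target: high-fidelity time-averaged field of matching shape.
    """
    expectation = model_samples.mean(axis=0)  # sample mean replaces E[.]
    return np.mean((expectation - target) ** 2)
```

With 20 samples per flow, as used here, the sample mean is a crude but adequate estimate of the predictive expectation for reporting field errors.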
To illustrate the improvement TM-Glow is able to produce, we plot the velocity magnitude at several time-steps for several time-series samples of the model in Fig. 15, as well as the Q-criterion (also known as the elliptic Okubo-Weiss criterion for 2D flows) [68, 69] in Fig. 16. Additionally, samples of each state variable for this numerical test case are illustrated in Figs. 17 and 18. TM-Glow clearly generates a fluid flow that is far closer to the high-fidelity solution than the low-fidelity simulation, both in the magnitude of the fluid velocity and in the predicted vortex structures. The variance of the sampled time-series appears to depend on the Reynolds number. For example, the test case Re = 27500, which lies in the center of the training data range, has a noticeably larger sample diversity compared to the edge case Re = 47500. We believe that this is largely due to a lack of multiple flows at the same Reynolds number in the training data-set; the inclusion of multiple training flows at the same Reynolds numbers would thus increase sample diversity. In general, the model's samples produce accurate fluid flow and turbulent statistics, as illustrated in Figs. 19 and 20. In Fig. 19, the mean flow profiles of the state variables are plotted along with the predicted uncertainty for two test flows. Following in Fig. 20, the turbulent kinetic energy and Reynolds shear stress profiles are illustrated. TM-Glow is able to make dramatic improvements to the flow statistics for turbulent flows differing in Reynolds number by almost an order of magnitude, using only 32 fluid simulations to learn from.

(a) Re = 27500 (b) Re = 47500

Figure 15: (Top to bottom) Velocity magnitude of the high-fidelity target, low-fidelity input, 3 TM-Glow samples and standard deviation for two test flows.

(a) Re = 27500 (b) Re = 47500

Figure 16: (Top to bottom) Q-criterion of the high-fidelity target, low-fidelity input and three TM-Glow samples for two test flows.
(a) X-velocity (b) Y-velocity (c) Pressure

Figure 17: TM-Glow time-series samples of x-velocity, y-velocity and pressure fields for a backward-facing step test case at Re = 7500. For each field (top to bottom) the high-fidelity ground truth, low-fidelity input, three TM-Glow samples and the resulting standard deviation are plotted.

(a) X-velocity (b) Y-velocity (c) Pressure

Figure 18: TM-Glow time-series samples of x-velocity, y-velocity and pressure fields for a backward-facing step test case at Re = 27500. For each field (top to bottom) the high-fidelity ground truth, low-fidelity input, three TM-Glow samples and the resulting standard deviation are plotted.

Figure 19: (Top to bottom) Time-averaged x-velocity, y-velocity and pressure profiles for two different test cases at (left to right) Re = 7500 and Re = 47500. The TM-Glow expectation (TM-Glow) and confidence interval (TM-Glow 2σ) are computed using 20 time-series samples.

Figure 20: (Top to bottom) Turbulent kinetic energy and Reynolds shear stress profiles for two different test cases at (left to right) Re = 7500 and Re = 47500. The TM-Glow expectation (TM-Glow) and confidence interval (TM-Glow 2σ) are computed using 20 time-series samples.
6. Turbulent Flow around an Array of Cylinders
While the prediction of a flow at different Reynolds numbers is a practical test case, the reality is that the underlying flow structures have a relatively similar form. Thus, for our second numerical example, we wish to stress this model further by investigating the prediction of a flow where the underlying flow structures vary dramatically between test cases. A classical fluid mechanics benchmark is the flow around a cylinder; however, in its traditional form it does not reach the complexity we are interested in. Thus, for a more challenging problem, we will consider the prediction of the turbulent wake behind an array of cylinders with stochastic locations. Flow around multiple bluff bodies is important due to its various applications in engineering, including: wind flow around urban structures [70], water flow around bridge pylons [71, 72], wake from an array of wind turbines [73, 74], modern off-shore structures [75], heat transfer applications, etc. As depicted in Fig. 21, in this case study five cylinders are randomly placed within a specified area of a channel with a fixed uniform inlet velocity. The sub-domain we wish to predict is the wake region directly behind the cylinder array, in which the majority of the turbulence exists. Differing from the previous surrogate model where the Reynolds number was varied, here the physical boundary of the flow changes, resulting in very different fluid structures in the predictive sub-domain. The bulk Reynolds number of the flow, set at the constant value Re = 5000, is governed by the inlet velocity u = 1, viscosity ν = 0.0002 and cylinder diameter d = 1. This numerical example is akin to flow optimization problems for which a structure is optimized to yield desired flow properties. The predicted flow fields for both a low-fidelity and corresponding high-fidelity finite volume simulation are shown in Fig. 22 for two different cylinder arrays to demonstrate the difference in the resolved flow features.

Figure 21: Flow around an array of bluff bodies.
The red region indicates the area in which the bodies can be placed randomly. The green region indicates the wake zone for which we will use TM-Glow to predict a high-fidelity response from a low-fidelity simulation.

Figure 22: Velocity magnitude of the low-fidelity and high-fidelity simulations for two different cylinder arrays. (Left to right) Cylinder array configuration and the corresponding (top to bottom) high-fidelity and low-fidelity finite volume simulation results at several time-steps.

The low-fidelity simulator that will be the input of the model has a mesh characteristic resolution of l_c = 5d/16, and the target high-fidelity field has a characteristic resolution of l_c = 5d/64, as shown in Fig. 23. The mesh is structured in the wake region, allowing this data to be directly used with our convolutional generative model. Thus, TM-Glow will be provided an input of size [16 × 16] and predict a field of size [64 × 64], both with a time-step size of ∆t = 0.5. The model input for this numerical example is x^n = {u^n_l, p^n_l} ∈ R^(3×16×16), with an output y^n = {u^n_h, p^n_h} ∈ R^(3×64×64). The full training data set consists of fluid flows with cylinders randomly placed in different configurations. As in the previous numerical example, simulations were performed using the OpenFOAM finite volume solver with the standard Smagorinsky LES sub-grid scale model [67]. During training, we augment these time-series by splitting them in half into two time-series of 40 time-steps each to artificially create more training flows. Training input and output data are normalized to a standard unit Gaussian. Additional details on the computational cost of training TM-Glow for this numerical example can be found in Section 7.

Figure 23: Computational mesh around the cylinder array used for the (a) low-fidelity and (b) high-fidelity CFD simulations solved with OpenFOAM [67].
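The preprocessing just described, channel-wise standardization to a unit Gaussian plus splitting each time-series in half to create two 40-step training series, can be sketched as follows (the array layout and function names are assumptions for illustration, not the authors' code):

```python
import numpy as np

def standardize(data):
    """Normalize each channel to zero mean, unit variance over all flows, steps and pixels.

    data: array of shape (flows, time, channels, H, W).
    """
    mean = data.mean(axis=(0, 1, 3, 4), keepdims=True)
    std = data.std(axis=(0, 1, 3, 4), keepdims=True)
    return (data - mean) / std, mean, std

def split_in_half(data):
    """Augment: split each 80-step series into two 40-step series."""
    first, second = np.split(data, 2, axis=1)
    return np.concatenate([first, second], axis=0)

rng = np.random.default_rng(0)
raw = rng.normal(2.0, 3.0, size=(4, 80, 3, 16, 16))   # 4 flows, 80 steps, 3 channels
norm, mean, std = standardize(raw)
aug = split_in_half(norm)                              # 8 series of 40 steps each
```

The stored mean and standard deviation would be reused to un-normalize model predictions back to physical units.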
A test set of 32 flows, each with a unique cylinder configuration, is used to evaluate the performance of TM-Glow. Four models are trained on 32, 48, 64 and 96 flows. The test MSE of the velocity magnitude and TKE, defined in Eqs. 21 and 22, during training is plotted in Fig. 24. The test errors of various flow field quantities are listed in Table 5, along with the error obtained from naively interpolating the low-fidelity solution to the high-fidelity mesh. TM-Glow is able to produce time-averaged statistics that are far more accurate than the low-fidelity solution, as expected. As the training data set grows, we see improvements in the flow statistics, as we would expect. We note, though, that even on the smallest data set, large improvements over the low-fidelity simulation can still easily be obtained. For the remaining results, we will use the model trained on 96 flows to illustrate the highest-accuracy model obtained.

Figure 24: (Left to right) Cylinder array velocity magnitude and turbulent kinetic energy (TKE) error during training of TM-Glow on different data set sizes. Error values were averaged over five model samples.

Table 5: Cylinder array test error of various time-averaged flow field quantities for the low-fidelity solution interpolated to the high-fidelity mesh and for TM-Glow trained on different training data set sizes. Lower is better. TM-Glow errors were averaged over 20 samples from the model. The training time for each data set size is also listed.
              MSE(u_x)  MSE(u_y)  MSE(p)   MSE(√(u'_x)²)  MSE(√(u'_y)²)  MSE(√(p')²)  GPU Hrs.
Low-Fidelity  0.1024    0.0081    0.0179   0.0638         0.0955         0.02122      -
32 Flows      0.0432    0.0089    0.0158   0.0136         0.0201         0.0093       3.0
48 Flows      0.0378    0.0060    0.0181   0.0129         0.0175         0.0090       4.2
64 Flows      0.0361    0.0056    0.0110   0.0114         0.0170         0.0080       5.5
96 Flows      0.0304    0.0048    0.0127   0.0116         0.0174         0.0079       8.0
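The low-fidelity baseline row in Table 5 is obtained by naively interpolating the coarse [16 × 16] solution onto the [64 × 64] high-fidelity grid before computing the MSE. One plausible realization of such a baseline (bilinear interpolation via SciPy is our assumption; the paper does not state the interpolation scheme):

```python
import numpy as np
from scipy.ndimage import zoom

def interpolate_to_fine(field_lf, factor=4):
    """Bilinear (order-1) upsampling of a low-fidelity field to the high-fidelity grid."""
    return zoom(field_lf, factor, order=1)

def baseline_mse(field_lf, field_hf):
    """MSE between the interpolated low-fidelity field and the high-fidelity target."""
    return float(np.mean((interpolate_to_fine(field_lf) - field_hf) ** 2))

# Sanity check: a constant coarse field interpolates exactly, so the MSE is zero
coarse = np.ones((16, 16))
fine = np.ones((64, 64))
```

Applying this per time-step and time-averaging the fields would reproduce the style of baseline reported in the table.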
Similar to the previous numerical example, we plot several time-steps of the velocity magnitude for several time-series samples of the model in Fig. 25. Additionally, samples of each state variable for this numerical test case are illustrated in Figs. 26 and 27. Although the low-fidelity simulation differs significantly from the high-fidelity solution, we can see that TM-Glow is able to produce fluid realizations that qualitatively appear similar to the high-fidelity flow. In this particular example, the low-fidelity solution exhibits nearly laminar flow due to the coarse discretization used, which TM-Glow is able to correct. Additionally, the fluid flows sampled from TM-Glow appear much more diverse than those seen in the previous numerical example. Profiles of time-averaged flow quantities and turbulent statistics are plotted in Figs. 28 and 29 for two test flows. Indeed, we can see that TM-Glow is able to improve both, with reasonable uncertainty bounds as well. In general, TM-Glow is able to yield an accurate prediction of the time-averaged flow field. However, the model seems to consistently under-predict the turbulent intensity of the flow field. This could be improved through some ad hoc tuning of the loss by weighting the RMS term more heavily.

Figure 25: (Top to bottom) Velocity magnitude of the high-fidelity target, low-fidelity input, three TM-Glow samples and standard deviation for two test cases.

Figure 26: TM-Glow time-series samples of (a) x-velocity, (b) y-velocity and (c) pressure fields for a cylinder array test case. For each field (top to bottom) the high-fidelity ground truth, low-fidelity input, three TM-Glow samples and the resulting standard deviation are plotted.

Figure 27: TM-Glow time-series samples of (a) x-velocity, (b) y-velocity and (c) pressure fields for a cylinder array test case.
For each field (top to bottom) the high-fidelity ground truth, low-fidelity input, three TM-Glow samples and the resulting standard deviation are plotted.

Figure 28: Time-averaged flow profiles for two test flows: (a) test-case 1 and (b) test-case 2. TM-Glow expectation (TM-Glow) and confidence interval (TM-Glow 2σ) are computed using 20 time-series samples.

Figure 29: Turbulent statistic profiles for two test flows: (a) test-case 1 and (b) test-case 2. TM-Glow expectation (TM-Glow) and confidence interval (TM-Glow 2σ) are computed using 20 time-series samples.

7. Computational Cost Analysis

In the following section, the computational cost associated with the training and prediction of TM-Glow is discussed. The cost of a surrogate needs to be low enough to justify its use, which for deep learning models includes the training cost. To compare processes run on different hardware and CPU cores, we adopt the measure of a service unit (SU) hour, which is equal to a single CPU core hour or a single GPU hour. As shown in Table 6, both the low-fidelity and high-fidelity simulations were run on CPUs, while the deep generative model used a single GPU. Differences between CPU models were neglected since the computation of TM-Glow is bottlenecked by the GPU. The comparison of CPU consumption versus a GPU is not trivial due to the fundamental hardware differences between the two, and an in-depth investigation using energy consumption or floating point operations is beyond the intended scope of this paper. Hence, we use this simple definition, resembling that used by the Extreme Science and Engineering Discovery Environment (XSEDE). For both numerical examples, the OpenFOAM finite volume simulator was used due to its extensive validation and efficiency [67]. Both the low- and high-fidelity simulations used the standard LES Smagorinsky sub-grid scale model [76] with default parameters.
When the high-fidelity simulations were parallelized between CPUs, OpenFOAM's built-in "scotch" domain decomposition algorithm was used to partition the meshes. Additionally, the fluid flows are solved between times t = [0, 80] for both resolutions, but only t = [40, 80] is used as training/testing data. This is done to ensure that the sampled flow fields are of fully developed turbulence.
Table 6: Hardware used to run the low-fidelity and high-fidelity CFD simulations as well as thetraining and prediction of TM-Glow for both numerical examples.
               CPU Cores  CPU Model             GPUs  GPU Model          SU Hour
Low-Fidelity   1          Intel Xeon E5-2680    -     -                  1
High-Fidelity  8          Intel Xeon E5-2680    -     -                  8
TM-Glow        1          Intel Xeon Gold 6226  1     NVIDIA Tesla V100  2
The low-fidelity and high-fidelity simulations for the flow over the backwards facing step consisted of meshes with resolutions of ∆x, ∆y = h/12 and ∆x, ∆y = h/32, respectively. A sub-section of both meshes is plotted in Fig. 13 to illustrate the resolution difference. The low-fidelity and high-fidelity simulations for the flow over a cylinder array consisted of meshes with resolutions of ∆x, ∆y = 5d/16 and ∆x, ∆y = 5d/64, respectively. A sub-section of the meshes is plotted in Fig. 23 to illustrate the resolution difference. This results in the low-fidelity and high-fidelity meshes containing 9k and 125k cells, respectively. A single low-fidelity simulation takes about 3.1 minutes of wall-clock time (see Table 8).

Figure 30: Computational requirement for training TM-Glow given training data-sets of various sizes for (a) flow over the backwards step and (b) flow around the cylinder array. Computation is quantified using Service Units (SU) defined in Table 6.

Table 7: Prediction cost of the surrogate compared to the high-fidelity simulator for flow over a backwards step.
Backwards Step            SU Hours  Wall-clock (mins)
Low-Fidelity              0.06      4.5
TM-Glow 20 Samples        0.03      0.75
Surrogate Prediction      0.09      5.25
High-Fidelity Prediction  5.6       42
Table 8: Prediction cost of the surrogatecompared to the high-fidelity simulator forflow around a cylinder array.
Cylinder Array            SU Hours  Wall-clock (mins)
Low-Fidelity              0.05      3.1
TM-Glow 20 Samples        0.02      0.7
Surrogate Prediction      0.07      3.8
High-Fidelity Prediction  4.27      32
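The cost figures in Tables 7 and 8 imply a speedup of roughly 60x in SU hours for a full surrogate prediction (one low-fidelity run plus 20 TM-Glow samples) over a high-fidelity simulation. A quick check of that arithmetic (the SU-hour values are copied from the tables; the script itself is only illustrative):

```python
# SU-hour figures from Tables 7 and 8
costs = {
    "backwards_step": {"low_fidelity": 0.06, "tm_glow": 0.03, "high_fidelity": 5.6},
    "cylinder_array": {"low_fidelity": 0.05, "tm_glow": 0.02, "high_fidelity": 4.27},
}

for case, c in costs.items():
    surrogate = c["low_fidelity"] + c["tm_glow"]   # full surrogate prediction cost
    speedup = c["high_fidelity"] / surrogate       # SU-hour speedup factor
    print(f"{case}: {surrogate:.2f} SU vs {c['high_fidelity']} SU -> {speedup:.0f}x")
```

Note that this ignores the one-time training cost (Table 5 and Fig. 30), which is amortized over every subsequent surrogate prediction.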
8. Conclusion
The application of machine learning methods to CFD requires significant advances to extend such models to realistic problems. In this work, we investigated the prediction of fully-turbulent systems using deep learning. We proposed a multi-fidelity approach in which a computationally inexpensive low-fidelity solver is used as a conditional input to a deep generative model that predicts fluid realizations at high-fidelity resolution and accuracy. The model, Transient Multi-fidelity Glow (TM-Glow), is a conditional invertible neural network that allows for the analytical evaluation of the likelihood through the change of variables formula. TM-Glow is trained using a variational backwards KL divergence loss, which allows for the seamless combination of data-driven and physics-constrained learning. This model was demonstrated on two numerical examples, serving as a surrogate model for turbulent flow at different Reynolds numbers as well as with a stochastic boundary. With just the low-fidelity solution, TM-Glow was able to predict diverse samples of turbulent flow time-series that produce accurate mean field and turbulent statistics with error bars for uncertainty quantification.

The multi-fidelity aspect of our model is a key ingredient. The low-fidelity input provides critical information to the generative model, such as information regarding boundary conditions, mean flow properties and general flow field structure. While this low-fidelity simulation is typically inaccurate, it is a reliable starting point for the model to extrapolate from. The prediction from low- to high-fidelity is a significantly simpler problem than a blind high-fidelity flow prediction, allowing for reduced training data-set sizes and training times. For this reason, we believe that deep learning has significant potential in multilevel/multi-fidelity modeling of a vast number of physical systems, where it can be applied to even very high-dimensional complex phenomena thanks to a low-fidelity solver aiding the machine learning model.
In this spirit, future steps to be investigated include the extension of this model to other multi-fidelity physical systems. Additionally, as the deep learning field evolves, more modern architectures and training techniques could be integrated into the model to increase its predictive capability. Regardless of potential future directions, TM-Glow demonstrates that modern generative deep learning methods can be used effectively for multi-fidelity modeling of complex dynamical systems.
Acknowledgements
The authors acknowledge support from the Defense Advanced Research Projects Agency (DARPA) under the Physics of Artificial Intelligence (PAI) program (contract HR00111890034). Computing resources were provided by the Air Force Office of Scientific Research (AFOSR) through the DURIP program and by the University of Notre Dame's Center for Research Computing (CRC). The work of NG was also supported by the National Science Foundation (NSF) Graduate Research Fellowship Program grant No. DGE-1313583.

References

[1] S. B. Pope, Turbulent Flows, Cambridge University Press, Cambridge, 2000.
[2] P. Sagaut, Multiscale and Multiresolution Approaches in Turbulence: LES, DES and Hybrid RANS/LES Methods: Applications and Guidelines, World Scientific, 2013.
[3] S. M. Mitran, A comparison of adaptive mesh refinement approaches for large eddy simulation, Tech. rep., University of Washington, Seattle, Department of Applied Mathematics (2001).
[4] M. Terracol, P. Sagaut, C. Basdevant, A multilevel algorithm for large-eddy simulation of turbulent compressible flows, Journal of Computational Physics 167 (2) (2001) 439–474. doi:10.1006/jcph.2000.6687.
[5] J. Hoffman, C. Johnson, A new approach to computational turbulence modeling, Computer Methods in Applied Mechanics and Engineering 195 (23) (2006) 2865–2880. doi:10.1016/j.cma.2004.09.015.
[6] C. G. Speziale, Computing non-equilibrium turbulent flows with time-dependent RANS and VLES, in: Fifteenth International Conference on Numerical Methods in Fluid Dynamics, Springer, 1997, pp. 123–129. doi:10.1007/BFb0107089.
[7] A. Travin, M. Shur, M. Strelets, P. R. Spalart, Physical and numerical upgrades in the detached-eddy simulation of complex turbulent flows, in: R. Friedrich, W. Rodi (Eds.), Advances in LES of Complex Flows, Springer Netherlands, Dordrecht, 2002, pp. 239–254. doi:10.1007/0-306-48383-1_16.
[8] P. Quéméré, P. Sagaut, Zonal multi-domain RANS/LES simulations of turbulent flows, International Journal for Numerical Methods in Fluids 40 (7) (2002) 903–925. doi:10.1002/fld.381.
[9] doi:10.2514/1.3488.
[10] M. Terracol, E. Manoha, C. Herrero, E. Labourasse, S. Redonnet, P. Sagaut, Hybrid methods for airframe noise numerical prediction, Theoretical and Computational Fluid Dynamics 19 (3) (2005) 197–227. doi:10.1007/s00162-005-0165-5.
[11] J. Ling, A. Kurzawski, J. Templeton, Reynolds averaged turbulence modelling using deep neural networks with embedded invariance, Journal of Fluid Mechanics 807 (2016) 155–166. doi:10.1017/jfm.2016.615.
[12] H. Xiao, J.-L. Wu, J.-X. Wang, R. Sun, C. Roy, Quantifying and reducing model-form uncertainties in Reynolds-averaged Navier–Stokes simulations: A data-driven, physics-informed Bayesian approach, Journal of Computational Physics 324 (2016) 115–136. doi:10.1016/j.jcp.2016.07.038.
[13] J.-X. Wang, J.-L. Wu, H. Xiao, Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data, Physical Review Fluids 2 (2017) 034603. doi:10.1103/PhysRevFluids.2.034603.
[14] N. Geneva, N. Zabaras, Quantifying model form uncertainty in Reynolds-averaged turbulence models with Bayesian deep neural networks, Journal of Computational Physics 383 (2019) 125–147. doi:10.1016/j.jcp.2019.01.021.
[15] S. Taghizadeh, F. D. Witherden, S. S. Girimaji, Turbulence closure modeling with data-driven techniques: physical compatibility and consistency considerations, arXiv preprint arXiv:2004.03031.
[16] Z. Wang, K. Luo, D. Li, J. Tan, J. Fan, Investigations of data-driven closure for subgrid-scale stress in large-eddy simulation, Physics of Fluids 30 (12) (2018) 125101. doi:10.1063/1.5054835.
[17] C. J. Lapeyre, A. Misdariis, N. Cazard, D. Veynante, T. Poinsot, Training convolutional neural networks to estimate turbulent sub-grid scale reaction rates, Combustion and Flame 203 (2019) 255–264. doi:10.1016/j.combustflame.2019.02.019.
[18] R. Maulik, O. San, A. Rasheed, P. Vedula, Subgrid modelling for two-dimensional turbulence using neural networks, Journal of Fluid Mechanics 858 (2019) 122–144. doi:10.1017/jfm.2018.770.
[19] J. Wu, H. Xiao, R. Sun, Q. Wang, Reynolds-averaged Navier–Stokes equations with explicit data-driven Reynolds stress closure can be ill-conditioned, Journal of Fluid Mechanics 869 (2019) 553–586. doi:10.1017/jfm.2019.205.
[20] J. Rabault, M. Kuchta, A. Jensen, U. Réglade, N. Cerardi, Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control, Journal of Fluid Mechanics 865 (2019) 281–302. doi:10.1017/jfm.2019.62.
[21] K. Bieker, S. Peitz, S. L. Brunton, J. N. Kutz, M. Dellnitz, Deep model predictive control with online learning for complex physical systems, arXiv preprint arXiv:1905.10094.
[22] J. Tompson, K. Schlachter, P. Sprechmann, K. Perlin, Accelerating Eulerian fluid simulation with convolutional networks, in: Proceedings of the 34th International Conference on Machine Learning, ICML'17, JMLR.org, 2017, pp. 3424–3433.
[23] S. Wiewel, M. Becher, N. Thuerey, Latent space physics: Towards learning the temporal evolution of fluid flow, Computer Graphics Forum 38 (2) (2019) 71–82. doi:10.1111/cgf.13620.
[24] B. Kim, V. C. Azevedo, N. Thuerey, T. Kim, M. Gross, B. Solenthaler, Deep fluids: A generative network for parameterized fluid simulations, Computer Graphics Forum 38 (2) (2019) 59–70. doi:10.1111/cgf.13619.
[25] X. Guo, W. Li, F. Iorio, Convolutional neural networks for steady flow approximation, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, ACM, 2016, pp. 481–490. doi:10.1145/2939672.2939738.
[26] L. Sun, H. Gao, S. Pan, J.-X. Wang, Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data, Computer Methods in Applied Mechanics and Engineering 361 (2020) 112732. doi:10.1016/j.cma.2019.112732.
[27] M. Raissi, Z. Wang, M. S. Triantafyllou, G. E. Karniadakis, Deep learning of vortex-induced vibrations, Journal of Fluid Mechanics 861 (2019) 119–137. doi:10.1017/jfm.2018.872.
[28] M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707. doi:10.1016/j.jcp.2018.10.045.
[29] R. Han, Y. Wang, Y. Zhang, G. Chen, A novel spatial-temporal prediction method for unsteady wake flows based on hybrid deep neural network, Physics of Fluids 31 (12) (2019) 127101. doi:10.1063/1.5127247.
[30] O. Hennigh, Lat-Net: compressing lattice Boltzmann flow simulations using deep neural networks, arXiv preprint arXiv:1705.09036.
[31] A. Mohan, D. Daniel, M. Chertkov, D. Livescu, Compressed convolutional LSTM: An efficient deep learning framework to model high fidelity 3D turbulence, arXiv preprint arXiv:1903.00033.
[32] M. Werhahn, Y. Xie, M. Chu, N. Thuerey, A multi-pass GAN for fluid flow super-resolution, arXiv preprint arXiv:1906.01689.
[33] A. Subramaniam, M. L. Wong, R. D. Borker, S. Nimmagadda, S. K. Lele, Turbulence enrichment using generative adversarial networks, arXiv preprint arXiv:2003.01907.
[34] J. Holgate, A. Skillen, T. Craft, A. Revell, A review of embedded large eddy simulation for internal flows, Archives of Computational Methods in Engineering 26 (4) (2019) 865–882. doi:10.1007/s11831-018-9272-5.
[35] Y. Zhu, N. Zabaras, Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification, Journal of Computational Physics 366 (2018) 415–447. doi:10.1016/j.jcp.2018.04.018.
[36] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, P. Perdikaris, Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data, Journal of Computational Physics 394 (2019) 56–81. doi:10.1016/j.jcp.2019.05.024.
[37] N. Geneva, N. Zabaras, Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks, Journal of Computational Physics (2019) 109056. doi:10.1016/j.jcp.2019.109056.
[38] D. P. Kingma, M. Welling, Auto-encoding variational Bayes, arXiv preprint arXiv:1312.6114.
[39] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[40] L. Dinh, D. Krueger, Y. Bengio, NICE: Non-linear independent components estimation, arXiv preprint arXiv:1410.8516.
[41] L. Dinh, J. Sohl-Dickstein, S. Bengio, Density estimation using Real NVP, arXiv preprint arXiv:1605.08803.
[42] D. P. Kingma, P. Dhariwal, Glow: Generative flow with invertible 1x1 convolutions, in: Advances in Neural Information Processing Systems, 2018, pp. 10215–10224.
[43] J.-H. Jacobsen, A. Smeulders, E. Oyallon, i-RevNet: Deep invertible networks, arXiv preprint arXiv:1802.07088.
[44] M. Kumar, M. Babaeizadeh, D. Erhan, C. Finn, S. Levine, L. Dinh, D. Kingma, VideoFlow: A flow-based generative model for video, arXiv preprint arXiv:1903.01434.
[45] E. G. Tabak, E. Vanden-Eijnden, Density estimation by dual ascent of the log-likelihood, Communications in Mathematical Sciences 8 (1) (2010) 217–233.
[46] E. G. Tabak, C. V. Turner, A family of nonparametric density estimation algorithms, Communications on Pure and Applied Mathematics 66 (2) (2013) 145–164.
[47] L. Ardizzone, C. Lüth, J. Kruse, C. Rother, U. Köthe, Guided image generation with conditional invertible neural networks, arXiv preprint arXiv:1907.02392.
[48] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[49] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-k. Wong, W.-c. Woo, Convolutional LSTM network: A machine learning approach for precipitation nowcasting, in: Advances in Neural Information Processing Systems 28, Curran Associates, Inc., 2015, pp. 802–810. URL http://papers.nips.cc/paper/5955-convolutional-lstm-network-a-machine-learning-approach-for-precipitation-nowcasting.pdf
[50] S. Mo, Y. Zhu, N. Zabaras, X. Shi, J. Wu, Deep convolutional encoder-decoder networks for uncertainty quantification of dynamic multiphase flow in heterogeneous media, Water Resources Research 55 (1) (2019) 703–728. doi:10.1029/2018WR023528.
[51] G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[52] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167.
[53] X. Glorot, A. Bordes, Y. Bengio, Deep sparse rectifier neural networks, in: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 315–323.
[54] W. Grathwohl, R. T. Chen, J. Bettencourt, I. Sutskever, D. Duvenaud, FFJORD: Free-form continuous dynamics for scalable reversible generative models, arXiv preprint arXiv:1810.01367.
[55] C. Li, J. Li, G. Wang, L. Carin, Learning to sample with adversarially learned likelihood-ratio (2018). URL https://openreview.net/forum?id=S1eZGHkDM
[56] Y. Yang, P. Perdikaris, Adversarial uncertainty quantification in physics-informed neural networks, Journal of Computational Physics 394 (2019) 136–152. doi:10.1016/j.jcp.2019.05.027.
[57] R. Kumar, S. Ozair, A. Goyal, A. Courville, Y. Bengio, Maximum entropy generators for energy-based models, arXiv preprint arXiv:1901.08508.
[58] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, F. Huang, A tutorial on energy-based learning, Predicting Structured Data (2006).
[59] I. Sobel, G. Feldman, A 3x3 isotropic gradient operator for image processing, presented at the Stanford Artificial Intelligence Project (1968) 271–272.
[60] W. Xiong, W. Luo, L. Ma, W. Liu, J. Luo, Learning to generate time-lapse videos using multi-stage dynamic generative adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2364–2373.
[61] L. Zhao, X. Peng, Y. Tian, M. Kapadia, D. Metaxas, Learning to forecast and refine residual motion for image-to-video generation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 387–403.
[62] P. Liu, X. Qiu, X. Chen, S. Wu, X.-J. Huang, Multi-timescale long short-term memory neural network for modelling sentences and documents, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 2326–2335.
[63] J. Chung, S. Ahn, Y. Bengio, Hierarchical multiscale recurrent neural networks, arXiv preprint arXiv:1609.01704.
[64] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980.
[65] E. Erturk, Numerical solutions of 2-D steady incompressible flow over a backward-facing step, Part I: High Reynolds number solutions, Computers and Fluids 37 (6) (2008) 633–655. doi:10.1016/j.compfluid.2007.09.003.
[66] L. Chen, K. Asai, T. Nonomura, G. Xi, T. Liu, A review of Backward-Facing Step (BFS) flow mechanisms, heat transfer and control, Thermal Science and Engineering Progress 6 (2018) 194–216. doi:10.1016/j.tsep.2018.04.004.
[67] H. Jasak, A. Jemcov, Z. Tukovic, et al., OpenFOAM: A C++ library for complex physics simulations, in: International Workshop on Coupled Methods in Numerical Dynamics, Vol. 1000, IUC Dubrovnik, Croatia, 2007, pp. 1–20.
[68] J. C. Hunt, A. A. Wray, P. Moin, Eddies, streams, and convergence zones in turbulent flows, Center for Turbulence Research Report CTR-S88, 1988. URL https://ntrs.nasa.gov/search.jsp?R=19890015184
[69] G. Haller, An objective definition of a vortex, Journal of Fluid Mechanics 525 (2005) 1–26. doi:10.1017/S0022112004002526.
[70] Y.-H. Tseng, C. Meneveau, M. B. Parlange, Modeling flow around bluff bodies and predicting urban dispersion using large eddy simulation, Environmental Science & Technology 40 (8) (2006) 2653–2662. doi:10.1021/es051708m.
[71] doi:10.1061/(ASCE)0733-9429(1998)124:3(288).
[72] W. Huang, Q. Yang, H. Xiao, CFD modeling of scale effects on turbulence flow and scour around bridge piers, Computers and Fluids 38 (5) (2009) 1050–1058. doi:10.1016/j.compfluid.2008.01.029.
[73] M. Samorani, The wind farm layout optimization problem, in: P. M. Pardalos, S. Rebennack, M. V. F. Pereira, N. A. Iliadis, V. Pappu (Eds.), Handbook of Wind Power Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 21–38. doi:10.1007/978-3-642-41080-2_2.
[74] J. S. González, A. G. G. Rodríguez, J. C. Mora, J. R. Santos, M. B. Payán, Optimization of wind farm turbines layout using an evolutive algorithm, Renewable Energy 35 (8) (2010) 1671–1681. doi:10.1016/j.renene.2010.01.010.
[75] M. H. Patel, Dynamics of Offshore Structures, Butterworth-Heinemann, 2013.
[76] J. Smagorinsky, General circulation experiments with the primitive equations: I. The basic experiment, Monthly Weather Review 91 (3) (1963) 99–164. doi:10.1175/1520-0493(1963)091<0099:GCEWTP>2.3.CO;2.