Multi-fidelity Generative Deep Learning Turbulent Flows
Nicholas Geneva a, Nicholas Zabaras a,∗

a Center for Informatics and Computational Science, University of Notre Dame, 311 Cushing Hall, Notre Dame, IN 46556, USA
Abstract
In computational fluid dynamics, there is an inevitable trade-off between accuracy and computational cost. Low-fidelity simulations with coarse discretizations are computationally inexpensive; however, the resulting flow fields are often inaccurate. Alternatively, high-fidelity simulations can yield accurate predictions but at exponentially higher computational cost. In this work, a novel multi-fidelity deep generative model is introduced for the surrogate modeling of high-fidelity turbulent flow fields given the solution of a computationally inexpensive but inaccurate low-fidelity solver. The resulting surrogate is able to generate physically accurate turbulent realizations at a computational cost magnitudes lower than that of a high-fidelity simulation. The deep generative model developed is a conditional invertible neural network, built with normalizing flows, with recurrent LSTM connections that allow for stable training of transient systems with high predictive accuracy. The model is trained with a variational loss that combines both data-driven and physics-constrained learning. This deep generative model is applied to non-trivial high Reynolds number flows governed by the Navier-Stokes equations, including turbulent flow over a backwards facing step at different Reynolds numbers and a turbulent wake behind an array of bluff bodies. For both of these examples, the model is able to generate unique yet physically accurate turbulent fluid flows conditioned on an inexpensive low-fidelity solution.
Keywords:
Physics-Informed Machine Learning, Multi-fidelity Modeling, Invertible Deep Neural Networks, Uncertainty Quantification, Turbulent Fluid Flow

∗ Corresponding author
Email addresses: [email protected] (Nicholas Geneva), [email protected] (Nicholas Zabaras)
URL: https://cics.nd.edu/ (Nicholas Zabaras)
Preprint submitted to Elsevier June 9, 2020

1. Introduction

The numerical simulation and analysis of turbulent fluid flows is of great importance to many scientific and engineering domains. Over the past several decades, computational fluid dynamics (CFD) has become an integral component of academia and industry. However, high-accuracy fluid simulation remains a computationally demanding task, particularly at high Reynolds numbers for which the flow is turbulent. This has led to a hierarchy of simulation models to predict fluid flow, ranging from the fast but typically inaccurate Reynolds-Averaged Navier-Stokes (RANS) to the fully resolved but super-computer demanding direct numerical simulation (DNS) [1]. Large-eddy simulation (LES) has become a workhorse method for scientific and industrial analysis since it can achieve both reasonable accuracy and computational requirements. Often only a section of the entire simulation domain is of interest or requires a greater degree of accuracy. Such examples include boundary layers, turbulent wakes behind suspended or wall-mounted objects, the interface between two fluids, shock boundaries, etc. This principle that different physical scales are of interest in different locations of the simulation domain has led to the development of various multiscale/multilevel methods [2]. Multiscale methods typically combine simulations at different resolutions to increase the accuracy of the simulation with minimal computational overhead.

Multiscale computational fluid dynamic methods constitute a rich and well-developed field that encompasses many different methodologies that approach the multiscale aspect through different philosophies. Of particular interest are adaptive multilevel methods, which focus on resolving different scales based on the complexity of the fluid flow [2].
Such approaches use a hierarchy of grids at various resolutions to resolve particular areas of the simulation domain at various levels of accuracy. This includes methods that use self-adaptive meshes in which the discretization of the simulation domain is evolved to meet specific resolution criteria [3, 4, 5], global hybrid methods such as very large eddy simulation (VLES) [6] or detached eddy simulation (DES) [7], and zonal methods for which prespecified regions of the flow domain are resolved with higher accuracy to capture relevant physics [8, 9, 10]. We take inspiration from these multiscale models to develop a deep learning model that takes advantage of simulations run at multiple scales to predict high-fidelity turbulent fluid flow. This deep learning model replaces costly high-fidelity simulation, enabling us to obtain fast yet accurate turbulent statistics given a coarse simulation.

Machine learning in CFD, specifically the modeling of the Navier-Stokes (N-S) equations, has gained growing interest in recent years with a wide variety of methods ranging from Kalman filters to deep neural networks. These applications can be broken down into several major categories including: RANS turbulence modeling, LES sub-scale grid modeling, flow control and direct flow prediction. Machine learning based turbulence modeling for RANS simulation seeks to approximate the Reynolds-stress term in the RANS equations at an accuracy that is higher than the traditionally used closure models through the incorporation of prior physical knowledge and high-fidelity information [11, 12, 13, 14, 15]. Similarly, machine learning LES models seek to achieve the same goal of providing a sub-scale grid model that predicts the contribution of neglected turbulent length scales at a higher accuracy than the traditional methods [16, 17, 18]. These approaches are both very promising; however, they still rely on pre-existing physical assumptions, approximations and resolutions which fundamentally limit their predictive capability [19].
Another area of interest has been the use of machine learning models to build a controller to yield a particular fluid response [20, 21].

The final category we discuss is direct fluid flow prediction, where the machine learning model is used to predict the state variables of the fluid flow directly. This includes the use of machine learning to approximate fluid flows for graphical simulations [22, 23, 24], prediction of steady-state flows [25, 26], prediction of oscillating/unsteady flows [21, 27, 28, 29], and the super-resolution, compression or reproduction of various fluid systems [30, 31, 32, 33]. While machine learning has become a popular tool to predict the behavior of fluids, we note that the majority of the test cases considered are focused on simple non-turbulent problems. Many works that predict turbulent flows are largely focused on qualitative results (e.g. computer graphics). This is expected due to the sheer complexity of N-S turbulence, which poses a challenging problem for even traditional numerical methods, let alone machine learning models. Given that the vast majority of fluid flows of interest are turbulent in nature, much work is still needed to push the application of machine learning to practical fluid flow problems of engineering concern.

In this work, we accelerate the prediction of high-fidelity turbulent flows given a computationally inexpensive low-fidelity simulation through generative deep learning. Although similar ideas have been presented in past literature, the proposed model differs in several respects. First, we are interested in the prediction of physical turbulent fluid flow governed by the Navier-Stokes equations, differing from the simpler inviscid Euler equations used in computer graphics [22, 23, 24]. Second, in a similar spirit, we are interested in recovering accurate time-averaged and turbulent statistics as opposed to fluid flows that are just visually pleasing.
Third, in this work the input of our model is an inexpensive low-fidelity simulation that provides a coarse yet fairly inaccurate prediction. This contrasts with many works in machine learning for turbulent applications where compressed [30, 31] or sub-sampled [32, 33] fields of the high-fidelity target are used as the input. Some auto-regressive models, such as the deep neural network (DNN) in [29], are in fact even more dependent on a high-fidelity simulation, which is needed to start the time-series prediction. The reliance upon a direct/coarsened high-fidelity field as a model input provides much richer and more accurate information than a low-fidelity simulation, since it is being sampled from a space for which the physics simulated is significantly more precise. While this makes the machine learning problem significantly easier, it also results in models that need an expensive high-fidelity simulation to derive an input for making predictions. Thus the applicability of such models remains questionable. Fourth, we are interested in developing a surrogate model that can be used to predict multiple flows with different boundary conditions as opposed to just learning a single flow, which is essential for justifying the model's training cost. Lastly, in contrast to past deterministic approaches [30, 31, 33], our generative model learns the probability distribution of high-fidelity flow fields conditioned on the low-fidelity simulation, allowing for predictive probabilistic estimates.

This paper makes the following novel contributions to the integration of deep learning with CFD: (a) A multi-fidelity deep generative model is proposed for the prediction of physical high-fidelity fluid flow from a low-fidelity solution.
(b) A novel invertible neural network architecture is proposed to model the distribution of possible high-fidelity fluid flow solutions conditioned on the low-fidelity observation. (c) A backwards Kullback-Leibler (KL) divergence loss is used that allows for physics-constrained and standard data-driven training of the generative model. (d) The model is deployed and evaluated for surrogate modeling of turbulent flows at different Reynolds numbers and varying boundary conditions.

The remainder of this paper is structured as follows. In Section 2, the problem of multi-fidelity generative modeling of turbulent fluid flows is introduced and discussed. In Section 3, the generative invertible neural network architecture is introduced with details of each component of the model. Following in Section 4, the variational training of the generative model is outlined as well as the tuning of the model's hyper-parameters. The first numerical example, in Section 5, investigates the surrogate modeling of turbulent flow over a backwards facing step at different Reynolds numbers. The second numerical example, in Section 6, focuses on the prediction of a turbulent wake behind an array of bluff bodies in varying locations. In Section 7, the computational cost of both training and testing the proposed deep learning model is discussed. Lastly, conclusions and discussion are provided in Section 8. All code, trained models and data used in this work are open-sourced for full reproducibility (code available upon publication).

2. Multi-fidelity Generative Modeling of Fluid Flows

Multiscale fluid simulation methods seek to strike an ideal balance between predictive accuracy and computational requirements. In particular, zonal/hybrid methods couple a low-fidelity simulation with a high-fidelity simulation that is only evaluated in an area of interest. This is most commonly done through the use of RANS or unsteady RANS in the low-fidelity region and LES in the high-fidelity region.
Here, we consider the use of a very large eddy simulation (VLES) (an LES in which the majority of the kinetic energy is unresolved due to a coarse grid) and an LES on a finer mesh for the low- and high-fidelity areas, respectively. As depicted in Fig. 1a, this results in two coupled simulations which are solved simultaneously, with information being passed through the boundary of the high-fidelity simulation domain. The objective in this work is to replace this high-fidelity simulation zone with a fast generative deep learning model which can quickly predict a high-fidelity realization given the low-fidelity simulation, as illustrated in Fig. 1b. We refer to this framework as a multi-fidelity generative model due to the distinctly different physical scales resolved by the input and output. While the scope of the numerical examples explored in this work is focused on the use of low-fidelity and high-fidelity LES simulations, everything discussed in this work can be extended to other multi-fidelity models using different coarse/fine simulation schemes. We also note that there is no limit on the size of the prediction area by the deep learning model, i.e. it can be the entire simulation domain if necessary. However, in this work, we are motivated by engineering needs where such zonal approaches are extremely applicable.

(a) Hybrid VLES-LES [34]. (b) Multi-fidelity deep generative turbulence.
Figure 1: Comparison between traditional hybrid VLES-LES simulation (left) and the proposed multi-fidelity deep generative turbulence model (right) for studying the wake behind a wall-mounted cube.
Simply learning a single solution of a PDE with a deep learning model has little practical benefit when a numerical solver exists, due to the time and computational investment needed to tune and train the model. Thus we are interested in surrogate modeling of turbulence in multiple flows with varying boundary conditions (e.g. obstacle position or inlet velocity). This is of particular interest for various engineering tasks including fluid-structure design/optimization, inverse modeling and uncertainty quantification. To formalize the problem of interest, consider an incompressible flow governed by the Navier-Stokes (N-S) equations:

∂u_j/∂t + u_i ∂u_j/∂x_i = −(1/ρ) ∂p/∂x_j + ν_eff ∂²u_j/(∂x_i ∂x_i),  x ∈ Ω,  t ∈ [0, T],

u_i(x, 0) = u_0(x),  p(x, 0) = p_0(x),  B(u_i, p) = b(x, t),  x ∈ Γ,  (1)

where {u_j, p} are the velocity components and pressure, respectively, being resolved within the spatial domain Ω. ν_eff is the effective kinematic viscosity, which can represent the true viscosity, in the case of DNS, as well as turbulent dissipation from length scales not resolved, in the case of LES and URANS. Γ denotes the boundary of the domain of interest for which the boundary operator B imposes the desired boundary conditions. The initial state of the system is defined by {u_0, p_0}.

As depicted in Fig. 1b, we wish to build a deep generative model to infer from a low-fidelity flow field the corresponding high-fidelity realizations. Due to their past success for modeling physical systems [35, 36, 37], we choose to use a convolution-based generative model with learnable parameters θ. The use of convolutions implies that the data is placed onto a structured Euclidean grid, akin to that of pixels in images. For example, given a two-dimensional incompressible fluid flow field, the prediction of a single time-step n would have a low-fidelity input x^n = {u_l, p_l} ∈ R^{3×d_l×d_l} and a high-fidelity output y^n = {u_h, p_h} ∈ R^{3×d_h×d_h}, both of which span the same domain Ω′ ⊂ Ω as depicted by the dashed boxes in Fig. 1b. Although omitted, this is easily extendable to one- and three-dimensional fluid flows as well. Given that d_l < d_h, this requires the model to predict length scales not recovered by the coarse simulation, making this problem ill-posed and motivating the need for a generative probabilistic model to predict the density p(y^n | x^n) as opposed to a single deterministic solution.

Remark 1.
The inclusion of a low-fidelity simulator as an input to the deep learning surrogate allows for important information regarding the boundary conditions of the flow and approximate flow properties to be provided to the model. This simplifies the learning task significantly by providing a physical coarse estimate of the flow, which is important for the prediction of the solution of the highly non-linear N-S equations at high Reynolds numbers. While we have shown in our past work that deep learning surrogates can successfully model many complex physical systems independently [35, 36, 37], the systems of interest in these works are far less complex than the turbulent N-S equations.

Given that turbulence is a transient phenomenon, predicting a single time-step is not sufficient; thus we wish to predict an entire high-fidelity time-series Y = {y^1, y^2, ..., y^N} given the respective low-fidelity observations X = {x^1, x^2, ..., x^N}. Although extensions can be made to simplify the model presented, we will assume that the time-step size, Δt, of the low-fidelity input and high-fidelity output is the same (i.e. for each input there is one output). The objective for this surrogate is fluid flow applications for which the boundary conditions are stochastic, such that b(x̂, t) ∼ p(b), where p(b) is an empirical, analytically known or unknown probability distribution. This spans problems including the modeling of a flow at different Reynolds numbers, different domain boundary conditions, flow through varying geometries or different initial conditions, making this relevant to a vast number of fluid mechanics research studies. Given that we wish to predict an entire time-series of high-fidelity realizations, Y, we pose the following definition for the generative multi-fidelity surrogate for flows with a stochastic boundary.

Definition 2.1. Generative Surrogate for Flows with Stochastic Boundary Conditions.
Consider low- and high-fidelity simulators that compute fluid flow governed by the N-S equations. For a given finite set of boundary conditions {b(x̂, t)_i}_{i=1}^M ∼ p(b), these simulators are used to collect a training set of low- and high-fidelity simulation data D = {X_i, Y_i}_{i=1}^M in the time interval t ∈ [0, T]. The problem of interest is training a generative surrogate to learn p_θ(Y | X) and compute the predictive conditional density p_θ(Y* | X*, D) of the high-fidelity flow field Y* for any low-fidelity flow time-series X* for a given boundary condition b*(x, t) ∼ p(b).
3. Transient Multi-fidelity Glow
Deep generative models provide a flexible probabilistic framework, with the most fundamental formulation centered around the use of random latent variables, z, in a deep learning model (i.e. a neural network) to allow for the likelihood of the model's output, y, to be expressed as the following marginal:

p_θ(y) = ∫ p_θ(y | z) p_θ(z) dz, (2)

in which θ denotes the model's parameters. In this work, the model's output, y, is the high-fidelity flow field we wish to predict; however, in this particular section y should be interpreted as a much more abstract output encompassing a wide variety of machine learning problems. The latent variables are specifically designed such that their distribution is simple to sample from. However, this marginal is typically not practical to train due to the large number of samples needed from p_θ(z) to approximate the marginalization. Hence, generative models such as variational auto-encoders (VAEs) [38] as well as generative adversarial networks (GANs) [39] approximate this likelihood through variational inference or by a min-max adversarial game, respectively.

In this work, we will utilize normalizing flows, which in recent years have gained increasing attention due to their extension to invertible neural networks (INNs) for tasks such as variational inference and generative modeling [40, 41, 42, 43, 44]. Generative normalizing flows provide a bijective mapping between an unknown likelihood density of the observations p_θ(y) and a known latent density p_θ(z). Typically, p_θ(y) can be viewed as the unknown likelihood of a system for which we have a finite number of observations, i.e. training data. Let us consider a mapping with a tractable Jacobian determinant, henceforth referred to as the Jacobian, which allows for the likelihood to be expressed w.r.t.
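To see why training through the marginal in Eq. (2) directly is impractical, consider a simple Monte Carlo estimate of it. The sketch below is our own illustration (the `toy_likelihood` model is hypothetical, not part of the proposed method): the estimate converges only at the O(1/sqrt(n)) Monte Carlo rate, so many latent samples are needed for each likelihood evaluation.

```python
import numpy as np

def marginal_likelihood_mc(y, likelihood, n_samples, rng):
    """Monte Carlo estimate of Eq. (2): p(y) ~= mean over z of p(y|z), z ~ p(z).

    Here p(z) is taken as a standard normal latent density.
    """
    z = rng.standard_normal(n_samples)
    return np.mean(likelihood(y, z))

# Toy conditional likelihood (illustrative only): p(y|z) = N(y; z, 1).
# With z ~ N(0, 1), the exact marginal is N(y; 0, 2).
def toy_likelihood(y, z):
    return np.exp(-0.5 * (y - z) ** 2) / np.sqrt(2.0 * np.pi)
```

Even for this one-dimensional toy problem, tens of thousands of latent samples are required for a few digits of accuracy, which motivates the variational and flow-based alternatives discussed next.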
the latent density as follows:

p_θ(y) = p_θ(z) |det(∂z/∂y)|, (3)

which is nothing more than the change of variables formula. This implies that the model can be trained by maximizing the likelihood of p_θ(y) (unknown) through the latent variables assigned a simple distribution p_θ(z) a-priori (typically Gaussian). As depicted in Fig. 2a, we use f_θ(·) to denote the learnable function, with a tractable Jacobian, that transforms observations to latent variables. To generate samples y_i ∼ p_θ(y), samples are drawn from the latent distribution, z_i ∼ p_θ(z), which are then transformed using the inverse of the model, f_θ^{-1}(·).

However, the requirement of a tractable Jacobian as well as a function that can be efficiently inverted for sampling is not trivial. Normalizing flows address this challenge by using a series of change of variable transformations [45, 46],

y ←f_θ1→ h_1 ←f_θ2→ h_2 ... ←f_θK→ z, (4)

each of which has a tractable Jacobian and is invertible. This allows for the log of the likelihood to be written as a summation of Jacobians:

log p_θ(y) = log p_θ(z) + Σ_{k=1}^K log |det(∂h_k/∂h_{k−1})|, (5)

in which h_0 ≡ y and h_K ≡ z.

The core ideas of normalizing flows can be extended to constructing generative deep neural network models. A particular subset of flow-based deep learning models we are interested in are coupling layer normalizing flows, first proposed in NICE [40] and Real NVP [41]. A coupling layer is a carefully designed function such that the inverse mapping and the Jacobian can be easily calculated. These layers can then be stacked, just like layers of a neural network, to form an expressive model with a tractable Jacobian and inverse.
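The change of variables formula in Eqs. (3)-(5) can be made concrete with a minimal single-transformation flow. The sketch below is our own toy example (not the TM-Glow implementation): an elementwise affine map z = exp(log s) ⊙ y + t with a standard normal latent density, for which the log-Jacobian reduces to sum(log s), i.e. Eq. (5) with K = 1.

```python
import numpy as np

def affine_flow_log_likelihood(y, log_s, t):
    """Exact log p(y) for the map z = exp(log_s) * y + t with z ~ N(0, I).

    The Jacobian of an elementwise affine map is diagonal, so its
    log-determinant is simply sum(log_s).
    """
    z = np.exp(log_s) * y + t                        # forward pass f_theta(y)
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2.0 * np.pi)
    return log_pz + np.sum(log_s)                    # change of variables

def affine_flow_sample(rng, log_s, t):
    """Draw y ~ p(y): sample z from the latent density, apply f_theta^{-1}."""
    z = rng.standard_normal(log_s.shape)
    return (z - t) / np.exp(log_s)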
To increase the expressive capabilities of normalizing flow models, various transformations have been proposed to envelope coupling layers, such as invertible 1×1 convolutions [42]. Coupling layers have also been extended with conditional variants that model conditional densities, p_θ(y | x), for standard machine learning problems and surrogate modeling of physical systems [36, 47].

(a) INN (b) CINN (c) TM-Glow

Figure 2: Comparison of the forward and backward passes of various INN structures including (left to right) the standard INN, conditional INN (CINN) [36] and transient multi-fidelity Glow (TM-Glow) introduced in Section 3.2.
As discussed in Section 2, we are interested in the prediction of a high-fidelity flow, Y = {y^1, y^2, ..., y^N}, given the corresponding solution of a low-fidelity simulation, X = {x^1, x^2, ..., x^N}. In our past work [37], we formulated a deep convolutional auto-regressive model for modeling the evolution of a transient PDE as a Markov chain. To increase the predictive capability of our model and integrate the low-fidelity observations, in this work we will use a deep recurrent neural network (RNN) formulation, which is a standard approach for time-series predictions in deep learning [48]. While our model will still predict a single time-step at a time, latent information is passed between time-steps that the model can learn. The computational graph of this RNN with recurrent features τ^n is depicted in Fig. 3. The likelihood for the entire time-series can be decomposed as follows:

p_θ(Y | X) = Π_{n=1}^N p_θ(y^n | x^n, ..., x^1) = Π_{n=1}^N p_θ(y^n | x^n, τ^{n−1}), (6)

in which the recurrent features, τ^{n−1}, carry information from the past time-steps x^{n−1}, ..., x^1 [48]. This requires some initialization for τ^0 to be defined. In this work, these states are made random with a known distribution, as discussed in Section 3.3.1; however, there are many alternatives in the RNN literature, such as making them constant (i.e. a delta density function) or making them learnable parameters.

Figure 3: Unfolded computational graph of a recurrent neural network model for which the arrows show functional dependence.
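The factorization in Eq. (6) amounts to accumulating per-step conditional log-likelihoods while threading the recurrent state forward. A minimal sketch of this accumulation, with a hypothetical Gaussian `toy_step` standing in for one TM-Glow forward pass:

```python
import numpy as np

def sequence_log_likelihood(step_log_prob, X, Y, tau0):
    """Accumulate log p(Y|X) = sum_n log p(y^n | x^n, tau^{n-1}), per Eq. (6).

    `step_log_prob(y, x, tau)` returns the per-step conditional log-probability
    and the next recurrent state tau^n.
    """
    total, tau = 0.0, tau0
    for x, y in zip(X, Y):
        log_p, tau = step_log_prob(y, x, tau)
        total += log_p
    return total

# Illustrative step model only: the recurrent state is a running summary of
# past inputs that shifts a unit-variance Gaussian predictive density.
def toy_step(y, x, tau):
    mean = 0.5 * x + 0.5 * tau
    log_p = -0.5 * (y - mean) ** 2 - 0.5 * np.log(2.0 * np.pi)
    return log_p, mean
```

The same loop structure applies whatever per-step density is used; only `step_log_prob` changes when the toy Gaussian is replaced by an invertible network.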
Implementing the RNN framework, our goal is to develop a generative model that can learn the conditional density, p_θ(y^n | x^n, τ^{n−1}), of a single high-fidelity time-step. This poses the following three design requirements: a) the core of our model must be generative, allowing for probabilistic modeling of the likelihood, b) we must formulate a method for encoding the low-fidelity inputs x into features that can condition the generator and c) recurrent connections need to be integrated into the heart of the generative model to condition it on temporal information. To this end, we present a novel Transient Multi-fidelity Glow (TM-Glow) model for probabilistic surrogate modeling of dynamical systems, illustrated in Fig. 4. TM-Glow is built around the Glow model proposed by Kingma et al. [42], which will be the core generative INN for modeling the conditional likelihood. This model is depicted in the right column of Fig. 4a and the blue boxes in Fig. 4b. Glow is designed to provide a multiscale encoding of the high-fidelity fields, y^n, into a set of random latent variables, z^n, represented by the orange boxes in Fig. 4. To address the second design requirement, we use the convolutional conditional encoder proposed by Zhu et al. [36], which conditions the Glow model on the low-fidelity input, x^n, through a set of learnable features. This conditional encoder is shown in the left column of Fig. 4a and the pink boxes in Fig. 4b. Lastly, to allow for temporal conditioning of the Glow model, recurrent connections are integrated into novel LSTM affine coupling blocks discussed in Section 3.3.1. These LSTM-based operations allow for recurrent features to flow in and out of the generator, illustrated by the green boxes in Fig. 4b.

(a) TM-Glow model schematic. (b) Dimensionality representation of TM-Glow with a model depth of k_d = 3.

Figure 4: TM-Glow model.
This model is comprised of a low-fidelity encoder that conditions a generative flow model to produce samples of high-fidelity field snapshots. LSTM affine blocks are introduced to pass information between time-steps using recurrent connections. Boxes with rounded corners in (a) indicate a stack of the elements inside and should not be confused with plate notation. Arrows illustrate the forward pass of the INN. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)
As shown in Fig. 4a, the TM-Glow core component is the multiscale Glow model comprised of squeeze, LSTM affine block and split operations discussed in detail in Section 3.3. Superscript numbers enclosed in parentheses are used to denote variables at different TM-Glow model levels. We emphasize that TM-Glow is an INN; thus this model can evaluate the conditional likelihood exactly through the change of variables:

log p_θ(Y | X) = Σ_{n=1}^N [ log p_θ(z^n | x^n, τ^{n−1}) + Σ_{k=1}^K log |det(∂h_k^n/∂h_{k−1}^n)| ], (7)

in which {h_k^n}_{k=1}^K is used to denote the hidden layers of TM-Glow that are the inputs/outputs of the various invertible operations discussed in Section 3.3 and specifically in Table 1. The forward pass of the model, f_θ(·), encodes the high-fidelity observation y^n into a set of random latent variables z^n = {z^{(1),n}, z^{(2),n}, ..., z^{(k_d),n}, z^{(e),n}}. The backward/inverse pass of the model, f_θ^{-1}(·), generates a sample of y^n by sampling each random latent variable. The novel LSTM affine block contains recurrent connections between time-steps, conditioning the INN on the latent states of previous time-steps τ^{n−1} = {τ^{(1),n−1}, ..., τ^{(k_d),n−1}}. The dense convolutional encoder, detailed in Section 3.4, encodes a low-fidelity input into conditional feature maps, ξ^n = {ξ^{(1),n}, ξ^{(2),n}, ..., ξ^{(k_d),n}}, that are injected into the multiscale Glow model at each dimensional level as depicted in Fig. 4b. The use of the recurrent connections in the LSTM block as well as the conditioning encoder results in the directed graphical representation of the model shown in Fig. 5.

Figure 5: The unrolled computational graph of the TM-Glow model for a model depth of k_d = 3.
Our model is centered around a multiscale structure to promote the discovery of low-dimensional representations of the physics that govern the system. As seen in Fig. 4, a multiscale Glow model originally proposed by Dinh et al. [41] is employed to generate a flow field realization. In Fig. 4a, this is the right column of the model comprised of squeeze, LSTM affine block and split operations, each of which is invertible. We remind the reader that the goal of each of these operations is to provide a computationally efficient but descriptive mapping between the high-fidelity flow field and the random latent variables. As previously discussed in Eq. (4), this is achieved through the series of transformations between the hidden layers {h_k^n}_{k=1}^K. These transformations are precisely the operations discussed in the subsequent sections and listed in Table 1.

3.3.1. LSTM Affine Block

The core component of the generative portion of the TM-Glow model is the LSTM affine block, which is a novel extension of the conditional affine coupling layers [36, 47] designed specifically for transient time-series prediction. The LSTM affine block is comprised of three different sub-components illustrated in Fig. 6: an unnormalized conditional affine block, a stack of conditional affine blocks and a conditional LSTM affine block.
Figure 6: The LSTM affine block used in TM-Glow consisting of k_c affine coupling layers including an unnormalized conditional affine block (UnNorm Block), a stack of conditional affine blocks (Conditional Block) and a conditional LSTM affine block (LSTM Block).

The core component of all these blocks are affine coupling layers [40], a specially designed function that allows for an efficient inversion and Jacobian calculation. As depicted in Fig. 7a, half of the input, h_{k−1}, is modified by the scale and translation parameters, s and t, respectively, calculated from a coupling neural network (coupling NN). As implemented in Zhu et al. [36], this coupling NN is a shallow dense convolutional network with an input of the other half of the original feature map, h_{k−1}, and the conditional input ξ^{(i),n}, which are simply concatenated together. This coupling NN contains the learnable parameters that can be updated using any gradient descent method. As detailed in Table 1, the retention of part of the input to the coupling NN allows for a simple inversion and Jacobian calculation.

This conditional affine coupling layer is further extended with a convolutional LSTM (ConvLSTM), depicted in Fig. 7b, for transient problems. ConvLSTM is a variation of the traditional LSTM structure that employs convolutional operations [49], making it better suited for convolutional models such as TM-Glow. The input to the ConvLSTM is the same as the input of the coupling NN in the conditional coupling layer, which is conditioned on ξ^{(i),n}. Following the standard ConvLSTM formulation, the recurrent features have two states, τ^{(i),n−1} = {a_in^{(i)}, c_in^{(i)}}, which correspond to the LSTM hidden and cell state, respectively. The output of the LSTM, τ^{(i),n} = {a_out^{(i)}, c_out^{(i)}}, is passed to the subsequent time-step and a_out^{(i)} is used as an input to the coupling NN of the affine layer.
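The coupling-layer mechanics can be made concrete with a small sketch, our own simplified stand-in for the conditional affine coupling layer of Fig. 7a (the `coupling_nn` below is a hypothetical toy function, not the paper's shallow convolutional network): the first half of the features passes through unchanged, the second half is scaled and shifted, and the log-Jacobian is sum(log s).

```python
import numpy as np

def coupling_nn(h1, xi):
    # Hypothetical stand-in for the coupling NN: any function of (h1, xi)
    # works, since invertibility never requires inverting this network.
    log_s = 0.1 * np.tanh(h1 + xi)
    t = 0.2 * np.tanh(h1 - xi)
    return log_s, t

def coupling_forward(h, xi):
    h1, h2 = np.split(h, 2)
    log_s, t = coupling_nn(h1, xi)
    h_out = np.concatenate([h1, np.exp(log_s) * h2 + t])
    log_det = np.sum(log_s)  # triangular Jacobian -> determinant from diagonal
    return h_out, log_det

def coupling_inverse(h, xi):
    h1, h2 = np.split(h, 2)          # h1 was passed through unchanged
    log_s, t = coupling_nn(h1, xi)   # recompute the identical scale/shift
    return np.concatenate([h1, (h2 - t) / np.exp(log_s)])
```

Because the scale and shift depend only on the untouched half (and the conditioning features), the inverse recomputes them exactly, which is why the coupling NN itself can be an arbitrary network.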
The initial states of the hidden and cell states at the first time-step are assigned the following densities:

τ^{(i),0} = { a_in^{(i)} ∼ U(−1, 1), c_in^{(i)} ∼ N(0, 1) }. (8)

The resulting coupling layer is conditioned on both the current low-fidelity input as well as past time-step states. In the recent work of Kumar et al. [44], residual connections between generative flow models are also proposed, which are implemented by simply using the previous latent variables as an input to the shallow neural network in the split operation, as detailed in Section 3.3.3. The proposed use of the ConvLSTM affine layer prevents a vanishing gradient and enables much more descriptive recurrent feature maps to be learned.

(a) Conditional coupling layer. (b) Conditional LSTM coupling layer.

Figure 7: The two variants of affine coupling layers used in TM-Glow with input and output denoted as h_{k−1} = {h_{k−1}^1, h_{k−1}^2} and h_k = {h_k^1, h_k^2}, respectively. Time-step superscripts have been omitted for clarity of presentation.

In the coupling layer blocks, ActNorm is used, which was originally proposed by Kingma and Dhariwal [42] as an alternative to batch-normalization. ActNorm applies an invertible normalization to each feature channel, detailed in Table 1, that allows for smaller batch-sizes to be used without the performance degradation seen in traditional batch-normalization. The last essential component of the affine coupling blocks is the invertible 1×1 convolution [42]. Since its kernel size is 1×1, it can be efficiently inverted and has a trivial Jacobian, as detailed in Table 1. The purpose of this convolution is to permute the feature maps between coupling layers. Since the coupling layers used in Fig. 7 only operate on half of the input data, permutation between layers is essential to increase the expressibility of the model.

Table 1: Invertible operations used in the generative normalizing flow method of TM-Glow. Being consistent with the notation in [42], we assume the inputs and outputs of each operation are of dimension h_{k−1}, h_k ∈ R^{c×h×w} with c channels and a feature map size of [h × w]. Indexes over the spatial domain of the feature map are denoted by h(x, y) ∈ R^c. The coupling neural network and convolutional LSTM are abbreviated as NN and
LSTM, respectively. Time-step superscripts have been omitted for clarity of presentation.
Conditional Affine Layer:
  Forward: {h_{k-1,1}, h_{k-1,2}} = h_{k-1}; (log s, t) = NN(h_{k-1,2}, ξ^{(i)}); h_{k,1} = exp(log s) ⊙ h_{k-1,1} + t; h_{k,2} = h_{k-1,2}; h_k = {h_{k,1}, h_{k,2}}
  Inverse: {h_{k,1}, h_{k,2}} = h_k; (log s, t) = NN(h_{k,2}, ξ^{(i)}); h_{k-1,1} = (h_{k,1} - t) / exp(log s); h_{k-1,2} = h_{k,2}; h_{k-1} = {h_{k-1,1}, h_{k-1,2}}
  Log Jacobian: sum(log |s|)

LSTM Affine Layer:
  Forward: {h_{k-1,1}, h_{k-1,2}} = h_{k-1}; (a^{(i)}_out, c^{(i)}_out) = LSTM(h_{k-1,2}, ξ^{(i)}, a^{(i)}_in, c^{(i)}_in); (log s, t) = NN(h_{k-1,2}, ξ^{(i)}, a^{(i)}_out); h_{k,1} = exp(log s) ⊙ h_{k-1,1} + t; h_{k,2} = h_{k-1,2}; h_k = {h_{k,1}, h_{k,2}}
  Inverse: {h_{k,1}, h_{k,2}} = h_k; (a^{(i)}_out, c^{(i)}_out) = LSTM(h_{k,2}, ξ^{(i)}, a^{(i)}_in, c^{(i)}_in); (log s, t) = NN(h_{k,2}, ξ^{(i)}, a^{(i)}_out); h_{k-1,1} = (h_{k,1} - t) / exp(log s); h_{k-1,2} = h_{k,2}; h_{k-1} = {h_{k-1,1}, h_{k-1,2}}
  Log Jacobian: sum(log |s|)

ActNorm:
  Forward: ∀ x, y: h_k(x, y) = s ⊙ h_{k-1}(x, y) + b
  Inverse: ∀ x, y: h_{k-1}(x, y) = (h_k(x, y) - b) / s
  Log Jacobian: h · w · sum(log |s|)

1 × 1 Convolution:
  Forward: ∀ x, y: h_k(x, y) = W h_{k-1}(x, y), W ∈ R^{c×c}
  Inverse: ∀ x, y: h_{k-1}(x, y) = W^{-1} h_k(x, y)
  Log Jacobian: h · w · log(|det W|)

Split:
  Forward: {h_{k-1,1}, h_{k-1,2}} = h_{k-1}; (µ, σ) = NN(h_{k-1,2}); p_θ(z_k) = N(h_{k-1,1} | µ, σ); h_k = h_{k-1,2}
  Inverse: h_{k-1,2} = h_k; (µ, σ) = NN(h_{k-1,2}); h_{k-1,1} ∼ N(µ, σ); h_{k-1} = {h_{k-1,1}, h_{k-1,2}}
  Log Jacobian: N/A
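The ActNorm and 1 × 1 convolution entries of Table 1 are simple enough to sketch directly. The following is a minimal NumPy illustration assuming feature maps of shape (c, h, w); the learnable parameters s, b and W are supplied explicitly here rather than being trained.

```python
import numpy as np

def actnorm_forward(h, s, b):
    # Per-channel invertible affine normalization; h: (c, H, W), s/b: (c,)
    out = s[:, None, None] * h + b[:, None, None]
    log_det = h.shape[1] * h.shape[2] * np.sum(np.log(np.abs(s)))
    return out, log_det

def actnorm_inverse(y, s, b):
    return (y - b[:, None, None]) / s[:, None, None]

def conv1x1_forward(h, W):
    # A 1x1 convolution is a channel-mixing matrix applied at every pixel
    out = np.einsum('ij,jhw->ihw', W, h)
    log_det = h.shape[1] * h.shape[2] * np.linalg.slogdet(W)[1]
    return out, log_det

def conv1x1_inverse(y, W):
    return np.einsum('ij,jhw->ihw', np.linalg.inv(W), y)
```

Both operations invert exactly, and their log-Jacobians are the per-pixel contributions of Table 1 multiplied by the h · w spatial locations.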
As seen in Fig. 7, the affine coupling layer requires two inputs, of which only one is modified, to allow for efficient inversion. To form these two inputs, a squeeze operation is applied to the feature maps, which halves each spatial dimension of the feature map and increases the number of channels by a factor of four. In this work, we use the squeeze method originally proposed by Dinh et al. [41] and also implemented in the Glow model [42]. As depicted in Fig. 8a, the image is separated using a checkerboard pattern, resulting in four sub-sampled versions. Note that this differs from the conditional Glow model of Zhu et al. [36], which implements a chunk-based squeeze where the image is separated into four quadrants. We found that the checkered approach had better performance, likely because the checkerboard squeeze provides a sub-sampled version of the full image, rather than a local quadrant, to the affine coupling layer.

(a) Squeeze operation. (b) Split operation.
Figure 8: Squeeze and split forward operations used to manipulate the dimensionality of the features in TM-Glow. (Left) The squeeze operation compresses the input feature map h_{k-1} using a checkerboard pattern, halving each spatial dimension and increasing the number of channels by a factor of four. (Right) The split operation factors out half of an input h_{k-1}, which is then taken to be the latent random variable z^{(i)}. The remaining features, h_k, are sent deeper into the network. Time-step superscripts have been omitted for clarity of presentation.

Unlike standard convolutional operations, the affine coupling layer is volume preserving, meaning that the number of output elements must be the same as that of the input. Retaining the total input dimensionality through all layers of the model is not ideal for a convolutional model, since this increases its computational and memory cost. Thus we use the multiscale architecture proposed by Dinh et al. [41], which is illustrated in Fig. 4b. This multiscale flow model factors out half of the current feature maps at multiple intervals of the architecture, which are then treated as random latent variables [42, 36]. A single split operation is illustrated in Fig. 8b, in which the density of these latent variables is taken to be a fully-factorizable Gaussian with mean and standard deviation determined from the remaining features using a shallow neural network.

When the split is executed in the inverse direction, these hyper-parameters depend on the features provided from deeper within the model, as seen in Table 1. This dependence on deeper features therefore conditions the random latent variables on both the conditional features representing the coarse simulation input, x^n, as well as the recurrent features τ^{n-1}. As an example to illustrate this point, consider a TM-Glow model with a depth of k_d = 3, as illustrated in Fig. 4b.
Each of the four random latent variables and the high-fidelity output for a single time-step can be described by the following conditional distributions:

z^{(e),n} ∼ p_θ( z^{(e),n} | ξ^{(3),n} ),
z^{(3),n} ∼ p_θ( z^{(3),n} | z^{(e),n} ),
z^{(2),n} ∼ p_θ( z^{(2),n} | z^{(3),n}, ξ^{(3),n}, τ^{(3),n-1} ),
z^{(1),n} ∼ p_θ( z^{(1),n} | z^{(2),n}, ξ^{(2),n}, τ^{(2),n-1} ),
y^n ∼ p_θ( y^n | z^{(1),n}, ξ^{(1),n}, τ^{(1),n-1} ),   (9)

which is clearly a hierarchical model of the distribution y^n ∼ p(y^n | x^n, τ^{n-1}) that TM-Glow was designed to learn.

As discussed by Dinh et al. [41], this multiscale architecture has multiple intrinsic benefits. The first is that it results in the model learning intermediate representations of the output field, with deeper latent variables representing more global characteristics and shallower ones representing finer details. Additionally, this distributes the loss across multiple layers of the network, which can improve training and predictive accuracy. With respect to modeling physical systems, such a multiscale architecture is well suited because a vast number of physical phenomena are multiscale in nature. Specifically in fluids, it is well known that turbulence occurs at multiple length and time scales, making TM-Glow well suited for fluid flow prediction.

Remark 2.
A particularly interesting attribute of the Glow model is the presence of random latent variables at multiple levels in the generator. This characteristic is absent from traditional VAE or GAN models, for which the random latent variables are only present at one level of the model, typically the lowest-dimensional one. This unique architecture arises out of necessity, but allows the generative model to learn probabilistic densities at multiple scales. In the context of physical systems, this could allow the model to learn stochastic phenomena at varying length scales, which lends itself nicely to many multiscale systems.
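The split operation that produces these multi-level latent variables can be sketched in both directions. Below is a simplified NumPy sketch with a hypothetical shallow `prior_nn` standing in for the network that produces the Gaussian parameters (µ, σ) from the retained features; the forward pass scores the factored-out features under the conditional prior, while the inverse pass samples them.

```python
import numpy as np

def prior_nn(h2, W_mu, W_sig):
    # Hypothetical shallow network producing the Gaussian prior parameters
    mu = W_mu @ h2
    sigma = np.exp(np.tanh(W_sig @ h2))  # strictly positive std-dev
    return mu, sigma

def split_forward(h, W_mu, W_sig):
    # Factor out half the features as latent variables z with a
    # fully-factorized Gaussian density conditioned on the remainder.
    z, h2 = np.split(h, 2)
    mu, sigma = prior_nn(h2, W_mu, W_sig)
    log_p = np.sum(-0.5 * ((z - mu) / sigma) ** 2
                   - np.log(sigma) - 0.5 * np.log(2 * np.pi))
    return h2, log_p

def split_inverse(h2, W_mu, W_sig, rng):
    # Sampling direction: draw z from its conditional prior and re-attach
    mu, sigma = prior_nn(h2, W_mu, W_sig)
    z = mu + sigma * rng.standard_normal(mu.shape)
    return np.concatenate([z, h2])
```

Because the prior parameters depend only on the retained features h2, the same (µ, σ) are recovered in both directions, which is what makes the split consistent between likelihood evaluation and sampling.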
To condition the generative model on the low-fidelity fluid field at multiple levels, a densely connected convolutional encoder is used. This convolutional encoder, illustrated on the right side of Fig. 4a, is comprised of encoding and dense blocks, following the approach originally taken by Zhu et al. [36]. Examples of the encoding and dense blocks are illustrated in Fig. 9; such blocks have been used successfully for modeling many physical systems in the past [35, 36, 37, 50]. The encoding blocks down-scale the feature maps, forcing the model to learn low-dimensional representations, while the densely connected blocks increase the predictive accuracy of the model and have better performance than standard residual connections [51]. The feature maps are taken from multiple levels of the convolutional encoder, up-scaled and passed to the affine coupling blocks conditioning the generator. These are denoted by ξ^{(i),n} in Fig. 4, passing detailed high-dimensional features towards the beginning of the encoder and global low-dimensional features towards the end.

Figure 9: Dense block with a growth rate and length of 2. Residual connections between convolutions progressively stack feature maps, resulting in 12 output channels in this schematic. Standard batch-normalization [52] and Rectified Linear Unit (ReLU) activation functions [53] are used in conjunction with the convolutional operations. Convolutions are denoted by the kernel size k, stride s and padding p.
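The channel bookkeeping of such a dense block can be sketched as follows; the `convs` entries below are placeholders for the conv + batch-norm + ReLU units of Fig. 9, each of which sees all preceding feature maps and contributes a fixed number of new channels (the growth rate).

```python
import numpy as np

def dense_block(x, convs):
    """Densely connected block: every unit receives the concatenation
    of all preceding feature maps and appends its own output.

    x: (C, H, W) input features; convs: list of callables mapping
    (C_in, H, W) -> (growth, H, W). These stand in for the
    conv + batch-norm + ReLU units of the actual architecture.
    """
    feats = [x]
    for conv in convs:
        feats.append(conv(np.concatenate(feats, axis=0)))
    return np.concatenate(feats, axis=0)
```

With an 8-channel input, a growth rate of 2 and a length of 2, the output stacks 8 + 2 + 2 = 12 channels, matching the schematic in Fig. 9.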
4. TM-Glow Training
One of the key benefits of using INNs is the ability to calculate the likelihood of the data exactly with respect to the latent variables. This makes data-driven training straightforward, as one can simply pose the optimization as the minimization of the negative log-likelihood in Eq. (5) [40, 41, 42, 54]. However, since this encodes the output of the model to the latent parameters, such training does not allow physical constraints to be imposed on the generated samples of the model. In the work of Zhu et al. [36], in which physics-constrained learning is used in the absence of data, the reverse Kullback-Leibler (KL) divergence is used as the optimization objective. The reverse KL divergence poses the optimization through the generated samples of the INN, which allows physical constraints to be imposed on the produced realizations.

Due to the complex dynamics of the N-S equations at high Reynolds numbers, physics-constrained learning of turbulent fluid flows through a PDE-based loss alone poses a difficult optimization objective. Thus we will use here a semi-supervised extension of the reverse KL-divergence loss that allows both supervision with data as well as additional physics-constrained components. Consider a training set of i.i.d. cases D = {X_d, Y_d}_{d=1}^{D}; the loss is then:

L = arg min_θ Σ_{d=1}^{D} D_KL( p_θ(Y_d | X_d) || p_β(Y_d | X_d) ) = Σ_{d=1}^{D} E_{p_θ}[ log( p_θ(Y_d | X_d) / p_β(Y_d | X_d) ) ],   (10)

in which we have made use of the fact that the KL divergence is additive for independent distributions. p_θ(Y | X) is the density of TM-Glow with parameters θ for a single time-step. p_β(Y | X) is an energy-based density function with a controllable parameter β representing the true high-fidelity targets.
Note that the expectation is calculated using the samples of the generative model y ∼ p_θ, requiring a backward pass of the INN, which is the opposite direction of the standard maximum likelihood approach.

Currently this loss is posed across the entire time-series; however, we desire it to be expressed in terms of single time-steps to make it computationally tractable with TM-Glow. First, we pose the energy-based density as a product of independent distributions at each individual time-step, p_β(Y | X) = Π_{n=1}^{N} p_β(y^n | x^n). This has a similar form as the definition of the model's likelihood in Eq. (6), p_θ(Y | X) = Π_{n=1}^{N} p_θ(y^n | x^n, τ^{n-1}). The loss for a time-series of N time-steps can then be written as:

L = Σ_{d=1}^{D} Σ_{n=1}^{N} E_{p_θ}[ log p_θ( y^n_d | x^n_d, τ^{n-1}_d ) − log p_β( y^n_d | x^n_d ) ].   (11)

The first term, log p_θ( y^n_d | x^n_d, τ^{n-1}_d ), is an entropy-promoting term, encouraging diversity in the model's samples and avoiding mode collapse. A unique advantage of using an INN is that the entropy, H( y^n_d | x^n_d, τ^{n-1}_d ) = −E_{p_θ}[ log p_θ( y^n_d | x^n_d, τ^{n-1}_d ) ], can be evaluated exactly through the change of variables in Eq. (3), as opposed to approximating [55, 56] or learning it [57]. The second term, the negative log energy density, −log p_β(y | x), encourages consistency between the model's generated samples and the specified physical constraints. In this work, we use the Boltzmann distribution to model p_β( y^n_d | x^n_d ), which is standard in energy-based models [58]:

p_β( y^n_d | x^n_d ) = exp( −β V_PDE(·) ) / Z_β,   (12)

in which V_PDE(·) is a PDE-based potential discussed in further detail in Section 4.1. Z_β is a normalizing constant which does not impact the optimization and is thus neglected.
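Combining Eqs. (11)-(12), each time-step contributes the log-density of the generated sample (the negative-entropy term) plus β times the PDE potential, with the constant Z_β dropped. A point-estimate sketch of this accumulation, where one model sample per time-step stands in for the expectation:

```python
def backward_kl_loss(log_probs, potentials, beta):
    """Point-estimate of the per-time-step backward KL loss.

    log_probs: list of log p_theta(y^n | x^n, tau^{n-1}) values, evaluated
        exactly for each generated sample via the change of variables.
    potentials: list of V_PDE values evaluated on the same samples.
    beta: inverse temperature; Z_beta is constant w.r.t. theta and dropped.
    """
    return sum(lp + beta * v for lp, v in zip(log_probs, potentials))
```

Minimizing this quantity simultaneously pushes samples toward low PDE potential (physical consistency) and keeps the sample log-density low (high entropy, sample diversity).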
β is a tunable parameter corresponding to the inverse temperature in the Boltzmann distribution that controls the strength of the potential in the backward KL loss. The resulting form of the reverse KL divergence follows:

L = Σ_{d=1}^{D} Σ_{n=1}^{N} E_{p_θ}[ log p_θ( z^n_d | x^n_d, τ^{n-1}_d ) + Σ_{k=1}^{K} log | det( ∂h^n_{k,d} / ∂h^n_{k-1,d} ) | + β V_PDE(·) ].   (13)

In practice, the expectation in the KL divergence is taken as a point estimate during training. Due to the large number of times this loss is evaluated during the stochastic optimization of our model, the effects of such point estimates have been empirically shown to be minimal [38].

The potential V_PDE represents the physical constraints one wishes to impose on the model's samples. Similar to past physics-constrained literature [36, 37], we will use the governing equations to aid the formulation of this potential. Within this work, we pose V_PDE in terms of the following components:

V_PDE = V_Pres + V_Div + V_L1 + V_RMS,   (14)

V_Pres = (v_c / n_s) || (1/ρ)( ∂²p^n/∂x² + ∂²p^n/∂y² ) + ( ∂u^n_x/∂x )² + 2 ( ∂u^n_x/∂y )( ∂u^n_y/∂x ) + ( ∂u^n_y/∂y )² ||²,   (15)

V_Div = (v_c / n_s) || ∂u^n_x/∂x + ∂u^n_y/∂y ||²,   V_L1 = (1 / n_s) || y^n − y^n_HF ||_1,   (16)

V_RMS = (1 / n_s) || RMS(y') − RMS(y'_HF) ||_1,   (17)

which consists of the residual of the Poisson equation for pressure, the divergence-free constraint for incompressible flow and two L1 supervised learning terms. The first is between the predicted state variables of the model, denoted by y^n, and the observed high-fidelity solution y^n_HF.
The second is between the root-mean-square (RMS) of the fluctuation states predicted by the model, y', and the observed high-fidelity RMS values y'_HF. This term can be interpreted as matching the turbulent intensity between the predicted time-series and the high-fidelity observables. The RMS of the states is defined as follows:

RMS(y') = sqrt( \overline{(y')²} ) = ( (1/T) ∫₀ᵀ ( y(t) − ȳ )² dt )^{1/2},   (18)

which is a time-averaged quantity and thus has no time-step index. n_s is the number of nodes in the predicted high-fidelity spatial domain. Both residual loss terms are scaled by the cell volume, v_c = Δx · Δy, to help balance each loss component. While the potential resembles forms of other data- and PDE-constrained loss functions [22, 33], ours is posed in a probabilistic framework for learning the full distribution of solutions, as opposed to a single deterministic prediction.

The PDE residual terms are evaluated using the model's predictions and constrain the predictions to be physically realizable. To evaluate the gradients we use the same methods successfully used in our past works for various physical systems [36, 37]: efficient finite-difference-based convolutions to approximate first-order gradients,

∂u^n/∂x = (1/(8Δx)) [[−1, 0, 1], [−2, 0, 2], [−1, 0, 1]] ∗ u^n,   ∂u^n/∂y = (1/(8Δy)) [[−1, −2, −1], [0, 0, 0], [1, 2, 1]] ∗ u^n,   (19)

as well as second-order gradients,

∂²p^n/∂x² = (1/(4Δx²)) [[1, −2, 1], [2, −4, 2], [1, −2, 1]] ∗ p^n,   ∂²p^n/∂y² = (1/(4Δy²)) [[1, 2, 1], [−2, −4, −2], [1, 2, 1]] ∗ p^n.   (20)

These smoothed, second-order accurate finite difference approximations are based on image processing filters such as the Sobel filter for 2D convolutions [59], which have been found to improve training stability over pure finite-difference calculations. The convolutional filter approach allows for efficient computation of these gradients during training that directly integrates itself into the computational graph for back-propagation. In this work, since we are predicting a sub-domain for which we do not know the complete boundary conditions, we only compute the PDE constraint terms on the interior nodes of the predicted domain, ignoring the boundary values. The L1 term helps stabilize the PDE-based losses, which can be unstable due to their gradients, as well as encouraging turbulence in the predicted fluid flow. Similar L1 losses are used in GAN models for time-series predictions to increase time-series accuracy and continuity [60, 61]. Pseudocode for the training process is outlined in Algorithm 1 for a single training case, but it easily extends to a full training data-set. Pseudocode for sampling TM-Glow is outlined in Algorithm 2, from which statistics are computed in traditional Monte Carlo fashion.

TM-Glow contains a large set of hyper-parameters including model depth, the number of affine coupling layers, coupling neural network depth, learning rate, mini-batch size, etc., which are all coupled together, making an extensive hyper-parameter search extremely difficult. While automated methods exist to aid this search, we opted to take a simpler approach by empirically finding a reasonable model architecture by varying the model depth k_d and the number of affine coupling layers k_c in Fig. 6.

Algorithm 1:
Training TM-Glow for a single training case.

Input: TM-Glow model f_θ; low-fidelity and high-fidelity time-series data {X, Y} = {x^n, y^n_HF}_{n=1}^N of length N; number of epochs M; back-propagation through time (BPTT) interval p; learning rate η

ȳ ≈ (1/N) Σ_{n=1}^{N} y^n_HF                                  ▷ Approx. mean flow field
for epoch = 1 to M do
    τ^0 ∼ p(τ^0)                                              ▷ Sample initial recurrent state
    for n = 1 to N do
        y^n, τ^n, log p(y^n | x^n, τ^{n-1}) ← f_θ^{-1}(x^n, τ^{n-1})   ▷ Sample TM-Glow
        V_Pres(y^n) = (v_c/n_s) || (1/ρ)Δp^n + ∇·((u^n·∇)u^n) ||²     ▷ Poisson residual
        V_Div(y^n) = (v_c/n_s) || ∇·u^n ||²                    ▷ Divergence residual
        V_L1 = (1/n_s) || y^n − y^n_HF ||_1                    ▷ L1 loss
        L += log p(y^n | x^n, τ^{n-1}) + β (V_Pres + V_Div + V_L1)    ▷ Backward KL
        if Mod(n, p) = 0 then
            y' = y^{n−p:n} − ȳ                                 ▷ Approx. TM-Glow fluctuation fields
            L += (βp/n_s) || RMS(y') − RMS(y'_HF) ||_1         ▷ RMS loss
            ∇_θ ← Backprop(L)                                  ▷ Back-propagation
            θ ← θ − η ∇_θ                                      ▷ Gradient descent
            L = 0                                              ▷ Zero loss
    ȳ = (1/N) Σ_{n=1}^{N} y^n                                  ▷ Update mean flow field estimate

Output: trained TM-Glow model f_θ

Algorithm 2: Sampling TM-Glow high-fidelity time-series.

Input: trained TM-Glow model f_θ; low-fidelity time-series data {X} = {x^n}_{n=1}^N of length N; number of samples M

for m = 1 to M do
    τ^0 ∼ p(τ^0)                                              ▷ Sample initial recurrent state
    for n = 1 to N do
        y^n, τ^n ← f_θ^{-1}(x^n, τ^{n-1})                      ▷ Sample time-step from TM-Glow
    Y_m = {y^1, y^2, ..., y^N}                                 ▷ Store sampled time-series

Output:
High-fidelity flow samples {Y_m}_{m=1}^M

Each model is trained on a small data-set (32 flows from the second numerical example in Section 6) to keep the computational cost of the hyper-parameter search reasonable. To quantify the accuracy of each model, the following time-averaged prediction mean squared errors (MSE) are used for a validation set of n_test = 16 flows:

MSE_Mag = (1/(n_s n_test)) || E_{p_θ}[ |ū| ] − |ū_HF| ||²,   |ū| = (1/T) ∫₀ᵀ sqrt( u_x(t)² + u_y(t)² ) dt,   (21)

MSE_TKE = (1/(n_s n_test)) || E_{p_θ}[ k̄ ] − k̄_HF ||²,   k̄ = (1/2)( \overline{(u'_x)²} + \overline{(u'_y)²} ),   (22)

in which the expected value of the model's prediction is estimated using 20 model samples. The first error assesses the accuracy of the mean flow magnitude, and the second assesses the accuracy of the predicted turbulent kinetic energy (TKE). The test errors of the models considered are plotted in Fig. 10. We find that there is a trade-off between average velocity and turbulent energy accuracy, and that larger models begin to over-fit on this small training data-set. Based on these results, we select a TM-Glow model with k_d = 3 and k_c = 16.

Figure 10: (Left to right) Velocity magnitude MSE and turbulent kinetic energy (TKE) test MSE for TM-Glow models containing k_d · k_c affine coupling layers.

Essential TM-Glow and training hyper-parameters are outlined in Table 2. The initial recurrent state, τ^0, is averaged with the current recurrent state after every BPTT pass. This borrows the idea of using various recurrent features at different timescales in hierarchical RNNs for natural language processing [62, 63]. Although not necessary, this algorithmic heuristic helps prevent the information from the initial state being lost, which was found to improve the model's accuracy and sample diversity. For additional details, we direct the reader to the source code.

Table 2: TM-Glow model and training parameters used for both numerical test cases. For the parameters that vary between test cases, the superscripts † and ‡ denote the numerical examples in Sections 5 and 6, respectively. Hyper-parameter differences are due to memory constraints imposed by the varying predictive domain sizes.

TM-Glow                                      Training
Model Depth, k_d: 3                          Epochs: 400
Affine Coupling Layers, k_c: 16              Mini-batch Size: 8†,‡
Coupling NN Layers: 2                        BPTT: 10 time-steps
ξ^{(i)} Channels: 32                         Weight Decay: 1e−
a^{(i)}_in, c^{(i)}_in Channels: 64          Inverse Temp., β: 500

The inverse temperature, β, in the energy density controls the balance between the model satisfying the physics-based potential and the model's entropy. Given this parameter's close relation to the model's probabilistic nature, reliability diagrams of the model's predictions are used to assess the quality of the predictive uncertainty. Several models with different β values are trained on the same small data-set of 32 flows from the second numerical example in Section 6 used when calibrating the model depth. For a small validation data-set of 16 flows, we compute for each model the empirical density function of each of the model's output fields over all samples, time-steps and validation cases at each spatial location independently. The values of the predicted density function at several quantiles are then compared to the empirical density function of the high-fidelity data, which is then averaged over the spatial domain and plotted in Fig. 11 for each state variable. Interestingly, unlike Zhu et al. [36], the predicted quantiles all match fairly well with the high-fidelity data, with apparently little sensitivity to β. Based on these results, β = 500 is selected for the remainder of this work.
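The reliability computation described above amounts to comparing predicted quantiles against observed frequencies. Below is a sketch under the simplifying assumption that model samples and high-fidelity observations are stored as flat (realization × location) arrays; a perfectly calibrated model lands on the diagonal, where the observed frequency equals the requested quantile.

```python
import numpy as np

def reliability(samples, observed, quantiles):
    """Empirical reliability-diagram values.

    samples: (n_samples, n_locations) model predictions
    observed: (n_obs, n_locations) high-fidelity observations
    For each quantile q, returns the frequency with which the observed
    data falls below the model's predicted q-th quantile, averaged over
    observations and spatial locations.
    """
    freqs = []
    for q in quantiles:
        pred_q = np.quantile(samples, q, axis=0)      # per-location quantile
        below = (observed <= pred_q[None, :]).mean()  # avg over obs and space
        freqs.append(below)
    return np.array(freqs)
```

When the model's predictive distribution matches the data distribution, the returned frequencies track the requested quantiles, which is the dashed diagonal in Fig. 11.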
For each numerical example, additional fine-tuning is certainly possible to obtain the highest level of accuracy. However, in this work we will not perform any case-specific tuning, to demonstrate that decent results can be obtained using TM-Glow for multiple problems of different nature and dimensionality.
Figure 11: Reliability diagrams of the x-velocity, y-velocity and pressure fields predicted with TM-Glow, evaluated over 12000 model predictions. The black dashed line indicates matching empirical distributions between the model's samples and the observed validation data.
An ablation study is performed to investigate the impact each component of the loss has on the model's predictive accuracy. Additionally, the model is also trained using the standard maximum likelihood approach for INNs, by maximizing Eq. (7), to act as the traditional baseline. The same training/validation data-set used for the accuracy and uncertainty calibration studies is also used here. As listed in Table 3, we train several models using variants of the proposed backward KL loss and compute the mean squared error of various flow-field quantities across the validation data-set. Again, 20 model samples are used to compute the expected value of each predicted flow quantity from which the error is computed.

First, we note that training the model through the traditional maximum likelihood approach generally yields worse results than the backward KL losses, with the exception of some of the time-averaged mean flow quantities. Additionally, the large residual errors for the maximum likelihood training indicate that the instantaneous flow fields are non-physical. Interestingly, the proposed loss does not produce the most accurate mean flow or turbulent statistics. This appears to be due to the inclusion of the Poisson pressure residual loss, which enforces physical coupling of the output fields. Without this PDE loss, the model has more freedom and can achieve greater accuracy of the flow statistics. However, this comes at the cost of having non-physical instantaneous flow field realizations, which is indicated by the increase in the time-averaged pressure residual. Given that we are interested in predicting physical fluid flow, we believe that inclusion of the Poisson residual is essential even at the sacrifice of the time-averaged statistics.
Table 3: Ablation study of the impact of different parts of the backward KL loss. As a baseline we also train TM-Glow using the standard maximum likelihood estimation (MLE) approach. The mean squared error (MSE) of various flow field quantities for each loss formulation is listed. The lowest values for each error are bolded.
Loss configurations (components used):

MLE | V_Pres | V_Div | V_L1 | V_RMS
 ✓  |   ✗    |   ✗   |  ✗   |  ✗
 ✗  |   ✓    |   ✓   |  ✓   |  ✓
 ✗  |   ✗    |   ✓   |  ✓   |  ✓
 ✗  |   ✗    |   ✗   |  ✓   |  ✓
 ✗  |   ✗    |   ✗   |  ✓   |  ✗

Errors reported for each configuration: MSE(ū_x), MSE(ū_y), MSE(p̄), MSE(√\overline{(u'_x)²}), MSE(√\overline{(u'_y)²}), MSE(√\overline{(p')²}), and the time-averaged residuals V_Div and V_Pres.
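The Sobel-style finite-difference filters of Eqs. (19)-(20), which produce the V_Pres and V_Div residuals compared in Table 3, can be sketched with a direct "valid" convolution over interior nodes. This NumPy sketch shows the first-order kernels only; the second-order kernels follow analogously with the smoothed [1, −2, 1] stencil and 1/(4Δx²) normalization.

```python
import numpy as np

# Smoothed (Sobel-style) central-difference kernels; the transverse
# smoothing weights [1, 2, 1] produce the 1/8 normalization.
KX = np.array([[-1., 0., 1.],
               [-2., 0., 2.],
               [-1., 0., 1.]]) / 8.0
KY = KX.T  # [[-1, -2, -1], [0, 0, 0], [1, 2, 1]] / 8

def conv2d_valid(u, k):
    # Direct 'valid' 2D cross-correlation: interior nodes only,
    # matching the choice to ignore boundary values.
    kh, kw = k.shape
    H, W = u.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(u[i:i + kh, j:j + kw] * k)
    return out

def ddx(u, dx):
    return conv2d_valid(u, KX) / dx

def ddy(u, dy):
    return conv2d_valid(u, KY) / dy
```

On a linear field the filters are exact, which is a quick sanity check of the kernel orientation and normalization; in a training loop one would use a framework's convolution primitive instead so the gradients flow through the computational graph.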
5. Turbulent Flow over a Backwards Step
We first apply the proposed model to surrogate modeling of turbulent flow over a backward-facing step at different Reynolds numbers, a classical benchmark problem in computational fluids. As illustrated in Fig. 12, the feature of interest is the flow separation that occurs following the step. Such phenomena can be found in a surprisingly large number of systems including heat exchangers, flow around buildings, combustion engines and aerodynamic elements [65, 66]. The Reynolds number of the flow is governed by the inlet velocity u_0, the viscosity ν, and the step height h = 1. In this benchmark, the inlet boundary condition varies in magnitude, thus varying the Reynolds number of the flow. Here we are interested in predicting the recirculation region, marked by the green box in Fig. 12, for different Reynolds numbers. This region is the typical area of study for this flow due to the presence of flow separation, the Kelvin-Helmholtz instability and turbulent flow with various eddy formations.

Figure 12: Flow over a backward-facing step. The green region indicates the recirculation region TM-Glow will be used to predict. All domain boundaries are no-slip with the exceptions of the uniform inlet and zero-gradient outlet. The total outlet simulation length is made to be double that of the prediction range to negate effects of the boundary condition on this zone.

The low-fidelity simulator that provides the input of the model has a mesh characteristic resolution of l_c = h/12, and the target high-fidelity field has a resolution of l_c = h/32, as shown in Fig. 13. The time-step size is Δt = 0.5. The resulting model input for a single time-step is x^n = {u^n_l, p^n_l}, with an output y^n = {u^n_h, p^n_h}. The full training data set consists of fluid flows evenly distributed between Reynolds numbers 5000 and 50000, each consisting of 80 time-steps. Simulations were performed using the OpenFOAM finite volume solver with the standard Smagorinsky LES sub-grid scale model [67]. During training we augment these time-series by splitting them in half into two time-series of 40 time-steps each, to artificially create more training flows. Training input and output data are normalized to a standard unit Gaussian. Further details on the computational cost of the low-fidelity and high-fidelity simulations, along with the training of TM-Glow, are discussed in Section 7.

(a) Low-fidelity (b) High-fidelity

Figure 13: Computational mesh around the backward-facing step used for the low- and high-fidelity CFD simulations solved with OpenFOAM [67].
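The unit-Gaussian normalization mentioned above is standard per-channel standardization over the training set: each field channel is shifted and scaled by its training mean and standard deviation, and the same statistics are reused at test time. A sketch assuming data stored as (samples, channels, height, width):

```python
import numpy as np

def standardize(train, eps=1e-8):
    """Normalize each channel of the training set to zero mean, unit
    variance; `eps` guards against division by a zero-variance channel.

    train: (N, C, H, W) array. Returns the standardized array and the
    per-channel statistics for reuse on test data.
    """
    mean = train.mean(axis=(0, 2, 3), keepdims=True)
    std = train.std(axis=(0, 2, 3), keepdims=True) + eps
    return (train - mean) / std, mean, std
```

At test time the low-fidelity inputs would be transformed with the stored `mean` and `std`, and model outputs mapped back by the inverse transform.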
A test set of 17 flows with evenly spaced Reynolds numbers in [7500, 47500] is used to evaluate the model.

Figure 14: (Left to right) Flow over backward-facing step velocity magnitude and turbulent kinetic energy (TKE) error during training of TM-Glow on different data set sizes. Error values were averaged over five model samples.

Table 4: Backward-facing step test error of various normalized time-averaged flow field quantities for the low-fidelity solution interpolated to the high-fidelity mesh and for TM-Glow trained on various training data set sizes. Lower is better. TM-Glow errors were averaged over 20 samples from the model. The training time for each data set size is also listed.
              MSE(ū_x/u_0)  MSE(ū_y/u_0)  MSE(p̄/u_0²)  MSE(√\overline{(u'_x)²}/u_0)  MSE(√\overline{(u'_y)²}/u_0)  MSE(√\overline{(p')²}/u_0²)  GPU Hrs.
Low-Fidelity  0.1285        0.0265        0.0227        0.0241                        0.0187                        0.0145                        -
8 Flows       0.0160        0.0036        0.0040        0.0069                        0.0053                        0.0052                        2.4
16 Flows      0.0173        0.0044        0.0032        0.0049                        0.0046                        0.0042                        4.3
32 Flows      0.0135        0.0032        0.0023        0.0030                        0.0032                        0.0020                        8.4
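The TM-Glow errors in Table 4 are Monte Carlo estimates: the expectation E_{p_θ}[·] is replaced by a sample mean over model realizations before computing the MSE against the high-fidelity target. A sketch:

```python
import numpy as np

def mc_mse(model_samples, target):
    """Monte Carlo estimate of a prediction MSE.

    model_samples: (n_samples, ...) array of time-averaged field samples
        drawn from the generative model (Algorithm 2).
    target: high-fidelity time-averaged field of matching shape.
    """
    expectation = model_samples.mean(axis=0)  # sample mean replaces E[.]
    return np.mean((expectation - target) ** 2)
```

With 20 samples per flow, as used here, the sample mean is a crude but adequate estimate of the predictive expectation for reporting field errors.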
To illustrate the improvement TM-Glow is able to produce, we plot the velocity magnitude at several time-steps for several time-series samples of the model in Fig. 15, as well as the Q-criterion (also known as the elliptic Okubo-Weiss criterion for 2D flows) [68, 69] in Fig. 16. Additionally, samples of each state variable for this numerical test case are illustrated in Figs. 17 and 18. TM-Glow clearly generates a fluid flow that is far closer to the high-fidelity solution than the low-fidelity simulation, both in the magnitude of the fluid velocity and in the predicted vortex structures. The variance of the sampled time-series appears to depend on the Reynolds number. For example, the test case Re = 27500, which lies in the center of the training data range, has a noticeably larger sample diversity compared to the edge case Re = 47500. We believe that this is largely due to a lack of multiple flows at the same Reynolds number in the training data-set; the inclusion of multiple training flows at the same Reynolds numbers would thus increase sample diversity. In general, the model's samples produce accurate fluid flow and turbulent statistics, as illustrated in Figs. 19 and 20. In Fig. 19, the mean flow profiles of the state variables are plotted along with the predicted uncertainty for two test flows. Following in Fig. 20, the turbulent kinetic energy and Reynolds shear stress profiles are illustrated. TM-Glow is able to make dramatic improvements to the flow statistics for turbulent flows differing in Reynolds number by almost an order of magnitude, using only 32 fluid simulations to learn from.

(a) Re = 27500 (b) Re = 47500

Figure 15: (Top to bottom) Velocity magnitude of the high-fidelity target, low-fidelity input, 3 TM-Glow samples and standard deviation for two test flows.

(a) Re = 27500 (b) Re = 47500

Figure 16: (Top to bottom) Q-criterion of the high-fidelity target, low-fidelity input and three TM-Glow samples for two test flows.
(a) X-velocity (b) Y-velocity (c) Pressure

Figure 17: TM-Glow time-series samples of x-velocity, y-velocity and pressure fields for a backward-facing step test case at Re = 7500. For each field (top to bottom) the high-fidelity ground truth, low-fidelity input, three TM-Glow samples and the resulting standard deviation are plotted.

(a) X-velocity (b) Y-velocity (c) Pressure

Figure 18: TM-Glow time-series samples of x-velocity, y-velocity and pressure fields for a backward-facing step test case at Re = 27500. For each field (top to bottom) the high-fidelity ground truth, low-fidelity input, three TM-Glow samples and the resulting standard deviation are plotted.

Figure 19: (Top to bottom) Time-averaged x-velocity, y-velocity and pressure profiles for two different test cases at (left to right) Re = 7500 and Re = 47500. The TM-Glow expectation (TM-Glow) and confidence interval (TM-Glow 2σ) are computed using 20 time-series samples.

Figure 20: (Top to bottom) Turbulent kinetic energy and Reynolds shear stress profiles for two different test cases at (left to right) Re = 7500 and Re = 47500. The TM-Glow expectation (TM-Glow) and confidence interval (TM-Glow 2σ) are computed using 20 time-series samples.
6. Turbulent Flow around an Array of Cylinders
While the prediction of a flow at different Reynolds numbers is a practical test case, the reality is that the underlying flow structures have a relatively similar form. Thus, for our second numerical example, we wish to stress this model further by investigating the prediction of a flow where the underlying flow structures vary dramatically between test cases. A classical fluid mechanics benchmark is the flow around a cylinder; however, in its traditional form it does not reach the complexity we are interested in. Thus, for a more challenging problem, we will consider the prediction of the turbulent wake behind an array of cylinders with stochastic locations. Flow around multiple bluff bodies is important due to its various applications in engineering, including: wind flow around urban structures [70], water flow around bridge pylons [71, 72], wake from an array of wind turbines [73, 74], modern off-shore structures [75], heat transfer applications, etc. As depicted in Fig. 21, in this case study five cylinders are randomly placed within a specified area of a channel with a fixed uniform inlet velocity. The sub-domain we wish to predict is the wake region directly behind the cylinder array, in which the majority of the turbulence exists. Differing from the previous surrogate model where the Reynolds number was varied, here the physical boundary of the flow changes, resulting in very different fluid structures in the predictive sub-domain. The bulk Reynolds number of the flow, set at the constant value Re = 5000, is governed by the inlet velocity u = 1, viscosity ν = 0.0002 and cylinder diameter d = 1. This numerical example is akin to flow optimization problems for which a structure is optimized to yield desired flow properties. The predicted flow fields for both a low-fidelity and corresponding high-fidelity finite volume simulation are shown in Fig. 22 for two different cylinder arrays to demonstrate the difference in the resolved flow features.

Figure 21: Flow around an array of bluff bodies.
The red region indicates the area in which the bodies can be placed randomly. The green region indicates the wake zone for which we will use TM-Glow to predict a high-fidelity response from a low-fidelity simulation.

Figure 22: Velocity magnitude of the low-fidelity and high-fidelity simulations for two different cylinder arrays. (Left to right) Cylinder array configuration and the corresponding (top to bottom) high-fidelity and low-fidelity finite volume simulation results at several time-steps.

The low-fidelity simulator that will be the input of the model has a mesh characteristic resolution of l_c = 5d/16, and the target high-fidelity field has a characteristic resolution of l_c = 5d/64, as shown in Fig. 23. The mesh is structured in the wake region, allowing this data to be directly used with our convolutional generative model. Thus, TM-Glow will be provided an input of size [16 × 16] and predict a field of size [64 × 64], both with a time-step size of ∆t = 0.5. The model input for this numerical example is x^n = {u^n_l, p^n_l} ∈ R^(3×16×16), with an output y^n = {u^n_h, p^n_h} ∈ R^(3×64×64). The full training data set consists of fluid flows with cylinders randomly placed in different configurations. As in the previous numerical example, simulations were performed using the OpenFOAM finite volume solver with the standard Smagorinsky LES sub-grid scale model [67]. During training, we augment these time-series by splitting them in half into two time-series of 40 time-steps each to artificially create more training flows. Training input and output data are normalized to a standard unit Gaussian. Additional details on the computational cost of training TM-Glow for this numerical example can be found in Section 7.

Figure 23: Computational mesh around the cylinder array used for the (a) low-fidelity and (b) high-fidelity CFD simulations solved with OpenFOAM [67].
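The preprocessing just described, channel-wise standardization to a unit Gaussian plus splitting each time-series in half to create two 40-step training series, can be sketched as follows (the array layout and function names are assumptions for illustration, not the authors' code):

```python
import numpy as np

def standardize(data):
    """Normalize each channel to zero mean, unit variance over all flows, steps and pixels.

    data: array of shape (flows, time, channels, H, W).
    """
    mean = data.mean(axis=(0, 1, 3, 4), keepdims=True)
    std = data.std(axis=(0, 1, 3, 4), keepdims=True)
    return (data - mean) / std, mean, std

def split_in_half(data):
    """Augment: split each 80-step series into two 40-step series."""
    first, second = np.split(data, 2, axis=1)
    return np.concatenate([first, second], axis=0)

rng = np.random.default_rng(0)
raw = rng.normal(2.0, 3.0, size=(4, 80, 3, 16, 16))   # 4 flows, 80 steps, 3 channels
norm, mean, std = standardize(raw)
aug = split_in_half(norm)                              # 8 series of 40 steps each
```

The stored mean and standard deviation would be reused to un-normalize model predictions back to physical units.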
A test set of 32 flows, each with a unique cylinder configuration, is used to evaluate the performance of TM-Glow. Four models are trained on 32, 48, 64 and 96 flows. The test MSE of the velocity magnitude and TKE, defined in Eqs. 21 and 22, during training is plotted in Fig. 24. The test errors of various flow field quantities are listed in Table 5, along with the error obtained from naively interpolating the low-fidelity solution to the high-fidelity mesh. TM-Glow is able to produce time-averaged statistics that are far more accurate than the low-fidelity solution, as expected. As the training data set grows, we see improvements in the flow statistics, as we would expect. We note, though, that even on the smallest data set, large improvements over the low-fidelity simulation can still easily be obtained. For the remaining results, we will use the model trained on 96 flows to illustrate the highest-accuracy model obtained.

Figure 24: (Left to right) Cylinder array velocity magnitude and turbulent kinetic energy (TKE) error during training of TM-Glow on different data set sizes. Error values were averaged over five model samples.

Table 5: Cylinder array test error of various time-averaged flow field quantities for the low-fidelity solution interpolated to the high-fidelity mesh and for TM-Glow trained on different training data set sizes. Lower is better. TM-Glow errors were averaged over 20 samples from the model. The training time for each data set size is also listed.
              MSE(u_x)  MSE(u_y)  MSE(p)   MSE(√(u'_x)²)  MSE(√(u'_y)²)  MSE(√(p')²)  GPU Hrs.
Low-Fidelity  0.1024    0.0081    0.0179   0.0638         0.0955         0.02122      -
32 Flows      0.0432    0.0089    0.0158   0.0136         0.0201         0.0093       3.0
48 Flows      0.0378    0.0060    0.0181   0.0129         0.0175         0.0090       4.2
64 Flows      0.0361    0.0056    0.0110   0.0114         0.0170         0.0080       5.5
96 Flows      0.0304    0.0048    0.0127   0.0116         0.0174         0.0079       8.0
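The low-fidelity baseline row in Table 5 is obtained by naively interpolating the coarse [16 × 16] solution onto the [64 × 64] high-fidelity grid before computing the MSE. One plausible realization of such a baseline (bilinear interpolation via SciPy is our assumption; the paper does not state the interpolation scheme):

```python
import numpy as np
from scipy.ndimage import zoom

def interpolate_to_fine(field_lf, factor=4):
    """Bilinear (order-1) upsampling of a low-fidelity field to the high-fidelity grid."""
    return zoom(field_lf, factor, order=1)

def baseline_mse(field_lf, field_hf):
    """MSE between the interpolated low-fidelity field and the high-fidelity target."""
    return float(np.mean((interpolate_to_fine(field_lf) - field_hf) ** 2))

# Sanity check: a constant coarse field interpolates exactly, so the MSE is zero
coarse = np.ones((16, 16))
fine = np.ones((64, 64))
```

Applying this per time-step and time-averaging the fields would reproduce the style of baseline reported in the table.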
Similar to the previous numerical example, we plot several time-steps of the velocity magnitude for several time-series samples of the model in Fig. 25. Additionally, samples of each state variable for this numerical test case are illustrated in Figs. 26 and 27. Although the low-fidelity simulation differs significantly from the high-fidelity solution, we can see that TM-Glow is able to produce fluid realizations that qualitatively appear similar to the high-fidelity flow. In this particular example, the low-fidelity solution exhibits nearly laminar flow due to the coarse discretization used, which TM-Glow is able to correct. Additionally, the fluid flows sampled from TM-Glow appear much more diverse than those seen in the previous numerical example. Profiles of time-averaged flow quantities and turbulent statistics are plotted in Figs. 28 and 29 for two test flows. Indeed, we can see that TM-Glow is able to improve both, with reasonable uncertainty bounds as well. In general, TM-Glow is able to yield an accurate prediction of the time-averaged flow field. However, the model seems to consistently under-predict the turbulent intensity of the flow field. This could be improved through some ad hoc tuning of the loss by weighting the RMS term more heavily.

Figure 25: (Top to bottom) Velocity magnitude of the high-fidelity target, low-fidelity input, three TM-Glow samples and standard deviation for two test cases.

Figure 26: TM-Glow time-series samples of (a) x-velocity, (b) y-velocity and (c) pressure fields for a cylinder array test case. For each field (top to bottom) the high-fidelity ground truth, low-fidelity input, three TM-Glow samples and the resulting standard deviation are plotted.

Figure 27: TM-Glow time-series samples of (a) x-velocity, (b) y-velocity and (c) pressure fields for a cylinder array test case.
For each field (top to bottom) the high-fidelity ground truth, low-fidelity input, three TM-Glow samples and the resulting standard deviation are plotted.

Figure 28: Time-averaged flow profiles for two test flows: (a) test-case 1 and (b) test-case 2. TM-Glow expectation (TM-Glow) and confidence interval (TM-Glow 2σ) are computed using 20 time-series samples.

Figure 29: Turbulent statistic profiles for two test flows: (a) test-case 1 and (b) test-case 2. TM-Glow expectation (TM-Glow) and confidence interval (TM-Glow 2σ) are computed using 20 time-series samples.

7. Computational Cost Analysis

In the following section, the computational cost associated with the training and prediction of TM-Glow is discussed. The cost of a surrogate needs to be low enough to justify its use, which for deep learning models includes the training cost. To compare processes run on different hardware and CPU cores, we adopt the measure of a service unit (SU) hour, which is equal to a single CPU core hour or a single GPU hour. As shown in Table 6, both the low-fidelity and high-fidelity simulations were run on CPUs, while the deep generative model used a single GPU. Differences between CPU models were neglected since the computation of TM-Glow is bottlenecked by the GPU. The comparison of CPU consumption versus a GPU is not trivial due to the fundamental hardware differences between the two, and an in-depth investigation using energy consumption or floating point operations is beyond the intended scope of this paper. Hence, we use this simple definition, resembling that used by the Extreme Science and Engineering Discovery Environment (XSEDE). For both numerical examples, the OpenFOAM finite volume simulator was used due to its extensive validation and efficiency [67]. Both the low- and high-fidelity simulations used the standard LES Smagorinsky sub-grid scale model [76] with default parameters.
When the high-fidelity simulations were parallelized between CPUs, OpenFOAM's built-in "scotch" domain decomposition algorithm was used to partition the meshes. Additionally, the fluid flows are solved between times t = [0, 80] for both resolutions, but only t = [40, 80] is used as training/testing data. This is done to ensure that the sampled flow fields are of fully developed turbulence.
Table 6: Hardware used to run the low-fidelity and high-fidelity CFD simulations as well as thetraining and prediction of TM-Glow for both numerical examples.
               CPU Cores  CPU Model             GPUs  GPU Model          SU Hour
Low-Fidelity   1          Intel Xeon E5-2680    -     -                  1
High-Fidelity  8          Intel Xeon E5-2680    -     -                  8
TM-Glow        1          Intel Xeon Gold 6226  1     NVIDIA Tesla V100  2
The low-fidelity and high-fidelity simulations for the flow over the backwards facing step consisted of meshes with resolutions of ∆x, ∆y = h/12 and ∆x, ∆y = h/32, respectively. A sub-section of both meshes is plotted in Fig. 13 to illustrate the resolution difference. The low-fidelity and high-fidelity simulations for the flow over a cylinder array consisted of meshes with resolutions of ∆x, ∆y = 5d/16 and ∆x, ∆y = 5d/64, respectively. A sub-section of the meshes is plotted in Fig. 23 to illustrate the resolution difference. This results in the low-fidelity and high-fidelity meshes containing 9k and 125k cells, respectively. A single low-fidelity simulation takes about 3.1 minutes of wall-clock time (see Table 8).

Figure 30: Computational requirement for training TM-Glow given training data-sets of various sizes for (a) flow over the backwards step and (b) flow around the cylinder array. Computation is quantified using Service Units (SU) defined in Table 6.

Table 7: Prediction cost of the surrogate compared to the high-fidelity simulator for flow over a backwards step.
Backwards Step            SU Hours  Wall-clock (mins)
Low-Fidelity              0.06      4.5
TM-Glow 20 Samples        0.03      0.75
Surrogate Prediction      0.09      5.25
High-Fidelity Prediction  5.6       42
Table 8: Prediction cost of the surrogatecompared to the high-fidelity simulator forflow around a cylinder array.
Cylinder Array            SU Hours  Wall-clock (mins)
Low-Fidelity              0.05      3.1
TM-Glow 20 Samples        0.02      0.7
Surrogate Prediction      0.07      3.8
High-Fidelity Prediction  4.27      32
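The cost figures in Tables 7 and 8 imply a speedup of roughly 60x in SU hours for a full surrogate prediction (one low-fidelity run plus 20 TM-Glow samples) over a high-fidelity simulation. A quick check of that arithmetic (the SU-hour values are copied from the tables; the script itself is only illustrative):

```python
# SU-hour figures from Tables 7 and 8
costs = {
    "backwards_step": {"low_fidelity": 0.06, "tm_glow": 0.03, "high_fidelity": 5.6},
    "cylinder_array": {"low_fidelity": 0.05, "tm_glow": 0.02, "high_fidelity": 4.27},
}

for case, c in costs.items():
    surrogate = c["low_fidelity"] + c["tm_glow"]   # full surrogate prediction cost
    speedup = c["high_fidelity"] / surrogate       # SU-hour speedup factor
    print(f"{case}: {surrogate:.2f} SU vs {c['high_fidelity']} SU -> {speedup:.0f}x")
```

Note that this ignores the one-time training cost (Table 5 and Fig. 30), which is amortized over every subsequent surrogate prediction.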
8. Conclusion
The application of machine learning methods to CFD requires significant advances to extend such models to realistic problems. In this work, we investigated the prediction of fully-turbulent systems using deep learning. We proposed a multi-fidelity approach in which a computationally inexpensive low-fidelity solver is used as a conditional input to a deep generative model that predicts fluid realizations at high-fidelity resolution and accuracy. The model, Transient Multi-fidelity Glow (TM-Glow), is a conditional invertible neural network that allows for the analytical evaluation of the likelihood through the change of variables formula. TM-Glow is trained using a variational backwards KL divergence loss, which allows for the seamless combination of data-driven and physics-constrained learning. This model was demonstrated on two numerical examples, serving as a surrogate model for turbulent flow at different Reynolds numbers as well as with a stochastic boundary. With just the low-fidelity solution, TM-Glow was able to predict diverse samples of turbulent flow time-series that produce accurate mean field and turbulent statistics with error bars for uncertainty quantification.

The multi-fidelity aspect of our model is a key ingredient. The low-fidelity input provides critical information to the generative model, such as information regarding boundary conditions, mean flow properties and general flow field structure. While this low-fidelity simulation is typically inaccurate, it is a reliable starting point for the model to extrapolate from. The prediction from low- to high-fidelity is a significantly simpler problem than a blind high-fidelity flow prediction, allowing for reduced training data-set sizes and training times. For this reason, we believe that deep learning has significant potential in multilevel/multi-fidelity modeling of a vast number of physical systems, where it can be applied to even very high-dimensional complex phenomena thanks to a low-fidelity solver aiding the machine learning model.
In this spirit, future steps to be investigated include the extension of this model to other multi-fidelity physical systems. Additionally, as the deep learning field evolves, more modern architectures and training techniques could be integrated into the model to increase its predictive capability. Regardless of potential future directions, TM-Glow demonstrates that modern generative deep learning methods can be used effectively for multi-fidelity modeling of complex dynamical systems.
Acknowledgements
The authors acknowledge support from the Defense Advanced Research Projects Agency (DARPA) under the Physics of Artificial Intelligence (PAI) program (contract HR00111890034). Computing resources were provided by the Air Force Office of Scientific Research (AFOSR) through the DURIP program and by the University of Notre Dame's Center for Research Computing (CRC). The work of NG was also supported by the National Science Foundation (NSF) Graduate Research Fellowship Program grant No. DGE-1313583.

References

[1] S. B. Pope, Turbulent Flows, Cambridge University Press, Cambridge, 2000.
[2] P. Sagaut, Multiscale and Multiresolution Approaches in Turbulence: LES, DES and Hybrid RANS/LES Methods: Applications and Guidelines, World Scientific, 2013.
[3] S. M. Mitran, A comparison of adaptive mesh refinement approaches for large eddy simulation, Tech. rep., University of Washington, Seattle, Department of Applied Mathematics (2001).
[4] M. Terracol, P. Sagaut, C. Basdevant, A multilevel algorithm for large-eddy simulation of turbulent compressible flows, Journal of Computational Physics 167 (2) (2001) 439–474. doi:10.1006/jcph.2000.6687.
[5] J. Hoffman, C. Johnson, A new approach to computational turbulence modeling, Computer Methods in Applied Mechanics and Engineering 195 (23) (2006) 2865–2880. doi:10.1016/j.cma.2004.09.015.
[6] C. G. Speziale, Computing non-equilibrium turbulent flows with time-dependent RANS and VLES, in: Fifteenth International Conference on Numerical Methods in Fluid Dynamics, Springer, 1997, pp. 123–129. doi:10.1007/BFb0107089.
[7] A. Travin, M. Shur, M. Strelets, P. R. Spalart, Physical and numerical upgrades in the detached-eddy simulation of complex turbulent flows, in: R. Friedrich, W. Rodi (Eds.), Advances in LES of Complex Flows, Springer Netherlands, Dordrecht, 2002, pp. 239–254. doi:10.1007/0-306-48383-1_16.
[8] P. Quéméré, P. Sagaut, Zonal multi-domain RANS/LES simulations of turbulent flows, International Journal for Numerical Methods in Fluids 40 (7) (2002) 903–925. doi:10.1002/fld.381.
[9] doi:10.2514/1.3488.
[10] M. Terracol, E. Manoha, C. Herrero, E. Labourasse, S. Redonnet, P. Sagaut, Hybrid methods for airframe noise numerical prediction, Theoretical and Computational Fluid Dynamics 19 (3) (2005) 197–227. doi:10.1007/s00162-005-0165-5.
[11] J. Ling, A. Kurzawski, J. Templeton, Reynolds averaged turbulence modelling using deep neural networks with embedded invariance, Journal of Fluid Mechanics 807 (2016) 155–166. doi:10.1017/jfm.2016.615.
[12] H. Xiao, J.-L. Wu, J.-X. Wang, R. Sun, C. Roy, Quantifying and reducing model-form uncertainties in Reynolds-averaged Navier–Stokes simulations: A data-driven, physics-informed Bayesian approach, Journal of Computational Physics 324 (2016) 115–136. doi:10.1016/j.jcp.2016.07.038.
[13] J.-X. Wang, J.-L. Wu, H. Xiao, Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data, Physical Review Fluids 2 (2017) 034603. doi:10.1103/PhysRevFluids.2.034603.
[14] N. Geneva, N. Zabaras, Quantifying model form uncertainty in Reynolds-averaged turbulence models with Bayesian deep neural networks, Journal of Computational Physics 383 (2019) 125–147. doi:10.1016/j.jcp.2019.01.021.
[15] S. Taghizadeh, F. D. Witherden, S. S. Girimaji, Turbulence closure modeling with data-driven techniques: physical compatibility and consistency considerations, arXiv preprint arXiv:2004.03031.
[16] Z. Wang, K. Luo, D. Li, J. Tan, J. Fan, Investigations of data-driven closure for subgrid-scale stress in large-eddy simulation, Physics of Fluids 30 (12) (2018) 125101. doi:10.1063/1.5054835.
[17] C. J. Lapeyre, A. Misdariis, N. Cazard, D. Veynante, T. Poinsot, Training convolutional neural networks to estimate turbulent sub-grid scale reaction rates, Combustion and Flame 203 (2019) 255–264. doi:10.1016/j.combustflame.2019.02.019.
[18] R. Maulik, O. San, A. Rasheed, P. Vedula, Subgrid modelling for two-dimensional turbulence using neural networks, Journal of Fluid Mechanics 858 (2019) 122–144. doi:10.1017/jfm.2018.770.
[19] J. Wu, H. Xiao, R. Sun, Q. Wang, Reynolds-averaged Navier–Stokes equations with explicit data-driven Reynolds stress closure can be ill-conditioned, Journal of Fluid Mechanics 869 (2019) 553–586. doi:10.1017/jfm.2019.205.
[20] J. Rabault, M. Kuchta, A. Jensen, U. Réglade, N. Cerardi, Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control, Journal of Fluid Mechanics 865 (2019) 281–302. doi:10.1017/jfm.2019.62.
[21] K. Bieker, S. Peitz, S. L. Brunton, J. N. Kutz, M. Dellnitz, Deep model predictive control with online learning for complex physical systems, arXiv preprint arXiv:1905.10094.
[22] J. Tompson, K. Schlachter, P. Sprechmann, K. Perlin, Accelerating Eulerian fluid simulation with convolutional networks, in: Proceedings of the 34th International Conference on Machine Learning, ICML'17, JMLR.org, 2017, pp. 3424–3433.
[23] S. Wiewel, M. Becher, N. Thuerey, Latent space physics: Towards learning the temporal evolution of fluid flow, Computer Graphics Forum 38 (2) (2019) 71–82. doi:10.1111/cgf.13620.
[24] B. Kim, V. C. Azevedo, N. Thuerey, T. Kim, M. Gross, B. Solenthaler, Deep fluids: A generative network for parameterized fluid simulations, Computer Graphics Forum 38 (2) (2019) 59–70. doi:10.1111/cgf.13619.
[25] X. Guo, W. Li, F. Iorio, Convolutional neural networks for steady flow approximation, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, ACM, 2016, pp. 481–490. doi:10.1145/2939672.2939738.
[26] L. Sun, H. Gao, S. Pan, J.-X. Wang, Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data, Computer Methods in Applied Mechanics and Engineering 361 (2020) 112732. doi:10.1016/j.cma.2019.112732.
[27] M. Raissi, Z. Wang, M. S. Triantafyllou, G. E. Karniadakis, Deep learning of vortex-induced vibrations, Journal of Fluid Mechanics 861 (2019) 119–137. doi:10.1017/jfm.2018.872.
[28] M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707. doi:10.1016/j.jcp.2018.10.045.
[29] R. Han, Y. Wang, Y. Zhang, G. Chen, A novel spatial-temporal prediction method for unsteady wake flows based on hybrid deep neural network, Physics of Fluids 31 (12) (2019) 127101. doi:10.1063/1.5127247.
[30] O. Hennigh, Lat-Net: compressing lattice Boltzmann flow simulations using deep neural networks, arXiv preprint arXiv:1705.09036.
[31] A. Mohan, D. Daniel, M. Chertkov, D. Livescu, Compressed convolutional LSTM: An efficient deep learning framework to model high fidelity 3D turbulence, arXiv preprint arXiv:1903.00033.
[32] M. Werhahn, Y. Xie, M. Chu, N. Thuerey, A multi-pass GAN for fluid flow super-resolution, arXiv preprint arXiv:1906.01689.
[33] A. Subramaniam, M. L. Wong, R. D. Borker, S. Nimmagadda, S. K. Lele, Turbulence enrichment using generative adversarial networks, arXiv preprint arXiv:2003.01907.
[34] J. Holgate, A. Skillen, T. Craft, A. Revell, A review of embedded large eddy simulation for internal flows, Archives of Computational Methods in Engineering 26 (4) (2019) 865–882. doi:10.1007/s11831-018-9272-5.
[35] Y. Zhu, N. Zabaras, Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification, Journal of Computational Physics 366 (2018) 415–447. doi:10.1016/j.jcp.2018.04.018.
[36] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, P. Perdikaris, Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data, Journal of Computational Physics 394 (2019) 56–81. doi:10.1016/j.jcp.2019.05.024.
[37] N. Geneva, N. Zabaras, Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks, Journal of Computational Physics (2019) 109056. doi:10.1016/j.jcp.2019.109056.
[38] D. P. Kingma, M. Welling, Auto-encoding variational Bayes, arXiv preprint arXiv:1312.6114.
[39] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[40] L. Dinh, D. Krueger, Y. Bengio, NICE: Non-linear independent components estimation, arXiv preprint arXiv:1410.8516.
[41] L. Dinh, J. Sohl-Dickstein, S. Bengio, Density estimation using Real NVP, arXiv preprint arXiv:1605.08803.
[42] D. P. Kingma, P. Dhariwal, Glow: Generative flow with invertible 1x1 convolutions, in: Advances in Neural Information Processing Systems, 2018, pp. 10215–10224.
[43] J.-H. Jacobsen, A. Smeulders, E. Oyallon, i-RevNet: Deep invertible networks, arXiv preprint arXiv:1802.07088.
[44] M. Kumar, M. Babaeizadeh, D. Erhan, C. Finn, S. Levine, L. Dinh, D. Kingma, VideoFlow: A flow-based generative model for video, arXiv preprint arXiv:1903.01434.
[45] E. G. Tabak, E. Vanden-Eijnden, Density estimation by dual ascent of the log-likelihood, Communications in Mathematical Sciences 8 (1) (2010) 217–233.
[46] E. G. Tabak, C. V. Turner, A family of nonparametric density estimation algorithms, Communications on Pure and Applied Mathematics 66 (2) (2013) 145–164.
[47] L. Ardizzone, C. Lüth, J. Kruse, C. Rother, U. Köthe, Guided image generation with conditional invertible neural networks, arXiv preprint arXiv:1907.02392.
[48] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[49] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-k. Wong, W.-c. Woo, Convolutional LSTM network: A machine learning approach for precipitation nowcasting, in: Advances in Neural Information Processing Systems 28, Curran Associates, Inc., 2015, pp. 802–810. URL http://papers.nips.cc/paper/5955-convolutional-lstm-network-a-machine-learning-approach-for-precipitation-nowcasting.pdf
[50] S. Mo, Y. Zhu, N. Zabaras, X. Shi, J. Wu, Deep convolutional encoder-decoder networks for uncertainty quantification of dynamic multiphase flow in heterogeneous media, Water Resources Research 55 (1) (2019) 703–728. doi:10.1029/2018WR023528.
[51] G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[52] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167.
[53] X. Glorot, A. Bordes, Y. Bengio, Deep sparse rectifier neural networks, in: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 315–323.
[54] W. Grathwohl, R. T. Chen, J. Bettencourt, I. Sutskever, D. Duvenaud, FFJORD: Free-form continuous dynamics for scalable reversible generative models, arXiv preprint arXiv:1810.01367.
[55] C. Li, J. Li, G. Wang, L. Carin, Learning to sample with adversarially learned likelihood-ratio (2018). URL https://openreview.net/forum?id=S1eZGHkDM
[56] Y. Yang, P. Perdikaris, Adversarial uncertainty quantification in physics-informed neural networks, Journal of Computational Physics 394 (2019) 136–152. doi:10.1016/j.jcp.2019.05.027.
[57] R. Kumar, S. Ozair, A. Goyal, A. Courville, Y. Bengio, Maximum entropy generators for energy-based models, arXiv preprint arXiv:1901.08508.
[58] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, F. Huang, A tutorial on energy-based learning, Predicting Structured Data (2006).
[59] I. Sobel, G. Feldman, A 3x3 isotropic gradient operator for image processing, presented at the Stanford Artificial Intelligence Project (1968) 271–272.
[60] W. Xiong, W. Luo, L. Ma, W. Liu, J. Luo, Learning to generate time-lapse videos using multi-stage dynamic generative adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2364–2373.
[61] L. Zhao, X. Peng, Y. Tian, M. Kapadia, D. Metaxas, Learning to forecast and refine residual motion for image-to-video generation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 387–403.
[62] P. Liu, X. Qiu, X. Chen, S. Wu, X.-J. Huang, Multi-timescale long short-term memory neural network for modelling sentences and documents, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 2326–2335.
[63] J. Chung, S. Ahn, Y. Bengio, Hierarchical multiscale recurrent neural networks, arXiv preprint arXiv:1609.01704.
[64] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980.
[65] E. Erturk, Numerical solutions of 2-D steady incompressible flow over a backward-facing step, Part I: High Reynolds number solutions, Computers and Fluids 37 (6) (2008) 633–655. doi:10.1016/j.compfluid.2007.09.003.
[66] L. Chen, K. Asai, T. Nonomura, G. Xi, T. Liu, A review of Backward-Facing Step (BFS) flow mechanisms, heat transfer and control, Thermal Science and Engineering Progress 6 (2018) 194–216. doi:10.1016/j.tsep.2018.04.004.
[67] H. Jasak, A. Jemcov, Z. Tukovic, et al., OpenFOAM: A C++ library for complex physics simulations, in: International Workshop on Coupled Methods in Numerical Dynamics, Vol. 1000, IUC Dubrovnik, Croatia, 2007, pp. 1–20.
[68] J. C. Hunt, A. A. Wray, P. Moin, Eddies, streams, and convergence zones in turbulent flows, Center for Turbulence Research Report CTR-S88, 1988. URL https://ntrs.nasa.gov/search.jsp?R=19890015184
[69] G. Haller, An objective definition of a vortex, Journal of Fluid Mechanics 525 (2005) 1–26. doi:10.1017/S0022112004002526.
[70] Y.-H. Tseng, C. Meneveau, M. B. Parlange, Modeling flow around bluff bodies and predicting urban dispersion using large eddy simulation, Environmental Science & Technology 40 (8) (2006) 2653–2662. doi:10.1021/es051708m.
[71] doi:10.1061/(ASCE)0733-9429(1998)124:3(288).
[72] W. Huang, Q. Yang, H. Xiao, CFD modeling of scale effects on turbulence flow and scour around bridge piers, Computers and Fluids 38 (5) (2009) 1050–1058. doi:10.1016/j.compfluid.2008.01.029.
[73] M. Samorani, The wind farm layout optimization problem, in: P. M. Pardalos, S. Rebennack, M. V. F. Pereira, N. A. Iliadis, V. Pappu (Eds.), Handbook of Wind Power Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 21–38. doi:10.1007/978-3-642-41080-2_2.
[74] J. S. González, A. G. G. Rodríguez, J. C. Mora, J. R. Santos, M. B. Payán, Optimization of wind farm turbines layout using an evolutive algorithm, Renewable Energy 35 (8) (2010) 1671–1681. doi:10.1016/j.renene.2010.01.010.
[75] M. H. Patel, Dynamics of Offshore Structures, Butterworth-Heinemann, 2013.
[76] J. Smagorinsky, General circulation experiments with the primitive equations: I. The basic experiment, Monthly Weather Review 91 (3) (1963) 99–164. doi:10.1175/1520-0493(1963)091<0099:GCEWTP>2.3.CO;2.