Using Machine Learning to Augment Coarse-Grid Computational Fluid Dynamics Simulations
Jaideep Pathak,∗ Mustafa Mustafa,† Karthik Kashinath, Emmanuel Motheau, Thorsten Kurth, and Marcus Day
NERSC, Lawrence Berkeley National Laboratory
CRD, Lawrence Berkeley National Laboratory
Nvidia Corporation
(Dated: October 6, 2020)

Simulation of turbulent flows at high Reynolds number is a computationally challenging task relevant to a large number of engineering and scientific applications in diverse fields such as climate science, aerodynamics, and combustion. Turbulent flows are typically modeled by the Navier-Stokes equations. Direct Numerical Simulation (DNS) of the Navier-Stokes equations with sufficient numerical resolution to capture all the relevant scales of the turbulent motions can be prohibitively expensive. Simulation at lower resolution on a coarse grid introduces significant errors. We introduce a machine learning (ML) technique based on a deep neural network architecture that corrects the numerical errors induced by a coarse-grid simulation of turbulent flows at high Reynolds numbers, while simultaneously recovering an estimate of the high-resolution fields. Our proposed simulation strategy is a hybrid ML-PDE solver that is capable of obtaining a meaningful high-resolution solution trajectory while solving the system PDE at a lower resolution. The approach has the potential to dramatically reduce the expense of turbulent flow simulations. As a proof of concept, we demonstrate our ML-PDE strategy on a two-dimensional turbulent (Rayleigh number $Ra = 10^9$) Rayleigh-Bénard Convection (RBC) problem.

∗ Corresponding Author: [email protected]
† Jaideep Pathak and Mustafa Mustafa contributed equally to this paper.

Most practical flows of interest are by nature turbulent and present a wide range of temporal and spatial scales. The modeling of such applications is challenging, and a Direct Numerical Simulation (DNS) approach, in which the full range of spatial and temporal scales exhibited by the governing equations is resolved numerically, imposes a severe computational burden.
The Reynolds number is a non-dimensional measure of the range of temporal and spatial scales present in a system [1, 2], and thus plays a key role in determining the computational resources required to accurately simulate a flow system. One of the first DNS studies of turbulence was limited to a Reynolds number $Re \approx 500$ [3], whereas about 15 years later, the same group reported a similar study [4] with $Re \approx 5200$. Machine learning (ML) methods have recently attracted considerable attention for modeling turbulent flows [5-8] and for data-driven weather and climate prediction [9-14]; fully data-driven atmospheric models, for example, can be dramatically cheaper to run than state-of-the-art physics-based General Circulation Models with comparable accuracy. Data availability and computational resource constraints thus warrant the development of a hybrid approach in which a physics-based numerical solver works in tandem with a coupled ML architecture, leveraging the strengths of each approach while maintaining a reasonable computational cost. Such hybrid approaches for model error correction have previously been demonstrated on low-dimensional chaotic systems [15, 16]. Data-driven PDE discretizations for fluid flow were discussed in Refs. [17, 18]. Recent advances in Single Image Super-Resolution have demonstrated impressive results in the field of computer vision and image processing (c.f. Ref. [19] for a review), where deep neural networks are used to increase the resolution of coarse-grained images. Super-Resolution techniques have recently been demonstrated for fluid dynamics and climate applications [20-23].

The main idea of this paper is to develop a strategy based on Deep Learning (DL) methods that has the potential to extend the Reynolds numbers accessible to detailed simulations. The present work demonstrates an example flow simulation computed on a coarse mesh that is enhanced using a DL model to populate the finer scales that are normally available only by increasing the resolution, and expense, of the simulation. This technique also introduces a correction to the model errors resulting from simulating on a coarse mesh. In contrast to post-facto Super-Resolution applications that work with artificially coarsened simulation data, we present a general technique to enhance PDE simulation data that is generated by the flow solver at low resolution. Our approach results in a high-resolution estimate of the system variables while simultaneously correcting the model error introduced by the coarse-grid PDE simulation.

The remainder of this paper is structured as follows. In Section I, the basic computational methodology is presented. Next, in Section II, the canonical problem of Rayleigh-Bénard Convection (RBC) in two dimensions is reviewed. In Section III, our Deep Learning correction algorithm is presented. Finally, in Section IV the results show that it is possible to capture small physical details with a coarse simulation coupled to a Deep Learning algorithm, and that important temporal and spectral properties of the flow can be recovered that compare well to a ground-truth simulation at higher resolution.
I. METHODOLOGY
In this section we formalize the problem definition and the proposed solution. We also state the goal of this paper in plain language. The main challenge is that the trajectory of a coarse-grid simulation can be very different from that of a well-resolved solution. We investigate whether it is possible to compute on-the-fly corrections to the coarse-grid trajectory so that it follows that of the fine grid. We do this by constructing an ML model that: 1) learns to model the error on the large scales due to the small scales that are missing from the coarse simulation, and 2) populates the missing small scales at each step of the low-resolution solver.
A. Problem Definition
Consider a physical system whose evolution is described by a set of PDEs (such as the Navier-Stokes equations). Let the state of the system at time $t$ be denoted by $\Psi(t)$. Typically, $\Psi(t)$ will be a multi-channel tensor representing the fields of physical variables such as the temperature, pressure, and velocity components. The evolution of $\Psi(t)$ under the dynamics of the PDE can be represented by the initial value problem:

$\partial_t \Psi(t) = F[\Psi(t), \partial_x \Psi(t)],$   (1)

with initial conditions $\Psi(t = 0)$. In Eq. (1), $F$ denotes a set of operators that act on $\Psi$ and its set of (first or higher order) spatial derivatives, denoted by $\partial_x \Psi(t)$.

A variety of techniques may be employed to numerically evolve Eq. (1) in time, based on a numerical resolution parameterized by the integer tuple $\mathbf{N}$. For finite-difference or finite-element approaches, $\mathbf{N} = (N_1, \ldots, N_D)$ would represent the number of mesh points across the domain in each of the $D$ directions; for spectral solvers, $\mathbf{N}$ might specify the number of corresponding Fourier modes. Assume we have such a solver, represented by the nonlinear operator $F_{\mathbf{N}}$. The operator $F_{\mathbf{N}}$ acts on the fields $X_{\mathbf{N}}(t)$ to evolve them in time by a time interval $\delta t_{\mathbf{N}}$, as an approximate solution to Eq. (1), subject to the initial conditions $X_{\mathbf{N}}(t = 0) = X_{\mathbf{N},0}$. Note that $\delta t_{\mathbf{N}}$ typically decreases with increasing $\mathbf{N}$.

We are interested in the value of the field $X_{\mathbf{N}}(t + T)$ at some later time, $t + T$. We apply the operator $F_{\mathbf{N}}$ to the fields $X_{\mathbf{N}}(t)$ so that

$X_{\mathbf{N}}(t + T) = F^{(T)}_{\mathbf{N}}[X_{\mathbf{N}}(t)].$   (2)

Here, $F^{(T)}_{\mathbf{N}}$ simply denotes the composite operator that evolves the fields over an interval $T$ via a sequence of multiple (perhaps variable-sized) time steps. To save computational cost, we could also choose to evolve the appropriately coarsened initial condition at a lower resolution $\mathbf{N}' = (N_1/m, \ldots, N_D/m)$ using the operator denoted $F'^{(T)}_{\mathbf{N}'}$. To coarsen the initial condition, we interpolate the fields $X_{\mathbf{N}}(t)$ onto the coarser grid with resolution $\mathbf{N}'$. We call this operation 'down-scaling' and denote the down-scaling interpolation operator by $D_m$. The appropriate form of the operator $D_m$ may depend on the type of CFD solver and the nature of the fields being interpolated, among other factors. We also construct a corresponding 'up-scaling' operator $U_m$ that transforms low-resolution fields into high-resolution ones. Note that we do not assume $U_m$ to be particularly sophisticated. A simple example of an up-scaling operator is one that pads pixels with the value of their nearest neighbors. In general, due to the highly nonlinear interactions across length and time scales,

$F^{(T)}[X(t)] \neq U_m F'^{(T)}[D_m X(t)].$   (3)

Because information is lost in the down-scaling operation, it is not generally possible to recover the output of a high-resolution PDE solver from a low-resolution simulation over a finite time interval. However, we posit that for an interval of time $\tau$ that is very small compared to the characteristic macroscopic time scales of the system (such as the largest eddy turnover time),

$F^{(\tau)}[X(t)] = U_m F'^{(\tau)}[D_m X(t)] + \epsilon.$   (4)

If we are able to model the error $\epsilon$, then we can estimate and correct for the model error at regular intervals $\tau$, and estimate a corrected trajectory $X^{ml}(t)$ that is close to the true high-resolution trajectory $X(t)$.

B. Model Error Correction with Machine Learning
We consider a Machine Learning technique to correct the model error in a low-resolution PDE simulation and simultaneously recover the high-resolution fields. We choose a small trajectory-correction time interval, $\tau$, and emphasize that over this interval the CFD solver may take multiple (possibly adaptively chosen) time steps, typically chosen for stability/accuracy of the numerical integration scheme. We seek to compute model errors at regular intervals of $\tau$ in order to correct the PDE trajectory computed at low resolution using ML, and also to estimate the missing high-resolution fields.

FIG. 1. ML-PDE hybrid architecture illustrating the algorithm given by Eq. (15). The low-resolution PDE time-stepper is followed by naive up-scaling, correction using a deep neural network, and down-scaling, with the process repeating in a closed feedback loop.

We train a supervised neural network to model the $\tau$-step error field $\epsilon_\tau(t)$, which is defined by the following equations:

$X(t + \tau) = F^{(\tau)}_{\mathbf{N}}[X(t)],$   (5)
$x(t + \tau) = F'^{(\tau)}_{\mathbf{N}'}[D_m[X(t)]],$   (6)
$\tilde{X}(t + \tau) = U_m[x(t + \tau)],$   (7)
$\epsilon_\tau(t + \tau) = X(t + \tau) - \tilde{X}(t + \tau).$   (8)

The supervised neural network, denoted by $\mathcal{N}$, is trained to obtain an estimate $\epsilon^{ml}_\tau(t)$ of the model error $\epsilon_\tau(t)$:

$\epsilon^{ml}_\tau(t) = \mathcal{N}[\tilde{X}(t)].$   (9)

We operate the hybrid ML-PDE solver in inference mode as follows.

• Initialize: Start from the initial condition $X_{\mathbf{N}}(t_0)$ and initialize the ML-estimated trajectory $X^{ml}(t)$ so that

$X^{ml}(t_0) = X_{\mathbf{N}}(t_0).$   (10)

• Time-step with corrections: The following equations are computed in a loop:

$x(t + \tau) = F'^{(\tau)}_{\mathbf{N}'}[D_m[X^{ml}(t)]],$   (11)
$\tilde{X}(t + \tau) = U_m[x(t + \tau)],$   (12)
$\epsilon^{ml}_\tau(t + \tau) = \mathcal{N}[\tilde{X}(t + \tau)],$   (13)
$X^{ml}(t + \tau) = \tilde{X}(t + \tau) + \epsilon^{ml}_\tau(t + \tau).$   (14)

Eqs. (11)-(14) above can be combined to give us a single inference equation as follows:

$X^{ml}(t + \tau) = U_m[F'^{(\tau)}_{\mathbf{N}'}[D_m[X^{ml}(t)]]] + \mathcal{N}[U_m[F'^{(\tau)}_{\mathbf{N}'}[D_m[X^{ml}(t)]]]].$   (15)

This completes our problem definition and proposed framework for coupling Machine Learning with a PDE solver.
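As a concrete illustration, the inference loop of Eqs. (11)-(14) can be sketched in a few lines of Python. The solver step, scaling operators, and network below are hypothetical numpy stand-ins (the paper itself uses Dedalus and a trained U-Net), so only the control flow of Eq. (15) is meaningful here.

```python
import numpy as np

def coarse_solver_step(x):
    """Stand-in for the low-resolution PDE advance F'^(tau)_{N'}: a smoothing stencil."""
    return 0.25 * (np.roll(x, 1, 0) + np.roll(x, -1, 0)
                   + np.roll(x, 1, 1) + np.roll(x, -1, 1))

def D(X, m=4):
    """Stand-in down-scaling D_m: m x m block average."""
    n = X.shape[0] // m
    return X.reshape(n, m, n, m).mean(axis=(1, 3))

def U(x, m=4):
    """Stand-in up-scaling U_m: nearest-neighbour repeat."""
    return np.repeat(np.repeat(x, m, axis=0), m, axis=1)

def net(X_tilde):
    """Stand-in for the trained error model N[.]; a real U-Net would go here."""
    return np.zeros_like(X_tilde)

def mlpde_step(X_ml, m=4):
    """One hybrid ML-PDE step, Eqs. (11)-(14)."""
    x = coarse_solver_step(D(X_ml, m))   # Eq. (11): coarse-grid PDE advance
    X_tilde = U(x, m)                    # Eq. (12): naive up-scaling
    eps_ml = net(X_tilde)                # Eq. (13): learned error estimate
    return X_tilde + eps_ml              # Eq. (14): corrected high-resolution state

X_ml = np.random.default_rng(0).standard_normal((512, 512))  # Eq. (10)
for _ in range(5):                       # closed feedback loop of Fig. 1
    X_ml = mlpde_step(X_ml)
```

The essential design point is that the network never sees the coarse grid directly: it always receives the naively up-scaled field and outputs a correction on the fine grid.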
In the next two sections we turn our focus to a concrete implementation of an ML architecture, which we couple to a solver to demonstrate this framework on a canonical two-dimensional fluid convection problem, namely the Rayleigh-Bénard system of equations.

II. RAYLEIGH-BÉNARD CONVECTION (RBC)
In order to demonstrate the effectiveness of our hybrid ML-PDE architecture, we consider a two-dimensional Rayleigh-Bénard Convection (RBC) problem operating in a regime that exhibits moderate levels of fine-scale turbulent fluctuations. The RBC problem is modeled with the incompressible Navier-Stokes equations formulated under the Boussinesq approximation. Nondimensionalization by the Rayleigh and Prandtl numbers and subtraction of the steady conduction-only solution gives the following formulation:

$\nabla \cdot \mathbf{u} = 0,$   (16)
$\partial_t \mathbf{u} = -(\mathbf{u} \cdot \nabla)\mathbf{u} - \nabla p + \sqrt{Pr/Ra}\,\nabla^2 \mathbf{u} + \theta\,\mathbf{e}_z,$   (17)
$\partial_t \theta = \sqrt{1/(Pr\,Ra)}\,\nabla^2 \theta - (\mathbf{u} \cdot \nabla)\theta + \mathbf{u} \cdot \mathbf{e}_z,$   (18)

where $\theta$ and $p$ are the (nondimensional) deviations of temperature and pressure from the steady solution, and $\mathbf{u}$ is the nondimensionalized fluid velocity. The detailed derivation of this set of equations and the nondimensionalization parameters can be found in [24]. In the present study, the Prandtl and Rayleigh numbers are set to $Pr = 0.7$ and $Ra = 10^9$, respectively.

Equations (16)-(18) are solved in a 2D computational domain with unit aspect ratio, $\Gamma = 1$. No-slip ($\mathbf{u} = 0$), isothermal ($\theta = 0$) boundary conditions are imposed on the upper and lower walls, while periodicity is applied at the lateral boundaries. The initial velocity and pressure fluctuations are set to zero, and an initial profile of the fluctuating temperature is created with a random seed.

To evolve the system numerically, we use Dedalus [25], an open-source spectral framework for solving a general set of partial differential equations. We note that due to the non-dissipative nature of the spectral discretization used in Dedalus, stability considerations limit the maximum numerical time step as a function of $Ra$. In the cases considered below, the solutions were computed on the coarsest meshes using 1/10 of the computed stable step size. Larger values of $Ra$ would increase the energy contained within the under-resolved fluctuations at high wave number, but would require an even further reduced time step for stable temporal evolution.

III. ML-PDE ARCHITECTURE
In this section we describe the hybrid ML-PDE architecture that combines a PDE time-stepper operating at coarse resolution with a convolutional ML architecture operating as a model-error corrector. The hybrid architecture has a training phase and an inference phase, as we outline below. We also outline the schemes used for up- and down-scaling.
A. Up-scaling and Down-scaling Operators
Throughout the rest of this paper, we will often need to transform fields from high resolution to low resolution and vice versa. Since we are using a spectral CFD solver, a natural choice for the up-scaling and down-scaling operators is one that pads or truncates modes in spectral space. Dedalus provides the functionality to implement such transforms natively. When down-scaling a field with $N_s$ spectral coefficients by a factor of $m$, the spectral coefficients are truncated after the first $N_s/m$ modes and are then transformed to a $1/m$-times scaled grid in real space. When up-scaling, the spectral coefficients are padded with the appropriate number of zeros above the highest modes before transforming to an $m$-times scaled grid in real space. We denote these spectral down-scaling and up-scaling operators by $D_m$ and $U_m$, respectively.

B. Training
1. High-Resolution Data Generation
Training data was generated using the Dedalus PDE solver to simulate Eqs. (16)-(18) on a Cartesian, evenly spaced grid with a resolution of $(N_x, N_z) = (512, 512)$. Snapshots of the solution were saved at regular intervals of $\tau = 0.05$ Model Time Units (MTU). For comparison, the largest eddy turnover time was estimated to be approximately 2 MTU. At each saved time $t_k$, we form the state tensor $X(t_k)$ by stacking the instantaneous velocity, temperature, and pressure fields at each spatial index. These high-resolution fields $X(t_k)$ will be referred to as the ground truth fields.
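The state-tensor construction described above can be sketched as follows. The field names are placeholders standing in for the Dedalus solver output, and the channel ordering is an assumption (Table I in the appendix only fixes the channel count at four).

```python
import numpy as np

# Placeholder (512 x 512) solver fields; in practice these come from Dedalus.
rng = np.random.default_rng(0)
u, w, theta, p = (rng.standard_normal((512, 512)) for _ in range(4))

# Stack the instantaneous fields channel-wise into the state tensor X(t_k).
X_tk = np.stack([u, w, theta, p], axis=0)   # shape: (4, 512, 512)
```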
2. Regridding
For every saved snapshot $X(t_k)$, we generate a low-resolution, down-scaled snapshot $x(t_k) = D_4[X(t_k)]$ on a Cartesian, evenly spaced grid with resolution $(N_x, N_z) = (128, 128)$, using the spectral down-scaling operator $D_m$ described in Section III A with $m = 4$.
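For a doubly periodic direction, the spectral truncation and zero-padding of Section III A can be sketched with numpy FFTs as below. This is a simplified sketch: the actual solver basis is Fourier only in the lateral direction (the wall-normal direction is not periodic), and Dedalus performs these transforms natively.

```python
import numpy as np

def spectral_downscale(field, m):
    """D_m for a doubly periodic field: keep only the lowest n/m modes,
    then transform back onto a grid scaled by 1/m."""
    n = field.shape[0]
    F = np.fft.fftshift(np.fft.fft2(field))
    k = n // m
    lo = (n - k) // 2
    Fc = F[lo:lo + k, lo:lo + k]                 # truncate high modes
    return np.real(np.fft.ifft2(np.fft.ifftshift(Fc))) / m**2

def spectral_upscale(field, m):
    """U_m: zero-pad above the highest modes, then transform onto a grid scaled by m."""
    n = field.shape[0]
    F = np.fft.fftshift(np.fft.fft2(field))
    N = n * m
    Fp = np.zeros((N, N), dtype=complex)
    lo = (N - n) // 2
    Fp[lo:lo + n, lo:lo + n] = F                 # pad with zeros
    return np.real(np.fft.ifft2(np.fft.ifftshift(Fp))) * m**2
```

The `1/m**2` and `m**2` factors compensate for numpy's FFT normalization so that mode amplitudes are preserved; for a band-limited field, up-scaling followed by down-scaling is exact.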
3. Pair Creation
We solve Eqs. (16)-(18) using Dedalus on a $128 \times 128$ grid to advance each down-scaled snapshot $x(t_k)$ by a time interval $\tau$. We denote the time-stepped snapshot at time $t = t_k + \tau$ by $\tilde{x}(t_{k+1})$. We then up-scale $\tilde{x}(t_{k+1})$ using $U_4$ to obtain the interpolated tensor $\tilde{X}(t_{k+1})$. The high-resolution ground truth field at time $t = t_k + \tau$, $X(t_{k+1})$, is then used to compute the error tensor $E(t_{k+1}) := X(t_{k+1}) - \tilde{X}(t_{k+1})$. An ordered pair $P_{k+1}$ is then defined according to:

$P_{k+1} := (\tilde{X}(t_{k+1}),\, E(t_{k+1})).$   (19)
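The pair-creation step above can be sketched as follows. `coarse_step` is a placeholder for the $128 \times 128$ Dedalus advance by $\tau$, and `D4`/`U4` are simple per-channel stand-ins for the spectral operators of Section III A.

```python
import numpy as np

def D4(X):
    """Stand-in down-scaling D_4: 4 x 4 block average per channel."""
    c, n = X.shape[0], X.shape[1] // 4
    return X.reshape(c, n, 4, n, 4).mean(axis=(2, 4))

def U4(x):
    """Stand-in up-scaling U_4: nearest-neighbour repeat per channel."""
    return np.repeat(np.repeat(x, 4, axis=1), 4, axis=2)

def coarse_step(x):
    """Placeholder for advancing x(t_k) by tau with the low-resolution solver."""
    return x

def make_pair(X_k, X_k1):
    """Build one training pair P_{k+1} = (X_tilde(t_{k+1}), E(t_{k+1})), Eq. (19)."""
    x_tilde = coarse_step(D4(X_k))   # advance the down-scaled snapshot by tau
    X_tilde = U4(x_tilde)            # naive up-scale back to the fine grid
    E = X_k1 - X_tilde               # error against the ground truth at t_{k+1}
    return X_tilde, E

X_k = np.zeros((4, 512, 512))        # ground truth at t_k (placeholder)
X_k1 = np.zeros((4, 512, 512))       # ground truth at t_{k+1} (placeholder)
X_tilde, E = make_pair(X_k, X_k1)
```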
4. ML model training
We use a training set of $N_{\rm train}$ pairs $\{P_k\}$ as input-output pairs to train a deep Convolutional Neural Network (CNN) with a U-Net [26] style architecture. We further use $N_{\rm validate}$ pairs to tune the network architecture and optimize hyper-parameters. The validation dataset was also used to monitor the validation loss during training of the network; a separate test dataset, however, was used for the final performance evaluation of the ML-PDE framework. The details of the architecture and the tuned hyper-parameters are given in Appendix A. The network is trained with an L1-norm loss between the network prediction and the ground truth fields.

The all-convolutional network architecture is independent of the input spatial grid size; this allows us to train the network on smaller grid inputs and then apply it to the full grid during inference. We leverage this feature to train the network on random $(256 \times 256)$ crops of the $(512 \times 512)$ training fields.

C. Inference
We generate a set of test snapshots $\{X^i\}$ by initializing the Dedalus PDE solver at resolution $512 \times 512$ with new initial conditions, separate from all initial conditions used to create the training and validation data sets. From each of these snapshots, we generate a down-scaled snapshot $x^i$ using the procedure described in Sec. III B 2.

Ground Truth: The trajectories resulting from evolving each of the initial conditions in $\{X^i\}$ using Eqs. (16)-(18) with Dedalus at resolution $512 \times 512$ will be considered the 'ground truth' and denoted by $\{X^i_{\rm truth}(t)\}$.

Baseline: As a baseline comparison, we evolve each of the $\{x^i\}$ using Eqs. (16)-(18) at $128 \times 128$ resolution using Dedalus and denote the resulting trajectories by $\{x^i_{\rm base}(t)\}$.

ML: Starting from each of the initial conditions in $\{X^i\}$, we use the trained U-Net in conjunction with the Dedalus PDE solver at resolution $128 \times 128$ to generate a trajectory according to Eq. (15). The resulting trajectory is denoted by $\{X^i_{\rm ml}(t)\}$.

IV. RESULTS
We evaluate the fidelity of the hybrid ML-PDE trajectories to the ground truth trajectories by evaluating the Root Mean Square (RMS) error $e^i_{\rm ml}(t) = \langle \| D_4[X^i_{\rm ml}(t)] - D_4[X^i_{\rm truth}(t)] \|^2 \rangle^{1/2}$. Additionally, we evaluate the baseline RMS error $e^i_{\rm base}(t) = \langle \| x^i_{\rm base}(t) - D_4[X^i_{\rm truth}(t)] \|^2 \rangle^{1/2}$. Fig. 2 shows the RMS error curves $e^i_{\rm ml}$, $e^i_{\rm base}$ for the 20 trials, along with the mean RMS error over the 20 trials. Figure 3 shows the power spectral density of the $\theta$ field after 100 time steps for the ground truth ($X_{\rm truth}(t)$), the ML-PDE trajectory ($X_{\rm ml}(t)$), and the up-scaled baseline ($U_4[x_{\rm base}(t)]$).

FIG. 2. The RMS error in the ML-augmented simulation $e^i_{\rm ml}(t)$ (red) and the baseline low-resolution PDE simulation $e^i_{\rm base}(t)$ (blue) for 20 different trajectories starting from differing initial conditions. The dark red (blue) line indicates the average RMS error over the 20 trajectories for the ML-augmented (baseline) simulation.
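The two diagnostics used in this section can be sketched as follows: a grid-averaged RMS error matching the definitions above, and an angle-averaged power spectral density. The PSD sketch assumes a doubly periodic field, which is a simplification since the vertical direction is wall-bounded.

```python
import numpy as np

def rms_error(a, b):
    """RMS error <||a - b||^2>^(1/2), averaged over all grid points (and channels)."""
    return np.sqrt(np.mean((a - b) ** 2))

def psd_1d(theta):
    """Angle-averaged power spectral density P(k) of a 2-D field (periodic assumption)."""
    n = theta.shape[0]
    power = np.abs(np.fft.fft2(theta) / theta.size) ** 2
    kx = np.fft.fftfreq(n) * n                    # integer wavenumbers
    KX, KZ = np.meshgrid(kx, kx, indexing="ij")
    kbin = np.rint(np.sqrt(KX**2 + KZ**2)).astype(int).ravel()
    return np.bincount(kbin, weights=power.ravel())[: n // 2]
```

A single spectral mode at wavenumber $k$ produces a sharp peak in `psd_1d` at bin $k$, which makes the function easy to sanity-check before applying it to the $\theta$ fields.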
FIG. 3. The Power Spectral Density (PSD) of the $\theta$ variable in the ML-PDE trajectory $X^i_{\rm ml}(t)$ (red), the ground truth trajectory $X^i_{\rm truth}(t)$ (green), and the up-scaled baseline trajectory $U_4[x^i_{\rm base}(t)]$ (blue) at $t = 100\tau$, averaged over 20 different initial conditions $i$.

In Figure 4, we present a snapshot of the $\theta$ variable at two different instants of time, $t = 175\tau$ and $t = 200\tau$, as generated by the ML-PDE solver and the up-scaled baseline low-resolution PDE solver, along with the ground truth for comparison. We observe significant, visually perceptible differences between the baseline solution and the ground truth, whereas the ML-PDE solution retains greater fidelity to the ground truth solution.

V. CONCLUSIONS
FIG. 4. (a), (d): Snapshots of the instantaneous temperature deviation ($\theta$) field from a ground truth simulation at time steps $t = 175\tau$ and $t = 200\tau$, respectively. (b), (e): The instantaneous $\theta$ field snapshots from the ML-PDE simulation. (c), (f): The instantaneous $\theta$ field snapshots from a baseline PDE simulation at $(128 \times 128)$ resolution, up-scaled using $U_4$. Interesting, visually perceptible differences in the solution snapshots are highlighted with a green box.

We have introduced a novel ML-PDE hybrid architecture that can be used to effectively enhance the accuracy of a low-resolution solution of a complex application in incompressible fluid dynamics. The ML-PDE strategy features the use of a neural network to model the errors that accumulate over short time intervals between a coarse-grid evolution of the target PDE system and a solution of the system at much higher resolution. The resulting ML model is used to generate periodic corrections to a coarse-grid solution of the PDE system in a way that attempts to account for the missing fine scales. We demonstrated our ML-PDE hybrid architecture using a canonical two-dimensional Rayleigh-Bénard Convection (RBC) system operating in a regime that exhibits a range of turbulent fluctuations with significant energy in high-wave-number modes. The ML-assisted coarse-grid evolution resulted in corrected solution trajectories that were consistent with the solutions computed at a much higher resolution in space and time.

While this demonstration example was carried out for a two-dimensional, quasi-steady flow, the approach can likely be generalized to more complex systems, and has the potential for pushing the envelope of high-Reynolds-number simulations across a variety of application areas. We acknowledge several important limitations of our initial implementation reported here. First, three-dimensional turbulence problems of considerably more general interest will be dramatically more challenging to address, in terms of both complexity and required computational resources. However, by the same arguments, we are optimistic that there is an opportunity for considerable benefit of our approach in that context.
We note that there is scope for further improving the architecture by introducing physics-based constraints on the output of the neural network, adversarial training of the neural network [35], and exploring different neural network architectures. For the two-dimensional example considered here, our approach is based on modeling fine-scale features of the quasi-steady solution, and cannot in its present form capture transient behaviour. Finally, we expect to explore the generalizability of this modeling approach for the 2D RBC problem across varying Reynolds numbers, domain geometries, and boundary conditions in future work. We hope that our work will motivate further research in all of these directions.

VI. CODE AND DATA
The code and data required to reproduce theresults presented in the paper will be availableat https://github.com/jdppthk/ML-PDE upon publica-tion.
VII. ACKNOWLEDGEMENTS
We would like to acknowledge the critical formative contributions to this work by Adrian Albert, who unfortunately passed away before this manuscript was completed. We would also like to thank Prabhat, Peter Harrington, and Wahid Bhimji for helpful comments and discussions throughout. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231.

[1] S. Grossmann and D. Lohse, Physical Review E, 016305 (2002).
[2] Note that in thermal convection of the form considered in this paper, the Reynolds number has been shown to depend on the Rayleigh number as an approximate power law of the form $Re \sim Ra^\beta$ for a fixed Prandtl number; c.f. Ref. [1].
[3] R. D. Moser, J. Kim, and N. N. Mansour, Physics of Fluids, 943 (1999).
[4] M. Lee and R. D. Moser, Journal of Fluid Mechanics, 395-415 (2015).
[5] S. L. Brunton, B. R. Noack, and P. Koumoutsakos, Annual Review of Fluid Mechanics, 477 (2020).
[6] K. Duraisamy, G. Iaccarino, and H. Xiao, Annual Review of Fluid Mechanics, 357 (2019).
[7] R. Maulik, O. San, A. Rasheed, and P. Vedula, Journal of Fluid Mechanics, 122 (2019).
[8] J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, Physical Review Letters, 024102 (2018).
[9] S. Rasp, M. S. Pritchard, and P. Gentine, Proceedings of the National Academy of Sciences, 9684 (2018).
[10] S. Scher, Geophysical Research Letters, 12 (2018).
[11] S. Scher and G. Messori, Geoscientific Model Development, 2797 (2019).
[12] P. D. Dueben and P. Bauer, Geoscientific Model Development, 3999 (2018).
[13] J. A. Weyn, D. R. Durran, and R. Caruana, Journal of Advances in Modeling Earth Systems, 2680 (2019).
[14] T. Arcomano, I. Szunyogh, J. Pathak, A. Wikner, B. R. Hunt, and E. Ott, Geophysical Research Letters, e2020GL087776 (2020).
[15] J. Pathak, A. Wikner, R. Fussell, S. Chandra, B. R. Hunt, M. Girvan, and E. Ott, Chaos: An Interdisciplinary Journal of Nonlinear Science, 041101 (2018).
[16] Z. Y. Wan, P. Vlachas, P. Koumoutsakos, and T. Sapsis, PLoS ONE, e0197704 (2018).
[17] Y. Bar-Sinai, S. Hoyer, J. Hickey, and M. P. Brenner, Proceedings of the National Academy of Sciences, 15344 (2019).
[18] J. Zhuang, D. Kochkov, Y. Bar-Sinai, M. P. Brenner, and S. Hoyer, arXiv preprint arXiv:2004.05477 (2020).
[19] W. Yang, X. Zhang, Y. Tian, W. Wang, J.-H. Xue, and Q. Liao, IEEE Transactions on Multimedia, 3106 (2019).
[20] C. M. Jiang, S. Esmaeilzadeh, K. Azizzadenesheli, K. Kashinath, M. Mustafa, H. A. Tchelepi, P. Marcus, A. Anandkumar, et al., arXiv preprint arXiv:2005.01463 (2020).
[21] K. Stengel, A. Glaws, D. Hettinger, and R. N. King, Proceedings of the National Academy of Sciences, 16805 (2020).
[22] B. Liu, J. Tang, H. Huang, and X.-Y. Lu, Physics of Fluids, 025105 (2020).
[23] Y. Xie, E. Franz, M. Chu, and N. Thuerey, ACM Transactions on Graphics (TOG), 1 (2018).
[24] A. Pandey and M. K. Verma, Physics of Fluids, 095105 (2016).
[25] K. J. Burns, G. M. Vasil, J. S. Oishi, D. Lecoanet, and B. P. Brown, Physical Review Research, 023068 (2020), arXiv:1905.10388 [astro-ph.IM].
[26] O. Ronneberger, P. Fischer, and T. Brox, in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234-241.
[27] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, in Advances in Neural Information Processing Systems 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019), pp. 8024-8035.
[28] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan, et al., in USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2018), pp. 561-577.
[29] R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. E. Gonzalez, and I. Stoica, arXiv preprint arXiv:1807.05118 (2018).
[30] "Torch Distributed Data Parallel" (accessed 09/16/2020), PyTorch DDP.
[31] "NVIDIA Collective Communications Library" (accessed 09/16/2020), NVIDIA NCCL.
[32] P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen, D. Garcia, B. Ginsburg, M. Houston, O. Kuchaiev, G. Venkatesh, and H. Wu, in International Conference on Learning Representations (2018).
[33] "Introducing native PyTorch automatic mixed precision for faster training on NVIDIA GPUs" (accessed 09/16/2020), PyTorch AMP.
[34] L. Biewald, "Experiment tracking with Weights and Biases" (2020), software available from wandb.com.
[35] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, in Advances in Neural Information Processing Systems (2014), pp. 2672-2680.
[36] "Apex (A PyTorch Extension) optimizers" (accessed 09/16/2020), https://nvidia.github.io/apex/optimizers.html.

A. NEURAL NETWORK ARCHITECTURE AND TRAINING
The network follows the standard U-Net design [26] with a contracting path and an expanding path. The contracting path consists of 7 convolution layers with kernel size 4 and stride 2; the input is padded in order to make each strided convolution output exactly half of the input spatial size. The expanding path consists of 7 transposed convolution layers with kernel size 4 and stride 2; again, padding is chosen to make the output exactly double the input spatial size. To make this auto-encoder a U-Net, the output of each convolution layer from the contracting path is concatenated to the input of the corresponding layer in the expanding path; for example, the output of the first layer is concatenated with the output of the penultimate layer to form the input of the very last layer of the network. Table I provides more details on the architecture.

The network was trained with apex.optimizers.FusedAdam [36], a fused implementation of the Adam variant of Stochastic Gradient Descent, with a batch size of 30 and a learning rate tuned on the validation set.

Layer             Output shape      No. of params
Input variables   4 × 512 × 512     -
ConvBlock         128 × 256 × 256   9k
ConvBlock         256 × 128 × 128   525k
ConvBlock         512 × 64 × 64     2.1M
ConvBlock         1024 × 32 × 32    8.4M
ConvBlock         1024 × 16 × 16    16.8M
ConvBlock         1024 × 8 × 8      16.8M
ConvBlock         1024 × 4 × 4      16.8M
TConvBlock        1024 × 8 × 8      16.8M
TConvBlock        1024 × 16 × 16    33.5M
TConvBlock        1024 × 32 × 32    33.5M
TConvBlock        512 × 64 × 64     16.8M
TConvBlock        256 × 128 × 128   4.2M
TConvBlock        128 × 256 × 256   1M
TConv             4 × 512 × 512     16k
Total trainable parameters          ≈ 167M

TABLE I. Neural network architecture. Each ConvBlock consists of a Conv2D followed by BatchNorm and LeakyReLU layers. Each TConvBlock consists of a TransposedConv2D followed by BatchNorm and ReLU layers. The last layer is a single TransposedConv2D layer followed by a tanh activation. Output shape is (channels × $N_x$ × $N_y$). Note that the output shapes are illustrated for the (512 × 512) inputs used at inference; training used smaller crops.
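The "exactly half / exactly double" padding choice described above corresponds to kernel size 4 and stride 2 with a padding of 1 (the padding value is an assumption inferred from the stated behavior); the standard shape formulas make this explicit and reproduce the 512 → 4 → 512 spatial sizes of Table I.

```python
def conv_out(n, k=4, s=2, p=1):
    """Output size of a strided convolution: floor((n + 2p - k) / s) + 1.
    With k=4, s=2, p=1 this is exactly n // 2."""
    return (n + 2 * p - k) // s + 1

def tconv_out(n, k=4, s=2, p=1):
    """Output size of a transposed convolution: (n - 1)*s - 2p + k, i.e. 2n here."""
    return (n - 1) * s - 2 * p + k

# Contracting path: 7 ConvBlocks take 512 -> 256 -> ... -> 4;
# 7 transposed layers then take 4 back up to 512, as in Table I.
sizes = [512]
for _ in range(7):
    sizes.append(conv_out(sizes[-1]))
```

The same arithmetic explains why the all-convolutional network is size-agnostic: any even input size halves and doubles cleanly, which is what permits training on 256 × 256 crops and inference on the full 512 × 512 grid.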