Stable a posteriori LES of 2D turbulence using convolutional neural networks: Backscattering analysis and generalization to higher Re via transfer learning
Yifei Guan, Ashesh Chattopadhyay, Adam Subel, Pedram Hassanzadeh
Department of Mechanical Engineering, Rice University, Houston, TX 77005, United States
Department of Earth, Environmental and Planetary Sciences, Rice University, Houston, TX 77005, United States
Abstract
There is a growing interest in developing data-driven subgrid-scale (SGS) models for large-eddy simulation (LES) using machine learning (ML). In a priori (offline) tests, some recent studies have found ML-based data-driven SGS models that are trained on high-fidelity data (e.g., from direct numerical simulation, DNS) to outperform baseline physics-based models and accurately capture the inter-scale transfers, both forward (diffusion) and backscatter. While promising, instabilities in a posteriori (online) tests and inabilities to generalize to a different flow (e.g., with a higher Reynolds number, Re) remain as major obstacles in broadening the applications of such data-driven SGS models. For example, many of the same aforementioned studies have found instabilities that required often ad-hoc remedies to stabilize the LES at the expense of reducing accuracy. Here, using 2D decaying turbulence as the testbed, we show that deep fully convolutional neural networks (CNNs) can accurately predict the SGS forcing terms and the inter-scale transfers in a priori tests, and if trained with enough samples, lead to stable and accurate a posteriori LES-CNN. Further analysis attributes these instabilities to the disproportionately lower accuracy of the CNNs in capturing backscattering when the training set is small. We also show that transfer learning, which involves re-training the CNN with a small amount of data (e.g., 1%) from the new flow, enables accurate and stable a posteriori
LES-CNN for flows with 16× higher Re (as well as higher grid resolution if needed). These results show the promise of CNNs with transfer learning to provide stable, accurate, and generalizable LES for practical use.

1 Introduction

Accurate simulations of turbulent flows are of critical importance for predicting and understanding various engineering and natural systems. However, the direct numerical simulation (DNS) of the Navier-Stokes equations remains computationally prohibitive for many real-world applications because DNS requires resolving (i.e., directly solving for) all the relevant spatial and temporal scales. These scales might span several orders of magnitude, e.g., from the domain length down to the Kolmogorov scale [62, 81]. Large-eddy simulation (LES) offers a balance between accuracy and computational cost, since in LES, only the part of the inertial range containing the large-scale structures is resolved on a coarse-resolution grid and the effects of the subgrid-scale (SGS) eddies are parameterized, in terms of the resolved flow, using a SGS model [81, 87]. As a result, the quality of the solutions from LES highly depends on the quality of the SGS model. Consequently, formulating accurate SGS models for LES has been an active area of research for the past few decades in different disciplines [e.g., 18, 58, 83, 87, 90, 91]. Below, we briefly describe some of the key physics-based SGS models and their major shortcomings, which have motivated the recent interest in using machine learning (ML) to find data-driven SGS models. Then, we discuss some of the advances in data-driven SGS modeling as well as the main challenges, some of which we aim to address in this paper.

In his pioneering work on developing one of the first global climate models, Smagorinsky proposed a physics-based SGS model for LES in 1963 [95].
In this model (SMAG, hereafter), effects of the SGS eddies are parameterized as a function of the resolved flow using a scale-selective dissipative model that consists of a positive eddy viscosity ν_e and second-order diffusion. Since then, the SMAG model and its variants have been widely used in different disciplines, for example, to simulate weather and climate variability, combustion, multiphase flows, wind farms, and magnetohydrodynamics [e.g., 1, 26, 27, 40, 77, 80, 86, 97, 102]. Such purely diffusive SGS models lead to numerically stable LES; however, they might not correctly capture the inter-scale physical processes such as energy (and enstrophy) transfers. These models often include second-order dissipation, but higher orders can also contribute to the forward transfer, i.e., transfer from the resolved scales to the subgrid scales [58]. Furthermore, while in the mean the transfer is forward and the role of SGS processes is indeed dissipative, it is known that locally there can be transfer from the subgrid scales to the resolved scales. This process is referred to as backscattering, and it is missing from purely diffusive SGS models [81].

Backscattering has been found to play a significant role in various fluid flows, and extensive work has been done in different disciplines to account for it in physics-based SGS models [e.g., 9, 19, 38, 39, 47, 53, 66, 92, 103, 121]. For example, Piomelli et al. [78, 79] showed that the lack of energy backscattering in LES could lead to inaccurate prediction of the perturbation growth in transitional wall-bounded flows. Backscattering has also been found to be important in geophysical turbulence, which has implications for modeling atmospheric and oceanic circulations and weather/climate predictions [6, 32, 33, 35, 64, 85, 93]. To improve the SMAG model and account for backscattering, Germano et al.
[30] developed a dynamic approach to compute the eddy viscosity, which could lead to ν_e < 0 and thus, in principle, capture backscattering. However, a negative ν_e can cause numerical instabilities in this dynamic SMAG model (DSMAG, hereafter), and in practice, positive clipping is often used to enforce ν_e ≥ 0 (see Section 3.4). Another line of work has aimed to estimate the parameters of physics-based SGS models, e.g., ν_e, from high-fidelity data (e.g., DNS or observations) [22, 57, 89, 91, 96, 112]. Alternatively, a growing number of recent papers have aimed to learn a data-driven SGS model from high-fidelity data, often in a non-parametric fashion, i.e., without any prior assumption about the model's structural/functional form [e.g., 28, 29, 36, 50, 70, 74, 82, 88, 101, 107, 108]. In the studies from the latter category that focused on LES, a variety of canonical fluid systems and different approaches (e.g., local vs. non-local) have been investigated. In the local approach, which often employs multilayer perceptron artificial neural networks (ANNs), the SGS term (stress tensor or its divergence) at a grid point is estimated in terms of the resolved flow at or around the same grid point. For example, Maulik et al. [56] and Xie et al. [113, 114] have, respectively, studied 2D decaying homogeneous isotropic turbulence (2D-DHIT) and 3D incompressible and compressible turbulence using this approach (also, see [116]). In the non-local approach, which often employs variants of convolutional neural networks (CNNs), the SGS term over the entire domain is estimated in terms of the resolved flow in the entire domain to account for potential spatial correlations (e.g., due to coherent structures) and non-homogeneities in the system. For example, Zanna and Bolton [7, 120], Beck and colleagues [4, 44], Pawar et al. [75], and Subel et al. [99] have used this approach for ocean circulation, 3D-DHIT, 2D-DHIT, and forced 1D Burgers' turbulence, respectively.

In a priori (offline) tests, in which the accuracy of the SGS model in estimating the SGS term as a function of the resolved flow is evaluated, some of these studies have found the data-driven SGS models to accurately account for inter-scale transfers (including backscattering) and outperform physics-based models such as SMAG and DSMAG [7, 56, 75, 120, 122].
However, most of the same studies have also found that in a posteriori (online) tests, in which the data-driven SGS model is coupled with a coarse-resolution numerical solver, the LES model is unstable, leading to numerical blow-up or physically unrealistic flows [4, 5, 44, 56, 98, 115, 120, 122]. While the reason(s) for these instabilities remain unclear, a number of remedies have been proposed, e.g., post-processing of the trained SGS model to remove backscattering or to attenuate the SGS feedback into the numerical solver, or combining the data-driven model with an eddy viscosity model [4, 56, 120, 122] (also, see the excellent review by Beck and Kurz [5]). However, such remedies include ad-hoc components and often substantially take away the advantages gained from the non-parametric, data-driven approach.

Instabilities in a posteriori tests remain a major challenge to broadening the applications of ML-based data-driven SGS models for LES. Another major challenge is the generalization capability of the data-driven SGS models beyond the flow from which the training data are obtained, e.g., extrapolation to turbulent flows with higher Reynolds numbers (Re). The ability to generalize is important for at least two reasons: i) high-fidelity data from usually expensive simulations (e.g., DNS) are needed to train data-driven SGS models, and given the sharp increase in the computational cost of DNS with Re, the ability to effectively extrapolate to higher Re makes data-driven SGS models much more useful in practice; and ii) some level of generalization capability in the data-driven SGS models is essential for the LES models to be robust and trustworthy. However, it is known that such extrapolations are challenging for neural networks in general [43]. In LES modeling, a priori tests in 3D-DHIT have shown that the performance of data-driven SGS models degrades when applied to Re higher than the one for which the model is trained.
In a posteriori tests with multi-scale Lorenz 96 systems [16] and forced 1D Burgers' turbulence [99], we found inaccurate generalization to more chaotic systems or flows with higher Re, particularly in terms of short-term prediction and reproducing the long-term statistics of rare events. However, in both studies, we also found that transfer learning, which involves re-training (part of) the already trained neural network using a small amount of data from the new system [118], enables accurate generalization, e.g., to 10× higher Re [16, 99]. While promising, the effectiveness of transfer learning in enabling generalization in more complex turbulent flows needs to be investigated.

Building on these earlier studies, here we use a deep fully CNN architecture to build a non-local data-driven SGS model for a 2D-DHIT system using DNS data, and aim to

(a) Examine the accuracy of this SGS model in a priori (offline) tests, with regard to both predicting the SGS terms and capturing inter-scale transfers,

(b) Evaluate the accuracy and stability of LES with this SGS model (LES-CNN) in a posteriori (online) tests, both in terms of short-term predictions and long-term statistics,

(c) Assess the effectiveness of transfer learning in enabling accurate and stable generalization of LES-CNN to higher Re (up to 16×). We also show generalization to higher grid resolutions by adding an encoder-decoder architecture to the CNN.

For (a) and (b), we also present results from the SMAG and DSMAG models as well as a local ANN-based data-driven SGS model.

The remainder of this paper is structured as follows. Governing equations of the 2D-DHIT system, the filtered equations, and the DNS and LES numerical solvers are presented in Section 2, followed by descriptions of the data-driven SGS models (training data and the CNN and ANN architectures) and the physics-based SGS models (SMAG and DSMAG) in Section 3.
Results of the a priori and a posteriori tests as well as generalization to higher Re and/or resolutions via transfer learning are presented in Section 4. Conclusions and future work are discussed in Section 5.

2 Governing equations, DNS, and LES

The dimensionless governing equations of 2D-DHIT in the vorticity (ω) and streamfunction (ψ) formulation in a doubly periodic x-y domain are

∂ω/∂t + N(ω, ψ) = (1/Re) ∇²ω,   (1a)
∇²ψ = −ω,   (1b)

where the nonlinear term N(ω, ψ) represents advection:

N(ω, ψ) = (∂ψ/∂y)(∂ω/∂x) − (∂ψ/∂x)(∂ω/∂y).   (2)

2D turbulence is a fitting prototype for many large-scale geophysical and environmental flows (where rotation and/or stratification dominate) and has been widely used as a testbed for novel techniques, including ML-based SGS modeling [e.g., 10, 100, 103-105]. In DNS, as discussed in detail in Section 2.2, Eqs. (1a)-(1b) are numerically solved at high spatio-temporal resolutions.

To find the equations for LES, we apply filtering (denoted by an overbar and defined later) to Eqs. (1a)-(1b) to obtain

∂ω̄/∂t + N(ω̄, ψ̄) = (1/Re) ∇²ω̄ + Π,   (3a)
∇²ψ̄ = −ω̄,   (3b)

where Π = N(ω̄, ψ̄) − \overline{N(ω, ψ)} is the SGS forcing term. Note that in deriving these equations, we assume that the filter commutes with the spatial (and temporal) derivative operators, which is the case for commonly used filters such as box, sharp spectral, and Gaussian filters [81, 86]; the latter is used in this work (see Section 3.1). As discussed in Section 2.2, the numerical solution of Eqs. (3a)-(3b) requires spatio-temporal resolutions lower than those of the DNS. However, the SGS forcing term, Π, includes the effects of the small-scale eddies that have been truncated due to filtering/coarse-graining and are not resolved in LES. As a result, Π has to be estimated solely based on the resolved variables (ω̄, ψ̄) to close Eqs. (3a)-(3b), a problem that is at the heart of turbulence modeling [81].

In most physics-based models, such as those using eddy viscosity, Π is modeled as a purely diffusive process (SMAG and DSMAG are described in Section 3.4).
In data-driven approaches, such as the one pursued here and discussed in Sections 3.2 and 3.3, the aim is to learn the relationship between (ω̄, ψ̄) and Π in DNS data using methods such as deep neural networks, without any prior assumptions about the functional form of this relationship.

2.2 DNS and LES numerical solvers

For DNS, we solve Eqs. (1a)-(1b) in a doubly periodic square domain with L × L = [0, 2π] × [0, 2π]. A Fourier-Fourier pseudo-spectral solver is used along with second-order Adams-Bashforth and Crank-Nicolson time-integration schemes for the advection and viscous terms, respectively. The computational grid has uniform spacing ∆_DNS = 2π/N_DNS, where N_DNS is the number of grid points in each direction. We use N_DNS = 2048 for Re = 8000, 16000, 32000, and 64000, and N_DNS = 3072 for Re = 128000. The time-stepping size ∆t_DNS = 1 × 10⁻⁴ (∆t_DNS = 5 × 10⁻⁵) is used for N_DNS = 2048 (N_DNS = 3072). Following Refs. [55, 56], the initial condition of each DNS run is a random vorticity field but with the same prescribed energy spectrum (see Appendix A for details). For each of the Re mentioned above, we conducted 15 independent DNS runs from random initial conditions.

The numerical solver is implemented in Python using CUDA GPU computing. We use a number of GPU blocks equal to the resolution in each direction, such that one GPU thread in each block is assigned to the computation on one computational grid point. The fast Fourier transform (FFT) and inverse fast Fourier transform (iFFT) operations are performed using the cuFFT library. Double-precision floating-point arithmetic is used for all numerical solvers.

Figure 1 shows an example of the vorticity field for Re = 32000 at the initial condition (t = 0), and at t = 50τ and t = 200τ, where τ = 1/|ω|_max = 0.02 = 200∆t_DNS (|ω|_max is computed at t = 0). After around 50τ, the turbulent kinetic energy (TKE) spectrum Ê(k) exhibits self-similarity. Note that the TKE spectrum is calculated using an angle average and therefore k = (k_x² + k_y²)^(1/2).

For LES, we solve Eqs. (3a)-(3b) using the same numerical solver used for DNS, except that the spatial resolution is lower by a factor of 8 in each direction (i.e., N_LES = N_DNS/8 and ∆_LES = 8∆_DNS) and the time-stepping size is 10 times larger, ∆t_LES = 10∆t_DNS. As a result, the LES solver requires 640 times fewer degrees of freedom, which substantially reduces the computational cost. However, the LES solver needs a SGS model for Π. Here, we use two data-driven models that employ CNN and ANN as well as two common physics-based models (SMAG and DSMAG). In the next section, we first describe the filtered DNS (FDNS) data, which are used for training the data-driven SGS models, and then describe the CNN, ANN, SMAG, and DSMAG models.
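The pseudo-spectral machinery underlying both solvers is compact; below is a minimal numpy sketch (an illustration only, not the paper's CUDA/cuFFT implementation; a 2π × 2π doubly periodic domain and integer wavenumbers are assumed) of evaluating the advection term N(ω, ψ) of Eq. (2) and inverting the Poisson equation (1b):

```python
import numpy as np

def wavenumbers(n):
    """Integer wavenumbers (kx, ky) for an n x n grid on a 2*pi x 2*pi domain."""
    k = np.fft.fftfreq(n, d=1.0 / n)  # [0, 1, ..., n/2-1, -n/2, ..., -1]
    return np.meshgrid(k, k, indexing="ij")

def poisson_solve(omega):
    """Invert del^2 psi = -omega (Eq. 1b) spectrally; psi is zero-mean."""
    kx, ky = wavenumbers(omega.shape[0])
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0  # avoid 0/0; the mean mode is zeroed below
    psi_hat = np.fft.fft2(omega) / k2
    psi_hat[0, 0] = 0.0
    return np.real(np.fft.ifft2(psi_hat))

def advection(omega, psi):
    """N(omega, psi) = psi_y * omega_x - psi_x * omega_y (Eq. 2)."""
    kx, ky = wavenumbers(omega.shape[0])
    w_hat, p_hat = np.fft.fft2(omega), np.fft.fft2(psi)
    ddir = lambda f_hat, kk: np.real(np.fft.ifft2(1j * kk * f_hat))
    return ddir(p_hat, ky) * ddir(w_hat, kx) - ddir(p_hat, kx) * ddir(w_hat, ky)
```

For ψ = sin x sin y (so that ω = 2 sin x sin y), N(ω, ψ) vanishes identically, which provides a quick sanity check; a production solver would also de-alias the nonlinear product (e.g., with the 2/3 rule) and add the time integrators described above.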
Figure 1: An example of the vorticity field of DNS for Re = 32000 at (a) t = 0, (b) t = 50τ, and (c) t = 200τ. The initial turbulent kinetic energy (TKE) spectrum is prescribed while the vorticity field has random phase (see Appendix A). Data collection for training of data-driven SGS models (using CNN or ANN) starts from t = 50τ and ends at t = 200τ. As shown in (d), in this period, the TKE spectra exhibit self-similarity following the k⁻³ scaling of the Kraichnan-Batchelor-Leith (KBL) theory [3, 41, 46].

3 Data-driven and physics-based SGS models for LES
3.1 Filtered DNS (FDNS) data

To compute the filtered DNS variables on the LES grid, which as mentioned above is 8× coarser than the DNS grid in each direction, we i) apply the Gaussian filter transfer function to the DNS data, and ii) coarse-grain the filtered results to the LES grid [81, 120]. Below, the subscript "DNS" denotes the high-resolution DNS grid and "LES" denotes the coarse-resolution LES grid.

Using vorticity as an example, we first transform the DNS vorticity field ω(r_DNS) into the spectral space ω̂(k_DNS), where r = (x, y) and k = (k_x, k_y). Then, we apply the Gaussian filter in the spectral space:

˜ω̂(k_DNS) = G(k_DNS) ⊙ ω̂(k_DNS),   (4)

where the operator ⊙ means element-wise multiplication of matrices and ˜(·) denotes the filtered variable at the DNS resolution. The transfer function of the Gaussian filter is [81]:

G(k_DNS) = exp(−|k_DNS|² ∆_F² / 24),   (5)

where ∆_F is the filter size, which is taken to be ∆_F = 2∆_LES to yield sufficient resolution [81, 122]. After the filtering operation, coarse-graining is performed to transform the filtered solution from the DNS to the LES grid [81, 120]:

ω̄̂(k_LES) = ˜ω̂(|k_x| < k_c, |k_y| < k_c),   (6)

where k_c = π/∆_LES is the cut-off wavenumber in spectral space, and we use an overbar to denote the filtered and then coarse-grained variables (hereafter, we use the term "filtered" for both "filtered" and then "coarse-grained" when there is no ambiguity). ψ̄̂(k_LES) and Π̂(k_LES) are similarly computed following Eqs. (4)-(6).

Note that in addition to the Gaussian filter, box and sharp Fourier filters are also commonly used for LES. However, the Gaussian filter is compact in both physical and spectral spaces [81]. Because our numerical solver is in the Fourier spectral space and our CNN and ANN operate in the physical space, we focus on the Gaussian filter for LES. Furthermore, Zhou et al. [122] found that the Gaussian filter outperforms the other two filters in terms of correlation coefficients of Π in their work on data-driven SGS modeling of 3D turbulence.

Figure 2 shows examples of the Π term and the effects of filtering on the vorticity field in physical space and on the TKE spectrum Ê(k). The fine structures in the DNS vorticity ω are lost in the filtered vorticity ω̃ and manifest themselves in the SGS vorticity ω′ = ω − ω̃ and the SGS forcing term Π (panels (c)-(d)). The Ê(k) spectrum further shows the effects of the Gaussian filter on the energy at smaller scales (panel (e)): the Gaussian filter leads to the deviation of the FDNS spectrum from the DNS spectrum, especially at the scales near k_c.

Figure 2: Examples showing the effects of filtering. (a) DNS vorticity ω, (b) filtered vorticity ω̃, (c) SGS vorticity ω′ = ω − ω̃, (d) SGS forcing term Π, and (e) TKE spectrum for Re = 32000 at the end of one of the DNS runs (t = 200τ). Panel (e) also shows the transfer function of the Gaussian filter and the cutoff wavenumber, k_c. The FDNS spectrum deviates from the DNS spectrum near k_c because of the filtering.

Our goal is to non-parametrically learn Π as a function of the FDNS variables ω̄ and ψ̄ using a deep fully CNN as well as an ANN used in a previous study [56].

3.2 Convolutional neural network (CNN)

For non-local data-driven SGS modeling, we propose to use a deep fully CNN. The CNN architecture was originally developed for computer vision and image processing, and its key feature is that rather than having pre-defined filters, CNNs learn the filters used for pattern recognition for a given data set [31, 42, 45]. CNNs have often been found superior to ANNs when the data contain spatial patterns and structures significant to the functional relationship to be learned [21, 76]. Therefore, it is not surprising that CNNs have been found to perform well, usually superior to non-convolutional ML methods, in applications involving turbulent flows, given the abundance of coherent structures and spatial correlations in turbulence [e.g., 4, 7, 12, 15, 60, 75]. Specifically for SGS modeling, a recent a priori analysis has shown that a CNN outperforms a local ANN in terms of prediction accuracy of the SGS stress term in the same 2D-DHIT system studied here [75].

Building on previous work and to account for non-local effects (e.g., coherent structures and spatial correlations), we use a CNN with inputs/outputs that are global (i.e., from the entire domain). Thus, the input features are

(ψ̄/σ_ψ, ω̄/σ_ω) ∈ R^(2 × N_LES × N_LES),   (7)

and the output targets are

(Π/σ_Π) ∈ R^(N_LES × N_LES),   (8)

where σ is the standard deviation of the corresponding variable calculated over all training samples. We aim to use a CNN to learn M, an optimal map between the inputs and outputs,

M : {ψ̄/σ_ψ, ω̄/σ_ω} ∈ R^(2 × N_LES × N_LES) → {Π/σ_Π} ∈ R^(1 × N_LES × N_LES),   (9)
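The FDNS pipeline of Eqs. (4)-(6) amounts to a spectral multiply followed by a spectral truncation; a minimal numpy sketch follows (an illustration only; a 2π-periodic domain is assumed, and the Gaussian transfer function exp(−|k|²∆_F²/24) is the standard form adopted here):

```python
import numpy as np

def gaussian_filter_coarsen(field_dns, n_les, delta_f=None):
    """Filter a doubly periodic DNS field with the spectral Gaussian transfer
    function (Eq. 5) and coarse-grain to the LES grid by spectral truncation
    (Eq. 6). Domain size 2*pi assumed; delta_f defaults to 2 * Delta_LES."""
    n_dns = field_dns.shape[0]
    if delta_f is None:
        delta_f = 2.0 * (2.0 * np.pi / n_les)  # Delta_F = 2 * Delta_LES
    k = np.fft.fftfreq(n_dns, d=1.0 / n_dns)  # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    f_hat = np.exp(-(kx**2 + ky**2) * delta_f**2 / 24.0) * np.fft.fft2(field_dns)
    kc = n_les // 2  # cutoff: keep modes |kx|, |ky| < kc
    keep = np.r_[0:kc, n_dns - kc:n_dns]  # positive then negative modes
    f_hat_les = f_hat[np.ix_(keep, keep)] * (n_les / n_dns) ** 2  # renormalize
    return np.real(np.fft.ifft2(f_hat_les))
```

A single Fourier mode cos(3x) is attenuated by exactly exp(−9∆_F²/24) and survives the cutoff whenever n_les ≥ 8, which makes the function easy to verify.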
Figure 3: Schematic of the CNN. Inputs and outputs are samples of the normalized (ψ̄, ω̄) and Π, respectively. The convolutional layers (Conv layers) have the same dimensions (256 × 256 × 64), with kernels of size 5 × 5. The activation function of each layer is ReLU (rectified linear unit) except for the last one, which is a linear map.

The map M is learned by minimizing the mean-squared error (MSE)

MSE = (1/n_tr) Σ_{i=1}^{n_tr} ‖Π_i^CNN − Π_i^FDNS‖²,   (10)

where n_tr is the number of training samples and ‖·‖ is the L₂ norm.

Figure 3 schematically shows the CNN architecture that is used here. We use the mini-batch stochastic gradient descent method with the Adam optimizer to minimize the loss function, Eq. (10). Note that the CNN has no pooling or upsampling layers (i.e., it is a fully CNN), so the hidden layers have the same size as the input and output layers. We have found that using a fully CNN (i.e., without any up/down-sampling) is a key to training an accurate SGS model, consistent with earlier findings that pooling layers may artificially change spatial correlations of the data [15].

Hyper-parameters such as the number of hidden layers have been determined via an extensive search. We find that to capture the complex pattern of Π, a deep CNN with 10 hidden layers is needed. For example, the 10-layer CNN outperforms shallower 8-layer and 5-layer CNNs in terms of training loss for the same n_tr. Overall, the CNN with 10 layers has 927041 trainable parameters.

The training, validation, and testing sets are generated using 2D snapshots of filtered data collected from 15 independent DNS runs with random initial conditions, sampled every 10∆t_DNS, in the time interval [50τ, 200τ]. We use the data from 8 runs for the training set, 2 runs for the validation set, and 5 runs for the testing set. The effects of the size of the training dataset on the accuracy of the SGS model are further discussed in Section 4.1.

As is the common practice in ML applications, we run the CNN (and the ANN) with single-precision floating-point operations during both training and testing to accelerate the process and reduce the data transfer/storage.
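The stated parameter count is consistent with a fully convolutional stack of 5 × 5 kernels: 2 input channels (normalized ψ̄ and ω̄), 10 hidden layers of 64 filters, and a linear output layer to 1 channel. The per-layer widths are inferred from the text rather than stated as a list, so treat them as an assumption; the arithmetic, however, reproduces the quoted total exactly:

```python
def conv2d_params(c_in, c_out, k=5):
    """Trainable parameters of one conv layer: k*k kernel weights per
    (input channel, output channel) pair, plus one bias per output channel."""
    return k * k * c_in * c_out + c_out

# 2 -> 64 input conv, nine 64 -> 64 hidden-to-hidden convs, 64 -> 1 output conv
total = conv2d_params(2, 64) + 9 * conv2d_params(64, 64) + conv2d_params(64, 1)
print(total)  # 927041, matching the trainable-parameter count quoted above
```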
We have also explored training/testing a CNN with double-precision floating-point arithmetic, but found no distinguishable enhancement in the a posteriori tests. Finally, we point out that the codes for the CNN and the CNN with transfer learning (discussed later) are made publicly available on GitHub (see the Acknowledgement for details).

3.3 Multilayer perceptron artificial neural network (ANN)

A few recent studies have proposed building local data-driven SGS models using ANNs trained to learn the mapping between a local stencil of input variables and the local SGS term Π [56, 114, 116, 122]. For example, Maulik et al. [56] employed such an approach for the same 2D-DHIT system and proposed to train an ANN with inputs consisting of 9 grid-stencil values of ω̄ and ψ̄ plus the local values of |S̄| and |∇ω̄|, and the output consisting of the local value of the SGS term Π:

M : {ω̄_{i,j}, ω̄_{i,j+1}, ..., ω̄_{i−1,j−1}, ψ̄_{i,j}, ψ̄_{i,j+1}, ..., ψ̄_{i−1,j−1}, |S̄|_{i,j}, |∇ω̄|_{i,j}} ∈ R²⁰ → {Π_{i,j}} ∈ R¹,   (11)

where (i, j) here denotes a local grid point, |S̄| is the characteristic filtered rate of strain [81], and |∇ω̄| = [(∂ω̄/∂x)² + (∂ω̄/∂y)²]^(1/2).

We have closely followed Ref. [56] in building a local data-driven SGS model. For the ANN, we use their publicly available code. The ANN is fully connected with 2 hidden layers, each containing 50 neurons. The network has 3651 trainable parameters. We explored architectures with more layers and neurons per layer, but found no improvement in the accuracy. Due to the use of local inputs, in this approach the number of training samples is equal to the number of snapshots multiplied by N²_LES. In common practice, only a few (less than 10) snapshots of data are used as the training dataset [56, 114, 115, 122]. Here, following Ref. [56], we use 8 randomly selected snapshots (from the training set mentioned in Section 3.2), resulting in 524288 samples in the training set.
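The 20 inputs of Eq. (11) can be assembled per grid point from the periodic 3 × 3 stencils of ω̄ and ψ̄ plus the two local scalars; a small sketch follows (the exact stencil ordering used in Ref. [56] is an assumption here):

```python
import numpy as np

def ann_features(omega_bar, psi_bar, s_mag, grad_w_mag, i, j):
    """Assemble the 20 local ANN inputs of Eq. (11) at grid point (i, j):
    the 3x3 stencils of omega_bar and psi_bar (with periodic wrap-around),
    plus |S|_{i,j} and |grad omega|_{i,j}."""
    n = omega_bar.shape[0]
    rows = [(i + di) % n for di in (-1, 0, 1)]  # periodic neighbors in x
    cols = [(j + dj) % n for dj in (-1, 0, 1)]  # periodic neighbors in y
    sw = omega_bar[np.ix_(rows, cols)].ravel()  # 9 vorticity values
    sp = psi_bar[np.ix_(rows, cols)].ravel()    # 9 streamfunction values
    return np.concatenate([sw, sp, [s_mag[i, j]], [grad_w_mag[i, j]]])
```

With 2 hidden layers of 50 neurons each, this 20-input, 1-output network indeed has (20·50 + 50) + (50·50 + 50) + (50·1 + 1) = 3651 trainable parameters, matching the count in the text.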
We have also investigated the effects of increasing the number of samples to 20 snapshots, but again, no substantial improvement in training loss is found. Note that following Ref. [56], no pre-processing, e.g., normalization, is performed on the input or output data (we find normalizing the input/output samples to have no effect on the performance of the ANN).

Note that it is not the purpose of this paper to compare the ANN- and CNN-based approaches side by side (even if such a comparison is possible given the differences in architecture, network size, input/output, and size of the training set). Therefore, beyond the explorations mentioned above, we have not performed an exhaustive search on the ANN and the local SGS modeling approach. Our explorations all suggest that the comprehensively investigated network/approach presented in Ref. [56] is already optimal.

3.4 Physics-based SGS models (SMAG and DSMAG)

In the SMAG [95] model, which is a commonly used baseline SGS model for LES, the SGS stress term in the momentum equation is modeled as [81, 86]:

τ_SMAG = −(C_s ∆)² ⟨S̄ S̄⟩^(1/2) S̄,   (12)

where the angle brackets ⟨·⟩ denote domain averaging and S̄ is the filtered rate-of-strain tensor [81]. The SGS term Π in Eq. (3a) is therefore

Π_SMAG = (C_s ∆)² ⟨S̄ S̄⟩^(1/2) ∇²ω̄ = ν_e ∇²ω̄,   (13)

where C_s is the Smagorinsky coefficient, ν_e is the eddy viscosity, and

⟨S̄ S̄⟩^(1/2) = ⟨4 (∂²ψ̄/∂x∂y)² + (∂²ψ̄/∂x² − ∂²ψ̄/∂y²)²⟩^(1/2).   (14)

C_s is a constant in the SMAG model. The DSMAG model [30] uses a dynamic procedure to estimate ν_e based on the local flow structure. This procedure can lead to ν_e < 0, which can result in numerical instabilities; consequently, "positive clipping" is often applied to enforce ν_e ≥ 0. We use C_s = 1 for SMAG following Maulik et al. [56] and implement DSMAG (with positive clipping) following Pawar et al. [75], who studied the same 2D-DHIT system. Note that these SMAG and DSMAG models both have ν_e ≥ 0.

Table 1: Correlation coefficients c (Eq. (15)) between the predicted and true SGS term Π for Re = 32000 in a priori tests. The subscripts indicate c computed only over elements of Π^FDNS and Π^M corresponding to T > 0 and T < 0, respectively.

           CNN    ANN    DSMAG
c           …      …      …
c_{T>0}     …      …      …
c_{T<0}     …      …      …

4.1 A priori analysis
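As a reference for the comparisons in this section, the SMAG closure of Eqs. (12)-(14) reduces to a single domain-averaged eddy viscosity multiplying the Laplacian of the filtered vorticity. A numpy sketch follows (an illustration only; spectral derivatives on a 2π-periodic domain, with C_s and the filter scale ∆ passed in explicitly):

```python
import numpy as np

def smag_pi(omega_bar, psi_bar, cs, delta):
    """Pi_SMAG = nu_e * laplacian(omega_bar), Eq. (13), with
    nu_e = (cs*delta)^2 * <S S>^(1/2) and the domain-averaged strain
    magnitude of Eq. (14). Returns (Pi_SMAG, nu_e)."""
    n = omega_bar.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers, 2*pi domain
    kx, ky = np.meshgrid(k, k, indexing="ij")
    p_hat = np.fft.fft2(psi_bar)
    dd = lambda f_hat, a, b: np.real(np.fft.ifft2((1j * a) * (1j * b) * f_hat))
    psi_xy = dd(p_hat, kx, ky)
    psi_xx = dd(p_hat, kx, kx)
    psi_yy = dd(p_hat, ky, ky)
    s_mag = np.sqrt(np.mean(4.0 * psi_xy**2 + (psi_xx - psi_yy) ** 2))
    nu_e = (cs * delta) ** 2 * s_mag  # eddy viscosity, >= 0 by construction
    lap_w = np.real(np.fft.ifft2(-(kx**2 + ky**2) * np.fft.fft2(omega_bar)))
    return nu_e * lap_w, nu_e
```

Because ν_e is non-negative by construction, this baseline is purely diffusive; it cannot produce the local backscattering examined next.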
We first examine the accuracy of the CNN-based SGS model in predicting the Π term and inter-scale transfers for never-seen-before samples of (ψ̄, ω̄) from the testing set. The results in Section 4.1.1 are reported for n_tr = 50000. We use a commonly used metric, the correlation coefficient c between the modeled (Π^M) and true (Π^FDNS) SGS terms, defined as [4, 75, 121]:

c = ⟨(Π^M − ⟨Π^M⟩)(Π^FDNS − ⟨Π^FDNS⟩)⟩ / [⟨(Π^M − ⟨Π^M⟩)²⟩^(1/2) ⟨(Π^FDNS − ⟨Π^FDNS⟩)²⟩^(1/2)].   (15)

The correlation coefficients (averaged over 100 random testing samples) for the CNN as well as DSMAG and ANN are reported in Table 1. These a priori tests show that the data-driven SGS models substantially outperform DSMAG, and that this CNN-based model (with c above 0.9) has statistically significantly higher accuracy than this ANN-based model. Note that, similarly, previous findings based on correlation coefficients of the SGS stress term found CNNs to outperform ANNs in a priori tests [75].

Next, we examine the inter-scale transfer in a priori tests. The transfer is often quantified using the SGS stress [81]. Since here we are working with the SGS forcing term Π, which is the curl of the divergence of the SGS stress, we instead follow previous work and define the SGS transfer T as [38, 56, 103]:

T = sgn(∇²ω̄) ⊙ Π,   (16)

where sgn(·) is the sign function. At each grid point (i, j), T_{i,j} > 0 indicates forward transfer (diffusion) while T_{i,j} < 0 indicates backscattering.
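Both diagnostics are straightforward to compute; below is a numpy sketch of Eq. (15) and Eq. (16) (spectral Laplacian on a 2π-periodic domain assumed):

```python
import numpy as np

def corr_coef(pi_model, pi_fdns):
    """Correlation coefficient c between modeled and true SGS terms, Eq. (15)."""
    a = pi_model - pi_model.mean()
    b = pi_fdns - pi_fdns.mean()
    return (a * b).mean() / np.sqrt((a * a).mean() * (b * b).mean())

def sgs_transfer(omega_bar, pi):
    """Inter-scale transfer T = sgn(laplacian(omega_bar)) * Pi, Eq. (16);
    T > 0 marks forward transfer and T < 0 marks backscatter."""
    n = omega_bar.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    lap = np.real(np.fft.ifft2(-(kx**2 + ky**2) * np.fft.fft2(omega_bar)))
    return np.sign(lap) * pi
```

By construction c = 1 when the model reproduces Π^FDNS exactly, and the restricted coefficients c_{T>0} and c_{T<0} follow by applying corr_coef only to the grid points where the true T is positive or negative, respectively.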
Figure 4:
Example of inter-scale transfer T, Eq. (16), in a priori analysis at Re = 32000. (a) Filtered vorticity ω̄; (b) true T from FDNS; (c)-(e) T from CNN, ANN, and DSMAG. The ANN and CNN capture both forward transfer and backscatter while DSMAG only captures the forward transfer (diffusion). The upper row shows the entire domain while the second row shows the portion in the black square.

Note that accurately capturing backscattering is particularly important in 2D turbulence, given its inverse energy cascade, which is a physical property [81, 104]. For a sample filtered vorticity ω̄, Fig. 4 shows the true inter-scale transfer T^FDNS and T from CNN, ANN, and DSMAG. Because DSMAG is purely diffusive, it only captures the forward transfer. The ANN and CNN both capture the diffusion as well as backscattering. Table 1 further shows c computed separately over grid points corresponding to only T > 0 and only T < 0, with the CNN again attaining the highest values.

Overall, these a priori tests show that the CNN-based data-driven SGS model can accurately predict the out-of-sample SGS forcing terms and inter-scale transfers. However, as discussed in the Introduction, previous studies have found that accuracy in a priori tests does not necessarily translate to accuracy/stability in a posteriori analysis [4, 56, 120, 122]. Before discussing the a posteriori tests in Section 4.2, we further examine how the accuracy of the CNN depends on the size of the training set, which, as it turns out, impacts the stability of LES-CNN.

4.1.2 Effects of the number of training samples, n_tr

Table 2 shows how the SGS term's correlation coefficient c varies in a priori tests as the number of samples used to train the CNN (n_tr) is increased. The value of c increases with n_tr, reaching 0.90 with n_tr = 10000 and 0.93 with n_tr = 50000. While c = 0.90 (for n_tr = 10000) might seem high enough and the CNN-based data-driven SGS model might seem accurate enough, a set of a posteriori tests with this LES-CNN model are found to lead to noisy, unphysical flows for some initial conditions. In fact, a posteriori tests with LES-CNN trained with lower n_tr (500 or 1000) lead to numerically unstable simulations that blow up. Only simulations with n_tr ≥ 30000 are found to be stable and accurate (Table 2). The instabilities and inaccuracies in a posteriori tests might be due to inaccurate out-of-sample predictions as a result of insufficient training data. These findings are consistent with what we found for the a posteriori LES-ANN of forced 1D Burgers' turbulence [99], whose instability was traced to inaccurate Π terms around some of the shockwaves. In that study, we showed that artificially enriching the training dataset using a data augmentation strategy [25, 68, 117] led to a stable and accurate LES-ANN.

Table 2: Correlation coefficients c (Eq. (15)) between the CNN-predicted and true SGS term Π for Re = 32000 in a priori tests as a function of the number of training samples n_tr. The values show the average over (the same) 100 randomly chosen testing samples and the standard deviation. The last row indicates the fate of a posteriori LES-CNN integrations from 5 random initial conditions: unstable refers to numerical blow-up, unphysical refers to simulations leading to noisy/unrealistic flows, and stable refers to numerically stable and accurate simulations.

n_tr        500        1000       10000        30000     50000
c           …          …          0.90         0.92      0.93
c_{T>0}     …          …          …            …         …
c_{T<0}     …          …          0.89         0.91      …
LES-CNN     unstable   unstable   unphysical   stable    stable

Table 2 further reports c_{T>0} and c_{T<0} as a function of n_tr. This analysis shows that, consistently, c_{T<0} is lower than c_{T>0}, especially at small n_tr, but the difference declines from 0.15 to 0.04 with increasing n_tr. The implication of these results is that the SGS model with a CNN trained using a small n_tr is less capable of accurately predicting backscattering than forward transfer, which, based on previous findings, could lead to instabilities. As discussed in the Introduction, capturing backscattering is highly desired; however, it is known from physics-based SGS modeling efforts that it can lead to instabilities if handled incorrectly [49, 59]. Moreover, in recent data-driven SGS modeling efforts, as discussed later, removing backscattering has been used as a way of stabilizing a posteriori LES [56, 122]. Table 2 shows that, at least for our CNN, the backscattering can be accurately captured and the a posteriori
LES can be stable without any further post-processing ifthe training set is large enough.In short, these results suggest that neural networks that may “seem” well-trained and accuratein a priori (offline) tests, may not be sufficient for stable/accurate LES in a posteriori (online) tests.We say “seem” because there is no established a priori metric and threshold to know if a data-driven SGS model is well-trained and accurate enough to lead to stable and accurate a posteriori
LES. In this study, the threshold is empirically between c = 0 .
90 and c = 0 .
92, or if c T < is a bettermetric, between 0 .
89 and 0 .
91. To be clear, these are just empirical thresholds in this testcase, andsuch thresholds might be case-dependent. Whether a general connection between a data-drivenSGS models’ accuracy in a priori tests and the a posteriori
LES stability could be established ornot should be thoroughly investigated in future work. Furthermore, we emphasize that we do notclaim that all instabilities in other a posteriori
LES runs using data-driven SGS models (reportedin other studies) are due to similar inaccuracies that could be reduced by enriching the trainingset.
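The conditional correlation coefficients used in these a priori tests can be sketched in a few lines of NumPy. This is only an illustration of the bookkeeping (the exact definitions of c and T follow Eqs. (15)-(16) of the paper); `pi_true`, `pi_pred`, and `T` are assumed to be 2D fields sampled on the LES grid, and all names here are illustrative rather than from the paper's code:

```python
import numpy as np

def corr(a, b):
    """Pearson correlation coefficient between two fields (Eq. (15)-style)."""
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def conditional_corrs(pi_true, pi_pred, T):
    """c over all grid points, c_{T>} over forward-transfer points (T > 0),
    and c_{T<} over backscatter points (T < 0)."""
    fwd, back = T > 0, T < 0
    return (corr(pi_true, pi_pred),
            corr(pi_true[fwd], pi_pred[fwd]),
            corr(pi_true[back], pi_pred[back]))
```

Masking with `T > 0` and `T < 0` before computing the correlation is what separates the model's skill on diffusion from its skill on backscatter, which is the quantity that turns out to matter for a posteriori stability.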
A posteriori analysis
In the a posteriori (online) tests, the CNN-based data-driven SGS model and the LES numerical solver of Eqs. (3a)-(3b) are coupled (LES-CNN): at a given time step, the resolved variables (ψ̄, ω̄) from the numerical solver are normalized (dividing by their σ) and fed into the already trained CNN, which predicts Π^CNN. This Π^CNN is then de-normalized (multiplying by σ_Π) and fed back into the numerical solver to compute the resolved flow in the next time step, and the cycle continues. The CNN used for the a posteriori tests is trained with n_tr = 50000 and leads to stable LES-CNN in all tests conducted here. Similarly, we use the ANN-based data-driven SGS model and the physics-based SGS models SMAG and DSMAG to conduct LES-ANN, LES-SMAG, and LES-DSMAG integrations.

Figure 5: Evolution of kinetic energy E(t) normalized by E_0 = E(0) in a posteriori tests from 5 random initial conditions at Re = 8000 and Re = 32000. Note that for each Re, the ANN- and CNN-based data-driven SGS models have been trained on data from that Re. Curves show the mean from the 5 integrations. The LES integrations start at t = 50τ. All stable LES models overpredict the decay rate, but LES-CNN is closest to the FDNS while LES-DSMAG, and even more so the post-processed LES-ANN with backscattering removed, are too dissipative. LES-ANN without post-processing is unstable and blows up.

Figure 5 shows examples of the evolution of the kinetic energy E(t) = −⟨ψ̄ω̄⟩/2 for Re = 32000 as well as for Re = 8000. While the LES-CNN and LES-DSMAG are stable, LES-ANN is unstable, leading to rapid increases in E and blow-up. In their pioneering work, Maulik et al. [56] also found this LES-ANN unstable and proposed a post-processing step:

    Π^ANN_{i,j} = 0,  ∀ T_{i,j} < 0,   (17)

which effectively, like the positive clipping used for DSMAG, eliminates backscattering based on T from Eq. (16). A similar procedure was used by Zhou et al. [122] to stabilize their LES-ANN for 3D-DHIT. While this post-processed LES-ANN is stable, it is excessively dissipative (even more than DSMAG) and substantially overpredicts the energy decay rate. LES-CNN, which is stable without any post-processing and accounts for both diffusion and backscattering, has the closest agreement with FDNS in terms of the decay rate. It should be pointed out that it is possible that increasing the number of training samples for the ANN also leads to a more accurate and perhaps a stable LES-ANN; however, as mentioned before, the focus of this work is on LES-CNN and a comprehensive investigation of LES-ANN is beyond the scope of this paper. We present the results with LES-DSMAG as a baseline and present the results with the recently published LES-ANN to give the readers a better view of the state-of-the-art in this field.

Figure 6: Short-term prediction accuracy of LES models in a posteriori tests at Re = 32000. Predictions start at t = 50τ in the 5 testing sets. For each model, curves show the evolution of the relative L2-norm error, error_{L2}(t) = ‖ω̄_LES − ω̄_FDNS‖_2 / ‖ω̄_FDNS‖_2, averaged over the 5 integrations. The LES-CNN has the highest accuracy and outperforms LES-DSMAG. The large error in the post-processed LES-ANN and in LES-SMAG is due to excessive dissipation (see Fig. 7).

To examine the accuracy of LES-CNN in short-term forecasting, Fig. 6 presents the relative L2-norm error in the prediction of ω̄ averaged over 5 random initial conditions in the testing set from t = 50τ to 200τ. The results show that LES-CNN has the highest accuracy, outperforming the next best model, LES-DSMAG. The post-processed LES-ANN and LES-SMAG have substantially higher errors, which, as the next analysis shows, is due to their excessive dissipation. To further evaluate the short-term accuracy of these LES models, Fig. 7 shows an example of ω̄(x, y) at t = 100τ, 150τ, and 200τ predicted from an initial condition at t = 50τ in the testing set. Evidently, LES-CNN is capable of predicting both small- and large-scale structures well and outperforms LES-DSMAG, which, while capturing most of the large-scale structures well, misses many of the small-scale structures. The post-processed LES-ANN and LES-SMAG are too diffusive and miss most small-scale structures, substantially underpredicting the magnitude of ω̄, especially at later times.

Figure 7: Examples of the vorticity fields at t = 100τ, 150τ, and 200τ from one of the testing sets at Re = 32000. ω̄ from FDNS is shown in the first row (used as the "truth" for the LES). Rows 2-5 show ω̄ predicted from t = 50τ using 4 a posteriori LES models. The LES-CNN captures the patterns and magnitudes of both large- and small-scale structures well, except at the latest time at t = 200τ. While LES-DSMAG predicts most of the large-scale structures and some of the small-scale structures well, particularly at the earlier times, its overall accuracy is lower than that of LES-CNN (also see Fig. 6). The post-processed LES-ANN has a reasonably good performance at t = 100τ, but at later times, this model and the LES-SMAG model are too diffusive such that the magnitude of the vorticity field is underpredicted and small-scale structures are missing.

The above analysis shows that the superior accuracy of the CNN-based SGS model in a priori tests translates to high accuracy in short-term forecasts with LES-CNN in a posteriori tests. Next, we examine the accuracy of these a posteriori LES models in reproducing the statistics of the turbulent flow, which is an important test for the applicability of these models [63]. Figure 8 shows the TKE spectrum and probability density function (PDF) of vorticity at t = 200τ from the 5 simulations in the testing sets. Among the LES models, LES-CNN has the best performance: its TKE spectrum matches that of the FDNS across wavenumbers and its PDF matches that of the FDNS, even at the end of the tails. The next best-performing model is LES-DSMAG, whose TKE spectrum overall agrees with FDNS, although this model is more diffusive than LES-CNN. The excessive diffusion is more noticeable in the PDF of the vorticity field: while the PDF of LES-DSMAG matches the bulk of the FDNS' PDF, there are large deviations at the tails. The stronger dissipation of the post-processed LES-ANN and of LES-SMAG leads to TKE spectra that quickly curl down as k increases and PDFs that substantially deviate from the FDNS' PDF at the tails (for LES-SMAG, even in the bulk). Just to further demonstrate the importance of capturing backscattering in the outstanding performance of LES-CNN in matching the FDNS' spectrum and PDF, Fig. 8 also presents results from a post-processed LES-CNN with Eq. (17) (i.e., backscattering removed), showing that the model becomes excessively diffusive (with performance comparable to that of LES-DSMAG).

Figure 8: The TKE spectrum Ê(k) and probability density function (PDF) of vorticity at t = 200τ from a posteriori tests at Re = 32000. Results are from independent runs in the 5 testing sets. For Ê(k), the spectrum from each run is calculated and then averaged. For the PDF, data from all 5 runs are combined and the PDF is calculated using a kernel estimator [109]. For both the TKE spectrum and PDF, the LES-CNN has the best performance, followed by LES-DSMAG. Results from post-processed LES-CNN with Eq. (17) are shown just to demonstrate the importance of capturing backscattering for the excellent performance of LES-CNN in reproducing the TKE spectrum of FDNS and the tails of the FDNS' PDF. The post-processed LES-ANN with Eq. (17) and LES-SMAG are too diffusive, which shows in both the TKE spectrum and the PDF.

The a posteriori results show the advantages of the CNN-based data-driven SGS model, which provides a stable LES model while capturing backscattering, and yields superior performance for both forecasting short-term spatio-temporal evolution and reproducing long-term statistics of the turbulent flow.
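The LES-CNN coupling described above can be sketched as a simple loop. This is a minimal illustration, not the paper's code: `cnn` stands for the trained network, `step_les` for the solver of Eqs. (3a)-(3b), `sigma_*` for the training-set standard deviations, and the optional `transfer` argument (the inter-scale transfer T of Eq. (16), whose exact form is not reproduced here) enables Eq. (17)-style clipping for comparison:

```python
import numpy as np

def les_cnn_step(psi, omega, cnn, step_les,
                 sigma_psi, sigma_omega, sigma_pi, transfer=None):
    """One coupled LES-CNN time step (illustrative sketch).

    The resolved fields are normalized by their standard deviations, the
    trained CNN predicts the SGS term, the prediction is de-normalized by
    sigma_pi, and the LES solver advances the resolved flow one step.
    """
    inputs = np.stack([psi / sigma_psi, omega / sigma_omega])
    pi = cnn(inputs) * sigma_pi           # de-normalized SGS forcing Pi
    if transfer is not None:              # optional Eq. (17)-style clipping:
        T = transfer(psi, omega, pi)      # inter-scale transfer (Eq. (16))
        pi = np.where(T < 0.0, 0.0, pi)   # zero Pi wherever T < 0 (no backscatter)
    return step_les(psi, omega, pi)
```

Calling this function in a time loop with `transfer=None` corresponds to the unmodified LES-CNN; passing a transfer function reproduces the backscatter-removing post-processing used to stabilize LES-ANN.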
Transfer learning to higher Re

So far, we have tested the data-driven SGS model and the LES-CNN on flows with the same Re as the flow from which data was collected for the training of the CNN (Re = 8000 or Re = 32000). As discussed in the Introduction, the capability to generalize beyond the training flow, in particular to extrapolate to turbulent flows with higher Re in a posteriori tests, is essential for robust, trustworthy, and practically useful LES models. Neural networks are known to have difficulty with extrapolation, and in our recent work with multi-scale Lorenz 96 equations and forced 1D Burgers' turbulence, we found that data-driven SGS models do not generalize well to more chaotic systems or flows with 10× higher Re, leading to inaccurate predictions in a posteriori (online) tests [16, 99].

Figure 9: Transfer learning to higher Re. The TKE spectrum Ê(k) at t = 200τ from a posteriori tests at three different Re. Results are from independent runs in the 5 testing sets. For Ê(k), the spectrum from each run is calculated and then averaged. The superscript indicates the Re on which the CNN is trained with n_tr = 50000 samples. TL (transfer learned) means that the CNN has been re-trained with n_tr^TL = 500 samples (1% of n_tr) from the Re on which the LES-CNN is tested (indicated in the title of each panel). In each panel, the blue lines show that the LES-CNN trained and tested on the same Re is accurate and its TKE spectrum agrees with that of the FDNS. However, the red lines in the two panels on the right show that the LES-CNN trained on Re = 8000 does not perform well at 4× or 8× higher Re, with the TKE spectra of the simulated flow substantially deviating from that of the FDNS at high k near k_c. The red dashed lines show that the LES-TL-CNN pre-trained on Re = 8000 and transfer learned with a small amount of data from the higher Re performs well at 4× or 8× higher Re.

Similarly, Fig. 9 shows that for the 2D-DHIT system studied here, a data-driven SGS model trained on data from Re = 8000 leads to a posteriori LES-CNN that is accurate only at Re = 8000 but not at Re = 32000 or 64000. At these higher Re, the TKE spectra deviate substantially from the spectrum of the FDNS.

In both Chattopadhyay et al. [16] and Subel et al. [99], we showed that transfer learning enables accurate generalization/extrapolation of data-driven SGS models to more chaotic systems and turbulent flows with a 10× higher Re, although the effectiveness of this approach beyond 1D and for more complex turbulent flows remained to be investigated. Transfer learning involves taking a neural network that has already been trained for a given data distribution (e.g., flow with a given Re) using a large amount of data and re-training only some of its layers (usually the deeper layers) using a small amount of data from the new data distribution (e.g., flow with a higher Re) [31, 118]. For example, Fig. 10 shows the schematic of the transfer-learned CNN used here. While similar to the original CNN (Fig. 3), there is one major difference: for transfer learning, the first 8 Conv layers use the weights already computed during training with n_tr samples from the lower Re. These weights are fixed and remain the same during the re-training. The last two Conv layers are initialized with weights computed during training with n_tr samples from the lower Re, but these two layers will be trained and their weights will be updated using n_tr^TL = n_tr/100 samples from the higher Re. The key idea of TL is that in deep neural networks, the first layers learn high-level features, while the low-level features that are specific to a particular data distribution are learned only in the deeper layers [31, 118]. Therefore, for generalization, only the deeper layers need to be re-trained, which can be done using a small amount of data from the new distribution.

Figure 10: Schematic of the CNN with transfer learning for extrapolation to higher Re. Everything is the same as the original CNN shown in Fig. 3 with one exception: here, the first 8 Conv layers (gray) use the weights already computed during training with n_tr samples from the lower Re and are fixed (not to be trained). Only the last two Conv layers (blue) are going to be trained using n_tr^TL = n_tr/100 samples from the higher Re, after these layers are initialized not randomly but using the weights computed for the lower Re.

To examine the effectiveness of transfer learning in the 2D-DHIT testcase, we take the CNN that is already trained with n_tr samples from Re = 8000 and re-train it with n_tr^TL = n_tr/100 samples from Re = 32000 or Re = 64000. Figure 9 shows that a posteriori LES with these transfer-learned CNNs (LES-TL-CNN^{Re=8000}) accurately extrapolates to 4× and 8× higher Re. In both cases, the accuracy of the transfer-learned LES-TL-CNN is as good as that of the LES-CNN trained with n_tr samples from Re = 32000 or Re = 64000. Before showing the results for accurate extrapolation to even higher Re (16×) in the next section, we point out that the number of layers to be re-trained and the number of samples used for re-training (n_tr^TL) depend on the problem and require some trial and error for the best performance. Here, fixing the first 6 layers and re-training the deeper 4 layers (with the same n_tr^TL) leads to similar LES-TL-CNN performance. The goal of transfer learning is to minimize n_tr^TL while achieving the accuracy obtained with n_tr, with the number of re-trained layers being a hyper-parameter to be tuned to achieve this goal. Substantial exploration in forced 1D Burgers' turbulence showed that the a posteriori performance of LES with transfer-learned data-driven SGS models mainly depends on n_tr^TL as long as more than one layer is re-trained [99].

Transfer learning to higher Re and higher LES numerical resolution

One often-cited disadvantage of using CNNs (compared to local ANNs) for data-driven SGS modeling is dependence on the specific LES resolution for which the CNN has been trained, limiting the use of the LES-CNN on a different grid resolution (and the use of transfer learning to extrapolate to even higher Re, for which a higher LES resolution might be needed). Here, we show that this issue can be easily addressed by adding pooling (encoder) and upsampling (decoder) layers to the transfer-learning architecture.

For example, to use a CNN-based data-driven SGS model trained on data from Re = 8000 and resolution 256 × 256 to conduct a posteriori LES-TL-CNN integrations at Re = 64000 or Re = 128000 with resolution N_LES = 512, we can use the encoder-decoder architecture shown in Fig. 11. Here, the number of convolutional layers is the same as before, plus an additional layer before the encoder. The encoder, with a pooling layer of stride two, transforms the output of this first layer from the input size (512 × 512) down to 256 × 256, matching the resolution for which the next 8 Conv layers were already trained with n_tr samples from the lower Re (at resolution 256 × 256). These weights are kept fixed and these layers are not going to be trained. A decoder transforms the output of the last of these layers from the size 256 × 256 to the size of the first of the last two layers, which is 512 × 512; these two layers are initialized with the weights computed during training with n_tr samples from the lower Re (and lower resolution). Only these two layers and the very first layer will be trained, and their weights are updated using n_tr^TL = n_tr/100 samples from the higher Re and higher resolution. Here we use a factor of two increase in the resolution in each direction just as an example, and this approach can be used for other resolution changes too.

Figure 11: Schematic of the CNN with transfer learning and encoder-decoder architecture for extrapolation to higher Re and higher LES grid resolution. There are a few differences from the CNN shown in Fig. 10. Here, the input and output samples are at the higher resolution of 512 × 512 (inputs and outputs of the CNNs in Figs. 3 and 10 are at the resolution of 256 × 256). The 8 Conv layers that are already trained with n_tr samples from the lower Re and FDNS at the resolution of 256 × 256 are embedded within an encoder-decoder architecture. These 8 layers (gray) are fixed (not to be trained). The last two layers (in blue) are initialized not randomly but using the weights computed for the lower Re and lower resolution. A first layer (in blue) is added between the input and the encoder and is initialized randomly. Only these three layers are going to be trained using n_tr^TL = n_tr/100 samples from the higher Re and higher resolution (512 × 512).

Figure 12 shows, for Re = 64000 and Re = 128000, the TKE spectrum for LES-TL-CNN in comparison to that of FDNS. In this LES-TL-CNN, the numerical resolution is N_LES × N_LES = 512 × 512 and its CNN has been trained with n_tr = 50000 samples from Re = 8000 at the resolution of 256 × 256 and transfer-learned with n_tr^TL = n_tr/100 samples from Re = 64000 or Re = 128000 at the resolution of 512 × 512. The results show that transfer learning enables extrapolation to over an order-of-magnitude increase in Re (16×) and, with the encoder-decoder architecture, enables transfer between different LES resolutions. The implications of these findings, in particular for practical purposes, are discussed in the next section.

Figure 12: Transfer learning to higher Re and higher LES numerical resolution. The TKE spectrum Ê(k) at t = 200τ from a posteriori tests at two different Re. Results are from independent runs in the 5 testing sets. For Ê(k), the spectrum from each run is calculated and then averaged. The superscript indicates that the CNN has been trained with n_tr = 50000 samples from Re = 8000 at the resolution of 256 × 256 and re-trained with n_tr^TL = 500 samples (1% of n_tr) from the Re on which the LES-TL-CNN is tested (indicated in the title of each panel) at the resolution of 512 × 512. For Re = 64000, N_DNS = 2048, and for Re = 128000, N_DNS = 3072. The FDNS is at the resolution of 512 × 512. The blue lines show that the LES-TL-CNN pre-trained on Re = 8000 and transfer learned with a small amount of data from the higher Re and resolution performs well at 8× or 16× higher Re. Note that for LES-TL-CNN at both Re, here we use N_LES = 512.

Discussion and summary

Using 2D decaying turbulence as the testbed, we have examined the performance of a CNN-based, non-local, data-driven SGS model in a priori and a posteriori analyses, with training and testing done on data from flows with the same Re. We have also investigated the effectiveness of transfer learning in enabling a posteriori LES-CNNs that are trained on data from flows with low Re (and low grid resolution) to work for flows with higher Re (and higher grid resolution). In all cases, training is done on filtered DNS data, and the performance is tested in comparison with out-of-sample filtered DNS data.

As discussed in Section 4.1, a priori tests at Re = 32000 show that the trained data-driven SGS model can accurately predict the SGS forcing terms from never-seen-before snapshots of the resolved flow with correlation coefficients c (Eq. (15)) around 0.93, substantially outperforming a baseline physics-based SGS model, DSMAG. The data-driven SGS model is also found to accurately capture both forward transfer and backscattering between the resolved and unresolved scales.

To examine the connection between a priori and a posteriori performance, we have evaluated the accuracy of a priori tests (in terms of c) and the stability of a posteriori LES-CNN as the number of training samples is varied from n_tr = 500 to 50000 (Table 2). This analysis shows that while the SGS model trained with n_tr = 10000 seems accurate (with c = 0.90), the resulting LES-CNN (with this n_tr) is not stable. Increasing n_tr to 30000 and 50000 further improves c to 0.92 and 0.93, respectively, and leads to accurate and stable a posteriori LES-CNN, without any need for post-processing or additional eddy viscosity. More analysis, in which c is calculated separately for grid points experiencing only forward transfer or only backscattering, shows that at low n_tr the CNN captures backscattering with much lower accuracy compared to forward transfer, but that the difference decreases as n_tr is increased. This analysis suggests that the instabilities of a posteriori LES-CNN trained with small training sets might be due to the inability of the SGS model to correctly represent backscattering. Why learning backscattering requires more data remains to be studied in future work. This might be because backscattering is fundamentally harder to learn from data, or because backscattering is less frequent than forward transfer, or both. While we do not claim that all instabilities in a posteriori (online) tests are due to this issue and could be overcome by increasing n_tr, we believe that these findings can help future studies in understanding the reason(s) behind these instabilities and formulating rigorous solutions (see below for further discussions).

As discussed in Section 4.2, a posteriori tests at Re = 32000 with the CNN trained with n_tr = 50000 show that LES-CNN is stable and accurate. The LES-CNN outperforms LES-DSMAG and LES with the other tested SGS models in terms of both short-term forecasts and reproducing the TKE spectrum and PDF of vorticity (even at the tails). The main shortcoming of the other models is that they are too diffusive, primarily because they do not capture backscattering, due to their formulation or the post-processing steps used to make them stable.
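The shell-averaged TKE spectrum Ê(k) used in these comparisons can be computed from a vorticity snapshot along the following lines. This is a minimal NumPy sketch assuming a 2π-periodic square domain and the 2D relation |û(k)|² = |ω̂(k)|²/|k|²; the function name and binning convention are illustrative, not from the paper's code:

```python
import numpy as np

def tke_spectrum(omega):
    """Angle-averaged kinetic-energy spectrum from a 2D vorticity field on a
    doubly periodic 2*pi box, using |u_hat(k)|^2 = |omega_hat(k)|^2 / |k|^2."""
    n = omega.shape[0]
    w_hat = np.fft.fft2(omega) / n**2           # normalized Fourier coefficients
    w_hat[0, 0] = 0.0                           # drop the mean mode
    k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
    k2 = k[:, None]**2 + k[None, :]**2
    k2[0, 0] = 1.0                              # avoid division by zero at k = 0
    e_density = 0.5 * np.abs(w_hat)**2 / k2     # kinetic energy per Fourier mode
    shells = np.rint(np.sqrt(k2)).astype(int)   # bin modes into integer shells
    e_k = np.bincount(shells.ravel(), weights=e_density.ravel(), minlength=n)
    return e_k[: n // 2]                        # shell-summed spectrum E(k)
```

Averaging the output of this function over independent runs gives curves directly comparable to the Ê(k) panels of Figs. 8, 9, and 12.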
The CNN-based SGS model learns both forward transfer and backscattering non-parametrically from data, and as mentioned above, once the latter is accurately captured with enough training samples, this SGS model leads to an accurate and stable a posteriori LES-CNN.

The analysis presented in Section 4.3 shows that a data-driven SGS model trained at Re = 8000 does not lead to accurate a posteriori LES-CNN solutions (in terms of TKE spectra) at higher Re, e.g., at Re = 32000 or Re = 64000. However, we show that transfer learning largely solves this problem and enables the LES-CNN trained for a flow at low Re to provide accurate and stable solutions for flows with higher Re, while requiring only a small amount of data from the flow at higher Re. The data-driven SGS model can even be coupled with LES solvers that use higher grid resolutions by adding an encoder-decoder architecture to the transfer-learned CNN (Section 4.4). For example, we show that a CNN trained with n_tr = 50000 samples from Re = 8000 (at filtered resolution 256 × 256) leads to stable and accurate a posteriori LES-CNN for flows with Re = 128000 and N_LES = 512 once 2 out of the 10 convolution layers of the CNN are re-trained with only n_tr^TL = n_tr/100 = 500 samples from Re = 128000. To the best of our knowledge, this is the first application of transfer learning to building generalizable data-driven SGS models beyond 1D turbulence (the 1D results were presented in our recent work [99]).

In summary, in a canonical 2D turbulent flow, we present promising results showing that CNNs and transfer learning can be used together to build non-local data-driven SGS models that lead to accurate, stable, and generalizable LES models. The generalization capability provided by transfer learning is key to making such data-driven SGS models practically useful. This is because training a base CNN model with a large training set of high-fidelity data from low Re and then requiring only a small amount of high-fidelity data from the higher Re for re-training is highly desirable for turbulence modeling, given the sharp increase in the computational cost of high-fidelity simulations such as DNS at higher Re. It should also be highlighted that because transfer learning only requires a small amount of data and re-training of only a few layers, its training process is fast and has a low computational cost; thus, it can be conducted on the fly, for example when dealing with non-stationary systems. Moreover, the ability to also transfer between different LES resolutions further broadens the applicability of non-local SGS models. While not examined here, it is also possible that transfer learning provides generalization beyond Re and grid resolution, for example between canonical fluid systems and fluid flows with more complex geometries. Such applications should be explored in future work.

Beyond the obvious need to study the performance of the CNN-based SGS models and transfer learning in more complex turbulent flows (e.g., 3D, wall turbulence, stratified), there are a number of avenues to pursue in order to further expand and improve the methodology.
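The transfer-learning recipe recapped above (keep the early layers frozen, update only the deeper ones, and wrap the pre-trained core in a pooling encoder and an upsampling decoder when the LES grid changes) can be sketched framework-free. This is an illustrative NumPy sketch under stated assumptions: the layer names, the plain-SGD update, and the identity-style layer callables are hypothetical stand-ins, not the paper's implementation:

```python
import numpy as np

def sgd_retrain_step(weights, grads, lr=1e-3, trainable=("conv9", "conv10")):
    """Transfer-learning update: only the deeper (trainable) layers change;
    all other layers keep the weights learned at the lower Re."""
    return {name: (w - lr * grads[name]) if name in trainable else w
            for name, w in weights.items()}

def encode(x):
    """Stride-2 average pooling, e.g., (512, 512) -> (256, 256)."""
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])

def decode(x):
    """Nearest-neighbor upsampling, e.g., (256, 256) -> (512, 512)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def tl_cnn_forward(x_hi, first_layer, frozen_core, last_layers):
    """Fig. 11-style forward pass: a new first layer at the high resolution,
    the frozen pre-trained core at the low resolution, and a re-trained head
    back at the high resolution."""
    return last_layers(decode(frozen_core(encode(first_layer(x_hi)))))
```

In a real setting `frozen_core` would be the 8 pre-trained Conv layers and `first_layer`/`last_layers` the three randomly or lower-Re initialized layers that are re-trained with the small n_tr^TL dataset; the point of the sketch is only the control flow of freezing and resolution adaptation.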
The number of training samples might potentially be reduced, without loss of accuracy or stability, using data augmentation, e.g., through pre-processing the training data by exploiting the symmetries of the flow [68, 99], and/or using physics-informed ML [37]. Examples of the latter include adding components (such as capsules [15] and transformers [14]) that better preserve spatial correlations in the CNN, or imposing physical constraints in the loss function [e.g., 37, 111]. Establishing a connection between accuracy in a priori tests and stability in a posteriori tests would also be substantially helpful. Note that in this work (and in most other SGS modeling studies), an "offline training" strategy is used: the SGS model is first trained using snapshots of the resolved flow as inputs and snapshots of the SGS term as outputs, and then this trained data-driven model is coupled with the coarse-resolution LES solver. At least some of the issues related to stability could potentially be resolved, and even the scaling with the size of the training set could be improved, by using an "online training" strategy, which involves training the data-driven model to find the best SGS term that evolves the solution of the LES closest to that of the DNS. Sirignano et al. [94] have recently presented an exciting and promising framework for such an approach. Exploring data-driven SGS models that account for non-Markovian effects arising from coarse-graining, as suggested by the Mori-Zwanzig formalism [17, 48, 71, 110], is another direction to pursue in future work. Finally, interpreting the CNNs that provide accurate SGS models, such as the one trained here, can lead to insight into the SGS physics and possibly even better data-driven and/or physics-based models. While interpreting neural networks is currently challenging, using them along with data-driven equation-discovery methods might provide a stepping stone, as done, for example, for ocean mesoscale eddies in pioneering work by Zanna and Bolton [120].

Acknowledgments

We thank Romit Maulik, Tapio Schneider, and Laure Zanna for insightful discussions about data-driven SGS modeling, and Karan Jakhar, Rambod Mojgani, and Ebrahim Nabizadeh for helpful comments on the manuscript. This work was supported by an award from the ONR Young Investigator Program (N00014-20-1-2722) and a grant from the NSF CSSI program (OAC-2005123) to P.H. Computational resources were provided by NSF XSEDE (allocation ATM170020) to use the Bridges GPU and Comet GPU clusters, and by the Rice University Center for Research Computing. The codes for the CNN and the CNN with transfer learning are publicly available at https://github.com/envfluids/2D-DDP.

A Initial condition for DNS
Following previous studies, we choose the initial conditions of DNS to have the same energy spec-trum but randomly different vorticity fields [55, 56, 67]. The initial energy spectrum is given by[67] ˆ E ( k ) = Ak e − ( k/k p ) , (18)where the amplitude is A = 4 k − p π , (19)and k = | k | = q k x + k y . The maximum value of the energy spectrum occurs at √ k p , where k p = 10 is used here following Ref. [55]. The given energy spectrum in turn determines themagnitude of the Fourier coefficients of vorticity: | ˆ ω ( k ) | = s kπ ˆ E ( k ) . (20)23hen the vorticity distribution in Fourier space isˆ ω ( k ) = | ˆ ω ( k ) | e i η ( k ) , (21)where η ( k ) = η ( k ) + η ( k ). η ( k ) and η ( k ) are independent random numbers from a uniformdistribution in [0 , π ] at each ( k x , k y ) when both k x , k y ≥ k x − k y plane).The values at the other quadrants are as follows: η ( k ) = − η ( k ) + η ( k ) for k x < , k y ≥ η ( k ) = − η ( k ) − η ( k ) for k x < , k y < η ( k ) = + η ( k ) − η ( k ) for k x ≥ , k y < t = 0.Figures 1(a) and 1(d) show an example of the initial ω ( x, y ) and the corresponding ˆ E ( k ),respectively. The initial vorticity is dominated by relatively large-scale structures, but small-scalestructures emerge as the flow evolves (Figs. 1(b), (c), and (d)). From t ≈ τ , the ˆ E ( k ) spectrumexhibits self-similarity and follows the Kraichnan-Batchelor-Leith (KBL) theory [3, 41, 46]. Between t = 50 τ and 200 τ , the flow decays due to the viscous dissipation, the small-scale structures fadeaway, and the large, coherent vortices merge and grow as a result of the inverse energy cascade.Following previous studies, we focus on this phase of the decaying 2D turbulence and discard thefirst 50 τ as spin-up [4, 55]. 24 eferences [1] A. Arakawa and V. R. Lamb. Computational design of the basic dynamical processes of the UCLAgeneral circulation model. General Circulation Models of the Atmosphere , 17(Supplement C):173–265,1977.[2] Y. Bar-Sinai, S. 
Hoyer, J. Hickey, and M. P. Brenner. Learning data-driven discretizations for partial differential equations. Proceedings of the National Academy of Sciences, 116(31):15344–15349, 2019.
[3] G. K. Batchelor. Computation of the energy spectrum in homogeneous two-dimensional turbulence. The Physics of Fluids, 12(12):II–233, 1969.
[4] A. Beck, D. Flad, and C.-D. Munz. Deep neural networks for data-driven LES closure models. Journal of Computational Physics, 398:108910, 2019.
[5] A. Beck and M. Kurz. A perspective on machine learning methods in turbulence modelling. arXiv preprint arXiv:2010.12226, 2020.
[6] J. Berner, F. Doblas-Reyes, T. Palmer, G. Shutts, and A. Weisheimer. Impact of a quasi-stochastic cellular automaton backscatter scheme on the systematic error and seasonal prediction skill of a global climate model. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 366(1875):2559–2577, 2008.
[7] T. Bolton and L. Zanna. Applications of deep learning to ocean data inference and subgrid parameterization. Journal of Advances in Modeling Earth Systems, 11(1):376–399, 2019.
[8] S. L. Brunton, B. R. Noack, and P. Koumoutsakos. Machine learning for fluid mechanics. Annual Review of Fluid Mechanics, 52:477–508, 2020.
[9] D. Carati, S. Ghosal, and P. Moin. On the representation of backscatter in dynamic localization models. Physics of Fluids, 7(3):606–616, 1995.
[10] G. J. Chandler and R. R. Kerswell. Invariant recurrent solutions embedded in a turbulent two-dimensional Kolmogorov flow.
Journal of Fluid Mechanics, 722:554–595, 2013.
[11] J. R. Chasnov. Simulation of the Kolmogorov inertial subrange using an improved subgrid model. Physics of Fluids A: Fluid Dynamics, 3(1):188–200, 1991.
[12] A. Chattopadhyay, P. Hassanzadeh, and S. Pasha. Predicting clustered weather patterns: A test case for applications of convolutional neural networks to spatio-temporal climate data. Scientific Reports, 10(1):1–13, 2020.
[13] A. Chattopadhyay, P. Hassanzadeh, and D. Subramanian. Data-driven predictions of a multiscale Lorenz 96 chaotic system using machine-learning methods: reservoir computing, artificial neural network, and long short-term memory network. Nonlinear Processes in Geophysics, 27(3):373–389, 2020.
[14] A. Chattopadhyay, M. Mustafa, P. Hassanzadeh, and K. Kashinath. Deep spatial transformers for autoregressive data-driven forecasting of geophysical turbulence. In Proceedings of the 10th International Conference on Climate Informatics, pages 106–112, 2020.
[15] A. Chattopadhyay, E. Nabizadeh, and P. Hassanzadeh. Analog forecasting of extreme-causing weather patterns using deep learning. Journal of Advances in Modeling Earth Systems, 12(2):e2019MS001958, 2020.
[16] A. Chattopadhyay, A. Subel, and P. Hassanzadeh. Data-driven super-parameterization using deep learning: Experimentation with multi-scale Lorenz 96 systems and transfer-learning. Journal of Advances in Modeling Earth Systems, 12(11):e2020MS002084, 2020.
[17] A. J. Chorin, O. H. Hald, and R. Kupferman. Optimal prediction and the Mori–Zwanzig representation of irreversible processes.
Proceedings of the National Academy of Sciences, 97(7):2968–2973, 2000.
[18] A. Dipankar, B. Stevens, R. Heinze, C. Moseley, G. Zängl, M. Giorgetta, and S. Brdar. Large eddy simulation using the general circulation model ICON. Journal of Advances in Modeling Earth Systems, 7(3):963–986, 2015.
[19] J. A. Domaradzki, R. W. Metcalfe, R. S. Rogallo, and J. J. Riley. Analysis of subgrid-scale eddy viscosity with use of results from direct numerical simulations. Physical Review Letters, 58(6):547, 1987.
[20] J. A. Domaradzki and E. M. Saiki. A subgrid-scale model based on the estimation of unresolved scales of turbulence. Physics of Fluids, 9(7):2148–2164, 1997.
[21] S. B. Driss, M. Soua, R. Kachouri, and M. Akil. A comparison study between MLP and convolutional neural network models for character recognition. In Real-Time Image and Video Processing 2017, volume 10223. International Society for Optics and Photonics, 2017.
[22] O. R. Dunbar, A. Garbuno-Inigo, T. Schneider, and A. M. Stuart. Calibration and uncertainty quantification of convective parameters in an idealized GCM. arXiv preprint arXiv:2012.13262, 2020.
[23] K. Duraisamy. Machine Learning-augmented Reynolds-averaged and large eddy simulation models of turbulence. arXiv preprint arXiv:2009.10675, 2020.
[24] K. Duraisamy, G. Iaccarino, and H. Xiao. Turbulence modeling in the age of data. Annual Review of Fluid Mechanics, 51:357–377, 2019.
[25] S. Formentin, M. Mazzoleni, M. Scandella, and F. Previdi. Nonlinear system identification via data augmentation.
Systems & Control Letters, 128:56–63, 2019.
[26] R. O. Fox. Large-eddy-simulation tools for multiphase flows. Annual Review of Fluid Mechanics, 44:47–76, 2012.
[27] B. Fox-Kemper and D. Menemenlis. Can large eddy simulation techniques improve mesoscale rich ocean models? Ocean Modeling in an Eddying Regime, 177:319–337, 2008.
[28] H. Frezat, G. Balarac, J. L. Sommer, R. Fablet, and R. Lguensat. Physical invariance in neural networks for subgrid-scale scalar flux modeling. arXiv preprint arXiv:2010.04663, 2020.
[29] M. Gamahara and Y. Hattori. Searching for turbulence models by artificial neural network. Physical Review Fluids, 2(5):054604, 2017.
[30] M. Germano, U. Piomelli, P. Moin, and W. H. Cabot. A dynamic subgrid-scale eddy viscosity model. Physics of Fluids A: Fluid Dynamics, 3(7):1760–1765, 1991.
[31] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio. Deep Learning, volume 1. MIT Press, Cambridge, 2016.
[32] I. Grooms, Y. Lee, and A. J. Majda. Numerical schemes for stochastic backscatter in the inverse cascade of quasigeostrophic turbulence. Multiscale Modeling & Simulation, 13(3):1001–1021, 2015.
[33] H. T. Hewitt, M. Roberts, P. Mathiot, A. Biastoch, E. Blockley, E. P. Chassignet, B. Fox-Kemper, P. Hyder, D. P. Marshall, E. Popova, et al. Resolving and parameterising the ocean mesoscale in earth system models. Current Climate Change Reports, pages 1–16, 2020.
[34] M. F. Jansen and I. M. Held. Parameterizing subgrid-scale eddy effects using energetically consistent backscatter.
Ocean Modelling, 80:36–48, 2014.
[35] M. F. Jansen, I. M. Held, A. Adcroft, and R. Hallberg. Energy budget-based backscatter in an eddy permitting primitive equation model. Ocean Modelling, 94:15–26, 2015.
[36] M. L. Kaandorp and R. P. Dwight. Data-driven modelling of the Reynolds stress tensor using random forests with invariance. Computers & Fluids, page 104497, 2020.
[37] K. Kashinath, M. Mustafa, A. Albert, J.-L. Wu, C. Jiang, S. Esmaeilzadeh, K. Azizzadenesheli, R. Wang, A. Chattopadhyay, A. Singh, A. Manepalli, D. Chirila, R. Yu, R. Walters, B. White, H. Xiao, H. A. Tchelepi, P. Marcus, A. Anandkumar, P. Hassanzadeh, and Prabhat. Physics-informed machine learning: case studies for weather and climate modelling. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 379(2194):20200093, 2021.
[38] R. M. Kerr, J. A. Domaradzki, and G. Barbier. Small-scale properties of nonlinear interactions and subgrid-scale energy transfer in isotropic turbulence. Physics of Fluids, 8(1):197–208, 1996.
[39] S. Khani and M. L. Waite. Backscatter in stratified turbulence. European Journal of Mechanics-B/Fluids, 60:1–12, 2016.
[40] B. Knaepen and P. Moin. Large-eddy simulation of conductive flows at low magnetic Reynolds number. Physics of Fluids, 16(5):1255–1261, 2004.
[41] R. H. Kraichnan. Inertial ranges in two-dimensional turbulence. The Physics of Fluids, 10(7):1417–1423, 1967.
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In
Advances in neural information processing systems, pages 1097–1105, 2012.
[43] D. Krueger, E. Caballero, J.-H. Jacobsen, A. Zhang, J. Binas, R. L. Priol, and A. Courville. Out-of-distribution generalization via risk extrapolation (REx). arXiv preprint arXiv:2003.00688, 2020.
[44] M. Kurz and A. Beck. A machine learning framework for LES closure terms. arXiv preprint arXiv:2010.03030, 2020.
[45] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, and L. D. Jackel. Handwritten digit recognition with a back-propagation network. In Advances in neural information processing systems, pages 396–404, 1990.
[46] C. Leith. Atmospheric predictability and two-dimensional turbulence. Journal of the Atmospheric Sciences, 28(2):145–161, 1971.
[47] D. Leslie and G. Quarini. The application of turbulence theory to the formulation of subgrid modelling procedures. Journal of Fluid Mechanics, 91(1):65–91, 1979.
[48] Z. Li, X. Bian, X. Li, and G. E. Karniadakis. Incorporation of memory effects in coarse-grained modeling via the Mori-Zwanzig formalism. The Journal of Chemical Physics, 143(24):243128, 2015.
[49] D. K. Lilly. A proposed modification of the Germano subgrid-scale closure method. Physics of Fluids A: Fluid Dynamics, 4(3):633–635, 1992.
[50] J. Ling, A. Kurzawski, and J. Templeton. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics, 807:155–166, 2016.
[51] Y. Liu, L. Lu, L. Fang, and F. Gao. Modification of Spalart–Allmaras model with consideration of turbulence energy backscatter using velocity helicity.
Physics Letters A, 375(24):2377–2381, 2011.
[52] P. P. Mana and L. Zanna. Toward a stochastic parameterization of ocean mesoscale eddies. Ocean Modelling, 79:1–20, 2014.
[53] P. J. Mason and D. J. Thomson. Stochastic backscatter in large-eddy simulations of boundary layers. Journal of Fluid Mechanics, 242:51–78, 1992.
[54] R. Maulik, K. Fukami, N. Ramachandra, K. Fukagata, and K. Taira. Probabilistic neural networks for fluid flow surrogate modeling and data recovery. Physical Review Fluids, 5(10):104401, 2020.
[55] R. Maulik and O. San. A stable and scale-aware dynamic modeling framework for subgrid-scale parameterizations of two-dimensional turbulence. Computers & Fluids, 158:11–38, 2017.
[56] R. Maulik, O. San, A. Rasheed, and P. Vedula. Subgrid modelling for two-dimensional turbulence using neural networks. Journal of Fluid Mechanics, 858:122–144, 2019.
[57] R. Maulik, H. Sharma, S. Patel, B. Lusch, and E. Jennings. A turbulent eddy-viscosity surrogate modeling framework for Reynolds-Averaged Navier-Stokes simulations. Computers & Fluids, page 104777, 2020.
[58] C. Meneveau and J. Katz. Scale-invariance and turbulence models for large-eddy simulation. Annual Review of Fluid Mechanics, 32(1):1–32, 2000.
[59] C. Meneveau and T. S. Lund. The dynamic Smagorinsky model and scale-dependent coefficients in the viscous range of turbulence.
Physics of Fluids, 9(12):3932–3934, 1997.
[60] A. Mohan, D. Daniel, M. Chertkov, and D. Livescu. Compressed convolutional LSTM: An efficient deep learning framework to model high fidelity 3D turbulence. arXiv preprint arXiv:1903.00033, 2019.
[61] A. T. Mohan, D. Tretiak, M. Chertkov, and D. Livescu. Spatio-temporal deep learning models of 3D turbulence with physics informed diagnostics. Journal of Turbulence, pages 1–41, 2020.
[62] P. Moin and K. Mahesh. Direct numerical simulation: a tool in turbulence research. Annual Review of Fluid Mechanics, 30(1):539–578, 1998.
[63] R. D. Moser, S. W. Haering, and G. R. Yalla. Statistical properties of subgrid-scale turbulence models. Annual Review of Fluid Mechanics, 53, 2020.
[64] B. Nadiga. Stochastic vs. deterministic backscatter of potential enstrophy in geostrophic turbulence. Stochastic Physics and Climate Modeling, edited by T. Palmer and P. Williams (Cambridge University Press, Cambridge, England, 2009), 2010.
[65] G. Novati, H. L. de Laroussilhe, and P. Koumoutsakos. Automating turbulence modelling by multi-agent reinforcement learning. Nature Machine Intelligence, pages 1–10, 2021.
[66] J. O’Brien, J. Urzay, M. Ihme, P. Moin, and A. Saghafian. Subgrid-scale backscatter in reacting and inert supersonic hydrogen-air turbulent mixing layers. Journal of Fluid Mechanics, 743:554, 2014.
[67] P. Orlandi. Fluid flow phenomena: a numerical toolkit, volume 55. Springer Science & Business Media, 2012.
[68] S. Pan and K. Duraisamy. Long-time predictive modeling of nonlinear dynamical systems using neural networks.
Complexity, 2018, 2018.
[69] S. Pandey, J. Schumacher, and K. R. Sreenivasan. A perspective on machine learning in turbulent flows. Journal of Turbulence, pages 1–18, 2020.
[70] E. J. Parish and K. Duraisamy. A paradigm for data-driven predictive modeling using field inversion and machine learning. Journal of Computational Physics, 305:758–774, 2016.
[71] E. J. Parish and K. Duraisamy. A dynamic subgrid scale model for large eddy simulations based on the Mori–Zwanzig formalism. Journal of Computational Physics, 349:154–175, 2017.
[72] J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott. Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach. Physical Review Letters, 120(2):024102, 2018.
[73] J. Pathak, M. Mustafa, K. Kashinath, E. Motheau, T. Kurth, and M. Day. Using machine learning to augment coarse-grid computational fluid dynamics simulations. arXiv preprint arXiv:2010.00072, 2020.
[74] S. Pawar, S. E. Ahmed, and O. San. Interface learning in fluid dynamics: Statistical inference of closures within micro–macro-coupling models. Physics of Fluids, 32(9):091704, 2020.
[75] S. Pawar, O. San, A. Rasheed, and P. Vedula. A priori analysis on deep learning of subgrid-scale parameterizations for Kraichnan turbulence. Theoretical and Computational Fluid Dynamics, pages 1–27, 2020.
[76] C. Peyrard, F. Mamalet, and C. Garcia. A comparison between multi-layer perceptrons and convolutional neural networks for text image super-resolution. In
VISAPP (1), pages 84–91, 2015.
[77] U. Piomelli. Large-eddy simulation: achievements and challenges. Progress in Aerospace Sciences, 35(4):335–362, 1999.
[78] U. Piomelli, W. H. Cabot, P. Moin, and S. Lee. Subgrid-scale backscatter in turbulent and transitional flows. Physics of Fluids A: Fluid Dynamics, 3(7):1766–1771, 1991.
[79] U. Piomelli, T. A. Zang, C. G. Speziale, and M. Y. Hussaini. On the large-eddy simulation of transitional wall-bounded flows. Physics of Fluids A: Fluid Dynamics, 2(2):257–265, 1990.
[80] H. Pitsch. Large-eddy simulation of turbulent combustion. Annual Review of Fluid Mechanics, 38:453–482, 2006.
[81] S. B. Pope. Turbulent Flows. IOP Publishing, 2001.
[82] G. D. Portwood, B. T. Nadiga, J. A. Saenz, and D. Livescu. Interpreting neural network models of residual scalar flux. Journal of Fluid Mechanics, 907, 2021.
[83] K. G. Pressel, S. Mishra, T. Schneider, C. M. Kaul, and Z. Tan. Numerics and subgrid-scale modeling in large eddy simulations of stratocumulus clouds. Journal of Advances in Modeling Earth Systems, 9(2):1342–1365, 2017.
[84] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.
Journal of Computational Physics, 378:686–707, 2019.
[85] S. Rasp, M. S. Pritchard, and P. Gentine. Deep learning to represent subgrid processes in climate models. Proceedings of the National Academy of Sciences, 115(39):9684–9689, 2018.
[86] P. Sagaut. Large eddy simulation for incompressible flows: An introduction. Springer Science & Business Media, 2006.
[87] P. Sagaut, M. Terracol, and S. Deck. Multiscale and multiresolution approaches in turbulence - LES, DES and Hybrid RANS/LES methods: Applications and Guidelines. World Scientific, 2013.
[88] H. Salehipour and W. Peltier. Deep learning of mixing by two “atoms” of stratified turbulence. Journal of Fluid Mechanics, 861, 2019.
[89] F. Sarghini, G. De Felice, and S. Santini. Neural networks based subgrid scale modeling in large eddy simulations. Computers & Fluids, 32(1):97–108, 2003.
[90] H. Sarlak, C. Meneveau, and J. N. Sørensen. Role of subgrid-scale modeling in large eddy simulation of wind turbine wake interactions. Renewable Energy, 77:386–399, 2015.
[91] T. Schneider, S. Lan, A. Stuart, and J. Teixeira. Earth system modeling 2.0: A blueprint for models that learn from observations and targeted high-resolution simulations. Geophysical Research Letters, 44(24):12–396, 2017.
[92] V. Shinde. Proper orthogonal decomposition assisted subfilter-scale model of turbulence for large eddy simulation.
Physical Review Fluids, 5(1):014605, 2020.
[93] G. Shutts. A kinetic energy backscatter algorithm for use in ensemble prediction systems. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 131(612):3079–3102, 2005.
[94] J. Sirignano, J. F. MacArt, and J. B. Freund. DPM: A deep learning PDE augmentation method with application to large-eddy simulation. Journal of Computational Physics, 423:109811, 2020.
[95] J. Smagorinsky. General circulation experiments with the primitive equations: I. The basic experiment. Monthly Weather Review, 91(3):99–164, 1963.
[96] A. N. Souza, G. Wagner, A. Ramadhan, B. Allen, V. Churavy, J. Schloss, J. Campin, C. Hill, A. Edelman, J. Marshall, et al. Uncertainty quantification of ocean parameterizations: Application to the k-profile-parameterization for penetrative convection. Journal of Advances in Modeling Earth Systems, 12(12):e2020MS002108, 2020.
[97] R. J. Stevens, L. A. Martínez-Tossas, and C. Meneveau. Comparison of wind farm large eddy simulations using actuator disk and actuator line models with wind tunnel experiments. Renewable Energy, 116:470–478, 2018.
[98] R. Stoffer, C. M. van Leeuwen, D. Podareanu, V. Codreanu, M. A. Veerman, M. Janssens, O. K. Hartogensis, and C. C. van Heerwaarden. Development of a large-eddy simulation subgrid model based on artificial neural networks: A case study of turbulent channel flow. Geoscientific Model Development Discussions, pages 1–29, 2020.
[99] A. Subel, A. Chattopadhyay, Y. Guan, and P. Hassanzadeh. Data-driven subgrid-scale modeling of forced Burgers turbulence using deep learning with generalization to higher Reynolds numbers via transfer learning.
Physics of Fluids (in press), 2021. Available at arXiv:2012.06664.
[100] P. Tabeling. Two-dimensional turbulence: A physicist approach. Physics Reports, 362(1):1–62, 2002.
[101] S. Taghizadeh, F. D. Witherden, and S. S. Girimaji. Turbulence closure modeling with data-driven techniques: Physical compatibility and consistency considerations. arXiv preprint arXiv:2004.03031, 2020.
[102] Z. Tan, T. Schneider, J. Teixeira, and K. G. Pressel. Large-eddy simulation of subtropical cloud-topped boundary layers: 2. Cloud response to climate change. Journal of Advances in Modeling Earth Systems, 9(1):19–38, 2017.
[103] J. Thuburn, J. Kent, and N. Wood. Cascades, backscatter and conservation in numerical models of two-dimensional turbulence. Quarterly Journal of the Royal Meteorological Society, 140(679):626–638, 2014.
[104] G. K. Vallis. Atmospheric and Oceanic Fluid Dynamics. Cambridge University Press, 2017.
[105] W. T. Verkley, C. A. Severijns, and B. A. Zwaal. A maximum entropy approach to the interaction between small and large scales in two-dimensional turbulence. Quarterly Journal of the Royal Meteorological Society, 145(722):2221–2236, 2019.
[106] Z. Y. Wan, P. Vlachas, P. Koumoutsakos, and T. Sapsis. Data-assisted reduced-order modeling of extreme events in complex dynamical systems. PLOS One, 13(5):e0197704, 2018.
[107] J.-X. Wang, J.-L. Wu, and H. Xiao. Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data. Physical Review Fluids, 2(3):034603, 2017.
[108] Z. Wang, K. Luo, D. Li, J. Tan, and J. Fan. Investigations of data-driven closure for subgrid-scale stress in large-eddy simulation.
Physics of Fluids, 30(12):125101, 2018.
[109] R. R. Wilcox. Fundamentals of modern statistical methods: Substantially improving power and accuracy. Springer, 2010.
[110] J. Wouters and V. Lucarini. Multi-level dynamical systems: Connecting the Ruelle response theory and the Mori-Zwanzig approach. Journal of Statistical Physics, 151(5):850–860, 2013.
[111] J.-L. Wu, K. Kashinath, A. Albert, D. Chirila, H. Xiao, et al. Enforcing statistical constraints in generative adversarial networks for modeling chaotic dynamical systems. Journal of Computational Physics, 406:109209, 2020.
[112] J.-L. Wu, H. Xiao, and E. Paterson. Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework. Physical Review Fluids, 3(7):074602, 2018.
[113] C. Xie, J. Wang, H. Li, M. Wan, and S. Chen. Artificial neural network mixed model for large eddy simulation of compressible isotropic turbulence. Physics of Fluids, 31(8):085112, 2019.
[114] C. Xie, J. Wang, K. Li, and C. Ma. Artificial neural network approach to large-eddy simulation of compressible isotropic turbulence. Physical Review E, 99(5):053113, 2019.
[115] C. Xie, J. Wang, and E. Weinan. Modeling subgrid-scale forces by spatial artificial neural networks in large eddy simulation of turbulence. Physical Review Fluids, 5(5):054606, 2020.
[116] C. Xie, X. Xiong, and J. Wang. Artificial neural network approach for turbulence models: A local framework. arXiv preprint arXiv:2101.10528, 2021.
[117] Y. Xie, E. Franz, M. Chu, and N. Thuerey. tempoGAN: A temporally coherent, volumetric GAN for super-resolution fluid flow.
ACM Transactions on Graphics (TOG), 37(4):1–15, 2018.
[118] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in neural information processing systems, pages 3320–3328, 2014.
[119] Y. Zang, R. L. Street, and J. R. Koseff. A dynamic mixed subgrid-scale model and its application to turbulent recirculating flows. Physics of Fluids A: Fluid Dynamics, 5(12):3186–3196, 1993.
[120] L. Zanna and T. Bolton. Data-driven equation discovery of ocean mesoscale closures. Geophysical Research Letters, 47(17):e2020GL088376, 2020.
[121] Physical Review A, 43(12):7049, 1991.
[122] Z. Zhou, G. He, S. Wang, and G. Jin. Subgrid-scale model for large-eddy simulation of isotropic turbulent flows using an artificial neural network. Computers & Fluids, 195:104319, 2019.
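The initial-condition procedure of Appendix A can be sketched numerically. The snippet below is a minimal sketch, not the code used in the paper: the grid size N, k_p, and the random seed are illustrative choices, and instead of the quadrant-by-quadrant phase rules it obtains conjugate-symmetric random phases by Fourier-transforming real white noise, which equally guarantees that the sampled vorticity is real at t = 0.

```python
import numpy as np

def initial_vorticity(N=128, kp=10.0, seed=0):
    """Sample a vorticity field with energy spectrum
    E(k) = A k^4 exp(-(k/kp)^2) and random Fourier phases."""
    rng = np.random.default_rng(seed)
    # Integer wavenumbers on the doubly periodic grid.
    k1d = np.fft.fftfreq(N, d=1.0 / N)
    kx, ky = np.meshgrid(k1d, k1d, indexing="ij")
    k = np.hypot(kx, ky)
    # Amplitude A = 4 kp^{-5} / (3 sqrt(pi)) normalizes the total KE.
    A = 4.0 / (3.0 * np.sqrt(np.pi) * kp**5)
    E = A * k**4 * np.exp(-((k / kp) ** 2))
    # |omega_hat(k)| = sqrt(k E(k) / pi), Eq. (20).
    mag = np.sqrt(k * E / np.pi)
    # Hermitian random phases from the FFT of real white noise,
    # so omega_hat(-k) = conj(omega_hat(k)) holds by construction.
    noise_hat = np.fft.fft2(rng.standard_normal((N, N)))
    phase = noise_hat / np.maximum(np.abs(noise_hat), 1e-30)
    omega_hat = mag * phase
    return np.fft.ifft2(omega_hat).real

omega = initial_vorticity(N=64, kp=10.0)
```

Normalization conventions for the discrete Fourier transform are omitted here; a solver would rescale omega_hat to match its own FFT convention before time stepping.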