An Efficient Deep Learning Technique for the Navier-Stokes Equations: Application to Unsteady Wake Flow Dynamics
T. P. Miyanawala a , R. K. Jaiman a, ∗ a Department of Mechanical Engineering, National University of Singapore, Singapore 119077
Abstract
We present an efficient deep learning technique for the model reduction of the Navier-Stokes equations for unsteady flow problems. The proposed technique relies on the Convolutional Neural Network (CNN) and the stochastic gradient descent method. Of particular interest is to predict the unsteady fluid forces for different bluff body shapes at low Reynolds number. The discrete convolution process with a nonlinear rectification is employed to approximate the mapping between the bluff-body shape and the fluid forces. The deep neural network is fed by the Euclidean distance function as the input and the target data generated by the full-order Navier-Stokes computations for primitive bluff body shapes. The convolutional networks are iteratively trained using the stochastic gradient descent method with the momentum term to predict the fluid force coefficients of different geometries, and the results are compared with the full-order computations. We attempt to provide a physical analogy of the stochastic gradient method with the momentum term with the simplified form of the incompressible Navier-Stokes momentum equation. We also construct a direct relationship between the CNN-based deep learning and the Mori-Zwanzig formalism for the model reduction of a fluid dynamical system. A systematic convergence and sensitivity study is performed to identify the effective dimensions of the deep-learned CNN process such as the convolution kernel size, the number of kernels and the convolution layers. Within the error threshold, the prediction based on our deep convolutional network has a speed-up of nearly four orders of magnitude compared to the full-order results and consumes an insignificant fraction of computational resources. The proposed CNN-based approximation procedure has a profound impact on the parametric design of bluff bodies and the feedback control of separated flows.

∗ Corresponding author. Email address: [email protected] (R. K. Jaiman)

Preprint submitted to arXiv August 16, 2018

Keywords:
Deep learning, Convolutional neural networks, Distance function, Stochastic gradient descent, Navier-Stokes equations, Unsteady wake dynamics
1. Introduction
Unsteady separated flow behind a bluff body causes fluctuating drag and transverse forces on the body, which is of great significance in engineering applications. “After over a century of effort by researchers and engineers, the problem of bluff body flow remains almost entirely in the empirical, descriptive realm of knowledge,” as famously stated by [1]. Recent advances in computational and experimental techniques in the past decades have only strengthened this statement. Nonlinear unsteady flow separation and the dynamics of vortex formation in the near wake region make the bluff-body flow an extremely complex phenomenon to tackle through rigorous analytical tools. While physical experiments and computational techniques provide high-fidelity data, they are generally time-consuming and expensive for design space exploration and flow control in a practical engineering application. Furthermore, the enormous amount of generated high-fidelity data is often under-utilized, and the significant patterns identified and learned in one case are rarely reused effectively in subsequent simulations. For example, consider the flow past a stationary rounded-corner square cylinder, where the engineering objective is to determine a direct relationship between the fluid forces and the rounding angle of the square cylinder. Traditionally, we need to perform computationally expensive transient Navier-Stokes simulations for each rounding-angle value for a considerable number of time steps, albeit the vortex formation and the shedding process do not differ much between the different configurations. Here, we focus on a data-driven computing method to predict the key flow parameters of bluff-body flows via available high-fidelity data.
Of particular interest is to have an efficient alternative to the brute-force full-order model (FOM) and to obtain the fluid forces in nearly real time within an acceptable error threshold while utilizing a fraction of the computational effort. Data-driven methods using low-dimensional representations via proper orthogonal decomposition (POD) [2], the eigensystem realization algorithm (ERA) [3] and dynamic mode decomposition (DMD) [4, 5] have been used in computational fluid dynamics (CFD) for model order reduction. These methods generate physically interpretable spatial and spatio-temporal modes related to dominant features of the bluff body flow such as the shear layers and the Karman vortex street. However, these techniques may fail to capture nonlinear transients and multi-scale phenomena [6] in bluff-body flows, which motivates the need for an improved technique to extract invariant dominant features to predict the important flow dynamical parameters. In such learning algorithms, the essential idea is to construct a generalized kernel mapping function from a finite set of output data and to postulate a priori relationship between the input-output dynamics. In that regard, biologically-inspired learning (i.e., neuron-based learning) techniques have some attractive approximation properties for constructing the input-output dynamics through two main phases, namely training and prediction. In the training phase, high-fidelity data from the FOM are fed to a learning algorithm as pairs of input and output. The target of neural network learning is then to find the functional relationship between the input (bluff body geometry) and the output (wake dynamics) using the provided initial data, such that the force coefficient for a new bluff body geometry can be determined in real time without using the full-order simulation. For this process, the learning algorithm utilizes a set of weighting/functional steps, also called layers, to connect the input geometry matrix and the force coefficient.
These hidden layers, the set of input geometry matrices and the output force coefficient layer form a neural network. After the input data is fed to the neural network, it initializes the training with guessed weights for the layers between the input and output, and iteratively back-propagates the deviations and corrects the assumed weights. In [7], a modeling paradigm was proposed using field inversion and machine learning for fluid mechanics problems. However, in these traditional machine learning algorithms, neural network layers utilize matrix multiplication by considering a matrix of parameters with another parameter, and each data point in one layer is connected to each data point of the next layer via a weight function. These early supervised learning methods, also referred to as shallow learning, perform reasonably well if the system is governed by a few dominant features, but become extremely inefficient in capturing multiple local flow features such as flow separation, the shear layer, the vortex formation region, and the vortex street. Deep learning addresses the deficiency of traditional learning algorithms by building complex features from simpler nested functions via input-output relationships. For example, [8] proposed a novel Reynolds-averaged turbulence model based on deep learning. Another developing technique is machine learning based on Gaussian process (GP) regression, whereby the interpolations are governed by prior covariances as opposed to smooth curve-fitting. The GP-based regression network replaces the weight connections in a Bayesian neural network with Gaussian processes, which allows the model input-dependent correlations between multiple tasks. Such GP-based models are generally task-specific and may require sophisticated approximate Bayesian inference, which is much more computationally expensive in contrast to deep learning models.
Recently, the GP-based machine learning method was introduced for linear differential equation systems by [9] and for nonlinear partial differential equation (PDE) systems by [10]. Also, [11] introduced a multi-fidelity GP for the prediction of random fields. While neural networks employ a considerable number of highly adaptive basis functions, Gaussian processes generally rely on a multitude of fixed basis functions. Another main difference of the CNN-based technique from the GP-based regression method is that it allows a parametric learning process through highly adaptive basis functions. Neural networks have some inherent ability to discover meaningful high-dimensional data through learning multiple layers via highly adaptive basis functions [12, 13]. In particular, deep neural networks can provide an effective process for generating adaptive basis functions to develop meaningful kernel functions, which can allow discovering structure in high-dimensional data without human intervention. Owing to these attractive properties of neural networks, we consider CNN-based deep learning to develop a novel computational framework to predict the wake flow dynamics using high-fidelity computational fluid dynamics (CFD) data. The present work aims to develop an efficient model reduction technique for the Navier-Stokes equations via deep neural networks. In a deep convolutional network framework [14], the output data point of a layer is connected only to a very small portion of the input data points, and the learning system provides sparse interactions (sparse weights) and parameter sharing (tied weights) for working with inputs of variable size. Due to this local connectivity between the adjacent layers, a particular learned weight function can properly capture the essential features with much-reduced memory requirements and computing operations. While the CNN applies a discrete convolution operation, it is local in extent and performs superior to other deep learning methods.
Hence, we consider the convolutional neural network as the deep-learning technique to learn and predict the wake flow parameters. Our main contribution is to develop the feedforward CNN-based prediction model and to assess its accuracy and performance against the traditional Navier-Stokes simulation for unsteady bluff-body separated flow. Another key contribution is to establish a relationship of the deep kernel learning method with the simplified form of the incompressible Navier-Stokes equations, which essentially provides a physical justification for the success of the proposed neural networks during the functional mapping between the input and the output response. We also connect the convolution process in the deep learning procedure with the well-known Mori-Zwanzig formalism for the dimensionality reduction of a dynamical system. For a demonstration of the proposed data-driven technique, we consider a simple variation of bluff body geometry as a parametric input and perform a series of experiments to demonstrate the accuracy and the efficiency of the technique. The proposed novel model order reduction (MOR) technique based on deep learning has direct engineering applications in the design of offshore, civil and aeronautical structures.

The paper is organized as follows. We begin by reviewing the full-order model based on the variational formulation of the unsteady incompressible flow equations in Section 2. Section 3 describes the background material and the formulation of our CNN-based deep learning procedure, and covers the relationship of deep-learning-based MOR with the analytical form of the full-order Navier-Stokes system. In Section 4, we present a systematic sensitivity and performance analysis of the proposed CNN-based technique in predicting the wake dynamics of various bluff bodies of different geometries. Concluding remarks are presented in Section 5.
2. Full-Order Variational Model
For the sake of completeness, we briefly summarize the Navier-Stokes solver used in this numerical study for our high-dimensional full-order computations. We assume the fluid flow to be Newtonian, isothermal and incompressible for the present study. A Petrov-Galerkin finite-element method and semi-discrete time stepping are adopted for the full-order modeling to investigate the interaction of an incompressible viscous flow with a stationary body [15, 16]. For the spatial domain $\Omega$ and the time domain $(0, T)$, the incompressible Navier-Stokes equations for the momentum and the continuity are
$$\rho\left(\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u}\right) = \nabla\cdot\boldsymbol{\sigma} + \mathbf{b}, \qquad \nabla\cdot\mathbf{u} = 0 \quad \text{on } \Omega\times(0,T), \tag{1}$$
where $\mathbf{u} = \mathbf{u}(\mathbf{x}, t)$ represents the fluid velocity defined for each spatial point $\mathbf{x}\in\Omega$, $\mathbf{b}$ is the body force applied on the fluid, and $\boldsymbol{\sigma}$ is the Cauchy stress tensor for a Newtonian fluid, written as $\boldsymbol{\sigma} = -p\mathbf{I} + \mu\left(\nabla\mathbf{u} + (\nabla\mathbf{u})^T\right)$, where $p$ denotes the fluid pressure and $\mu$ is the dynamic viscosity of the fluid. The appropriate conditions are specified along the Dirichlet $\Gamma_g$ and Neumann $\Gamma_h$ boundaries of the fluid domain. We next present the discretization using a stabilized variational procedure with equal-order interpolations for velocity and pressure.

By means of the finite element method, the fluid spatial domain $\Omega$ is discretized into several non-overlapping finite elements $\Omega^e$, $e = 1, 2, \ldots, n_{el}$, where $n_{el}$ is the total number of elements. In this paper, we adopt a generalized-$\alpha$ method to integrate in time $t\in[t^n, t^{n+1}]$, which can be unconditionally stable as well as second-order accurate for linear problems simultaneously via a single parameter termed the spectral radius $\rho_\infty$.
With the aid of the generalized-$\alpha$ parameters $(\alpha, \alpha_m)$, the expressions employed in the variational form for the flow equation are given as [17]:
$$\mathbf{u}^{n+1}_h = \mathbf{u}^n_h + \Delta t\,\frac{\partial \mathbf{u}^n_h}{\partial t} + \gamma\Delta t\left(\frac{\partial \mathbf{u}^{n+1}_h}{\partial t} - \frac{\partial \mathbf{u}^n_h}{\partial t}\right), \tag{2}$$
$$\mathbf{u}^{n+\alpha}_h = \mathbf{u}^n_h + \alpha\left(\mathbf{u}^{n+1}_h - \mathbf{u}^n_h\right), \tag{3}$$
$$\frac{\partial \mathbf{u}^{n+\alpha_m}_h}{\partial t} = \frac{\partial \mathbf{u}^n_h}{\partial t} + \alpha_m\left(\frac{\partial \mathbf{u}^{n+1}_h}{\partial t} - \frac{\partial \mathbf{u}^n_h}{\partial t}\right), \tag{4}$$
where $\alpha_m = \frac{1}{2}\left(\frac{3 - \rho_\infty}{1 + \rho_\infty}\right)$, $\alpha = \frac{1}{1 + \rho_\infty}$ and $\gamma = \frac{1}{2} + \alpha_m - \alpha$. Let the space of the trial solutions be denoted by $\mathcal{S}^h$ and the space of test functions be $\mathcal{V}^h$. The variational form of the flow equations can be written as: find $[\mathbf{u}^{n+\alpha}_h, p^{n+1}_h]\in\mathcal{S}^h$ such that $\forall\,[\boldsymbol{\phi}^f, q]\in\mathcal{V}^h$:
$$\begin{aligned}
&\int_{\Omega^h} \rho\left(\frac{\partial \mathbf{u}^{n+\alpha_m}_h}{\partial t} + \mathbf{u}^{n+\alpha}_h\cdot\nabla\mathbf{u}^{n+\alpha}_h\right)\cdot\boldsymbol{\phi}^f\,d\Omega + \int_{\Omega^h} \boldsymbol{\sigma}^{n+\alpha}_h : \nabla\boldsymbol{\phi}^f\,d\Omega - \int_{\Omega^h} \nabla\cdot\mathbf{u}^{n+\alpha}_h\, q\,d\Omega\\
&\quad+ \sum_{e=1}^{n_{el}}\int_{\Omega^e} \tau_m\left(\rho\,\mathbf{u}^{n+\alpha}_h\cdot\nabla\boldsymbol{\phi}^f + \nabla q\right)\cdot\left(\rho\frac{\partial \mathbf{u}^{n+\alpha_m}_h}{\partial t} + \rho\,\mathbf{u}^{n+\alpha}_h\cdot\nabla\mathbf{u}^{n+\alpha}_h - \nabla\cdot\boldsymbol{\sigma}^{n+\alpha}_h - \mathbf{b}\right)d\Omega^e\\
&\quad+ \sum_{e=1}^{n_{el}}\int_{\Omega^e} \nabla\cdot\boldsymbol{\phi}^f\,\tau_c\,\rho\,\nabla\cdot\mathbf{u}^{n+\alpha}_h\,d\Omega^e = \int_{\Omega^h} \mathbf{b}\cdot\boldsymbol{\phi}^f\,d\Omega + \int_{\Gamma^h} \mathbf{h}\cdot\boldsymbol{\phi}^f\,d\Gamma,
\end{aligned} \tag{5}$$
where the 4th and 5th terms on the left-hand side represent the stabilization terms applied on each element locally. All the other integrals constitute the Galerkin terms, and they are evaluated at $t^{n+1}$. The stabilization parameters $\tau_m$ and $\tau_c$ appearing in the element-level integrals are the least-squares metrics, which are added to the fully discretized formulation [18]. Refer to [15, 16] for the detailed derivation of Eq. (5).

Linearization of the variational form is carried out by the standard Newton-Raphson technique. Let $\Delta\mathbf{u}$ and $\Delta p$ denote the increments in the velocity and pressure variables. The linearized matrix form of Eq. (5) is written as:
$$\mathbf{M}\Delta\mathbf{u} + \theta\mathbf{K}_d\Delta\mathbf{u} + \theta\mathbf{N}\Delta\mathbf{u} + \theta\mathbf{G}\Delta p = \mathbf{R}_m, \tag{6}$$
$$-(\mathbf{G}_M)^T\Delta\mathbf{u} - (\mathbf{G}_K)^T\Delta\mathbf{u} + \mathbf{C}\Delta p = \mathbf{R}_c, \tag{7}$$
where $\mathbf{M}$ is the mass matrix, $\mathbf{K}_d$ is the diffusion matrix, $\mathbf{N}$ is the convection matrix, and $\mathbf{G}$ is the pressure gradient operator.
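As a quick numerical check of the single-parameter family above, the generalized-$\alpha$ constants can be evaluated directly from $\rho_\infty$; a minimal sketch (assuming the relations quoted above; the helper name is ours):

```python
def gen_alpha_params(rho_inf):
    """Generalized-alpha parameters as functions of the spectral radius
    rho_inf (a sketch of the relations quoted in the text)."""
    alpha_m = 0.5 * (3.0 - rho_inf) / (1.0 + rho_inf)
    alpha = 1.0 / (1.0 + rho_inf)
    gamma = 0.5 + alpha_m - alpha  # second-order accuracy condition
    return alpha_m, alpha, gamma
```

For $\rho_\infty = 1$ this reduces to the midpoint rule ($\alpha_m = \alpha = \gamma = 1/2$), while $\rho_\infty = 0$ gives maximal high-frequency damping.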
$\mathbf{G}_M$, $\mathbf{G}_K$ and $\mathbf{C}$ are the contributions of the mass matrix, stiffness matrix and pressure matrix to the continuity equation, respectively. The parameter $\theta = 2\Delta t(1+\rho_\infty)/(3-\rho_\infty)$ is a scalar, whereas $\rho_\infty$ denotes the spectral radius which acts as a parameter to control the high-frequency damping [16]. $\mathbf{R}_m$ and $\mathbf{R}_c$ are the right-hand-side residual vectors in the linearized form of the momentum and continuity equations, respectively.

We solve the Navier-Stokes equations at discrete time steps to capture the transient flow characteristics, which leads to a sequence of linear systems of equations via Newton-Raphson type iterations. Using Eqs. (6) and (7), the linearized form of the Navier-Stokes equations can be written as the coupled system of equations:
$$\begin{bmatrix} \mathbf{M} + \theta(\mathbf{K}_d + \mathbf{N}) & \theta\mathbf{G} \\ -(\mathbf{G}_M^T + \mathbf{G}_K^T) & \mathbf{C} \end{bmatrix}\begin{Bmatrix} \Delta\mathbf{u} \\ \Delta p \end{Bmatrix} = \begin{Bmatrix} \mathbf{R}_m \\ \mathbf{R}_c \end{Bmatrix}. \tag{8}$$
This coupled matrix form is complicated to solve since there are two sets of pressure and velocity modes, whereby the velocity modes must be captured accurately before evaluating the pressure modes. Linear solvers are needed for the solution of the momentum and pressure equations. We first solve the symmetric matrix of the pressure projection using the Conjugate Gradient (CG) method with a diagonal preconditioner [19]. This provides a set of projection vectors for the low-pressure modes of the coupled Navier-Stokes system. Note that to solve this equation we do not need the left-hand-side matrix in explicit form; instead, we perform the required matrix-vector products. Without disturbing the projected low-pressure modes, we then use the standard Generalized Minimal Residual (GMRES) solver based on the modified Gram-Schmidt orthogonalization [20] for the velocity and pressure increments using Eq. (8) within the Newton-Raphson loop. The GMRES is an iterative Krylov-subspace method, which minimizes the residual norm over all vectors in the Krylov subspace.
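The preconditioned pressure-projection solve described above can be illustrated with a Jacobi-preconditioned conjugate gradient iteration. The following NumPy sketch shows only the algorithm class, not the authors' solver; in particular, a dense matrix stands in here for the matrix-free matrix-vector products mentioned in the text:

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner.
    Illustrative sketch; A must be symmetric positive definite."""
    x = np.zeros_like(b)
    r = b - A @ x
    Minv = 1.0 / np.diag(A)          # diagonal preconditioner
    z = Minv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        step = rz / (p @ Ap)
        x += step * p
        r -= step * Ap
        if np.linalg.norm(r) < tol:  # converged
            break
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

Replacing the products `A @ p` with a user-supplied operator recovers the matrix-free variant used in the full-order solver.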
The Arnoldi recurrence generates a sequence of orthogonal vectors that span the Krylov space [20], and the solution to the least-squares system provides an approximate solution by orthogonal projection. A Krylov cycle involves an equal number of steps of the Arnoldi recurrence and matrix-vector products for the Krylov space. Each new Arnoldi vector is explicitly orthogonalized against all previous vectors, so the storage increases linearly with the iteration count until convergence and the work increases quadratically with the iteration count. In the above full-order iterative solution, the linear solution via the Krylov subspace has a robust performance and convergence rate relative to the steepest descent. However, such matrix-free Krylov methods for the full-order model require much more storage of intermediate results, and they are generally non-local. Furthermore, the formation of the finite element terms is also a computationally expensive process for element-by-element integration in the full-order analysis using Eq. (5) at each iteration, which can be replaced by an equation-free approximation via a deep neural network. We next present our data-driven method based on deep learning and the stochastic gradient descent approach.
3. Model Reduction via Convolutional Neural Network
Deep learning utilizes multiple layers of representation to learn relevant features automatically from training data via deep neural networks. These deep neural networks can be considered as a multi-iterative coarse-graining scheme, which provides a successive process to learn relevant dynamics from the data by combining low-level features with higher-level features. In this study, we consider the convolutional neural network for our deep learning technique to extract relevant features from fluid dynamics data and then predict the fluid forces. With this objective, we hypothesize that a properly trained CNN-based deep-learning technique can find the mapping between flow parameters (e.g., force coefficients, Strouhal number) and the design parameters (e.g., aspect ratio, rounding angle, angle of attack) of interest and hence can be used for predictions. The training phase of the CNN comprises the input function, the feed-forward process, and the back-propagation process. We feed the input function as a matrix of values derived from a unique mapping of the design parameter. The feed-forward process then initially assumes the features of the wake flow and performs four operations on this input matrix: (i) convolution: a linear process which extracts the local features of the flow using small matrices called kernels or filters and sliding-window operations; (ii) nonlinear rectification: adds nonlinearity to the outcome of the convolution; (iii) down-sampling (if required): reduces the memory consumption of the process at the expense of data dropout; and (iv) fully connected layers: connect all the extracted features to output the wake flow parameter. In a deep neural network, input features are mapped or transformed into hidden units to form new features to feed to the next layer.
While the terminology of a feature map is quite broad in the field of machine learning, a feature map is essentially a function which maps the input data vector to feature space and provides the encoding of features and their locations for the convolutional layers. Using the discrepancy between the FOM and the predicted result, the back-propagation process corrects the assumed convolution kernel values and the weights of the fully connected layer. By successively performing feature extraction, the networks learn to emphasize dynamically relevant features while having a diminishing effect on the irrelevant features. After a prescribed number of these iterations, the CNN-based learning is deemed to be properly trained and ready to predict the force coefficients of a new geometry set. The components of the iterative training are elaborated as follows for our CNN-based deep learning.
The CNN starts with an input function which maps the macroscopic parameter of concern with the independent variable(s). For the sake of explanation, let the input function be a matrix $D(\mathbf{x}) = \{D_{ij}\}$ of size $\mathbb{R}^{m\times n}$. It is critical to define the input function for a large enough domain with a proper refinement. However, we show in Section 4.5 that the refinement required for the CNN is much less than for the finite-element Navier-Stokes solvers. After generating the input function, the feed-forward process is applied to it to extract the dominant features of the fluid flow.

The feed-forward process applies the operations convolution, rectification, and down-sampling in sequence, first on the input matrix, and then on the output obtained from the preceding operations. We refer to each of these operations as a layer and the combination of adjacent convolution, rectification and down-sampling layers as a composite layer. We denote the outputs of the layers of the $l$th composite layer as $Y^{Cl}$, $Y^{Rl}$ and $Y^{Dl}$, respectively. The total number of composite layers, denoted by $N$, is a critical factor influencing the accuracy of the entire convolutional network. Such system variables are referred to as hyper-parameters, and their values need to be determined for optimal performance. The feed-forward process takes a set of 2D vectors as input and applies the discrete convolution operation with a number of 2D kernels. For the first convolution operation, the input is the input function $D(\mathbf{x})$. Let us denote the total number of kernels in the $l$th composite layer by $k_l$ and group all those kernels of size $\delta\times\xi$ into the tensor $K_l\in\mathbb{R}^{\delta\times\xi\times k_l}$. When properly trained, each of these kernels represents a feature of the system contributing to the variation of the unknown output parameter due to the change in the design parameter. The convolution kernel size determines the input neighborhood size which represents a particular feature.
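The distance-function input named in the abstract can be sketched concretely: each entry $D_{ij}$ is the Euclidean distance from a grid node to the nearest sampled point of the bluff-body boundary. The following is a brute-force illustration; the function and argument names are ours, not the authors' implementation:

```python
import numpy as np

def distance_input(xs, ys, boundary_pts):
    """Build the m-by-n input matrix D: the Euclidean distance from every
    grid node (xs[i], ys[j]) to the nearest sampled boundary point."""
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    D = np.full(X.shape, np.inf)
    for (bx, by) in boundary_pts:
        D = np.minimum(D, np.hypot(X - bx, Y - by))
    return D
```

For a finely sampled boundary, this approaches the true distance function; a KD-tree would replace the inner loop for large grids.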
When we apply the first layer of 2D convolution on the input matrix, i.e. $l = 1$, it outputs the 3D tensor $Y^{C1} = \{Y^{C1}_{ijk}\}$ [21]:
$$Y^{C1}_{ijk} = D_{ij} \star K_{ijk} = \sum_{b=1}^{n}\sum_{a=1}^{m} D_{ab}\, K_{(i-a+1)(j-b+1)k}, \qquad k = 1, 2, \ldots, k_1, \tag{9}$$
where the symbol $\star$ denotes the convolution operation, which allows extracting local features with some prior knowledge. Eq. (9) changes slightly if the convolution blocks are skipped over more than one element of the input matrix in any direction. The skipping lengths along the two directions of the input, termed the stride $s_l = [s_{l1}\ s_{l2}]$, is also an important hyper-parameter. Except for the first composite layer, the input to the convolution layer is a 3D tensor. The $l$th layer takes each 2D slice of the tensor $Y^{D(l-1)}$ and convolves it with all the kernels, which creates a 3D tensor $Y^{Cl}$ with a depth of $k_{(l-1)}\times k_l$. These additional convolutions increase the neighborhood size of a particular feature and may cause some errors as well.

Since convolution is a linear operation, it is required to introduce a nonlinear rectification process to capture the complex nonlinear flow features such as the bluff body wake. We employ the common rectified linear unit (reLU) for this purpose, which yields $Y^{Rl} = \{Y^{Rl}_{ijk}\}$ as:
$$Y^{Rl}_{ijk} = \max(Y^{Cl}_{ijk}, 0). \tag{10}$$
The reLU is generally observed to be a faster-learning nonlinearization process than the other commonly used $\tanh(x)$ and sigmoid $(1+e^{-x})^{-1}$ functions [22]. By generalizing the convolution for the $l$th layer, the size $v_l$ of $Y^{Cl}$ will be $v_l = \left\lceil m/\prod s_{l1}\right\rceil \times \left\lceil n/\prod s_{l2}\right\rceil \times \prod k_l$, which gets larger with smaller strides and a large number of kernels and convolution layers. A down-sampling/pooling layer between convolutions can be used to reduce the size of the CNN matrices at the cost of losing some data.
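Eqs. (9)-(10) can be sketched directly in NumPy for a single kernel with unit stride. Note that Eq. (9) is a true convolution (flipped kernel), whereas many frameworks implement cross-correlation; for symmetric kernels the two coincide. This is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def conv2d(D, K):
    """'Valid' 2D discrete convolution of input D with kernel K,
    with the kernel flipped as in Eq. (9); stride 1 for simplicity."""
    m, n = D.shape
    p, q = K.shape
    Kf = K[::-1, ::-1]  # flip for true convolution
    out = np.empty((m - p + 1, n - q + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(D[i:i + p, j:j + q] * Kf)
    return out

def relu(Y):
    """Rectifier of Eq. (10): elementwise max(Y, 0)."""
    return np.maximum(Y, 0.0)
```

Stacking `relu(conv2d(...))` over multiple kernels reproduces one composite layer (before pooling).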
The commonly used pooling techniques are the maximum, average and minimum, where the respective function (max, avg or min) is applied to small windows of the rectified output, generally with strides equal to the pooling window size. It reduces the size of the feature map and also drops some data. After the $N$ composite layers, the feed-forward process ends with an additional 1D vector called a fully connected layer, which has the size $\mathbb{R}^{\beta\times 1}$ where $\beta$ is the stacked size of $v_N$. Figure 1 summarizes a general architecture of the convolutional neural network (input layer, convolutional layer + rectifier layer, pooling layer, a further convolutional layer + rectifier layer and pooling layer, a fully connected layer, and the output layer), illustrating the alternating sequential stages between convolutional and pooling layers. While the convolution step is denoted in dashed lines, the pooling-layer window is denoted in dotted lines. Units in a convolutional layer are feature maps, and each unit is connected to local patches in the feature maps of the previous layer through a set of weights. In the figure, a convolutional layer represents two sub-layers, namely convolution and rectifier. The feature maps of the final down-sampling layer are fed into the fully connected layers for the feed-forward process. While the convolutional layer plays a role in detecting local connections of features from the previous layer, the pooling layer collapses similar features into one. The final set of predictions for the force coefficient of the feed-forward process, $C^F_{CNN}$, for a training set with $S$ inputs is given by:
$$[C^F_{CNN}]_{S\times 1} = [w]_{S\times\beta}\,[Y^N]_{\beta\times 1}, \tag{11}$$
where $w$ is the weight matrix of the fully connected layer. The values of the kernels and these weights are the adjustable parameters of the system; we denote them as a set $W$ for ease of explanation.
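The remaining two feed-forward operations can be sketched in the same minimal style: non-overlapping max pooling (stride equal to the window size, as stated above) and the fully connected read-out of Eq. (11) for a single input. Names are ours; this is an illustration, not the authors' code:

```python
import numpy as np

def max_pool(Y, w):
    """Non-overlapping max pooling with window w (stride equal to the
    window size); trailing rows/columns that do not fill a window are
    cropped for simplicity."""
    m, n = Y.shape
    Yc = Y[:m - m % w, :n - n % w]
    return Yc.reshape(m // w, w, n // w, w).max(axis=(1, 3))

def fully_connected(Y_N, weights):
    """Eq. (11) for one input: stack the final feature map into a
    beta-vector and take the weighted sum to get the predicted
    force coefficient."""
    return weights @ Y_N.ravel()
```

Average or minimum pooling follows by swapping `max` for `mean` or `min` in the axis reduction.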
$C^F_{CNN}$ is compared with the full-order result ($C^F_{FOM}$) and the weights are then adjusted to minimize the error using the back-propagation process. Being a universal black-box approximation, the above neural network with convolution layers can be effective in dealing with the local interaction and multiscale nature of the Navier-Stokes PDE system. In subsection 3.4, we provide a connection between the convolution process and the memory kernels in the Mori-Zwanzig formalism used for the model reduction of a full-order hydrodynamic system.
The role of back-propagation is to train a multi-layered neural network to learn the relevant features from any arbitrary input-output dataset. The training phase is an iterative process, which continuously minimizes the error between the predicted and target outputs. It compares the output of the feed-forward pass with the full-order result and corrects the set of weights $W$ to minimize the error. While the network is initialized with randomly chosen weights, the back-propagation process attempts to find a local minimum of the error function. Let us represent the entire feed-forward process for the $p$th input matrix $d_p$ with the function $H(d_p, W)$. We define a cost function $G$ to measure the discrepancy between the feed-forward prediction and the full-order fluid coefficient:
$$E_p = G\left(C^F_{FOM}, H(d_p, W)\right). \tag{12}$$
In this study, the cost function $G$ is the root-mean-square error ($L_2$) function. Now the target is to update the weight set $W$ to minimize the error $E_p$ using a standard gradient descent back-propagation method.

For clarity, we denote the layer number in superscripts and the iteration number in subscripts unless otherwise mentioned. For simplicity, let us denote the output of the $l$th composite layer of the feed-forward process as $Y^l$. It can be related to the previous layer output $Y^{(l-1)}$ and the weights of the $l$th layer $W^l$:
$$Y^l = F\left(W^l, Y^{(l-1)}\right), \tag{13}$$
where $F$ represents a single pass of convolution, rectification, and down-sampling. Note that $Y^0 = D$ and $Y^{(N+1)} = C^F_{CNN}$. The back-propagation process starts at this predicted value, where the error gradient of the fully connected layer can be determined first. We next calculate the error gradients with respect to the weights and the previous layer output recurrently in the backward direction. When the error gradient of the $l$th output layer, $\frac{\partial E_p}{\partial Y^l}$, is known, we can get the error gradients using the chain rule:
$$\frac{\partial E_p}{\partial W^l} = \frac{\partial F}{\partial W}\left(W^l, Y^{(l-1)}\right)\frac{\partial E_p}{\partial Y^l}, \qquad \frac{\partial E_p}{\partial Y^{(l-1)}} = \frac{\partial F}{\partial Y}\left(W^l, Y^{(l-1)}\right)\frac{\partial E_p}{\partial Y^l}, \tag{14}$$
where $\frac{\partial F}{\partial W}$ and $\frac{\partial F}{\partial Y}$ are the Jacobians of $F$ relative to $W$ and $Y$, evaluated at the $l$th layer. The successive gradients during the back-propagation can also be linked with the variational least-squares minimization. After the evaluation of all the error gradients $\frac{\partial E_p}{\partial W}$, we use the stochastic gradient descent method with momentum (SGDM) [23] to adjust the parameter set for the $T$th iteration:
$$W_T = W_{T-1} - \frac{\gamma}{S}\sum_{p=1}^{S}\frac{\partial E_p}{\partial W} + \eta\left(W_{T-1} - W_{T-2}\right), \tag{15}$$
where $\gamma > 0$ is the learning rate and $\eta\in[0,1]$ is called the momentum, the hyper-parameter which determines the contribution from the previous gradient correction. The gradient in the SGDM is an expectation, which may be approximately estimated using a small set of samples. Refer to [24] for the proof of convergence and the derivation of the SGDM method. In the next section, we provide the link between the deep kernel learning with the momentum term and the simplified analytical form of the Navier-Stokes momentum equation. We have outlined the detailed formulation of our CNN-based deep learning algorithm via stochastic gradient descent. The present approach is solely data-driven and does not rely explicitly on the underlying governing equations of the dynamical system.
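One SGDM update of Eq. (15) can be sketched as follows; this illustrates the update rule only, with variable names of our choosing:

```python
import numpy as np

def sgdm_step(W_curr, W_prev, grads, lr, momentum):
    """Eq. (15): average the per-sample gradients dE_p/dW over the
    mini-batch of size S, then add the momentum term
    eta * (W_{T-1} - W_{T-2})."""
    g = np.mean(grads, axis=0)  # (1/S) * sum_p dE_p/dW
    return W_curr - lr * g + momentum * (W_curr - W_prev)
```

Iterating this two-term recurrence damps oscillatory gradient directions while accelerating persistent ones, which is the practical benefit of the momentum term.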
Both flow physics and deep learning rely on many degrees of freedom, which interact in a nonlinear and multiscale manner. Here, we attempt to demonstrate a physical analogy for deep learning in the context of the Navier-Stokes equations for fluid flow modeling. This analogy will provide a fundamental basis and a conceptual underpinning to understand how deep learning can provide a useful reduced model for flow dynamics. The deep learning based on convolutional neural networks can be considered as a black-box mapping between two functions [9]:
$$y(\mathbf{x}) = L_\phi F(\mathbf{x}), \tag{16}$$
where $y$ and $F$ are the mappings of the input and output with the independent variable $\mathbf{x}$. The aim of the CNN process is to determine the unknown mapping $L$ by extracting its features $\phi$ using available data. Moreover, the CNN transforms this problem to determine some unknown kernels $\kappa$ in the equation:
$$L_\phi F(\mathbf{x}) = (\kappa \star F) = \int \kappa(\mathbf{x} - \mathbf{x}')\,F(\mathbf{x}')\,d\mathbf{x}', \tag{17}$$
where $\mathbf{x}'$ is a dummy variable. The task of the CNN is now to estimate these kernels (memory effect) by a learning process based on available input-output combinations obtained by full-order methods. Essentially, the CNN-based process facilitates the systematic approximation of the memory effects during the model reduction. We will justify this approximation property of deep learning by linking it with the Mori-Zwanzig formalism. Before we provide an analogy of the convolution effect and the stochastic iterative correction process, we transform the incompressible Navier-Stokes momentum equation into a simplified integral-differential form. To begin, let $\phi$ be the divergence-free velocity potential function in Eq. (1), such that $\mathbf{u} = \nabla\phi$. Substituting this for a very small volumetric fluid region $\Delta V$ gives:
$$\frac{\partial\nabla\phi}{\partial t} + (\nabla\phi\cdot\nabla)\nabla\phi = -\frac{1}{\rho}\nabla p + \nu\nabla^2\nabla\phi, \tag{18}$$
which can be reduced to the reaction-diffusion equation:
$$\frac{\partial\psi}{\partial t} - \nu\nabla^2\psi = \frac{\Delta p}{\mu}\psi, \tag{19}$$
where $\psi = -\nu\ln\phi$.
Detailed steps of the transformation of the Navier-Stokes momentum equation into the reaction-diffusion equation are presented in Appendix A. From a statistical viewpoint, Eq. (19) is a particular case of the Einstein-Kolmogorov differential equation:

∂Υ/∂t = −∂(AΥ)/∂x + ∂²(BΥ)/∂x²,  (20)

where Υ(x, t; x₀, t₀) is the concentration at spatial point x at time t of particles which started undergoing Brownian motion at time t₀ from point x₀. Hence Υ(x, t; x₀, t₀) · ΔV is the probability of finding a particle undergoing Brownian motion. The connection between the partial differential equations and the Brownian motion can be constructed by the basic differentiation of the distribution function of the underlying Gaussian stochastic process. A statistical-mechanical theory of a many-fluid-particle system can be employed to link the collective transport and the Brownian motion. The following equation, the so-called Einstein-Kolmogorov equation, relates the particle concentration with the initial concentration:

Υ(x, t; x₀, t₀) = ∫ Υ(x, t; P, θ) Υ(P, θ; x₀, t₀) dV_P,  (21)

where t₀ < θ < t and the integral is taken over the entire spatial domain. This equation leads to Eq. (20) as shown in [25]. Eq. (20) relates the stochastic process of the Brownian motion to a continuum partial differential equation on concentration.

Next, we can formulate a general solution for the reaction-diffusion equation, Eq. (19), using the Green's function. Let L be the operator L = ∂/∂t − ν∇². The differential equations for ψ(x, t) and the Green's function G(x, t; ξ, τ), for x, ξ ∈ D and t, τ ≥ 0, are:

L ψ(x, t) = (Δp/μ) ψ(x, t),  (22)

L G(x, t; ξ, τ) = δ(x − ξ) δ(t − τ),  (23)

where D is the fluid domain of interest and ξ and τ are dummy variables. This leads to the integral form of the Green's-function-based general solution for the non-homogeneous diffusion equation [26] (see Appendix B):

ψ(x, t) = ∫₀^∞ ∫_V G (Δp/μ) ψ(x, t) dV_ξ dτ + ∫_V G ψ(x, 0) dV_ξ + ν ∫₀^∞ ∮_S [G ∇_ξ ψ + ψ ∇_ξ G] · n dS dτ.  (24)

Note that the second and third terms represent the initial and boundary conditions. Considering only the first term, the Green's function is the unknown kernel analogous to the kernels of Eq. (17). Then the black-box process L_φ in Eq. (16) is analogous to the fixed-point iteration of the nonlinear functional map of Eq. (24) given by F:

ψ^{m+1}(x, t) = F(ψ^m(x, t)),  (25)

where m is the iteration number. The above steps provide a physical process to construct the direct mapping function from the Navier-Stokes momentum equation for the hydrodynamic variable of interest. This integral transformation of the variable of interest can also be realized using the deep architecture, where the weights between the successive layers are connected through a mapping function. We next present the Mori-Zwanzig formalism to establish a connection with the reduced system obtained from the Navier-Stokes equation.
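The fixed-point iteration of Eq. (25) can be illustrated with any contractive map. The scalar map below is a stand-in for the functional F; it is not derived from Eq. (24):

```python
import math

def fixed_point(F, x0, tol=1e-10, max_iter=500):
    """Iterate x_{m+1} = F(x_m) until successive iterates agree to tol."""
    x = x0
    for _ in range(max_iter):
        x_new = F(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# cos(x) is contractive near its fixed point, so the iteration converges
x_star = fixed_point(math.cos, 1.0)
```

Convergence requires the map to be a contraction near the fixed point, which is the implicit assumption behind iterating Eq. (25).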
During the coarse-grained representation via the Mori-Zwanzig formalism, the complexity of the large degrees of freedom in the full-order model is replaced by the convolution (memory) kernels. In the deep learning, the CNN process provides systematic approximations for the memory effects of the original dynamics.

3.4.2. Mori-Zwanzig formalism and convolution integral

We briefly discuss the Mori-Zwanzig formalism [27, 28] in the context of modeling memory effects during the CNN-based deep learning. This formalism provides a strategy for integrating out a subset of variables in a generic dynamical system through the coarse-graining process. Let the coarse-grained modes of the function ψ in Eq. (19) be χ such that ψ(χ, t) = g(χ), where the initial condition χ(0) = χ₀ with χ₀ ∈ L². Consider the decomposition of ψ into the slow (resolved) modes ψ̂ = ψ(χ̂, t) and the fast (unresolved) modes ψ̃ = ψ(χ̃, t). As presented in Appendix C, the Mori-Zwanzig equation for the resolved variable of interest ψ̂ can be derived as:

∂ψ̂/∂t = Θψ̂ + ∫₀ᵗ K(t₀) ψ̂(t − t₀) dt₀ + F(χ̂, t),  (26)

where the three terms on the right-hand side are the Markovian, convolution (memory) and fluctuation terms, respectively. The first term is a Markovian term given by Θψ̂ = e^{Lt} P L ψ̂, where P is a projection of ψ onto the slow variables ψ̂ and the differential operator L is given by L = (Δp/μ + ν∇²). The Markovian term depends only on the value of ψ̂ at time t. The second term depends on every value of ψ during the interval [0, t]; hence it represents the memory effect via the convolution process. The convolution (memory) kernel is given by K(t)ψ = P L e^{QLt} Q L ψ, where Q = I − P is an orthogonal subspace of P and I denotes the identity operator. The projection operators P and Q are self-adjoint and they are orthogonal to each other. The third term F(χ̂, t) = e^{QLt} Q L ψ̂ represents the fluctuation (noise) and it satisfies the orthogonal dynamics equation. In a standard stochastic differential equation, the time evolution of the resolved variable is considered as the combination of the real-time Markovian term and the fluctuation. However, the Mori-Zwanzig formalism contains the convolution kernel with memory effect, which is not correlated with the fluctuation. The CNN-based learning process is specifically trained to extract the memory-dependent dynamics via data-driven modeling. In a broader sense, for fluid dynamics, we can apply this dimensionality reduction approach to build subgrid models for turbulence via data-driven approximations. Next, we show a link between the stochastic gradient descent with the momentum and the discrete counterpart of the reaction-diffusion system.

By considering the time-discretized form of the reaction-diffusion equation, Eq. (19), at two adjacent time steps, we get:

(ψ^T − ψ^{T−1})/Δt = (Δp/μ) ψ^T + ν∇²ψ^{T−1},  (27)

(ψ^{T−1} − ψ^{T−2})/Δt = (Δp/μ) ψ^{T−1} + ν∇²ψ^{T−2}.  (28)

Subtracting the equations and rearranging the terms, we obtain:

ψ^T = ψ^{T−1} + [2Δt μν/(μ − Δp Δt)] (∇²ψ^{T−1} − ∇²ψ^{T−2}) + [2μ/(μ − Δp Δt)] (ψ^{T−1} − ψ^{T−2}).  (29)

This is analogous to the stochastic gradient descent method with momentum given in Eq. (15). The second term on the right-hand side is equivalent to the error gradient and the third term represents the momentum term. The above semi-discrete form of the simplified Navier-Stokes momentum equation provides a connection with the discrete kernel learning (i.e., black-box integrator) from the stochastic gradient with momentum. We next present the sensitivity and convergence study of the proposed CNN-based stochastic gradient method with the momentum term.
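The two-step structure of Eq. (29), a gradient-difference term plus a momentum-like term, can be sketched for a 1D periodic field. The coefficients a and b stand for the prefactors of Eq. (29); their values here are arbitrary illustrative choices, not derived from the flow parameters:

```python
import numpy as np

def laplacian_1d(psi, dx):
    """Second-order central Laplacian on a periodic 1D grid."""
    return (np.roll(psi, -1) - 2.0 * psi + np.roll(psi, 1)) / dx**2

def two_step_update(psi1, psi0, dx, a=1e-3, b=0.5):
    """psi^T = psi^{T-1} + a*(lap psi^{T-1} - lap psi^{T-2})
                         + b*(psi^{T-1} - psi^{T-2})   (cf. Eq. 29)"""
    return psi1 + a * (laplacian_1d(psi1, dx) - laplacian_1d(psi0, dx)) \
                + b * (psi1 - psi0)

x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
psi0, psi1 = np.sin(x), 1.01 * np.sin(x)
psi2 = two_step_update(psi1, psi0, dx=x[1] - x[0])
```

The structural correspondence with SGDM is direct: the Laplacian difference plays the role of the error-gradient correction and the (psi1 - psi0) term plays the role of the momentum correction.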
4. Results
In this work, for the first time, we apply a deep convolutional network to the fundamental fluid dynamics problem of the flow past bluff bodies. In particular, we quantify the variation of the force coefficients, e.g., lift (C_L) and drag (C_D), as a function of the bluff body geometry. Without loss of generality, we consider two of the most widely studied bluff body geometry variations: the aspect ratio (AR) of an elliptical body and the rounding angle (ϕ) of a square-shaped body. Using a trained deep neural net, we will examine how well the deep nets can predict unseen configurations (i.e., the generalization property of deep nets).

Figure 2: Problem definition and parameter space for the application of the deep convolutional network: (a) schematics of bluff-body geometries with relevant dimensions: aspect ratio (AR) of ellipses and rounding angle (ϕ) of squares are shown, (b) known training geometry set, (c) vortex street and time histories of the force coefficients for a representative case: circular cylinder (AR = 1 and ϕ = 90°) and (d) unknown geometries to be predicted by using the FOM data of the training set.

Figure 3 (a) shows the two-dimensional problem domain and the boundary conditions. A Dirichlet boundary condition (u = U, v = 0) is employed at the inlet and a Neumann boundary condition (∂u/∂x = 0, ∂v/∂x = 0, p = 0) is applied at the outlet. The top and bottom boundaries are applied with a symmetric boundary condition (∂u/∂y = 0, v = 0) to simulate the far-field conditions. A no-slip boundary condition is applied on the bluff body surface. The flow domain consists of a collection of non-overlapping finite elements with a body-fitted formulation with finer grid resolution in the regions of high gradients. While Fig. 2 (a) shows the bluff body shapes immersed in a uniform flow of speed U, Figs. 2 (b) and (d) illustrate the training cases and prediction cases, respectively.
Even though we define two different configurations of bluff bodies, they share a common element: the circular bluff body can be considered as an ellipse with aspect ratio AR = 1 and as a rounded square with a rounding angle ϕ = 90°. Hence, we assume that the high-fidelity data of one configuration will be useful in predicting the wake flow characteristics of the other configuration. We perform the full-order simulation over a small subset of the infinite set of problem geometries, shown in Fig. 2 (b), for a laminar flow at Re = 100 (based on the diameter D) to extract the velocity and pressure fields for 20 cycles of vortex shedding. Using the flow field data, we first calculate the time histories of the lift and drag coefficients and then the mean drag (C̄_D) and the root mean square (rms) of the lift (C_L^rms) for each training case. Figure 2 (c) shows the vortex shedding pattern, the time histories of the force coefficients and the extracted time-averaged statistics for the representative case of the circular shape. We feed these values as the output layer of the CNN, while a distance function based on the bluff body geometry is fed as the input to the CNN. The deep neural network is trained to obtain the functional relationship between the bluff body geometry and the force dynamics such that it can predict the force coefficients for any perturbed geometry of the training set. As a demonstration, we estimate the force coefficients for all geometries in the set shown in Fig. 2 (d). The fluid loading is computed by integrating the surface traction considering the first layer of elements located on the cylinder surface.

Figure 3: Full-order model based on the Navier-Stokes equations: (a) problem domain (upstream length L_u = 20D, downstream length L_d = 40D, with the boundary conditions described above) and (b) unstructured 2D mesh.

The instantaneous lift and drag force coefficients are defined as

C_L = (1 / (½ρU²D)) ∫_Γ (σ · n) · n_y dΓ,  (30)

C_D = (1 / (½ρU²D)) ∫_Γ (σ · n) · n_x dΓ,  (31)

where D is the cylinder dimension perpendicular to the flow direction and n_x and n_y are the Cartesian components of the unit normal n. The domain is discretized using an unstructured finite-element mesh shown in Fig. 3 (b). There is a boundary layer mesh surrounding the bluff body and a triangular mesh outside the boundary layer region. This mesh is obtained by the convergence studies conducted in [16, 29]. The mesh selected for the circular cylinder contains 86,638 elements. This element count slightly differs for each bluff-body geometry. Based on our previous temporal convergence studies in [16, 29], we employ Δt = 0.025 (D/U_∞).

We define an input function which represents the bluff body geometry and is independent of any other variable. We use the Euclidean distance from the bluff body boundary given by d(x, y) = {d_ij} ∈ R^{m×n} for a Cartesian XY-coordinate frame (x ∈ X, y ∈ Y) centered at the bluff body center such that:

d_ij = (min_Γ ‖r(x_i, y_j) − r(x_Γ, y_Γ)‖) b,  i = 1, 2, ..., m,  j = 1, 2, ..., n,  (32)

where r is the distance to the domain point from the bluff-body center and Γ represents the independent bluff body boundary points. The binary condition b = 0 on and inside the boundary and b = 1 otherwise. The integers m and n are the matrix dimensions in the x and y directions, which depend on the underlying fluid domain discretization. In all cases, the input discrete representation of the geometry is evaluated on a coarse uniform grid (i.e., x_i, y_j) on a rectangular domain extending 20D upstream and 40D downstream of the body, which results in a 2D matrix of size m × n with m = 201. Here (x_Γ, y_Γ) represents the (x, y) coordinates of 512 points on the bluff body boundary, which are independent of the main coarse grid. This grid resolution and the fine boundary discretization properly capture the geometry variations among different bluff bodies. The feed-forward process is then applied to the input geometry matrix to extract the dominant features of the flow. As mentioned in Section 3.2, we need to tune the algorithm by optimizing a few hyper-parameters. The next section describes the hyper-parameter design performed for this particular case.

Extreme refinement and overuse of the convolution layers may cause the CNN to overfit the training dataset and make it incapable of predicting the forces for perturbed geometries of the training dataset. However, the under-utilization of convolution will increase the prediction error. Here, we present an empirical sensitivity study to establish the hyper-parameter values for the best performance of the CNN-based learning.
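The distance-function input of Eq. (32) can be sketched directly. The example below uses a circular body; the grid extents, body radius, and helper names are illustrative choices, not the paper's exact values:

```python
import numpy as np

def distance_input(xs, ys, bx, by, inside):
    """d_ij = min over boundary points of ||r(x_i,y_j) - r(x_G,y_G)|| * b,
    with b = 0 on/inside the body and b = 1 outside (Eq. 32)."""
    X, Y = np.meshgrid(xs, ys, indexing="ij")                # (m, n) grid
    d = np.sqrt((X[..., None] - bx) ** 2
                + (Y[..., None] - by) ** 2).min(axis=-1)     # min over boundary
    return d * (~inside(X, Y))

# circular cylinder of radius R sampled at 512 boundary points
R = 0.5
theta = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
bx, by = R * np.cos(theta), R * np.sin(theta)
xs = np.linspace(-2.0, 2.0, 41)
ys = np.linspace(-2.0, 2.0, 41)
d = distance_input(xs, ys, bx, by, inside=lambda X, Y: X**2 + Y**2 <= R**2)
```

For a non-analytic body shape the `inside` test would come from a point-in-polygon check on the same 512 boundary points.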
Specifically, we address some of the general issues raised by [6] for deep learning methods in fluid flows, e.g., the number of convolution layers, the number of filters per layer, the size of the filters and the learning rate.

The main benchmark considered here is the maximum relative error of the drag coefficient prediction, which is required to be below the 5% threshold for an acceptable hyper-parameter set. We evaluate this error for the training set and also select a subset of 10 target geometries (five ellipse aspect ratios and five rounding angles), which we denote as the test set. We start with the hypothesis that the CNN process with the least overall size is optimal for the predictions, since it has the smallest fluid domain neighborhood representing a flow feature of dynamical importance. Our first trial is to increase the stride, which causes the results to deteriorate since it increases the fluid neighborhood size of a feature and generates erroneous local connectivities between farther points in the fluid domain. Hence, for all convolution layers, we use a stride of s_l = [1 1], i.e., the convolution kernel block is moved adjacently in both directions of the 2D matrix. As shown in Fig. 4 (a), the least error is observed when 50 kernels are used. When fewer kernels are used, the network misses some flow features required to predict C_D from the bluff body geometry. If we increase the number too much, the CNN introduces erroneous features, increasing the prediction error. Figure 4 (b) shows that the prediction error for the training set is minimal when 8 × 8 kernels are used, whereas the 4 × 4 kernels better predict C_D for new geometries via the generalization property of deep nets. Also, the 8 × 8 kernels overfit the training data, deviating the C_D prediction from the actual value. At a learning rate of 0.01, we obtain the best predictions, as shown in Fig. 4 (c). Smaller learning rates do not provide an acceptable accuracy within a prescribed number of iterations. If the learning rate is too large, the weights of the CNN oscillate between the same values without converging to the minimum error. Next, we conduct the error analysis with a different number of layers and kernels, and with and without a down-sampling layer, as shown in Fig. 4 (d). We find that the CNN performs best when a single composite layer with
Figure 4: Hyper-parameter analysis: maximum error variation with (a) number of kernels, (b) kernel size, (c) learning rate and (d) number of layers. The solid lines denote the error threshold of 5%. The layer configurations in (d) are: A - 1 layer, 50 kernels, w/o pooling; B - 1 layer, 50 kernels, w/ pooling; C - 2 layers, 7 kernels, w/o pooling; D - 2 layers, 50 kernels, w/o pooling; E - 2 layers, 28 kernels, w/ pooling; F - 2 layers, 50 kernels, w/ pooling.
50 kernels is used without the down-sampling layer. When we use more layers, the fluid neighborhood size representing a local feature increases, creating invalid local connectivities. Down-sampling drops many points of the feature map obtained by a kernel, which causes a much larger fluid neighborhood to be represented by a single value. This adversely affects the wake flow predictions, where the features are highly local. To summarize, we use a CNN with one convolution layer consisting of 50 kernels of size 4 × 4, followed by a ReLU rectification layer and an SGDM back-propagation with a learning rate of 0.01. Some sample data and codes used in this work are publicly available on GitHub at: https://github.com/TharinduMiyanawala/CNNforCFD .

4.4. Check for overfitting

When tuning the hyper-parameters of the CNN, we face the risk of overfitting the algorithm to the training data. However, it is possible to check the developed CNN algorithm for overfitting without running a validation check with FOM for the predictions. We first test the algorithm using the standard 1-fold exclusion test. Here we exclude one element at a time from the training set and predict it using the CNN trained by the other inputs. If the system is overfitting, the exclusion of one element causes the system to fail to reach the required accuracy. However, the full-set predictions will generally be more accurate than the 1-fold excluded predictions. We also introduce a class-wise training where we only employ the common element (the circle), one ellipse and one rounded square as the training set. If the prediction error of this test falls below the required threshold, it confirms that the algorithm is not overfitting. However, if the error of this test is higher, it does not give a conclusion on overfitting. According to Fig. 5, the relative error is below the 5% threshold for both tests, which confirms that the CNN-based model does not overfit to the training data. In the next section, we predict the force coefficients for new bluff body geometries using the designed CNN.
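The 1-fold exclusion test described above amounts to leave-one-out validation. A generic sketch follows, with a simple least-squares fit standing in for the CNN; the data, model, and helper names are illustrative only:

```python
import numpy as np

def one_fold_exclusion_errors(X, y, fit, predict):
    """Relative prediction error on each case when it is excluded from training."""
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # exclude case i
        model = fit(X[keep], y[keep])          # train on the rest
        errs.append(abs(predict(model, X[i]) - y[i]) / abs(y[i]))
    return np.array(errs)

# stand-in for the CNN: a linear least-squares fit
fit = lambda X, y: np.polyfit(X, y, deg=1)
predict = lambda coef, x: np.polyval(coef, x)

AR = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 10.0])   # illustrative inputs
cd = 1.2 + 0.1 * AR                              # illustrative targets
errs = one_fold_exclusion_errors(AR, cd, fit, predict)
```

If every excluded-case error stays below the chosen threshold, the model is not merely memorizing the training set.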
Figure 5: (a) Assessment of CNN-based C_D prediction for the training set using full-set, 1-fold exclusion and class-wise training. (b) Relative percentage error for each case, with the 5% threshold indicated.
Figure 6: Prediction of the force coefficients: (a) mean drag and (b) rms of the lift. The results are compared with the simulation results of [30] and [31].
We use the convolutional neural network designed in Section 4.3 to determine the force coefficients for the perturbed geometries shown in Fig. 2 (d). We determine the error of prediction and then compare the computational cost of the full-order model and the CNN-based approximation. We feed the CNN model with the new geometry input function generated for the bluff body configurations in Fig. 2 (d) and obtain the predictions of the force coefficients. Figure 6 compares the results of the CNN-based prediction with the full-order Navier-Stokes results. The CNN successfully predicts the force coefficients within 5% error, which clearly demonstrates the generalization of our optimized convolutional nets. A direct functional relationship between the fluid forces and the input geometric parameters is established for unsteady wake flow via the proposed deep learning method.
Table 1 describes the computational resources and the elapsed time for the full-order model when running on a multicore workstation for high-performance computing (HPC) and on a single-core personal computer (PC); the CNN-based computation is performed on the same PC.

Table 1: Summary of computational resources used, speed-up and error between FOM and CNN data. While the FOM is performed on a multi-core workstation (HPC) and a single-core personal computer (PC), the CNN-based computation is done on a PC.

                               FOM-HPC*    FOM-PC*     CNN**
No. of Processors              24          1           1
Processor Speed (GHz)          2.60        2.60        2.60
RAM (GB)                       256         16          16
Elapsed time (online) (sec)    2659.44     63548.21    69.41
Elapsed time (offline) (sec)   -           -           287.24
Online time per case (sec)     2659.44     63548.21    3.31
Total time per case (sec)      2659.44     63548.21    16.98
Speed-up factor (online)       -           -           803†, 19198††
Speed-up (offline + online)    -           -           156†, 3742††
Maximum Error                  -           -           < 5%†

* per simulation, ** for an input set of 21 cases, † relative to FOM-HPC, †† relative to FOM-PC.

While the parallel simulations utilize 24 processors and 256 GB RAM for each case, the serial PC simulations are performed on a single processor with 16 GB RAM. Note that each FOM case is a standalone and completely online computation. To obtain the force coefficients of one case, we run the FOM for 20 shedding cycles, which minimizes the numerical errors introduced by the initial transient response. Hence every simulation has to run until tU/D ≈ 250, i.e., T = 10⁴ time steps at Δt = 0.025 (D/U_∞). For a mesh of M elements, the GMRES method performs ∼(2M² + NM) floating-point operations for the matrix-vector products and additions in the N-th iteration for the N-th Krylov subspace vector. Hence, for a single time step, it requires ∼(2M²N + N²M) operations. Since N ≪ M, the total number of operations performed in the FOM solver for T time steps is ∼O(M²T). In this study, M ∼ O(10⁵) and T ∼ O(10⁴); hence the number of operations is ∼O(10¹⁴) per case. Unlike the FOM computation via the finite-element GMRES approach, the CNN-based deep learning prediction has an offline training process, after which the force predictions are obtained for the required inputs. This offline process gives a library of kernels and weights which are used for all predictions, whereas in the case of full-order simulations the high-fidelity data of one FOM simulation is generally not utilized for the next FOM simulation.
Figure 7: Comparison of the processes of extracting the force coefficients from the FOM and the CNN for the flow past a bluff body. Here σ(u, p) is the Cauchy stress tensor and C_X = [C̄_D, C_L^rms]. The convolution layer has l kernels of size p × p, where kr_i is the i-th component of the r-th layer. FC_i is the i-th element of the fully connected layer.

Figure 7 illustrates a comparison of the online processes of the FOM and the CNN. The online prediction phase of the CNN is an extremely fast process, since it requires a far smaller number of floating-point operations than the FOM. For a CNN consisting of an input matrix with M elements and l kernels of size p × p, the convolution process requires only ∼2Mp²l operations. The nonlinear rectification and the fully-connected layer mapping require a further ∼3Ml operations, which estimates the total number of operations as ∼Ml(2p² + 3) for a CNN consisting of a single convolution, rectifier, and fully-connected layer. In this study M ∼ O(10⁴), l = 50 and p = 4, which assesses the number of operations per case at ∼O(10⁷). Theoretically, the online process of the CNN is therefore ∼O(10⁶) times faster than the FOM. Due to the high efficiency of the CNN-based deep learning, the force predictions for all 21 cases are computed at once. Using the serial computation power of a PC, it gives ∼800 speed-up in comparison to the FOM simulations on an HPC which has 24 times the processing power and 16 times the RAM of the single-processor PC. Even with the offline training process, the CNN-based prediction is ∼150 times faster than the HPC-FOM. When the CNN-based deep learning computation and the FOM simulation are performed on the same PC, the CNN is ∼19198 times faster online and ∼3742 times faster including the offline training, with a total computation time per case of ∼O(10) seconds.

In summary, the CNN-based deep learning model predicts the force coefficients within 5% accuracy at up to ∼O(10³) speed-up per single new geometry, using a small fraction of the computational resources compared to its FOM counterpart. This allows the maximum utilization of the high-fidelity data obtained from the full-order methods. The CNN-based prediction has a very attractive speed-versus-accuracy trade-off for the design optimization process and the feedback control of bluff body flows. Furthermore, the CNN significantly reduces the computational resource requirement, enabling the design space exploration of wake flow problems on personal computers and devices.
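The operation-count comparison above can be reproduced arithmetically. The formulas below follow the estimates quoted in this section (∼O(M²T) for the FOM and ∼Ml(2p² + 3) for the single-layer CNN); a square 201 × 201 input is assumed for illustration, and the counts are order-of-magnitude estimates rather than measured flops:

```python
# FOM estimate: O(M^2 * T) operations for M mesh elements over T time steps
M_fom, T = 86_638, 10_000
fom_ops = M_fom ** 2 * T

# CNN online estimate: M*l*(2*p^2 + 3) operations for an M-element input,
# l kernels of size p x p (single convolution + rectifier + fully-connected)
M_cnn, l, p = 201 * 201, 50, 4
cnn_ops = M_cnn * l * (2 * p ** 2 + 3)

speedup = fom_ops / cnn_ops   # theoretical online speed-up
```

The resulting ratio is of order 10⁶, consistent with the theoretical estimate quoted above; the measured speed-ups in Table 1 are smaller because they include I/O and framework overheads.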
5. Discussion
Through our CNN-based deep learning results, we are able to identify the variation of the force coefficients with the two representative geometric perturbations: the aspect ratio of an ellipse and the rounding angle of a rounded square. In the present study, Eq. (31) links the instantaneous force coefficients with the pressure (p) and velocity (u) fields, which are governed by the Navier-Stokes equations as functions of physical and geometric parameters, e.g., Re and the cylinder geometry. In a broader sense, the full-order model provides a functional map between the geometric perturbations and the force coefficients, i.e., for the case of ellipses:

C̄_D(x) = f(AR(x)),  (33)

C_L^rms(x) = g(AR(x)).  (34)

The functions f and g comprise the force contributions from the flow features such as the shear layer, the near-wake, the vortex street, the time and space discretizations, the surface integration of Eq. (31) and the time averaging of the instantaneous forces. Each of the input-output pairs generated by the FOM analysis is independent, contains a large amount of high-fidelity data and consumes significant computational effort. While the FOM provides a great insight into the flow physics, it is generally not suitable for iterative optimization and efficient engineering design study. We have shown that the CNN-based deep kernel learning provides a functional relationship between the flow parameter and the design parameter, and that it can also capture dynamically important patterns. In this section, we further elaborate our results on the force prediction for varying aspect ratios. In particular, the variation of the lift coefficient with the aspect ratio (AR) of ellipses (Fig. 6 (b)) shows an interesting pattern: it has a maximum in the vicinity of AR = 2.5. Here we briefly examine the reasons for this behavior and investigate the capability of CNN-based deep learning to capture it.
The lift force of a bluff body is affected by many flow features. Here we discuss two such features, namely the near-wake circulation and the recirculation length, and their variations with the aspect ratio of the elliptical bluff body. According to the Kutta-Joukowski theorem for an inviscid flow around a streamlined body, the lift force per unit span can be expressed as F_L = −ρ_f U Γ, where Γ is the circulation given by the line integral ∮_S u_f · dS evaluated on a closed contour S enclosing the streamlined body. Although the viscous flow past a bluff body does not satisfy some of the underlying conditions, the theorem suggests an important relationship: the lift force is roughly proportional to the circulation of the vortical wake. Here, we observe the vortex patterns and calculate the total circulation of the wake to describe the lift variation observed due to the perturbation of the ellipse geometry. Figure 8 (a) shows the vorticity patterns of the wake of different ellipses. When the aspect ratio is changed from the circular cylinder (AR = 1.0) to AR = 10.0, the vortex pattern changes from a standard Karman street to a two-layered vortical wake pattern. This behavior is also observed for Re = 100 in the recent study by [32]. It is clear that the vortex intensity and the recirculation length differ according to the aspect ratio of an elliptical bluff body. We quantify the total circulation of the near wake by considering a closed rectangular contour enclosing the near wake. Apart from the downstream line segment, all the other line segments pass through irrotational regions of the flow. The circulation is taken positive in the anti-clockwise sense. Figure 8 (b) shows the time traces of the near-wake circulation and the lift coefficient. As expected, the circulation and the lift signals have the same frequency as the Strouhal frequency, but with a phase difference. Unlike for a streamlined body, the lift force on a bluff body also depends on the distance to the vortices in the wake. We present the circulation and the recirculation length variation with respect to the aspect ratio in Fig. 8 (c). Note that the recirculation length is calculated as the distance to the closest vortex core just after the end of its shedding from the bluff body. The near-wake circulation has a maximum around AR = 4 and decreases thereafter with AR. The predictions based on our CNN-based deep learning rely on extracting feature patterns as described here. Due to this, unlike the FOM analysis, it utilizes the full-order results of all input simulations to determine the required flow dynamic property. We further discuss the interpretation of the learned kernels in terms of the feature extraction.
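The near-wake circulation Γ = ∮ u · dl on a rectangular contour can be evaluated numerically as four line integrals. A sketch follows, with an analytic solid-body-rotation field standing in for the computed wake velocities and arbitrary contour coordinates (the paper's exact contour bounds are not reproduced here):

```python
import numpy as np

def circulation(u, v, x0, x1, y0, y1, n=400):
    """Counter-clockwise contour integral of (u, v) around a rectangle."""
    xs, ys = np.linspace(x0, x1, n), np.linspace(y0, y1, n)
    gamma = np.trapz(u(xs, np.full(n, y0)), xs)     # bottom edge, +x direction
    gamma += np.trapz(v(np.full(n, x1), ys), ys)    # right edge,  +y direction
    gamma -= np.trapz(u(xs, np.full(n, y1)), xs)    # top edge,    -x direction
    gamma -= np.trapz(v(np.full(n, x0), ys), ys)    # left edge,   -y direction
    return gamma

# solid-body rotation (vorticity = 2): circulation equals 2 * enclosed area
u = lambda x, y: -y
v = lambda x, y: x
gamma = circulation(u, v, x0=-1.0, x1=3.5, y0=-1.5, y1=1.5)
```

In the wake computation the same four integrals would be taken over the velocity field interpolated from the finite-element solution.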
Figure 8: Analysis of flow past elliptical cylinders of varying aspect ratios. Flow features affecting the lift variation with aspect ratio: (a) development of vortex shedding patterns observed for different aspect ratios, (b) time traces of near-wake circulation and lift coefficient for AR = 1.0, and (c) variation of near-wake circulation and recirculation length with AR. Note that the Γ and L_r values presented here are non-dimensionalized by UD and D, respectively.

5.2. Convolution kernels

Figure 9: Feature maps obtained by critical filters of the convolutional neural network for the flow past a circular cylinder. The maps are presented in the near-cylinder 5D × 5D fluid domain. Note that the scales are not uniform due to the different kernel weight ranges retrieved by the adaptive training process.

We have demonstrated that the CNN process is efficient and accurate for the prediction of unsteady forces acting on a bluff body due to the vortex shedding process. The success of CNN-based deep learning lies in the reasonable capturing of the flow features affecting the forces on the bluff body through the learning process. The main learning operation occurs at the adaptation of the convolution kernels using the stochastic gradient descent method. The learned discrete kernel is a small matrix of numbers, which makes it hard to visualize and interpret. However, applying the convolution operation on the input matrix with each kernel separately produces the individual feature map of that kernel. For example, Fig. 9 shows some of the critical feature maps extracted at the convolution layer for the circular cylinder. These feature maps provide the encoding of features and their locations for the convolutional layers. While the location of a feature explicitly represents where it is, the response of a feature (i.e., what it is) is encoded implicitly. Carefully designed deep networks based on the convolutional feature maps can extract the flow dynamical features. It is worth noting that the feature maps will not necessarily display commonly seen flow features such as the shear layer and the Karman vortex street. In most cases, these features are reconstructed by a nonlinear combination of many kernels.
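Producing a feature map from a single learned kernel, as described above, is a plain valid-mode 2D convolution of the input matrix. A minimal sketch follows; the random input and kernels are stand-ins for the distance-function input and the learned kernels:

```python
import numpy as np

def feature_maps(inp, kernels):
    """Valid-mode 2D cross-correlation of `inp` with each kernel separately."""
    p = kernels.shape[-1]
    # all p x p sliding windows of the input, as a zero-copy view
    windows = np.lib.stride_tricks.sliding_window_view(inp, (p, p))
    # one feature map per kernel: sum of elementwise products per window
    return np.einsum("ijab,kab->kij", windows, kernels)

rng = np.random.default_rng(0)
inp = rng.standard_normal((16, 16))       # stand-in for the distance input
kernels = rng.standard_normal((4, 3, 3))  # four 3x3 stand-in kernels
maps = feature_maps(inp, kernels)         # shape (4, 14, 14)
```

Plotting each slice of `maps` over the near-body window gives visualizations analogous to Fig. 9.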
Hence, the learning process is not limited to the learning of kernels but also includes learning the proper combination of kernels for the prediction process. Next, we investigate whether the CNN-based learning is capable of predicting the lift force variation generated by the FOM even when it is fed by different input cases.
Figure 10: Lift force prediction using different training sets. The legend states the cases used as the training set; † denotes that only ellipses are used for training. All other cases are trained together with the set of rounded-square cases ϕ = {10°, …, 60°}. The inset shows the maximum error of each prediction.

The predictions presented in Section 4.5 are generated by a small subset of input-output combinations obtained from the full-order Navier-Stokes simulations. However, the CNN-based deep learning does not operate as a traditional curve-fitting tool, whereby single-valued input-output combinations are fitted to match a predetermined general curve formula. The input to the CNN-based deep learning is not a single value but a matrix of function values derived from the single-value input. This makes the CNN capable of extracting the flow features and ultimately much more data-efficient than standard curve-fitting and regression tools.

Figure 10 shows the performance of CNN-based learning when fed with different combinations of inputs. We find that the predictions perform best when the extremes of the geometric variations (e.g., in the case of ellipses, AR = 1 and 10) are included. Furthermore, when we include the AR = 3 case, all the predictions are accurate within a 5% error margin of the FOM results. The performance of CNN-based deep learning becomes slightly better when the AR = 2 case is also added to the training set. The interesting observation is that the predictions are significantly improved when four cases from the rounding-angle problem (ϕ = 10°, …, 60°) are included in the training process. This emphasizes another advantage of CNN-based prediction: we may use the data of a similar problem to enhance the accuracy of the data-driven predictions.
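Since the network input is a field rather than a scalar, each geometry is first rasterized and mapped to a matrix of Euclidean distance-function values on a fixed grid. The brute-force sketch below is illustrative only: the `distance_input` helper, the grid size and the ellipse parameterization are our own choices, not the paper's implementation.

```python
import numpy as np

def distance_input(solid_mask, dx):
    """Brute-force Euclidean distance from every grid point to the nearest
    solid (body) point; this matrix plays the role of the CNN input."""
    H, W = solid_mask.shape
    ys, xs = np.nonzero(solid_mask)
    body = np.stack([ys, xs], axis=1) * dx      # body point coordinates
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            p = np.array([i * dx, j * dx])
            out[i, j] = np.min(np.linalg.norm(body - p, axis=1))
    return out

# Hypothetical ellipse (aspect ratio AR) rasterized on a coarse grid
AR, n = 2.0, 32
ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
mask = ((ii - n / 2) / (n / 8)) ** 2 + ((jj - n / 2) / (AR * n / 8)) ** 2 <= 1.0
phi = distance_input(mask, dx=1.0 / n)
print(phi.min(), phi.max() > 0.0)  # 0.0 True
```

The same grid and normalization must be reused for every training and prediction case, so that the convolution kernels see geometrically comparable inputs across all bluff-body shapes.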
The most significant result is that the CNN has accurately captured the maximum lift in the vicinity of AR = 2. We also treat the mean drag prediction as an extrapolation problem using two training sets: one containing the AR = 1, … ellipses and the other the AR = 2, … ellipses, each combined with the ϕ = 10°, …, 60° rounded-square cases. In both cases, the drag coefficient of ellipse geometries with aspect ratios outside the training set is predicted reasonably well. Figure 11 illustrates the predictions of these cases with an error of about 5%. It is worth noting that the interpolation approach is slightly more accurate than the extrapolation cases. Relevant codes and data for this particular example can be found at the GitHub link: https://github.com/TharinduMiyanawala/CNNforCFD.

Figure 11: Prediction of mean drag as an extrapolation problem. Extrapolation set 1: AR = 1, …, ϕ = 10°, …; extrapolation set 2: AR = 2, …, ϕ = 10°, ….

Overall, the above investigation suggests that we can achieve an improved performance by allocating the high-performance computing resources to the FOM cases at the extreme values of the input parameter. Next, we attempt to generalize the force coefficient predictions to other bluff-body shapes with sharp corners. In particular, we utilize the same trained CNN presented in Section 4.4 to predict the drag coefficient for the bluff-body geometries shown in Fig. 12, i.e., a symmetric trapezium, a regular hexagon, a regular octagon, a basic square and a rectangular geometry with an aspect ratio of 2. The force predictions and the associated errors are presented in Table 2 in ascending order of the drag coefficient. The CNN forecasts the drag coefficients with more than 92.5% accuracy. As expected, the shapes closer to the circle and the rounded square (i.e., the sharp square, octagon and hexagon) have higher accuracy than the trapezium and the rectangle geometries. These results demonstrate a broader prospect of data-driven flow dynamics predictions via trained deep nets. Using the available full-order or experimental results, one can create a common database for flow dynamic coefficients and use it to train a generalized CNN to predict the flow dynamic parameters of any complex shape for a wide range of flow conditions.
Figure 12: Vorticity fields for sharp-cornered bluff bodies: (a) trapezium, (b) regular hexagon, (c) regular octagon, (d) square and (e) rectangle with aspect ratio of 2 at tU∞/D = 245. Flow is from left to right.

Efficient prediction of force coefficients via our data-driven framework has a profound impact on offshore, civil and aeronautical engineering applications. In particular, for practical offshore structures, the hydrodynamic coefficients required for the well-known force decomposition model of [33] can be obtained efficiently from the present CNN-based model, whereas they are otherwise expensive and slow to obtain via CFD-based simulations or physical experiments.

Table 2: Generalization of drag force predictions for the sharp-cornered bluff bodies shown in Fig. 12.

Bluff body    C_D (FOM)   C_D (CNN)   Error (%)
Trapezium     1.3584      1.2597      7.20
Hexagon       1.3787      1.3741      0.33
Octagon       1.4196      1.3973      1.57
Square        1.5095      1.4291      5.30
Rectangle     1.8341      1.7186      6.29

Further extensions and demonstrations are underway for high Reynolds numbers and practical geometries.
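The error percentages in Table 2 follow from the relative difference between the FOM and CNN drag coefficients. A quick check of the tabulated values (small deviations from the printed percentages can arise from rounding of the reported coefficients):

```python
# Drag coefficients reported in Table 2 (FOM vs. CNN)
fom = {"Trapezium": 1.3584, "Hexagon": 1.3787, "Octagon": 1.4196,
       "Square": 1.5095, "Rectangle": 1.8341}
cnn = {"Trapezium": 1.2597, "Hexagon": 1.3741, "Octagon": 1.3973,
       "Square": 1.4291, "Rectangle": 1.7186}

# Relative error with respect to the full-order result
errors = {s: 100.0 * abs(fom[s] - cnn[s]) / fom[s] for s in fom}
for s, e in sorted(errors.items(), key=lambda kv: fom[kv[0]]):
    print(f"{s:<9s} {e:5.2f}%")

# "More than 92.5% accuracy" corresponds to a worst-case error below 7.5%
print(max(errors.values()) < 7.5)  # True
```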
6. Conclusions
We have presented a general and efficient data-driven model reduction method based on deep-learning convolutional neural nets and the stochastic gradient descent method with the momentum term. We successfully demonstrated the predictive capability of the proposed deep learning technique for the force coefficients of a set of bluff-body geometries. We provided a physical analogy of the stochastic gradient descent method and the iterative correction process of the CNN-based learning with the simplified form of the Navier-Stokes equation. We illustrated the connection of the convolution process in the deep neural networks with the Mori-Zwanzig formalism. During the training process, we feed this CNN-based model with the force coefficients derived for seven bluff bodies using a full-order Navier-Stokes solver. The CNN is trained to output the force coefficients for any perturbed bluff-body geometry input. We successfully tuned the neural network parameters for the best prediction performance, with a maximum error of order 10⁻² for the considered set of cases. We have shown that the CNN-based learning accurately captures the flow features related to the force predictions with relatively few input-output data combinations. The CNN-based predictions are accurate for both interpolation and extrapolation from the geometric parameter range of the training set. Finally, we demonstrated that the trained CNN can be used to predict the force coefficients of many different geometries. The efficient prediction of force coefficients using the present data-driven method has a great value for the iterative engineering design and the feedback flow control [34]. We hope that this work will help to expand the use of deep neural networks in a wider range of CFD applications.

Appendix A: Transformation of the NS momentum equation to a reaction-diffusion equation
We briefly present the transformation of the NS momentum equation into a reaction-diffusion problem by rewriting the convective and diffusion terms and eliminating the convective operator. Let φ be the velocity potential function with u = ∇φ; then from the incompressible Navier-Stokes momentum equation (Eq. (1)), we get

∂(∇φ)/∂t + (∇φ · ∇)∇φ = −(1/ρ)∇p + ν∇²(∇φ).  (A.1)

Using the vector identity ∇(a · a) = 2(a × (∇ × a) + (a · ∇)a) and ∇ × ∇φ = 0, we get (∇φ · ∇)∇φ = (1/2)∇(∇φ · ∇φ), and then substituting into Eq. (A.1), one can obtain

∇(∂φ/∂t + (1/2)∇φ · ∇φ) = ∇(−Δp/ρ + ν∇²φ),  (A.2)

which can be further simplified as

∂φ/∂t + (1/2)∇φ · ∇φ = −Δp/ρ + ν∇²φ,  (A.3)

where Δp is the pressure difference with respect to some reference pressure. By introducing the variable transformation φ = −2ν ln ψ in Eq. (A.3), we obtain

(1/ψ)∂ψ/∂t − (ν/ψ²)∇ψ · ∇ψ = Δp/(2μ) + ν∇²(ln ψ).  (A.4)

By substituting the relation ∇²(ln ψ) = −(1/ψ²)(∇ψ · ∇ψ) + (1/ψ)∇²ψ, we construct the reaction-diffusion equation in terms of the scalar function ψ of a moving fluid element from the Navier-Stokes momentum equation:

∂ψ/∂t − ν∇²ψ = (Δp/2μ)ψ.  (A.5)

The above differential form can be recast as the Liouville equation, whose characteristics are integral curves of the nonlinear ordinary differential equation. In Appendix B, we present an integral solution to the above reaction-diffusion equation for constructing the moment map.

Appendix B: An explicit integral solution of the reaction-diffusion equation
We can construct a general integral solution for the reaction-diffusion system Eq. (A.5) via the Green's function. Let L be the operator L = ∂/∂t − ν∇². The differential equations for ψ(x, t) and the Green's function G(x, t; ξ, τ), for x, ξ ∈ D and t, τ ≥ 0, are

Lψ(x, t) = (Δp/2μ) ψ(x, t),  (B.1)
LG(x, t; ξ, τ) = δ(x − ξ) δ(t − τ),  (B.2)

where D is the fluid domain of interest and ξ and τ are the dummy variables. Multiplying Eq. (B.1) by G and Eq. (B.2) by ψ, and subtracting Eq. (B.2) from Eq. (B.1), we get

G Lψ − ψ LG = G (Δp/2μ) ψ − δ(x − ξ) δ(t − τ) ψ.  (B.3)

Integrating with respect to x and t, the space-time integral form is constructed as follows:

∫₀^∞ ∫_V [G Lψ − ψ LG] dV dt = ∫₀^∞ ∫_V G (Δp/2μ) ψ dV dt − ψ(ξ, τ).  (B.4)

The left-hand side can be further simplified using Green's second identity:

∫₀^∞ ∫_V [G Lψ − ψ LG] dV dt
  = ∫_V ∫₀^∞ [G ∂ψ/∂t − ψ ∂G/∂t] dt dV − ν ∫₀^∞ ∫_V [G ∇²ψ − ψ ∇²G] dV dt
  = −∫_V G ψ(x, 0) dV − 2 ∫₀^∞ ∫_V ψ ∂G/∂t dV dt − ν ∫₀^∞ ∮_S [G ∇ψ − ψ ∇G] · n dS dt.  (B.5)

By equating Eq. (B.4) and Eq. (B.5), we obtain the explicit expression for the transformed scalar variable ψ(ξ, τ):

ψ(ξ, τ) = ∫₀^∞ ∫_V G (Δp/2μ) ψ dV dt + ∫_V G ψ(x, 0) dV + 2 ∫₀^∞ ∫_V ψ ∂G/∂t dV dt + ν ∫₀^∞ ∮_S [G ∇ψ − ψ ∇G] · n dS dt.  (B.6)

By exchanging (ξ, τ) with (x, t) and setting the third term on the right-hand side to zero, we get the general solution of the non-homogeneous diffusion equation [26]:

ψ(x, t) = ∫₀^∞ ∫_V G (Δp/2μ) ψ(ξ, τ) dV_ξ dτ + ∫_V G ψ(ξ, 0) dV_ξ + ν ∫₀^∞ ∮_S [G ∇_ξ ψ − ψ ∇_ξ G] · n dS dτ.  (B.7)

By observing the above equation, one can construct an input-output mapping for the sets of scalar functions and the Green's function.
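As a one-dimensional sanity check on this integral representation: for the homogeneous free-space case (Δp = 0, no boundaries), only the initial-condition term survives and the solution is the superposition of the initial data with the heat kernel. The `heat_kernel` helper and all parameter values below are our own illustrative choices.

```python
import numpy as np

def heat_kernel(x, t, nu):
    """Free-space Green's function of L = d/dt - nu d^2/dx^2 in one dimension."""
    return np.exp(-x ** 2 / (4.0 * nu * t)) / np.sqrt(4.0 * np.pi * nu * t)

nu, t = 0.1, 0.5
xi = np.linspace(-20.0, 20.0, 4001)   # quadrature grid for the dummy variable
dxi = xi[1] - xi[0]
psi0 = np.exp(-xi ** 2)               # arbitrary Gaussian initial condition

# Superposition integral: psi(x, t) = int G(x - xi, t) psi(xi, 0) dxi
x = np.linspace(-3.0, 3.0, 61)
psi = np.array([np.sum(heat_kernel(xp - xi, t, nu) * psi0) * dxi for xp in x])

# Exact spreading solution for a Gaussian initial condition
exact = np.exp(-x ** 2 / (1.0 + 4.0 * nu * t)) / np.sqrt(1.0 + 4.0 * nu * t)
print(np.max(np.abs(psi - exact)) < 1e-8)  # True
```

With a constant reaction coefficient the same construction holds after multiplying by the exponential growth factor of the source term, which is the structure the CNN-based mapping is intended to approximate.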
Deep neural networks can help to construct these functions hierarchically, in which higher-level feature functions can be formed as a multilayer stack of simple lower-level ones.

Appendix C: The Mori-Zwanzig projection operators
By considering the operator L = Δp/(2μ) + ν∇², the reaction-diffusion Eq. (A.5) derived from the Navier-Stokes momentum equation can be written in the Liouville form

∂ψ/∂t = Lψ,  (C.1)

whose characteristics satisfy the N-dimensional semi-discrete nonlinear ordinary differential equation. Let ψ(χ, t) = g(χ), where χ ∈ R^N denotes the modes of ψ comprising the slow (retained) degrees of freedom χ̂ and the fast (eliminated) degrees of freedom χ̃. We denote the component of ψ corresponding to the resolved modes by ψ̂(t) = ψ(χ̂, t) = g(χ̂). The solution of Eq. (C.1) for the resolved modes is given by ψ̂(t) = e^{tL} ψ̂(0), where e^{tL} is the evolution operator. The standard semigroup notation is used to construct the flow map, i.e., ψ̂(χ̂, t) = e^{tL} g(χ̂). For brevity, we denote ψ̂(0) by ψ̂ hereafter. Let P be a projection of ψ in the direction of ψ̂ such that

Pψ ≡ [(ψ̂, ψ)/(ψ̂, ψ̂)] ψ̂,  (C.2)

and also let Q = I − P be another projector onto the orthogonal subspace. The projection operators P and Q follow the self-adjoint condition and satisfy the orthogonality condition QP = 0. While P projects the particular variable onto the slow variables of the system, Q projects the variable onto the fast variables of the dynamical system. Using Dyson's formula [35]

e^{tL} = e^{tQL} + ∫₀^t e^{(t−s)L} P L e^{sQL} ds,  (C.3)

and substituting it into Eq. (C.1), we obtain

∂ψ̂/∂t = e^{tL} L ψ̂ = e^{tL} (P + Q) L ψ̂ = e^{tL} P L ψ̂ + e^{tL} Q L ψ̂.  (C.4)

Using Eq. (C.2) and Eq. (C.3), the projection terms are

e^{tL} P L ψ̂ = [(Lψ̂, ψ̂)/(ψ̂, ψ̂)] e^{tL} ψ̂ = Θ ψ̂,  (C.5)
e^{tL} Q L ψ̂ = e^{tQL} Q L ψ̂ + ∫₀^t e^{(t−s)L} P L e^{sQL} Q L ψ̂ ds.  (C.6)

By defining F(χ̂, t) ≡ e^{tQL} Q L ψ̂ and P L e^{sQL} Q L ψ̂ ≡ K(s) ψ̂, we can write Eq. (C.4) as

∂ψ̂/∂t = Θψ̂ [Markovian] + ∫₀^t K(s) ψ̂(t − s) ds [Convolution (memory)] + F(χ̂, t) [Fluctuation],  (C.7)

which is referred to as the Mori-Zwanzig equation for the coarse-graining of a general dynamical system [27, 28, 36]. While the first term on the right-hand side is a function of the instantaneous value of ψ̂, the second term depends on the time history of ψ̂ over the range [0, t] via the convolution (memory) effect. The third term represents the effect of fluctuations and, by construction, it satisfies the orthogonal dynamics equation ∂F(χ̂, t)/∂t = QL F(χ̂, t). This formalism provides a theoretical framework to make systematic approximations of the memory kernel effects during the CNN-based learning.

Acknowledgements

The first author thanks the National Research Foundation and Keppel Corporation, Singapore for supporting the work done in the Keppel-NUS Corporate Laboratory.
References

[1] A. Roshko, Perspectives on bluff body aerodynamics, Journal of Wind Engineering and Industrial Aerodynamics 49 (1-3) (1993) 79–100.
[2] P. Holmes, Turbulence, coherent structures, dynamical systems and symmetry, Cambridge University Press, 2012.
[3] W. Yao, R. K. Jaiman, Model reduction and mechanism for the vortex-induced vibrations of bluff bodies, Journal of Fluid Mechanics 827 (2017) 357–393.
[4] P. J. Schmid, Dynamic mode decomposition of numerical and experimental data, Journal of Fluid Mechanics 656 (2010) 5–28.
[5] P. J. Schmid, L. Li, M. P. Juniper, O. Pust, Applications of the dynamic mode decomposition, Theoretical and Computational Fluid Dynamics 25 (1-4) (2011) 249–259.
[6] J. N. Kutz, Deep learning in fluid dynamics, Journal of Fluid Mechanics 814 (2017) 1–4.
[7] E. J. Parish, K. Duraisamy, A paradigm for data-driven predictive modeling using field inversion and machine learning, Journal of Computational Physics 305 (2016) 758–774.
[8] J. Ling, A. Kurzawski, J. Templeton, Reynolds averaged turbulence modelling using deep neural networks with embedded invariance, Journal of Fluid Mechanics 807 (2016) 155–166.
[9] M. Raissi, P. Perdikaris, G. E. Karniadakis, Machine learning of linear differential equations using Gaussian processes, Journal of Computational Physics 348 (2017) 683–693.
[10] M. Raissi, G. E. Karniadakis, Hidden physics models: Machine learning of nonlinear partial differential equations, arXiv preprint arXiv:1708.00588.
[11] L. Parussini, D. Venturi, P. Perdikaris, G. E. Karniadakis, Multi-fidelity Gaussian process regression for prediction of random fields, Journal of Computational Physics 336 (2017) 36–50.
[12] G. Hinton, S. Osindero, Y. W. Teh, A fast learning algorithm for deep belief nets, Neural Computation 18 (7) (2006) 1527–1554.
[13] Y. Bengio, Learning deep architectures for AI, Foundations and Trends® in Machine Learning