Turbulence closure modeling with data-driven techniques: Influence of neural network hyperparameters and training procedure on generalizability
S. Taghizadeh,a) F. D. Witherden, Y. A. Hassan, and S. S. Girimaji
Department of Mechanical Engineering, Texas A&M University, College Station, TX 77843, USA
Department of Ocean Engineering, Texas A&M University, College Station, TX 77843, USA
Department of Nuclear Engineering, Texas A&M University, College Station, TX 77843, USA
(Dated: 3 February 2021)
Generalizability of machine-learning (ML) assisted turbulence closure models to unseen flows remains an important challenge. It is well known that the architecture of a neural network and the manner of training have a profound influence on the predictive capability and generalizability of the resulting model [I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Vol. 1 (2016)]. The objective of the present work is to characterize the influence of the choice of (i) the network (in terms of the number of neurons and hidden layers) and (ii) training process elements, such as the type of loss function, on the generalizability of the ML models. A systematic examination of ML model generalizability over a broad parameter regime requires detailed datasets for many flows over the entire domain. The main novelty of the present study is to utilize a simplified proxy-physics surrogate of turbulence to obtain data, rather than prohibitively expensive high-fidelity simulations or experiments. The chosen proxy model exhibits important features, including bifurcation and rapid distortion effects, in different parameter regimes. Using proxy-physics data, training and testing of various neural networks are performed over different regimes. Optimal network architecture and training techniques are identified based on performance in these test studies. The findings provide a clear basis for choosing neural network architecture and training procedure for turbulence applications.

a) Electronic mail: [email protected]

I. INTRODUCTION

Turbulent flows exhibit vastly different characteristics in different parameter regimes depending upon the mean strain-to-rotation rate ratio, mean-to-turbulence time scale ratio, underlying flow instabilities, large-scale unsteadiness, presence or absence of system rotation or streamline curvature, body-force effects and flow geometry. In addition, due to the inherent complexity of the turbulence phenomenon, closure models developed in one parameter regime cannot be presumed to be reasonable or even valid in other regimes. The strong dependence of flow statistics on the various physical parameters is one of the enduring challenges in the field of turbulence closure model development. The degree of difficulty of closure modeling depends upon the level of closure. In the Reynolds-averaged Navier-Stokes (RANS) method, the fundamental governing equations are averaged over all scales of motion, leading to a significant reduction in the computational effort needed for performing a flow simulation. The reduction in computational burden comes at the cost of increased complexity of closure modeling. Ad hoc simplifications or assumptions are typically invoked to close various terms in the RANS equations. While RANS models may perform adequately in flows for which they are calibrated, they can be catastrophically wrong in other complex flows due to lack of generalizability. Despite inherent limitations, RANS is widely used in industrial applications involving complex flows due to ease of computation. At the other extreme of the closure spectrum, in the large-eddy simulation (LES) approach, dynamically important scales are resolved and only the small-scale motions are modeled. The small scales are significantly easier to model as they embody most of the 'universal' aspects of turbulence and, therefore, are more easily amenable to generalizability than their RANS counterpart.
Thus, in LES, the relative simplicity and generalizability of subgrid closure models come at the expense of significantly increased computational cost. The closure modeling challenges of scale-resolving simulations (SRS) are of intermediate degree of difficulty, as they resolve more scales than RANS but significantly fewer than LES.

There is heightened expectation in recent times that ML techniques can be used to significantly improve RANS turbulence closure model performance in complex industrial flows. The rationale is that the shortcomings of the physics-based closures can be circumvented by appropriate data-based training of the models. Toward this end, many authors have used neural networks to model the turbulence constitutive relation. However, ML methods for turbulence closure are not without their own challenges and shortcomings. There is no clear guidance on the architecture of the neural network or the different elements of the training procedure. Ling et al. showed that embedding the Galilean invariance property in the deep neural network architecture can improve the predictive capability of ML turbulence models. They used a deep neural network with 8 hidden layers and 30 neurons per hidden layer. Zhang et al. trained a neural network with 4 hidden layers and 20 neurons per layer to develop a model to predict the Reynolds stress of a channel flow at different Reynolds numbers. Using the same channel flow datasets, Fang et al. trained a neural network with 5 hidden layers and 50 neurons per layer to improve model performance. Jiang et al. used DNS of channel flow at different Reynolds numbers to develop a new algebraic Reynolds stress model by training a deep neural network with 9 hidden layers and a varying number of neurons in each layer. Sotgiu et al. developed a Reynolds stress constitutive model by training a neural network with 8 hidden layers and 8 neurons per layer. They used plane channel and periodic hill channel flow datasets for training the ML algorithm. Geneva and Zabaras trained a neural network with 5 hidden layers and tapered the number of neurons of the last hidden layers in an attempt to improve training performance.

In the field of computer vision, the architecture of a neural network and the manner of training have been demonstrated to have a profound influence on the predictive capability and generalizability of the resulting model. Indeed, neural networks of different architectures trained with the same data can produce significantly different results during testing and prediction. Lacking a formal procedure for neural network selection and training protocol, ML-assisted turbulence closures can be as ad hoc as the traditional models and, more importantly, lack generalizability. In other areas, it has been shown that ML models developed with subject matter expertise have a better chance of succeeding. Thus, it is of much interest to examine generalizability in turbulence-like systems that are much simpler to compute.

The objective of this study is to systematically analyze the influence of neural-network hyperparameters and training procedure on the generalizability of turbulence models to unseen flows. Such understanding can lead to important guidelines and best-practice recommendations to fully realize the advantages offered by ML. Comprehensive examination of the generalizability of ML models over a wide parameter range of turbulent flows is rendered difficult by the lack of high-fidelity data.
Direct numerical simulation (DNS) of a wide range of flows is prohibitively expensive. Experiments can be performed expeditiously only for a small set of flows. While it is preferable to perform this study using data from complex turbulent flows, this study focuses on a simpler approach as a first step.

To enable a reasonably rigorous analysis, we (i) restrict our consideration to statistically two-dimensional homogeneous turbulence, which represents the most elementary non-trivial flows of interest; and (ii) utilize data generated from a proxy-physics turbulence surrogate that incorporates many of the key aspects of real homogeneous flows. The proxy-physics surrogate is expected to provide some level of turbulence subject matter expertise without expensive data. Clearly, the proxy-physics surrogate must be simple to compute, while retaining key complexities of turbulent flows. If the ML models cannot demonstrate generalizability in the simple proxy-physics system, it would be unreasonable to expect ML models to perform well in unseen turbulent flows. Further, the lessons learned for this simpler system can yield valuable insight into the closure modeling of the more complex turbulence phenomenon. The various neural-network hyperparameters and training elements considered in this study are: the number of hidden layers (depth of the network), the number of neurons in each layer (width of the network) and the type of loss function.

The organization of the rest of the paper is as follows. The selected proxy-physics turbulence surrogate is explained in Sec. II. The neural network hyperparameters and training elements are described in Sec. III. Section IV details how data is generated with the proxy-physics turbulence surrogate in different regimes of two-dimensional homogeneous turbulence. The results from different architectures and training methods are presented in Sec. V. We conclude with a summary of findings and recommendations in Sec. VI.

II. METHODOLOGY
In a statistically two-dimensional homogeneous incompressible turbulent flow field without body forces, only two parameters govern the physical behavior, $Sk/\varepsilon$ and $Rk/\varepsilon$, where $S$, $R$, $k$ and $\varepsilon$ are, respectively, the mean strain rate, mean rotation rate, turbulent kinetic energy and turbulent dissipation. This set of flows does not exhibit any large-scale instabilities, coherent structures, complex geometrical features or statistical unsteadiness. Despite the apparent simplicity, homogeneous two-dimensional turbulent flows embody multiple complex phenomena that are strongly dependent upon the parameter values. The flow goes from hyperbolic to rectilinear to elliptic streamline geometry with increasing mean rotation rate. Depending upon the mean-flow-to-turbulence strain rate ratio, the flow physics can range from the rapid distortion limit to decaying anisotropic turbulence.

The algebraic Reynolds stress model (ARSM) does reasonably well in capturing key flow physics in different parts of the flow regime. The ARSM requires the solution of a cubic equation, and the appropriate root must be chosen in different flow regimes to yield the correct behavior. In this study, we use the self-consistent and nonsingular fully explicit algebraic Reynolds stress model (EARSM) proposed by Girimaji as the proxy-physics turbulence surrogate to generate stress-strain (constitutive) relationship data for different normalized strain and rotation rates. As mentioned in the Introduction, generalizability of the neural-network models in this simple system is a prerequisite to generalizability in actual turbulent flows. Further, because EARSM captures many key statistical features of turbulence, this study will yield much valuable insight into RANS ML modeling. The proxy-physics data in different parts of the domain are then used to train neural networks of different architectures. The networks are then tested in other parameter regimes to examine generalizability. Complete knowledge of the proxy-physics solution enables precise assessment of the error incurred during generalization to test flows.

Using representation theory, a three-term expansion of the anisotropy tensor in terms of the normalized strain ($S$) and normalized rotation rate ($R$) tensors for two-dimensional mean flows can be expressed as

$$b_{ij} = G_1 S_{ij} + G_2 \left( S_{ik} R_{kj} - R_{ik} S_{kj} \right) + G_3 \left( S_{ik} S_{kj} - \tfrac{1}{3} \delta_{ij} S_{mn} S_{nm} \right), \quad (1)$$

where the scalar coefficients $G_1$-$G_3$ are constitutive closure coefficients (CCC) that must be modeled. They are functions of the scalar invariants of the strain and rotation-rate tensors ($\eta_1 = S_{ij} S_{ij}$, $\eta_2 = R_{ij} R_{ij}$) and other flow quantities such as the turbulent kinetic energy ($k$), the turbulence frequency ($\omega$) and the coefficients of the pressure-strain model. An implicit algebraic equation for the anisotropy tensor can be obtained in the weak-equilibrium limit of turbulence using the following simplification:

$$\frac{D b_{ij}}{D t} = \frac{\partial b_{ij}}{\partial t} + U_k \frac{\partial b_{ij}}{\partial x_k} \approx 0, \quad (2)$$

where $D/Dt$ is the substantial derivative following the mean flow. The weak-equilibrium assumption is valid for many flows wherein the timescale of anisotropy evolution is rapid compared to the timescales of the mean flow, turbulent kinetic energy and dissipation rate. Using Eqs. (1) and (2), the nonlinear algebraic Reynolds stress equation can be written as the following cubic fixed-point equation for $G_1$:

$$(\eta_1 L_1^1)^2 G_1^3 - (2 \eta_1 L_1^0 L_1^1) G_1^2 + \left[ (L_1^0)^2 + \eta_1 L_1^1 L_2 - \tfrac{2}{3} \eta_1 (L_3)^2 + 2 \eta_2 (L_4)^2 \right] G_1 - L_1^0 L_2 = 0. \quad (3)$$

Then $G_2$ and $G_3$ can be expressed as

$$G_2 = \frac{-L_4 G_1}{L_1^0 - \eta_1 L_1^1 G_1}, \qquad G_3 = \frac{2 L_3 G_1}{L_1^0 - \eta_1 L_1^1 G_1}. \quad (4)$$

The closure coefficients in Eqs. (3) and (4) are as follows:

$$L_1^0 = \frac{C_1^0}{2} - 1, \quad L_1^1 = C_1^1 + 2, \quad L_2 = \frac{C_2}{2} - \frac{2}{3}, \quad L_3 = \frac{C_3}{2} - 1, \quad L_4 = \frac{C_4}{2} - 1, \quad (5)$$

where the $C$'s are numerical constants of the pressure-strain correlation model. In this study a linearized version of the so-called SSG pressure-strain model is employed:

$$C_1^0 = 3.4, \quad C_1^1 = 1.8, \quad C_2 = 0.36, \quad C_3 = 1.25, \quad C_4 = 0.4. \quad (6)$$

The cubic relation in Eq. (3) has multiple real and complex roots, and the selection of the appropriate solution for $G_1$ is not straightforward. By considering two physical selection criteria, (i) continuity of $G_1$ and (ii) $G_1 \le 0$, Girimaji derives a fully explicit solution of the cubic Eq. (3) for the scalar coefficient $G_1$ as follows:

$$G_1 = \begin{cases} \dfrac{L_1^0 L_2}{(L_1^0)^2 + 2 \eta_2 (L_4)^2} & \text{for } \eta_1 = 0, \\[2mm] \dfrac{L_1^0 L_2}{(L_1^0)^2 - \tfrac{2}{3} \eta_1 (L_3)^2 + 2 \eta_2 (L_4)^2} & \text{for } L_1^1 = 0, \\[2mm] -\dfrac{p}{3} + \left( -\dfrac{b}{2} + \sqrt{D} \right)^{1/3} + \left( -\dfrac{b}{2} - \sqrt{D} \right)^{1/3} & \text{for } D > 0, \\[2mm] -\dfrac{p}{3} + 2 \sqrt{-\dfrac{a}{3}} \cos\!\left( \dfrac{\theta}{3} \right) & \text{for } D < 0,\ b < 0, \\[2mm] -\dfrac{p}{3} + 2 \sqrt{-\dfrac{a}{3}} \cos\!\left( \dfrac{\theta}{3} + \dfrac{2\pi}{3} \right) & \text{for } D < 0,\ b > 0. \end{cases} \quad (7)$$

Here the discriminant $D$ of the cubic Eq. (3) is calculated as

$$D = \frac{b^2}{4} + \frac{a^3}{27}; \quad (8)$$

the other parameters of Eqs. (7) and (8) are defined as

$$a = q - \frac{p^2}{3}, \quad b = \frac{1}{27}\left( 2 p^3 - 9 p q + 27 r \right), \quad p = -\frac{2 L_1^0}{\eta_1 L_1^1},$$
$$q = \frac{1}{(\eta_1 L_1^1)^2} \left[ (L_1^0)^2 + \eta_1 L_1^1 L_2 - \tfrac{2}{3} \eta_1 (L_3)^2 + 2 \eta_2 (L_4)^2 \right], \quad r = -\frac{L_1^0 L_2}{(\eta_1 L_1^1)^2}, \quad \cos(\theta) = \frac{-b/2}{\sqrt{-a^3/27}}. \quad (9)$$

The first two cases in Eq. (7) are special limiting cases of the last three, and it can be shown that the limiting behavior can be easily calculated from the general expressions. After the coefficient $G_1$ is calculated, the other CCC can also be obtained using Eq. (4) in the entire parameter space.

FIG. 1: Different states of turbulence.

FIG. 2: CCC contour plots, (a) $G_1$, (b) $G_2$ and (c) $G_3$.
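To make the proxy-physics surrogate concrete, the following is a minimal NumPy sketch of Eqs. (5)-(9) as reconstructed above. It assumes scalar invariant inputs with $\eta_1 > 0$ (the special limiting cases of Eq. (7) would need separate branches), and the log-spaced sampling range at the end is an illustrative assumption: the paper states only that 150 points per parameter are sampled non-uniformly.

```python
import numpy as np

# Linearized SSG pressure-strain constants, Eq. (6)
C10, C11, C2, C3, C4 = 3.4, 1.8, 0.36, 1.25, 0.4

# Closure coefficients, Eq. (5)
L10, L11 = C10 / 2 - 1, C11 + 2
L2, L3, L4 = C2 / 2 - 2 / 3, C3 / 2 - 1, C4 / 2 - 1

def earsm_ccc(eta1, eta2):
    """Return (G1, G2, G3) for eta1 = S_ij S_ij > 0 and eta2 = R_ij R_ij,
    using the explicit root selection of Eq. (7)."""
    # Coefficients of the monic cubic G1^3 + p*G1^2 + q*G1 + r = 0, Eq. (9)
    p = -2 * L10 / (eta1 * L11)
    q = (L10**2 + eta1 * L11 * L2 - (2 / 3) * eta1 * L3**2
         + 2 * eta2 * L4**2) / (eta1 * L11) ** 2
    r = -L10 * L2 / (eta1 * L11) ** 2
    a = q - p**2 / 3
    b = (2 * p**3 - 9 * p * q + 27 * r) / 27
    D = b**2 / 4 + a**3 / 27                  # discriminant, Eq. (8)
    if D > 0:                                 # single real root
        G1 = -p / 3 + np.cbrt(-b / 2 + np.sqrt(D)) + np.cbrt(-b / 2 - np.sqrt(D))
    else:                                     # three real roots: select per Eq. (7)
        theta = np.arccos(-b / 2 / np.sqrt(-a**3 / 27))
        shift = 0.0 if b < 0 else 2 * np.pi / 3
        G1 = -p / 3 + 2 * np.sqrt(-a / 3) * np.cos(theta / 3 + shift)
    denom = L10 - eta1 * L11 * G1             # shared denominator of Eq. (4)
    return G1, -L4 * G1 / denom, 2 * L3 * G1 / denom

# 150 samples per invariant -> 150 x 150 = 22,500 points (Sec. IV);
# the log-spaced range below is assumed for illustration only.
pts = np.logspace(-2, 2, 150)
dataset = np.array([(e1, e2, *earsm_ccc(e1, e2)) for e1 in pts for e2 in pts])
```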
Physics of EARSM. The goal of this study is to use EARSM to provide some turbulence subject matter expertise. For this reason, it is necessary to ensure that the selected proxy model adequately incorporates some of the known features of turbulence. The different turbulence states covered by the proxy-physics surrogate are depicted for the considered parameter space in Fig. 1. In the rapidly strained turbulence state, the strain rate dominates over the rotation rate and the discriminant $D$ of the cubic fixed-point equation for $G_1$ is negative. Realizability violations can occur in this rapidly strained region due to the governing elastic constitutive relationship. The rotation rate dominates over the strain rate in the high mean-vorticity state. Other important turbulence states are also represented in this figure. Hyperbolic streamline flows occur when $S \gg R$, and the streamlines are elliptic when $S \ll R$. When $S \approx R$, rectilinear shear flows are seen. The contour plot of $G_1$ given by Eq. (7) is illustrated in Fig. 2. It is evident that $G_1$, and therefore the effective turbulent viscosity, is well defined over the entire parameter space, wherein the different important turbulence states are covered. It is also seen that the coefficient $G_1$ is a continuous function across the bifurcation line $D = 0$. Overall, the physics underlying the $G_1$ contour is a reasonable facsimile of real turbulent flows. Fig. 2 also shows well-defined values for the $G_2$ and $G_3$ coefficients in the parameter space. Therefore, the analytical, self-consistent EARSM model incorporates some of the known phenomena of turbulence. Overall, EARSM provides reasonable CCC in the entire parameter space and can be used as an appropriate proxy model of turbulence.

The objective of this study is to investigate whether the developed ML turbulence model can reasonably reproduce this simple surrogate of turbulence. If the ML models cannot perform well with this simplified description, they are unlikely to perform successfully in real turbulent flows. We also see from Fig. 2 that the CCC have a large range of values in the parameter space. In particular, the magnitude of the CCC is approximately zero for larger values of the parameters, while it is orders of magnitude larger in the decaying turbulence state. As mentioned earlier, small values of $G_1$ for large $S$ are a consequence of elastic constitutive behaviour and are needed for preserving realizability. This EARSM model is used to generate Reynolds stress data for training and testing ML models in this study.

III. MACHINE LEARNING
Previous investigations.
ML is a general term describing a class of algorithms that use data to generate models. The selection of the modeling strategy depends on the type of problem. Recently, deep neural networks have been widely used for modeling turbulent flows. Ling et al. trained a deep neural network with 8 hidden layers and 30 neurons per hidden layer with available high-fidelity data: duct and channel flow, perpendicular and inclined jets in cross-flow, flow around a square cylinder and flow through a converging-diverging channel. Their results showed that incorporating the Galilean invariance property in the deep neural network architecture can improve the predictive capability of ML turbulence models. Zhang et al. trained a neural network with 4 hidden layers and 20 neurons per layer to develop a model to predict the Reynolds stress of a channel flow at different Reynolds numbers. They obtained well-behaved models by introducing regularization in their training algorithm. Using the same channel flow datasets, Fang et al. trained a neural network with 5 hidden layers and 50 neurons per layer to develop an improved Reynolds stress tensor model. Jiang et al. used DNS of channel flow at different Reynolds numbers to develop a new algebraic Reynolds stress model by training a deep neural network with 9 hidden layers and a varying number of neurons in each layer (12, 18, 21, 27, 32, 35, 30, 28 and 27, respectively). Sotgiu et al. developed a Reynolds stress constitutive model by training a neural network with 8 hidden layers and 8 neurons per layer. They used plane channel and periodic hill channel flow datasets for training the ML algorithm. Geneva and Zabaras trained a neural network with 5 hidden layers and tapered the number of neurons in the last two hidden layers to prevent the weights from being too small and to improve training performance. The required training data was generated by performing LES of different flows: converging-diverging channel, periodic hills, square duct, square cylinder and tandem cylinders.
Objective. Determining a suitable network architecture and the training hyperparameters of a deep neural network is an empirical task, and it has been shown that there is a strong positive correlation between the final performance of trained neural networks and the experience of the user in optimizing the hyperparameters. There is no rigorous guidance on the architecture of the neural network or the different elements of the training procedure in developing ML-assisted turbulence models. The objective of this study is to perform a comparative analysis of different structures of feed-forward neural networks. Based on the physics of the proxy-physics surrogate, three different scenarios are introduced for generating training datasets. Then, for each scenario, a systematic optimization of the neural network architecture is performed to identify optimum network architectures that ensure a low generalization error on different datasets.
Deep neural networks. A deep neural network is a class of model that transforms the input parameters (features) through several layers of units (neurons). Each unit is connected by affine linear maps to units in successive layers, followed by nonlinear (scalar) activation functions within units. Different activation functions, such as the rectified linear unit (ReLU), leaky ReLU, exponential linear unit (ELU), sigmoid and hyperbolic tangent, are shown in Fig. 3.

FIG. 3: Different activation functions.

The ReLU function is the popular choice in the ML literature:

$$\sigma(z) = \max(z, 0), \quad (10)$$

where $z$ is the input to a neuron. In this study, we selected the ReLU activation function for all the neural networks.

It should be noted that deep neural networks consist of simple functions, and their efficiency emanates from the interactions between a large number of hidden layers. Due to their flexible architecture and superior performance in modeling nonlinear and complicated relationships with high-dimensional data, they have become a popular subset of ML approaches. Although a variety of network structures, such as convolutional neural networks (CNN), recurrent neural networks (RNN) and long short-term memory (LSTM) networks, have been proposed in the ML literature, we restrict ourselves to fully connected architectures. Fig. 4 shows the schematic of a deep feed-forward neural network (also termed a multi-layer perceptron, MLP), which is trained using backpropagation with a gradient descent method. The input layer (invariants $\eta_1$, $\eta_2$), the hidden layers and the output layer (coefficients $G_1$, $G_2$, $G_3$) are shown in this figure.

FIG. 4: Illustration of a deep feed-forward neural network.
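As an illustration of the fully connected architecture just described, the following is a minimal sketch in PyTorch; the paper does not name a framework, so the library choice and all identifiers here are assumptions. It builds an MLP mapping the two invariants to the three CCC for an arbitrary depth and width:

```python
import torch.nn as nn

def make_mlp(n_inputs: int = 2, n_outputs: int = 3,
             depth: int = 7, width: int = 15) -> nn.Sequential:
    """Feed-forward network of the kind sketched in Fig. 4: inputs (eta1, eta2)
    -> `depth` hidden layers of `width` ReLU units each -> (G1, G2, G3)."""
    layers, n_in = [], n_inputs
    for _ in range(depth):
        layers += [nn.Linear(n_in, width), nn.ReLU()]
        n_in = width
    layers.append(nn.Linear(n_in, n_outputs))   # linear output layer
    return nn.Sequential(*layers)

net = make_mlp(depth=7, width=15)   # the 7L-15N configuration of Sec. V
```

With depth=7 and width=15 this reproduces the 7L-15N configuration that Sec. V identifies as optimal.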
Loss function. Deep neural networks are trained by minimizing a loss (objective) function, which measures the difference between the predicted output of the model and the labels (ground truth data). It has been shown that the success of ML models depends on the formulation of the loss function used for optimization of the model coefficients (the weights and biases of the neurons) during the ML training process. The type of loss function is also problem specific and needs to be selected properly. The root mean squared error (RMSE) is the commonly used loss function:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N}}, \quad (11)$$

where $\hat{y}_i$ denotes the ML prediction, $y_i$ is the true labeled data and $N$ is the number of training data. Alternatively, the mean absolute percentage error (MAPE) is also considered:

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{|y_i - \hat{y}_i|}{|y_i|}. \quad (12)$$

The RMSE loss works well in most cases, while the MAPE loss is better for cases where the output has a large range of function values. To find the appropriate type of loss function, the models trained with MAPE and RMSE are compared in this study.

Adding a regularization term to the loss function is one of the common ways of avoiding overfitting. Overfitting occurs when an algorithm is tailored to a particular dataset and does not generalize well to unseen datasets. Introducing a regularization term in the loss function shrinks the model coefficients towards zero, decreases the complexity of the model and hence significantly reduces the variance. The $L_1$ norm (Lasso regression) and $L_2$ norm (Ridge regression) are the two common regularization methods:

$$\mathrm{MAPE}(y, \hat{y}) + \lambda \sum_{i} |\omega_i| \quad (L_1\text{-norm}), \qquad \mathrm{MAPE}(y, \hat{y}) + \lambda \sum_{i} \omega_i^2 \quad (L_2\text{-norm}), \quad (13)$$

where $\omega_i$ denotes the weight of each neuron and $\lambda$ is a positive hyperparameter that determines the strength of the regularization. In this work, to control overfitting in some experiments, Ridge regression ($L_2$ norm regularization) is applied to all the hidden layers during the training of the neural networks.
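The two candidate losses and the ridge penalty of Eqs. (11)-(13) translate directly into code. The sketch below, in the same assumed PyTorch setting as above and reusing net from the previous block, shows one Adam update. The stand-in batch, the division guard eps, the learning rate and the $\lambda$ value are illustrative assumptions; the paper's $\lambda$ and learning-rate values are not legible in the source.

```python
import torch

def rmse_loss(pred, target):                    # Eq. (11)
    return torch.sqrt(torch.mean((target - pred) ** 2))

def mape_loss(pred, target, eps=1e-12):         # Eq. (12); eps guards /0, not in the paper
    return torch.mean(torch.abs(target - pred) / (torch.abs(target) + eps))

def ridge_penalty(model, lam):                  # L2 term of Eq. (13), weights only
    return lam * sum((w ** 2).sum()
                     for name, w in model.named_parameters()
                     if name.endswith("weight"))

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)   # Adam per Table I; lr assumed
x = torch.rand(256, 2)       # stand-in batch of (eta1, eta2) inputs
y = torch.rand(256, 3)       # stand-in CCC labels
optimizer.zero_grad()
loss = mape_loss(net(x), y) + ridge_penalty(net, lam=1e-4)  # lambda value illustrative
loss.backward()
optimizer.step()
```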
Network hyperparameters. The architecture and other training hyperparameters of the neural networks need to be suitably specified in order to build a robust and effective model that can generalize to unseen datasets. Hyperparameters include, for instance, the size of the network (number of layers, or depth; width or breadth of each layer), the formulation and type of the loss function, the regularization and optimization algorithm, the batch size, the type of initialization, the learning rate, etc. Hyperparameter optimization is an empirical task, and grid search is usually adopted: many networks with several different combinations of values of each hyperparameter are trained and compared based on their accuracy and generalization ability.

TABLE I: Fixed hyperparameters
  Hyperparameter                                   Value
  Activation function                              ReLU
  Optimization algorithm                           Adam
  $L_2$ norm penalization coefficient ($\lambda$)

In principle, all of the hyperparameters in the ML algorithm can be varied, and they might have a significant impact on model performance. However, the predictive capability and generalization of a neural network are mostly controlled by its architecture: the depth and breadth of the network. Therefore, in this study we only perform a systematic search in the network-size space to identify network architectures that are efficient and ensure a low generalization error on different datasets. We consider a matrix of network sizes by varying the depth over five values (1, 3, 5, 7, 10) and the width over five values (3, 5, 7, 15, 30). The learning rate is also varied over a prescribed range. The other fixed hyperparameters are shown in Table I.
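A grid search over this depth-by-width matrix can be sketched as follows. Here train and evaluate_mape are hypothetical helpers standing in for the training loop and the test-set MAPE evaluation described above (neither is named in the paper), and make_mlp is the constructor from the earlier sketch.

```python
import itertools

depths = [1, 3, 5, 7, 10]     # candidate numbers of hidden layers (Sec. III)
widths = [3, 5, 7, 15, 30]    # candidate neurons per hidden layer

results = {}
for depth, width in itertools.product(depths, widths):
    net = make_mlp(depth=depth, width=width)
    train(net, train_data)                                   # hypothetical helper
    results[(depth, width)] = evaluate_mape(net, test_data)  # hypothetical helper

best_depth, best_width = min(results, key=results.get)       # lowest test MAPE
```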
IV. DATA GENERATION WITH PROXY-PHYSICS TURBULENCE SURROGATE

The data needed for training the ML algorithm is generated using the proxy-physics surrogate discussed in Sec. II. For each input parameter, 150 data points are extracted non-uniformly from its range; the total number of data points in the entire parameter space is therefore 22,500. In this work, different investigations are conducted in order to answer three main questions:

1. What is the optimum neural network architecture when training is performed using data from the entire parameter space?

2. What is the optimum neural network architecture when the data cover only one portion of the known physics?

3. What is the optimum neural network architecture when only a narrow band of data is available in the critical regime of the physics?

These three investigations lead to important inferences about generalizability. We generate the training datasets in three different scenarios as follows.

FIG. 5: Training (green circles) and testing (red circles) datasets for (a) Case-1, (b) Case-2, (c) Case-3.
A. Training data over the entire parameter space
This represents the ideal scenario, in which the neural network is trained with data over the entire parameter space. Thus there is no physics that is unseen by the ML model. In this case, generalizability is expected to be trivially straightforward. Sufficient data is gathered over the strain-, shear- and rotation-dominated flow regimes. It should be noted that providing real turbulence data over the entire parameter space would require expensive DNS of a wide range of flows. The generated data points in the parameter space are randomly split into 75% for training (Training Data) and 25% for validation and testing of the model (Testing Data). The testing data is used for model evaluation during architecture optimization. Fig. 5a shows the randomly distributed data in the entire parameter space for this case.

B. Training data only in the strain-dominated region ($D < 0$)

In this case training data is restricted to a subset of the parameter space. Data points in the strain-dominated region ($D < 0$) are used for training, while the data points in the rest of the parameter space ($D > 0$) are reserved for testing.

C. Limited training data in the shear-dominated region
In many instances training data is available only in a very narrow region of the parameter space. In this regard, we examine the optimum network architecture and the generalizability of the ML model trained with a limited dataset that covers shear-driven turbulence. As can be seen from the EARSM equations, shear flow represents the bifurcation region between strain- and rotation-dominated flows. An arbitrary narrow zone around the bifurcation line $D = 0$, defined by bounds on a combination of the invariants $\eta_1$ and $\eta_2$, is chosen as the training region.

V. RESULTS
1. RMSE loss function vs. MAPE loss function
It has been shown that the appropriate formulation of the loss function, and the optimal choice of the flow statistics contributing to it, impact the success of ML-trained turbulence models. As mentioned in Sec. III, we examine the performance of ML models trained with different loss functions. First, we train two deep neural networks with RMSE and MAPE loss functions, without $L_2$ regularization. For this analysis the randomly generated dataset over the entire parameter space is segmented into 75% for training and 25% for testing of the models. The selected fixed network architecture for both cases has 7 hidden layers with 7 neurons in each layer. All other hyperparameters of the models are as shown in Table I.

The performance of the models trained with the different loss functions is reported with both MAPE and RMSE error metrics on the training and testing datasets in the columns of Table II.

TABLE II: Performance of models trained with different loss functions (columns: RMSE-training, RMSE-testing, MAPE-training, MAPE-testing; rows: RMSE-loss function, MAPE-loss function).

It can be seen that the MAPE loss function is better for training the ML algorithm. As mentioned in Sec. II, the CCC have a large range of values over the parameter space. When RMSE is used as the loss function, the training and back-propagation process is dominated by the CCC with larger magnitudes, as the small-valued coefficients contribute very little to the neural network optimization process. Local error contours of the models trained with the different loss functions are shown in Figs. 6 and 7. It can be seen that the ML algorithm trained with MAPE has smaller local errors in the different turbulence physics regions across the entire parameter space.

FIG. 6: Absolute error contours of CCC for ML models trained with different loss functions, 1st row: RMSE, 2nd row: MAPE.

Comparing the different error metrics (MAPE vs. RMSE for the final performance of the models in Table II, and absolute error vs. MAPE for the local error contours in Figs. 6 and 7) clearly shows that the MAPE metric gives an unbiased, easily interpretable presentation of final model performance. Therefore, MAPE is selected as the loss function for the remainder of the analysis.
2. Case–1: Training data over the entire parameter space
As a first experiment, we train the neural networks with the randomly distributed dataset, i.e., 75% for training and 25% for testing of the models. For this case the number of hidden layers in the neural networks varies among 1, 3, 5 and 7, and the number of neurons in each hidden layer varies among 3, 5, 7 and 15. Fig. 8 shows the MAPE error of all 16 network architectures on the training and testing datasets; the number of hidden layers and the number of neurons in each layer are shown on the horizontal and vertical axes, respectively.

FIG. 8: Training and testing MAPE for neural networks with different architectures in Case-1.

Although $L_2$ regularization is not used during ML training for this case, all 16 networks have similar performance on both the training and testing datasets, and the models test quite well without over/under fitting. It can be seen that, for this case with randomly distributed data over the entire parameter space, neural networks with fewer than 21 total neurons, with any combination of layers and neurons, have the worst training and testing errors. Additionally, as the number of layers and the number of neurons in each layer increase, the training and testing errors decrease, and for this case the network architecture with 7 hidden layers and 15 neurons in each layer (7L-15N) has the best performance on the training and testing datasets.

Local MAPE contours of the networks with 7 hidden layers and different numbers of neurons over the entire parameter space are shown in Fig. 9. It is evident that, by increasing the number of neurons in each layer, the error decreases over the entire parameter space, and for the best model in this case (7L-15N) the maximum error occurs mainly near the bifurcation line $D = 0$.
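Local error maps of this kind can be produced by evaluating the trained network against the surrogate pointwise over the grid. A minimal sketch, reusing earsm_ccc and pts from the Sec. II block and net from the Sec. III block (all of which, as noted there, carry assumed details):

```python
import numpy as np
import torch

# Pointwise absolute percentage error of each CCC over the (eta1, eta2) grid,
# suitable for contour plots in the style of Figs. 9 and 11.
E1, E2 = np.meshgrid(pts, pts)
X = torch.tensor(np.stack([E1.ravel(), E2.ravel()], axis=1), dtype=torch.float32)
with torch.no_grad():
    pred = net(X).numpy()                                 # network CCC predictions
true = np.array([earsm_ccc(e1, e2)
                 for e1, e2 in zip(E1.ravel(), E2.ravel())])  # surrogate truth
local_err = np.abs(true - pred) / np.abs(true)            # one error field per CCC
g1_error_map = local_err[:, 0].reshape(E1.shape)          # e.g. the G1 field
```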
3. Case-2: Training data only in the strain-dominated region ($D < 0$)

In this experiment the data points in the strain-dominated region ($D < 0$) are used for training, while the data points in the rest of the parameter space ($D > 0$) are used for testing. An $L_2$ norm regularization, as defined in Eq. (13), is applied during training. The training and testing MAPE of the different network architectures are compared in Fig. 10, and the 7L-15N network is identified as the optimum architecture in this case.

FIG. 10: Training and testing MAPE for neural networks with different architectures in Case-2.

For the selected networks with 7 hidden layers and different numbers of neurons, local MAPE contours over the entire parameter space are shown in Fig. 11. It is evident that the network with the 7L-15N architecture has the best performance over the entire parameter space for all the CCC.

FIG. 11: Local MAPE contours in the entire parameter space for Case-2, 1st row: 7L-5N, 2nd row: 7L-7N, 3rd row: 7L-15N, 4th row: 7L-30N.
4. Case–3: Limited training data in the shear–dominated region
In this scenario the training dataset covers a limited but important region of the parameter space near the bifurcation line $D = 0$. The data points in the rest of the parameter space are used for testing the models. The network architectures considered for this case have hidden layers and neurons per layer varying in the ranges 3 to 10 and 3 to 30, respectively. The performance of all 20 network architectures on the training and testing datasets is compared in Fig. 12. An $L_2$ norm regularization is again applied during training. The 7L-15N network provides the best performance for all the CCC.

FIG. 12: Training and testing MAPE for neural networks with different architectures in Case-3.

FIG. 13: Local MAPE contours in the entire parameter space for Case-3, 1st row: 7L-5N, 2nd row: 7L-7N, 3rd row: 7L-15N, 4th row: 7L-30N.

The main inferences from the three case studies can be summarized as follows. When training is performed with data from the entire parameter space (Case-1), the accuracy of predicting the test cases improves nearly monotonically with network size. As the accuracy of training improves, so does the accuracy in testing, as there is no unseen flow physics. However, when training and testing are performed in different domains of the parameter space (Cases 2 and 3), larger networks do not necessarily lead to improved testing. Very small networks do not perform well, as there are insufficient degrees of freedom to capture the training and testing physics. With increasing network size, testing accuracy improves up to an optimum size at which the training and testing errors are both reasonably small. Beyond this optimal size, larger networks overfit the training data and therefore perform poorly during testing. The optimal network size for these cases is 7 hidden layers with 15 neurons in each layer.
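The three training scenarios reduce to masks on the discriminant $D$ of Eq. (8). A sketch of how the Case-2 and Case-3 splits could be constructed from the grid of the Sec. II block is given below; the coefficients mirror the earsm_ccc sketch, and the band half-width tol is an illustrative stand-in, since the paper's exact band bounds are not legible in the source.

```python
import numpy as np

def cubic_discriminant(eta1, eta2):
    """Discriminant D of the G1 cubic, Eq. (8); D < 0 marks the strain-dominated
    region and D = 0 the bifurcation (shear) line of Fig. 1."""
    p = -2 * L10 / (eta1 * L11)
    q = (L10**2 + eta1 * L11 * L2 - (2 / 3) * eta1 * L3**2
         + 2 * eta2 * L4**2) / (eta1 * L11) ** 2
    r = -L10 * L2 / (eta1 * L11) ** 2
    a = q - p**2 / 3
    b = (2 * p**3 - 9 * p * q + 27 * r) / 27
    return b**2 / 4 + a**3 / 27

grid = np.array([(e1, e2) for e1 in pts for e2 in pts])   # grid from Sec. II sketch
D = np.array([cubic_discriminant(e1, e2) for e1, e2 in grid])

tol = 1e-3                                 # illustrative band half-width only
case2_train = grid[D < 0]                  # Case-2: strain-dominated region
case2_test = grid[D >= 0]
case3_train = grid[np.abs(D) < tol]        # Case-3: narrow band about D = 0
case3_test = grid[np.abs(D) >= tol]
```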
VI. CONCLUSION
ML techniques can potentially enhance the performance of RANS turbulence closure models significantly. Generalizability of ML-assisted turbulence models is one of the key challenges and requires further investigation. Neural network hyperparameters and training techniques can significantly impact the predictive capability and generalizability of ML turbulence models. In this study, the influence of neural network architecture (number of hidden layers and neurons) and training process elements (type of loss function) on generalizability is systematically investigated over a broad parameter space of two-dimensional homogeneous turbulence states.

The key novelty of this work is the adoption of a simplified proxy-physics turbulence surrogate, incorporating important features of real homogeneous flows, to generate the training data used to assess generalizability characteristics. An important advantage of this approach is that training data for all flows in the parameter space can be generated easily. In contrast, DNS or experiments would be prohibitively expensive and may not even be feasible for all flows in the parameter space. Successful generalizability of ML models in this proxy-physics turbulence system is a necessary but not a sufficient condition for ML-model generalizability in actual turbulent flows. Nevertheless, this study provides valuable insight into the generalizability characteristics of different network architectures in turbulence-like phenomena.

The main inferences from this study are now listed:

1. A loss function for training based on MAPE may be better than RMSE for deep feed-forward neural networks of turbulence CCC. This is due to the fact that MAPE limits the maximum model error throughout the parameter space. Thus, the possibility of catastrophic model performance in some part of the parameter space is minimized. On the other hand, RMSE focuses only on the average error over the parameter space, without limiting the error at any specific location. Therefore, while the model may do well in most regions of the parameter space, significant failure in some parts cannot be ruled out. Clearly, a loss function based on a combination of MAPE and RMSE could be useful and will be investigated in future studies.

2. As expected, if training is performed with data obtained from the entire parameter space, the ML model does well in testing. In such a case, the accuracy of the model in the testing phase increases nearly monotonically with increasing neurons (breadth) and layers (depth) of the network.

3. If training is performed with data obtained from a limited region of the parameter space and testing is done in other regions, the performance does not improve monotonically with network size. Starting from smaller networks, the training and testing accuracy improves with increasing network size. Beyond an optimal network size, the predictive capability deteriorates due to overfitting of the training data.

4. The optimal network size will depend on the number of coefficients in the constitutive relationship. For the three-CCC model used in this study, a seven-layer, fifteen-neuron network performed best. This network also yielded reasonable behavior in the full-parameter-space training and is hence recommended for use in practical applications.

In summary, we establish a proxy-physics approach to determine the optimal network architecture and training procedure for a given form of the turbulence constitutive relationship. We propose that such a preliminary study, identifying the optimal network hyperparameters, be performed as an initial step in developing ML-RANS models. Such practice will provide more rigor to the training procedure and lead to a higher likelihood of success of ML turbulence models.
ACKNOWLEDGMENTS
The authors acknowledge support provided by Texas A&M High Performance Research Com-puting center.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
REFERENCES

G. Blaisdell and K. Shariff, "Simulation and modeling of the elliptic streamline flow," in Proceedings of the Summer Program (Citeseer, 1996) p. 1.
S. S. Girimaji, "Pressure-strain correlation modelling of complex turbulent flows," Journal of Fluid Mechanics, 91-123 (2000).
C. G. Speziale, "Analytical methods for the development of Reynolds-stress closures in turbulence," Annu. Rev. Fluid Mech., 57 (1991).
A. A. Mishra and S. S. Girimaji, "Pressure-strain correlation modeling: towards achieving consistency with rapid distortion theory," Flow, Turbulence and Combustion, 593-619 (2010).
A. A. Mishra and S. S. Girimaji, "Toward approximating non-local dynamics in single-point pressure-strain correlation closures," Journal of Fluid Mechanics, 168-188 (2017).
C. G. Speziale, R. Abid, and G. A. Blaisdell, "On the consistency of Reynolds stress turbulence closures with hydrodynamic stability theory," Physics of Fluids, 781-788 (1996).
S. S. Girimaji, "Partially-averaged Navier-Stokes model for turbulence: A Reynolds-averaged Navier-Stokes to direct numerical simulation bridging method," Journal of Applied Mechanics, 413-421 (2006).
S. S. Girimaji, E. Jeong, and R. Srinivasan, "Partially averaged Navier-Stokes method for turbulence: Fixed point analysis and comparison with unsteady partially averaged Navier-Stokes," Journal of Applied Mechanics, 422 (2006).
J. Ling, A. Kurzawski, and J. Templeton, "Reynolds averaged turbulence modelling using deep neural networks with embedded invariance," Journal of Fluid Mechanics, 155-166 (2016).
Z. Zhang, X.-d. Song, S.-r. Ye, Y.-w. Wang, C.-g. Huang, Y.-r. An, and Y.-s. Chen, "Application of deep learning method to Reynolds stress models of channel flow based on reduced-order modeling of DNS data," Journal of Hydrodynamics, 58-65 (2019).
R. Fang, D. Sondak, P. Protopapas, and S. Succi, "Neural network models for the anisotropic Reynolds stress tensor in turbulent channel flow," Journal of Turbulence, 525-543 (2020).
C. Jiang, J. Mi, S. Laima, and H. Li, "A novel algebraic stress model with machine-learning-assisted parameterization," Energies, 258 (2020).
C. Sotgiu, B. Weigand, K. Semmler, and P. Wellinger, "Towards a general data-driven explicit algebraic Reynolds stress prediction framework," International Journal of Heat and Fluid Flow, 108454 (2019).
N. Geneva and N. Zabaras, "Quantifying model form uncertainty in Reynolds-averaged turbulence models with Bayesian deep neural networks," Journal of Computational Physics, 125-147 (2019).
I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Vol. 1 (MIT Press, Cambridge, 2016).
K. Anand, Z. Wang, M. Loog, and J. van Gemert, "Black magic in deep learning: How human skill impacts network training," arXiv preprint arXiv:2008.05981 (2020).
T. Gatski, "On explicit algebraic stress models for complex turbulent flows," J. Fluid Mech., 59-78 (1993).
S. S. Girimaji, "Fully explicit and self-consistent algebraic Reynolds stress model," Theoretical and Computational Fluid Dynamics, 387-402 (1996).
S. Taghizadeh, F. D. Witherden, and S. S. Girimaji, "Turbulence closure modeling with data-driven techniques: physical compatibility and consistency considerations," New Journal of Physics, 093023 (2020).
S. S. Girimaji, "Lower-dimensional manifold (algebraic) representation of Reynolds stress closure equations," Theoretical and Computational Fluid Dynamics, 259-281 (2001).
C. A. Gomez and S. S. Girimaji, "Explicit algebraic Reynolds stress model (EARSM) for compressible shear flows," Theoretical and Computational Fluid Dynamics, 171-196 (2014).
C. Speziale, S. Sarkar, and T. Gatski, "Modelling the pressure-strain correlation of turbulence: an invariant dynamical systems approach," Journal of Fluid Mechanics, 245-272 (1991).
S. Cai, Z. Wang, L. Lu, T. A. Zaki, and G. E. Karniadakis, "DeepM&Mnet: Inferring the electroconvection multiphysics fields based on operator approximation by neural networks," arXiv preprint arXiv:2009.12935 (2020).
G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning, Vol. 112 (Springer, 2013).
D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980 (2014).
X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (JMLR Workshop and Conference Proceedings, 2010) pp. 249-256.
R. L. Thompson, L. E. B. Sampaio, F. A. de Bragança Alves, L. Thais, and G. Mompean, "A methodology to evaluate statistical errors in DNS data of plane channel flows," Computers & Fluids, 1-7 (2016).
S. V. Poroseva, J. D. Colmenares F., and S. M. Murman, "On the accuracy of RANS simulations with DNS data," Physics of Fluids, 115102 (2016).
W. Rodi, "A new algebraic relation for calculating the Reynolds stresses," in Gesellschaft Angewandte Mathematik und Mechanik Workshop, Paris, France, Vol. 56 (1976) p. 219.
D. B. Taulbee, "An improved algebraic Reynolds stress model and corresponding nonlinear stress model," Physics of Fluids A: Fluid Dynamics, 2555-2561 (1992).
S. Pope, "A more general effective-viscosity hypothesis," Journal of Fluid Mechanics 72.