[PDF] Physics-guided deep learning framework for predictive modeling of the Reynolds stress anisotropy

Abstract

Despite a cost-effective option in practical engineering, Reynolds-averaged Navier-Stokes simulations are facing the ever-growing demand for more accurate turbulence models. Recently, emerging machine learning techniques are making promising impact in turbulence modeling, but in their infancy for widespread industrial adoption. Towards this end, this work proposes a universal, inherently interpretable machine learning framework of turbulence modeling, which mainly consists of two parallel machine-learning-based modules to respectively infer the integrity basis and closure coefficients. At every phase of the model development, both data representing the evolution dynamics of turbulence and domain-knowledge representing prior physical considerations are properly fed and reasonably converted into modeling knowledge. Thus, the developed model is both data- and knowledge-driven. Specifically, a version with pre-constrained integrity basis is provided to demonstrate detailedly how to integrate domain-knowledge, how to design a fair and robust training strategy, and how to evaluate the data-driven model. Plain neural network and residual neural network as the building blocks in each module are compared. Emphases are made on three-fold: (i) a compact input feature parameterizing the newly-proposed turbulent timescale is introduced to release nonunique mappings between conventional input arguments and output Reynolds stress; (ii) the realizability limiter is developed to overcome under-constraint of modeled stress; and (iii) constraints of fairness and noisy-sensitivity are first included in the training procedure. In such endeavors, an invariant, realizable, unbiased and robust data-driven turbulence model is achieved, and does gain good generalization across channel flows at different Reynolds numbers and duct flows with various aspect ratios.

Full PDF

11 Physics-guided deep learning framework for predictive modeling of Reynolds stress anisotropy

Chao Jiang

School of Civil Engineering, Harbin Institute of Technology, Harbin 150090, China [email protected]

Abstract

I. Introduction

In flows of engineering interest, Reynolds-averaged Navier-Stokes (RANS) simulation keeps an ongoing popular alternative to high-fidelity, but computationally-expensive methods, e.g., direct numerical simulation (DNS) and large eddy simulation (LES). Despite significantly reducing the computational complexity of the nonlinear Navier-Stokes system, RANS simulations inevitably introduce considerable inadequacies mainly due to inadequate turbulence models in representing the real turbulence physics, which has been well documented in the literature.

There is, on the other hand, ever-growing demand for improved RANS models: an ambitious desire to apply a reliable RANS-based prediction over the entire flight envelope in the foreseeable future.

4, 5

Even if DNS and LES eventually are feasible in engineering, they are expected to rely mainly on RANS-based near wall models. The needs are clear, while conventional model developments almost remain stalled, indicating a need for new ideas going forward. Very recently, encouraging advances are being made in improvements of model accuracy using the cutting-edge machine learning (ML) techniques.

6, 7

ML-augmented turbulence modeling with data-driven aspects provides a paradigm shift for turbulence modeling to overcome the well-known bottleneck of conventional intuition-based model creations. The first attempt to apply ML algorithms to turbulence modeling can go back to Cheung et al., who developed a Bayesian statistical model for the closure coefficients of Spalart-Allmaras model. Since then, data-driven turbulence modeling has been growing vigorously. Edeling et al.

9, 10 and Zhang et al. performed Bayesian inference on all closure coefficients of linear eddy-viscosity models and accompanying transport equations. The stochastic modeling always makes the obtained coefficients being spatial-independent. Instead, Weatheritt and Sandberg use gene expression programming to explicitly recalibrate the closure coefficients of nonlinear algebraic models as functions of the local flow field. Zhu et al. reconstructed a function-surrogated model to approximate the turbulent viscosity using the radial basis function. It is worth noting that, Ling et al. and Jiang et al. pioneered the use of deep neural network (DNN) to gain spatially-varying coefficients of nonlinear algebraic models. All aforementioned efforts are part of ML-augmented inverse modeling of the closure coefficients given a form-constraint RANS model. Rather than recalibrating a set of coefficients to reduce the parametric uncertainty, adding corrective source terms is another route to correct the existing RANS models, which focuses on improving the structural uncertainty. Dow and Wang made first attempt to statistically estimate the discrepancy between the inferred and modeled turbulent viscosity of the 𝑘 − 𝜔 model using Gaussian process. To achieve a dependence on the feature space, Duraisamy and co-workers proposed a promising data-driven framework comprising of Bayesian-based field inversion and ML-based function reconstruction to build the mapping of inferred discrepancy (e.g., a corrective multiplier or a direct corrective source) to local flow quantities. Following this practice, He et al. adopted continuous adjoint formulation to correct the Spalart-Allmaras model and Yang et al. combined Bayesian and DNN to correct 𝑘 − 𝜔 − 𝛾 − 𝐴 𝑟 transition model. Iaccarino and co-workers presented a novel approach to account for structural uncertainty by directly perturbing the elements in the spectral decomposition of Reynolds stress. Although not machine learning algorithms, this approach has two prominent advantages: (i) good interpretation because of turbulent kinetic energy, eigenvalue, and eigenvector representing the magnitude, shape, and orientation of perturbations; and (ii) easy implementation of realizability. Xiao and co-workers benefited from both above-mentioned practices and successfully used random forest to predict the discrepancy in the Reynolds stress projections. Yin et al. followed this direction using DNN. In comparison, the greatest challenge in the philosophy of turbulence modeling is to develop a universal model across multiple classes of flows (the ultimate barrier). This requires a more flexible framework to reduce both structural and parametric uncertainties, aiming at developing new models free from the existing RANS models, rather than improving them. Early attempts have been made to directly predict eigenvalues of stress anisotropy using ML algorithms,

32, 33 which can only account for structural uncertainty. It should be noted that ML-augmented data-driven approaches have successfully found hidden exact relations behind data; however a universally accurate turbulence model cannot be waiting to be uncovered because of loss information in the closure terms. Alternatively, the optimal models in a user-defined sense may be available. In this similar direction, we design two parallel ML-based modules to infer both integrity basis and closure coefficients based on data and domain-knowledge, which reduce structural and parametric uncertainties, respectively. The two modules are interactive through a multiplicative layer by analogy to tensor modeling, thus forming an interpretable framework of turbulence modeling. Each learned integrity basis with corresponding referred closure coefficient makes up one term of the overall polynomial to construct a functional model-from. There has been a strong belief that different flows could be simulated properly using the same form-constraint model but with possible different coefficients, indicating that closure coefficients reflect the physical effects from the flow boundary. As a result, the proposed framework can serve as a transfer learning framework for dynamic tasks: one can retrain the data-driven model by only re-updating the closure coefficients with fixed learned integrity basis (as learned modeling knowledge). It is an advantage when compared to models using a single ML-based module to directly predict Reynolds stress. Given the universal approximation, DNN (i.e., deep learning) is selected as the machine learning algorithm, which have achieved great success in many fields, such as image recognition and natural language processing. Kutz stressed major advantages of DNN in extracting features from informative data and capturing invariances. Ling et al. further demonstrated that DNN has a clear advantage of learning invariances through data augmentation over random forest. To investigate the performance of the proposed framework by itself, a controlled environment will be needed in which the learning dynamics of the framework, e.g., sources of gains and the upper performance limit, can be explored in isolation. Towards this goal, we first constrain the integrity basis to conventional tensor basis proposed by Lumley and Pope . Against this backdrop, we conduct this work and perform a systematic investigation on following three-fold: (i) how to select input features and utilize prior domain-knowledge; (ii) how to setup a fair and robust learning strategy to overcome or moderate prediction prejudice and noisy-sensitivity; and (iii) how to evaluate a data-driven model. We also note that such a version with pre-constrained integrity basis still represents a broad class of data-driven turbulence modeling and previous many works (e.g., Jiang et al., Weatheritt and Sandberg, and Ling et al. ) are its special cases. It could be expected that the class of methodology is very useful both in improving the existing models and in assisting to develop new models. Thus, it is naturally a logical starting point to explore a controlled version. Overall, this work seeks to develop a universal, inherently interpretable ML-based framework (IMLF) of turbulence modeling with build-in fundamental principles, under which a physics-informed residual network (PiResNet) is achieved. The remainder of the paper is structured as follows. Section II introduces some fundamental aspects of IMLF, focusing especially on integrating structural domain-knowledge into PiResNet, e.g., feature selection as inputs, physical constraints of outputs and a ML architecture by analogy to tensor modeling. Section III covers a fair-awareness and robust training strategy for PiResNet, aiming at achieving a high-fidelity representation of turbulence closures without imbalance-induced generalization prejudice and lower noisy-sensitivity. Section IV concerns PiResNet performance on prediction stages, including extrapolative accuracy, realizability and robustness. Finally, Section V draws the main findings and highlights some remaining key challenges for future extensions of this work. II. Methodology: Proposal with Physical Perspective

For incompressible turbulent flows, the RANS momentum equations can be stated as p kt  +   +  −  = −   u u u u a (1) where 𝒖̅ , 𝑝̃ , 𝑘 , and 𝜈 are the mean-flow velocity, nominal pressure (including kinematic pressure and isotropic part of Reynolds stress), turbulent kinetic energy, and kinematic viscosity, respectively. The stress anisotropy tensor 𝒂 ≡ 𝝉/(2𝑘) − 𝐈/3 where 𝝉 and 𝐈 are respectively the Reynolds stress and identity tensors, represents effects of unresolved turbulence and needs closure modeling. For the closuring purpose, the stress anisotropy can only be related to the mean-flow field. Despite some progress

3, 42-45 in recent years, efforts in conventional turbulence modeling have considerably diminished. There may be the following reasons. It is very difficult to derive, in theory, an accurate analytic model only based on developer’s limited knowledge, experience, and intuition. Even if a more accurate model-form is achieved, its potential benefits still depend on a set of coefficients to be determined. Jiang et al. have demonstrated in details that conventional methods to calibrate the closure coefficients are facing many disadvantages. The existing rich data from high-resolution simulations and experiments, on the other hand, is creating favorable conditions to inform closure models. ML algorithms have powerful capability in distilling more informative knowledge from the raw data. Thus, the present work is motivated to develop a ML-augmented framework by fusion of data and domain-knowledge to reconstruct a mapping from the mean-flow field to the stress anisotropy. A. Summary of the proposed IMLF framework

Turbulence modeling, at RANS level, is a classic example of multivariate problems where the stress anisotropy tensor must be modeled to represent nonlinear interactions between mean-flow and turbulent motion in different directions. Tensor modeling is well known to a good physical option in multivariate problems due to its objectivity. Generally, a given multivariate problem can be formulized to a polynomial of a set of integrity basis with corresponding closure coefficients, just like Lumley and Pope did. Accordingly, we design two parallel ML-based modules, as depicted in Fig. 1, to be responsible for distilling integrity basis and closure coefficients from high-fidelity data, both of which are integrated through a multiplicative layer to constitute the stress anisotropy tensor. In either module, there is enough flexibility to build a desired DNN architecture, e.g., plain neural network, residual neural network

46, 47 and Bayesian neural network, thus indicating a universal, inherently interpretable ML-based framework for turbulence modeling (denoted as IMLF). According to the well-defined concept by Rudin, the interpretability of the resulting IMLF, first and foremost, lies in its constrained model-form as : , : , , f f = q g Q a g (2) where { , , } I I = q and (1) (2) { , , } = Q S S are two vectors of scalar-valued input features i I and tensor-valued input features ( ) i S ; { , , } g g = g and (1) (2) { , , } = T T are two vectors of learned scalar-valued closure coefficients i g and tensor-valued integrity basis ( ) i T . The data-driven model represented by Eq. (2) has a structure of ( ) ii g =  a T , which helps easily interpret what IMLF has learned from informative data. f and f account for parametric and structural uncertainties, respectively. The modeling process is straightforward. When integrity basis ( ) 101 { } i i − T are constrained to those of Pope, IMLF degrades to a general form of conventional algebraic models. Thus, IMLF is a more universal framework than existing models based on the representation theorems (Hilbert basis theorem and Cayley-Hamilton theorem ) and has more flexibility. To a great extent, the transparency of IMLF gains the superiority of symbolic regression based on evolution algorithms.

12, 13

In contrast, there is no analytic form to interpret the structure of learned model discrepancy by other two classes of data-driven approaches: correcting transport equations by Duraisamy et al. and eigenspace-projected Reynolds stress by Xiao et al.

Second, IMLF obeys prior domain-knowledge that we inject into IMLF hereafter, such as invariant properties (rotational, reflectional, extended Galilean and scale invariances), corrective turbulent timescale, realizability constraints, unbiased principles, and robust regularizations. Alber et al. conducted a systematic investigation of ML-based multiscale modeling with no fundamental laws of physics and received non-physical solutions and ill-posed problems, indicating the significance of domain-knowledge. All prior domain-knowledge is thus reasonably converted into modeling knowledge of IMLF during our model development. However, some aspects are missing or mistaken in previous works.

13, 15, 16, 28

Third, the unbiased and robust learning strategy adopted in this work help interpret how IMLF achieves fair-awareness and noisy-insensitive predictions. To develop an inherently interpretable model is of critical importance for physical problems. As emphasized by Wu et al., interpretability requirements are intimately related to the ultimate barrier in turbulence modeling: Can we achieve a universal turbulent constitutive relation across multiple classes of flows? However, interpretability has not been reasonably considered. IMLF is making attempts to bridge the gap. After designing IMLF, we first utilize training data to establish a data-driven surrogate model of the stress anisotropy by minimizing the following objective-function to determine the optimal trainable parameters   of the adopted ML algorithm ˆarg min{ }, g wF   = − + + a g g w   (3) where F  ,  , and  denote Frobenius norm, L -1 norm, and L -2 norm, respectively. ˆ a is targeted stress anisotropy from high-fidelity data. The first term in the objective-function is the training error representing the quality of training. The second term is a sparsity-inducing penalty on the closure coefficients, and the third term is noisy-insensitive (i.e., robust) penalty on the trainable parameters of the ML algorithm. g  and w  are penalty factors and trainable parameters are updated using the mature backpropagation algorithm. More details in Section III. The resulting data-driven model reads ( , ; ). f f  = a q Q  (4) Then we insert the learned model into the computational fluid dynamics (CFD) environment represented by Eq. (1) and evaluate its predictive performance for quantities of interest (QoIs) in unseen flows. The overall procedure is briefly summarized as follows: (1) Data collection and preparation. Collect representative datasets for training and testing flows from high-resolution experiments and high-fidelity simulations, and reformat all raw data with a unified standard. (2)

Feature selection and computation. Select input features reasonably according to the general principles (e.g., mean-flow field variables, invariance properties, compactness) and compute input features and targeted stress anisotropy both for training and testing flows. (3)

Data sampling and model training. Develop a fair sampling technique to obtain a training dataset with enough sample diversity and design a fair and robust learning strategy (e.g., unbiased outcomes for different flow domains, stress components and coordinate systems, noisy-insensitivity to input features) to achieve a desired data-driven model. (4)

Prediction and evaluation. Perform the learned model in unseen flows and evaluate its predictive skills (e.g., efficiency, accuracy, robustness). The overall workflow of IMLF is shown in Fig. 1. Prior domain-knowledge such as physically-motivated selection of input features and corrective turbulent timescale considering viscous effects, it should be noted, is considered in the design phase (Step 2). Realizability properties to constrain the learned model are enforced both in the training (Step 3) and testing phases (Step 4). A scale- and frame-invariant objective-function as well as regularization constraints both on trainable parameters and closure coefficients are implemented in Step 3. Finally, the propagation test of modeled stress anisotropy to QoIs in predictive settings is conducted in Step 4. By fusion of informative data and domain-knowledge, the developed model under IMLF is both data- and knowledge-driven. Each component is detailed hereafter.

FIG. 1. The development and evaluation lifecycle of the interpretable machine-learning framework (IMLF) for turbulence modeling containing three phases: a design of IMLF integrated with domain-knowledge (Phase I), a fair and robust training strategy for IMLF (Phase II), and a systematic performance testing of IMLF (Phase III).

Using the procedure described above, the mappings : f q g and : f Q are eventually learned as regression functions. Rather than stochastic functions by Gaussian process or Bayesian inversion

8, 11, 56 , we aims at developing a deterministic predictor using DNNs. ML-based deterministic modeling is computationally cheaper than stochastic modeling. Besides, stochastic modeling always depends on data space when using Gaussian process or on parameter space when using Bayesian inversion, both not on feature space. Such models have no physical content other than the assumption of smoothness. Stochastic models also encounter poor extrapolation capability due to their geometry-dependence. Finally, stochastic models targeted at the mean-flow velocity do not satisfy realizability constraints for stress anisotropy. In comparison, deterministic modeling using DNNs can move beyond these limitations. In particular, all reasonable principles regarding conventional turbulence modeling are strictly respected in data-driven turbulence modeling under IMLF. Most importantly, given rotational and reflectional invariances of RANS equations represented by Eq. (1), the overall regression functions f f representing the data-driven model must be form-invariant w.r.t. rotations and reflections of the coordinate system. That is, a transformation of f f under arbitrary orthogonal tensor R ( T T = = I RR R R and det 1 =  R ) should satisfy the relation as   ( , ; ) ( , ; ) . T T f f f f   = q RQR R q Q R   (5) The superscript “ T ” means transpose of a tensor. It should be noted that RANS equations obey invariances under rotational and reflectional transformations of the coordinate system while none of terms in RANS equations (including the stress anisotropy tensor) are rotational or reflectional invariances. Accordingly, we seek to the rotational or reflectional invariances on regression functions f f , not on the stress anisotropy itself. Obviously, the scalar-valued functions : f q g as functions of all frame-invariant scalars is invariant w.r.t. rotations and reflections of the coordinate system. Thus, Eq. (5) reduces to ( ) ( ). T T T f f  =

R R R Q R RQR (6) To satisfy Eq. (6), : f Q should be designed as tensor-valued isotropic functions of the arguments. Speziale et al. and Pope explicitly constructed such functions for pressure-strain correlation and the stress anisotropy, respectively. In comparison, Eq. (6) herein is represented by a ML-based module. However, it is always misunderstanding that the input-output equivariance can be guaranteed by choosing rotational and reflectional invariants as input features, for instance, that is just the case of Wang et al. and Wu el al. Considering that there is no explicit analytic form (as an aside, strictly existing but considerably complex to understand) in most data-driven turbulence modeling, one must take care of the construction of pre-defined functions by adopted ML algorithms. Ling et al. used data augmentation to preserve approximate invariances under rotational and reflectional transformations. Furthermore, regarding the dynamical processes of RANS equations, the Reynolds stress 𝝉 reduces from Euclidean invariance to extended Galilean invariance, not just Galilean invariance as usually stated. It is therefore easily deduced that the stress anisotropy 𝒂 is also extended Galilean invariant owing to extended Galilean invariance of both turbulent kinetic energy 𝑘 and Reynolds stress 𝝉 . When using isotropic functions, extended Galilean invariance of the predicted stress anisotropy 𝒂 can be easily guaranteed by constraining extended Galilean invariance to input features { , } q Q . However, input features of most previous works such as Xiao et al., Zhu et al., Yin et al., and Ling and Templeton, are not extended Galilean invariant (even some are not Galilean invariant), which could destroy the properties of predicted Reynolds stress and the extrapolation capability. Notably, Ling et al.

16, 39 also confused the two concepts of rotational/reflectional invariance and Galilean invariance (an invariance under an inertial frame with constant translations). Besides, dimensionless stress anisotropy is scale-invariant owing to the Reynolds-number similarity of RNAS equations. Nondimensional input features are therefore needed. Lastly, the stress anisotropy is bounded by a set of inequality constraints, i.e., realizability requirements, which have been comprehensively studied by Lumley and Banerjee et al. However, these constraints are not always satisfied using existing limiters, e.g., Ling et al. (shown in Section IV. A). All four considerations mentioned above are important prerequisites of modeling stress anisotropy and presented below. We focus on clarifying some existing unreasonable practices. B. Data-driven and knowledge-driven aspects

A large amount of available data does provide a special opportunity for ML algorithms to inject new vitality into turbulence modeling. However, this does not mean that existing knowledge about turbulence modeling should be shelved. Alber et al. demonstrated that pure data-driven models (in biological, biomedical and behavioral sciences) always encounter difficulties in achieving reliable and interpretable results. On the one hand, although data contains rich information about a physical system to be modeled, a complete dataset or a complete set of input features is not always available in practice. Complete information thus cannot be fed into the data-driven model. Some redundant and irrelevant input features even play the opposite role. Even if such a complete and compact dataset and input features are eventually achieved, on the other hand, it is still difficult for ML algorithms to directly learn all precise physical relationships, especially inequities and qualitatively physical properties. For instance, pure data-driven models can easily obtain sufficiently accurate predictions for stress anisotropy components (let alone nonzero training error for the extrapolation consideration), but difficult to satisfy the realizability constraints which provides a set of inequalities for stress anisotropy components. Finally, without any prior domain-knowledge, pure data-driven models will pay the price in other ways. Wu et al. and Ling et al. used data augmentation to preserve invariances of regression functions which significantly increased the computational cost. More than that, the pure data-driven methodology is prone to encounter convergence difficulty in training stages and potential failure risk in prediction stages (not intrinsic invariances). It is worth noting that pure data-driven models have been facing ill-posed problems (due to thousands of network parameters to be determined) and non-physical solutions. Thus, existing domain-knowledge must be reasonably considered in synergy with data-driven aspects to accelerate turbuelence modeling. The present work is motivated to develop a data- and knowledge-driven turbulence model. Section II.A outlines what domain-knowledge will be embedded in the data-driven model and how domain-knowledge helps achieve a inherently interpretable model. Addtionally, domain-knowledge introduced in this work helps IMLF gain other benifits: (i) siginificant reduction on the need for training data with invariaces, (ii) preserving phsyical solutions with realizability constraints, (iii) improvements on fair predictions with a fair leanring statrgy, and (iv) improvements on robust predictions with regularization of both trainable parameters and closure coefficients. In the following, we are centered about how to intergrate these domain-knowledge into the developed model. C. A general principle to select input features

The present work seeks to a deterministic model depending on feature space rather than a spatial-independent stochastic model. Thus we need to construct proper input features from the mean-flow field { , , , , } p k   u where  u , p  , and  are mean velocity gradient, mean pressure gradient, and the dissipation rate of turbulent kinetic energy, respectively. The fundamental principle to select input features { , } q Q is according to the requirements of targeted stress anisotropy 𝒂 and the properties of regression functions f f . It should be noted that the stress anisotropy 𝒂 rather than Reynolds stress 𝝉 is selected as the predicted target. Two dynamically similar flows share the same 𝒂 but different 𝝉 . Thus, one can benefit from two aspects when using 𝒂 as the predicted target: (i) a more compact training data to reduce computational cost, and (ii) a scale-invariant objective-function to guarantee the validity of hyper-parameters in ML algorithms across dynamically similar flows. For targeted stress anisotropy 𝒂 , there are four physical principles: (i) extended Galilean invariance, (ii) scale invariance, (iii) symmetry, and (iv) realizability constraints. To satisfy rotational and reflectional invariance, f and f are isotropic functions “implicitly” represented by ML-based modules. Thus, the input features must obey the same principles as targeted stress anisotropy 𝒂 except for the last principle. Realizability constraints are a set of inequalities among stress anisotropy components, thus being considerably difficult to satisfy by constructing input features. Instead, realizability constraints are directly enforced to predicted stress anisotropy 𝒂 . Besides above-mentioned requirements, the selected input features must have clear physical justifications. Symmetric tensor of the mean-flow velocity gradient ( ) / 2 T   +  S u u representing strain rate and antisymmetric tensor ( ) / 2 T   −  u u  representing rotation rate are most widely-use3d flow variables in flow analysis and conventional turbulence modeling. Considering that two training and testing flows are homogenous in the streamwise direction, only p  is excluded, leading to a general mapping from the mean-flow field to the stress anisotropy ( , , , , ). k = a a S  (7) Arguments of Eq. (7) S ,  , k ,  , and  are extended Galilean invariant. Regarding extended Galilean invariance, there are two things to be clarified. One is that extended Galilean invariance of input features have been rarely considered reasonably to reproduce extended Galilean invariance of the stress anisotropy. In previous many works such as Zhu et al., Wang et al. and Ling and Templeton , several input features are not Galilean invariant (also not extended Galilean invariant); part of input features by Wu et al.

29, 30 are not extended Galilean invariant. Such input features do destroy the physical properties of the stress anisotropy, i.e., qualitatively incorrect. Another is an interesting but somewhat controversial topic about the validity of frame-indifference in turbulence modeling. Earlier works of Speziale and Charles arrived at erroneous conclusions, not considering the dynamical process, that 𝒂 is frame-indifference – a guiding principle in continuum mechanics put forth by Noll. Consequently,  was excluded in turbulence modeling because it is not frame-indifference. Later, Lumley argued, based on experimental observations, that  should be introduced into the constitutive model to account for effects of rigid rotation on Reynolds stress. Wu et al. hold the same idea, indicating that  only exists in some special flows. However, Huang and Durst recently demonstrated analytically that 𝒂 is not frame-indifference, implying that  can serve as an effective constitutive argument in turbulence modeling and not just in some flows. Thus, 𝒂 is not independent of the observers. For integrity basis inference : f Q , input features and outputs should be dimensionless and extended Galilean invariant. Outputs also should be symmetric and realizable. Thus, we select / k  s S and / k  ω  as input features. Specifically, we constrain regression functions ( ) 102 1 : { , } { } i i f = = = Q s ω T to a general form of algebraic models by Pope , thus achieving a special version of IMLF as g g g g = + − + − + − + I I a s s s sω ωs ω ω (8) where tr( )  denotes the trace. For coefficients inference : f q g , we select traditional flow variables (related to the traces) as input features. See more details in Refs.

16, 41

For instance, first two terms m kl kl s s s  and m kl kl ω ω ω  are extended Galilean, rotational/reflectional and scale-invariants and related to the positive second invariant of  u , i.e., ( ) / 4 c m m Q ω s  − , which is a criterion to detect vortical structures. It is worth noting that Eq. (8) cannot achieve satisfactory predictions only depending on conventional tensor invariants ={ , , } m m s ω q and an additional input feature must be introduced. In Section II.D, the reasons of the failure of Eq. (8) are detailly analyzed. Previous works of Ling et al. and Weatheritt and Sandberg encounter this difficulty, which is also noticed recently by Geneva and Zabaras. The present work provides a solution. Additionally, the requirement that input features must be dimensionless is also from ML algorithms to solve physical problems. It is very difficult for ML algorithms to guarantee the expected dimension of targeted outputs when using dimensional input features due to the uncontrollable and complex nested functional form of ML algorithms. Such a model is qualitatively incorrect even if a good prediction may be provided numerically. This aspect is always ignored. Finally, input features should be compact and complete for physical problems. Due to the lack of prior knowledge about arguments, it is difficult to obtain ideally complete input features. One possible way in the near future is to perform feature compression and extraction from redundant input features using autoencoder. With more input features, the learned model is more complex and more difficult to understand. For physical problems, models are expected to be as simple as possible which is well-known as “Occam's Razor”. The performance of our model is significantly improved by adding only one input feature as shown later. It cannot be overemphasized that the construction of input features is of critical importance when applying ML algorithms to physical problems. When physical properties of input features are not satisfied, data augmentation as an alternative results in noncompact training data and an increase in computational cost. Ling and Templeton have built a procedure to select input features and Wu et al. have extended it. The present work focuses on the general principles and perfect them. D. Preserving uniqueness with a new timescale

To establish a deterministic ML-based predictor (i.e., the data-driven model), a unique mapping from selected input features to targeted outputs in the training dataset is a fundamental prerequisite. However, using { , , } m m s ω = q and { , } = Q s ω as input features cannot satisfy this condition in turbulent flows. We perform Eq. (8) in two-dimensional fully-developed channel flows to demonstrate the drawbacks and analyze the causes. In such flows, only / dU dy ( U is the mean streamwise velocity and y denotes the wall-normal direction) exists. Accordingly, all nonzero mean-flow quantities are listed below

12 21 12 21 m m m m k dUs s s s dy     = = = − = = = (9) The flow is controlled by the dominant shear stress a . Substituting Eq. (9) into Eq. (8) leads to

212 1 6 m m m a g s g s = − (10) where coefficients g and g are ultimately dependent on m s . Thus, the resulting Eq. (10) is expressed as a one-variate function of m s . In contrast, Fig. 2(a) gives the true relationship between a and m s at Re  = , exhibiting unexpected nonunique mappings, i.e., ( ) ( ) m A m B s y s y + + = ,

12 12 ( ) ( )

A B m my y a s a s + +  . Obviously, Eq. (10) is inconsistent with Fig. 2(a). As a result, Eq. (8) cannot be used as the regression function to establish a functional mapping from selected m s to targeted a given a training dataset. So what will happen when Eq.(8) is trained on this inherently inconsistent data? Data-driven models of Ling et al. (denoted as TBNN) and Weatheritt and Sandberg can also be represented by Eq. (8). The only difference is the truncation order: the former is fully complete while the latter is 3. We take TBNN as an example to quantitatively illustrate the drawbacks of original Eq.(8). We perform two training scenarios: TBNN-1 is trained on the whole data while TBNN-2 is trained using the data far from the wall ( y +  ). Both results are shown in Fig. 2(b). TBNN-1 shows worse training performance than TBNN-2 (nonunique mappings only in the circle-marked region as shown in Fig. (1)), indicating that nonunique mappings in the training dataset make training more difficult. Additionally, the training process of TBNN-1 shows bias toward the region where more data exists, as shown in Fig.2 (b), because more data means bigger contribution to the overall training error. Obviously, TBNN-1 is going to get worse when the amount of training data is equal in regions far from the wall and near the wall, and at the same time achieves equally bad predictions for both regions. It is noted that this is why we design a fair data sampling technique to achieve fair predictions for different regions in Section III.B. Lastly, both trained TBNN-1 and TBNN-2 provide worse predictions of a in the same flow where only meshes are different from those in training cases, indicating that TBNN-1 and TBNN-2 merely fit on the existing data and have no generalization capability. Recently, Geneva and Zabaras also noticed that TBNN is difficult to train and yields satisfactory predictions. This problem is common for models based on Eq. (8). FIG. 2. Two-dimensional fully-developed channel flow at 𝑅𝑒 𝜏 = 5200 : (a) the shear stress anisotropy 𝑎 against characteristic strain rate 𝑠 𝑚 from DNS, and (b) two TBNN-predicted shear stress anisotropy 𝑎 𝑠 𝑚 with different meshes from that of DNS. From the mathematical standpoint, the number of input features is not enough. Due to { , } { , }

A B y y + + = s ω s ω , ( ) 102 1 : { , } { } i i f = = = Q s ω T provides the same value for the integrity basis at locations A and B, i.e., ( ) 10 ( ) 101 1 { } { }

A B i ii iy y + + = = = T T . To return different values for a at locations A and B as expected in Fig. 2(a), ( ) 10 ( ) 101 1 { } { } A B i ii iy y g g + + = =  must be guaranteed. However, Eq. (8) always provides the same value ( ) 10 ( ) 101 1 { } { } A B i ii iy y g g + + = = = due to { , , } ={ , , } A B m m m my y s ω s ω + + . Obviously, Eq. (8) returns the same prediction at locations where m s is the same. Thus, an additional input feature I should be introduced, which must be rotational/reflectional, extended Galilean and scale-invariant, and satisfy the following relation under any ( ) ( ) m A m B s y s y + + = as ( ) ( ). A B

I y I y + +  (11) From the physical standpoint, an inappropriate turbulent timescale is chosen in Eq. (8). There are two assumptions to derive Eq. (8) from Eq. (7): (i) the feature  is excluded as an argument, and (ii) the macro timescale of turbulence / k  is chosen as the flow timescale from the wall to the fully turbulent region. Lumley showed that Reynolds stresses are uniquely related to the mean-flow fieled and macroscale of turbulence for a high-Reynolds-number nearly homogeneous flow. This is why conventional turbulence models offer good predictions only in the regions far from the wall. Jiang et al. made a comparative study to demonstrate this issue. In the near-wall region, a correction is needed to account for the dominant viscous effect. According to the expansions of turbulent velocity field near the wall by Hanjalic and Launder, the turbulent timescale (i.e., the ratio of the length scale of the energy-containing eddies to the turbulent velocity scale) approaches a nonzero value. However, the commonly-used timescale / k  vanishes at the wall due to lim 0 y k → = . The timescale is expected to be the Kolmogorov timescale /   because viscous dissipation dominates near the wall. Thus, we define a new timescale as

22 22 , 1 , tI t t ck kc Re      + =  + (12) where ( 0) t c  is a weighting parameter and / t Re k v  is turbulent Reynolds number which is widely used in low-Reynolds-number models. ( 1)   reflects the ratio of the new timescale to conventional timescale. The new timescale I  has good properties: (i) it recovers to standard / k  at large t Re , and (ii) it never falls below / ( 1) t t c Re   . It is worth noting that I  can prevent the singularity at the wall due to the vanishing of / k  . With I    = s S s and I    = ω ω  , Eq. (8) can be rewritten as g g g g             = + − + − + − + I I a s s s s ω ω s ω ω (13) all closure coefficients are determined using : f   q g where ={ , , } m m s ω    q , m m s s  = , and m m   = . Thus, in two-dimensional fully developed channel flows, a of Eq. (13) eventually depends on both m s ( m m s = ) and t Re at a given t c . In the following, we examine whether t Re confronts the nonunique mapping problem. As Fig. 3(a) shown, at any two locations with the same strain, i.e., ( ) ( ) m A m B s y s y + + = , there always exist two different values for t Re , i.e., ( ) ( ) t A t B Re y Re y + +  . Similarly, at any two locations with ( ) ( ) t A t B Re y Re y + + = , there always exist ( ) ( ) m A m B s y s y + +  . That is, ( ) ( ) m A m B s y s y + + = and ( ) ( ) t A t B Re y Re y + + = do not occur at the same time. The selection of t Re satisfies the condition of Eq. (11) and successfully overcomes the nonunique input-output mappings in conventional models represented by Eq. (8). In Section IV, we will test its performance quantitatively. The last question is how to determine the value of t c in order to directly utilize Eq. (13). We note that  means a correction for conventional turbulent timescale / k  . Thus, a different value of t c means a different range where the correction is activated. In the viscous sublayer ( y +  ), k cy = where c is inversely proportional to the square of timescale. We set t c c v = with the Kolmogorov timescale in the vicinity of the wall. At the same time, the transport equation of k reduces to / v k y   = at the wall. Combining above relations yields t c = . This derivation implies an assumption: a correction takes place only in the viscous sublayer. We examine this value with DNS. The variation of  at different values of t c based on DNS are shown in Fig. 3(b). Obviously,

2( 1) t c n = = in Fig. 3(b) corresponds to a correction within y +  (slight departure from y +  ). Besides,

4( 4) t c n = = and t c n = = corresponds to corrections within y +  and y +  , respectively. However, there is no prior knowledge about the exact upper bound below which a correction should be applied. Consequently, a strictly reasonable value of t c may not be available. Thus, Eq. (13) cannot be directly used for turbulence modeling because input features { , , } m m s ω   = q and { , }   = Q s ω are undetermined. To overcome this difficult, we reform Eq. (13) as c c c c = + − + − + − +

I I a s s s sω ωs ω ω (14) where the closure coefficients c g   , c g   , c g   and c g   can be eventually expressed as functions of { , , } m m s ω and t Re if t c is known. The advantage of Eq. (14) is that the dependence of Eq.(13) on t c is entirely imputed to the closure coefficients { } i i c = . The unknown parameter t c can be further implicitly considered by the ML-based modules when using { , , } m m s ω and t Re as input features to directly regress { } i i c = . Now the problem is closed up. We also note that, although Eq. (14) is directly used for data-driven turbulence modeling herein, the physical meaning of Eq. (14) lies in Eq. (13) which interprets why to select t Re as one of input features. E. Enforcing realizability constraints to outputs

As noted previously, realizability is a physically valuable guiding principle in developing turbulence models which is useful in numerically stable predictions. Efforts have been made to achieve realizable solutions in conventional models, however less attempts are made in data-driven turbulence modeling. We directly enforce realizability constraints to the targeted stress anisotropy rather than to closure coefficients themselves. An implementation on closure coefficients may be practicable in conventional models, but considerably difficult in data-driven models because closure coefficients are “implicitly” represented by a ML-based module. On the other hand, for data-driven models directly reconstructing the stress anisotropy, it is also needed to directly implement realizability constraints on the stress anisotropy. Thus, we develop a general procedure. FIG. 3. Two-dimensional fully-developed channel flow at 𝑅𝑒 𝜏 = 5200: (a) the shear parameter 𝑠 𝑚 and turbulent Reynolds number 𝑅𝑒 𝑡 along the wall-normal direction 𝑦 + , and (b) the ratio of new turbulent timescale to macro timescale along the wall-normal direction 𝑦 + . Results at different values of the weighting parameter 𝑐 𝑡 are shown. Schumann showed that realizability constraints contain a set of inequalities, however they are unfriendly to provide an effective correction strategy in computational codes. Instead, we select the following inequalities a   − ， (15a) a a a     + + ， (15b)   − ， (15c)   − ， (15d) where ( 1, 2,3) i i  = are eigenvalues of the stress anisotropy 𝒂 ( )     . It is noted that no summations are taken for Greek indices. The above conditions are mathematically rigorous and physically important. The first condition Eq. (15a) prevents negative energy. The second condition Eq. (15b) originates from the Schwarz inequality and offers the upper bound for the shear stresses. The remaining conditions Eqs. (15c)-(15d) are supplied to ensure the Reynolds stress positive semidefinite. Thus, Eqs. (15a)-(15d) are sufficient to satisfy the realizability constraints. Despite selecting constrained conditions, a correction strategy should be further developed when Eqs. (15a)-(15d) are not completely satisfied. Accordingly, we propose a progressive iteration realizability (PIR) scheme as the corrector which is detailed in algorithm 1. The PIR scheme is applied to both training and testing phases, thus achieving a physically consistent and numerically stable data-driven model. In comparison, conditions adopted by Ling et al. are under-constraint. A comparative testing will be conducted in Section IV.A. III. Training: Data Preparation and Learning Strategy

In this phase, we provide a systematic procedure to train a data-driven turbulence model using ML algorithms, especially focusing on how to design a fair and robust learning strategy in order to achieve unbiased and noisy-insensitive predictions. Fairness in data-driven turbulence modeling is a completely new concept although it has been an important consideration in many fields, such as high-risk decision-making scenarios. As a domain-specific concept, fairness herein refers to the absence of any prejudice or favoritism in predicted outcomes toward specific flow domain, stress component and coordinate system . However, the problem of unfairness is essentially the same in data science, which arises from the hidden or neglected biases in selection of training data or objective-function design. For instance, we expect to improve the predictive performance of a model in a certain flow domain, however we may use fewer data in this flow-domain to train this model. It is a simple truth that less data means less contribution to the overall training error and thus lower accuracy. Obviously, fairness is of critical importance especially for data-driven turbulence modeling because obtaining a data-driven model heavily depends on the training strategy based on training data. We present two kinds of fair measures, i.e., a clustering-based data sampling technique and a fair objective-function design. Another concern is the robustness of a data-driven model. Robustness herein refers to the insensitivity of a data-driven model to a certain degree of noise or perturbation in input features . To demonstrate this problem, we conduct a brief review about how it is settled down in conventional turbulence models. Take the explicit algebraic stress model developed by Gatski and Speziale as an example, the turbulent viscosity can be expressed as follows:

22 2 2 2 2 2 2 2 cc c c c c c c c         + − + + + + + + ， (16) where c  and c  are strain-dependent coefficients. These three algebraic relations are nearly identical for turbulent flows that are close to equilibrium. However, the right-hand side of Eq. (16) exhibits the best numerical stability because the resulting model responds more insensitively to the small perturbations in the strain. The key lies in noisy-insensitivity of modified turbulent viscosity to strain-dependent coefficients. Jiang et al. summarized the general principle for conventional explicit turbulence models. In comparison, there have not been such considerations in “implicit” data-driven turbulence modeling. Our work is an initial attempt. A. Representative dataset for training and testing

Representative datasets of turbulent flows should be carefully selected to investigate the predictive skills of a data-driven turbulence model in capturing different physical phenomena in turbulent flows. The two-dimensional fully-developed channel flow is a general representative of pressure-driven wall-bounded flows. This flow is dominated by the shear stress and has nonlocal effects. Most conventional turbulence models cannot offer approximate predictions in near-wall region. We use this simple parallel shear flow to demonstrate whether additional input feature t Re can capture the flow characteristics near the wall. Four DNS datasets at various friction Reynolds numbers (based on the friction velocity 𝑢 𝜏 and half-channel height h ) are used: (i) 𝑅𝑒 𝜏 = 650 from Iwamoto et al., (ii) 𝑅𝑒 𝜏 = 1000 from Bernardini et al ., (iii) 𝑅𝑒 𝜏 = 4200 from Hoyas and Jimenez, and (iv) 𝑅𝑒 𝜏 = 5200 from Lee and Moser. Additionally, three-dimensional fully-developed duct flow is also selected as a canonical example due to its rich characteristics (e.g., secondary vortical structure) and extensive engineering interest. This flow is homogeneous in the streamwise direction, 𝑧 and 𝑦 denote the directions parallel to the horizontal and vertical walls. Six DNS datasets at various 𝑅𝑒 𝜏 and aspect ratios (AR) are used:

83, 84 (i) AR=1, 3, 5, 7 at 𝑅𝑒 𝜏 =180 and (ii) AR=1, 3 at 𝑅𝑒 𝜏 = 360 . Both training and testing datasets are from these two flows. To utilize data from different sources, the raw data should be preprocessed by a unified standard. For two flows adopted herein, all quantities are nondimensionalized by different flow variables. In the following, we show how to reformat data as we expect and to compute the input feature 𝑅𝑒 𝑡 . For channel flows, all raw data are normalized by 𝑢 𝜏 and viscous length scale 𝜈/𝑢 𝜏 (signified with +), thus / k k u + = and / u    + = . So, we get / ( ) / t Re k k   + +  = and the result at 𝑅𝑒 𝜏 = 𝑢 𝑏 and h (signified with *), thus * 2 / b k k u = and / b h u    = . So, we get / ( ) / t b Re k Re k   = where / b b Re u h v  is the bulk Reynolds number. Other input features and targeted stress anisotropy are computed similarly. Representative input features 𝑅𝑒 𝑡 and 𝑠 𝑚 for duct flows is shown in Fig. 4. It is noted that m m s = ω in channel flows and m m s ω  in duct flows, thus redundancy feature m ω being omitted. FIG. 4. Representative input features (a)-(b) 𝑅𝑒 𝑡 and (c)-(d) 𝑠 𝑚 for duct flows at (a)-(c) 𝑅𝑒 𝜏 = 180 with 𝐴𝑅 = 1 and (b)-(d) 𝑅𝑒 𝜏 = 360 with 𝐴𝑅 = 3 . Only lower left quadrant of the duct is shown due to symmetry.

B. Clustering-based unbiased sampling technique

It is common knowledge that a ML-based training process could exhibit a preference toward a pattern that is shared by more training data, especially when adopting the commonly-used mean squared error as the objective-function. As a result, an unfair data-driven model is achieved. TBNN-1 in Fig. 2(b) is just the case, where TBNN-1 provides better predictive performance in the region far from the wall because there is more training data in this region. A “pattern” that covers more data accumulates larger error and thus makes bigger impact; in return this pattern is taken high-priority care of by the error backpropagation. Thus, an unbiased strategy is needed to ensure equal sample diversity from training data. A k-means clustering algorithm as an unsupervised ML classifier is employed here to assign the data into groups without prior knowledge. Considering that the stress anisotropy is targeted in the objective-function, the stress anisotropy is selected as flow features for clustering. Consequently, the k-means algorithm automatically categorizes the flow whole domain into two different subregions, as shown in Fig. 5, resulting in similar patterns within the same cluster, i.e., nearly homogeneous flows in a region far from the wall and nonhomogeneous flows in a region near the wall. Conventional turbulence models, as demonstrated by Jiang et al. , cannot account for near-wall flows. This clustering result, not surprisingly, reflects the effects of wall proximity and can be explained using layered eigenvalues of the stress anisotropy. As shown in Fig. 6, the spatial distribution of each eigenvalue is consistent with the clustering result in Fig. 5. The major role of adopted k-means algorithm is to quantitatively characterize the boundary between two subregions, which cannot be qualitatively extracted from Fig. 6. FIG. 5. Clustering results for duct flows at (a) 𝑅𝑒 𝜏 = 180 with 𝐴𝑅 = 1 and (b) 𝑅𝑒 𝜏 = 360 with 𝐴𝑅 = 3 . The whole domain is separated into two subregions: regions far from the wall (blue) and near the wall (yellow).

On the other hand, for two-dimensional channel flows, the clustering boundary approximately locates where the maximum shear stress takes place with a scaling law 𝑅𝑒 𝜏3 10⁄ . Chen et al. derived theoretically this place has a scaling law of 𝑅𝑒 𝜏1 3⁄ . From the above discussion, it can be seen that a clustering-based data sampling technique for data-driven turbulence modeling is not only a mathematical operation but also has a physical meaning. Accordingly, an oversampling technique based on this clustering technique is developed to rebalance equal sample diversity between two clustered subregions, and thus guarantee that different flow patterns are treated fairly in the training process. It is worth noting that the clustering-unbiased sampling technique herein will be very useful to develop a universal data-driven model across multiple classes of flows in the near future. FIG. 6. Eigenvalues ( 𝜆 , 𝜆 , 𝜆 ) of the stress anisotropy for duct flows at (a)-(c) 𝑅𝑒 𝜏 = 180 with 𝐴𝑅 = 1 and (d)-(f) 𝑅𝑒 𝜏 = 360 with 𝐴𝑅 = 3 . The first to third columns corresponds to 𝜆 , 𝜆 , and 𝜆 , respectively. Only lower left quadrant of the duct is shown due to symmetry. C. Neural network design and training experiments

Under proposed IMLF shown in Fig. 1, we use { , , } t m

Re s = q and { , } = Q s ω as input features and constrain the learned integrity basis to conventional tensor basis of Pope. This simplification aims to create a controlled environment in which main influence factors regarding IMLF are well behaved and thus the dynamical properties of IMLF can be explored in isolation in the near future. Moreover, data-driven inverse modeling under a set of constrained integrity basis is a proof-of-concept of IMLF as a transfer learning framework. In theory, Eq. (8) is valid only under the assumption of homogeneous strain rates. Here we show how to transfer it to nonhomogeneous flows (e.g., near-wall flows in a channel and secondary flows within a duct) with the aid of newly-proposed t Re . With these in mind, we develop our data-driven model. The main focus in the training phase is thus on the multicomponent regression problem defined as  ： q g under a set of targeted needs, i.e., requirements of a fair prediction for different flow domains and stress components, noisy-insensitivity of learned model to input features, spatial smoothness of predicted field, physical realizability of predicted components, and extrapolation capability. These needs are significantly important for physical problems. A ML-based procedure to settle down this problem herein integrates three critical steps: (i) setup a proper neural network architecture to provide a family of alternative functions for  with a set of trainable network parameters; (ii) design a reasonable objective-function to fully consider targeted needs; and (iii) select a good optimization algorithm to determine the trainable parameters. A specialized physics-informed residual network (PiResNet) under IMLF is developed to represent the mapping  ： q g on the training dataset , as shown in Fig. 7. PiResNet is based on a residual network architecture with fully-connected layers. This network is an acyclic cascade consisting of an input layer, several hidden layers, and an output layer. A skip connection is applied across several hidden layers to provide identity mapping, thus forming a residual block in which the overall operation can be expressed as ( ) i i i m F + + x x x and ( ) F  is the overall operation of m fully-connected layers contained in this residual block. In each fully-connected layer, ( ) i i i i  + + w x b x where i w and i b are trainable network parameters and  is the nonlinear activation function. He et al.

46, 47 has demonstrated, in image recognition, that the identity mapping achieves great empirical success to settles down two technical problems frequently encountered in deep learning, i.e., gradient vanishing and performance degradation. Considering the fact that there is little use of other networks other than plain neural network in turbulence closure modeling, the first use of this network enriches our understanding of neural networks. Recalling PiResNet, we cast the form-constraint inverse modeling as an optimization problem by minimizing the following objective-function ( ; , ; ) , c w a w w c c J     + + (18a) a a F a F a F N N N     −   −   −   I a a a a aa  (18b) w c N    w c (18c) where   , D ni i i = = w b  are a set of trainable parameters ( D n is the number of network layers), N is the number of samples in , w  and c  are regularization factors, and  is a fair factor matrix. a represents supervised data-driven aspects, for which we provide three alternatives in Eq. (18b) to achieve an unbiased prediction for stress anisotropy components. w represents the complexity of network, thus w   aiming to obtain noisy-insensitive closure coefficients and prevent overfitting; while the regularization constraint on c is motivated to ease ill-posed problems and reduce noisy-sensitivity of the learned model to integrity basis. Both w   and c   as well as PIR constrained on a are unsupervised physics-informed aspects embedded in the training phase. Finally, we adopt backpropagation algorithm to update network parameters and use Adam algorithm to compute objective-function gradients. The training procedure is summarized in algorithm 2, which serves as a fair and robust training strategy for PiResNet to achieve optimal network parameters   , thus ( , )   = g q  . The numeric range of input features and outputs should be limited in order to ease the updating of trainable network parameters. A logarithmic function log(1 ) x + is selected to rescale all input features, which makes 𝑅𝑒 𝑡 work better. In comparison, the learned model responses more sensitively to noise in input features when adopting commonly-used normalization with global means and variances, which should be avoided. We note that the benefits of logarithm functions may go far beyond feature scaling. Weatheritt and Sandberg found that the logarithm has been always successfully evolved in every best regression solution when using evolutionary algorithm. At the same time, integrity basis is also rescaled in order to adjust numeric range of learned closure coefficients, as shown in Fig. 7. Previous work by Jiang et al. demonstrated that closure coefficients do have a separation of order magnitude. Another concern is the selection of activation functions, which is based on following considerations: (i) smooth functions (infinitely differentiable) to account for the smoothness requirement of predicted stress anisotropy and (ii) sensitivity to negative values to prevent dead neurons. Thus, we choose Gaussian error linear unit (GELU) as the activator, which does outperform rectified linear unit (ReLU) and exponential linear unit (ELU) in our testing. GELU achieves smaller network weights and thus is more insensitive to noise in input FIG. 7. A schematic of invariant and realizable PiResNet based on a residual network architecture with fully-7 connected layers. All operations in the neural network are element-wise. A fair and robust training strategy for PiResNet is detailed in algorithm 2. features. Good properties of a gating mechanism by GELU are detailly investigated. We note that PiResNet training with GELU has a little higher computational cost, almost 8.4 GPU hours (GeForce GRT 2080). Besides, a comparative testing of plain neural network (16 hidden layers) and residual neural network (10 hidden layers) indicates that the latter achieves better and more robust predictions. Thus, residual neural network is selected as the resulting network. PiResNet training is performed with the open-source software library TensorFlow. The effect of the training dataset size on the performance of PiResNet is investigated with the cross-validation technique, as shown in Fig. 8(a). Three flow cases are used for training: a channel flow at 𝑅𝑒 𝜏 = 1000 and two duct flows with AR=1 at 𝑅𝑒 𝜏 = 180 and AR=3 at 𝑅𝑒 𝜏 = 360 ; and the remaining cases are used for testing, as shown in Fig. 8(b). It is noted that PiResNet trained only with a duct flow with AR=1 at 𝑅𝑒 𝜏 = 180 still achieves a good extrapolation to a duct flow with AR=3 at 𝑅𝑒 𝜏 = 360 , but slightly underpredicts shear stresses for a duct flow with AR=7 at 𝑅𝑒 𝜏 =180 . In comparison, the random forest model by Wang et al. trained with three duct cases at 𝑅𝑒 𝑏 =2200, 2600 and 2900 (respectively corresponding to 𝑅𝑒 𝜏 =150, 175 and almost 200) extrapolates to a duct flow at 𝑅𝑒 𝑏 =3500 ( 𝑅𝑒 𝜏 =225); the neural network model by Ling et al. trained with six classes of different flow configurations (including a duct flow at 𝑅𝑒 𝑏 =3500) transfers to a duct flow at 𝑅𝑒 𝑏 =2000 ( 𝑅𝑒 𝜏 =120). PiResNet does benefit from embedded prior domain-knowledge. Recalling data-driven aspects, the additional duct data helps capture the effect of large aspect ratio better while the additional channel flow helps further improve predictions at duct centerplane and capture the effect of large Reynolds number better. FIG. 8. (a) Training and validation errors (root mean squared error) as functions of the training dataset size ( d N ) and (b) The flow configurations adopted for PiResNet training and testing. We use both original data and clustering-based rebalanced data to train PiResNet (denoted as PiResNet-1 and PiResNet-2) and compare the fair prediction ability for two clustering subregions. The discrepancy of prediction errors between these two subregions are shown in Table I, where a smaller value means a fairer prediction. PiResNet-1 trained with imbalance data tends to have an unfair prediction and this unfair trend increases with increasing data imbalance. The ratio of sample numbers of two subregions in original data, i.e., 𝑟 𝑠 , is also shown in Table I to characterize the degree of data imbalance. In comparison, PiResNet-2 with rebalanced data does gain fairer predictions for different flow domains: a higher than 40% improvement in training dataset and a almost 25% improvement in testing dataset, especially a 69% rise for the case with AR=1 at 𝑅𝑒 𝜏 =360 . Thus, the clustering-based sampling technique is effective. It is noted that an absolutely fair PiResNet requires a more accurate clustering result to guarantee absolutely equal diversity of different patterns. Despite not absolutely fairness, the clustering-based sampling technique greatly mitigates the bias of a learned model toward a certain flow pattern which will play an important role in develop a universal data-driven model across multiple classes of flows in the near future. TABLE I. The discrepancy of root mean squared errors between two clustering subregions. PiResNet-1 and PiResNet-2 refer to PiResNet trained with original data and clustering-based rebalanced data. 𝑟 𝑠 denotes the ratio of sample number between two subregions in original data. The comparison of both models is conducted using the same training flows at almost the same level of the overall training error. AR=1 𝑅𝑒 𝜏 = 180 AR=3 𝑅𝑒 𝜏 = 360 AR=1 𝑅𝑒 𝜏 = 360 AR=3 𝑅𝑒 𝜏 = 180 AR=5 𝑅𝑒 𝜏 = 180 AR=7 𝑅𝑒 𝜏 = 180 𝑟 𝑠 PiResNet-1 −3 −3 −3 −2 −2 −2 PiResNet-2 −3 −3 −3 −2 −2 −2 Similarly, we further examine whether PiResNet achieves fair predictions for stress anisotropy components. Eq. (18b) provides three different objective functions for the first term of Eq. (18a), i.e., a , a , and a . In particular, the fair factor in a is properly selected by trial-and-error method as ˆ/ | | ij ij N a  =  . The comparative result is shown in Table II. a shows biased predictions for a and a , while a and a make improvements toward a fair prediction of all stress anisotropy components. In particular, the prediction of a is significantly improved. However, a is not frame-invariant and depends on the matrix  (not a tensor). For the same flow even as the training case, PiResNet with a provides different predictive accuracy when only different meshes are used. Thus, we choose a in order to form a scale- and frame-invariant objective-function. Consequently, the hyper-parameters can be valid across training datasets of dynamically similar flows only with different scales or different coordinate systems. TABLE II. The regression coefficients for three different ℒ 𝑎 defined in Eq. (18b) corresponding to ℒ 𝑎1 , ℒ 𝑎2 , and ℒ 𝑎3 , respectively. Results both for all training and testing flows are given. 𝒂 𝑎 𝑎 𝑎 𝑎 𝑎 𝑎 Training ℒ 𝑎1 ℒ 𝑎2 ℒ 𝑎3 ℒ 𝑎1 ℒ 𝑎2 ℒ 𝑎3 Furthermore, we search for optimal regularization factors w  and c  from −3 ~10 −7 and eventually select w  = −7 and c  = −6 . In this setting, PiResNet has smaller values both for the overall network parameters and closure coefficients, as shown in Fig. 9. The regularization constraint only on closure coefficients ( w  = , c   ) can hardly limit the network parameters (see Fig. 9(b)) and the regularization constraint only on network parameters ( c  = , w   ) make limited impact in closure coefficients (see Fig. 9(c)). Thus, regularization constraints both on network parameters and closure coefficients are necessary. It is worth noting that the regularization of the closure coefficients is critical in two aspects. On the one hand, the ill-posed problem that the objective-function gradients w.r.t. the closure coefficients are insensitive may occur when the strain is zero somewhere in the flow. Thus, we introduce additional information to regularize the solution by penalizing the total magnitude of the closure coefficients. To some extent, it also increases smoothness. On the other hand, the closure coefficients can be regarded as an amplifier to the uncertainty of integrity basis. Smaller closure coefficients reduce the noisy-sensitivity of the learned model to integrity basis. Similarly, smaller network parameters mean lower noisy-sensitivity of strain-dependent closure coefficients. PiResNet with w  = −7 and c  = −6 eventually achieves a mean value of 0.025 for each network parameter averaged on all neurons and 0.046 for each closure coefficient averaged on the spatial volume. Section IV.A shows their quantitative effects. Finally, the whole flowchart to search optimal PiResNet in this work can be summarized in Fig. 10. The resulting network has 5 residual blocks, and each block contains 2 fully-connected hidden layers with 128 neurons in each layer. FIG. 9. The learning dynamics of PiResNet with three sets of hyperparameters ( 𝜆 𝑤 , 𝜆 𝑐 ): (a) ℒ 𝑎1 representing the quality of data-driven training; (b) ℒ 𝑤 representing the complexity of network; and (c) ℒ 𝑐 representing the magnitude of closure coefficients. All training experiments are performed with ℒ 𝑎3 and evaluated with ℒ 𝑎1 . Note that ℒ 𝑐 is computed for rescaled closure coefficients due to rescaling integrity basis as shown in Fig. 7. FIG. 10. A flowchart to achieve the optimal PiResNet in this work.

IV. Testing: Numerical Results and Validation In this phase, as outlined in Fig. 1, a systematic evaluation of PiResNet-predicted outcomes from different aspects (e.g., extrapolative accuracy, realizability and robustness) is motivated to present evidence driving the trust toward PiResNet. Testing is performed on two classes of different flow configurations: (i) two-dimensional parallel-shear flows in a channel at various Reynolds numbers and (ii) three-dimensional secondary flows within a duct with various aspect ratios. It is well-recognized that conventional closure models lack accuracy both for near-wall flow of a channel (featured with nonhomogeneous effects due to wall proximity) and in-plane flow of a duct (featured with secondary motions induced by normal-stress imbalance). We also note, in these two flows under testing, the roles that Reynolds stress plays in the mean-flow field are different: shear stress dominates the flow in a channel while normal stress controls the secondary flow in a duct. Hence, the utility of PiResNet is demonstrated by applying it to these flows to capture the aforementioned flow characteristics and their sensitivities to Reynolds number and aspect ratio. To illustrate the superiority of PiResNet, we also compare it to the same-level model based on gene expression programming by Weatheritt and Sandberg (denoted as GEP). It is less fair to compare modern data-driven models to conventional algebraic models, e.g., that is the case in Ref. ; instead fair to the state-of-the-art model, at least less outdated models, which is also stressed recently by Musgrave et al . Besides, DNN (as the kernel of PiResNet) and GEP are two representatives of modern ML algorithms: the former is parametric regression based on connectionism while the latter is symbolical regression. These are why we take GEP as the baseline model. This is the first attempt, to our knowledge, to compare different ML-based turbulence models.

A. Extrapolative predictions of stress anisotropy

The section is centered around whether the well-trained PiResNet can successfully provide satisfactory predictions on unseen flows, i.e., its extrapolative capability. PiResNet-predicted stress anisotropy for channel flows at 𝑅𝑒 𝜏 =650, 2000 and 5200 are shown in Fig. 11 and compared with GEP and reference DNS. It can be seen that PiResNet is consistent with DNS in all stress anisotropy components. In comparison, GEP provides satisfactory predictions only for the shear stress and only in the region far from the wall, which is a slight advantage over conventional nonlinear models in this flow. Moreover, GEP, not surprisingly, returns the same stress for near-wall region as the region far from the wall due to inappropriate turbulent timescale. Introducing t Re , which parameterizes the newly-proposed turbulent timescale, does enable PiResNet to extend the resolution scope to the whole flow domain. Thus, the concept of Eq. (12) is effective. FIG. 11. The variation of stress anisotropy components ( 𝑎 𝑖𝑗 ) with nondimensional shear parameter ( 𝑠 𝑚 ) predicted by GEP, PiResNet, and DNS for channel flows at (a)~(d) 𝑅𝑒 𝜏 =650, (e)~(h) 𝑅𝑒 𝜏 =2000, and (i)~(l) 𝑅𝑒 𝜏 =5200. The first to forth columns correspond respectively to 𝑎 , 𝑎 , 𝑎 , and 𝑎 . The direction with increasing wall distance (w. dis.) is also marked. Note that some results of GEP are out of range and not shown. PiResNet-predicted stress anisotropy for duct flows at 𝑅𝑒 𝜏 =360 with AR=1 is shown in Fig. 12 and 𝑅𝑒 𝜏 =180 with AR=7 in Fig. 13. For brevity, results for duct flows with smaller aspect ratios (i.e., AR=3 and 5) at 𝑅𝑒 𝜏 =180 are omitted. Due to symmetry, only lower left quadrant of the duct ( −AR ≤ 𝑧 ℎ⁄ ≤ 0, −1 ≤ 𝑦/ℎ ≤ 0 ) is shown. PiResNet easily provides improvements on predictions of all stress anisotropy components while GEP provides relatively accurate predictions only for shear stresses 𝑏 and 𝑏 . An unfair prediction of normal stresses by GEP lies in its biased training objective-function. It is common knowledge that in-plane normal stress imbalance is of critical importance to capture secondary motions in duct flows. Besides, GEP lacks accuracy in the near-wall region. In comparison, PiResNet correctly captures the secondary mechanism and has a good predictive ability for near-wall flows. PiResNet successfully transfers to flows with different Reynolds numbers and aspect ratios, indicating that PiResNet does learn the underlying similar dynamical mechanism. It is worth noting that a fair training strategy enables PiResNet to improve predictions of 𝑎 . FIG. 12. Stress anisotropy components on the duct cross-plane predicted by GEP, PiResNet, and DNS for a duct flow at 𝑅𝑒 𝜏 =360 with AR=1: (a) 𝑎 , (b) 𝑎 , (c) 𝑎 , (d) 𝑎 , (e) 𝑎 , and (f) 𝑎 . The first to forth columns correspond to results of GEP, PiResNet, DNS and the absolute error between PiResNet and DNS. Note that this case is an extrapolation both on Reynolds number and aspect ratio. For simplicity, only lower left quadrant of the duct ( −1 ≤ 𝑧 ℎ⁄ , 𝑦/ℎ ≤ 0 ) is shown. The coordinate origin is located at the duct center. FIG. 13. Stress anisotropy components on the duct cross-plane predicted by GEP, PiResNet, and DNS for duct flow at 𝑅𝑒 𝜏 =180 with AR=7: (a) 𝑎 , (b) 𝑎 , (c) 𝑎 , (d) 𝑎 , (e) 𝑎 , and (f) 𝑎 . The first to forth columns correspond to results of GEP, PiResNet, DNS and the absolute error between PiResNet and DNS. Note that this case is a large extrapolation on aspect ratio. For simplicity, only lower left quadrant of the duct ( −7 ≤ 𝑧 ℎ⁄ ≤0, −1 ≤ 𝑦/ℎ ≤ 0 ) is shown. The coordinate origin is located at the duct center. B. Realizability capability

We further examine in the barycentric map (BMap) whether PIR corrector really works and whether a realizable solution is achieved by PiResNet. BMap as the non-distorted representation of anisotropy, linearly relates any anisotropy state to three limiting states , c c c c c c   = + +    (19) where c  , c  , and c  denote three vertices of BMap in representing the one component state (1C), two-component axisymmetric state (2C-axis) and three-component isotropic state (3C-iso). c  , c  , and c  are weighting factors of these three states, and determined by the eigenvalues of the stress anisotropy: , 2( ), 3 1. c c c        = − = − = + (20) Thus, comparative results for channel flows at 𝑅𝑒 𝜏 =650, 2000 and 5200 are shown in Fig. 14. PiResNet achieves good agreement with DNS and is successfully constrained within the well-defined BMap; however, GEP mismatches with DNS especially in the region far from the wall. GEP shows more obvious bias toward 3C-iso state than PiResNet, thus indicating a poor characterization FIG. 14. Scatter plot of the stress anisotropy in the barycentric map for channels flow at (a) 𝑅𝑒 𝜏 =650, (b) 𝑅𝑒 𝜏 =2000, and (c) 𝑅𝑒 𝜏 =5200, scaled with the corresponding wall distance (w. dis.). Results of GEP and PiResNet are compared with DNS. Note that some results of GEP are out of range and not shown. of anisotropy. Further results for duct flows at 𝑅𝑒 𝜏 =360 with AR=1 and 𝑅𝑒 𝜏 =180 with AR=7 are shown in Fig. 15 and Fig. 16. It can be seen that PiResNet always provides realizable solutions and agrees well with DNS. In comparison, realizable anisotropy is obtained by GEP only in the region far from the wall and, in particular, GEP returns to a 3C-iso state as the wall is approached. As demonstrated by Weatheritt and Sandberg , the qualitative incorrectness is due to inappropriate use of timescale. With Eq. (12), our new timescale I  never tends to zero as the wall is approached. Without the PIR corrector, our proposed PiResNet provides realizable solutions in most flow region other than the near-wall region. To demonstrate the effectivity of proposed PIR corrector and its advantage over that of Ling et al., both realizability correctors are applied to GEP-predicted stress anisotropy. As a result, our proposed PIR corrector easily returns realizable solutions while that of Ling et al. is still under-constraint, thus indicating that the PIR corrector is effective and can be extended to other turbulence models for realizable solutions. FIG. 15. Scatter plot of the stress anisotropy in the barycentric map for a duct flow at 𝑅𝑒 𝜏 =360 with AR=1, colored by the corresponding wall distance (w. dis.). Results of GEP and PiResNet are compared with DNS. FIG. 16. Scatter plot of the stress anisotropy in the barycentric map for a duct flow at 𝑅𝑒 𝜏 =180 with AR=7, colored by the corresponding wall distance (w. dis.). Results of GEP and PiResNet are compared with DNS. C. Robustness performance

The robustness of PiResNet is examined by a noisy-sensitivity analysis, where additional Gaussian noise is added to the input features as (1 ), (0,1), q q  = + (21) where q  and q denote the input features with and without noise,  is the noise level and is a set of random numbers that satisfy a Gaussian distribution. The relative error is defined as ˆ ˆ|| || / || || , a F F  = − a a a (22) where  denotes averaging over the spatial flow domain. Figure 17 shows the variation of relative prediction error of PiResNet with various noise levels in input features. Three activators with the same regularization operation are compared. It can be seen that the prediction error is almost unchanged until a noise level of 30% and PiResNet with GELU is more insensitive to perturbation in input features. Thus, PiResNet gains good noise immunity, which is of critical importance to numerical simulations. Also, it is very useful to improve experimental Reynolds stress when combining PiResNet and noisy experiment data. FIG. 17. The variation of PiResNet-predicted relative error ( 𝜀 𝑎 ) on the testing flows with the increasing noise level ( 𝛿 ) in input features. Three activators are compared with the same regularization constraints both on network parameters and closure coefficients. As aforementioned, the robustness of PiResNet stems from two parts: (i) the noisy-sensitivity of strain-dependent closure coefficients depends on network parameters and (ii) the noisy-sensitivity of learned model to integrity basis depends on the closure coefficients. The neuron-averaged value of 0.025 is constrained for network parameters, thus obtaining noisy-sensitive closure coefficients. Then we examine the closure coefficients, which is shown in Table III and Fig. 18. The closure coefficients of GEP are considerably large (details in Ref. ) and thus are omitted. Although complete integrity basis has been considered, our PiResNet with only first four terms can still reproduce the flow characteristics well for all testing cases. Weatheritt and Sandberg. also found that more terms make limited gains in predictive accuracy. Thus, only four coefficients are shown. PiResNet provides a mean value of -0.077 for 𝑐 which is close to -0.09 in conventional models. As is the case in GEP, large values of closure coefficients in PiResNet appear on the duct bisector due to velocity gradients vanishing. The difference is that penalty on PiResNet-predicted closure coefficients (as shown in Eq. (18a)) helps regularize the solution of ill-posed problems and thus successfully obtain much smaller values than GEP. Coefficients ( 𝑐 , 𝑐 , 𝑐 , 𝑐 ) of PiResNet have small values below 0.18, which contributes to noisy-sensitivity to integrity basis ( 𝒔 , 𝒔 , 𝒔𝝎 − 𝝎𝒔 , 𝝎 ). Additionally, the small dispersion of closure coefficients, as shown in Table III, indicates good spatial smoothness. Larger values and a larger dispersion for closure coefficients occur when removing regularizations of closure coefficients. Obviously, regularization operations both on network parameters and closure coefficients do work well in improving the robustness of PiResNet. TABLE III. The spatial average (Ave.) and root mean square (rms) of PiResNet-predicted closure coefficients for all testing flows. 𝑐 𝑐 𝑐 𝑐 Ave. -0.077 0.036 -0.050 -0.024 rms

D. Interpretability of closure coefficients

Finally, we analyze the physical roles of learned closure coefficients and their contribution to the production of turbulent kinetic energy : k P  − S  . 𝑐 <0 in PiResNet is useful to numerical stability while 𝑐 >0 in GEP as the wall is approached. 𝑐 >0 in PiResNet indicate that 𝒔 play an opposite role with the linear term 𝒔 , while 𝒔𝝎 − 𝝎𝒔 with 𝑐 <0 plays the same role with 𝒔 . 𝑐 <0 in most regions and 𝑐 >0 in the narrow corner region. Accordingly, negative 𝑎 𝑖𝑗 is dominated by 𝑐 while positive 𝑎 𝑖𝑗 is mainly controlled by 𝑐 . The well-known in-plane secondary motion is almost accounted for by 𝑐 , 𝑐 and 𝑐 due to ( 2 )( ) c c c s s  −  + − − , which drives high-energy fluid from the near-wall region to the duct core. Notably, the symmetric distribution of ( 𝑐 , 𝑐 , 𝑐 , 𝑐 ) about the duct bisector at 𝑅𝑒 𝜏 =360 with AR=1 is a physical reflection of embedded invariances in PiResNet. Furthermore, Eq. (14) in incompressible flows yields the PiResNet-predicted k P (nondimensionalized by  ) as / 2 tr( ) 2 tr( ) 2 tr( ). k P c c c  = − − − s s s  (23) The disappearance of 𝑐 in Eq. (23) but its existence in Eq. (14) implies that 𝒔𝝎 − 𝝎𝒔 plays a redistribution role between stress components and has no contribution to k P , which is similar with the mechanism of pressure-strain correlations in transport equations. In duct flows, tr( ) 0  s , tr( ) 0  s , tr( ) 0  s (negative in the corner region) and tr( ) |tr( )|>|tr( )|  s s s  . Thus, 𝒔 , 𝒔 and 𝝎 in PiResNet accelerate the production of turbulent kinetic energy while the linear term 𝒔 with 𝑐 >0 in GEP as the wall is approached, acts to hinder the production of turbulent kinetic energy. This behavior of GEP leads to a laminarization or “lift off” of the secondary structures. To achieve physical and converged results, Weatheritt and Sandberg claimed that an empirical modification of GEP-predicted k P is a must. Our PiResNet provides a correct mechanism for k P . Accordingly, closure coefficients in PiResNet have clear groundings in physical arguments. FIG. 18. PiResNet-predicted closure coefficients for a duct flow at (a)-(d) 𝑅𝑒 𝜏 =360 with AR=1 and (e)-(h) 𝑅𝑒 𝜏 =180 with AR=7. The first to forth columns corresponds to 𝑐 , 𝑐 , 𝑐 and 𝑐 , respectively. Overall, PiResNet is a successful model with closure coefficients of clear physical meaning, which provides realizable and robust predictions and can account for the variability of flow characteristics due to Reynolds number and aspect ratio.

V. Summary and Perspectives

The current work has developed a universal turbulence modeling framework under which an invariant, realizable, unbiased, and robust data-driven turbulence model was achieved. At every phase of the model development lifecycle, the underlying physical considerations are indispensable: domain-knowledge has been attentively inferred and reasonably converted to modeling knowledge embedded both in the design and training processes. In such endeavors, the methodology addressed in this work has been insured as an interpretable machine learning framework for turbulence modeling. In the design phase: first and foremost, the objectivity of the proposed PiResNet was strictly preserved by suitably choosing extended Galilean-invariant input features to regress the closure coefficients. This principle prevents from a preference for the coordinate system adopted by the modeler, which is user-friendly. Second, a dimensionless quantity ( 𝑅𝑒 𝑇 ) was introduced to parameterize the newly-proposed turbulent time scale. As an additional argument, it successfully helped extricate from possible nonunique mappings of conventional inputs (only dimensionless strain magnitude and vorticity magnitude for coefficient regression) to outputs of a RANS model, thus being possible to employ machine learning algorithms to establish a deterministic functional relation. Third, realizability as one of the contents of turbulence physics was reasonably respected to constrain the well-learned PiResNet, especially when the model is operating in an extrapolatory mode. Using the PIR corrector, realizable solutions were gained in the predictive settings. It is worth noting that 𝑅𝑒 𝑇 and the PIR corrector can also be useful to other algebraic RANS models. In the training phase: a fair learning strategy comprising of unbiased data sampling (here referring to the clustering-based resampling technique) and unbiased objective-function design was successfully implemented to update internal network parameters. The unbiased objective-function design contains two-fold: (i) a scale- and frame-invariant objective-function to insure the hyper-parameter settings of PiResNet against being invalid across training datasets of dynamically similar flows only with different scales or different coordinate systems, which is useful for other modelers to reproduce this work; and (ii) effective measures taken for the first time to rebalance more fairly each contribution of the anisotropy components to the overall training error. Besides, regularizations were performed both on trainable network parameters and regressed closure coefficients to reduce the PiResNet sensitivity to the uncertainty of input data, thus achieving a robust model. The former accounts for the sensitivity of the closure coefficients to the uncertainty of invariant inputs, while the latter for the amplifier of the uncertainty of tensor basis. It cannot be overstated that physical considerations are critical to data-driven turbulence modeling. The above-mentioned measures jointly do improve the predictive performance of data-driven turbulence modeling based on limited data and are significant efforts towards its practical use in the engineering environment in the near future. Also, the noisy-insensitive PiResNet can provide high-fidelity predictions of all Reynolds-tress components based on sparse experimental data. Based on the PiResNet-predicted Reynolds-stress and the RANS solver, one can easily develop a hybrid experiment-CFD method. Despite encouraging results, on the other hand, there is much to be gained by further calibrating the RANS-related transport equations, enriching the diversity of training data, and carefully exploring a broader set of input features. We do not claim that the learned PiResNet necessarily is applicable to other untested complex flows although it does gain good generalization across some two-dimensional and three- dimensional flows. PiResNet needs to be widely validated. Rather, the philosophy and formalisms employed in the IMLF are of a general nature, and not restricted to the type of RANS-based closure, the type of neural netwoork architecture, or the type of training case. Following the guidelines of IMLF, one can easily extend to retrain a data-driven model, even at LES to predict more complex flow configuraions. However, the most challenging issue remains how to develop a generalizable model across multiple classes of complex flows. Therefore, building the bechmark dataset containing diverse typical flow phenomenon (e.g., secondary effects, flow separation, streamline curvature, to name a few) is presently eager to develop and evaluate data-driven turbulence models. Additionally, a complete and compact set of input features should be investgated sysmatically. The incompleteness and redundancy may destroy the generalization capability. Most importantly, data-driven turbulence modeling to date (including PiResNet) has been mainly centerd around machine-learning-augmentd calibration of the closure coefficients given a conventional form-constraint RANS model, which only improves the parametric inadequacy. It is common knowledge that the inadequacy in model-form is the crux. Towards this end, it is indeed essential to develop a “form-free-in-prior” data-driven model to directly infer the model-form in adequately representing the rich dynamics of turbulence. This requires that machine learning algorithms can directly deal with tensor problems, at least vector problems (when using a spectral decomposition). Future work in this direction may be of great interest. References P. R. Spalart, "Philosophies and fallacies in turbulence modeling," Prog. Aerosp. Sci. , 1 (2015). P. A. Durbin, "Some recent developments in turbulence closure modeling," Annu. Rev. Fluid Mech. , 77 (2018). C. Jiang, J. Mi, S. Laima, and H. Li, "A novel algebraic stress model with machine ‐ learning ‐ assisted parameterization," Energies , 258 (2020). J. Slotnick, A. K. J. Alonso, D. Darmofal, W. Gropp, E. Lurie, and D. Mavriplis, "CFD vision 2030 study: A path to revolutionary computational aerosciences," Technical Report NASA/CR-2014-218178, 2014. K. Duraisamy, P. R. Spalart, and C. L. Rumsey, "Status, emerging ideas and future directions of turbulence modeling research in aeronautics," Technical Report NASA/TM–2017–219682, 2017. K. Duraisamy, G. Iaccarino, and H. Xiao, "Turbulence modeling in the age of data," Annu. Rev. Fluid Mech. , 357 (2019). S. L. Brunton, B. R. Noack, and P. Koumoutsakos, "Machine learning for fluid mechanics," Annu. Rev. Fluid Mech. , 477 (2020). 2 S. H. Cheung, T. A. Oliver, E. E. Prudencio, S. Prudhomme, and R. D. Moser, "Bayesian uncertainty analysis with applications to turbulence modeling," Relab. Eng. Syst. Safe , 1137 (2011). W. N. Edeling, P. Cinnella, and R. P. Dwight, "Predictive RANS simulations via Bayesian model-scenario averaging," J. Comput. Phys. , 65 (2014). W. N. Edeling, P. Cinnella, R. P. Dwight, and H. Bijl, "Bayesian estimates of parameter variability in the k-ε turbulence model," J. Comput. Phys. , 73 (2014). J. Zhang, and S. Fu, "An efficient Bayesian uncertainty quantification approach with application to k-ω-γ transition modeling," Comput. Fluids , 211 (2018). J. Weatheritt, and R. Sandberg, "A novel evolutionary algorithm applied to algebraic modifications of the RANS stress–strain relationship," J. Comput. Phys. , 22 (2016). J. Weatheritt, and R. D. Sandberg, "The development of algebraic stress models using a novel evolutionary algorithm," Int. J. Heat Fluid Flow , 298 (2017). Y. Zhao, H. D. Akolekar, J. Weatheritt, V. Michelassi, and R. D. Sandberg, "RANS turbulence model development using CFD-driven machine learning," J. Comput. Phys. 109413 (2020). L. Y. Zhu, W. W. Zhang, J. Q. Kou, and Y. L. Liu, "Machine learning methods for turbulence modeling in subsonic flows around airfoils," Phys. Fluids , (2019). J. Ling, A. Kurzawski, and J. Templeton, "Reynolds averaged turbulence modelling using deep neural networks with embedded invariance," J. Fluid Mech. , 155 (2016). E. Dow, and Q. Wang,

Quantification of structural uncertainties in the k−ω turbulence model (AIAA Paper 2011-1762, Denver, Colorado, 2011). E. J. Parish, and K. Duraisamy, "A paradigm for data-driven predictive modeling using field inversion and machine learning," J. Comput. Phys. , 758 (2016). A. P. Singh, and K. Duraisamy, "Using field inversion to quantify functional errors in turbulence closures," Phys. Fluids , 045110 (2016). A. P. Singh, S. Medida, and K. Duraisamy, "Machine-learning-augmented predictive modeling of turbulent separated flows over airfoils," AIAA J. , 2215 (2017). C. X. He, Y. Z. Liu, and L. Gan, "A data assimilation model for turbulent flows using continuous adjoint formulation," Phys. Fluids , (2018). M. Yang, and Z. Xiao, "Improving the k–ω–γ–Ar transition model by the field inversion and machine learning framework," Phys. Fluids , 064101 (2020). M. Emory, J. Larsson, and G. Iaccarino, "Modeling of structural uncertainties in Reynolds-averaged Navier-Stokes closures," Phys. Fluids , 110822 (2013). C. Gorlé, and G. Iaccarino, "A framework for epistemic uncertainty quantification of turbulent scalar flux models for Reynolds-averaged Navier-Stokes simulations," Phys. Fluids , 055105 (2013). W. N. Edeling, G. Iaccarino, and P. Cinnella,

Uncertainty quantification in RANS constrained by a partial differential equation (Center for Turbulence Research, Stanford University, 2017). G. Iaccarino, A. A. Mishra, and S. Ghili, "Eigenspace perturbations for uncertainty estimation of single-point turbulence closures," Phys. Rev. Fluids , 024605 (2017). R. L. Thompson, A. A. Mishra, G. Iaccarino, W. Edeling, and L. Sampaio, "Eigenvector perturbation methodology for uncertainty quantification of turbulence models," Phys. Rev. Fluids , 044603 (2019). J. X. Wang, J. L. Wu, and H. Xiao, "Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data," Phys. Rev. Fluids , 034603 (2017). J. L. Wu, H. Xiao, and E. Paterson, "Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework," Phys. Rev. Fluids , (2018). 3 J. L. Wu, R. Sun, S. Laizet, and H. Xiao, "Representation of stress tensor perturbations with application in machine-learning-assisted turbulence modeling," Comput. Method Appl. Mech. Eng. , 707 (2019). Y. Yin, P. Yang, Y. Zhang, H. Chen, and S. Fu, "Feature selection and processing of turbulence modeling based on an artificial neural network," Phys. Fluids , 105117 (2020). J. L. Ling, A. Ruiz, G. Lacaze, and J. Oefelein, "Uncertainty analysis and data-driven model advances for a jet-in-crossflow," Journal of Turbomachinery , (2017). H. Xiao, J.-L. Wu, S. Laizet, and L. Duan, "Flows over periodic hills of parameterized geometries: A dataset for data-driven turbulence modeling from direct simulations," Comput. Fluids , 104431 (2020). S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Discovering governing equations from data by sparse identification of nonlinear dynamical systems," Proceedings of the National Academy of Sciences of the United States of America , 3932 (2016). S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Data-driven discovery of partial differential equations," Sci. Adv. , (2017). S. Pan, and K. Duraisamy, "Data-driven discovery of closure models," SIAM Journal on Applied Dynamical Systems , 2381 (2018). H. Vaddireddy, A. Rasheed, A. E. Staples, and O. San, "Feature engineering and symbolic regression methods for detecting hidden physics from sparse sensor observation data," Phys. Fluids , 015113 (2020). J. N. Kutz, "Deep learning in fluid dynamics," J. Fluid Mech. , 1 (2017). J. Ling, R. Jones, and J. Templeton, "Machine learning strategies for systems with invariance properties," J. Comput. Phys. , 22 (2016). J. L. Lumley, "Toward a turbulent constitutive relation," J. Fluid Mech. , 413 (1970). S. B. Pope, "A more general effective-viscosity hypothesis," J. Fluid Mech. , 331 (1975). S. K. Arolla, and P. A. Durbin, "Modeling rotation and curvature effects within scalar eddy viscosity model framework," Int. J. Heat Fluid Flow , 78 (2013). F. Billard, and D. Laurence, "A robust k-ε-v2/k elliptic blending turbulence model applied to near-wall, separated and buoyant flows," Int. J. Heat Fluid Flow , 45 (2012). K. Hanjalic, M. Popovac, and M. Hadziabdic, "A robust near-wall elliptic-relaxation eddy-viscosity turbulence model for CFD," Int. J. Heat Fluid Flow , 1047 (2004). T. Ma, C. Santarelli, T. Ziegenhein, D. Lucas, and J. Fröhlich, "Direct numerical simulation--based Reynolds-averaged closure for bubble-induced turbulence," Phys. Rev. Fluids , 034301 (2017). K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun,

Deep Residual Learning for Image Recognition (IEEE Computer Society, 2016). K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun,

Identity Mappings in Deep Residual Networks (Springer, 2016). C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, "Weight uncertainty in neural networks," arXiv preprint arXiv:1505.05424 (2015). D. P. Kingma, T. Salimans, and M. Welling, "Variational dropout and the local reparameterization trick," Advances in neural information processing systems , 2575 (2015). Q. Liu, and D. Wang, "Stein variational gradient descent: A general purpose bayesian inference algorithm," Advances in neural information processing systems , 2378 (2016). C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nat. Mach. Intell. , 206 (2019). A. Spencer, and R. Rivlin, "Isotropic integrity bases for vectors and second-order tensors," Archive for rational mechanics and analysis , 45 (1962). R. W. Johnson,

Handbook of fluid dynamics (Crc Press, 2016). 4 M. Alber, A. Buganza Tepole, W. R. Cannon, S. De, S. Dura-Bernal, K. Garikipati, G. Karniadakis, W. W. Lytton, P. Perdikaris, L. Petzold, and E. Kuhl, "Integrating machine learning and multiscale modeling—perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences," npj Digit. Med. , 115 (2019). Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature , 436 (2015). W. N. Edeling, M. Schmelzer, R. P. Dwight, and P. Cinnella, "Bayesian predictions of Reynolds-averaged Navier–Stokes uncertainties using maximum a posteriori estimates," AIAA J. , (2018). S. B. Pope,

Turbulent Flows (Cambridge University Press, New York, 2000). C. G. Speziale, S. Sarkar, and T. B. Gatski, "Modelling the pressure–strain correlation of turbulence: an invariant dynamical systems approach," J. Fluid Mech. , 245 (1991). Y. N. Huang, and F. Durst, "Reynolds stress under a change of frame of reference," Phys. Rev. E , 056305 (2001). J. Ling, and J. Templeton, "Evaluation of machine learning algorithms for prediction of regions of high Reynolds averaged Navier Stokes uncertainty," Phys. Fluids , 085103 (2015). J. L. Lumley, "Computational modeling of turbulent flows," Adv. Appl. Mech. , 123 (1979). S. Banerjee, R. Krahl, F. Durst, and C. Zenger, "Presentation of anisotropy properties of turbulence, invariants versus eigenvalue approaches," J. Turbul. , N32 (2007). Speziale, and G. Charles, "Invariance of turbulent closure models," Phys. Fluids , 1033 (1979). W. Noll, "A mathematical theory of the mechanical behavior of continuous media," Archive for rational Mechanics and Analysis , 197 (1958). G. Haller, "An objective definition of a vortex," J. Fluid Mech. , 1 (2005). N. Geneva, and N. Zabaras, "Quantifying model form uncertainty in Reynolds-averaged turbulence models with Bayesian deep neural networks," J. Comput. Phys. , 125 (2019). M. Lee, and R. D. Moser, "Direct numerical simulation of turbulent channel flow up to Re τ ≈ , 395 (2015). K. Hanjalic, and B. E. Launder, "Contribution towards a Reynolds-stress closure for low-Reynolds-number turbulence," J. Fluid Mech. , 593 (1976). Z. Yang, and T.-H. Shih, "New time scale based k-epsilon model for near-wall turbulence," AIAA J. , 1191 (1993). T. Craft, B. Launder, and K. Suga, "Development and application of a cubic eddy-viscosity model of turbulence," Int. J. Heat Fluid Flow , 108 (1996). U. Goldberg, and D. Apsley, "A wall-distance-free low Re k—ϵ turbulence model," Comput. Method Appl. Mech. Eng. , 227 (1997). M. M. Rahman, R. K. Agarwal, M. J. Lampinen, and T. Siikonen,

Improvements to Rahman-Agarwal-Siikonen one-equation turbulence model based on k-ε closure (Atlanta, GA, USA, 2014). P. A. Durbin, "Near-wall turbulence closure modeling without “damping functions”," Theor. Comp. Fluid Dyn. , 1 (1991). P. A. Durbin, and C. G. Speziale, "Realizability of second-moment closure via stochastic analysis," J. Fluid Mech. , 395 (1994). S. S. Girimaji, "A new perspective on realizability of turbulence models," J. Fluid Mech. , 191 (2004). K. Inagaki, T. Ariki, and F. Hamba, "Higher-order realizable algebraic Reynolds stress modeling based on the square root tensor," Phys. Rev. Fluids , 114601 (2019). U. Schumann, "Realizability of Reynolds ‐ stress turbulence models," Phys. Fluids , 721 (1977). T. B. Gatski, and C. G. Speziale, "On the explicit algebraic stress models for complex turbulent flows," J. Fluid Mech. , 59 (1993). 5 M. M. Rahman, P. Rautaheimo, and T. Siikonen, "Modifications for an explicit algebraic stress model," Int. J. Numer. Meth. Fl. , 221 (2001). K. Iwamoto, Y. Suzuki, and N. Kasagi, "Reynolds number effect on wall turbulence: Toward effective feedback control," Int. J. Heat Fluid Flow , 855 (2002). M. Bernardini, S. Pirozzoli, and P. Orlandi, "Velocity statistics in turbulent channel flow up to Re τ = 4000," J. Fluid Mech. , 171 (2014). S. Hoyas, and J. Jiménez, "Reynolds number effects on the Reynolds-stress budgets in turbulent channels," Phys. Fluids , (2008). R. Vinuesa, A. Noorani, A. Lozano-Duran, G. K. El Khoury, P. Schlatter, P. F. Fischer, and H. M. Nagib, "Aspect ratio effects in turbulent duct flows studied through direct numerical simulation," J. Turbul. , 677 (2014). R. Vinuesa, P. Schlatter, and H. M. Nagib, "On minimum aspect ratio for duct flow facilities and the role of side walls in generating secondary flows," J. Turbul. , 588 (2015). D. Arthur, and S. Vassilvitskii, "K-means plus plus : The advantages of careful seeding," Proceedings of the Eighteenth Annual Acm-Siam Symposium on Discrete Algorithms 1027 (2007). X. Chen, F. Hussain, and Z. S. She, "Non-universal scaling transition of momentum cascade in wall turbulence," J. Fluid Mech. , R2 (2019). D. Kingma, and J. Ba,

Adam: A method for stochastic optimization (San Diego, CA, USA, 2015). D. Hendrycks, and K. Gimpel, "Gaussian error linear units," arXiv preprint (2016). V. Nair, and G. E. Hinton,

Rectified linear units improve restricted boltzmann machines (Haifa, Israel, 2010). D.-A. Clevert, T. Unterthiner, and S. Hochreiter,

Fast and accurate deep network learning by exponential linear units (San Juan, Puerto Rico, Spain, 2016). P. Ramachandran, B. Zoph, and Q. V. Le, "Swish ： a self-gated activation function," arXiv preprints arXiv:1710.05941 (2017). P. Bradshaw, "Turbulent Secondary Flows," Annu. Rev. Fluid Mech. , 53 (1987). K. Musgrave, S. Belongie, and S.-N. Lim, "A metric learning reality check," arXiv perprints arXiv:2003.08505 (2020). S. Banerjee, R. Krahl, F. Durst, and C. Zenger, "Presentation of anisotropy properties of turbulence, invariants versus eigenvalue approaches," J. Turbul. , 1 (2007). J. Wu, H. Xiao, R. Sun, and Q. Wang, "Reynolds-averaged Navier–Stokes equations with explicit data-driven Reynolds stress closure can be ill-conditioned," J. Fluid Mech. , 553 (2019). H. G. Weller, G. Tabor, H. Jasak, and C. Fureby, "A tensorial approach to computational continuum mechanics using object-oriented techniques," Comput. Phys. , 620 (1998). C. G. Speziale, S. Sarkar, and T. B. Gatski, "Modelling the pressure–strain correlation of turbulence: an invariant dynamical systems approach," J. Fluid Mech.227