Multi-scale Deep Neural Network (MscaleDNN) Methods for Oscillatory Stokes Flows in Complex Domains
Bo Wang (LCSM(MOE), School of Mathematics and Statistics, Hunan Normal University, Changsha, Hunan, 410081, P. R. China), Wenzhong Zhang and Wei Cai (Dept. of Mathematics, Southern Methodist University, Dallas, TX 75275)
Email addresses: [email protected] (Bo Wang), [email protected] (Wenzhong Zhang), [email protected] (Wei Cai, corresponding author)
Summary.
In this paper, we study a multi-scale deep neural network (MscaleDNN) as a meshless numerical method for computing oscillatory Stokes flows in complex domains. The MscaleDNN employs a multi-scale structure in the design of its DNN, using radial scalings to convert the approximation of high frequency components of the highly oscillatory Stokes solution to one of lower frequencies. The MscaleDNN solution to the Stokes problem is obtained by minimizing a loss function in terms of the $L^2$ norm of the residual of the Stokes equation. Three forms of loss functions are investigated, based on the vorticity-velocity-pressure, velocity-stress-pressure, and velocity-gradient of velocity-pressure formulations of the Stokes equation. We first conduct a systematic study of the MscaleDNN methods with various loss functions on the Kovasznay flow, in comparison with normal fully connected DNNs. Then, Stokes flows with highly oscillatory solutions in a 2-D domain with six randomly placed holes are simulated by the MscaleDNN. The results show that the MscaleDNN has faster convergence and consistent error decays in the simulation of the Kovasznay flow for all four tested loss functions. More importantly, the MscaleDNN is capable of learning highly oscillatory solutions when the normal DNNs fail to converge.

AMS subject classifications: 35Q68, 65N99, 68T07, 76M99
Key words: deep neural network, Stokes equation, multi-scale, meshless methods.
1 Introduction

Numerical methods for incompressible flow are one of the major topics in computational fluid dynamics and have been studied intensively over the last five decades. Various techniques have been proposed to address the incompressibility condition of the flow, including projection methods [4, 18], gauge methods [6], and time-splitting methods [13], among others.
Finite element and spectral element methods [3] are mostly used to discretize the Navier-Stokes equations, where special attention is needed for the approximation spaces of velocity and pressure to satisfy the Babuška-Brezzi inf-sup condition for a saddle point problem [8]. Besides, for large scale engineering applications, body-fitted mesh generation for 3-D objects and efficient linear solvers for the resulting linear systems have been a major issue for computational resources.

The emerging deep neural network (DNN) has found many applications beyond its traditional ones, such as image classification and speech recognition. Recent work extending DNNs to the field of scientific and engineering computing has shown much promise [7, 9, 17]. DNN-based numerical methods are usually formulated as an optimization problem, where the loss function could be an energy functional, as in a Ritz formulation of a self-adjoint differential equation [7], or simply the least squares residual of the PDEs [10, 2, 11]. The DNN technique provides a powerful approximation method to represent solutions of high dimensional variables, while the traditional finite element and spectral element methods encounter the well known curse of dimensionality. Also, there are several advantages of using DNNs to approximate the solution of incompressible flows. Firstly, the stochastic optimization algorithm employed by DNN-based methods relies on losses calculated on randomly sampled points in the computational domain, rather than over an unstructured mesh fitting the geometry of the complex objects in the fluid problem. This feature renders the DNN-based methods for solving PDEs truly meshless methods. Secondly, due to the capability of the DNN in handling high dimensional functions, the approximation of a time dependent solution can be carried out in the temporal-spatial four dimensional space. Thirdly, boundary conditions for the fluid problems can be simply enforced by introducing penalty terms in the loss function, with no need to find and implement appropriate and non-trivial boundary conditions for the pressure [16] or vorticity variables in the corresponding formulations of the Stokes or Navier-Stokes equations.

Normal fully connected DNNs used for image classification and data science applications have been shown to be ineffective in learning high frequency contents of the solution, as illustrated in recent works on DNNs' frequency dependent convergence properties [19]. Unfortunately, fluid flow at high Reynolds number will contain many scales, which is the hallmark of the onset of turbulent flow from a laminar one. Therefore, in order to make DNN based approaches competitive numerical methods, in terms of resolution power, with popular spectral [3] and spectral element methods [12], it is important to develop new classes of DNNs which can represent scales of drastic disparity arising from the study of turbulent flows. For this purpose, we have recently developed strategies to speed up the convergence of DNNs in learning high frequency content of the solutions of PDEs. Two new DNNs have been proposed: a PhaseDNN [2] and a MscaleDNN [11]. The PhaseDNN uses a series of phase shifts to convert high frequency contents to a low frequency range before the learning is carried out.
This method has been shown to be very effective in simulating high frequency Helmholtz equations in acoustic wave scattering. On the other hand, the MscaleDNN uses a radial scaling technique in the frequency domain (or a corresponding scaling in the physical domain) to convert solution content of a range of higher frequencies to a lower frequency one, which will be learned quickly with a small size DNN; the latter is then scaled back in the physical space to approximate the original solution content. The MscaleDNN is more effective in handling higher dimensional PDEs and has already been shown to be superior over traditional fully connected DNNs for solving the Poisson-Boltzmann equation in complex and singular domains [11]. In this paper, we will extend the MscaleDNN approach to find the solution of the Stokes problem, as a first step toward developing DNN based numerical methods for time-dependent incompressible Navier-Stokes equations.

The rest of the paper is organized as follows. In Section 2, we will present the structure of the MscaleDNN to be used for solving the Stokes problems. Section 3 will propose several loss functions for training, based on three different first order system reformulations of the Stokes equation. A benchmark test on a low frequency Kovasznay flow will be conducted in Section 4 to evaluate the performance of a normal fully connected DNN and MscaleDNNs, as well as different loss functions. Section 5 will present numerical tests of highly oscillatory Stokes flows with multiple frequencies in a complex domain. Finally, a conclusion and discussion of future work are given in Section 6.

2 Multi-scale deep neural network (MscaleDNN)

In a recent work [11], a multi-scale DNN was proposed, which consists of a series of parallel normal sub-neural networks. Each of the sub-networks receives a scaled version of the input, and their outputs are then combined to make the final output of the MscaleDNN (refer to Fig. 1). The individual sub-network in the MscaleDNN with a scaled input is designed to approximate a segment of the frequency content of the targeted function, and the effect of the scaling is to convert a specific high frequency content to a lower frequency range so that the learning can be accomplished much more quickly. Recent work [19] on the frequency dependence of DNN convergence shows that much faster convergence occurs in approximating low frequency functions than in approximating high frequency ones; the MscaleDNN takes advantage of this property. In addition, in order to produce scale separation and identification capability for a MscaleDNN, we borrowed the idea of compact mother scaling and wavelet functions from wavelet theory [5], and found that activation functions with a localized frequency profile work better than normal activation functions, e.g., ReLU, tanh, etc.

Fig. 1 shows the schematics of a MscaleDNN consisting of $n$ networks. Each scaled input passing through a sub-network can be expressed by the following formula
$$f_{\theta}(\mathbf{x}) = W^{[L-1]}\sigma\circ(\cdots(W^{[1]}\sigma\circ(W^{[0]}\mathbf{x} + b^{[0]}) + b^{[1]})\cdots) + b^{[L-1]}, \qquad (2.1)$$
where $W^{[0]},\dots,W^{[L-1]}$ and $b^{[0]},\dots,b^{[L-1]}$ are the weight matrices and bias unknowns, respectively, to be optimized via the training, and $\sigma(x)$ is the activation function. In this work, the following plane wave activation function will be used for its localized frequency property [11],
$$\sigma(x) = \sin(x). \qquad (2.2)$$

Figure 1: Illustration of a MscaleDNN.

For the input scales, we could select the scale for the $i$-th sub-network to be $i$ (as shown in Fig. 1) or $2^{i-1}$.
Mathematically, a MscaleDNN solution $f(\mathbf{x})$ is represented by the following sum of sub-networks $f_{\theta^{n_i}}$ with network parameters denoted by $\theta^{n_i}$ (i.e., weight matrices and biases),
$$f(\mathbf{x}) \sim \sum_{i=1}^{M} f_{\theta^{n_i}}(\alpha_i \mathbf{x}), \qquad (2.3)$$
where $\alpha_i$ is the chosen scale for the $i$-th sub-network in Fig. 1. For more details on the design and discussion of the MscaleDNN, please refer to [11].

For comparison studies in this paper, we will refer to a "normal" network as a fully connected DNN with the same total number of neurons as the MscaleDNN, but without the multi-scale features. We will perform extensive numerical experiments to examine the effectiveness of different settings and select efficient ones to solve complex problems. All DNN models are trained by Adam [14].
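To make the construction in (2.1)-(2.3) concrete, the following is a minimal PyTorch sketch of a MscaleDNN; the paper does not specify its software framework, and the layer sizes and scale set below are illustrative defaults rather than the authors' actual code.

```python
# Sketch of a MscaleDNN: parallel sub-networks with radially scaled inputs,
# each using the plane-wave activation sigma(x) = sin(x) of (2.2).
import torch
import torch.nn as nn

class SubNet(nn.Module):
    """One fully connected sub-network f_theta as in (2.1)."""
    def __init__(self, dim_in=2, dim_out=2, width=50, depth=4):
        super().__init__()
        layers, d = [], dim_in
        for _ in range(depth):
            layers.append(nn.Linear(d, width))
            d = width
        self.hidden = nn.ModuleList(layers)
        self.out = nn.Linear(d, dim_out)

    def forward(self, x):
        for lin in self.hidden:
            x = torch.sin(lin(x))          # localized-frequency activation (2.2)
        return self.out(x)

class MscaleDNN(nn.Module):
    """Sum of sub-networks on scaled inputs, f(x) = sum_i f_i(alpha_i * x), cf. (2.3)."""
    def __init__(self, scales=(1, 2, 4, 8, 16, 32), **kw):
        super().__init__()
        self.scales = scales
        self.subnets = nn.ModuleList(SubNet(**kw) for _ in scales)

    def forward(self, x):
        return sum(net(a * x) for a, net in zip(self.scales, self.subnets))
```

Each sub-network sees the input multiplied by its scale $\alpha_i$, so a high frequency component of the target appears as a low frequency component to that sub-network, which is the mechanism behind the faster convergence.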
3 Loss functions for the Stokes equation

The following two-dimensional (2-D) Stokes problem
$$-\nu\Delta\mathbf{u} + \nabla p = \mathbf{f}, \quad \text{in } \Omega, \qquad (3.1)$$
$$\nabla\cdot\mathbf{u} = 0, \quad \text{in } \Omega, \qquad (3.2)$$
$$\mathbf{u} = \mathbf{g}, \quad \text{on } \partial\Omega, \qquad (3.3)$$
will be solved by the MscaleDNN. Here $\Omega$ is an open bounded domain in $\mathbb{R}^2$, and the boundary condition $\mathbf{g}$ satisfies the compatibility condition
$$\int_{\partial\Omega} \mathbf{g}\cdot\mathbf{n}\, ds = 0. \qquad (3.4)$$
The MscaleDNN solution will be found as in the traditional least squares finite element method [1], where the solution is obtained by minimizing a loss function in terms of the residual of the Stokes problem (3.1). To introduce loss functions for the DNN algorithms, we first reformulate (3.1)-(3.3) into a first order system, as in least squares finite element methods for solving the Stokes problem. There are various possible ways of recasting (3.1) into a first order system, and we will focus on the following three popular approaches used in the development of least squares finite element methods [1].

• Vorticity-velocity-pressure (ωVP) formulation.
The first approach introduces a vorticity variable, a scalar quantity for 2-D flows,
$$\omega = \nabla\times\mathbf{u} = \partial_x u_y - \partial_y u_x, \qquad (3.5)$$
arriving at a vorticity-velocity-pressure (ωVP) system:
$$\nu\nabla\times\omega + \nabla p = \mathbf{f}, \quad \text{in } \Omega, \qquad (3.6a)$$
$$\omega = \nabla\times\mathbf{u}, \quad \text{in } \Omega, \qquad (3.6b)$$
$$\nabla\cdot\mathbf{u} = 0, \quad \text{in } \Omega. \qquad (3.6c)$$
Here, for a scalar field $\omega$ in 2-D, the curl is the vector $\nabla\times\omega = (\partial_y\omega, -\partial_x\omega)^{\top}$.

• Velocity-stress-pressure (VSP) formulation.
The second approach introduces a stress tensor
$$\mathbf{T} = \sqrt{2\nu}\,(\nabla\mathbf{u} + \nabla\mathbf{u}^{\top})/2, \qquad (3.7)$$
and a velocity-stress-pressure (VSP) system
$$-\sqrt{2\nu}\,\nabla\cdot\mathbf{T} + \nabla p = \mathbf{f}, \quad \text{in } \Omega, \qquad (3.8a)$$
$$\mathbf{T} = \sqrt{2\nu}\,(\nabla\mathbf{u} + \nabla\mathbf{u}^{\top})/2, \quad \text{in } \Omega, \qquad (3.8b)$$
$$\nabla\cdot\mathbf{u} = 0, \quad \text{in } \Omega, \qquad (3.8c)$$
is obtained.

• Velocity-gradient of velocity-pressure (VgVP) formulation.
The third approach simply introduces a variable $\mathbf{U} = \nabla\mathbf{u}$ (by taking the gradient of each component of the velocity field), which leads to a velocity-gradient of velocity-pressure (VgVP) system
$$-\nu\nabla\cdot\mathbf{U} + \nabla p = \mathbf{f}, \quad \text{in } \Omega, \qquad (3.9a)$$
$$\mathbf{U} = \nabla\mathbf{u}, \quad \text{in } \Omega, \qquad (3.9b)$$
$$\nabla\cdot\mathbf{u} = 0, \quad \text{in } \Omega. \qquad (3.9c)$$

It is well known that it is more difficult to compute the pressure than the velocity in computational fluid dynamics. We find that the velocity also converges faster than the pressure in the DNN-based methods. In order to take care of the pressure, we take the divergence of both sides of the Stokes equation (3.1) to obtain a Poisson equation
$$\Delta p = \nabla\cdot\mathbf{f}, \quad \text{in } \Omega. \qquad (3.10)$$
The residual of this equation will be an extra term in the loss function, and a tunable weight on the loss due to pressure is introduced. To be consistent with the first order systems above, we also reformulate the Poisson equation (3.10) as
$$\mathbf{q} = \nabla p, \quad \text{in } \Omega, \qquad (3.11a)$$
$$\nabla\cdot\mathbf{q} = \nabla\cdot\mathbf{f}, \quad \text{in } \Omega. \qquad (3.11b)$$
Together with the first order systems (3.6), (3.8) or (3.9), respectively, we can design the MscaleDNN algorithms. In each algorithm, a total of four MscaleDNNs will be used: one for the velocity vector $\mathbf{u}$, one for the pressure $p$, one for the gradient of pressure $\mathbf{q}$, and one for the vorticity $\omega$, the stress $\mathbf{T}$, or the gradient of velocity $\mathbf{U}$, respectively. The DNN solutions are denoted by $\mathbf{u}(\mathbf{x},\theta_u)$, $p(\mathbf{x},\theta_p)$, $\omega(\mathbf{x},\theta_\omega)$, $\mathbf{T}(\mathbf{x},\theta_T)$, $\mathbf{U}(\mathbf{x},\theta_U)$, $\mathbf{q}(\mathbf{x},\theta_q)$ accordingly. Based on the first order systems, we define loss functions as follows:
$$L_{\omega VP}(\theta_u,\theta_p,\theta_\omega,\theta_q) := \|\nu\nabla\times\omega + \mathbf{q} - \mathbf{f}\|^2_{\Omega} + \alpha\|\nabla\cdot\mathbf{q} - \nabla\cdot\mathbf{f}\|^2_{\Omega} + \|\nabla\times\mathbf{u} - \omega\|^2_{\Omega} + \|\nabla\cdot\mathbf{u}\|^2_{\Omega} + \|\nabla p - \mathbf{q}\|^2_{\Omega} + \beta\|\mathbf{u} - \mathbf{g}\|^2_{\partial\Omega},$$
$$L_{VSP}(\theta_u,\theta_p,\theta_T,\theta_q) := \|\sqrt{2\nu}\,\nabla\cdot\mathbf{T} - \mathbf{q} + \mathbf{f}\|^2_{\Omega} + \alpha\|\nabla\cdot\mathbf{q} - \nabla\cdot\mathbf{f}\|^2_{\Omega} + \big\|\sqrt{2\nu}\,(\nabla\mathbf{u} + \nabla\mathbf{u}^{\top})/2 - \mathbf{T}\big\|^2_{\Omega} + \|\nabla\cdot\mathbf{u}\|^2_{\Omega} + \|\nabla p - \mathbf{q}\|^2_{\Omega} + \beta\|\mathbf{u} - \mathbf{g}\|^2_{\partial\Omega},$$
$$L_{VgVP}(\theta_u,\theta_p,\theta_U,\theta_q) := \|\nu\nabla\cdot\mathbf{U} - \mathbf{q} + \mathbf{f}\|^2_{\Omega} + \alpha\|\nabla\cdot\mathbf{q} - \nabla\cdot\mathbf{f}\|^2_{\Omega} + \|\nabla\mathbf{u} - \mathbf{U}\|^2_{\Omega} + \|\nabla\cdot\mathbf{u}\|^2_{\Omega} + \|\nabla p - \mathbf{q}\|^2_{\Omega} + \beta\|\mathbf{u} - \mathbf{g}\|^2_{\partial\Omega}, \qquad (3.12)$$
where $\alpha$, $\beta$ are penalty constants. We emphasize that the Poisson residual
$$\alpha\|\nabla\cdot\mathbf{q} - \nabla\cdot\mathbf{f}\|^2_{\Omega} + \|\nabla p - \mathbf{q}\|^2_{\Omega}$$
in the loss function is important for the convergence of the pressure, as will be shown via numerical results in Section 5.3.

For brevity of notation, the loss functions in (3.12) are named the ωVP-loss, VSP-loss and VgVP-loss, accordingly. In the rest of this paper, these loss functions will be compared with the simple loss function directly obtained from the original Stokes equation:
$$L_{VP}(\theta_u,\theta_p) = \|\nu\Delta\mathbf{u} - \nabla p + \mathbf{f}\|^2_{\Omega} + \|\nabla\cdot\mathbf{u}\|^2_{\Omega} + \beta\|\mathbf{u} - \mathbf{g}\|^2_{\partial\Omega}, \qquad (3.13)$$
which is named the VP-loss. In the DNN algorithms using this loss function, a total of two MscaleDNNs will be used: one for the velocity vector $\mathbf{u}$, where the output is $y = \mathbf{u}$ in Fig. 1, and one for the scalar pressure $p$.
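To illustrate how such residual losses can be assembled in practice, the following is a hedged PyTorch sketch of the ωVP-loss in (3.12), with the $L^2$ norms replaced by Monte Carlo means over collocation points; the helper names, tensor shapes, and the assumption that the source $\mathbf{f}$ is implemented as a torch-differentiable function are ours, not the paper's.

```python
# Collocation version of the omega-VP loss; derivatives via autograd.
import torch

def grad(y, x):
    """Row-wise gradient dy/dx of a scalar field y evaluated at points x."""
    return torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                               create_graph=True)[0]

def omega_vp_loss(u_net, p_net, w_net, q_net,
                  x_in, x_bc, g_bc, f, nu, alpha, beta):
    x_in.requires_grad_(True)
    u, p = u_net(x_in), p_net(x_in)        # (N,2) velocity, (N,1) pressure
    w, q = w_net(x_in), q_net(x_in)        # (N,1) vorticity, (N,2) grad p
    gw = grad(w[:, 0], x_in)
    curl_w = torch.stack([gw[:, 1], -gw[:, 0]], dim=1)   # curl of a scalar field
    gu1, gu2 = grad(u[:, 0], x_in), grad(u[:, 1], x_in)
    curl_u = gu2[:, 0] - gu1[:, 1]                       # scalar curl of u, cf. (3.5)
    div_u = gu1[:, 0] + gu2[:, 1]
    fx = f(x_in)                           # f assumed torch-differentiable
    div_q = grad(q[:, 0], x_in)[:, 0] + grad(q[:, 1], x_in)[:, 1]
    div_f = grad(fx[:, 0], x_in)[:, 0] + grad(fx[:, 1], x_in)[:, 1]
    grad_p = grad(p[:, 0], x_in)

    def sq(t):                             # mean squared residual over the batch
        return (t ** 2).sum(dim=-1).mean() if t.dim() > 1 else (t ** 2).mean()

    return (sq(nu * curl_w + q - fx) + alpha * sq(div_q - div_f)
            + sq(curl_u - w[:, 0]) + sq(div_u) + sq(grad_p - q)
            + beta * sq(u_net(x_bc) - g_bc))
```

Dropping the two $\mathbf{q}$-terms (the $\alpha$-weighted Poisson residual and $\|\nabla p - \mathbf{q}\|^2_{\Omega}$) in this sketch recovers the reduced loss (5.3) studied in Section 5.3.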
4 A benchmark test: Kovasznay flow

As a benchmark test, we first consider the Stokes problem in a square domain $\Omega$ with an exact solution coinciding with the analytical solution of the incompressible Navier-Stokes equations obtained by Kovasznay [15], i.e.,
$$u_1 = 1 - e^{\lambda x_1}\cos(2\pi x_2), \quad u_2 = \frac{\lambda}{2\pi}\, e^{\lambda x_1}\sin(2\pi x_2), \quad p = \frac{1}{2}\big(1 - e^{2\lambda x_1}\big), \qquad (4.1)$$
where
$$\lambda = \frac{Re}{2} - \sqrt{\frac{Re^2}{4} + 4\pi^2}, \qquad Re = \frac{1}{\nu}.$$
The source term $\mathbf{f}$ is obtained by substituting the exact solution into the Stokes equation (3.1) for a given viscosity $\nu$. The MscaleDNNs are set to have six scales $\{x, 2x, 4x, 8x, 16x, 32x\}$, and their fully connected sub-networks all have 4 hidden layers and 50 neurons in each hidden layer. On the other hand, a fully connected DNN with 4 hidden layers and 300 neurons in each hidden layer was tested for comparison. Therefore, the total numbers of neurons in the fully connected DNN and the MscaleDNNs are the same. Nevertheless, the fully connected DNN does have more connectivity, with more parameters. In the loss functions, we fix $\alpha = \beta$, and we randomly sample a set of points inside $\Omega$ and 10000 points on the boundary for learning. In the learning process, we set the batch size to 1000 points inside the domain and randomly pick 400 points on the boundary for each step.
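For reference, the following is a small NumPy sketch of the Kovasznay solution (4.1); the pressure normalization $p = (1 - e^{2\lambda x_1})/2$ used here is the common convention for this flow and should be read as an assumption.

```python
# Kovasznay exact solution (4.1) for a given viscosity nu, with Re = 1/nu.
import numpy as np

def kovasznay(x1, x2, nu):
    re = 1.0 / nu
    lam = re / 2.0 - np.sqrt(re ** 2 / 4.0 + 4.0 * np.pi ** 2)
    u1 = 1.0 - np.exp(lam * x1) * np.cos(2.0 * np.pi * x2)
    u2 = lam / (2.0 * np.pi) * np.exp(lam * x1) * np.sin(2.0 * np.pi * x2)
    p = 0.5 * (1.0 - np.exp(2.0 * lam * x1))
    return u1, u2, p
```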
• Adaptive learning rates. We have found that reducing the learning rate as the training progresses can give a noticeable improvement in the reduction of the loss. In our numerical tests, the learning rate of the first 100 epochs is set to 0.001. Then, the learning rate is reduced by a factor of 10 after every 100 epochs. The change of learning rate can be seen clearly in the history of the losses.
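Continuing the sketches above, one way to realize Adam with this stepwise decay is shown below; the mini-batch sampler `batches` and the 300-epoch count are illustrative stand-ins, not the authors' training script.

```python
# Adam with learning rate 1e-3, divided by 10 every 100 epochs.
import itertools
import torch

nets = [u_net, p_net, w_net, q_net]          # MscaleDNNs from the sketches above
params = itertools.chain(*(n.parameters() for n in nets))
optimizer = torch.optim.Adam(params, lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

for epoch in range(300):
    for x_in, x_bc, g_bc in batches():       # hypothetical mini-batch sampler
        optimizer.zero_grad()
        loss = omega_vp_loss(u_net, p_net, w_net, q_net,
                             x_in, x_bc, g_bc, f, nu, alpha, beta)
        loss.backward()
        optimizer.step()
    scheduler.step()                          # triggers the decay every 100 epochs
```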
In order to check the accuracy of the algorithms, we define $\ell^2$-errors
$$Err(\mathbf{u}) = \Big(\frac{1}{N}\sum_{j=1}^{N} |\mathbf{u}_{DNN}(\mathbf{x}_j) - \mathbf{u}(\mathbf{x}_j)|^2\Big)^{1/2}, \quad Err(p) = \Big(\frac{1}{N}\sum_{j=1}^{N} |p_{DNN}(\mathbf{x}_j) - p(\mathbf{x}_j)|^2\Big)^{1/2}, \qquad (4.2)$$
between the DNN solution $\{\mathbf{u}_{DNN}(\mathbf{x}), p_{DNN}(\mathbf{x})\}$ and the exact solution $\{\mathbf{u}(\mathbf{x}), p(\mathbf{x})\}$ given in (4.1). Here, $\{\mathbf{x}_j = (x_{j,1}, x_{j,2})\}_{j=1}^{N}$ are the nodes of a uniform $200\times 200$ mesh of the domain $\Omega$.

The DNN solutions obtained by minimizing the different loss functions in (3.12)-(3.13) are compared in Figs. 2 and 3. The results show that both the fully connected DNN and the MscaleDNNs converge within 300 epochs with any one of the loss functions in (3.12). However, the simple VP-loss in (3.13) has a very poor performance no matter whether the fully connected DNN or the MscaleDNNs are used. In particular, neither the fully connected DNN nor the MscaleDNNs can produce reasonable results within 300 epochs if the VP-loss function is used.

Figure 2: Normal DNN with different loss functions: (a) loss; (b) Err(u); (c) Err(p).
Figure 3: MscaleDNN with different loss functions: (a) loss; (b) Err(u); (c) Err(p).

Figure 4: Normal DNN and MscaleDNN with loss function $L_{\omega VP}(\theta_u,\theta_p,\theta_\omega,\theta_q)$: (a) loss; (b) Err(u); (c) Err(p).

More detailed differences can be seen from the comparison of the losses and errors between the normal DNN and the MscaleDNN for the three loss functions in Figs. 4-6. The results show that the MscaleDNNs have much faster convergence no matter which loss function is used. In fact, the MscaleDNNs can achieve much better accuracy than the normal DNN, as we can see in Figs. 4(b)-6(b). In particular, the MscaleDNN solutions obtained by minimizing the ωVP-loss are compared with the exact solution along a fixed horizontal line; the errors at epoch 300 are shown in Fig. 7.

Figure 5: Normal DNN and MscaleDNN with loss function $L_{VSP}(\theta_u,\theta_p,\theta_T,\theta_q)$: (a) loss; (b) Err(u); (c) Err(p).

Figure 6: Normal DNN and MscaleDNN with loss function $L_{VgVP}(\theta_u,\theta_p,\theta_U,\theta_q)$: (a) loss; (b) Err(u); (c) Err(p).

Figure 7: Errors of MscaleDNN solutions at epoch 300 with loss function $L_{\omega VP}(\theta_u,\theta_p,\theta_\omega,\theta_q)$: (a) $u_x$; (b) $u_y$; (c) $p$.

5 Oscillatory Stokes flows in a complex domain

The MscaleDNN is more powerful than a normal DNN due to the former's capability of solving complicated problems with oscillatory solutions. Here, we consider the Stokes flow in a rectangular domain with six randomly placed cylindrical holes inside the domain (refer to Fig. 8). The radii of the cylinders are set to 0.2, 0.15, 0.18, 0.2, 0.18, 0.15, respectively. We will test two exact solutions with highly oscillatory velocity fields. All examples are set to run 1500 epochs using Adam.

Figure 8: An oscillatory solution over a domain with six cylindrical voids: (a) computational domain; (b) an example of $\mathbf{u}$ in (5.2).

The adaptive learning rate technique will be used in the numerical tests below, where the learning rate of the first 500 epochs is set to 0.001; then, the learning rate is reduced by a factor of 10 after every 500 epochs. The change of learning rate can be seen clearly in the history of the losses later.
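Because the method is meshless, collocation points in the perforated domain can be generated by simple rejection sampling, as in the sketch below; the bounding box and hole centers there are hypothetical placeholders (only the radii are quoted in the text above).

```python
# Rejection sampling of interior collocation points for the perforated domain:
# draw uniform points in the bounding box, discard those inside any hole.
import numpy as np

box = np.array([[0.0, 2.0], [0.0, 2.0]])                # assumed bounding box
centers = np.array([[0.5, 0.5], [1.5, 0.5], [0.5, 1.5],
                    [1.5, 1.5], [1.0, 1.0], [1.0, 0.3]])  # hypothetical centers
radii = np.array([0.2, 0.15, 0.18, 0.2, 0.18, 0.15])      # radii from the text

def sample_interior(n, rng=np.random.default_rng(0)):
    pts = np.empty((0, 2))
    while len(pts) < n:
        cand = rng.uniform(box[:, 0], box[:, 1], size=(2 * n, 2))
        d2 = ((cand[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        keep = (d2 > radii[None, :] ** 2).all(axis=1)     # outside every hole
        pts = np.vstack([pts, cand[keep]])
    return pts[:n]
```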
In the loss functions, we fix the penalty parameter $\beta = 100$ and set an initial value for the penalty parameter $\alpha$. During training, we monitor $Err(\mathbf{u})$ and $Err(p)$ and adjust the parameter $\alpha$ as follows (see the sketch after this list):
• If $Err(\mathbf{u}) > Err(p)$, $\alpha$ is increased by a fixed increment.
• If $Err(p) > Err(\mathbf{u})$ and $\alpha$ is larger than a prescribed lower bound, $\alpha$ is decreased by the same increment.
The $\ell^2$-errors defined in (4.2) are now computed with 34,072 randomly selected points in the computational domain.
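A sketch of this adaptive rule is given below; the increment and the lower bound on $\alpha$ are assumed values chosen for illustration.

```python
# Adjust the penalty alpha based on the monitored errors (4.2).
def update_alpha(alpha, err_u, err_p, step=1.0, alpha_min=1.0):
    if err_u > err_p:
        alpha += step                      # assumed fixed increment
    elif err_p > err_u and alpha > alpha_min:
        alpha -= step                      # assumed lower bound alpha_min
    return alpha
```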
5.1 Oscillatory flow with a single frequency

The first case has an exact solution given by
$$\begin{aligned} u_1 &= 1 - e^{\lambda x_1}\cos(n\pi x_1 + m\pi x_2), \\ u_2 &= \frac{\lambda}{m\pi}\, e^{\lambda x_1}\sin(n\pi x_1 + m\pi x_2) + \frac{n}{m}\, e^{\lambda x_1}\cos(n\pi x_1 + m\pi x_2), \\ p &= \frac{1}{2}\big(1 - e^{2\lambda x_1}\big), \qquad \lambda = \frac{Re}{2} - \sqrt{\frac{Re^2}{4} + 4\pi^2}, \qquad Re = \frac{1}{\nu}, \end{aligned} \qquad (5.1)$$
with frequencies $n = m = 55$. In the simulations of this example, the MscaleDNNs for $\mathbf{u}$, $\omega$, $\mathbf{T}$ and $\mathbf{U}$ are set to have 11 scales $\{x, 2x, \cdots, 2^{10}x\}$, and the embedded fully connected DNN for each scale is set to have 8 hidden layers and 150 neurons in each hidden layer. As the pressure does not have high oscillations, the MscaleDNNs for $p$ and $\mathbf{q}$ are set to have 6 scales $\{x, 2x, \cdots, 2^{5}x\}$, and the embedded fully connected DNN for each scale is set to have 8 hidden layers and 50 neurons in each hidden layer. We randomly sample 850621 points inside $\Omega$ and 140000 points on the boundary for learning. In the learning process, we set the batch size to 10000 points inside the domain and randomly pick 2000 points on the boundary for each step.

The MscaleDNN solutions of $\mathbf{u}$ are compared with the exact $\mathbf{u}$ in Figs. 9-11. Here, we plot the solutions along a fixed horizontal line. The errors of $\mathbf{u}$ and $p$ using different losses are depicted in Fig. 12.
We can see that the ωVP-loss or the VgVP-loss with the MscaleDNN can produce very accurate solutions in just 1500 epochs, while the VSP-loss needs more training to achieve similar accuracy.

Figure 9: Exact $u_x$ and its MscaleDNN approximation with ωVP-loss $L_{\omega VP}(\theta_u,\theta_p,\theta_\omega,\theta_q)$.

For comparison, we also test the DNN-based algorithm using only fully connected DNNs. For $\mathbf{u}$ and the intermediate variables $\omega$, $\mathbf{T}$ and $\mathbf{U}$, we use fully connected DNNs with 8 hidden layers and 1650 neurons in each hidden layer. For $p$ and $\mathbf{q}$, we use fully connected DNNs with 8 hidden layers and 300 neurons in each hidden layer. Therefore, the total numbers of neurons in the fully connected DNNs and the MscaleDNNs are the same. The losses and $\ell^2$-errors obtained by minimizing the different loss functions in (3.12) are compared in Figs. 13-15. For this highly oscillatory solution, the algorithms using fully connected DNNs cannot learn anything within 1500 epochs. However, the ones using MscaleDNNs converge very fast within 1500 epochs.

Figure 10: Exact $u_x$ and its MscaleDNN approximation with VSP-loss $L_{VSP}(\theta_u,\theta_p,\theta_T,\theta_q)$.

Figure 11: Exact $u_x$ and its MscaleDNN approximation with VgVP-loss $L_{VgVP}(\theta_u,\theta_p,\theta_U,\theta_q)$.

5.2 Oscillatory flow with multiple frequencies

Our second test problem is a case where the velocity field has multiple high frequencies, as follows:
$$\begin{aligned} u_1 &= 1 - e^{\lambda x_1}\cos(n_1\pi x_1 + n_1\pi x_2) - e^{\lambda x_1}\cos(n_2\pi x_1 + n_2\pi x_2), \\ u_2 &= \frac{\lambda}{n_1\pi}\, e^{\lambda x_1}\sin(n_1\pi x_1 + n_1\pi x_2) + e^{\lambda x_1}\cos(n_1\pi x_1 + n_1\pi x_2) \\ &\quad + \frac{\lambda}{n_2\pi}\, e^{\lambda x_1}\sin(n_2\pi x_1 + n_2\pi x_2) + e^{\lambda x_1}\cos(n_2\pi x_1 + n_2\pi x_2), \\ p &= \frac{1}{2}\big(1 - e^{2\lambda x_1}\big), \qquad \lambda = \frac{Re}{2} - \sqrt{\frac{Re^2}{4} + 4\pi^2}, \qquad Re = \frac{1}{\nu}, \end{aligned} \qquad (5.2)$$
with two distinct frequency parameters $n_1$ and $n_2$.
Figure 12: Errors of MscaleDNN approximations using different loss functions: (a) Err(u); (b) Err(p).

Figure 13: Comparison of a normal DNN and the MscaleDNN with loss function $L_{\omega VP}(\theta_u,\theta_p,\theta_\omega,\theta_q)$: (a) loss; (b) Err(u); (c) Err(p).

Figure 14: Comparison of a normal DNN and the MscaleDNN with loss function $L_{VSP}(\theta_u,\theta_p,\theta_T,\theta_q)$: (a) loss; (b) Err(u); (c) Err(p).

Figure 15: Comparison of a normal DNN and the MscaleDNN with loss function $L_{VgVP}(\theta_u,\theta_p,\theta_U,\theta_q)$: (a) loss; (b) Err(u); (c) Err(p).

For this test, the MscaleDNNs for $\mathbf{u}$, $\omega$, $\mathbf{T}$ and $\mathbf{U}$ are set to have 10 scales $\{x, 2x, \cdots, 2^{9}x\}$, and the embedded fully connected DNN for each scale is set to have 8 hidden layers and 120 neurons in each hidden layer. As in the last numerical test, the MscaleDNNs for $p$ and $\mathbf{q}$ are set to have 6 scales $\{x, 2x, \cdots, 2^{5}x\}$, and the embedded fully connected DNN for each scale is set to have 8 hidden layers and 50 neurons in each hidden layer. We randomly sample 425290 points inside $\Omega$ and 140000 points on the boundary for learning. In the learning process, we set the batch size to 5000 points inside the domain and randomly select 2000 points on the boundary for each step.

The MscaleDNN solutions of $\mathbf{u}$ are compared with the exact $\mathbf{u}$ in Figs. 16-18. Here, we again plot the solutions along a fixed horizontal line. The errors of $\mathbf{u}$ and $p$ using different losses are depicted in Fig. 19.
We can see that the ωVP-loss or the VgVP-loss with the MscaleDNN can obtain very accurate solutions within 1500 epochs. Again, the VSP-loss needs more training to achieve similar accuracy.

Figure 16: Exact $u_x$ and its MscaleDNN approximation with loss function $L_{\omega VP}(\theta_u,\theta_p,\theta_\omega,\theta_q)$.

For comparison, we test algorithms using only fully connected DNNs. For $\mathbf{u}$ and the variables $\omega$, $\mathbf{T}$ and $\mathbf{U}$, we use fully connected DNNs with 8 hidden layers and 1200 neurons in each hidden layer. For $p$ and $\mathbf{q}$, we use fully connected DNNs with 8 hidden layers and 300 neurons in each hidden layer. Again, the total numbers of neurons in the fully connected DNNs and the MscaleDNNs are the same. The losses and $\ell^2$-errors obtained by minimizing the different loss functions in (3.12) are compared in Figs. 20-22, which clearly show the fast convergence of the MscaleDNNs when the normal fully connected DNNs fail to converge at all.

Figure 17: Exact $u_x$ and its MscaleDNN approximation with loss function $L_{VSP}(\theta_u,\theta_p,\theta_T,\theta_q)$.

Figure 18: Exact $u_x$ and its MscaleDNN approximation with loss function $L_{VgVP}(\theta_u,\theta_p,\theta_U,\theta_q)$.
Figure 19: Errors of MscaleDNN approximations using different loss functions: (a) Err(u); (b) Err(p).

Figure 20: Comparison of a normal DNN and the MscaleDNN with loss function $L_{\omega VP}(\theta_u,\theta_p,\theta_\omega,\theta_q)$: (a) loss; (b) Err(u); (c) Err(p).

Figure 21: Comparison of a normal DNN and the MscaleDNN with loss function $L_{VSP}(\theta_u,\theta_p,\theta_T,\theta_q)$: (a) loss; (b) Err(u); (c) Err(p).

Figure 22: Comparison of a normal DNN and the MscaleDNN with loss function $L_{VgVP}(\theta_u,\theta_p,\theta_U,\theta_q)$: (a) loss; (b) Err(u); (c) Err(p).

5.3 Pressure p and the Poisson equation

It is a well-known fact that the traditional projection methods for incompressible flow may experience an error degeneration for the pressure near the boundaries, depending on the type of pressure boundary conditions used for the Poisson equation (3.10) [13]. To show the importance of the pressure's Poisson equation in the DNN-based approaches for the Stokes problem, we will study a loss function without explicitly including the residual of the Poisson equation. Here, we consider a modification of the loss function $L_{\omega VP}(\theta_u,\theta_p,\theta_\omega,\theta_q)$ given by
$$\widetilde{L}_{\omega VP}(\theta_u,\theta_p,\theta_\omega) := \|\nu\nabla\times\omega + \nabla p - \mathbf{f}\|^2_{\Omega} + \|\nabla\times\mathbf{u} - \omega\|^2_{\Omega} + \|\nabla\cdot\mathbf{u}\|^2_{\Omega} + \beta\|\mathbf{u} - \mathbf{g}\|^2_{\partial\Omega}. \qquad (5.3)$$
The input data, sizes of the MscaleDNNs and other settings are exactly the same as those used in the numerical tests in Section 5.2. The loss and errors of the MscaleDNN solutions are compared with those of the algorithm using the loss function $L_{\omega VP}$ in Fig. 23. We can see that the loss and $Err(\mathbf{u})$ are comparable. However, $Err(p)$ is significantly improved if the loss function with a Poisson equation residual is used.
Figure 23: Effect of the Poisson equation in the loss function: loss functions $L_{\omega VP}(\theta_u,\theta_p,\theta_\omega,\theta_q)$ (red lines and diamonds) vs. $\widetilde{L}_{\omega VP}(\theta_u,\theta_p,\theta_\omega)$ (blue lines and circles); (a) loss; (b) Err(u); (c) Err(p).

6 Conclusion and future work

In this paper, we have studied MscaleDNN methods for solving highly oscillatory Stokes flows in complex domains, and demonstrated the capability of the MscaleDNN as a meshless and high resolution numerical method for simulating flows in complex domains. Several least squares formulations of the Stokes equations, using different forms of first order systems, are used to construct the loss functions for the MscaleDNN learning. The numerical results have clearly demonstrated the increased resolution power of the MscaleDNN in capturing the fine structures in the flow fields when a normal fully connected network of the same overall size fails to converge at all. The MscaleDNN shows the potential of DNN machine learning as a practical alternative to traditional finite element methods. The DNN-based methods have the obvious advantage of requiring neither expensive mesh generation and matrix solvers, as in traditional mesh-based numerical methods, nor the delicate treatment of pressure boundary conditions and incompressibility constraints of the flow field.

There are many unresolved issues for solving the Navier-Stokes equations; among them, the most important one is to understand the convergence properties of the MscaleDNN learning. A related issue is to find adaptive strategies to dynamically select the penalty constants for the various terms in the loss functions, to which the performance of DNN based machine learning PDE algorithms is sensitive. It should also be mentioned that the structure of the MscaleDNN is amenable to adaptive selection of scales, by either adding or removing a scale dynamically during learning; future work will explore this feature, as well as apply the MscaleDNN to 3-D time-dependent incompressible flows.

Acknowledgments
W.C. is supported by the U.S. Army Research Office (grant W911NF-17-1-0368). B.W. acknowledges the financial support provided by NSFC (grants 11771137, 12022104).
References

[1] P. B. Bochev and M. D. Gunzburger, Finite element methods of least-squares type, SIAM Rev., 40 (1998), pp. 789-837.
[2] W. Cai, X. G. Li, and L. Z. Liu, A phase shift deep neural network for high frequency approximation and wave problems, to appear in SIAM J. Sci. Comput., arXiv:1909.11759, 2019.
[3] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. Zang, Spectral Methods in Fluid Dynamics, Springer-Verlag, New York/Berlin, 1987.
[4] A. J. Chorin, On the convergence of discrete approximations to the Navier-Stokes equations, Math. Comp., 23 (1969), pp. 341-353.
[5] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, 1992.
[6] W. N. E and J. G. Liu, Gauge method for viscous incompressible flows, Commun. Math. Sci., 1 (2003), pp. 317-332.
[7] W. N. E and B. Yu, The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems, Commun. Math. Stat., 6 (2018), pp. 1-12.
[8] V. Girault and P. A. Raviart, Finite Element Methods for Navier-Stokes Equations: Theory and Algorithms, Springer Science & Business Media, 2012.