A deep neural network approach on solving the linear transport model under diffusive scaling
Liu Liu*, Tieyong Zeng†, Zecheng Zhang‡

*Department of Mathematics, The Chinese University of Hong Kong
†Department of Mathematics, The Chinese University of Hong Kong
‡Department of Mathematics, Purdue University, West Lafayette, IN 47906, USA

Abstract
In this work, we propose a learning method for solving the linear transport equation under the diffusive scaling. Due to the multiscale nature of the model equation, it is challenging to solve by conventional methods. We employ the physics-informed neural network (PINN) framework, a mesh-free learning method that numerically solves partial differential equations. Compared to conventional methods (such as finite difference or finite element methods), the proposed learning method can obtain the solution at any given point in the chosen domain accurately and efficiently, which enables us to better understand the physics underlying the model. In our framework, the solution is approximated by a neural network that satisfies both the governing equation and other constraints; the network is then trained with a combination of different loss terms. Using the approximation theory and energy estimates for kinetic models, we prove theoretically that the total loss vanishes as the neural network converges, upon which the neural network approximated solution converges pointwise to the analytic solution of the linear transport model. Numerical experiments for two benchmark examples are conducted to verify the effectiveness and accuracy of our proposed method.
1 Introduction

The linear transport model, which describes kinetic particles undergoing collision and absorption through a material medium while evolving in time, is an important physics model and arises in applications across various fields. In this paper, we study the linear transport equation under the diffusive scaling by using the deep neural network approach.

The deep learning method has been successful in solving various problems in computer science; it improves the computational speed and accuracy of many tasks such as image classification, segmentation and language processing. Once the model is trained, one can use it by substituting the testing samples into the model, which usually is a forward process in which only tensor products are involved. Data-driven solvers for partial differential equations (PDEs) have drawn increasing attention due to their capability to encode the underlying physical laws governed by the model equation and to give relatively accurate predictions for the unknowns. In particular, we mention the physics-informed neural network (PINN) framework [10, 18, 16, 11, 19, 7, 21]. This method leverages the benefits of the auto-differentiation in current software and the underlying physics of the PDEs, which can be incorporated into the network by minimizing losses composed of the PDE's residuals and other constraints such as the initial and boundary conditions. Many other related works have been developed, for example [2, 3, 26, 24]. In this work, we apply this popular method to solve an important, hyperbolic-type kinetic equation.

The deep learning method can resolve some difficulties of traditional finite difference-type numerical methods [17], such as the expensive computational cost, especially for problems with high-dimensional physical variables; the challenge of dealing with complex boundary conditions; and the truncation of the velocity domain (in some models, such as the Boltzmann equation, the velocity lies in the three-dimensional whole space, thus a truncation is needed in numerical discretizations). Besides, the deep learning algorithm has the advantage of being intuitive and easy to execute. For example, instead of designing mass (or momentum and energy, if applicable) conservative schemes for kinetic models, which is challenging for traditional numerical methods [6], one can simply include the time derivative of the conserved quantities of interest in the total loss function, as implemented in [8].

We mention some other advantages of using the DNN approach to solve the linear transport model: a) one obtains the distribution function at any given $(t, x, v)$, instead of only discrete values on a uniform mesh as in traditional finite volume or finite element methods; being mesh-free, the method works efficiently for problems with high-dimensional physical space; b) one avoids the high computational cost in simulating kinetic equations caused by the velocity variable $v$ and by the integral-based, nonlocal collision operators that appear in more complicated kinetic models such as the Boltzmann or Landau equations. Nevertheless, we mention that there are indeed some weaknesses of the deep learning approach. First, there is no guarantee that the deep learning algorithm will converge, and it is practically difficult to show its convergence.
It is also hard to evaluate the accuracy of the DNN approach in contrast with traditional numerical methods.

This paper is organized as follows. In Section 2, we introduce the background of the linear transport model under the diffusive scaling. In Section 3, we review and discuss the neural network framework and the method for solving general PDEs. Two main convergence results are given in Section 4, showing that 1) the loss function goes to zero as the neural network converges; 2) the neural network solution converges pointwise to the analytic solution when the loss function converges to zero. The effectiveness and accuracy of our proposed method, including the choice of weights in the total loss function, will be presented in Section 5. Finally, we summarize the paper and mention some future work in Section 6.

2 The linear transport model

The linear transport model, which describes kinetic particles undergoing collision and absorption through a material medium while evolving in time, arises in many applications, such as atmosphere and ocean modeling [4, 20, 25], astrophysics [15] or nuclear physics. Such problems usually involve several orders of magnitude of length scales, characterized by the Knudsen number, defined as the ratio of the mean free path over a typical length scale such as the size of the spatial domain.

We consider the linear transport equation under the diffusive scaling, with one-dimensional space and velocity variables ($x \in \Omega \subset \mathbb{R}$, $v \in [-1, 1]$):

$$\varepsilon\,\partial_t f + v\,\partial_x f = \frac{1}{\varepsilon}\,\mathcal{L}(f), \qquad f(t=0, x, v) = f_0(x, v),$$
$$\mathcal{L}(f) = \sigma(x)\left[\frac{1}{2}\int_{-1}^{1} f(t, x, v')\, dv' - f(t, x, v)\right]. \tag{2.1}$$

Here $f(t,x,v)$ is the distribution function of the particles, $\sigma(x)$ is the scattering coefficient, and $\mathcal{L}$ is the collision operator.
3 The DNN method

We briefly review the deep neural network (DNN) structure and approach introduced in [8], where the one-dimensional kinetic Fokker–Planck equation is studied. Denote the approximated function by $f_{nn}(t, x, v; m, w, b)$ and suppose the DNN has $L$ layers; the input layer takes $(t, x, v)$ as input and the final layer gives $f_{nn}(t, x, v; m, w, b)$ as the output. The relation between the $l$-th and $(l+1)$-th layers ($l = 1, 2, \cdots, L-1$) is given by

$$u_j^{(l+1)} = \sum_{i=1}^{m_l} w_{ji}^{(l+1)}\,\bar{\sigma}_l\big(z_i^{l}\big) + b_j^{(l+1)},$$

where $\bar{\sigma}_l$ is the activation function of the $l$-th layer, $m = (m_1, m_2, \ldots, m_{L-1})$ collects the numbers of nodes per layer, and the weights $w = \big\{w_{ji}^{(k)}\big\}_{i,j,k=1}^{m_{k-1},\, m_k,\, L}$ and biases $b = \big\{b_j^{(k)}\big\}_{j=1, k=1}^{m_k,\, L}$ are given in [8], which we refer to for details.

Regarding the optimization, we use the Adam algorithm, an extension of stochastic gradient descent that is widely used in deep learning applications.
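To make the network structure concrete, the following is a minimal PyTorch sketch of such a fully connected network together with the Adam optimizer; the hidden widths, learning rate, and class name are illustrative assumptions rather than the exact configuration of [8].

```python
import torch
import torch.nn as nn

class FNN(nn.Module):
    """Fully connected network mapping (t, x, v) to a scalar f_nn.
    The hidden widths below are placeholders, not the paper's exact choice."""
    def __init__(self, widths=(3, 64, 64, 64, 1)):
        super().__init__()
        layers = []
        for m_in, m_out in zip(widths[:-2], widths[1:-1]):
            # u^{(l+1)} = W^{(l+1)} sigma(z^l) + b^{(l+1)}, realized as Linear + Tanh
            layers += [nn.Linear(m_in, m_out), nn.Tanh()]
        layers += [nn.Linear(widths[-2], widths[-1])]  # linear output layer
        self.net = nn.Sequential(*layers)

    def forward(self, t, x, v):
        return self.net(torch.cat([t, x, v], dim=-1))

f_nn = FNN()
optimizer = torch.optim.Adam(f_nn.parameters(), lr=5e-4)
```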
3.1 Definition of loss functions

The loss function for the governing linear transport equation (2.1) is defined by
$$\begin{aligned}
\text{Loss}_{GE} &= \int_0^T\!\!\int_\Omega\!\!\int_V \Big|\,\varepsilon\,\partial_t f_{nn}(t,x,v; m,w,b) + v\,\partial_x f_{nn}(t,x,v; m,w,b) - \frac{1}{\varepsilon}\,\mathcal{L}(f_{nn})(t,x,v; m,w,b)\,\Big|^2\, dv\, dx\, dt \\
&\approx \frac{1}{N_{i,j,k}} \sum_{i,j,k} \Big|\,\varepsilon\,\partial_t f_{nn}(t_i,x_j,v_k; m,w,b) + v_k\,\partial_x f_{nn}(t_i,x_j,v_k; m,w,b) - \frac{1}{\varepsilon}\,\mathcal{L}(f_{nn})(t_i,x_j,v_k; m,w,b)\,\Big|^2,
\end{aligned} \tag{3.1}$$

where $N_{i,j,k} = N_i N_j N_k$, and the collision operator is discretized as

$$\mathcal{L}(f_{nn})(t_i,x_j,v_k; m,w,b) = \sigma(x_j)\left[\frac{1}{N_k}\sum_{k'=1}^{N_k} f_{nn}(t_i,x_j,v_{k'}; m,w,b) - f_{nn}(t_i,x_j,v_k; m,w,b)\right].$$

Let $n_x$ be the unit outward normal vector on the boundary $\partial\Omega$, and set $\gamma := \partial\Omega \times [-1,1]$. We distinguish the outgoing boundary $\gamma^+$, the incoming boundary $\gamma^-$ and the singular boundary $\gamma^0$, defined by

$$\gamma^+ := \{(x,v)\in\partial\Omega\times[-1,1] : n_x\cdot v > 0\},$$
$$\gamma^- := \{(x,v)\in\partial\Omega\times[-1,1] : n_x\cdot v < 0\},$$
$$\gamma^0 := \{(x,v)\in\partial\Omega\times[-1,1] : n_x\cdot v = 0\}.$$
1] : n x · v = 0 } . We now define the loss terms for the initial condition and the boundary conditions:
$$\text{Loss}_{IC} = \int_\Omega\!\int_V \big|f_{nn}(0,x,v) - f_0(x,v)\big|^2\, dv\, dx \approx \frac{1}{N_{j,k}}\sum_{j,k} \big|f_{nn}(0,x_j,v_k) - f_0(x_j,v_k)\big|^2. \tag{3.2}$$

For the inflow boundary condition $f(t,x,v)\big|_{\gamma^-} = g(t,x,v)$ on $x\in\partial\Omega$, one has
$$\text{Loss}_{BC} = \sum_{x\in\partial\Omega}\int_0^T\!\int_V \big|f_{nn}(t,x,v; m,w,b) - g(t,x,v)\big|^2\, dv\, dt \approx \frac{|\partial\Omega|}{N_{i,k}}\sum_{i,k} \big|f_{nn}(t_i,x,v_k; m,w,b) - g(t_i,x,v_k)\big|^2, \tag{3.3}$$

where $|\partial\Omega|$ denotes the volume of the spatial boundary. Adding up (3.1), (3.2) and (3.3) with appropriate weights $\{\lambda_g, \lambda_i, \lambda_b\}$, the total loss function is defined by

$$\text{Loss}_{Total} = \lambda_g\,\text{Loss}_{GE} + \lambda_i\,\text{Loss}_{IC} + \lambda_b\,\text{Loss}_{BC}. \tag{3.4}$$
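For completeness, here is a hedged PyTorch sketch of how the residual in (3.1) and the weighted total loss (3.4) can be assembled with automatic differentiation; the batching, helper names, and sampling interfaces are illustrative assumptions, not a verbatim excerpt of our training code. It assumes a network `f_nn` taking three column tensors, as in the sketch of Section 3.

```python
import torch

def pde_residual(f_nn, t, x, v, eps, sigma, v_nodes, v_weights):
    """Residual of (2.1): eps*f_t + v*f_x - (1/eps)*L(f), via autograd.
    t, x, v are (N, 1) tensors; v_nodes/v_weights give the quadrature in v."""
    t.requires_grad_(True); x.requires_grad_(True)
    f = f_nn(t, x, v)
    f_t = torch.autograd.grad(f.sum(), t, create_graph=True)[0]
    f_x = torch.autograd.grad(f.sum(), x, create_graph=True)[0]
    # 0.5 * \int f dv by quadrature: evaluate f_nn at every velocity node
    f_v = torch.stack([f_nn(t, x, torch.full_like(t, float(vk))) for vk in v_nodes], dim=0)
    f_bar = 0.5 * torch.sum(v_weights[:, None, None] * f_v, dim=0)
    return eps * f_t + v * f_x - (sigma / eps) * (f_bar - f)

def total_loss(res, f0_pred, f0_true, g_pred, g_true, lam_g, lam_i, lam_b):
    """Weighted total loss (3.4) from the three mean-square loss terms."""
    loss_ge = torch.mean(res**2)
    loss_ic = torch.mean((f0_pred - f0_true)**2)
    loss_bc = torch.mean((g_pred - g_true)**2)
    return lam_g * loss_ge + lam_i * loss_ic + lam_b * loss_bc
```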
4 Analysis results
In this section, we show the two main theoretical results. Recall that the neural network architecture was first introduced in [14]. Later, in [5], Cybenko established sufficient conditions under which a continuous function can be approximated by finite linear combinations of single hidden layer neural networks, followed by the work in [12] that extends the theory to the multi-layer network case. We paraphrase the Universal Approximation Theorem [5] in the form needed in our context: if $f$ solves the linear transport model (2.1) and is sufficiently smooth in all variables, then for any $\eta > 0$ there exists a two-layer neural network

$$f_{nn}(t,x,v) = \sum_{i=1}^{m} w_{1i}^{(2)}\,\bar{\sigma}\Big(\big(w_{i1}^{(1)}, w_{i2}^{(1)}, w_{i3}^{(1)}\big)\cdot(t,x,v) + b_i^{(1)}\Big) + b_1^{(2)}$$

such that

$$\|f - f_{nn}\|_{L^\infty(K)} < \eta, \qquad \|\partial_t (f - f_{nn})\|_{L^\infty(K)} < \eta, \qquad \|\nabla_x (f - f_{nn})\|_{L^\infty(K)} < \eta,$$

where the domain $K = [0,T]\times\Omega\times[-1,1]$.
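As a concrete reading of this statement, the following sketch builds a two-layer network of exactly the above form; the width $m$, the Tanh activation, and the random parameters are illustrative assumptions.

```python
import torch

# Two-layer (single hidden layer) network in the form of the
# Universal Approximation Theorem above; width m is an assumption.
m = 64
W1 = torch.randn(m, 3)   # rows are (w_i1^(1), w_i2^(1), w_i3^(1))
b1 = torch.randn(m)
W2 = torch.randn(1, m)   # output weights w_1i^(2)
b2 = torch.randn(1)

def f_nn(t, x, v):
    """f_nn(t,x,v) = sum_i w_1i^(2) sigma(w_i^(1).(t,x,v) + b_i^(1)) + b_1^(2)."""
    u = torch.stack([t, x, v], dim=-1)            # shape (..., 3)
    return torch.tanh(u @ W1.T + b1) @ W2.T + b2  # shape (..., 1)
```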
We first show that there exists a sequence of neural network solutions to (2.1) such that the total loss function converges to zero.

Theorem 4.1.
Let $f$ solve equation (2.1) and satisfy $f \in C^1([0,T]) \cap C^1(\Omega\times V)$. Then there exists a sequence of neural network parameters $\{m^{[j]}, w^{[j]}, b^{[j]}\}_{j=1}^\infty$ such that the sequence of DNN solutions with $m^{[j]}$ nodes, given by $\{f_j(t,x,v) = f_{nn}(t,x,v; m^{[j]}, w^{[j]}, b^{[j]})\}_{j=1}^\infty$, satisfies

$$\text{Loss}_{Total}(f_j) \to 0, \quad \text{as } j\to\infty.$$

Proof.
Denote the velocity domain $D = [-1,1]$ and $\mu(V) = \int_D dv = 2$. Define

$$d_{ge,j}(t,x,v) := -\big[\varepsilon\partial_t + v\cdot\nabla_x\big] f_j + \frac{1}{\varepsilon}\mathcal{L}(f_j). \tag{4.1}$$

Since $f$ solves (2.1), integrating $|d_{ge,j}|^2$ over $[0,T]\times\Omega\times D$ is equivalent to integrating

$$\Big|\,\varepsilon\partial_t (f - f_j) + v\cdot\nabla_x (f - f_j) - \frac{1}{\varepsilon}\mathcal{L}(f) + \frac{1}{\varepsilon}\mathcal{L}(f_j)\,\Big|^2. \tag{4.2}$$

We will show that the loss term (3.1) is bounded by $O(\eta^2)$. The first two terms in (4.2) are bounded by $\eta^2\, T\,\mu(V)\,|\Omega|\,(\varepsilon + \mu(V))^2$. Observe that for any function $g$,

$$\int_{-1}^{1} \big(\mathcal{L}(g)\big)^2\, dv = -\frac{\sigma^2}{2}\Big(\int_{-1}^{1} g(v)\, dv\Big)^2 + \sigma^2\int_{-1}^{1} g(v)^2\, dv \le \sigma^2\int_{-1}^{1} g(v)^2\, dv. \tag{4.3}$$

Since

$$\|f - f_j\|^2_{L^2(D)} \le \|f - f_j\|^2_{L^\infty(D)}\,\mu(V) < C\eta^2, \tag{4.4}$$

letting $g = f - f_j$ in (4.3) and using (4.4), we have $\|\mathcal{L}(f) - \mathcal{L}(f_j)\|_{L^2_v} = \|\mathcal{L}(f - f_j)\|_{L^2_v} < O(\eta)$. Thus for the last two terms in (4.2),

$$\Big\|\frac{1}{\varepsilon}\mathcal{L}(f) - \frac{1}{\varepsilon}\mathcal{L}(f_j)\Big\|^2_{L^2(\Omega\times D)} < O\Big(\frac{\eta^2}{\varepsilon^2}\Big), \tag{4.5}$$

since $\Omega$ is bounded. Therefore, the loss term $\text{Loss}_{GE}$ in (3.1) is bounded by $O\big(\varepsilon^2\eta^2 + (\eta/\varepsilon)^2\big)$.

For the inflow boundary condition, $\text{Loss}_{BC}$ is bounded by

$$\|f_j - f\|^2_{L^2(\gamma^-_T)} \le T\mu(V)|\partial\Omega|\,\|f_j - f\|^2_{L^\infty(\gamma^-_T)} \le T\mu(V)|\partial\Omega|\,\|f_j - f\|^2_{L^\infty([0,T]\times\Omega\times D)} \le O(\eta^2),$$

where $\gamma^\pm_T := [0,T]\times\gamma^\pm$. Note that the specular reflection boundary condition works similarly. For the initial data, denoted by $f_{j,in}$ and $f_{in}$ respectively,

$$\text{Loss}_{IC} = \|f_{j,in} - f_{in}\|^2_{L^2(\Omega\times D)} \le \|f_{j,in} - f_{in}\|^2_{L^\infty(\Omega\times D)}\,|\Omega|\,\mu(V) \le O(\eta^2).$$

Setting $\eta = \eta_j = 1/j$ and combining all the loss terms (3.1)–(3.3), we conclude that

$$\text{Loss}_{Total}(f_j) \le O\Big(\frac{1}{\varepsilon^2 j^2}\Big).$$

Therefore, $\text{Loss}_{Total}(f_j) \to 0$ as $j\to\infty$. $\blacksquare$

In the rest of this section, we show that, equipped with the parameters $\{m^{[j]}, w^{[j]}, b^{[j]}\}_{j=1}^\infty$, the neural network in Theorem 4.1 converges to the analytic solution of the linear transport model.

Theorem 4.2.
Let $\{m^{[j]}, w^{[j]}, b^{[j]}\}_{j=1}^\infty$ be the sequence defined in Theorem 4.1 and let $f$ solve the linear transport model (2.1). Then $\text{Loss}_{Total}(f_j) \to 0$ implies that

$$\big\|f_j(\cdot,\cdot,\cdot\,; m^{[j]}, w^{[j]}, b^{[j]}) - f\big\|_{L^\infty([0,T];\, L^2(\Omega\times D))} \to 0,$$

for finite time $t\in[0,T]$ and physical variables $(x,v)\in\Omega\times D$.

Proof.
Recall the definition (4.1); then

$$\big[\varepsilon\partial_t + v\cdot\nabla_x\big]\{f - f_j\} = d_{ge,j}(t,x,v) + \frac{1}{\varepsilon}\mathcal{L}(f) - \frac{1}{\varepsilon}\mathcal{L}(f_j). \tag{4.6}$$

Define

$$d_{ic,j}(x,v) := f_0(x,v) - f_j(0,x,v),$$

in addition to

$$d_{bc,j}(t,x,v) := g(t,x,v) - f_j(t,x,v) \quad \text{at } (t,x,v)\in\gamma^-_T$$

for the inflow boundary condition, and $d_{bc,j}(t,x,v) := f_j(t,x,-v) - f_j(t,x,v)$ for the specular reflection boundary condition.

The $L^2$ norms and inner products below stand for $\|\cdot\|_{L^2(\Omega\times D)}$. Multiplying $2(f - f_j)$ onto (4.6) and integrating over $\Omega\times D$, one gets

$$\int_\Omega\!\int_D \varepsilon\,\partial_t (f - f_j)^2\, dv\, dx + \int_{\gamma^+} (f - f_j)^2\, v\cdot n_x\, d\gamma - \int_{\gamma^-} d_{bc,j}^2\,|v\cdot n_x|\, d\gamma = \frac{2}{\varepsilon}\big\langle \mathcal{L}(f - f_j),\, f - f_j\big\rangle_{L^2} + 2\big\langle d_{ge,j},\, f - f_j\big\rangle_{L^2}. \tag{4.7}$$

Recall from (4.5) that $\|\mathcal{L}(f - f_j)\|_{L^2} \le C\eta$. The right-hand side of (4.7) is bounded via

$$\frac{2}{\varepsilon}\big\langle \mathcal{L}(f - f_j),\, f - f_j\big\rangle_{L^2} \le \frac{1}{\varepsilon}\|\mathcal{L}(f - f_j)\|^2_{L^2} + \frac{1}{\varepsilon}\|f - f_j\|^2_{L^2},$$

and since $\int_{\gamma^+} (f - f_j)^2\, v\cdot n_x\, d\gamma \ge 0$, thus

$$\varepsilon\frac{d}{dt}\|f - f_j\|^2_{L^2} \le \underbrace{\int_{\gamma^-} d_{bc,j}^2\,|v\cdot n_x|\, d\gamma + \|d_{ge,j}\|^2_{L^2} + \frac{C\eta^2}{\varepsilon}}_{:=\, H(t)} + \Big(1 + \frac{1}{\varepsilon}\Big)\|f - f_j\|^2_{L^2}.$$

Since $\int_{\gamma^-} d_{bc,j}^2\,|v\cdot n_x|\, d\gamma \le \int_{\gamma^-} d_{bc,j}^2\, d\gamma$, and by the definitions

$$\int_0^t\!\int_{\gamma^-} \big(d_{bc,j}(s,\cdot,\cdot)\big)^2\, d\gamma\, ds = \text{Loss}_{BC}, \qquad \int_0^t \|d_{ge,j}(s,\cdot,\cdot,\cdot)\|^2_{L^2}\, ds = \text{Loss}_{GE},$$

Grönwall's inequality gives

$$\begin{aligned}
\|f - f_j\|^2_{L^2(\Omega\times D)} &\le e^{(\frac{1}{\varepsilon}+\frac{1}{\varepsilon^2})t}\,\text{Loss}_{IC} + \frac{1}{\varepsilon}\, e^{(\frac{1}{\varepsilon}+\frac{1}{\varepsilon^2})t}\int_0^t H(s)\, ds \\
&\lesssim e^{(\frac{1}{\varepsilon}+\frac{1}{\varepsilon^2})t}\Big(\text{Loss}_{IC} + \frac{1}{\varepsilon}\text{Loss}_{BC} + \frac{1}{\varepsilon}\text{Loss}_{GE} + \frac{C\eta^2 t}{\varepsilon^2}\Big) \\
&\le e^{(\frac{1}{\varepsilon}+\frac{1}{\varepsilon^2})t}\Big(\frac{1}{\varepsilon}\,\text{Loss}_{Total} + \frac{C\eta^2 t}{\varepsilon^2}\Big).
\end{aligned}$$

We already know from Theorem 4.1 that $\text{Loss}_{Total}(f_j) \le O(\eta^2/\varepsilon^2)$; now take the $L^\infty$ norm in $t\in[0,T]$ with $\eta = \eta_j = 1/j \to 0$, then

$$\|f - f_j\|^2_{L^2(\Omega\times D)} \le C'\, e^{(\frac{1}{\varepsilon}+\frac{1}{\varepsilon^2})t}\,\eta^2\Big(\frac{1}{\varepsilon^3} + \frac{Ct}{\varepsilon^2}\Big). \tag{4.8}$$

Therefore,

$$\|f - f_j\|_{L^\infty([0,T];\, L^2(\Omega\times D))} \to 0, \quad \text{as } j\to\infty. \qquad \blacksquare$$

Remark 4.3. In [13], Liu et al. employed hypocoercivity and Lyapunov-type functionals to conduct sensitivity analysis for a general class of kinetic equations with random uncertainties and multiple scales. It is possible to adopt that framework and its energy estimates to study the convergence of the DNN solution (or even to improve the result of this section to a stronger one with exponential decay in time), as presented in this section. However, due to its high complexity and the main focus of the current manuscript, we defer this to future work. We also mention that similar analysis techniques can be found in [27], where the radiative transfer model in the kinetic regime ($\varepsilon = 1$) was studied by the DNN approach, with numerical examples of different applications and its long-time behavior.

5 Numerical examples

In this section, we verify the proposed method with two examples. The first example aims to show the accuracy of the method; a more challenging problem with a boundary layer is studied in the second test, which verifies that our proposed method is able to capture the physical properties of the model. We compute the solutions of the model problems by using the numerical method shown in the Appendix, then compare them with those of the proposed deep learning method. As discussed before, the deep learning method is mesh-free, hence it inherits all the benefits of mesh-free numerical methods while keeping a low computational cost in the testing phase. We first introduce the grid points of the testing data used in the numerical computation; details of the two examples are then presented.
5.1 Reference solutions and test data

In this subsection, we show the computational details of the reference solution. To obtain the reference solutions from a conventional solver, we adopt the robust, implicit yet explicitly implementable asymptotic-preserving (AP) numerical scheme in the even-odd decomposition framework [9], which achieves high resolution uniformly with respect to the scaling parameter $\varepsilon$. We use the solutions of this solver as the reference solutions to compare with the DNN-approximated solutions in our numerical tests. For the convenience of readers, we review this scheme in the Appendix.

Let $\Omega = [0,1]$. The $t$ and $x$ grids for the training are chosen uniformly as

$$\{(t_i, x_j)\}_{i,j} \in [0,T]\times[0,1], \quad \text{with fixed } \Delta t,\ \Delta x.$$

The integral in velocity can be computed using a quadrature rule; only a few points are needed, such as $N_v = 32$ points $\{v_k\}_{k=1}^{N_v}$. We use the grids $\{(t=0, x_j, v_k)\}_{j,k}$ for the initial condition and $\{(t_i, x=0 \text{ or } 1, v_k)\}_{i,k}$ for the boundary condition. We save the data at fixed $t$ and $x$, for all $N_v = 32$ velocity points in $[-1,1]$.

In the conventional finite difference AP solver, we set the spatial grid $x_i$ ($1\le i\le N$) with $N = 40$, so that $\Delta x = 0.025$, and choose $\Delta t \sim O((\Delta x)^2)$ for stability. To summarize, in both of our experiments, we test our models against reference solutions evaluated on the following fine mesh:

$$\{(t_i, x_j, v_k)\}_{i,j,k} \in [0,T]\times[0,1]\times[-1,1], \quad \Delta t = O((\Delta x)^2),\ \Delta x = \frac{1}{40},\ \Delta v = \frac{2}{33-1}.$$

5.2 Example 1

We consider a benchmark test for studying the linear transport model [9]. In practice, the scaling parameter (mean free path) may differ by several orders of magnitude from the rarefied regime to the diffusive regime within one problem, thus developing methods that work uniformly with respect to this parameter is important. Our learning method achieves this goal: one can test the model at any given Knudsen number without additional difficulty. In this test, we consider a smooth initial condition and the diffusive scaling. Let the initial distribution be the double-peak Maxwellian

$$\rho_0(x) = 1 + \frac{1}{2}\sin(2\pi x), \qquad T_0(x) = \frac{5 + 2\cos(2\pi x)}{20},$$
$$f(t=0,x,v) = \frac{\rho_0}{2}\left[\exp\left(-\Big(\frac{v - 0.75}{T_0}\Big)^2\right) + \exp\left(-\Big(\frac{v + 0.75}{T_0}\Big)^2\right)\right]. \tag{5.1}$$

The periodic boundary condition is considered. We assume the scattering coefficient $\sigma = 1$ and $\varepsilon = 10^{-3}$.

Similar to [8], we generate grid points for each variable for the DNN training, on a uniform mesh with spacings $\Delta t$, $\Delta x$ and $\Delta v = 2/(33-1)$, and use a fully connected network taking $(t,x,v)$ as input and outputting $f_{nn}$. When $\varepsilon$ is small, the governing equation loss (3.1) dominates the total loss and there are multiple scales in the loss function, which brings an additional difficulty to the weight tuning. For this reason, we adapt the strategy of [21, Algorithm 1]; to make the current work more readable, we cite and present their algorithm in the Appendix. The weights for the initial and boundary losses are depicted in Figure 1; note that the weight for the governing equation loss is normalized to 1. Our large-scale experiments show that using the above-mentioned adaptive weights gives more accurate predictions than pre-set constant weights.

Figure 1: Evolution of the adaptive loss weights during training (x axis: training epoch; y axis: weight). Left: boundary loss; right: initial condition loss.
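The following NumPy sketch mirrors the grid construction described above; the time step, final time, and spatial spacing are illustrative assumptions (the text fixes $N_v = 32$ velocity points and, for the reference mesh, $\Delta x = 1/40$).

```python
import numpy as np

# Minimal sketch of the training/testing grid construction.
# dt, dx, and T are illustrative assumptions; N_v = 32 as in the text.
T, dt, dx, N_v = 0.1, 1e-3, 1 / 40, 32

t = np.arange(0.0, T + dt, dt)
x = np.arange(0.0, 1.0 + dx, dx)
v, w = np.polynomial.legendre.leggauss(N_v)  # quadrature nodes/weights on [-1, 1]

# Interior collocation points (t_i, x_j, v_k) for the governing-equation loss.
tt, xx, vv = np.meshgrid(t, x, v, indexing="ij")
interior = np.stack([tt.ravel(), xx.ravel(), vv.ravel()], axis=-1)

# Initial-condition points (t = 0) and boundary points (x = 0 or 1).
initial = np.stack(np.meshgrid([0.0], x, v, indexing="ij"), axis=-1).reshape(-1, 3)
boundary = np.stack(np.meshgrid(t, [0.0, 1.0], v, indexing="ij"), axis=-1).reshape(-1, 3)
```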
We compute the relative error of the density (6.3); more precisely,

$$\text{relative error} = \frac{\|u_{\text{True}} - u_{\text{Predicted}}\|}{\|u_{\text{True}}\|}, \tag{5.2}$$

where $u_{\text{True}} = \int_{-1}^{+1} f\, dv$ and $f$ denotes the true solution. The evolution of the relative error with respect to the training epoch is shown in Figure 2.

Figure 2: Relative error defined by (5.2), with respect to the number of training epochs (x axis: training epoch; y axis: relative error).

Note that the largest relative error stabilizes at less than 10 percent when the neural network converges, as the number of epochs reaches about 2000. We also plot the density (6.3) of the true and predicted solutions at different times in Figure 3.

Figure 3: Plot of the density at different output times. The blue curve is the NN-predicted solution and the red curve is the reference solution.

We observe from Figure 3 that the proposed neural network method gives an accurate approximation to the reference solution, and the approximation remains close even for large time. Also, the periodic boundary conditions are satisfied. This example verifies that our method is accurate, so we can benefit from the many advantages of learning algorithms. One of these advantages is the high efficiency in predicting new samples once the model is trained. This is very useful since our framework is mesh-free: in contrast to conventional numerical methods, we are able to calculate the solution at any given point in the domain of the equation, and hence one can study the physics of the model efficiently.
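A minimal sketch of evaluating the density and the relative error (5.2) from velocity samples follows; the discrete $\ell^2$ norm and the quadrature-weight interface are assumptions for illustration.

```python
import numpy as np

def density(f_vals, v_weights):
    """rho = \int_{-1}^{1} f dv, approximated by quadrature over the last axis."""
    return np.sum(v_weights * f_vals, axis=-1)

def relative_error(u_true, u_pred):
    """Relative error (5.2) in a discrete l2 norm."""
    return np.linalg.norm(u_true - u_pred) / np.linalg.norm(u_true)
```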
5.3 Example 2

In this experiment, we consider another benchmark test, with incoming boundary data [9]. This problem is more complicated and challenging, since the solution contains a boundary layer, and we shall see that the DNN approach can capture the solution behavior, especially near the boundary. The initial condition is given by $f(0,x,v) = 0$, and the boundary conditions are

$$f(t, 0, v) = 1, \quad v \ge 0; \qquad f(t, 1, v) = 0, \quad v \le 0.$$

We consider the diffusive regime with $\varepsilon = 10^{-3}$ and $\sigma = 1$, and set $\Delta x = 1/25$ with $\Delta t$ proportional to $\Delta x$. This gives us 41225 samples for the governing equation loss, 1746 samples for the boundary loss and 425 samples for the initial condition loss. We use a four-layer fully connected network (with input dimension 3 and output dimension 1) activated by Tanh and trained by the Adam gradient descent for 400 epochs. The initial learning rate is set to 0.0005 and a step schedule is used, which reduces the rate by 5 percent every 50 epochs.
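The optimizer schedule just described can be realized, for instance, with PyTorch's StepLR; in this sketch the model and loss are toy placeholders standing in for $f_{nn}$ and the total loss (3.4).

```python
import torch
import torch.nn as nn

# Training schedule described above: Adam with initial learning rate 5e-4,
# decayed by 5 percent every 50 epochs, for 400 epochs. The tiny model and
# loss are placeholders for f_nn and Loss_Total in (3.4).
model = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.95)

inputs = torch.rand(128, 3)             # placeholder collocation points (t, x, v)
for epoch in range(400):
    optimizer.zero_grad()
    loss = model(inputs).pow(2).mean()  # placeholder for the total loss
    loss.backward()
    optimizer.step()
    scheduler.step()                    # lr <- 0.95 * lr every 50 epochs
```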
Same as in the first example, we adopt the adaptive weight balancing strategy. The weight evolution of the initial and boundary losses is shown in Figure 4.

Figure 4: Evolution of the adaptive loss weights during training (x axis: training epoch; y axis: weight). Right: boundary loss; left: initial condition loss. The loss for the governing equation is normalized to 1.

We finally present the results of the experiment in Figure 5.

Figure 5: The density (6.3) at different times (x axis: spatial points $x$; see the subtitles for the time levels). The blue curve is the predicted solution and the red curve is the reference solution.

The most challenging feature of this example is the existence of the boundary layer. We can see from Figure 5 that the proposed method is able to capture the boundary layer. This is a consequence of minimizing the combined loss, so that the solution inherits the physical properties of the equation.
6 Conclusion and future work

In this work, we proposed to solve the linear transport model by a learning method. We consider the diffusive scaling with a small Knudsen number in our numerical tests, while our analysis applies to all orders of the Knudsen number. The asymptotic-preserving solver [9], as a robust traditional numerical method, was designed to tackle the stiffness of the model brought by the small relaxation parameter, without resolving the small scales in the numerical discretization. However, it requires a solid understanding of kinetic theory and is not easily applicable in practical physics or engineering problems. Our learning method, on the other hand, is mesh-free, easy to implement (no matter how complicated the initial or boundary conditions are), and provides the numerical solution at any given point, while not requiring a strong background in kinetic theory, thus making it more accessible to general research fields. Theoretically, we prove that the total loss function vanishes as the network converges, and then show that the neural network solution converges pointwise to the analytic solution. To demonstrate the advantages of the learning method, we test it on two benchmark examples; the results show that our method can capture the quantities of interest accurately, even with challenging initial or boundary conditions.

In the future, we will extend our proposed method to high-dimensional kinetic problems with uncertainties, and develop new training methods, in particular on the weight balancing of different loss terms. We may also consider applying the PINN framework to solve inverse problems associated with kinetic models.
Appendix
A. The asymptotic-preserving method
We briefly recall from [9] the reformulation of the linear transport equation (2.1) into a diffusive relaxation system, and its diffusion limit as $\varepsilon \to 0$.
This also prepares us to study the asymptotic behavior of the distribution function, which will be addressed in a follow-up work.

First, we split (2.1) into two equations for $v > 0$:

$$\varepsilon\partial_t f(v) + v\partial_x f(v) = \frac{\sigma(x)}{\varepsilon}\Big(\frac{1}{2}\int_{-1}^{1} f(v')\, dv' - f(v)\Big),$$
$$\varepsilon\partial_t f(-v) - v\partial_x f(-v) = \frac{\sigma(x)}{\varepsilon}\Big(\frac{1}{2}\int_{-1}^{1} f(v')\, dv' - f(-v)\Big). \tag{6.1}$$

Consider the even and odd parities

$$r(t,x,v) = \frac{1}{2}\big[f(t,x,v) + f(t,x,-v)\big], \qquad j(t,x,v) = \frac{1}{2\varepsilon}\big[f(t,x,v) - f(t,x,-v)\big].$$

Adding and subtracting the two equations in (6.1) leads to

$$\partial_t r + v\partial_x j = \frac{\sigma(x)}{\varepsilon^2}(\rho - r),$$
$$\partial_t j + \frac{v}{\varepsilon^2}\partial_x r = -\frac{\sigma(x)}{\varepsilon^2}\, j, \tag{6.2}$$

where

$$\rho(t,x) = \int_0^1 r\, dv. \tag{6.3}$$

As $\varepsilon \to 0^+$, (6.2) yields

$$r = \rho, \qquad j = -\frac{v}{\sigma(x)}\partial_x\rho.$$

Substituting this into the first equation of (6.2) and integrating over $v$, one gets the limiting diffusion equation [1]:

$$j = -\frac{v}{\sigma(x)}\partial_x\rho, \qquad \partial_t\rho = \partial_x\Big(\frac{1}{3\sigma(x)}\partial_x\rho\Big). \tag{6.4}$$

We solve the diffusive relaxation system (6.2) by splitting it into a relaxation step, followed by a transport step. One can check the details of the discretized scheme in [9]; we omit them here.
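As a small illustration of the limit system (6.4) (not the AP scheme of [9]), the following sketch integrates the limiting diffusion equation with an explicit central difference in space; the grid sizes, final time, and initial density (borrowed from Example 1) are assumptions.

```python
import numpy as np

# Explicit finite-difference sketch for the diffusion limit (6.4):
#   rho_t = d/dx( (1/(3*sigma)) * d rho/dx ),  periodic BC, sigma = 1.
N, T = 40, 0.1
dx = 1.0 / N
dt = 0.3 * dx**2                          # parabolic restriction, dt ~ O(dx^2)
x = np.linspace(0.0, 1.0, N, endpoint=False)
rho = 1.0 + 0.5 * np.sin(2 * np.pi * x)   # initial density as in Example 1

for _ in range(int(T / dt)):
    lap = (np.roll(rho, -1) - 2 * rho + np.roll(rho, 1)) / dx**2
    rho = rho + dt * lap / 3.0            # 1/(3*sigma) with sigma = 1
```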
B. Weights balance algorithm

We review the weight balancing algorithm studied in [21], which designs appropriate weights for the different loss terms in the total loss function.
Algorithm 1: Learning rate annealing for the PINN [21]

Consider a physics-informed neural network $f_\theta(x)$ with parameters $\theta$ and a loss function

$$\mathcal{L} = \mathcal{L}_G + \sum_{i=1}^{M} \lambda_i\,\mathcal{L}_i(\theta),$$

where $\mathcal{L}_G$ is the governing equation loss and the $\mathcal{L}_i$ are the other losses (initial condition, etc.); the $\lambda_i$ are the weights balancing the interplay of the losses.

for $n = 1, \ldots, S$ do
  Compute $\hat{\lambda}_i$ by
  $$\hat{\lambda}_i = \frac{\max_\theta \big|\nabla_\theta \mathcal{L}_G(\theta_n)\big|}{\overline{\big|\nabla_\theta \mathcal{L}_i(\theta_n)\big|}},$$
  where $\overline{|\nabla_\theta \mathcal{L}_i(\theta_n)|}$ is the mean of $|\nabla_\theta \mathcal{L}_i(\theta_n)|$ with respect to $\theta_n$;
  Update the weights $\lambda_i$ using a moving average:
  $$\lambda_i = (1-\alpha)\lambda_i + \alpha\hat{\lambda}_i, \quad i = 1, \ldots, M,$$
  where $\alpha$ is a constant (the authors of [21] suggest $\alpha = 0.9$);
  Update the parameters $\theta$ via gradient descent;
end
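Below is a hedged PyTorch sketch of one annealing step of Algorithm 1; the helper name and loss interfaces are assumptions, while the gradient statistics and the moving average follow [21].

```python
import torch

def update_loss_weights(model, loss_g, other_losses, weights, alpha=0.9):
    """One step of Algorithm 1: rebalance lambda_i by comparing gradient
    statistics of the governing-equation loss and the other losses."""
    grads_g = torch.autograd.grad(loss_g, model.parameters(), retain_graph=True)
    max_g = max(g.abs().max() for g in grads_g)          # max_theta |grad L_G|
    new_weights = []
    for loss_i, lam_i in zip(other_losses, weights):
        grads_i = torch.autograd.grad(loss_i, model.parameters(), retain_graph=True)
        mean_i = torch.cat([g.abs().flatten() for g in grads_i]).mean()
        lam_hat = max_g / mean_i                          # hat(lambda)_i
        new_weights.append((1 - alpha) * lam_i + alpha * lam_hat)  # moving average
    return new_weights
```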