A deep neural network approach on solving the linear transport model under diffusive scaling
Liu Liu*, Tieyong Zeng†, Zecheng Zhang‡

*Department of Mathematics, The Chinese University of Hong Kong
†Department of Mathematics, The Chinese University of Hong Kong
‡Department of Mathematics, Purdue University, West Lafayette, IN 47906, USA

Abstract
In this work, we propose a learning method for solving the linear transport equation under the diffusive scaling. Due to the multiscale nature of the model equation, it is challenging to solve by conventional methods. We employ the physics-informed neural network (PINN) framework, a mesh-free learning method that numerically solves partial differential equations. Compared to conventional methods (such as finite difference or finite element methods), the proposed learning method can obtain the solution at any given point in the chosen domain accurately and efficiently, which enables us to better understand the physics underlying the model. In our framework, the solution is approximated by a neural network that satisfies both the governing equation and other constraints; the network is then trained with a combination of different loss terms. Using the approximation theory and energy estimates for kinetic models, we prove theoretically that the total loss vanishes as the neural network converges, upon which the neural network approximated solution converges pointwise to the analytic solution of the linear transport model. Numerical experiments for two benchmark examples are conducted to verify the effectiveness and accuracy of our proposed method.
1 Introduction

The linear transport model, which describes kinetic particles undergoing collision and absorption through a material medium while evolving in time, is an important physics model and arises in applications across various fields. In this paper, we study the linear transport equation under the diffusive scaling by using the deep neural network approach.

The deep learning method has been successful in solving various problems in computer science; it improves the computational speed and accuracy of many tasks such as image classification, segmentation and language processing. Once the model is trained, one can use it by substituting the testing samples into the model, which usually is a forward process in which only tensor products are involved. Data-driven solvers for partial differential equations (PDEs) have drawn increasing attention due to their capability to encode the underlying physical laws governed by the model equation and to give relatively accurate predictions for the unknowns. In particular, we mention the physics-informed neural network (PINN) framework [10, 18, 16, 11, 19, 7, 21]. This method leverages the benefits of the auto-differentiation in current software and the underlying physics of the PDEs, which can be incorporated into the network by minimizing losses composed of the PDE's residuals and other constraints such as the initial and boundary conditions. Many other related works have been developed, for example [2, 3, 26, 24]. In this work, we apply this popular method to solve an important, hyperbolic-type kinetic equation.

The deep learning method can resolve some difficulties of traditional finite difference-type numerical methods [17], such as the expensive computational cost, especially for problems with high-dimensional physical variables; the challenge of dealing with complex boundary conditions; and the truncation of the velocity domain (in some models, such as the Boltzmann equation, the velocity lies in the three-dimensional whole space, thus a truncation is needed in numerical discretizations). Besides, the deep learning algorithm has the advantage of being intuitive and easy to execute. For example, instead of designing mass (or momentum and energy, if applicable) conservative schemes for kinetic models, which is challenging for traditional numerical methods [6], one can simply include the time derivative of the conserved quantities of interest in the total loss function, as implemented in [8].

We mention some other advantages of using the DNN approach to solve the linear transport model: a) one obtains the distribution function at any given $(t, x, v)$, instead of only discrete values on a uniform mesh as in traditional finite volume or finite element methods; being mesh-free, the method works efficiently for problems with high-dimensional physical space; b) one avoids the high computational cost in simulating kinetic equations caused by the velocity variable $v$ and by the integral-based, nonlocal collision operators that appear in more complicated kinetic models such as the Boltzmann or Landau equations. Nevertheless, we mention that there are indeed some weaknesses of the deep learning approach. First, there is no guarantee that the deep learning algorithm will converge, and it is practically difficult to show its convergence.
It is also hard to evaluate the accuracy of the DNN approach in contrast with traditional numerical methods.

This paper is organized as follows. In Section 2, we introduce the background of the linear transport model under the diffusive scaling. In Section 3, we review and discuss the neural network framework and the method for solving general PDEs. Two main convergence results are given in Section 4, showing that 1) the loss function goes to zero as the neural network converges; 2) the neural network solution converges pointwise to the analytic solution when the loss function converges to zero. The effectiveness and accuracy of our proposed method, including the choice of weights in the total loss function, will be presented in Section 5. Finally, we summarize the paper and mention some future work in Section 6.

2 The linear transport model

The linear transport model, which describes kinetic particles undergoing collision and absorption through a material medium while evolving in time, arises in many applications, such as atmosphere and ocean modeling [4, 20, 25], astrophysics [15] or nuclear physics. Such problems usually involve several orders of magnitude of length scales, characterized by the Knudsen number, defined as the ratio of the mean free path over a typical length scale such as the size of the spatial domain.

We consider the linear transport equation under the diffusive scaling, with one-dimensional space and velocity variables ($x \in \Omega \subset \mathbb{R}$, $v \in [-1, 1]$):

$$\varepsilon\,\partial_t f + v\,\partial_x f = \frac{1}{\varepsilon}\,\mathcal{L}(f), \qquad f(t=0, x, v) = f_0(x, v),$$
$$\mathcal{L}(f) = \sigma(x)\left[\frac{1}{2}\int_{-1}^{1} f(t, x, v')\, dv' - f(t, x, v)\right]. \tag{2.1}$$

Here $f(t,x,v)$ is the distribution function of the particles, $\sigma(x)$ is the scattering coefficient, and $\mathcal{L}$ is the collision operator.
3 The DNN method

We briefly review the deep neural network (DNN) structure and approach introduced in [8], where the one-dimensional kinetic Fokker–Planck equation is studied. Denote the approximated function by $f_{nn}(t, x, v; m, w, b)$ and suppose the DNN has $L$ layers; the input layer takes $(t, x, v)$ as input and the final layer gives $f_{nn}(t, x, v; m, w, b)$ as the output. The relation between the $l$-th and $(l+1)$-th layers ($l = 1, 2, \cdots, L-1$) is given by

$$u_j^{(l+1)} = \sum_{i=1}^{m_l} w_{ji}^{(l+1)}\,\bar{\sigma}_l\big(z_i^{l}\big) + b_j^{(l+1)},$$

where $\bar{\sigma}_l$ is the activation function of the $l$-th layer, $m = (m_1, m_2, \ldots, m_{L-1})$ collects the numbers of nodes per layer, and the weights $w = \big\{w_{ji}^{(k)}\big\}_{i,j,k=1}^{m_{k-1},\, m_k,\, L}$ and biases $b = \big\{b_j^{(k)}\big\}_{j=1, k=1}^{m_k,\, L}$ are given in [8], which we refer to for details.

Regarding the optimization, we use the Adam algorithm, an extension of stochastic gradient descent that is widely used in deep learning applications.
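To make the network structure concrete, the following is a minimal PyTorch sketch of such a fully connected network together with the Adam optimizer; the hidden widths, learning rate, and class name are illustrative assumptions rather than the exact configuration of [8].

```python
import torch
import torch.nn as nn

class FNN(nn.Module):
    """Fully connected network mapping (t, x, v) to a scalar f_nn.
    The hidden widths below are placeholders, not the paper's exact choice."""
    def __init__(self, widths=(3, 64, 64, 64, 1)):
        super().__init__()
        layers = []
        for m_in, m_out in zip(widths[:-2], widths[1:-1]):
            # u^{(l+1)} = W^{(l+1)} sigma(z^l) + b^{(l+1)}, realized as Linear + Tanh
            layers += [nn.Linear(m_in, m_out), nn.Tanh()]
        layers += [nn.Linear(widths[-2], widths[-1])]  # linear output layer
        self.net = nn.Sequential(*layers)

    def forward(self, t, x, v):
        return self.net(torch.cat([t, x, v], dim=-1))

f_nn = FNN()
optimizer = torch.optim.Adam(f_nn.parameters(), lr=5e-4)
```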
3.1 Definition of loss functions

The loss function for the governing linear transport equation (2.1) is defined by
$$\begin{aligned}
\text{Loss}_{GE} &= \int_0^T\!\!\int_\Omega\!\!\int_V \Big|\,\varepsilon\,\partial_t f_{nn}(t,x,v; m,w,b) + v\,\partial_x f_{nn}(t,x,v; m,w,b) - \frac{1}{\varepsilon}\,\mathcal{L}(f_{nn})(t,x,v; m,w,b)\,\Big|^2\, dv\, dx\, dt \\
&\approx \frac{1}{N_{i,j,k}} \sum_{i,j,k} \Big|\,\varepsilon\,\partial_t f_{nn}(t_i,x_j,v_k; m,w,b) + v_k\,\partial_x f_{nn}(t_i,x_j,v_k; m,w,b) - \frac{1}{\varepsilon}\,\mathcal{L}(f_{nn})(t_i,x_j,v_k; m,w,b)\,\Big|^2,
\end{aligned} \tag{3.1}$$

where $N_{i,j,k} = N_i N_j N_k$, and the collision operator is discretized as

$$\mathcal{L}(f_{nn})(t_i,x_j,v_k; m,w,b) = \sigma(x_j)\left[\frac{1}{N_k}\sum_{k'=1}^{N_k} f_{nn}(t_i,x_j,v_{k'}; m,w,b) - f_{nn}(t_i,x_j,v_k; m,w,b)\right].$$

Let $n_x$ be the unit outward normal vector on the boundary $\partial\Omega$, and set $\gamma := \partial\Omega \times [-1,1]$. We distinguish the outgoing boundary $\gamma^+$, the incoming boundary $\gamma^-$ and the singular boundary $\gamma^0$, defined by

$$\gamma^+ := \{(x,v)\in\partial\Omega\times[-1,1] : n_x\cdot v > 0\},$$
$$\gamma^- := \{(x,v)\in\partial\Omega\times[-1,1] : n_x\cdot v < 0\},$$
$$\gamma^0 := \{(x,v)\in\partial\Omega\times[-1,1] : n_x\cdot v = 0\}.$$
1] : n x · v = 0 } . We now define the loss terms for the initial condition and the boundary conditions:
$$\text{Loss}_{IC} = \int_\Omega\!\int_V \big|f_{nn}(0,x,v) - f_0(x,v)\big|^2\, dv\, dx \approx \frac{1}{N_{j,k}}\sum_{j,k} \big|f_{nn}(0,x_j,v_k) - f_0(x_j,v_k)\big|^2. \tag{3.2}$$

For the inflow boundary condition $f(t,x,v)\big|_{\gamma^-} = g(t,x,v)$ on $x\in\partial\Omega$, one has
$$\text{Loss}_{BC} = \sum_{x\in\partial\Omega}\int_0^T\!\int_V \big|f_{nn}(t,x,v; m,w,b) - g(t,x,v)\big|^2\, dv\, dt \approx \frac{|\partial\Omega|}{N_{i,k}}\sum_{i,k} \big|f_{nn}(t_i,x,v_k; m,w,b) - g(t_i,x,v_k)\big|^2, \tag{3.3}$$

where $|\partial\Omega|$ denotes the volume of the spatial boundary. Adding up (3.1), (3.2) and (3.3) with appropriate weights $\{\lambda_g, \lambda_i, \lambda_b\}$, the total loss function is defined by

$$\text{Loss}_{Total} = \lambda_g\,\text{Loss}_{GE} + \lambda_i\,\text{Loss}_{IC} + \lambda_b\,\text{Loss}_{BC}. \tag{3.4}$$
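For completeness, here is a hedged PyTorch sketch of how the residual in (3.1) and the weighted total loss (3.4) can be assembled with automatic differentiation; the batching, helper names, and sampling interfaces are illustrative assumptions, not a verbatim excerpt of our training code. It assumes a network `f_nn` taking three column tensors, as in the sketch of Section 3.

```python
import torch

def pde_residual(f_nn, t, x, v, eps, sigma, v_nodes, v_weights):
    """Residual of (2.1): eps*f_t + v*f_x - (1/eps)*L(f), via autograd.
    t, x, v are (N, 1) tensors; v_nodes/v_weights give the quadrature in v."""
    t.requires_grad_(True); x.requires_grad_(True)
    f = f_nn(t, x, v)
    f_t = torch.autograd.grad(f.sum(), t, create_graph=True)[0]
    f_x = torch.autograd.grad(f.sum(), x, create_graph=True)[0]
    # 0.5 * \int f dv by quadrature: evaluate f_nn at every velocity node
    f_v = torch.stack([f_nn(t, x, torch.full_like(t, float(vk))) for vk in v_nodes], dim=0)
    f_bar = 0.5 * torch.sum(v_weights[:, None, None] * f_v, dim=0)
    return eps * f_t + v * f_x - (sigma / eps) * (f_bar - f)

def total_loss(res, f0_pred, f0_true, g_pred, g_true, lam_g, lam_i, lam_b):
    """Weighted total loss (3.4) from the three mean-square loss terms."""
    loss_ge = torch.mean(res**2)
    loss_ic = torch.mean((f0_pred - f0_true)**2)
    loss_bc = torch.mean((g_pred - g_true)**2)
    return lam_g * loss_ge + lam_i * loss_ic + lam_b * loss_bc
```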
4 Analysis results
In this section, we show the two main theoretical results. Recall that the neural network architecture was first introduced in [14]. Later, in [5], Cybenko established sufficient conditions under which a continuous function can be approximated by finite linear combinations of single hidden layer neural networks, followed by the work in [12] that extends the theory to the multi-layer network case. We paraphrase the Universal Approximation Theorem [5] in the form needed in our context: if $f$ solves the linear transport model (2.1) and is sufficiently smooth in all variables, then for any $\eta > 0$ there exists a two-layer neural network

$$f_{nn}(t,x,v) = \sum_{i=1}^{m} w_{1i}^{(2)}\,\bar{\sigma}\Big(\big(w_{i1}^{(1)}, w_{i2}^{(1)}, w_{i3}^{(1)}\big)\cdot(t,x,v) + b_i^{(1)}\Big) + b_1^{(2)}$$

such that

$$\|f - f_{nn}\|_{L^\infty(K)} < \eta, \qquad \|\partial_t (f - f_{nn})\|_{L^\infty(K)} < \eta, \qquad \|\nabla_x (f - f_{nn})\|_{L^\infty(K)} < \eta,$$

where the domain $K = [0,T]\times\Omega\times[-1,1]$.
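As a concrete reading of this statement, the following sketch builds a two-layer network of exactly the above form; the width $m$, the Tanh activation, and the random parameters are illustrative assumptions.

```python
import torch

# Two-layer (single hidden layer) network in the form of the
# Universal Approximation Theorem above; width m is an assumption.
m = 64
W1 = torch.randn(m, 3)   # rows are (w_i1^(1), w_i2^(1), w_i3^(1))
b1 = torch.randn(m)
W2 = torch.randn(1, m)   # output weights w_1i^(2)
b2 = torch.randn(1)

def f_nn(t, x, v):
    """f_nn(t,x,v) = sum_i w_1i^(2) sigma(w_i^(1).(t,x,v) + b_i^(1)) + b_1^(2)."""
    u = torch.stack([t, x, v], dim=-1)            # shape (..., 3)
    return torch.tanh(u @ W1.T + b1) @ W2.T + b2  # shape (..., 1)
```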
We first show that there exists a sequence of neural network solutions to (2.1) such that the total loss function converges to zero.

Theorem 4.1.
Let $f$ solve equation (2.1) and satisfy $f \in C^1([0,T]) \cap C^1(\Omega\times V)$. Then there exists a sequence of neural network parameters $\{m^{[j]}, w^{[j]}, b^{[j]}\}_{j=1}^\infty$ such that the sequence of DNN solutions with $m^{[j]}$ nodes, given by $\{f_j(t,x,v) = f_{nn}(t,x,v; m^{[j]}, w^{[j]}, b^{[j]})\}_{j=1}^\infty$, satisfies

$$\text{Loss}_{Total}(f_j) \to 0, \quad \text{as } j\to\infty.$$

Proof.
Denote the velocity domain $D = [-1,1]$ and $\mu(V) = \int_D dv = 2$. Define

$$d_{ge,j}(t,x,v) := -\big[\varepsilon\partial_t + v\cdot\nabla_x\big] f_j + \frac{1}{\varepsilon}\mathcal{L}(f_j). \tag{4.1}$$

Since $f$ solves (2.1), integrating $|d_{ge,j}|^2$ over $[0,T]\times\Omega\times D$ is equivalent to integrating

$$\Big|\,\varepsilon\partial_t (f - f_j) + v\cdot\nabla_x (f - f_j) - \frac{1}{\varepsilon}\mathcal{L}(f) + \frac{1}{\varepsilon}\mathcal{L}(f_j)\,\Big|^2. \tag{4.2}$$

We will show that the loss term (3.1) is bounded by $O(\eta^2)$. The first two terms in (4.2) are bounded by $\eta^2\, T\,\mu(V)\,|\Omega|\,(\varepsilon + \mu(V))^2$. Observe that for any function $g$,

$$\int_{-1}^{1} \big(\mathcal{L}(g)\big)^2\, dv = -\frac{\sigma^2}{2}\Big(\int_{-1}^{1} g(v)\, dv\Big)^2 + \sigma^2\int_{-1}^{1} g(v)^2\, dv \le \sigma^2\int_{-1}^{1} g(v)^2\, dv. \tag{4.3}$$

Since

$$\|f - f_j\|^2_{L^2(D)} \le \|f - f_j\|^2_{L^\infty(D)}\,\mu(V) < C\eta^2, \tag{4.4}$$

letting $g = f - f_j$ in (4.3) and using (4.4), we have $\|\mathcal{L}(f) - \mathcal{L}(f_j)\|_{L^2_v} = \|\mathcal{L}(f - f_j)\|_{L^2_v} < O(\eta)$. Thus for the last two terms in (4.2),

$$\Big\|\frac{1}{\varepsilon}\mathcal{L}(f) - \frac{1}{\varepsilon}\mathcal{L}(f_j)\Big\|^2_{L^2(\Omega\times D)} < O\Big(\frac{\eta^2}{\varepsilon^2}\Big), \tag{4.5}$$

since $\Omega$ is bounded. Therefore, the loss term $\text{Loss}_{GE}$ in (3.1) is bounded by $O\big(\varepsilon^2\eta^2 + (\eta/\varepsilon)^2\big)$.

For the inflow boundary condition, $\text{Loss}_{BC}$ is bounded by

$$\|f_j - f\|^2_{L^2(\gamma^-_T)} \le T\mu(V)|\partial\Omega|\,\|f_j - f\|^2_{L^\infty(\gamma^-_T)} \le T\mu(V)|\partial\Omega|\,\|f_j - f\|^2_{L^\infty([0,T]\times\Omega\times D)} \le O(\eta^2),$$

where $\gamma^\pm_T := [0,T]\times\gamma^\pm$. Note that the specular reflection boundary condition works similarly. For the initial data, denoted by $f_{j,in}$ and $f_{in}$ respectively,

$$\text{Loss}_{IC} = \|f_{j,in} - f_{in}\|^2_{L^2(\Omega\times D)} \le \|f_{j,in} - f_{in}\|^2_{L^\infty(\Omega\times D)}\,|\Omega|\,\mu(V) \le O(\eta^2).$$

Setting $\eta = \eta_j = 1/j$ and combining all the loss terms (3.1)–(3.3), we conclude that

$$\text{Loss}_{Total}(f_j) \le O\Big(\frac{1}{\varepsilon^2 j^2}\Big).$$

Therefore, $\text{Loss}_{Total}(f_j) \to 0$ as $j\to\infty$. $\blacksquare$

In the rest of this section, we show that, equipped with the parameters $\{m^{[j]}, w^{[j]}, b^{[j]}\}_{j=1}^\infty$, the neural network in Theorem 4.1 converges to the analytic solution of the linear transport model.

Theorem 4.2.
Let $\{m^{[j]}, w^{[j]}, b^{[j]}\}_{j=1}^\infty$ be the sequence defined in Theorem 4.1 and let $f$ solve the linear transport model (2.1). Then $\text{Loss}_{Total}(f_j) \to 0$ implies that

$$\big\|f_j(\cdot,\cdot,\cdot\,; m^{[j]}, w^{[j]}, b^{[j]}) - f\big\|_{L^\infty([0,T];\, L^2(\Omega\times D))} \to 0,$$

for finite time $t\in[0,T]$ and physical variables $(x,v)\in\Omega\times D$.

Proof.
Recall the definition (4.1); then

$$\big[\varepsilon\partial_t + v\cdot\nabla_x\big]\{f - f_j\} = d_{ge,j}(t,x,v) + \frac{1}{\varepsilon}\mathcal{L}(f) - \frac{1}{\varepsilon}\mathcal{L}(f_j). \tag{4.6}$$

Define

$$d_{ic,j}(x,v) := f_0(x,v) - f_j(0,x,v),$$

in addition to

$$d_{bc,j}(t,x,v) := g(t,x,v) - f_j(t,x,v) \quad \text{at } (t,x,v)\in\gamma^-_T$$

for the inflow boundary condition, and $d_{bc,j}(t,x,v) := f_j(t,x,-v) - f_j(t,x,v)$ for the specular reflection boundary condition.

The $L^2$ norms and inner products below stand for $\|\cdot\|_{L^2(\Omega\times D)}$. Multiplying $2(f - f_j)$ onto (4.6) and integrating over $\Omega\times D$, one gets

$$\int_\Omega\!\int_D \varepsilon\,\partial_t (f - f_j)^2\, dv\, dx + \int_{\gamma^+} (f - f_j)^2\, v\cdot n_x\, d\gamma - \int_{\gamma^-} d_{bc,j}^2\,|v\cdot n_x|\, d\gamma = \frac{2}{\varepsilon}\big\langle \mathcal{L}(f - f_j),\, f - f_j\big\rangle_{L^2} + 2\big\langle d_{ge,j},\, f - f_j\big\rangle_{L^2}. \tag{4.7}$$

Recall from (4.5) that $\|\mathcal{L}(f - f_j)\|_{L^2} \le C\eta$. The right-hand side of (4.7) is bounded via

$$\frac{2}{\varepsilon}\big\langle \mathcal{L}(f - f_j),\, f - f_j\big\rangle_{L^2} \le \frac{1}{\varepsilon}\|\mathcal{L}(f - f_j)\|^2_{L^2} + \frac{1}{\varepsilon}\|f - f_j\|^2_{L^2},$$

and since $\int_{\gamma^+} (f - f_j)^2\, v\cdot n_x\, d\gamma \ge 0$, thus

$$\varepsilon\frac{d}{dt}\|f - f_j\|^2_{L^2} \le \underbrace{\int_{\gamma^-} d_{bc,j}^2\,|v\cdot n_x|\, d\gamma + \|d_{ge,j}\|^2_{L^2} + \frac{C\eta^2}{\varepsilon}}_{:=\, H(t)} + \Big(1 + \frac{1}{\varepsilon}\Big)\|f - f_j\|^2_{L^2}.$$

Since $\int_{\gamma^-} d_{bc,j}^2\,|v\cdot n_x|\, d\gamma \le \int_{\gamma^-} d_{bc,j}^2\, d\gamma$, and by the definitions

$$\int_0^t\!\int_{\gamma^-} \big(d_{bc,j}(s,\cdot,\cdot)\big)^2\, d\gamma\, ds = \text{Loss}_{BC}, \qquad \int_0^t \|d_{ge,j}(s,\cdot,\cdot,\cdot)\|^2_{L^2}\, ds = \text{Loss}_{GE},$$

Grönwall's inequality gives

$$\begin{aligned}
\|f - f_j\|^2_{L^2(\Omega\times D)} &\le e^{(\frac{1}{\varepsilon}+\frac{1}{\varepsilon^2})t}\,\text{Loss}_{IC} + \frac{1}{\varepsilon}\, e^{(\frac{1}{\varepsilon}+\frac{1}{\varepsilon^2})t}\int_0^t H(s)\, ds \\
&\lesssim e^{(\frac{1}{\varepsilon}+\frac{1}{\varepsilon^2})t}\Big(\text{Loss}_{IC} + \frac{1}{\varepsilon}\text{Loss}_{BC} + \frac{1}{\varepsilon}\text{Loss}_{GE} + \frac{C\eta^2 t}{\varepsilon^2}\Big) \\
&\le e^{(\frac{1}{\varepsilon}+\frac{1}{\varepsilon^2})t}\Big(\frac{1}{\varepsilon}\,\text{Loss}_{Total} + \frac{C\eta^2 t}{\varepsilon^2}\Big).
\end{aligned}$$

We already know from Theorem 4.1 that $\text{Loss}_{Total}(f_j) \le O(\eta^2/\varepsilon^2)$; now take the $L^\infty$ norm in $t\in[0,T]$ with $\eta = \eta_j = 1/j \to 0$, then

$$\|f - f_j\|^2_{L^2(\Omega\times D)} \le C'\, e^{(\frac{1}{\varepsilon}+\frac{1}{\varepsilon^2})t}\,\eta^2\Big(\frac{1}{\varepsilon^3} + \frac{Ct}{\varepsilon^2}\Big). \tag{4.8}$$

Therefore,

$$\|f - f_j\|_{L^\infty([0,T];\, L^2(\Omega\times D))} \to 0, \quad \text{as } j\to\infty. \qquad \blacksquare$$

Remark 4.3. In [13], Liu et al. employed hypocoercivity and Lyapunov-type functionals to conduct sensitivity analysis for a general class of kinetic equations with random uncertainties and multiple scales. It is possible to adopt that framework and its energy estimates to study the convergence of the DNN solution (or even to improve the result of this section to a stronger one with exponential decay in time), as presented in this section. However, due to its high complexity and the main focus of the current manuscript, we defer this to future work. We also mention that similar analysis techniques can be found in [27], where the radiative transfer model in the kinetic regime ($\varepsilon = 1$) was studied by the DNN approach, with numerical examples of different applications and its long-time behavior.

5 Numerical examples

In this section, we verify the proposed method with two examples. The first example aims to show the accuracy of the method; a more challenging problem with a boundary layer is studied in the second test, which verifies that our proposed method is able to capture the physical properties of the model. We compute the solutions of the model problems by using the numerical method shown in the Appendix, then compare them with those of the proposed deep learning method. As discussed before, the deep learning method is mesh-free, hence it inherits all the benefits of mesh-free numerical methods while keeping a low computational cost in the testing phase. We first introduce the grid points of the testing data used in the numerical computation; details of the two examples are then presented.
5.1 Reference solutions and test data

In this subsection, we show the computational details of the reference solution. To obtain the reference solutions from a conventional solver, we adopt the robust, implicit yet explicitly implementable asymptotic-preserving (AP) numerical scheme in the even-odd decomposition framework [9], which achieves high resolution uniformly with respect to the scaling parameter $\varepsilon$. We use the solutions of this solver as the reference solutions to compare with the DNN-approximated solutions in our numerical tests. For the convenience of readers, we review this scheme in the Appendix.

Let $\Omega = [0,1]$. The $t$ and $x$ grids for the training are chosen uniformly as

$$\{(t_i, x_j)\}_{i,j} \in [0,T]\times[0,1], \quad \text{with fixed } \Delta t,\ \Delta x.$$

The integral in velocity can be computed using a quadrature rule; only a few points are needed, such as $N_v = 32$ points $\{v_k\}_{k=1}^{N_v}$. We use the grids $\{(t=0, x_j, v_k)\}_{j,k}$ for the initial condition and $\{(t_i, x=0 \text{ or } 1, v_k)\}_{i,k}$ for the boundary condition. We save the data at fixed $t$ and $x$, for all $N_v = 32$ velocity points in $[-1,1]$.

In the conventional finite difference AP solver, we set the spatial grid $x_i$ ($1\le i\le N$) with $N = 40$, so that $\Delta x = 0.025$, and choose $\Delta t \sim O((\Delta x)^2)$ for stability. To summarize, in both of our experiments, we test our models against reference solutions evaluated on the following fine mesh:

$$\{(t_i, x_j, v_k)\}_{i,j,k} \in [0,T]\times[0,1]\times[-1,1], \quad \Delta t = O((\Delta x)^2),\ \Delta x = \frac{1}{40},\ \Delta v = \frac{2}{33-1}.$$

5.2 Example 1

We consider a benchmark test for studying the linear transport model [9]. In practice, the scaling parameter (mean free path) may differ by several orders of magnitude from the rarefied regime to the diffusive regime within one problem, thus developing methods that work uniformly with respect to this parameter is important. Our learning method achieves this goal: one can test the model at any given Knudsen number without additional difficulty. In this test, we consider a smooth initial condition and the diffusive scaling. Let the initial distribution be the double-peak Maxwellian

$$\rho_0(x) = 1 + \frac{1}{2}\sin(2\pi x), \qquad T_0(x) = \frac{5 + 2\cos(2\pi x)}{20},$$
$$f(t=0,x,v) = \frac{\rho_0}{2}\left[\exp\left(-\Big(\frac{v - 0.75}{T_0}\Big)^2\right) + \exp\left(-\Big(\frac{v + 0.75}{T_0}\Big)^2\right)\right]. \tag{5.1}$$

The periodic boundary condition is considered. We assume the scattering coefficient $\sigma = 1$ and $\varepsilon = 10^{-3}$.

Similar to [8], we generate grid points for each variable for the DNN training, on a uniform mesh with spacings $\Delta t$, $\Delta x$ and $\Delta v = 2/(33-1)$, and use a fully connected network taking $(t,x,v)$ as input and outputting $f_{nn}$. When $\varepsilon$ is small, the governing equation loss (3.1) dominates the total loss and there are multiple scales in the loss function, which brings an additional difficulty to the weight tuning. For this reason, we adapt the strategy of [21, Algorithm 1]; to make the current work more readable, we cite and present their algorithm in the Appendix. The weights for the initial and boundary losses are depicted in Figure 1; note that the weight for the governing equation loss is normalized to 1. Our large-scale experiments show that using the above-mentioned adaptive weights gives more accurate predictions than pre-set constant weights.

Figure 1: Evolution of the adaptive loss weights during training (x axis: training epoch; y axis: weight). Left: boundary loss; right: initial condition loss.
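The following NumPy sketch mirrors the grid construction described above; the time step, final time, and spatial spacing are illustrative assumptions (the text fixes $N_v = 32$ velocity points and, for the reference mesh, $\Delta x = 1/40$).

```python
import numpy as np

# Minimal sketch of the training/testing grid construction.
# dt, dx, and T are illustrative assumptions; N_v = 32 as in the text.
T, dt, dx, N_v = 0.1, 1e-3, 1 / 40, 32

t = np.arange(0.0, T + dt, dt)
x = np.arange(0.0, 1.0 + dx, dx)
v, w = np.polynomial.legendre.leggauss(N_v)  # quadrature nodes/weights on [-1, 1]

# Interior collocation points (t_i, x_j, v_k) for the governing-equation loss.
tt, xx, vv = np.meshgrid(t, x, v, indexing="ij")
interior = np.stack([tt.ravel(), xx.ravel(), vv.ravel()], axis=-1)

# Initial-condition points (t = 0) and boundary points (x = 0 or 1).
initial = np.stack(np.meshgrid([0.0], x, v, indexing="ij"), axis=-1).reshape(-1, 3)
boundary = np.stack(np.meshgrid(t, [0.0, 1.0], v, indexing="ij"), axis=-1).reshape(-1, 3)
```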
We compute the relative error of the density (6.3); more precisely,

$$\text{relative error} = \frac{\|u_{\text{True}} - u_{\text{Predicted}}\|}{\|u_{\text{True}}\|}, \tag{5.2}$$

where $u_{\text{True}} = \int_{-1}^{+1} f\, dv$ and $f$ denotes the true solution. The evolution of the relative error with respect to the training epoch is shown in Figure 2.

Figure 2: Relative error defined by (5.2), with respect to the number of training epochs (x axis: training epoch; y axis: relative error).

Note that the largest relative error stabilizes at less than 10 percent when the neural network converges, as the number of epochs reaches about 2000. We also plot the density (6.3) of the true and predicted solutions at different times in Figure 3.

Figure 3: Plot of the density at different output times. The blue curve is the NN-predicted solution and the red curve is the reference solution.

We observe from Figure 3 that the proposed neural network method gives an accurate approximation to the reference solution, and the approximation remains close even for large time. Also, the periodic boundary conditions are satisfied. This example verifies that our method is accurate, so we can benefit from the many advantages of learning algorithms. One of these advantages is the high efficiency in predicting new samples once the model is trained. This is very useful since our framework is mesh-free: in contrast to conventional numerical methods, we are able to calculate the solution at any given point in the domain of the equation, and hence one can study the physics of the model efficiently.
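A minimal sketch of evaluating the density and the relative error (5.2) from velocity samples follows; the discrete $\ell^2$ norm and the quadrature-weight interface are assumptions for illustration.

```python
import numpy as np

def density(f_vals, v_weights):
    """rho = \int_{-1}^{1} f dv, approximated by quadrature over the last axis."""
    return np.sum(v_weights * f_vals, axis=-1)

def relative_error(u_true, u_pred):
    """Relative error (5.2) in a discrete l2 norm."""
    return np.linalg.norm(u_true - u_pred) / np.linalg.norm(u_true)
```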
5.3 Example 2

In this experiment, we consider another benchmark test, with incoming boundary data [9]. This problem is more complicated and challenging, since the solution contains a boundary layer, and we shall see that the DNN approach can capture the solution behavior, especially near the boundary. The initial condition is given by $f(0,x,v) = 0$, and the boundary conditions are

$$f(t, 0, v) = 1, \quad v \ge 0; \qquad f(t, 1, v) = 0, \quad v \le 0.$$

We consider the diffusive regime with $\varepsilon = 10^{-3}$ and $\sigma = 1$, and set $\Delta x = 1/25$ with $\Delta t$ proportional to $\Delta x$. This gives us 41225 samples for the governing equation loss, 1746 samples for the boundary loss and 425 samples for the initial condition loss. We use a four-layer fully connected network (with input dimension 3 and output dimension 1) activated by Tanh and trained by the Adam gradient descent for 400 epochs. The initial learning rate is set to 0.0005 and a step schedule is used, which reduces the rate by 5 percent every 50 epochs.
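The optimizer schedule just described can be realized, for instance, with PyTorch's StepLR; in this sketch the model and loss are toy placeholders standing in for $f_{nn}$ and the total loss (3.4).

```python
import torch
import torch.nn as nn

# Training schedule described above: Adam with initial learning rate 5e-4,
# decayed by 5 percent every 50 epochs, for 400 epochs. The tiny model and
# loss are placeholders for f_nn and Loss_Total in (3.4).
model = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.95)

inputs = torch.rand(128, 3)             # placeholder collocation points (t, x, v)
for epoch in range(400):
    optimizer.zero_grad()
    loss = model(inputs).pow(2).mean()  # placeholder for the total loss
    loss.backward()
    optimizer.step()
    scheduler.step()                    # lr <- 0.95 * lr every 50 epochs
```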
Same as in the first example, we adopt the adaptive weight balancing strategy. The weight evolution of the initial and boundary losses is shown in Figure 4.

Figure 4: Evolution of the adaptive loss weights during training (x axis: training epoch; y axis: weight). Right: boundary loss; left: initial condition loss. The loss for the governing equation is normalized to 1.

We finally present the results of the experiment in Figure 5.

Figure 5: The density (6.3) at different times (x axis: spatial points $x$; see the subtitles for the time levels). The blue curve is the predicted solution and the red curve is the reference solution.

The most challenging feature of this example is the existence of the boundary layer. We can see from Figure 5 that the proposed method is able to capture the boundary layer. This is a consequence of minimizing the combined loss, so that the solution inherits the physical properties of the equation.
6 Conclusion and future work

In this work, we proposed to solve the linear transport model by a learning method. We consider the diffusive scaling with a small Knudsen number in our numerical tests, while our analysis applies to all orders of the Knudsen number. The asymptotic-preserving solver [9], as a robust traditional numerical method, was designed to tackle the stiffness of the model brought by the small relaxation parameter, without resolving the small scales in the numerical discretization. However, it requires a solid understanding of kinetic theory and is not easily applicable in practical physics or engineering problems. Our learning method, on the other hand, is mesh-free, easy to implement (no matter how complicated the initial or boundary conditions are), and provides the numerical solution at any given point, while not requiring a strong background in kinetic theory, thus making it more accessible to general research fields. Theoretically, we prove that the total loss function vanishes as the network converges, and then show that the neural network solution converges pointwise to the analytic solution. To demonstrate the advantages of the learning method, we test it on two benchmark examples; the results show that our method can capture the quantities of interest accurately, even with challenging initial or boundary conditions.

In the future, we will extend our proposed method to high-dimensional kinetic problems with uncertainties, and develop new training methods, in particular on the weight balancing of different loss terms. We may also consider applying the PINN framework to solve inverse problems associated with kinetic models.
Appendix
A. The asymptotic-preserving method
We briefly recall from [9] the reformulation of the linear transport equation (2.1) into a diffusive relaxation system, and its diffusion limit as $\varepsilon \to 0$.
This also prepares us to study the asymptotic behavior of the distribution function, which will be addressed in a follow-up work.

First, we split (2.1) into two equations for $v > 0$:

$$\varepsilon\partial_t f(v) + v\partial_x f(v) = \frac{\sigma(x)}{\varepsilon}\Big(\frac{1}{2}\int_{-1}^{1} f(v')\, dv' - f(v)\Big),$$
$$\varepsilon\partial_t f(-v) - v\partial_x f(-v) = \frac{\sigma(x)}{\varepsilon}\Big(\frac{1}{2}\int_{-1}^{1} f(v')\, dv' - f(-v)\Big). \tag{6.1}$$

Consider the even and odd parities

$$r(t,x,v) = \frac{1}{2}\big[f(t,x,v) + f(t,x,-v)\big], \qquad j(t,x,v) = \frac{1}{2\varepsilon}\big[f(t,x,v) - f(t,x,-v)\big].$$

Adding and subtracting the two equations in (6.1) leads to

$$\partial_t r + v\partial_x j = \frac{\sigma(x)}{\varepsilon^2}(\rho - r),$$
$$\partial_t j + \frac{v}{\varepsilon^2}\partial_x r = -\frac{\sigma(x)}{\varepsilon^2}\, j, \tag{6.2}$$

where

$$\rho(t,x) = \int_0^1 r\, dv. \tag{6.3}$$

As $\varepsilon \to 0^+$, (6.2) yields

$$r = \rho, \qquad j = -\frac{v}{\sigma(x)}\partial_x\rho.$$

Substituting this into the first equation of (6.2) and integrating over $v$, one gets the limiting diffusion equation [1]:

$$j = -\frac{v}{\sigma(x)}\partial_x\rho, \qquad \partial_t\rho = \partial_x\Big(\frac{1}{3\sigma(x)}\partial_x\rho\Big). \tag{6.4}$$

We solve the diffusive relaxation system (6.2) by splitting it into a relaxation step, followed by a transport step. One can check the details of the discretized scheme in [9]; we omit them here.
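As a small illustration of the limit system (6.4) (not the AP scheme of [9]), the following sketch integrates the limiting diffusion equation with an explicit central difference in space; the grid sizes, final time, and initial density (borrowed from Example 1) are assumptions.

```python
import numpy as np

# Explicit finite-difference sketch for the diffusion limit (6.4):
#   rho_t = d/dx( (1/(3*sigma)) * d rho/dx ),  periodic BC, sigma = 1.
N, T = 40, 0.1
dx = 1.0 / N
dt = 0.3 * dx**2                          # parabolic restriction, dt ~ O(dx^2)
x = np.linspace(0.0, 1.0, N, endpoint=False)
rho = 1.0 + 0.5 * np.sin(2 * np.pi * x)   # initial density as in Example 1

for _ in range(int(T / dt)):
    lap = (np.roll(rho, -1) - 2 * rho + np.roll(rho, 1)) / dx**2
    rho = rho + dt * lap / 3.0            # 1/(3*sigma) with sigma = 1
```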
B. Weights balance algorithm

We review the weight balancing algorithm studied in [21], which designs appropriate weights for the different loss terms in the total loss function.
Algorithm 1: Learning rate annealing for the PINN [21]

Consider a physics-informed neural network $f_\theta(x)$ with parameters $\theta$ and a loss function

$$\mathcal{L} = \mathcal{L}_G + \sum_{i=1}^{M} \lambda_i\,\mathcal{L}_i(\theta),$$

where $\mathcal{L}_G$ is the governing equation loss and the $\mathcal{L}_i$ are the other losses (initial condition, etc.); the $\lambda_i$ are the weights balancing the interplay of the losses.

for $n = 1, \ldots, S$ do
  Compute $\hat{\lambda}_i$ by
  $$\hat{\lambda}_i = \frac{\max_\theta \big|\nabla_\theta \mathcal{L}_G(\theta_n)\big|}{\overline{\big|\nabla_\theta \mathcal{L}_i(\theta_n)\big|}},$$
  where $\overline{|\nabla_\theta \mathcal{L}_i(\theta_n)|}$ is the mean of $|\nabla_\theta \mathcal{L}_i(\theta_n)|$ with respect to $\theta_n$;
  Update the weights $\lambda_i$ using a moving average:
  $$\lambda_i = (1-\alpha)\lambda_i + \alpha\hat{\lambda}_i, \quad i = 1, \ldots, M,$$
  where $\alpha$ is a constant (the authors of [21] suggest $\alpha = 0.9$);
  Update the parameters $\theta$ via gradient descent;
end
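Below is a hedged PyTorch sketch of one annealing step of Algorithm 1; the helper name and loss interfaces are assumptions, while the gradient statistics and the moving average follow [21].

```python
import torch

def update_loss_weights(model, loss_g, other_losses, weights, alpha=0.9):
    """One step of Algorithm 1: rebalance lambda_i by comparing gradient
    statistics of the governing-equation loss and the other losses."""
    grads_g = torch.autograd.grad(loss_g, model.parameters(), retain_graph=True)
    max_g = max(g.abs().max() for g in grads_g)          # max_theta |grad L_G|
    new_weights = []
    for loss_i, lam_i in zip(other_losses, weights):
        grads_i = torch.autograd.grad(loss_i, model.parameters(), retain_graph=True)
        mean_i = torch.cat([g.abs().flatten() for g in grads_i]).mean()
        lam_hat = max_g / mean_i                          # hat(lambda)_i
        new_weights.append((1 - alpha) * lam_i + alpha * lam_hat)  # moving average
    return new_weights
```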