Solving localized wave solutions of the derivative nonlinear Schrödinger equation using an improved PINN method
Juncai Pu, Jun Li, and Yong Chen∗

Abstract.
The solution of the derivative nonlinear Schrödinger equation (DNLS) has attracted considerable attention in theoretical analysis and physical applications. Based on the physics-informed neural network (PINN), which has been put forward to uncover the dynamical behaviors of nonlinear partial differential equations directly from spatiotemporal data, an improved PINN method with a neuron-wise locally adaptive activation function is presented to derive localized wave solutions of the DNLS in complex space. In order to compare the performance of the two methods, we reveal the dynamical behaviors and error analysis for localized wave solutions, which include the one-rational soliton solution, genuine rational soliton solutions and rogue wave solution of the DNLS, by employing both methods, and exhibit vivid diagrams and detailed analysis. The numerical results demonstrate that the improved method has faster convergence and better simulation quality. On the basis of the improved method, the effects of different numbers of initial points sampled, residual collocation points sampled, network layers and neurons per hidden layer on the second order genuine rational soliton solution dynamics of the DNLS are considered, and the relevant analysis when the locally adaptive activation function chooses different initial values of the scalable parameters is also exhibited in the simulation of the two-order rogue wave solution.

1. Introduction
The derivative nonlinear Schrödinger equation (DNLS)

$$iq_t + q_{xx} + i(q^2q^*)_x = 0, \qquad (1.1)$$

plays a significant role both in integrable system theory and in many physical applications, especially in space plasma physics and nonlinear optics [1,2]. Here, $q = q(x,t)$ is the complex-valued solution, the superscript "$*$" denotes complex conjugation, and the subscripts $x$ and $t$ denote the partial derivatives with respect to $x$ and $t$, respectively. In recent decades, many scholars have invested a great deal of time and energy in studying various mathematical and physical problems of the DNLS. Mio et al. derived the DNLS for Alfvén waves in plasma; it describes well the propagation of small-amplitude nonlinear Alfvén waves in a low-β plasma, propagating strictly parallel or at a small angle to the ambient magnetic field [3,4]. The results show that large-amplitude magnetohydrodynamic waves propagating at an arbitrary angle to the ambient magnetic field in a high-β plasma are also modeled by the DNLS. In nonlinear optics, the modified nonlinear Schrödinger equation, which is gauge equivalent to the DNLS, arises in the theory of ultrashort femtosecond nonlinear pulses in optical fibers [5]: when the spectral width of the pulse becomes comparable to the carrier frequency, the self-steepening effect of the pulse must be considered. In addition, the filamentation of lower-hybrid waves can be modeled by the DNLS, which governs the asymptotic state of the filamentation and admits moving solitary envelope solutions for the electric field [6]. Ichikawa and co-workers obtained the peculiar structure of spiky modulation of amplitude and phase, which arises from the derivative nonlinear coupling term [7]. At present, abundant solutions and the integrability of the DNLS have been derived through different methods. Kaup and Newell proved the integrability of the DNLS in the sense of the inverse scattering method in 1978 [1]. Nakamura and Chen constructed the first N-soliton formula of the DNLS with the help of the Hirota bilinear transformation method [8]. Furthermore, based on the Darboux transformation technique, Huang and Chen established the determinant form of the N-soliton formula [4]. Kamchatnov and collaborators not only proposed a method for finding periodic solutions of several integrable evolution equations and applied it to the DNLS, but also dealt with the formation of solitons on the sharp front of an optical pulse in an optical fiber according to the DNLS [9,10]. The Cauchy problem of the DNLS has been discussed by Hayashi and Ozawa [11]. Compact N-soliton formulae, both with asymptotically vanishing and non-vanishing amplitudes, were obtained by iterating the Bäcklund transformation of the DNLS [12]. In addition, high-order solitons, high-order rogue waves and rational solutions of the DNLS have been given explicitly with the help of two kinds of generalized Darboux transformations which rely on a certain limit technique [13]. Recently, more abundant solutions and new physical phenomena of the DNLS have been revealed by various methods [14-19].

In recent years, due to the explosive growth of available data and computing resources, neural networks (NNs) have been successfully applied in diverse fields, such as recommendation systems, speech recognition, mathematical physics, computer vision, pattern recognition and so on [20-24].
In particular, the physics-informed neural network (PINN) method has proved particularly suitable for solving and inverting equations governed by mathematical physical systems on the basis of NNs, and it was found that such high-dimensional network tasks can be completed with smaller data sets [25,26]. The PINN method can not only accurately solve forward problems, where the approximate solutions of the governing equations are obtained, but also precisely deal with highly ill-posed inverse problems, where parameters involved in the governing equations are inferred from the training data. Based on the abundant solutions and integrability of integrable systems [27-29], we have simulated the one- and two-order rogue wave solutions of the integrable nonlinear Schrödinger equation by employing a deep learning method with physical constraints [30]. Slow convergence leads to longer training times and higher performance requirements on the experimental equipment, so it is essential to accelerate the convergence of the network without sacrificing performance. Meanwhile, the original PINN method cannot accurately reconstruct complex solutions of some complicated equations. It is therefore crucial to design a more efficient and more adaptable deep learning algorithm that not only improves the accuracy of the simulated solution but also reduces the training cost.

As is well known, a significant feature of NNs is the activation function, which determines the activation of specific neurons and the stability of network performance in the training process. There is just a rule of thumb for the choice of activation function, which depends entirely on the problem at hand. In the PINN algorithm, many activation functions, such as sigmoid, tanh, sin, etc., are used to solve various problems; refer to [25,31] for details. Recently, a variety of research methods for activation functions have been proposed to optimize convergence performance and raise the training speed. Dushkoff and Ptucha proposed multiple activation functions per neuron, in which an individual neuron chooses among multiple activation functions [32]. Li et al. proposed a tunable activation function when only one hidden layer is used [33]. The authors of [34] focused on learning activation functions in convolutional NNs by combining basic activation functions in a data-driven way. Jagtap and collaborators employed adaptive activation functions for regression in PINNs to approximate smooth and discontinuous functions as well as solutions of linear and nonlinear partial differential equations, and introduced a scalable parameter into the activation function, which can be optimized to achieve the best performance of the network, as it dynamically changes the topology of the loss function involved in the optimization process [35]. The adaptive activation function has better learning capabilities than a traditional fixed activation, as it greatly improves the convergence rate, especially during early training, as well as the solution accuracy. In particular, Jagtap et al. presented two different kinds of locally adaptive activation functions, namely layer-wise and neuron-wise locally adaptive activation functions [36]. Compared with global adaptive activation functions [35], the locally adaptive activation functions further improve the training speed and performance of NNs.
Furthermore, in order to further speed up the training process, a slope recovery term based on the activation slope has been added to the loss function of the layer-wise and neuron-wise locally adaptive activation functions to improve the performance of the neural network. Recently, we have focused on studying abundant solutions of integrable equations [22,26,30,31], because they possess good integrability properties, such as Painlevé integrability, Lax integrability, Liouville integrability and so on [37-39]. Significantly, the DNLS has been shown to satisfy important integrability properties, and many types of localized wave solutions have been obtained by various effective methods [1-5]. We extend the PINN based on the locally adaptive activation function with slope recovery term, proposed by Jagtap and cooperators [36], to solve nonlinear integrable equations in complex space, and construct the localized wave solutions, which consist of the rational soliton solutions and rogue wave solution of the integrable DNLS. Meanwhile, we also demonstrate the corresponding results, containing the rational soliton solutions and rogue wave solution, by exploiting the PINN, which is convenient for comparative analysis. The performance comparison between the improved PINN method with locally adaptive activation functions and the PINN method is given in detail.

This paper is organized as follows. In Section 2, we briefly introduce the original PINN method and the improved PINN method with locally adaptive activation function, where we also discuss the training data, loss function, optimization method and operating environment. In Section 3, the one-rational soliton solution and the first order genuine rational soliton solution of the DNLS are obtained by the two distinct PINN approaches. Section 4 provides the second order genuine rational soliton solution and two-order rogue wave solution of the DNLS, and the relative $L_2$ errors of simulating the second order genuine rational soliton solution of the DNLS with different numbers of initial points sampled, residual collocation points sampled, network layers and neurons per hidden layer are also given in detail. Moreover, the effects of the initial values of the scalable parameters on the two-order rogue wave solution are shown. The conclusion is given in the last section.

2. Methodology
Here, we consider general (1+1)-dimensional nonlinear time-dependent integrable equations in complex space, each of which contains a dissipative term as well as other terms, such as nonlinear or dispersive terms, as follows:

$$q_t + N(q, q_x, q_{xx}, q_{xxx}, \cdots) = 0, \qquad (2.1)$$

where $q$ is a complex-valued function of $x$ and $t$ to be determined later, and $N$ is a nonlinear functional of the solution $q$ and its derivatives of arbitrary order with respect to $x$. Due to the complexity of the structure of the solution $q(x,t)$ of Eq. (2.1), we decompose $q(x,t)$ into the real part $u(x,t)$ and the imaginary part $v(x,t)$, i.e. $q = u + iv$. It is obvious that $u(x,t)$ and $v(x,t)$ are real-valued functions. Then, substituting into Eq. (2.1), we have

$$u_t + N_u(u, v, u_x, v_x, u_{xx}, v_{xx}, \cdots) = 0, \qquad (2.2)$$
$$v_t + N_v(u, v, u_x, v_x, u_{xx}, v_{xx}, \cdots) = 0, \qquad (2.3)$$

where $N_u$ and $N_v$ are nonlinear functionals of the corresponding solutions and their derivatives of arbitrary order with respect to $x$ (in general the two equations are coupled in $u$ and $v$). In this section, we briefly introduce the original PINN method and its improved version.
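As a concrete instance of the decomposition (2.2)-(2.3), substituting $q = u + iv$ into the DNLS (1.1) and separating real and imaginary parts gives (a short derivation of our own, added here for illustration), since $q^2q^* = (u^2+v^2)(u+iv)$,

$$u_t + v_{xx} + \left[(u^2 + v^2)u\right]_x = 0,$$
$$v_t - u_{xx} + \left[(u^2 + v^2)v\right]_x = 0.$$

These two coupled real equations are exactly what the residual networks below are built from.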
2.1. The PINN method.

Here, we first construct a simple multilayer feedforward neural network of depth $D$, which contains an input layer, $D-1$ hidden layers and an output layer, with $N_d$ neurons in the $d$-th hidden layer. The $d$-th hidden layer receives the post-activation output $\mathbf{x}^{d-1} \in \mathbb{R}^{N_{d-1}}$ of the previous layer as its input, and the specific affine transformation is of the form

$$\mathcal{L}_d\left(\mathbf{x}^{d-1}\right) := \mathbf{W}^d\mathbf{x}^{d-1} + \mathbf{b}^d, \qquad (2.4)$$

where the network weights $\mathbf{W}^d \in \mathbb{R}^{N_d \times N_{d-1}}$ and the bias term $\mathbf{b}^d \in \mathbb{R}^{N_d}$ to be learned are initialized using special strategies, such as Xavier initialization or He initialization [40,41]. The nonlinear activation function $\sigma(\cdot)$ is applied component-wise to the affine output $\mathcal{L}_d$ of the present layer. In addition, this nonlinear activation is not applied in the output layer for regression problems, or equivalently, we can say that the identity activation is used in the output layer. Therefore, the neural network can be represented as

$$q(\mathbf{x}; \Theta) = \left(\mathcal{L}_D \circ \sigma \circ \mathcal{L}_{D-1} \circ \cdots \circ \sigma \circ \mathcal{L}_1\right)(\mathbf{x}), \qquad (2.5)$$

where "$\circ$" is the composition operator, $\Theta = \{\mathbf{W}^d, \mathbf{b}^d\}_{d=1}^{D} \in \mathcal{P}$ represents the learnable parameters to be optimized later in the network, $\mathcal{P}$ is the parameter space, and $q$ and $\mathbf{x}^0 = \mathbf{x}$ are the output and input of the network, respectively.

The universal approximation property of the neural network and the idea of physical constraints play key roles in the PINN method. Thus, based on the PINN method [25], we can approximate the latent complex-valued solution $q(x,t)$ of a nonlinear integrable equation using a neural network. Then, the underlying laws of physics described by the governing equations are embedded into the network. With the aid of the automatic differentiation (AD) mechanism in deep learning [42], we can automatically and conveniently obtain the derivatives of the solution with respect to its inputs, i.e., the time and space coordinates. Compared with traditional numerical differentiation methods, AD is a mesh-free method and does not suffer from some common errors, such as truncation and round-off errors. To a certain extent, this AD technique enables us to open the black box of the neural network. In addition, the physical constraints can be regarded as a regularization mechanism that allows us to accurately recover the solution using a relatively simple feedforward network and remarkably small amounts of data. Moreover, the underlying physical laws introduce partial interpretability into the neural network.

Specifically, we define the residual networks $f_u(x,t)$ and $f_v(x,t)$, which are given by the left-hand sides of Eqs. (2.2) and (2.3), respectively:

$$f_u := u_t + N_u(u, v, u_x, v_x, u_{xx}, v_{xx}, \cdots), \qquad (2.6)$$
$$f_v := v_t + N_v(u, v, u_x, v_x, u_{xx}, v_{xx}, \cdots). \qquad (2.7)$$

Then the solution $q(x,t)$ is trained to satisfy the two physical constraint conditions (2.6) and (2.7), which play a vital role of regularization and are embedded into the mean-squared objective function, that is, the loss function

$$\mathrm{Loss}_{\Theta} = \mathrm{Loss}_u + \mathrm{Loss}_v + \mathrm{Loss}_{f_u} + \mathrm{Loss}_{f_v}, \qquad (2.8)$$

where

$$\mathrm{Loss}_u = \frac{1}{N_q}\sum_{i=1}^{N_q}\left|u(x_u^i, t_u^i) - u^i\right|^2, \qquad \mathrm{Loss}_v = \frac{1}{N_q}\sum_{i=1}^{N_q}\left|v(x_v^i, t_v^i) - v^i\right|^2, \qquad (2.9)$$

and

$$\mathrm{Loss}_{f_u} = \frac{1}{N_f}\sum_{j=1}^{N_f}\left|f_u(x_{f_u}^j, t_{f_u}^j)\right|^2, \qquad \mathrm{Loss}_{f_v} = \frac{1}{N_f}\sum_{j=1}^{N_f}\left|f_v(x_{f_v}^j, t_{f_v}^j)\right|^2. \qquad (2.10)$$

Here $\{x_u^i, t_u^i, u^i\}_{i=1}^{N_q}$ and $\{x_v^i, t_v^i, v^i\}_{i=1}^{N_q}$ denote the initial and boundary value data of $q(x,t)$. Similarly, the collocation points for $f_u(x,t)$ and $f_v(x,t)$ are specified by $\{x_{f_u}^j, t_{f_u}^j\}_{j=1}^{N_f}$ and $\{x_{f_v}^j, t_{f_v}^j\}_{j=1}^{N_f}$.
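To make the construction concrete, the following is a minimal sketch of how the residual networks (2.6)-(2.7) and the loss (2.8) can be assembled for the DNLS in TensorFlow 1.x (the paper's stated environment). The layer sizes and the shared set of placeholder points are simplifications of our own; in practice the data terms are evaluated on the $N_q$ initial-boundary points and the residual terms on the $N_f$ collocation points.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x, matching the paper's environment

# Hypothetical small network mapping (x, t) -> (u, v); the paper's runs use
# nine hidden layers with 40 neurons each.
layers = [2, 40, 40, 2]
weights = [tf.Variable(tf.random_normal([layers[l], layers[l + 1]],
                       stddev=np.sqrt(2.0 / (layers[l] + layers[l + 1]))))
           for l in range(len(layers) - 1)]
biases = [tf.Variable(tf.zeros([1, layers[l + 1]]))
          for l in range(len(layers) - 1)]

def net_q(x, t):
    """Feedforward network (2.5) with tanh activations; identity output layer."""
    X = tf.concat([x, t], axis=1)
    for W, b in zip(weights[:-1], biases[:-1]):
        X = tf.tanh(tf.matmul(X, W) + b)
    uv = tf.matmul(X, weights[-1]) + biases[-1]
    return uv[:, 0:1], uv[:, 1:2]  # u, v

x = tf.placeholder(tf.float32, [None, 1])
t = tf.placeholder(tf.float32, [None, 1])
u, v = net_q(x, t)

# Residuals (2.6)-(2.7) via automatic differentiation: splitting the DNLS
# i q_t + q_xx + i (q^2 q*)_x = 0 into imaginary and real parts gives
# f_u = u_t + v_xx + ((u^2+v^2) u)_x and f_v = v_t - u_xx + ((u^2+v^2) v)_x.
u_t, v_t = tf.gradients(u, t)[0], tf.gradients(v, t)[0]
u_x, v_x = tf.gradients(u, x)[0], tf.gradients(v, x)[0]
u_xx, v_xx = tf.gradients(u_x, x)[0], tf.gradients(v_x, x)[0]
sq = u ** 2 + v ** 2
f_u = u_t + v_xx + tf.gradients(sq * u, x)[0]
f_v = v_t - u_xx + tf.gradients(sq * v, x)[0]

# Mean-squared loss (2.8)-(2.10).
u_data = tf.placeholder(tf.float32, [None, 1])
v_data = tf.placeholder(tf.float32, [None, 1])
loss = (tf.reduce_mean(tf.square(u - u_data)) +
        tf.reduce_mean(tf.square(v - v_data)) +
        tf.reduce_mean(tf.square(f_u)) + tf.reduce_mean(tf.square(f_v)))
```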
The loss function (2.8) consists of the initial-boundary value data and the structure imposed by Eqs. (2.6) and (2.7) at a finite set of collocation points. Specifically, the first and second terms on the right-hand side of Eq. (2.8) attempt to fit the solution data, while the third and fourth terms learn to recover the true solution space.

2.2. The improved PINN method.
The original PINN method cannot accurately reconstruct complex solutions of some complicated nonlinear integrable equations. Thus, we present an improved PINN method (IPINN), in which a locally adaptive activation function technique is introduced into the original PINN method. It changes the slope of the activation function adaptively, resulting in non-vanishing gradients and faster training of the network. There are several kinds of locally adaptive activation functions, for example, layer-wise and neuron-wise. In this paper, we only consider the neuron-wise version due to accuracy and performance requirements. Specifically, we first define such an activation function as

$$\sigma\left(na_i^d\left(\mathcal{L}_d\left(\mathbf{x}^{d-1}\right)\right)_i\right), \quad d = 1, 2, \cdots, D-1, \quad i = 1, 2, \cdots, N_d,$$

where $n \geq 1$ is a predefined scaling factor and the $\{a_i^d\}$ are additional $\sum_{d=1}^{D-1} N_d$ parameters to be optimized. Note that there is a critical scaling factor $n_c$, above which the optimization algorithm becomes sensitive, for each problem set. The neuron-wise activation acts as a vector activation function in each hidden layer, and each neuron has its own slope of activation function. Based on Eq. (2.5), the new neural network with neuron-wise locally adaptive activation function can be represented as

$$q(\mathbf{x}; \bar{\Theta}) = \left(\mathcal{L}_D \circ \sigma \circ na_i^{D-1}(\mathcal{L}_{D-1})_i \circ \cdots \circ \sigma \circ na_i^{1}(\mathcal{L}_1)_i\right)(\mathbf{x}), \qquad (2.11)$$

where the set of trainable parameters $\bar{\Theta} \in \bar{\mathcal{P}}$ consists of $\{\mathbf{W}^d, \mathbf{b}^d\}_{d=1}^{D}$ and $\{a_i^d\}_{d=1}^{D-1}$, $\forall\, i = 1, 2, \cdots, N_d$, and $\bar{\mathcal{P}}$ is the parameter space. In this method, the initialization of the scalable parameters is carried out such that $na_i^d = 1$, $\forall\, n \geq 1$. Correspondingly, the loss function becomes

$$\mathrm{Loss}_{\bar{\Theta}} = \mathrm{Loss}_u + \mathrm{Loss}_v + \mathrm{Loss}_{f_u} + \mathrm{Loss}_{f_v} + \mathrm{Loss}_S, \qquad (2.12)$$

where $\mathrm{Loss}_u$, $\mathrm{Loss}_v$, $\mathrm{Loss}_{f_u}$ and $\mathrm{Loss}_{f_v}$ are defined by Eqs. (2.9)-(2.10). The last term, the slope recovery term $\mathrm{Loss}_S$ in the loss function (2.12), is defined as

$$\mathrm{Loss}_S = \frac{1}{\frac{1}{D-1}\sum_{d=1}^{D-1}\exp\left(\frac{\sum_{i=1}^{N_d} a_i^d}{N_d}\right)}. \qquad (2.13)$$

The term $\mathrm{Loss}_S$ forces the neural network to increase the activation slopes quickly, which ensures the non-vanishing of the gradient of the loss function and improves the training speed of the network. Compared with the PINN method in Section 2.1, the improved method induces a new gradient dynamics, which results in better convergence points and a faster convergence rate. Jagtap et al. stated that a gradient descent algorithm such as stochastic gradient descent (SGD) minimizing the loss function (2.12) does not converge to a sub-optimal critical point or a sub-optimal local minimum for the neuron-wise locally adaptive activation function, given appropriate initialization and learning rates [36].

In both methods, all loss functions are simply optimized by employing the L-BFGS algorithm, which is a full-batch gradient descent optimization algorithm based on a quasi-Newton method [43]. In particular, the scalable parameters in the adaptive activation function are initialized as $n = 10$, $a_i^d = 0.1$, unless otherwise specified.
In addition, we select relatively simple multilayer perceptrons (i.e., feedforward neural networks) with Xavier initialization and the tanh activation function. All the code in this article is based on Python 3.8 and TensorFlow 1.15, and all numerical experiments reported here are run on a DELL Precision 7920 Tower computer with a 2.10 GHz 8-core Xeon Silver 4110 processor and 64 GB of memory.
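A minimal sketch of the neuron-wise adaptive activation (2.11) and the slope recovery term (2.13), again in TensorFlow 1.x, is given below. The network sizes are illustrative, and the commented L-BFGS call through tf.contrib.opt.ScipyOptimizerInterface is our assumption about the implementation, since the paper only names the L-BFGS algorithm [43].

```python
import tensorflow as tf  # TensorFlow 1.x

layers = [2, 40, 40, 2]  # illustrative sizes, not the paper's exact setup
weights = [tf.Variable(tf.random_normal([layers[l], layers[l + 1]]))
           for l in range(len(layers) - 1)]
biases = [tf.Variable(tf.zeros([1, layers[l + 1]]))
          for l in range(len(layers) - 1)]

# One trainable slope a_i^d per neuron in each hidden layer, initialized so
# that n * a_i^d = 1 (here n = 10 and a_i^d = 0.1, as in the text).
n = 10.0
slopes = [tf.Variable(0.1 * tf.ones([1, layers[l + 1]]))
          for l in range(len(layers) - 2)]

def net_q_adaptive(x, t):
    """Forward pass (2.11): sigma(n * a_i^d * (L_d x)_i) in each hidden layer."""
    X = tf.concat([x, t], axis=1)
    for (W, b), a_d in zip(zip(weights[:-1], biases[:-1]), slopes):
        X = tf.tanh(n * a_d * (tf.matmul(X, W) + b))
    uv = tf.matmul(X, weights[-1]) + biases[-1]
    return uv[:, 0:1], uv[:, 1:2]

# Slope recovery term (2.13): the reciprocal of the mean over hidden layers of
# exp(mean slope), which shrinks as the slopes grow.
loss_S = 1.0 / tf.reduce_mean(tf.stack(
    [tf.exp(tf.reduce_mean(a_d)) for a_d in slopes]))

# Full-batch L-BFGS through SciPy, a common choice in TF1 PINN codes (assumed
# here; `loss` is the data/residual loss of Section 2.1):
# optimizer = tf.contrib.opt.ScipyOptimizerInterface(
#     loss + loss_S, method='L-BFGS-B', options={'maxiter': 50000})
```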
3. One-rational soliton solution and first order genuine rational soliton solution of the DNLS
In this section, the two neural network methods described in the previous section are used to obtain simulated solutions of the DNLS, and the dynamical behavior, error analysis and related plots of the one-rational soliton solution and first order genuine rational soliton solution of the DNLS are presented in detail. We consider the DNLS along with Dirichlet boundary conditions, given by

$$\begin{cases} iq_t + q_{xx} + i(q^2q^*)_x = 0, & x \in [x_0, x_1],\ t \in [t_0, t_1],\\ q(x, t_0) = q_0(x),\\ q(x_0, t) = q(x_1, t), \end{cases} \qquad (3.1)$$

where $x_0$ and $x_1$ represent the lower and upper boundaries of $x$, respectively. Similarly, $t_0$ and $t_1$ represent the initial and final times. The initial condition $q_0(x)$ is an arbitrary complex-valued function. The rational soliton solutions of the DNLS have been obtained by generalized Darboux transformations [13]. In this part, we employ the two approaches, the PINN and the IPINN, to simulate two different forms of rational soliton solutions, and compare the learned solutions $q(x,t)$ with the known exact solutions of the DNLS so as to show that the neural network models are effective. From Ref. [13], one can derive the one-rational soliton solution and the first order genuine rational soliton solution of the DNLS. The one-rational soliton solution can be written as

$$q(x,t) = \frac{4a^3\left[4i\left(a^2x - 4t + a^4c\right) - a^4\right]}{\left[4i\left(a^2x - 4t + a^4c\right) + a^4\right]^2}\, e^{\frac{2i\left(a^2x - 2t + a^4c\right)}{a^4}}, \qquad (3.2)$$

where $a$ and $c$ are arbitrary constants and $i = \sqrt{-1}$.
Therefore, the velocity of this one-rational soliton solution is $4/a^2$ (the peak travels along the line $a^2x - 4t + a^4c = 0$), and the maximum of $|q(x,t)|^2$ is $16/a^2$. On the other hand, the first order genuine rational soliton solution of the DNLS can be represented as

$$q(x,t) = -\frac{(-2x + 6t - i)(-2x + 6t + 3i)}{(-2x + 6t + i)^2}, \qquad (3.3)$$

which is nothing but a rational traveling wave solution with non-vanishing background. In the next two subsections, we use the PINN method and the improved PINN method to simulate the above two solutions, respectively. Some necessary comparisons and analyses are exhibited in detail.
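As a quick sanity check of (3.3), the following NumPy snippet (our own addition) evaluates the DNLS residual $iq_t + q_{xx} + i(q^2q^*)_x$ by second-order central finite differences on a grid; the residual stays at the level of the discretization error and vanishes under grid refinement, consistent with (3.3) being an exact solution. The grid bounds are arbitrary choices.

```python
import numpy as np

# Evaluate the first order genuine rational soliton (3.3) on a grid.
x = np.linspace(-3.0, 3.0, 2401)
t = np.linspace(-0.05, 0.05, 401)
X, T = np.meshgrid(x, t, indexing='ij')
xi = -2.0 * X + 6.0 * T
q = -(xi - 1j) * (xi + 3j) / (xi + 1j) ** 2

# DNLS residual i q_t + q_xx + i (q^2 q*)_x via central differences.
q_t = np.gradient(q, t, axis=1)
q_x = np.gradient(q, x, axis=0)
q_xx = np.gradient(q_x, x, axis=0)
nonlinear_x = np.gradient(q * q * np.conj(q), x, axis=0)
residual = 1j * q_t + q_xx + 1j * nonlinear_x

# Interior maximum residual: a small discretization-level number that shrinks
# roughly fourfold when the grid spacing is halved.
print(np.abs(residual[5:-5, 5:-5]).max())
```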
3.1. One-rational soliton solution.

In this subsection, based on a neural network architecture containing nine hidden layers with 40 neurons per layer, we numerically construct the one-rational soliton solution of the DNLS via the PINN method and the improved PINN method. One can obtain an exact one-rational soliton solution of Eq. (3.1) by taking $a = 1$, $c = 1$ in Eq. (3.2):

$$q(x,t) = \frac{4\left[4i(x - 4t + 1) - 1\right]}{\left[4i(x - 4t + 1) + 1\right]^2}\, e^{2i(x - 2t + 1)}. \qquad (3.4)$$

Then we take the temporal region $[t_0, t_1]$ in Eq. (3.1) to be $[-0.1, 0.1]$ on a finite spatial region $[x_0, x_1]$,
and the initial condition $q_0(x)$ is obtained by substituting $t = t_0 = -0.1$ into (3.4):

$$q_0(x) = \frac{4\left[4i(x + 1.4) - 1\right]}{\left[4i(x + 1.4) + 1\right]^2}\, e^{2i(x + 1.2)}. \qquad (3.5)$$

We employ a traditional finite difference scheme on even grids in MATLAB to simulate Eq. (3.1) with the initial data (3.5) to acquire the training data. In particular, the initialization of the scalable parameters here is $n = 5$, $a_i^d = 0.1$. After dividing the spatial region $[x_0, x_1]$
into 513 points and the temporal region $[-0.1, 0.1]$ into 401 points, the
one-rational soliton solution $q(x,t)$ is discretized into 401 snapshots accordingly. We generate a smaller training dataset containing the initial-boundary data by randomly extracting $N_q = 100$ points from the original dataset, together with $N_f = 10000$ collocation points generated by the Latin hypercube sampling (LHS) method [44]. Given this dataset of initial and boundary points, the latent one-rational soliton solution $q(x,t)$ is successfully learned by tuning all learnable parameters of the neural network and minimizing the loss functions (2.8) and (2.12).
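A sketch of this data preparation (our own, with a hypothetical spatial interval, and evaluating (3.4) directly on the grid rather than running the MATLAB finite difference scheme) is given below; the collocation points are drawn with the pyDOE implementation of LHS.

```python
import numpy as np
from pyDOE import lhs  # Latin hypercube sampling, Ref. [44]

# Domain bounds for (x, t); the spatial interval is an assumed placeholder.
lb = np.array([-2.0, -0.1])
ub = np.array([2.0, 0.1])

# 513 spatial points x 401 temporal snapshots, as in the text.
x = np.linspace(lb[0], ub[0], 513)
t = np.linspace(lb[1], ub[1], 401)
X, T = np.meshgrid(x, t, indexing='ij')

# Exact one-rational soliton (3.4) with a = c = 1 evaluated on the grid.
theta = X - 4.0 * T + 1.0
Q = 4.0 * (4j * theta - 1.0) * np.exp(2j * (X - 2.0 * T + 1.0)) \
    / (4j * theta + 1.0) ** 2
U, V = Q.real, Q.imag  # training targets u and v

# N_f = 10000 collocation points drawn by LHS over the domain.
N_f = 10000
collocation = lb + (ub - lb) * lhs(2, N_f)
```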
The PINN model achieves a relative $L_2$ error of 4.345103e-02 in about 1314 seconds, while the IPINN model achieves a relative $L_2$ error of 1.998304e-02 in about 1358 seconds. In Figs. 1 and 2, the predicted solution $q(x,t)$ and the loss-iteration curves under the PINN and IPINN structures are plotted, respectively. Panels (a) of Fig. 1 and Fig. 2 compare the exact solution with the predicted spatiotemporal solution for the two methods, and the bottom rows of these panels present the comparison at the times $t = -0.05, 0, 0.05$. Obviously, the bottom panel of picture (a) in Figure 2 shows that the predicted solution of the DNLS is more consistent with the exact solution than that in Figure 1; in other words, the simulation performance of the IPINN is better than that of the PINN. It is also not hard to see that the training loss curve in picture (b) of Fig. 2, which reveals the relation between the iteration number and the loss function, is smoother and more stable than the curve in picture (b) of Fig. 1. In this test case, the IPINN with the slope recovery term performs better than the PINN in terms of both convergence speed and solution accuracy.
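The relative $L_2$ error quoted throughout is the standard one; as a small sketch (our own), it can be computed on the grid as follows.

```python
import numpy as np

def relative_l2_error(q_pred, q_exact):
    """Relative L2 error ||q_pred - q_exact||_2 / ||q_exact||_2 over the grid."""
    return np.linalg.norm(q_pred - q_exact) / np.linalg.norm(q_exact)
```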
3.2. First order genuine rational soliton solution.
In this subsection, we numerically construct the first order genuine rational soliton solution of Eq. (3.1) by using the PINN and the IPINN, both of which contain nine hidden layers with 40 neurons per layer. Now we take $[x_0, x_1]$ and $[t_0, t_1]$ in Eq. (3.1) to be a finite region, with the initial condition obtained by setting $t = t_0$ in (3.3):

$$q_0(x) = -\frac{(-2x + 6t_0 - i)(-2x + 6t_0 + 3i)}{(-2x + 6t_0 + i)^2}. \qquad (3.6)$$

With the same data generation and sampling method as in Section 3.1, we numerically simulate the first order genuine rational soliton solution of the DNLS (1.1) by using the PINN and the IPINN, respectively. The training dataset, composed of initial-boundary data and collocation points, is produced by randomly subsampling $N_q = 100$ points from the original dataset and selecting $N_f = 10000$ collocation points generated by LHS. After training, the PINN achieves a relative $L_2$ error of 5.598548e-03 in about 349.5862 seconds with 3305 iterations, whereas the IPINN achieves a relative $L_2$ error of 4.969464e-03 in about 1103.1358 seconds with 10384 iterations.
Figure 1. The one-rational soliton solution $q(x,t)$ based on the PINN: (a) the density plots and the sectional drawing; (b) the loss curve figure.

Figure 2. The one-rational soliton solution $q(x,t)$ based on the IPINN: (a) the density plots and the sectional drawing; (b) the loss curve figure.

Apparently, when simulating the first order genuine rational soliton solution, the IPINN requires more iterations and a longer training time, but attains a smaller relative $L_2$ error than the PINN.

Fig. 3 shows the density plots, profiles and loss curve of the first order genuine rational soliton solution obtained with the PINN. Figure 4 illustrates the density diagrams, profiles at different instants, error dynamics diagram, three-dimensional plot and loss curve of the first order genuine rational soliton solution based on the IPINN. From panels (a) of Fig. 3 and Fig. 4, we can clearly see that both methods simulate the first order genuine rational soliton solution accurately. However, comparing the (b) graph of Fig. 3 with the (d) graph of Fig. 4, we can clearly observe that the loss function curve of the IPINN decreases faster and more smoothly, while the loss function curve of the PINN fluctuates greatly when the number of iterations is about 1500, and the burr phenomenon is remarkably obvious throughout the PINN training process. Furthermore, the loss curve in Figure 4 shows that a satisfactory result has already been achieved after about 2000 IPINN iterations, so we can control the number of iterations to save training cost in some specific cases. At $t = -0.40, 0, 0.40$, we reveal
the profiles at the three moments in the bottom rows of panels (a) in Fig. 3 and Fig. 4, and find that the first order genuine rational solution has the soliton property that its amplitude does not change with time. Panel (b) of Fig. 4 exhibits the error dynamics, i.e., the difference between the exact solution and the predicted solution, for the first order genuine rational soliton solution. In Fig. 4, the corresponding three-dimensional plot of the first order genuine rational soliton solution is also shown; it is evident that the first order genuine rational soliton solution is similar to a single-soliton solution sitting on the $|q| = 1$ plane-wave background.

4. Second order genuine rational soliton solution and two-order rogue wave solution of the DNLS
In this section, we use the two methods described in Section 2, the PINN and the IPINN, to construct the second order genuine rational soliton solution and the two-order rogue wave solution of the DNLS, respectively. The detailed results and analysis are given in the following two parts.
4.1. Second order genuine rational soliton solution.
In this subsection, based on the Dirichlet boundary conditions (3.1), we numerically predict the second order genuine rational soliton solution of the DNLS by using the PINN method and the improved PINN method separately.
Figure 3. The first order genuine rational soliton solution $q(x,t)$ based on the PINN: (a) the density plots and the sectional drawing; (b) the loss curve figure.

Figure 4. The first order genuine rational soliton solution $q(x,t)$ based on the IPINN: (a) the density diagram and profiles at three different instants; (b) the error density diagram; (c) the three-dimensional motion; (d) the loss curve figure.

The second order genuine rational soliton solution of the DNLS has been derived in Ref. [13]; it takes the form

$$q(x,t) = \frac{L_1^* L_2}{L_1^2}, \qquad (4.1)$$

where "$*$" denotes the complex conjugate, $k$ is an arbitrary real number, and $L_1$ and $L_2$ are cubic polynomials in $(3t - x)$ whose coefficients involve $t$ and $k$; their explicit expressions are given in Ref. [13]. The norm of the solution (4.1) attains its maximum value five at a point determined by $k$, and vanishes at two nearby points. The "ridge" of this soliton (4.1) approximately lies on the line $x = 3t$. As $t \to \pm\infty$, the second order genuine rational soliton solution (4.1) approaches the first order genuine rational soliton solution (3.3) along its ridge.
Figure 5. The second order genuine rational soliton solution $q(x,t)$ based on the PINN: (a) the density plots and the sectional drawing; (b) the loss curve figure.

Then we take $[x_0, x_1]$ and $[t_0, t_1]$ in Eq. (3.1) to be a finite region
and, taking $k = \exp(1)$ and substituting $t = t_0$ into (4.1), we have the initial condition

$$q_0(x) = \frac{L_1'^* L_2'}{L_1'^2}, \qquad (4.2)$$

where $L_1'$ and $L_2'$ denote $L_1$ and $L_2$ evaluated at $t = t_0$ with $k = \exp(1)$. Next, we obtain the initial and boundary value dataset by the same data discretization method as in Section 3.1. By randomly subsampling $N_q = 200$ points from the original dataset and selecting $N_f = 20000$ collocation points with the help of LHS, a training dataset composed of initial-boundary data and collocation points is generated. The dataset is then fed into the two neural network models to simulate the second order genuine rational soliton solution. After training, the PINN achieves a relative $L_2$ error of 3.680510e-02
in about 705.3579 seconds with 6167 iterations, while the IPINN achieves a relative $L_2$ error of 4.295123e-02
in about 874.1350 seconds with 6142 iterations.

The PINN experiment results are summarized in Fig. 5, in which we simulate the solution $q(x,t)$ and obtain the density plots, profiles and iterative loss curve of the second order genuine rational soliton solution. From (b) of Figure 5, it can be clearly observed that the loss function curve declines very slowly and exhibits a particularly large fluctuation after 6500 iterations, which indicates that the PINN has slow convergence and poor stability of the loss function here. Fig. 6 displays the training outcome of the improved PINN method: the density diagrams, profiles at different instants, error dynamics diagram, three-dimensional plot and loss curve of the second order genuine rational soliton solution $q(x,t)$ are illustrated. The top panel of (a) of Fig. 6 gives the density map of the hidden solution $q(x,t)$, and combining (b) of Fig. 6 with the bottom panel of (a) in Fig. 6, we can see that the relative $L_2$ error is relatively large at $t > 0.20$.
From (d) of Fig. 6, in contrast with the first order genuine rational soliton solution trained by the improved PINN method in Section 3.2, the loss function curve of the second order genuine rational soliton solution is relatively stable and the whole iterative process is relatively long, which is completely different from the sharp drop of the loss function curve of the first order genuine rational soliton solution and the smaller number of effective iterations in (d) of Fig. 4. In a word, the results of the two neural network methods show that both the PINN and the IPINN can simulate the second order genuine rational soliton solution accurately, with similar training time, relative error and iteration number, but the iterative process of the IPINN is more stable and its training performance is better. There is no doubt that the IPINN is more reliable for training higher order solutions of the DNLS.

In addition, for the neural network model of the IPINN, we obtain the following two tables. Based on the same initial and boundary values of the second order genuine rational soliton solution in the case of $N_q = 200$ and $N_f = 20000$, we employ the control variable method, often used in physics, to study the effects of different numbers of network layers and different numbers of neurons per layer on the second order genuine rational soliton solution dynamics of the DNLS. The relative $L_2$ errors for different numbers of layers and neurons per layer are given in Table 1. From the data in Table 1, we can see that when the number of network layers is fixed, the larger the number of neurons per layer, the smaller the relative $L_2$ error. Of course, due to the influence of randomness, individual results do not obey this rule, but on the whole the conclusion holds. Similarly, when the number of neurons per layer is fixed, the deeper the network, the smaller the relative error. To sum up, the number of network layers and the number of neurons per layer jointly determine the relative $L_2$ error; when the number of layers is not less than 6 and the number of neurons per layer is not less than 30, the overall relative error is small.

Figure 6. The second order genuine rational soliton solution $q(x,t)$ based on the IPINN: (a) the density plots and the sectional drawing; (b) the error density plots; (c) the three-dimensional plots; (d) the iterative curve plots.
In the case of the same original dataset, Table 2 shows the relative $L_2$ errors of the nine-hidden-layer network with 40 neurons per layer for different numbers of sampling points $N_q$ in the initial-boundary data and different numbers of collocation points $N_f$ generated by the Latin hypercube sampling method. From Table 2, we can see that the influence of $N_q$ and $N_f$ on the relative $L_2$ error of the network is not so obvious. After careful observation, when taking $N_f = 20000$, regardless of the value of $N_q$, the overall relative $L_2$ error is small, which also explains why the neural network model can simulate accurate numerical solutions with a smaller initial dataset.
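The two studies can be organized as a simple grid search; the sketch below (our own, with a stubbed training driver, and with the set of layer counts assumed rather than taken from the table) records the relative $L_2$ error of each configuration.

```python
import itertools

def train_ipinn(hidden_layers, neurons, N_q, N_f):
    """Stub standing in for a full IPINN training run; replace with a driver
    that builds the network of Section 2.2, trains it, and returns the
    relative L2 error of the predicted solution."""
    return float('nan')

# Neuron counts from Table 1; the layer counts are assumed for illustration.
layer_counts = [3, 5, 7, 9]
neuron_counts = [20, 30, 40, 50, 60]

errors = {}
for L, N in itertools.product(layer_counts, neuron_counts):
    errors[(L, N)] = train_ipinn(hidden_layers=L, neurons=N,
                                 N_q=200, N_f=20000)
```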
Table 1. The second order genuine rational soliton solution of the DNLS by using the IPINN: relative final prediction error measured in the $L_2$ norm for different numbers of hidden layers and neurons in each layer (neurons per layer: 20, 30, 40, 50, 60).
Table 2. The second order genuine rational soliton solution of the DNLS by using the IPINN: relative final prediction error measured in the $L_2$ norm for different numbers of initial-boundary points $N_q$ and collocation points $N_f$.

4.2. Two-order rogue wave solution.
Recently, the study of rogue waves has become one of the hot topics in many areas, including optics, plasma physics, ocean dynamics, machine learning, Bose-Einstein condensates and even finance [30,45-50]. In addition to a peak amplitude more than twice that of the background wave, rogue waves also have the characteristics of instability and unpredictability. Therefore, the research and applications of rogue waves play a momentous role in real life; in particular, how to avoid the damage to ships caused by oceanic rogue waves is of great practical significance. Marcucci et al. have investigated a computational machine in which nonlinear waves replace the internal layers of neural networks, discussed learning conditions, and demonstrated functional interpolation, data interpolation, datasets and Boolean operations; when the nonlinear Schrödinger equation is considered, the use of highly nonlinear regimes means that solitons, rogue waves and shock waves play a leading role in the training and calculation [47]. Moreover, the dynamical behaviors and error analysis of the one-order and two-order rogue waves of the nonlinear Schrödinger equation have been revealed by a deep learning neural network with physical constraints for the first time [30]. The rogue wave solutions of the DNLS were derived via the Darboux transformation [51], and the high-order rogue wave solutions were obtained by the generalized Darboux transformation [13]. However, to the best of our knowledge, machine learning with neural network models has not previously been exploited to simulate the rogue wave solutions of the DNLS. In this section, we construct the two-order rogue wave solution of the DNLS by employing the PINN and the IPINN, respectively, and give some vital comparisons to better describe the advantages of each.

On the basis of the Dirichlet boundary conditions (3.1), we numerically train the two-order rogue wave solution of the DNLS by employing the PINN method and the improved PINN method separately. The two-order rogue wave solution of the DNLS has been derived in Ref. [13]; it can be represented as

$$q(x,t) = \frac{R_1^* R_2}{R_1^2}\exp(-ix), \qquad (4.3)$$

where $R_1$ and $R_2$ are certain polynomials in $x$ and $t$ with complex coefficients; their lengthy explicit expressions are given in Ref. [13]. Then we restrict Eq. (3.1) to a finite region $[x_0, x_1] \times [t_0, t_1]$,
and the initial condition is obtained by substituting $t = t_0$ into (4.3):

$$q_0(x) = \frac{R_1'^* R_2'}{R_1'^2}\exp(-ix), \qquad (4.4)$$

where $R_1'$ and $R_2'$ denote $R_1$ and $R_2$ evaluated at $t = t_0$. Similarly to the discretization method in Section 3.1, we randomly sample $N_q = 300$ points from the original initial-boundary value dataset and select $N_f = 20000$ collocation points generated by the LHS method; together they form the training dataset of initial-boundary data and collocation points. After training with the two methods, the PINN achieves a relative $L_2$ error of 8.412217e-02 in about 1188.4475 seconds with 9470 iterations, while the IPINN attains a relative $L_2$ error of 7.262528e-02 in about 2924.0589 seconds with 18394 iterations. It can be seen from these results that, under the same experimental conditions, the relative $L_2$ error of the IPINN method for simulating the rogue wave solution is smaller than that of the PINN, but the improved PINN method needs a longer training time and more iterations. Next, we give the specific numerical results and the corresponding analysis.

The density plots, the sectional drawing and the error density plots of the two-order rogue wave solution obtained with the PINN method are exhibited in Fig. 7. In the bottom panel of (a) in Figure 7, one can observe that the wave peak of the two-order rogue wave is well simulated, but the simulation on both sides of the peak is poor, which is also verified by the error diagram in panel (b) of Fig. 7. On the other hand, for the neural network model applying the improved PINN method, the density plots, sectional drawing, error density plots, three-dimensional diagram and loss function curve are shown in detail in Figure 8. We find that the wave peak simulation in (a) of Fig. 8 is not as good as that in Figure 7, but the simulation on both sides of the peak is better, which is just the opposite of the situation simulated by the PINN. On the whole, the simulation quality of the IPINN is higher, and it has more research value. Chart (b) in Figure 8 shows that there is a small error at the middle peak, where the error is the difference between the exact solution and the predicted solution. The 3D plot and the loss function curve are shown in (c) and (d) of Figure 8, respectively. From panel (d) of Fig. 8, it can be seen that the loss value fluctuates greatly when the number of iterations is around 2500, and then decreases slowly from 1 to 0.1.
Figure 7. The two-order rogue wave solution $q(x,t)$ based on the PINN: (a) the density plots and the sectional drawing; (b) the error density plots.
Figure 8. The two-order rogue wave solution $q(x,t)$ based on the IPINN: (a) the density plots and the sectional drawing; (b) the error density plots; (c) the three-dimensional plots; (d) the iterative curve plots.

Furthermore, the initialization of the scaled parameters can be done in various ways, as long as the chosen values do not cause divergence of the loss. In this work, the scaled parameters are initialized such that $na_i^d = 1$, $\forall\, n \geq 1$.
1. Although, anincrease in scaling factor speeds up the convergence rate, at the same time the parameter a mi becomes more sensitive.In order to better understand the influence of initialization of scalable parameters on the improved PINN algorithmmodel, we present four different initialization conditions of scalable parameters to obtain the two-order rogue wavesolution by employing the improved PINN method in Table 3. From the Table 3, we can drastically observe that whenamplify the scaled hyper-parameter n in the initialization conditions of scalable parameters, the number of iterationsand training time increase, but the relative L error does not blindly dwindle. When the hyper-parameter n = 10, therelative L error is minimum and the training effect is better in Table 3. This also reveals why we generally choosethe initialization of scalable parameters as n = 10 , a mi = 0 . ∗ Table 3.
Table 3. The two-order rogue wave solution of the DNLS by utilizing the IPINN: relative $L_2$ norm error, training time and iterations for different initialization conditions of the scalable parameters (ICSP).

ICSP                  | Variable a (n=1) | Variable a (n=5) | Variable a (n=10) | Variable a (n=20)
Relative $L_2$ error  | 1.309705e-01     | 8.760463e-02     | 7.262528e-02      | 1.191116e-01
Training time (s)     | 1499.7196        | 2203.6984        | 2924.0589         | 3983.7843
Iterations            | 9194             | 14385            | 18394             | 24043

We utilized the PINN to simulate the rogue wave solutions of the nonlinear Schrödinger equation in Ref. [30]. In this section, a large number of experiments and analyses have been carried out, and finally the two-order rogue wave solution of the DNLS has been simulated. Under the same experimental conditions and environment, the PINN is better at simulating the wave crest, while the IPINN has a better comprehensive effect on the wave crest and on both sides of the crest. Apparently, the IPINN has more advantages in the overall effect, especially in the simulation of more complex rogue wave solutions.
5. Conclusion
Compared with traditional numerical methods, the PINN method has no mesh-size limits and gives full play to the advantages of computer science. Moreover, due to the physical constraints, the neural network is trained with remarkably little data and a fast convergence rate, and has better physical interpretability. These numerical methods showcase a series of results on various problems in the interdisciplinary field of applied mathematics and computational science, which open a new path for using machine learning to simulate unknown solutions and, correspondingly, to discover the parametric equations in scientific computing. They also provide a theoretical and practical basis for dealing with high-dimensional scientific problems that could not be solved before.

In this paper, based on the PINN method, an improved PINN method containing a locally adaptive activation function with scalable parameters is introduced to solve the classical integrable DNLS. The improved PINN method achieves better network performance through the learnable parameters in the activation function. Specifically, we apply two data-driven algorithms, the PINN and the IPINN, to deduce the localized wave solutions, which consist of the one-rational soliton solution, the genuine rational soliton solutions and the rogue wave solution of the DNLS. In all these cases, compared with the original PINN method, the decay of the loss function is faster for the improved PINN method, and correspondingly the relative $L_2$ error of the simulated solution is shown to be similar or even smaller in the proposed approach. We outline how different types of localized wave solutions are generated through different choices of initial and boundary value data. Remarkably, these numerical results show that the improved PINN method with locally adaptive activation function is more powerful than the PINN method in recovering the different dynamical behaviors of the DNLS.

The improved PINN approach is a promising and powerful method to increase the efficiency, robustness and accuracy of neural-network-based approximation of nonlinear functions as well as of the abundant localized wave solutions of integrable equations. Furthermore, more general nonlinear integrable equations, such as the Hirota equation, which has received wide attention in integrable systems, are not investigated in this work. Due to the ability of the improved PINN to accelerate the convergence rate and improve network performance, more complex integrable equations could also be considered, such as the Kaup-Newell systems, the Sasa-Satsuma equation, the Camassa-Holm equation and so on. How to combine machine learning with integrable system theory more deeply and build a meaningful integrable deep learning algorithm is an urgent problem to be solved in the future. These new problems and challenges will be considered in future research.

References

[1] D. J. Kaup, A. C. Newell, An exact solution for a derivative nonlinear Schrödinger equation, J. Math. Phys. 19 (1978) 798-801.
[2] E. Mjølhus, On the modulational instability of hydromagnetic waves parallel to the magnetic field, J. Plasma Phys. 16 (1976) 321-334.
[3] K. Mio, T. Ogino, K. Minami, S. Takeda, Modified nonlinear Schrödinger equation for Alfvén waves propagating along the magnetic field in cold plasmas, J. Phys. Soc. Japan 41 (1976) 265-271.
[4] N. N. Huang, Z. Y. Chen, Alfven solitons, J. Phys. A: Math. Gen. 23 (1990) 439-453.
[5] X. J. Chen, W. K. Lam, Inverse scattering transform for the derivative nonlinear Schrödinger equation with nonvanishing boundary conditions, Phys. Rev. E 69 (2004) 066604.
[6] K. H. Spatchek, P. K. Shukla, M. Y. Yu, Filamentation of lower-hybrid cones, Nucl. Fusion 18 (1977) 290-293.
[7] Y. Ichikawa, K. Konno, M. Wadati, H. Sanuki, Spiky soliton in circular polarized Alfvén wave, J. Phys. Soc. Japan 48 (1980) 279-286.
[8] A. Nakamura, H. H. Chen, Multi-soliton solutions of a derivative nonlinear Schrödinger equation, J. Phys. Soc. Japan 49 (1980) 813-816.
[9] A. M. Kamchatnov, On improving the effectiveness of periodic solutions of the NLS and DNLS equations, J. Phys. A: Math. Gen. 23 (1990) 2945-2960.
[10] A. M. Kamchatnov, S. A. Darmanyan, F. Lederer, Formation of solitons on the sharp front of the pulse in an optical fiber, Phys. Lett. A 245 (1998) 259-264.
[11] N. Hayashi, T. Ozawa, On the derivative nonlinear Schrödinger equation, Physica D 55 (1992) 14-36.
[12] H. Steudel, The hierarchy of multi-soliton solutions of the derivative nonlinear Schrödinger equation, J. Phys. A: Math. Gen. 36 (2003) 1931-1946.
[13] B. L. Guo, L. M. Ling, Q. P. Liu, High-order solutions and generalized Darboux transformations of derivative nonlinear Schrödinger equations, Stud. Appl. Math. 130 (2012) 317-344.
[14] T. Xu, Y. Chen, Mixed interactions of localized waves in the three-component coupled derivative nonlinear Schrödinger equations, Nonlinear Dyn. 92 (2018) 2133-2142.
[15] B. Xue, J. Shen, X. G. Geng, Breathers and breather-rogue waves on a periodic background for the derivative nonlinear Schrödinger equation, Phys. Scr. 95 (2020) 055216.
[16] S. W. Xu, J. S. He, D. Mihalache, Rogue waves generation through multiphase solutions degeneration for the derivative nonlinear Schrödinger equation, Nonlinear Dyn. 97 (2019) 2443-2452.
[17] G. Q. Zhang, Z. Y. Yan, The derivative nonlinear Schrödinger equation with zero/nonzero boundary conditions: inverse scattering transforms and N-double-pole solutions, J. Nonlinear Sci. 30 (2020) 3089-3127.
[18] L. Wang, M. Li, F. H. Qi, C. Geng, Breather interactions, higher-order rogue waves and nonlinear tunneling for a derivative nonlinear Schrödinger equation in inhomogeneous nonlinear optics and plasmas, Eur. Phys. J. D 69 (2015) 108.
[19] B. Yang, J. C. Chen, J. K. Yang, Rogue waves in the generalized derivative nonlinear Schrödinger equations, J. Nonlinear Sci. 30 (2020) 3027-3056.
[20] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436-444.
[21] C. M. Bishop, Pattern Recognition and Machine Learning, Springer Press, 2006.
[22] J. Li, Y. Chen, Solving second-order nonlinear evolution partial differential equations using deep learning, Commun. Theor. Phys. 72 (2020) 105005.
[23] A. Krizhevsky, I. Sutskever, G. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (2017) 84-90.
[24] B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum, Human-level concept learning through probabilistic program induction, Science 350 (2015) 1332-1338.
[25] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys. 378 (2019) 686-707.
[26] J. Li, Y. Chen, A deep learning method for solving third-order nonlinear evolution equations, Commun. Theor. Phys. 72 (2020) 115003.
[27] J. C. Pu, Y. Chen, Nonlocal symmetries, Bäcklund transformation and interaction solutions for the integrable Boussinesq equation, Modern Phys. Lett. B 34 (2020) 2050288.
[28] R. Hirota, Direct Methods in Soliton Theory, Springer-Verlag Press, 2004.
[29] V. E. Zakharov, S. V. Manakov, S. P. Novikov, L. P. Pitaevskii, The Theory of Solitons: The Inverse Scattering Method, Consultants Bureau Press, New York, 1984.
[30] J. C. Pu, J. Li, Y. Chen, Soliton, breather and rogue wave solutions for solving the nonlinear Schrödinger equation using a deep learning method with physical constraints, Chin. Phys. B (2021) in press.
[31] J. Li, Y. Chen, A physics-constrained deep residual network for solving the sine-Gordon equation, Commun. Theor. Phys. 73 (2021) 015001.
[32] M. Dushkoff, R. Ptucha, Adaptive activation functions for deep networks, Electronic Imaging, Computational Imaging XIV (2016) pp. 1-5(5).
[33] B. Li, Y. B. Li, X. W. Rong, The extreme learning machine learning algorithm with tunable activation function, Neural Comput. Applic. 22 (2013) 531-539.
[34] S. Qian, H. Liu, C. Liu, S. Wu, H. S. Wong, Adaptive activation functions in convolutional neural networks, Neurocomputing 272 (2018) 204-212.
[35] A. D. Jagtap, K. Kawaguchi, G. E. Karniadakis, Adaptive activation functions accelerate convergence in deep and physics-informed neural networks, J. Comput. Phys. 404 (2020) 109136.
[36] A. D. Jagtap, K. Kawaguchi, G. E. Karniadakis, Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks, Proc. R. Soc. A 476 (2020) 20200334.
[37] J. Weiss, M. Tabor, G. Carnevale, The Painlevé property for partial differential equations, J. Math. Phys. 24 (1983) 522-526.
[38] P. D. Lax, Integrals of nonlinear equations of evolution and solitary waves, Comm. Pure Appl. Math. 21 (1968) 467-490.
[39] G. Z. Tu, On Liouville integrability of zero-curvature equations and the Yang hierarchy, J. Phys. A: Math. Gen. 22 (1989) 2375-2392.
[40] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proc. AISTATS (2010) pp. 249-256.
[41] K. M. He, X. Y. Zhang, S. Q. Ren, J. Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, ICCV (2015) pp. 1026-1034.
[42] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: a survey, J. Mach. Learning Research 18 (2018) 1-43.
[43] D. C. Liu, J. Nocedal, On the limited memory BFGS method for large scale optimization, Math. Program. 45 (1989) 503-528.
[44] M. Stein, Large sample properties of simulations using Latin hypercube sampling, Technometrics 29 (1987) 143-151.
[45] D. R. Solli, C. Ropers, P. Koonath, B. Jalali, Optical rogue waves, Nature 450 (2007) 1054-1057.
[46] Y. F. Yue, L. L. Huang, Y. Chen, Modulation instability, rogue waves and spectral analysis for the sixth-order nonlinear Schrödinger equation, Commun. Nonlinear Sci. Numer. Simulat. 89 (2020) 105284.
[47] G. Marcucci, D. Pierangeli, C. Conti, Theory of neuromorphic computing by waves: machine learning by rogue waves, dispersive shocks, and solitons, Phys. Rev. Lett. 125 (2020) 093901.
[48] M. M. Wang, Y. Chen, Dynamic behaviors of mixed localized solutions for the three-component coupled Fokas-Lenells system, Nonlinear Dyn. 98 (2019) 1781-1794.
[49] Z. Y. Yan, Vector financial rogue waves, Phys. Lett. A 375 (2011) 4274-4279.
[50] X. E. Zhang, Y. Chen, Inverse scattering transformation for generalized nonlinear Schrödinger equation, Appl. Math. Lett. 98 (2019) 306-313.
[51] S. W. Xu, J. S. He, L. H. Wang, The Darboux transformation of the derivative nonlinear Schrödinger equation, J. Phys. A: Math. Theor. 44 (2011) 305203.

(JP) School of Mathematical Sciences, Shanghai Key Laboratory of Pure Mathematics and Mathematical Practice, and Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200241, People's Republic of China
(JL) Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200062, People's Republic of China
(YC) School of Mathematical Sciences, Shanghai Key Laboratory of Pure Mathematics and Mathematical Practice, and Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200241, People's Republic of China
(YC) College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao 266590, People's Republic of China
(YC) Department of Physics, Zhejiang Normal University, Jinhua 321004, People's Republic of China