[PDF] DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm

Abstract

Data-driven methods have recently been developed to discover underlying partial differential equations (PDEs) of physical problems. However, for these methods, a complete candidate library of potential terms in a PDE are usually required. To overcome this limitation, we propose a novel framework combining deep learning and genetic algorithm, called DLGA-PDE, for discovering PDEs. In the proposed framework, a deep neural network that is trained with available data of a physical problem is utilized to generate meta-data and calculate derivatives, and the genetic algorithm is then employed to discover the underlying PDE. Owing to the merits of the genetic algorithm, such as mutation and crossover, DLGA-PDE can work with an incomplete candidate library. The proposed DLGA-PDE is tested for discovery of the Korteweg-de Vries (KdV) equation, the Burgers equation, the wave equation, and the Chaffee-Infante equation, respectively, for proof-of-concept. Satisfactory results are obtained without the need for a complete candidate library, even in the presence of noisy and limited data.

Full PDF

1 DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm

Running Title : Discovery of PDEs with incomplete candidate library

Hao Xu a , Haibin Chang a,* , and Dongxiao Zhang b,c,* a BIC-ESAT, ERE, and SKLTCS, College of Engineering, Peking University, Beijing 100871, P. R. China b School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, P. R. China c Intelligent Energy Lab, Frontier Research Center, Peng Cheng Laboratory, Shenzhen 518000, P. R. China * Corresponding author. E-mail address: [email protected] (H. Xu); [email protected] (H. Chang); [email protected] (D. Zhang).

Abstract : Data-driven methods have recently been developed to discover underlying partial differential equations (PDEs) of physical problems. However, for these methods, a complete candidate library of potential terms in a PDE are usually required. To overcome this limitation, we propose a novel framework combining deep learning and genetic algorithm, called DLGA-PDE, for discovering PDEs. In the proposed framework, a deep neural network that is trained with available data of a physical problem is utilized to generate meta-data and calculate derivatives, and the genetic algorithm is then employed to discover the underlying PDE. Owing to the merits of the genetic algorithm, such as mutation and crossover, DLGA-PDE can work with an incomplete candidate library. The proposed DLGA-PDE is tested for discovery of the Korteweg–de Vries (KdV) equation, the Burgers equation, the wave equation, and the Chaffee-Infante equation, respectively, for proof-of-concept. Satisfactory results are obtained without the need for a complete candidate library, even in the presence of noisy and limited data.

Keywords:

PDE discovery; incomplete candidate library; machine learning; deep neural network; genetic algorithm. Introduction

With the rapid development of data processing capabilities of computers, data-driven methods have been widely used in various fields. In recent years, some researches about data-driven discovery of partial differential equations (PDEs) have been performed. Among these works, sparse regression and neural network are two main techniques for carrying out PDE discovery. For data-driven discovery of PDEs using sparse regression, the sparse terms that constitute a PDE are selected from a pre-determined candidate library, which is a collection of potential terms. In existing works, different sparse regression techniques, such as sequential threshold least-squares, sequential threshold ridge regression (STRidge), least absolute shrinkage and selection operator (LASSO), sparse group lasso, and threshold sparse Bayesian regression are utilized to obtain a parsimonious model [2,4,12,13,14,18]. Although sparse regression methods can obtain a parsimonious model with high computational efficiency, these methods may not work well with noisy data, especially if the finite difference method is adopted for calculating the derivatives that are required in the candidate library. Besides the sparse regression technique, neural network is another useful technique that has been employed for data-driven discovery of PDEs. Raissi et al. [9] proposed the physics-informed neural network (PINN) to solve the forward and inverse problem of PDEs. In their work, the PDE terms are supposed to be known, and only the coefficients are learned from data. Long et al. [6] proposed a convolutional neural network-based framework, named PDE-NET, for discovery of PDEs. However, the results may lack parsimony. Xu et al. [17] proposed a deep-learning framework, called DL-PDE, which combines neural network methods and sparse regression methods for PDE discovery. In their work, the neural network is utilized to generate meta-data and calculate derivatives, and sparse regression is then utilized to discover the PDE. Qin et al. [8] and Wu et al. [16] used the residual network (ResNet) to discover underlying PDEs. Hasan et al. [5] introduced a regularization scheme into the loss function of the neural network to prevent overﬁtting and improve the accuracy of PDE discovery. Compared with numerical methods, neural network can provide a stable calculation of derivatives via automatic differentiation, which makes it more robust to data noise. However, many shortcomings remain for the neural network technique. For example, training the neural network may be time-consuming, and the proper structure of neural networks may be challenging to design.

For both sparse regression and neural network based PDE discovery methods, a pre-defined complete candidate library is usually necessary. This means that the terms that constitute the PDE to be discovered are supposed to be contained in the candidate library. If the candidate library is incomplete, the methods mentioned above will fail to discover the correct PDE. Constructing a complete candidate library is difficult for practical application of PDE discovery methods. A large candidate library is an option, but it cannot be guaranteed to be complete, and it will increase difficulty for sparse identification. Meanwhile, most previous works are constructed on the assumption that the left-hand side of the PDE is the first-order derivative with respect to time. If the left-hand side is not specified with the exact term, these methods may have difficulty in discovering the correct PDE. In order to solve these problems, some recent work combining data-driven methods and an evolutionary approach have been proposed. For example, Maslyaev et al. [7] proposed a data-driven algorithm based on evolutionary optimization to discover PDEs, and compared with STRidge, better performance of the proposed method was achieved. Atkinson et al. [1] presented a framework to discover free-form governing equations with genetic programming algorithms. Compared with traditional data-driven methods, the evolutionary approach can discover PDEs with an incomplete candidate library, which greatly increases its applicability for PDE discovery. However, some drawbacks still exist for evolutionary methods. First, in existing works, derivative calculations are performed using Gaussian process and finite difference methods, which cannot work well with noisy data. Second, evolutionary methods converge slowly and necessitate repeated trials to avoid local minima. For solving the above issues, in this work, we propose a novel framework for PDE discovery, called DLGA-PDE, which combines the neural network method and genetic algorithm. In our proposed method, a deep neural network is first trained with available data of a physical problem, then utilized to generate meta-data and calculate derivatives, and finally the genetic algorithm is employed to discover the underlying PDE. In the genetic algorithm of DLGA-PDE, a special encoding method is created to digitize the PDE, which is able to handle multiple options for the candidate terms, including the left-hand side (time derivatives). In addition, crossover and mutation are proposed to expand the searching scope of candidate equations, which enables DLGA-PDE to work with an incomplete candidate library. Meanwhile, l -regularization is utilized in the fitness function to accelerate the rate of calculation and convergence. All of the features mentioned above make DLGA-PDE more flexible and appropriate for dealing with complex problems. Numerical experiments demonstrate that DLGA-PDE works well with limited and noisy data, and converges quickly. The remainder of this paper is organized as follows. Section 2 describes the methodology of DLGA-PDE in detail. Case studies are presented in Section 3. Discussions and conclusions are given in Section 4. Methodology

PDE Discovery

In this work, an extensive form of PDE is investigated as:  )( uu T (1) with ,...],,...,,,,[)( xxxxxx uuuuuuuuu  (2) where T u denotes different orders of derivatives of u with respect to t . For example, T u can be t u or tt u . Different from existing works, with varied options for the temporal derivative, the PDE form shown in Eq. (1) can cover a larger variety of equations. )( u  denotes the candidate library of potential terms in the PDE; and  is the coefficient of corresponding terms. Given the observation data (or meta-data), which are denoted as { ( , )} Ni i i u x t  , we have the following:      ),(),( ),(),( ),(),(),( ),( ),( NNxNN xxNNTTT txuutxu txuutxu txuutxutxu txu txu (3) which can be rewritten as:  )( uU T (4) For PDE discovery, it aims to identify the temporal derivative (e.g., whether it is t u or tt u ) and the sparse vector  from the data. Architecture of DLGA-PDE

In this work, we propose a novel framework combining deep learning and genetic algorithm, called DLGA-PDE, for discovering PDEs. Here, we will introduce the architecture of DLGA-PDE. It consists of a neural network step and a genetic algorithm step. Using observation data, a deep neural network is first trained with available data to approximate the response of the considered physical problem. The trained neural network is then utilized to generate meta-data and calculate derivatives. Then, the genetic algorithm is employed to obtain the best model and corresponding coefficients with an incomplete candidate library. In the genetic algorithm step, the PDE is first digitized and encoded to form the corresponding genome. After the procedure of crossover and mutation, fitness of the new generations is calculated to select the best children. Then, the selected children will be the parents of a new generation. This process will continue until convergence. Fig. 1 shows the architecture of DLGA-PDE.

Fig. 1.

The workflow scheme of DLGA-PDE.

Neural network In the neural network step, a deep feed forward fully-connected artificial neural network (ANN), NN ( x , t ), is first constructed to approximate u ( x , t ). Compared with other types of neural networks, the feed forward fully-connected ANN is easier to train and can calculate derivatives relatively more precisely. The structure of a typical ANN is presented in Fig. 2. Fig. 2.

The structure of a typical feed forward fully-connected artificial neural network. i , j , and k are neurons, and w ij and w jk are weights between two neurons. After designing the neural network structure, training data are utilized to train the neural network by minimizing the loss function:    Ni iiii txNNtxuLoss )];,(),([)(  (5) where  refers to the parameters to be optimized in the neural network. Meanwhile, certain methods, such as early termination, are used to prevent overfitting. When the training process has completed, the trained neural network will be employed to generate meta-data and calculate derivatives. Here, the procedure of neural network training is only briefly introduced, additional details of which can be found in Xu et al. [17]. Genetic algorithm

In this work, a specific genetic algorithm is proposed for discovering PDEs, which will be described in detail below. (a)

Digitization and translation In order to transform the PDE into a digitized form, the PDE is digitized and encoded to form the corresponding genome.

Definition 2.1

Numbers are used to represent the corresponding order of derivatives. For example:  u ,  xu ,  xu ,  xu Definition 2.2

Each PDE term is considered as a module. Here, it is assumed that there is only multiplication in a module. In fact, most PDEs can be split into a series of multiplication and addition combinations. For example: ]1,0[  xuu , ]2,2,0[   xuu Definition 2.3

Combining these modules yields the genome of the PDE. Modules are connected by a plus sign. It is worth mentioning that there are two parts of the genome component of the PDE. One is the module group of the left-hand side, and the other is the module group of the right-hand side, which is placed in braces. Since only temporal derivatives are in the left-hand side, the term in the left-hand side is encoded in the same way as in Definition 2.1. To simplify the problem, we only consider the case in which the left-hand side has only one term, i.e., the left-hand side has a genome length of 1. In fact, it is easy to extend to a wider range of PDEs. For example, the genome of the contaminant transport equation is: ]}2[],1{[],1[  xxLxxt uDuvu

Similarly, the genome of the wave equation is: ]}2{[],2[  xxtt Auu

This special encoding method can handle different options for the left-hand side of PDEs, which increases its flexibility. Using the encoding method of PDEs, we can randomly generate a series of genomes in this format, and each genome corresponds to a unique PDE form. In the genetic algorithm, only the basic genes are utilized to generate the first generation. Moreover, the terms in the first generation may contain various combinations of different powers of u and different orders of derivatives of u with respect to x . (b) Definition of fitness In the genetic algorithm, fitness refers to the genome's superiority in the population survival measure, which is used to distinguish between good and bad genomes. Fitness is calculated using a fitness function.

Definition 2.4

In this algorithm, fitness is defined as: )( genomelenMSEFitness   (6) where the least square regression is performed on the PDE that is translated from its genome to calculate coefficients; MSE is the mean squared error of the least square regression;  is a hyperparameter; and len(genome) is the length of genome. Here, the lower is the fitness, the better is the model. It is worth mentioning that the fitness function here uses the least square method and l -regularization, and this sparsity constrained fitness function is not utilized in previous works using the genetic algorithm for PDE discovery [1,7]. Different from sparse regression, the genetic algorithm is not optimized by gradients [15]. In the genetic algorithm, children are produced by crossover and mutation, and are selected according to their fitness. Consequently, it is not optimized along the gradient direction, but rather it is optimized by finding suitable children to evolve. Without the difficulties that it would cause in gradient optimization methods, l -regularization can be utilized in the genetic algorithm to prevent over-fitting and ensure parsimony. In addition, due to the characteristics of the genetic algorithm, the generated genomes represent the corresponding PDEs, and a large candidate library, such as that in the sparse regression method, is not required. (c) Process of crossover For every genome, part of its modules can be replaced with another genome’s modules under a certain probability to generate the next generation. For example: ]}2,0[],1{[],1[:2]}2[],3,1{[],1[:1 ]}2,0[],3,1{[],1[:2]}2[],1{[],1[:1 '' GeneGene GeneGene crossover   

In this work, the crossover rate is 80%, which means that there is an 80% probability of crossover. The process of crossover allows the genes in the parent's genome to be inherited to the next generation, and increases the possibility of different gene combinations. (d)

Process of mutation Mutation refers to a mutation in some genes in the genome, resulting in a new genome. Here, three ways of mutation are defined.

Definition 2.5 (Order Mutation) : Under a certain probability, the order of derivatives in the gene will be reduced by 1. Particularly, 0 can be mutated to any random higher order. For example: ]}3[],2,0{[],1[]]}3[],2,1{[],1[   mutation ]}2[],3,1{[],1[]}0[],3,1{[],1[   mutation

Definition 2.6 (Add-module Mutation) : Under a certain probability, a random module is added to the genome. For example: ]}0,0[],3[],2,1{[],1[]}3[],2,1{[],1[   mutation Definition 2.7 (Delete-module Mutation) : Under a certain probability, a module is deleted from the genome. For example: ]}1,0[],4[],2,1{[],1[]}1,3[],1,0[],4[],2,1{[],1[   mutation

Different from previous works using the genetic algorithm for PDE discovery, here, a variety of mutation methods are set. Order Mutation can change the order of the derivatives in the genome. On the other hand, Add-module Mutation and Delete-module Mutation can alter the genome length. These three different mutation ways may also occur independently. The diversification of mutation methods enables the genome to possibly have diverse changes, which is helpful for identifying the best model. It is worth noting that there are three mutation ways for the right-hand side of the equation, but the left-hand side of the equation only has Order Mutation. This means that the left-hand side can only change the derivative order and its length remains the same, i.e., only one term. (e)

Process of selection and evolution In the process of crossover, each parent genome crossovers twice, producing twice as many genomes as the parent genome. For example, 50 parents will produce 100 children via crossover. Then, the children’s fitness will be calculated and sorted from small to large. Finally, the first half of children will be selected as a new generation of parents. We set the number of genomes and the number of generations. After certain epochs, the genome that occurs stably during evolution is the best model. Results

In this section, some classical PDEs will be utilized to test the performance of DLGA-PDE.

Problem Setting

In this part, we will briefly introduce the settings of the discussed PDEs, which are the Korteweg–de Vries (KdV) equation, the wave equation, the Chaffee-Infante equation, and the Burgers equation. Additional details about the generation of the dataset and meta-data are provided in Supplementary Information, Section S1.

KdV equation

The KdV equation takes the following form: xxxxt buuuu  (7) The conventional spectral method is utilized to generate the dataset [9]. In this example, we set b =0.0025. In the dataset, there are 201 temporal observation steps at intervals of 0.005, and 512 spatial observation steps at intervals of 0.0039. Therefore, the total number of data points is 102912. A five-layer deep neural network with 50 neurons per hidden layer is constructed. The activation function is sin( x ). 30000 data from the dataset are randomly selected to train the neural network. Then, the trained neural network is utilized to generate 200000 meta-data, and derivatives are calculated using automatic differentiation. The wave equation takes the form: xxtt

Auu  (8) The central difference method is utilized to generate the dataset. In this work, we set A =1. In the dataset, there are 321 temporal observation steps at intervals of 0.0196, and 161 spatial observation steps at intervals of 0.0196. Therefore, the total number of data points is 51681. A five-layer deep neural network with 50 neurons per hidden layer is constructed. The activation function is sin( x ). 10000 data are randomly selected to train the neural network. Then, the trained neural network is utilized to generate 160000 meta-data, and derivatives are calculated using automatic differentiation. The Burgers equation takes the following form: xxxt auuuu  (9) The conventional spectral method is utilized to generate the dataset. In this case, we set a =1. In the dataset, there are 201 temporal observation steps at intervals of 0.05, and 256 spatial observation steps at intervals of 0.0625. Therefore, the total number of data points is 51456. A nine-layer deep neural network with 20 neurons per hidden layer is constructed. The activation function is tanh( x ). 2000 data are randomly selected to train the neural network. Then, the trained neural network is utilized to generate 57600 meta-data, and derivatives are calculated using automatic differentiation. The Chaffee-Infante equation takes the form: )( uuuu xxt   (10) In Eq. (10),  is the diffusion coefficient. When  , Eq. (10) is also called the Whitehead equation. It is widely used in many fields, such as environmental science, fluid dynamics, high-energy physics, and electronic science. In this work, we set  . In this work, the forward difference method is utilized to generate the dataset. In the dataset, there are 200 temporal observation steps at intervals of 0.002, and 301 spatial observation steps at intervals of 0.01. Therefore, the total number of data points is 60200. A five-layer deep neural network with 50 neurons per hidden layer is constructed. The activation function is sin( x ). 10000 data are randomly selected to train the neural network. Then, the trained neural network is utilized to generate 160000 meta-data, and derivatives are calculated using automatic differentiation. For DLGA-PDE, a standard setting is first utilized, which can handle most PDE discovery problems. In the standard setting, basic genes are set to be )}3(),2(),1(),0()]{2(),1([ xxxxxxttt uuuuuu , and thus the first generation is randomly generated by a combination of these genes. Here, we consider up to the third-order derivative for the right-hand side terms, which means that the highest-order derivative produced by mutation is third-order in the right-hand side term. In fact, most PDEs are composed of derivatives of the third-order and below. Therefore, it is reasonable to only consider the third-order derivative for the standard setting. The size of the population is set to be 200, and the maximum number of generations is set to be 100. At first, the four PDEs mentioned above are utilized to test the performance of standard DLGA-PDE. The results are shown in Table 1. It can be seen that standard DLGA-PDE has successfully discovered true PDE terms, which shows that DLGA-PDE is able to handle various types of PDEs. At the same time, the estimations of the coefficients of the PDE terms are very accurate. It is worth mentioning that the candidate library utilized here is incomplete, but DLGA-PDE can still discover the correct PDE owing to the characteristics of the genetic algorithm.

Table 1.

Summary of the learned structure via standard DLGA-PDE for four examples.

In order to illustrate the procedure of DLGA-PDE more explicitly, Table 2 shows the best child found in each generation when DLGA-PDE is utilized to discover the Chaffee-Infante

Discovered Structure

True Structure KdV Equation xxxxt uuuu ,, xxxxt uuuu ,, Wave Equation xxtt uu , xxtt uu , Burgers Equation xxxt uuuu ,, xxxt uuuu ,, Chaffee-Infante Equation ,,, uuuu xxt ,,, uuuu xxt equation. Table 2.

Best child found in each generation when DLGA-PDE is utilized to discover the Chaffee-Infante equation.

Generation

Structure of Discovered Best Child xxxxxxxxt uuuuuuuu ,,,,, xxxxxxxxxt uuuuuuuuu ,,,,, xxxxxxxxxxt uuuuuu ,,,, uuuuuuuuu xxxxxxxxxxxxt ,,,,, ,,,, uuuuuu xxxxt ,,,, uuuuuu xxxt ,,,, uuuuuuu xxxxxxxt ,,, uuuu xxt ,,, uuuu xxt Due to the difficulty of this example, it took many generations to converge and discover the correct structure. For this case, in the first several generations (e.g., 1-5 th generation), the best child contained many terms and fell into a local minimum in the sixth generation. Furthermore, in the 39 th generation, it jumped out of the local minimum and continued to evolve until it found the best model. From this case, one can see that the process of mutation and crossover in DLGA-PDE can effectively prevent becoming stuck with the local minimum. It is worth noting that u has appeared in the sixth generation, which demonstrates that DLGA-PDE has the ability to produce more complex combinations, and it can search far beyond the incomplete candidate library compared with traditional methods. In order to further investigate the effect of mutation and combination in the genetic algorithm step of DLGA-PDE, the basic genes will be changed in the next few numerical experiments. The performance of DLGA-PDE will be assessed when some basic order of derivatives is not contained in the basic genes. In order to test the effect of mutation proposed in DLGA-PDE, DLGA-PDE is utilized to discover the wave equation and the KdV equation again in the absence of some order of derivatives in basic genes. Specially, the missing derivative terms in the basic genes are parts of the target PDE, and thus it is impossible to form the target PDE based on the provided basic genes without mutation. The KdV equation is first investigated, which has a third-order derivative term. Different from the previous standard settings, here the basic genes are changed to be )}2(),1(),0()]{2(),1([ xxxttt uuuuu . The third-order derivative term is not included in the basic genes. In order to increase the difficulty for obtaining the correct term (the third-order derivative term), we set that the highest-order derivative produced by mutation is fourth-order for the right-hand side terms. Similarly, the best child found in each generation is displayed in Table 3.

Table 3.

Best child found in each generation when DLGA-PDE is utilized to discover the KdV equation in the absence of a third-order derivative term in basic genes.

Generations

Discovered Structure uuuuuuu xxxxxxt ,,,, xxxxxt uuuuu ,,, xxxxt uuuu ,, xxxxt uuuu ,, From Table 3, it can be seen that the best child in the first generation does not have new higher-order derivatives. In the second generation, however, the third-order derivative is generated by mutation. The best child remains unchanged from the third generation to the last generation, which means that the algorithm has converged. Finally, the correct form of PDE is discovered in the absence of the third-order derivative term in basic genes. This shows that DLGA-PDE can effectively find higher-order derivative terms via mutation, even if they are not included in the basic genes. In addition, DLGA-PDE exhibits a fast convergence rate and good stability. The wave equation is then investigated, whose left-hand side term is u tt . Previous experiments have demonstrated that DLGA-PDE can discover the correct left-hand side term for the wave equation with standard settings. Here, the basic genes are set as )}3(),2(),1(),0()]{1([ xxxxxxt uuuuu . We can see that the left-hand side term only has u t in basic genes, which means that u tt can only be produced by mutation. Other settings are the same as in the previous case. The best child found in each generation is presented in Table 4. From Table 4, it can be seen that u tt is obtained in the best child in the second generation by mutation. Meanwhile, the correct PDE form is discovered very quickly. Table 4.

Best child found in each generation when DLGA-PDE is utilized to discover the wave equation in the absence of u tt in basic genes. Generations

Discovered Structure , uu t xxtt uu , xxtt uu , In the above, two cases with missing high-order derivative terms in basic genes are utilized to test the effect of mutation. In these situations, DLGA-PDE successfully produces these missing terms via mutation. The missing term can be a high-order derivative with respect to x or a high-order derivative with respect to t . It can be found that DLGA-PDE can still quickly converge because of the effect of mutation in these cases. This also demonstrates that DLGA-PDE can adapt to many special situations with low restriction of the candidate library. Next, a more difficult situation with missing terms in the basic genes is investigated. For obtaining the missing terms, both mutation and combination are required in the genetic algorithm step. Here, Burgers is investigated again. Different from standard settings, the basic genes are changed to be )}2(),0()]{2(),1([ xxttt uuuu , and up to the third-order derivatives are still considered here. Because u x is not included in basic genes, uu x must be produced by both mutation and combination for discovering the correct Burgers equation. The best child found in each generation is displayed in Table 5. From Table 5, one can see that DLGA-PDE converges very quickly and successfully finds the correct structure. From this case, it can be seen that DLGA-PDE can effectively discover terms that need both mutation and combination. Table 5.

Best child found in each generation when DLGA-PDE is utilized to discover the Burgers equation in the absence of a necessary term in basic genes.

Generations

Discovered Structure , uu t xxxt uuuu ,, xxxt uuuu ,, In order to further test the performance of DLGA-PDE, it is compared with DL-PDE, which utilizes the neural network to calculate derivatives and utilizes sparse regression to discover PDEs. Additional details about DL-PDE can be found in Xu et al. [17]. The major difference between DLGA-PDE and DL-PDE is that DL-PDE needs a complete candidate library, while DLGA-PDE can work with an incomplete candidate library. The Burgers equation and the Chaffee-Infante equation are utilized to test the performance of these two methods. For the Burgers equation, different levels of noise are added to data, and 1000 data (1.94% of total data) are randomly selected to train the neural network. In this work, noise is added as follows: )1(),(),( etxutxu   (11) where  denotes the noise level; and e is the uniform random variable, taking values from -1 to 1 [3]. In this example, the performance of the two methods will be studied on a small amount of data and different noise levels. In this case, DLGA-PDE uses the standard settings, as discussed previously. In addition, DL-PDE uses a candidate library as follows: ]...1[ xxxxxxxxxxxxx uuuuuuuuuuuuuu  (12) Then, DLGA-PDE and DL-PDE are utilized to discover the Burgers equation with different noise levels, and the results are provided in Table 6. From Table 6, it can be seen that both DLGA-PDE and DL-PDE perform well when data are limited and noisy, but DLGA-PDE is more stable when the noise level is 20%. Table 6.

For discovering the Burgers equation, the results of DL-PDE and DLGA-PDE with different noise levels.

Noise Level

Learned Equation (1000 Data) Learned Equation (DL-PDE)

Learned Equation (DLGA-PDE) Correct PDE xxxt uuuu  xxxt uuuu 

Clean Data xxxt uuuu  xxxt uuuu 

1% Noise xxxt uuuu  xxxt uuuu 

5% Noise xxxt uuuu  xxxt uuuu 

10% Noise xxxt uuuu  xxxt uuuu 

15% Noise xxxt uuuu  xxxt uuuu 

20% Noise uuuu xxxt  xxxt uuuu  Using the Chaffee-Infante equation, we will study the performance of the two methods on different amounts of data with other conditions being identical. Similarly, DLGA-PDE uses the standard settings, while DL-PDE uses a candidate library with 16 terms: ]...1[ xxxxxxxxxx uuuuuuuuuuuuuu  (13) Then, DLGA-PDE and DL-PDE are utilized to discover the Chaffee-Infante equation with different amounts of data, the results of which are presented in Table 7. From Table 7, it can be seen that DLGA-PDE performs well with limited data, while DL-PDE fails in all experiments, which indicates that DLGA-PDE is much more stable than DL-PDE in this case. From the two experiments above, it can be found that the accuracy of DL-PDE is affected by the size of the candidate library. For the Burgers equation, only 12 terms are included in the candidate library. On the other hand, for the Chaffee-Infante equation, due to the source and sink term, which contains u and u , many terms must be included in the candidate library. Consequently, the candidate library for discovering the Chaffee-Infante equation is larger than that for the Burgers equation, which may be the reason for the failure to discover the correct Chaffee-Infante equation. For a larger candidate library, greater sparsity will be required when performing the sparse regression that is utilized in DL-PDE, which may increase the difficulty. In contrast, DLGA-PDE can discover PDEs with an incomplete candidate library, and the PDE terms to be discovered can be produced by mutation and crossover of genomes. In this way, the problems with the candidate library that occur in DL-PDE can be avoided in DLGA-PDE. By comparison with DL-PDE, the advantages of DLGA-PDE can be clearly discerned.

Table 7.

For discovering the Chaffee-Infante equation, the results of DL-PDE and DLGA-PDE with different data volume.

Using the Burgers equation and the Chaffee-Infante equation, it can be found that DLGA-PDE performs well when the data are limited and noisy. In order to further demonstrate the robustness of DLGA-PDE to limited and noisy data, DLGA-PDE is utilized to discover more PDEs with limited and noisy data, additional details of which are provided in the Supplementary Information, Section S2. It is shown that DLGA-PDE can find the correct PDE even if the noise level is 15% and performs well with little data, which means that DLGA-PDE is able to handle a large variety of PDEs with noisy and limited data.

Volume of Data

Learned Equation (Clean Data) Learned Equation (DL-PDE)

Learned Equation (DLGA-PDE)

Correct PDE uuuu xxt  uuuu xxt  uuu xxt  uuuu xxt  uu t  uuuu xxt  xxxxt uuuuuu  uuuu xxt 

500 Data (0.83% of Total) uuuu xxt  uuuu xxt  Conclusion and Discussion

In this work, we propose a novel framework, called DLGA-PDE, for discovering underlying PDEs with an incomplete candidate library, which combines the neural network method and the genetic algorithm. In the proposed method, a new way to digitize and code the PDE is proposed, and special principles of mutation are defined, which makes it able to quickly converge and avoid falling into local minima. Compared with ordinary genetic methods, DLGA-PDE is robust to limited and noisy data. The most significant advantage of DLGA-PDE is that it can discover PDEs with an incomplete candidate library. Through crossover and mutation, infinite possible genomes may be created, which greatly expands the search scope of DLGA-PDE and makes it more flexible for PDE discovery. Owing to the merit of its unique encoding method, DLGA-PDE can find the correct PDE when the left-hand side term of PDE has multiple options (e.g., u t , u tt , and so on), which may not be handled well in traditional methods. In DLGA-PDE, l -regularization is adopted to construct the fitness function, and by using the l -regularization, it increases the stability and efficiency. In this work, different numerical experiments are investigated, and the results demonstrate that DLGA-PDE performs well for a wide range of PDEs with standard settings. The effect of mutation and combination of DLGA-PDE is examined by setting different basic genes, and the results show that DLGA-PDE can find the correct PDE term via mutation and combination of genomes, even if some basic order of derivatives are not included in the basic genes. This demonstrates that DLGA-PDE can work well when scarce information is known about the PDE structure. By recording the evolution process, one can see that DLGA-PDE converges quickly in most cases. Meanwhile, DLGA-PDE is robust to limited and noisy data. This is because derivatives are calculated by automatic differentiation of the trained neural network with a large number of meta-data. By comparing the performance of DL-PDE and DLGA-PDE, the results of these two methods do not show much difference when the candidate library of DL-PDE is relatively small. However, for the Chaffee-Infante equation, the DL-PDE method fails to find the correct PDE form because a large candidate library is utilized to account for more complex terms. In contrast, DLGA-PDE works well and finds the correct terms stably for discovering the Chaffee-Infante equation. This demonstrates that DLGA-PDE has better stability and wider applicability in practical applications. Some limitations and shortcomings still exist for DLGA-PDE. The hyper-parameter of l -regularization is important for discovering the correct PDE form. However, it is currently set by experience, and a sophisticated method for adjusting the hyper-parameter is needed. In addition, DLGA-PDE can only be utilized under the condition that coefficients of the PDE are constant in space and time. It cannot now deal with certain other problems, such as piecewise-constant coefficients [3] or smoothly varying coefficients [11]. Further works regarding this issue are necessary. Acknowledgements

This work is partially funded by the National Natural Science Foundation of China under Grants 51520105005 and U1663208, and the National Science and Technology Major Project of China (Grant No. 2017ZX05009-005 and 2017ZX05049-003).

References [1]

Atkinson, S., Subber, W., Wang, L., Khan, G., Hawi, P., & Ghanem, R. (2019).

Data-driven discovery of free-form governing differential equations . 1–6. http://arxiv.org/abs/1910.05117 [2]

Brunton, S. L., Proctor, J. L., Kutz, J. N., & Bialek, W. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems.

Proceedings of the National Academy of Sciences of the United States of America , (15), 3932–3937. https://doi.org/10.1073/pnas.1517384113 [3] Chang, H., & Zhang, D. (2019a). Identification of physical processes via combined data-driven and data-assimilation methods.

Journal of Computational Physics , , 337–350. https://doi.org/10.1016/j.jcp.2019.05.008 [4] Chang, H., & Zhang, D. (2019b). Machine learning subsurface flow equations from data.

Computational Geosciences , (5), 895–910. https://doi.org/10.1007/s10596-019-09847-2 [5] Hasan, A., Pereira, J. M., Ravier, R., Farsiu, S., & Tarokh, V. (2019).

Learning Partial Differential Equations from Data Using Neural Networks . http://arxiv.org/abs/1910.10262 [6]

Long, Z., Lu, Y., & Dong, B. (2019). PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network.

Journal of Computational Physics , . https://doi.org/10.1016/j.jcp.2019.108925 [7] Maslyaev, M., Hvatov, A., & Kalyuzhnaya, A. (2019).

Data-driven PDE discovery with evolutionary approach . https://doi.org/10.1007/978-3-030-22750-0_61 [8]

Qin, T., Wu, K., & Xiu, D. (2019). Data driven governing equations approximation using deep neural networks.

Journal of Computational Physics , , 620–635. https://doi.org/10.1016/j.jcp.2019.06.042 [9] Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.

Journal of Computational Physics , , 686–707. https://doi.org/10.1016/j.jcp.2018.10.045 [10] Raissi, Maziar. (2018). Deep hidden physics models: Deep learning of nonlinear partial differential equations.

Journal of Machine Learning Research , , 1–24. [11] Rudy, S., Alla, A., Brunton, S. L., & Kutz, J. N. (2019). Data-driven identification of parametric partial differential equations.

SIAM Journal on Applied Dynamical Systems , (2), 643–660. https://doi.org/10.1137/18M1191944 [12] Rudy, S. H., Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2017). Data-driven discovery of partial differential equations.

Science Advances , (4), 1–7. https://doi.org/10.1126/sciadv.1602614 [13] Schaeffer, H. (2017). Learning partial differential equations via data discovery and sparse optimization.

Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences , (2197). https://doi.org/10.1098/rspa.2016.0446 [14] Schaeffer, H., & McCalla, S. G. (2017). Sparse model selection via integral terms.

Physical Review E , (2), 1–7. https://doi.org/10.1103/PhysRevE.96.023302 [15] Whitley, D. (1994). A genetic algorithm tutorial.

Statistics and Computing , (2), 65–85. https://doi.org/10.1007/BF00175354 [16] Wu, K., & Xiu, D. (2019).

Data-Driven Deep Learning of Partial Differential Equations in Modal Space . http://arxiv.org/abs/1910.06948 [17]

Xu, H., Chang, H., & Zhang, D. (2019).

DL-PDE: Deep-learning based data-driven discovery of partial differential equations from discrete and noisy data . , 1–24. http://arxiv.org/abs/1908.04463 [18] Zhang, S., & Lin, G. (2018). Robust data-driven discovery of governing physical laws with error bars.

Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences , (2217). https://doi.org/10.1098/rspa.2018.0305 Supplementary Information

DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm

E-mail address: [email protected] (H. Xu); [email protected] (H. Chang); [email protected] (D. Zhang).

Section S1. Additional details about the discussed PDEs

The Korteweg–de Vries (KdV) equation is a partial differential equation describing one-way motion of shallow water waves. It was discovered by Korteweg and de Vries when investigating small-amplitude and long-wave motion in shallow water. Its equation takes the form: xxxxt buuuu  (S1) where b is a constant. To obtain a dataset, the conventional spectral method is utilized to solve the KdV equation. We start with an initial condition being )cos(),0( xxu  , ]1,1[  x , and periodic boundary conditions. Eq. (S1) is integrated from the starting time t =0 to the final time t =1, which is done by utilizing the Chebfun package with a spectral Fourier discretization with 512 modes and a fourth-order explicit Runge-Kutta temporal integrator with time-step size being [9]. The solution is recorded every  t to obtain 201 observation steps in time. Therefore, we have x n =512, t n =201, and d N =102912. To generate meta-data, we take spatial observation steps with  x in the domain [ 0.5, 0.5) x   , and take temporal observation steps with  t in the domain )1,0[  t . Thus, for meta-data, we have x n  =1000, t n  =200, and d N  =200000. The wave equation is an important partial differential equation, which mainly describes various wave phenomena in nature, including transverse and longitudinal waves, such as acoustic, light, and water waves. It arises in numerous fields, such as acoustics, electromagnetics, and fluid mechanics. Its equation takes the following form: xxtt

Auu  (S2) To obtain a dataset, the central differential method is utilized to solve the wave equation. Three-point central difference is used to approximate the second-order derivative, and this equation is discretized as: x uuuAt uuu jijijijijiji     (S3) where Mxx f  ; NTt  ; f x is the right end of the spatial domain; T is the right end of the temporal domain; M is the number of nodes in the x direction; and N is the number of nodes in the t direction. By defining xtAr  , Eq. (S3) is simplified as: )22()(   jijijijiji uuruuru (S4) In this case, initial conditions are as follows:      x xxxu ,  xtu (S5) Meanwhile, boundary conditions are set to be:  ttutu . Other parameters are set as: A =1;  f x ; M=160;   T ; and N =320. Matlab is utilized to solve the problems for obtaining the dataset. There are 321 temporal observation steps at intervals of 0.0196 and 161 spatial observation steps at intervals of 0.0196, and thus the total number of data points is 51681. To generate meta-data, we take 400 spatial observation steps uniformly in the domain ]2,0[  x , and take 400 temporal observation steps uniformly in the domain [0, 5] t  . Therefore, for meta-data, we have x n  =400, t n  =400, and d N  =160000.

1. 3 Burgers equation

The Burgers equation is a nonlinear partial differential equation that simulates the propagation and reflection of shock waves. The Burgers equation is the basic partial differential equation for various fields of applied mathematics, such as fluid mechanics, nonlinear acoustics, and gas dynamics. One-dimensional Burgers equation can take the following form: t x xx u uu au    (S6) where a is the diffusion coefficient, which is set to be 0.1 in this example. To obtain a dataset, the conventional spectral method is utilized to solve the Burgers equation with an initial condition being )8/sin(),0( xxu  , ]8,8[  x , and periodic boundary conditions. Eq. (S6) is integrated from the starting time t =0 to the final time t =10, which is accomplished by using the Chebfun package with a spectral Fourier discretization with 256 modes and a fourth-order explicit Runge-Kutta temporal integrator with time-step size being [10]. The solutions are recorded every  t to obtain 201 observation steps in time. Thus, we have x n =256, t n =201, and d N =51456. To generate meta-data, we take spatial observation steps with  x in the domain [ 8,8) x   , and take temporal observation steps with  t in the domain )9,0[  t . Therefore, for these meta-data, we have x n  =320, t n  =180, and d N  =57600. The Chaffee-Infante equation is an important nonlinear partial differential equation, which takes the form: )( uuuu xxt   (S7) In Eq. (S7),  is the diffusion coefficient. When  , Eq. (S7) is also called the Whitehead equation. It is widely used in numerous fields, such as environmental science, fluid dynamics, high-energy physics, and electronic science. In this work, we set  . To obtain a dataset, the explicit forward Euler method is utilized to solve the wave equation. Thus, it can be discretized as follows: )(2 ,3,2 ,1,,1,1, jijijijijijiji uux uuut uu     (S8) where Mxx f  ; NTt  ; f x is the right end of the spatial domain; T is the right end of the temporal domain; M is the number of nodes in the x direction; and N is the number of nodes in the t direction. By defining xtr  , Eq. (S8) can be simplified as: tuuuururu jijijijijiji   )()()21( ,,,1,1,1,  (S9) In this case, initial conditions are set to be:  xxxxu . Meanwhile, boundary conditions are set to be:  ttutu . Other parameters are set as:  =1;  f x ; M =300;  T ; and N =80000. Matlab is utilized to solve this problem. To simplify the dataset, one temporal observation step is taken from every 320 numerical time steps from t =0.1 to t =0.5, and thus there are a total of 200 temporal observation steps. In addition, there are 301 spatial observation steps at intervals of 0.01. Therefore, the total number of data points is 60200. To generate meta-data, we take 400 spatial observation steps uniformly in the domain ]2,3.0[  x , and take 400 temporal observation steps uniformly in the domain [0.2, 0.4] t  . Thus, for meta-data, we have x n  =400, t n  =400, and d N  =160000. Section S2. Results of DLGA-PDE with limited and noisy data

For the KdV equation, different levels of noise are added to the data, and 30000 data are randomly selected to train the neural network. DLGA-PDE uses the standard setting, and the results are shown in Table S1. From Table S1, it can be seen that DLGA-PDE performs well when data are noisy.

Table S1.

Summary of the learned equation for discovering the KdV equation using DLGA-PDE with noisy data. Noise Level

Learned Equation

Correct PDE xxxxt uuuu 

Clean Data xxxxt uuuu 

1% Noise xxxxt uuuu 

5% Noise xxxxt uuuu 

10% Noise xxxxt uuuu 

15% Noise xxxxt uuuu 

For the wave equation, different amounts of data are utilized to train the neural network. DLGA-PDE uses the standard setting, and the results are presented in Table S2.

Table S2.

Summary of the learned equation for discovering the wave equation using DLGA-PDE with different amounts of data training the neural network.

Volume of Data

Learned Equation

Correct PDE xxtt uu  xxtt uu  xxtt uu  xxtt uu 

500 Data (0.967% of Total) xxtt uu  Next, different levels of noise are added to the original data, and 2000 data are selected to train the neural network. The results are shown in Table S3. From Table S2 and Table S3, it can be seen that DLGA-PDE performs well with limited and noisy data.

Table S3.

Summary of the learned equation for discovering the wave equation using DLGA-PDE with noisy data.

Noise Level

Learned Equation

Correct PDE xxtt uu  Clean Data xxtt uu 

1% Noise xxtt uu 

5% Noise xxtt uu 

10% Noise xxtt uu 

15% Noise xxtt uu 

20% Noise uu tt  For the Chaffee-Infante equation, different levels of noise are added to the data, and 10000 data are randomly selected to train the neural network. DLGA-PDE uses the standard setting, and the results are presented in Table S4. From Table S4, one can see that DLGA-PDE performs well when the data are noisy.

300 Data (0.580% of Total) xxtt uu 

100 Data (0.193% of Total) xxxxtt uuu  Table S4.

Summary of the learned equation for discovering the Chaffee-Infante equation using DLGA-PDE with noisy data.

Noise Level

Learned Equation

Correct PDE uuuu xxt  Clean Data uuuu xxt 

1% Noise uuuu xxt 

5% Noise uuuu xxt 

10% Noise uuuu xxt  uuuu xxt 