PINNeik: Eikonal solution using physics-informed neural networks
Umair bin Waheed, Ehsan Haghighat, Tariq Alkhalifah, Chao Song, Qi Hao
Umair bin Waheed
Department of Geosciences, King Fahd University of Petroleum and Minerals

Ehsan Haghighat
Department of Civil Engineering, Massachusetts Institute of Technology

Tariq Alkhalifah
Physical Sciences and Engineering Division, King Abdullah University of Science and Technology

Chao Song
Physical Sciences and Engineering Division, King Abdullah University of Science and Technology

Qi Hao
Center for Integrative Petroleum Research, King Fahd University of Petroleum and Minerals

July 17, 2020

Abstract
The eikonal equation is utilized across a wide spectrum of science and engineering disciplines. In seismology, it regulates seismic wave traveltimes needed for applications like source localization, imaging, and inversion. Several numerical algorithms have been developed over the years to solve the eikonal equation. However, they suffer from a computational bottleneck when repeated computations are needed for perturbations in the velocity model and/or the source location, particularly in large 3D models. Here, we employ the emerging paradigm of physics-informed neural networks (PINNs) to solve the eikonal equation. By minimizing a loss function formed by imposing the validity of the eikonal equation, we train a neural network to produce traveltimes that are consistent with the underlying partial differential equation. More specifically, to tackle the point-source singularity, we use the factored eikonal equation. We observe sufficiently high traveltime accuracy for most applications of interest. We also demonstrate how machine learning techniques like transfer learning and surrogate modeling can be used to massively speed up traveltime computations for updated velocity models and source locations. These properties of the PINN eikonal solver are highly desirable in obtaining an efficient forward modeling engine for seismic inversion applications.

Keywords: Eikonal equation · Physics-informed neural networks · Traveltimes · Forward modeling
Introduction

The eikonal (from the Greek word εικων, meaning image) equation is a first-order non-linear partial differential equation (PDE) encountered in the wave propagation and geometric optics literature. It was first derived by Sir William Rowan Hamilton in the year 1831 [1]. The eikonal equation finds its roots in both wave propagation theory and geometric optics. In wave propagation, it can be derived from the first term of the Wentzel-Kramers-Brillouin (WKB) expansion of the wave equation [2], whereas in geometric optics, it can be derived using Huygens' principle [3].

Despite its origins in optics, the eikonal equation finds applications in many science and engineering problems. To name a few, in image processing, it is used for computing distance fields from one or more points [4], inferring 3D surface shapes from intensity values in 2D images [5], image denoising [6], segmentation [7], and registration [8]. In robotics, the eikonal equation is extensively used for optimal path planning and navigation, e.g., for domestic robots [9], autonomous underwater vehicles [10], and Mars rovers [11]. In computer graphics, the eikonal equation is used to compute geodesic distances for extracting shortest paths on discrete and parametric surfaces [12, 13]. In semiconductor manufacturing, the eikonal equation is used for etching, deposition, and lithography simulations [14, 15]. Furthermore, and of primary interest to us, the eikonal equation is routinely employed in seismology to compute traveltime fields needed for many applications, including statics and moveout correction [16], traveltime tomography [17], microseismic source localization [18], and Kirchhoff migration [19].

The fast marching method (FMM) and the fast sweeping method (FSM) are the two most commonly used algorithms for solving the eikonal equation. FMM belongs to the family of algorithms also referred to as single-pass methods. The first such algorithm is attributed to John Tsitsiklis [20], who used a control-theoretic discretization of the eikonal equation and emulated a Dijkstra-like shortest path algorithm. A few months later, a finite-difference approach, also based on Dijkstra-like ordering and updating, was developed [21]. The FMM combines entropy-satisfying upwind schemes for gradient approximations with a fast sorting mechanism to solve the eikonal equation in a single pass. The FSM, on the other hand, is a multi-pass algorithm that combines Gauss-Seidel iterations with alternating sweeping orderings to solve the eikonal equation [22]. The idea behind the algorithm is that the characteristics of the eikonal equation can be divided into a finite number of pieces, and information propagating along each piece can be accounted for by one of the sweeping directions. Therefore, FSM converges in a finite number of iterations, irrespective of the grid size.

Both FMM and FSM were initially proposed to solve the eikonal equation on rectangular grids. However, many different approaches have since been proposed, extending them to other discretizations and formulations. A detailed analysis and comparison of these fast methods can be found in [23].

On a different front, deep learning is fast emerging as a potentially disruptive tool to tackle longstanding research problems across science and engineering disciplines [24]. Recent advances in the field of scientific machine learning have demonstrated the largely untapped potential of deep learning for applications in scientific computing. The idea of using neural networks to solve PDEs has been around since the 1990s [25, 26]. However, recent advances in the theory of deep learning, coupled with a massive increase in computational power and efficient graph-based implementations of new algorithms and automatic differentiation [27], have led to a resurgence of interest in using neural networks to approximate the solution of PDEs.

This resurgence is confirmed by the advances made in the recent literature on scientific computing. For example, [28] used a deep neural network (DNN) for modeling turbulence in fluid dynamics, while [29] proposed a deep learning algorithm to solve the nonlinear Black-Scholes equation, the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation. Similarly, [30] developed a mesh-free algorithm based on deep learning for efficiently solving high-dimensional PDEs. In addition, [31] used a convolutional neural network to speed up the solution of the sparse linear system required to obtain a numerical solution of the Navier-Stokes equation.

Recently, Raissi et al.
[32] developed a deep learning framework for the solution and discovery of PDEs. The so-called physics-informed neural network (PINN) leverages the capabilities of DNNs as universal function approximators. In contrast with conventional deep learning approaches, PINNs restrict the space of admissible solutions by enforcing the validity of the underlying PDE governing the actual physics of the problem. This is achieved using a simple feed-forward network leveraging automatic differentiation (AD), also known as algorithmic differentiation. PINNs have already demonstrated success in solving a wide range of non-linear PDEs, including the Burgers, Schrödinger, Navier-Stokes, and Allen-Cahn equations.

In this paper, we develop a PINN-based algorithm for solving the eikonal equation. Based on a loss function defined by the underlying PDE, we train a DNN to yield the solution of the eikonal equation. To mitigate the point-source singularity, we use the factored eikonal equation. Through tests on benchmark synthetic models, we study the accuracy properties of the proposed solver. We also explore how machine learning techniques like transfer learning and surrogate modeling can be used to massively speed up repeated traveltime computations with updated velocity models and/or source locations. These properties are highly desirable in reducing the computational bottleneck for applications in seismic inversion.

The rest of the paper is organized as follows. We begin by describing the theoretical underpinnings of the algorithm. After presenting the eikonal equation and deep feed-forward neural networks, we briefly discuss important concepts, including the approximation property of neural networks and automatic differentiation. Tying these concepts together, we then show how PINNs can be used to solve the eikonal equation. Next, we present numerical tests probing the accuracy of the proposed framework on synthetic velocity models. We also explore the applicability of transfer learning and surrogate modeling to efficiently solve the eikonal equation. We then discuss the strengths and limitations of the approach, including the implications of this work for the field of numerical eikonal solvers. This is followed by some concluding remarks.
Theory

Figure 1: Schematic representation of a feed-forward neural network with L - 1 hidden layers.

In this section, we first introduce the eikonal equation and the factorization idea. This is followed by a brief overview of deep neural networks and their capabilities as function approximators. Next, we briefly explain the concept of automatic differentiation. Finally, putting these pieces together, we present the proposed algorithm for solving the eikonal equation.
The eikonal equation is a non-linear, first-order, hyperbolic PDE of the form:

$$|\nabla T(\mathbf{x})| = \frac{1}{v(\mathbf{x})}, \quad \forall\, \mathbf{x} \in \Omega, \qquad T(\mathbf{x}_s) = 0, \qquad (1)$$

where $\Omega$ is a domain in $\mathbb{R}^d$ with $d$ as the space dimension, $T(\mathbf{x})$ is the traveltime or Euclidean distance to any point $\mathbf{x}$ from the point-source $\mathbf{x}_s$, $v(\mathbf{x})$ is the velocity defined on $\Omega$, and $\nabla$ denotes the spatial differential operator. Equation (1) simply means that the gradient of the arrival-time surface is inversely proportional to the speed of the wavefront. This is also commonly known as the isotropic eikonal equation, as the velocity is not a function of the wave propagation direction $(\nabla T / |\nabla T|)$.

To avoid the singularity due to the point-source [33], we consider the factored eikonal equation [34]. The factorization approach relies on factoring the unknown traveltime $T(\mathbf{x})$ into two functions. One of the functions is specified analytically, such that the other function is smooth in the source neighborhood. Specifically, we consider multiplicative factorization, i.e.,

$$T(\mathbf{x}) = T_0(\mathbf{x})\, \tau(\mathbf{x}), \qquad (2)$$

where $T_0(\mathbf{x})$ is known and $\tau(\mathbf{x})$ is the unknown function. Plugging equation (2) into equation (1), we get the factored eikonal equation:

$$T_0^2\, |\nabla \tau|^2 + \tau^2\, |\nabla T_0|^2 + 2\, T_0\, \tau\, (\nabla T_0 \cdot \nabla \tau) = \frac{1}{v^2(\mathbf{x})}, \qquad \tau(\mathbf{x}_s) = 1. \qquad (3)$$

The known factor $T_0$ is computed analytically using the expression:

$$T_0(\mathbf{x}) = \frac{|\mathbf{x} - \mathbf{x}_s|}{v(\mathbf{x}_s)}, \qquad (4)$$

where $v(\mathbf{x}_s)$ is the velocity at the source location.
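As a concrete illustration of equation (4), the following NumPy sketch (ours, not part of the original formulation; the function name and grid parameters are hypothetical) evaluates the known factor $T_0$ on a grid of receiver locations:

```python
import numpy as np

def known_factor(x, z, xs, zs, v_source):
    """Known traveltime factor T0(x) = |x - x_s| / v(x_s), equation (4).

    x, z     : arrays of receiver coordinates (km)
    xs, zs   : point-source coordinates (km)
    v_source : velocity at the source location (km/s)
    """
    return np.sqrt((x - xs) ** 2 + (z - zs) ** 2) / v_source

# Example: T0 over a 2 x 2 km grid for a source at (1 km, 1 km), v(x_s) = 2.5 km/s
x, z = np.meshgrid(np.linspace(0, 2, 101), np.linspace(0, 2, 101))
T0 = known_factor(x, z, 1.0, 1.0, 2.5)
```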
A feed-forward neural network is a set of neurons organized in layers in which evaluations are performed sequentially through the layers. It can be seen as a computational graph having an input layer, an output layer, and an arbitrary number of hidden layers. In a fully connected neural network, neurons in adjacent layers are connected to each other, but neurons within a single layer share no connections.

Thanks to the universal approximation theorem, a neural network with $n$ neurons in the input layer and $m$ neurons in the output layer can be used to represent a function $u: \mathbb{R}^n \rightarrow \mathbb{R}^m$ [35], as shown in Figure 1. For illustration, we consider a network of $L + 1$ layers, starting with input layer 0, the output layer $L$, and $L - 1$ hidden layers. The number of neurons in each layer is denoted as $k_0 = n, k_1, \cdots, k_L = m$. Each connection between the $i$-th neuron in layer $l - 1$ and the $j$-th neuron in layer $l$ has a weight $w_{ji}^l$ associated with it. Moreover, for each neuron in layer $l$, we have an associated bias term $b_i^l$, $i = 1, \cdots, k_l$. Each neuron represents a mathematical operation, whereby it takes a weighted sum of its inputs plus a bias term and passes the result through an activation function. The output from the $k$-th neuron in layer $l$ is given as [36]:

$$u_k^l = \sigma \left( \sum_{j=1}^{k_{l-1}} w_{kj}^l\, u_j^{l-1} + b_k^l \right), \qquad (5)$$

where $\sigma()$ represents the activation function. Commonly used activation functions are the logistic sigmoid, the hyperbolic tangent, and the rectified linear unit [37]. By dropping the subscripts, we can write equation (5) compactly in vectorial form:

$$\mathbf{u}^l = \sigma \left( \mathbf{W}^l\, \mathbf{u}^{l-1} + \mathbf{b}^l \right), \qquad (6)$$

where $\mathbf{W}^l$ is the matrix of weights corresponding to connections between layers $l - 1$ and $l$, $\mathbf{u}^l$ and $\mathbf{b}^l$ are vectors given by $u_k^l$ and $b_k^l$, respectively, and the activation function is applied element-wise. Computational frameworks, such as TensorFlow [38], can be used to evaluate data flow graphs like the one given in equation (6) efficiently using parallel execution. The input values can be defined as tensors (multi-dimensional arrays), and the computation of the outputs is vectorized and distributed across the available computational resources.
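To make equations (5) and (6) concrete, here is a minimal NumPy sketch (our illustration, with hypothetical names and layer sizes) of a fully connected forward pass; the arctan hidden activation and linear output layer match the configuration used in the numerical tests below:

```python
import numpy as np

def forward(x, weights, biases):
    """Evaluate u^l = sigma(W^l u^{l-1} + b^l) layer by layer, equation (6).

    weights, biases : lists of per-layer parameter arrays
    """
    u = x
    for W, b in zip(weights[:-1], biases[:-1]):
        u = np.arctan(W @ u + b)          # hidden layers: arctan activation
    return weights[-1] @ u + biases[-1]   # output layer: linear

# Example: a network with 2 inputs, two hidden layers of 20 neurons, 1 output
rng = np.random.default_rng(0)
sizes = [2, 20, 20, 1]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
print(forward(np.array([0.5, 1.0]), weights, biases))
```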
Neural networks are well known for their strong representational power. It has been shown that a neural network with a single hidden layer and a finite number of neurons can represent any bounded continuous function to any desired accuracy. This is known as the universal approximation theorem [39, 35]. It was later shown that by using a non-linear activation function and a deep network, the total number of neurons needed can be significantly reduced [40]. Therefore, we seek a trained deep neural network (DNN) that can represent the mapping between the input $\mathbf{x}$ and the output $\tau(\mathbf{x})$ of the factored eikonal equation for a given velocity model $v(\mathbf{x})$.

It is worth noting that while neural networks are, in theory, capable of representing very complex functions compactly, finding the actual parameters (weights and biases) needed to solve a given PDE can be quite challenging.

Solving a PDE using PINNs requires derivatives of the network's output with respect to its inputs. There are four possible ways to compute derivatives [27, 41]: (1) hand-coded analytical derivatives, (2) symbolic differentiation, (3) numerical approximation such as finite-differences, and (4) automatic differentiation (AD).

Manually worked-out derivatives may be exact, but they are not automated, and thus impractical. Symbolic differentiation is also exact, but it is memory intensive and prohibitively slow, as one can end up with exponentially large expressions to evaluate. While numerical differentiation is easy to implement, it can be highly inaccurate due to round-off errors. On the contrary, AD uses exact expressions with floating-point values instead of symbolic strings, and it involves no approximation error, resulting in accurate evaluation of derivatives at machine precision. However, an efficient implementation of AD can be non-trivial. Fortunately, many existing computational frameworks, such as TensorFlow [38] and PyTorch [42], provide efficiently implemented AD libraries. In fact, in deep learning, backpropagation [43], a special case of reverse-mode AD, has been the mainstay for training neural networks.

To understand how AD works, consider a simple fully connected neural network with two inputs $(x_1, x_2)$, one output $y$, and one neuron in the hidden layer. Let us assume the network's weights and biases are assigned such that:

$$\nu = 2 x_1 + 3 x_2 - 1, \qquad h = \sigma(\nu) = \frac{1}{1 + e^{-\nu}}, \qquad y = 5 h + 2, \qquad (7)$$

where $h$ represents the output from the neuron in the hidden layer, computed by applying the sigmoid function $\sigma$ to the weighted sum of the inputs $\nu$.

To illustrate the idea, let us say we are interested in computing the partial derivatives $\partial y / \partial x_1$ and $\partial y / \partial x_2$ at $(x_1, x_2) = (1, -1)$. AD requires one forward pass and one backward pass through the network to compute these derivatives, as detailed in Table 1.

To compute higher-order derivatives, AD can be applied recursively through the network in the same manner. For a deeper understanding of AD, we refer the interested reader to [44].
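For comparison with the hand-worked values in Table 1 below, reverse-mode AD in TensorFlow reproduces the same derivatives in a few lines; this sketch is our illustration, using the constants of equation (7):

```python
import tensorflow as tf

x1 = tf.Variable(1.0)
x2 = tf.Variable(-1.0)

with tf.GradientTape() as tape:
    nu = 2.0 * x1 + 3.0 * x2 - 1.0   # weighted sum of the inputs, equation (7)
    h = tf.sigmoid(nu)               # hidden-layer output
    y = 5.0 * h + 2.0                # network output

dy_dx1, dy_dx2 = tape.gradient(y, [x1, x2])
print(dy_dx1.numpy(), dy_dx2.numpy())  # ~1.050 and ~1.575, matching Table 1
```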
Forward pass:
    x1 = 1, x2 = -1
    nu = 2 x1 + 3 x2 - 1 = -2
    h = 1 / (1 + e^(-nu)) = 0.119
    y = 5 h + 2 = 2.596

Reverse pass:
    dy/dy = 1
    dy/dh = d(5h + 2)/dh = 5
    dy/dnu = (dy/dh)(dh/dnu) = 5 e^(-nu) / (1 + e^(-nu))^2 = 0.525
    dy/dx1 = (dy/dnu)(dnu/dx1) = 0.525 × 2 = 1.050
    dy/dx2 = (dy/dnu)(dnu/dx2) = 0.525 × 3 = 1.575

Table 1: Example of the forward and reverse pass computations needed by AD to compute the partial derivatives of the output with respect to the inputs at (x1, x2) = (1, -1) for the expressions given in equation (7).

To solve the eikonal equation (1), we leverage the capabilities of neural networks as function approximators and define a loss function that minimizes the residual of the factored eikonal equation at a chosen set of training (collocation) points. This is achieved with: (i) a DNN approximation of the unknown traveltime field variable $\tau(\mathbf{x})$; (ii) a loss function incorporating the eikonal equation and sampled on a collocation grid; and (iii) a differentiation algorithm, i.e., AD in this case, to evaluate the partial derivatives of $\tau(\mathbf{x})$ with respect to the spatial coordinates.

To illustrate the idea, let us consider a two-dimensional domain $\Omega \subset \mathbb{R}^2$, where $\mathbf{x} = (x, z) \in [0, 2]^2$, as shown in Figure 2. A source term is considered at $\mathbf{x}_s = (x_s, z_s)$, where $\tau(\mathbf{x}_s) = 1$. The unknown traveltime factor $\tau(\mathbf{x})$ is approximated by a multilayer DNN $\mathcal{N}_\tau$, i.e., $\tau(\mathbf{x}) \approx \hat{\tau}(\mathbf{x}) = \mathcal{N}_\tau(\mathbf{x}; \boldsymbol{\theta})$, where $\mathbf{x} = (x, z)$ are the network inputs, $\hat{\tau}$ is the network output, and $\boldsymbol{\theta}$ represents the set of all trainable parameters of the network.

The loss function can now be constructed using a mean-squared-error (MSE) norm as:

$$\mathcal{J} = \frac{1}{N_I} \sum_{\mathbf{x}^* \in I} \|\mathcal{L}\|_2^2 + \frac{1}{N_I} \sum_{\mathbf{x}^* \in I} \left\| H(-\hat{\tau})\, |\hat{\tau}| \right\|_2^2 + \|\hat{\tau}(\mathbf{x}_s) - 1\|_2^2, \qquad (8)$$

where

$$\mathcal{L} = T_0^2\, |\nabla \hat{\tau}|^2 + \hat{\tau}^2\, |\nabla T_0|^2 + 2\, T_0\, \hat{\tau}\, (\nabla T_0 \cdot \nabla \hat{\tau}) - \frac{1}{v^2(\mathbf{x})}, \qquad (9)$$

forms the residual of the factored eikonal equation.

The first term on the right side of equation (8) imposes the validity of the factored eikonal equation on a given set of training points $\mathbf{x}^* \in I$, with $N_I$ as the number of sampling points. The second term forces the solution $\hat{\tau}$ to be positive by penalizing negative solutions using the Heaviside function $H()$. The last term requires the solution to be unity at the point-source $\mathbf{x}_s = (x_s, z_s)$.

Network parameters $\boldsymbol{\theta}$ are then identified by minimizing the loss function (8) on the set of sampling (training) points $\mathbf{x}^* \in I$, i.e.,

$$\arg\min_{\boldsymbol{\theta}} \mathcal{J}(\mathbf{x}^*; \boldsymbol{\theta}). \qquad (10)$$

Once the DNN is trained, we evaluate the network on a set of regular grid-points in the computational domain to obtain the unknown traveltime part. The final traveltime solution is obtained by multiplying it with the known traveltime part, i.e.,

$$\hat{T}(\mathbf{x}) = T_0(\mathbf{x})\, \hat{\tau}(\mathbf{x}). \qquad (11)$$
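Putting equations (8) to (11) together, the following condensed TensorFlow sketch shows how such a loss can be assembled; it is our paraphrase under hypothetical helper names, not the paper's actual implementation (which uses the SciANN package introduced later):

```python
import tensorflow as tf

def loss_fn(model, x, z, T0, gradT0_x, gradT0_z, vel, xs, zs):
    """MSE loss of equations (8)-(9) for the factored eikonal equation.

    model : network mapping (x, z) -> tau_hat
    T0, gradT0_x, gradT0_z : known factor and its gradient, equation (4)
    vel   : velocity sampled at the collocation points
    """
    with tf.GradientTape() as tape:
        tape.watch([x, z])
        tau = model(tf.stack([x, z], axis=-1))
    dtau_dx, dtau_dz = tape.gradient(tau, [x, z])  # spatial derivatives via AD

    # Residual of the factored eikonal equation, equation (9)
    res = (T0**2 * (dtau_dx**2 + dtau_dz**2)
           + tau[:, 0]**2 * (gradT0_x**2 + gradT0_z**2)
           + 2.0 * T0 * tau[:, 0] * (gradT0_x * dtau_dx + gradT0_z * dtau_dz)
           - 1.0 / vel**2)
    pde_term = tf.reduce_mean(res**2)

    # Penalize negative solutions, the Heaviside term in equation (8)
    neg = tf.where(tau[:, 0] < 0, tf.abs(tau[:, 0]), tf.zeros_like(tau[:, 0]))
    pos_term = tf.reduce_mean(neg**2)

    # Point-source condition tau(x_s) = 1
    tau_s = model(tf.constant([[xs, zs]]))
    src_term = (tau_s[0, 0] - 1.0)**2

    return pde_term + pos_term + src_term
```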
Figure 2: A velocity model with a constant velocity gradient of 0.5 s⁻¹ in the vertical direction. The velocity at zero depth is equal to 2 km/s, and it increases linearly to 3 km/s at a depth of 2 km. The black star indicates the point-source location used for the tests.

It is worth emphasizing that the proposed approach differs from traditional (non-physics-constrained) deep learning techniques. The training of the network here refers to the tuning of the weights and biases of the network such that the resulting solution minimizes the loss function $\mathcal{J}$ on a given set of training points. Also noteworthy is the fact that, contrary to supervised learning applications, the network here learns without any labeled set. To understand this point, consider a randomly initialized network, which will output a certain value $\hat{\tau}_{i,j}$ for each point $(i, j)$ in the training set. These output values are used to calculate the residual using equation (8). Based on this residual, the network adjusts its weights and biases, allowing it to produce $\hat{\tau}$ values that adhere to the underlying factored eikonal equation (3).

Numerical tests

In this section, we test the proposed PINN eikonal solver for computing traveltimes emanating from a point-source. We consider models with smoothly varying velocity fields along the horizontal and vertical directions, and a highly heterogeneous portion of the well-known Marmousi model. For each case presented below, we minimize the loss function given in equation (8) using the Adam optimizer with full-batch optimization. Moreover, the activation function used for the hidden layers of the network is the inverse tangent (arctan), whereas the final layer is linear. The PINN framework is implemented using the SciANN package [45], a high-level TensorFlow wrapper for scientific computations. For comparison, we use the first-order finite-difference fast marching solution, which is routinely used for traveltime computations in seismological applications.

First, we consider a 2 × 2 km model with vertically varying velocity. The velocity at zero depth is 2 km/s, and it increases linearly with a gradient of 0.5 s⁻¹. We consider a point-source located at (1 km, 1 km). The model is shown in Figure 2, with the black star depicting the point-source location. For a model with a constant velocity gradient, the analytical traveltime solution is given as [46]:

$$T(\mathbf{x}) = \frac{1}{\sqrt{g_v^2 + g_h^2}} \operatorname{arccosh}\left( 1 + \frac{\left(g_v^2 + g_h^2\right) |\mathbf{x} - \mathbf{x}_s|^2}{2\, v(\mathbf{x})\, v(\mathbf{x}_s)} \right), \qquad (12)$$

where $T(\mathbf{x})$ is the traveltime at a grid point $\mathbf{x} = (x, z)$ from a point-source located at $\mathbf{x}_s$. Likewise, $v(\mathbf{x})$ is the velocity at the grid point $\mathbf{x}$, and $v(\mathbf{x}_s)$ is the velocity at the point-source location. The velocity gradients along the vertical and horizontal dimensions are denoted by $g_v$ and $g_h$, respectively. Therefore, for the model in Figure 2, $g_v = 0.5$ s⁻¹, $g_h = 0$ s⁻¹, $\mathbf{x}_s = (1\ \mathrm{km}, 1\ \mathrm{km})$, and $v(\mathbf{x}_s) = 2.5$ km/s.

We begin the tests by studying the predictive accuracy of the PINN eikonal solver for different network architectures. For the considered point-source, we compute traveltime solutions using a number of different architectures, formed by varying the number of hidden layers and the number of neurons in each hidden layer. For each architecture, we use 2600 training points within the computational domain. As shown in Table 2, we compute the relative $L_2$ error between the predicted solution and the analytical solution computed using equation (12). As expected, the predictive accuracy of the network improves as we increase the number of layers and neurons, since a larger network can approximate more complex functions.
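The analytical reference of equation (12), used for the errors reported in Table 2, is cheap to evaluate directly; a NumPy sketch (ours, with hypothetical function and parameter names) for the constant-gradient model:

```python
import numpy as np

def analytic_traveltime(x, z, xs, zs, v0=2.0, gv=0.5, gh=0.0):
    """Analytical traveltime in a constant-gradient medium, equation (12).

    v0 : velocity at the origin (km/s); v(x) = v0 + gh*x + gv*z
    """
    g2 = gv**2 + gh**2
    v = v0 + gh * x + gv * z        # velocity at the evaluation points
    vs = v0 + gh * xs + gv * zs     # velocity at the source
    dist2 = (x - xs) ** 2 + (z - zs) ** 2
    return np.arccosh(1.0 + g2 * dist2 / (2.0 * v * vs)) / np.sqrt(g2)

# Traveltimes on a 20 m grid over the 2 x 2 km model, source at (1 km, 1 km)
x, z = np.meshgrid(np.linspace(0, 2, 101), np.linspace(0, 2, 101))
T = analytic_traveltime(x, z, 1.0, 1.0)
```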
Layers \ Neurons        2          4          6          8          10
       2             9.59e-03   2.12e-03   1.28e-03   7.91e-04   5.25e-04
       4             1.96e-03   1.45e-03   5.06e-04   3.47e-04   4.04e-04
       6             2.96e-03   8.31e-04   9.34e-04   5.05e-04   2.58e-04
       8             7.78e-04   4.45e-04   1.01e-03   8.16e-04   5.95e-04
      10             1.99e-03   3.10e-04   3.99e-04   4.17e-04   3.16e-04

Table 2: Relative $L_2$ errors between the predicted and the analytical solutions for different numbers of hidden layers and neurons per layer. The velocity model used is shown in Figure 2, and the point-source is located at (1 km, 1 km).

Figure 3: The absolute traveltime errors for the PINN eikonal solution (a) and the first-order fast marching solution (b) for the velocity model and the source location shown in Figure 2.
Figure 4: Traveltime contours for the analytical solution (red), the PINN eikonal solution (dashed black), and the first-order fast marching solution (dotted blue). The velocity model and the source location considered are shown in Figure 2.

Next, in Figure 3(a), we show the absolute traveltime errors for the PINN eikonal solver, considering the same velocity model and source position. The network contains 10 hidden layers with 20 neurons in each layer and is trained on 2000 randomly selected points in the computational domain. Once the network is trained, we evaluate it on a regular grid with 20 m grid spacing along the horizontal and vertical dimensions.
For comparison, we also plot the absolute traveltime errors for the first-order fast marching solution in Figure 3(b) on the same regular grid. We observe that despite using only 2000 randomly selected points, the PINN eikonal solution is significantly more accurate than the first-order fast marching solution. As can be seen in Figure 3(b), the fast marching solution suffers from large errors in the diagonal direction, whereas the errors for the PINN eikonal solver are more randomly distributed. We also plot traveltime contours in Figure 4, visually comparing the analytical solution with the PINN eikonal solution and the first-order fast marching solution.

Next, we investigate the applicability of transfer learning to the PINN eikonal solver. Transfer learning is a machine learning technique that relies on storing knowledge gained while solving one problem and applying it to a different but related problem. We explore whether the network trained for the previous example can be used to compute the solution for a different source location and velocity model. To this end, we consider a different velocity model with a vertical gradient of 1 s⁻¹ and a horizontal gradient of 0.5 s⁻¹. The point-source is also relocated to the position marked by the black star in Figure 5.

Figure 5: A velocity model with a constant vertical velocity gradient of 1 s⁻¹ and a horizontal velocity gradient of 0.5 s⁻¹. The black star indicates the point-source location used for the test.
Figure 6: A comparison of the convergence history for the training of the examples shown in Figure 2 (blue) and Figure 5 (orange). The former example used a network with random initialization, whereas the network for the latter example was initialized using the pre-trained weights from the former example.

To train the network for this case, instead of initializing the network with random weights, we use the weights from the network trained for the previous example. We observe that the network converges much faster than in the previous example, which was randomly initialized, as shown in Figure 6. This is a highly desirable property of the PINN eikonal solver, since seismic applications, such as earthquake source localization or seismic imaging, require repeated traveltime computations for multiple source locations and updated velocity models.
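In code, this transfer learning step amounts to little more than copying trained weights between identically shaped networks before optimization resumes; a minimal Keras-style sketch under our own naming (the paper's implementation uses SciANN):

```python
import tensorflow as tf

def build_net():
    """A network of the kind used here: hidden arctan layers, linear output."""
    hidden = [tf.keras.layers.Dense(20, activation=tf.math.atan) for _ in range(10)]
    return tf.keras.Sequential(
        [tf.keras.Input(shape=(2,))] + hidden + [tf.keras.layers.Dense(1)]
    )

pretrained = build_net()
# ... assume `pretrained` has been trained on the first velocity model ...

new_model = build_net()
new_model.set_weights(pretrained.get_weights())  # transfer the learned weights
# Training on the new velocity model and source now starts from a good
# initialization and converges in far fewer epochs (Figure 6).
```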
Figure 7: The absolute traveltime errors for the PINN eikonal solution (a) and the first-order fast marching solution (b) for the velocity model and the source location shown in Figure 5.
Figure 8: Traveltime contours for the analytical solution (red), the PINN eikonal solution (dashed black), and the first-order fast marching solution (dotted blue). The velocity model and the source location considered are shown in Figure 5.

In comparison, conventional numerical algorithms, such as the fast marching method, require the same computational effort for even a slight variation in the velocity model or source location, which is a major source of the computational bottleneck, particularly in large 3D models.

Figure 7 compares the absolute traveltime errors computed using the PINN eikonal solution with the pre-trained initial model and the first-order fast marching solution. We observe that despite using significantly fewer epochs than in the previous example, the solution accuracy is not compromised. The traveltime contours, shown in Figure 8, confirm this observation visually.

Next, we explore whether a PINN model trained on solutions computed for various source locations in a given velocity model can be used as a surrogate model. To do this, the network is modified to include the source location $\mathbf{x}_s = (x_s, z_s)$ as inputs, in addition to the grid points $\mathbf{x} = (x, z)$. We train the network on solutions computed for 16 sources located at regular intervals in a constant vertical gradient model, as shown in Figure 9. Through this training process, the network learns the mapping between source locations $\mathbf{x}_s$ and the corresponding traveltime fields $T(\mathbf{x})$. Once the surrogate model is trained with source locations as additional input parameters, traveltime fields for new source locations in the given velocity model can be computed rapidly with just a single evaluation of the network.
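Structurally, the surrogate differs from the earlier networks only in its widened input layer; a sketch (ours, with hypothetical layer sizes and source coordinates) of the modified input handling and a single-evaluation query:

```python
import numpy as np
import tensorflow as tf

# Surrogate network: inputs are (x, z, xs, zs) rather than (x, z), so one
# trained model generalizes over source positions in the given velocity model.
surrogate = tf.keras.Sequential(
    [tf.keras.Input(shape=(4,))]
    + [tf.keras.layers.Dense(20, activation=tf.math.atan) for _ in range(10)]
    + [tf.keras.layers.Dense(1)]
)

# After training on solutions for 16 regularly spaced sources (Figure 9),
# a new source requires only one forward evaluation, no retraining:
x, z = np.meshgrid(np.linspace(0, 2, 101), np.linspace(0, 2, 101))
xs, zs = 0.7, 1.3  # hypothetical new source position
pts = np.stack([x.ravel(), z.ravel(),
                np.full(x.size, xs), np.full(x.size, zs)], axis=-1)
tau_hat = surrogate.predict(pts, verbose=0).reshape(x.shape)
```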
Figure 9: A velocity model with a constant velocity gradient of 0.5 s⁻¹ in the vertical direction. The black stars indicate the locations of the sources used to train the network as a surrogate model.

Figure 10: The absolute traveltime errors for the solution computed using the PINN surrogate model (a) and the first-order fast marching solution (b) for the velocity model shown in Figure 9 and a randomly chosen source point.

This is similar to obtaining an analytic solver, as no training is needed for computing traveltimes corresponding to additional source locations.

To demonstrate that the surrogate model approach yields an accurate solution, we use the trained surrogate model to compute traveltime solutions for a randomly chosen source location in the model. We can confirm, by looking at the absolute traveltime errors shown in Figure 10, that the trained surrogate model yields a highly accurate solution compared to the first-order fast marching solution, even though no training is performed for this randomly chosen source point. This observation is confirmed visually by the traveltime contours in Figure 11.

Moreover, transfer learning can be used to efficiently build surrogate models for updated velocity models, i.e., by initializing the PINN surrogate model for the updated velocities using the weights from the already trained surrogate model. Therefore, the transfer learning technique combined with surrogate modeling can be used to build a highly efficient traveltime modeling algorithm for seismic inversion, compared to conventional numerical algorithms that do not afford such flexibility.

Finally, we test the PINN eikonal solver on a highly heterogeneous portion of the Marmousi model, shown in Figure 12. We consider a source located at an offset of 1 km, at the position marked by the black star in Figure 12. This is a particularly challenging model due to sharp velocity variations.
Figure 11: Traveltime contours for solutions obtained using the analytical formula (red), the PINN surrogate model (dashed black), and the first-order fast marching solver (dotted blue). The velocity model considered is shown in Figure 9.

Figure 12: A highly heterogeneous portion of the Marmousi model used for traveltime computation. The black star indicates the point-source location used for the test.

We train the network on 5000 randomly selected points in the computational domain. Once the network is trained, we evaluate it on a regular grid with 20 m grid spacing along the horizontal and vertical dimensions. For comparison, we also compute the traveltime solution using the first-order fast marching method on the same regular grid. The absolute traveltime errors for both approaches are compared in Figure 13, and the traveltime contours are shown in Figure 14. We observe significantly better accuracy for the PINN eikonal solver compared to the first-order fast marching method. However, for the PINN eikonal solution, we observe a small region of large errors, indicated by the red arrow in Figure 14. This region coincides with the region of the largest velocity gradient, indicating that PINN models may suffer from inaccuracies in such cases. Compared to the previous examples, we also observe slower convergence for the Marmousi example, which is explained by the highly non-convex loss function in this case.
Discussion

In a conventional deep learning application, a neural network is trained by minimizing a loss function that typically measures the mismatch between the network's predicted output and some known outputs, also known as training data. However, there are several limitations associated with such models, which rely solely on a labeled dataset and are oblivious to the scientific principles governing real-world phenomena.
Figure 13: The absolute traveltime errors for the PINN eikonal solution (a) and the first-order fast marching solution (b) for the velocity model and the source location shown in Figure 12.
Figure 14: Traveltime contours for the reference solution (red), the PINN eikonal solution (dashed black), and the first-order fast marching solution (dotted blue). The velocity model and the source location considered are shown in Figure 12. The red arrow indicates a visible region of inaccuracy in the PINN eikonal solution.

For cases when the available training and test data are insufficient, such models often learn spurious relationships that are misleading. However, the biggest concern with such a data-driven model is the lack of scientific consistency of its predictions with known physical laws, which have been the cornerstone of knowledge discovery across scientific disciplines for centuries.

Therefore, in the present work, we propose an eikonal solver based on the framework of physics-informed neural networks [32]. We leverage the capabilities of neural networks as universal function approximators [35] and define a loss function to minimize the residual of the governing eikonal equation at a chosen set of training points. This is achieved with a simple feed-forward neural network leveraging the concept of automatic differentiation [27]. Through numerical tests, we observe that the proposed algorithm yields sufficiently accurate traveltimes for most seismic applications of interest. We demonstrate this by comparing the accuracy of the proposed approach against the first-order fast marching solution, a popular numerical algorithm for solving the eikonal equation.

We find that the training cost of the PINN eikonal solver can be reduced by using only a few randomly selected grid points from the computational domain, particularly for smooth velocity models. For the tests on the smoothly varying velocity models with constant horizontal and vertical gradients, we randomly selected only 20% of the total grid points, whereas we used 50% of the total grid points for the Marmousi model example.
We also observe that the transfer learning technique can be used to speed up the convergence of the network for new source locations and/or updated velocity models by initializing the PINN model with the weights of a previously trained network. Moreover, having computed solutions corresponding to a few sources for a given velocity model, we can also build a surrogate model with respect to the source locations by adding them as input parameters. This surrogate model can then be used to compute traveltime fields corresponding to new source locations for the same velocity model with just a single evaluation of the network. These observations effectively demonstrate the potential of the proposed approach to massively speed up many seismic applications that rely on repeated traveltime computations for multiple source locations and velocity models. Moreover, the extension of the proposed framework to more complex eikonal equations, for example the anisotropic eikonal equation, is much simpler than for conventional numerical algorithms.

Nevertheless, there are a few challenges in our observations that need further investigation. Most importantly, we observe slow convergence for velocity models with sharp variations. This needs to be addressed, particularly for areas with complex subsurface geologic structures. Possible solutions to this problem include a denser sampling of training points around parts of the velocity model with a large velocity gradient [47] or the use of a non-local PINN framework [48]. Another challenge concerns the optimal choice of the network hyper-parameters, which are often highly problem dependent. Recent advances in the field of meta-learning may enable automated selection of optimal architectures.
Conclusions

We proposed a method to solve the eikonal equation using a deep learning framework. Through tests on benchmark synthetic models, we showed that the accuracy of the proposed approach is better than that of the first-order fast marching solution. Depending on the heterogeneity of the velocity model, we also note that training is needed for only a fraction of the total grid points in the computational domain to reliably reconstruct the solution. We also observed that transfer learning can be used to speed up convergence for new velocity models and/or source locations. Moreover, having computed solutions corresponding to a few source locations for a given velocity model, surrogate modeling can be used to train a network to instantly yield traveltime solutions corresponding to new source locations. These properties, not afforded by conventional numerical algorithms, potentially allow us to massively speed up seismic inversion applications, particularly for large 3D models. We demonstrated the proposed framework on 2D models for simplicity of illustration; the extension to 3D velocity models is straightforward.
References

[1] Jaume Masoliver and Ana Ros. From classical to quantum mechanics through optics. European Journal of Physics, 31(1):171, 2009.
[2] Demetrius T Paris and Frank Kenneth Hurd. Basic Electromagnetic Theory. McGraw-Hill Education, 1969.
[3] Vladimir Igorevich Arnold. Mathematical Methods of Classical Mechanics, volume 60. Springer Science & Business Media, 2013.
[4] David Adalsteinsson and James A Sethian. A fast level set method for propagating interfaces. Journal of Computational Physics, 118(2), 1994.
[5] Elisabeth Rouy and Agnès Tourin. A viscosity solutions approach to shape-from-shading. SIAM Journal on Numerical Analysis, 29(3):867–884, 1992.
[6] Ravikanth Malladi and James A Sethian. A unified approach to noise removal, image enhancement, and shape recovery. IEEE Transactions on Image Processing, 5(11):1554–1568, 1996.
[7] Christopher Alvino, Gozde Unal, Greg Slabaugh, Bertrand Peny, and Tong Fang. Efficient segmentation based on eikonal and diffusion equations. International Journal of Computer Mathematics, 84(9):1309–1324, 2007.
[8] Zhujiang Cao, Shiyan Pan, Rui Li, Ramya Balachandran, J Michael Fitzpatrick, William C Chapman, and Benoit M Dawant. Registration of medical images using an interpolated closest point transform: method and validation. Medical Image Analysis, 8(4):421–427, 2004.
[9] Rodrigo Ventura and Aamir Ahmad. Towards optimal robot navigation in domestic spaces. In Robot Soccer World Cup, pages 318–331. Springer, 2014.
[10] Clément Petrès, Yan Pailhas, Pedro Patron, Yvan Petillot, Jonathan Evans, and David Lane. Path planning for autonomous underwater vehicles. IEEE Transactions on Robotics, 23(2):331–341, 2007.
[11] Santiago Garrido, David Álvarez, and Luis Moreno. Path planning for Mars rovers using the fast marching method. In Robot 2015: Second Iberian Robotics Conference, pages 93–105. Springer, 2016.
[12] Ron Kimmel and James A Sethian. Computing geodesic paths on manifolds. Proceedings of the National Academy of Sciences, 95(15):8431–8435, 1998.
[13] Alon Spira and Ron Kimmel. An efficient solution to the eikonal equation on parametric manifolds. Interfaces and Free Boundaries, 6(3):315–327, 2004.
[14] John Joseph Helmsen, Elbridge Gerry Puckett, Phillip Colella, and Milo Dorr. Two new methods for simulating photolithography development in 3D. In Optical Microlithography IX, volume 2726, pages 253–261. International Society for Optics and Photonics, 1996.
[15] D Adalsteinsson and JA Sethian. Level set methods for etching, deposition and photolithography development. Journal of Technology Computer Aided Design TCAD, pages 1–67, 1996.
[16] Don C Lawton. Computation of refraction static corrections using first-break traveltime differences. Geophysics, 54(10):1289–1296, 1989.
[17] Cédric Taillandier, Mark Noble, Hervé Chauris, and Henri Calandra. First-arrival traveltime tomography based on the adjoint-state method. Geophysics, 74(6):WCB1–WCB10, 2009.
[18] Vladimir Grechka, Alejandro De La Pena, Estelle Schisselé-Rebel, Emmanuel Auger, and Pierre-Francois Roux. Relative location of microseismicity. Geophysics, 80(6):WC1–WC9, 2015.
[19] Gilles Lambare, Stephane Operto, Pascal Podvin, and Philippe Thierry. 3D ray+Born migration/inversion - Part 1: Theory. Geophysics, 68(4):1348–1356, 2003.
[20] John N Tsitsiklis. Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control, 40(9):1528–1538, 1995.
[21] James A Sethian. A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences, 93(4):1591–1595, 1996.
[22] Hongkai Zhao. A fast sweeping method for eikonal equations. Mathematics of Computation, 74(250):603–627, 2005.
[23] Javier V Gómez, David Álvarez, Santiago Garrido, and Luis Moreno. Fast methods for eikonal equations: an experimental survey. IEEE Access, 7:39005–39029, 2019.
[24] Maryam M Najafabadi, Flavio Villanustre, Taghi M Khoshgoftaar, Naeem Seliya, Randall Wald, and Edin Muharemagic. Deep learning applications and challenges in big data analytics. Journal of Big Data, 2(1):1, 2015.
[25] Hyuk Lee and In Seok Kang. Neural algorithm for solving differential equations. Journal of Computational Physics, 91(1):110–131, 1990.
[26] Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998.
[27] Atılım Günes Baydin, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey. The Journal of Machine Learning Research, 18(1):5595–5637, 2017.
[28] Julia Ling, Andrew Kurzawski, and Jeremy Templeton. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics, 807:155–166, 2016.
[29] Jiequn Han, Arnulf Jentzen, and E Weinan. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
[30] Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
[31] Jonathan Tompson, Kristofer Schlachter, Pablo Sprechmann, and Ken Perlin. Accelerating Eulerian fluid simulation with convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 3424–3433. JMLR.org, 2017.
[32] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
[33] Jianliang Qian and William W Symes. An adaptive finite-difference method for traveltimes and amplitudes. Geophysics, 67(1):167–176, 2002.
[34] Sergey Fomel, Songting Luo, and Hongkai Zhao. Fast sweeping method for the factored eikonal equation. Journal of Computational Physics, 228(17):6440–6455, 2009.
[35] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
[36] Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[37] P Sibi, S Allwyn Jones, and P Siddarth. Analysis of different activation functions using back propagation neural networks. Journal of Theoretical and Applied Information Technology, 47(3):1264–1268, 2013.
[38] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[39] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
[40] Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. The expressive power of neural networks: A view from the width. In Advances in Neural Information Processing Systems, pages 6231–6239, 2017.
[41] Charles C Margossian. A review of automatic differentiation and its efficient implementation. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(4):e1305, 2019.
[42] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In Proceedings of Neural Information Processing Systems, 2017.
[43] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
[44] Conal Elliott. The simple essence of automatic differentiation. Proceedings of the ACM on Programming Languages, 2(ICFP):1–29, 2018.
[45] Ehsan Haghighat and Ruben Juanes. SciANN: A Keras wrapper for scientific computations and physics-informed deep learning using artificial neural networks. arXiv preprint arXiv:2005.08803, 2020.
[46] M Slotnick. Lessons in Seismic Computing. Soc. Expl. Geophys., 268, 1959.
[47] Cosmin Anitescu, Elena Atroshchenko, Naif Alajlan, and Timon Rabczuk. Artificial neural network methods for the solution of second order boundary value problems. Computers, Materials and Continua, 59(1):345–359, 2019.
[48] Ehsan Haghighat, Ali Can Bekar, Erdogan Madenci, and Ruben Juanes. A nonlocal physics-informed deep learning framework using the peridynamic differential operator. arXiv preprint arXiv:2006.00446, 2020.