Numerical analysis of a deep learning formulation of elastic full waveform inversion with high-order total variation regularization in different parameterizations
Tianze Zhang*, Jian Sun*†, Kristopher A. Innanen*, Daniel Trad*
*University of Calgary, Department of Geoscience, Calgary, Alberta, Canada.
†Pennsylvania State University, College of Earth and Mineral Science, University Park, Pennsylvania, USA.
(January 25, 2021)
ABSTRACT
We have formulated elastic seismic full waveform inversion (FWI) within a deep learning environment. Training this network with a single seismic data set is equivalent to carrying out elastic FWI. There are three main motivations for this effort. The first is an interest in developing an inversion algorithm which is more or less equivalent to standard elastic FWI but which is ready-made for a range of cloud computational architectures. The second is an interest in algorithms which can later (i.e., not in this paper) be coupled with a more involved training component, involving multiple datasets. The third is a general interest in developing the idea of theory-guiding within machine learning solutions for large geophysical problems, wherein the number of degrees of freedom within a network, and the reliance on exhaustive training data, are both reduced by constraining the network with physical rules. In our formulation, a recurrent neural network is set up with rules enforcing elastic wave propagation, with the wavefield projected onto a measurement surface acting as the synthetic data to be compared with observed seismic data. Gradients for iterative updating of an elastic model, with a variety of parameterizations and misfit functionals, can be efficiently constructed within the network through automatic differentiation. With this method, inversions based on complex misfits can be calculated. We survey the impact of different misfits (based on the $l_1$ and $l_2$ norms) with high-order total variation (TV) regularization on multiparameter elastic FWI recovery of models within velocity/density, modulus/density, and stiffness-parameter/density parameterizations. We analyze parameter cross-talk. Inversion results on simple and complex models show that RNN elastic FWI with high-order TV regularization using the $l_1$ norm can help mitigate cross-talk issues with gradient-based optimization methods.

INTRODUCTION
It has recently been shown (Sun et al., 2020) that seismic wave propagation can be simulated with a specialized recurrent neural network (RNN), and that the process of training such a network with a single seismic data set is equivalent to carrying out seismic full waveform inversion (FWI). We are motivated to extend and expand on these results, because of (1) the apparent potential for wider training of such a network to combat common FWI issues such as modelling error; (2) the opportunities for efficient computation (e.g., cloud) offered by FWI realized on platforms like TensorFlow; and (3) an interest in the behavior of more complex RNN-FWI formulations than previously analyzed, e.g., multi-parameter elastic FWI. In this paper we report on progress on the third of these.

The application of machine learning methods to seismic problems has been underway for decades; for example, Röth and Tarantola (1994) presented a neural network-based inversion of time-domain seismic data for acoustic velocity depth-profiles. However, the evolution of these network approaches into deep learning methods, and the results which have subsequently become possible, make many aspects of the discipline quite new, and explain the major recent surge in development and interest. Now, novel seismic applications are being reported in fault detection, denoising, reservoir prediction and velocity inversion (e.g., Jin et al., 2019; Zheng et al., 2019; Peters et al., 2019; Chen et al., 2019; Li et al., 2019a; Smith et al., 2019; Shustak and Landa, 2018). Specifically in seismic velocity inversion, Sun and Demanet (2018) applied a deep learning method to the problem of bandwidth extension; Zhang et al. (2019) designed an end-to-end framework, the VelocityGAN, which generates velocity images directly from the raw seismic waveform data; Wu and Lin (2018) trained a network with an encoder-decoder structure to model the correspondence between seismic data and subsurface velocity structures; and Yang and Ma (2019) investigated a supervised, deep convolutional neural network for velocity-model building directly from raw seismograms.

The above examples are purely data-driven, in the sense that they involve no assumed theoretical/physical relationships between the input layer (e.g., velocity model) and output layer (e.g., seismic data). The advantage of purely data-driven methods is speed: once a network has been trained to perform inversion, taking data-driven training of an FWI network as an example, raw seismograms can be mapped directly to velocity models, without the iterations required by conventional inversion methods. However, seismic inversion is a sophisticated problem, and choosing a set of training data large and varied enough to represent such complex wave physics features and their corresponding velocity models is itself a hard problem.

Here we distinguish between such methods and those belonging to the theory-guided
AI paradigm (e.g., Wagner and Rondinelli, 2016; Wu et al., 2018; Karpatne et al., 2017). Theory-guided deep learning networks are, broadly, those which enforce physical rules on the relationships between variables and/or layers. This may occur through the training of a standard network with simulated or synthetic data, which are created using physical rules, or by holding some weights in a network, chosen to force the network to mimic a physical process, fixed and un-trainable. Theory-guiding was explicitly used in the former sense by Downton and Hampson (2018), in which a network designed to predict well log properties from seismic amplitudes was trained not only with real data but with synthetics derived from the Zoeppritz equations. Bar-Sinai et al. (2019) and Raissi (2018) built deep convolutional neural networks (or CNNs) to solve partial differential equations, i.e., explicitly using theoretical models within the design, which is an example of theory-guiding in the latter sense. Sun et al. (2020), in the work we extend in this paper, similarly set up a deep recurrent neural network to simulate the propagation of a seismic wave through a general acoustic medium. The network is set up in such a way that the trainable weights correspond to the medium property unknowns (i.e., the wave velocity model), and the non-trainable weights correspond to the mathematical rules (differencing, etc.) enforcing wave propagation. The output layer was the wave field projected onto a measurement surface, i.e., a simulation of measured seismic data. The training of the Sun et al. (2020) network with a single data set was shown to be an instance of full waveform inversion.

Conventional (i.e., not network-based) seismic full waveform inversion, or FWI, is a complex data-fitting procedure aimed at extracting important information from seismic records. Key ingredients are an efficient forward modeling method to simulate synthetic data, and a local differential approach, in which the gradient and search direction are calculated to update the model. When updating several parameters simultaneously, as in multiparameter full waveform inversion, error in one parameter tends to produce updates in others, a phenomenon referred to as inter-parameter trade-off or cross-talk (e.g., Kamei and Pratt, 2013; Alkhalifah and Plessix, 2014; Innanen, 2014; Pan et al., 2018; Keating and Innanen, 2019). The degree of trade-off or cross-talk between parameters can depend sensitively on the specific parameterization; even though, for instance, $\lambda$, $\mu$ and $\rho$ have the same information content as $V_P$, $V_S$, and $\rho$, updating in one can produce markedly different convergence properties and final models than doing so in the other. The coupling between different physical parameters is controlled by the physical relationships amongst these parameters, and the relationships between the parameters and the wave which interacts with them. Tarantola (1986), by using the scattering or radiation pattern, systematically analyzed the effect of different model parameterizations on isotropic FWI. He suggested that the greater the difference in the scattering pattern between each parameter, the better the parameters can be resolved. Köhn et al.
(2012) showed that across all geophysical parameterizations, within isotropic-elastic FWI, density is always the most challenging to resolve (e.g., Tarantola, 1984; Plessix, 2006; Kamath et al., 2018; Lailly, 1983; Operto et al., 2013; Oh and Alkhalifah, 2016; Pan et al., 2016; Keating and Innanen, 2017).

In RNN-based FWI, then, a natural step is the extension of the Sun et al. (2020) result, which involves a scalar-acoustic formulation of FWI, to a multi-parameter elastic version. Cells within a deep recurrent neural network are designed such that the propagation of information through the network is equivalent to the propagation of an elastic wavefield through a 2D heterogeneous and isotropic-elastic medium; the network is equipped to explore a range of parameterizations and misfit functionals for training based on seismic data. As with the acoustic network, the output layer is a projection of the wave field onto some defined measurement surface, simulating measured data.

In addition to providing a framework for inversion methods which mixes the features of FWI with the training capacity of a deep learning network, this approach also allows for efficient calculation of the derivatives of the residual through automatic differentiation (AD) (e.g., Li et al., 2019b; Sambridge et al., 2007), an engine for which is provided by the open-source machine learning library PyTorch (Paszke et al., 2017). Our recurrent neural network is designed using this library. The forward simulation of wave propagation is represented by a Dynamic Computational Graph (DCG), which records how each internal parameter is calculated from all previous ones. In the inversion, the partial derivative of the residual with respect to any parameter (within a trainable class) is computed by (1) backpropagating within the computational graph to that parameter, and (2) calculating the partial derivative along the path using the chain rule.

This paper is organized as follows. First, we introduce the basic structure of the recurrent neural network and how the gradients can be calculated using the backpropagation method. Second, we demonstrate how the elastic FWI RNN cell is constructed and how the wavefields propagate in the RNN cells. Third, we explain the $l_1$ and $l_2$ misfits with high-order TV regularization and the mathematical expressions for the gradients based on these misfits. Fourth, we use simple layered models and complex overthrust models to perform inversions with various parameterizations using different misfits. Finally, we discuss the conclusions of this work.

RECURRENT NEURAL NETWORK
A recurrent neural network (RNN) is a machine learning network suitable for dealing with data which have some sequential meaning. Within such a network, the information generated in one cell layer can be stored and used by the next layer. This design has natural applicability in the processing and interpretation of the time evolution of physical processes (Sun et al., 2020), and time-series data in general; examples include language modeling (Mikolov, 2012), speech recognition (Graves et al., 2013), and machine translation (Kalchbrenner and Blunsom, 2013; Zaremba et al., 2014).

[Figure 1 about here.]

[Figure 2 about here.]

Figure 1 illustrates the forward propagation of information through an example RNN, in which each RNN cell represents an instant in time: $O = [O_1, O_2, O_3, \cdots]$ are the internal variables in each RNN cell; $S = [S_1, S_2]$ are the inputs at each time step; $P = [P_1, P_2]$ are the outputs; $L = [L_1, L_2]$ are the labeled data; and $W = [W_1, W_2]$ are the trainable parameters at each time step. Mathematical operations relating the internal variables within a cell, and those relating internal variables across adjacent cells, are represented as arrows.

To train this network is to select trainable weights such that the labeled data $L$ and RNN output $P$ are as close as possible. Specifically, in training we determine the parameters $W$ through a gradient-based optimization involving the partial derivatives of the residual with respect to each $W_i$. These derivatives are determined through backpropagation, which is a repeated application of the chain rule, organized to resemble flow in the reverse direction along the arrows in the network. For the example RNN in Figure 1, this takes the form illustrated in Figure 2, in which the sequence $R = [R_1, R_2]$ represents the residuals at each time step. Within this example, to calculate the partial derivative of $R_2$ with respect to $W_2$, we backpropagate from node $R_2$ to $W_2$:
$$\frac{\partial R_2}{\partial W_2} = \frac{\partial R_2}{\partial P_2}\frac{\partial P_2}{\partial W_2} = \frac{\partial R_2}{\partial P_2}\frac{\partial P_2}{\partial O_4}\frac{\partial O_4}{\partial W_2}, \tag{1}$$
and to calculate the partial derivative of $R_2$ with respect to $W_1$, we backpropagate from $R_2$ all the way to $W_1$:
$$\frac{\partial R_2}{\partial W_1} = \frac{\partial R_2}{\partial P_2}\frac{\partial P_2}{\partial W_1} = \frac{\partial R_2}{\partial P_2}\frac{\partial P_2}{\partial O_4}\frac{\partial O_4}{\partial O_3}\frac{\partial O_3}{\partial O_2}\frac{\partial O_2}{\partial W_1}. \tag{2}$$
If the RNN is set up to propagate through two time steps, the gradient for $W_1$ is the sum of the contributions backpropagated from both $R_1$ and $R_2$. Real RNNs are more complex and involve propagation through larger numbers of time steps, but all are optimized through a process similar to this. These derivatives are the basis for gradients of the optimization misfit function; using them, the $W$ are updated and iterations continue.
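The same mechanics can be reproduced in a few lines of PyTorch, the library used in this study. The toy recurrence below is a minimal sketch of ours, not the paper's network: the specific cell operations are placeholder scalars chosen only so that the hand-derived chain rule of equations (1)-(2) can be checked against autograd.

import torch

# Two-step toy recurrence mirroring Figure 2: each "cell" maps the previous
# internal variable O and a trainable weight W to the next O, and the residual
# R is the absolute error between the cell output and the labeled data L.
W1 = torch.tensor(2.0, requires_grad=True)
W2 = torch.tensor(3.0, requires_grad=True)
S1, S2 = torch.tensor(1.0), torch.tensor(0.5)   # inputs at each time step
L2 = torch.tensor(10.0)                          # labeled data at step 2

O1 = W1 * S1            # internal variable of cell 1
O2 = O1 + S2            # information communicated between cells
P2 = W2 * O2            # output of cell 2
R2 = torch.abs(P2 - L2) # residual, absolute error (l1 norm) as in Figure 2

R2.backward()           # reverse traversal of the computational graph
# By hand: dR2/dW2 = sign(P2 - L2) * O2 = -2.5,
#          dR2/dW1 = sign(P2 - L2) * W2 * S1 = -3.0
print(W2.grad, W1.grad)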
RECURRENT NEURAL NETWORK FORMULATION OF EFWI

Wave propagation can be simulated using suitably-designed RNNs (Sun et al., 2020; Richardson, 2018; Hughes et al., 2019). Here we take the acoustic wave propagation approach of Sun et al. (2020) as a starting point, and formulate an RNN which simulates the propagation of an elastic wave through an isotropic elastic medium. The underlying equations are the 2D velocity-stress form of the elastodynamic equations (Virieux, 1986; Liu and Sen, 2009):
$$\begin{aligned}
\frac{\partial v_x}{\partial t} &= \frac{1}{\rho}\left(\frac{\partial \sigma_{xx}}{\partial x} + \frac{\partial \sigma_{xz}}{\partial z}\right)\\
\frac{\partial v_z}{\partial t} &= \frac{1}{\rho}\left(\frac{\partial \sigma_{xz}}{\partial x} + \frac{\partial \sigma_{zz}}{\partial z}\right)\\
\frac{\partial \sigma_{xx}}{\partial t} &= (\lambda + 2\mu)\frac{\partial v_x}{\partial x} + \lambda\frac{\partial v_z}{\partial z}\\
\frac{\partial \sigma_{zz}}{\partial t} &= (\lambda + 2\mu)\frac{\partial v_z}{\partial z} + \lambda\frac{\partial v_x}{\partial x}\\
\frac{\partial \sigma_{xz}}{\partial t} &= \mu\left(\frac{\partial v_x}{\partial z} + \frac{\partial v_z}{\partial x}\right),
\end{aligned} \tag{3}$$
where $v_x$ and $v_z$ are the $x$ and $z$ components of the particle velocity, and $\sigma_{xx}$, $\sigma_{zz}$ and $\sigma_{xz}$ are the three 2D components of the stress tensor. Discretized spatial distributions of the Lamé parameters $\lambda$ and $\mu$, and the density $\rho$, form the elastic model.

[Figure 3 about here.]

In Figure 3 the structure of an RNN cell which produces a staggered-grid finite difference solution for the velocity and stress fields is illustrated. At each time step the discrete sources $s_x$ and $s_z$ act as inputs; the velocity and stress information, $v_x^{t-1/2}$, $v_z^{t-1/2}$, $\sigma_{xx}^t$, $\sigma_{zz}^t$, and $\sigma_{xz}^t$, is communicated between the RNN cells; the partial derivative fields, $\partial_x \sigma_{xx}^t$, $\partial_z \sigma_{zz}^t$, $\partial_x \sigma_{xz}^t$, $\partial_z \sigma_{xz}^t$, $\partial_x v_x^{t+1/2}$, $\partial_z v_x^{t+1/2}$, $\partial_x v_z^{t+1/2}$, $\partial_z v_z^{t+1/2}$, are the internal variables in each RNN cell, which correspond to $O$ in Figure 2; and $\lambda$, $\mu$ and $\rho$ are included as trainable weights, which correspond to $W$ in Figure 2. In Algorithm 1, pseudocode detailing these calculations within the RNN cell is provided. The $*$ symbol represents the machine-learning image convolution operator: each element of the image is combined with its local neighbors, weighted by the convolution kernel. This image convolution operator is also capable of calculating spatial partial derivatives if the convolution kernel is designed according to the finite difference coefficients. Details about the image convolution operation can be found in Podlozhnyuk (2007). $dx$ and $dz$ are the grid intervals, and the image convolution kernels are $k_{x1} = a/dx$, $k_{x2} = b/dx$, $k_{z1} = a^T/dz$, and $k_{z2} = b^T/dz$, where $a = [0, 1/24, -9/8, 9/8, -1/24]$ and $b = [1/24, -9/8, 9/8, -1/24, 0]$ are $1\times 5$ vectors of fourth-order staggered-grid finite difference coefficients. $k_{x1}$ and $k_{x2}$ are the kernels, in the image convolution process, responsible for calculating the staggered-grid spatial partial derivatives in the $x$ direction; $k_{z1}$ and $k_{z2}$ are the corresponding kernels for the $z$ direction, which is why the arrays $a$ and $b$ are transposed in $k_{z1}$ and $k_{z2}$. Spatial partial derivatives calculated in this way are, mathematically, the same as in the conventional staggered-grid method (e.g., Virieux, 1986); in this study we simply carry the calculation out as an image convolution.
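As an illustration of the derivative-by-convolution idea, the sketch below applies a $1\times 5$ staggered-grid kernel to a smooth field with PyTorch's conv2d. The coefficient values follow the fourth-order stencil (9/8, 1/24) as reconstructed above, and the test field and grid spacing are arbitrary choices of ours.

import torch
import torch.nn.functional as F

# Fourth-order staggered-grid coefficients folded into a 1x5 kernel, following
# the a/dx construction in the text (exact value layout assumed).
dx = 4.0
a = torch.tensor([0.0, 1.0 / 24.0, -9.0 / 8.0, 9.0 / 8.0, -1.0 / 24.0])
k_x = (a / dx).view(1, 1, 1, 5)   # conv2d weight shape: (out_ch, in_ch, kH, kW)

# Smooth test field sampled on the grid: f(x) = sin(0.1 x)
nz, nx = 40, 90
x = torch.arange(nx, dtype=torch.float32) * dx
field = torch.sin(0.1 * x).repeat(nz, 1).view(1, 1, nz, nx)

# PyTorch's conv2d is a cross-correlation, so this evaluates the forward
# staggered x-derivative at the half grid points.
d_field = F.conv2d(field, k_x, padding=(0, 2))
print(d_field.shape)   # interior values approximate 0.1 cos(0.1 (x + dx/2))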
In Algorithm 1, in order to implement the PML boundary, all of the stress and velocity fields are split into their $x$ and $z$ components, and $d_x$ and $d_z$ are the PML damping coefficients in the $x$ and $z$ directions. The profile of $d_x$ can be expressed as:
$$d_x(i) = d_{x0}\left(\frac{i}{n_{pmlx}}\right)^p, \tag{4}$$
where $i$ is the PML layer number, counted from the boundary of the effective computational region, $n_{pmlx}$ is the number of PML layers in the $x$ direction, and $p$ is an integer taking values from 1 to 4. The amplitude $d_{x0}$ can be expressed as:
$$d_{x0} = \log\left(\frac{1}{R}\right)\frac{r V_s}{n_{pmlx}\,\delta_x}, \tag{5}$$
where $R$ is a theoretical reflection coefficient, $r$ is a value ranging from 3 to 4, and $\delta_x$ is the grid length in the $x$ direction. $d_z$ is calculated in the same way.

[Figure 4 about here.]

The activity of the RNN network is illustrated in "unfolded" form in Figure 4. Above the unfolded network, horizontal velocity fields associated with a point source at the top left of a model are plotted at three times during propagation of the wave information through the network, the third being at $t_{max}$, the maximum receiving time. The wave field values are stored at positions selected to match multicomponent receivers; these form shot records as time evolves, which become the output data at each time. These shot records correspond to the variables $P$ in Figures 1 and 2. For a synthetic FWI problem the observed data are obtained from the true model; for real seismic data inversion they are obtained from a field survey. In this RNN-based elastic FWI the observed data are treated as the labeled data. The residuals are calculated at the last computational time, as Figure 4 illustrates, and these, together with a selected norm, define the misfit function used to train the network.

Algorithm 2 describes the training of the network, i.e., the process of elastic RNN FWI. Partial derivatives of the residual with respect to the trainable parameters (in this case $\lambda$, $\mu$ and $\rho$) are calculated through backpropagation using automatic differentiation, as set out in the previous section. Once the gradients are available, an optimization method and step length are used to update the trainable parameters, reduce the misfit, and begin another iteration. In step 4, RNN() is the network discussed above, whose output is the synthetic data; costFunc(), in step 5, is the misfit or loss function chosen to measure the difference between the synthetic and observed data; loss.backward() begins the backpropagation within the computational graph and produces the gradients for each parameter, with which the current parameter model can be updated and another iteration started.
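A minimal PyTorch sketch of the Algorithm 2 loop follows. Here elastic_rnn is a differentiable stand-in of ours for the unrolled wave-propagation network, which we do not reproduce; only the loop structure, the per-parameter optimizers, and the loss.backward() call reflect the actual procedure.

import torch

def elastic_rnn(lam, mu, rho):
    # Stand-in for the unrolled RNN cells: returns "shot records" that depend
    # differentiably on the models so that loss.backward() has a graph to walk.
    return torch.sqrt((lam[0, :] + 2.0 * mu[0, :]) / rho[0, :])

# Trainable parameter grids (lambda, mu, rho), as in Algorithm 2.
lam = torch.full((40, 90), 9.0e9, requires_grad=True)
mu = torch.full((40, 90), 5.0e9, requires_grad=True)
rho = torch.full((40, 90), 2000.0, requires_grad=True)
d_obs = torch.full((90,), 3200.0)              # placeholder observed data

# One optimizer per parameter class; learning rates are placeholders.
optimizers = [torch.optim.Adam([lam], lr=1e8),
              torch.optim.Adam([mu], lr=1e8),
              torch.optim.Adam([rho], lr=10.0)]

for it in range(50):
    d_syn = elastic_rnn(lam, mu, rho)          # step 4: synthetic data
    loss = 0.5 * ((d_syn - d_obs) ** 2).sum()  # step 5: misfit (l2 here)
    for opt in optimizers:
        opt.zero_grad()
    loss.backward()                            # step 6: AD gradients
    for opt in optimizers:
        opt.step()                             # step 7: parameter update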
The gradient calculated within the training process above essentially reproduces the adjoint-state calculations within FWI, as discussed by Sun et al. (2020). Formulated as a problem of RNN training, the gradient calculation occurs rapidly and in a manner suitable for cloud computational architectures; it also allows the researcher to efficiently alter misfit function choices and parameterization in order to do high-level optimization. However, it involves the storage of the whole wavefield, and thus should be expected to have significant memory requirements.
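One standard mitigation of this storage cost, not used in the paper, is gradient checkpointing via PyTorch's torch.utils.checkpoint, which caches only block boundaries and re-computes the interior time steps during the backward pass. The sketch below is our illustration of the idea with a placeholder cell update.

import torch
from torch.utils.checkpoint import checkpoint

def time_block(fields, lam, mu, rho):
    # Placeholder for a block of RNN cells; only the block's input and output
    # are cached, and the interior states are re-computed on backward().
    for _ in range(50):
        fields = fields + 1.0e-12 * (lam + 2.0 * mu) / rho
    return fields

fields = torch.zeros(40, 90)
lam = torch.full((40, 90), 9.0e9, requires_grad=True)
mu = torch.full((40, 90), 5.0e9, requires_grad=True)
rho = torch.full((40, 90), 2000.0, requires_grad=True)

for block in range(10):   # 500 time steps, but only 10 cached block states
    fields = checkpoint(time_block, fields, lam, mu, rho, use_reentrant=False)

fields.sum().backward()   # gradients w.r.t. lam, mu, rho flow as usual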
MISFITS WITH HIGH-ORDER TV REGULARIZATION

Here we first introduce the elastic RNN misfit based on the $l_2$ norm with high-order TV regularization:
$$\begin{aligned}
\Phi_{l2}^{TV}(m_\lambda, m_\mu, m_\rho, \alpha_{\lambda 1}, \alpha_{\mu 1}, \alpha_{\rho 1}, \alpha_{\lambda 2}, \alpha_{\mu 2}, \alpha_{\rho 2}) = \; &\frac{1}{2}\left\| D_{syn}(m_\lambda, m_\mu, m_\rho) - D_{obs} \right\|_2^2 \\
&+ \alpha_{\lambda 1}\Theta_{TV}(m_\lambda) + \alpha_{\mu 1}\Theta_{TV}(m_\mu) + \alpha_{\rho 1}\Theta_{TV}(m_\rho) \\
&+ \alpha_{\lambda 2}\Upsilon_{TV}(m_\lambda) + \alpha_{\mu 2}\Upsilon_{TV}(m_\mu) + \alpha_{\rho 2}\Upsilon_{TV}(m_\rho),
\end{aligned} \tag{6}$$
where $\alpha_{\lambda 1}, \alpha_{\mu 1}, \alpha_{\rho 1}, \alpha_{\lambda 2}, \alpha_{\mu 2}, \alpha_{\rho 2}$ are Lagrange multipliers, $D_{syn}(m_\lambda, m_\mu, m_\rho)$ represents the synthetic data, which is a function of the model parameters (here $\lambda$, $\mu$ and $\rho$), and $\Theta_{TV}$ and $\Upsilon_{TV}$ are the functions calculating the first- and second-order TV regularization terms of the models, respectively.

The first-order TV regularization term can be expressed as:
$$TV_1(m) = \sum_{i=1}^{n-1}\sum_{j=1}^{m-1}\left| M_{i+1,j} - M_{i,j} \right| + \sum_{i=1}^{n-1}\sum_{j=1}^{m-1}\left| M_{i,j+1} - M_{i,j} \right| = \left\|\left(\nabla_x, \nabla_z\right)m\right\|_1 = \left\|\left(L_x, L_z\right)m\right\|_1 = \Theta_{TV}(m). \tag{7}$$
The second-order TV regularization term can be expressed as:
$$TV_2(m) = \sum_{i=2}^{n-1}\sum_{j=2}^{m-1}\left| M_{i+1,j} - 2M_{i,j} + M_{i-1,j} \right| + \sum_{i=2}^{n-1}\sum_{j=2}^{m-1}\left| M_{i,j+1} - 2M_{i,j} + M_{i,j-1} \right| = \left\|\left(\nabla_{xx}, \nabla_{zz}\right)m\right\|_1 = \left\|\left(K_{xx}, K_{zz}\right)m\right\|_1 = \Upsilon_{TV}(m). \tag{8}$$
The derivative of $\Phi_{l2}^{TV}$ with respect to each parameter, which is the gradient for $\lambda$, $\mu$ and $\rho$ based on the $l_2^{TV}$ norm, can be expressed as:
$$\begin{bmatrix} \partial \Phi_{l2}^{TV}/\partial m_\lambda \\ \partial \Phi_{l2}^{TV}/\partial m_\mu \\ \partial \Phi_{l2}^{TV}/\partial m_\rho \end{bmatrix} = \begin{bmatrix} G_\lambda^{l2} \\ G_\mu^{l2} \\ G_\rho^{l2} \end{bmatrix} + \begin{bmatrix} R_\lambda \\ R_\mu \\ R_\rho \end{bmatrix}, \tag{9}$$
where $G_\lambda^{l2}$, $G_\mu^{l2}$, $G_\rho^{l2}$ are the data-misfit gradients for $\lambda$, $\mu$, $\rho$, and $R_\lambda$, $R_\mu$, $R_\rho$ are the regularization terms, whose mathematical expressions are:
$$\begin{bmatrix} R_\lambda \\ R_\mu \\ R_\rho \end{bmatrix} = \begin{bmatrix} \left(\alpha_{\lambda 1} L_x^T Q_{x\lambda} L_x + \alpha_{\lambda 1} L_z^T Q_{z\lambda} L_z + \alpha_{\lambda 2} K_{xx}^T Q_{xx\lambda} K_{xx} + \alpha_{\lambda 2} K_{zz}^T Q_{zz\lambda} K_{zz}\right) m_\lambda \\ \left(\alpha_{\mu 1} L_x^T Q_{x\mu} L_x + \alpha_{\mu 1} L_z^T Q_{z\mu} L_z + \alpha_{\mu 2} K_{xx}^T Q_{xx\mu} K_{xx} + \alpha_{\mu 2} K_{zz}^T Q_{zz\mu} K_{zz}\right) m_\mu \\ \left(\alpha_{\rho 1} L_x^T Q_{x\rho} L_x + \alpha_{\rho 1} L_z^T Q_{z\rho} L_z + \alpha_{\rho 2} K_{xx}^T Q_{xx\rho} K_{xx} + \alpha_{\rho 2} K_{zz}^T Q_{zz\rho} K_{zz}\right) m_\rho \end{bmatrix}, \tag{10}$$
in which the first- and second-difference vectors are
$$\begin{bmatrix} q_{x\lambda} & q_{x\mu} & q_{x\rho} \\ q_{z\lambda} & q_{z\mu} & q_{z\rho} \\ q_{xx\lambda} & q_{xx\mu} & q_{xx\rho} \\ q_{zz\lambda} & q_{zz\mu} & q_{zz\rho} \end{bmatrix} = \begin{bmatrix} L_x \\ L_z \\ K_{xx} \\ K_{zz} \end{bmatrix}\begin{pmatrix} m_\lambda, & m_\mu, & m_\rho \end{pmatrix}, \tag{11}$$
and the diagonal weighting matrices are built from the elementwise absolute inverses,
$$\begin{bmatrix} Q_{x\lambda} & Q_{x\mu} & Q_{x\rho} \\ Q_{z\lambda} & Q_{z\mu} & Q_{z\rho} \\ Q_{xx\lambda} & Q_{xx\mu} & Q_{xx\rho} \\ Q_{zz\lambda} & Q_{zz\mu} & Q_{zz\rho} \end{bmatrix} = \begin{bmatrix} 1/\left| q_{x\lambda} \right| & 1/\left| q_{x\mu} \right| & 1/\left| q_{x\rho} \right| \\ 1/\left| q_{z\lambda} \right| & 1/\left| q_{z\mu} \right| & 1/\left| q_{z\rho} \right| \\ 1/\left| q_{xx\lambda} \right| & 1/\left| q_{xx\mu} \right| & 1/\left| q_{xx\rho} \right| \\ 1/\left| q_{zz\lambda} \right| & 1/\left| q_{zz\mu} \right| & 1/\left| q_{zz\rho} \right| \end{bmatrix}. \tag{12}$$
Here $T$ denotes the matrix transpose. For example, $q_{x\lambda} = L_x m_\lambda$ is the first-order difference of the $\lambda$ model in the $x$ direction, and $Q_{x\lambda}$ weights it by $1/|q_{x\lambda}|$, so that $L_x^T Q_{x\lambda} L_x m_\lambda$ is the subgradient of $\|L_x m_\lambda\|_1$; similarly, $q_{xx\lambda} = K_{xx} m_\lambda$ is the second-order difference in the $x$ direction, weighted by $Q_{xx\lambda}$. The other terms in equations (9) and (10) follow in the same way. $L_x$ and $L_z$ are the first-order difference operators giving the first-order total variations in the $x$ and $z$ directions, respectively, and $K_{xx}$ and $K_{zz}$ are the second-order difference operators giving the second-order total variations in the $x$ and $z$ directions, respectively.
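The TV terms of equations (7) and (8) translate directly into tensor differences. The following sketch is our own, with arbitrary multiplier values; it lets PyTorch's autograd supply the subgradient of equation (10) rather than assembling the $L^T Q L$ operators explicitly.

import torch

def theta_tv(M):
    # First-order TV, eq. (7): absolute forward differences in both directions.
    return ((M[1:, :] - M[:-1, :]).abs().sum()
            + (M[:, 1:] - M[:, :-1]).abs().sum())

def upsilon_tv(M):
    # Second-order TV, eq. (8): absolute second differences in both directions.
    return ((M[2:, :] - 2 * M[1:-1, :] + M[:-2, :]).abs().sum()
            + (M[:, 2:] - 2 * M[:, 1:-1] + M[:, :-2]).abs().sum())

m = torch.rand(40, 90, requires_grad=True)        # one model class, e.g. lambda
reg = 0.1 * theta_tv(m) + 0.05 * upsilon_tv(m)    # alpha_1, alpha_2 placeholders
reg.backward()    # autograd returns the regularization gradient of eq. (10)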
If we were to use the $l_1$ norm instead, the misfit would be:
$$\begin{aligned}
\Phi_{l1}^{TV}(m_\lambda, m_\mu, m_\rho, \alpha_{\lambda 1}, \alpha_{\mu 1}, \alpha_{\rho 1}, \alpha_{\lambda 2}, \alpha_{\mu 2}, \alpha_{\rho 2}) = \; &\left\| D_{syn}(m_\lambda, m_\mu, m_\rho) - D_{obs} \right\|_1 \\
&+ \alpha_{\lambda 1}\Theta_{TV}(m_\lambda) + \alpha_{\mu 1}\Theta_{TV}(m_\mu) + \alpha_{\rho 1}\Theta_{TV}(m_\rho) \\
&+ \alpha_{\lambda 2}\Upsilon_{TV}(m_\lambda) + \alpha_{\mu 2}\Upsilon_{TV}(m_\mu) + \alpha_{\rho 2}\Upsilon_{TV}(m_\rho),
\end{aligned} \tag{13}$$
and the gradient for each parameter based on the $l_1$ norm is:
$$\begin{bmatrix} \partial \Phi_{l1}^{TV}/\partial m_\lambda \\ \partial \Phi_{l1}^{TV}/\partial m_\mu \\ \partial \Phi_{l1}^{TV}/\partial m_\rho \end{bmatrix} = \begin{bmatrix} G_\lambda^{l1} \\ G_\mu^{l1} \\ G_\rho^{l1} \end{bmatrix} + \begin{bmatrix} R_\lambda \\ R_\mu \\ R_\rho \end{bmatrix}, \tag{14}$$
where $G_\lambda^{l1}$, $G_\mu^{l1}$, $G_\rho^{l1}$ are the gradients for $\lambda$, $\mu$, $\rho$ using the $l_1$ norm as the misfit function. The $l_1$-norm gradient calculation uses a different adjoint source (Pyun et al., 2009; Brossier et al., 2010): the adjoint source for the $l_1$ adjoint fields is $\Delta d / \left| \Delta d \right|$, which, in the case of real arithmetic, corresponds to the sign function. In this study we did not encounter the condition $\Delta d = 0$. The detailed gradient expressions, via the adjoint-state method, for the parameters $\lambda$, $\mu$ and $\rho$ based on the $l_2$ and $l_1$ norms are:
$$G_\lambda^{l2} = -\sum_{x_s}\sum_{x_g}\int_0^T \left( \partial_x \tilde{u}_x(\mathbf{r}, \mathbf{r}_s, t) + \partial_z \tilde{u}_z(\mathbf{r}, \mathbf{r}_s, t) \right)\left( \partial_x \tilde{u}_x^{*l2}(\mathbf{r}, \mathbf{r}_g, T-t) + \partial_z \tilde{u}_z^{*l2}(\mathbf{r}, \mathbf{r}_g, T-t) \right) dt, \tag{15}$$
$$\begin{aligned}
G_\mu^{l2} = -\sum_{x_s}\sum_{x_g}\int_0^T \Big( &\left( \partial_z \tilde{u}_x(\mathbf{r}, \mathbf{r}_s, t) + \partial_x \tilde{u}_z(\mathbf{r}, \mathbf{r}_s, t) \right)\left( \partial_z \tilde{u}_x^{*l2}(\mathbf{r}, \mathbf{r}_g, T-t) + \partial_x \tilde{u}_z^{*l2}(\mathbf{r}, \mathbf{r}_g, T-t) \right) \\
&+ 2\left( \partial_x \tilde{u}_x(\mathbf{r}, \mathbf{r}_s, t)\, \partial_x \tilde{u}_x^{*l2}(\mathbf{r}, \mathbf{r}_g, T-t) + \partial_z \tilde{u}_z(\mathbf{r}, \mathbf{r}_s, t)\, \partial_z \tilde{u}_z^{*l2}(\mathbf{r}, \mathbf{r}_g, T-t) \right) \Big)\, dt,
\end{aligned} \tag{16}$$
$$G_\rho^{l2} = \sum_{x_s}\sum_{x_g}\int_0^T \left( \partial_t \tilde{u}_x(\mathbf{r}, \mathbf{r}_s, t)\, \partial_t \tilde{u}_x^{*l2}(\mathbf{r}, \mathbf{r}_g, T-t) + \partial_t \tilde{u}_z(\mathbf{r}, \mathbf{r}_s, t)\, \partial_t \tilde{u}_z^{*l2}(\mathbf{r}, \mathbf{r}_g, T-t) \right) dt, \tag{17}$$
with the $l_1$ gradients $G_\lambda^{l1}$, $G_\mu^{l1}$ and $G_\rho^{l1}$ (equations 18-20) taking exactly the same forms, but with the adjoint fields $\tilde{u}_x^{*l1}$ and $\tilde{u}_z^{*l1}$ in place of $\tilde{u}_x^{*l2}$ and $\tilde{u}_z^{*l2}$. Here $\tilde{u}_x^{*l1}$ and $\tilde{u}_z^{*l1}$ are the adjoint wavefields generated by the $l_1$-norm adjoint source, and $\tilde{u}_x^{*l2}$ and $\tilde{u}_z^{*l2}$ are those generated by the $l_2$-norm adjoint source. $T$ is the total receiving time for the shot records, $\mathbf{r}_s$ and $\mathbf{r}_g$ represent the source and receiver locations respectively, and $\mathbf{r}$ represents the model perturbation locations for the $\lambda$, $\mu$ and $\rho$ models. Figure 5 compares the gradients calculated using the adjoint-state method and automatic differentiation: Figure 5 (a), (b), (c) are the normalized $\lambda$, $\mu$ and $\rho$ gradients calculated using the adjoint-state method, and Figure 5 (d), (e), (f) are the normalized $\lambda$, $\mu$ and $\rho$ gradients calculated using the AD method. The gradients calculated using the AD method contain more information about the model, for instance in its lower part, indicating that they can better reconstruct the model.
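The difference between the two adjoint sources can be checked directly with autograd: the gradient of the $l_1$ data misfit with respect to the synthetic data is the sign of the residual, while for the halved, squared $l_2$ misfit it is the residual itself. The data values below are arbitrary.

import torch

d_obs = torch.tensor([0.1, -0.1, 0.0, 1.0])

d_syn = torch.tensor([0.2, -0.5, 0.0, 1.3], requires_grad=True)
loss_l1 = (d_syn - d_obs).abs().sum()
loss_l1.backward()
print(d_syn.grad)    # sign(d_syn - d_obs); the zero residual maps to 0

d_syn2 = torch.tensor([0.2, -0.5, 0.0, 1.3], requires_grad=True)
loss_l2 = 0.5 * ((d_syn2 - d_obs) ** 2).sum()
loss_l2.backward()
print(d_syn2.grad)   # the raw residual d_syn - d_obs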
Now we rewrite the misfit function as:
$$\Phi^{TV} = J_D + J_{r1} + J_{r2}, \tag{21}$$
where $J_D$ represents the data misfit between observed and synthetic data under whichever norm is chosen, $J_{r1} = \alpha_{\lambda 1}\Theta_{TV}(m_\lambda) + \alpha_{\mu 1}\Theta_{TV}(m_\mu) + \alpha_{\rho 1}\Theta_{TV}(m_\rho)$, and $J_{r2} = \alpha_{\lambda 2}\Upsilon_{TV}(m_\lambda) + \alpha_{\mu 2}\Upsilon_{TV}(m_\mu) + \alpha_{\rho 2}\Upsilon_{TV}(m_\rho)$. The values $\alpha_{\lambda 1}, \alpha_{\mu 1}, \alpha_{\rho 1}, \alpha_{\lambda 2}, \alpha_{\mu 2}, \alpha_{\rho 2}$ are chosen according to the following formula (Guitton et al., 2012):
$$T = \frac{J_D}{J_{r1} + J_{r2}}. \tag{22}$$
We control the values of $\alpha_{\lambda 1}, \alpha_{\mu 1}, \alpha_{\rho 1}, \alpha_{\lambda 2}, \alpha_{\mu 2}, \alpha_{\rho 2}$ to keep the value of $T$ between 1 and 10; $T$ should be relatively large when noise occurs in the data (Xiang and Zhang, 2016).

[Figure 5 about here.]
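A small sketch of this tuning rule: evaluate $T$ with the current multipliers and rescale them so that $T$ lands at a chosen point inside the 1-10 window. The target value and the multiplier list are placeholders of ours.

def rescale_multipliers(J_D, J_r1, J_r2, alphas, target_T=5.0):
    # T = J_D / (J_r1 + J_r2); multiplying every alpha by T / target_T scales
    # the regularization terms so that the ratio moves to target_T.
    T = J_D / (J_r1 + J_r2)
    scale = T / target_T
    return [a * scale for a in alphas], T

alphas, T = rescale_multipliers(
    J_D=2.0e3, J_r1=40.0, J_r2=10.0,
    alphas=[0.1, 0.1, 0.1, 0.05, 0.05, 0.05])
print(T, alphas)   # T = 40 here, so every multiplier is scaled up by 8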
Parameterization testing

By modifying the RNN cells, and changing the trainable parameters, we can examine the influence of parameterization on waveform inversion within the deep learning formulation. Three sets of parameter classes are considered: the velocity parameterization (D-V model), involving P-wave velocity, S-wave velocity, and density; the modulus parameterization (D-M model), involving the Lamé parameters $\lambda$ and $\mu$, and density; and the stiffness-matrix parameterization (D-S model), involving $C_{11}$, $C_{44}$ and density $\rho$. A change of variables connecting the classes is sketched below.
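The conversions use the standard isotropic relations $\mu = \rho V_S^2$, $\lambda = \rho(V_P^2 - 2V_S^2)$, $C_{11} = \rho V_P^2$ and $C_{44} = \rho V_S^2$; the velocity and density values below are placeholders.

import torch

def dv_to_dm(vp, vs, rho):
    # D-V to D-M: Lame parameters from velocities and density.
    mu = rho * vs ** 2
    lam = rho * vp ** 2 - 2.0 * mu
    return lam, mu, rho

def dv_to_ds(vp, vs, rho):
    # D-V to D-S: stiffness parameters C11 = rho Vp^2, C44 = rho Vs^2.
    return rho * vp ** 2, rho * vs ** 2, rho

vp = torch.full((40, 90), 2500.0)
vs = torch.full((40, 90), 1400.0)
rho = torch.full((40, 90), 2100.0)
lam, mu, _ = dv_to_dm(vp, vs, rho)
c11, c44, _ = dv_to_ds(vp, vs, rho)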
In these tests, the size of each model is 40 × 90 grid points, and seven source points are evenly distributed across the surface of the model; the source is a Ricker wavelet with a dominant frequency of 30 Hz. The grid length of the model is $dx = dz = 4$ m. In Figure 6, (a)-(b) are the true and initial $V_P$ models, (c)-(d) are the true and initial $V_S$ models, and (e)-(f) are the true and initial $\rho$ models used in this test.

In Figure 7 are the inversion results using the D-V parameterization. Figure 7 (a)-(d) are the inversion results for $V_P$ generated by the $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norms; Figure 7 (e)-(h) are the inversion results for $V_S$; Figure 7 (i)-(l) are the inversion results for $\rho$. In Figure 7 (a), (e) and (i) we can see the cross-talk between different parameters, as the black arrows indicate. In Figure 7 (b), (c) and (d) we can see that, by changing the misfits and adding high-order TV regularization, the cross-talk between $V_P$ and $\rho$ has been reduced. In Figure 7 (h) and (l) we can see that by using the $l_1^{TV}$ norm the cross-talk between density and $V_S$ has been mitigated, while in (j) and (k) we still see cross-talk between $V_S$ and density. From Figure 7 we can see that the $l_1$ norm with high-order TV regularization can help to mitigate the cross-talk problem.

[Figure 6 about here.]

[Figure 7 about here.]

Next the modulus parameterization is examined, in which we seek to recover the $\lambda$, $\mu$ and $\rho$ models. This occurs through a straightforward modification of the RNN cell, and again a change in the trainable parameters from velocities to moduli. Figure 8 (a)-(d) are the inversion results for $\lambda$ generated by the $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norms; Figure 8 (e)-(h) are the inversion results for $\mu$; Figure 8 (i)-(l) are the inversion results for $\rho$. In Figure 8 (b) and (d) we can see that using high-order TV regularization with the $l_2$ and $l_1$ norms mitigates the cross-talk between density and $\lambda$. Figure 8 (j) shows that in this parameterization the high-order TV regularization with the $l_2$ norm provides better inversion results for density as well.

[Figure 8 about here.]

Inversion results for models in the D-S parameterization are plotted in Figure 9, generated using the change of variables $C_{11} = V_P^2\rho$ and $C_{44} = V_S^2\rho$. Figure 9 (a)-(d) are the inversion results for $C_{11}$ generated by the $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norms; Figure 9 (e)-(h) are the inversion results for $C_{44}$; Figure 9 (i)-(l) are the inversion results for $\rho$. In Figure 9 we can still see cross-talk between
$C_{44}$ and
$C_{11}$, as the black arrows indicate. However, this cross-talk has been mitigated in Figure 9 (d) by using the $l_1^{TV}$ norm misfit. The inversion results above show that RNN-based high-order TV regularized FWI using the $l_1$ and $l_2$ norms has the ability to mitigate the cross-talk problem with only gradient-based optimization methods. The $l_1^{TV}$ RNN FWI in the D-M parameterization and the $l_1^{TV}$ RNN FWI in the D-S parameterization provide better inversion results than the other inversion tests.

[Figure 9 about here.]

[Figure 10 about here.]

[Figure 11 about here.]

[Figure 12 about here.]

[Figure 13 about here.]

Next, we verify the proposed methods on the overthrust model. Figure 10 (a), (c), (e) show the true $V_P$, $V_S$, and density $\rho$ models, and (b), (d) and (f) are the corresponding initial models. The size of the model is 121 × … grid points. Figure 11 (a) is the inversion result for $V_P$, Figure 11 (b) is the inversion for $V_S$, and Figure 11 (c) is the inversion for $\rho$. From Figure 11 we can see that the overall resolution of the conventional FWI inversion is poor.

Figure 12 shows the inversion results using the D-M parameterization. Figure 12 (a)-(c) are the $\lambda$ inversion results for the $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norms respectively, (d)-(f) are the corresponding $\mu$ results, and (g)-(i) are the corresponding $\rho$ results. In this parameterization we get unstable inversion results for the parameter $\lambda$. However, in the D-M parameterization, Figure 12 (f), the small half-arc structure at around 1000 m in the model has been recovered, as the black arrows indicate. Figure 13 shows the profiles through the recovered elastic moduli at 1000 m of the models based on the D-M parameterization. In Figure 13, the black lines are the true values, the yellow lines are the initial values, the red lines are the $l_2$ norm inversion results, the green lines are the $l_2^{TV}$ norm results and the blue lines are the $l_1^{TV}$ norm results. Compared with the true lines we can see in Figure 13 (b) and (c) that the $l_1^{TV}$ norm inversion results are closest to the true values.

[Figure 14 about here.]

[Figure 15 about here.]

[Figure 16 about here.]

Figure 14 shows the inversion results using the D-V parameterization. In this parameterization all three parameters are stable. Figure 14 (a)-(c) are the $V_P$ inversion results for the $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norms respectively, (d)-(f) are the corresponding $V_S$ results, and (g)-(i) are the corresponding $\rho$ results. Compared with the true models, we can see that the inversion results generated using the plain $l_2$ norm are less robust than those of the other misfits; the $l_1$ norm with high-order TV regularization provides more accurate inversion results. This can also be seen from Figure 15, which shows the profiles through the recovered models at 1000 m. In Figure 15, the black lines are the true velocities and density, the yellow lines are the initial values, the red lines are the $l_2$ norm inversion results, the green lines are the $l_2^{TV}$ norm results and the blue lines are the $l_1^{TV}$ norm results. Figure 15 (a) shows the results for $V_P$, Figure 15 (b) shows the results for $V_S$, and Figure 15 (c) shows the results for $\rho$.
In all three figures we can see that the blue lines are closer to the true values than the other lines, which means that the $l_1^{TV}$ norm provides more accurate inversion results. Figure 16 shows the D-V parameterization model misfits at each iteration: the red lines are the inversion using the $l_2$ norm, the green lines the $l_2^{TV}$ norm, and the blue lines the $l_1^{TV}$ norm. Figure 16 shows that we can obtain higher-accuracy inversion results with fewer iterations by using the $l_1^{TV}$ norm.

Figure 17 shows the inversion results using the D-S parameterization. Figure 17 (a)-(c) are the $C_{11}$ inversion results for the $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norms respectively, (d)-(f) are the corresponding $C_{44}$ results, and (g)-(i) are the corresponding $\rho$ results. In this parameterization the small half-arc structure at around 1000 m in the model is clearly resolved by using the $l_1^{TV}$ norm; compared with the other misfits, the $l_1^{TV}$ norm again provides the best resolution of the model. Figure 18 shows the profiles through the recovered models at 1000 m based on the D-S parameterization; the red, green and blue lines are the $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norm inversion results respectively. We can see that in the D-S parameterization the $l_1^{TV}$ norm likewise provides inversion results that are closest to the true values. Figure 19 shows how the model misfits in the D-S parameterization change at each iteration. The blue line, the $l_1^{TV}$ norm inversion, has the fastest model-misfit decline rate. We conclude that in this parameterization the $l_1^{TV}$ norm can generate more accurate inversion results with fewer iterations.

[Figure 17 about here.]

[Figure 18 about here.]

[Figure 19 about here.]

RANDOM NOISE TESTING
In this section, we test the sensitivity of this deep learning method to data contaminated with random noise drawn from a Gaussian distribution. In Figure 20, we plot a trace from an example shot profile with different ratios of noise.

[Figure 20 about here.]

In (a), the red signal is the record without noise, and the blue line is the record with Gaussian random noise; the mean value of the noise is zero, and the standard deviation is 0.3 ($std = 0.3$). In (b) and (c), records with $std = 0.5$ and $std = 1$ are plotted.

[Figure 21 about here.]

In Figure 21 (a), (e) and (i) the inversion results from noise-free inversion based on the RNN are plotted. In Figure 21 (b), (f) and (j), the inversion results with noise at $std = 0.3$ for $V_P$, $V_S$, and $\rho$ are plotted, respectively; in Figure 21 (c), (g) and (k), the inversion results with noise at $std = 0.5$ for $V_P$, $V_S$, and $\rho$ are plotted; and Figure 21 (d), (h) and (l) are likewise for $std = 1.0$.
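For reference, the contamination itself is a one-line operation. The sketch below, with placeholder records, adds zero-mean Gaussian noise at the three standard deviations used here and reports the resulting signal-to-noise ratio, a diagnostic of ours rather than one quoted in the paper.

import torch

torch.manual_seed(0)
d_obs = torch.randn(1000, 7)            # placeholder normalized shot records
for std in (0.3, 0.5, 1.0):
    noisy = d_obs + std * torch.randn_like(d_obs)
    snr = 10.0 * torch.log10(d_obs.pow(2).mean()
                             / (noisy - d_obs).pow(2).mean())
    print(std, round(snr.item(), 2))    # SNR falls as the noise std grows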
We conclude that a moderate amount of random error in the data used for the RNN training leads to acceptable results, though some blurring is introduced and detail in the structure is lost. The $V_S$ recovery appears to be much more sensitive to noise than are those of $V_P$ or $\rho$.

CONCLUSIONS
Elastic multi-parameter full waveform inversion can be formulated as a strongly constrained, theory-based deep-learning network. Specifically, a recurrent neural network, set up with rules enforcing elastic wave propagation, with the wavefield projected onto a measurement surface acting as the labeled data to be compared with observed seismic data, recapitulates elastic FWI but with both (1) the opportunity for data-driven learning to be incorporated, and (2) a design supported by powerful cloud computing architectures. Each cell of the recurrent neural network is designed according to the isotropic-elastic wave equation.

The partial derivatives of the data residual with respect to the trainable parameters which represent the elastic media are calculated using automatic differentiation. With automatic differentiation, gradients can be calculated via the chain rule, guided by backpropagation along the paths within the computational graph. Automatic differentiation produces a high level of computational efficiency and scalability for the calculation of gradients for the different parameters in elastic media. The formulation is suitable for exploring numerical features of different misfits and different parameterizations, with an aim of improving the resolution of the recovered elastic models and mitigating cross-talk.

We compared RNN waveform inversions based on the $l_1$ and $l_2$ norms with high-order total variation regularization. We used this RNN synthetic environment to compare the density-velocity (P-wave velocity, S-wave velocity, and density), modulus-density (Lamé parameters and density) and D-S ($C_{11}$, $C_{44}$ and density) parameterizations, and their exposure to cross-talk for the varying misfit functions. Our results suggest generally that this approach to full waveform inversion is consistent and stable, with the D-S parameterization under the $l_1^{TV}$ misfit providing the most accurate recoveries in our tests.

REFERENCES
Alkhalifah, T., and R.-É. Plessix, 2014, A recipe for practical full-waveform inversion in anisotropic media: An analytical parameter resolution study: Geophysics, R91–R101.
Bar-Sinai, Y., S. Hoyer, J. Hickey, and M. P. Brenner, 2019, Learning data-driven discretizations for partial differential equations: Proceedings of the National Academy of Sciences, 15344–15349.
Brossier, R., S. Operto, and J. Virieux, 2010, Which data residual norm for robust elastic frequency-domain full waveform inversion?: Geophysics, R37–R46.
Chen, Y., M. Zhang, M. Bai, and W. Chen, 2019, Improving the signal-to-noise ratio of seismological datasets by unsupervised machine learning: Seismological Research Letters.
Downton, J. E., and D. P. Hampson, 2018, Deep neural networks to predict reservoir properties from seismic: CSEG Geoconvention.
Graves, A., A.-r. Mohamed, and G. Hinton, 2013, Speech recognition with deep recurrent neural networks: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 6645–6649.
Guitton, A., G. Ayeni, and E. Díaz, 2012, Constrained full-waveform inversion by model reparameterization: Geophysics, R117–R127.
Hughes, T. W., I. A. Williamson, M. Minkov, and S. Fan, 2019, Wave physics as an analog recurrent neural network: Science Advances, eaay6946.
Innanen, K. A., 2014, Seismic AVO and the inverse Hessian in precritical reflection full waveform inversion: Geophysical Journal International, 717–734.
Jin, G., K. Mendoza, B. Roy, and D. G. Buswell, 2019, Machine learning-based fracture-hit detection algorithm using LFDAS signal: The Leading Edge, 520–524.
Kalchbrenner, N., and P. Blunsom, 2013, Recurrent continuous translation models: 1700–1709.
Kamath, N., R. Brossier, L. Métivier, and P. Yang, 2018, 3D acoustic/visco-acoustic time-domain FWI of OBC data from the Valhall field, in SEG Technical Program Expanded Abstracts 2018: Society of Exploration Geophysicists, 1093–1097.
Kamei, R., and R. Pratt, 2013, Inversion strategies for visco-acoustic waveform inversion: Geophysical Journal International, 859–884.
Karpatne, A., G. Atluri, J. H. Faghmous, M. Steinbach, A. Banerjee, A. Ganguly, S. Shekhar, N. Samatova, and V. Kumar, 2017, Theory-guided data science: A new paradigm for scientific discovery from data: IEEE Transactions on Knowledge and Data Engineering, 2318–2331.
Keating, S., and K. A. Innanen, 2017, Crosstalk and frequency bands in truncated Newton an-acoustic full-waveform inversion: 1416–1421.
——–, 2019, Parameter crosstalk and modeling errors in viscoacoustic seismic full-waveform inversion: Geophysics, R641–R653.
Köhn, D., D. De Nil, A. Kurzmann, A. Przebindowska, and T. Bohlen, 2012, On the influence of model parametrization in elastic full waveform tomography: Geophysical Journal International, 325–345.
Lailly, P., 1983, The seismic inverse problem as a sequence of before stack migrations: Conference on Inverse Scattering–Theory and Application, SIAM, 206.
Li, C., Y. Zhang, and C. C. Mosher, 2019a, A hybrid learning-based framework for seismic denoising: The Leading Edge, 542–549.
Li, D., K. Xu, J. M. Harris, and E. Darve, 2019b, Time-lapse full waveform inversion for subsurface flow problems with intelligent automatic differentiation: arXiv preprint arXiv:1912.07552.
Liu, Y., and M. K. Sen, 2009, An implicit staggered-grid finite-difference method for seismic modelling: Geophysical Journal International, 459–474.
Mikolov, T., 2012, Statistical language models based on neural networks: PhD thesis, Brno University of Technology.
Oh, J.-W., and T.
Alkhalifah, 2016, Elastic orthorhombic anisotropic parameter inversion: An analysis of parameterization: Geophysics, C279–C293.
Operto, S., Y. Gholami, V. Prieux, A. Ribodetti, R. Brossier, L. Metivier, and J. Virieux, 2013, A guided tour of multiparameter full-waveform inversion with multicomponent data: From theory to practice: The Leading Edge, 1040–1054.
Pan, W., K. A. Innanen, and Y. Geng, 2018, Elastic full-waveform inversion and parametrization analysis applied to walk-away vertical seismic profile data for unconventional (heavy oil) reservoir characterization: Geophysical Journal International, 1934–1968.
Pan, W., K. A. Innanen, G. F. Margrave, M. C. Fehler, X. Fang, and J. Li, 2016, Estimation of elastic constants for HTI media using Gauss-Newton and full-Newton multiparameter full-waveform inversion: Geophysics, R275–R291.
Paszke, A., S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, 2017, Automatic differentiation in PyTorch.
Peters, B., E. Haber, and J. Granek, 2019, Neural-networks for geophysicists and their application to seismic data interpretation: arXiv preprint arXiv:1903.11215.
Plessix, R.-E., 2006, A review of the adjoint-state method for computing the gradient of a functional with geophysical applications: Geophysical Journal International, 495–503.
Podlozhnyuk, V., 2007, Image convolution with CUDA: NVIDIA Corporation white paper, June.
Pyun, S., W. Son, and C. Shin, 2009, Frequency-domain waveform inversion using an l1-norm objective function: Exploration Geophysics, 227–232.
Raissi, M., 2018, Deep hidden physics models: Deep learning of nonlinear partial differential equations: The Journal of Machine Learning Research, 932–955.
Richardson, A., 2018, Seismic full-waveform inversion using deep learning tools and techniques: arXiv preprint arXiv:1801.07232.
Röth, G., and A. Tarantola, 1994, Neural networks and inversion of seismic data: Journal of Geophysical Research: Solid Earth, 6753–6768.
Sambridge, M., P. Rickwood, N. Rawlinson, and S. Sommacal, 2007, Automatic differentiation in geophysical inverse problems: Geophysical Journal International, 1–8.
Shustak, M., and E. Landa, 2018, Time reversal for wave refocusing and scatterer detection using machine learning: Geophysics, T257–T263.
Smith, R., T. Mukerji, and T. Lupo, 2019, Correlating geologic and seismic data with unconventional resource production curves using machine learning: Geophysics, O39–O47.
Sun, H., and L. Demanet, 2018, Low frequency extrapolation with deep learning, in SEG Technical Program Expanded Abstracts 2018: Society of Exploration Geophysicists, 2011–2015.
Sun, J., Z. Niu, K. A. Innanen, J. Li, and D. O. Trad, 2020, A theory-guided deep-learning formulation and optimization of seismic waveform inversion: Geophysics, R87–R99.
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 1259–1266.
——–, 1986, A strategy for nonlinear elastic inversion of seismic reflection data: Geophysics, 1893–1903.
Virieux, J., 1986, P-SV wave propagation in heterogeneous media: Velocity-stress finite-difference method: Geophysics, 889–901.
Wagner, N., and J. M. Rondinelli, 2016, Theory-guided machine learning in materials science: Frontiers in Materials, 28.
Wu, J.-L., H. Xiao, and E. Paterson, 2018, Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework: Physical Review Fluids, 074602.
Wu, Y., and Y.
Lin, 2018, InversionNet: A real-time and accurate full waveform inversion with CNNs and continuous CRFs: arXiv preprint arXiv:1811.07875.
Xiang, S., and H. Zhang, 2016, Efficient edge-guided full-waveform inversion by Canny edge detection and bilateral filtering algorithms: Geophysical Supplements to the Monthly Notices of the Royal Astronomical Society, 1049–1061.
Yang, F., and J. Ma, 2019, Deep-learning inversion: a next generation seismic velocity-model building method: Geophysics, 1–133.
Zaremba, W., I. Sutskever, and O. Vinyals, 2014, Recurrent neural network regularization: arXiv preprint arXiv:1409.2329.
Zhang, Z., Y. Wu, Z. Zhou, and Y. Lin, 2019, VelocityGAN: Subsurface velocity image estimation using conditional adversarial networks: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 705–714.
Zheng, Y., Q. Zhang, A. Yusifov, and Y. Shi, 2019, Applications of supervised deep learning for seismic interpretation and inversion: The Leading Edge, 526–533.

Algorithm 1 Sequence of calculations in the RNN cell with PML boundary
Require:
Source: $s_x$, $s_z$; velocity and stress fields at the previous time step; parameters: $\lambda$, $\mu$, $\rho$; time step: $dt$; PML damping coefficients $d_x$, $d_z$.
Updated velocity fields at $t+1/2$ and stress fields at $t+1$.
1: $\sigma_{xx}^t \leftarrow \sigma_{xx}^t + s_x$;  $\sigma_{zz}^t \leftarrow \sigma_{zz}^t + s_z$
2: $\partial_x \sigma_{xx}^t \leftarrow (\sigma_{xx}^t * k_{x1})/\rho$;  $\partial_z \sigma_{xz}^t \leftarrow (\sigma_{xz}^t * k_{z2})/\rho$;  $\partial_x \sigma_{xz}^t \leftarrow (\sigma_{xz}^t * k_{x2})/\rho$;  $\partial_z \sigma_{zz}^t \leftarrow (\sigma_{zz}^t * k_{z1})/\rho$
3: $v_x^{t+1/2,x} \leftarrow (1 - dt\, d_x)\, v_x^{t-1/2,x} + dt\, \partial_x \sigma_{xx}^t$;  $v_x^{t+1/2,z} \leftarrow (1 - dt\, d_z)\, v_x^{t-1/2,z} + dt\, \partial_z \sigma_{xz}^t$;  $v_x^{t+1/2} \leftarrow v_x^{t+1/2,x} + v_x^{t+1/2,z}$
4: $v_z^{t+1/2,x} \leftarrow (1 - dt\, d_x)\, v_z^{t-1/2,x} + dt\, \partial_x \sigma_{xz}^t$;  $v_z^{t+1/2,z} \leftarrow (1 - dt\, d_z)\, v_z^{t-1/2,z} + dt\, \partial_z \sigma_{zz}^t$;  $v_z^{t+1/2} \leftarrow v_z^{t+1/2,x} + v_z^{t+1/2,z}$
5: $\partial_x v_x^{t+1/2} \leftarrow v_x^{t+1/2} * k_{x2}$;  $\partial_z v_x^{t+1/2} \leftarrow v_x^{t+1/2} * k_{z1}$;  $\partial_x v_z^{t+1/2} \leftarrow v_z^{t+1/2} * k_{x1}$;  $\partial_z v_z^{t+1/2} \leftarrow v_z^{t+1/2} * k_{z2}$
6: $\sigma_{xx}^{t+1,x} \leftarrow (1 - dt\, d_x)\, \sigma_{xx}^{t,x} + dt\,(\lambda + 2\mu)\, \partial_x v_x^{t+1/2}$;  $\sigma_{xx}^{t+1,z} \leftarrow (1 - dt\, d_z)\, \sigma_{xx}^{t,z} + dt\, \lambda\, \partial_z v_z^{t+1/2}$;  $\sigma_{xx}^{t+1} \leftarrow \sigma_{xx}^{t+1,x} + \sigma_{xx}^{t+1,z}$
7: $\sigma_{zz}^{t+1,x} \leftarrow (1 - dt\, d_x)\, \sigma_{zz}^{t,x} + dt\, \lambda\, \partial_x v_x^{t+1/2}$;  $\sigma_{zz}^{t+1,z} \leftarrow (1 - dt\, d_z)\, \sigma_{zz}^{t,z} + dt\,(\lambda + 2\mu)\, \partial_z v_z^{t+1/2}$;  $\sigma_{zz}^{t+1} \leftarrow \sigma_{zz}^{t+1,x} + \sigma_{zz}^{t+1,z}$
8: $\sigma_{xz}^{t+1,x} \leftarrow (1 - dt\, d_x)\, \sigma_{xz}^{t,x} + dt\, \mu\, \partial_x v_z^{t+1/2}$;  $\sigma_{xz}^{t+1,z} \leftarrow (1 - dt\, d_z)\, \sigma_{xz}^{t,z} + dt\, \mu\, \partial_z v_x^{t+1/2}$;  $\sigma_{xz}^{t+1} \leftarrow \sigma_{xz}^{t+1,x} + \sigma_{xz}^{t+1,z}$

Algorithm 2 Loop for elastic RNN FWI
1: Set trainable parameters: $\lambda$, $\mu$, $\rho$ in this test.
2: Set optimizers for the parameters:
Optimizer$_\lambda$, Optimizer$_\mu$ and Optimizer$_\rho$ for $\lambda$, $\mu$ and $\rho$, respectively.
3: for iter $\in [1, maxiter]$ or not converged do
4:   $D_{syn}$ = RNN($\lambda$, $\mu$, $\rho$): generate synthetic data
5:   loss = costFunc($D_{syn}$, $D_{obs}$): calculate the misfit
6:   loss.backward(): backpropagate and produce gradients for the parameters
7:   optimizers.step(): update the parameters
8: end for
Figure 1: Forward propagation through an example RNN. Cells are identical in form, with the output of one being the input of the next. $O = [O_1, O_2, O_3, \cdots]$ are the internal variables in each cell; $S = [S_1, S_2]$ are the inputs; $P = [P_1, P_2]$ are the outputs; $L = [L_1, L_2]$ are the labeled data; $W = [W_1, W_2]$ are the trainable weights. The black dashed line indicates forward propagation.

Figure 2: Residual backward propagation through an example RNN. $R = [R_1, R_2]$ are the residuals at each RNN cell, which are calculated using the absolute error ($l_1$ norm). The gray solid line shows how gradients are calculated using backpropagation from the residual to the trainable weights in the RNN cell along the computational graph.

Figure 3: The structure of each RNN cell. In this figure, $\partial_x \sigma_{xx}^t$, $\partial_z \sigma_{zz}^t$, $\partial_x \sigma_{xz}^t$, $\partial_z \sigma_{xz}^t$, $\partial_x v_x^{t+1/2}$, $\partial_z v_x^{t+1/2}$, $\partial_x v_z^{t+1/2}$, $\partial_z v_z^{t+1/2}$ are the internal variables; $v_x^{t-1/2}$, $v_z^{t-1/2}$, $\sigma_{xx}^t$, $\sigma_{zz}^t$, $\sigma_{xz}^t$ are communicated between the RNN cells; $\lambda$, $\mu$, $\rho$ are trainable parameters.

Figure 4: Velocity field forward and residual backward propagation under the formatting of the RNN. The shot records formed at each time step correspond to the output of the RNN cell (P in Figure 2). Observed shot records correspond to the labeled data (L in Figure 2). The residual shot record corresponds to the residual information (R in Figure 2). Backpropagation starts from the residual shot record.

Figure 5: (a) $\lambda$ gradient given by the adjoint-state method. (b) $\mu$ gradient given by the adjoint-state method. (c) $\rho$ gradient given by the adjoint-state method. (d) $\lambda$ gradient given by the AD method. (e) $\mu$ gradient given by the AD method. (f) $\rho$ gradient given by the AD method.

Figure 6: (a) true $V_P$ model, (b) initial $V_P$ model, (c) true $V_S$ model, (d) initial $V_S$ model, (e) true $\rho$ model, (f) initial $\rho$ model.

Figure 7: D-V parameterization inversion results. (a)-(d), $V_P$ $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norm inversion results respectively; (e)-(h), $V_S$ $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norm inversion results respectively; (i)-(l), $\rho$ $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norm inversion results respectively.

Figure 8: D-M parameterization inversion results. (a)-(d), $\lambda$ $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norm inversion results respectively; (e)-(h), $\mu$ $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norm inversion results respectively; (i)-(l), $\rho$ $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norm inversion results respectively.

Figure 9: D-S parameterization inversion results. (a)-(d), $C_{11}$ $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norm inversion results respectively; (e)-(h), $C_{44}$ $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norm inversion results respectively; (i)-(l), $\rho$ $l_2$, $l_2^{TV}$, $l_1$ and $l_1^{TV}$ norm inversion results respectively.
Figure 10: (a) true $V_P$ model, (b) initial $V_P$ model, (c) true $V_S$ model, (d) initial $V_S$ model, (e) true $\rho$ model, (f) initial $\rho$ model.

Figure 11: (a) true $V_P$ model, (b) initial $V_P$ model, (c) true $V_S$ model, (d) initial $V_S$ model, (e) true $\rho$ model, (f) initial $\rho$ model.

Figure 12: D-M parameterization inversion results. (a)-(c), $\lambda$ $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norm inversion results respectively; (d)-(f), $\mu$ $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norm inversion results respectively; (g)-(i), $\rho$ $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norm inversion results respectively.

Figure 13: Profiles through the recovered elastic models. (a) Vertical $\lambda$ profiles; (b) vertical $\mu$ profiles; (c) vertical $\rho$ profiles.

Figure 14: D-V parameterization inversion results. (a)-(c), $V_P$ $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norm inversion results respectively; (d)-(f), $V_S$ $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norm inversion results respectively; (g)-(i), $\rho$ $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norm inversion results respectively.

Figure 15: Profiles through the recovered elastic models. (a) Vertical $V_P$ profiles; (b) vertical $V_S$ profiles; (c) vertical $\rho$ profiles.

Figure 16: D-V parameterization model misfits. (a) $V_P$ model misfit using the $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norms; (b) $V_S$ model misfit using the $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norms; (c) $\rho$ model misfit using the $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norms.

Figure 17: D-S parameterization inversion results. (a)-(c), $C_{11}$ $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norm inversion results respectively; (d)-(f), $C_{44}$ $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norm inversion results respectively; (g)-(i), $\rho$ $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norm inversion results respectively.

Figure 18: Profiles through the recovered elastic models based on the D-S parameterization. (a) Vertical
$C_{11}$ profiles; (b) vertical
$C_{44}$ profiles; (c) vertical $\rho$ profiles.

Figure 19: D-S parameterization model misfits. (a) $C_{11}$
model misfit using the $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norms; (b) $C_{44}$
model misfit using the $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norms; (c) $\rho$ model misfit using the $l_2$, $l_2^{TV}$ and $l_1^{TV}$ norms.

Figure 20: Noise-free and noisy shot records. (a) Noise-free record and record with Gaussian noise $std = 0.3$.
(b) Noise-free record and record with Gaussian noise $std = 0.5$. (c) Noise-free record and record with Gaussian noise $std = 1$.

Figure 21: Random noise testing inversion results. (a), (e), (i): noise-free inversion results for $V_P$, $V_S$, and density; (b), (f), (j): Gaussian noise $std = 0.3$ inversion results for $V_P$, $V_S$, and density; (c), (g), (k): Gaussian noise $std = 0.5$ inversion results for $V_P$, $V_S$, and density; (d), (h), (l): Gaussian noise $std = 1.0$ inversion results for $V_P$, $V_S$, and density.