An encoder-decoder deep surrogate for reverse time migration in seismic imaging under uncertainty
Rodolfo S. M. Freitas a, Carlos H. S. Barbosa a, Gabriel M. Guerra b, Alvaro L. G. A. Coutinho a and Fernando A. Rochinha a,*

a COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, 21941-598, Brazil
b Department of Mechanical Engineering, Federal Fluminense University, Niterói, Brazil

* Corresponding author. Email: [email protected] (Rodolfo S. M. Freitas), [email protected] (Fernando A. Rochinha)
ARTICLE INFO
Keywords: Reverse time migration, Deep Learning, Surrogate Modeling, Uncertainty Quantification
ABSTRACT
Seismic imaging faces challenges due to the presence of several uncertainty sources. Uncertainties exist in data measurements, source positioning, and subsurface geophysical properties. Reverse time migration (RTM) is a high-resolution depth migration approach useful for extracting information such as reservoir localization and boundaries. RTM, however, is time-consuming and data-intensive, as it requires solving the wave equation twice to generate and store an imaging condition. When embedded in an uncertainty quantification algorithm (like the Monte Carlo method), RTM shows a many-fold increase in its computational complexity due to the high input-output dimensionality. In this work, we propose an encoder-decoder deep learning surrogate model for RTM under uncertainty. The inputs are an ensemble of velocity fields, expressing the uncertainty, and the outputs are the corresponding seismic images. We show by numerical experimentation that the surrogate model can reproduce the seismic images accurately and, more importantly, the uncertainty propagation from the input velocity fields to the image ensemble.
1. Introduction
Seismic imaging is employed to delineate the salient geological features of the Earth subsurface. Imaging methods are popular in the Oil & Gas industry as they are designed to focus on the most essential characteristics: the horizons bounding the regions of interest. They can also be used in conjunction with inverse methods such as Full Waveform Inversion [1]. Imaging methods are designed and built by integrating specialized optical (illuminating) principles and physics-based models describing wave propagation through heterogeneous media. A critical aspect arising from such an arrangement is the potential computational cost required, as a large domain is to be illuminated, which implies solving partial differential equations (PDEs)
associated with the wave models in that area. The situation tends to be more complicated as the excitation signals bear high-frequency content, which demands very fine grids in space and time. Such time-consuming tasks often hamper the use of high-fidelity codes constructed upon physics-based models. That becomes a more critical issue whenever one faces many-query applications like sensitivity analysis, design, optimization, or uncertainty quantification.

In this work, we develop a machine-learning model to alleviate the computational cost of providing seismic images with quantified uncertainty [2]. In this context, we propose a Monte Carlo method (MC) to sweep a large ensemble of plausible velocity fields obtained by approximate methods, and therefore prone to uncertainties, to compute an ensemble of images aiming at characterizing the uncertainties propagated along the seismic image processing. Moreover, we embed the MC sampling as an outer loop of a larger computational workflow proposed in [2] and detailed in Algorithm 1. This algorithm is structured in three sequential stages, enabling a probabilistic framework for seismic imaging.

The first stage aims at generating plausible subsurface velocity fields honoring seismic data. Probabilistic inversion, such as Bayesian tomography [2, 3, 4, 5], and stochastic FWI [6, 7, 8, 9, 10] can provide a velocity field ensemble used as input to the second stage. Hence, in Stage 2, an imaging technique migrates the seismogram information using each velocity field sample. This strategy wraps a seismic migration tool into an MC algorithm aiming to build a set of migrated seismic images. We have chosen Reverse Time Migration (RTM) as the seismic migration technique to localize the seismic reflectors at the correct depth in the subsurface [11]. RTM is a depth migration approach based on the two-way wave equation, frequently used in industry, that provides reliable high-resolution subsurface seismic images useful for seismic interpretation and reservoir characterization [11]. The last stage of the workflow post-processes the ensemble of RTM seismic images, calculating uncertainty maps and extracting features, such as horizons and faults, that characterize the uncertainty in the resulting images.
Algorithm 1
Workflow for seismic imaging with quantified uncertainty
Input: source signals, seismograms, and spatial domain (raw data).
Output: ensemble of seismic images.
Stage 1: Generate an ensemble of velocity fields:
• Bayesian inversion with simplified physics-based models
Stage 2: Propagate uncertainties – migrate the seismograms for the velocity field ensemble using RTM, producing a corresponding ensemble of seismic images
• Monte Carlo loop over samples produced in Stage 1
Stage 3: Post-process the RTM seismic images
• Uncertainty maps;
• Automatic feature (horizon) detection;
• Probabilistic characterization of such features.

Due to its flexibility by design, it is possible to generate different workflow versions by, for instance, replacing components within the stages (e.g., different strategies for the uncertain velocity field estimation in Stage 1) to accommodate different demands or efficiency requirements; a minimal sketch of such a driver is given below.
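The sketch below only illustrates how the three stages compose; the function names (generate_velocity_ensemble, rtm_migrate, postprocess_images) are hypothetical placeholders and not the actual components of [2].

    # Hypothetical driver for Algorithm 1; each stage is a pluggable component.
    def seismic_imaging_uq(source, seismograms, domain, n_mc):
        # Stage 1: ensemble of plausible velocity fields honoring the seismic data
        # (e.g., Bayesian tomography or stochastic FWI); placeholder function.
        velocities = generate_velocity_ensemble(seismograms, domain, n_samples=n_mc)

        # Stage 2: Monte Carlo loop -- migrate the seismograms with RTM per sample.
        images = [rtm_migrate(v, seismograms, source) for v in velocities]

        # Stage 3: uncertainty maps, horizon detection and their probabilistic
        # characterization over the image ensemble; placeholder function.
        return postprocess_images(images)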
Nonetheless, we would still be facing a time-consuming computational task in the many-query UQ analysis of Stage 2. That is what motivates us to follow a consolidated trend, replacing the original physics-based model by a cheap-to-evaluate surrogate. Recently, Machine Learning techniques, like Gaussian Processes [12, 13, 14, 15, 16, 17] and Deep Neural Networks (DNNs) [18, 19, 20, 21, 22, 23, 24, 25], have deployed efficient surrogates for UQ analysis. Gaussian Processes have achieved considerable success with computer models with inputs and outputs of moderate dimensionality, including the ability to blend data of different sources, leading to multi-fidelity approaches [15, 16]. On the other hand, Deep Neural Networks have gained traction due to their unique profile of being flexible and scalable nonlinear function approximators. Another aspect worth highlighting is the substantial amount of computer libraries and tools available to enable their use.

Here, we apply the deep learning surrogate architecture proposed in [18] for systems governed by PDEs cast as an image-to-image problem. The performance of such architecture was tested in uncertainty quantification of flows in heterogeneous media [19], extended to semi-supervised learning in [20] and to inverse problems in [21], with excellent results. This architecture is composed of convolutional layers and dense blocks, following an encoder-decoder neural network arrangement to handle the potential high dimensionality of inputs and outputs. More specifically, we employ the deep learning architecture to construct efficient proxies for RTM imaging, avoiding the high costs of solving a wave propagation equation twice in a heterogeneous medium. Such surrogates are nonlinear mappings linking the uncertain velocity field to the seismic images. It is worth highlighting that, differently from usual surrogates, we do not replace only a forward solver associated to a PDE, but the whole, more expensive, imaging process. The surrogate can handle the high-dimensional inputs (velocity fields) and outputs (seismic images), leading to cost savings in processing and memory storage. We demonstrate through two examples that such an approach enables producing seismic images with quantified uncertainty. Indeed, it can accurately reproduce the ensemble of images resulting from the MC uncertainty propagation with much less computational effort. Moreover, it uses a limited amount of training data, as expected, which was confirmed by our results and efficiency estimation.

The remainder of this paper is organized as follows. The next section details the RTM mathematical problem along with its computational implementation within the Stage 2 MC uncertainty propagation. Section 3 presents our deep learning surrogate architecture and training strategy. In Section 4, we present numerical experiments where we investigate the accuracy, convergence, and cost-effectiveness of our surrogate model to replace the high-fidelity RTM under uncertainty. The paper ends with a summary of our main findings.
2. Reverse Time Migration under Uncertainty
RTM is a high-resolution depth migration technique providing useful subsurface images for extracting information such as reservoir localization and boundaries [11]. The raw data for RTM consist of recorded seismic signals induced by a seismic source (a shot). The group of seismic signals represents a seismogram that captures information related to reflections coming from the subsurface. RTM relies on the two-way wave propagation equation, resulting in an imaging condition (IC) [26] computed over the space-time domain to be imaged. More specifically, the wave
equation is solved twice, the first time to compute the forward-propagated wave due to the seismic sources, followed by the computation of the backward-propagated wave induced by the recorded seismograms. Both solutions are needed to compute the IC. We calculate the forward wave for the isotropic acoustic case by solving

∇²p(r, t) − (1/v²(r)) ∂²p(r, t)/∂t² = f(r_s, t),
p(r, t) = 0 on ∂Ω_D and 𝒜p(r, t) = 0 on ∂Ω_inf,                                        (1)
p(r, 0) = 0 and ∂p(r, 0)/∂t = 0, r ∈ Ω,

where Ω ⊂ ℝ³ denotes the domain to be imaged, ∂Ω = ∂Ω_D ∪ ∂Ω_inf is the domain boundary, and ∂Ω_D and ∂Ω_inf are non-overlapping boundary partitions. ∂Ω_D is the portion of the boundary where Dirichlet boundary conditions are applied, representing, for instance, the free surface. The operator 𝒜 represents the non-reflecting boundary condition [27] applied on ∂Ω_inf. The pressure (the forward-propagated source wavefield) p(r, t) is defined at the position r = (r_x, r_y, r_z) ∈ Ω and time t ∈ [0, T]. Moreover, v(r) is the compressional wave velocity spatial field, and f(r_s, t) is the seismic source. The vector r_s represents the seismic source position. The backward-propagated wavefield is calculated by solving

∇²p̄(r, τ) − (1/v²(r)) ∂²p̄(r, τ)/∂τ² = s(r_r, τ),
p̄(r, τ) = 0 on ∂Ω_D and 𝒜p̄(r, τ) = 0 on ∂Ω_inf,                                       (2)
p̄(r, 0) = p(r, T) and ∂p̄(r, 0)/∂τ = ∂p(r, T)/∂τ, r ∈ Ω,

which is an equation similar to (1), but with a different source s(r_r, τ), that is, the recorded signals at the receivers positioned at r_r. Besides, the evolution in Eq. (2) is over the reverse time τ = T − t. Thus, the backward-propagated wavefield p̄(r, τ) is defined in Ω for τ ∈ [0, T].

The IC dictates the quality and fidelity of the final RTM image. There are several possibilities, for instance, excitation ICs [28, 29, 30], extended ICs [31, 32, 33], wavefield decomposition ICs [34], and the zero-lag cross-correlation IC [11, 35]. We have chosen the zero-lag cross-correlation between the forward and backward propagated waves at each point in Ω,

I(r) = ∫₀ᵀ p(r, t) p̄(r, τ) dt.                                                         (3)

The IC amplitudes in equation (3) do not provide an explicit physical relationship with the reflection coefficients. In [35], we find a detailed explanation of the relation between the imaging condition and the reflection coefficient. Nevertheless, the resulting image provides the correct amplitude contrast locations of the geological interfaces of rocks with different physical properties [11]. The amplitude contrast patterns are the main feature of the migrated seismic images explored in the present work.

We apply an explicit finite difference numerical scheme, second-order in time and higher-order in space [36], to approximate equations (1) and (2), leading to the vector v, the discrete version of the velocity field, and similarly to the vectors p, p̄, s, f at each time step. Note that the vectors p, p̄, and v have the same dimension, that is, N = N_x · N_y (or N = N_x · N_y · N_z in 3D), where N_x, N_y (and N_z) are the numbers of grid points in each Cartesian direction. Each discrete seismogram is a vector of size N_rec · (N_t + 1), where N_rec is the number of receivers and N_t = T/Δt, with Δt the time step. The seismic source f has dimension N_t. RTM is not only computationally intensive but also data-intensive due to the high-dimensional inputs, the amount of data to manage, for instance, storing and retrieving p, and the computational costs associated with imposing the stability and dispersion conditions in the discrete two-way wave equation [11]. The dispersion relation takes into account the number of grid points per wavelength and the medium properties, which in our isotropic acoustic case is the P-wave velocity. Thus, the complexity of estimating the migrated image increases with highly heterogeneous media and with the seismic source cutoff frequency. Least-squares RTM (LSRTM) extends the basic method to an iterative method that minimizes a data misfit term [26, 37]. However, in the present work, we restrict ourselves to the standard RTM technique.

As we wrap RTM into a sampling method in Algorithm 1 to take the input uncertainties into consideration, the overall computational cost of Stage 2 rises proportionally to N_MC, the cardinality of the ensemble of possible velocity fields. Typically, seismic raw data recording sets are split into multiple steps (N_shots) to cope with the potentially large spatial dimensions to be covered and to enhance the signal-to-noise ratio (SNR) in the processing stages. Each step, covering fully or partially the domain, corresponds to a different arrangement of sources and receivers. It is essential to mention that RTM iterates over the number of shots, producing partial migrated images characterized by equation (3).
When this loop ends, a process called stacking sums the partially migrated seismic images into a single one [38, 39], gathering into this single image all the information related to the seismogram set. Algorithm 2 details the generation of the RTM image ensemble, where a set of seismograms, {s_1, ..., s_{N_shots}}, a set of velocity fields, {v_1, ..., v_{N_MC}}, and a seismic source (f) are given as inputs. The indexes N_shots and N_MC represent the number of shots and the number of samples for the MC method. For each MC iteration, we solve the wave equation twice, once for the seismic source and once for the seismograms associated with it. The computation of the imaging condition uses both solutions (source wavefield and receiver wavefield), retrieving the source wavefield from persistent storage to build the migrated seismic section and stacking the partial results over time (I_Σt) and over the number of seismograms (I_Σshot_id). At the end of Algorithm 2, we have the discrete imaging condition set {I_1, I_2, ..., I_{N_MC}}, where each I_i is a vector in ℝ^N. It is usual to filter each image to sharpen its features. Nevertheless, we do not apply any filter to the ensemble of seismic images. Summarizing, migrating the seismograms for the set of velocity fields produces the corresponding set of migrated seismic images, where each one has a direct relation with one velocity sample.

Different strategies can be pursued to make Algorithm 1 feasible by reducing the inherent computational costs of processing and storage. They could rely, for instance, on data compression [40, 41, 42, 43] or more effective stochastic sampling [44], but here, as mentioned before, our option is for using deep learning surrogates for the RTM imaging, which we describe in the following section.
Algorithm 2
Reverse Time Migration under Uncertainty.
Require: {v_1, ..., v_{N_MC}}, {s_1, ..., s_{N_shots}}, and f
function RTM_UQ(vectors {v_1, ..., v_{N_MC}}, vectors {s_1, ..., s_{N_shots}}, vector f)
    for sample_id = 1 to N_MC do
        read v_{sample_id} and f
        initialize image condition I_{Σ shot_id} = 0
        for shot_id = 1 to N_shots do
            initialize n_t = 0
            apply initial conditions for i_t = 0
            for i_t = 1 to N_t do
                n_t = n_t + i_t · Δt
                solve equation (1)                               ▷ source wavefield
                store p_{n_t}
            end for
            initialize n_τ = 0 and I_{Σ τ} = 0
            apply initial conditions for i_τ = 0
            for i_τ = 1 to N_t do
                n_τ = N_t − (n_τ + i_τ · Δτ)                     ▷ reverse time
                read p_{n_τ} and s_{shot_id}
                solve equation (2)                               ▷ receiver wavefield
                calculate I_{Σ n_τ} = I_{Σ n_τ} + p_{n_τ} p̄_{n_τ}  ▷ imaging condition
            end for
            stack I_{Σ shot_id} = I_{Σ shot_id} + I_{Σ n_τ}       ▷ stacking
        end for
        store I_{Σ shot_id} for all sample_id, 1 ≤ sample_id ≤ N_MC
    end for
end function
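A compact NumPy sketch of the operations in Algorithm 2 is given below. It is not the authors' implementation: it assumes a simple second-order stencil, homogeneous Dirichlet boundaries, crude point injection of the wavelet and of the time-reversed traces, and keeps the whole source wavefield in memory, whereas the actual code uses a higher-order scheme, non-reflecting boundaries [27], and persistent storage.

    import numpy as np

    def laplacian(u, h):
        # Five-point Laplacian with zero (Dirichlet) padding; the paper uses a
        # higher-order stencil and non-reflecting boundaries, omitted here.
        lap = np.zeros_like(u)
        lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
                           - 4.0 * u[1:-1, 1:-1]) / h**2
        return lap

    def propagate(v, h, dt, nt, add_source):
        # Explicit two-step time marching for the acoustic wave equation (Eq. (1)/(2));
        # add_source(step, field) injects either the shot wavelet or the recorded traces.
        p_old = np.zeros_like(v)
        p = np.zeros_like(v)
        wavefields = []
        for it in range(nt):
            p_new = 2.0 * p - p_old + (v * dt) ** 2 * laplacian(p, h)
            add_source(it, p_new)
            p_old, p = p, p_new
            wavefields.append(p.copy())      # stored source wavefield (persistent in practice)
        return wavefields

    def rtm_single(v, shots, wavelet, h, dt, nt):
        # One RTM image for one velocity sample: forward and time-reversed solves per
        # shot, zero-lag cross-correlation IC (Eq. (3)), and stacking over shots.
        # shots: list of (src, rec_positions, seismogram) with seismogram shape (nt, n_rec).
        image = np.zeros_like(v)
        for src, rec_positions, seismogram in shots:
            def inject_shot(it, p):
                p[src] += wavelet[it]                      # simplified source injection
            fwd = propagate(v, h, dt, nt, inject_shot)

            partial = np.zeros_like(v)
            bwd_old = np.zeros_like(v)
            bwd = np.zeros_like(v)
            for it in range(nt):
                bwd_new = 2.0 * bwd - bwd_old + (v * dt) ** 2 * laplacian(bwd, h)
                for k, (ix, iy) in enumerate(rec_positions):
                    bwd_new[ix, iy] += seismogram[nt - 1 - it, k]   # time-reversed traces (Eq. (2))
                bwd_old, bwd = bwd, bwd_new
                partial += fwd[nt - 1 - it] * bwd          # zero-lag cross-correlation (Eq. (3))
            image += partial                               # stacking over shots
        return image

    def rtm_uq(velocity_samples, shots, wavelet, h, dt, nt):
        # Monte Carlo loop of Algorithm 2: one migrated image per velocity sample.
        return [rtm_single(v, shots, wavelet, h, dt, nt) for v in velocity_samples]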
3. Deep Learning Surrogate
The main goal of surrogate models is to replicate the multivariate input-output mapping provided by physical models governed by PDEs in order to save computational costs. Performing uncertainty quantification in such conditions is often hampered whenever one faces high dimensionality, the so-called curse of dimensionality. As pointed out in the literature, DNNs have proved successful in such situations by exploiting low-dimensional latent spaces and sophisticated training methods [45]. We aim to construct and evaluate the performance of DNNs acting as a surrogate model for RTM imaging under uncertainty, using as baseline the encoder-decoder architecture proposed by [18] and designed for problems cast as image-to-image regressions. We briefly review the building blocks of the network in this section.

Figure 1, inspired by [18], provides the big picture of our end-to-end solution, depicting the main components of the encoder-decoder network. In our particular application, the input for the deep learning surrogate is the ensemble of heterogeneous velocity fields, and the outputs are the corresponding imaging conditions given by Eq. (3) for each sample of the velocity field ensemble. Inputs and outputs are high-dimensional spatial fields, and the surrogate modeling is cast as a field-to-field regression.
Figure 1:
RTM deep convolutional encoder-decoder network architecture. Inputs: ensemble of velocity fields v ∈ ℝ^N. Outputs: corresponding ensemble of imaging conditions I ∈ ℝ^N.

This approach is image-inspired: it relies on connecting each pixel of the input field to an output pixel, where pixels correspond to grid points in the input and output computational meshes. The trained network maps the velocity v ∈ ℝ^N into the IC field I ∈ ℝ^N.

The encoder-decoder architecture displayed in Fig. 1 consists of two sequential main phases. The first is the encoder, wherein dimension reduction occurs, followed by the decoder, which allows expressing the network output in its original dimension. Alternating dense blocks and transition layers constitute both. Indeed, this architecture merges key characteristics of fully connected and convolutional networks, suited for the present application. On one side, convolutional networks are quite effective in dimension reduction [45] and are capable of capturing the spatial correlations present in the input velocity fields. Fully connected layers enhance the information transmitted across the network, improving the overall efficiency, reflected in reasonable sizes of the needed training data set [46].

Dense blocks connect all layers directly to each other, helping the training process with the improvement of information flow and gradients across the network [46]. Inputs of the l-th layer are the concatenated outputs of the previous layers, that is, z_l = H_l([z_0, z_1, ..., z_{l−1}]), where z_l is the output of the l-th layer and [z_0, z_1, ..., z_{l−1}] refers to the concatenation of the outputs of layers 0, ..., l − 1. H_l is a non-linear transformation. Here, H_l results from applying three consecutive operations: batch normalization [47], followed by a ReLU [48] and a convolution. The dense block has two design parameters, the number of layers L and the growth rate K, the number of output features of every single layer. The transition layers here, in the encoder (decoder), are convolutional (deconvolutional) and, therefore, handle the dimensions of the inputs or outputs of the dense blocks. As shown in Figure 1, during the encoder phase, the high-dimensional velocity fields pass through an alternating series of dense blocks and encoding layers. The last layer of the encoder phase produces low-dimensional feature maps that characterize the high-dimensional field (the purple maps in Figure 1). Such maps feed the decoder phase, which is composed of an alternating series of dense blocks and decoding layers, returning the IC to its (high) original dimension.

The surrogate g is expressed formally in compact notation as

ŷ = g(x; w),                                                                            (4)

where ŷ is the surrogate prediction (imaging condition, I) for an input x (that is, a velocity field v), and the vector w contains the parameters of the neural network. Training the neural network means learning the parameters w using data from the set 𝒟 = {(x_i, y_i)}, i = 1, ..., N_train, obtained from simulations with the RTM algorithm, where N_train is the number of samples in the training set. The stochastic gradient descent algorithm computes the unknown network parameters for a given loss function [49]. We consider for training our surrogate the following regularized mean squared error (MSE) function,

L_MSE = (1/N_train) Σ_{i=1}^{N_train} ‖ŷ_i − y_i‖² + α Ω(w),                            (5)

where ŷ_i = g(x_i; w).
Here the penalty function is given by Ω(w) = wᵀw for the regularization. Moreover, the root mean squared error (RMSE) is used for monitoring the convergence of the training error. The RMSE is given by

RMSE = √[ (1/N_train) Σ_{i=1}^{N_train} ‖y_i − ŷ_i‖² ].                                 (6)
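To make the building blocks and the loss of Eqs. (4)-(6) concrete, a minimal tf.keras sketch is given below. It mirrors the layer structure described above (and, for illustration, the sizes of Table 2 for the first example), but the padding choices, regularization placement, and optimizer settings are our assumptions rather than the exact configuration used in the paper.

    import tensorflow as tf
    from tensorflow.keras import layers

    def dense_block(x, num_layers=4, growth_rate=16):
        # Each layer: batch normalization -> ReLU -> convolution, concatenated with all
        # previous feature maps (DenseNet-style connectivity [46]); L = 4, K = 16.
        for _ in range(num_layers):
            h = layers.BatchNormalization()(x)
            h = layers.ReLU()(h)
            h = layers.Conv2D(growth_rate, kernel_size=3, padding="same")(h)
            x = layers.Concatenate()([x, h])
        return x

    def encoding(x, filters):
        # Encoder transition layer: halves the spatial dimensions.
        h = layers.BatchNormalization()(x)
        h = layers.ReLU()(h)
        return layers.Conv2D(filters, kernel_size=3, strides=2, padding="same")(h)

    def decoding(x, filters):
        # Decoder transition layer: doubles the spatial dimensions (transposed convolution).
        h = layers.BatchNormalization()(x)
        h = layers.ReLU()(h)
        return layers.Conv2DTranspose(filters, kernel_size=3, strides=2, padding="same")(h)

    def build_surrogate(nx=50, ny=50):
        v = layers.Input(shape=(nx, ny, 1))                               # velocity field
        x = layers.Conv2D(48, kernel_size=4, strides=2, padding="valid")(v)   # 50x50 -> 24x24
        x = dense_block(x)                                                # 48 -> 112 features
        x = encoding(x, 56)                                               # 24x24 -> 12x12
        x = dense_block(x)                                                # 56 -> 120 features
        x = decoding(x, 60)                                               # 12x12 -> 24x24
        x = dense_block(x)                                                # 60 -> 124 features
        x = layers.Conv2DTranspose(1, kernel_size=4, strides=2, padding="valid")(x)  # -> 50x50
        x = layers.ReLU()(x)                                              # enforce IC >= 0
        return tf.keras.Model(v, x)

    model = build_surrogate()
    # The alpha*w^T*w penalty of Eq. (5) can be imposed with kernel_regularizer=
    # tf.keras.regularizers.l2(alpha) on the convolutions, or through weight decay.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse",
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    # model.fit(v_train, ic_train, ...) then trains on the RTM-generated pairs.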
4. Numerical experiments
Here, we present two examples to demonstrate the ability of the encoder-decoder surrogate to efficiently replace the original two-way wave equation RTM algorithm. In these numerical experiments, we mimic
Stage 1 outputs of Algorithm 1. That is, we need to generate an ensemble of velocities. Hence, we assign random velocity magnitudes to the different geological layers to produce synthetic data to train the neural network and perform the uncertainty analysis. The first example deals with a medium containing two flat geological layers of constant velocity. We increase the difficulty for the surrogate in the second example by employing a more complex medium, in which the five geological layers are no longer flat, implying horizontal heterogeneity. The seismic source term considered in the present work is a Ricker-type wavelet [50].

The encoder-decoder networks are constructed using the open platform TensorFlow [51]. The Adam optimizer algorithm [52] is employed for parameter learning, considering a weight decay of 10⁻⁵ and an initial learning rate of 10⁻³, with a learning rate scheduler that drops the rate two times on plateaus of the root mean squared error. We compute a total of 1300 samples by considering the velocity magnitude constant in the interior of each geological layer. Therefore, each velocity field in the ensemble has the form

v = Σ_{l=1}^{n_l} v_l (1 + σ_l ξ_l) P_l,                                                (7)

where n_l is the number of geological layers, v_l is the mean velocity within each geological layer, σ_l is the prescribed standard deviation, ξ_l ∼ 𝕌[−1, 1] is a random variable following a uniform distribution, and P_l is an N-dimensional vector containing ones in the components corresponding to the grid points of the l-th geological layer and zeros otherwise. After that, we apply a moving harmonic average to v with a window length of 20 grid points to mimic the velocity fields computed by model building techniques like tomography or full-waveform inversion. To analyze the surrogate training convergence, out of the 1300 samples computed with the original RTM model, we create four training datasets with 200, 400, 600, and 800 samples each. We use the remaining N_test = 500 samples to test the trained network.

Accuracy is measured using the distance between predictions with the surrogate model and those computed with the original RTM model. To evaluate the surrogate model quality, we consider the coefficient of determination (R²-score) metric [53]. We define the coefficient of determination as

R² = 1 − [ Σ_{i=1}^{N_test} ‖y_i − ŷ_i‖² ] / [ Σ_{i=1}^{N_test} ‖y_i − ȳ‖² ],           (8)

where ȳ = (1/N_test) Σ_{i=1}^{N_test} y_i is the mean of the test samples. The R²-score metric represents a normalized error, allowing the comparison between surrogate models trained with different datasets, with values close to 1 corresponding to the best surrogate model accuracy. Here, we consider 0.95 a good target. Also, we intend that the surrogate model returns not only good predictions of the seismic images but also accurate estimations of quantities that characterize the uncertainties in such images. To measure the degree of uncertainty in the seismic images, we follow the approach proposed in [54]. In their approach, the degree of uncertainty is expressed by a confidence index that represents the pointwise normalized standard deviation, where low values represent high variabilities and high values the opposite. The confidence index is

c(r) = (σ_max − σ(r)) / (σ_max − σ_min),                                                (9)

where c(r) is the confidence index at position r, and σ_min and σ_max are the minimum and maximum field standard deviations, respectively.
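For reference, a minimal NumPy sketch of these two quality measures, applied to arrays holding the RTM test images and the corresponding surrogate predictions, is given below (array names and shapes are illustrative).

    import numpy as np

    def r2_score(y_true, y_pred):
        # Coefficient of determination of Eq. (8) over the whole test ensemble;
        # y_true and y_pred have shape (N_test, N_y, N_x).
        y_mean = y_true.mean(axis=0)
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - y_mean) ** 2)
        return 1.0 - ss_res / ss_tot

    def confidence_index(images):
        # Pointwise normalized standard deviation of Eq. (9) over an image ensemble.
        sigma = images.std(axis=0)
        return (sigma.max() - sigma) / (sigma.max() - sigma.min())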
Another form of measuring the degree of uncertainty is the coefficient of variation, defined as the pointwise ratio between the standard deviation and the mean,

c_v(r) = σ(r) / μ(r),                                                                   (10)

where μ(r) is the expected value at position r.

In this first example, designed to evaluate the efficacy and efficiency of the proposed surrogate, we assume a simple geologic scenario in which the subsurface is composed of two horizontal homogeneous geological layers separated by a flat horizon parallel to the surface, as shown in Fig. 2. This domain has 1000 m of depth and 1000 m of lateral extension, where the velocity in the first geological layer is 3000 m/s and the velocity in the deeper geological layer is 4500 m/s.

We produce synthetic data using the wave propagation forward solver with the reference velocity field of Fig. 2 and a seismic source with a cutoff frequency of 30 Hz. In this modeling, we simulate a fixed split-spread acquisition [38] comprising nine shots, where receivers are positioned near the surface for each shot and equally spaced by 20 meters.
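The velocity samples entering this example follow Eq. (7). A minimal sketch of generating such two-layer fields on the 50 × 50 grid of Table 1 is given below; the interface position and the value of σ_l are illustrative assumptions, while the 20-grid-point window of the moving harmonic average follows the description above.

    import numpy as np

    rng = np.random.default_rng(0)
    nx, ny = 50, 50                          # grid of Table 1
    mean_velocity = [3000.0, 4500.0]         # m/s, upper and lower layers
    sigma = [0.05, 0.05]                     # relative standard deviations (assumed values)
    interface_row = ny // 2                  # flat horizon; depth chosen only for illustration

    def sample_velocity():
        # One realization of Eq. (7): a constant random velocity per layer.
        v = np.empty((ny, nx))
        xi = rng.uniform(-1.0, 1.0, size=2)
        v[:interface_row, :] = mean_velocity[0] * (1.0 + sigma[0] * xi[0])
        v[interface_row:, :] = mean_velocity[1] * (1.0 + sigma[1] * xi[1])
        # Moving harmonic average along depth with a 20-grid-point window,
        # mimicking the smooth fields produced by tomography or FWI.
        w = 20
        inv_pad = np.pad(1.0 / v, ((w // 2, w // 2 - 1), (0, 0)), mode="edge")
        kernel = np.ones(w) / w
        inv_smooth = np.apply_along_axis(
            lambda col: np.convolve(col, kernel, mode="valid"), 0, inv_pad)
        return 1.0 / inv_smooth

    ensemble = np.stack([sample_velocity() for _ in range(1300)])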
Figure 2:
Simple geologic setup: two horizontal geological layers.
Table 1
RTM numerical parameters.

Parameter            Value             Description (Unit)
h                    –                 Spatial discretization (m)
Δt                   .22 × 10⁻³        Temporal discretization (s)
t_a                  –                 –
N_x × N_y            50 × 50           Number of grid points
i_srcx, i_srcy       ([5:5:45], 5)     Source positions
i_sisx, i_sisy       ([1:1:50], 5)     Receiver positions

The seismic source is also placed near the surface and moved 100 meters for each shot, covering the entire domain with the nine shots. Table 1 shows the parameters used in the numerical modeling of the acoustic wave equations and the positioning of the seismic source and receivers, given by the index ranges [i_srcx, i_srcy] for the sources and [i_sisx, i_sisy] for the receivers. The grid size and time step respect the numerical dispersion and stability criteria [36].

Table 2 details the neural network architecture. The first layer is convolutional, with kernel size equal to 4 and stride 2. This first layer captures spatial relations from the velocity input. The number of feature maps after the first convolutional layer is 48. The neural network has 3 dense blocks with L = 4 and K = 16. Dense blocks have a kernel size of 3 and a stride of 1. Encoding and decoding layers have a kernel size of 3 and a stride of 2. In the decoding layers, we introduce a transposed convolution, allowing the expression of the output with its original dimensionality, equal to the computational grid. A final ReLU layer [48] imposes that the outputs are positive numbers, naturally constraining the network to output IC ≥ 0 at each grid point. The architecture in Table 2 fixes the total number of neural network parameters to be estimated.

Figure 3 shows the decay of the RMSE as a function of the number of epochs during the training process, for training data sets ranging from 200 to 800 samples.
Table 2
Neural Network Architecture. "Output" is the number of feature maps and "Dimension" is the dimension of the feature maps.

Layers           Output    Dimension
Input            1         50 × 50
Convolution      48        24 × 24
Dense-block 1    112       24 × 24
Encoding         56        12 × 12
Dense-block 2    120       12 × 12
Decoding 1       60        24 × 24
Dense-block 3    124       24 × 24
Decoding 2       1         50 × 50
ReLU             1         50 × 50
Figure 3:
RMSE decay with number of epochs.

Note that the RMSE stabilizes after 150 epochs for all cases and that, for the smaller data sets, we see higher error values. Key characteristics one seeks when replacing the original expensive computational model by a surrogate are efficiency and accuracy. We estimate efficiency from the ratio between N_S, the number of samples in the training set, and N_MC, the number of samples of the MC method needed to achieve a prescribed level of accuracy (R² ≥ 0.95). Thus, we introduce the following index to evaluate efficiency,

Efficiency = (1 − η N_S / N_MC) × 100,                                                  (11)

where the index in equation (11) represents the percentage of saved computational cost and η is an adjustment factor accounting for the time spent in the construction, training, and making predictions with the surrogate model. Without loss of generality, we assume for now η = 1.0; for less optimistic conditions, we recognize that η > 1.0. We calculate the coefficient of determination R² to assess the accuracy of the neural network with the remaining 500 samples. We observe that the surrogate model accuracy is good even in small training data scenarios, reaching R² ≥ 0.95, as shown in Fig. 4.
Figure 4:
Test R²-score and efficiency as functions of the number of training samples.

Furthermore, Fig. 4 also depicts the surrogate efficiency. Here it is worth highlighting that 600 RTM runs are necessary to compute the variance with a relative error of 10⁻³. Thus, we see that the efficiency is higher than 90%, even for the larger data set with 800 samples.

To further illustrate how the surrogate model approximates the predictions of the original model with good accuracy, Fig. 5 shows comparisons for two realizations chosen randomly from the test set. We observe that the surrogate presents good results, returning good predictions of the imaging condition. We also depict a comparison between the standard deviation, σ(r), the confidence index, c(r), and the coefficient of variation, c_v(r), predicted by the original and surrogate models with N_test = 500 testing samples, see Fig. 6. Besides, we introduce a discrete version of a relative error between two spatial fields g, one computed with the RTM and the other by the surrogate,

e_g = (1/N) Σ_{i=1}^{N} (g_{i,RTM} − g_{i,s}) / g_{i,RTM},                              (12)

where the subscripts refer to how we compute the field g. This index is an average of the pointwise relative errors over all N grid points. Figure 5 compares randomly selected seismic images produced by RTM with the corresponding ones obtained with the deep learning surrogate. The visual resemblance is quantified using equation (12), leading to relative errors that stay below 2%. We extend our assessment of the surrogate effectiveness further by comparing the uncertainty indexes computed with the two techniques, displayed in Fig. 6. For all indexes, the relative errors computed with equation (12) are less than 1%.

To challenge the encoder-decoder surrogate, we use a synthetic geologic model with five homogeneous layers similar to the one proposed in [55]. The 2D velocity model consists of a water layer with velocity 1500 m/s and four mini sedimentary basins with velocities of 2000 m/s, 2500 m/s, 3000 m/s, and 4000 m/s, respectively. Figure 7 displays a schematic view of the velocity field, with 1000 m of depth and 1000 m of lateral extension.

Two synthetic seismograms are generated for the velocity field shown in Fig. 7, considering now seismic sources with cutoff frequencies of 30 and 45 Hz. Table 3 shows the RTM parameters and the positioning of sources and receivers.
Figure 5:
Randomly selected samples of seismic images in the test data set computed by the RTM (a) and the surrogate model (b). The relative errors in the imaging condition, e_I, are lower than 2%.

Table 3
RTM numerical parameters.

Parameter          f_cutoff = 30 Hz    f_cutoff = 45 Hz    Description (Unit)
h                  10                  6.                  Spatial discretization (m)
Δt                 .25 × 10⁻³          .333 × 10⁻⁴         Temporal discretization (s)
t_a                –                   –                   –
N_x × N_y          100 × 100           150 × 150           Number of grid points
i_srcx, i_srcy     ([5:10:95], 5)      ([7:15:142], 7)     Source positions
i_sisx, i_sisy     ([1:1:100], 5)      ([1:1:150], 7)      Receiver positions

For the cutoff frequency of 45 Hz, due to the imposition of the stability and dispersion conditions in the discrete two-way wave equation, there is a significant increase in the input dimensionality, which bears the potential to hamper the neural network training. The next subsections present results for the scenarios involving the two frequencies of excitation.
Figure 6:
UQ indexes - standard deviation, σ(r), confidence index, c(r), and coefficient of variation, c_v(r) - predicted by the RTM (left) and surrogate models (right). The relative errors to the RTM model are lower than 1%.

We detail the architecture of the neural network for this scenario in Table 4. It contains five dense blocks; the corresponding number of trainable parameters follows from the architecture. We can see in Figure 8 the RMSE decay as the number of epochs increases for all training sets. We verify the accuracy of the surrogate by computing the R² score for the 500 testing samples.
Figure 7:
Geologic model with five layers adapted from [55]. P₁, P₂, P₃, and P₄ are control points in the geological model used to compute statistics.
Figure 8:
RMSE decay with number of epochs.

We find that, as expected, for networks trained with larger data sets, the R² values are closer to one, as shown in Figure 9(a). Due to limitations imposed by the high cost of generating samples using the full RTM model for this example, we develop a different efficiency analysis, extrapolating from the basic conditions used for the network training. We assume conservatively that N_MC is of the same order as in the case of Section 4.1. Thus, we start from a scenario where only 5,000 samples are needed to characterize uncertainties in the seismic images and extrapolate to more expensive potential scenarios requiring hypothetically up to 50,000 samples. Here, N_S is equal to 1100: 600 samples to train the neural network with an accuracy of R² ≥ 0.95 and 500 to test the surrogate model. Figure 9(b) depicts the efficiency analysis as a function of N_MC. We note, for the worst scenario, an efficiency of around 78%, and for the scenarios where N_MC is higher than 10,000 samples, the efficiencies reach values greater than 90%. For the most expensive scenario, we see an efficiency close to 98%.

Fig. 10 shows comparisons between images randomly selected from the test set and the corresponding images produced with the full RTM model. We observe that the surrogate model returns excellent predictions of the imaging condition.
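The extrapolated efficiencies quoted above for Fig. 9(b) follow directly from Eq. (11); a short sketch with η = 1 and N_S = 1100 (600 training plus 500 testing samples) reproduces them.

    # Efficiency of Eq. (11) for the extrapolated scenarios of Fig. 9(b).
    n_s, eta = 1100, 1.0
    for n_mc in range(5_000, 50_001, 5_000):
        efficiency = (1.0 - eta * n_s / n_mc) * 100.0
        print(f"N_MC = {n_mc:6d}  efficiency = {efficiency:5.1f} %")
    # N_MC = 5,000 gives about 78%; N_MC = 50,000 gives close to 98%.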
Table 4
Neural Network Architecture. "Output" is the number of feature maps and "Dimension" is the spatial dimension of the feature maps.

Layers           Output    Dimension
Input            1         100 × 100
Convolution      48        48 × 48
Dense-block 1    112       48 × 48
Encoding         56        24 × 24
Dense-block 2    120       24 × 24
Encoding         60        12 × 12
Dense-block 3    124       12 × 12
Decoding         62        24 × 24
Dense-block 4    126       24 × 24
Decoding         63        48 × 48
Dense-block 5    127       48 × 48
Decoding         1         100 × 100
ReLU             1         100 × 100
Figure 9:
Test R²-score (a) and efficiency (b) as functions of the number of MC samples.

We can also note that the surrogate model predicts the UQ figures - standard deviation, σ(r), confidence index, c(r), and coefficient of variation, c_v(r) - with good accuracy, as seen in Fig. 11, where the relative errors between the surrogate predictions and the RTM model are lower than 6%. The results in Figures 10 and 11 show that the encoder-decoder surrogate extrapolates the replication of the IC training targets, delivering remarkable results to assist in UQ analysis.

Next, we deepen our investigation of the surrogate's ability to reproduce the probabilistic characterization of the IC fields by plotting the probability density functions (PDFs) of the imaging condition at control points in the domain, as displayed in Fig. 7. We place the control points in regions of low and high uncertainties. As the reference solution, we use the PDFs obtained by the RTM model with the 500 test samples to compare the accuracy of the surrogate models trained with different datasets to estimate the PDFs at the control points.

Figure 10:
Randomly selected images from the test data set computed by the RTM model (a) and the surrogate model (b) trained with 600 samples. The relative errors in the imaging condition, e_I, are lower than 6%.

Figure 12 shows the imaging condition PDFs estimated with the surrogate models trained with 400, 600, and 800 samples, together with the reference PDFs. We observe that the PDFs obtained with the surrogate model are in good agreement with the reference ones for all control points, particularly at the deepest control point.

Now we exemplify a possible use of the surrogate in the feature extraction and interpretation of seismic images, Stage 3 of Algorithm 1. We provide, using the surrogate, a view of the uncertainties associated with specific seismic targets: the interfaces between geological layers.
Figure 11:
UQ indexes - standard deviation, σ(r), confidence index, c(r), and coefficient of variation, c_v(r) - predicted by the RTM model (left) and the surrogate model (right). The relative errors between the surrogate predictions and the RTM model for the UQ indexes are less than 6%.

This view can reveal how the propagated uncertainties can directly impact the posterior interpretation of the images. Figure 13 provides the IC mean value and associated confidence bands for the four interfaces. In the right part of the figure, we give an idea of the spatial distribution of the uncertainties, having as background a randomly selected image from the ensemble. To promote visual perception, we plotted amplified IC confidence bands associated with each interface.

Figure 12:
Comparison between the PDFs predicted by the RTM model and the surrogate model.

Those bands reflect the dispersion of the IC values amid the image ensemble and, therefore, might lead to a lack of confidence in the reflector placement.
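A minimal sketch of how such figures can be assembled from an image ensemble is given below; the ±2σ band width and the kernel density estimator are our choices, since the paper does not state how the bands of Fig. 13 and the PDFs of Figs. 12 and 18 are constructed.

    import numpy as np
    from scipy.stats import gaussian_kde

    # images: array of shape (N_ensemble, N_y, N_x) holding the migrated ICs
    # produced either by the RTM model or by the surrogate.
    def interface_band(images, row):
        # Mean IC and a two-standard-deviation band along one horizon; the grid row
        # is assumed to have been picked beforehand (e.g., by horizon detection).
        mean = images[:, row, :].mean(axis=0)
        std = images[:, row, :].std(axis=0)
        return mean, mean - 2.0 * std, mean + 2.0 * std

    def ic_pdf(images, iy, ix, grid):
        # Kernel density estimate of the IC values at a control point (as in Fig. 12).
        return gaussian_kde(images[:, iy, ix])(grid)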
We now stress the proposed deep learning surrogate by testing its performance in a scenario not considered for the training, but still focusing on the subsurface geology of Fig. 7. The RTM image comprises a certain number of shots, each designed for illuminating the same subsurface using different seismic source conditions. Here we analyze a situation involving a higher excitation frequency, f = 45 Hz, implying, due to numerical requirements, the necessity of a finer grid. In such a case, input and output images have different dimensions compared to the previous scenario. Still, the intrinsic dimensionality of the input is the same, as we are imaging the same velocity field as before. Instead of seeking a new architecture, we slightly changed the previous one by replacing the first and last network layers and adapting the initial convolutional layer to ensure that an integer number defines the kernel. We do not expect to obtain the surrogate's optimal performance by employing such a strategy, but it can be quite useful in practical terms if it works. Table 5 shows the neural network architecture for the 45 Hz scenario. The network architecture is the same as in Section 4.2.1, with small changes: the first convolutional layer has a kernel size equal to 7 and a stride of 2.
Figure 13:
IC confidence bands over the interfaces between geological layers (left), and IC confidence bands superimposed over a randomly selected image (right), for f = 30 Hz.
Table 5
Neural Network Architecture. "Output" is the number of feature maps and "Dimension" is the dimension of the feature maps.

Layers           Output    Dimension
Input            1         150 × 150
Convolution      48        72 × 72
Dense-block 1    112       72 × 72
Encoding         56        36 × 36
Dense-block 2    120       36 × 36
Encoding         60        18 × 18
Dense-block 3    124       18 × 18
Decoding         62        36 × 36
Dense-block 4    126       36 × 36
Decoding         63        72 × 72
Dense-block 5    127       72 × 72
Decoding         1         150 × 150
ReLU             1         150 × 150

The total number of parameters in the network follows from this architecture. Figure 14 shows the RMSE decay as a function of the number of epochs in the training process. We can see in Fig. 15(a) the R² score for different training sets, showing, for this more difficult scenario, a slight decrease in the neural network quality. Confirming our initial expectations of a non-optimal but acceptable performance, the coefficients of determination R² for all training datasets are lower than 0.90.
Figure 14:
RMSE decay with number of epochs.
Figure 15: R²-score for the trained networks (a) and efficiency (b) for adjustment factors η = 1.0, 1.5, 2.0, 2.5, and 3.0.

Moreover, we estimate the efficiency of the surrogate model in the same manner as in the previous case. However, here we consider values of the adjustment factor η > 1.0. More precisely, the adjustment factor tries to estimate the time spent in the search of the neural network hyperparameters to optimize the surrogate model accuracy and in generating larger training sets. Without loss of generality, we assume that the number of samples N_S related to the neural network is equal to 1100: 600 to train and 500 to test the surrogate model. Figure 15(b) shows the efficiency as a function of N_MC for the several adjustment factors. Note that for scenarios with smaller N_MC, the efficiency drops significantly for the higher adjustment factors. However, for scenarios with larger N_MC, the efficiency reaches values close to 80-90%, and for the largest N_MC considered, we observe an efficiency close to 90% even for the highest adjustment factor.

Despite the lower accuracy presented in this scenario, the surrogate model could reach satisfactory predictions of the imaging condition, as we can see in Fig. 16. In this figure, we show three randomly selected images from the test data set computed by the RTM model and the surrogate model. Note, however, that the image produced by the RTM model may not be the best image we can compute for these conditions. The grid is adjusted only to satisfy the stability and dispersion criteria for the 45 Hz cutoff frequency. We do not optimize the domain size for a proper representation of the non-reflecting boundary conditions and the source/receiver arrangement.

Figure 16:
Randomly selected images from the test data set computed by the RTM model (a) and the surrogate model (b) trained with 600 samples. The relative errors, e_I, are lower than 10%.

Furthermore, Figure 17 shows a comparison between the standard deviation, σ(r), confidence index, c(r), and coefficient of variation, c_v(r), computed by the RTM and surrogate models. We observe that the surrogate model predicts the UQ indexes with satisfactory accuracy; the relative errors between the surrogate predictions and the RTM model for the UQ indexes are lower than 6%.

We now investigate the probability density functions (PDFs) of the imaging condition at the control points in Fig. 7. We use again as the reference solution the PDFs obtained by the RTM model with 500 test samples to verify the accuracy of the surrogate models trained with different datasets to estimate the PDFs at the control points.
Figure 17:
UQ indexes - standard deviation, σ(r), confidence index, c(r), and coefficient of variation, c_v(r) - predicted by the RTM model (left) and the surrogate model (right). The relative errors between the surrogate predictions and the RTM model for the UQ indexes are lower than 6%.

Figure 18 depicts the imaging condition PDFs at the control points estimated by the surrogate models trained with 200, 400, 600, and 800 samples, together with the reference PDFs. We observe that the PDFs obtained with the surrogate model capture the reference PDFs well at all control points, particularly for the larger training datasets.

Figure 18:
Comparison between PDFs predicted by the RTM model and the surrogate model.
5. Conclusions
We propose a deep learning model based on an encoder-decoder architecture to replace the costly RTM technique in producing seismic images. This approach naturally fits the framework of the computational workflow proposed in [2] to produce seismic images with quantified uncertainty. The surrogate model builds a scalable image-to-image mapping, coping with the high dimensionality of both the heterogeneous velocity fields that serve as inputs and the images that are the outputs. Such a surrogate has revealed itself to be very efficient in the context of UQ many-query tasks, as demonstrated by our numerical examples. Indeed, that was observed even in cases where we employed a non-optimal neural network architecture.

We place our contribution in the emerging area of physics-informed machine learning, where the final model, in many different ways, blends two main components: often expensive computational models relying on first principles and phenomenological closure equations, and machine learning data-driven tools. Such a combination not only suits perfectly the needs of the workflow mentioned earlier but also offers a broad spectrum of opportunities to improve performance, like employing more powerful training strategies and automatic hyperparameter optimization.
Acknowledgements
This study was financed in part by CAPES, Brasil, Finance Code 001. This work is also partially supported by FAPERJ (grant E-26/203.018/2017), CNPq (grant 302489/2016-9), and Petrobras (grant 2017/00148-9).
References

[1] Yu Chen, Kai Gao, Eric S. Davis, Dipen N. Sinha, Cristian Pantea, and Lianjie Huang. Full-waveform inversion and least-squares reverse-time migration imaging of collimated ultrasonic-beam data for high-resolution wellbore integrity monitoring. Applied Physics Letters, 113(7):071903, 2018.
[2] Carlos H. S. Barbosa, Liliane N. O. Kunstmann, Rômulo M. Silva, Charlan D. S. Alves, Bruno S. Silva, Marta Mattoso, Fernando A. Rochinha, Alvaro L. G. A. Coutinho, et al. A workflow for seismic imaging with quantified uncertainty. arXiv preprint arXiv:2001.06444, 2020.
[3] A. Botero, A. Gesret, T. Romary, M. Noble, and C. Maisons. Stochastic seismic tomography by interacting Markov chains. Geophysical Journal International, 207:374–392, 2016.
[4] J. Belhadj, T. Romary, A. Gesret, M. Noble, and B. Figliuzzi. New parametrizations for Bayesian seismic tomography. Inverse Problems, 34:33, 2018.
[5] N. Brantut. Time-resolved tomography using acoustic emissions in the laboratory, and application to sandstone compaction. Geophysical Journal International, 213:2177–2192, 2018.
[6] James Martin, Lucas C. Wilcox, Carsten Burstedde, and Omar Ghattas. A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM Journal on Scientific Computing, 34(3):A1460–A1487, 2012.
[7] Hejun Zhu, Siwei Li, Sergey Fomel, Georg Stadler, and Omar Ghattas. A Bayesian approach to estimate uncertainty for full-waveform inversion using a priori information from depth migration. Geophysics, 81(5):R307–R323, 2016.
[8] Reetam Biswas and Mrinal Sen. 2D full-waveform inversion and uncertainty estimation using the reversible jump Hamiltonian Monte Carlo. In SEG Technical Program Expanded Abstracts 2017, pages 1280–1285. Society of Exploration Geophysicists, 2017.
[9] Lars Gebraad, Christian Boehm, and Andreas Fichtner. Bayesian elastic full-waveform inversion using Hamiltonian Monte Carlo. EarthArXiv, page qftn5, 2019.
[10] Zeyu Zhao and Mrinal K. Sen. A gradient based MCMC method for FWI and uncertainty analysis. In SEG Technical Program Expanded Abstracts 2019, pages 1465–1469. Society of Exploration Geophysicists, 2019.
[11] H.-W. Zhou, H. Hu, Z. Zou, Y. Wo, and O. Youn. Reverse time migration: A prospect of seismic imaging methodology. Earth-Science Reviews, 179:207–227, 2018.
[12] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. 2006.
[13] Ilias Bilionis and Nicholas Zabaras. Multi-output local Gaussian process regression: Applications to uncertainty quantification. Journal of Computational Physics, 231(17):5718–5746, 2012.
[14] Ilias Bilionis, Nicholas Zabaras, Bledar A. Konomi, and Guang Lin. Multi-output separable Gaussian process: Towards an efficient, fully Bayesian paradigm for uncertainty quantification. Journal of Computational Physics, 241:212–239, 2013.
[15] L. Parussini, D. Venturi, P. Perdikaris, and G. E. Karniadakis. Multi-fidelity Gaussian process regression for prediction of random fields. Journal of Computational Physics, 337:36–50, 2017.
[16] P. Perdikaris, D. Venturi, and G. E. Karniadakis. Multi-fidelity information fusion algorithms for high-dimensional systems and massive data sets. SIAM Journal on Scientific Computing, 38:B521–B538, 2016.
[17] G. S. H. Pau, Y. Zhang, and S. Finsterle. Reduced order models for many-query subsurface flow applications. Computational Geosciences, 17:705–721, 2013.
[18] Y. Zhu and N. Zabaras. Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification. Journal of Computational Physics, 366:415–447, 2018.
[19] S. Mo, Y. Zhu, N. Zabaras, X. Shi, and J. Wu. Deep convolutional encoder-decoder networks for uncertainty quantification of dynamic multiphase flow in heterogeneous media. Water Resources Research, 55:703–728, 2019.
[20] Y. Zhu, N. Zabaras, P. S. Koutsourelakis, and P. Perdikaris. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics, 394:56–81, 2019.
[21] S. Mo, N. Zabaras, X. Shi, and J. Wu. Integration of adversarial autoencoders with residual dense convolutional networks for estimation of non-Gaussian hydraulic conductivities. Water Resources Research, 56:e2019WR026082, 2020.
[22] Y. Yang and P. Perdikaris. Conditional deep surrogate models for stochastic, high-dimensional, and multi-fidelity systems. Computational Mechanics, 64:417–434, 2019.
[23] M. Tang, Y. Liu, and L. J. Durlofsky. A deep-learning-based surrogate model for data assimilation in dynamic subsurface flow problems. Journal of Computational Physics, 413:109456, 2020.
[24] Adar Kahana, Eli Turkel, Shai Dekel, and Dan Givoli. Obstacle segmentation based on the wave equation and deep learning. Journal of Computational Physics, 413:109458, 2020.
[25] Jian Sun, Zhan Niu, Kristopher A. Innanen, Junxiao Li, and Daniel O. Trad. A theory-guided deep-learning formulation and optimization of seismic waveform inversion. Geophysics, 85(2):R87–R99, 2020.
[26] G. T. Schuster. Seismic inversion. Society of Exploration Geophysicists, 1st edition, 2017.
[27] Charles Cerjan, Dan Kosloff, Ronnie Kosloff, and Moshe Reshef. A nonreflecting boundary condition for discrete acoustic and elastic wave equations. Geophysics, 50(4):705–708, 1985.
[28] Wen-Fong Chang and George A. McMechan. Reverse-time migration of offset vertical seismic profiling data using the excitation-time imaging condition. Geophysics, 51(1):67–84, 1986.
[29] Wen-Fong Chang and George A. McMechan. Elastic reverse-time migration. Geophysics, 52(10):1365–1375, 1987.
[30] Bao D. Nguyen and George A. McMechan. Excitation amplitude imaging condition for prestack reverse-time migration. Geophysics, 78(1):S37–S46, 2013.
[31] Paul Sava and Sergey Fomel. Time-shift imaging condition in seismic migration. Geophysics, 71(6):S209–S217, 2006.
[32] Bin Wang, Chuck Mason, Manhong Guo, Kwangjin Yoon, Jun Cai, Jean Ji, and Zhiming Li. Subsalt velocity update and composite imaging using reverse-time-migration based delayed-imaging-time scan. Geophysics, 74(6):WCA159–WCA166, 2009.
[33] Paul Sava and Ivan Vasconcelos. Extended imaging conditions for wave-equation migration. Geophysical Prospecting, 59(1):35–55, 2011.
[34] Faqi Liu, Guanquan Zhang, Scott A. Morton, and Jacques P. Leveille. An effective imaging condition for reverse-time migration using wavefield decomposition. Geophysics, 76(1):S29–S39, 2011.
[35] Sandip Chattopadhyay and George A. McMechan. Imaging conditions for prestack reverse-time migration. Geophysics, 73(3):S81–S89, 2008.
[36] John C. Strikwerda. Finite Difference Schemes and Partial Differential Equations, Second Edition. SIAM: Society for Industrial and Applied Mathematics.
[37] Toktam Zand, Hamid R. Siahkoohi, Alison Malcolm, Ali Gholami, and Alan Richardson. Consensus optimization of total variation–based reverse time migration. Computational Geosciences, 2020.
[38] Philip Kearey, Michael Brooks, and Ian Hill. An introduction to geophysical exploration. John Wiley & Sons, 2013.
[39] Öz Yilmaz. Seismic data analysis: Processing, inversion, and interpretation of seismic data. Society of Exploration Geophysicists, 2001.
[40] Lanshu Bai, Huiyi Lu, and Yike Liu. High-efficiency observations: compressive sensing and recovery of seismic waveform data. Pure and Applied Geophysics, 177(1):469–485, 2020.
[41] Navjot Kukreja, Jan Huckelheim, Mathias Louboutin, Kaiyuan Hou, Fabio Luporini, Paul Hovland, and Gerard Gorman. Combining checkpointing and data compression for large scale seismic inversion. arXiv preprint arXiv:1810.05268, 2018.
[42] Peter Lindstrom, Po Chen, and En-Jui Lee. Reducing disk storage of full-3D seismic waveform tomography (F3DT) through lossy online compression. Computers & Geosciences, 93:45–54, 2016.
[43] Philipp A. Witte, Mathias Louboutin, Fabio Luporini, Gerard J. Gorman, and Felix J. Herrmann. Compressive least-squares migration with on-the-fly Fourier transforms. Geophysics, 84(5):R655–R672, 2019.
[44] Marco Ballesio, Joakim Beck, Anamika Pandey, Laura Parisi, Erik von Schwerin, and Raúl Tempone. Multilevel Monte Carlo acceleration of seismic wave propagation under uncertainty. GEM-International Journal on Geomathematics, 10(1):22, 2019.
[45] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[46] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[47] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, ICML'15, pages 448–456. JMLR.org, 2015.
[48] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Geoffrey Gordon, David Dunson, and Miroslav Dudík, editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pages 315–323, Fort Lauderdale, FL, USA, 11–13 Apr 2011. PMLR.
[49] F. Chollet. Deep Learning with Python. Manning Publications Company, 2017.
[50] Norman Ricker. The form and laws of propagation of seismic wavelets. Geophysics, 18(1):10–40, 1953.
[51] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[52] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[53] Sanford Weisberg. Applied Linear Regression. John Wiley & Sons, Inc., 2005.
[54] Y. Li and J. Sun. 3D magnetization inversion using fuzzy c-means clustering with application to geology differentiation. Geophysics, 81:J61–J78, 2016.
[55] T. Huang, Y. Zhang, and H. Zhang. The benefit of TTI reverse time migration for subsalt imaging, Gulf of Mexico. 2009.