An encoder-decoder deep surrogate for reverse time migration in seismic imaging under uncertainty
Rodolfo S. M. Freitas a, Carlos H. S. Barbosa a, Gabriel M. Guerra b, Alvaro L. G. A. Coutinho a and Fernando A. Rochinha a,*

a COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, 21941-598, Brazil
b Department of Mechanical Engineering, Federal Fluminense University, Niterói, Brazil

* Corresponding author. Email: [email protected] (Rodolfo S. M. Freitas), [email protected] (Fernando A. Rochinha)
ARTICLE INFO
Keywords: Reverse time migration, Deep Learning, Surrogate Modeling, Uncertainty Quantification
ABSTRACT
Seismic imaging faces challenges due to the presence of several uncertainty sources. Uncertainties exist in data measurements, source positioning, and subsurface geophysical properties. Reverse time migration (RTM) is a high-resolution depth migration approach useful for extracting information such as reservoir localization and boundaries. RTM, however, is time-consuming and data-intensive, as it requires solving the wave equation twice to generate and store an imaging condition. When embedded in an uncertainty quantification algorithm (like the Monte Carlo method), RTM shows a many-fold increase in its computational complexity due to the high input-output dimensionality. In this work, we propose an encoder-decoder deep learning surrogate model for RTM under uncertainty. The inputs are an ensemble of velocity fields, expressing the uncertainty, and the outputs are the corresponding seismic images. We show by numerical experimentation that the surrogate model can reproduce the seismic images accurately and, more importantly, the uncertainty propagation from the input velocity fields to the image ensemble.
1. Introduction
Seismic imaging is employed to delineate the salient geological features of the Earth subsurface. Imaging methods are popular in the Oil & Gas industry as they are designed to focus on the most essential characteristics: the horizons bounding the regions of interest. They can also be used in conjunction with inverse methods such as Full Waveform Inversion [1]. Imaging methods are designed and built by integrating specialized optical (illuminating) principles and physics-based models describing wave propagation through heterogeneous media. A critical aspect arising from such an arrangement is the potential computational cost required, as a large domain is to be illuminated, which implies solving partial differential equations (PDEs)
associated with the wave models in that area. The situation tends to be more complicated as the excitation signals bear high-frequency content, which demands very fine grids in space and time. Such time-consuming tasks often hamper the use of high-fidelity codes constructed upon physics-based models. That becomes a more critical issue whenever one faces many-query applications like sensitivity analysis, design, optimization, or uncertainty quantification.

In this work, we develop a machine-learning model to alleviate the computational cost of providing seismic images with quantified uncertainty [2]. In this context, we propose a Monte Carlo method (MC) to sweep a large ensemble of plausible velocity fields obtained by approximate methods, and therefore prone to uncertainties, to compute an ensemble of images aiming at characterizing the uncertainties propagated along the seismic image processing. Moreover, we embed the MC sampling as an outer loop of a larger computational workflow proposed in [2] and detailed in Algorithm 1. This algorithm is structured in three sequential stages, enabling a probabilistic framework for seismic imaging.

The first stage aims at generating plausible subsurface velocity fields honoring seismic data. Probabilistic inversion, such as Bayesian tomography [2, 3, 4, 5], and stochastic FWI [6, 7, 8, 9, 10] can provide a velocity field ensemble used as input to the second stage. Hence, in Stage 2, an imaging technique migrates the seismogram information using each velocity field sample. This strategy wraps a seismic migration tool into an MC algorithm aiming to build a set of migrated seismic images. We have chosen Reverse Time Migration (RTM) as the seismic migration technique to localize the seismic reflectors at the correct depth in the subsurface [11]. RTM is a depth migration approach based on the two-way wave equation, frequently used in industry, that provides reliable high-resolution subsurface seismic images useful for seismic interpretation and reservoir characterization [11]. The last stage of the workflow post-processes the ensemble of RTM seismic images, calculating uncertainty maps and extracting features, such as horizons and faults, that characterize the uncertainty in the resulting images.
Algorithm 1
Workflow for seismic imaging with quantified uncertainty
Input: source signals, seismograms, and spatial domain (raw data).
Output: ensemble of seismic images.
Stage 1: Generate an ensemble of velocity fields:
• Bayesian inversion with simplified physics-based models
Stage 2: Propagate uncertainties – migrate the seismograms for the velocity field ensemble using RTM, producing a corresponding ensemble of seismic images
• Monte Carlo loop over samples produced in Stage 1
Stage 3: Post-process the RTM seismic images
• Uncertainty maps;
• Automatic feature (horizon) detection;
• Probabilistic characterization of such features.

Due to its flexibility by design, it is possible to generate different workflow versions by, for instance, replacing components within the stages (e.g., different strategies for the uncertain velocity field estimation in Stage 1) to accommodate different demands or efficiency requirements; a minimal sketch of such a driver is given below.
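The sketch below only illustrates how the three stages compose; the function names (generate_velocity_ensemble, rtm_migrate, postprocess_images) are hypothetical placeholders and not the actual components of [2].

    # Hypothetical driver for Algorithm 1; each stage is a pluggable component.
    def seismic_imaging_uq(source, seismograms, domain, n_mc):
        # Stage 1: ensemble of plausible velocity fields honoring the seismic data
        # (e.g., Bayesian tomography or stochastic FWI); placeholder function.
        velocities = generate_velocity_ensemble(seismograms, domain, n_samples=n_mc)

        # Stage 2: Monte Carlo loop -- migrate the seismograms with RTM per sample.
        images = [rtm_migrate(v, seismograms, source) for v in velocities]

        # Stage 3: uncertainty maps, horizon detection and their probabilistic
        # characterization over the image ensemble; placeholder function.
        return postprocess_images(images)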
Nonetheless, we would still be facing a time-consuming computational task in the many-query UQ analysis of Stage 2. That is what motivates us to follow a consolidated trend, replacing the original physics-based model by a cheap-to-evaluate surrogate. Recently, Machine Learning techniques, like Gaussian Processes [12, 13, 14, 15, 16, 17] and Deep Neural Networks (DNNs) [18, 19, 20, 21, 22, 23, 24, 25], have deployed efficient surrogates for UQ analysis. Gaussian Processes have achieved considerable success with computer models with inputs and outputs of moderate dimensionality, including the ability to blend data of different sources, leading to multi-fidelity approaches [15, 16]. On the other hand, Deep Neural Networks have gained traction due to their unique profile of being flexible and scalable nonlinear function approximators. Another aspect worth highlighting is the substantial amount of computer libraries and tools available to enable their use.

Here, we apply the deep learning surrogate architecture proposed in [18] for systems governed by PDEs cast as an image-to-image problem. The performance of such architecture was tested in uncertainty quantification of flows in heterogeneous media [19], extended to semi-supervised learning in [20] and to inverse problems in [21], with excellent results. This architecture is composed of convolutional layers and dense blocks, following an encoder-decoder neural network arrangement to handle the potential high dimensionality of inputs and outputs. More specifically, we employ the deep learning architecture to construct efficient proxies for RTM imaging, avoiding the high costs of solving a wave propagation equation twice in a heterogeneous medium. Such surrogates are nonlinear mappings linking the uncertain velocity field to the seismic images. It is worth highlighting that, differently from usual surrogates, we do not replace only a forward solver associated to a PDE, but the whole, more expensive, imaging process. The surrogate can handle the high-dimensional inputs (velocity fields) and outputs (seismic images), leading to cost savings in processing and memory storage. We demonstrate through two examples that such an approach enables producing seismic images with quantified uncertainty. Indeed, it can accurately reproduce the ensemble of images resulting from the MC uncertainty propagation with much less computational effort. Moreover, it uses a limited amount of training data, as expected, which was confirmed by our results and efficiency estimation.

The remainder of this paper is organized as follows. The next section details the RTM mathematical problem along with its computational implementation within the Stage 2 MC uncertainty propagation. Section 3 presents our deep learning surrogate architecture and training strategy. In Section 4, we present numerical experiments where we investigate the accuracy, convergence, and cost-effectiveness of our surrogate model to replace the high-fidelity RTM under uncertainty. The paper ends with a summary of our main findings.
2. Reverse Time Migration under Uncertainty
RTM is a high-resolution depth migration technique providing useful subsurface images for extracting information such as reservoir localization and boundaries [11]. The raw data for RTM consist of recorded seismic signals induced by a seismic source (a shot). The group of seismic signals represents a seismogram that captures information related to reflections coming from the subsurface. RTM relies on the two-way wave propagation equation, resulting in an imaging condition (IC) [26] computed over the space-time domain to be imaged. More specifically, the wave
equation is solved twice, the first time to compute the forward-propagated wave due to the seismic sources, followed by the computation of the backward-propagated wave induced by the recorded seismograms. Both solutions are needed to compute the IC. We calculate the forward wave for the isotropic acoustic case by solving

∇²p(r, t) − (1/v²(r)) ∂²p(r, t)/∂t² = f(r_s, t),
p(r, t) = 0 on ∂Ω_D and 𝒜p(r, t) = 0 on ∂Ω_inf,                                        (1)
p(r, 0) = 0 and ∂p(r, 0)/∂t = 0, r ∈ Ω,

where Ω ⊂ ℝ³ denotes the domain to be imaged, ∂Ω = ∂Ω_D ∪ ∂Ω_inf is the domain boundary, and ∂Ω_D and ∂Ω_inf are non-overlapping boundary partitions. ∂Ω_D is the portion of the boundary where Dirichlet boundary conditions are applied, representing, for instance, the free surface. The operator 𝒜 represents the non-reflecting boundary condition [27] applied on ∂Ω_inf. The pressure (the forward-propagated source wavefield) p(r, t) is defined at the position r = (r_x, r_y, r_z) ∈ Ω and time t ∈ [0, T]. Moreover, v(r) is the compressional wave velocity spatial field, and f(r_s, t) is the seismic source. The vector r_s represents the seismic source position. The backward-propagated wavefield is calculated by solving

∇²p̄(r, τ) − (1/v²(r)) ∂²p̄(r, τ)/∂τ² = s(r_r, τ),
p̄(r, τ) = 0 on ∂Ω_D and 𝒜p̄(r, τ) = 0 on ∂Ω_inf,                                       (2)
p̄(r, 0) = p(r, T) and ∂p̄(r, 0)/∂τ = ∂p(r, T)/∂τ, r ∈ Ω,

which is an equation similar to (1), but with a different source s(r_r, τ), that is, the recorded signals at the receivers positioned at r_r. Besides, the evolution in Eq. (2) is over the reverse time τ = T − t. Thus, the backward-propagated wavefield p̄(r, τ) is defined in Ω for τ ∈ [0, T].

The IC dictates the quality and fidelity of the final RTM image. There are several possibilities, for instance, excitation ICs [28, 29, 30], extended ICs [31, 32, 33], wavefield decomposition ICs [34], and the zero-lag cross-correlation IC [11, 35]. We have chosen the zero-lag cross-correlation between the forward and backward propagated waves at each point in Ω,

I(r) = ∫₀ᵀ p(r, t) p̄(r, τ) dt.                                                         (3)

The IC amplitudes in equation (3) do not provide an explicit physical relationship with the reflection coefficients. In [35], we find a detailed explanation of the relation between the imaging condition and the reflection coefficient. Nevertheless, the resulting image provides the correct amplitude contrast locations of the geological interfaces of rocks with different physical properties [11]. The amplitude contrast patterns are the main feature of the migrated seismic images explored in the present work.

We apply an explicit finite difference numerical scheme, second-order in time and higher-order in space [36], to approximate equations (1) and (2), leading to the vector v, the discrete version of the velocity field, and similarly to the vectors p, p̄, s, f at each time step. Note that the vectors p, p̄, and v have the same dimension, that is, N = N_x · N_y (or N = N_x · N_y · N_z in 3D), where N_x, N_y (and N_z) are the numbers of grid points in each Cartesian direction. Each discrete seismogram is a vector of size N_rec · (N_t + 1), where N_rec is the number of receivers and N_t = T/Δt, with Δt the time step. The seismic source f has dimension N_t. RTM is not only computationally intensive but also data-intensive due to the high-dimensional inputs, the amount of data to manage, for instance, storing and retrieving p, and the computational costs associated with imposing the stability and dispersion conditions in the discrete two-way wave equation [11]. The dispersion relation takes into account the number of grid points per wavelength and the medium properties, which in our isotropic acoustic case is the P-wave velocity. Thus, the complexity of estimating the migrated image increases with highly heterogeneous media and with the seismic source cutoff frequency. Least-squares RTM (LSRTM) extends the basic method to an iterative method that minimizes a data misfit term [26, 37]. However, in the present work, we restrict ourselves to the standard RTM technique.

As we wrap RTM into a sampling method in Algorithm 1 to take the input uncertainties into consideration, the overall computational cost of Stage 2 rises proportionally to N_MC, the cardinality of the ensemble of possible velocity fields. Typically, seismic raw data recording sets are split into multiple steps (N_shots) to cope with the potentially large spatial dimensions to be covered and to enhance the signal-to-noise ratio (SNR) in the processing stages. Each step, covering fully or partially the domain, corresponds to a different arrangement of sources and receivers. It is essential to mention that RTM iterates over the number of shots, producing partial migrated images characterized by equation (3).
When this loop ends, a process called stacking sums the partially migrated seismic images into a single one [38, 39], gathering into this single image all the information related to the seismogram set. Algorithm 2 details the generation of the RTM image ensemble, where a set of seismograms, {s_1, ..., s_{N_shots}}, a set of velocity fields, {v_1, ..., v_{N_MC}}, and a seismic source (f) are given as inputs. The indexes N_shots and N_MC represent the number of shots and the number of samples for the MC method. For each MC iteration, we solve the wave equation twice, once for the seismic source and once for the seismograms associated with it. The computation of the imaging condition uses both solutions (source wavefield and receiver wavefield), retrieving the source wavefield from persistent storage to build the migrated seismic section and stacking the partial results over time (I_Σt) and over the number of seismograms (I_Σshot_id). At the end of Algorithm 2, we have the discrete imaging condition set {I_1, I_2, ..., I_{N_MC}}, where each I_i is a vector in ℝ^N. It is usual to filter each image to sharpen its features. Nevertheless, we do not apply any filter to the ensemble of seismic images. Summarizing, migrating the seismograms for the set of velocity fields produces the corresponding set of migrated seismic images, where each one has a direct relation with one velocity sample.

Different strategies can be pursued to make Algorithm 1 feasible by reducing the inherent computational costs of processing and storage. They could rely, for instance, on data compression [40, 41, 42, 43] or more effective stochastic sampling [44], but here, as mentioned before, our option is for using deep learning surrogates for the RTM imaging, which we describe in the following section.
Algorithm 2
Reverse Time Migration under Uncertainty.
Require: {v_1, ..., v_{N_MC}}, {s_1, ..., s_{N_shots}}, and f
function RTM_UQ(vectors {v_1, ..., v_{N_MC}}, vectors {s_1, ..., s_{N_shots}}, vector f)
    for sample_id = 1 to N_MC do
        read v_{sample_id} and f
        initialize image condition I_{Σ shot_id} = 0
        for shot_id = 1 to N_shots do
            initialize n_t = 0
            apply initial conditions for i_t = 0
            for i_t = 1 to N_t do
                n_t = n_t + i_t · Δt
                solve equation (1)                               ▷ source wavefield
                store p_{n_t}
            end for
            initialize n_τ = 0 and I_{Σ τ} = 0
            apply initial conditions for i_τ = 0
            for i_τ = 1 to N_t do
                n_τ = N_t − (n_τ + i_τ · Δτ)                     ▷ reverse time
                read p_{n_τ} and s_{shot_id}
                solve equation (2)                               ▷ receiver wavefield
                calculate I_{Σ n_τ} = I_{Σ n_τ} + p_{n_τ} p̄_{n_τ}  ▷ imaging condition
            end for
            stack I_{Σ shot_id} = I_{Σ shot_id} + I_{Σ n_τ}       ▷ stacking
        end for
        store I_{Σ shot_id} for all sample_id, 1 ≤ sample_id ≤ N_MC
    end for
end function
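A compact NumPy sketch of the operations in Algorithm 2 is given below. It is not the authors' implementation: it assumes a simple second-order stencil, homogeneous Dirichlet boundaries, crude point injection of the wavelet and of the time-reversed traces, and keeps the whole source wavefield in memory, whereas the actual code uses a higher-order scheme, non-reflecting boundaries [27], and persistent storage.

    import numpy as np

    def laplacian(u, h):
        # Five-point Laplacian with zero (Dirichlet) padding; the paper uses a
        # higher-order stencil and non-reflecting boundaries, omitted here.
        lap = np.zeros_like(u)
        lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
                           - 4.0 * u[1:-1, 1:-1]) / h**2
        return lap

    def propagate(v, h, dt, nt, add_source):
        # Explicit two-step time marching for the acoustic wave equation (Eq. (1)/(2));
        # add_source(step, field) injects either the shot wavelet or the recorded traces.
        p_old = np.zeros_like(v)
        p = np.zeros_like(v)
        wavefields = []
        for it in range(nt):
            p_new = 2.0 * p - p_old + (v * dt) ** 2 * laplacian(p, h)
            add_source(it, p_new)
            p_old, p = p, p_new
            wavefields.append(p.copy())      # stored source wavefield (persistent in practice)
        return wavefields

    def rtm_single(v, shots, wavelet, h, dt, nt):
        # One RTM image for one velocity sample: forward and time-reversed solves per
        # shot, zero-lag cross-correlation IC (Eq. (3)), and stacking over shots.
        # shots: list of (src, rec_positions, seismogram) with seismogram shape (nt, n_rec).
        image = np.zeros_like(v)
        for src, rec_positions, seismogram in shots:
            def inject_shot(it, p):
                p[src] += wavelet[it]                      # simplified source injection
            fwd = propagate(v, h, dt, nt, inject_shot)

            partial = np.zeros_like(v)
            bwd_old = np.zeros_like(v)
            bwd = np.zeros_like(v)
            for it in range(nt):
                bwd_new = 2.0 * bwd - bwd_old + (v * dt) ** 2 * laplacian(bwd, h)
                for k, (ix, iy) in enumerate(rec_positions):
                    bwd_new[ix, iy] += seismogram[nt - 1 - it, k]   # time-reversed traces (Eq. (2))
                bwd_old, bwd = bwd, bwd_new
                partial += fwd[nt - 1 - it] * bwd          # zero-lag cross-correlation (Eq. (3))
            image += partial                               # stacking over shots
        return image

    def rtm_uq(velocity_samples, shots, wavelet, h, dt, nt):
        # Monte Carlo loop of Algorithm 2: one migrated image per velocity sample.
        return [rtm_single(v, shots, wavelet, h, dt, nt) for v in velocity_samples]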
3. Deep Learning Surrogate
The main goal of surrogate models is to replicate the multivariate input-output mapping provided by physical models governed by PDEs in order to save computational costs. Performing uncertainty quantification in such conditions is often hampered whenever one faces high dimensionality, the so-called curse of dimensionality. As pointed out in the literature, DNNs have proved successful in such situations by exploiting low-dimensional latent spaces and sophisticated training methods [45]. We aim to construct and evaluate the performance of DNNs acting as a surrogate model for RTM imaging under uncertainty, using as baseline the encoder-decoder architecture proposed by [18] and designed for problems cast as image-to-image regressions. We briefly review the building blocks of the network in this section.

Figure 1, inspired by [18], provides the big picture of our end-to-end solution, depicting the main components of the encoder-decoder network. In our particular application, the input for the deep learning surrogate is the ensemble of heterogeneous velocity fields, and the outputs are the corresponding imaging conditions given by Eq. (3) for each sample of the velocity field ensemble. Inputs and outputs are high-dimensional spatial fields, and the surrogate modeling is cast as a field-to-field regression.
Figure 1:
RTM deep convolutional encoder-decoder network architecture. Inputs: ensemble of velocity fields v ∈ ℝ^N. Outputs: corresponding ensemble of imaging conditions I ∈ ℝ^N.

This approach is image-inspired: it relies on connecting each pixel of the input field to an output pixel, where pixels correspond to grid points in the input and output computational meshes. The trained network maps the velocity v ∈ ℝ^N into the IC field I ∈ ℝ^N.

The encoder-decoder architecture displayed in Fig. 1 consists of two sequential main phases. The first is the encoder, wherein dimension reduction occurs, followed by the decoder, which allows expressing the network output in its original dimension. Alternating dense blocks and transition layers constitute both. Indeed, this architecture merges key characteristics of fully connected and convolutional networks, suited for the present application. On one side, convolutional networks are quite effective in dimension reduction [45] and are capable of capturing the spatial correlations present in the input velocity fields. Fully connected layers enhance the information transmitted across the network, improving the overall efficiency, reflected in reasonable sizes of the needed training data set [46].

Dense blocks connect all layers directly to each other, helping the training process with the improvement of information flow and gradients across the network [46]. Inputs of the l-th layer are the concatenated outputs of the previous layers, that is, z_l = H_l([z_0, z_1, ..., z_{l−1}]), where z_l is the output of the l-th layer and [z_0, z_1, ..., z_{l−1}] refers to the concatenation of the outputs of layers 0, ..., l − 1. H_l is a non-linear transformation. Here, H_l results from applying three consecutive operations: batch normalization [47], followed by a ReLU [48] and a convolution. The dense block has two design parameters, the number of layers L and the growth rate K, the number of output features of every single layer. The transition layers here, in the encoder (decoder), are convolutional (deconvolutional) and, therefore, handle the dimensions of the inputs or outputs of the dense blocks. As shown in Figure 1, during the encoder phase, the high-dimensional velocity fields pass through an alternating series of dense blocks and encoding layers. The last layer of the encoder phase produces low-dimensional feature maps that characterize the high-dimensional field (the purple maps in Figure 1). Such maps feed the decoder phase, which is composed of an alternating series of dense blocks and decoding layers, returning the IC to its (high) original dimension.

The surrogate g is expressed formally in compact notation as

ŷ = g(x; w),                                                                            (4)

where ŷ is the surrogate prediction (imaging condition, I) for an input x (that is, a velocity field v), and the vector w contains the parameters of the neural network. Training the neural network means learning the parameters w using data from the set 𝒟 = {(x_i, y_i)}, i = 1, ..., N_train, obtained from simulations with the RTM algorithm, where N_train is the number of samples in the training set. The stochastic gradient descent algorithm computes the unknown network parameters for a given loss function [49]. We consider for training our surrogate the following regularized mean squared error (MSE) function,

L_MSE = (1/N_train) Σ_{i=1}^{N_train} ‖ŷ_i − y_i‖² + α Ω(w),                            (5)

where ŷ_i = g(x_i; w).
Here the penalty function is given by Ω(w) = wᵀw for the regularization. Moreover, the root mean squared error (RMSE) is used for monitoring the convergence of the training error. The RMSE is given by

RMSE = √[ (1/N_train) Σ_{i=1}^{N_train} ‖y_i − ŷ_i‖² ].                                 (6)
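To make the building blocks and the loss of Eqs. (4)-(6) concrete, a minimal tf.keras sketch is given below. It mirrors the layer structure described above (and, for illustration, the sizes of Table 2 for the first example), but the padding choices, regularization placement, and optimizer settings are our assumptions rather than the exact configuration used in the paper.

    import tensorflow as tf
    from tensorflow.keras import layers

    def dense_block(x, num_layers=4, growth_rate=16):
        # Each layer: batch normalization -> ReLU -> convolution, concatenated with all
        # previous feature maps (DenseNet-style connectivity [46]); L = 4, K = 16.
        for _ in range(num_layers):
            h = layers.BatchNormalization()(x)
            h = layers.ReLU()(h)
            h = layers.Conv2D(growth_rate, kernel_size=3, padding="same")(h)
            x = layers.Concatenate()([x, h])
        return x

    def encoding(x, filters):
        # Encoder transition layer: halves the spatial dimensions.
        h = layers.BatchNormalization()(x)
        h = layers.ReLU()(h)
        return layers.Conv2D(filters, kernel_size=3, strides=2, padding="same")(h)

    def decoding(x, filters):
        # Decoder transition layer: doubles the spatial dimensions (transposed convolution).
        h = layers.BatchNormalization()(x)
        h = layers.ReLU()(h)
        return layers.Conv2DTranspose(filters, kernel_size=3, strides=2, padding="same")(h)

    def build_surrogate(nx=50, ny=50):
        v = layers.Input(shape=(nx, ny, 1))                               # velocity field
        x = layers.Conv2D(48, kernel_size=4, strides=2, padding="valid")(v)   # 50x50 -> 24x24
        x = dense_block(x)                                                # 48 -> 112 features
        x = encoding(x, 56)                                               # 24x24 -> 12x12
        x = dense_block(x)                                                # 56 -> 120 features
        x = decoding(x, 60)                                               # 12x12 -> 24x24
        x = dense_block(x)                                                # 60 -> 124 features
        x = layers.Conv2DTranspose(1, kernel_size=4, strides=2, padding="valid")(x)  # -> 50x50
        x = layers.ReLU()(x)                                              # enforce IC >= 0
        return tf.keras.Model(v, x)

    model = build_surrogate()
    # The alpha*w^T*w penalty of Eq. (5) can be imposed with kernel_regularizer=
    # tf.keras.regularizers.l2(alpha) on the convolutions, or through weight decay.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse",
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    # model.fit(v_train, ic_train, ...) then trains on the RTM-generated pairs.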
4. Numerical experiments
Here, we present two examples to demonstrate the ability of the encoder-decoder surrogate to efficiently replace the original two-way wave equation RTM algorithm. In these numerical experiments, we mimic
Stage 1 outputs of Algorithm 1. That is, we need to generate an ensemble of velocities. Hence, we assign random velocity magnitudes to the different geological layers to produce synthetic data to train the neural network and perform the uncertainty analysis. The first example deals with a medium containing two flat geological layers of constant velocity. We increase the difficulty for the surrogate in the second example by employing a more complex medium, in which the five geological layers are no longer flat, implying horizontal heterogeneity. The seismic source term considered in the present work is a Ricker-type wavelet [50].

The encoder-decoder networks are constructed using the open platform TensorFlow [51]. The Adam optimizer algorithm [52] is employed for parameter learning, considering a weight decay of 10⁻⁵ and an initial learning rate of 10⁻³, with a learning rate scheduler that drops the rate two times on plateaus of the root mean squared error. We compute a total of 1300 samples by considering the velocity magnitude constant in the interior of each geological layer. Therefore, each velocity field in the ensemble has the form

v = Σ_{l=1}^{n_l} v_l (1 + σ_l ξ_l) P_l,                                                (7)

where n_l is the number of geological layers, v_l is the mean velocity within each geological layer, σ_l is the prescribed standard deviation, ξ_l ∼ 𝕌[−1, 1] is a random variable following a uniform distribution, and P_l is an N-dimensional vector containing ones in the components corresponding to the grid points of the l-th geological layer and zeros otherwise. After that, we apply a moving harmonic average to v with a window length of 20 grid points to mimic the velocity fields computed by model building techniques like tomography or full-waveform inversion. To analyze the surrogate training convergence, out of the 1300 samples computed with the original RTM model, we create four training datasets with 200, 400, 600, and 800 samples each. We use the remaining N_test = 500 samples to test the trained network.

Accuracy is measured using the distance between predictions with the surrogate model and those computed with the original RTM model. To evaluate the surrogate model quality, we consider the coefficient of determination (R²-score) metric [53]. We define the coefficient of determination as

R² = 1 − [ Σ_{i=1}^{N_test} ‖y_i − ŷ_i‖² ] / [ Σ_{i=1}^{N_test} ‖y_i − ȳ‖² ],           (8)

where ȳ = (1/N_test) Σ_{i=1}^{N_test} y_i is the mean of the test samples. The R²-score metric represents a normalized error, allowing the comparison between surrogate models trained with different datasets, with values close to 1 corresponding to the best surrogate model accuracy. Here, we consider 0.95 a good target. Also, we intend that the surrogate model returns not only good predictions of the seismic images but also accurate estimations of quantities that characterize the uncertainties in such images. To measure the degree of uncertainty in the seismic images, we follow the approach proposed in [54]. In their approach, the degree of uncertainty is expressed by a confidence index that represents the pointwise normalized standard deviation, where low values represent high variabilities and high values the opposite. The confidence index is

c(r) = (σ_max − σ(r)) / (σ_max − σ_min),                                                (9)

where c(r) is the confidence index at position r, and σ_min and σ_max are the minimum and maximum field standard deviations, respectively.
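For reference, a minimal NumPy sketch of these two quality measures, applied to arrays holding the RTM test images and the corresponding surrogate predictions, is given below (array names and shapes are illustrative).

    import numpy as np

    def r2_score(y_true, y_pred):
        # Coefficient of determination of Eq. (8) over the whole test ensemble;
        # y_true and y_pred have shape (N_test, N_y, N_x).
        y_mean = y_true.mean(axis=0)
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - y_mean) ** 2)
        return 1.0 - ss_res / ss_tot

    def confidence_index(images):
        # Pointwise normalized standard deviation of Eq. (9) over an image ensemble.
        sigma = images.std(axis=0)
        return (sigma.max() - sigma) / (sigma.max() - sigma.min())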
Another form of measuring the degree of uncertainty is the coefficient of variation, defined as the pointwise ratio between the standard deviation and the mean,

c_v(r) = σ(r) / μ(r),                                                                   (10)

where μ(r) is the expected value at position r.

In this first example, designed to evaluate the efficacy and efficiency of the proposed surrogate, we assume a simple geologic scenario in which the subsurface is composed of two horizontal homogeneous geological layers separated by a flat horizon parallel to the surface, as shown in Fig. 2. This domain has 1000 m of depth and 1000 m of lateral extension, where the velocity in the first geological layer is 3000 m/s and the velocity in the deeper geological layer is 4500 m/s.

We produce synthetic data using the wave propagation forward solver with the reference velocity field of Fig. 2 and a seismic source with a cutoff frequency of 30 Hz. In this modeling, we simulate a fixed split-spread acquisition [38] comprising nine shots, where receivers are positioned near the surface for each shot and equally spaced by 20 meters.
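The velocity samples entering this example follow Eq. (7). A minimal sketch of generating such two-layer fields on the 50 × 50 grid of Table 1 is given below; the interface position and the value of σ_l are illustrative assumptions, while the 20-grid-point window of the moving harmonic average follows the description above.

    import numpy as np

    rng = np.random.default_rng(0)
    nx, ny = 50, 50                          # grid of Table 1
    mean_velocity = [3000.0, 4500.0]         # m/s, upper and lower layers
    sigma = [0.05, 0.05]                     # relative standard deviations (assumed values)
    interface_row = ny // 2                  # flat horizon; depth chosen only for illustration

    def sample_velocity():
        # One realization of Eq. (7): a constant random velocity per layer.
        v = np.empty((ny, nx))
        xi = rng.uniform(-1.0, 1.0, size=2)
        v[:interface_row, :] = mean_velocity[0] * (1.0 + sigma[0] * xi[0])
        v[interface_row:, :] = mean_velocity[1] * (1.0 + sigma[1] * xi[1])
        # Moving harmonic average along depth with a 20-grid-point window,
        # mimicking the smooth fields produced by tomography or FWI.
        w = 20
        inv_pad = np.pad(1.0 / v, ((w // 2, w // 2 - 1), (0, 0)), mode="edge")
        kernel = np.ones(w) / w
        inv_smooth = np.apply_along_axis(
            lambda col: np.convolve(col, kernel, mode="valid"), 0, inv_pad)
        return 1.0 / inv_smooth

    ensemble = np.stack([sample_velocity() for _ in range(1300)])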
Figure 2:
Simple geologic setup: two horizontal geological layers.
Table 1
RTM numerical parameters.

Parameter            Value             Description (Unit)
h                    –                 Spatial discretization (m)
Δt                   .22 × 10⁻³        Temporal discretization (s)
t_a                  –                 –
N_x × N_y            50 × 50           Number of grid points
i_srcx, i_srcy       ([5:5:45], 5)     Source positions
i_sisx, i_sisy       ([1:1:50], 5)     Receiver positions

The seismic source is also placed near the surface and moved 100 meters for each shot, covering the entire domain with the nine shots. Table 1 shows the parameters used in the numerical modeling of the acoustic wave equations and the positioning of the seismic source and receivers, given by the index ranges [i_srcx, i_srcy] for the sources and [i_sisx, i_sisy] for the receivers. The grid size and time step respect the numerical dispersion and stability criteria [36].

Table 2 details the neural network architecture. The first layer is convolutional, with kernel size equal to 4 and stride 2. This first layer captures spatial relations from the velocity input. The number of feature maps after the first convolutional layer is 48. The neural network has 3 dense blocks with L = 4 and K = 16. Dense blocks have a kernel size of 3 and a stride of 1. Encoding and decoding layers have a kernel size of 3 and a stride of 2. In the decoding layers, we introduce a transposed convolution, allowing the expression of the output with its original dimensionality, equal to the computational grid. A final ReLU layer [48] imposes that the outputs are positive numbers, naturally constraining the network to output IC ≥ 0 at each grid point. The architecture in Table 2 fixes the total number of neural network parameters to be estimated.

Figure 3 shows the decay of the RMSE as a function of the number of epochs during the training process, for training data sets ranging from 200 to 800 samples.
Table 2
Neural Network Architecture. "Output" is the number of feature maps and "Dimension" is the dimension of the feature maps.

Layers           Output    Dimension
Input            1         50 × 50
Convolution      48        24 × 24
Dense-block 1    112       24 × 24
Encoding         56        12 × 12
Dense-block 2    120       12 × 12
Decoding 1       60        24 × 24
Dense-block 3    124       24 × 24
Decoding 2       1         50 × 50
ReLU             1         50 × 50
Figure 3:
RMSE decay with number of epochs.

Note that the RMSE stabilizes after 150 epochs for all cases and that, for the smaller data sets, we see higher error values. Key characteristics one seeks when replacing the original expensive computational model by a surrogate are efficiency and accuracy. We estimate efficiency from the ratio between N_S, the number of samples in the training set, and N_MC, the number of samples of the MC method needed to achieve a prescribed level of accuracy (R² ≥ 0.95). Thus, we introduce the following index to evaluate efficiency,

Efficiency = (1 − η N_S / N_MC) × 100,                                                  (11)

where the index in equation (11) represents the percentage of saved computational cost and η is an adjustment factor accounting for the time spent in the construction, training, and making predictions with the surrogate model. Without loss of generality, we assume for now η = 1.0; for less optimistic conditions, we recognize that η > 1.0. We calculate the coefficient of determination R² to assess the accuracy of the neural network with the remaining 500 samples. We observe that the surrogate model accuracy is good even in small training data scenarios, reaching R² ≥ 0.95, as shown in Fig. 4.
Figure 4:
Test R²-score and efficiency as functions of the number of training samples.

Furthermore, Fig. 4 also depicts the surrogate efficiency. Here it is worth highlighting that 600 RTM runs are necessary to compute the variance with a relative error of 10⁻³. Thus, we see that the efficiency is higher than 90%, even for the larger data set with 800 samples.

To further illustrate how the surrogate model approximates the predictions of the original model with good accuracy, Fig. 5 shows comparisons for two realizations chosen randomly from the test set. We observe that the surrogate presents good results, returning good predictions of the imaging condition. We also depict a comparison between the standard deviation, σ(r), the confidence index, c(r), and the coefficient of variation, c_v(r), predicted by the original and surrogate models with N_test = 500 testing samples, see Fig. 6. Besides, we introduce a discrete version of a relative error between two spatial fields g, one computed with the RTM and the other by the surrogate,

e_g = (1/N) Σ_{i=1}^{N} (g_{i,RTM} − g_{i,s}) / g_{i,RTM},                              (12)

where the subscripts refer to how we compute the field g. This index is an average of the pointwise relative errors over all N grid points. Figure 5 compares randomly selected seismic images produced by RTM with the corresponding ones obtained with the deep learning surrogate. The visual resemblance is quantified using equation (12), leading to relative errors that stay below 2%. We extend our assessment of the surrogate effectiveness further by comparing the uncertainty indexes computed with the two techniques, displayed in Fig. 6. For all indexes, the relative errors computed with equation (12) are less than 1%.

To challenge the encoder-decoder surrogate, we use a synthetic geologic model with five homogeneous layers similar to the one proposed in [55]. The 2D velocity model consists of a water layer with velocity 1500 m/s and four mini sedimentary basins with velocities of 2000 m/s, 2500 m/s, 3000 m/s, and 4000 m/s, respectively. Figure 7 displays a schematic view of the velocity field, with 1000 m of depth and 1000 m of lateral extension.

Two synthetic seismograms are generated for the velocity field shown in Fig. 7, considering now seismic sources with cutoff frequencies of 30 and 45 Hz. Table 3 shows the RTM parameters and the positioning of sources and receivers.
Figure 5:
Randomly selected samples of seismic images in the test data set computed by the RTM (a) and the surrogate model (b). The relative errors in the imaging condition, e_I, are lower than 2%.

Table 3
RTM numerical parameters.

Parameter          f_cutoff = 30 Hz    f_cutoff = 45 Hz    Description (Unit)
h                  10                  6.                  Spatial discretization (m)
Δt                 .25 × 10⁻³          .333 × 10⁻⁴         Temporal discretization (s)
t_a                –                   –                   –
N_x × N_y          100 × 100           150 × 150           Number of grid points
i_srcx, i_srcy     ([5:10:95], 5)      ([7:15:142], 7)     Source positions
i_sisx, i_sisy     ([1:1:100], 5)      ([1:1:150], 7)      Receiver positions

For the cutoff frequency of 45 Hz, due to the imposition of the stability and dispersion conditions in the discrete two-way wave equation, there is a significant increase in the input dimensionality, which bears the potential to hamper the neural network training. The next subsections present results for the scenarios involving the two frequencies of excitation.
Figure 6:
UQ indexes - standard deviation, σ(r), confidence index, c(r), and coefficient of variation, c_v(r) - predicted by the RTM (left) and surrogate models (right). The relative errors to the RTM model are lower than 1%.

We detail the architecture of the neural network for this scenario in Table 4. It contains five dense blocks; the corresponding number of trainable parameters follows from the architecture. We can see in Figure 8 the RMSE decay as the number of epochs increases for all training sets. We verify the accuracy of the surrogate by computing the R² score for the 500 testing samples.
Figure 7:
Geologic model with five layers adapted from [55]. P₁, P₂, P₃, and P₄ are control points in the geological model used to compute statistics.
Figure 8:
RMSE decay with number of epochs.

We find that, as expected, for networks trained with larger data sets, the R² values are closer to one, as shown in Figure 9(a). Due to limitations imposed by the high cost of generating samples using the full RTM model for this example, we develop a different efficiency analysis, extrapolating from the basic conditions used for the network training. We assume conservatively that N_MC is of the same order as in the case of Section 4.1. Thus, we start from a scenario where only 5,000 samples are needed to characterize uncertainties in the seismic images and extrapolate to more expensive potential scenarios requiring hypothetically up to 50,000 samples. Here, N_S is equal to 1100: 600 samples to train the neural network with an accuracy of R² ≥ 0.95 and 500 to test the surrogate model. Figure 9(b) depicts the efficiency analysis as a function of N_MC. We note, for the worst scenario, an efficiency of around 78%, and for the scenarios where N_MC is higher than 10,000 samples, the efficiencies reach values greater than 90%. For the most expensive scenario, we see an efficiency close to 98%.

Fig. 10 shows comparisons between images randomly selected from the test set and the corresponding images produced with the full RTM model. We observe that the surrogate model returns excellent predictions of the imaging condition.
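The extrapolated efficiencies quoted above for Fig. 9(b) follow directly from Eq. (11); a short sketch with η = 1 and N_S = 1100 (600 training plus 500 testing samples) reproduces them.

    # Efficiency of Eq. (11) for the extrapolated scenarios of Fig. 9(b).
    n_s, eta = 1100, 1.0
    for n_mc in range(5_000, 50_001, 5_000):
        efficiency = (1.0 - eta * n_s / n_mc) * 100.0
        print(f"N_MC = {n_mc:6d}  efficiency = {efficiency:5.1f} %")
    # N_MC = 5,000 gives about 78%; N_MC = 50,000 gives close to 98%.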
Table 4
Neural Network Architecture. "Output" is the number of feature maps and "Dimension" is the spatial dimension of the feature maps.

Layers           Output    Dimension
Input            1         100 × 100
Convolution      48        48 × 48
Dense-block 1    112       48 × 48
Encoding         56        24 × 24
Dense-block 2    120       24 × 24
Encoding         60        12 × 12
Dense-block 3    124       12 × 12
Decoding         62        24 × 24
Dense-block 4    126       24 × 24
Decoding         63        48 × 48
Dense-block 5    127       48 × 48
Decoding         1         100 × 100
ReLU             1         100 × 100
Figure 9:
Test R²-score (a) and efficiency (b) as functions of the number of MC samples.

We can also note that the surrogate model predicts the UQ figures - standard deviation, σ(r), confidence index, c(r), and coefficient of variation, c_v(r) - with good accuracy, as seen in Fig. 11, where the relative errors between the surrogate predictions and the RTM model are lower than 6%. The results in Figures 10 and 11 show that the encoder-decoder surrogate extrapolates the replication of the IC training targets, delivering remarkable results to assist in UQ analysis.

Next, we deepen our investigation of the surrogate's ability to reproduce the probabilistic characterization of the IC fields by plotting the probability density functions (PDFs) of the imaging condition at control points in the domain, as displayed in Fig. 7. We place the control points in regions of low and high uncertainties. As the reference solution, we use the PDFs obtained by the RTM model with the 500 test samples to compare the accuracy of the surrogate models trained with different datasets to estimate the PDFs at the control points.

Figure 10:
Randomly selected images from the test data set computed by the RTM model (a) and the surrogate model (b) trained with 600 samples. The relative errors in the imaging condition, e_I, are lower than 6%.

Figure 12 shows the imaging condition PDFs estimated with the surrogate models trained with 400, 600, and 800 samples, together with the reference PDFs. We observe that the PDFs obtained with the surrogate model are in good agreement with the reference ones for all control points, particularly at the deepest control point.

Now we exemplify a possible use of the surrogate in the feature extraction and interpretation of seismic images, Stage 3 of Algorithm 1. We provide, using the surrogate, a view of the uncertainties associated with specific seismic targets: the interfaces between geological layers.
Figure 11:
UQ indexes - standard deviation, σ(r), confidence index, c(r), and coefficient of variation, c_v(r) - predicted by the RTM model (left) and the surrogate model (right). The relative errors between the surrogate predictions and the RTM model for the UQ indexes are less than 6%.

This view can reveal how the propagated uncertainties can directly impact the posterior interpretation of the images. Figure 13 provides the IC mean value and associated confidence bands for the four interfaces. In the right part of the figure, we give an idea of the spatial distribution of the uncertainties, having as background a randomly selected image from the ensemble. To promote visual perception, we plotted amplified IC confidence bands associated with each interface.

Figure 12:
Comparison between the PDFs predicted by the RTM model and the surrogate model.

Those bands reflect the dispersion of the IC values amid the image ensemble and, therefore, might lead to a lack of confidence in the reflector placement.
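A minimal sketch of how such figures can be assembled from an image ensemble is given below; the ±2σ band width and the kernel density estimator are our choices, since the paper does not state how the bands of Fig. 13 and the PDFs of Figs. 12 and 18 are constructed.

    import numpy as np
    from scipy.stats import gaussian_kde

    # images: array of shape (N_ensemble, N_y, N_x) holding the migrated ICs
    # produced either by the RTM model or by the surrogate.
    def interface_band(images, row):
        # Mean IC and a two-standard-deviation band along one horizon; the grid row
        # is assumed to have been picked beforehand (e.g., by horizon detection).
        mean = images[:, row, :].mean(axis=0)
        std = images[:, row, :].std(axis=0)
        return mean, mean - 2.0 * std, mean + 2.0 * std

    def ic_pdf(images, iy, ix, grid):
        # Kernel density estimate of the IC values at a control point (as in Fig. 12).
        return gaussian_kde(images[:, iy, ix])(grid)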
We now stress the proposed deep learning surrogate by testing its performance in a scenario not considered for the training, but still focusing on the subsurface geology of Fig. 7. The RTM image comprises a certain number of shots, each designed for illuminating the same subsurface using different seismic source conditions. Here we analyze a situation involving a higher excitation frequency, f = 45 Hz, implying, due to numerical requirements, the necessity of a finer grid. In such a case, input and output images have different dimensions compared to the previous scenario. Still, the intrinsic dimensionality of the input is the same, as we are imaging the same velocity field as before. Instead of seeking a new architecture, we slightly changed the previous one by replacing the first and last network layers and adapting the initial convolutional layer to ensure that an integer number defines the kernel. We do not expect to obtain the surrogate's optimal performance by employing such a strategy, but it can be quite useful in practical terms if it works. Table 5 shows the neural network architecture for the 45 Hz scenario. The network architecture is the same as in Section 4.2.1, with small changes: the first convolutional layer has a kernel size equal to 7 and a stride of 2.
Figure 13:
IC confidence bands over the interfaces between geological layers (left), and IC confidence bands superimposed over a randomly selected image (right), for f = 30 Hz.
Table 5
Neural Network Architecture. "Output" is the number of feature maps and "Dimension" is the dimension of the feature maps.

Layers           Output    Dimension
Input            1         150 × 150
Convolution      48        72 × 72
Dense-block 1    112       72 × 72
Encoding         56        36 × 36
Dense-block 2    120       36 × 36
Encoding         60        18 × 18
Dense-block 3    124       18 × 18
Decoding         62        36 × 36
Dense-block 4    126       36 × 36
Decoding         63        72 × 72
Dense-block 5    127       72 × 72
Decoding         1         150 × 150
ReLU             1         150 × 150

The total number of parameters in the network follows from this architecture. Figure 14 shows the RMSE decay as a function of the number of epochs in the training process. We can see in Fig. 15(a) the R² score for different training sets, showing, for this more difficult scenario, a slight decrease in the neural network quality. Confirming our initial expectations of a non-optimal but acceptable performance, the coefficients of determination R² for all training datasets are lower than 0.90.
Figure 14:
RMSE decay with number of epochs.
Figure 15: R²-score for the trained networks (a) and efficiency (b) for adjustment factors η = 1.0, 1.5, 2.0, 2.5, and 3.0.

Moreover, we estimate the efficiency of the surrogate model in the same manner as in the previous case. However, here we consider values of the adjustment factor η > 1.0. More precisely, the adjustment factor tries to estimate the time spent in the search of the neural network hyperparameters to optimize the surrogate model accuracy and in generating larger training sets. Without loss of generality, we assume that the number of samples N_S related to the neural network is equal to 1100: 600 to train and 500 to test the surrogate model. Figure 15(b) shows the efficiency as a function of N_MC for the several adjustment factors. Note that for scenarios with smaller N_MC, the efficiency drops significantly for the higher adjustment factors. However, for scenarios with larger N_MC, the efficiency reaches values close to 80-90%, and for the largest N_MC considered, we observe an efficiency close to 90% even for the highest adjustment factor.

Despite the lower accuracy presented in this scenario, the surrogate model could reach satisfactory predictions of the imaging condition, as we can see in Fig. 16. In this figure, we show three randomly selected images from the test data set computed by the RTM model and the surrogate model. Note, however, that the image produced by the RTM model may not be the best image we can compute for these conditions. The grid is adjusted only to satisfy the stability and dispersion criteria for the 45 Hz cutoff frequency. We do not optimize the domain size for a proper representation of the non-reflecting boundary conditions and the source/receiver arrangement.

Figure 16:
Randomly selected images from the test data set computed by the RTM model (a) and the surrogate model (b) trained with 600 samples. The relative errors, e_I, are lower than 10%.

Furthermore, Figure 17 shows a comparison between the standard deviation, σ(r), confidence index, c(r), and coefficient of variation, c_v(r), computed by the RTM and surrogate models. We observe that the surrogate model predicts the UQ indexes with satisfactory accuracy; the relative errors between the surrogate predictions and the RTM model for the UQ indexes are lower than 6%.

We now investigate the probability density functions (PDFs) of the imaging condition at the control points in Fig. 7. We use again as the reference solution the PDFs obtained by the RTM model with 500 test samples to verify the accuracy of the surrogate models trained with different datasets to estimate the PDFs at the control points.
Figure 17:
UQ indexes - standard deviation, σ(r), confidence index, c(r), and coefficient of variation, c_v(r) - predicted by the RTM model (left) and the surrogate model (right). The relative errors between the surrogate predictions and the RTM model for the UQ indexes are lower than 6%.

Figure 18 depicts the imaging condition PDFs at the control points estimated by the surrogate models trained with 200, 400, 600, and 800 samples, together with the reference PDFs. We observe that the PDFs obtained with the surrogate model capture the reference PDFs well at all control points, particularly for the larger training datasets.

Figure 18:
Comparison between PDFs predicted by the RTM model and the surrogate model.
5. Conclusions
We propose a deep learning model based on an encoder-decoder architecture to replace the costly RTM technique in producing seismic images. This approach naturally fits the framework of the computational workflow proposed in [2] to produce seismic images with quantified uncertainty. The surrogate model builds a scalable image-to-image mapping, coping with the high dimensionality of both the heterogeneous velocity fields that serve as inputs and the images that are the outputs. Such a surrogate has revealed itself to be very efficient in the context of UQ many-query tasks, as demonstrated by our numerical examples. Indeed, that was observed even in cases where we employed a non-optimal neural network architecture.

We place our contribution in the emerging area of physics-informed machine learning, where the final model, in many different ways, blends two main components: often expensive computational models relying on first principles and phenomenological closure equations, and machine learning data-driven tools. Such a combination not only suits perfectly the needs of the workflow mentioned earlier but also offers a broad spectrum of opportunities to improve performance, like employing more powerful training strategies and automatic hyperparameter optimization.
Acknowledgements
This study was financed in part by CAPES, Brasil, Finance Code 001. This work is also partially supported by FAPERJ (grant E-26/203.018/2017), CNPq (grant 302489/2016-9), and Petrobras (grant 2017/00148-9).
References

[1] Yu Chen, Kai Gao, Eric S. Davis, Dipen N. Sinha, Cristian Pantea, and Lianjie Huang. Full-waveform inversion and least-squares reverse-time migration imaging of collimated ultrasonic-beam data for high-resolution wellbore integrity monitoring. Applied Physics Letters, 113(7):071903, 2018.
[2] Carlos H. S. Barbosa, Liliane N. O. Kunstmann, Rômulo M. Silva, Charlan D. S. Alves, Bruno S. Silva, Marta Mattoso, Fernando A. Rochinha, Alvaro L. G. A. Coutinho, et al. A workflow for seismic imaging with quantified uncertainty. arXiv preprint arXiv:2001.06444, 2020.
[3] A. Botero, A. Gesret, T. Romary, M. Noble, and C. Maisons. Stochastic seismic tomography by interacting Markov chains. Geophysical Journal International, 207:374–392, 2016.
[4] J. Belhadj, T. Romary, A. Gesret, M. Noble, and B. Figliuzzi. New parametrizations for Bayesian seismic tomography. Inverse Problems, 34:33, 2018.
[5] N. Brantut. Time-resolved tomography using acoustic emissions in the laboratory, and application to sandstone compaction. Geophysical Journal International, 213:2177–2192, 2018.
[6] James Martin, Lucas C. Wilcox, Carsten Burstedde, and Omar Ghattas. A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM Journal on Scientific Computing, 34(3):A1460–A1487, 2012.
[7] Hejun Zhu, Siwei Li, Sergey Fomel, Georg Stadler, and Omar Ghattas. A Bayesian approach to estimate uncertainty for full-waveform inversion using a priori information from depth migration. Geophysics, 81(5):R307–R323, 2016.
[8] Reetam Biswas and Mrinal Sen. 2D full-waveform inversion and uncertainty estimation using the reversible jump Hamiltonian Monte Carlo. In SEG Technical Program Expanded Abstracts 2017, pages 1280–1285. Society of Exploration Geophysicists, 2017.
[9] Lars Gebraad, Christian Boehm, and Andreas Fichtner. Bayesian elastic full-waveform inversion using Hamiltonian Monte Carlo. EarthArXiv, page qftn5, 2019.
[10] Zeyu Zhao and Mrinal K. Sen. A gradient based MCMC method for FWI and uncertainty analysis. In SEG Technical Program Expanded Abstracts 2019, pages 1465–1469. Society of Exploration Geophysicists, 2019.
[11] H.-W. Zhou, H. Hu, Z. Zou, Y. Wo, and O. Youn. Reverse time migration: A prospect of seismic imaging methodology. Earth-Science Reviews, 179:207–227, 2018.
[12] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. 2006.
[13] Ilias Bilionis and Nicholas Zabaras. Multi-output local Gaussian process regression: Applications to uncertainty quantification. Journal of Computational Physics, 231(17):5718–5746, 2012.
[14] Ilias Bilionis, Nicholas Zabaras, Bledar A. Konomi, and Guang Lin. Multi-output separable Gaussian process: Towards an efficient, fully Bayesian paradigm for uncertainty quantification. Journal of Computational Physics, 241:212–239, 2013.
[15] L. Parussini, D. Venturi, P. Perdikaris, and G. E. Karniadakis. Multi-fidelity Gaussian process regression for prediction of random fields. Journal of Computational Physics, 337:36–50, 2017.
[16] P. Perdikaris, D. Venturi, and G. E. Karniadakis. Multi-fidelity information fusion algorithms for high-dimensional systems and massive data sets. SIAM Journal on Scientific Computing, 38:B521–B538, 2016.
[17] G. S. H. Pau, Y. Zhang, and S. Finsterle. Reduced order models for many-query subsurface flow applications. Computational Geosciences, 17:705–721, 2013.
[18] Y. Zhu and N. Zabaras. Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification. Journal of Computational Physics, 366:415–447, 2018.
[19] S. Mo, Y. Zhu, N. Zabaras, X. Shi, and J. Wu. Deep convolutional encoder-decoder networks for uncertainty quantification of dynamic multiphase flow in heterogeneous media. Water Resources Research, 55:703–728, 2019.
[20] Y. Zhu, N. Zabaras, P. S. Koutsourelakis, and P. Perdikaris. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics, 394:56–81, 2019.
[21] S. Mo, N. Zabaras, X. Shi, and J. Wu. Integration of adversarial autoencoders with residual dense convolutional networks for estimation of non-Gaussian hydraulic conductivities. Water Resources Research, 56:e2019WR026082, 2020.
[22] Y. Yang and P. Perdikaris. Conditional deep surrogate models for stochastic, high-dimensional, and multi-fidelity systems. Computational Mechanics, 64:417–434, 2019.
[23] M. Tang, Y. Liu, and L. J. Durlofsky. A deep-learning-based surrogate model for data assimilation in dynamic subsurface flow problems. Journal of Computational Physics, 413:109456, 2020.
[24] Adar Kahana, Eli Turkel, Shai Dekel, and Dan Givoli. Obstacle segmentation based on the wave equation and deep learning. Journal of Computational Physics, 413:109458, 2020.
[25] Jian Sun, Zhan Niu, Kristopher A. Innanen, Junxiao Li, and Daniel O. Trad. A theory-guided deep-learning formulation and optimization of seismic waveform inversion. Geophysics, 85(2):R87–R99, 2020.
[26] G. T. Schuster. Seismic inversion. Society of Exploration Geophysicists, 1st edition, 2017.
[27] Charles Cerjan, Dan Kosloff, Ronnie Kosloff, and Moshe Reshef. A nonreflecting boundary condition for discrete acoustic and elastic wave equations. Geophysics, 50(4):705–708, 1985.
[28] Wen-Fong Chang and George A. McMechan. Reverse-time migration of offset vertical seismic profiling data using the excitation-time imaging condition. Geophysics, 51(1):67–84, 1986.
[29] Wen-Fong Chang and George A. McMechan. Elastic reverse-time migration. Geophysics, 52(10):1365–1375, 1987.
[30] Bao D. Nguyen and George A. McMechan. Excitation amplitude imaging condition for prestack reverse-time migration. Geophysics, 78(1):S37–S46, 2013.
[31] Paul Sava and Sergey Fomel. Time-shift imaging condition in seismic migration. Geophysics, 71(6):S209–S217, 2006.
[32] Bin Wang, Chuck Mason, Manhong Guo, Kwangjin Yoon, Jun Cai, Jean Ji, and Zhiming Li. Subsalt velocity update and composite imaging using reverse-time-migration based delayed-imaging-time scan. Geophysics, 74(6):WCA159–WCA166, 2009.
[33] Paul Sava and Ivan Vasconcelos. Extended imaging conditions for wave-equation migration. Geophysical Prospecting, 59(1):35–55, 2011.
[34] Faqi Liu, Guanquan Zhang, Scott A. Morton, and Jacques P. Leveille. An effective imaging condition for reverse-time migration using wavefield decomposition. Geophysics, 76(1):S29–S39, 2011.
[35] Sandip Chattopadhyay and George A. McMechan. Imaging conditions for prestack reverse-time migration. Geophysics, 73(3):S81–S89, 2008.
[36] John C. Strikwerda. Finite Difference Schemes and Partial Differential Equations, Second Edition. SIAM: Society for Industrial and Applied Mathematics.
[37] Toktam Zand, Hamid R. Siahkoohi, Alison Malcolm, Ali Gholami, and Alan Richardson. Consensus optimization of total variation–based reverse time migration. Computational Geosciences, 2020.
[38] Philip Kearey, Michael Brooks, and Ian Hill. An introduction to geophysical exploration. John Wiley & Sons, 2013.
[39] Öz Yilmaz. Seismic data analysis: Processing, inversion, and interpretation of seismic data. Society of Exploration Geophysicists, 2001.
[40] Lanshu Bai, Huiyi Lu, and Yike Liu. High-efficiency observations: compressive sensing and recovery of seismic waveform data. Pure and Applied Geophysics, 177(1):469–485, 2020.
[41] Navjot Kukreja, Jan Huckelheim, Mathias Louboutin, Kaiyuan Hou, Fabio Luporini, Paul Hovland, and Gerard Gorman. Combining checkpointing and data compression for large scale seismic inversion. arXiv preprint arXiv:1810.05268, 2018.
[42] Peter Lindstrom, Po Chen, and En-Jui Lee. Reducing disk storage of full-3D seismic waveform tomography (F3DT) through lossy online compression. Computers & Geosciences, 93:45–54, 2016.
[43] Philipp A. Witte, Mathias Louboutin, Fabio Luporini, Gerard J. Gorman, and Felix J. Herrmann. Compressive least-squares migration with on-the-fly Fourier transforms. Geophysics, 84(5):R655–R672, 2019.
[44] Marco Ballesio, Joakim Beck, Anamika Pandey, Laura Parisi, Erik von Schwerin, and Raúl Tempone. Multilevel Monte Carlo acceleration of seismic wave propagation under uncertainty. GEM-International Journal on Geomathematics, 10(1):22, 2019.
[45] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[46] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[47] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, ICML'15, pages 448–456. JMLR.org, 2015.
[48] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Geoffrey Gordon, David Dunson, and Miroslav Dudík, editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pages 315–323, Fort Lauderdale, FL, USA, 11–13 Apr 2011. PMLR.
[49] F. Chollet. Deep Learning with Python. Manning Publications Company, 2017.
[50] Norman Ricker. The form and laws of propagation of seismic wavelets. Geophysics, 18(1):10–40, 1953.
[51] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[52] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[53] Sanford Weisberg. Applied Linear Regression. John Wiley & Sons, Inc., 2005.
[54] Y. Li and J. Sun. 3D magnetization inversion using fuzzy c-means clustering with application to geology differentiation. Geophysics, 81:J61–J78, 2016.
[55] T. Huang, Y. Zhang, and H. Zhang. The benefit of TTI reverse time migration for subsalt imaging, Gulf of Mexico. 2009.