A workflow for seismic imaging with quantified uncertainty
Carlos H. S. Barbosa, Liliane N. O. Kunstmann, Rômulo M. Silva, Charlan D. S. Alves, Bruno S. Silva, Djalma M. S. Filho, Marta Mattoso, Fernando A. Rochinha, Alvaro L.G.A. Coutinho
Carlos H. S. Barbosa a,b,1, Liliane N. O. Kunstmann d,2, Rômulo M. Silva a,b,3, Charlan D. S. Alves a,4, Bruno S. Silva b,5, Djalma M. S. Filho e,6, Marta Mattoso d,7, Fernando A. Rochinha c,1, Alvaro L. G. A. Coutinho a,7,*
a High-Performance Computing Center, COPPE, Federal University of Rio de Janeiro, Brazil
b Department of Civil Engineering, COPPE, Federal University of Rio de Janeiro, Brazil
c Department of Mechanical Engineering, COPPE, Federal University of Rio de Janeiro, Brazil
d Department of Computer Science, COPPE, Federal University of Rio de Janeiro, Brazil
e CENPES, Petrobras
Abstract
The interpretation of seismic images faces challenges due to the presence of several uncertainty sources. Uncertainties exist in data measurements, source positioning, and subsurface geophysical properties. Understanding the role of uncertainties and how they influence the outcome is an essential part of the decision-making process in the oil and gas industry. Geophysical imaging is time-consuming. When we add uncertainty quantification, it becomes both time- and data-intensive. In this work, we propose a workflow for seismic imaging with quantified uncertainty. We build the workflow upon Bayesian tomography, reverse time migration, and image interpretation based on statistical information. The workflow explores an efficient hybrid parallel computational strategy to decrease the reverse time migration execution time. High levels of data compression are applied to reduce data transfer among workflow activities and data storage. We capture and analyze provenance data at runtime to improve workflow execution, monitoring, and debugging with negligible overhead. Numerical experiments on the Marmousi2 velocity model benchmark demonstrate the workflow capabilities. We observe excellent weak and strong scalability, and the results suggest that the use of lossy data compression does not hamper the seismic imaging uncertainty quantification.
Keywords: uncertainty quantification, seismic imaging, reverse time migration, high-performance computing, data compression, data provenance
* Corresponding author:
Email address: [email protected] (Alvaro L.G.A. Coutinho)
URL: https://orcid.org/0000-0002-4764-1142 (Alvaro L.G.A. Coutinho)
Author contributions:
1. Drafting the manuscript, coding the scientific workflow, and analysis of the seismic images and uncertainty maps.
2. Drafting the manuscript, provenance coding, and analysis of the provenance results.
3. HPC optimization and performance analysis.
4. Coding the Bayesian Eikonal tomography and analysis of the results.
5. Provided the necessary data, and seismic data analysis.
6. Revising the manuscript critically for important intellectual content.
7. Analysis and interpretation of data, drafting the manuscript, and revising the manuscript critically for important intellectual content.
Preprint submitted to Computers & Geosciences, March 24, 2020

1. Introduction

An application of seismic imaging is delineating the structural geologic aspects of the subsurface. A migrated image is inferred from seismic data after careful processing and migration that intends to collapse diffractions and relocate seismic reflection events to their correct position in depth. The generation of images for seismic interpretation needs a migration tool, which has as inputs seismograms and an estimate of the spatial velocity distribution. These data are acquired from seismic measurements with appropriate pre-processing and velocity analysis techniques, such as Normal Move-Out, tomography, or full-waveform inversion (FWI).

In seismic exploration, many decisions rely on interpretations of seismic images, which are affected by multiple sources of uncertainty. The leading causes of uncertainty in seismic imaging are the survey geometry, source signature, data noise, data processing, parameters, and model and numerical errors [1, 2]. Such procedures lead, unavoidably, to uncertainties in the data that propagate to the velocity analysis and migration process. The effects of such uncertainties are hard to quantify. Thus, fundamental to the decision-making process is understanding uncertainties and how they influence the outcomes. Building a depth-migrated seismic image is challenging due to the non-uniqueness of the inverse problem of estimating the uncertain velocity parameters.

Several works discuss how to express the uncertainty related to velocity-model building and measure the impact on the migration techniques [1, 2, 3, 4, 5, 6]. Typically, uncertainties are characterized through a probabilistic perspective and expressed as an ensemble of possible realizations of a random variable, making the Monte Carlo (MC) method the standard tool to manage and quantify uncertainties [7, 8, 9, 10, 11, ?].
Furthermore, these works emphasize the computational difficulties in quantifying the uncertainties, particularly those associated with the high dimensionality of the data. For instance, Li et al. [3] and Poliannikov and Malcolm [4] choose smart sampling strategies to overcome the computational limits of calculating posteriors for more realistic problems. We recall that to assess uncertainties it is necessary to migrate the recorded data for several hundreds of samples. Migrating the samples makes the computational algorithms not only time- but also data-intensive. Even on today's supercomputers, implementation and data management become critical for the effective realization of seismic imaging with quantified uncertainty.

To tackle the computational and data management challenges of seismic imaging with quantified uncertainty, we introduce a workflow built upon three sequential stages: (i) Bayesian tomography to estimate the seismic velocity from first-arrival travel times, where the forward modeling relies on the Eikonal equation (called here Bayesian Eikonal Tomography, BET) and the Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm [13, 14, 9]; (ii) seismic migration using Reverse Time Migration (RTM) wrapped in a Monte Carlo algorithm; (iii) image interpretation supported by statistical information. The BET inversion for velocity estimation offers the flexibility of treating any prior information, like a reasonable initial velocity estimate, and it has the advantage of providing results that help to quantify the uncertainties associated with the solutions [9]. RTM migrates the seismograms for the velocity models from the first stage (BET) to generate the corresponding set of migrated seismic images. Lastly, we expose the uncertainty maps that give a global representation of the ensembles of velocity fields and images produced, respectively, by BET and RTM.

Model-selection strategies [3] can decrease the number of models for the migration stage.
Nevertheless, even in this case, RTM wrapped in the Monte Carlo method is still challenging due to the high-dimensional uncertain inputs, the amount of data to manage, and the computational costs associated with imposing the Courant-Friedrichs-Lewy (CFL) condition for the two-way wave equation. Therefore, we present a parallel solution for RTM with quantified uncertainty to mitigate the computational costs. Our implementation uses MPI to manage each RTM among distributed platforms, and OpenMP plus vectorization to explore parallelism in hybrid architectures. Besides, data compression [15] improves data transfer through the workflow stages: to read the seismograms from disk, to manage temporary files, and to store the results in the final stage. Data compression is necessary because the workflow generates a large amount of heterogeneous data. Finally, to manage and track data exchange among the stages, we capture provenance data to monitor the several file transformations, in addition to registering CPU usage, file paths, and relationships among files.

This work is organized as follows. We initially review the essential components used to build the workflow. After that, we detail the workflow, with an emphasis on providing uncertainty estimates for the results. We then discuss high-performance computing improvements for RTM, and the additional data management tools for data compression and provenance. The paper ends with a summary of our main conclusions.
2. Migration with Quantified Uncertainty
In this section, we present the main concepts supporting our probabilistic framework for seismic imaging with quantified uncertainty. Bayesian inference states the solution of an inverse problem as a posterior probability density function (pdf) [16] of a random variable (often a random field), which in our case is a function that describes the spatial distribution of the heterogeneous subsurface velocity given the observed data. However, the posterior distribution cannot be expressed in a convenient analytical form [7]. Hence, to address this problem, we use a Markov Chain Monte Carlo (MCMC) method to generate samples of the stochastic field. The ensemble of samples serves as input data for generating seismic images, producing a migration with quantified uncertainty.
Seismic tomography is an inversion tool that aims to identify physical parameters (velocity, anisotropy, among others) given a set of observations (measurement data). The parameters and the data are related by the following equation:

    d_obs = F(m) + ε,    (1)

where ε is an additive noise including modeling and measurement errors. Here, the operator F is the Eikonal equation, which connects the velocity with first-arrival travel times [17]. The high-dimensional parameter vector is m = (V, U, n), where n is the number of grid points, the n-dimensional vector U = {u_i} contains the grid positions, and the vector V = {v_i}, i = 1, ..., n, the velocity of the compressional wave at the i-th computational grid point.

Iterative linearized approaches for seismic tomography have problems related to non-linearity and ill-posedness. These methodologies usually produce inaccurate parameters due to convergence to local minima. Besides, iterative linearized methods are not able to properly quantify the uncertainty associated with their solution. To address this issue, we have chosen a Bayesian formulation to aggregate as much information as possible to yield a better parameter estimation and the associated uncertainties. A Bayesian approach aims at estimating the conditional distribution of the parameters with respect to the known data, through a combination of prior knowledge regarding the velocity model and information provided by such data. Thus, to define the posterior distribution P(m | d_obs), we use Bayes' theorem:

    P(m | d_obs) = P(d_obs | m) P(m) / P(d_obs),    (2)

where P(d_obs | m) is the likelihood function, P(m) is the a priori parameter density distribution, and P(d_obs) is the evidence. The likelihood represents how well the estimated travel times fit the measurements.
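The structure of Equation (1) can be illustrated with a minimal sketch. The forward operator below is a hypothetical stand-in (straight rays through a homogeneous medium), not the Eikonal solver the paper uses; the offsets, velocity, and noise level are made up:

```python
import numpy as np

rng = np.random.default_rng(42)

def forward_travel_times(velocity, offsets):
    # Toy forward operator F(m): straight-ray first-arrival travel
    # times through a homogeneous medium, t = offset / velocity.
    # The paper's F is the Eikonal equation; this stand-in only
    # illustrates the structure of Equation (1).
    return offsets / velocity

offsets = np.array([100.0, 200.0, 400.0, 800.0])  # receiver offsets (m)
v_true = 2000.0                                   # m/s (hypothetical)
sigma_d = 1e-3                                    # noise std dev (s)

# d_obs = F(m) + eps, with eps ~ N(0, sigma_d^2)
eps = rng.normal(0.0, sigma_d, size=offsets.size)
d_obs = forward_travel_times(v_true, offsets) + eps
```

Any Eikonal solver can replace `forward_travel_times` without changing the probabilistic structure that follows.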
Assuming that the additive noise in Equation 1 is Gaussian with zero mean and variance σ_d², the likelihood is:

    P(d_obs | m) = exp( − ‖ (F(m) − d_obs) / σ_d ‖² ),    (3)

where ‖·‖ is the standard ℓ2-norm.

To explore the posterior distribution and deal with the dimension of the parameter space, we employ the Reversible Jump Markov Chain Monte Carlo (RJMCMC) method [13, 14, 7, 9]. The MCMC method is an iterative stochastic approach to generate samples according to the posterior probability distribution. A generalized version, called Reversible Jump, allows inference on both the parameters and their dimensionality [7]. Thus, following [9], we employ a Voronoi tessellation with Gaussian kernels as basis functions to constrain the inverse problem solution and reduce the parameter dimensionality. Such a parametrization works as a prior that embodies the expected non-stationary character of the velocity field, displaying localized abrupt changes within a continuous variation. Hence, the number of Voronoi cells n' ≪ n. Therefore, given the current state of the chain m_i, three perturbations are possible: adding a new cell nucleus with probability p_a, deleting an existing cell nucleus with probability p_d, and moving a selected cell nucleus with probability p_m. The acceptance probability of the three perturbations depends on the ratio of their likelihoods. After a burn-in phase, the process iterates, eventually reaching the total number of pre-defined iterations. By attributing a constant velocity magnitude inside each Voronoi cell, we put a probabilistic structure in the prior, assuming that such velocity values follow independent uniform random variables. Playing with the bounds of the random variables, we can incorporate into the priors spatial trends learned by experts from different data sources, like the deposition history.
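The birth/death/move proposals can be sketched on a toy 1-D problem. This is a deliberately simplified illustration, not the BET code: the data, proposal scales, and prior bounds are hypothetical, and the acceptance rule below uses only the likelihood ratio, omitting the prior and dimension-balance terms of the full RJMCMC acceptance probability:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D target: a two-layer velocity profile observed with noise.
x_grid = np.linspace(0.0, 1.0, 50)
v_true = np.where(x_grid < 0.5, 2000.0, 3000.0)
sigma_d = 50.0
d_obs = v_true + rng.normal(0.0, sigma_d, v_true.size)

def predict(nuclei, values):
    # Piecewise-constant field: each grid point takes the velocity of
    # its nearest nucleus (the Voronoi-cell parametrization).
    idx = np.argmin(np.abs(x_grid[:, None] - nuclei[None, :]), axis=1)
    return values[idx]

def log_like(nuclei, values):
    r = predict(nuclei, values) - d_obs
    return -0.5 * np.sum((r / sigma_d) ** 2)

nuclei = np.array([0.3, 0.7])
values = np.array([2500.0, 2500.0])
ll = log_like(nuclei, values)

for _ in range(2000):
    move = rng.choice(["add", "delete", "move"])  # p_a = p_d = p_m here
    n2, v2 = nuclei.copy(), values.copy()
    if move == "add":
        n2 = np.append(n2, rng.uniform(0.0, 1.0))
        v2 = np.append(v2, rng.uniform(1500.0, 3500.0))
    elif move == "delete" and len(n2) > 1:
        k = rng.integers(len(n2))
        n2, v2 = np.delete(n2, k), np.delete(v2, k)
    else:
        k = rng.integers(len(n2))
        n2[k] = np.clip(n2[k] + rng.normal(0.0, 0.1), 0.0, 1.0)
        v2[k] = v2[k] + rng.normal(0.0, 100.0)
    ll2 = log_like(n2, v2)
    # Metropolis acceptance on the likelihood ratio only (simplified).
    if np.log(rng.uniform()) < ll2 - ll:
        nuclei, values, ll = n2, v2, ll2
```

The post-burn-in states of such a chain form the velocity ensemble consumed by the migration stage.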
Reverse Time Migration (RTM) is a depth migration approach based on the two-way wave equation and an appropriate imaging condition. Restricting ourselves to the acoustic isotropic case, the wave equation is

    ∇²p(r, t) − (1 / v²(r)) ∂²p(r, t)/∂t² = 0,    (4)

where r = (r_x, r_y, r_z) is the position vector, v is the compressional wave velocity, and p(r, t) the hydrostatic pressure at position r and time t.

The RTM imaging condition [18] is the zero-lag cross-correlation of the forward-propagated source wavefield (S) and the backward-propagated receiver wavefield (R), normalized by the square of the source wavefield,

    I(r) = ∫_T S(r, t) R(r, t) dt / ∫_T S²(r, t) dt.    (5)

The source wavefield is the solution of Equation 4 with the seismic wavelet treated as a boundary condition. Similarly, the receiver wavefield is the solution of the same Equation 4 with the observed data (seismograms) as a boundary condition. The normalization term in Equation 5 produces a source-normalized cross-correlation image that has the same unit, scaling, and sign as the reflection coefficient [18]. Therefore, within the MC method, RTM migrates the near-surface recorded signals for the ensemble of BET-generated velocity samples.

To reach a global representation of the solution variability within the ensemble of plausible velocity fields and images produced by the workflow, we follow [19]. They proposed a confidence index that normalizes the uncertainty map explored by Li et al. [3], assigning to each spatial point an evaluation of the degree of uncertainty expressed by the standard deviation σ(r), where low values indicate regions of high variability and high values the regions of small variability.
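Equation (5) reduces to simple array operations once the two wavefields are available as time-by-space arrays. A minimal sketch, with hypothetical wavefield shapes and a small regularization term to avoid division by zero where the source wavefield vanishes:

```python
import numpy as np

def source_normalized_image(S, R, dt, eps=1e-12):
    """Zero-lag cross-correlation imaging condition, Equation (5):
    I(r) = sum_t S(r,t) R(r,t) dt / sum_t S(r,t)^2 dt.
    S, R: wavefields of shape (nt, nz, nx). eps stabilizes the
    division where the source wavefield is (numerically) zero."""
    num = np.sum(S * R, axis=0) * dt
    den = np.sum(S * S, axis=0) * dt
    return num / (den + eps)

# Tiny synthetic check with hypothetical wavefields: if R = 2 * S,
# the source-normalized image is ~2 wherever S is nonzero.
S = np.random.default_rng(1).normal(size=(100, 8, 8))
I = source_normalized_image(S, 2.0 * S, dt=1e-3)
```

In a real RTM the discrete time integral accumulates during the backward sweep instead of storing full wavefields, but the arithmetic per grid point is the same.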
The confidence index is expressed by

    c(r) = (σ_max − σ(r)) / (σ_max − σ_min),    (6)

where c(r) is the confidence at position r, σ(r) is the standard deviation at position r, and σ_min and σ_max are the minimum and maximum standard deviations over the domain, respectively.

An alternative measure is the ratio between the standard deviation and the average, that is,

    r(r) = σ(r) / μ(r),    (7)

where μ represents the average map of the data. In this case, Equation 7 measures the variations among the samples with the mean as reference.

3. A Workflow for Seismic Imaging

In this section, we present the components of our workflow for seismic imaging with quantified uncertainty. We have chosen a particular set of chained components that provide a comprehensive probabilistic framework for seismic imaging. Nevertheless, we recognize that different components could be chained in a similar form to provide other possible workflows. Thus, a workflow is an abstraction that represents stages of computation as a set of activities and a data flow among them. Each activity can be a software component that performs operations on an input data set, transforming it into an output data set. A workflow can represent the whole of the seismic imaging stages as a chain of activities. Stages of seismic imaging (or workflow activities) can be replaced or extended to build a different viewpoint of the outcomes, such as statistical information for uncertainty quantification.

Figure 1 shows the three workflow stages. BET produces an ensemble of velocity fields (stage 1), RTM generates the corresponding seismic images (stage 2), and stage 3 builds the uncertainty maps for the velocity and migrated images from the results of stages 1 and 2. To efficiently implement this workflow, we address three computational aspects: parallelism, data transfer, and data management. Figure 2 describes how the workflow produces the ensemble of velocities and migrated images for posterior uncertainty quantification, the hybrid parallelism in the RTM stage, and where data compression is used to reduce storage and network data transfer.
Figure 1: Results of BET (Stage 1) and of RTM for each velocity field built in Stage 1 (Stage 2). Stage 3 shows the uncertainty maps for velocities and migrated images. The maps can guide interpreters to potential areas with low/high uncertainty in velocities and structures.
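The Stage 3 summaries of Equations (6) and (7) amount to per-pixel statistics over the ensemble. A minimal sketch on a synthetic ensemble (shapes and noise profile are hypothetical, chosen so that deeper rows are more uncertain):

```python
import numpy as np

def uncertainty_maps(ensemble):
    """Stage-3 style summaries for an ensemble of fields with shape
    (n_samples, nz, nx): standard deviation map, confidence index
    (Equation 6), and coefficient-of-variation map (Equation 7)."""
    mu = ensemble.mean(axis=0)
    sigma = ensemble.std(axis=0)
    c = (sigma.max() - sigma) / (sigma.max() - sigma.min())
    r = sigma / mu
    return sigma, c, r

rng = np.random.default_rng(3)
# Hypothetical ensemble: deeper rows (larger z index) get noisier samples.
base = np.full((20, 30), 2500.0)
noise = rng.normal(size=(200, 20, 30)) * np.linspace(1.0, 80.0, 20)[:, None]
sigma, c, r = uncertainty_maps(base + noise)
```

By construction c lies in [0, 1], with values near 1 (high confidence) where the ensemble spread is smallest.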
Stage 1 aims to build a probability density function for the velocity encompassing feasible solutions that fit the calculated travel times to the data, and provides enough information for the next two stages. To generate samples of the stochastic field, BET needs to run thousands of RJMCMC iterations. In the Voronoi tessellation within RJMCMC, the number of cells, their positions, and their sizes vary. This strategy produces a number of Voronoi cells smaller than the number of velocity parameters, as a cell might contain many computational grid points. We use a KD-tree [?] to reduce the search cost when updating the Voronoi tessellation. Solving the Eikonal equation in each iteration dominates the BET costs. We use here the open-source software FaATSO to solve the Eikonal equation.

Figure 2: Stage 1 provides a probabilistic distribution of velocity using BET. Stage 2 migrates the seismograms for all velocity parameters, generating, thus, a set of stacked seismic sections. Finally, Stage 3 extracts statistical information from velocity parameters and migrated images. Compression is used to transfer the data among the stages. Stage 2 uses rtmUq2D, a hybrid parallel RTM code.
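The KD-tree lookup described above (each grid point takes the velocity of its nearest nucleus) can be sketched as follows. The nucleus positions and per-cell velocities are hypothetical, and `scipy.spatial.cKDTree` stands in for whatever KD-tree the BET code uses:

```python
import numpy as np
from scipy.spatial import cKDTree

# Sketch of the Voronoi-cell lookup: each of the n grid points is
# assigned the velocity of its nearest cell nucleus. The KD-tree makes
# the nearest-nucleus query O(n log n') instead of O(n * n').
nz, nx = 60, 90
zz, xx = np.meshgrid(np.arange(nz), np.arange(nx), indexing="ij")
grid = np.column_stack([zz.ravel(), xx.ravel()]).astype(float)

nuclei = np.array([[10.0, 20.0], [40.0, 70.0], [55.0, 10.0]])
cell_velocity = np.array([2000.0, 2600.0, 3200.0])  # one value per cell

_, cell_idx = cKDTree(nuclei).query(grid)  # nearest nucleus per point
velocity_field = cell_velocity[cell_idx].reshape(nz, nx)
```

After a birth, death, or move proposal, only the tree is rebuilt over the n' nuclei; the grid itself never changes.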
At the end of stage 1, we select a set of velocities for the migration based on the variance of successive batches of the post-burn-in samples. Thus, when the stochastic process converges, the variance among the samples stabilizes. In the second stage, RTM migrates the seismograms for the subset of velocity fields. These migrations produce the corresponding set of seismic images, where each one has a direct relation with one velocity sample coming from stage 1. To attenuate low-frequency artifacts in the brute stack images, we apply a Laplacian filter. The RTM kernel, that is, the second-order wave equation, is discretized with a second-order finite difference scheme in time and an eighth-order scheme in space. The reverse problem implementation explores the technique proposed by [21]. This technique guarantees the coercivity of the adjoint problem. Furthermore, the RTM code, rtmUq2D, takes advantage of vectorization. (FaATSO stands for Fast Marching Acoustic Emission Tomography using Standard Optimization: https://github.com/nbrantut/FaATSO.) rtmUq2D also contains OpenMP directives to explore multi-core parallelism. To complete our hybrid RTM solution, we apply MPI to manage the execution of multiple migrations and shots. Thus, we break the total number of migrations into subgroups, and each subgroup is assigned to an MPI process, as exemplified in Figure 2. When a subgroup finishes its workload, the next one starts immediately. The batch execution (an RTM for each subgroup) is repeated until exhausting the number of samples. Besides, each RTM verifies its own CFL condition to guarantee the stability of the discrete solution of the two-way wave equation for each sample.

Once stages 1 and 2 produce ensembles of velocity fields and seismic images, a condensed representation of this information can be calculated. In stage 3, velocity uncertainty maps are obtained as soon as we accept BET solutions. The same procedure is valid for generating migrated images.
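The per-sample CFL verification and the subgroup scheduling described above can be sketched as follows. The stability constant and the round-robin split are illustrative placeholders, not the actual rtmUq2D parameters:

```python
import numpy as np

def check_cfl(v_max, dx, dt, c_max=0.5):
    # CFL condition for an explicit two-way wave-equation scheme:
    # v_max * dt / dx <= c_max. The constant c_max depends on the
    # stencil order; 0.5 is a hypothetical placeholder.
    return v_max * dt / dx <= c_max

def max_stable_dt(v_max, dx, c_max=0.5):
    # Largest stable time step for a given velocity sample.
    return c_max * dx / v_max

def subgroups(n_samples, n_ranks):
    # Round-robin split of the migrations into per-rank subgroups,
    # mimicking the MPI distribution of samples described above.
    return [list(range(r, n_samples, n_ranks)) for r in range(n_ranks)]

v_sample = np.array([[2000.0, 3000.0], [2500.0, 3500.0]])  # m/s
dt = max_stable_dt(v_sample.max(), dx=10.0)  # checked per sample
groups = subgroups(n_samples=10, n_ranks=3)
```

Because each velocity sample has its own maximum velocity, each RTM instance picks (or verifies) its own dt before time stepping.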
Hence, stage 3 generates uncertainty maps representing variability in velocity and in reflector amplitudes. These outcomes give geoscientists extra information to guide the seismic interpretation.

However, to better support interpretation and improve confidence, it is essential to register relationships between the data produced at each workflow stage. For example, the uncertainty maps of stage 3 are related to the velocity fields and statistics, which are also related to the seismic images. Tracking these data derivations among the stages at runtime, combined with the procedures' execution times, can support workflow parameter debugging and fine-tuning. Provenance data [22] represents the derivation path of the data transformations during the stages. Capturing provenance data and storing it in a provenance database while the workflow executes can add an overhead to the workflow execution. We choose DfAnalyzer, an asynchronous provenance data capture software library, because it can be used by high-performance computing (HPC) workflows to provide runtime data analysis with negligible overhead [23]. While monitoring the workflow execution, users can submit queries to the travel time matrix during the migration, or go directly from an input velocity model file to the corresponding files that hold its statistics, obtained after all samples were aggregated. DfAnalyzer has been successfully used for other HPC applications [24, 25].

Data transfer is also a concern, as each stage in the workflow has to manage its information and send it to the next workflow stage. The amount of data transfer can increase drastically, given the number and size of the samples in the ensembles of velocities and migrated images. Data compression can reduce persistent storage and data transfer throughout the chain [15]. The ZFP library is an open-source C/C++ library for compressing numerical arrays. To achieve high compression rates, ZFP uses lossy but optionally error-bounded compression.
Although bit-for-bit lossless compression of floating-point data is not always possible, ZFP also offers an accurate near-lossless compression [15]. We employ ZFP to compress the numerical arrays that store the velocity parameters, imaging conditions, and the seismograms. This use is linked to the workflow stages to reduce the pressure on the network data transfer among the workflow stages and to reduce I/O to persistent storage. However, lossy compression has to be managed carefully, with safe error bounds. We test both lossy and lossless compression, and we measure the error on the final results.
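The error-bound contract of fixed-accuracy compression can be illustrated with a simple stand-in: quantize to a tolerance, then apply a lossless coder. This is NOT how ZFP works internally (ZFP uses block transform coding and compresses far better); it only demonstrates the guarantee |x − x'| ≤ tol that the workflow relies on. All names and values are hypothetical:

```python
import zlib
import numpy as np

def lossy_compress(arr, tol):
    # Uniform quantization with step 2*tol bounds the pointwise error
    # by tol; zlib then losslessly packs the quantized integers.
    q = np.round(arr / (2.0 * tol)).astype(np.int64)
    payload = zlib.compress(q.tobytes(), level=9)
    return payload, arr.shape

def decompress(payload, shape, tol):
    q = np.frombuffer(zlib.decompress(payload), dtype=np.int64)
    return q.reshape(shape) * (2.0 * tol)

rng = np.random.default_rng(7)
field = 2500.0 + 50.0 * rng.standard_normal((200, 200))  # toy velocity
payload, shape = lossy_compress(field, tol=1e-2)
restored = decompress(payload, shape, tol=1e-2)
ratio = field.nbytes / len(payload)  # achieved compression ratio
```

The workflow's choice of tolerance is exactly this trade: a larger tol shrinks the payload but loosens the pointwise error bound on the reconstructed field.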
4. Computational Experiments
We designed computational experiments to assess the overall workflow behavior and performance. They consist of verifying RTM strong and weak scalability, measuring the numerical errors on the final results after applying data compression, and discussing the workflow uncertainty quantification outcomes using the data analysis tools. We choose two datasets: a two-layer constant velocity field for the scalability analysis, and one based on the 2-D Marmousi2 benchmark [26] for applying data compression and running the full workflow.
We choose a two-layer constant velocity model that is 3.0 km deep and 9.2 km wide in the horizontal direction for the scalability tests. The velocities of the two layers are 2000 m/s and 3000 m/s. We generate the reference data set for RTM based on the true velocity field. The reference data set is the wavefield recorded by near-surface hydrophones (one seismogram) with a cutoff frequency of 30 Hz. The input velocity field for RTM is a smooth version of the synthetic velocity model.

Figure 3 shows the RTM strong scalability considering the migration of one shot for an increasing number of cores. We run the tests on different platforms: one vector processor (NEC SX-Aurora TSUBASA Type B) and two multi-core CPUs (Intel Xeon E5-2670v3 (Haswell) and Intel Xeon Platinum 8160 (Skylake)). The vector processor available has its minimum configuration, that is, 8 cores, while Haswell and Skylake have 24 cores each.
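A minimal sketch of this test model and its smoothed migration velocity follows. The grid spacing and the moving-average smoother are assumptions for illustration; the paper does not state which smoother it uses:

```python
import numpy as np

# Hypothetical discretization of the two-layer test model: 3.0 km deep,
# 9.2 km wide, 2000 m/s over 3000 m/s, interface placed at mid-depth.
dz = dx = 10.0                         # m (assumed grid spacing)
nz, nx = int(3000 / dz), int(9200 / dx)
v_true = np.full((nz, nx), 2000.0)
v_true[nz // 2:, :] = 3000.0

def smooth(v, half=15):
    # Simple depth-wise moving average as a stand-in for the smoothed
    # RTM input velocity; any low-pass smoother plays the same role.
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, v)

v_migration = smooth(v_true)
```

The smoothed field keeps the kinematics (long-wavelength velocity trend) while removing the sharp reflectivity contrast that RTM is supposed to image.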
Figure 3: RTM scalability analysis on different platforms. The tests consist of a migration for one seismogram. The vector processor available has 8 cores, while Haswell and Skylake have 24 cores each.

For the weak scalability analysis, since the hybrid RTM has to run n times on n nodes to build the statistics outcomes, the CPU time for each node has to be approximately the same. Assuming the same parametrization of the strong scalability test, we run the RTM code for an increasing number of nodes and measure the CPU time for each one. We note that the RTM takes, on average, 3.281 s, with a standard deviation of 0.0578 s, to execute one RTM per node on the Haswell cluster. Thus, the CPU time for running one RTM on one node is approximately the same as running five RTMs on five nodes. We observe the same pattern for the other platforms.

We select the 2-D Marmousi2 velocity model benchmark [26] for the data compression experiments. The benchmark is 3.5 km deep and 17.0 km wide in the horizontal direction, and has parameter models (p-wave velocity and density) and other data sets related to its acquisition. However, we generate the seismograms by solving the two-way wave equation with the true velocity. Near-surface hydrophones record the seismograms with a cutoff frequency of 45 Hz. Thus, we have 160 seismograms to be used in the migration stage. In the BET stage, the reference data set is 34 travel-time panels obtained from solving the Eikonal equation with the true velocity.

After BET, we pick 5000 post-burn-in velocity models for migration. Thus, RTM produced 5000 seismic images, each corresponding to one velocity model from stage 1. Hence, we use 5000 velocity models, 5000 seismic migrations, and 160 seismograms to test the lossy and lossless compression rates. Table 1 shows the results related to compression efficiency for the velocity models and seismic images. The compression efficiency for the seismograms is in Table 2.
Table 1: Fixed-accuracy lossy and lossless compression comparison. The error tolerances for lossy compression are 10^− and 10^−. Each value is the average of the 1000 compressed matrices (velocity fields and seismic images).

    Compression    | Velocity (MB) | Image (MB)
    None           | 6.10          | 6.10
    Lossless       | 4.78          | 5.34
    Lossy (10^−)   | 4.77          | 3.65
    Lossy (10^−)   | 2.98          | 1.83

Each velocity model and seismic image without compression has 6.1 MB. The values in the columns "Velocity" and "Image" represent, respectively, the average size of 1000 matrices for the velocity models and seismic images without compression (first line), and with lossless and lossy compression. We apply three compression types: the first one is lossless compression, the second one is lossy compression with an absolute error of 10^−, and the last one is lossy compression with an absolute error of 10^−. We can see that the lossy compression achieves the best compression rate, reaching about 51.1% compression for the velocity models (2.98 MB) and about 70.0% compression for the seismic images (1.83 MB). Note that the compression rates for each matrix are different, since ZFP explores the structure of the data. Although less efficient, the lossless compression reaches about 21.6% compression for the velocity models (4.78 MB) and about 12.5% compression for the seismic images (5.34 MB). This efficiency rate for lossless compression is compatible with its purpose, which is to preserve the original information.

We also apply lossless and lossy compression to the 160 seismograms. Each seismogram has 76.20 MB. The error tolerance for lossy compression is set to 10^−. Compression rates reach a minimum of 13.38 MB and a maximum of 8.38 MB, as we can see in Table 2. These values represent 82.4% and 89.0% less than the original data, respectively.

Table 2: Seismograms fixed-accuracy lossy and lossless compression comparison. The error tolerance for lossy compression is set to 10^−.
The values represent the minimum and maximum compression among the 160 seismograms.

    Compression | Minimum (MB) | Maximum (MB)
    None        | 76.20        | 76.20
    Lossless    | 59.28        | 48.46
    Lossy       | 13.38        | 8.38

The results in Table 1 and Table 2 suggest that we can use compressed data in the workflow and thus reduce the storage and network data transfer. However, we need to measure the error propagation due to compression on the final seismic images. Therefore, to measure the compression errors, we run RTM to migrate the seismograms without compression, with an uncompressed velocity model as a parameter. Figure 4(a) shows the migration of the uncompressed seismograms, and Figure 4(d) shows the 2-D Fourier spectra of the two white boxes shown in Figure 4(a). The described outcomes are the reference when we do not use ZFP compression. The same procedure is applied using the lossless compression of the velocity model and seismograms as entries for the RTM migration, and the lossy compression (error tolerance of 10^−) of the velocity model and seismograms as entries for the RTM migration. These results can be seen in Figures 4(b)(e) and 4(c)(f), respectively.

We calculate the normalized root mean square (NRMS) error between the RTM results from Figure 4(a) and the RTM results from Figures 4(b) and 4(c). The NRMS errors are 2.×10^−% and 4.×10^−%, respectively. The NRMS errors among the spectra from the left white box regions in Figures 4(a), 4(b), and 4(c) are 1.×10^−% and 1.×10^−%, respectively. Moreover, the NRMS errors among the spectra from the right white box regions in Figures 4(a), 4(b), and 4(c) are 1.×10^−% and 2.×10^−%, respectively. The results from Figure 4 and the NRMS errors show that the chosen compression levels for the velocity models and seismograms lead to migrated images with errors of order 10^−%. The compression strategy allows reducing storage by 85% and, consequently, improves network data transfer without compromising accuracy.
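An NRMS comparison of this kind can be sketched as follows. The normalization convention (reference peak-to-peak range) is an assumption for illustration, since the paper does not spell out its exact definition:

```python
import numpy as np

def nrms_percent(reference, test):
    # Normalized RMS difference (in percent) between a reference image
    # and a reconstructed one, here normalized by the reference's
    # peak-to-peak range (one common convention among several).
    rms = np.sqrt(np.mean((reference - test) ** 2))
    return 100.0 * rms / (reference.max() - reference.min())

ref = np.random.default_rng(11).normal(size=(128, 128))   # toy "image"
perturbed = ref + 1e-5 * np.random.default_rng(12).normal(size=ref.shape)
err = nrms_percent(ref, perturbed)
```

The same function applies unchanged to the 2-D Fourier spectra of the windowed regions, since they are also arrays.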
Figure 4: RTM outcomes after migrating the (a) reference seismograms, (b) seismograms with lossless compression, and (c) seismograms with lossy compression. The velocity field inputs are not compressed for the migration in Figure (a), losslessly compressed for the migration in Figure (b), and lossy compressed for the migration in Figure (c). Figures (d), (e), and (f) represent the 2-D Fourier spectra of the white box regions shown in Figures (a), (b), and (c), respectively.
Provenance data is captured by DfAnalyzer to analyze data from the several workflow stages and stored in a columnar database using the MonetDB system. The workflow needs a definition of which data should be part of the provenance database. Then, DfAnalyzer calls are inserted in the workflow code to capture data provenance, as shown in Listing 1. We run a series of RTMs with and without DfAnalyzer calls. In all tests, the overhead of provenance data capture and storage remains below 2.95%, as shown in Table 3. Also, the overhead rate decreases as the number of nodes increases. During and after the workflow executions, queries can be submitted to the provenance database using filters or data aggregations over the data derivation path of the workflow stages, as shown in Figure 5.

Table 3: RTM overhead with DfAnalyzer calls. Each node executes two RTMs.

Figure 5: Provenance query result showing the shots with their (x, z) location and the paths of their corresponding seismogram image file and cross-correlation file, filtered by a shot location where z = 25 and 0 < x <. The returned columns are shot, shotLocation(x) (m), shotLocation(z) (m), seismogram path, and cross correlation path.

We discuss here the results of sequentially applying BET and RTM to the raw data provided by the seismograms, generating images with quantified uncertainty. The ensemble of uncertain velocity fields resulting from BET is assessed through primary statistics, such as the velocity mean and standard deviation. In the sequence, we compute uncertainty maps. After RTM migrates the seismograms for each velocity field, we analyze the impact of the inversion uncertainties on the seismic images. Note that in this stage we choose to work on the compressed seismograms and velocity fields to reduce storage and network transfer. As we show in Figure 4, the lossy compression of both velocity and seismograms, with an error tolerance of 10^−, does not produce migrated seismic images of poor quality. Hence, we use them in the whole workflow.
Besides, the Marmousi2 benchmark, as specified in Subsection 4.2, is used to demonstrate the outcomes of the workflow. To keep track of the impact of each uncertainty source, we restrict the analysis in this particular example to the noise associated with monitoring the data, defined by σd in Equation 1. The other component that would induce uncertainty would be the scarcity of data promoted by acquiring signals only on the surface. That is circumvented by collecting the synthetic data on all grid points. We anticipate facing a lower degree of uncertainty in the final images due to such a choice.

Listing 1 (excerpt): DfAnalyzer task registration inserted in the RTM code.

    int task_id = 1;
    Task task_runrtm = Task(dataflow.get_tag(), runrtm.get_tag(), task_id);
    vector
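The measurement-noise term σd entering Equation 1 can be sketched by perturbing a noise-free synthetic seismogram with i.i.d. Gaussian noise; the σd value, trace shape, and wavelet below are illustrative choices of ours, not the ones used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Noise-free synthetic "seismogram": time samples x receivers (illustrative).
nt, nrec = 2000, 160
t = np.linspace(0.0, 2.0, nt)[:, None]
clean = np.exp(-((t - 0.8) ** 2) / 0.002) * np.ones((1, nrec))

# i.i.d. Gaussian observational noise with standard deviation sigma_d.
sigma_d = 0.05
observed = clean + rng.normal(0.0, sigma_d, size=clean.shape)

# The empirical noise level should be close to the prescribed sigma_d.
print("empirical noise std:", np.std(observed - clean))
```

In the Bayesian tomography, this σd fixes the width of the data likelihood, so larger assumed noise directly widens the posterior over velocity fields.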
Figure 6 shows the standard deviation (a), confidence index (b), and standard deviation over the mean value (c) of the velocity fields. The standard deviation field expresses the degree of uncertainty in the velocity at each point of the domain. The minimum standard deviation values are around 1 m/s and localized in shallow regions. On the other hand, the maximum standard deviation values occur in deeper regions, with values around 80 m/s. The confidence index (Equation 6) is the normalized standard deviation and represents the degree of confidence in the velocities [3]. We depict in Figure 6 (b) the confidence index within the image domain. We can see that such a criterion assigns higher confidence (> 0.8) to shallow regions. Figure 6 (c) shows the r(r) map. Again, we observe the same trends.

Figure 6: Standard deviation (a), confidence index (b), and r(r) map (c) of the Marmousi velocity fields.

We plot in Figure 7 (a) the standard deviation for a representative image, computed as an average among those of the ensemble. Besides, we plot the confidence index (b) and r(r) map (c) of the seismic images obtained from the RTM wrapped on the MCMC. In this case, the observed variations are related to the amplitude of the reflections instead of the p-wave velocity, that is, to the variability of the amplitude intensity. This intensity is maximum at the interface between two rocks with different physical properties. However, if an estimated p-wave velocity differs from the exact p-wave velocity of the medium, the amplitude intensity is misplaced. Hence, high standard deviation, low confidence index, and low standard deviation over the mean values are highlighted at the interfaces and predominant in the middle region.

Figure 7: Standard deviation (a), confidence index (b), and r(r) map (c) of the migrated seismic images. RTM migrates 160 seismograms for each velocity field.
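The three maps of Figures 6 and 7 can be sketched for a toy ensemble as below. The normalization we use for the confidence index (one minus the standard deviation scaled by its maximum, so low spread means high confidence, consistent with [3]) is our assumption of the form of Equation 6, and r is taken as the coefficient of variation σ/μ; the ensemble itself is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ensemble of velocity fields: more spread at depth, as in Figure 6 (a).
n_samples, nz, nx = 200, 64, 128
depth = np.linspace(0.0, 1.0, nz)[:, None]
mean_model = 1500.0 + 3000.0 * depth * np.ones((1, nx))
spread = 1.0 + 79.0 * depth            # ~1 m/s shallow, ~80 m/s deep
ensemble = mean_model + rng.normal(0.0, 1.0, (n_samples, nz, nx)) * spread

mu = ensemble.mean(axis=0)              # pointwise mean velocity
sigma = ensemble.std(axis=0)            # standard deviation map, Figure 6 (a)
confidence = 1.0 - sigma / sigma.max()  # normalized-std index (assumed form of Eq. 6)
r = sigma / mu                          # coefficient-of-variation map, Figure 6 (c)

# Shallow rows should come out with higher confidence than deep ones.
print("shallow confidence:", confidence[0].mean())
print("deep confidence:", confidence[-1].mean())
```

The same three statistics can be applied unchanged to the ensemble of migrated images, which is how the maps of Figure 7 are obtained from the RTM outputs.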
5. Conclusions
We proposed a new scientific workflow that supports uncertainty quantification in seismic imaging. Our stochastic perspective to handle uncertainties is composed of three sequential stages: inversion, migration, and interpretation. A Bayesian tomography (Stage 1) provides the input data for the RTM wrapped on the MCMC (Stage 2). The workflow provides end-users with means to explore uncertainty in the final images through maps computed in Stage 3.

The workflow execution requires significant computing power and generates a huge amount of data. Thus, we use optimized hybrid computing solutions that explore vectorization, data compression, and advanced data management tools. These tools allow us to obtain a scalable solution for the Marmousi Velocity Model benchmark with quantified uncertainty. ZFP compression has been used to transfer information among the stages and to store the final images. High levels of compression have been observed for the ensembles of velocities and migrated images. The compression strategy improves the network data transfer and reduces the amount of data by up to 85%, assuming an error tolerance of 10−, without affecting the quality of the final result. The provenance database keeps a record of the workflow execution stages, enriching the seismic data analysis quality. We observed that capturing and storing provenance data during the workflow execution is inexpensive.

Acknowledgements
This study was financed in part by CAPES, Brazil, Finance Code 001. This work is also partially supported by FAPERJ, CNPq, and Petrobras. Computer time on Stampede is provided by TACC, the University of Texas at Austin. The US NSF supports Stampede under award ACI-1134872. Computer time on the Lobo Carneiro machine at COPPE/UFRJ is also acknowledged. We are also indebted to NEC Corporation.
Computer Code Availability
The codes can be downloaded at the repository https://github.com/Uncertainty-Quantification-Project. The repository provides three open-source codes (bayes-tomo, rtmUq2D, and rtmUq2D-Prov). The user guide describes the software and hardware requirements and directs to complementary open-source software used in the paper.

References

[1] J. Caers, Modeling uncertainty in the earth sciences, John Wiley & Sons, 2011.
[2] S. Fomel, E. Landa, Structural uncertainty of time-migrated seismic images, Journal of Applied Geophysics 101 (2014) 27–30.
[3] L. Li, J. Caers, P. Sava, Assessing seismic uncertainty via geostatistical velocity-model perturbation and image registration: An application to subsalt imaging, The Leading Edge 34 (2015) 1064–1070.
[4] O. Poliannikov, A. Malcolm, The effect of velocity uncertainty on migrated reflectors: Improvements from relative-depth imaging, Geophysics 81 (2016) S21–S29.
[5] J. Messud, M. Reinier, H. Prigent, P. Guillaume, T. Coléou, S. Masclet, Extracting seismic uncertainty from tomography velocity inversion and their use in reservoir risk analysis, The Leading Edge 36 (2017) 127–132.
[6] G. Ely, A. Malcolm, O. Poliannikov, Assessing uncertainty in velocity models and images with a fast nonlinear uncertainty quantification method, Geophysics 83 (2018) R63–R75.
[7] T. Bodin, M. Sambridge, Seismic tomography with the reversible jump algorithm, Geophysical Journal International 178 (2009) 1411–1436.
[8] A. Botero, A. Gesret, T. Romary, M. Noble, C. Maisons, Stochastic seismic tomography by interacting Markov chains, Geophysical Journal International 207 (2016) 374–392.
[9] J. Belhadj, T. Romary, A. Gesret, M. Noble, B. Figliuzzi, New parametrizations for Bayesian seismic tomography, Inverse Problems 34 (2018) 33.
[10] D. G. Michelioudakis, R. W. Hobbs, C. C. S. Caiado, Uncertainty analysis of depth predictions from seismic reflection data using Bayesian statistics, Geophysical Journal International 213 (2018) 2161–2176.
[11] J. Martin, L. C. Wilcox, C. Burstedde, O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing 34 (2012) A1460–A1487.
[12] N. Rawlinson, A. Fichtner, M. Sambridge, M. K. Young, Chapter one - Seismic tomography and the assessment of uncertainty, volume 55 of Advances in Geophysics, Elsevier, 2014, pp. 1–76. doi:10.1016/bs.agph.2014.08.001.
[13] P. J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82 (1995) 711–732.
[14] P. J. Green, D. I. Hastie, Reversible jump MCMC, Genetics 155 (2009) 1391–1403.
[15] P. Lindstrom, P. Chen, E.-J. Lee, Reducing disk storage of full-3D seismic waveform tomography (F3DT) through lossy online compression, Computers & Geosciences 93 (2016) 45–54.
[16] A. F. M. Smith, Bayesian computational methods, Philosophical Transactions: Physical Sciences and Engineering 337 (1991) 369–386.
[17] N. Brantut, Time-resolved tomography using acoustic emissions in the laboratory, and application to sandstone compaction, Geophysical Journal International 213 (2018) 2177–2192.
[18] H.-W. Zhou, H. Hu, Z. Zou, Y. Wo, O. Youn, Reverse time migration: A prospect of seismic imaging methodology, Earth-Science Reviews 179 (2018) 207–227.
[19] Y. Li, J. Sun, 3D magnetization inversion using fuzzy c-means clustering with application to geology differentiation, Geophysics 81 (2016) J61–J78.
[20] M. B. Kennel, KDTREE 2: Fortran 95 and C++ software to efficiently search for near neighbors in a multi-dimensional Euclidean space (2004). arXiv:0408067.