A workflow for seismic imaging with quantified uncertainty
Carlos H. S. Barbosa, Liliane N. O. Kunstmann, Rômulo M. Silva, Charlan D. S. Alves, Bruno S. Silva, Djalma M. S. Filho, Marta Mattoso, Fernando A. Rochinha, Alvaro L.G.A. Coutinho
Carlos H. S. Barbosa a,b,1, Liliane N. O. Kunstmann d,2, Rômulo M. Silva a,b,3, Charlan D. S. Alves a,4, Bruno S. Silva b,5, Djalma M. S. Filho e,6, Marta Mattoso d,7, Fernando A. Rochinha c,1, Alvaro L. G. A. Coutinho a,7,*
a High-Performance Computing Center, COPPE, Federal University of Rio de Janeiro, Brazil
b Department of Civil Engineering, COPPE, Federal University of Rio de Janeiro, Brazil
c Department of Mechanical Engineering, COPPE, Federal University of Rio de Janeiro, Brazil
d Department of Computer Science, COPPE, Federal University of Rio de Janeiro, Brazil
e CENPES, Petrobras
Abstract
The interpretation of seismic images faces challenges due to the presence of several uncertainty sources. Uncertainties exist in data measurements, source positioning, and subsurface geophysical properties. Understanding the role of uncertainties and how they influence the outcome is an essential part of the decision-making process in the oil and gas industry. Geophysical imaging is time-consuming. When we add uncertainty quantification, it becomes both time- and data-intensive. In this work, we propose a workflow for seismic imaging with quantified uncertainty. We build the workflow upon Bayesian tomography, reverse time migration, and image interpretation based on statistical information. The workflow explores an efficient hybrid parallel computational strategy to decrease the reverse time migration execution time. High levels of data compression are applied to reduce data transfer among workflow activities and data storage. We capture and analyze provenance data at runtime to improve workflow execution, monitoring, and debugging with negligible overhead. Numerical experiments on the Marmousi2 velocity model benchmark demonstrate the workflow capabilities. We observe excellent weak and strong scalability, and the results suggest that the use of lossy data compression does not hamper the seismic imaging uncertainty quantification.
Keywords: uncertainty quantification, seismic imaging, reverse time migration, high-performance computing, data compression, data provenance
* Corresponding author:
Email address: [email protected] (Alvaro L.G.A. Coutinho)
URL: https://orcid.org/0000-0002-4764-1142 (Alvaro L.G.A. Coutinho)
Author contributions:
1. Drafting the manuscript, coding the scientific workflow, and analysis of the seismic images and uncertainty maps.
2. Drafting the manuscript, provenance coding, and analysis of the provenance results.
3. HPC optimization and performance analysis.
4. Coding the Bayesian Eikonal tomography and analysis of the results.
5. Provided the necessary data, and seismic data analysis.
6. Revising the manuscript critically for important intellectual content.
7. Analysis and interpretation of data, drafting the manuscript, and revising the manuscript critically for important intellectual content.
Preprint submitted to Computers & Geosciences, March 24, 2020

1. Introduction

An application of seismic imaging is delineating the structural geologic aspects of the subsurface. A migrated image is inferred from seismic data after careful processing and migration that intends to collapse diffractions and relocate seismic reflection events to their correct position in depth. The generation of images for seismic interpretation needs a migration tool, which has as inputs seismograms and an estimate of the spatial velocity distribution. These data are acquired from seismic measurements with appropriate pre-processing and velocity analysis techniques, such as Normal Move-Out, tomography, or full-waveform inversion (FWI).

In seismic exploration, many decisions rely on interpretations of seismic images, which are affected by multiple sources of uncertainty. The leading causes of uncertainty in seismic imaging are the survey geometry, source signature, data noise, data processing, parameters, and model and numerical errors [1, 2]. Such procedures lead, unavoidably, to uncertainties in the data that propagate to the velocity analysis and migration process. The effects of such uncertainties are hard to quantify. Thus, fundamental to the decision-making process is understanding uncertainties and how they influence the outcomes. Building a depth-migrated seismic image is challenging due to the non-uniqueness of the inverse problem of estimating the uncertain velocity parameters.

Several works discuss how to express the uncertainty related to velocity-model building and measure the impact on the migration techniques [1, 2, 3, 4, 5, 6]. Typically, uncertainties are characterized through a probabilistic perspective and expressed as an ensemble of possible realizations of a random variable, making the Monte Carlo (MC) method the standard tool to manage and quantify uncertainties [7, 8, 9, 10, 11, ?].
Furthermore, these works emphasize the computational difficulties in quantifying the uncertainties, particularly those associated with the high dimensionality of the data. For instance, Li et al. [3] and Poliannikov and Malcolm [4] choose smart sampling strategies to overcome the computational limits of calculating posteriors for more realistic problems. We recall that to assess uncertainties it is necessary to migrate the recorded data for several hundreds of samples. Migrating the samples makes the computational algorithms not only time- but also data-intensive. Even on today's supercomputers, implementation and data management become critical for the effective realization of seismic imaging with quantified uncertainty.

To tackle the computational and data management challenges of seismic imaging with quantified uncertainty, we introduce a workflow built upon three sequential stages: (i) Bayesian tomography to estimate the seismic velocity from first-arrival travel times, where the forward modeling relies on the Eikonal equation (called here Bayesian Eikonal Tomography, BET) and the Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm [13, 14, 9]; (ii) seismic migration using Reverse Time Migration (RTM) wrapped in a Monte Carlo algorithm; (iii) image interpretation supported by statistical information. The BET inversion for velocity estimation offers the flexibility of treating any prior information, like a reasonable initial velocity estimate, and it has the advantage of providing results that help to quantify the uncertainties associated with the solutions [9]. RTM migrates the seismograms for the velocity models from the first stage (BET) to generate the corresponding set of migrated seismic images. Lastly, we expose the uncertainty maps that give a global representation of the ensembles of velocity fields and images produced, respectively, by BET and RTM.

Model-selection strategies [3] can decrease the number of models for the migration stage.
Nevertheless, even in this case, RTM wrapped in the Monte Carlo method is still challenging due to the high-dimensional uncertain inputs, the amount of data to manage, and the computational costs associated with imposing the Courant-Friedrichs-Lewy (CFL) condition for the two-way wave equation. Therefore, we present a parallel solution for RTM with quantified uncertainty to mitigate the computational costs. Our implementation uses MPI to manage each RTM among distributed platforms, and OpenMP plus vectorization to explore parallelism in hybrid architectures. Besides, data compression [15] improves data transfer through the workflow stages: to read the seismograms from disk, to manage temporary files, and to store the results in the final stage. Data compression is necessary because the workflow generates a large amount of heterogeneous data. Finally, to manage and track data exchange among the stages, we capture provenance data to monitor the several file transformations, in addition to registering CPU usage, file paths, and relationships among files.

This work is organized as follows. We initially review the essential components used to build the workflow. After that, we detail the workflow, with an emphasis on providing uncertainty estimates for the results. We then discuss high-performance computing improvements for RTM, and the additional data management tools for data compression and provenance. The paper ends with a summary of our main conclusions.
2. Migration with Quantified Uncertainty
In this section, we present the main concepts supporting our probabilistic framework for seismic imaging with quantified uncertainty. Bayesian inference states the solution of an inverse problem as a posterior probability density function (pdf) [16] of a random variable (often a random field), which in our case is a function that describes the spatial distribution of the heterogeneous subsurface velocity given the observed data. However, the posterior distribution cannot be expressed in a convenient analytical form [7]. Hence, to address this problem, we use a Markov Chain Monte Carlo (MCMC) method to generate samples of the stochastic field. The ensemble of samples serves as input data for generating seismic images, producing a migration with quantified uncertainty.
Seismic tomography is an inversion tool that aims to identify physical parameters (velocity, anisotropy, among others) given a set of observations (measurement data). The parameters and the data are related by the following equation:

    d_obs = F(m) + ε,    (1)

where ε is an additive noise including modeling and measurement errors. Here, the operator F is the Eikonal equation, which connects the velocity with first-arrival travel times [17]. The high-dimensional parameter vector is m = (V, U, n), where n is the number of grid points, the n-dimensional vector U = {u_i} contains the grid positions, and the vector V = {v_i}, i = 1, ..., n, the velocity of the compressional wave at the i-th computational grid point.

Iterative linearized approaches for seismic tomography have problems related to non-linearity and ill-posedness. These methodologies usually produce inaccurate parameters due to convergence to local minima. Besides, iterative linearized methods are not able to properly quantify the uncertainty associated with their solution. To address this issue, we have chosen a Bayesian formulation to aggregate as much information as possible to yield a better parameter estimation and the associated uncertainties. A Bayesian approach aims at estimating the conditional distribution of the parameters with respect to the known data, through a combination of prior knowledge regarding the velocity model and information provided by such data. Thus, to define the posterior distribution P(m | d_obs), we use Bayes' theorem:

    P(m | d_obs) = P(d_obs | m) P(m) / P(d_obs),    (2)

where P(d_obs | m) is the likelihood function, P(m) is the a priori parameter density distribution, and P(d_obs) is the evidence. The likelihood represents how well the estimated travel times fit the measurements.
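The structure of Equation (1) can be illustrated with a minimal sketch. The forward operator below is a hypothetical stand-in (straight rays through a homogeneous medium), not the Eikonal solver the paper uses; the offsets, velocity, and noise level are made up:

```python
import numpy as np

rng = np.random.default_rng(42)

def forward_travel_times(velocity, offsets):
    # Toy forward operator F(m): straight-ray first-arrival travel
    # times through a homogeneous medium, t = offset / velocity.
    # The paper's F is the Eikonal equation; this stand-in only
    # illustrates the structure of Equation (1).
    return offsets / velocity

offsets = np.array([100.0, 200.0, 400.0, 800.0])  # receiver offsets (m)
v_true = 2000.0                                   # m/s (hypothetical)
sigma_d = 1e-3                                    # noise std dev (s)

# d_obs = F(m) + eps, with eps ~ N(0, sigma_d^2)
eps = rng.normal(0.0, sigma_d, size=offsets.size)
d_obs = forward_travel_times(v_true, offsets) + eps
```

Any Eikonal solver can replace `forward_travel_times` without changing the probabilistic structure that follows.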
Assuming that the additive noise in Equation 1 is Gaussian with zero mean and variance σ_d², the likelihood is:

    P(d_obs | m) = exp( − ‖ (F(m) − d_obs) / σ_d ‖² ),    (3)

where ‖·‖ is the standard ℓ2-norm.

To explore the posterior distribution and deal with the dimension of the parameter space, we employ the Reversible Jump Markov Chain Monte Carlo (RJMCMC) method [13, 14, 7, 9]. The MCMC method is an iterative stochastic approach to generate samples according to the posterior probability distribution. A generalized version, called Reversible Jump, allows inference on both the parameters and their dimensionality [7]. Thus, following [9], we employ a Voronoi tessellation with Gaussian kernels as basis functions to constrain the inverse problem solution and reduce the parameter dimensionality. Such a parametrization works as a prior that embodies the expected non-stationary character of the velocity field, displaying localized abrupt changes within a continuous variation. Hence, the number of Voronoi cells n' ≪ n. Therefore, given the current state of the chain m_i, three perturbations are possible: adding a new cell nucleus with probability p_a, deleting an existing cell nucleus with probability p_d, and moving a selected cell nucleus with probability p_m. The acceptance probability of the three perturbations depends on the ratio of their likelihoods. After a burn-in phase, the process iterates, eventually reaching the total number of pre-defined iterations. By attributing a constant velocity magnitude inside each Voronoi cell, we put a probabilistic structure in the prior, assuming that such velocity values follow independent uniform random variables. Playing with the bounds of the random variables, we can incorporate into the priors spatial trends learned by experts from different data sources, like the deposition history.
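The birth/death/move proposals can be sketched on a toy 1-D problem. This is a deliberately simplified illustration, not the BET code: the data, proposal scales, and prior bounds are hypothetical, and the acceptance rule below uses only the likelihood ratio, omitting the prior and dimension-balance terms of the full RJMCMC acceptance probability:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D target: a two-layer velocity profile observed with noise.
x_grid = np.linspace(0.0, 1.0, 50)
v_true = np.where(x_grid < 0.5, 2000.0, 3000.0)
sigma_d = 50.0
d_obs = v_true + rng.normal(0.0, sigma_d, v_true.size)

def predict(nuclei, values):
    # Piecewise-constant field: each grid point takes the velocity of
    # its nearest nucleus (the Voronoi-cell parametrization).
    idx = np.argmin(np.abs(x_grid[:, None] - nuclei[None, :]), axis=1)
    return values[idx]

def log_like(nuclei, values):
    r = predict(nuclei, values) - d_obs
    return -0.5 * np.sum((r / sigma_d) ** 2)

nuclei = np.array([0.3, 0.7])
values = np.array([2500.0, 2500.0])
ll = log_like(nuclei, values)

for _ in range(2000):
    move = rng.choice(["add", "delete", "move"])  # p_a = p_d = p_m here
    n2, v2 = nuclei.copy(), values.copy()
    if move == "add":
        n2 = np.append(n2, rng.uniform(0.0, 1.0))
        v2 = np.append(v2, rng.uniform(1500.0, 3500.0))
    elif move == "delete" and len(n2) > 1:
        k = rng.integers(len(n2))
        n2, v2 = np.delete(n2, k), np.delete(v2, k)
    else:
        k = rng.integers(len(n2))
        n2[k] = np.clip(n2[k] + rng.normal(0.0, 0.1), 0.0, 1.0)
        v2[k] = v2[k] + rng.normal(0.0, 100.0)
    ll2 = log_like(n2, v2)
    # Metropolis acceptance on the likelihood ratio only (simplified).
    if np.log(rng.uniform()) < ll2 - ll:
        nuclei, values, ll = n2, v2, ll2
```

The post-burn-in states of such a chain form the velocity ensemble consumed by the migration stage.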
Reverse Time Migration (RTM) is a depth migration approach based on the two-way wave equation and an appropriate imaging condition. Restricting ourselves to the acoustic isotropic case, the wave equation is

    ∇²p(r, t) − (1 / v²(r)) ∂²p(r, t)/∂t² = 0,    (4)

where r = (r_x, r_y, r_z) is the position vector, v is the compressional wave velocity, and p(r, t) the hydrostatic pressure at position r and time t.

The RTM imaging condition [18] is the zero-lag cross-correlation of the forward-propagated source wavefield (S) and the backward-propagated receiver wavefield (R), normalized by the square of the source wavefield,

    I(r) = ∫_T S(r, t) R(r, t) dt / ∫_T S²(r, t) dt.    (5)

The source wavefield is the solution of Equation 4 with the seismic wavelet treated as a boundary condition. Similarly, the receiver wavefield is the solution of the same Equation 4 with the observed data (seismograms) as a boundary condition. The normalization term in Equation 5 produces a source-normalized cross-correlation image that has the same unit, scaling, and sign as the reflection coefficient [18]. Therefore, within the MC method, RTM migrates the near-surface recorded signals for the ensemble of BET-generated velocity samples.

To reach a global representation of the solution variability within the ensemble of plausible velocity fields and images produced by the workflow, we follow [19]. They proposed a confidence index that normalizes the uncertainty map explored by Li et al. [3], assigning to each spatial point an evaluation of the degree of uncertainty expressed by the standard deviation σ(r), where low values indicate regions of high variability and high values the regions of small variability.
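Equation (5) reduces to simple array operations once the two wavefields are available as time-by-space arrays. A minimal sketch, with hypothetical wavefield shapes and a small regularization term to avoid division by zero where the source wavefield vanishes:

```python
import numpy as np

def source_normalized_image(S, R, dt, eps=1e-12):
    """Zero-lag cross-correlation imaging condition, Equation (5):
    I(r) = sum_t S(r,t) R(r,t) dt / sum_t S(r,t)^2 dt.
    S, R: wavefields of shape (nt, nz, nx). eps stabilizes the
    division where the source wavefield is (numerically) zero."""
    num = np.sum(S * R, axis=0) * dt
    den = np.sum(S * S, axis=0) * dt
    return num / (den + eps)

# Tiny synthetic check with hypothetical wavefields: if R = 2 * S,
# the source-normalized image is ~2 wherever S is nonzero.
S = np.random.default_rng(1).normal(size=(100, 8, 8))
I = source_normalized_image(S, 2.0 * S, dt=1e-3)
```

In a real RTM the discrete time integral accumulates during the backward sweep instead of storing full wavefields, but the arithmetic per grid point is the same.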
The confidence index is expressed by

    c(r) = (σ_max − σ(r)) / (σ_max − σ_min),    (6)

where c(r) is the confidence at position r, σ(r) is the standard deviation at position r, and σ_min and σ_max are the minimum and maximum standard deviations over the domain, respectively.

An alternative measure is the ratio between the standard deviation and the average, that is,

    r(r) = σ(r) / μ(r),    (7)

where μ represents the average map of the data. In this case, Equation 7 measures the variations among the samples with the mean as reference.

3. A Workflow for Seismic Imaging

In this section, we present the components of our workflow for seismic imaging with quantified uncertainty. We have chosen a particular set of chained components that provide a comprehensive probabilistic framework for seismic imaging. Nevertheless, we recognize that different components could be chained in a similar form to provide other possible workflows. Thus, a workflow is an abstraction that represents stages of computation as a set of activities and a data flow among them. Each activity can be a software component that performs operations on an input data set, transforming it into an output data set. A workflow can represent the whole of the seismic imaging stages as a chain of activities. Stages of seismic imaging (or workflow activities) can be replaced or extended to build a different viewpoint of the outcomes, such as statistical information for uncertainty quantification.

Figure 1 shows the three workflow stages. BET produces an ensemble of velocity fields (stage 1), RTM generates the corresponding seismic images (stage 2), and stage 3 builds the uncertainty maps for the velocity and migrated images from the results of stages 1 and 2. To efficiently implement this workflow, we address three computational aspects: parallelism, data transfer, and data management. Figure 2 describes how the workflow produces the ensemble of velocities and migrated images for posterior uncertainty quantification, the hybrid parallelism in the RTM stage, and where data compression is used to reduce storage and network data transfer.
Figure 1: Results of BET (Stage 1) and of RTM for each velocity field built in Stage 1 (Stage 2). Stage 3 shows the uncertainty maps for velocities and migrated images. The maps can guide interpreters to potential areas with low/high uncertainty in velocities and structures.
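The Stage 3 summaries of Equations (6) and (7) amount to per-pixel statistics over the ensemble. A minimal sketch on a synthetic ensemble (shapes and noise profile are hypothetical, chosen so that deeper rows are more uncertain):

```python
import numpy as np

def uncertainty_maps(ensemble):
    """Stage-3 style summaries for an ensemble of fields with shape
    (n_samples, nz, nx): standard deviation map, confidence index
    (Equation 6), and coefficient-of-variation map (Equation 7)."""
    mu = ensemble.mean(axis=0)
    sigma = ensemble.std(axis=0)
    c = (sigma.max() - sigma) / (sigma.max() - sigma.min())
    r = sigma / mu
    return sigma, c, r

rng = np.random.default_rng(3)
# Hypothetical ensemble: deeper rows (larger z index) get noisier samples.
base = np.full((20, 30), 2500.0)
noise = rng.normal(size=(200, 20, 30)) * np.linspace(1.0, 80.0, 20)[:, None]
sigma, c, r = uncertainty_maps(base + noise)
```

By construction c lies in [0, 1], with values near 1 (high confidence) where the ensemble spread is smallest.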
Stage 1 aims to build a probability density function for the velocity encompassing feasible solutions that fit the calculated travel times to the data, and provides enough information for the next two stages. To generate samples of the stochastic field, BET needs to run thousands of RJMCMC iterations. In the Voronoi tessellation within RJMCMC, the number of cells, their positions, and their sizes vary. This strategy produces a number of Voronoi cells smaller than the number of velocity parameters, as a cell might contain many computational grid points. We use a KD-tree [?] to reduce the search cost when updating the Voronoi tessellation. Solving the Eikonal equation in each iteration dominates the BET costs. We use here the open-source software FaATSO to solve the Eikonal equation.

Figure 2: Stage 1 provides a probabilistic distribution of velocity using BET. Stage 2 migrates the seismograms for all velocity parameters, generating, thus, a set of stacked seismic sections. Finally, Stage 3 extracts statistical information from velocity parameters and migrated images. Compression is used to transfer the data among the stages. Stage 2 uses rtmUq2D, a hybrid parallel RTM code.
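The KD-tree lookup described above (each grid point takes the velocity of its nearest nucleus) can be sketched as follows. The nucleus positions and per-cell velocities are hypothetical, and `scipy.spatial.cKDTree` stands in for whatever KD-tree the BET code uses:

```python
import numpy as np
from scipy.spatial import cKDTree

# Sketch of the Voronoi-cell lookup: each of the n grid points is
# assigned the velocity of its nearest cell nucleus. The KD-tree makes
# the nearest-nucleus query O(n log n') instead of O(n * n').
nz, nx = 60, 90
zz, xx = np.meshgrid(np.arange(nz), np.arange(nx), indexing="ij")
grid = np.column_stack([zz.ravel(), xx.ravel()]).astype(float)

nuclei = np.array([[10.0, 20.0], [40.0, 70.0], [55.0, 10.0]])
cell_velocity = np.array([2000.0, 2600.0, 3200.0])  # one value per cell

_, cell_idx = cKDTree(nuclei).query(grid)  # nearest nucleus per point
velocity_field = cell_velocity[cell_idx].reshape(nz, nx)
```

After a birth, death, or move proposal, only the tree is rebuilt over the n' nuclei; the grid itself never changes.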
At the end of stage 1, we select a set of velocities for the migration based on the variance of successive batches of the post-burn-in samples. Thus, when the stochastic process converges, the variance among the samples stabilizes. In the second stage, RTM migrates the seismograms for the subset of velocity fields. These migrations produce the corresponding set of seismic images, where each one has a direct relation with one velocity sample coming from stage 1. To attenuate low-frequency artifacts in the brute stack images, we apply a Laplacian filter. The RTM kernel, that is, the second-order wave equation, is discretized with a second-order finite difference scheme in time and an eighth-order scheme in space. The reverse problem implementation explores the technique proposed by [21]. This technique guarantees the coercivity of the adjoint problem. Furthermore, the RTM code, rtmUq2D, takes advantage of vectorization. (FaATSO stands for Fast Marching Acoustic Emission Tomography using Standard Optimization: https://github.com/nbrantut/FaATSO.) rtmUq2D also contains OpenMP directives to explore multi-core parallelism. To complete our hybrid RTM solution, we apply MPI to manage the execution of multiple migrations and shots. Thus, we break the total number of migrations into subgroups, and each subgroup is assigned to an MPI process, as exemplified in Figure 2. When a subgroup finishes its workload, the next one starts immediately. The batch execution (an RTM for each subgroup) is repeated until exhausting the number of samples. Besides, each RTM verifies its own CFL condition to guarantee the stability of the discrete solution of the two-way wave equation for each sample.

Once stages 1 and 2 produce ensembles of velocity fields and seismic images, a condensed representation of this information can be calculated. In stage 3, velocity uncertainty maps are obtained as soon as we accept BET solutions. The same procedure is valid for generating migrated images.
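The per-sample CFL verification and the subgroup scheduling described above can be sketched as follows. The stability constant and the round-robin split are illustrative placeholders, not the actual rtmUq2D parameters:

```python
import numpy as np

def check_cfl(v_max, dx, dt, c_max=0.5):
    # CFL condition for an explicit two-way wave-equation scheme:
    # v_max * dt / dx <= c_max. The constant c_max depends on the
    # stencil order; 0.5 is a hypothetical placeholder.
    return v_max * dt / dx <= c_max

def max_stable_dt(v_max, dx, c_max=0.5):
    # Largest stable time step for a given velocity sample.
    return c_max * dx / v_max

def subgroups(n_samples, n_ranks):
    # Round-robin split of the migrations into per-rank subgroups,
    # mimicking the MPI distribution of samples described above.
    return [list(range(r, n_samples, n_ranks)) for r in range(n_ranks)]

v_sample = np.array([[2000.0, 3000.0], [2500.0, 3500.0]])  # m/s
dt = max_stable_dt(v_sample.max(), dx=10.0)  # checked per sample
groups = subgroups(n_samples=10, n_ranks=3)
```

Because each velocity sample has its own maximum velocity, each RTM instance picks (or verifies) its own dt before time stepping.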
Hence, stage 3 generates uncertainty maps representing variability in velocity and in reflector amplitudes. These outcomes give geoscientists extra information to guide the seismic interpretation.

However, to better support interpretation and improve confidence, it is essential to register relationships between the data produced at each workflow stage. For example, the uncertainty maps of stage 3 are related to the velocity fields and statistics, which are also related to the seismic images. Tracking these data derivations among the stages at runtime, combined with the procedures' execution times, can support workflow parameter debugging and fine-tuning. Provenance data [22] represents the derivation path of the data transformations during the stages. Capturing provenance data and storing it in a provenance database while the workflow executes can add an overhead to the workflow execution. We choose DfAnalyzer, an asynchronous provenance data capture software library, because it can be used by high-performance computing (HPC) workflows to provide runtime data analysis with negligible overhead [23]. While monitoring the workflow execution, users can submit queries to the travel time matrix during the migration, or go directly from an input velocity model file to the corresponding files that hold its statistics, obtained after all samples were aggregated. DfAnalyzer has been successfully used for other HPC applications [24, 25].

Data transfer is also a concern, as each stage in the workflow has to manage its information and send it to the next workflow stage. The amount of data transfer can increase drastically, given the number and size of the samples in the ensembles of velocities and migrated images. Data compression can reduce persistent storage and data transfer throughout the chain [15]. The ZFP library is an open-source C/C++ library for compressing numerical arrays. To achieve high compression rates, ZFP uses lossy but optionally error-bounded compression.
Although bit-for-bit lossless compression of floating-point data is not always possible, ZFP also offers an accurate near-lossless compression [15]. We employ ZFP to compress the numerical arrays that store the velocity parameters, imaging conditions, and the seismograms. This use is linked to the workflow stages to reduce the pressure on the network data transfer among the workflow stages and to reduce I/O to persistent storage. However, lossy compression has to be managed carefully, with safe error bounds. We test both lossy and lossless compression, and we measure the error on the final results.
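The error-bound contract of fixed-accuracy compression can be illustrated with a simple stand-in: quantize to a tolerance, then apply a lossless coder. This is NOT how ZFP works internally (ZFP uses block transform coding and compresses far better); it only demonstrates the guarantee |x − x'| ≤ tol that the workflow relies on. All names and values are hypothetical:

```python
import zlib
import numpy as np

def lossy_compress(arr, tol):
    # Uniform quantization with step 2*tol bounds the pointwise error
    # by tol; zlib then losslessly packs the quantized integers.
    q = np.round(arr / (2.0 * tol)).astype(np.int64)
    payload = zlib.compress(q.tobytes(), level=9)
    return payload, arr.shape

def decompress(payload, shape, tol):
    q = np.frombuffer(zlib.decompress(payload), dtype=np.int64)
    return q.reshape(shape) * (2.0 * tol)

rng = np.random.default_rng(7)
field = 2500.0 + 50.0 * rng.standard_normal((200, 200))  # toy velocity
payload, shape = lossy_compress(field, tol=1e-2)
restored = decompress(payload, shape, tol=1e-2)
ratio = field.nbytes / len(payload)  # achieved compression ratio
```

The workflow's choice of tolerance is exactly this trade: a larger tol shrinks the payload but loosens the pointwise error bound on the reconstructed field.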
4. Computational Experiments
We designed computational experiments to assess the overall workflow behavior and performance. They consist of verifying RTM strong and weak scalability, measuring the numerical errors on the final results after applying data compression, and discussing the workflow uncertainty quantification outcomes using the data analysis tools. We choose two datasets: a two-layer constant velocity field for the scalability analysis, and one based on the 2-D Marmousi2 benchmark [26] for applying data compression and running the full workflow.
We choose a two-layer constant velocity model that is 3.0 km deep and 9.2 km wide in the horizontal direction for the scalability tests. The velocities of the two layers are 2000 m/s and 3000 m/s. We generate the reference data set for RTM based on the true velocity field. The reference data set is the wavefield recorded by near-surface hydrophones (one seismogram) with a cutoff frequency of 30 Hz. The input velocity field for RTM is a smooth version of the synthetic velocity model.

Figure 3 shows the RTM strong scalability considering the migration of one shot for an increasing number of cores. We run the tests on different platforms: one vector processor (NEC SX-Aurora TSUBASA Type B) and two multi-core CPUs (Intel Xeon E5-2670v3 (Haswell) and Intel Xeon Platinum 8160 (Skylake)). The vector processor available has its minimum configuration, that is, 8 cores, while Haswell and Skylake have 24 cores each.
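A minimal sketch of this test model and its smoothed migration velocity follows. The grid spacing and the moving-average smoother are assumptions for illustration; the paper does not state which smoother it uses:

```python
import numpy as np

# Hypothetical discretization of the two-layer test model: 3.0 km deep,
# 9.2 km wide, 2000 m/s over 3000 m/s, interface placed at mid-depth.
dz = dx = 10.0                         # m (assumed grid spacing)
nz, nx = int(3000 / dz), int(9200 / dx)
v_true = np.full((nz, nx), 2000.0)
v_true[nz // 2:, :] = 3000.0

def smooth(v, half=15):
    # Simple depth-wise moving average as a stand-in for the smoothed
    # RTM input velocity; any low-pass smoother plays the same role.
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, v)

v_migration = smooth(v_true)
```

The smoothed field keeps the kinematics (long-wavelength velocity trend) while removing the sharp reflectivity contrast that RTM is supposed to image.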
Figure 3: RTM scalability analysis on different platforms. The tests consist of a migration for one seismogram. The vector processor available has 8 cores, while Haswell and Skylake have 24 cores each.

For the weak scalability analysis, since the hybrid RTM has to run n times on n nodes to build the statistics outcomes, the CPU time for each node has to be approximately the same. Assuming the same parametrization of the strong scalability test, we run the RTM code for an increasing number of nodes and measure the CPU time for each one. We note that the RTM takes, on average, 3.281 s, with a standard deviation of 0.0578 s, to execute one RTM per node on the Haswell cluster. Thus, the CPU time for running one RTM on one node is approximately the same as running five RTMs on five nodes. We observe the same pattern for the other platforms.

We select the 2-D Marmousi2 velocity model benchmark [26] for the data compression experiments. The benchmark is 3.5 km deep and 17.0 km wide in the horizontal direction, and has parameter models (p-wave velocity and density) and other data sets related to its acquisition. However, we generate the seismograms by solving the two-way wave equation with the true velocity. Near-surface hydrophones record the seismograms with a cutoff frequency of 45 Hz. Thus, we have 160 seismograms to be used in the migration stage. In the BET stage, the reference data set is 34 travel-time panels obtained from solving the Eikonal equation with the true velocity.

After BET, we pick 5000 post-burn-in velocity models for migration. Thus, RTM produced 5000 seismic images, each corresponding to one velocity model from stage 1. Hence, we use 5000 velocity models, 5000 seismic migrations, and 160 seismograms to test the lossy and lossless compression rates. Table 1 shows the results related to compression efficiency for the velocity models and seismic images. The compression efficiency for the seismograms is in Table 2.
Table 1: Fixed-accuracy lossy and lossless compression comparison. The error tolerances for lossy compression are 10^− and 10^−. Each value is the average of the 1000 compressed matrices (velocity fields and seismic images).

    Compression    | Velocity (MB) | Image (MB)
    None           | 6.10          | 6.10
    Lossless       | 4.78          | 5.34
    Lossy (10^−)   | 4.77          | 3.65
    Lossy (10^−)   | 2.98          | 1.83

Each velocity model and seismic image without compression has 6.1 MB. The values in the columns "Velocity" and "Image" represent, respectively, the average size of 1000 matrices for the velocity models and seismic images without compression (first line), and with lossless and lossy compression. We apply three compression types: the first one is lossless compression, the second one is lossy compression with an absolute error of 10^−, and the last one is lossy compression with an absolute error of 10^−. We can see that the lossy compression achieves the best compression rate, reaching about 51.1% compression for the velocity models (2.98 MB) and about 70.0% compression for the seismic images (1.83 MB). Note that the compression rates for each matrix are different, since ZFP explores the structure of the data. Although less efficient, the lossless compression reaches about 21.6% compression for the velocity models (4.78 MB) and about 12.5% compression for the seismic images (5.34 MB). This efficiency rate for lossless compression is compatible with its purpose, which is to preserve the original information.

We also apply lossless and lossy compression to the 160 seismograms. Each seismogram has 76.20 MB. The error tolerance for lossy compression is set to 10^−. Compression rates reach a minimum of 13.38 MB and a maximum of 8.38 MB, as we can see in Table 2. These values represent 82.4% and 89.0% less than the original data, respectively.

Table 2: Seismograms fixed-accuracy lossy and lossless compression comparison. The error tolerance for lossy compression is set to 10^−.
The values represent the minimum and maximum compression among the 160 seismograms.

    Compression | Minimum (MB) | Maximum (MB)
    None        | 76.20        | 76.20
    Lossless    | 59.28        | 48.46
    Lossy       | 13.38        | 8.38

The results in Table 1 and Table 2 suggest that we can use compressed data in the workflow and thus reduce the storage and network data transfer. However, we need to measure the error propagation due to compression on the final seismic images. Therefore, to measure the compression errors, we run RTM to migrate the seismograms without compression, with an uncompressed velocity model as a parameter. Figure 4(a) shows the migration of the uncompressed seismograms, and Figure 4(d) shows the 2-D Fourier spectra of the two white boxes shown in Figure 4(a). The described outcomes are the reference when we do not use ZFP compression. The same procedure is applied using the lossless compression of the velocity model and seismograms as entries for the RTM migration, and the lossy compression (error tolerance of 10^−) of the velocity model and seismograms as entries for the RTM migration. These results can be seen in Figures 4(b)(e) and 4(c)(f), respectively.

We calculate the normalized root mean square (NRMS) error between the RTM results from Figure 4(a) and the RTM results from Figures 4(b) and 4(c). The NRMS errors are 2.×10^−% and 4.×10^−%, respectively. The NRMS errors among the spectra from the left white box regions in Figures 4(a), 4(b), and 4(c) are 1.×10^−% and 1.×10^−%, respectively. Moreover, the NRMS errors among the spectra from the right white box regions in Figures 4(a), 4(b), and 4(c) are 1.×10^−% and 2.×10^−%, respectively. The results from Figure 4 and the NRMS errors show that the chosen compression levels for the velocity models and seismograms lead to migrated images with errors of order 10^−%. The compression strategy allows reducing storage by 85% and, consequently, improves network data transfer without compromising accuracy.
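An NRMS comparison of this kind can be sketched as follows. The normalization convention (reference peak-to-peak range) is an assumption for illustration, since the paper does not spell out its exact definition:

```python
import numpy as np

def nrms_percent(reference, test):
    # Normalized RMS difference (in percent) between a reference image
    # and a reconstructed one, here normalized by the reference's
    # peak-to-peak range (one common convention among several).
    rms = np.sqrt(np.mean((reference - test) ** 2))
    return 100.0 * rms / (reference.max() - reference.min())

ref = np.random.default_rng(11).normal(size=(128, 128))   # toy "image"
perturbed = ref + 1e-5 * np.random.default_rng(12).normal(size=ref.shape)
err = nrms_percent(ref, perturbed)
```

The same function applies unchanged to the 2-D Fourier spectra of the windowed regions, since they are also arrays.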
Figure 4: RTM outcomes after migrating the (a) reference seismograms, (b) seismograms with lossless compression, and (c) seismograms with lossy compression. The velocity field inputs are not compressed for the migration in Figure (a), losslessly compressed for the migration in Figure (b), and lossy compressed for the migration in Figure (c). Figures (d), (e), and (f) represent the 2-D Fourier spectra of the white box regions shown in Figures (a), (b), and (c), respectively.
Provenance data is captured by DfAnalyzer to analyze data from the several workflow stages and stored in a columnar database using the MonetDB system. The workflow needs a definition of which data should be part of the provenance database. Then, DfAnalyzer calls are inserted in the workflow code to capture data provenance, as shown in Listing 1. We run a series of RTMs with and without DfAnalyzer calls. In all tests, the overhead of provenance data capture and storage remains below 2.95%, as shown in Table 3. Also, the overhead rate decreases as the number of nodes increases. During and after the workflow executions, queries can be submitted to the provenance database using filters or data aggregations over the data derivation path of the workflow stages, as shown in Figure 5.

Table 3: RTM overhead with DfAnalyzer calls. Each node executes two RTMs.

Figure 5: Provenance query result showing the shots with their (x, z) location and the paths of their corresponding seismogram image file and cross-correlation file, filtered by a shot location where z = 25 and 0 < x <. The returned columns are shot, shotLocation(x) (m), shotLocation(z) (m), seismogram path, and cross correlation path.

We discuss here the results of sequentially applying BET and RTM to the raw data provided by the seismograms, generating images with quantified uncertainty. The ensemble of uncertain velocity fields resulting from BET is assessed through primary statistics, such as the velocity mean and standard deviation. In the sequence, we compute uncertainty maps. After RTM migrates the seismograms for each velocity field, we analyze the impact of the inversion uncertainties on the seismic images. Note that in this stage we choose to work on the compressed seismograms and velocity fields to reduce storage and network transfer. As we show in Figure 4, the lossy compression of both velocity and seismograms, with an error tolerance of 10^−, does not produce migrated seismic images of poor quality. Hence, we use them in the whole workflow.
Besides, the Marmousi2 benchmark, as specified in Subsection 4.2, is used to demonstrate the outcomes of the workflow. To keep track of the impact of each uncertainty source, we restrict the analysis in this particular example to the noise associated with monitoring the data, defined by σd in Equation 1. The other component that would induce uncertainty would be the scarcity of data promoted by acquiring signals only on the surface. That is circumvented by collecting the synthetic data on all grid points. We anticipate facing a lower degree of uncertainty in the final images due to such a choice.

Listing 1 (excerpt): DfAnalyzer task registration inserted in the RTM code.

    int task_id = 1;
    Task task_runrtm = Task(dataflow.get_tag(), runrtm.get_tag(), task_id);
    vector
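The measurement-noise term σd entering Equation 1 can be sketched by perturbing a noise-free synthetic seismogram with i.i.d. Gaussian noise; the σd value, trace shape, and wavelet below are illustrative choices of ours, not the ones used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Noise-free synthetic "seismogram": time samples x receivers (illustrative).
nt, nrec = 2000, 160
t = np.linspace(0.0, 2.0, nt)[:, None]
clean = np.exp(-((t - 0.8) ** 2) / 0.002) * np.ones((1, nrec))

# i.i.d. Gaussian observational noise with standard deviation sigma_d.
sigma_d = 0.05
observed = clean + rng.normal(0.0, sigma_d, size=clean.shape)

# The empirical noise level should be close to the prescribed sigma_d.
print("empirical noise std:", np.std(observed - clean))
```

In the Bayesian tomography, this σd fixes the width of the data likelihood, so larger assumed noise directly widens the posterior over velocity fields.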
Figure 6 shows the standard deviation (a), confidence index (b), and standard deviation over the mean value (c) of the velocity fields. The standard deviation field expresses the degree of uncertainty in the velocity at each point of the domain. The minimum standard deviation values are around 1 m/s and localized in shallow regions. On the other hand, the maximum standard deviation values occur in deeper regions, with values around 80 m/s. The confidence index (Equation 6) is the normalized standard deviation and represents the degree of confidence in the velocities [3]. We depict in Figure 6 (b) the confidence index within the image domain. We can see that such a criterion assigns higher confidence (> 0.8) to shallow regions. Figure 6 (c) shows the r(r) map. Again, we observe the same trends.

Figure 6: Standard deviation (a), confidence index (b), and r(r) map (c) of the Marmousi velocity fields.

We plot in Figure 7 (a) the standard deviation for a representative image, computed as an average among those of the ensemble. Besides, we plot the confidence index (b) and r(r) map (c) of the seismic images obtained from the RTM wrapped on the MCMC. In this case, the observed variations are related to the amplitude of the reflections instead of the p-wave velocity, that is, to the variability of the amplitude intensity. This intensity is maximum at the interface between two rocks with different physical properties. However, if an estimated p-wave velocity differs from the exact p-wave velocity of the medium, the amplitude intensity is misplaced. Hence, high standard deviation, low confidence index, and low standard deviation over the mean values are highlighted at the interfaces and predominant in the middle region.

Figure 7: Standard deviation (a), confidence index (b), and r(r) map (c) of the migrated seismic images. RTM migrates 160 seismograms for each velocity field.
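The three maps of Figures 6 and 7 can be sketched for a toy ensemble as below. The normalization we use for the confidence index (one minus the standard deviation scaled by its maximum, so low spread means high confidence, consistent with [3]) is our assumption of the form of Equation 6, and r is taken as the coefficient of variation σ/μ; the ensemble itself is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ensemble of velocity fields: more spread at depth, as in Figure 6 (a).
n_samples, nz, nx = 200, 64, 128
depth = np.linspace(0.0, 1.0, nz)[:, None]
mean_model = 1500.0 + 3000.0 * depth * np.ones((1, nx))
spread = 1.0 + 79.0 * depth            # ~1 m/s shallow, ~80 m/s deep
ensemble = mean_model + rng.normal(0.0, 1.0, (n_samples, nz, nx)) * spread

mu = ensemble.mean(axis=0)              # pointwise mean velocity
sigma = ensemble.std(axis=0)            # standard deviation map, Figure 6 (a)
confidence = 1.0 - sigma / sigma.max()  # normalized-std index (assumed form of Eq. 6)
r = sigma / mu                          # coefficient-of-variation map, Figure 6 (c)

# Shallow rows should come out with higher confidence than deep ones.
print("shallow confidence:", confidence[0].mean())
print("deep confidence:", confidence[-1].mean())
```

The same three statistics can be applied unchanged to the ensemble of migrated images, which is how the maps of Figure 7 are obtained from the RTM outputs.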
5. Conclusions
We proposed a new scientific workflow that supports uncertainty quantification in seismic imaging. Our stochastic perspective to handle uncertainties is composed of three sequential stages: inversion, migration, and interpretation. A Bayesian tomography (Stage 1) provides the input data for the RTM wrapped on the MCMC (Stage 2). The workflow provides end-users with means to explore uncertainty in the final images through maps computed in Stage 3.

The workflow execution requires significant computing power and generates a huge amount of data. Thus, we use optimized hybrid computing solutions that explore vectorization, data compression, and advanced data management tools. These tools allow us to obtain a scalable solution for the Marmousi Velocity Model benchmark with quantified uncertainty. ZFP compression has been used to transfer information among the stages and to store the final images. High levels of compression have been observed for the ensembles of velocities and migrated images. The compression strategy improves the network data transfer and reduces the amount of data by up to 85%, assuming an error tolerance of 10−, without affecting the quality of the final result. The provenance database keeps a record of the workflow execution stages, enriching the seismic data analysis quality. We observed that capturing and storing provenance data during the workflow execution is inexpensive.

Acknowledgements
This study was financed in part by CAPES, Brazil, Finance Code 001. This work is also partially supported by FAPERJ, CNPq, and Petrobras. Computer time on Stampede is provided by TACC, the University of Texas at Austin. The US NSF supports Stampede under award ACI-1134872. Computer time on the Lobo Carneiro machine at COPPE/UFRJ is also acknowledged. We are also indebted to NEC Corporation.
Computer Code Availability
The codes can be downloaded at the repository https://github.com/Uncertainty-Quantification-Project. The repository provides three open-source codes (bayes-tomo, rtmUq2D, and rtmUq2D-Prov). The user guide describes the software and hardware requirements and directs to complementary open-source software used in the paper.

References

[1] J. Caers, Modeling uncertainty in the earth sciences, John Wiley & Sons, 2011.
[2] S. Fomel, E. Landa, Structural uncertainty of time-migrated seismic images, Journal of Applied Geophysics 101 (2014) 27–30.
[3] L. Li, J. Caers, P. Sava, Assessing seismic uncertainty via geostatistical velocity-model perturbation and image registration: An application to subsalt imaging, The Leading Edge 34 (2015) 1064–1070.
[4] O. Poliannikov, A. Malcolm, The effect of velocity uncertainty on migrated reflectors: Improvements from relative-depth imaging, Geophysics 81 (2016) S21–S29.
[5] J. Messud, M. Reinier, H. Prigent, P. Guillaume, T. Coléou, S. Masclet, Extracting seismic uncertainty from tomography velocity inversion and their use in reservoir risk analysis, The Leading Edge 36 (2017) 127–132.
[6] G. Ely, A. Malcolm, O. Poliannikov, Assessing uncertainty in velocity models and images with a fast nonlinear uncertainty quantification method, Geophysics 83 (2018) R63–R75.
[7] T. Bodin, M. Sambridge, Seismic tomography with the reversible jump algorithm, Geophysical Journal International 178 (2009) 1411–1436.
[8] A. Botero, A. Gesret, T. Romary, M. Noble, C. Maisons, Stochastic seismic tomography by interacting Markov chains, Geophysical Journal International 207 (2016) 374–392.
[9] J. Belhadj, T. Romary, A. Gesret, M. Noble, B. Figliuzzi, New parametrizations for Bayesian seismic tomography, Inverse Problems 34 (2018) 33.
[10] D. G. Michelioudakis, R. W. Hobbs, C. C. S. Caiado, Uncertainty analysis of depth predictions from seismic reflection data using Bayesian statistics, Geophysical Journal International 213 (2018) 2161–2176.
[11] J. Martin, L. C. Wilcox, C. Burstedde, O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing 34 (2012) A1460–A1487.
[12] N. Rawlinson, A. Fichtner, M. Sambridge, M. K. Young, Chapter one - Seismic tomography and the assessment of uncertainty, volume 55 of Advances in Geophysics, Elsevier, 2014, pp. 1–76. doi:10.1016/bs.agph.2014.08.001.
[13] P. J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82 (1995) 711–732.
[14] P. J. Green, D. I. Hastie, Reversible jump MCMC, Genetics 155 (2009) 1391–1403.
[15] P. Lindstrom, P. Chen, E.-J. Lee, Reducing disk storage of full-3D seismic waveform tomography (F3DT) through lossy online compression, Computers & Geosciences 93 (2016) 45–54.
[16] A. F. M. Smith, Bayesian computational methods, Philosophical Transactions: Physical Sciences and Engineering 337 (1991) 369–386.
[17] N. Brantut, Time-resolved tomography using acoustic emissions in the laboratory, and application to sandstone compaction, Geophysical Journal International 213 (2018) 2177–2192.
[18] H.-W. Zhou, H. Hu, Z. Zou, Y. Wo, O. Youn, Reverse time migration: A prospect of seismic imaging methodology, Earth-Science Reviews 179 (2018) 207–227.
[19] Y. Li, J. Sun, 3D magnetization inversion using fuzzy c-means clustering with application to geology differentiation, Geophysics 81 (2016) J61–J78.
[20] M. B. Kennel, KDTREE 2: Fortran 95 and C++ software to efficiently search for near neighbors in a multi-dimensional Euclidean space (2004). arXiv:0408067.