Probabilistic Neural Network Tomography across Grane field (North Sea) from Surface Wave Dispersion Data
Stephanie Earp
Department of Geosciences, University of Edinburgh
[email protected]

Andrew Curtis
Department of Geosciences, University of Edinburgh, and Institute of Geophysics, ETH Zurich
[email protected]

Xin Zhang
Department of Geosciences, University of Edinburgh

Fredrik Hansteen
Equinor ASA, Bergen, Norway
[email protected]

Abstract
Surface wave tomography uses measured dispersion properties of surface waves to infer the spatial distribution of subsurface properties such as shear-wave velocities. These properties can be estimated vertically below any geographical location at which surface wave dispersion data are available. As the inversion is significantly non-linear, Monte Carlo methods are often used to invert dispersion curves for shear-wave velocity profiles with depth to give a probabilistic solution. Such methods provide uncertainty information but are computationally expensive. Neural network based inversion provides a more efficient way to obtain probabilistic solutions when those solutions are required beneath many geographical locations. Unlike Monte Carlo methods, once a network has been trained it can be applied rapidly to perform any number of inversions. We train a class of neural networks called mixture density networks to invert dispersion curves for shear-wave velocity models and their non-linearised uncertainty. Mixture density networks are able to produce fully probabilistic solutions in the form of weighted sums of multivariate analytic kernels such as Gaussians, and we show that including data uncertainties in the mixture density network gives more reliable mean velocity estimates when data contain significant noise. The networks were applied to data from the Grane field in the Norwegian North Sea to produce shear-wave velocity maps at several depth levels. Post-training we obtained probabilistic velocity profiles with depth beneath 26,772 locations to produce a 3D velocity model in 21 seconds on a standard desktop computer. This method is therefore ideally suited for rapid, repeated 3D subsurface imaging and monitoring.
Seismic surface waves travel around the surface of the Earth but are sensitive to heterogeneity in elastic properties within the subsurface. Different frequencies of surface waves travel at different speeds since they depend mainly on the shear-wave velocity structure at different depths. Surface wave tomography uses this property (called dispersion) to infer the spatial distribution of subsurface shear velocities over global scales (Woodhouse and Dziewonski, 1984; Trampert and Woodhouse, 1995; Shapiro and Ritzwoller, 2002; Zhou et al., 2006; Meier et al., 2007a,b), regional scales (Montagner and Jobert, 1988; Curtis and Woodhouse, 1997; Curtis et al., 1998; Ritzwoller and Levshin, 1998; Devilee et al., 1999; Villasenor et al., 2001; Simons et al., 2002) and reservoir scales (Bussat and Kugler, 2011; de Ridder and Dellinger, 2011; Mordret et al., 2014).

Surface wave tomography is often performed using a 2-step inversion scheme (Trampert and Woodhouse, 1995; Ritzwoller et al., 2002). In step 1, travel times of surface waves between pairs of known locations are measured at various fixed periods, then used to create geographical phase or group velocity maps at each period using 2D tomography. In step 2, the dispersion properties (the speed of the waves at different periods, often referred to as a dispersion curve) at each point on the 2D map are inverted to estimate a 1D shear-wave velocity profile with depth below that point. The 1D velocity profiles beneath many geographical locations can then be placed side-by-side and interpolated to create a 3D model of the subsurface.

Both of the 2-step surface wave inverse problems are non-linear. They can be solved approximately by partially linearized (Bodin and Sambridge, 2009), or fully non-linear (Rawlinson et al., 2014; Galetti et al., 2015, 2016) Monte Carlo methods.
These types of approaches provide relatively robust estimates of the range of possible shear wave velocity structures with depth that are consistent with the measured surface wave speeds (often referred to as the solution uncertainty) by using the Markov chain Monte Carlo (McMC) algorithm to perform the inversions in a Bayesian framework. However, all existing sampling based methods, including the direct (1-step) 3D Monte Carlo tomography method of Zhang et al. (2018), are extremely demanding computationally. If large data sets are to be inverted rapidly while maintaining our ability to assess post-inversion uncertainties without making linearizing approximations to the physics, different methods are needed to speed up fully non-linear inversions.

We take an alternative approach and use neural networks to perform non-linear inversion of the phase velocities of Rayleigh-type Scholte surface waves (we refer to these simply as Rayleigh waves) for subsurface shear-wave velocity over length scales of ~1-10 km. Neural networks (NNs) approximate a non-linear mapping between two parameter spaces. The mapping is inferred from a set of examples of inputs and corresponding outputs of the real mapping (these examples are called training data). Using certain types of NN-based methods, uncertainties in the mapping can be output by the network. NNs are therefore useful for problems where the forward mapping is well known or simple to calculate (in order to construct many training data) but the inverse mapping is complex or costly to determine directly. In such cases training data can be generated by applying the forward mapping to many sets of model parameter values, after which the NN can be trained to map in the inverse direction, taking the measurable data as input and outputting model parameter estimates. Once trained, NNs can be applied to calculate the mapping for any input parameters extremely efficiently.
For this reason neural networks have become increasingly popular for solving geophysical problems in recent years. Applications include well-log analysis (Aristodemou et al., 2005; Maiti et al., 2007), first arrival picking (Murat and Rudman, 1992; McCormack et al., 1993), fault detection (Araya-Polo et al., 2017; Huang et al., 2017) and velocity analysis (Roth and Tarantola, 1994; Calderón-Macías et al., 2000; Araya-Polo et al., 2018). However, all of these methods provide only deterministic estimates of the inverse problem solution (in most cases, the mean model estimate). Neural networks can also be used in a Bayesian sense to give fully probabilistic solutions. They were first used in geophysics to estimate Bayesian uncertainties by Devilee et al. (1999), who inverted surface wave phase and group velocities for large-scale subsurface velocity structure and interface depths. They inverted regional dispersion curves for discretised probability distributions of crustal thickness across Eurasia using histogram and median networks, and analysed the trade-off between crustal thickness and velocity structure. Meier et al. (2007a,b) improved this method by using mixture density networks (MDNs) to give continuous probabilistic estimates of global crustal thickness and crustal velocity structure. An MDN is a type of network that maps an input vector to a probability density function (pdf) rather than to a single set of output values (Bishop, 1995). Since the work of Meier et al. (2007a,b), mixture density networks have been used to perform petrophysical inversion of surface wave data for global water content and temperature in the mantle transition zone (Meier et al., 2009), inversion of industrial seismic data sets for subsurface porosity and clay content (Shahraeeni and Curtis, 2011; Shahraeeni et al., 2012), inference of the Earth's 1D global average structure using body-wave travel-times (De Wit et al., 2013), and for earthquake detection and source parameter estimation (Käufl et al., 2014, 2015).

To produce a 3D shear-wave velocity versus depth model on any scale using the 2-step method, the inverse problem for structures with depth must be solved at many geographical locations (usually many thousands) over the area of interest. McMC inversion methods are computationally expensive and it is generally impractical to apply them in cases where parameters or data sets are large, where computational efficiency and processing time are usually limiting constraints due to the need to forward model many samples (of the order of thousands or millions) at each location. On the other hand, once trained, NNs and MDNs can often solve such inverse problems in seconds with no additional sampling. In addition, in cases where we wish to monitor changes in the subsurface, the same network can be applied rapidly to repeated data measurements, enabling the possibility of near-real time monitoring provided that the inputs to the networks can be produced rapidly from the raw measured data. Our aim herein is to investigate whether this is possible in practice.

In what follows we first introduce neural networks and mixture density networks and how they can be used to invert Rayleigh wave phase velocities for models of 1D shear-wave velocity with depth. We discuss the effect of data uncertainty and how to incorporate this within a neural network, then apply trained networks to field data from the Grane field in the Norwegian North Sea to create 2D shear-wave velocity maps of specific depth intervals. We compare the results from the MDN to non-linearized McMC methods, and thus prove that MDN surface wave inversion methods are both efficient and robust at the scale of reservoirs.

Grane is an offshore oil field in the Norwegian North Sea.
A permanent reservoir monitoring (PRM) system was installed in 2014 over approximately 50 km² of the Grane seabed (Thompson et al., 2015). Ambient seismic noise is recorded continuously at the field using four-component sensors: 3-component geophones (vertical, North and East) and a hydrophone. The data used in this study were preprocessed according to the protocol of Zhang et al. (2019), summarised as follows. Data from the vertical and hydrophone components were selected over a 6.5 hour interval. The data were bandpass-filtered between 0.35-1.5 Hz, data from every pair of stations were cross-correlated using overlapping half-hour recording sections, and the correlations were then stacked over the full 6.5 hour interval. Cross-correlations of hydrophone and vertical component noise mainly contain information about Rayleigh-type waves. Phase velocities were automatically picked for the cross-correlation of each station-pair. Seventeen phase velocity maps and their corresponding standard deviation (uncertainty) maps were produced using eikonal tomography for periods between 0.6 and 2.2 seconds at 0.1 second intervals over a 50 m x 50 m grid. Figure 1a shows 4 examples of the phase velocity maps at periods 0.7 s, 1.0 s, 1.3 s and 1.9 s and their corresponding uncertainties.

Zhang et al. (2019) perform 1D, 2D and 3D Markov chain Monte Carlo (McMC) tomography over the Grane field to produce maps of the shear velocity structure with depth. However, McMC solutions are relatively slow to compute as they require very large numbers of 3D or 1D forward modelling simulations to obtain robust results, and this needs to be repeated for each new data set for 4D surveys. Given that a PRM system exists, and that large amounts of data can be recorded in this area at many different points in time, a faster method is desired for monitoring applications.
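The preprocessing chain summarised above (bandpass filtering, windowed cross-correlation, stacking) can be sketched in a few lines of numpy. This is an illustrative reconstruction rather than the exact Zhang et al. (2019) workflow: for simplicity it uses non-overlapping rather than overlapping half-hour windows, a crude FFT band-pass, and a circular frequency-domain correlation; the sampling rate is an assumed example value.

```python
import numpy as np

def bandpass(trace, fs, fmin=0.35, fmax=1.5):
    """Crude FFT band-pass: zero all spectral components outside [fmin, fmax] Hz."""
    spec = np.fft.rfft(trace)
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
    spec[(freqs < fmin) | (freqs > fmax)] = 0.0
    return np.fft.irfft(spec, n=len(trace))

def stacked_cross_correlation(u, v, fs, window_s=1800.0):
    """Cross-correlate two noise records in half-hour windows and stack.

    u, v : equal-length 1D noise records from one station pair.
    Returns the stacked circular cross-correlation over all windows.
    """
    n = int(window_s * fs)
    nwin = len(u) // n
    stack = np.zeros(n)
    for k in range(nwin):
        a = bandpass(u[k * n:(k + 1) * n], fs)
        b = bandpass(v[k * n:(k + 1) * n], fs)
        # circular cross-correlation via the frequency domain
        stack += np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))))
    return stack / max(nwin, 1)
```

In a real workflow the stacked correlations would then be passed to the automatic phase-velocity picker and eikonal tomography described above.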
We wish to solve the surface wave inversion problem in a probabilistic framework to find the Bayesian posterior distribution of subsurface velocity structure parameters m that fit the given data d, written as p(m|d). This is defined as (e.g. Tarantola, 2005):

p(m|d) = k p(d|m) p(m)    (1)

where p(m) represents the prior probability density on the velocity parameter space which describes information about m known prior to using data d, p(d|m) is known as the likelihood and represents the conditional probability of measuring data d given the velocity parameters m, and k is a normalisation constant. In multidimensional problems where the dimensionality of m is greater than 1, we often wish to infer the posterior inversion information about a single parameter with index i and hence must calculate the marginal posterior distribution p(m_i|d). This is obtained by integrating over all parameters m_j that are not of interest:

p(m_i|d) = ∫_{∀ m_j ≠ m_i} p(m|d) dm_j    (2)

In this study we focus on estimating marginal distributions p(m_i|d).

A neural network can be trained to represent an arbitrary non-linear mapping between the spaces of input data d and output parameters m by presenting the network with a set of N training pairs T = {(d_i, m_i) : i = 1, ..., N} and minimizing a cost function that measures the difference between the NN output and the defined output, often called the 'error'. For example, if the set of training velocity structures m_i are distributed according to the prior probability density function (pdf), then a network trained to output m_i given input d_i by minimizing the sum-of-squared errors across set T will output an approximation to the mean of the Bayesian posterior distribution p(m|d) when presented with data d (Bishop, 1995).
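The marginalisation of Equation 2 is simple to carry out numerically when the posterior is available on a grid. A minimal sketch for a hypothetical two-parameter posterior follows; the grid, means and widths are illustrative assumptions, not values from this study.

```python
import numpy as np

# Hypothetical two-parameter posterior p(m1, m2 | d) evaluated on a grid;
# the velocity ranges, means and widths below are illustrative only.
m1 = np.linspace(0.0, 2.0, 201)   # e.g. shear velocity in layer 1 (km/s)
m2 = np.linspace(0.0, 2.0, 201)   # e.g. shear velocity in layer 2 (km/s)
M1, M2 = np.meshgrid(m1, m2, indexing="ij")
joint = np.exp(-0.5 * ((M1 - 1.2) ** 2 + (M2 - 0.8) ** 2) / 0.05)
joint /= np.trapz(np.trapz(joint, m2, axis=1), m1)   # normalise to integrate to 1

# Equation 2: the marginal for m1 integrates the joint over the nuisance parameter m2.
marginal_m1 = np.trapz(joint, m2, axis=1)
```

The same idea extends to any number of nuisance parameters, integrating one axis at a time.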
By contrast, in this paper we use a class of neural networks called mixture density networks. These provide a framework for modelling complete probability distributions. They are trained on the same set T of data-velocity structure pairs, but instead of providing the mean estimate of the velocity structure, they provide an estimate of the Bayesian posterior probability distribution p(m|d).

Figure 1: (a) A selection of four phase velocity maps used to compute discretised dispersion curves. Periods shown are 0.7s, 1.0s, 1.3s and 1.9s. (b) Maps of estimated standard deviation of uncertainties in the phase velocity at each location. Velocity and uncertainty colour scales are saturated at either end to prevent domination by outliers, and to highlight structure across the field. The vertical black line in the top left plot shows the location of a cross section shown in other Figures.

The estimate is parametrised by a mixture (sum) of Gaussian kernels

p(m|d) = Σ_{k=1}^{M} α_k Θ_k(m|d)    (3)

where α_k are amplitude parameters that attach relative importance to each Gaussian kernel, M is the number of Gaussians in the mixture, and Θ_k are Gaussian density functions given by

Θ_k(m|d) = 1 / ((2π)^{c/2} σ_k(d)^c) exp{ -‖m - μ_k(d)‖² / (2 σ_k(d)²) }    (4)

where c is the dimensionality of m.

The set of mixture parameters α_k, means μ_k and standard deviations σ_k fully define the set of Gaussian kernels and hence the output of the MDN. Training an MDN thus requires that we create a way to predict appropriate values for these parameters given any input data. For this task we use a standard feed-forward neural network which contains a set number of layers and nodes.
At each layer the inputs of each node are weighted and summed before being passed through a function that induces non-linearity in the mapping. This provides an output value that can become the input for all units in the following layer. These weights are adjusted during training to provide the optimum mapping. The number of mixtures M dictates the complexity of the final probability distribution, and the number of network outputs is given by (c + 2) × M, compared with the output of a standard NN that has c outputs. The number of kernels that should be used depends on the complexity of the problem; however, as long as more kernels than necessary are used, an accurate number is not required as the network can reduce the amplitude of any mixture parameter to near zero for redundant kernels (Bishop, 1995). For a full description of MDN and neural network structures see Bishop (1995), or in a geophysical context see Meier et al. (2007b) or Shahraeeni and Curtis (2011).

During training the internal weights of the network are adjusted to maximize the likelihood of the desired probability density function given the training data. The cost function minimized is the negative log likelihood function (Bishop, 1995)

E_MDN = -Σ_{n=1}^{N} ln{ Σ_{k=1}^{M} α_k(d_n) Θ_k(m_n|d_n) }    (5)

We train multiple networks with different network structures, and each network's parameters are randomly initialised before training begins. The networks are then combined using a weighted average of network outputs in order to improve generalisation and prediction accuracy (Dietterich, 2000). The weights for each network are determined by the cost function E_MDN evaluated over the so-called test set: a portion of the data set removed before training and used to test the network once training has completed.
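For concreteness, the Gaussian mixture of Equations 3 and 4 and the negative log likelihood cost of Equation 5 can be written in a few lines of numpy. This sketch assumes isotropic kernels, as in Equation 4, and is purely illustrative of the quantities a trained MDN outputs.

```python
import numpy as np

def mixture_pdf(m, alphas, mus, sigmas):
    """Posterior estimate of Equations 3-4 at model vector m.

    alphas : (M,) mixture weights, mus : (M, c) kernel means,
    sigmas : (M,) isotropic kernel standard deviations.
    """
    c = mus.shape[1]
    norm = (2.0 * np.pi) ** (c / 2.0) * sigmas ** c
    sq = np.sum((m - mus) ** 2, axis=1)                  # ||m - mu_k||^2
    return np.sum(alphas * np.exp(-sq / (2.0 * sigmas ** 2)) / norm)

def mdn_cost(alphas, mus, sigmas, m_true):
    """Negative log likelihood of Equation 5 over N training pairs.

    alphas : (N, M), mus : (N, M, c), sigmas : (N, M), m_true : (N, c).
    """
    c = mus.shape[2]
    sq = np.sum((m_true[:, None, :] - mus) ** 2, axis=2)            # (N, M)
    kern = np.exp(-sq / (2.0 * sigmas ** 2)) \
        / ((2.0 * np.pi) ** (c / 2.0) * sigmas ** c)
    return -np.sum(np.log(np.sum(alphas * kern, axis=1)))
```

In practice the mixture parameters are functions of the input data d, produced by the feed-forward network described above, and E_MDN is minimised with respect to the network weights.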
An approximation to the posterior probability distribution of a set of velocity parameters m given data d is thus given by

p(m|d) ≈ Σ_{k=1}^{L} Σ_{j=1}^{M} (E_k α_kj / Σ_{l=1}^{L} E_l) Θ_kj(m|d)    (6)

where

E_k = exp(-E_MDN,k)    (7)

and where L is the number of networks in the ensemble, E_MDN,k is the cost function value of the kth network, α_kj is the jth weighting parameter of the kth network, and Θ_kj is the jth Gaussian kernel of the kth network. Once the networks have been trained, the outputs can be used to estimate the posterior probability distributions using Equations 4 and 6. This gives a more complete description of the family of velocity structures that are consistent with the data than does the output of a standard NN.

The velocity structures m are parametrised as follows: each 1D structure has a water layer of 126 m, followed by constant velocity layers every 25 m to a depth of 100 m below the water layer, then 50 m thick layers down to 2000 m below the water layer, beneath which there is a homogeneous half-space. For each velocity structure the S-wave velocity of the top solid layer was selected randomly from the Uniform probability distribution v_top ~ U(0. km/s, . km/s) to represent unconsolidated near-surface sediments. For a fundamental mode surface wave to be observed, the top solid layer must have the lowest velocity (Galetti et al., 2016), therefore the following layers were randomly selected from the distribution U(v_top, . km/s). We generated 1,000,000 velocity structures and Figure 2a shows the resulting distribution of velocities along with an example velocity structure.

Figure 2: (a) Initial distribution of velocity structures created with a piecewise-constant discretisation over depth.
(b) Distribution of velocity structures created after averaging structures in (a) over larger depth intervals. Grey-scale shows the probability density distribution, darker colours represent higher density of velocity structures, and the black line is an example of a randomly selected velocity structure in each panel which also illustrates the depth intervals used in cases (a) and (b).

The forward problem is solved for each of the generated velocity structures using the DISPER80 subroutines by Saito (1988) to obtain corresponding fundamental mode Rayleigh wave dispersion curves. The phase velocities were calculated for periods 0.6-2.2 s at 0.1 s intervals in order to match the range available from ambient noise recorded at Grane. The DISPER80 forward modeller needs P-wave velocity, S-wave velocity and density for each layer in depth in order to calculate the phase velocities at each of any set of discrete periods. From our S-wave velocity structures calculated previously we computed a corresponding P-wave velocity v_p and density ρ for each velocity layer based on typical values for sedimentary rocks using (Castagna et al., 1985; Brocher, 2005)

v_p = 1.16 v_s + 1.36    (8)

ρ = 1.74 v_p^{0.25}    (9)

Rather than attempt to invert surface wave speeds at 17 periods for shear velocities in 40 depth layers, before training the velocity model is averaged over seventeen larger fixed-depth intervals (Figure 2b). We then train networks to invert for the velocity in each of these larger depth intervals.

In past work, uncertainty information about the data is only included by adding random Gaussian noise to the training data set (Devilee et al., 1999; Meier et al., 2007a; Shahraeeni and Curtis, 2011; De Wit et al., 2013). Adding noise acts to regularise the network, helps to generalise when the network is inverting new data, and accounts for the data uncertainty in the Bayesian solution.
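The empirical rock-physics conversions of Equations 8 and 9 amount to two one-line functions. The coefficients below are the standard values of Castagna's mudrock line and of Gardner's relation as quoted by Brocher (2005), assumed here rather than taken from this paper; units are km/s and g/cm³.

```python
def vs_to_vp(vs):
    """Equation 8: P-wave velocity (km/s) from S-wave velocity (km/s);
    Castagna's mudrock line (standard coefficients assumed)."""
    return 1.16 * vs + 1.36

def vp_to_density(vp):
    """Equation 9: density (g/cm^3) from P-wave velocity (km/s);
    Gardner's relation as quoted by Brocher (2005)."""
    return 1.74 * vp ** 0.25
```

Applying these to each layer of a randomly drawn S-wave velocity structure yields the full (v_p, v_s, ρ) model required by DISPER80.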
However, the disadvantage of such an approach is that when presenting the network with new data, updated uncertainty information for those particular data is not included in the inversion; indeed, that network would invert the data assuming that the incorrect (old) data uncertainties still pertain.

By contrast, here the data uncertainty is included as an additional set of inputs to the network. This makes sense because uncertainty is in fact additional pertinent information for each inversion. To train the MDN the clean synthetically-modelled data set is augmented with varying levels of noisy data. For each data point in the original synthetic data set a random percentage of noise ε is selected between the bounds outlined in Table 1 for six different Uniform distributions of ε. The noise is then added to the data according to

u_j = ε × d_j    (10)

d̃_j = N(0,1) u_j + d_j    (11)

where u_j is the uncertainty value of the noisy data d̃_j and N(0,1) is a Standard Normal distribution with mean 0 and standard deviation 1. An example of noisy data and the randomly chosen noise level is shown in Figure 3. We thus generate an augmented training set of data-velocity structure pairs T_uncer = {([d̃_j, u_j], m_j) : j = 1, ..., N}, where our data consist of the noisy dispersion curves d̃_j and their associated uncertainties u_j.

Figure 3: Graph showing a synthetic dispersion curve d (triangles) compared to a dispersion curve with added noise d̃ (stars). The grey shaded area is the uncertainty u from Equation 10.

Table 1: Percentage range of noise added to the data set in each of six different noise scenarios. Uncertainty is added to each datum according to Equations 10 and 11. In the first scenario uncertainties are zero.

Noise Scenario   Min (%)   Max (%)
1                0         0
2                0         5
3                4         14
4                10        15
5                3         10
6                0         15
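The augmentation of Equations 10 and 11 is straightforward to implement. The sketch below draws a noise level from scenario 2 of Table 1 (0-5%); the dispersion curve values themselves are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(d, eps):
    """Augment a clean dispersion curve d with noise level eps (Equations 10-11):

        u_j = eps * d_j,    d~_j = N(0,1) * u_j + d_j

    Returns the noisy curve and its uncertainty vector, which together
    form one training input [d~, u] for the Uncertainty-MDN.
    """
    u = eps * d
    d_noisy = rng.standard_normal(d.shape) * u + d
    return d_noisy, u

# Example: 17 phase velocities (periods 0.6-2.2 s) with a noise level drawn
# from scenario 2 of Table 1; the clean curve here is hypothetical.
d_clean = np.linspace(0.4, 1.2, 17)   # illustrative dispersion curve (km/s)
eps = rng.uniform(0.0, 0.05)
d_noisy, u = add_noise(d_clean, eps)
```

Repeating this draw for every synthetic structure, across all six scenarios, builds the augmented set T_uncer described above.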
The final data sets T and T_uncer are then shuffled and split into a training set (90% of training pairs) that is used to train the network for the optimum mapping, a validation set (5%) used during training to check that the network is not over-fitting the training examples (see below), and a test set (5%) which is used post-training to assess the network performance on previously unseen data. This final assessment provides the weights E_i for the network ensemble in Equation 6. Early stopping is employed to prevent over-fitting of the network to the data: the cost function is periodically checked on the validation set during training, and when it stops decreasing it is assumed that the network is already fit to the training data but is no longer improving its generalisation to new data. Training is then stopped.

Networks are trained for two different datasets. First, a training set T in which data were perturbed by 10% Gaussian noise is used to train what we refer to herein as a Noisy-MDN; this MDN does not include uncertainty in its input vector. A second training set T_uncer includes a variable uncertainty vector as described in the previous section, which is used to train what we refer to as an Uncertainty-MDN. To include both the dispersion curve and their uncertainties in
Figure 4: Diagram of the network used to include uncertainty estimates in the input vector. Rounded boxes represent inputs/outputs of the network. Square boxes represent one or more fully connected layers within the model whose internal weights are optimised during training; their structure is described in Table 4. The diamond box represents the concatenation of layers: this step involves no new weights and simply concatenates the outputs of the previous layers. The arrows represent the direction of flow of data through the network.
Figure 5: Mean of the posterior marginal pdfs from Noisy-MDN inversions, versus the true value of velocity for each velocity structure in the set of smooth models. Each graph represents a different depth interval as indicated above the graph. The corresponding Pearson correlation coefficient R is given in the top left corner of each graph.

the latter networks, two inputs are included as shown in Figure 4. The dispersion curve is passed through 2 layers, whilst the uncertainty is input separately and passed through one layer. The outputs of these two layers are then concatenated before being passed through two more layers to output the parameter vector that defines the probability distribution of the shear wave velocity structure.

Separate MDNs are trained for each depth interval in the velocity structure defined in Figure 2b. For each interval approximately 40 networks are trained, from which we select for the ensemble the 10 networks with the lowest cost value across the validation set. The weights and biases are randomly initialized using the Glorot uniform initializer (Glorot and Bengio, 2010) for each training run, and we use different sizes of layers in the different networks to create diversity. The different layer sizes were determined using a form of Bayesian optimization using the Python library hyperopt (Bergstra et al., 2015). In Appendix A we describe the network configurations trained. The networks each use a Gaussian mixture with 15 kernels, so by using an ensemble of 10 networks a total of 150 kernels potentially contribute to each posterior distribution. However, we found that normally only 3 or 4 kernels with different means and standard deviations were assigned significantly non-zero amplitudes by each individual network.
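The two-branch architecture of Figure 4 can be sketched as a plain-numpy forward pass. The layer width, tanh non-linearities and random (untrained) weights below are illustrative assumptions; in practice the networks were built with trainable Glorot-initialised layers whose sizes were chosen by hyperopt, as described above.

```python
import numpy as np

def dense(x, w, b):
    """One fully connected layer with a tanh non-linearity."""
    return np.tanh(x @ w + b)

def random_params(rng, n_in=17, h=32, M=15, c=1):
    """Random (untrained) weights; the width h is an assumed example value."""
    g = lambda a, b: (0.1 * rng.standard_normal((a, b)), np.zeros(b))
    return {"d1": g(n_in, h), "d2": g(h, h), "u1": g(n_in, h),
            "m1": g(2 * h, h), "m2": g(h, h), "out": g(h, (c + 2) * M)}

def mdn_forward(d, u, params, M=15, c=1):
    """Forward pass of the two-input network of Figure 4.

    d, u : (17,) dispersion curve and its uncertainty vector.
    Returns (alphas, mus, sigmas) defining the Gaussian mixture output.
    """
    h_d = dense(dense(d, *params["d1"]), *params["d2"])  # curve branch: 2 layers
    h_u = dense(u, *params["u1"])                        # uncertainty branch: 1 layer
    h = np.concatenate([h_d, h_u])                       # concatenation (no weights)
    h = dense(dense(h, *params["m1"]), *params["m2"])    # 2 further shared layers
    out = h @ params["out"][0] + params["out"][1]        # (c + 2) * M raw outputs
    alphas = np.exp(out[:M]); alphas /= alphas.sum()     # softmax -> mixture weights
    mus = out[M:M + c * M].reshape(M, c)                 # kernel means
    sigmas = np.exp(out[M + c * M:])                     # positive std deviations
    return alphas, mus, sigmas
```

The output layer illustrates the (c + 2) × M parameter count noted earlier: M weights, M × c means and M standard deviations.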
Figure 6: Mean of the posterior marginal pdfs from Uncertainty-MDN inversions, versus the true value of velocity for each velocity structure in the set of smooth models. Each graph represents a different depth interval as indicated above the graph. The corresponding Pearson correlation coefficient R is given in the top left corner of each graph.
A set of 100,000 synthetic velocity structures to which no network has previously been exposed were then created. These simulate relatively smooth velocity structures by not allowing the velocity to vary by more than 400 m/s between neighbouring depth intervals. Corresponding data are created using the DISPER80 forward modeller, to which 10% Gaussian noise was added. For each depth interval in the velocity structure we apply the MDN ensemble to each of the 100,000 synthetic data and calculate the mean μ̄_post of each posterior marginal pdf

μ̄_post = Σ_{i=1}^{M} α_i μ_i    (12)

The correlation between the mean of the posterior and the true target value for each data vector can be used to evaluate the performance of the networks when presented with new data. This evaluation does not use all of the information contained in each posterior pdf, but does provide a practical way to begin to evaluate network performance. Figure 5 shows the means of the posterior pdfs of the fundamental mode Rayleigh wave Noisy-MDN inversions versus the true velocity values across all of the synthetic smooth velocity models, for each depth interval. The corresponding Pearson correlation coefficient, R, is shown in the top-left corner of each plot. The plots show a clear tendency for the mean of the network to over-estimate the true velocity value. When the same inversions are performed using the Uncertainty-MDNs (Figure 6) the correlation between the mean MDN velocities and the true velocities improves at every depth level.

Figure 7: Individual probability density functions for depths below 1226 m for two synthetic velocity structures in (a) and (b) respectively. The solid line is the marginal posterior probability density function from the MDN, the vertical dashed line is the true velocity value, and the dot-dash line is the prior probability density function.
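The summary statistics used in Figures 5 and 6, the posterior mean of Equation 12 and the Pearson correlation coefficient R, amount to a few lines of numpy:

```python
import numpy as np

def posterior_mean(alphas, mus):
    """Mean of the marginal posterior, Equation 12: sum_i alpha_i mu_i.

    alphas : (M,) mixture weights, mus : (M, c) kernel means.
    """
    return np.sum(alphas[:, None] * mus, axis=0)

def pearson_r(x, y):
    """Pearson correlation coefficient between predicted and true velocities,
    as quoted in Figures 5 and 6."""
    x = x - x.mean()
    y = y - y.mean()
    return np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
```

Applied to the 100,000 smooth test models, pearson_r compares the vector of posterior means against the vector of true interval velocities at each depth level.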
The additional information provided to the network that describes uncertainties in the data results in a significantly more accurate estimate of the velocity structure.

The plots in Figure 6 allow us to compare how the networks perform at different depth levels. The performance of the networks decreases with depth, and at the deeper levels (1626-1826 m) the mean of the Uncertainty-MDN tends towards the mean of the prior. Figure 7 shows an example of the marginal posterior probability density function for two synthetic velocity structures at depths below 1226 m. In both plots the true velocity structure is far away from the mean of the prior distribution yet the predicted marginal posterior distribution is very close to the prior: this shows that at these depths the networks are unable to add any information to the prior pdf given the data presented to the network. For this reason the following results are only shown down to a depth of 1226 m.

Figure 8 shows the inversion of synthetic data from Noisy-MDN inversions: as seen in Figure 5, the networks generally over-estimate the predicted velocity and often the uncertainty does not encompass the true solution. The addition of noise of fixed standard deviation to training data examples does not fully represent the true uncertainty of each individual data point, so the inversion of noisy data results in unreliable estimates of velocity. However, when adding the uncertainty estimates explicitly into the network, Uncertainty-MDN results produce more reliable estimates, as shown by the 1D depth inversions in Figure 9. The results from the networks are more representative of the true velocity structure, the maximum likelihood values from the pdfs are much closer to the true shear wave velocity values, and the uncertainty ranges encompass the true velocity structure in all cases. This result is important for training and applying networks to field data.
Noise, measurement error or assumptions about the Earth's velocity structure can all affect the reliability of field data measurements; we never know whether our data are reliable, but we can often estimate the uncertainty in recorded data. Including the uncertainty information in networks allows them to be applied to field data with increased confidence that they will produce reliable estimates of posterior pdfs that encompass the true subsurface velocities.

Figure 8: 1D depth inversion results from Noisy-MDNs for two synthetic velocity structures, with individual probability density functions shown for four depth levels. In the depth inversions dark colours represent areas of higher probability, each row of the posterior integrates to unity, and the black solid line is the true synthetic velocity structure. In the individual probability density functions the solid line is the marginal posterior probability density function from the MDN, the vertical dashed line is the true velocity structure, and the dot-dash line is the prior probability density function.
The final trained MDNs are applied to invert Rayleigh wave phase velocities from the Grane field in the Norwegian North Sea. Dispersion curves were extracted at each grid point, producing 26,772 dispersion curves to be inverted for 1D depth-velocity structures. The standard deviations shown in Figure 1b were extracted at each point and used as the uncertainty vector input to the Uncertainty-MDNs (Figure 4). Figures 10 and 11 show the mean and associated standard deviations (representing uncertainty) of the posterior pdf estimated at the location of the black line in Figure 1a from Noisy-MDNs and Uncertainty-MDNs respectively. Both plots of the mean show a reasonably similar structure: a near-surface low velocity layer down to 300m, then an increased velocity down to 600m, with yet higher velocities below this. However, the layers are more distinct in the inversion using the Uncertainty-MDNs. Figure 10a from the Noisy-MDN shows a higher variability in the velocity below 600m than does the mean in Figure 11a, and the velocity highs in Figure 10a coincide with higher uncertainties in Figure 10b. When networks are trained including the full uncertainty information (Figure 11) these velocity highs disappear, so that the mean velocity and uncertainties are laterally smoother across the section. We therefore now focus on the Uncertainty-MDN results.

Figure 12 shows the mean and standard deviation horizontal depth slices from the Uncertainty-MDNs. In the near-surface maps (126m - 326m) the results show similar structures to those in the phase velocity maps in Figure 1a at short periods, for example within the dotted red boxes in Figure 12a. The deeper maps (536m - 826m) show structures similar to those of the longer-period phase velocity maps, but also a higher standard deviation (Figure 12b) than shallower layers.
As a result, the shear velocity variation in these deeper structures falls within their standard deviation, suggesting that they might not represent true structure.

The method outlined above can easily be extended to joint inversion of fundamental and first higher mode data by adding two further inputs: the vector of first higher mode phase velocity values generated from the velocity structures in the original training set, and a vector of their associated uncertainties. Figure 13 shows the cross-section results and Figure 14 shows the results from four depth layers (126m-176m, 226m-326m, 426m-526m, 626m-726m) from the Uncertainty-MDN joint inversion. The same features seen in the shallow layer of Figure 12a are seen in the shallow layer of the joint inversion, highlighted by the red dashed boxes in Figure 14a. However, the velocities are on average higher than in the fundamental-mode-only MDN inversions and the standard deviations are larger. In addition, the depth slice at 226m-326m is entirely different to the corresponding slice in Figure 12a.

Figure 9: 1D depth inversion result from Uncertainty-MDNs for two synthetic velocity structures with individual probability density functions shown for four depth levels. In the depth inversions dark colours represent areas of higher probability, each row of the posterior integrates to unity, and the black solid line is the true synthetic velocity structure. In the individual probability density functions the solid line is the marginal posterior probability density function from the MDN, the vertical dashed line is the true velocity structure, and the dot-dash line is the prior probability density function.
Figure 13a shows that the low velocities observed in the top layers of the Uncertainty-MDN fundamental-mode inversion (Figure 11a) no longer exist in the joint inversion with higher modes, showing that the higher-mode waves appear to have added information to the inversion. However, we are less confident about the quality of the higher mode dispersion measurements than those from the fundamental mode, so we include this result as a demonstration; in the Discussion below we focus mainly on the fundamental mode results.
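For reference, the mean and standard-deviation maps shown above are summary statistics of the Gaussian-mixture marginals output by the MDNs: for a 1D marginal with weights w_k, means mu_k and standard deviations s_k, the mean is sum_k w_k mu_k and the variance is sum_k w_k (s_k^2 + mu_k^2) minus the squared mean. The sketch below computes these statistics; the mixture values are illustrative only, not taken from our results.

```python
import numpy as np

def mixture_mean_std(w, mu, s):
    """Mean and standard deviation of a 1D Gaussian mixture.

    w  : mixture weights (sum to 1)
    mu : component means
    s  : component standard deviations
    """
    w, mu, s = map(np.asarray, (w, mu, s))
    mean = np.sum(w * mu)
    var = np.sum(w * (s**2 + mu**2)) - mean**2
    return mean, np.sqrt(var)

# Illustrative two-component marginal for the shear velocity in one layer.
mean, std = mixture_mean_std(w=[0.7, 0.3], mu=[1.8, 2.2], s=[0.1, 0.2])
print(round(mean, 3), round(std, 3))  # 1.92 0.229
```

Applying this per layer and per grid point yields the mean and standard-deviation cross-sections and depth slices of Figures 10-14.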
We inverted Rayleigh wave phase dispersion curves for subsurface shear-wave velocities using MDNs trained with Gaussian noise of fixed standard deviation added to simulate average data uncertainties, and a second type of MDN with the data uncertainty vector included as an additional input. We showed that to invert noisy data for reliable velocity structures, the uncertainty estimates should be included in the network.

A constant number of fixed depth-velocity intervals was used in each MDN inversion, leading to inversions for effective-medium (averaged) shear velocities over each fixed depth interval. A trans-dimensional network inversion would have had to include varying depths and numbers of layers, which would significantly increase the dimensionality of the network inversion problem and require a much larger training set and a more complex network structure. This in turn would increase training time and the memory needed for training, and would likely make the network outputs less stable and reliable since the posterior would effectively be emulating the inverse function in a higher-dimensional space. For our intended application (to test our ability to rapidly monitor the overburden of a permanently instrumented field), the inversion for effective-medium parameters over fixed depth intervals was sufficient.

Figure 10: (a) Mean shear velocity cross-section, and (b) corresponding posterior standard deviation cross-sections from the Noisy-MDN inversion. The top white layer represents the water layer where shear velocity is zero.

Table 2: Mean-squared difference between the Noisy- and Uncertainty-MDN inversion cross-sections and the Markov chain Monte Carlo cross-sections of Zhang et al. (2019).

          Noisy-MDN   Uncertainty-MDN
1D MCMC   0.0018      0.0025
2D MCMC   0.0025      0.0026
3D MCMC   0.0021      0.0019
We compare the Noisy- and Uncertainty-MDN inversion results to the Markov chain Monte Carlo results of Zhang et al. (2019). Figure 15 shows the mean shear velocity cross-sections of Figures 10a and 11a along with the same cross-sections from 1D, 2D and 3D trans-dimensional Markov chain Monte Carlo (MCMC) inversion. Despite comparing a trans-dimensional result from Monte Carlo methods with fixed-depth-layer results from MDNs, all cross-sections show a similar, approximately three-layered structure. The 1D MCMC result (Figure 15c) most closely resembles that of the networks trained with added noise (Figure 15a), as both contain vertical velocity anomalies in the deeper part of the section. The Uncertainty-MDN result varies more smoothly laterally, but also has a thicker near-surface velocity layer, and its second layer extends deeper into the section.

Figure 11: (a) Mean shear velocity cross-section, and (b) corresponding posterior standard deviation cross-sections from the Uncertainty-MDN inversion. The top white layer represents the water layer where shear velocity is zero. Red star shows the location of the joint pdf shown in Figure 16.
The results in the previous section are created from the 1D marginal posterior pdf p(m_i | d) of the shear velocity in each layer, independent of the other velocities in each 1D profile. The correlations between velocities at different depths cannot be derived from such results. To estimate correlations it is necessary to analyse the joint posterior pdf p(m_i, m_{i+1} | d), which can be constructed from the product of the marginal and conditional pdfs:

p(m_i, m_{i+1} | d) = p(m_i | d) × p(m_{i+1} | m_i, d)    (13)

The marginal pdfs p(m_i | d) are given by the results shown in the previous sections. New networks are trained to estimate the conditional pdfs p(m_{i+1} | m_i, d) by extending the input vector of the data with the velocity on which we wish to condition: in this example, the velocity of the layer above the one being estimated. Figure 16 shows the results from the location shown by the red star in the Grane cross-section of Figure 11a. The plot shows a weak negative correlation, representing the weak trade-off between velocities in subsequent layers. This is likely because a relatively coarse parametrisation over depth (compare those in Figures 2a and 2b) was used for the inverse problem. If a finer parametrisation were used, such trade-offs would emerge more strongly, as demonstrated by Meier et al. (2007b).

Post-training, neural networks invert new data extremely rapidly: in this study it took approximately 21 CPU seconds to invert all 26,772 locations. The results are compared to Monte Carlo methods, which are known to be computationally expensive (Bodin and Sambridge, 2009): the MCMC methods used to create the cross-sections shown in Figure 15 took approximately 186 CPU hours for 1D, 206 CPU hours for 2D, and 4824 CPU hours for 3D inversions.
Despite the higher vertical resolution of results from MCMC methods (since the parametrisation over depth varies in those inversions), the compute time for inversions is between 4 and 6 orders of magnitude larger than for trained MDNs.

A comparison of the time per inversion of an individual location for 1D MDN, 1D Monte Carlo and 1D linearized inversion is shown in Figure 17a. Monte Carlo inversion is computationally the most expensive, and MDN inversion is two orders of magnitude faster than even linearized inversion. However, in this comparison we only accounted for the speed of the inversion, which is not the full computational expense involved in using neural networks. Training a network before the inversion takes significant computational time: in this study we took 1280 CPU hours to train all of the networks used. It should be noted that training the network needs to be done only once, and hence is independent of the number of locations to be inverted; therefore inverting more locations renders the MDN inversion method more computationally efficient.

Figure 12: Fixed-depth maps of (a) the mean and (b) the standard deviation of the shear velocity from Uncertainty-MDN inversion of fundamental mode Rayleigh dispersion at depth slices 176m-226m, 226m-326m, 526m-626m, 726m-826m.

Figure 13: (a) Mean shear velocity cross-section, and (b) corresponding posterior standard deviation cross-sections from the Uncertainty-MDN inversion of fundamental and first higher mode Rayleigh dispersion. The top white layer represents the water layer where shear velocity is zero.
Figure 17b compares the CPU hours needed for monitoring-style repeated inversions across the full Grane field as performed in this study, including the time required for training MDNs. The initial cost of training a network before the first inversion is high, but thereafter repeated inversion of new data sets is nearly free. In comparison to 1D MCMC methods, even accounting for the initial training period, neural network methods are more efficient. Linearized inversion methods are computationally cheaper than MDN methods: in this case approximately 1000 inversions of the same field would be needed for the neural network method to become cheaper. Surface wave tomography is a non-linear inversion problem, and although the linearized inversions involve fewer CPU hours, they can only give approximate solutions, in particular for uncertainty estimates, due to their implicit assumption of incorrect (linear) physics. The neural network method provides a fully probabilistic, fully non-linear solution that, once a network is trained, can be used to obtain rapid, repeated inversions.
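The break-even point against 1D MCMC follows directly from the costs quoted above (1280 CPU hours of one-off training, roughly 21 CPU seconds per full-field MDN inversion, and roughly 186 CPU hours per full-field 1D MCMC inversion); the arithmetic below is an illustrative back-of-the-envelope calculation, not a measured result.

```python
import math

train_hours = 1280.0              # one-off MDN training cost (CPU hours)
mdn_per_survey = 21.0 / 3600.0    # ~21 CPU seconds per full-field MDN inversion
mcmc_per_survey = 186.0           # ~186 CPU hours per full-field 1D MCMC inversion

# Total MDN cost drops below 1D MCMC after n surveys, where
#   train_hours + n * mdn_per_survey <= n * mcmc_per_survey
n_break_even = math.ceil(train_hours / (mcmc_per_survey - mdn_per_survey))
print(n_break_even)  # 7
```

So for a permanently instrumented field monitored repeatedly, the training cost is amortised within a handful of surveys, consistent with the trend shown in Figure 17b.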
We trained mixture density networks (MDNs) to invert fundamental mode Rayleigh wave dispersion curves for subsurface shear-wave velocity using two different methods to represent data uncertainties. The MDNs give a fully probabilistic solution to this non-linear inversion problem, giving results comparable to Monte Carlo solutions. We showed that inputting data uncertainties explicitly to the network provides a more reliable solution estimate on noisy synthetic data, and a smoother result that is more similar to 3D Monte Carlo inversion results on field data. The same method was used for joint inversion of fundamental and first higher mode data. Once trained, the neural network approach gives rapid results that can be applied repeatedly to similar types of data in monitoring scenarios.

Figure 14: Fixed-depth maps of (a) the mean and (b) the standard deviation of the shear velocity from Uncertainty-MDN inversion of fundamental and first higher mode Rayleigh dispersion at depth slices 176m-226m, 226m-326m, 526m-626m, 726m-826m.
Figure 15: Mean shear velocity along the cross-section in Figure 1a from (a) MDN inversions using a training set with added Gaussian noise of fixed standard deviation, (b) MDN inversions using estimated data uncertainties as additional input data, (c) independent 1D Monte Carlo inversions, (d) a single 2D Monte Carlo inversion, and (e) a 3D Monte Carlo inversion, where results in (c), (d) and (e) are from Zhang et al. (2019). The top white layer represents the water layer, where the shear velocity is zero.
Acknowledgements
The authors would like to thank the Grane license partners Equinor ASA, Petoro AS, ExxonMobil E&P Norway AS and ConocoPhillips Skandinavia AS for allowing us to publish this work. The views expressed in this paper are the views of the authors and do not necessarily reflect the views of the license partners. The authors thank the Edinburgh Interferometry Project sponsors (Equinor, Schlumberger Cambridge Research and Total) for supporting this research.
Appendix 1: Network configurations
The terminology used here is standard for neural networks and is defined succinctly in Bishop (1995). The networks using Gaussian noise to simulate uncertainty in the data were trained using 3 fully connected (FC) layers, in which each node receives an input from every node in the previous layer. After each FC layer a rectified linear unit (ReLU) activation is used. The individual layer sizes and the total number of parameters to be trained in each network are outlined in Table 3.

The networks that included uncertainties as inputs were trained using two fully connected layers connected to the dispersion curve data and one fully connected layer connected to the uncertainty data, before concatenating the layers together and applying a further two hidden layers of size 250 and 150 respectively (Figure 4). After each fully connected layer a rectified linear unit (ReLU) activation is used. The individual layer sizes and the total number of parameters to be trained in each network are outlined in Table 4.

Figure 16: Joint pdf comparing the velocity trade-off between two adjacent layers m_i and m_{i+1} at depths given in the axis labels. The red star represents the mean velocity shown in Figure 11a.

Figure 17: Plots showing the CPU-hour time for (a) one inversion per location and (b) the inversion over the entire Grane field using MDNs, linearized inversion and Monte Carlo 1D (MCMC 1D) methods. The typical time taken to train the MDN (including the forward modelling runs) is included in (b) but not in (a).

Table 3: Network configurations of the networks for which Gaussian noise of fixed standard deviation was added to the training set. Each network structure is trained 5 times with different random initialisations of starting parameter values.

FC 1   FC 2   FC 3   Total parameters
200    300    200    133,145
400    200    350    173,545
400    1000   150    565,145
200    1000   350    570,745
400    500    350    398,845
400    1000   200    617,445
300    300    150    147,645
400    1000   350    774,345

Table 4: Network configurations of the networks that included uncertainty vectors in the training set. Each network structure is trained 5 times with different random initialisations of starting parameter values.

Dispersion FC 1   Dispersion FC 2   Uncertainty FC 3   Total parameters
1295              240               500                573,045
1100              900               550                1,427,795
960               860               400                1,210,635
1000              220               1000               605,915
950               1000              140                1,300,315
1100              800               450                1,265,895
930               960               100                1,221,995
1200              200               600                517,295
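As an illustration of how such a network maps a dispersion curve to mixture parameters, the following sketch implements one FC+ReLU stack from Table 3 (first row: hidden layers of 200, 300 and 200 nodes) with a Gaussian-mixture output head in plain NumPy. The input length, number of mixture kernels, and output transforms (softmax for the weights, exponential for the standard deviations) are our assumptions for illustration, not the exact configuration used in this study.

```python
import numpy as np

rng = np.random.default_rng(1)

def fc(n_in, n_out):
    """Randomly initialised fully connected layer (weights, biases)."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

def mdn_forward(x, layers):
    """Map input x through FC+ReLU layers to 1D Gaussian-mixture parameters."""
    h = x
    for W, b in layers[:-1]:
        h = np.maximum(h @ W + b, 0.0)           # ReLU after each hidden layer
    W, b = layers[-1]
    out = h @ W + b                              # linear output layer
    w_logits, mu, log_s = np.split(out, 3)       # one third per parameter type
    w = np.exp(w_logits - w_logits.max())
    w /= w.sum()                                 # softmax: weights sum to 1
    return w, mu, np.exp(log_s)                  # exp: positive std devs

n_in, n_kernels = 20, 10                         # assumed sizes, for illustration
sizes = [n_in, 200, 300, 200, 3 * n_kernels]     # Table 3, first row
layers = [fc(a, b) for a, b in zip(sizes[:-1], sizes[1:])]

w, mu, s = mdn_forward(rng.uniform(1.0, 2.5, n_in), layers)
print(w.shape, abs(w.sum() - 1.0) < 1e-9, bool((s > 0).all()))
```

The softmax and exponential transforms guarantee that the output is a valid pdf (non-negative weights summing to one, strictly positive kernel widths) regardless of the raw network outputs.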
References
Araya-Polo, M., Dahlke, T., Frogner, C., Zhang, C., Poggio, T., and Hohl, D. (2017). Automated fault detection without seismic processing. The Leading Edge, (3), 208–214.

Araya-Polo, M., Jennings, J., Adler, A., and Dahlke, T. (2018). Deep-learning tomography. The Leading Edge, (1), 58–66.

Aristodemou, E., Pain, C., De Oliveira, C., Goddard, T., and Harris, C. (2005). Inversion of nuclear well-logging data using neural networks. Geophysical Prospecting, (1), 103–120.

Bergstra, J., Komer, B., Eliasmith, C., Yamins, D., and Cox, D. D. (2015). Hyperopt: a Python library for model selection and hyperparameter optimization. Computational Science & Discovery, (1), 014008.

Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.

Bodin, T. and Sambridge, M. (2009). Seismic tomography with the reversible jump algorithm. Geophysical Journal International, (3), 1411–1436.

Brocher, T. M. (2005). Empirical relations between elastic wavespeeds and density in the Earth's crust. Bulletin of the Seismological Society of America, (6), 2081–2092.

Bussat, S. and Kugler, S. (2011). Offshore ambient-noise surface-wave tomography above 0.1 Hz and its applications. The Leading Edge, (5), 514–524.

Calderón-Macías, C., Sen, M. K., and Stoffa, P. L. (2000). Artificial neural networks for parameter estimation in geophysics. Geophysical Prospecting, (1), 21–47.

Castagna, J. P., Batzle, M. L., and Eastwood, R. L. (1985). Relationships between compressional-wave and shear-wave velocities in clastic silicate rocks. Geophysics, (4), 571–581.

Curtis, A. and Woodhouse, J. H. (1997). Crust and upper mantle shear velocity structure beneath the Tibetan plateau and surrounding regions from interevent surface wave phase velocity inversion. Journal of Geophysical Research: Solid Earth, (B6), 11789–11813.

Curtis, A., Trampert, J., Snieder, R., and Dost, B. (1998). Eurasian fundamental mode surface wave phase velocities and their relationship with tectonic structures. Journal of Geophysical Research: Solid Earth, (B11), 26919–26947.

de Ridder, S. and Dellinger, J. (2011). Ambient seismic noise eikonal tomography for near-surface imaging at Valhall. The Leading Edge, (5), 506–512.

De Wit, R. W., Valentine, A. P., and Trampert, J. (2013). Bayesian inference of Earth's radial seismic structure from body-wave traveltimes using neural networks. Geophysical Journal International, (1), 408–422.

Devilee, R. J. R., Curtis, A., and Roy-Chowdhury, K. (1999). An efficient, probabilistic neural network approach to solving inverse problems: Inverting surface wave velocities for Eurasian crustal thickness. Journal of Geophysical Research: Solid Earth, (B12), 28841–28857.

Dietterich, T. G. (2000). Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems, pages 1–15. Springer.

Galetti, E., Curtis, A., Meles, G. A., and Baptie, B. (2015). Uncertainty loops in travel-time tomography from nonlinear wave physics. Physical Review Letters, (14), 148501.

Galetti, E., Curtis, A., Baptie, B., Jenkins, D., and Nicolson, H. (2016). Transdimensional Love-wave tomography of the British Isles and shear-velocity structure of the East Irish Sea Basin from ambient-noise interferometry. Geophysical Journal International, (1), 36–58.

Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256.

Huang, L., Dong, X., and Clee, T. E. (2017). A scalable deep learning platform for identifying geologic features from seismic attributes. The Leading Edge, (3), 249–256.

Käufl, P., Valentine, A. P., O'Toole, T. B., and Trampert, J. (2014). A framework for fast probabilistic centroid-moment-tensor determination: inversion of regional static displacement measurements. Geophysical Journal International, (3), 1676–1693.

Käufl, P., Valentine, A., de Wit, R., and Trampert, J. (2015). Robust and fast probabilistic source parameter estimation from near-field displacement waveforms using pattern recognition. Bulletin of the Seismological Society of America, (4), 2299–2312.

Maiti, S., Krishna Tiwari, R., and Kümpel, H.-J. (2007). Neural network modelling and classification of lithofacies using well log data: a case study from KTB borehole site. Geophysical Journal International, (2), 733–746.

McCormack, M. D., Zaucha, D. E., and Dushek, D. W. (1993). First-break refraction event picking and seismic data trace editing using neural networks. Geophysics, (1), 67–78.

Meier, U., Curtis, A., and Trampert, J. (2007a). Fully nonlinear inversion of fundamental mode surface waves for a global crustal model. Geophysical Research Letters, (16).

Meier, U., Curtis, A., and Trampert, J. (2007b). Global crustal thickness from neural network inversion of surface wave data. Geophysical Journal International, (2), 706–722.

Meier, U., Trampert, J., and Curtis, A. (2009). Global variations of temperature and water content in the mantle transition zone from higher mode surface waves. Earth and Planetary Science Letters, (1), 91–101.

Montagner, J.-P. and Jobert, N. (1988). Vectorial tomography II: Application to the Indian Ocean. Geophysical Journal International, (2), 309–344.

Mordret, A., Landès, M., Shapiro, N., Singh, S., and Roux, P. (2014). Ambient noise surface wave tomography to determine the shallow shear velocity structure at Valhall: depth inversion with a neighbourhood algorithm. Geophysical Journal International, (3), 1514–1525.

Murat, M. E. and Rudman, A. J. (1992). Automated first arrival picking: A neural network approach. Geophysical Prospecting, (6), 587–604.

Rawlinson, N., Fichtner, A., Sambridge, M., and Young, M. K. (2014). Seismic tomography and the assessment of uncertainty. In Advances in Geophysics, volume 55, pages 1–76. Elsevier.

Ritzwoller, M. H. and Levshin, A. L. (1998). Eurasian surface wave tomography: Group velocities. Journal of Geophysical Research: Solid Earth, (B3), 4839–4878.

Ritzwoller, M. H., Shapiro, N. M., Barmin, M. P., and Levshin, A. L. (2002). Global surface wave diffraction tomography. Journal of Geophysical Research: Solid Earth, (B12), ESE–4.

Roth, G. and Tarantola, A. (1994). Neural networks and inversion of seismic data. Journal of Geophysical Research, (B4), 6753–6768.

Saito, M. (1988). DISPER80: A subroutine package for the calculation of seismic normal mode solutions. Academic Press.

Shahraeeni, M. S. and Curtis, A. (2011). Fast probabilistic nonlinear petrophysical inversion. Geophysics, (2), E45–E58.

Shahraeeni, M. S., Curtis, A., and Chao, G. (2012). Fast probabilistic petrophysical mapping of reservoirs from 3D seismic data. Geophysics, (3), O1–O19.

Shapiro, N. and Ritzwoller, M. (2002). Monte-Carlo inversion for a global shear-velocity model of the crust and upper mantle. Geophysical Journal International, (1), 88–105.

Simons, F. J., Van Der Hilst, R. D., Montagner, J.-P., and Zielhuis, A. (2002). Multimode Rayleigh wave inversion for heterogeneity and azimuthal anisotropy of the Australian upper mantle. Geophysical Journal International, (3), 738–754.

Tarantola, A. (2005). Inverse Problem Theory. SIAM.

Thompson, M., Andersen, M., Elde, R., Roy, S., and Skogland, S. (2015). The startup of permanent reservoir monitoring for Snorre and Grane. In .

Trampert, J. and Woodhouse, J. H. (1995). Global phase velocity maps of Love and Rayleigh waves between 40 and 150 seconds. Geophysical Journal International, (2), 675–690.

Villasenor, A., Ritzwoller, M., Levshin, A., Barmin, M., Engdahl, E., Spakman, W., and Trampert, J. (2001). Shear velocity structure of central Eurasia from inversion of surface wave velocities. Physics of the Earth and Planetary Interiors, (2-4), 169–184.

Woodhouse, J. H. and Dziewonski, A. M. (1984). Mapping the upper mantle: Three-dimensional modeling of Earth structure by inversion of seismic waveforms. Journal of Geophysical Research: Solid Earth, (B7), 5953–5986.

Zhang, X., Curtis, A., Galetti, E., and de Ridder, S. (2018). 3-D Monte Carlo surface wave tomography. Geophysical Journal International, (3), 1644–1658.

Zhang, X., Hansteen, F., and Curtis, A. (2019). Fully 3D Monte Carlo ambient noise tomography over Grane field. In .

Zhou, Y., Nolet, G., Dahlen, F., and Laske, G. (2006). Global upper-mantle structure from finite-frequency surface-wave tomography. Journal of Geophysical Research: Solid Earth, 111.