[PDF] 1D Anisotropic Surface Wave Tomography with Bayesian Inference

Abstract

Classically, anisotropic surface wave tomography is treated as an optimisation problem where it proceeds through a linearised two-step approach. It involves the construction of 2D group or phase velocity maps for each considered period, followed by the inversion of local dispersion curves inferred from these maps for 1D depth-functions of the elastic parameters. Here, we cast the second step into a fully Bayesian probability framework. Solutions to the inverse problem are thus an ensemble of model parameters (\textit{i.e.} 1D elastic structures) distributed according to a posterior probability density function and their corresponding uncertainty limits. The method is applied to azimuthally-varying synthetic surface wave dispersion curves generated by a 3D-deforming upper mantle. We show that such a procedure captures essential features of the upper mantle structure. The robustness of these features however strongly depend on the wavelength of the wavefield considered and the choice of the model parameterisation. Additional information should therefore be incorporated to regularise the problem such as the imposition of petrological constraints to match the geodynamic predictions.

Full PDF

11D Anisotropic Surface Wave Tomography with BayesianInference

John Keith. Magali Universit´e de Lyon, UCBL, CNRS, LGL-TPE, 69622 Villeurbanne, France,Email: [email protected] 8, 2020

Abstract

1D elastic structures) distributed according to a posterior probability densityfunction and their corresponding uncertainty limits. The method is applied to azimuthally-varyingsynthetic surface wave dispersion curves generated by a 3D-deforming upper mantle. We show that sucha procedure captures essential features of the upper mantle structure. The robustness of these featureshowever strongly depend on the wavelength of the waveﬁeld considered and the choice of the modelparameterisation. Additional information should therefore be incorporated to regularise the problemsuch as the imposition of petrological constraints to match the geodynamic predictions.

Conventional surface wave tomography is usually im-plemented using a two-step approach [e.g Nakan-ishi and Anderson, 1983, Nataf et al., 1984, Tram-pert and Woodhouse, 1995, Romanowicz, 2002, Ritz-woller et al., 2002]. The ﬁrst step involves the inver-sion of the arrival times of each period considered inthe measured source-receiver dispersion data to infer2D group or phase velocity maps at a given period.Using the 2D velocity maps, the second step pro-ceeds by inverting a dispersion curve at a given geo-graphical location to estimate the 1D velocity struc-ture beneath this location. One may then build asmooth 3D velocity model by the juxtaposition of theinferred 1D models followed by interpolating them.The tomography problem is often solved by ap-plying ﬁrst-order corrections of the forward function g around a reference model m . Mathematically. this translates to: d = g ( m ) + ∂g∂m ∆ m . (1)Doing so allows it to be treated as a linearised in-verse problem. For instance, one may estimate the2D phase velocity maps by minimizing an objectivefunction containing a data residual term and morethan one regularization terms such as: S = || Gm − d || + R || m || + R || Dm || (2)where G = ∂g∂m is now a mathematical forward op-erator which now refers to the linearised physics be-tween the model parameters (in this case surfacewave velocities), and the data (dispersion data), D isa second-order smoothing operator, and R and R are the damping and smoothing parameters, respec-tively. The last two parameters will often dictatethe trade-oﬀ between the data misﬁt ( i.e. , how wellthe model parameters predict the data), the proxim-ity of the estimated model from its reference state,and the degree of smoothing in the inverted model.1 a r X i v : . [ phy s i c s . g e o - ph ] D ec oreover, the regularization parameters are chosenad hoc. Hence, the inverted model may be suscep-tible to non-data driven constraints and that theseconstraints shroud essential information provided bythe data. Lastly, optimization techniques such asthis lack the capability to estimate model uncertain-ties.The problems associated with non-uniquenessand quantiﬁcation of uncertainties, coupled by theever-growing computational capacity of modern su-percomputers led to the advancement of probabilis-tic approaches to geophysical inverse problems, asﬁrst exempliﬁed by Mosegaard and Tarantola [1995].Following this pioneering study, a volume of stud-ies that involve probabilistic approaches to geophys-ical inverse problems have been published in seis-mology [e.g. Lomax et al., 2000, Shapiro and Ritz-woller, 2002, Husen et al., 2003, Bodin and Sam-bridge, 2009, Debski, 2010, Bodin et al., 2016], andrapidly growing in the ﬁeld of geodynamics [e.g. Bau-mann et al., 2014, Baumann and Kaus, 2015, Mor-ishige and Kuwatani, 2020, Ortega-Gelabert et al.,2020].Casting the inverse problem in a probabilisticframework allows one to utilise the original non-linear mapping g between the data and the modelwithin its forward procedure. However due to theuse of sampling-based methods, one is then forced tosolve the forward model numerous times dependingon the number of candidate models to be sampled.Fortunately, various techniques are available to ad-dress such complications that are not solely based onheuristics. Here, we restrict ourselves with Bayesianinference, that is, a form of statistical inference thatformulates our solution as an a posteriori probabil-ity initially based on the information we have priorto evaluating the inverse problem. The goal is there-fore not to create an ensemble of solutions that fol-low a certain probability distribution ex nihilo , butto update a prior probability based on valuable in-formation provided by the data.In such schemes, the parameter space has tobe explored for best possible model candidates thatcould match the predictions observed at the sur-face. Grid-search algorithms however are time con-suming and thus uniform sampling may not be per-formed eﬃciently. As such, direct-search algorithmshave been introduced in geophysical inverse prob-lems that sample candidate models within a subsetinstead of the entire parameter space. A speciﬁc class of ergodic algorithms is called Markov chainMonte Carlo (McMC) methods, which has been ini-tially used to solve problems in physics [Metropo-lis and Ulam, 1949, Metropolis et al., 1953], butis now widely used in geophysical inverse problems.In this algorithm, the parameter space is randomlysampled based on our current state of knowledge ofa given model candidate. The sequence of search-ing new model candidates depend on an acceptanceprobability that is often determined by satisfying adetailed balance condition to ensure stationarity ofthe desired solution, the most common being theMetropolis-Hastings algorithm [Hastings, 1970].In this manuscript, we cast the second step ofthe surface wave tomography problem in a Bayesianframework. That is, we assume that we have com-pletely inferred 2D phase velocity maps of the re-gions considered, and invert for 1D velocity struc-tures that explain the dispersion curves built fromthese maps. The problem is applied to syntheticdata for each geographical location considered whoseanisotropic signatures are solely inﬂuenced by con-vective ﬂow in the mantle beneath it. The solution isa marginal posterior distribution of 1D velocity mod-els that best explain the data accompanied by theiruncertainty limits. We show that even with strong apriori constraints, conventional surface tomographyfalls short to capture the complete picture relatedto mantle deformation. Still, some of the notablefeatures are resolved more or less. This work buildsupon the hypothesis that adding geodynamical andpetrological constraints would allows us to reducethe number of acceptable tomographic models thatare consistent with the geodynamical predictions. In this section, we discuss the full implementation ofa 1D anisotropic surface wave tomography in a fullBayesian parameter search approach. We highlightin full detail the (1) model parameterisation, (2) theforward problem, (3) the data, and (4) the inverseapproach. The method is applied to synthetic sur-face wave dispersion curves produced by simple se-tups of an intrinsically anisotropic upper mantle.2 .1 Model parameterisation of a 1DEarth structure

Surface waves are sensitive to 13 depth parameterswhich are just a linear combination of the full elastictensor S ij [Montagner and Nataf, 1986]. In particu-lar, Love and Rayleigh phase velocities are sensitiveto ﬁve depth depth parameters which make up anazimuthally-averaged vertically transverse isotropic(VTI) medium. These parameters are also known asthe Love parameters, and by convention, are desig-nated as A , C , F , L , and N . Note that these func-tions are constrained by the isotropic phase veloci-ties, and are independent of the azimuth of surfacewave anisotropy. As a supplementary, the seismicwave velocities propagating either parallel or per-pendicular to the symmetry axis can be written as: V P H = (cid:115) Aρ (3) V P V = (cid:115) Cρ (4) V SH = (cid:115) Nρ (5) V SV = (cid:115) Lρ , (6)where ρ is the density of the medium. Here, it isimportant to understand that the vertical symmetryaxis need not be the fast axis of anisotropy. Hence,the relative magnitude between N and L , and C and A are interchangeable. Most anisotropic tomog-raphy studies interpret L > N as vertical ﬂow, sinceto ﬁrst-order, the direction of shear is presumed tobe vertical and thus aligned with vertically propa-gating S − waves [e.g. Montagner, 1994]. Likewise, L < N is often interpreted in terms of horizontalﬂow. When the ﬂow has both horizontal and ver-tical components, then the resulting anisotropy willbe ambiguous. This requires resolving the tilt ofanisotropy [Montagner and Nataf, 1988, Montagnerand Jobert, 1988].We constrain radial anisotropy by using a morecompact form related to the Love parameters. For P − waves, the strength of radial anisotropy can beexpressed as φ = C/A , whereas for S − waves, it isgiven by ξ = N/L . Finally, there exists another anisotropic parameter which relates to the elliptic-ity η = F/A − L . The phenomenology of η can beunderstood through the parameter F which controlsthe velocity along the direction between the fast andslow velocities. Assuming a slightly anisotropic medium, azimuthalanisotropy in surface waves can be decomposed intotwo terms that depend on its azimuth θ . Thesetwo terms are usually called the 2 θ and 4 θ com-ponents, and are small perturbations around theisotropic phase velocities. The 2 θ and 4 θ com-ponents are sensitive to eight depth functions 2 θ : G s , G c , B s , B c , H s , H c , and 4 θ : C s , C c [Montagnerand Nataf, 1986].We will only work with azimuthal anisotropy inRayleigh waves. Rayleigh waves are much more sen-sitive to the 2 θ terms than the 4 θ terms [Maupinand Park, 2015]. Thus, to ﬁrst-order, we could elim-inate the parameters C s and C c from the inversions.Surface waves also poorly resolve the parameters H s and H c [Bodin et al., 2016]. Thus, we could reducethe model dimensionality associated with azimuthalanisotropy down to four parameters G s , G c , B s , and B c . By using compact notations and ﬁrst-order approx-imations, we are able to reduce the possible numberof parameters to be inverted for. Four of which: A , L , ξ , φ , and η are sensitive to Love and Rayleighphase velocities, and the remaining four: G s , G c , B s ,and B c are sensitive to the azimuthal variations inRayleigh phase velocities. Working with syntheticdata allows us to access the correct values of the pa-rameters deﬁning our Earth model. As we wish tocompare conventional tomography with geodynamictomography, here, we will be placing ourselves in thebest case scenario. We impose strong a priori con-straints to our solution in the tomographic problem.These constraints can be regarded as regularisationparameters which limit the regions in the parame-ter space to search through. Here, we assume thatwe have the correct values relating to the P − wavestructure A , φ , η , B s , and B c ; and thus, only invertfor S − wave-related structures L , ξ , G c , and G s . Thelist of parameters to/what not to invert for is sum-marized in Table 1.3able 1: Depth functions constrained by surface waves and their azimuthal variations.Parameter Fixed Inverted for A (cid:88) L (cid:88) ξ (cid:88) φ (cid:88) η (cid:88) G c (cid:88) G s (cid:88) B c (cid:88) B s (cid:88) Figure 1: Schematic diagram of the 1D parameterisation. The entire region is parameterised with a piece-wise cubic Hermite polynomials based on a number of control points. Here, the control points are ﬁxed indepth. The model parameters are then varied at these points using an McMC sampling algorithm. Thelayers in between are interpolated, and anything below the 400 km is isotropic PREM.The 1D Earth structure spans from the surfacedown to a depth of 400 km. It is subdivided into62 layers of equal thicknesses where each layer hasa given value of the nine depth functions (see Ta-ble 1). Thus, the total number of unknowns to beinverted for is 62 × L , ξ , G c , and G s at the control pointsand in-between these points, the parameters are in-terpolated. Below the 400 km, we impose isotropicPREM [Dziewonski and Anderson, 1981]. The 1Dstructure is illustrated in Fig. 1The model vector is thus deﬁned as: m = [ L , ξ , G s , G c ] , (7) where bold faces indicate that each parameter is alsoa vector of size 62. For each step in the McMC algorithm, the forwardproblem is evaluated using the proposed model (seeFig. 1) as input. The predicted data from this modelis then compared with the observed synthetics.Isotropic Rayleigh c R ( T ) and Love c L ( T ) disper-sion curves are computed using normal mode sum-mation in a spherical Earth [Smith and Dahlen,1973]. Here, the computations are carried out in afully non-linear approach following the method de-veloped by Saito [1967, 1988]. The software package DISPER80 [Saito, 1988] takes 1D depth proﬁles of V p , V s , ρ , ξ , φ , and η as inputs to compute for c R ( T ) and c L ( T ) and their associated sensitivity kernels usinga Runge-Kutta matrix integration scheme.4ollowing the pioneering work of Montagner andNataf [1986], the azimuthal variations in surfacewave phase velocities c and c can be evaluated us-ing the following expressions: c ( T ) = (cid:90) ∞ z =0 (cid:18) B c ( z ) ∂c r ( T ) ∂A + G c ( z ) ∂c r ( T ) ∂L (cid:19) dz (8) c ( T ) = (cid:90) ∞ z =0 (cid:18) B s ( z ) ∂c r ( T ) ∂A + G s ( z ) ∂c r ( T ) ∂L (cid:19) dz. (9)Eqs. (8) and (9) imply that the azimuthal varia-tions in Rayleigh waves are linearized around thereference VTI model after averaging azimuthally.Such approximations are valid assuming the mediumis quasi-anisotropic [Montagner and Nataf, 1986,Maupin and Park, 2015]. For each geographical location, we can express thelocal dispersion curve as the sum of the isotropic dis-persion curves and their azimuthal variations giving: c ( T, θ ) = c ( T )+ c ( T ) cos(2 θ )+ c ( T ) sin(2 θ ) , (10)where T is the period, and θ is the azimuth of thepropagating surface wave.For Rayleigh waves, we invert c , c , and c whereas for Love waves, we only invert c . For sim-plicity, we neglect the higher-order terms associatedwith the elastic parameter N . such assumptions arevalid due to sparse sampling, low sensitivity, or highnoise levels. Anisotropic surface wave tomography is capable ofconstraining azimuthal and radial anisotropy. Thelevel of radial anisotropy can be quantiﬁed throughthe parameter ξ . Conversely, there are a vari-ety of ways to quantify the strength of azimuthalanisotropy, and its fast azimuth Ψ. Here, we quan-tify it in terms of the peak-to-peak anisotropy:azi = 2 GL , (11)where G = (cid:112) G c + G s . The azimuth of fast propa-gation is given by:Ψ = 0 . (cid:18) G s G c (cid:19) . (12) The inverse problem is cast in a full Bayesian proce-dure where the solution is an ensemble of models dis-tributed according to the posterior pdf p ( m — d obs ),accompanied by their uncertainty bounds. In thisframework, Bayes’ theorem holds: p ( m | d obs ) ∝ p ( m ) p ( d obs | m ) . (13)The parameter space is searched using a Markovchain Monte Carlo (McMC) algorithm. To producereasonable acceptance rates, we employed the adap-tive perturbation scheme. The likelihood function p ( d obs — m ) quantiﬁes howwell the model parameters ﬁt the observed data. Inthe context of our problem, it is loosely based on the L -norm cost function in that it measures the level ofmisﬁt between the predictions and the observations.Here, it is essential to distinguish the data residualsrelated from errors in measurement (cid:15) d , and from themodeling error due to the use of an incorrect forwardmodel (cid:15) g . Assuming that the errors are independentand describe a random process, the forward problemcan be written as: d obs = g ( m ) + (cid:15) d + (cid:15) g . (14)Thus, d obs can also be seen as a random process,and the likelihood distribution can be formulated interms of the pdf of the data errors: p ( d obs | m ) = p ( (cid:15) d + (cid:15) g ) . (15)If we then assume that the errors are uncorrelatedand follow a univariate Gaussian distribution withzero mean, and variance σ where σ = σ d + σ g , wecan write the likelihood function as an exponentialgiving: p ( d obs | m ) = 1(2 πσ ) N/ exp (cid:20) −|| d obs − g ( m ) || σ (cid:21) . (16)where N is the size of the data vector. Since the goalof any optimization problem is to minimise the L cost function, minimising it tantamounts to max-imising the probability of the Gaussian likelihoodfunction given by eq. (16).5he above formulation can be utilised to con-struct the likelihood function for surface wave data.For instance, the likelihood function correspondingto a single dispersion measurement can be writtenas: p ( c obs | m ) = 1(2 πσ c ) N/ exp (cid:20) −|| c obs − c || σ c (cid:21) , (17)where m is the 1D velocity model described in Fig. 1, N is the number of discreet periods, and σ c is theestimated variance of the data noise. The likelihoodfunctions of the 2 θ terms can be cast in the samemanner. One of the ﬂexibilities of the Bayesian framework isthat it enables one to account for prior informationprovided that it can be formulated as a probabilitydistribution. The choice of the prior depends on thetype of geophysical process we aim to tackle. In thecontext of seismic tomography, the prior informationdepends on a range velocity structures that are rea-sonable for the Earth in general. In practice, theprior information is constrained by existing studies.At this point forward, we assume minimal priorknowledge and hence, make use of uniform prior dis-tributions with wide bounds. Although we acknowl-edge that using uniform distributions may be a naiveway to setup the prior, working with such a simpledistribution would already suﬃce when demonstrat-ing proofs of concept. Here, we know the exact val-ues of the model parameters, and by imposing wideuniform priors, we are able to assess the eﬃciencyof the method by placing ourselves in the worst casescenario. Indeed, a subject of future work is to con-sider other forms of the prior distribution, for exam-ple non-informative priors or even hiearachical Bayes[Malinverno and Briggs, 2004].Let us now consider a given model parameter m i .The prior p ( m i ) is prescribed a constant value over agiven range of values deﬁned by [ m min , m max ]. Theprior distribution is thus given by: p ( m i ) = (cid:40) m i > m max , m i < m min1∆ m m min ≤ m i ≤ m max . (18)Eq. (18) is interpreted as follows. Suppose that wedraw a sample for the speciﬁc model parameter m i from the proposal distribution q . If m i is out of bounds, then the proposal is automatically rejectedbecause the value is not speciﬁed by the prior. If m i is within the prior bounds, then the proposal is ac-cepted with condition based on the the acceptanceprobability A . Choosing narrow bounds thereforeimposes hard constraints to the model parametersgiving less emphasis to the information provided bythe data.Assuming prior independence, the prior p ( m )can be written as a product of 1D priors on eachunknown parameter considered giving us: p ( m ) = Nlyrs (cid:89) i =1 (cid:20) p ( L i ) p ( ξ i ) p ( Gs i ) p ( Gc i ) (cid:21) , (19)where Nlyrs is the number of layers of the 1D Earthmodel. Eq. (19) implies that the probability of ac-cepting the transition is automatically zero shouldone of the parameters be outside their respectivebounds. We use a Markov chain Monte Carlo (McMC) algo-rithm to search the parameter space for 1D Earthmodel candidates that could explain the surfacewave dispersion measurements. The sampler initi-ates by randomly drawing a reference model fromthe prior followed by the evaluation of the likelihoodfunction. Within the Markov chain, the current 1DEarth model is perturbed at their control points (seethe red arrows in Fig. 1 to transition into a new state.This is performed by randomly selecting one of thefollowing set of moves:1. Change the Love parameter L values of all con-trol points according to a Gaussian distribu-tion centered at the current value of L .2. Change ξ values of all control points accord-ing to a Gaussian distribution centered at thecurrent value of ξ .3. Change G s values of all control points accord-ing to a Gaussian distribution centered at thecurrent value of G s .4. Change G c values of all control points accord-ing to a Gaussian distribution centered at thecurrent value of G c .6f the proposed 1D Earth model is within theirrespective prior bounds, we then solve the for-ward problem completely. The computed dispersioncurves are then compared with the observed synthet-ics following the evaluation of the likelihood func-tion. The resulting probability is then used to eval-uate the acceptance probability via the Metropolis-Hastings algorithm. The outcome of the algorithmdetermines whether the proposed model is added tothe posterior distribution. Should it be rejected, thecurrent model is counted successively in the next it-eration. We opted for a more dynamic perturbation, that is,the perturbations vary depending on a given situa-tion. Here, we employ the McMC sampler with anadaptive perturbation scheme based on the accep-tance rate. This requires keeping track of the accep-tance rate for a given model parameter on the ﬂy.Let us now denote N pop to be the population size,that is, total number of samples in the inversion,and M to be the total number of accepted modelswithin N pop . The acceptance rate corresponding tothe entire population is just:acceptance = MN pop × n within N pop to allow for the relaxation of the acceptancerates, that is, the period at which the acceptancerates are in stable conditions. The population N pop are then separated into diﬀerent cycles ( i.e. , samplewindow) that are multiples of n . For instance, choos-ing n = N pop just pertains to the acceptance ratesof the entire population, whereas choosing n = 1would constantly reset the counters for the proposaland the accepted models at every iteration. At theend of every cycle, the current values of the accep-tance rates are then used to determine whether toincrease or decrease the perturbation. If the accep-tance is less than 20%, then the perturbations willbe reduced by 25% of its current value. Likewise, ifthe acceptance is more than 50%, then the pertur-bations will be increased by 25% instead. The coun-ters for the proposal and the acceptance are thenreset, and the new acceptance rate is recalculatedin the subsequent cycle. As an example, supposethat we have N pop = 50000 samples, and a subsetof n = 5000, we thus having ten cycles. After the ﬁrst 5000 iterations l (ﬁrst cycle), the acceptancerate at l = 5001 will determine whether to increaseor decrease the perturbations by 25%. Resetting ofthe counters will then be ensued regardless. The ap-praisal of the acceptance rates are then marked afterevery n intervals,that is, at l = 10000 , , We perform 1D anisotropic surface wave tomogra-phy at 32 diﬀerent geographical locations. To ef-ﬁciently implement the inversions for each location,the algorithm is parallellised in such a way that mul-tiple Markov chains can simultaneously search theparameter space independently from one another.Here, one Markov chain is allocated to one proces-sor. At the end of the inversion procedure, the en-semble of models from each chain are then gatheredto construct the posterior probability distribution.Since we consider 20 independent chains for eachgeographical location, and we have 32 locations intotal, we need a hefty 32 ×

20 = 640 cores to imple-ment the parallel scheme.

We ﬁrst test the method to a deforming viscous up-per mantle induced by a negatively buoyant spheri-cal anomaly, akin to a Stoke’s sinker. As illustratedin Fig. 2, the surrounding material responds to thesinking anomaly by producing a return ﬂow. Thereturn ﬂow, together with the downward motion ofthe anomaly, generates local convection cells whosescales are consistent with that of the upper man-tle (which we set at L s = 400 km). We setup thegeographical locations of the local surface wave dis-persion curves in such a way that they provide agood coverage of the anomaly. Here, we use 32 lo-cations that slice the region evenly into two (referto the dashed lines). The observed synthetics aregenerated entirely by a 3D deforming upper mantle.We refer the reader to Magali et al. [2020] for furtherdetails.7igure 2: Snapshot of a 3D deforming mantle in-duced by a sinking spherical anomaly. The modeldomain is of the size 400 km ×

400 km ×

400 km.The isovolumetric gradients correspond to the 3Dtemperature proﬁle of the region, and the superim-posed vector ﬁeld is the ﬂow induced by the sphericalbody. The dashed lines at the surface represent the32 geographical locations of the local surface wavedispersion curves. (a)(b)

Figure 3: Reconstructed 2D surface wave maps withadded noise. (A) Rayleigh. (B) Love.To mimic real-Earth observations, we tarnish theobserved synthetics with noise. We added randomuncorrelated noise with standard deviation σ c R,L =0.001 km/s for the isotropic dispersion curves c R and c L , and σ c , = 0.0005 km/s for the azimuthal com- ponents c and c . Fig. 4 shows the resulting dis-persion curves at one given location. Solid blue linesare the synthetic dispersion curves without noise andthe red dots represent the synthetics with added ran-dom uncorrelated noise. By stacking the dispersioncurves laterally ( i.e. , placing them side-by-side), weare able to visualize the true structure in the dataspace. Fig. 3 shows the eﬀect of the density anomalyonto the resulting surface wave dispersion maps. Thenegative subsidence observed in the Rayleigh mapcorresponds to an increases in its speed as it tra-verses the cold anomaly.For each geographical location, the inversion con-sists of 20 independent Markov chains each contain-ing 1.0 × samples initiated at a random 1D Earthmodel drawn from the prior distribution. The priorbounds of each model parameter is summarized inTable 2.Table 2: Prior ranges of the unknown model param-eters. L (GPa) ξ G c (GPa) G s (GPa)min 20 0.7 -5 -5max 150 1.3 5 5The models are then collected after a burn-in pe-riod of 900,000 samples to ensure the chains are insteady-state and are properly sampling the poste-rior distribution. Note that even though the modelsare randomly initiated, the positions of the controlpoints are ﬁxed, and thus may have strong implica-tions on the recovered structures.Fig. 5 shows the mean velocity structure and themean radial anisotropy recovered from Bayesian in-version, and the correct structures. The 2D modelsare constructed by placing the recovered 1D meanstructures side-by-side. Surface waves were able tosuccessfully map the most prominent feature, thatis, the seismic anomaly associated with the densersphere. Radial anisotropy was also successfully re-covered. However, one of the major drawbacks ofsurface waves is that they are more sensitive at shal-lower depths. Hence, the top layers are better re-solved than the bottom layers. We thus expect theupper portion to exhibit less model uncertaintiesthat the bottom half. Additionally, some essentialfeatures are smeared vertically, which we attributeto the inherent long period nature of surface waves.The structure also appears to be smooth with depthwhich in part is due to the choice of parameterisa-8 a) (b)(c) (d) Figure 4: Surface wave phase velocity dispersion curves and their azimuthal variations at a speciﬁc geo-graphical location. Solid blue lines are the correct values and the scatter plots are the ones added withnoise and are to be inverted.tion ( i.e. , cubic splines are smooth functions). Thelateral resolution however does not exhibit completesmoothness; instead, appears to be degraded at someregions. This is a result of using randomly uncor-related data. In this case, the random data noisemanifest as small-scale artifacts in the model space.One of the main advantages of a Bayesian frame-work is we can express the solution as a marginalposterior pdf where the width of the distributionquantiﬁes the model uncertainties. These depth pro-ﬁles were obtained by merging the ensemble of mod-els from the 20 Markov chains. The entire numberof models m used to build the posterior pdf is thus2.0 × . Results show that the proﬁles success-fully capture the true structure although sharp gra-dients fail to be resolved. The resulting azimuthalanisotropy shows some peculiarities. Since we onlymake use of ﬁve control points which are ﬁxed, thesaddle points of the true azimuthal anisotropy failto be captured. The increase in model uncertaintyat depths is again a result of surface waves beingconcentrated at shallower depths. Fig. 6 shows 1Dmarginal distributions versus depth of L , ξ , peak-to-peak azimuthal anisotropy, and its fast azimuth Ψat a speciﬁc geographical location. (a) (b)(c) (d) Figure 5: True models (left), Mean models recov-ered from the inversion (right). (A) and (B) L − structure, (C) and (D) radial anisotropy.9igure 6: 1D marginal posterior distributions of L , ξ ,peak-to-peak azimuthal anisotropy, and its fast az-imuth Ψ at a speciﬁc geographical location, inferredfrom the Bayesian inversion of surface wave disper-sion curves. The true structures are plotted in solidred. We up the notch further by considering instanta-neous ﬂow in the upper mantle induced by sub-duction as shown in Fig. 7. Similar to the previ-ous example, we added random uncorrelated noisewith σ c R,L = 0.001 km/s and σ c , = 0.0005 km/s.Fig. 8 shows the surface wave maps across a subduc-tion zone. As expected, surface waves (especiallyRayleigh in this case) undeniably map the subduct-ing slab. Figure 7: Snapshot of a 3D upper mantle ﬂow in-duced by subduction. The model dimensions are alsoof the size 400 km ×

400 km ×

400 km. The dashedlines at the surface represent the 32 geographical lo-cations of the local surface wave dispersion measure-ments. (a)(b)

Figure 8: Reconstructed 2D surface wave maps ofa subduction zone with added noise. (A) Rayleigh.(B) Love.The inversion properties are preserved ( e.g. ,number of independent chains, same prior bounds,etc.). Fig. 9 shows the true structures and those re-covered by Bayesian inversion. As expected, we arestill able to successfully recover the subducting slab10ased from the inversions for L . Unlike the previ-ous example however, the added complexity broughtabout by such a type of tectonic setting makes itmore diﬃcult for surface wave tomography. One ofthe reasons for this is the choice of parameterisationas other means of parameterising the model mightincrease the quality of the recovered images. Otherglaring features such as the presence of blobs in ξ in-stead of sharp edges ascribe to the use of long periodobservations. Finally, the lateral resolution is verymuch degraded by random uncorrelated noise beingmapped in the parameter space.Fig. 10 displays the 1D marginal posterior depthproﬁles of the model parameters at a speciﬁc geo-graphical location. As predicted, the sharp gradientssuch as those shown in L at 75 km are not recov-ered by surface waves but are spatially-averaged in-stead. The complex azimuthal anisotropy structurealso fails to be rendered by surface wave inversion inpart due to the type of parameterisation. It is alsoquite evident that surface waves lose resolution withdepth as initially observed in the sphere case. (a) (b)(c) (d) Figure 9:

Subduction case . True models (left),Mean models recovered from the inversion (right).(A) and (B) L − structure, (C) and (D) radialanisotropy. Figure 10: Subduction case . 1D marginal pos-terior distributions of L , ξ , peak-to-peak azimuthalanisotropy, and its fast azimuth Ψ at a speciﬁc ge-ographical location, inferred from the Bayesian in-version of surface wave dispersion curves. The truestructures are plotted in solid red. We have demonstrated conventional anisotropic sur-face wave tomography. Here, the inverse problemis fully Bayesian in that the solution is an ensem-ble of models distributed according to a posteriorprobability. Here, the parameter space is searchedusing a Markov chain Monte Carlo algorithm withimportance sampling. This method was applied toazimuthally-varying surface wave dispersion curvescomputed from a 3D deforming upper mantle.In both synthetic examples, conventional surfacewave tomography was able to recover robust fea-11ures. However, the resolving power of surface waveslimits the vertical resolution of the recovered struc-tures. As exempliﬁed in both cases, surface are longperiod observations and hence cannot resolve small-scale features. These features are instead spatially-averaged and are smooth as a result. It is also im-portant to emphasize that the choice of parameter-isation regularises the inverse problem. As initiallystated, the positions of the control points where thevalues are perturbed are ﬁxed. As a consequence, theresults from all the chains inﬂict strong dependencyof the ﬁnal result on the parameterisation. Suchis apparent in the recovered azimuthal anisotropystructures. Finally, since the energy associated withsurface waves tend to be more concentrated near thesurface, they thus tend to decrease resolution withdepth. This is evident in both inversions where be-low the 250 km mark, the width of the posteriorsbegin to increase.To handle such complications, possible futureavenues include the incorporation of higher-modes.Throughout the inversions, we only consideredfundamental-mode surface waves. Using highermodes would increase the sensitivity of surface wavesto deeper structures thus providing adequate reso-lution with depth [e.g. Simons et al., 1999]. An-other alternative route we could take is to considertrans-dimensional approaches where the number ofmodel parameters to be inverted for are treated asan unknown. In such cases, the model adapts to thedata itself, thus providing a state of balance betweenmodel complexity and resolution [Bodin et al., 2012].Across the horizontal, the recovered structuresappear to be less resolved since unlike teleseismicbody waves, surface waves exhibit poor lateral reso-lution. Still, we should expect the horizontal struc-tures to be smooth. This is not the case however asthis was a result of using randomly uncorrelated datanoise. This random noise maps as small-scale arti-facts in the model, which clearly explains the lateraldiscontinuities. Since real data noise are inherentlyspatially- and periodically-correlated, it is thereforenecessary to build the full data covariance matrixand account for them in the likelihood function.Finally, even though we placed ourselves in thebest case scenario, that is, imposing hard a priori constraints in the inversions by setting the correctvalues for A , φ , η , B c , and B s , we are still hinderedby the problems associated with conventional sur- face wave tomography. In practice, these values aretreated as unknowns in the inversion or are deter-mined based on empirical relations. Implementingthe former complicates the inversion procedure in away that it increases the model complexity, and thatusing only one type of data may not resolve theseparameters at every location. One is thus forced togo with the latter, where they utilise simple relationssuch as the V p V s ratio to constrain V p -related param-eters. Velocity models inferred from such simplisticformulations however may not be representative ofthe would-be recovered structure especially in sit-uations where complex underlying mechanisms dic-tate the lithological integrity of the region. The to-mographic problem should therefore be approachedfrom a diﬀerent perspective, where in lieu of pre-scribing tensor symmetries at the outset it is insteaddriven by key geophysical processes, such as the workof Magali et al. [2020]. Acknowledgements

The provision of the computational resources aremade possible through the ERC grant − TRAN-SCALE: Reconciling Scales in Global Seismologyproject. The manuscript is part of the PhD disser-tation of JK Magali. All computations were carriedout through the in-house Transcale cluster situatedin Lyon.

References

Tobias S Baumann, Boris JP Kaus, and Anton APopov. Constraining eﬀective rheology throughparallel joint geodynamic inversion.

Tectono-physics , 631:197–211, 2014.TS Baumann and Boris JP Kaus. Geodynamic in-version to constrain the non-linear rheology of thelithosphere.

Geophysical Journal International ,202(2):1289–1316, 2015.T Bodin, J Leiva, B Romanowicz, V Maupin, andHuaiyu Yuan. Imaging anisotropic layering withbayesian inversion of multiple data types.

Geo-physical Journal International , 206(1):605–629,2016.Thomas Bodin and Malcolm Sambridge. Seismictomography with the reversible jump algorithm.12 eophysical Journal International , 178(3):1411–1436, 2009.Thomas Bodin, Malcolm Sambridge, H Tkalˇci´c,Pierre Arroucau, Kerry Gallagher, and NicholasRawlinson. Transdimensional inversion of receiverfunctions and surface wave dispersion.

Journal ofGeophysical Research: Solid Earth , 117(B2), 2012.Wojciech Debski. Seismic tomography by montecarlo sampling.

Pure and applied geophysics , 167(1-2):131–152, 2010.Adam M Dziewonski and Don L Anderson. Prelim-inary reference earth model.

Physics of the earthand planetary interiors , 25(4):297–356, 1981.W Keith Hastings. Monte carlo sampling methodsusing markov chains and their applications. 1970.S Husen, E Kissling, N Deichmann, S Wiemer, D Gi-ardini, and M Baer. Probabilistic earthquake lo-cation in complex three-dimensional velocity mod-els: Application to switzerland.

Journal of Geo-physical Research: Solid Earth , 108(B2), 2003.Anthony Lomax, Jean Virieux, Philippe Volant,and Catherine Berge-Thierry. Probabilistic earth-quake location in 3d and layered models. In

Ad-vances in seismic event location , pages 101–134.Springer, 2000.J K Magali, T Bodin, N Hedjazian, H Samuel, andS Atkins. Geodynamic Tomography: Constrain-ing Upper Mantle Deformation Patterns fromBayesian Inversion of Surface Waves.

GeophysicalJournal International , 12 2020. ISSN 0956-540X.doi: 10.1093/gji/ggaa577.Alberto Malinverno and Victoria A Briggs. Ex-panded uncertainty quantiﬁcation in inverse prob-lems: Hierarchical bayes and empirical bayes.

Geophysics , 69(4):1005–1016, 2004.V Maupin and J Park. 1.09—theory and observa-tions—seismic anisotropy.

Treatise on Geophysics ,pages 277–305, 2015.Nicholas Metropolis and Stanislaw Ulam. The montecarlo method.

Journal of the American statisticalassociation , 44(247):335–341, 1949.Nicholas Metropolis, Arianna W Rosenbluth, Mar-shall N Rosenbluth, Augusta H Teller, and Ed-ward Teller. Equation of state calculations by fast computing machines.

The journal of chem-ical physics , 21(6):1087–1092, 1953.Jean-Paul Montagner. Can seismology tell us any-thing about convection in the mantle?

Reviews ofGeophysics , 32(2):115–137, 1994.Jean-Paul Montagner and Nelly Jobert. Vectorialtomography—ii. application to the indian ocean.

Geophysical Journal International , 94(2):309–344,1988.Jean-Paul Montagner and Henri-Claude Nataf.A simple method for inverting the azimuthalanisotropy of surface waves.

Journal of Geophysi-cal Research: Solid Earth , 91(B1):511–520, 1986.JP Montagner and HC Nataf. Vectorial tomography.part i: Theory.

Geophysical Journal International ,94:295–307, 1988.M Morishige and T Kuwatani. Bayesian inversionof surface heat ﬂow in subduction zones: a frame-work to reﬁne geodynamic models based on obser-vational constraints.

Geophysical Journal Interna-tional , 222(1):103–109, 2020.Klaus Mosegaard and Albert Tarantola. Monte carlosampling of solutions to inverse problems.

Journalof Geophysical Research: Solid Earth , 100(B7):12431–12447, 1995.Ichiro Nakanishi and Don L Anderson. Measurementof mantle wave velocities and inversion for lateralheterogeneity and anisotropy: 1. analysis of greatcircle phase velocities.

Journal of Geophysical Re-search: Solid Earth , 88(B12):10267–10283, 1983.H-C Nataf, I Nakanishi, and Don L Anderson.Anisotropy and shear-velocity heterogeneities inthe upper mantle.

Geophysical Research Letters ,11(2):109–112, 1984.O Ortega-Gelabert, S Zlotnik, JC Afonso, andP D´ıez. Fast stokes ﬂow simulations forgeophysical-geodynamic inverse problems and sen-sitivity analyses based on reduced order modeling.

Journal of Geophysical Research: Solid Earth , 125(3):e2019JB018314, 2020.Michael H Ritzwoller, Nikolai M Shapiro, Mikhail PBarmin, and Anatoli L Levshin. Global surfacewave diﬀraction tomography.

Journal of Geophys-ical Research: Solid Earth , 107(B12):ESE–4, 2002.13arbara Romanowicz. Inversion of surface waves:a review.

International Geophysics Series , 81(A):149–174, 2002.Masanori Saito. Excitation of free oscillations andsurface waves by a point source in a verticallyheterogeneous earth.

Journal of Geophysical Re-search , 72(14):3689–3699, 1967.MASANORI Saito. Disper80: A subroutine pack-age for the calculation of seismic normal-mode so-lutions.

Seismological algorithms , pages 293–319,1988.NM Shapiro and MH Ritzwoller. Monte-carlo inver-sion for a global shear-velocity model of the crustand upper mantle.

Geophysical Journal Interna-tional , 151(1):88–105, 2002. Frederik J Simons, Alet Zielhuis, and Rob D VanDer Hilst. The deep structure of the australiancontinent from surface wave tomography. In

De-velopments in Geotectonics , volume 24, pages 17–43. Elsevier, 1999.Martin L Smith and FA Dahlen. The azimuthal de-pendence of love and rayleigh wave propagationin a slightly anisotropic medium.

Journal of Geo-physical Research , 78(17):3321–3333, 1973.Jeannot Trampert and John H Woodhouse. Globalphase velocity maps of love and rayleigh wavesbetween 40 and 150 seconds.