[PDF] A dual formulation of wavefield reconstruction inversion for large-scale seismic inversion

Abstract

Most of the seismic inversion techniques currently proposed focus on robustness with respect to the background model choice or inaccurate physical modeling assumptions, but are not apt to large-scale 3D applications. On the other hand, methods that are computationally feasible for industrial problems, such as full waveform inversion, are notoriously bogged down by local minima and require adequate starting models. We propose a novel solution that is both scalable and less sensitive to starting model or inaccurate physics when compared to full waveform inversion. The method is based on a dual (Lagrangian) reformulation of the classical wavefield reconstruction inversion, whose robustness with respect to local minima is well documented in the literature. However, it is not suited to 3D, as it leverages expensive frequency-domain solvers for the wave equation. The proposed reformulation allows the deployment of state-of-the-art time-domain finite-difference methods, and is computationally mature for industrial scale problems.

Full PDF

AA dual formulation of waveﬁeld reconstruction inversion forlarge-scale seismic inversion

Gabrio Rizzuti , Mathias Louboutin , Rongrong Wang , and Felix J. Herrmann Georgia Institute of Technology, Michigan State University

Abstract

Most of the seismic inversion techniques currently proposed focus on robustness with respect to the backgroundmodel choice or inaccurate physical modeling assumptions, but are not apt to large-scale 3D applications.On the other hand, methods that are computationally feasible for industrial problems, such as full waveforminversion, are notoriously bogged down by local minima and require adequate starting models. We propose anovel solution that is both scalable and less sensitive to starting model or inaccurate physics when comparedto full waveform inversion. The method is based on a dual (Lagrangian) reformulation of the classicalwaveﬁeld reconstruction inversion, whose robustness with respect to local minima is well documented in theliterature. However, it is not suited to 3D, as it leverages expensive frequency-domain solvers for the waveequation. The proposed reformulation allows the deployment of state-of-the-art time-domain ﬁnite-diﬀerencemethods, and is computationally mature for industrial scale problems.

Introduction

Field-data applications of wave-equation based seismic inversion are limited by the computational complexityof large-scale 3D simulations, and by the tendency of traditional inversion techniques [such as full waveforminversion, FWI, Tarantola and Valette, 1982] to converge to unrealistic models. While considerable advanceshave been made to speed up wave equation solvers and avoid local minima, most of the currently proposedinversion methods do not take advantage of both these developments, and are practically limited to 2Dproblems. In this paper, we introduce a novel method that is ﬁt for 3D large-scale deployment and relativelyrobust against spurious minima.We adopt model extension methods that aim at counteracting local minimum stagnation, which are basedon a reformulation of the original inverse problem into a higher dimensional space of unknowns [Symes,2008]. While beneﬁcial for avoiding local minima, the inherent added complexity of these methods typicallycurtails large-scale applications. One notable example, and the basis for this work, is waveﬁeld reconstructioninversion [WRI, van Leeuwen and Herrmann, 2013], which has demonstrated robustness towards local minimabut is not well suited for 3D problems. Indeed, WRI involves the solution of an extended wave equation thatis conventionally solved in the frequency domain with direct or iterative methods, although not eﬃciently forlarge grid sizes. Many other methods related to WRI share, fundamentally, the same limitation [Huang et al.,2017, 2018, Aghamiry et al., 2019].We propose a principled Lagrangian reformulation of classical WRI, referred to as WRI ∗ throughout the text,which naturally lends itself to time-domain solvers, hence large-scale applications. The reformulation is basedon minimizing the wave equation error and interpreting the data misﬁt as a constraint. Complexity analysis1 a r X i v : . [ phy s i c s . g e o - ph ] O c t nd numerical experimentation highlights comparable computational costs to FWI per gradient computation(corresponding to roughly twice its cost), and is consistently more robust with respect to starting modeland inexact physical modeling assumptions. Contrary to most of the inversion methods based on the waveequation nowadays available, our approach represents a computationally eﬃcient, robust, and ultimatelyfeasible proposal for 3D seismic inversion.Our proposal leverages on the current state-of-the-art in time-domain ﬁnite-diﬀerence solvers that ensuresscalability on the hardware available to the oil-and-gas industry, or even to a wider audience thanks to cloudcomputing [Witte et al., 2020]. We employ the domain-speciﬁc language Devito for automatic generationof highly optimized ﬁnite-diﬀerence C code [Luporini et al., 2020, Luporini et al., 2018, Louboutin et al.,2019]. Here, we focus on acoustic modeling and pseudo-acoustic modeling with tilted transverse isotropy[TTI, Alkhalifah, 2000], established as a reasonable compromise between realistic physics and computationalfeasibility.The computational and theoretical advantages of WRI ∗ will be demonstrated with extensive 2D experimenta-tion. Despite the central claim of the paper being centered on 3D feasibility, 3D problems are still beyondthe economic resources realistically available to academic organizations, and will not be included in this work.We stress, however, that this is not due to a fundamental limitation of the method, and any institution thatcan aﬀord the computations needed for FWI can also fruitfully employ WRI ∗ .In the rest of this paper, we discuss the computational and theoretical pitfalls of classical seismic inversion,with a general overview of the methods that deal with those issues. We review Lagrangian and reducedformulations for seismic inversion, followed by a more in-depth discussion on WRI. The main section isdevoted to the reformulation of WRI object of this work: WRI ∗ . Lastly, we present numerical experiments tothoroughly test our proposal and compare it to conventional WRI and FWI. Seismic inversion: computational and theoretical complexity

Modern seismic inversion, such as FWI, is cast as an optimization problem endowed with partial diﬀerentialequation constraints [Virieux and Operto, 2009]. Given a suitable discretization of the physical quantities ofinterest, and the algebraic operators that govern their mutual relationship, the problem can be succinctlyexpressed as: min m , u k d − R u k subject to A ( m ) u = q . (1)Data d ∈ R N r × N t × N s are recorded in time at N r receiver locations for diﬀerent N s source positions. Alterna-tively, after a temporal Fourier transform, d ∈ C N r × N f × N s for N f frequencies. The variable u ∈ R N × N t × N s (or u ∈ C N × N f × N s , in frequency domain) represents a waveﬁeld—e.g. the solution of the wave equation A ( m ) u = q — discretized on an N -dimensional space. The right-hand side q of the wave equation is typicallya point source. In equation (1), the data are compared to the values of u interpolated at the receiver locationsvia the operator R , and the misﬁt calculated via the least-squares norm k·k := k·k . The wave operator A ( m )is parameterized by m , which encapsulates the model unknowns of the problem (e.g. squared slowness, in thesimplest acoustic, constant density setting).In 3D, the grid size grows as N = O ( n ) (in big O notation), with n ≈ N r = O ( n )and N s = O ( n ). Furthermore, when time-domain methods with explicit time-marching schemes are applied, N t = O ( n ) (realistically, N t ≈ n ), due to stability conditions. In the frequency domain, we also oftenkeep N f = O ( n ), albeit with much more favorable complexity constants than time domain [note that, withoptimal frequency sampling, we theoretically only need N f = O (log n ), Sirgue and Pratt, 2004]. In thisregime, the computational requirements are extremely challenging. For instance, waveﬁeld storage grows as O ( n ), and cannot be attained for each source/time sample at once. Time complexity is largely dominated by2he computation of the solutions of the wave equation. Currently, explicit time-domain methods are favored,as in the frequency domain the wave equation is represented by a large, sparse, non-deﬁnite linear system,and neither direct nor iterative methods scale as eﬃciently to 3D. Frequency domain becomes a palatableoption only when a large high-performance computing system can store the matrix factorization [Wang et al.,2011], or abundant compute cores are available [Knibbe et al., 2016].Despite these computational challenges, FWI is now viable for 3D problems of practical size, especiallywhen the simulated physics is restricted to acoustics or tilted transversely isotropic (TTI) pseudo-acoustics[Alkhalifah, 2000, Grechka et al., 2004, Duveneck et al., 2008]. The FWI objective functional is, however,highly multimodal and requires considerable problem-speciﬁc intervention to be consistently successful whengradient-based optimization is employed. In practice, FWI requires a kinematically correct starting guess toavoid cycle skipping. Local minima have been the subject of a great deal of research in the last 10 years, andmany proposals tackling the problem have been advanced. The main lines of investigation involved:• regularization techniques that penalize unrealistic models, e.g. via total-variation minimization [Esseret al., 2018] or projection onto constraint sets [Peters and Herrmann, 2017, Peters et al., 2018];• analysis of data misﬁt in a transformed domain [Shin and Cha, 2008, 2009, Bozdag et al., 2011] and/orapplication of preprocessing hierarchical strategies, where diﬀerent levels of data preconditioning aresetup and the associated problems solved sequentially, e.g. low- to high-frequency data ﬁltering inversion[Bunks et al., 1995], or shallow- to deep-model inversion [Shipp and Singh, 2002];• model extension, where the original problem is generalized to a higher dimensional space better suitedfor gradient-based optimization [Symes, 2008, van den Berg and Kleinman, 1997, Biondi and Almomin,2014, van Leeuwen and Herrmann, 2013, Wang et al., 2016, Wang and Herrmann, 2017b, Aghamiryet al., 2019, Warner and Guasch, 2016, Guasch et al., 2019];• special objective functionals more amenable to local-search optimization, such as the optimal transportdata misﬁt via the Wasserstein distance [Métivier et al., 2016, Yang et al., 2018, Sun and Alkhalifah,2019].Note that these approaches are not mutually exclusive, and despite the fact that our contribution is aninstance of model extension techniques, preprocessing hierarchical strategies, data transforms, or regularizationmethods can be incorporated straightforwardly. More recent ideas based on optimal transport can alsobe formally integrated in the framework presented later on, but requires further study and will not beconsidered here. Also, the Wasserstein distance needs a careful reinterpretation of seismic data as probabilitydistributions and current solutions are conﬁned to a trace-by-trace comparison, limiting its beneﬁts.Recently, many model extension methods have been proven successful in their empirical robustness to localminima, but are hampered by the computational costs associated to operating in higher dimension. A notableexception is adaptive waveform inversion [AWI, Warner and Guasch, 2016, Guasch et al., 2019], which consistsin matching synthetic time traces to data by means of convolutional Wiener ﬁlters with arbitrary length.Since the computational overhead with respect to FWI is aﬀordable, AWI can be applied to large problems.In this paper, however, we are not interested in establishing the relative merits of AWI with respect to othermethods, and we will focus on developing a competing method based on WRI. WRI, in its basic formulation[van Leeuwen and Herrmann, 2013], is not amenable to 3D. Indeed, it involves a relaxed version of the waveequation that does not allow for a straightforward implementation of time-marching schemes and is practicallylimited to 2D problems [see, however, Peters and Herrmann, 2019, for some advances using frequency-domain3D solvers]. Alternative characterizations of the original WRI formulation, as oﬀered in Huang et al. [2017],Huang et al. [2018], or Aghamiry et al. [2019], suﬀer from the same issues and do not scale eﬃciently to 3D.The main goal of this paper is to provide a reformulation of WRI that circumvents this limitation.3 educed, full-space, and penalty formulations In this section, we lay out the groundwork for WRI and our proposal WRI ∗ by illustrating diﬀerent waysto deal with the problem formulated in equation (1). Classical FWI consists of imposing the constraint u = A ( m ) − q explicitly (from which the moniker of reduced formulation stems). Equation (1) can beequivalently rewritten as: min m J FWI ( m ) , J FWI ( m ) = 12 (cid:13)(cid:13) d − RA ( m ) − q (cid:13)(cid:13) . (2)Regularization can be enforced via constraints or extra terms. There are several advantages to this formulationthat makes large-scale deployment possible with the computational capabilities that are currently available.The objective functional J FWI is deﬁned by a summation of the data misﬁt terms over individual sources.Therefore, in the evaluation of J FWI and its gradient calculation, the solution waveﬁelds of forward andadjoint problems pertaining to a source can be discarded after use, thus limiting the memory overhead.In time domain, checkpointing techniques are still essential to control memory complexity [Symes, 2007].Alternatively, one could combine modeling in time domain and on-the-ﬂy Fourier transform to obtain anestimate of the gradient by cross-correlating the forward and backpropagated waveﬁelds in the Fourierdomain [Witte et al., 2019b]. Although the reduced formulation leads to a scalable approach by makinguse of relatively frugal time-domain modeling, it is particularly prone to local minima, and necessitates akinematically adequate starting guess for m , ideally matching the data traveltimes within half a period[Beydoun and Tarantola, 1988].Another classical setup, sometimes referred to as the full-space method, is by means of the Lagrangianfunction associated to equation (1) and the related saddle point problem [Haber et al., 2000]:max v min m , u L weq ( m , u , v ) , L weq ( m , u , v ) = 12 k d − R u k + h v , q − A ( m ) u i , (3)where h· , ·i is the least-squares inner product. The multipliers v belong to a linear space with the samedimension as the waveﬁeld u space. When memory usage is not of concern, we can carry out joint optimizationover m , u and v . Note that the evaluation and gradient computation of the Lagrangian do not requirethe solution of the wave equation. Moreover, the Hessian of the Lagrangian is sparse and can be evaluatedcheaply, making second-order methods aﬀordable. However, in spite of these advantages, storing u and v inmemory is not viable.Penalty formulations are situated in between the full space method and FWI, and seek to enforce the waveequation weakly as a regularization term. By ﬁxing the multiplier v = λ ( q − A ( m ) u ) in equation (3), weobtain the problem: min m , u J pen ( m , u ) , J pen ( m , u ) = 12 k d − R u k + λ k q − A ( m ) u k . (4)Here, λ is a trade-oﬀ parameter between data misﬁt and wave equation error. Note that, with λ → ∞ , themethod becomes equivalent to FWI. Similarly to the full space method, we cannot aﬀord the storage of theunknowns u , and need to resort to the WRI approach described in the next section.4 aveﬁeld reconstruction inversion Within the penalty method framework in equation (4), van Leeuwen and Herrmann [2013] apply the variableprojection scheme by deﬁning ¯ u := arg min u J pen ( m , u ) [Golub and Pereyra, 2003], as a way to eliminate u .This procedure yields the WRI method:min m J WRI ( m ) , J WRI ( m ) = 12 k d − R ¯ u k + λ k q − A ( m )¯ u k , (5)where ¯ u is the solution of the augmented wave equation: (cid:18) RλA ( m ) (cid:19) ¯ u = (cid:18) d λ q (cid:19) , (6)to be solved in least-squares sense. Note that, by virtue of variable projection, ∇ u J pen ( m , ¯ u ) = , and ∇ m J WRI ( m ) = ∇ m J pen ( m , ¯ u ) . (7)As in the reduced approach, WRI avoids simultaneous waveﬁeld storage for each source (by computing anddiscarding). Some theoretical indications, rooted in microlocal analysis of Fourier integral operators [e.g.,Stolk and Symes, 2002], show that WRI is more robust than FWI with respect to local minima [with caveatsrelated to how the wave equation error is measured, see Symes, 2020a,b]. This thesis is also supportedby plenty of empirical results (including this paper). Furthermore, WRI delivers consistently superior saltimaging compared with respect to FWI [especially when combined with regularization, Esser et al., 2018,Peters and Herrmann, 2017], thanks to the inherent well-conditioning produced by variable projection [Goluband Pereyra, 2003]. Alas, a major drawback of WRI is that the augmented wave equation (6) cannot betrivially solved in neither frequency nor time domain when the problem is sizable.A way to overcome the computational hurdles associated to the augmented wave equation is suggested byrewriting equation (6) as A ( m )¯ u = q + ¯ q , where A ( m ) ∗ ¯ q = R ∗ ¯ y , (8)for some variable ¯ y that belongs to the data space ( ∗ denotes the adjoint operation). In equation (8), ¯ y isbackpropagated to form an additional source contribution ¯ q . This extended source has a time and spatialdistribution and undergirds the idea of extended-source focusing by Huang et al. [2017] (which is itself aninstance of WRI). In fact, ¯ y = ( d − R ¯ u ) /λ . (9)This identity highlights that ¯ y is proportional to the data residual of the augmented waveﬁeld deﬁnedequation (6). It can be equivalently cast as the solution of the linear system: h λ I + F ( m ) F ( m ) ∗ i ¯ y = r ( m ) , where F ( m ) = RA ( m ) − (10)is the forward operator, I is the identity, and r ( m ) = d − F ( m ) q is the data residual of the physical solutionof the wave equation. The evaluation of the linear sytem in equation (10) involves solving the conventionalwave equation (and adjoint thereof). If one could cheaply estimate ¯ y , the augmented solution ¯ u would onlyrequire one extra wave equation solution.The previous observation is related to the work by Wang et al. [2016], which consists in the approximation˜ y = r ( m ) /λ ≈ ¯ y and A ( m )˜ u = q + F ( m ) ∗ ˜ y . The objective then becomes ˜ J WRI ( m ) = J pen ( m , ˜ u ). Thisapproach does not require the solution of augmented wave equations of sort, and can exploit traditionaltime-domain solvers. On the other hand, the approximation ˜ u ≈ ¯ u deteriorates as λ →

0, and the gradientof ˜ J WRI is only approximated by ∇ m ˜ J WRI ≈ ∇ m J pen ( m , ˜ u ) (since ˜ u is not obtained through variableprojection, as in equation 7). A rigorous calculation should indeed comprise the dependency of ˜ u with respectto m . Nonetheless, the approach taken in Wang et al. [2016] directly inspires some of the choices involved inthe proposal introduced in the next section. 5 dual formulation for waveﬁeld reconstruction inversion: WRI ∗ Following Wang and Herrmann [2017a], we formulate from ﬁrst principles a scalable version of WRI bystarting with a denoising reformulation of equation (1):min m , u k q − A ( m ) u k s . t . k d − R u k ≤ (cid:15), (11)for a known noise level (cid:15) . Notice how objective and constraints are here swapped with respect to theLagrangian formulation for the full-space method in equation (3). The denoising problem in equation (11) issomewhat equivalent to the penalty formulation in equation (4), in the sense that for a ﬁxed m and a given (cid:15) ,there exists a weight λ such that the solutions of (4) and (11) are identical [and viceversa, Bjorck, 1996]. Thedenoising formulation, however, has the advantage of a lower dimensional Lagrangian multiplier than thepenalty counterpart, as described in the following sections. Lagrangian formulation for data misﬁt constraint

We proceed similarly to the full-space method, by building the Lagrangian associated to the problem (11). Inorder to do so, we refer to the Fenchel duality theory [Rockafellar, 1970]. The saddle-point problem associatedto equation (11) takes the formmax y min m , u L dat ( m , u , y ) , L dat ( m , u , y ) = 12 k q − A ( m ) u k + h y , d − R u i − (cid:15) k y k , (12)where the slack variables y belong to a linear space with the same dimension as data. We can eliminate thewaveﬁeld variables u from equation (12) by applying the variable projection ¯ u = arg min u L dat ( m , u , y ): A ( m )¯ u = q + ¯ q , where A ( m ) ∗ ¯ q = R ∗ y , (13)in a similar fashion to what was described in equation (8). The substitution L WRI ∗ ( m , y ) := L dat ( m , ¯ u , y )yields the WRI ∗ problem max y min m L WRI ∗ ( m , y ) , L WRI ∗ ( m , y ) = − (cid:13)(cid:13)(cid:13) F ( m ) ∗ y (cid:13)(cid:13)(cid:13) + h y , r ( m ) i − (cid:15) k y k , (14)where the forward operator F ( m ) = RA ( m ) − was introduced in equation (10), and r ( m ) is the data residualfor the model m . In equation (14), optimizing over y means maximizing the correlation of y with thedata residual r ( m ), where additional regularization terms penalize the magnitudes of y and the extendedsource ¯ q = F ( m ) ∗ y . Under this lens, the method bears some resemblance with techniques based on datacross-correlation thoroughly investigated in the past starting from the work of Luo and Schuster [1991].Regarding gradient calculation, we obtain ∇ m L WRI ∗ = − J [ m , q + F ( m ) ∗ y ] ∗ y , (15) ∇ y L WRI ∗ = d − F ( m )( q + F ( m ) ∗ y ) − (cid:15) y / k y k . (16)For a general source f , J [ m , f ] is the Jacobian of the mapping m F ( m ) f . The expression in equation (15)can be computed similarly to a conventional FWI gradient: ( i ) compute ¯ q in equation (13) by backwardpropagation, ( ii ) compute ¯ u in equation (13) by forward propagation, and ( iii ) perform temporal cross-correlation of ¯ q and ∂ tt ¯ u . The gradient with respect to y in equation (16) corresponds to the data residual ofthe augmented waveﬁeld in equation (13), plus a relaxation term.6he fundamental gain in solving WRI ∗ instead of WRI lies in the fact that the evaluation of the Lagrangian L WRI ∗ and its gradients only require solutions of a conventional wave equation, opening up the choice fortime-domain solvers. The multipliers y can be stored in memory whenever m and y are jointly optimized.Alternatively, one might consider variable projection for y . Both these approaches are computationallyexpensive, and in this paper we will rely on a simple approximation for y , to be discussed in the next section. Dual variable approximation

Equating the gradient with respect to y to zero, in equation (16), provides a closed-form solution for theoptimal ¯ y , deﬁned by the non-linear system: (cid:15) ¯ y k ¯ y k + F ( m ) F ( m ) ∗ ¯ y = r ( m ) , (17)analogously to equation (10) for conventional WRI. Note that a slight modiﬁcation to equation (14), L WRI ∗ ,λ ( m , y ) = − (cid:13)(cid:13)(cid:13) F ( m ) ∗ y (cid:13)(cid:13)(cid:13) + h y , r ( m ) i − λ k y k , (18)would produce the same linear system in equation (10) for the associated optimal ¯ y . With this choice,the problem becomes equivalent to conventional WRI, since L WRI ∗ ,λ ( m , ¯ y ) = J WRI ( m ) /λ when ¯ y solvesequation (10).Equation (17) (or its linear variant 18), can be solved iteratively. However, each evaluation of equation (17)amounts to two wave equation solutions and many iterations may be required. As a result, we propose —similarly to Wang et al. [2016] — a cheap approximation for ¯ y corresponding to the scaled data residual¯ y ≈ ˜ y = α r ( m ), albeit for an unknown α . Given that equation (14) is (strictly) concave with respect to y ,the optimal scaling ˜ α = arg max α L WRI ∗ ( m , α r ( m )) is uniquely solved by˜ y = ˜ α r ( m ) , ˜ α =  k r ( m ) k ( k r ( m ) k − (cid:15) ) k F ( m ) ∗ r ( m ) k , k r ( m ) k ≥ (cid:15), , otherwise . (19)There are no signiﬁcant extra costs associated with this computation, as it can be obtained as a byproduct ofthe calculations involved in the evaluation of the Lagrangian functional. A new objective based on the dual formulation

The approximation for y introduced in equation (19) results in the reduced objective˜ L WRI ∗ ( m ) := L WRI ∗ ( m , ˜ y ) , (20)and its gradient with respect to m is ∇ m ˜ L WRI ∗ = ∇ m L WRI ∗ ( m , ˜ y ) − ˜ αJ [ m , q ] ∗ ∇ y L WRI ∗ ( m , ˜ y ) , (21)where ∇ m L WRI ∗ and ∇ y L WRI ∗ are given in equations (15) and (16). For simplicity, we will still refer to thereduced problem as WRI ∗ . Note that the computation of equation (21) does not require the diﬀerentiationof ˜ α = ˜ α ( m ), as its expression has been obtained from variable projection. Neglecting the extra correctionterm in equation (21) [analogously to Wang et al., 2016] saves some computation but, as observed in practice,it results in a more laborious line search phase, typically employed in nonlinear optimization methods,particularly in combination with quasi-Newton methods such as l-BFGS [Nocedal and Wright, 2006]. Theevaluation and gradient calculation of the WRI ∗ objective (20) is schematically reported in algorithm 1, andcan be used in combination with any nonlinear optimization solvers.7 lgorithm 1 WRI ∗ objective evaluation and gradient computation. Notation : R : receiver restriction operator A ( m ): wave equation operator Function : objective_WRI Input : (cid:15) : ﬁxed data tolerance d : observed data q : source position/wavelet m : model Algorithm :1. Forward waveﬁeld u = A ( m ) − q and residual r = d − R u

2. Backward waveﬁeld ˜ q = A ( m ) −∗ R ∗ r ( m )3. Compute α = α ( k r k , k ˜ q k , (cid:15) ), equation (19)4. Objective value L = − α k ˜ q k / α k r k − | α | (cid:15) k r k , equation (20)5. Augmented waveﬁeld ˜ u = A ( m ) − ( q + α ˜ q ) and residual ˜ r = d − R ˜ u

6. Gradient g m , ( x ) = − α P t ∂ tt ˜ u ( x , t )˜ q ( x , t ), equation (15)7. Gradient g y = ˜ r − sign( α ) (cid:15) r / k r k , equation (16)8. Backpropagated gradient v = A ( m ) −∗ R ∗ g y

9. Gradient correction g m , = − α P t ∂ tt u ( x , t ) v ( x , t )10. Corrected gradient g m = g m , + g m , , equation (21) Output: L , g m Source-dependent metrics for the wave equation misﬁt

In equation (11), we can exploit the knowledge about the location and distribution of the physical source q ,and design metrics for the wave equation misﬁt that are more suited for the problem at hand. Huang et al.[2018], for example, choose a weighted least-squares norm k p k := k p k Σ = p h Σ − p , p i , for p = q − A ( m ) u ,with a diagonal matrix Σ − / = diag( w ) penalizing functions with a support distributed far from the pointsource location. Here, similarly, we select a penalty grid function w ( x ) = p | x − x s | + h /h dependent on aparameter h tuned to be a fraction of a reference wavelength. With this modiﬁcation, the Lagrangian inequation (14) becomes L WRI ∗ ( m , y ) = − (cid:13)(cid:13)(cid:13) F ( m ) ∗ y (cid:13)(cid:13)(cid:13) − + h y , r ( m ) i − (cid:15) k y k , (22)and only minor corrections are needed for related gradient expressions and dual variable approximation. Inthe experiments in the next section, we tacitly assume this type of source-dependent weighting. Note thatsource weighting cannot be trivially incorporated in FWI, since it is based on the exact solution of the waveequation.Other potentially interesting modiﬁcations of the WRI ∗ objective in equation (14) involve the ‘ norm, k·k ,or weighted versions thereof, to measure the wave equation error. The basic rationale is to impose sparsity onthe spatial distribution of the extended source. We refer to Sharan et al. [2018] for a microseismic application.Following Sharan et al. [2018], we further suggest the so-called elastic net regularization obtained by replacingthe ‘ norm with a relaxation that involves the least-squares norm: k·k := k·k + k·k / (2 µ ). This relaxationis key in obtaining an analytical expression for the dual objective, analogously to equation (14). In this paper,however, we defer the ‘ formulation to future work, and stick to the objective (14).8 omplexity of WRI ∗ Compared to FWI, evaluating and diﬀerentiating (20) requires the computation of twice as many solutions ofthe wave equation, respectively. Despite the increased cost, this still compares favorably to conventional WRI,since we avoid the need to solve for an augmented wave equation and frequency domain solvers. Moreover,as it will be demonstrated with numerical examples, WRI ∗ retains the robustness against local minimacharacteristic of classical WRI, despite the approximations discussed in the previous sections. Results

We test the capabilities of the WRI ∗ inversion method with synthetic examples. We start with the• comparison of WRI ∗ with classical WRI.We remind that WRI ∗ and WRI are equivalent only when the dual variable is solved exactly, as in equation (17)(with the caveat described in that section). In other words, we will test the eﬀect of the dual variableapproximation proposed in equation (19). Note that this set of experiments will be carried out in thefrequency domain, since traditional WRI is therein framed more conveniently from a computational point ofview.Other experiments mainly aim at the empirical demonstrations of the following claims:• WRI ∗ is more robust than FWI with respect to local minima;• WRI ∗ is more robust than FWI with respect to modeling inaccuracies.With modeling inaccuracies, we refer to erroneous assumptions about some of the physical parameters thatwill be kept ﬁxed during inversion, in the context of TTI modeling (tilted transverse isotropy) and inversion. Comparison of WRI and WRI ∗ The aim of this experiment is a qualitative assessment of the dependency of the WRI ∗ scheme with respectto the hyperparameter (cid:15) , appearing in equation (11) (which denotes a given data noise tolerance), and thecomparison with WRI. We emphasize that one of the main diﬀerences between WRI ∗ and classical WRI is asuboptimal — but computationally convenient — approximation of the slack variables y described in (19)(while the optimal y makes WRI ∗ and WRI substantially equivalent). The scope of the example described inthis section is therefore twofolded:• determine the eﬀect of (cid:15) on the WRI ∗ inversion result;• determine the eﬀect of the dual variable approximation in equation (19) by comparing WRI ∗ withtraditional WRI (based on the optimal solution for y via variable projection).The classical WRI formulation is deﬁned by the objective functional in equation (18), where y is obtainedby (10). Note that the experiment of this section will be carried out in the frequency domain, where WRI ismore conveniently formulated.We consider the well-known Marmousi model, presented in Figure 1a. Data are directly generated in thefrequency domain with components ranging from 3 Hz to 14 Hz, with no additional noise (Figure 2). We set100 sources evenly spaced and receivers densely sampled along the surface.9 a)(b) Figure 1: Marmousi model: (a) true velocity model, (b) background model.We invert single-frequency data components with a multiscale approach [Bunks et al., 1995] from low tohigh frequencies, and run 10 iterations of the l-BFGS optimizer per frequency. We compare FWI, WRI ∗ with diﬀerent values for (cid:15) , and conventional WRI (with a ﬁxed choice for the weighting parameter λ inequation 18). The results are gathered in Figure 3.Comparing the results in Figure 3, it is apparent that FWI converges to a local minimum, while both WRI ∗ and conventional WRI produce plausible models. WRI ∗ does not achieve the same resolution of the WRIinversion, although we observe that increasing values for (cid:15) translate to higher model resolution (cf. Figures 3band 3c). This behavior can be understood in the following qualitative terms: lower values for (cid:15) promote awaveﬁeld solution that ﬁts data rather than the wave equation (see, for instance, equation 11), and unphysicalsolutions translate to inaccurate model resolution. This observation might seem in contradiction with the WRIresult shown in Figure 3d: indeed the same considerations just made for WRI ∗ should apply to WRI (where λ plays a similar role of (cid:15) ), and yet the WRI result consists in a much better resolved model. This paradox isdue to the notorious well-conditioning properties of variable projection [Golub and Pereyra, 2003]: WRI ∗ has10 a) (b)(c) (d) Figure 2: Marmousi model data, normalized for display. Data are organized as source/receiver frequencyslices for diﬀerent frequency values. Direct arrivals are here removed by subtraction with the data synthetizedby the background model in Figure 1b.been obtained through only an approximation of the optimal dual variable y , and therefore does not enjoythe same advantages of WRI. In general, however, WRI ∗ conserves the same ability of WRI to circumventlocal minima, and the gap between the two methods can be reduced by either quasi-Newton preconditioningor devising a sequential WRI ∗ -FWI scheme where the WRI ∗ result is sharpened by FWI (which will alsohelp in saving computations once local minima are avoided). An example of the joint WRI ∗ -FWI approach isdepicted in Figure 4, highlighting a qualitatively similar result to conventional WRI in Figure 3d.11 a)(b)(c)(d) Figure 3: Marmousi inversion result for: (a) FWI, (b) WRI ∗ ( (cid:15) = 0 . ∗ ( (cid:15) = 0 . ∗ ( (cid:15) = 0 . Comparison of FWI and WRI ∗ Low-velocity lens

This numerical experiment is geared towards a direct comparison of WRI ∗ and FWI, and demonstrates therelative robustness of WRI ∗ against spurious modes. We consider the Gaussian lens model presented in[Huang et al., 2018], here reproduced in Figure 5. The problem consists of imaging a low-velocity anomalyof Gaussian shape from transmission data, starting with an homogeneous model. Contrary to the previousexample, we run this experiment in time domain. The synthetics are generated with a Ricker source waveletof 10 Hz peak frequency (corresponding to a spatial wavelength of 200 m), with no artiﬁcial noise added. Wemodel 15 sources, and gather data with approximately 200 receivers (evenly spaced). The low-velocity zonegenerates triplications that are apparent in the data shown in Figure 6, which cause cycle skipping.In this experiment, we run 20 iterations of the l-BFGS optimizer [Nocedal and Wright, 2006]. Due to cycleskipping, FWI converges to a local minimum, shown in Figure 7b. We stress the fact both WRI or WRI ∗ needbe used in combination with source weighting (equation 22) to converge to the solution [as also documentedin Huang et al., 2018, Symes, 2020b,a]. The WRI ∗ result (obtained with data tolerance (cid:15) = 0) is depictedin Figure 7a. We observe that WRI ∗ is clearly able to reconstruct the anomaly and is not stuck to a localminimum, contrary to FWI. BG Compass

We focus now not only on how WRI ∗ and FWI handle cycle skipping, but also how inaccurate assumptionsabout the underlying physical model of the data reﬂect on the inversion result. For example, what is eﬀect ofnot accounting for the correct anisotropic model when data come from anisotropic modeling? We show thatWRI ∗ , being based on a relaxation of the wave equation, has the potential to mitigate erroneous assumptions,while FWI will be more heavily aﬀected. We therefore compare the results of WRI ∗ and FWI on the BGCompass model according to two distinct scenarios:• perfect modeling assumptions: both given data and modeled synthetics are obtained from the samephysical model (in particular, acoustic modeling). The inversion parameter is the squared slowness;• inaccurate modeling assumptions: given data are produced with a certain TTI model (tilted transverseisotropy), while synthetics are modeled with an inaccurate TTI model. All the TTI parameters but thesquared slowness are kept ﬁxed during inversion.As in the previous example, this set of experiments is carried out in time domain.13igure 5: Low-velocity lens model. Inversion with correct modeling assumptions

The acoustic (constant density) BG Compass model,here considered, is representative of some of the complexities encounter in the ﬁeld, and leads to a notoriouslychallenging problem for FWI due to the velocity kickback approximately located at the depth of 1 Km (seeFigure 8), delaying turning waves traveling back to the surface.We model 51 sources with a Ricker wavelet (peak frequency 10 Hz), and record data with dense receivercoverage (sampling rate according to the grid spacing), without additional noise. Some shot records areplotted in Figure 9. Note that the frequency components below 5 Hz in the data are ﬁltered out and notused during inversion.The diﬃculties associated with the BG Compass model are hinted in Figure 10, where the squared slownessgradients of WRI ∗ and FWI (computed with respect to the background model in Figure 8b) are compared.These are obtained by ﬁltering the data according to a narrow frequency band centered at 5 Hz [a similarresult can be obtained by working directly in the frequency domain, Peters et al., 2014]. We observe thatFWI produces an erroneous update in the water layer, while WRI ∗ points in the correct direction.For the inversion, we adopt a multiscale strategy [Bunks et al., 1995], by ﬁltering and inverting datacomponents of increasing frequency content, from 5 Hz to 20 Hz. This is achieved by preconditioning bothdata and synthetics. For simplicity, we do not adapt the model grid to the varying frequency. At each ofthese sequential stages, we perform l-BFGS optimization for 20 iterations and use the result as the startingguess for the next step. The process is repeated with multiple sweeps across the frequency range, by repeatingthe multiscale inversion twice and restarting from the lowest frequency when the ﬁrst pass is completed. Theﬁnal results for WRI ∗ and FWI are collected in Figure 11. For reference, see also Peters et al. [2014] forfrequency-domain WRI inversion results on a similar model.Notably, the poor update generated by FWI within the water layer eventually leads to the local minimumin Figure 11b (for FWI, we report the result after only one data sweep). The WRI ∗ result (with data14 a) (b)(c) (d) Figure 6: Low-velocity lens data for diﬀerent horizontal source positions (data are normalized).tolerance (cid:15) = 0) is shown in Figure 11a. Clearly, WRI ∗ is able to correct for the water layer and resolvethe high-velocity/low-velocity sequence at 1 Km depth. The deeper part of the model can be improved,in principle, by further optimization or pseudo-Hessian preconditioning via incident ﬁeld energy [see vanLeeuwen and Herrmann, 2013, for a simple Gauss-Newton scheme]. Inversion with inaccurate modeling assumptions

Here, we consider a TTI anisotropic version of the acoustic BG Compass model, previously shown in Figure 8a.The model is rescaled for practical reasons. The Thomsen parameters ε and δ are synthesized from thevelocity model, and dip angles inferred from the orientation of the layers. The TTI model is presented inFigure 12. 15 a)(b) Figure 7: Inversion results for the Gaussian lens problem: (a) WRI ∗ inversion result, (b) FWI inversionresult. 16 a)(b) Figure 8: BG Compass, acoustic model: (a) velocity model, (b) background model.Data are generated according to the same settings described for the acoustic BG Compass model (note,however, that the TTI model size is diﬀerent). In order to illustrate the eﬀect of anisotropy, we compareacoustic and TTI shot gathers in Figure 13, highlighting arrival times diﬀerences for both turning wave andreﬂection events.The inversion is performed with data generated with a Ricker wavelet of peak frequency 10 Hz (with lowfrequencies ﬁltered out), and the optimization is carried out by 20 iterations of the spectral projected gradientmethod (SPG) of Birgin et al. [2000] (the projection here bounding velocity values). We compare the17 a) (b)(c) (d)

Figure 9: Acoustic data for the BG Compass model (in Figure 8a). The shot gathers are normalized fordisplaying purposes.robustness of the WRI ∗ and FWI results by initializing the inversion with increasingly worse smoothed versionof the true velocity model. The anisotropic parameters are smoothed and kept ﬁxed for all the experimentsand throughout the inversion. The inconsistency between true and assumed TTI parameters will be thesource of modeling inaccuracy with which we test the resilience of the velocity model recovery with WRI ∗ and FWI. An example of TTI model with smoothing level σ = 20 (indicating the variance of Gaussiansmoothing) is depicted in Figure 14.The inversion results for WRI ∗ and FWI are displayed in Figure 15. We notice that the starting modeland wrong assumptions about anisotropy have an eﬀect on WRI ∗ results, especially in resolving the shallowhigh-velocity layer and the deeper part of the model. Despite this, the WRI ∗ results remain relatively stablewith respect to the starting guess, the major diﬀerence being a loss of resolution with depth with worsestarting models. More notably, they are consistently better than the FWI results, which are instead heavily18 a)(b) Figure 10: Gradient comparison for a single frequency component 5 Hz (with respect to homogeneous squaredslowness parameters): (a) WRI ∗ (normalized), (b) FWI (normalized). The FWI gradient seeks to update thewater layer in the wrong direction. 19 a)(b) Figure 11: Acoustic BG Compass inversion result for: (a) WRI ∗ , (b) FWI.aﬀected by this choice. Discussion

The application of modern seismic inversion tools to ﬁeld data has been curbed by the need for tremendouscomputational resources, which are required for wave simulations on extremely large 3D grids. Moreover,20 a)(b)(c)

Figure 12: BG Compass, TTI model: (a) Thomsen parameter ε , (b) Thomsen parameter δ , (c) anisotropydip angle θ . Refer to Figure 8a for the velocity model.21 a) (b) Figure 13: Comparison of acoustic vs TTI data for the BG Compass model (in Figure 12). Anisotropymodiﬁes the dipping and curvature of transmitted and reﬂected events. (a) (b)(c) (d)

Figure 14: BG Compass, TTI starting model: (a) velocity model, (b) Thomsen parameter ε , (c) Thomsenparameter δ , (b) anisotropy dip angle θ . In the inversion, the anisotropic parameters ε , δ , and θ here displayedare kept ﬁxed, and only the velocity is updated. Note that the anisotropic parameters are smoothed versionof the true model in Figure 12. 22 a) (b)(c) (d)(e) (f)(g) (h) Figure 15: BG Compass, TTI inversion results acquired with starting velocity models obtained from smoothingthe ground truth with diﬀerent smoothing levels σ (the anisotropy parameters are kept ﬁxed as in Figures 14b,14c, 14d): (a,b) WRI ∗ and FWI results (respectively) for σ = 20, (c,d) σ = 25, (e,f) σ = 30, (g,h) σ = 40. The physical model underlying the inversion always assume inaccurate anisotropy, but WRI ∗ delivers consistently better result than FWI and remain relatively stable with respect to the starting guess.FWI, on the other hand, does increasingly get worse as the starting model is further from the ground truth.23lassical acoustic models are gradually being discarded in favor of pseudo-acoustic or elastic simulators toaccount for anisotropic phenomena in ﬁeld data. Less simplistic modeling inevitably adds to the computationalcomplexity of the problem. Be that as it may, the oil and gas industry manages to employ FWI almoston a routinely basis, thanks to eﬃcient time-domain simulators, and its business value is starting to beacknowledged. Unfortunately, the success of FWI is highly dependent on a lengthy preprocessing phase aimedat the retrieval of a kinematically accurate background and starting guess for seismic inversion.The main objective of this work is to dispense of the need for accurate velocity models in seismic inversion.Despite the fact that many solutions exist focusing on robustness with respect to local minima, they do notscale eﬃciently with grid size and are not relevant for real-world applications. One such example is WRI.WRI is, nonetheless, an attractive method that is not only robust with respect to the starting guess, but alsohas the potential to handle inexact physical assumptions. Indeed, WRI hinges on a relaxed wave equation,whose solution, however, can only be approximated iteratively in 3D with considerable computational costs.We presented a novel method, WRI ∗ which is both scalable and robust. It is based on a denoising reformulationof WRI, which amounts to the minimization of the wave equation error subject to the data misﬁt being lessthan a known noise level. The resulting saddle-point problem involves the usual model unknowns, such assquared slowness, and Lagrangian multipliers having the same dimension as data. We stress that all thecomputations needed for the evaluation of the Lagrangian and its gradients require modeling in time domain,which can be attained for realistically sized problems (contrary to frequency domain formulations). Since thedual variables can be stored in memory, we might in principle devise a joint optimization scheme. In thispaper, however, we approximate the Lagrangian multipliers with a scaled version of data residual in orderto reduce complexity (a more accurate assessment of the trade-oﬀ between multiplier approximation andreconstruction quality is left for future studies).We demonstrated, through extensive numerical experimentation, that the resulting method is consistentlymore robust than FWI against local minima. We also tested the tolerance of the method toward faultymodeling assumptions, e.g. when seismic data obtained with anisotropic modeling is being matched withpredictions based on the wrong anisotropic model. WRI ∗ , which employs a relaxation of the wave equation,produces better results than FWI, under the same conditions.In general, the results obtained from WRI ∗ do depend on the particular choice of hyperparameters such asthe data noise level (cid:15) , deﬁned in equation (11). This parameter governs the trade-oﬀ between wave equationerror and data misﬁt and is akin to the role of the weighting parameter λ , in equation (5), for conventionalWRI. We ﬁnd that decreasing its value tends to produce lower resolution results. On the other hand, choosinga small value for (cid:15) (or even (cid:15) = 0) results in relaxed physics and is beneﬁcial for local minimum avoidance.In our experience, WRI ∗ retains the same ability of WRI to circumvent local minima, but might produceless qualitative results, especially in terms of resolution. We note that conventional WRI is not as aﬀectedas WRI ∗ by the choice of λ . The fundamental reason for this phenomenon is rooted in the inexact, butrelatively inexpensive, dual variable approximation in equation (19), as opposed to equation (17). Indeed,WRI ∗ does not enjoy the well-conditioning properties of variable projection [Golub and Pereyra, 2003].Taking into account the dependency of the approximated dual variable with respect to the model parameter,as in equation (21), is designed to partially compensate for this behavior [Ablin et al., 2020]. We thenadvocate for either a continuation strategy for (cid:15) , by increasing its value sensibly during the inversion, properpreconditioning [as in the Gauss-Newton approach in van Leeuwen and Herrmann, 2013], or a sequentialapproach consisting of a WRI ∗ phase followed by FWI. Preliminary results prove this strategy successful.Another approach might consist of iteratively solving for the optimal dual variable, but we envisage this canbe only feasible in combination with stochastic optimization, where simultaneous sources are considered.From a computational perspective, WRI ∗ requires roughly twice as much the cost needed for FWI, perobjective evaluation or gradient calculation. Given the gain in robustness previously discussed, we deem thisextra cost acceptable in the many situations where FWI would normally fail, since the result improvementeasily oﬀsets the extra computations. Naturally, hybrid combinations of WRI ∗ and FWI are also possible, as24hown here, to amortize these costs. In general, stochastic optimization techniques can be straightforwardlyincorporated in WRI ∗ , further reducing its complexity [Krebs et al., 2009, van Leeuwen and Herrmann, 2012]. Conclusion

We proposed a seismic inversion method whose main strength is being both robust with respect to localminima and scalable to 3D. Traditional methods are either scalable but prone to cycle skipping, like fullwaveform inversion, or less sensitive to the starting model but unfeasible for large problems, like waveﬁeldreconstruction inversion. Our proposal, despite being a reformulation of waveﬁeld reconstruction inversion,can leverage on modern time-domain solvers, which is key to tackle realistically sized problems. Numericalexperiments indicate that our proposal is consistently superior to full waveform inversion when cycle skippingoccurs. Moreover, it beneﬁts from a relaxed formulation of the wave equation, and is more resilient thanfull waveform inversion with respect to starting models when inaccurate modeling assumptions are made. Acomparison with conventional waveﬁeld reconstruction inversion highlights competitive results, albeit withsome loss of resolution, depending on the choice of hyperparameters like the estimated data noise level. Ithas been shown that these shortcomings can be easily avoided with an hybrid scheme involving full waveforminversion and/or pseudo-Hessian preconditioning. Our method is mature for 3D, but the computational eﬀortrequired is twice as much what is prescribed by full waveform inversion (per gradient calculation). This extracost can be reduced by aforementioned hybrid schemes and stochastic optimization.

Related material

A Julia implementation of the method herein described can be found in the following repository:https://github.com/slimgroup/JUDI.jl/tree/twri/examples/twri

Acknowledgments

This work has been implemented in Julia and leverages on the Devito framework for seismic modeling, whichallows automatic generation of highly-optimized ﬁnite-diﬀerence C code, given only a symbolic representationof the wave equation [Louboutin et al., 2019, Luporini et al., 2018]. From Julia, we interface to Devitothrough the JUDI package [Witte et al., 2019a].

References

Pierre Ablin, Gabriel Peyré, and Thomas Moreau. Super-eﬃciency of automatic diﬀerentiation for functionsdeﬁned as a minimum. Technical report, 2020. URL https://arxiv.org/pdf/2002.03722.H. S. Aghamiry, A. Gholami, and S. Operto. Improving full-waveform inversion by waveﬁeld reconstructionwith the alternating direction method of multipliers.

Geophysics , 84(1):R139–R162, 2019.T. Alkhalifah. An acoustic wave equation for anisotropic media.

GEOPHYSICS , 65(4):1239–1250, 2000. doi:10.1190/1.1444815.W. B. Beydoun and A. Tarantola. First Born and Rytov approximations: Modeling and inversion conditionsin a canonical example.

Journal of the Acoustical Society of America , 83(3):1045–1055, mar 1988. doi:10.1121/1.396537. 25. Biondi and A. Almomin. Simultaneous inversion of full data bandwidth by tomographic full-waveforminversion.

Geophysics , 79(3):WA129–WA140, 2014. doi: 10.1190/geo2013-0340.1.E. G. Birgin, J. M. Martínez, and M. Raydan. Nonmonotone Spectral Projected Gradient Methods on ConvexSets.

SIAM Journal on Optimization , 10(4):1196–1211, 2000. doi: 10.1137/S1052623497330963.A. Bjorck.

Numerical Methods for Least Squares Problems . Society for Industrial and Applied Mathematics,1996. doi: 10.1137/1.9781611971484. URL https://epubs.siam.org/doi/abs/10.1137/1.9781611971484.E. Bozdag, J. Trampert, and J. Tromp. Misﬁt functions for full waveform inversion based on instantaneousphase and envelope measurements.

Geophys. J. Int. , 185(2):845–870, 2011. doi: 10.1111/j.1365-246X.2011.04970.x.C. Bunks, F. M. Saleck, S. Zaleski, and G. Chavent. Multiscale seismic waveform inversion.

GEOPHYSICS ,60(5):1457–1473, 1995. doi: 10.1190/1.1443880.E. Duveneck, P. Milcik, P. M. Bakker, and C. Perkins.

Acoustic VTI wave equations and their application foranisotropic reverse-time migration , pages 2186–2190. 2008. doi: 10.1190/1.3059320.E. Esser, L. Guasch, T. van Leeuwen, A. Y. Aravkin, and F. J. Herrmann. Total-variation regularizationstrategies in full-waveform inversion.

SIAM Journal on Imaging Sciences , 11(1):376–406, 2018. doi:10.1137/17M111328X.G. Golub and V. Pereyra. Separable nonlinear least squares: the variable projection method and itsapplications.

Inverse problems , 19(2):R1, 2003.V. Grechka, L. Zhang, and J. W. III Rector. Shear waves in acoustic anisotropic media.

GEOPHYSICS , 69(2):576–582, 2004. doi: 10.1190/1.1707077.L. Guasch, M. Warner, and C. Ravaut. Adaptive waveform inversion: Practice.

GEOPHYSICS , 84(3):R447–R461, 2019. doi: 10.1190/geo2018-0377.1.E. Haber, U. M Ascher, and D. Oldenburg. On optimization techniques for solving nonlinear inverse problems.

Inverse problems , 16(5):1263, 2000.G. Huang, R. Nammour, and W. W. Symes. Full-waveform inversion via source-receiver extension.

Geophysics ,82(3):R153–R171, 2017.G. Huang, R. Nammour, and W. W. Symes. Volume source-based extended waveform inversion.

Geophysics ,83(5):R369–R387, 2018.H. Knibbe, C. Vuik, and C. W. Oosterlee. Reduction of computing time for least-squares migration based onthe Helmholtz equation by graphics processing units.

Computational Geosciences , 20(2):297–315, 2016.J. R. Krebs, J. E. Anderson, D. Hinkley, R. Neelamani, S. Lee, A. Baumstein, and M.-D. Lacasse. Fastfull-waveﬁeld seismic inversion using encoded sources.

Geophysics , 74(6):WCC177–WCC188, 2009.M. Louboutin, M. Lange, F. Luporini, N. Kukreja, P. A. Witte, F. J. Herrmann, P. Velesko, and G. J.Gorman. Devito (v3.1.0): an embedded domain-speciﬁc language for ﬁnite diﬀerences and geophysicalexploration.

Geoscientiﬁc Model Development

GEOPHYSICS , 56(5):645–653, 1991. doi:10.1190/1.1443081.F. Luporini, M. Lange, M. Louboutin, N. Kukreja, J. Hückelheim, C. Yount, P. Witte, P. H. J. Kelly, F. J.Herrmann, and G. J. Gorman. Architecture and performance of Devito, a system for automated stencilcomputation.

CoRR , abs/1807.03032, jul 2018. URL http://arxiv.org/abs/1807.03032.26. Luporini, M. Lange, M. Louboutin, N. Kukreja, J. Hückelheim, C. Yount, P. A. Witte, P. H. J. Kelly, G. J.Gorman, and F. J. Herrmann. Architecture and performance of devito, a system for automated stencilcomputation.

ACM Trans. Math. Softw. , 46(1), 04 2020. doi: 10.1145/3374916. URL https://slim.gatech.edu/Publications/Public/Journals/ACMTOMS/2020/luporini2018aap/luporini2018aap.pdf. (ACM Trans.Math. Softw.).L. Métivier, R. Brossier, Q. Mérigot, E. Oudet, and J. Virieux. Measuring the misﬁt between seismogramsusing an optimal transport distance: application to full waveform inversion.

Geophys. J. Int. , 205(1):345–377, 2016. doi: 10.1093/gji/ggw014.J. Nocedal and S. Wright.

Numerical optimization . Springer Science & Business Media, 2006.B. Peters and F. J. Herrmann. Constraints versus penalties for edge-preserving full-waveform inversion.

TheLeading Edge , 36(1):94–100, 01 2017. doi: 10.1190/tle36010094.1. (The Leading Edge).B. Peters, F. J. Herrmann, and T. van Leeuwen. Wave-equation Based Inversion with the Penalty Method- Adjoint-state Versus Waveﬁeld- reconstruction Inversion.

Geophysics , 84(2):R251–R269, 2018. doi: 10.1190/geo2018-0192.1.Bas Peters and Felix J. Herrmann. A numerical solver for least-squares sub-problems in 3D waveﬁeldreconstruction inversion and related problem formulations. In

SEG Technical Program Expanded Abstracts ,pages 1536–1540, 09 2019. doi: 10.1190/segam2019-3216638.1. (SEG, San Antonio).R. T. Rockafellar.

Convex analysis . Princeton university press, 1970.S. Sharan, R. Wang, and F. J. Herrmann. Fast sparsity-promoting microseismic source estimation.

GeophysicalJournal International , 216(1):164–181, 2018.C. Shin and Y. H. Cha. Waveform inversion in the Laplace domain.

Geophys. J. Int. , 173(3):922–931, 2008.doi: 10.1111/j.1365-246X.2008.03768.x.C. Shin and Y. H. Cha. Waveform inversion in the Laplace–Fourier domain.

Geophys. J. Int. , 177(3):1067–1079, 2009. doi: 10.1111/j.1365-246X.2009.04102.x.R. M. Shipp and S. C. Singh. Two-dimensional full waveﬁeld inversion of wide-aperture marine seismicstreamer data.

Geophys. J. Int. , 151(2):325–344, 2002. doi: 10.1046/j.1365-246X.2002.01645.x.L. Sirgue and R. G. Pratt. Eﬃcient waveform inversion and imaging: A strategy for selecting temporalfrequencies.

Geophysics , 69(1):231–248, 2004. doi: 10.1190/1.1649391.C. C. Stolk and W. W. Symes. Smooth objective functionals for seismic velocity inversion.

Inverse problems ,19(1):73, 2002.B. Sun and T. Alkhalifah. The application of an optimal transport to a preconditioned data matchingfunction for robust waveform inversion.

Geophysics , 84(6):R923–R945, nov 2019. ISSN 0016-8033. doi:10.1190/geo2018-0413.1.W. W. Symes. Reverse time migration with optimal checkpointing.

Geophysics , 72(5):SM213–SM221, 2007.W. W. Symes. Migration velocity analysis and waveform inversion.

Geophysical Prospecting , 56:765–790,2008. doi: 10.1111/j.1365-2478.2008.00698.x. 27. W. Symes. Waveﬁeld Reconstruction Inversion: an example.

Inverse Problems , mar 2020a. URLhttp://arxiv.org/abs/2003.14181.W. W. Symes. Full Waveform Inversion by Source Extension: Why it works. mar 2020b. URL http://arxiv.org/abs/2003.12538.A. Tarantola and B. Valette. Generalized nonlinear inverse problems solved using the least squares criterion.

Reviews of Geophysics , 20(2):219–232, 1982.P. M. van den Berg and R. E. Kleinman. A contrast source inversion method.

Inverse problems , 13(6):1607,1997.T. van Leeuwen and F. J. Herrmann. Fast waveform inversion without source-encoding.

GeophysicalProspecting , pages 10–19, 2012. doi: https://doi.org/10.1111/j.1365-2478.2012.01096.x.T. van Leeuwen and F. J. Herrmann. Mitigating local minima in full-waveform inversion by expanding thesearch space.

Geophysical Journal International , 195(1):661–667, 2013.J. Virieux and S. Operto. An overview of full-waveform inversion in exploration geophysics.

Geophysics , 74(6):WCC1–WCC26, 2009.C. Wang, D. Yingst, P. Farmer, and J. Leveille. Full-waveform inversion with the reconstructed waveﬁeldmethod. In

SEG Technical Program Expanded Abstracts 2016 , pages 1237–1241. Society of ExplorationGeophysicists, 2016.R. Wang and F. J. Herrmann. A denoising formulation of full-waveform inversion. In

SEG Technical ProgramExpanded Abstracts , pages 1594–1598, 2017a. doi: 10.1190/segam2017-17794690.1.R. Wang and F. J. Herrmann. A denoising formulation of Full-Waveform Inversion. In

SEG TechnicalProgram Expanded Abstracts 2017 , pages 1594–1598. Society of Exploration Geophysicists, 2017b.S. Wang, M. V. de Hoop, and J. Xia. On 3D modeling of seismic wave propagation via a structured parallelmultifrontal direct Helmholtz solver.

Geophysical Prospecting , 59(5):857–873, 2011.M. Warner and L. Guasch. Adaptive waveform inversion: Theory.

GEOPHYSICS , 81(6):R429–R445, 2016.doi: 10.1190/geo2015-0387.1.P. A. Witte, M. Louboutin, N. Kukreja, F. Luporini, M. Lange, G. J. Gorman, and F. J. Herrmann. Alarge-scale framework for symbolic implementations of seismic inversion algorithms in Julia.

GEOPHYSICS ,84(3):F57–F71, 2019a. doi: 10.1190/geo2018-0174.1. URL https://doi.org/10.1190/geo2018-0174.1.P. A. Witte, M. Louboutin, F. Luporini, G. J. Gorman, and F. J. Herrmann. Compressive least-squaresmigration with on-the-ﬂy fourier transforms.

Geophysics , 84(5):R655–R672, 08 2019b. doi: 10.1190/geo2018-0490.1. (Geophysics).P. A. Witte, M. Louboutin, H. Modzelewski, C. Jones, J. Selvage, and F. J. Herrmann. An event-drivenapproach to serverless seismic imaging in the cloud.

IEEE Transactions on Parallel and Distributed Systems ,31(9):2032–2049, 03 2020. doi: 10.1109/TPDS.2020.2982626. URL https://slim.gatech.edu/Publications/Public/Journals/IEEETPDS/2020/witte2019TPDedas/witte2019TPDedas.html. (IEEE Transactions onParallel and Distributed Systems).Y. Yang, B. Engquist, J. Sun, and B. F. Hamfeldt. Application of optimal transport and the quadraticwasserstein metric to full-waveform inversion.