Full waveform inversion using extended and simultaneous sources
Sagi Buchatsky∗ and Eran Treister∗

Abstract.
PDE-constrained optimization problems are often treated using the reduced formulation, where the PDE constraints are eliminated. This approach is known to be more computationally feasible than other alternatives at large scales. However, the elimination of the constraints forces the optimization process to fulfill the constraints at all times. In some problems this may lead to a highly non-linear objective, which is hard to solve. An example of such a problem, on which we focus in this work, is Full Waveform Inversion (FWI), which appears in seismic exploration of oil and gas reservoirs, and in medical imaging. In an attempt to relieve the non-linearity of FWI, several approaches have suggested expanding the optimization search space and relaxing the PDE constraints. This comes, however, with severe memory and computational costs, which we aim to reduce. In this work we adopt the expanded search space approach, and suggest a new formulation of FWI using extended source functions. To make the source-extended problem more feasible in memory and computations, we couple the source extensions in the form of a low-rank matrix. This way, we have a large-but-manageable additional parameter space, which has a rather low memory footprint, and is much more suitable for solving large scale instances of the problem than the full-rank additional space. In addition, we show how our source-extended approach is applied together with the popular simultaneous sources technique, a stochastic optimization technique that significantly reduces the computations needed for FWI inversions. We demonstrate our approaches for solving FWI problems using 2D and 3D models with high frequency data only.
Key words.
Inverse problems, PDE-constrained optimization, Gauss-Newton, Full Waveform Inversion, Trace estimation, Extended Sources, Low rank minimization, Alternating minimization.
AMS subject classifications.
1. Introduction.
Many computational science applications involve parameter estimation of partial differential equations (PDEs), known as inverse problems [15, 47, 38]. One challenging inverse problem is
Full Waveform Inversion (FWI), where we aim to determine the wave velocity, density, and possibly other parameters of a heterogeneous medium, given sources and waveform observations at receiver locations on its boundary [25]. Inverse problems of this type arise in seismic exploration of oil and gas reservoirs, earth sub-surface mapping, ultrasound imaging [8], optical diffraction tomography [39], brain imaging [14] and more. The estimation of the velocity model is usually performed by fitting numerically simulated data to observed field data. This results in an optimization problem that needs to be solved by an iterative descent algorithm, which gradually reduces the misfit between the simulated and field data. This data fitting problem is usually accompanied by a suitable regularization that aims to introduce prior information on the estimated coefficients [40]. In this paper we focus on FWI in the context of seismic exploration, where the wave velocity of the earth subsurface is estimated. This problem has gained popularity in the last decade with the advances in data acquisition techniques, computing power and numerical algorithms [27, 12, 21, 37, 10, 45, 44, 25]. Solving the FWI problem is challenging in two main aspects, both of which may be relevant to other inverse problems. First, it is an ill-posed and highly non-convex problem, especially if the sources and receivers are placed on the same surface. Then the problem typically has multiple minima, and convergence to a local minimum leads to a non-plausible estimated model. Many approaches have been proposed for solving FWI; however, it is still considered difficult to solve, as solution techniques can be unstable, converging to local minima in many scenarios.

∗ Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel ([email protected], [email protected])
Some recent works suggest to complement FWI with other more stable inverse problems, such as travel time tomography [24, 41] or electromagnetic inversion [19]. Other approaches can be found in [10, 22]. Such techniques are useful in some cases, but still, FWI is considered challenging and not robust enough in real life. In addition to being ill-posed, FWI is highly computationally expensive, both in memory usage and computations. The problem typically involves a large number of sources and several frequencies, and requires multiple solutions of the forward problem to simulate the data, each of which is challenging by itself. These numerical simulations are obtained either by propagating the wave equation (in the time domain), or by solving the Helmholtz equation (in the frequency domain). Both options introduce computational challenges due to the scale and properties of the problem. The aim of this work is to help deal with both of the aforementioned challenges, non-linearity and computational cost, and make the FWI solution process more robust and at the same time more computationally feasible, involving fewer forward simulations.
Relieving the non-linearity by enlarging the search space.
Our first objective in this work is to relieve some of the non-linearity in FWI. Recent works [2, 20, 46] have demonstrated that relaxing the PDE constraints and enlarging the search space may allow us to avoid local minima, and to find plausible solutions of FWI from rather arbitrary initial guesses. This can be obtained by introducing the full waveforms as variables, or by extending the point-wise sources to the full domain. However, this comes with a severe memory footprint in 3D, coming from the introduction of the full waveforms or extended sources as variables of the optimization problem. Both options introduce several hundreds (or more) vectors of unknowns to keep track of and iterate on as part of the solution process, each at the size of the 3D domain. As our first contribution in this paper, we suggest a new formulation of FWI that has the additional search parameters in the form of low-rank and sparse extended source functions. This way, we relax the PDE constraints, while at the same time keeping the inversion manageable in memory (comparable to the reduced FWI version), by using low-rank source extensions and sparsity promoting penalties.
Reducing the computational complexity.
To further reduce the computational complexity of the extended inversion problem, we adopt a stochastic optimization technique called simultaneous sources [18]. In this approach we reduce the number of sources involved in each iteration of the optimization, by applying a randomized trace estimation technique [7] to the data misfit term. While standard stochastic techniques are based on random subsets of the misfit terms (sources), in the simultaneous sources technique we project all the sources onto a smaller dimension using a random matrix. This way, the sources are randomly mixed, and the inversion is not biased towards any geometry of the sources' locations as in the subset-based methods [18]. Therefore, the simultaneous sources technique was found to be more effective than the subset-of-sources technique in general PDE-constrained optimization like DC-resistivity [32, 30], and FWI in particular [43]. However, as it is, the simultaneous sources technique is not suitable for the experimental setting of typical real-life FWI scenarios. That includes common settings where, for example, each source only affects part of the receivers, or sporadically missing data due to malfunctioning receivers, or cases where the measurement noise parameters (the co-variance matrices) are not similar for all the sources. The particular case of sporadically missing data was addressed in [31] using a smooth data completion regularization for the DC-resistivity problem. This solution, however, is
2. Preliminaries and background.
To model the waveforms for FWI in the frequency domain, we consider the discrete acoustic Helmholtz equation as a forward problem, assuming a constant density medium:

(2.1)    $H(\mathbf{m}, \omega)\,\mathbf{u} = \Delta_h \mathbf{u} + \omega^2\, \mathbf{m} \odot \mathbf{u} = \mathbf{q}_s.$

The symbol $\Delta_h$ represents the discretized Laplacian on a nodal regular grid, $\mathbf{u} = \mathbf{u}(\mathbf{m}, \omega, \mathbf{x}_s)$ is the discrete wavefield, $\mathbf{m} > 0$ is the model of the medium (the squared slowness), $\omega$ is the frequency, and the symbol $\odot$ denotes the Hadamard product. The source $\mathbf{q}_s$ is assumed to be a discretization of a delta function, $\delta(\mathbf{x} - \mathbf{x}_s)$, that is located at $\mathbf{x}_s$. The equation is discretized on a finite domain and is accompanied with absorbing boundary conditions on all sides of the domain, and a possible attenuation term [27]. At high wave-numbers, the Helmholtz linear system (2.1) is very challenging to solve numerically: it is indefinite, highly ill-conditioned, and it requires a very fine mesh to accurately capture the wave behavior. To solve FWI, we require multiple solutions of (2.1), which makes the solution of the inverse problem highly expensive and cumbersome [41], in addition to the other difficulties mentioned earlier. In FWI, sources are located at many locations on the top part of the grid, and the waveform that is generated by each source is recorded at locations where receivers are placed. In our formulation, each observed data sample corresponding to a source and frequency $\omega$ is given by

(2.2)    $\mathbf{d}^{obs}(\omega, \mathbf{x}_s) = \mathbf{P}_s^\top \mathbf{u}(\mathbf{m}_{true}, \omega, \mathbf{x}_s) + \boldsymbol{\epsilon},$

where $\mathbf{P}_s$ is a sampling matrix that measures the wavefield $\mathbf{u}$ generated by the source at $\mathbf{x}_s$ at the locations of the receivers. The data contain noise $\boldsymbol{\epsilon}$, which we assume to be i.i.d. Gaussian with zero mean and (co-)variance $\Sigma$. Given data for many sources and several frequencies, we aim to estimate the true model, $\mathbf{m}_{true}$, by minimizing the difference between the measured and simulated data, obtained by solving (2.1) across all the sources and frequencies.
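To make the forward problem concrete, here is a minimal 1D sketch of assembling and solving a system of the form (2.1); `helmholtz_1d` is an illustrative name, and Dirichlet boundaries stand in for the absorbing boundary conditions used in the paper.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def helmholtz_1d(m, omega, h):
    """Assemble H(m, omega) = Delta_h + omega^2 * diag(m) on a 1D grid.

    The Hadamard term omega^2 * (m ⊙ u) becomes a diagonal matrix acting
    on u. Dirichlet boundaries are used for brevity; the paper instead
    uses absorbing boundaries and a possible attenuation term.
    """
    n = m.size
    lap = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2
    return (lap + omega**2 * sp.diags(m)).tocsc()

# A point source (discretized delta) at one node, and its wavefield.
n, h, omega = 101, 0.01, 40.0
m = np.ones(n)                    # homogeneous squared-slowness model
q = np.zeros(n); q[50] = 1.0 / h  # discrete delta at the domain center
u = spla.spsolve(helmholtz_1d(m, omega, h), q)
```

In 2D or 3D only the Laplacian stencil changes; the $\omega^2\,\mathbf{m}\odot\mathbf{u}$ term remains a diagonal contribution.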
This may be done by solving the PDE-constrained optimization problem

(2.3)    $\min_{\mathbf{m}_L \leq \mathbf{m} \leq \mathbf{m}_H,\ \{\mathbf{u}_{sj}\}_{j=1,s=1}^{n_f,n_s}} \ \Phi_{Constrained}(\mathbf{m}, \{\mathbf{u}_{sj}\}) = \sum_{j=1}^{n_f} \sum_{s=1}^{n_s} \left\| \mathbf{P}_s^\top \mathbf{u}_{sj} - \mathbf{d}^{obs}_{sj} \right\|^2_{\Sigma_{sj}^{-1}} + \alpha R(\mathbf{m})$
         s.t. $H(\mathbf{m}, \omega_j)\,\mathbf{u}_{sj} = \mathbf{q}_s, \quad s = 1, \ldots, n_s, \ j = 1, \ldots, n_f,$

where $\mathbf{u}_{sj}$ is the waveform for source $s$ and frequency $\omega_j$, which is predicted for a given model $\mathbf{m}$ according to the forward problem (2.1). The data terms $\mathbf{d}^{obs}_{sj} = \mathbf{d}^{obs}(\omega_j, \mathbf{x}_s)$ are the corresponding observed data as in (2.2). We use the upper and lower bounds $\mathbf{m}_H > \mathbf{m}_L > 0$, and a regularization term $R(\mathbf{m})$, which is accompanied by a parameter $\alpha > 0$. These promote prior information in the inversion, and help us find a reasonable solution to the otherwise ill-posed problem [47, 38]. To this end, we assume $\mathbf{m}$ to be a layered model, and choose $R$ to promote piece-wise smooth functions, like the total variation regularization term [33]. We elaborate on these choices later in the results section.

Since we have many sources and several frequencies, the solution of the inverse problem at large scales requires parallel/distributed software and resources. Typically, the solution will be distributed to workers according to frequencies, sources, and even sub-domains, and some components (e.g., the gradient for $\mathbf{m}$) will be assembled on a master worker or a small group of workers. For more information, and an open-source code, see [34] and references therein.

As stated before, reaching local minima is a major problem in solving FWI. The first step in trying to avoid local minima is adopting a method called frequency continuation, which has proven to be effective in [28, 27]. Frequency continuation is obtained by first approximately solving the problem (2.3) (or one of the equivalent formulations that we show later) using only the lower frequencies in its cost function, to build a smooth approximation of the velocity model. We then add more and more frequencies and approximately solve the problems, each time starting from the previously obtained model, until we cover all of the frequencies. Algorithm 2.1 summarizes the "window-wise" frequency continuation approach that we use in this work, where we consider a window of frequencies each time, instead of all of them. Furthermore, in this work we use the Gauss-Newton (GN) method to approximately minimize the corresponding $\Phi$ in every frequency continuation iteration [28]. We briefly present the projected GN later, and refer the reader to [34, 41] for a more detailed description of GN using similar notation. Projected Conjugate Gradients (CG) is used for the inner Newton problems.
Algorithm 2.1
Frequency continuation.
Input: frequencies $\omega_1 < \ldots < \omega_{n_f}$; $ws$: window size of frequencies that we work on each time; $i_{start}, i_{end}$: initial and final frequency indices to consider, $i_{start} \leq i_{end} \leq n_f$.
Initialize $\mathbf{m}$ by some reference model (or have it from a previous cycle).
for $i = i_{start} : i_{end}$ do
    Approximately solve $\Phi$ using data for $\omega_{\max\{i-ws,1\}}, \ldots, \omega_i$, starting from the previous model $\mathbf{m}$.
end

While giving good results, the frequency continuation has a major obstacle: starting with a good initial guess $\mathbf{m}_{ref}$ is critical for an accurate estimation. Finding such an $\mathbf{m}_{ref}$ is not a trivial task, since our data does not correspond to low enough frequencies because of acquisition limitations. Some of the previously mentioned methods aim exactly at finding the initial guess, while in this work we offer a more general approach to try to escape local minima efficiently.

One of the main approaches to solve (2.3) considers $\mathbf{m}$ and all the fields $\mathbf{u}_{sj}$ as variables, and handles the PDE constraints explicitly for all $s, j$ [17, 25]. However, in real life 3D scenarios, solving the problem this way is computationally demanding both in terms of memory and computations. The forward problem (2.1) requires a very fine mesh of hundreds of millions of grid points for each $\mathbf{u}_{sj}$. The inverse problem (2.3) typically includes several hundreds of sources $\mathbf{q}_s$ and tens of frequencies $\omega_j$. Handling the additional $\mathbf{u}_{sj}$ variables imposes severe memory issues at large scales, since these vectors are huge, and there are many of them. Manipulating them iteratively in the optimization process is quite cumbersome at such large scales.

Another common approach to solve (2.3) is by eliminating the PDE constraints, setting

(2.4)    $\mathbf{u}_{sj} = \mathbf{u}_{sj}(\mathbf{m}) = H(\mathbf{m}, \omega_j)^{-1} \mathbf{q}_s.$
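The window-wise loop of Algorithm 2.1 can be sketched as follows; `approx_solve` is a hypothetical stand-in for the inner Gauss-Newton solver, and the 0-based indexing is an implementation choice.

```python
def frequency_continuation(m0, freqs, ws, i_start, i_end, approx_solve):
    """Window-wise frequency continuation (a sketch of Algorithm 2.1).

    freqs        : increasing frequencies omega_1 < ... < omega_{n_f}
                   (0-based here).
    ws           : window size of frequencies used at each step.
    approx_solve : hypothetical stand-in for approximately minimizing the
                   FWI objective over a frequency window, starting from
                   the previous model (e.g., a few Gauss-Newton steps).
    """
    m = m0
    for i in range(i_start, i_end + 1):
        window = freqs[max(i - ws, 0): i + 1]  # omega_{max{i-ws,1}},...,omega_i
        m = approx_solve(m, window)
    return m
```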
This results in the reduced unconstrained (with respect to the PDEs) optimization problem

(2.5)    $\min_{\mathbf{m}_L \leq \mathbf{m} \leq \mathbf{m}_H} \ \Phi_{Reduced}(\mathbf{m}) = \sum_{j=1}^{n_f} \sum_{s=1}^{n_s} \left\| \mathbf{P}_s^\top H(\mathbf{m}, \omega_j)^{-1}\mathbf{q}_s - \mathbf{d}^{obs}_{sj} \right\|^2_{\Sigma_{sj}^{-1}} + \alpha R(\mathbf{m})$

for the model $\mathbf{m}$ only. This problem is equivalent to (2.3), but it imposes different solution techniques. In particular, using traditional methods like GN [28], it is possible to handle large instances of (2.5) by occasional use of the disk [41], instead of frequently manipulating the many large vectors $\mathbf{u}_{sj}$. However, while the unconstrained (2.5) is more feasible to handle than (2.3) at large scales, it is highly non-linear. This non-linearity partially stems from the elimination of the PDE constraints: the optimization process is forced to fulfill those constraints at all times. Consequently, [46] suggested to relax these constraints, that is, to allow the fields $\mathbf{u}_{sj}$ not to satisfy the PDE (2.1), which is instead weakly enforced by a penalty term using the $\ell_2$ norm. The resulting penalized problem is given by

(2.6)    $\min_{\mathbf{m}_L \leq \mathbf{m} \leq \mathbf{m}_H,\ \{\mathbf{u}_{sj}\}_{j=1,s=1}^{n_f,n_s}} \ \Phi_{Penalized}(\mathbf{m}, \{\mathbf{u}_{sj}\}) = \sum_{j=1}^{n_f} \sum_{s=1}^{n_s} \left\| \mathbf{P}_s^\top \mathbf{u}_{sj} - \mathbf{d}^{obs}_{sj} \right\|^2_{\Sigma_{sj}^{-1}} + \beta \sum_{j=1}^{n_f} \sum_{s=1}^{n_s} \left\| H(\mathbf{m}, \omega_j)\,\mathbf{u}_{sj} - \mathbf{q}_s \right\|^2_2 + \alpha R(\mathbf{m}).$

This problem has penalty terms which essentially replace the PDE constraints, weighted by a parameter $\beta > 0$. The $\ell_2$ penalty method was further developed (along with suitable handling of bound-constraints and regularization) in [1, 2, 3] using the alternating direction method of multipliers (ADMM), which relieves the user from choosing a large $\beta$ to fulfill the constraints to high accuracy. These approaches, like the all-at-once method to solve (2.3), include the iterative updates of the vectors $\{\mathbf{u}_{sj}\}$, which require a lot of memory and are almost impractical in certain scenarios.
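As a sanity check on the reduced formulation, the following dense sketch evaluates a single-source, single-frequency term of (2.5) and its gradient via the adjoint-state trick, assuming a real symmetric $H$, identity noise covariance, and a factor of $1/2$ for convenience; all names and sizes are illustrative.

```python
import numpy as np

def reduced_misfit_and_grad(m, omega, q, P, d, lap):
    """One source/frequency term of the reduced misfit (2.5), sketched.

    A dense toy with identity noise covariance, a factor 1/2 for
    convenience, and a real symmetric H = lap + omega^2 * diag(m).
    The gradient uses the adjoint-state trick: with u = H^{-1} q and
    v = H^{-1} P r, one gets dPhi/dm = -omega^2 * (u * v).
    """
    H = lap + omega**2 * np.diag(m)
    u = np.linalg.solve(H, q)      # forward solve
    r = P.T @ u - d                # data residual at the receivers
    v = np.linalg.solve(H, P @ r)  # adjoint solve (H is symmetric here)
    return 0.5 * r @ r, -omega**2 * u * v

# Tiny example with a well-conditioned stand-in for the Laplacian.
rng = np.random.default_rng(0)
n = 8
B = rng.standard_normal((n, n))
lap = B + B.T + 10.0 * np.eye(n)   # symmetric, well-conditioned
m = 1.0 + rng.random(n)
q = rng.standard_normal(n)
P = np.eye(n)[:, :3]               # 3 receivers
d = rng.standard_normal(3)
phi, grad = reduced_misfit_and_grad(m, 1.0, q, P, d, lap)
```

The gradient can be verified against a finite-difference approximation of the misfit, which is a standard check for adjoint-state implementations.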
A different approach, albeit in the same spirit as the penalty approach in the context of this work, enlarges the search space of the problem by extending the point-sources $\mathbf{q}_s$ [20]. Even though this work suggested the approach in a time-domain formulation, the equivalent formulation in the frequency domain may result in the problem

(2.7)    $\min_{\mathbf{m}_L \leq \mathbf{m} \leq \mathbf{m}_H,\ \{\mathbf{z}_s\}_{s=1}^{n_s}} \ \Phi_{ExtSrc}(\mathbf{m}, \{\mathbf{z}_s\}_{s=1}^{n_s}) = \sum_{s=1}^{n_s} \sum_{j=1}^{n_f} \left\| \mathbf{P}_s^\top H(\mathbf{m}, \omega_j)^{-1}\mathbf{z}_s - \mathbf{d}^{obs}_{sj} \right\|^2_{\Sigma_{sj}^{-1}} + \beta \sum_{s=1}^{n_s} \| \mathbf{z}_s \|^2_{W_s} + \alpha R(\mathbf{m}).$

Here, the extended sources $\mathbf{z}_s$ are introduced as variables, and are penalized by a weighted $\ell_2$ norm. The weight matrices $W_s$ do not penalize $\mathbf{z}_s$ at the source locations $\mathbf{x}_s$; hence setting the extended sources to be the original ones, i.e., $\mathbf{z}_s = \mathbf{q}_s$, does not penalize the objective. This is another form of enlarging the search space and relaxing the PDE constraints. Similarly to before, the variables $\mathbf{z}_s$ are large and typically we have too many of them in 3D. We also note that here, we need to choose $\beta$ sufficiently large so that the extension of the sources vanishes at the end of the optimization. This method showed very promising results, but is expensive. In this work we wish to make it more applicable at large scales. We note that compared to (2.3) or [46], we have only $n_s$ unknown vectors $\mathbf{z}_s$ in addition to $\mathbf{m}$, instead of the $n_s \cdot n_f$ vectors in the fields $\mathbf{u}_{sj}$. The downside of the extended sources approach compared to (2.3) or [46] is that the forward problem (2.1) needs to be solved repeatedly for the source variables $\mathbf{z}_s$. As mentioned before, one of the most effective ways to reduce the computational cost of solving inverse problems is by simultaneous sources, which is obtained by trace estimation [18].
That is, given a matrix $A \in \mathbb{R}^{m \times n}$, we can approximately calculate its Frobenius norm by [7]:

(2.8)    $\| A \|^2_F = \mathrm{trace}(A^\top A) = \mathbb{E}_\mathbf{x} \| A\mathbf{x} \|^2_2 \approx \frac{1}{p} \sum_{i=1}^{p} \| A\mathbf{x}_i \|^2_2,$

where each $\mathbf{x}_i \in \mathbb{R}^n$ is chosen from a Rademacher distribution: each element is randomly chosen from the set $\{-1, 1\}$ with equal probability. We can reformulate this in matrix notation: $\frac{1}{p}\sum_{i=1}^{p} \| A\mathbf{x}_i \|^2_2 = \frac{1}{p}\| AX \|^2_F$, where $X$ is an $n \times p$ matrix whose column $i$ is $\mathbf{x}_i$. In practice we calculate the norm of an $m \times p$ matrix instead of an $m \times n$ matrix, which can save significant computations when $p \ll n$.

To apply the simultaneous sources technique, we wish to approximate the data misfit terms in formulations like (2.3) or (2.5) by applying (2.8). However, this is possible only when the operators $\mathbf{P}_s$ in these formulations do not depend on $s$, i.e., when all the receivers record the waveforms from all the sources. The same goes for the error (co-)variance matrices $\Sigma_{sj}$. Such requirements are not met in many realistic scenarios. The work of [23] handled this issue using a quite general approach: by splitting the data term. In this approach a new set of data variables $\hat{\mathbf{d}}_{sj}$ is introduced, defined on the union of the supports of $\mathbf{P}_s$ for all $s$, that is, the union of the locations of all the receivers for all the sources. Next, $\hat{\mathbf{d}}_{sj}$ should be similar to the observed data $\mathbf{d}^{obs}_{sj}$ at the locations of the receivers that actually record the waveform for each source $s$.
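A quick numerical illustration of the estimator (2.8), with Rademacher probes drawn via NumPy; the sizes are arbitrary.

```python
import numpy as np

def frob_norm_sq_estimate(A, p, rng):
    """Estimate ||A||_F^2 = E_x ||A x||_2^2 as in (2.8), using p probes.

    X is an n-by-p Rademacher matrix (i.i.d. +-1 entries), and the
    estimate is ||A X||_F^2 / p.
    """
    X = rng.choice([-1.0, 1.0], size=(A.shape[1], p))
    return np.linalg.norm(A @ X, 'fro')**2 / p

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
exact = np.linalg.norm(A, 'fro')**2
approx = frob_norm_sq_estimate(A, 30, rng)  # 30 probes instead of n = 200
```

The estimator is unbiased, and its variance shrinks as the number of probes $p$ grows.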
To this end, (2.5), for example, is reformulated as

(2.9)    $\min_{\mathbf{m}_L \leq \mathbf{m} \leq \mathbf{m}_H,\ \{\hat{\mathbf{d}}_{sj}\}_{j=1,s=1}^{n_f,n_s}} \ \Phi_{ReducedSplit}(\mathbf{m}, \{\hat{\mathbf{d}}_{sj}\}) = \sum_{j=1}^{n_f} \sum_{s=1}^{n_s} \left\| \mathbf{P}^\top H(\mathbf{m}, \omega_j)^{-1}\mathbf{q}_s - \hat{\mathbf{d}}_{sj} \right\|^2_{\Sigma_j^{-1}} + \eta\, \| \hat{\mathbf{P}}_s^\top \hat{\mathbf{d}}_{sj} - \mathbf{d}^{obs}_{sj} \|^2_{\Sigma_{sj}^{-1}} + \alpha R(\mathbf{m}),$

where $\mathbf{P}$ is an operator that projects a vector $\mathbf{u}$ onto the union of the receivers' locations, and $\hat{\mathbf{P}}_s$ is defined to choose the subset of receivers for source $s$ out of that union (that is, such that $\mathbf{P}_s^\top = \hat{\mathbf{P}}_s^\top \mathbf{P}^\top$). The parameter $\eta > 0$ is chosen such that $\hat{\mathbf{d}}_{sj}$ will be similar to $\mathbf{d}^{obs}_{sj}$ at the support of the receivers of each source $s$. Minimizing (2.9) with respect to $\hat{\mathbf{d}}_{sj}$ is trivial, and minimizing it with respect to $\mathbf{m}$ is similar to (2.5), but can be applied by stochastic trace estimation more efficiently. Hence, techniques such as alternating minimization or variable projection [6] are highly favorable here. Adding a regularization to $\hat{\mathbf{d}}_{sj}$ like in [23] might improve this formulation and is also a subject of research.

To apply the simultaneous sources technique to (2.9), we first select the size of the subspace we want (denoted by $p$), then define a random matrix $X$ of size $n_s \times p$ from a Rademacher distribution. By (2.8) we obtain

(2.10)    $\sum_s \| \mathbf{P}^\top H(\mathbf{m}, \omega_j)^{-1}\mathbf{q}_s - \hat{\mathbf{d}}_{s,j} \|^2_{\Sigma_j^{-1}} = \| \mathbf{P}^\top H(\mathbf{m}, \omega_j)^{-1}Q - \hat{D}_j \|^2_{\Sigma_j^{-1}} \approx \frac{1}{p}\| \mathbf{P}^\top H(\mathbf{m}, \omega_j)^{-1}QX - \hat{D}_j X \|^2_{\Sigma_j^{-1}},$

where $Q$ is the matrix whose columns are all the sources $\mathbf{q}_s$, and $\hat{D}_j$ is the matrix of all data unknowns $\hat{\mathbf{d}}_{sj}$. This effectively reduces the number of sources to $p$ in every iteration. Following this splitting, we henceforth present our methods assuming that $\mathbf{P}_s = \mathbf{P}$ and $\Sigma_{sj} = \Sigma_j$ for all the sources $s$, and ignore the extra data term in (2.9).
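The source-mixing step (2.10) can be sketched as follows, with a dense matrix standing in for $\mathbf{P}^\top H(\mathbf{m},\omega_j)^{-1}$ and identity noise covariance; all sizes are hypothetical.

```python
import numpy as np

def mixed_misfit(T, Q, D, p, rng):
    """Approximate the per-frequency summed misfit as in (2.10).

    T : dense stand-in for P^T H(m, omega_j)^{-1} (receivers x grid).
    Q : sources as columns; D : (split) data, one column per source.
    A Rademacher matrix X mixes the n_s sources into p simultaneous
    sources, so only p (instead of n_s) forward solves are needed.
    """
    X = rng.choice([-1.0, 1.0], size=(Q.shape[1], p))
    return np.linalg.norm(T @ (Q @ X) - D @ X, 'fro')**2 / p

rng = np.random.default_rng(1)
T = rng.standard_normal((10, 40))                 # hypothetical sizes
Q = rng.standard_normal((40, 64))                 # 64 point sources
D = T @ Q + 0.01 * rng.standard_normal((10, 64))  # noisy data
exact = np.linalg.norm(T @ Q - D, 'fro')**2
approx = mixed_misfit(T, Q, D, 16, rng)           # 16 mixed sources
```

In the real problem the product $T(QX)$ is where the savings occur: each of the $p$ mixed columns requires one Helmholtz solve, versus $n_s$ solves for the full misfit.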
3. Robust FWI with low-rank and sparse extended sources.
In this work we aim to relax the PDE constraints without introducing full (memory consuming) variables such as $\mathbf{u}_{sj}$, aiming for a problem that is similar to the more memory-friendly (2.5) rather than to (2.3) or (2.7). To this end, we consider source extensions $\mathbf{z}_s$ as variables. In our formulation, the new sources are $\mathbf{q}_s + \mathbf{z}_s$ (or $Q + Z$ in matrix notation), and we aim that at the end of the reconstruction, $\mathbf{z}_s$ will vanish. However, since we have many sources, this again introduces a significant amount of extra variables, like (2.7), which we wish to prevent. To achieve that, we couple all the extended sources together, and define all of them as a low-rank matrix. Also, we use the $\ell_1$ norm as a penalty, and aim for the source extensions to be sparse, to further reduce the memory consumption. Because the vectors $\mathbf{q}_s$ are sparse (they are a discrete $\delta$ function), they do not require a lot of memory; we wish the same for the extensions $\mathbf{z}_s$. The $\ell_1$ norm gives a higher penalty to small non-zero numbers compared to the $\ell_2$ norm, and encourages sparse results with a high number of zeros.

More explicitly, we use the matrix notation of (2.10), and define the source extension matrix as $Z = Z_1 Z_2$, where $Z_1 \in \mathbb{C}^{N \times n_{es}}$ and $Z_2 \in \mathbb{C}^{n_{es} \times n_s}$ are the matrices that form the low-rank decomposition of $Z$. Here $N$ is the total number of grid nodes in the domain, $n_s$ is the number of original sources, and $n_{es} < n_s$ is the maximal rank of the extended sources matrix $Z$ (essentially, the number of extended sources). $n_{es}$ is chosen to be small enough such that $Z_1$ can reasonably fit in memory, and the computations involving $Z_1$ are reasonable. On the other hand, $Z_1$ needs to be rich (wide) enough to allow the relaxation of the PDE constraints. This is a balance that we need to manage.
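A back-of-the-envelope comparison of the storage needed for a full-rank extension matrix $Z$ as in (2.7) versus the factors $Z_1, Z_2$ of the low-rank formulation, for hypothetical 3D problem sizes:

```python
# Storage for the source extensions: full-rank Z as in (2.7) versus the
# low-rank factors Z1 (N x n_es) and Z2 (n_es x n_s).
# The sizes below are hypothetical, with complex128 entries (16 bytes).
N = 300 * 300 * 100     # grid nodes of a 3D mesh
n_s = 512               # number of original sources
n_es = 16               # chosen rank of Z = Z1 @ Z2

full_rank_bytes = N * n_s * 16
low_rank_bytes = (N * n_es + n_es * n_s) * 16
print(f"full rank: {full_rank_bytes / 2**30:.1f} GiB, "
      f"low rank: {low_rank_bytes / 2**30:.2f} GiB")
```

For these sizes, the low-rank factors need roughly $n_s / n_{es}$ times less memory than the full-rank matrix, before even accounting for the sparsity of $Z_1$.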
Our optimization problem (in matrix notation) becomes:

(3.1)    $\min_{\mathbf{m}_L \leq \mathbf{m} \leq \mathbf{m}_H,\ Z_1, Z_2} \ \Phi_{LowRankExtSrc}(\mathbf{m}, Z_1, Z_2) = \sum_{j=1}^{n_f} \left\| \mathbf{P}^\top H(\mathbf{m}, \omega_j)^{-1}(Q + Z_1 Z_2) - D^{obs}_j \right\|^2_{\Sigma_j^{-1}} + \beta_1 \| Z_1 \|_1 + \beta_2 \| Z_2 \|^2_F + \alpha R(\mathbf{m}),$

where $\beta_1, \beta_2 > 0$ are penalty parameters that allow the $Z_1, Z_2$ matrices to change and reduce the objective. The penalty term for $Z_2$ is the Frobenius norm, and the penalty for $Z_1$ is the element-wise $\ell_1$ norm to promote sparsity in $Z_1$. This stems from memory considerations, to help cases where the required rank of $Z$ needs to be high. This way, the PDE constraints are relaxed, since we remove the demand to strictly satisfy the PDE constraints for the original sources $Q$. The advantage is that we achieve this without increasing the required memory by a lot. The closely related extended sources approach of [20], presented in (2.7), uses a full rank $Z$ and a weighted $\ell_2$ norm instead of an $\ell_1$ norm. These differences have a significant practical implication in 3D, as the low-rank structure and sparsity of $Z$ are beneficial because of computational and memory considerations.

We solve the objective (3.1) by alternating minimization (ALM), where we alternate between the approximate minimization of $\mathbf{m}$, $Z_1$, and $Z_2$ in turns. Minimizing for $\mathbf{m}$ is equivalent to minimizing (2.5), and can be obtained by the LBFGS [29] or GN methods. Minimizing for $Z_2$ can be obtained directly, as it is a minimization of a rather small quadratic function (details are given later). Minimizing for $Z_1$ can be done efficiently using iterative re-weighted least squares (IRLS) or proximal methods (in particular, proximal CG [48, 42]). Alg. 3.1 presents the ALM algorithm, which is integrated into the standard frequency continuation procedure in Alg. 2.1.

Algorithm 3.1
Frequency continuation with alternating minimization.
Input: frequencies $\omega_1 < \ldots < \omega_{n_f}$; $ws$: window size of frequencies that we work on each time; $i_{start}, i_{end}$: initial and final frequency indices to consider, $i_{start} \leq i_{end} \leq n_f$.
Initialize $\mathbf{m}^{(0)}$ by some reference model.
Initialize $Z_1$ randomly (if there is no previous solution).
for $i = i_{start} : i_{end}$ do
    for $j = 1 : iter_{ALM}$ do
        1. Solve for $Z_2$ given $Z_1$.
        2. Apply CG iterations for $Z_1$.
        3. Solve for $Z_2$ given $Z_1$.
        4. Apply a GN iteration for $\Phi(\mathbf{m})$ using data for $\omega_{\max\{i-ws,1\}}, \ldots, \omega_i$, starting from the previous model $\mathbf{m}$.
        5. Update $\beta_1, \beta_2$ as in Section 3.2.
    end
end

3.1.1. Solving for m using Gauss-Newton. The solution of (3.1) with respect to $\mathbf{m}$ is hardly influenced by the extended sources, and is similar to the solution of (2.5). At each iteration $k$ of GN we obtain the linear approximation

(3.2)    $\mathbf{u}_{s,j}(\mathbf{m}^{(k)} + \delta\mathbf{m}) \approx \mathbf{u}_{s,j}(\mathbf{m}^{(k)}) + J_{s,j}(\mathbf{m}^{(k)})\, \delta\mathbf{m},$

where $\mathbf{u}_{sj}(\mathbf{m}) = H(\mathbf{m}, \omega_j)^{-1}(\mathbf{q}_s + \mathbf{z}_s)$, and $J_{s,j} = \nabla_\mathbf{m} \mathbf{u}_{s,j}$ is the Jacobian matrix of the data for every source $s$ and frequency $\omega_j$. Here $\mathbf{z}_s$ is the $s$-th column of $Z_1 Z_2$. At real-life scales, the Jacobian matrix cannot be stored in memory [16], but we can apply matrix-vector products with this matrix and its conjugate transpose. At each step, we place (3.2) in (3.1) and solve a quadratic minimization by an iterative solver where only matrix-vector products with the Jacobians are computed. To obtain the GN step $\delta\mathbf{m}$, we first compute the gradient of (3.1), which is given by

(3.3)    $\nabla_\mathbf{m} \Phi(\mathbf{m}^{(k)}) = \sum_{s,j} J_{sj}(\mathbf{m}^{(k)})^\top\, \mathbf{P}\, \Sigma_j^{-1} \big(\mathbf{P}^\top \mathbf{u}_{sj}(\mathbf{m}^{(k)}) - \mathbf{d}^{obs}_{sj}\big) + \alpha \nabla_\mathbf{m} R(\mathbf{m}^{(k)}),$

and then we approximately solve the linear system $\mathbf{H}\, \delta\mathbf{m} = -\nabla_\mathbf{m} \Phi(\mathbf{m}^{(k)})$, where the Gauss-Newton Hessian is defined by

(3.4)    $\mathbf{H} = \sum_{s,j} J_{sj}(\mathbf{m}^{(k)})^\top\, \mathbf{P}\, \Sigma_j^{-1}\, \mathbf{P}^\top J_{sj}(\mathbf{m}^{(k)}) + \alpha \nabla^2 R(\mathbf{m}^{(k)}).$

Once the linear system is approximately solved, the model is updated, $\mathbf{m} \leftarrow \mathbf{m} + \mu\, \delta\mathbf{m}$, where $\mu$ is a line search parameter chosen such that the objective function is sufficiently decreased at each iteration (the Armijo rule). The Jacobian matrix required in (3.3)-(3.4) is given by

(3.5)    $J_{sj}(\mathbf{m}) = -\omega_j^2\, H(\mathbf{m}, \omega_j)^{-1}\, \mathrm{diag}\big(H(\mathbf{m}, \omega_j)^{-1}(\mathbf{q}_s + \mathbf{z}_s)\big).$

Assuming that the fields $\mathbf{u}_{sj}$ are stored in memory (or on disk), the multiplication of this matrix with a vector requires one forward solution (for each pair of source and frequency). The fields can be stored in rather low precision [41].

3.1.2. Solving for $Z_2$. The minimization of (3.1) with respect to $Z_2$ can be obtained directly, as this is a small-sized quadratic minimization problem. Denote the residual $R_j$ and temporary matrix $T_j$ by

(3.6)    $R_j = D^{obs}_j - \mathbf{P}^\top H(\mathbf{m}, \omega_j)^{-1} Q, \qquad T_j = \mathbf{P}^\top H(\mathbf{m}, \omega_j)^{-1} \in \mathbb{C}^{n_r \times N},$

where $n_r$ is the number of receivers. The solution for $Z_2$ is given by solving $n_s$ small linear systems of size $n_{es} \times n_{es}$ with the same symmetric and positive definite matrix:

(3.7)    $Z_2 = \Big( \sum_j (T_j Z_1)^* \Sigma_j^{-1} (T_j Z_1) + \beta_2 I \Big)^{-1} \sum_j (T_j Z_1)^* \Sigma_j^{-1} R_j.$

The residuals $R_j$ and matrices $(T_j Z_1)$ are computed as part of the update for $Z_1$, and both are kept fixed in this part. We note that $Z_2$ is easily computed given $Z_1$; hence, there is no need to keep its iterative state or initialize it.

3.1.3. Solving for $Z_1$. When limited to $Z_1$ only, problem (3.1) is a quadratic minimization with an $\ell_1$ penalty, which is also called the least absolute shrinkage and selection operator (LASSO) regression. This problem was originally suggested in [35] for seismic inversion like here, but went on to be highly popular in other applications, mainly signal processing [48]. It is a well understood problem with a variety of available solvers, like IRLS, proximal CG [42] or SEquential Subspace OPtimization (SESOP) [48]. These three are the most suitable methods in our context, because they allow us to apply the relevant matrices as "black-box" operators, some of which include the solution of (2.1). Again, using the residuals $R_j$ and matrices $T_j$ in (3.6), the problem for $Z_1$ is given by:

(3.8)    $\sum_{j=1}^{n_f} \| T_j Z_1 Z_2 - R_j \|^2_{\Sigma_j^{-1}} + \beta_1 \| Z_1 \|_1.$
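A small dense sketch of solving such an $\ell_1$-penalized least-squares problem with IRLS, using a generic system $A\mathbf{z} \approx \mathbf{r}$ in place of the matrix-free operators of the paper, and direct solves in place of preconditioned CG:

```python
import numpy as np

def irls_lasso(A, r, beta, iters=20, eps=1e-6):
    """Dense sketch of IRLS for min_z ||A z - r||_2^2 + beta * ||z||_1.

    As in the reweighting idea of the paper, each sweep replaces ||z||_1
    by the weighted norm ||z||^2_W with W = diag(1 / (|z| + eps)), and
    then solves the resulting regularized least-squares problem. Here we
    solve it directly; the paper applies a few preconditioned CG
    iterations with matrix-free operators instead.
    """
    AtA, Atr = A.T @ A, A.T @ r
    z = np.linalg.solve(AtA + 1e-8 * np.eye(A.shape[1]), Atr)  # LS start
    for _ in range(iters):
        w = beta / (np.abs(z) + eps)   # IRLS weights for the l1 term
        z = np.linalg.solve(AtA + np.diag(w), Atr)
    return z

# Tiny sparse-recovery demo: 3 active entries out of 30.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 30))
z_true = np.zeros(30); z_true[[3, 11, 25]] = [1.0, -2.0, 1.5]
r = A @ z_true
z = irls_lasso(A, r, beta=0.01)
```

The reweighting drives the entries that should be zero strongly toward zero, while barely penalizing the large entries, which is exactly the sparsity-promoting behavior wanted for $Z_1$.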
Because of the coupling, we get a single linear system inside the $\ell_1$ norm for all the unknowns in $Z_1$. That is, this is not a block linear system with multiple right-hand-sides, like (3.7). Using the $\otimes$ symbol for the Kronecker product, the matrix equation in (3.8) can also be vectorized as:

(3.9)    $\sum_{j=1}^{n_f} \| (Z_2^\top \otimes T_j)\, \mathrm{vec}(Z_1) - \mathrm{vec}(R_j) \|^2_{\Sigma_j^{-1}} + \beta_1 \| \mathrm{vec}(Z_1) \|_1,$

where $\mathrm{vec}()$ denotes the column-stacking of any matrix into a vector. In this work we solve the LASSO minimization using IRLS, where at each ALM iteration for $Z_1$ in Alg. 3.1 we replace the $\ell_1$ norm with a weighted $\ell_2$ norm

(3.10)    $\| \mathrm{vec}(Z_1) \|_1 \rightarrow \| \mathrm{vec}(Z_1) \|^2_W, \quad \text{where} \quad W = \mathrm{diag}\{ (|\mathrm{vec}(Z_1)| + \varepsilon)^{-1} \},$

where $|\mathrm{vec}(Z_1)|$ denotes the stacked vector of absolute entries of $Z_1$. This way, the gradient of the temporary IRLS objective smoothly approximates the gradient of (3.9). Then, the IRLS objective is approximately minimized by a few iterations of standard preconditioned CG for the normal equations, which directly minimizes the IRLS approximation of (3.9) in each of its iterations. For efficiency, the operators for CG are computed in matrix form as follows:

(3.11)    $OP(Z_1) = T_j Z_1 Z_2, \qquad OP^*(R) = T_j^* R\, Z_2^*,$

where the conjugate transposed operator $OP^*(R)$ is equivalent to

(3.12)    $(Z_2^\top \otimes T_j)^*\, \mathrm{vec}(R) = (\bar{Z}_2 \otimes T_j^*)\, \mathrm{vec}(R) = \mathrm{vec}(T_j^* R\, Z_2^*)$

in vectorized form. We apply a few such CG iterations (specifically, about 5 each time) as an approximate minimization for $Z_1$ inside the innermost ALM algorithm. As a preconditioner, we use the matrix $W$ in (3.10). The solution for $Z_2$ and $Z_1$ and the update for $\mathbf{m}$ are repeated $iter_{ALM}$ times in Algorithm 3.1. In this process, the LU factorization or preconditioner of $H$ needed to multiply the matrices $T_j$ with vectors is obtained as part of the iteration for $\mathbf{m}$.

3.2. Updating the regularization parameters. The regularization parameters $\beta_1$ and $\beta_2$ play a significant role in the extended sources framework.
They control how much we keep the problem (3.1) close to the original one (2.5). On the one hand, we wish to allow the matrix $Z = Z_1 Z_2$ to have significant enough values to influence the optimization process. On the other hand, we wish to keep this process close to reality, where the extensions do not exist, since they are only artificial. The work [20] chose the parameter $\beta$ in (2.7) to be fixed, so initially, when the misfit is high with respect to $\mathbf{m}$, this parameter is relatively low, and as $\mathbf{m}$ improves with the iterations, $\beta$ becomes more significant with respect to $Z$. As one can expect, the same considerations apply to our $\beta_1$ and $\beta_2$. As a rule of thumb, the work [20] chose $\beta$ such that the misfit with the extended sources is about half of the misfit without the extended sources (if setting $Z = 0$). This is obviously a quite strong penalty, as the full $Z$ can easily zero out the misfit in (3.1) for almost any $\mathbf{m}$ if not penalized. That is part of the motivation for this work. By setting a high-enough $\beta$, the work [20] essentially limits $Z$. Here, we limit it by using a low-rank structure that is much more favorable computationally, even though it does introduce some algorithmic complications, as discussed above. In our framework, we keep the ratio between $\beta_1$ and $\beta_2$ fixed, at a factor of 100 chosen based on trial and error. Throughout the iterations, we change $\beta_1, \beta_2$ together to keep the ratio between the misfits (with and without extended sources) at approximately 0.5. More explicitly, to keep the ratio in the interval $[r_1, r_2]$ we apply the following rule:

(3.13)    If $\frac{\mathrm{misfit}(Z_1 Z_2)}{\mathrm{misfit}(0)} > r_2$, then $\beta_1, \beta_2 \leftarrow \beta_1/\gamma,\ \beta_2/\gamma$;
          else if $\frac{\mathrm{misfit}(Z_1 Z_2)}{\mathrm{misfit}(0)} < r_1$, then $\beta_1, \beta_2 \leftarrow \beta_1 \cdot \gamma,\ \beta_2 \cdot \gamma$.

In this work we choose $r_1$ and $r_2$ around 0.5 (with $r_2 = 0.5$), and $\gamma$ slightly larger than 1.

3.3. Computational cost. The cost of the entire inversion is dominated by 1) the cost of the GN iterations for $\mathbf{m}$, and 2) the cost of the IRLS-CG iterations for $Z_1$. The cost of the minimization for $Z_2$ is quite negligible compared to the cost of the other two. Below we provide details regarding each of these components.

The cost of each GN iteration is entirely dominated by the costs of the forward solves, and that is also the dominant cost in standard reduced FWI in (2.5). At each gradient and Hessian-vector multiplication, we need to solve the forward problem (2.1) twice: once to compute the Jacobian in (3.5), and once for its adjoint, where the adjoint Helmholtz equation is solved (assuming that the fields $\mathbf{u}_{sj}$ are stored). Given $\mathbf{m}^{(k)}$, if possible (e.g., in 2D) we can factorize the matrices $H(\omega_j)$ for all frequencies using a direct solver [4, 36], and then only need to apply the forward-backward substitutions for each source. Solutions in 3D typically require using iterative solvers with effective preconditioners (e.g., [13, 26]) to be computationally efficient. The preconditioner often dominates the computational cost. The preconditioner setup, like the LU factorization, is applied only once per iteration. It is clear that the cost of GN is controlled by the number of frequencies we consider in a frequency continuation window, and more importantly, the number of sources. To summarize:

(3.14)    $\mathrm{cost(GN)} = \sum_{\mathrm{frequencies}} \mathrm{cost(LinSetup)} + iter\mathrm{(GN.CG)} \cdot 2 \cdot n_s \cdot \mathrm{cost(LinSolve)},$

where $iter\mathrm{(GN.CG)}$ is the number of inner CG iterations in GN. Setting up the sources, i.e., multiplying $Z_1 Z_2$, may be costly if the rank of the source extension $n_{es}$ is not low, as we choose here. In terms of memory, the footprint of the GN iterations is given by

(3.15)    $\mathrm{mem(GN)} = \sum_{\mathrm{frequencies}} \mathrm{mem(LinSetup)} + n_s \cdot O(N),$

where mem(LinSetup) is the memory requirement of the Helmholtz solver, and $N$ is the forward mesh size.
The O(N) storage mainly reflects the storage of the fields u_sj that are needed in the sensitivity computations in Eq. (3.5). If the fields are not stored, multiplying the sensitivities with a vector requires twice the number of forward simulations.

The cost of the minimization for Z_1, per extended source, is similar to that of GN. The matrices T_j that are defined in (3.6) and needed in (3.11) typically cannot be stored in memory. Hence, for example, to apply OP to Z_1 in (3.11), we need to apply a forward simulation for each pair of extended source and frequency. This shows that the low-rank structure of the source extension Z_1 Z_2 is crucial for the method to be computationally attractive. Note that the solution for Z_1 does not require the solver or preconditioner setups for the Helmholtz operators, and can reuse the ones computed as part of GN. To summarize:

(3.16)    cost(Z_1 solve) = Σ_{frequencies} iter_{IRLS.CG} · 2 · n_es · cost(LinSolve),

where iter_{IRLS.CG} is the number of inner CG iterations in IRLS. In terms of memory, the footprint of the Z_1 minimization is given by

(3.17)    mem(Z_1 solve) = Σ_{frequencies} mem(LinSetup) + n_es · O(N),

which can be expensive if n_es is large. In particular, in [20] the number of extended sources equals the number of sources n_s as in (2.7), and the cost is proportional to n_s instead of n_es. In this analysis we neglect the memory saving that we can exploit from the sparsity of Z_1, since the peak memory of the sparse solvers can reach a high percentage of the unknowns, and should be carefully controlled. In any case, this is only crucial if n_es is high.

Comparison:
As one can observe, the costs of the two dominant components of the algorithm are controlled by similar factors: the number of CG iterations and the number of sources involved. If, as we expect, the total number of sources n_s is significantly larger than the rank of the extended sources n_es, then the additional computations and memory for the inversion following the low-rank source extension are low.
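The comparison can be made concrete with a small tally of linear solves implied by (3.14) and (3.16). The Python sketch below is purely illustrative; the unit costs, parameter values, and helper names are hypothetical, not part of the paper's framework:

```python
def gn_iteration_cost(n_freq, iter_cg, n_src, setup=1.0, solve=1.0):
    # (3.14): per GN iteration, one solver setup per frequency plus two
    # linear solves (forward + adjoint) per source for each inner CG iteration
    return n_freq * (setup + iter_cg * 2 * n_src * solve)

def z1_solve_cost(n_freq, iter_cg, n_es, solve=1.0):
    # (3.16): the Z1 solve reuses the GN setups, so only solves are counted,
    # two per extended source per IRLS-CG iteration
    return n_freq * iter_cg * 2 * n_es * solve

# With n_s = 136 sources and a rank n_es = 16 extension (Marmousi-like sizes),
# the Z1 solve adds only a small fraction of the per-iteration GN cost,
# whereas a full-rank extension (n_es = n_s) would roughly double it.
gn = gn_iteration_cost(n_freq=4, iter_cg=5, n_src=136)
low_rank = z1_solve_cost(n_freq=4, iter_cg=5, n_es=16)
full_rank = z1_solve_cost(n_freq=4, iter_cg=5, n_es=136)
```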
4. FWI using both extended and simultaneous sources.
To further ease the computational cost of the inversion, we effectively reduce the number of sources in the misfit at each GN iteration using the simultaneous sources technique presented in Section 2.5. Here we describe how to combine it with the low-rank extended sources objective (3.1). We note that, as far as we know, no work describes the combination of simultaneous sources with the standard extended sources (2.7). The task is not straightforward since, in some sense, the simultaneous sources technique compactly "summarizes" the many sources Z into a few by ZX. However, if all those many sources in Z are part of the inversion unknowns, then it is not clear how to update them based on their compactly estimated version without investing the computations for all of them. A standard update for a full Z costs proportionally to n_s forward linear solves, which is exactly the type of computation that we wish to prevent.

Basically, we wish to combine equations (2.10) and (3.1). Given a random matrix X ∈ R^{n_s × p}, the combination leads to the objective

(4.1)    min_{m_L ≤ m ≤ m_H, Z_1, Z_2} Φ_LowRankExtSimSrc(m, Z_1, Z_2; X) =
         (1/p) Σ_{j=1}^{n_f} ‖P^⊤ H(m, ω_j)^{−1} (Q + Z_1 Z_2) X − D_j^obs X‖_F^2 + β_1 ‖Z_1‖_1 + (β_2/p) ‖Z_2 X‖_F^2 + α R(m).

In (4.1), QX, D_j^obs X, and Z_2 X replace Q, D_j^obs, and Z_2, respectively, similarly to the standard simultaneous sources method in (2.10). By frequently changing X in (4.1), we can essentially solve (3.1) at a reduced cost. To use the simultaneous sources approach, we choose a new dimensionality reduction matrix X in step 1 of Algorithm 3.1, and apply the ALM iteration with updates over m, Z_1 and Z_2 based on (4.1).
We apply the following steps, which are similar to the ones described before. To solve for m, the GN method is applied in the same way as before, only with the low-rank source extension (Q + Z_1 Z_2) X, which is easily computed in memory thanks to the sparsity of Q and the low-rank structure of Z_1 Z_2. This part would be computationally similar also if we used standard FWI with extended sources in (2.10).

To account for Z_2, we compute its counterpart Ẑ_2 = Z_2 X by direct minimization. That is, Ẑ_2 replaces Z_2 in (3.1), and just like Z_2 is state-less (directly computed and not updated iteratively), Ẑ_2 is also state-less. Moreover, it is computed using the same formula as (3.7), only now with the reduced residual R̂_j = R_j X instead of R_j. Hence, we do not keep track of Z_2 in the inversion, and compute its reduced version Ẑ_2 directly at every iteration, given X.

The solution for Z_1 remains the same as the minimization of (3.8), only with Ẑ_2 and R̂_j given above instead of Z_2 and R_j, respectively. We choose the dimension of the simultaneous sources and extended sources to be similar (p ≈ n_es), so that the minimization for Z_1, even though only approximated, will not over-fit the data that is reduced by the multiplication with X. Because the column dimension of Z_1 is n_es, the cost of the update for Z_1 is similar to (3.16). As in the case of simultaneous sources with standard FWI, the real saving is in the GN iterations. That is, we have p sources instead of n_s, so compared to (3.14) we have

(4.2)    cost(GN-SS) = Σ_{frequencies} [cost(LinSetup) + iter_{GN.CG} · 2 · p · cost(LinSolve)],

where iter_{GN.CG} is the number of CG iterations in GN. Memory-wise, we have

(4.3)    mem(GN-SS) = Σ_{frequencies} mem(LinSetup) + p · O(N).

In short, if p ≈ n_es, the solution cost for Z_1 is proportional to that of the GN iterations for m using simultaneous sources. This is what we wanted to achieve here.
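The reason the reduced misfit in (4.1) tracks the full one is the trace-estimation property of the random matrix X. A small Python sketch (with a Rademacher X and arbitrary illustrative sizes) demonstrates it:

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, p, n_r = 136, 16, 50              # sources, reduced dimension, receivers

# Rademacher entries are a common choice for the sampling matrix X
X = rng.choice([-1.0, 1.0], size=(n_s, p))

R = rng.standard_normal((n_r, n_s))    # stand-in for a residual matrix R_j
R_hat = R @ X                          # reduced residual, as in R_hat_j = R_j X

# (1/p) * ||R X||_F^2 is an unbiased estimator of ||R||_F^2, since
# E[X X^T] = p * I; this is why the reduced misfit matches in expectation
est = np.linalg.norm(R_hat, "fro") ** 2 / p
full = np.linalg.norm(R, "fro") ** 2
```

The same reduction is what lets Ẑ_2 = Z_2 X stand in for Z_2: it is exactly the quantity produced by the direct formula when fed the reduced residuals.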
5. Numerical Results.
In this section we demonstrate our low-rank extended sources approach, with and without the simultaneous sources technique, and compare it to standard FWI for velocity model reconstruction. For the purpose of demonstration, we do not augment the inversions with other modalities or techniques known in the literature. In principle, such techniques can be used in addition to our approach in more complicated real-life experiments. We conduct our experiments on two 2D models: one is the SEG/EAGE salt model [5], the other is the Marmousi model [11]. For each model we include three experiments. The first is a reconstruction using standard FWI, obtained by minimizing the reduced formulation (2.5) using a few sweeps of Algorithm 2.1. The second experiment is FWI with the low-rank extended sources formulation according to (3.1), using Algorithm 3.1. The last experiment involves a reconstruction using both the extended and simultaneous
sources techniques, according to (4.1), again using Algorithm 3.1, but this time with a new random matrix X at every ALM iteration, as explained in Section 4. For all the settings and models we display the resulting reconstructed model, and the misfit at each iteration of the GN (or ALM) methods during the frequency continuation sweeps. That misfit is computed for all the sources and frequencies, and without the source extensions, regardless of the frequency window or method used.

In addition, we demonstrate the feasibility of our method in 3D by performing an experiment on part of the SEG/EAGE Overthrust model [5]. In particular, we wish to demonstrate that a 3D FWI experiment with extended sources can be carried out on a rather standard workstation in terms of memory and computations.

All the code in these experiments was written in the Julia language [9], as part of the open source jInv framework [34]. Some critical parts of the code, like the LU solver (forward and backward substitution) and the matrix-vector products for the forward modelling, are written in C++ and parallelized using shared-memory OpenMP. The experiments were computed on a workstation with two Intel Xeon Gold 5117 2GHz CPUs (14 cores per socket) and 256 GB RAM, running the CentOS 7 Linux distribution.

The objective functions in the FWI formulations in Equations (2.5), (3.1), and (4.1) contain a regularization term R(m). Based on the work [41] we apply two regularization functions. One is a high-order regularization called spline smoothing, given by

(5.1)    R(m) = ‖Δ_h (m − m_ref)‖_2^2.

The goal of this regularization is to create a smooth model from high-frequency data. The reference model m_ref is set to be the initial guess for the inversion, and while using this regularization we keep m_ref fixed.
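A minimal Python sketch of a discretized version of (5.1) follows; the 5-point Laplacian stencil, the squared ℓ2 norm, and the zeroed boundary treatment are simplifying assumptions:

```python
import numpy as np

def laplacian_2d(d, h=1.0):
    """5-point discrete Laplacian on interior points (edges left at zero)."""
    L = np.zeros_like(d)
    L[1:-1, 1:-1] = (d[2:, 1:-1] + d[:-2, 1:-1] + d[1:-1, 2:] + d[1:-1, :-2]
                     - 4.0 * d[1:-1, 1:-1]) / h**2
    return L

def spline_smoothing_reg(m, m_ref, h=1.0):
    # R(m) = ||Delta_h (m - m_ref)||^2 as in (5.1)
    return float(np.sum(laplacian_2d(m - m_ref, h) ** 2))
```

The penalty vanishes for any affine perturbation of m_ref and grows with the curvature of the update, which is what drives the inversion toward smooth models when the data are high-frequency.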
We use (5.1) to obtain a good smooth model, so that the rest of the process results in a plausible reconstruction. The second regularization function is a standard diffusion regularization

(5.2)    R(m) = ‖∇_h (m − m_ref)‖_2^2,

where ∇_h represents a discretized gradient on a nodal grid. When using this regularization we change m_ref at each GN/ALM iteration to be the resulting model. Updating m_ref encourages the model to change more at each iteration, resulting in faster convergence. In this problem, (5.2) results in a sharp reconstruction, and is suitable to use after an initial smooth model is constructed. A total variation regularizer is another approach to promote a sharper, piece-wise constant/smooth reconstruction.

In this section we describe the general setup of our inversion algorithms, which is common to all experiments; more specific details are provided for each experiment separately. For the inversions we use the frequency continuation strategies in Algorithms 2.1 or 3.1: we use 3-4 sweeps of the corresponding version, depending on the setting and model, with a window size of 4 frequencies for all the sweeps. At first, we apply frequency continuation sweeps to create an initial smooth guess, using the smoothing high-order regularization (5.1). The remaining sweeps are responsible for sharpening the model towards the true one, hence we use the standard regularization (5.2). The continuation sweeps with the smoothing regularization start from the first frequency and end at the fourth frequency (i_start = 1, i_end = 4). The next sweeps, using (5.2), start from the fourth frequency and go up to the last one. The number of GN or ALM iterations differs for each inversion and is noted for each experiment separately. In each GN iteration we apply 5 projected and preconditioned CG iterations to approximately solve the inner Newton problem. As a preconditioner for these iterations, we use the inverse of the Hessian of the smoothing regularization terms. This way, the low number of CG iterations also plays a role in the regularization, and the inversion is not overly sensitive to the choice of α (see [15] for more details on this technique). Once a direction is found, we apply a standard Armijo linesearch. To solve for Z_1 in either (3.1) or (4.1), we also apply 5 (quadratic) CG iterations at each update for Z_1 in an ALM iteration. Throughout all the relevant iterations, the sparsity level of Z_1 (the percentage of non-zero values) was on the order of 2%.

Fig. 1: The 2D Marmousi velocity model and initial guess in [km/s]. (a) True velocity model. (b) The reference initial model.

Our first set of experiments is conducted using the Marmousi model presented in Fig. 1a. For this experiment we use a grid size of 550 ×
200, representing an area about 9 km across. We place 136 sources and 549 receivers, which are uniformly spread along the top of the grid. The data is generated by solving (2.1) for nine frequencies ω_j = 2πf_j, with the addition of 1% Gaussian i.i.d. noise. The initial model for all the inversions is given in Fig. 1b, and is initially used as m_ref in the smoothing regularization.

For all the Marmousi experiments we run three sweeps of frequency continuation: one sweep over the first 4 frequencies using the smoothing regularization (5.1), and two additional sweeps over all the frequencies using the regularization (5.2). All sweeps are run with 10 GN or ALM iterations for each outer frequency continuation iteration. This results in a total of 140 GN iterations. Then, we applied up to 100 additional GN iterations involving the four highest frequencies only, without source extensions. The additional iterations are needed to sharpen the reconstructed model, especially for the extended sources versions.

The reconstructed model for the standard FWI experiment is shown in Fig. 2a, and the misfit history plot is shown in Fig. 3a. This result converged to a local minimum where the misfit value (2.6) equals 6,957, and did not reconstruct the model properly. We note that in all the convergence plots, the misfit values are computed after the inversion is over, for all the sources and frequencies (independently of the frequency continuation schedule), and without the source extensions.
For the next experiment we added the extended sources technique (Section 3) to the first frequency continuation sweep, using ALM (Alg. 3.1). The extended sources were used only in the first sweep, since they help with obtaining a good initial smooth guess, and once we have such a guess
(a) Standard FWI. (b) FWI with extended sources. (c) FWI with extended and simultaneous sources.
Fig. 2: Marmousi model reconstruction using the different FWI formulations.

Standard FWI | FWI + SS | FWI + ES | FWI + ES + SS
164,560 | 19,360 | 230,296 | 48,416

Table 1: Comparison of computational costs for the Marmousi experiments.

the standard FWI achieves good results. We set the rank of the source extensions to be n_es = 16 (that is, the number of columns in Z_1). In this experiment we chose the initial β parameters to be β_1 = 0.1 and β_2 = 10. Fig. 2b shows the reconstructed model and Fig. 3b shows the misfit history of the inversion. The experiment results in a good reconstruction of the model, which is closer to the true model than the one obtained with standard FWI. The final misfit value is 1,961, less than a third of the value of the standard FWI.

To complete the experiments for the Marmousi model, we applied the joint extended and simultaneous sources approach (Section 4), using the same settings as in Sec. 5.3.2. We chose the random matrix X in (2.10) to be of size n_s × p, where p = 16, so that in all the GN/ALM iterations the number of sources is 16. We chose this value through trial and error, aiming to keep p small. The reconstructed model, shown in Fig. 2c, is very similar to the result in Fig. 2b. This similarity is also evident in the misfit history in Fig. 3c. This shows that the addition of simultaneous sources did not damage the effectiveness of the extended sources alone in converging to a better minimum.

To compare the computational costs of the experiments, we count the number of forward simulations (2.1) that we solve in the first frequency continuation sweep of each experiment. Since the forward simulations are the most computationally expensive part of the algorithm, this gives a good comparison between the different algorithms. The results in Table 1 demonstrate that even with the low-rank structure, the
(a) Standard FWI. (b) FWI with extended sources. (c) FWI with extended and simultaneous sources.
Fig. 3: Misfit value history, using all frequencies and original sources only, for the Marmousi model reconstruction.

(a) True SEG/EAGE velocity model. (b) The initial reference model.
Fig. 4: The 2D SEG/EAGE salt velocity model and initial guess in [km/s].

addition of the extended sources (denoted FWI+ES) increases the computational costs quite significantly compared to standard FWI, and especially compared to FWI with simultaneous sources (denoted FWI+SS). That is partially because of the residual computation (R_j in Eq. (3.6)), which is obtained for the full set of sources Q. However, with the addition of the simultaneous sources (denoted FWI+ES+SS) the cost is reduced significantly. The most computationally effective algorithm is FWI with simultaneous sources only (FWI+SS), which we do not demonstrate here because its result is similar to that of standard FWI. Here we demonstrate that our FWI+ES+SS version incurs only about 2.5 times the cost of FWI+SS, which, just like standard FWI, does not involve an expanded search space and is therefore less robust.

Our second batch of experiments involves the SEG/EAGE salt model, presented in Fig. 4a. The model is described by a 600 ×
300 grid, representing an area about 13 km across. We placed 119 sources and 599 receivers, uniformly spread at the top of the domain. The data are generated by first solving (2.1) for nine frequencies ω_j = 2πf_j.
We then add 1% Gaussian noise to the data. The initial model for all the inversions is given in Fig. 4b, and is initially used as m_ref in the smoothing regularization.

Fig. 5: SEG/EAGE model reconstruction using the different FWI formulations. (a) Standard FWI. (b) FWI with extended sources. (c) FWI with extended and simultaneous sources.

For the SEG/EAGE model we apply four frequency continuation sweeps: two using the first 4 frequencies with the regularization (5.1), and two using all frequencies with the regularization (5.2) to obtain a sharp model. For the first two sweeps we used 20 GN iterations per outer iteration (with 7 CG iterations in each inner iteration), and for the last two we used 15 GN iterations (with 5 CG iterations in each, as in the rest of the configurations). To finalize the inversion, we applied up to 100 additional GN iterations involving the four highest frequencies, without source extensions where relevant. These are used to sharpen the reconstructed model, especially for the extended sources versions.
The reconstructed model for the standard FWI experiment is shown in Fig. 5a, and the misfit history plot is shown in Fig. 6a. The FWI seems to reconstruct only the upper part of the salt block, and is missing the lower part. This is a typical behavior of FWI that we wish to overcome. The final misfit value here was 2,278, and the iterations stagnated.
In this experiment we applied our inversion strategy with the extended sources approach, which was applied in the first continuation sweep only. Here we chose the initial β parameters to be β_1 = 0.01 and β_2 = 1. The reconstructed model is shown in Fig. 5b, and Fig. 6b shows the misfit history of the inversion. The addition of the extended sources resulted in recovering the whole salt block, while keeping the extended sources at low rank. The misfit value at the last iteration was 453, much lower than with standard FWI.

In the last experiment for SEG/EAGE, we use both extended and simultaneous sources, with the same parameters as in the previous section. We chose the matrix X in (2.10) to be of size n_s × p, with p = 16. As before, we see that the reconstructed model, shown in Fig. 5c, is very similar to the result in Fig. 5b. Furthermore, the misfit plots for the current run (Fig. 6c) and the previous run (Fig. 6b) are similar as well, with final misfits of 591 versus 453, respectively. As in the Marmousi experiment, the addition of the simultaneous sources hardly affected the final reconstruction, and improved the computational efficiency.

(a) Standard FWI. (b) FWI with extended sources. (c) FWI with extended and simultaneous sources.
Fig. 6: Misfit value history, using all frequencies and original sources, for the SEG/EAGE model reconstruction.

Standard FWI | FWI + SS | FWI + ES | FWI + ES + SS
285,600 | 38,400 | 418,102 | 108,640

Table 2: Computational costs comparison for the SEG/EAGE experiments.
To compare the computational costs involved in the SEG/EAGE experiments for the different algorithms, we again count the number of forward simulations (2.1) applied in the first frequency continuation sweep. The results in Table 2 show the same trend as in the Marmousi case: the extended sources increase the computational costs, while the addition of the simultaneous sources reduces the cost drastically.
Our final experiment is applied to the central part of the 3D SEG/EAGE Overthrust model, presented in Fig. 7a. The model is discretized on a grid with 172 nodes in each lateral dimension, representing the center of the original model. We place 289 sources and 1,849 receivers, which are spread uniformly in a 2D array on the top of the grid. The data is generated by solving (2.1) for five frequencies ω_j = 2πf_j, the highest being 5 Hz, and adding white Gaussian noise with standard deviation of 1% of the magnitude of the data. Starting from the initial reference model in Fig. 7b, we applied 3 cycles of frequency continuation using our method with extended and simultaneous sources. In this experiment we did not run the other configurations, as they required too much time and careful management of memory swaps (between RAM and the disk) for our resources. For the first cycle we used the smoothing regularization with the lower 4 frequencies, to obtain a smooth starting model for the next cycles. The next two cycles were applied using the standard regularization, starting from the fourth frequency. Those cycles were applied without extended sources (FWI with simultaneous sources alone), since the
(a) True velocity model. (b) The reference initial model. (c) Overthrust model reconstruction using FWI with extended and simultaneous sources.
Fig. 7: The 3D Overthrust velocity model and initial guess in [km/s].

model obtained after the first cycle was sufficient as a smooth guess. The resulting reconstruction is presented in Fig. 7c. We note that the reconstructed model, while not sharp enough, did manage to capture the important structures of the true model. The reason for the rather smooth reconstruction is that the highest frequency we used is only 5 Hz, to keep the model size reasonable; higher frequencies would require larger grids and a significantly more expensive inversion.
6. Conclusion.
In this work we aimed to improve on recent approaches for solving PDE-constrained optimization problems, approaches that expand the search space in order to relieve the non-linearity of the objective. In particular, we considered the recent extended sources approach for FWI, and suggested a new reduced version of this problem, in which we couple the source extensions as a low-rank matrix. We also showed that it is possible to accelerate the minimization of our source-extended objective function by using simultaneous sources, which reduces both memory and computation costs. Unlike the previous full-rank approach, ours does not require the additional computations or memory for all the sources, which would override the advantages of simultaneous sources and is prohibitively expensive at large scales. Therefore, our approach is more applicable in real-life 3D scenarios.

Our results showed that it is possible to combine the extended sources and the simultaneous sources as we propose. On the one hand, we use the source extension to achieve better reconstructions than the standard reduced FWI, and on the other hand, we are able to enjoy manageable computations and a low memory footprint. We demonstrated our approach on two 2D models and one 3D model. The latter, in particular, demonstrates the advantage of our approach: we were able to apply the source-extended approach to a 3D problem using a rather standard workstation.

Our method has two main limitations. One is the need to choose a priori the dimensions of both the low-rank source extension matrix and the simultaneous sources technique. Furthermore, we found that the dimension of the trace estimation (simultaneous sources) has to be at least as large as the rank of the source extension, to prevent over-fitting and fluctuations when solving for the source extensions.
REFERENCES

[1] H. S. Aghamiry, A. Gholami, and S. Operto, Implementing bound constraints and total-variation regularization in extended full-waveform inversion with the alternating direction method of multiplier: application to large contrast media, Geophysical Journal International, 218 (2019), pp. 855–872.
[2] ———, Improving full-waveform inversion by wavefield reconstruction with the alternating direction method of multipliers, Geophysics, 84 (2019), pp. R139–R162.
[3] ———, Robust wavefield inversion via phase retrieval, Geophysical Journal International, 221 (2020), pp. 1327–1340.
[4] P. R. Amestoy, I. S. Duff, J.-Y. L'Excellent, and J. Koster, A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM Journal on Matrix Analysis and Applications, 23 (2001), pp. 15–41.
[5] F. Aminzadeh, B. Jean, and T. Kunz, 3-D Salt and Overthrust Models, Society of Exploration Geophysicists, 1997.
[6] A. Y. Aravkin, D. Drusvyatskiy, and T. van Leeuwen, Efficient quadratic penalization through the partial minimization technique, IEEE Transactions on Automatic Control, 63 (2017), pp. 2131–2138.
[7] H. Avron and S. Toledo, Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix, Journal of the ACM (JACM), 58 (2011), p. 8.
[8] S. Bernard, V. Monteiller, D. Komatitsch, and P. Lasaygues, Ultrasonic computed tomography based on full-waveform inversion for bone quantitative imaging, Physics in Medicine & Biology, 62 (2017), p. 7011.
[9] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: A fresh approach to numerical computing, SIAM Review, 59 (2017), pp. 65–98.
[10] B. Biondi and A. Almomin, Simultaneous inversion of full data bandwidth by tomographic full-waveform inversion, Geophysics, 79 (2014), pp. WA129–WA140.
[11] A. Brougois, M. Bourget, P. Lailly, M. Poulet, P. Ricarte, and R. Versteeg, Marmousi, model and data, in EAEG Workshop - Practical Aspects of Seismic Data Inversion, European Association of Geoscientists & Engineers, 1990, pp. cp–108.
[12] I. Epanomeritakis, V. Akcelik, O. Ghattas, and J. Bielak, A Newton-CG method for large-scale three-dimensional elastic full-waveform seismic inversion, Inverse Problems, (2008).
[13] Y. A. Erlangga, C. W. Oosterlee, and C. Vuik, A novel multigrid based preconditioner for heterogeneous Helmholtz problems, SIAM J. Sci. Comput., 27 (2006), pp. 1471–1492.
[14] L. Guasch, O. C. Agudo, M.-X. Tang, P. Nachev, and M. Warner, Full-waveform inversion imaging of the human brain, NPJ Digital Medicine, 3 (2020), pp. 1–12.
[15] E. Haber, Computational Methods in Geophysical Electromagnetics, vol. 1, SIAM, 2014.
[16] E. Haber, U. Ascher, and D. Oldenburg, On optimization techniques for solving nonlinear inverse problems, Inverse Problems, 16 (2000), pp. 1263–1280.
[17] E. Haber and U. M. Ascher, Preconditioned all-at-once methods for large, sparse parameter estimation problems, Inverse Problems, 17 (2001), p. 1847.
[18] E. Haber, M. Chung, and F. Herrmann, An effective method for parameter estimation with PDE constraints with multiple right-hand sides, SIAM Journal on Optimization, 22 (2012), pp. 739–757.
[19] E. Haber, E. Treister, and E. Holtham, Obtaining low frequencies for full waveform inversion by using augmented physics, ASEG Extended Abstracts, (2016), pp. 1–5.
[20] G. Huang, R. Nammour, and W. W. Symes, Volume source based extended waveform inversion, Geophysics, 83 (2018), pp. 1–139.
[21] J. R. Krebs, J. E. Anderson, D. Hinkley, R. Neelamani, S. Lee, A. Baumstein, and M.-D. Lacasse, Fast full-wavefield seismic inversion using encoded sources, Geophysics, 74 (2009), pp. WCC177–WCC188.
[22] Y. E. Li and L. Demanet, Full-waveform inversion with extrapolated low-frequency data, Geophysics, 81 (2016), pp. R339–R348.
[23] M. Liu, R. Kumar, E. Haber, and A. Aravkin, Simultaneous-shot inversion for PDE-constrained optimization problems with missing data, Inverse Problems, 35 (2018), p. 025003.
[24] Z. Liu and J. Zhang, Joint traveltime, waveform, and waveform envelope inversion for near-surface imaging, Geophysics, 82 (2017), pp. R235–R244.
[25] L. Métivier, R. Brossier, S. Operto, and J. Virieux, Full waveform inversion and the truncated Newton method, SIAM Review, 59 (2017), pp. 153–195.
[26] J. Poulson, B. Engquist, S. Li, and L. Ying, A parallel sweeping preconditioner for heterogeneous 3D Helmholtz equations, SIAM J. Sci. Comput., 35 (2013), pp. C194–C212.
[27] R. Pratt, Seismic waveform inversion in the frequency domain, part 1: Theory, and verification in a physical scale model, Geophysics, 64 (1999), pp. 888–901.
[28] R. G. Pratt, C. Shin, and G. Hick, Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion, Geophys. J. Int., 133 (1998), pp. 341–362.
[29] Y. Rao and Y. Wang, Seismic waveform tomography with shot-encoding using a restarted L-BFGS algorithm, Scientific Reports, 7 (2017), pp. 1–9.
[30] F. Roosta-Khorasani and U. Ascher, Improved bounds on sample size for implicit matrix trace estimators, Foundations of Computational Mathematics, 15 (2015), pp. 1187–1212.
[31] F. Roosta-Khorasani, K. van den Doel, and U. Ascher, Data completion and stochastic algorithms for PDE inversion problems with many measurements, Electron. Trans. Numer. Anal., 42 (2014), pp. 177–196.
[32] F. Roosta-Khorasani, K. van den Doel, and U. Ascher, Stochastic algorithms for inverse problems involving PDEs and many measurements, SIAM J. Sci. Comput., 36 (2014), pp. S3–S22.
[33] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), pp. 259–268.
[34] L. Ruthotto, E. Treister, and E. Haber, jInv – a flexible Julia package for PDE parameter estimation, SIAM J. Sci. Comput., 39 (2017), pp. S702–S722.
[35] F. Santosa and W. W. Symes, Linear inversion of band-limited reflection seismograms, SIAM Journal on Scientific and Statistical Computing, 7 (1986), pp. 1307–1330.
[36] O. Schenk and K. Gärtner, Solving unsymmetric sparse systems of linear equations with PARDISO, Future Generation Computer Systems, 20 (2004), pp. 475–487.
[37] C. Shin and Y. Ho Cha, Waveform inversion in the Laplace-Fourier domain, Geophysical Journal International, 177 (2009), pp. 1067–1079.
[38] E. Somersalo and J. Kaipio, Statistical and Computational Inverse Problems, Applied Mathematical Sciences, 160 (2004).
[39] E. Soubies, T.-A. Pham, and M. Unser, Efficient inversion of multiple-scattering model for optical diffraction tomography, Optics Express, 25 (2017), pp. 21786–21800.
[40] A. Tarantola, Inverse Problem Theory, Elsevier, Amsterdam, 1987.
[41] E. Treister and E. Haber, Full waveform inversion guided by travel time tomography, SIAM J. Sci. Comput., 39 (2017), pp. S587–S609.
[42] E. Treister and I. Yavneh, A multilevel iterated-shrinkage approach to ℓ1 penalized least-squares minimization, IEEE Transactions on Signal Processing, 60 (2012), pp. 6319–6329.
[43] T. van Leeuwen, A. Y. Aravkin, and F. J. Herrmann, Seismic waveform inversion by stochastic optimization, International Journal of Geophysics, 2011 (2011).
[44] T. van Leeuwen and F. J. Herrmann, Mitigating local minima in full-waveform inversion by expanding the search space, Geophysical Journal International, 195 (2013), pp. 661–667.
[45] ———, 3D frequency-domain seismic inversion with controlled sloppiness, SIAM J. Sci. Comput., 36 (2014), pp. S192–S217.
[46] T. van Leeuwen and F. J. Herrmann, A penalty method for PDE-constrained optimization in inverse problems, Inverse Problems, 32 (2016), p. 015007.
[47] C. R. Vogel, Computational Methods for Inverse Problems, vol. 23, SIAM, Philadelphia, 2002.
[48]