A comparative study of structural similarity and regularization for joint inverse problems governed by PDEs
Benjamin Crestel (1), Georg Stadler (2) and Omar Ghattas (1,3)

(1) Institute for Computational Engineering & Sciences, The University of Texas at Austin, Austin, TX, USA
(2) Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
(3) Department of Geological Sciences and Department of Mechanical Engineering, The University of Texas at Austin, Austin, TX, USA

E-mail: [email protected], [email protected], [email protected]
Abstract.
Joint inversion refers to the simultaneous inference of multiple parameter fields from observations of systems governed by single or multiple forward models. In many cases these parameter fields reflect different attributes of a single medium and are thus spatially correlated or structurally similar. By imposing prior information on their spatial correlations via a joint regularization term, we seek to improve the reconstruction of the parameter fields relative to inversion for each field independently. One of the main challenges is to devise a joint regularization functional that conveys the spatial correlations or structural similarity between the fields while at the same time permitting scalable and efficient solvers for the joint inverse problem. We describe several joint regularizations that are motivated by these goals: a cross-gradient and a normalized cross-gradient structural similarity term, the vectorial total variation, and a joint regularization based on the nuclear norm of the gradients. Based on numerical results from three classes of inverse problems with piecewise-homogeneous parameter fields, we conclude that the vectorial total variation functional is preferable to the other methods considered. Besides resulting in good reconstructions in all experiments, it allows for scalable, efficient solvers for joint inverse problems governed by PDE forward models.
Keywords: Joint inversion, multi-physics inverse problem, joint regularization, structural similarity prior, vectorial total variation, cross-gradient, nuclear norm
1. Introduction
In a joint inverse problem one seeks to reconstruct multiple parameter fields from observational data and forward models that map the parameter fields to the data. In many cases these parameter fields reflect different attributes of a single medium and are thus spatially correlated or structurally similar. By imposing prior information on their spatial correlations via a joint regularization term, we seek to improve the reconstruction of the parameter fields relative to inversion for each field independently.

We formulate the joint inverse problem as an optimization problem with a regularized data misfit objective, governed by a forward model that represents a single or multiple physical phenomena and depends on two parameter fields $m_1$ and $m_2$, which we seek to reconstruct from observational data $d$. The parameter-to-observable map $\mathcal{F}(m_1, m_2)$ typically involves solution of the forward PDEs given the parameter fields, followed by application of the observation operator, which restricts the PDE solution to the space of observables. The optimization problem is thus

\[ \min_{(m_1, m_2)} \left\{ \frac{1}{2} |\mathcal{F}(m_1, m_2) - d|^2 + R(m_1, m_2) \right\}. \qquad (1) \]

The role played by $R$ in (1) is discussed in the next paragraph. Here, we address two specific settings for (1). In the first, the forward model in $\mathcal{F}(m_1, m_2)$ describes a single physical phenomenon. An example of such a joint inverse problem is inversion for the primary and secondary wave speeds in the Earth given measurements of the acceleration at the surface. Obtaining high quality reconstructions for both parameter fields is known to be difficult without incorporating some form of prior knowledge that couples the two fields [1, 2, 3]. We refer to formulation (1) as a single physics joint inverse problem.

In the second type of joint inverse problem, we consider observations $d_1$ and $d_2$ stemming from two distinct physical phenomena respectively, each depending on a single parameter field.
In this case the forward models of the physical phenomena are uncoupled, and coupling occurs only via the inverse problem. The corresponding parameter-to-observable maps are denoted by $\mathcal{F}_1(m_1)$ and $\mathcal{F}_2(m_2)$, resulting in

\[ \min_{(m_1, m_2)} \left\{ \frac{1}{2} |\mathcal{F}_1(m_1) - d_1|^2 + \frac{1}{2} |\mathcal{F}_2(m_2) - d_2|^2 + R(m_1, m_2) \right\}. \qquad (2) \]

This formulation emerges from the general case above by defining $\mathcal{F}(m_1, m_2) = [\mathcal{F}_1(m_1), \mathcal{F}_2(m_2)]^T$ and $d = [d_1, d_2]^T$. In the context of subsurface exploration, just a few of the different physical phenomena that can be combined in (2) include electromagnetic and seismic waves [4, 5], radar and seismic waves [6], DC resistivity and seismic waves [7], and current resistivity and groundwater flow [8].

The joint regularization term $R(m_1, m_2)$ in (1) and (2) acts to impose regularity on $m_1$ and $m_2$ individually to combat ill-posedness, but can also express structural similarity or spatial correlations between the two parameter fields. The remainder of this section introduces several different choices for $R$. To isolate regularization from structural similarity, we decompose the joint regularization term $R(m_1, m_2)$ into

\[ R(m_1, m_2) = \gamma_1 R_1(m_1) + \gamma_2 R_2(m_2) + \hat{\gamma} \hat{R}(m_1, m_2), \]

with $\gamma_1, \gamma_2, \hat{\gamma} >$
$0$. The terms $R_1$ and $R_2$ are regularization terms for each parameter field; here we take them to be total variation (TV) regularizations, since our target media are piecewise-homogeneous (i.e., blocky). The term $\hat{R}(m_1, m_2)$ incorporates the structural similarity between $m_1$ and $m_2$. We now discuss several choices for $\hat{R}$.

In [7], the authors introduce the cross-gradient term

\[ \hat{R}(m_1, m_2) = \frac{1}{2} \int_\Omega |\nabla m_1 \times \nabla m_2|^2 \, dx, \]

which seeks to align gradients of the two parameter fields at each point in the medium, i.e., level sets that have the same shape. This seems to be the most popular choice in geophysical applications. A variant, the normalized cross-gradient, is given by

\[ \hat{R}_{ncg}(m_1, m_2) = \frac{1}{2} \int_\Omega \left| \frac{\nabla m_1}{|\nabla m_1|} \times \frac{\nabla m_2}{|\nabla m_2|} \right|^2 dx. \]

The normalized cross-gradient was first used in the context of image registration [9], and is discussed in section 2.2. Alternatively, when an empirical relationship between both parameters is known, one could use it in place of the structural similarity term $\hat{R}$ [4, 10]; this approach, however, can be problematic in practice as these relationships are typically uncertain (thus introducing bias) and the resulting optimization problems can be difficult to solve [10, 11].

Alternatively, a single joint regularization term can impose regularity on both parameter fields while also expressing a preference for structural similarity. In particular, we consider the vectorial total variation (VTV) functional,

\[ R(m_1, m_2) = \gamma \int_\Omega \sqrt{|\nabla m_1|^2 + |\nabla m_2|^2} \, dx, \]

with $\gamma >$
$0$. The VTV functional was introduced in the context of multi-channel imaging [12, 13], and later used in PDE-constrained joint inverse problems [10]; it is discussed in section 3. A second term we consider is the nuclear norm, which was used in [14, 15] to promote gradient alignment of a vector-valued image. Building on this idea, in section 4 we introduce a nuclear norm-based joint regularization term for PDE-constrained joint inverse problems.

The objective of this article is to construct and assess joint regularization terms that (1) are efficient for inverse problems governed by PDEs with infinite-dimensional parameter fields (and are thus large-scale after discretization) and (2) perform well in reconstructing sharp interfaces in the truth parameter fields. Indeed, targeting large-scale inverse problems entails several unique challenges that limit choices of the joint regularization term. Nonlinear inverse problems such as (1) and (2) must be solved iteratively, which requires gradient- (and Hessian-) based optimization methods to limit the number of optimization iterations, along with adjoint methods to limit the number of PDE model solutions that must be carried out at each iteration. Moreover, the adjoint method efficiently provides only directional second derivatives rather than full Hessians, the construction of which would require as many PDE solves as there are parameters (or observations). For these reasons, unless otherwise specified, we employ an inexact Hessian-free Newton–conjugate gradient method with backtracking line search [1, 16, 17]. That is, we compute the Newton search direction using the preconditioned conjugate gradient method, with early termination to guarantee a descent direction and to avoid over-solving [18]. The efficient solution of the Newton system depends crucially on the choice of preconditioner; we detail our choices for each joint regularization functional in sections 2, 3, and 4.
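The overall strategy (CG on the Newton system using only Hessian-vector products, early termination on negative curvature, Armijo backtracking) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the forcing-term and line-search constants are arbitrary choices, and in a PDE-constrained setting the Hessian action `hvp` would be computed with adjoint methods rather than an assembled matrix:

```python
import numpy as np

def inexact_newton_cg(f, grad, hvp, m0, tol=1e-6, max_iter=50):
    """Inexact Newton-CG with backtracking line search (sketch).
    hvp(m, v) returns the Hessian of f at m applied to v."""
    m = m0.copy()
    for _ in range(max_iter):
        g = grad(m)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        # Forcing term: solve the Newton system only loosely far from
        # the solution, to avoid over-solving.
        eta = min(0.5, np.sqrt(gnorm))
        # CG on H p = -g, terminated early on negative curvature so
        # that p is always a descent direction.
        p = np.zeros_like(m)
        r = -g.copy()
        d = r.copy()
        rr = r @ r
        for _ in range(m.size):
            Hd = hvp(m, d)
            curv = d @ Hd
            if curv <= 0.0:                 # negative curvature detected
                if not p.any():
                    p = -g                  # fall back to steepest descent
                break
            alpha = rr / curv
            p = p + alpha * d
            r = r - alpha * Hd
            rr_new = r @ r
            if np.sqrt(rr_new) <= eta * gnorm:
                break
            d = r + (rr_new / rr) * d
            rr = rr_new
        # Backtracking line search enforcing the Armijo condition.
        t, f0, slope = 1.0, f(m), g @ p
        while f(m + t * p) > f0 + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        m = m + t * p
    return m

# Sanity check on a strictly convex quadratic: the minimizer of
# 0.5 m^T A m - b^T m solves A m = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
m_opt = inexact_newton_cg(lambda m: 0.5 * m @ (A @ m) - b @ m,
                          lambda m: A @ m - b,
                          lambda m, v: A @ v,
                          np.zeros(2))
```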
An overview of the numerical methods we employ to solve large-scale inverse problems governed by PDEs can be found in Appendix A.

Besides practicality and efficiency, our comparison of joint inversion methods focuses on the quality of the reconstructions. Truth parameter fields in geophysical exploration and medical imaging problems often present sharp contrasts, and we therefore focus on joint regularization terms that can best preserve sharp interfaces.
The main contributions of this article are as follows: (1) We review three joint regularization terms commonly found in the literature (cross-gradient paired with TV, normalized cross-gradient paired with TV, and VTV joint regularization), and discuss their practical use for large-scale joint inverse problems governed by PDEs. We derive their first and second derivatives, and use them to study properties of the different joint regularization terms. (2) We adapt a nuclear norm joint regularization term to the context of joint inverse problems governed by PDEs. We discuss some of the resulting computational challenges, and propose a solver to address them. (3) We carry out a detailed comparison of all four joint regularization terms over a broad range of applications, and discuss their practical performance in reconstructing parameter fields with sharp interfaces.
In the next three sections, we introduce the four joint regularization terms. The cross-gradient and normalized cross-gradient are discussed in sections 2.1 and 2.2, the vectorial total variation in section 3, and the nuclear norm joint regularization in section 4. Section 5 summarizes our numerical experiments. In section 5.1, we report on several multiple physics joint inverse problems of the form (2), in which the two parameter fields arise as coefficients in two independent Poisson equations, respectively. We use this example to illustrate some key features of each joint regularization term. In section 5.2, we consider a single physics joint inverse problem of the form (1) for the acoustic wave equation, in which we invert for the bulk modulus and the density. Finally, in section 5.3, we study a multiple physics joint inverse problem with two different forward models, one an elliptic PDE and the other an acoustic wave equation. Section 6 provides concluding remarks.
2. Cross-gradient terms
In this section, we introduce the cross-gradient term and its normalized version. The main idea behind both of these structural similarity terms is to express the preference that the level sets of the inversion parameter fields $m_1$ and $m_2$ align. As illustrated in figure 1, alignment of the level sets is equivalent to the alignment of the gradients $\nabla m_1$ and $\nabla m_2$ at each point. By definition of the cross-product of two vectors, the vectors $\nabla m_1$ and $\nabla m_2$ are aligned when $|\nabla m_1 \times \nabla m_2|$ vanishes. The cross-gradient term $\hat{R}_{cg}$, defined as

\[ \hat{R}_{cg}(m_1, m_2) := \frac{1}{2} \int_\Omega |\nabla m_1 \times \nabla m_2|^2 \, dx, \qquad (3) \]

Figure 1.
Sketch of a level set of the parameter fields $m_1$ (red) and $m_2$ (blue), with their respective gradients at a point.

was introduced in [7] and has become a popular choice in geophysical applications, particularly in seismic imaging. Although the formulation (3) is intuitive, it is inconvenient for discretization and computation of derivatives. Hence, using vector calculus, we re-write (3) as

\[ \hat{R}_{cg}(m_1, m_2) = \frac{1}{2} \int_\Omega |\nabla m_1|^2 |\nabla m_2|^2 - (\nabla m_1 \cdot \nabla m_2)^2 \, dx. \qquad (4) \]

Combining the cross-gradient term (4) with independent TV regularizations for $m_1$ and $m_2$, we obtain the joint regularization

\[ R(m_1, m_2) = \gamma_1 R_{TV,\varepsilon}(m_1) + \gamma_2 R_{TV,\varepsilon}(m_2) + \hat{\gamma} \hat{R}_{cg}(m_1, m_2), \qquad (5) \]

where $\gamma_1, \gamma_2, \hat{\gamma} >$
$0$, and here and in the remainder of this paper, we use the notation

\[ R_{TV,\varepsilon}(m) := \int_\Omega \sqrt{|\nabla m|^2 + \varepsilon} \, dx \quad \text{for } \varepsilon > 0. \qquad (6) \]

In [10] the authors propose a different formulation, in which each independent TV regularization is weighted by a nonlinear function of the gradient of the other parameter. The goal of this weighting is to apply TV regularization only for points in the parameter space where the cross-gradient term by itself is not sufficient to prevent oscillatory solutions. Such oscillations may occur where the gradient of one parameter is very small, resulting in an (almost) vanishing cross-gradient term. Because this formulation further increases the nonlinearity of the problem, we instead use (5).

Next, we derive first and second derivatives of the cross-gradient regularization, interpret these derivatives as PDE operators, and draw analogies with the derivatives of the TV functional $R_{TV}$ or its regularized version (6). For this purpose, we first derive the first and second variation of the TV functional as follows:

\[ \delta_m R_{TV}(m; \tilde{m}) = \int_\Omega |\nabla m|^{-1} (\nabla m \cdot \nabla \tilde{m}) \, dx, \]
\[ \delta_m^2 R_{TV}(m; \hat{m}, \tilde{m}) = \int_\Omega |\nabla m|^{-1} (\nabla \hat{m} \cdot \nabla \tilde{m}) - |\nabla m|^{-3} (\nabla m \cdot \nabla \tilde{m})(\nabla m \cdot \nabla \hat{m}) \, dx, \]

where $\tilde{m}$ and $\hat{m}$ are arbitrary directions. Using integration by parts, the fact that $\tilde{m}$ in the expression for $\delta_m^2$ is arbitrary, and the vector identity $(a \cdot b)(c \cdot d) = b \cdot (a c^T) \cdot d$, one finds that the Hessian $H_{TV}$ is the following second-order elliptic PDE operator

\[ H_{TV} \hat{m} := -\nabla \cdot (A_{TV}(m) \nabla \hat{m}), \quad \text{with} \quad A_{TV}(m) = \frac{1}{|\nabla m|} \left( I - \frac{\nabla m \nabla m^T}{|\nabla m|^2} \right). \qquad (7) \]

This interpretation as a diffusion operator shows that $H_{TV}$ acts very differently at different points $x \in \Omega$. In particular, let us consider a point $x$ where the norm of $\nabla m$ is large, e.g., $x$ is located at an interface in the parameter field $m$.
Then, in directions parallel to $\nabla m$ (i.e., directions normal to an interface), $A_{TV}$ vanishes and thus the elliptic operator does not smooth the reconstruction $m$ in these directions. In contrast, in directions that are orthogonal to $\nabla m$ (i.e., directions that are tangent to interfaces), $A_{TV}$ does not vanish, thus smoothing the reconstruction $m$ along interfaces. This explains the anisotropic smoothing properties of the TV functional and, in particular, its ability to recover sharp interfaces in parameter fields. Away from interfaces, where $\nabla m$ is small, $A_{TV}$ behaves like a scaled identity, thus smoothing $m$ in all directions, much as $H^1$ norm-based Tikhonov regularization does.

We now turn to the derivation of the derivatives of the cross-gradient term $\hat{R}_{cg}$. Following similar arguments as for the scalar TV regularization above, this will provide us with insight regarding the regularization properties. Additionally, these derivatives are useful for devising a Newton-type algorithm for the inverse problem solution and for preconditioning the linear systems that arise.

Starting from (4), we now compute the gradient, and the action of the Hessian in a given direction, for the cross-gradient term. We perform the computations using weak forms and then use integration by parts to derive the corresponding strong forms. The directional derivative at $m := (m_1, m_2)$ in a direction $\tilde{m} := (\tilde{m}_1, \tilde{m}_2)$ is given by

\[ \delta_{m_1} \hat{R}_{cg}(m; \tilde{m}_1) = \int_\Omega |\nabla m_2|^2 (\nabla \tilde{m}_1 \cdot \nabla m_1) - (\nabla m_1 \cdot \nabla m_2)(\nabla \tilde{m}_1 \cdot \nabla m_2) \, dx, \]
\[ \delta_{m_2} \hat{R}_{cg}(m; \tilde{m}_2) = \int_\Omega |\nabla m_1|^2 (\nabla \tilde{m}_2 \cdot \nabla m_2) - (\nabla m_1 \cdot \nabla m_2)(\nabla \tilde{m}_2 \cdot \nabla m_1) \, dx. \]
Taking another variation, we find that the action of the Hessian of the cross-gradient term in a direction $\hat{m} = (\hat{m}_1, \hat{m}_2)$ is given by

\[ \delta_{m_1}^2 \hat{R}_{cg}(m; \hat{m}_1, \tilde{m}_1) = \int_\Omega |\nabla m_2|^2 (\nabla \tilde{m}_1 \cdot \nabla \hat{m}_1) - (\nabla \tilde{m}_1 \cdot \nabla m_2)(\nabla m_2 \cdot \nabla \hat{m}_1) \, dx, \]
\[ \delta_{m_1, m_2}^2 \hat{R}_{cg}(m; \hat{m}_2, \tilde{m}_1) = \int_\Omega 2 (\nabla \tilde{m}_1 \cdot \nabla m_1)(\nabla m_2 \cdot \nabla \hat{m}_2) - (\nabla m_1 \cdot \nabla m_2)(\nabla \tilde{m}_1 \cdot \nabla \hat{m}_2) - (\nabla \tilde{m}_1 \cdot \nabla m_2)(\nabla m_1 \cdot \nabla \hat{m}_2) \, dx, \]
\[ \delta_{m_2}^2 \hat{R}_{cg}(m; \hat{m}_2, \tilde{m}_2) = \int_\Omega |\nabla m_1|^2 (\nabla \tilde{m}_2 \cdot \nabla \hat{m}_2) - (\nabla \tilde{m}_2 \cdot \nabla m_1)(\nabla m_1 \cdot \nabla \hat{m}_2) \, dx. \]

In strong form and neglecting boundary conditions, the Hessian $H$ acts, in a direction $\hat{m}$, like an anisotropic vector diffusion operator, i.e., $H \hat{m} = -\nabla \cdot (A_{cg}(m) \nabla \hat{m})$, where $A_{cg}$ is a diffusion tensor given by

\[ A_{cg}(m) = \begin{bmatrix} D(m_2) & B(m) \\ B(m)^T & D(m_1) \end{bmatrix}, \qquad (8) \]

with, for $i = 1, 2$,

\[ D(m_i) := |\nabla m_i|^2 I - \nabla m_i \nabla m_i^T, \qquad B(m) := 2 \nabla m_1 \nabla m_2^T - (\nabla m_1 \cdot \nabla m_2) I - \nabla m_2 \nabla m_1^T. \]

The block-diagonal part of $A_{cg}$ indicates a TV-like behavior, but where parameter $m_1$ (resp. $m_2$) preserves interfaces in directions where parameter $m_2$ (resp. $m_1$) presents an interface; this illustrates the coupling between both parameters. As we show numerically in figure 2, the Hessian of the cross-gradient term can be indefinite. The TV regularization being a convex functional, its Hessian is guaranteed to be positive semidefinite. Therefore, the Hessian obtained by retaining the block-diagonal parts of the diffusion tensor (8), i.e., $H_d \hat{m} := -\nabla \cdot (A_{cg,d}(m) \nabla \hat{m})$, with

\[ A_{cg,d}(m) := \begin{bmatrix} D(m_2) & 0 \\ 0 & D(m_1) \end{bmatrix}, \qquad (9) \]

is also guaranteed to be positive semidefinite.
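The contrast between the full tensor (8) and its block-diagonal part (9) can be checked numerically at a single point. The following numpy sketch (the gradient values are illustrative, and the blocks follow the expressions above) shows that the full tensor can be indefinite while the block-diagonal part is positive semidefinite:

```python
import numpy as np

def D(g):
    """Diagonal block D(m_i) = |grad m_i|^2 I - grad m_i grad m_i^T."""
    return (g @ g) * np.eye(g.size) - np.outer(g, g)

def B(g1, g2):
    """Off-diagonal coupling block of the cross-gradient diffusion tensor."""
    return 2.0 * np.outer(g1, g2) - (g1 @ g2) * np.eye(g1.size) - np.outer(g2, g1)

def A_cg(g1, g2):
    """Pointwise diffusion tensor (8); note the swapped gradients on the
    diagonal: m_1 is smoothed according to the structure of m_2 and
    vice versa."""
    return np.block([[D(g2), B(g1, g2)], [B(g1, g2).T, D(g1)]])

# Orthogonal unit gradients, i.e., maximally misaligned level sets.
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

full_eigs = np.linalg.eigvalsh(A_cg(g1, g2))
Z = np.zeros((2, 2))
diag_eigs = np.linalg.eigvalsh(np.block([[D(g2), Z], [Z, D(g1)]]))
print(full_eigs.min() < 0)          # full tensor is indefinite here
print(diag_eigs.min() >= -1e-12)    # block-diagonal part is PSD
```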
For this reason, when using the cross-gradient paired with two independent TV regularizations, we precondition the Newton system with a block-diagonal matrix containing the Hessian of the TV regularizations, combined with a small multiple of the identity in each block, and the block-diagonal part of the Hessian of the cross-gradient term (9).

Figure 2.
Eigenvalues of the Hessian operator (blue) and of the block-diagonal part of the Hessian operator (red) for the (iii) cross-gradient term (3) and the (iv) normalized cross-gradient term (10) with $\varepsilon = 10^{-}$, for two combinations of truth parameter fields (i) $m_1$ and (ii) $m_2$. The domain is a unit square discretized by a 40 ×
40 mesh of squares subdivided into triangles, and the parameter fields $m_1$ and $m_2$ are discretized using continuous piecewise linear finite elements.

A disadvantage of the cross-gradient term (4) is that it vanishes where one of the inversion parameter fields is constant, hence potentially ignoring sharp discontinuities in the other. A remedy, proposed in the context of image registration in [9], is to normalize the gradients of both inversion parameters in the formulation of the cross-gradient. The normalized cross-gradient is given by

\[ \hat{R}(m_1, m_2) = \frac{1}{2} \int_\Omega \left| \frac{\nabla m_1}{|\nabla m_1|} \times \frac{\nabla m_2}{|\nabla m_2|} \right|^2 dx = \frac{1}{2} \int_\Omega 1 - \left( \frac{\nabla m_1 \cdot \nabla m_2}{|\nabla m_1| \, |\nabla m_2|} \right)^2 dx. \]

Since this formulation is non-differentiable where $|\nabla m_1| = 0$ or $|\nabla m_2| = 0$, we use the modified normalized cross-gradient,

\[ \hat{R}_{ncg}(m_1, m_2) := \frac{1}{2} \int_\Omega 1 - \left( \frac{\nabla m_1 \cdot \nabla m_2}{\sqrt{|\nabla m_1|^2 + \varepsilon} \, \sqrt{|\nabla m_2|^2 + \varepsilon}} \right)^2 dx, \qquad (10) \]

with $\varepsilon >$
$0$. In the rest of this paper, we refer to (10) when discussing the normalized cross-gradient. Combining the normalized cross-gradient term (10) with two TV regularizations, we obtain the joint regularization

\[ R(m_1, m_2) = \gamma_1 R_{TV,\varepsilon}(m_1) + \gamma_2 R_{TV,\varepsilon}(m_2) + \hat{\gamma} \hat{R}_{ncg}(m_1, m_2), \qquad (11) \]

where $\gamma_1, \gamma_2, \hat{\gamma} >$
$0$. Compared to the cross-gradient term, the derivatives of the normalized cross-gradient term give less obvious insight into its regularization behavior. Instead, we illustrate numerically that the normalized cross-gradient often behaves as a concave operator. In figure 2, we plot the eigenvalues of its Hessian and of the block-diagonal part of its Hessian for different parameter fields $m_1$ and $m_2$, and observe that most eigenvalues are negative. The main practical consequence of this observation is that the Hessian of the joint regularization (11) may be indefinite. For this reason, the preconditioner for the Newton system is formed by the Hessians of the TV regularizations alone.
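The effect of the normalization can be illustrated by evaluating the integrands of (4) and (10) pointwise. In the sketch below (the gradient magnitudes and ε are illustrative values), the cross-gradient integrand is negligible wherever one field is nearly constant, regardless of alignment, whereas the smoothed normalized version still distinguishes aligned from misaligned gradients:

```python
import numpy as np

def cg_integrand(g1, g2):
    """Cross-gradient integrand from (4) at a single point."""
    return 0.5 * ((g1 @ g1) * (g2 @ g2) - (g1 @ g2) ** 2)

def ncg_integrand(g1, g2, eps=1e-8):
    """Smoothed normalized cross-gradient integrand from (10) at a point."""
    n1 = np.sqrt(g1 @ g1 + eps)
    n2 = np.sqrt(g2 @ g2 + eps)
    return 0.5 * (1.0 - ((g1 @ g2) / (n1 * n2)) ** 2)

weak = 1e-3 * np.array([1.0, 0.0])   # m_1 is nearly constant here
aligned = np.array([1.0, 0.0])       # edge in m_2, aligned with weak
misaligned = np.array([0.0, 1.0])    # edge in m_2, orthogonal to weak

# The cross-gradient barely penalizes the misaligned edge ...
print(cg_integrand(weak, aligned), cg_integrand(weak, misaligned))
# ... while the normalized version still carries an O(1) penalty.
print(ncg_integrand(weak, aligned), ncg_integrand(weak, misaligned))
```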
3. Vectorial total variation
The vectorial total variation functional [13], or color TV [12], is the multi-parameter equivalent of the total variation functional. It was first introduced for multi-channel imaging applications [12, 13], and later applied to joint inverse problems [10]. The VTV functional is convex, and unlike the cross-gradient and normalized cross-gradient, it serves as a regularization by itself, i.e., it does not require additional regularization terms. It is given by

\[ R(m_1, m_2) = \gamma \int_\Omega \sqrt{|\nabla m_1|^2 + |\nabla m_2|^2} \, dx, \qquad (12) \]

with $\gamma >$
$0$. Since this formulation is non-differentiable where $|\nabla m_1| = |\nabla m_2| = 0$, we introduce a modified VTV regularization given by

\[ R_{VTV}(m_1, m_2) := \gamma \int_\Omega \sqrt{|\nabla m_1|^2 + |\nabla m_2|^2 + \varepsilon} \, dx, \qquad (13) \]

with $\varepsilon, \gamma >$
$0$. Whereas the cross-gradient terms (see section 2) work by aligning the level sets of the inversion parameter fields, VTV favors superimposition of discontinuities. An intuitive way to explain this, given the understanding of the TV regularization [19], is sketched in figure 3. Given two parameter fields, each with a single jump of the same amplitude, the VTV functional is minimal when both jumps occur at the same location.

\[ R_{VTV}(m_1, m_2) = \text{``} 2 \int_\Omega \sqrt{|\nabla m_1|^2} \, dx \text{''} > \text{``} \sqrt{2} \int_\Omega \sqrt{|\nabla m_1|^2} \, dx \text{''} \]

Figure 3.
Values of the VTV regularization (12) for two parameter fields $m_1$ and $m_2$ defined over $\Omega = [0, 2]$ with $R_{TV}(m_1) = R_{TV}(m_2)$: the value is larger when the two jumps occur at different locations than when they coincide. This informal argument can be made rigorous by using piecewise linear functions for $m_1$ and $m_2$.

The derivatives of the VTV regularization resemble those of the TV regularization. For simplicity, we set $\gamma \equiv 1$. The first variation at $m = (m_1, m_2)$ in a direction $\tilde{m} = (\tilde{m}_1, \tilde{m}_2)$ is given by

\[ \delta_{m_i} R_{VTV}(m; \tilde{m}_i) = \int_\Omega \frac{\nabla m_i \cdot \nabla \tilde{m}_i}{\sqrt{|\nabla m_1|^2 + |\nabla m_2|^2 + \varepsilon}} \, dx, \quad \text{for } i = 1, 2. \qquad (14) \]

We again interpret the Hessian of the VTV as a diffusion tensor to study its anisotropic diffusion behavior. In strong form (see section 2.1), it is given by

\[ A_{VTV}(m) := \frac{1}{|\nabla m|_\varepsilon} \begin{bmatrix} I - \dfrac{\nabla m_1 \nabla m_1^T}{|\nabla m|_\varepsilon^2} & - \dfrac{\nabla m_1 \nabla m_2^T}{|\nabla m|_\varepsilon^2} \\[2mm] - \dfrac{\nabla m_2 \nabla m_1^T}{|\nabla m|_\varepsilon^2} & I - \dfrac{\nabla m_2 \nabla m_2^T}{|\nabla m|_\varepsilon^2} \end{bmatrix}, \qquad (15) \]

where $|\nabla m|_\varepsilon := \sqrt{|\nabla m_1|^2 + |\nabla m_2|^2 + \varepsilon}$. Comparing with the diffusion tensor for the Hessian of the TV regularization (7), we find similar terms along the block diagonal, with the exception of the normalization factor in the denominator. It is $|\nabla m_i|$ in the case of TV, and $|\nabla m|_\varepsilon$ in the case of VTV, i.e., it involves the gradients of both parameters, hence introducing coupling between the parameter fields. The eigen-decomposition of the diffusion tensor of the Hessian provides further insights. For simplicity, we use $\varepsilon = 0$ in this analysis. Skipping details that can be found in [20], the eigenpairs for the diffusion tensor $A_{VTV}$ are

\[ \left( \begin{bmatrix} \nabla m_1 \\ \nabla m_2 \end{bmatrix}, 0 \right), \quad \left( \begin{bmatrix} (\nabla m_1)^\perp \\ 0 \end{bmatrix}, \frac{1}{|\nabla m|} \right), \quad \left( \begin{bmatrix} 0 \\ (\nabla m_2)^\perp \end{bmatrix}, \frac{1}{|\nabla m|} \right), \quad \left( \begin{bmatrix} \nabla m_2 \\ -\nabla m_1 \end{bmatrix}, \frac{1}{|\nabla m|} \right), \]

where $|\nabla m| := \sqrt{|\nabla m_1|^2 + |\nabla m_2|^2}$. The kernel of the diffusion tensor contains parameter field directions that are not smoothed out by the regularization. Reconstructions in these directions can display sharp edges.
It is informative to compare the eigenpairs of the diffusion tensor arising from the VTV Hessian with those arising from $R_{TV}(m_1) + R_{TV}(m_2)$, the sum of two independent TV regularizations:

\[ \left( \begin{bmatrix} \nabla m_1 \\ 0 \end{bmatrix}, 0 \right), \quad \left( \begin{bmatrix} 0 \\ \nabla m_2 \end{bmatrix}, 0 \right), \quad \left( \begin{bmatrix} (\nabla m_1)^\perp \\ 0 \end{bmatrix}, \frac{1}{|\nabla m_1|} \right), \quad \left( \begin{bmatrix} 0 \\ (\nabla m_2)^\perp \end{bmatrix}, \frac{1}{|\nabla m_2|} \right). \]

The sum of independent TV regularizations acts in the direction of each parameter $m_i$ independently from the other parameters, analogously to the TV functional for a single inverse problem. That is, it preserves sharp interfaces in the parameter fields (large values of $|\nabla m_i|$) but smoothes along interfaces. This is in contrast with the kernel of the diffusion tensor of VTV, which favors parameter fields with sharp variations occurring at the same physical locations.

The use of TV regularization in PDE-constrained inverse problems increases the nonlinearity of the problem, and requires the use of customized solvers. Due to the similarity between TV and VTV, a similarly challenging numerical behavior can be expected for VTV. In [20] we tailor a primal-dual Newton method [21] for the efficient, scalable solution of PDE-constrained joint inverse problems regularized with VTV. Since the focus of the current paper is on a qualitative comparison of several joint regularization terms, we skip details of this solver here and instead refer to [20].
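The preference of VTV for co-located jumps (figure 3) is easy to verify with a crude 1D discretization. In the sketch below (one-sided differences; the mesh size and ε are arbitrary illustrative choices), two unit jumps at the same location give a VTV value close to √2, while separated jumps give a value close to 2:

```python
import numpy as np

def vtv(m1, m2, h, eps=1e-12):
    """Discrete VTV (13) for piecewise-constant 1D fields, using one-sided
    differences (illustrative discretization, gamma = 1)."""
    d1 = np.diff(m1) / h
    d2 = np.diff(m2) / h
    return np.sum(np.sqrt(d1**2 + d2**2 + eps)) * h

n = 200
h = 2.0 / n
x = np.linspace(0.0, 2.0, n + 1)

m1 = (x > 0.5).astype(float)        # unit jump at x = 0.5
m2_same = (x > 0.5).astype(float)   # jump at the same location
m2_other = (x > 1.5).astype(float)  # jump at a different location

v_same = vtv(m1, m2_same, h)    # close to sqrt(2) ~ 1.414
v_other = vtv(m1, m2_other, h)  # close to 2.0
print(v_same < v_other)         # True: VTV prefers co-located jumps
```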
4. Nuclear norm joint regularization
The nuclear norm joint regularization seeks to promote gradient alignment by minimizing the rank of the matrix whose columns are the gradients of the parameter fields. Different versions of that idea have been used in various imaging applications. In color image denoising, this approach is often referred to as total nuclear variation [14]; the unified framework used to discuss VTV and the total nuclear variation in [14] shows that the nuclear norm-based functional is a regularizer in itself. This can be simply justified by the equivalence of all norms in finite dimensions (here, on the space of matrices). In [15], the authors propose the pointwise nuclear norm of a matrix field as regularization to express a preference for alignment of image edges. Building on [14, 15], we propose a nuclear norm joint regularization suitable for large-scale PDE-constrained optimization.

As for the methods discussed in section 2, this term seeks to promote alignment of parameter level sets by attaining its minimum value when gradients align. Let us introduce the matrix-valued function $G : \Omega \to \mathbb{R}^{d \times 2}$, with $\Omega \subset \mathbb{R}^d$ the physical domain, defined by

\[ G(x) := \begin{bmatrix} \nabla m_1 & \nabla m_2 \end{bmatrix} = \begin{bmatrix} \partial_{x_1} m_1 & \partial_{x_1} m_2 \\ \vdots & \vdots \\ \partial_{x_d} m_1 & \partial_{x_d} m_2 \end{bmatrix}. \]

The gradients $\nabla m_1$ and $\nabla m_2$ are aligned at $x \in \Omega$ if the columns of $G(x)$ are multiples of each other, in which case the rank of $G(x)$ is 1. One could seek to promote gradient alignment by minimizing $\int_\Omega \mathrm{rank}(G(x)) \, dx$. However, in practice, minimization of the rank of a matrix is notoriously difficult. The nuclear norm of a matrix, defined as the $\ell^1$-norm of its singular values and denoted by $\| \cdot \|_*$, is often a good proxy for the rank [22]. We therefore define, with $\gamma >$
$0$, the nuclear norm joint regularization as

\[ R_*(m_1, m_2) := \gamma \int_\Omega \| G(x) \|_* \, dx. \qquad (16) \]

We now compute derivatives of (16) using the chain rule. Let us introduce the notation $f(M) := \|M\|_*$, for arbitrary $M \in \mathbb{R}^{d \times 2}$. Thus, $R_*(m_1, m_2) = \gamma \int_\Omega f(G(x)) \, dx$. Denoting the gradient of $f$ with respect to the entries of matrix $M$ by $\nabla f(M) \in \mathbb{R}^{d \times 2}$, the first directional derivatives of (16) with respect to the inversion parameters $m_i$, $i = 1,$
$2$, in a direction $\tilde{m}_i$, are given by

\[ \partial_{m_i} R_*(m_1, m_2) \, \tilde{m}_i = \gamma \int_\Omega \left( \nabla f(G), \partial_{m_i} G(x) \tilde{m}_i \right) dx, \qquad (17) \]

where

\[ \partial_{m_1} G(x) \tilde{m}_1 = \begin{bmatrix} \partial_{x_1} \tilde{m}_1 & 0 \\ \vdots & \vdots \\ \partial_{x_d} \tilde{m}_1 & 0 \end{bmatrix} \quad \text{and} \quad \partial_{m_2} G(x) \tilde{m}_2 = \begin{bmatrix} 0 & \partial_{x_1} \tilde{m}_2 \\ \vdots & \vdots \\ 0 & \partial_{x_d} \tilde{m}_2 \end{bmatrix}, \]

and the inner product for matrices $M = (m_{ij})_{ij}$, $N = (n_{ij})_{ij} \in \mathbb{R}^{d \times 2}$ is defined as $(M, N) = \sum_{i=1}^d \sum_{j=1}^2 m_{ij} n_{ij}$.

We next compute the gradient of the nuclear norm $\nabla f(M)$. Given a full-rank matrix $M \in \mathbb{R}^{n \times m}$, i.e., $r := \mathrm{rank}(M) = \min(m, n)$, and singular values $\{\sigma_k\}_{k=1}^r$, we define its (reduced) singular value decomposition (SVD) by $M = U \Sigma V^T$, with $U \in \mathbb{R}^{n \times r}$, $V \in \mathbb{R}^{m \times r}$, and $\Sigma \in \mathbb{R}^{r \times r}$ a diagonal matrix containing the singular values of $M$, i.e., $\Sigma_{kk} = \sigma_k > 0$ for $k = 1, \dots, r$. The $(i, j)$-entry of the gradient of the nuclear norm is given by

\[ (\nabla f(M))_{ij} = \sum_{k=1}^r \frac{\partial \sigma_k}{\partial m_{ij}} = \sum_{k=1}^r u_{ik} v_{jk}, \]

where the second equality uses the singular value sensitivity [23]. The gradient of the nuclear norm with respect to the entries of $M$ is then given by $\nabla f(M) = U V^T$.

The nuclear norm $f(M)$ is non-differentiable when the matrix $M$ is not full-rank, corresponding to the case where at least one of the singular values vanishes. To make it differentiable, similar to the treatment of TV regularization, we define the modified nuclear norm by

\[ f_\varepsilon(M) := \| M \|_{*,\varepsilon} = \sum_{k=1}^{\min(m,n)} \sqrt{\sigma_k^2 + \varepsilon}, \qquad (18) \]

where $\varepsilon >$
0. For γ >
$0$, we define the modified nuclear norm joint regularization as

\[ R_{*,\varepsilon}(m_1, m_2) := \gamma \int_\Omega f_\varepsilon(G(x)) \, dx. \qquad (19) \]

The $(i, j)$-entry of the gradient of the modified nuclear norm (18) is given by

\[ (\nabla f_\varepsilon(M))_{ij} = \frac{\partial}{\partial m_{ij}} \sum_{k=1}^{\min(m,n)} \sqrt{\sigma_k^2 + \varepsilon} = \sum_{k=1}^r \frac{\sigma_k}{\sqrt{\sigma_k^2 + \varepsilon}} \frac{\partial \sigma_k}{\partial m_{ij}}, \]

where the sum reduces to its first $r$ terms since, by definition of the rank of a matrix, $\sigma_k = 0$ for all $k > r$. Let us now introduce the diagonal matrix $W_\varepsilon \in \mathbb{R}^{r \times r}$, with entries $(W_\varepsilon)_{ii} = \sigma_i / \sqrt{\sigma_i^2 + \varepsilon}$. Using the expression for the sensitivity of the singular values [23], the gradient of the modified nuclear norm is then given by

\[ \nabla f_\varepsilon(M) = U W_\varepsilon V^T. \qquad (20) \]

The first directional derivatives of (19) with respect to the inversion parameters $m_i$, $i = 1,$
$2$, in a direction $\tilde{m}_i$, are given by

\[ \partial_{m_i} R_{*,\varepsilon}(m_1, m_2) \, \tilde{m}_i = \gamma \int_\Omega \left( \nabla f_\varepsilon(G), \partial_{m_i} G(x) \tilde{m}_i \right) dx. \qquad (21) \]

The modified nuclear norm (18), however, is not twice differentiable when two singular values are equal (crossing singular values). This is because the second derivative requires the sensitivity of the individual singular vectors, which are not differentiable where singular values cross. We have not found a practical workaround for this singularity, and thus proceed with a gradient-based method to solve joint inverse problems regularized with the nuclear norm joint regularization; the solver is detailed in Appendix A.2. In the rest of this paper, when using "nuclear norm joint regularization", we refer to the modified nuclear norm joint regularization (19).
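Formulas (18) and (20) are straightforward to implement with a dense SVD, since $G(x)$ is only $d \times 2$ at each point. The following numpy sketch evaluates $f_\varepsilon$ and $\nabla f_\varepsilon = U W_\varepsilon V^T$ and checks the gradient against a central finite difference; the test matrix is random and purely illustrative:

```python
import numpy as np

def f_eps(M, eps=1e-8):
    """Modified nuclear norm (18): sum_k sqrt(sigma_k^2 + eps)."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.sum(np.sqrt(s**2 + eps))

def grad_f_eps(M, eps=1e-8):
    """Gradient (20): U W_eps V^T with (W_eps)_kk = sigma_k/sqrt(sigma_k^2+eps)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(s / np.sqrt(s**2 + eps)) @ Vt

# Central finite-difference check at a generic (random, illustrative)
# matrix G = [grad m_1 | grad m_2], here with d = 2.
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 2))
E = rng.standard_normal((2, 2))
t = 1e-6
fd = (f_eps(G + t * E) - f_eps(G - t * E)) / (2 * t)
exact = np.sum(grad_f_eps(G) * E)   # matrix inner product (grad f_eps(G), E)
print(abs(fd - exact) < 1e-6)       # True
```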
5. Numerical examples
In this section, we present a comprehensive numerical comparison of the four joint regularization approaches introduced in sections 2–4, i.e., the cross-gradient (5), the normalized cross-gradient (11), the vectorial total variation (13), and the nuclear norm (19) regularization. Reconstructions obtained with these joint regularization terms are compared with each other, and with the reconstructions obtained by solving a joint inverse problem with independent TV regularizations. The parameters for all joint regularization terms are selected empirically as leading to the best reconstructions. The values of $\varepsilon$ are chosen small enough to provide reconstructions with sharp interfaces, but large enough to avoid numerical difficulties (see for instance the discussion in [24]).

The different regularizations are compared using three examples covering both types of joint inverse problems (1) and (2). In section 5.1, we combine two uncoupled Poisson inverse problems to form the joint inverse problem (2), where we invoke prior knowledge that the two truth parameter fields have similar structure. In section 5.2, we compare the ability of the joint regularization terms to improve the reconstruction of the bulk modulus and the density in an acoustic wave equation, an example of a joint inverse problem (1). Finally, in section 5.3, we formulate a multi-physics joint inverse problem (2), which combines an inverse problem governed by the Poisson equation with one governed by the acoustic wave equation. Here again, the Poisson parameter and the wave speed fields are assumed to have similar structure.

In all examples, the domain is a 2D unit square, with a uniform mesh of isosceles right triangles obtained by cutting in half $N \times N$ squares; we define the mesh size parameter $h := 1/N$. All data are generated synthetically from the truth parameter fields, and then polluted by adding independent and identically distributed Gaussian noise; the noise level is specific to each example.
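For concreteness, the data-pollution step can be sketched in a few lines. The scaling of the noise standard deviation to a relative noise level below is our assumption for illustration, not necessarily the exact convention used in the experiments:

```python
import numpy as np

def add_noise(d, level, rng):
    """Add iid Gaussian noise to synthetic data d, scaled so that the
    expected noise norm is `level` times the data norm (assumed convention)."""
    sigma = level * np.linalg.norm(d) / np.sqrt(d.size)
    return d + sigma * rng.standard_normal(d.size)

rng = np.random.default_rng(1)
d_clean = np.ones(10_000)                # placeholder synthetic data
d_noisy = add_noise(d_clean, 0.02, rng)  # 2% noise, as in section 5.1
```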
We use continuous Galerkin finite elements to discretize all field variables, i.e., the state, adjoint, incremental state and adjoint, and parameter fields.

Here, we solve a joint inverse problem of the form (2) for the two coefficient fields m₁ and m₂. Considered separately, m₁ and m₂ are solutions to the (almost identical) TV-regularized inverse problems governed by the Poisson equation, i.e.,

m_i := arg min_m { ‖B_i u − d_i‖² + γ_i ∫_Ω √( |∇m|² + ε ) dx },  where  −∇·(e^m ∇u) = 1 in Ω,  u = 0 on ∂Ω.   (22)

The operators B_i represent pointwise observation operators, and the data d_i are synthetic observations polluted with 2% Gaussian noise. The domain Ω is discretized with a mesh of 8192 triangles (i.e., h = 1/64), and the initial guesses are m₁ ≡ m₂ ≡ 0. The two examples below differ in the truth parameter fields and in the observation operators B_i. In the first example (section 5.1.1), the truth parameter fields differ but have interfaces at the same spatial locations. In the second example (section 5.1.2), some interfaces in the truth parameter field for m₂ are not present in the truth parameter field for m₁. In both examples, the observation locations defined by B₁ only cover the top-right quadrant of the domain, whereas the observation locations defined by B₂ are distributed over the entire domain; see figures 4 and 6.

In the first example, the parameter fields have interfaces at the same locations. In figure 4, we show the truth parameter fields m₁ and m₂ and their reconstructions obtained by solving the inverse problems (22) independently. The reconstructions obtained with the four regularization methods are shown in figure 5, and the corresponding values of the relative medium misfit are given in table C1.

The reconstructions for parameter m₂ do not differ significantly (figure 5b). Due to the large number of observation points, this parameter is already well reconstructed in an independent inverse problem (figure 4d).
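The smoothed TV term ∫√(|∇m|² + ε) dx appearing in (22) can be evaluated on a uniform grid as in the following sketch (finite differences and a simple nodal quadrature; an illustration only, not the finite-element implementation used in the paper):

```python
import numpy as np

def smoothed_tv(m, h, eps):
    """Discrete smoothed total variation of a 2-D field m on a uniform
    grid with spacing h:  sum over nodes of sqrt(|grad m|^2 + eps) * h^2."""
    gy, gx = np.gradient(m, h)
    return np.sum(np.sqrt(gx**2 + gy**2 + eps)) * h**2
```

For a field with |∇m| = 1 on the unit square, the value is close to 1; larger ε smooths the functional but inflates its value.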
We observe an improvement in the reconstruction of parameter m₁ for all four joint inverse problems compared to the independent reconstruction shown in figure 4c. Using the cross-gradient only marginally improves the reconstruction for parameter m₁, most likely because the independent reconstruction for m₁ shows large areas of constant values, where the cross-gradient term vanishes; these areas therefore cannot be improved by the cross-gradient. The normalized cross-gradient improves over the cross-gradient but fails to recover the circular interface. Both the VTV joint regularization and the nuclear norm joint regularization perform better in this example, and lead to reconstructions that contain all sharp interfaces in the target image.

Figure 4. Parameter fields m₁ and m₂ in the example of section 5.1.1: truth parameter fields (a,b) and reconstructions (c,d) obtained by solving the inverse problem (22) with ε = 10⁻, γ₁ = 3·10⁻, γ₂ = 4·10⁻, and initial guesses m₁ = m₂ = 0. White dots in (a) and (b) indicate the location of the pointwise observations. The observation points defined through B₁ are a lattice of 25 × 25 points covering the top-right quadrant, and those defined through B₂ are a square lattice of 50 × 50 points distributed over the entire domain.

Figure 5. Reconstructions for the parameter fields (a) m₁ and (b) m₂, obtained by solving a joint inverse problem (2) regularized with (i) the cross-gradient (γ = 2·10⁻) combined with two independent TV regularizations, (ii) the normalized cross-gradient (γ = 6·10⁻ and ε = 10⁻) combined with the same independent TV regularizations, (iii) the VTV joint regularization (γ = 3·10⁻ and ε = 10⁻), and (iv) the nuclear norm joint regularization (γ = 3·10⁻ and ε = 10⁻). The parameters for the independent TV regularizations and the initial guesses for all problems are as for the independent inverse problems (see caption of figure 4). The legend for all plots is as in figure 4.

In the second example, the only difference with the previous example is that the truth parameter field for m₁ no longer contains the vertical discontinuity. In figure 6, we show the truth parameter fields m₁ and m₂ and the reconstructions obtained by solving two independent inverse problems (22). The reconstructions for the four joint inverse problems are shown in figure 7, and the corresponding values of the relative medium misfit are given in table C1.

Figure 6. Parameter fields m₁ and m₂ in the example of section 5.1.2: truth parameter fields (a,b) and reconstructions (c,d) obtained by solving the inverse problem (22) with ε = 10⁻, γ₁ = 4·10⁻, γ₂ = 4·10⁻, and initial guesses m₁ = m₂ = 0. White dots in (a) and (b) indicate the location of the pointwise observations, as detailed in figure 4.

As in the previous example, for m₂ the reconstructions obtained with the different joint inverse problems do not differ significantly (see figure 7b). However, we observe differences among the reconstructions for parameter m₁. Using the cross-gradient only marginally improves the reconstruction for parameter m₁. The use of the normalized cross-gradient does not show improvement over the cross-gradient. As in the first example, both the VTV joint regularization and the nuclear norm joint regularization perform the best, and their corresponding reconstructions contain all sharp interfaces present in the true image. However, in figures 7a (iii) and (iv) we also see a vertical discontinuity not present in the true image (figure 6a). This ghost interface in m₁ is due to the presence of such a discontinuity in m₂, and highlights the tendency of the VTV joint regularization and nuclear norm joint regularization to superimpose discontinuities in both parameters. Note, however, that the amplitude of this ghost interface is small compared to the amplitudes of the correctly recovered interfaces.
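The failure mode noted above — the cross-gradient vanishing wherever one field is locally constant — is easy to see in a discrete sketch. Assuming the cross-gradient is the pointwise 2D scalar cross product of the two gradients (the precise definition (5) appears earlier in the paper):

```python
import numpy as np

def cross_gradient_map(m1, m2, h=1.0):
    """Pointwise cross-gradient t = dx(m1)*dy(m2) - dy(m1)*dx(m2),
    the 2-D scalar cross product of the two gradient fields."""
    g1y, g1x = np.gradient(m1, h)
    g2y, g2x = np.gradient(m2, h)
    return g1x * g2y - g1y * g2x
```

Wherever m₁ is constant, t ≡ 0 regardless of m₂, so the term carries no information there and cannot penalize (or correct) structure in the other field.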
We now study a joint inverse problem of the form (1), i.e., both parameters enter the same equation, namely the acoustic wave equation.
Figure 7. Reconstructions for the parameter fields (a) m₁ and (b) m₂, obtained by solving a joint inverse problem (2) regularized with (i) the cross-gradient combined with two independent TV regularizations (γ = 5·10⁻), (ii) the normalized cross-gradient combined with the same independent TV regularizations (γ = 7·10⁻ and ε = 10⁻), (iii) the VTV joint regularization (γ = 4·10⁻ and ε = 10⁻), and (iv) the nuclear norm joint regularization (γ = 4·10⁻ and ε = 10⁻). The parameters for the independent TV regularizations and all initial guesses are the same as used for the independent inverse problems (see caption of figure 6). The legend is as in figure 6.

We start by defining the forward problem, i.e., the acoustic wave PDE. The propagation of acoustic waves depends on the bulk modulus κ and the density ρ of the medium of propagation. Let us define the acoustic pressure, u(x, t) := −κ(x) ∇·𝐮(x, t), with 𝐮(x, t) the displacement vector at location x and time t. The time-domain acoustic wave equation with first-order absorbing boundary condition [28] and initial conditions at rest is given by

(1/κ) ü − ∇·( (1/ρ) ∇u ) = f,   in Ω × (0, T),
u(x, 0) = u̇(x, 0) = 0,          in Ω,
(1/ρ) ∇u·n = 0,                  on ∂Ω_n × (0, T),
(1/ρ) ∇u·n = −(1/√(κρ)) u̇,      on ∂Ω_a × (0, T),   (23)

where f is a forcing term, u̇ and ü are the first and second time derivatives of u, and the boundary of the domain ∂Ω is partitioned as ∂Ω = ∂Ω_a ∪ ∂Ω_n. The acoustic wave velocity of the medium is given by c, with the relation κ = ρc². The PDE in (23) is the variable-density form of the acoustic wave equation; when the density ρ is assumed constant, equation (23) reduces to (1/c²) ü − Δu = f̃.

Here, we assume that both the bulk modulus κ and the density ρ are unknown. Since they both appear in (23) through their inverses, we introduce the parameters α := 1/κ and β := 1/ρ, and formulate the inverse problem in terms of α and β. As is common in seismic inversion, we consider N_s multiple experiments, characterized by their forcing terms f_i and datasets d_i, which correspond to pointwise observations in space, recorded continuously in time. The acoustic wave inverse problem is then

min_{α,β>0} { (1/N_s) Σ_{i=1}^{N_s} ∫₀ᵀ ‖B u_i(t) − d_i(t)‖² dt + R(α, β) },   (24)

where each u_i solves the forward problem (23) with forcing term f_i,

α ü_i − ∇·( β ∇u_i ) = f_i,   in Ω × (0, T),
u_i(x, 0) = u̇_i(x, 0) = 0,    in Ω,
β ∇u_i·n = 0,                  on ∂Ω_n × (0, T),
β ∇u_i·n = −√(αβ) u̇_i,        on ∂Ω_a × (0, T).

In our experiments, the physical constraints α, β > 0 remained satisfied without being enforced explicitly. Because the solution of the acoustic wave equation couples the parameters α and β, the inverse problem (24) could be regularized by two independent TV regularizations, i.e., R(α, β) = R_{TV,ε}(α) + R_{TV,ε}(β) [1]. However, the resulting problem can be difficult to solve and does not incorporate the structural correlation that usually exists between these parameters due to the types of rock occurring in the subsurface. Going beyond the use of ad hoc methods to handle both parameters at once, some researchers have addressed (24) as a joint inverse problem [2, 3]. Previous attempts have used the cross-gradient term, but not its normalized version, the VTV, or the nuclear norm regularization. In this section, we study whether the use of joint regularization can improve reconstructions for α and β.

In our numerical tests, we use 6 independent sources f_i(x, t), located on the top boundary of the domain at 0.1, 0.25, 0.4, 0.6, 0.75, and 0.9 from the left boundary (yellow stars in figure 8a); each source is a point source in space, and a Ricker wavelet in time with a central frequency of 2 Hz. The data are recorded at 20 locations equally spaced along the top boundary (green triangles in figure 8b), and polluted by independent Gaussian noise with zero mean and variance corresponding to a signal-to-noise ratio of 20 dB. The boundary conditions are a homogeneous Neumann boundary condition along the top boundary ∂Ω_n = [0, 1] × {1}, and an absorbing boundary condition along the left, bottom, and right boundaries ∂Ω_a = {0, 1} × [0, 1] ∪ [0, 1] × {0}. The truth parameter fields for α and β are shown in figure 8; they correspond to an acoustic wave velocity varying from 2 km/s to 3 km/s‡, typical values for a shallow subsurface (see for instance [29, 30]).
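The source time function mentioned above is the standard Ricker wavelet, r(t) = (1 − 2π²f₀²(t−t₀)²) exp(−π²f₀²(t−t₀)²); a minimal sketch follows, where the delay t₀ is an implementation detail not specified in the text:

```python
import numpy as np

def ricker(t, f0, t0=0.0):
    """Ricker wavelet with central frequency f0 (Hz), delayed by t0."""
    a = (np.pi * f0 * (t - t0))**2
    return (1.0 - 2.0 * a) * np.exp(-a)
```

The wavelet peaks at value 1 at t = t₀ and decays rapidly away from it, so a delay of a few periods makes the source effectively causal.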
‡ The following units are used: distance in km, velocity in km/s, density in g/cm³, and bulk modulus in GPa.

The finite-element mesh consists of 800 triangles (h = 1/20). The initial guesses for α and β are smoothed versions of the truth parameter fields (see figure 8ii).

In figure 8iii, we show the reconstructions of parameters α and β obtained by solving (24) with independent TV regularizations. Whereas parameter α is well reconstructed, the reconstruction for β is rather poor. We next solve (24) with the proposed joint regularization terms. The results are shown in figure 9, and the corresponding values of the relative medium misfit are given in table C1.

The different reconstructions for α (figure 9a) do not differ significantly from each other. However, the use of joint regularization improves the quality of the reconstruction of β (figure 9b). Whereas the use of the cross-gradient only results in marginal improvement compared to the reconstruction in figure 8d, the use of the normalized cross-gradient allows recovery of the interfaces more clearly. The best reconstructions are obtained using the VTV or the nuclear norm joint regularizations.

Figure 8. Parameter fields (a) α and (b) β in the joint acoustic inverse problem (24): (i) truth parameter fields, (ii) initial guesses, and (iii) reconstructions when solving (24) regularized with two independent TV regularizations (ε = 10⁻, γ_α = 5·10⁻, and γ_β = 9·10⁻). The yellow stars in (a-i) and the green triangles in (b-i) indicate the locations of the point sources and observations, respectively.

Figure 9. Reconstructions for the parameter fields (a) α and (b) β, obtained by solving (24) regularized with (i) the cross-gradient (γ = 10⁻) combined with two independent TV regularizations, (ii) the normalized cross-gradient (γ = 9·10⁻ and ε = 10⁻) combined with the same independent TV regularizations, (iii) the VTV joint regularization (γ = 7·10⁻ and ε = 10⁻), and (iv) the nuclear norm joint regularization (γ = 7·10⁻ and ε = 10⁻). The parameters for the independent TV regularizations are the ones selected for the independent inverse problems (see caption of figure 8). The legend is as in figure 8.

As a last problem, we study a joint inverse problem (2) governed by two different physics models; namely, we combine a Poisson inverse problem and an acoustic wave inverse problem (assuming the density ρ is known). This inverse problem is intended as a model problem for joint seismic-electromagnetic inversion in the electromagnetic low-frequency limit. The Poisson inverse problem is identical to the one used in section 5.1,

min_m { ‖Bu − d‖² + γ_m ∫_Ω √( |∇m|² + ε ) dx },  where  −∇·(e^m ∇u) = 1 in Ω,  u = 0 on ∂Ω.   (25)

The observation operator B extracts the state u at 20 ×
20 equally distributed points over the entire domain (white dots in figure 10a). The data are polluted with 1% Gaussian noise. For the acoustic wave inverse problem, we set β ≡ 1, and invert only for the parameter α = 1/κ = 1/c²,

min_α { ∫₀ᵀ ‖Bu(t) − d(t)‖² dt + γ_α ∫_Ω √( |∇α|² + ε ) dx },

where

α ü − Δu = f_α,          in Ω × (0, T),
u(x, 0) = u̇(x, 0) = 0,  in Ω,
∇u·n = 0,                on ∂Ω_n × (0, T),
∇u·n = −√α u̇,           on ∂Ω_a × (0, T).   (26)

We use a single source f_α with frequency 2 Hz or 4 Hz, located at (0. , 0.1) (yellow star in figure 11a), and 20 pointwise observations equally spaced along the top boundary (green triangles in figure 11a). The boundary conditions, the noise level in the data, the mesh, and the numerical discretization are as in section 5.2. The initial guess for the Poisson parameter field m (resp. for the acoustic parameter field α) is set to a constant field with value 0.625 (resp. 0.25).

In figures 10 and 11, we show the reconstructions of m and α when (25) and (26) are solved independently. The results for the Poisson inverse problem (25) are shown in figure 10b, where it can be seen that the horizontal interface is well reconstructed, but the shape of the rectangular perturbation is smeared out. For the acoustic wave inverse problem (26), we show two reconstructions in figure 11: one with a source f_α of frequency 2 Hz (figure 11b), and one with a source f_α of frequency 4 Hz (figure 11c). While the reconstruction at 2 Hz is excellent, the reconstruction at 4 Hz lacks sufficient low-frequency information and appears to converge toward a local minimum, missing the horizontal discontinuity present in the truth parameter field (figure 11a). The reconstructions for all four joint inverse problems, with a seismic source f_α of frequency 4 Hz, are shown in figure 12, and the corresponding values of the relative medium misfit are given in table C1.

Figure 10. Plots of (a) the truth parameter field for m in the Poisson inverse problem (25), and (b) its reconstruction (γ_m = 2·10⁻ and ε = 10⁻) with the initial parameter field set to a constant value of 0.625.

Figure 11. Plots of (a) the truth parameter field for α in the acoustic inverse problem (26), and (b,c) its reconstructions (γ_α = 3·10⁻ and ε = 10⁻) with the initial value for the parameter field set to 0.25, and a source f_α of frequency (b) 2 Hz and (c) 4 Hz. The green triangles in (a) indicate the locations of the pointwise observations, and the yellow star in (a) indicates the location of the source.

The use of the cross-gradient or its normalized variant improves the reconstruction of the Poisson parameter m (figures 12a, (i) and (ii)), compared to the reconstruction from the Poisson inverse problem (25) alone (figure 10b). However, neither of the cross-gradient terms brings any improvement to the reconstruction of the acoustic wave velocity (figures 12b, (i) and (ii)); in particular, the reconstructions do not show the horizontal discontinuity that was missing in the reconstruction of the acoustic wave velocity alone (figure 11c). On the other hand, the use of either the VTV joint regularization or the nuclear norm joint regularization leads to significant improvements in the reconstruction of the acoustic wave velocity (figures 12b, (iii) and (iv)). Both reconstructions contain all features of the truth parameter field (figure 11a); most noticeably, the horizontal discontinuity that was missing in the independent reconstruction (figure 11c) is now fully reconstructed. The use of the VTV joint regularization provides only marginal improvement to the reconstruction of the Poisson parameter m in terms of relative medium misfit (table C1); however, the shape of the rectangular perturbation, which was smeared out in the reconstruction from the Poisson inverse problem alone (figure 10b), is clearer in figure 12a(iii).

Figure 12. Reconstructions for the parameter fields (a) m in (25) and (b) α in (26), obtained by solving a joint inverse problem with seismic source f_α of frequency 4 Hz, regularized with (i) the cross-gradient (γ = 8·10⁻) combined with two TV regularizations, (ii) the normalized cross-gradient (γ = 8·10⁻ and ε = 10⁻) combined with the same TV regularizations, (iii) the VTV joint regularization (γ = 4·10⁻ and ε = 10⁻), and (iv) the nuclear norm joint regularization (γ = 5·10⁻ and ε = 10⁻). The parameters for the independent TV regularizations are as for the independent inverse problems (see captions of figures 10 and 11). The legend is the same as in figures 10 and 11.

The reconstruction of the Poisson parameter obtained with the nuclear norm joint regularization indicates that the optimization converged to a local minimum. Despite all discontinuities present in the truth parameter field (figure 10a) being clearly reconstructed in figure 12a(iv), the values of the parameters are significantly different. Similar, or worse, performance was observed when setting H to be a multiple of the identity matrix in the BFGS solver [17]. Moreover, almost identical results were obtained when solving the Poisson-acoustic joint inverse problem, regularized by VTV, using the BFGS method described in Appendix A.2. We therefore conjecture that the poor performance of the nuclear norm joint regularization, in the case of a multi-physics joint inverse problem, can be attributed to the use of a gradient-based method for the solution of the joint inverse problem. The significant difference in the structure of the gradients coming from the Poisson and acoustic wave inverse problems dictates the use of a Newton method, which is affine-invariant, in order to balance the individual search directions. This conjecture is supported by previous results found in the literature. For instance, in the context of a joint full waveform inversion for the conductivity and permittivity of a medium, the authors in [31] found the reconstructions obtained using the L-BFGS method to be highly sensitive to the scaling of the parameter fields. The authors of [32] report similar difficulties when employing a quasi-Newton method on a cross-well example, inverting for compressibility and anisotropy, and study alternative formulations to remedy this problem.
6. Conclusion
We conducted a systematic review of regularization terms for joint inverse problems governed by PDEs with infinite-dimensional parameter fields. We considered two types of joint inverse problems: (1) those in which all inversion parameters enter the same physics forward problem, and (2) those coupling several otherwise uncoupled forward problems via a joint regularization term. Based on a review of the literature, we identified three joint regularization terms for this study that are tractable for large-scale PDE-constrained joint inverse problems. The cross-gradient is a popular choice in geophysical applications and seeks to align the level sets of the parameter fields. The normalized cross-gradient was designed to overcome some of the potential weaknesses of the cross-gradient term. The vectorial total variation is an extension of total variation regularization to joint inverse problems, and originated in the imaging community. In addition, we introduced a fourth, novel joint regularization term based on the nuclear norm of a gradient matrix. The comparison of these joint regularization terms was carried out on three problems: (i) a joint Poisson inverse problem for which the truth parameter fields are known to share a similar structure, (ii) an acoustic wave inverse problem in which we invert for the bulk modulus and the density, and (iii) a joint Poisson–acoustic wave inverse problem, providing an example of multi-physics joint inversion.

Based on this study, we recommend use of the vectorial total variation joint regularization. It leads to superior reconstructions in all our examples. Moreover, we have available a scalable, efficient primal-dual nonlinear optimization solver and Hessian preconditioner for joint inverse problems regularized with this term [20]. The nuclear norm joint regularization showed encouraging results, even leading to slightly better reconstructions than the vectorial total variation for some examples.
However, its numerical realization is challenging since it is not twice differentiable, as required by Newton's method. For piecewise-homogeneous parameter fields, the cross-gradient similarity term does not improve significantly over independent reconstructions. In particular, it can fail to reconstruct some edges entirely, since the cross-gradient term vanishes at points where one parameter field is constant. The normalized cross-gradient similarity term leads to a joint inverse problem that is challenging to solve numerically. Even though it improves on the cross-gradient, the improvement is generally minimal, and the reconstructions do not compare favorably with the ones obtained with the vectorial total variation. Compared to the cross-gradient approaches, an additional advantage of the VTV and nuclear norm functionals is that they also act as regularizations, making individual regularization functionals unnecessary. This reduces the number of hyperparameters or regularization weights that must be chosen (see table B1), thereby simplifying the inverse problem.
Appendix A. Summary of numerical optimization techniques for the solution of regularized inverse problems
In this section, we describe the large-scale numerical optimization methods used for our numerical examples. As already discussed in the introduction, the solution of PDE-constrained optimization problems typically requires iterative methods. These methods require first (and ideally, also second) derivatives of the objective function with respect to the parameter fields [17, 33]. These derivatives can be computed efficiently using adjoint methods [34, 35, 36]; in particular, the adjoint approach allows the gradient to be computed at a cost of a few PDE solves, independent of the number of parameters. Throughout this appendix, we denote the parameters by m = (m₁, m₂), the objective function by J(m), and use upper indices to denote iteration numbers.

Appendix A.1. Line-search Newton-CG for cross-gradient and VTV regularizations
In the k-th iteration, we update the medium parameters m^(k) along a search direction p^(k) by computing m^(k+1) = m^(k) + α^(k) p^(k) with an appropriate step length α^(k) > 0. To ensure convergence, the search direction must be a descent direction, i.e., it must satisfy ⟨g^(k), p^(k)⟩ < 0, where g^(k) is the gradient of J with respect to m evaluated at m^(k), and ⟨·,·⟩ is an appropriate inner product. The step length α^(k) could be chosen to minimize the objective functional along the search direction p^(k). However, solving this minimization problem exactly is too expensive for large-scale applications, since a single evaluation of the objective functional requires the solution of the state PDE, potentially multiple times (e.g., N_s times in the example of section 5.2, which has multiple sources). Instead, we seek an approximate minimizer that satisfies the following Armijo condition to ensure sufficient descent,

J(m^(k) + α^(k) p^(k)) ≤ J(m^(k)) + c α^(k) ⟨g^(k), p^(k)⟩,   (A.1)

with 0 < c < 1. To ensure sufficiently large step lengths, we use a backtracking line search [17] to find a step length that satisfies (A.1). That is, the step length is computed by starting from an initial guess α₀^(k) > 0, which is reduced until the sufficient descent condition (A.1) is satisfied. When computing the search direction with a Newton-type method (see next paragraph), we use α₀^(k) = 1, since this is guaranteed to be a successful step length in a neighborhood of a minimizer [17].

The choice of good search directions is crucial in PDE-constrained optimization. In the steepest descent method, one chooses the search direction as the negative gradient, i.e., p^(k) = −g^(k). Unfortunately, the resulting algorithm usually converges slowly in the presence of stretched contour lines of the objective J, a consequence of the typical ill-posedness of inverse problems. The Newton direction is given by the solution of the linear system H(m^(k)) p^(k) = −g^(k), where H(m^(k)) is the Hessian, i.e., the second derivative of J, evaluated at m^(k). The direction p^(k) arising as the solution of this equation is a descent direction only if the Hessian is positive definite, which may not be the case, in particular far away from the minimizer. When the Hessian is indefinite, one solution is to replace the Hessian with a positive definite approximation, a common choice being the Gauss-Newton Hessian [17]. This approximation is obtained by setting the adjoint variables to zero in the computation of the Hessian. Another option is to retain the full Hessian but solve the Newton system approximately, in a way that guarantees the computed solution to be a descent direction. Since for large-scale problems exactly constructing the Hessian is infeasible, we solve the Newton system inexactly with a matrix-free conjugate gradient method, which requires only Hessian-vector products.

Appendix A.2. BFGS method for nuclear norm regularization
To solve joint inverse problems regularized with the nuclear norm joint regularization (section 4), we use a BFGS quasi-Newton method with damped update [17]. That is, we find the search direction p^(k) by computing p^(k) = −B^(k) g^(k), where g^(k) is again the gradient of the objective function and B^(k) is a positive definite approximation of the inverse of the Hessian. This approximation is updated at each iteration with the rank-2 update

B^(k+1) = (I − ρ_k r^(k) (y^(k))^T) B^(k) (I − ρ_k y^(k) (r^(k))^T) + ρ_k r^(k) (r^(k))^T,   (A.2)

where y^(k) is the difference between the gradients at steps k+1 and k, ρ_k := 1 / ((y^(k))^T r^(k)), and r^(k) is the damped form of s^(k), the difference between the parameters at steps k+1 and k, defined as r^(k) := θ_k s^(k) + (1 − θ_k) B^(k) y^(k), with

θ_k := 1,   if (s^(k))^T y^(k) ≥ α (y^(k))^T B^(k) y^(k),
θ_k := (1 − α)(y^(k))^T B^(k) y^(k) / ( (y^(k))^T B^(k) y^(k) − (s^(k))^T y^(k) ),   otherwise.

The classical BFGS method requires the curvature condition (s^(k))^T y^(k) > 0 to guarantee positive definiteness of B^(k) for all k. However, the curvature condition can be guaranteed to be satisfied only when the objective function is strictly convex, which is typically not the case for nonlinear inverse problems. Using a damped update allows us to apply a backtracking line search while avoiding skipping some updates of B^(k) entirely. In our numerical experiments, we found this damping strategy, with a small value of α, to work well.

Another important aspect is the choice of the initial approximation B^(0). BFGS-type methods perform well when the difference between the initial Hessian approximation and the true Hessian is a compact operator [37]. Thus, we take B^(0) as the inverse of the Hessian of the regularization. This quantity is not available for the nuclear norm joint regularization. However, VTV and the nuclear norm joint regularization come from the same family of joint regularizations, differing only by the matrix norm employed [14].
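A direct transcription of the damped update (A.2) for the inverse-Hessian approximation might look as follows (dense linear algebra for illustration only; the damping constant default of 0.2 is an assumption, as the paper's value is not reproduced in this chunk):

```python
import numpy as np

def damped_bfgs_update(B, s, y, alpha=0.2):
    """One damped BFGS update of the inverse-Hessian approximation B,
    following (A.2): s is the parameter difference, y the gradient
    difference; alpha is the damping constant (0.2 assumed here)."""
    By = B @ y
    sy = s @ y
    yBy = y @ By
    # Damping: replace s by r so that y.r stays sufficiently positive.
    if sy >= alpha * yBy:
        theta = 1.0
    else:
        theta = (1.0 - alpha) * yBy / (yBy - sy)
    r = theta * s + (1.0 - theta) * By
    rho = 1.0 / (y @ r)
    I = np.eye(len(s))
    V = I - rho * np.outer(r, y)
    return V @ B @ V.T + rho * np.outer(r, r)
```

By construction the update satisfies the inverse secant condition B^(k+1) y^(k) = r^(k), and the damping keeps B^(k+1) positive definite even when the curvature condition (s^(k))^T y^(k) > 0 fails.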
Since matrix norms are equivalent in finite dimensions, we set B^(0) to the inverse of the Hessian of the VTV joint regularization at the parameter m^(k).

Appendix B. Number of hyperparameters for each joint regularization
Table B1.
Number of hyperparameters for a joint inverse problem with 2 parameter fields.

                     cross-grad   n-cross-grad   vectorial TV   nuclear norm
γ (indep. TV)            2             2              –              –
γ (joint term)           1             1              1              1
ε (indep. TV)            1             1              –              –
ε (joint term)           1             1              1              1
total                    5             5              2              2
Appendix C. Table of relative medium misfits for examples
In table C1, the relative misfits for the examples presented in section 5 are summarized.
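For reference, the relative medium misfit reported in table C1 is the discrete L²-norm error ‖m_rec − m_true‖ / ‖m_true‖ in percent; the sketch below uses uniform quadrature weights, which is an assumption (the paper works on finite-element meshes, where nodal mass-matrix weights would be used instead).

```python
import numpy as np

def relative_medium_misfit(m_rec, m_true, w=None):
    """Relative medium misfit in a discrete L2-norm, in percent:
    100 * ||m_rec - m_true|| / ||m_true||.  Pass quadrature weights w
    for non-uniform meshes; uniform weights are assumed by default."""
    if w is None:
        w = np.ones_like(m_true)
    num = np.sqrt(np.sum(w * (m_rec - m_true)**2))
    den = np.sqrt(np.sum(w * m_true**2))
    return 100.0 * num / den
```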
Table C1.
Relative medium misfits (in the L²-norm) for the examples in section 5.

               sec. 5.1.1      sec. 5.1.2      sec. 5.2        sec. 5.3
               m₁     m₂      m₁     m₂      α      β       m      α
independent   23.2%  5.1%    46.9%  5.1%    2.8%   0.8%    9.0%   9.9%
cross-grad    22.3%  5.2%    46.1%  5.6%    3.1%   0.7%    4.9%  11.0%
n-cross-grad  21.2%  5.0%    46.7%  5.0%    2.5%   0.4%    4.9%  10.7%
vectorial TV  20.2%  5.1%    41.1%  5.2%    2.4%   0.2%    8.9%   3.3%
nuclear norm  20.2%  4.8%    40.8%  5.0%    2.4%   0.2%   20.6%   4.5%

Acknowledgments
The authors would like to thank David Keyes (KAUST) and George Turkiyyah (AUB) for very helpful discussions that inspired this work. They would also like to thank Sergey Fomel (UT-Austin) for referring them to [2, 3], Jan Modersitzki (Lübeck) for directing their attention to [9], Nick Alger (UT-Austin) for help with figure 1, and two anonymous referees whose comments helped significantly to improve the manuscript. This work was partially supported by AFOSR grant FA9550-17-1-0190, DOE grant DE-SC0009286, KAUST award OSR-2016-CCF-2596, and NSF grants ACI-1550593, DMS-1723211, CBET-1507009, and CBET-1508713.

References

[1] Ioannis Epanomeritakis, Volkan Akçelik, Omar Ghattas, and Jacobo Bielak. A Newton-CG method for large-scale three-dimensional elastic full-waveform seismic inversion. Inverse Problems, 24(3):034015 (26pp), 2008.
[2] Edgar Manukyan, Hansruedi Maurer, and Andre Nuber. Improvements to elastic full waveform inversion using cross-gradient constraints. In SEG Technical Program Expanded Abstracts 2016, pages 1506–1510. Society of Exploration Geophysicists, 2016.
[3] Maokun Li, Lin Liang, Aria Abubakar, and Peter M van den Berg. Structural similarity regularization scheme for multiparameter seismic full waveform inversion. In SEG Technical Program Expanded Abstracts 2013, pages 1089–1094. Society of Exploration Geophysicists, 2013.
[4] A Abubakar, G Gao, Tarek M Habashy, and J Liu. Joint inversion approaches for geophysical electromagnetic and elastic full-waveform data. Inverse Problems, 28(5), 2012.
[5] Oguz Semerci, Guangdong Pan, Maokun Li, Lin Liang, and Tarek Habashy. Joint electromagnetic and seismic inversion for petrophysical parameters using multi-objective optimization. In SEG Annual Meeting, Denver, CO, 26-31 October 2014. SEG.
[6] Xuan Feng, Qianci Ren, Cai Liu, and Xuebing Zhang. Joint acoustic full-waveform inversion of crosshole seismic and ground-penetrating radar data in the frequency domain. Geophysics, 82(6):H41–H56, 2017.
[7] Luis A. Gallardo and Max A. Meju. Characterization of heterogeneous near-surface materials by joint 2D inversion of DC resistivity and seismic data. Geophysical Research Letters, 30(13), 2003.
[8] Klara Steklova and Eldad Haber. Joint hydrogeophysical inversion: state estimation for seawater intrusion models in 3D. Computational Geosciences, 1(21):75–94, 2016.
[9] Eldad Haber and Jan Modersitzki. Intensity gradient based registration and fusion of multi-modal images. Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2006, pages 726–733, 2006.
[10] Eldad Haber and Michal Holtzman Gazit. Model fusion and joint inversion. Surveys in Geophysics, 34(5):675–695, 2013.
[11] Luis A. Gallardo and Max A. Meju. Structure-coupled multiphysics imaging in geophysical sciences. Reviews of Geophysics, 49(1), 2011.
[12] Peter Blomgren and Tony F. Chan. Color TV: Total variation methods for restoration of vector-valued images. IEEE Transactions on Image Processing, 7(3):304–309, 1998.
[13] Xavier Bresson and Tony F Chan. Fast dual minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging, 2(4):455–484, 2008.
[14] Kevin M Holt. Total nuclear variation and Jacobian extensions of total variation for vector fields. IEEE Transactions on Image Processing, 23(9):3975–3989, 2014.
[15] Florian Knoll, Martin Holler, Thomas Koesters, Ricardo Otazo, Kristian Bredies, and Daniel K Sodickson. Joint MR-PET reconstruction using a multi-channel image regularizer. IEEE Transactions on Medical Imaging, 36(1):1–16, 2017.
[16] Volkan Akçelik, George Biros, and Omar Ghattas. Parallel multiscale Gauss–Newton–Krylov methods for inverse wave propagation. In Proceedings of IEEE/ACM SC2002 Conference, Baltimore, MD, November 2002.
[17] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Verlag, Berlin, Heidelberg, New York, second edition, 2006.
[18] Ron S. Dembo, Stanley C. Eisenstat, and Trond Steihaug. Inexact Newton methods. SIAM Journal on Numerical Analysis, 19(2):400–408, 1982.
SIAMJournal on Numerical Analysis. , 19:400–408, 1982.[19] Antonin Chambolle, Vicent Caselles, Daniel Cremers, Matteo Novaga, and Thomas Pock. Anintroduction to total variation for image analysis.
Theoretical foundations and numericalmethods for sparse recovery , 9(263-340):227, 2010.[20] Benjamin Crestel.
Advanced techniques for multi-source, multi-parameter, and multi-physicsinverse problems . PhD thesis, University of Texas at Austin, 2017.[21] Michael Hinterm¨uller and Georg Stadler. An infeasible primal-dual algorithm for total variation-based inf-convolution-type image restoration.
SIAM Journal on Scientific Computing ,28(1):1–23, 2006.[22] Carlos Fernandez-Granda. Optimization-based data analysis.
Lecture notes at NYU-CIMS ,April 2016.[23] Th´eodore Papadopoulo and Manolis I. A. Lourakis.
Estimating the Jacobian of the SingularValue Decomposition: Theory and Applications , pages 554–570. Springer Berlin Heidelberg, tructural similarity and regularization for joint inverse PDE problems Berlin, Heidelberg, 2000.[24] Uri M. Ascher, Eldad Haber, and Hui Huang. On effective methods for implicit piecewise smoothsurface recovery.
SIAM Journal on Scientific Computing , 28(1):339–358, 2006.[25] Anders Logg, Kent-Andre Mardal, and Garth Wells.
Automated Solution of DifferentialEquations by the Finite Element Method: The FEniCS book , volume 84. Springer Science &Business Media, 2012.[26] Anders Logg and Garth N. Wells. DOLFIN: Automated finite element computing.
ACMTransactions on Mathematical Software , 37(2):Article 20, 2010.[27] Umberto Villa, Noemi Petra, and Omar Ghattas. hIPPYlib: An Extensible Software Frameworkfor Large-Scale Deterministic and Linearized Bayesian Inverse Problems. To be submitted,2016.[28] Bjorn Engquist and Andrew Majda. Absorbing boundary conditions for the numerical simulationof waves.
Mathematics of Computation , 31(139):629–651, 1977.[29] Gary S Martin, Robert Wiley, and Kurt J Marfurt. Marmousi2: An elastic upgrade forMarmousi.
The Leading Edge , 25(2):156–166, 2006.[30] Jean Virieux and Stephane Operto. An overview of full-waveform inversion in explorationgeophysics.
Geophysics , 74(6):WCC1–WCC26, 2009.[31] Francois Lavou´e, Romain Brossier, Ludovic M´etivier, St´ephane Garambois, and Jean Virieux.Two-dimensional permittivity and conductivity imaging by full waveform inversion ofmultioffset gpr data: A frequency-domain quasi-newton approach.
Geophysical JournalInternational , 197(1):248–268, 2014.[32] Bas Peters and Felix J Herrmann. A sparse reduced Hessian approximation for multi-parameterwavefield reconstruction inversion. In
SEG Technical Program Expanded Abstracts 2014 ,pages 1206–1210. Society of Exploration Geophysicists, 2014.[33] Curt R. Vogel.
Computational Methods for Inverse Problems . Frontiers in AppliedMathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA,2002.[34] Juan Carlos De Los Reyes.
Numerical PDE-constrained optimization . Springer, 2015.[35] Michael Hinze, Rene Pinnau, Michael Ulbrich, and Stefan Ulbrich.
Optimization with PDEConstraints . Springer, 2009.[36] Fredi Tr¨oltzsch.
Optimal Control of Partial Differential Equations: Theory, Methods andApplications , volume 112 of
Graduate Studies in Mathematics . American MathematicalSociety, 2010.[37] A. Griewank. The local convergence of Broyden-like methods on Lipschitzian problems in Hilbertspace.