A comparative study of structural similarity and regularization for joint inverse problems governed by PDEs
Benjamin Crestel (1), Georg Stadler (2) and Omar Ghattas (1,3)

(1) Institute for Computational Engineering & Sciences, The University of Texas at Austin, Austin, TX, USA
(2) Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
(3) Department of Geological Sciences and Department of Mechanical Engineering, The University of Texas at Austin, Austin, TX, USA

E-mail: [email protected], [email protected], [email protected]
Abstract.
Joint inversion refers to the simultaneous inference of multiple parameter fields from observations of systems governed by single or multiple forward models. In many cases these parameter fields reflect different attributes of a single medium and are thus spatially correlated or structurally similar. By imposing prior information on their spatial correlations via a joint regularization term, we seek to improve the reconstruction of the parameter fields relative to inversion for each field independently. One of the main challenges is to devise a joint regularization functional that conveys the spatial correlations or structural similarity between the fields while at the same time permitting scalable and efficient solvers for the joint inverse problem. We describe several joint regularizations that are motivated by these goals: a cross-gradient and a normalized cross-gradient structural similarity term, the vectorial total variation, and a joint regularization based on the nuclear norm of the gradients. Based on numerical results from three classes of inverse problems with piecewise-homogeneous parameter fields, we conclude that the vectorial total variation functional is preferable to the other methods considered. Besides resulting in good reconstructions in all experiments, it allows for scalable, efficient solvers for joint inverse problems governed by PDE forward models.
Keywords: Joint inversion, multi-physics inverse problem, joint regularization, structural similarity prior, vectorial total variation, cross-gradient, nuclear norm
1. Introduction
In a joint inverse problem one seeks to reconstruct multiple parameter fields from observational data and forward models that map the parameter fields to the data. In many cases these parameter fields reflect different attributes of a single medium and are thus spatially correlated or structurally similar. By imposing prior information on their spatial correlations via a joint regularization term, we seek to improve the reconstruction of the parameter fields relative to inversion for each field independently.

We formulate the joint inverse problem as an optimization problem with a regularized data misfit objective, governed by a forward model that represents a single or multiple physical phenomena and depends on two parameter fields $m_1$ and $m_2$, which we seek to reconstruct from observational data $d$. The parameter-to-observable map $\mathcal{F}(m_1, m_2)$ typically involves solution of the forward PDEs given the parameter fields, followed by application of the observation operator, which restricts the PDE solution to the space of observables. The optimization problem is thus

\[ \min_{(m_1, m_2)} \left\{ \frac{1}{2} |\mathcal{F}(m_1, m_2) - d|^2 + R(m_1, m_2) \right\}. \qquad (1) \]

The role played by $R$ in (1) is discussed in the next paragraph. Here, we address two specific settings for (1). In the first, the forward model in $\mathcal{F}(m_1, m_2)$ describes a single physical phenomenon. An example of such a joint inverse problem is inversion for the primary and secondary wave speeds in the Earth given measurements of the acceleration at the surface. Obtaining high quality reconstructions for both parameter fields is known to be difficult without incorporating some form of prior knowledge that couples the two fields [1, 2, 3]. We refer to formulation (1) as a single physics joint inverse problem.

In the second type of joint inverse problem, we consider observations $d_1$ and $d_2$ stemming from two distinct physical phenomena respectively, each depending on a single parameter field.
In this case the forward models of the physical phenomena are uncoupled, and coupling occurs only via the inverse problem. The corresponding parameter-to-observable maps are denoted by $\mathcal{F}_1(m_1)$ and $\mathcal{F}_2(m_2)$, resulting in

\[ \min_{(m_1, m_2)} \left\{ \frac{1}{2} |\mathcal{F}_1(m_1) - d_1|^2 + \frac{1}{2} |\mathcal{F}_2(m_2) - d_2|^2 + R(m_1, m_2) \right\}. \qquad (2) \]

This formulation emerges from the general case above by defining $\mathcal{F}(m_1, m_2) = [\mathcal{F}_1(m_1), \mathcal{F}_2(m_2)]^T$ and $d = [d_1, d_2]^T$. In the context of subsurface exploration, just a few of the different physical phenomena that can be combined in (2) include electromagnetic and seismic waves [4, 5], radar and seismic waves [6], DC resistivity and seismic waves [7], and current resistivity and groundwater flow [8].

The joint regularization term $R(m_1, m_2)$ in (1) and (2) acts to impose regularity on $m_1$ and $m_2$ individually to combat ill-posedness, but can also express structural similarity or spatial correlations between the two parameter fields. The remainder of this section introduces several different choices for $R$. To isolate regularization from structural similarity, we decompose the joint regularization term $R(m_1, m_2)$ into

\[ R(m_1, m_2) = \gamma_1 R_1(m_1) + \gamma_2 R_2(m_2) + \hat{\gamma} \hat{R}(m_1, m_2), \]

with $\gamma_1, \gamma_2, \hat{\gamma} >$
$0$. The terms $R_1$ and $R_2$ are regularization terms for each parameter field; here we take them to be total variation (TV) regularizations, since our target media are piecewise-homogeneous (i.e., blocky). The term $\hat{R}(m_1, m_2)$ incorporates the structural similarity between $m_1$ and $m_2$. We now discuss several choices for $\hat{R}$.

In [7], the authors introduce the cross-gradient term

\[ \hat{R}(m_1, m_2) = \frac{1}{2} \int_\Omega |\nabla m_1 \times \nabla m_2|^2 \, dx, \]

which seeks to align gradients of the two parameter fields at each point in the medium, i.e., level sets that have the same shape. This seems to be the most popular choice in geophysical applications. A variant, the normalized cross-gradient, is given by

\[ \hat{R}_{ncg}(m_1, m_2) = \frac{1}{2} \int_\Omega \left| \frac{\nabla m_1}{|\nabla m_1|} \times \frac{\nabla m_2}{|\nabla m_2|} \right|^2 dx. \]

The normalized cross-gradient was first used in the context of image registration [9], and is discussed in section 2.2. Alternatively, when an empirical relationship between both parameters is known, one could use it in place of the structural similarity term $\hat{R}$ [4, 10]; this approach, however, can be problematic in practice as these relationships are typically uncertain (thus introducing bias) and the resulting optimization problems can be difficult to solve [10, 11].

Alternatively, a single joint regularization term can impose regularity on both parameter fields while also expressing a preference for structural similarity. In particular, we consider the vectorial total variation (VTV) functional,

\[ R(m_1, m_2) = \gamma \int_\Omega \sqrt{|\nabla m_1|^2 + |\nabla m_2|^2} \, dx, \]

with $\gamma >$
$0$. The VTV functional was introduced in the context of multi-channel imaging [12, 13], and later used in PDE-constrained joint inverse problems [10]; it is discussed in section 3. A second term we consider is the nuclear norm, which was used in [14, 15] to promote gradient alignment of a vector-valued image. Building on this idea, in section 4 we introduce a nuclear norm-based joint regularization term for PDE-constrained joint inverse problems.

The objective of this article is to construct and assess joint regularization terms that (1) are efficient for inverse problems governed by PDEs with infinite-dimensional parameter fields (and are thus large-scale after discretization) and (2) perform well in reconstructing sharp interfaces in the truth parameter fields. Indeed, targeting large-scale inverse problems entails several unique challenges that limit choices of the joint regularization term. Nonlinear inverse problems such as (1) and (2) must be solved iteratively, which requires gradient- (and Hessian-) based optimization methods to limit the number of optimization iterations, along with adjoint methods to limit the number of PDE model solutions that must be carried out at each iteration. Moreover, the adjoint method efficiently provides only directional second derivatives rather than full Hessians, the construction of which would require as many PDE solves as there are parameters (or observations). For these reasons, unless otherwise specified, we employ an inexact Hessian-free Newton–conjugate gradient method with backtracking line search [1, 16, 17]. That is, we compute the Newton search direction using the preconditioned conjugate gradient method, with early termination to guarantee a descent direction and to avoid over-solving [18]. The efficient solution of the Newton system depends crucially on the choice of preconditioner; we detail our choices for each joint regularization functional in sections 2, 3, and 4.
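The overall strategy (CG on the Newton system using only Hessian-vector products, early termination on negative curvature, Armijo backtracking) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the forcing-term and line-search constants are arbitrary choices, and in a PDE-constrained setting the Hessian action `hvp` would be computed with adjoint methods rather than an assembled matrix:

```python
import numpy as np

def inexact_newton_cg(f, grad, hvp, m0, tol=1e-6, max_iter=50):
    """Inexact Newton-CG with backtracking line search (sketch).
    hvp(m, v) returns the Hessian of f at m applied to v."""
    m = m0.copy()
    for _ in range(max_iter):
        g = grad(m)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        # Forcing term: solve the Newton system only loosely far from
        # the solution, to avoid over-solving.
        eta = min(0.5, np.sqrt(gnorm))
        # CG on H p = -g, terminated early on negative curvature so
        # that p is always a descent direction.
        p = np.zeros_like(m)
        r = -g.copy()
        d = r.copy()
        rr = r @ r
        for _ in range(m.size):
            Hd = hvp(m, d)
            curv = d @ Hd
            if curv <= 0.0:                 # negative curvature detected
                if not p.any():
                    p = -g                  # fall back to steepest descent
                break
            alpha = rr / curv
            p = p + alpha * d
            r = r - alpha * Hd
            rr_new = r @ r
            if np.sqrt(rr_new) <= eta * gnorm:
                break
            d = r + (rr_new / rr) * d
            rr = rr_new
        # Backtracking line search enforcing the Armijo condition.
        t, f0, slope = 1.0, f(m), g @ p
        while f(m + t * p) > f0 + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        m = m + t * p
    return m

# Sanity check on a strictly convex quadratic: the minimizer of
# 0.5 m^T A m - b^T m solves A m = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
m_opt = inexact_newton_cg(lambda m: 0.5 * m @ (A @ m) - b @ m,
                          lambda m: A @ m - b,
                          lambda m, v: A @ v,
                          np.zeros(2))
```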
An overview of the numerical methods we employ to solve large-scale inverse problems governed by PDEs can be found in Appendix A.

Besides practicality and efficiency, our comparison of joint inversion methods focuses on the quality of the reconstructions. Truth parameter fields in geophysical exploration and medical imaging problems often present sharp contrasts, and we therefore focus on joint regularization terms that can best preserve sharp interfaces.
The main contributions of this article are as follows: (1) We review three joint regularization terms commonly found in the literature (cross-gradient paired with TV, normalized cross-gradient paired with TV, and VTV joint regularization), and discuss their practical use for large-scale joint inverse problems governed by PDEs. We derive their first and second derivatives, and use them to study properties of the different joint regularization terms. (2) We adapt a nuclear norm joint regularization term to the context of joint inverse problems governed by PDEs. We discuss some of the resulting computational challenges, and propose a solver to address them. (3) We carry out a detailed comparison of all four joint regularization terms over a broad range of applications, and discuss their practical performance in reconstructing parameter fields with sharp interfaces.
In the next three sections, we introduce the four joint regularization terms. The cross-gradient and normalized cross-gradient are discussed in sections 2.1 and 2.2, the vectorial total variation in section 3, and the nuclear norm joint regularization in section 4. Section 5 summarizes our numerical experiments. In section 5.1, we report on several multiple physics joint inverse problems of the form (2), in which the two parameter fields arise as coefficients in two independent Poisson equations, respectively. We use this example to illustrate some key features of each joint regularization term. In section 5.2, we consider a single physics joint inverse problem of the form (1) for the acoustic wave equation, in which we invert for the bulk modulus and the density. Finally, in section 5.3, we study a multiple physics joint inverse problem with two different forward models, one an elliptic PDE and the other an acoustic wave equation. Section 6 provides concluding remarks.
2. Cross-gradient terms
In this section, we introduce the cross-gradient term and its normalized version. The main idea behind both of these structural similarity terms is to express the preference that the level sets of the inversion parameter fields $m_1$ and $m_2$ align. As illustrated in figure 1, alignment of the level sets is equivalent to the alignment of the gradients $\nabla m_1$ and $\nabla m_2$ at each point. By definition of the cross-product of two vectors, the vectors $\nabla m_1$ and $\nabla m_2$ are aligned when $|\nabla m_1 \times \nabla m_2|$ vanishes. The cross-gradient term $\hat{R}_{cg}$, defined as

\[ \hat{R}_{cg}(m_1, m_2) := \frac{1}{2} \int_\Omega |\nabla m_1 \times \nabla m_2|^2 \, dx, \qquad (3) \]

Figure 1.
Sketch of a level set of the parameter fields $m_1$ (red) and $m_2$ (blue), with their respective gradients at a point.

was introduced in [7] and has become a popular choice in geophysical applications, particularly in seismic imaging. Although the formulation (3) is intuitive, it is inconvenient for discretization and computation of derivatives. Hence, using vector calculus, we re-write (3) as

\[ \hat{R}_{cg}(m_1, m_2) = \frac{1}{2} \int_\Omega |\nabla m_1|^2 |\nabla m_2|^2 - (\nabla m_1 \cdot \nabla m_2)^2 \, dx. \qquad (4) \]

Combining the cross-gradient term (4) with independent TV regularizations for $m_1$ and $m_2$, we obtain the joint regularization

\[ R(m_1, m_2) = \gamma_1 R_{TV,\varepsilon}(m_1) + \gamma_2 R_{TV,\varepsilon}(m_2) + \hat{\gamma} \hat{R}_{cg}(m_1, m_2), \qquad (5) \]

where $\gamma_1, \gamma_2, \hat{\gamma} >$
$0$, and here and in the remainder of this paper, we use the notation

\[ R_{TV,\varepsilon}(m) := \int_\Omega \sqrt{|\nabla m|^2 + \varepsilon} \, dx \quad \text{for } \varepsilon > 0. \qquad (6) \]

In [10] the authors propose a different formulation, in which each independent TV regularization is weighted by a nonlinear function of the gradient of the other parameter. The goal of this weighting is to apply TV regularization only for points in the parameter space where the cross-gradient term by itself is not sufficient to prevent oscillatory solutions. Such oscillations may occur where the gradient of one parameter is very small, resulting in an (almost) vanishing cross-gradient term. Because this formulation further increases the nonlinearity of the problem, we instead use (5).

Next, we derive first and second derivatives of the cross-gradient regularization, interpret these derivatives as PDE operators, and draw analogies with the derivatives of the TV functional $R_{TV}$ or its regularized version (6). For this purpose, we first derive the first and second variation of the TV functional as follows:

\[ \delta_m R_{TV}(m; \tilde{m}) = \int_\Omega |\nabla m|^{-1} (\nabla m \cdot \nabla \tilde{m}) \, dx, \]
\[ \delta_m^2 R_{TV}(m; \hat{m}, \tilde{m}) = \int_\Omega |\nabla m|^{-1} (\nabla \hat{m} \cdot \nabla \tilde{m}) - |\nabla m|^{-3} (\nabla m \cdot \nabla \tilde{m})(\nabla m \cdot \nabla \hat{m}) \, dx, \]

where $\tilde{m}$ and $\hat{m}$ are arbitrary directions. Using integration by parts, the fact that $\tilde{m}$ in the expression for $\delta_m^2$ is arbitrary, and the vector identity $(a \cdot b)(c \cdot d) = b \cdot (a c^T) \cdot d$, one finds that the Hessian $H_{TV}$ is the following second-order elliptic PDE operator

\[ H_{TV} \hat{m} := -\nabla \cdot (A_{TV}(m) \nabla \hat{m}), \quad \text{with} \quad A_{TV}(m) = \frac{1}{|\nabla m|} \left( I - \frac{\nabla m \nabla m^T}{|\nabla m|^2} \right). \qquad (7) \]

This interpretation as a diffusion operator shows that $H_{TV}$ acts very differently at different points $x \in \Omega$. In particular, let us consider a point $x$ where the norm of $\nabla m$ is large, e.g., $x$ is located at an interface in the parameter field $m$.
Then, in directions parallel to $\nabla m$ (i.e., directions normal to an interface), $A_{TV}$ vanishes and thus the elliptic operator does not smooth the reconstruction $m$ in these directions. In contrast, in directions that are orthogonal to $\nabla m$ (i.e., directions that are tangent to interfaces), $A_{TV}$ does not vanish, thus smoothing the reconstruction $m$ along interfaces. This explains the anisotropic smoothing properties of the TV functional and, in particular, its ability to recover sharp interfaces in parameter fields. Away from interfaces, where $\nabla m$ is small, $A_{TV}$ behaves like a scaled identity, thus smoothing $m$ in all directions, much as $H^1$ norm-based Tikhonov regularization does.

We now turn to the derivation of the derivatives of the cross-gradient term $\hat{R}_{cg}$. Following similar arguments as for the scalar TV regularization above, this will provide us with insight regarding the regularization properties. Additionally, these derivatives are useful for devising a Newton-type algorithm for the inverse problem solution and for preconditioning the linear systems that arise.

Starting from (4), we now compute the gradient, and the action of the Hessian in a given direction, for the cross-gradient term. We perform the computations using weak forms and then use integration by parts to derive the corresponding strong forms. The directional derivative at $m := (m_1, m_2)$ in a direction $\tilde{m} := (\tilde{m}_1, \tilde{m}_2)$ is given by

\[ \delta_{m_1} \hat{R}_{cg}(m; \tilde{m}_1) = \int_\Omega |\nabla m_2|^2 (\nabla \tilde{m}_1 \cdot \nabla m_1) - (\nabla m_1 \cdot \nabla m_2)(\nabla \tilde{m}_1 \cdot \nabla m_2) \, dx, \]
\[ \delta_{m_2} \hat{R}_{cg}(m; \tilde{m}_2) = \int_\Omega |\nabla m_1|^2 (\nabla \tilde{m}_2 \cdot \nabla m_2) - (\nabla m_1 \cdot \nabla m_2)(\nabla \tilde{m}_2 \cdot \nabla m_1) \, dx. \]
Taking another variation, we find that the action of the Hessian of the cross-gradient term in a direction $\hat{m} = (\hat{m}_1, \hat{m}_2)$ is given by

\[ \delta_{m_1}^2 \hat{R}_{cg}(m; \hat{m}_1, \tilde{m}_1) = \int_\Omega |\nabla m_2|^2 (\nabla \tilde{m}_1 \cdot \nabla \hat{m}_1) - (\nabla \tilde{m}_1 \cdot \nabla m_2)(\nabla m_2 \cdot \nabla \hat{m}_1) \, dx, \]
\[ \delta_{m_1, m_2}^2 \hat{R}_{cg}(m; \hat{m}_2, \tilde{m}_1) = \int_\Omega 2 (\nabla \tilde{m}_1 \cdot \nabla m_1)(\nabla m_2 \cdot \nabla \hat{m}_2) - (\nabla m_1 \cdot \nabla m_2)(\nabla \tilde{m}_1 \cdot \nabla \hat{m}_2) - (\nabla \tilde{m}_1 \cdot \nabla m_2)(\nabla m_1 \cdot \nabla \hat{m}_2) \, dx, \]
\[ \delta_{m_2}^2 \hat{R}_{cg}(m; \hat{m}_2, \tilde{m}_2) = \int_\Omega |\nabla m_1|^2 (\nabla \tilde{m}_2 \cdot \nabla \hat{m}_2) - (\nabla \tilde{m}_2 \cdot \nabla m_1)(\nabla m_1 \cdot \nabla \hat{m}_2) \, dx. \]

In strong form and neglecting boundary conditions, the Hessian $H$ acts, in a direction $\hat{m}$, like an anisotropic vector diffusion operator, i.e., $H \hat{m} = -\nabla \cdot (A_{cg}(m) \nabla \hat{m})$, where $A_{cg}$ is a diffusion tensor given by

\[ A_{cg}(m) = \begin{bmatrix} D(m_2) & B(m) \\ B(m)^T & D(m_1) \end{bmatrix}, \qquad (8) \]

with, for $i = 1, 2$,

\[ D(m_i) := |\nabla m_i|^2 I - \nabla m_i \nabla m_i^T, \qquad B(m) := 2 \nabla m_1 \nabla m_2^T - (\nabla m_1 \cdot \nabla m_2) I - \nabla m_2 \nabla m_1^T. \]

The block-diagonal part of $A_{cg}$ indicates a TV-like behavior, but where parameter $m_1$ (resp. $m_2$) preserves interfaces in directions where parameter $m_2$ (resp. $m_1$) presents an interface; this illustrates the coupling between both parameters. As we show numerically in figure 2, the Hessian of the cross-gradient term can be indefinite. The TV regularization being a convex functional, its Hessian is guaranteed to be positive semidefinite. Therefore, the Hessian obtained by retaining the block-diagonal parts of the diffusion tensor (8), i.e., $H_d \hat{m} := -\nabla \cdot (A_{cg,d}(m) \nabla \hat{m})$, with

\[ A_{cg,d}(m) := \begin{bmatrix} D(m_2) & 0 \\ 0 & D(m_1) \end{bmatrix}, \qquad (9) \]

is also guaranteed to be positive semidefinite.
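The contrast between the full tensor (8) and its block-diagonal part (9) can be checked numerically at a single point. The following numpy sketch (the gradient values are illustrative, and the blocks follow the expressions above) shows that the full tensor can be indefinite while the block-diagonal part is positive semidefinite:

```python
import numpy as np

def D(g):
    """Diagonal block D(m_i) = |grad m_i|^2 I - grad m_i grad m_i^T."""
    return (g @ g) * np.eye(g.size) - np.outer(g, g)

def B(g1, g2):
    """Off-diagonal coupling block of the cross-gradient diffusion tensor."""
    return 2.0 * np.outer(g1, g2) - (g1 @ g2) * np.eye(g1.size) - np.outer(g2, g1)

def A_cg(g1, g2):
    """Pointwise diffusion tensor (8); note the swapped gradients on the
    diagonal: m_1 is smoothed according to the structure of m_2 and
    vice versa."""
    return np.block([[D(g2), B(g1, g2)], [B(g1, g2).T, D(g1)]])

# Orthogonal unit gradients, i.e., maximally misaligned level sets.
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

full_eigs = np.linalg.eigvalsh(A_cg(g1, g2))
Z = np.zeros((2, 2))
diag_eigs = np.linalg.eigvalsh(np.block([[D(g2), Z], [Z, D(g1)]]))
print(full_eigs.min() < 0)          # full tensor is indefinite here
print(diag_eigs.min() >= -1e-12)    # block-diagonal part is PSD
```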
For this reason, when using the cross-gradient paired with two independent TV regularizations, we precondition the Newton system with a block-diagonal matrix containing the Hessian of the TV regularizations, combined with a small multiple of the identity in each block, and the block-diagonal part of the Hessian of the cross-gradient term (9).

Figure 2.
Eigenvalues of the Hessian operator (blue) and of the block-diagonal part of the Hessian operator (red) for the (iii) cross-gradient term (3) and the (iv) normalized cross-gradient term (10) with $\varepsilon = 10^{-}$, for two combinations of truth parameter fields (i) $m_1$ and (ii) $m_2$. The domain is a unit square discretized by a 40 ×
40 mesh of squares subdivided into triangles, and the parameter fields $m_1$ and $m_2$ are discretized using continuous piecewise linear finite elements.

A disadvantage of the cross-gradient term (4) is that it vanishes where one of the inversion parameter fields is constant, hence potentially ignoring sharp discontinuities in the other. A remedy, proposed in the context of image registration in [9], is to normalize the gradients of both inversion parameters in the formulation of the cross-gradient. The normalized cross-gradient is given by

\[ \hat{R}(m_1, m_2) = \frac{1}{2} \int_\Omega \left| \frac{\nabla m_1}{|\nabla m_1|} \times \frac{\nabla m_2}{|\nabla m_2|} \right|^2 dx = \frac{1}{2} \int_\Omega 1 - \left( \frac{\nabla m_1 \cdot \nabla m_2}{|\nabla m_1| \, |\nabla m_2|} \right)^2 dx. \]

Since this formulation is non-differentiable where $|\nabla m_1| = 0$ or $|\nabla m_2| = 0$, we use the modified normalized cross-gradient,

\[ \hat{R}_{ncg}(m_1, m_2) := \frac{1}{2} \int_\Omega 1 - \left( \frac{\nabla m_1 \cdot \nabla m_2}{\sqrt{|\nabla m_1|^2 + \varepsilon} \, \sqrt{|\nabla m_2|^2 + \varepsilon}} \right)^2 dx, \qquad (10) \]

with $\varepsilon >$
$0$. In the rest of this paper, we refer to (10) when discussing the normalized cross-gradient. Combining the normalized cross-gradient term (10) with two TV regularizations, we obtain the joint regularization

\[ R(m_1, m_2) = \gamma_1 R_{TV,\varepsilon}(m_1) + \gamma_2 R_{TV,\varepsilon}(m_2) + \hat{\gamma} \hat{R}_{ncg}(m_1, m_2), \qquad (11) \]

where $\gamma_1, \gamma_2, \hat{\gamma} >$
$0$. Compared to the cross-gradient term, the derivatives of the normalized cross-gradient term give less obvious insight into its regularization behavior. Instead, we illustrate numerically that the normalized cross-gradient often behaves as a concave operator. In figure 2, we plot the eigenvalues of its Hessian and of the block-diagonal part of its Hessian for different parameter fields $m_1$ and $m_2$, and observe that most eigenvalues are negative. The main practical consequence of this observation is that the Hessian of the joint regularization (11) may be indefinite. For this reason, the preconditioner for the Newton system is formed by the Hessians of the TV regularizations alone.
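The effect of the normalization can be illustrated by evaluating the integrands of (4) and (10) pointwise. In the sketch below (the gradient magnitudes and ε are illustrative values), the cross-gradient integrand is negligible wherever one field is nearly constant, regardless of alignment, whereas the smoothed normalized version still distinguishes aligned from misaligned gradients:

```python
import numpy as np

def cg_integrand(g1, g2):
    """Cross-gradient integrand from (4) at a single point."""
    return 0.5 * ((g1 @ g1) * (g2 @ g2) - (g1 @ g2) ** 2)

def ncg_integrand(g1, g2, eps=1e-8):
    """Smoothed normalized cross-gradient integrand from (10) at a point."""
    n1 = np.sqrt(g1 @ g1 + eps)
    n2 = np.sqrt(g2 @ g2 + eps)
    return 0.5 * (1.0 - ((g1 @ g2) / (n1 * n2)) ** 2)

weak = 1e-3 * np.array([1.0, 0.0])   # m_1 is nearly constant here
aligned = np.array([1.0, 0.0])       # edge in m_2, aligned with weak
misaligned = np.array([0.0, 1.0])    # edge in m_2, orthogonal to weak

# The cross-gradient barely penalizes the misaligned edge ...
print(cg_integrand(weak, aligned), cg_integrand(weak, misaligned))
# ... while the normalized version still carries an O(1) penalty.
print(ncg_integrand(weak, aligned), ncg_integrand(weak, misaligned))
```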
3. Vectorial total variation
The vectorial total variation functional [13], or color TV [12], is the multi-parameter equivalent of the total variation functional. It was first introduced for multi-channel imaging applications [12, 13], and later applied to joint inverse problems [10]. The VTV functional is convex, and unlike the cross-gradient and normalized cross-gradient, it serves as a regularization by itself, i.e., it does not require additional regularization terms. It is given by

\[ R(m_1, m_2) = \gamma \int_\Omega \sqrt{|\nabla m_1|^2 + |\nabla m_2|^2} \, dx, \qquad (12) \]

with $\gamma >$
$0$. Since this formulation is non-differentiable where $|\nabla m_1| = |\nabla m_2| = 0$, we introduce a modified VTV regularization given by

\[ R_{VTV}(m_1, m_2) := \gamma \int_\Omega \sqrt{|\nabla m_1|^2 + |\nabla m_2|^2 + \varepsilon} \, dx, \qquad (13) \]

with $\varepsilon, \gamma >$
$0$. Whereas the cross-gradient terms (see section 2) work by aligning the level sets of the inversion parameter fields, VTV favors superimposition of discontinuities. An intuitive way to explain this, given the understanding of the TV regularization [19], is sketched in figure 3. Given two parameter fields, each with a single jump of the same amplitude, the VTV functional is minimal when both jumps occur at the same location.

\[ R_{VTV}(m_1, m_2) = \text{``} 2 \int_\Omega \sqrt{|\nabla m_1|^2} \, dx \text{''} > \text{``} \sqrt{2} \int_\Omega \sqrt{|\nabla m_1|^2} \, dx \text{''} \]

Figure 3.
Values of the VTV regularization (12) for two parameter fields $m_1$ and $m_2$ defined over $\Omega = [0, 2]$ with $R_{TV}(m_1) = R_{TV}(m_2)$: the value is larger when the two jumps occur at different locations than when they coincide. This informal argument can be made rigorous by using piecewise linear functions for $m_1$ and $m_2$.

The derivatives of the VTV regularization resemble those of the TV regularization. For simplicity, we set $\gamma \equiv 1$. The first variation at $m = (m_1, m_2)$ in a direction $\tilde{m} = (\tilde{m}_1, \tilde{m}_2)$ is given by

\[ \delta_{m_i} R_{VTV}(m; \tilde{m}_i) = \int_\Omega \frac{\nabla m_i \cdot \nabla \tilde{m}_i}{\sqrt{|\nabla m_1|^2 + |\nabla m_2|^2 + \varepsilon}} \, dx, \quad \text{for } i = 1, 2. \qquad (14) \]

We again interpret the Hessian of the VTV as a diffusion tensor to study its anisotropic diffusion behavior. In strong form (see section 2.1), it is given by

\[ A_{VTV}(m) := \frac{1}{|\nabla m|_\varepsilon} \begin{bmatrix} I - \dfrac{\nabla m_1 \nabla m_1^T}{|\nabla m|_\varepsilon^2} & - \dfrac{\nabla m_1 \nabla m_2^T}{|\nabla m|_\varepsilon^2} \\[2mm] - \dfrac{\nabla m_2 \nabla m_1^T}{|\nabla m|_\varepsilon^2} & I - \dfrac{\nabla m_2 \nabla m_2^T}{|\nabla m|_\varepsilon^2} \end{bmatrix}, \qquad (15) \]

where $|\nabla m|_\varepsilon := \sqrt{|\nabla m_1|^2 + |\nabla m_2|^2 + \varepsilon}$. Comparing with the diffusion tensor for the Hessian of the TV regularization (7), we find similar terms along the block diagonal, with the exception of the normalization factor in the denominator. It is $|\nabla m_i|$ in the case of TV, and $|\nabla m|_\varepsilon$ in the case of VTV, i.e., it involves the gradients of both parameters, hence introducing coupling between the parameter fields. The eigen-decomposition of the diffusion tensor of the Hessian provides further insights. For simplicity, we use $\varepsilon = 0$ in this analysis. Skipping details that can be found in [20], the eigenpairs for the diffusion tensor $A_{VTV}$ are

\[ \left( \begin{bmatrix} \nabla m_1 \\ \nabla m_2 \end{bmatrix}, 0 \right), \quad \left( \begin{bmatrix} (\nabla m_1)^\perp \\ 0 \end{bmatrix}, \frac{1}{|\nabla m|} \right), \quad \left( \begin{bmatrix} 0 \\ (\nabla m_2)^\perp \end{bmatrix}, \frac{1}{|\nabla m|} \right), \quad \left( \begin{bmatrix} \nabla m_2 \\ -\nabla m_1 \end{bmatrix}, \frac{1}{|\nabla m|} \right), \]

where $|\nabla m| := \sqrt{|\nabla m_1|^2 + |\nabla m_2|^2}$. The kernel of the diffusion tensor contains parameter field directions that are not smoothed out by the regularization. Reconstructions in these directions can display sharp edges.
It is informative to compare the eigenpairs of the diffusion tensor arising from the VTV Hessian with those arising from $R_{TV}(m_1) + R_{TV}(m_2)$, the sum of two independent TV regularizations:

\[ \left( \begin{bmatrix} \nabla m_1 \\ 0 \end{bmatrix}, 0 \right), \quad \left( \begin{bmatrix} 0 \\ \nabla m_2 \end{bmatrix}, 0 \right), \quad \left( \begin{bmatrix} (\nabla m_1)^\perp \\ 0 \end{bmatrix}, \frac{1}{|\nabla m_1|} \right), \quad \left( \begin{bmatrix} 0 \\ (\nabla m_2)^\perp \end{bmatrix}, \frac{1}{|\nabla m_2|} \right). \]

The sum of independent TV regularizations acts in the direction of each parameter $m_i$ independently from the other parameters, analogously to the TV functional for a single inverse problem. That is, it preserves sharp interfaces in the parameter fields (large values of $|\nabla m_i|$) but smoothes along interfaces. This is in contrast with the kernel of the diffusion tensor of VTV, which favors parameter fields with sharp variations occurring at the same physical locations.

The use of TV regularization in PDE-constrained inverse problems increases the nonlinearity of the problem, and requires the use of customized solvers. Due to the similarity between TV and VTV, a similarly challenging numerical behavior can be expected for VTV. In [20] we tailor a primal-dual Newton method [21] for the efficient, scalable solution of PDE-constrained joint inverse problems regularized with VTV. Since the focus of the current paper is on a qualitative comparison of several joint regularization terms, we skip details of this solver here and instead refer to [20].
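The preference of VTV for co-located jumps (figure 3) is easy to verify with a crude 1D discretization. In the sketch below (one-sided differences; the mesh size and ε are arbitrary illustrative choices), two unit jumps at the same location give a VTV value close to √2, while separated jumps give a value close to 2:

```python
import numpy as np

def vtv(m1, m2, h, eps=1e-12):
    """Discrete VTV (13) for piecewise-constant 1D fields, using one-sided
    differences (illustrative discretization, gamma = 1)."""
    d1 = np.diff(m1) / h
    d2 = np.diff(m2) / h
    return np.sum(np.sqrt(d1**2 + d2**2 + eps)) * h

n = 200
h = 2.0 / n
x = np.linspace(0.0, 2.0, n + 1)

m1 = (x > 0.5).astype(float)        # unit jump at x = 0.5
m2_same = (x > 0.5).astype(float)   # jump at the same location
m2_other = (x > 1.5).astype(float)  # jump at a different location

v_same = vtv(m1, m2_same, h)    # close to sqrt(2) ~ 1.414
v_other = vtv(m1, m2_other, h)  # close to 2.0
print(v_same < v_other)         # True: VTV prefers co-located jumps
```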
4. Nuclear norm joint regularization
The nuclear norm joint regularization seeks to promote gradient alignment by minimizing the rank of the matrix whose columns are the gradients of the parameter fields. Different versions of that idea have been used in various imaging applications. In color image denoising, this approach is often referred to as total nuclear variation [14]; the unified framework used to discuss VTV and the total nuclear variation in [14] shows that the nuclear norm-based functional is a regularizer in itself. This can be simply justified by the equivalence of all norms in finite dimensions (here, on the space of matrices). In [15], the authors propose the pointwise nuclear norm of a matrix field as regularization to express a preference for alignment of image edges. Building on [14, 15], we propose a nuclear norm joint regularization suitable for large-scale PDE-constrained optimization.

As for the methods discussed in section 2, this term seeks to promote alignment of parameter level sets by attaining its minimum value when gradients align. Let us introduce the matrix-valued function $G : \Omega \to \mathbb{R}^{d \times 2}$, with $\Omega \subset \mathbb{R}^d$ the physical domain, defined by

\[ G(x) := \begin{bmatrix} \nabla m_1 & \nabla m_2 \end{bmatrix} = \begin{bmatrix} \partial_{x_1} m_1 & \partial_{x_1} m_2 \\ \vdots & \vdots \\ \partial_{x_d} m_1 & \partial_{x_d} m_2 \end{bmatrix}. \]

The gradients $\nabla m_1$ and $\nabla m_2$ are aligned at $x \in \Omega$ if the columns of $G(x)$ are multiples of each other, in which case the rank of $G(x)$ is 1. One could seek to promote gradient alignment by minimizing $\int_\Omega \mathrm{rank}(G(x)) \, dx$. However, in practice, minimization of the rank of a matrix is notoriously difficult. The nuclear norm of a matrix, defined as the $\ell^1$-norm of its singular values and denoted by $\| \cdot \|_*$, is often a good proxy for the rank [22]. We therefore define, with $\gamma >$
$0$, the nuclear norm joint regularization as

\[ R_*(m_1, m_2) := \gamma \int_\Omega \| G(x) \|_* \, dx. \qquad (16) \]

We now compute derivatives of (16) using the chain rule. Let us introduce the notation $f(M) := \|M\|_*$, for arbitrary $M \in \mathbb{R}^{d \times 2}$. Thus, $R_*(m_1, m_2) = \gamma \int_\Omega f(G(x)) \, dx$. Denoting the gradient of $f$ with respect to the entries of matrix $M$ by $\nabla f(M) \in \mathbb{R}^{d \times 2}$, the first directional derivatives of (16) with respect to the inversion parameters $m_i$, $i = 1,$
$2$, in a direction $\tilde{m}_i$, are given by

\[ \partial_{m_i} R_*(m_1, m_2) \, \tilde{m}_i = \gamma \int_\Omega \left( \nabla f(G), \partial_{m_i} G(x) \tilde{m}_i \right) dx, \qquad (17) \]

where

\[ \partial_{m_1} G(x) \tilde{m}_1 = \begin{bmatrix} \partial_{x_1} \tilde{m}_1 & 0 \\ \vdots & \vdots \\ \partial_{x_d} \tilde{m}_1 & 0 \end{bmatrix} \quad \text{and} \quad \partial_{m_2} G(x) \tilde{m}_2 = \begin{bmatrix} 0 & \partial_{x_1} \tilde{m}_2 \\ \vdots & \vdots \\ 0 & \partial_{x_d} \tilde{m}_2 \end{bmatrix}, \]

and the inner product for matrices $M = (m_{ij})_{ij}$, $N = (n_{ij})_{ij} \in \mathbb{R}^{d \times 2}$ is defined as $(M, N) = \sum_{i=1}^d \sum_{j=1}^2 m_{ij} n_{ij}$.

We next compute the gradient of the nuclear norm $\nabla f(M)$. Given a full-rank matrix $M \in \mathbb{R}^{n \times m}$, i.e., $r := \mathrm{rank}(M) = \min(m, n)$, and singular values $\{\sigma_k\}_{k=1}^r$, we define its (reduced) singular value decomposition (SVD) by $M = U \Sigma V^T$, with $U \in \mathbb{R}^{n \times r}$, $V \in \mathbb{R}^{m \times r}$, and $\Sigma \in \mathbb{R}^{r \times r}$ a diagonal matrix containing the singular values of $M$, i.e., $\Sigma_{kk} = \sigma_k > 0$ for $k = 1, \dots, r$. The $(i, j)$-entry of the gradient of the nuclear norm is given by

\[ (\nabla f(M))_{ij} = \sum_{k=1}^r \frac{\partial \sigma_k}{\partial m_{ij}} = \sum_{k=1}^r u_{ik} v_{jk}, \]

where the second equality uses the singular value sensitivity [23]. The gradient of the nuclear norm with respect to the entries of $M$ is then given by $\nabla f(M) = U V^T$.

The nuclear norm $f(M)$ is non-differentiable when the matrix $M$ is not full-rank, corresponding to the case where at least one of the singular values vanishes. To make it differentiable, similar to the treatment of TV regularization, we define the modified nuclear norm by

\[ f_\varepsilon(M) := \| M \|_{*,\varepsilon} = \sum_{k=1}^{\min(m,n)} \sqrt{\sigma_k^2 + \varepsilon}, \qquad (18) \]

where $\varepsilon >$
0. For γ >
$0$, we define the modified nuclear norm joint regularization as

\[ R_{*,\varepsilon}(m_1, m_2) := \gamma \int_\Omega f_\varepsilon(G(x)) \, dx. \qquad (19) \]

The $(i, j)$-entry of the gradient of the modified nuclear norm (18) is given by

\[ (\nabla f_\varepsilon(M))_{ij} = \frac{\partial}{\partial m_{ij}} \sum_{k=1}^{\min(m,n)} \sqrt{\sigma_k^2 + \varepsilon} = \sum_{k=1}^r \frac{\sigma_k}{\sqrt{\sigma_k^2 + \varepsilon}} \frac{\partial \sigma_k}{\partial m_{ij}}, \]

where the sum reduces to its first $r$ terms since, by definition of the rank of a matrix, $\sigma_k = 0$ for all $k > r$. Let us now introduce the diagonal matrix $W_\varepsilon \in \mathbb{R}^{r \times r}$, with entries $(W_\varepsilon)_{ii} = \sigma_i / \sqrt{\sigma_i^2 + \varepsilon}$. Using the expression for the sensitivity of the singular values [23], the gradient of the modified nuclear norm is then given by

\[ \nabla f_\varepsilon(M) = U W_\varepsilon V^T. \qquad (20) \]

The first directional derivatives of (19) with respect to the inversion parameters $m_i$, $i = 1,$
$2$, in a direction $\tilde{m}_i$, are given by

\[ \partial_{m_i} R_{*,\varepsilon}(m_1, m_2) \, \tilde{m}_i = \gamma \int_\Omega \left( \nabla f_\varepsilon(G), \partial_{m_i} G(x) \tilde{m}_i \right) dx. \qquad (21) \]

The modified nuclear norm (18), however, is not twice differentiable when two singular values are equal (crossing singular values). This is because the second derivative requires the sensitivity of the individual singular vectors, which are not differentiable where singular values cross. We have not found a practical workaround for this singularity, and thus proceed with a gradient-based method to solve joint inverse problems regularized with the nuclear norm joint regularization; the solver is detailed in Appendix A.2. In the rest of this paper, when using "nuclear norm joint regularization", we refer to the modified nuclear norm joint regularization (19).
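Formulas (18) and (20) are straightforward to implement with a dense SVD, since $G(x)$ is only $d \times 2$ at each point. The following numpy sketch evaluates $f_\varepsilon$ and $\nabla f_\varepsilon = U W_\varepsilon V^T$ and checks the gradient against a central finite difference; the test matrix is random and purely illustrative:

```python
import numpy as np

def f_eps(M, eps=1e-8):
    """Modified nuclear norm (18): sum_k sqrt(sigma_k^2 + eps)."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.sum(np.sqrt(s**2 + eps))

def grad_f_eps(M, eps=1e-8):
    """Gradient (20): U W_eps V^T with (W_eps)_kk = sigma_k/sqrt(sigma_k^2+eps)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(s / np.sqrt(s**2 + eps)) @ Vt

# Central finite-difference check at a generic (random, illustrative)
# matrix G = [grad m_1 | grad m_2], here with d = 2.
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 2))
E = rng.standard_normal((2, 2))
t = 1e-6
fd = (f_eps(G + t * E) - f_eps(G - t * E)) / (2 * t)
exact = np.sum(grad_f_eps(G) * E)   # matrix inner product (grad f_eps(G), E)
print(abs(fd - exact) < 1e-6)       # True
```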
5. Numerical examples
In this section, we present a comprehensive numerical comparison of the four joint regularization approaches introduced in sections 2–4, i.e., the cross-gradient (5), the normalized cross-gradient (11), the vectorial total variation (13), and the nuclear norm (19) regularization. Reconstructions obtained with these joint regularization terms are compared with each other, and with the reconstructions obtained by solving a joint inverse problem with independent TV regularizations. The parameters for all joint regularization terms are selected empirically as leading to the best reconstructions. The values of $\varepsilon$ are chosen small enough to provide reconstructions with sharp interfaces, but large enough to avoid numerical difficulties (see for instance the discussion in [24]).

The different regularizations are compared using three examples covering both types of joint inverse problems (1) and (2). In section 5.1, we combine two uncoupled Poisson inverse problems to form the joint inverse problem (2), where we invoke prior knowledge that the two truth parameter fields have similar structure. In section 5.2, we compare the ability of the joint regularization terms to improve the reconstruction of the bulk modulus and the density in an acoustic wave equation, an example of a joint inverse problem (1). Finally, in section 5.3, we formulate a multi-physics joint inverse problem (2), which combines an inverse problem governed by the Poisson equation with one governed by the acoustic wave equation. Here again, the Poisson parameter and the wave speed fields are assumed to have similar structure.

In all examples, the domain is a 2D unit square, with a uniform mesh of isosceles right triangles obtained by cutting in half $N \times N$ squares; we define the mesh size parameter $h := 1/N$. All data are generated synthetically from the truth parameter fields, and then polluted by adding independent and identically distributed Gaussian noise; the noise level is specific to each example.
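For concreteness, the data-pollution step can be sketched in a few lines. The scaling of the noise standard deviation to a relative noise level below is our assumption for illustration, not necessarily the exact convention used in the experiments:

```python
import numpy as np

def add_noise(d, level, rng):
    """Add iid Gaussian noise to synthetic data d, scaled so that the
    expected noise norm is `level` times the data norm (assumed convention)."""
    sigma = level * np.linalg.norm(d) / np.sqrt(d.size)
    return d + sigma * rng.standard_normal(d.size)

rng = np.random.default_rng(1)
d_clean = np.ones(10_000)                # placeholder synthetic data
d_noisy = add_noise(d_clean, 0.02, rng)  # 2% noise, as in section 5.1
```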
We use continuous Galerkin finite elements to discretize all field variables, i.e., the state, adjoint, incremental state and adjoint, and parameter fields.

Here, we solve a joint inverse problem of the form (2) for the two coefficient fields m₁ and m₂. Considered separately, m₁ and m₂ are solutions to the (almost identical) TV-regularized inverse problems governed by the Poisson equation, i.e.,

m_i := arg min_m { ‖B_i u − d_i‖² + γ_i ∫_Ω √( |∇m|² + ε ) dx },  where  −∇·(e^m ∇u) = 1 in Ω,  u = 0 on ∂Ω.   (22)

The operators B_i represent pointwise observation operators, and the data d_i are synthetic observations polluted with 2% Gaussian noise. The domain Ω is discretized with a mesh of 8192 triangles (i.e., h = 1/64), and the initial guesses are m₁ ≡ m₂ ≡ 0. The two examples below differ in the truth parameter fields and in the observation operators B_i. In the first example (section 5.1.1), the truth parameter fields differ but have interfaces at the same spatial locations. In the second example (section 5.1.2), some interfaces in the truth parameter field for m₂ are not present in the truth parameter field for m₁. In both examples, the observation locations defined by B₁ only cover the top-right quadrant of the domain, whereas the observation locations defined by B₂ are distributed over the entire domain; see figures 4 and 6.

In the first example, the parameter fields have interfaces at the same locations. In figure 4, we show the truth parameter fields m₁ and m₂ and their reconstructions obtained by solving the inverse problems (22) independently. The reconstructions obtained with the four regularization methods are shown in figure 5, and the corresponding values of the relative medium misfit are given in table C1.

The reconstructions for parameter m₂ do not differ significantly (figure 5b). Due to the large number of observation points, this parameter is already well reconstructed in an independent inverse problem (figure 4d).
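The smoothed TV term ∫√(|∇m|² + ε) dx appearing in (22) can be evaluated on a uniform grid as in the following sketch (finite differences and a simple nodal quadrature; an illustration only, not the finite-element implementation used in the paper):

```python
import numpy as np

def smoothed_tv(m, h, eps):
    """Discrete smoothed total variation of a 2-D field m on a uniform
    grid with spacing h:  sum over nodes of sqrt(|grad m|^2 + eps) * h^2."""
    gy, gx = np.gradient(m, h)
    return np.sum(np.sqrt(gx**2 + gy**2 + eps)) * h**2
```

For a field with |∇m| = 1 on the unit square, the value is close to 1; larger ε smooths the functional but inflates its value.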
We observe an improvement in the reconstruction of parameter m₁ for all four joint inverse problems compared to the independent reconstruction shown in figure 4c. Using the cross-gradient only marginally improves the reconstruction for parameter m₁, most likely because the independent reconstruction for m₁ shows large areas of constant values, where the cross-gradient term vanishes; these areas therefore cannot be improved by the cross-gradient. The normalized cross-gradient improves over the cross-gradient but fails to recover the circular interface. Both the VTV joint regularization and the nuclear norm joint regularization perform better in this example, and lead to reconstructions that contain all sharp interfaces in the target image.

Figure 4. Parameter fields m₁ and m₂ in the example of section 5.1.1: truth parameter fields (a,b) and reconstructions (c,d) obtained by solving the inverse problem (22) with ε = 10⁻, γ₁ = 3·10⁻, γ₂ = 4·10⁻, and initial guesses m₁ = m₂ = 0. White dots in (a) and (b) indicate the location of the pointwise observations. The observation points defined through B₁ are a lattice of 25 × 25 points covering the top-right quadrant, and those defined through B₂ are a square lattice of 50 × 50 points distributed over the entire domain.

Figure 5. Reconstructions for the parameter fields (a) m₁ and (b) m₂, obtained by solving a joint inverse problem (2) regularized with (i) the cross-gradient (γ = 2·10⁻) combined with two independent TV regularizations, (ii) the normalized cross-gradient (γ = 6·10⁻ and ε = 10⁻) combined with the same independent TV regularizations, (iii) the VTV joint regularization (γ = 3·10⁻ and ε = 10⁻), and (iv) the nuclear norm joint regularization (γ = 3·10⁻ and ε = 10⁻). The parameters for the independent TV regularizations and the initial guesses for all problems are as for the independent inverse problems (see caption of figure 4). The legend for all plots is as in figure 4.

In the second example, the only difference with the previous example is that the truth parameter field for m₁ no longer contains the vertical discontinuity. In figure 6, we show the truth parameter fields m₁ and m₂ and the reconstructions obtained by solving two independent inverse problems (22). The reconstructions for the four joint inverse problems are shown in figure 7, and the corresponding values of the relative medium misfit are given in table C1.

Figure 6. Parameter fields m₁ and m₂ in the example of section 5.1.2: truth parameter fields (a,b) and reconstructions (c,d) obtained by solving the inverse problem (22) with ε = 10⁻, γ₁ = 4·10⁻, γ₂ = 4·10⁻, and initial guesses m₁ = m₂ = 0. White dots in (a) and (b) indicate the location of the pointwise observations, as detailed in figure 4.

As in the previous example, for m₂ the reconstructions obtained with the different joint inverse problems do not differ significantly (see figure 7b). However, we observe differences among the reconstructions for parameter m₁. Using the cross-gradient only marginally improves the reconstruction for parameter m₁. The use of the normalized cross-gradient does not show improvement over the cross-gradient. As in the first example, both the VTV joint regularization and the nuclear norm joint regularization perform the best, and their corresponding reconstructions contain all sharp interfaces present in the true image. However, in figures 7a (iii) and (iv) we also see a vertical discontinuity not present in the true image (figure 6a). This ghost interface in m₁ is due to the presence of such a discontinuity in m₂, and highlights the tendency of the VTV joint regularization and nuclear norm joint regularization to superimpose discontinuities in both parameters. Note, however, that the amplitude of this ghost interface is small compared to the amplitudes of the correctly recovered interfaces.
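The failure mode noted above — the cross-gradient vanishing wherever one field is locally constant — is easy to see in a discrete sketch. Assuming the cross-gradient is the pointwise 2D scalar cross product of the two gradients (the precise definition (5) appears earlier in the paper):

```python
import numpy as np

def cross_gradient_map(m1, m2, h=1.0):
    """Pointwise cross-gradient t = dx(m1)*dy(m2) - dy(m1)*dx(m2),
    the 2-D scalar cross product of the two gradient fields."""
    g1y, g1x = np.gradient(m1, h)
    g2y, g2x = np.gradient(m2, h)
    return g1x * g2y - g1y * g2x
```

Wherever m₁ is constant, t ≡ 0 regardless of m₂, so the term carries no information there and cannot penalize (or correct) structure in the other field.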
We now study a joint inverse problem of the form (1), i.e., both parameters enter the same equation, namely the acoustic wave equation.
Figure 7. Reconstructions for the parameter fields (a) m₁ and (b) m₂, obtained by solving a joint inverse problem (2) regularized with (i) the cross-gradient combined with two independent TV regularizations (γ = 5·10⁻), (ii) the normalized cross-gradient combined with the same independent TV regularizations (γ = 7·10⁻ and ε = 10⁻), (iii) the VTV joint regularization (γ = 4·10⁻ and ε = 10⁻), and (iv) the nuclear norm joint regularization (γ = 4·10⁻ and ε = 10⁻). The parameters for the independent TV regularizations and all initial guesses are the same as used for the independent inverse problems (see caption of figure 6). The legend is as in figure 6.

We start by defining the forward problem, i.e., the acoustic wave PDE. The propagation of acoustic waves depends on the bulk modulus κ and the density ρ of the medium of propagation. Let us define the acoustic pressure, u(x, t) := −κ(x) ∇·𝐮(x, t), with 𝐮(x, t) the displacement vector at location x and time t. The time-domain acoustic wave equation with first-order absorbing boundary condition [28] and initial conditions at rest is given by

(1/κ) ü − ∇·( (1/ρ) ∇u ) = f,   in Ω × (0, T),
u(x, 0) = u̇(x, 0) = 0,          in Ω,
(1/ρ) ∇u·n = 0,                  on ∂Ω_n × (0, T),
(1/ρ) ∇u·n = −(1/√(κρ)) u̇,      on ∂Ω_a × (0, T),   (23)

where f is a forcing term, u̇ and ü are the first and second time derivatives of u, and the boundary of the domain ∂Ω is partitioned as ∂Ω = ∂Ω_a ∪ ∂Ω_n. The acoustic wave velocity of the medium is given by c, with the relation κ = ρc². The PDE in (23) is the variable-density form of the acoustic wave equation; when the density ρ is assumed constant, equation (23) reduces to (1/c²) ü − Δu = f̃.

Here, we assume that both the bulk modulus κ and the density ρ are unknown. Since they both appear in (23) through their inverses, we introduce the parameters α := 1/κ and β := 1/ρ, and formulate the inverse problem in terms of α and β. As is common in seismic inversion, we consider N_s multiple experiments, characterized by their forcing terms f_i and datasets d_i, which correspond to pointwise observations in space, recorded continuously in time. The acoustic wave inverse problem is then

min_{α,β>0} { (1/N_s) Σ_{i=1}^{N_s} ∫₀ᵀ ‖B u_i(t) − d_i(t)‖² dt + R(α, β) },   (24)

where each u_i solves the forward problem (23) with forcing term f_i,

α ü_i − ∇·( β ∇u_i ) = f_i,   in Ω × (0, T),
u_i(x, 0) = u̇_i(x, 0) = 0,    in Ω,
β ∇u_i·n = 0,                  on ∂Ω_n × (0, T),
β ∇u_i·n = −√(αβ) u̇_i,        on ∂Ω_a × (0, T).

In our experiments, the physical constraints α, β > 0 remained satisfied without being enforced explicitly. Because the solution of the acoustic wave equation couples the parameters α and β, the inverse problem (24) could be regularized by two independent TV regularizations, i.e., R(α, β) = R_{TV,ε}(α) + R_{TV,ε}(β) [1]. However, the resulting problem can be difficult to solve and does not incorporate the structural correlation that usually exists between these parameters due to the types of rock occurring in the subsurface. Going beyond the use of ad hoc methods to handle both parameters at once, some researchers have addressed (24) as a joint inverse problem [2, 3]. Previous attempts have used the cross-gradient term, but not its normalized version, the VTV, or the nuclear norm regularization. In this section, we study whether the use of joint regularization can improve reconstructions for α and β.

In our numerical tests, we use 6 independent sources f_i(x, t), located on the top boundary of the domain at 0.1, 0.25, 0.4, 0.6, 0.75, and 0.9 from the left boundary (yellow stars in figure 8a); each source is a point source in space, and a Ricker wavelet in time with a central frequency of 2 Hz. The data are recorded at 20 locations equally spaced along the top boundary (green triangles in figure 8b), and polluted by independent Gaussian noise with zero mean and variance corresponding to a signal-to-noise ratio of 20 dB. The boundary conditions are a homogeneous Neumann boundary condition along the top boundary ∂Ω_n = [0, 1] × {1}, and an absorbing boundary condition along the left, bottom, and right boundaries ∂Ω_a = {0, 1} × [0, 1] ∪ [0, 1] × {0}. The truth parameter fields for α and β are shown in figure 8; they correspond to an acoustic wave velocity varying from 2 km/s to 3 km/s‡, typical values for a shallow subsurface (see for instance [29, 30]).
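The source time function mentioned above is the standard Ricker wavelet, r(t) = (1 − 2π²f₀²(t−t₀)²) exp(−π²f₀²(t−t₀)²); a minimal sketch follows, where the delay t₀ is an implementation detail not specified in the text:

```python
import numpy as np

def ricker(t, f0, t0=0.0):
    """Ricker wavelet with central frequency f0 (Hz), delayed by t0."""
    a = (np.pi * f0 * (t - t0))**2
    return (1.0 - 2.0 * a) * np.exp(-a)
```

The wavelet peaks at value 1 at t = t₀ and decays rapidly away from it, so a delay of a few periods makes the source effectively causal.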
‡ The following units are used: distance in km, velocity in km/s, density in g/cm³, and bulk modulus in GPa.

The finite-element mesh consists of 800 triangles (h = 1/20). The initial guesses for α and β are smoothed versions of the truth parameter fields (see figure 8ii).

In figure 8iii, we show the reconstructions of parameters α and β obtained by solving (24) with independent TV regularizations. Whereas parameter α is well reconstructed, the reconstruction for β is rather poor. We next solve (24) with the proposed joint regularization terms. The results are shown in figure 9, and the corresponding values of the relative medium misfit are given in table C1.

The different reconstructions for α (figure 9a) do not differ significantly from each other. However, the use of joint regularization improves the quality of the reconstruction of β (figure 9b). Whereas the use of the cross-gradient only results in marginal improvement compared to the reconstruction in figure 8d, the use of the normalized cross-gradient allows recovery of the interfaces more clearly. The best reconstructions are obtained using the VTV or the nuclear norm joint regularizations.

Figure 8. Parameter fields (a) α and (b) β in the joint acoustic inverse problem (24): (i) truth parameter fields, (ii) initial guesses, and (iii) reconstructions when solving (24) regularized with two independent TV regularizations (ε = 10⁻, γ_α = 5·10⁻, and γ_β = 9·10⁻). The yellow stars in (a-i) and the green triangles in (b-i) indicate the locations of the point sources and observations, respectively.

Figure 9. Reconstructions for the parameter fields (a) α and (b) β, obtained by solving (24) regularized with (i) the cross-gradient (γ = 10⁻) combined with two independent TV regularizations, (ii) the normalized cross-gradient (γ = 9·10⁻ and ε = 10⁻) combined with the same independent TV regularizations, (iii) the VTV joint regularization (γ = 7·10⁻ and ε = 10⁻), and (iv) the nuclear norm joint regularization (γ = 7·10⁻ and ε = 10⁻). The parameters for the independent TV regularizations are the ones selected for the independent inverse problems (see caption of figure 8). The legend is as in figure 8.

As a last problem, we study a joint inverse problem (2) governed by two different physics models; namely, we combine a Poisson inverse problem and an acoustic wave inverse problem (assuming the density ρ is known). This inverse problem is intended as a model problem for joint seismic-electromagnetic inversion in the electromagnetic low-frequency limit. The Poisson inverse problem is identical to the one used in section 5.1,

min_m { ‖Bu − d‖² + γ_m ∫_Ω √( |∇m|² + ε ) dx },  where  −∇·(e^m ∇u) = 1 in Ω,  u = 0 on ∂Ω.   (25)

The observation operator B extracts the state u at 20 ×
20 equally distributed points over the entire domain (white dots in figure 10a). The data are polluted with 1% Gaussian noise. For the acoustic wave inverse problem, we set β ≡ 1, and invert only for the parameter α = 1/κ = 1/c²,

min_α { ∫₀ᵀ ‖Bu(t) − d(t)‖² dt + γ_α ∫_Ω √( |∇α|² + ε ) dx },

where

α ü − Δu = f_α,          in Ω × (0, T),
u(x, 0) = u̇(x, 0) = 0,  in Ω,
∇u·n = 0,                on ∂Ω_n × (0, T),
∇u·n = −√α u̇,           on ∂Ω_a × (0, T).   (26)

We use a single source f_α with frequency 2 Hz or 4 Hz, located at (0. , 0.1) (yellow star in figure 11a), and 20 pointwise observations equally spaced along the top boundary (green triangles in figure 11a). The boundary conditions, the noise level in the data, the mesh, and the numerical discretization are as in section 5.2. The initial guess for the Poisson parameter field m (resp. for the acoustic parameter field α) is set to a constant field with value 0.625 (resp. 0.25).

In figures 10 and 11, we show the reconstructions of m and α when (25) and (26) are solved independently. The results for the Poisson inverse problem (25) are shown in figure 10b, where it can be seen that the horizontal interface is well reconstructed, but the shape of the rectangular perturbation is smeared out. For the acoustic wave inverse problem (26), we show two reconstructions in figure 11: one with a source f_α of frequency 2 Hz (figure 11b), and one with a source f_α of frequency 4 Hz (figure 11c). While the reconstruction at 2 Hz is excellent, the reconstruction at 4 Hz lacks sufficient low-frequency information and appears to converge toward a local minimum, missing the horizontal discontinuity present in the truth parameter field (figure 11a). The reconstructions for all four joint inverse problems, with a seismic source f_α of frequency 4 Hz, are shown in figure 12, and the corresponding values of the relative medium misfit are given in table C1.

Figure 10. Plots of (a) the truth parameter field for m in the Poisson inverse problem (25), and (b) its reconstruction (γ_m = 2·10⁻ and ε = 10⁻) with the initial parameter field set to a constant value of 0.625.

Figure 11. Plots of (a) the truth parameter field for α in the acoustic inverse problem (26), and (b,c) its reconstructions (γ_α = 3·10⁻ and ε = 10⁻) with the initial value for the parameter field set to 0.25, and a source f_α of frequency (b) 2 Hz and (c) 4 Hz. The green triangles in (a) indicate the locations of the pointwise observations, and the yellow star in (a) indicates the location of the source.

The use of the cross-gradient or its normalized variant improves the reconstruction of the Poisson parameter m (figures 12a, (i) and (ii)), compared to the reconstruction from the Poisson inverse problem (25) alone (figure 10b). However, neither of the cross-gradient terms brings any improvement to the reconstruction of the acoustic wave velocity (figures 12b, (i) and (ii)); in particular, the reconstructions do not show the horizontal discontinuity that was missing in the reconstruction of the acoustic wave velocity alone (figure 11c). On the other hand, the use of either the VTV joint regularization or the nuclear norm joint regularization leads to significant improvements in the reconstruction of the acoustic wave velocity (figures 12b, (iii) and (iv)). Both reconstructions contain all features of the truth parameter field (figure 11a); most noticeably, the horizontal discontinuity that was missing in the independent reconstruction (figure 11c) is now fully reconstructed. The use of the VTV joint regularization provides only marginal improvement to the reconstruction of the Poisson parameter m in terms of relative medium misfit (table C1); however, the shape of the rectangular perturbation, which was smeared out in the reconstruction from the Poisson inverse problem alone (figure 10b), is clearer in figure 12a(iii).

Figure 12. Reconstructions for the parameter fields (a) m in (25) and (b) α in (26), obtained by solving a joint inverse problem with seismic source f_α of frequency 4 Hz, regularized with (i) the cross-gradient (γ = 8·10⁻) combined with two TV regularizations, (ii) the normalized cross-gradient (γ = 8·10⁻ and ε = 10⁻) combined with the same TV regularizations, (iii) the VTV joint regularization (γ = 4·10⁻ and ε = 10⁻), and (iv) the nuclear norm joint regularization (γ = 5·10⁻ and ε = 10⁻). The parameters for the independent TV regularizations are as for the independent inverse problems (see captions of figures 10 and 11). The legend is the same as in figures 10 and 11.

The reconstruction of the Poisson parameter obtained with the nuclear norm joint regularization indicates that the optimization converged to a local minimum. Despite all discontinuities present in the truth parameter field (figure 10a) being clearly reconstructed in figure 12a(iv), the values of the parameters are significantly different. Similar, or worse, performance was observed when setting H to be a multiple of the identity matrix in the BFGS solver [17]. Moreover, almost identical results were obtained when solving the Poisson-acoustic joint inverse problem, regularized by VTV, using the BFGS method described in Appendix A.2. We therefore conjecture that the poor performance of the nuclear norm joint regularization, in the case of a multi-physics joint inverse problem, can be attributed to the use of a gradient-based method for the solution of the joint inverse problem. The significant difference in the structure of the gradients coming from the Poisson and acoustic wave inverse problems dictates the use of a Newton method, which is affine-invariant, in order to balance the individual search directions. This conjecture is supported by previous results found in the literature. For instance, in the context of a joint full waveform inversion for the conductivity and permittivity of a medium, the authors in [31] found the reconstructions obtained using the L-BFGS method to be highly sensitive to the scaling of the parameter fields. The authors of [32] report similar difficulties when employing a quasi-Newton method on a cross-well example, inverting for compressibility and anisotropy, and study alternative formulations to remedy this problem.
6. Conclusion
We conducted a systematic review of regularization terms for joint inverse problems governed by PDEs with infinite-dimensional parameter fields. We considered two types of joint inverse problems: (1) those in which all inversion parameters enter the same physics forward problem, and (2) those coupling several otherwise uncoupled forward problems via a joint regularization term. Based on a review of the literature, we identified three joint regularization terms for this study that are tractable for large-scale PDE-constrained joint inverse problems. The cross-gradient is a popular choice in geophysical applications and seeks to align the level sets of the parameter fields. The normalized cross-gradient was designed to overcome some of the potential weaknesses of the cross-gradient term. The vectorial total variation is an extension of total variation regularization to joint inverse problems, and originated in the imaging community. In addition, we introduced a fourth, novel joint regularization term based on the nuclear norm of a gradient matrix. The comparison of these joint regularization terms was carried out on three problems: (i) a joint Poisson inverse problem for which the truth parameter fields are known to share a similar structure, (ii) an acoustic wave inverse problem in which we invert for the bulk modulus and the density, and (iii) a joint Poisson–acoustic wave inverse problem, providing an example of multi-physics joint inversion.

Based on this study, we recommend use of the vectorial total variation joint regularization. It leads to superior reconstructions in all our examples. Moreover, we have available a scalable, efficient primal-dual nonlinear optimization solver and Hessian preconditioner for joint inverse problems regularized with this term [20]. The nuclear norm joint regularization showed encouraging results, even leading to slightly better reconstructions than the vectorial total variation for some examples.
However, its numerical realization is challenging since it is not twice differentiable, as required by Newton's method. For piecewise-homogeneous parameter fields, the cross-gradient similarity term does not improve significantly over independent reconstructions. In particular, it can fail to reconstruct some edges entirely, since the cross-gradient term vanishes at points where one parameter field is constant. The normalized cross-gradient similarity term leads to a joint inverse problem that is challenging to solve numerically. Even though it improves on the cross-gradient, the improvement is generally minimal, and the reconstructions do not compare favorably with the ones obtained with the vectorial total variation. Compared to the cross-gradient approaches, an additional advantage of the VTV and nuclear norm functionals is that they also act as regularizations, making individual regularization functionals unnecessary. This reduces the number of hyperparameters or regularization weights that must be chosen (see table B1), thereby simplifying the inverse problem.
Appendix A. Summary of numerical optimization techniques for the solution of regularized inverse problems
In this section, we describe the large-scale numerical optimization methods used for our numerical examples. As already discussed in the introduction, the solution of PDE-constrained optimization problems typically requires iterative methods. These methods require first (and ideally, also second) derivatives of the objective function with respect to the parameter fields [17, 33]. These derivatives can be computed efficiently using adjoint methods [34, 35, 36]; in particular, the adjoint approach allows the gradient to be computed at a cost of a few PDE solves, independent of the number of parameters. Throughout this appendix, we denote the parameters by m = (m₁, m₂), the objective function by J(m), and use upper indices to denote iteration numbers.

Appendix A.1. Line-search Newton-CG for cross-gradient and VTV regularizations
In the k-th iteration, we update the medium parameters m^(k) along a search direction p^(k) by computing m^(k+1) = m^(k) + α^(k) p^(k) with an appropriate step length α^(k) > 0. To ensure convergence, the search direction must be a descent direction, i.e., it must satisfy ⟨g^(k), p^(k)⟩ < 0, where g^(k) is the gradient of J with respect to m evaluated at m^(k), and ⟨·,·⟩ is an appropriate inner product. The step length α^(k) could be chosen to minimize the objective functional along the search direction p^(k). However, solving this minimization problem exactly is too expensive for large-scale applications, since a single evaluation of the objective functional requires the solution of the state PDE, potentially multiple times (e.g., N_s times in the example of section 5.2, which has multiple sources). Instead, we seek an approximate minimizer that satisfies the following Armijo condition to ensure sufficient descent,

J(m^(k) + α^(k) p^(k)) ≤ J(m^(k)) + c α^(k) ⟨g^(k), p^(k)⟩,   (A.1)

with 0 < c < 1. To ensure sufficiently large step lengths, we use a backtracking line search [17] to find a step length that satisfies (A.1). That is, the step length is computed by starting from an initial guess α₀^(k) > 0, which is reduced until the sufficient descent condition (A.1) is satisfied. When computing the search direction with a Newton-type method (see next paragraph), we use α₀^(k) = 1, since this is guaranteed to be a successful step length in a neighborhood of a minimizer [17].

The choice of good search directions is crucial in PDE-constrained optimization. In the steepest descent method, one chooses the search direction as the negative gradient, i.e., p^(k) = −g^(k). Unfortunately, the resulting algorithm usually converges slowly in the presence of stretched contour lines of the objective J, a consequence of the typical ill-posedness of inverse problems. The Newton direction is given by the solution of the linear system H(m^(k)) p^(k) = −g^(k), where H(m^(k)) is the Hessian, i.e., the second derivative of J, evaluated at m^(k). The direction p^(k) arising as the solution of this equation is a descent direction only if the Hessian is positive definite, which may not be the case, in particular far away from the minimizer. When the Hessian is indefinite, one solution is to replace the Hessian with a positive definite approximation, a common choice being the Gauss-Newton Hessian [17]. This approximation is obtained by setting the adjoint variables to zero in the computation of the Hessian. Another option is to retain the full Hessian but solve the Newton system approximately, in a way that guarantees the computed solution to be a descent direction. Since for large-scale problems exactly constructing the Hessian is infeasible, we solve the Newton system inexactly with a matrix-free conjugate gradient method, which requires only Hessian-vector products.

Appendix A.2. BFGS method for nuclear norm regularization
To solve joint inverse problems regularized with the nuclear norm joint regularization (section 4), we use a BFGS quasi-Newton method with damped update [17]. That is, we find the search direction p^(k) by computing p^(k) = −B^(k) g^(k), where g^(k) is again the gradient of the objective function and B^(k) is a positive definite approximation of the inverse of the Hessian. This approximation is updated at each iteration with the rank-2 update

B^(k+1) = (I − ρ_k r^(k) (y^(k))^T) B^(k) (I − ρ_k y^(k) (r^(k))^T) + ρ_k r^(k) (r^(k))^T,   (A.2)

where y^(k) is the difference between the gradients at steps k+1 and k, ρ_k := 1 / ((y^(k))^T r^(k)), and r^(k) is the damped form of s^(k), the difference between the parameters at steps k+1 and k, defined as r^(k) := θ_k s^(k) + (1 − θ_k) B^(k) y^(k), with

θ_k := 1,   if (s^(k))^T y^(k) ≥ α (y^(k))^T B^(k) y^(k),
θ_k := (1 − α)(y^(k))^T B^(k) y^(k) / ( (y^(k))^T B^(k) y^(k) − (s^(k))^T y^(k) ),   otherwise.

The classical BFGS method requires the curvature condition (s^(k))^T y^(k) > 0 to guarantee positive definiteness of B^(k) for all k. However, the curvature condition can be guaranteed to be satisfied only when the objective function is strictly convex, which is typically not the case for nonlinear inverse problems. Using a damped update allows us to apply a backtracking line search while avoiding skipping some updates of B^(k) entirely. In our numerical experiments, we found this damping strategy, with a small value of α, to work well.

Another important aspect is the choice of the initial approximation B^(0). BFGS-type methods perform well when the difference between the initial Hessian approximation and the true Hessian is a compact operator [37]. Thus, we take B^(0) as the inverse of the Hessian of the regularization. This quantity is not available for the nuclear norm joint regularization. However, VTV and the nuclear norm joint regularization come from the same family of joint regularizations, differing only by the matrix norm employed [14].
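A direct transcription of the damped update (A.2) for the inverse-Hessian approximation might look as follows (dense linear algebra for illustration only; the damping constant default of 0.2 is an assumption, as the paper's value is not reproduced in this chunk):

```python
import numpy as np

def damped_bfgs_update(B, s, y, alpha=0.2):
    """One damped BFGS update of the inverse-Hessian approximation B,
    following (A.2): s is the parameter difference, y the gradient
    difference; alpha is the damping constant (0.2 assumed here)."""
    By = B @ y
    sy = s @ y
    yBy = y @ By
    # Damping: replace s by r so that y.r stays sufficiently positive.
    if sy >= alpha * yBy:
        theta = 1.0
    else:
        theta = (1.0 - alpha) * yBy / (yBy - sy)
    r = theta * s + (1.0 - theta) * By
    rho = 1.0 / (y @ r)
    I = np.eye(len(s))
    V = I - rho * np.outer(r, y)
    return V @ B @ V.T + rho * np.outer(r, r)
```

By construction the update satisfies the inverse secant condition B^(k+1) y^(k) = r^(k), and the damping keeps B^(k+1) positive definite even when the curvature condition (s^(k))^T y^(k) > 0 fails.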
Since matrix norms are equivalent in finite dimensions, we set B^(0) to the inverse of the Hessian of the VTV joint regularization at the parameter m^(k).

Appendix B. Number of hyperparameters for each joint regularization
Table B1.
Number of hyperparameters for a joint inverse problem with 2 parameter fields.

                     cross-grad   n-cross-grad   vectorial TV   nuclear norm
γ (indep. TV)            2             2              –              –
γ (joint term)           1             1              1              1
ε (indep. TV)            1             1              –              –
ε (joint term)           1             1              1              1
total                    5             5              2              2
Appendix C. Table of relative medium misfits for examples
In table C1, the relative misfits for the examples presented in section 5 are summarized.
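For reference, the relative medium misfit reported in table C1 is the discrete L²-norm error ‖m_rec − m_true‖ / ‖m_true‖ in percent; the sketch below uses uniform quadrature weights, which is an assumption (the paper works on finite-element meshes, where nodal mass-matrix weights would be used instead).

```python
import numpy as np

def relative_medium_misfit(m_rec, m_true, w=None):
    """Relative medium misfit in a discrete L2-norm, in percent:
    100 * ||m_rec - m_true|| / ||m_true||.  Pass quadrature weights w
    for non-uniform meshes; uniform weights are assumed by default."""
    if w is None:
        w = np.ones_like(m_true)
    num = np.sqrt(np.sum(w * (m_rec - m_true)**2))
    den = np.sqrt(np.sum(w * m_true**2))
    return 100.0 * num / den
```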
Table C1.
Relative medium misfits (in the L²-norm) for the examples in section 5.

               sec. 5.1.1      sec. 5.1.2      sec. 5.2        sec. 5.3
               m₁     m₂      m₁     m₂      α      β       m      α
independent   23.2%  5.1%    46.9%  5.1%    2.8%   0.8%    9.0%   9.9%
cross-grad    22.3%  5.2%    46.1%  5.6%    3.1%   0.7%    4.9%  11.0%
n-cross-grad  21.2%  5.0%    46.7%  5.0%    2.5%   0.4%    4.9%  10.7%
vectorial TV  20.2%  5.1%    41.1%  5.2%    2.4%   0.2%    8.9%   3.3%
nuclear norm  20.2%  4.8%    40.8%  5.0%    2.4%   0.2%   20.6%   4.5%

Acknowledgments
The authors would like to thank David Keyes (KAUST) and George Turkiyyah (AUB) for very helpful discussions that inspired this work. They would also like to thank Sergey Fomel (UT-Austin) for referring them to [2, 3], Jan Modersitzki (Lübeck) for directing their attention to [9], Nick Alger (UT-Austin) for help with figure 1, and two anonymous referees whose comments helped significantly to improve the manuscript. This work was partially supported by AFOSR grant FA9550-17-1-0190, DOE grant DE-SC0009286, KAUST award OSR-2016-CCF-2596, and NSF grants ACI-1550593, DMS-1723211, CBET-1507009, and CBET-1508713.

References

[1] Ioannis Epanomeritakis, Volkan Akçelik, Omar Ghattas, and Jacobo Bielak. A Newton-CG method for large-scale three-dimensional elastic full-waveform seismic inversion. Inverse Problems, 24(3):034015 (26pp), 2008.
[2] Edgar Manukyan, Hansruedi Maurer, and Andre Nuber. Improvements to elastic full waveform inversion using cross-gradient constraints. In SEG Technical Program Expanded Abstracts 2016, pages 1506–1510. Society of Exploration Geophysicists, 2016.
[3] Maokun Li, Lin Liang, Aria Abubakar, and Peter M van den Berg. Structural similarity regularization scheme for multiparameter seismic full waveform inversion. In SEG Technical Program Expanded Abstracts 2013, pages 1089–1094. Society of Exploration Geophysicists, 2013.
[4] A Abubakar, G Gao, Tarek M Habashy, and J Liu. Joint inversion approaches for geophysical electromagnetic and elastic full-waveform data. Inverse Problems, 28(5), 2012.
[5] Oguz Semerci, Guangdong Pan, Maokun Li, Lin Liang, and Tarek Habashy. Joint electromagnetic and seismic inversion for petrophysical parameters using multi-objective optimization. In SEG Annual Meeting, Denver, CO, 26-31 October 2014. SEG.
[6] Xuan Feng, Qianci Ren, Cai Liu, and Xuebing Zhang. Joint acoustic full-waveform inversion of crosshole seismic and ground-penetrating radar data in the frequency domain. Geophysics, 82(6):H41–H56, 2017.
[7] Luis A. Gallardo and Max A. Meju. Characterization of heterogeneous near-surface materials by joint 2D inversion of DC resistivity and seismic data. Geophysical Research Letters, 30(13), 2003.
[8] Klara Steklova and Eldad Haber. Joint hydrogeophysical inversion: state estimation for seawater intrusion models in 3D. Computational Geosciences, 1(21):75–94, 2016.
[9] Eldad Haber and Jan Modersitzki. Intensity gradient based registration and fusion of multi-modal images. Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2006, pages 726–733, 2006.
[10] Eldad Haber and Michal Holtzman Gazit. Model fusion and joint inversion. Surveys in Geophysics, 34(5):675–695, 2013.
[11] Luis A. Gallardo and Max A. Meju. Structure-coupled multiphysics imaging in geophysical sciences. Reviews of Geophysics, 49(1), 2011.
[12] Peter Blomgren and Tony F. Chan. Color TV: Total variation methods for restoration of vector-valued images. IEEE Transactions on Image Processing, 7(3):304–309, 1998.
[13] Xavier Bresson and Tony F Chan. Fast dual minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging, 2(4):455–484, 2008.
[14] Kevin M Holt. Total nuclear variation and Jacobian extensions of total variation for vector fields. IEEE Transactions on Image Processing, 23(9):3975–3989, 2014.
[15] Florian Knoll, Martin Holler, Thomas Koesters, Ricardo Otazo, Kristian Bredies, and Daniel K Sodickson. Joint MR-PET reconstruction using a multi-channel image regularizer. IEEE Transactions on Medical Imaging, 36(1):1–16, 2017.
[16] Volkan Akçelik, George Biros, and Omar Ghattas. Parallel multiscale Gauss–Newton–Krylov methods for inverse wave propagation. In Proceedings of IEEE/ACM SC2002 Conference, Baltimore, MD, November 2002.
[17] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Verlag, Berlin, Heidelberg, New York, second edition, 2006.
[18] Ron S. Dembo, Stanley C. Eisenstat, and Trond Steihaug. Inexact Newton methods. SIAM Journal on Numerical Analysis, 19(2):400–408, 1982.
SIAMJournal on Numerical Analysis. , 19:400–408, 1982.[19] Antonin Chambolle, Vicent Caselles, Daniel Cremers, Matteo Novaga, and Thomas Pock. Anintroduction to total variation for image analysis.
Theoretical foundations and numericalmethods for sparse recovery , 9(263-340):227, 2010.[20] Benjamin Crestel.
Advanced techniques for multi-source, multi-parameter, and multi-physicsinverse problems . PhD thesis, University of Texas at Austin, 2017.[21] Michael Hinterm¨uller and Georg Stadler. An infeasible primal-dual algorithm for total variation-based inf-convolution-type image restoration.
SIAM Journal on Scientific Computing ,28(1):1–23, 2006.[22] Carlos Fernandez-Granda. Optimization-based data analysis.
Lecture notes at NYU-CIMS ,April 2016.[23] Th´eodore Papadopoulo and Manolis I. A. Lourakis.
Estimating the Jacobian of the SingularValue Decomposition: Theory and Applications , pages 554–570. Springer Berlin Heidelberg, tructural similarity and regularization for joint inverse PDE problems Berlin, Heidelberg, 2000.[24] Uri M. Ascher, Eldad Haber, and Hui Huang. On effective methods for implicit piecewise smoothsurface recovery.
SIAM Journal on Scientific Computing , 28(1):339–358, 2006.[25] Anders Logg, Kent-Andre Mardal, and Garth Wells.
Automated Solution of DifferentialEquations by the Finite Element Method: The FEniCS book , volume 84. Springer Science &Business Media, 2012.[26] Anders Logg and Garth N. Wells. DOLFIN: Automated finite element computing.
ACMTransactions on Mathematical Software , 37(2):Article 20, 2010.[27] Umberto Villa, Noemi Petra, and Omar Ghattas. hIPPYlib: An Extensible Software Frameworkfor Large-Scale Deterministic and Linearized Bayesian Inverse Problems. To be submitted,2016.[28] Bjorn Engquist and Andrew Majda. Absorbing boundary conditions for the numerical simulationof waves.
Mathematics of Computation , 31(139):629–651, 1977.[29] Gary S Martin, Robert Wiley, and Kurt J Marfurt. Marmousi2: An elastic upgrade forMarmousi.
The Leading Edge , 25(2):156–166, 2006.[30] Jean Virieux and Stephane Operto. An overview of full-waveform inversion in explorationgeophysics.
Geophysics , 74(6):WCC1–WCC26, 2009.[31] Francois Lavou´e, Romain Brossier, Ludovic M´etivier, St´ephane Garambois, and Jean Virieux.Two-dimensional permittivity and conductivity imaging by full waveform inversion ofmultioffset gpr data: A frequency-domain quasi-newton approach.
Geophysical JournalInternational , 197(1):248–268, 2014.[32] Bas Peters and Felix J Herrmann. A sparse reduced Hessian approximation for multi-parameterwavefield reconstruction inversion. In
SEG Technical Program Expanded Abstracts 2014 ,pages 1206–1210. Society of Exploration Geophysicists, 2014.[33] Curt R. Vogel.
Computational Methods for Inverse Problems . Frontiers in AppliedMathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA,2002.[34] Juan Carlos De Los Reyes.
Numerical PDE-constrained optimization . Springer, 2015.[35] Michael Hinze, Rene Pinnau, Michael Ulbrich, and Stefan Ulbrich.
Optimization with PDEConstraints . Springer, 2009.[36] Fredi Tr¨oltzsch.
Optimal Control of Partial Differential Equations: Theory, Methods andApplications , volume 112 of
Graduate Studies in Mathematics . American MathematicalSociety, 2010.[37] A. Griewank. The local convergence of Broyden-like methods on Lipschitzian problems in Hilbertspace.