Multi-modality imaging with structure-promoting regularisers
Matthias J. Ehrhardt
Institute for Mathematical Innovation, University of Bath, UK [email protected]
Abstract
Imaging with multiple modalities or multiple channels is becoming increasingly important for our modern society. A key tool for understanding and early diagnosis of cancer and dementia is PET-MR, a combined positron emission tomography and magnetic resonance imaging scanner which can simultaneously acquire functional and anatomical data. Similarly, in remote sensing, while hyperspectral sensors may allow one to characterise and distinguish materials, digital cameras offer high spatial resolution to delineate objects. In both of these examples, the imaging modalities can be considered individually or jointly. In this chapter we discuss mathematical approaches which allow us to combine information from several imaging modalities so that multi-modality imaging can be more than just the sum of its components.
Many tasks in almost all scientific fields can be posed as an inverse problem of the form

Ku = f (1)

where K is a mathematical model that connects an unknown quantity of interest u to measured data f. The task is to recover u from data f under the model K. In practice this task is difficult because of measurement errors in the data f and inaccuracies in the model K. Moreover, in many cases the model (1) lacks information we have at hand about the unknown quantity u, such as its regularity. In this chapter we are interested in the situation when we have a priori knowledge about the "structure" of u from a second measurement v which we want to exploit in the inversion. Throughout this chapter we will refer to v as the side information. Intuitively, this is the case when u and v describe different properties of the same geometry (in medicine: anatomy). We will be more precise in Section 2, where we discuss mathematical models for structural similarity. The two notions we will discuss in detail are that the edges of the two images u and v have similar 1) locations [1-5] and 2) directions [3, 5-16]. Real-world examples for these mathematical models are numerous, as we will see in the next section.

Historically, the first application where information from several modalities was combined was positron emission tomography (PET) and magnetic resonance imaging (MRI) in the early 1990s [17]. Sharing information between two different imaging modalities is motivated by the fact that all images will be highly influenced by the same underlying anatomy, see Figure 1. Since single-photon emission computed tomography (SPECT) imaging is both mathematically and physically similar to PET imaging, most of the proposed models can be directly translated, and often models are proposed for both modalities simultaneously, see e.g. [18-21]. Over the years there has always been research in this direction, see e.g.
[18-20, 22-34], which was intensified with the advent of the first simultaneous PET-MR scanner in 2011 [35], see e.g. [4, 5, 11-13, 36-43]. The same motivation applies to other medical imaging techniques, for example multi-contrast MRI, see e.g. [10, 44-48]. In multi-contrast MRI, multiple acquisition sequences are used to acquire data of the same patient, see Figure 2 for a T1- and a T2-weighted image with shared anatomy. Other special cases are the combination of anatomical MRI (e.g. T1-weighted) and magnetic particle imaging [14], functional MRI (fMRI) and anatomical MRI [49], as well as anatomical (1H) and fluorinated gas (19F) MRI [50].

Figure 1: PET-MR and PET-CT.
A low-resolution functional PET image (left) is to be reconstructed with the help of an anatomical MRI (middle) or CT image (right). As is evident from the images, all three images share many edges due to the same underlying anatomy. Note that the high soft-tissue contrast in MRI makes it favourable over CT for this application. Images courtesy of P. Markiewicz and J. Schott.

Figure 2:
Multi-contrast MRI.
The same MRI scanner can produce different images depending on the acquisition sequence, such as T1- and T2-weighted images (left and right). Images courtesy of N. Burgos.

A related imaging task is quantitative MRI (such as Magnetic Resonance Fingerprinting [51]) [52-55], where one aims to reconstruct quantitative maps of tissue parameters (e.g. T1, T2, proton density, off-resonance frequency), but regularisers coupling these maps have not been used to date. The idea to couple channels has also been used for parallel MRI [56].

Starting from the 1990s, mathematical models were developed that make use of the expected correlations between colour channels of RGB images [58-60], see Figure 3. Research in this field is still very active today, see e.g. [2, 8, 61-64].

In remote sensing, observations are often available from multiple sensors mounted either on a plane or on a satellite. For example, a hyperspectral camera with low spatial resolution and a digital camera with higher spatial resolution may be used simultaneously, see Figure 4. This situation naturally invites the fusion of information, see [15, 65-71] and references therein. In some situations the response of the cameras to certain wavelengths is (assumed to be) known, such that the data can be fused making use of this knowledge. This is commonly referred to as pansharpening, see e.g. [68-70]. It is important to note that this assumption is sometimes not fulfilled, and many of the aforementioned algorithms are flexible enough to fuse data in this more general situation.

Dual and spectral computed tomography (CT) is becoming increasingly popular in (bio-)medical imaging and material sciences due to its ability to distinguish different materials, which would not be possible using a single energy, see Figure 5. Since the energy channels have very different signal-to-noise ratios, coupling them within the reconstruction allows information to be transferred from high-signal to low-signal channels [9, 72-74].

Figure 3: Color imaging. The color image (left) is composed of three color channels (right), all of which show similar edges due to the same scenery. Images courtesy of M. Ehrhardt.

Figure 4: Hyperspectral imaging + photography. A nowadays common scenario is that multiple cameras are mounted on a plane or satellite for remote sensing. While one camera carries spectral information (right), the other has high spatial resolution (left). Images courtesy of D. Coomes.

Figure 5: Spectral CT. Standard (white-beam) CT on the left and three channels (28, 34 and 39 keV) of spectral CT on the right of an iodine-stained lizard head reconstructed by CIL [57]. The spectral channels clearly show a large increase in intensity from 28 to 34 keV, thereby revealing the presence, location and concentration of iodine. Images courtesy of J. Jorgensen and R. Warr.

In geophysics, the coupling between modalities has been used to model similarity between electrical resistivity and seismic velocity [6, 7], to estimate conductivity from multi-frequency data [75], to invert gravity and seismic tomography [75] and for controlled-source electromagnetic resistivity inversion [76]. For an overview and more details on examples in geophysics see [3, 77] and references therein.

Ideas from multi-modality imaging have recently also been used for art restoration. When a canvas is painted on both sides, an x-ray image shows the superposition of both paintings. The x-ray information can then be separated using photos of both sides of the canvas [78].

Other examples considered in the literature are combining anatomical information and electrical impedance tomography [16, 79], CT and MRI [80], photo-acoustic and optical coherence tomography [81], x-ray fluorescence and transmission tomography [82] and various channels in multi-modal electron tomography [83]. The combination of various imaging modalities into one system may eventually lead to what is sometimes referred to as omni-tomography [84].

Image reconstruction with side information is mathematically similar to multi-modal image registration, and thus it is not surprising that both fields share a lot of mathematical models, see e.g. [85-88].
Inverse problems of the form (1) can be solved using variational regularisation, i.e. framed as the optimisation problem

u_α ∈ arg min_u D(Au, f) + α R(u). (2)

Here the data fidelity D : Y × Y → R_∞ := R ∪ {∞} measures how close the estimated data Au fits the acquired data f. The regulariser (also referred to as the prior) R : X → R_∞ defines which properties of the image u we favour and which we do not. The trade-off between data fitting and regularisation can be chosen via the regularisation parameter α >
0. Problems of this form have been extensively studied, see for instance [89-93] and references therein.

Three popular regularisers for imaging are the squared H¹-semi-norm (H¹), the total variation (TV) [94, 95] and the total generalised variation (TGV) [96-98]. It is common to model images as functions u : Ω ⊂ R^d → R. If u is smooth enough, then these regularisers are defined as

H¹(u) = ∫_Ω |∇u(x)|² dx (3)

TV(u) = ∫_Ω |∇u(x)| dx (4)

TGV(u) = inf_ζ ∫_Ω |∇u(x) − ζ(x)| + β |Eζ(x)| dx. (5)

Here ∇u : Ω → R^d, [∇u]_i = ∂_i u, denotes the gradient of u, and Eζ : Ω → R^{d×d}, [Eζ]_{i,j} = (∂_i ζ_j + ∂_j ζ_i)/2, denotes the symmetrised gradient of ζ : Ω → R^d, see [98] for more details; |·| denotes the Euclidean/Frobenius norm. For TV and TGV it is of interest to develop other formulations which are well-defined even when u is not smooth. For simplicity, we do not go into more detail in this direction but refer the interested reader to the literature, e.g. [95, 96].

All three regularisers promote solutions with different smoothness properties. H¹ promotes smooth solutions with small gradients everywhere, whereas TV promotes solutions which have sparse gradients, i.e. the images are piecewise constant and appear cartoon-like. The latter also leads to the staircase artefact, which can be overcome by TGV, which promotes piecewise linear solutions. None of these regularisers is able to encode additional information on the location or direction of edges.

The contributions in this chapter are threefold.
Overview of existing methods
We provide an overview of existing mathematical models for structural similarity which are related to the shared location or direction of edges. We then discuss various regularisers which promote similarity in this sense.
Higher order models
Existing methods focus on incorporating additional information into regularisers modelling first-order smoothness. We extend existing methodology to second-order smoothness using the total generalised variation framework.
Extensive numerical comparison
We highlight the properties of the discussed regularisers and their dependence on various parameters using two inverse problems: tomography and super-resolution.
One can think of the setting (1) with extra information v as a special case where multiple measurements

K_i u_i = f_i, i = 1, …, m (6)

are taken. If m = 2 and one inverse problem is considerably less ill-posed, then it can be solved first to guide the inversion of the other. Some of the described models can be extended to the more general case (e.g. an arbitrary number of modalities) or the joint recovery of both/all unknowns, see e.g. [3-9, 12, 38, 41, 56, 58, 63, 75-77, 82, 83, 99], but it is out of the scope of this chapter to provide an overview of those. For an overview up to 2015, see [100]. A few recent contributions are summarized in [101].

Model (6) may include several special cases: i) multiple measurements of the same unknown, i.e. u_i = u, and ii) measurements corresponding to different states of the same unknown, e.g. in dynamic imaging u_i = u(·, t_i). The former case is covered by the standard literature when concatenating the measurements and the system models, i.e. (Ku)_i := K_i u and f = (f_1, …, f_m). The latter has been widely studied in the literature, too, see e.g. [102-104] and references therein. Both of these are in general unrelated to multi-modality imaging.

Other models for similarity

The earliest contributions to structure-promoting regularisers for multi-modality imaging were made in the early 1990s by Leahy and Yan [17], who used a segmentation of an anatomical MRI image to enhance PET reconstruction. This is achieved by carefully handcrafting a regulariser which can encode this information. In this chapter we will use the same strategy, but in a continuous setting which is independent of the discretisation and will not rely on a segmentation of the side information v.
These ideas were subsequently refined in various directions [18-20, 22-25, 27, 28, 33, 34, 44], of which Bowsher's prior [23] remains most popular today.

Other models that can combine information of multiple modalities are based on coupled diffusion [1, 61, 99], level-sets [30], information-theoretic priors (joint entropy, mutual information) [21, 26, 29, 37], Bregman distances [49, 65, 66, 105, 106], Bregman iterations [64, 107], the structure tensor [108], joint dictionary learning [47, 78, 109], common edge weighting [41] and deep learning [48]. Most of these methods are very different to what will be described in this chapter. There are some similarities between the methods of this chapter and methods which are based on the Bregman distance of the total variation [49, 64-66, 105-107], but a detailed treatment is outside the scope of this section.

In this section we define mathematical models which aim to capture the similarities shown in Figures 1 to 5. We start by explicitly stating two definitions of structural similarity which have been used implicitly in the literature. The first is based on the location of edges, or the edge set [1-5, 41, 56, 64], and the second is based on the direction of edges, or the shape of an object [3, 5-9, 12]. The latter is essentially the same as Definition 5.1.6 in [100] except for the degenerate case when either ∇u(x) = 0 or ∇v(x) = 0.

Definition 1 (Structural similarity with edge sets). Two differentiable images u, v : Ω → R are said to be structurally similar in the sense of edge sets if

E_u = E_v (7)

where E_u = {x ∈ Ω | ∇u(x) ≠ 0}. We also write u ∼_e v to denote that u and v are structurally similar in the sense of edge sets.

Definition 2 (Structural similarity with parallel level sets). Two differentiable images u, v : Ω → R are said to be structurally similar in the sense of parallel level sets if u ∼_e v and for all x ∈ E_u we have

∇u(x) ∥ ∇v(x).
(8)

We also write u ∼_d v to denote that u and v are structurally similar in the sense of parallel level sets.

Remark 1.
For smooth images u and v, the gradients are perpendicular to the level sets u⁻¹(s) = {x ∈ Ω | u(x) = s}. Thus parallel gradients are equivalent to parallel level sets, which explains the naming. The notion that the structure of an image is contained in its level sets dates back to [110].

Remark 2.
By definition, similarity with parallel level sets (Definition 2) is stronger than the definition that only involves edge sets (Definition 1). An example of two images u and v which have the same edge set but do not have parallel level sets is the following: u, v : Ω ⊂ R² → R, u(x) = x₁, v(x) = x₂. Clearly they have the same edge set, since E_u = E_v = Ω, but they do not have parallel level sets, since ∇u(x) = [1, 0]ᵀ but ∇v(x) = [0, 1]ᵀ.

Remark 3.
Two images u and v have parallel level sets if and only if u ∼_e v and for all x ∈ E_u there exists α ∈ R such that

∇u(x) = α ∇v(x). (9)

Examples of images which have parallel level sets include:

1.
Function value transformations. Let f : R → R be smooth and strictly monotonic, i.e. f′ > 0 or f′ < 0. Then v := f ∘ u ∼_d u. This is readily seen from the fact that ∇v(x) = f′(u(x)) ∇u(x) ≠ 0 if and only if ∇u(x) ≠ 0.

2. Local function value transformations. Let f_i : R → R be smooth and strictly monotonic and u = Σ_i u_i, where the u_i are smooth functions whose gradients have mutually disjoint support. Then v := Σ_i f_i ∘ u_i ∼_d u.

Remark 4.
It has been argued in the literature that many multi-modality images z : Ω → R^m essentially decompose as

z_i(x) = τ_i(x) ρ(x) (10)

where ρ(x) describes the structure and τ_i is a material property, see e.g. [63, 111]. Since the material does not change arbitrarily, it is natural to assume that τ_i is slowly varying or even piecewise constant. In the latter case, if x is such that ∇τ_i(x) = 0, then we have

∇z_i(x) = τ_i(x) ∇ρ(x), (11)

in particular, if τ_i, τ_j ≠ 0, then z_i ∼_d z_j. This property is also related to the material decomposition in spectral CT, see e.g. [112-114].

Measuring the degree of similarity with respect to the previous two definitions of structural similarity is not easy, and we will now discuss a couple of ideas from the literature. Here and for the rest of this chapter, we will make frequent use of the vector-valued representation of a set of images z : Ω → R², z(x) := [u(x), v(x)]. We denote by J its Jacobian, i.e. J : Ω → R^{d×2}, J_{i,j} = ∂_i z_j.

With the definition of the Jacobian we see that u ∼_e v if and only if

∫_Ω |J(x)|₀ dx = ∫_Ω |∇u(x)|₀ dx = ∫_Ω |∇v(x)|₀ dx (12)

where |x|₀ := 1 if x ≠ 0 and 0 else.

Similarly, by definition u ∼_d v if and only if u ∼_e v and (a) rank J(x) = 1 for all x ∈ E_u. (a) is equivalent to (b) a vanishing determinant, i.e. det Jᵀ(x)J(x) = 0. Simple calculations, see e.g. [100], show that

det Jᵀ(x)J(x) = |∇u(x)|² |∇v(x)|² − ⟨∇u(x), ∇v(x)⟩², (13)

where we use the notation ⟨x, y⟩ = xᵀy for the inner product of two column vectors x and y. In order to get further equivalent statements we turn to the singular values of the Jacobian, which are given by

σ²_{1/2}(x) = (1/2) [ |J(x)|² ± √( |J(x)|⁴ − 4 det Jᵀ(x)J(x) ) ] (14)

with |J(x)|² = |∇u(x)|² + |∇v(x)|², see e.g. [100].
Since σ₁(x) ≥ σ₂(x) ≥ 0, (a) is furthermore equivalent to (c) σ₂(x) = 0, or (d) the vector of singular values σ(x) = [σ₁(x), σ₂(x)] being 1-sparse.

Many of the abstract models from the previous section for measuring the degree of similarity with respect to the two definitions of structural similarity are computationally challenging, as they relate to non-convex constraints. In this section we define convex structure-promoting regularisers, which makes them computationally tractable.
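To make (13) and (14) concrete, the following sketch (our own NumPy helpers, not from the chapter's accompanying code; gradients are stored channel-first) computes the per-pixel singular values of J = [∇u, ∇v] from forward-difference gradients without forming J explicitly. Parallel gradients yield σ₂(x) = 0, while orthogonal unit gradients yield σ₁(x) = σ₂(x) = 1:

```python
import numpy as np

def gradient(u):
    """Forward-difference gradient of a 2D image; output shape (2, H, W)."""
    g = np.zeros((2,) + u.shape)
    g[0, :-1, :] = u[1:, :] - u[:-1, :]
    g[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return g

def jacobian_singular_values(gu, gv):
    """Per-pixel singular values of J = [grad u, grad v], cf. (14).

    gu, gv have shape (2, H, W); returns (sigma1, sigma2), each (H, W)."""
    a = np.sum(gu ** 2, axis=0)          # |grad u|^2
    b = np.sum(gv ** 2, axis=0)          # |grad v|^2
    c = np.sum(gu * gv, axis=0)          # <grad u, grad v>
    frob2 = a + b                        # |J|^2, squared Frobenius norm
    det = a * b - c ** 2                 # det(J^T J), cf. (13)
    disc = np.sqrt(np.maximum(frob2 ** 2 - 4.0 * det, 0.0))
    s1 = np.sqrt((frob2 + disc) / 2.0)
    s2 = np.sqrt(np.maximum((frob2 - disc) / 2.0, 0.0))
    return s1, s2
```

For u ∼_d v, σ₂ vanishes on E_u, so penalising σ₂ (or det JᵀJ) promotes parallel level sets.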
We first look at isotropic models which depend only on gradient magnitudes rather than directions and thus promote structural similarity in the sense of edge sets, Definition 1.

Figure 6: Influence of the parameter η on estimation of edge location. The images on the right show the scalar field w : Ω → [0,
1] which locally weights the influence of the regulariser, see (18). Here "black" denotes 0 and "white" denotes 1.

First, based on (12), if we approximate |J(x)|₀ by |J(x)|, then

JTV(u) = ∫_Ω |J(x)| dx = ∫_Ω √( |∇u(x)|² + |∇v(x)|² ) dx (15)
       ≤ ∫_Ω |∇u(x)| + |∇v(x)| dx = TV(u) + TV(v) (16)

with equality if and only if E_u ∩ E_v = ∅. This regulariser is called joint total variation in some communities, see e.g. [3, 5, 11, 56], and vectorial total variation in others, see e.g. [2].

Remark 5.
Note that
JTV has the favourable property that if ∇v = 0, then JTV(u) = TV(u), so that it reduces to a well-defined regulariser in u in this degenerate case. Note that this property also holds locally.

Remark 6.
We would also like to note that there is a connection between
JTV and the singular values of J. Let σ₁, σ₂ : Ω → [0, ∞) be the two singular values of J; then we have

JTV(u) = ∫_Ω √( σ₁²(x) + σ₂²(x) ) dx. (17)

Another strategy to favour edges at similar locations while reducing to a well-defined regulariser in the degenerate case is to introduce local weighting. Let w : Ω → [0,
1] be an edge indicator function for v such that w(x) = 1 when ∇v(x) = 0 and a small value whenever |∇v(x)| is large. For example, choose

w(x) = η / √( η² + |∇v(x)|² ) (18)

which is illustrated in Figure 6. The figure shows that with a medium η the weight w in (18) shows the main structures of the images, so that these can be promoted in the other image. If η is too small, then unwanted structures, such as a smooth background variation, are also captured in w. If η is too large, then the structures start to disappear.

For regularisers which are based on the image gradient ∇u, the weighting w can be used to favour edges at certain locations by replacing ∇ by w∇. For instance, for H¹ (3), TV (4) and TGV (5) this strategy results in

wH¹(u) = ∫_Ω |w(x) ∇u(x)|² dx = ∫_Ω w²(x) |∇u(x)|² dx (19)
wTV(u) = ∫_Ω |w(x) ∇u(x)| dx = ∫_Ω w(x) |∇u(x)| dx (20)
wTGV(u) = inf_ζ ∫_Ω |w(x) ∇u(x) − ζ(x)| + β |Eζ(x)| dx (21)

Figure 7: Influence of the parameter η on estimation of edge location and direction. The images on the right show the vector field ξ : Ω → R^d which locally defines the influence of the regulariser, see e.g. (22). Here "black" denotes that the magnitude of ξ, i.e. |ξ(x)|, is 0 and a bright colour denotes that |ξ(x)| is 1. The colours show the direction of the vector field ξ modulo its sign.

We will refer to these as the weighted squared H¹-semi-norm, weighted total variation and weighted total generalised variation. wTV was used in [1, 10, 115]. A variant of wTV has been considered for single-modality imaging in [116, 117] and extended to a variant of wTGV in [118].

Remark 7.
The parameter η in w, see (18), should be chosen in relation to |∇v(x)|. A common strategy is to normalise the side information first such that sup_{x∈Ω} |∇v(x)| = 1. Desirable values of η are then usually within the range [0.…, …].

The same idea which resulted in isotropically "weighted" variants of common regularisers can be used anisotropically, i.e. by making the local weights vary with direction. Let us denote the anisotropic weighting by D : Ω → R^{d×d}. Similar to the isotropic variant, one would like the weight to become the identity matrix, i.e. D(x) = I, when ∇v(x) = 0. In order to promote parallel level sets it is desirable that D(x)∇u(x) should be small if ∇u(x) ∥ ∇v(x) and D(x)∇u(x) = ∇u(x) if ∇u(x) ⊥ ∇v(x). For example,

D(x) = I − γ ξ(x) ξᵀ(x), ξ(x) = ∇v(x) / √( η² + |∇v(x)|² ) (22)

for γ ∈ (0,
1] (usually close to 1) and η > 0. With this choice, if ∇v(x) = 0 then ξ(x) = 0, such that D(x) = I. Moreover, if ∇u(x) ∥ ∇v(x), then there exists an α such that ∇u(x) = α∇v(x) and

D(x)∇u(x) = [ I − γ (η² + |∇v(x)|²)⁻¹ ∇v(x)∇vᵀ(x) ] ∇u(x) (23)
          = [ 1 − γ |∇v(x)|² / (η² + |∇v(x)|²) ] ∇u(x). (24)

The scalar weighting factor converges to 1 − γ for |∇v(x)| → ∞. Finally, if ∇u(x) ⊥ ∇v(x), then clearly D(x)∇u(x) = ∇u(x).

The example of the matrix field D : Ω → R^{d×d} in (22) is determined by the vector field ξ : Ω → R^d, which we visualise in Figure 7. The colours show the direction of the vector field modulo its sign (since ξ(x)ξᵀ(x) is invariant to a change of sign) and the brightness indicates its magnitude |ξ(x)|. Note that the images appear as colour versions of Figure 6, which shows the isotropic weighting w.

Table 1: Examples of first-order structure-promoting regularisers, see (32).

regulariser | definition | B(x)y | m | φ(x)
H¹ | (3) | y | d | |x|²
wH¹ | (19) | w(x)y | d | |x|²
dH¹ | (25) | D(x)y | d | |x|²
TV | (4) | y | d | |x|
wTV | (20) | w(x)y | d | |x|
dTV | (26) | D(x)y | d | |x|
JTV | (16) | [y, ξ(x)] | d × 2 | |x|
TNV | (31) | [y, ξ(x)] | d × 2 | |x|_*

Using a matrix field in common regularisers leads to their "directional" variants

dH¹(u) = ∫_Ω |D(x)∇u(x)|² dx (25)
dTV(u) = ∫_Ω |D(x)∇u(x)| dx (26)
dTGV(u) = inf_ζ ∫_Ω |D(x)∇u(x) − ζ(x)| + β |Eζ(x)| dx. (27)

Remark 8.
There is a connection between the particular choice of the matrix field D in (22) and the Jacobian J:

|D(x)∇u(x)|² = | ∇u(x) − γ (η² + |∇v(x)|²)⁻¹ ⟨∇u(x), ∇v(x)⟩ ∇v(x) |² (28)
             = |∇u(x)|² − [ 2γη² + γ(2 − γ)|∇v(x)|² ] / (η² + |∇v(x)|²)² ⟨∇u(x), ∇v(x)⟩². (29)

For η = 0, γ = 1 and |∇v(x)| = 1, with (13) we then have

|D(x)∇u(x)|² = |∇u(x)|² |∇v(x)|² − ⟨∇u(x), ∇v(x)⟩² = det Jᵀ(x)J(x). (30)

Thus, dH¹ corresponds to penalising the determinant. This regulariser is widely used for joint reconstruction in geophysics under the name cross-gradient function, since it is also the cross product of ∇u(x) and ∇v(x), see e.g. [6, 7, 76, 77]. Similarly, dTV, used for instance in medical imaging [10, 11, 13, 14, 16, 50] and remote sensing [15], can be seen as penalising the square root of the determinant.

Another strategy to promote parallel level sets is via the nuclear norm of the Jacobian, which is defined as |J(x)|_* = Σ_{i=1}^{min(d,2)} σ_i(x), where σ_i(x) denotes the i-th singular value of J(x). Using the nuclear norm promotes sparse vectors of singular values σ(x) = [σ₁(x), σ₂(x)] and thereby parallel level sets. As a regulariser,

TNV(u) = ∫_Ω |J(x)|_* dx, (31)

this strategy became known as total nuclear variation, see [9, 12, 63, 73].

All first-order regularisers of this section can be readily summarised in the standard form

J(u) = ∫_Ω φ[B(x)∇u(x)] dx (32)

where B(x) : R^d → R^m is an affine transformation and φ : R^m → R. For details on how B and φ can be chosen for specific regularisers to fit this framework, see Table 1. For Jacobian-based regularisers it is useful to use the reweighted Jacobian [∇u(x), ξ(x)] with ξ(x) = η∇v(x) instead.
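As an illustration of (18) and (22), the sketch below (our own minimal NumPy helpers, not from the chapter's accompanying code; gradients are stored channel-first) computes the edge indicator w and applies D(x) = I − γξ(x)ξᵀ(x) to a gradient field. With γ = 1 and small η, components parallel to ∇v are suppressed while perpendicular components pass through unchanged:

```python
import numpy as np

def edge_indicator(gv, eta):
    """w(x) = eta / sqrt(eta^2 + |grad v(x)|^2), cf. (18); gv has shape (2, H, W)."""
    return eta / np.sqrt(eta ** 2 + np.sum(gv ** 2, axis=0))

def apply_D(gu, gv, eta, gamma=1.0):
    """Apply D(x) = I - gamma * xi xi^T to grad u, cf. (22).

    xi = grad v / sqrt(eta^2 + |grad v|^2); gu, gv have shape (2, H, W)."""
    xi = gv / np.sqrt(eta ** 2 + np.sum(gv ** 2, axis=0))
    dot = np.sum(xi * gu, axis=0)        # <xi, grad u> per pixel
    return gu - gamma * dot * xi
```

Note that where ∇v = 0 we get w = 1 and D = I, so both constructions degrade gracefully to the unweighted regulariser.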
Note that the solution of variational regularisation (2) with either first-order (32) or second-order structure-promoting regularisation (5), (21), (27) can be cast into the general non-smooth composite optimisation form

min_x F(Ax) + G(x) (33)

with F(y) = Σ_{i=1}^n F_i(y_i) and Ax = [A_1 x, …, A_n x], see Table 2. We denote by ‖·‖_{2,1}, ‖·‖₂² and ‖·‖_{*,1} discretisations of

z ↦ ∫_Ω |z(x)| dx,   z ↦ ∫_Ω |z(x)|² dx   and   z ↦ ∫_Ω |z(x)|_* dx. (34)

Table 2: Mapping the variational regularisation models into the composite optimisation framework (33). In all cases we choose A_1 x = Ku, F_1(y_1) = D(y_1, f) and G(x) = ı_{≥0}(u).

regulariser | definition | x | A_2 x | A_3 x | F_2(y_2) | F_3(y_3)
H¹ | (3) | u | ∇u | - | α‖y‖₂² | -
wH¹ | (19) | u | w∇u | - | α‖y‖₂² | -
dH¹ | (25) | u | D∇u | - | α‖y‖₂² | -
TV | (4) | u | ∇u | - | α‖y‖_{2,1} | -
wTV | (20) | u | w∇u | - | α‖y‖_{2,1} | -
dTV | (26) | u | D∇u | - | α‖y‖_{2,1} | -
JTV | (16) | u | [∇u, 0] | - | α‖y − [0, ξ]‖_{2,1} | -
TNV | (31) | u | [∇u, 0] | - | α‖y − [0, ξ]‖_{*,1} | -
TGV | (5) | (u, ζ) | ∇u − ζ | Eζ | α‖y‖_{2,1} | αβ‖y‖_{2,1}
wTGV | (21) | (u, ζ) | w∇u − ζ | Eζ | α‖y‖_{2,1} | αβ‖y‖_{2,1}
dTGV | (27) | (u, ζ) | D∇u − ζ | Eζ | α‖y‖_{2,1} | αβ‖y‖_{2,1}

A popular algorithm to solve (33), and therefore (2), is the primal-dual hybrid gradient method (PDHG) [119, 120], see Algorithm 1. It consists of two simple steps only involving basic linear algebra and the evaluation of the operator A and its adjoint A*. Moreover, it involves the computation of the proximal operators of τG and σF*, where F* denotes the convex conjugate of F and τ and σ are scalar step sizes. The proximal operator of a functional H is defined as

prox_H(z) := arg min_x { (1/2)‖x − z‖² + H(x) }. (35)

The proximal operator can be computed in closed form for ‖·‖_{2,1} and ‖·‖₂². It can also be computed in closed form for ‖·‖_{*,1} if either the number of channels or the dimension of the domain is strictly less than 5, i.e. m, d < 5, see [63] for more details. Note also that the proximal operator of αF(· − ξ) can be readily computed based on the proximal operator of F. More details on proximal operators, convex conjugates and examples can be found for example in [121-124].

For some applications (e.g. x-ray tomography) a preconditioned [42, 125] or randomised [42, 126] variant can be useful, but we will not consider these here for simplicity.

Algorithm 1
Primal-dual hybrid gradient (PDHG) to solve (33). Default values given in brackets.
Input: iterates x (= 0), y (= 0), step size parameter ρ (= 1)
Initialize: extrapolation x̄ = x, step sizes σ = ρ/‖A‖, τ = 0.…/(ρ‖A‖)
for k = 1, 2, … do
  x⁺ = prox_{τG}(x − τ A*y)
  y⁺ = prox_{σF*}(y + σ A(2x⁺ − x))

Since the operator norms ‖A_i‖, i = 1, …, n, can vary significantly, it is often advisable to "prewhiten" the problem by recasting it as

min_x F̃(Ãx) + G(x) (36)

with F̃(y) := Σ_{i=1}^n F_i(‖A_i‖ · y_i) and Ã_i x := A_i x/‖A_i‖. Then trivially ‖Ã_i‖ = 1, i = 1, …, n, so that all operator norms are equal. Note that the proximal operator of σF̃ is simple to compute if the proximal

Figure 8:
Test cases for numerical experiments. Top: x-ray reconstruction from sparse views and failed detectors; bottom: super-resolution by a factor of 5 and Gaussian noise.

operators of σF_i, i = 1, …, n, are simple to compute, since

[prox_{σF̃}(y)]_i = λ_i⁻¹ prox_{σλ_i² F_i}(λ_i y_i), (37)

for any λ_i >
0, see for instance [92, Lemma 6.136].
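The closed-form proximal operators mentioned above are short to implement. The sketch below (our own code, not the chapter's accompanying implementation; the nuclear-norm prox operates on one small matrix at a time) gives the prox of t‖·‖₂,₁ (per-pixel vector soft-thresholding), the prox of t‖·‖_* (soft-thresholding of singular values) and the rescaling identity (37):

```python
import numpy as np

def prox_l21(z, t):
    """Prox of t*||.||_{2,1}: per-pixel soft-thresholding of the vector magnitude.

    z has shape (2, H, W); the 2-norm is taken over the first axis."""
    norm = np.sqrt(np.sum(z ** 2, axis=0))
    return np.maximum(1.0 - t / np.maximum(norm, 1e-12), 0.0) * z

def prox_nuclear(Z, t):
    """Prox of t*||.||_* for one small matrix: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def prox_scaled(y, sigma, lam, prox):
    """Prox of y -> sigma * F(lam * y), given the prox of t*F, cf. (37)."""
    return prox(lam * y, sigma * lam ** 2) / lam
```

As a design note, both proxes act independently per pixel, so they vectorise (or parallelise) trivially over the image domain.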
This section describes numerical experiments comparing first- and second-order structure-promoting regularisers.
Software
The numerical computations are carried out in Python using ODL (version 1.0.0.dev0) [127] and ASTRA [128, 129] for computing line integrals in the tomography example. The source code which reproduces all experiments in this chapter can be found at https://github.com/mehrhardt/Multi-Modality-Imaging-with-Structural-Priors.

Data
We consider two test cases with different characteristics, both of which are visualised in Figure 8. The first test case, later referred to as x-ray, is parallel-beam x-ray reconstruction from only 15 views where, additionally, some detectors are broken. The latter is modelled by salt-and-pepper noise where 5% of all detectors are corrupted. We aim to recover an image on a square domain discretised with 200 × 200 pixels. The simulated x-ray camera has 100 detectors and a width of 3 in the same units as the image domain. Therefore, the challenges are 1) sparse views, 2) a small number of detectors and 3) broken detectors.

The second test case, which we refer to as super-resolution, considers the task of super-resolution. Here, too, we aim to recover an image on a square domain discretised with 200 × 200 pixels. The forward operator integrates over 5 × 5 pixels, thus mapping images of size 200 × 200 to images of size 40 × 40. In addition, Gaussian noise of mean zero and standard deviation 0.01 is added.

Algorithmic parameters
We chose the default value ρ = 1 for balancing the step sizes in PDHG and ran the algorithm for 3,000 iterations without a specific stopping criterion.
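For reference, Algorithm 1 specialised to TV denoising (A = ∇, F = α‖·‖₂,₁, G(u) = ½‖u − f‖², i.e. no forward operator) can be written in a few lines of NumPy. This is our own self-contained sketch, not the ODL implementation used for the experiments; the dual prox is a projection onto the α-ball and the primal prox is a simple averaging with the data:

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary; output shape (2, H, W)."""
    g = np.zeros((2,) + u.shape)
    g[0, :-1, :] = u[1:, :] - u[:-1, :]
    g[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return g

def div(p):
    """Negative adjoint of grad, so that <grad u, p> = -<u, div p>."""
    dx = np.zeros(p.shape[1:]); dy = np.zeros(p.shape[1:])
    dx[0, :] = p[0, 0, :]
    dx[1:-1, :] = p[0, 1:-1, :] - p[0, :-2, :]
    dx[-1, :] = -p[0, -2, :]
    dy[:, 0] = p[1, :, 0]
    dy[:, 1:-1] = p[1, :, 1:-1] - p[1, :, :-2]
    dy[:, -1] = -p[1, :, -2]
    return dx + dy

def pdhg_tv_denoise(f, alpha, n_iter=200):
    """PDHG (Algorithm 1) for min_u 0.5*||u - f||^2 + alpha*TV(u)."""
    L = np.sqrt(8.0)                 # upper bound on ||grad|| in 2D
    sigma = tau = 1.0 / L            # step sizes with sigma*tau*L^2 <= 1
    u = f.copy()
    u_bar = u.copy()
    y = np.zeros((2,) + f.shape)
    for _ in range(n_iter):
        # dual step: prox of sigma*F*, a pointwise projection onto {|y(x)| <= alpha}
        y = y + sigma * grad(u_bar)
        y = y / np.maximum(1.0, np.sqrt(np.sum(y ** 2, axis=0)) / alpha)
        # primal step: prox of tau*G with G(u) = 0.5*||u - f||^2
        u_new = (u + tau * div(y) + tau * f) / (1.0 + tau)
        # extrapolation 2*u_new - u, as in Algorithm 1
        u_bar = 2.0 * u_new - u
        u = u_new
    return u
```

Replacing `grad` by `w * grad` or by the directional weighting of (22) turns this into wTV or dTV denoising, which is the pattern the experiments below follow (with a tomography or super-resolution forward operator added).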
Figure 9: Effect of edge weighting on locally weighted models for test case x-ray: increasing edge parameter η from left to right. All other parameters were tuned to maximise the PSNR and visual image quality.

The multiplicative scaling of an unconstrained optimisation problem is arbitrary; nevertheless, we report the absolute values here for completeness. For simplicity, all regularisation parameters are given as multiples of 1e−4. The figures at the bottom right of each image are PSNR and SSIM.

x-ray
Effect of edge weighting
All structure-promoting regularisers described in Section 3 rely to some extent on the size of the edges in the side information, i.e. on |∇v(x)|. For JTV and TNV the actual values of |∇v(x)| matter, so that a parameter η is needed to correct for this. For all other regularisers a parameter η is needed to decide which edges to trust and which not. The effect of this edge weighting parameter η on all described regularisers is illustrated in Figures 9, 10 and 11. The locally weighted regularisers (i.e. wH, wTV and wTGV) and the directional regularisers (i.e. dH, dTV and dTGV) have in common that if η is too small, then small artefacts appear around the edges. This effect is more pronounced for the locally weighted regularisers. If η is too large, then the structure-promoting effect becomes too weak. For joint total variation and total nuclear variation similar effects exist, with the relationship to η reversed.
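One common way to turn the side information into such an edge weighting (used, for example, in the dTV literature) is to normalise its gradient as ξ(x) = ∇v(x)/√(|∇v(x)|² + η²), so that |ξ| ≈ 1 at edges much larger than η and |ξ| ≈ 0 in flat regions. The precise form used by each regulariser is defined in Section 3; the following numpy sketch is only illustrative:

```python
import numpy as np

def edge_field(v, eta):
    """Normalised gradient xi = grad(v) / sqrt(|grad(v)|^2 + eta^2) of
    the side information v. Its magnitude |xi| is close to 1 at edges
    much larger than eta and exactly 0 in flat regions."""
    gy, gx = np.gradient(v)           # gradients along axis 0 and axis 1
    norm = np.sqrt(gx**2 + gy**2 + eta**2)
    return gx / norm, gy / norm

# Piecewise-constant side information with a single vertical edge.
v = np.zeros((8, 8)); v[:, 4:] = 1.0
xi_x, xi_y = edge_field(v, eta=1e-2)

mag = np.hypot(xi_x, xi_y)            # near 1 at the edge, 0 elsewhere
```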
Effect of regularisation
The effect of the regularisation parameter α on the solution is illustrated in Figures 12, 13 and 14. All regularisers show the same behaviour if α is too small or too large: if the regularisation parameter is chosen too small, then artefacts from inverting an ill-posed operator are introduced, and if it is chosen too large, then all regularisers oversmooth the solution. Note that all structure-promoting regularisers have an increased robustness in areas of shared structures.
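The qualitative effect of α is easy to reproduce on a toy problem. The following sketch (not from the chapter) performs 1-D H¹ (Tikhonov) denoising, which has a closed-form solution, and shows that the gradient norm of the solution shrinks monotonically as α grows, i.e. the solution is increasingly smoothed:

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-D H1 (Tikhonov) denoising: argmin_u 0.5*||u - f||^2 + 0.5*alpha*||D u||^2
# has the closed form u = (I + alpha * D^T D)^{-1} f, which makes the effect
# of the regularisation parameter alpha easy to inspect.
n = 100
D = np.diff(np.eye(n), axis=0)        # forward-difference operator (n-1) x n
f = np.sign(np.sin(np.linspace(0, 6, n))) + 0.3 * rng.standard_normal(n)

def denoise(f, alpha):
    return np.linalg.solve(np.eye(n) + alpha * D.T @ D, f)

# The gradient norm of the solution decreases as alpha increases.
grad_norms = [np.linalg.norm(D @ denoise(f, a)) for a in (0.01, 1.0, 100.0)]
assert grad_norms[0] > grad_norms[1] > grad_norms[2]
```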
Figure 10: Effect of edge weighting on directional models for test case x-ray: increasing edge parameter η from left to right. All other parameters were tuned to maximise the PSNR and visual image quality (γ = 1).

Figure 11: Effect of edge weighting on joint total variation and total nuclear variation for test case x-ray: increasing edge parameter η from left to right. All other parameters were tuned to maximise the PSNR and visual image quality.

Figure 12: H¹-semi-norm based structure-promoting regularisers for test case x-ray: increasing the regularisation parameter α from left to right. All other parameters were tuned to maximise the PSNR and visual image quality. All regularisers in this figure reduce to the H¹-semi-norm in areas where the side information is flat.

Figure 13: Total variation based structure-promoting regularisers for test case x-ray: increasing the regularisation parameter α from left to right. All other parameters were tuned to maximise the PSNR and visual image quality. All regularisers in this figure reduce to the total variation in areas where the side information is flat.

Figure 14: Total generalised variation based structure-promoting regularisers for test case x-ray: increasing the regularisation parameter α from left to right. All other parameters were tuned to maximise the PSNR and visual image quality.
Figure 15: Comparison of structure-promoting regularisers for test case x-ray. All parameters were tuned to maximise the PSNR and visual image quality.
Comparison of regularisers
All eleven regularisers are compared in Figure 15. It can be seen that the structure-promoting regularisers perform much better in terms of PSNR and SSIM than their non-structure-promoting counterparts. Moreover, one can observe the interesting effect that the structure-promoting regularisers also perform visually better in regions where the structure is not shared, e.g. the outer ring of circles. This effect is most dominant for dTGV, where the circle at the top left is clearly visible, while it is difficult to spot for many of the other regularisers.

super-resolution
Effect of edge weighting
Figures 16, 17 and 18 show the effect of the edge weighting parameter η. One can make similar observations as in Figures 9, 10 and 11 for the test case x-ray. In addition, one can observe from the close-ups that if η is too small (or too large for JTV and TNV), then ghosting artefacts may appear. Note that these are present for TNV even for a moderate choice of η.

Comparison of regularisers
All regularisers are compared in Figure 19 for the test case super-resolution. It can be noted from all images that introducing structural information allows some of the inner circles to be resolved which are merged by the regularisers that are not structure-promoting. Moreover, all total generalised variation based regularisers do not perform much better than the total variation based regularisers. The directional regularisers as well as JTV and TNV perform best in terms of PSNR for this example.

Figure 16: Effect of edge weighting on locally weighted models for test case super-resolution: increasing edge parameter η from left to right. All other parameters were tuned to maximise the PSNR and visual image quality.

Figure 17: Effect of edge weighting on directional models for test case super-resolution: increasing edge parameter η from left to right. All other parameters were tuned to maximise the PSNR and visual image quality.

Figure 18: Effect of edge weighting on joint total variation and total nuclear variation for test case super-resolution: increasing edge parameter η from left to right. All other parameters were tuned to maximise the PSNR and visual image quality.

Figure 19: Comparison of structure-promoting regularisers for test case super-resolution. All parameters were tuned to maximise the PSNR and visual image quality.
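A practical note on TNV: it couples the channels through the singular values of the pixelwise stacked gradients, so a small (here 2 × 2) singular value decomposition is needed at every pixel. These can be batched in numpy; the following is an illustrative sketch of the pointwise nuclear norm, not the chapter's implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

def nuclear_norm_field(u, v):
    """Pointwise nuclear norm of the stacked gradients of two images u, v:
    at each pixel the 2x2 matrix with rows grad(u) and grad(v) is
    decomposed and its singular values are summed."""
    gu = np.stack(np.gradient(u), axis=-1)   # shape (H, W, 2)
    gv = np.stack(np.gradient(v), axis=-1)
    J = np.stack([gu, gv], axis=-2)          # (H, W, 2, 2) pixelwise Jacobians
    s = np.linalg.svd(J, compute_uv=False)   # batched SVD: (H, W, 2)
    return s.sum(axis=-1)                    # nuclear norm per pixel

u = rng.random((32, 32)); v = rng.random((32, 32))
tnv_density = nuclear_norm_field(u, v)       # one value per pixel
```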
The median computing times for the numerical experiments are reported in Table 3. The computing time of PDHG is mainly influenced by the dimensions of the models, the proximal operator and the forward model. As can be seen from the table, H¹ and TV are roughly equally fast. TGV, which uses a second primal variable in the space of the image gradient, is significantly slower, with about twice the computational cost. In all three cases, introducing isotropic weights (i.e. wH, wTV and wTGV) increases the cost by about 6 seconds, and anisotropic weights (i.e. dH, dTV and dTGV) by about 12 seconds. JTV is more costly than dTV but not as costly as TGV. TNV is by far the most costly of all algorithms due to the need to compute singular value decompositions of 2 × 2 matrices.

Table 3: Median computing times for the test cases x-ray and super-resolution.

Whether an H¹-, TV- or TGV-based regulariser is desirable depends on the individual application. For each of them, there is a clear trend that one achieves better image quality by introducing more information, i.e. first isotropic and then anisotropic information, each of which increases the computational cost. However, the increase in computational cost is so small that for most applications the directional variant is likely to be favoured.

This chapter introduced fundamental mathematical concepts on the structure of images and how structural similarity between images can be measured. The fundamental building blocks are similarity based on edge sets and on parallel level sets. These notions lead to several classes of structure-promoting regularisers, all of which are convex and thereby lead to tractable optimisation problems when used in variational regularisation for linear inverse problems with convex data fits. While some of the regularisers are smooth and others are non-smooth, the resulting optimisation problems for all of them can be efficiently solved by PDHG.
The effectiveness of these regularisers in promoting structure has been observed in many applications and was also illustrated in this chapter in two simulation studies.
The mathematical framework for structure-promoting regularisers is by now well established and fairly mature. The remaining open problems are mostly practical ones arising in the translation of these techniques to applications, which will in turn motivate further mathematical research.
Misregistration
The biggest open problem is misregistration. All of the described regularisers assume that both images are perfectly aligned. Even in scanners which house two imaging modalities in the same system, such as PET-MR, this assumption is never perfectly fulfilled. The issue has not been addressed much in the literature. In [40], the authors proposed an approach alternating between image reconstruction and image registration, with some success. In [15], the problem was formulated as a blind deconvolution problem, so that translations can be compensated by a shifted kernel. The resulting optimisation algorithm is related to alternating gradient descent and thus alternates between incremental updates of each variable. This approach seems fairly efficient and robust, but it is limited to compensating translations and fails to find large deformations. The latter was approached heuristically in [71], but a generally applicable strategy is still to be found.
Extensions beyond two modalities
It is natural to consider the case where more than one image is available as side information. For instance, in some remote sensing applications a colour photograph with high spatial resolution is available. Similarly, in PET-MR, images from more than one MR sequence might be available. This setting has also been considered in [39] for a purely discrete model. Some of the regularisers to promote structural similarity in this chapter naturally extend to multiple images as side information, but this has not yet been properly investigated.
Applications
As we illustrated in Section 1, there are many applications where structure-promoting regularisers are already used or are on the horizon, and the list of potential target applications grows steadily as more and more multi-modality scanners are introduced. Next to the misregistration mentioned before, the biggest hurdle in real applications is the interpretation of images that were created by fusing information from several modalities. A common question is: "Which edges can I trust?", since the reconstruction from multi-modality data is often performed at a finer resolution than in the single-modality case. For example, in PET-MR the reconstruction of PET data with an already reconstructed MR image as side information can be performed at the native MRI resolution. The answer might be that such an image should not be interpreted as a PET image, but in fact as a synergistic PET-MR image.
Joint reconstruction
Throughout this chapter the focus was on improving the reconstruction of one image with the aid of another modality used as side information. Since the other image is rarely acquired directly, it is natural to aim to reconstruct both images simultaneously rather than sequentially. While conceptually appealing, this strategy leads to many more complications than the approach discussed in this chapter, which is sometimes referred to as one-sided reconstruction. While the mathematical framework for one-sided reconstruction is quite mature, the framework for joint reconstruction is, despite a lot of research effort in the last 10 years, still in its infancy. Fundamental problems, such as a computationally tractable and efficient coupling of the modalities, remain unsolved. The appealing strategy of building joint reconstruction on the solid mathematical foundations of one-sided reconstruction in a mathematically sound and computationally tractable way is still not possible to date.
Acknowledgements
The author acknowledges support from the EPSRC grant EP/S026045/1 and the Faraday Institution EP/T007745/1. Moreover, the author is grateful to all his collaborators who indirectly contributed to this chapter over the last couple of years.
References

[1] S. R. Arridge, V. Kolehmainen, and M. J. Schweiger, "Reconstruction and Regularisation in Optical Tomography," in Mathematical Methods in Biomedical Imaging and Intensity-Modulated Radiation Therapy (IMRT) (Y. Censor, M. Jiang, and A. K. Louis, eds.), Scuola Normale Superiore, 2008.
[2] X. Bresson and T. F. Chan, "Fast Dual Minimization of the Vectorial Total Variation Norm and Applications to Color Image Processing," Inverse Problems and Imaging, vol. 2, pp. 455–484, 2008.
[3] E. Haber and M. Holtzman-Gazit, "Model Fusion and Joint Inversion," Surveys in Geophysics, pp. 675–695, 2013.
[4] F. Knoll, T. Koesters, R. Otazo, F. Boada, and D. K. Sodickson, "Simultaneous MR-PET Reconstruction using Multi Sensor Compressed Sensing and Joint Sparsity," in International Society for Magnetic Resonance in Medicine, vol. 22, 2014.
[5] M. J. Ehrhardt, K. Thielemans, L. Pizarro, D. Atkinson, S. Ourselin, B. F. Hutton, and S. R. Arridge, "Joint Reconstruction of PET-MRI by exploiting Structural Similarity," Inverse Problems, vol. 31, p. 015001, 2015.
[6] L. A. Gallardo and M. A. Meju, "Characterization of Heterogeneous Near-Surface Materials by Joint 2D Inversion of DC Resistivity and Seismic Data," Geophysical Research Letters, vol. 30, no. 13, p. 1658, 2003.
[7] L. A. Gallardo and M. A. Meju, "Joint Two-Dimensional DC Resistivity and Seismic Travel Time Inversion with Cross-Gradients Constraints," Journal of Geophysical Research, vol. 109, no. B3, pp. 1–11, 2004.
[8] M. J. Ehrhardt and S. R. Arridge, "Vector-Valued Image Processing by Parallel Level Sets," IEEE Transactions on Image Processing, vol. 23, no. 1, pp. 9–18, 2014.
[9] D. Rigie and P. La Rivière, "Joint Reconstruction of Multi-Channel, Spectral CT Data via Constrained Total Nuclear Variation Minimization," Physics in Medicine and Biology, vol. 60, pp. 1741–1762, 2015.
[10] M. J. Ehrhardt and M. M. Betcke, "Multi-Contrast MRI Reconstruction with Structure-Guided Total Variation," SIAM Journal on Imaging Sciences, vol. 9, no. 3, pp. 1084–1106, 2016.
[11] M. J. Ehrhardt, P. J. Markiewicz, M. Liljeroth, A. Barnes, V. Kolehmainen, J. Duncan, L. Pizarro, D. Atkinson, B. F. Hutton, S. Ourselin, K. Thielemans, and S. R. Arridge, "PET Reconstruction with an Anatomical MRI Prior using Parallel Level Sets," IEEE Transactions on Medical Imaging, vol. 35, pp. 2189–2199, 2016.
[12] F. Knoll, M. Holler, T. Koesters, R. Otazo, K. Bredies, and D. K. Sodickson, "Joint MR-PET Reconstruction using a Multi-Channel Image Regularizer," IEEE Transactions on Medical Imaging, vol. 36, no. 1, 2016.
[13] G. Schramm, M. Holler, A. Rezaei, K. Vunckx, F. Knoll, K. Bredies, F. Boada, and J. Nuyts, "Evaluation of Parallel Level Sets and Bowsher's Method as Segmentation-Free Anatomical Priors for Time-of-Flight PET Reconstruction," IEEE Transactions on Medical Imaging, vol. 62, no. 2, pp. 590–603, 2017.
[14] C. Bathke, T. Kluth, and P. Maass, "Improved Image Reconstruction in Magnetic Particle Imaging using Structural a priori Information," International Journal of Magnetic Particle Imaging, vol. 3, no. 1, 2017.
[15] L. Bungert, D. A. Coomes, M. J. Ehrhardt, J. Rasch, R. Reisenhofer, and C.-B. Schönlieb, "Blind Image Fusion for Hyperspectral Imaging with the Directional Total Variation," Inverse Problems, vol. 34, no. 4, p. 044003, 2018.
[16] V. Kolehmainen, M. J. Ehrhardt, and S. R. Arridge, "Incorporating Structural Prior Information and Sparsity into EIT using Parallel Level Sets," Inverse Problems and Imaging, vol. 13, no. 2, pp. 285–307, 2019.
[17] R. M. Leahy and X. Yan, "Incorporation of Anatomical MR Data for Improved Functional Imaging with PET," in Information Processing in Medical Imaging, pp. 105–120, Springer, 1991.
[18] J. E. Bowsher, V. E. Johnson, T. G. Turkington, R. J. Jaszczak, C. E. Floyd, and R. E. Coleman, "Bayesian Reconstruction and Use of Anatomical A Priori Information for Emission Tomography," IEEE Transactions on Medical Imaging, vol. 15, no. 5, pp. 673–686, 1996.
[19] A. Rangarajan, I.-T. Hsiao, and G. Gindi, "A Bayesian Joint Mixture Framework for the Integration of Anatomical Information in Functional Image Reconstruction," Journal of Mathematical Imaging and Vision, vol. 12, no. 3, pp. 199–217, 2000.
[20] C. Chan, R. Fulton, D. D. Feng, W. Cai, and S. Meikle, "An Anatomically based Regionally Adaptive Prior for MAP Reconstruction in Emission Tomography," in IEEE Nuclear Science Symposium and Medical Imaging Conference, pp. 4137–4141, 2007.
[21] J. Nuyts, "The Use of Mutual Information and Joint Entropy for Anatomical Priors in Emission Tomography," in IEEE Nuclear Science Symposium and Medical Imaging Conference, pp. 4149–4154, IEEE, 2007.
[22] C. Comtat, P. E. Kinahan, J. A. Fessler, T. Beyer, D. W. Townsend, M. Defrise, and C. J. Michel, "Clinically Feasible Reconstruction of 3D Whole-Body PET/CT Data using Blurred Anatomical Labels," Physics in Medicine and Biology, vol. 47, pp. 1–20, 2002.
[23] J. E. Bowsher, H. Yuan, L. W. Hedlund, T. G. Turkington, G. Akabani, A. Badea, W. C. Kurylo, C. T. Wheeler, G. P. Cofer, M. W. Dewhirst, and G. A. Johnson, "Utilizing MRI Information to Estimate F18-FDG Distributions in Rat Flank Tumors," in IEEE Nuclear Science Symposium and Medical Imaging Conference, pp. 2488–2492, 2004.
[24] K. Baete, J. Nuyts, W. Van Paesschen, P. Suetens, and P. Dupont, "Anatomical-based FDG-PET Reconstruction for the Detection of Hypo-Metabolic Regions in Epilepsy," IEEE Transactions on Medical Imaging, vol. 23, no. 4, pp. 510–519, 2004.
[25] C. Chan, R. Fulton, D. D. Feng, and S. Meikle, "Regularized Image Reconstruction with an Anatomically Adaptive Prior for Positron Emission Tomography," Physics in Medicine and Biology, vol. 54, no. 24, pp. 7379–7400, 2009.
[26] J. Tang and A. Rahmim, "Bayesian PET Image Reconstruction Incorporating Anato-Functional Joint Entropy," Physics in Medicine and Biology, vol. 54, no. 23, pp. 7063–7075, 2009.
[27] A. Bousse, S. Pedemonte, D. Kazantsev, S. Ourselin, S. R. Arridge, and B. F. Hutton, "Weighted MRI-based Bowsher Priors for SPECT Brain Image Reconstruction," in IEEE Nuclear Science Symposium and Medical Imaging Conference, pp. 3519–3522, 2010.
[28] S. Pedemonte, A. Bousse, B. F. Hutton, S. R. Arridge, and S. Ourselin, "Probabilistic Graphical Model of SPECT/MRI," in Machine Learning in Medical Imaging, pp. 167–174, 2011.
[29] S. Somayajula, C. Panagiotou, A. Rangarajan, Q. Li, S. R. Arridge, and R. M. Leahy, "PET Image Reconstruction using Information Theoretic Anatomical Priors," IEEE Transactions on Medical Imaging, vol. 30, no. 3, pp. 537–549, 2011.
[30] J. Cheng-Liao and J. Qi, "PET Image Reconstruction with Anatomical Edge Guided Level Set Prior," Physics in Medicine and Biology, vol. 56, pp. 6899–6918, 2011.
[31] K. Vunckx, A. Atre, K. Baete, A. Reilhac, C. M. Deroose, K. Van Laere, and J. Nuyts, "Evaluation of Three MRI-based Anatomical Priors for Quantitative PET Brain Imaging," IEEE Transactions on Medical Imaging, vol. 31, no. 3, pp. 599–612, 2012.
[32] D. Kazantsev, S. R. Arridge, S. Pedemonte, A. Bousse, K. Erlandsson, B. F. Hutton, and S. Ourselin, "An Anatomically Driven Anisotropic Diffusion Filtering Method for 3D SPECT Reconstruction," Physics in Medicine and Biology, vol. 57, no. 12, pp. 3793–3810, 2012.
[33] A. Bousse, S. Pedemonte, B. A. Thomas, K. Erlandsson, S. Ourselin, S. R. Arridge, and B. F. Hutton, "Markov Random Field and Gaussian Mixture for Segmented MRI-based Partial Volume Correction in PET," Physics in Medicine and Biology, vol. 57, pp. 6681–6705, 2012.
[34] B. Bai, Q. Li, and R. M. Leahy, "Magnetic Resonance-guided Positron Emission Tomography Image Reconstruction," Seminars in Nuclear Medicine, vol. 43, pp. 30–44, 2013.
[35] G. Delso, S. Fürst, B. Jakoby, R. Ladebeck, C. Ganter, S. G. Nekolla, M. Schwaiger, and S. I. Ziegler, "Performance measurements of the Siemens mMR integrated whole-body PET/MR scanner," Journal of Nuclear Medicine, vol. 52, pp. 1914–1922, 2011.
[36] M. J. Ehrhardt, K. Thielemans, L. Pizarro, P. J. Markiewicz, D. Atkinson, S. Ourselin, B. F. Hutton, and S. R. Arridge, "Joint Reconstruction of PET-MRI by Parallel Level Sets," in IEEE Nuclear Science Symposium and Medical Imaging Conference, 2014.
[37] J. Tang and A. Rahmim, "Anatomy Assisted PET Image Reconstruction Incorporating Multi-Resolution Joint Entropy," Physics in Medicine and Biology, vol. 60, no. 1, pp. 31–48, 2015.
[38] A. Mehranian, M. Belzunce, C. Prieto, A. Hammers, and A. J. Reader, "Synergistic PET and SENSE MR Image Reconstruction using Joint Sparsity Regularization," IEEE Transactions on Medical Imaging, vol. 37, no. 1, pp. 20–34, 2018.
[39] A. Mehranian, M. A. Belzunce, F. Niccolini, M. Politis, C. Prieto, F. Turkheimer, A. Hammers, and A. J. Reader, "PET Image Reconstruction using Multi-Parametric Anato-Functional Priors," Physics in Medicine and Biology, 2017.
[40] Y.-J. Tsai, A. Bousse, S. Ahn, C. W. Stearns, S. Arridge, B. F. Hutton, and K. Thielemans, "Algorithms for Solving Misalignment Issues in Penalized PET/CT Reconstruction Using Anatomical Priors," in IEEE Nuclear Science Symposium and Medical Imaging Conference Proceedings (NSS/MIC), IEEE, 2018.
[41] Y. Zhang and X. Zhang, "PET-MRI joint reconstruction with common edge weighted total variation regularization," Inverse Problems, vol. 34, no. 6, p. 065006, 2018.
[42] M. J. Ehrhardt, P. J. Markiewicz, and C.-B. Schönlieb, "Faster PET reconstruction with non-smooth priors by randomization and preconditioning," Physics in Medicine & Biology, vol. 64, no. 22, p. 225019, 2019.
[43] D. Deidda, N. A. Karakatsanis, P. M. Robson, Y. J. Tsai, N. Efthimiou, K. Thielemans, Z. A. Fayad, R. G. Aykroyd, and C. Tsoumpas, "Hybrid PET-MR list-mode kernelized expectation maximization reconstruction," Inverse Problems, vol. 35, no. 4, 2019.
[44] B. Bilgic, V. K. Goyal, and E. Adalsteinsson, "Multi-Contrast Reconstruction with Bayesian Compressed Sensing," Magnetic Resonance in Medicine, vol. 66, pp. 1601–1615, 2011.
[45] J. Huang, C. Chen, and L. Axel, "Fast Multi-contrast MRI Reconstruction," Magnetic Resonance Imaging, vol. 32, no. 10, pp. 1344–1352, 2014.
[46] D. K. Sodickson, L. Feng, F. Knoll, M. Cloos, N. Ben-Eliezer, L. Axel, H. Chandarana, K. T. Block, and R. Otazo, "The Rapid Imaging Renaissance: Sparser Samples, Denser Dimensions, and Glimmerings of a Grand Unified Tomography," in Proceedings of SPIE, vol. 9417, pp. 94170G1–14, 2015.
[47] P. Song, L. Weizman, J. F. Mota, Y. C. Eldar, and M. R. Rodrigues, "Coupled Dictionary Learning for Multi-Contrast MRI Reconstruction," in International Conference on Image Processing, pp. 2880–2884, 2018.
[48] L. Xiang, Y. Chen, W. Chang, Y. Zhan, W. Lin, Q. Wang, and D. Shen, "Deep-Learning-Based Multi-Modal Fusion for Fast MR Reconstruction," IEEE Transactions on Biomedical Engineering, vol. 66, no. 7, pp. 2105–2114, 2019.
[49] J. Rasch, V. Kolehmainen, R. Nivajarvi, M. Kettunen, O. Gröhn, M. Burger, and E. M. Brinkmann, "Dynamic MRI reconstruction from undersampled data with an anatomical prescan," Inverse Problems, vol. 34, no. 7, 2018.
[50] A. J. Obert, M. Gutberlet, A. L. Kern, T. F. Kaireit, R. Grimm, F. Wacker, and J. Vogel-Claussen, "1H-guided reconstruction of 19F gas MRI in COPD patients," Magnetic Resonance in Medicine, pp. 1–11, 2020.
[51] D. Ma, V. Gulani, N. Seiberlich, K. Liu, J. L. Sunshine, J. L. Duerk, and M. A. Griswold, "Magnetic Resonance Fingerprinting," Nature, vol. 495, pp. 187–192, 2013.
[52] M. Davies, G. Puy, P. Vandergheynst, and Y. Wiaux, "A Compressed Sensing Framework for Magnetic Resonance Fingerprinting," SIAM Journal on Imaging Sciences, vol. 7, no. 4, pp. 2623–2656, 2013.
[53] S. Tang, C. Fernandez-Granda, S. Lannuzel, B. Bernstein, R. Lattanzi, M. Cloos, F. Knoll, and J. Asslander, "Multicompartment magnetic resonance fingerprinting," Inverse Problems, vol. 34, no. 9, 2018.
[54] G. Dong, M. Hintermüller, and K. Papafitsoros, "Quantitative magnetic resonance imaging: From fingerprinting to integrated physics-based models," SIAM Journal on Imaging Sciences, vol. 12, no. 2, pp. 927–971, 2019.
[55] M. Golbabaee, Z. Chen, Y. Wiaux, and M. Davies, "CoverBLIP: accelerated and scalable iterative matched-filtering for magnetic resonance fingerprint reconstruction," Inverse Problems, vol. 36, no. 1, p. 015003, 2020.
[56] C. Chen, Y. Li, and J. Huang, "Calibrationless Parallel MRI with Joint Total Variation Regularization," in Medical Image Computing and Computer-Assisted Intervention, pp. 106–114, 2013.
[57] E. Ametova, G. Fardell, J. S. Jørgensen, W. R. B. Lionheart, E. Papoutsellis, E. Pasca, D. Sykes, M. Turner, R. Warr, and P. J. Withers, "Core Imaging Library (CIL)," 2019.
[58] G. Sapiro and D. L. Ringach, "Anisotropic Diffusion of Multivalued Images with Applications to Color Filtering," IEEE Transactions on Image Processing, vol. 5, pp. 1582–1586, 1996.
[59] P. Blomgren and T. F. Chan, "Color TV: Total Variation Methods for Restoration of Vector-Valued Images," IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 304–309, 1998.
[60] N. Sochen, R. Kimmel, and R. Malladi, "A General Framework for Low Level Vision," IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 310–318, 1998.
[61] D. Tschumperlé and R. Deriche, "Vector-Valued Image Regularization with PDEs: A Common Framework for Different Applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 506–517, 2005.
[62] B. Goldluecke, E. Strekalovskiy, and D. Cremers, "The Natural Vectorial Total Variation Which Arises from Geometric Measure Theory," SIAM Journal on Imaging Sciences, vol. 5, pp. 537–563, 2012.
[63] K. M. Holt, "Total Nuclear Variation and Jacobian Extensions of Total Variation for Vector Fields," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3975–3989, 2014.
[64] M. Möller, E.-M. Brinkmann, M. Burger, and T. Seybold, "Color Bregman TV," SIAM Journal on Imaging Sciences, vol. 7, pp. 2771–2806, 2014.
[65] C. Ballester, V. Caselles, L. Igual, J. Verdera, and B. Rougé, "A Variational Model for P+XS Image Fusion," International Journal of Computer Vision, vol. 69, pp. 43–58, 2006.
[66] M. Möller, T. Wittman, A. L. Bertozzi, and M. Burger, "A Variational Approach for Sharpening High Dimensional Images," SIAM Journal on Imaging Sciences, vol. 5, pp. 150–178, 2012.
[67] F. Fang, F. Li, C. Shen, and G. Zhang, "A Variational Approach for Pan-Sharpening," IEEE Transactions on Image Processing, vol. 22, no. 7, pp. 2822–2834, 2013.
[68] L. Loncan, L. B. De Almeida, J. M. Bioucas-Dias, X. Briottet, J. Chanussot, N. Dobigeon, S. Fabre, W. Liao, G. A. Licciardi, M. Simoes, J. Y. Tourneret, M. A. Veganzones, G. Vivone, Q. Wei, and N. Yokoya, "Hyperspectral Pansharpening: A Review," IEEE Geoscience and Remote Sensing Magazine, vol. 3, no. 3, pp. 27–46, 2015.
[69] N. Yokoya, C. Grohnfeldt, and J. Chanussot, "Hyperspectral and multispectral data fusion: A comparative review of the recent literature," IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 2, pp. 29–56, 2017.
[70] J. Duran, A. Buades, B. Coll, C. Sbert, and G. Blanchet, "A survey of pansharpening methods with a new band-decoupled variational model," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 125, pp. 78–105, 2017.
[71] L. Bungert, M. J. Ehrhardt, and R. Reisenhofer, "Robust Blind Image Fusion for Misaligned Hyperspectral Imaging Data," in Proceedings in Applied Mathematics & Mechanics, vol. 18, p. e201800033, 2018.
[72] R. Foygel Barber, E. Y. Sidky, T. Gilat Schmidt, and X. Pan, "An algorithm for constrained one-step inversion of spectral CT data," Physics in Medicine and Biology, vol. 61, no. 10, pp. 3784–3818, 2016.
[73] D. S. Rigie, A. A. Sanchez, and P. J. La Rivière, "Assessment of vectorial total variation penalties on realistic dual-energy CT data," Physics in Medicine and Biology, vol. 62, no. 8, pp. 3284–3298, 2017.
[74] D. Kazantsev, J. S. Jørgensen, M. S. Andersen, W. R. Lionheart, P. D. Lee, and P. J. Withers, "Joint image reconstruction method with correlative multi-channel prior for x-ray spectral computed tomography," Inverse Problems, vol. 34, no. 6, 2018.
[75] E. Haber and D. W. Oldenburg, "Joint Inversion: A Structural Approach," Inverse Problems, vol. 13, pp. 63–77, 1997.
[76] M. A. Meju, R. L. Mackie, F. Miorelli, A. S. Saleh, and R. V. Miller, "Structurally-tailored 3D anisotropic CSEM resistivity inversion with cross-gradients criterion and simultaneous model calibration," Geophysics, vol. 84, no. 6, pp. 1–62, 2019.
[77] L. A. Gallardo and M. A. Meju, "Structure-Coupled Multiphysics Imaging in Geophysical Sciences," Reviews of Geophysics, vol. 49, pp. 1–19, 2011.
[78] N. Deligiannis, J. F. Mota, B. Cornelis, M. R. Rodrigues, and I. Daubechies, "Multi-modal dictionary learning for image separation with application in art investigation," IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 751–764, 2017.
[79] J. P. Kaipio, V. Kolehmainen, M. Vauhkonen, and E. Somersalo, "Inverse Problems with Structural Prior Information," Inverse Problems, vol. 15, no. 3, pp. 713–729, 1999.
[80] Y. Xi, J. Zhao, J. Bennett, M. Stacy, A. Sinusas, and G. Wang, "Simultaneous CT-MRI reconstruction for constrained imaging geometries using structural coupling and compressive sensing," IEEE Transactions on Biomedical Engineering, 2015.
[81] P. Elbau, L. Mindrinos, and O. Scherzer, "Quantitative reconstructions in multi-modal photoacoustic and optical coherence tomography imaging," Inverse Problems, vol. 34, no. 1, 2018.
[82] Z. W. Di, S. Leyffer, and S. M. Wild, "Optimization-Based Approach for Joint X-Ray Fluorescence and Transmission Tomographic Inversion," SIAM Journal on Imaging Sciences, vol. 9, no. 1, pp. 1–23, 2016.
[83] R. Huber, G. Haberfehlner, M. Holler, and K. Bredies, "Total Generalized Variation regularization for multi-modal electron tomography," Nanoscale, pp. 1–38, 2019.
[84] G. Wang, J. Zhang, H. Gao, V. Weir, H. Yu, W. Cong, X. Xu, H. Shen, J. Bennett, M. Furth, Y. Wang, and M. Vannier, "Towards Omni-Tomography - Grand Fusion of Multiple Modalities for Simultaneous Interior Tomography," PLoS ONE, vol. 7, p. e39700, 2012.
[85] W. M. Wells III, P. Viola, H. Atsumi, S. Nakajima, and R. Kikinis, "Multi-Modal Volume Registration by Maximization of Mutual Information," Medical Image Analysis, vol. 1, no. 1, pp. 35–51, 1996.
[86] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, "Multimodality Image Registration by Maximization of Mutual Information," IEEE Transactions on Medical Imaging, vol. 16, pp. 187–198, 1997.
[87] J. P. W. Pluim, J. B. A. Maintz, and M. A. Viergever, "Image Registration by Maximization of Combined Mutual Information and Gradient Information," IEEE Transactions on Medical Imaging, vol. 19, no. 8, pp. 809–814, 2000.
[88] E. Haber and J. Modersitzki, "Intensity Gradient based Registration and Fusion of Multi-Modal Images," in Medical Image Computing and Computer-Assisted Intervention, vol. 46, pp. 726–733, Berlin Heidelberg: Springer-Verlag, 2006.
[89] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems. Mathematics and Its Applications, Springer, 1996.
[90] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational Methods in Imaging, vol. 167. Springer, 2008.
[91] K. Ito and B. Jin, Inverse Problems - Tikhonov Theory and Algorithms. World Scientific Publishing, 2014.
[92] K. Bredies and D. A. Lorenz, Mathematical Image Processing. Birkhäuser Basel, 1st ed., 2018.
[93] M. Benning and M. Burger, "Modern regularization methods for inverse problems," Acta Numerica, vol. 27, pp. 1–111, 2018.
[94] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear Total Variation based Noise Removal Algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1, pp. 259–268, 1992.
[95] M. Burger and S. Osher, "A Guide to the TV Zoo," in Level Set and PDE Based Reconstruction Methods in Imaging, vol. 2090 of Lecture Notes in Mathematics, pp. 1–70, Springer, 2013.
[96] K. Bredies, K. Kunisch, and T. Pock, "Total Generalized Variation," SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 492–526, 2010.
[97] K. Bredies and M. Holler, "Regularization of Linear Inverse Problems with Total Generalized Variation," Journal of Inverse and Ill-Posed Problems, vol. 22, no. 6, pp. 871–913, 2014.
[98] K. Bredies and M. Holler, "A TGV-Based Framework for Variational Image Decompression, Zooming, and Reconstruction. Part II: Numerics," SIAM Journal on Imaging Sciences, vol. 8, no. 4, pp. 2851–2886, 2015.
[99] S. R. Arridge and A. Simmons, "Multi-Spectral Probabilistic Diffusion Using Bayesian Classification," in Scale-Space Theories in Computer Vision (B. M. ter Haar Romeny, L. Florack, J. J. Koenderink, and M. A. Viergever, eds.), pp. 224–235, Berlin: Springer, 1997.
[100] M. J. Ehrhardt, Joint Reconstruction for Multi-Modality Imaging with Common Structure. PhD thesis, University College London, 2015.
[101] S. R. Arridge, M. Burger, and M. J. Ehrhardt, "Preface to special issue on joint reconstruction and multi-modality / multi-spectral imaging," Inverse Problems, vol. 36, p. 020302, 2020.
[102] U. Schmitt and A. K. Louis, "Efficient algorithms for the regularization of dynamic inverse problems: I. Theory," Inverse Problems, vol. 18, no. 3, pp. 645–658, 2002.
[103] U. Schmitt, A. K. Louis, C. Wolters, and M. Vauhkonen, "Efficient algorithms for the regularization of dynamic inverse problems: II. Applications," Inverse Problems, vol. 18, no. 3, pp. 659–676, 2002.
[104] T. Schuster, B. Hahn, and M. Burger, "Dynamic inverse problems: Modelling - Regularization - Numerics," Inverse Problems, vol. 34, no. 4, 2018.
[105] V. Estellers, J. Thiran, and X. Bresson, "Enhanced Compressed Sensing Recovery With Level Set Normals," IEEE Transactions on Image Processing, vol. 22, pp. 2611–2626, 2013.
[106] D. Kazantsev, W. R. B. Lionheart, P. J. Withers, and P. D. Lee, "Multimodal Image Reconstruction using Supplementary Structural Information in Total Variation Regularization," Sensing and Imaging, vol. 15, p. 97, 2014.
[107] J. Rasch, E.-M. Brinkmann, and M. Burger, "Joint Reconstruction via Coupled Bregman Iterations with Applications to PET-MR Imaging," Inverse Problems, vol. 34, no. 1, p. 014001, 2018.
[108] V. Estellers, S. Soatto, and X. Bresson, "Adaptive Regularization With the Structure Tensor," IEEE Transactions on Image Processing, vol. 24, no. 6, pp. 1777–1790, 2015.
[109] P. Song, X. Deng, J. F. C. Mota, N. Deligiannis, P.-L. Dragotti, and M. Rodrigues, "Multimodal Image Super-resolution via Joint Sparse Representations induced by Coupled Dictionaries," IEEE Transactions on Computational Imaging, 2019.
[110] V. Caselles, B. Coll, and J.-M. Morel, "Geometry and Color in Natural Images," Journal of Mathematical Imaging and Vision, vol. 16, pp. 89–105, 2002.
[111] R. Kimmel, R. Malladi, and N. Sochen, "Images as Embedded Maps and Minimal Surfaces: Movies, Color, Texture, and Volumetric Medical Images," International Journal of Computer Vision, vol. 39, no. 2, pp. 111–129, 2000.
[112] J. A. Fessler, I. Elbakri, P. Sukovic, and N. H. Clinthorne, "Maximum-likelihood dual-energy tomographic image reconstruction," in SPIE: Medical Imaging, vol. 4684, pp. 1–25, 2002.
[113] B. Heismann, B. Schmidt, and T. Flohr, Spectral Computed Tomography. SPIE Press, 2012.
[114] Y. Long and J. A. Fessler, "Multi-material decomposition using statistical image reconstruction for spectral CT," IEEE Transactions on Medical Imaging, vol. 33, no. 8, pp. 1614–1626, 2014.
[115] F. Lenzen and J. Berger, "Solution-Driven Adaptive Total Variation Regularization," in SSVM, pp. 203–215, 2015.
[116] M. Hintermüller and M. M. Rincon-Camacho, "Expected Absolute Value Estimators for a Spatially Adapted Regularization Parameter Choice Rule in L1-TV-based Image Restoration," Inverse Problems, vol. 26, no. 8, p. 085005, 2010.
[117] Y. Dong, M. Hintermüller, and M. M. Rincon-Camacho, "Automated Regularization Parameter Selection in Multi-Scale Total Variation Models for Image Restoration," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 82–104, 2011.
[118] K. Bredies, Y. Dong, and M. Hintermüller, "Spatially Dependent Regularization Parameter Selection in Total Generalized Variation Models for Image Restoration," International Journal of Computer Mathematics, pp. 1–15, 2012.
[119] E. Esser, X. Zhang, and T. F. Chan, "A General Framework for a Class of First Order Primal-Dual Algorithms for Convex Optimization in Imaging Science," SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 1015–1046, 2010.
[120] A. Chambolle and T. Pock, "A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2011.
[121] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. 2011.
[122] P. L. Combettes and J. C. Pesquet, "Proximal splitting methods in signal processing," Springer Optimization and Its Applications, vol. 49, pp. 185–212, 2011.
[123] N. Parikh and S. P. Boyd, "Proximal Algorithms," Foundations and Trends in Optimization, vol. 1, no. 3, pp. 123–231, 2014.
[124] A. Chambolle and T. Pock, "An Introduction to Continuous Optimization for Imaging," Acta Numerica, vol. 25, pp. 161–319, 2016.
[125] T. Pock and A. Chambolle, "Diagonal Preconditioning for First Order Primal-Dual Algorithms in Convex Optimization," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1762–1769, 2011.
[126] A. Chambolle, M. J. Ehrhardt, P. Richtárik, and C.-B. Schönlieb, "Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Applications," SIAM Journal on Optimization, vol. 28, no. 4, pp. 2783–2808, 2018.
[127] J. Adler, H. Kohr, and O. Öktem, "Operator Discretization Library (ODL)," 2017.
[128] W. van Aarle, W. J. Palenstijn, J. De Beenhouwer, T. Altantzis, S. Bals, K. J. Batenburg, and J. Sijbers, "The ASTRA Toolbox: A platform for advanced algorithm development in electron tomography," Ultramicroscopy, vol. 157, pp. 35–47, 2015.
[129] W. van Aarle, W. J. Palenstijn, J. Cant, E. Janssens, F. Bleichrodt, A. Dabravolski, J. De Beenhouwer, K. Joost Batenburg, and J. Sijbers, "Fast and flexible X-ray tomography using the ASTRA toolbox,"