Joint super-resolution image reconstruction and parameter identification in imaging operator: Analysis of bilinear operator equations, numerical solution, and application to magnetic particle imaging
Tobias Kluth∗, Christine Bathke∗, Ming Jiang†, Peter Maass∗
April 29, 2020
Abstract
One important property of imaging modalities and related applications is the resolution of image reconstructions, which relies on various factors such as instrumentation or data processing. Restrictions in resolution can have manifold origins, e.g., limited resolution of available data, noise level in the data, and/or inexact model operators. In this work we investigate a novel data processing approach suited for inexact model operators. Here, two different information sources, high-dimensional model information and high-quality measurements on a lower resolution, are combined in a hybrid approach. The joint reconstruction of a high-resolution image and of parameters of the imaging operator is obtained by minimizing a Tikhonov-type functional. The hybrid approach is analyzed for bilinear operator equations with respect to stability, convergence, and convergence rates. We further derive an algorithmic solution exploiting an algebraic reconstruction technique. The study is complemented by numerical results ranging from an academic test case to image reconstruction in magnetic particle imaging.
Keywords: mathematical imaging, hybrid models, super-resolution, joint parameter identification, magnetic particle imaging
Enhancing the resolution in image reconstructions is a never-ending challenge in medical imaging. Diagnostic quality and the potential for novel applications such as molecular or multi-modal imaging crucially depend on advances in image resolution [15, 21]. A better resolution can be achieved either by improving the measurement technology or by advances in data processing. In this paper we address the second approach in a setting which is motivated by the particular case of magnetic particle imaging (MPI), to be introduced later in this section.

For motivation we start with the general task of an inverse problem of reconstructing a two-dimensional image c : Ω → R with Ω = [0, 1]² from measured data u : D → R^d with D ⊂ R^d̃, i.e., u is a d-dimensional data set defined on a d̃-dimensional domain. Image and data are related by a measurement process A : X → Z, i.e., Ac ∼ u, for suitably defined function spaces X, Z, where X is called the image space and Z the data space. The general task in image reconstruction is to determine an approximation of c from given u and A. In all applications only a sampled and noisy version u^δ of u is measured, and a discretized reconstruction of c is sought after [27, 38, 35, 41, 39, 3].

The achievable resolution in medical image reconstruction can be limited for several reasons:

• the available data u^δ has limited resolution,
• the noise level of the data prohibits high-quality reconstructions,
• the model operator is inexact and allows only for a limited spatial accuracy in the image space.

For an overview of the achievable resolution of different medical imaging technologies see [33].

∗ Center for Industrial Mathematics, University of Bremen, 28357 Bremen, Germany ([email protected], [email protected], [email protected])
† Department of Information and Computing Sciences, School of Mathematical Sciences, Peking University, Beijing 100871, China ([email protected])
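To fix ideas, the abstract relation Ac ∼ u^δ can be written down in a small discretized sketch; all sizes, the Gaussian kernel, and the noise level below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Discretized sketch of the abstract model A c ~ u^delta: the image c lives
# on a fine grid, the data u on a coarser one.  Grid sizes, the Gaussian
# kernel, and the noise level are illustrative assumptions.
n_pix, n_data = 64, 24                   # image and data dimensions (n_data < n_pix)
xs = np.linspace(0.0, 1.0, n_pix)        # image grid
ts = np.linspace(0.0, 1.0, n_data)       # data grid

# Linear measurement operator A: each row integrates c against a kernel.
sigma = 0.05
A = np.exp(-((ts[:, None] - xs[None, :]) ** 2) / (2 * sigma**2))
A *= xs[1] - xs[0]                       # quadrature weight

c_true = ((xs > 0.3) & (xs < 0.5)).astype(float)   # piecewise-constant phantom
u_exact = A @ c_true

rng = np.random.default_rng(0)
delta = 1e-3
noise = rng.standard_normal(n_data)
u_delta = u_exact + delta * noise / np.linalg.norm(noise)   # ||u_delta - u|| = delta
print(np.linalg.norm(u_delta - u_exact))  # ~ 0.001, the noise level delta
```

Since n_data < n_pix, the discrete system is underdetermined even before noise enters, which already illustrates why a regularized reconstruction of c is needed.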
In the present paper we address a particular setting which is motivated by modeling the inversion process in magnetic particle imaging (MPI), see [30, 34]. The MPI problem is the reconstruction of an unknown distribution of nanoparticle concentration c : Ω → R inside the body from voltage measurements u : [0, T] → R^d induced by the electromagnetic field of the magnetized nanoparticles. Measurements are commonly obtained from three measurement coil units, which results in d = 3. Alternatively, the measurement signals are often transformed to Fourier space, which then results in D = N and d = 6 after separating real and imaginary parts of the data. We will use this application as motivation but consider the super-resolution problem in a general variational setting.

The basic analytical model of MPI and other image reconstruction problems [13, 35, 39] is given by a linear integral equation

    ∫_Ω s(x, t) c(x) dx = u^δ(t).    (1.1)

Here, s(x, t) is the system or point spread function. The system function s can either be determined experimentally by placing a delta probe at position x₀, i.e., c(·) = δ(· − x₀), and measuring the resulting data u, which yields s(x₀, t) = u(t), or it can be derived from first-principle physical-mathematical modeling. However, taking MPI as motivation, we encounter the situation that the experimental approach is very delicate and time consuming. Hence, the experimental approach will yield the precise system function s(x_i, t), but only for a small set of sample points and with a very coarse resolution. This model is called s_calib. On the other hand, a physical-mathematical model of s can be evaluated with arbitrary resolution; however, several models are not suitable for the purpose of image reconstruction as they neglect effects such as particle magnetization dynamics, size effects of the nanoparticles, or particle-particle interactions [28, 16, 32].
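The delta-probe calibration described above can be mimicked numerically: applying a discretization of (1.1) to a point-like concentration at x_i returns exactly the system-function samples s(x_i, ·). The particular s(x, t) below is an arbitrary smooth stand-in, not an MPI model:

```python
import numpy as np

# Sketch of the delta-probe calibration: applying the discretized integral
# operator (1.1) to a point-like concentration at x_i returns (up to the
# quadrature weight) the system-function samples s(x_i, t).
# The function s(x, t) below is an arbitrary smooth stand-in, not an MPI model.
n_x, n_t = 50, 40
xs = np.linspace(0.0, 1.0, n_x)
ts = np.linspace(0.0, 1.0, n_t)
dx = xs[1] - xs[0]

def s(x, t):                     # illustrative system function
    return np.sin(2 * np.pi * (x + t)) * np.exp(-x * t)

S = s(xs[None, :], ts[:, None])  # S[j, i] = s(x_i, t_j)

def forward(c):                  # u(t_j) = sum_i s(x_i, t_j) c(x_i) dx
    return (S * c[None, :]).sum(axis=1) * dx

i = 17
delta_probe = np.zeros(n_x)
delta_probe[i] = 1.0 / dx        # discrete delta probe at x_i
u = forward(delta_probe)
print(np.allclose(u, S[:, i]))   # -> True: the measurement is s(x_i, .)
```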
More recently, progress has been made in the development of a suitable model [31]. This model will be called s_mod.

Similar situations of having a low-quality, high-resolution and a high-quality, low-resolution model occur in several other imaging applications. For example, this is typical in bi-modal imaging [22, 2] or in molecular imaging (MALDI imaging) [42]. More specifically, we consider applications where high-quality calibration measurements describing the image-measurement relationship accurately can be performed on a rather coarse grid. Unfortunately, the improved accuracy is then accompanied by a restriction in resolution. One important question is how to connect this experimental "expert" knowledge with a model-based approach. One possible answer is to simultaneously determine parameters of the forward operator when reconstructing the desired image.

In this situation it is natural to regard s as an additional variable and to consider the bilinear inverse problem with the operator B : X × Y → Z,

    (c, s) ↦ ∫_Ω s(x, ·) c(x) dx.    (1.2)

Here Y denotes a suitable function space for modeling system functions s. In addition we need an operator P linking a high-resolution system function to a low-resolution approximation, such as a projection operator. Both Y and P will be specified in the next section.

Introducing suitable penalty functionals, the inverse problem of simultaneously reconstructing an update for s_mod and a reconstruction of c can be formulated as a Tikhonov regularization scheme defined by the functional

    J^δ_{α,β,γ,µ}(c, s) = ½‖B(c, s) − u^δ‖²_Z + (γ/2)‖s − s_mod‖²_Y + (µ/2)‖P(s) − s_calib‖²_Y + α R_c(c) + β R_s(s).    (1.3)

In contrast to the sole image reconstruction problem, the additional degree of freedom allows one to compensate potential errors in the modeled system function s_mod, but it also causes a largely underdetermined problem which requires a priori knowledge on s, the parameters of the forward operator.

For system functions satisfying s(x, t) = s(x − t) this setting is identical to the well-known problem of blind deconvolution; see [36, 4, 26] and the references therein for an overview of related regularization approaches. Indeed, [4] has partially influenced the approach of the present paper.

Also, there exists a large and somewhat complete body of literature related to general inverse problems in Hilbert and Banach space settings [19, 14, 41, 24]. [19] is of particular importance for the present paper, and to some extent one can regard the present paper as specifying their results to the functional defined in (1.3). Moreover, machine learning approaches for inverse problems have been investigated intensively in the past few years; for a review of the present state of the art see [1].

The precise mathematical setting of the super-resolution problem discussed in the present paper will be defined in the next section. We then discuss the analytic properties of the Tikhonov functional as well as the regularization properties of its minimizers in Section 3. A Kaczmarz-type algorithm minimizing the considered Tikhonov functional is presented in Section 4. Finally, we apply this approach to an academic test problem as well as to real data obtained from an MPI experiment in Section 5 and conclude with discussions in Section 6.

In the following we define the mathematical setting of the problem, formulate a variational approach for its solution, and distinguish possible perspectives on the resulting problem. Let
X, Y, Z be Hilbert spaces and let B : X × Y → Z, (c, s) ↦ B(c, s), denote a bilinear operator as defined in (1.2), where X × Y is equipped with the canonical inner product

    ⟨(c₁, s₁), (c₂, s₂)⟩_{X×Y} = ⟨c₁, c₂⟩_X + ⟨s₁, s₂⟩_Y.

The problem of interest is to obtain an approximate solution (c, s) ∈ X × Y from noisy measurements u^δ ∈ Z. As usual we assume that a physically exact solution (c^∗, s^∗) exists and that the noisy data u^δ satisfy

    ‖u^δ − u^∗‖_Z ≤ δ, where u^∗ = B(c^∗, s^∗)    (2.1)

is the true data. For given u^δ and B, the inverse problem is to determine a suitable approximation of (c^∗, s^∗).

In the setting of the present paper we address the problem of super-resolution by including two pieces of information on the system function s. We assume that we have two different approximations of the true system function s^∗: s_mod is obtained from a theoretical but incomplete model in the original infinite-dimensional (or high-dimensional) space (type A), and s_calib is obtained in a high-quality calibration procedure but on a finite-dimensional (or lower-dimensional) subspace (type B). We further distinguish two sub-cases for s_mod.

(A) Let s_mod, s_mod,ε ∈ Y. We either assume that a fixed model s_mod is given or that a model hierarchy s_mod,ε with varying accuracy ε is available:

(i) s_mod is used as a high-resolution reference model of limited accuracy for the system function. This will be used for the formulation of an additional penalty term in a Tikhonov regularization scheme.

(ii) s_mod,ε is assumed to be a high-dimensional model, which can be obtained with different levels of accuracy fulfilling ‖s_mod,ε − s^∗‖_Y ≤ ε.
This type of information is more suitable for formulating an alternative discrepancy term in a Tikhonov functional.

(B) The low-resolution, high-quality approximation s_calib of the system function s^∗ is modeled by an element of Y_n ⊂ Y, where Y_n is a finite-dimensional space (if Y is already m-dimensional with m < ∞, let n < m). Let P : Y → Y_n be a linear and bounded operator mapping onto Y_n; we assume that s_calib ∼ P(s^∗) ∈ Y_n is an almost perfect but low-dimensional observation of the true system function. This will serve as a reference for P(s) in a penalty term. From an application point of view this can be interpreted, for example, as an observation on a coarse spatial grid.

Remark 2.1.
Note that we do not explicitly require P to be a projection operator, which would imply P² = P, and ‖P‖ = 1 for an orthogonal projection. The above assumption that P is linear and bounded will be sufficient for the following theoretical analysis.

We regularize the inverse problem by minimizing a Tikhonov-type functional in (c, s) with a multi-criteria penalty term. For case A(i) we consider

    J^δ_{α,β,γ,µ}(c, s) = ½‖B(c, s) − u^δ‖²_Z + (γ/2)‖s − s_mod‖²_Y + (µ/2)‖P(s) − s_calib‖²_Y + α R_c(c) + β R_s(s)    (2.2)

with α, β, γ, µ ≥ 0. We abbreviate the discrepancy term by D((c, s), u^δ) := ½‖B(c, s) − u^δ‖²_Z and collect the penalties as (α, β, γ, µ)^t R(c, s) with

    R(c, s) = ( R_c(c), R_s(s), ½‖s − s_mod‖²_Y, ½‖P(s) − s_calib‖²_Y )^t,

where R_c : X → R₊ and R_s : Y → R₊ are proper, convex, and weakly lower semi-continuous penalty terms with respect to c and s. The choice of the regularization parameters (α, β, γ, µ) is crucial, of course, and in particular we consider the case that their ratios (β/α, γ/α, µ/α) are fixed.

For A(ii) we fix γ > 0 and consider

    J^{δ,ε}_{α,β,µ}(c, s) = ½‖B(c, s) − u^δ‖²_Z + (γ/2)‖s − s_mod,ε‖²_Y + (µ/2)‖P(s) − s_calib‖²_Y + α R_c(c) + β R_s(s),    (2.3)

where now D((c, s), (u^δ, s_mod,ε)) := ½‖B(c, s) − u^δ‖²_Z + (γ/2)‖s − s_mod,ε‖²_Y denotes the discrepancy term, the remaining penalties are collected as (α, β, µ)^t R(c, s), and α, β, µ, R_c, R_s, and R are chosen analogously to case A(i).

Remark 2.2.
The precise definition of suitable penalty terms R_c and R_s depends on the desired type of reconstruction. Typical choices are TV- [12, 7, 44] or L^p-norms with p = 1, or combinations of such norms.

Remark 2.3.
The general nature of the operator P might allow alternative strategies to obtain higher resolution, potentially without model knowledge, i.e., γ = 0. This requires carefully selected choices of P and R_s; for example, specific sampling patterns and a sparsity constraint in a DCT basis have been exploited for magnetic particle imaging [20].

In this section we discuss the basic analytic properties of the Tikhonov functionals (2.2) and (2.3) and of their minimizers. The analysis rests on the framework introduced in [19] for non-linear inverse problems in a general function space setting and on [26], which discusses bilinear operator equations in more detail. We first discuss the assumptions on the bilinear operator B needed for obtaining existence and convergence results. We then discuss the cases A(i) and A(ii) separately.

3.1 Assumptions on the operator B

As already stated, the analysis of this paper is based on the results of [19, 26] and we will use the same assumptions as introduced in these papers.
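For the integral operator that motivates this work, the kind of boundedness assumption used below, ‖B(c, s)‖_Z ≤ C‖c‖_X‖s‖_Y, can be checked numerically; with L² norms, Cauchy–Schwarz gives the constant C = 1. Grid sizes and the random test functions in this sketch are arbitrary choices:

```python
import numpy as np

# Numerical check of a boundedness assumption ||B(c,s)||_Z <= C ||c||_X ||s||_Y
# for the discretized integral operator B(c,s)(t) = int_Omega c(x) s(x,t) dx.
# With L^2 norms, Cauchy-Schwarz gives the constant C = 1.  Grid sizes and
# the random test functions are arbitrary choices.
rng = np.random.default_rng(1)
n_x, n_t = 80, 60
dx, dt = 1.0 / n_x, 1.0 / n_t

def B(c, s):                    # B(c,s)(t_j) = sum_i c(x_i) s(x_i, t_j) dx
    return s.T @ c * dx         # s has shape (n_x, n_t)

def norm_X(c): return np.sqrt(np.sum(c**2) * dx)        # L^2(Omega)
def norm_Y(s): return np.sqrt(np.sum(s**2) * dx * dt)   # L^2(Omega x I)
def norm_Z(u): return np.sqrt(np.sum(u**2) * dt)        # L^2(I)

for _ in range(100):            # the bound holds for every random pair
    c = rng.standard_normal(n_x)
    s = rng.standard_normal((n_x, n_t))
    assert norm_Z(B(c, s)) <= norm_X(c) * norm_Y(s) + 1e-9
print("bound verified with C = 1 on 100 random pairs")
```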
Assumption 3.1.
Let X, Y, Z be Hilbert spaces and let B : X × Y → Z, (c, s) ↦ B(c, s), be a bilinear operator, where X × Y is equipped with the canonical inner product ⟨(c₁, s₁), (c₂, s₂)⟩_{X×Y} = ⟨c₁, c₂⟩_X + ⟨s₁, s₂⟩_Y.

(i) There exists a constant C > 0 such that ‖B(c, s)‖_Z ≤ C‖c‖_X‖s‖_Y for all (c, s) ∈ X × Y.

(ii) B is sequentially weak-weak continuous, i.e., for any sequence {(c_k, s_k)}_{k∈N} ⊂ X × Y and (c̄, s̄) ∈ X × Y with (c_k, s_k) ⇀ (c̄, s̄) it holds that B(c_k, s_k) ⇀ B(c̄, s̄). In the following we use the slightly shorter term of a weakly continuous operator.

Remark 3.1.
For a fixed s ∈ Y the resulting imaging operator is given by A : X → Z, c ↦ B(c, s), and the imaging problem then is to compute c ∈ X for a given u^δ ∈ Z by solving

    A(c) ∼ u^δ.    (3.1)

Let s_calib ∈ Y_n ⊂ Y be obtained from a true s^∗ ∈ Y by s_calib = P(s^∗). Such a projection of a true high-resolution s^∗ may be derived in a calibration procedure. Then the resulting reduced imaging operator A_n : X → Z, c ↦ B(c, P(s^∗)) = B(c, s_calib), allows the formulation of the reduced imaging problem, i.e., finding c ∈ X for u^δ ∈ Z by solving

    A_n(c) ∼ u^δ.    (3.2)

Note that, depending on the actual definition of P and B, it might be possible to define the reduced imaging operator on subspaces X̃ ⊂ X and Z̃ ⊂ Z.

Remark 3.2.
Two simple examples for bilinear operators B are a convolution operator and a Fredholm integral operator of the first kind, where s is either the convolution kernel or a general integral kernel. The latter can be used to describe the general MPI setup: let Ω ⊂ R³ be a bounded domain and I = (0, T), 0 < T < ∞, the time interval in which a measurement is obtained. Choosing the function spaces X = L²(Ω), Y = L²(Ω × I), and Z = L²(I), the bilinear operator of interest is given by

    B(c, s)(t) = ∫_Ω c(x) s(x, t) dx,    (3.3)

which describes the relation between nanoparticle concentration c and voltage measurement B(c, s) for one receive coil unit.

Note that the subsequent theory is built on the assumption that the operator B is weakly continuous, which is a weaker assumption than, for example, the strong continuity used in [4] for a special case of our functional. Nevertheless, the function space setup in Remark 3.2, which is adapted from Example 1 in [4], is not sufficient to show weak continuity of a general B, which also implies that strong continuity does not hold. The following two lemmata show two potential adaptations of the examples in Remark 3.2 which fulfill the stated assumptions. One can either change the infinite-dimensional function space setup by utilising embedding theorems or alternatively use a finite-dimensional adaptation.

Lemma 3.1.
Let Ω ⊂ R^n and I ⊂ R^m be bounded domains. Let X = L²(Ω), Y = H^s(Ω × I), and Z = L²(I) for s > 0. Then the operator B : X × Y → Z, B(c, s)(t) = ∫_Ω c(x) s(x, t) dx, is weakly continuous.

Proof. Let {(c_k, s_k)}_{k∈N} ⊂ X × Y and (c^∗, s^∗) ∈ X × Y with (c_k, s_k) ⇀ (c^∗, s^∗). For arbitrary φ ∈ L²(I) let ψ_k(x) = ∫_I s_k(x, t) φ(t) dt and ψ^∗(x) = ∫_I s^∗(x, t) φ(t) dt. We then obtain

    ‖ψ_k − ψ^∗‖_{L²(Ω)} ≤ ‖s_k − s^∗‖_{L²(Ω×I)} ‖φ‖_{L²(I)}.    (3.4)

Weak convergence s_k ⇀ s^∗ and the compact embedding H^s(Ω × I) ↪ L²(Ω × I) imply strong convergence in L²(Ω × I). Thus ψ_k → ψ^∗ in L²(Ω). Then we consider for arbitrary φ ∈ L²(I)

    |⟨B(c_k, s_k) − B(c^∗, s^∗), φ⟩| ≤ |∫_Ω (c_k(x) − c^∗(x)) ψ_k(x) dx| + |∫_Ω ∫_I c^∗(x) φ(t) (s_k(x, t) − s^∗(x, t)) dx dt|.

As c_k ⇀ c^∗ and ψ_k → ψ^∗ in L²(Ω), the first term converges to zero. Convergence to zero of the second summand follows immediately from s_k ⇀ s^∗ and the fact that (c^∗ φ) ∈ L²(Ω × I) defines a linear functional. As this holds for any φ ∈ L²(I), this concludes the proof.

Weak continuity can also be proved in a finite-dimensional setting, which, e.g., can be justified by using a finite-dimensional approximation of a compact operator B via its singular value decomposition.

Lemma 3.2.
Let Ω ⊂ R^n and I ⊂ R^m be bounded domains, X = L²(Ω), Z = L²(I), and let {Ψ_k}_{k=1}^K ⊂ L²(Ω × I) denote a set of K orthonormal functions in L²(Ω × I). For Y = R^K and κ = (κ₁, …, κ_K)^t ∈ Y we define the operator B : X × Y → Z by B(c, κ)(t) = ∫_Ω c(x) Σ_{k=1}^K κ_k Ψ_k(x, t) dx. Then B is weakly continuous.

Proof. The assertion follows analogously to the proof of Lemma 3.1, where weak convergence with respect to Y directly implies strong convergence due to the finite dimension.

Remark 3.3.
Additionally, one may consider a nonlinear dependence of B on the system function s, assuming weak sequential closedness of the operator. Theoretical results on existence of minimizers, stability, and convergence may be derived in an analogous way. However, the results on convergence rates rely on the bilinearity of B. In the present paper we stay with the bilinear setting; the extension to a nonlinear dependence is beyond the scope of this paper and remains future work.

3.2 Basic properties

We further provide some basic properties of the functionals appearing in our problem formulation.
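Before turning to these properties, note that the functional (2.2) itself is straightforward to evaluate in a discretized setting. A minimal sketch, where the concrete choices R_c = ‖·‖₁, R_s = ½‖·‖², and P = block averaging onto a coarser grid are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of evaluating the Tikhonov functional (2.2) in a discretized
# setting.  The concrete choices R_c = ||.||_1, R_s = 0.5 ||.||^2 and
# P = block averaging onto a coarser grid are illustrative assumptions.
n_x, n_t = 32, 24

def B(c, s, dx=1.0 / 32):                     # bilinear forward operator
    return s.T @ c * dx

def P(s):                                     # average blocks of 4 fine cells
    return s.reshape(8, 4, n_t).mean(axis=1)  # (n_x, n_t) -> (8, n_t)

def J(c, s, u_delta, s_mod, s_calib, alpha, beta, gamma, mu):
    data = 0.5 * np.sum((B(c, s) - u_delta) ** 2)
    pen_mod = 0.5 * gamma * np.sum((s - s_mod) ** 2)
    pen_calib = 0.5 * mu * np.sum((P(s) - s_calib) ** 2)
    return data + pen_mod + pen_calib + alpha * np.abs(c).sum() + 0.5 * beta * np.sum(s**2)

rng = np.random.default_rng(2)
c = rng.standard_normal(n_x)
s = rng.standard_normal((n_x, n_t))
s_mod = s + 0.1 * rng.standard_normal((n_x, n_t))   # inexact model reference
s_calib = P(s)                                      # consistent calibration data
u_delta = B(c, s)                                   # exact data, for illustration
# With exact data and s_calib = P(s), only the s_mod penalty remains:
print(J(c, s, u_delta, s_mod, s_calib, alpha=0.0, beta=0.0, gamma=1.0, mu=1.0)
      == 0.5 * np.sum((s - s_mod) ** 2))            # -> True
```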
Lemma 3.3.
Let P : Y → Y_n ⊂ Y be a linear and bounded operator, P ≠ 0, and s_calib ∈ Y_n. Then the functional T : Y → [0, ∞],

    T(s) = ½‖P(s) − s_calib‖²_Y,    (3.5)

is proper, convex, and weakly lower semi-continuous.

See Appendix A for a proof of this lemma. In general, an ill-posed inverse problem may have multiple solutions; hence we introduce the usual concept of an R_c-minimizing solution.

Definition 3.1.
For s^∗ ∈ Y and u^∗ ∈ range(B) ⊂ Z, an element c† ∈ X is called an R_c-minimizing solution if

    c† = arg min_c { R_c(c) | B(c, s^∗) = u^∗ }.    (3.6)

In the next lemma we state the derivative of a bilinear operator in a Hilbert space setting and an inequality to be used later.

Lemma 3.4.
For a bilinear operator B : X × Y → Z with ‖B(c, s)‖_Z ≤ C‖c‖_X‖s‖_Y, the Fréchet derivative at (c, s) ∈ X × Y is given by

    B′(c, s)(x, y) = B(c, y) + B(x, s)    (3.7)

for any (x, y) ∈ X × Y. The residual of the first-degree Taylor expansion satisfies

    ‖B(c + x, s + y) − B(c, s) − B′(c, s)(x, y)‖_Z ≤ C‖(x, y)‖²_{X×Y}.    (3.8)

Again, this result follows from standard arguments in functional analysis. For completeness a proof is included in Appendix A.

3.3 Setup A(i)

In this setting we exploit a high-resolution reference s_mod to formulate an additional penalty term for the system function s. We thus consider the functional

    J^δ_{α,β,γ,µ}(c, s) = ½‖B(c, s) − u^δ‖²_Z + (γ/2)‖s − s_mod‖²_Y + (µ/2)‖P(s) − s_calib‖²_Y + α R_c(c) + β R_s(s),    (3.9)

with the same abbreviations D and R as in (2.2), including four regularization parameters (α, β, γ, µ). For fixed ratios of the regularization parameters, i.e., ν₁ = β/α, ν₂ = γ/α, and ν₃ = µ/α, the desired regularization properties follow immediately from the general theory in [19]. In the following theorem we consider a slightly more general setting, where these ratios are only obtained asymptotically.

Theorem 3.1.
Let Assumption 3.1 be fulfilled. Let R_c : X → R₊ and R_s : Y → R₊ be proper, convex, and weakly lower semi-continuous. Let P : Y → Y_n ⊂ Y be a linear and bounded operator. Then the following holds:

(i) The functional J^δ_{α,β,γ,µ} as defined in (3.9) has a minimizer.

(ii) (Continuity for fixed regularization parameters) Let the regularization parameters α, β, γ, µ > 0 be fixed. Consider a sequence (u^{δ_j})_{j∈N} with u^{δ_j} → u^δ. Let (c_j, s_j) denote a minimizer of J^{δ_j}_{α,β,γ,µ} with noisy data u^{δ_j}. Then there exists a weakly convergent subsequence of (c_j, s_j), and the limit of every weakly convergent subsequence is a minimizer (c̃, s̃) of the functional J^δ_{α,β,γ,µ}. Moreover, for each weakly convergent subsequence (c_i, s_i),

    (α, β, γ, µ)^t R(c_i, s_i) → (α, β, γ, µ)^t R(c̃, s̃).

(iii) (Convergence for diminishing regularization parameters) Assume that a data sequence (u^{δ_j})_j with ‖u^{δ_j} − u^∗‖ ≤ δ_j for δ_j → 0 is given. Assume that the regularization parameters are chosen according to the noise level such that α_j = α(δ_j), β_j = β(δ_j), γ_j = γ(δ_j), and µ_j = µ(δ_j) are monotonically decreasing and fulfill α_j → 0, β_j → 0, γ_j → 0, and µ_j → 0. Further assume that

    lim_{j→∞} δ_j²/α_j = 0,  lim_{j→∞} β_j/α_j = ν₁,  lim_{j→∞} γ_j/α_j = ν₂,  and  lim_{j→∞} µ_j/α_j = ν₃    (3.10)

hold for some 0 < ν₁, ν₂, ν₃ < ∞ such that ν₁ ≤ β_j/α_j, ν₃ ≤ µ_j/α_j, and γ_j/α_j ≤ ν₂. Let

    (c_j, s_j)_j := ( c^{δ_j}_{α_j,β_j,γ_j,µ_j}, s^{δ_j}_{α_j,β_j,γ_j,µ_j} )_j    (3.11)

denote the minimizing sequence of (3.9) obtained from noisy u^{δ_j}. Then there exists a weakly convergent subsequence of (c_j, s_j)_j. The limit of every weakly convergent subsequence of (c_j, s_j)_j is a (1, ν₁, ν₂, ν₃)^t R-minimizing solution.

Proof. Assertions (i) and (ii) immediately follow from [19, Theorems 3.1, 3.2]. For (iii) we exploit that α_j (1, ν₁, ν₂, ν₃)^t R(c, s) ≤ (α_j, β_j, γ_j, µ_j)^t R(c, s) for any (c, s) ∈ X × Y. We obtain from

    ½‖B(c_j, s_j) − u^{δ_j}‖²_Z + (α_j, β_j, γ_j, µ_j)^t R(c_j, s_j) ≤ ½δ_j² + (α_j, β_j, γ_j, µ_j)^t R(c^∗, s^∗)    (3.12)

that lim_{j→∞} ‖B(c_j, s_j) − u^{δ_j}‖_Z = 0 and (1, ν₁, ν₂, ν₃)^t R(c_j, s_j) ≤ δ_j²/(2α_j) + α_j⁻¹ (α_j, β_j, γ_j, µ_j)^t R(c^∗, s^∗). This implies

    lim sup_{j→∞} (1, ν₁, ν₂, ν₃)^t R(c_j, s_j) ≤ lim sup_{j→∞} α_j⁻¹ (α_j, β_j, γ_j, µ_j)^t R(c^∗, s^∗) ≤ (1, ν₁, ν₂, ν₃)^t R(c^∗, s^∗).

We thus obtain

    lim sup_{j→∞} ( ½‖B(c_j, s_j) − u^{δ_j}‖²_Z + α (1, ν₁, ν₂, ν₃)^t R(c_j, s_j) )
    ≤ lim sup_{j→∞} ( ½‖B(c_j, s_j) − u^{δ_j}‖²_Z + α_j (1, ν₁, ν₂, ν₃)^t R(c_j, s_j) ) + lim sup_{j→∞} ( (α − α_j)(1, ν₁, ν₂, ν₃)^t R(c_j, s_j) )
    ≤ α (1, ν₁, ν₂, ν₃)^t R(c^∗, s^∗) < ∞.

The assertion (iii) then follows analogously by the remaining steps in the proof of [19, Theorem 3.5].

A first convergence rate result can be obtained by making the following general assumption.
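A concrete parameter-choice rule satisfying the conditions in part (iii) of the theorem above can be sketched numerically; the exponent 1/2 and the ratio values ν₁, ν₂, ν₃ below are illustrative choices, not prescribed by the theory:

```python
import numpy as np

# Parameter choice for vanishing noise: alpha_j = sqrt(delta_j) gives
# alpha_j -> 0 and delta_j**2 / alpha_j -> 0, while beta_j, gamma_j, mu_j
# are tied to alpha_j by fixed ratios (illustrative values nu1, nu2, nu3).
nu1, nu2, nu3 = 2.0, 0.5, 1.0
delta = 2.0 ** -np.arange(1, 30)         # noise levels delta_j -> 0
alpha = np.sqrt(delta)                   # regularization parameters alpha_j -> 0
beta, gamma, mu = nu1 * alpha, nu2 * alpha, nu3 * alpha

assert np.all(np.diff(alpha) < 0)        # monotonically decreasing
print(delta[-1] ** 2 / alpha[-1])        # small: delta_j^2 / alpha_j vanishes
print(beta[-1] / alpha[-1], gamma[-1] / alpha[-1], mu[-1] / alpha[-1])
```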
Assumption 3.2.
Let Assumption 3.1 be fulfilled. Further assume:

(i) There exists a (1, ν₁, ν₂, ν₃)^t R-minimizing solution (c^∗, s^∗) ∈ X × Y.

(ii) There exist constants κ₁ ∈ [0, 1), κ₂ ≥ 0, and a subgradient (ξ_{c^∗}, ξ_{s^∗}) ∈ ∂(1, ν₁, ν₂, ν₃)^t R(c^∗, s^∗) such that

    ⟨(ξ_{c^∗}, ξ_{s^∗}), (c^∗ − c, s^∗ − s)⟩ ≤ κ₁ D^{(ξ_{c^∗}, ξ_{s^∗})}_{(1,ν₁,ν₂,ν₃)^t R}((c, s), (c^∗, s^∗)) + κ₂ ‖B(c, s) − B(c^∗, s^∗)‖    (3.13)

for all (c, s) ∈ { (c, s) ∈ X × Y | J^δ_{α_max, α_max ν₁, α_max ν₂, α_max ν₃}(c, s) ≤ M } and M > α_max ( (1, ν₁, ν₂, ν₃)^t R(c^∗, s^∗) + δ²/α ) for given 0 < δ, α < ∞, α ≤ α_max.

These are the assumptions required for [19, Theorem 4.4], and we directly obtain the following result on convergence rates.
Theorem 3.2.
Let u^δ ∈ Z with ‖u^∗ − u^δ‖_Z ≤ δ. Let Assumption 3.2 be fulfilled for all (δ, α) tuples defined below. For 0 < α ≤ α_max < ∞, β = ν₁α, γ = ν₂α, and µ = ν₃α, the minimizer of the functional J^δ_{α,β,γ,µ} as defined in (3.9) is denoted by (c_α, s_α). Further assume α ∼ δ. Then it holds that

    D^{(ξ_{c^∗}, ξ_{s^∗})}_{(1,ν₁,ν₂,ν₃)^t R}((c_α, s_α), (c^∗, s^∗)) = O(δ)  and  ‖B(c_α, s_α) − B(c^∗, s^∗)‖ = O(δ).    (3.14)

3.4 Setup A(ii)

In this setting we consider a high-resolution approximation s_mod,ε of the true but unknown system function s^∗ satisfying ‖s_mod,ε − s^∗‖_Y ≤ ε. As already discussed in the previous section, one can consider s_mod,ε as additional data in this setting. Hence, we fix γ > 0 and consider

    J^{δ,ε}_{α,β,µ}(c, s) = ½‖B(c, s) − u^δ‖²_Z + (γ/2)‖s − s_mod,ε‖²_Y + (µ/2)‖P(s) − s_calib‖²_Y + α R_c(c) + β R_s(s),    (3.15)

with the same abbreviations D and R as in (2.3). The multi-criteria penalty term involving three regularization parameters (α, β, µ) can be reduced to the usual single-parameter setting by fixing the ratios ν₁ = β/α and ν₂ = µ/α. As in the previous section, this will allow us to use the available regularization results directly and will be utilized later in this section. However, we start with an adaptation which allows slightly more freedom in the choice of the regularization parameters.

Theorem 3.3.
Let Assumption 3.1 be fulfilled. Let R_c : X → R₊ and R_s : Y → R₊ be proper, convex, and weakly lower semi-continuous. Let P : Y → Y_n ⊂ Y be a linear and bounded operator. Then the following holds:

(i) The functional J^{δ,ε}_{α,β,µ} as defined in (3.15) has a minimizer.

(ii) (Continuity for fixed regularization parameters) Let the regularization parameters α, β, µ > 0 be fixed. Consider sequences (u^{δ_j})_{j∈N} and (s^{ε_j})_{j∈N} with u^{δ_j} → u^δ and s^{ε_j} → s_mod,ε. Let (c_j, s_j) denote a minimizer of J^{δ_j,ε_j}_{α,β,µ} with noisy data (u^{δ_j}, s^{ε_j}). Then there exists a weakly convergent subsequence of (c_j, s_j), and the limit of every weakly convergent subsequence is a minimizer (c̃, s̃) of the functional J^{δ,ε}_{α,β,µ}. Moreover, for each weakly convergent subsequence (c_j, s_j) we have (α, β, µ)^t R(c_j, s_j) → (α, β, µ)^t R(c̃, s̃) and J^{δ_j,ε_j}_{α,β,µ}(c_j, s_j) → J^{δ,ε}_{α,β,µ}(c̃, s̃).

(iii) (Convergence for vanishing noise levels) Assume that data sequences (u^{δ_j})_j and (s^{ε_j})_j with ‖u^{δ_j} − u^∗‖ ≤ δ_j and ‖s^{ε_j} − s^∗‖ ≤ ε_j for δ_j → 0 and ε_j → 0 are given. Assume that the regularization parameters are chosen according to the noise level such that α_j = α(δ_j, ε_j), β_j = β(δ_j, ε_j), and µ_j = µ(δ_j, ε_j) are monotonically decreasing and fulfill α_j → 0, β_j → 0, and µ_j → 0. Further assume that

    lim_{j→∞} (δ_j² + γε_j²)/α_j = 0,  lim_{j→∞} β_j/α_j = ν₁,  and  lim_{j→∞} µ_j/α_j = ν₂    (3.16)

for some 0 < ν₁, ν₂ < ∞ such that ν₁ ≤ β_j/α_j and ν₂ ≤ µ_j/α_j. Let

    (c_j, s_j)_j := ( c^{δ_j,ε_j}_{α_j,β_j,µ_j}, s^{δ_j,ε_j}_{α_j,β_j,µ_j} )_j    (3.17)

denote the minimizing sequence of (3.15) obtained from data u^{δ_j} and s^{ε_j}. Then there exists a weakly convergent subsequence of (c_j, s_j)_j with s_j ⇀ s^∗, and the limit of every weakly convergent subsequence of (c_j)_j is an R_c-minimizing solution.

The proof requires minor adaptations of the general approach; see Appendix B. The specific nature of the problem further allows the derivation of a stronger property of the minimizing sequence, as can be seen in the following corollary.
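For intuition about how minimizers of (3.15) can be computed, consider a simplified alternating scheme in which all penalties are chosen quadratic, so that each half-step is a linear least-squares problem with a closed-form solution. This is an illustrative stand-in under these assumptions, not the Kaczmarz-type method the paper develops in Section 4:

```python
import numpy as np

# Simplified alternating scheme for a functional of the form (3.15), with all
# penalties chosen quadratic (an illustrative choice, not the Kaczmarz-type
# method developed in Section 4): each half-step is then a linear
# least-squares problem with a closed-form normal-equation solution.
rng = np.random.default_rng(3)
n_x, n_t, n_c = 16, 12, 4
dx = 1.0 / n_x
Pmat = np.kron(np.eye(n_c), np.full((1, n_x // n_c), n_c / n_x))  # block average

c_true = rng.random(n_x)
s_true = rng.standard_normal((n_x, n_t))
u = s_true.T @ c_true * dx                  # exact data u* = B(c*, s*)
s_mod = s_true + 0.05 * rng.standard_normal((n_x, n_t))  # inexact model
s_cal = Pmat @ s_true                       # high-quality coarse calibration

alpha, beta, gamma, mu = 1e-8, 1e-8, 1e-2, 1.0
c, s = np.ones(n_x), s_mod.copy()
for _ in range(50):
    # s-step: solve [dx^2 c c^T + (gamma+beta) I + mu P^T P] s_j = rhs_j for all t_j
    M = dx**2 * np.outer(c, c) + (gamma + beta) * np.eye(n_x) + mu * Pmat.T @ Pmat
    s = np.linalg.solve(M, dx * np.outer(c, u) + gamma * s_mod + mu * Pmat.T @ s_cal)
    # c-step: minimize 0.5 ||s^T c dx - u||^2 + 0.5 alpha ||c||^2
    c = np.linalg.solve(dx**2 * s @ s.T + alpha * np.eye(n_x), dx * s @ u)

print(np.linalg.norm(s.T @ c * dx - u))     # data residual: small after alternation
```

Since each half-step minimizes the functional exactly in one block of variables, the functional value is monotonically non-increasing along the iteration, which keeps the sketch stable.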
Corollary 3.1.
In Theorem 3.3(ii), (s_j)_j has a strongly convergent subsequence (s_k)_k, and in (iii) the sequence (s_j)_j converges strongly, i.e., Theorem 3.3(ii) implies s_k → s̃ and Theorem 3.3(iii) implies s_j → s^∗.

Proof. We first consider the situation of Theorem 3.3(ii). This implies the existence of a subsequence such that (s_j − s_mod,ε) ⇀ (s̃ − s_mod,ε). This weakly convergent sequence also converges in norm if we can show that ‖s_j − s_mod,ε‖ → ‖s̃ − s_mod,ε‖. The weak lower semi-continuity of the norm further implies that norm convergence, s_j − s_mod,ε → s̃ − s_mod,ε, is equivalent to

    ‖s̃ − s_mod,ε‖ ≥ lim sup_j ‖s_j − s_mod,ε‖.

We observe that for fixed γ the minimization property of (c_j, s_j), i.e., J^{δ_j,ε_j}_{α,β,µ}(c_j, s_j) ≤ J^{δ_j,ε_j}_{α,β,µ}(c̃, s̃), implies the boundedness of ‖s_j − s_mod,ε‖.

Now assume that s_j does not converge in norm to s̃, i.e., there exists a τ such that

    τ := lim sup_j ‖s_j − s_mod,ε‖ > ‖s̃ − s_mod,ε‖.    (3.18)

Since τ is a limit point of the sequence ‖s_j − s_mod,ε‖, there exists a subsequence (s_k)_{k∈N} such that (s_k − s_mod,ε) ⇀ (s̃ − s_mod,ε) and ‖s_k − s_mod,ε‖ → τ. Using the triangle inequality we obtain

    ‖s_k − s_mod,ε‖ − ‖s^{ε_k} − s_mod,ε‖ ≤ ‖s_k − s^{ε_k}‖ ≤ ‖s_k − s_mod,ε‖ + ‖s^{ε_k} − s_mod,ε‖    (3.19)

and hence we conclude

    lim_{k→∞} ‖s_k − s^{ε_k}‖ = lim_{k→∞} ‖s_k − s_mod,ε‖ = τ.    (3.20)

Furthermore, the weak-weak continuity of B and u^{δ_k} → u^δ imply B(c_k, s_k) − u^{δ_k} ⇀ B(c̃, s̃) − u^δ. We now employ the lower semi-continuity of the norm as well as the convergence J^{δ_j,ε_j}_{α,β,µ}(c_j, s_j) → J^{δ,ε}_{α,β,µ}(c̃, s̃), see Theorem 3.3(ii), and obtain

    ½‖B(c̃, s̃) − u^δ‖² ≤ lim inf ½‖B(c_k, s_k) − u^{δ_k}‖²    (3.21)
    = lim inf { J^{δ_k,ε_k}_{α,β,µ}(c_k, s_k) − (γ/2)‖s_k − s^{ε_k}‖² − (α, β, µ)^t R(c_k, s_k) }    (3.22)
    = lim inf J^{δ_k,ε_k}_{α,β,µ}(c_k, s_k) − (γ/2)τ² − (α, β, µ)^t R(c̃, s̃)    (3.23)
    = J^{δ,ε}_{α,β,µ}(c̃, s̃) − (γ/2)τ² − (α, β, µ)^t R(c̃, s̃)    (3.24)
    < ½‖B(c̃, s̃) − u^δ‖²,    (3.25)

which contradicts the assumption (3.18).

We now consider case (iii) of the previous theorem. The pair (c_j, s_j) is a minimizer of the functional J^{δ_j,ε_j}_{α_j,β_j,µ_j}, hence

    0 ≤ J^{δ_j,ε_j}_{α_j,β_j,µ_j}(c_j, s_j) ≤ J^{δ_j,ε_j}_{α_j,β_j,µ_j}(c^∗, s^∗)
    = ½‖B(c^∗, s^∗) − u^{δ_j}‖² + (γ/2)‖s^∗ − s^{ε_j}‖² + (µ_j/2)‖P(s^∗) − s_calib‖² + α_j R_c(c^∗) + β_j R_s(s^∗)
    ≤ ½(δ_j² + γε_j²) + (µ_j/2)‖P(s^∗) − s_calib‖² + α_j R_c(c^∗) + β_j R_s(s^∗) → 0,    (3.26)

where the convergence of the right-hand side follows from the parameter choice α_j, β_j, µ_j → 0 for δ_j, ε_j → 0. This also implies that

    lim_{j→∞} ½‖B(c_j, s_j) − u^{δ_j}‖² + (γ/2)‖s_j − s^{ε_j}‖² = 0.

Both terms are non-negative and γ > 0, so lim_{j→∞} ‖s_j − s^{ε_j}‖ = 0. Now, lim_{j→∞} ‖s^{ε_j} − s^∗‖ = 0 implies s_j → s^∗.

Remark 3.4.
0. This also implies that

  lim_{j→∞} ½‖B(c_j, s_j) − u^{δ_j}‖² + γ‖s_j − s^{ε_j}‖² = 0.

Both terms are non-negative and γ > 0, hence lim_{j→∞} ‖s_j − s^{ε_j}‖ = 0. Now, lim_{j→∞} ‖s^{ε_j} − s*‖ = 0 implies s_j → s*.

Remark 3.4. Strong convergence for c_j in Theorem 3.3 can be proven for certain choices of R_c. We will prove this for the case of sparsity-promoting penalty terms in the next subsection.

Convergence rates for sparsity-promoting penalty terms

We now prove convergence rates for sparsity-promoting penalty terms, see [10, 24, 25],

  Φ_p(c) = Σ_i w_i |⟨c, φ_i⟩|^p  (3.27)

with weights 0 < w_min ≤ w_i < ∞, 1 ≤ p ≤ 2, and an orthonormal basis {φ_i}_{i∈ℕ} ⊂ X. In the remainder we prove two results, which use different source conditions and are applicable for certain ranges of p. Without loss of generality we assume w_min = 1. The first result holds for 1 < p ≤ 2.

Assumption 3.3.
Let Assumption 3.1 be fulfilled. In (2.3) let

  (α, β, μ)ᵗ R(c, s) = α R̃(c, s) := α ( Φ_p(c) + ν₁‖P(s) − s_calib‖²_Y + ν₂ R_s(s) )

for 0 < ν₁, ν₂ < ∞ (i.e., β = ν₂α, μ = ν₁α). For c* ∈ X and s* ∈ Y we assume:

(i) Source condition: There exists an ω ∈ Z such that

  (ξ_{c*}, ξ_{s*}) = B′(c*, s*)* ω  (3.28)

with (ξ_{c*}, ξ_{s*}) ∈ ∂R̃(c*, s*).

(ii) Smallness assumption: For C from Assumption 3.1 and γ, α in (2.3) it holds

  C‖ω‖ < min{ 1, γ/(2α) }.  (3.29)

Again, the following proofs require only minor adaptations of the general theory. We start by computing the Bregman distance of the penalty R̃ from the previous assumption after the following remark.

Remark 3.5.
In the product space setting, for (ξ_{c*}, ξ_{s*}) ∈ ∂R̃(c*, s*) the Bregman distance is defined as

  D^{(ξ_{c*},ξ_{s*})}_{R̃}((c, s), (c*, s*)) = R̃(c, s) − R̃(c*, s*) − ⟨(ξ_{c*}, ξ_{s*}), (c, s) − (c*, s*)⟩.  (3.30)

Computing the Bregman distances term by term yields the following corollary.

Corollary 3.2.
For 1 ≤ p ≤ 2 the Bregman distance of R̃ is given by

  D^{(ξ_{c*},ξ_{s*})}_{R̃}((c, s), (c*, s*)) = D^{ξ_{c*}}_{Φ_p}(c, c*) + ν₂ D^{ζ_{s*}}_{R_s}(s, s*) + ν₁‖P(s − s*)‖²  (3.31)

with ζ_{s*} ∈ ν₂ ∂R_s(s*), where ξ_{s*} = ζ_{s*} + 2ν₁ P*(Ps* − s_calib). We thus obtain the following result for the convergence rates.
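Before turning to the convergence rates, the sparsity part of this decomposition can be checked numerically. The following sketch is an illustration only (it assumes the canonical basis of Rⁿ for {φ_i} and unit weights w_i = 1, consistent with w_min = 1) and evaluates Φ_p together with the Bregman distance D^{ξ_{c*}}_{Φ_p} for p = 1.5:

```python
import math

def phi_p(c, p=1.5, w=None):
    # Weighted sparsity penalty Phi_p(c) = sum_i w_i |<c, phi_i>|^p,
    # here with the canonical basis, so <c, phi_i> = c[i].
    w = w or [1.0] * len(c)
    return sum(wi * abs(ci) ** p for wi, ci in zip(w, c))

def subgrad_phi_p(c_star, p=1.5, w=None):
    # (Sub)gradient of Phi_p at c_star (single-valued for p > 1):
    # xi_i = w_i * p * |c*_i|^(p-1) * sgn(c*_i)
    w = w or [1.0] * len(c_star)
    return [wi * p * abs(ci) ** (p - 1) * math.copysign(1.0, ci) if ci != 0 else 0.0
            for wi, ci in zip(w, c_star)]

def bregman_phi_p(c, c_star, p=1.5, w=None):
    # D(c, c*) = Phi_p(c) - Phi_p(c*) - <xi, c - c*>
    xi = subgrad_phi_p(c_star, p, w)
    inner = sum(x * (a - b) for x, a, b in zip(xi, c, c_star))
    return phi_p(c, p, w) - phi_p(c_star, p, w) - inner

c_star = [1.0, -2.0, 0.5]
c = [0.3, -1.0, 2.0]
print(bregman_phi_p(c, c_star) >= 0.0)                  # True: Bregman distances are non-negative
print(abs(bregman_phi_p(c_star, c_star)) < 1e-12)       # True: D(c*, c*) = 0
```

For p = 2 and unit weights, Φ_p reduces to the squared Hilbert space norm, matching the quadratic-penalty remark after Theorem 3.4.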
Theorem 3.4 (1 < p ≤ 2). Let u^δ ∈ Z with ‖B(c*, s*) − u^δ‖ ≤ δ and ‖s* − s_{mod,ε}‖ ≤ ε. Let 1 < p ≤ 2 and let c* be a Φ_p-minimizing solution. Furthermore, let Assumption 3.3 be fulfilled; in particular we assume that α_max is chosen such that Assumption 3.3(ii) is fulfilled for all α with 0 < α ≤ α_max < ∞. For β = ν₂α and μ = ν₁α the minimizer of the functional J^{δ,ε}_{α,β,μ} as defined in (2.3) is denoted by (c_α, s_α). Then, with α ∼ δ + ε we have the convergence rates

  ‖B(c_α, s_α) − B(c*, s*)‖ = O(δ + ε)  and  D^{ξ*}_{R̃}((c_α, s_α), (c*, s*)) = O(δ + ε).  (3.32)

Proof.
We follow the general outline of [24] for proving convergence rates for sparsity-constrained Tikhonov functionals. First, from the minimizing property of (c_α, s_α) we have

  J^{δ,ε}_{α,ν₂α,ν₁α}(c_α, s_α) ≤ J^{δ,ε}_{α,ν₂α,ν₁α}(c*, s*) ≤ ½δ² + γε² + α R̃(c*, s*).

Rewriting R̃(c_α, s_α) via the Bregman distance with subgradient ξ* = (ξ_{c*}, ξ_{s*}) ∈ ∂R̃(c*, s*) allows to rewrite the above estimate as

  ½‖B(c_α, s_α) − u^δ‖² + γ‖s_α − s_{mod,ε}‖² ≤ ½δ² + γε² − α( D^{ξ*}_{R̃}((c_α, s_α), (c*, s*)) + ⟨(ξ_{c*}, ξ_{s*}), (c_α, s_α) − (c*, s*)⟩ ).  (3.33)

Second, from the parallelogram law we obtain

  (γ/2)‖s_α − s*‖² ≤ γ‖s_α − s_{mod,ε}‖² + γε²  and  ¼‖B(c_α, s_α) − B(c*, s*)‖² ≤ ½‖B(c_α, s_α) − u^δ‖² + ½δ²,

which yields in combination with (3.33) the following estimate

  ¼‖B(c_α, s_α) − B(c*, s*)‖² + (γ/2)‖s_α − s*‖² ≤ δ² + 2γε² − α( D^{ξ*}_{R̃}((c_α, s_α), (c*, s*)) + ⟨(ξ_{c*}, ξ_{s*}), (c_α, s_α) − (c*, s*)⟩ ).  (3.34)

Third, we exploit the source condition. We define r := B(c_α, s_α) − B(c*, s*) − B′(c*, s*)((c_α, s_α) − (c*, s*)) as the residual of the first degree Taylor expansion. B is a bilinear operator, hence, with C as in Assumption 3.1 we have ‖r‖ ≤ C‖(c*, s*) − (c_α, s_α)‖². Then, using the source condition (Assumption 3.3(i)), we can estimate the last term of the above inequality as

  −⟨(ξ_{c*}, ξ_{s*}), (c_α, s_α) − (c*, s*)⟩ = −⟨B′(c*, s*)* ω, (c_α, s_α) − (c*, s*)⟩
    = −⟨ω, B′(c*, s*)((c_α, s_α) − (c*, s*))⟩
    = ⟨ω, B(c*, s*) − B(c_α, s_α) + r⟩
    ≤ ‖ω‖‖B(c_α, s_α) − B(c*, s*)‖ + C‖ω‖‖(c_α, s_α) − (c*, s*)‖²
    = ‖ω‖‖B(c_α, s_α) − B(c*, s*)‖ + C‖ω‖‖c_α − c*‖² + C‖ω‖‖s_α − s*‖²
    ≤ ‖ω‖‖B(c_α, s_α) − B(c*, s*)‖ + C‖ω‖ D^{ξ_{c*}}_{ℓ_p}(c_α, c*) + C‖ω‖‖s_α − s*‖²,

where D^{ξ_{c*}}_{ℓ_p}(c_α, c*) ≤ D^{ξ_{c*}}_{Φ_p}(c_α, c*). The estimate in the last step follows from the 2-convexity of the ℓ_p-spaces for 1 < p ≤ 2, with the resulting constant absorbed into C. Inserting this into (3.34) yields

  ¼‖B(c_α, s_α) − B(c*, s*)‖² + α D^{ξ*}_{R̃}((c_α, s_α), (c*, s*))
    ≤ δ² + 2γε² − (γ/2)‖s_α − s*‖² + α‖ω‖‖B(c_α, s_α) − B(c*, s*)‖ + αC‖ω‖ D^{ξ_{c*}}_{Φ_p}(c_α, c*) + αC‖ω‖‖s_α − s*‖²
    = δ² + 2γε² + α‖ω‖‖B(c_α, s_α) − B(c*, s*)‖ + ½(2αC‖ω‖ − γ)‖s_α − s*‖² + αC‖ω‖ D^{ξ_{c*}}_{Φ_p}(c_α, c*).

By using the Bregman distance (3.31) we obtain

  ¼‖B(c_α, s_α) − B(c*, s*)‖² + α(1 − C‖ω‖) D^{ξ_{c*}}_{Φ_p}(c_α, c*) + αν₂ D^{ζ_{s*}}_{R_s}(s_α, s*) + αν₁‖P(s_α − s*)‖²
    ≤ δ² + 2γε² + α‖ω‖‖B(c_α, s_α) − B(c*, s*)‖ + ½(2αC‖ω‖ − γ)‖s_α − s*‖²
    ≤ δ² + 2γε² + α‖ω‖‖B(c_α, s_α) − B(c*, s*)‖,  (3.35)

where the last step follows from the smallness assumption (Assumption 3.3(ii)). This smallness assumption also tells us that (1 − C‖ω‖) is positive. As the Bregman distance is non-negative, we can deduce

  ¼‖B(c_α, s_α) − B(c*, s*)‖² − α‖ω‖‖B(c_α, s_α) − B(c*, s*)‖ − (δ² + 2γε²) ≤ 0.

This is a quadratic inequality in a non-negative argument, hence

  ‖B(c_α, s_α) − B(c*, s*)‖ ≤ 2α‖ω‖ + 2√(α²‖ω‖² + δ² + 2γε²).  (3.36)

In the fourth step we use the parameter choice rule α ∼ δ + ε, i.e., α ≤ M(δ + ε) with some constant M > 0, such that we can conclude

  ‖B(c_α, s_α) − B(c*, s*)‖ ≤ 2M(δ + ε)‖ω‖ + 2√(M²(δ + ε)²‖ω‖² + δ² + 2γε²) ≤ (δ + ε)( 2M‖ω‖ + 2√(M²‖ω‖² + 1 + 2γ) )

and thus ‖B(c_α, s_α) − B(c*, s*)‖ = O(δ + ε).

Finally, in the last step, in order to get the convergence rate for the Bregman distance we first need to estimate the single terms on the left-hand side of (3.35), which are all non-negative. Using (3.36) we derive

  D^{ξ_{c*}}_{ℓ_p}(c_α, c*) ≤ D^{ξ_{c*}}_{Φ_p}(c_α, c*) ≤ (1 − C‖ω‖)^{−1} ( (δ² + 2γε²)/α + 2α‖ω‖² + 2‖ω‖√(α²‖ω‖² + δ² + 2γε²) )

and

  ν₂ D^{ζ_{s*}}_{R_s}(s_α, s*) ≤ (δ² + 2γε²)/α + 2α‖ω‖² + 2‖ω‖√(α²‖ω‖² + δ² + 2γε²)

and

  ν₁‖P(s_α − s*)‖² ≤ (δ² + 2γε²)/α + 2α‖ω‖² + 2‖ω‖√(α²‖ω‖² + δ² + 2γε²).

By using the parameter choice rule α ∼ δ + ε and estimating both δ and ε by (δ + ε) again, we get that all three terms above are of order δ + ε. Hence, in total we have the convergence rate

  D^{ξ*}_{R̃}((c_α, s_α), (c*, s*)) = O(δ + ε).

For p = 2 this also yields the convergence rates for a quadratic penalty term. The following corollary shows the convergence in the Hilbert space norm.

Corollary 3.3.
From Theorem 3.4 a convergence rate of ‖c_α − c*‖ = O(√(δ + ε)) follows directly.

Proof. Since D^{ξ*}_{R̃} is the sum of non-negative Bregman distances including D^{ξ_{c*}}_{Φ_p}, we have

  D^{ξ_{c*}}_{Φ_p}(c_α, c*) = O(δ + ε).  (3.37)

We continue to estimate the Bregman distance by the Hilbert space norm using the definition of the subgradient for the sparsity term and the 2-convexity of the ℓ_p-spaces for 1 < p ≤ 2:

  D^{ξ_{c*}}_{Φ_p}(c_α, c*) = Σ_i w_i|⟨φ_i, c_α⟩|^p − Σ_i w_i|⟨φ_i, c*⟩|^p − ⟨ Σ_i w_i p|⟨φ_i, c*⟩|^{p−1} sgn(⟨φ_i, c*⟩) φ_i, c_α − c* ⟩
    ≥ w_min Σ_i ( |⟨φ_i, c_α⟩|^p − |⟨φ_i, c*⟩|^p − p|⟨φ_i, c*⟩|^{p−1} sgn(⟨φ_i, c*⟩)⟨φ_i, c_α − c*⟩ )
    = w_min D^{ξ_{c*}}_{ℓ_p}(c_α, c*) ≥ C̄‖c_α − c*‖²

for some constant C̄ > 0. Thus, we get ‖c_α − c*‖² ≤ C̄^{−1} D^{ξ_{c*}}_{Φ_p}(c_α, c*), which yields the assertion.

Remark 3.6. Assumption 3.3 does not yield convergence rates for p = 1 because ℓ¹ is not 2-convex. In order to obtain a convergence result for p = 1 we consider an alternative source condition, which is a generalization of the previous source condition in certain cases.

Assumption 3.4.
Let Assumption 3.1 be fulfilled. In (2.3) let

  (α, β, μ)ᵗ R(c, s) = α R̃(c, s) := α ( Φ_p(c) + ν₁‖P(s) − s_calib‖²_Y + ν₂ R_s(s) )

for 0 < ν₁, ν₂ < ∞ (i.e., β = ν₂α, μ = ν₁α). We further assume for an 0 < α_max < ∞ and γ from (2.3):

(i) There exists a Φ_p-minimizing solution c* ∈ X for s* ∈ Y and u* ∈ Z.

(ii) There exist constants κ₁ ∈ [0, 1), κ₂, κ₃ ≥ 0, κ₃ < min(1, γ/(2α_max)), and a subgradient ξ* := (ξ_{c*}, ξ_{s*}) ∈ ∂R̃(c*, s*), such that

  ⟨(ξ_{c*}, ξ_{s*}), (c* − c, s* − s)⟩ ≤ κ₁ D^{ξ*}_{R̃}((c, s), (c*, s*)) + κ₂‖B(c, s) − B(c*, s*)‖ + κ₃‖s − s*‖²  (3.38)

for all (c, s) ∈ X × Y.

Remark 3.7.
A similar type of condition was first introduced for more general penalty terms in [19] and particularly for sparsity regularization in [14]. The general source condition in [19, Ass. 4.1] can also be applied to the product space setting defined in the proof of Theorem 3.3, resulting in the same convergence rate. In line with assumption [19, Ass. 4.1], restricting (3.38) to hold true for

  (c, s) ∈ M^{δ,ε}_{α_max}(M) := { (c, s) ∈ X × Y | J^{δ,ε}_{α_max,ν₂α_max,ν₁α_max}(c, s) ≤ M },

where M > α_max ( R̃(c*, s*) + (δ² + γε²)/α ) for given (δ, ε, α) tuples (e.g., those used in Theorem 3.5), may allow for the following statement. If M^{δ,ε}_{α_max}(M) ⊂ X × B_ρ(s*) holds true, we could then obtain [19, Ass. 4.1(5)] from (3.38) exploiting ‖s − s*‖² ≤ ρ‖s − s*‖ and the equivalence of p-norms in the product space norm of X × Y. In this particular case [19, Ass. 4.1] would be a generalization of the source condition used in Assumption 3.4 equipped with the previously described restriction. However, proving this relation is beyond the scope of the present work.

For 1 < p ≤ 2, the following proposition shows that Assumption 3.4 is fulfilled whenever Assumption 3.3 holds.

Proposition 3.1.
Let Assumption 3.3 be fulfilled and let 1 < p ≤ 2. Then there exist κ₁ ∈ [0, 1), κ₂ ≥ 0, and κ₃ < min(1, γ/(2α_max)) such that

  ⟨(ξ_{c*}, ξ_{s*}), (c* − c, s* − s)⟩ ≤ κ₁ D^{ξ*}_{R̃}((c, s), (c*, s*)) + κ₂‖B(c, s) − B(c*, s*)‖ + κ₃‖s − s*‖².  (3.39)

Proof. We start by using the source condition (i) of Assumption 3.3 and the Cauchy-Schwarz inequality to estimate the left-hand side of the assertion:

  ⟨(ξ_{c*}, ξ_{s*}), (c* − c, s* − s)⟩ = ⟨ω, B′(c*, s*)(c* − c, s* − s)⟩
    ≤ ‖ω‖‖B(c, s) − B(c*, s*) − B′(c*, s*)(c − c*, s − s*)‖ + ‖ω‖‖B(c, s) − B(c*, s*)‖.

Defining κ₂ = ‖ω‖, we only need to estimate the first term now. By using (3.8) and the 2-convexity of the ℓ_p-spaces for 1 < p ≤ 2 we obtain

  ‖ω‖‖B(c, s) − B(c*, s*) − B′(c*, s*)(c − c*, s − s*)‖
    ≤ ‖ω‖C‖(c* − c, s* − s)‖² = ‖ω‖C( ‖c* − c‖² + ‖s* − s‖² )
    ≤ ‖ω‖C( D^{ξ_{c*}}_{ℓ_p}(c, c*) + ‖s* − s‖² )
    ≤ ‖ω‖C D^{ξ*}_{R̃}((c, s), (c*, s*)) + ‖ω‖C‖s* − s‖²,

where we set κ₁ := ‖ω‖C and κ₃ := ‖ω‖C, and where we used D^{ξ_{c*}}_{Φ_p} ≥ D^{ξ_{c*}}_{ℓ_p} together with D^{ξ_{c*}}_{Φ_p} being part of the Bregman distance D^{ξ*}_{R̃} (the 2-convexity constant is again absorbed into C). From the smallness assumption we conclude that κ₁ < 1 and κ₃ < min(1, γ/(2α_max)).

In the following we present a convergence rate result based on the source condition in Assumption 3.4, which now includes the case p = 1, where we obtain the same order of convergence as in Theorem 3.4.

Theorem 3.5.
Let u^δ ∈ Z with ‖u* − u^δ‖ ≤ δ and ‖s* − s_{mod,ε}‖ ≤ ε. Let 1 ≤ p ≤ 2 and let Assumption 3.4 be fulfilled. For 0 < α ≤ α_max < ∞, β = ν₂α, and μ = ν₁α the minimizer of the functional J^{δ,ε}_{α,β,μ} as defined in (2.3) is denoted by (c_α, s_α). Further assume that α ∼ (δ + ε). Then it holds

  D^{ξ*}_{R̃}((c_α, s_α), (c*, s*)) = O(δ + ε)  and  ‖B(c_α, s_α) − B(c*, s*)‖ = O(δ + ε).  (3.40)

Proof. For the first two steps we refer to steps (i) and (ii) in the proof of Theorem 3.4 (up to equation (3.34)). Thus, we start with

  ¼‖B(c_α, s_α) − B(c*, s*)‖² + (γ/2)‖s_α − s*‖² ≤ δ² + 2γε² − α( D^{ξ*}_{R̃}((c_α, s_α), (c*, s*)) + ⟨(ξ_{c*}, ξ_{s*}), (c_α, s_α) − (c*, s*)⟩ ).

Using the source condition in Assumption 3.4, the above inequality transforms to

  ¼‖B(c_α, s_α) − B(c*, s*)‖² + (γ/2)‖s_α − s*‖² ≤ δ² + 2γε² − α D^{ξ*}_{R̃}((c_α, s_α), (c*, s*))
    + α( κ₁ D^{ξ*}_{R̃}((c_α, s_α), (c*, s*)) + κ₂‖B(c_α, s_α) − B(c*, s*)‖ + κ₃‖s_α − s*‖² ),

which we reorder to

  ¼‖B(c_α, s_α) − B(c*, s*)‖² + α(1 − κ₁) D^{ξ*}_{R̃}((c_α, s_α), (c*, s*))
    ≤ δ² + 2γε² + ακ₂‖B(c_α, s_α) − B(c*, s*)‖ + (ακ₃ − γ/2)‖s_α − s*‖²
    ≤ δ² + 2γε² + ακ₂‖B(c_α, s_α) − B(c*, s*)‖.  (3.41)

Since the factor (1 − κ₁) is positive, we can use the same line of reasoning as in the proof of Theorem 3.4 (starting after equation (3.35)) and get

  ‖B(c_α, s_α) − B(c*, s*)‖ = O(δ + ε)  for α ∼ δ + ε.  (3.42)

With this we immediately deduce from equation (3.41) that also

  D^{ξ*}_{R̃}((c_α, s_α), (c*, s*)) = O(δ + ε).  (3.43)

This concludes the proof.

As before we can directly infer convergence rates in the norm for 1 < p ≤ 2.

Corollary 3.4.
From Theorem 3.5 the convergence rate

  ‖c_α − c*‖ = O(√(δ + ε)),  1 < p ≤ 2,  (3.44)

follows.

Proof. The assertion follows analogously to the proof of Corollary 3.3.
Remark 3.8.
The case p = 1 requires additional assumptions to connect the Bregman distance to the norm, such as a sparsity assumption on the minimum norm solution in a certain basis, as for example in [8]. A possible proof for the case p = 1 may follow the proof of [40, Thm. 3.54].

Finally, we obtain a convergence rate for s with respect to the norm topology.

Corollary 3.5.
From Theorem 3.5 the convergence rate

  ‖s_α − s*‖ = O(√(δ + ε))  (3.45)

follows.

Proof.
We exploit the estimate

  γ‖s_α − s_{mod,ε}‖² ≤ J^{δ,ε}_α(c_α, s_α) ≤ J^{δ,ε}_α(c*, s*) ≤ ½δ² + γε² + α R̃(c*, s*) ≤ max(1, γ)(δ + ε)² + α R̃(c*, s*)  (3.46)

and the parameter choice rule α ∼ δ + ε, i.e., α ≤ M(δ + ε), to estimate

  ‖s_α − s*‖² ≤ 2‖s_α − s_{mod,ε}‖² + 2ε² ≤ (2 max(1, γ)/γ + 2)(δ + ε)² + C̃(δ + ε)  (3.47)

with C̃ = 2M R̃(c*, s*)/γ, which concludes the proof.

Remark 3.9.
We have not chosen R_s yet. Depending on the specific penalty, one might derive better convergence rates in the norm topology. Further investigations in this direction are beyond the scope of the present work.

The algorithmic solution is derived for the discretized integral operator in (3.3). After discretization in suitable finite-dimensional subspaces of the spaces X, Y, and Z, we obtain the discretized problem in terms of a matrix-vector product, i.e., the discretized operator B̃ : R^M × K^{K×M} → K^K with

  (c, S) ↦ Sc  (4.1)

where K = R, C and N, M, K ∈ ℕ. An information source of type A analogously translates into the discrete setup, where either S_mod ∈ K^{K×M} or S_{mod,ε} ∈ K^{K×M} is available. For an information source of type B we assume a given S_calib ∈ K^{K×N}, N < M, and an operator P̃ : K^{K×M} → K^{K×N} derived from the linear operator P, which affects the integral kernel in its space variable x only; the measurement dimension remains unaffected. In this specific case the operator P̃ can be represented by a matrix Q ∈ K^{M×N}, i.e., S ↦ P̃(S) = SQ. According to the particular discretizations, R_c : R^M → R₊ and R_s : K^{K×M} → R₊ are obtained from R_c and R_s.

In this work, the joint reconstruction of an image c and the system matrix S is obtained by minimizing

  J(c, S) = ½‖Sc − u‖² + γ‖S − S_mod‖²_F + μ‖SQ − S_calib‖²_F + α R_c(c) + β R_s(S)  (4.2)

for given α, β, γ, μ ≥ 0. We minimize J(c, S) with respect to c and S by alternating the following two steps:

  min_{c ∈ R^M} ½‖Sc − u‖² + α R_c(c) =: J_c(c),  (4.3)

  min_{S ∈ K^{K×M}} ½‖Sc − u‖² + γ‖S − S_mod‖²_F + μ‖SQ − S_calib‖²_F + β R_s(S) =: J_S(S).  (4.4)

For the regularization of the particle density, based on previous experience, we use the following combination of ℓ² and ℓ¹ regularization:

  R_c(c) = |c|₂² + (λ/(2α))|c|₁.  (4.5)

For the regularization of the system matrix, the term with S_mod including the Frobenius norm provides an ℓ²-type regularization. The term with S_calib provides a priori information on the high resolution system matrix. Here, we do not impose further regularization on the system matrix beyond our prior knowledge on MPI. Therefore, the reconstruction functionals are set to

  J_c(c) = ‖Sc − u‖² + α̃²|c|₂² + λ|c|₁,  (4.6)
  J_S(S) = ‖Sc − u‖² + γ̃²‖S − S_mod‖²_F + μ̃²‖SQ − S_calib‖²_F,  (4.7)

where we have substituted regularization parameters for notational simplicity of the numerical algorithm presentations, i.e., α̃ = √(2α), and γ̃, μ̃ analogously.

In the remainder we focus on the algorithmic derivation for K = C, as the system matrix in MPI is commonly complex-valued. Furthermore, c is equipped with a positivity constraint.

When the regularization parameters are fixed, we derive the following reconstruction algorithm with the regularization terms in (4.5). The Kaczmarz algorithm is chosen for the following reasons. For MPI, a Kaczmarz-type algorithm [11] has been one of the standard methods since the very beginning of MPI research [32], and it seems to benefit qualitatively from an advantageous influence of early stopping. More recently this has also been observed quantitatively on real MPI data [29]. Moreover, because of its row-action nature, the Kaczmarz algorithm provides the flexibility to choose the order in which the measured data u are used and thus to steer the reconstruction process.
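The alternating scheme (4.3)-(4.4) can be sketched in a few lines. The snippet below is a simplified real-valued stand-in (hypothetical toy data; plain gradient steps instead of the regularized Kaczmarz sweeps used in this work, and it omits the ℓ¹ term and the positivity constraint):

```python
import numpy as np

def alternating_min(u, S_mod, S_calib, Q, alpha=1e-3, gamma=1e-2, mu=1e-2,
                    outer=50, inner=200, lr=1e-2):
    """Minimize a simplified version of J(c, S) in (4.2) by alternating
    gradient descent on c and on S (quadratic terms only)."""
    K, M = S_mod.shape
    c = np.zeros(M)
    S = S_mod.copy()
    for _ in range(outer):
        # (4.3): descend J_c(c) = 0.5||Sc - u||^2 + alpha||c||_2^2
        for _ in range(inner):
            c -= lr * (S.T @ (S @ c - u) + 2 * alpha * c)
        # (4.4): descend J_S(S) = 0.5||Sc - u||^2 + gamma||S - S_mod||_F^2
        #                        + mu||SQ - S_calib||_F^2
        for _ in range(inner):
            grad = (np.outer(S @ c - u, c) + 2 * gamma * (S - S_mod)
                    + 2 * mu * (S @ Q - S_calib) @ Q.T)
            S -= lr * grad
    return c, S

# tiny synthetic problem: triangular true operator, noisy model matrix
rng = np.random.default_rng(0)
M = 8
S_true = np.tril(np.ones((M, M)))
Q = np.kron(np.eye(M // 2), np.array([[1.0], [1.0]]))  # sums consecutive columns
c_true = rng.random(M)
u = S_true @ c_true
c_hat, S_hat = alternating_min(u, S_true + 0.05 * rng.standard_normal((M, M)),
                               S_true @ Q, Q)
print(np.linalg.norm(S_hat @ c_hat - u))  # residual well below ||u||
```

Each sub-step decreases the shared data-fidelity term, so the joint functional decreases monotonically across the alternation; the paper's actual solver replaces the inner gradient loops with the Kaczmarz sweeps derived next.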
Because our formulation is an unconventional mixture of real and complex spaces with additional regularization terms, we provide the derivation of the Kaczmarz algorithm with the regularization terms used in this work. The algorithms are developed by applying an alternating minimization approach, or the incremental gradient descent method, to the reconstruction functionals in (4.6) and (4.7) with appropriate decompositions of both functionals.

Because the Kaczmarz algorithm is based on the projection onto hyperplanes [9], we need the projection onto hyperplanes in C^M. For a hyperplane H in C^M, given by a ∈ C^M and b ∈ C, i.e.,

  H = { z ∈ C^M : Σ_{m=1}^M a_m z_m = b },  (4.8)

the projection P_H of any z ∈ C^M onto H is

  P_H[z] = z − ( (Σ_{m=1}^M a_m z_m − b)/‖a‖² ) ā,  (4.9)

where ·̄ is the complex conjugate, and the vectors z and a are column vectors. For an ℓ²-regularized least-squares problem

  ‖Az − b‖² + η²|z − z₀|₂²,  (4.10)

with A ∈ C^{K×M}, z₀ ∈ C^M, and η > 0, by introducing an auxiliary variable v ∈ C^K, the following system of equations is consistent [18, 17]:

  ( ηI  A ) ( v ; z ) = b − Az₀,  (4.11)

where I is the K × K identity matrix. If (v; z) is the minimal norm solution of (4.11), then

  z* = z₀ + z  (4.12)

is a minimizer of (4.10) [18, 17]. One Kaczmarz step for (4.11) with an iteration-to-row index mapping g_A : ℕ → {1, . . . , K} is

  v^{j+1} = v^j + ητ (b_k − A_k z₀ − ηv^j_k − A_k z^j)/(η² + ‖A_k‖²) I_k^Tr,  (4.13)
  z^{j+1} = z^j + τ (b_k − A_k z₀ − ηv^j_k − A_k z^j)/(η² + ‖A_k‖²) Ā_k^Tr,  (4.14)

with k = g_A(j), where ·^Tr is the transpose, I_k is the k-th column of the K × K identity matrix I, and A_k is the k-th row of A for 1 ≤ k ≤ K. Each step involves two equations in the complex form, and thus four equations in the real space. Hence, the iterations in (4.13) and (4.14) are block-iterative of block size 4, unlike the original Kaczmarz iteration (in real space) of block size 1. Because the system of equations (4.11) is consistent, the above Kaczmarz algorithm converges to the minimal norm solution of (4.11) for a relaxation parameter τ ∈ (0, 2) and zero initial values for v^j and z^j [23]. One equivalent form of the iteration in (4.13) and (4.14), obtained by choosing the initial values v⁰ = 0 and z⁰ = z₀ [18, 17], is

  v^{j+1} = v^j + ητ (b_k − A_k z^j − ηv^j_k)/(η² + ‖A_k‖²) I_k^Tr,  (4.15)
  z^{j+1} = z^j + τ (b_k − A_k z^j − ηv^j_k)/(η² + ‖A_k‖²) Ā_k^Tr,  (4.16)

for k = g_A(j). Because I_k is equal to 1 at its k-th component and zero otherwise, the above iteration reduces to

  v^{j+1}_k = v^j_k + ητ (b_k − A_k z^j − ηv^j_k)/(η² + ‖A_k‖²),  (4.17)
  z^{j+1} = z^j + τ (b_k − A_k z^j − ηv^j_k)/(η² + ‖A_k‖²) Ā_k^Tr,  (4.18)

for k = g_A(j).

In the following we apply this technique to the joint reconstruction problem. Thus, let the image be c = (c₁, · · · , c_M)^Tr ∈ R^M, the measurement u = (u₁, · · · , u_K)^Tr ∈ C^K, and let S_k denote the k-th row of the system matrix S ∈ C^{K×M}.

4.1.1 Regularized Kaczmarz algorithm for J_c(c)

The reconstruction functional for the image c is decomposed into the following two terms:

  J_c(c) = J_{c,ℓ²}(c) + J_{c,ℓ¹}(c),  (4.19)

with

  J_{c,ℓ²}(c) = ‖Sc − u‖² + α̃²|c|₂²,  (4.20)
  J_{c,ℓ¹}(c) = λ|c|₁.  (4.21)

For J_{c,ℓ²}(c), by applying (4.13) and (4.14), one Kaczmarz step for c is then

  c^{j+1} = c^j + τ (u_k − S_k c^j − α̃v^j_k)/(α̃² + ‖S_k‖²) S̄_k^Tr,  (4.22)
  v^{j+1}_k = v^j_k + α̃τ (u_k − S_k c^j − α̃v^j_k)/(α̃² + ‖S_k‖²),  (4.23)

with k = g_A(j), where v^j ∈ C^K is an auxiliary variable for computing the iteration of c. The initial values must be c⁰ = 0 and v⁰ = 0. One sweep of the Kaczmarz algorithm consists of applying (4.22) for all measured data u_k, k = 1, . . . , K. Of course, the iteration order over the measured data can be crucial for the reconstructed image quality within a finite number of iterations.
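A minimal real-valued sketch of the reduced iteration (4.17)-(4.18), on hypothetical toy data (the actual MPI setting is complex-valued): since the augmented system (4.11) is consistent, sweeping its rows drives z toward z* in (4.12), which we can check via the first-order optimality condition of (4.10).

```python
import numpy as np

def regularized_kaczmarz(A, b, z0, eta, tau=1.0, sweeps=2000):
    """Kaczmarz iteration (4.17)-(4.18) for min_z ||Az - b||^2 + eta^2 ||z - z0||^2,
    run on the consistent augmented system (eta*I  A)(v; z - z0) = b - A z0."""
    K, _ = A.shape
    v = np.zeros(K)
    z = z0.astype(float).copy()           # equivalent form with z^0 = z0
    for _ in range(sweeps):
        for k in range(K):                # row order g_A(j) = (j mod K) + 1
            r = b[k] - A[k] @ z - eta * v[k]
            denom = eta**2 + A[k] @ A[k]  # eta^2 + ||A_k||^2
            v[k] += eta * tau * r / denom     # (4.17)
            z += tau * r / denom * A[k]       # (4.18)
    return z

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
b = rng.standard_normal(3)
z0 = np.array([0.5, -0.5])
eta = 0.7
z = regularized_kaczmarz(A, b, z0, eta)
# at the minimizer: A^T (Az - b) + eta^2 (z - z0) = 0
grad = A.T @ (A @ z - b) + eta**2 * (z - z0)
print(np.linalg.norm(grad))
```

The relaxation parameter τ ∈ (0, 2) and the zero initialization of v are exactly the conditions quoted above for convergence to the minimal norm solution of the augmented system.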
Here, we use the simple order given by g_A(j) = (j mod K) + 1.

Because the particle density c is real and non-negative, we apply a projection P₊ onto the non-negative quadrant of R^M after each Kaczmarz sweep over the entire matrix.

The ℓ¹ regularization is applied after the projection P₊. It is performed by applying the soft-thresholding operator [6], T_λ : R → R,

  T_λ(c_m) = (c_m − λ)₊ − (−c_m − λ)₊,  (4.24)

where (·)₊ = max(0, ·). The regularized Kaczmarz algorithm for solving (4.6) is summarized in Algorithm 1. The convergence criteria usually consist of the following: a given limit on the iteration number, a threshold on the difference between the current and last iterates, such as ‖c^{j+1} − c^j‖, or a relative difference such as ‖c^{j+1} − c^j‖/‖c^j‖.

Algorithm 1:
Kaczmarz algorithm for the particle density with the ℓ² and ℓ¹ regularization terms

Result:
Iterative reconstruction of the particle density c for given regularization parameters α̃, λ.

Initialization: choose a sweep order for the measured data; set c = 0 ∈ R^M; set v = 0 ∈ C^K; set τ = τ₀ ∈ (0, 2).
while convergence criteria are not met do
  Kaczmarz sweep, in the sweep order k = 1, . . . , K over the measured data u_k:
    K ← τ (u_k − S_k c − α̃v_k)/(α̃² + ‖S_k‖²)  (4.25)
    c ← c + K S̄_k^Tr  (4.26)
    v_k ← v_k + α̃K  (4.27)
  Projection onto the real space:
    c ← P₊[c]  (4.28)
  Soft thresholding, for all m = 1, . . . , M:
    c_m ← (c_m − λ)₊ − (−c_m − λ)₊  (4.29)
end

4.1.2 Regularized Kaczmarz algorithm for J_S(S)

Let Q_n be the n-th column of Q. Then we have

  J_S(S) = Σ_{k=1}^K |S_k c − u_k|² + Σ_{k=1}^K Σ_{n=1}^N |μ̃(S_k Q_n − S_calib,k,n)|² + γ̃²‖S − S_mod‖²_F.  (4.30)

By applying (4.13) and (4.14), one Kaczmarz step for the k-th row S_k is then either an update by c,

  S^{j+1}_k = S^j_k + τ (u_k − S^j_k c − γ̃v^j_k)/(γ̃² + ‖c‖²) c^Tr,  (4.31)
  v^{j+1}_k = v^j_k + γ̃τ (u_k − S^j_k c − γ̃v^j_k)/(γ̃² + ‖c‖²),  (4.32)

or an update by S_calib,

  S^{j+1}_k = S^j_k + τ (μ̃(S_calib,k,n − S^j_k Q_n) − γ̃w^j_{k,n})/(γ̃² + μ̃²‖Q_n‖²) μ̃Q_n^Tr,  (4.33)
  w^{j+1}_{k,n} = w^j_{k,n} + γ̃τ (μ̃(S_calib,k,n − S^j_k Q_n) − γ̃w^j_{k,n})/(γ̃² + μ̃²‖Q_n‖²),  (4.34)

with n = g_Q(j), where g_Q : ℕ → {1, . . . , N} defines the order of the columns of Q during the iteration. Note that we need two auxiliary variables v^j ∈ C^K and w^j ∈ C^{K×N}. The initial values must be S⁰ = S_mod, v⁰ = 0, and w⁰ = 0. One sweep of the Kaczmarz algorithm can be performed by updating every row S_k, k = 1, . . . , K, with c by applying (4.31) and (4.32) and by updating with S_calib by applying (4.33) and (4.34) for all calibration data S_calib,k,n, n = 1, · · · , N, i.e., g_Q(j) = (j mod N) + 1. Again, the sweep order of the Kaczmarz algorithm should not be overlooked.
The regularized Kaczmarz algorithm for solving (4.7) is summarized in Algorithm 2.

Algorithm 2:
Kaczmarz algorithm for system matrix
Result:
Iterative reconstruction of the system matrix S for given regularization parameters γ̃, μ̃.

Initialization: choose a sweep order for the measured data; choose a sweep order for the calibration data; set S = S_mod ∈ C^{K×M}; set v = 0 ∈ C^K; set w = 0 ∈ C^{K×N}; set τ = τ₀; set η = η₀.
while convergence criteria are not met do
  Kaczmarz sweep by c, for all k = 1, · · · , K:
    K ← τ (u_k − S_k c − γ̃v_k)/(γ̃² + ‖c‖²)  (4.35)
    S_k ← S_k + K c^Tr  (4.36)
    v_k ← v_k + γ̃K  (4.37)
  Kaczmarz sweep by Q, in the sweep order n = 1, . . . , N for all k = 1, . . . , K:
    K ← τ (μ̃(S_calib,k,n − S_k Q_n) − γ̃w_{k,n})/(γ̃² + μ̃²‖Q_n‖²)  (4.38)
    S_k ← S_k + K μ̃Q_n^Tr  (4.39)
    w_{k,n} ← w_{k,n} + γ̃K  (4.40)
end

5 Numerical results

We illustrate the proposed method by numerical examples, including an academic test problem and the application to the imaging problem in MPI. Results using a standard integral operator are presented to highlight the characteristic behavior of the method for optimal parameter settings obtained via the ℓ²-error or SSIM. Furthermore, we provide numerical results for measured phantom data in MPI.

5.1 Academic example - integral operator

We investigate the behavior of the proposed method first in an academic test example. Here, we choose a discretized standard problem defined by the integral operator in (3.3) for Ω = I = (0, T), T ∈ ℕ, and s(x, t) = χ_{[0,x)}(t), i.e., the corresponding discretized bilinear operator B̃ : R^M × R^{K×M} → R^K is given by B̃(S, c) = Sc. By choosing K = M = T, an equidistant grid, and piecewise constant basis functions, the true operator S* ∈ R^{K×M} is thus a lower triangular matrix filled with ones. Please note that ·* does not denote the adjoint matrix in this subsection. S^ε is obtained by adding a Gaussian matrix η ∈ R^{K×M} with entries being i.i.d.
and normally distributed with zero mean and standard deviation σ, i.e., η_{i,j} ∼ N(0, σ²) for any i = 1, . . . , K, j = 1, . . . , M. Analogously, we obtain the noisy measurement u^δ = u* + ξ, where the noise ξ ∈ R^K is also i.i.d. and normally distributed with zero mean and standard deviation σ. The calibrated matrix S_calib ∈ R^{K×N}, N = M/2, on a coarser resolution is obtained using the operator P̃, respectively the matrix Q ∈ R^{M×N}, which encodes computing the sum of two consecutive columns of the high resolution matrix, i.e., Q is a sparse matrix with two ones in each column. Each column of the high resolution matrix contributes only to one single column of the low resolution system matrix. We thus use S_{mod,ε} = S^ε = S* + η, S_calib = S*Q, and u^δ = u* + ξ in (4.2). A solution for M = 50 is obtained by minimizing the functional in (4.2) using the Kaczmarz-type method outlined in Section 4. Here, in each outer iteration of the alternating minimization problem we use 500 Kaczmarz sweeps for J_c(c) and 300 for J_S(S). In total 100 outer iterations are computed. In the following we compare

• the (c, S)-reconstruction, which is the joint reconstruction of c and S minimizing J,

• the sole c-reconstruction minimizing J_c for fixed S = S*, which is the ideal and desired situation where the operator is accurately known, and

• the sole c-reconstruction minimizing J_c for fixed S = S^ε, which represents the other end of the range where the noisy/inaccurate operator is considered as true.

In order to compare the different methods we performed a discrete regularization parameter search for γ ∈ {10^{-i} | i = 0, . . . }, μ ∈ {10^{-i} | i = 0, . . . }, α ∈ {10^{-i} | i = 10, . . . }, and λ ∈ {10^{-i} | i = 1, . . . } (38988 parameter combinations in total), optimizing either the ℓ²-error or the SSIM when compared to the true solution c*.
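The construction of this academic test problem can be sketched as follows (an illustrative script; the phantom, σ, and random seed are hypothetical choices, while the matrices follow the setup described above):

```python
import numpy as np

rng = np.random.default_rng(42)
M = 50                 # grid size as in the experiment
K, N = M, M // 2
sigma = 0.1            # noise standard deviation (illustrative value)

# True operator: lower triangular matrix of ones (discretized s(x,t) = chi_[0,x)(t)).
S_star = np.tril(np.ones((K, M)))

# Noisy model matrix and noisy measurement.
S_eps = S_star + sigma * rng.standard_normal((K, M))
c_star = np.zeros(M)
c_star[10:20] = 1.0    # hypothetical piecewise-constant phantom
u_delta = S_star @ c_star + sigma * rng.standard_normal(K)

# Downsampling matrix Q: two ones per column, summing consecutive columns
# of the high-resolution matrix into one low-resolution column.
Q = np.zeros((M, N))
for n in range(N):
    Q[2 * n, n] = Q[2 * n + 1, n] = 1.0
S_calib = S_star @ Q   # calibrated low-resolution matrix

print(S_calib.shape)   # (50, 25)
```

Each column of Q touches a disjoint pair of high-resolution columns, which is exactly why every high-resolution column contributes to a single low-resolution column.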
For the (c, S)-reconstruction the last outer iterate is used to determine the optimal parameters.
The results for the ℓ²-error-optimized regularization parameters for varying noise levels (in terms of the standard deviation σ) are illustrated in Figure 1 and Table 1. In all three cases the (c, S)-reconstruction method quantitatively improves the reconstruction quality when compared to the c-reconstruction (S = S^ε), which is the desired impact of the joint reconstruction. The (c, S)-reconstructions tend to approximate the true solution c* quantitatively and qualitatively better but do not reach the same error level, which is due to the noisy nature of the operator and thus is not expected. When decreasing the noise level σ, we also observe a decrease of the ℓ²-error for the c-reconstruction (S = S*), as predicted by the theory. The (c, S)-reconstruction also follows this trend, reaching the ℓ²-error values 0.3158, 0.1124, and 0.1027 in the last outer iterate for decreasing σ.
Analogous observations can be made when using the SSIM to obtain optimal parameters; the results are illustrated in Figure 2 and Table 2.

5.2 Magnetic particle imaging

As a second example we consider MPI, which also motivated the general problem setup of the present work. Precisely modeling MPI, resp. formulating a physically accurate integral kernel for image reconstruction, is still an unsolved problem. Various modeling aspects, e.g., the magnetization dynamics and particle-particle interactions, make it a challenging task, such that the integral kernel is commonly determined in a time-consuming calibration procedure. For further information on the modeling aspects, the interested reader is referred to the survey paper [30], as well as to the review article [32] for further details on the MPI methodology. The numerical results are obtained from the recently improved model approach in [31] and the real data example in a 2D field-free-point (FFP) setup therein.
Figure 1: Phantom reconstructions (left) and ℓ²-error (right) of the proposed method for decreasing standard deviation σ from top to bottom. Regularization parameters are chosen such that the ℓ²-(reconstruction) error is minimized. For the (c, S)-reconstruction method the last outer iteration is used to determine the regularization parameters, which can be found in Table 1.

Table 1: Optimal regularization parameters γ, µ, α, λ with respect to the ℓ²-(reconstruction) error for the three methods and the three noise levels σ.

Figure 2: Phantom reconstructions (left) and SSIM measure (right) of the proposed method for decreasing standard deviation σ from top to bottom. Regularization parameters are chosen such that 1 − SSIM is minimized. For the (c, S)-reconstruction method the last outer iteration is used to determine the regularization parameters, which can be found in Table 2.

Table 2: Optimal regularization parameters γ, µ, α, λ with respect to 1 − SSIM for the three methods and the three noise levels σ.

Thus, the setup is as follows: S_mod ∈ C^{K×M}, respectively S_mod,ε for one particular ε, is given by the improved model B3 in [31], which exploits a space-dependent anisotropy in a Néel rotation model for ensembles of nanoparticles. The authors fitted the analog filter function to calibration measurements in a previous step. For this work we exploited this model to obtain the refined resolution of M = 60 ×
60 voxels corresponding to voxels of size 0.5mm × 0.5mm. K_max = 2 × 817 frequencies are available after applying the Fourier transform (2 channels × the entire available spectrum). To obtain reasonable reconstructions, the frequencies are restricted to a subset in a preprocessing step (a combination of SNR and NRMSE thresholding, see [31] for further details), resulting in K ≤ K_max. S_calib ∈ R^{K×N} is obtained in a calibration procedure with a delta sample of size 1mm × 1mm × 1mm on a grid of 30 × 30 = N voxels. K is as described for S_mod.
The real phantom consisted of 3 capillaries filled with tracer having a concentration of 250 mmol/l. In total a tracer volume of 2. × 10^{−} l was used. For a photo of the phantom we refer to [31, Fig. 6]. u^δ ∈ C^K is thus the MPI measurement after applying the previously described frequency selection.
Solutions are obtained by minimizing the functional in (4.2) using the Kaczmarz-type method outlined in Section 4. We use 10 outer iterations, each comprising 75 Kaczmarz sweeps for J_c(c) and 20 for J_S(S). Due to the missing ground truth for the system matrix, we cannot compare any reconstruction to the c-reconstruction using S = S*. As an alternative we compare them to low resolution reconstructions using S = S_calib, which also represents the standard reconstruction in MPI. Thus, in the following we compare
• the (c, S)-reconstruction, which is again the joint reconstruction of c and S minimizing J,
• the sole c-reconstruction minimizing J_c for fixed S = S_mod, which represents the pure model-based reconstruction on the refined resolution, and
• the sole c-reconstruction minimizing J_c for fixed S = S_calib, which represents the standard reconstruction method in MPI.
For comparison we computed reconstructions for various parameter combinations of γ, µ, α, and λ on discrete grids of the form 10^{−i}, and, as a ground truth phantom is not available for computing an image quality measure, we exploited the quantitative information on the volume of the used tracer material. We sorted the reconstructions by their absolute deviation from the desired volume in ascending order; they are illustrated in Figure 3 and Table 3.
For the (c, S)-reconstructions we can observe that the capillary reconstructions become sharper when increasing the number of outer iterations.
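As an illustration of the operator P̃ linking the two resolutions in this MPI setup, the matrix Q for a 2D grid might be constructed as below; the row-major voxel ordering and the plain 2 × 2 block summation (rather than averaging) are our assumptions, chosen by analogy with the 1D academic example:

```python
import numpy as np

def coarsening_matrix(n_fine):
    """Build the 0/1 matrix Q linking an (n_fine x n_fine) image grid to an
    (n_fine//2 x n_fine//2) grid: column n of Q selects the 2x2 block of fine
    voxels forming coarse voxel n, so the coarse system matrix is S @ Q."""
    assert n_fine % 2 == 0
    n_coarse = n_fine // 2
    Q = np.zeros((n_fine * n_fine, n_coarse * n_coarse))
    for i in range(n_fine):
        for j in range(n_fine):
            m = i * n_fine + j                  # fine voxel index (row-major)
            n = (i // 2) * n_coarse + (j // 2)  # coarse voxel containing it
            Q[m, n] = 1.0
    return Q

# For the MPI setup above: 60 x 60 fine grid, 30 x 30 calibration grid.
Q = coarsening_matrix(60)   # shape (3600, 900), four ones per column
```

Each column of Q then combines four 0.5mm fine voxels into one 1mm calibration voxel, mirroring the "two ones per column" structure of the 1D example.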
Two main kinds of reconstructions can be found for the (c, S)-reconstruction in the illustrated results. Rows 1-4 result from a large µ, while rows 5-8 are based on a small µ. This shows the influence of the reconstructed system matrix on the reconstruction when changing the weight of the term that includes the additional information on the operator P. For large µ (rows 1-4), and thus a smaller influence of the regularization term including S_mod, we can observe high-frequency noise patterns in the reconstructions. When decreasing µ we obtain improved reconstructions (rows 5-8), which illustrates the importance of additional a priori information in the reconstruction process, either provided via S_mod or via another penalty term R_S which may include further a priori information on S. For the pure c-reconstruction using S = S_mod the best reconstructions (by visual judgement) can be found in rows 1 and 6. These reconstructions are of similar quality when compared to the (c, S)-reconstruction, while the latter results in slightly sharper reconstructions. Compared to the c-reconstruction (S = S_calib) on the coarser grid, both high resolution reconstruction methods improve the separation of the capillaries in the phantom.

Figure 3: Concentration reconstructions for the phantom consisting of 3 glass capillaries (see [31, Fig. 6]). All reconstructions are in mmol/l. The corresponding regularization parameters can be found in Table 3. Reconstructions are sorted by increasing absolute deviation from the expected tracer volume from top to bottom.

Table 3: Regularization parameters γ, µ, α, λ used to obtain the image reconstructions in Figure 3.

6 Conclusion

In this work, we considered a hybrid approach to obtain high resolution reconstructions in imaging applications by explicitly taking into account parameters of the imaging operators in the reconstruction process. The present approach combines incomplete infinite- or high-dimensional model information (type A) with high-quality but finite-/lower-dimensional information (type B). Motivated by the application of interest, MPI, we analyzed a general Hilbert space setup for bilinear operators, i.e., a linear imaging operator as well as a linear dependence on the parameters of the imaging operator. Furthermore, we derived a Kaczmarz-type algorithm to obtain a solution for the joint reconstruction problem and tested it in an academic problem as well as in the imaging application of MPI.
The theoretical findings in terms of stability, convergence, and convergence rates are in line with the findings in [4], where the authors considered bilinear operators fulfilling stronger assumptions and a special case of the functional of the present work. In contrast to [4], we exploited the general results for nonlinear operator equations in [19] by addressing the product space setup of the joint reconstruction problem.
An algorithmic solution to the joint reconstruction problem is derived using an alternating minimization approach, as analogously formulated for a special case of the functional in [5]. Motivated by the application of interest, Kaczmarz-type algorithms are exploited to minimize the respective functionals for the image and the parameter reconstruction.
The extension taking into account the link between the low and high resolution system matrices is straightforward, and it proved successful as illustrated by the academic test example. Furthermore, the numerical results for MPI illustrate the potential of the present approach to exploit multiple information sources, combining the best of both worlds in hybrid methods.
The results of this work build the basis for several directions of research in different disciplines. In the context of MPI the present work motivates several future research questions. In particular, an experimental study where a ground truth of the phantom is available is desirable; this would also enable the investigation of suitable image quality measures.
From a theoretical as well as an application point of view, the extension to a nonlinear dependence on the parameters of the imaging operator is highly desirable. The interpretation of the linear operator P linking the low and high resolution is then no longer as intuitive as in the present setup. It can then rather be seen as a joint model calibration and image reconstruction problem. A further intuitive direction of research is the treatment of dynamically changing model parameters in time-dependent image reconstruction. In MPI, for example, the tracer material changes its magnetization behavior if the nanoparticles are immobilized [28, 37], e.g., if the nanoparticles are blocked while labeling certain types of tissue [43].

Acknowledgements

The authors would like to thank T. Knopp from University Medical Center Hamburg-Eppendorf for sharing the MPI data used in this work. T. Kluth acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - project number 281474342/GRK2224/1 "π³: Parameter Identification - Analysis, Algorithms, Applications". C. Bathke acknowledges funding by the project "MPI²" funded by the Federal Ministry of Education and Research (BMBF, project no. 05M16LBA). P.
Maass acknowledges the support by the Federal Ministry of Education and Research (BMBF project no. 05M20LBC, HYDAMO). M. Jiang acknowledges the funding by the National Science Foundation of China (11961141007, 61520106004), and the friendly hospitality of Prof. Peter Maass during his sabbatical visit to the University of Bremen.

References

[1] S. Arridge, P. Maass, O. Öktem, and C.-B. Schönlieb. Solving inverse problems using data-driven models. Acta Numerica, 28:1–174, 2019.
[2] S. R. Arridge. Methods in diffuse optical imaging. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 369(1955):4558–4576, 2011.
[3] M. Benning and M. Burger. Modern regularization methods for inverse problems. Acta Numerica, 27:1–111, 2018.
[4] I. R. Bleyer and R. Ramlau. A double regularization approach for inverse problems with noisy data and inexact operator. Inverse Problems, 29(2):025004, 2013.
[5] I. R. Bleyer and R. Ramlau. An alternating iterative minimisation algorithm for the double-regularised total least square functional. Inverse Problems, 31(7):075004, 2015.
[6] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[7] K. Bredies and M. Holler. Higher-order total variation approaches and generalisations. December 2019.
[8] K. Bredies and D. A. Lorenz. Linear convergence of iterative soft-thresholding. Journal of Fourier Analysis and Applications, 14(5-6):813–837, 2008.
[9] Y. Censor, G. T. Herman, and M. Jiang. A note on the behavior of the randomized Kaczmarz algorithm of Strohmer and Vershynin. Journal of Fourier Analysis and Applications, 15(4):431–436, 2009.
[10] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. 2003.
[11] A. Dax. On row relaxation methods for large constrained least squares problems. SIAM Journal on Scientific Computing, 14(3):570–584, 1993.
[12] J. C. De los Reyes, C.-B. Schönlieb, and T. Valkonen. Bilevel parameter learning for higher-order total variation regularisation models. Journal of Mathematical Imaging and Vision, 57(1):1–25, 2017.
[13] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems, volume 375 of Mathematics and Its Applications. Springer Verlag, 2000.
[14] M. Grasmair, M. Haltmeier, and O. Scherzer. Sparse regularization with ℓq penalty term. Inverse Problems, 24(5):055020, 2008.
[15] H. Greenspan. Super-resolution in medical imaging. The Computer Journal, 52(1):43–63, 2009.
[16] J. Haegele, J. Rahmer, B. Gleich, J. Borgert, H. Wojtczyk, N. Panagiotopoulos, T. Buzug, J. Barkhausen, and F. Vogt. Magnetic particle imaging: visualization of instruments for cardiovascular intervention. Radiology, 265(3):933–938, 2012.
[17] G. T. Herman. Fundamentals of Computerized Tomography: Image Reconstruction from Projections. Springer, New York, 2nd edition, 2009.
[18] G. T. Herman, A. Lent, and H. Hurwitz. A storage-efficient algorithm for finding the regularized solution of a large, inconsistent system of equations. IMA Journal of Applied Mathematics, 25(4):361–366, 1980.
[19] B. Hofmann, B. Kaltenbacher, C. Pöschl, and O. Scherzer. A convergence rates result for Tikhonov regularization in Banach spaces with non-smooth operators. Inverse Problems, 23(3):987–1010, 2007.
[20] S. Ilbey, C. B. Top, A. Güngör, T. Çukur, E. U. Saritas, and H. E. Güven. Fast system calibration with coded calibration scenes for magnetic particle imaging. IEEE Transactions on Medical Imaging, 38(9):2070–2080, 2019.
[21] J. S. Isaac and R. K. Kulkarni. Super resolution techniques for medical image processing. pages 1–6, 2015.
[22] M. Jiang, P. Maass, and T. Page. Regularizing properties of the Mumford–Shah functional for imaging applications. Inverse Problems, 30(3):035007, 2014.
[23] M. Jiang and G. Wang. Convergence studies on iterative algorithms for image reconstruction. IEEE Transactions on Medical Imaging, 22(5):569–579, 2003.
[24] B. Jin and P. Maass. Sparsity regularization for parameter identification problems. Inverse Problems, 28(12):123001, 2012.
[25] B. Jin, P. Maaß, and O. Scherzer. Sparsity regularization in inverse problems. Inverse Problems, 33(6):060301, 2017.
[26] L. Justen and R. Ramlau. A non-iterative regularization approach to blind deconvolution. Inverse Problems, 22(3):771–800, 2006.
[27] B. Kaltenbacher, A. Neubauer, and O. Scherzer. Iterative Regularization Methods for Nonlinear Ill-Posed Problems. De Gruyter, Berlin, Boston, 2008.
[28] T. Kluth. Mathematical models for magnetic particle imaging. Inverse Problems, 34(8):083001, 2018.
[29] T. Kluth and B. Jin. L1 data fitting for robust numerical reconstruction in magnetic particle imaging: quantitative evaluation on Open MPI dataset. Preprint, arXiv:2001.06083, 2020.
[30] T. Kluth, B. Jin, and G. Li. On the degree of ill-posedness of multi-dimensional magnetic particle imaging. Inverse Problems, 34(9):095006, 2018.
[31] T. Kluth, P. Szwargulski, and T. Knopp. Towards accurate modeling of the multidimensional magnetic particle imaging physics. New Journal of Physics, 21(10):103032, 2019.
[32] T. Knopp, N. Gdaniec, and M. Möddel. Magnetic particle imaging: from proof of principle to preclinical applications. Physics in Medicine and Biology, 62(14):R124, 2017.
[33] T. Knopp and M. Hofmann. Online reconstruction of 3D magnetic particle imaging data. Physics in Medicine and Biology, 61(11):N257–67, 2016.
[34] T. Knopp, J. Rahmer, T. F. Sattel, S. Biederer, J. Weizenecker, B. Gleich, J. Borgert, and T. M. Buzug. Weighted iterative reconstruction for magnetic particle imaging. Phys. Med. Biol., 55(6):1577–1589, 2010.
[35] A. K. Louis. Inverse und schlecht gestellte Probleme. Vieweg+Teubner Verlag, 1989.
[36] X.-G. Lv, F. Li, and T. Zeng. Convex blind image deconvolution with inverse filtering. Inverse Problems, 34(3):035003, 2018.
[37] M. Möddel, C. Meins, J. Dieckhoff, and T. Knopp. Viscosity quantification using multi-contrast magnetic particle imaging. New Journal of Physics, 20(8):083001, 2018.
[38] J. Mueller and S. Siltanen. Linear and Nonlinear Inverse Problems with Practical Applications. Computational Science and Engineering. Society for Industrial and Applied Mathematics, 2012.
[39] A. Rieder. Keine Probleme mit inversen Problemen: Eine Einführung in ihre stabile Lösung. Vieweg+Teubner Verlag, 2003.
[40] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen. Variational Methods in Imaging, volume 167 of Applied Mathematical Sciences. Springer, New York, 2009.
[41] T. Schuster, B. Kaltenbacher, B. Hofmann, and K. Kazimierski. Regularization Methods in Banach Spaces. Radon Series on Computational and Applied Mathematics. De Gruyter, 2012.
[42] K. Ščupáková, V. Terzopoulos, S. Jain, D. Smeets, and R. M. Heeren. A patch-based super resolution algorithm for improving image resolution in clinical mass spectrometry. Scientific Reports, 9(1):1–11, 2019.
[43] L. Wu, Y. Zhang, G. Steinberg, H. Qu, S. Huang, M. Cheng, T. Bliss, F. Du, J. Rao, G. Song, L. Pisani, T. Doyle, S. Conolly, K. Krishnan, G. Grant, and M. Wintermark. A review of magnetic particle imaging and perspectives on neuroimaging. American Journal of Neuroradiology, 2019.
[44] Z. Zhao, J. Yang, and M. Jiang. A fast algorithm for high order total variation minimization based interior tomography. Journal of X-Ray Science and Technology, 23:349–364, 2015.
A Proofs of Section 3.2
We now provide a proof of Lemma 3.3, which is a classical result from linear operator theory.
Proof.
The definition of the functional T and the properties of P yield that T is proper. Due to the linearity of P and the triangle inequality, it holds for any a ∈ (0, 1) and x, y ∈ Y that

f((1 − a)x + ay) = ‖P((1 − a)x + ay) − s_calib‖ = ‖P((1 − a)x) − (1 − a)s_calib + P(ay) − a s_calib‖
                 ≤ (1 − a)‖P(x) − s_calib‖ + a‖P(y) − s_calib‖ = (1 − a)f(x) + a f(y)

with f(s) := ‖P(s) − s_calib‖. Since f is convex, also f² ≡ T is convex. The weak lower semi-continuity of the functional T follows from the weak lower semi-continuity of the Hilbert space norm and the continuity of P.

We now include a proof of Lemma 3.4, which also follows by standard arguments.

Proof.
Using the linearity of B in both arguments, we obtain

‖B(c + x, s + y) − B(c, s) − B(c, y) − B(x, s)‖ = ‖B(x, y)‖ ≤ C ‖x‖ ‖y‖ ≤ C ‖(x, y)‖².

It thus follows that

lim_{‖(x,y)‖_{X×Y} → 0} ‖B(c + x, s + y) − B(c, s) − B′(c, s)(x, y)‖_Z / ‖(x, y)‖_{X×Y} = 0

for B′(c, s)(x, y) = B(x, s) + B(c, y).

B Proofs of Section 3.4
Proof of Theorem 3.3.
Proof.