[PDF] Multi-Regularization Reconstruction of One-Dimensional T_2 Distributions in Magnetic Resonance Relaxometry with a Gaussian Basis

Abstract

We consider the inverse problem of recovering the probability distribution function of T_2 relaxation times from NMR transverse relaxometry experiments. This problem is a variant of the inverse Laplace transform and hence ill-posed. We cast this within the framework of a Gaussian mixture model to obtain a least-square problem with an L_2 regularization term. We propose a new method for incorporating regularization into the solution; rather than seeking to replace the native problem with a suitable mathematically close, regularized, version, we instead augment the native formulation with regularization. We term this new approach 'multi-regularization'; it avoids the treacherous process of selecting a single best regularization parameter \lambda and instead permits incorporation of several degrees of regularization into the solution. We illustrate the method with extensive simulation results as well as application to real experimental data.

Full PDF

MMulti-Regularization Reconstruction of One-Dimensional T Distributions in Magnetic Resonance Relaxometry with aGaussian Basis

Chuan Bi , M. Yvonne Ou , Wenshu Qian , Kenneth W. Fishbein , MustaphaBouhrara , and Richard G. Spencer National Institute on Aging, Baltimore, MD 21224, U.S.A Department of Mathematical Sciences, University of Delaware, Newark, DE 19716,[email protected], [email protected], [email protected], ﬁ[email protected],[email protected],[email protected]

We consider the inverse problem of recovering the probability distribution func-tion of T relaxation times from NMR transverse relaxometry experiments. Thisproblem is a variant of the inverse Laplace transform and hence ill-posed. We castthis within the framework of a Gaussian mixture model to obtain a least-squareproblem with an L regularization term. We propose a new method for incorpo-rating regularization into the solution; rather than seeking to replace the nativeproblem with a suitable mathematically close, regularized, version, we insteadaugment the native formulation with regularization. We term this new approach’multi-regularization’; it avoids the treacherous process of selecting a single bestregularization parameter λ and instead permits incorporation of several degrees ofregularization into the solution. We illustrate the method with extensive simula-tion results as well as application to real experimental data. UPDATED: March 3, 2021

In the study of magnetic resonance (MR) and magnetic resonance imaging (MRI), measurementsand determination of T relaxation times by magnetic resonance spectroscopy (MRS) is one of the mainapplications. Physically speaking, T relaxation time represents the time constant for decay of transversemagnetization for speciﬁc tissue in atomic levels, therefore, due to the fact that diﬀerent materials possesstheir unique T values, the inverse problem is then considered to be recovering those values from dataacquired by the noninvasive MR tests. Signals obtained from measurement usually exhibit behaviorsof mono- or multi-exponential decay whose governing equation can be initially expressed as weightedsummation of exponential functions: y ob ( t ) = M (cid:88) i =1 c i e − t/T ,i (1.1)where c i and T ,i represent characteristics (concentration and relaxation time) of the i -th material in thesample. The addition parameter M denotes the pre-determined number of components.1 a r X i v : . [ s t a t . M E ] M a r s an alternative to the model in Eq. (1.1), a more general representation in the form of Fredholmintegral equation of the ﬁrst kind, is introduced: y ob ( t ) = (cid:90) T , max T , min G ( t, T ) f ( T ) dT (1.2)where G ( t, T ) = e − tT is called the kernel and f ( T ) ≥ T relaxation time distribution .As a consequence, Eq. (1.1) can be seen as a special case for Eq. (1.2) where the T distribution f ( T )becomes summation of M Dirac Delta functions at T = T ,i . On the other hand, Equation (1.2) canalso be recognized as a truncated Laplace transform, whose inverse transform is a well-known ill-posedproblem. In this case, the associated inverse problem is to determine the distribution f ( T ) given discretemeasurements from y ob ( t ). Practically, the observable data y ob ∈ R m comes as discretized values and thegoal is to determine a discretized version of the T distribution f true ∈ R n subject to the forward map y ob = Af + n , f ≥ (1.3)where A ∈ R m × n is the kernel matrix with entries A i,j = e − tiT ,j ∆ T , and n is denoted as additive noise.The positivity constraint f ≥ is physically intrinsic in the sense that, all T values of any material arealways non-negative. Inheriting the smoothing property of the integral operator in (1.2), A is a smoothingoperation whose condition number can be so large that any direct inversion without regularization such asusing pseudo-inverse or non-negative least squares (NNLS) can be extremely unstable.From the perspective of inverse problems, determination of discrete T distribution f can be seen asa constrained version of inverse Laplace transform and integral equation whose ill-posedness can be oftenaddressed by regularizations. And in inverse problems, Tikhonov-typed Regularization is often consideredas a powerful tool in stabilizing such ill-posedness. It is performed by minimizing the cost functioninvolving observable data and one regularization term which requires a regularization parameter λ . Theregularization parameter needs to be appropriately chosen so as to reach a balance between the modelerror and solution norm. In magnetic resonance applications, the classical Tikhonov regularization of reconstructing f given A and y ob reads f λ = argmin f ≥ (cid:110) (cid:107) Af − y ob (cid:107) + λ (cid:107) f (cid:107) (cid:111) . (1.4)Again the regularization parameter λ acts as a trade-oﬀ between the the size of regularized solutions and theﬁt to observable data. Additionally, prior knowledge of the solution f , such as sparsity or smoothness thatdepends on the actual application, can be applied to the minimization problem by varying the regularizingterm (cid:107) f (cid:107) to (cid:107) f (cid:107) p with p ≥ λ has been studied for decades, however, there is nouniversal approach. Classical methods such as L-curve [11, 10, 8, 9, 3], generalized cross-validation (GCV)[5, 19, 20], discrepancy principle (DP) [13, 14, 15] and so on are used to choose one λ by certain criteria.For example, in L-curve method, the optimal λ lies near the corner of the L-shaped curve practically, andthe L-curve is the log-log plot of the solution norm against residual norm; Selecting λ using GCV is tolocate the minimum of the GCV function; DP seeks λ such that the residual norm is proportional to thenorm of noise by a ﬁxed constant. All existing parameter selection methods seek only one optimal λ andits corresponding regularized solution, and discard all others obtained from regularization with diﬀerentvalues of λ . A survey of the methods can be found in [1]. Distinct from all current methods, we propose anew method termed Multi-Regularization (Multi-Reg) in section 2. Recovered approximation by Multi-Regis formed by a linear combination of regularized solutions across a range of proposed λ ’s, as oppose toselecting one λ and its associated approximation. 2 .3 Gaussian Mixture Model and Application to Determining T Distributions

Gaussian mixture model is an approach of representing a probability density function by a weightedsum of Gaussian distribution functions [17]. Gaussian mixture models are often considered as a means ofunsupervised learning problems where a cluster of data is assumed to be governed by a Gaussian densityfunction g ( µ i , σ i ), with µ i and σ i represent the mean and standard deviation of a single Gaussian densityin one dimension. In the application of determining the one-dimensional T relaxation distribution, theidea of representing the distribution f ( T ) in terms of a weighted sum of Gaussian density functions, is tosolve for the unknowns µ i , σ i so that the problem of determining a T distribution can be formed as annon-linear least squares problem: { σ i , µ i } Mi =1 = argmin { σ i ,µ i } (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) y ob − A M (cid:88) i =1 g ( σ i , µ i ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (1.5)where g ( σ i , µ i ) denote the discretized Gaussian density function with standard deviation σ i and mean µ i . This approach has the advantage of signiﬁcantly reducing the size of unknowns [16]. However, thisnonlinear form requires heavily on the prior knowledge of how many Gaussian functions are needed torepresent the end distribution, and the intrinsic ill-posedness of the linear problem in Eq. (1.3) is altered,which is beyond the scope of study in this paper.Instead, it is still possible to treat the problem of determination of T distribution based on the Gaussianmixture assumption as simply a least squares problem by providing a dictionary of Gaussian functions tochoose from: c ∗ = argmin c ≥ , (cid:80) i c i =1 (cid:107) y ob − ( AG ) c (cid:107) , (1.6)where columns of matrix G , termed { g i } , represent diﬀerent Gaussian distributions and c is the vectorof coeﬃcients assigned to each proposed distribution. The least squares problem is then considered asﬁnding the coeﬃcients of the proposed Gaussian distribution functions such that problem (1.6) is satisﬁed.The problem is usually severely ill-posed due to the non-orthogonality of Gaussian functions that form thedictionary, and can lead to extremely unstable solutions. In general, in the case of a given underlying function f true ≥ , a common procedure of simulation aimsto test the performance of a certain parameter selection method, is performed following the steps below:1. Generate noiseless observation: y clean = Af true .2. Manually add noise with pre-selected SNR: y noisy = y clean + n .3. Solve Tikhonov regularization problem (1.4) with diﬀerent proposed λ j to get regularized solution f λ j .4. Choose one regularized solution f λ ∗ according to certain parameter selection method.5. Compare the end result f λ ∗ with the underlying distribution f true .The common ground for all existing parameter selection methods is that, the “optimal” parameteris chosen among a list of proposed values and the associated regularized solution follows. That is thesame way to say that, in the view of all proposed regularized solutions, existing methods assign coeﬃcients[0 , , · · · , , , , · · · ,

0] to the candidates, in which at index i , the i -th regularized solution is selected. How-ever, in general, there is no best method of parameter selection, and depending on the actual applicationsof the inverse problems, various methods perform diﬀerently. This means that there is no guarantee that3he selected i -th regularized solution according to a conventional method ought to be the optimal whileother methods might return diﬀerent values.To put the problem of parameter selection in a grater framework, that is, the end solution is consideredto be a vector of coeﬃcients assigned to diﬀerent regularized solutions. So the end solution is not restrictedto sparse solutions such as [0 , , · · · , , , , · · · , f true is given, we can certainly obtainbetter recovery than classical parameter selection methods for (1.4) with a little help of “cheating”, whichis by solving for { α j } N that satisfy f true ≈ N (cid:88) j =1 α j f λ j (2.1)where { α j } N solve the following least square problem( α , α , · · · , α N ) = argmin α ≥ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) f true − N (cid:88) j =1 α j f λ j (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (2.2)In other words, from the perspective of Eq. (2.2), all traditional parameter selection methods only arriveat a single regularized solution f J that corresponds to special choices of { α j } N where α J = 1 for some J and all other α j = 0 for j (cid:54) = J . Given the observation above, we wonder if similar idea can be appliedto this inverse problem in practice where f true is not known, as a result, the problem now becomes todetermine { α j } N while the underlying f unknown is not given: f unknown ≈ N (cid:88) j =1 α j f λ j , where α j ≥ (cid:8) f λ j (cid:9) N as one set of representing functions for f unknown as in (2.3), we also takeadvantage of the assumption that f unknown can be represented by linear combinations of Gaussian functions { g i } M in order to be consistent with the non-negativity f unknown ≥ . Therefore the other representationof f unknown yields f unknown = M (cid:88) i =1 c i g i , where c i ≥ . (2.4)Additional information maybe available depending on the applications. For MR relaxometry, f ( T ) isregarded as a probability distribution of T times, whose integral over the whole T space equals 1. In Eq.(2.4) this means (cid:80) i c i ≡

1. To link the two expressions (2.3) and (2.4), we propose to approximate theregularized solutions f λ j as a linear combination of the regularized solutions { g ij } Nj =1 , where noise pollutedGaussian g ij is obtained after step 1 - 3 for each Gaussian function g i with respect to λ j . Given noisy observation y ob ∈ R m , we obtain the regularized solutions f j =: A − λ j y ob (as a shortof f λ j ) by solving (1.4). Note that the inversion operator A − λ j is symbolic and not the usual form ofpseudoinverse because of the non-negativity constraints. We ﬁrst assume the underlying distribution f unknown can be written as a linear combination of f j , and the goal is to seek coeﬃcients { α j } N as well as f unknown according to f unknown ≈ N (cid:88) j =1 α j f j =: f α (2.5)and note that both f unknown and α j need to be determined.4n addition, if a Gaussian distribution g i ∈ R n is provided, it can be approximated as a linear combi-nation g ij ∈ R n , which are regularized solutions after the procedure g i → Ag i → Ag i + n → g ij or simplywritten as g ij = A − λ j ( Ag i + n ) . (2.6)Then follow the intuition of Multi-Reg g i ≈ N (cid:88) j =1 β ij g ij (2.7)where random noise realizations n share the same SNR tailored to the experimental data y ob . In fact, as n is random, the corresponding g ij and β ij are random variables whose values diﬀer for each diﬀerent n .So we average both g ij and β ij over n run times of noise realizations to get the mean values of g ij and β ij .Numerically the coeﬃcients β ij can be obtained by solving the LS problem β ij = argmin β ij ≥ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) g i − N (cid:88) j =1 β ij g ij (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) , i = 1 , , · · · , M (2.8)Using the presumptive representation for our distribution f unknown that can be seen as sum of proposedGaussian functions { g i } Mi =1 , we have f unknown ≈ M (cid:88) i =1 c i g i =: f c (2.9)and note that coeﬃcients c i need to be determined since f unknown is not known. By the assumption of(2.9) that the original distribution is represented in terms of Gaussian functions, we can further assumeeach regularized recovery f j can be represented by regularized solutions g ij as well: f j ≈ M (cid:88) i =1 x ij g ij (2.10)so coeﬃcients x ij satisfy the following LS problem, where each minimization is over index i with oneminimization problem for each j : x ij = argmin x ij ≥ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) f j − M (cid:88) i =1 x ij g ij (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) , j = 1 , , · · · , N (2.11)In conclusion, once the dictionary of Gaussian distributions { g i } Mi =1 is deﬁned and given the noisy data y ob , and the statistics of the noise in the experiment is known, we would be able to calculate the sets ofcoeﬃcients { β ij } and { x ij } from the regularized solutions { f j } and { g ij } and our task becomes to seekcoeﬃcients { α j } N , { c i } M in (2.5) and (2.9).Now equating the equations (2.5) and (2.9) together with (2.7) and (2.10), we arrived at an equationin terms of the obtained variables β ij , x ij and g ij : N (cid:88) j =1 α j f j ≈ M (cid:88) i =1 c i g i ⇒ f α = N (cid:88) j =1 α j M (cid:88) i =1 x ij g ij ≈ M (cid:88) i =1 c i N (cid:88) j =1 β ij g ij = f c (2.12)Then we can solve for α j and c i at the same time by solving the following LS problem  ( α ∗ , c ∗ ) = argmin (cid:107) f α − f c (cid:107) subject to α ∗ ≥ , c ∗ ≥ , and (cid:88) i c ∗ i = 1 , (2.13)5ote that the solution ( α ∗ , c ∗ ) is unique, provided the additional equality constraint on (cid:80) Mi =1 c ∗ i = 1.Hence the ﬁnal recovery f ∗ will be f ∗ = N (cid:88) j =1 α j f j or f ∗ = M (cid:88) i =1 c i g i (2.14) Before diving into the detailed computations, some notations shall be declared: L α = (cid:2)(cid:2) g g · · · g M (cid:3) , · · · , (cid:2) g N g N · · · g MN (cid:3)(cid:3) ∈ R m × MN (2.15) L c = (cid:2)(cid:2) g g · · · g N (cid:3) , · · · , (cid:2) g M g M · · · g MN (cid:3)(cid:3) ∈ R m × MN (2.16)note that the column size of L α is N M and L c is M N , the row size of two matrices are the same, whichis the number of nods for T . x vec = (cid:2)(cid:2) x x · · · x M (cid:3) , · · · , (cid:2) x N x N · · · x MN (cid:3)(cid:3) ∈ R MN (2.17) β vec = (cid:2)(cid:2) β β · · · β N (cid:3) , · · · , (cid:2) β M β M · · · β MN (cid:3)(cid:3) ∈ R MN (2.18)where the sizes of x vec and β vec are same as number of columns for L α and L c , respectively. α vec = (cid:2)(cid:2) α α · · · α α (cid:3) M , · · · , (cid:2) α N α N · · · α N α N (cid:3) M (cid:3) T (2.19) c vec = (cid:2)(cid:2) c c · · · c c (cid:3) N , · · · , (cid:2) c M c M · · · c M c M (cid:3) N (cid:3) T (2.20)where α vec ∈ R MN and c vec ∈ R MN share the same size with x vec and β vec as well.According to (2.5) and (2.9) f α = L α · diag ( x vec ) · α vec (2.21) f c = L c · diag ( β vec ) · c vec (2.22)hence the minimization problem (2.13) can be explicitly written as  ( c ∗ , α ∗ ) = argmin (cid:107) L α · diag ( x vec ) · α vec − L c · diag ( β vec ) · c vec (cid:107) subject to c i ≥ , α j ≥ , (cid:88) i c i = 1 (2.23)Assume the solution is stacked in the form of s = [ α , α , · · · , α N , c , c , · · · , c M ] T ∈ R N + M (2.24)problem (2.23) can be further simpliﬁed to the classical form of constrained least square problem: ﬁnd s ∗ ∈ R N + M such that  s ∗ = argmin s ≥ (cid:107) Bs (cid:107) such that N + M (cid:88) j = N +1 s j = 1 (2.25)where B = L α · diag ( x vec ) · TT α − L c · diag ( β vec ) · TT c ∈ R m × ( N − M ) (2.26) TT α = I N ⊗   M · (cid:2) I N N × M (cid:3) ∈ R MN × ( N + M ) (2.27)6 T c = I M ⊗   N · (cid:2) M × N I M (cid:3) ∈ R MN × ( N + M ) (2.28) Remarks:

1. The computational procedure can be divided into two parts: oﬄine computation and online com-putation. For oﬄine computation, { g i } , { g ij } and { β ij } will be determined for only once. As foronline part, each time a noisy measurement y ob is given, the corresponding { f j } , { x ij } and ( c ∗ , α ∗ )can be obtained and ﬁnal recovery is denoted as (2.14).2. To solve for { β ij } in (2.7), note that g ij depends on the noise and for a ﬁxed g i , the resulted β ij depends on the added noise n as well, in other words, both g ij and β ij are actually random variables.One could perform the same task described in (2.7) for n run times and take the mean values for g ij and β ij .The pseudocode for oﬄine and online computations read as follows: Algorithm 1:

Oﬄine Computation

Data:

Pre-determined { λ j } , { g i } , n run and noise SN R

Result: { g ij } and { β ij } to be storedinitialization; for k = 1 , , · · · , n run dofor i = 1 , , · · · , M doy ( i )clean = A g i ; y ( i )noisy = y ( i )clean + n ; for j = 1 , , · · · , N dog ij = argmin g ≥ (cid:26)(cid:13)(cid:13)(cid:13) Ag − y ( i )noisy (cid:13)(cid:13)(cid:13) + λ j (cid:107) g (cid:107) (cid:27) ; endend Average g ij to get g ij ; β ij = argmin β ij ≥ (cid:13)(cid:13)(cid:13) g i − (cid:80) Nj =1 β ij g ij (cid:13)(cid:13)(cid:13) . endAlgorithm 2: Online Computation

Data: y ob , { λ j } , { g ij } , { β ij } Result: f ∗ initialization; for j = 1 , , · · · , N dof j = argmin f ≥ (cid:110) (cid:107) Af − y (cid:107) + λ j (cid:107) f (cid:107) (cid:111) ; x ij = argmin x ij ≥ (cid:13)(cid:13)(cid:13) f j − (cid:80) Mi =1 x ij g ij (cid:13)(cid:13)(cid:13) ; end ( c ∗ , α ∗ ) = argmin c , α ≥ , (cid:80) i c i =1 (cid:13)(cid:13)(cid:13)(cid:80) Nj =1 α j (cid:80) Mi =1 x ij g ij − (cid:80) Mi =1 c i (cid:80) Nj =1 β ij g ij (cid:13)(cid:13)(cid:13) ; f ∗ = (cid:80) j α ∗ j f j or f ∗ = (cid:80) i c ∗ i g i Case study: inversion of one-dimensional continuous NMR re-laxometry

The one-dimensional continuous NMR relaxometry signal admits the form [2, 12, 21, 6] y ob ( t ) = (cid:90) ∞ e − tT f true ( T ) d T + n ( t ) (3.1) y ob ( t ) denotes the observational signal at time t , T are relaxation times, f true ( T ) ≥ T distribution function and n ( t ) is the additive Gaussian noise for which the statistics of thenoise can be obtained. The truncated forward problem of (3.1) y ob ( t ) = (cid:90) T end T start e − tT f true ( T ) d T + n ( t ) (3.2)can be discretized as (1.3) y ob = Af true + n , f true ≥ (3.3)where y ob ∈ R m , f true ∈ R n , n ∈ R n , and entries of matrix A ∈ R m × n is given by A ij = e − tiT ,j ∆ T . Inthis case, we deﬁne the SNR of the noise as followsSNR = max | y ob | RMS( n ) (3.4)Gaussian Mixture Model (GMM) is considered as as a powerful tool to represent an arbitrary probabilitydistribution by a weighted sum of Gaussian distributions [16, 4, 7] and GMM works for reconstruction of T distribution because of the nature of non-negativity for f true . Therefore the associated inverse problemto (1.3) for Multi-Reg is f j = argmin f ≥ (cid:110) (cid:107) Af − y ob (cid:107) + λ j (cid:107) f (cid:107) (cid:111) (3.5)subject to the prior information that f can be represented as a linear combination of Gaussian distributions. In order to proceed with the oﬄine computation, a few parameters need to be pre-determined, such asthe number of regularization parameters N , number of Gaussian functions M , choice of Gaussian functions g i , and the number of noise realizations n run added in the time-domain representation for each simulateddistribution in (2.6).We use the following biexponential form of f sim ( T ) in the T domain for simulation f sim ( T ) = 1 (cid:112) πσ e − ( T − µ σ + 1 (cid:112) πσ e − ( T − µ σ (3.6)where µ is set to 20ms, and µ is determined by the ratio of peak separation (RPS) µ µ , which consists of8 evenly spaced values between 1 and 8; σ = σ which contains 10 evenly spaced values between 2 and 8.There are 8 ×

10 = 80 diﬀerent biexponential distributions in total.Discretized T values consist n = 200 evenly spaced nodes within the range [1 , y sim for truncated forward problem y sim = Af sim + n (3.7)is deﬁned on t ∈ [0 . , m = 150 evenly spaced nodes, with A ij = e − tiT ,j . The SNR of the additivenoise is set to be 500. For the Tikhonov regularization (3.5) used in Multi-Reg, the range of regularizationparameter is from 10 − to 10. 8o test performance of Multi-Reg on the 80 simulated signals y sim , we represent the simulation resultusing heat maps to indicate the relative error over a wide range of Gaussian peak separations and widthswhere the relative error is deﬁned as ε = (cid:107) f ∗ − f sim (cid:107) (cid:107) f sim (cid:107) (3.8)where f ∗ is the reconstructed distribution using Multi-Reg [18]. Moreover, the criteria for evaluating theperformance of each setting for Multi-Reg is the mean value of the heat map matrix (where the ( i, j )-thelement of the matrix stands for the relative error corresponds to the i -th σ and j -th RPS).To achieve good settings of Multi-Reg, we study diﬀerent possible conﬁgurations of the dictionary { g i } M by varying the total number M , the standard deviations σ k for the Gaussian functions, as well asthe number of regularization parameters n λ , and number of repetitive tests n run . Since there are inﬁnitesuch compositions, there is no way to analyze each possible conﬁguration numerically, so we tested a fewpossible diﬀerent settings and pick the one that has the smallest mean error of the heatmaps. In general,we tested 7 groups of dictionaries, for each dictionary { g i } M in group k , the standard deviations σ k of allGaussian functions g i equals k , and the mean values of all Gaussian distributions are set to be equallyspaced along the T axis depending on the number M , which means that, as M increases, the mean valuesof Gaussian functions become closer. For each group, we tested various settings, as are shown below. M Diﬀerent M in the ﬁnite set { g i } M with uniform standard deviation σ k = k for each g i and for k = 1 , , · · · ,

7. To do this, for a given value of M , we vary the number of Gaussians by means of changingthe number of evenly spaced µ i within the range of T ∈ [1 , g i . The proposed values of M arechosen within { , , , , , , , , } , the performance (indicated by the mean valuesof the heatmaps) of Multi-Reg under the diﬀerent settings of { g i } M . Other variations are ﬁxed where thenumber of repetitive noise realizations n run = 10, and number of regularization parameters n λ = 10.

100 120 140 160 180 200 220 240 260 280

Number of Gaussians M ean V a l ue s o f H ea t m ap s = 1 = 2 = 3 = 4 = 5 = 6 = 7 Figure 1: Mean values of heat maps versus number of Gaussians M in { g i } M , i.e. the number of evenly spaced µ i for each g i with σ k = k where k = 1 , , · · · , σ k = 7 and with M = 220 as our proposed set ofGaussian distributions, simply because it shows the smallest averaged relative error among all other con-ﬁgurations. 9 .1.2 Choosing n λ The regularization parameters are chosen to be the n λ logarithmically spaced points between 10 − and10. Unlike in L-curve or GCV that a ﬁner grid of regularization parameters is preferred than a coarse one,a good number of regularization parameters n λ for Multi-Reg should seek for a balance between reducingthe ill-posedness for solving the LS problems (2.8) and (2.11) and expanding the space spanned by { f j } .This means on one hand, n λ should be small so that solutions to (2.8) and (2.11), { β ij } and { x ij } arestable; on the other hand, n λ should be large so that each f j may be able to show various regularized results.While we ﬁx all other settings of the method, where the M = 220 and n run = 10, we plot the trajectoriesof mean values of the heatmaps against various numbers of n λ . Number of Gaussians M ean V a l ue s o f H ea t m ap s = 1 = 2 = 3 = 4 = 5 = 6 = 7 Figure 2: L norms of heat maps for diﬀerent number n λ ofregularization parameters, where the regularization parametersrange from 10 − to 10.We see that the “best” scenario is when n λ = 12 for σ k = 7, as it resulted in a smaller mean values ofthe heatmaps. n run To choose n run , we proposed a set of values { , , , , , , } to choose from while ﬁxing all otherparameters where n λ = 12, M = 220. We plot the trajectories of the mean values as functions of n run ,shown in Fig. 3. 10

10 20 30 40 50 60 70 n run M ean V a l ue s o f H ea t m ap s = 1 = 2 = 3 = 4 = 5 = 6 = 7 Figure 3: Mean values of the heat maps against number of noiserealizations.It is seen from the trajectories that, when σ k = 6, and the associated n run = 32, the mean values ofheatmaps are minimal among all conﬁgurations considered for the test of choosing n run .In summary, during the limited number of testings, the Gaussian function dictionary is chosen to beformed as follows: M = 220, n λ = 12, n run = 32 and with uniform standard deviation σ = 6. After the preferred settings of Multi-Reg is determined, we would be ready to proceed online com-putations once new observational data y ob is given. We will use heat maps to represent the diﬀerencesbetween various underlying distributions and recoveries using Multi-Reg, together with L-curve and GCV.For L-curve and GCV recovery, we follow used Tikhonov Regularization to determine coeﬃcients of thesame given Gaussian basis { g i } as is used in the oﬄine computation for Multi-Reg, and increased thenumber of regularization parameter to 40 evenly logspaced nodes from 10 − to 10, in order to achieve ahigher resolution of L-curve and GCV curve. After the simulation, we averaged the the heat maps as wellas their corresponding L norm over 40 times. Each time of simulation, we manually added diﬀerent noiserealizations with same SNR. Results are shown below:11 ulti-Reg Recoveries Ratio of Peak Separation G au ss i an (a) GCV Recoveries

Ratio of Peak Separation G au ss i an (b) L-curve Recoveries

Ratio of Peak Separation G au ss i an (c) Figure 4: Comparisons of the heat maps of recoveries using Multi-Reg, GCV and L-curve, on noisy signalsgenerated by various distributions.To better compare the results between Multi-Reg and the other two methods, and indicate regimes indetail, we plot the fractions of the relative-error heat maps matrices:

Multi-RegGCV and

Multi-RegL-curve as below, i.e.in the regime where the ratio is less than 1, Multi-Reg performs better and otherwise when the ratio islarger than 1, in the sense of relative errors. 12 omparison with GCV

Ratio of Peak Separation G au ss i an Figure 5: Ratio of average relative errors of Multi-Reg/GCV methods, over 20 noise realizations withSNR=500, corresponding to a range of peak widths and separations for two Gaussian peaks of equalstandard deviation.

Comparison with L-curve

Ratio of Peak Separation G au ss i an Figure 6: Ratio of average relative errors of Multi-Reg/L-curve methods, over 20 noise realizations withSNR=500, corresponding to a range of peak widths and separations for two Gaussian peaks of equalstandard deviation. 13

20 40 60 80 100 120 140 160 180 200