Solution of the nonlinear inverse scattering problem by T-matrix completion. I. Theory
Howard W. Levinson
Department of Mathematics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA∗

Vadim A. Markel†
Aix-Marseille Université, CNRS, Centrale Marseille, Institut Fresnel UMR 7249, 13013 Marseille, France

(Dated: August 20, 2018)

∗ [email protected]
† On leave from the Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; [email protected], [email protected]

We propose a conceptually new method for solving nonlinear inverse scattering problems (ISPs) such as are commonly encountered in tomographic ultrasound imaging, seismology and other applications. The method is inspired by the theory of nonlocality of physical interactions and utilizes the relevant formalism. We formulate the ISP as a problem whose goal is to determine an unknown interaction potential V from external scattering data. Although we seek a local (diagonally-dominated) V as the solution to the posed problem, we allow V to be nonlocal at the intermediate stages of iterations. This allows us to utilize the one-to-one correspondence between V and the T-matrix of the problem, T. Here it is important to realize that not every T corresponds to a diagonal V and we, therefore, relax the usual condition of strict diagonality (locality) of V. An iterative algorithm is proposed in which we seek a T that is (i) compatible with the measured scattering data and (ii) corresponds to an interaction potential V that is as diagonally-dominated as possible. We refer to this algorithm as the data-compatible T-matrix completion (DCTMC). This paper is Part I in a two-part series and contains theory only. Numerical examples of image reconstruction in a strongly nonlinear regime are given in Part II. The method described in this paper is particularly well suited for very large data sets that become increasingly available with the use of modern measurement techniques and instrumentation.

I. INTRODUCTION
Inverse scattering problems (ISPs) are encountered in optical diffusion tomography [1, 2], diffraction tomography [3, 4], electrical impedance tomography [5-7], in near-field [8-10] and far-field [11-13] tomographic electromagnetic imaging, in seismic tomography [14, 15], and in many other physical and engineering applications. Solving nonlinear ISPs is a difficult computational task, especially in three dimensions. This is even more true for problems involving the large data sets that are available with the use of modern experimental techniques. Developing efficient algorithms for solving nonlinear ISPs remains a fundamental problem of computational physics and an important challenge.

Nonlinear ISPs are amply reviewed in the literature [16-20]. The mainstream approach to solving these problems numerically is Newton's method and its variants such as the Levenberg-Marquardt method, the iteratively regularized Gauss-Newton method, the Newton-Kantorovich method and steepest descent (Landweber iteration). These methods (except for Newton-Kantorovich) are succinctly explained in [21]. Newton-Kantorovich iterations are closely related [22] to the method of inverse Born series [7, 23, 24]. A different class of non-deterministic inversion approaches that make use of some form of prior knowledge about the medium is based on Bayesian inference [25]. The common feature of all these approaches (except for the inverse Born series) is that a certain cost function is minimized and updated iteratively, and that this cost function depends on all available measurements (data points). In the case of the inverse Born series, the solution is obtained as an analytically-computable functional of the data.

The method proposed in this paper is conceptually different from the methods reviewed above and is based on a digression into a seemingly unrelated field of physics, namely, into the theory of nonlocality. This theory accounts for the fact that certain physical processes occurring at the point r in space can be influenced by the field in some finite vicinity of that point. For example, in local electrodynamics, Ohm's law is written as J(r) = σ(r) E(r). In a nonlocal theory, this linear relation is generalized by writing J(r) = ∫ V(r, r′) E(r′) d³r′. Of course, we expect on physical grounds that V(r, r′) → 0 for |r − r′| > ℓ, where ℓ is the characteristic scale of nonlocality (the radius of influence), which is usually much smaller than the overall size of the sample. If the electric field E(r) does not change noticeably on the scale of ℓ, we can define the local conductivity as

    σ(r) = ∫ V(r, r′) d³r′ ,   (1)

and use Ohm's law in its local form. This is all well known in physics. However, the implications of nonlocality for nonlinear ISPs have not been considered so far.

Let us assume that we want to find σ(r) from measurements of the voltage drop for a direct current injected into the sample by two point-like electrodes attached to its surface at various points (the Calderon problem). It turns out that it is much easier to find a nonlocal kernel V(r, r′) that is consistent with the measurements. Of course, V(r, r′) can not be determined uniquely from a typical data set because the number of unknown parameters (degrees of freedom) in V(r, r′) is usually much larger than the number of measurements.
However, as explained above, we also expect that V(r, r′) should be approximately diagonal. We then proceed as follows:

(1) First, we define a class of kernels V(r, r′) that are compatible with the data. This is the only instance when the data are used, and it turns out that the size of the data set is not a limiting factor for this step.

(2) Then we iteratively reduce the off-diagonal norm of V(r, r′) while making sure that V(r, r′) remains within the class of "data-compatible" kernels.

(3) Once the ratio of the off-diagonal and diagonal norms of V(r, r′) is deemed sufficiently small, we compute σ(r) = ∫ V(r, r′) d³r′. This gives an approximate numerical solution to the nonlinear ISP.

The above algorithm can be generalized to other ISPs. We refer to it as the data-compatible T-matrix completion (DCTMC) method.

The role of the T-matrix in solving nonlinear ISPs has been recognized previously [14, 15, 26, 27]. The key new insight used in DCTMC is to relax the requirement that V be strictly diagonal. This allows one to establish a one-to-one correspondence between T and V. The first advantage of using this approach is that the T-matrix is source- and detector-independent. For example, finite-difference and finite-element forward solvers must be run anew for each source used. The T-matrix approach is free from this requirement. The price of this simplification is that the transformations between T and V involve inversion of dense matrices. However, the computational complexity associated with the T-to-V and V-to-T operations can be reduced, for example, by exploiting the sparsity of V. Second, the T-matrix-based approach results in a useful data reduction, which is applicable to both linear and nonlinear image reconstruction regimes. Finally, the method does not utilize a cost function in the traditional sense and therefore it is not affected by the problem of local minima of the cost function (false solutions).

We underscore that physical interactions are never truly local and some small degree of nonlocality exists in all physical systems. However, the radius of influence ℓ is typically so small (e.g., equal to the atomic scale) that the nonlocality can be safely ignored for most practical purposes. In our approach, we relax this condition and allow V to be off-diagonal on much larger scales. Of course, we will seek to find a V that is as diagonal as possible. However, we do not expect to eliminate all off-diagonal terms that are separated by more than one atomic scale, not to mention that such fine discretization of the medium is practically impossible. Thus, the nonlocality of V that is utilized in DCTMC is not an intrinsic physical property of the material but rather a physically-inspired trick that is used to facilitate the solution of nonlinear ISPs. In other words, we simplify the solution process by relaxing the underlying physical model.

This paper is Part I of a two-part series wherein we focus our attention on theory. Numerical examples for the nonlinear inverse diffraction problem are given in Part II [28]. The remainder of this paper is organized as follows. In Sec. II we state the general algebraic formulation of the nonlinear ISP that is applicable to many different physical scenarios. In Sec. III, we introduce the data-compatible T-matrix, which is a central idea of the proposed method. In Sec. IV, we define the basic iterative algorithm of DCTMC. In Sec.
V, we introduce "computational shortcuts", which combine analytically several steps of the DCTMC algorithm into a single step with a reduced computational complexity. The DCTMC algorithm in the linear regime is discussed in Sec. VI; here we also discuss convergence and regularization of the method. Sec. VII contains a brief discussion. Auxiliary information is given in several appendices. A summary of the linearizing approximations (first Born, first Rytov and mean-field) is given in Appendix A. Appendix B contains a derivation that establishes the correspondence between DCTMC and the conventional methods in the linear regime. Finally, the definitions and properties of several functionals used in this paper are summarized in Appendix C.

II. GENERAL FORMULATION OF THE ISP
Consider a linear operator L and the equation

    L u(r) = q(r) ,   (2)

where u(r) is a physical field and q(r) is the source term. Note that (2) does not contain time but can depend parametrically on frequency; it can be said that we work in the frequency domain. Moreover, we consider only a single fixed frequency. Using different working frequencies as additional degrees of freedom for solving an ISP can be very useful (especially if the contrast is approximately frequency-independent, as is often the case in seismic imaging) but is outside of the scope of this paper.

Let L = L₀ − V, where L₀ is known and V is the unknown interaction operator that we seek to reconstruct. As discussed above, we assume at the outset that V is an integral operator with the kernel V(r, r′) but, eventually, the computed image will be obtained as a function of r only. We also assume that V(r, r′) ≠ 0 only if r, r′ ∈ Ω, where Ω is the spatial region occupied by the sample. Our goal is to recover V from measurements of u performed outside of the sample, assuming that it is illuminated by various external sources. We can not perform measurements or insert sources inside the sample, which would have greatly simplified the ISP solution if it was physically possible.

The inverse of L is the complete Green's function of the system, denoted by G = L⁻¹. The formal solution to (2) is then u = Gq. We know that G exists as long as the forward problem has a solution; this is usually the case if V is physically admissible. Likewise, the inverse of L₀ is the unperturbed Green's function, denoted by G₀ = L₀⁻¹. The field u_inc = G₀ q is the incident field; in other words, it is the field that would have existed everywhere in space in the case V = 0. A nonzero V gives rise to a scattered field u_scatt, and the total field is the sum of the incident and scattered components, u = u_inc + u_scatt. A straightforward algebraic manipulation yields the following result:

    u_scatt = (G − G₀) q = G₀ (I − V G₀)⁻¹ V G₀ q ,   (3)

where I is the identity operator.

A single data point Φ(r_d, r_s) is obtained by illuminating the medium with a localized source of unit strength, q(r) = δ(r − r_s), and measuring the scattered field with a detector at the location r_d [29]. By scanning r_d and r_s over the measurement surfaces Σ_d and Σ_s outside of the sample, we measure a function of two variables Φ(r_d, r_s), which is coupled to V(r, r′) by the equation

    G₀ (I − V G₀)⁻¹ V G₀ = Φ .   (4)

All product and inversion operations in (4) should be understood in the operator sense. The ISP can now be formulated as follows: given a measured function Φ(r_d, r_s), where r_d ∈ Σ_d and r_s ∈ Σ_s, find an "approximately diagonal" kernel V(r, r′), where r, r′ ∈ Ω. We do not need to define "approximate diagonality" precisely at this point, but in the case of the matrices that are inevitably used in all computations, this requirement implies a sufficiently small ratio of the off-diagonal and diagonal norms.

It is important to note that G₀ in (4) is the same operator in all instances where it appears, but for the purpose of computing the operator products and inverses, its kernel G₀(r, r′) is differently restricted. This is illustrated graphically in Fig. 1. Thus, for the first factor G₀ on the left-hand side of (4), r = r_d ∈ Σ_d and r′ ∈ Ω; for the second factor (inside the brackets), both r, r′ ∈ Ω; and for the last factor, r ∈ Ω and r′ = r_s ∈ Σ_s. We emphasize that the imaging geometry shown in Fig. 1 is representative but not very general.
In particular, the measurement surfaces Σ_d and Σ_s can be larger or smaller than the face of the cube, or curved, or even be regions of space of finite volume rather than surfaces [30]. The sample volume Ω does not have to be cubic and, in an extreme case, it can be a two-dimensional surface. All this has no bearing on the method of this paper. The only requirement that we impose, which is physical rather than mathematical, is that Σ_d and Σ_s do not overlap with Ω. However, Σ_d can overlap with Σ_s.

Further, in all practical implementations, the data are sampled rather than measured continuously, and the medium is voxelized. An example of such a discretization is given in [28]. At this point we proceed under the assumption that (4) can be suitably discretized and converted to a matrix equation. In this case, it is logical to use different notations for the matrices that are obtained by different restriction and sampling of the kernel G₀(r, r′). Indeed, the matrices obtained in this manner are different and can even be of different size. We will denote the matrices obtained by sampling the first, the second, and the last factors G₀ in (4) by A, Γ and B, respectively. These notations are also illustrated in Fig. 1. Then the discretized version of (4) takes the following form:

    A (I − V Γ)⁻¹ V B = Φ .   (5)

In (5), A, B and Γ are known theoretically, Φ is measured, and we seek to find the unknown V.

Eq. (5) is the main nonlinear equation that is discussed in this paper. It is, in fact, very general and encompasses many different problems of imaging and tomography. The underlying physical model is encoded in the operator G₀ and in the matrices A, B and Γ that are obtained by sampling this operator. A minimal numerical sketch of this discretized forward model is given below.
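The following sketch illustrates the discretized forward model (5) in numpy. The matrix sizes and random matrix entries are purely illustrative assumptions, not values from the paper; A, Γ and B stand for the three differently restricted samplings of G₀.

```python
import numpy as np

# Toy sketch of the discretized forward model (5):
#   Phi = A (I - V Gamma)^{-1} V B
Nd, Ns, Nv = 16, 16, 64            # detectors, sources, voxels (illustrative)
rng = np.random.default_rng(0)
A = rng.normal(size=(Nd, Nv))      # G_0 sampled between Omega and Sigma_d
Gamma = rng.normal(size=(Nv, Nv))  # G_0 sampled inside Omega
B = rng.normal(size=(Nv, Ns))      # G_0 sampled between Sigma_s and Omega
V = np.diag(rng.normal(size=Nv))   # a local (diagonal) interaction

def forward_data(V):
    """Scattering data matrix Phi predicted by Eq. (5)."""
    I = np.eye(Nv)
    return A @ np.linalg.solve(I - V @ Gamma, V) @ B

Phi = forward_data(V)              # Nd x Ns data matrix
```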
The following three important remarks about this equation can be made.

Remark 1: Noninvertibility of A and B. If the matrices A and B were invertible in the ordinary sense, the nonlinear ISP would be solvable exactly by three operations of matrix inversion. Unfortunately, A and B are almost never invertible. To construct A and B of sufficiently high rank, one would need to perform measurements inside the medium. As was noted above, this is usually impossible. The typical sizes of all the matrices involved will be discussed below (see Fig. 2 and its discussion).

Remark 2: Linearization. One may seek a linearization of the ISP by approximating the left-hand side in (5) with various expressions that allow an analytical linearizing transformation. The three main approaches to achieve this end are the first Born, first Rytov and mean-field approximations, and they are briefly summarized in Appendix A. In the mathematical formulation of the ISP, the three approximations differ only by the transformation that is applied to the data matrix Φ, while the general form of the linearized equation is in all cases

    A V B = Ψ[Φ] ,   (6)

where Ψ[Φ] is the appropriate transformation of the data matrix; in the simplest case of the first Born approximation, Ψ[Φ] = Φ.

Remark 3: Matrix unrolling for the linearized problem.
The linearization approaches described in Appendix A do not require or enforce by design the diagonality of V. However, in the conventional treatments
of the problem, it is typical to assume that V is strictly diagonal and to operate with the vector |υ⟩ composed of the diagonal elements of V.

FIG. 1. (color online) Illustration of the imaging geometry. The symbols A, B, Γ and V in the rectangular frames denote the matrices obtained by restricting and sampling the kernels G₀(r, r′) and V(r, r′). The scattering diagram corresponds to the second-order term G₀VG₀VG₀ in the formal power-series expansion of the left-hand side in (4). Note that in the local limit V(r, r′) = D(r)δ(r − r′), the two arrows contract to two vertexes at r₁ = r′₁ and r₂ = r′₂.

Accordingly, the matrix Ψ is unrolled into a vector |ψ⟩ by the matrix operation known as vec, that is, by stacking the columns of Ψ into one column-vector. The resultant equation has the form

    K |υ⟩ = |ψ⟩ ,   (7)

where K is a matrix obtained by multiplying the elements of A and B according to the rule K_(mn),j = A_mj B_jn and (mn) is a composite index. The important point here is that the conventional methods often treat K in (7) as a matrix of the most general form. In contrast, the DCTMC algorithm takes account of the special algebraic structure of K and, therefore, can be used advantageously even in the linear regime. This is discussed in more detail in Appendix B; a minimal sketch of the unrolling (7) is given below.
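The sketch below, continuing the toy example above, shows how K of Eq. (7) is assembled from A and B; the rule K_(mn),j = A_mj B_jn amounts to a column-wise (Khatri-Rao-type) product.

```python
import numpy as np

def unrolled_kernel(A, B):
    """Matrix K of Eq. (7): K[(m, n), j] = A[m, j] * B[j, n].

    Rows are indexed by the composite index (m, n), with the detector
    index m varying fastest, matching vec() applied to the data matrix.
    """
    Nd, Nv = A.shape
    _, Ns = B.shape
    K = np.zeros((Nd * Ns, Nv), dtype=np.result_type(A, B))
    for n in range(Ns):
        # block of rows with fixed source index n
        K[n * Nd:(n + 1) * Nd, :] = A * B[:, n]  # broadcasts over columns j
    return K

# consistency check with the first Born approximation:
# vec(A @ np.diag(v) @ B) == unrolled_kernel(A, B) @ v
```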
III. T-MATRIX AND ITS REPRESENTATIONS; "EXPERIMENTAL" T-MATRIX

The basic definition of the T-matrix (which is, actually, an operator) is through the relation between the complete and the unperturbed Green's functions: G = G₀ + G₀TG₀. By direct comparison with (3) we find that

    T = (I − V G₀)⁻¹ V .   (8)

We will not use different notations for the operator T and its discretized version, which is truly a matrix. Consequently, Eq. (5) can be rewritten as

    A T B = Φ ,   (9)

where

    T = T[V] ≡ (I − V Γ)⁻¹ V = V (I − Γ V)⁻¹ .   (10)

Here we have defined the nonlinear functional T[·], which contains Γ as a parameter.

We can view (10) as a matrix formulation of the forward problem. If V is known, we can use (10) to compute T, and once this is accomplished, we can predict the result of a measurement by any detector due to any source by matrix multiplication according to (9). Therefore, computation of T yields the most general solution to the forward problem. The forward solution is usually known to exist if V is physically admissible. In the iterative process of DCTMC, we can ensure physical admissibility of V every time before the transformation T[V] is used. We can view this procedure as a particular type of regularization by imposition of physical constraints. If this type of regularization is used, one can be sure that the matrix inversion involved in computing T[V] is always well defined.

We can also formally invert T and write

    V = T⁻¹[T] ≡ (I + T Γ)⁻¹ T = T (I + Γ T)⁻¹ .   (11)

Much less is known about the existence of the inverse in (11). In other words, we do not know the conditions of physical admissibility of T apart from the general but not very useful symmetry property T_ij = T_ji. Certainly, T⁻¹[T] does not exist for all arguments T. In DCTMC, one of the possible approaches is to update V iteratively by using (11). In this case, the existence of the inverse is required. While we do not possess a general proof, numerical simulations for the inverse diffraction problem have encountered no singularities in (11). More importantly, the problem of invertibility of T does not arise at all if Computational Shortcut 2 is used (see Sec. V B below).

FIG. 2. (color online) Block diagram of Eq. (9) with the sizes of all matrices indicated. Here N_d and N_s are the numbers of detectors and sources and N_v is the number of voxels.

A block diagram of Eq. (9) with all matrix sizes indicated is shown in Fig. 2. Here N_d and N_s are the numbers of detectors and sources used (not necessarily equal) and N_v is the number of volume voxels. For a practical estimate of these numbers, refer to Fig. 1. Let the measurement surfaces Σ_d and Σ_s be identical squares located on the opposite sides of a cubic sample. Let the detectors and sources be scanned on L × L square grids and let the sample be discretized on an L × L × L cubic grid with the same pitch. Then N_d = N_s = L², N_v = L³. These estimates are typical but, admittedly, not very general. Still, in many practical cases we can expect that

    N_d, N_s ≪ N_v ≪ N_d N_s .   (12)

The first inequality in (12) illustrates Remark 1 of Sec. II because the matrices A and B are in this case clearly not invertible. The second inequality is important if we wish to compare DCTMC to some of the traditional approaches. A minimal numerical sketch of the transformations (10) and (11) is given below.
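A minimal sketch of the mutually inverse functionals (10) and (11), reusing the toy Γ from the earlier sketch (the values are illustrative, not from the paper):

```python
import numpy as np

def t_of_v(V, Gamma):
    """T = (I - V Gamma)^{-1} V, Eq. (10)."""
    I = np.eye(V.shape[0])
    return np.linalg.solve(I - V @ Gamma, V)

def v_of_t(T, Gamma):
    """V = (I + T Gamma)^{-1} T, Eq. (11); may fail for inadmissible T."""
    I = np.eye(T.shape[0])
    return np.linalg.solve(I + T @ Gamma, T)

# round trip: v_of_t(t_of_v(V, Gamma), Gamma) reproduces V
```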
For example, the conventional formulation of the linearized ISP starts from equation (7). As is explained in Remark 3, it is commonly assumed that K in (7) is a general matrix of the size N_dN_s × N_v (L⁴ × L³). However, the sizes of A and B are N_d × N_v (L² × L³) and N_v × N_s (L³ × L²), respectively. Computing numerically the pseudoinverse of K (if we do not account for its special algebraic structure as is described in Appendix B) is a much more computationally-intensive task than computing the pseudoinverses of A and B. Therefore, the relaxation of the strict requirement of diagonality of V allows one to work with the two much smaller "weight matrices" A and B instead of the one large "weight matrix" K.

We now turn to the central idea of DCTMC, namely, to the concept of data-compatibility of the T-matrix. To formulate the constraints that equation (9) places on T in a computationally-tractable form, consider the singular value decompositions of A and B:

    A = Σ_{μ=1}^{N_d} σ^A_μ |f^A_μ⟩⟨g^A_μ| ,   B = Σ_{μ=1}^{N_s} σ^B_μ |f^B_μ⟩⟨g^B_μ| .   (13)

Here σ^A_μ, |f^A_μ⟩ and |g^A_μ⟩ are the singular values and the corresponding singular vectors of A, and similarly for B. Note that |f^A_μ⟩ and |g^B_μ⟩ are vectors of length N_d and N_s, respectively, while |g^A_μ⟩ and |f^B_μ⟩ are both of length N_v, and we have assumed in (13) that N_d, N_s ≤ N_v. Using the orthogonality of the singular vectors, we obtain from (9) and (13)

    σ^A_μ σ^B_ν T̃_μν = Φ̃_μν ,   1 ≤ μ ≤ N_d ,  1 ≤ ν ≤ N_s ,   (14)

where

    T̃_μν ≡ ⟨g^A_μ| T |f^B_ν⟩ ,   1 ≤ μ, ν ≤ N_v ;   (15a)
    Φ̃_μν ≡ ⟨f^A_μ| Φ |g^B_ν⟩ ,   1 ≤ μ ≤ N_d ,  1 ≤ ν ≤ N_s .   (15b)

By T̃ we denote the T-matrix in the singular-vector representation, while the T that was used previously is the T-matrix in the real-space representation. The two representations are related to each other by the transformation

    T̃ = R*_A T R_B ≡ R[T] ,   T = R_A T̃ R*_B ≡ R⁻¹[T̃] ,   (16)

where R_A is the unitary matrix whose columns are the singular vectors |g^A_μ⟩ while R_B is the unitary matrix whose columns are the singular vectors |f^B_μ⟩. Equation (16) defines the pseudo-rotation functional R[·]. We note that R[·] is linear and invertible even though it is not equivalent to a conventional rotation because R_A ≠ R_B. It is useful to keep in mind that Φ̃ ≠ R[Φ]; as can be seen from the definition (15), Φ̃ is related to Φ by a similar transformation but with different unitary matrices.

We can now write the solution to (14) as follows:

    T̃_μν = Φ̃_μν / (σ^A_μ σ^B_ν) ,  if σ^A_μ σ^B_ν > ε ;
           unknown ,               otherwise .   (17)

Here ε is a small positive constant. If computations could be performed with infinite precision, we could have set ε = 0. In practice, we should take ε to be small but at least larger than the smallest positive floating-point constant for which a particular implementation of numerical arithmetic adheres to the IEEE standard.

FIG. 3. (color online) Left panel: elements of T̃ inside the shaded block can be computed from the data by using (17). Elements outside of the shaded block are not known and can not be in any way inferred from the data. Right panel: the initial guess for the T-matrix, T̃_exp. In this initial guess, we set the unknown elements of T̃ (in the singular-vector representation) to zero.
We note that, under the assumptions stated above, the condition σ^A_μ σ^B_ν > ε can be satisfied only for 1 ≤ μ ≤ N_d and 1 ≤ ν ≤ N_s; singular values σ^A_μ and σ^B_ν with indexes outside of these ranges are identically zero.

Eq. (17) summarizes our knowledge about the system that is contained in the data. There are few matrix elements of T̃ that are known with certainty. These matrix elements can be computed by the first expression in (17). The other matrix elements can not be determined from equation (9): we can vary these unknown elements arbitrarily, and the error of (9) will not noticeably change. The number of known elements of T̃ can not exceed N_dN_s but can, in principle, be smaller, e.g., if the rank of A is less than N_d or the rank of B is less than N_s, although this situation is not typical even for severely ill-posed ISPs. In any event, N_dN_s is usually much smaller than the total number of matrix elements of T̃, which is equal to N_v². Using the previously introduced estimates, N_dN_s/N_v² ∼ 1/L². Therefore, only a small fraction of the elements of T̃ are known. In what follows, we assume that the singular values of A and B are arranged in descending order and that the known elements of T̃ can be collected into the upper-left rectangular block of size M_A × M_B (see Fig. 3), where M_A ≤ N_d and M_B ≤ N_s. We emphasize again that, in many practically-important cases, equalities will hold in the above expressions. However, it is possible to arrange the sources in such a way that the rank of B is less than N_s (and similarly for the detectors and A), at least up to the numerical precision of the computer [31]. Moreover, the region of known matrix elements can be of a more general shape than a rectangle, as is shown in Fig. 4. It is not conceptually difficult to account for this fact. However, we will proceed under the assumption that the region is rectangular in order to shorten the discussion. Besides, in the numerical simulations of [28], this region was, in fact, rectangular.

FIG. 4. (color online) Assuming that the singular values σ^A_μ and σ^B_μ are arranged in descending order, this sketch shows an example of a more general shape (compared to Fig. 3) of the region in which the elements of T̃ are known. The numbers above the thick line satisfy σ^A_μ σ^B_ν > ε. In the general case, the boundary line can only go from left to right and from bottom to top if followed from the left-most boundary of the matrix.

Even though we can not gain any knowledge about the matrix elements of T̃ outside of the shaded area shown in Fig. 3 by using equation (9) alone, we can make an initial guess for T̃, which we denote by T̃_exp (the "experimental" T-matrix). We define T_exp (in the real-space representation) as the matrix that satisfies (9) in the minimum-norm sense and has the smallest entry-wise norm ‖T‖. This matrix is uniquely defined by the equation

    T_exp = A⁺ Φ B⁺ ,   (18)

where "+" denotes the Moore-Penrose pseudoinverse. If A and B are not rank-deficient, (9) is satisfied by T_exp exactly, so that ‖A T_exp B − Φ‖ = 0. The experimental T-matrix in the singular-vector representation is obtained from (18) by the transformation (16).
In fact, the experimental T-matrix is more easily characterized in the singular-vector representation. Indeed, all the elements of T̃_exp in the unshaded area of the diagram in Fig. 3 (right panel) are equal to zero. This is expressed mathematically by writing

    (T̃_exp)_μν = Φ̃_μν / (σ^A_μ σ^B_ν) ,  if σ^A_μ σ^B_ν > ε ;
                 0 ,                      otherwise .   (19)

This expression is equivalent to (18).

We conclude this section with two important observations about the experimental T-matrix:

Remark 4: Lack of sparsity of T_exp. The matrix T̃_exp is sparse, but the same is not true for T_exp.

Remark 5: Lack of symmetry of T_exp. It is known theoretically that the correct T-matrix is symmetric in the real-space representation. However, this is not generally true for T_exp. Indeed, T_exp = R⁻¹[T̃_exp], and in T̃_exp a large fraction of the elements are replaced by zeros. The resultant T_exp is not likely to be symmetric. A minimal sketch of constructing T̃_exp and T_exp is given below.
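The following sketch, continuing the toy example (A, B, Φ, Nv as above; the threshold ε is an illustrative assumption), constructs the experimental T-matrix of Eqs. (18) and (19) via the SVDs (13):

```python
import numpy as np

eps = 1e-10
UA, sA, VhA = np.linalg.svd(A)   # A = UA @ diag(sA) @ VhA
UB, sB, VhB = np.linalg.svd(B)
R_A = VhA.conj().T               # columns are |g^A_mu>, length Nv
R_B = UB                         # columns are |f^B_mu>, length Nv

Phi_t = UA.conj().T @ Phi @ VhB.conj().T     # Eq. (15b), Nd x Ns

Tt_exp = np.zeros((Nv, Nv))                  # Eq. (19)
denom = np.outer(sA, sB)                     # sigma^A_mu * sigma^B_nu
Tt_exp[:len(sA), :len(sB)] = np.where(
    denom > eps, Phi_t / np.maximum(denom, eps), 0.0)

T_exp = R_A @ Tt_exp @ R_B.conj().T          # equals A^+ Phi B^+, Eq. (18)
```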
IV. BASIC ITERATION CYCLE

In this section we describe a computational algorithm in which the matrices T and V are continuously updated so that T is kept data-compatible and V becomes increasingly diagonally-dominated. Our goal is to fill the unknown elements of T̃ (the white areas in the left panel of Fig. 3) in such a way that the corresponding interaction matrix V, computed according to (11), is approximately diagonal. This is a general formulation of the problem of matrix completion, although the constraint that we apply to T̃ is not the same as in the conventional statement of that problem.

Before proceeding, we need to introduce several additional operators. First, define the masking operators M[·] and N[·]:

    (M[T̃])_μν ≡ 0 ,     if σ^A_μ σ^B_ν > ε ;
                T̃_μν ,  otherwise ;   (20a)

    (N[T̃])_μν ≡ T̃_μν ,  if σ^A_μ σ^B_ν > ε ;
                0 ,     otherwise .   (20b)

We note that M[T̃] + N[T̃] = T̃. Then the operator O[·] that enforces data-compatibility of T̃ (in the singular-vector representation) can be defined as follows:

    O[T̃] ≡ M[T̃] + T̃_exp = T̃ − N[T̃] + T̃_exp .   (21)

It can be seen that the action of O[T̃] is to overwrite (hence the notation O) the elements of T̃ in the shaded area of Fig. 3 with the elements of T̃_exp and to leave all other elements unchanged. The operator O[·] is defined for any N_v × N_v matrix, but in the iterations discussed below we always apply this operator to the T-matrix in the singular-vector representation.

Next, we need to define a diagonal approximation to V. To this end, we define an entry-wise "force-diagonalization" operator D[·], where

    (D[V])_ij ≡ δ_ij Σ_k V_ik ρ(ℓ_ki) .   (22)

Here ρ(ℓ_ki) is a weight function that depends on the physical distance ℓ_ik between the voxels i and k and not on the relative position of the lines i and k in the matrix V. The definition (22) is in agreement with the approximation of the form (1). We note that a more symmetric definition involving the factor ρ(ℓ_ki)(V_ik + V_ki)/2 is also possible. If V is symmetric, the two expressions yield identical results, but the iteratively-updated V is not symmetric in DCTMC, and the physical meaning of the off-diagonal elements V_ik and V_ki is generally not the same. In the simplest case, we can take ρ(ℓ_ki) = δ_ki. This corresponds to setting all off-diagonal elements of V to zero. This approach allows for a complete and simple analysis of DCTMC in the linear regime, as is described in Sec. VI below. However, the use of more complicated functions ρ(ℓ) corresponds better to the spirit of DCTMC; it allows one to take into account the off-diagonal elements of V that are generated in the course of the iterations. Unlike the operator O[·], D[·] will always be applied in the real-space representation. A minimal sketch of the operators O[·] and D[·] is given below.
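A sketch of the operators of Eqs. (20)-(22), continuing the running example (sA, sB, Nv, eps and Tt_exp come from the previous sketch):

```python
import numpy as np

sA_full = np.pad(sA, (0, Nv - len(sA)))   # zero-padded singular values
sB_full = np.pad(sB, (0, Nv - len(sB)))
known = np.outer(sA_full, sB_full) > eps  # mask of data-determined elements

def O_op(Tt, Tt_exp, known):
    """Data-compatibility operator, Eq. (21): overwrite known elements."""
    return np.where(known, Tt_exp, Tt)

def D_op(V, rho=None):
    """Force-diagonalization operator, Eq. (22); rho[k, i] = rho(l_ki).
    rho=None corresponds to the simplest choice rho(l_ki) = delta_ki."""
    if rho is None:
        return np.diag(np.diag(V))
    return np.diag(np.sum(V * rho.T, axis=1))
```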
We are now ready to describe the iterative process of DCTMC. The iteration steps will be defined in terms of the operators R[·], T[·], D[·] and O[·]. A summary of all the operators used in this paper is given in Appendix C. We assume that the SVDs of the matrices A and B and the experimental T-matrix T̃_exp (19) have been precomputed. Consider the case when the iterations start from an initial guess for the T-matrix. We then set k = 1, T̃₁ = T̃_exp and run the following iterations (a code sketch of the full cycle is given at the end of this section):

1: T_k = R⁻¹[T̃_k]. This transforms the T-matrix from the singular-vector to the real-space representation. Both T̃_k and T_k are data-compatible.

2: V_k = T⁻¹[T_k]. This gives the k-th approximation to the interaction matrix V. V_k is data-compatible but not diagonal.

3: D_k = D[V_k]. Compute the diagonal approximation to V_k, denoted here by D_k. D_k is diagonal but not data-compatible.

4: T′_k = T[D_k]. Compute the T-matrix that corresponds to the diagonal matrix D_k. Unlike T_k, T′_k is no longer data-compatible.

5: T̃′_k = R[T′_k]. Transform T′_k to the singular-vector representation. Here T̃′_k is still not data-compatible.

6: T̃_{k+1} = O[T̃′_k]. Advance the iteration index by one and overwrite the elements of T̃′_k that are known from the data with the corresponding elements of T̃_exp. This restores the data-compatibility of T̃_{k+1}. Then go to Step 1.

FIG. 5. (color online) Basic flowchart of the DCTMC iteration process for the case when the iterations start with an initial guess for T̃. The numbered iteration steps are defined in Sec. IV. Computational shortcuts are described in Sec. V. Matrix representations are abbreviated as SV (singular-vector) and RS (real-space). The exit condition can be checked at various steps of the iterations (see [28] for more detail).

The computational complexity of Steps 1, 2, 4 and 5 is O(N_v³). However, the complexity can be dramatically reduced with the use of the computational shortcuts that are described in the next section, with the only exception of Step 4. Therefore, Step 4 is the true computational bottleneck of the method. Its complexity can be reduced by accounting for the sparsity of V. However, if no a priori knowledge about the sparsity of V is available, then the computational complexity of Step 4 is the limiting factor of DCTMC, at least to the best of our current understanding.
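A minimal sketch of Steps 1-6, combining the helpers defined in the previous sketches (t_of_v, v_of_t, O_op, D_op, and the matrices R_A, R_B, Tt_exp, known, Gamma); n_iter is an illustrative assumption, since no specific exit condition is defined here:

```python
import numpy as np

def dctmc_basic(Tt_exp, known, Gamma, R_A, R_B, n_iter=100):
    Tt = Tt_exp.copy()
    for _ in range(n_iter):
        T = R_A @ Tt @ R_B.conj().T      # Step 1: R^{-1}[.]
        V = v_of_t(T, Gamma)             # Step 2: T^{-1}[.]
        D = D_op(V)                      # Step 3: D[.]
        Tp = t_of_v(D, Gamma)            # Step 4: T[.]
        Ttp = R_A.conj().T @ Tp @ R_B    # Step 5: R[.]
        Tt = O_op(Ttp, Tt_exp, known)    # Step 6: restore data-compatibility
    return np.diag(D)                    # diagonal of the reconstructed V
```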
V. COMPUTATIONAL SHORTCUTS

A. Shortcut 1: Fast pseudo-rotations

Consider iteration Steps 5, 6 and 1 written sequentially, together with their computational complexities:

5: T̃′_k = R[T′_k] — O(N_v³) = O(L⁹) ,
6: T̃_{k+1} = O[T̃′_k] — ≤ O(N_dN_s) = O(L⁴) ,
1: T_{k+1} = R⁻¹[T̃_{k+1}] — O(N_v³) = O(L⁹) .

To the right, we have indicated the computational complexity of each step and used the previously introduced estimates for N_d, N_s and N_v in terms of the grid size L. The complexity of Step 6 is, at most, O(N_dN_s); therefore, the complexity of Steps 5 and 1 is dominating. Now, let us combine the steps by writing

    T_{k+1} = R⁻¹[O[R[T′_k]]] = R⁻¹[ R[T′_k] − N[R[T′_k]] + T̃_exp ] ,   (23)

where in the second equality we have used the definition (21) of O[·]. We now use the linearity and invertibility of R to rewrite (23) as

    T_{k+1} = T′_k + T_exp − R⁻¹[N[R[T′_k]]] .   (24)

We are therefore left with the task of numerically evaluating an expression of the type R⁻¹[N[R[T]]], where we have dropped all indexes for simplicity. But this can be accomplished in much fewer than O(N_v³) operations due to the sparsity of N[·]. Indeed, consider first the computation of N[R[T]]. This operation is illustrated in Fig. 6. It can be seen that N[R[T]] = P*_A T P_B, where the matrix P_A is obtained from R_A by overwriting all columns of R_A, except for the first M_A columns, with zeros, and P_B is defined analogously. The complexity of computing P*_A T P_B is O(min(M_A, M_B) N_v²), which is less than O(N_v³) by a factor of at least O(L). Quite analogously, we can show that

    R⁻¹[N[R[T′_k]]] = P_A (P*_A T′_k P_B) P*_B .   (25)

FIG. 6. (color online) Schematics of computing N[R[T]]. The matrices P_A and P_B are obtained from R_A and R_B by setting all columns to zero except for the first M_A and M_B columns, respectively.

It should be kept in mind that P_AP*_A ≠ I and P_BP*_B ≠ I, and that premultiplying these matrices is not a computationally efficient approach: doing so will result in an expression of the type Q_A T Q_B, where Q_A = P_AP*_A and Q_B = P_BP*_B are not sparse. Instead, one should evaluate the right-hand side of (25) using the operator precedence implied by the parentheses. We conclude that Steps 5, 6 and 1 of the iterative procedure described above can be combined into the following single computational step:

    T_{k+1} = T′_k + T_exp − P_A (P*_A T′_k P_B) P*_B .   (26)

In this formula, T_exp, P_A and P_B are precomputed and stored in memory. We finally note that the sparsity of N can be used efficiently even if the region of "known" elements of T̃ is not rectangular. Although the matrices P_A and P_B can not be easily defined in this case, the use of appropriate masks in computing expressions of the type N[R*_A T R_B] will achieve a similar reduction of computational complexity. A minimal sketch of the update (26) is given below.
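A sketch of the combined update (26), assuming the rectangular case of Fig. 3 with M_A = N_d and M_B = N_s (R_A, R_B, Nd, Ns come from the earlier sketches). Only the nonzero columns of P_A and P_B are used, which is what realizes the O(min(M_A, M_B) N_v²) cost:

```python
import numpy as np

M_A, M_B = Nd, Ns                 # block sizes (rectangular case)

def fast_update(Tp, T_exp):
    """Steps 5, 6, 1 combined: Eq. (26)."""
    GA = R_A[:, :M_A]             # nonzero columns of P_A
    FB = R_B[:, :M_B]             # nonzero columns of P_B
    inner = GA.conj().T @ Tp @ FB          # M_A x M_B block, N[R[T']]
    return Tp + T_exp - GA @ inner @ FB.conj().T
```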
B. Shortcut 2: Fast T → D operation

Here we describe a computational shortcut that can cut the computational time per iteration by approximately a factor of 2. However, we have found empirically [28] that this is not the most efficient approach, since it does not use one of the main features of DCTMC, that is, accounting for the off-diagonal elements of V at the intermediate stages of the iterations. In [28], it is shown that a more efficient approach is based on utilization of the formula (22) (weighted summation to the diagonal). We describe Computational Shortcut 2 here for completeness of exposition.

Consider Steps 2 and 3 of the basic iteration cycle, together with their computational complexities:

2: V_k = T⁻¹[T_k] — O(N_v³) = O(L⁹) ,
3: D_k = D[V_k] — O(N_v²) = O(L⁶) .

The goal of Steps 2 and 3 is to find a diagonal matrix D that in some sense approximates the previously-computed T-matrix. More specifically, in Step 2 we compute the V that corresponds to T exactly but is not diagonal, and in Step 3 we seek a diagonal approximation to V, denoted by D. The last operation is governed by the "force-diagonalization" operator D[·], which in turn depends on the weight function ρ(ℓ). As was mentioned above, the simplest choice of this weight function is ρ(ℓ_ki) = δ_ki (we only need to define ρ(ℓ) for a set of discrete values of the argument). This choice of the weight function minimizes the entry-wise norm ‖V − D‖.

An alternative approach is to seek a diagonal matrix D that satisfies the equation

    T = D + D Γ T   (27)

in the minimum L₂-norm sense (of course, (27) can not be satisfied exactly by any diagonal matrix D). The above is a classical minimization problem, which has the following analytical solution:

    D_ij = δ_ij { T_ii + [(ΓT)* T]_ii } / { 1 + [ΓT + (ΓT)* + (ΓT)*(ΓT)]_ii } .   (28)

It may seem that evaluation of (28) still requires O(N_v³) operations because it contains the matrix-matrix product ΓT. However, this is not so. The matrix Λ ≡ ΓT can be updated iteratively during Computational Shortcut 1 by using (26) multiplied from the left by Γ, viz.,

    Λ_{k+1} = Λ′_k + Λ_exp − (Γ P_A)(P*_A T′_k P_B) P*_B .   (29)

Here Λ_{k+1} = ΓT_{k+1}, Λ′_k = ΓT′_k and Λ_exp = ΓT_exp. The matrix ΓP_A can be precomputed and has exactly the same sparsity structure as P_A itself, that is, all of its columns except for the first M_A columns are zero. Therefore, computing the last term in (29) is of the same complexity as Computational Shortcut 1, that is, O(min(M_A, M_B) N_v²). There remains the question of how Λ′_k is computed and whether this computation requires an extra matrix-matrix multiplication. The answer is, it can be precomputed at Step 4 of the k-th iteration without any additional matrix-matrix multiplications. Indeed, let us utilize the second formula in the definition (10) of T[·] and write Step 4 of the k-th iteration as follows:

    T′_k = D_k (I − Γ D_k)⁻¹ .   (30)

We then multiply both sides of (30) from the left by Γ and obtain

    Λ′_k = Γ D_k (I − Γ D_k)⁻¹ = (I − Γ D_k)⁻¹ − I .   (31)

Now we can compute T′_k and Λ′_k without any additional complexity by using the following sub-steps:

1: Compute the product Δ_k ≡ ΓD_k, which is fast because D_k is diagonal.
2: Compute the inverse S_k ≡ (I − Δ_k)⁻¹, which has the complexity of O(N_v³).
3: Compute Λ′_k = S_k − I [as follows from (31)].
4: Compute T′_k = D_kS_k [as follows from (30)], which is again fast because D_k is diagonal.

Thus, in all the computations outlined above, only the inversion of I − Δ_k has the computational complexity of O(N_v³). A sketch of these sub-steps is given below.
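A sketch of Eqs. (28)-(31), continuing the running example; Lam denotes the running matrix Λ = ΓT maintained via Eq. (29):

```python
import numpy as np

def shortcut2_D(T, Lam):
    """Diagonal D of Eq. (28), minimizing the misfit of Eq. (27)."""
    num = np.diag(T) + np.einsum('ki,ki->i', Lam.conj(), T)
    den = (1.0 + np.diag(Lam) + np.diag(Lam).conj()
           + np.einsum('ki,ki->i', Lam.conj(), Lam))
    return np.diag(num / den)

def shortcut2_step4(D, Gamma):
    """Sub-steps 1-4: returns T'_k and Lam'_k with a single inversion."""
    Delta = Gamma * np.diag(D)          # Gamma @ D, fast since D is diagonal
    S = np.linalg.inv(np.eye(D.shape[0]) - Delta)   # the O(Nv^3) bottleneck
    return D @ S, S - np.eye(D.shape[0])            # Eq. (30), Eq. (31)
```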
The inversion of I − Δ_k is, therefore, the true computational bottleneck of the algorithm.

The computational shortcut described here allows one to cut the computational time per iteration of DCTMC by approximately a factor of 2. However, we will show in the second part of this paper series [28] that a more useful approach is to use an explicit weight function ρ(ℓ). This is more in line with the main idea of DCTMC, which is relaxing the requirement that V be a strictly local interaction. Weighted summation to the diagonal is a natural approach to account for the nonlocality (off-diagonality) of V. Indeed, we will see that, although this approach does not allow one to use Computational Shortcut 2, it reduces the number of required iterations and is more beneficial in the end.

C. Streamlined iteration cycle
The computational shortcuts can be integrated into a single streamlined iteration algorithm. Doing so requires careful consideration of the flowchart shown in Fig. 5 and of the associated data dependencies. However, the resulting algorithm is relatively simple. For ease of programming, we have broken this algorithm into elementary computational steps. We describe separately the cases when Computational Shortcut 2 is and is not used. In both cases, we start from the initial guess for the T-matrix, T₁ = T_exp. The modification in which the process starts from an initial guess for V is quite obvious and is numerically implemented in [28].

Initial setup:
1: Permanently store in memory the analytically-known matrix Γ.

2: Compute the SVDs (13) of A and B. This will yield the sets of singular values σ^A_μ, σ^B_μ (some of which are identically zero) and singular vectors |f^A_μ⟩, |f^B_μ⟩, |g^A_μ⟩, |g^B_μ⟩.

3: Use the previous result to construct and permanently store in memory the dense matrices R_A and R_B and the sparse matrices P_A and P_B. If Computational Shortcut 2 is used, compute also Q_A = ΓP_A. Note: no additional memory allocation for P_A and P_B is required.

4: Compute Φ̃_μν according to (15b) and T̃_exp according to (19). Discard the real-space data function, the singular values and singular vectors, and deallocate the associated memory.

5: Compute and store permanently in memory T_exp = R_A T̃_exp R*_B = P_A T̃_exp P*_B. If Computational Shortcut 2 is used, also compute Λ_exp = ΓT_exp.

6: Initialize the iterations by setting T₁ = T_exp and Λ₁ = Λ_exp.

The computational cost of the initial setup is comparable to that of one iteration or less. Step 1 is negligible. Computation of the SVDs of A and B in Step 2 has the cost of O(N_v(N_d² + N_s²)), which is also relatively small. The cost of Step 3 is again negligible, as it mostly consists of arranging numerical data in the computer memory. The cost of Step 4 is O(N_d²N_s + N_dN_s²). In Step 5, the complexity of computing P_A T̃_exp P*_B is O(min(M_A, M_B) N_v²) or less, depending on the size of the nonzero blocks in P_A and P_B. The only costly operation is the computation of the matrix-matrix product ΓT_exp in Step 5, with the computational complexity of O(N_v³). But this step is only required if Computational Shortcut 2 is used.

The exit condition can be defined at different stages of the iterations by using various error measures, as is described in more detail in [28]. Here no specific exit conditions are defined.

Main iteration with the use of Computational Shortcut 2:
Starting from k = 1, T₁ = T_exp and Λ₁ = Λ_exp, run the following iterations (see also the sketch after this list):

1: (D_k)_ij = δ_ij { (T_k)_ii + (Λ*_k T_k)_ii } / { 1 + (Λ_k + Λ*_k + Λ*_k Λ_k)_ii }

2: Δ_k = ΓD_k

3: S_k = (I − Δ_k)⁻¹

4: T′_k = D_kS_k ,  Λ′_k = S_k − I

5: T_{k+1} = T′_k + T_exp − P_A(P*_A T′_k P_B)P*_B ,  Λ_{k+1} = Λ′_k + Λ_exp − Q_A(P*_A T′_k P_B)P*_B

Operations whose order of execution is insignificant and which can run independently in parallel threads are shown in the same numbered step. Note that computation of the terms (Λ*_k T_k)_ii and (Λ*_k Λ_k)_ii for i = 1, ..., N_v has the computational complexity of only O(N_v²). Therefore, the true bottleneck of each iteration is the matrix inversion S_k = (I − Δ_k)⁻¹, whose computational complexity is O(N_v³).
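A sketch of this streamlined cycle, combining the helpers shortcut2_D and shortcut2_step4 with the fast update of Eq. (26) and the Λ update of Eq. (29) (GA, FB, M_A, M_B as in the earlier sketches; n_iter is illustrative):

```python
import numpy as np

def dctmc_shortcut2(T_exp, Lam_exp, Gamma, n_iter=100):
    T, Lam = T_exp.copy(), Lam_exp.copy()
    GA, FB = R_A[:, :M_A], R_B[:, :M_B]
    QA = Gamma @ GA                            # nonzero part of Q_A = Gamma P_A
    for _ in range(n_iter):
        D = shortcut2_D(T, Lam)                # step 1, Eq. (28)
        Tp, Lamp = shortcut2_step4(D, Gamma)   # steps 2-4
        inner = GA.conj().T @ Tp @ FB          # shared by both updates
        T = Tp + T_exp - GA @ inner @ FB.conj().T         # step 5
        Lam = Lamp + Lam_exp - QA @ inner @ FB.conj().T   # step 5
    return np.diag(D)
```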
Main iteration without the use of Computational Shortcut 2:

Starting from k = 1 and T₁ = T_exp, run the following iterations (see also the sketch after this list):

1: V_k = (I + T_kΓ)⁻¹ T_k

2: D_k = D[V_k]

3: T′_k = (I − D_kΓ)⁻¹ D_k

4: T_{k+1} = T′_k + T_exp − P_A(P*_A T′_k P_B)P*_B

The computational cost of this iterative scheme is dominated by Steps 1 and 3, the computational complexity of each of these steps being O(N_v³).
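A sketch of the cycle without Shortcut 2; here rho is an optional N_v × N_v matrix of weights ρ(ℓ_ki) for the operator D[·] of Eq. (22):

```python
import numpy as np

def dctmc_no_shortcut2(T_exp, Gamma, rho=None, n_iter=100):
    T = T_exp.copy()
    I = np.eye(T.shape[0])
    GA, FB = R_A[:, :M_A], R_B[:, :M_B]
    for _ in range(n_iter):
        V = np.linalg.solve(I + T @ Gamma, T)    # step 1, Eq. (11)
        D = D_op(V, rho)                         # step 2, Eq. (22)
        Tp = np.linalg.solve(I - D @ Gamma, D)   # step 3, Eq. (10)
        inner = GA.conj().T @ Tp @ FB
        T = Tp + T_exp - GA @ inner @ FB.conj().T   # step 4, Eq. (26)
    return np.diag(D)
```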
VI. DCTMC IN THE LINEAR REGIME AND THE QUESTIONS OF CONVERGENCE AND REGULARIZATION

In this section we analyze DCTMC in the linear regime. Most results will be obtained in the case when the weight function in (22) is given by ρ(ℓ_ki) = δ_ki. However, generalizations to a more general (and more practical) operator D[·] will be mentioned briefly.

Consider the iteration cycle of Sec. V C in the limit Γ →
0. Omitting the intermediate steps, we find that each iteration is reduced to the following two operations:

1: D_k = D[T_k]
2: T_{k+1} = D_k + T_exp − (P_AP*_A) D_k (P_BP*_B)

where D[·] is defined in (22). These two steps can be combined into the following simple iteration:

    D_{k+1} = D_k + D_exp − D[(P_AP*_A) D_k (P_BP*_B)] ,   (32)

where D_exp = D[T_exp]. Iteration (32) can be obtained simply by applying the operator D[·] to the equation in Step 2 above. Let us now convert (32) to an equation with respect to the vector |υ_k⟩ that contains the diagonal elements of D_k. From the linearity of (32) we immediately find

    |υ_{k+1}⟩ = |υ_exp⟩ + (I − W)|υ_k⟩ ,   (33)

where |υ_exp⟩ is the vector of the diagonal elements of D_exp. We now specialize to the case when the weight function in the definition (22) of the operator D[·] is given by ρ(ℓ_ki) = δ_ki. In this case, the matrix W has the elements

    W_ij = (P_AP*_A)_ij (P_BP*_B)_ji .   (34)

It is easy to see that (33) is a Richardson first-order iteration with the fixed point |υ_∞⟩ = W⁻¹|υ_exp⟩. Therefore, DCTMC in the linear regime simply provides an iterative way of solving the equation

    W|υ⟩ = |υ_exp⟩ .   (35)

This equation can be derived independently of DCTMC and in a more straightforward manner, starting from the linearized equation (7). This derivation is shown in Appendix B, and it takes advantage of the algebraic properties of K (see Remark 3). It is important to realize that, although (35) can be obtained from (7) by a series of linear transformations, the two equations are not equivalent in the following sense: if K is not invertible, then the pseudoinverse solutions of the two equations can be different. However, if K is invertible, then the two equations have the same unique solution. Of course, iteration (33) is only a particular numerical method of solving (35) and not the most efficient one: conjugate-gradient descent is expected to provide better computational performance. However, consideration of DCTMC in the linear regime is not a vain or trivial exercise but is useful in several respects. First, it gives us an insight into the convergence properties of DCTMC. Second, it gives us an idea of how the DCTMC iterations can be regularized. Convergence and regularization will be discussed in the remainder of this section; a minimal sketch of the linear-regime iteration (33) is given below.
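A sketch of Eqs. (33)-(35) for ρ(ℓ_ki) = δ_ki, with GA and FB the nonzero columns of P_A and P_B as in the earlier sketches (the iteration count is illustrative):

```python
import numpy as np

GA, FB = R_A[:, :M_A], R_B[:, :M_B]
PAPA = GA @ GA.conj().T            # P_A P_A^*
PBPB = FB @ FB.conj().T            # P_B P_B^*
W = PAPA * PBPB.T                  # Eq. (34), element-wise product

v_exp = np.diag(T_exp).copy()      # |v_exp>
v = v_exp.copy()
for _ in range(1000):              # Richardson iteration, Eq. (33)
    v = v_exp + v - W @ v          # converges if 0 < w_n < 2
```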
It is obvious that the iterations converge to the fixed point provided that |1 − w_n| < 1 for all n, where w_n are the eigenvalues of W. Since W is Hermitian, all its eigenvalues are real and therefore the convergence condition reads 0 < w_n < 2. Under the same condition the inverse W⁻¹ exists. We will now prove that

    0 ≤ w_n ≤ 1 .   (36)

The fact that W is non-negative definite is obvious from (34). We will however make an additional step and recall that the columns of P_A are the singular vectors |g^A_μ⟩ for μ = 1, ..., M_A and zeros otherwise, while the columns of P_B are the singular vectors |f^B_μ⟩ for μ = 1, ..., M_B and zeros otherwise. We then obtain in a straightforward manner:

    W_ij = Σ_{μ=1}^{M_A} Σ_{ν=1}^{M_B} ⟨i|g^A_μ⟩⟨g^A_μ|j⟩⟨j|f^B_ν⟩⟨f^B_ν|i⟩ .   (37)

Let |x⟩ be an arbitrary nonzero vector of length N_v and X be the N_v × N_v matrix with the elements of |x⟩ on the diagonal and zeros elsewhere. Then

    ⟨x|W|x⟩ = Σ_{μ=1}^{M_A} Σ_{ν=1}^{M_B} |⟨g^A_μ|X|f^B_ν⟩|² ≥ 0 .   (38)

We therefore have proved that w_n ≥
0. Next, we use the orthonormality of each set of singular vectors to write the following identities:

    ⟨x|x⟩ = Σ_{i=1}^{N_v} ⟨i|X*X|i⟩ = Σ_{μ=1}^{N_v} Σ_{ν=1}^{N_v} |⟨g^A_μ|X|f^B_ν⟩|² .   (39)

Since M_A, M_B ≤ N_v, we have ⟨x|W|x⟩ ≤ ⟨x|x⟩, and (36) is proved. The equality ⟨x|W|x⟩ = ⟨x|x⟩ holds only in the case M_A = M_B = N_v, in which case W = I and the iteration (33) trivially converges to its fixed point right upon making the initial guess. In this unrealistic case, all elements of the T-matrix are determined from the data and no iterations are needed.

We thus conclude that convergence can be slow in the case when W has a small (or zero) eigenvalue. We can define the characteristic overlap of the singular vectors related to detectors and sources as

    ξ = inf_{X ≠ 0} { Σ_{μ=1}^{M_A} Σ_{ν=1}^{M_B} |⟨g^A_μ|X|f^B_ν⟩|² / Σ_{μ=1}^{N_v} Σ_{ν=1}^{N_v} |⟨g^A_μ|X|f^B_ν⟩|² } .   (40)

The iterations (33) converge at least as fast as the power series Σ_n (1 − ξ)ⁿ. If ξ is close to zero, the convergence can be slow. This observation gives us an idea of how the iterations can be regularized. This can be accomplished by replacing W with W + λI, where λ is a regularization parameter. As is shown in Appendix B, this procedure is equivalent to Tikhonov regularization of equation (B5), which can be derived from (7) by several linear operations. However, the substitution W → W + λI is not equivalent to Tikhonov regularization of (7).

We can now introduce regularization of the general iterative algorithm of Sec. V C, which is applicable to the nonlinear case as well. Namely, for any matrix X, we replace the linear transformation (P_AP*_A)X(P_BP*_B) by (P_AP*_A)X(P_BP*_B) + λD[X]. This entails the following modification to Step 5 of the streamlined algorithm with Shortcut 2 or to Step 4 of the algorithm without Shortcut 2:

    T_{k+1} = T′_k − λD[T′_k] + T_exp − P_A(P*_A T′_k P_B)P*_B ,
    Λ_{k+1} = Λ′_k − λD[Λ′_k] + Λ_exp − Q_A(P*_A T′_k P_B)P*_B .

Finally, we mention briefly which modifications can be expected if we use a more general weight function ρ(ℓ_ki) in the definition (22) of the operator D. The first obvious result is that the matrix W is modified in this case, so that its elements are given by

    W_ij = Σ_k (P_AP*_A)_ij (P_BP*_B)_jk ρ(ℓ_ki) = (P_AP*_A)_ij (P_BP*_B H)_ji ,   (41)

where H_ij = ρ(ℓ_ij). This matrix is no longer non-negative definite. Moreover, its eigenvalues can be complex, and we can not easily prove that they are constrained to the disk |1 − w_n| < 1. However, we have found empirically that the use of weight functions ρ(ℓ) that are non-zero for finite ℓ improves the convergence rate of the DCTMC iterations [28]. We view this as an indication that the eigenvalues w_n are pushed away from the origin in the complex plane while staying within the disk |1 − w_n| < 1. The vector |υ_exp⟩ is also modified for a general ρ(ℓ); indeed, we can write ⟨i|υ_exp⟩ = (T_exp H)_ii. The resultant equation W|υ⟩ = |υ_exp⟩ is still derivable from (7). In Appendix B, the derivation is shown for the simple case ρ(ℓ_ki) = δ_ki, or H = I, but the more general case can be easily considered, and the resultant transformation between K and W is given at the end of the Appendix. A sketch of the regularized update is given below.
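A sketch of the regularized update as a drop-in replacement for the final step of either streamlined cycle above (λ is a user-chosen regularization parameter; GA, FB and D_op are from the earlier sketches):

```python
import numpy as np

def regularized_update(Tp, T_exp, lam, rho=None):
    """Eq. (26) with the additional -lambda * D[.] regularization term."""
    inner = GA.conj().T @ Tp @ FB
    return (Tp - lam * D_op(Tp, rho) + T_exp
            - GA @ inner @ FB.conj().T)
```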
VII. DISCUSSION

This paper describes a novel method for solving nonlinear inverse scattering problems (ISPs). The method is based on iterative completion of the unknown entries of the T-matrix, and we refer to it as the data-compatible T-matrix completion (DCTMC) method. It should be emphasized that the constraint that we apply to the T-matrix (namely, that it corresponds to a nearly diagonal interaction matrix V) is not the same as in the conventional formulation of the matrix completion problem. The method developed in this paper is well suited for overdetermined ISPs in which the number of volume voxels is not too large or the target is sparse. The size of the data set is not a limiting factor for this method, unlike in many traditional approaches to the same problem.

In the case of ill-posed ISPs, regularization plays the key role. One should not expect to recover a reasonable image without some form of regularization. DCTMC allows for two types of regularization: (i) by imposing physical constraints and (ii) by regularizing the matrix W_ij = (P_AP*_A)_ij(P_BP*_B)_ji. In the linear regime, the approach (ii) corresponds to Tikhonov regularization of the linearized equation ΘUK|υ⟩ = ΘU|ψ⟩, which is obtained from Eq. (7) by multiplying the latter by ΘU from the left; the unitary matrix U and the matrix of diagonal scaling Θ are defined in Appendix B. In the nonlinear regime, the approach (ii) is somewhat ad hoc and its applicability requires additional research.

A potential advantage of DCTMC is its computational efficiency in solving strongly overdetermined inverse problems with large data sets. This advance is obtained by exploiting the algebraic structure of the ISP rather than stating it in the conventional generic form F[υ] = 0 (or Kυ = 0 in the linear approximation), where F[·] is a general nonlinear functional, K is its linear approximation and υ is the vector consisting of the diagonal elements of V. In our previous work, we have shown that strongly overdetermined data sets are required to obtain the optimal image resolution [32, 33]. Specifically, the fundamental limit of lateral resolution of diffuse optical tomography (DOT) in the slab geometry is equal to the step h of the mesh of sources and detectors on each face of the slab, provided that the imaging windows are significantly larger than the field of view of the reconstruction. Thus, to reconstruct a lateral central cross section of the slab on a 100 ×
100 mesh of step h, one needs about 300 sources on one side and the same number of detectors on the other side of the slab. This translates into ∼300² ≈ 10⁵ data points. This result is not specific to DOT but also holds for the diffraction tomography that we consider numerically in [28]. However, the fundamental resolution limit is not always achievable. If the inverse problem is ill-posed or noise is present in the data, the theoretical limit of resolution can not be achieved, and smaller data sets can suffice to obtain the optimal image quality. Therefore, the optimal size of the data set is determined by a complex interplay between the ill-posedness of the inverse problem and the statistical properties of the noise. Experimental DOT reconstruction with large data sets was demonstrated in [34], experimental determination of the optimal size of the data set was performed in [34, 35], and further insights on the selection of useful data points in the presence of strong nonlinearity were provided in [36]. However, all the references just quoted used linearized image reconstruction. DCTMC allows one to overcome this limitation and obtain nonlinear reconstructions with very large data sets.

Although the main goal of DCTMC is to solve nonlinear problems, the methods developed above can be useful for solving linearized problems with large data sets as well. This development is conceptually similar to the image reconstruction methods of [33, 37-39] that were developed for solving linear ISPs. The similarity lies in exploiting the algebraic structure of the matrix K, which is obtained as a product of two unperturbed Green's functions. In [33, 37-39], the special structure of K that follows from the translational invariance of an infinite homogeneous slab was exploited. In this work, we do not use or assume translational invariance of the medium and do not work in the infinite slab geometry. Instead, we obtain an expression of the form AVB, where V is the unknown potential (in the linear regime, V = T). This replaces the traditional approach in which one writes AVB = Kυ, where υ is the vector of the diagonal elements of V.

In the full nonlinear regime, DCTMC also bears some similarity to the methods of [11, 13, 15], where the notion of the fundamental unknown is also expanded to include the internal fields, or the complete Green's function G, or the T-matrix T. For example, in the work of Chaumet et al., the dipole moments d_n of the voxels are iteratively updated. In our terminology, d_nα = (Tq)_nα, where α = x, y, z labels the Cartesian components of three-dimensional vectors and q is the incident field (see Sec. II). The intermediate dipole results are directly updated by a step in the Polak-Ribière conjugate-gradient direction. Then, once an acceptable result has been obtained, the relationship between the dipoles and the voxel polarizabilities is used to obtain the final reconstruction. DCTMC is different in several respects. It searches in a different direction, which is determined at each iteration by the experimental T-matrix. Also, the voxel polarizabilities are updated at each iteration (here we imply a unique one-to-one correspondence between the voxel polarizabilities and the T-matrix). Therefore, our treatment of the unknowns is similar to that of Refs. [11, 13, 15], but the method for updating the unknowns is different.

ACKNOWLEDGMENTS
In the full nonlinear regime, DCTMC also bears some similarity to the methods of [11, 13, 15], where the notion of the fundamental unknown is also expanded to include the internal fields, the complete Green's function $G$, or the T-matrix $T$. For example, in the work of Chaumet et al., the dipole moments $d_n$ of voxels are iteratively updated. In our terminology, $d_{n\alpha} = (Tq)_{n\alpha}$, where $\alpha = x, y, z$ labels the Cartesian components of three-dimensional vectors and $q$ is the incident field (see Sec. II). The intermediate dipole results are directly updated by a step in the Polak-Ribière conjugate gradient direction. Then, once an acceptable result has been obtained, the relationship between the dipoles and the voxel polarizabilities is used to obtain the final reconstruction. DCTMC is different in several respects. It searches in a different direction, which is determined at each iteration by the experimental T-matrix. Also, the voxel polarizabilities are updated at each iteration (here we imply a unique one-to-one correspondence between the voxel polarizabilities and the T-matrix). Therefore, our treatment of the unknowns is similar to that in Refs. [11, 13, 15], but the method for updating the unknown is different.

ACKNOWLEDGMENTS

This work has been carried out thanks to the support of the A*MIDEX project (No. ANR-11-IDEX-0001-02) funded by the "Investissements d'Avenir" French Government program, managed by the French National Research Agency (ANR), and was also supported by the US National Science Foundation under Grant DMS 1115616 and by the US National Institutes of Health under Grant P41 RR002305. The authors are grateful to J. C. Schotland, A. Yodh and A. Sentenac for very useful discussions.

[1] D. A. Boas, D. H. Brooks, E. L. Miller, C. A. DiMarzio, M. Kilmer, R. J. Gaudette, and Q. Zhang, IEEE Signal Proc. Mag., 57 (2001).
[2] S. R. Arridge and J. C. Schotland, Inverse Problems, 123010 (2009).
[3] M. M. Bronstein, A. M. Bronstein, M. Zibulevski, and H. Azhari, IEEE Trans. Med. Imag., 1395 (2002).
[4] A. J. Devaney, IEEE Trans. Geosci. Remote Sensing GE-22, 3 (1984).
[5] J. G. Berryman and R. V. Kohn, Phys. Rev. Lett., 325 (1990).
[6] D. Isaacson, J. L. Mueller, J. C. Newell, and S. Siltanen, IEEE Trans. Med. Imag., 821 (2004).
[7] S. Arridge, S. Moskow, and J. C. Schotland, Inverse Problems, 035003 (2012).
[8] P. S. Carney, R. A. Frazin, S. I. Bozhevolnyi, V. S. Volkov, A. Boltasseva, and J. C. Schotland, Phys. Rev. Lett., 163903 (2004).
[9] K. Belkebir, P. C. Chaumet, and A. Sentenac, J. Opt. Soc. Am. A, 1889 (2005).
[10] G. Bao and P. Li, Opt. Lett., 1465 (2007).
[11] P. C. Chaumet, K. Belkebir, and A. Sentenac, Phys. Rev. B, 245405 (2004).
[12] K. Belkebir, P. C. Chaumet, and A. Sentenac, J. Opt. Soc. Am. A, 586 (2006).
[13] E. Mudry, P. C. Chaumet, K. Belkebir, and A. Sentenac, Inverse Problems, 065007 (2012).
[14] M. Jakobsen, Stud. Geophys. Geod., 1 (2012).
[15] M. Jakobsen and B. Ursin, J. Geophys. Eng., 400 (2015).
[16] R. G. Newton, Scattering Theory of Waves and Particles (McGraw-Hill, New York, 1966).
[17] D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, Vol. 93 of Applied Mathematical Sciences (Springer, Berlin, 1998).
[18] R. Snieder, Inverse Problems, 387-404 (1998).
[19] R. C. Aster, B. Borchers, and C. H. Thurber, Parameter Estimation and Inverse Problems (Elsevier, Amsterdam, 2005).
[20] J. K. Seo and E. J. Woo, Nonlinear Inverse Problems in Imaging (Wiley, 2012).
[21] H. W. Engl and P. Kugler, in Multidisciplinary Methods for Analysis, Optimization and Control of Complex Systems, Mathematics in Industry, Vol. 6 (Springer, 2005), chap. "Nonlinear inverse problems: Theoretical aspects and some industrial applications", pp. 3-47.
[22] V. A. Markel, J. A. O'Sullivan, and J. C. Schotland, J. Opt. Soc. Am. A, 903 (2003).
[23] S. Moskow and J. Schotland, Inverse Problems, 065005 (2008).
[24] S. Moskow and J. Schotland, Inverse Problems, 095007 (2009).
[25] D. Watzenig, in e & i Elektrotechnik und Informationstechnik, Vol. 124 (Springer, 2007), chap. "Bayesian inference for inverse problems - statistical inversion", pp. 240-247.
[26] R. H. Stolt and B. Jacobs, Tech. Rep. 24, Stanford Exploration Project (SEP) (1980).
[27] D. J. Kouri and A. Vijay, Phys. Rev. E, 046614 (2003).
[28] H. W. Levinson and V. A. Markel, Phys. Rev. E (2016), submitted.
[29] If the source is not of unit strength, measurements should be divided by the source amplitude.
[30] There is no real distinction between the two cases if the data are sampled.
[31] This possibility is closely related to the existence of nonradiating sources. Indeed, to force the rank of $B$ to be less than $N_s$, we need to arrange the sources in space and to assign them relative amplitudes and phases so that they produce almost no field in $\Omega$.
[32] V. A. Markel and J. C. Schotland, Appl. Phys. Lett., 1180 (2002).
[33] V. A. Markel and J. C. Schotland, Phys. Rev. E, 056616 (2004).
[34] Z.-M. Wang, G. Y. Panasyuk, V. A. Markel, and J. C. Schotland, Opt. Lett., 3338 (2005).
[35] S. D. Konecky, G. Y. Panasyuk, K. Lee, V. Markel, A. Yodh, and J. C. Schotland, Opt. Expr., 5048 (2008).
[36] H. Y. Ban, D. R. Busch, S. Pathak, F. A. Moscatelli, M. Machida, J. C. Schotland, V. A. Markel, and A. G. Yodh, J. Biomed. Opt., 026016 (2013).
[37] J. C. Schotland, J. Opt. Soc. Am. A, 275 (1997).
[38] J. C. Schotland and V. A. Markel, J. Opt. Soc. Am. A, 2767 (2001).
[39] V. A. Markel, V. Mital, and J. C. Schotland, J. Opt. Soc. Am. A, 890 (2003).

Appendix A: Linearizing approximations
In this Appendix we state the linearizing approximations without derivation or analysis, as this is outside of the scope of this paper. We only note that the first Born approximation is obtained by retaining the first-order term in the power-series expansion of the complete Green's function $G$; the first Rytov approximation is obtained by retaining the first-order term in the cumulant expansion of $\log(G)$; and the mean-field approximation is obtained by using the first-order approximant in the continued-fraction expansion of $G$. More details are given in [33]. Only the first Born approximation can be stated in abstract form, while the other two approximations involve entry-wise expressions. Correspondingly, the accuracy of these two approximations depends on the matrix representation, while the accuracy of the first Born approximation is representation-independent.

(i) First Born approximation. The simplest approach to linearization is the first Born approximation, according to which

$A(I - V\Gamma)^{-1}VB \approx AVB .$

Obviously, the first Born approximation is valid if $\|V\Gamma\| \ll 1$.
By substituting the above approximation into the left-hand side of (5), we obtain (6), in which $\Psi = \Phi$. Therefore, the data transformation in the case of the first Born approximation is trivial.

(ii) First Rytov approximation. The first Rytov approximation can be stated as

$\left[ A(I - V\Gamma)^{-1}VB \right]_{ij} \approx C_{ij} \left\{ \exp\!\left[ \frac{(AVB)_{ij}}{C_{ij}} \right] - 1 \right\} ,$

where $C_{ij} = G(\mathbf{r}_i, \mathbf{r}_j)$ and $\mathbf{r}_i \in \Sigma_d$, $\mathbf{r}_j \in \Sigma_s$. Thus we have encountered yet another restriction of $G(\mathbf{r}, \mathbf{r}')$. Obviously, with this restriction used, $G(\mathbf{r}, \mathbf{r}')$ is the direct (unscattered) field that would have been produced by a source located at $\mathbf{r}$ and measured by a detector at $\mathbf{r}'$ in the case $V = 0$. If we substitute the above approximation into the left-hand side of (5) and introduce the data transformation

$\Psi_{ij} = C_{ij} \log\!\left( 1 + \Phi_{ij}/C_{ij} \right) ,$

we again arrive at (6). Therefore, the equation above defines the data transformation of the first Rytov approximation.

(iii) Mean-field approximation. The mean-field approximation is

$\left[ A(I - V\Gamma)^{-1}VB \right]_{ij} \approx \frac{(AVB)_{ij}}{1 - (AVB)_{ij}/C_{ij}} .$

The data transformation of the mean-field approximation has the form of an element-wise harmonic average and reads

$\Psi_{ij} = \frac{1}{1/\Phi_{ij} + 1/C_{ij}} .$
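For reference, the three data transformations can be collected in one place. The sketch below is our own NumPy illustration (the arrays `Phi` and `C` stand for the data matrix $\Phi$ and the matrix of direct fields $C_{ij}$); in each case the transformed data $\Psi$ serves as the right-hand side of the linearized equation (6).

```python
import numpy as np

def psi_born(Phi, C):
    """First Born approximation: the data transformation is trivial."""
    return Phi

def psi_rytov(Phi, C):
    """First Rytov approximation: Psi_ij = C_ij * log(1 + Phi_ij / C_ij).
    The complex logarithm applies, since the fields are generally complex."""
    return C * np.log(1.0 + Phi / C)

def psi_mean_field(Phi, C):
    """Mean-field approximation: element-wise harmonic average,
    Psi_ij = 1 / (1/Phi_ij + 1/C_ij)."""
    return 1.0 / (1.0 / Phi + 1.0 / C)
```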
Appendix B: Derivation of the equation $W|\upsilon\rangle = |\upsilon_{\rm exp}\rangle$ [Eq. (35)] from Eq. (7)

In this Appendix, we derive Eq. (35) from Eq. (7) for the special case of the weight function $\rho(\ell_{ki}) = \delta_{ki}$. The more general case can be considered without difficulty, and the relevant result is adduced at the end of this Appendix.

Consider the linearized equation (7) in the first Born approximation, that is, with the trivial data transformation $\Psi = \Phi$ or, equivalently, $|\psi\rangle = |\phi\rangle$. We recall that $K_{(mn),j} = A_{mj}B_{jn}$ and use (13) for $A_{mj}$ and $B_{jn}$ to obtain the following result for the elements of $K$:

$K_{(mn),j} = \sum_{\mu=1}^{N_d} \sum_{\nu=1}^{N_s} \sigma_{A\mu}\sigma_{B\nu} \langle m|f_{A\mu}\rangle \langle g_{A\mu}|j\rangle \langle j|f_{B\nu}\rangle \langle g_{B\nu}|n\rangle .$

Now define the unitary matrix $U$ with the elements

$U_{(\mu\nu),(mn)} = \langle f_{A\mu}|m\rangle \langle n|g_{B\nu}\rangle , \quad 1 \le \mu, m \le N_d , \quad 1 \le \nu, n \le N_s , \qquad$ (B1)

and multiply (7) by $U$ from the left:

$(UK)|\upsilon\rangle = U|\phi\rangle . \qquad$ (B2)

We emphasize that multiplication of any linear equation by a unitary matrix does not change its Tikhonov-regularized pseudoinverse solution. This follows immediately from $(UK)^*(UK) = K^*K$ and $(UK)^*U = K^*$. Now we use the equalities

$(UK)_{(\mu\nu),j} = \sigma_{A\mu}\sigma_{B\nu} \langle g_{A\mu}|j\rangle \langle j|f_{B\nu}\rangle , \quad \langle(\mu\nu)|U|\phi\rangle = \tilde{\Phi}_{\mu\nu}$

to re-write (B2) as

$\sigma_{A\mu}\sigma_{B\nu} \sum_{j=1}^{N_v} \langle g_{A\mu}|j\rangle \langle j|f_{B\nu}\rangle \langle j|\upsilon\rangle = \tilde{\Phi}_{\mu\nu} , \quad 1 \le \mu \le N_d , \quad 1 \le \nu \le N_s .$

Next we observe that the above set may contain some equations in which all coefficients are zero or very small. These equations can be safely discarded, and this operation still does not affect the pseudoinverse. Therefore, we obtain

$\sigma_{A\mu}\sigma_{B\nu} \sum_{j=1}^{N_v} \langle g_{A\mu}|j\rangle \langle j|f_{B\nu}\rangle \langle j|\upsilon\rangle = \tilde{\Phi}_{\mu\nu} , \quad 1 \le \mu \le M_A , \quad 1 \le \nu \le M_B , \qquad$ (B3)

where $M_A$ and $M_B$ are the dimensions of the shaded rectangle in Fig. 3, which is the region where the inequality $\sigma_{A\mu}\sigma_{B\nu} > \epsilon$ holds, and $\epsilon$ is the small positive constant introduced in (17). Note that (B3) is in all respects equivalent to (7).

At this point, however, we make a transformation that will, in fact, change the equation. Namely, we divide (B3) by the factor $\sigma_{A\mu}\sigma_{B\nu}$, which is larger than $\epsilon$ for all equations included in (B3). In computational linear algebra, this operation is known as preconditioning by diagonal scaling. We thus obtain

$\sum_{j=1}^{N_v} \langle g_{A\mu}|j\rangle \langle j|f_{B\nu}\rangle \langle j|\upsilon\rangle = \big(\tilde{T}_{\rm exp}\big)_{\mu\nu} , \quad 1 \le \mu \le M_A , \quad 1 \le \nu \le M_B , \qquad$ (B4)

where we have used the definition (19) of $\tilde{T}_{\rm exp}$. The diagonal scaling that was applied to obtain (B4) is invertible. Therefore, if (7) has a solution in the ordinary sense, then (B4) has the same solution. However, if (7) is not invertible, then the two equations have different pseudoinverse solutions that cannot be related to each other by a simple transformation. In this sense, the two equations (7) and (B4) are no longer equivalent.

To obtain (35), we observe that (B4) can be written as

$Q|\upsilon\rangle = |\tau\rangle , \qquad$ (B5)

where $Q_{(\mu\nu),j} = \langle g_{A\mu}|j\rangle \langle j|f_{B\nu}\rangle$ and $\langle(\mu\nu)|\tau\rangle = \big(\tilde{T}_{\rm exp}\big)_{\mu\nu}$. We then multiply (B5) by $Q^*$ from the left and obtain

$(Q^*Q)_{ij} = \sum_{\mu=1}^{M_A} \sum_{\nu=1}^{M_B} \langle f_{B\nu}|i\rangle \langle i|g_{A\mu}\rangle \langle g_{A\mu}|j\rangle \langle j|f_{B\nu}\rangle = W_{ij} ,$

where $W = Q^*Q$ is the same matrix as in (34) (compare the above equation to (37)). In a similar manner, we obtain

$\langle i|Q^*|\tau\rangle = \sum_{\mu=1}^{M_A} \sum_{\nu=1}^{M_B} Q^*_{i,(\mu\nu)} \big(\tilde{T}_{\rm exp}\big)_{\mu\nu} = \big(P_A \tilde{T}_{\rm exp} P_B^*\big)_{ii} = (T_{\rm exp})_{ii} = \langle i|\upsilon_{\rm exp}\rangle .$

Therefore, $Q^*|\tau\rangle = |\upsilon_{\rm exp}\rangle$. We thus conclude that (35) is obtained from (B5) by multiplying both sides with $Q^*$. Moreover, the substitution $W \to W + \lambda I$ is equivalent to Tikhonov regularization of (B5).

Finally, we can state the formal relation between $K$ and $W$ in the following form:

$W = (\Theta UK)^*(\Theta UK) = K^* U^{-1} \Theta^2 U K ,$

where $U$ is the unitary matrix defined in (B1) and $\Theta$ is the diagonal conditioning matrix containing the quantities $1/(\sigma_{A\mu}\sigma_{B\nu})$ for $1 \le \mu \le M_A$ and $1 \le \nu \le M_B$, and zeros otherwise.

The above consideration applied to the case $\rho(\ell_{ki}) = \delta_{ki}$ or, equivalently, $H = I$, where $H_{ki} = \rho(\ell_{ki})$. If $H \ne I$, the matrix $W$ is given by

$W = (\Theta UK)^* F (\Theta UK) .$

The matrix $F$ appears in this expression as an additional filter. The free term of the equation is modified so that $\langle i|\upsilon_{\rm exp}\rangle = (T_{\rm exp} F)_{ii}$.
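The statement that multiplication by a unitary matrix does not change the Tikhonov-regularized pseudoinverse solution is easy to verify numerically. The following self-contained NumPy sketch (our own check, with a random ill-conditioned `K` and a random orthogonal `U`, not part of the derivation) confirms it for one instance.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 60, 40
K = rng.standard_normal((M, N)) @ np.diag(np.logspace(0, -8, N))  # ill-conditioned
phi = rng.standard_normal(M)
U, _ = np.linalg.qr(rng.standard_normal((M, M)))  # random unitary (orthogonal here)
lam = 1e-6

def tikhonov(A, b, lam):
    """Tikhonov-regularized pseudoinverse: x = (A* A + lam I)^{-1} A* b."""
    n = A.shape[1]
    return np.linalg.solve(A.conj().T @ A + lam * np.eye(n), A.conj().T @ b)

x1 = tikhonov(K, phi, lam)
x2 = tikhonov(U @ K, U @ phi, lam)
assert np.allclose(x1, x2)  # (UK)*(UK) = K*K and (UK)*(U phi) = K* phi
```

By contrast, the diagonal scaling by $\Theta$ is not unitary, which is precisely why (7) and (B4) can have different pseudoinverse solutions.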
Appendix C: Definitions and properties of several functionals used in this paper

A summary of definitions and mathematical properties of the several functionals used in this paper is given in Table I. In this table, $F$ refers to any of the functionals $T$, $R$, $D$, $M$, $N$ and $O$.

TABLE I. Definitions and properties of the various functionals used in this paper. The weight function $\rho(\ell)$ must be defined separately and is expected to go to zero for large values of $\ell$; $\ell_{ki}$ is the physical distance between voxels $k$ and $i$.

| $F$ | Entry-wise? | $Y = F[X]$ | Invertible? | $X = F^{-1}[Y]$ |
| $T$ | No | $Y = (I - X\Gamma)^{-1} X$ | Sometimes | $X = (I + Y\Gamma)^{-1} Y$ |
| $R$ | No | $Y = R_A^* X R_B$ | Yes | $X = R_A Y R_B^*$ |
| $D$ | Yes | $Y_{ij} = \delta_{ij} \sum_k X_{ik}\, \rho(\ell_{ki})$ | No | N/A |
| $M$ | Yes | $\tilde{Y}_{\mu\nu} = 0$ if $\sigma_{A\mu}\sigma_{B\nu} > \epsilon$; $\tilde{Y}_{\mu\nu} = \tilde{X}_{\mu\nu}$ otherwise | No | N/A |
| $N$ | Yes | $\tilde{Y}_{\mu\nu} = \tilde{X}_{\mu\nu}$ if $\sigma_{A\mu}\sigma_{B\nu} > \epsilon$; $\tilde{Y}_{\mu\nu} = 0$ otherwise | No | N/A |
| $O$ | Yes | $\tilde{Y} = M[\tilde{X}] + \tilde{T}_{\rm exp} = \tilde{X} - N[\tilde{X}] + \tilde{T}_{\rm exp}$ | No | N/A |
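As a concrete illustration of the first row of Table I, the sketch below (our own toy example; the random matrices `V` and `Gamma` are hypothetical stand-ins, with `Gamma` scaled so that the maps remain invertible) implements $Y = (I - X\Gamma)^{-1}X$ and its inverse $X = (I + Y\Gamma)^{-1}Y$ and verifies the round trip numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
Gamma = 0.01 * rng.standard_normal((n, n))  # toy Gamma, kept small for invertibility
V = rng.standard_normal((n, n))             # toy potential; need not be diagonal

def t_of_v(V, Gamma):
    """T[V] = (I - V Gamma)^{-1} V; defined when I - V Gamma is invertible."""
    n = V.shape[0]
    return np.linalg.solve(np.eye(n) - V @ Gamma, V)

def v_of_t(T, Gamma):
    """Inverse map: V = (I + T Gamma)^{-1} T, when I + T Gamma is invertible."""
    n = T.shape[0]
    return np.linalg.solve(np.eye(n) + T @ Gamma, T)

T = t_of_v(V, Gamma)
assert np.allclose(v_of_t(T, Gamma), V)  # the round trip recovers V
```

The entry "Sometimes" in the invertibility column reflects the fact that $I + Y\Gamma$ need not be invertible for an arbitrary $Y$; the conservative scaling of `Gamma` above keeps both maps well defined in this toy example.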