A very fast iterative algorithm for TV-regularized image reconstruction with applications to low-dose and few-view CT
Hiroyuki Kudo, Fukashi Yamazaki, Takuya Nemoto, Keita Takaki
Hiroyuki Kudo^{a,b}, Fukashi Yamazaki^a, Takuya Nemoto^a, and Keita Takaki^a
^a Faculty of Engineering, Information and Systems, University of Tsukuba, Tennoudai 1-1-1, Tsukuba 305-8573, Japan
^b JST-ERATO Momose Quantum-Beam Phase Imaging Project, Katahira, Aoba-ku, Sendai 980-8577, Japan
ABSTRACT
This paper concerns iterative reconstruction for low-dose and few-view CT by minimizing a data-fidelity term regularized with the Total Variation (TV) penalty. We propose a very fast iterative algorithm to solve this problem. The algorithm derivation is outlined as follows. First, the original minimization problem is reformulated into a saddle point (primal-dual) problem by using the Lagrangian duality, to which we apply a first-order primal-dual iterative method. Second, we precondition the iteration formula using the ramp filter of the Filtered Backprojection (FBP) reconstruction algorithm in such a way that the problem solution is not altered. The resulting algorithm resembles the structure of the so-called iterative FBP algorithm, and it converges to the exact minimizer of the cost function very fast.
Keywords:
Tomography, Image Reconstruction, Image Processing, Iterative Reconstruction, Primal-Dual Algorithm
1. INTRODUCTION
Within the famous Total Variation (TV) regularization framework, image reconstruction in low-dose CT is formulated as minimizing the cost function

$$ f(\vec{x}) = \beta\|\vec{x}\|_{\rm TV} + \frac{1}{2}\|A\vec{x}-\vec{b}\|^2_W \quad \text{subject to } \vec{x} \ge 0, $$

and that in few-view CT is formulated as

$$ \min_{\vec{x}} \|\vec{x}\|_{\rm TV} \quad \text{subject to } A\vec{x}=\vec{b},\ \vec{x} \ge 0, $$

where $\|\vec{x}\|_{\rm TV}$ is the TV norm, which aims at regularizing the ill-conditioned reconstruction problem. A variety of iterative algorithms have been proposed for solving the above problems in the CT reconstruction field, based on modifying classical iterative algorithms such as the gradient method, ART (Algebraic Reconstruction Technique), SIRT (Simultaneous Iterative Reconstruction Technique), and SART (Simultaneous Algebraic Reconstruction Technique), often combined with the Ordered-Subsets (OS) technique [1]-[10] (and many others). However, it is fair to say that there exist very few iterative algorithms in the engineering literature which not only converge fast but also can exactly solve the above TV-regularized problems. From a mathematical point of view, this difficulty partly comes from the non-differentiability of the TV penalty. However, in the applied mathematics field, research progress on this topic has been very fast, and many nice algorithms to handle the TV regularization have been developed since 2010. Therefore, we believe that it is time for CT engineers like us to develop and use more rigorous iterative algorithms for the TV-regularized image reconstructions.

In this paper, we propose a very fast iterative algorithm which can be applied to the above two typical TV-regularized image reconstructions in a unified way. The algorithm derivation is outlined as follows. First, the original minimization problem is reformulated into the standard saddle point (primal-dual) problem by using the Lagrangian duality, to which we apply the Alternating Projection Proximal (APP) algorithm, which belongs to a class of first-order primal-dual methods including the Chambolle-Pock algorithm, the Generalized Iterative Soft-Thresholding (GIST) algorithm, and the Alternating Extragradient (AE) algorithm [11]-[14].
However, the resulting algorithm converges very slowly, mainly because its overall structure is the same as that of simultaneous iterative reconstruction methods such as the classical SIRT and SART algorithms. To overcome this drawback, we precondition the iteration formula using the ramp filter of the Filtered Backprojection (FBP) reconstruction algorithm in such a way that the solution of the preconditioned iteration perfectly coincides with the solution of the original problem. The final algorithm can be interpreted as a first-order primal-dual method accelerated by the FBP-type preconditioning using the ramp filter. Unlike the famous FBP-based acceleration technique called the iterative FBP algorithm [15],[16], for both of the above two formulations (low-dose CT and few-view CT), the proposed algorithm converges to the solution exactly minimizing the cost function. We evaluate this algorithm for both the low-dose CT case and the few-view CT case with 32 projections. In the low-dose CT experiment, the proposed algorithm converged within approximately 10 iterations to almost the same image as the GIST algorithm produced with 1,000 iterations. In the few-view CT experiment, the proposed algorithm required only 3 iterations to reach almost the same image as the Chambolle-Pock algorithm produced with 1,000 iterations. These accelerations were achieved by the introduction of the FBP-type preconditioning in the primal-dual space (in the saddle point formulation), which has not been investigated before and is an original contribution of this paper.

Further author information: (Send correspondence to H.K.) H.K.: E-mail: [email protected]
2. PROPOSED ALGORITHM FOR LOW-DOSE CT RECONSTRUCTION

2.1 Problem Formulation and Brief Review of Existing Algorithms
In this section, we formulate image reconstruction in low-dose CT and review typical existing iterative algorithms. We denote an image by a J-dimensional vector $\vec{x}=(x_1,x_2,\ldots,x_J)^T$ and the corresponding projection data (sinogram) by an I-dimensional vector $\vec{b}=(b_1,b_2,\ldots,b_I)^T$. We denote the system matrix which relates $\vec{x}$ to $\vec{b}$ by $A=\{a_{ij}\}$, where we assume that $I > J$ and $\vec{b}$ is contaminated with statistical noise. Throughout this paper, we assume that the projection data is acquired with the standard parallel-beam geometry, so that $\vec{b}$ consists of uniform samples of the continuous projection data $p(r,\theta)$. Normally, image reconstruction in this setup can be formulated as the problem of minimizing the Penalized Weighted Least-Squares (P-WLS) cost function expressed as

$$ \min_{\vec{x}} f(\vec{x}) \equiv \beta\psi(\vec{x}) + \frac{1}{2}\|A\vec{x}-\vec{b}\|^2_W \quad \text{subject to } \vec{x} \ge 0, \qquad (1) $$

where $W$ denotes the $I\times I$ diagonal matrix in which each diagonal element $w_i$ is the inverse of the noise variance $\sigma_i^2$, and $\psi(\vec{x})$ is the penalty function to smooth the image [17],[18]. Although there exist several choices of $\psi(\vec{x})$, throughout this paper we assume that $\psi(\vec{x})$ is the TV penalty function defined by

$$ \psi(\vec{x}) = \|\vec{x}\|_{\rm TV} \equiv \sum_{j=1}^{J} \sqrt{(\vec{h}_j^T\vec{x})^2 + (\vec{v}_j^T\vec{x})^2}, \qquad (2) $$

where $\vec{h}_j^T\vec{x}$ and $\vec{v}_j^T\vec{x}$ are inner-product representations of the finite-difference operations around the j-th pixel along the horizontal and vertical directions, respectively. See Fig. 1 for the detailed definitions of $\vec{h}_j$ and $\vec{v}_j$.

Figure 1. Definitions of the horizontal difference $\vec{h}_j^T\vec{x}$ and the vertical difference $\vec{v}_j^T\vec{x}$ used in the TV penalty function.

Since 2008, when GE Healthcare developed the ASIR (Advanced Statistical Iterative Reconstruction) software, iterative low-dose CT reconstruction has progressed through the following three stages.

[Image Space Denoising (1st Generation)] This class of reconstruction methods first performs an FBP reconstruction followed by smoothing with an iterative edge-preserving denoising algorithm such as TV denoising or MRF-based denoising.

[Iterative FBP Algorithm (2nd Generation)] Typically, this class of reconstruction methods is based on the following iteration formula:

$$ \vec{x}^{(k+1)} = P_X[\vec{x}^{(k)} - \gamma(A^T W^{1/2} G W^{1/2}(A\vec{x}^{(k)}-\vec{b}) + \beta\nabla\psi(\vec{x}^{(k)}))], \qquad (3) $$

where $P_X[\cdot]$ denotes the projection operator onto the positive orthant $\vec{x}\ge 0$, and $G$ is the 1-D ramp filter of the FBP reconstruction algorithm having the frequency response $|\omega|$. Thanks to the introduction of $G$ during the iteration, Eq. (3) converges very fast. However, it is well known that it cannot exactly minimize the P-WLS cost function of Eq. (1). Another limitation is that the penalty function $\psi(\cdot)$ needs to be differentiable, so the TV penalty, which is non-differentiable, cannot be used.

[True Iterative Reconstruction (IR) Algorithm (3rd Generation)] This class of reconstruction methods tries to exactly minimize the P-WLS cost function of Eq. (1). For example, if we use the gradient method, its iteration formula can be expressed as

$$ \vec{x}^{(k+1)} = P_X[\vec{x}^{(k)} - \gamma(A^T W(A\vec{x}^{(k)}-\vec{b}) + \beta\nabla\psi(\vec{x}^{(k)}))]. \qquad (4) $$

The drawback of this method is that its convergence is rather slow compared with the iterative FBP algorithm when no acceleration techniques are incorporated.
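As a concrete illustration of the TV penalty of Eq. (2), the following NumPy sketch evaluates $\|\vec{x}\|_{\rm TV}$ for an image stored as a J-dimensional vector. The paper's exact definitions of $\vec{h}_j$ and $\vec{v}_j$ are given in Fig. 1, which is not reproduced here, so this sketch assumes forward differences with zero differences at the last row/column (our assumption):

```python
import numpy as np

def tv_norm(x, H, W_px):
    """Isotropic TV norm of Eq. (2): sum_j sqrt((h_j^T x)^2 + (v_j^T x)^2).

    x is a length-J vector (J = H * W_px) storing the image row by row;
    forward differences with zero padding at the image border are assumed,
    since the exact boundary convention of Fig. 1 is not reproduced here.
    """
    u = x.reshape(H, W_px)
    dh = np.zeros_like(u)            # horizontal differences h_j^T x
    dv = np.zeros_like(u)            # vertical differences   v_j^T x
    dh[:, :-1] = u[:, 1:] - u[:, :-1]
    dv[:-1, :] = u[1:, :] - u[:-1, :]
    return np.sum(np.sqrt(dh**2 + dv**2))
```

For instance, a constant image has zero TV, while a vertical step of unit height crossing four rows has TV equal to 4, which matches the edge-length interpretation of the TV norm.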
Another limitation is that the penalty function $\psi(\cdot)$ needs to be differentiable, so the TV penalty, again, cannot be used.

[Preconditioned Gradient Method] An alternative approach known in the image reconstruction community is to use the iteration formula

$$ \vec{x}^{(k+1)} = P_X[\vec{x}^{(k)} - \gamma M(A^T W(A\vec{x}^{(k)}-\vec{b}) + \beta\nabla\psi(\vec{x}^{(k)}))], \qquad (5) $$

where $M$ denotes the 2-D ramp filter in image space having the frequency response $\sqrt{\omega_x^2+\omega_y^2}$ [19]. Unlike Eq. (3), this method converges to an exact minimizer of Eq. (1), but it is rarely used, mainly because the 2-D filter is computationally intensive and its design is troublesome.

We note that the proposed algorithm derived in the following sections also uses the FBP-type acceleration as in Eq. (3), but it converges exactly to a minimizer of Eq. (1) and also allows the use of the TV penalty. Therefore, we believe that the proposed algorithm overcomes the major drawbacks of existing low-dose CT iterative reconstruction algorithms.

In Fig. 2, we show a comparison between the true IR reconstruction and the iterative FBP reconstruction for a numerical chest phantom without a regularization term in the low-dose CT setup. As expected from the theory, it can be observed that the noise property of the iterative FBP reconstruction is worse than that of the true IR reconstruction. This result motivates us to develop an alternative FBP-type acceleration technique which is able to exactly minimize the WLS and P-WLS cost functions.

Figure 2. Comparison of reconstructed images and reconstruction error magnitude images between the true IR reconstruction and the iterative FBP reconstruction. The iterative FBP reconstruction suffers from worse noise properties.

We now derive the proposed algorithm, which is based on reformulating the minimization of Eq.
(1) into a saddle point (primal-dual) problem, preconditioning it, and applying the Alternating Projection Proximal (APP) algorithm, which belongs to a class of first-order primal-dual methods [11]. First, Eq. (1) can be reformulated into the following equality-constrained minimization by introducing an additional variable $\vec{y}$:

$$ \min_{(\vec{x},\vec{y})} \left( \beta\psi(\vec{x}) + \frac{1}{2}\|\vec{y}-\vec{b}\|^2_W \right) \quad \text{subject to } A\vec{x}-\vec{y}=0,\ \vec{x}\ge 0, \qquad (6) $$

where we note that $\vec{y}$ can be interpreted as forward-projected projection data computed from $\vec{x}$. Next, we perform a preconditioning on the constraint $A\vec{x}=\vec{y}$. When we solve Eq. (6) using an iterative algorithm, its convergence speed strongly depends on the conditioning of the linear constraint $A\vec{x}=\vec{y}$. So, we perform the preconditioning by multiplying $A\vec{x}-\vec{y}=0$ by some non-singular $I\times I$ matrix $D_{1/2}$. Then, Eq. (6) can be converted into

$$ \min_{(\vec{x},\vec{y})} \left( \beta\psi(\vec{x}) + \frac{1}{2}\|\vec{y}-\vec{b}\|^2_W \right) \quad \text{subject to } D_{1/2}(A\vec{x}-\vec{y})=0,\ \vec{x}\ge 0. \qquad (7) $$

The detailed form of $D_{1/2}$ is not shown here; we discuss later which choice of $D_{1/2}$ is best to accelerate the convergence. Intuitively, if we choose $D_{1/2}$ in such a way that $D = D_{1/2}^T D_{1/2}$ approximates the ramp filter of the FBP reconstruction, the final structure of the iteration formula resembles the iterative FBP algorithm, leading to fast convergence. Furthermore, we remark that the rewriting from Eq. (6) to Eq. (7) does not alter the problem solution $(\vec{x},\vec{y})$. Next, we reformulate Eq. (7) into the form of a saddle point problem [11]-[14]. The procedure is as follows. The Lagrangian function $L(\vec{x},\vec{y},\vec{\mu})$ corresponding to Eq. (7) with respect to the constraint $D_{1/2}(A\vec{x}-\vec{y})=0$ is defined by

$$ L(\vec{x},\vec{y},\vec{\mu}) = \beta\psi(\vec{x}) + \frac{1}{2}\|\vec{y}-\vec{b}\|^2_W + \vec{\mu}^T D_{1/2}(A\vec{x}-\vec{y}), \qquad (8) $$

where $\vec{\mu}$ is the Lagrange multiplier vector, which is also called the dual variable. Therefore, Eq. (7) can be converted into the saddle point problem

$$ \max_{\vec{\mu}} \min_{(\vec{x}\ge 0,\,\vec{y})} L(\vec{x},\vec{y},\vec{\mu}) = \beta\psi(\vec{x}) + \frac{1}{2}\|\vec{y}-\vec{b}\|^2_W + \vec{\mu}^T D_{1/2}(A\vec{x}-\vec{y}). \qquad (9) $$

Furthermore, the variable $\vec{y}$ can be eliminated from Eq. (9) by solving the minimization with respect to $\vec{y}$. By performing this computation, we obtain

$$ \max_{\vec{\mu}} \min_{\vec{x}\ge 0} L(\vec{x},\vec{\mu}) = \beta\psi(\vec{x}) - \frac{1}{2}\|D_{1/2}^T\vec{\mu}\|^2_{W^{-1}} - \vec{\mu}^T D_{1/2}\vec{b} + (D_{1/2}^T\vec{\mu}, A\vec{x}), \qquad (10) $$

where $(\cdot,\cdot)$ denotes the inner product. In some of the literature, Eq. (10) is called the standard form of the saddle point problem [12]. It is also called the primal-dual (PD) problem, whereas the original problem of Eq. (1) is called the primal (P) problem [20]-[22]. By using the so-called strong duality theorem of the optimization literature, we can prove the following theorem [20]-[22].

[Theorem] Assume that the penalty function $\psi(\vec{x})$ is a possibly non-differentiable convex function. We denote the solution of the primal problem (Eq. (1)) by $\vec{x}^{(\rm P)}$, and the solution of the primal-dual problem (Eq. (10)) by $(\vec{x}^{(\rm PD)}, \vec{\mu}^{(\rm PD)})$. Then, the following two properties hold.
(1) $\vec{x}^{(\rm P)} = \vec{x}^{(\rm PD)}$
(2) $L(\vec{x}^{(\rm PD)}, \vec{\mu}^{(\rm PD)}) = f(\vec{x}^{(\rm P)})$

From this theorem, if we succeed in exactly solving the saddle point problem of Eq. (10), its solution vector $\vec{x}^{(\rm PD)}$ is also a solution of the original problem of Eq. (1), i.e. $\vec{x}^{(\rm P)}$. In summary of the above discussion, we need to solve the following saddle point problem:

$$ \max_{\vec{\mu}} \min_{\vec{x}} L(\vec{x},\vec{\mu}) = g(\vec{x}) - h(\vec{\mu}) + (D_{1/2}^T\vec{\mu}, A\vec{x}), $$
$$ g(\vec{x}) = \beta\psi(\vec{x}) + i(\vec{x}), \quad h(\vec{\mu}) = \frac{1}{2}\|D_{1/2}^T\vec{\mu}\|^2_{W^{-1}} + \vec{\mu}^T D_{1/2}\vec{b}, \qquad (11) $$

where the non-negativity constraint $\vec{x}\ge 0$ is embedded into $g(\vec{x})$ as the indicator function $i(\vec{x})$ defined by

$$ i(\vec{x}) = \begin{cases} 0 & (\vec{x}\ge 0) \\ \infty & (\text{otherwise}) \end{cases}. \qquad (12) $$

The proposed algorithm is based on solving Eq. (11) instead of Eq. (1). A variety of iterative methods to solve Eq. (11) exist in the optimization literature, such as the classical multiplier method, the augmented Lagrangian method, the Alternating Direction Method of Multipliers (ADMM), etc. [20]-[23].
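The elimination of $\vec{y}$ between Eq. (9) and Eq. (10) is only stated in passing; for completeness, the intermediate quadratic minimization reads:

```latex
% Minimizing L(x, y, mu) of Eq. (8) over y: setting the gradient to zero,
\frac{\partial L}{\partial \vec{y}}
  = W(\vec{y}-\vec{b}) - D_{1/2}^{T}\vec{\mu} = 0
\;\;\Longrightarrow\;\;
\vec{y}^{\star} = \vec{b} + W^{-1}D_{1/2}^{T}\vec{\mu}.
% Substituting y* back into the y-dependent terms of Eq. (8):
\frac{1}{2}\|\vec{y}^{\star}-\vec{b}\|_{W}^{2} - \vec{\mu}^{T}D_{1/2}\vec{y}^{\star}
  = \frac{1}{2}\|D_{1/2}^{T}\vec{\mu}\|_{W^{-1}}^{2}
    - \vec{\mu}^{T}D_{1/2}\vec{b}
    - \|D_{1/2}^{T}\vec{\mu}\|_{W^{-1}}^{2}
  = -\frac{1}{2}\|D_{1/2}^{T}\vec{\mu}\|_{W^{-1}}^{2}
    - \vec{\mu}^{T}D_{1/2}\vec{b}.
% Adding the remaining terms beta*psi(x) + (D_{1/2}^T mu, A x) yields Eq. (10).
```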
Among them, we use the class of iterative methods called first-order primal-dual methods, because they require relatively simple computations per iteration [11]-[14]. The basic overall structure of these iterative methods is to alternately repeat an update of the primal variable $\vec{x}$ along the descent direction of the Lagrangian $-\partial L(\vec{x},\vec{\mu})/\partial\vec{x}$, i.e. the descent direction of $g(\vec{x}) + (D_{1/2}^T\vec{\mu}, A\vec{x})$, and an update of the dual variable $\vec{\mu}$ along the ascent direction of the Lagrangian $\partial L(\vec{x},\vec{\mu})/\partial\vec{\mu}$, i.e. the ascent direction of $-h(\vec{\mu}) + (D_{1/2}^T\vec{\mu}, A\vec{x})$, with some additional extrapolation step. The role of the extrapolation step is to guarantee that each update from $(\vec{x}^{(k)},\vec{\mu}^{(k)})$ to $(\vec{x}^{(k+1)},\vec{\mu}^{(k+1)})$ becomes a non-expansive mapping, leading to convergence to the saddle point. In each update along the descent or ascent direction, either a gradient step or a proximal step can be used, depending on the differentiabilities of $g(\vec{x})$ and $h(\vec{\mu})$. Typically, there exist four variations of the first-order primal-dual methods: the Chambolle-Pock algorithm, the Generalized Iterative Soft-Thresholding (GIST) algorithm, the Alternating Projection Proximal (APP) algorithm, and the Alternating Extragradient (AE) algorithm [11]-[14]. The differences in the operations used by the four algorithms are briefly summarized in Table 1.

Table 1. Differences among the four first-order primal-dual algorithms.
  Name | Required differentiability | Primal update | Dual update | Extrapolation
  Alternating Projection Proximal (APP) [11] | g(x) can be non-differentiable, h(mu) needs to be differentiable | proximal | gradient | dual space
  Chambolle-Pock (CP) [12] | both g(x) and h(mu) can be non-differentiable | proximal | proximal | primal space
  Generalized Iterative Soft-Thresholding (GIST) [13] | g(x) needs to be differentiable, h(mu) can be non-differentiable | gradient | proximal | dual space
  Alternating Extragradient (AE) [14] | both g(x) and h(mu) need to be differentiable | gradient | gradient | dual space

In our problem, as is clear from Eq. (11), $g(\vec{x})$ is non-differentiable because it corresponds to the TV penalty (combined with the non-negativity constraint), whereas $h(\vec{\mu})$ is differentiable (quadratic form). Thus, we decided to employ the APP algorithm [11]. For general (non-differentiable) $g(\vec{x})$ and (differentiable) $h(\vec{\mu})$, the iteration formula of the APP algorithm in its original form is summarized as follows, where the bar denotes the extrapolated dual variable.

[Extrapolation Step]
$$ \bar{\vec{\mu}}^{(k+1)} = \vec{\mu}^{(k)} + \sigma(D_{1/2}A\vec{x}^{(k)} - \nabla h(\vec{\mu}^{(k)})) \qquad (13) $$

[Primal Update (Proximal)]
$$ \vec{x}^{(k+1)} = {\rm prox}_{\tau g}(\vec{x}^{(k)} - \tau A^T D_{1/2}^T \bar{\vec{\mu}}^{(k+1)}) \qquad (14) $$

[Dual Update (Gradient)]
$$ \vec{\mu}^{(k+1)} = \vec{\mu}^{(k)} + \sigma(D_{1/2}A\vec{x}^{(k+1)} - \nabla h(\vec{\mu}^{(k)})), \qquad (15) $$

where $k$ is the iteration number, and the stepsize parameters $\tau>0$ and $\sigma>0$ are selected such that

$$ 0 < \sigma < \frac{1}{L}, \quad 0 < \tau < \frac{1}{\sigma\|D_{1/2}AA^T D_{1/2}^T\|}, \qquad (16) $$

where $L$ is the Lipschitz constant of the gradient $\nabla h(\vec{\mu})$ [11]. Furthermore, the operator ${\rm prox}_{\tau g}(\cdot)$ appearing in Eq. (14) denotes the proximity (prox) operator defined by

$$ \vec{x} = {\rm prox}_{\tau g}(\vec{z}) \equiv \arg\min_{\vec{x}} \left( \tau g(\vec{x}) + \frac{1}{2}\|\vec{x}-\vec{z}\|^2 \right) = \arg\min_{\vec{x}\ge 0} \left( \tau\beta\psi(\vec{x}) + \frac{1}{2}\|\vec{x}-\vec{z}\|^2 \right), \qquad (17) $$

where $\tau$ is called the stepsize parameter [24]. We describe later how to compute the prox operator when $\psi(\vec{x})$ is the TV penalty. We note that Eq. (13) represents the extrapolation step which mathematically guarantees the convergence, Eq. (14) is the primal update using the prox operator, and Eq. (15) is the dual update using the gradient step. The iteration formula in our special case of Eq.
(11) is obtained by applying the general algorithm of Eqs. (13)-(15) followed by the variable change $\vec{\mu} \leftarrow D_{1/2}^T\vec{\mu}$ and rewriting the extrapolation step in a much simpler form. The resulting iteration formula is expressed as

[Extrapolation Step]
$$ \bar{\vec{\mu}}^{(k+1)} = \vec{\mu}^{(k)} + \sigma D(A\vec{x}^{(k)} - \vec{b} - W^{-1}\vec{\mu}^{(k)}) = 2\vec{\mu}^{(k)} - \vec{\mu}^{(k-1)} - \sigma D W^{-1}(\vec{\mu}^{(k)} - \vec{\mu}^{(k-1)}) \qquad (18) $$

[Primal Update]
$$ \vec{x}^{(k+1)} = {\rm prox}_{\tau g}(\vec{x}^{(k)} - \tau A^T \bar{\vec{\mu}}^{(k+1)}) \qquad (19) $$

[Dual Update]
$$ \vec{\mu}^{(k+1)} = \vec{\mu}^{(k)} + \sigma D(A\vec{x}^{(k+1)} - \vec{b} - W^{-1}\vec{\mu}^{(k)}), \qquad (20) $$

where $D = D_{1/2}^T D_{1/2}$, and we simplified the extrapolation step using the fact that the $\sigma D(A\vec{x}^{(k)}-\vec{b})$ term in the extrapolation step is the same as the $\sigma D(A\vec{x}^{(k+1)}-\vec{b})$ term appearing in the previous dual update. This finishes the algorithm derivation. With respect to computational requirements, the dominant computations in implementing Eqs. (18)-(20) are a forward projection $A$, a backprojection $A^T$, and a prox operator associated with the TV penalty in each iteration. Therefore, we expect that the computational cost of the proposed algorithm is similar to that of other iterative reconstruction algorithms which use the TV penalty.

In implementing Eqs. (18)-(20), two points remain unclear. The first point is how to compute ${\rm prox}_{\tau g}(\cdot)$ in the primal update, which amounts to solving the minimization of Eq. (17). When $\psi(\vec{x})$ is the TV penalty of Eq. (2), Eq. (17) is the same as the so-called TV denoising problem (ROF problem) investigated in [25],[26], for which a variety of efficient algorithms are available. In our implementation, we use Chambolle's projection algorithm for this purpose, modified in such a way that the non-negativity constraint $\vec{x}\ge 0$ is taken into account. The second point is how to design the preconditioning matrix $D$ to achieve fast convergence. Intuitively, from Eqs. (18)-(20), a nice choice of $D$ would be the ramp filter of the FBP reconstruction, because, in this case, the first iteration initialized with $\vec{x}^{(0)}=0, \vec{\mu}^{(0)}=0$ coincides with the FBP reconstruction followed by the TV denoising.
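To make the update structure of Eqs. (18)-(20) concrete, the following toy NumPy sketch runs the iteration on a small dense problem with $W = I$, no preconditioning ($D = I$), and $\beta = 0$, so that ${\rm prox}_{\tau g}$ reduces to the projection onto $\vec{x}\ge 0$. All of these are simplifying assumptions of this sketch; a real implementation would use projector/backprojector operators, the TV-denoising prox, and the ramp-filter $D$:

```python
import numpy as np

# Toy instance: dense 4x2 system matrix and noiseless, consistent data.
A = np.array([[1., 0.], [0., 1.], [1., 1.], [1., 2.]])
x_true = np.array([1., 2.])
b = A @ x_true
Winv = np.ones(len(b))              # diagonal of W^{-1}, here the identity

# Stepsizes following Eq. (16): sigma < 1/||W^{-1}|| = 1, and
# tau < 1/(sigma * ||A A^T||) since D = I.
sigma = 0.9
tau = 0.9 / (sigma * np.linalg.norm(A @ A.T, 2))

x = np.zeros(2)
mu = np.zeros(len(b))
for k in range(5000):
    mu_bar = mu + sigma * (A @ x - b - Winv * mu)   # extrapolation, Eq. (18)
    x = np.maximum(x - tau * (A.T @ mu_bar), 0.0)   # primal update, Eq. (19)
    mu = mu + sigma * (A @ x - b - Winv * mu)       # dual update, Eq. (20)

# With consistent data, the iterates approach the non-negative WLS minimizer
# and the dual variable (the implicit noise estimate) approaches zero.
```

Note that the extrapolation step is written in its unsimplified form $\vec{\mu}^{(k)} + \sigma D(A\vec{x}^{(k)} - \vec{b} - W^{-1}\vec{\mu}^{(k)})$, which is algebraically equivalent to the two-term recursion of Eq. (18).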
However, this is not the best choice when the projection data $\vec{b}$ contains statistical noise. We discuss this issue below. First, we note that $D$ needs to be a positive definite matrix, because $D$ is required to be expressed as $D = D_{1/2}^T D_{1/2}$. By ignoring the extrapolation step for the moment and combining Eqs. (19) and (20), the approximate iteration formula with respect to the dual variable $\vec{\mu}$ is expressed as

$$ \vec{\mu}^{(k+1)} \approx \vec{\mu}^{(k)} + \sigma D(A\,{\rm prox}_{\tau g}(\vec{x}^{(k)} - \tau A^T\vec{\mu}^{(k)}) - \vec{b} - W^{-1}\vec{\mu}^{(k)}). \qquad (21) $$

Furthermore, neglecting the prox operator, we have

$$ \vec{\mu}^{(k+1)} \approx \vec{\mu}^{(k)} + \sigma D(A(\vec{x}^{(k)} - \tau A^T\vec{\mu}^{(k)}) - \vec{b} - W^{-1}\vec{\mu}^{(k)}) \approx \vec{\mu}^{(k)} - \sigma D((\tau AA^T + W^{-1})\vec{\mu}^{(k)} - (A\vec{x}^{(k)} - \vec{b})). \qquad (22) $$

Equation (22) implies that the meaning of $\vec{\mu}$ is the noise component in the projection data, and that the APP method is implicitly solving the equation $(\tau AA^T + W^{-1})\vec{\mu} = A\vec{x}^{(k)} - \vec{b}$ at each iteration $k$ by using the preconditioned iteration of Eq. (22). From this observation, the best choice of $D$ is clearly given by

$$ D = (\tau AA^T + W^{-1})^{-1}. \qquad (23) $$

However, Eq. (23) is not practical because its computation requires a large matrix inverse. So, we use a shift-invariant approximation $W^{-1} \approx \kappa I$, where $\kappa$ is the average value of the diagonal elements of $W^{-1}$. Then, noting that $AA^T$ is the blurring operator $2m/|\omega|$ in the projection data space ($m$ is the number of view angles over $0 \le \theta \le 180^\circ$ in the parallel-beam projection data), $D$ becomes the frequency-domain filter expressed as

$$ D = (\tau AA^T + \kappa I)^{-1} = F^{-1} H(\omega) F, \quad H(\omega) = \frac{|\omega|}{2m\tau + \kappa|\omega|}, \qquad (24) $$

where $F$ and $F^{-1}$ denote the 1-D Fourier transform with respect to the radial variable in the projection data space and its inverse, respectively. We note that Eq. (24) is a smoothed ramp filter. We show the frequency response $H(\omega)$ for varying parameter $\tau$ in Fig. 3. Finally, we summarize that the smoothed ramp filter is the best preconditioning matrix $D$ in the case of low-dose CT reconstruction.
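The smoothed ramp filter of Eq. (24) can be applied row by row to the sinogram with an FFT. The following sketch uses a plain DFT grid without zero padding and a unit detector sampling interval, which are simplifying assumptions:

```python
import numpy as np

def smoothed_ramp_filter(sinogram_row, m, tau, kappa, d=1.0):
    """Apply the preconditioner D of Eq. (24) to one detector row.

    H(omega) = |omega| / (2*m*tau + kappa*|omega|), where m is the number
    of view angles, tau the primal stepsize, and kappa the average of the
    diagonal of W^{-1}. The sampling interval d and the unpadded DFT grid
    are simplifying assumptions of this sketch.
    """
    n = sinogram_row.size
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=d)   # discrete radial frequencies
    H = np.abs(omega) / (2.0 * m * tau + kappa * np.abs(omega))
    return np.real(np.fft.ifft(H * np.fft.fft(sinogram_row)))
```

Since $H(0)=0$, constant rows are annihilated as with the ordinary ramp filter, while $H(\omega)\to 1/\kappa$ at high frequencies: the filter follows the ramp $|\omega|/(2m\tau)$ at low frequencies but saturates instead of amplifying high-frequency noise.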
We experimentally confirmed that there exists an essential difference between the ordinary (non-smoothed) ramp filter and the smoothed ramp filter in the convergence speed of the cost function value $f(\vec{x})$.

Figure 3. Frequency response of the preconditioning filter $H(\omega)$ used in the proposed algorithm for the low-dose CT.

We summarize the proposed algorithm for the low-dose CT in Algorithm 1. As a special case of $\beta = 0$, i.e. in the absence of the TV penalty term, we obtain a fast FBP-preconditioned algorithm to exactly minimize the WLS cost function, which is summarized in Algorithm 1'.

Algorithm 1: Low-Dose CT TV-Regularized WLS Reconstruction
Prepare the preconditioning matrix $D$. Give initial primal and dual vectors by $\vec{x}^{(0)}=0$ and $\vec{\mu}^{(0)}=0$. Set stepsize parameters $(\sigma>0, \tau>0)$ such that $0 < \sigma < 1/\|D_{1/2}W^{-1}D_{1/2}^T\|$ and $0 < \tau < 1/(\sigma\|D_{1/2}AA^T D_{1/2}^T\|)$. Execute the following steps for $k = 0, 1, 2, \ldots$.
[Step 1] Extrapolation Step
$$ \bar{\vec{\mu}}^{(k+1)} = \begin{cases} -\sigma D\vec{b} & (k=0) \\ 2\vec{\mu}^{(k)} - \vec{\mu}^{(k-1)} - \sigma D W^{-1}(\vec{\mu}^{(k)} - \vec{\mu}^{(k-1)}) & (k \ne 0) \end{cases} $$
[Step 2] Primal Update
$$ \vec{x}^{(k+1)} = {\rm prox}_{\tau g}(\vec{x}^{(k)} - \tau A^T\bar{\vec{\mu}}^{(k+1)}) = \arg\min_{\vec{x}\ge 0}\left( \tau\beta\psi(\vec{x}) + \frac{1}{2}\|\vec{x} - (\vec{x}^{(k)} - \tau A^T\bar{\vec{\mu}}^{(k+1)})\|^2 \right) $$
[Step 3] Dual Update
$$ \vec{\mu}^{(k+1)} = \vec{\mu}^{(k)} + \sigma D(A\vec{x}^{(k+1)} - \vec{b} - W^{-1}\vec{\mu}^{(k)}) $$

Algorithm 1': Low-Dose CT WLS Reconstruction
Prepare the preconditioning matrix $D$. Give initial primal and dual vectors by $\vec{x}^{(0)}=0$ and $\vec{\mu}^{(0)}=0$. Set stepsize parameters $(\sigma>0, \tau>0)$ such that $0 < \sigma < 1/\|D_{1/2}W^{-1}D_{1/2}^T\|$ and $0 < \tau < 1/(\sigma\|D_{1/2}AA^T D_{1/2}^T\|)$. Execute the following steps for $k = 0, 1, 2, \ldots$.
[Step 1] Extrapolation Step
$$ \bar{\vec{\mu}}^{(k+1)} = \begin{cases} -\sigma D\vec{b} & (k=0) \\ 2\vec{\mu}^{(k)} - \vec{\mu}^{(k-1)} - \sigma D W^{-1}(\vec{\mu}^{(k)} - \vec{\mu}^{(k-1)}) & (k \ne 0) \end{cases} $$
[Step 2] Primal Update
$$ \vec{x}^{(k+1)} = P_X[\vec{x}^{(k)} - \tau A^T\bar{\vec{\mu}}^{(k+1)}] $$
[Step 3] Dual Update
$$ \vec{\mu}^{(k+1)} = \vec{\mu}^{(k)} + \sigma D(A\vec{x}^{(k+1)} - \vec{b} - W^{-1}\vec{\mu}^{(k)}) $$

3. PROPOSED ALGORITHM FOR FEW-VIEW CT RECONSTRUCTION

The algorithm derivation used in Section 2, based on the first-order primal-dual methods combined with the preconditioning, is a rather general framework, so it can also be applied to develop a fast convergent reconstruction algorithm in the few-view CT formulation. We denote an image by a J-dimensional vector $\vec{x}=(x_1,x_2,\ldots,x_J)^T$ and the corresponding projection data (sinogram) by an I-dimensional vector $\vec{b}=(b_1,b_2,\ldots,b_I)^T$. We denote the system matrix which relates $\vec{x}$ to $\vec{b}$ by $A=\{a_{ij}\}$, where we assume that $I < J$ and $\vec{b}$ does not contain statistical noise. Normally, image reconstruction in this setup can be formulated as the problem of minimizing a penalty function $\psi(\vec{x})$ under the linear constraint $A\vec{x}=\vec{b}$ as

$$ \min_{\vec{x}} \psi(\vec{x}) \quad \text{subject to } A\vec{x}=\vec{b},\ \vec{x}\ge 0, \qquad (25) $$

where we assume that $\psi(\cdot)$ is the TV penalty defined by Eq. (2) throughout this section. Very often, in the engineering literature, the problem of Eq. (25) is converted into the unconstrained minimization $\min_{\vec{x}\ge 0}(\beta\psi(\vec{x}) + \frac{1}{2}\|A\vec{x}-\vec{b}\|^2)$, which is then solved by a technique of unconstrained optimization. However, the solution obtained in this way can be quite different from that of the original constrained problem of Eq. (25). The proposed algorithm below allows us to exactly solve Eq. (25).

We derive the proposed algorithm to solve Eq. (25) using the same mathematical framework as in Section 2. First, we introduce the $I\times I$ non-singular preconditioning matrix $D_{1/2}$ into Eq.
(25) as

$$ \min_{\vec{x}} \psi(\vec{x}) \quad \text{subject to } D_{1/2}(A\vec{x}-\vec{b})=0,\ \vec{x}\ge 0. \qquad (26) $$

We note that the solution of Eq. (25) is the same as the solution of Eq. (26). The Lagrangian function $L(\vec{x},\vec{\mu})$ corresponding to Eq. (26) with respect to the constraint $D_{1/2}(A\vec{x}-\vec{b})=0$ is defined by

$$ L(\vec{x},\vec{\mu}) = \psi(\vec{x}) + \vec{\mu}^T D_{1/2}(A\vec{x}-\vec{b}), \qquad (27) $$

where $\vec{\mu}$ is the Lagrange multiplier vector (the dual variable). Therefore, Eq. (26) can be converted into the saddle point problem

$$ \max_{\vec{\mu}} \min_{\vec{x}} L(\vec{x},\vec{\mu}) = g(\vec{x}) - h(\vec{\mu}) + (D_{1/2}^T\vec{\mu}, A\vec{x}), \quad g(\vec{x}) = \psi(\vec{x}) + i(\vec{x}), \quad h(\vec{\mu}) = \vec{\mu}^T D_{1/2}\vec{b}, \qquad (28) $$

where $i(\vec{x})$ is the indicator function defined by Eq. (12). Since the structure of Eq. (28) is the same as that of Eq. (11) (the only difference is the form of $h(\vec{\mu})$), we can use the same approach as in the low-dose CT case to develop an iterative algorithm. Applying the APP algorithm given by Eqs. (13)-(15) to Eq. (28) yields the following iterative algorithm.

[Extrapolation Step]
$$ \bar{\vec{\mu}}^{(k+1)} = \vec{\mu}^{(k)} + \sigma D(A\vec{x}^{(k)} - \vec{b}) = 2\vec{\mu}^{(k)} - \vec{\mu}^{(k-1)} \qquad (29) $$

[Primal Update]
$$ \vec{x}^{(k+1)} = {\rm prox}_{\tau g}(\vec{x}^{(k)} - \tau A^T\bar{\vec{\mu}}^{(k+1)}) \qquad (30) $$

[Dual Update]
$$ \vec{\mu}^{(k+1)} = \vec{\mu}^{(k)} + \sigma D(A\vec{x}^{(k+1)} - \vec{b}). \qquad (31) $$

This finishes the algorithm derivation. We note that, comparing Eqs. (18)-(20) with Eqs. (29)-(31), the only difference between the low-dose CT case and the few-view CT case lies in the absence of the $W^{-1}\vec{\mu}^{(k)}$ term in the latter. Next, we discuss the preconditioning matrix $D$ to achieve fast convergence. Following the same discussion as in the low-dose CT case, thanks to the absence of the $W^{-1}\vec{\mu}^{(k)}$ term, the best preconditioning matrix $D$ in Eqs. (29)-(31) can be shown to be the ramp filter of the FBP reconstruction, which is expressed as

$$ D = (\tau AA^T)^{-1} = F^{-1} H(\omega) F, \quad H(\omega) = \frac{|\omega|}{2m\tau}, \qquad (32) $$

where we note again that $m$ is the number of view angles over $0 \le \theta \le 180^\circ$ in the parallel-beam projection data, and $F$ and $F^{-1}$ denote the 1-D Fourier transform with respect to the radial variable in the projection data space and its inverse, respectively. With this choice of $D$, the first iteration initialized with $\vec{x}^{(0)}=0, \vec{\mu}^{(0)}=0$ coincides with the FBP reconstruction followed by the TV smoothing (denoising).

Algorithm 2: Few-View CT TV-Regularized Reconstruction
Prepare the preconditioning matrix $D$. Give initial primal and dual vectors by $\vec{x}^{(0)}=0$ and $\vec{\mu}^{(0)}=0$. Set stepsize parameters $(\sigma>0, \tau>0)$ such that $0 < \sigma\tau < 1/\|D_{1/2}AA^T D_{1/2}^T\|$. Execute the following steps for $k = 0, 1, 2, \ldots$.
[Step 1] Extrapolation Step
$$ \bar{\vec{\mu}}^{(k+1)} = \begin{cases} -\sigma D\vec{b} & (k=0) \\ 2\vec{\mu}^{(k)} - \vec{\mu}^{(k-1)} & (k \ne 0) \end{cases} $$
[Step 2] Primal Update
$$ \vec{x}^{(k+1)} = {\rm prox}_{\tau g}(\vec{x}^{(k)} - \tau A^T\bar{\vec{\mu}}^{(k+1)}) = \arg\min_{\vec{x}\ge 0}\left( \tau\psi(\vec{x}) + \frac{1}{2}\|\vec{x} - (\vec{x}^{(k)} - \tau A^T\bar{\vec{\mu}}^{(k+1)})\|^2 \right) $$
[Step 3] Dual Update
$$ \vec{\mu}^{(k+1)} = \vec{\mu}^{(k)} + \sigma D(A\vec{x}^{(k+1)} - \vec{b}) $$

We summarize the proposed algorithm for the few-view CT in Algorithm 2.
4. SIMULATION STUDIES
We performed two simulation studies to demonstrate the performance of Algorithm 1 in low-dose CT reconstruction and of Algorithm 2 in few-view CT reconstruction. The details of the simulation studies are summarized below.

[Low-Dose CT Simulation] We used a numerical phantom called the spot phantom consisting of 256 × 256 pixels and a single slice of a chest-scan CT image consisting of 320 × 320 pixels. The simulated projection data was computed with 1,200 angles in the parallel-beam geometry, followed by adding noise which follows transmission Poisson statistics, from which image reconstructions were performed. We compared the proposed algorithm with the following three competitive algorithms: the GIST algorithm, the iterative FBP algorithm of Eq. (3), and a gradient descent algorithm.
(a) Gradient descent algorithm: The standard gradient descent algorithm to minimize the TV-regularized WLS cost function of Eq. (1) was implemented. To deal with the non-differentiability of the TV penalty, the TV norm was approximated by the standard smoothing technique as

$$ \psi(\vec{x}) \approx \sum_{j=1}^{J} \left( \sqrt{(\vec{h}_j^T\vec{x})^2 + (\vec{v}_j^T\vec{x})^2 + \epsilon} - \sqrt{\epsilon} \right), \qquad (33) $$

where $\epsilon > 0$ is a small constant.

[Few-View CT Simulation] Image reconstructions were performed from 32 projections. We compared Algorithm 2 with the following two competitive algorithms.
(a) Chambolle-Pock algorithm [12]: This algorithm was applied to the constrained problem $\min_{\vec{x}}\psi(\vec{x})$ subject to $A\vec{x}=\vec{b},\ \vec{x}\ge 0$. Since this algorithm is of simultaneous update type as in the GIST algorithm, we expect that its convergence is rather slow.
(b) ART algorithm [27]: We also implemented the standard ART algorithm without the TV penalty, which is known to converge fast thanks to its row-action (sequential update) structure.

In Figs. 4-5, we show reconstructed images in the low-dose CT case together with the corresponding Root Mean Square Error (RMSE) values defined by

$$ {\rm RMSE} = \sqrt{ \frac{1}{J}\sum_{j=1}^{J} (x_j - x_j^{\rm (true)})^2 }, \qquad (34) $$

where $x_j$ denotes the reconstructed pixel value and $x_j^{\rm (true)}$ denotes the true pixel value. Also, in Fig. 6, we show the corresponding convergence properties of each reconstruction algorithm. These figures clearly demonstrate that Algorithm 1 converges to the exact minimizer of the TV-regularized WLS cost function very fast. In particular, it can be observed that the iterative FBP algorithm converges to an image with a worse RMSE value although its convergence is very fast, whereas Algorithm 1 converges to the exact minimizer much faster than the slower algorithms such as the gradient descent and GIST algorithms, thanks to the FBP-type preconditioning.

In Fig. 7, we show reconstructed images in the few-view CT case together with the corresponding RMSE values. Also, in Fig. 8, we show the corresponding convergence properties of each reconstruction algorithm. These figures clearly demonstrate that Algorithm 2 converges to an image with a much smaller RMSE value than the ART algorithm, and its convergence is much faster than that of the Chambolle-Pock algorithm, again thanks to the FBP-type preconditioning.
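The gradient-descent baseline needs the gradient of the smoothed TV of Eq. (33). A NumPy sketch of this gradient, $-\,{\rm div}(\nabla u/\sqrt{|\nabla u|^2+\epsilon})$, using forward differences with zero boundary (our assumed convention) is:

```python
import numpy as np

def smoothed_tv_grad(u, eps=1e-6):
    """Gradient of the smoothed TV of Eq. (33) for a 2-D image u:
    -div( grad(u) / sqrt(|grad(u)|^2 + eps) ), with forward differences
    and zero boundary conditions (an assumed convention of this sketch)."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    nrm = np.sqrt(gx**2 + gy**2 + eps)
    px, py = gx / nrm, gy / nrm
    # Divergence (negative adjoint of the forward-difference gradient).
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]; dx[:, 1:] = px[:, 1:] - px[:, :-1]
    dy[0, :] = py[0, :]; dy[1:, :] = py[1:, :] - py[:-1, :]
    return -(dx + dy)
```

The analytic gradient can be checked against a central finite difference of the smoothed TV value itself, which is a useful sanity test before plugging it into the iteration of Eq. (4).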
5. CONCLUSIONS
In this paper, we proposed new image reconstruction algorithms for two typical TV-regularized reconstruction problems in tomography. The first algorithm (Algorithm 1) minimizes the TV-regularized WLS cost function for low-dose CT, and the second algorithm (Algorithm 2) minimizes the TV penalty under the data-fidelity linear constraint for few-view CT. Both algorithms were derived by reformulating each optimization problem into a saddle point problem, preconditioning it with the ramp filter of the FBP reconstruction algorithm, and applying the Alternating Projection Proximal (APP) iterative algorithm. We demonstrated that both Algorithm 1 and Algorithm 2 converge very fast while their convergence to the exact solution of each optimization problem is guaranteed.

For low-dose CT reconstruction, very recently, we began to use a shift-variant preconditioner instead of the smoothed ramp filter, which better approximates $D = (\tau AA^T + W^{-1})^{-1}$, and confirmed an essential improvement. For few-view CT reconstruction, we began to extend the formulation to $\min_{\vec{x}} \psi(\vec{x})$ subject to $\|A\vec{x}-\vec{b}\| \le \epsilon,\ \vec{x}\ge 0$, which is able to handle projection data contaminated with noise of magnitude $\epsilon$. These newest results will be published elsewhere.

ACKNOWLEDGMENTS
This work was partially supported by JSPS KAKENHI Grant Number 15K06103.

Figure 4. Reconstructed images of the chest CT phantom in the low-dose CT experiment (1,200 projection data). Algorithm 1 was compared with the gradient descent algorithm and the iterative FBP algorithm. The numbers in the images represent RMSE reconstruction errors.

Figure 5. Reconstructed images of the spot phantom in the low-dose CT experiment. Algorithm 1 was compared with the GIST algorithm. The numbers in the images represent RMSE reconstruction errors.
Figure 6. Comparison of convergence properties among Algorithm 1, the gradient descent algorithm, the iterative FBP algorithm, and the GIST algorithm in the low-dose CT experiment. Iteration number (computational time) versus cost function value f(x) is plotted.

Figure 7. Reconstructed images of the spot phantom in the few-view CT experiment (32 projection data). Algorithm 2 was compared with the Chambolle-Pock algorithm and the ART algorithm. The numbers in the images represent RMSE reconstruction errors.

Figure 8. Comparison of convergence properties among Algorithm 2, the Chambolle-Pock algorithm, and the ART algorithm in the few-view CT experiment. Iteration number (computational time) versus RMSE reconstruction error is plotted.

REFERENCES
[1] E. Y. Sidky and X. Pan, "Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization," Phys. Med. Biol., Vol. 53, pp. 4777-4807, 2008.
[2] H. Yu and G. Wang, "A soft-thresholding filtering approach for reconstruction from a limited number of projections," Phys. Med. Biol., Vol. 55, pp. 3905-3916, 2010.
[3] S. Ramani and J. A. Fessler, "A splitting-based iterative algorithm for accelerated statistical x-ray CT reconstruction," IEEE Trans. Med. Imaging, Vol. 31, pp. 677-688, 2012.
[4] E. Y. Sidky, J. H. Jorgensen, and X. Pan, "Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle-Pock algorithm," Phys. Med. Biol., Vol. 57, pp. 3065-3091, 2012.
[5] L. Ritschl, F. Bergner, C. Fleischmann, and M. Kachelriess, "Improved total variation-based CT image reconstruction applied to clinical data," Phys. Med. Biol., Vol. 56, pp. 1545-1561, 2011.
[6] J. Song, Q. H. Liu, G. A. Johnson, and C. T. Badea, "Sparseness prior based iterative image reconstruction for retrospectively gated cardiac micro-CT," Med. Phys., Vol. 34, pp. 4476-4483, 2007.
[7] Z. Chen, X. Jin, L. Li, and G. Wang, "A limited-angle CT reconstruction method based on anisotropic TV minimization," Phys. Med. Biol., Vol. 58, pp. 2119-2141, 2013.
[8] Z. Tian, X. Jia, K. Yuan, T. Pan, and S. B. Jiang, "Low-dose CT reconstruction via edge-preserving total variation regularization," Phys. Med. Biol., Vol. 56, pp. 5949-5967, 2011.
[9] M. Defrise, C. Vanhove, and X. Liu, "An algorithm for total variation regularization in high-dimensional linear problems," Inverse Problems, Vol. 27, Paper No. 065002, 2011.
[10] V. P. Gopi, P. Palanisamy, K. A. Wahid, P. Babyn, and D. Cooper, "Micro-CT image reconstruction based on alternating direction augmented Lagrangian method and total variation," Comput. Med. Imaging Graph., Vol. 37, pp. 419-429, 2013.
[11] P. Tseng, "Alternating projection-proximal methods for convex programming and variational inequalities," SIAM J. Optim., Vol. 7, pp. 951-965, 1997.
[12] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," J. Math. Imaging Vis., Vol. 40, pp. 120-145, 2011.
[13] I. Loris and C. Verhoeven, "On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty," Inverse Problems, Vol. 27, Paper No. 125007, 2011.
[14] S. Bonettini and V. Ruggiero, "An alternating extragradient method for total variation-based image restoration from Poisson data," Inverse Problems, Vol. 27, Paper No. 095001, 2011.
[15] K. Zeng, Z. Chen, L. Zhang, and G. Wang, "An error-reduction-based algorithm for cone-beam computed tomography," Med. Phys., Vol. 31, pp. 3206-3212, 2004.
[16] J. Sunnegardh and P.-E. Danielsson, "Regularized iterative weighted filtered backprojection for helical cone-beam CT," Med. Phys., Vol. 35, pp. 4173-4185, 2008.
[17] J. A. Fessler, "Penalized weighted least-squares image reconstruction for positron emission tomography," IEEE Trans. Med. Imaging, Vol. 13, pp. 290-300, 1994.
[18] E. U. Mumcuoglu, R. Leahy, S. R. Cherry, and Z. Zhou, "Fast gradient-based methods for Bayesian reconstruction of transmission and emission PET images," IEEE Trans. Med. Imaging, Vol. 13, pp. 687-701, 1994.
[19] G. Chinn and S.-C. Huang, "A general class of preconditioners for statistical iterative reconstruction of emission computed tomography," IEEE Trans. Med. Imaging, Vol. 16, pp. 1-10, 1997.
[20] D. P. Bertsekas, "Nonlinear Programming," Athena Scientific, 1999.
[21] D. P. Bertsekas, "Convex Optimization Algorithms," Athena Scientific, 2015.
[22] R. Glowinski and P. Le Tallec, "Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics," SIAM, 1987.
[23] S. Boyd, N. Parikh, and E. Chu, "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers," Foundations and Trends in Machine Learning, Now Publishers, 2011.
[24] N. Parikh and S. Boyd, "Proximal Algorithms," Foundations and Trends in Optimization, Now Publishers, 2013.
[25] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, Vol. 60, pp. 259-268, 1992.
[26] A. Chambolle, "An algorithm for total variation minimization and applications," J. Math. Imaging Vis., Vol. 20, pp. 89-97, 2004.
[27] G. T. Herman and L. B. Meyer, "Algebraic reconstruction techniques can be made computationally efficient,"