An Iterative Least Squares Method for Proton CT Image Reconstruction
Don F. DeJongh and Ethan A. DeJongh
Abstract—Clinically useful proton Computed Tomography images will rely on algorithms to find the three-dimensional proton stopping power distribution that optimally fits the measured proton data. We present a least squares iterative method with many novel features to put proton imaging into a more quantitative framework. These include the definition of a unique solution that optimally fits the protons, the definition of an iteration vector that takes into account proton measurement uncertainties, the definition of an optimal step size for each iteration individually, the ability to simultaneously optimize the step sizes of many iterations, the ability to divide the proton data into arbitrary numbers of blocks for parallel processing and use of graphical processing units, and the definition of stopping criteria to determine when to stop iterating. We find that it is possible, for any object being imaged, to provide assurance that the image is quantifiably close to an optimal solution, and the optimization of step sizes reduces the total number of iterations required for convergence. We demonstrate the use of these algorithms on real data from the ProtonVDA system.
Index Terms—proton imaging, proton computed tomography, least squares, iterative algorithm, relaxation coefficient, parallel processing, stopping criterion
I. INTRODUCTION

In radiation therapy, protons provide a superior dose distribution compared to x rays, with a relatively low dose deposition in the entrance region (plateau), followed by a steep increase to a dose (Bragg) peak and an even steeper distal dose fall-off. However, the steep distal dose gradient and finite range of proton beams can be a profound disadvantage when their actual position in the patient is uncertain. One source of uncertainty is the use of x-ray imaging for treatment planning to obtain a map of the relative stopping power (RSP) of tissues (relative to water), which is inaccurate due to the differences in the dependence of x-ray attenuation and proton energy loss on tissue composition (electron density and atomic number). This yields an inherently inaccurate conversion of x-ray Hounsfield units to proton RSP.

Treatment planning procedures take these uncertainties into account with measures including adding uncertainty margins, selection of beam angles tangential to organs at risk, and robust optimization. Using dose delivery technology such as Pencil
Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under award number R44CA243939. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Don F. DeJongh is with ProtonVDA LLC, 1700 Park St Ste 208, Naperville, IL 60563 USA (e-mail: [email protected]).
Ethan A. DeJongh is with ProtonVDA LLC, 1700 Park St Ste 208, Naperville, IL 60563 USA (e-mail: [email protected]).

Fig. 1. Illustration of proton imaging, with tracking and residual range measurements for each proton. The proton radiograph image on the right displays range through the patient vs. transverse position.
Beam Scanning (PBS) and intensity modulation, the resulting plans are robust to the uncertainties and provide major benefits to a significant fraction of patients [1]. However, they increase the high-dose treatment volume and can preclude use of the most advantageous beam angles.

In the quest to further optimize proton therapy while also reducing costs, proton beam-based image guidance is often considered to be a prerequisite to achieve the full potential of proton therapy [2]. This is particularly the case for hypofractionated treatments, which can benefit from more conformal dose distributions and a higher standard of safety given the high dose delivery for each treatment. ProtonVDA is developing proton imaging systems [3] for both proton radiography (pRad) and proton CT (pCT) [4]. Range uncertainties can be reduced with pCT, while pRad has the potential to provide a fast and efficient check of patient setup and integrated range along a beam's eye view just before treatment [5], [6]. Proton CT will substantially reduce the uncertainties of treatment planning by directly measuring RSP without being affected by image artifacts and with much lower dose to the patient than comparable x-ray images [7].

Proton imaging uses tracking detectors to measure the transverse positions of individual protons before and after the patient, and a residual range detector to determine the proton energy absorbed within the patient, as illustrated in Fig. 1. A two-dimensional pRad image uses a single projection angle, directly quantifying proton range through the patient rather

Fig. 2. Left: ProtonVDA system [3] with pediatric head phantom placed between tracking planes, prepared to take data with pencil beam scanning. Right: The resulting proton radiograph [10] was automatically and promptly produced, and displays accurate water equivalent thickness.

Fig. 3. Protons a, b, and c are acquired with the object at different angles relative to the detectors.
For image reconstruction, all proton trajectories are placed in a single 3D coordinate grid, as shown on the right, that moves with the object.

than integrated x-ray attenuation. A three-dimensional pCT image measures the 3D RSP map of the patient by acquiring proton histories from a full set of projection angles. Proton trajectories deviate from straight lines due to multiple Coulomb scattering, thus blurring images. We overcome this limitation by measuring each proton trajectory individually to estimate its most likely path, along with its energy loss quantified as water-equivalent path length (WEPL), and then applying iterative reconstruction algorithms [8]. Another approach uses distance-driven binning with filtered backprojection to account for the curved trajectories and reconstruct the image [9].

The first challenge in producing clinically useful pCT images is to efficiently obtain a large data set of protons with accurately measured trajectories and well-calibrated WEPLs. The ProtonVDA design [3], based on well-established fast-scintillator technology, is fast, compact, monolithic, and easily scaled to large field sizes (40 x 40 cm in the current implementation). The ProtonVDA system (Fig. 2) was able to automatically and promptly produce the proton radiograph in Fig. 2 [10]. We have also produced our first pCT images [11], which required the measurement of protons through the object at a comprehensive set of angles and the placement of proton trajectories into a single 3D coordinate grid for image reconstruction, as illustrated in Fig. 3.
Our standard voxel size is 1 mm, which is matched to our expected spatial resolution.

The second challenge in producing clinically useful pCT images is to reconstruct the 3D RSP distribution that optimally fits the proton data by solving the following matrix equation for $x$:

$$Ax = b, \qquad (1)$$

where $b$ is a vector with one entry per proton, containing the WEPL measurements for each proton, $x$ is a vector with one entry per voxel, containing the RSP for that voxel, and $A$ is a matrix with one row for each proton and one column for each voxel, where each entry contains the chord length of the proton trajectory through the voxel. Since each proton touches only a tiny fraction of the voxels, $A$ is quite sparse.

Penfold and Censor have described several iterative algorithms which adjust the RSPs of the voxels touched by the protons to match the WEPLs of the protons [12]. These algorithms generally rely on a projective approach, with repeated projections of a solution vector $x_k$ onto hyperplanes in a space with coordinates defined as the components of $x$. Each hyperplane is defined by an equation obtained from one row of $A$ multiplied by $x$. The goal of the projections is to move towards a solution consistent with each proton, and these methods have been successfully applied to several pCT data sets. These algorithms are often combined with additional smoothing algorithms such as median filtering or total variation superiorization [13].

Goitein described in 1972 an iterative least squares algorithm used for reconstructing the first tomographic images using charged particles, with a formalism able to accommodate tracks with curved trajectories, and a prescription for optimizing the step size of each iteration [14]. The least squares approach is suitable for pCT imaging, where the WEPL uncertainties of individual protons are typically a factor 300 times greater than the RSP precision of individual voxels, based on a 3 mm WEPL uncertainty and a voxel size of 1 mm with a goal of a 1% RSP measurement.
In this case, the RSP precision is arrived at through the averaging effect of the large number of protons touching each voxel (on the order of $300^2$, given the factor of 300 above and the $1/\sqrt{N}$ scaling of the average). The goal of each iteration is to converge towards a point that is a best fit but that is not perfectly consistent with each proton.

Our goal is to produce a proton imaging system with prompt image reconstruction, using a method for which each iteration is as fast as possible, each iteration is as useful as possible (moves as much as possible towards an optimal solution), and that iterates only as many times as necessary (stops when close enough to the optimal solution). We present herein an iterative least squares method for pCT image reconstruction that achieves this with many novel features to put pCT imaging into a more quantitative framework. These include:

• The definition of a unique solution that optimally fits the protons. Effectively converging toward this solution eliminates some problems often associated with projective algorithms [15]:
  – The solution does not depend on the initial starting point for the iterations.
  – There is no need to have a trade-off between optimizing spatial resolution and RSP resolution.
• The definition of an iteration vector that takes into account the large WEPL uncertainties in the proton data.
• The ability to optimize the step size for each iteration individually.
• The ability to simultaneously optimize the step sizes of many iterations.
• The ability to divide the proton data into arbitrary numbers of blocks. Blocks can be as small as a single proton and still maintain the simultaneous character of the algorithm, which takes into account all protons on an equal basis for each iteration, rather than taking them into account sequentially, in which case the last proton has a disproportionate impact.
  – The resulting flexibility is very useful for optimizing use of computing resources such as GPUs.
• The definition of stopping criteria, to determine when to stop iterating.
  – As a result, it is possible, for any object being imaged, to provide assurance that the image is quantifiably close to an optimal solution.

The assurance of an image reconstruction quantifiably close to an optimal solution is a crucial step towards applying this technology to clinical treatment planning, and a useful starting point for evaluating the clinical impact of further image processing or use of approximations.

We have not incorporated smoothing methods, seeing these as better left for a later step after defining an optimal solution. While these can produce better-looking images with less noise, they can introduce unknown systematic effects, particularly when imaging complex objects with rapid density variations. While we are describing a system for proton imaging, the methods apply equally as well to other ions such as helium and can also be applied to other tomographic modalities such as x-ray imaging.

II. SOLVING Ax = b USING AN ITERATIVE LEAST SQUARES METHOD
A typical pCT image may reconstruct a few million voxels using a few hundred million protons, each with a WEPL measurement with approximately 3 mm uncertainty. The $A$ matrix is therefore tall and skinny, with no solution that exactly fits all protons. We define the proton deviation vector as:

$$d_p = Ax - b \qquad (2)$$

Here, $d_{ip}$ is the deviation for proton $i$, with $d_p \neq 0$ even for the best solution. We then define the voxel deviation vector $d_v$ as a weighted (i.e., unbiased) average of the $d_p$ of all the protons going through each voxel, with $d_{iv}$ as the deviation for voxel $i$. Each iteration updates the voxels using the voxel deviation vector:

$$x \rightarrow x - \lambda d_v \qquad (3)$$

where $\lambda$ is a relaxation coefficient that determines the step size of the iteration and can vary with iteration.

Our current choice for the weighted average is to use the chord lengths for the weights, where the deviation for a voxel can be written as

$$\frac{\sum_i a_i d_{ip}}{\sum_i a_i}, \qquad (4)$$

where the sums are over all protons touching the voxel and the $a_i$ are the chord lengths for each proton. The weights could possibly be further optimized, for example, by incorporating the individual precision for the WEPL measurement of each proton [14]. For our application, the protons are all measured with similar precision, and for simplicity we have not incorporated the WEPL uncertainties.

In terms of the $A$ matrix, we can write the voxel deviation vector as:

$$d_{iv} = \frac{(A^T d_p)_i}{\sum_j \alpha^T_{ij}} = \frac{A^T_i}{\sum_j \alpha^T_{ij}} \, d_p \qquad (5)$$

where the $\alpha^T_{ij}$ are elements of $A^T_i$, which is the $i$th row of $A^T$. We define the $\bar{A}^T$ matrix as:

$$\bar{A}^T_i = \frac{A^T_i}{\sum_j \alpha^T_{ij}} \qquad (6)$$

in terms of which we can write:

$$d_v = \bar{A}^T d_p. \qquad (7)$$

Our method is an example of a general Landweber iterative method [16], for which broad convergence conditions have been established, with

$$\bar{A}^T = V^{-} A^T \qquad (8)$$

$$V^{-} = \mathrm{diag}\!\left( 1 \Big/ \sum_j \alpha^T_{ij} \right). \qquad (9)$$
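To make the definitions concrete, here is a minimal sketch of (2)-(7) using SciPy sparse matrices; the 4-proton, 3-voxel system, chord lengths, and WEPL values are invented toy numbers, not ProtonVDA data:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy A: 4 protons (rows) x 3 voxels (columns); entries are hypothetical
# chord lengths in mm. In practice A is huge and very sparse.
A = csr_matrix(np.array([
    [1.0, 1.2, 0.0],
    [0.0, 1.1, 1.3],
    [1.4, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]))
b = np.array([2.3, 2.5, 2.4, 3.1])  # hypothetical measured WEPLs (mm)
x = np.zeros(A.shape[1])            # initial RSP estimate

# Proton deviation vector, Eq. (2)
d_p = A @ x - b

# Row normalization of A^T, Eq. (6): divide row i of A^T by sum_j alpha^T_ij
# (the column sums of A), so that Eq. (7) reproduces the chord-length-
# weighted average of Eq. (4).
col_sums = np.asarray(A.sum(axis=0)).ravel()
d_v = (A.T @ d_p) / col_sums        # Eq. (7): d_v = A-bar^T d_p

# One update of Eq. (3) with relaxation coefficient lambda = 1
lam = 1.0
x = x - lam * d_v
```

Each entry of `d_v` is the chord-length-weighted average deviation of the protons crossing that voxel, so the update pushes each voxel's RSP in the direction that reduces the average proton deviation.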
We define $\chi$ as:

$$\chi = d_p \cdot d_p = (Ax - b) \cdot (Ax - b) \qquad (10)$$

$$\frac{\partial \chi}{\partial x_i} = 2 A^T_i \cdot (Ax) - 2 A^T_i \cdot b \qquad (11)$$

and $\partial \chi / \partial x_i$ corresponds to the gradient used in Landweber iteration. To obtain the least squares solution we set $\partial \chi / \partial x_i = 0$, divide by $\sum_j \alpha^T_{ij}$, and apply (6) to obtain:

$$\bar{A}^T A x - \bar{A}^T b = 0 \qquad (12)$$

Applying (2) and (7), this is equivalent to:

$$d_v = 0 \qquad (13)$$

Thus, we see the iteration in (3) converges towards the unique least squares solution that optimizes the fit of the final image to the proton data. Our goal with the use of a weighted average in the definition of $d_v$ is to obtain an optimal direction for the iteration vector, but it is also possible to converge while defining $d_v = A^T d_p$ if this proves to have a computational or numerical advantage.

III. CHOICE OF RELAXATION COEFFICIENT
The steps to execute for iteration $k+1$ would most obviously be written as:

$$x_{k+1} = x_k - \lambda_k d_{vk} \qquad (14)$$

$$d_{p(k+1)} = A x_{k+1} - b \qquad (15)$$

$$d_{v(k+1)} = \bar{A}^T d_{p(k+1)} \qquad (16)$$

These steps require a choice of $\lambda_k$ before executing the computationally costly matrix-vector multiplications in (15) and (16). By substituting the value of $x_{k+1}$ from (14) into (15), and then (15) into (16), we re-write the last two steps as:

$$d_{p(k+1)} = d_{pk} - \lambda_k A d_{vk} \qquad (17)$$

$$d_{v(k+1)} = d_{vk} - \lambda_k \bar{A}^T (A d_{vk}) \qquad (18)$$

In this form it is possible to execute the computationally costly matrix-vector multiplications $A d_{vk}$ followed by $\bar{A}^T (A d_{vk})$ before choosing a value for the relaxation coefficient, and furthermore, to utilize the resulting vectors in the choice of $\lambda_k$. Some choices we find useful include:

• Minimize $\chi_{k+1}$. The following expression was previously derived by Goitein [14]:

$$\chi_{k+1} = d_{p(k+1)} \cdot d_{p(k+1)} = \chi_k - 2 \lambda_k \, d_{pk} \cdot (A d_{vk}) + \lambda_k^2 |A d_{vk}|^2 \qquad (19)$$

$$\frac{d \chi_{k+1}}{d \lambda_k} = -2 \, d_{pk} \cdot (A d_{vk}) + 2 \lambda_k |A d_{vk}|^2 = 0 \qquad (20)$$

$$\lambda_k = \frac{d_{pk} \cdot (A d_{vk})}{|A d_{vk}|^2} \qquad (21)$$

• Make $\sum_i d_{iv(k+1)} = 0$:

$$\lambda_k = \frac{\sum d_{vk}}{\sum \bar{A}^T (A d_{vk})} \qquad (22)$$

• Minimize $d_{v(k+1)} \cdot d_{v(k+1)}$:

$$\lambda_k = \frac{d_{vk} \cdot \bar{A}^T (A d_{vk})}{|\bar{A}^T (A d_{vk})|^2} \qquad (23)$$

Thus, we find the interesting result that an optimal step size for each iteration, when looked at individually, can be applied, with (21). Our experience is that the optimal relaxation coefficient can vary over two orders of magnitude from iteration to iteration, and the traditional method of choosing a constant $\lambda_k$ can be quite ineffectual. If most voxels have $d_v$ far from 0, a smaller $\lambda_k$ is required, since each proton will be affected by many voxels. If only a small number of voxels have $d_v$ far from 0, a larger $\lambda_k$ is possible. In this situation, a constant $\lambda_k$ will result in a very gradual movement toward the optimal solution. We have observed that it is often beneficial to use (22) or (23), especially when $d_{v(k+1)}$ departs significantly from 0.
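As an illustration of choosing $\lambda_k$ from the precomputed products in (17) and (18), the following sketch implements the per-iteration optimal step size (21) with SciPy; the small square matrix and WEPL values are hypothetical toys, not data from the authors' system:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy system (chord lengths in mm); A is sparse in practice.
A = csr_matrix(np.array([
    [1.0, 1.2, 0.0],
    [0.0, 1.1, 1.3],
    [1.4, 0.0, 1.0],
]))
b = np.array([2.3, 2.5, 2.4])                 # measured WEPLs (mm)
col_sums = np.asarray(A.sum(axis=0)).ravel()  # sum_j alpha^T_ij per voxel

x = np.zeros(A.shape[1])
d_p = A @ x - b                   # Eq. (2)
d_v = (A.T @ d_p) / col_sums      # Eq. (7)

chi_history = []
for k in range(20):
    # Costly matrix-vector products, computed BEFORE choosing lambda_k
    Adv = A @ d_v                            # A d_vk
    AtAdv = (A.T @ Adv) / col_sums           # A-bar^T (A d_vk)

    # Eq. (21): the step size minimizing chi_{k+1} along d_vk
    lam = np.dot(d_p, Adv) / np.dot(Adv, Adv)

    x = x - lam * d_v                        # Eq. (14)
    d_p = d_p - lam * Adv                    # Eq. (17)
    d_v = d_v - lam * AtAdv                  # Eq. (18)
    chi_history.append(np.dot(d_p, d_p))     # chi, Eq. (10)
```

With the step size of (21), $\chi$ is non-increasing by construction; the alternative choices (22) and (23) can be swapped in by replacing the `lam` line with the corresponding ratio of the same precomputed vectors.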
While less than optimal for the current step, using (22) or (23) often provides conditions for subsequent large steps. Various strategies are possible to combine different methods of choosing $\lambda$ at different iterations, as illustrated in Fig. 6.

An example is shown in Fig. 4 with data from the ProtonVDA system. We have found that after a few iterations the largest deviations are usually around the edges, and use of (21) can enable a subsequent large step that results in images sharper than efficiently attainable with previous approaches, as seen with the bottom image of Fig. 4, which is quantified as being very close to the optimal solution. The measured RSPs in uniform regions-of-interest agreed well with those derived from an x-ray CT image with a standard conversion of Hounsfield units to RSP [11].

Fig. 4. A 1 mm thick pCT slice of a sample of pork shoulder and ribs [11]. Top: Preliminary image. Middle: $d_v$ for voxels in the top image, with large deviations mainly near edges. A large $\lambda$ was prescribed to go to the next iteration. Bottom: Final image, for which $d_v$ was low everywhere.

IV. GLOBAL OPTIMIZATION OF MANY ITERATIVE STEPS
The idea in (17) and (18) can be generalized to defer the choice of relaxation coefficients for an arbitrary number of steps, and the combination of these steps can be globally optimized. We define the $p$ and $v$ vectors as follows:

$$p_0 = d_{p0} \qquad (24)$$

$$v_k = \bar{A}^T p_k \qquad (25)$$

$$p_{k+1} = A v_k \qquad (26)$$

It is evident by induction using (24) to (26) and the definitions of $d_p$ and $d_v$ that the following can be written as a sum of the $p$ and $v$ vectors with coefficients $\kappa_i$ for a given number of iterations $n$:

$$d_{pn} = p_0 + \sum_{i=1}^{n} \kappa_i p_i \qquad (27)$$

$$d_{vn} = v_0 + \sum_{i=1}^{n} \kappa_i v_i \qquad (28)$$

The solution vector $x$ can then be written as:

$$x_n = x_0 + \sum_{i=1}^{n} \kappa_i v_{i-1} \qquad (29)$$

as can be easily verified by substituting (29) into (2), applying (26), and comparing with (27).

The $\chi$ after $n$ iterations is, with $\kappa_0 := 1$:

$$\chi_p = d_{pn} \cdot d_{pn} \qquad (30)$$

$$= \left( \sum_{i=0}^{n} \kappa_i p_i \right) \cdot \left( \sum_{i=0}^{n} \kappa_i p_i \right) \qquad (31)$$

$$= \sum_{i,j=0}^{n} \kappa_i \kappa_j \, p_i \cdot p_j \qquad (32)$$

After minimizing $\chi$ with respect to the $\kappa_i$, we can find the $x_n$ closest to the optimum solution using (29), with no need for the $\lambda_k$. One direct way of finding the minimum is to set the partial derivatives of $\chi$ with respect to the $\kappa_i$ to zero to obtain:

$$p_i \cdot p_0 + \sum_{j=1}^{n} \kappa_j \, p_i \cdot p_j = 0 \qquad (33)$$

Defining $P_n$ as the array of $p_i \cdot p_j$, $k_n$ as the vector of $\kappa_i$, and $p_n$ as the vector of $-p_i \cdot p_0$, the problem reduces to solving for $k_n$ in the following equation, for which there are many standard methods:

$$P_n k_n = p_n \qquad (34)$$

Alternatively, $\chi$ can be defined from the $d_{vn}$, and since as described above $d_v = 0$ for the optimal result, the following, with similar definitions, leads to a similar solution as (34):

$$\chi_v = d_{vn} \cdot d_{vn} \qquad (35)$$

$$V_n k_n = v_n \qquad (36)$$

For our application, the entries for $A$ and $b$ are usually defined in mm, and the entries of $x$ have no units. With repeated iterations, the resulting vectors often increase rapidly in numerical magnitude. In theory, this is not a problem, but in practice it can affect the numerical stability of the solution of (34) or (36). We resolve this by using for our units a length scale that maintains roughly constant magnitudes. This can be found with a few trials after the iterations are finished but before solving (34) or (36). The length of the reconstruction volume is a good first guess in our experience.

As illustrated in Fig. 5, we have found that optimizing many iterations simultaneously has major benefits in terms of the number of iterations needed to reduce $\chi$ to a given level. Fig. 6 compares optimizing based on $\chi_p$, $\chi_v$, or alternating between the two. We have found that an alternating strategy provides the best convergence. As discussed in Section III, the steps based on $\chi_v$ bring the average $d_v$ closer to 0, and set up conditions for improved steps based on $\chi_p$. Steps based on $\chi_p$ often move the average $d_v$ away from 0.

The alternating strategy is powerful enough that a single-step alternating strategy is often as optimal as a multi-step strategy. Fig. 7 compares the use of alternating strategies for both single-step and seven-step optimization for the data used for Fig. 9, and shows similar performance for this example.

Fig. 5. For a typical data set, $\chi_p$ versus iteration number, with simultaneous optimization of several steps, using (34).

Fig. 6. For a typical data set, $\chi_p$ versus iteration number for a variety of strategies, based on either (34) (dp), (36) (dv), or alternating between the two.

Fig. 7. $\sigma_v$ as defined in (38), and r.m.s. $d_v$ versus iteration number, optimizing one step and seven steps using an alternating strategy, for the data used in Fig. 9. As r.m.s. $d_v$ falls below $\sigma_v$, the algorithm meets the stopping criterion in (39) for common choices of $r$.

V. STOPPING CRITERIA
The above methods optimize the $\chi$ of the solution after a number of iterations, and this $\chi$ can then be used to evaluate whether further iterations are needed or if the current solution is close enough to the optimal solution. For example, worker processes can be continuously producing additional iterations of the $p$ and $v$ vectors while a parallel executive process finds the optimal coefficients and evaluates the quality of the fit.

We define a $\chi$ per degree of freedom, for which the square root can be interpreted as the average deviation per proton, as:

$$\sigma_p = \sqrt{\frac{\chi_p}{N_p - N_v}} \qquad (37)$$

where $N_p$ is the total number of protons and $N_v$ is the total number of voxels. With $N_{pv}$ as the average number of protons touching a voxel, as obtained from the data, we can define an estimated average voxel precision as:

$$\sigma_v = \frac{\sigma_p}{\bar{\alpha} \sqrt{N_{pv}}} \qquad (38)$$

where $\bar{\alpha}$ is the average chord length of a proton through a voxel. For our purposes, we can simply approximate this as the length of the side of a voxel. If a region of the image is known to have uniform RSP, the estimated voxel precision can be determined from the image in that region, but in general the estimate in (38) has the advantage of not requiring assumptions about the RSP distribution.

At the minimum $\chi$, we expect $\sigma_p$ to approach the WEPL measurement uncertainty of approximately 3 mm. Since $d_v = 0$ at the minimum, if for a given iteration the root-mean-square (r.m.s.) of $d_v$ is less than the estimated average voxel precision, it may be justified to stop iterating, since the noise from the proton measurement uncertainties is greater than the remaining distance to the optimal solution. One example of a criterion to use in the decision to stop is:

$$\mathrm{r.m.s.}\ d_v < r \sigma_v. \qquad (39)$$

We typically choose $r$ in the range of 0.2 to 0.5, and we have found that with enough iterations we can generally reduce r.m.s. $d_v$ to any level. Fig. 7 illustrates the evolution of r.m.s. $d_v$ and $\sigma_v$ with the number of iterations.

VI. STRATEGIES FOR MEMORY USE
Memory resources can be a bottleneck in the implementation of these strategies. While the $A$ matrix is very large, it is also very sparse, and various strategies to store the information in compact form are possible. For example, the entries in $A$ can be stored as lists of voxels with chord lengths for each proton, or as lists of protons with chord lengths for each voxel. Although the lists only include non-zero chord lengths, this still amounts to a large storage requirement.

In the case where we are storing the entries for $A$ as lists of voxels with chord lengths for each proton, we can take advantage of geometry to store this information with much less memory. For example, each proton can have a list of line segments which can be used to recreate the voxel list and chord lengths when needed. Each line segment should be short enough that a straight line approximates the proton trajectory to appropriate accuracy within the segment.

As another example, each proton can have a list of chord lengths, each typically stored in one byte, and a second list of base-6 numbers, stored in 4-byte integers with 12 base-6 numbers contained in each 4-byte integer. Starting from a given voxel, the first base-6 number specifies the voxel face that the proton exits, and thus the identity of the next voxel, and subsequent base-6 numbers continue the chain from there. Thus, the chord lengths can be associated with the correct voxel.

VII. STRATEGIES FOR PARALLEL PROCESSING
The algorithms described in Ref. [12], such as Diagonally Relaxed Orthogonal Projections (DROP), can process blocks of protons in parallel. Various strategies combine the results from the different blocks. For example [17], worker processes find a solution for each block of data, and a foreman process receives all the solutions, combines them, and sends the combined solution back to the worker processes for a further iteration. One drawback of these approaches is that each block
Fig. 8. Illustration of the matrix-vector multiplications, showing chord length data for different blocks of protons with different shading. The chord length data for one proton is contained in one row of $A$, and a block consists of one or more rows. An iteration sequence can execute the two matrix-vector multiplications in blocks, as shown in the bottom line, with the additional step that a foreman process must concatenate the outputs from the first matrix-vector multiplication to create the $p_{k+1}$ vector, and sum the outputs of the second matrix-vector multiplication to create the $v_{k+1}$ vector. However, the final result is the same as if the sequence were executed in a single block as shown in the upper line.

of protons must be large enough to solve the image, and the combined solution is not identical to what would be obtained with a single block.

We have designed new parallelization strategies that can use blocks with arbitrarily small numbers of protons and obtain a result which is exactly the same as if the calculations were executed in a single block. Bottlenecks may involve memory resources, CPU resources, GPU resources, and data transfer capacity. Choice of strategy will depend on the resources of a particular computing system, and implementation will rely on appropriate design of data structures and software architectures.

The computationally costly part of each full iteration involves a sequence of two matrix-vector multiplications for either $\bar{A}^T (A d_{vk})$ or $\bar{A}^T (A v_k)$ as described above. In a matrix-vector multiplication, each row of the matrix multiplies the vector independently, so it is possible to do all the row-vector multiplications in parallel, a task well suited to GPUs.

A. Iteration with proton blocks followed by voxel blocks
The most straightforward parallelization strategy is to:

1) Divide the data into blocks of protons for the first multiplication, $A v_k$, processing each block in parallel. The blocks may be as small as a single proton.
2) Assemble the output vector, concatenating the output from each block.
3) Divide the data into blocks of voxels for the second multiplication, $\bar{A}^T (A v_k)$, processing each block in parallel. The blocks may be as small as a single voxel.
4) Assemble the output vector, concatenating the output from each block.

Parallel worker processes can execute the multiplications for each block of protons, a foreman process can assemble the outputs, and an executive process can evaluate the results as described above. Blocks can be further divided into sub-blocks, and a hierarchical system of blocks may help route calculations to multiple GPUs.

Fig. 9. Top: A 1 mm thick pCT slice of a cylindrical phantom with inserts, processed using a single block [11]. Bottom: Same data processed in 10 independent blocks.

The drawback to this method is that the memory requirements exceed resources available in most currently cost-effective systems. This method requires lists of voxels with chord lengths for each proton, as well as lists of protons with chord lengths for each voxel. Although the lists only include non-zero chord lengths, this still amounts to a large storage requirement. However, the cost of memory continues to drop, and this method may become feasible in the near future.
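The four steps of this strategy can be sketched in NumPy/SciPy as follows; the 6-proton, 4-voxel matrix and the two-block partitions are arbitrary illustrations, and in a real system each block would go to a separate worker process or GPU:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical system: 6 protons x 4 voxels (random stand-in for the
# sparse chord-length matrix A).
rng = np.random.default_rng(0)
A = csr_matrix(rng.random((6, 4)))
v = rng.random(4)                              # current vector v_k
col_sums = np.asarray(A.sum(axis=0)).ravel()   # normalization for A-bar^T

# Steps 1-2: first multiplication A v_k over blocks of protons (rows),
# then concatenate the per-block outputs.
proton_blocks = [slice(0, 3), slice(3, 6)]
p_next = np.concatenate([A[blk] @ v for blk in proton_blocks])

# Steps 3-4: second multiplication A-bar^T (A v_k) over blocks of voxels
# (columns of A, i.e. rows of A^T), then concatenate.
voxel_blocks = [slice(0, 2), slice(2, 4)]
v_next = np.concatenate([
    (A[:, blk].T @ p_next) / col_sums[blk] for blk in voxel_blocks
])

# Blockwise results match the single-block computation exactly.
assert np.allclose(p_next, A @ v)
assert np.allclose(v_next, (A.T @ p_next) / col_sums)
```

Because each block's rows multiply the vector independently, the concatenated results are bit-for-bit equivalent to the single-block products, which is the property the text relies on.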
B. Iteration with coordinated blocks of protons
As illustrated in Fig. 8, the entire sequence of multiplications may be carried out independently in different blocks of protons, each with a worker process, and with each block $t$ producing output vectors $[A v_k]_t$ and $[\bar{A}^T (A v_k)]_t$. The foreman process concatenates the $[A v_k]_t$ vectors into the complete $p_{k+1}$ vector, and obtains the complete $v_{k+1}$ vector with a sum over the blocks:

$$v_{k+1} = \sum_t [\bar{A}^T (A v_k)]_t \qquad (40)$$

The complete vectors are identical to what would be obtained with a single block and are the input to the next iteration. Again, there can be a hierarchy of blocks. If the computing system contains multiple GPUs, the data can be divided into one block for each GPU. Within each block, each proton can be processed as a separate sub-block, utilizing the parallel processing power of the GPUs, carefully managing the summation of the output vectors from each proton.

Processing each proton independently (blocks of size one proton) enables major savings in the use of memory. First, there is no need for a list of protons with chord lengths for each voxel. All the needed information for the block is with the list of voxels with chord lengths for the proton, and the output vector needs only the voxels from that proton. Second, we can take advantage of the path of the proton through adjacent voxels to store the list of voxels with much less memory, as described above.

C. Iteration with independent blocks of protons
This last method does not produce exactly the same results as for a single block, but can be simply implemented without developing parallel processing architecture features such as foreman and executive processes, and provides a convenient method for rapid studies. We start by dividing protons into $N$ well-randomized and equal-sized blocks $t$ (the method is trivially extendable to different sized blocks, as long as the assignment of a proton to a block is random). Each block must contain enough protons to find a solution vector (more protons than voxels). A program able to handle a single block can then be run with $N$ copies in parallel, each copy producing a solution close to the minimum $\chi$ for its block of data.

Sums such as $\sum_j \alpha^T_{ij}$ scale linearly with the number of protons in the block. $\bar{A}^T A$ is a square matrix with dimension equal to the number of voxels, where each entry is a ratio whose numerator and denominator both on average scale linearly with the number of protons. Therefore, each entry is on average independent of the number of protons. The following holds within the statistical variability of the data for each block, where the normalization of $\bar{A}_t$ is done using only the protons in that block:

$$\bar{A}^T A \approx \bar{A}^T_t A_t \qquad (41)$$

$$V^{-}_t \approx N V^{-} \qquad (42)$$

and we can show, at minimum $\chi$,

$$\bar{A}^T A x = \bar{A}^T b \qquad (43)$$

$$= V^{-} A^T b = \sum_t V^{-} A^T_t b_t \qquad (44)$$

$$= \sum_t V^{-} V_t \bar{A}^T_t b_t \approx \frac{1}{N} \sum_t \bar{A}^T_t A_t x_t \qquad (45)$$

$$\approx \bar{A}^T A \, \frac{1}{N} \sum_t x_t \qquad (46)$$

$$x \approx \frac{1}{N} \sum_t x_t \qquad (47)$$

and we find we can simply average the solution vectors from the blocks. We have found this method works quite well, as shown for example in Fig. 9, although with a little more noise near the edges with rapid RSP variation. The cylindrical phantom in Fig. 9 contained inserts with known RSP, which are in good agreement with our measured values [11].

VIII. CONCLUSION
Clinically useful proton Computed Tomography images will rely on algorithms to find the three-dimensional proton stopping power distribution that optimally fits the measured proton data. The assurance of an image reconstruction quantifiably close to an optimal solution is a crucial step towards applying proton imaging technology to clinical treatment planning, and a useful starting point for evaluating the clinical impact of further image processing or use of approximations.

While recent work has focused mostly on the use of iterative projective algorithms, we have revisited the approach of Goitein using iterative least squares algorithms. We have found that this approach is suitable for proton imaging, which uses individual protons with relatively large measured WEPL uncertainties. Our method employs strategies with improved convergence which can naturally accommodate parallel processing, and several novel features that help put pCT imaging into a more quantitative framework.

We have presented results with the first pCT data acquired with the ProtonVDA system. Our collaboration is continuing development of the proton imaging system with the goal of achieving automatic and prompt pCT image reconstruction, and we look forward to applying the algorithms presented herein to new data sets in the future.

ACKNOWLEDGMENT
The authors thank our collaborators in the development of the pCT system: V. Rykalin and I. Polnyi from ProtonVDA; M. Pankuch, B. Kreydick, and G. DeFillippo from Northwestern Medicine Chicago Proton Center; N. Karonis, C. Ordoñez, K. Duffin, J. Winans, G. Coutrakon, C. Sarosiek, and A. Best from Northern Illinois University; J. Welsh from Edward Hines Jr VA Hospital; and R. Schulte from Loma Linda University.

REFERENCES

[1] A. J. Lomax, "Myths and realities of range uncertainty," The British Journal of Radiology, vol. 93, no. 1107, p. 20190582, 2020, PMID: 31778317. [Online]. Available: https://doi.org/10.1259/bjr.20190582
[2] A. N. Schreuder and J. Shamblin, "Proton therapy delivery: what is needed in the next ten years?" The British Journal of Radiology, vol. 93, no. 1107, p. 20190359, 2020, PMID: 31692372. [Online]. Available: https://doi.org/10.1259/bjr.20190359
[3] E. A. DeJongh, D. F. DeJongh, I. Polnyi, V. Rykalin, C. Sarosiek, G. Coutrakon, K. L. Duffin, N. T. Karonis, C. E. Ordoñez, M. Pankuch, J. S. Welsh, and J. R. Winans, "Technical note: A prototype clinical proton radiography system," 2020, arXiv:2009.04652 [physics.med-ph].
[4] C. Sarosiek, E. A. DeJongh, G. Coutrakon, D. F. DeJongh, K. L. Duffin, N. T. Karonis, C. E. Ordoñez, M. Pankuch, V. Rykalin, J. S. Welsh, and J. R. Winans, "A prototype proton radiography system for clinical use," 2020, arXiv:2009.04657 [physics.med-ph].
[5] C. Miller, B. Altoos, E. DeJongh, M. Pankuch, D. DeJongh, V. Rykalin, C. Ordoñez, N. Karonis, J. Winans, G. Coutrakon, and J. Welsh, "Reconstructed and real proton radiographs for image-guidance in proton beam therapy," Journal of Radiation Oncology, vol. 8, pp. 97–101, Mar. 2019.
[6] M. Pankuch, E. DeJongh, F. DeJongh, et al., "A method to evaluate the clinical utility of proton radiography for geometric patient alignment," in Proceedings of the 57th Annual Meeting of the Particle Therapy Cooperative Group (PTCOG), May 2018. Available: http://theijpt.org/doi/pdf/10.14338/2331-5180-5-2-000
[7] R. W. Schulte, V. Bashkirov, M. C. Loss Klock, T. Li, A. J. Wroe, I. Evseev, D. C. Williams, and T. Satogata, "Density resolution of proton computed tomography," Medical Physics, vol. 32, no. 4, pp. 1035–1046, 2005. [Online]. Available: https://aapm.onlinelibrary.wiley.com/doi/abs/10.1118/1.1884906
[8] V. Giacometti, V. A. Bashkirov, P. Piersimoni, S. Guatelli, T. E. Plautz, H. F.-W. Sadrozinski, R. P. Johnson, A. Zatserklyaniy, T. Tessonnier, K. Parodi, A. B. Rosenfeld, and R. W. Schulte, "Software platform for simulation of a prototype proton CT scanner," Medical Physics, vol. 44, no. 3, pp. 1002–1016, 2017. [Online]. Available: https://aapm.onlinelibrary.wiley.com/doi/abs/10.1002/mp.12107
[9] S. Rit, G. Dedes, N. Freud, D. Sarrut, and J. M. Létang, "Filtered backprojection proton CT reconstruction along most likely paths," Medical Physics, vol. 40, no. 3, p. 031103, 2013. [Online]. Available: https://aapm.onlinelibrary.wiley.com/doi/abs/10.1118/1.4789589
[10] C. Ordoñez, N. Karonis, K. Duffin, J. Winans, E. DeJongh, D. DeJongh, G. Coutrakon, N. Myers, M. Pankuch, and J. Welsh, "Fast in situ image reconstruction for proton radiography," Journal of Radiation Oncology, May 2019.
[11] D. DeJongh, E. DeJongh, V. Rykalin, M. Pankuch, B. Kreydick, J. Welsh, R. Schulte, N. Karonis, C. Ordoñez, K. Duffin, J. Winans, G. Coutrakon, and C. Sarosiek, "Comparison of proton stopping power measurements of animal tissues from proton CT and x-ray CT systems," poster presented at the Joint AAPM | COMP Meeting (Virtual), July 2020. Available: https://w3.aapm.org/meetings/2020AM/programInfo/programAbs.php?sid=8490&aid=50340
[12] S. Penfold and Y. Censor, "Techniques in iterative proton CT image reconstruction," Sensing and Imaging, vol. 16, Oct. 2015.
[13] B. Schultze, Y. Censor, P. Karbasi, K. Schubert, and R. Schulte, "An improved method of total variation superiorization applied to reconstruction in proton computed tomography," IEEE Transactions on Medical Imaging, vol. PP, Mar. 2018.
[14] M. Goitein, "Three-dimensional density reconstruction from a series of two-dimensional projections," Nuclear Instruments and Methods, vol. 101, no. 3, pp. 509–518, 1972.
[15] [Entry lost in source], 2009, pp. 4176–4180.
[16] G. Han, G. Qu, and Q. Wang, "Weighting algorithm and relaxation strategies of the Landweber method for image reconstruction," Mathematical Problems in Engineering, vol. 2018, pp. 1–19, Jul. 2018.
[17] N. T. Karonis et al., "High performance computing for three dimensional proton computed tomography (HPC-pCT)," U.S. Patent US 8,766,180 B2, May 2, 2010.
Don "Fritz" DeJongh received the B.S. degree in physics and mathematics from the Ohio State University in 1983, and the Ph.D. degree in physics from the California Institute of Technology in 1990. From 1990 to 2012, he was a Research Associate, Wilson Fellow, and Scientist at Fermilab, conducting research in particle physics and particle astrophysics. In 2014, he co-founded ProtonVDA, focusing on developing instrumentation to optimize proton radiation therapy. He currently serves as CEO of ProtonVDA as well as Principal Investigator for the grant funding this work.