X-ray Scatter Estimation Using Deep Splines
Philipp Roser, Annette Birkhold, Alexander Preuhs, Christopher Syben, Lina Felsner, Elisabeth Hoppe, Norbert Strobel, Markus Kowarschik, Rebecca Fahrig, Andreas Maier
Abstract—Algorithmic X-ray scatter compensation is a desirable technique in flat-panel X-ray imaging and cone-beam computed tomography. State-of-the-art U-net based image translation approaches have yielded promising results. However, since no physics constraints are applied to the output of the U-net, it cannot be ruled out that it yields spurious results, which may be misleading in the context of medical imaging. To overcome this problem, we propose to embed B-splines as a known operator into neural networks. This inherently limits their predictions to well-behaved and smooth functions. In a study using synthetic head and thorax data as well as real thorax phantom data, we found that our approach performed on par with the U-net when comparing both algorithms based on quantitative performance metrics. However, our approach not only reduces runtime and parameter complexity, but is also much more robust to unseen noise levels. While the U-net responded with visible artifacts, our approach preserved the X-ray signal's frequency characteristics.
Index Terms—Approximation, B-spline, neural network, X-ray scatter.
I. INTRODUCTION

X-ray transmission imaging enables advanced diagnostic imaging and facilitates instrument guidance during minimally-invasive interventions. Unfortunately, primary photons scattered by the patient's body impair image quality at the detector. Cone-beam X-ray systems using flat-panel detectors suffer particularly from this phenomenon due to the large X-ray field and the resulting high scatter-to-primary ratio. Besides contrast degradation in X-ray projections, artifacts in cone-beam computed tomography (CBCT) deteriorate the image quality in the reconstructed volume. Since the advent of CBCT, various methods to compensate for scatter have been developed [1], [2]. These can be categorized into physical and algorithmic scatter compensation methods.
A. Physical Scatter Compensation
Physical scatter compensation refers to the direct manipulation of the X-ray field to either suppress scattered photons from reaching the detector or modulate them to enable differentiation into scattered and primary photons. A simple yet effective approach is to increase the patient-to-detector distance, the so-called air gap [3]–[6]. Since the number of scattered photons depends on the irradiated volume, the scatter-to-primary ratio can be reduced by shaping the X-ray field to a small volume of interest. In slit scanning, the stitching of these smaller volumes of interest yields a normal-sized reconstruction [7], [8]. Although slit scanning reduces patient dose and scattered radiation, the prolonged acquisition time complicates motion compensation and X-ray tube heat management [9]. Since relying on large air gaps or strict collimation limits the flexibility of the imaging system, clinical systems are equipped with anti-scatter grids that physically block scattered photons [5], [10]–[14]. Although state-of-the-art for clinical CBCT, anti-scatter grids have disadvantages. For example, they can increase radiation exposure [15] and lead to Moiré and grid line artifacts [16]. Overall, the efficiency and effectiveness of anti-scatter grids depend on multiple factors, e.g., air gap, photon energy, source-to-detector distance, and detector resolution [2], [17], [18]. In contrast to detector-side grids, primary modulation follows a different operating principle [19]–[21]. In the shadow of this modulator grid, almost no primary radiation is measured. Instead, information about the scatter properties of the irradiated volume is encoded. Using demodulation algorithms, the X-ray projection and the associated scatter distribution can be distinguished. Due to its complex structure, primary modulation, especially with C-arm systems, has not made its way into clinical practice. In conclusion, hardware-based solutions induce additional manufacturing costs and limit flexibility. Also, since most approaches need algorithmic support or look-up tables, software-only solutions are highly desirable [1], [2].

[Affiliations: P. Roser, A. Preuhs, C. Syben, L. Felsner, E. Hoppe, and A. Maier are with the Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany. P. Roser is funded by the Erlangen Graduate School in Advanced Optical Technologies (SAOT), Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany. A. Maier is principal investigator at the SAOT. A. Birkhold, M. Kowarschik, and R. Fahrig are employees of Siemens Healthcare GmbH, 91301 Forchheim, Germany. N. Strobel is with the Institute of Medical Engineering Schweinfurt, University of Applied Sciences Würzburg-Schweinfurt, 97421 Schweinfurt, Germany.]
B. Algorithmic Scatter Compensation
Algorithmic scatter compensation approaches typically try to estimate the scatter signal from the acquired X-ray projections first. The estimated scatter image is then subtracted to obtain the primary signal. For CBCT, Monte Carlo (MC) methods or Boltzmann transport solvers are common approaches [22]–[24]. Both require an initial reconstruction. Their computational complexity can partially be coped with using dedicated hardware and variance reduction techniques [25], or by using a coarse simulation to derive the scattering model online [26]. Trading off effectiveness for efficiency, model-based approaches are typically preferred for clinical applications. Such approaches rely on either simplified physical or analytical models [27]–[29] or convolution kernels [30]–[32] in combination with an iterative correction scheme. Although fast, their ability to generalize to different imaging settings is limited.

Recently, learning-based methods have found their way into the modeling of physical processes [33]–[35]. In particular, the application of the U-net [36] to the task of scatter estimation, referred to as deep scatter estimation, yields superior results compared to kernel-based or MC methods, respectively [37]. The U-net based method has the potential to become the new gold standard in the domain of scatter correction. However, there are several drawbacks to employing a deep U-net. First, deep convolutional neural networks are challenging to comprehend in their operating principle. Given the high parameter complexity of such a U-net, large amounts of training samples are mandatory to arrive at a robust solution. Second, to maintain fast performance, the U-net requires a dedicated top-tier graphics processing unit (GPU), which might not be universally available.
Third, the U-net is a universal function approximator that does not include any known physical characteristics of scattered radiation.

To come up with an alternative, it appears attractive to build on the rich prior knowledge about X-ray scatter. It has been shown that the incorporation of prior knowledge into neural networks reduces error bounds [38] and simplifies the analysis of the deployed network [39]. Although subject to photon Poisson noise, X-ray scatter is mainly a low-frequency signal in the diagnostic energy regime [40], [41]. In a previous proof-of-concept study, we exploited this property by approximating X-ray scatter using a smooth bivariate B-spline and directly inferred the spline coefficients using a lean convolutional encoder architecture [42]. This (a) allowed us to reduce the number of parameters and the computational complexity tremendously, (b) ensured that no high-frequency details of diagnostic value can get manipulated, while (c) still meeting the high accuracy of the U-net.

C. Contributions
In this work, we extend the idea of approximating X-ray scatter as a bivariate B-spline by integrating its evaluation into the computational graph of a neural network. We analyze our approach in depth and compare it to several U-net realizations in a nested cross-validation study using synthetic head and thorax X-ray image data. Besides benchmarking the parameter and computational complexity, we investigate all networks in terms of their frequency response, the power spectral density of the co-domain, and the response to unseen noise levels. Finally, we demonstrate how our method performs when applied to real data in an anthropomorphic thorax phantom study.

II. MATERIALS AND METHODS
A. X-ray Projection Formation using B-splines
In general, in both techniques, X-ray fluoroscopy and CBCT, it is assumed that primary photons either reach the detector along a straight line or are absorbed completely. Therefore, the observed X-ray projection is typically described in terms of its primary component $I_\mathrm{p} \in \mathbb{R}^{w \times h}$ with image width $w$ and height $h$ in pixels. The intensity at pixel $(u, v)$ is given by the Beer-Lambert law

$$I_\mathrm{p}(u, v) = \int_E I_0(u, v, E)\, e^{-\int_\lambda \mu(\mathbf{s} + \lambda \cdot \mathbf{r}(u, v),\, E)\, \mathrm{d}\lambda}\, \mathrm{d}E. \quad (1)$$

The polychromatic flat-field projection, or X-ray spectrum, $I_0$, as well as the linear attenuation coefficient $\mu(\mathbf{l}, E)$ depend on the photon energy $E$. Note that, due to the cone-beam geometry, $I_0$ decreases towards the borders and thus depends on the pixel position $(u, v)$. The linear photon attenuation further depends on the media along the straight line $\mathbf{l} : \mathbb{R} \mapsto \mathbb{R}^3$ from the X-ray source $\mathbf{s} \in \mathbb{R}^3$ in direction $\mathbf{r}(u, v) \in \mathbb{R}^3$ to the pixel $(u, v)$. Unfortunately, in reality, besides being photoelectrically absorbed, X-ray photons also undergo Compton scattering, Rayleigh scattering, or multiple occurrences of both effects. Therefore, a more realistic formation model $I$ adds scattered photons $I_\mathrm{s}$, leading to

$$I(u, v) = I_\mathrm{p}(u, v) + I_\mathrm{s}(u, v). \quad (2)$$

Since, for CBCT, we are mostly interested in the low-frequency components of the scatter, we neglect photon shot noise and detector noise below.

In a recent study [42], we experimentally showed that the main scatter components, which are low-frequency for diagnostic X-rays, can be well approximated by sparse bivariate B-splines with low error rates. The domain and co-domain of a B-spline of degree $k$ and order $n = k + 1$ are fully characterized by the knot vectors $\mathbf{t}_u = (t_{u,0}, t_{u,1}, \ldots, t_{u,w_c+n-1})$ and $\mathbf{t}_v = (t_{v,0}, t_{v,1}, \ldots, t_{v,h_c+n-1})$, as well as the coefficient matrix $\mathbf{C} \in \mathbb{R}^{w_c \times h_c}$ with width $w_c$ and height $h_c$, respectively.
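To make the parameterization concrete, the following NumPy sketch evaluates a bivariate tensor-product B-spline from a small coefficient matrix via the Cox-de Boor recursion. It uses the degree-$k$ (order $n = k + 1$) convention; all sizes, the clamped uniform knot placement, and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def basis(i, k, t, x):
    """Degree-k basis spline B_i over knot vector t (Cox-de Boor recursion)."""
    if k == 0:  # base case: indicator of the half-open knot span [t_i, t_{i+1})
        return np.where((t[i] <= x) & (x < t[i + 1]), 1.0, 0.0)
    out = np.zeros_like(np.asarray(x, dtype=float))
    if t[i + k] > t[i]:  # guard against repeated (clamped) boundary knots
        out = out + (x - t[i]) / (t[i + k] - t[i]) * basis(i, k - 1, t, x)
    if t[i + k + 1] > t[i + 1]:
        out = out + (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * basis(i + 1, k - 1, t, x)
    return out

def clamped_uniform_knots(n_coeff, k, lo, hi):
    """Endpoint-interpolating knot vector: k-fold repeated boundary knots."""
    interior = np.linspace(lo, hi, n_coeff - k + 1)
    return np.concatenate([np.full(k, lo), interior, np.full(k, hi)])

def eval_spline_surface(C, tu, tv, k, u, v):
    """Tensor-product evaluation: stack basis values into matrices U, V and
    compute the surface as U @ C @ V.T (the matrix form used in the text)."""
    U = np.stack([basis(i, k, tu, u) for i in range(C.shape[0])], axis=1)
    V = np.stack([basis(j, k, tv, v) for j in range(C.shape[1])], axis=1)
    return U @ C @ V.T
```

For cubic splines on a fixed pixel grid, `U` and `V` depend only on the grid and the knots, so they can be precomputed once; the surface is then a plain matrix product in the coefficients, which is trivially differentiable.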
Based on this, we can approximate the spatial scatter distribution as a tensor product B-spline $\tilde{I}_{\mathrm{s},n}$

$$\tilde{I}_{\mathrm{s},n}(u, v) = \sum_{i=1}^{w_c} \sum_{j=1}^{h_c} \mathbf{C}(i, j)\, B_{i,n,\mathbf{t}_u}(u)\, B_{j,n,\mathbf{t}_v}(v), \quad (3)$$

with the basis splines $B$, which are zero for knots that do not affect the spline at the pixel $(u, v)$. The basis splines are recursively defined by

$$B_{i,n,\mathbf{t}}(x) = \frac{x - t_i}{t_{i+n} - t_i} \cdot B_{i,n-1,\mathbf{t}}(x) + \frac{t_{i+n+1} - x}{t_{i+n+1} - t_{i+1}} \cdot B_{i+1,n-1,\mathbf{t}}(x), \quad (4)$$

with

$$B_{i,0,\mathbf{t}}(x) = \begin{cases} 1, & \text{if } t_i \le x < t_{i+1} \\ 0, & \text{otherwise} \end{cases}. \quad (5)$$

Since the basis functions $B$ are equal to zero for coefficients that do not contribute to a pixel $(u, v)$, only an $n \times n$ sub-grid of $\mathbf{C}$ needs to be considered for each pixel. Thus, using matrix notation, the tensor product can be reformulated to

$$\tilde{I}_{\mathrm{s},n}(u, v) = \mathbf{u}^T \cdot [\mathbf{M}_{n,\mathbf{t}_u}(u)] \cdot \mathbf{C}_{uv} \cdot [\mathbf{M}_{n,\mathbf{t}_v}(v)]^T \cdot \mathbf{v}, \quad (6)$$

with the coefficient patch $\mathbf{C}_{uv} \in \mathbb{R}^{n \times n}$ impacting the pixel $(u, v)$, the general matrix representation of a univariate B-spline $\mathbf{M}_{n,\mathbf{t}}(\cdot) \in \mathbb{R}^{n \times n}$ [43], and the vectors $\mathbf{u} = (u^0\ u^1\ \cdots\ u^k)^T$ and $\mathbf{v} = (v^0\ v^1\ \cdots\ v^k)^T$. The general matrix representation of B-splines allows for arbitrarily spaced knot grids, i.e., in our case, it allows for endpoint interpolation, which exposes a more convenient behavior towards the borders of the evaluation grid. Besides the borders of the grid, we currently limit our approach to uniform knot grids and cubic B-splines ($k = 3$). Thus, for most pixels $(u, v)$, the B-spline is evaluated using

$$\mathbf{M}_4 = \frac{1}{6}\begin{pmatrix} 1 & 4 & 1 & 0 \\ -3 & 0 & 3 & 0 \\ 3 & -6 & 3 & 0 \\ -1 & 3 & -3 & 1 \end{pmatrix}. \quad (7)$$

By using a fixed knot grid and evaluation grid, we can pre-calculate both $\mathbf{u}^T \cdot [\mathbf{M}_{n,\mathbf{t}_u}(u)]$ and $\mathbf{v}^T \cdot [\mathbf{M}_{n,\mathbf{t}_v}(v)]$ for each pixel $(u, v)$. Padding the resulting vectors with zeros to account for all coefficients in $\mathbf{C}$ and stacking them row-wise, we obtain the evaluation matrices $\mathbf{U}_n \in \mathbb{R}^{w \times w_c}$ and $\mathbf{V}_n \in \mathbb{R}^{h \times h_c}$, respectively. Thus, we can calculate the cubic ($n = 4$) scatter distribution given the B-spline coefficients $\mathbf{C}$ via

$$\tilde{I}_{\mathrm{s},4} = \mathbf{U}_4 \cdot \mathbf{C} \cdot \mathbf{V}_4^T. \quad (8)$$

Since neural networks can extract scatter distributions from a measured X-ray projection either directly [37] or indirectly via B-spline coefficients [42], we aim to combine deep learning with the matrix evaluation scheme. Using the Kronecker product '$\otimes$', the derivative $\partial \tilde{I}_{\mathrm{s},4} / \partial \mathbf{C}$ is

$$\frac{\partial \tilde{I}_{\mathrm{s},4}}{\partial \mathbf{C}} = \mathbf{V}_4 \otimes \mathbf{U}_4. \quad (9)$$

Thus, we can straightforwardly embed the B-spline evaluation into a computational graph $f_\theta : \mathbb{R}^{w \times h} \mapsto \mathbb{R}^{w \times h}$ with trainable parameters $\theta$ without breaking differentiability and thus back-propagation.

Fig. 1. The general architecture of our proposed approach. First, a (convolutional) encoder extracts the latent variables $\mathbf{Z}$ from the input X-ray projection $I$. Second, a bottleneck network maps the latent space to bivariate spline coefficients $\mathbf{C}$. The different choices for both networks are discussed later in more detail. Third, we obtain the scatter estimate $\tilde{I}_{\mathrm{s},n}$ by evaluating the spline coefficients $\mathbf{C}$ using the pre-calculated evaluation grid defined by the pre-computed matrices $\mathbf{U}$ and $\mathbf{V}$. The top (green) path refers to the forward pass, and the bottom (orange) path to the backward pass, respectively.

B. Network Architecture
In the following, we describe the devised neural network. The overall architecture comprises a generic convolutional encoder followed by a bottleneck network to infer spline coefficients from measured X-ray projections, as proposed in our previous study [42]. This architecture is depicted in Fig. 1. We employ a lean convolutional encoder $f_\theta : \mathbb{R}^{w \times h} \mapsto \mathbb{R}^{w_c \times h_c}$ that consists of $d$ convolutional blocks. A block comprises two convolutional layers with $c$ feature channels and $3 \times 3$ kernels. Each convolutional layer is followed by a rectified linear unit (ReLU) [44] activation, and between two blocks, $2 \times 2$ average pooling is applied. As X-ray scatter is low-frequency, and to potentially limit the computational complexity of our model, we allow for additional pooling layers before the first convolutional block. The encoder $f_\theta$ is completed by a $1 \times 1$ convolution that builds the weighted sum of all $c$ channels. Overall, being based on convolutions only, $f_\theta$ encodes merely a local scatter representation $\mathbf{Z}$, since its receptive field does not necessarily cover the whole input image. To establish a global context, we employ different types of bottleneck networks $g_\phi$ to map $\mathbf{Z}$ to spline coefficients $\mathbf{C}$: (1) a constrained weighting matrix $\mathbf{W}$ with $\mathbf{W}_{i,j} > 0$, (2) an unconstrained fully-connected layer with ReLU activation (which merely relates to an unconstrained weighting matrix with enforced non-negativity constraint), (3) two fully-connected layers with ReLU activation, or (4) two additional convolutional blocks followed by a fully-connected layer.

C. Synthetic Dataset
Acquiring raw scatter-free X-ray projections and their scatter-contaminated counterparts is tedious and time-consuming, especially at the scale required for deep learning. As a solution, we leveraged the X-ray transport code MC-GPU [25] to generate artificial pairs of scatter-contaminated and scatter-free X-ray projections. We used openly available CT scans from The Cancer Imaging Archive (TCIA) [45] as inputs to the simulation. In order to keep the simulation time manageable, we selected 20 head scans from the HNSCC-3DCT-RT dataset [46] and 15 thorax scans from the CT Lymph Nodes dataset [47], respectively. To prepare the phantoms for MC simulation, we employed a basic pre-processing pipeline [48] based on tissue and density estimation [49] and connected component labeling [50]. For each CT scan, we simulated a stack of 260 X-ray projections ($w = 1152$, $h = 768$) over an angular range of […]°. The source-to-isocenter and source-to-detector distances were 785 mm and […], respectively. Per X-ray projection, we simulated […] photons sampled from an 85 kV peak voltage tungsten spectrum. All projections were flat-field normalized. Note that we used the data as is, without registering similar anatomies in a common reference frame. To be more comparable to the related work [37], to speed up training, and to suppress simulation noise, we down-sampled the projections to […] pixels. Before the down-sampling, we applied Gaussian filtering to the primary and scatter projections independently ($\sigma_\mathrm{p} = 2$, $\sigma_\mathrm{s} = 30$). Corresponding cross-sectional slices were reconstructed on an isotropic grid with […] voxels.

D. Real Dataset
To evaluate the proposed method on real data, we scanned the thorax of an anthropomorphic phantom (PBU-60, Kyoto Kagaku Co., Ltd., Kyoto, Japan) using a C-arm CBCT system (ARTIS icono floor, Siemens Healthineers AG, Forchheim, Germany). In total, we acquired 12 datasets: three short scans each for full view (referred to as full) and maximum collimation (referred to as slit), both with and without an anti-scatter grid. Each short scan consists of 397 projections ([…] pixels, […] µm isotropic spacing) over an angular range of […]° using 85 kV peak tube voltage. The source-to-isocenter and source-to-detector distances were 750 mm and […], respectively. We reconstructed […] slices on a […] µm isotropic voxel grid using an in-house reconstruction pipeline. We regard the slit scan in conjunction with the anti-scatter grid as ground truth.

III. EXPERIMENTS
In the following, we describe the experiments carried out to evaluate our proposed method. Since the U-net based approach outperformed other computational scatter estimation methods by far [37], we considered different configurations of the U-net as a baseline method. To account for our relatively small training corpora, we evaluated our method and the baseline using a nested cross-validation approach. For the head dataset (20 subjects), we used a […] ∗ […]-fold cross-validation approach and, for the thorax dataset (15 subjects), we used a […] ∗ […]-fold cross-validation approach. The real dataset was only used for testing. To keep the training procedure manageable, we divided the evaluation into a meta-parameter search based on the scatter mean absolute percentage error (MAPE), and a further in-depth analysis.

A. Meta-Parameter Search
Since clinical CBCT systems usually have fine-tuned acquisition protocols for each anatomic region, we evaluated each network architecture for head and thorax data separately. To fix the meta-parameters, we only used the synthetic head dataset and the scatter MAPE as metric. We validated both our approach and the U-net for different combinations of depths $d$ and feature channels $c$ using Glorot initialization [51], which was also used by the baseline [37]. Furthermore, we distinguished between a deep U-net (DU-net) and a shallow U-net (SU-net), in which the number of feature maps is not doubled at each level. For all experiments in this section, we performed a […] ∗ […]-fold cross-validation and trained all networks using the adaptive moments optimizer (Adam) [52] with an initial learning rate of $10^{-[…]}$ for 100 epochs. In total, we trained 12 networks for each configuration and used the averaged MAPE to assess their quality. To reduce the total computation time, we stopped each training procedure when no significant performance increase was observed for 20 epochs.

In a first step, we scanned different parametrizations of the network architectures to find the most promising ones for in-depth comparison. Overall, we tested both U-nets with $c = 16$ and $d \in \{[…]\}$, and our approach with $c = 16$, $d \in \{[…]\}$, and additional pre-pooling $p \in \{[…]\}$. In addition, to decouple the architecture of our convolutional encoder $f_\theta$ from the spline coefficient dimensionality, we investigated the four bottleneck architectures $g_\phi$ described in Sec. II-B.

B. Qualitative and Quantitative Results
Based on the findings of the previous experiments, we adapted the learning rate to $10^{-[…]}$ but kept the overall training routine. We separately performed 4 ∗ […]-fold cross-validations for the two datasets.

C. In-Depth Analysis

1) Spectral Analysis:
For clinical applications, data integrity is of utmost importance to ensure that automated systems do not alter diagnostically relevant content. While the predictions of neural networks may appear reasonable at first glance, unrealistic perturbations can be unveiled by investigating the spectral properties of (a) the predicted images [53] or (b) the neural network itself. Therefore, we first investigate all networks' performance concerning the power spectral density of the predicted scatter distributions. Second, from scatter estimation theory, we know that the scatter distribution can be recovered from the measured signal by convolution with a so-called scatter kernel [30]. This allows us to interpret a neural network, for a specific pair of scatter distribution and X-ray signal, as a filtering operation and to assess its frequency response.
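The filtering interpretation can be sketched as follows: treating one (input, predicted scatter) pair as the input and output of a linear shift-invariant system, the empirical frequency response is the point-wise ratio of their Fourier transforms. The function and variable names here are illustrative, not the paper's code.

```python
import numpy as np

def frequency_response(scatter_pred, projection, eps=1e-12):
    """Empirical frequency response H = F{scatter} / F{projection} of the
    network viewed as a filter for one particular input/output pair."""
    H = np.fft.fft2(scatter_pred) / (np.fft.fft2(projection) + eps)
    return np.abs(H), np.angle(H)  # amplitude and phase
```

If the network truly acted like convolution with a low-pass scatter kernel, the recovered amplitude would match that kernel's spectrum.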
2) Noise Analysis:
Assessing a network's robustness is inherently difficult given small training corpora. Therefore, adversarial attacks are often used to expose weaknesses deliberately. Since we trained all networks on noise-free data, testing them on data with different unseen noise levels appeared appropriate. To this end, we applied Poisson noise associated with different photon counts ranging from […] to […] to our head dataset and investigated the accuracy of the networks' predictions.
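One way to realize such noise injection on flat-field-normalized projections is sketched below; the photon-count parameter and the function name are our illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def add_poisson_noise(projection, photons, rng=None):
    """Simulate photon shot noise: the detected count at each pixel is
    Poisson-distributed around `photons` * intensity, then re-normalized."""
    rng = np.random.default_rng() if rng is None else rng
    counts = rng.poisson(photons * np.clip(projection, 0.0, None))
    return counts / float(photons)
```

A lower `photons` value yields a higher relative noise level, matching the note that a lower photon count relates to a higher noise level.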
3) Runtime Analysis:
For interventional applications, fast execution speed is essential. Therefore, we benchmarked the average execution time of all networks for different batch sizes using a 12-core CPU (Intel(R) Xeon(R) Silver 4116 CPU @ 2.10 GHz).
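A minimal timing harness for such measurements might look as follows; warm-up runs followed by an averaged wall-clock time. The names and run counts are arbitrary choices, not the paper's protocol.

```python
import time

def benchmark_ms(fn, batch, n_warmup=3, n_runs=20):
    """Average wall-clock execution time of fn(batch) in milliseconds."""
    for _ in range(n_warmup):  # warm-up: caches, lazy allocations, etc.
        fn(batch)
    t0 = time.perf_counter()
    for _ in range(n_runs):
        fn(batch)
    return (time.perf_counter() - t0) / n_runs * 1e3
```

For example, `benchmark_ms(model_forward, np.zeros((8, 384, 256)))` would time a hypothetical model call at batch size 8.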
4) Real Data Analysis:
To confirm the generalizability to real data, we tested the networks that performed best on our synthetic thorax dataset on the real thorax dataset. For reference, we considered various scatter suppression techniques, namely using an anti-scatter grid or slit scanning. As an almost scatter-free baseline, we considered the configuration using both an anti-scatter grid and slit scanning. Note that we performed neither an intensity nor a geometry calibration between the scans to account for the missing anti-scatter grid.

IV. RESULTS
A. Meta-Parameter Search

1) Network Parametrization:
Figure 2 establishes the relationship between the number of convolutional parameters to train for each network and the error rates averaged over all folds and patients. All networks' error rates range between […] and […], and we observed that, overall, the more compact networks outperform the DU-nets. Based on these findings, we selected two spline networks ($c = 16$, $(d, p) \in \{(4, 1), (5, 0)\}$) and four U-nets (deep and shallow, $c = 16$, $d \in \{6, 7\}$) for further investigations.
2) Bottleneck:
Figure 3 shows the results for the different bottleneck architectures. Our proposed constrained weighting matrix, homogeneously initialized, achieves the best results on average, even surpassing the convolutional bottleneck followed by a fully-connected layer. The unconstrained fully-connected layer architectures yield considerably worse results. Figure 4, which shows the weighting matrix for the constrained and unconstrained case, substantiates this finding. While the constrained matrix converges to a block circulant matrix, which merely relates to a convolution, the unconstrained one hardly resembles a sensible operation and is overall noisy.

Fig. 2. Distribution of different network configurations for our approach and the U-net in terms of absolute percentage errors averaged over all folds and patients. Note that, for visualization purposes, the standard deviation encoded by the circular margins does not correspond to absolute values but rather indicates the relative spread between the networks. The standard deviation is in the range of […] to […]. The x-axis refers to the number of parameters in the convolutional layers.

Fig. 3. Boxplots of the mean absolute percentage error (MAPE) of our proposed method for different bottleneck architectures averaged over the validation folds for each training fold. The horizontal lines indicate the mean value over all test and validation folds.

B. Qualitative and Quantitative Results
Figure 5 shows fold-wise boxplots for the two synthetic datasets. We observe similar error rates of approximately […] across all folds and network configurations regarding the predicted scatter distributions for the head dataset. In general, all networks achieve high SSIM values above 0.99 when comparing the volumes reconstructed from the simulated ideal primary signal to their counterparts obtained using the scatter-corrected projections. Note that, for the head data, our approach performed equally well for all folds, whereas the results obtained with the U-nets varied more widely. Processing the thorax datasets, on the other hand, was more challenging. Again, all networks performed comparably well, with error rates of approximately […]. In comparison to the head dataset, larger error margins and outliers were found. This can also be seen in the SSIM, which is, on average, just below 0.98. Again, we find that our proposed method is on par with U-net-like structures from a quantitative point of view.

Looking at selected subjects of both datasets in Fig. 6 reveals the potential advantages of the proposed approach. Modeling the predicted scatter as a B-spline intrinsically limits the network output to smooth surfaces. In contrast, both U-nets preserve some details of the input, especially the shallow architecture. However, the reconstructed slices show no systematic trend, and all methods can adequately recover the desired signal.

Fig. 4. Normalized constrained (left) and unconstrained (right) weighting matrices used to map the latent variables $\mathbf{Z}$ to spline coefficients $\mathbf{C}$.

C. In-Depth Analysis

1) Spectral Analysis:
We calculated the power spectral density by azimuthally averaging the magnitudes of the 2D Fourier transform of the X-ray scatter distributions. We then averaged the power spectral densities over all projections, patients, and test folds. The resulting densities, plotted in Fig. 7, support our previous findings. While the U-nets yield numerically sound predictions, they systematically boost the high frequencies in the scatter distributions. Our approach, however, preserves the real power spectral density over the whole frequency spectrum. As mentioned above, conventional convolutional neural networks can, for a single input image, be interpreted in terms of a filtering operation. Thus, we divided the Fourier transform of the output by the Fourier transform of the input to obtain the respective frequency responses. Figure 8 shows the amplitude and phase of the frequency responses of an ideal system, our method, and both U-nets, averaged over all projections for one patient. Our spline-net's frequency response was closer to the ideal frequency response in both amplitude and phase. However, we observed a noticeable intensity shift in our method. The U-nets, in contrast, both exhibited larger deviations in the patterns of amplitude and phase, indicating that the operation they represent is less predictable.
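The azimuthal averaging step can be sketched as follows; the binning choices here are ours, not the paper's.

```python
import numpy as np

def radial_psd(image, n_bins=32):
    """Azimuthally averaged power spectral density of a 2-D image:
    average |FFT|^2 over annuli of constant radial frequency."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = image.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2)          # radial frequency per bin
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    total = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return total / np.maximum(counts, 1)
```

For a smooth, low-frequency signal such as scatter, the resulting curve is concentrated in the first few radial bins.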
2) Noise Analysis:
Figure 9 shows plots of the networks' error rates when confronted with noise. We could confirm that the U-net is very sensitive to unseen noise levels in both configurations, whereas our approach performs more robustly.
3) Runtime Analysis:
Figure 10 compiles the inference speed of all considered networks. As expected from the parameter complexity, our approach is the fastest with […] to 30 ms, and is therefore […] to […] times faster than the SU-net with 34 ms to 50 ms. The DU-net is the slowest with 89 ms to 144 ms.
4) Real Data Analysis:
As shown in Fig. 11, all methods were able to compensate for scatter artifacts well. However, employing an anti-scatter grid still yielded the lowest overall error in Hounsfield units (HU) as compared to using slit collimation in addition. The learning-based approaches performed about as well as slit scanning without an anti-scatter grid, which is in accordance with previous findings [37]. Overall, the networks achieved similar error rates, and no systematic trend was observable.

Fig. 5. Quantitative boxplots for our synthetic datasets. The top row shows the mean absolute percentage errors (MAPE) with respect to the scatter ground truth for each test fold. The bottom row shows structural similarity indices (SSIM) between volumes reconstructed from the simulated ideal primary signal and the scatter-corrected ones using neural networks. The horizontal lines indicate the overall average across all folds. For the sake of clarity, we only depict the best performing networks in each category. For the head dataset, we depict the DU-net with $d = 6$, the SU-net with $d = 7$, and our method with $d = 4$ and $p = 1$. For the thorax dataset, we depict the DU-net with $d = 7$, the SU-net with $d = 6$, and our method with $d = 5$ and $p = 0$.

Fig. 6. Selected subjects of both datasets. The results of the learning-based approaches are given in terms of error maps using the absolute percentage error (APE) for scatter and the absolute error (AE) for the reconstructed slices. For convenience, the associated mean APE (MAPE) and mean AE (MAE) are also provided below each output.

Fig. 7. Power spectral density plots for the X-ray scatter distributions of the ground truth (black solid), ours (light orange dashed), and the U-nets (light purple dotted), averaged over all test folds. Note that the graphs associated with both U-nets are almost identical.

Fig. 8. Frequency responses of the systems for one test patient averaged over all validation folds. From top to bottom: normalized log-amplitude, phase in radians. From left to right: ideal system (GT), our spline-net (Ours), deep U-net (DU-net), shallow U-net (SU-net).

Fig. 9. Error rates of predicted scatter distributions for different noise levels averaged over all validation and test fold networks. All networks have been trained with noise-free data. Note that a lower photon count relates to a higher noise level.

Fig. 10. Runtime in ms for different architectures (specified by the depth and optional pre-pooling, $d \backslash p$) and batch sizes.

V. DISCUSSION
X-ray scatter is a major source of artifacts in interventional CBCT. Deep-learning-based approaches have shown the potential to outperform conventional physical or algorithmic approaches to scatter compensation. Without incorporating prior information, scatter distributions can be inferred from the measured X-ray signal [37]. However, data integrity and robustness are critical aspects of clinical imaging, and both can be violated by deep neural networks [54].

To ensure sound scatter estimates, we proposed to embed bivariate B-splines in neural networks to constrain their codomain to smooth results [42]. By reformulating the spline evaluation in terms of matrix multiplications, we were able to integrate B-splines with neural networks directly, such that end-to-end training was feasible. In an extensive cross-validation using synthetic data, we showed that our proposed lean convolutional encoder using B-spline evaluation performs on par with several U-net-based architectures. We substantiated this finding in a first phantom study, where our approach performed essentially as well as the U-net and the slit-scanning technique (without anti-scatter grid), which has been used as a baseline previously [37].

Note, however, that the proposed method offers several advantages not present in the U-net architectures. First, we considerably lowered the parameter and runtime complexity, rendering our method suitable for a variety of hardware. More importantly, we verified that our spline-based approach ensures data integrity concerning the power spectral density of scatter estimates and the overall network's frequency response. This property ensures that no high-frequency details, which relate to anatomic structures or pathologies, are altered. In comparison, the U-net considerably changes the corresponding spectral content, as has already been shown for neural networks containing up-convolutions [53].

Fig. 11. Central slices of reconstructed volumes for different scatter compensation strategies and errors taken with respect to the reconstruction resulting from the grid + slit data acquisition. Grid refers to employing a conventional anti-scatter grid. Slit refers to the narrowest collimator setting available on the X-ray imaging system. Full refers to using no collimation at all. Both the reconstructed slices and the difference images are shown using a gray-level window of [−, ] HU. For convenience, the absolute average HU errors and the associated standard deviations with respect to three consecutive measurements are provided.

As our spline network corresponds to a low-pass filtering operation, it is robust towards noise even when trained on noise-free data.

While we implemented the U-net baselines to the best of our knowledge, our error rates are higher than previously reported [37]. Potential reasons include, but are not limited to, (a) different simulation codes, (b) our smaller training corpora, and (c) the heterogeneity of our data. In addition, our simulation setup currently assumes an ideal detector, and we do not consider the domain shift between synthetic and real data.

For future work, we identify several research directions concerning both the method and the data and experiments. First, since our preferred bottleneck weighting matrix converges to a block-circulant matrix, a fixed representation is desirable to further reduce the number of trainable parameters and increase the plausibility of our approach. For instance, training the encoder and the bottleneck separately or introducing additional constraints are promising ways to do so. Second, replacing the bottleneck fully-connected layer with a small U-net would reduce the number of parameters while still covering the entire latent space with its receptive field.
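As an aside on the mechanism discussed above: rewriting the B-spline evaluation as a chain of matrix multiplications is what makes it embeddable in a neural network as a differentiable, known operator. The sketch below is illustrative only, not the implementation evaluated in this paper; it uses the standard basis matrix of the uniform cubic B-spline (cf. [43]) and checks the partition-of-unity property that underpins the smoothness constraint.

```python
import numpy as np

# Basis matrix of the uniform cubic B-spline in matrix form
# (cf. Qin, "General matrix representations for B-splines").
M = (1.0 / 6.0) * np.array([
    [-1.0,  3.0, -3.0, 1.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-3.0,  0.0,  3.0, 0.0],
    [ 1.0,  4.0,  1.0, 0.0],
])

def spline_patch(P, u, v):
    """Evaluate a bivariate cubic B-spline patch at local (u, v) in [0, 1).

    P is a 4x4 grid of control values (e.g., produced by a convolutional
    encoder). The evaluation is a pure chain of matrix products, so it is
    differentiable with respect to P and can be trained end to end.
    """
    Tu = np.array([u ** 3, u ** 2, u, 1.0])
    Tv = np.array([v ** 3, v ** 2, v, 1.0])
    return float(Tu @ M @ P @ M.T @ Tv)

# Partition of unity: the blending weights sum to one for every parameter
# value, so a constant control grid yields a constant surface and the
# codomain cannot contain spurious high-frequency detail.
weights = np.array([0.3 ** 3, 0.3 ** 2, 0.3, 1.0]) @ M
print(abs(weights.sum() - 1.0) < 1e-12)                   # True

P_const = np.full((4, 4), 5.0)
print(abs(spline_patch(P_const, 0.3, 0.7) - 5.0) < 1e-9)  # True
```

Because the output is a fixed cubic polynomial in (u, v) of the control values, the prediction is confined to smooth, well-behaved surfaces regardless of what the encoder produces; this is the codomain constraint referred to above.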
Third, since our network is end-to-end trainable, it appears reasonable to include the reconstruction in the computational graph and calculate the loss function in the CBCT domain [55], [56]. Last but not least, we believe our approach applies to low-frequency signal estimation and correction in general, e.g., bias field correction in magnetic resonance imaging, ultrasound imaging, or microscopy techniques.

VI. CONCLUSION
Embedding B-splines in neural networks ensures data integrity for low-frequency signals. This reduces the number of network parameters needed to arrive at physically sensible results, and thanks to the reduced parameter set, network inference can be made faster.

DISCLAIMER
The concepts and information presented in this article are based on research and are not commercially available.

REFERENCES

[1] E.-P. Rührnschopf and K. Klingenbeck, "A general framework and review of scatter correction methods in x-ray cone-beam computerized tomography. Part 1: Scatter compensation approaches," Med Phys, vol. 38, no. 7, pp. 4296–4311, 2011.
[2] ——, "A general framework and review of scatter correction methods in cone-beam CT. Part 2: Scatter estimation approaches," Med Phys, vol. 38, no. 9, pp. 5186–5199, 2011.
[3] F. Groedel and R. Wachter, "Bedeutung der Röhren-Fern- und Platten-Abstandsaufnahmen," Verh d Dt Röntgengesellschaft, vol. 17, pp. 134–135, 1926.
[4] F. Janus, "Die Bedeutung der gestreuten ungerichteten Röntgenstrahlen für die Bildwirkung und ihre Beseitigung," Verh d Dt Röntgengesellschaft, vol. 17, pp. 136–139, 1926.
[5] U. Neitzel, "Grids or air gaps for scatter reduction in digital radiography: A model calculation," Med Phys, vol. 19, no. 2, pp. 475–481, 1992.
[6] J. Persliden and G. A. Carlsson, "Scatter rejection by air gaps in diagnostic radiology. Calculations using a Monte Carlo collision density method and consideration of molecular interference in coherent scattering," Phys Med Biol, vol. 42, no. 1, pp. 155–175, 1997.
[7] G. T. Barnes, "Contrast and scatter in x-ray imaging," RadioGraphics, vol. 11, no. 2, pp. 307–323, 1991.
[8] R. Bhagtani and T. G. Schmidt, "Simulated scatter performance of an inverse-geometry dedicated breast CT system," Med Phys, vol. 36, no. 3, pp. 788–796, 2009.
[9] T. G. Schmidt, R. Fahrig, N. J. Pelc, and E. G. Solomon, "An inverse-geometry volumetric CT system with a large-area scanned source: A feasibility study," Med Phys, vol. 31, no. 9, pp. 2623–2627, 2004.
[10] G. Bucky, "Über die Ausschaltung der im Objekt entstehenden Sekundärstrahlen bei Röntgenstrahlen," Verh d Dt Röntgengesellschaft, vol. 9, pp. 30–32, 1913.
[11] W. A. Kalender, "Calculation of x-ray grid characteristics by Monte Carlo methods," Phys Med Biol, vol. 27, no. 3, pp. 353–361, 1982.
[12] H.-P. Chan and K. Doi, "Investigation of the performance of antiscatter grids: Monte Carlo simulation studies," Phys Med Biol, vol. 27, no. 6, pp. 785–803, 1982.
[13] H.-P. Chan, Y. Higashida, and K. Doi, "Performance of antiscatter grids in diagnostic radiology: Experimental measurements and Monte Carlo simulation studies," Med Phys, vol. 12, no. 4, pp. 449–454, 1985.
[14] H.-P. Chan, K. L. Lam, and Y. Wu, "Studies of performance of antiscatter grids in digital radiography: Effect on signal-to-noise ratio," Med Phys, vol. 17, no. 4, pp. 655–664, 1990.
[15] H. Aichinger, J. Dierker, S. Joite-Barfuss, and M. Saebel, Radiation Exposure and Image Quality in X-ray Diagnostic Radiology. Berlin, New York: Springer, 2004.
[16] D. M. Gauntt and G. T. Barnes, "Grid line artifact formation: A comprehensive theory," Med Phys, vol. 33, no. 6, pp. 1668–1677, 2006.
[17] V. Singh, A. Jain, D. R. Bednarek, and S. Rudin, "Limitations of anti-scatter grids when used with high resolution image detectors," in Proc SPIE Int Soc Opt Eng, B. R. Whiting and C. Hoeschen, Eds., vol. 9033. SPIE, 2014, pp. 1618–1626.
[18] R. Rana, A. Jain, A. Shankar, D. R. Bednarek, and S. Rudin, "Scatter estimation and removal of anti-scatter grid-line artifacts from anthropomorphic head phantom images taken with a high resolution image detector," in Proc SPIE Int Soc Opt Eng, D. Kontos and T. G. Flohr, Eds., vol. 9783. SPIE, 2016, pp. 1619–1628.
[19] A. Bani-Hashemi, E. Blanz, J. Maltz, D. Hristov, and M. Svatos, "TU-D-I-611-08: Cone beam x-ray scatter removal via image frequency modulation and filtering," Med Phys, vol. 32, no. 6, pp. 2093–2093, 2005.
[20] L. Zhu, N. R. Bennett, and R. Fahrig, "Scatter correction method for x-ray CT using primary modulation: Theory and preliminary results," IEEE Trans Med Imaging, vol. 25, no. 12, pp. 1573–1587, 2006.
[21] B. Bier, M. Berger, A. Maier, M. Kachelrieß, L. Ritschl, K. Müller, J.-H. Choi, and R. Fahrig, "Scatter correction using a primary modulator on a clinical angiography C-arm CT system," Med Phys, vol. 44, no. 9, pp. 125–137, 2017.
[22] W. Zbijewski and F. J. Beekman, "Fast scatter estimation for cone-beam x-ray CT by combined Monte Carlo tracking and Richardson-Lucy fitting," in IEEE Nucl Sci Symp Conf Rec, vol. 5, 2004, pp. 2774–2777.
[23] G. Poludniowski, P. M. Evans, V. N. Hansen, and S. Webb, "An efficient Monte Carlo-based algorithm for scatter correction in keV cone-beam CT," Phys Med Biol, vol. 54, no. 12, pp. 3847–3864, 2009.
[24] A. Wang, A. Maslowski, P. Messmer, M. Lehmann, A. Strzelecki, E. Yu, P. Paysan, M. Brehm, P. Munro, J. Star-Lack, and D. Seghers, "Acuros CTS: A fast, linear Boltzmann transport equation solver for computed tomography scatter – Part II: System modeling, scatter correction, and optimization," Med Phys, vol. 45, no. 5, pp. 1914–1925, 2018.
[25] A. Badal and A. Badano, "Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit," Med Phys, vol. 36, no. 11, pp. 4878–4880, 2009.
[26] M. Baer and M. Kachelrieß, "Hybrid scatter correction for CT imaging," Phys Med Biol, vol. 57, no. 21, pp. 6849–6867, 2012.
[27] W. Swindell and P. M. Evans, "Scattered radiation in portal images: A Monte Carlo simulation and a simple physical model," Med Phys, vol. 23, no. 1, pp. 63–73, 1996.
[28] W. Yao and K. W. Leszczynski, "An analytical approach to estimating the first order scatter in heterogeneous medium. II. A practical application," Med Phys, vol. 36, no. 7, pp. 3157–3167, 2009.
[29] M. Meyer, W. A. Kalender, and Y. Kyriakou, "A fast and pragmatic approach for scatter correction in flat-detector CT using elliptic modeling and iterative optimization," Phys Med Biol, vol. 55, no. 1, pp. 99–120, 2009.
[30] B. Ohnesorge, T. Flohr, and K. Klingenbeck-Regn, "Efficient object scatter correction algorithm for third and fourth generation CT scanners," Eur Radiol, vol. 9, no. 3, pp. 563–569, 1999.
[31] H. Li, R. Mohan, and X. R. Zhu, "Scatter kernel estimation with an edge-spread function method for cone-beam computed tomography imaging," Phys Med Biol, vol. 53, no. 23, pp. 6729–6748, 2008.
[32] M. Sun and J. M. Star-Lack, "Improved scatter correction using adaptive scatter kernel superposition," Phys Med Biol, vol. 55, no. 22, pp. 6695–6720, 2010.
[33] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, "Neural ordinary differential equations," in Adv Neural Inf Process Syst, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds. Curran Associates, Inc., 2018, pp. 6571–6583.
[34] P. Roser, X. Zhong, A. Birkhold, N. Strobel, M. Kowarschik, R. Fahrig, and A. Maier, "Physics-driven learning of x-ray skin dose distribution in interventional procedures," Med Phys, vol. 46, no. 10, pp. 4654–4665, 2019.
[35] P. Roser, X. Zhong, A. Birkhold, A. Preuhs, C. Syben, E. Hoppe, N. Strobel, M. Kowarschik, R. Fahrig, and A. Maier, "Simultaneous estimation of x-ray back-scatter and forward-scatter using multi-task learning," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 199–208.
[36] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Med Image Comput Comput Assist Interv, ser. Lecture Notes in Computer Science, vol. 9351. Springer, 2015, pp. 234–241.
[37] J. Maier, E. Eulig, T. Vöth, M. Knaup, J. Kuntz, S. Sawall, and M. Kachelrieß, "Real-time scatter estimation for medical CT using the deep scatter estimation: Method and robustness analysis with respect to different anatomies, dose levels, tube voltages, and data truncation," Med Phys, vol. 46, no. 1, pp. 238–249, 2019.
[38] A. Maier, C. Syben, B. Stimpel, T. Würfl, M. Hoffmann, F. Schebesch, W. Fu, L. Mill, L. Kling, and S. Christiansen, "Learning with known operators reduces maximum error bounds," Nat Mach Intell, vol. 1, pp. 373–380, 2019.
[39] B. Stimpel, C. Syben, F. Schirrmacher, P. Hoelter, A. Dörfler, and A. Maier, "Multi-modal deep guided filtering for comprehensible medical image processing," IEEE Trans Med Imaging, vol. 39, no. 5, pp. 1703–1711, 2019.
[40] G. H. Glover, "Compton scatter effects in CT reconstructions," Med Phys, vol. 9, no. 6, pp. 860–867, 1982.
[41] J. M. Boone and J. A. Seibert, "An analytical model of the scattered radiation distribution in diagnostic radiology," Med Phys, vol. 15, no. 5, pp. 721–725, 1988.
[42] P. Roser, A. Birkhold, A. Preuhs, C. Syben, N. Strobel, M. Kowarschik, R. Fahrig, and A. Maier, "Deep scatter splines: Learning-based medical x-ray scatter estimation using B-splines," in The 6th International Conference on Image Formation in X-Ray Computed Tomography (CT-Meeting), 2020.
[43] K. Qin, "General matrix representations for B-splines," The Visual Computer, vol. 16, no. 3-4, pp. 177–186, 2000.
[44] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 315–323.
[45] K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, and F. Prior, "The Cancer Imaging Archive (TCIA): Maintaining and operating a public information repository," J Digit Imaging, vol. 26, pp. 1045–1057, 2013.
[46] T. Bejarano, M. De Ornelas Couto, and I. Mihaylov, "Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment dataset. The Cancer Imaging Archive," 2018. [Online]. Available: http://doi.org/10.7937/K9/TCIA.2018.13upr2xf
[47] H. Roth, L. Le, S. Ari, K. Cherry, J. Hoffman, S. Wang, and R. Summers, "A new 2.5D representation for lymph node detection in CT. The Cancer Imaging Archive," 2018. [Online]. Available: http://doi.org/10.7937/K9/TCIA.2015.AQIIDCNM
[48] P. Roser, A. Birkhold, A. Preuhs, B. Stimpel, C. Syben, N. Strobel, M. Kowarschik, R. Fahrig, and A. Maier, "Fully-automatic CT data preparation for interventional x-ray skin dose simulation," in Bildverarbeitung für die Medizin 2020. Springer, 2020, pp. 125–130.
[49] W. Schneider, T. Bortfeld, and W. Schlegel, "Correlation between CT numbers and tissue parameters needed for Monte Carlo simulations of clinical dose distributions," Phys Med Biol, vol. 45, no. 2, pp. 459–478, 2000.
[50] L. He, X. Ren, Q. Gao, X. Zhao, B. Yao, and Y. Chao, "The connected-component labeling problem: A review of state-of-the-art algorithms," Pattern Recognit, vol. 70, pp. 25–43, 2017.
[51] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[52] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Int Conf Learn Represent (ICLR), Y. Bengio and Y. LeCun, Eds., 2015.
[53] R. Durall, M. Keuper, and J. Keuper, "Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 7890–7899.
[54] Y. Huang, T. Würfl, K. Breininger, L. Liu, G. Lauritsch, and A. Maier, "Some investigations on robustness of deep learning in limited angle tomography," in Med Image Comput Comput Assist Interv. Springer, 2018, pp. 145–153.
[55] C. Syben, M. Michen, B. Stimpel, S. Seitz, S. Ploner, and A. K. Maier, "PYRO-NN: Python reconstruction operators in neural networks," Med Phys, vol. 46, no. 11, pp. 5110–5115, 2019.
[56] C. Syben, B. Stimpel, P. Roser, A. Dörfler, and A. Maier, "Known operator learning enables constrained projection geometry conversion: Parallel to cone-beam for hybrid MR/x-ray imaging,"