X-ray Scatter Estimation Using Deep Splines
Philipp Roser, Annette Birkhold, Alexander Preuhs, Christopher Syben, Lina Felsner, Elisabeth Hoppe, Norbert Strobel, Markus Kowarschik, Rebecca Fahrig, Andreas Maier
Abstract—Algorithmic X-ray scatter compensation is a desirable technique in flat-panel X-ray imaging and cone-beam computed tomography. State-of-the-art U-net based image translation approaches have yielded promising results. However, since no physics constraints are applied to the output of the U-net, it cannot be ruled out that it yields spurious results, which may be misleading in the context of medical imaging. To overcome this problem, we propose to embed B-splines as a known operator into neural networks. This inherently limits their predictions to well-behaved and smooth functions. In a study using synthetic head and thorax data as well as real thorax phantom data, we found that our approach performed on par with the U-net when comparing both algorithms based on quantitative performance metrics. However, our approach not only reduces runtime and parameter complexity, but is also much more robust to unseen noise levels. While the U-net responded with visible artifacts, our approach preserved the X-ray signal's frequency characteristics.
Index Terms—Approximation, B-spline, neural network, X-ray scatter.
I. INTRODUCTION

X-ray transmission imaging enables advanced diagnostic imaging and facilitates instrument guidance during minimally-invasive interventions. Unfortunately, primary photons scattered by the patient's body impair image quality at the detector. Cone-beam X-ray systems using flat-panel detectors suffer particularly from this phenomenon due to the large X-ray field and the resulting high scatter-to-primary ratio. Besides contrast degradation in X-ray projections, artifacts in cone-beam computed tomography (CBCT) deteriorate the image quality in the reconstructed volume. Since the advent of CBCT, various methods to compensate for scatter have been developed [1], [2]. These can be categorized into physical and algorithmic scatter compensation methods.
A. Physical Scatter Compensation
Physical scatter compensation refers to the direct manipulation of the X-ray field to either suppress scattered photons from reaching the detector or modulate them to enable differentiation into scattered and primary photons. A simple yet effective approach is to increase the patient-to-detector distance, the so-called air gap [3]–[6]. Since the number of scattered photons depends on the irradiated volume, the scatter-to-primary ratio can be reduced by shaping the X-ray field to a small volume of interest. In slit scanning, the stitching of these smaller volumes of interest yields a normal-sized reconstruction [7], [8]. Although slit scanning reduces patient dose and scattered radiation, the prolonged acquisition time complicates motion compensation and X-ray tube heat management [9]. Since relying on large air gaps or strict collimation limits the flexibility of the imaging system, clinical systems are equipped with anti-scatter grids that physically block scattered photons [5], [10]–[14]. Although state-of-the-art for clinical CBCT, anti-scatter grids have disadvantages. For example, they can increase radiation exposure [15] and lead to Moiré and grid line artifacts [16]. Overall, the efficiency and effectiveness of anti-scatter grids depend on multiple factors, e.g., air gap, photon energy, source-to-detector distance, and detector resolution [2], [17], [18]. In contrast to detector-side grids, primary modulation follows a different operating principle [19]–[21]. In the shadow of this modulator grid, almost no primary radiation is measured. Instead, information about the scatter properties of the irradiated volume is encoded. Using demodulation algorithms, the X-ray projection and the associated scatter distribution can be distinguished. Due to its complex structure, primary modulation, especially with C-arm systems, has not made its way into clinical practice. In conclusion, hardware-based solutions induce additional manufacturing costs and limit flexibility. Also, since most approaches need algorithmic support or look-up tables, software-only solutions are highly desirable [1], [2].

[Affiliations: P. Roser, A. Preuhs, C. Syben, L. Felsner, E. Hoppe, and A. Maier are with the Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany. P. Roser is funded by the Erlangen Graduate School in Advanced Optical Technologies (SAOT), Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany. A. Maier is principal investigator at the SAOT. A. Birkhold, M. Kowarschik, and R. Fahrig are employees of Siemens Healthcare GmbH, 91301 Forchheim, Germany. N. Strobel is with the Institute of Medical Engineering Schweinfurt, University of Applied Sciences Würzburg-Schweinfurt, 97421 Schweinfurt, Germany.]
B. Algorithmic Scatter Compensation
Algorithmic scatter compensation approaches typically try to estimate the scatter signal from the acquired X-ray projections first. The estimated scatter image is then subtracted to obtain the primary signal. For CBCT, Monte Carlo (MC) methods or Boltzmann transport solvers are common approaches [22]–[24]. Both require an initial reconstruction. Their computational complexity can partially be coped with using dedicated hardware and variance reduction techniques [25], or by using a coarse simulation to derive the scattering model online [26]. Trading off effectiveness for efficiency, model-based approaches are typically preferred for clinical applications. Such approaches rely on either simplified physical or analytical models [27]–[29] or convolution kernels [30]–[32] in combination with an iterative correction scheme. Although fast, their ability to generalize to different imaging settings is limited.

Recently, learning-based methods have found their way into the modeling of physical processes [33]–[35]. In particular, the application of the U-net [36] to the task of scatter estimation, referred to as deep scatter estimation, yields superior results compared to kernel-based or MC methods, respectively [37]. The U-net based method has the potential to become the new gold standard in the domain of scatter correction. However, there are several drawbacks to employing a deep U-net. First, deep convolutional neural networks are challenging to comprehend in their operating principle. Given the high parameter complexity of such a U-net, large amounts of training samples are mandatory to arrive at a robust solution. Second, to maintain fast performance, the U-net requires a dedicated top-tier graphics processing unit (GPU), which might not be universally available.
Third, the U-net is a universal function approximator that does not include any known physical characteristics of scattered radiation.

To come up with an alternative, it appears attractive to build on the rich prior knowledge about X-ray scatter. It has been shown that the incorporation of prior knowledge into neural networks reduces error bounds [38] and simplifies the analysis of the deployed network [39]. Although subject to photon Poisson noise, X-ray scatter is mainly a low-frequency signal in the diagnostic energy regime [40], [41]. In a previous proof-of-concept study, we exploited this property by approximating X-ray scatter using a smooth bivariate B-spline and directly inferred the spline coefficients using a lean convolutional encoder architecture [42]. This (a) allowed us to reduce the number of parameters and the computational complexity tremendously, (b) ensured that no high-frequency details of diagnostic value can get manipulated, while (c) still meeting the high accuracy of the U-net.

C. Contributions
In this work, we extend the idea of approximating X-ray scatter as a bivariate B-spline by integrating its evaluation into the computational graph of a neural network. We analyze our approach in depth and compare it to several U-net realizations in a nested cross-validation study using synthetic head and thorax X-ray image data. Besides benchmarking the parameter and computational complexity, we investigate all networks in terms of their frequency response, the power spectral density of the co-domain, and the response to unseen noise levels. Finally, we demonstrate how our method performs when applied to real data in an anthropomorphic thorax phantom study.

II. MATERIALS AND METHODS
A. X-ray Projection Formation using B-splines
In general, in both techniques, X-ray fluoroscopy and CBCT, it is assumed that primary photons either reach the detector along a straight line or are absorbed completely. Therefore, the observed X-ray projection is typically described in terms of its primary component $I_\mathrm{p} \in \mathbb{R}^{w \times h}$ with image width $w$ and height $h$ in pixels. The intensity at pixel $(u, v)$ is given by the Beer-Lambert law

$$I_\mathrm{p}(u, v) = \int_E I_0(u, v, E)\, e^{-\int_\lambda \mu(\mathbf{s} + \lambda \cdot \mathbf{r}(u, v),\, E)\, \mathrm{d}\lambda}\, \mathrm{d}E. \quad (1)$$

The polychromatic flat-field projection, or X-ray spectrum, $I_0$, as well as the linear attenuation coefficient $\mu(\mathbf{l}, E)$ depend on the photon energy $E$. Note that, due to the cone-beam geometry, $I_0$ decreases towards the borders and thus depends on the pixel position $(u, v)$. The linear photon attenuation further depends on the media along the straight line $\mathbf{l} : \mathbb{R} \mapsto \mathbb{R}^3$ from the X-ray source $\mathbf{s} \in \mathbb{R}^3$ in direction $\mathbf{r}(u, v) \in \mathbb{R}^3$ to the pixel $(u, v)$. Unfortunately, in reality, besides being photoelectrically absorbed, X-ray photons also undergo Compton scattering, Rayleigh scattering, or multiple occurrences of both effects. Therefore, a more realistic formation model $I$ adds scattered photons $I_\mathrm{s}$, leading to

$$I(u, v) = I_\mathrm{p}(u, v) + I_\mathrm{s}(u, v). \quad (2)$$

Since, for CBCT, we are mostly interested in the low-frequency components of the scatter, we neglect photon shot noise and detector noise below.

In a recent study [42], we experimentally showed that the main scatter components, which are low-frequency for diagnostic X-rays, can be well approximated by sparse bivariate B-splines with low error rates. The domain and co-domain of a B-spline of degree $k$ and order $n = k + 1$ are fully characterized by the knot vectors $\mathbf{t}_u = (t_{u,0}, t_{u,1}, \ldots, t_{u,w_c+n-1})$ and $\mathbf{t}_v = (t_{v,0}, t_{v,1}, \ldots, t_{v,h_c+n-1})$, as well as the coefficient matrix $\mathbf{C} \in \mathbb{R}^{w_c \times h_c}$ with width $w_c$ and height $h_c$, respectively.
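To make the parameterization concrete, the following NumPy sketch evaluates a bivariate tensor-product B-spline from a small coefficient matrix via the Cox-de Boor recursion. It uses the degree-$k$ (order $n = k + 1$) convention; all sizes, the clamped uniform knot placement, and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def basis(i, k, t, x):
    """Degree-k basis spline B_i over knot vector t (Cox-de Boor recursion)."""
    if k == 0:  # base case: indicator of the half-open knot span [t_i, t_{i+1})
        return np.where((t[i] <= x) & (x < t[i + 1]), 1.0, 0.0)
    out = np.zeros_like(np.asarray(x, dtype=float))
    if t[i + k] > t[i]:  # guard against repeated (clamped) boundary knots
        out = out + (x - t[i]) / (t[i + k] - t[i]) * basis(i, k - 1, t, x)
    if t[i + k + 1] > t[i + 1]:
        out = out + (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * basis(i + 1, k - 1, t, x)
    return out

def clamped_uniform_knots(n_coeff, k, lo, hi):
    """Endpoint-interpolating knot vector: k-fold repeated boundary knots."""
    interior = np.linspace(lo, hi, n_coeff - k + 1)
    return np.concatenate([np.full(k, lo), interior, np.full(k, hi)])

def eval_spline_surface(C, tu, tv, k, u, v):
    """Tensor-product evaluation: stack basis values into matrices U, V and
    compute the surface as U @ C @ V.T (the matrix form used in the text)."""
    U = np.stack([basis(i, k, tu, u) for i in range(C.shape[0])], axis=1)
    V = np.stack([basis(j, k, tv, v) for j in range(C.shape[1])], axis=1)
    return U @ C @ V.T
```

For cubic splines on a fixed pixel grid, `U` and `V` depend only on the grid and the knots, so they can be precomputed once; the surface is then a plain matrix product in the coefficients, which is trivially differentiable.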
Based on this, we can approximate the spatial scatter distribution as a tensor product B-spline $\tilde{I}_{\mathrm{s},n}$

$$\tilde{I}_{\mathrm{s},n}(u, v) = \sum_{i=1}^{w_c} \sum_{j=1}^{h_c} \mathbf{C}(i, j)\, B_{i,n,\mathbf{t}_u}(u)\, B_{j,n,\mathbf{t}_v}(v), \quad (3)$$

with the basis splines $B$, which are zero for knots that do not affect the spline at the pixel $(u, v)$. The basis splines are recursively defined by

$$B_{i,n,\mathbf{t}}(x) = \frac{x - t_i}{t_{i+n} - t_i} \cdot B_{i,n-1,\mathbf{t}}(x) + \frac{t_{i+n+1} - x}{t_{i+n+1} - t_{i+1}} \cdot B_{i+1,n-1,\mathbf{t}}(x), \quad (4)$$

with

$$B_{i,0,\mathbf{t}}(x) = \begin{cases} 1, & \text{if } t_i \le x < t_{i+1} \\ 0, & \text{otherwise} \end{cases}. \quad (5)$$

Since the basis functions $B$ are equal to zero for coefficients that do not contribute to a pixel $(u, v)$, only an $n \times n$ sub-grid of $\mathbf{C}$ needs to be considered for each pixel. Thus, using matrix notation, the tensor product can be reformulated to

$$\tilde{I}_{\mathrm{s},n}(u, v) = \mathbf{u}^T \cdot [\mathbf{M}_{n,\mathbf{t}_u}(u)] \cdot \mathbf{C}_{uv} \cdot [\mathbf{M}_{n,\mathbf{t}_v}(v)]^T \cdot \mathbf{v}, \quad (6)$$

with the coefficient patch $\mathbf{C}_{uv} \in \mathbb{R}^{n \times n}$ impacting the pixel $(u, v)$, the general matrix representation of a univariate B-spline $\mathbf{M}_{n,\mathbf{t}}(\cdot) \in \mathbb{R}^{n \times n}$ [43], and the vectors $\mathbf{u} = (u^0\ u^1\ \cdots\ u^k)^T$ and $\mathbf{v} = (v^0\ v^1\ \cdots\ v^k)^T$. The general matrix representation of B-splines allows for arbitrarily spaced knot grids, i.e., in our case, it allows for endpoint interpolation, which exposes a more convenient behavior towards the borders of the evaluation grid. Besides the borders of the grid, we currently limit our approach to uniform knot grids and cubic B-splines ($k = 3$). Thus, for most pixels $(u, v)$, the B-spline is evaluated using

$$\mathbf{M}_4 = \frac{1}{6}\begin{pmatrix} 1 & 4 & 1 & 0 \\ -3 & 0 & 3 & 0 \\ 3 & -6 & 3 & 0 \\ -1 & 3 & -3 & 1 \end{pmatrix}. \quad (7)$$

By using a fixed knot grid and evaluation grid, we can pre-calculate both $\mathbf{u}^T \cdot [\mathbf{M}_{n,\mathbf{t}_u}(u)]$ and $\mathbf{v}^T \cdot [\mathbf{M}_{n,\mathbf{t}_v}(v)]$ for each pixel $(u, v)$. Padding the resulting vectors with zeros to account for all coefficients in $\mathbf{C}$ and stacking them row-wise, we obtain the evaluation matrices $\mathbf{U}_n \in \mathbb{R}^{w \times w_c}$ and $\mathbf{V}_n \in \mathbb{R}^{h \times h_c}$, respectively. Thus, we can calculate the cubic ($n = 4$) scatter distribution given the B-spline coefficients $\mathbf{C}$ via

$$\tilde{I}_{\mathrm{s},4} = \mathbf{U}_4 \cdot \mathbf{C} \cdot \mathbf{V}_4^T. \quad (8)$$

Since neural networks can extract scatter distributions from a measured X-ray projection either directly [37] or indirectly via B-spline coefficients [42], we aim to combine deep learning with the matrix evaluation scheme. Using the Kronecker product '$\otimes$', the derivative $\partial \tilde{I}_{\mathrm{s},4} / \partial \mathbf{C}$ is

$$\frac{\partial \tilde{I}_{\mathrm{s},4}}{\partial \mathbf{C}} = \mathbf{V}_4 \otimes \mathbf{U}_4. \quad (9)$$

Thus, we can straightforwardly embed the B-spline evaluation into a computational graph $f_\theta : \mathbb{R}^{w \times h} \mapsto \mathbb{R}^{w \times h}$ with trainable parameters $\theta$ without breaking differentiability and thus back-propagation.

Fig. 1. The general architecture of our proposed approach. First, a (convolutional) encoder extracts the latent variables $\mathbf{Z}$ from the input X-ray projection $I$. Second, a bottleneck network maps the latent space to bivariate spline coefficients $\mathbf{C}$. The different choices for both networks are discussed later in more detail. Third, we obtain the scatter estimate $\tilde{I}_{\mathrm{s},n}$ by evaluating the spline coefficients $\mathbf{C}$ using the pre-calculated evaluation grid defined by the pre-computed matrices $\mathbf{U}$ and $\mathbf{V}$. The top (green) path refers to the forward pass, and the bottom (orange) path to the backward pass, respectively.

B. Network Architecture
In the following, we describe the devised neural network. The overall architecture comprises a generic convolutional encoder followed by a bottleneck network to infer spline coefficients from measured X-ray projections, as proposed in our previous study [42]. This architecture is depicted in Fig. 1. We employ a lean convolutional encoder $f_\theta : \mathbb{R}^{w \times h} \mapsto \mathbb{R}^{w_c \times h_c}$ that consists of $d$ convolutional blocks. A block comprises two convolutional layers with $c$ feature channels and $3 \times 3$ kernels. Each convolutional layer is followed by a rectified linear unit (ReLU) [44] activation, and between two blocks, $2 \times 2$ average pooling is applied. As X-ray scatter is low-frequency, and to potentially limit the computational complexity of our model, we allow for additional pooling layers before the first convolutional block. The encoder $f_\theta$ is completed by a $1 \times 1$ convolution that builds the weighted sum of all $c$ channels. Overall, being based on convolutions only, $f_\theta$ encodes merely a local scatter representation $\mathbf{Z}$, since its receptive field does not necessarily cover the whole input image. To establish a global context, we employ different types of bottleneck networks $g_\phi$ to map $\mathbf{Z}$ to spline coefficients $\mathbf{C}$: (1) a constrained weighting matrix $\mathbf{W}$ with $\mathbf{W}_{i,j} > 0$, (2) an unconstrained fully-connected layer with ReLU activation (which merely relates to an unconstrained weighting matrix with enforced non-negativity constraint), (3) two fully-connected layers with ReLU activation, or (4) two additional convolutional blocks followed by a fully-connected layer.

C. Synthetic Dataset
Acquiring raw scatter-free X-ray projections and their scatter-contaminated counterparts is tedious and time-consuming, especially at the scale required for deep learning. As a solution, we leveraged the X-ray transport code MC-GPU [25] to generate artificial pairs of scatter-contaminated and scatter-free X-ray projections. We used openly available CT scans from The Cancer Imaging Archive (TCIA) [45] as inputs to the simulation. In order to keep the simulation time manageable, we selected 20 head scans from the HNSCC-3DCT-RT dataset [46] and 15 thorax scans from the CT Lymph Nodes dataset [47], respectively. To prepare the phantoms for MC simulation, we employed a basic pre-processing pipeline [48] based on tissue and density estimation [49] and connected component labeling [50]. For each CT scan, we simulated a stack of 260 X-ray projections ($w = 1152$, $h = 768$) over an angular range of […]°. The source-to-isocenter and source-to-detector distances were 785 mm and […], respectively. Per X-ray projection, we simulated […] photons sampled from an 85 kV peak voltage tungsten spectrum. All projections were flat-field normalized. Note that we used the data as is, without registering similar anatomies in a common reference frame. To be more comparable to the related work [37], to speed up training, and to suppress simulation noise, we down-sampled the projections to […] pixels. Before the down-sampling, we applied Gaussian filtering to the primary and scatter projections independently ($\sigma_\mathrm{p} = 2$, $\sigma_\mathrm{s} = 30$). Corresponding cross-sectional slices were reconstructed on an isotropic grid with […] voxels.

D. Real Dataset
To evaluate the proposed method on real data, we scanned the thorax of an anthropomorphic phantom (PBU-60, Kyoto Kagaku Co., Ltd., Kyoto, Japan) using a C-arm CBCT system (ARTIS icono floor, Siemens Healthineers AG, Forchheim, Germany). In total, we acquired 12 datasets: three short scans each for full view (referred to as full) and maximum collimation (referred to as slit), both with and without an anti-scatter grid. Each short scan consists of 397 projections ([…] pixels, […] µm isotropic spacing) over an angular range of […]° using 85 kV peak tube voltage. The source-to-isocenter and source-to-detector distances were 750 mm and […], respectively. We reconstructed […] slices on a […] µm isotropic voxel grid using an in-house reconstruction pipeline. We regard the slit scan in conjunction with the anti-scatter grid as ground truth.

III. EXPERIMENTS
In the following, we describe the experiments carried out to evaluate our proposed method. Since the U-net based approach outperformed other computational scatter estimation methods by far [37], we considered different configurations of the U-net as a baseline method. To account for our relatively small training corpora, we evaluated our method and the baseline using a nested cross-validation approach. For the head dataset (20 subjects), we used a […] ∗ […]-fold cross-validation approach and, for the thorax dataset (15 subjects), we used a […] ∗ […]-fold cross-validation approach. The real dataset was only used for testing. To keep the training procedure manageable, we divided the evaluation into a meta-parameter search based on the scatter mean absolute percentage error (MAPE), and a further in-depth analysis.

A. Meta-Parameter Search
Since clinical CBCT systems usually have fine-tuned acquisition protocols for each anatomic region, we evaluated each network architecture for head and thorax data separately. To fix the meta-parameters, we only used the synthetic head dataset and the scatter MAPE as metric. We validated both our approach and the U-net for different combinations of depths $d$ and feature channels $c$ using Glorot initialization [51], which was also used by the baseline [37]. Furthermore, we distinguished between a deep U-net (DU-net) and a shallow U-net (SU-net), in which the number of feature maps is not doubled at each level. For all experiments in this section, we performed a […] ∗ […]-fold cross-validation and trained all networks using the adaptive moments optimizer (Adam) [52] with an initial learning rate of $10^{-[…]}$ for 100 epochs. In total, we trained 12 networks for each configuration and used the averaged MAPE to assess their quality. To reduce the total computation time, we stopped each training procedure when no significant performance increase was observed for 20 epochs.

In a first step, we scanned different parametrizations of the network architectures to find the most promising ones for in-depth comparison. Overall, we tested both U-nets with $c = 16$ and $d \in \{[…]\}$, and our approach with $c = 16$, $d \in \{[…]\}$, and additional pre-pooling $p \in \{[…]\}$. In addition, to decouple the architecture of our convolutional encoder $f_\theta$ from the spline coefficient dimensionality, we investigated the four bottleneck architectures $g_\phi$ described in Sec. II-B.

B. Qualitative and Quantitative Results
Based on the findings of the previous experiments, we adapted the learning rate to $10^{-[…]}$ but kept the overall training routine. We separately performed 4 ∗ […]-fold cross-validations for the two datasets.

C. In-Depth Analysis

1) Spectral Analysis:
For clinical applications, data integrity is of utmost importance to ensure that automated systems do not alter diagnostically relevant content. While the predictions of neural networks may appear reasonable at first glance, unrealistic perturbations can be unveiled by investigating the spectral properties of (a) the predicted images [53] or (b) the neural network itself. Therefore, we first investigate all networks' performance concerning the power spectral density of the predicted scatter distributions. Second, from scatter estimation theory, we know that the scatter distribution can be recovered from the measured signal by convolution with a so-called scatter kernel [30]. This allows us to interpret a neural network, for a specific pair of scatter distribution and X-ray signal, as a filtering operation and to assess its frequency response.
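The filtering interpretation can be sketched as follows: treating one (input, predicted scatter) pair as the input and output of a linear shift-invariant system, the empirical frequency response is the point-wise ratio of their Fourier transforms. The function and variable names here are illustrative, not the paper's code.

```python
import numpy as np

def frequency_response(scatter_pred, projection, eps=1e-12):
    """Empirical frequency response H = F{scatter} / F{projection} of the
    network viewed as a filter for one particular input/output pair."""
    H = np.fft.fft2(scatter_pred) / (np.fft.fft2(projection) + eps)
    return np.abs(H), np.angle(H)  # amplitude and phase
```

If the network truly acted like convolution with a low-pass scatter kernel, the recovered amplitude would match that kernel's spectrum.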
2) Noise Analysis:
Assessing a network's robustness is inherently difficult given small training corpora. Therefore, adversarial attacks are often used to expose weaknesses deliberately. Since we trained all networks on noise-free data, testing them on data with different unseen noise levels appeared appropriate. To this end, we applied Poisson noise associated with different photon counts ranging from […] to […] to our head dataset and investigated the accuracy of the networks' predictions.
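One way to realize such noise injection on flat-field-normalized projections is sketched below; the photon-count parameter and the function name are our illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def add_poisson_noise(projection, photons, rng=None):
    """Simulate photon shot noise: the detected count at each pixel is
    Poisson-distributed around `photons` * intensity, then re-normalized."""
    rng = np.random.default_rng() if rng is None else rng
    counts = rng.poisson(photons * np.clip(projection, 0.0, None))
    return counts / float(photons)
```

A lower `photons` value yields a higher relative noise level, matching the note that a lower photon count relates to a higher noise level.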
3) Runtime Analysis:
For interventional applications, fast execution speed is essential. Therefore, we benchmarked the average execution time of all networks for different batch sizes using a 12-core CPU (Intel(R) Xeon(R) Silver 4116 CPU @ 2.10 GHz).
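A minimal timing harness for such measurements might look as follows; warm-up runs followed by an averaged wall-clock time. The names and run counts are arbitrary choices, not the paper's protocol.

```python
import time

def benchmark_ms(fn, batch, n_warmup=3, n_runs=20):
    """Average wall-clock execution time of fn(batch) in milliseconds."""
    for _ in range(n_warmup):  # warm-up: caches, lazy allocations, etc.
        fn(batch)
    t0 = time.perf_counter()
    for _ in range(n_runs):
        fn(batch)
    return (time.perf_counter() - t0) / n_runs * 1e3
```

For example, `benchmark_ms(model_forward, np.zeros((8, 384, 256)))` would time a hypothetical model call at batch size 8.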
4) Real Data Analysis:
To confirm the generalizability to real data, we tested the networks that performed best on our synthetic thorax dataset on the real thorax dataset. For reference, we considered various scatter suppression techniques, namely using an anti-scatter grid or slit scanning. As an almost scatter-free baseline, we considered the configuration using both an anti-scatter grid and slit scanning. Note that we performed neither an intensity nor a geometry calibration between the scans to account for the missing anti-scatter grid.

IV. RESULTS
A. Meta-Parameter Search

1) Network Parametrization:
Figure 2 establishes the relationship between the number of convolutional parameters to train for each network and the error rates averaged over all folds and patients. All networks' error rates range between […] and […], and we observed that, overall, the more compact networks outperform the DU-nets. Based on these findings, we selected two spline networks ($c = 16$, $(d, p) \in \{(4, 1), (5, 0)\}$) and four U-nets (deep and shallow, $c = 16$, $d \in \{6, 7\}$) for further investigations.
2) Bottleneck:
Figure 3 shows the results for the different bottleneck architectures. Our proposed constrained weighting matrix, homogeneously initialized, achieves the best results on average, even surpassing the convolutional bottleneck followed by a fully-connected layer. The unconstrained fully-connected layer architectures yield considerably worse results. Figure 4, which shows the weighting matrix for the constrained and unconstrained case, substantiates this finding. While the constrained matrix converges to a block circulant matrix, which merely relates to a convolution, the unconstrained one hardly resembles a sensible operation and is overall noisy.

Fig. 2. Distribution of different network configurations for our approach and the U-net in terms of absolute percentage errors averaged over all folds and patients. Note that, for visualization purposes, the standard deviation encoded by the circular margins does not correspond to absolute values but rather indicates the relative spread between the networks. The standard deviation is in the range of […] to […]. The x-axis refers to the number of parameters in the convolutional layers.

Fig. 3. Boxplots of the mean absolute percentage error (MAPE) of our proposed method for different bottleneck architectures averaged over the validation folds for each training fold. The horizontal lines indicate the mean value over all test and validation folds.

B. Qualitative and Quantitative Results
Figure 5 shows fold-wise boxplots for the two synthetic datasets. We observe similar error rates of approximately […] across all folds and network configurations regarding the predicted scatter distributions for the head dataset. In general, all networks achieve high SSIM values above 0.99 when comparing the volumes reconstructed from the simulated ideal primary signal to their counterparts obtained using the scatter-corrected projections. Note that, for the head data, our approach performed equally well for all folds, whereas the results obtained with the U-nets varied more widely. Processing the thorax datasets, on the other hand, was more challenging. Again, all networks performed comparably well, with error rates of approximately […]. In comparison to the head dataset, larger error margins and outliers were found. This can also be seen in the SSIM, which is, on average, just below 0.98. Again, we find that our proposed method is on par with U-net-like structures from a quantitative point of view.

Looking at selected subjects of both datasets in Fig. 6 reveals the potential advantages of the proposed approach. Modeling the predicted scatter as a B-spline intrinsically limits the network output to smooth surfaces. In contrast, both U-nets preserve some details of the input, especially the shallow architecture. However, the reconstructed slices show no systematic trend, and all methods can adequately recover the desired signal.

Fig. 4. Normalized constrained (left) and unconstrained (right) weighting matrices used to map the latent variables $\mathbf{Z}$ to spline coefficients $\mathbf{C}$.

C. In-Depth Analysis

1) Spectral Analysis:
We calculated the power spectral density by azimuthally averaging the magnitudes of the 2D Fourier transform of the X-ray scatter distributions. We then averaged the power spectral densities over all projections, patients, and test folds. The resulting densities, plotted in Fig. 7, support our previous findings. While the U-nets yield numerically sound predictions, they systematically boost the high frequencies in the scatter distributions. Our approach, however, preserves the real power spectral density over the whole frequency spectrum. As mentioned above, conventional convolutional neural networks can, for a single input image, be interpreted in terms of a filtering operation. Thus, we divided the Fourier transform of the output by the Fourier transform of the input to obtain the respective frequency responses. Figure 8 shows the amplitude and phase of the frequency responses of an ideal system, our method, and both U-nets, averaged over all projections for one patient. Our spline-net's frequency response was closer to the ideal frequency response in both amplitude and phase. However, we observed a noticeable intensity shift in our method. The U-nets, in contrast, both exhibited larger deviations in the patterns of amplitude and phase, indicating that the operation they represent is less predictable.
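The azimuthal averaging step can be sketched as follows; the binning choices here are ours, not the paper's.

```python
import numpy as np

def radial_psd(image, n_bins=32):
    """Azimuthally averaged power spectral density of a 2-D image:
    average |FFT|^2 over annuli of constant radial frequency."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = image.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2)          # radial frequency per bin
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    total = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return total / np.maximum(counts, 1)
```

For a smooth, low-frequency signal such as scatter, the resulting curve is concentrated in the first few radial bins.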
2) Noise Analysis:
Figure 9 shows plots of the networks' error rates when confronted with noise. We could confirm that the U-net is very sensitive to unseen noise levels in both configurations, whereas our approach performs more robustly.
3) Runtime Analysis:
Figure 10 compiles the inference speed of all considered networks. As expected from the parameter complexity, our approach is the fastest with […] to 30 ms, and is therefore […] to […] times faster than the SU-net with 34 ms to 50 ms. The DU-net is the slowest with 89 ms to 144 ms.
4) Real Data Analysis:
As shown in Fig. 11, all methods were able to compensate for scatter artifacts well. However, employing an anti-scatter grid still yielded the lowest overall error in Hounsfield units (HU) as compared to using slit collimation in addition. The learning-based approaches performed about as well as slit scanning without an anti-scatter grid, which is in accordance with previous findings [37]. Overall, the networks achieved similar error rates, and no systematic trend was observable.

Fig. 5. Quantitative boxplots for our synthetic datasets. The top row shows the mean absolute percentage errors (MAPE) with respect to the scatter ground truth for each test fold. The bottom row shows structural similarity indices (SSIM) between volumes reconstructed from the simulated ideal primary signal and the scatter-corrected ones using neural networks. The horizontal lines indicate the overall average across all folds. For the sake of clarity, we only depict the best performing networks in each category. For the head dataset, we depict the DU-net with $d = 6$, the SU-net with $d = 7$, and our method with $d = 4$ and $p = 1$. For the thorax dataset, we depict the DU-net with $d = 7$, the SU-net with $d = 6$, and our method with $d = 5$ and $p = 0$.

Fig. 6. Selected subjects of both datasets. The results of the learning-based approaches are given in terms of error maps using the absolute percentage error (APE) for scatter and the absolute error (AE) for the reconstructed slices. For convenience, the associated mean APE (MAPE) and mean AE (MAE) are also provided below each output.

Fig. 7. Power spectral density plots for the X-ray scatter distributions of the ground truth (black solid), ours (light orange dashed), and the U-nets (light purple dotted), averaged over all test folds. Note that the graphs associated with both U-nets are almost identical.

Fig. 8. Frequency responses of the systems for one test patient averaged over all validation folds. From top to bottom: normalized log-amplitude, phase in radians. From left to right: ideal system (GT), our spline-net (Ours), deep U-net (DU-net), shallow U-net (SU-net).

Fig. 9. Error rates of predicted scatter distributions for different noise levels averaged over all validation and test fold networks. All networks have been trained with noise-free data. Note that a lower photon count relates to a higher noise level.

Fig. 10. Runtime in ms for different architectures (specified by the depth and optional pre-pooling, $d \backslash p$) and batch sizes.

V. DISCUSSION
X-ray scatter is a major source of artifacts in interventional CBCT. Deep-learning-based approaches have shown the potential to outperform conventional physical or algorithmic approaches to scatter compensation. Without incorporating prior information, scatter distributions can be inferred from the measured X-ray signal [37]. However, data integrity and robustness are critical aspects of clinical imaging, and both can be violated by deep neural networks [54].

To ensure sound scatter estimates, we proposed to embed bivariate B-splines in neural networks to constrain their codomain to smooth results [42]. By reformulating the spline evaluation in terms of matrix multiplications, we were able to integrate B-splines with neural networks directly, such that end-to-end training was feasible. In an extensive cross-validation using synthetic data, we showed that our proposed lean convolutional encoder using B-spline evaluation performs on par with several U-net-based architectures. We substantiated this finding in a first phantom study, where our approach performed essentially as well as the U-net and the slit-scanning technique (without anti-scatter grid), which has been used as a baseline previously [37].

Note, however, that the proposed method offers several advantages not present in the U-net architectures. First, we considerably lowered the parameter and runtime complexity, rendering our method suitable for a variety of hardware. More importantly, we verified that our spline-based approach ensures data integrity concerning the power spectral density of scatter estimates and the overall network's frequency response. This property ensures that no high-frequency details, which relate to anatomic structures or pathologies, are altered. In comparison, the U-net considerably changes the corresponding spectral content, as has already been shown for neural networks containing up-convolutions [53].

Fig. 11. Central slices of reconstructed volumes for different scatter compensation strategies and errors taken with respect to the reconstruction resulting from the grid + slit data acquisition. Grid refers to employing a conventional anti-scatter grid. Slit refers to the narrowest collimator setting available on the X-ray imaging system. Full refers to using no collimation at all. Both the reconstructed slices and the difference images are shown using a gray-level window of [−, ] HU. For convenience, the absolute average HU errors and the associated standard deviations with respect to three consecutive measurements are provided.

As our spline network corresponds to a low-pass filtering operation, it is robust towards noise even when trained on noise-free data.

While we implemented the U-net baselines to the best of our knowledge, our error rates are higher than previously reported [37]. Potential reasons include, but are not limited to, (a) different simulation codes, (b) our smaller training corpora, and (c) the heterogeneity of our data. In addition, our simulation setup currently assumes an ideal detector, and we do not consider the domain shift between synthetic and real data.

For future work, we identify several research directions concerning both the method and the data and experiments. First, since our preferred bottleneck weighting matrix converges to a block-circulant matrix, a fixed representation is desirable to further reduce the number of trainable parameters and increase the plausibility of our approach. For instance, training the encoder and the bottleneck separately or introducing additional constraints are promising ways to do so. Second, replacing the bottleneck fully-connected layer with a small U-net would reduce the number of parameters while still covering the entire latent space with its receptive field.
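As an aside on the mechanism discussed above: rewriting the B-spline evaluation as a chain of matrix multiplications is what makes it embeddable in a neural network as a differentiable, known operator. The sketch below is illustrative only, not the implementation evaluated in this paper; it uses the standard basis matrix of the uniform cubic B-spline (cf. [43]) and checks the partition-of-unity property that underpins the smoothness constraint.

```python
import numpy as np

# Basis matrix of the uniform cubic B-spline in matrix form
# (cf. Qin, "General matrix representations for B-splines").
M = (1.0 / 6.0) * np.array([
    [-1.0,  3.0, -3.0, 1.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-3.0,  0.0,  3.0, 0.0],
    [ 1.0,  4.0,  1.0, 0.0],
])

def spline_patch(P, u, v):
    """Evaluate a bivariate cubic B-spline patch at local (u, v) in [0, 1).

    P is a 4x4 grid of control values (e.g., produced by a convolutional
    encoder). The evaluation is a pure chain of matrix products, so it is
    differentiable with respect to P and can be trained end to end.
    """
    Tu = np.array([u ** 3, u ** 2, u, 1.0])
    Tv = np.array([v ** 3, v ** 2, v, 1.0])
    return float(Tu @ M @ P @ M.T @ Tv)

# Partition of unity: the blending weights sum to one for every parameter
# value, so a constant control grid yields a constant surface and the
# codomain cannot contain spurious high-frequency detail.
weights = np.array([0.3 ** 3, 0.3 ** 2, 0.3, 1.0]) @ M
print(abs(weights.sum() - 1.0) < 1e-12)                   # True

P_const = np.full((4, 4), 5.0)
print(abs(spline_patch(P_const, 0.3, 0.7) - 5.0) < 1e-9)  # True
```

Because the output is a fixed cubic polynomial in (u, v) of the control values, the prediction is confined to smooth, well-behaved surfaces regardless of what the encoder produces; this is the codomain constraint referred to above.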
Third, since our network is end-to-end trainable, it appears reasonable to include the reconstruction in the computational graph and calculate the loss function in the CBCT domain [55], [56]. Last but not least, we believe our approach applies to low-frequency signal estimation and correction in general, e.g., bias field correction in magnetic resonance imaging, ultrasound imaging, or microscopy techniques.

VI. CONCLUSION
Embedding B-splines in neural networks ensures data integrity for low-frequency signals. This reduces the number of network parameters needed to arrive at physically sensible results, and thanks to the reduced parameter set, network inference can be made faster.

DISCLAIMER
The concepts and information presented in this article are based on research and are not commercially available.

REFERENCES

[1] E.-P. Rührnschopf and K. Klingenbeck, "A general framework and review of scatter correction methods in x-ray cone-beam computerized tomography. Part 1: Scatter compensation approaches," Med Phys, vol. 38, no. 7, pp. 4296–4311, 2011.
[2] ——, "A general framework and review of scatter correction methods in cone-beam CT. Part 2: Scatter estimation approaches," Med Phys, vol. 38, no. 9, pp. 5186–5199, 2011.
[3] F. Groedel and R. Wachter, "Bedeutung der Röhren-Fern- und Platten-Abstandsaufnahmen," Verh d Dt Röntgengesellschaft, vol. 17, pp. 134–135, 1926.
[4] F. Janus, "Die Bedeutung der gestreuten ungerichteten Röntgenstrahlen für die Bildwirkung und ihre Beseitigung," Verh d Dt Röntgengesellschaft, vol. 17, pp. 136–139, 1926.
[5] U. Neitzel, "Grids or air gaps for scatter reduction in digital radiography: A model calculation," Med Phys, vol. 19, no. 2, pp. 475–481, 1992.
[6] J. Persliden and G. A. Carlsson, "Scatter rejection by air gaps in diagnostic radiology. Calculations using a Monte Carlo collision density method and consideration of molecular interference in coherent scattering," Phys Med Biol, vol. 42, no. 1, pp. 155–175, 1997.
[7] G. T. Barnes, "Contrast and scatter in x-ray imaging," RadioGraphics, vol. 11, no. 2, pp. 307–323, 1991.
[8] R. Bhagtani and T. G. Schmidt, "Simulated scatter performance of an inverse-geometry dedicated breast CT system," Med Phys, vol. 36, no. 3, pp. 788–796, 2009.
[9] T. G. Schmidt, R. Fahrig, N. J. Pelc, and E. G. Solomon, "An inverse-geometry volumetric CT system with a large-area scanned source: A feasibility study," Med Phys, vol. 31, no. 9, pp. 2623–2627, 2004.
[10] G. Bucky, "Über die Ausschaltung der im Objekt entstehenden Sekundärstrahlen bei Röntgenstrahlen," Verh d Dt Röntgengesellschaft, vol. 9, pp. 30–32, 1913.
[11] W. A. Kalender, "Calculation of x-ray grid characteristics by Monte Carlo methods," Phys Med Biol, vol. 27, no. 3, pp. 353–361, 1982.
[12] H.-P. Chan and K. Doi, "Investigation of the performance of antiscatter grids: Monte Carlo simulation studies," Phys Med Biol, vol. 27, no. 6, pp. 785–803, 1982.
[13] H.-P. Chan, Y. Higashida, and K. Doi, "Performance of antiscatter grids in diagnostic radiology: Experimental measurements and Monte Carlo simulation studies," Med Phys, vol. 12, no. 4, pp. 449–454, 1985.
[14] H.-P. Chan, K. L. Lam, and Y. Wu, "Studies of performance of antiscatter grids in digital radiography: Effect on signal-to-noise ratio," Med Phys, vol. 17, no. 4, pp. 655–664, 1990.
[15] H. Aichinger, J. Dierker, S. Joite-Barfuss, and M. Saebel, Radiation Exposure and Image Quality in X-ray Diagnostic Radiology. Berlin, New York: Springer, 2004.
[16] D. M. Gauntt and G. T. Barnes, "Grid line artifact formation: A comprehensive theory," Med Phys, vol. 33, no. 6, pp. 1668–1677, 2006.
[17] V. Singh, A. Jain, D. R. Bednarek, and S. Rudin, "Limitations of anti-scatter grids when used with high resolution image detectors," in Proc SPIE Int Soc Opt Eng, B. R. Whiting and C. Hoeschen, Eds., vol. 9033. SPIE, 2014, pp. 1618–1626.
[18] R. Rana, A. Jain, A. Shankar, D. R. Bednarek, and S. Rudin, "Scatter estimation and removal of anti-scatter grid-line artifacts from anthropomorphic head phantom images taken with a high resolution image detector," in Proc SPIE Int Soc Opt Eng, D. Kontos and T. G. Flohr, Eds., vol. 9783. SPIE, 2016, pp. 1619–1628.
[19] A. Bani-Hashemi, E. Blanz, J. Maltz, D. Hristov, and M. Svatos, "TU-D-I-611-08: Cone beam x-ray scatter removal via image frequency modulation and filtering," Med Phys, vol. 32, no. 6, pp. 2093–2093, 2005.
[20] L. Zhu, N. R. Bennett, and R. Fahrig, "Scatter correction method for x-ray CT using primary modulation: Theory and preliminary results," IEEE Trans Med Imaging, vol. 25, no. 12, pp. 1573–1587, 2006.
[21] B. Bier, M. Berger, A. Maier, M. Kachelrieß, L. Ritschl, K. Müller, J.-H. Choi, and R. Fahrig, "Scatter correction using a primary modulator on a clinical angiography C-arm CT system," Med Phys, vol. 44, no. 9, pp. 125–137, 2017.
[22] W. Zbijewski and F. J. Beekman, "Fast scatter estimation for cone-beam x-ray CT by combined Monte Carlo tracking and Richardson-Lucy fitting," in IEEE Nucl Sci Symp Conf Rec, vol. 5, 2004, pp. 2774–2777.
[23] G. Poludniowski, P. M. Evans, V. N. Hansen, and S. Webb, "An efficient Monte Carlo-based algorithm for scatter correction in keV cone-beam CT," Phys Med Biol, vol. 54, no. 12, pp. 3847–3864, 2009.
[24] A. Wang, A. Maslowski, P. Messmer, M. Lehmann, A. Strzelecki, E. Yu, P. Paysan, M. Brehm, P. Munro, J. Star-Lack, and D. Seghers, "Acuros CTS: A fast, linear Boltzmann transport equation solver for computed tomography scatter – Part II: System modeling, scatter correction, and optimization," Med Phys, vol. 45, no. 5, pp. 1914–1925, 2018.
[25] A. Badal and A. Badano, "Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit," Med Phys, vol. 36, no. 11, pp. 4878–4880, 2009.
[26] M. Baer and M. Kachelrieß, "Hybrid scatter correction for CT imaging," Phys Med Biol, vol. 57, no. 21, pp. 6849–6867, 2012.
[27] W. Swindell and P. M. Evans, "Scattered radiation in portal images: A Monte Carlo simulation and a simple physical model," Med Phys, vol. 23, no. 1, pp. 63–73, 1996.
[28] W. Yao and K. W. Leszczynski, "An analytical approach to estimating the first order scatter in heterogeneous medium. II. A practical application," Med Phys, vol. 36, no. 7, pp. 3157–3167, 2009.
[29] M. Meyer, W. A. Kalender, and Y. Kyriakou, "A fast and pragmatic approach for scatter correction in flat-detector CT using elliptic modeling and iterative optimization," Phys Med Biol, vol. 55, no. 1, pp. 99–120, 2009.
[30] B. Ohnesorge, T. Flohr, and K. Klingenbeck-Regn, "Efficient object scatter correction algorithm for third and fourth generation CT scanners," Eur Radiol, vol. 9, no. 3, pp. 563–569, 1999.
[31] H. Li, R. Mohan, and X. R. Zhu, "Scatter kernel estimation with an edge-spread function method for cone-beam computed tomography imaging," Phys Med Biol, vol. 53, no. 23, pp. 6729–6748, 2008.
[32] M. Sun and J. M. Star-Lack, "Improved scatter correction using adaptive scatter kernel superposition," Phys Med Biol, vol. 55, no. 22, pp. 6695–6720, 2010.
[33] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, "Neural ordinary differential equations," in Adv Neural Inf Process Syst, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds. Curran Associates, Inc., 2018, pp. 6571–6583.
[34] P. Roser, X. Zhong, A. Birkhold, N. Strobel, M. Kowarschik, R. Fahrig, and A. Maier, "Physics-driven learning of x-ray skin dose distribution in interventional procedures," Med Phys, vol. 46, no. 10, pp. 4654–4665, 2019.
[35] P. Roser, X. Zhong, A. Birkhold, A. Preuhs, C. Syben, E. Hoppe, N. Strobel, M. Kowarschik, R. Fahrig, and A. Maier, "Simultaneous estimation of x-ray back-scatter and forward-scatter using multi-task learning," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 199–208.
[36] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Med Image Comput Comput Assist Interv, ser. Lecture Notes in Computer Science, vol. 9351. Springer, 2015, pp. 234–241.
[37] J. Maier, E. Eulig, T. Vöth, M. Knaup, J. Kuntz, S. Sawall, and M. Kachelrieß, "Real-time scatter estimation for medical CT using the deep scatter estimation: Method and robustness analysis with respect to different anatomies, dose levels, tube voltages, and data truncation," Med Phys, vol. 46, no. 1, pp. 238–249, 2019.
[38] A. Maier, C. Syben, B. Stimpel, T. Würfl, M. Hoffmann, F. Schebesch, W. Fu, L. Mill, L. Kling, and S. Christiansen, "Learning with known operators reduces maximum error bounds," Nat Mach Intell, vol. 1, pp. 373–380, 2019.
[39] B. Stimpel, C. Syben, F. Schirrmacher, P. Hoelter, A. Dörfler, and A. Maier, "Multi-modal deep guided filtering for comprehensible medical image processing," IEEE Trans Med Imaging, vol. 39, no. 5, pp. 1703–1711, 2019.
[40] G. H. Glover, "Compton scatter effects in CT reconstructions," Med Phys, vol. 9, no. 6, pp. 860–867, 1982.
[41] J. M. Boone and J. A. Seibert, "An analytical model of the scattered radiation distribution in diagnostic radiology," Med Phys, vol. 15, no. 5, pp. 721–725, 1988.
[42] P. Roser, A. Birkhold, A. Preuhs, C. Syben, N. Strobel, M. Kowarschik, R. Fahrig, and A. Maier, "Deep scatter splines: Learning-based medical x-ray scatter estimation using B-splines," in The 6th International Conference on Image Formation in X-Ray Computed Tomography (CT-Meeting), 2020.
[43] K. Qin, "General matrix representations for B-splines," The Visual Computer, vol. 16, no. 3-4, pp. 177–186, 2000.
[44] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 315–323.
[45] K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, and F. Prior, "The Cancer Imaging Archive (TCIA): Maintaining and operating a public information repository," J Digit Imaging, vol. 26, pp. 1045–1057, 2013.
[46] T. Bejarano, M. De Ornelas Couto, and I. Mihaylov, "Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment dataset. The Cancer Imaging Archive," 2018. [Online]. Available: http://doi.org/10.7937/K9/TCIA.2018.13upr2xf
[47] H. Roth, L. Le, S. Ari, K. Cherry, J. Hoffman, S. Wang, and R. Summers, "A new 2.5D representation for lymph node detection in CT. The Cancer Imaging Archive," 2018. [Online]. Available: http://doi.org/10.7937/K9/TCIA.2015.AQIIDCNM
[48] P. Roser, A. Birkhold, A. Preuhs, B. Stimpel, C. Syben, N. Strobel, M. Kowarschik, R. Fahrig, and A. Maier, "Fully-automatic CT data preparation for interventional x-ray skin dose simulation," in Bildverarbeitung für die Medizin 2020. Springer, 2020, pp. 125–130.
[49] W. Schneider, T. Bortfeld, and W. Schlegel, "Correlation between CT numbers and tissue parameters needed for Monte Carlo simulations of clinical dose distributions," Phys Med Biol, vol. 45, no. 2, pp. 459–478, 2000.
[50] L. He, X. Ren, Q. Gao, X. Zhao, B. Yao, and Y. Chao, "The connected-component labeling problem: A review of state-of-the-art algorithms," Pattern Recognit, vol. 70, pp. 25–43, 2017.
[51] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[52] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Int Conf Learn Represent (ICLR), Y. Bengio and Y. LeCun, Eds., 2015.
[53] R. Durall, M. Keuper, and J. Keuper, "Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 7890–7899.
[54] Y. Huang, T. Würfl, K. Breininger, L. Liu, G. Lauritsch, and A. Maier, "Some investigations on robustness of deep learning in limited angle tomography," in Med Image Comput Comput Assist Interv. Springer, 2018, pp. 145–153.
[55] C. Syben, M. Michen, B. Stimpel, S. Seitz, S. Ploner, and A. K. Maier, "PYRO-NN: Python reconstruction operators in neural networks," Med Phys, vol. 46, no. 11, pp. 5110–5115, 2019.
[56] C. Syben, B. Stimpel, P. Roser, A. Dörfler, and A. Maier, "Known operator learning enables constrained projection geometry conversion: Parallel to cone-beam for hybrid MR/x-ray imaging,"