Prediction of low-keV monochromatic images from polyenergetic CT scans for improved automatic detection of pulmonary embolism
Constantin Seibold∗, Matthias A. Fink∗, Charlotte Goos, Hans-Ulrich Kauczor, Heinz-Peter Schlemmer, Rainer Stiefelhagen, Jens Kleesiek
∗ denotes equal contribution.

Institute of Anthropomatics & Robotics, Karlsruhe Institute of Technology, Germany
Department of Diagnostic and Interventional Radiology, University Hospital Heidelberg, Germany
German Cancer Research Center, Heidelberg, Germany
Institute for AI in Medicine (IKIM), University Hospital Essen, Germany
ABSTRACT
Detector-based spectral computed tomography is a recent dual-energy CT (DECT) technology that offers the possibility of obtaining spectral information. From this spectral data, different types of images can be derived, amongst others virtual monoenergetic (monoE) images.
MonoE images potentially exhibit decreased artifacts, improved contrast, and overall lower noise, making them ideal candidates for better delineation and thus improved diagnostic accuracy of vascular abnormalities. In this paper, we train convolutional neural networks (CNNs) that emulate the generation of monoE images from conventional single-energy CT acquisitions. For this task, we investigate several commonly used image-translation methods. We demonstrate that these methods, while creating visually similar outputs, lead to poorer performance when used for automatic classification of pulmonary embolism (PE). We expand on these methods through a multi-task optimization approach, under which the networks achieve improved classification as well as generation results, as reflected by PSNR and SSIM scores. Further, evaluating our proposed framework on a subset of the RSNA-PE challenge data set shows that we are able to improve the area under the receiver operating characteristic curve (AuROC) from 0.8142 to 0.8420 in comparison to a naïve classification approach.
Index Terms — Image-to-Image Translation, Spectral Computed Tomography, Domain Adaptation, Pulmonary Embolism Diagnosis
1. INTRODUCTION
In spectral computed tomography (DECT), projection data simultaneously obtained from both detector layers is utilized to generate spectral images such as virtual monoenergetic (monoE) scans. Next to the conventional (polyenergetic) images, multiple spectrally distinct attenuation maps can be obtained from a single scan and used to derive different types of images. The clinical uses of DECT can be summarized as enhanced visualization of intravascular contrast, reduction of artifacts such as calcium blooming, material decomposition, and radiation dose reduction [1]. Therefore, in comparison to conventional CT, DECT compares favorably for the diagnosis of various conditions such as myocardial perfusion deficits [2] or pulmonary embolisms [3]. We argue that, similar to the expert radiologist, convolutional neural networks (CNNs) may benefit when trained on DECT data. However, as most currently existing CT data sets were acquired with conventional CT scanners, they do not comprise monoE images. To bridge this gap, we investigate the use of existing image-translation models such as Pix2Pix [4], which might be able to use the underlying distributions in polyenergetic images to predict spectral images, akin to what was done to translate CT images to MRI scans [5]. In turn, these generated synthetic monoE images might be used as input for CNNs, potentially facilitating their detection of pathologies. While existing image-translation methods are able to generate visually appealing results, they do not enforce features that enable the correct identification of certain classes. For this reason, we introduce a joint optimization between the generation of the monoenergetic domain and the simultaneous identification of pathologies. This leads to a network that learns to combine features necessary for a downstream classification task as well as for synthetic image generation.
In other words, the proposed framework learns a suitable mapping on the basis of monoenergetic images. Our contributions can be summarized as (1) an extensive study comparing various image-translation methods for the prediction of monoE images from conventional polyenergetic scans, (2) an evaluation of the classification accuracy of predicted synthetic monoE images for the detection of PE, and (3) the proposal of a training regime enabling the generation of data that is not only visually similar but also incorporates features necessary for the automatic identification of pathologies.

Fig. 1. Overview of the different approaches for the combination of domain adaptation and classification. On the left, a) trains a generator on the paired dataset, which would be followed by the training of a classifier on annotated data in b). c) displays our approach of the joint optimization of generator and classifier, where the generator learns the mapping for unlabeled data, while adjusting its features in a way that it does not hinder the classification network when given annotated data.
2. METHODS AND MATERIALS
Suppose we are given two distinct data sets D₁ and D₂. D₁ consists of unannotated images with poly- and monoenergetic depictions. D₂ describes a set of images with slice-level disease annotations without corresponding monoenergetic representation. We now aim to design a unified model that jointly optimizes disease identification and the domain adaptation most fitting for the task. We formulate these two tasks in the same framework so that 1) the tasks are trained end-to-end and 2) the two tasks can be mutually beneficial. The proposed architecture is displayed in Fig. 1 c).

Methodology:
The proposed framework jointly optimizes two tasks in an end-to-end manner. As one task, we consider the problem of translating between the domain of polyenergetic images x ∈ X and monoE images y ∈ Y as a paired image-translation problem. Here, a generator aims to learn a mapping G : x → y that minimizes the difference between the two paired images. This objective can be expressed as

$\mathcal{L}_{L1} = \mathbb{E}_{x,y}\big[ \lVert G(x) - y \rVert_1 \big]$  (1)

We utilize the mean absolute error as it has been found to lead to less blurry images [4]. Consecutively, the output of the generator is fed into a classification network C, which attempts to predict the occurrence of a disease label z, C : G(x) → z, on the annotated data set. We utilize a ResNet50 [6]; however, our framework can easily be extended to employ any other existing CNN model. We utilize a sigmoid activation σ for making output predictions:

$\mathcal{L}_{cls} = \mathbb{E}_{x,z}\big[ -z \log \sigma(C(G(x))) - (1 - z) \log(1 - \sigma(C(G(x)))) \big]$  (2)

To optimize both objectives during the training process, we construct our data set as a combination of the two data sets (see below) and sample each batch such that on average it consists of 50% of either. Therefore, target disease labels are only given for half of the batch and monoenergetic target images for the other half. To accommodate this circumstance in the optimization function, we introduce a marker variable m, which switches between {0, 1} depending on whether we are presented a target image y or a target label z. In this manner, the final loss can be formulated as

$\mathcal{L} = m \cdot \mathcal{L}_{cls} + (1 - m) \cdot \mathcal{L}_{L1}$  (3)

For backpropagation of the gradients, one network is frozen while the other is updated, similar to adversarial training.

Implementation Details:
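As a concrete illustration of the objective in Eqs. (1)–(3), the marker-switched loss for a single sample can be sketched as follows. This is a minimal NumPy sketch of our own, not the paper's code; all function names are illustrative:

```python
import numpy as np

def sigmoid(t):
    # logistic activation sigma(t)
    return 1.0 / (1.0 + np.exp(-t))

def l1_loss(g_x, y):
    # Eq. (1): mean absolute error between generated and target monoE image
    return float(np.mean(np.abs(g_x - y)))

def cls_loss(logit, z):
    # Eq. (2): binary cross-entropy on the sigmoid of the classifier output
    p = sigmoid(logit)
    return float(-(z * np.log(p) + (1.0 - z) * np.log(1.0 - p)))

def joint_loss(m, g_x, y, logit, z):
    # Eq. (3): marker m in {0, 1} selects which objective applies to this sample
    return m * cls_loss(logit, z) + (1 - m) * l1_loss(g_x, y)
```

In the actual training loop, the marker would additionally determine which of the two networks is frozen while the other receives gradient updates.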
We train our networks jointly in an end-to-end manner by sequentially passing data through the generator and the classification network. Our generator utilizes a fully convolutional 9-block ResNet encoder-decoder network; however, similar to our classifier, the model can easily be replaced by more advanced architectures. We use Adam for optimization with a learning rate of 0.0002, β₁ = 0. and β₂ = 0. , and a weight decay of 0.00001. After training for 5 epochs on the joint data set, we decay our learning rate to 0 over the following 5 epochs. We use an image size of × with a batch size of 5 for all our experiments.

Experimental Setup:
We utilize two data sets for our experiments. The private dual-energy computed tomography pulmonary angiography (DE-CTPA) data set D₁ was gathered during the routine clinical workup of 27 consecutive patients with suspected pulmonary embolism. The CT scans were performed on a dual-layer detector (IQon Spectral CT, Philips Healthcare). Standard arterial series and the corresponding monoenergetic images at a low energy level (40 keV) were reconstructed. The data set contains 7892 image pairs. The second data set D₂ is a subset of the RSNA STR Pulmonary Embolism Detection data set [7]. Out of the 7279 annotated subjects, we sample 10% of the training data patient-wise. The sampled data set consists of a total of 161253 annotated slices with roughly the same label distribution as present in the open training set. We further split the data patient-wise 50%/25%/25% into train, validation, and test sets, respectively.
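A patient-wise 50%/25%/25% split as described above might look like the following sketch; the helper name and seed are our own illustrative assumptions, not the paper's code:

```python
import random

def patient_wise_split(patient_ids, seed=0):
    # split by unique patient so no patient's slices leak across subsets
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train, n_val = len(ids) // 2, len(ids) // 4
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test
```

Splitting by patient rather than by slice avoids optimistic bias from near-identical neighboring slices of one patient landing in both train and test sets.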
Fig. 2. Qualitative comparison of different image-translation methods (Input, L1, SPL, Pix2Pix, CRN, Pix2PixHD, Ours, Target) on our internal DE-CTPA dataset. Individual SSIM and PSNR values are shown on the images. Areas around pulmonary embolisms are displayed separately.

For our experiments on our DE-CTPA data set, we perform 5-fold cross-validation and average our reconstruction results in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). For the identification of PE, we perform binary classification on slice level for each presented image domain and report the area under the receiver operating characteristic curve (AuROC) on the test split of the model which performed best on the validation set. We validated our model after each epoch. We compare against various image-translation models: an L1-loss-based generator as a baseline, Pix2Pix [4], Pix2PixHD [8], CRN [9], and SPL [10]. We further added L1 losses to the feature-loss-based methods (CRN, Pix2PixHD), denoted by *. All methods apart from CRN and Pix2PixHD, which use their originally proposed architectures, are trained using the same 9-block ResNet architecture.
Orig. denotes the direct usage of conventional CT imagery for either the computation of PSNR/SSIM or as input to a classification network. To evaluate classification performance for the different image-translation methods, we train all methods on the same split in the cross-validation setting of our internal data set.
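For reference, the two reconstruction metrics can be sketched as follows: PSNR exactly, and a simplified global variant of SSIM computed over the whole image rather than with the usual sliding window (constants follow the standard SSIM formulation; the helper names are ours):

```python
import numpy as np

def psnr(ref, est, data_range=1.0):
    # peak signal-to-noise ratio in dB
    mse = np.mean((ref - est) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

def global_ssim(x, y, data_range=1.0):
    # simplified SSIM over the whole image instead of local windows
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Published SSIM numbers are typically computed with the windowed formulation (e.g. scikit-image's `structural_similarity`); the global variant above only illustrates the terms involved.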
Compliance with ethical standards:
The first data set was gathered as part of a retrospective single-centre HIPAA-compliant study, which was approved by the local institutional review board (No. S-236/2020) with a waiver for written informed consent. As the second data set is part of a public competition, ethical approval was not required, as confirmed by the license attached to the open-access data.
Conflicts of Interest:
No funding was received for conducting this study. The authors have no relevant financial or non-financial interests to disclose.

Table 1. Reconstruction results (SSIM and PSNR, mean ± std) of various image-translation methods: Orig., L1, SPL [10], Pix2Pix [4], Pix2PixHD [8], CRN [9], Pix2PixHD*, CRN*, and Ours. Best and second-best results in bold and italics.
3. RESULTS

Quantitative Results of Translation Properties:
The quantitative results on reconstruction ability are displayed in Table 1. Models optimized with image-based comparisons outperform feature-loss and adversarial methods for the evaluated task. Our method achieves performance similar to the L1-based generator. All methods apart from the feature-loss-based CRN model manage to create high-quality visual reconstructions of the monoE images. Qualitative samples can be seen in Fig. 2, in which areas around pulmonary embolisms are further highlighted.
Impact on automatic PE diagnosis:
The quantitative classification results of a ResNet50 network trained on various input image domains are displayed in Table 2. Despite the similar SSIM/PSNR results, the L1-loss-based model generates images which slightly hamper the classification ability of a model. The other compared models worsen the performance, while our proposed method manages to generate visually fitting images as well as to improve classification results over the baseline.

Table 2. Pulmonary embolism classification results (AuROC) of a ResNet-50 trained on images from the different image domains (Orig., L1, SPL, Pix2Pix, CRN*, P2PHD*, Ours) of the various image-translation methods. Best result in bold.
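The reported AuROC corresponds to the probability that a randomly chosen PE-positive slice receives a higher score than a randomly chosen negative one. A minimal rank-based sketch of our own (not the paper's evaluation code):

```python
def auroc(labels, scores):
    # pairwise (Mann-Whitney) formulation of the area under the ROC curve:
    # fraction of positive/negative pairs ranked correctly, ties count half
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The O(P·N) pairwise loop is fine for illustration; production code would sort once and use ranks (as, e.g., scikit-learn's `roc_auc_score` does).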
4. DISCUSSION
We have investigated the potential of predicting monoenergetic from polyenergetic images for the automatic identification of pathologies in CTPAs. We have shown that most established image-translation methods either fail to correctly reconstruct the target domain or discard features necessary for classification. To offset these shortcomings of existing approaches, we introduce an end-to-end learnable framework which combines the training of the classification and translation networks. The reconstruction loss term lets the network predict the visual properties, while the classification loss lets it enhance distinguishable features for the trained task. Results on the RSNA STR Pulmonary Embolism Detection data set indicate that our approach provides a successful domain adaptation to monoenergetic imagery, as it outperforms existing image-translation methods for paired data while using the same number of parameters or fewer.
5. CONCLUSION
The proposed joint optimization strategy allows training the reconstruction of monoenergetic images without losing features necessary for the classification process. Our method thereby improves noticeably over straightforward classification, while outperforming existing methods.
6. ACKNOWLEDGEMENTS
The present contribution is supported by the Helmholtz Association under the joint research school "HIDSS4Health - Helmholtz Information and Data Science School for Health".
7. REFERENCES

[1] Prabhakar Rajiah, Suhny Abbara, and Sandra Simon Halliburton, "Spectral detector CT for cardiovascular applications," Diagnostic and Interventional Radiology, vol. 23, no. 3, pp. 187, 2017.

[2] Rachid Fahmi, Brendan L. Eck, Jacob Levi, Anas Fares, Amar Dhanantwari, Mani Vembar, Hiram G. Bezerra, and David L. Wilson, "Quantitative myocardial perfusion imaging in a porcine ischemia model using a prototype spectral detector CT system," Physics in Medicine & Biology, vol. 61, no. 6, pp. 2407, 2016.

[3] Jakob Weiss, Mike Notohamiprodjo, Malte Bongers, Christoph Schabel, Stefanie Mangold, Konstantin Nikolaou, Fabian Bamberg, and Ahmed E. Othman, "Effect of noise-optimized monoenergetic postprocessing on diagnostic accuracy for detecting incidental pulmonary embolism in portal-venous phase dual-energy computed tomography," Investigative Radiology, vol. 52, no. 3, pp. 142–147, 2017.

[4] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.

[5] Jelmer M. Wolterink, Anna M. Dinkla, Mark H. F. Savenije, Peter R. Seevinck, Cornelis A. T. van den Berg, and Ivana Išgum, "Deep MR to CT synthesis using unpaired data," in International Workshop on Simulation and Synthesis in Medical Imaging. Springer, 2017, pp. 14–23.

[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[7] "RSNA STR Pulmonary Embolism Detection," Accessed: 2020-10-25.

[8] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro, "High-resolution image synthesis and semantic manipulation with conditional GANs," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8798–8807.

[9] Qifeng Chen and Vladlen Koltun, "Photographic image synthesis with cascaded refinement networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1511–1520.

[10] M. Saquib Sarfraz, Constantin Seibold, Haroon Khalid, and Rainer Stiefelhagen, "Content and colour distillation for learning image translations with the spatial profile loss," in