Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors
Gerda Bortsova*, Cristina González-Gonzalo*, Suzanne C. Wetstein*, Florian Dubost, Ioannis Katramados, Laurens Hogeweg, Bart Liefers, Bram van Ginneken, Josien P.W. Pluim, Mitko Veta, Clara I. Sánchez, Marleen de Bruijne
Medical Image Analysis Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
A-Eye Research Group, Diagnostic Image Analysis Group, Department of Radiology and Nuclear Medicine, Radboudumc, Nijmegen, The Netherlands
Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
Biomedical Imaging Group Rotterdam, Erasmus MC, The Netherlands
COSMONiO, The Netherlands
Diagnostic Image Analysis Group, Department of Radiology and Nuclear Medicine, Radboudumc, Nijmegen, The Netherlands
Department of Computer Science, University of Copenhagen, Denmark
Department of Ophthalmology, Radboudumc, Nijmegen, The Netherlands
Abstract.
Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be particularly vulnerable to adversarial attacks due to strong financial incentives. In this paper, we study several previously unexplored factors affecting the adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology, and pathology. Firstly, we study the effect of varying the degree of adversarial perturbation on the attack performance and its visual perceptibility. Secondly, we study how pre-training on a public dataset (ImageNet) affects the models' vulnerability to attacks. Thirdly, we study the influence of data and model architecture disparity between target and attacker models. Our experiments show that the degree of perturbation significantly affects both performance and human perceptibility of attacks. Pre-training may dramatically increase the transfer of adversarial examples; the larger the performance gain achieved by pre-training, the larger the transfer. Finally, disparity in data and/or model architecture between target and attacker models substantially decreases the success of attacks. We believe that these factors should be considered when designing cybersecurity-critical MedIA systems, as well as kept in mind when evaluating their vulnerability to adversarial attacks.
Keywords:
Adversarial Attacks · Medical Imaging · Deep Learning

* indicates equal contribution
Introduction

Deep learning (DL) has been shown to achieve performance close or even superior to that of experts in medical image analysis (MedIA) applications, including in ophthalmology [1, 2], radiology [3], and pathology [4–6]. This has created an opportunity for automation of certain tasks and the subsequent regulatory approval for the integration of DL systems in clinical settings [7].

A threat to DL systems is posed by so-called "adversarial attacks". Such attacks apply a carefully engineered, subtle perturbation to the target model's input to cause misclassification. Such perturbed inputs, called "adversarial examples", have been shown effective in fooling state-of-the-art systems [8, 9]. Adversarial attack methods have been proposed for scenarios assuming different degrees of knowledge of the target system [10]: from having full knowledge ("white-box" attacks) [8] to being agnostic to the (hyper)parameters of the target model ("black-box" attacks) [11]. The latter usually use another network, commonly referred to as a surrogate, to craft adversarial examples.

Finlayson et al. [12, 13] have recently argued that adversarial attacks pose a disproportionately large threat in the medical domain due to two factors: first, certain parties involved in healthcare systems have very strong financial incentives to adversarially manipulate medical data, including images; second, certain characteristics of medical data and the technological infrastructure around it may allow more effective and less detectable attacks.

Several studies have investigated the adversarial attack vulnerability of DL MedIA systems for classification and segmentation in different imaging modalities, including color fundus (CF) imaging [13–15], chest X-ray [13, 14, 16], dermoscopy [13–15, 17], and brain MRI [17]. In these studies, adversarial attacks were proven effective in both white- and black-box settings. However, some crucial aspects of adversarial attacks on MedIA systems have not been studied yet:
Perturbation degree and perceptibility of attacks: Most studies [13, 16, 17] only used one perturbation degree in their experiments, although this parameter highly affects attack performance. One study [14] analyzed the impact of different degrees of perturbation, but only in a white-box setting. To our knowledge, no studies have explored the effect of the perturbation degree in black-box settings, which are more realistic. Furthermore, existing studies rarely discuss the visual perceptibility of perturbations in adversarial examples, which might compromise the attack's effectiveness in MedIA settings where human input is required.
Pre-training: Pre-training may positively affect the transfer of adversarial attacks between target and surrogate models, since it increases the similarity between them. This could mean that this popular design choice [18] should be reconsidered, as it poses a security risk. Existing studies on adversarial attacks often use target and surrogate models that were pre-trained on the same data, specifically ImageNet [13, 14, 17], but do not study the influence of such pre-training on attack transferability.
Data and model architecture disparity: Although some studies analyzed black-box attack transferability between targets and surrogates not sharing the same network architecture [16, 17], all studies assumed perfect data parity, i.e. surrogate and target models were trained on the exact same subset of the same dataset. This assumption is highly unrealistic when applied to real-world DL MedIA systems, which are most often closed source and use large amounts of private training data [19–21].

In our study, we investigate these aspects of adversarial attacks in three MedIA applications: detection of referable diabetic retinopathy in CF images, classification of pathologies in chest X-ray, and breast cancer metastasis detection in histological lymph node sections. Our findings have implications for the design of cyber-secure DL MedIA systems and for practices for evaluating the adversarial attack vulnerability of these systems in realistic attack scenarios.
Methods

In this study, we used the two adversarial attack methods most commonly and most effectively used in the literature [13–17]: the fast gradient sign method (FGSM) [8] and projected gradient descent (PGD) [9]. In FGSM, the adversarial perturbation is computed as the sign of the gradient of the loss with respect to the input image. This perturbation is subsequently multiplied by a parameter $\epsilon$, to control the perturbation degree, and added to the target image $x$ to create an adversarial example: $x_{adv} = x + \epsilon \, \mathrm{sign}(\nabla_x L(f(x; \theta), y))$, where $L$ represents the loss, $f$ the selected network architecture, $\theta$ the corresponding parameters, and $y$ the image label. PGD is an iterative version of FGSM, in which several steps of computing the perturbation and adding it to the input are performed: $x_{adv}^{(i+1)} = \mathrm{clip}_{x,\epsilon}\{ x_{adv}^{(i)} + \alpha \, \mathrm{sign}(\nabla_x L(f(x_{adv}^{(i)}; \theta), y)) \}$, where $\alpha$ controls the step size and $\epsilon$ is the parameter regulating the maximum amount of perturbation added to every pixel. We applied both methods in the black-box setting, since we consider it to be the most realistic setting for MedIA systems. In this setting, a surrogate model $f'(\cdot; \theta')$ is used to compute the attack, which is then transferred to the target model.

To verify that the target model performance is reduced solely due to the adversarial perturbation, we additionally computed "control" noise. While existing works chose standard noise distributions such as Gaussian for this purpose [17], we chose to compare adversarial perturbations with their spatially shuffled versions, to ensure the same degree of perturbation in adversarial and "control" examples.
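For concreteness, the sketch below shows how the two attacks and the shuffled "control" noise could be implemented. It is not the authors' code: the paper does not specify a framework, so PyTorch is used here as an assumption, and the single-logit binary output, the [-1, 1] intensity range, and the function names are ours.

```python
# Minimal sketch (not the authors' code) of FGSM, PGD and the spatially
# shuffled "control" noise described above, written in PyTorch.
# Assumes images scaled to [-1, 1], float labels, and a surrogate model
# that outputs a single logit per image (binary classification).
import torch
import torch.nn.functional as F


def fgsm(surrogate, x, y, eps):
    """x_adv = x + eps * sign(grad_x L(f(x; theta), y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.binary_cross_entropy_with_logits(surrogate(x).squeeze(1), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + eps * grad.sign()
    return torch.clamp(x_adv, -1.0, 1.0).detach()  # keep a valid intensity range


def pgd(surrogate, x, y, eps, alpha=0.01, steps=20):
    """Iterative FGSM: each step is projected back into the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.binary_cross_entropy_with_logits(surrogate(x_adv).squeeze(1), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # the clip_{x,eps} projection
        x_adv = torch.clamp(x_adv, -1.0, 1.0)
    return x_adv.detach()


def shuffled_control(x, x_adv):
    """Spatially shuffle the adversarial perturbation: same per-pixel magnitudes,
    but the adversarial structure is destroyed (the "control" noise)."""
    delta = (x_adv - x).flatten(start_dim=1)
    perm = torch.randperm(delta.shape[1])
    return torch.clamp(x + delta[:, perm].view_as(x), -1.0, 1.0)
```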
We used two architectures as both target and surrogate models: Inception-v3 [22] and DenseNet-121 [23], as both were previously applied in our selected applications and achieved good performance [1, 3, 24, 25]. All networks were trained using Adam optimization with learning rate decay and a binary cross-entropy loss.

For the dataset used in each application, a development set and a test set were defined. The development set was used for training and validation of the target and surrogate models. We randomly divided all development sets, at patient level, into two non-overlapping, equal-sized parts, d1 and d2, to be able to study the influence of data parity on attack transferability. A third set, d2/2, was created by randomly sampling half of d2 to study the influence of dataset size. The independent test set was used to measure the performance of each model on clean and adversarial examples. The description of each dataset and dataset-specific network parameters are stated below.
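The partitioning procedure is not detailed beyond this description; a minimal sketch of one way to form d1, d2, and d2/2 at patient level is shown below (the DataFrame and its patient_id column are hypothetical placeholders, not part of the paper).

```python
# Sketch (not the authors' code) of the patient-level partitioning of a
# development set into d1, d2 and d2/2.
import numpy as np
import pandas as pd


def split_development_set(dev: pd.DataFrame, seed: int = 0):
    """Return two equal, non-overlapping patient-level halves (d1, d2) and
    d2/2, a random half of the patients in d2."""
    rng = np.random.default_rng(seed)
    patients = dev["patient_id"].unique()
    rng.shuffle(patients)                       # random patient order
    half = len(patients) // 2
    d1 = dev[dev["patient_id"].isin(patients[:half])]
    d2 = dev[dev["patient_id"].isin(patients[half:])]
    # Patients are already shuffled, so the first half of d2's patients is random.
    n_d2 = len(patients) - half
    d2_half = dev[dev["patient_id"].isin(patients[half : half + n_d2 // 2])]
    return d1, d2, d2_half
```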
Ophthalmology
We used the Kaggle dataset for diabetic retinopathy (DR) detection [26], which contains 88,702 color fundus images with manually labeled DR severity. In order to have more images available for development, as proposed by Finlayson et al. [13], we merged the original training (35,126 images) and test sets (53,576 images) and split the images randomly at patient level into development (88%) and test (12%) sets. Pre-processing included extracting the field of view and rescaling to 512 × 512 resolution.

Radiology
We used the ChestX-Ray14 dataset [27], consisting of 112,120 frontal-view chest X-rays annotated with 14 pathology labels. The official data split (80%-20%) was used to define our development and test sets. Pre-processing included downsampling images to 256 × 256 resolution. We used translation and horizontal flipping for data augmentation.
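The text specifies the augmentation types but not their parameters; the torchvision sketch below is one possible configuration, in which the translation range, the 3-channel loading, and the rescaling to [-1, 1] are our assumptions rather than details from the paper.

```python
# Sketch of the chest X-ray pipeline described above (downsampling to 256x256,
# random translation, horizontal flipping); parameter values are assumptions.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),  # small random shifts
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),                                       # scales to [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),         # rescales to [-1, 1]
])
```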
Pathology
We used the PatchCamelyon (PCam) dataset [25], which contains 327,680 patches extracted from histopathology images of lymph node sections, labeled with the presence of metastatic tissue in the patch center. The official data split (90%-10%) was used to define our development and test sets. The top layers of both model architectures were replaced with a global average pooling layer followed by a dense layer with one output and sigmoid activation, to be able to handle the 96 × 96 resolution of the input. As data augmentation, we used flipping and color augmentation.
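The framework and exact training hyperparameters are not stated in the text; the sketch below shows one way to realize the described model head and training configuration for DenseNet-121 using torchvision (the learning rate and the particular decay schedule are assumptions). Note that torchvision's DenseNet already applies global average pooling before its classifier, and the sigmoid is folded into the loss.

```python
# Sketch (framework and hyperparameter choices are ours, not from the paper):
# DenseNet-121 with the top replaced by a single-output head, as described above.
import torch
from torch import nn
from torchvision import models


def build_densenet(pretrained: bool = False) -> nn.Module:
    # pretrained=True corresponds to ImageNet initialization, False to random
    # initialization; both settings are compared in the pre-training experiment.
    model = models.densenet121(weights="IMAGENET1K_V1" if pretrained else None)
    model.classifier = nn.Linear(model.classifier.in_features, 1)  # one output
    return model


model = build_densenet(pretrained=True)
criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on the single logit
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr is an assumption
# One possible form of the "learning rate decay" mentioned above:
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=2)
```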
Perturbation degree and perceptibility of attacks
In the first experiment, we studied the performance of FGSM and PGD attacks under different degrees of perturbation (controlled by ε) and the visual perceptibility of the perturbations. We evaluated the attacks for ε = 0.02, 0.04, and 0.06; these values were applied to images rescaled between -1 and 1 and were selected based on early experiments. For the PGD attacks, we used α = 0.01 and 20 iterations. In this experiment, all models were randomly initialized and trained on the same partition of the development set, d1.
Pre-training
In the second experiment, we measured the attack effectiveness when target and surrogate are both pre-trained on ImageNet, both randomly initialized, or have different initializations (pre-trained or random). For this purpose, we trained four versions of each architecture (two pre-trained and two randomly initialized) to cover all possible target-surrogate combinations in black-box settings, using the same partition of the development set, d1.
Data and model architecture disparity
This experiment focused on the effect of disparity in the data used for the development of the target and surrogate models, as well as its interaction with architecture disparity. Here, we trained four randomly initialized versions of each architecture: a target model trained on d1 and three surrogate models trained on d1, d2, and d2/2, respectively.

In all experimental setups, the performance of the target models on the test set of each dataset was measured using the area under the receiver operating characteristic curve (AUC), or mean AUC for the multi-class case.
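A sketch of how this black-box evaluation could be run is shown below: adversarial examples are crafted on the surrogate and scored on the target, and AUC is computed over the test set. The attack function and the test_loader are assumptions (e.g. the fgsm sketch given earlier), as is the binary, single-logit setup; the multi-label chest X-ray case would instead average per-label AUCs.

```python
# Sketch (not the authors' code) of the black-box transfer evaluation:
# perturbations are crafted on the surrogate and applied to the target,
# whose AUC on clean and adversarial test images is then compared.
# `test_loader` is assumed to yield (image, label) batches in [-1, 1].
import torch
from sklearn.metrics import roc_auc_score


@torch.no_grad()
def predict(model, x):
    # Sigmoid turns the single logit into a probability score.
    return torch.sigmoid(model(x).squeeze(1))


def evaluate_transfer_attack(target, surrogate, test_loader, attack, eps=0.02):
    target.eval()
    surrogate.eval()
    labels, clean_scores, adv_scores = [], [], []
    for x, y in test_loader:
        x_adv = attack(surrogate, x, y.float(), eps)  # crafted on the surrogate only
        labels.append(y)
        clean_scores.append(predict(target, x))
        adv_scores.append(predict(target, x_adv))
    y_true = torch.cat(labels).numpy()
    auc_clean = roc_auc_score(y_true, torch.cat(clean_scores).numpy())
    auc_adv = roc_auc_score(y_true, torch.cat(adv_scores).numpy())
    return auc_clean, auc_adv

# Example usage (fgsm from the earlier sketch):
# auc_clean, auc_adv = evaluate_transfer_attack(target, surrogate, loader, attack=fgsm)
```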
Results and Discussion

The results of our experiments with different attack methods (FGSM and PGD) at different perturbation degrees can be found in Table 1. Higher perturbation degrees led to substantially lower performance of the target models. Experiments with spatially shuffled noise suggest that at higher noise magnitudes part of the performance drop was due to image corruption by the noise, though only to a small extent. FGSM and PGD performed similarly; based on this observation, we chose to use both attacks in our subsequent experiments and report averaged results.

Table 1. Effects of perturbation degree on attack transferability. Average performance (AUC) over two model architectures is shown when using FGSM, PGD, or "control" noise (spatially shuffled adversarial perturbations) with varying perturbation degrees. Rows with noise "-" show performance on clean images.

Data           Noise        FGSM                       PGD
                            ε=0.02   0.04    0.06      ε=0.02   0.04    0.06
Ophthalmology  -            0.86
Ophthalmology  adversarial  0.44     0.32    0.33      0.56     0.37    0.34
Ophthalmology  shuffled     0.85     0.79    0.73      0.85     0.84    0.84
Radiology      -            0.80
Radiology      adversarial  0.62     0.57    0.55      0.64     0.54    0.49
Radiology      shuffled     0.80     0.79    0.77      0.80     0.80    0.79
Pathology      -            0.87
Pathology      adversarial  0.56     0.38    0.33      0.56     0.41    0.36
Pathology      shuffled     0.87     0.87    0.87      0.87     0.87    0.87

Figure 1 shows original images and their adversarial counterparts computed using the FGSM attack at different perturbation degrees. As can be seen, applying the same amount of perturbation to different imaging modalities has a different effect on the perceptibility of the perturbation. In this experiment, we used our own visual perception. For the ophthalmology and pathology datasets, we found the perturbation perceptible at ε = 0.04 or larger. For the radiology dataset, the perturbations were already perceptible at ε = 0.02, albeit quite subtle. These differences in perceptibility could occur because of differences in color, homogeneity, contrast, and resolution between the imaging modalities. Furthermore, the judgement of perceptibility is subjective and depends on the background and goal of the observer. Adversarial attack perceptibility by trained medical experts could be examined in future studies.

Fig. 1. Original and adversarial images created with FGSM using different perturbation magnitudes.

In summary, we found that the perturbation degree significantly affects both the performance and the visual perceptibility of attacks. It is important to study higher perturbation degrees so as not to underestimate the attack vulnerability of the studied system. However, an attack performed using a conspicuous degree of perturbation could be easily discovered by a (trained) human and thus neutralized. Following this logic, for our further experiments, we chose to report attacks using ε = 0.02, as for two out of three applications this was the highest perturbation degree that was still visually subtle.
Table 2 summarizes our experiments on the effect of pre-training on adversarial attack transferability. In the ophthalmology and radiology datasets, the attack transferability between pre-trained models was substantially higher than that between randomly initialized models. This effect was very pronounced in the ophthalmology dataset, in which pre-training also gave the highest performance boost on clean examples. In both datasets, the effect was consistent: for all eight combinations of attack method and target-surrogate pair, pre-trained targets had lower performance when attacked by pre-trained surrogates, compared to their randomly initialized counterparts. In the pathology dataset, however, the opposite effect was observed with similar consistency.

In the ophthalmology and pathology datasets, pre-trained targets were consistently less vulnerable to attacks by randomly initialized surrogates than randomly initialized targets were to attacks by pre-trained surrogates. The opposite, equally consistent effect was observed in the radiology dataset. On average, pre-trained networks were moderately more vulnerable to attacks in the ophthalmology and radiology datasets and slightly less vulnerable in the pathology dataset.

We believe the increased transfer between pre-trained models may occur because their decision boundaries are more similar than those of randomly initialized models. This may hold more strongly for models where pre-training decreases the number of training epochs and/or improves the performance. Overall, we observed that pre-training MedIA networks on ImageNet may dramatically increase the transfer of adversarial examples; the larger the performance gain achieved by pre-training, the larger the transfer. We believe this effect should be considered when designing secure DL MedIA systems for deployment in clinical practice, as well as in future studies on adversarial attacks, since pre-training is an optional (although popular) design choice and introduces a possible vulnerability to attacks.

Table 2. Effects of pre-training on attack transferability. Average performance (AUC) over FGSM and PGD (ε = 0.02) and two model architectures is shown. Average relative performance with respect to the no-attack setting is shown in brackets; "Avg" rows average the two attacked settings per target.

Target    Surrogate  Ophthalmology   Radiology     Pathology
ImageNet  -          0.94 (100%)     0.82 (100%)   0.87 (100%)
ImageNet  ImageNet   0.12 (13%)      0.52 (64%)    0.68 (78%)
ImageNet  Random     0.83 (88%)      0.64 (78%)    0.73 (84%)
          Avg        (51%)           (71%)         (81%)
Random    -          0.86 (100%)     0.80 (100%)   0.87 (100%)
Random    ImageNet   0.67 (79%)      0.73 (91%)    0.65 (75%)
Random    Random     0.50 (58%)      0.63 (79%)    0.56 (65%)
          Avg        (69%)           (85%)         (70%)
The effects of data and model architecture disparity between target and surrogate models can be seen in Table 3. For all datasets, networks were substantially less susceptible to attacks crafted using surrogates with the same architecture but trained on a different data subset (d2 or d2/2). This held for both target architectures and both attack methods. Decreasing the surrogate training set size (from d2 to d2/2) led to a drop in attack performance for the ophthalmology and radiology data. When the architecture of the surrogate was different, however, data disparity between target and surrogate substantially decreased the attack performance only for the ophthalmology data. Disparity in model architecture had a greater effect on attack performance than disparity in data for the radiology and pathology data; for the ophthalmology data, it had an equal or smaller effect, depending on the degree of data disparity.

We believe that, since most MedIA systems are closed source and use private training data, the attack scenario in which data and model parameters of target and surrogate do not (completely) overlap is more realistic than one assuming data and model parity. Our results show that in case of disparity the attacks perform substantially more poorly than in case of parity, which is commonly assumed by existing studies [13, 14, 16, 17]. By the same token, designers of MedIA systems could consider using private rather than public data, keeping model information private, and designing custom systems instead of using standard architectures.

Table 3. Effects of data and model architecture parity on attack transferability. Average performance (AUC) over FGSM and PGD (ε = 0.02) and two model architectures is shown, with surrogate models trained on different sets. Average relative performance with respect to the no-attack setting is shown in brackets.

Architecture  Training set  Ophthalmology  Radiology    Pathology
-             -             0.86 (100%)    0.80 (100%)  0.87 (100%)
Same          d1            0.44 (52%)     0.55 (69%)   0.41 (47%)
Same          d2            0.56 (65%)     0.64 (80%)   0.67 (77%)
Same          d2/2          0.75 (88%)     0.66 (83%)   0.65 (75%)
Different     d1            0.55 (65%)     0.70 (88%)   0.71 (82%)
Different     d2            0.66 (77%)     0.70 (88%)   0.74 (85%)
Different     d2/2          0.80 (93%)     0.72 (90%)   0.71 (81%)

Conclusion

In our experiments, we observed that higher perturbation levels lead to increased success of attacks, but also to increased visual perceptibility, which might compromise their effectiveness in MedIA settings where human input is required. We observed that pre-training MedIA networks on ImageNet may dramatically increase the transfer of adversarial examples; the larger the performance gain achieved by pre-training, the larger the transfer. Lastly, dataset and model architecture disparity between target and surrogate models can substantially decrease the success of attacks. We believe that these factors should be considered in the design of cybersecurity-critical MedIA systems, as well as kept in mind when evaluating the vulnerability of these systems to adversarial attacks.
References
1. Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., et al.: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316(22), 2402–2410 (2016)
2. Ting, D.S.W., Cheung, C.Y.L., Lim, G., Tan, G.S.W., Quang, N.D., Gan, A., et al.: Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 318(22), 2211–2223 (2017)
3. Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., et al.: CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint arXiv:1711.05225 (2017)
4. Bejnordi, B.E., Veta, M., Van Diest, P.J., Van Ginneken, B., Karssemeijer, N., Litjens, G., et al.: Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318(22), 2199–2210 (2017)
5. Bulten, W., Pinckaers, H., van Boven, H., Vink, R., de Bel, T., van Ginneken, B., et al.: Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. The Lancet Oncology, in press (2020)
6. Wetstein, S.C., Onken, A.M., Luffman, C., Baker, G.M., Pyle, M.E., Kensler, K.H., et al.: Deep learning assessment of breast terminal duct lobular unit involution: towards automated prediction of breast cancer risk. arXiv preprint arXiv:1911.00036 (2019)
7. Abràmoff, M.D., Lavin, P.T., Birch, M., Shah, N., Folk, J.C.: Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. npj Digital Medicine 1(1), 1–8 (2018)
8. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
9. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
10. Yuan, X., He, P., Zhu, Q., Li, X.: Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems 30(9), 2805–2824 (2019)
11. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 506–519 (2017)
12. Finlayson, S.G., Bowers, J.D., Ito, J., Zittrain, J.L., Beam, A.L., Kohane, I.S.: Adversarial attacks on medical machine learning. Science 363(6433), 1287–1289 (2019)
13. Finlayson, S.G., Chung, H.W., Kohane, I.S., Beam, A.L.: Adversarial attacks against medical deep learning systems. arXiv preprint arXiv:1804.05296 (2018)
14. Ma, X., Niu, Y., Gu, L., Wang, Y., Zhao, Y., Bailey, J., Lu, F.: Understanding adversarial attacks on deep learning based medical image analysis systems. arXiv preprint arXiv:1907.10456 (2019)
15. Ozbulak, U., Van Messem, A., De Neve, W.: Impact of adversarial examples on deep learning models for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, 300–308. Springer, Cham (2019)
16. Taghanaki, S.A., Das, A., Hamarneh, G.: Vulnerability analysis of chest X-ray image classification against adversarial attacks. In: Understanding and Interpreting Machine Learning in Medical Image Computing Applications, 87–94. Springer, Cham (2018)
17. Paschali, M., Conjeti, S., Navarro, F., Navab, N.: Generalizability vs. robustness: adversarial examples for medical imaging. arXiv preprint arXiv:1804.00504 (2018)
18. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., et al.: A survey on deep learning in medical image analysis. Medical Image Analysis 42, 60–88 (2017)
19. Abràmoff, M.D., Lou, Y., Erginay, A., Clarida, W., Amelon, R., Folk, J.C., Niemeijer, M.: Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Investigative Ophthalmology & Visual Science 57