Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors
Gerda Bortsova*, Cristina González-Gonzalo*, Suzanne C. Wetstein*, Florian Dubost, Ioannis Katramados, Laurens Hogeweg, Bart Liefers, Bram van Ginneken, Josien P.W. Pluim, Mitko Veta, Clara I. Sánchez, Marleen de Bruijne
Medical Image Analysis Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
A-Eye Research Group, Diagnostic Image Analysis Group, Department of Radiology and Nuclear Medicine, Radboudumc, Nijmegen, The Netherlands
Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
Biomedical Imaging Group Rotterdam, Erasmus MC, The Netherlands
COSMONiO, The Netherlands
Diagnostic Image Analysis Group, Department of Radiology and Nuclear Medicine, Radboudumc, Nijmegen, The Netherlands
Department of Computer Science, University of Copenhagen, Denmark
Department of Ophthalmology, Radboudumc, Nijmegen, The Netherlands
Abstract.
Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be particularly vulnerable to adversarial attacks due to strong financial incentives. In this paper, we study several previously unexplored factors affecting the adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology, and pathology. Firstly, we study the effect of varying the degree of adversarial perturbation on the attack performance and its visual perceptibility. Secondly, we study how pre-training on a public dataset (ImageNet) affects the models' vulnerability to attacks. Thirdly, we study the influence of data and model architecture disparity between target and attacker models. Our experiments show that the degree of perturbation significantly affects both performance and human perceptibility of attacks. Pre-training may dramatically increase the transfer of adversarial examples; the larger the performance gain achieved by pre-training, the larger the transfer. Finally, disparity in data and/or model architecture between target and attacker models substantially decreases the success of attacks. We believe that these factors should be considered when designing cybersecurity-critical MedIA systems, as well as kept in mind when evaluating their vulnerability to adversarial attacks.
Keywords:
Adversarial Attacks · Medical Imaging · Deep Learning

* indicates equal contribution
Introduction

Deep learning (DL) has been shown to achieve performance close or even superior to that of experts in medical image analysis (MedIA) applications, including in ophthalmology [1, 2], radiology [3], and pathology [4–6]. This has created an opportunity for automation of certain tasks and the subsequent regulatory approval for the integration of DL systems in clinical settings [7].

A threat to DL systems is posed by so-called "adversarial attacks". Such attacks apply a carefully engineered, subtle perturbation to the target model's input to cause misclassification. Such perturbed inputs, called "adversarial examples", have been shown effective in fooling state-of-the-art systems [8, 9]. Adversarial attack methods have been proposed for scenarios assuming different degrees of knowledge of the target system [10]: from having full knowledge ("white-box" attacks) [8] to being agnostic to the (hyper)parameters of the target model ("black-box" attacks) [11]. The latter usually use another network, commonly referred to as a surrogate, to craft adversarial examples.

Finlayson et al. [12, 13] have recently argued that adversarial attacks pose a disproportionately large threat in the medical domain due to two factors: first, certain parties involved in healthcare systems have very strong financial incentives to adversarially manipulate medical data, including images; second, certain characteristics of medical data and the technological infrastructure around it may allow more effective and less detectable attacks.

Several studies have investigated the adversarial attack vulnerability of DL MedIA systems for classification and segmentation in different imaging modalities, including color fundus (CF) imaging [13–15], chest X-ray [13, 14, 16], dermoscopy [13–15, 17], and brain MRI [17]. In these studies, adversarial attacks were proven effective in both white- and black-box settings. However, some crucial aspects of adversarial attacks on MedIA systems have not been studied yet:
Perturbation degree and perceptibility of attacks: Most studies [13, 16, 17] only used one perturbation degree in their experiments, although this parameter highly affects attack performance. One study [14] analyzed the impact of different degrees of perturbation, but only in a white-box setting. To our knowledge, no studies have explored the effect of the perturbation degree in black-box settings, which are more realistic. Furthermore, existing studies rarely discuss the visual perceptibility of perturbations in adversarial examples, which might compromise the attack's effectiveness in MedIA settings where human input is required.
Pre-training: Pre-training may positively affect the transfer of adversarial attacks between target and surrogate models, since it increases the similarity between them. This could mean that this popular design choice [18] should be reconsidered, as it poses a security risk. Existing studies on adversarial attacks often use target and surrogate models that were pre-trained on the same data, specifically ImageNet [13, 14, 17], but do not study the influence of such pre-training on attack transferability.
Data and model architecture disparity: Although some studies analyzed black-box attack transferability between targets and surrogates not sharing the same network architecture [16, 17], all studies assumed perfect data parity, i.e. surrogate and target models were trained on the exact same subset of the same dataset. This assumption is highly unrealistic when applied to real-world DL MedIA systems, which are most often closed source and use large amounts of private training data [19–21].

In our study, we investigate these aspects of adversarial attacks in three MedIA applications: detection of referable diabetic retinopathy in CF images, classification of pathologies in chest X-ray, and breast cancer metastasis detection in histological lymph node sections. Our findings have implications for the design of cyber-secure DL MedIA systems and for practices for evaluating the adversarial attack vulnerability of these systems in realistic attack scenarios.
Methods

In this study, we used the two adversarial attack methods most commonly and most effectively used in the literature [13–17]: the fast gradient sign method (FGSM) [8] and projected gradient descent (PGD) [9]. In FGSM, the adversarial perturbation is computed as the sign of the gradient of the loss with respect to the input image. This perturbation is subsequently multiplied by a parameter $\epsilon$, to control the perturbation degree, and added to the target image $x$ to create an adversarial example: $x_{adv} = x + \epsilon \, \mathrm{sign}(\nabla_x L(f(x; \theta), y))$, where $L$ represents the loss, $f$ the selected network architecture, $\theta$ the corresponding parameters, and $y$ the image label. PGD is an iterative version of FGSM, in which several steps of computing the perturbation and adding it to the input are performed: $x_{adv}^{(i+1)} = \mathrm{clip}_{x,\epsilon}\{ x_{adv}^{(i)} + \alpha \, \mathrm{sign}(\nabla_x L(f(x_{adv}^{(i)}; \theta), y)) \}$, where $\alpha$ controls the step size and $\epsilon$ is the parameter regulating the maximum amount of perturbation added to every pixel. We applied both methods in the black-box setting, since we consider it to be the most realistic setting for MedIA systems. In this setting, a surrogate model $f'(\cdot; \theta')$ is used to compute the attack, which is then transferred to the target model.

To verify that the target model performance is reduced solely due to the adversarial perturbation, we additionally computed "control" noise. While existing works chose standard noise distributions such as Gaussian for this purpose [17], we chose to compare adversarial perturbations with their spatially shuffled versions, to ensure the same degree of perturbation in adversarial and "control" examples.
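For concreteness, the sketch below shows how the two attacks and the shuffled "control" noise could be implemented. It is not the authors' code: the paper does not specify a framework, so PyTorch is used here as an assumption, and the single-logit binary output, the [-1, 1] intensity range, and the function names are ours.

```python
# Minimal sketch (not the authors' code) of FGSM, PGD and the spatially
# shuffled "control" noise described above, written in PyTorch.
# Assumes images scaled to [-1, 1], float labels, and a surrogate model
# that outputs a single logit per image (binary classification).
import torch
import torch.nn.functional as F


def fgsm(surrogate, x, y, eps):
    """x_adv = x + eps * sign(grad_x L(f(x; theta), y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.binary_cross_entropy_with_logits(surrogate(x).squeeze(1), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + eps * grad.sign()
    return torch.clamp(x_adv, -1.0, 1.0).detach()  # keep a valid intensity range


def pgd(surrogate, x, y, eps, alpha=0.01, steps=20):
    """Iterative FGSM: each step is projected back into the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.binary_cross_entropy_with_logits(surrogate(x_adv).squeeze(1), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # the clip_{x,eps} projection
        x_adv = torch.clamp(x_adv, -1.0, 1.0)
    return x_adv.detach()


def shuffled_control(x, x_adv):
    """Spatially shuffle the adversarial perturbation: same per-pixel magnitudes,
    but the adversarial structure is destroyed (the "control" noise)."""
    delta = (x_adv - x).flatten(start_dim=1)
    perm = torch.randperm(delta.shape[1])
    return torch.clamp(x + delta[:, perm].view_as(x), -1.0, 1.0)
```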
We used two architectures as both target and surrogate models: Inception-v3 [22] and DenseNet-121 [23], as both were previously applied in our selected applications and achieved good performance [1, 3, 24, 25]. All networks were trained using Adam optimization with learning rate decay and a binary cross-entropy loss.

For the dataset used in each application, a development set and a test set were defined. The development set was used for training and validation of the target and surrogate models. We randomly divided all development sets, at patient level, into two non-overlapping, equal-sized parts, d1 and d2, to be able to study the influence of data parity on attack transferability. A third set, d2/2, was created by randomly sampling half of d2 to study the influence of dataset size. The independent test set was used to measure the performance of each model on clean and adversarial examples. The description of each dataset and dataset-specific network parameters are stated below.
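The partitioning procedure is not detailed beyond this description; a minimal sketch of one way to form d1, d2, and d2/2 at patient level is shown below (the DataFrame and its patient_id column are hypothetical placeholders, not part of the paper).

```python
# Sketch (not the authors' code) of the patient-level partitioning of a
# development set into d1, d2 and d2/2.
import numpy as np
import pandas as pd


def split_development_set(dev: pd.DataFrame, seed: int = 0):
    """Return two equal, non-overlapping patient-level halves (d1, d2) and
    d2/2, a random half of the patients in d2."""
    rng = np.random.default_rng(seed)
    patients = dev["patient_id"].unique()
    rng.shuffle(patients)                       # random patient order
    half = len(patients) // 2
    d1 = dev[dev["patient_id"].isin(patients[:half])]
    d2 = dev[dev["patient_id"].isin(patients[half:])]
    # Patients are already shuffled, so the first half of d2's patients is random.
    n_d2 = len(patients) - half
    d2_half = dev[dev["patient_id"].isin(patients[half : half + n_d2 // 2])]
    return d1, d2, d2_half
```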
Ophthalmology
We used the Kaggle dataset for diabetic retinopathy (DR) detection [26], which contains 88,702 color fundus images with manually labeled DR severity. In order to have more images available for development, as proposed by Finlayson et al. [13], we merged the original training (35,126 images) and test sets (53,576 images) and split the images randomly at patient level into development (88%) and test (12%) sets. Pre-processing included extracting the field of view and rescaling to 512 × 512 resolution.

Radiology
We used the ChestX-Ray14 dataset [27], consisting of 112,120 frontal-view chest X-rays annotated with 14 pathology labels. The official data split (80%-20%) was used to define our development and test sets. Pre-processing included downsampling images to 256 × 256 resolution. We used translation and horizontal flipping for data augmentation.
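The text specifies the augmentation types but not their parameters; the torchvision sketch below is one possible configuration, in which the translation range, the 3-channel loading, and the rescaling to [-1, 1] are our assumptions rather than details from the paper.

```python
# Sketch of the chest X-ray pipeline described above (downsampling to 256x256,
# random translation, horizontal flipping); parameter values are assumptions.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),  # small random shifts
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),                                       # scales to [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),         # rescales to [-1, 1]
])
```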
Pathology
We used the PatchCamelyon (PCam) dataset [25], which contains 327,680 patches extracted from histopathology images of lymph node sections, labeled with the presence of metastatic tissue in the patch center. The official data split (90%-10%) was used to define our development and test sets. The top layers of both model architectures were replaced with a global average pooling layer followed by a dense layer with one output and sigmoid activation, to be able to handle the 96 × 96 resolution of the input. As data augmentation, we used flipping and color augmentation.
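The framework and exact training hyperparameters are not stated in the text; the sketch below shows one way to realize the described model head and training configuration for DenseNet-121 using torchvision (the learning rate and the particular decay schedule are assumptions). Note that torchvision's DenseNet already applies global average pooling before its classifier, and the sigmoid is folded into the loss.

```python
# Sketch (framework and hyperparameter choices are ours, not from the paper):
# DenseNet-121 with the top replaced by a single-output head, as described above.
import torch
from torch import nn
from torchvision import models


def build_densenet(pretrained: bool = False) -> nn.Module:
    # pretrained=True corresponds to ImageNet initialization, False to random
    # initialization; both settings are compared in the pre-training experiment.
    model = models.densenet121(weights="IMAGENET1K_V1" if pretrained else None)
    model.classifier = nn.Linear(model.classifier.in_features, 1)  # one output
    return model


model = build_densenet(pretrained=True)
criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on the single logit
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr is an assumption
# One possible form of the "learning rate decay" mentioned above:
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=2)
```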
Perturbation degree and perceptibility of attacks
In the first experiment, we studied the performance of FGSM and PGD attacks under different degrees of perturbation (controlled by ε) and the visual perceptibility of the perturbations. We evaluated the attacks for ε = 0.02, 0.04, and 0.06; these values were applied to images rescaled between -1 and 1 and were selected based on early experiments. For the PGD attacks, we used α = 0.01 and 20 iterations. In this experiment, all models were randomly initialized and trained on the same partition of the development set, d1.
Pre-training
In the second experiment, we measured the attack effectiveness when target and surrogate are both pre-trained on ImageNet, both randomly initialized, or have different initializations (pre-trained or random). For this purpose, we trained four versions of each architecture (two pre-trained and two randomly initialized) to cover all possible target-surrogate combinations in black-box settings, using the same partition of the development set, d1.
Data and model architecture disparity
This experiment focused on the effect of disparity in the data used for the development of the target and surrogate models, as well as its interaction with architecture disparity. Here, we trained four randomly initialized versions of each architecture: a target model trained on d1 and three surrogate models trained on d1, d2, and d2/2, respectively.

In all experimental setups, the performance of the target models on the test set of each dataset was measured using the area under the receiver operating characteristic curve (AUC), or mean AUC for the multi-class case.
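A sketch of how this black-box evaluation could be run is shown below: adversarial examples are crafted on the surrogate and scored on the target, and AUC is computed over the test set. The attack function and the test_loader are assumptions (e.g. the fgsm sketch given earlier), as is the binary, single-logit setup; the multi-label chest X-ray case would instead average per-label AUCs.

```python
# Sketch (not the authors' code) of the black-box transfer evaluation:
# perturbations are crafted on the surrogate and applied to the target,
# whose AUC on clean and adversarial test images is then compared.
# `test_loader` is assumed to yield (image, label) batches in [-1, 1].
import torch
from sklearn.metrics import roc_auc_score


@torch.no_grad()
def predict(model, x):
    # Sigmoid turns the single logit into a probability score.
    return torch.sigmoid(model(x).squeeze(1))


def evaluate_transfer_attack(target, surrogate, test_loader, attack, eps=0.02):
    target.eval()
    surrogate.eval()
    labels, clean_scores, adv_scores = [], [], []
    for x, y in test_loader:
        x_adv = attack(surrogate, x, y.float(), eps)  # crafted on the surrogate only
        labels.append(y)
        clean_scores.append(predict(target, x))
        adv_scores.append(predict(target, x_adv))
    y_true = torch.cat(labels).numpy()
    auc_clean = roc_auc_score(y_true, torch.cat(clean_scores).numpy())
    auc_adv = roc_auc_score(y_true, torch.cat(adv_scores).numpy())
    return auc_clean, auc_adv

# Example usage (fgsm from the earlier sketch):
# auc_clean, auc_adv = evaluate_transfer_attack(target, surrogate, loader, attack=fgsm)
```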
Results and Discussion

The results of our experiments with different attack methods (FGSM and PGD) at different perturbation degrees can be found in Table 1. Higher perturbation degrees led to substantially lower performance of the target models. Experiments with spatially shuffled noise suggest that at higher noise magnitudes part of the performance drop was due to image corruption by the noise, though only to a small extent. FGSM and PGD performed similarly; based on this observation, we chose to use both attacks in our subsequent experiments and report averaged results.

Table 1. Effects of perturbation degree on attack transferability. Average performance (AUC) over two model architectures is shown when using FGSM, PGD, or "control" noise (spatially shuffled adversarial perturbations) with varying perturbation degrees. Rows with noise "-" show performance on clean images.

Data           Noise        FGSM                       PGD
                            ε=0.02   0.04    0.06      ε=0.02   0.04    0.06
Ophthalmology  -            0.86
Ophthalmology  adversarial  0.44     0.32    0.33      0.56     0.37    0.34
Ophthalmology  shuffled     0.85     0.79    0.73      0.85     0.84    0.84
Radiology      -            0.80
Radiology      adversarial  0.62     0.57    0.55      0.64     0.54    0.49
Radiology      shuffled     0.80     0.79    0.77      0.80     0.80    0.79
Pathology      -            0.87
Pathology      adversarial  0.56     0.38    0.33      0.56     0.41    0.36
Pathology      shuffled     0.87     0.87    0.87      0.87     0.87    0.87

Figure 1 shows original images and their adversarial counterparts computed using the FGSM attack at different perturbation degrees. As can be seen, applying the same amount of perturbation to different imaging modalities has a different effect on the perceptibility of the perturbation. In this experiment, we used our own visual perception. For the ophthalmology and pathology datasets, we found the perturbation perceptible at ε = 0.04 or larger. For the radiology dataset, the perturbations were already perceptible at ε = 0.02, albeit quite subtle. These differences in perceptibility could occur because of differences in color, homogeneity, contrast, and resolution between the imaging modalities. Furthermore, the judgement of perceptibility is subjective and depends on the background and goal of the observer. Adversarial attack perceptibility by trained medical experts could be examined in future studies.

Fig. 1. Original and adversarial images created with FGSM using different perturbation magnitudes.

In summary, we found that the perturbation degree significantly affects both the performance and the visual perceptibility of attacks. It is important to study higher perturbation degrees so as not to underestimate the attack vulnerability of the studied system. However, an attack performed using a conspicuous degree of perturbation could be easily discovered by a (trained) human and thus neutralized. Following this logic, for our further experiments, we chose to report attacks using ε = 0.02, as for two out of three applications this was the highest perturbation degree that was still visually subtle.
Table 2 summarizes our experiments on the effect of pre-training on adversarial attack transferability. In the ophthalmology and radiology datasets, the attack transferability between pre-trained models was substantially higher than that between randomly initialized models. This effect was very pronounced in the ophthalmology dataset, in which pre-training also gave the highest performance boost on clean examples. In both datasets, the effect was consistent: for all eight combinations of attack method and target-surrogate pair, pre-trained targets had lower performance when attacked by pre-trained surrogates, compared to their randomly initialized counterparts. In the pathology dataset, however, the opposite effect was observed with similar consistency.

In the ophthalmology and pathology datasets, pre-trained targets were consistently less vulnerable to attacks by randomly initialized surrogates than randomly initialized targets were to attacks by pre-trained surrogates. The opposite, equally consistent effect was observed in the radiology dataset. On average, pre-trained networks were moderately more vulnerable to attacks in the ophthalmology and radiology datasets and slightly less vulnerable in the pathology dataset.

We believe the increased transfer between pre-trained models may occur because their decision boundaries are more similar than those of randomly initialized models. This may hold more strongly for models where pre-training decreases the number of training epochs and/or improves the performance. Overall, we observed that pre-training MedIA networks on ImageNet may dramatically increase the transfer of adversarial examples; the larger the performance gain achieved by pre-training, the larger the transfer. We believe this effect should be considered when designing secure DL MedIA systems for deployment in clinical practice, as well as in future studies on adversarial attacks, since pre-training is an optional (although popular) design choice and introduces a possible vulnerability to attacks.

Table 2. Effects of pre-training on attack transferability. Average performance (AUC) over FGSM and PGD (ε = 0.02) and two model architectures is shown. Average relative performance with respect to the no-attack setting is shown in brackets; "Avg" rows average the two attacked settings per target.

Target    Surrogate  Ophthalmology   Radiology     Pathology
ImageNet  -          0.94 (100%)     0.82 (100%)   0.87 (100%)
ImageNet  ImageNet   0.12 (13%)      0.52 (64%)    0.68 (78%)
ImageNet  Random     0.83 (88%)      0.64 (78%)    0.73 (84%)
          Avg        (51%)           (71%)         (81%)
Random    -          0.86 (100%)     0.80 (100%)   0.87 (100%)
Random    ImageNet   0.67 (79%)      0.73 (91%)    0.65 (75%)
Random    Random     0.50 (58%)      0.63 (79%)    0.56 (65%)
          Avg        (69%)           (85%)         (70%)
The effects of data and model architecture disparity between target and surrogate models can be seen in Table 3. For all datasets, networks were substantially less susceptible to attacks crafted using surrogates with the same architecture but trained on a different data subset (d2 or d2/2). This held for both target architectures and both attack methods. Decreasing the surrogate training set size (from d2 to d2/2) led to a drop in attack performance for the ophthalmology and radiology data. When the architecture of the surrogate was different, however, data disparity between target and surrogate substantially decreased the attack performance only for the ophthalmology data. Disparity in model architecture had a greater effect on attack performance than disparity in data for the radiology and pathology data; for the ophthalmology data, it had an equal or smaller effect, depending on the degree of data disparity.

We believe that, since most MedIA systems are closed source and use private training data, the attack scenario in which data and model parameters of target and surrogate do not (completely) overlap is more realistic than one assuming data and model parity. Our results show that in case of disparity the attacks perform substantially more poorly than in case of parity, which is commonly assumed by existing studies [13, 14, 16, 17]. By the same token, designers of MedIA systems could consider using private rather than public data, keeping model information private, and designing custom systems instead of using standard architectures.

Table 3. Effects of data and model architecture parity on attack transferability. Average performance (AUC) over FGSM and PGD (ε = 0.02) and two model architectures is shown, with surrogate models trained on different sets. Average relative performance with respect to the no-attack setting is shown in brackets.

Architecture  Training set  Ophthalmology  Radiology    Pathology
-             -             0.86 (100%)    0.80 (100%)  0.87 (100%)
Same          d1            0.44 (52%)     0.55 (69%)   0.41 (47%)
Same          d2            0.56 (65%)     0.64 (80%)   0.67 (77%)
Same          d2/2          0.75 (88%)     0.66 (83%)   0.65 (75%)
Different     d1            0.55 (65%)     0.70 (88%)   0.71 (82%)
Different     d2            0.66 (77%)     0.70 (88%)   0.74 (85%)
Different     d2/2          0.80 (93%)     0.72 (90%)   0.71 (81%)

Conclusion

In our experiments, we observed that higher perturbation levels lead to increased success of attacks, but also to increased visual perceptibility, which might compromise their effectiveness in MedIA settings where human input is required. We observed that pre-training MedIA networks on ImageNet may dramatically increase the transfer of adversarial examples; the larger the performance gain achieved by pre-training, the larger the transfer. Lastly, dataset and model architecture disparity between target and surrogate models can substantially decrease the success of attacks. We believe that these factors should be considered in the design of cybersecurity-critical MedIA systems, as well as kept in mind when evaluating the vulnerability of these systems to adversarial attacks.
References
1. Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., et al.: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316(22), 2402–2410 (2016)
2. Ting, D.S.W., Cheung, C.Y.L., Lim, G., Tan, G.S.W., Quang, N.D., Gan, A., et al.: Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 318(22), 2211–2223 (2017)
3. Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., et al.: CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint arXiv:1711.05225 (2017)
4. Bejnordi, B.E., Veta, M., Van Diest, P.J., Van Ginneken, B., Karssemeijer, N., Litjens, G., et al.: Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318(22), 2199–2210 (2017)
5. Bulten, W., Pinckaers, H., van Boven, H., Vink, R., de Bel, T., van Ginneken, B., et al.: Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. The Lancet Oncology, in press (2020)
6. Wetstein, S.C., Onken, A.M., Luffman, C., Baker, G.M., Pyle, M.E., Kensler, K.H., et al.: Deep learning assessment of breast terminal duct lobular unit involution: towards automated prediction of breast cancer risk. arXiv preprint arXiv:1911.00036 (2019)
7. Abràmoff, M.D., Lavin, P.T., Birch, M., Shah, N., Folk, J.C.: Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. npj Digital Medicine 1(1), 1–8 (2018)
8. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
9. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
10. Yuan, X., He, P., Zhu, Q., Li, X.: Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems 30(9), 2805–2824 (2019)
11. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 506–519 (2017)
12. Finlayson, S.G., Bowers, J.D., Ito, J., Zittrain, J.L., Beam, A.L., Kohane, I.S.: Adversarial attacks on medical machine learning. Science 363(6433), 1287–1289 (2019)
13. Finlayson, S.G., Chung, H.W., Kohane, I.S., Beam, A.L.: Adversarial attacks against medical deep learning systems. arXiv preprint arXiv:1804.05296 (2018)
14. Ma, X., Niu, Y., Gu, L., Wang, Y., Zhao, Y., Bailey, J., Lu, F.: Understanding adversarial attacks on deep learning based medical image analysis systems. arXiv preprint arXiv:1907.10456 (2019)
15. Ozbulak, U., Van Messem, A., De Neve, W.: Impact of adversarial examples on deep learning models for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, 300–308. Springer, Cham (2019)
16. Taghanaki, S.A., Das, A., Hamarneh, G.: Vulnerability analysis of chest X-ray image classification against adversarial attacks. In: Understanding and Interpreting Machine Learning in Medical Image Computing Applications, 87–94. Springer, Cham (2018)
17. Paschali, M., Conjeti, S., Navarro, F., Navab, N.: Generalizability vs. robustness: adversarial examples for medical imaging. arXiv preprint arXiv:1804.00504 (2018)
18. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., et al.: A survey on deep learning in medical image analysis. Medical Image Analysis 42, 60–88 (2017)
19. Abràmoff, M.D., Lou, Y., Erginay, A., Clarida, W., Amelon, R., Folk, J.C., Niemeijer, M.: Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Investigative Ophthalmology & Visual Science 57