Adversarial Robustness Study of Convolutional Neural Network for Lumbar Disk Shape Reconstruction from MR images

Abstract

Machine learning technologies using deep neural networks (DNNs), especially convolutional neural networks (CNNs), have made automated, accurate, and fast medical image analysis a reality for many applications, and some DNN-based medical image analysis systems have even been FDA-cleared. Despite this progress, challenges remain in building DNNs that are as reliable as human expert doctors. It is known that DNN classifiers may not be robust to noise: after a small amount of noise is added to an input image, a DNN classifier may misclassify the noisy image (i.e., an in-distribution adversarial sample), even though it classifies the clean image correctly. Another issue is caused by out-of-distribution samples that are not similar to any sample in the training set. Given such a sample as input, the output of a DNN becomes meaningless. In this study, we investigated the in-distribution (IND) and out-of-distribution (OOD) adversarial robustness of a representative CNN for lumbar disk shape reconstruction from spine MR images. To study the relationship between dataset size and robustness to IND adversarial attacks, we used a data augmentation method to create training sets with different levels of shape variation. We used a PGD-based algorithm for IND adversarial attacks and extended it to OOD adversarial attacks to generate OOD adversarial samples for model testing. The results show that IND adversarial training can improve the CNN's robustness to IND adversarial attacks, and that larger training datasets may lead to higher IND robustness. However, defending against OOD adversarial attacks remains a challenge.



Jiasong Chen (a), Linchen Qian (a), Timur Urakov (b), Weiyong Gu (c), Liang Liang* (a)

(a) Dept. of Computer Science, (b) Dept. of Neurological Surgery, and (c) Dept. of Mechanical & Aerospace Engineering, University of Miami, Coral Gables, FL 33146. *Email: [email protected]


Keywords: deep neural network, adversarial robustness, in-distribution, out-of-distribution, lumbar disk image

INTRODUCTION

Deep neural networks (DNNs), especially convolutional neural networks (CNNs), have become the method of choice for medical image analysis, automating the entire analysis process in an end-to-end fashion while being as accurate as human doctors in many applications [1-3]. Several DNN-based medical image analysis systems have been FDA-cleared [4] and will have an impact on human lives. Despite this progress, challenges remain in building DNNs that are as reliable as human expert doctors. From the perspective of technology robustness, two challenges need to be resolved.

The first challenge is DNN robustness to noise. It has been found that DNNs are not robust to a type of noise called adversarial noise [5]. Adversarial noise was first discovered by Szegedy et al. [6].

The second challenge is caused by out-of-distribution (OOD) samples that are not similar to any sample in the training set; given such a sample as input, the output of a DNN becomes meaningless. Methods have been proposed for OOD detection in classification applications, but our recent work shows that none of these methods are effective: they cannot distinguish between in-distribution (IND) samples and OOD samples generated by OOD adversarial attacks [17]. For medical image analysis, reconstruction autoencoders were used for OOD detection with the assumption that the reconstruction error of an OOD sample will be much higher than the reconstruction errors of IND samples [18]. In this study, we will show that this reconstruction-based OOD detection method is ineffective for our application of spine image analysis.

We were interested in lumbar spine image analysis for disk degeneration assessment. Human intervertebral discs [19] undergo a process of profound degeneration as early as the age of 12, and this process may manifest as discogenic low back pain, disc herniation, spinal stenosis, and/or spondylolisthesis, which may require surgical or non-surgical treatments to reduce pain and restore normal functions. Using magnetic resonance (MR) imaging, disc degeneration can be revealed in terms of disk geometry deformation and signal strength degradation.
Currently, diagnosis of disk degeneration from MR images is largely manual, which is time-consuming and labor-intensive, resulting in high cost. Recently, CNNs have been developed for spine image analysis [20, 21] with the hope that machine learning algorithms may eventually serve as an AI doctor that makes a diagnosis automatically. We conducted this robustness study to investigate the feasibility of using CNNs for automated spine disk degeneration assessment from images without human intervention. The key to the medical assessment is to obtain the shapes of the individual disks, from which medical features can be calculated to determine the level of disk degeneration [22]. To reconstruct the shape of a disk, either image segmentation or shape regression can be applied using a CNN. Disk image segmentation classifies each pixel of an image into one of two classes: disk or non-disk region. Disk shape regression obtains the coordinates of the points on the disk boundary (contour) from the image of a disk. To this end, we developed a U-net style CNN for both image segmentation and shape regression, and we investigated the robustness of this CNN with different training strategies.

Figure 1. Examples of IND and OOD adversarial attacks: (a) an example of an IND adversarial attack; (b) an example of an OOD adversarial attack. The "ground-truth" disk boundary/contour is shown in red. Given a clean image as input, the boundary output of the CNN model is shown in green. Given an adversarial image as input, the boundary output of the model is shown in blue. A binary image is a segmentation output of the model.

Before we present our methods and results, we would like to make a distinction between IND adversarial attacks and OOD adversarial attacks, using the examples in Fig. 1. Given a clean image $x$ of a disk as input, a CNN model will output the boundary/contour $y^{(a)}$ of the disk and the binary segmentation map $y^{(b)}$. A noisy image $x_\delta = x + \delta$ is obtained by adding a small amount of noise $\delta$ to $x$ through an IND adversarial attack. Given $x_\delta$ as input, the CNN model will output the boundary/contour $y_\delta^{(a)}$ of the disk and the binary segmentation map $y_\delta^{(b)}$. In Fig. 1(a), the L-infinity (Linf) vector norm of the noise is 0.03, i.e., $\|\delta\|_\infty = 0.03$. The noise generated by an IND adversarial attack is small, but the CNN model changes its outputs significantly: the Dice similarity coefficient between the model-output segmentation and the ground-truth segmentation is reduced from 0.961 to 0.146, and the Dice similarity coefficient between the model-output contour and the ground-truth contour is reduced from 0.949 to 0.660. To compute Dice between two contours, the region enclosed by each contour is found first, and then the Dice between the two enclosed regions is calculated.

For an IND adversarial attack, the goal is to generate an adversarial IND sample $x_\delta$ with a small amount of noise such that $x_\delta$ still looks like $x$ (to human eyes) but the CNN model changes its outputs significantly. For an OOD adversarial attack, the goal is to generate an adversarial OOD sample $x_{out}$ as input such that the corresponding outputs of the CNN model are almost the same as the outputs of the model given the IND sample $x_{in} = x$ as input. In Fig. 1(b), $x_{out}$ is a very noisy CT image of the lung that does not look like $x_{in}$, but the corresponding outputs are almost identical to those of the IND sample $x_{in}$. Since the goals of IND and OOD adversarial attacks are completely different, different strategies are needed to defend against the attacks.
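The contour-to-contour Dice described above (fill the region enclosed by each contour, then compare the two regions) can be sketched in plain Python. This is only a minimal illustration and not the authors' implementation: the helper names `polygon_to_mask` and `dice`, the even-odd fill rule sampled at pixel centers, and the two toy squares are all assumptions made for the example.

```python
def polygon_to_mask(contour, height, width):
    """Fill a closed contour [(x, y), ...] using the even-odd rule at pixel centers.

    Hypothetical helper: a real pipeline would likely use an image library.
    """
    mask = [[0] * width for _ in range(height)]
    n = len(contour)
    for row in range(height):
        py = row + 0.5
        for col in range(width):
            px = col + 0.5
            inside = False
            for i in range(n):
                x0, y0 = contour[i]
                x1, y1 = contour[(i + 1) % n]
                # Does this edge cross the horizontal line through the pixel center?
                if (y0 > py) != (y1 > py):
                    x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
                    if px < x_cross:
                        inside = not inside
            mask[row][col] = int(inside)
    return mask


def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A and B| / (|A| + |B|) of two binary masks."""
    inter = sum(a & b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    total = sum(map(sum, mask_a)) + sum(map(sum, mask_b))
    return 2.0 * inter / total if total else 1.0


# Toy contours (assumed for illustration): two 4x4 squares shifted by 2 pixels.
contour_true = [(0, 0), (4, 0), (4, 4), (0, 4)]
contour_pred = [(2, 0), (6, 0), (6, 4), (2, 4)]
region_true = polygon_to_mask(contour_true, 8, 8)
region_pred = polygon_to_mask(contour_pred, 8, 8)
print(dice(region_pred, region_true))
```

The two toy squares share half of their area, so their Dice is 0.5; identical contours give 1.0.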
To handle IND adversarial attacks, a model should be insensitive to noise in the input. To handle OOD adversarial attacks, a model should be able to distinguish between IND and OOD samples.

METHODS

In this section, we provide the details of the CNN structure and dataset, explain the algorithm and loss functions for IND and OOD adversarial attacks, describe the process for IND adversarial training, and summarize the OOD detection method using a reconstruction autoencoder.

Dataset

The dataset consists of de-identified lumbar spine MR images of 100 patients from the University of Miami medical school. Three human experts manually annotated the boundaries and landmarks of the lumbar disks and vertebrae on the mid-sagittal MR image of each patient, following the protocol in [22]. The best annotation (i.e., ground truth) for each image was obtained through discussion to reach a consensus among the experts. Images and shapes from 80 patients were used for training; the images and shapes from the remaining 20 patients were used for testing. Each lumbar image contains 5 disks. We cropped each image into square regions, each containing a disk at the center. Then, each cropped region was resized to 128×128 pixels. A disk boundary is represented by a 2D contour that has 176 points.

We applied PCA to the shapes (i.e., contours) in the training set to build three statistical shape models (SSMs) [23]: SSM-P3, SSM-P5, and SSM-P10, which contain 3, 5, and 10 principal components and cover 70.5%, 82.2%, and 91.4% of the total variation, respectively. Each SSM was used to generate an augmented training set of 640,000 virtual shapes. The augmented training set from SSM-P10 contains much larger shape variations than the augmented training set from SSM-P3. For each virtual shape $\tilde{s}$, a virtual image $\tilde{x}$ is generated in the following steps: (1) randomly select an image $x$ in the original training set, which is accompanied by the ground-truth shape $s^*$; (2) compute the thin-plate-spline (TPS) transform from $\tilde{s}$ to $s^*$; and (3) apply the TPS transform to image $x$, so that the warped image $\tilde{x}$ is the virtual image of the virtual shape $\tilde{s}$. In this way, we obtained three augmented training sets. An example is shown in Fig. 2.

Figure 2. (left) A real image with the ground-truth disk shape. (middle) A virtual shape. (right) The virtual image.
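As a rough illustration of how a PCA-based SSM relates the number of retained components to covered variation and how a virtual shape can be drawn from it, consider the sketch below. It rests on several assumptions that are not stated in the text: the function names are hypothetical, the eigenvalues and modes are toy values, and the coefficients are sampled from a standard normal distribution scaled by the square root of each eigenvalue (a common SSM convention, not necessarily the sampling scheme used in this study).

```python
import random


def explained_fraction(eigenvalues, k):
    """Fraction of the total variance covered by the k largest PCA eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def sample_virtual_shape(mean_shape, modes, eigenvalues, rng):
    """Draw s = mean + sum_k c_k * sqrt(lambda_k) * mode_k, with c_k ~ N(0, 1).

    Shapes are flattened coordinate vectors (x1, y1, x2, y2, ...); in the
    paper a contour has 176 points, here a toy contour has 3 points.
    The Gaussian sampling of c_k is an assumption made for this sketch.
    """
    shape = list(mean_shape)
    for mode, lam in zip(modes, eigenvalues):
        c = rng.gauss(0.0, 1.0)
        for j, m in enumerate(mode):
            shape[j] += c * (lam ** 0.5) * m
    return shape


# Toy model (all values assumed): 3 points, 2 retained modes out of 4.
all_eigenvalues = [4.0, 2.0, 1.0, 1.0]
mean_shape = [0.0, 0.0, 1.0, 0.0, 0.5, 1.0]
modes = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # moves the first point along x
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # moves the third point along y
]

print(explained_fraction(all_eigenvalues, k=2))
print(sample_virtual_shape(mean_shape, modes, all_eigenvalues[:2], random.Random(0)))
```

Retaining more modes raises the covered fraction, which is the sense in which SSM-P10 produces larger shape variations than SSM-P3.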

CNN Structure

The structure of the U-net style CNN is shown in Fig. 3. The encoder is based on ResNet-18 [24], which contains convolutional layers and residual connections. The decoder has transposed-convolution blocks, and there are concatenative skip connections from the encoder to the decoder. The input to the encoder is a disk image (128×128), and the output from the decoder is a binary segmentation map. The output of the encoder is also connected to a block of fully-connected layers, which outputs the boundary/contour of the disk. LeakyReLU is used as the activation function. GroupNorm [25] is used for normalization in place of BatchNorm [26], which often makes the network unstable for regression.

Figure 3. The structure of the CNN for shape (boundary) regression and image segmentation.

To train the CNN model for regression and segmentation, the loss function $L(x)$ is used:

$$L_{reg}(x) = \|\hat{s} - s\| \tag{1}$$

$$L_{seg}(x) = 0.5\,(1 - Dice(\hat{m}, m)) + 0.5\,CE(\hat{m}, m) \tag{2}$$

$$L(x) = L_{reg}(x) + L_{seg}(x) \tag{3}$$

$L_{reg}(x)$ is the shape regression loss. $Dice$ is the function that computes the Dice similarity coefficient. $CE$ is the function that computes cross-entropy. $L_{seg}(x)$ is the image segmentation loss, consisting of the Dice loss and the CE loss. $s$ is the ground-truth contour (i.e., an array of the coordinates of the 176 points on the contour), and $\hat{s}$ is the model output for the input $x$. $m$ is the ground-truth segmentation map, and $\hat{m}$ is the model output for the input $x$.

The Algorithm for IND and OOD Adversarial Attacks

Algorithm 1: Adversarial Attack (IND or OOD)

Input:
    $x$, a sample (e.g., an image) in a dataset.
    $f(x)$, the neural network for regression or segmentation/classification.
    $J$, the objective function of the attack, which will be maximized.
    $\varepsilon$, the maximum perturbation measured by the Lp vector norm.
    $N$, the total number of iterations.
    $\alpha$, the learning rate of the optimizer (e.g., Adamax).
    $x_{init}$, a sample for algorithm initialization.
Output: an adversarial sample $x_\varepsilon$, s.t. $J(x_\varepsilon) \gg J(x)$ and $\|x_\varepsilon - x\|_p \le \varepsilon$
Process:
1: generate a random noise $\xi$ with $\|\xi\|_p \le \varepsilon$
2: initialize $x_\varepsilon = clip(x_{init} + \xi)$
3: for n from 1 to N:
4:     $x_\varepsilon \leftarrow clip(x_\varepsilon + \alpha \cdot h(J'(x_\varepsilon)))$, where $J'(x) = \partial J / \partial x$
Note: The $clip$ operation ensures that $\|x_\varepsilon - x\|_p \le \varepsilon$. The $clip$ operation also ensures that pixel values stay within the feasible range (e.g., 0 to 1). If the L-inf vector norm is used, $h(J')$ is the sign function; if the L2 vector norm is used, $h(J')$ is a function that normalizes $J'$ by its L2 norm.

The Algorithm for adversarial attacks is described above. Although IND and OOD adversarial attacks have completely different goals, the same algorithm with different objective functions can be used to implement the attacks. The Algorithm is based on projected gradient descent (PGD) [27], which is widely used for robustness evaluation [28] against IND adversarial attacks. We recently applied the Algorithm to OOD adversarial attacks, which are largely underexplored [17]. The goal of the Algorithm is to generate an adversarial sample such that the objective function of the adversarial attack is maximized.

For an IND adversarial attack, the objective function should measure the difference between an output of the model and the corresponding ground truth. In our application, the Algorithm tries to enlarge the error of the shape output from the CNN model by adding a small amount of noise $\delta$ to the clean image $x$, as shown in Fig. 1(a). The magnitude of the noise $\delta$ is constrained (i.e., $\|\delta\|_p \le \varepsilon$) so that the noisy image, $x_\varepsilon = x + \delta$, looks like the clean image $x$. We set $x_{init} = x$. To evaluate the robustness of a CNN model, two objectives, $J_{IND\_reg}$ and $J_{IND\_seg}$, are used in the Algorithm for shape regression and image segmentation, respectively:

$$J_{IND\_reg} = \|\hat{s} - s\| \tag{4}$$

$$J_{IND\_seg} = 1 - Dice(\hat{m}, m) \tag{5}$$

where $s$ is the ground-truth contour and $\hat{s}$ is the model output for the input $x_\varepsilon$; $m$ is the ground-truth segmentation and $\hat{m}$ is the model output for the input $x_\varepsilon$.

For an OOD adversarial attack, as shown in Fig. 1(b), we set $x_{init}$ to be an OOD sample that is not similar to any sample in the dataset. For example, $x_{init}$ could be a CT image, and $x$ is a lumbar disk image. The Algorithm modifies $x_{init}$ to obtain an OOD adversarial sample $x_{out} = x_\varepsilon$ such that the model outputs for the input $x_\varepsilon$ are almost the same as the model outputs for the input $x$. Thus, to evaluate the ability of a CNN model or a CNN-based method to detect OOD samples, the two objectives, $J_{OOD\_reg}$ and $J_{OOD\_seg}$, for OOD adversarial attacks on regression and segmentation are chosen to be:

$$J_{OOD\_reg} = -\|\hat{s} - s\| \tag{6}$$

$$J_{OOD\_seg} = Dice(\hat{m}, m) \tag{7}$$
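The iteration in Algorithm 1 can be illustrated with a small, self-contained sketch for the L-inf case. Everything concrete here is an assumption made for the example: the "network" is a toy linear function rather than the CNN, its gradient is written analytically, the names `attack`, `clip`, `x_ref`, `j_ind`, and `grad_j_ind` are hypothetical, and plain signed steps are used instead of an optimizer such as Adamax.

```python
import random


def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def clip(x_adv, x_ref, eps, lo=0.0, hi=1.0):
    """Keep x_adv within eps of x_ref (L-inf) and within the pixel range [lo, hi]."""
    clipped = []
    for a, r in zip(x_adv, x_ref):
        a = min(max(a, r - eps), r + eps)      # L-inf ball around x_ref
        clipped.append(min(max(a, lo), hi))    # feasible pixel range
    return clipped


def attack(x_ref, x_init, grad_j, eps, alpha, n_iter, seed=0):
    """Maximize J by signed gradient steps, clipping after every step."""
    rng = random.Random(seed)
    # Steps 1-2: start from x_init plus random noise with L-inf norm <= eps.
    x_adv = clip([v + rng.uniform(-eps, eps) for v in x_init], x_ref, eps)
    # Steps 3-4: x_adv <- clip(x_adv + alpha * sign(dJ/dx)).
    for _ in range(n_iter):
        g = grad_j(x_adv)
        x_adv = clip([v + alpha * sign(gi) for v, gi in zip(x_adv, g)], x_ref, eps)
    return x_adv


# Toy stand-in for the CNN (assumed): a linear regressor f(x) = w . x.
W = [0.5, -1.0, 2.0, 0.25]


def model(x):
    return sum(w * v for w, v in zip(W, x))


x_clean = [0.2, 0.8, 0.5, 0.4]
target = model(x_clean)  # the clean output plays the role of the ground truth


def j_ind(x):
    """IND-style objective in the spirit of Eq. (4): output error to be enlarged."""
    return abs(model(x) - target)


def grad_j_ind(x):
    """Analytic gradient of j_ind for the linear toy model."""
    s = sign(model(x) - target)
    return [s * w for w in W]


eps = 0.03
x_adv = attack(x_clean, x_clean, grad_j_ind, eps, alpha=eps / 5, n_iter=20)
print(j_ind(x_clean), j_ind(x_adv))
```

For this linear toy, the attack drives every input coordinate to the edge of the $\varepsilon$-ball, so the output error grows from zero to roughly $\varepsilon \sum_i |w_i|$ while no pixel moves by more than $\varepsilon$. An OOD-style attack would reuse the same loop with a different `x_init` and an objective in the spirit of Eqs. (6) and (7).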

IND Adversarial Training to Defend Against IND Adversarial Attacks

A natural idea to improve robustness against noise is to add noise to the images and train the CNN model on clean and noisy images. Therefore, to defend against IND adversarial attacks, we could add IND adversarial samples to the training set and train the model on clean and noisy/adversarial samples, which is known as adversarial training and is effective for many image classification applications [28, 29]. It is very time-consuming to generate adversarial samples using the Algorithm, and therefore we set $N$, the total number of iterations, to 20. The noise level $\varepsilon$ is set to 0.07 (Linf norm). $\alpha$ is set to . The loss function for adversarial training is:

$$L_{adv\_rs} = 0.5\,L(x) + 0.5\,\big(0.5\,L(x_{\varepsilon(reg)}) + 0.5\,L(x_{\varepsilon(seg)})\big) \tag{8}$$

where $L$ is defined by Eq. (3). $x_{\varepsilon(reg)}$ is an adversarial sample generated by attacking only the regression output, and $x_{\varepsilon(seg)}$ is another adversarial sample generated by attacking only the segmentation output. We also applied two variants of the adversarial training, with the loss functions:

$$L_{adv\_r} = 0.5\,L(x) + 0.5\,L(x_{\varepsilon(reg)}) \tag{9}$$

$$L_{adv\_s} = 0.5\,L(x) + 0.5\,L(x_{\varepsilon(seg)}) \tag{10}$$

Using the loss function $L_{adv\_r}$, the adversarial training aims to improve shape regression robustness. Using the loss function $L_{adv\_s}$, the adversarial training aims to improve image segmentation robustness.

As a weak form of adversarial training, we performed model training with uniform random noise, i.e., adding uniform random noise to the training images ($x_{rand} = x + noise$). We set the maximum noise amplitude to 0.07 (Linf norm). The loss function for model training is:

$$L_{rand} = 0.5\,L(x) + 0.5\,L(x_{rand}) \tag{11}$$

Reconstruction-based OOD Detection to Defend Against OOD Adversarial Attacks

For medical image analysis, reconstruction autoencoders have been used for OOD detection with the assumption that the reconstruction error of an OOD sample will be much higher than the reconstruction errors of IND samples [18]. In our application, the CNN can be used for reconstruction-based OOD detection after a simple modification: adding another output channel in addition to the segmentation output, which outputs the reconstructed version $x_{rec}$ of the input $x$. If the reconstruction error $\|x_{out\_rec} - x_{out}\|$ of an OOD sample $x_{out}$ is larger than the reconstruction errors (i.e., $\|x_{in\_rec} - x_{in}\|$) of many samples in the training set, then $x_{out}$ is identified as OOD. OOD detection is a binary classification task, with the reconstruction error serving as the classification score. By varying the classification threshold over a large range, the area under the receiver operating characteristic curve (AUROC) can be obtained to measure the performance of OOD detection [30]. AUROC is in the range of 0 to 1, and an AUROC of 0.5 corresponds to a random guess; if the AUROC is at or below 0.5, then OOD detection is ineffective. For model training, a reconstruction loss $L_{rec}$ needs to be added (e.g., added to $L$ in Eq. (3)):

$$L_{rec}(x) = \sum_i |x_{rec}[i] - x[i]| \tag{12}$$

Here, $x_{rec}[i]$ is the value of pixel $i$ of the reconstruction output from the model, and $x[i]$ is the value of the corresponding pixel of the input image.

RESULTS

Robustness Against IND Adversarial Attacks

To study model robustness against IND adversarial attacks, we trained 15 models that have the same structure (Fig. 3). For clarity, we gave these models different names. The name prefix of a model is "P3", "P5", or "P10", which means that the data generated from SSM-P3, SSM-P5, or SSM-P10 were used for training the model. "P3_std", "P5_std", and "P10_std" are the models trained with the loss in Eq. (3) on clean data only. "P3_rand", "P5_rand", and "P10_rand" are the models trained with the loss $L_{rand}$. "P3_adv_rs", "P5_adv_rs", and "P10_adv_rs" are the models trained with the loss $L_{adv\_rs}$ (i.e., adversarial training). "P3_adv_r", "P5_adv_r", and "P10_adv_r" are the models trained with the loss $L_{adv\_r}$. "P3_adv_s", "P5_adv_s", and "P10_adv_s" are the models trained with the loss $L_{adv\_s}$. The Adamax optimizer was used with the default parameters. The number of training epochs is 100, and the batch size is 64. For model evaluation, the noise level $\varepsilon$ is varied over a large range (0.01, 0.03, 0.05, 0.07, 0.1, 0.2), $\alpha$ is set to $\varepsilon/5$, and the maximum number of iterations $N$ is set to 100 in the Algorithm.

Given a noise level, the shape regression performance of a model is measured by $\mathrm{Error\_reg}$ and $\mathrm{DICE\_reg}$, defined by

$$\mathrm{Error\_reg} = \frac{1}{K_{test}\, i_{max}} \sum_{k=1}^{K_{test}} \sum_{i=1}^{i_{max}} dist(\hat{s}_k[i], s_k[i]) \tag{13}$$

$$\mathrm{DICE\_reg} = \frac{1}{K_{test}} \sum_{k=1}^{K_{test}} Dice(\hat{s}_k, s_k) \tag{14}$$

$K_{test}$ is the number of samples in the test set. $i_{max}$ is the number of points on a disk boundary, and $dist(\hat{s}_k[i], s_k[i])$ is the Euclidean distance between point $i$ on the $k$-th ground-truth shape $s_k$ and the corresponding point of the shape $\hat{s}_k$ output from the model. The image segmentation performance of a model is measured by $\mathrm{DICE\_seg}$, defined by

$$\mathrm{DICE\_seg} = \frac{1}{K_{test}} \sum_{k=1}^{K_{test}} Dice(\hat{m}_k, m_k) \tag{15}$$

The results are reported in Tables 1 to 3 and visualized in Figs. 4 to 7. Examples are shown in Figs. 8 and 9.

Table 1: Shape regression errors ($\mathrm{Error\_reg}$) of the models under different adversarial noise levels.

| Model \ Noise | 0 | 0.01 | 0.03 | 0.05 | 0.07 | 0.1 | 0.2 |
|---|---|---|---|---|---|---|---|
| P3_std | 2.4913 | 4.0939 | 8.5369 | 10.1408 | 10.4531 | 10.525 | 10.5399 |
| P5_std | 2.0531 | 3.6349 | 10.024 | 15.2504 | 17.3851 | 18.6632 | 19.7239 |
| P10_std | 1.6886 | 3.917 | 12.781 | 21.5228 | 23.0385 | 23.527 | 24.3022 |
| P3_rand | 2.4726 | 3.8805 | 8.5904 | 10.6784 | 11.3548 | 11.652 | 12.6683 |
| P5_rand | 2.0398 | 3.4223 | 10.5357 | 18.1591 | 20.3922 | 21.5689 | 22.2194 |
| P10_rand | 1.684 | 3.6932 | 12.4964 | 22.6029 | 24.4443 | 25.2174 | 25.9216 |
| P3_adv_r | 2.6603 | 2.7958 | 3.1141 | 3.5179 | 4.0906 | 5.1555 | 8.4548 |
| P5_adv_r | 2.3019 | 2.5395 | 3.0886 | 3.7282 | 4.464 | 5.9477 | 9.3167 |
| P10_adv_r | 2.0098 | 2.3218 | 2.9138 | 3.5319 | 4.281 | 5.9074 | 9.5416 |
| P3_adv_s | 2.6959 | 2.9068 | 3.4149 | 4.0121 | 4.7687 | 6.1286 | 9.6913 |
| P5_adv_s | 2.1874 | 2.5595 | 3.4835 | 4.5297 | 5.7151 | 7.5313 | 11.3804 |
| P10_adv_s | 1.9462 | 2.5385 | 3.8151 | 5.046 | 6.2884 | 8.2526 | 18.1757 |
| P3_adv_rs | 2.657 | 2.8121 | 3.1788 | 3.6429 | 4.2769 | 5.4411 | 8.7731 |
| P5_adv_rs | 2.1857 | 2.4476 | 3.0946 | 3.8609 | 4.7454 | 6.2276 | 9.7687 |
| P10_adv_rs | 1.9608 | 2.3597 | 3.1337 | 3.9293 | 4.7708 | 6.2132 | 9.6021 |
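The shape regression error of Eq. (13), which Table 1 reports, is a mean point-to-point Euclidean distance over all test contours. A minimal sketch follows; the function name `error_reg` and the toy contours are assumptions made for illustration, and every contour is taken to have the same number of points, as in the paper (176).

```python
import math


def error_reg(pred_shapes, true_shapes):
    """Mean Euclidean distance between corresponding contour points.

    pred_shapes and true_shapes are lists of contours; each contour is a
    list of (x, y) points of equal length, so the flat average equals
    1 / (K_test * i_max) times the double sum in Eq. (13).
    """
    total = 0.0
    count = 0
    for pred, true in zip(pred_shapes, true_shapes):
        for p, t in zip(pred, true):
            total += math.dist(p, t)  # Euclidean distance (Python 3.8+)
            count += 1
    return total / count


# Toy data (assumed): one test contour with two points.
true_shapes = [[(0.0, 0.0), (0.0, 0.0)]]
pred_shapes = [[(0.0, 0.0), (3.0, 4.0)]]
print(error_reg(pred_shapes, true_shapes))
```

In this toy case one point matches exactly and the other is 5 pixels away, so the mean error is 2.5.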

Table 2: Shape regression accuracy ($\mathrm{DICE\_reg}$) of the models under different adversarial noise levels.

| Model \ Noise | 0 | 0.01 | 0.03 | 0.05 | 0.07 | 0.1 | 0.2 |
|---|---|---|---|---|---|---|---|
| P3_std | 0.8772 | 0.8487 | 0.7063 | 0.6429 | 0.6235 | 0.6256 | 0.6216 |
| P5_std | 0.9064 | 0.8651 | 0.593 | 0.2895 | 0.1853 | 0.1377 | 0.1106 |
| P10_std | 0.9315 | 0.8906 | 0.5445 | 0.1888 | 0.1366 | 0.1224 | 0.105 |
| P3_rand | 0.8776 | 0.8522 | 0.7169 | 0.6266 | 0.5832 | 0.5638 | 0.5024 |
| P5_rand | 0.9063 | 0.8699 | 0.5943 | 0.2065 | 0.1282 | 0.1023 | 0.0917 |
| P10_rand | 0.9319 | 0.8959 | 0.5834 | 0.1897 | 0.1375 | 0.1154 | 0.0965 |
| P3_adv_r | 0.8705 | 0.8663 | 0.8573 | 0.8455 | 0.8287 | 0.7991 | 0.6977 |
| P5_adv_r | 0.8986 | 0.8885 | 0.8656 | 0.8403 | 0.8093 | 0.7407 | 0.6248 |
| P10_adv_r | 0.9186 | 0.9076 | 0.8857 | 0.8602 | 0.8311 | 0.7518 | 0.5721 |
| P3_adv_s | 0.8733 | 0.8685 | 0.8594 | 0.85 | 0.8348 | 0.8016 | 0.6727 |
| P5_adv_s | 0.906 | 0.8954 | 0.8698 | 0.8422 | 0.8124 | 0.7523 | 0.5795 |
| P10_adv_s | 0.9247 | 0.9142 | 0.8912 | 0.8643 | 0.8307 | 0.7504 | 0.3013 |
| P3_adv_rs | 0.8714 | 0.8671 | 0.8586 | 0.846 | 0.8273 | 0.7951 | 0.6916 |
| P5_adv_rs | 0.9059 | 0.8965 | 0.8708 | 0.8419 | 0.8064 | 0.74 | 0.6057 |
| P10_adv_rs | 0.9224 | 0.9109 | 0.8856 | 0.8579 | 0.8256 | 0.765 | 0.6131 |

Table 3: Image segmentation accuracy ($\mathrm{DICE\_seg}$) of the models under different adversarial noise levels.

| Model \ Noise | 0 | 0.01 | 0.03 | 0.05 | 0.07 | 0.1 | 0.2 |
|---|---|---|---|---|---|---|---|
| P3_std | 0.8822 | 0.7846 | 0.177 | 0.0268 | 0 | 0 | 0 |
| P5_std | 0.9161 | 0.8328 | 0.297 | 0.0537 | 0.0108 | 0 | 0 |
| P10_std | 0.9446 | 0.847 | 0.2907 | 0.0468 | 0.0013 | 0 | 0 |
| P3_rand | 0.8812 | 0.7951 | 0.2027 | 0.0343 | 0 | 0 | 0 |
| P5_rand | 0.914 | 0.8444 | 0.3888 | 0.0795 | 0.0206 | 0 | 0 |
| P10_rand | 0.9441 | 0.8579 | 0.3196 | 0.0498 | 0.007 | 0 | 0 |
| P3_adv_r | 0.8863 | 0.8598 | 0.7866 | 0.6532 | 0.4601 | 0.129 | 0 |
| P5_adv_r | 0.9009 | 0.8247 | 0.6385 | 0.4216 | 0.1869 | 0.0396 | 0 |
| P10_adv_r | 0.9228 | 0.8245 | 0.5922 | 0.3451 | 0.1838 | 0.0569 | 0 |
| P3_adv_s | 0.8755 | 0.8698 | 0.8574 | 0.8416 | 0.8142 | 0.7264 | 0.1144 |
| P5_adv_s | 0.9119 | 0.9001 | 0.872 | 0.8367 | 0.7929 | 0.6611 | 0.137 |
| P10_adv_s | 0.9321 | 0.9152 | 0.8827 | 0.8473 | 0.8007 | 0.6607 | 0.1936 |
| P3_adv_rs | 0.8742 | 0.8685 | 0.856 | 0.8387 | 0.8069 | 0.7133 | 0.1178 |
| P5_adv_rs | 0.9106 | 0.8958 | 0.8657 | 0.8263 | 0.7751 | 0.614 | 0.0701 |
| P10_adv_rs | 0.9266 | 0.9097 | 0.8755 | 0.833 | 0.7827 | 0.6378 | 0.0538 |

Figure 4. (a) Shape regression $\mathrm{Error\_reg}$. (b) Shape regression $\mathrm{DICE\_reg}$. (c) Image segmentation $\mathrm{DICE\_seg}$. The performance of the models trained only on clean data dropped quickly (DICE < 0.5) even when the noise was very small (0.03). The robustness did not improve by training on noisy images with uniform random noise.

Figure 5. (a) Shape regression $\mathrm{Error\_reg}$. (b) Shape regression $\mathrm{DICE\_reg}$. (c) Image segmentation $\mathrm{DICE\_seg}$. The shape regression and image segmentation robustness were significantly improved by IND adversarial training with the loss $L_{adv\_rs}$. When noise is small, the model (P10_adv_rs) trained on a larger dataset performed better than the model (P3_adv_rs) trained on a smaller dataset. By comparing P10_adv_rs and P10_std, it can be seen that the adversarial training had a side effect that model performance on clean data (noise level is 0) may drop.

Figure 6. (a) Shape regression $\mathrm{Error\_reg}$. (b) Shape regression $\mathrm{DICE\_reg}$. (c) Image segmentation $\mathrm{DICE\_seg}$. The shape regression robustness was significantly improved by IND adversarial training with the loss $L_{adv\_r}$, and the image segmentation robustness was improved only slightly by the adversarial training. When noise is small, the model (P10_adv_r) trained on a larger dataset performed better than the model (P3_adv_r) trained on a smaller dataset. By comparing P10_adv_r and P10_std, it can be seen that the adversarial training had a side effect that model performance on clean data (noise level is 0) dropped.

Figure 7. (a) Shape regression $\mathrm{Error\_reg}$. (b) Shape regression $\mathrm{DICE\_reg}$. (c) Image segmentation $\mathrm{DICE\_seg}$. The image segmentation robustness was significantly improved by IND adversarial training with the loss $L_{adv\_s}$. The shape regression robustness was also significantly improved by the adversarial training. When noise is small, the model (P10_adv_s) trained on a larger dataset performed better than the model (P3_adv_s) trained on a smaller dataset. By comparing P10_adv_s and P10_std, it can be seen that the adversarial training led to a decrease of model performance on clean data.

Figure 8. Shape regression outputs of three models under the IND adversarial attack with the loss $J_{IND\_reg}$. The ground-truth shape is shown in red color. Each output shape contour is shown in green color.

Figure 9. Image segmentation outputs of three models under the IND adversarial attack with the loss 𝐽 𝐼𝑁𝐷_𝑠𝑒𝑔 . The boundaries of the regions in the output segmentation map are traced and visualized in green color. The ground-truth shape is shown in red color.

Robustness Against OOD Adversarial Attacks

We first studied how the CNN model trained with the standard loss (i.e., P10_std) would behave under OOD adversarial attacks. In the Algorithm, we set $x_{init}$ to be a CT image of the lung, taken from an open CT dataset [31]. An example of the OOD adversarial attack on the CNN model is shown in Fig. 1(b). During the OOD adversarial attack with $J_{OOD\_reg}$ or $J_{OOD\_seg}$, a corresponding adversarial sample $x_{out}$ was generated for each sample $x$ in the test set, and the model output for $x$ was compared with the model output for $x_{out}$ using DICE. The two resulting histograms of DICE scores are shown in Fig. 10; they indicate that the CNN model was unable to "see" the difference between the IND samples in the test set and the OOD samples (noisy CT images generated by the OOD adversarial attack).

Figure 10. The results of the OOD adversarial attack on the CNN model on the test set.

We then evaluated the reconstruction-based OOD detection method described in the Methods section. The data from SSM-P10 were used for model training with different loss functions. The histograms of reconstruction errors are shown in Fig. 11, which indicate that the method is unable to distinguish between IND and OOD samples, and its AUROC is less than 0.5.

Figure 11. Histograms of reconstruction errors show that the reconstruction-based OOD detection method is ineffective: (a) using the CNN model trained with the loss $L + L_{rec}$; (b) using the CNN model trained with the loss $L_{adv\_rs} + L_{rec}$.
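The AUROC used to score reconstruction-based OOD detection can be computed directly from two sets of reconstruction errors, without sweeping thresholds explicitly: it equals the probability that a randomly chosen OOD sample has a higher score than a randomly chosen IND sample, counting ties as one half. The sketch below assumes this pairwise formulation; the function name `auroc` and the toy error values are hypothetical and are not results from this study.

```python
def auroc(ind_scores, ood_scores):
    """AUROC for detecting OOD samples, with a higher score meaning 'more OOD'.

    Computed as the fraction of (OOD, IND) pairs in which the OOD score is
    larger, with ties counted as 0.5. This pairwise form is equivalent to
    the area under the ROC curve obtained by varying the threshold.
    """
    wins = 0.0
    for o in ood_scores:
        for i in ind_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ind_scores) * len(ood_scores))


# Toy reconstruction errors (assumed values for illustration only).
ind_errors = [0.10, 0.20, 0.30]
ood_errors_easy = [0.40, 0.50]        # clearly larger errors than IND
ood_errors_adversarial = [0.05, 0.25] # errors as small as IND errors

print(auroc(ind_errors, ood_errors_easy))
print(auroc(ind_errors, ood_errors_adversarial))
```

A value near 1 means that OOD samples are reliably separated by their reconstruction error. A value below 0.5, as reported above, means that the adversarial OOD samples tend to have smaller reconstruction errors than the IND samples, so thresholding the error cannot flag them.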

By examining the examples from the OOD adversarial attacks, as shown in Fig. 12, we observed a very interesting phenomenon: each adversarial sample from the model trained with the loss $L + L_{rec}$ is equal to the original clean OOD sample (a CT image of the lung) plus some random noise, whereas each adversarial sample from the model trained with the loss $L_{adv\_rs} + L_{rec}$ is the combination of the original clean OOD sample, random noise, and an "object" whose shape is similar to a disk shape. This observation suggests that the model trained by IND adversarial training may have learned robust image features of disk shapes, compared to the model trained with the standard loss.

Figure 12. Examples from the OOD adversarial attacks on the two models trained with two different loss functions: (a) using the CNN model trained with the loss $L + L_{rec}$; (b) using the CNN model trained with the loss $L_{adv\_rs} + L_{rec}$.

DISCUSSION

The models trained with the standard loss (P3_std, P5_std, and P10_std) were vulnerable to IND adversarial attacks, and their performance for shape regression and image segmentation dropped significantly on noisy data (i.e., IND adversarial samples) even when the noise was imperceptible (Fig. 1(a), Fig. 8, and Fig. 9). Training the models with random noise (P3_rand, P5_rand, and P10_rand) did not improve robustness to IND adversarial attacks (Fig. 4). IND adversarial training significantly improved the robustness of the models (P3_adv_rs, P5_adv_rs, and P10_adv_rs) for regression and segmentation (Fig. 5). By focusing on the segmentation output with the loss $L_{adv\_s}$, not only segmentation robustness but also regression robustness was significantly improved (Fig. 7); however, by focusing on the regression output with the loss $L_{adv\_r}$, only regression robustness was significantly improved (Fig. 6). Under IND adversarial training, the models trained on a "larger" training set (from SSM-P10) had better robustness than those trained on a "smaller" training set (from SSM-P3) when the noise level was small.

OOD adversarial attacks are underexplored and have a significantly different goal from IND adversarial attacks. The results (Fig. 1(b) and Fig. 10) show that the model trained with the standard loss (P10_std) cannot detect any difference in image patterns between the disk images (IND samples) and the noisy CT images (OOD samples). The reconstruction-based OOD detection method can be easily fooled by the OOD adversarial attack, as shown in Fig. 11. Combined with our recent work on OOD adversarial attacks [17], we can safely conclude that the OOD detection methods we have evaluated cannot defend against OOD adversarial attacks. Although IND adversarial training can be very effective, OOD adversarial training is computationally infeasible because the space of OOD samples is significantly larger than the space of IND samples. An OOD sample could be an X-ray image of the chest, a gray-scale image of a human face, or even a random noise image, as long as it does not look like the samples in the training set.

The boundary between IND and OOD adversarial attacks could be blurry when the noise level is high. As shown in Figs. 8 and 9, when the noise level $\varepsilon$ is higher than 0.1, the noisy images look very different from the real images and should be treated as OOD samples, which is why we set $\varepsilon$ to 0.07 for IND adversarial training.

CONCLUSION

We conducted a robustness study of a U-net style CNN for lumbar spine disk shape reconstruction from MR images, and the results imply that current neural network technologies are not robust enough for automated spine disk analysis from images without human intervention. Although IND adversarial training can be used to improve robustness against small input noise from IND adversarial attacks, defending against OOD adversarial attacks remains a challenge, which means that an AI doctor at a human-expert level remains a distant dream. Therefore, current neural network-based medical systems should serve only as assistants to human doctors. Given that humans can easily detect the OOD samples, there must be some fundamental differences between the human brain and current artificial neural networks, which are worth further investigation.

SOURCE CODE

The source code of this study is publicly available at https://github.com/jiasongchen/SPIE-2021

ACKNOWLEDGEMENT

This work was supported in part by an Amazon AWS Machine Learning Research Award.

REFERENCES

[1] F. Shi et al., "Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19," IEEE Reviews in Biomedical Engineering, pp. 1-1, 2020, doi: 10.1109/RBME.2020.2987975.
[2] D. Shen, G. Wu, and H.-I. Suk, "Deep Learning in Medical Image Analysis," Annual Review of Biomedical Engineering, vol. 19, no. 1, pp. 221-248, 2017, doi: 10.1146/annurev-bioeng-071516-044442.
[3] A. Esteva et al., "Deep learning-enabled medical computer vision," npj Digital Medicine, vol. 4, no. 1, p. 5, 2021, doi: 10.1038/s41746-020-00376-2.
[4] S. Benjamens, P. Dhunnoo, and B. Meskó, "The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database," npj Digital Medicine, vol. 3, no. 1, p. 118, 2020, doi: 10.1038/s41746-020-00324-0.
[5] N. Akhtar and A. Mian, "Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey," IEEE Access, vol. 6, pp. 14410-14430, 2018, doi: 10.1109/ACCESS.2018.2807385.
[6] C. Szegedy et al., "Intriguing properties of neural networks," arXiv:1312.6199.
[7] ... arXiv:1412.6572.
[8] ... Commun. ACM, vol. 61, no. 7, pp. 56-66, 2018, doi: 10.1145/3134599.
[9] A. Graese, A. Rozsa, and T. E. Boult, "Assessing Threat of Adversarial Examples on Deep Neural Networks," in ..., 18-20 Dec. 2016, pp. 69-74, doi: 10.1109/ICMLA.2016.0020.
[10] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, "Universal Adversarial Perturbations," presented at the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[11] V. Mirjalili and A. Ross, "Soft biometric privacy: Retaining biometric utility of face images while perturbing gender," in ..., 1-4 Oct. 2017, pp. 564-573, doi: 10.1109/BTAS.2017.8272743.
[12] S. G. Finlayson, H. W. Chung, I. S. Kohane, and A. L. Beam, "Adversarial Attacks Against Medical Deep Learning Systems," arXiv:1804.05296.
[13] ... et al., "Robust Physical-World Attacks on Deep Learning Models," Conference on Computer Vision and Pattern Recognition, ...
[14] ... et al., "Adversarial Attacks and Defences Competition," The NIPS '17 Competition: Building Intelligent Systems, pp. 195-231, 2017.
[15] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvari, "Learning with a Strong Adversary," arXiv:1511.03034.
[16] ... arXiv:1611.01236.
[17] L. Liang, L. Ma, L. Qian, and J. Chen, "An Algorithm for Out-Of-Distribution Attack to Neural Network Encoder," arXiv:2009.08016.
[18] ... arXiv:1901.11210.
[19] ... The Intervertebral Disc: Molecular and Structural Studies of the Disc in Health and Disease. Springer, 2014.
[20] F. Galbusera, G. Casaroli, and T. Bassani, "Artificial intelligence and machine learning in spine research," JOR SPINE, vol. 2, no. 1, p. e1044, 2019, doi: 10.1002/jsp2.1044.
[21] J. Huang et al., "Spine Explorer: a deep learning based fully automated program for efficient and reliable quantifications of the vertebrae and discs on sagittal lumbar spine MR images," The Spine Journal, vol. 20, no. 4, pp. 590-599, 2020.
[22] X. Hu, M. Chen, J. Pan, L. Liang, and Y. Wang, "Is it appropriate to measure age-related lumbar disc degeneration on the mid-sagittal MR image? A quantitative image study," European Spine Journal, vol. 27, no. 5, pp. 1073-1081, 2018, doi: 10.1007/s00586-017-5357-3.
[23] L. Liang, M. Liu, C. Martin, J. A. Elefteriades, and W. Sun, "A machine learning approach to investigate the relationship between shape features and numerically predicted risk of ascending aortic aneurysm," Biomechanics and Modeling in Mechanobiology, vol. 16, no. 5, pp. 1519-1533, 2017.
[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," arXiv:1512.03385.
[25] ... arXiv:1803.08494.
[26] ... arXiv:1502.03167.
[27] ... arXiv:1706.06083.
[28] ... arXiv:2002.08347, February 01, 2020.
[29] J. Uesato, B. O'Donoghue, A. van den Oord, and P. Kohli, "Adversarial Risk and the Dangers of Evaluating Against Weak Attacks," arXiv:1802.05666, February 01, 2018.
[30] D. Hendrycks and K. Gimpel, "A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks," International Conference on Learning Representations, ...
[31] ... medRxiv, p. 2020.04.24.20078584, 2020, doi: 10.1101/2020.04.24.20078584.