Generative Adversarial U-Net for Domain-free Medical Image Augmentation
Xiaocong Chen, Yun Li, Lina Yao, Member, IEEE, Ehsan Adeli, Member, IEEE, and Yu Zhang, Senior Member, IEEE
Abstract — The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing. Without a sufficient number of training samples, deep learning based models are very likely to suffer from the over-fitting problem. The common solution is image manipulation such as image rotation, cropping, or resizing. Those methods can help relieve the over-fitting problem as more training samples are introduced. However, they do not really introduce new images with additional information, and may lead to data leakage as the test set may contain samples similar to those in the training set. To address this challenge, we propose to generate diverse images with a generative adversarial network. In this paper, we develop a novel generative method named generative adversarial U-Net, which utilizes both a generative adversarial network and U-Net. Different from existing approaches, our newly designed model is domain-free and generalizable to various medical images. Extensive experiments are conducted over eight diverse datasets including computed tomography (CT) scans, pathology, X-ray, etc. The visualization and quantitative results demonstrate the efficacy and good generalization of the proposed method for generating a wide array of high-quality medical images.
Index Terms — Generative Adversarial Network, U-Net, Data Augmentation, Medical Image Analysis
I. INTRODUCTION
In the recent decade, deep learning has attracted increasing research interest in the studies of medical imaging computing and its applications. One of the biggest challenges in applying deep learning to the medical imaging domain is to learn generalizable feature patterns from small datasets or a limited number of annotated samples. Deep learning based methods require a large amount of annotated training samples to support the inference, which is hard to fulfill in medical imaging analysis [1]–[3]. In medical imaging tasks, annotations are conducted by radiologists with expert knowledge about the data and the related tasks. Benefiting from increasingly released medical datasets and grand challenges, the shortage of datasets has been relieved to some extent.
Xiaocong Chen, Yun Li, and Lina Yao are with the School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, 2052, Australia (e-mail: {xiaocong.chen, yun.li5, lina.yao}@unsw.edu.au). Ehsan Adeli is with the Department of Psychiatry and Behavioral Sciences, Stanford University, Palo Alto, CA 94305 USA (e-mail: [email protected]). Yu Zhang is with the Department of Bioengineering, Lehigh University, Bethlehem, PA 18015, USA (e-mail: [email protected]).

However, those datasets are still limited in size, as they inevitably require laborious work from radiologists [4].

To overcome this problem, data augmentation has been popularly utilized. The most commonly used data augmentation strategy is dataset manipulation, including various simple modifications of the data such as translation, rotation, flip, crop, and scale [5], [6]. These data augmentation methods have been widely applied to enrich the training set, thereby improving model performance in various computer vision tasks [7], [8]. Image modification can introduce some pixel-level side information to improve the performance. However, pixel-level modification cannot introduce new images but only variants of the original ones, and hence is still likely to suffer from the over-fitting problem. Instead, synthetic data augmentation is considered a more reasonable alternative, as it can generate sophisticated types of data based on the original images. The generative adversarial network (GAN) is a representative synthetic data augmentation method [9], capable of providing more variability to enrich the dataset.

Inspired by game theory, a GAN aims to achieve the Nash equilibrium [9] inside the model. A GAN consists of two major networks which are trained jointly under an adversarial setting, where one network generates fake images based on inputs and the other network distinguishes the generated images from the real images. GANs have been increasingly applied to image synthesis [9], [10], such as denoising [11], image translation [12], etc. In addition, multiple variants of GANs were proposed to generate high-quality realistic natural images [13], visualize stories [14], and synthesize high-resolution images from low-resolution ones [15].

Recently, several studies on medical imaging adopted GANs and their variants as the main framework [4], [16], [17]. Most studies have employed the GAN technique for generating images or medical cross-modality translations. Zhang et al. [16] applied GAN to reduce the intrinsic noises in a multi-sourced dataset, as different devices generate different types of noise and such site effects significantly affect the data distribution. Zhang et al. [18] proposed SkrGAN, which incorporates global contextual information such as fine foreground structures to improve the image quality. Xue et al. [19] used two GANs to learn the relationship between brain MRI images and a brain tumor segmentation map. Frid-Adar et al. [4] utilized GAN to generate liver lesion images to improve CNN classification performance.
Fig. 1: Structure of the proposed model. Given an arbitrary class c_t ∈ C, our model can generate the corresponding images based on a sampled image x_j. Two different samples are drawn from the given class c_t each time to support the discriminator. x_j is encoded into a latent representation and concatenated with a Gaussian variable as the final representation used for generation. The generated image x_g is fed into the discriminator together with the real data x_i, x_j. The discriminator is designed to distinguish two distributions, i.e., the true distribution {x_i, x_j} and the fake distribution {x_i, x_g}. The generator aims to make those two distributions as similar as possible.

Besides, GANs have also been successfully applied to segmentation. Dong et al. [20] adopted GAN to perform neural architecture search to find the best segmentation network for chest organs. Khosravan et al. [21] introduced a projection module into GAN to boost the performance of a lung segmentor. However, most of the existing studies have been focusing on a specific task or domain, and there is no robust method that is generalizable across various domains. In this study, we aim to design a domain-free GAN structure which is suitable for any domain instead of a specific one. Specifically, the proposed method can be used in any domain such as X-ray, CT scan, pathology, etc. In addition, vanilla GANs suffer from the training instability problem, which makes the model hard to converge [22]. We employ the Wasserstein GAN as the main framework in our model, as it has shown higher training stability. U-Net is a well-known structure in medical imaging analysis, especially in segmentation [5], [23]. Segmentation aims to find pixel-level abnormalities, which requires strong feature extraction capability. The generator in GANs requires a similar capability; hence, we utilize U-Net as the generator in our study. U-Net is similar to an auto-encoder, which can learn the latent representation and reconstruct the output with the same size as the input. In order to fulfill the generation requirement, we concatenate a Gaussian variable into the latent representation to ensure that it will not generate the same image each time. The contributions of this paper can be summarized as follows:
• We propose a new variant of GAN named generative adversarial U-Net for domain-free medical image augmentation. Images generated by the proposed method have better quality than those from vanilla GANs and their well-known variant, conditional GANs.
• To leverage its superior feature extraction capability, we first disassemble U-Net into an encoder and a generator. Then, we assemble the generator into the GAN structure to generate images.
• Extensive experiments are conducted on eight different datasets in different domains including CT scan, pathology, chest X-ray, dermatoscope, ultrasound, and optical coherence tomography. Experimental results demonstrate high generalizability and robustness across various data domains.
II. METHODOLOGY
In this section, the proposed generative adversarial U-Net will be briefly introduced. We will start by describing the overall structure of the developed deep learning model, followed by explaining the components, including the residual U-Net generator, the discriminator, and the training strategy. The overall flowchart of our developed method is illustrated in Fig. 1.
A. Overview
U-Net was first proposed in [24] and has been widely used in medical image segmentation [5]. It is a type of artificial neural network using the auto-encoder structure with skip-connections. The encoder is designed to extract features from the given images, and the decoder constructs the segmentation map using those extracted spatial features. The encoder follows a structure similar to fully convolutional networks (FCN) [25] with stacked convolutional layers. To be specific, the encoder consists of a sequence of blocks for down-sampling operations, with each block including several convolution layers followed by max-pooling layers. The number of filters in the convolutional layers is doubled
after each down-sampling operation. In the end, the encoder outputs a learned feature map for the input image.

Differently, the decoder is designed for up-sampling and constructing the image segmentation. The decoder first utilizes a deconvolutional layer to up-sample the feature map generated by the encoder. The deconvolutional layer contains the transposed convolution operation and halves the number of filters in the output. It is followed by a sequence of up-sampling blocks, each consisting of two convolution layers and a deconvolutional layer. Then, another convolutional layer is used as the final layer to generate the segmentation result. The final layer adopts the Sigmoid function as the activation function, while all other layers use the ReLU function.

In addition, U-Net concatenates parts of the encoder features with the decoder, which is known as the skip-connection in ResNet [26]. For each block in the encoder, the result of the convolution before the max-pooling is transferred to the decoder symmetrically. In the decoder, each block receives the feature representation learned from the encoder and concatenates it with the output of the deconvolutional layer. The concatenated result is then forward propagated to the consecutive block. This concatenation operation helps the decoder capture the features possibly lost by the max-pooling [23].
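For concreteness, the following is a minimal Keras sketch of this encoder–decoder layout with two down-sampling levels. The input shape, layer widths, and kernel sizes are illustrative assumptions, not the exact configuration used in this paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(input_shape=(128, 128, 1), base_filters=32):
    """Minimal U-Net: conv blocks with max-pooling, filter doubling,
    transposed convolutions, and symmetric skip-connections."""
    inp = tf.keras.Input(shape=input_shape)

    # Encoder: convolutions followed by max-pooling; filters double per level.
    e1 = layers.Conv2D(base_filters, 3, padding="same", activation="relu")(inp)
    p1 = layers.MaxPooling2D(2)(e1)
    e2 = layers.Conv2D(base_filters * 2, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D(2)(e2)

    # Bottleneck: the learned feature map output by the encoder.
    b = layers.Conv2D(base_filters * 4, 3, padding="same", activation="relu")(p2)

    # Decoder: transposed convolutions halve the filters; pre-pooling encoder
    # features are concatenated symmetrically (the skip-connections).
    u2 = layers.Conv2DTranspose(base_filters * 2, 2, strides=2, padding="same")(b)
    d2 = layers.Conv2D(base_filters * 2, 3, padding="same", activation="relu")(
        layers.Concatenate()([u2, e2]))
    u1 = layers.Conv2DTranspose(base_filters, 2, strides=2, padding="same")(d2)
    d1 = layers.Conv2D(base_filters, 3, padding="same", activation="relu")(
        layers.Concatenate()([u1, e1]))

    # Final layer with Sigmoid activation, as in the segmentation setting above.
    out = layers.Conv2D(1, 1, activation="sigmoid")(d1)
    return tf.keras.Model(inp, out)
```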
B. U-Net Based Generative Model
As mentioned previously, U-Net demonstrates state-of-the-art performance on the medical image segmentation task, showing its superiority in medical image feature extraction. Hence, we utilize U-Net as the main structure for the proposed generative model. GANs G(z) are generative models that aim to learn a mapping from a random noise vector z to the output image x_g, G: z → x_g [9], where

z ∼ N(0, I)    (1)

However, the images generated by GANs are randomized, which makes it hard to define the label. Hence, we use conditional GANs [13] instead. Conditional GANs learn a mapping from a random noise vector z and observed images x_i for class c_t to the output image x_g, G_c: (z, x_i) → x_g. The generator G is trained to generate images that cannot be distinguished from the "real" images by an adversarially trained discriminator D, while the discriminator D is trained to detect the fake images produced by the generator. Alternatively, GANs can be viewed as measuring the distribution discrepancy between the generated data and the real data. The objective function of the conditional GANs can be expressed as:

L_c(G_c, D) = E_{x_i, x_g}[log D(x_i, x_g)] + E_{x_i, z}[log(1 − D(x_i, G_c(x_i, z)))]    (2)

where G_c tries to minimize the objective function against D, which tries to maximize it. Mathematically, it is formulated as follows:

G = arg min_{G_c} max_D L_c(G_c, D)    (3)

where G is the resulting generator when Eq. (3) reaches the Nash equilibrium. However, conditional GANs have limitations similar to traditional GANs, suffering from training instability and mode collapse problems. Hence, we use the Wasserstein GAN [22] as the main structure for our generative model. To be specific, a normal GAN minimizes the JS-divergence as shown in Eq. (2), whereas the objective function for the Wasserstein GAN is:

L_w(G_w, D_w) = E_{x_i, z}[D_w(x_i, G_w(x_i, z))] − E_{x_i, x_g}[D_w(x_i, x_g)]    (4)

Furthermore, a gradient penalty is introduced [10] to enforce the Lipschitz constraint for the Wasserstein GAN:

λ E_{x̂ ∼ P_x̂}[(‖∇_x̂ D(x̂)‖_2 − 1)^2]    (5)

where P_x̂ samples uniformly along straight lines between pairs of points sampled from the data distribution and the generator distribution. That is, x̂ is a combination of an original image and a generated one, mixed with a control factor ε sampled from a uniform distribution U(0, 1).

A generative adversarial network can be used to conduct data augmentation. Given a certain class c_t and a corresponding data point x, we are able to learn a representation of the input image r_x through the encoder such that r_x = g(x), where g(·) represents the encoder network. In addition, a latent Gaussian variable z_i is introduced into the learned representation to provide variation, with the following form:

z_i = g_l(z), z ∼ N(0, I)    (6)

where g_l is the linear projection that projects the Gaussian noise into vector form so that it can be concatenated with the learned representation. Once the representation is learned, it is fed into the generator to generate images. In the proposed method, U-Net is split into an encoder and a generator. The structure of U-Net can be found on the right side of Fig. 1. The ResNet block is used as the basic unit, which is defined as

F(k) = Σ_j w_j k_j + r_k    (7)

where k_j is the j-th layer, w_j is the corresponding trainable weight, and r_k is the residual. In addition, different from the traditional U-Net, we use the leaky ReLU f(x) as the activation function:

f(x) = { αx, x ≤ 0; x, x > 0 }    (8)

where α is a small positive leak coefficient.
It is worth noting that the generated image x_g and the original image x_j are both provided to the discriminator. We want to ensure that the generator is capable of generating an image that is related to but different from the original image x_j. That is, the generated image x_g is supposed to be drawn from the same class as x_j rather than being just a duplicate or a simple modification of x_j. By providing the current image x_j, we prevent the generator from simply encoding it. In addition, the class information is provided so that the generator can better learn the generalized pattern across all classes.
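To make the construction of the generator input concrete, a minimal TensorFlow sketch of Eq. (6) is given below. The encoder is assumed to output a flat feature vector, and the names and dimensions (NOISE_DIM, LATENT_DIM) are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

NOISE_DIM, LATENT_DIM = 128, 256          # assumed sizes, for illustration only

# g_l: linear projection of the Gaussian noise into vector form (Eq. (6)).
g_l = layers.Dense(LATENT_DIM, use_bias=False)

def generator_input(encoder, x_j):
    """Latent code fed to the generator: the encoder representation r_x = g(x_j)
    concatenated with a projected Gaussian variable, so that repeated calls on
    the same x_j yield different codes and hence different generated images."""
    r_x = encoder(x_j)                                    # r_x = g(x_j)
    z = tf.random.normal([tf.shape(x_j)[0], NOISE_DIM])   # z ~ N(0, I), Eq. (1)
    return tf.concat([r_x, g_l(z)], axis=-1)              # z_i = g_l(z), concat
```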
TABLE I: Summary description of the datasets used in our experimental study (columns: Name, Modality, Tasks).
C. Model Architectures
The generator contains eight blocks, where each block has four convolutional layers with batch renormalization [27] followed by a downscaling or upscaling layer. Downscaling layers are convolutions with stride 2 followed by leaky ReLU, batch normalization, and dropout. Upscaling layers are deconvolutions with stride 2 followed by leaky ReLU, batch renormalization, and dropout. As aforementioned, we also maintain the skip-connections inside the generator. Following a strategy similar to ResNet, we use a 1 × 1 convolutional layer to pass features between blocks. DenseNet [28] is adopted as the discriminator. Layer normalization is applied instead of batch normalization, as we find it gives better performance. The discriminator contains four dense blocks and four transition layers, where each dense block contains four convolutional layers and ends with a dropout layer. We apply the dropout at the last layer because it helps avoid over-fitting.
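A sketch of the downscaling and upscaling building blocks in Keras is given below. The kernel size, leak coefficient, and dropout rate are illustrative assumptions; BatchNormalization(renorm=True) provides batch renormalization in tf.keras (TF 2.x):

```python
import tensorflow as tf
from tensorflow.keras import layers

def downscale(x, filters, drop=0.25):
    """Strided convolution -> leaky ReLU -> normalization -> dropout."""
    x = layers.Conv2D(filters, 3, strides=2, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)                 # leak coefficient is assumed
    x = layers.BatchNormalization(renorm=True)(x)
    return layers.Dropout(drop)(x)

def upscale(x, skip, filters, drop=0.25):
    """Strided deconvolution -> leaky ReLU -> batch renormalization -> dropout,
    with skip-connection features concatenated, as in the generator above."""
    x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.BatchNormalization(renorm=True)(x)
    x = layers.Dropout(drop)(x)
    return layers.Concatenate()([x, skip])
```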
D. Optimization

To optimize our networks, we follow the standard approach introduced in [9]: we alternately update one gradient descent step on D, then one step on G. As suggested in the original WGAN paper, we train the model as described in Algorithm 1.
Algorithm 1: Training algorithm for our model
input: gradient penalty coefficient λ, Adam parameters α, β1, β2, batch size m, input image x_i
input: discriminator parameters w, generator parameters θ
while θ has not converged do
    for t = 1, ... do
        for i = 1, ..., m do
            Sample real data x_j, random noise z from Eq. (1), and a random number ε ∼ U(0, 1);
            x_g = G_θ(z, x_j);
            x̂ = εx_i + (1 − ε)x_g;
            L_i = Eq. (4) + Eq. (5);
        end
        w ← Adam(∇_w (1/m) Σ_{i=1}^{m} L_i, w, α, β1, β2);
    end
    Sample a batch of latent variables {z_i}_{i=1}^{m} ∼ N(0, I);
    θ ← Adam(∇_θ (1/m) Σ_{i=1}^{m} −D_w(x_g), θ, α, β1, β2);
end
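The inner loop of Algorithm 1 corresponds to one critic update with the gradient penalty. A minimal TensorFlow sketch of this loss evaluation is given below, assuming G and D are two-input Keras models; the names and tensor shapes are illustrative:

```python
import tensorflow as tf

LAMBDA = 10.0  # gradient penalty coefficient, see Section III-B

def discriminator_loss(D, G, x_i, x_j, z):
    """One WGAN-GP loss evaluation: Eq. (4) plus the penalty of Eq. (5)."""
    x_g = G([z, x_j])                                    # generated sample
    # Wasserstein term, Eq. (4): D on the fake pair minus D on the real pair.
    w_loss = tf.reduce_mean(D([x_i, x_g])) - tf.reduce_mean(D([x_i, x_j]))
    # Interpolate between real and generated images with eps ~ U(0, 1).
    eps = tf.random.uniform([tf.shape(x_i)[0], 1, 1, 1])
    x_hat = eps * x_i + (1.0 - eps) * x_g
    with tf.GradientTape() as tape:
        tape.watch(x_hat)
        d_hat = D([x_i, x_hat])
    grads = tape.gradient(d_hat, x_hat)
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    gp = tf.reduce_mean(tf.square(norm - 1.0))           # Eq. (5)
    return w_loss + LAMBDA * gp
```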
III. EXPERIMENT
A. Experiments Setup
The experiments are conducted on multiple publicly available datasets:
• NCT-CRC-HE-100K [29]: a dataset containing 100,000 non-overlapping image patches from hematoxylin and eosin stained histological images. Nine different types of tissue are involved.
• ChestXray8 [30]: a dataset containing 108,948 frontal-view X-ray images of 32,717 unique patients with 14 disease image labels.
• HAM10000 [31]: a dataset containing 10,015 multi-source dermatoscopic images of common pigmented skin lesions belonging to seven different categories.
• Optical Coherence Tomography 2017 (OCT2017) [32]: a dataset containing optical coherence tomography images for four different types of retinal disease.
• X-Ray OCT 2017 [32]: a dataset extended from OCT2017 containing chest X-ray images from pneumonia and normal patients.
• LUNA [33]: LUng Nodule Analysis, an open dataset consisting of 888 CT scans with three different classes: non-nodule, nodule < 3 mm, and nodule ≥ 3 mm.
• BreastUltra [34]: a dataset containing 780 breast ultrasound images belonging to three different classes: normal, benign, and malignant.
• LiTS [35]: the Liver Tumor Segmentation Benchmark dataset, which contains 3D computed tomography (CT) volumes. We use a method similar to [36] to transfer them into 2D images with the axial view.

OpenCV2 is utilized to resize all those medical images to a fixed input size. We use the ratio 7:1:2 to split each dataset into training, validation, and test sets at the patient level. The information of all these datasets is summarized in Table I. The experiments were conducted on a machine with eight GPUs, including six NVIDIA TITAN X Pascal GPUs and two NVIDIA TITAN RTXs. The model is implemented in TensorFlow.

Quality evaluation of generated images is a challenging problem [37]. Traditional metrics, such as per-pixel mean square error, hardly reflect the performance. Hence, we use the Fréchet Inception Distance (FID) [38] to measure the distance between the generated distribution and the real distribution; a lower FID indicates that the generated images have higher quality. In addition, we also use the Per-pixel Accuracy (PA) to measure the discriminability of the generated images [12], [37], [39].
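For reference, FID can be computed from the Inception features of the real and generated images as follows; this is a standard implementation sketch rather than the paper's own code:

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_gen):
    """Frechet Inception Distance between two feature matrices of shape
    (n_samples, dim), e.g. Inception-v3 pooling features of real and
    generated images: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2))."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    if np.iscomplexobj(covmean):      # numerical noise can produce tiny
        covmean = covmean.real        # imaginary parts; drop them
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(cov_r + cov_g - 2.0 * covmean))
```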
TABLE II: Quantitative comparison of the quality of generated images with 95% confidence intervals.
(Rows: GAN, cGAN, and our method; columns: FID ↓ and PA ↑ on NCT-CRC-HE, ChestXray8, HAM10000, OCT2017, X-Ray OCT, LUNA, BreastUltra, and LiTS.)

TABLE III: Classification results (top-5) with 95% confidence intervals, with (W) and without (W/O) augmentation.
(Rows: ResNet-18, ResNet-50, DenseNet-161, and SVM, each with (W) and without (W/O) augmentation; columns: Acc, Pre, Rec, and AUC on each dataset.)

In order to demonstrate the superiority of the proposed method, we select the following baselines for the quality comparison:
• Vanilla GAN [9]: the original version of the GAN.
• Conditional GAN [13]: a conditional GAN which takes the label information into account.

It is worth mentioning that although U-Net is a type of auto-encoder, auto-encoders cannot be used as baselines: they are widely used for reconstruction tasks instead of augmentation, and are therefore not suitable here. For classification, we use several different metrics:
• Accuracy, which measures the percentage of correctly classified samples over the whole dataset.
• Precision, which measures the percentage of true positives (TP) over all predicted positive samples.
• Recall, which measures the percentage of TPs over all positive samples.
• Area-under-the-curve (AUC), which measures the relation between false positives and true positives.

The following baselines are selected as the classifiers: ResNet-18, ResNet-50, DenseNet-161, and Support Vector Machine (SVM).
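These four metrics can be computed with scikit-learn, for example as below; y_true, y_pred, and y_score are assumed to hold the test labels, predicted labels, and predicted class probabilities, and the macro/one-vs-rest averaging choices are illustrative:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

def classification_metrics(y_true, y_pred, y_score):
    """Accuracy, precision, recall, and AUC for a multi-class classifier;
    y_score has shape (n_samples, n_classes)."""
    return {
        "Acc": accuracy_score(y_true, y_pred),
        "Pre": precision_score(y_true, y_pred, average="macro"),
        "Rec": recall_score(y_true, y_pred, average="macro"),
        "AUC": roc_auc_score(y_true, y_score, multi_class="ovr"),
    }
```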
B. Hyper-Parameter Settings
In this part, we briefly present the parameter settings used in our experiments. The gradient penalty coefficient λ is 10. For the Adam optimizer [40], we set β1 = 0.
Fig. 2: Generated images in three different domains: lung CT, chest X-ray, and ultrasound. From left to right: original images, our method, GAN-generated images, and conditional-GAN-generated images. The images generated by our method are visibly more similar to the original ones.

The growth rate for the dense blocks is k = 64. The classifiers are trained with the Adam optimizer.
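As an illustration, such an optimizer configuration can be expressed in TensorFlow as follows; only β1 = 0 is taken from the text above, while the learning rate and β2 are the common WGAN-GP defaults and are assumptions here:

```python
import tensorflow as tf

# beta_1 = 0 is stated above; learning_rate and beta_2 are assumed
# (common WGAN-GP defaults), shown for illustration only.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.0, beta_2=0.9)
```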
C. Results

We first compare the image generation performance of GAN, cGAN, and our method. The results summarized in Table II show that our method achieves the best results compared with GAN and conditional GAN on both metrics. To better analyze the generated images, we provide a visualization using three demonstration datasets: LUNA, ChestXray8, and BreastUltra. The visualization can be found in Fig. 2; the images generated by our model are clearly more similar to the ground truth.

We also provide the classification results with and without augmentation on all eight datasets with four different classifiers in Table III. We find that the performance of all the classifiers improved significantly after augmentation.
IV. CONCLUSION
The lack of annotated medical images poses a significant challenge for imaging-based medical studies, such as quick diagnosis, disease prediction, etc. Data augmentation is a popular approach to relieve this problem. In this paper, we propose a new method named generative adversarial U-Net for medical image augmentation. It can be used to generate multi-modality data to relieve the data shortage commonly faced in medical imaging research. Specifically, we adjust the structure of the generative adversarial network to fit the U-Net. We conduct extensive experiments on eight datasets with different modalities, from binary classification to multi-class classification. Our experimental results demonstrate the superior performance of the proposed method over the state-of-the-art approaches on all of these datasets. In the future, we plan to extend our work to the more challenging few-shot learning or semi-supervised learning scenarios [41], [42], in which only a few or even no samples are available for certain classes. We also plan to investigate transfer learning to enrich the current model and augment its ability to deal with unseen samples from a brand new class.
REFERENCES

[1] H. R. Roth, L. Lu, J. Liu, J. Yao, A. Seff, K. Cherry, L. Kim, and R. M. Summers, "Improving computer-aided detection using convolutional neural networks and random view aggregation," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1170–1181, 2015.
[2] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60–88, 2017.
[3] X. Zhang, X. Chen, M. Dong, H. Liu, C. Ge, and L. Yao, "Multi-task generative adversarial learning on geometrical shape reconstruction from EEG brain signals," arXiv preprint arXiv:1907.13351, 2019.
[4] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, "GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification," Neurocomputing, vol. 321, pp. 321–331, 2018.
[5] X. Chen, L. Yao, and Y. Zhang, "Residual attention U-Net for automated multi-class segmentation of COVID-19 chest CT images," arXiv preprint arXiv:2004.05645, 2020.
[6] X. Chen, L. Yao, T. Zhou, J. Dong, and Y. Zhang, "Momentum contrastive learning for few-shot COVID-19 diagnosis from chest CT images," arXiv preprint arXiv:2006.13276, 2020.
[7] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
[8] M.-Y. Liu, T. Breuel, and J. Kautz, "Unsupervised image-to-image translation networks," Advances in Neural Information Processing Systems, vol. 30, pp. 700–708, 2017.
[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[10] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, "Improved training of Wasserstein GANs," in Advances in Neural Information Processing Systems, 2017, pp. 5767–5777.
[11] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, "Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1348–1357, 2018.
[12] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
[13] M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv preprint arXiv:1411.1784, 2014.
[14] Y. Li, Z. Gan, Y. Shen, J. Liu, Y. Cheng, Y. Wu, L. Carin, D. Carlson, and J. Gao, "StoryGAN: A sequential conditional GAN for story visualization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 6329–6338.
[15] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681–4690.
[16] T. Zhang, J. Cheng, H. Fu, Z. Gu, Y. Xiao, K. Zhou, S. Gao, R. Zheng, and J. Liu, "Noise adaptation generative adversarial network for medical image analysis," IEEE Transactions on Medical Imaging, vol. 39, no. 4, pp. 1149–1159, 2019.
[17] D. Nie, R. Trullo, J. Lian, C. Petitjean, S. Ruan, Q. Wang, and D. Shen, "Medical image synthesis with context-aware generative adversarial networks," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 417–425.
[18] T. Zhang, H. Fu, Y. Zhao, J. Cheng, M. Guo, Z. Gu, B. Yang, Y. Xiao, S. Gao, and J. Liu, "SkrGAN: Sketching-rendering unconditional generative adversarial networks for medical image synthesis," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 777–785.
[19] Y. Xue, T. Xu, H. Zhang, L. R. Long, and X. Huang, "SegAN: Adversarial network with multi-scale L1 loss for medical image segmentation," Neuroinformatics, vol. 16, no. 3-4, pp. 383–392, 2018.
[20] N. Dong, M. Xu, X. Liang, Y. Jiang, W. Dai, and E. Xing, "Neural architecture search for adversarial medical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 828–836.
[21] N. Khosravan, A. Mortazi, M. Wallace, and U. Bagci, "PAN: Projective adversarial network for medical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 68–76.
[22] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," arXiv preprint arXiv:1701.07875, 2017.
[23] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, "UNet++: A nested U-Net architecture for medical image segmentation," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 2018, pp. 3–11.
[24] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[25] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[26] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE CVPR, 2016, pp. 770–778.
[27] S. Ioffe, "Batch renormalization: Towards reducing minibatch dependence in batch-normalized models," in Advances in Neural Information Processing Systems, 2017, pp. 1945–1953.
[28] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[29] J. N. Kather, J. Krisam, P. Charoentong, T. Luedde, E. Herpel, C.-A. Weis, T. Gaiser, A. Marx, N. A. Valous, D. Ferber et al., "Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study," PLoS Medicine, vol. 16, no. 1, p. e1002730, 2019.
[30] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, "ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2097–2106.
[31] P. Tschandl, C. Rosendahl, and H. Kittler, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions," Scientific Data, vol. 5, p. 180161, 2018.
[32] D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan et al., "Identifying medical diagnoses and treatable diseases by image-based deep learning," Cell, vol. 172, no. 5, pp. 1122–1131, 2018.
[33] A. A. A. Setio, A. Traverso, T. De Bel, M. S. Berens, C. van den Bogaard, P. Cerello, H. Chen, Q. Dou, M. E. Fantacci, B. Geurts et al., "Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge," Medical Image Analysis, vol. 42, pp. 1–13, 2017.
[34] M. H. Yap, G. Pons, J. Martí, S. Ganau, M. Sentís, R. Zwiggelaar, A. K. Davison, and R. Martí, "Automated breast ultrasound lesions detection using convolutional neural networks," IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 4, pp. 1218–1226, 2017.
[35] P. Bilic, P. F. Christ, E. Vorontsov, G. Chlebus, H. Chen, Q. Dou, C.-W. Fu, X. Han, P.-A. Heng, J. Hesser et al., "The liver tumor segmentation benchmark (LiTS)," arXiv preprint arXiv:1901.04056, 2019.
[36] X. Xu, F. Zhou, B. Liu, D. Fu, and X. Bai, "Efficient multiple organ localization in CT image using 3D region proposal network," IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1885–1898, 2019.
[37] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, "Improved techniques for training GANs," Advances in Neural Information Processing Systems, vol. 29, pp. 2234–2242, 2016.
[38] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Advances in Neural Information Processing Systems, 2017, pp. 6626–6637.
[39] X. Wang and A. Gupta, "Generative image modeling using style and structure adversarial networks," in European Conference on Computer Vision. Springer, 2016, pp. 318–335.
[40] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[41] X. Zhang, L. Yao, and F. Yuan, "Adversarial variational embedding for robust semi-supervised learning," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 139–147.
[42] L. Yao, F. Nie, Q. Z. Sheng, T. Gu, X. Li, and S. Wang, "Learning from less for better: Semi-supervised activity recognition via shared structure discovery," in