A GAN-Based Input-Size Flexibility Model for Single Image Dehazing
Shichao Kan a,b,1, Yue Zhang a,b,1, Fanghui Zhang a,b and Yigang Cen a,b,∗
a Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
b Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China
ARTICLE INFO
Keywords: generative adversarial network, image dehazing, image restoration
ABSTRACT
Image-to-image translation based on the generative adversarial network (GAN) has achieved state-of-the-art performance in various image restoration applications. Single image dehazing is a typical example, which aims to recover the haze-free image from a hazy one. This paper concentrates on the challenging task of single image dehazing. Based on the atmospheric scattering model, we design a novel model to directly generate the haze-free image. The main challenge of image dehazing is that the atmospheric scattering model has two parameters, i.e., the transmission map and the atmospheric light. When they are estimated separately, the errors accumulate and compromise dehazing quality. Considering this and the variety of image sizes, we propose a novel input-size flexible conditional generative adversarial network (cGAN) for single image dehazing, which is input-size flexible at both the training and test stages of image-to-image translation within the cGAN framework. We propose a simple and effective U-type residual network (UR-Net) to build the generator and adopt spatial pyramid pooling (SPP) to design the discriminator. Moreover, the model is trained with a multi-loss function, in which the consistency loss is newly designed in this paper. We finally build a multi-scale cGAN fusion model that achieves state-of-the-art single image dehazing performance. The proposed models receive a hazy image as input and directly output a haze-free one. Experimental results demonstrate the effectiveness and efficiency of the proposed models.
1. Introduction
Haze removal [8] is a classical ill-posed image restoration problem, which plays an important role in intelligent transportation systems, e.g., object detection under haze conditions [18, 23, 19]. Haze consists of particles, such as dust, that obscure the clarity of the atmosphere. Dehazing aims to remove the veil of haze from a hazy image and restore the corresponding haze-free image. In recent years, because deep learning has greatly improved the performance of image processing compared with non-learning-based techniques, the dehazing problem has attracted more and more attention in the image restoration research community. Various image dehazing methods based on deep learning have been proposed, including: (1) generating the medium transmission map [3] or the haze-free image [23, 17, 40] with a convolutional neural network (CNN); (2) generating the transmission map [28] or the haze-free image [39, 36, 24] with an encoder-decoder structure without adversarial training; (3) reconstructing the haze-free image with a generative adversarial network (GAN) [38, 42, 20, 29, 27], which are paired image-to-image translation models; (4) reconstructing the haze-free image with a cycle GAN (CGAN) [37, 4, 22], which are unpaired image-to-image translation models.

⋆ This work was supported in part by the National Natural Science Foundation of China under Grant 61872034 and Grant 61572067, in part by the Natural Science Foundation of Guizhou Province under Grant [2019]1064, and in part by the Science and Technology Program of Guangzhou under Grant 201804010271. ∗ Corresponding author: [email protected] (Y. Cen).
1 The first two authors (Shichao Kan and Yue Zhang) contributed equally.

To directly generate the medium transmission map, [3] and [28] proposed end-to-end learning CNN models. To generate a haze-free image from a hazy one in an end-to-end manner, [23], [17] and [40] proposed light-weight and fast CNNs, while [39], [36] and [24] incorporated modern techniques into CNNs based on encoder-decoder structures. Because the GAN framework [6] is not involved in these models, the realism of the restored images is usually suboptimal. To merge GAN and image dehazing, supervised learning models with paired and unpaired samples based on adversarial training have been developed. [38], [42], [20], [29] and [27] are GAN-based end-to-end learning models trained with paired synthetic datasets, while [37], [4] and [22] are cycle-consistency models trained with unpaired training datasets.

From these deep learning-based methods and the corresponding experimental results, we can see that: (1) because the end-to-end dehazing models [17, 40, 39, 36, 24, 38, 42, 20, 29] directly generate the haze-free image without additional parameter estimation, they are generally more efficient than non-end-to-end dehazing models [3, 28]; (2) because there is no down-sampling and up-sampling before and after dehazing with input-size flexible models [17, 24], the information loss throughout the restoration process is minimized, so images generated by input-size flexible models have a better visual effect than those generated by input-size fixed models [40, 39, 38, 29]; (3) because paired samples provide definitive supervision, the training of the network is truly supervised when paired samples are used, so paired image-to-image translation models [38, 42, 20, 29, 27] are usually more effective than unpaired image-to-image translation models [37, 4, 22]; (4) various
works [42, 20, 29, 27] focus on exploring single image dehazing with GAN-based models and achieve promising performance.

Considering these properties, we propose an end-to-end input-size flexible conditional generative adversarial network (cGAN) for single image dehazing. The proposed model can not only remove the haze as much as possible but also preserve the clear content of an image. The proposed method obtains state-of-the-art results on the datasets of the intelligent traffic video image enhancement processing competition of ICIG 2019 (http://icig2019.csig.org.cn/?page_id=328) and the larger-scale REalistic Single Image DEhazing (RESIDE) dataset [19] for single image dehazing. The performance improvement primarily comes from the input-size flexible training and testing, the multi-loss supervised training and the designed end-to-end framework.

Our work makes the following contributions.

• We propose an end-to-end input-size flexible cGAN model for single image dehazing, which gathers the advantages of various dehazing models, i.e., the end-to-end manner, the input-size flexible mode, training with GAN and multi-loss functions. With our model, both the adversarial training and test stages can operate in input-size flexible mode, and the image dehazing performance is improved greatly.

• In our framework, we design a UR-Net structure based on the popular U-Net [35] structure and residual learning [11], which is simple and effective. The generator is the iteration of the UR-Net between two adjacent convolutional layers. Moreover, the generator of our GAN is based on the encoder-decoder [12] mechanism, and the size of the feature map in each layer of the decoder is related to the size of the input image. To realize input-size flexible adversarial training, the spatial pyramid pooling (SPP) [9] structure is embedded into the discriminator.

• Training with multi-loss functions is also an important part of our framework. We propose a consistency loss and combine the adversarial loss, the L1 loss, the structural similarity (SSIM) loss and a new peak signal-to-noise ratio (PSNR) loss to train our network. The effectiveness of these loss functions is verified by ablation studies.

The rest of this paper is organized as follows. In Section 2, related work on learning-based single image dehazing is reviewed. The idea, framework and details of the proposed input-size flexible cGAN for single image dehazing are presented in Section 3. In Section 4, the datasets, evaluation metrics and experimental results are presented. Section 5 concludes the paper.
2. Related Work
Single image dehazing is a difficult vision task with a long research history. Traditional single image dehazing methods are based on handcrafted priors [5], e.g., the dark channel prior [8], the color attenuation prior [44] and non-local priors [2, 21], which are usually simple and effective for many scenes. However, prior-based methods are limited when describing specific statistics. In recent years, learning-based methods have become popular because they can overcome the limitations of specific priors. We also study learning-based single image dehazing in this paper. Here, related works are reviewed in detail, including learning-based dehazing methods without and with GAN, respectively.

Learning-based dehazing methods have become more and more popular since the learning idea was proposed by Tang et al. [34]. The original idea was to learn a regression model based on random forests from prior-based haze-relevant features, such as the dark channel [8], local max contrast [33], hue disparity [1] and local max saturation [34]. Subsequently, more powerful learned dehazing models were proposed, especially CNN-based end-to-end learning methods. Song et al. [32] proposed a ranking CNN to simultaneously capture the statistical and structural attributes of hazy images; however, it is not an end-to-end learning system. Cai et al. [3] proposed an end-to-end learning system, called DehazeNet, to directly generate the medium transmission map based on the CNN framework. Ren et al. [28] proposed a coarse-to-fine multi-scale CNN (MSCNN) model to predict transmission maps. Although these two models can be learned in an end-to-end manner, they are not end-to-end dehazing models.

In 2017, Li et al. [17] proposed a light-weight, effective and fast end-to-end learning model for image dehazing, called AOD-Net, which can directly generate a haze-free image from a hazy one. Since then, the end-to-end dehazing idea has been favored by researchers. Based on the AOD-Net framework, Liu et al. [23] investigated various loss functions and demonstrated that training with perception-driven losses can further boost dehazing performance. Zhang et al. [39] proposed a multi-scale image dehazing method using a perceptual pyramid deep network based on an encoder-decoder structure with a pyramid pooling module; the designed network is built from dense blocks [13] and residual blocks [11], and a perceptual loss is also incorporated into the training process. Xu et al. [36] proposed an instance normalization unit and embedded it into the VGG-based [30] U-Net [35] with an encoder-decoder structure. Liu et al. [24] proposed a generic model-agnostic CNN (GMAN) for single image dehazing, which is based on the fully convolutional idea and does not rely on the atmospheric scattering model. Both Xu et al. and Zhang et al. train the network with the mean squared error (MSE) and a VGG-feature-based perceptual loss. Recently, Zhang and Tao [40] proposed a fast and accurate multi-scale end-to-end dehazing network called FAMED-Net, which is lightweight and computationally efficient.

Inspired by the success of these models, our proposed
framework is based on the U-Net structure and residual learning, and is also an end-to-end dehazing model. Different from previous ideas, our network is designed for generalized image restoration, especially for images of different sizes: it can accept input images of any size during both the training and test processes.
The idea of GAN was first proposed in [6], where it was designed to synthesize realistic images via an adversarial process. Later, it was widely extended to a variety of image generation tasks, such as conditional image generation [25], paired image-to-image translation [15] and unpaired image-to-image translation [43]. Now, it is also becoming popular in single image dehazing. Zhang and Patel [38] proposed to jointly learn the transmission map, the atmospheric light and dehazing within a GAN framework, called the densely connected pyramid dehazing network (DCPDN), which is an end-to-end single image dehazing model. Zhu et al. [42] formulated the atmospheric scattering model within a GAN framework and proposed DehazeGAN, which can learn the global atmospheric light and the transmission coefficient simultaneously. To generate realistic clear images, Li et al. [20] directly estimate the haze-free image with an end-to-end trainable cGAN with an encoder-decoder architecture. Ren et al. [29] adopted a fusion-based strategy to fuse three inputs derived from an original hazy image and proposed an end-to-end gated fusion network (GFN) for single image dehazing, which is trained with MSE and adversarial losses. Qu et al. [27] directly generate a haze-free image from a hazy one without the physical scattering model, called the enhanced pix2pix dehazing network (EPDN); multi-loss optimization, including an adversarial loss, is also used to train the network. All of these models are based on the paired image-to-image translation framework.

Moreover, the unpaired image-to-image translation framework can also be found in single image dehazing. Yang et al. [37] proposed an end-to-end disentangled dehazing network to generate a haze-free image based on unpaired supervision. Engin et al. [4] completed the dehazing task based on unpaired supervision, which does not rely on the atmospheric scattering model and is trained by combining cycle-consistency and perceptual losses. Liu et al. [22] developed an end-to-end learning system that uses unpaired fog and fog-free training images to generate a fog-free image, which also uses adversarial discriminators and cycle-consistency losses to train the whole framework. The advantage of unpaired supervision is that the training process does not need to rely on a synthetic dataset, because unpaired samples are easy to obtain. However, because these frameworks do not rely on paired training data, their ability to restore realistic images is limited.

Therefore, our designed framework is based on a paired cGAN, which also incorporates multi-loss optimization.
3. Input-Size Flexibility Conditional Generative Adversarial Network
Most previous single image dehazing models are based on the atmospheric scattering model and tend to estimate its parameters, i.e., the transmission map and the atmospheric light. However, parameter estimation usually introduces estimation errors, which reduce the quality of the restored image. We thus develop an end-to-end, image-to-image translation model for single image dehazing that is independent of the atmospheric scattering model and involves no additional parameter estimation. The proposed model directly produces a haze-free image from a hazy one and is input-size flexible at both the training and test stages. In the following, we first analyse the atmospheric scattering model and then present each component of our proposed input-size flexible conditional generative adversarial network, i.e., the generator, the discriminator and the loss functions.
The famous atmospheric scattering model [26] can be formulated as follows:

$$I_{re}(x) = J_{re}(x)\,t(x) + \alpha\,(1 - t(x)), \tag{1}$$

where $I_{re}(x)$ is the real hazy image that needs to be restored, $J_{re}(x)$ is the expected haze-free image that should be recovered from $I_{re}(x)$, $t(x)$ is the medium transmission map, $\alpha$ is the global atmospheric light, and $x$ indexes the pixels of an image ($I_{re}$, $J_{re}$ and $t$). In real tasks, only $I_{re}(x)$ in Eq. (1) is known; the other three variables are unknown. Because the final goal is to estimate $J_{re}(x)$, if $t(x)$ and $\alpha$ can be estimated, then one can directly obtain $J_{re}(x)$ according to the following formula:

$$J_{re}(x) = \frac{1}{t(x)} I_{re}(x) + \alpha \left(1 - \frac{1}{t(x)}\right). \tag{2}$$

However, estimating $t(x)$ is a complex task because $t(x)$ is related to both the distance $d(x)$ from the scene point to the camera and the scattering coefficient $\beta$ of the atmosphere, which can be described as follows:

$$t(x) = e^{-\beta d(x)}. \tag{3}$$

Moreover, there will always be an error in the estimation of each parameter. Suppose that $\delta_t$ and $\delta_\alpha$ are the average estimation errors of the parameters $t$ and $\alpha$, respectively. When Eq. (2) is used to obtain a haze-free image, if the total average estimation error is $\delta$, then we have:

$$\delta = \delta_t + \delta_\alpha + \delta_t \cdot \delta_\alpha. \tag{4}$$

From Eq. (4), only when both $\delta_t \to 0$ and $\delta_\alpha \to 0$ can we obtain $\delta \to 0$. When a system has to estimate more than one parameter, the estimation error of each parameter is usually difficult to control simultaneously. In order to estimate them in an end-to-end manner, we design a new framework with a novel consistency loss (Eq. (7)).
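To make the error-accumulation argument concrete, the following is a minimal NumPy sketch (our illustration, not from the paper) of Eqs. (1)-(3): it synthesizes a hazy image, inverts it with slightly wrong estimates of $t$ and $\alpha$, and shows how the two errors compound in the recovered image.

```python
import numpy as np

def synthesize_haze(J, t, alpha):
    """Eq. (1): I(x) = J(x) * t(x) + alpha * (1 - t(x))."""
    return J * t + alpha * (1.0 - t)

def invert_haze(I, t_est, alpha_est):
    """Eq. (2): J(x) = I(x) / t(x) + alpha * (1 - 1 / t(x))."""
    return I / t_est + alpha_est * (1.0 - 1.0 / t_est)

# Toy illustration of error accumulation (Eq. (4)): small relative
# errors in t and alpha jointly degrade the recovered J.
rng = np.random.default_rng(0)
J = rng.uniform(0.2, 0.9, size=(4, 4))   # ground-truth clear image
d = rng.uniform(1.0, 3.0, size=(4, 4))   # scene depth
t = np.exp(-0.8 * d)                     # Eq. (3) with beta = 0.8 (arbitrary)
alpha = 0.85
I = synthesize_haze(J, t, alpha)

J_hat = invert_haze(I, t * 1.05, alpha * 1.05)  # 5% error on each parameter
print(np.abs(J_hat - J).mean())  # larger than either single-parameter error alone
```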
Figure 1:
The designed generator (UR-Net-7) of the proposed input-size flexible conditional generative adversarial network.

Next, we analyze this design. Through a log transformation, Eq. (2) can be transformed into the following form:

$$\log(J_{re}(x) - \alpha) = \log(I_{re}(x) - \alpha) - \log(t(x)). \tag{5}$$

By setting $J_g(x) = \log(J_{re}(x) - \alpha)$, $I_r(x) = \log(I_{re}(x) - \alpha)$ and $M(x) = \log(t(x))$, Eq. (5) can be rewritten as follows:

$$J_g(x) = I_r(x) - M(x). \tag{6}$$

In this paper, we use the encoder-decoder idea to realize the dehazing task. According to Eq. (6), assume that $I_r$ is the output of one layer of the encoder network with input $I_{re}$, and $J_g$ is the output of one layer of the decoder network. We can then obtain $J_{ge}$ according to the following rule: $I_r = \log(I_{re} - \alpha) \Rightarrow \alpha = I_{re} - \exp(I_r)$; then $J_g = \log(J_{ge} - \alpha) \Rightarrow J_{ge} = I_{re} - [\exp(I_r) - \exp(J_g)]$. In the following, we design our framework based on this observation and Eq. (6).

In Eq. (6), $J_g(x)$ is the residual of $I_r(x)$ and $M(x)$; the residual idea is therefore an important component of our generator. We design a U-type residual network (UR-Net) for single image dehazing, and the whole generator is the iteration of the UR-Net between two adjacent convolutional layers, as shown in Fig. 1.

The unit in the red dotted rectangle in Fig. 1 is an example of the designed UR-Net. Step 1 is a convolutional layer with a 5 × 5 kernel and stride 2: if the shape of conv7 is $(1, c, h, w)$ and the shape of conv8 is $(1, c_1, h_1, w_1)$, then $h_1 = \lceil h/2 \rceil$ and $w_1 = \lceil w/2 \rceil$, where $\lceil \cdot \rceil$ is the round-up symbol. Step 2 is a de-convolutional layer with a 5 × 5 kernel whose output shape is $(1, c_o, h_o, w_o)$, with $h_o = h$ and $w_o = w$ for the purpose of input-size flexibility. Next, we concatenate the output of step 2 and conv7 in the channel dimension; the output of step 3 is the concatenated result. This is the U-Net idea, used for fine information recovery. To realize residual learning between the input (conv7 in this example) and the output of the penultimate layer of the UR-Net, the output channel dimension of the penultimate layer must equal that of the input. Thus, in step 4, we adopt a convolutional layer with a 3 × 3 kernel. Finally, the residual is obtained by the subtraction between conv7 and the output of step 4. Moreover, batch normalization (bn) [14] is applied to each convolutional and de-convolutional layer in our framework for fast convergence. The activation function of the last layer is tanh(·); the other layers use leaky ReLU (Lrelu) with the leak set to 0.2.

To provide noise for the conditional input, we adopt dropout at both the training and test stages after the de-convolutional layers corresponding to conv6, conv7 and conv8; the dropout rate is set to 0.5. In Fig. 1, the height $h_i$ and width $w_i$ of each conv$i$ are related to the height $h$ and width $w$ of the input image via $h_i = \lceil h/2^i \rceil$ and $w_i = \lceil w/2^i \rceil$. The designed generator is an encoder-decoder structure: the encoder consists of conv1 to conv8, and the other parts form the decoder.

For convenience, in the following we use UR-Net-$K$ to indicate that the number of UR-Net structures in the generator is $K$ (e.g., the generator in Fig. 1 has 7 UR-Net structures, so we call it UR-Net-7).
At the same time, we use UR-Net-$K^*$ to denote that there is no subtraction in the last UR-Net of the generator (the last UR-Net is located at the last de-convolutional layer).

The purpose of the generator ($G$) is to generate a haze-free output image $J_{ge}$ from the input hazy image $I_{re}$ and random noise $Z$, i.e., $G : \{I_{re}, Z\} \to J_{ge}$.
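As a concrete reading of one UR-Net unit, here is a minimal TensorFlow sketch (our interpretation, not the authors' released code); the doubled channel width in step 1 and the resize guard for odd spatial sizes are our assumptions.

```python
import tensorflow as tf

def ur_net_block(x):
    """One UR-Net unit (the red dotted box in Fig. 1), applied between
    two adjacent convolutional layers. x plays the role of conv7."""
    c = x.shape[-1]
    # Step 1: 5x5 conv, stride 2 -> ceil(h/2) x ceil(w/2) (conv8 in Fig. 1).
    y = tf.keras.layers.Conv2D(2 * c, 5, strides=2, padding="same")(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.LeakyReLU(0.2)(y)
    # Step 2: 5x5 de-convolution back to the input spatial size.
    y = tf.keras.layers.Conv2DTranspose(c, 5, strides=2, padding="same")(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.LeakyReLU(0.2)(y)
    y = tf.image.resize(y, tf.shape(x)[1:3])  # guard against odd h or w
    # Step 3: U-Net style concatenation with the skip connection.
    y = tf.keras.layers.Concatenate()([y, x])
    # Step 4: 3x3 conv so the channels match the input again.
    y = tf.keras.layers.Conv2D(c, 3, padding="same")(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.LeakyReLU(0.2)(y)
    # Residual subtraction: J_g = I_r - M (Eq. (6)).
    return x - y
```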
Figure 2:
The designed discriminator of the proposed input-size flexible conditional generative adversarial network.
The discriminator $D$ is an important part of our proposed input-size flexible cGAN model; it is used to discriminate whether the input sample is a "real image" ($J_{re}$) or a "generated image" ($J_{ge}$). As shown in Fig. 2, it consists of an input layer, 4 convolutional layers, a spatial pyramid pooling (SPP) [10] layer and a fully connected layer (fc). One input is the channel-wise concatenation of the real clear image and the real hazy image; the other is the channel-wise concatenation of the real clear image and the generated haze-free image. The first three convolutional layers use a 5 × 5 kernel with stride 2. The SPP layer pools the final feature map into fixed-size grids (16, 9 and 4 bins, i.e., the 16×C, 9×C and 4×C features in Fig. 2), so the fc layer receives a fixed-length vector regardless of the input image size; this is what makes the discriminator input-size flexible.
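A minimal sketch of such an SPP layer (our illustration; TensorFlow has no built-in adaptive pooling, so the sketch approximates it by resizing to a multiple of each grid size first):

```python
import tensorflow as tf

def spatial_pyramid_pool(x, levels=(4, 3, 2)):
    """Pool an (N, H, W, C) feature map into fixed 4x4, 3x3 and 2x2 grids
    (16C + 9C + 4C features, matching the 16xC / 9xC / 4xC bins of Fig. 2),
    so the following fc layer sees a fixed-length vector for any H, W."""
    c = x.shape[-1]
    feats = []
    for n in levels:
        # Approximate adaptive max pooling: resize to (4n, 4n), then
        # max-pool with a 4x4 window to obtain an n x n grid.
        y = tf.image.resize(x, (4 * n, 4 * n))
        y = tf.nn.max_pool2d(y, ksize=4, strides=4, padding="VALID")
        feats.append(tf.reshape(y, (-1, n * n * c)))
    return tf.concat(feats, axis=1)
```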
The idea of multi-loss optimization is widely used in various CNN-based systems and has proved effective in many kinds of applications. It is also used in our framework. Next, we define the losses one by one.

To ensure that $I_r = \log(I_{re} - \alpha)$ and $J_g = \log(J_{ge} - \alpha)$, we need to constrain $I_{re} - \exp(I_r) = J_{ge} - \exp(J_g)$. Thus, we define a consistency loss as follows:

$$\mathcal{L}_{Consistency}(G) = \| I_{re} - \exp(I_r) - J_{ge} + \exp(J_g) \|_1. \tag{7}$$

This consistency loss ensures that the transformations applied to $I_{re}$ and $J_{ge}$ in the network approximate the log transformation with parameter $\alpha$, which is novel and important for our framework. Instead of learning the parameter $\alpha$, through this consistency loss we use a convolutional layer to estimate the transformation $\log(I_{re} - \alpha)$ and the inverse transformation of $\log(J_{ge} - \alpha)$, respectively.

Then, we adopt the general cGAN loss function [15] in our model, which is defined as follows:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{I_{re}, J_{re}}[\log D(I_{re}, J_{re})] + \mathbb{E}_{I_{re}, Z}[\log(1 - D(I_{re}, G(I_{re}, Z)))]. \tag{8}$$

At the training stage, the generator $G$ is trained to produce outputs that cannot be distinguished as "fakes" by the discriminator $D$, while $D$ is trained to recognize the generated examples as "fakes". Thus, $G$ tries to minimize objective (8) against an adversarial $D$ that tries to maximize it, i.e., $G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D)$. For the last term of (8), minimizing over $G$ is equivalent to maximizing $\log(D(I_{re}, G(I_{re}, Z)))$, which is what we adopt at the implementation stage.

Because the $L_1$ loss constrains the generator's output to match the expected output and thus reduces blur, we also introduce it as one of our loss functions:

$$\mathcal{L}_{L_1}(G) = \mathbb{E}_{I_{re}, J_{re}, Z}\left[\| J_{re} - G(I_{re}, Z) \|_1\right]. \tag{9}$$

Moreover, perception-driven losses have been verified to be effective in various image restoration tasks. Thus, to give the generated haze-free images a good visual effect, we adopt SSIM and PSNR to construct our perceptual losses. In our model, the calculation of SSIM is the same as in [41]. PSNR is defined as:

$$PSNR(J_{re}, J_{ge}) = 10 \cdot \log_{10}\!\left(\frac{(\max(J_{re}) - \min(J_{re}))^2}{MSE(J_{re}, J_{ge})}\right), \tag{10}$$

where $MSE(J_{re}, J_{ge})$ is the mean of $(J_{re} - J_{ge})^2$, i.e., $MSE(J_{re}, J_{ge}) = mean((J_{re} - J_{ge})^2)$.

According to the above formulas, we define the SSIM and PSNR losses as follows:

$$\mathcal{L}_{SSIM}(G) = 1 - SSIM(J_{re}, J_{ge}), \tag{11}$$

$$\mathcal{L}_{PSNR}(G) = 1 - \frac{PSNR(J_{re}, J_{ge})}{thresh}, \tag{12}$$

where $thresh$ is a threshold that is set to 40 in our experiments.

Finally, the overall loss function of our model is defined as follows:

$$\mathcal{L} = \mathcal{L}_{Consistency}(G) + \lambda_1 \mathcal{L}_{cGAN}(G, D) + \lambda_2 \mathcal{L}_{L_1}(G) + \lambda_3 \mathcal{L}_{SSIM}(G) + \lambda_4 \mathcal{L}_{PSNR}(G) + \lambda_5 \|w\|_2, \tag{13}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are the weights of the corresponding loss functions, set to 1, 100, 100 and 100 in our experiments, respectively. The final goal is to minimize (13). The last term is only used at the multi-scale training stage (Section 3.5), where $w$ denotes the weights of the generator and $\lambda_5$ is the weight of this term.
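Putting the pieces together, the following is a small TensorFlow sketch of the generator-side objective as we understand Eqs. (7)-(13) (our illustration; the non-saturating form of the adversarial term and the [0, 1] image range are assumptions, and the weight-decay term is omitted):

```python
import tensorflow as tf

def generator_loss(J_re, J_ge, I_re, I_r, J_g, d_fake):
    """Generator side of Eq. (13). J_re / J_ge: real and generated
    haze-free images in [0, 1]; I_r, J_g: the log-domain layer outputs
    of Section 3.1; d_fake: D(I_re, G(I_re, Z))."""
    consistency = tf.reduce_mean(
        tf.abs(I_re - tf.exp(I_r) - J_ge + tf.exp(J_g)))              # Eq. (7)
    adv = -tf.reduce_mean(tf.math.log(d_fake + 1e-8))                 # Eq. (8), non-saturating form
    l1 = tf.reduce_mean(tf.abs(J_re - J_ge))                          # Eq. (9)
    ssim_loss = 1.0 - tf.reduce_mean(tf.image.ssim(J_re, J_ge, 1.0))  # Eq. (11)
    mse = tf.reduce_mean(tf.square(J_re - J_ge))
    dyn = tf.reduce_max(J_re) - tf.reduce_min(J_re)
    psnr = 10.0 * tf.math.log(dyn ** 2 / mse) / tf.math.log(10.0)     # Eq. (10)
    psnr_loss = 1.0 - psnr / 40.0                                     # Eq. (12), thresh = 40
    # Loss weights (1, 100, 100, 100) as in the paper; weight decay omitted.
    return consistency + 1.0 * adv + 100.0 * l1 + 100.0 * ssim_loss + 100.0 * psnr_loss
```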
Figure 3:
The fusion model of the multi-scale generator.

Multi-scale fusion has been verified to be effective in image dehazing by Zhang and Tao [40]. In their model, a Gaussian pyramid architecture with a late fusion module is designed to fuse different estimated feature maps. We also adopt the Gaussian pyramid architecture to design our multi-scale generator fusion model, shown in Fig. 3, which demonstrates the generalization of our model to a multi-scale framework. It should be noted that FAMED-Net [40] is trained without an adversarial objective, whereas our multi-scale generator is trained within our cGAN framework. The input of the generator includes one hazy image (haze$_1$) and the corresponding down-sampled images (scale$_{1/2}$(haze$_1$) and scale$_{1/4}$(haze$_1$)). The output of the generator includes 4 haze-free images (haze-free$_1$, haze-free$_2$, haze-free$_3$ and haze-free$_{fusion}$ in Fig. 3), which correspond to the original scale of the input hazy image, the 1/2 scale, the 1/4 scale and the multi-scale fused output. The multi-scale fusion module operates on the concatenation of the haze maps ($M_1$, $M_2$, $M_3$ in Fig. 3) of the original scale, the 2× up-sampling of the 1/2 scale and the 4× up-sampling of the 1/4 scale. The fused haze map ($M_{fusion}$) is obtained by applying a convolutional layer with a 1 × 1 kernel. The generators for the original, 1/2-scale and 1/4-scale hazy images are UR-Net-7$^*$, UR-Net-6$^*$ and UR-Net-5$^*$, respectively.

The discriminator is also vital for the fusion generator. Because the designed discriminator is input-size flexible, we have two alternatives for the fusion generator, i.e., with and without sharing the discriminator parameters across the outputs of the generator. Although sharing the discriminator parameters reduces the model size, it cannot reduce the amount of computation. To enhance the discriminative ability for this fusion generator, we directly adopt discriminators without shared parameters, i.e., each output of the fusion generator is judged by a different discriminator. For the loss function of the fusion generator, we apply objective (13) to each output of the generator and its corresponding discriminator.

We implement the model in TensorFlow, and the model is trained with minibatch SGD (stochastic gradient descent). The Adam solver [16] with a learning rate of 0.0002 and momentum parameters $\beta_1$ and $\beta_2$ is applied to optimize our model. All parameters are trained from scratch and the batch size is set to 1. The hyper-parameter $\lambda_5$ in objective (13) is set to 0.001. To better maintain the convergence balance between the generator and the discriminator, we update the parameters of the discriminator once every 4 iterations. Because our model is input-size flexible, we train it with images of different sizes to obtain a better haze-free image. However, training with inputs of different sizes is slow, so we first train the model with a fixed input size and then fine-tune it for the case of different sizes.
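A minimal sketch of the fusion module as we read Fig. 3 (our illustration; the 3-channel haze maps and the final subtraction, following the additive-noise view of Eq. (6), are assumptions):

```python
import tensorflow as tf

def multiscale_fusion(I, g_full, g_half, g_quarter):
    """Late fusion of per-scale haze maps (Fig. 3). g_full, g_half and
    g_quarter stand for the UR-Net-7*, UR-Net-6* and UR-Net-5* generators."""
    h, w = tf.shape(I)[1], tf.shape(I)[2]
    I_half = tf.image.resize(I, (h // 2, w // 2))
    I_quarter = tf.image.resize(I, (h // 4, w // 4))
    # Per-scale haze maps M1, M2, M3.
    M1, M2, M3 = g_full(I), g_half(I_half), g_quarter(I_quarter)
    M2_up = tf.image.resize(M2, (h, w))   # 2x up-sampling of the 1/2 scale
    M3_up = tf.image.resize(M3, (h, w))   # 4x up-sampling of the 1/4 scale
    # 1x1 convolution over the concatenated maps gives M_fusion.
    M_fusion = tf.keras.layers.Conv2D(3, 1)(tf.concat([M1, M2_up, M3_up], axis=-1))
    return I - M_fusion                   # haze-free_fusion, via Eq. (6)
```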
4. Experiments
We conduct experiments on the dataset of the intelligent traffic video image enhancement processing competition of ICIG 2019 (called ICIG2019 for convenience) and the large-scale REalistic Single Image DEhazing (RESIDE) dataset [19] for single image dehazing.
The ICIG2019 dataset contains 5500 clear images of real scenes and the corresponding synthetic hazy ones. The training and validation sets contain 5000 and 500 image pairs, respectively. In the experiments, we use the training set to train our models and the validation set to test the trained models. The ablation studies are conducted on this dataset.
The RESIDE dataset is the largest single image dehazing dataset to date; its training set contains 110,500 synthetic hazy indoor images (ITS) and 313,950 synthetic hazy outdoor images (OTS). The synthetic objective testing set (SOTS) contains 500 indoor images and 500 outdoor images. The hybrid subjective testing set (HSTS) contains 10 real-world images and 10 synthetic images. In the training set, each clear image corresponds to multiple hazy images of different haze concentrations. For each clear image, we randomly select one corresponding hazy image from the training samples to form our training set.

We use 4 evaluation metrics implemented in the skimage package of Python to evaluate single image dehazing performance: MSE (the smaller the better ↓), the normalized root mean-squared error (NRMSE) (the smaller the better ↓), PSNR (the larger the better ↑) and SSIM (the larger the better ↑). We also re-test the compared methods by running the corresponding released models. All results reported for the compared methods and our methods are evaluated using the same standard Python evaluation interface for a fair comparison.
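For reference, a minimal sketch of such an evaluation with skimage (our illustration; the function names follow recent skimage.metrics versions, and uint8 inputs are assumed):

```python
from skimage.metrics import (mean_squared_error, normalized_root_mse,
                             peak_signal_noise_ratio, structural_similarity)

def evaluate(J_true, J_pred):
    """Compute the four metrics used in the paper for a pair of
    uint8 RGB images of identical shape."""
    return {
        "MSE": mean_squared_error(J_true, J_pred),                        # lower is better
        "NRMSE": normalized_root_mse(J_true, J_pred),                     # lower is better
        "PSNR": peak_signal_noise_ratio(J_true, J_pred, data_range=255),  # higher is better
        "SSIM": structural_similarity(J_true, J_pred, channel_axis=-1),   # higher is better
    }
```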
We first verify the effectiveness of each loss function in Eq. (13) based on the UR-Net-7 model (Fig. 1) with input images of size 256 × 256. With $\mathcal{L}_{Base} = \mathcal{L}_{Consistency}(G) + \mathcal{L}_{cGAN}(G, D)$ as the base loss function of our framework, we verify the combinations $\mathcal{L}_{Base} + \mathcal{L}_{L_1}$, $\mathcal{L}_{Base} + \mathcal{L}_{SSIM}$, $\mathcal{L}_{Base} + \mathcal{L}_{PSNR}$, $\mathcal{L}_{Base} + \mathcal{L}_{L_1} + \mathcal{L}_{SSIM}$ (without $\mathcal{L}_{PSNR}$), $\mathcal{L}_{Base} + \mathcal{L}_{L_1} + \mathcal{L}_{PSNR}$ (without $\mathcal{L}_{SSIM}$), $\mathcal{L}_{Base} + \mathcal{L}_{SSIM} + \mathcal{L}_{PSNR}$ (without $\mathcal{L}_{L_1}$), and $\mathcal{L}_{Base} + \mathcal{L}_{L_1} + \mathcal{L}_{SSIM} + \mathcal{L}_{PSNR}$ (all). The experimental results are shown in Table 1.
Table 1
Results of Different Losses on the ICIG2019 Dataset.

loss | MSE ↓ | NRMSE ↓ | PSNR ↑ | SSIM ↑
Base | — | — | — | —
Base + L1 | — | — | — | —
Base + SSIM | — | — | — | —
Base + PSNR | — | — | — | —
Base + L1 + SSIM (without PSNR) | — | — | — | —
Base + L1 + PSNR (without SSIM) | — | — | — | —
Base + SSIM + PSNR (without L1) | — | — | — | —
Base + L1 + SSIM + PSNR (all) | — | — | — | —

Table 2
Results of Different Input Sizes With and Without Input-size Flexibility Fine-tuning on the ICIG2019 Dataset.

Training mode | MSE ↓ | NRMSE ↓ | PSNR ↑ | SSIM ↑
256 × 256 | 287.9 | 0.116 | 24.58 | 0.904
256 × 256 + IFF | 245.1 | 0.108 | 25.04 | 0.905
368 × 544 | 300.8 | 0.118 | 24.53 | 0.898
368 × 544 + IFF | — | — | — | —
512 × 512 | 317.1 | 0.116 | 24.67 | 0.891
512 × 512 + IFF | 223.4 | — | — | —
From Table 1, it can be seen that the best performance is obtained when all the losses are used. The performances of $\mathcal{L}_{Base} + \mathcal{L}_{L_1}$, $\mathcal{L}_{Base} + \mathcal{L}_{SSIM}$ and $\mathcal{L}_{Base} + \mathcal{L}_{PSNR}$ are much better than that of $\mathcal{L}_{Base}$, which verifies the effectiveness of each loss function when combined with $\mathcal{L}_{Base}$. The performance without $\mathcal{L}_{PSNR}$ is much better than those of $\mathcal{L}_{Base} + \mathcal{L}_{L_1}$ and $\mathcal{L}_{Base} + \mathcal{L}_{SSIM}$, the performance without $\mathcal{L}_{SSIM}$ is much better than those of $\mathcal{L}_{Base} + \mathcal{L}_{L_1}$ and $\mathcal{L}_{Base} + \mathcal{L}_{PSNR}$, and the performance without $\mathcal{L}_{L_1}$ is much better than those of $\mathcal{L}_{Base} + \mathcal{L}_{SSIM}$ and $\mathcal{L}_{Base} + \mathcal{L}_{PSNR}$, which verifies the effectiveness of combining any two loss functions with $\mathcal{L}_{Base}$. Moreover, we notice that the MSE and PSNR values of $\mathcal{L}_{Base} + \mathcal{L}_{PSNR}$ are much better than those of $\mathcal{L}_{Base} + \mathcal{L}_{L_1}$ and $\mathcal{L}_{Base} + \mathcal{L}_{SSIM}$, which shows that the proposed PSNR loss is much better than the $L_1$ loss and the SSIM loss when each is combined with $\mathcal{L}_{Base}$.

The second ablation experiment trains the generator network with fixed input sizes and then fine-tunes it with input-size flexible images, aiming to verify the effectiveness of input-size flexibility. The experimental results are shown in Table 2. In Table 2, IFF is the abbreviation of input-size flexibility fine-tuning, and the training mode indicates the sizes of the training inputs. The test results are all obtained in input-size flexible mode, i.e., the output size of an image equals the size of the input image. From Table 2, we can see that better performance is obtained with input-size flexibility fine-tuning. Moreover, the best MSE is obtained with the training mode of 368 × 544 + IFF, and the best PSNR and SSIM are obtained with the training mode of 512 × 512 + IFF. The size of 368 × 544 is the mean size of the training images (368 is the mean of the heights and 544 is the mean of the widths). We can also see that when IFF is not used, the best MSE is obtained by the model trained with an input size of 256 × 256.

We compare the proposed method with the state-of-the-art CNN-based dehazing methods AOD-Net [17], MSCNN [28], GMAN [24], DCPDN [38], De-cGAN [20], GFN [29] and the recently proposed FAMED-Net [40]. The comparison results on the ICIG2019 dataset are shown in Table 3.

Table 3
Comparison With the State-of-the-art Methods on the Validation Set of the ICIG2019 Dataset.

Methods | MSE ↓ | NRMSE ↓ | PSNR ↑ | SSIM ↑
MSCNN [28] | 1292 | 0.250 | 17.33 | 0.810
DCPDN [38] | 971.2 | 0.218 | 19.06 | 0.848
GFN [29] | 766.7 | 0.176 | 20.96 | 0.828
De-cGAN [20] | 764.4 | 0.174 | 21.02 | 0.857
AOD-Net [17] | 646.8 | 0.175 | 20.73 | 0.868
GMAN [24] | 290.2 | 0.118 | 24.37 | 0.887
GMAN fine-tuned | 287.1 | 0.118 | 24.43 | 0.891
FAMED-Net [40] | 249.5 | 0.107 | 25.17 | 0.909
UR-Net-7 | — | — | — | —
multi-scale cGAN | — | — | — | —

In Table 3, GMAN fine-tuned denotes the GMAN model fine-tuned on the ICIG2019 dataset starting from the pre-trained GMAN model. From Table 3, we can see that the proposed UR-Net-7 is much better than the previously proposed methods in terms of MSE, NRMSE and PSNR. The best SSIM is obtained by the proposed multi-scale cGAN, followed by FAMED-Net. Moreover, we notice that after the GMAN model is fine-tuned on the ICIG2019 dataset (GMAN fine-tuned), its performance is better than that of GMAN without fine-tuning.

Among these compared methods, both AOD-Net and GMAN are input-size flexible at the test stage, but not at the training stage; one reason is that their batch size must be greater than 1 to obtain good performance. Their performance would drop a lot if the batch size were set to 1 for input-size flexibility at the training stage, because batch normalization relies on a large batch size to achieve good performance. FAMED-Net is designed based on AOD-Net and can also be changed to input-size flexible mode at the test stage; because the late fusion idea is adopted, better performance can be obtained. Different from these works, the proposed model is a GAN-based input-size flexible model, which is input-size flexible at both the training and test stages. Moreover, the proposed multi-scale cGAN obtains the best single image dehazing performance under the evaluations in this paper, which also proves the effectiveness of image late fusion. Different from the previous fusion idea, the proposed fusion framework is based on cGAN, i.e., it is a cGAN fusion framework.

Table 4 and Table 5 report the comparisons on the outdoor subset of SOTS and on the synthetic subset of HSTS of the RESIDE dataset.
Table 4
Comparison With the State-of-the-art Methods on the Outdoor Subset of the SOTS Dataset.

Methods | MSE ↓ | NRMSE ↓ | PSNR ↑ | SSIM ↑
MSCNN [28] | 812.2 | 0.202 | 20.02 | 0.880
DCPDN [38] | 828.1 | 0.204 | 19.93 | 0.858
GFN [29] | 676.2 | 0.172 | 21.47 | 0.849
De-cGAN [20] | 611.1 | 0.160 | 21.96 | 0.868
AOD-Net [17] | 693.0 | 0.185 | 20.47 | 0.899
FAMED-Net [40] | 199.6 | 0.098 | 26.17 | —
UR-Net-7 | — | — | — | —
multi-scale cGAN | — | — | — | —

Table 5
Comparison With the State-of-the-art Methods on the Synthetic Subset of the HSTS Dataset.

Methods | MSE ↓ | NRMSE ↓ | PSNR ↑ | SSIM ↑
MSCNN [28] | 1164.2 | 0.233 | 18.47 | 0.813
DCPDN [38] | 841.5 | 0.197 | 20.21 | 0.852
GFN [29] | 527.6 | 0.147 | 22.83 | 0.887
De-cGAN [20] | 498.5 | 0.145 | 22.85 | 0.869
AOD-Net [17] | 711.1 | 0.181 | 20.56 | 0.887
FAMED-Net [40] | 168.9 | 0.089 | 26.68 | —
UR-Net-7 | — | — | — | —
multi-scale cGAN | — | — | — | —
From Table 4 and Table 5, it can be seen that the proposed multi-scale cGAN obtains the best results in terms of MSE, NRMSE and PSNR. For SSIM, the best value is obtained by FAMED-Net (in both Table 4 and Table 5) and by the multi-scale cGAN (in Table 4). Although the SSIM of the proposed multi-scale cGAN is 0.03% lower than that of FAMED-Net in Table 5, the MSE, NRMSE and PSNR values of the proposed multi-scale cGAN are much better than those of FAMED-Net; in particular, the PSNR is 1.34 dB higher.

Fig. 4 shows the subjective comparisons on synthetic hazy images from the ICIG2019 validation set.
Table 6
Comparison With the State-of-the-art Methods on the Indoor Subset of the SOTS Dataset.

Methods | MSE ↓ | NRMSE ↓ | PSNR ↑ | SSIM ↑
MSCNN [28] | 2097.5 | 0.383 | 16.00 | 0.780
AOD-Net [17] | 1144.8 | 0.271 | 19.07 | 0.824
GFN [29] | 443.0 | 0.175 | 22.48 | 0.888
FAMED-Net [40] | 361.4 | 0.153 | 23.63 | —
UR-Net-7 | — | — | — | —
multi-scale cGAN | — | — | — | —
From these dehazed images, we can see that our methods (especially the multi-scale cGAN) perform relatively well on the ground, the clouds and the sky.

Table 6 reports the comparison on the indoor subset of SOTS of the RESIDE dataset. According to Table 6, the best SSIM is obtained by FAMED-Net, while the best MSE, NRMSE and PSNR values are obtained by the proposed UR-Net-7 and multi-scale cGAN.
As analyzed in Section 3.1, the general atmospheric scattering model can be simplified to Eq. (6). According to Eq. (6), a hazy image ($I$) can be seen as a clear image ($J$) plus a content-related noise image ($M$), which is a general additive noise model. For CNN-based image denoising or restoration, most noise models can be transformed into an additive noise model; e.g., a multiplicative noise model can be transformed into an additive one by a logarithmic transformation. Thus, the proposed input-size flexible cGAN is a general image restoration model.

The haze map ($M$) in Eq. (6) can be thought of as a kind of content-related noise in a hazy image. Visualizations of $M$ are shown in Fig. 5. The haze maps in Fig. 5 are the transformed results of $M$, using the same transformation as for $J_g$, i.e., adding 1 and multiplying by the mean. From Fig. 5, it can be seen that the haze maps are related to the color, the illumination and the concentration of the haze, as well as to the content of the corresponding hazy images. Similar to the haze of real scenes, there is no specific rule for these generated haze maps.

Considering applicability, image dehazing can usually serve as a preprocessing step for other computer vision tasks. The proposed image dehazing algorithm can be used to assist object detection, as shown in Fig. 6, which compares the object detection results before and after dehazing with the proposed UR-Net-7. The detection algorithm is SNIPER [31] (https://github.com/mahyarnajibi/SNIPER); we only use the released code and the pre-trained model for detection. From the detection results of the two images in the middle of Fig. 6, we can see that more objects are detected after dehazing with UR-Net-7 (the objects in the red rectangles).
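As a quick check of the multiplicative-to-additive argument above, a tiny NumPy sketch (ours):

```python
import numpy as np

# A multiplicative degradation I = J * t becomes additive in the log
# domain: log I = log J + log t, matching the additive form of Eq. (6).
rng = np.random.default_rng(1)
J = rng.uniform(0.2, 0.9, size=(8, 8))   # clear image
t = rng.uniform(0.3, 1.0, size=(8, 8))   # multiplicative "noise" (transmission)
I = J * t                                # degraded image
assert np.allclose(np.log(I), np.log(J) + np.log(t))
```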
Figure 4:
Subjective comparisons between the proposed methods and the most related state-of-the-art methods on synthetic hazy images from the ICIG2019 validation set. Best viewed in color.
Figure 5:
The visualizations of the haze map $M$ and the corresponding dehazed results of UR-Net-7 (rows: haze input, haze map, dehazing output).
Figure 6:
The detection results before (first row) and after (second row) dehazing with UR-Net-7.
5. Conclusions
In this paper, we developed an input-size flexible cGAN trained with a multi-loss function for single image dehazing; the experimental results proved the effectiveness of both the input-size flexibility and the multi-loss optimization. Moreover, a multi-scale image restoration fusion framework based on cGAN was proposed and verified for single image dehazing. Experimental results showed that we obtained state-of-the-art single image dehazing performance on the ICIG2019 and RESIDE datasets.

Our basic idea is to realize image restoration based on Eq. (6); thus, the proposed framework can also be applied to other image restoration tasks, such as image denoising, image deblurring and image fusion [7], which will be studied in the near future.
References

[1] Ancuti, C.O., Ancuti, C., Hermans, C., Bekaert, P., 2010. A fast semi-inverse approach to detect and remove the haze from a single image, in: Computer Vision - ACCV 2010 - 10th Asian Conference on Computer Vision, Queenstown, New Zealand, November 8-12, 2010, Revised Selected Papers, Part II, pp. 501–514.
[2] Berman, D., Treibitz, T., Avidan, S., 2016. Non-local image dehazing, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 1674–1682.
[3] Cai, B., Xu, X., Jia, K., Qing, C., Tao, D., 2016. Dehazenet: An end-to-end system for single image haze removal. IEEE Trans. Image Processing 25, 5187–5198.
[4] Engin, D., Genç, A., Ekenel, H.K., 2018. Cycle-dehaze: Enhanced cyclegan for single image dehazing, in: 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 825–833.
[5] Gao, Y., Hu, H., Li, B., Guo, Q., Pu, S., 2019. Detail preserved single image dehazing algorithm based on airlight refinement. IEEE Trans. Multimedia 21, 351–362.
[6] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.C., Bengio, Y., 2014. Generative adversarial nets, in: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 2672–2680.
[7] Guo, X., Nie, R., Cao, J., Zhou, D., Mei, L., He, K., 2019. Fusegan: Learning to fuse multi-focus image via conditional generative adversarial network. IEEE Trans. Multimedia 21, 1982–1996.
[8] He, K., Sun, J., Tang, X., 2011. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 33, 2341–2353.
[9] He, K., Zhang, X., Ren, S., Sun, J., 2015a. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1904–1916.
[10] He, K., Zhang, X., Ren, S., Sun, J., 2015b. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1904–1916.
[11] He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778.
[12] Hinton, G.E., Salakhutdinov, R.R., 2006. Reducing the dimensionality of data with neural networks. Science 313, 504–507.
[13] Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269.
[14] Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pp. 448–456.
[15] Isola, P., Zhu, J., Zhou, T., Efros, A.A., 2017. Image-to-image translation with conditional adversarial networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 5967–5976.
[16] Kingma, D.P., Ba, J., 2015. Adam: A method for stochastic optimization, in: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
[17] Li, B., Peng, X., Wang, Z., Xu, J., Feng, D., 2017.
Aod-net: All-in-one dehazing network, in: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 4780–4788.
[18] Li, B., Peng, X., Wang, Z., Xu, J., Feng, D., 2018a. End-to-end united video dehazing and detection, in: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pp. 7016–7023.
[19] Li, B., Ren, W., Fu, D., Tao, D., Feng, D., Zeng, W., Wang, Z., 2019. Benchmarking single-image dehazing and beyond. IEEE Trans. Image Processing 28, 492–505.
[20] Li, R., Pan, J., Li, Z., Tang, J., 2018b. Single image dehazing via
conditional generative adversarial network, in: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 8202–8211.
[21] Liu, Q., Gao, X., He, L., Lu, W., 2018a. Single image dehazing with depth-aware non-local total variation regularization. IEEE Trans. Image Processing 27, 5178–5191.
[22] Liu, W., Hou, X., Duan, J., Qiu, G., 2019a. End-to-end single image fog removal using enhanced cycle consistent adversarial networks. arXiv preprint, abs/1902.01374.
[23] Liu, Y., Zhao, G., Gong, B., Li, Y., Raj, R., Goel, N., Kesav, S., Gottimukkala, S., Wang, Z., Ren, W., Tao, D., 2018b. Improved techniques for learning to dehaze and beyond: A collective study. arXiv preprint, abs/1807.00202.
[24] Liu, Z., Xiao, B., Alrabeiah, M., Wang, K., Chen, J., 2019b. Single image dehazing with a generic model-agnostic convolutional neural network. IEEE Signal Process. Lett. 26, 833–837.
[25] Mirza, M., Osindero, S., 2014. Conditional generative adversarial nets. arXiv preprint, abs/1411.1784.
[26] Narasimhan, S.G., Nayar, S.K., 2003. Contrast restoration of weather degraded images. IEEE Trans. Pattern Anal. Mach. Intell. 25, 713–724.
[27] Qu, Y., Chen, Y., Huang, J., Xie, Y., 2019. Enhanced pix2pix dehazing network, in: 2019 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019.
[28] Ren, W., Liu, S., Zhang, H., Pan, J., Cao, X., Yang, M., 2016. Single image dehazing via multi-scale convolutional neural networks, in: Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II, pp. 154–169.
[29] Ren, W., Ma, L., Zhang, J., Pan, J., Cao, X., Liu, W., Yang, M., 2018. Gated fusion network for single image dehazing, in: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 3253–3261.
[30] Simonyan, K., Zisserman, A., 2015. Very deep convolutional networks for large-scale image recognition, in: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
[31] Singh, B., Najibi, M., Davis, L.S., 2018. SNIPER: efficient multi-scale training, in: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pp. 9333–9343.
[32] Song, Y., Li, J., Wang, X., Chen, X., 2018. Single image dehazing using ranking convolutional neural network. IEEE Trans. Multimedia 20, 1548–1560.
[33] Tan, R.T., 2008. Visibility in bad weather from a single image, in: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 24-26 June 2008, Anchorage, Alaska, USA.
[34] Tang, K., Yang, J., Wang, J., 2014. Investigating haze-relevant features in a learning framework for image dehazing, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23-28, 2014, pp. 2995–3002.
[35] Tang, Z., Peng, X., Li, K., Metaxas, D.N., 2019. Towards efficient u-nets: A coupled and quantized approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–1.
[36] Xu, Z., Yang, X., Li, X., Sun, X., 2018. The effectiveness of instance normalization: a strong baseline for single image dehazing. arXiv preprint, abs/1805.03305.
[37] Yang, X., Xu, Z., Luo, J., 2018.
Towards perceptual image dehazing by physics-based disentanglement and adversarial training, in: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pp. 7485–7492.
[38] Zhang, H., Patel, V.M., 2018. Densely connected pyramid dehazing network, in: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 3194–3203.
[39] Zhang, H., Sindagi, V., Patel, V.M., 2018. Multi-scale single image dehazing using perceptual pyramid deep network, in: 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 902–911.
[40] Zhang, J., Tao, D., 2020. Famed-net: A fast and accurate multi-scale end-to-end dehazing network. IEEE Trans. Image Process. 29, 72–84.
[41] Zhao, H., Gallo, O., Frosio, I., Kautz, J., 2017. Loss functions for image restoration with neural networks. IEEE Trans. Computational Imaging 3, 47–57.
[42] Zhu, H., Peng, X., Chandrasekhar, V., Li, L., Lim, J., 2018. Dehazegan: When image dehazing meets differential programming, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pp. 1234–1240.
[43] Zhu, J., Park, T., Isola, P., Efros, A.A., 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks, in: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2242–2251.
[44] Zhu, Q., Mai, J., Shao, L., 2015. A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Processing 24, 3522–3533.