Two-Step Image Dehazing with Intra-domain and Inter-domain Adaptation
Xin Yi, Bo Ma†, Yulin Zhang, Longyao Liu, JiaHao Wu

The authors are with the Beijing Laboratory of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China. †Corresponding author.

Abstract—Recently, the image dehazing task has achieved remarkable progress with convolutional neural networks. However, those approaches mostly treat haze removal as a one-to-one problem and ignore the intra-domain gap, so the haze distribution shift among images of the same scene is not handled well. Also, dehazing models trained on labeled synthetic datasets mostly suffer from performance degradation when tested on unlabeled real datasets due to the inter-domain gap. Although some previous works apply a translation network to bridge the synthetic domain and the real domain, the intra-domain gap still exists and affects the inter-domain adaptation. In this work, we propose a novel Two-Step Dehazing Network (TSDN) to minimize both the intra-domain gap and the inter-domain gap. First, we propose a multi-to-one dehazing network to eliminate the haze distribution shift of images within the synthetic domain. Then, we conduct an inter-domain adaptation between the synthetic domain and the real domain based on the aligned synthetic features. Extensive experimental results demonstrate that our framework performs favorably against the state-of-the-art algorithms on both the synthetic datasets and the real datasets.
Index Terms—Image dehazing, intra-domain adaptation, inter-domain adaptation.
I. INTRODUCTION

The image dehazing task aims to recover clear images from their corresponding hazy images. The whole procedure can be formulated as

\[ I(x) = J(x)\,t(x) + A\,(1 - t(x)) \tag{1} \]

where $I(x)$ denotes the hazy image, $J(x)$ denotes the clear image, $x$ denotes a pixel position in the image, $A$ denotes the global atmospheric light, and $t(x)$ denotes the transmission map. In the homogeneous situation, the transmission map can be represented as $t(x) = e^{-\beta d(x)}$, where $\beta$ and $d(x)$ are the atmosphere scattering parameter and the scene depth, respectively.
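To make Eq. (1) concrete, the following sketch (ours, not part of the paper's pipeline) synthesizes a hazy image from a clear image and a depth map under the homogeneous scattering model; the values of beta and A are illustrative.

import numpy as np

def synthesize_haze(clear, depth, beta=1.0, A=0.8):
    """Apply I(x) = J(x) t(x) + A (1 - t(x)) with t(x) = exp(-beta d(x)).

    clear: H x W x 3 image in [0, 1]; depth: H x W scene depth map.
    """
    t = np.exp(-beta * depth)[..., None]   # transmission map t(x)
    return clear * t + A * (1.0 - t)       # hazy image I(x)

Sampling different beta values for the same clear image and depth map is exactly what produces the same scene under different haze distributions, i.e., the intra-domain shift studied below.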
Obviously, image dehazing is an ill-posed problem, so many researchers try to transform it into a well-posed problem by estimating the atmospheric light intensity and the transmission map via certain priors [2], [3], [4]. However, those methods are not robust and tend to fail in some scenes, especially where the color of objects is similar to the atmospheric light. More recently, to avoid hand-crafted priors, other researchers apply convolutional neural networks to directly predict the transmission map and the atmospheric light intensity from training data [5], [6], [7]. However, those methods split the whole dehazing framework into two parts, without considering the inherent connection between them. In addition, inaccurate estimation of the transmission map and the atmospheric light intensity would lead to undesirable results.

Fig. 1: The dehazed results of the same scene images with haze distribution shift. Our method generates cleaner images than MSBDN-DFF. More importantly, our method avoids a performance gap when faced with the intra-domain gap. (a) Hazy images. (b) Results of MSBDN-DFF [1]. (c) Our results.

To overcome those problems, end-to-end dehazing frameworks [8], [9], [10], [1] have been proposed to predict clear images directly from hazy inputs. However, those methods mainly focus on the one-to-one dehazing task, i.e., a dehazed result is generated from a single hazy image. The intra-domain gap, i.e., the haze distribution shift among hazy images of the same scene, receives little attention. Therefore, when the haze distribution changes, the performance of those methods changes simultaneously. As shown in Figure 1, MSBDN-DFF [1] suffers from a performance drop due to the intra-domain gap. In addition, previous approaches mostly train models on labeled synthetic datasets because it is difficult to obtain sufficient hazy images with ground truth in the real world. Naturally, those approaches perform poorly on real hazy images due to the inter-domain gap. To address this issue, Shao et al. [11] attempt to bridge the synthetic domain and the real domain by applying a bidirectional translation network. Nevertheless, they only deal with the inter-domain gap between the synthetic domain and the real domain while ignoring the intra-domain gap.

In order to improve the robustness of the model under different haze distributions and its generalization across different domains, we propose a two-step image dehazing network (TSDN), which comprises an intra-domain adaptation step and an inter-domain adaptation step. First, we extract features of hazy images which are in the same scene but with different haze distributions. Then, we pick the base feature according to loss-based deep supervision and align the other features to the base feature. Thus, the distribution shift of images within the synthetic domain is eliminated in the feature space. Finally, we apply a similar alignment in the inter-domain adaptation step to close the inter-domain gap. For image dehazing on synthetic datasets and real datasets, our proposed framework achieves state-of-the-art performance against previous algorithms. The contributions of this work are summarized as follows:

• We propose a novel intra-domain adaptation on the feature layer which closes the haze distribution gap of same scene images.
• We propose an inter-domain adaptation for image dehazing on real hazy images based on the aligned intra-domain features.
• We conduct extensive experiments and comprehensive ablation studies on the synthetic datasets and the real datasets which validate the effectiveness of our proposed method.
• Our domain adaptation module can be integrated into existing dehazing frameworks for performance improvement.
II. RELATED WORKS
In this section, we briefly review image dehazing, domain adaptation, and deeply-supervised learning methods, which are related to our work.
A. Image Dehazing
Previous image dehazing approaches can be divided into two mainstreams: one is based on priors and the other is based on learning.
1) Prior-based methods:
Those methods recover clear images through statistical priors, e.g., the albedo of the scene in [12]. Recently, researchers have explored different priors for image dehazing [2], [3], [13], [4]. Specifically, based on the observation that clear images have higher contrast than hazy images, Tan et al. [2] enhance the visibility of hazy images by maximizing local contrast. He et al. [3] propose the dark channel prior (DCP), i.e., the intensity of pixels in haze-free patches is very low in at least one color channel, to achieve image dehazing. Besides, based on the generic regularity that small image patches typically exhibit a one-dimensional distribution in the RGB color space, Fattal [13] proposes a color-lines approach to recover the scene transmission. Zhu et al. [4] propose the color attenuation prior to recover the scene depth of the hazy image with a supervised learning method. All the above methods rely heavily on hypothetical priors. However, those priors tend to lose effectiveness in complex scenes, leading to performance drops.
2) Learning-based methods:
Different from the above methods, learning-based methods use convolutional neural networks to recover clear images from hazy images directly [5], [14], [8], [9], [10], [11], [1]. Specifically, an end-to-end system for transmission estimation is proposed in [5]. Ren et al. [14] design a multi-scale neural network for learning transmission maps from hazy images in a coarse-to-fine manner. Qu et al. [9] propose a pix2pix model with an enhancer block which reinforces the dehazing effect in both color and details. A multi-scale boosted decoder with dense feature fusion is proposed to restore clear images in [1]. However, those methods only focus on one-to-one image dehazing while ignoring the intra-domain gap.
B. Domain Adaptation
The purpose of domain adaptation is to eliminate the distribution difference between a labeled source domain and a target domain. Recently, numerous domain adaptation approaches have been proposed, including aligning the source domain and target domain distributions, generating a mapping between the two domains, or creating ensemble models [15]. The alignment-based methods can be divided into pixel-level alignment [16], [17], [18], [19], [20] and feature-level alignment [21], [22], [23]. The feature-level alignment methods mostly try to produce feature maps with the same distribution from images with different distributions, while the pixel-level alignment methods usually learn a transformation in the pixel space from one domain to the other [16].

With the introduction of GANs [24], adversarial learning began to be used in other computer vision tasks, e.g., image generation [25], [26], [27] and image-to-image translation [18], [28], [29], [30]. Among them, adversarial-based unsupervised domain adaptation (UDA) utilizes adversarial learning to learn domain-invariant features. This framework usually consists of a generator and a discriminator, which play a min-max game to obtain the distribution migration from the source domain to the target domain.

In the image dehazing field, Shao et al. [11] propose a bidirectional translation network to bridge the domain gap. However, they only consider the inter-domain gap. In this work, we further minimize the intra-domain gap to achieve extra performance gains.
C. Deeply Supervised Learning
Deeply supervised learning is proposed in [31]. The authors apply classifiers on the deep feature layers of neural networks to promote better convergence, and draw the conclusion that more discriminative features will improve the final performance of the classifier. Recently, deeply supervised learning has been widely used in image classification [32], semantic segmentation [33], and human pose estimation [34]. In this work, we append an auxiliary supervision branch on the feature layer. Unlike classification tasks, which aim to make features more discriminative, we aim to make features of the same scene images less discriminative, i.e., to eliminate the haze distribution shift in feature space. Furthermore, our supervised learning is based on the dehazing loss so that we can ensure all features are aligned to the best one.
Fig. 2: Illustration of our proposed method. The overall framework consists of two main modules, the dehazing module and the domain adaptation module. The dehazing module, which comprises a feature extractor G and a reconstruction network, aims to recover clear images from hazy images, as depicted by the red and green arrows. The domain adaptation module is divided into two steps, intra-domain adaptation and inter-domain adaptation, aiming to close the intra-domain gap and the inter-domain gap. In the intra-domain phase, we propose a loss-based deep supervision and an intra-domain feature alignment to eliminate the haze distribution shift of same scene images. In the inter-domain phase, we apply inter-domain feature alignment based on the aligned synthetic feature to close the inter-domain gap.
Fig. 3: Illustration of the intra-domain adaptation. First, we apply loss-based deep supervision to select the base feature F_b from all features of the same scene images. Then, we align all other features to the base feature in order to eliminate the haze distribution shift. This alignment is achieved by the intra-domain discriminator D_intra and the GRL [35] module, where D_intra tries to distinguish F_b and the GRL module reverses the gradient so that the feature extractor G will generate similarly distributed features to confuse D_intra.

III. METHOD
In this section, we introduce our overall method in Section III-A, the intra-domain adaptation in Section III-B, the inter-domain adaptation in Section III-C, and the loss functions in Section III-D.
A. Method Overview
The overall framework comprises a dehazing module and a domain adaptation module, as illustrated in Figure 2. The goal of the dehazing module is to recover clear images from hazy images, and the goal of the domain adaptation module is to achieve intra-domain adaptation and inter-domain adaptation. We first minimize the intra-domain gap and then the inter-domain gap. In the intra-domain phase, we take a set of hazy images of the same scene but with different haze distributions as input and get their corresponding features with the feature extractor G. We align all other features to the base feature F_b with the intra-domain discriminator D_intra. In the inter-domain phase, we take real hazy images and synthetic hazy images as input and get their features with the same feature extractor G. We align real domain features to synthetic domain features through the inter-domain discriminator D_inter.

B. Intra-domain Adaptation
Generally, a clear image corresponds to multiple hazy images with different haze distributions. The goal of the intra-domain adaptation is to align those hazy images and improve the performance on each image. To this end, we apply adversarial alignment in feature space via the intra-domain discriminator D_intra.

Suppose we have $n$ hazy images $\{x_i \in \mathbb{R}^{H \times W \times 3}\}_{i=1}^{n}$ that belong to the same scene; we can extract $n$ features $\{F_i \in \mathbb{R}^{h \times w \times k}\}_{i=1}^{n}$ with a designed feature extractor network G. To find the base feature F_b to which all other features are aligned, we apply deeply supervised learning based on the dehazing loss. Specifically, we input all features into the reconstruction module to get their corresponding haze-free predictions and dehazing losses $\mathcal{L}_{sys}$. According to those losses, we pick the feature with the lowest dehazing loss as the base feature F_b. Then, the base feature F_b along with the other features F_j ($j = 1, 2, ..., n$ and $j \neq b$) are fed into a fully-convolutional network D_intra to generate intra-domain classification score maps. The score map has the same spatial resolution as the feature, where each pixel position represents the intra-domain prediction of the same position in the feature. The loss function between the predicted classification score map and the label is the binary cross-entropy loss, which can be written as

\[ \mathcal{L}_{intra} = -\sum_{j \neq b} \sum_{h,w} \Big[ y \log\big(D_{intra}(F_b^{(h,w)})\big) + (1 - y) \log\big(1 - D_{intra}(F_j^{(h,w)})\big) \Big] \tag{2} \]

where $(h, w)$ denotes a pixel position in the feature and $y$ denotes the intra-domain label. In our work, we set the labels $y$ of source and target to 1 and 0, respectively. Correspondingly, the base feature F_b is the intra-domain source and the other features F_j are the intra-domain targets. For the discriminator D_intra, we optimize it using the loss function in Eq. (2). For the feature extractor G, we apply a gradient reversal layer (GRL) [35] to perform adversarial learning. The pipeline of the deeply supervised learning and the intra-domain adaptation is illustrated in Figure 3. The discriminator D_intra tries to distinguish F_b from F_j ($j \neq b$) while the feature extractor G tries to generate similarly distributed F_b and F_j to confuse D_intra. Thus, the haze distribution shift is eliminated in feature space.
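As an illustration of this training step, the sketch below implements a gradient reversal layer and the loss-based base-feature selection in PyTorch. It is a minimal reading of the description above, not the authors' code: the names G (feature extractor), R (reconstruction network), and D_intra, the use of BCEWithLogitsLoss for Eq. (2), and the per-image L1 dehazing loss are our assumptions.

import torch
from torch.autograd import Function

class GradReverse(Function):
    """GRL [35]: identity in the forward pass, -coeff * grad in the backward pass."""
    @staticmethod
    def forward(ctx, x, coeff):
        ctx.coeff = coeff
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.coeff * grad_output, None

def intra_domain_step(G, R, D_intra, hazy_set, clear, bce, coeff=0.1):
    """hazy_set: n hazy views of one scene; clear: the shared ground truth."""
    feats = [G(x) for x in hazy_set]                           # features F_1..F_n
    losses = [torch.abs(R(f) - clear).mean() for f in feats]   # dehazing (L1) losses
    b = int(torch.argmin(torch.stack(losses)))                 # base = lowest loss
    l_sys = torch.stack(losses).mean()
    l_intra = 0.0
    for j, f in enumerate(feats):
        if j == b:
            continue
        # Base feature is the intra-domain source (label 1), others are targets (label 0).
        score_b = D_intra(GradReverse.apply(feats[b], coeff))
        score_j = D_intra(GradReverse.apply(f, coeff))
        l_intra = l_intra + bce(score_b, torch.ones_like(score_b)) \
                          + bce(score_j, torch.zeros_like(score_j))
    return l_sys, l_intra

A single backward pass through this loss updates D_intra normally while the GRL pushes G in the opposite direction, so G learns to produce similarly distributed features that confuse D_intra.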
C. Inter-domain Adaptation

The goal of the inter-domain adaptation is to close the inter-domain gap between the synthetic domain and the real domain. In order to prevent the intra-domain alignment process from affecting the inter-domain alignment, we apply the inter-domain adaptation after we finish the intra-domain alignment.

We perform the inter-domain adaptation in feature space by adversarial learning. Particularly, given a synthetic hazy image $x_s \in \mathbb{R}^{H \times W \times 3}$ and a real hazy image $x_r \in \mathbb{R}^{H \times W \times 3}$, we extract their features $F_s \in \mathbb{R}^{h \times w \times k}$ and $F_r \in \mathbb{R}^{h \times w \times k}$ with the extractor network G, respectively. Note that F_s is already aligned within the synthetic domain. Then, we obtain domain classification prediction maps of F_s and F_r with a fully-convolutional discriminator network D_inter. The prediction map has the same spatial shape as the input, and each position on it denotes the domain label of the same position on the input. We apply the binary cross-entropy loss between the classification score map and the label, which can be written as

\[ \mathcal{L}_{inter} = -\sum_{h,w} \Big[ z \log\big(D_{inter}(F_s^{(h,w)})\big) + (1 - z) \log\big(1 - D_{inter}(F_r^{(h,w)})\big) \Big] \tag{3} \]

where $(h, w)$ denotes a pixel position in the feature and $z$ denotes the inter-domain label. We set the synthetic domain as source and the real domain as target, where the source label is 1 and the target label is 0.
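The paper specifies that the discriminators are fully-convolutional networks whose score maps keep the spatial resolution of the input feature, but it does not give their architecture. The following is one plausible sketch; the layer count and width are our assumptions.

import torch.nn as nn

class FCDiscriminator(nn.Module):
    """Maps an h x w x k feature to an h x w map of per-position domain logits."""
    def __init__(self, k=256, width=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(k, width, kernel_size=3, padding=1),   # stride 1 keeps h x w
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, 1, kernel_size=3, padding=1),   # one logit per position
        )

    def forward(self, f):
        return self.net(f)

The same shape of network can serve as both D_intra and D_inter, since Eq. (2) and Eq. (3) differ only in which pair of features is labeled source and target.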
D. Loss Functions

Given a synthetic dataset $D_{sys}$ and a real dataset $D_{real}$, where $D_{sys}$ consists of a hazy subset $I_{haze} = \{x_h\}_{h=1}^{N_h}$ and a clear subset $I_{clear} = \{x_c\}_{c=1}^{N_c}$ while $D_{real}$ only contains a hazy set $J_{haze} = \{x_r\}_{r=1}^{N_r}$, we adopt the following loss functions in our framework.
1) Domain Adversarial Losses:
As described in Sections III-B and III-C, the domain adversarial losses are generated by D_intra and D_inter. On the scale of the entire dataset, the intra-domain loss can be written as

\[ \mathcal{L}_1 = \sum_{c=1}^{N_c} \mathcal{L}_{intra} \tag{4} \]

and the inter-domain loss can be written as

\[ \mathcal{L}_2 = \sum_{i=1}^{N_i} \mathcal{L}_{inter} \tag{5} \]

where $N_i$ denotes the minimum of $N_c$ and $N_r$.
2) Image Dehazing Losses:
Those losses measure the difference between the predicted images and the ground truth. In the synthetic domain, we apply an L1 loss to make sure the dehazed results are close to the clear images. Since a clear image corresponds to multiple hazy images, we further define the hazy subset as $I_{haze} = \{x_h^{(c)},\, c = 1, 2, ..., N_c\}_{h=1}^{N_h}$, where $x_h^{(c)}$ represents the $h$-th hazy image corresponding to the $c$-th clear image. Thus, the predicted clear images can be defined as $I_{pre} = \{y_h^{(c)},\, c = 1, 2, ..., N_c\}_{h=1}^{N_h}$. The dehazing loss between $I_{pre}$ and $I_{clear}$ is defined as

\[ \mathcal{L}_{sys} = \frac{1}{N_h} \sum_{h=1}^{N_h} \big\| y_h^{(c)} - x_c \big\|_1 \tag{6} \]

Besides, in order to improve the performance of our model in the real domain, we add the dark channel prior loss [3] and the total variation loss on the predicted real images, following Shao [11]. The dark channel of an image is defined as

\[ I_{dark}(x) = \min_{p \in P(x)} \Big( \min_{c \in \{r,g,b\}} I^c(p) \Big) \tag{7} \]

where $I$ is an image, $c$ is a color channel of $I$, $x$ represents a pixel position of $I$, and $P(x)$ represents a local patch centered at $x$. We divide an image into $n$ patches and define the overall dark channel loss as

\[ \mathcal{L}_{dc} = \frac{1}{n} \sum_{x} \big\| I_{dark}(x) \big\|_1 \tag{8} \]

The total variation loss is defined as

\[ \mathcal{L}_{tv} = \frac{1}{w} \sum_{i} \| I_{i+1,j} - I_{i,j} \|_1 + \frac{1}{h} \sum_{j} \| I_{i,j+1} - I_{i,j} \|_1 \tag{9} \]

where $i$ and $j$ denote the horizontal and vertical positions in an image, respectively, and $w$ and $h$ are the width and height of the image. The image dehazing loss in the real domain can then be written as

\[ \mathcal{L}_{real} = \lambda_{dc} \mathcal{L}_{dc} + \lambda_{tv} \mathcal{L}_{tv} \tag{10} \]
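A minimal sketch of the two real-domain losses, assuming B x C x H x W tensors in [0, 1]; the patch size of 15 is a common choice for the dark channel prior, not a value stated in the paper.

import torch
import torch.nn.functional as F

def dark_channel_loss(img, patch=15):
    """L_dc of Eq. (8): mean L1 norm of the dark channel I_dark of Eq. (7)."""
    dark = img.min(dim=1, keepdim=True).values        # min over color channels
    # min over a local patch, implemented as -maxpool(-x) so the output keeps H x W
    dark = -F.max_pool2d(-dark, patch, stride=1, padding=patch // 2)
    return dark.abs().mean()

def total_variation_loss(img):
    """L_tv of Eq. (9): mean absolute difference of neighboring pixels."""
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()  # vertical neighbors
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()  # horizontal neighbors
    return dh + dw

Minimizing the dark channel loss pushes the darkest value in each local patch of the prediction toward zero, which is exactly the statistic that haze-free images exhibit under the DCP [3].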
TABLE I: Configuration of the dehazing module. "Conv" represents a convolution layer and "ResBlocks-9" represents 9 residual blocks.

Layer        | Channels (In) | Channels (Out) | Output
Conv         | 3             | 64             | -
DownSampling | 64            | 128            | -
DownSampling | 128           | 256            | -
ResBlocks-9  | 256           | 256            | Feature
ResBlocks-9  | 256           | 256            | -
UpSampling   | 256           | 128            | -
UpSampling   | 128           | 64             | -
Conv         | 64            | 3              | Dehazed image
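Table I specifies only the channel widths, so the following PyTorch sketch of the feature extractor G and the reconstruction network fills in kernel sizes, strides, and activations with conventional assumptions of ours.

import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)   # residual connection

class FeatureExtractor(nn.Module):
    """Conv + two stride-2 downsamplings + 9 residual blocks (top half of Table I)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            *[ResBlock(256) for _ in range(9)])

    def forward(self, x):
        return self.net(x)        # the feature to be aligned

class Reconstruction(nn.Module):
    """9 residual blocks + two upsamplings + output conv (bottom half of Table I)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            *[ResBlock(256) for _ in range(9)],
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, f):
        return self.net(f)        # the dehazed image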
3) Overall Loss:
The overall loss is defined as a weighted sum of all losses. In the intra-domain training phase, the overall loss can be written as

\[ \mathcal{L}_{all}^{intra} = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_{sys} \tag{11} \]

while in the inter-domain training phase, the overall loss can be written as

\[ \mathcal{L}_{all}^{inter} = \lambda_1 \mathcal{L}_2 + \lambda_2 \mathcal{L}_{sys} + \lambda_3 \mathcal{L}_{real} \tag{12} \]
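The two training phases combine their terms as in Eqs. (11) and (12); a trivial sketch with the weights as keyword arguments ($\lambda_1 = \lambda_2 = \lambda_3 = 1$ in the experiments):

def overall_loss(phase, l_adv, l_sys, l_real=None,
                 lam1=1.0, lam2=1.0, lam3=1.0):
    """Weighted sums: Eq. (11) (intra phase, l_adv = L_1) and
    Eq. (12) (inter phase, l_adv = L_2)."""
    if phase == "intra":
        return lam1 * l_adv + lam2 * l_sys               # Eq. (11)
    return lam1 * l_adv + lam2 * l_sys + lam3 * l_real   # Eq. (12)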
IV. EXPERIMENTS

We introduce the related experiments and ablation studies in this section to verify our proposed method.
A. Experimental Details

B. Datasets
We choose the RESIDE [36] dataset as our training dataset. For the intra-domain adaptation step, we randomly sample 8000 hazy images from ITS (Indoor Training Set) and 8000 hazy images from OTS (Outdoor Training Set). Every four hazy images correspond to one clear image; in other words, there are 20,000 images in total in the synthetic training set. For the inter-domain adaptation step, we randomly sample 3000 images from URHI (Unannotated Realistic Hazy Images). For data augmentation, we randomly crop images and randomly flip the cropped images horizontally during the training phase. Furthermore, we ensure that the crop areas and horizontal flips of the same scene images (four hazy images and one clear image) are consistent in each iteration.
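The consistent cropping and flipping of each scene group can be done as in the sketch below (ours; the crop size of 256 is an assumed value, since the printed size did not survive extraction).

import random

def paired_random_crop(images, size=256):
    """Crop the same region and apply the same horizontal flip to every image
    of one scene (four hazy views plus the clear ground truth).

    images: list of H x W x 3 numpy arrays sharing the same spatial size.
    """
    h, w = images[0].shape[:2]
    top, left = random.randint(0, h - size), random.randint(0, w - size)
    flip = random.random() < 0.5
    out = [img[top:top + size, left:left + size] for img in images]
    return [img[:, ::-1] if flip else img for img in out]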
C. Implementation Details

We implement our proposed method TSDN in the PyTorch [37] framework, and we conduct experiments on both our designed network architecture (Table I) and the MSBDN-DFF [1] network architecture. First, we train the dehazing module and the intra-domain discriminator within the synthetic domain for 200 epochs, applying the SGD [38] optimizer with momentum and weight decay for the dehazing module and the Adam [39] optimizer for the intra-domain discriminator. We set the coefficient of the reversed gradients in the GRL module to 0.1, and set $\lambda_1$ and $\lambda_2$ in the overall intra-domain loss to 1. Then, we adapt the model to real hazy images by training it with the inter-domain discriminator for 20 epochs, again with the SGD optimizer for the dehazing module and the Adam optimizer for the inter-domain discriminator. For the overall loss in the inter-domain phase, we set $\lambda_1 = \lambda_2 = \lambda_3 = 1$ and assign small weights $\lambda_{tv}$ and $\lambda_{dc}$ to the total variation and dark channel terms.
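A sketch of this optimizer setup, reusing G, R, and D_intra from the earlier sketches; several printed hyper-parameter values did not survive extraction, so the learning rates, momentum, weight decay, and Adam betas below are conventional placeholders rather than the paper's values.

import torch

opt_dehaze = torch.optim.SGD(
    list(G.parameters()) + list(R.parameters()),
    lr=1e-4,             # placeholder value
    momentum=0.9,        # placeholder value
    weight_decay=1e-4)   # placeholder value
opt_disc = torch.optim.Adam(D_intra.parameters(),
                            lr=1e-4, betas=(0.9, 0.999))  # placeholder values
grl_coeff = 0.1          # reversed-gradient coefficient, as given in the paper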
D. Results on Synthetic Datasets

We evaluate our proposed framework and compare it with previous methods on the SOTS [36] dataset. The dehazed results of same scene images with haze distribution shift are shown in Figure 4. From the results, we can observe that all previous algorithms encounter a performance gap when facing different haze distributions, e.g., in the magnified areas of the images. Normally, if a hazy image has thicker haze, the dehazed image has a higher chance of retaining haze globally or in details. Compared with previous methods, our approach generates clearer images under different haze situations, which verifies the effectiveness of the intra-domain adaptation (a more detailed analysis can be found in Section IV-F).

The quantitative evaluation is shown in Table II. Our method achieves the best performance on both PSNR and SSIM. Compared with the state-of-the-art method MSBDN-DFF [1], our method achieves a performance gain of 1.47 dB on PSNR and 0.002 on SSIM.
E. Results on Real Images

To evaluate our method in the real domain, we conduct experiments on the real dataset RTTS [36] and compare visual results with previous methods.

The visual results are shown in Figure 5. From the results, we can observe that previous dehazing methods have different limitations on real images. Specifically, DCP [3] suffers from serious color distortion and overexposure, e.g., the first, third, and fourth rows of Figure 5 (b). Besides, the dehazed results of DehazeNet [5], FFA [10], and MSBDN-DFF [1] all have residual haze, e.g., the first, fourth, and sixth rows of Figure 5 (c), (f), and (g). In addition, the dehazed results of EPDN [9] suffer from brightness issues (much darker results), e.g., the third row and the traffic signs in the seventh row of Figure 5 (d), and color distortion (some results are more yellow than those of other methods), e.g., the sixth and seventh rows of Figure 5 (d). Furthermore, DAdehazing [11] reaches better visual results than the other previous methods: the overall brightness and the color are well maintained during the dehazing process. However, there is still some residual haze, e.g., the trees in the fourth row of Figure 5 (e). More importantly, some results of DAdehazing become less realistic or blurred due to the effect of the GAN, e.g., the people in the first row, the trees in the fourth row, and the people in the sixth row of Figure 5 (e). Overall, our proposed method achieves the best performance in removing haze, maintaining the color and brightness of the images, and restoring details.
Fig. 4: Visual results of images with different haze distributions on the SOTS [36] dataset. (a) Input. (b) DCP [3]. (c) DehazeNet [5]. (d) EPDN [9]. (e) DAdehazing [11]. (f) MSBDN-DFF [1]. (g) Ours. (h) GT.
TABLE II: Quantitative evaluation of the dehazing results on the SOTS [36] dataset.

Method | DCP [3] | DehazeNet [5] | DCPDN [6] | EPDN [9] | GFN [40] | GDN [41] | DAdehazing [11] | MSBDN-DFF [1] | Ours
PSNR   | 15.49   | 21.14         | 19.39     | 23.82    | 22.30    | 31.51    | 27.76           | 33.79         | 35.26
SSIM   | 0.646   | 0.853         | 0.659     | 0.893    | 0.886    | 0.982    | 0.928           | 0.983         | 0.985
BS        | ITA | LDS | PSNR  | SSIM (SOTS)
ResBlocks |     |     | 23.80 | 0.881
ResBlocks | ✓   |     |       |
ResBlocks | ✓   | ✓   |       |

TABLE III: Ablation study on the SOTS [36] dataset. "ITA" denotes the intra-domain adaptation, "LDS" denotes the loss-based deep supervision, and "ResBlocks" is the dehazing module in Table I.
F. Ablation Study
In order to verify the effectiveness of each module in our proposed method, we conduct ablation studies on the intra-domain adaptation and the inter-domain adaptation.

In the intra-domain adaptation part, we conduct the ablation study using the following settings: 1) BS: base network; 2) BS+ITA: base network with the intra-domain adaptation; 3) BS+ITA+LDS: base network with the intra-domain adaptation and the loss-based deep supervision.

The quantitative results of the intra-domain adaptation are shown in Table III, which demonstrates that the base network with the intra-domain adaptation and the loss-based deep supervision achieves the best performance. To further prove that the improvement in performance is promoted by the intra-domain adaptation, we compare the intra-domain gap under all three settings. Instead of directly measuring the distribution similarity of the features, which is not intuitive, we utilize the dehazing losses to measure the intra-domain gap. In other words, if the dehazing losses of the same scene are less dispersed, the features of the same scene are more closely aligned. Specifically, we calculate the range, the standard deviation, and the coefficient of variation of the dehazing losses in each scene and take the average over all scenes (see the sketch below). The results are shown in Figure 6. From the results, we can observe that the dehazing losses decrease faster after we apply the intra-domain adaptation to the base network. Moreover, the dehazing losses are more compact in the methods with the intra-domain adaptation, which demonstrates that the features of the same scene images are aligned in feature space.

In the inter-domain part, we conduct the ablation study with the following settings: 1) BS: base network; 2) BS+ITE: base network with the inter-domain adaptation; 3) BS+ITA: base network with the intra-domain adaptation; 4) BS+ITA+ITE: base network with both the intra-domain and the inter-domain adaptation.
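The three dispersion measures referenced above can be computed per scene and averaged as follows (a minimal sketch; losses_per_scene is an assumed list of per-scene loss arrays).

import numpy as np

def dispersion(losses_per_scene):
    """Range, standard deviation, and coefficient of variation of the dehazing
    losses within each scene, averaged over all scenes (as in Figure 6)."""
    rng, std, cv = [], [], []
    for losses in losses_per_scene:          # one array of n losses per scene
        l = np.asarray(losses, dtype=np.float64)
        rng.append(l.max() - l.min())
        std.append(l.std())
        cv.append(l.std() / l.mean())        # scale-free dispersion measure
    return np.mean(rng), np.mean(std), np.mean(cv)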
Fig. 5: Visual results of real-world hazy images on the RTTS [36] dataset. (a) Input. (b) DCP [3]. (c) DehazeNet [5]. (d) EPDN [9]. (e) DAdehazing [11]. (f) FFA-Net [10]. (g) MSBDN-DFF [1]. (h) Ours.
The visual results are shown in Figure 7. The result of the base network has residual haze due to the domain gap. This phenomenon is alleviated by either the intra-domain adaptation or the inter-domain adaptation. However, color distortion appears if the inter-domain adaptation is applied directly, because the network is sensitive to the haze distribution of the input image. The base network with both the intra-domain adaptation and the inter-domain adaptation achieves the best performance.

Fig. 6: The dispersion evaluation of the dehazing losses over training epochs: (a) range, (b) standard deviation, and (c) coefficient of variation, each compared across BS, BS+ITA, and BS+ITA+LDS. The network with the intra-domain adaptation obtains more compact dehazing losses, which demonstrates that the features are aligned in feature space.

Fig. 7: Ablation study on real-world hazy images from the RTTS [36] dataset. (a) Input. (b) BS. (c) BS+ITE. (d) BS+ITA. (e) BS+ITA+ITE.

V. CONCLUSION
In this paper, we propose a novel two-step dehazing network (TSDN) which consists of an intra-domain adaptation step and an inter-domain adaptation step. First, we apply the intra-domain adaptation within the synthetic domain by adversarial learning and deeply supervised learning. Specifically, we extract features of the same scene images with different haze distributions and select the base feature among them by the dehazing loss. Then, we align the other features to the base feature to eliminate the distribution difference of the input images. Furthermore, we apply the inter-domain adaptation between the synthetic domain and the real domain based on the aligned synthetic features. Our proposed method can be easily integrated into existing dehazing frameworks. Extensive experimental results demonstrate that our method performs favorably against the state-of-the-art algorithms on both the synthetic datasets and the real datasets.
REFERENCES
[1] H. Dong, J. Pan, L. Xiang, Z. Hu, X. Zhang, F. Wang, and M.-H. Yang, "Multi-scale boosted dehazing network with dense feature fusion," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2157–2167.
[2] R. T. Tan, "Visibility in bad weather from a single image," in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008, pp. 1–8.
[3] K. He, J. Sun, and X. Tang, "Single image haze removal using dark channel prior," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2341–2353, 2010.
[4] Q. Zhu, J. Mai, and L. Shao, "A fast single image haze removal algorithm using color attenuation prior," IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3522–3533, 2015.
[5] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, "DehazeNet: An end-to-end system for single image haze removal," IEEE Transactions on Image Processing, vol. 25, no. 11, pp. 5187–5198, 2016.
[6] H. Zhang and V. M. Patel, "Densely connected pyramid dehazing network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3194–3203.
[7] H. Zhang, V. Sindagi, and V. M. Patel, "Joint transmission map estimation and dehazing using deep networks," IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 7, pp. 1975–1986, 2019.
[8] R. Li, J. Pan, Z. Li, and J. Tang, "Single image dehazing via conditional generative adversarial network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8202–8211.
[9] Y. Qu, Y. Chen, J. Huang, and Y. Xie, "Enhanced pix2pix dehazing network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 8160–8168.
[10] X. Qin, Z. Wang, Y. Bai, X. Xie, and H. Jia, "FFA-Net: Feature fusion attention network for single image dehazing," in AAAI, 2020, pp. 11908–11915.
[11] Y. Shao, L. Li, W. Ren, C. Gao, and N. Sang, "Domain adaptation for image dehazing," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2808–2817.
[12] R. Fattal, "Single image dehazing," ACM Transactions on Graphics (TOG), vol. 27, no. 3, pp. 1–9, 2008.
[13] ——, "Dehazing using color-lines," ACM Transactions on Graphics (TOG), vol. 34, no. 1, pp. 1–14, 2014.
[14] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.-H. Yang, "Single image dehazing via multi-scale convolutional neural networks," in European Conference on Computer Vision. Springer, 2016, pp. 154–169.
[15] G. Wilson and D. J. Cook, "A survey of unsupervised deep domain adaptation," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 11, no. 5, pp. 1–46, 2020.
[16] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan, "Unsupervised pixel-level domain adaptation with generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3722–3731.
[17] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, "Learning from simulated and unsupervised images through adversarial training," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2107–2116.
[18] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
[19] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, "Learning to discover cross-domain relations with generative adversarial networks," arXiv preprint arXiv:1703.05192, 2017.
[20] Z. Yi, H. Zhang, P. Tan, and M. Gong, "DualGAN: Unsupervised dual learning for image-to-image translation," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2849–2857.
[21] C. Chen, W. Xie, W. Huang, Y. Rong, X. Ding, Y. Huang, T. Xu, and J. Huang, "Progressive feature alignment for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 627–636.
[22] B. Sun and K. Saenko, "Deep CORAL: Correlation alignment for deep domain adaptation," in European Conference on Computer Vision. Springer, 2016, pp. 443–450.
[23] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial discriminative domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7167–7176.
[24] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[25] J. Bao, D. Chen, F. Wen, H. Li, and G. Hua, "CVAE-GAN: Fine-grained image generation through asymmetric training," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017.
[26] J. Yang, A. Kannan, D. Batra, and D. Parikh, "LR-GAN: Layered recursive generative adversarial networks for image generation," arXiv preprint arXiv:1703.01560, 2017.
[27] C. H. Lin, C.-C. Chang, Y.-S. Chen, D.-C. Juan, W. Wei, and H.-T. Chen, "COCO-GAN: Generation by parts via conditional coordinating," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 4512–4521.
[28] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, "StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8789–8797.
[29] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
[30] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, "High-resolution image synthesis and semantic manipulation with conditional GANs," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8798–8807.
[31] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, "Deeply-supervised nets," in Artificial Intelligence and Statistics, 2015, pp. 562–570.
[32] D. Sun, A. Yao, A. Zhou, and H. Zhao, "Deeply-supervised knowledge synergy," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 6997–7006.
[33] Z. Zhang, X. Zhang, C. Peng, X. Xue, and J. Sun, "ExFuse: Enhancing feature fusion for semantic segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 269–284.
[34] A. Newell, K. Yang, and J. Deng, "Stacked hourglass networks for human pose estimation," in European Conference on Computer Vision. Springer, 2016, pp. 483–499.
[35] Y. Ganin and V. Lempitsky, "Unsupervised domain adaptation by backpropagation," in International Conference on Machine Learning. PMLR, 2015, pp. 1180–1189.
[36] B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z. Wang, "Benchmarking single-image dehazing and beyond," IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 492–505, 2019.
[37] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," 2017.
[38] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of COMPSTAT'2010. Springer, 2010, pp. 177–186.
[39] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[40] W. Ren, L. Ma, J. Zhang, J. Pan, X. Cao, W. Liu, and M.-H. Yang, "Gated fusion network for single image dehazing," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3253–3261.
[41] X. Liu, Y. Ma, Z. Shi, and J. Chen, "GridDehazeNet: Attention-based multi-scale network for image dehazing," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.