Image Correction via Deep Reciprocating HDR Transformation
Xin Yang, Ke Xu, Yibing Song, Qiang Zhang, Xiaopeng Wei, Rynson Lau
City University of Hong Kong, Dalian University of Technology, Tencent AI Lab
https://ybsong00.github.io/cvpr18_imgcorrect/index

Figure 1: Image correction results on an underexposed input. (a) Input, (b) CAPE [17], (c) DJF [22], (d) L0S [46], (e) WVM [8], (f) SMF [50], (g) DRHT, (h) Ground Truth. Existing LDR methods have limitations in recovering the missing details, as shown in (b)-(f). In comparison, we recover the missing LDR details in the HDR domain and preserve them through tone mapping, producing a more favorable result, as shown in (g).
Abstract
Image correction aims to adjust an input image into a visually pleasing one. Existing approaches are proposed mainly from the perspective of image pixel manipulation. They are not effective in recovering the details in the under/over exposed regions. In this paper, we revisit the image formation procedure and notice that the missing details in these regions exist in the corresponding high dynamic range (HDR) data. These details are well perceived by the human eyes but diminished in the low dynamic range (LDR) domain because of the tone mapping process. Therefore, we formulate the image correction task as an HDR transformation process and propose a novel approach called Deep Reciprocating HDR Transformation (DRHT). Given an input LDR image, we first reconstruct the missing details in the HDR domain. We then perform tone mapping on the predicted HDR data to generate the output LDR image with the recovered details. To this end, we propose a united framework consisting of two CNNs for HDR reconstruction and tone mapping. They are integrated end-to-end for joint training and prediction. Experiments on the standard benchmarks demonstrate that the proposed method performs favorably against state-of-the-art image correction methods.
1. Introduction
The image correction problem has been studied for decades. It dates back to the production of Charge-Coupled Devices (CCDs), which convert optical perception to digital signals. Due to the semiconductors used in the CCDs, there is an unknown nonlinearity between the scene radiance and the pixel values in the image. This nonlinearity is usually modeled by gamma correction, which has resulted in a series of image correction methods. These methods tend to focus on image pixel balance via different approaches, including histogram equalization [28], edge-preserving filtering [11, 1], and CNN encoder-decoders [41]. Typically, they function as a preprocessing step for many machine vision tasks, such as optical flow estimation [3, 15], image decolorization [37, 36], image deblurring [30, 29], face stylization [39, 35] and visual tracking [38].

Despite the demonstrated success, existing methods have limitations in correcting images with under/over exposure. An example is shown in Figure 1, where the state-of-the-art image correction methods fail to recover the missing details in the underexposed regions. This is because the pixel values around these regions are close to 0, and the details are diminished within them.

(⋆ Joint first authors. † Yibing Song is the corresponding author. This work was conducted at City University of Hong Kong, led by Rynson Lau.)

Although different image pixel operators have been proposed for image correction, the results are still unsatisfactory, due to the ill-posed nature of the problem. Thus, a question is raised as to whether it is possible to effectively recover the missing details during the image correction process.

To answer the aforementioned question, we trace back to the image formation procedure. Today's cameras still require the photographer to carefully choose the exposure duration (∆t) and rely on the camera response functions (CRFs) to convert a natural scene (S) into an LDR image (I), which can be written as [5]:

I = f_CRF(S × ∆t).    (1)

However, when an inappropriate exposure duration is chosen, the existing CRFs can neither correct the raw data in the CCDs nor the output LDR images. This causes the under/over exposure in the LDR images. Based on this observation, we propose an end-to-end framework, called Deep Reciprocating HDR Transformation (DRHT), for image correction. It contains two CNN networks. The first CNN network reconstructs the missing details in the HDR domain, and the second CNN network transfers the details back to the LDR domain. Through this reciprocating HDR transformation process, LDR images are corrected in the intermediate HDR domain.

Overall, the contribution of this work can be summarized as follows. We interpret image correction as the Deep Reciprocating HDR Transformation (DRHT) process. An end-to-end DRHT model is therefore proposed to address the image correction problem. To demonstrate the effectiveness of the proposed network, we have conducted extensive evaluations of the proposed network against the state-of-the-art methods, using the standard benchmarks.
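As a concrete illustration of Eq. 1, the LDR formation process can be simulated with a simple gamma-type CRF. This is a minimal sketch: the gamma curve, the clipping model, and the sample radiance values are illustrative assumptions, not the actual CRF of any camera.

```python
import numpy as np

def capture_ldr(scene_radiance, exposure, gamma=1.0 / 2.2):
    """Simulate Eq. 1: I = f_CRF(S * dt) with an assumed gamma CRF.

    scene_radiance: HDR radiance map (float array, unbounded range).
    exposure: exposure duration dt chosen by the photographer.
    """
    exposed = scene_radiance * exposure
    # The CRF compresses radiance into [0, 1]; values outside the
    # sensor's range are clipped, which is where details are lost.
    return np.clip(exposed, 0.0, 1.0) ** gamma

# A toy HDR "scene" spanning four orders of magnitude of radiance.
scene = np.logspace(-2, 2, 5)                # [0.01, 0.1, 1, 10, 100]
under = capture_ldr(scene, exposure=0.005)   # dark radiances crushed toward 0
over = capture_ldr(scene, exposure=5.0)      # bright radiances clipped to 1
```

With the long exposure, the three brightest radiances all map to the same pixel value of 1.0, which is exactly the irrecoverable detail loss the paper targets.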
2. Related Work
In this section, we discuss works relevant to our problem, including image restoration and filtering, image manipulation, and image enhancement techniques.
Image Restoration and Filtering.
A variety of state-of-the-art image correction methods have been proposed. Image restoration methods improve the image quality mainly by reducing the noise via different deep network designs [19, 40, 52], low-rank sparse representation learning [21] or soft-rounding regularization [26]. Noise reduction can help improve the image quality, but cannot recover the missing details. Edge-aware image filtering techniques are also broadly studied for smoothing images while maintaining high-contrast structures [2, 22, 33], smoothing repeated textures [23, 47, 50] or removing high-contrast details [24, 54, 55]. Further operations can be done to enhance the images by strengthening the details filtered out by these methods and then adding them back. Although these filtering methods are sensitive to local structures, overexposed regions are usually smoothed in the output images, and therefore details can hardly be recovered.
Image Manipulation.
Image correction has also been done via pixel manipulation for different purposes, such as color enhancement [48] and mimicking different themes/styles [42, 43]. Son et al. [34] propose a tone transfer model to perform region-dependent tone shifting and scaling for artistic style enhancement. Yan et al. [49] exploit the image contents and semantics to learn tone adjustments made by photographers via their proposed deep network. However, these works mainly focus on manipulating the LDR images to adapt to various user preferences.
Image Enhancement.
Histogram equalization is the most widely used method for image enhancement, balancing the histogram of the image. Global and local contrast adjustments are also studied in [14, 31] for enhancing the contrast and brightness. Kaufman et al. [17] propose a framework that applies carefully designed operators to strengthen detected regions (e.g., faces and skies), in addition to global contrast and saturation manipulation. Fu et al. [8] propose a weighted variational method to jointly estimate the reflectance and illumination for color correction. Guo et al. [10] propose to first reconstruct and refine the illumination map from the maximum values in the RGB channels and then enhance the illumination map. Recently, Shen et al. [32] propose a deep network to directly learn the mapping relations between low-light and ground truth images. This method can successfully recover rich details buried in low light conditions, but it tends to increase the global illumination and generate surrealistic images.

All these methods, however, cannot completely recover the missing details in the bright and dark regions. This is mainly because both their inputs and their enhancing operations are restricted to the LDR domain, which does not offer sufficient information to recover all the details while maintaining the global illumination.
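As a reference point for the histogram equalization baseline discussed above, the standard global remapping can be sketched in a few lines of NumPy (the low-contrast test image here is synthetic):

```python
import numpy as np

def equalize_histogram(img):
    """Global histogram equalization for an 8-bit grayscale image.

    Remaps intensities through the normalized CDF so the output
    histogram is approximately uniform -- the classic enhancement
    baseline, operating purely on LDR pixel statistics.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    # Standard remapping: subtract the CDF at the darkest occupied bin
    # so the output stretches over the full [0, 255] range.
    cdf_min = cdf[hist.nonzero()[0][0]]
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]

# A low-contrast image confined to [100, 120] spreads to the full range.
rng = np.random.default_rng(0)
img = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)
out = equalize_histogram(img)
```

Note that this stretches contrast but cannot invent content: pixels saturated at 0 or 255 stay in a single bin, which is the limitation the paper's HDR detour addresses.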
3. Deep Reciprocating HDR Transformation
An overview of the proposed method is shown in Figure 2(b). We first illustrate our reformulation of image correction. We then present our HDR estimation network, which predicts HDR data given an LDR input. Finally, we show how the HDR data is tone mapped into the output LDR using an LDR correction network. The details are presented as follows:
Although humans can well perceive the HDR data, capturing it requires empirically configuring the camera during the imaging process. An overview of scene capturing and LDR production is shown in Figure 2(a). However, under extreme lighting conditions (e.g., the camera is facing the sun), details in the natural scenes are lost during the tone mapping process. They cannot be recovered by existing image correction methods in the LDR domain.

Figure 2: An overview of (a) the image formation process and (b) the proposed Deep Reciprocating HDR Transformation (DRHT) pipeline. Given an input under/over exposed LDR image, we first reconstruct the missing details in the HDR domain and map them back to the output LDR domain.

In order to recover the degraded regions caused by under/over exposure, we trace back to the image formation procedure and formulate the correction as the Deep Reciprocating HDR Transformation process:

Ŝ = f1(I; θ1) and Î_ldr = f2(Ŝ; θ2),

where Ŝ and Î_ldr represent the reconstructed HDR data and the corrected LDR image, respectively, and θ1 and θ2 are the CNN parameters. Specifically, we propose the HDR estimation network (f1) to first recover the details in the HDR domain, and then the LDR correction network (f2) to transfer the recovered HDR details back to the LDR domain. Images are corrected via this end-to-end DRHT process.

We propose an HDR estimation network to recover the missing details in the HDR domain, as explained below:
Network Architecture.
Our network is based on a fully convolutional encoder-decoder network. Given an input LDR image, we encode it into a low dimensional latent representation, which is then decoded to reconstruct the HDR data. Meanwhile, we add skip connections from each encoder layer to its corresponding decoder layer. They enrich the local details during decoding in a coarse-to-fine manner. To facilitate the training process, we also add a skip connection directly from the input LDR to the output HDR. Instead of learning to predict the whole HDR data, the HDR estimation network only needs to predict the difference between the input and output, which shares some similarity with residual learning [12]. We train this network from scratch and use batch normalization [16] and ELU [4] activation for all the convolutional layers.
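The skip-connection wiring described above can be sketched schematically. This is not the paper's network: the learned conv/deconv layers, batch normalization and ELU activations are replaced by fixed pooling/upsampling, and the depth and merge rule are our assumptions, so only the connectivity (encoder-to-decoder skips plus the global input-to-output residual skip) is illustrated.

```python
import numpy as np

def downsample(x):
    """2x average pooling, standing in for a strided conv encoder layer."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbour upsampling, standing in for a deconv layer."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def encoder_decoder_skeleton(img, depth=3):
    """Wiring sketch of the HDR estimation network.

    Each encoder activation is merged into the matching decoder stage
    (coarse-to-fine detail), and the input is added to the output as a
    global skip, so the network only has to predict a residual.
    """
    skips, x = [], img
    for _ in range(depth):                      # encoder: halve resolution
        skips.append(x)
        x = downsample(x)
    for _ in range(depth):                      # decoder: restore resolution
        x = 0.5 * (upsample(x) + skips.pop())   # encoder -> decoder skip
    return img + x                              # input -> output residual skip

ldr = np.random.rand(64, 64, 3).astype(np.float32)
hdr_pred = encoder_decoder_skeleton(ldr)        # same spatial shape as input
```

The global residual skip is why the decoder only needs to learn the missing-detail difference rather than the full HDR signal.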
Loss Function.
Given an input image I, the output of this network Ŝ = f1(I; θ1), and the ground truth HDR image Y, we use the Mean Square Error (MSE) as the objective function:

Loss_hdr = (1 / 2N) Σ_{i=1}^{N} ‖Ŝ_i − α(Y_i)^γ‖²,    (2)

where i is the pixel index and N refers to the total number of pixels. α and γ are two constants in the nonlinear function that converts the ground truth HDR data into LDR, which is empirically found to facilitate the network convergence. We pretrain this network in advance before integrating it with the remaining modules.

We propose an LDR correction network, which shares the same architecture as that of the HDR estimation network. It aims to preserve the recovered details in the LDR domain, as explained below:
Loss Function.
The output of the HDR estimation network Ŝ is in LDR, as shown in Eq. 2. We first map it to the HDR domain via inverse gamma correction. The mapped result is denoted as Ŝ_full. We then apply a logarithmic operation to preserve the majority of the details and feed the output to the LDR correction network. Hence, the recovered LDR image Î_ldr through our network becomes:

Î_ldr = f2(log(Ŝ_full + δ); θ2),    (3)

where log() is used to compress the full HDR domain for convergence while maintaining a relatively large range of intensity, and δ is a small constant to remove zero values. With the ground truth LDR image I_gt, the loss function is:

Loss_ldr = (1 / 2N) Σ_{i=1}^{N} ( ‖Î_ldr,i − I_gt,i‖² + ε ‖Ŝ_i − α(Y_i)^γ‖² ),    (4)

where ε is a balancing parameter to control the influence of the HDR reconstruction accuracy.

Hierarchical Supervision.
We train this LDR correction network together with the aforementioned HDR estimation network. We adopt this end-to-end training strategy in order to adapt our whole model to the domain-reciprocating transformation. To facilitate the training process, we adopt hierarchical supervision training strategies similar to [13]. Specifically, we start by training the encoder part and the shallowest deconv layer of the LDR correction network while freezing the learning rates of all other higher deconv layers. During training, higher deconv layers are gradually added for fine tuning while the learning rates of the encoder and shallower deconv layers are decreased. In this way, the network learns to transfer the HDR details to the LDR domain in a coarse-to-fine manner.
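The reciprocating objective (Eqs. 2-4) can be written out directly. Note that ALPHA, GAMMA, DELTA and the eps default below are placeholder values chosen for illustration, since the paper's exact constants are not legible in this copy.

```python
import numpy as np

# Placeholder constants (the paper's actual values are not legible here).
ALPHA, GAMMA, DELTA = 0.5, 0.5, 1e-6

def loss_hdr(S_hat, Y):
    """Eq. 2: MSE between the first network's output and the
    gamma-compressed ground-truth HDR image Y."""
    target = ALPHA * np.power(Y, GAMMA)
    return np.mean((S_hat - target) ** 2) / 2.0

def to_ldr_input(S_hat):
    """Eq. 3 preprocessing: undo the gamma compression to get S_full,
    then log-compress it before feeding the LDR correction network."""
    S_full = np.power(S_hat / ALPHA, 1.0 / GAMMA)  # inverse gamma mapping
    return np.log(S_full + DELTA)

def loss_ldr(I_hat, I_gt, S_hat, Y, eps=0.1):
    """Eq. 4: LDR reconstruction error plus an eps-weighted copy of the
    HDR term, keeping the first network on target during joint training."""
    ldr_term = np.mean((I_hat - I_gt) ** 2) / 2.0
    return ldr_term + eps * loss_hdr(S_hat, Y)
```

The inverse-gamma step in `to_ldr_input` exactly undoes the compression baked into the Eq. 2 target, so the second network sees log-compressed full-range HDR values.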
The proposed DRHT model is implemented in the TensorFlow framework [9] on a PC with an i7 4GHz CPU and an NVIDIA GTX 1080 GPU. The network parameters are initialized using the truncated normal initializer. We use × and × kernel sizes to generate 64-dimensional feature maps for the first two conv layers and their counterpart deconv layers in both networks, and the remaining kernel size is set to × . For loss minimization, we adopt the ADAM optimizer [20] with an initial learning rate of 1e-2 for 300 epochs, and then a learning rate of 5e-5 with momentum β1 = 0. and β2 = 0. for another 100 epochs. α and γ in Eq. 2, and δ in Eq. 3, are set to . , . and / , respectively. We also clip the gradients to avoid the gradient explosion problem. The general training takes about ten days and the test time is about 0.05s for a 256 × 512 image.

Figure 3: Internal analysis. We compare the reconstructed HDR images with the ground truth HDR images using the HDR-VDP-2 metric (Q scores for the ten examples shown: (a) 64.75, (b) 65.61, (c) 61.80, (d) 69.28, (e) 62.69, (f) 69.04, (g) 69.57, (h) 62.17, (i) 61.80, (j) 65.18; the color bar runs from low difference in blue to high difference in red). The average Q score and SSIM index on this test set are . and . , respectively.
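The two training safeguards in the implementation details, the staged learning rate and gradient clipping, amount to the following sketch (the clipping threshold is our assumption, as the paper does not state it; the schedule values are the ones given above):

```python
import numpy as np

def clip_gradients(grads, max_norm=1.0):
    """Global-norm gradient clipping to avoid gradient explosion.

    If the combined L2 norm of all gradients exceeds max_norm (an
    assumed threshold), every gradient is rescaled by the same factor.
    """
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

def learning_rate(epoch):
    """Two-phase schedule from the implementation details:
    1e-2 for the first 300 epochs, then 5e-5 for the remaining 100."""
    return 1e-2 if epoch < 300 else 5e-5
```

Rescaling by the global norm, rather than clipping each tensor independently, preserves the direction of the overall update.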
4. Experiments
In this section, we first present the experiment setups andinternal analysis on the effectiveness of the HDR estimationnetwork. We then compare our DRHT model with the state-of-the-art image correction methods on two datasets.
Datasets.
We conduct experiments on the city scene panorama dataset [51] and the Sun360 outdoor panorama dataset [45]. Specifically, since the city scene images are low-resolution (64 × ), we use , image pairs (i.e., the input LDR and the ground truth HDR) to train the first network and , triplets (i.e., the input LDR, the ground truth HDR and the ground truth LDR) to train the whole network. We use , images from their testing set for evaluation. To adapt our models to real images with high resolution, we use the Physically Based Rendering Technology (PBRT) [27] to generate ground truth HDR scenes as well as the input and ground truth LDR images, which are then divided into , patches for training. We also use , images from the Sun360 outdoor panorama dataset [45] for end-to-end finetuning (i.e., ε in Eq. 4 is fixed as ), as they do not have ground truth HDR images, and , images for evaluation. The input images are corrupted from the originals by adjusting the exposure (selected from the interval [-6, 3], in order not to learn the mapping between one specific exposure degree and the ground truth) and the contrast, to over/under-expose the visible details. We resize the images in this dataset to 256 × 512 pixels.

Figure 4: Visual comparison on overexposed images in bright scenes. (a)(i)(q) Input, (b)(j)(r) CAPE [17], (c)(k)(s) DJF [22], (d)(l)(t) L0S [46], (e)(m)(u) WVM [8], (f)(n)(v) SMF [50], (g)(o)(w) DRHT, (h)(p)(x) Ground Truth. The proposed DRHT method can effectively recover the missing details buried in the overexposed regions compared with state-of-the-art approaches.
Evaluation Methods.
We compare the proposed method to the state-of-the-art image correction methods CAPE [17], WVM [8], SMF [50], L0S [46] and DJF [22] on the datasets. Among them, CAPE [17] enhances the images via a comprehensive pipeline including global contrast/saturation correction, sky/face enhancement, shadow-saliency and texture enhancement. WVM [8] first decomposes the input image into reflectance and illumination maps, and corrects the input by enhancing the illumination map. Since enhancement operations are mostly conducted on the detail layer extracted by existing filtering methods, we further compare our results to state-of-the-art image filtering methods. Meanwhile, we compare the proposed method to two deep learning based image correction methods: Hdrcnn [6] and DrTMO [7].

Evaluation Metrics.
We evaluate the performance using different metrics. When internally analyzing the HDR estimation network, we use the widely adopted HDR-VDP-2 [25] metric, as it reflects human perception of different images. When comparing with existing methods, we use three commonly adopted image quality metrics: PSNR, SSIM [44] and FSIM [53]. In addition, we provide the Q scores from the HDR-VDP-2 [25] metric to evaluate the image quality.

Figure 5: Visual comparison on under/over exposed images in dark scenes. (a)(i)(q) Input, (b)(j)(r) CAPE [17], (c)(k)(s) DJF [22], (d)(l)(t) L0S [46], (e)(m)(u) WVM [8], (f)(n)(v) SMF [50], (g)(o)(w) DRHT, (h)(p)(x) Ground Truth. The proposed DRHT method can effectively recover the missing details in the under/over exposed regions while maintaining the global illumination.
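Of the pixel-level metrics above, PSNR is the simplest to state precisely; a minimal implementation for images normalized to [0, 1] is:

```python
import numpy as np

def psnr(pred, gt, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak].

    Higher is better; identical images give infinite PSNR.
    """
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

SSIM and FSIM additionally compare local structure rather than raw pixel differences, which is why they are reported alongside PSNR.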
As the proposed DRHT method first recovers the details via the HDR estimation network, we demonstrate its effectiveness in reconstructing the details in the HDR domain. We evaluate on the city scene dataset using the HDR-VDP-2 metric [25]. It generates a probability map and a Q score for each test image. The probability map indicates the difference between two images that would be noticed by an average observer. Meanwhile, the Q score predicts the quality degradation through a mean-opinion-score metric.

We provide some examples from the city scene test dataset in Figure 3. We overlay the predicted visual difference on the generated result. The difference intensity is shown via a color bar, where low intensity is marked in blue and high intensity in red. It shows that the proposed HDR estimation network can effectively recover the missing details over the majority of the input image. However, a limitation appears in the region where part of the sun is occluded by the building, as shown in (j). It yields a high difference because the illumination contrast is high around the boundary between the sun and the building. This difference is difficult to preserve in the HDR domain. The average Q score and SSIM index on this test set are . and . , respectively. They indicate that the HDR data synthesized by our HDR estimation network is close to the ground truth HDR data.
We compare the proposed DRHT method with state-of-the-art image correction methods on the standard benchmarks.

Methods      City Scene dataset               Sun360 Outdoor dataset
             PSNR   SSIM    FSIM    Q score   PSNR   SSIM    FSIM    Q score
CAPE [17]    18.99  0.7435  0.8856  59.44     17.13  0.7853  0.8781  54.87
WVM [8]      17.70  0.8016  0.8695  53.17     11.25  0.5733  0.6072  41.12
L0S [46]     19.03  0.6644  0.7328  84.33     15.72  0.7311  0.7751  51.73
SMF [50]     18.61  0.7724  0.9035  81.07     14.85  0.6776  0.7622  50.77
DJF [22]     17.54  0.7395  0.9512  84.74     14.49  0.6736  0.7360  50.03
DRHT         28.18  0.9242  0.9622  97.87     22.60  0.7629  0.8691  56.17

Table 1: Quantitative evaluation on the standard datasets. The proposed DRHT method is compared with existing image correction methods based on several metrics, including PSNR, SSIM, FSIM and Q score. It shows that the proposed DRHT method performs favorably against existing image correction methods.

Methods      City Scene dataset               Sun360 Outdoor dataset
             PSNR   SSIM    FSIM    Q score   PSNR   SSIM    FSIM    Q score
Hdrcnn [6]   11.99  0.2249  0.5687  39.64     11.09  0.6007  0.8637  56.31
DrTMO [7]    -      -       -       -         14.64  0.6822  0.8101  52.39
DRHT         28.18  0.9242  0.9622  97.87     22.60  0.7629  0.8691  56.17

Table 2: Quantitative evaluation between the proposed DRHT method and two HDR prediction methods. The results of DrTMO on the City Scene dataset are not available, as it requires high resolution inputs. The evaluation indicates that the proposed DRHT method is effective in generating HDR data compared with existing HDR prediction methods.

The visual evaluation is shown in Figure 4, where the input images are overexposed. The image filtering based methods are effective in preserving local edges. However, they cannot recover the details in the overexposed regions, as shown in (c), (d) and (f). This is because these methods tend to smooth flat regions while preserving the color contrast around edge regions. They fail to recover the details, which reside in the overexposed regions where the pixel values approach 255.
Meanwhile, the image correction methods based on global contrast and saturation manipulation are not effective, as shown in (r). They share similar limitations with the image filtering based methods, as their pixel-level operations fail to handle overexposed images. The results of WVM [8] tend to be brighter, as shown in (e), (m) and (u), as it over-enhances the illumination layer decomposed from the input image. Compared with existing methods, the proposed DRHT method can successfully recover the missing details buried in the overexposed regions while maintaining realistic global illumination.

Figure 5 shows some under/over exposed examples in low-light scenes. It shows that the image filtering based methods can only strengthen existing details. CAPE [17] performs well in the low-light regions, as shown in (b), but it simply adjusts the brightness and thus fails to correct all the missing details. Figure 5(i) shows that WVM [8] performs poorly in scenes with dark skies, as it fails to decompose the dark sky into reflectance and illumination layers. Meanwhile, the missing details in the under/over exposed regions can be reconstructed via the proposed DRHT method, as shown in (h) and (p). Global illumination is also maintained through residual learning.

We note that the proposed DRHT method tends to slightly increase the intensity in dark regions. There are two reasons for this. First, DRHT is trained on the city scene dataset [51], where the sun is always located near the center of the images. Hence, when the input image has some bright spots near the center, the night sky will tend to appear brighter, as shown in Figure 5(p).
Second, as we use the first network to predict the gamma-compressed HDR image and then map it back to LDR in the logarithmic domain, low intensity values may be increased through the inverse gamma mapping and logarithmic compression, as shown in Figure 5(h).

In addition to the visual evaluation, we also provide a quantitative comparison between the proposed method and existing methods, as summarized in Table 1. It shows that the proposed method performs favorably against existing methods under several numerical evaluation metrics.

We further compare the proposed DRHT method with two HDR prediction methods (i.e., DrTMO [7] and Hdrcnn [6]). These two methods can be treated as image correction methods because their output HDR images can be tone mapped into LDR images. In [7], two deep networks are proposed to first generate up-exposure and down-exposure LDR images from the single input LDR image. As each image with limited exposure cannot contain all the details of the scene needed to solve the under/over exposure problem, they fuse these multiple exposed images and use [18] to generate the final LDR images. Eilertsen et al. [6] propose a deep network to blend the input LDR image with the reconstructed HDR information in order to recover the high dynamic range in the output LDR images. However, by using highlight masks for blending, their method cannot deal with underexposed regions, and their results tend to be dim, as shown in Figures 6(c) and 6(g).

Figure 6: Visual comparison with two HDR based correction methods, DrTMO [7] and Hdrcnn [6], on the Sun360 outdoor dataset. Each row shows: (a)(e) Input, (b)(f) DrTMO [7], (c)(g) Hdrcnn [6], (d)(h) DRHT, and Ground Truth. The proposed DRHT performs better than these two methods in generating visually pleasing images.
Meanwhile, we can also observe obvious flaws in the output images of both DrTMO [7] and Hdrcnn [6] (e.g., the man's white shirt in Figure 6(b) and the blocking effect in the snow in Figure 6(g)). The main reason is that existing tone mapping methods fail to preserve the local details from the HDR domain when the under/over exposure problem occurs. In comparison, the proposed DRHT is able to avoid this limitation, because we do not attempt to recover the whole HDR image but only focus on recovering the missing details by residual learning. The quantitative evaluation results shown in Table 2 indicate that the proposed DRHT method performs favorably against these HDR prediction methods.
Despite the aforementioned success, the proposed DRHT method has limitations in recovering the details when significant illumination contrast appears in the input images. Figure 7 shows one example. Although DRHT can effectively recover the missing details of the hut in the underexposed region (i.e., the red box in Figure 7), there are limited details around the sun (i.e., the black box). This is mainly because large areas of overexposed sunshine are rare in our training dataset. In the future, we will augment our training dataset to incorporate such extreme cases to improve the performance.
5. Conclusion
In this paper, we propose a novel Deep Reciprocating HDR Transformation (DRHT) model for under/over exposed image correction. We first trace back to the image formation process to explain why the under/over exposure problem is observed in LDR images, according to which we reformulate image correction as an HDR mapping problem. We show that the buried details in the under/over exposed regions cannot be completely recovered in the LDR domain by existing image correction methods. Instead, the proposed DRHT method first revisits the HDR domain and recovers the missing details of natural scenes via the HDR estimation network, and then transfers the reconstructed HDR information back to the LDR domain to correct the image via another proposed LDR correction network. These two networks are formulated in an end-to-end manner as DRHT and achieve state-of-the-art correction performance on two benchmarks.

Figure 7: Limitation analysis. (a) Input, (b) DRHT. The proposed DRHT method is effective in recovering the missing details in the underexposed region marked by the red box, but is limited in the overexposed sunshine region marked by the black box.
Acknowledgements
We thank the anonymous reviewers for the insightful and constructive comments, and NVIDIA for the generous donation of GPU cards for our experiments. This work is in part supported by an SRG grant from City University of Hong Kong (Ref. 7004889), and by NSFC grants from the National Natural Science Foundation of China (Ref. 91748104, 61632006, 61425002).

References

[1] L. Bao, Y. Song, Q. Yang, and N. Ahuja. An edge-preserving filtering framework for visibility restoration. In International Conference on Pattern Recognition, 2012.
[2] S. Bi, X. Han, and Y. Yu. An L1 image transform for edge-preserving smoothing and scene-level intrinsic decomposition. ACM Transactions on Graphics, 2015.
[3] J. Cheng, Y.-H. Tsai, S. Wang, and M.-H. Yang. SegFlow: Joint learning for video object segmentation and optical flow. In IEEE International Conference on Computer Vision, 2017.
[4] D.-A. Clevert, T. Unterthiner, and S. Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289, 2015.
[5] P. Debevec and J. Malik. Recovering high dynamic range radiance maps from photographs. In ACM Transactions on Graphics (SIGGRAPH), 2008.
[6] G. Eilertsen, J. Kronander, G. Denes, R. Mantiuk, and J. Unger. HDR image reconstruction from a single exposure using deep CNNs. ACM Transactions on Graphics, 2017.
[7] Y. Endo, Y. Kanamori, and J. Mitani. Deep reverse tone mapping. ACM Transactions on Graphics, 2017.
[8] X. Fu, D. Zeng, Y. Huang, X.-P. Zhang, and X. Ding. A weighted variational model for simultaneous reflectance and illumination estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[9] Google. TensorFlow.
[10] X. Guo, Y. Li, and H. Ling. LIME: Low-light image enhancement via illumination map estimation. IEEE Transactions on Image Processing, 2017.
[11] K. He, J. Sun, and X. Tang. Guided image filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[12] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[13] S. He, J. Jiao, X. Zhang, G. Han, and R. Lau. Delving into salient object subitizing and detection. In IEEE International Conference on Computer Vision, 2017.
[14] S. J. Hwang, A. Kapoor, and S. B. Kang. Context-based automatic local image enhancement. In European Conference on Computer Vision, 2012.
[15] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[16] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, 2015.
[17] L. Kaufman, D. Lischinski, and M. Werman. Content-aware automatic photo enhancement. Computer Graphics Forum, 2012.
[18] M. Kim and J. Kautz. Consistent tone reproduction. In International Conference on Computer Graphics and Imaging, 2008.
[19] Y. Kim, H. Jung, D. Min, and K. Sohn. Deeply aggregated alternating minimization for image restoration. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[20] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
[21] J. Li, X. Chen, D. Zou, B. Gao, and W. Teng. Conformal and low-rank sparse representation for image restoration. In IEEE International Conference on Computer Vision, 2015.
[22] Y. Li, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep joint image filtering. In European Conference on Computer Vision, 2016.
[23] S. Liu, J. Pan, and M.-H. Yang. Learning recursive filters for low-level vision via a hybrid neural network. In European Conference on Computer Vision, 2016.
[24] Z. Ma, K. He, Y. Wei, J. Sun, and E. Wu. Constant time weighted median filtering for stereo matching and beyond. In IEEE International Conference on Computer Vision, 2013.
[25] R. Mantiuk, K. Joong, A. Rempel, and W. Heidrich. HDR-VDP-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions. ACM Transactions on Graphics, 2011.
[26] X. Mei, H. Qi, B.-G. Hu, and S. Lyu. Improving image restoration with soft-rounding. In IEEE International Conference on Computer Vision, 2015.
[27] M. Pharr, W. Jakob, and G. Humphreys. Physically Based Rendering: From Theory to Implementation. 2016.
[28] S. Pizer, E. Amburn, J. Austin, R. Cromartie, A. Geselowitz, T. Greer, H. Bart, J. Zimmerman, and K. Zuiderveld. Adaptive histogram equalization and its variations. Computer Vision, Graphics, and Image Processing, 1987.
[29] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.-H. Yang. Single image dehazing via multi-scale convolutional neural networks. In European Conference on Computer Vision, 2016.
[30] W. Ren, J. Pan, X. Cao, and M.-H. Yang. Video deblurring via semantic segmentation and pixel-wise non-linear kernel. In IEEE International Conference on Computer Vision, 2017.
[31] A. Rivera, B. Ryu, and O. Chae. Content-aware dark image enhancement through channel division. IEEE Transactions on Image Processing, 2012.
[32] L. Shen, Z. Yue, F. Feng, Q. Chen, S. Liu, and J. Ma. MSR-net: Low-light image enhancement using deep convolutional network. arXiv:1711.02488, 2017.
[33] X. Shen, C. Zhou, L. Xu, and J. Jia. Mutual-structure for joint filtering. In IEEE International Conference on Computer Vision, 2015.
[34] M. Son, Y. Lee, H. Kang, and S. Lee. Art-photographic detail enhancement. Computer Graphics Forum, 2014.
[35] Y. Song, L. Bao, S. He, Q. Yang, and M.-H. Yang. Stylizing face images via multiple exemplars. Computer Vision and Image Understanding, 2017.
[36] Y. Song, L. Bao, X. Xu, and Q. Yang. Decolorization: Is rgb2gray() out? In ACM SIGGRAPH Asia Technical Briefs, 2013.
[37] Y. Song, L. Bao, and Q. Yang. Real-time video decolorization using bilateral filtering. In IEEE Winter Conference on Applications of Computer Vision, 2014.
[38] Y. Song, C. Ma, L. Gong, J. Zhang, R. Lau, and M.-H. Yang. CREST: Convolutional residual learning for visual tracking. In IEEE International Conference on Computer Vision, 2017.
[39] Y. Song, J. Zhang, L. Bao, and Q. Yang. Fast preprocessing for robust face sketch synthesis. In
International JointConference on Artificial Intelligence , 2017.[40] Y. Tai, J. Yang, X. Liu, and C. Xu. Memnet: A persistentmemory network for image restoration. In
IEEE Interna-tional Conference on Computer Vision , 2017.[41] Y.-H. Tsai, X. Shen, Z. Lin, K. Sunkavalli, X. Lu, and M.-H.Yang. Deep image harmonization. In
IEEE Conference onComputer Vision and Pattern Recognition , 2017.[42] B. Wang, Y. Yu, T.-T. Wong, C. Chen, and Y.-Q. Xu. Data-driven image color theme enhancement.
ACM Transactionson Graphics , 2010.[43] B. Wang, Y. Yu, and Y.-Q. Xu. Example-based image colorand tone style enhancement.
ACM Transactions on Graph-ics , 2011.[44] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Imagequality assessment: from error visibility to structural simi-larity.
IEEE Transactions on Image Processing , 2004.[45] J. Xiao, K. Ehinger, A. Oliva, and A. Torralba. Recognizingscene viewpoint using panoramic place representation. In
IEEE Conference on Computer Vision and Pattern Recogni-tion , 2012.[46] L. Xu, C. Lu, Y. Xu, and J. Jia. Image smoothing via l0 gra-dient minimization.
ACM Transactions on Graphics , 2011.[47] L. Xu, Q. Yan, Y. Xia, and J. Jia. Structure extraction fromtexture via relative total variation.
ACM Transactions onGraphics , 2012.[48] J. Yan, S. Lin, S. Bing Kang, and X. Tang. A learning-to-rankapproach for image color enhancement. In
IEEE Conferenceon Computer Vision and Pattern Recognition , 2014.[49] Z. Yan, H. Zhang, B. Wang, S. Paris, and Y. Yu. Automaticphoto adjustment using deep neural networks.
ACM Trans-actions on Graphics , 2015.[50] Q. Yang. Semantic filtering. In
IEEE Conference on Com-puter Vision and Pattern Recognition , 2016.[51] J. Zhang and J.-F. Lalonde. Learning high dynamic rangefrom outdoor panoramas. In
IEEE International Conferenceon Computer Vision , 2017.[52] K. Zhang, W. Zuo, S. Gu, and L. Zhang. Learning deep cnndenoiser prior for image restoration. In
IEEE Conference onComputer Vision and Pattern Recognition , 2017.[53] L. Zhang, L. Zhang, X. Mou, and D. Zhang. Fsim: A featuresimilarity index for image quality assessment.
IEEE Trans-actions on Image Processing , 2011.[54] Q. Zhang, X. Shen, L. Xu, and J. Jia. Rolling guidance filter.In
European Conference on Computer Vision , 2014.[55] Q. Zhang, L. Xu, and J. Jia. 100+ times faster weighted me-dian filter (wmf). In